- University of Utah's Tomography and Modeling


Multiscale Waveform Inversion and
High-Performance Computing using
Graphics Processing Units (GPUs)
Chaiwoot Boonyasiriwat
Feb. 6, 2009
Part I
Multiscale Waveform Inversion:
A Blind Test on a Synthetic Dataset
Outline
• Previous Results on Marine and Land Data
• Goals
• Methods and Data Processing
• Numerical Results
• Summary
Gulf of Mexico Data
• 480 hydrophones
• 515 shots
• 12.5 m
• dt = 2 ms
• Tmax = 10 s
[Figure: panels (a) and (b) Original CSG; Time (s) 0–3 vs. Offset (km)]
Kirchhoff Migration Images
[Figure: Kirchhoff migration images]
Comparing CIGs
[Figure: CIG from Traveltime Tomogram vs. CIG from Waveform Tomogram]
Saudi Arabia Land Survey
1. 1279 CSGs, 240 traces/gather
2. 30 m station interval, max. offset = 3.6 km
3. Line length = 46 km
4. Picked 246,000 traveltimes
5. Traveltime tomography -> V(x,y,z)
[Figure: survey map, X/Y coordinates in km; offsets −3.6 to 3.6 km]
Brute Stack Section
[Figure: stacked section, CDP 3920–5070, 0–2.0 s]
Traveltime Tomostatics + Stacking
[Figure: stacked section, CDP 3920–5070, 0–2.0 s]
Waveform Tomostatics + Stacking
[Figure: stacked section, CDP 3920–5070, 0–2.0 s]
Outline
• Previous Results on Marine and Land Data
• Goals
• Methods and Data Processing
• Numerical Results
• Summary
Goals
• Blind test
• Sensitivity test
   – unknown source wavelet
   – unknown forward modeling
Outline
• Previous Results on Marine and Land Data
• Goals
• Methods and Data Processing
• Numerical Results
• Summary
Methods and Data Processing
• Traveltime tomography (time picking: Shengdong)
• Low-pass filtering (2 Hz, 5 Hz)
• Source estimation
• Waveform inversion
Outline
• Previous Results on Marine and Land Data
• Goals
• Methods and Data Processing
• Numerical Results
• Summary
Original CSG
[Figure: CSG, Time (s) 0–5 vs. Offset (km) 0–5]
Numerical Results
Kirchhoff Migration Image overlaid with Traveltime Tomogram
[Figure: Depth (km) 0–1 vs. Location (km) 0–10]
Numerical Results
Kirchhoff Migration Image overlaid with Waveform Tomogram
[Figure: Depth (km) 0–1 vs. Location (km) 0–10]
Results
Common Image Gathers obtained using Waveform Tomogram
[Figure: Depth (km) 0–1 vs. Location (km) 0–10; Offset (km) 0–0.5]
Waveform Tomogram vs. True Velocity
[Figure: Waveform Tomogram and True Velocity, Depth (km) 0–1 vs. Location (km) 0–10]
Investigation I
[Figure: (a) True Velocity and (b) Waveform Tomogram using My Data; Depth (km) 0–0.5 vs. Location (km) 0–10, color scale 1000–3000 m/s]
Investigation II
[Figure: True Velocity and Migration Image using Original Data; Depth (km) 0–1 vs. Location (km) 0–10]
Investigation III
[Figure: True Velocity and Migration Image using My Data; Depth (km) 0–1 vs. Location (km) 0–10]
Outline
• Previous Results on Marine and Land Data
• Goals
• Methods and Data Processing
• Numerical Results
• Summary
Summary
• Blind test on a synthetic dataset.
• Waveform inversion failed.
• Need to investigate why waveform inversion failed.
• Possible factors: source wavelet, forward modeling, velocity structure, incorrect information.
Future Work
• Redo the inversion with correct information.
• Speed up waveform inversion.
Part II
High-Performance Computing
using GPUs
Outline
• Motivation
• Introduction to Computing on GPUs
• Preliminary Results
• Summary
Motivation: Peak Performance
[Chart: Peak GFLOP/s, 0–1000, GPU vs. CPU. Courtesy of NVIDIA]
Motivation: Memory Bandwidth
[Chart: Bandwidth (GB/s), 0–120, GPU vs. CPU. Courtesy of NVIDIA]
Outline
• Motivation
• Introduction to Computing on GPUs
• Preliminary Results
• Summary
CPU vs. GPU
The GPU devotes more transistors to data processing.
[Figure: CPU vs. GPU transistor layout. Courtesy of NVIDIA]
CPU vs. GPU
[Figure: conventional CPU storage hierarchy (processor, registers, caches through L3, host memory) vs. the host + GPU hierarchy — a device grid of thread blocks, each block with shared memory and per-thread registers and local memory, plus device-wide global, constant, and texture memory]
• Large memories are slow; fast memories are small.
• Thread synchronization does not work across different thread blocks.
Source: Mary Hall (U of Utah), NVIDIA
General-Purpose Computation on GPUs (GPGPU)
• GPUs were originally designed for graphics.
• High speed: useful for a variety of applications.
• Potential for very high performance at low cost.
• Architecture well suited for certain kinds of parallel (data-parallel) applications.
• Demonstrations of 20–100x speedup over CPU.
Source: Mary Hall (U of Utah), GPGPU.org
Programming Model: CUDA
(Compute Unified Device Architecture)
• Minimal extensions to C++.
• Allows kernel functions to be executed N times in parallel by N different CUDA threads.
• Each thread performs roughly the same computation on a different partition of the data.
• Data-parallel interface to GPUs.
Source: Mary Hall, CUDA Programming Guide
Outline
• Motivation
• Introduction to Computing on GPUs
• Preliminary Results
• Summary
Preliminary Results
• Modeling test: speedup factor of 20x using 1536 threads.
• Migration test: N/A (thread synchronization problem).
• Inversion test: N/A.
Forward Modeling Test
NX = 1536 = Nthreads, NZ = 373
[Figure: velocity model, Depth (km) 0–3.5 vs. Horizontal Location (km) 0–15; color scale 1500–4000 m/s]
Forward Modeling Test
[Figure: (a) CSG from CPU and (b) CSG from GPU; Time (s) 0–6 vs. Offset (km) 0–15]
Conventional C Code

for (iz = 2; iz < nz-2; iz++) {
    for (ix = 2; ix < nx-2; ix++) {
        indx = ix + iz*nx;
        P2[indx] = (2.0 + 2.0*C1*alpha)*P1[indx]
                 - P0[indx]
                 + alpha*(C2*(P1[indx-1] + P1[indx+1] + P1[indx-nx] + P1[indx+nx])
                        + C3*(P1[indx-2] + P1[indx+2] + P1[indx-2*nx] + P1[indx+2*nx]));
    }
}
CUDA Code

ix = threadIdx.x;
if (ix >= 2 && ix < nx-2) {   /* guard boundary columns, matching the CPU loop */
    for (iz = 2; iz < nz-2; iz++) {
        indx = ix + iz*nx;
        P2[indx] = (2.0 + 2.0*C1*alpha)*P1[indx]
                 - P0[indx]
                 + alpha*(C2*(P1[indx-1] + P1[indx+1] + P1[indx-nx] + P1[indx+nx])
                        + C3*(P1[indx-2] + P1[indx+2] + P1[indx-2*nx] + P1[indx+2*nx]));
    }
}
Outline
• Motivation
• Introduction to Computing on GPUs
• Preliminary Results
• Summary
Summary
• The GPU is a cheap, high-performance processor.
• CUDA makes it possible to learn GPU programming, albeit with a steep learning curve.
• The current timing result is very promising.
• A better understanding of GPUs/CUDA will improve performance in the future.
Future
• Develop CUDA-based codes for:
   – FD forward modeling
   – RTM
   – Waveform inversion
• Release the codes sometime in the fall of 2009.
Acknowledgments
• I am grateful for the support from the UTAM sponsors.
• Thanks to Ross Whitaker for his seminars on GPUs.
• Thanks to Mary Hall for her lectures on GPUs.
• Thanks to Jerry for buying the GPU card.
• Thanks to Sam Liston for technical support.
• Finally, thank you for your attention.