Parallel Computing
Michael Young, Mark Iredell
NWS Computer History
1968  CDC 6600
1974  IBM 360
1983  CYBER 205 (first vector parallelism)
1991  Cray Y-MP (first shared memory parallelism)
1994  Cray C-90 (~16 gigaflops)
2000  IBM SP (first distributed memory parallelism)
2002  IBM SP P3
2004  IBM SP P4
2006  IBM SP P5
2009  IBM SP P6
2013  IBM iDataPlex SB (~200 teraflops)
Algorithm of the GFS Spectral Model
One time loop is divided into:
Computation of the tendencies of divergence, surface pressure, temperature, vorticity, and tracers (grid)
Semi-implicit time integration (spectral)
First half of time filter (spectral)
Physical effects included in the model (grid)
Damping to simulate subgrid dissipation (spectral)
Completion of the time filter (spectral)
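As an illustration of how these pieces alternate between grid-point and spectral space, here is a minimal Fortran sketch of one time step. All routine names are hypothetical placeholders, not the actual GFS routine names; the grid/spectral transforms and their MPI transposes are covered on later slides.

      program time_step_sketch
        implicit none
        call compute_grid_tendencies()   ! (grid) divergence, surface pressure, temperature, vorticity, tracers
        call grid_to_spectral()          ! inverse FFT + inverse Legendre transform (with MPI transpose)
        call semi_implicit_step()        ! (spectral)
        call time_filter_first_half()    ! (spectral)
        call spectral_to_grid()          ! Legendre transform + FFT (with MPI transpose)
        call apply_physics()             ! (grid) physical effects included in the model
        call grid_to_spectral()
        call subgrid_damping()           ! (spectral) damping to simulate subgrid dissipation
        call time_filter_complete()      ! (spectral)
      contains
        subroutine compute_grid_tendencies()
        end subroutine
        subroutine grid_to_spectral()
        end subroutine
        subroutine semi_implicit_step()
        end subroutine
        subroutine time_filter_first_half()
        end subroutine
        subroutine spectral_to_grid()
        end subroutine
        subroutine apply_physics()
        end subroutine
        subroutine subgrid_damping()
        end subroutine
        subroutine time_filter_complete()
        end subroutine
      end program time_step_sketch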
Algorithm of the GFS Spectral Model
Definitions:
Operational spectral truncation T574, with a physical grid of 1760 longitudes by 880 latitudes and 64 vertical levels (~23 km resolution)
θ is latitude
λ is longitude
l is zonal wavenumber
n is total wavenumber (zonal + meridional)
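A quick arithmetic check of the quoted resolution, assuming an equatorial circumference of roughly 40,075 km:

\Delta x \approx \frac{40{,}075\ \text{km}}{1760\ \text{longitudes}} \approx 22.8\ \text{km} \approx 23\ \text{km}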
Three Variable Spaces
Spectral (L x N x K)
Fourier (L x J x K)
Physical Grid (I x J x K)
I is number of longitude points
J is number of latitudes
K is number of levels
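The following is a small, self-contained Fortran sketch relating these three shapes to the T574 numbers on the previous slide. The parameter names (jcap, lonf, latg, levs) are illustrative, not the actual GFS variable names.

      program three_spaces
        implicit none
        integer, parameter :: jcap = 574                      ! T574 spectral truncation
        integer, parameter :: lonf = 1760, latg = 880, levs = 64
        ! Spectral (L x N x K): triangular truncation keeps only n >= l, i.e.
        ! (jcap+1)*(jcap+2)/2 complex coefficients per field per level.
        ! Fourier  (L x J x K): jcap+1 zonal waves at each of latg latitudes.
        ! Physical (I x J x K): lonf x latg grid points per level.
        print *, 'spectral coefficients per level:', (jcap + 1) * (jcap + 2) / 2
        print *, 'Fourier  values       per level:', (jcap + 1) * latg
        print *, 'grid     points       per level:', lonf * latg
      end program three_spaces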
The Spectral Technique
All fields possess a spherical harmonic representation:
F(\theta, \lambda) = \sum_{l=0}^{J} \sum_{n=l}^{J} f_n^l \, P_n^l(\sin\theta) \, e^{il\lambda}

where

P_n^l(x) = \left[ \frac{(2n+1)\,(n-l)!}{2\,(n+l)!} \right]^{1/2} \frac{(1-x^2)^{l/2}}{2^n \, n!} \, \frac{d^{\,n+l}}{dx^{\,n+l}} (1-x^2)^n
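With this normalization the associated Legendre functions are orthonormal in x = sin θ, and the complex exponentials are orthogonal in λ; these two facts are what give the grid-to-spectral integral below its simple 1/(2π) factor:

\int_{-1}^{1} P_n^l(x)\, P_m^l(x)\, dx = \delta_{nm}, \qquad \int_0^{2\pi} e^{il\lambda}\, e^{-il'\lambda}\, d\lambda = 2\pi\, \delta_{ll'}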
Spectral to Grid Transform
Legendre transform:
F^l(\theta) = \sum_{n=l}^{J} f_n^l \, P_n^l(\sin\theta)
Fourier transform using FFT:
F(\theta, \lambda) = \sum_{l=0}^{J} F^l(\theta) \, e^{il\lambda}
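Below is a toy, self-contained Fortran sketch of the Legendre-synthesis loop implied by the first formula, using placeholder data and illustrative array names (flm, pnm, fl). The actual routine sumfln_slg_gg is far more elaborate (threaded, vectorized, and distributed), and the subsequent FFT over l is performed by a library call inside four_to_grid.

      program legendre_synthesis_sketch
        implicit none
        integer, parameter :: jcap = 4, nlat = 6     ! toy sizes (GFS: jcap=574, 880 latitudes)
        complex :: flm(0:jcap, 0:jcap)               ! spectral coefficients f_n^l (first index l, second n)
        real    :: pnm(0:jcap, 0:jcap, nlat)         ! P_n^l(sin theta_k), assumed precomputed
        complex :: fl(0:jcap, nlat)                  ! Fourier coefficients F^l(theta_k)
        integer :: l, n, k
        flm = (1.0, 0.0)                             ! placeholder data
        pnm = 1.0
        do k = 1, nlat
          do l = 0, jcap
            fl(l, k) = (0.0, 0.0)
            do n = l, jcap                           ! F^l(theta_k) = sum_{n=l}^{J} f_n^l P_n^l(sin theta_k)
              fl(l, k) = fl(l, k) + flm(l, n) * pnm(n, l, k)
            end do
          end do
        end do
        ! An FFT over l at each latitude would then give F(theta_k, lambda_i).
        print *, 'F^0 at first latitude:', fl(0, 1)
      end program legendre_synthesis_sketch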
Grid to Spectral Transform
f_n^l = \frac{1}{2\pi} \int_0^{2\pi} \int_{-\pi/2}^{\pi/2} F(\theta, \lambda) \, P_n^l(\sin\theta) \, e^{-il\lambda} \cos\theta \, d\theta \, d\lambda
Inverse Fourier transform (FFT):
F^l(\theta) = \frac{1}{2\pi} \int_0^{2\pi} F(\theta, \lambda) \, e^{-il\lambda} \, d\lambda = \frac{1}{M} \sum_{j=0}^{M-1} F(\theta, \lambda_j) \, e^{-il\lambda_j}
Inverse Legendre (Gaussian quadrature):
f_n^l = \sum_{k=1}^{N} w_k \, F^l(\theta_k) \, P_n^l(\sin\theta_k)

where w_k are the Gaussian quadrature weights and θ_k the Gaussian latitudes.
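A matching toy, self-contained Fortran sketch of the Gaussian-quadrature (inverse Legendre) step, again with placeholder data and illustrative names; in the model this work is done inside four2fln_gg on each task's local latitudes.

      program legendre_analysis_sketch
        implicit none
        integer, parameter :: jcap = 4, nlat = 6     ! toy sizes
        complex :: fl(0:jcap, nlat)                  ! F^l(theta_k) from the FFT of the grid field
        real    :: pnm(0:jcap, 0:jcap, nlat)         ! P_n^l(sin theta_k), precomputed
        real    :: wgt(nlat)                         ! Gaussian quadrature weights w_k
        complex :: flm(0:jcap, 0:jcap)               ! spectral coefficients f_n^l
        integer :: l, n, k
        fl  = (1.0, 0.0)                             ! placeholder data
        pnm = 1.0
        wgt = 1.0 / nlat
        flm = (0.0, 0.0)
        do k = 1, nlat
          do l = 0, jcap
            do n = l, jcap                           ! f_n^l = sum_k w_k F^l(theta_k) P_n^l(sin theta_k)
              flm(l, n) = flm(l, n) + wgt(k) * fl(l, k) * pnm(n, l, k)
            end do
          end do
        end do
        print *, 'f_0^0 =', flm(0, 0)
      end program legendre_analysis_sketch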
MPI and OpenMP
The GFS uses a hybrid 1-dimensional MPI layout with OpenMP threading at the do-loop level.
MPI (Message Passing Interface) is used to communicate between tasks, each of which contains a subgrid of a field.
OpenMP supports shared-memory multiprocessor programming (threading) using compiler directives.
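A minimal, self-contained sketch of the hybrid model (not GFS code): MPI provides the task-level decomposition, and an OpenMP parallel do shares the iterations of a loop among the threads inside each task.

      program hybrid_sketch
        use mpi
        use omp_lib
        implicit none
        integer :: ierr, rank, nranks, provided, j
        ! MPI_THREAD_FUNNELED: only the master thread makes MPI calls.
        call mpi_init_thread(MPI_THREAD_FUNNELED, provided, ierr)
        call mpi_comm_rank(MPI_COMM_WORLD, rank, ierr)
        call mpi_comm_size(MPI_COMM_WORLD, nranks, ierr)
        ! Each MPI task would own part of the domain (e.g. a group of latitudes);
        ! OpenMP threads then split the do-loop work within the task.
!$omp parallel do
        do j = 1, 16
          ! per-latitude or per-longitude-block work would go here
        end do
!$omp end parallel do
        if (rank == 0) then
          print *, nranks, 'MPI task(s),', omp_get_max_threads(), 'thread(s) per task'
        end if
        call mpi_finalize(ierr)
      end program hybrid_sketch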
MPI and OpenMP
Data transposes are implemented using MPI_alltoallv.
They are required to switch between the variable spaces, which have different 1-D MPI decompositions.
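A minimal, self-contained sketch of how an mpi_alltoallv exchange is driven by its counts and displacement arrays, using a toy decomposition and standard MPI types. The GFS calls on the next slides use the model's own mpi_r_mpi datatype and mc_comp communicator, with counts derived from how many latitudes or wavenumbers each task owns.

      program alltoallv_sketch
        use mpi
        implicit none
        integer :: ierr, rank, nranks, i
        integer, allocatable :: sendcounts(:), recvcounts(:), sdispls(:), rdispls(:)
        real,    allocatable :: works(:), workr(:)
        call mpi_init(ierr)
        call mpi_comm_rank(MPI_COMM_WORLD, rank, ierr)
        call mpi_comm_size(MPI_COMM_WORLD, nranks, ierr)
        allocate(sendcounts(nranks), recvcounts(nranks), sdispls(nranks), rdispls(nranks))
        sendcounts = 2                         ! toy case: 2 values to/from every task
        recvcounts = 2
        do i = 1, nranks
          sdispls(i) = 2 * (i - 1)             ! offsets into the send/receive buffers
          rdispls(i) = 2 * (i - 1)
        end do
        allocate(works(2 * nranks), workr(2 * nranks))
        works = real(rank)                     ! each task sends its own rank
        call mpi_alltoallv(works, sendcounts, sdispls, MPI_REAL,    &
                           workr, recvcounts, rdispls, MPI_REAL,    &
                           MPI_COMM_WORLD, ierr)
        if (rank == 0) print *, 'task 0 received:', workr
        call mpi_finalize(ierr)
      end program alltoallv_sketch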
Spectral to Physical Grid
Call sumfln_slg_gg (Legendre Transform)
Call four_to_grid (FFT)
Data transpose after the Legendre transform, in preparation for the FFT to physical grid space:
      call mpi_alltoallv(works,sendcounts,sdispls,mpi_r_mpi,
     x                   workr,recvcounts,sdispls,mpi_r_mpi,
     x                   mc_comp,ierr)
Physical Grid to Spectral
Call Grid_to_four (Inverse FFT)
Call Four2fln_gg (Inverse Legendre Transform)
Data transpose performed before the inverse Legendre transform:
      call mpi_alltoallv(works,sendcounts,sdispls,MPI_R_MPI,
     x                   workr,recvcounts,sdispls,MPI_R_MPI,
     x                   MC_COMP,ierr)
Physical Grid Space Parallelism
1-D MPI distributed over latitudes; OpenMP threading used on longitude points.
Each MPI task holds a group of latitudes, all longitudes, and all levels.
A cyclic distribution of latitudes is used to load-balance the MPI tasks, because the number of longitude points per latitude decreases as latitude increases (approaches the poles).
Physical Grid Space Parallelism
Cyclic distribution of latitudes example: with 5 MPI tasks and 20 latitudes,

Task   1   2   3   4   5
Lat    1   2   3   4   5
Lat   10   9   8   7   6
Lat   11  12  13  14  15
Lat   20  19  18  17  16
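A small, self-contained sketch that reproduces the pairing in the example above (the bookkeeping in the actual GFS source may differ): latitudes are dealt out cyclically, reversing direction on alternate sweeps so that each task receives a mix of low and high latitudes.

      program cyclic_lats
        implicit none
        integer, parameter :: ntasks = 5, nlats = 20
        integer :: task, sweep, lat
        do task = 1, ntasks
          write(*, '(a,i2,a)', advance='no') 'task', task, ': lats'
          do sweep = 0, nlats / ntasks - 1
            if (mod(sweep, 2) == 0) then
              lat = sweep * ntasks + task             ! forward sweep: 1..5, 11..15
            else
              lat = (sweep + 1) * ntasks + 1 - task   ! reversed sweep: 10..6, 20..16
            end if
            write(*, '(i4)', advance='no') lat
          end do
          write(*, *)
        end do
      end program cyclic_lats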
Physical Grid Space Parallelism
Physical grid vector length per OpenMP thread:
NGPTC (a namelist variable) defines the number (block) of longitude points per group (the vector length per processor) that each thread will work on.
It is typically set anywhere from 15 to 30 points.
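A minimal sketch (illustrative only, not the GFS physics driver) of blocking the longitude loop into NGPTC-sized chunks that OpenMP threads pick up:

      program ngptc_blocks
        use omp_lib
        implicit none
        integer, parameter :: nlon  = 1760    ! longitude points at one latitude
        integer, parameter :: ngptc = 24      ! block (vector) length, typically 15-30
        integer :: blk, lon1, lon2
!$omp parallel do private(lon1, lon2)
        do blk = 1, (nlon + ngptc - 1) / ngptc
          lon1 = (blk - 1) * ngptc + 1
          lon2 = min(blk * ngptc, nlon)
          ! grid-space (physics) work for columns lon1..lon2 would go here
        end do
!$omp end parallel do
        print *, 'number of blocks:', (nlon + ngptc - 1) / ngptc, &
                 'using up to', omp_get_max_threads(), 'threads'
      end program ngptc_blocks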
Spectral Space Parallelism
Hybrid 1-D MPI layout with OpenMP threading.
Spectral space is 1-D MPI distributed over zonal wavenumbers (l's); OpenMP threading is used over a stack of variables times the number of levels.
Each MPI task holds a group of l's, all n's, and all levels.
A cyclic distribution of l's is used to load-balance the MPI tasks, because the number of meridional points per zonal wavenumber decreases as the wavenumber increases.
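The imbalance is easy to quantify for a triangular truncation T_J: the number of total wavenumbers stored for zonal wavenumber l is

N_l = J - l + 1 \qquad (n = l, l+1, \ldots, J)

so l = 0 carries J + 1 = 575 coefficients at T574 while l = J carries only one; pairing small and large l's on the same task evens out the work.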
GFS Scalability
1-D MPI scales to about 2/3 of the spectral truncation; for T574, that is about 400 MPI tasks.
OpenMP threading scales to 8 threads.
T574 therefore scales to 400 x 8 = 3200 processors.
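A quick check of those numbers:

\tfrac{2}{3} \times 574 \approx 383 \;(\text{roughly the quoted 400 MPI tasks}), \qquad 400 \times 8 = 3200\ \text{processors}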