Slides - Maison de la Simulation
Flexible Aerodynamic Solver Technology in an
HPC environment
I. Mary, N. Alferez, J.M. Legouez
Computational Fluid Dynamics Department
Outline
• A few words on ONERA (Office National d'Études et de Recherches Aérospatiales)
• HPC for direct numerical simulation of turbulence
• Application: dynamic stall of a rotor blade
ONERA: a state research laboratory dedicated to aerospace
Prospective: long term research
Expert advisor to the government
Innovative solutions for industry
A fleet of test facilities unrivaled in Europe
• 150 experimental test rigs and dedicated metrology
systems
• Combustion, aeroelasticity, optics, instrumentation
and sensing,
space environment
Europe’s leading center of expertise in large
wind tunnels
• Global clientele
• Half of the European fleet
• Key resources for Airbus and Dassault
• 50 years of working for industry
• Speed envelope from Mach 0.1 to Mach 20
• Research/experimentation synergies and integration
ONERA: close to our partners
[Map of ONERA sites, staff and nearby partners]
• Ile-de-France (Châtillon, Palaiseau, Meudon): 1,275 employees. Partners: Total, Dassault, EADS, Thales, MBDA, SNPE, Safran; Astech aerospace cluster; Université Paris Saclay
• Nord-Pas-de-Calais (Lille): 91 employees
• Midi-Pyrénées (Toulouse, Fauga-Mauzac): 453 employees. Partners: Total, Airbus, Thales, Dassault, EADS Astrium, Thales Alenia Space, Safran; Aerospace Valley
• Rhône-Alpes (Modane-Avrieux): 162 employees. Large wind tunnels
• Provence-Alpes-Côte d’Azur (Salon de Provence): 48 employees. Partners: Airbus Helicopters, Dassault; Pegase cluster
• Brussels: EU, EDA
A balanced business portfolio:
• 1/3 civil
• 1/3 defense
• 1/3 dual-use
Two legacy CFD codes, CEDRE and elsA,
used by Airbus & Safran
elsA : Multi-purpose CFD simulation
platform
Internal and external aerodynamics
From low subsonic to high supersonic
Compressible 3-D Navier-Stokes equations
Moving deformable bodies
Aircraft, helicopters, turbomachinery,
CROR, missiles, launchers…
Design and implementation
→Object-Oriented
→Kernel in C++/Fortran
→Millions of lines
→User interface in Python
→Python-CGNS interface for CGNS
extraction and coupling with external software
→CPU and parallel efficiency on a wide
range of computer platforms
A new code is needed to work on the HPC implementation without the constraints
of large legacy codes: the FAST project
Flexible Aerodynamic Solver Technology: FAST
[Diagram: FAST components exchanging data through CGNS/Python]
• Transfer component (CGNS/Python, C)
• Black box: elsA, Nastran, Cèdre
• CASSIOPEE (Python)
• Co-processing (CGNS/Python, C, Fortran)
• Mesh generation and adaptation component (CGNS/Python, C)
• HPC « interior points » fluid solvers on unstructured grids (CGNS/Python, C, Fortran)
• HPC « interior points » fluid solvers on structured grids (CGNS/Python, C, Fortran)
• Python for:
- user scripting
- gluing between pre/post-processing and solver modules
• CGNS/Python standard for data representation in memory:
- no need to copy data between modules
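The "no data copy between modules" point can be illustrated with a minimal sketch. It assumes the CGNS/Python convention in which a node is a list `[name, value, children, type]`; the helper names (`new_node`, `find_child`, `solver_step`, `post_max_density`) are hypothetical, and a plain Python list stands in for the numpy array that a real tree would hold.

```python
# Minimal sketch of a CGNS/Python-style tree shared by two modules.
# Assumption: a node is [name, value, children, type]; a plain list
# stands in for the numpy array used in practice. Node values of the
# Zone and FlowSolution nodes are simplified to None.

def new_node(name, value, children, cgns_type):
    return [name, value, children, cgns_type]

def find_child(node, name):
    """Return the first direct child with the given name, or None."""
    for child in node[2]:
        if child[0] == name:
            return child
    return None

density = [1.2] * 8  # cell values owned by the tree
solution = new_node("FlowSolution", None,
                    [new_node("Density", density, [], "DataArray_t")],
                    "FlowSolution_t")
zone = new_node("Zone", None, [solution], "Zone_t")

def solver_step(tree):
    """Hypothetical solver module: updates Density in place."""
    rho = find_child(find_child(tree, "FlowSolution"), "Density")[1]
    for i in range(len(rho)):
        rho[i] *= 1.01

def post_max_density(tree):
    """Hypothetical post-processing module: reads the same buffer."""
    return max(find_child(find_child(tree, "FlowSolution"), "Density")[1])

solver_step(zone)
# Both modules worked on the very same object: no copy was made.
shared = find_child(solution, "Density")[1] is density
```

Because every module receives a reference to the same tree, the Python layer only passes pointers around; the heavy work on the arrays can stay in C or Fortran.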
HPC required for turbulence modelling or optimization
Cutoff wave number kc of the resolved space
scales:
• DNS: kc > kKolmogorov, all scales resolved
• RANS: kc → 0, all scales modelled
• LES: kenergy-containing < kc < kKolmogorov
Full airplane simulation:
• RANS is affordable today, but a shape optimization process requires thousands of
simulations
• DNS affordable around 2050 if Moore’s law still holds
High fidelity flow simulation
High-fidelity data are needed to describe turbulence in detail
• Main applications:
- prediction of aeroacoustic sources
- understanding of physical phenomena
- database for the development of turbulence models
• Complementarity between experiment and numerical simulation (LES or
DNS)
- zones where measurements are difficult
- confinement
• Some recent collaborative examples with experimental teams:
Vortex breakdown
Tail shake
interaction
Slat noise sources
Impinging hot jet
High fidelity flow simulation
Choice of the mathematical model for the fluid problem
• Compressible Navier-Stokes equation
- Newtonian fluid
- Perfect gas
- Turbulence modelling: DNS, LES, Hybrid RANS/LES or RANS/DNS
• Key points for LES/DNS simulations, by order of importance
- Mesh resolution (drives about 70% of the flow solution)
- Low-dissipation convective scheme
- Physical duration of the simulation (often too short because of CPU limitations)
- SGS model (weak influence if the mesh is adequate, hazardous otherwise…)
• The success of LES/DNS relies mainly on the capacity to solve a huge
number of degrees of freedom
- Requires substantial supercomputer resources
- Requires efficient solver(s)
Example of DNS resolution for a transitional wall-bounded
flow
Mesh convergence study:
– Mach = 0.25 (U = 100 m/s)
– Reθ = 600 – 1400
– DNS box size: 6 × 0.5 × 3 cm³
– 200 million cells
– 50,000 Δt
– 5 × 10¹³ degrees of freedom
– 48 h on 1000 Nehalem cores
C. Laurent PhD (D. Arnal and A. Lerat)
Q criterion coloured by streamwise velocity
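The degrees-of-freedom figure quoted above can be checked with a few lines of arithmetic. This is a sketch under one assumption: the count is taken as cells × time steps × 5 flow variables (the five variables are those listed for the solver elsewhere in these slides). The cost per cell and time step is derived here and is not a number given on the slide.

```python
# Arithmetic check of the DNS example (assumption: degrees of freedom
# = cells x time steps x 5 flow variables).
cells = 200_000_000
time_steps = 50_000
variables = 5  # density, three velocity components, pressure

degrees_of_freedom = cells * time_steps * variables

# 48 h on 1000 cores, converted to core-seconds.
core_seconds = 48 * 3600 * 1000

# Derived quantity (not on the slide): core-seconds per cell and time step.
cost_per_cell_step = core_seconds / (cells * time_steps)

print(f"degrees of freedom: {degrees_of_freedom:.1e}")
print(f"cost per cell and time step: {cost_per_cell_step * 1e6:.1f} microseconds")
```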
FastS solver
• 3-layer Python/C/Fortran module
- Python for scripting and glue
- C for memory management
- Fortran for loop computation
• Multiblock structured solver for the nonlinear Navier-Stokes (NS) equations
- 5 variables in a 3D problem: density, velocities, pressure
- the PDEs contain d/dx and d2/dx2 operators
- the solver stencil spreads over 21 neighbouring cells
- after optimisation, 20 arrays (of size Ndim) are needed to solve NS over Ndim cells
- 64-bit floats
• 2nd-order finite volume (FV) discretization
• Hybrid centered/upwind scheme based on a flow regularity sensor
• Time integration:
• explicit RK3
• 2nd-order implicit method (Gear + Newton + LU-SGS)
• Parallelism based on a hybrid MPI/OpenMP method
• Careful memory design to optimize cache access (superscalar processors)
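The "20 arrays of 64-bit floats" figure translates directly into a memory estimate. The sketch below assumes that ghost cells, MPI buffers and Python-side objects are negligible, which is a simplification; the function name is hypothetical.

```python
# Rough memory footprint implied by "20 arrays (Ndim)" of 64-bit floats.
# Assumption: ghost cells, MPI buffers and Python objects are ignored.
ARRAYS_PER_CELL = 20
BYTES_PER_VALUE = 8  # one 64-bit float

def solver_memory_bytes(n_cells):
    """Approximate solver memory in bytes for n_cells cells."""
    return n_cells * ARRAYS_PER_CELL * BYTES_PER_VALUE

bytes_per_cell = solver_memory_bytes(1)

# Example: a 300 x 300 x 300 Cartesian grid.
memory_gb = solver_memory_bytes(300 ** 3) / 1e9
print(f"{bytes_per_cell} bytes per cell, {memory_gb:.2f} GB for 300^3 cells")
```

At 160 bytes per cell, even a moderate grid far exceeds the L3 cache of a node, which is why memory traffic rather than arithmetic tends to limit performance.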
HPC status
• Distributed memory:
- Weak scaling easy to reach over O(5000) MPI processes
Cores                                     8     32    64    128   256   2165  4096
CPU per subiteration and per cell (μs)    0.90  0.90  0.91  0.93  0.95  0.98  1.06
- Hybrid MPI/OpenMP for scalability over O(50000) cores
• Shared memory (OpenMP):
- Scaling is more difficult (synchronization, NUMA)
• Efficiency at the node level is the most difficult to obtain
- DRAM access is the bottleneck for CFD:
improve the use of the L1-L3 caches
Implementation of a cache-blocking technique
Requires a deep rewrite of the computational kernels
- Efficient use of the SIMD units is crucial
OpenMP 4 directives (simd, aligned, …), no intrinsics
Avoid spilling of vector registers
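The weak-scaling figures quoted above can be turned into a parallel efficiency. The sketch assumes the usual weak-scaling definition, efficiency = t(reference) / t(N), with the 8-core run as reference; the efficiencies themselves are derived here and are not given on the slide.

```python
# Weak-scaling efficiency from the quoted timings.
# Assumption: efficiency = time on 8 cores / time on N cores.
cores = [8, 32, 64, 128, 256, 2165, 4096]
time_us = [0.90, 0.90, 0.91, 0.93, 0.95, 0.98, 1.06]  # per subiteration and cell

reference = time_us[0]
efficiency = {n: reference / t for n, t in zip(cores, time_us)}

for n in cores:
    print(f"{n:5d} cores: {100 * efficiency[n]:.0f} %")
```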
Example of the OMP strategy for a Westmere bi-socket
node (1)
Zone split by « socket » (the work, not the memory): socket 1 and socket 2
Automatic splitting of the work across the 2 « sockets »:
• manual synchronization at the « socket » interface (explicit lock and flush)
• improved memory placement (first-touch policy)
Example of the OMP strategy for a Westmere bi-socket
node (2)
Socket 1 (thread 1 to 6)
« thread » splitting
th1 th2 th3 th3 th2 th1
th4 th5 th6 th6 th5 th4
th4 th5 th6 th6 th5 th4
th1 th2 th3 th3 th2 th1
th1 th2 th3 th3 th2 th1
th4 th5 th6 th6 th5 th4
th4 th5 th6 th6 th5 th4
th1 th2 th3 th3 th2 th1
Automatic splitting of the work across the cores of the socket:
• Manual lock and flush at thread interfaces
• Adjustable size for block thN (cache blocking)
• Better L3 cache sharing due to the stencil
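Cache blocking amounts to traversing the grid tile by tile instead of sweeping whole planes. The following is only an illustrative sketch in Python, with hypothetical function names and block sizes; in the solver described here the loops are in Fortran and the block shape is tuned to the cache size.

```python
# Illustrative loop tiling (cache blocking) over a 3D index space.
# Assumption: function names and block sizes are hypothetical.

def blocks(n, size):
    """Yield (start, stop) ranges of at most `size` indices covering range(n)."""
    for start in range(0, n, size):
        yield start, min(start + size, n)

def blocked_cells(ni, nj, nk, bi, bj, bk):
    """Visit every (i, j, k) cell once, one block at a time, i fastest."""
    for k0, k1 in blocks(nk, bk):
        for j0, j1 in blocks(nj, bj):
            for i0, i1 in blocks(ni, bi):
                for k in range(k0, k1):
                    for j in range(j0, j1):
                        for i in range(i0, i1):
                            yield (i, j, k)

# Small example: full i-lines, blocks of 2 in j and 3 in k.
visited = list(blocked_cells(6, 5, 4, bi=6, bj=2, bk=3))
```

Keeping the i direction unsplit preserves long contiguous inner loops, which suits vectorization, while small j and k blocks keep the stencil neighbours in cache.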
Taylor-Green Vortex (Re=1600): HPC efficiency (1)
Explicit time integration (RK3)
Cartesian grid 300*300*300
t=0
t=8
t=16
Taylor-Green vortex (Re=1600): HPC efficiency (2)
Cache blocking and vectorization effect, Intel Ivy Bridge node (20 cores)
CPU time × core / cell / sub-iteration (μs)
Icache   Jcache   Kcache   FastS -novec   FastS -avx
300      300      300      2.94           1.49
300      300      6        2.72           1.36
300      300      5        0.58           0.32
300      40       5        0.52           0.31
300      20       5        0.47           0.26
300      2        3        0.42           0.23
300      1        4        0.41           0.20
300      1        5        0.40           0.21
• FastS speedup:
- cache + vectorization = 14
- vectorization = 2 (4 in theory)
• Optimization at the core level is more important than MPI/OMP optimization
• Improvement of vectorization in progress (Intel IPCC)
Python overhead is negligible if the domain size is sufficient
(overhead ≈ computation of 5000 cells)
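The quoted speedups can be recovered from the timing columns. The sketch assumes the timings are listed from the unblocked case (first value) to the best blocked case (last value), in the order they appear in the source.

```python
# Speedups derived from the timing columns (microseconds per core,
# cell and sub-iteration). Assumption: first entry = no cache blocking,
# last entry = best cache blocking.
novec = [2.94, 2.72, 0.58, 0.52, 0.47, 0.42, 0.41, 0.40]
avx = [1.49, 1.36, 0.32, 0.31, 0.26, 0.23, 0.20, 0.21]

cache_and_vecto_speedup = novec[0] / avx[-1]   # unblocked scalar vs blocked AVX
vecto_speedup = novec[-1] / avx[-1]            # blocked scalar vs blocked AVX
cache_speedup = novec[0] / novec[-1]           # blocking alone, scalar code

print(f"cache + vectorization: {cache_and_vecto_speedup:.1f}")
print(f"vectorization alone:   {vecto_speedup:.1f}")
print(f"cache blocking alone:  {cache_speedup:.1f}")
```

Cache blocking alone accounts for most of the gain, consistent with the remark that optimization at the core level matters more than MPI/OMP tuning.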
Taylor-Green vortex (Re=1600): HPC efficiency (3)
[Figure annotation: potential improvement]
Arithmetic intensity close to 2 for the Cartesian solver, and 3 for the curvilinear solver
There is still room for improvement for the curvilinear solver
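Arithmetic intensity is commonly read against a roofline bound, where attainable performance is the smaller of the peak compute rate and intensity times memory bandwidth. The sketch below assumes the intensities are in flop per byte, and the node characteristics (400 Gflop/s peak, 100 GB/s bandwidth) are purely hypothetical values chosen for illustration, not measurements from these slides.

```python
# Roofline bound: min(peak, arithmetic intensity x memory bandwidth).
# Assumptions: intensity in flop/byte; peak and bandwidth values are
# hypothetical, for illustration only.

def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Upper bound on performance for a kernel of given arithmetic intensity."""
    return min(peak_gflops, intensity * bandwidth_gbs)

PEAK_GFLOPS = 400.0     # hypothetical node peak
BANDWIDTH_GBS = 100.0   # hypothetical DRAM bandwidth

bound_cartesian = attainable_gflops(2.0, PEAK_GFLOPS, BANDWIDTH_GBS)
bound_curvilinear = attainable_gflops(3.0, PEAK_GFLOPS, BANDWIDTH_GBS)
```

Under these assumed numbers both solvers sit on the bandwidth-limited part of the roofline, and the higher intensity of the curvilinear solver gives it the higher ceiling.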
Taylor-Green vortex (Re=1600): HPC efficiency (4)
Westmere node (12 cores) and Haswell node (24 cores)
[Figure annotations: NUMA access, L3 saturation, compact thread affinity]
Example of computation by DNS
Stall phenomenon on rotor blades due to a laminar separation bubble
[Figure: angle of attack (AoA) on the rotor disc, forward flight at speed V0]
• High-speed forward flight: low speed and high AoA on the retreating side, leading to dynamic stall
• Consequences in high-speed forward flight:
- Large vibratory stresses
- Aeroelastic instabilities
Introduction: flow physics description of an airfoil near stall
McCroskey & Philippe (1974)
• Laminar separation and turbulent
reattachment (LSB)
• Turbulent boundary layer with adverse
pressure gradient
• Trailing edge separation
LSB study by Horton (1968)
• Laminar separation
• Inflection point, shear layer (Kelvin-Helmholtz convective instability)
• Turbulent reattachment
Introduction: flow physics description of dynamic stall for moving airfoil
Doligalski T. et al. (1994)
• Formation and spillage of the leading
edge vortex (LEV) : LSB bursting ?
• Dynamic stall : Lift overshoot is linked
with the advection of the LEV
• Re number < 10000 : strong interaction
of the LEV with the wall
Gaster (1966), Horton (1968), Owen & Klanfer (1953)
• Experimental investigation of LSB on flat plate
• Influence of Re number and pressure gradient on bubble size
• Short and long bubbles, bubble bursting
Introduction
Stall prediction by RANS modelling
Dynamic stall
static stall
Inaccurate (delayed) prediction of the stall (static or dynamic),
regardless of the choice of RANS and transition model
Introduction : recent studies of LSB physics and
modelling
Stable LSB on a flat plate and airfoil configuration
• Transition process: TS, KH instabilities and very fast transition to 3D flow (Watmuff JFM 1999, Alam & Sandham JFM 2000)
• Absolute/convective instability (Yang & Voke JFM 2000, Marxen et al. JFM 2004, Jones et al. JFM 2008)
• Acoustic feedback loop between LSB and TE (Jones et al. JFM 2010)
• Upstream disturbance amplitude affects the size of the LSB (Alam & Sandham JFM 2000)
• LSB modelling for RANS (Spalart & Strelets JFM 2000, Laurent et al., Comput. Fluids 2011, Richez et al., TCFD 2008)
LSB bursting on a flat plate and airfoil configuration
• Flat plate: switch between short and long LSB through dynamic variation of the perturbation amplitude (Marxen & Henningson JFM 2011)
• LSB bursting on airfoil (present study)
High fidelity LES of LSB bursting on airfoil leading to stall
Objective:
study the leading-edge stall mechanism as a dynamical process, using a small
incidence variation, to improve understanding of this complex transient flow
[Sketch: Cy versus α. At αs: maximum Cy, stable attached TBL on the suction side. At αs + ε: stall state, detached TBL on the suction side]
Tools
• High fidelity LES of a moving airfoil from αs to αs + ε
• Ensemble averaging of the transient process, obtained by repeating the numerical experiment
with different initial conditions at αs
Flow configuration
• NACA 0012 at Re = 10⁵: LSB and TBL on the suction side for an affordable CPU cost
• Critical incidence and variation determined by LES: αs = 10.55° and ε = 3° or 0.25°
LES of LSB bursting on airfoil: motion details
Smooth ramp-up motion from 10.55° to 10.8° or 13.55°
- Beginning of the motion at T0
- Motion duration: base case ω0 [2U / 2.2c]; fast case 10ω0; slow case ω0/2
Slow motion to reduce the flow perturbation (for ω0):
- Leading edge velocity 3 times smaller than in
usual dynamic stall studies
- No angular velocity and angular acceleration at
the beginning and end of the motion
Limited effect of the motion on the pressure
distribution (for ω0):
- Little effect on the pressure gradient along the suction
side
- No change in bubble size
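A ramp with no angular velocity and no angular acceleration at its beginning and end can be built in several ways. The exact law used in the study is not given in the text, so the sketch below uses a quintic polynomial blend as an assumed example; the function name and the unit duration are hypothetical.

```python
# Smooth ramp between two incidences with zero angular velocity and
# zero angular acceleration at both ends.
# Assumption: quintic blend 10 s^3 - 15 s^4 + 6 s^5; the law actually
# used in the study is not specified in the slides.

def smooth_ramp(t, t0, duration, alpha_start, alpha_end):
    """Incidence (degrees) at time t for a ramp starting at t0."""
    s = min(max((t - t0) / duration, 0.0), 1.0)   # normalized time in [0, 1]
    blend = s ** 3 * (10.0 - 15.0 * s + 6.0 * s * s)
    return alpha_start + (alpha_end - alpha_start) * blend

# Small ramp from 10.55 to 10.8 degrees over a unit duration.
alpha_begin = smooth_ramp(0.0, 0.0, 1.0, 10.55, 10.8)
alpha_mid = smooth_ramp(0.5, 0.0, 1.0, 10.55, 10.8)
alpha_final = smooth_ramp(1.0, 0.0, 1.0, 10.55, 10.8)
```

The first and second derivatives of the blend vanish at s = 0 and s = 1, so the airfoil starts and stops without any jump in angular velocity or acceleration.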
LES of LSB bursting on airfoil: Computational details
• Number of grid points: 160 million
• Δt = 0.15 μs = 2.75 × 10⁻⁴ c/U
• Number of Westmere cores: 480 (GENCI)
• Accuracy checked by convergence study
• M = 0.16
• 1 c/U (3600 time steps) computed in 1 hour
• Spanwise extent of 1 chord (flow periodicity)
• Deterministic disturbance input: Upert = 5 × 10⁻⁵ U sin(ωt)
Nearly DNS resolution
- Resolution at the suction wall before the motion (most constraining configuration)
Direction     Points   Δl⁺ at reattachment
Streamwise    634      6
Wall normal   351      0.8
Spanwise      900      7
- Δl < 10η (Kolmogorov scale) for the stalled flow configuration in all directions above the
suction side
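The computational figures above are mutually consistent, as a short check shows. The cost per grid point and time step is a derived quantity, not one stated on the slide, and it assumes that the 480 cores are fully used during the one-hour run.

```python
# Consistency check of the time step and run cost.
# Assumption: 480 cores fully used for 1 hour per convective time c/U.
dt_in_chord_times = 2.75e-4               # time step in units of c/U
steps_per_chord_time = 1.0 / dt_in_chord_times   # close to the quoted 3600

grid_points = 160_000_000
cores = 480
wall_seconds = 3600                       # 1 hour
time_steps = 3600                         # per c/U, as quoted

# Derived: core-seconds per grid point and time step.
cost_per_point_step = cores * wall_seconds / (grid_points * time_steps)

# Physical duration of one convective time, from dt = 0.15 microseconds.
chord_time_seconds = 0.15e-6 / dt_in_chord_times

print(f"steps per c/U: {steps_per_chord_time:.0f}")
print(f"cost: {cost_per_point_step * 1e6:.1f} microseconds per point and step")
```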
Steady states around αc: High Reynolds number
physics
Q criterion, attached flow, α = 10.55°, Q = 500
Transition process: (Jones et al. JFM 2008)
Large spanwise 2D vortices
3D structures
- short LSB: 0.15c
- turbulent boundary layer
over 80% of the chord
Time evolution of the bubble bursting
https://www.youtube.com/watch?v=2ZMKWB3tQV8
Base case motion at ω0
Effects of stall on lift and drag
• The initial turbulent state before the motion does not affect the fast transient process for these motion
parameters
• Two different regimes after the end of the motion
[Plots: effect on lift and effect on drag versus T*-T0*, with the beginning (at T*=T0*) and end of the motion marked]
Unsteady analysis: spanwise and short-time average
Time evolution of spanwise and short-time (6% c/U) averaged data
Stall development: time evolution of the displacement
thickness
Bubble bursting: ensemble average, case ω0
[Figure: maps of δ1/c; isoline = vanishing skin friction coefficient, marking the LSB]
• Constant value (T*-T0* = 0 to 5)
• The shear layer moves away from the wall (T*-T0* = 5 to 10)
• The shear layer moves closer to the wall, to a constant distance (T*-T0* = 10 to 16)
δ1/c ≈ distance between the shear layer and the wall in the LSB
Stall development: upstream motion of the point of
transition
Bubble bursting: ensemble average, case ω0
[Figure: maximum of turbulent kinetic energy in the boundary layer; estimated location of transition]
Effect of the motion: cases ω = 10ω0 and ω = ω0/2
• No significant change in the position of the point of transition between the different
motion laws
• Same LSB growth rate
[Figure panels: same initial flow condition (T0* = 14.7), ω = 10ω0 and ω = ω0/2]
Bursting criterion for RANS
• Diwan et al. (JFM 2006) criterion: bursting when P < -28,
with ΔU = variation of the external velocity along ΔX
Diwan criterion for the three different motions
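A bursting test of this kind is simple to evaluate once the parameter P is known. The sketch assumes the commonly quoted form P = (h²/ν)(ΔU/ΔX), with h the maximum bubble height and ν the kinematic viscosity; h and ν do not appear in the text above, and the numerical values below are hypothetical, chosen only to exercise both branches.

```python
# Bursting test based on the threshold P < -28.
# Assumption: P = (h^2 / nu) * (dU / dX), with h the maximum bubble
# height and nu the kinematic viscosity (neither is given in the slide).

def diwan_parameter(h, nu, delta_u, delta_x):
    """Non-dimensional pressure-gradient parameter P."""
    return (h * h / nu) * (delta_u / delta_x)

def is_bursting(p, threshold=-28.0):
    """True when the bubble is predicted to burst."""
    return p < threshold

# Hypothetical values: h in m, nu in m^2/s, velocities in m/s, length in m.
p_strong = diwan_parameter(h=1.0e-3, nu=1.5e-5, delta_u=-5.0, delta_x=0.01)
p_mild = diwan_parameter(h=1.0e-3, nu=1.5e-5, delta_u=-3.0, delta_x=0.01)
```

With these assumed inputs the stronger deceleration gives P of about -33 (bursting) and the milder one P of about -20 (no bursting).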
Conclusions: HPC optimisation
• A deep modification of the source code is required
- Memory access
- Algorithm
- Towards generic coding that eases adaptation to future hardware
• Efficiency weakly affected by MPI transfers in CFD (Nproc <= 4000)
• Hybrid MPI/OpenMP
- no large gain compared with full MPI (Nproc <= 4000)
- debugging is more difficult: “race conditions” are still hard to track
• Implicit algorithms are difficult to optimize (cache blocking impossible)
- Work in progress to allow time-consistent simulation with local time steps
• 1 Pflops on 30,000 Skylake cores (estimate)
Conclusions: LES/DNS simulations
• One measurement in a wind tunnel = 1 month of petaflop computation:
- Re = 1,000,000
- Model size ≈ 1 meter
- U < 100 m/s
- CPU cost versus wind-tunnel cost?
• HPC is very useful in turbulence:
- Database for turbulence modelling (DNS)
- Understanding complex flow phenomena:
* DNS or LES when the Re is affordable
* (U)RANS or hybrid RANS/LES for industrial applications