HPC 2 Activities
NYS High Performance Computation
Consortium funded by NYSTAR at $1M/year
for 3 years
Goal is to provide NY State users support in
the application of HPC technologies in:
Research and discovery
Product development
Improved engineering and manufacturing
processes
The HPC2 is a distributed activity; participants are Rensselaer, Stony Brook/Brookhaven, SUNY Buffalo, and NYSERNET.
Xerox
Corning
ITT Fluid Technologies: Goulds Pumps
Global Foundries
Objectives
Demonstrate end-to-end solution of two-phase flow
problems.
Couple with structural mechanics boundary conditions.
Provide an interfaced, efficient, and reliable software suite for guiding design.
Tools
Simmetrix SimAppS Graphical Interface – mesh
generation and problem definition
PHASTA – two-phase level set flow solver
PhParAdapt – solution transfer and mesh adaptation
driver
Kitware Paraview – visualization
Systems
CCNI BG/L, CCNI Opterons Cluster
Fluid ejected into air.
Ran on 4000 CCNI BG/L cores.
Six iterations of mesh adaptation on two-phase simulation.
Autonomously ran on 128 cores of CCNI Opterons for approximately 4 hours
Initial work interfaces the simulations through serial file formats for displacement and pressure data.
The structural mechanics simulation runs in serial; the PHASTA simulation runs in parallel.
Distribute the serial displacement data to the partitioned PHASTA mesh, and aggregate the partitioned PHASTA nodal pressure data into a serial input file (a data-exchange sketch follows the figure captions below).
Modifications to the automated mesh adaptation Perl script.
Structural Mechanics Mesh of Input Face
PHASTA Partitioned Mesh of Input Face
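A minimal sketch of this serial-to-parallel data exchange, assuming each rank knows the global IDs of the interface nodes it owns. The function and file names here (exchangeInterfaceData, interface_displacement.dat, interface_pressure.dat) are hypothetical and are not the actual phParAdapt/PHASTA interface:

```cpp
// Sketch: scatter serial displacement data to mesh partitions and
// gather partitioned nodal pressures back on rank 0.
// Hypothetical names; assumes each rank knows the global IDs of the
// interface nodes it owns (ownedGlobalIds).
#include <mpi.h>
#include <vector>
#include <fstream>

void exchangeInterfaceData(const std::vector<int>& ownedGlobalIds,
                           std::vector<double>& localDisp,        // out: 3 values per owned node
                           const std::vector<double>& localPress, // in: 1 value per owned node
                           int nGlobalNodes) {
  int rank; MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  // 1. Rank 0 reads the serial displacement file written by the
  //    structural mechanics code (3 components per interface node).
  std::vector<double> globalDisp(3 * nGlobalNodes, 0.0);
  if (rank == 0) {
    std::ifstream in("interface_displacement.dat"); // assumed file name
    for (double& v : globalDisp) in >> v;
  }

  // 2. Broadcast the interface data; each rank keeps only the entries
  //    for the nodes on its partition.
  MPI_Bcast(globalDisp.data(), static_cast<int>(globalDisp.size()),
            MPI_DOUBLE, 0, MPI_COMM_WORLD);
  localDisp.clear();
  for (int gid : ownedGlobalIds)
    for (int c = 0; c < 3; ++c)
      localDisp.push_back(globalDisp[3 * gid + c]);

  // 3. Aggregate partitioned nodal pressures into one serial file:
  //    each owner writes its values into a global-length buffer, then a
  //    sum-reduction assembles the full field on rank 0.
  std::vector<double> globalPress(nGlobalNodes, 0.0);
  for (std::size_t i = 0; i < ownedGlobalIds.size(); ++i)
    globalPress[ownedGlobalIds[i]] = localPress[i];
  std::vector<double> assembled(nGlobalNodes, 0.0);
  MPI_Reduce(globalPress.data(), assembled.data(), nGlobalNodes,
             MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
  if (rank == 0) {
    std::ofstream out("interface_pressure.dat"); // assumed file name
    for (double p : assembled) out << p << "\n";
  }
}
```

Broadcasting the full interface array is reasonable here because the coupled surface data is small relative to the volume mesh; a production implementation could instead scatter only each rank's entries.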
Objectives
Demonstrate capability of available computational
tools/resources for parallel simulation of highly
viscous sheet flows.
Solve a model sheet flow problem relevant to the
actual process/geometry.
Develop and define processes for high fidelity twin
screw extruder parallel CFD simulation.
Investigated Tools (to date)
ACUSIM AcuConsole and AcuSolve, Simmetrix
MeshSim, Kitware Paraview
Systems
CCNI Opterons Cluster
High Aspect Ratio Sheet
Aspect ratio: 500:1
Element count: 1.85 million
Run time: 7 minutes on 512 cores versus 300 minutes on 8 cores (about a 43x speedup on 64x the cores, i.e., roughly 67% parallel efficiency).
Mesh generation in the Simmetrix SimAppS graphical interface.
Gaps are ~1/180 of the large feature dimension.
Conceptual Rendering of Single Screw Extruder Assembly*
Single Screw Extruder CAD**
* http://en.wikipedia.org/wiki/Plastics_extrusion
** https://sites.google.com/site/oscarsalazarcespedescaddesign/project03
Objectives
Apply HPC systems and software to set up and run 3D pump flow simulations in hours instead of days.
Provide automated mesh generation for fluid
geometries with rotating components.
Tools
ACUSIM Suite, PHASTA, ANSYS CFX,
FMDB, Simmetrix MeshSim, Kitware
Paraview
Systems
CCNI Opterons Cluster
AcuConsole Interface
Problem definition, mesh
generation, runtime monitor,
and data visualization
Simmetrix provided a customized mesh generation and problem definition GUI after iterating with the industrial partner.
Supports automated identification of pump geometric model features and application of attributes.
Problem definition with support for exporting data for multiple CFD analysis tools.
Reduced mesh generation time frees engineers to focus on simulation and design optimization, leading to improved products.
Goal: Develop simulation technologies that
allow practitioners to evaluate systems of
interest.
To meet this goal we
Develop adaptive methods for reliable simulations
Develop methods to do all computation on
massively parallel computers
Develop multiscale computational methods
Develop interoperable technologies that speed
simulation system development
Partner on the construction of simulation systems
for specific applications in multiple areas
Software available (http://www.scorec.rpi.edu/software.php)
Some tools not yet linked – email [email protected]
with any questions
Simulation Model and Data Management
Geometric model interface to interrogate CAD models
Parallel mesh topological representation
Representation of tensor fields
Relationship manager
Parallel Control
Neighborhood aware message packing
Iterative mesh partition improvement with multiple criteria
Processor mesh entity reordering to improve cache
performance
Adaptive Meshing
Adaptive mesh modification
Mesh curving
Adaptive Control
Support for executing parallel adaptive unstructured mesh
flow simulations with PHASTA
Adaptive multimodel simulation infrastructure
Analysis
Parallel Hierarchic Adaptive Stabilized Transient Analysis
software for compressible or incompressible, laminar or
turbulent, steady or unsteady flows on 3D unstructured
meshes (with U. Colorado)
Parallel hierarchic multiscale modeling of soft tissues
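As a rough illustration of how the model, mesh, and field components listed above fit together, the toy sketch below applies an attribute only to mesh faces classified on a particular geometric model face. All types and names are hypothetical stand-ins, not the actual FMDB/MeshSim API:

```cpp
// Conceptual sketch only: hypothetical types standing in for a mesh database
// (mesh entities, geometric classification, attached field data).
#include <vector>
#include <map>
#include <array>
#include <cstdio>

struct GeomFace { int tag; };                                   // geometric model face
struct MeshFace { int id; const GeomFace* classification; };    // mesh face classified on it

// "Relationship manager": associates tensor data with mesh entities.
using FieldData = std::map<int, std::array<double, 3>>;         // entity id -> vector value

int main() {
  GeomFace inlet{42}, wall{7};
  std::vector<MeshFace> faces = {{0, &inlet}, {1, &wall}, {2, &inlet}};

  // Apply a boundary-condition field only to faces classified on the inlet
  // model face, mimicking CAD-driven attribute application.
  FieldData velocityBC;
  for (const MeshFace& f : faces)
    if (f.classification->tag == inlet.tag)
      velocityBC[f.id] = {1.0, 0.0, 0.0};

  std::printf("faces with inlet BC: %zu\n", velocityBC.size());
  return 0;
}
```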
Interoperable Technologies for Advanced Petascale Simulations (ITAPS)
Petascale integrated tools: AMR front tracking, shape optimization, solution adaptive loop, solution transfer, petascale mesh generation
Build on component tools: front tracking, smoothing, mesh adapt, swapping, interpolation kernels, dynamic services
Are unified by common interfaces: Mesh, Geometry, Relations, Field, Geom/Mesh Services
Excellent strong scaling.
Implicit time integration.
Employs the partitioned mesh for system formulation and solution.
A specific number of ALL-REDUCE communications is also required (illustrated below).
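For context on the ALL-REDUCE point: every global inner product in an implicit (Krylov-type) iteration requires a reduction across all ranks, so their count per iteration is fixed regardless of core count. A generic sketch, not PHASTA's actual solver code:

```cpp
// Sketch: the global dot products inside an implicit iterative solve each
// require an MPI_Allreduce over all ranks.
#include <mpi.h>
#include <vector>
#include <numeric>

double parallelDot(const std::vector<double>& x, const std::vector<double>& y) {
  // Local contribution from the entities owned by this partition.
  double local = std::inner_product(x.begin(), x.end(), y.begin(), 0.0);
  double global = 0.0;
  MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
  return global;
}
```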
105M vertex mesh (CCNI Blue Gene/L)
#Proc.   El./core   t (sec)   scale
512      204,800    2120      1
1,024    102,400    1052      1.01
2,048    51,200     529       1.00
4,096    25,600     267       0.99
8,192    12,800     131       1.02
16,384   6,400      64.5      1.03
32,768   3,200      35.6      0.93

1 billion element anisotropic mesh on Intrepid Blue Gene/P
# of cores   Rgn imb   Vtx imb   Time (s)   Scaling
16k          2.03%     7.13%     222.03     1
32k          1.72%     8.11%     112.43     0.987
64k          1.6%      11.18%    57.09      0.972
128k         5.49%     17.85%    31.35      0.885
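For reference, the tabulated scale/Scaling values are consistent with the usual strong-scaling factor taken relative to the smallest run in each table,

$$ s(P) = \frac{t_{\mathrm{ref}}\, P_{\mathrm{ref}}}{t(P)\, P}, $$

e.g., for the 105M vertex case, s(2048) = (2120 × 512) / (529 × 2048) ≈ 1.00.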
AAA 5B elements: full-system scale on Jugene (IBM BG/P system).
Without ParMA partition improvement the strong scaling factor is 0.88 (70.5 seconds).
This can yield 43 CPU-years of savings for production runs!
Requires functional support for:
Mesh distribution
Mesh-level inter-processor communications
Parallel mesh modification
Dynamic load balancing
Parallel implementations exist for each – the focus is on increasing scalability.
Mesh size field of air bubbles distributing in a tube (segment of the model shown – 64 bubbles total).
Initial mesh: uniform, 17 million mesh regions.
Adapted mesh: 160 air bubbles, 2.2 billion mesh regions.
Multiple predictive load balance steps were used to make the adaptation possible (sketched after the figure caption below).
Larger meshes are possible (the runs did not run out of memory).
Initial and adapted mesh (zoom of a bubble), colored by magnitude of the mesh size field.
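A toy sketch of the predictive load-balancing idea: estimate from the mesh size field how many regions each element will produce, and hand those predicted weights to the partitioner before adaptation so no part runs out of memory. The names and the (h_current/h_requested)^3 estimate are illustrative assumptions, not the SCOREC implementation:

```cpp
// Sketch: predictive load balancing before mesh adaptation.
// Simplified: real code works on the parallel mesh database and hands the
// weights to a graph/geometric partitioner rather than this toy report.
#include <vector>
#include <cmath>
#include <numeric>
#include <algorithm>
#include <cstdio>

struct Element { double currentSize; double requestedSize; };

// Predicted refinement weight: roughly (h_current/h_requested)^3 new regions
// per existing region for isotropic refinement in 3D.
double predictedWeight(const Element& e) {
  double r = e.currentSize / e.requestedSize;
  return std::max(1.0, r * r * r);
}

int main() {
  std::vector<Element> elems = {{1.0, 0.25}, {1.0, 1.0}, {1.0, 0.5}, {1.0, 0.125}};
  std::vector<double> w(elems.size());
  for (std::size_t i = 0; i < elems.size(); ++i) w[i] = predictedWeight(elems[i]);

  // Report the predicted total load that a partitioner would distribute
  // evenly across parts before the adaptation runs.
  double total = std::accumulate(w.begin(), w.end(), 0.0);
  int nParts = 2;
  std::printf("predicted regions: %.0f, target per part: %.0f\n",
              total, total / nParts);
  return 0;
}
```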
Strong scaling test of uniform refinement on Ranger, 4.3M to 2.2B elements:
# of Parts   Time (s)   Scaling
2048         21.5       1.0
4096         11.2       0.96
8192         5.67       0.95
16384        2.73       0.99

Nonuniform field-driven refinement (with mesh optimization) on Ranger, 4.2M to 730M elements (time for dynamic load balancing not included):
# of Parts   Time (s)   Scaling
2048         110.6      1.0
4096         57.4       0.96
8192         35.4       0.79

Nonuniform field-driven refinement (with mesh optimization operations) on Blue Gene/P, 4.2M to 730M elements (time for dynamic load balancing not included):
# of Parts   Time (s)   Scaling
4096         173        1.0
8192         105        0.82
16384        66.1       0.65
32768        36.1       0.60
Adaptive Loop Construction
Tightly coupled
Adv: Computationally efficient
Disadv: More complex code development
Example: Explicit solution of cannon blasts
Loosely coupled
Adv: Ability to use existing analysis codes
Disadv: Overhead of multiple structures and data conversion
Example: Implicit high-order active flow control modeling
(Figure snapshots at t = 0.0, t = 2e-4, and t = 5e-4.)
Adaptive Loop Driver – C++: Coordinates API calls to execute the solve-adapt loop.
phSolver – Fortran 90: Flow solver scalable to 288k cores of BG/P; Field API.
phParAdapt – C++: Invokes parallel mesh adaptation.
▪ SCOREC FMDB and MeshAdapt, Simmetrix MeshSim and MeshSimAdapt
(Block diagram: the Adaptive Loop Driver exercises control over phSolver and phParAdapt; field data, compact mesh and solution data, mesh data, and the base solution fields are exchanged through the Field API.)
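A structural sketch of the solve-adapt cycle the driver coordinates; the interfaces, step counts, and cycle count below are placeholders, not the actual Adaptive Loop Driver code:

```cpp
// Sketch: loosely coupled solve-adapt loop. Hypothetical interfaces only.
#include <cstdio>

struct Fields { /* solution fields carried between solve and adapt */ };

Fields runFlowSolver(const Fields& in, int nSteps) {
  std::printf("  solving %d time steps\n", nSteps);
  return in;  // placeholder: phSolver would advance the flow solution here
}

Fields adaptMeshAndTransfer(const Fields& in) {
  std::printf("  adapting mesh and transferring solution\n");
  return in;  // placeholder: phParAdapt would adapt the mesh and transfer fields
}

int main() {
  Fields fields;                 // base solution fields
  const int adaptCycles = 6;     // example cycle count (assumed)
  const int stepsPerCycle = 100; // assumed cadence between adaptations
  for (int cycle = 0; cycle < adaptCycles; ++cycle) {
    std::printf("cycle %d\n", cycle);
    fields = runFlowSolver(fields, stepsPerCycle);
    fields = adaptMeshAndTransfer(fields);
  }
  return 0;
}
```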
Mesh curving applied to 8-cavity cryomodule simulations
2.97 million curved regions
1,583 invalid elements corrected – leads to a stable simulation and 30% faster execution
(Figure: mesh close-up before and after correcting invalid mesh regions, marked in yellow.)
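A simplified sketch of the kind of validity check behind "invalid elements corrected": flag a curved element whose mapping Jacobian determinant is non-positive at sampled parametric points. Real mesh curving tools use rigorous bounds on the curved shape-function Jacobian rather than this finite-difference sampling:

```cpp
// Sketch: flag a curved element as invalid if the determinant of its
// mapping Jacobian is non-positive at any sampled parametric point.
#include <array>
#include <vector>
#include <functional>
#include <cstdio>

using Point = std::array<double, 3>;
using Mapping = std::function<Point(double, double, double)>; // (xi,eta,zeta) -> x

double detJacobian(const Mapping& x, double xi, double eta, double zeta) {
  const double h = 1e-6;
  // Central-difference columns of the Jacobian dx/d(xi,eta,zeta).
  auto col = [&](int dir) {
    double d[3] = {0, 0, 0}; d[dir] = h;
    Point p = x(xi + d[0], eta + d[1], zeta + d[2]);
    Point m = x(xi - d[0], eta - d[1], zeta - d[2]);
    return Point{(p[0] - m[0]) / (2 * h), (p[1] - m[1]) / (2 * h), (p[2] - m[2]) / (2 * h)};
  };
  Point a = col(0), b = col(1), c = col(2);
  return a[0] * (b[1] * c[2] - b[2] * c[1])
       - a[1] * (b[0] * c[2] - b[2] * c[0])
       + a[2] * (b[0] * c[1] - b[1] * c[0]);
}

bool isValidCurvedElement(const Mapping& x, const std::vector<Point>& samples) {
  for (const Point& s : samples)
    if (detJacobian(x, s[0], s[1], s[2]) <= 0.0) return false;
  return true;
}

int main() {
  // Identity mapping: trivially valid (det J = 1 everywhere).
  Mapping identity = [](double xi, double eta, double zeta) { return Point{xi, eta, zeta}; };
  std::vector<Point> samples = {{0.25, 0.25, 0.25}, {0.1, 0.1, 0.7}};
  std::printf("valid: %d\n", isValidCurvedElement(identity, samples) ? 1 : 0);
  return 0;
}
```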
FETD for short-range wakefield calculations
▪ Adaptively refined meshes have 1–1.5 million curved regions
▪ A uniformly refined mesh using the small mesh size has 6 million curved regions
(Figure: electric fields on the three refined curved meshes.)
Boundary layer based mesh adaptation
The initial mesh has 7.1 million regions and is isotropic outside the boundary layer.
The adapted mesh is anisotropic with 42.8 million regions.
Adaptation sequence: 7.1M -> 10.8M -> 21.2M -> 33.0M -> 42.8M
Multiscale simulation linking a microscale network model to a macroscale finite element continuum model.
Collaborating with experimentalists at the University of Minnesota.
(Figures: macroscale model; microscale model.)
Nano-void subjected to hydrostatic tension: finite element discretization of the problem domain and dislocation structures.
Nano-indentation of a thin film: concurrent model configuration at the 60th load step (3 Å indentation displacement). Colors represent the sub-domains in which …
(Overview diagram spanning design, manufacture, and use/performance across size scales, from atoms/carriers to devices and circuits.)
Topic areas: 1st-principles CMOS modeling; Simulation Automation Components; super-resolution lithography tools; mechanics of damage nucleation in devices; device simulation; reactive ion etching; variation-aware circuit design; Parallel Computing Methods; modeling/simulation development; technology development.
As Si CMOS devices shrink, nanoelectronic effects emerge.
Fermi-function based analysis gives way to quantum energy-level analysis.
Poisson and Schrödinger equations are reconciled iteratively, allowing for current predictions (see the sketch below).
Carrier dynamics respond to strain in increasingly complex ways, from mobility changes to tunneling effects.
New functionalities might be exploited:
▪ Single-electron transistors
▪ Graphene semiconductors
▪ Carbon nanotube conductors
▪ Spintronics – encoding information into the charge carrier's spin
(Diagram: atomic-level physics provides input to the circuit level; self-consistent Poisson–Schrödinger iteration over the energy levels and the Fermi level.)
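A structural sketch of that iterative reconciliation, with placeholder physics: a damped fixed-point loop alternating a Schrödinger-like step (density from potential) and a Poisson-like step (potential from density) until the potential stops changing. Only the loop structure reflects the method described above:

```cpp
// Sketch: self-consistent Poisson-Schrodinger loop with simple mixing.
// Placeholder physics; only the fixed-point iteration structure is the point.
#include <vector>
#include <cmath>
#include <algorithm>
#include <cstdio>

using Field = std::vector<double>;

// Placeholder "Schrodinger" step: carrier density from potential.
Field densityFromPotential(const Field& phi) {
  Field n(phi.size());
  for (std::size_t i = 0; i < phi.size(); ++i) n[i] = std::exp(-phi[i]); // toy model
  return n;
}

// Placeholder "Poisson" step: potential from carrier density.
Field potentialFromDensity(const Field& n) {
  Field phi(n.size());
  for (std::size_t i = 0; i < n.size(); ++i) phi[i] = std::log(1.0 + n[i]); // toy model
  return phi;
}

int main() {
  Field phi(16, 0.0);                  // initial potential guess
  const double mix = 0.5, tol = 1e-10; // damping factor and convergence tolerance
  for (int it = 0; it < 200; ++it) {
    Field n = densityFromPotential(phi);     // Schrodinger-like step
    Field phiNew = potentialFromDensity(n);  // Poisson-like step
    double change = 0.0;
    for (std::size_t i = 0; i < phi.size(); ++i) { // damped update for stability
      double updated = (1.0 - mix) * phi[i] + mix * phiNew[i];
      change = std::max(change, std::fabs(updated - phi[i]));
      phi[i] = updated;
    }
    if (change < tol) { std::printf("converged after %d iterations\n", it + 1); break; }
  }
  return 0;
}
```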
Motivation:
Reducing feature size has made the modeling of the underlying physics critical.
In projective lithography, simple biases are not adequate.
In holographic lithography, near-field phenomena are predominant.
The modeling approach must be based on Maxwell's equations.
(Figures: projective lithography; holographic lithography.)
Goal:
Develop unified computational algorithms for the design and analysis of super-resolution lithographic processes that model the underlying physics with high fidelity.
To handle SRAM-scale systems, we expect much larger computational systems, e.g., 10^5 – 10^6 surface elements.
Transport tracking scales as O(n^2) with the number of surface elements n.
▪ Parallelizes well – every view factor can be computed completely independently of every other view factor, giving almost linear speedup (see the sketch below).
The computational complexity of the chemistry solver depends on the particular chemical mechanisms associated with the etch recipe; these tend to be O(n^2).
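A sketch of why the transport tracking parallelizes so well: each surface-element pair contributes an independent term, so the O(n^2) double loop distributes almost perfectly across threads. The 1/r^2 kernel and geometry below are placeholders; a real code evaluates geometric view factors with visibility tests:

```cpp
// Sketch: O(n^2) pairwise transport/view-factor style computation where each
// pair is independent, so the loop parallelizes almost perfectly.
#include <vector>
#include <cmath>
#include <cstdio>

struct Surf { double x, y, z, area; };

double pairTerm(const Surf& a, const Surf& b) {
  const double kPi = 3.14159265358979;
  double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  double r2 = dx * dx + dy * dy + dz * dz + 1e-12;
  return a.area * b.area / (kPi * r2);   // placeholder 1/r^2 kernel, no occlusion test
}

int main() {
  const int n = 2000;                    // number of surface elements
  std::vector<Surf> s(n);
  for (int i = 0; i < n; ++i)
    s[i] = {std::cos(0.01 * i), std::sin(0.01 * i), 0.001 * i, 1.0};

  std::vector<double> rowSum(n, 0.0);
  // Each (i, j) term is independent of every other term.
  #pragma omp parallel for schedule(dynamic)
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j)
      if (i != j) rowSum[i] += pairTerm(s[i], s[j]);

  std::printf("rowSum[0] = %g\n", rowSum[0]);
  return 0;
}
```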
Cut-away view of a reactive ion etch simulation of an aspect ratio 1.4 via into a dielectric substrate with 7% porosity and complete selectivity with respect to the underlying etch stop. A generic ion-radical etch model was used. ~10^3 surface elements. [Bloomfield et al., SISPAD 2003, IEEE.]
At 90 nm and below, devices have come to rely on increased carrier mobility
produced by strained silicon.
As devices scale down, the relative importance of scattering centers
increases.
Can we have our cake and eat it too? How much strain can be built into a
given device before processing variations and thermo-mechanical load
during use cause critical dislocation shedding?
Continuum FEM calculations automatically identify critical high-stress regions.
A local atomistic problem is constructed and an MD simulation is run, looking for criticality.
Results feed back to the continuum model (see the sketch below).
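A structural sketch of that feedback loop; the stress model, thresholds, and atomistic check below are placeholders standing in for the FEM and MD codes:

```cpp
// Sketch: continuum -> local atomistic check -> feedback loop.
// Simplified structure with placeholder criteria throughout.
#include <vector>
#include <cstdio>

struct Region { int id; double vonMisesStress; bool dislocationsShed; };

// Stands in for a continuum FEM stress update under increasing load.
void runContinuumStep(std::vector<Region>& regions, double load) {
  for (Region& r : regions) r.vonMisesStress = load * (1.0 + 0.1 * r.id);
}

// Stands in for constructing and running a local MD problem.
bool runLocalAtomisticCheck(const Region& r) {
  const double criticalStress = 2.5;    // assumed criticality threshold
  return r.vonMisesStress > criticalStress;
}

int main() {
  std::vector<Region> regions = {{0, 0, false}, {1, 0, false}, {2, 0, false}};
  const double stressThreshold = 2.0;   // triggers the local atomistic check
  for (int step = 1; step <= 3; ++step) {
    runContinuumStep(regions, 0.8 * step);
    for (Region& r : regions) {
      if (r.vonMisesStress > stressThreshold && !r.dislocationsShed) {
        // Local atomistic problem constructed around the critical region.
        r.dislocationsShed = runLocalAtomisticCheck(r);
        if (r.dislocationsShed)
          std::printf("step %d: region %d shed dislocations; update continuum model\n",
                      step, r.id);
      }
    }
  }
  return 0;
}
```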
Advanced meshing tools and expertise exist at RPI and an associated spin-off.
Leverage these tools to support CCNI projects such as the advanced device modeling.
Local refinement and adaptivity can help carry the computational resources further: "more bang for the buck."