K-computer and supercomputing projects in Japan


K-computer and Supercomputing Projects in Japan
Makoto Taiji
Computational Biology Research Core
RIKEN Planning Office for the Center for
Computational and Quantitative Life Science
&
Processor Research Team
RIKEN Advanced Institute for Computational Science
[email protected]
Agenda
• K-computer
• Advanced Institute for Computational Science
• High Performance Computing Infrastructure
• My own perspective on future HPC, and MDGRAPE-4 (in brief)
My Background
• Physics
• Special-purpose computers for scientific simulations (1986~)
  – Monte Carlo simulations of spin systems (1986, m-TIS I)
  – FPGA-based reconfigurable machine (1990, m-TIS II)
  – Gravitational N-body problems (1992–96, GRAPE-4, 5)
  – Molecular dynamics simulations (1994~, MD-GRAPE, MDM, MDGRAPE-3, 4)
  – Dense matrix calculation, quasi-general-purpose machine (MACE, 2000)
• Ultrafast laser spectroscopy (1987~92)
– Conjugated Polymers
– Rhodopsin and Bacteriorhodopsin
• Learning processes as dynamical systems, multi-agent dynamics (1996–2002)
• Physical random number generator (1997–2004)
World Situation of HPC (Top 500)
[Chart: Top 500 share by country — Japan's share is down to 6th position]
Next-Generation Supercomputer Project
• National project to develop a leading general-purpose supercomputer in Japan
• Not for a single purpose – cf. Earth Simulator
• Location: Kobe Port Island
• Developer: Fujitsu
• Linpack 10 PetaFLOPS
• Partial operation: Spring 2011
• Full service: Autumn 2012
[Image: K computer system (CG)]
Location of K computer
[Map: the K computer and Advanced Institute for Computational Sciences are on Port Island, Kobe — about 5 km from Sannomiya (12 min. by Portliner), near Kobe Airport, the Kobe Sky Bridge, and the core facilities of the Kobe Medical Industry Development Project; landmarks include Mt. Rokko, Shin-Kobe Station on the Shinkansen line, Ashiya, and routes to Osaka and to Akashi / Awaji Island. Photo: June 2006]
RIKEN Advanced Institute for Computational Science
National center covering wide fields of computational science and engineering
Director: Dr. Kimihiko Hirao
[Diagram: formation of a central hub in Kobe — AICS handles operation and sophistication of the supercomputer, computational science, computer science, and interdisciplinary research. It serves strategic use (five Strategic Regions), public use (academia and industry, with a registered organization performing selection of applications and user support), and operation-organization use]
RIKEN Advanced Institute for Computational Science
• Director / Deputy Director
• Operation Technology Division
• Research Promotion Division
• Research Division
  – Computational science research
    • Field Theory Research Team (TL: Yoshinobu Kuramashi)
    • Computational Molecular Science Research Team (TL: Takahito Nakajima)
    • Computational Biophysics Research Team (TL: Yuji Sugita)
    • Computational Materials Science Research Team (TL: Seiji Yunoki)
  – Computer science research
    • System Software Research Team (TL: Yutaka Ishikawa)
    • Processor Research Team (TL: Makoto Taiji)
Grand Challenge Applications

• Next-Generation Integrated Life-Science Simulation Software (2006–2012)
  Base site: RIKEN Wako Institute
  Goal: to provide new tools for breakthroughs against various problems in life science by means of petaflops-class simulation technology, leading to a comprehensive understanding of biological phenomena and the development of new drugs, medical devices, and diagnostic/therapeutic technologies.
  [Diagram: multi-scale human body simulations — a microscopic approach at the protein/DNA scale (10^-8 to 10^-6 m: MD, first-principles, and quantum-chemistry simulations; protein folding, RMSD 4.8 Å for all Cα; water molecules inside the lysozyme cavity; protein structural analysis, drug response, and molecular-network analysis), connecting through the cellular (10^-5 to 10^-4 m) and tissue/organ (10^-3 to 10^-2 m) scales (micromachines, catheters, drug delivery systems, high-intensity focused ultrasound, hyperthermia, regenerative medicine) to a macroscopic approach with continuum simulations of the whole body (10^-1 to 10^0 m: fluids, heat, structures; cardiovascular-system and skeleton models; brain function; surgical procedures; drug development and tailor-made medicine)]

• Next-Generation Integrated Nano-Science Simulation Software (2006–2011)
  Base site: Institute for Molecular Science
  Goal: to create next-generation nano-materials (new semiconductor materials, etc.) by integrating theories (such as quantum chemistry, statistical dynamics, and solid electron theory) with simulation techniques in the fields of new-generation information functions/materials, nano-biomaterials, and energy.
  [Diagram: application areas spanning from electrons and molecules to condensed matter and integrated systems — next-generation energy (fuel cells, Nafion membranes, solar energy fixation, fuel alcohol, electric energy storage), next-generation information function/materials (nonlinear optical devices, nano quantum devices, spin electronics, ultra-high-density storage devices, ferromagnetic half-metals, doping of fullerenes and carbon nanotubes, electronic conduction in integrated systems), and next-generation nano-biomolecules (medicines, new drugs and DDS, viruses, anticancer drugs, protein control)]
Appointment of Strategic Regions
Computational resources and budget will be allocated to the following regions; a "strategic organization" will organize the research in each.
• Region 1. Foundations for predictive life sciences, medical care, and drug design
• Region 2. Innovation of new materials and new energies
• Region 3. Prediction of global change for disaster prevention and reduction
• Region 4. Next-generation manufacturing
• Region 5. Origin and structure of matter and the universe
2009–2010: Feasibility studies
2011–2015: Strategic research
Schedule of Project
Partial operation within FY2010; full operation starts from FY2012.
[Gantt chart, FY2006–FY2012:
• Buildings (research building, computer building): design → construction
• Processing unit: conceptual design → detailed design → prototype and evaluation → production and evaluation → tuning and improvement
• System: basic design → detailed design → production, installation, and adjustment → verification
• Front-end unit (total system software): basic design → detailed design → production, installation, and adjustment
• Shared file system: design → development, production, and evaluation
• Applications (Next-Generation Integrated Nanoscience Simulation, Next-Generation Integrated Life Simulation): development, production, and evaluation → verification
• Strategic research: feasibility studies → preparatory research → strategic research
• Research promotion]
Features of K computer
• 京 ("kei", the source of "K") means 10^16
• High performance: Linpack 10 PFLOPS
• Massive parallelization
  – > 80,000 processors, > 640,000 cores
• SPARC64 VIIIfx: processor designed for HPC
  – VISIMPACT / HPC-ACE extensions
• Memory: 16 GB/node, 2 GB/core
• Power: ~20 MW
K-Computer System
• Number of nodes: > 80,000
• Number of processors: > 80,000
• Number of cores: > 640,000
• Peak performance: > 10 PFLOPS
• Memory capacity: > 1 PB (16 GB/node)
• Network: Tofu interconnect (6-dim. torus)
  – User view: 3D torus
  – Bandwidth: 5 GB/s bidirectional in each of the six directions
  – 4 simultaneous communications
  – Bisection bandwidth: > 30 TB/s (bidirectional, nominal peak)
[Diagram: each node holds one 128 GFLOPS CPU (8 cores at 16 GFLOPS each, SIMD with 4 FMA/core), a 5 MB L2 cache, and 16 GB of memory at 64 GB/s, with 5 GB/s bidirectional links in the x, y, and z directions of the 3D-torus network]
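As a sanity check, the headline figures above follow directly from the per-node numbers. A minimal sketch in Python; the counts are the nominal slide values (the real system slightly exceeds them):

```python
# Deriving the K computer's headline figures from per-node numbers.
# Nominal slide values; treat this as an order-of-magnitude check.

nodes = 80_000           # > 80,000 nodes, one SPARC64 VIIIfx CPU each
cores_per_node = 8       # 8 cores per CPU
gflops_per_core = 16     # 16 GFLOPS per core

peak_pflops = nodes * cores_per_node * gflops_per_core / 1e6
print(f"peak: {peak_pflops:.2f} PFLOPS")   # ~10.24 PFLOPS -> "> 10 PFLOPS"

mem_per_node_gb = 16
total_mem_pb = nodes * mem_per_node_gb / 1e6
print(f"memory: {total_mem_pb:.2f} PB")    # ~1.28 PB -> "> 1 PB"
```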
Cabinet of K computer
• 24 boards/cabinet
• 192 CPUs
• 24 TFLOPS
What is special in the K computer?
• Network
  – High bandwidth, low latency
• Processor for HPC
  – VISIMPACT
    • Shared cache & hardware barrier
    • Multi-core parallelization of inner loops
  – HPC-ACE
    • Register extension
    • SIMD: 2 FMA, 2 issues/cycle (4 FMA/core)
    • Instructions for special functions (trigonometric, inverse, square root, inverse square root, etc.)
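The "4 FMA/core" figure fixes the per-core peak. A short sketch of the arithmetic, assuming the 2 GHz clock of the production SPARC64 VIIIfx (the clock rate is not stated on this slide):

```python
# Where "16 GFLOPS per core" comes from on the SPARC64 VIIIfx.
# The 2 GHz clock is an assumption not stated on the slide.
clock_ghz = 2.0
issues_per_cycle = 2     # 2 SIMD FMA issues per cycle
simd_width = 2           # each issue is a 2-wide SIMD FMA -> 4 FMA/core/cycle
flops_per_fma = 2        # a fused multiply-add counts as 2 floating-point ops

per_core = clock_ghz * issues_per_cycle * simd_width * flops_per_fma
print(f"{per_core:.0f} GFLOPS/core")                 # 16 GFLOPS/core
print(f"{per_core * 8:.0f} GFLOPS/chip (8 cores)")   # 128 GFLOPS/chip
```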
T. Maruyama, Proc. Hot Chips 2009.
Software
• OS: Linux
• Compiler
  – The Fujitsu compiler will support:
    • Fortran (2003), C (1999), C++ (2003)
    • GNU C/C++ extensions
    • Automatic vectorization for SPARC64 VIIIfx
    • OpenMP 3.0
    • MPI-2.1
  – gcc may also be available; however, it cannot generate CPU-specific instructions (e.g. SIMD), so poor performance is expected.
How to use it?
• Strategic Use
  Five "Strategic Regions" have been selected. For these fields, MEXT will fund research budgets, and machine time will be allocated.
• General Use
  For general use, a "registered organization" will control the distribution of machine time.
• Commercial Use
  Basically, RIKEN is not responsible for this usage of the machine.
HPCI:
High Performance Computing Infrastructure
• System to utilize academic
supercomputers in Japan
• 2012~
• User Communities
– 5 strategic regions, Industrial Consortiums,
National Universities and Institutes
• Computing Resource Provider
– RIKEN AICS, University Centers, National
Institutes
Basic Idea of HPCI
[Diagram: the logical structure (25 organizations) and the physical structure (13 organizations) of HPCI]
Problem in the Future of HPC Hardware
• If the problem can be parallelized, computing performance is cheap.
• However, at every level, data movement dominates costs:
  – Core – Cache
  – Cache – Main memory
  – Node – Node
  – Node – Disk
  – System – System/Apparatus/Internet
Future Processors for HPC
• The gap between top-end HPC processors and commodity processors will increase
• What is needed for HPC:
  – Many-core processors and accelerators for "dense problems"
  – Chip stacking for bandwidth
  – Network integration
• The network will be the most important factor in HPC
Future Directions (1)
• Network integration is essential both for general-purpose machines and special-purpose ones
• Platform for accelerators
  – General-purpose processor cores
  – Cache or local memory
  – Fast, low-latency on-chip and off-chip networks
[Diagram: a processing unit (PU) and an accelerator joined by an on-chip network (> 100 GB/s/router), each with memory at > 100 GB/s, and an off-chip network at > 30 GB/s]
Future Directions (2)
• High memory-bandwidth system
  – A "single-chip BlueGene/L" by system-on-chip or chip stacking with TSVs
  – B/F ~ 1 (bytes per FLOP to local memory)
  – B/F ~ 0.1 for remote nodes
[Diagram: a PU of > 500 GFLOPS with memory bandwidth > 500 GB/s and network bandwidth > 50 GB/s]
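The B/F (bytes-per-FLOP) targets above can be checked against the sketched hardware figures. A minimal calculation using the slide's nominal numbers:

```python
# B/F = memory (or network) bandwidth in bytes/s divided by peak FLOP/s.
# Nominal figures from the sketched system on this slide.
gflops = 500.0        # > 500 GFLOPS per PU
mem_bw_gbs = 500.0    # > 500 GB/s to stacked / on-chip memory
net_bw_gbs = 50.0     # > 50 GB/s to the network

print(f"local  B/F ~ {mem_bw_gbs / gflops}")   # 1.0  -> "B/F ~ 1"
print(f"remote B/F ~ {net_bw_gbs / gflops}")   # 0.1  -> "B/F ~ 0.1"
```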
Problem in Network
• Molecular dynamics: strong scaling is important
• ~50,000 FLOP/particle/step
• N = 10^5 → 5 GFLOP/step
• At 5 TFLOPS effective performance:
  1 msec/step = 170 nsec/day — rather easy
• At 5 PFLOPS effective performance:
  1 μsec/step = 200 μsec/day??? — difficult, but important
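The throughput figures above follow from simple arithmetic once a timestep is fixed. A sketch in Python; the 2 fs timestep is my assumption (a typical MD value that roughly reproduces the slide's ~170 nsec/day figure), and the other numbers are from the slide:

```python
# MD strong-scaling arithmetic from the slide.
# Assumed: 2 fs timestep (not stated on the slide).
flop_per_particle_step = 50_000
n_particles = 100_000                                  # N = 10^5
flop_per_step = flop_per_particle_step * n_particles   # 5 GFLOP/step
timestep_fs = 2.0
seconds_per_day = 86_400

for effective_flops, label in [(5e12, "5 TFLOPS"), (5e15, "5 PFLOPS")]:
    wall_per_step = flop_per_step / effective_flops    # wall-clock seconds/step
    steps_per_day = seconds_per_day / wall_per_step
    ns_per_day = steps_per_day * timestep_fs * 1e-6    # fs -> ns of simulated time
    # ~172.8 ns/day at 5 TFLOPS; ~172,800 ns/day (~0.2 msec) at 5 PFLOPS
    print(f"{label}: {wall_per_step:.0e} s/step, {ns_per_day:,.1f} ns/day")
```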
Anton
• D. E. Shaw Research
• Special-purpose pipelines + general-purpose cores + dedicated network
• By decreasing communication latency, it achieves high sustained performance even for small systems
R. O. Dror et al., Proc. Supercomputing 2009.
MDGRAPE-4
• Special-purpose computer for molecular dynamics simulations
• Test bed for future HPC hardware
• FY2010–FY2012
• System-on-Chip
  – Accelerator
  – Memory
  – General-purpose processor
  – Network
• ~4 TFLOPS / chip
Fin