Enabling Supernova Computations by
Integrated Transport and Provisioning Methods
Optimized for Dedicated Channels
Nagi Rao, Bill Wing, Tony Mezzacappa, Qishi Wu, Mengxia Zhu
Oak Ridge National Laboratory
Malathi Veeraraghavan
University of Virginia
DOE MICS PI Meeting: High-Performance Networking Program
September 27-29, 2005
Brookhaven National Laboratory
Outline
• Background
• Networking for TSI
• Cray X1 Connectivity
• USN-CHEETAH Peering
• Network-Supported Visualizations
DOE ORNL-UVA Project: Complementary Roles
• Project Components:
  • Provisioning for UltraScience Net - GMPLS
  • File transfers for dedicated channels
  • Peering – DOE UltraScience Net and NSF CHEETAH
  • Network-optimized visualizations for TSI
  • TSI application support over UltraScience Net + CHEETAH
[Diagram: ORNL provides the TSI application and visualization; UVA provides peering, provisioning, and file transfers]
This project leverages two existing projects:
•DOE UltraScience Net
•NSF CHEETAH
Terascale Supernova Initiative - TSI
• Science Objective: Understand supernova evolution
• DOE SciDAC Project: ORNL and 8 universities
  • Teams of field experts across the country collaborate on computations
  • Experts in hydrodynamics, fusion energy, high-energy physics
  • Massive computational code
• Terabyte/day generated currently
  • Archived at nearby HPSS
  • Visualized locally on clusters – only archival data
• Current Networking Challenges
  • Limited transfer throughput: the hydro code takes 8 hours to generate a run and 14 hours to transfer it out (see the worked estimate after this list)
  • Runaway computations: users find out only after the fact that parameters needed adjustment
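As a rough check on the transfer-throughput bottleneck (assuming the hydro output for one run is on the order of a terabyte, in line with the terabyte/day figure above; the exact per-run size is not stated here), the 14-hour transfer corresponds to only about 160 Mbps of sustained throughput:

```python
# Back-of-the-envelope estimate; the ~1 TB per-run size is an assumption,
# chosen to be consistent with the terabyte/day figure above.
data_bits = 1e12 * 8                        # ~1 TB expressed in bits
sustained_mbps = data_bits / (14 * 3600) / 1e6
hours_at_1gbps = data_bits / 1e9 / 3600     # time over a fully used 1 Gbps channel
print(f"implied sustained rate over 14 hours: {sustained_mbps:.0f} Mbps")        # ~159 Mbps
print(f"same data over a dedicated 1 Gbps channel: {hours_at_1gbps:.1f} hours")  # ~2.2 hours
```

By the same arithmetic, a fully utilized 1 Gbps dedicated channel would move the same data in roughly two hours, which motivates the dedicated-channel provisioning and transport work described in the rest of this talk.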
TSI Desired Capabilities
• Data and File Transfers (terabyte – petabyte)
  • Move data from computations on supercomputers
  • Supply data to visualizations on clusters and supercomputers
• Interactive Computations and Visualization
  • Monitor, collaborate on, and steer computations
  • Collaborative and comparative visualizations
[Diagram: visualization channel, visualization control channel, and steering channel between computation and visualization nodes]
USN-CHEETAH Peering: Data-Plane
• Peering: data and control planes
  • Coast-to-coast dedicated channels
  • Access to ORNL supercomputers for CHEETAH users
• Peering at ORNL:
  • Data plane: 10GigE between SN16000 and e300
  • Control plane: VPN tunnel
[Diagram: CDCI and e300 switches at the UltraScience Net – CHEETAH boundary]
USN-CHEETAH Peering: Control-Plane
• Wrap the USN control plane with a GMPLS RSVP-TE interface
  • CHEETAH: GMPLS control plane
  • USN: TL1/CLI centralized control plane
• Authenticated, encrypted tunnel to the USN control host
[Diagram: GMPLS wrapper between the CHEETAH GMPLS control plane and the centralized USN control plane; ORNL NS-50 and SN16000 at the CHEETAH boundary]
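To illustrate the wrapper idea, here is a minimal sketch, assuming a hypothetical translation layer that accepts an RSVP-TE-style circuit request from the CHEETAH side and issues a TL1-style cross-connect command to the centralized USN control host; the class names, command syntax, and port identifiers are illustrative, not the actual USN software:

```python
# Illustrative sketch only: a GMPLS-facing wrapper that delegates to a
# centralized TL1/CLI control plane. All names and syntax are hypothetical.
from dataclasses import dataclass

@dataclass
class PathRequest:
    src_port: str          # ingress port at the peering point, e.g. "SN16000-10GE-1"
    dst_port: str          # egress port toward CHEETAH, e.g. "e300-10GE-1"
    bandwidth_gbps: float

class Tl1ControlPlane:
    """Stand-in for the centralized USN TL1/CLI control host."""
    def cross_connect(self, src: str, dst: str, bw: float) -> str:
        # A real deployment would send this over the authenticated,
        # encrypted tunnel to the USN control host.
        return f"ENT-CRS::{src},{dst}:::BW={bw}G;"   # TL1-style command (illustrative)

class GmplsWrapper:
    """Presents an RSVP-TE-like setup call; translates it for the TL1 backend."""
    def __init__(self, backend: Tl1ControlPlane):
        self.backend = backend
    def handle_rsvp_path(self, req: PathRequest) -> str:
        return self.backend.cross_connect(req.src_port, req.dst_port, req.bandwidth_gbps)

wrapper = GmplsWrapper(Tl1ControlPlane())
print(wrapper.handle_rsvp_path(PathRequest("SN16000-10GE-1", "e300-10GE-1", 10.0)))
```

The intent of the design is that CHEETAH's GMPLS control plane can signal circuits end to end without needing to know that USN provisioning is centralized behind the wrapper.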
Network Connectivity of Cray X-Class
• X1(E): Cray nodes connect through a crossconnect to the Cray Network Subsystem (CNS) at 1GigE (10GigE in the future) and to FC disk storage
  • Cray X1E: upgraded version of the X1; 10GigE upgrade to the CNS
• Redstorm: cluster-based architecture; GigE-based cross connect (1/10GigE) between Cray nodes and FC disk storage
• Cray X2: expected to be based on a combination of X1 and Redstorm – Cray Rainier plans
[Diagram: X1(E) and Redstorm internal connectivity]
Internal Data Paths of Supercomputers
We concentrate on two types of connections:
• Ethernet/IP connections from compute/service nodes
• FiberChannel (FC) connections to disks
Analysis of internal data paths to identify potential bottlenecks:
• X1(E): 1GigE – FC; 10GigE – FC bundling
• X2: 1/10GigE channels; FC channels
• Coordinate with Cray's plans
[Diagram: X1(E) and X2 (expected) internal data paths – compute and service nodes reach the network path through the cross-connect and CNS, and the disk path through FC]
Experimental Results: Production 1GigE Connection, Cray X1 to NCSU
• Tuned/ported the existing bbcp protocol (UNICOS OS):
  • optimized to achieve 250-400Mbps from Cray X1 to NCSU;
  • actual throughput varies as a function of Internet traffic;
  • tuned TCP achieves ~50 Mbps;
  • currently used in production mode by John Blondin
• Developed a new protocol called Hurricane:
  • achieves a stable 400Mbps using a single stream from Cray X1 to NCSU
These throughput levels are the highest achieved between the ORNL Cray X1 and a remote site located several hundred miles away.
[Diagram: Cray X1 – GigE – Juniper M340 – Cisco (all user connections) – shared Internet connection – GigE – Linux cluster]
Experimental Results, Cray X1: Dedicated Connection
Initial testing – dedicated channel:
• UCNS connected to the Cray X1 via four 2Gbps FC connections
• UCNS connected to another Linux host via a 10GigE connection
• Transfer results: 1.4Gbps using a single Hurricane flow
• These are the highest file transfer rates achieved over Ethernet connections from the ORNL Cray X1 to an external (albeit local) host
[Diagram: Cray X1 OS nodes – FC convert – 2G FC – UCNS – 10GigE – local host; upgrade path to Cray X1E (faster processors) and upgraded UCNS]
1Gbps dedicated connection, Cray X1(E) to the NCSU orbitty cluster (over CHEETAH, ~600 miles):
• Performance degraded: bbcp 30-40Mbps; single TCP 5Mbps; Hurricane 400Mbps (no jobs) to 200Mbps (with jobs)
• The performance bottleneck was identified inside the Cray X1E OS nodes
[Diagram: National Leadership Class Facility computer – UltraScienceNet – CHEETAH – NCSU cluster]
Modules of Visualization Pipeline
Pipeline: data source → raw data → filtering → filtered data → transformation (topological surface construction, volumetric transfer function) → transformed data (geometric model, volumetric values) → rendering → framebuffer → display
Visualization Modules
• Pipeline consists of several modules
• Some modules are better suited to certain network nodes:
  • Visualization clusters
  • Computation clusters
  • Power walls
• Data transfers between modules are of varied sizes and rates
Note: Commercial tools do not support efficient decomposition (a sketch of the module/message representation follows)
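To make the decomposition concrete, here is a minimal sketch of the module/message representation that the mapping formulas on the following slides operate on; the module names mirror the pipeline above, while the complexity and message-size numbers are purely illustrative:

```python
# Illustrative pipeline representation: module computing complexities (c) and
# the message sizes (m) passed between consecutive modules. Numbers are made up.
modules = ["source", "filtering", "transformation", "rendering", "display"]
c = {"filtering": 2.0, "transformation": 8.0,   # relative cost per input byte
     "rendering": 4.0, "display": 0.5}
m = {"source": 1e9,           # raw data emitted by the data source (bytes)
     "filtering": 2e8,        # filtered data
     "transformation": 5e7,   # geometric model / volumetric values
     "rendering": 8e6}        # framebuffer sent to the display

# A grouping assigns consecutive modules to nodes along a network path, e.g.
# filtering on a computation cluster and the remaining modules on a viz cluster.
grouping = [("computation-cluster", ["filtering"]),
            ("visualization-cluster", ["transformation", "rendering", "display"])]

# The data crossing the network between the two groups is the output of the
# last module of the first group; here that is the filtered data.
print(f"inter-group transfer: {m['filtering'] / 1e6:.0f} MB")
```

This is why the grouping matters: the transfer sizes between modules can differ by orders of magnitude, so placing the network hop at a low-volume module boundary changes the end-to-end cost substantially.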
Grouping Visualization Modules
[Diagram: modules M1, …, Mn+1 with costs c1, …, cn+1 partitioned into groups G1, …, Gq; group Gi is placed on path node P[i], with source vs, destination vd, node computing powers p, and link bandwidths b between consecutive groups]
Grouping
• Decompose the pipeline into modules
• Combine the modules into groups
• Transfers within a single node are generally faster
• Transfers between nodes take place over the network
• Align the bottleneck network links with the inter-module transfers that have the least data requirements
Optimal Mapping of Visualization Pipeline:
Minimization of Total Delay

$$
T_{\mathrm{total}}(\text{path } P \text{ of } q \text{ nodes})
  = T_{\mathrm{computing}} + T_{\mathrm{transport}}
  = \sum_{i=1}^{q} T_{G_i} + \sum_{i=1}^{q-1} T_{L_{P[i],P[i+1]}}
  = \sum_{i=1}^{q} \frac{1}{p_{P[i]}} \sum_{j \in G_i,\, j \ge 2} c_j m_{j-1}
  + \sum_{i=1}^{q-1} \frac{m(G_i)}{b_{P[i],P[i+1]}}
$$
Dynamic Programming Solution
• Combine modules into groups
• Align the bottleneck network links with the inter-module transfers that have the least data requirements
• Polynomial-time solvable, O(n × |E|) – not NP-complete

$$
T^{m}(v) = \min\left\{ T^{m-1}(v) + \frac{c_m m_{m-1}}{p_v},\;
  \min_{u \in \mathrm{adj}(v)} \left( T^{m-1}(u) + \frac{c_m m_{m-1}}{p_v} + \frac{m_{m-1}}{b_{u,v}} \right) \right\},
  \quad m = 1, \dots, n,\; v \in V
$$

Note:
1. Commercial tools (Ensight) are not readily amenable to optimal network deployment
2. This method can be implemented in tools that provide appropriate hooks
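A minimal sketch of this dynamic program, using the notation of the formulas above (module complexities c_m, input message sizes m_{m-1}, node computing powers p_v, link bandwidths b_{u,v}); the small network, path structure, and numbers below are illustrative only, not the TSI deployment:

```python
# Sketch of the total-delay dynamic program from this slide. adj[v] lists the
# nodes u with a link u -> v toward the destination; all values are made up.
import math

def min_total_delay(modules, c, msg, nodes, adj, p, b, src, dst):
    """T[v] = minimum delay of placing modules[0..m] with module m on node v."""
    T = {v: math.inf for v in nodes}
    T[src] = 0.0                                  # module 0 (data source) sits at the source
    for m in range(1, len(modules)):
        work = c[modules[m]] * msg[modules[m - 1]]           # c_m * m_{m-1}
        new_T = {}
        for v in nodes:
            stay = T[v] + work / p[v]                        # keep module m with module m-1
            move = min((T[u] + work / p[v] + msg[modules[m - 1]] / b[(u, v)]
                        for u in adj[v]), default=math.inf)  # move the data over link u -> v
            new_T[v] = min(stay, move)
        T = new_T
    return T[dst]                                 # the display module must land on the destination

# Illustrative 3-node network: data source -> computation cluster -> display host.
nodes = ["src", "cluster", "dst"]
adj = {"src": [], "cluster": ["src"], "dst": ["cluster", "src"]}
p = {"src": 1e9, "cluster": 4e9, "dst": 5e8}                                 # ops/sec
b = {("src", "cluster"): 1e9, ("cluster", "dst"): 1e8, ("src", "dst"): 5e7}  # bytes/sec
modules = ["source", "filtering", "transformation", "rendering", "display"]
c = {"filtering": 2.0, "transformation": 8.0, "rendering": 4.0, "display": 0.5}
msg = {"source": 1e9, "filtering": 2e8, "transformation": 5e7, "rendering": 8e6}
print(f"minimum total delay: {min_total_delay(modules, c, msg, nodes, adj, p, b, 'src', 'dst'):.2f} s")
```

Each of the n modules is considered against every node, and each node consults only its adjacent links, which is where the O(n × |E|) bound quoted above comes from.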
Optimal Mapping of Visualization Pipeline:
Maximization of Frame Rate

$$
T_{\mathrm{bottleneck}}(\text{path } P \text{ of } q \text{ nodes})
  = \max_{i=1,\dots,q-1} \left\{ T_{\mathrm{computing}}(G_i),\; T_{\mathrm{transport}}(L_{P[i],P[i+1]}),\; T_{\mathrm{computing}}(G_q) \right\}
  = \max_{i=1,\dots,q-1} \left\{ \frac{1}{p_{P[i]}} \sum_{j \in G_i,\, j \ge 2} c_j m_{j-1},\;
    \frac{m(G_i)}{b_{P[i],P[i+1]}},\;
    \frac{1}{p_{P[q]}} \sum_{j \in G_q,\, j \ge 2} c_j m_{j-1} \right\}
\qquad (2)
$$
Dynamic Programming Solution
• Align the bottleneck network links with the inter-module transfers that have the least data requirements
• Polynomial-time solvable, O(n × |E|) – not NP-complete

$$
F^{m}(v_i) = \min\left\{ \max\left( F^{m-1}(v_i),\; \frac{GS^{m-1}(v_i) + c_m m_{m-1}}{p_{v_i}} \right),\;
  \min_{u \in \mathrm{adj}(v_i)} \max\left( F^{m-1}(u),\; \frac{c_m m_{m-1}}{p_{v_i}},\; \frac{m_{m-1}}{b_{u,v_i}} \right) \right\},
  \quad m = 1, \dots, n,\; v_i \in V
\qquad (6)
$$

(here GS^{m-1}(v_i) is the computing load accumulated by the module group already placed on node v_i)
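For comparison with the total-delay recursion, here is a sketch of the min-max (bottleneck) recursion under the same illustrative setup; only the combination rule changes, from summing delays to taking the maximum over group computing times and link transfer times, with the accumulated group load GS tracked alongside the bottleneck value. This is a simplified sketch, not the published algorithm in full:

```python
# Sketch of the bottleneck (frame-rate) recursion. Same illustrative network
# and pipeline as the total-delay sketch; all numbers are made up.
import math

nodes = ["src", "cluster", "dst"]
adj = {"src": [], "cluster": ["src"], "dst": ["cluster", "src"]}
p = {"src": 1e9, "cluster": 4e9, "dst": 5e8}                                 # ops/sec
b = {("src", "cluster"): 1e9, ("cluster", "dst"): 1e8, ("src", "dst"): 5e7}  # bytes/sec
modules = ["source", "filtering", "transformation", "rendering", "display"]
c = {"filtering": 2.0, "transformation": 8.0, "rendering": 4.0, "display": 0.5}
msg = {"source": 1e9, "filtering": 2e8, "transformation": 5e7, "rendering": 8e6}

# F[v] = bottleneck time so far with the current module on node v;
# GS[v] = accumulated computing load (sum of c_j * m_{j-1}) of the group on v.
F = {v: math.inf for v in nodes}
GS = {v: 0.0 for v in nodes}
F["src"] = 0.0
for m in range(1, len(modules)):
    load = c[modules[m]] * msg[modules[m - 1]]        # c_m * m_{m-1}
    new_F, new_GS = {}, {}
    for v in nodes:
        stay = max(F[v], (GS[v] + load) / p[v])       # extend the group already on v
        move = min((max(F[u], load / p[v], msg[modules[m - 1]] / b[(u, v)])
                    for u in adj[v]), default=math.inf)
        if stay <= move:
            new_F[v], new_GS[v] = stay, GS[v] + load
        else:
            new_F[v], new_GS[v] = move, load          # a new group starts on v
    F, GS = new_F, new_GS
print(f"bottleneck time at destination: {F['dst']:.2f} s "
      f"(frame rate ~ {1 / F['dst']:.3f} frames/s)")
```

Minimizing the bottleneck time rather than the total delay is what maximizes the sustainable frame rate of the interactive visualization loop.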
Computational Steering
• Monitor output and modify the parameters while the computation is running
  • computational monitoring and steering
• Computation cycles and time can be saved if
  • unproductive jobs can be terminated
  • parameters that have strayed can be corrected in time
Experimental Results
• Deployed on six Internet nodes located at ORNL, LSU, UT, NCSU, OSU, and GaTech
  • UT and NCSU are clusters
• Configuration:
  • Client at ORNL
  • CM node at LSU
  • DS nodes at OSU and GaTech
  • CS nodes at UT and NCSU
Estimate Network and Computing Time
• The overall estimation error of the transport and computing times is within 5.0%, which demonstrates the accuracy of our performance models for the network and visualization parts.
• We also observed that the system overhead is less than one second. This overhead consists of two components: setup time and loop time.
Performance Comparison for Different Viz Loops
• Optimal visualization pipeline: GaTech-UT-ORNL
  • GaTech is the data storage node and
  • UT is used as a computing node.
• The differences in these end-to-end delay measurements are mainly caused by the
  • disparities in the computing power of the nodes and
  • bandwidths of the network links connecting them
• The optimal visualization loop provided substantial performance enhancements over other pipeline configurations.
Comparison with ParaView
• Tested our lightweight system against ParaView under the same configuration
  • Our system consistently achieved better performance than ParaView.
  • The performance differences may have been caused by the higher processing and communication overhead incurred by ParaView.
Examples: Human Head Isosurface
Example datasets
ORNL Personnel
Nagi Rao, Bill Wing, Tony Mezzacappa (PIs)
Qishi Wu (Post-Doctoral Fellow)
Mengxia Zhu (PhD Student – Louisiana State University)
PhD Thesis
Mengxia Zhu, Adaptive Remote Visualization System With Optimized Network Performance for Large Scale Scientific Data, Department of Computer Science, Louisiana State University, defending on October 3, 2005
Papers
• X. Zheng, M. Veeraraghavan, N. S. V. Rao, Q. Wu, and M. Zhu, "CHEETAH: Circuit-switched high-speed end-to-end transport architecture testbed", IEEE Communications Magazine, 2005.
• N. S. V. Rao, S. M. Carter, Q. Wu, W. R. Wing, M. Zhu, A. Mezzacappa, M. Veeraraghavan, J. M. Blondin, "Networking for Large-Scale Science: Infrastructure, Provisioning, Transport and Application Mapping", SciDAC Meeting, 2005.
• M. Zhu, Q. Wu, N. S. V. Rao, S. S. Iyengar, "Adaptive Visualization Pipeline Partition and Mapping on Computer Network", International Conference on Image Processing and Graphics, ICIG 2004.
• M. Zhu, Q. Wu, N. S. V. Rao, S. S. Iyengar, "On Optimal Mapping of Visualization Pipeline onto Linear Arrangement of Network Nodes", International Conference on Visualization and Data Analysis, 2005.
Conclusions
Summary
Developed several components to support TSI
• USN-CHEETAH peering
• File and data transfers
• Visualization modules
Ongoing Tasks
• USN-CHEETAH GMPLS peering
• Work with Cray to address performance issues
• Transitioning the visualization system to production TSI use