
Production Petascale Climate Data Replication at NCI – Lustre and our engagement with the Earth System Grid Federation (ESGF)
Joseph Antony, Andrew Howard, Jason Andrade, Ben Evans, Claire Trenham, Jingbo Wang
nci.org.au
@NCInews
MOTIVATION
International Climate Change Research – The CMIP projects
• The UN’s Intergovernmental Panel on Climate Change (IPCC) prepares an intergovernmental assessment report every 6 years
• This effort requires significant scientific and HPC/HPD resources to back it
• The most recent of these activities was the Coupled Model Intercomparison Project 5 (CMIP5)
• The NCI is a major data node within the ESGF federation
• In this talk I will share with you a ‘view from the coalface’, replicating ~2PB of data
CMIP DATA VOLUMES
CMIP1 through CMIP5 Data Volumes
Taken from Dean Williams’ ESGF Internet2 presentation, 2014
ESGF NODE ARCHITECTURE
The ESGF Data Archival and Retrieval System
http://esgf.org
• The ESGF is a federated peer-to-peer international data archival and retrieval system
• Incorporates single sign-on for end-users
• It has publication and version management tools
• Supports data aggregations and can notify users if datasets have been modified
LLNL-PRES-648666
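The federation’s search layer is exposed through the esg-search REST API, so a replication client can discover datasets programmatically. A minimal sketch; the index-node host and facet values below are illustrative examples, not NCI’s configuration:

```python
# Build a query URL for the ESGF esg-search REST API.
# Host and facet values are illustrative examples only.
from urllib.parse import urlencode

def esgf_search_url(base="https://esgf-node.llnl.gov/esg-search/search", **facets):
    """Compose an esg-search query from facet constraints."""
    params = {"format": "application/solr+json", "limit": 10}
    params.update(facets)
    return base + "?" + urlencode(params)

url = esgf_search_url(project="CMIP5", experiment="historical", variable="tas")
print(url)
```

Fetching that URL returns Solr-style JSON listing the matching datasets, including the data node that serves each replica.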
THE END-USER PERSPECTIVE
The Last-Mile Problem …
• Data is too large to move onto the desktop for analysis – CMIP3 to CMIP5
• Users want versioned, curated data so they can jump right into scientific analysis
• At NCI
– An integrated eco-system exists for data-intensive science
• Data Repositories
• Virtual Laboratories
– The ICNWG effort to solve the ‘Last-Mile Problem’ for networking
ICNWG Activities
Okay … so where’s Lustre in all of this you ask?
We use Lustre as our distributed filesystem for a set
of dedicated WAN data transfer nodes (DTNs)
But first a detour …
A small amount of packet loss makes a huge difference in TCP performance (1 Gbps == 125 MB/sec)
[Chart: measured vs. theoretical TCP throughput (TCP Reno, HTCP, and no-loss cases) over increasing distance – Local (LAN), Metro Area, Regional, Continental, International. With loss, high performance beyond metro distances is essentially impossible.]
Courtesy Eli Dart, ESnet, Lawrence Berkeley National Laboratory
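The shape of those curves follows the well-known Mathis et al. bound on sustained TCP throughput, rate ≤ (MSS/RTT)·(1/√p). A quick sketch; the RTT and loss figures are illustrative assumptions, not measurements from the slide:

```python
import math

def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate):
    """Mathis et al. upper bound on single-stream TCP throughput:
    rate <= (MSS / RTT) * (1 / sqrt(loss))."""
    return (mss_bytes * 8 / rtt_s) / math.sqrt(loss_rate)

# Illustrative trans-Pacific path: ~150 ms RTT with 0.01% packet loss.
rate = mathis_throughput_bps(mss_bytes=1460, rtt_s=0.150, loss_rate=1e-4)
print(f"{rate / 1e6:.1f} Mbit/s")
```

Even at one lost packet in ten thousand, a single stream on a 150 ms path is capped below 8 Mbit/s, no matter how fast the underlying link is.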
Science DMZ Design Pattern (Abstract)
[Diagram: a Border Router on the 10G WAN feeds a dedicated Science DMZ Switch/Router over a clean, high-bandwidth WAN path; a high-performance Data Transfer Node with high-speed storage hangs off the Science DMZ; the Enterprise Border Router/Firewall gives the Site/Campus LAN access to Science DMZ resources; perfSONAR measurement hosts sit at each boundary, with per-service security policy control points.]
Courtesy Eli Dart, ESnet, Lawrence Berkeley National Laboratory
Local And Wide Area Data Flows
[Diagram: the same Science DMZ layout, annotated to show the high-latency WAN path terminating at the Data Transfer Node and the low-latency LAN path between the DTN and Site/Campus resources.]
Courtesy Eli Dart, ESnet, Lawrence Berkeley National Laboratory
Abstract HPC Center With Data Path
[Diagram: the Border Router connects to the WAN, with routed office traffic passing a Firewall and a Virtual Circuit bypassing it; a Core Switch/Router feeds front-end switches for the Supercomputer and the Data Transfer Nodes, which sit on the Parallel Filesystem; perfSONAR hosts at each tier; the high-latency WAN and virtual-circuit paths land on the DTNs, with a low-latency LAN path to the supercomputer.]
Courtesy Eli Dart, ESnet, Lawrence Berkeley National Laboratory
AARNet International Links
NCI’s DTN Nodes
CBR-SYD and onto the CONUS via SXtransport
SXtransport – Physical Layout
[Map: cable stations and network segments of the SXtransport submarine system.]
SXtransport – Logical Network Layout
What are some of the world’s longest submarine cables you ask?
• 39,000 km of submarine fibre
• 28,900 km of submarine fibre plus 1,600 km of terrestrial fibre
Networking Topology for Data Replication
Courtesy Mary Hester, ESnet
Initial Transfer Rates from NCI
• The graph shows the data rate vs. the volume of data transferred
• Each line in the graph represents how many parallel data streams were required to obtain the given performance
• The results indicate that a line rate of 1 GB/s (8 Gbps) is achievable between Australia and the United States; however, it requires configuring transfers to run more than 100 parallel streams
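Why so many streams? Each TCP stream is individually capped at roughly MSS/(RTT·√p), so aggregate rate grows with stream count until the link saturates. A sketch with illustrative parameters (the RTT and loss rate are assumptions, not measured values from these transfers):

```python
import math

def aggregate_rate_gbps(streams, mss_bytes=1460, rtt_s=0.150, loss_rate=1e-6):
    """Aggregate throughput of N parallel TCP streams, each limited by the
    Mathis bound MSS/(RTT*sqrt(p)). Parameters are illustrative only."""
    per_stream_bps = (mss_bytes * 8 / rtt_s) / math.sqrt(loss_rate)
    return streams * per_stream_bps / 1e9

print(f"{aggregate_rate_gbps(1):.2f} Gbps with 1 stream")
print(f"{aggregate_rate_gbps(100):.2f} Gbps with 100 streams")
```

Under these assumptions a single stream manages well under 0.1 Gbps, while ~100 streams approach the ~8 Gbps line rate reported above, which is why GridFTP-style tools expose a parallelism knob.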
Data replication and Science DMZs
• Currently we’ve replicated ~1.5 PB
• Working on improving these rates by employing a Science DMZ model and dedicated data transfer nodes
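On the host side, DTNs on long fat networks usually need larger TCP buffers and a loss-tolerant congestion control. A sketch of a sysctl fragment in the spirit of ESnet’s fasterdata guidance; the values are illustrative, not NCI’s production settings:

```shell
# /etc/sysctl.d/90-dtn-tuning.conf — illustrative 10GigE DTN tuning,
# not NCI's actual configuration.
net.core.rmem_max = 67108864              # permit large receive buffers
net.core.wmem_max = 67108864              # permit large send buffers
net.ipv4.tcp_rmem = 4096 87380 33554432   # min/default/max receive window
net.ipv4.tcp_wmem = 4096 65536 33554432   # min/default/max send window
net.ipv4.tcp_congestion_control = htcp    # better recovery on high-BDP paths
net.ipv4.tcp_mtu_probing = 1              # survive jumbo-frame path quirks
net.core.netdev_max_backlog = 250000      # absorb bursts at line rate
```

Apply with `sysctl -p` (or at boot) and verify end-to-end with the perfSONAR hosts shown in the Science DMZ diagrams.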
Globus Online
• Globus Online is a hosted data-transfer-as-a-service offering, run by the University of Chicago
• It makes the job of large data transfers easy for both instrument owners and end-users
Globus Online Architecture
Using Dedicated DTNs – January 2015
Using Dedicated DTNs – March 2015
State of the Union Numbers from the ICNWG Consortium
Conclusion
• Non-trivial to get the various ducks lined up:
– 10GigE WAN networking
– Mellanox tuning work for 10GigE Ethernet and 56Gbps FDR InfiniBand
– Being NUMA-aware is critical for the GridFTP daemon!
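In practice, “NUMA-aware” here means keeping the GridFTP server on the socket that owns the NIC, so transfer buffers never cross the inter-socket link. An illustrative invocation; the interface name, node number, and port are assumptions, not NCI’s configuration:

```shell
# Which NUMA node is the high-speed NIC attached to? (interface name assumed)
cat /sys/class/net/eth2/device/numa_node

# Pin the GridFTP daemon's CPUs and memory to that node (node 1 assumed here).
numactl --cpunodebind=1 --membind=1 globus-gridftp-server -S -p 2811
```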
THE END
VERIFIED, CURATED SCIENTIFIC DATASETS
Centralized Quality Control for Data Processing
• Multi-layered QC
– Initial Level 1 QC done at data nodes
– DKRZ performs L2 QC
– Further metadata and variable checking is done to get to L3 QC
• At every step, end-users can see the QC Level for their data
• Replicated data has passed QC Level 3 and receives a DOI
[Figure: 3-Layer Quality Assurance Concept]
LLNL-PRES-648666
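On the replication side, “passed QC Level 3” is typically verified by comparing checksums against the publisher’s manifest before a replica is exposed. A minimal sketch; the manifest shape ({relative path: expected SHA-256}) is invented for illustration:

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk=1 << 20):
    """Stream a file through SHA-256 so multi-GB NetCDF files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def verify_replica(manifest, root):
    """manifest: {relative_path: expected_sha256}. Returns paths that mismatch."""
    return [rel for rel, expected in manifest.items()
            if sha256_of(Path(root) / rel) != expected]
```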