Sonoma_2010_Wednesday_Hofmann


Joint Capability Technology Demonstration
OSD (RFD) – USSTRATCOM – NRL – NGA – INSCOM – DISA
Agenda
▪ Warfighter Problem
▪ Large Data Concept of Operations
▪ Operational Utility Assessment
▪ Why LD JCTD Works
▪ Transition
▪ Summary
Global enterprise data available “as if on local desktop”
[Diagram: the warfighter and data across the Combatant Command Ops & Intel Enterprise, the DCGS ISR Enterprise, the Intelligence Community (IC) Enterprise, CALA, and Other Enterprise Data (medical, weather, logistics, etc.)]
▪ High-performance data access
▪ Resilient failover/Continuity of Operations
▪ Enhanced transport
▪ Extremely dense, shared storage
Large Data Reference Model
[Diagram: Client Application Domain (workstations; examples: standard workstations, thin clients, file system clients) and Core Large-Data Domain (net-file, web, application, and database servers and graphic stations, each running a file system client; InfiniBand and Ethernet switches; file servers and storage appliances on a shared file system; disk arrays), with workgroup Ethernet, a WAN bridge, and a Type 1 encryptor connecting to wide area networks over IP, SONET, or lambda links]
▪ Links to disks: Fibre Channel, SCSI, SATA II, SAS
▪ Key interfaces: InfiniBand, Ethernet, IP or SONET
Open Standards
• OFED
• Key interfaces: InfiniBand DDR/SDR, Ethernet
• Software standards: SuSE Linux, OMAR (OGC WMS), Lustre (POSIX)
Open Source Software
• Lustre File System
Commodity Hardware
• Intel Servers
• Multiple suppliers for modular components
Fully MOSA Compliant Architecture
Scalable to 1000s of Nodes: Petabyte File System today…Exabytes tomorrow
Large Data DISN Interface
• LD architecture efficiently exploits GIG-BE/DISN
• Cost-effective system interfaces can take full advantage of 10 Gbps
• Flexibility to utilize all encryption solutions
• Options to leverage lambda, IP/MPLS, and Ethernet services
• Options to connect to current DISN interfaces:
  • 10 Gbps lambda
  • 1 Gbps IP for smaller clients
  • 2.5 Gbps point-to-point
• Options to connect to future DISN interfaces:
  • 40** Gbps lambda
[Diagram: the Large Data JCTD data-centric domain (info services, collaboration, visualization, applications; servers, global file system, storage virtualization, RDMA and IP over IB & Ethernet) alongside the IP-centric DISN (voice, video, and data; all security domains, e.g. unclassified), joined through lambda access or IP (10/40** Gbps IP/MPLS service*; 10/40** Gbps Ethernet service*) over DWDM fiber]
* Requires no new development; not yet validated in LD
** 40 Gbps hardware development underway
Standard 2-Rack Node
• Scalable & Flexible
• Extremely Dense
• High Performance
• Low Latency
• Open Standards
• Open Systems
• Open Source
• Commodity Parts
LD Storage-Processing Solutions
▪ Availability: access to global data “as if on desktop”
▪ Availability: drive failure tolerance, typically 8+2 parity, RAID 6 technology
▪ Availability: journaling, which reduces or eliminates LUN rebuilds through I/O bitmapping
▪ Reliability: detection of and protection against silent data corruption
  ▫ Parity calculation on all reads
  ▫ On-the-fly data correction
▪ Space: extreme disk density (150 TB/ft²)
▪ Performance: delivers sustainable real-time raw large-block I/O at near line rates (8 GBps)
▪ Economics: sleep mode for power/cooling savings
High Performance = Low Latency Access to Critical Data
Configurations shown: 22,400 cores in 1540U, up to 11.8 PB; 1.2 PB; 75 TB; man-portable 90 TB in 5-16U
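“8+2 parity, RAID 6” means each stripe holds eight data blocks plus two parity blocks, so any two drives in a stripe can fail without data loss, and 8/10 = 80% of raw capacity is usable. The following is a minimal, hypothetical sketch of the general RAID 6 P+Q scheme (P is XOR parity; Q is a Reed-Solomon-style sum over GF(2^8) with generator 2 and polynomial 0x11D). It is not the storage vendor's implementation, and all function names are illustrative.

```python
# Toy model of RAID 6 (P+Q) parity for an 8+2 stripe. Illustrative only:
# real arrays use optimized, vendor-specific implementations.

POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, a primitive polynomial for GF(2^8)


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8): carry-less multiply reduced by POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    """Raise a to the n-th power in GF(2^8)."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    """Inverse of a nonzero byte: a^255 == 1, so a^254 is a^-1."""
    return gf_pow(a, 254)


def compute_parity(blocks: list) -> tuple:
    """P is the XOR of all data blocks; Q is the GF(2^8) sum of 2^i * D_i."""
    size = len(blocks[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(blocks):
        coeff = gf_pow(2, i)
        for k in range(size):
            p[k] ^= block[k]
            q[k] ^= gf_mul(coeff, block[k])
    return bytes(p), bytes(q)


def recover_two(blocks: list, p: bytes, q: bytes) -> dict:
    """Rebuild the two data blocks marked None, using the survivors plus P and Q."""
    x, y = [i for i, b in enumerate(blocks) if b is None]
    zero = bytes(len(p))
    # Parity of the survivors alone (lost blocks treated as zeros).
    p_s, q_s = compute_parity([zero if b is None else b for b in blocks])
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    inv = gf_inv(gx ^ gy)  # nonzero because 2 generates the field and x != y
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for k in range(len(p)):
        a = p[k] ^ p_s[k]  # equals D_x ^ D_y
        b = q[k] ^ q_s[k]  # equals 2^x*D_x ^ 2^y*D_y
        dx[k] = gf_mul(b ^ gf_mul(gy, a), inv)
        dy[k] = a ^ dx[k]
    return {x: bytes(dx), y: bytes(dy)}


if __name__ == "__main__":
    data = [bytes([(17 * i) % 256, (31 * i + 5) % 256, 200 - i]) for i in range(8)]
    p, q = compute_parity(data)
    damaged = list(data)
    damaged[2] = damaged[6] = None  # two simultaneous drive failures
    rebuilt = recover_two(damaged, p, q)
    assert rebuilt[2] == data[2] and rebuilt[6] == data[6]
    print(f"8+2 usable capacity: {8 / 10:.0%}")  # 8+2 usable capacity: 80%
```

The second parity block is what distinguishes RAID 6 from single-parity RAID 5: with P alone, only one lost block per stripe can be rebuilt.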
LD Web Services
• LD provides “point and click” access to globally distributed data
• Thin-client access with thumbnail previews
• Provides near real-time updates to planning and intelligence applications
Use of Open Technology in JCTD
▪ Lustre
  ▫ Provides a scalable, globally distributed file system
  ▫ Leverages DoE investment and Sun Microsystems backing
▪ OSSIM Mapping Archive (OMAR)
  ▫ Provides dynamic web services
  ▫ Leverages IC investment with Radiant Blue
▪ OpenFabrics Enterprise Distribution (OFED)
  ▫ Industry consortium for an open-standard, high-performance transport, storage, and processing protocol stack
Lustre Key benefits
• Open source, multi-platform, and multivendor under GPL
• Aggregates petabytes of storage and tens of thousands of nodes into one file system
• Production-quality stability and high availability
• POSIX-compliant
• Object-based architecture
• Windows support with pCIFS
• Heterogeneous networking
• RDMA performance enhancements with InfiniBand networks
• Manual and automated failover
• All designs and the internals course are on the lustre.org web site
• CVS tree (repository of all code as it develops) is open to all
• Architecture discussions open to all on the Lustre development mailing list
• Managed by a Fortune 500 major technology company (Oracle Sun) and in wide use among high-performance systems
LD JCTD Operational Utility Assessment
“While the first of about 10 files was still being transferred to the legacy work station, Large Data had all ten files.”
LD JCTD IOUA Report, Nov 09, JHU/APL
Globally Synchronized, Shared Data and High Resolution Collaboration
LD Web Services (Backend)
LD Site Backend Processing Comparison: time to create on-demand products (seconds), 7/21/2009 to 8/15/2009
▪ Sites 1 & 2 orthorectify and display imagery to thin clients (on-demand); Site 3 converts MPEG-2 to FLV (on-demand) and streams video to clients
▪ Imagery processing: Site 1 average = 3.77 sec; Site 2 average = 4.44 sec
▪ Video processing: Site 3 average = 9.35 sec
[Chart legend: Site A, Site S, and Site N backend processing averages]
No discernible difference to user for remote (~1200 miles*) vs. local imagery processing
* ~2000 fiber miles
Operational Demonstration Results
Operational Impact
• GEOINT access & web services for warfighters
• UNCLAS US Gov’t and NGO support
• Remote access to large, distributed ISR files
• Data virtualization & near real-time failover
“Simply put, the system NRL has in place for delivering large, AOI detailed imagery is outstanding and truly a model for the DoD/IC.”
– Sean Wohltman, Google Inc.
Suitability
• Demonstrated TRL-7/8/9
• Cost effective, open source
• Commodity components
• Stability/availability on par with operational systems
“A quick overview of the system was all that was required for ease of use”
-- LD User
Performance Results Summary as a Fraction of Theoretical Maximum Data Transfer Rate
Network (bandwidth efficiency): RDMA/IB and TCP/IP/Ethernet, single stream and multiple streams, over a Wide Area Network (2,000 fiber miles) and a Long-Haul WAN (13,000 fiber miles)
• Transition metric threshold/objective: 75% / 90% (single stream); 80% / 90% (multiple streams)
• Results: 94% ++, 98% ++, 86% +, 17%, 89% +, 83% +, 42%, 22%
File system (read/write): single file and multiple files
• Transition metric threshold/objective: 60% / 80%
• Results: 72% ++, 94% ++, 34%, 6%
Meets Threshold + / Meets Objective ++
Effectiveness: LD met or exceeded Transition Thresholds
LD System Performance
Test 1: File System-to-File System Data Results: 91%-99% of Max Theoretical Data Transfer Rate
Results achieved, as % of theoretical maximum (rate in MBps), by network type, point-to-point connection line rate, and type of file transfer:
• MAN (2.5 ms latency), OC-192, single file: 62.8% (590.6 MBps)
• MAN (2.5 ms latency), OC-192, aggregate multiple simultaneous files: 99.6% (935.7 MBps)
• WAN (~2000 fiber mi; 34.5 ms latency), OC-192, single file: 59.1% (555.2 MBps)
• WAN (~2000 fiber mi; 34.5 ms latency), OC-192, aggregate multiple simultaneous files: 91.1% (856.4 MBps)
• Long Haul WAN (~13,000 fiber miles; 206 ms latency), partial OC-48, single file: 86.0% (182.3 MBps)
• Long Haul WAN (~13,000 fiber miles; 206 ms latency), partial OC-48, aggregate multiple simultaneous files: 94.6% (200.7 MBps)
• Maximum Theoretical Data Transfer Rate for OC-192 = 939.7 MBps; for Partial OC-48 = 212.0 MBps; 1 MB = 1,048,576 bytes (2^20)
Test 2: Remote Access Results: user “accessing” vs. “copying” a file from a remote file system to a local workstation
• RDMA-enabled
• Access “as if local” to large ISR files over MAN/WAN/Long Haul WAN
• Matches local performance
• Saves 99.9% of time over “copy to local storage & open”!
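Test 2 contrasts opening a remote file in place with copying it to local storage first. The sketch below illustrates the idea under stated assumptions: through a POSIX-compliant mount (such as a Lustre client mount), an application can seek to and read only the byte range it needs, whereas copy-then-open moves the whole file before work can start. The temporary file stands in for a remote ISR file, and the timing model (`copy_then_open_s`, `access_in_place_s`, and the example sizes) is hypothetical, not the measurement method behind the 99.9% figure.

```python
import os
import tempfile


def read_region(path: str, offset: int, length: int) -> bytes:
    """Read only the requested byte range (e.g. one image tile) via the normal file API."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def copy_then_open_s(file_bytes: float, bytes_per_s: float) -> float:
    """Hypothetical model: the whole file must arrive before it can be opened."""
    return file_bytes / bytes_per_s


def access_in_place_s(needed_bytes: float, bytes_per_s: float, rtt_s: float) -> float:
    """Hypothetical model: one round trip plus transfer of just the bytes viewed."""
    return rtt_s + needed_bytes / bytes_per_s


if __name__ == "__main__":
    # A temporary file stands in for a large ISR file on the shared file system.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "scene.raw")
        with open(path, "wb") as f:
            f.write(bytes(range(256)) * 4096)  # 1 MiB of sample data
        print(list(read_region(path, offset=1000, length=3)))  # [232, 233, 234]

    # Illustrative numbers only: 1 TB file, 100 MB viewed, 1 GB/s link, 34.5 ms RTT.
    copy_s = copy_then_open_s(1e12, 1e9)
    access_s = access_in_place_s(1e8, 1e9, 0.0345)
    print(f"copy then open: {copy_s:.0f} s; access in place: {access_s:.2f} s")
```

The savings grow with the ratio of file size to the portion actually viewed, which is why in-place access matters most for very large imagery files.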
Test 3: Data Stream Results: File System-to-User Workstations
• OC-192 link:
-- MAN: 808 MBps (85.9% Max)
-- WAN: 759 MBps (80.8% Max)
• Partial OC-48 link:
-- Long Haul WAN: 160 MBps (75.4% Max)
OFED RDMA / Lustre file system enables remote access and/or very rapid file delivery to user applications
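The percentages in Tests 1 and 3 follow from dividing each measured rate by the theoretical maximum for its link (939.7 MBps for OC-192, 212.0 MBps for partial OC-48). The sketch below recomputes them from the reported rates; the table layout and function names are illustrative. Recomputed values agree with the reported ones to within 0.1 percentage point, and converting 939.7 MBps (with 1 MB = 2^20 bytes) to bits gives about 7.88 Gbps, in line with the ~8 Gbps effective OC-192 rate quoted elsewhere in the briefing.

```python
# Recompute "% of theoretical max" for the LD System Performance results.
MIB = 2**20  # 1 MB = 1,048,576 bytes

MAX_MBYTES_PER_S = {"OC-192": 939.7, "partial OC-48": 212.0}  # theoretical maxima

# (test, network, transfer type, link, measured MBps, reported %)
RESULTS = [
    ("Test 1", "MAN", "single file", "OC-192", 590.6, 62.8),
    ("Test 1", "MAN", "aggregate files", "OC-192", 935.7, 99.6),
    ("Test 1", "WAN", "single file", "OC-192", 555.2, 59.1),
    ("Test 1", "WAN", "aggregate files", "OC-192", 856.4, 91.1),
    ("Test 1", "Long Haul WAN", "single file", "partial OC-48", 182.3, 86.0),
    ("Test 1", "Long Haul WAN", "aggregate files", "partial OC-48", 200.7, 94.6),
    ("Test 3", "MAN", "data stream", "OC-192", 808.0, 85.9),
    ("Test 3", "WAN", "data stream", "OC-192", 759.0, 80.8),
    ("Test 3", "Long Haul WAN", "data stream", "partial OC-48", 160.0, 75.4),
]


def percent_of_max(mbytes_per_s: float, link: str) -> float:
    """Measured rate as a percentage of the link's theoretical maximum."""
    return 100.0 * mbytes_per_s / MAX_MBYTES_PER_S[link]


def mbytes_to_gbits(mbytes_per_s: float) -> float:
    """Convert MBps (2^20-byte megabytes per second) to decimal gigabits per second."""
    return mbytes_per_s * MIB * 8 / 1e9


for test, net, kind, link, rate, reported in RESULTS:
    pct = percent_of_max(rate, link)
    print(f"{test} {net:13} {kind:15} {pct:5.1f}% (reported: {reported}%)")

print(f"OC-192 maximum = {mbytes_to_gbits(MAX_MBYTES_PER_S['OC-192']):.2f} Gbps")
```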
LD Network Performance
[Charts: typical RDMA/IB performance and typical TCP/IP/Ethernet performance; tests on an 8 Gbps circuit (~1200 miles; ~2000 fiber miles) and on a 1 Gbps circuit (~8000 miles; ~13,000 fiber miles)]
• RDMA/InfiniBand (IB) provides highly efficient use of available bandwidth
• IB scales well with multiple, concurrent data flows
• RDMA/IB performance ≥ 80%
• TCP/IP performance ≤ 40%
• RDMA CPU usage estimated to be 4x lower than TCP/IP
• IB is lossless and provides “fair share” of bandwidth
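One standard way to see why long-haul links are hard to fill is the bandwidth-delay product (BDP): the amount of data that must be in flight to keep a link busy. The sketch below assumes that the latencies quoted in Test 1 (34.5 ms for ~2000 fiber miles, 206 ms for ~13,000 fiber miles) are round-trip times, which is roughly what light in fiber needs to cover those distances and back, and pairs them with the 8 Gbps and 1 Gbps test circuits. The 64 KiB window is the classic TCP limit without window scaling, used purely as an illustration; the briefing does not state which TCP settings were tested.

```python
# Bandwidth-delay product (BDP): bytes that must be "in flight" to fill a link.
# Assumption: the quoted latencies are round-trip times (RTT).


def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bytes in flight needed to keep a link of rate_bps busy for one RTT."""
    return rate_bps * rtt_s / 8


def window_limited_bps(window_bytes: float, rtt_s: float) -> float:
    """Maximum throughput of a sender limited to window_bytes unacknowledged."""
    return window_bytes * 8 / rtt_s


CIRCUITS = {
    "8 Gbps, ~2000 fiber miles": (8e9, 0.0345),
    "1 Gbps, ~13,000 fiber miles": (1e9, 0.206),
}

for name, (rate, rtt) in CIRCUITS.items():
    need = bdp_bytes(rate, rtt)
    capped = window_limited_bps(64 * 1024, rtt)  # 64 KiB window, no window scaling
    print(f"{name}: BDP = {need / 1e6:.1f} MB; a 64 KiB window caps throughput "
          f"at {capped / 1e6:.1f} Mbps ({100 * capped / rate:.2f}% of line rate)")
```

Tens of megabytes must be in flight on these paths, so any protocol whose outstanding data is capped well below the BDP, or which backs off sharply on loss, leaves most of the circuit idle.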
Shared IP/MPLS Test Configuration
IP/MPLS Configuration
Testing goals:
1. Validate carrying LD JCTD traffic over a HAIPE/IP/MPLS network
2. Validate feasibility of operating at scalable provisioning levels (2, 3, 4, 5, … Gbps)
3. Test LD JCTD in the presence of competing IP traffic
[Diagram: Site 1 and Site 2 test configuration with Ciena equipment, encryptors, and WAN gateways; DISA Lab; NRL Lab]
RDMA/IB Performance on DISN Core
[Chart: RDMA/IB performance with competing traffic on a 10 Gbps IP/MPLS network; annotations: excellent RDMA results over IP core up to assigned BW; expected RDMA results over IP core beyond assigned BW; approximate volume of data that the node attempted to send]
Validated:
• RDMA/IB traffic over a shared IP/MPLS core network in a DISA lab
• RDMA/IB network transactions performed well at a range of operating levels
RDMA performs well within DISN assigned bandwidth
LD Data Virtualization/Failover
[Diagram: initial and final configurations of a shared resource cluster spanning Site 1 and Site 2, with VMs VM0-VM6 on each site's servers and mirrored VM storage; the database host moves from its initial to its final placement by operator-initiated migration. Each site runs database applications on a Linux VM and VM hypervisor, connected through Ethernet switches, WAN bridges, and storage (IB) switches to a virtual storage controller, with VM storage held in “Distributed RAID 1” mirrored storage across the two sites]
• Remote mirroring allows VM clusters to include servers from different remote data centers
• High performance computing/storage for virtualized enterprise
• Various combinations of data migration from locations ~1200 miles apart
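The “Distributed RAID 1” mirrored storage keeps a full copy of the VM storage at each site, which is what lets a VM or database host move between sites ~1200 miles apart. A minimal, hypothetical sketch of that idea follows: every write is committed to all online site replicas before it is acknowledged, reads prefer the local replica, and if one site is lost the survivor keeps serving. The class and method names are illustrative assumptions, not the LD virtual storage controller's interface, and resynchronizing a returning site is omitted.

```python
class MirroredVolume:
    """Toy two-site RAID 1 mirror: synchronous writes, local-preferred reads."""

    def __init__(self, sites: list) -> None:
        self.replicas = {site: {} for site in sites}  # site -> {block number: data}
        self.online = set(sites)

    def write(self, block: int, data: bytes) -> None:
        if not self.online:
            raise OSError("no replica online")
        # Synchronous mirroring: every online replica is updated before returning.
        for site in self.online:
            self.replicas[site][block] = data

    def read(self, block: int, local_site: str) -> bytes:
        # Try the local replica first, then any other surviving replica.
        for site in [local_site] + sorted(self.online - {local_site}):
            if site in self.online:
                return self.replicas[site][block]
        raise OSError("no replica online")

    def fail_site(self, site: str) -> None:
        """Simulate losing a data center; its replica stops serving I/O."""
        self.online.discard(site)


vol = MirroredVolume(["Site 1", "Site 2"])
vol.write(0, b"database page")
vol.fail_site("Site 1")                  # lose one data center
print(vol.read(0, local_site="Site 1"))  # b'database page' (served by Site 2)
```

Synchronous mirroring trades write latency (every write waits on the remote site) for a guarantee that the surviving site holds all acknowledged data at failover time.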
Why the LD JCTD works
The LD JCTD demonstrated the use of RDMA and a clustered Global File System over long distances to create a globally accessible storage and compute cloud.
• Data available to clients anywhere in the world
• Remote clients may disconnect at will
• Centralized apps available to clients
[Diagram: clients connected to three data centers joined by a Global File System + RDMA interconnect]
1. LD JCTD used RDMA and parallel file systems to build multiple scalable, cost-effective data centers
2. LD JCTD extended RDMA over a high-bandwidth WAN to virtualize the data centers
3. Clients connect to the virtual data center via cost-effective, low-bandwidth (1 Gbps) IP networks
LD JCTD used standards-based COTS technology and components to demonstrate a secure cloud computing infrastructure operating over the DISN
LD JCTD – Instant Data Access
Structured Data Access
• Global ingest, processing, distribution of data
• Data is “known” from the moment of ingest
• Direct file access (file search not required)
Global File System
• Scalable, cost-effective storage with inherent COOP
• Globally distributed clustered file system
• Local caching (for performance)
• Direct file access (file transfer not required)
RDMA interconnect
• Fast data access times (seconds, not hours)
• Efficient WAN usage, fast data transfers
• Scalable, cost-effective, low-power data centers
Instantaneous data access model: “as though the data were on your desktop”
LD Technology Readiness Level
▪ Demonstrated TRL-7, with major elements of TRL-8 and TRL-9
▪ Successful LD operations in mission environment
LD Transition
▪ DoD and IC programs of record are adopting LD benefits and capabilities in FY10/11 for:
  ▫ Rapid, global data access and federated exploitation for very large files such as imagery and wide area persistent surveillance
  ▫ Operationally responsive data dissemination/transfer
  ▫ Data federation & synchronization for planning
  ▫ Support to global intelligence operations
  ▫ Enhancing net-centric data delivery to warfighters
SC09 Demonstration for Laboratory for Advanced Computing @ UIC / NRL
• Adjusted BW in presence of congestion and loss to achieve max rates
• Rendered images in Portland, processed in DC, then retransmitted back to Portland
• Displayed high-res (2560x1600, 70 fps) video stream in Portland from Washington
• IBoIPv6oE, switched and routed, including RDMA
[Diagram: NRL/SC09 and LAC/SC09 in Portland, OR; LAC/UIC in Chicago, IL; NRL LD HPC Testbed in Washington, DC; data centers linked over NLR Layer 3 (IPv6), the Open Cloud Testbed Layer 3 (IPv4), and Layer 2 10-GigE; components include workstations, WAN gateways, InfiniBand 4x SDR/DDR switch fabric, an IP router (IPoIB-to-IPoE), Ethernet switches, FPGAs, a TeraFlow monitor, and a 1 PB disk array; encapsulations IBoIPv6oE, UDToIPv6oE, UDToE, and IPv4oE]
18 November 2009
Super Computing 2009 Bandwidth Challenge Winner (“Overall” and “Manifold-Process Implementation” Categories)
Dr. Hank Dardy, 1943-2010
Questions?
LD JCTD Concept of Operations
[Diagram: Moving Data to Users]
Large Data Reference Model
[Diagram: Client Domain (IP/Eth) workstations on the DISN; Client Domain (RDMA/IB) workstations on InfiniBand; InfiniBand LAN GW; Core Domain with web servers, application servers, InfiniBand switches, Lustre file system, storage controller, and disk drives; WAN Domain with encryptors and WAN GWs linking InfiniBand switches to remote systems]
▪ Open Standards
▪ Open Systems
▪ Open Source
▪ Commodity Parts
Scalable to 1000s of Nodes: Petabyte File System today…Exabytes tomorrow
Warfighter Problem
[Chart: data volume (terabytes, 10^12; petabytes, 10^15; exabytes, 10^18) from 2000 through today (2010) to 2015 & beyond; warfighter data such as UUV and FIRESCOUT VTUAV data against GIG data capacity (services, transport & storage), with a notional gap addressed by the Large Data JCTD]
Summary
▪ LD underpins net-centric warfighting by providing a data-centric DoD information enterprise
  ▫ LD seeded in key programs
  ▫ Next generation performance (scalable to exabytes) in a smaller footprint at lower cost
▪ Working with Transition Partners to ensure integrated enterprise implementation
LD JCTD – What Does It Mean?
Standard DVD = 4.7 GB; 1 Hr of WAOS* raw data = 1 TB
▪ 5 Mbps (Home Connection)
  ▫ DVD: 2.1 Hrs
  ▫ WAOS: 18.5 Days
▪ OC-3 (155 Mbps – decent business connection)
  ▫ DVD: 4 Min
  ▫ WAOS: 14 Hrs
▪ 1 Gbps (Standard GIG/DISN LAN)
  ▫ DVD: 37.6 Seconds
  ▫ WAOS: 2.2 Hrs
▪ 10 Gbps (OC-192 – 8 Gbps effective)
  ▫ DVD: 4.7 Seconds
  ▫ WAOS: 16.7 Min… still too slow…
Shared File Systems and Distributed SAN structure – no “FTP” latency
• DVD and WAOS: nearly instant, ≤ 5 seconds effective remote access time
* Wide Area Optical Surveillance
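Each transfer time on this slide is size divided by effective rate. The sketch below reproduces them, assuming decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes, 1 Mbps = 10^6 bits/s), which the slide's figures match, and using the 8 Gbps effective rate quoted for OC-192. The names are illustrative.

```python
# Transfer time = (size in bytes * 8 bits/byte) / (effective rate in bits/s).
DVD_BYTES = 4.7e9   # standard DVD, 4.7 GB
WAOS_BYTES = 1e12   # 1 hour of WAOS raw data, 1 TB

LINKS_BPS = {
    "5 Mbps home connection": 5e6,
    "OC-3 (155 Mbps)": 155e6,
    "1 Gbps GIG/DISN LAN": 1e9,
    "OC-192 (8 Gbps effective)": 8e9,
}


def transfer_seconds(size_bytes: float, rate_bps: float) -> float:
    return size_bytes * 8 / rate_bps


def human(seconds: float) -> str:
    """Format a duration in the largest convenient unit."""
    for unit, length in (("days", 86400), ("hrs", 3600), ("min", 60)):
        if seconds >= length:
            return f"{seconds / length:.1f} {unit}"
    return f"{seconds:.1f} seconds"


for name, rate in LINKS_BPS.items():
    dvd = human(transfer_seconds(DVD_BYTES, rate))
    waos = human(transfer_seconds(WAOS_BYTES, rate))
    print(f"{name:27} DVD: {dvd:>13}   WAOS: {waos:>11}")
```

Even at 8 Gbps effective, one hour of WAOS data takes about 16.7 minutes to move, which is the motivation for shared file system access instead of whole-file transfer.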
Large Data JCTD
Global Access, Global Visualization
OE: Fritz Schultz, 703.697.3443
OM: Randy Heth, 402.232.2122
TM: Jim Hofmann, 202.404.3132
XM: Mike O’Brien, 703.735.2721
XM: Mike Laurine, 703.882.1358