PowerPoint - OptIPuter


OptIPuter System Software
Andrew A. Chien
Computer Science and Engineering, UCSD
January 2005
OptIPuter All-Hands Meeting
System Software
OptIPuter Software Architecture for Distributed Virtual Computers v1.1
[Figure: layered architecture diagram]
• OptIPuter Applications (Visualization, ...)
• DVC/Middleware: DVC #1, DVC #2, DVC #3; Higher Level Grid Services; Security Models; Data Services: Real-Time Objects; High-Speed Transport (Layer 5: SABUL, RBUDP, DWTP, Fast, GTP; Layer 4: XCP)
• Grid and Web Middleware – (Globus/OGSA/WebServices/J2EE)
• Optical Signaling/Mgmt: λ-configuration, Net Management
• Node Operating Systems
• Physical Resources
January 2003, OptIPuter All Hands Meeting
System Software
OptIPuter Software Architecture
[Figure: block diagram of the software architecture]
• Distributed Applications / Web Services: Visualization, Telescience, SAGE, Data Services, JuxtaView, Vol-a-Tile, LambdaRAM
• DVC API and DVC Runtime Library
• DVC Services: DVC Configuration, DVC Communication, DVC Job Scheduling
• DVC Core Services: Resource Identify/Acquire, Namespace Management, Security Management, High Speed Communication, Storage Services
• Underlying services: GSI, XIO, RobuStore, Globus, PIN/PDC, GRAM
• Transport protocols: GTP, CEP, XCP, LambdaStream, UDT, RBUDP
System Software
System Software/Middleware Progress
• Significant Progress in Key Areas!
• A unified Vision of Application Interface to the OptIPuter Middleware
– Distributed Virtual Computer: Simpler Application Models, New Capabilities
– 3-Layer Demonstration: JuxtaView/LambdaRAM Tiled Viz on DVC on Transports
• Efficient Transport Protocols to exploit High Speed Optical Networks
– RBUDP/LambdaStream, XCP, GTP, CEP, SABUL/UDT
– Single Streams, Converging Streams, Composite Endpoint Flows
– Unified Presentation under XIO (single application API)
• Performance Modeling
– Characterization of Vol-a-tile Performance on Small-scale Configurations
• Real-time
– Definition of a Real-time DVC, Components for Layered RT Resource Management – IRDRM, RCIM
• Storage
– Design and Initial Simulation Evaluation of LT Code-based Techniques for Distributed Robust (low variance of access, guaranteed bandwidth) Storage
• Security
– Efficient Group Membership Protocols to support Broadcast and Coordination across OptIPuters
System Software
Cross Team Integration and Demonstrations
• TeraBIT Juggling, 2-layer Demo [SC2004, November 8-12, 2004]
– Distributed Virtual Computer, OptIPuter Transport Protocols (GTP)
– Move data between OptIPuter Network Endpoints (UCSD, UIC, Pittsburgh)
– Share efficiently; Good Flow Behavior, Maximize Transfer Speeds (saturate all receivers)
– Configuration: 10 endpoints, 40+ nodes, 1000's of miles
– Achieved 17.8 Gbps, a TeraBIT in less than one minute!
• 3-layer Demo [AHM2005, January 26-27, 2005]
– Visualization, Distributed Virtual Computer, OptIPuter Transport Protocols
• 5-layer Demo [iGrid, September 26-28, 2005 ??]
– Biomedical/Geophysical, Visualization, Distributed Virtual Computer, OptIPuter Transport Infrastructure, Optical Network Configuration
System Software
OptIPuter Software “Stack”
[Figure: layered stack, top to bottom]
• Applications (Neuroscience, Geophysics)
• Visualization
• Distributed Virtual Computer (Coordinated Network and Resource Configuration)
• Novel Transport Protocols
• Optical Network Configuration
The 3-layer Demo covers Visualization, DVC, and Transport Protocols; the 5-layer Demo covers all five layers.
System Software
Year 3 Goals
• Integration and Demonstration of Capability
– All Five Layers (Application, Visualization, DVC, Transport Protocols, Optical Network Control)
– Across a Range of Testbeds
– With Neuroscience and Geophysical Applications
• Distributed Virtual Computer
– Integrate with Network Configuration (e.g. PIN)
– Deploy as persistent OptIPuter Testbed Service
– Alpha Release of DVC as a Library
• Efficient Transport Protocols
– LambdaStream: Implement, Analyze Effectiveness, Integrate with XIO
– GTP: Release and Demonstrate at Scale; Analytic Stability Modeling
– CEP: Implement and Evaluate Dynamic N-to-M Communication
– SABUL/UDT: Integrate with XIO; Flexible Prototyping Toolkit
– Unified Presentation under XIO (single application API)
• Performance Modeling
– Characterization of Vol-a-tile, JuxtaView Performance on Wide-Area OptIPuter
• Real-time
– Prototype RT DVC, Experiment: remote device control within Campus Scale OptIPuter
• Storage
– Prototype RobuSTore, Evaluate using OptIPuter Testbeds and Applications
• Security
– Develop and Evaluate High Speed / Low Latency Network Layer Authentication and Encryption
System Software
10Gig WANs: Terabit Juggling
• SC2004: 17.8 Gbps, a TeraBIT in < 1 minute!
• SC2005 goal: Juggle Terabytes in a Minute
[Figure: network map of the demonstration. 10 GE links connect UIC (Chicago), StarLight (Chicago), PNWGP (Seattle), the Trans-Atlantic link to NetherLight (Amsterdam), NIKHEF and U of Amsterdam in the Netherlands, SC2004 (Pittsburgh), CENIC Los Angeles, CENIC San Diego, UCSD/SDSC, and SDSC; additional 1-2 GE links reach UCI, ISI/USC, and the UCSD campus sites (JSOE, CSE, SIO).]
System Software
3-layer Integrated Demonstration
Nut Taesombut, Venkat Vishwanath, Ryan Wu, Freek Dijkstra,
David Lee, Aaron Chin, Lance Long
UCSD/CSAG, UIC, UvA, UCSD/NCMIR, etc.
January 2005, OptIPuter All Hands Meeting
1. Visualization Application (JuxtaView + LambdaRAM)
2. System SW Framework (Distributed Virtual Computer)
3. System SW Transports (GTP, UDT, etc.)
System Software
3-Layer Demo Configuration
• Configuration
– JuxtaView at NCMIR
– LambdaRAM Client at NCMIR
– LambdaRAM Servers at EVL and UvA
• High Bandwidth (2.5 Gbps, ~7 streams)
• Long Latencies, Two Configurations
[Figure: demo topology. GTP flows run from EVL/Chicago over NLR/CAVEWAVE (10G, 70 msec) and from UvA/Amsterdam over the transatlantic link (4G, 100 msec) to NCMIR/San Diego; output video is streamed over campus GE (10G, 0.5 msec) to audiences at SDSC/San Diego.]
System Software
Distributed Virtual Computers
Nut Taesombut and Andrew Chien
University of California, San Diego
January 2005
OptIPuter All-Hands Meeting
System Software
Distributed Virtual Computer (DVC)
• Application Request: Grid Resources AND Network Connectivity
– Redline-style Specification, 1st Order Constraint Language
• DVC Broker Establishes a DVC
– Binds End Resources, Switching, Lambdas
– Leverages Grid Protocols for Security, Resource Access
• DVC <-> Private Resource Environment, Surfaced thru WSRF
System Software
Distributed Virtual Computer (DVC)
• Key Features
– Single Distributed Resource Configuration Description and Binding
– Simple use of Optical Network Configuration and Grid Resource Binding
– Single Interface to Diverse Communication Capabilities (Transport Protocols, Novel Communication Capabilities)
• Using a DVC (a toy sketch of this sequence follows below)
– Application presents Resource Specification
– Requests Grid Resources and Lambda Connectivity
– DVC Broker Selects Resources and Network Configuration
– DVC Broker Binds Resources, Configures the Network, and Returns a List of Bound Resources and Their Respective (Newly Created) IPs
– Application Uses These IPs to Access the Created Network Paths
– Application Selects Communication Protocols and Mechanisms amongst Bound Resources
– Application Executes
– Application Releases the DVC
System Software
[Taesombut & Chien, UCSD]
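The "Using a DVC" sequence above can be pictured with a small, self-contained toy in Python. The broker logic here (a static resource table and a first-fit match) and every name in it are stand-ins for illustration only, not the actual DVC runtime library or its API:

    # Toy sketch of the DVC workflow: match a resource specification, bind
    # resources, and hand back newly created private IPs.  All names hypothetical.
    RESOURCES = {                                   # candidate physical resources
        "ncmir-viz":  {"type": "vizcluster", "free_memory": 4096},
        "rembrandt0": {"type": "storage",    "free_memory": 2048},
        "rembrandt1": {"type": "storage",    "free_memory": 2048},
    }

    def bind(spec):
        """Steps 1-3: select a matching resource for each request, assign a private IP."""
        bindings, host = {}, 12
        for name, want in spec.items():
            used = {rid for rid, _ in bindings.values()}
            for rid, have in RESOURCES.items():
                if rid not in used and have["type"] == want["type"] \
                        and have["free_memory"] > want["min_memory"]:
                    bindings[name] = (rid, "192.168.85.%d" % host)
                    host += 1
                    break
        return bindings

    spec = {"viz":  {"type": "vizcluster", "min_memory": 1700},
            "str1": {"type": "storage",    "min_memory": 1700},
            "str2": {"type": "storage",    "min_memory": 1700}}

    # Steps 4-8: the application would now use the returned IPs, choose transport
    # protocols among the bound resources, execute, and finally release the DVC.
    for name, (rid, ip) in bind(spec).items():
        print(name, "bound to", rid, "at", ip)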
JuxtaView and LambdaRAM on DVC Example
(1) Requests a Viz Cluster, Storage Servers, and High-Bandwidth Connectivity
[Figure: the application sends its requirements and preferences (communication + end resources) to the DVC Manager, which consults Resource/Network Information Services (Globus MDS) and configures physical resources and the network.]
Resource specification:
[ viz ISA [type =="vizcluster"; InSet(special-device, "tiled display")];
str1 ISA [free-memory>1700; InSet(dataset, "rat-brain.rgba")];
str2 ISA [free-memory>1700; InSet(dataset, "rat-brain.rgba")];
str3 ISA [free-memory>1700; InSet(dataset, "rat-brain.rgba")];
str4 ISA [free-memory>1700; InSet(dataset, "rat-brain.rgba")];
Link1 ISA [restype = "conn"; ep1 = <viz>; ep2 = <str1>; bandwidth > 940; latency <= 100];
Link2 ISA [restype = "conn"; ep1 = <viz>; ep2 = <str2>; bandwidth > 940; latency <= 100];
Link3 ISA [restype = "conn"; ep1 = <viz>; ep2 = <str3>; bandwidth > 940; latency <= 100];
Link4 ISA [restype = "conn"; ep1 = <viz>; ep2 = <str4>; bandwidth > 940; latency <= 100] ]
System Software
Selected resources:
viz1: ncmir.ucsd.sandiego
str1: rembrandt0.uva.amsterdam
str2: rembrandt1.uva.amsterdam
str3: rembrandt2.uva.amsterdam
str4: rembrandt6.uva.amsterdam
Network configuration:
(rembrandt0, yorda0.uic.chicago) --- BW 1, LambdaID 3
(rembrandt1, yorda0.uic.chicago) --- BW 1, LambdaID 4
(rembrandt2, yorda0.uic.chicago) --- BW 1, LambdaID 5
(rembrandt6, yorda0.uic.chicago) --- BW 1, LambdaID 17
JuxtaView and LambdaRAM on DVC Example
(2) Allocates End Resources and Communication
• Resource Binding (GRAM)
• Lambda Path Instantiation (PIN) (the current demo doesn't yet include this)
• DVC IP Allocation
[Figure: the DVC Manager and PIN Server assign private addresses 192.168.85.12 at NCMIR/San Diego and 192.168.85.13-16 at UvA/Amsterdam to the bound resources.]
System Software
JuxtaView and LambdaRAM on DVC Example
(3) Create Resource Groups
• Storage Group
• Viz Group
[Figure: the DVC Manager places 192.168.85.13-16 (UvA/Amsterdam) in the Storage Group and 192.168.85.12 (NCMIR/San Diego) in the Viz Group.]
System Software
JuxtaView and LambdaRAM on DVC Example
(4) Launch Applications
• Launch LambdaRAM Servers
• Launch JuxtaView / LambdaRAM Clients
[Figure: the DVC Manager launches the LambdaRAM servers on the Storage Group at UvA/Amsterdam and the JuxtaView/LambdaRAM clients on the Viz Group at NCMIR/San Diego.]
System Software
OptIPuter Component Technologies
1. Real-time DVC’s
2. Application Performance Analysis
3. High Speed Transports (CEP, LambdaStream, XCP, GTP,
UDT)
4. Storage
5. Security
System Software
Vision – Real-Time Tightly Coupled Wide-Area Distributed Computing
Goals
• High-precision Timings of Critical Actions
• Tight Bounds on Response Times
• Ease of Programming
– High-Level Programming
– Top-Down Design
• Ease of Timing Analysis
[Figure: a real-time object network spanning a dynamically formed Distributed Virtual Computer.]
System Software
Source: Kim, UCI
Real-Time DVC Architecture
• Real-time Application: the application expressed as real-time objects and links with various latency constraints
• Real-Time Object Network / TMO Real-Time Middleware: schedules and manages underlying resources to achieve the desired real-time behavior
• Distributed Virtual Machine: a collection of resources with known performance and security capabilities, and control & management; libraries that realize initial configuration and ongoing management; provides simple resource and management abstractions, hiding detailed resource management (i.e. network provisioning, machine reservation)
• High Speed Protocols / Network Management / Basic Resource Management: controls and manages "single" resources
System Software
Real-Time: from LAN to WAN
• RT grid (or subgrid) ::= A grid (or subgrid) facilitating
(RG1) Message communications with easily determinable tight latency bounds, and
(RG2) Computing node operations enabling easy guaranteeing of timely progress of threads toward computational milestones
• RG1 realized via
– Dedicated optical-path WAN
– Campus networks, the LAN part of the RT grid, equipped with Time-Triggered (TT) Ethernet switches (a new research task in collaboration with Hermann Kopetz)
System Software
Source: Kim, UCI
Real-Time DVC
(RD1) Message paths with easily determinable tight latency bounds.
(RD2) In each computing or sensing-actuating site within the RT DVC, computing nodes must exhibit timing behaviors which do not differ from those of computing nodes in an isolated site by more than a few percent. Also, computing nodes in an RT DVC must enable easy procedures for assuring the very high probability of application processes and threads reaching important milestones on time.
=> Computing nodes must be equipped with appropriate infrastructure software, i.e., OS kernel & middleware with easily analyzable QoS.
(RD3) If representative computing nodes of two RT DVCs are connected via RT message paths, then the ensemble consisting of the two DVCs and the RT message paths is also an RT DVC.
Source: Kim, UCI
System Software
Middleware for Real-Time DVC
" Let us start a chorus at 2pm "
" e-Science "
data
data
data
Acq of l’s;
Alloc of Virtual l’s;
Coord of msg-send
timings
Support exec of
appls via
Alloc of comp &
comm resources
within DVC
On-demand creation of DVCs
RGRM
RT grid resource management
RCIM
IRDRM
RT comm infrastr mgt
Intra-RT-DVC res mgt
Basic Infrastructure Services
IRDRM agent
RCIM agent
Globus System
System Software
l-Configuration
Source: Kim, UCI
Net Management
Progress
• RCIM (RT comm infrastructure mgt)
– Study of TT Ethernet began with the help of Hermann Kopetz
– The 1st unit is expected to become available to us by June 2005.
• IRDRM (Intra-RT-DVC resource mgt)
– TMO (Time-triggered Message-triggered Object) Support Middleware (TMOSM) adopted as a starting base
– A significantly redesigned version (4.1) of TMOSM (for improved modularity, concurrency, and portability) has been developed. It runs on Linux, WinXP, and WinCE.
– An effort to extend TMOSM to fit into the Jenks' cluster began.
[Figure: components of a C++ TMO object — AAC, TT Method 1, TT Method 2, Service Method 1, Service Method 2, and their deadlines; no threads, no priorities; a high-level programming style. A caricature of this style appears in the sketch after this slide.]
Source: Kim, UCI
System Software
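For readers unfamiliar with TMO, the following self-contained sketch caricatures the programming style shown in the figure: methods are tied to global-time periods and deadlines, and a runtime scheduler, not application threads or priorities, decides when they run. It is a simulated-time toy, not TMOSM, and all names are invented for illustration:

    # Toy time-triggered object executor in the spirit of TMO: methods registered
    # with a period and deadline, driven by a (simulated) global clock.  Not TMOSM.
    import heapq

    class TimeTriggeredObject:
        def __init__(self):
            self.methods = []                          # (period_ms, deadline_ms, fn)
        def tt_method(self, period_ms, deadline_ms):
            def register(fn):
                self.methods.append((period_ms, deadline_ms, fn))
                return fn
            return register

    def run(obj, until_ms):
        queue = [(0, i, p, d, fn) for i, (p, d, fn) in enumerate(obj.methods)]
        heapq.heapify(queue)                           # entries keyed by release time
        while queue and queue[0][0] <= until_ms:
            t, i, period, deadline, fn = heapq.heappop(queue)
            fn(t)                                      # completion would be checked against t + deadline
            heapq.heappush(queue, (t + period, i, period, deadline, fn))

    chorus = TimeTriggeredObject()

    @chorus.tt_method(period_ms=500, deadline_ms=50)
    def sing(now_ms):
        print(now_ms, "ms: la")                        # "start a chorus" every 500 ms

    run(chorus, until_ms=2000)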
Progress
(cont.)
• Programming model
– An API wrapping the services of the RT middleware enables high-level RT programming (TMO) without a new compiler.
– The notion of a Distance-Aware (DA) TMO, an attractive building block for RT wide-area DC applications, was created and a study of its realization began.
• Application development experiments
– Fair and efficient Distributed On-Line Game Systems and LAN-based feasibility demonstration
– Application of the global-time-based coordination principle
– A step towards an OptIPuter environment demonstration
• Publication
– A paper on distributed on-line game systems in IDPT2003 proc.
– A paper on distributed on-line game systems to appear in the ACM-Springer Journal on Multimedia Systems
– A keynote paper on RT DVC in AINA2004 proc.
– A paper on RT DVC middleware to appear in WORDS2005 proc.
Source: Kim, UCI
System Software
Year 3 Plan
• RCIM (RT comm infrastructure mgt)
– Development of middleware support for TT Ethernet
– The 1st unit of TT Ethernet switch is expected to become
available to us by June 2005.
• IRDRM (Intra-RT-DVC resource mgt)
– Extension of TMOSM to fit into clusters
– Interfacing TMOSM to the Basic Infrastructure Services of
OptIPuter
System Software
Source: Kim, UCI
Year 3 Plan
• Application development experiments
– An experiment for remote access and control within the UCI or
UCSD campus
– A step toward preparation of an experiment for remote access
and control of electron microscopes at UCSD-NCMIR
System Software
Source: Kim, UCI
Performance Analysis and Monitoring of Vol-a-Tile
Xingfu Wu <[email protected]>
 Use the Prophesy system to Instrument and Study Vol-a-Tile on a 5-node System
 Evaluate Performance Impact of Configuration (data servers, clients, network)
[Chart: data access time (seconds, 0-20) on 1+4 nodes for the datasets protein64x64x64, fuel64x64x64, foot256x256x256, geo256x256x256, geo440x290x198, and furdave160x255x75 under Scenarios 1, 2, and 3.]
[Wu & Taylor, TAMU]
Comparison of Vol-a-Tile Configuration Scenarios
Xingfu Wu <[email protected]>
[Chart: the same data access times (seconds, 0-20) on 1+4 nodes for the six datasets, compared across Scenarios 1, 2, and 3.]
Year 3+ Plans
Xingfu Wu <[email protected]>
 Port the instrumented Vol-a-Tile to a large-scale OptIPuter testbed for analysis (3/2005)
 Analyze the performance of the JuxtaView and LambdaRAM applications (6/2005)
 Where possible, develop models of data accesses for the different visualization applications (9/2005)
 Continue collaborating with Jason's group on viz applications (12/2005)
High Speed Protocols
System Software
High Performance Transport Problem
• OptIPuter is Bridging the Gap Between High Speed Link Technologies and the Growing Demands of Advanced Applications
• Transport Protocols Are the Weak Link
– TCP Has Well-Documented Problems That Militate Against its Achieving High Speeds: the Slow Start Probing Algorithm, Congestion Avoidance Algorithm, Flow Control Algorithm, Operating System Considerations, and Friendliness and Fairness Among Multiple Connections
– These Problems Are the Foci of Much Ongoing Work
– OptIPuter is Pursuing Four Complementary Avenues of Investigation
– RBUDP Addresses Problems of Bulk Data Transfer
– SABUL Addresses Problems of High Speed Reliable Communication
– GTP Addresses Problems of Multiparty Communication
– XCP Addresses Problems of General Purpose, Reliable Communication
System Software
OptIPuter Transport Protocols
[Figure: where each transport protocol applies, by end-to-end path]
• Allocated lambda, unicast: RBUDP / LambdaStream
• Allocated lambda, managed group: GTP
• Shared, routed path, standard routers: SABUL / UDT
• Shared, routed path, enhanced routers: XCP
• Composite Endpoint Protocol (Efficient N-to-M Communication)
System Software
Composite Endpoint Protocol (CEP)
Eric Weigle and Andrew A. Chien
Computer Science and Engineering
University of California, San Diego
OptIPuter All Hands Meeting, January 2005
System Software
Composite-EndPoint Protocol (CEP)
• Network Transfers Faster than Individual Machines ("Uh-oh!")
– A Terabit flow? A 100 Gbit flow? A 10 Gbps flow w/ 1 Gbps NICs?
– Clusters are a Cost-effective means to terminate Fast transfers
– Support Flexible, Robust, General N-to-M Communication
– Manage Heterogeneity, Multiple Transfers, Data Accessibility
System Software
[Weigle & Chien, UCSD]
Example
• Move Data from a Heterogeneous Storage Cluster (N)
• Exploit Heterogeneous network structure and Dedicated Lambdas
• Terminate in a Visualization Cluster (M)
• Render for a Tiled Display Wall (M)
– Data flow is not easy for the application to handle.
– May want to work locally to the storage cluster to offload checksum/buffering requirements or avoid a contested link.
System Software
Composite Endpoint Approach
• Transfers Move Distributed Data
– Provides a hybrid memory/file namespace for any transfer request
• Choose a Dynamic Subset of Nodes to Transfer Data
– Performance Management for Heterogeneity and Dynamic Properties, Integrated with Fairness
• API and Scheduling (see the sketch below)
– API enables easy use
– Scheduler handles performance, fairness, adaptation
• Exploit Many Transport Protocols
System Software
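As a rough illustration of the scheduling idea above, the toy below splits one logical transfer across sender/receiver pairs in proportion to the slower side of each pair. It is a sketch of the general approach under simplifying assumptions (equal numbers of senders and receivers, static capacities), not the CEP implementation:

    # Toy static N-to-M schedule: pair nodes and split the transfer in proportion
    # to each pair's bottleneck capacity.  Illustrative only.
    def schedule(total_bytes, senders, receivers):
        """senders / receivers: {node: capacity in Mbps}; returns bytes per pair."""
        pairs = [(s, r, min(sc, rc))                   # a pair is limited by its slower end
                 for (s, sc), (r, rc) in zip(senders.items(), receivers.items())]
        aggregate = sum(cap for _, _, cap in pairs)
        return {(s, r): total_bytes * cap // aggregate for s, r, cap in pairs}

    storage = {"s1": 1000, "s2": 1000, "s3": 400}      # heterogeneous storage-side NICs
    viz     = {"r1": 1000, "r2": 1000, "r3": 1000}     # uniform visualization cluster
    for (src, dst), nbytes in schedule(10 * 2**30, storage, viz).items():
        print("%s -> %s: %.0f MiB" % (src, dst, nbytes / 2**20))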
CEP Efficiently Composes Heterogeneous and Homogeneous Cluster Nodes
[Charts: flow bandwidth (Mbps) versus number of nodes. Left: heterogeneous nodes, comparing CEP against a uniform scheme and the ideal (axis up to ~7,000 Mbps, 1-8 nodes). Right: uniform nodes, comparing CEP against the ideal (axis up to ~50,000 Mbps, 1-43 nodes).]
• Seamless Composition of Performance across widely varying node performance
• High Composition efficiency, demonstrated 32 Gbps from 1 Gbps nodes!
– Efficiency increasing as the implementation improves
– Scaling suggests 1000-node Composites => Terabit Flows
• Next Steps: Wide Area, Dynamic Network Performance
System Software
Summary and Year 3 Plans
• Current Scheduling Mechanism is Static
– Selects nodes to move data
– Handles static heterogeneity (node/link capabilities)
– 32 Gbps in the LAN
• Simple API Specification
– Ease of use; the scheduler takes care of the transfer
– Allows Scatter/Gather with arbitrary constraints on data
• Plans: 1H2005
– XIO implementation: Use GTP, TCP, other transports
– Tuned WAN Performance
– Dynamic Transfer Scheduling (adapt to network and node conditions)
• Plans: 2H2005
– Security, code stabilization, optimization
– Initial Public Release
– 5-layer Demo Participation
– Better Dynamic Scheduling
– De-centralization
– Fault Tolerance
System Software
LambdaStream
Chaoyue Xiong, Eric He, Venkatram Vishwanath,
Jason Leigh, Luc Renambot, Tadao Murata,
Thomas A. DeFanti
January 2005
OptIPuter All Hands Meeting
Electronic Visualization Laboratory
University of Illinois at Chicago
LambdaStream (Xiong)
Applications Need High BW with low jitter
Idea
• Combine loss-based and rate-based techniques
• Loss type prediction; respond appropriately
• => Good BW and Low Jitter
[Charts: throughput of TCP and LS on the 1 Gbps link (annotated rates of 172 Mbps, 983 Mbps, and 1720 Mbps over a 4 s trace) and jitter (ms) of the TCP and LS flows with a 2 MB payload over 500 rounds; both charts are shown in detail on the later "Single Stream Experiment Result" slides.]
Electronic Visualization Laboratory
University of Illinois at Chicago
Loss Type Prediction
When packet loss occurs,
Average receiving interval =
Loss Types:
• Continuous decrease in receiving capability
• Occurrence of congestion in the link
• Sudden decrease in receiving capability or random loss
Electronic Visualization Laboratory
University of Illinois at Chicago
Incipient undesirable situations avoidance (1)
• When there is no loss, a longer receiving packet interval indicates link congestion or lower receiving capability.
[Figure: packets leave the sender with spacing ∆ts, pass a bottleneck router, and arrive at the receiver with spacing ∆tr (windows wi, wi+1).]
Electronic Visualization Laboratory
University of Illinois at Chicago
Incipient undesirable situations avoidance (2)
• Metric (a toy version of this metric is sketched after this slide):
– Ratio between the sending interval and the average receiving interval during one epoch.
• Methods to improve precision
– Use a weighted addition of receiving intervals from the previous three epochs.
– Exclude unusual samples.
Electronic Visualization Laboratory
University of Illinois at Chicago
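A minimal, self-contained version of the metric just described is sketched below. The three-epoch weights and the outlier rule are arbitrary placeholders chosen for illustration; the actual LambdaStream parameters are not given in these slides:

    # Toy congestion hint: ratio of the sending interval to a weighted average of
    # the receiving intervals from the previous three epochs, excluding outliers.
    def avg_recv_interval(epochs, weights=(0.5, 0.3, 0.2)):
        """epochs: per-epoch lists of packet inter-arrival times (us), newest first."""
        def trimmed_mean(xs):                          # drop the top/bottom 10% of samples
            xs = sorted(xs)
            k = len(xs) // 10
            xs = xs[k:len(xs) - k] or xs
            return sum(xs) / len(xs)
        return sum(w * trimmed_mean(e) for w, e in zip(weights, epochs))

    def congestion_ratio(send_interval_us, recent_epochs):
        """A ratio below 1 means packets arrive more spread out than they were sent,
        hinting at link congestion or reduced receiving capability."""
        return send_interval_us / avg_recv_interval(recent_epochs)

    epochs = [[105, 110, 98], [100, 102, 99], [97, 101, 100]]   # us, newest first
    print(round(congestion_ratio(100, epochs), 3))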
Single Stream Experiment Result (1)
[Chart: throughput (Mbps) of TCP and LS on the 1 Gbps link over a 4 s trace, with annotated rates of 172 Mbps, 983 Mbps, and 1720 Mbps.]
Electronic Visualization Laboratory
University of Illinois at Chicago
Single Stream Experiment Result (2)
[Chart: jitter (ms, 0-120) of the TCP and LS flows with a 2 MB payload over 500 rounds.]
Electronic Visualization Laboratory
University of Illinois at Chicago
Year 3 Plans
• Development of an XIO driver
• Experiments with multiple streams
• Integrate with TeraVision and SAGE
• Use formal modeling (Petri Nets) to improve the scalability of the algorithm
Electronic Visualization Laboratory
University of Illinois at Chicago
Information Sciences
Institute
Joe Bannister
Aaron Falk
Jim Pepin
Joe Touch
OptIPuter Project
Progress
January 18, 2005
OptIPuter XCP Progress
Design of Linux XCP port
Net100 tweaks
Makes most sense for end-systems only; little benefit by changing the OS for XCP routers
Strategy is to put XCP in the generic Linux 2.6 kernel, then port to Net100 (Net100 optimizations are largely orthogonal to XCP)
Technical challenges exist in extending the Linux kernel to handle the 64-bit arithmetic needed for XCP (illustrated in the sketch below)
The Linux port is pending conclusion of ongoing design work to eliminate line-rate divide operations from the router
[Bannister, Falk, Pepin, Touch ISI]
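For context, the published XCP design computes an aggregate feedback each control interval, roughly phi = alpha * d * (C - y) - beta * Q, where C - y is the spare bandwidth, d the mean RTT, and Q the persistent queue (alpha = 0.4, beta = 0.226). The sketch below evaluates that expression in shift-based fixed-point arithmetic to illustrate why wide (64-bit) integer math, and avoiding divides in the fast path, matter; the scaling choices are illustrative, not the ISI implementation:

    # Fixed-point illustration of XCP's per-interval aggregate feedback,
    #   phi = alpha * d * (C - y) - beta * Q,
    # using Q16 scaling instead of floating point or division in the fast path.
    # The intermediate products here already overflow 32-bit arithmetic.
    SHIFT = 16
    ALPHA = int(0.4   * (1 << SHIFT))
    BETA  = int(0.226 * (1 << SHIFT))

    def aggregate_feedback(capacity_bps, input_bps, queue_bytes, rtt_us):
        spare_bits = (capacity_bps - input_bps) * rtt_us // 1_000_000   # spare bits per RTT
        return (ALPHA * spare_bits - BETA * queue_bytes * 8) >> SHIFT   # feedback, in bits

    # e.g. a 10 Gb/s link carrying 9.2 Gb/s with a 500 kB standing queue and 60 ms RTT
    print(aggregate_feedback(10_000_000_000, 9_200_000_000, 500_000, 60_000))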
OptIPuter XCP Activities
Workshops
 Aaron Falk, Ted Faber, Eric Coe, Aman Kapoor, and Bob Braden. Experimental Measurements of the eXplicit Control Protocol. Second Annual Workshop on Protocols for Fast Long Distance Networks. February 16, 2004. http://www.isi.edu/isi-xcp/docs/falk-pfld04-slides-2-16-04.pdf
 Aaron Falk. NASA Optical Network Testbeds Workshop. August 9-11, 2004, NASA Ames Research Center. User Application Requirements, Including End-to-end Issues. http://duster.nren.nasa.gov/workshop7/report.html
Papers
 Aaron Falk and Dina Katabi. Specification for the Explicit Control Protocol (XCP), draft-falk-xcp-00.txt (work in progress), October 2004. http://www.isi.edu/isi-xcp/docs/draft-falk-xcp-spec-00.txt
 Aman Kapoor, Aaron Falk, Ted Faber, and Yuri Pryadkin. Achieving Faster Access to Satellite Link Bandwidth. Submitted to Global Internet 2005. December 2004. http://www.isi.edu/isi-xcp/docs/kapoor-pep-gi2005.pdf
OptIPuter Network Infrastructure
Deployed GBE link between CENIC I2 cloud and ISI
Operational for NSF site visit
Used extensively by viz and Globus groups
Group Transport Protocol (GTP)
Ryan Wu and Andrew A. Chien
Computer Science and Engineering
University of California, San Diego
OptIPuter All Hands Meeting, January 2005
System Software
Optical Network Cores Shift Contention to Network Edge
• Lambda-Grid: Dedicated Optical Connections Provide Plentiful Core Bandwidth
• Driving Applications Access Many High Data Rate Sources
– Multipoint-to-point communication
• => Congestion moves to the endpoints
• Group Transport Protocol: Rate-based + Receiver-Based Management
[Figure: senders S1-S3 converging on receiver R over (a) a shared IP network and (b) dedicated lambda connections.]
System Software
[Wu & Chien, UCSD]
GTP: Receiver-based Congestion Management
• Request-response for Reliable Data Transfer
• Receiver-based Flow Co-scheduling for Fairness and Low Loss Rate
– Balance Concurrent Data Fetching from Multiple Sources
– Fair across Varied Sender RTTs
– Efficient Transitions under Rapid Changes
• Single Flow Adaptation and Capacity Estimation
[Figure: GTP receiver architecture — applications sit on GTP, which combines per-flow control and monitoring with centralized rate allocation across receivers R1, R2, ...; data flows over UDP with control over TCP, on IP; multipoint-to-point contention is resolved at the receivers. A toy rate-allocation sketch follows below.]
System Software
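To make the "centralized rate allocation" box concrete, here is a small sketch of one receiver-based policy: share the receiver's capacity max-min fairly among active senders, capped by each sender's own limit. It illustrates receiver-oriented allocation in general and is not the published GTP algorithm:

    # Toy receiver-side rate allocator (max-min fair split of receiver capacity
    # across senders, honouring per-sender limits).  Illustrative only.
    def allocate(receiver_capacity_mbps, sender_limits_mbps):
        remaining, rates = receiver_capacity_mbps, {}
        pending = dict(sender_limits_mbps)
        while pending:
            share = remaining / len(pending)
            capped = {s: lim for s, lim in pending.items() if lim <= share}
            if not capped:                             # everyone can use an equal share
                rates.update({s: share for s in pending})
                break
            for s, lim in capped.items():              # slower senders keep their own limit
                rates[s] = lim
                remaining -= lim
                del pending[s]
        return rates

    # a 1 Gbps receiver with three senders whose own limits are 1000, 300 and 1000 Mbps
    print(allocate(1000, {"s1": 1000, "s2": 300, "s3": 1000}))
    # -> {'s2': 300, 's1': 350.0, 's3': 350.0}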
Quick Single Flow Rate Adaptation
GTP flow 1 starts at t = 0 with capacity 1000 Mbps; flow 2 starts at t = 2 s, and its maximum transmission rate is 300 Mbps.
A single GTP flow (flow 1) is able to quickly probe the available bandwidth.
System Software
Group Transport Protocol (GTP)
• Multipoint Performance in NS2 Simulations
– Four GTP flows with RTTs of 20, 40, 60 and 80 ms starting at times 0, 2, 3, and 4 s.
[Figure: converging flows — senders S1-S4 at 20, 40, 60, and 80 ms from receiver R.]
• GTP uses Receiver-based Management to achieve Rapid Convergence and Fair Allocation
System Software
[Wu & Chien, UCSD]
Quick Adaptation to Flow Transition
• GTP Simulation, Emulation, and TCP Simulation
• The Second Flow begins at t = 10 seconds
[Figure: converging flows — senders S1 and S2 at 50 ms and 25 ms from receiver R.]
• GTP Utilizes the Network Efficiently through Flow Transitions
System Software
Benefits of Receiver-Based Control
• SDSC -- NCSA, 10 GB transfer (1 Gbps link capacity), 58 ms RTT
• Convergent Flows
• GTP outperforms the other Rate-based Protocols due to Receiver-oriented management
[Figure: converging flows — senders S1, S2, S3 at NCSA to receiver R at SDSC.]
                      RBUDP    UDT    GTP
Throughput (Mbps)       443    811    865
Loss Ratio (%)         53.3    8.7   0.06
System Software
Year 3 Plan
1H2005
• GTP Implementation and Testing
– Release a reliable version of GTP with an XIO driver
• Comprehensive comparison studies between GTP and other transport protocols
• Demonstrations with OptIPuter System Software
2H2005
• Formal stability proofs for GTP will be Developed
– Proof of stability and convergence properties of GTP
– Networking conference publication
• Extend GTP to Sender Capacity Management
– Sender-side contention managed to achieve good global performance and fairness
– From single M-to-1 to Multiple M-to-1 (senders to multiple receivers)
System Software
UDP Data Transport (UDT)
Robert L Grossman, Yunhong Gu, Xinwei Hong, &
David Hanley
National Center for Data Mining
University of Illinois at Chicago
OptIPuter All Hands Meeting, January 2005
System Software
Composable Protocol Toolkit (CPT)
(UIC-LAC)
• Concept / Goals:
– Some Applications will send multiple high volume flows (teraflows) over a single lambda
– The Application interface to OptIPuter Communication is via the XIO interface
– Specialized congestion control (CC) algorithms may be needed for these teraflows
– Idea: Accelerate development of new congestion control algorithms with a toolkit — a new congestion control implementation corresponds to different CPT CC functions (a sketch of such a plug-in interface follows below)
– Project co-funded by NSF & DOE
• Accomplishments:
– Developed a prototype Composable Protocol Toolkit
– Interpreted UDT as a new type of AIMD protocol called Decreasing Increases AIMD
– Conducted initial experimental studies
• Future:
– Continue development and testing of the Composable Protocol Toolkit (CPT)
– Use CPT to explore congestion control algorithms
System Software
[Grossman, UIC]
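To illustrate the toolkit idea (swappable congestion control behind a fixed transport), here is a hedged sketch of a plug-in interface. The class names and the behaviour of the "Decreasing Increases" variant are invented for illustration and are not the actual CPT or UDT API:

    # Hypothetical congestion-control plug-in in the spirit of CPT: a new algorithm
    # only overrides the event callbacks; the transport machinery stays unchanged.
    class CongestionControl:
        def __init__(self):
            self.rate_pps = 1000.0                     # current sending rate, packets/s
        def on_ack(self, rtt_s): ...
        def on_loss(self, n_lost): ...

    class ToyDecreasingIncreasesAIMD(CongestionControl):
        """AIMD whose additive increase shrinks as the rate grows (name borrowed
        from the slide; the constants and formula here are made up)."""
        def on_ack(self, rtt_s):
            self.rate_pps += 10.0 / (1.0 + self.rate_pps / 1000.0)
        def on_loss(self, n_lost):
            self.rate_pps *= 0.875

    cc = ToyDecreasingIncreasesAIMD()
    for _ in range(100):
        cc.on_ack(0.06)
    cc.on_loss(1)
    print(round(cc.rate_pps, 1))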
Storage Research Activities
Huaxia Xia, Justin Burke, and Andrew Chien
University of California, San Diego
January 2005
OptIPuter All Hands Meeting
System Software
RobuSTore: Robust Performance (Gigabytes/Second) from
Geographically Distributed Storage
• RobuSTore: Statistical Storage
– Systematic Introduction of Redundancy, High Efficiency LDPC Codes across Distributed Storage
– Improve Aggregate Statistical Properties of Access => Guaranteed, High Performance
– Predictable Access Latency, Isolatable Performance in Shared Environments
• Goals
– Distributed RobuSTore System
– Support Flexible Distributed Storage Sharing
System Software
Storage Progress
• High Performance File System Survey
– Study existing parallel/distributed file systems
– GPFS, Lustre, PVFS, Galley, DASF, Vesta, Armada, FAB, MPIO, Zebra, etc.
– No existing system meets the needs of the OptIPuter environment!
– => Selected Lustre (emerging Open Source Standard) as the Prototyping Environment
• Key Question: Can Erasure Codes be Applied in a High Performance System?
– Best previous performance: ~150 Mb/s (Luigi Rizzo)
– A New Memory Hierarchy Tuned, Tiled Implementation Achieves 300+ MByte/s (about 16 times faster) on a 2 GHz Xeon
– Fast enough to keep up with the OptIPuter network
• RobuSTore Design: Complete at a High Level (see the sketch below for the basic erasure-coding idea)
– Detailed Analytical Modeling and Simulation is underway
– There are MANY (millions of) ways to apply the idea
– Initial Performance Results
System Software
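The property RobuSTore is after — read back from whichever k of n disks answer first — can be pictured with a trivially small code. The sketch below uses a 2-of-3 XOR parity code purely as a stand-in for the LDPC/LT codes the project actually studies:

    # Toy "any k of n" read: write three coded blocks to three disks, reconstruct
    # the data from any two of them.  A 2-of-3 parity code stands in for LDPC/LT.
    def encode(a: bytes, b: bytes):
        assert len(a) == len(b)
        parity = bytes(x ^ y for x, y in zip(a, b))
        return {"d0": a, "d1": b, "p": parity}         # one block per disk

    def decode(blocks):
        """Reconstruct from ANY two of the three blocks."""
        if "d0" in blocks and "d1" in blocks:
            return blocks["d0"] + blocks["d1"]
        if "d0" in blocks:                             # d1 slow or lost: d1 = d0 XOR p
            return blocks["d0"] + bytes(x ^ y for x, y in zip(blocks["d0"], blocks["p"]))
        return bytes(x ^ y for x, y in zip(blocks["d1"], blocks["p"])) + blocks["d1"]

    stored = encode(b"rat-b", b"rain.")
    fastest_two = {k: stored[k] for k in ("d1", "p")}  # suppose disk d0 is the laggard
    print(decode(fastest_two))                         # -> b'rat-brain.'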
Preliminary RobuSTore Simulation Results
[Chart: read of 1 GB of data, simple striping versus erasure-coded striping; disks of the same type with different layouts; simple striping at 1-16x storage overhead, erasure code at 3x storage overhead.]
• Read 1 GB of Data: Simple Striping versus Erasure-Coded Striping
• RobuSTore's use of Erasure Codes Improves
– Average Performance by 3-5x
– Standard Deviation by 3x
System Software
Year 3 Plans
• Extensive Simulations of RobuSTore Design and Testbed Configurations
– Evaluate Alternatives
– Provide Configuration Guidelines for Layout, Striping Algorithms
• Prototype Implementation on Lustre
– Experiments on UCSD Testbeds
– Exploit high speed OptIPuter Transport Protocols (GTP, CEP, etc.)
– Efficient Name Space Management and Metadata Service
– Evaluation Using Benchmarks and Neuroscience and Geophysical Application Workloads
System Software
Security
Mike Goodrich
University of California, Irvine
January 2005
OptIPuter All Hands Meeting
Broadcast Encryption
 Group controller (GC) broadcasts messages
 A set S of n devices receives every message
 A subset R of r devices from S is revoked
 The group controller should encrypt messages so that only non-revoked devices can decrypt them, even if the revoked devices collude
[Figure: the GC broadcasting to valid devices and revoked devices.]
Efficient Secure Broadcast Encryption
 Tree-based Membership Revocation (the hard part); a cover-computation sketch follows below
 Invented the first zero-state broadcast encryption scheme to achieve O(r) messages per broadcast and O(log n) keys per device, with r revoked devices
 Small number of keys per member
 Small number of messages (few round trips!)
 The constants are small and the schemes are practical
[Figure: the n devices as leaves of a binary tree.]
[Goodrich, Sun, Tamassia, UCI]
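For intuition about tree-based revocation in general, the sketch below computes the classic complete-subtree cover: devices are leaves of a binary tree, each device holds the keys on its root-to-leaf path, and a broadcast is encrypted under the keys of the maximal subtrees containing no revoked device. This is the textbook scheme (O(r log(n/r)) messages), not the O(r)-message zero-state construction described on the slide:

    # Complete-subtree cover for broadcast encryption (illustration of tree-based
    # revocation only).  Heap indexing: root = 1, children of v are 2v and 2v+1,
    # and the n device leaves are numbered n .. 2n-1.
    def cover(n, revoked_leaves):
        """Return the roots of the maximal clean subtrees covering all non-revoked leaves."""
        dirty = set()                                  # nodes whose subtree holds a revoked leaf
        for leaf in revoked_leaves:
            v = leaf
            while v >= 1:
                dirty.add(v)
                v //= 2
        keys = []
        def walk(v):
            if v not in dirty:                         # clean subtree: one key covers it
                keys.append(v)
            elif 2 * v < 2 * n:                        # internal node: split into children
                walk(2 * v)
                walk(2 * v + 1)
        walk(1)
        return keys

    print(cover(8, revoked_leaves={9, 13}))            # 8 devices at leaves 8..15 -> [8, 5, 12, 7]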
Range Counting in Geometric Data Streams
Bagchi, Chaudhary, Eppstein, Goodrich
 A Data Stream is a massive data set which is revealed one item at a time.
 Several data stream settings involve spatial data:
 Sensor data, e.g. for air quality measurement
 Traffic or herd monitoring, e.g. location information for mobile phones
 Scientific data
 The challenge is to perform useful computations on these data streams while maintaining a small memory footprint.
New Results for Data Streams
 Deterministic epsilon-Approximations for data streams can be computed in polylogarithmic time and space.
 These have many applications, including solving iceberg queries and in robust statistics.
Authentication for Weak Computational Devices
Atallah, Frikken, Goodrich, Tamassia
 Computationally "lightweight" schemes for performing biometric authentication without revealing information that can later be used to impersonate the user.
 The client and server need only perform cryptographic hash computations on the feature vectors, and do not perform any expensive public-key encryption operations.
 Appealing even in a framework of powerful devices capable of public-key signatures and encryptions.
Authentication for Weak Computational Devices, cont.
Atallah, Frikken, Goodrich, Tamassia
 Our schemes make it computationally infeasible for an attacker to impersonate a user even if the attacker completely compromises the information stored at the server.
 Likewise, our schemes make it computationally infeasible for an attacker to impersonate a user even if the attacker completely compromises the information stored at the client device.
Year 3 Plans
 UCI: Uncheatable Grid Computing
 USC/ISI [Touch & Bannister]: TranSec — High Speed Transport Security for OptIPuter
 Scalable defenses to protect TCP against SYN attacks, RST/data window attacks, etc., and UDP against port overload
 Applies FASTsec (IPsec++ for performance)
Information Sciences
Institute
Joe Bannister
Aaron Falk
Jim Pepin
Joe Touch
OptIPuter Project
Year 3 Plans
January 18, 2005
OptIPuter TranSec
Scalable defenses
• Protect TCP against SYN attacks, RST/data window attacks, etc.
• Protect UDP against port overload
Applies FASTsec (IPsec++ for performance)
• Pipelining, parallelism support
• Partial protection variants
Merges per-packet with per-data security
• Decouple header security from data security
FASTSec for OptIPuter
Pipelining support
• Reduces per-packet latency
• Multiple IPsec headers with chunked data
Parallelism support
• Multiple IPsec headers using different keys on a single stream, to enable parallel hardware
Partial / delayed protection
• Protect the header with IPsec on-line
• Protect the data with a CRC elsewhere if needed
Goals
Coordinated but diverse protection:
• SYN protection during connection establishment
• RST / data window protection after
• Port protection throughout
Scales with performance
• Enables parallel, offloaded pre-validation
Protect header differently than data
• Different strength
• Different time (per packet vs. per data chunk)
>> lower latency, higher-throughput transport security
Summary
• Lots of progress!
• Integrated demonstrations: 3-layer to full 5-layer with applications!
• Increasing in size, scale, and performance!
• Broad Range of Activities driving Core Technologies forward
– DVC
– Real-Time (TMO)
– Performance Analysis (Prophesy)
– High Speed Protocols (CEP, LambdaStream, XCP, GTP, UDT)
– Storage (RobuSTore)
– Security
• Come and Join the fun!
• Questions?
System Software