The California Institute for Telecommunications and


OptIPuter System Software
Andrew A. Chien
SAIC Chair Professor,
Computer Science and Engineering, UCSD
Director, Center for Networked Systems
September 2003
System Software
OptIPuter System Software Team
• Challenge
– ~20 Lead Researchers, Many More in Entire Team
– Diverse Researcher Backgrounds and Focus
– Broad Research Agenda, Abstract Shared Perspective
• Process
– Innumerable Phone Calls and 1-on-1 Meetings, Fall 2002-Spring 2003
– Team Meeting with UCSD and UCI Teams (October 4, 2002)
– Straw Man OptIPuter System Software Architecture (January 2003)
– Goals, Context, Organization, Relationship of Efforts
– OptIPuter All Hands Meeting, February 6-7, 2003
– First Presentation to Entire Team
– Feedback, Revision, Improvement, Deeper Understanding, Shared Perspective
– Optical Signalling and Network Management Meeting (May 22, 2003)
– Mambretti Organized
– OptIPuter Software Architecture Version 1.0 (July 2003)
– Structure Stabilized, Interfaces Becoming Concrete
λ’s Transform Distributed Systems
• Key Technology Changes
– Massive Bandwidth
– 100-1000x Increases in Wide-Area Systems
– “End-to-End” λ-Connections
– Private Networks, Guaranteed Bandwidth
– Endpoints are Parallel Clusters
– Large-Scale Network-Attached Storage, Instruments, Displays, and Other Peripherals
– Grids and Flexible Wide-Area Sharing
• Opportunities
– Communication
– Tight Wide-Area Resource Coupling
– Simpler Distributed Applications
– Proactive Computing and Communication
Challenge is Abstractions, Technologies, and Protocols (SOFTWARE!) to Deliver these Capabilities to Applications
Towards Middleware for λ-Networked Systems
Globus Architecture (layers, top to bottom):
– Application
– Collective: DUROC, GARA, Replica Catalogs, Metadata Servers, Brokers, Workflow
– Resource: GRAM, GridFTP, GRIS, Coallocation
– Connectivity: Globus_IO/XIO & GSI
– Fabric: Resource Access and Control (Computers, Storage, Networks)
• Leverage Investment and Capabilities (e.g., Globus 2.2 and 3.0)
– Carl Kesselman, OptIPuter Participant
– Ian Foster, OptIPuter Frontier Advisory Board
• Explore What Must Change
– New Software/Protocols for Managing Lambdas
– Simplify, Deliver Higher Performance and New Capabilities
OptIPuter Software Architecture for Distributed Virtual Computers v1.1
Architecture diagram components:
– OptIPuter Applications
– Visualization
– DVC/Middleware: DVC #1, DVC #2, DVC #3
– Higher-Level Grid Services
– Security Models
– Data Services: DWTP
– Real-Time Objects
– Grid and Web Middleware (Globus/OGSA/WebServices/J2EE)
– High-Speed Transport: Layer 5: SABUL, RBUDP, Fast, GTP; Layer 4: XCP
– Optical Signaling/Mgmt: λ-configuration, Net Management
– Node Operating Systems
– Physical Resources
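RBUDP, listed above among the Layer 5 protocols, is commonly described as alternating a UDP "blast" of all outstanding datagrams with a reliable TCP exchange in which the receiver reports which datagrams arrived. The minimal sketch below simulates only that round structure, assuming a fixed random loss rate and ignoring send-rate control; `rbudp_transfer` and its parameters are hypothetical, not the RBUDP implementation.

```python
import random
from typing import List, Set


def rbudp_transfer(num_packets: int, loss_rate: float, seed: int = 0,
                   max_rounds: int = 100) -> List[int]:
    """Simulate RBUDP-style rounds; return how many datagrams were blasted per round."""
    rng = random.Random(seed)
    received: Set[int] = set()            # receiver's view; stands in for its bitmap
    pending = list(range(num_packets))    # sequence numbers the sender must (re)send
    blasts: List[int] = []
    while pending:
        if len(blasts) == max_rounds:
            raise RuntimeError("transfer did not complete within max_rounds")
        blasts.append(len(pending))
        # Data phase: blast every pending datagram over (simulated) UDP; some are lost.
        for seq in pending:
            if rng.random() >= loss_rate:
                received.add(seq)
        # Control phase: the receiver reports what arrived over a reliable (TCP)
        # channel, and the sender retransmits only the gaps in the next blast.
        pending = [seq for seq in pending if seq not in received]
    return blasts


print(rbudp_transfer(10_000, loss_rate=0.05))
```

Each blast shrinks to the datagrams the receiver reported missing, so at low loss rates the transfer completes in a few rounds while the bulk data never waits on per-packet acknowledgments.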
OptIPuter Links Three Major Sets of Technology Activities
• Distributed Virtual Computers
– Provide Simple Abstractions
– Aggregate Component Technology Capabilities
– Surface Novel Capabilities
• High-Speed Transport Protocols [Bannister’s Talk]
– Long Thread of Research on Protocols for High Bandwidth-Delay Product Networks
– Span the Range of “Reach” for Dedicated Optical Connections
– Complete Integration with IP Network Management
– Hybrid – to Local Packet-Switched Networks
– Separate – End-to-End
• Optical Network Signaling and Management [Mambretti’s Talk]
– Single Domain and Inter-Domain
– Hybrid Circuit and Packet-Switched Networks
– Planning and Execution
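The "high bandwidth-delay product" problem above comes down to simple arithmetic: a window-based sender can have at most one window of unacknowledged data in flight per round trip. A minimal sketch, assuming a 10 Gb/s lambda and a 100 ms wide-area round trip (illustrative figures, not OptIPuter measurements):

```python
def bandwidth_delay_product_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep a path of this bandwidth and RTT full."""
    return bandwidth_bps * rtt_s / 8


def window_limited_throughput_bps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound for a window-based protocol: at most one window per round trip."""
    return window_bytes * 8 / rtt_s


bdp = bandwidth_delay_product_bytes(10e9, 0.1)      # assumed 10 Gb/s, 100 ms RTT
cap = window_limited_throughput_bps(65_535, 0.1)    # largest TCP window without window scaling
print(f"BDP = {bdp / 1e6:.0f} MB in flight; a 65,535-byte window caps throughput at {cap / 1e6:.2f} Mb/s")
```

At these assumed numbers the path needs about 125 MB in flight, roughly 1,900 times the largest unscaled TCP window, which is why dedicated λ-connections call for different transport protocols.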
Distributed Virtual Computers
Exploiting λ’s for an Application
• Network View: Ad Hoc Connections
• System View: Enclave of Resources and Connections
– Applications Request λ-Connections
– Network Recognizes High-BW Flows and Configures
– A Distributed Virtual Computer (a SYSTEM)
– How to Specify, Implement, and Exploit?
DVC Examples
[Figure: example sites SDSC, UCI or UIC, UCSD CSE, and SIO/NCMIR]
• Virtual Cluster (Hide Complexity of Grid; Resource Flexibility)
– Shared Single Domain (Spans Multiple)
– Private Connections; Simple Network Naming
– Simple Resource Discovery and Access
– Uniform Performance Characteristics
– Direct Access to Everything (Storage, Displays, etc.)
• Real-Time Virtual Cluster for Distributed Collaborative Visualization
– Grid Resources + Real-Time (TMO)
• Collaborative Visualization Cluster
– Grid Resources + Photonic Multicast or LambdaRAM (Leigh)
Realizing Distributed Virtual Computers
• Research Challenges
– Application-driven Definition of Abstractions
– Useful Collections which Match Application Paradigms and Needs
– Incorporates New Collective Models
– DVC Description
– Namespaces, Communication, Performance, Real-Time, …
– Standard Specifications; Most Applications Parameterize
– Integration of Component Technologies
• Executing the DVC on a Grid
– Planner That Identifies Resources
– Selects from Virtual Grid Resources
– Negotiates with Resource Managers and Brokers
– Executor and Monitor for DVC
– Acquires and Configures
– Monitors for Failures and Performance
– Adapts and Reconfigures
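The plan, execute, and monitor loop above can be sketched in a few lines. This is a hypothetical illustration: the `DVCSpec` fields (resource counts per kind plus a bandwidth floor), the greedy best-connected-first policy, and adapting by replanning without failed resources are all assumptions, not the OptIPuter planner's design.

```python
from dataclasses import dataclass
from typing import AbstractSet, Dict, List


@dataclass(frozen=True)
class Resource:
    name: str
    kind: str        # e.g. "compute", "storage", "display" (illustrative)
    site: str
    gbps: float      # hypothetical: lambda bandwidth available to this resource


@dataclass
class DVCSpec:
    """Hypothetical DVC description: resource counts per kind plus a bandwidth floor."""
    needs: Dict[str, int]
    min_gbps: float


def plan(spec: DVCSpec, pool: List[Resource],
         exclude: AbstractSet[str] = frozenset()) -> List[Resource]:
    """Planner: for each required kind, pick the best-connected eligible resources."""
    chosen: List[Resource] = []
    for kind, count in spec.needs.items():
        eligible = sorted(
            (r for r in pool
             if r.kind == kind and r.gbps >= spec.min_gbps and r.name not in exclude),
            key=lambda r: (-r.gbps, r.name),   # fastest first, name breaks ties
        )
        if len(eligible) < count:
            raise RuntimeError(f"cannot satisfy {count} x {kind}")
        chosen.extend(eligible[:count])
    return chosen


def reconfigure(spec: DVCSpec, pool: List[Resource],
                failed: AbstractSet[str]) -> List[Resource]:
    """Monitor/adapt step: replan with the failed resources excluded."""
    return plan(spec, pool, exclude=failed)


pool = [
    Resource("c1", "compute", "SDSC", 10.0),
    Resource("c2", "compute", "UCSD CSE", 10.0),
    Resource("c3", "compute", "UIC", 1.0),
    Resource("c4", "compute", "UCI", 10.0),
    Resource("s1", "storage", "SIO/NCMIR", 10.0),
]
spec = DVCSpec(needs={"compute": 2, "storage": 1}, min_gbps=5.0)
print([r.name for r in plan(spec, pool)])                        # ['c1', 'c2', 's1']
print([r.name for r in reconfigure(spec, pool, failed={"c2"})])  # ['c1', 'c4', 's1']
```

A real executor would also negotiate with resource managers and reconfigure λ-connections; the sketch keeps only the selection logic so the replanning step is easy to see.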
OptIPuter Component Technologies
Current Storage Views
• Network-Attached Storage (NAS)
– Filesystem Protocols; Integrated Access Control and Security
– Low Performance; Little Aggregation and Parallelism
• Grid View: High-Level Storage Federation
– GridFTP (Distributed File Sharing)
– GSI-Based Access/Authentication
– Put/Get, Third-Party Transfers, Whole File and Segments
• Single-System View: Lower-Level Storage Federation
– Secure Single System View
– SAN – Block-Level Disk and Controller Protocols
– High Performance, Efficient Sharing
• Research Areas
– Network-Attached Secure Disk
– Direct Access File Systems
We Need a Distributed Storage Solution for e-Science Distributed Data Generators
• BIRN: Distributed Data, Intensive Analysis
– 100 GB Data Elements; Petabyte Data Sets
– Comparative and Collective Analysis across Data Elements
– Visualization of Multi-Scale Data Objects
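To put the BIRN sizes in perspective, this computes the idealized time to move one 100 GB data element at a few assumed link rates, ignoring protocol overhead, latency, and disk speed:

```python
GB = 10**9    # decimal units, as disk and network vendors use
PB = 10**15


def transfer_seconds(size_bytes: float, rate_bps: float) -> float:
    """Idealized transfer time: size / rate, ignoring protocol overhead and latency."""
    return size_bytes * 8 / rate_bps


element = 100 * GB
for label, rate in [("100 Mb/s shared path", 100e6),
                    ("1 Gb/s", 1e9),
                    ("10 Gb/s lambda", 10e9)]:
    minutes = transfer_seconds(element, rate) / 60
    print(f"{label:>20}: {minutes:6.1f} min per 100 GB element")
print("100 GB elements per petabyte:", PB // element)
```

The three assumed rates give about 2.2 hours, 13 minutes, and 80 seconds per element, and a petabyte data set holds 10,000 such elements, so comparative analysis across elements is bandwidth-bound unless dedicated lambdas are available.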
Storage Research Directions
• From Performance to Performability
– Manage and Exploit Multi-Latency Performance
– Parallel Performance, Stability, and Isolation
– Integration of Device, Network, Site Reliability Concerns
• OptIPuter Storage Directions
– Application-Driven Design
– Needs, Performance, Device/Site/Network Flexibility, Coding and Selection
– Integrate Dynamic λ’s and SAN Networks
– Peering, Protocol Interfacing, Performance
– Performance-Robust Storage
– Erasure/Other Redundancy; Large-Scale Parallelism; Statistical Approaches to Performance Isolation
– Secure Shared Storage: Threshold Cryptography Approach
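Threshold approaches typically rest on secret sharing. Shamir's (k, n) scheme, sketched below, splits a value into n shares placed on different devices or sites so that any k recover it and fewer than k reveal nothing, giving both redundancy and confidentiality. This is the textbook construction, not necessarily the one the OptIPuter storage work adopts; the field size and function names are assumptions, and a practical system would share a symmetric key rather than bulk data.

```python
import secrets
from typing import List, Tuple

PRIME = 2**127 - 1   # Mersenne prime; shares live in GF(PRIME), so the secret must be < PRIME


def split(secret: int, n: int, k: int) -> List[Tuple[int, int]]:
    """Shamir (k, n) sharing: any k shares recover the secret; k-1 reveal nothing."""
    if not (0 <= secret < PRIME and 1 <= k <= n):
        raise ValueError("need 0 <= secret < PRIME and 1 <= k <= n")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):            # one share per storage device or site
        y = 0
        for c in reversed(coeffs):       # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover(shares: List[Tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over GF(PRIME); needs at least k distinct shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


key = secrets.randbelow(PRIME)                  # e.g. a key protecting a stored object
shares = split(key, n=5, k=3)
assert recover(shares[:3]) == key               # any three sites suffice
assert recover([shares[0], shares[2], shares[4]]) == key
```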
OptIPuter Security Considerations
• OptIPuter as a Computing Platform
– Information Assurance and Security Needed for Applications
– Current Plan: Use the Globus Grid Security Infrastructure (GSI)
• OptIPuter as a Research Platform
– Current Efforts
– Distributed Security Services (Goodrich & Tamassia)
– Incremental IP Trace-Back via Packet Marking for DoS Defense (Goodrich)
– Enhanced Forensic Analysis by Design (Karin & Peisert)
– Planned Efforts
– Minimum Round-Trip Latency Control (Goodrich)
– Hardening Against Attacks by Multi-Path Routing (Goodrich, Karin)
– End-to-End Application and Session Security Through Dedicated Lambdas (Karin)
Source: Karin, UCSD and Goodrich, UCI
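The packet-marking item above builds on probabilistic packet marking. A minimal simulation of its simplest form, node sampling as described by Savage et al., shows why marks let a victim recover the order of routers on an attack path. This is not Goodrich's incremental scheme; the path, marking probability, and packet count are assumptions.

```python
import random
from collections import Counter
from typing import List, Optional


def mark_packet(path: List[str], p: float, rng: random.Random) -> Optional[str]:
    """Node sampling: each router overwrites the packet's single mark field with prob. p."""
    mark = None
    for router in path:          # path runs from the attacker side toward the victim
        if rng.random() < p:
            mark = router
    return mark


def infer_path(path: List[str], p: float, packets: int, seed: int = 0) -> List[str]:
    """Victim side: order routers by how often their mark survives, most frequent first."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(packets):
        mark = mark_packet(path, p, rng)
        if mark is not None:
            counts[mark] += 1
    # A router followed by d more routers keeps its mark with probability
    # p * (1 - p) ** d, so marks from routers nearer the victim arrive more often.
    return [router for router, _ in counts.most_common()]


attack_path = ["R1", "R2", "R3", "R4", "R5"]    # R5 is adjacent to the victim
print(infer_path(attack_path, p=0.2, packets=20_000))
```

With enough packets the inferred list runs from the victim's neighbor outward; routers far from the victim need many packets before their marks show up reliably, which is one motivation for more efficient marking schemes.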
Multi-Lambda Security Opportunities
• Security Frequently Defined Through Three Measures:
– Integrity, Confidentiality, and Reliability (“Uptime”)
• Can These Measures be Enhanced by Employing Multiple Lambdas?
• Can Confidentiality be Improved by Dividing the Transmission Over Multiple Lambdas?
– Fundamentally or Using “Cheap” Encryption?
• Can Integrity be Ensured or Reliability Improved by Exploiting Redundancy?
– Source Coding and Performance
– Adaptive Techniques
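One "fundamental" answer to the confidentiality question is n-of-n XOR splitting: send n-1 random pads and one masked copy of the message on n different lambdas, so an eavesdropper on any n-1 of them learns nothing about the message. A minimal sketch, assuming in-order, loss-free delivery on each lambda; the function names are hypothetical.

```python
import secrets
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split_across_lambdas(message: bytes, n: int) -> List[bytes]:
    """n-of-n splitting: n-1 random pads plus the message XORed with all of them."""
    if n < 1:
        raise ValueError("n must be at least 1")
    pads = [secrets.token_bytes(len(message)) for _ in range(n - 1)]
    masked = reduce(xor_bytes, pads, message)
    return pads + [masked]               # share i would travel on lambda i


def combine(shares: List[bytes]) -> bytes:
    """Receiver: XOR of all n shares cancels the pads and yields the message."""
    return reduce(xor_bytes, shares)


shares = split_across_lambdas(b"e-Science payload", n=3)
assert combine(shares) == b"e-Science payload"
```

The scheme is computationally cheap but multiplies traffic by n and loses the message if any single lambda fails, so it trades reliability for confidentiality; redundancy (the last bullet) would have to be layered on top.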
Vision – Real-Time Tightly Coupled Wide-Area Distributed Computing
[Figure: a Real-Time Object Network on a Dynamically Formed Distributed Virtual Computer]
Goals
• High-Precision Timing of Critical Actions
• Tight Bounds on Response Times
• Ease of Programming
– High-Level Programming
– Top-Down Design
• Ease of Timing Analysis
Source: Kim, UCI
Real-Time: from LAN to WAN
• Time-Triggered Message-Triggered Object (TMO) Middleware Subsystem Model that can be Easily Implemented on Both Windows and Linux Platforms
• Developed Global-Time-Based Coordination for Fair and Efficient Distributed On-Line Game Systems, with a LAN Feasibility Demonstration
– A Step towards a Distributed OptIPuter Environment Demonstration
– Paper will be Presented at the IDPT 2003 Conference, December 2003
[Figure: components of a TMO as a C++ object: object data (var), time-triggered methods TT Method 1 and TT Method 2 with their AACs, and Service Method 1 and Service Method 2, all with deadlines]
• No Threads, No Priorities: High-Level Programming Style
Source: Kim, UCI
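TMO separates time-triggered methods, activated by the global clock, from service methods triggered by client messages, each with a deadline. The toy simulation below illustrates that split under stated assumptions: one processor, non-preemptive execution, earliest release first, and time-triggered methods winning ties. TMO's own rules protect time-triggered methods from service methods more strongly than this sketch does, and all names here are hypothetical rather than the TMO support middleware's API.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Tuple

TIME_TRIGGERED, SERVICE = 0, 1   # time-triggered activations win ties (assumed policy)


@dataclass(order=True)
class Activation:
    release: int                          # global time (ms) at which the method may start
    kind: int                             # TIME_TRIGGERED or SERVICE
    seq: int                              # insertion order keeps the heap deterministic
    name: str = field(compare=False)
    cost: int = field(compare=False)      # simulated execution time (ms)
    deadline: int = field(compare=False)  # absolute completion deadline (ms)


class TMOSim:
    """Toy single-processor, non-preemptive simulation of a TMO-style object."""

    def __init__(self) -> None:
        self._queue: List[Activation] = []
        self._seq = 0

    def _push(self, release: int, kind: int, name: str, cost: int, rel_deadline: int) -> None:
        heapq.heappush(self._queue,
                       Activation(release, kind, self._seq, name, cost, release + rel_deadline))
        self._seq += 1

    def time_triggered(self, name: str, period: int, cost: int,
                       rel_deadline: int, until: int) -> None:
        """Released by the global clock every `period` ms, from 0 up to `until`."""
        for t in range(0, until, period):
            self._push(t, TIME_TRIGGERED, name, cost, rel_deadline)

    def message(self, at: int, name: str, cost: int, rel_deadline: int) -> None:
        """Service method released when a client message arrives at time `at`."""
        self._push(at, SERVICE, name, cost, rel_deadline)

    def run(self) -> List[Tuple[str, int, int, bool]]:
        """Return (method, release, finish, met_deadline) in execution order."""
        clock, log = 0, []
        while self._queue:
            act = heapq.heappop(self._queue)
            clock = max(clock, act.release) + act.cost
            log.append((act.name, act.release, clock, clock <= act.deadline))
        return log


sim = TMOSim()
sim.time_triggered("tick", period=10, cost=2, rel_deadline=5, until=30)
sim.message(at=1, name="reply", cost=3, rel_deadline=3)
for entry in sim.run():
    print(entry)
# ('tick', 0, 2, True)
# ('reply', 1, 5, False)
# ('tick', 10, 12, True)
# ('tick', 20, 22, True)
```

The service method misses its deadline because it waits for the time-triggered method already running; making such interactions visible at design time is what the "Ease of Timing Analysis" goal refers to.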
TMO and OptIPuter Software
• TMO will be Integrated into the Overall OptIPuter Software Architecture
• Begin Design of a TMO Programming Framework for the OptIPuter
• Prototype Implementation of TMO Support on Linux Platforms, Including the OptIPuter Visualization Cluster (UIC – Leigh, UCI – Jenks)
[Figure: two nodes exchanging e-Science data such as " Let us start a chorus at 2pm "; each node stacks Middleware, FT Support, TMOSM, Kernel, and Lambda mux/demux]
• An API Wrapping the Services of the RT Middleware Enables High-Level RT Programming Without a New Compiler
Source: Kim, UCI
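The last bullet's point, that real-time structure can be expressed through a library API instead of a new compiler, can be illustrated with ordinary Python decorators: timing constraints are attached to methods as metadata and read back by a base class. `time_triggered`, `TMOBase`, and `Chorus` are hypothetical names for illustration; the actual TMO API wraps C++ middleware services and its interface is not shown here.

```python
from typing import List, Tuple


def time_triggered(period_ms: int, deadline_ms: int):
    """Decorator: records timing constraints as metadata; the method body is unchanged."""
    def mark(fn):
        fn.tt_spec = (period_ms, deadline_ms)
        return fn
    return mark


class TMOBase:
    """Hypothetical base class that discovers decorated methods by introspection."""

    def time_triggered_methods(self) -> List[Tuple[str, int, int]]:
        found = []
        for name in dir(type(self)):
            spec = getattr(getattr(type(self), name), "tt_spec", None)
            if spec is not None:
                found.append((name, *spec))
        return sorted(found)

    def schedule(self, horizon_ms: int) -> List[Tuple[int, str, int]]:
        """Expand declared periods into (release_ms, method, absolute_deadline_ms)."""
        events = []
        for name, period, deadline in self.time_triggered_methods():
            for t in range(0, horizon_ms, period):
                events.append((t, name, t + deadline))
        return sorted(events)


class Chorus(TMOBase):
    @time_triggered(period_ms=500, deadline_ms=100)
    def sing(self) -> str:
        return "la"


print(Chorus().schedule(1000))   # [(0, 'sing', 100), (500, 'sing', 600)]
```

Because the constraints are plain data produced by an unmodified compiler or interpreter, a middleware layer can read them to drive scheduling, which is the design choice the bullet describes.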
Prophesy: Application Performance Modeling
• Performance Modeling of Applications on OptIPuter
• Cross-Platform Comparison (vs. Traditional Grid & Parallel)
• Yr1: Completed Data Analysis
• Yr2: Work with Applications and High-Speed Transport Protocols
• Target Applications Include:
– SIO Geophysical Data Visualization
– NCMIR/BIRN Neuroscience Applications
[Figure: Prophesy architecture with Data Collection, Databases, and Data Analysis stages; components include Profiling & Instrumentation Module, Actual Execution, Template Database, Performance Database, Systems Database, Model Builder, Symbolic Predictor, and a Web-based GUI]
Source: Taylor, TAMU
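A Model Builder of the kind shown in the Prophesy figure turns measured run times into a predictive model. The simplest such model is a least-squares line T(n) = a + b·n fitted to (problem size, time) pairs; the sketch below shows only that step. The linear form, function names, and sample data are assumptions for illustration, not Prophesy's actual modeling method.

```python
from typing import List, Tuple


def fit_linear(sizes: List[float], times: List[float]) -> Tuple[float, float]:
    """Least-squares fit of T(n) = a + b*n; returns (a, b)."""
    n = len(sizes)
    if n < 2 or len(times) != n:
        raise ValueError("need at least two paired measurements")
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("problem sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def predict(model: Tuple[float, float], size: float) -> float:
    a, b = model
    return a + b * size


# Hypothetical measurements: run time (s) for problem sizes 1, 2, 4, 8.
model = fit_linear([1, 2, 4, 8], [2.5, 3.0, 4.0, 6.0])
print(model, predict(model, 16))
```

Fitting the same model form on two platforms and comparing coefficients is one simple way to make the cross-platform comparison in the first bullets concrete.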
Summary
• OptIPuter System Software Team Organization
– Development of a Concrete, Shared Perspective
– Organization into Tightly-Coupled Teams
• OptIPuter Software Architecture 1.0 (July 2003)
– Provides Focus on Key Problems, Clusters Related Activities
– Framework for Integrating Diverse Capabilities, Identifying Gaps, Integrating and Delivering Solutions
• Research Activity Clusters
– Distributed Virtual Computers
– Including Real-Time, Security, Storage, Performance Modeling
– High Speed Transport Protocols
– Optical Signaling and Network Management