TCP/IP Protocol - Open Grid Forum
Observations on Architecture,
Protocols, Services, APIs, SDKs,
and the Role of the Grid Forum
Ian Foster
With: Carl Kesselman, Steven Tuecke
Thanks also to: Bill Johnston, Marty
Humphrey, Rusty Lusk, Reagan Moore,
and others
1
Overview
1. The Grid problem: controlled resource sharing in multi-institutional settings
2. Standards as a means of enabling sharing of code, resources, services
3. Aside: definition, role, and importance of protocols, services, SDKs, APIs, etc.
4. A “Grid Architecture”: a categorization of protocols, services, SDKs, and APIs
5. Questions for the Grid Forum
2
The Grid Problem
Grid R&D has its origins in high-end computing & metacomputing, but…
In practice, the “Grid problem” is about resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations
– Lack of central control, omniscience, trust
Primary challenge: to enable, maintain, and control the sharing of resources to achieve a common goal
3
Examples of Virtual Organizations
Members of a scientific collaboration
– E.g., NSF PACIs, IPG, NEESgrid, GriPhyN
– Sharing: computers, storage, software, …
Application service provider (ASP) + customers
– Sharing: ASP computers
Participants in a peer-to-peer network
– E.g., Gnutella, Napster, Entropia, …
– Sharing: resources on individual PCs
Tremendous variety in scope, timescale, types of sharing, etc.
4
Universal Nature of the Grid Problem
“Sharing” fundamental in many settings
– Application Service Providers, Storage Service Providers, etc.; Peer-to-peer computing; Distributed computing; Business to business; …
Sharing issues not adequately addressed by existing technologies
– Sharing at a deep level, across broad ranges of resources and in a general way
– E.g., user provides ASP with controlled access to their data on an SSP: how??
Grid community has unique experience
5
Creating Usable Grids:
What are the Challenges?
Approaches to problem solving
– Data Grids, distributed computing, peer-to-peer, collaboration grids, …
Structuring and writing programs
– Abstractions, tools
Enabling resource sharing across distinct institutions
– Resource discovery, access, reservation, allocation; authentication, authorization, policy; communication; fault detection and notification; …
6
What is the Role of Grid Forum in
Enabling Grid Computing?
1. Information exchange, of course
– Experiences, patterns, structures
– Useful even if every application & Grid is a vertical “stovepipe”
2. Advocacy
3. Enabler of shared effort
– In code development: libraries, tools, …
– Via resource sharing: shared Grids
– In infrastructure
Opinion: Long term, only the third is sufficiently compelling to justify GF
7
Q: How do we Enable Shared Effort?
A: “Standards” are Required
To enable portability/sharing of code
– E.g., MPI lets me write portable parallel programs
To enable resource sharing
– E.g., IP lets my computer speak to yours
To enable shared infrastructure
– E.g., X.509 lets me share Certificate Authorities
But what sorts of “standards”?
– Variously, APIs/SDKs, protocols, syntax, …
– Observe that these are sometimes confused, so let’s spend some time on definitions …
8
Some Important Definitions
Resource
Network protocol
Network enabled service
Application Programmer Interface (API)
Software Development Kit (SDK)
Syntax
Not discussed, but important: policies
9
Resource
An entity that is to be shared
– E.g., computers, storage, data, software
Does not have to be a physical entity
– E.g., Condor pool, distributed file system, …
Defined in terms of interfaces, not devices
– E.g., schedulers such as LSF and PBS define a compute resource
– Open/close/read/write define access to a distributed file system, e.g., NFS, AFS, DFS
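The point that a resource is defined by its interface rather than by the device behind it can be sketched in a few lines of Python. The names `FileResource`, `InMemoryFile`, and `copy_text` are hypothetical and exist only for this illustration; they are not taken from any Grid toolkit.

```python
from typing import Protocol


class FileResource(Protocol):
    """Hypothetical interface: anything offering read/write/close is a file resource."""

    def read(self) -> str: ...
    def write(self, data: str) -> None: ...
    def close(self) -> None: ...


class InMemoryFile:
    """One possible backing store; client code cannot tell what sits behind the interface."""

    def __init__(self) -> None:
        self._data = ""
        self.closed = False

    def read(self) -> str:
        return self._data

    def write(self, data: str) -> None:
        self._data += data

    def close(self) -> None:
        self.closed = True


def copy_text(src: FileResource, dst: FileResource) -> None:
    """Client code written against the interface only, never against a device."""
    dst.write(src.read())
    dst.close()


source, target = InMemoryFile(), InMemoryFile()
source.write("shared data")
copy_text(source, target)
```

Any other class with the same three methods could be passed to `copy_text` unchanged, which is the sense in which the interface, not the device, defines the resource.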
10
Network Protocol
A formal description of message formats and a set of rules for message exchange
– Rules may define sequence of message exchanges
– Protocol may define state-change in endpoint, e.g., file system state change
Good protocols designed to do one thing
– Protocols can be layered
Examples of protocols
– IP, TCP, TLS (was SSL), HTTP, Kerberos
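The three ingredients named above (a message format, exchange rules, and endpoint state change) can be made concrete with a toy protocol. Everything here is an assumption made for illustration: the 5-byte header layout, the `REQUEST`/`REPLY` type codes, and the `Endpoint` class do not correspond to any real protocol.

```python
import struct
from typing import Tuple

# Hypothetical message format: 1-byte type, 4-byte big-endian payload length, payload.
HEADER = struct.Struct("!BI")
REQUEST, REPLY = 1, 2


def encode(msg_type: int, payload: bytes) -> bytes:
    """Serialize one message according to the format above."""
    return HEADER.pack(msg_type, len(payload)) + payload


def decode(data: bytes) -> Tuple[int, bytes]:
    """Parse one message, rejecting payloads shorter than the declared length."""
    msg_type, length = HEADER.unpack_from(data)
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated message")
    return msg_type, payload


class Endpoint:
    """Hypothetical exchange rule: each REQUEST is answered by exactly one REPLY,
    and handling a request changes endpoint state (a simple counter here)."""

    def __init__(self) -> None:
        self.requests_seen = 0

    def handle(self, data: bytes) -> bytes:
        msg_type, payload = decode(data)
        if msg_type != REQUEST:
            raise ValueError("protocol violation: expected REQUEST")
        self.requests_seen += 1
        return encode(REPLY, payload.upper())


server = Endpoint()
reply_type, reply_payload = decode(server.handle(encode(REQUEST, b"hello")))
```

The format and the rules are the protocol; `Endpoint` is only one possible implementation of it.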
11
Network Enabled Services
Implementation of a protocol that defines a set of capabilities
– Protocol defines interaction with service
– All services require protocols
– Not all protocols are used to provide services (e.g. IP, TLS)
Examples: FTP and Web servers
[Figure: two protocol stacks. FTP Server: FTP Protocol, Telnet Protocol, TCP Protocol, IP Protocol. Web Server: HTTP Protocol, TLS Protocol, TCP Protocol, IP Protocol.]
12
Application Programmer Interface
A specification for a set of routines to facilitate application development
– Refers to definition, not implementation
– E.g., there are many implementations of MPI
Spec often language-specific (or IDL)
– Routine name, number, order and type of arguments; mapping to language constructs
– Behavior or function of routine
Examples
– GSS API (security), MPI (message passing)
13
Software Development Kit
A particular instantiation of an API
SDK consists of libraries and tools
– Provides implementation of API specification
Can have multiple SDKs for an API
Examples of SDKs
– MPICH, Motif Widgets
14
Syntax
Rules for encoding information, e.g.
– XML, Condor ClassAds, Globus RSL
– X.509 certificate format (RFC 2459)
– Cryptographic Message Syntax (RFC 2630)
Distinct from protocols
– One syntax may be used by many protocols (e.g., XML); & useful for other purposes
Syntaxes may be layered
– E.g., Condor ClassAds -> XML -> ASCII
– Important to understand layerings when comparing or evaluating syntaxes
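The layering of syntaxes can be illustrated with a small sketch: attribute/value pairs are expressed in XML, and the XML is in turn encoded as ASCII bytes. The attribute names and the `<ad>`/`<attr>` element names are invented for this example and are not the actual Condor ClassAd XML representation.

```python
import xml.etree.ElementTree as ET

# Hypothetical resource description as attribute/value pairs (top syntax layer).
ad = {"Arch": "INTEL", "OpSys": "LINUX", "Memory": "512"}

# Layer 1 -> 2: express the attribute/value pairs in XML syntax.
root = ET.Element("ad")
for name, value in ad.items():
    ET.SubElement(root, "attr", name=name).text = value

# Layer 2 -> 3: encode the XML document as ASCII bytes.
wire_bytes = ET.tostring(root, encoding="us-ascii")

# Decoding reverses the layers: ASCII bytes -> XML -> attribute/value pairs.
decoded = {element.get("name"): element.text for element in ET.fromstring(wire_bytes)}
```

Comparing two syntaxes fairly means comparing them at the same layer: the attribute/value model, the XML encoding, and the byte encoding each answer a different question.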
15
A Protocol can have Multiple APIs
E.g., TCP/IP
TCP/IP APIs include BSD sockets, Winsock, System V streams, …
The protocol provides interoperability: programs using different APIs can exchange information
I don’t need to know remote user’s API
[Figure: two applications, one using the WinSock API and one using the Berkeley Sockets API, communicating over the TCP/IP Protocol (reliable byte streams).]
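As a small illustration of one protocol reached through two APIs, the sketch below writes the server against Python's higher-level `socketserver` module and the client against the lower-level, BSD-sockets-style `socket` module. Both are standard-library modules; the `EchoHandler` class and the one-line echo exchange are assumptions made for the example. The two sides interoperate because both speak TCP, not because they share an API.

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.StreamRequestHandler):
    """Server side, written against the higher-level socketserver API."""

    def handle(self) -> None:
        line = self.rfile.readline()
        self.wfile.write(b"echo: " + line)


# Port 0 asks the operating system for any free port on the loopback interface.
server = socketserver.TCPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side, written against the lower-level sockets API.
with socket.create_connection(server.server_address) as conn:
    conn.sendall(b"hello\n")
    reply = conn.makefile("rb").readline()

# Stop the serving loop and release the listening socket.
server.shutdown()
server.server_close()
```

Neither side needs to know which API the other used; only the bytes on the wire matter.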
16
An API can have Multiple Protocols
E.g., Message Passing Interface
MPI provides portability: any correct program compiles & runs on a platform
Does not provide interoperability: all processes must link against same SDK
– E.g., MPICH and LAM versions of MPI
[Figure: two applications, each written to the MPI API. One is linked with the LAM SDK and speaks the LAM protocol over TCP/IP; the other is linked with the MPICH-P4 SDK and speaks the MPICH-P4 protocol over TCP/IP. The two protocols use different message formats, exchange sequences, etc.]
17
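The MPI example (one API, several protocols) can be sketched as follows. `MessagingAPI`, `SdkA`, and `SdkB` are hypothetical stand-ins, and their JSON and binary wire formats are invented; they are not the LAM or MPICH-P4 protocols. The sketch only shows why a shared API gives portability without giving interoperability.

```python
import json
import struct
from abc import ABC, abstractmethod


class MessagingAPI(ABC):
    """Hypothetical standard API: every application makes the same calls."""

    @abstractmethod
    def pack(self, rank: int, text: str) -> bytes: ...

    @abstractmethod
    def unpack(self, data: bytes) -> tuple: ...


class SdkA(MessagingAPI):
    """Hypothetical SDK whose wire protocol is JSON text."""

    def pack(self, rank, text):
        return json.dumps({"rank": rank, "text": text}).encode("utf-8")

    def unpack(self, data):
        msg = json.loads(data.decode("utf-8"))
        return msg["rank"], msg["text"]


class SdkB(MessagingAPI):
    """Hypothetical SDK whose wire protocol is a binary header plus payload."""

    def pack(self, rank, text):
        body = text.encode("utf-8")
        return struct.pack("!II", rank, len(body)) + body

    def unpack(self, data):
        rank, length = struct.unpack_from("!II", data)
        return rank, data[8:8 + length].decode("utf-8")


def application(sdk: MessagingAPI) -> tuple:
    """Portable: the same source runs unchanged on either SDK."""
    return sdk.unpack(sdk.pack(3, "hello"))


def interoperates(sender: MessagingAPI, receiver: MessagingAPI) -> bool:
    """True only if the receiver correctly decodes what the sender put on the wire."""
    try:
        return receiver.unpack(sender.pack(3, "hello")) == (3, "hello")
    except (ValueError, KeyError, TypeError, struct.error):
        return False
```

`application` works with either SDK, while `interoperates(SdkA(), SdkB())` fails because the receiver misreads the sender's bytes; only a shared protocol would fix that.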
Back to Grids:
The Programming & Systems Problems
Approaches to problem solving
– Data Grids, distributed computing, peer-to-peer, collaboration grids, …
Structuring and writing programs
– Abstractions, tools
[Slide label: Programming Problem]
Enabling resource sharing across distinct institutions
– Resource discovery, access, reservation, allocation; authentication, authorization, policy; communication; fault detection and notification; …
[Slide label: Systems Problem]
18
Aspects of the Programming Problem
Need for abstractions and models to add to speed/robustness/etc. of development
– E.g., OO abstractions, MPI for messaging
Need for code/tool sharing to allow reuse of code components developed by others
– E.g., MPI allows reuse of message passing
– E.g., standard profilers, debuggers
Primary need is for standard programming environments: APIs and SDKs
20
Aspects of the Systems Problem
Need for interoperability when different groups want to share resources
– Diverse components, policies, mechanisms
– E.g., standard notions of identity, means of communication, resource descriptions
Need for shared infrastructure services to avoid repeated development, installation
– E.g., one port/service for remote access to computing, not one per tool/application
– E.g., Certificate Authorities: expensive to run
Need standard protocols, services, syntax
21
I.e., Standard APIs and Protocols are
Both Important: For Different Reasons
Standard APIs/SDKs are important
– They enable application portability
– But w/o standard protocols, interoperability is hard (every SDK speaks every protocol?)
Standard protocols are important
– Enable cross-site interoperability
– Enable shared infrastructure
– But w/o standard APIs/SDKs, application portability is hard (different platforms access protocols in different ways)
22
Grid “Architecture”
We now proceed to analyze Grid systems with respect to standards
Identify key areas where protocols, services, APIs, and SDKs can occur
Result is a layered protocol architecture
We assert this can be useful as a means of describing and structuring Grid Forum activities
23
Layered Grid Architecture
(By Analogy to Internet Architecture)
[Figure: Grid layers, top to bottom, shown alongside the Internet Protocol Architecture]
Application
User: “Specialized services”: user- or appln-specific distributed services
Collective: “Managing multiple resources”: ubiquitous infrastructure services
Resource: “Sharing single resources”: negotiating access, controlling use
Connectivity: “Talking to things”: communication (Internet protocols) & security
Fabric: “Controlling things locally”: access to, & control of, resources
Internet Protocol Architecture, top to bottom: Application, Transport, Internet, Link
24
Protocols, Services, and Interfaces
Occur at Each Level
[Figure: layered stack, top to bottom]
Applications
Languages/Frameworks
User layer: User Service APIs and SDKs; User Services; User Service Protocols
Collective layer: Collective Service APIs and SDKs; Collective Services; Collective Service Protocols
Resource layer: Resource APIs and SDKs; Resource Services; Resource Service Protocols
Connectivity layer: Connectivity APIs; Connectivity Protocols
Local Access APIs and Protocols
Fabric Layer
25
An Aside on Terminology
Is this an “architecture” or just a “categorization” or “taxonomy”?
– A matter of opinion (cf. IAB: “Many members of the Internet community would argue that there is no architecture”)
– Our opinion: it is somewhere in between, but is useful regardless
Becomes more architectural if/as we define “necessary” pieces at each level
Note that the protocol architecture says nothing about the SDK/API architecture (& vice versa)
26
Important Points
We build on Internet protocols
– Communication, routing, name resolution, etc.
“Layering” here is conceptual, does not imply constraints on who can call what
– Protocols/services/APIs/SDKs will, ideally, be largely self-contained
– But some things are fundamental: e.g., communication and security
– But, advantageous for higher-level functions to use common lower-level functions
27
Example: User Portal
Appln: Web Portal
User: Source code discovery, application configuration
Collective: Brokering, co-allocation, certificate authorities
Resource: Access to data, access to computers, access to network performance data
Connect: Communication, service discovery (DNS), authentication, authorization, delegation
Fabric: Storage systems, schedulers
[Figure: the portal uses one API/SDK and a Lookup Protocol to reach a Source Code Repository, and another API/SDK and an Access Protocol to reach a Compute Resource.]
28
Example:
High-Throughput Computing System
Appln: High Throughput Computing System
User: Dynamic checkpoint, job management, failover, staging
Collective: Brokering, certificate authorities
Resource: Access to data, access to computers, access to network performance data
Connect: Communication, service discovery (DNS), authentication, authorization, delegation
Fabric: Storage systems, schedulers
[Figure: the system uses one API/SDK and a C-point Protocol to reach a Checkpoint Repository, and another API/SDK and an Access Protocol to reach a Compute Resource.]
29
Standards, Again:
Intergrid Protocols and Grid APIs
One or many protocols?
– No one “right” protocol for any one function
– But: interoperability requires that we define and commit to core “Intergrid” protocols
– Definition: “A resource is Grid-enabled if it speaks Intergrid protocols”
One or many APIs and SDKs?
– Many APIs, SDKs, programming models can target Intergrid protocols
– But: code sharing requires standards
– So, e.g., “standard Grid collaboration APIs”
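The definition of “Grid-enabled” can be read as a simple membership test. The sketch below assumes that a resource must speak every core protocol, and the protocol names are placeholders: the slides do not say which protocols are core, and choosing them is the open question raised later.

```python
# Placeholder names only; the actual set of core "Intergrid" protocols is undecided.
INTERGRID_PROTOCOLS = frozenset({"security", "resource-access", "discovery"})


def is_grid_enabled(spoken_protocols) -> bool:
    """Assumed reading: Grid-enabled means speaking every core Intergrid protocol."""
    return INTERGRID_PROTOCOLS.issubset(spoken_protocols)


cluster = {"security", "resource-access", "discovery", "site-local-batch"}
legacy_server = {"site-local-batch"}
```

Under this reading, `cluster` is Grid-enabled whatever additional protocols it speaks, and `legacy_server` is not until it adds the core set.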
30
Questions for the Grid Forum
Is the “Grid architecture” described here a useful framework?
– Could it be made more useful?
– Are there things that it fails to capture or misrepresents?
Would it be a useful discipline for us to try to place GF efforts in this context?
– E.g., be clear whether we are defining a protocol, service, API, SDK, syntax (or something else: which is fine, too)
– E.g., explain (and argue about) where in the stack different pieces fit
31
Questions for the Grid Forum
Are some things easier, or more important, to standardize than others?
– Protocols vs. APIs vs. syntax
– Connectivity vs. resource vs. collective vs. user layer protocols/services/APIs/SDKs
I would suggest that
– Items lower in the stack tend to have broader impact, but standards useful at all levels
– Size of community affected (e.g., number of adopters) is the key figure of merit
– We should ask explicitly for such an analysis as part of a WG charter
32
Questions for the Grid Forum
Can we define core “intergrid protocols”?
– I.e., instantiate (lower) layers in the diagram
– We have avoided it until now (implies choice)
– Until we do, interoperability is difficult
Possible approaches
– Avoid seeking consensus, instead standardize where it makes sense and where we can; rely on sense of “best practice” emerging
– Or, create an architecture WG, charged with defining requirements for “core protocols”??
– I think the latter is better, but am unsure if it can work
33
Summary
Grids are about [large-scale] sharing
– Hence require standard protocols to enable interoperability and shared infrastructure
– And, of course, standard APIs and SDKs to enable portability & code sharing
– Both important; but very different
Well-defined architecture can help understanding & progress
– Provides a framework for figuring out where the pieces fit
– Facilitates asking questions such as “where are standards particularly important?”
34
Questions?
35