Grids for Data Intensive Science
Paul Avery
University of Florida
http://www.phys.ufl.edu/~avery/
[email protected]
Texas APS Meeting
University of Texas, Brownsville
Oct. 11, 2002
Outline of Talk
- Grids and Science
- Data Grids and Data Intensive Sciences
  - High Energy Physics
  - Digital Astronomy
- Data Grid Projects
- Networks and Data Grids
- Summary
This talk represents only a small slice of a
fascinating, multifaceted set of research efforts
Grids and Science
The Grid Concept
- Grid: geographically distributed computing resources configured for coordinated use
- Fabric: physical resources & networks provide the raw capability
- Middleware: software ties it all together (tools, services, etc.)
- Goal: transparent resource sharing
Fundamental Idea: Resource Sharing
- Resources for complex problems are distributed
  - Advanced scientific instruments (accelerators, telescopes, …)
  - Storage and computing
  - Groups of people
- Communities require access to common services
  - Research collaborations (physics, astronomy, biology, eng. …)
  - Government agencies
  - Health care organizations, large corporations, …
- Goal: "Virtual Organizations"
  - Create a "VO" from geographically separated components
  - Make all community resources available to any VO member
  - Leverage strengths at different institutions
  - Add people & resources dynamically
Short Comment About “The Grid”
- There is no single "Grid" a la the Internet
  - Many Grids, each devoted to different organizations
- Grids are (or soon will be)
  - The foundation on which to build secure, efficient, and fair sharing of computing resources
- Grids are not
  - Sources of free computing
  - The means to access and process Petabyte-scale data freely, without thinking about it
Proto-Grid: SETI@home
- Community: SETI researchers + enthusiasts
- Arecibo radio data sent to users (250 KB data chunks)
- Over 2M PCs used
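SETI@home is a master-worker pattern: a central server splits the Arecibo recordings into small, independent work units, and volunteer PCs pull, analyze, and return them. A minimal sketch of that pattern (Python; the function names and the trivial "analysis" step are illustrative only, not the actual SETI@home code):

    from queue import Queue

    CHUNK_BYTES = 250 * 1024  # ~250 KB work units, as on the slide

    def split_into_chunks(recording, chunk_bytes=CHUNK_BYTES):
        """Split a raw radio recording (bytes) into independent work units."""
        return [recording[i:i + chunk_bytes]
                for i in range(0, len(recording), chunk_bytes)]

    def analyze(chunk):
        """Stand-in for the client-side signal search (FFTs, pulse finding, ...)."""
        return {"max_power": max(chunk) if chunk else 0}

    def run_master_worker(recording):
        work = Queue()
        for unit in split_into_chunks(recording):
            work.put(unit)
        results = []
        while not work.empty():   # in reality, millions of volunteer PCs pull from here
            results.append(analyze(work.get()))
        return results

    if __name__ == "__main__":
        print(len(run_master_worker(bytes(10 * CHUNK_BYTES))), "work units processed")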
More Advanced Proto-Grid:
Evaluation of AIDS Drugs
- Entropia "DCGrid" software
- Uses 1000s of PCs
- Chief applications
  - Drug design
  - AIDS research
Some (Realistic) Grid Examples
- High energy physics: 3,000 physicists worldwide pool Petaflops of CPU resources to analyze Petabytes of data
- Climate modeling: climate scientists visualize, annotate, & analyze Terabytes of simulation data
- Biology: a biochemist exploits 10,000 computers to screen 100,000 compounds in an hour
- Engineering: a multidisciplinary analysis in aerospace couples code and data in four companies to design a new airframe
- Many commercial applications
From Ian Foster
Grids: Why Now?
- Moore's law improvements in computing
  - Highly functional endsystems
- Universal wired and wireless Internet connections
  - Universal connectivity
- Changing modes of working and problem solving
  - Interdisciplinary teams
  - Computation and simulation as primary tools
- Network exponentials (next slide)
Network Exponentials & Collaboration
- Network (WAN) vs. computer performance
  - Computer speed doubles every 18 months
  - WAN speed doubles every 12 months (revised)
  - Difference = order of magnitude per 10 years
  - Plus ubiquitous network connections!
- 1986 to 2001: Computers x1,000; Networks x50,000
- 2001 to 2010?: Computers x60; Networks x500
Scientific American (Jan-2001)
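These growth factors follow directly from the quoted doubling times; a quick check (Python; note that a 12-month doubling gives roughly x33,000 over 1986-2001, the same order as the x50,000 on the slide):

    def growth(years, doubling_time_years):
        """Multiplicative growth over a period given a fixed doubling time."""
        return 2 ** (years / doubling_time_years)

    # 1986-2001 (15 years) and 2001-2010 (9 years)
    print(f"Computers 1986-2001: x{growth(15, 1.5):,.0f}")   # ~x1,000
    print(f"Networks  1986-2001: x{growth(15, 1.0):,.0f}")   # ~x33,000 (order of the slide's x50,000)
    print(f"Computers 2001-2010: x{growth(9, 1.5):,.0f}")    # ~x64  (slide: ~x60)
    print(f"Networks  2001-2010: x{growth(9, 1.0):,.0f}")    # ~x512 (slide: ~x500)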
Basic Grid Challenges
- Overall goal: coordinated sharing of resources
- Resources under different administrative control
- Many technical problems to overcome
  - Authentication, authorization, policy, auditing
  - Resource discovery, access, negotiation, allocation, control
  - Dynamic formation & management of Virtual Organizations
  - Delivery of multiple levels of service
  - Autonomic management of resources
  - Failure detection & recovery
- Additional issue: lack of central control & knowledge
  - Preservation of local site autonomy
Advanced Grid Challenges: Workflow
- Manage workflow across the Grid
  - Balance policy vs. instantaneous capability to complete tasks
  - Balance effective resource use vs. fast turnaround for priority jobs
  - Match resource usage to policy over the long term
  - Goal-oriented algorithms: steering requests according to metrics
- Maintain a global view of resources and system state
  - Coherent end-to-end system monitoring
  - Adaptive learning: new paradigms for execution optimization
- Handle user-Grid interactions
  - Guidelines, agents
- Build high-level services & an integrated user environment
Layered Grid Architecture
(Analogy to Internet Architecture)
Grid layers (with their Internet Protocol Architecture analogs), from the user down:
- Application: specialized services, i.e. application-specific distributed services (analog: Application)
- Collective: managing multiple resources; ubiquitous infrastructure services
- Resource: sharing single resources; negotiating access, controlling use (analog: Transport, Internet)
- Connectivity: talking to things; communications, security
- Fabric: controlling things locally; accessing, controlling resources (analog: Link)
From Ian Foster
Globus Project and Toolkit
- Globus Project™ (UC/Argonne + USC/ISI)
  - O(40) researchers & developers
  - Identify and define core protocols and services
- Globus Toolkit™ 2.0
  - Reference implementation of core protocols & services
  - Globus Toolkit used by most Data Grid projects today
  - US: GriPhyN, PPDG, TeraGrid, iVDGL, …
  - EU: EU-DataGrid and national projects
- Recent progress: OGSA and web services (2002)
  - OGSA: Open Grid Services Architecture
  - Applying "web services" to Grids: WSDL, SOAP, XML, …
  - Keeps Grids in the commercial mainstream
  - Globus Toolkit 3.0
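In day-to-day use, Globus Toolkit 2.x is driven through command-line tools built on GSI security: grid-proxy-init creates a short-lived proxy credential, globus-job-run submits a job through a GRAM gatekeeper, and globus-url-copy moves files over GridFTP. A sketch of that workflow (Python; the host names and paths are placeholders, and this assumes the standard GT2 client tools are installed):

    import subprocess

    def run(cmd):
        """Run a Globus command-line tool and fail loudly if it errors."""
        print("$", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # 1. Create a temporary proxy credential from the user's grid certificate
    run(["grid-proxy-init", "-valid", "12:00"])

    # 2. Submit a simple job to a (hypothetical) GRAM gatekeeper
    run(["globus-job-run", "gatekeeper.example.edu", "/bin/hostname"])

    # 3. Move a data file with GridFTP (hypothetical endpoints)
    run(["globus-url-copy",
         "gsiftp://storage.example.edu/data/run01.dat",
         "file:///tmp/run01.dat"])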
Data Grids
Data Intensive Science: 2000-2015
- Scientific discovery increasingly driven by IT
  - Computationally intensive analyses
  - Massive data collections
  - Data distributed across networks of varying capability
  - Geographically distributed collaboration
- Dominant factor: data growth (1 Petabyte = 1000 TB)
  - 2000: ~0.5 Petabyte
  - 2005: ~10 Petabytes
  - 2010: ~100 Petabytes
  - 2015: ~1000 Petabytes?
- How to collect, manage, access and interpret this quantity of data?
- Drives demand for "Data Grids" to handle the additional dimension of data access & movement
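The projected growth, from ~0.5 PB in 2000 to ~1000 PB in 2015, amounts to a doubling of the data volume roughly every 16 months; a quick check (Python):

    import math

    start_pb, end_pb = 0.5, 1000.0      # ~2000 and ~2015 estimates from the slide
    years = 15
    doublings = math.log2(end_pb / start_pb)          # ~11 doublings
    print(f"{doublings:.1f} doublings -> one every {12 * years / doublings:.0f} months")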
Data Intensive Physical Sciences
- High energy & nuclear physics
  - Including new experiments at CERN's Large Hadron Collider
- Gravity wave searches
  - LIGO, GEO, VIRGO
- Astronomy: digital sky surveys
  - Sloan Digital Sky Survey, VISTA, other Gigapixel arrays
  - "Virtual" Observatories (multi-wavelength astronomy)
- Time-dependent 3-D systems (simulation & data)
  - Earth observation, climate modeling
  - Geophysics, earthquake modeling
  - Fluids, aerodynamic design
  - Pollutant dispersal scenarios
Data Intensive Biology and Medicine
- Medical data
  - X-ray, mammography data, etc. (many petabytes)
  - Digitizing patient records (ditto)
- X-ray crystallography
  - Bright X-ray sources, e.g. Argonne Advanced Photon Source
- Molecular genomics and related disciplines (cf. Craig Venter keynote @ SC2001)
  - Human Genome, other genome databases
  - Proteomics (protein structure, activities, …)
  - Protein interactions, drug delivery
- Brain scans (1-10m, time dependent)
- Virtual Population Laboratory (proposed)
  - Database of populations, geography, transportation corridors
  - Simulate likely spread of disease outbreaks
Example: High Energy Physics @ LHC
["Compact" Muon Solenoid (CMS) detector at the LHC (CERN), shown next to a "Smithsonian standard man" for scale]
CERN LHC site
[Aerial view of the CERN LHC site, marking the locations of the CMS, ALICE, LHCb and ATLAS detectors]
Collisions at LHC (2007?)
Proton-proton collisions:
- Protons/bunch: 10^11; 2835 bunches/beam
- Beam energy: 7 TeV x 7 TeV
- Luminosity: 10^34 cm^-2 s^-1
- Bunch crossing rate: every 25 nsec
- Proton collision rate: ~10^9 Hz (average ~20 collisions/crossing)
- New physics rate (Higgs, SUSY, …): ~10^-5 Hz
- Selection: 1 in 10^13
[Event sketch: colliding partons (quarks, gluons) produce, e.g., a Higgs decaying to two Z0 bosons and then to lepton pairs (e+e-, l+l-), amid particle jets]
Data Rates: From Detector to Storage
Physics filtering chain:
- Detector output: 40 MHz crossing rate, ~1000 TB/sec
- Level 1 Trigger (special hardware): 75 kHz, 75 GB/sec
- Level 2 Trigger (commodity CPUs): 5 kHz, 5 GB/sec
- Level 3 Trigger (commodity CPUs): 100 Hz, 100 MB/sec
- Raw data to storage
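Each trigger level buys a large rejection factor, and multiplying them out recovers the overall reduction from the 40 MHz crossing rate down to the ~100 Hz written to storage; a quick check of the numbers above (Python):

    # (rate in Hz, data rate in bytes/s) after each stage, from the slide
    stages = {
        "Detector":       (40e6, 1000e12),   # 40 MHz, ~1000 TB/s
        "Level 1 (HW)":   (75e3,   75e9),    # 75 kHz,   75 GB/s
        "Level 2 (CPUs)": ( 5e3,    5e9),    #  5 kHz,    5 GB/s
        "Level 3 (CPUs)": (100.0, 100e6),    # 100 Hz,  100 MB/s to storage
    }

    prev_rate = None
    for name, (rate, throughput) in stages.items():
        reject = f"  (x{prev_rate / rate:,.0f} rejection)" if prev_rate else ""
        print(f"{name:14s} {rate:12,.0f} Hz  {throughput / 1e6:12,.0f} MB/s{reject}")
        prev_rate = rate

    print(f"Overall reduction: x{40e6 / 100:,.0f}")   # 400,000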
LHC Data Complexity
- "Events" resulting from beam-beam collisions:
  - Signal event is obscured by 20 overlapping uninteresting collisions in the same crossing
  - CPU time does not scale from previous generations
[Event display comparison: a 2000-era event vs. a 2007 LHC event]
LHC: Higgs Decay into 4 muons
(+30 minimum bias events)
- All charged tracks with pt > 2 GeV
- Reconstructed tracks with pt > 25 GeV
- 10^9 events/sec; selectivity: 1 in 10^13
LHC Computing Overview
- Complexity: millions of individual detector channels
- Scale: PetaOps (CPU), Petabytes (data)
- Distribution: global distribution of people & resources
  - 1800 physicists, 150 institutes, 32 countries
Global LHC Data Grid
Experiment (e.g., CMS); Tier0 : (Tier1) : (Tier2) resources ~ 1:1:1
- Online System sends ~100 MBytes/sec to Tier 0: the CERN Computer Center (>20 TIPS)
- Tier 0 to Tier 1 national centers (France, Italy, UK, USA): 2.5 Gbits/sec
- Tier 1 to Tier 2 centers: 2.5 Gbits/sec
- Tier 2 to Tier 3 institutes (~0.25 TIPS each): ~0.6 Gbits/sec
- Tier 3 to Tier 4 (physics data caches, PCs, other portals): 0.1-1 Gbits/sec
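These link speeds set the time scale for bulk data movement between tiers; for example, replicating tens of terabytes from Tier 0 to a Tier 1 center over the 2.5 Gbits/sec link takes on the order of days. A rough estimate (Python; the 50% utilization figure is an assumption, consistent with the planning numbers quoted later in the talk):

    def transfer_days(size_tb, link_gbps, efficiency=0.5):
        """Days to move size_tb terabytes over a link_gbps link at the given efficiency."""
        bits = size_tb * 1e12 * 8
        seconds = bits / (link_gbps * 1e9 * efficiency)
        return seconds / 86400

    for size in (1, 10, 100):   # TB
        print(f"{size:4d} TB over 2.5 Gb/s @50%: {transfer_days(size, 2.5):6.1f} days")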
LHC Tier2 Center (2001)
["Flat" switching topology: a single FEth/GEth switch connecting the cluster to a WAN router, >1 RAID array, and tape storage]
LHC Tier2 Center (2001)
["Hierarchical" switching topology: multiple FEth switches aggregated by a GEth switch, connected to a WAN router, >1 RAID array, and tape storage]
Hardware Cost Estimates
- Buy late, but not too late: phased implementation
  [Chart: hardware price/performance trends; periods of 1.1, 1.2, 1.4 and 2.1 years shown]
- R&D phase: 2001-2004
- Implementation phase: 2004-2007
  - R&D to develop capabilities and the computing model itself
  - Prototyping at increasing scales of capability & complexity
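The "buy late" argument is just compounding price/performance improvement: if the cost of a unit of capacity halves every T years, delaying a purchase by d years cuts its cost by a factor of 2^(d/T). An illustration (Python; reading the 1.1-2.1 year periods from the figure as price/performance halving times is an assumption, since the component labels are not recoverable here):

    def cost_factor(delay_years, halving_time_years):
        """Fraction of today's price paid if the purchase is delayed."""
        return 0.5 ** (delay_years / halving_time_years)

    for halving in (1.1, 1.2, 1.4, 2.1):     # periods (years) shown in the figure
        print(f"halving every {halving} yr: delay 2 yr -> pay "
              f"{100 * cost_factor(2, halving):.0f}% of today's cost")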
Example: Digital Astronomy Trends
Future dominated by detector improvements:
- Moore's Law growth in CCDs
- Gigapixel arrays on horizon
- Growth in CPU/storage tracking data volumes
- Investment in software critical
[Chart, 1970-2000, log scale: total area of 3m+ telescopes in the world (m^2, "Glass") vs. total number of CCD pixels (MPixels, "CCDs")]
- 25 year growth: 30x in glass, 3000x in pixels
The Age of Mega-Surveys
- Next generation mega-surveys will change astronomy
  - Top-down design
  - Large sky coverage
  - Sound statistical plans
  - Well controlled, uniform systematics
- The technology to store and access the data is here
  - We are riding Moore's law
- Integrating these archives is for the whole community
- Astronomical data mining will lead to stunning new discoveries
  - "Virtual Observatory"
Virtual Observatories
- Multi-wavelength astronomy, multiple surveys
- Standards
- Source catalogs, image data
- Specialized data: spectroscopy, time series, polarization
- Information archives: derived & legacy data (NED, Simbad, ADS, etc.)
- Discovery tools: visualization, statistics
Virtual Observatory Data Challenge
- Digital representation of the sky
  - All-sky + deep fields
  - Integrated catalog and image databases
  - Spectra of selected samples
- Size of the archived data
  - 40,000 square degrees
  - Resolution < 0.1 arcsec: > 50 trillion pixels
  - One band (2 bytes/pixel): 100 Terabytes
  - Multi-wavelength: 500-1000 Terabytes
  - Time dimension: many Petabytes
- Large, globally distributed database engines
  - Multi-Petabyte data size
  - Thousands of queries per day, GByte/s I/O speed per site
  - Data Grid computing infrastructure
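The archive sizes above follow from simple arithmetic on the survey parameters; a quick check (Python; the 5-10 band range used for the multi-wavelength estimate is an assumption implied by the 500-1000 TB figure):

    sky_sq_deg   = 40_000
    pixel_arcsec = 0.1
    bytes_per_px = 2                      # one band

    arcsec_per_deg = 3600
    pixels = sky_sq_deg * (arcsec_per_deg / pixel_arcsec) ** 2
    one_band_tb = pixels * bytes_per_px / 1e12

    print(f"{pixels:.2e} pixels (~50 trillion)")      # ~5.2e13
    print(f"one band: ~{one_band_tb:.0f} TB")         # ~104 TB
    print(f"5-10 bands: ~{5*one_band_tb:.0f}-{10*one_band_tb:.0f} TB")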
Sloan Sky Survey Data Grid
Data Grid Projects
New Collaborative Endeavors via Grids
- Fundamentally alters the conduct of scientific research
  - Old: people and resources flow inward to labs
  - New: resources and data flow outward to universities
- Strengthens universities
  - Couples universities to data intensive science
  - Couples universities to national & international labs
  - Brings front-line research to students
  - Exploits intellectual resources of formerly isolated schools
  - Opens new opportunities for minority and women researchers
- Builds partnerships to drive new IT/science advances
  [Diagram: research community partnerships linking physics and other fundamental sciences, application sciences (astronomy, biology, etc.), computer science, universities, laboratories, the IT industry, and IT infrastructure]
Background: Major Data Grid Projects
- Particle Physics Data Grid (US, DOE): Data Grid applications for HENP expts.
- GriPhyN (US, NSF): Petascale Virtual-Data Grids
- iVDGL (US, NSF): Global Grid lab
- DOE Science Grid (DOE): link major DOE computing sites
- TeraGrid (US, NSF): dist. supercomp. resources (13 TFlops)
- European Data Grid (EU, EC): Data Grid technologies, EU deployment
- CrossGrid (EU, EC): realtime Grid tools
- DataTAG (EU, EC): transatlantic network, Grid applications
- Japanese Grid Project (APGrid?) (Japan): Grid deployment throughout Japan
(Side annotations: data intensive expts.; collaborations of application scientists & computer scientists; infrastructure devel. & deployment; Globus based)
GriPhyN: PetaScale Virtual-Data Grids
[Architecture diagram: production teams, individual investigators, and workgroups use interactive user tools; these drive virtual data tools, request planning & scheduling tools, and request execution & management tools, layered over resource management services, security and policy services, and other Grid services; underneath sit transforms, raw data sources, and distributed resources (code, storage, CPUs, networks) at the scale of ~1 Petaflop and ~100 Petabytes]
Virtual Data Concept
- A data request may: compute locally, compute remotely, access local data, or access remote data (fetch item)
- Scheduling based on: local policies, global policies, cost
[Diagram: requests flow across major facilities & archives, regional facilities & caches, and local facilities & caches]
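The virtual-data idea is that a request names a data product and a planner decides whether to fetch an existing copy or (re)compute it, and where, subject to policy and cost. A minimal sketch of that decision (Python; the site names and cost numbers are invented for illustration):

    from dataclasses import dataclass

    @dataclass
    class Option:
        site: str
        action: str        # "fetch" an existing copy or "compute" it from raw data
        est_cost: float    # abstract cost: network + CPU + queue time
        allowed: bool      # local/global policy decision

    def plan(request, options):
        """Pick the cheapest policy-allowed way to satisfy a virtual-data request."""
        allowed = [o for o in options if o.allowed]
        if not allowed:
            raise RuntimeError(f"no site is allowed to serve {request}")
        return min(allowed, key=lambda o: o.est_cost)

    choice = plan("higgs_candidates_v3", [
        Option("local cache",       "fetch",   est_cost=1.0,  allowed=True),
        Option("regional facility", "fetch",   est_cost=5.0,  allowed=True),
        Option("central archive",   "compute", est_cost=50.0, allowed=False),
    ])
    print(f"{choice.action} at {choice.site}")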
Early GriPhyN Challenge Problem:
CMS Data Reconstruction
(April 2001 demonstration across Caltech, Wisconsin, and NCSA)
1) Master Condor job running at Caltech (Caltech workstation)
2) Launch secondary Condor job on the Wisconsin pool; input files via Globus GASS
3) 100 Monte Carlo jobs run on the Wisconsin Condor pool
4) 100 data files transferred via GridFTP, ~1 GB each
5) Secondary job reports complete to master
6) Master starts reconstruction jobs via the Globus jobmanager on the NCSA Linux cluster
7) GridFTP fetches data from NCSA UniTree (a GridFTP-enabled FTP server)
8) Processed Objectivity database stored to UniTree
9) Reconstruction job reports complete to master
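The challenge problem stitches together Condor job management, Globus GRAM submission, and GridFTP data movement. A compressed sketch of the data-movement and remote-submission steps (Python; the commands are the real GT2 tools, but the hosts and paths are placeholders):

    import subprocess

    def sh(*cmd):
        print("$", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # Stage Monte Carlo output to the remote site via GridFTP (cf. step 4)
    for i in range(100):
        sh("globus-url-copy",
           f"gsiftp://pool.example.wisc.edu/mc/run{i:03d}.fz",
           f"gsiftp://cluster.example.ncsa.edu/scratch/run{i:03d}.fz")

    # Start reconstruction through the Globus jobmanager on the cluster (cf. step 6)
    sh("globus-job-run", "cluster.example.ncsa.edu/jobmanager-pbs",
       "/opt/cms/bin/reconstruct", "/scratch")

    # Archive the processed database to mass storage (cf. step 8)
    sh("globus-url-copy",
       "gsiftp://cluster.example.ncsa.edu/scratch/objy.db",
       "gsiftp://unitree.example.ncsa.edu/archive/objy.db")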
Particle Physics Data Grid
- Funded by DOE MICS ($9.5M for 2001-2004)
- DB replication, caching, catalogs
- Practical orientation: networks, instrumentation, monitoring
- Computer Science Program of Work:
  - CS1: Job Description Language
  - CS2: Schedule and manage data processing & placement activities
  - CS3: Monitoring and status reporting
  - CS4: Storage resource management
  - CS5: Reliable replication services
  - CS6: File transfer services
  - …
  - CS11: Grid-enabled analysis
iVDGL: A Global Grid Laboratory
"We propose to create, operate and evaluate, over a sustained period of time, an international research laboratory for data-intensive science." (From NSF proposal, 2001)
- International Virtual-Data Grid Laboratory
  - A global Grid laboratory (US, EU, Asia, South America, …)
  - A place to conduct Data Grid tests "at scale"
  - A mechanism to create common Grid infrastructure
  - A laboratory for other disciplines to perform Data Grid tests
  - A focus of outreach efforts to small institutions
- U.S. part funded by NSF (2001-2006)
  - $13.7M (NSF) + $2M (matching)
  - UF directs this project
  - International partners bring own funds
Current US-CMS Testbed (30 CPUs)
Sites: Wisconsin, Princeton, Fermilab, Caltech, UCSD, Florida, Brazil
US-iVDGL Data Grid (Dec. 2002)
[Map of US-iVDGL sites (Tier1, Tier2 and Tier3): SKC, LBL, Wisconsin, Michigan, PSU, Fermilab, Argonne, NCSA, Caltech, Oklahoma, Indiana, J. Hopkins, Hampton, FSU, Arlington, BNL, Vanderbilt, UCSD/SDSC, Brownsville, Boston U, UF, FIU]
iVDGL Map (2002-2003)
- Network links: Surfnet, DataTAG
- New partners: Brazil (T1), Russia (T1), Chile (T2), Pakistan (T2), China (T2), Romania (?)
[Map legend: Tier0/1, Tier2 and Tier3 facilities; 10 Gbps, 2.5 Gbps, 622 Mbps and other links]
TeraGrid: 13 TeraFlops, 40 Gb/s
[Diagram: a 40 Gb/s backbone and external networks linking Caltech, Argonne, SDSC (4.1 TF, 225 TB) and NCSA/PACI (8 TF, 240 TB); each site has its own resources, including HPSS or UniTree mass storage]
DOE Science Grid
Links major DOE computing sites (LBNL)
EU DataGrid Project
Work Package   Work Package title                        Lead contractor
WP1            Grid Workload Management                  INFN
WP2            Grid Data Management                      CERN
WP3            Grid Monitoring Services                  PPARC
WP4            Fabric Management                         CERN
WP5            Mass Storage Management                   PPARC
WP6            Integration Testbed                       CNRS
WP7            Network Services                          CNRS
WP8            High Energy Physics Applications          CERN
WP9            Earth Observation Science Applications    ESA
WP10           Biology Science Applications              INFN
WP11           Dissemination and Exploitation            INFN
WP12           Project Management                        CERN
LHC Computing Grid Project
Need for Common Grid Infrastructure
- Grid computing is sometimes compared to the electric grid
  - You plug in to get a resource (CPU, storage, …)
  - You don't care where the resource is located
- This analogy is more appropriate than originally intended
  - It expresses a USA viewpoint: a uniform power grid
  - What happens when you travel around the world?
    - Different frequencies: 60 Hz, 50 Hz
    - Different voltages: 120 V, 220 V
    - Different sockets! USA 2-pin, France, UK, etc.
- We want to avoid this situation in Grid computing
Grid Coordination Efforts
- Global Grid Forum (GGF): www.gridforum.org
  - International forum for general Grid efforts
  - Many working groups, standards definitions
  - Next one in Toronto, Feb. 17-20
- HICB (high energy physics)
  - Represents HEP collaborations, primarily LHC experiments
  - Joint development & deployment of Data Grid middleware
  - GriPhyN, PPDG, TeraGrid, iVDGL, EU-DataGrid, LCG, DataTAG, CrossGrid
  - Common testbed, open source software model
  - Several meetings so far
- New infrastructure Data Grid projects?
  - Fold into the existing Grid landscape (primarily US + EU)
Networks
Next Generation Networks for HENP
- Rapid access to massive data stores
  - Petabytes and beyond
- Balance of high throughput vs. rapid turnaround
- Coordinate & manage: computing, data, networks
- Seamless high performance operation of WANs & LANs
  (WAN: Wide Area Network; LAN: Local Area Network)
  - Reliable, quantifiable, high performance
  - Rapid access to the data and computing resources
  - "Grid-enabled" data analysis, production and collaboration
- Full participation by all physicists, regardless of location
  - Requires good connectivity, Grid-enabled software, advanced networking, collaborative tools
2.5 Gbps Backbone
201 Primary Participants
All 50 States, D.C. and Puerto Rico
75 Partner Corporations and Non-Profits
14 State Research and Education Nets
15 “GigaPoPs” Support 70% of Members
Total U.S. Internet Traffic
[Chart: U.S. Internet data traffic, 1970-2010, log scale from ~10 bps to 100 Pbps; ARPA & NSF data to '96, new measurements thereafter, projected forward at 4x/year; historical growth 2.8x-4x/year; crossover with voice traffic in August 2000; upper bound shown at the same % of GDP as voice. Source: Roberts et al., 2001]
Bandwidth for the US-CERN Link
Evolution typical of major HENP links, 2001-2006 (link bandwidth in Mbps):

  FY2001  FY2002  FY2003  FY2004  FY2005  FY2006
  310     622     1250    2500    5000    10000

- 2 x 155 Mbps in 2001
- 622 Mbps in May 2002
- 2.5 Gbps research link in Summer 2002 (DataTAG)
- 10 Gbps research link in mid-2003 (DataTAG)
Transatlantic Network Estimates
          2001   2002   2003   2004   2005   2006
  CMS      100    200    300    600    800   2500
  ATLAS     50    100    300    600    800   2500
  BaBar    300    600   1100   1600   2300   3000
  CDF      100    300    400   2000   3000   6000
  D0       400   1600   2400   3200   6400   8000
  BTeV      20     40    100    200    300    500
  DESY     100    180    210    240    270    300
  CERN     311    622   2500   5000  10000  20000

BW in Mbps, assuming 50% utilization. See http://gate.hep.anl.gov/lprice/TAN
All Major Links Advancing Rapidly
- Next generation 10 Gbps national network backbones
  - Starting to appear in the US, Europe and Japan
- Major transoceanic links
  - Are/will be at 2.5-10 Gbps in 2002-2003
- Critical path:
  - Remove regional, last mile bottlenecks
  - Remove compromises in network quality
  - Prevent TCP/IP inefficiencies at high link speeds
U.S. Cyberinfrastructure Panel:
Draft Recommendations (4/2002)
- New initiative to revolutionize science and engineering research
  - Capitalize on new computing & communications opportunities
  - Supercomputing, massive storage, networking, software, collaboration, visualization, and human resources
  - Budget estimate: incremental $650M/year (continuing)
- New office with a highly placed, credible leader
  - Initiate competitive, discipline-driven, path-breaking applications
  - Coordinate policy and allocations across fields and projects
  - Develop middleware & other software essential to scientific research
  - Manage individual computational, storage, and networking resources at least 100x larger than individual projects or universities
- Participants: NSF directorates, Federal agencies, international e-science
Summary
- Data Grids will qualitatively and quantitatively change the nature of collaborations and approaches to computing
- Current Data Grid projects will provide vast experience for new collaborations and point the way to the future
- Networks must continue exponential growth
  - Many challenges during the coming transition
- New Grid projects will provide rich experience and lessons
  - Difficult to predict the situation even 3-5 years ahead
Grid References
- Grid Book: www.mkp.com/grids
- Globus: www.globus.org
- Global Grid Forum: www.gridforum.org
- TeraGrid: www.teragrid.org
- EU DataGrid: www.eu-datagrid.org
- PPDG: www.ppdg.net
- GriPhyN: www.griphyn.org
- iVDGL: www.ivdgl.org
More Slides
1990s Information Infrastructure
- O(10^7) nodes
- Network-centric
- Simple, fixed end systems
- Few embedded capabilities
- Few services
- No user-level quality of service
From Ian Foster
Emerging Information Infrastructure
- O(10^10) nodes
- Application-centric
- Grid services embedded in the network: caching, resource discovery, processing, QoS
- Heterogeneous, mobile end-systems
- Many embedded capabilities
- Rich services
- User-level quality of service
- Qualitatively different, not just "faster and more reliable"
From Ian Foster
Globus General Approach
- Define Grid protocols & APIs
  - Protocol-mediated access to remote resources
  - Integrate and extend existing standards
- Develop reference implementation
  - Open source Globus Toolkit
  - Client & server SDKs, services, tools, etc.
- Grid-enable a wide variety of tools
  - FTP, SSH, Condor, SRB, MPI, …
- Learn about real world problems
  - Deployment
  - Testing
  - Applications
[Diagram: applications and diverse global services built on Globus Toolkit core services, over diverse resources]
ICFA SCIC
- SCIC: Standing Committee on Interregional Connectivity
  - Created by ICFA in July 1998 in Vancouver
  - Make recommendations to ICFA concerning the connectivity between the Americas, Asia and Europe
- SCIC duties
  - Monitor traffic
  - Keep track of technology developments
  - Periodically review forecasts of future bandwidth needs
  - Provide early warning of potential problems
  - Create subcommittees when necessary
- Reports: February, July and October 2002
SCIC Details
- Network status and upgrade plans
  - Bandwidth and performance evolution
  - Per country & transatlantic
- Performance measurements (world overview)
- Study of specific topics
  - Examples: bulk transfer, VoIP, collaborative systems, QoS, security
- Identification of problem areas
  - Ideas on how to improve, or encourage others to improve
  - E.g., faster links, equipment cost issues, TCP/IP scalability, etc.
- Meetings
  - Summary and sub-reports available (February, May, October)
  - http://www.slac.stanford.edu/grp/scs/trip/notes-icfa-dec01cottrell.html
Internet2 HENP Working Group
- Mission: ensure the following HENP needs:
  - National and international network infrastructures (end-to-end)
  - Standardized tools & facilities for high performance end-to-end monitoring and tracking
  - Collaborative systems
- Meet HENP needs in a timely manner
  - US LHC and other major HENP programs
  - At-large scientific community
  - Create a program broadly applicable across many fields
- Internet2 Working Group: Oct. 26, 2001
  - Co-Chairs: S. McKee (Michigan), H. Newman (Caltech)
  - http://www.internet2.edu/henp (WG home page)
  - http://www.internet2.edu/e2e (end-to-end initiative)