Transcript: PowerPoint - Science of Collaboratories

Where do we go from here?
“Knowledge Environments to Support Distributed Science and Engineering”
Symposium on Knowledge Environments for Science and Engineering
November 26, 2002
Mary Anne Scott
Dept of Energy, Office of Science
Distributed Resources; Distributed Expertise
[Map slide: DOE laboratories and facilities across the United States - Pacific Northwest National Laboratory, Idaho National Environmental and Engineering Laboratory, Ames Laboratory, Argonne National Laboratory, Fermi National Accelerator Laboratory, Lawrence Berkeley National Laboratory, Brookhaven National Laboratory, Stanford Linear Accelerator Center, Princeton Plasma Physics Laboratory, Lawrence Livermore National Laboratory, Thomas Jefferson National Accelerator Facility, Oak Ridge National Laboratory, Sandia National Laboratories, Los Alamos National Laboratory, and the National Renewable Energy Laboratory. Legend: Major User Facilities, User Institutions, Specific-Mission Laboratories, Program-Dedicated Laboratories, Multiprogram Laboratories.]
DOE Office of Science Context

Research
Pre-1995: Foundational technology (Nexus, MPI, Mbone, …)
1995-1997: Distributed Collaborative Experiment Environment projects (testbeds and supporting technology)
1997-2000: DOE 2000 Program (pilot collaboratories and technology projects)
2000-present: National Collaboratories Program
2001-present: Scientific Discovery Through Advanced Computing (SciDAC)

Planning
To inform the development and deployment of technology, a set of high-impact science applications in the areas of high energy physics, climate, chemical sciences, magnetic fusion energy, and molecular biology has been analyzed* to characterize their visions for the future process of science and the networking and middleware capabilities needed to support those visions.
*DOE Office of Science, High Performance Network Planning Workshop, August 13-15, 2002, Reston, Virginia, USA. http://doecollaboratory.pnl.gov/meetings/hpnpw
MAGIC for addressing the coordination problem?
Middleware And Grid Infrastructure Coordination
A team under the Large Scale Network (interagency coordination)
Meets monthly (1st Wednesday of each month)
Federal participants: ANL, DOE, LANL, LBL, NASA, NCO, NIH, NIST, NOAA, NSF, PNL, UCAR
Other participants: Boeing, Cisco, Educause, HP, IBM, Internet2, ISI, Level3, Microsoft, U-Chicago, UIUC, U-Wisconsin
Workshop held in Chicago, Aug 26-28
Editors, contributors, and participants from the Federal Government, agencies and labs, industry, universities, and international organizations
~100 participants
“Blueprint for Future Science Middleware and Grid Research and Infrastructure”
Driving Factors for Middleware and Grids

Science Push
New classes of scientific problems are enabled by technology developments:
High energy physicists will harness tens of thousands of CPUs in a worldwide data grid
On-line digital sky surveys require mechanisms for data federation and effective navigation
Advances in medical imaging and technologies enable collaboration across disciplines and scales
Coupling of expertise, collaboration, and disciplines encourages the development of new science and research

Technology Pull
Continuing exponential advances in sensor, computer, storage, and network capabilities will occur
Sensor networks will create experimental facilities
Petabyte and exabyte databases will become feasible
Increases in numerical and computer modeling capabilities broaden the base of science disciplines
Increases in network speeds make it feasible to connect distributed resources as never before
Future Science (~5 yr)
For each discipline, the slide lists characteristics, the vision for the future process of science, and the anticipated networking and middleware requirements.

Climate
Characteristics: Many simulation elements/components added as understanding increases. 100 TBy/100 yr of generated simulation data; 1-5 PBy/yr (per institution) distributed to major users in large chunks for post-simulation analysis.
Vision for the future process of science: Enable the analysis of model data by all of the collaborating community. Productivity aspects of rapid response.
Anticipated networking requirements: Authenticated data streams for easier site access through firewalls. Robust access to large quantities of data.
Anticipated middleware requirements: Server-side data processing (compute/cache embedded in the net). Reliable data/file transfer, accounting for system/network failures (see the first sketch after this table).

High Energy Physics
Characteristics: Instrument-based data sources. Hierarchical data repositories. Hundreds of analysis sites. 100s of petabytes of data. Global collaboration. Compute and storage requirements satisfied by optimal use of all available global resources.
Vision for the future process of science: Worldwide collaboration will cooperatively analyze data and contribute to a common knowledge base. Discovery of published (structured) data and its provenance.
Anticipated networking requirements: 100 Gbit/sec. Lambda-based point-to-point links for single high-bandwidth flows; capacity planning. Network monitoring.
Anticipated middleware requirements: Track worldwide resource usage patterns to maximize utilization. Direct network access to data management systems. Monitoring to enable optimized use of network, caching/compute, and storage resources. Publish/subscribe and global discovery (see the second sketch after this table).

Chemical Sciences
Characteristics: 3D simulation data sets (30-100 TB). Coupling of MPP quantum chemistry and molecular dynamics simulations. Validation using large experimental data sets.
Vision for the future process of science: Remote steering of simulation time steps. Remote data sub-setting, mining, and visualization. Shared data/metadata with annotation evolves to a knowledge base.
Anticipated networking requirements: ~100 Gbit for distributed computational chemistry and molecular dynamics simulations.
Anticipated middleware requirements: Management of metadata. Global event services. Cross-discipline repositories. International interoperability for collaboration infrastructure, repositories, search, and notification. Archival publication.
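The middleware requirements above are stated at the capability level; the slides do not prescribe an implementation. As a purely illustrative aid, the following minimal Python sketch shows the shape of "reliable data/file transfer accounting for system/network failures": a download that streams in chunks, retries with backoff on network errors, and verifies a checksum before accepting the file. The URL, filename, and digest are hypothetical placeholders, not part of the presentation.

```python
# Illustrative sketch only: bounded-retry, checksum-verified file transfer.
# The endpoint URL and expected digest used in the example are hypothetical.
import hashlib
import time
import urllib.request


def reliable_fetch(url, dest, expected_sha256, retries=5, backoff_s=2.0):
    """Download url to dest, retrying on failures and verifying integrity."""
    for attempt in range(1, retries + 1):
        try:
            digest = hashlib.sha256()
            with urllib.request.urlopen(url, timeout=30) as resp, open(dest, "wb") as out:
                while True:
                    chunk = resp.read(1 << 20)  # stream in 1 MiB pieces
                    if not chunk:
                        break
                    digest.update(chunk)
                    out.write(chunk)
            if digest.hexdigest() != expected_sha256:
                raise OSError("checksum mismatch; transfer was corrupted")
            return  # success
        except OSError:
            if attempt == retries:
                raise  # give up after the last attempt
            time.sleep(backoff_s * attempt)  # simple linear backoff before retrying


# Hypothetical usage:
# reliable_fetch("https://example.org/climate/run042.nc", "run042.nc",
#                expected_sha256="<known digest>")
```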

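Similarly, "publish/subscribe and global discovery" is named as a requirement rather than a design. The sketch below, again purely illustrative and in-process only, shows the basic pattern: producers publish metadata records under topics, subscribers register callbacks, and a discovery call lists known topics. The class and topic names are invented for the example; a real collaboratory would place such a registry behind a networked, authenticated service.

```python
# Illustrative in-process publish/subscribe registry with topic discovery.
# Class and topic names are invented for this example.
from collections import defaultdict
from typing import Callable, Dict, List


class MetadataBus:
    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)
        self._catalog: Dict[str, List[dict]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[dict], None]) -> None:
        """Register a callback to be invoked for every record published on topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, record: dict) -> None:
        """Store the record for later discovery and notify current subscribers."""
        self._catalog[topic].append(record)
        for callback in self._subscribers[topic]:
            callback(record)

    def discover(self, prefix: str = "") -> List[str]:
        """Global discovery: list known topics, optionally filtered by prefix."""
        return sorted(t for t in self._catalog if t.startswith(prefix))


# Usage (topic names are made up for illustration):
bus = MetadataBus()
bus.subscribe("hep/run-summaries", lambda rec: print("new run:", rec["run_id"]))
bus.publish("hep/run-summaries", {"run_id": 1234, "events": 5_000_000})
print(bus.discover("hep/"))  # -> ['hep/run-summaries']
```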