Where do we go from here?
“Knowledge Environments to Support Distributed Science and Engineering”
Symposium on Knowledge Environments for Science and Engineering
November 26, 2002
Mary Anne Scott
Department of Energy, Office of Science
Distributed Resources; Distributed Expertise
[Map of DOE Office of Science sites across the United States. Legend: Multiprogram Laboratories, Program-Dedicated Laboratories, Specific-Mission Laboratories, Major User Facilities, User Institutions. Sites shown: Pacific Northwest National Laboratory; Idaho National Engineering and Environmental Laboratory; Lawrence Berkeley National Laboratory; Lawrence Livermore National Laboratory; Stanford Linear Accelerator Center; Sandia National Laboratories; Los Alamos National Laboratory; National Renewable Energy Laboratory; Ames Laboratory; Argonne National Laboratory; Fermi National Accelerator Laboratory; Oak Ridge National Laboratory; Thomas Jefferson National Accelerator Facility; Brookhaven National Laboratory; Princeton Plasma Physics Laboratory.]
DOE Office of Science Context
Research
Pre-1995: Foundational technology (Nexus, MPI, Mbone, …)
1995-1997: Distributed Collaborative Experiment Environment projects (testbeds and supporting technology)
1997-2000: DOE 2000 Program (pilot collaboratories and technology projects)
2000-present: National Collaboratories Program
2001-present: Scientific Discovery Through Advanced Computing (SciDAC)
Planning
To inform the development and deployment of technology, a set of high-impact science applications in high energy physics, climate, chemical sciences, magnetic fusion energy, and molecular biology has been analyzed* to characterize their visions for the future process of science and the networking and middleware capabilities needed to support those visions.
*DOE Office of Science, High Performance Network Planning Workshop, August 13-15, 2002, Reston, Virginia, USA. http://doecollaboratory.pnl.gov/meetings/hpnpw
MAGIC for addressing the coordination problem?
Middleware And Grid Infrastructure Coordination
A team under the Large Scale Networking (LSN) interagency coordination group
Meets monthly (1st Wednesday of each month)
Federal participants: ANL, DOE, LANL, LBL, NASA, NCO, NIH, NIST, NOAA, NSF, PNL, UCAR
Other participants: Boeing, Cisco, Educause, HP, IBM, Internet2, ISI, Level3, Microsoft, U-Chicago, UIUC, U-Wisconsin
Workshop held in Chicago, Aug 26-28: ~100 participants, with editors, contributors, and attendees from Federal Government agencies and labs, industry, universities, and international organizations
“Blueprint for Future Science Middleware and Grid Research and Infrastructure”
Driving Factors for Middleware and Grids
Science Push
New classes of scientific problems are enabled by technology development:
High energy physicists will harness tens of thousands of CPUs in a worldwide data grid
On-line digital sky surveys require mechanisms for data federation and effective navigation
Advances in medical imaging and related technologies enable collaboration across disciplines and scale
Coupling of expertise, collaboration, and disciplines encourages the development of new science and research.
Technology Pull
Continuing exponential advances in sensor, computer, storage, and network capabilities will occur.
Sensor networks will create experimental facilities.
Petabyte and exabyte databases will become feasible.
Increases in numerical and computer modeling capabilities broaden the base of science disciplines.
Increases in network speeds make it feasible to connect distributed resources as never before.
Future Science (~5yr)
Discipline: Climate
Characteristics: Many simulation elements/components added as understanding increases; 100 TBy/100 yr of generated simulation data, 1-5 PBy/yr (per institution) distributed to major users in large chunks for post-simulation analysis
Vision for the Future Process of Science: Enable the analysis of model data by all of the collaborating community; productivity aspects of rapid response
Anticipated Requirements (Networking): Authenticated data streams for easier site access through firewalls; robust access to large quantities of data
Anticipated Requirements (Middleware): Server-side data processing (compute/cache embedded in the net); reliable data/file transfer (accounting for system/network failures)

Discipline: High Energy Physics
Characteristics: Instrument-based data sources; hierarchical data repositories; hundreds of analysis sites; 100s of petabytes of data; global collaboration; compute and storage requirements satisfied by optimal use of all available global resources
Vision for the Future Process of Science: Worldwide collaboration will cooperatively analyze data and contribute to a common knowledge base; discovery of published (structured) data and its provenance
Anticipated Requirements (Networking): 100 Gbit/sec; lambda-based point-to-point links for single high-bandwidth flows; capacity planning; network monitoring
Anticipated Requirements (Middleware): Track worldwide resource usage patterns to maximize utilization; direct network access to data management systems; monitoring to enable optimized use of network, caching, compute, and storage resources; publish/subscribe and global discovery

Discipline: Chemical Sciences
Characteristics: 3D simulation data sets (30-100 TB); coupling of MPP quantum chemistry and molecular dynamics simulations; validation using large experimental data sets
Vision for the Future Process of Science: Remote steering of simulation time steps; remote data sub-setting, mining, and visualization; shared data/metadata with annotation evolving into a knowledge base
Anticipated Requirements (Networking): ~100 Gbit for distributed computational chemistry and molecular dynamics simulations
Anticipated Requirements (Middleware): Management of metadata; global event services; cross-discipline repositories; international interoperability for collaboration infrastructure, repositories, search, and notification; archival publication
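To make the recurring middleware requirement "reliable data/file transfer (accounting for system/network failures)" concrete, the following is a minimal sketch in Python of a retrying, checksum-verified download. It is not DOE software or the presentation's method; the URL, chunk size, retry limits, and checksum value are illustrative assumptions.

# Minimal sketch: retry with exponential backoff plus integrity checking,
# one simple way to account for system/network failures during transfer.
import hashlib
import time
import urllib.request


def fetch_with_retry(url, dest_path, expected_sha256=None,
                     max_attempts=5, backoff_s=2.0):
    """Download url to dest_path, retrying transient failures and
    optionally verifying integrity against a known SHA-256 digest."""
    for attempt in range(1, max_attempts + 1):
        try:
            digest = hashlib.sha256()
            with urllib.request.urlopen(url, timeout=30) as resp, \
                 open(dest_path, "wb") as out:
                while True:
                    chunk = resp.read(1 << 20)  # stream in 1 MiB chunks
                    if not chunk:
                        break
                    digest.update(chunk)
                    out.write(chunk)
            if expected_sha256 and digest.hexdigest() != expected_sha256:
                raise IOError("checksum mismatch: data corrupted in transit")
            return dest_path
        except OSError:
            if attempt == max_attempts:
                raise
            # Back off before retrying after a system/network failure.
            time.sleep(backoff_s * 2 ** (attempt - 1))


# Hypothetical usage; the host and checksum are placeholders:
# fetch_with_retry("https://data.example.gov/climate/run42.nc", "run42.nc",
#                  expected_sha256="<known digest>")

Production data-grid tools of the era (e.g., GridFTP) layer authentication, parallel streams, and restart markers on top of this basic retry-and-verify idea; the sketch only illustrates the failure-handling core.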