Djorgovski - Federation of Earth Science Information Partners

Virtual Observatory:
A Quick Overview, and
Some Lessons Learned
S. George Djorgovski
Caltech
ESIP Workshop,
UCSB, July 2009
Astronomy Has Become Very Data-Rich
• Typical digital sky survey now generates ~ 10 - 100 TB, plus a
comparable amount of derived data products
– PB-scale data sets are on the horizon
• Astronomy today has ~ 1 - 2 PB of archived data, and generates
a few TB/day
– Both data volumes and data rates grow exponentially, with a
doubling time ~ 1.5 years
– Even more important is the growth of data complexity
• For comparison:
Human memory ~ a few hundred MB
Human Genome < 1 GB
1 TB ~ 2 million books
Library of Congress (print only) ~ 30 TB
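The doubling time quoted above can be turned into concrete growth factors. The following is a minimal sketch, assuming pure exponential growth with a constant 1.5-year doubling time; the function names are illustrative and not from the talk.

```python
import math

def growth_factor(years, doubling_time_years=1.5):
    """Factor by which a quantity grows over `years`, assuming a constant doubling time."""
    return 2.0 ** (years / doubling_time_years)

def years_to_grow(factor, doubling_time_years=1.5):
    """Years needed to grow by `factor`, under the same constant-doubling assumption."""
    return doubling_time_years * math.log2(factor)

# Growth over one decade at a 1.5-year doubling time (roughly a factor of 100).
decade_factor = growth_factor(10)
# Time to go from TB scale to PB scale, i.e. a factor of 1000 (roughly 15 years).
tb_to_pb_years = years_to_grow(1000)

print(f"Growth per decade: x{decade_factor:.1f}")
print(f"TB -> PB takes about {tb_to_pb_years:.1f} years")
```

Under this assumption, data volumes grow about a hundredfold per decade, so the step from TB-scale to PB-scale data sets takes roughly 15 years.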
[Figure: exponential growth, 1970 - 2000, on a logarithmic axis spanning 0.1 - 1000; doubling t ≈ 1.5 yrs]
Exponential Growth
in Data Volumes and
Complexity
TB’s to PB’s of data,
10^8 - 10^9 sources,
10^2 - 10^3 param./source
[Figure labels: Crab, CCDs, Star forming complex, Glass]
Multi- data fusion leads to a more
complete, less biased picture
(also: multi-scale, multi-epoch, …)
Visible + X-ray
Radio + IR
Understanding of complex phenomena requires complex data!
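One common way to fuse catalogs from different wavelength regimes is a positional cross-match. The sketch below is an illustration under stated assumptions, not the method used by any particular VO service: the catalog layout (dicts with "ra" and "dec" in degrees), the 2-arcsecond match radius, and all coordinate and flux values are made up for the example. It uses a brute-force search, which is only practical for small catalogs.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error before taking asin.
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, a))))

def cross_match(cat_a, cat_b, radius_arcsec=2.0):
    """For each source in cat_a, attach the nearest cat_b source within the radius.

    Brute force, O(N*M); sources in cat_a without a counterpart are dropped.
    """
    radius_deg = radius_arcsec / 3600.0
    merged = []
    for a in cat_a:
        best, best_sep = None, radius_deg
        for b in cat_b:
            sep = angular_separation_deg(a["ra"], a["dec"], b["ra"], b["dec"])
            if sep <= best_sep:
                best, best_sep = b, sep
        if best is not None:
            extra = {k: v for k, v in best.items() if k not in ("ra", "dec")}
            merged.append({**a, **extra, "sep_arcsec": best_sep * 3600.0})
    return merged

# Made-up example catalogs (hypothetical values, not real measurements).
optical = [
    {"ra": 150.0000, "dec": 2.0000, "g_mag": 18.4},
    {"ra": 10.0000, "dec": -5.0000, "g_mag": 19.2},
]
xray = [{"ra": 150.0002, "dec": 2.0001, "xray_flux": 2.4e-8}]

fused = cross_match(optical, xray)
print(fused)
```

Each fused record carries parameters from both catalogs, which is the sense in which the combined data set holds more information than either input. Catalogs at survey scale would need a spatial index rather than this nested loop.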
Numerical simulations are also
producing many TB’s of very
complex “data”
Data + Theory = Understanding
The Archive Archipelago
• As the data sets kept increasing, a number of archives, data
depositories, and digital library services were created
• All of them are mission-, domain-, or observatory-specific,
distinct and independent scientifically, technologically,
institutionally, heterogeneous in look-feel, usage, etc.
– There was a considerable replication of effort
– There was some functional redundancy
– There was almost no interoperability
– Some standards have been generally adopted (e.g., FITS)
• All of them were primarily designed for single-object (or
single-pointing) queries - and thus inherently unsuitable for
the science enabled by the massive and complex data sets
• The next step was clearly to connect them in a functional
manner, and develop interoperability standards, formats, etc.
The Virtual Observatory Concept
• A complete, dynamical, distributed, open research
environment for the new astronomy with massive and
complex data sets
– Provide and federate
content (data, metadata)
services, standards, and
analysis/compute services
– Develop and provide
data exploration and
discovery tools
– Not just the archives!
– A part of a broader
Cyber-Infrastructure and
e-Science movement
From Traditional to Survey to VO Science
Traditional: Telescope → Data Analysis → Results
Survey-Based: Survey Telescope → Archive → Data Mining → Target Selection → Follow-Up Telescopes → Results
(additional input: Another Survey/Archive?)
Highly successful, but inherently limited by the information content
of individual sky surveys … What comes next, beyond survey
science, is VO science
A Systemic View of the VO-Based Science
[Diagram: Primary Data Providers (Surveys, Observatories, Missions); Secondary Data Providers (Survey and Mission Archives); VO Data Services; Data Mining and Analysis, Target Selection; Follow-Up Telescopes and Missions; Results; Digital libraries]
VO connects the whole
system of astronomical
research
A Brief History of the VO Concept
• Early (pre-web!) ideas already in the “Astrophysics Data System”
(only the digital library part survives)
• Concept developed through 1990’s, mainly from large digital sky
surveys (DPOSS, SDSS…), discussions at conferences and
workshops in the late 1990’s
• Top recommendation in the “small projects” category in the NAS
Decadal Astronomy & Astrophysics survey
(the McKee-Taylor report), 2001
• The first major VO conference at Caltech in
2000; the NVO White paper
• National Virtual Observatory Science
Definition Team, 2001 - 2002
• ESO conferences, 2001 - 2002
• Vigorous international efforts, coordinated
via International VO Alliance (IVOA)
VO Development and Status
• NSF-funded framework development project (2001-2008): the
U.S. National Virtual Observatory (NVO)
• Now in a facility regime: the Virtual Astronomical Observatory (VAO)
• Joint funding by the NSF and NASA
• Work largely done in the existing data archives, and thus very
data-centric
• Vigorous international efforts (IVOA)
http://us-vo.org
http://ivoa.net
Scientific Roles and Benefits of a VO
• Facilitate science with massive data sets (observations
and theory/simulations) → an efficiency amplifier
• Provide an added value from federated data sets (e.g.,
multi-wavelength, multi-scale, multi-epoch …)
– Discover the knowledge which is present in the data,
but can be uncovered only through data fusion
• Enable and stimulate some qualitatively new science
with massive data sets (not just old-but-bigger)
• Optimize the use of expensive resources (e.g., space
missions, large ground-based telescopes, computing …)
• Provide R&D drivers, application testbeds, and stimulus
to the partnering disciplines (CS/IT, statistics …)
VO Represents a New Type of a
Scientific Organization
for the era of information abundance
• It is not yet another data center, archive, mission, or
a traditional project → it does not fit into any
of the usual organizational structures
– It is inherently distributed, and web-centric
– It is fundamentally based on a rapidly developing
technology (IT/CS)
– It transcends the traditional boundaries between
different wavelength regimes, agency domains
– It has an unusually broad range of constituents and
interfaces
– It is inherently multidisciplinary
Broader and Societal Benefits of a VO
• Professional Empowerment: Scientists and students
anywhere with an internet connection would be able to
do first-rate science → a broadening of the
talent pool in astronomy, democratization of the field
• Interdisciplinary Exchanges:
– The challenges facing the VO are common to most
sciences and other fields of modern human endeavor
– Intellectual cross-fertilization, feedback to IT/CS
• Education and Public Outreach:
– Unprecedented opportunities in terms of the content,
broad geographical and societal range, at all levels
– Astronomy as a magnet for the CS/IT education
“Weapons of Mass Instruction”
VO Education and Public Outreach
Microsoft’s WorldWide Telescope and Google Sky use DSS,
SDSS, HST data, etc., for easy sky browsing
VO Functionality Today
What we have done so far:
• Lots of progress on interoperability, standards, etc.
• An incipient data grid of astronomy
• Some useful web services
• Community training, EPO
What we did not do (yet):
• Significant data exploration and mining tools
That is where the science will come from!
Thus, little VO-enabled science so far
Thus, a slow community buy-in
→ Development of powerful, usable knowledge
discovery tools should be a key priority
An Evolving Sociology
• We have transitioned from the data poverty regime into
an era of exponential data abundance
– Most astronomers do not seem to fully realize this
– Proprietary periods should be re-thought; there are other modes
of data access rights currencies, different scenarios?
– Data are cheap, but the expertise is expensive (and creativity is
priceless)
• Telescopes are just the hardware needed to generate the
data; and data are just incidental to our real mission,
which is knowledge creation
– When the data and the exploration tools are on the web, the
value of large facilities ownership should be rethought
– Computers are (relatively) cheap, but software is expensive —
especially if you are not approaching it in a smart way
Information Technology → New Science
• The information volume grows exponentially
Most data will never be seen by humans!
The need for data storage, network, database-related
technologies, standards, etc.
• Information complexity is also increasing greatly
Most data (and data constructs) cannot be
comprehended by humans directly!
The need for data mining, KDD, data understanding
technologies, hyperdimensional visualization,
AI/Machine-assisted discovery …
• We need to create a new scientific methodology on the
basis of applied CS and IT
• VO is the framework to effect this for astronomy
Some Readings:
• A quick summary:
– “Virtual Observatory: From Concept to Implementation”,
Djorgovski, S.G., & Williams, R. 2005, A.S.P. Conf. Ser. 345,
517, available as http://arXiv.org/abs/astro-ph/0504006
• The original VO White Paper:
– “Toward a National Virtual Observatory: Science Goals,
Technical Challenges, and Implementation Plan”, in Virtual
Observatories of the Future, A.S.P. Conf. Ser. 225, 353,
available as http://arXiv.org/abs/astro-ph/0108115
• The NVO SDT report, from http://www.us-vo.org/sdt
• Many other good documents available at http://us-vo.org
(especially the summer school presentations)
• Technical documents at http://www.ivoa.net