Online Science
The World-Wide Telescope
as a Prototype For
the New Computational Science
Jim Gray
Microsoft Research
http://research.microsoft.com/~gray
Alex Szalay
Johns Hopkins University
1
The Evolution of Science
• Observational Science
– Scientist gathers data by direct observation
– Scientist analyzes data
• Analytical Science
– Scientist builds analytical model
– Makes predictions.
• Computational Science
– Simulate analytical model
– Validate model and make predictions
• Data Exploration Science
– Data captured by instruments
– Or data generated by simulator
– Processed by software
– Placed in a database / files
– Scientist analyzes database / files
2
Information Avalanche
• In science, industry, government, …
– better observational instruments and
– better simulations
are producing a data avalanche
Image courtesy
C. Meneveau & A. Szalay @ JHU
• Examples
– BaBar: Grows 1TB/day
2/3 simulation Information
1/3 observational Information
– CERN: LHC will generate 1GB/s, ~10 PB/y
– VLBA (NRAO) generates 1GB/s today
– Pixar: 100 TB/Movie
BaBar, Stanford
P&E Gene Sequencer From
http://www.genome.uci.edu/
• New emphasis on informatics:
– Capturing, Organizing,
Summarizing, Analyzing, Visualizing
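
The rates quoted in the examples above can be put on a common yearly scale with a little arithmetic. The sketch below is illustrative only: the helper names are hypothetical, decimal units (1 PB = 10^6 GB) are assumed, and the "duty cycle" reading of the ~10 PB/y figure is an assumption, not something the slide states.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # 31,536,000 s


def petabytes_per_year(gb_per_second: float, duty_cycle: float = 1.0) -> float:
    """Convert a sustained rate in GB/s to PB/year (assumes 1 PB = 10**6 GB)."""
    return gb_per_second * SECONDS_PER_YEAR * duty_cycle / 1e6


def terabytes_per_year(tb_per_day: float) -> float:
    """Convert a growth rate in TB/day to TB/year."""
    return tb_per_day * 365


# 1 GB/s running continuously for a whole year
continuous_pb = petabytes_per_year(1.0)

# Fraction of the year at 1 GB/s that would yield the slide's ~10 PB/y
# (assumption: the gap is explained by the instrument not running all year)
implied_duty_cycle = 10.0 / continuous_pb

# BaBar-style growth of 1 TB/day
babar_tb = terabytes_per_year(1.0)

print(f"1 GB/s continuous: {continuous_pb:.1f} PB/y")
print(f"duty cycle implied by ~10 PB/y: {implied_duty_cycle:.2f}")
print(f"1 TB/day: {babar_tb:.0f} TB/y")
```

Under these assumptions, 1 GB/s sustained all year is roughly three times the quoted ~10 PB/y, so the quoted figure is consistent with recording for about a third of the year.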
3
Space Telescope
World Wide Telescope
Virtual Observatory
http://www.astro.caltech.edu/nvoconf/
http://www.voforum.org/
• Premise:
Most data is (or could be) online
• The Internet is the world’s best telescope:
– It has data on every part of the sky
– In every measured spectral band: optical, x-ray, radio, ...
– As deep as the best instruments (2 years ago).
– It is up when you are up.
The “seeing” is always great
(no working at night, no clouds, no moons, no ...).
– It’s a smart telescope:
links objects and data
to literature on them.
4
Why Astronomy Data?
• It has no commercial value
– No privacy concerns
– Can freely share results with others
– Great for experimenting with algorithms
• It is real and well documented
– High-dimensional data (with confidence intervals)
– Spatial data
– Temporal data
• Many different instruments from
many different places and
many different times
• Federation is a goal
• There is a lot of it (petabytes)
• Great sandbox for data mining algorithms
– Can share cross company
– University researchers
• Great way to teach both
Astronomy and
Computational Science
[Images of the sky in different bands: ROSAT ~keV, DSS Optical, 2MASS 2 µm, IRAS 25 µm, IRAS 100 µm, GB 6 cm, NVSS 20 cm, WENSS 92 cm]
5
SkyServer.SDSS.org
• A modern Astronomy archive
– Raw Pixel data lives in file servers
– Catalog data (derived objects) lives in Database
– Online query to any and all
• Also used for education
– 150 hours of online Astronomy
– Implicitly teaches data analysis
• Interesting things
– Spatial data search
– Client query interface via Java Applet
– Query interface via Emacs
– Popular
– Cloned by other surveys (a template design)
– Web services are core of it.
6
Federation: SkyQuery.Net
• Combine 4 archives initially
• Just added 6 more
• Send query to portal,
portal joins data from archives.
• Problem: want to do multi-step data analysis
(not just single query).
• Solution: Allow personal databases on portal
• Problem: some queries are monsters
• Solution: “batch schedule” on portal server,
Deposits answer in personal database.
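
The two problem/solution pairs above (personal databases for multi-step analysis, batch scheduling for monster queries) can be illustrated with a toy model. This is a minimal sketch, not the portal's actual design: the `Portal` class, its methods, and the table layout are all hypothetical, and in-memory SQLite stands in for both the archive and the personal databases.

```python
import sqlite3
from collections import deque


class Portal:
    """Toy portal: queues queries and deposits each answer in the user's own database."""

    def __init__(self, archive: sqlite3.Connection):
        self.archive = archive   # shared catalog (hypothetical stand-in for an archive)
        self.personal = {}       # user name -> personal sqlite3 connection
        self.queue = deque()     # pending (user, sql, result_table) jobs

    def personal_db(self, user: str) -> sqlite3.Connection:
        if user not in self.personal:
            self.personal[user] = sqlite3.connect(":memory:")
        return self.personal[user]

    def submit(self, user: str, sql: str, result_table: str) -> None:
        """Queue a query instead of running it while the user waits."""
        self.queue.append((user, sql, result_table))

    def run_batch(self) -> None:
        """Run queued jobs in order; store each answer as a table in the personal database."""
        while self.queue:
            user, sql, table = self.queue.popleft()
            cursor = self.archive.execute(sql)
            columns = ", ".join(f'"{d[0]}"' for d in cursor.description)
            placeholders = ", ".join("?" * len(cursor.description))
            db = self.personal_db(user)
            # Toy code: a real service would validate the table name.
            db.execute(f'CREATE TABLE "{table}" ({columns})')
            db.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', cursor.fetchall())
            db.commit()


# Made-up catalog rows, for illustration only
archive = sqlite3.connect(":memory:")
archive.execute("CREATE TABLE objects (objid INTEGER, ra REAL, dec REAL, mag REAL)")
archive.executemany(
    "INSERT INTO objects VALUES (?, ?, ?, ?)",
    [(1, 10.0, 0.5, 17.2), (2, 10.1, 0.6, 21.5), (3, 200.0, -1.0, 16.8)],
)

portal = Portal(archive)
portal.submit("alice", "SELECT objid, ra, dec FROM objects WHERE mag < 18", "bright")
portal.run_batch()

# Second step of the analysis runs against the personal table, not the archive
bright_count = portal.personal_db("alice").execute("SELECT COUNT(*) FROM bright").fetchone()[0]
print(bright_count)
```

Keeping intermediate results in a per-user database is what lets a later step build on an earlier one without shipping the rows back to the client.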
7
SkyQuery Structure
• Each SkyNode publishes
– Schema Web Service
– Database Web Service
• Portal
– Plans Query (2 phase)
– Integrates answers
– Is itself a web service
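
One way to read "Plans Query (2 phase)" is sketched below. The slide does not spell out the two phases, so this is an assumption: phase 1 asks every node how many objects it has in the region, and phase 2 runs the cross-match from the smallest node to the largest so that the candidate list stays short. The `SkyNode` class and its methods are hypothetical, the catalogs are made-up numbers, and the match uses a simple box tolerance in degrees.

```python
from dataclasses import dataclass
from typing import List, Tuple

Position = Tuple[float, float]  # (ra, dec) in degrees


@dataclass
class SkyNode:
    name: str
    catalog: List[Position]  # hypothetical local catalog

    def count(self, ra_min: float, ra_max: float) -> int:
        """Phase 1: cheap estimate of how many objects fall in the RA range."""
        return sum(ra_min <= ra <= ra_max for ra, _ in self.catalog)

    def select(self, ra_min: float, ra_max: float) -> List[Position]:
        return [p for p in self.catalog if ra_min <= p[0] <= ra_max]

    def match(self, candidates: List[Position], tol: float) -> List[Position]:
        """Keep candidates that have a local object within tol degrees in both coordinates."""
        return [
            c for c in candidates
            if any(abs(c[0] - ra) <= tol and abs(c[1] - dec) <= tol
                   for ra, dec in self.catalog)
        ]


def plan(nodes: List[SkyNode], ra_min: float, ra_max: float) -> List[SkyNode]:
    """Phase 1: order nodes by their counts, smallest first."""
    return sorted(nodes, key=lambda n: n.count(ra_min, ra_max))


def execute(nodes: List[SkyNode], ra_min: float, ra_max: float, tol: float = 0.001):
    """Phase 2: start from the smallest node and filter through the rest."""
    ordered = plan(nodes, ra_min, ra_max)
    result = ordered[0].select(ra_min, ra_max)
    for node in ordered[1:]:
        result = node.match(result, tol)
    return [n.name for n in ordered], result


sdss = SkyNode("SDSS", [(10.0, 0.5), (10.1, 0.6), (10.2, 0.7), (10.3, 0.8)])
first = SkyNode("FIRST", [(10.1, 0.6)])
twomass = SkyNode("2MASS", [(10.1, 0.6), (10.2, 0.7)])

order, matches = execute([sdss, first, twomass], ra_min=9.0, ra_max=11.0)
print(order, matches)
```

Starting with the sparsest catalog means the larger nodes only ever test a handful of candidates, which is the usual motivation for a count-first plan.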
[Diagram: SkyQuery Portal connected to the SDSS, INT, FIRST, and 2MASS SkyNodes and to an Image Cutout service]
8
Information Avalanche:
science, business, personal
Astronomy data
SkyServer: http://SkyServer.SDSS.org
demo http://skyquery.net/
pixel space
record space
set space
Personal SkyServer download http://skyserver.org/myskyserver/
Mention data mining.
World-Wide Telescope
Federated web services
demo http://skyquery.net/
Other web services
Interop with Linux/Python/…
Other stuff
Portal with batch job scheduler
http://skyservice.pha.jhu.edu/devel/casjobs/
9