HUBBLE LEGACY ARCHIVE PROJECT @ STSCI
Astronomical Data Tagging
Web 2.0 meets Astronomy in the HLA
Niall I. Gaffney, W. Warren Miller
(STScI)
What is the HLA
• Hubble Legacy Archive
– Joint project STScI, ST-ECF, CADC
– Providing best archive data products from HST data
• Improving WCS solutions
• Combine data
• Extracting image photometry and GRISM spectra
• Create Simple and Powerful User Interface
– Typical HST archive user visits once a year
– Get the right data into the user's own environment
• Users want to use their daily applications (e.g. web)
• Users have their own data analysis system
HLA UI Philosophy
• UI “Requirements” from users
– Interfaces must be simple, understandable, powerful,
rich, self-explanatory
• “Google like”
– Interface must feature the Data and not the Query
– Interface must NOT get in the way of getting data and
using them in the tools users are accustomed to
– Interface should expose information that previous
interfaces have not been able to
Early Data Release - Target Oriented
Who else does this…
What is Web 2.0
• Web 2.0 is a change in how we use the network
• Web 2.0 is NOT dynamic web pages (AJAX)
– Web 2.0 is enabled by AJAX
• Web 2.0 is applications and APIs delivered via the web
– Netscape vs. Google
– DoubleClick vs. AdSense
– My Home Page vs. My Blog or MySpace
• A synergy between services and information to provide a
more focused information service
• User aware and user provided (context)
• Tim O’Reilly article with long discussion
http://www.oreilly.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html
YouTube - Data and Tags
Where to get Tags for our Data
• Proposal data not enough (one target in a sea)
• Astronomers are few and busy
– It's not “Browse or Perish”, it's “Publish or Perish”
What we did
• Use a “basic footprint” (aka cone search) with
Simbad to identify objects within a given field
– Not a true footprint as objects returned are all points
• Used Simbad to then get bibcodes for objects
• Used ADS to get keywords for each bibcode
• Harvested other data from HST proposal
information (abstract, proposed targets…)
• Use Apache Lucene as our search engine
• Modified the Apache Lucene search demo
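The pipeline above can be sketched in a few lines. This is a minimal illustration only, not the HLA code: the real system queried Simbad and ADS and indexed with Apache Lucene, whereas here the catalog, the bibcode table, the keyword table, and the inverted index are all small in-memory stand-ins. Every name and value below (`cone_search`, `harvest_tags`, `build_index`, `OBJ-A`, `BIB-1`, `obs-1`, the coordinates) is hypothetical.

```python
import math
from collections import defaultdict


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cone_search(objects, ra, dec, radius_deg):
    """Return names of point objects within radius_deg of the pointing."""
    return [o["name"] for o in objects
            if angular_sep_deg(ra, dec, o["ra"], o["dec"]) <= radius_deg]


def harvest_tags(names, bibcodes_by_object, keywords_by_bibcode):
    """Object -> bibcodes -> keywords, collected into one tag set."""
    tags = set()
    for name in names:
        for bibcode in bibcodes_by_object.get(name, []):
            tags.update(keywords_by_bibcode.get(bibcode, []))
    return tags


def build_index(tags_by_observation):
    """Toy inverted index (tag -> observations); stands in for Lucene."""
    index = defaultdict(set)
    for obs_id, tags in tags_by_observation.items():
        for tag in tags:
            index[tag.lower()].add(obs_id)
    return index


# Hypothetical stand-ins for the Simbad and ADS lookups.
catalog = [
    {"name": "OBJ-A", "ra": 10.00, "dec": 20.0},
    {"name": "OBJ-B", "ra": 10.05, "dec": 20.0},
    {"name": "OBJ-C", "ra": 50.00, "dec": -30.0},
]
bibcodes_by_object = {"OBJ-A": ["BIB-1"], "OBJ-B": ["BIB-1", "BIB-2"]}
keywords_by_bibcode = {
    "BIB-1": ["Galaxy Clusters"],
    "BIB-2": ["Gravitational Lensing"],
}

found = cone_search(catalog, ra=10.0, dec=20.0, radius_deg=0.1)
tags = harvest_tags(found, bibcodes_by_object, keywords_by_bibcode)
index = build_index({"obs-1": tags})
```

As the slide notes, this is not a true footprint: every catalog object is treated as a point, and the search region is a circle rather than the detector outline.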
How well did this work
• 43% of the 2769 ACS WFC “visits” in the past 2 years
– 38% of “visits” are parallels (semi-random pointing)
• Average ~ 22 keywords per observation with keywords
[Figure: histogram titled “Keywords”; x-axis: Number of Keywords (1 to 101); y-axis: Count (0 to 120)]
DEMO
Where to go next
• Scientific input needed
– Is “More Like This” scientifically useful or annoying
more often than not? Can it be tweaked?
• Footprints and more Footprints
– Intersecting observation footprints with object
footprints would improve tags (especially for smaller fields)
– Real time evaluation for cutouts and surveys (seconds
not minutes)
• Standardize tags more
– Case, spelling, removal of irrelevant words (e.g.
“Galaxy Clusters General” -> “Galaxy Clusters”,
“Colour” -> “Color”, “Charged Coupled Device”
-> /dev/null)
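A minimal sketch of the tag standardization proposed above, using only the three examples from the slide. The lookup tables and the function names (`normalize_tag`, `normalize_tags`) are assumptions made for illustration; a real rule set would have to be built from the harvested keyword vocabulary.

```python
import re

# Hypothetical rule tables, seeded only with the examples on the slide.
SPELLING = {"colour": "color"}                # spelling variants to unify
TRAILING_NOISE = {"general"}                  # qualifiers dropped from the end
IRRELEVANT_TAGS = {"charged coupled device"}  # whole tags to discard


def normalize_tag(tag):
    """Return a standardized tag, or None if the tag should be dropped."""
    # Unify case and collapse runs of whitespace.
    words = re.sub(r"\s+", " ", tag.strip().lower()).split(" ")
    # Unify spelling word by word.
    words = [SPELLING.get(word, word) for word in words]
    # Strip uninformative trailing qualifiers.
    while words and words[-1] in TRAILING_NOISE:
        words.pop()
    cleaned = " ".join(words)
    if not cleaned or cleaned in IRRELEVANT_TAGS:
        return None
    return cleaned.title()


def normalize_tags(tags):
    """Normalize a collection of tags, dropping discards and duplicates."""
    normalized = {normalize_tag(tag) for tag in tags}
    normalized.discard(None)
    return sorted(normalized)


example = normalize_tags([
    "Galaxy Clusters General",
    "galaxy  CLUSTERS",
    "Colour",
    "Charged Coupled Device",
])
```

Here the two spellings of the cluster tag collapse into one entry and the instrument keyword is discarded, so `example` holds just two tags.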
AstroTube