Transcript: GAVO Workshop
Garching, June 27 – July 1, 2005
Statistical Cross-Matching
Across Distributed Archives
H.-M. Adorf & GAVO Team
MPI f. extraterrestrische Physik
Statistical cross-matching
• Cross-matching of astrometric and photometric catalogues
– core functionality of a virtual observatory
• Operational modes
– on an area of the sky
– using an input catalogue (GAVO matcher)
Hans-Martin Adorf, GAVO
Matcher Demo, Page 2
Philosophy
• Build a cross-matcher application that
– should be usable by scientists and help produce science results
– uses what’s there and what works now
– doesn’t get stopped by a missing standard
• Support the VO process by
– helping to generate appropriate VO standards
– adopting new VO standards whenever feasible
Querying remote archives
• Movie
• Using up to 10 servers
– distributed around the world
– operating in parallel
• Sneak preview of grid computing
– Locally specify your tasks
– Execute them remotely at the data centers
– Receive results locally for final combination
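The parallel querying of several archive servers can be sketched with a thread pool. This is a minimal illustration, not the GAVO matcher's code: `query_archive` is a hypothetical stand-in for the real HTTP request to a server such as SDSS SkyServer or VizieR, and the default of 10 workers simply mirrors the "up to 10 servers" above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def query_archive(server, ra, dec, radius_arcsec):
    """Hypothetical stand-in for one remote cone query.

    A real implementation would send an HTTP request to `server` and parse
    the CSV or VOTable answer; here the request parameters are just echoed.
    """
    return [{"server": server, "ra": ra, "dec": dec, "radius": radius_arcsec}]


def query_all(servers, ra, dec, radius_arcsec, max_workers=10):
    """Query all servers in parallel and combine the results locally."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(query_archive, s, ra, dec, radius_arcsec): s
                   for s in servers}
        for future in as_completed(futures):
            server = futures[future]
            try:
                results[server] = future.result()
            except Exception as exc:  # one failing server must not stop the others
                results[server] = exc
    return results
```

Results arrive in completion order, so a slow server does not delay combining the answers from the fast ones.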
Software demo (#1)
• Input list
– 67 galaxies from the FIRST radio catalogue
• Query
– 2 remote archives: SDSS, VizieR
– 20 catalogues: radio, infrared, optical, X-ray
• Task
– get counterparts for each input coordinate
– gather the counterparts into reasonable matches
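The first task amounts to a cone search around each input coordinate. A minimal local sketch, assuming the candidate counterparts are available as dicts with `ra`/`dec` in degrees (in the matcher itself the remote archives perform this search); the search radius is a free parameter that the slides do not specify.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle distance between two sky positions (degrees in, arcsec out).

    Uses the haversine formula, which stays accurate for small separations.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h)))) * 3600.0


def counterparts(ra, dec, catalogue, radius_arcsec):
    """Return the catalogue rows lying within `radius_arcsec` of (ra, dec)."""
    return [row for row in catalogue
            if angular_separation_arcsec(ra, dec, row["ra"], row["dec"]) <= radius_arcsec]
```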
The matching problem (#1)
[Figure: Catalogue #1, Catalogue #2, Catalogue #3]
The matching problem (#2)
Matcher workflow
Metadata
• Querying and cross-matching require metadata about catalogues and archives
– astrometric fields and associated uncertainties
– photometric fields and associated uncertainties
– some metadata are generated and stored locally, others are retrieved from the archives in real time
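One way to hold such metadata is a small record per catalogue that pairs every astrometric and photometric field with its uncertainty column. This is a sketch under assumptions: the class, its attribute names and the example column names are hypothetical, not the matcher's actual format, and whether a record is filled from a local file or from the archive at run time is left open.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueMetadata:
    """Hypothetical per-catalogue description used to drive query and match."""
    name: str
    ra_col: str
    dec_col: str
    pos_err_col: str                          # astrometric uncertainty column
    mags: dict = field(default_factory=dict)  # magnitude column -> its error column

    def columns(self):
        """All columns that must be requested from the archive."""
        cols = [self.ra_col, self.dec_col, self.pos_err_col]
        for mag, err in self.mags.items():
            cols += [mag, err]
        return cols
```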
Software demo (#2)
• Issue: false alarms
– matching is non-unique
– input: 67 sources
– output: almost 500 match candidates
– many of these match candidates are “false alarms”
Issue: false alarms (#3)
• Two fundamental, independent probabilities
– Hit probability: p(c|C)
– False-alarm probability: p(c|not C)
• Goal
– keep the hit probability high (completeness)
– while keeping the false-alarm probability low
– how well both can be achieved depends on the signal-to-noise ratio of the data
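The slide does not define c and C. Reading c as "the matcher accepts a candidate" and C as "the candidate is a true counterpart" (an assumption), both probabilities can be estimated empirically from a test set whose true counterparts are known, for example from simulations or a hand-checked sample.

```python
def hit_and_false_alarm_rates(accepted, is_true_counterpart):
    """Empirical estimates of p(c|C) and p(c|not C).

    accepted[i]: True if the matcher accepted candidate i (event c).
    is_true_counterpart[i]: True if candidate i is a real counterpart (event C).
    Both sequences are assumed to have the same length.
    """
    pairs = list(zip(accepted, is_true_counterpart))
    hits = sum(1 for a, t in pairs if a and t)
    false_alarms = sum(1 for a, t in pairs if a and not t)
    n_true = sum(1 for _, t in pairs if t)
    n_false = len(pairs) - n_true
    p_hit = hits / n_true if n_true else float("nan")
    p_false_alarm = false_alarms / n_false if n_false else float("nan")
    return p_hit, p_false_alarm
```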
Issue: false alarms (#4)
• Solution: use statistics (“fuzzy” matching)
– compute the statistical (Mahalanobis) distance between the counterparts and the center position
– compute a reliability measure for each match candidate (reduced chi-squared)
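A minimal sketch of both quantities, assuming each counterpart has a position on a local tangent plane around the input coordinate (in arcsec) with uncorrelated per-axis uncertainties; the matcher's actual error model may differ. The center is the inverse-variance weighted mean; the squared Mahalanobis distances of the counterparts from it sum to a chi-squared; dividing by the degrees of freedom (2 coordinates per counterpart minus the 2 fitted center coordinates) gives the reduced chi-squared.

```python
from dataclasses import dataclass


@dataclass
class Counterpart:
    """One catalogue entry of a match candidate (hypothetical structure).

    x, y: offsets in arcsec on a local tangent plane.
    sx, sy: 1-sigma positional uncertainties in arcsec (assumed uncorrelated).
    """
    x: float
    y: float
    sx: float
    sy: float


def weighted_center(cps):
    """Inverse-variance weighted mean position, computed per axis."""
    wx = [1.0 / c.sx ** 2 for c in cps]
    wy = [1.0 / c.sy ** 2 for c in cps]
    cx = sum(w * c.x for w, c in zip(wx, cps)) / sum(wx)
    cy = sum(w * c.y for w, c in zip(wy, cps)) / sum(wy)
    return cx, cy


def mahalanobis_sq(c, cx, cy):
    """Squared Mahalanobis distance of one counterpart from the center."""
    return ((c.x - cx) / c.sx) ** 2 + ((c.y - cy) / c.sy) ** 2


def reduced_chi_squared(cps):
    """Chi-squared of the counterparts about their common center, per degree of freedom."""
    if len(cps) < 2:
        raise ValueError("need at least two counterparts")
    cx, cy = weighted_center(cps)
    chi2 = sum(mahalanobis_sq(c, cx, cy) for c in cps)
    return chi2 / (2 * (len(cps) - 1))
```

For two counterparts 1 arcsec apart, each with 1 arcsec errors, this gives 0.25; values far above 1 indicate positions that are inconsistent with their stated errors.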
Software demo (#3)
• Lower the reduced chi-squared limit from 10,000 to 3
• Result
– Hit rate is still high
– False-alarm rate is dramatically reduced
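Lowering the limit is a simple cut on each candidate's reduced chi-squared. The sketch below uses made-up values, for illustration only; they are not the demo's results.

```python
def apply_chi2_limit(candidates, limit):
    """Keep only match candidates whose reduced chi-squared is at most `limit`."""
    return [c for c in candidates if c["chi2_red"] <= limit]


# Made-up reduced chi-squared values, for illustration only.
candidates = [{"id": i, "chi2_red": v}
              for i, v in enumerate([0.4, 1.2, 2.9, 5.0, 48.0, 9500.0])]
loose = apply_chi2_limit(candidates, 10_000)  # keeps all six
strict = apply_chi2_limit(candidates, 3)      # keeps ids 0, 1, 2
```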
Issue: server reliability
• An archive server
– may be down (easy to detect)
– may be slow today (more difficult to detect)
– may deliver wrong results (spoils the science)
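The first two cases can be told apart with a timeout around each query; wrong results cannot be detected this way and need scientific sanity checks instead. A minimal sketch, assuming `query` is any zero-argument callable that performs the remote request; the thresholds are hypothetical.

```python
import concurrent.futures as cf
import time


def probe_server(query, timeout_s=30.0, slow_s=10.0):
    """Run `query` and classify the server response.

    Returns (status, result, elapsed_seconds), where status is "down"
    (query raised), "timeout" (no answer within timeout_s), "slow"
    (answered, but after more than slow_s) or "ok".
    """
    pool = cf.ThreadPoolExecutor(max_workers=1)
    start = time.monotonic()
    future = pool.submit(query)
    try:
        result = future.result(timeout=timeout_s)
    except cf.TimeoutError:
        return "timeout", None, time.monotonic() - start
    except Exception:
        return "down", None, time.monotonic() - start
    finally:
        # Do not block on a hanging query; let the worker finish in the background.
        pool.shutdown(wait=False)
    elapsed = time.monotonic() - start
    return ("slow" if elapsed > slow_s else "ok"), result, elapsed
```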
VO Standards
• Status
– Input
• CSV files for data
• XML files describing the query and match process
– Sending plain HTTP/HTML requests to archive servers
– Receiving
• CSV file from SDSS SkyServer
• VOTable from VizieR (VO-Std)
– Output
• VOTable with the complete match result (VO-Std), for display in VOPlot
• various CSV files
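The VOTable output can be sketched with the standard library alone. This minimal writer produces the basic VOTABLE/RESOURCE/TABLE/FIELD/DATA/TABLEDATA structure; it omits the XML namespace declaration and most optional attributes for brevity, and the column names in the usage note are hypothetical.

```python
import xml.etree.ElementTree as ET


def match_result_to_votable(fields, rows):
    """Serialize a match result as a minimal VOTable string.

    fields: list of (name, datatype, unit) tuples, unit may be None.
    rows: list of tuples, one value per field.
    """
    votable = ET.Element("VOTABLE", version="1.1")
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE", name="matches")
    for name, datatype, unit in fields:
        attrs = {"name": name, "datatype": datatype}
        if unit:
            attrs["unit"] = unit
        ET.SubElement(table, "FIELD", attrs)
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)
    return ET.tostring(votable, encoding="unicode")
```

For example, `match_result_to_votable([("match_id", "int", None), ("ra", "double", "deg")], [(1, 150.1)])` yields one FIELD per column and one TR per match candidate.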
Software demo (#4)
• VOPlot
Plans & Ideas
• GUI for newcomers
– Facilitates selection of catalogues, astrometric & photometric columns, etc.
– Generates a configuration file
• for the query, including server selection
• for the core cross-matcher, including the chi-squared limit
• Automatic monitoring of server response and reliability
• Improved matching algorithm
• GUI panel for match candidate visualization
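The planned GUI would emit such a configuration file. The element and attribute names in this sketch (`crossmatch`, `query`, `server`, `catalogue`, `match`, `chi2_limit`) are hypothetical; the matcher's actual XML format is not shown in the slides.

```python
import xml.etree.ElementTree as ET


def build_config(servers, catalogues, chi2_limit):
    """Write a hypothetical XML configuration for the query and match steps."""
    root = ET.Element("crossmatch")
    query = ET.SubElement(root, "query")
    for server in servers:          # server selection for the query step
        ET.SubElement(query, "server", name=server)
    for cat in catalogues:          # catalogues to be queried
        ET.SubElement(query, "catalogue", name=cat)
    # Parameters of the core cross-matcher, here only the chi-squared limit.
    ET.SubElement(root, "match", chi2_limit=str(chi2_limit))
    return ET.tostring(root, encoding="unicode")
```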
Summary
• Showed a working cross-matcher application
– Operates with distributed archives queried in parallel
• Demonstrated that
– fuzzy matching is needed
– reduced chi-squared is a powerful statistical discriminator
• High hit probability, low false-alarm probability
• GAVO cross-matcher currently being used in a first science application
Thanks
• Particularly to the folks
– from SkyServer/SDSS, and
– from VizieR @ CDS and @ mirror sites,
who, with their services, have enabled the cross-matcher
The end
Issue: false alarms (#5)
Issue: false alarms (#6)
GAVO
• GAVO I
– Funded by BMBF
– Started end of 2002
– Ended end of March 2005
• GAVO interim
– Funded
• 50% by Leibniz-prize money
• 50% by BMBF
The matching problem (#3)
[Figure: Catalogue #1, Catalogue #2, Catalogue #3]