Transcript GAVO
Workshop
Garching, June 27 – July 1 2005
Statistical Cross-Matching
Across Distributed Archives
H.-M. Adorf & GAVO Team
MPI f. extraterrestrische Physik
Statistical cross-matching
Cross-matching of astrometric and
photometric catalogues
– core functionality of a virtual observatory
Operational modes
– on an area of the sky
– using an input catalogue (GAVO matcher)
Hans-Martin Adorf, GAVO
Matcher Demo, Page 2
Philosophy
Build a cross-matcher application that
– should be usable by scientists and help
produce science results
– uses what’s there and what works now
– doesn’t get stopped by a missing standard
Support the VO process by
– helping to generate appropriate VO-standards
– adopting new VO-standards whenever feasible
Querying remote archives
Movie
Using up to 10 servers
– distributed around the world
– operating in parallel
Sneak preview of grid computing
– Locally specify your tasks
– Execute them remotely at the data centers
– Receive results locally for final combination
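The pattern above (specify locally, execute remotely in parallel, combine locally) can be sketched with a thread pool. This is a minimal illustration, not the GAVO matcher's code: query_archive is a hypothetical stand-in that returns a dummy record instead of contacting a server, and the limit of 10 workers simply mirrors the "up to 10 servers" mentioned on the slide.

```python
from concurrent.futures import ThreadPoolExecutor


def query_archive(archive, position):
    """Hypothetical stand-in for a remote query.

    A real version would send an HTTP request to the archive;
    here we only echo the request so the sketch runs offline.
    """
    ra, dec = position
    return {"archive": archive, "ra": ra, "dec": dec, "rows": []}


def query_all(archives, position, max_workers=10):
    """Submit one query per archive in parallel and collect the answers locally."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(query_archive, name, position)
                   for name in archives}
        # result() blocks until the corresponding query has finished
        return {name: future.result() for name, future in futures.items()}


results = query_all(["SDSS", "VizieR"], (180.0, 2.5))
```

Threads are a reasonable fit here because the work is dominated by waiting for remote servers rather than by local computation.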
Software demo (#1)
Input list
– 67 galaxies from FIRST radio catalogue
Query
– 2 remote archives: SDSS, VizieR
– 20 catalogues: radio, infrared, optical, X-ray
Task
– get counterparts for each input coordinate
– gather counterparts to form reasonable matches
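The first part of the task, getting counterparts for an input coordinate, amounts to a cone search. The sketch below is an assumption about how such a search could look, not the matcher's actual implementation: the dictionary keys (id, ra, dec), the search radius and the sample coordinates are all made up for illustration.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))


def counterparts(source, catalogue, radius_arcsec):
    """Return all catalogue rows within radius_arcsec of the input source."""
    radius_deg = radius_arcsec / 3600.0
    return [row for row in catalogue
            if angular_separation_deg(source["ra"], source["dec"],
                                      row["ra"], row["dec"]) <= radius_deg]


# Toy data (hypothetical): "a" lies about 1.8 arcsec away, "b" about 34 arcsec away
source = {"ra": 10.0, "dec": 20.0}
catalogue = [
    {"id": "a", "ra": 10.0, "dec": 20.0005},
    {"id": "b", "ra": 10.01, "dec": 20.0},
]
nearby = counterparts(source, catalogue, radius_arcsec=5.0)
```

With several catalogues, each input coordinate typically collects more than one counterpart per catalogue, which is why the gathering step that follows is non-trivial.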
The matching problem (#1)
[Figure: Catalogue #1, Catalogue #2, Catalogue #3]
The matching problem (#2)
Matcher workflow
Metadata
Querying and cross-matching requires
metadata about catalogues & archives
– astrometric fields and associated uncertainties
– photometric fields and associated uncertainties
– some metadata …
… are locally generated and stored
… are retrieved from archives in real-time
Software demo (#2)
Issue: false alarms
– matching is non-unique
– input: 67 sources
– output: almost 500 match candidates
– many of these match candidates are “false
alarms”
Issue: false alarms (#3)
Two fundamental, independent probabilities
– Hit probability:
p(c|C)
– False alarm probability: p(c|not C)
Goal
– keep the hit probability high (completeness)
– while keeping the false alarm probability low
– goodness depends on S/N ratio in the data
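To make the two probabilities concrete, they can be estimated as rates from a set of match candidates whose true status is known. The sketch below is only an illustration under that assumption: the function name rates and the toy data are hypothetical, and real catalogues rarely come with ground truth.

```python
def rates(candidates):
    """Estimate hit rate and false-alarm rate.

    candidates: list of (is_true_counterpart, accepted) boolean pairs.
    Assumes at least one true and one false counterpart are present.
    """
    true_total = sum(1 for is_true, _ in candidates if is_true)
    false_total = sum(1 for is_true, _ in candidates if not is_true)
    hits = sum(1 for is_true, accepted in candidates if is_true and accepted)
    false_alarms = sum(1 for is_true, accepted in candidates
                       if not is_true and accepted)
    # hit rate estimates p(c|C); false-alarm rate estimates p(c|not C)
    return hits / true_total, false_alarms / false_total


# Toy data (made up): 3 true counterparts, 4 chance alignments
toy = [(True, True), (True, True), (True, False),
       (False, True), (False, False), (False, False), (False, False)]
hit_rate, false_alarm_rate = rates(toy)
```

Because the two rates are normalised by different populations, one can be changed without fixing the other, which is the sense in which the slide calls them independent.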
Issue: false alarms (#4)
Solution: use statistics (“fuzzy” matching)
– compute statistical (Mahalanobis) distance
between counterparts and center position
– Compute reliability measure for match
candidate (reduced chi-squared)
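One way to read the two steps above is sketched below. The slides do not give the formulas, so everything here is an assumption: positions are flat-sky offsets in arcsec, each counterpart has a circular, uncorrelated positional error sigma (in which case the Mahalanobis distance reduces to offset divided by sigma), the center is the inverse-variance weighted mean, and the number of degrees of freedom is taken as 2(N - 1).

```python
import math


def weighted_center(points):
    """Inverse-variance weighted mean of (x, y, sigma) tuples, all in arcsec."""
    weights = [1.0 / sigma ** 2 for _, _, sigma in points]
    total = sum(weights)
    cx = sum(w * x for w, (x, _, _) in zip(weights, points)) / total
    cy = sum(w * y for w, (_, y, _) in zip(weights, points)) / total
    return cx, cy


def mahalanobis(point, center):
    """Statistical distance of one counterpart from the center (circular errors)."""
    x, y, sigma = point
    return math.hypot(x - center[0], y - center[1]) / sigma


def reduced_chi_squared(points):
    """Chi-squared of the candidate about its center, per degree of freedom."""
    if len(points) < 2:
        raise ValueError("need at least two counterparts")
    center = weighted_center(points)
    chi2 = sum(mahalanobis(p, center) ** 2 for p in points)
    dof = 2 * (len(points) - 1)  # two coordinates per point, minus the fitted center
    return chi2 / dof


tight = reduced_chi_squared([(0.0, 0.0, 1.0), (2.0, 0.0, 1.0)])
loose = reduced_chi_squared([(0.0, 0.0, 1.0), (20.0, 0.0, 1.0)])
```

Under these assumptions a value near 1 means the counterparts scatter about as much as their quoted errors allow, while a value far above 1 suggests the counterparts do not belong to the same source.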
Software demo (#3)
Lower reduced chi-squared from 10,000 to 3
Result
– Hit-rate is still pretty high
– False-alarm rate is dramatically reduced
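The effect of tightening the limit can be illustrated with a simple filter. The candidate records and their reduced chi-squared values below are invented for the example; only the two limits (10,000 and 3) come from the demo.

```python
def select(candidates, limit):
    """Keep match candidates whose reduced chi-squared does not exceed the limit."""
    return [c for c in candidates if c["red_chi2"] <= limit]


# Toy candidates (made-up values)
toy = [
    {"id": 1, "red_chi2": 0.8},
    {"id": 2, "red_chi2": 2.9},
    {"id": 3, "red_chi2": 45.0},
    {"id": 4, "red_chi2": 7200.0},
]

loose = select(toy, 10_000)   # permissive limit: everything passes
strict = select(toy, 3)       # tight limit: only statistically consistent matches
```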
Issue: server reliability
An archive server
– may be down (easy to detect)
– may be slow today (more difficult to detect)
– may deliver wrong results (spoils the science)
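The first two failure modes can be told apart by timing a query. The sketch below is a hypothetical helper, not part of the matcher: probe, the threshold slow_after_s and the status labels are assumptions, and the third failure mode (wrong results) cannot be detected this way at all.

```python
import time


def probe(query, slow_after_s=5.0):
    """Run query() and classify the server as 'down', 'slow' or 'ok'.

    query: zero-argument callable that performs the request.
    Connection failures and timeouts are subclasses of OSError in Python 3.
    """
    start = time.monotonic()
    try:
        result = query()
    except OSError:
        return "down", None
    elapsed = time.monotonic() - start
    status = "slow" if elapsed > slow_after_s else "ok"
    return status, result


def healthy():
    return 42


def unreachable():
    raise ConnectionError("server not reachable")


def sluggish():
    time.sleep(0.05)
    return 42


status_ok = probe(healthy)
status_down = probe(unreachable)
status_slow = probe(sluggish, slow_after_s=0.01)
```

A fixed threshold is the crudest possible choice; comparing against a server's own response history would be one way to define "slow today".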
VO Standards
Status
– Input
CSV files for data
XML files for query & match process description
– Sending plain HTTP/HTML to archive servers
– Receiving
CSV file from SDSS SkyServer
VOTable from VizieR (VO-Std)
– Output
VOTable with complete match result (VO-Std) - VOPlot
various CSV files
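As an illustration of the CSV input path, an input list could be read as follows. The column names (name, ra, dec) and the sample rows are assumptions made for this sketch; the slides do not specify the matcher's actual input layout.

```python
import csv
import io


def read_input_list(text):
    """Parse a CSV input list with the hypothetical columns name, ra, dec (degrees)."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"name": row["name"],
             "ra": float(row["ra"]),
             "dec": float(row["dec"])}
            for row in reader]


# Made-up sample input
sample = "name,ra,dec\nsrc1,150.1,2.2\nsrc2,150.3,2.4\n"
sources = read_input_list(sample)
```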
Software demo (#4)
VOPlot
Plans & Ideas
GUI for newcomers
– Facilitates selection of catalogues, astrometric
& photometric columns, etc.
– Generates configuration file
for query including server selection
for core cross-matcher, including chi-squared limit
Automatic monitoring of server response
and reliability
Improved matching algorithm
GUI panel for match candidate visualization
Summary
Shown a working cross-matcher application
– Operates with distributed archives queried in
parallel
Demonstrated that
– fuzzy matching is needed
– reduced chi-squared is a powerful statistical
discriminator
High hit-probability, low false-alarm probability
GAVO cross-matcher currently being used
in a first science application
Thanks
Particularly to the folks
– from SkyServer/SDSS, and
– from VizieR @ CDS and @ mirror sites,
who, with their services, have enabled the cross-matcher
The end
Issue: false alarms (#5)
Issue: false alarms (#6)
GAVO
GAVO I
– Funded by BMBF
– Started end of 2002
– Ended end of March 2005
GAVO interim
– Funded
50% by Leibniz-prize money
50% by BMBF
The matching problem (#3)
[Figure: Catalogue #1, Catalogue #2, Catalogue #3]