Efficient X-ray Data Mining
John Cunniffe
Dunsink Observatory
Dublin Institute for Advanced Studies
Evert Meurs (Dunsink Observatory)
Aaron Golden (NUI Galway)
Aus VO 18/11/03
Once you make doing science with your VO service easy,
everyone will want to use your server.
Analogous to oversubscribed observatory time:
- how do users successfully ‘compete’ for query time?
Query modelling in a proposal?
Need data simulators/previewers to run queries on,
and/or a data subset for a test run.
Future X-ray missions
Current missions - XMM/Chandra/RXTE:
- download data (typ. a few GB/pointing)
- processed on local machine
XEUS, Constellation-X, Astro-E2, etc:
- very large data sets (a few 100 GB/pointing)
- online data processing
Proposed framework involves users submitting web-based requests for
processing pipelines (a loose sketch follows this list).
- derived data products very important:
  source catalogues
  images, spectra, lightcurves, etc
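As a loose illustration of such a request, here is a minimal Python sketch. The endpoint URL, field names and JSON shape are all invented for illustration; no such service exists yet, so nothing here reflects a real mission interface.

import json
import urllib.request

# Hypothetical pipeline-request service: the URL, field names and
# response format are illustrative assumptions only.
request_body = {
    "observation_id": "OBS-000123",           # assumed identifier scheme
    "pipeline": "source-detection",           # named server-side pipeline
    "products": ["catalogue", "images", "spectra", "lightcurves"],
    "energy_band_keV": [0.1, 2.4],
}

req = urllib.request.Request(
    "https://archive.example.org/pipelines/submit",   # placeholder URL
    data=json.dumps(request_body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    job = json.load(resp)     # e.g. a job ID to poll for derived products
    print(job)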
Efficient X-ray Data Mining
Efficient: don’t want to reprocess the data archive unless really needed
- maximise use of metadata
X-ray: data processing pipelines more complicated (than e.g. optical)
- treatment of faint sources/sky background statistically complex
- instrument response complex
  (not exclusive to X-ray)
Data Mining: interested in the sources found in the data,
but also in the context (i.e. why we found them in that selection).
Not simply interested in finding objects through cone searches
and stopping there.
Science Use Case
Interested in variable/transient X-ray objects:
- short-term: e.g. flare stars (~1 dataset)
- long-term: e.g. variability of normal/active galaxies (multi-dataset)
Current approach (sketched in the code after this list):
• use HTTP GET scripts to HEASARC to create cross-correlated source catalogues
• where known objects are not present in a catalogue,
  retrieve the original dataset & calculate an upper flux limit (expensive)
N.B. if the source catalogue was generated from the whole data archive,
then we may need to re-analyse a significant fraction of it.
To understand the space density/flaring rate/etc. of populations in the
catalogues, we need to know the volume of space covered by the archive:
- area coverage (RA, Dec)
- spectral coverage (energy)
- temporal coverage (t1, t2, ..., ti)
- flux limit
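The cross-correlation step can be scripted against any standard VO cone-search service (an HTTP GET with RA, DEC and SR parameters returning a VOTable). A minimal sketch of the workflow follows; the service URL is a placeholder, and the actual HEASARC batch interface used in this work may differ in detail.

import urllib.parse
import urllib.request

def cone_search(base_url, ra_deg, dec_deg, radius_deg):
    """Standard VO cone search: GET with RA, DEC, SR parameters,
    returning VOTable XML.  base_url is a placeholder; substitute
    the real service endpoint."""
    params = urllib.parse.urlencode(
        {"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    with urllib.request.urlopen(f"{base_url}?{params}") as resp:
        return resp.read().decode()

# Check known variable sources against a catalogue; any source with no
# match is a candidate for the expensive upper-limit calculation.
targets = [("HR 1099", 54.197, 0.588), ("AR Lac", 332.170, 45.742)]
for name, ra, dec in targets:
    votable = cone_search("https://example.org/rass-bsc/cone", ra, dec, 0.01)
    if "<TR>" not in votable:   # crude emptiness test, fine for a sketch
        print(f"{name}: not in catalogue -> fetch dataset, compute upper limit")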
ROSAT All-Sky Survey
Duration: 1990 June - 1991 Jan
E = 0.1 - 2.4 keV
RASS-BSC (Bright Source Catalogue)
RASS-FSC (Faint Source Catalogue)
Selection criteria:

                        BSC           FSC
Count rate              > 0.05/sec    -
Probability (MaxLik)    ≥ 15 (~5σ)    ≥ 7 (~3σ)
N(photons)              ≥ 15          -
Accepted sources        18,811        105,924

NB: Catalogues have non-uniform sky coverage & sensitivity.
Survey depth
Regions with different sensitivity included in the same source catalogues.
c.f. XMM Serendipitous Source Catalogue (created from pointed-mode
observations with different exposure times & instrument modes).
Need a good coverage/sensitivity model of the data archive to understand
the volume of space contained in a source catalogue.
[Figure: 6°×6° binned image of the RASS data set]
Model Method 1: Upper limit predictor
Combine:
- instrument model (ARFs, PSFs, modes, ...)
- exposure time (0-30 ksec)
- NH information
- source spectral model
to create a high-resolution flux limit map of the RASS sky
... in progress.
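In outline, Method 1 turns the exposure at each sky position and a count-based detection threshold into a limiting flux via an energy conversion factor (ECF) that folds together the instrument response, NH and the assumed source spectrum. A minimal sketch under those assumptions; the ECF value and threshold below are illustrative placeholders, not calibrated numbers.

import numpy as np

def flux_limit_map(exposure_s, counts_threshold, ecf):
    """Limiting flux per sky pixel.

    exposure_s       : 2-D array of effective exposure times (s);
                       0-30 ksec across the RASS sky
    counts_threshold : detection threshold in source counts
                       (e.g. 15, as for the BSC)
    ecf              : energy conversion factor, erg/cm^2 per count;
                       this is where the ARF, NH and source spectral
                       model enter -- a single placeholder number here
    """
    with np.errstate(divide="ignore"):
        rate_limit = counts_threshold / exposure_s   # limiting count rate
    return rate_limit * ecf                          # limiting flux (erg/cm^2/s)

# Illustrative use: a coarse 1-degree grid with RASS-like exposures.
exposure = np.random.uniform(300.0, 30_000.0, size=(180, 360))
limit_map = flux_limit_map(exposure, counts_threshold=15, ecf=1e-11)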
Model Method 2: Upper limit flux tabulation
Reprocess the data archive and determine the upper limit statistics
from the photon data directly,
combined with:
- NH information
- source spectral model
to create a high-resolution flux limit map of the RASS sky
... in progress.
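At each position, the tabulation reduces to a Poisson upper bound on the source counts given the photons actually detected there. A minimal sketch using the classical chi-squared identity for Poisson upper limits; the confidence level, the background handling and the ECF are simplified assumptions (a fuller treatment, e.g. Kraft, Burrows & Nousek 1991, handles background inside the likelihood).

from scipy.stats import chi2

def counts_upper_limit(n_obs, n_bkg, confidence=0.9973):
    """Upper limit on source counts.

    n_obs : total counts observed in the source aperture
    n_bkg : expected background counts in the same aperture

    Uses the classical identity: the upper bound on a Poisson mean with
    N observed counts at confidence CL is chi2.ppf(CL, 2*(N + 1)) / 2.
    Background is subtracted afterwards -- a simplification.
    """
    total_ul = chi2.ppf(confidence, 2 * (n_obs + 1)) / 2.0
    return max(total_ul - n_bkg, 0.0)

# Example: 4 photons in the aperture, 2.5 expected from background,
# 400 s exposure, and an illustrative ECF of 1e-11 erg/cm^2 per count.
ul_counts = counts_upper_limit(4, 2.5)
ul_flux = (ul_counts / 400.0) * 1e-11
print(f"upper limit: {ul_flux:.2e} erg/cm^2/s")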
Results in a sensitivity map of the RASS sky
- adds usefulness to the source catalogue.
Doing this with RASS is straightforward (though not quick), as the
total data archive is a few tens of GB.
Doing it for future observatories will have to be done on the
archive curator’s server.
The role of Archive/Source Catalogue Metadata
[Diagram: Data Archive (X-ray photon lists / ancillary instrument data;
computationally expensive to reprocess) → Selection Criteria → Source Catalogue]
How should the contents (not parameters) of a source catalogue best be
described in the metadata?
- why are the sources that are in it, in it?
- describe the selection criteria (one possible machine-readable shape
  is sketched below)
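One possible shape for machine-readable selection criteria, shown as a Python mapping. Every field name here is invented for illustration rather than taken from any adopted VO standard; only the numbers come from the RASS-BSC/FSC criteria above.

# Field names invented for illustration; not from any VO standard.
rass_bsc_metadata = {
    "catalogue": "RASS-BSC",
    "parent_archive": "ROSAT All-Sky Survey photon data",
    "selection_criteria": {
        "count_rate_per_s": {"min": 0.05},
        "maxlik_probability": {"min": 15},   # ~5 sigma
        "n_photons": {"min": 15},
    },
    # Pointer to an externally held sensitivity/coverage model rather
    # than embedding it: one answer to the "larger metadata" question.
    "coverage_model_url": "https://example.org/rass_flux_limit_map.fits",
    "reference": "Voges et al. 1999",
}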
Flux limit maps,
limiting magnitude calculators,
observation simulators ...
VO Data Model? “These are an integral part of the
sensitivity/coverage description”
→ enhance the metadata (face larger metadata)
Theory? “This is really telescope simulation”
→ build a separate model/simulator
Other wavebands
Similar challenges in other wavebands:
complex coverage and sensitivity descriptions,
plus catalogue selection criteria.
How many brown dwarfs are there?
In general, how much data description should go in the metadata,
and how much should be left in secondary resources?
Final Questions
How big (in kbytes) should data archive metadata be?
– Should it include preview data (e.g. ‘large’ FITS files)?
– Should selection criteria be described in the metadata
  (or simply a reference to the original publication)?
– Provide partially reduced or preview data as an externally held
  addendum to the metadata?
  • much bigger than standard metadata
  • much smaller than the whole archive
– What other tools are needed to allow astronomers to
  • assess the usefulness of, and
  • justify to Time Allocation Committees,
  large proposals/queries in a VO context?