Transcript AstroBox

The Chinese VIRTUAL OBSERVATORY
Mining data using MATLAB
through AstroBox
Chao LIU, Chenzhou CUI
Presented by: Chenzhou CUI
National Astronomical Observatory, China
IVOA Interoperability Meeting, Trieste
2008-5-20
China-VO
• Chinese Virtual Observatory (China-VO) is the national VO
project in China, initiated in 2002 by the Chinese astronomical
community and led by the National Astronomical Observatories,
Chinese Academy of Sciences.
• It focuses its research and development on VO science
and applications.
• R&D focus areas:
– China-VO Platform
– Unified Access to On-line Astronomical Resources and
Services
– VO-ready Projects and Facilities
– VO-based Astronomical Research Activities
– VO-based Public Education
An active IVOA member
IVOA 2007, Beijing
1st Small projects meeting, 2003
Our products
http://services.china-vo.org
• VOFilter
– an XML filter for OpenOffice.org Calc
to open VOTable files
• SkyMouse
– A Smart On-line Astronomical
Information Collector
• FitHAS
– FITS Header Archiving System
• VO-DAS
– An OGSA-DAI based data access
service system to provide unified
access to astronomy data, including
catalogs, images and spectra.
• AstroBox
– Coming soon
– ...
First Science Paper from China-VO
• SDSS DR5 photometric data were searched for new Milky
Way companions or substructures in the Galactic halo.
• Data analysis procedures were based on the VO-DAS.
• Five candidates were identified as overdensities of faint stellar
sources with color-magnitude diagrams similar to
those of known globular clusters or dwarf spheroidal
galaxies.
– (Liu et al., 2008, A&A)
AstroBox: Goals
• To provide an astronomical data mining
application service, supporting VO
protocols and tools
• To provide a network environment for
time-consuming astronomical data
mining computations
• A high-level data analysis environment,
NOT a raw data analysis tool like IRAF
General procedures of data mining
• Data Accessing
– query database
– high volume of data
• Data Pre-processing
– select qualified data
– eliminate BAD data
• Data Mining
– try multiple times and find a way to extract unknown
knowledge from a specific data set
• Data Analysis and Interpretation
– visualization
– comparisons with different data sources
– associate results with physical meaning
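The pre-processing step listed above can be illustrated with a minimal Python sketch. It is not AstroBox code: the record layout, the sentinel value -9999 for missing measurements, and the 0.2 mag error cut are assumptions made only for this illustration, and the rows are made up.

```python
import math

BAD_VALUE = -9999.0  # assumed sentinel for a missing measurement (hypothetical)


def is_bad(value):
    """Return True for sentinel or NaN entries."""
    return value == BAD_VALUE or math.isnan(value)


def preprocess(records, max_mag_err=0.2):
    """Drop BAD rows, keep rows with small photometric errors, add a color."""
    clean = []
    for rec in records:
        values = (rec["g"], rec["r"], rec["g_err"], rec["r_err"])
        if any(is_bad(v) for v in values):
            continue  # eliminate BAD data
        if rec["g_err"] > max_mag_err or rec["r_err"] > max_mag_err:
            continue  # select qualified data only
        clean.append(dict(rec, g_r=rec["g"] - rec["r"]))
    return clean


# Made-up example rows, not real survey data.
rows = [
    {"g": 21.3, "r": 20.9, "g_err": 0.05, "r_err": 0.04},
    {"g": BAD_VALUE, "r": 20.1, "g_err": 0.05, "r_err": 0.04},
    {"g": 22.8, "r": 22.5, "g_err": 0.35, "r_err": 0.10},
]
selected = preprocess(rows)
print(len(selected), "of", len(rows), "rows kept")
```

Only the filtered rows would then be passed on to the mining and interpretation steps.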
An introduction to MATLAB
• MATLAB is a popular numerical computing package used in
a variety of fields.
• It provides dozens of toolboxes for different purposes, e.g.
statistics, pattern recognition, optimization, neural networks etc.,
as well as a number of ways to access data from either local or
remote sites.
• It also offers visualization through flexible 2D and 3D graphics
routines.
• It supports Java, C, and Fortran as well as its own M-language.
• It can access URL resources and parse XML,
which is necessary for embedding web services.
• In its latest release, improved parallel computation is available.
• We conclude that MATLAB is one of the best platforms on which
astronomical data mining tools can be developed.
AstroBox
• AstroBox is a plug-in package
for MATLAB to be used for
astronomical computing and
data mining
• It comprises:
– PLASTIC
– VOTable
– Local DB
– VO-DAS client
– Astronomical algorithms
[Architecture diagram; labels: VO Tools (Aladin, TOPCAT), PLASTIC, VOTables, Local DB, VO-DAS Client, VO-DAS, AstroBox, MATLAB, Database Toolbox, Java Libraries]
VO utilities in AstroBox
• VOTable access and conversion
– integrate STILS package
• PLASTIC availability
– embed a Java subroutine to connect to PLASTIC Hub
through which to exchange data and messages with
third party applications, e.g. Aladin and TOPCAT.
– SAMP support is planned next
• VO-DAS client interface
– embed a VO-DAS command line client to send an
ADQL query to the VO-DAS server and wait for the query result
– It is also capable of asynchronous queries, which can
access millions of rows of data (ongoing)
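To make the VOTable access item above more concrete, here is a minimal sketch of reading a TABLEDATA-serialized VOTable with the Python standard library. It is an illustration only: AstroBox relies on a Java package inside MATLAB, not on this code, and the sample table, its column names, and the helper names `local_name` and `read_votable` are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Made-up sample document; a real query result would come from a VO service.
SAMPLE_VOTABLE = """<?xml version="1.0"?>
<VOTABLE version="1.1" xmlns="http://www.ivoa.net/xml/VOTable/v1.1">
  <RESOURCE>
    <TABLE name="demo">
      <FIELD name="ra" datatype="double" unit="deg"/>
      <FIELD name="dec" datatype="double" unit="deg"/>
      <FIELD name="g_mag" datatype="float" unit="mag"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>150.10</TD><TD>2.20</TD><TD>21.3</TD></TR>
          <TR><TD>150.12</TD><TD>2.25</TD><TD>22.1</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""

NUMERIC_TYPES = {"short", "int", "long", "float", "double"}


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}FIELD'."""
    return tag.rsplit("}", 1)[-1]


def read_votable(text):
    """Return the first TABLE of a TABLEDATA VOTable as a list of dicts.

    Array-valued columns and BINARY/FITS serializations are not handled.
    """
    root = ET.fromstring(text)
    table = next(el for el in root.iter() if local_name(el.tag) == "TABLE")
    fields = [el for el in table.iter() if local_name(el.tag) == "FIELD"]
    rows = []
    for tr in table.iter():
        if local_name(tr.tag) != "TR":
            continue
        row = {}
        for field, td in zip(fields, tr):
            text_value = (td.text or "").strip()
            if field.get("datatype") in NUMERIC_TYPES and text_value:
                # Integer types are also read as float to keep the sketch short.
                row[field.get("name")] = float(text_value)
            else:
                row[field.get("name")] = text_value
        rows.append(row)
    return rows


table_rows = read_votable(SAMPLE_VOTABLE)
print(table_rows[0])
```

Matching on the local tag name keeps the sketch independent of the VOTable schema version declared in the namespace.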
Data mining support
• Regressions
– linear regression
• inherited from MATLAB
– nonlinear regression
• provides common astronomical regression functions, e.g. the King model for
the density profile of a dwarf galaxy.
– kernel regression
• Fitting
– provides specific algorithms for non-analytic expressions such as
complicated observational datasets or user-defined functions
– several times faster than existing MATLAB functions
• Spherical surface projection functions
– Equatorial projection & Galactic projection
– equal-area Lambert projection, in particular for density measurement
on a spherical surface
– Aitoff projection for overall viewing
• Visualizing functions
– 2-D plotting
– 3-D plotting
– adapted from existing MATLAB functions
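The two named sky projections can be sketched in a few lines of Python using the standard textbook formulas on the unit sphere. This is an illustration under stated assumptions, not the AstroBox implementation: the function names and the default polar projection center are choices made here, and longitudes passed to `aitoff` are assumed to lie in [-180, 180] degrees.

```python
import math


def lambert_equal_area(lon_deg, lat_deg, lon0_deg=0.0, lat0_deg=90.0):
    """Lambert azimuthal equal-area projection on the unit sphere.

    (lon0_deg, lat0_deg) is the projection center; the default is a polar view.
    """
    lam = math.radians(lon_deg - lon0_deg)
    phi = math.radians(lat_deg)
    phi0 = math.radians(lat0_deg)
    denom = (1.0 + math.sin(phi0) * math.sin(phi)
             + math.cos(phi0) * math.cos(phi) * math.cos(lam))
    if denom <= 0.0:
        raise ValueError("the point antipodal to the center cannot be projected")
    k = math.sqrt(2.0 / denom)
    x = k * math.cos(phi) * math.sin(lam)
    y = k * (math.cos(phi0) * math.sin(phi)
             - math.sin(phi0) * math.cos(phi) * math.cos(lam))
    return x, y


def aitoff(lon_deg, lat_deg):
    """Aitoff all-sky projection; lon_deg is expected in [-180, 180]."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    alpha = math.acos(math.cos(phi) * math.cos(lam / 2.0))
    # Unnormalized sinc, with the limit value 1 at alpha = 0.
    sinc = math.sin(alpha) / alpha if alpha != 0.0 else 1.0
    x = 2.0 * math.cos(phi) * math.sin(lam / 2.0) / sinc
    y = math.sin(phi) / sinc
    return x, y


print(lambert_equal_area(37.0, 0.0))
print(aitoff(90.0, 0.0))
```

In the polar Lambert view a point at colatitude θ lands at radius 2 sin(θ/2), so a spherical cap of area 2π(1 − cos θ) maps to a disk of the same area. That property is what makes the projection suitable for counting sources per unit area.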
Other functions
• High level functions
aiming at specific research topics, most of which currently are
Milky Way related
– Kurucz stellar model
– Girardi stellar population model
– isochrone fitting of stellar populations
– Galactic star count model with disk and halo components
– Chemical evolution model for stellar population (ongoing)
• Most commonly used utilities
– Monte Carlo methods
– coordinate transformations
– magnitude system transformations
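As an example of the coordinate transformation utilities listed above, the following Python sketch converts J2000 equatorial coordinates to Galactic coordinates with the standard spherical-trigonometry formulas. The function name is hypothetical and the code is not taken from AstroBox; the three constants are the commonly quoted J2000 orientation of the Galactic system.

```python
import math

# J2000 orientation of the Galactic coordinate system, in degrees.
RA_NGP = 192.85948   # right ascension of the north Galactic pole
DEC_NGP = 27.12825   # declination of the north Galactic pole
L_NCP = 122.93192    # Galactic longitude of the north celestial pole


def equatorial_to_galactic(ra_deg, dec_deg):
    """Convert J2000 (RA, Dec) in degrees to Galactic (l, b) in degrees."""
    dec = math.radians(dec_deg)
    dec_g = math.radians(DEC_NGP)
    d_ra = math.radians(ra_deg - RA_NGP)
    sin_b = (math.sin(dec) * math.sin(dec_g)
             + math.cos(dec) * math.cos(dec_g) * math.cos(d_ra))
    # Clamp against rounding slightly outside [-1, 1].
    b = math.asin(max(-1.0, min(1.0, sin_b)))
    y = math.cos(dec) * math.sin(d_ra)
    x = (math.sin(dec) * math.cos(dec_g)
         - math.cos(dec) * math.sin(dec_g) * math.cos(d_ra))
    lon = (L_NCP - math.degrees(math.atan2(y, x))) % 360.0
    return lon, math.degrees(b)


# The north celestial pole should land at l = L_NCP, b = DEC_NGP.
print(equatorial_to_galactic(0.0, 90.0))
```

Using `atan2` keeps the longitude in the correct quadrant, and the final modulo maps it into [0, 360).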
Demo 1
• PLASTIC implementation
Demo 2
• Special regression
– using a hyperbolic relationship
between independent and
dependent variables
• Model fitting
– density profile of a candidate dwarf
galaxy
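A density-profile fit of this kind can be sketched as follows, assuming the empirical King (1962) profile f(r) = k [ (1 + (r/rc)^2)^(-1/2) − (1 + (rt/rc)^2)^(-1/2) ]^2 for r < rt and 0 beyond. The fitting strategy is a plain grid search over the core radius rc and tidal radius rt, with the amplitude k solved in closed form by linear least squares. It stands in for whatever algorithm the demo actually used, and the data points are synthetic.

```python
import math


def king_profile(r, k, rc, rt):
    """Empirical King (1962) surface density at radius r."""
    if r >= rt:
        return 0.0
    a = 1.0 / math.sqrt(1.0 + (r / rc) ** 2)
    b = 1.0 / math.sqrt(1.0 + (rt / rc) ** 2)
    return k * (a - b) ** 2


def fit_king(radii, densities, rc_grid, rt_grid):
    """Grid search over (rc, rt); k follows from linear least squares.

    Returns (sum of squared residuals, k, rc, rt) for the best grid point.
    """
    best = None
    for rc in rc_grid:
        for rt in rt_grid:
            if rt <= rc:
                continue  # assume the tidal radius exceeds the core radius
            shape = [king_profile(r, 1.0, rc, rt) for r in radii]
            norm = sum(s * s for s in shape)
            if norm == 0.0:
                continue
            k = sum(d * s for d, s in zip(densities, shape)) / norm
            sse = sum((d - k * s) ** 2 for d, s in zip(densities, shape))
            if best is None or sse < best[0]:
                best = (sse, k, rc, rt)
    return best


# Synthetic, noise-free profile generated from known parameters.
radii = [0.5, 1.0, 2.0, 3.0, 5.0, 7.0, 9.0]
observed = [king_profile(r, 100.0, 2.0, 10.0) for r in radii]
sse, k_fit, rc_fit, rt_fit = fit_king(
    radii, observed, rc_grid=[1.0, 1.5, 2.0, 2.5, 3.0], rt_grid=[8.0, 10.0, 12.0]
)
print(k_fit, rc_fit, rt_fit)
```

A grid search is slow but robust; in practice the best grid point would typically seed a gradient-based nonlinear least-squares refinement.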
Demo 3
• Isochrone fitting
– observed data are
accessed from either a
local database or the VO-DAS server
– reference data are queried
from the Girardi database
to fit theoretical
isochrones
Demo 4
• Visualization
Demo 5
• Parallel computation
– fitting a 9-parameter star count model on an
8-core server
– faster than on a single-core computer by
a factor of ~8.
Future work
• Release as a tool to the community
• Extend cosmology methods
• Establish a distributed parallel
computation environment
• Deploy an on-line data mining service
Q&A