Hierarchical Triangular Mesh

Download Report

Transcript Hierarchical Triangular Mesh

How to speed up search of ILMT light curves using the
HTM (Hierarchical Triangular Mesh)
method in relational databases
Poels, J.
•ILMT Software
• Data acquisition
• Clustering Framework (HPC)
• Data reduction
• Image subtraction (application to GL)
• Databases (RDBMS)
• ….
•GAIA QSO Classifier Software
ARC Liège, 11 February 2010
Motivation
• We explore ways of doing spatial search within a relational database
• hierarchical triangular mesh and HEALPix (a tessellation of the
sphere) as a zoned bucketing system, representing areas as disjunctivenormal form constraints.
• The approach has the virtue that the zone mechanism works well on BTrees native to all SQL systems and integrates naturally with current
query optimizers
• Involved projects:
– SDSS (Sloan Digital Sky Survey)
– GSC (Guide Star Catalog) Palomar and UK Schmidt surveys
– COBE (Cosmic Background Explorer)
– WMAP (Wilkinson Microwave Anisotropy Probe)
– ….
ILMT_operating_param
time
temperature
CCD position
rotation (ppm)
........
ILMT_reference_cat
2MASS/SDSS cat
Iterative_cat_1
Iterative_cat_2
Iterative_cat_(N-1)
ILMT_reference_cat
Iterative_cat_N
Fits_files_cat
fits_file_id
filename
path
file_type
processing_level
x_global_0
y_global_0
alpha_0
Reference_img_cat
source_id
fits_file_id1
fits_file_id2
x_local
y_local
x_global
y_global
alpha
delta
flux_aperture
flux_psf_fit
Mag_R
Object_type
Processing_No
Fwhm
Isolated_flag
Night catalogs
Night_1_cat
Night_2_cat
source_id
fits_file_id1
Night_3_cat
source_id
fits_file_id2
fits_file_id1
source_id
x_localfits_file_id2
Night_N_cat
y_localx_localfits_file_id1
source_id
fits_file_id2
x_global
y_localx_localfits_file_id1
y_global
x_global
y_localfits_file_id2
alpha y_global
x_local
delta alpha x_global
y_local
y_global
flux_aperture
delta alpha x_global
flux_psf_fit
flux_aperture
delta y_global
Mag_Rflux_psf_fit
alpha
flux_aperture
Object_type
Mag_Rflux_psf_fit
delta
Processing_Nr
Object_type
Mag_Rflux_aperture
FwhmProcessing_Nr
flux_psf_fit
Object_type
Isolated_flag
FwhmProcessing_Nr
Mag_R
Isolated_flag
FwhmObject_type
Processing_Nr
Isolated_flag
Fwhm
Isolated_flag
Objects_rowid_ptr
source_id
Night1_rowid
Night2_rowid
…
…
…
…
…
…
…
NightN_rowid
Ref_img_cat_rowid
Is this (horizontal time) model suitable ?
•
•
•
•
•
•
•
•
At the beginning the RDBMS should run smoothly but after 5 years of
operation ?
Indexing is not an easy task
A given source will be measures an order of 10^3 where each measure set
featuring ~ 200 bytes
Assuming ~10x10^6 sources we get a multi-TB DB (for alphanumeric data
only)! Consider also ~ 25TB of image data
Example of performance bottleneck: Search all constant point-like sources. We
have to scan the whole DB and for each source, track its history. This means
that for each source we have to issue 10x10^6x(N-1) SQL statements ! Forget
it.
Beyond the query complexity, the DMS prefetch the rows which are more
likely to be read: useless and slow disk activity.
The data must be rearranged
Solution: HTM
HTM (Hierachical Triangular Mesh)
• HTM [18] maps triangular
regions of the sphere to unique
identifiers
• The technique for subdividing
the sphere in spherical triangles
is a recursive process
• At each level of the recursion, the
area of the resulting triangles is
roughly the same
• In areas with a larger data
density, the recursion process can
be applied with a greater level of
detail than in areas with lower
density
• The starting point is a spherical
octahedron which identifies 8
spherical triangles of equal size
•
The term quadtree is used to describe a class of hierarchical data structures
based on the principle of a fast recursive decomposition of the space.
•
Sky tessellation with various mapping functions have been proposed. It is a
matter of fact that the astronomical community is accepting the HTM and
HEALPix (Hierarchical, Equal Area, and iso-Latitude Pixelisation) schema as
the default for object catalogues and for maps visualization and analysis,
respectively. HEALPix gives a hierarchical iso-area and iso-latitude
tessellation of the sphere and so are convenient for harmonic data analysis on
the sphere (densities, integrals, spherical harmonics, Fourier transforms,etc.,)
• Using a 64 bit long integer to store the index IDs leads to a limit for
the pixels size of about 7.7 and 0.44 milli-arcsec on a side for HTM
and HEALPix, respectively. Being able to quickly retrieve the list of
objects in a given sky region is crucial in several projects.
Application to the ILMT
• The indexing scheme does not have to go as deep as the actual pixel resolution
• Each triangle is linked to its own database table (the GSC approach)
• SQL queries involving searches based on RA,DEC (+range) quickly provide the
HtmId(s) which in turn is used to build the table_name(s) to be accessed.
• We can dynamically choose a triangle surface coverage depending on the
maximum number of sources (e.g. max 100 sources per triangle)
• One single SQL statement returns a cursor with the whole source_id history
• The problem is now: for each triangle database table, select distinct all
source_id and fetch history rows ~ 10x10^6 SQL runs ! Manageable 
• The indexing C++ software is freely available at:
http://www.sdss.jhu.edu/