Cosmological simulations in a relational database:
modelling and storing merger trees
Gerard Lemson, GAVO, Max-Planck-Institut für extraterrestrische Physik, Garching, Germany
Volker Springel, Max-Planck-Institut für Astrophysik, Garching, Germany
Abstract: We present a method for storing tree-like data structures in a relational database that allows fast querying of the children and parents of any node,
up and down to any level. We have used this method to store halo merger trees derived from a large cosmological N-body simulation, and the merger trees of
model galaxy catalogues derived from the halo catalogues using semi-analytical methods. We give SQL queries corresponding to typical science questions that
can be asked of such a database and present an online query interface available through the web portal of the German Astrophysical Virtual Observatory.
Background and goals:
This work was done in the context of the German Astrophysical Virtual Observatory (GAVO). GAVO pays special attention to the introduction of theory data (simulations) into the Virtual Observatory (VO). To test
our ideas we have created various prototype implementations. Our main goal for the project presented here was to investigate the use of relational database technology in the analysis of results of large-scale
structure simulations, as well as in their online publication. The former may lead to direct scientific benefits for the owners of the data; the latter benefits the larger community, which gains access to the
data in a well-defined and standardized manner.
Fig. 1: Slice through the density field of the Millennium
simulation at redshift z=0. The slice is 15 Mpc/h thick.
Simulation and science questions:
The simulation used in this prototype is a relatively small cosmological N-body
simulation of dark matter that was created in preparation for the Millennium
simulation [1,2]. For this project we were interested in post-processing products
of this simulation: density fields, halo catalogues including halo merger trees and
mock galaxy catalogues. The latter were produced using semi-analytical galaxy
formation (SAGF) routines that use the merger trees as input (see [3,4] for
descriptions of the SAGF algorithms).
The database was designed to answer a number of science questions, similar to the
approach in [5]. We polled astrophysicists associated with the simulation project;
the following list is a subset of the questions they raised:
1. Return the complete halo merger tree for a halo identified at z=0
2. Find positions and velocities for all galaxies at redshift zero with B-luminosity, colour
and bulge-to-disk ratio within given intervals.
3. Return B-band luminosity function of galaxies residing in halos of mass between 10^13
and 10^14 solar masses.
4. Return the formation time of halos, defined as the highest redshift at which a halo still
has a progenitor of more than half its mass, as a function of the matter density in its
environment, defined as the matter density smoothed on a scale of 10 Mpc (inspired by
[6,7]).
Database Implementation:
The design of the information system started with the construction of an analysis model, shown in Fig. 3a. It contains the important
concepts of the domain under investigation and their relationships (see [8] and references therein). Important for this project are simulator
(the code), simulation (the running of the simulator with particular input parameters) and its snapshots. The actual data stored in the
database are the results of post-processing: cluster extraction and galaxy formation. In our model all of these specialize a common pattern
that in [8] is identified with the basic concepts: protocol, experiment and result. They are especially important for describing the
provenance of the data.
The physical database model is restricted to the data part of the conceptual model. It is more constrained: it must fit the data into a
relational model, it must allow the science questions to be translated into relatively simple SQL, and the resulting queries must run efficiently.
The science questions deal with relations between different types of objects, between object and environment and, especially, with the
formation history of objects. The history is embodied in the merger trees of both halos and galaxies. One can store trees using a single
link from progenitor to descendant, but this requires recursion to retrieve a complete progenitor tree. Recursive queries are not a standard feature of all
relational databases, and a more efficient solution is desirable even where they are supported.
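For comparison, the single-link approach can be sketched as follows. This is only a minimal illustration in Python, using the standard sqlite3 module rather than the Postgres database described here; the table layout, the descendantId column name and the toy halo ids are assumptions made for the example.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory table with one descendant link per halo (toy data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE halo (haloId INTEGER PRIMARY KEY, descendantId INTEGER)"
    )
    # (haloId, descendantId); NULL marks a halo at the final snapshot.
    links = [(1, None), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, None), (8, 7)]
    conn.executemany("INSERT INTO halo VALUES (?, ?)", links)
    return conn


def progenitors_recursive(conn, root_id):
    """Return the ids of root_id and all its progenitors.

    The tree is walked one level per recursion step by following the
    descendant links backwards with a recursive common table expression.
    """
    rows = conn.execute(
        """
        WITH RECURSIVE tree(haloId) AS (
            SELECT haloId FROM halo WHERE haloId = ?
            UNION ALL
            SELECT h.haloId
            FROM halo h JOIN tree t ON h.descendantId = t.haloId
        )
        SELECT haloId FROM tree ORDER BY haloId
        """,
        (root_id,),
    ).fetchall()
    return [r[0] for r in rows]


conn = build_toy_db()
print(progenitors_recursive(conn, 2))
```

Each recursion step is a join against the full table, so the cost grows with the depth of the tree; this is the overhead the identifier scheme described next is meant to avoid.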
Fig. 2 illustrates our solution. Each object gets an identifier corresponding to its position in a depth-first ordering of the trees rooted in objects at
the final snapshot. Each object furthermore gets a pointer (foreign key) to the last progenitor in the ordering of the sub-tree rooted in that
object. The complete progenitor tree rooted in a given object (at any snapshot!) is now precisely the set of objects whose identifiers lie
between the root object's id and the id of its last progenitor, inclusive. In SQL the relevant query is as follows:
select prog.*
from halo des, halo prog
where des.haloId = 5000063000000 -- example value
and prog.haloId between des.haloId and des.lastProgId
This is the query corresponding to science question 1 above. In the database the tables are clustered (ordered) according to the id
columns, which ensures that merger trees are sequentially stored on the disks, speeding up the retrieval.
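The numbering scheme itself can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the code used to build the database: the function names, the dictionary-based tree representation and the toy labels are all hypothetical, and real identifiers such as the example value above clearly follow a different numbering convention.

```python
def assign_depth_first_ids(progenitors, roots, first_id=0):
    """Number every node in depth-first (pre-order) order.

    progenitors: dict mapping a node label to the list of its direct progenitors.
    roots: labels of the objects at the final snapshot.
    Returns a dict label -> (haloId, lastProgId).
    """
    ids = {}
    next_id = first_id
    for root in roots:
        # Explicit stack instead of recursion, so deep trees are not a problem.
        stack = [(root, False)]
        while stack:
            node, finished = stack.pop()
            if finished:
                # Every progenitor has been numbered: the most recently
                # assigned id is the last progenitor of this sub-tree.
                ids[node][1] = next_id - 1
                continue
            ids[node] = [next_id, None]
            next_id += 1
            stack.append((node, True))
            for prog in reversed(progenitors.get(node, [])):
                stack.append((prog, False))
    return {label: tuple(pair) for label, pair in ids.items()}


def progenitor_tree(ids, root):
    """Select a sub-tree with a single range test, mirroring the SQL 'between'."""
    lo, hi = ids[root]
    return sorted(label for label, (hid, _) in ids.items() if lo <= hid <= hi)


# Toy forest: A and G live at the final snapshot.
toy = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"], "G": ["H"]}
ids = assign_depth_first_ids(toy, ["A", "G"])
print(ids["B"], progenitor_tree(ids, "B"))
```

Because a sub-tree occupies a contiguous block of identifiers, the range test works for a root at any snapshot, and a table ordered by identifier keeps each tree together on disk.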
One other feature of the data model is the spatial indexing based on the Peano-Hilbert space filling curve. The Millennium simulation’s
files are organized around this index (see [9]), which is a higher dimensional equivalent to the recursive HTM [10] or HEALPix [11] indexes
on the sky. In the database it will likewise allow efficient spatial searches, though for now it is used to link the objects and the density field.
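To give a feeling for how a space-filling curve turns positions into a one-dimensional index, the sketch below computes Hilbert keys on a two-dimensional grid. It is a simplified illustration only: the simulation uses a three-dimensional Peano-Hilbert curve (see [9]), and the function name and grid resolution here are assumptions made for the example.

```python
def hilbert_key_2d(order, x, y):
    """Position of grid cell (x, y) along a 2D Hilbert curve.

    The grid has 2**order cells per side; x and y are integer cell indices.
    """
    n = 1 << order
    key = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        # Each quadrant at this level contributes a block of s*s keys.
        key += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the sub-curve in this quadrant has standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return key


# Cells visited in key order on a 2x2 grid.
cells = sorted(((x, y) for x in range(2) for y in range(2)),
               key=lambda c: hilbert_key_2d(1, c[0], c[1]))
print(cells)
```

Consecutive keys always belong to neighbouring cells, so objects that are close along the curve are also close in space, and a spatial region can typically be covered by a small number of key ranges.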
Webportal and example queries:
The database can be queried online through a special-purpose web application on the GAVO portal (http://www.g-vo.org),
which follows design ideas from the SkyServer [12] and GalICS [13] web applications. The user can type in free-form SQL queries and
retrieve the result in a variety of formats: HTML, CSV, VOTable (Fig. 4b). A particular feature is the ability to visualise the results directly via
a VOPlot [14] applet (see Fig. 4c). A number of example queries are available. In Fig. 5 we show the queries corresponding to the other
three science questions above. DEMO: This GAVO web application is being demonstrated at this conference.
Fig. 2: Illustration of the merger tree structure
of objects (halos/galaxies) in the simulation.
The black lines indicate the traditional,
descendant pointers. The red lines indicate the
pointer structure used in the database model.
Fig. 3: Formal data models used in the design of the Millennium
database. (a) shows an analysis model (UML), detailing the
important domain concepts and their interrelationships. (b)
shows a schematic relational model (ER) for the tables in the
database and their foreign key relations.
2.
select x, y, z,
       velX, velY, velZ
from MMGalaxy
where mag_b between -23 and -18
  and bulgeMass >= .1*stellarMass
3.
select .2*round(5*g.mag_b) as magB,
count(*) as num
from MMGalaxy g, MMHalo h
where g.haloId = h.haloId
and h.mTopHat between 1000 and 10000
and h.redshift=0
group by magB
4.
select zForm,
       avg(g10) as g10
from MMField f,
     ( select des.haloId, des.phkey,
              max(prog.redshift) as zForm
       from MMHalo prog,
            MMHalo des
       where des.redshift = 0
         and prog.haloId between des.haloId
                             and des.lastProgenitorId
         and prog.np >= des.np/2
         and des.np between 100 and 200
       group by des.haloId, des.phkey
     ) t
where t.phkey = f.phkey
  and f.snapnum = 63
group by zForm
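The inner sub-query of query 4 can be tried out on a toy table. The sketch below uses Python's standard sqlite3 module instead of Postgres, keeps only the columns the sub-query needs, and fills the table with invented halos; the function name and all data values are assumptions for illustration only.

```python
import sqlite3


def formation_redshifts(conn):
    """For each z=0 halo, return the highest redshift at which some halo in
    its progenitor tree (the halo itself included) has >= half its particles."""
    return conn.execute(
        """
        SELECT des.haloId, MAX(prog.redshift) AS zForm
        FROM MMHalo prog, MMHalo des
        WHERE des.redshift = 0
          AND prog.haloId BETWEEN des.haloId AND des.lastProgenitorId
          AND prog.np >= des.np / 2
        GROUP BY des.haloId
        ORDER BY des.haloId
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE MMHalo (
           haloId INTEGER PRIMARY KEY,
           lastProgenitorId INTEGER,
           redshift REAL,
           np INTEGER)"""
)
# Toy data, numbered depth-first: (haloId, lastProgenitorId, redshift, np).
conn.executemany(
    "INSERT INTO MMHalo VALUES (?, ?, ?, ?)",
    [
        (0, 4, 0.0, 100),  # z=0 halo, tree spans ids 0..4
        (1, 3, 0.5, 70),   # main progenitor of halo 0
        (2, 2, 1.0, 55),   # still at least half the final mass at z=1
        (3, 3, 1.0, 10),
        (4, 4, 0.5, 20),
        (5, 7, 0.0, 120),  # z=0 halo formed by a late, nearly equal merger
        (6, 6, 0.5, 50),
        (7, 7, 0.5, 40),
    ],
)
print(formation_redshifts(conn))
```

Halo 0 has a progenitor with at least half its particles out to redshift 1, while halo 5 has none before redshift 0. Because the range test includes the descendant itself, every halo gets at least its own redshift as a result.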
Fig. 4: Snapshots of the GAVO portal web pages providing access to the simulation database. (a) shows the query page, with demo queries and
links to the schema and documentation. (b) shows the result of the query in (a) in VOTable format. (c) shows the same result plotted with VOPlot.
The query implements science question number 1, and the plot shows the evolution of the merger tree below a given halo at redshift 0 by plotting
the X-position against the snapshot number. This illustrates the orbits of the halos and their merging behaviour.
Fig. 5: SQL implementations of science questions 2-4.
The database dialect is Postgres.
References:
[1] Springel V., White S.D.M., et al., 2005, Nature, 435, 629
[2] http://www.mpa-garching.mpg.de/galform/virgo/millennium/index.shtml
[3] Croton D.J., Springel V., et al., 2005, MNRAS, submitted (astro-ph/0508046)
[4] De Lucia G., Kauffmann G., White S.D.M., 2004, MNRAS, 349, 1101
[5] Gray J., Szalay A., et al., 2002, http://arxiv.org/abs/cs.DB/0202014
[6] Gao L., Springel V., White S.D.M., 2005, MNRAS, in press (astro-ph/0506510)
[7] Lemson G., Kauffmann G., 1999, MNRAS, 302, 111
[8] Lemson G., Dowler P., Banday A.J., 2003, in ASP Conf. Ser., Vol. 314, Astronomical Data Analysis Software and
Systems XIII, eds. F. Ochsenbein, M. Allen, & D. Egret (San Francisco: ASP), 472
[9] Springel V., 2005, MNRAS, submitted (astro-ph/0505010)
[10] http://skyserver.org/HTM/
[11] http://healpix.jpl.nasa.gov/
[12] http://cas.sdss.org/dr4/en/tools/search/sql.asp
[13] http://galics.cosmologie.fr/main_frames.php?dir=database
[14] http://vo.iucaa.ernet.in/~voi/voplot.htm