
STORING AND MANIPULATING GRIDDED DATA IN
SPATIALLY-ENABLED DATABASES
Adit Santokhee, Jon Blower, Keith Haines
Reading e-Science Centre, Environmental Systems Science Centre
Introduction
Modern computer simulations and satellite observations of the oceans and atmosphere produce large
amounts of data on the terabyte scale. Data providers, such as the Met Office and the European Centre
for Medium-Range Weather Forecasts, need a manageable system for storing these datasets, whilst
enabling the many consumers of the data to access them in a convenient and secure manner. Typically,
these datasets are stored in flat files (often compressed) and each institution tends to store data in its
own format (e.g., NetCDF, HDF, GRIB) with the data discretized on a variety of grids.
Extracting a subset of the grid
The following expression generates a timeseries of
temperature at latitude 50, longitude -30 at the 5 m
depth level from 1st January 2004 to 30th June
2004 from grids stored in the database:
select GRDExtract(grid, '((translation -30.0 50.0 0 0)
  (dim_names time depth lat lon) (dim_sizes 175 1 1 1)
  (affine_transformation 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0)
  (nonuniform time 7305 …… 7480) (nonuniform depth 5)
  (interpolation (time linear)))'::grdspec)
from foamvar where grid_id <= 6;
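The (interpolation (time linear)) clause in the query above requests linear interpolation along the time axis. As a rough illustration of that idea only, and not of the DataBlade's own implementation, the following Python sketch interpolates a point timeseries between stored time steps. The function name and the sample temperatures are hypothetical.

```python
from bisect import bisect_right

def interp_time_linear(times, values, t):
    """Linearly interpolate values (sampled at ascending times) at time t."""
    if not times or len(times) != len(values):
        raise ValueError("times and values must be non-empty and equal length")
    if t < times[0] or t > times[-1]:
        raise ValueError("t lies outside the stored time range")
    # Index of the last stored time that is <= t.
    i = bisect_right(times, t) - 1
    if i == len(times) - 1:
        return values[i]          # t coincides with the final stored time
    t0, t1 = times[i], times[i + 1]
    w = (t - t0) / (t1 - t0)      # fractional position between the two steps
    return (1 - w) * values[i] + w * values[i + 1]

# Hypothetical stored time steps (day numbers) and temperatures at one point.
stored_times = [7305, 7307, 7310]
stored_temps = [10.0, 12.0, 9.0]

# Daily timeseries built from irregularly spaced stored steps.
series = [interp_time_linear(stored_times, stored_temps, t) for t in range(7305, 7311)]
```

Requested times that coincide with a stored step return the stored value unchanged; times in between are weighted by their distance to the two neighbouring steps.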
End-users of the data (which include research institutions, government agencies and private industry)
should not have to know the details of how the data are stored. They require a flexible means of
accessing data and downloading them in the form they prefer. A typical query might involve the
extraction of a subset of data from multiple source files, interpolation, aggregation and re-projection on
a new grid.
There is increasing justification for using database management systems (DBMSs) to store and
manipulate gridded data. The principal advantages of such databases are data integrity, consistency,
flexibility and effective access to data by diverse users of multiple applications. However, implementing an
efficient DBMS for large quantities of gridded data is very challenging.
Barrodale Computing Services Ltd. (BCS) have recently developed a software module (the BCS Grid
DataBlade) that plugs into the IBM/Informix Dynamic Server 9.x (IDS) DBMS, for storage of gridded
data and efficient retrieval of data products. The Reading e-Science Centre are evaluating this system
on behalf of the environmental science community.
Exporting a grid to a file
The following expression exports a grid of
temperature that extends from latitude -89, longitude 0
to latitude 89, longitude 360 at one-degree spacing,
sampled at depth level 5 at time 6940 (1st January
2003), to a GIEF file:
select grdrowtogief('${curdir}/Tempvar.nc', 'foamvar', rowid,
  '((translation 0 -89 0 0) (dim_names time depth lat lon)
  (dim_sizes 1 1 179 360)
  (affine_transformation 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0)
  (nonuniform time 6940) (nonuniform depth 5))'::grdspec)
from foamvar where grid_id = 13;
Features of the Grid DataBlade
Processes queries on the database server, thereby minimizing the amount of network input/output and client-side CPU time required
Extracts data products up to 50-100 times faster than previous technology
Handles 1D, 2D, 3D and 4D grids
Stores grids using a tiling scheme in conjunction with Smart BLOBS, with user control over the tile size. This allows very efficient generation of data products that involve only a small portion of the data
Stores the data in, and converts it between, more than 40 different planar mapping projections supported by the IBM/Informix Spatial DataBlade
Supports irregularly spaced grids in any or all of the grid dimensions
Handles the presence of multiple vector and/or scalar values at each grid point
Provides interpolation options using N-Linear, nearest-neighbour or user-supplied interpolation schemes
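The benefit of the tiling scheme can be illustrated with a little index arithmetic: a request for a small subregion touches only the tiles it overlaps. The Python sketch below assumes fixed-size rectangular tiles. The function name, grid dimensions and tile sizes are hypothetical and say nothing about how the DataBlade actually lays out its tiles.

```python
from itertools import product

def tiles_for_region(start, size, tile_shape):
    """Return indices of the tiles overlapped by a hyper-rectangular region.

    start, size and tile_shape are per-dimension sequences of grid indices.
    """
    ranges = []
    for s, n, t in zip(start, size, tile_shape):
        if n <= 0 or t <= 0:
            raise ValueError("sizes and tile dimensions must be positive")
        first = s // t                 # tile containing the first index
        last = (s + n - 1) // t        # tile containing the last index
        ranges.append(range(first, last + 1))
    return list(product(*ranges))

# Hypothetical 4D grid (time, depth, lat, lon) of 175 x 20 x 180 x 360 points,
# split into tiles of 25 x 5 x 45 x 45 points.
tile_shape = (25, 5, 45, 45)
total_tiles = 7 * 4 * 4 * 8

# A full timeseries at a single depth/lat/lon point.
needed = tiles_for_region(start=(0, 0, 140, 330), size=(175, 1, 1, 1),
                          tile_shape=tile_shape)
```

With these assumed sizes the timeseries request reads 7 of the 896 tiles, which is the kind of saving the tiling scheme is designed to provide.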
[Figure: grid with axes X, Y and Depth; depth levels 40 m, 52.3 m, 75 m and 90 m; labels "Extraction" and "Native"]
Extractions can be at any angle through the 4D volume
The import/export format is NetCDF, with conventions defined in the Grid Import-Export Format (GIEF)
Provides application programming interfaces for C, Java and SQL
Some Applications
U.S. Navy pilots can train on real-life scenarios,
including forecasted weather patterns, visibility, wind
speed and direction, using PC-based flight simulation
software. The Grid DataBlade extracts time-significant,
location-specific weather data from a four-dimensional
gridded dataset housed in IDS, Version 9.3, and passes
it to trainees running the flight simulation on a PC.
http://www.barrodale.com/flightpath/index.html
The U.S. National Library of Medicine granted BCS
access to their Visible Human Project, consisting of
1,871 parallel high-resolution coloured images of a
male cadaver. BCS then subsampled the data to form a
1.6-gigabyte 3D gridded dataset. Users can query the
Grid DataBlade on the BCS Web site to extract 2D
cross-sectional slices of the human body.
http://www.barrodale.com/grid_Demo/GridBladeApplet.html
Progress Made So Far
We have successfully used the Grid DataBlade to store about 12 GB of Forecasting Ocean Assimilation Model
(FOAM) data (temperature and salinity) in an Informix database.
We then tested the functionality of the Grid DataBlade: extracting data, updating a grid, generating
temperature timeseries from data held in multiple grids, and exporting data to files or for
visualisation. These experiments were carried out using programs written against the SQL, Java and native
interfaces offered by the DataBlade and Informix APIs.
Example Uses
Loading a GIEF file into a table
execute procedure grdfromgief("pathname", "table name");
The above metadata describes a grid storing
temperature data from the eighth-degree FOAM
model at various levels and times (denoted
by nonunisample1 and nonunisample2
respectively). The starting point of the grid is
at longitude -98.5 and latitude 10. Each
dimension has a set of basis vectors which
tell us which axis varies fastest and by how
much. In this case longitude varies fastest,
with a spacing of 0.125 degrees.
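The relationship between grid indices and coordinates described here amounts to a start point plus an index-weighted sum of basis vectors. The Python sketch below shows this for the two horizontal dimensions only. The start point and the 0.125 degree longitude spacing come from the text; the latitude spacing of 0.125 degrees and the function name are assumptions made for illustration.

```python
def index_to_coord(origin, basis, index):
    """Map integer grid indices to coordinates: origin + sum(index[k] * basis[k])."""
    coord = list(origin)
    for k, vec in zip(index, basis):
        for axis, step in enumerate(vec):
            coord[axis] += k * step
    return tuple(coord)

# Coordinates are ordered (lon, lat).
origin = (-98.5, 10.0)           # start point given in the text
basis = [
    (0.125, 0.0),                # longitude index: 0.125 degrees per step (from the text)
    (0.0, 0.125),                # latitude index: spacing assumed for illustration
]

# Coordinates of the grid point 8 steps along longitude and 4 along latitude.
lon, lat = index_to_coord(origin, basis, (8, 4))
```

Under these assumptions the point with indices (8, 4) lies at longitude -97.5 and latitude 10.5.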
Future Work
Carrying out detailed experiments to determine the performance of the DataBlade compared to traditional file-based data access.
The ability to make threshold-type queries directly on the database server, for instance to find all the regions where the temperature is above or below a certain value.
Creating virtual datasets. For example, density could be calculated on the database server using temperature and salinity data which are already stored in the database.
Adding new functionality for answering queries of the form: what values of salinity correspond to a particular temperature, given a grid containing both salinity and temperature?
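To make the threshold-type query concrete, the following Python sketch scans a small 2D temperature grid held in memory and reports the cells above a chosen value. It only illustrates the intended result: the function name and data are hypothetical, and the point of the proposed work is that such a scan would run on the database server rather than in client code.

```python
def cells_above(grid, threshold):
    """Return (row, col) indices of cells whose value exceeds threshold.

    Cells holding None are treated as missing data (e.g. land) and skipped.
    """
    return [
        (r, c)
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value is not None and value > threshold
    ]

# Hypothetical 3 x 4 temperature grid in degrees Celsius.
temperature = [
    [11.2, 12.8, 13.1, None],
    [10.9, 12.1, 14.0, 13.5],
    [None, 11.7, 12.4, 12.9],
]

warm = cells_above(temperature, 12.5)
```

A "below" query follows by reversing the comparison.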
References
1. Barrodale Computing Services Ltd., 2002: Storing and manipulating gridded data in databases. Online:
http://www.barrodale.com/grid_Demo/gridInfo.pdf
2. IBM, 2002: BCS speeds access to gridded data 100-fold with IBM Informix Dynamic Server. Online:
http://www.barrodale.com/docs/ibm_grid_writeup.pdf
Acknowledgements
We are grateful to Ian Barrodale and Cedric Zala from Barrodale Computing Services Ltd. for kindly providing us with an
evaluation version of the Grid DataBlade and for assistance in using it. Special thanks also go to John Pickford from
IBM for providing us with a copy of the Informix Dynamic Server and for support.