ActiveStorage

Integrated Grid workflow for mesoscale weather modeling and visualization
Zhizhin, M., A. Polyakov, D. Medvedev, A. Poyda, S. Berezin
Space Research Institute of the Russian Academy of Sciences
Abstract
• For the model input and output we use a scalable parallel storage and data mining
system called ActiveStorage. It can store different types of weather data, provided
they are represented in the same Common Data Model (Unidata CDM): NCEP reanalysis,
NCDC station weather data, and MM5 model output.
• MM5 is a mesoscale weather forecast model. For the input boundary conditions the
model takes basic parameters such as elevation, air pressure and temperature. It can
ingest reanalysis and direct observation data. As output, the model provides
high-resolution regional weather grids.
• To make the MM5 input data and the modeling results accessible on the Grid to
the Earth Science community, we have developed a set of grid services (resources
and activities) inside the OGSA-DAI (both ver. 2 and 3) grid service container.
• To visualize the weather data we have developed a special plugin for NASA
World Wind that reads the data directly from the OGSA-DAI resources and
plots it over the 3D globe in different ways, such as contour lines, filled areas and
vector fields.
Active Storage, Modeling, Data Mining and Visualization Services
Data sources:
• Weather observations and reanalysis time series
• Geographical information: elevation, hydrology, ...
• Derived products from satellite data
Active Storage:
• Common Data Model
• Microsoft SQL Server Cluster
• OGSA-DAI and Matlab API: time series, grids, trajectories
Numerical Modeling:
• Parallel mesoscale meteorological model MM5
• Windows Compute Cluster + MPI parallelization
• Raw data input, model output
Data Analysis:
• Environmental Scenario Search Engine (ESSE)
• Trend and change detection algorithms
• Trends and relations
Visualization:
• Microsoft Virtual Earth
• NASA World Wind
• EVL UIC Scalable Graphics Environment (SAGE+SAIL)
Data flows between components: geographical information, numerical models, raw data
ActiveStorage
• ActiveStorage is a generic storage for arrays of primitive data types.
• Its data model is based on Unidata's Common Data Model, used in netCDF, HDF5 and OpenDAP.
• In essence, ActiveStorage is a SQL Server database with CLR stored procedures and a client library.
• The stored procedures and the client library provide an abstraction layer for data access.
• Large arrays are split into chunks, which can be spread across several parallel database servers for better performance.
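The chunking idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the chunk shape, the round-robin placement policy and the function names (`chunk_grid`, `chunks_for_request`, `server_for_chunk`) are hypothetical and are not taken from the ActiveStorage implementation.

```python
from itertools import product


def chunk_grid(shape, chunk_shape):
    """Number of chunks along each dimension (ceiling division)."""
    return tuple(-(-n // c) for n, c in zip(shape, chunk_shape))


def chunks_for_request(origin, count, chunk_shape):
    """Indices of every chunk touched by the hyperslab (origin, count)."""
    ranges = []
    for o, n, c in zip(origin, count, chunk_shape):
        first = o // c              # chunk holding the first requested element
        last = (o + n - 1) // c     # chunk holding the last requested element
        ranges.append(range(first, last + 1))
    return list(product(*ranges))


def server_for_chunk(chunk_index, grid, n_servers):
    """Hypothetical round-robin placement: linearize the index, then take modulo."""
    linear = 0
    for i, g in zip(chunk_index, grid):
        linear = linear * g + i
    return linear % n_servers


# Assumed example: 1000 time steps of a 73 x 144 grid, 100 time steps per chunk.
shape = (1000, 73, 144)
chunk_shape = (100, 73, 144)
grid = chunk_grid(shape, chunk_shape)

# A time series of 200 steps starting at step 250, at grid point (0, 0).
touched = chunks_for_request((250, 0, 0), (200, 1, 1), chunk_shape)
servers = [server_for_chunk(c, grid, 4) for c in touched]
```

With this placement, neighbouring chunks of a long time series land on different servers, which is what lets a single request be served by several databases at once.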
ActiveStorage components
• SQL Server 2005/2008 DB: data and directory tables, metadata tables
• Stored procedures
• Client library
Common Data Model
Dataset: name
Group: name
Dimension: name, length
Attribute: name, value, dataType
Variable: name, shape, dataType
DataType: char, byte, short, int, long, float, double, String
This is the Common Data Model (CDM) used in the recent versions of OpenDAP, netCDF
and HDF5. Its purpose is the representation of multidimensional scientific data.
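As a rough illustration, the classes named on the diagram can be written as Python dataclasses. The field names follow the diagram; the containment relations (a dataset with a root group, a group holding dimensions, variables, attributes and nested groups) follow the general CDM design and are an assumption here, as are the example dimension lengths for `time` and `level`.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, List


class DataType(Enum):
    CHAR = "char"
    BYTE = "byte"
    SHORT = "short"
    INT = "int"
    LONG = "long"
    FLOAT = "float"
    DOUBLE = "double"
    STRING = "String"


@dataclass
class Dimension:
    name: str
    length: int


@dataclass
class Attribute:
    name: str
    value: Any
    data_type: DataType


@dataclass
class Variable:
    name: str
    shape: List[Dimension]
    data_type: DataType
    attributes: List[Attribute] = field(default_factory=list)

    def element_count(self) -> int:
        """Total number of array elements implied by the shape."""
        total = 1
        for dim in self.shape:
            total *= dim.length
        return total


@dataclass
class Group:
    name: str
    dimensions: List[Dimension] = field(default_factory=list)
    variables: List[Variable] = field(default_factory=list)
    attributes: List[Attribute] = field(default_factory=list)
    groups: List["Group"] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    root: Group


# Hypothetical example: a small 'air' variable on a 73 x 144 grid.
time = Dimension("time", 4)      # assumed length
level = Dimension("level", 1)    # assumed length
lat = Dimension("lat", 73)
lon = Dimension("lon", 144)
air = Variable("air", [time, level, lat, lon], DataType.SHORT)
dataset = Dataset("NCEP_01", Group("root", [time, level, lat, lon], [air]))
```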
How it works
1. Application: pass a multi-dimensional data request to the client library.
2. Client library: issue commands to the database server.
3. SQL Server DB: select the requested data from several chunks.
4. SQL Server DB: return the data parts to the client library.
5. Client library: assemble the data parts into one multi-dimensional array.
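The client-side assembly step can be sketched as follows for the two-dimensional case. This is a minimal illustration, not the client library's code: the `assemble` function and the representation of a data part as `(part_origin, block)` in global coordinates are assumptions made for the example.

```python
def assemble(origin, count, parts):
    """Copy data parts into one array covering the requested region.

    origin, count: (row, col) start and size of the requested region.
    parts: list of ((row0, col0), block), where block is a list of rows and
           (row0, col0) is the position of its first element in the full array.
    """
    rows, cols = count
    result = [[None] * cols for _ in range(rows)]
    for (row0, col0), block in parts:
        for i, row in enumerate(block):
            for j, value in enumerate(row):
                r = row0 + i - origin[0]   # position inside the result
                c = col0 + j - origin[1]
                if 0 <= r < rows and 0 <= c < cols:
                    result[r][c] = value
    return result


# Request rows 1..2 and columns 1..3; the data arrive as four parts
# because the region crosses chunk boundaries in both directions.
parts = [
    ((1, 1), [[11, 12]]),
    ((1, 3), [[13]]),
    ((2, 1), [[21, 22]]),
    ((2, 3), [[23]]),
]
array = assemble((1, 1), (2, 3), parts)
```

The order in which parts arrive does not matter, since each part carries its own position.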
Parallel query processing
Application → Client library → SQL Server DB 1 and SQL Server DB 2 (queried in parallel)
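A minimal sketch of the parallel pattern, assuming one query per database server issued concurrently from the client. The `fetch_from_server` function is a stand-in for a real database round trip, and the server names and chunk identifiers are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_from_server(server, chunk_ids):
    """Stand-in for a database query: returns {chunk_id: data} for one server."""
    return {cid: f"chunk {cid} from {server}" for cid in chunk_ids}


def parallel_fetch(placement):
    """Query every server concurrently and merge the returned data parts.

    placement: {server_name: [chunk ids stored on that server]}
    """
    merged = {}
    with ThreadPoolExecutor(max_workers=max(1, len(placement))) as pool:
        futures = [
            pool.submit(fetch_from_server, server, ids)
            for server, ids in placement.items()
        ]
        for future in futures:
            merged.update(future.result())   # blocks until that server answers
    return merged


placement = {"DB 1": [0, 2, 4], "DB 2": [1, 3, 5]}
data_parts = parallel_fetch(placement)
```

Threads are sufficient here because the client mostly waits on network and database I/O rather than computing.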
Parallel query performance
1 database server
4 parallel database servers
NCEP/NCAR Weather Reanalysis
• Continually updating gridded data set
• Incorporates observations and global
climate model output
• 74 weather parameters
• 5000 netCDF files, 30 – 500 MB each
Time coverage:
• 1948 – 2008
• 6-hourly values (4 per day)
Grids:
• Regular grid, 2.5 x 2.5 degrees
• T62 Gaussian grid, 192 x 94 points
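For orientation, the array sizes implied by these grids can be worked out directly. The helper below is a sketch that assumes the regular grid includes both poles in latitude and wraps in longitude, which gives the 73 x 144 shape used in the MATLAB script in this transcript.

```python
def regular_grid_shape(step_deg):
    """(n_lat, n_lon) for a global regular grid with the given spacing."""
    n_lon = int(round(360 / step_deg))        # longitudes wrap around the globe
    n_lat = int(round(180 / step_deg)) + 1    # latitudes include both poles
    return n_lat, n_lon


n_lat, n_lon = regular_grid_shape(2.5)
regular_points = n_lat * n_lon     # points per time step on the regular grid
gaussian_points = 192 * 94         # points per time step on the T62 Gaussian grid
```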
NCDC Integrated Surface Database
Fixed ground stations
Ships
Mobile stations
Buoys
• 1901 – 2008 time coverage.
• 470 000 ASCII files packed with gzip.
• 30 million sensors.
• 50 GB packed; 400 GB unpacked.
• 1.7 billion observations.
When you’ve downloaded and unpacked the data...
Sample record (truncated):
0189010020999992007022817004+80050+016250FM-12+000899999V0202201N008019999999N0090001N1+00631+00541098651ADDGA1031+003009999KA1120N+99999...
Record layout:
• Control data section (includes date, time, lat, lon)
• Mandatory data section
• Section marker (ADD)
• Additional data section: parameter groups, each introduced by a group marker (e.g. GA1)
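A short sketch of how the fields labelled on the slide could be pulled out of such a record. The character offsets below are an assumption based on the fixed-width layout of the ISD control data section (60 characters, latitude and longitude scaled by 1000) and should be checked against the official format document; the function names are hypothetical. The record is the truncated sample shown above, without the trailing ellipsis.

```python
from datetime import datetime

RECORD = (
    "0189010020999992007022817004+80050+016250FM-12+000899999V020"
    "2201N008019999999N0090001N1+00631+00541098651"
    "ADDGA1031+003009999KA1120N+99999"
)


def parse_control(record):
    """Extract station id, timestamp and position from the control section."""
    return {
        "station": record[4:10],                                    # assumed offsets
        "timestamp": datetime.strptime(record[15:27], "%Y%m%d%H%M"),
        "lat": int(record[28:34]) / 1000.0,                         # degrees north
        "lon": int(record[34:41]) / 1000.0,                         # degrees east
    }


def split_sections(record, control_length=60):
    """Split a record at the ADD section marker (searched after the control section)."""
    marker = record.find("ADD", control_length)
    if marker < 0:
        return record[:control_length], record[control_length:], ""
    return (
        record[:control_length],           # control data section
        record[control_length:marker],     # mandatory data section
        record[marker + 3:],               # additional data section
    )


fields = parse_control(RECORD)
control, mandatory, additional = split_sections(RECORD)
```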
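MATLAB script using ActiveStorage library
% Java classes: the ActiveStorage netCDF-style connector and the SQL Server JDBC driver
import ru.wdcb.mdb.NcConnector
import com.microsoft.sqlserver.jdbc.SQLServerDriver

% JDBC connection string for the NCEP reanalysis database
s = 'jdbc:sqlserver://localhost:1433;databaseName=NCEP_01;user=guest;password=guest';
connector = NcConnector();
ncid = connector.nc_open(s, 0);
varid = connector.nc_inq_varid(ncid, 'air');

% Time series: 80000 values along the first dimension at grid point (10, 10)
origin = [0 0 10 10];
count = [80000 1 1 1];   % named count so that MATLAB's built-in size() is not shadowed
stride = [1 1 1 1];
A = connector.nc_get_vars_short(ncid, varid, origin, count, stride);
plot(A, 'DisplayName', 'A', 'YDataSource', 'A'); figure   % open a new figure for the next plot

% One full 73 x 144 grid at the first index of the leading dimensions
origin = [0 0 0 0];
count = [1 1 73 144];
stride = [1 1 1 1];
B = connector.nc_get_vars_shortm(ncid, varid, origin, count, stride);
B = reshape(B, [73 144]);
imagesc(B); figure(gcf);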
Environmental Data Service: OGSA-DAI plugin
Container: Tomcat, OGSA-DAI
Data sources (through ActiveStorage): NCEP database, MM5 weather model, SPIDR databases, NWS database, user data
Requests and responses:
• getProperty: sources → sources list
• getMetadata → metadata XML
• getXMLData → XML
• getNetCDFData → URL to NetCDF file (NetCDF file serialisation, data export)
Clients: MS Excel, any client
Activities for data export
• XML output stream
– We have a plugin for NASA World Wind to visualize XML-formatted data
– Can easily be transformed with XSLT into a web page or another XML document, e.g. for MS Excel
– Can be used as input for the ESSE fuzzy logic search engine
• NetCDF binary data file
– Standard for scientific data storage in files
– There are several visualization programs for NetCDF
– Compatible with the Unidata Common Data Model standard
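To illustrate why an XML output stream is convenient for downstream tools, the sketch below converts a small XML document into CSV rows. The XML layout (`grid` and `cell` elements with `lat` and `lon` attributes) and the sample values are invented for this example; the actual schema produced by the service is not shown in this transcript.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML layout and made-up values, for illustration only.
SAMPLE_XML = """<grid variable="air">
  <cell lat="80.0" lon="15.0">266.3</cell>
  <cell lat="80.0" lon="17.5">265.9</cell>
</grid>"""


def xml_to_rows(xml_text):
    """Turn every <cell> element into a (lat, lon, value) tuple."""
    root = ET.fromstring(xml_text)
    return [
        (float(cell.get("lat")), float(cell.get("lon")), float(cell.text))
        for cell in root.iter("cell")
    ]


def rows_to_csv(rows):
    """Serialize rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["lat", "lon", "value"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = xml_to_rows(SAMPLE_XML)
csv_text = rows_to_csv(rows)
```

The same kind of transformation can be expressed as an XSLT stylesheet when the consumer is a browser or a spreadsheet.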
Data flow management by OGSA-DAI
• OGSA-DAI query from a single data source
• OGSA-DAI query from distributed data sources
Parallel mesoscale weather model MM5
Same Source Parallel MM5
• The parallel (MPI) and the single-process MM5 models share the same source code
• Parallel code is generated automatically from the MM5 sources with tools by ANL:
– FLIC compiler
– RSL library for model domain segmentation and message exchange
• We have ported the MM5 code to the MS Windows Server 2008 HPC platform
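Domain segmentation can be illustrated with a simple block decomposition of a two-dimensional grid over a grid of processes. This is a generic sketch, not the algorithm used by RSL; the function names and the example grid size are assumptions, and the message exchange between neighbouring subdomains is left out.

```python
def split_1d(n, parts):
    """Split n grid points into contiguous ranges whose sizes differ by at most one."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)   # first `extra` ranges get one more point
        bounds.append((start, start + size))
        start += size
    return bounds


def decompose(ny, nx, py, px):
    """Map each process rank to its ((y0, y1), (x0, x1)) subdomain.

    ny, nx: grid points in each direction; py, px: processes in each direction.
    """
    ys, xs = split_1d(ny, py), split_1d(nx, px)
    return {j * px + i: (ys[j], xs[i]) for j in range(py) for i in range(px)}


# Assumed example: a 10 x 7 model domain on a 2 x 2 process grid.
subdomains = decompose(10, 7, 2, 2)
```

Each process then integrates the model on its own subdomain and exchanges boundary values with its neighbours after every time step.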
MM5 model as a grid client
Visualizing data from ActiveStorage with NASA World Wind
A NASA World Wind plugin, developed at Moscow State University, retrieves data
from ActiveStorage via an OGSA-DAI service.
Several kinds of visualization are available:
- isolines
- color map
- vector field
OGSA-DAI services can also be used by other applications to retrieve data from ActiveStorage.
NASA World Wind as a grid client
Using OGSA-DAI services and a special API plugin, NASA World Wind can visualize
both the MM5 input and output datasets.