ODB Training 2005

Introduction to Observational DataBase (ODB)
Sami Saarinen, Paul Burton
ECMWF
22-Mar-2006
ODB Training 2006
slide 1
ECMWF
Overview
• Introduction to ODB
• Creating a simple database
    • Use of the simulobs2odb program
    • Visualizing data using odbviewer, ODBTk
• The bigger picture
    • ODB within IFS/4DVAR-system
    • A more complex database
• Manipulating ODB from Fortran90
• Tools: odbless, odbdiff, odbcompress, odbdup, odb2netcdf
Introduction to ODB
• ODB is tailor-made database software developed at ECMWF to manage very large observational data volumes through the IFS/4DVAR system, and to enable flexible post-processing of observational data
• An observational database usually contains the following items:
    • Observation identification, position and time coordinates
    • Observation value, pressure levels, channel numbers
    • Various quality control flags
    • Observation departures from background and analysis fields
    • Satellite-specific information
    • Other closely related information
AMSU-A data before screening
Basic components of ODB
• ODB/SQL language
    • Data Definition Language: describes which data items belong to the database, what their data types are and how (if at all) they are related to each other
    • Data Query Language: queries and returns the subset of data that satisfies user-specified conditions. This is the key feature of the ODB software!
• Fortran90 interface layer
    • Data manipulation: create, update and remove data
    • Execute ODB/SQL queries and retrieve filtered data
    • Control MPI and OpenMP parallelization
ODB/SQL compilation system
Typical ODB usage patterns
• A database can be created interactively or in batch mode
    • We usually run our in-house BUFR2ODB in batch
    • New observation types can also be fed in via a text file
• Complete database manipulation is currently best done through the Fortran90 interface, but a read-only database can also be accessed via a rudimentary client-server interface (C/C++)
• Once the database has been created, the application program normally queries data and places the result (also known as a view) into a data matrix allocated by the user
• There can be virtually any number of active views at any given time. These can be updated and fed back to the database
Creating a simple database
• We will create a very simple database using text files
• The 3 text files describe
    • Data layout, i.e. what data items comprise this ODB
    • Location and time information of observations
    • Actual observation measurement information for each location at the given pressure levels
• Feed these files into the simulobs2odb program
• Inspect the data values in the database by using odbviewer
Data definition layout : MYDB.ddl
CREATE TABLE hdr AS (
  seqno        pk1int,
  obstype      pk1int,
  codetype     pk1int,
  lat          pk9real,
  lon          pk9real,
  date         yyyymmdd,
  time         hhmmss,
  body         @LINK,
);
CREATE TABLE body AS (
  entryno      pk1int,
  varno        pk1int,
  vertco_type  pk1int,
  press        pk9real,
  obsvalue     pk9real,
);
Input file#2 : hdr.txt
#hdr
obstype = 2
codetype = 141
seqno  lat  lon  date      time    body.len
1      45   -15  20041101  000000  1
Input file#3 : body.txt
#body
entryno  varno  vertco_type  press  obsvalue
1        2      1            50000  251.0
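The two text input files can also be generated by a small script, which helps once there are more than a handful of observations. The following is a minimal Python sketch, not part of the ODB distribution. The layout (a `#table` line, optional `name = value` lines, a column-name line, then one whitespace-separated line per row) is inferred from the slides above, and the helper name `write_table_file` is hypothetical. Whether simulobs2odb accepts exactly this spacing is an assumption.

```python
import tempfile
from pathlib import Path


def write_table_file(path, table, constants, columns, rows):
    """Write one text input file in the layout shown on the slides (assumed)."""
    lines = [f"#{table}"]
    # Values shared by all rows are given once as 'name = value' lines.
    lines += [f"{name} = {value}" for name, value in constants.items()]
    lines.append(" ".join(columns))
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("row width does not match the column list")
        lines.append(" ".join(str(value) for value in row))
    path.write_text("\n".join(lines) + "\n")
    return path


workdir = Path(tempfile.mkdtemp())

# Date and time are passed as strings so that leading zeros survive.
hdr_path = write_table_file(
    workdir / "hdr.txt", "hdr",
    constants={"obstype": 2, "codetype": 141},
    columns=["seqno", "lat", "lon", "date", "time", "body.len"],
    rows=[(1, 45, -15, "20041101", "000000", 1)],
)
body_path = write_table_file(
    workdir / "body.txt", "body",
    constants={},
    columns=["entryno", "varno", "vertco_type", "press", "obsvalue"],
    rows=[(1, 2, 1, 50000, 251.0)],
)

hdr_text = hdr_path.read_text()
body_text = body_path.read_text()
print(hdr_text)
print(body_text)
```

A generator like this should keep the sum of the `body.len` values in hdr.txt equal to the number of rows written to body.txt, since each header row claims that many body rows.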
Running simulobs2odb
• Initialize the ODB interactive environment:
    use odb
• Create the database using the following simple command:
    simulobs2odb -l MYDB -i hdr.txt -i body.txt
• As a result of these commands, a small database called MYDB has been created. It contains one data pool with two tables, hdr and body, which are linked (related) to each other via the special @LINK data type
• It is now easy to extend the database by providing more data, specifying more data items, adding more tables, or all of the above at the same time
Visualizing with odbviewer
• History: odbviewer was originally written as a debugging tool for ODB software development
• Linked with the ECMWF graphics package MAGICS/MAGICS++, it displays coverage plots
• Also a textual report generator
    • Displays the output of data queries
• "Sensitive" to the ODB/SQL language: automatically tries to produce both a coverage plot and a textual report for the user
• The textual report itself can be an invaluable source of information for further post-processing tasks
Running odbviewer
• Go to the database directory
    cd MYDB
• Run
    odbviewer -q 'SELECT lat,lon,press,obsvalue \
                  FROM hdr, body \
                  WHERE obstype = 2'
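To see what this query returns for the one-observation MYDB, it helps to picture how the hdr and body tables are tied together by the @LINK column. The sketch below is a plain-Python illustration, not ODB code. It models each hdr row as pointing at a contiguous run of body rows through a length (the `body.len` value from hdr.txt) and a starting offset. The offset field, the dictionary layout and the function name are assumptions made for illustration.

```python
# Hypothetical in-memory stand-ins for the hdr and body tables of MYDB.
hdr = [
    {"seqno": 1, "obstype": 2, "codetype": 141, "lat": 45.0, "lon": -15.0,
     "date": 20041101, "time": 0,
     "body_offset": 0,   # assumed: index of the first linked body row
     "body_len": 1},     # from hdr.txt: number of linked body rows
]
body = [
    {"entryno": 1, "varno": 2, "vertco_type": 1,
     "press": 50000.0, "obsvalue": 251.0},
]


def select_lat_lon_press_obsvalue(hdr_rows, body_rows, obstype):
    """Mimic: SELECT lat,lon,press,obsvalue FROM hdr, body WHERE obstype = ..."""
    result = []
    for h in hdr_rows:
        if h["obstype"] != obstype:
            continue
        start = h["body_offset"]
        # Follow the link: only the body rows owned by this header row.
        for b in body_rows[start:start + h["body_len"]]:
            result.append((h["lat"], h["lon"], b["press"], b["obsvalue"]))
    return result


rows = select_lat_lon_press_obsvalue(hdr, body, obstype=2)
print(rows)  # [(45.0, -15.0, 50000.0, 251.0)]
```

Each output row combines header columns (lat, lon) with the columns of a body row that belongs to that header, which is why no explicit join condition appears in the query.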
odbviewer coverage plot
Our observation !!
Some odbviewer [options]
-h                    List of options (gimme some "help"!)
-q 'SQL-stmt'         Provide ODB/SQL-statement inline
-v viewname/poolno    Choose SQL name (& optionally pool number)
-p "1-10,12,15"       Choose from a subset of pools
-R                    No radians-to-degrees conversion for (lat,lon)
-r                    Enforce radians-to-degrees conversion
-c                    Clean start (i.e. recompile all)
-e editor             Choose preferred editor
-e batch              Run in batch mode (same as -e pipe)
-N                    Do not produce a report at all
-I                    Do not show plot immediately
-P projection         Change projection
-C file.cmap          Supply a color map file
-A plot_area          Choose plotting area
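The pool subset given to -p mixes ranges and single pool numbers. As a rough illustration of how such a specification expands, here is a minimal Python sketch. It assumes that ranges are inclusive and that duplicates collapse; the function name `parse_pool_spec` is hypothetical and this is not odbviewer's own parser.

```python
def parse_pool_spec(spec):
    """Expand a pool list such as "1-10,12,15" into sorted pool numbers."""
    pools = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(p) for p in part.split("-", 1))
            if first > last:
                raise ValueError(f"descending range: {part!r}")
            pools.update(range(first, last + 1))  # inclusive range (assumed)
        else:
            pools.add(int(part))
    return sorted(pools)


selected = parse_pool_spec("1-10,12,15")
print(selected)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15]
```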
ODBTk : The ODB Toolkit
• GUI-based ODB visualisation tool
• Easy way for non-experts to build SQL
• Interactive viewing of observational data
• Can refine the SQL "WHERE" clause as you view the data
• Portable, lightweight application
    • Requires ODB, perl, Fortran90 & C compilers
ODBTk : Building an SQL
• Twin views on structure
    • Hierarchical structure
        • Allows the relationship between tables/columns to be seen
    • "Flat structure"
        • Easy to find a given column/member or table
        • Allows the user to sort the structure
• SQL library
    • Both local & shared
Visualising Coverage
Visualising X-Y plots
AMSU-A data after screening
Under 10% left active !!
ODB within IFS/4DVAR-system
[Diagram; labels: ECMA/ODB, CCMA/ODB, Output BUFRs]
A more complex database
• In the real world a database may contain many more tables (>>5) than in the simple example earlier
• Each table can contain 10 to 50 data columns
• There can also be a sophisticated data hierarchy (next slide) to describe potentially complex relationships between tables
• To provide good parallel performance on supercomputers, data tables are furthermore divided into data pools
    • They behave like sub-databases within a database
    • This allows much bigger data sets than would otherwise be possible
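The idea of data pools as sub-databases can be pictured with a small Python sketch: rows are spread over several pools, the same query runs on each pool independently (which is what makes parallel execution possible), and the per-pool results are merged. The round-robin distribution and all names here are assumptions for illustration only. The slides do not say how ODB assigns rows to pools.

```python
def split_into_pools(rows, npools):
    """Distribute rows over npools pools (round-robin rule is an assumption)."""
    pools = [[] for _ in range(npools)]
    for index, row in enumerate(rows):
        pools[index % npools].append(row)
    return pools


def query_pool(pool, predicate):
    """Run the same filter on a single pool, independently of the others."""
    return [row for row in pool if predicate(row)]


# Ten made-up observations; every third one has obstype 1, the rest obstype 2.
observations = [{"seqno": i, "obstype": 1 if i % 3 == 0 else 2}
                for i in range(1, 11)]

pools = split_into_pools(observations, npools=4)
per_pool = [query_pool(pool, lambda row: row["obstype"] == 2) for pool in pools]

# Merging the per-pool answers gives the same rows as querying one big table.
merged = sorted((row for part in per_pool for row in part),
                key=lambda row: row["seqno"])
print([len(pool) for pool in pools])     # [3, 3, 2, 2]
print([row["seqno"] for row in merged])  # [1, 2, 4, 5, 7, 8, 10]
```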
Comprehensive data hierarchy
ECMWF BUFR to ODB conversion
• ODBs at ECMWF are normally created by using bufr2odb
    • Enables MPI-parallel database creation, which makes it efficient
    • Allows retrospective inspection of feedback BUFR data by converting it into ODB
• bufr2odb can also be used interactively, for example:
    bufr2odb -i bufr_input_file -I 1-20 -n 4
• The preceding example creates 4 pools of the ECMA database from the given BUFR input file, but includes only BUFR subtypes from 1 to 20 (inclusive)
• Feedback BUFR to ODB works similarly:
    fb2odb -i feedback_bufr_file -n 8 -u 2
Manipulating ODB from Fortran90
• Currently Fortran90 is the only way to fill an ODB database
    • simulobs2odb is also a Fortran90 program underneath
    • Likewise odbviewer and practically any other ODB tool
• Also: Fortran90 is necessary to fetch and update data
• The ODB Fortran90 interface layer offers a comprehensive set of functions to
    • Open & close a database
    • Attach to & execute precompiled ODB/SQL queries
    • Load, update & store queried data
An example ODB program
program main
  use odb_module
  implicit none
  integer(4) :: h, rc, nra, nrows, ncols, npools, j, jp
  real(8), allocatable :: x(:,:)
  ! Open the existing database; npools is then used as the pool loop bound
  npools = 0
  h = ODB_open('MYDB', 'OLD', npools=npools)
  ! < data manipulation loop ; see next page >
  ! Close the database, saving any updates
  rc = ODB_close(h, save=.TRUE.)
end program main
Data manipulation loop
DO jp=1,npools
  ! Execute SQL, allocate space, get data into matrix
  rc = ODB_select(h,'sqlview',nrows,ncols,poolno=jp)
  allocate(x(nrows,0:ncols))
  rc = ODB_get(h,'sqlview',x,nrows,ncols,poolno=jp)
  ! Update data, put back to DB, deallocate space
  call update(x,nrows,ncols) ! Not an ODB-routine
  rc = ODB_put(h,'sqlview',x,nrows,ncols,poolno=jp)
  deallocate(x)
  rc = ODB_cancel(h,'sqlview',poolno=jp)
  ! Use the following only with READONLY-databases
  ! rc = ODB_release(h,poolno=jp)
ENDDO
Compile, link and run
(1) use odb                                      # once per session
(2) odbcomp MYDB.ddl                             # once only; often from file MYDB.sch
(3) odbcomp sqlview.sql                          # recompile only when changed
(4) odbf90 main.F90 update.F90 -lMYDB -o main.x  # link
(5) ./main.x                                     # run
odbless
• A textual browser for looking at ODB data page by page (a little like the Unix less command)
    • By default calculates a statistical summary for each retrieved data column
    • Cheap, with a near-optimal ODB data access pattern
    • The user can specify the starting row
• Usage:
    odbless -q 'SELECT column(s) FROM table(s) WHERE …' \
            -s starting_row -n number_of_rows_to_display \
            [-b buffer_size -X]
odbdiff
• Compares two ODB databases for differences
• A very useful tool when trying to identify errors/differences between operational and experimental 4DVAR runs
• Usage:
    odbdiff -q 'SELECT …' DATABASE1 DATABASE2
• By default brings up an xdiff window showing the differences
• If latitude and longitude are included in the data query, it also produces a difference plot using the odbviewer tool
odbcompress
• Creates a very compact database from an existing one
    • For archiving purposes, or for a smaller footprint
    • Makes post-processing considerably faster
• At this point the user can choose both
    • Truncating the data precision
    • Leaving out columns that are of less importance
• Early tests show that this new tool achieves compression factors from 2.5X to 11X
    • The higher compression is for satellite data!
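Why truncating precision shrinks a database can be illustrated without ODB at all. The Python sketch below is not odbcompress's algorithm. It uses made-up temperature-like values, stores them once as 8-byte reals and once as 2-byte integers at an assumed 0.01 resolution, and compares the sizes after generic zlib compression. No particular compression factor is implied.

```python
import struct
import zlib

# Synthetic values in kelvin, invented for illustration only.
values = [250.0 + 0.01 * ((i * 37) % 1000) for i in range(10000)]

# Full precision: every value stored as an 8-byte real.
full = struct.pack(f"<{len(values)}d", *values)

# Truncated precision: keep 0.01 resolution relative to 250 K, which fits
# into an unsigned 2-byte integer (0..999 here).
scaled = [round((v - 250.0) * 100) for v in values]
truncated = struct.pack(f"<{len(scaled)}H", *scaled)

# The price of truncation: values are recovered only to within the resolution.
restored = [250.0 + s / 100 for s in scaled]
max_error = max(abs(a - b) for a, b in zip(values, restored))

full_size = len(zlib.compress(full))
truncated_size = len(zlib.compress(truncated))
print(len(full), len(truncated))
print(full_size, truncated_size, max_error)
```

The design trade-off is the one named on the slide: the smaller representation is cheaper to archive and faster to read, at the cost of precision that cannot be recovered later.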
odbdup
• Duplicates database(s) by copying the metadata (low volume) but sharing the actual data (high volume)
• Allows database sharing between multiple users
    • Over a shared (e.g. NFS-mounted) disk
• Enables creation of a time-series database, for example:
    odbdup -i "200601*/ECMA.conv" -o USERDB
• The previous example creates a new database labelled USERDB, which presumably spans all the conventional observations during January 2006
• The eureka moment: the user now has access to a whole month of data as if it were in one single database!
odb2netcdf
• Translates the given ODB query (or a whole ODB table) into a series of NetCDF files, by default one file for each ODB data pool
• Usage:
    odb2netcdf -q 'SELECT …'
• The resulting files can be viewed with standard NetCDF tools like ncdump and ncview
• The files can also be produced in NetCDF packed format (with the caveat of truncated precision)
Also … Some interesting facts
• Written mainly in C
    • Except the Fortran90 interface and the IFS/4DVAR interface
    • Except BUFRODB (by Milan Dragosavac)
• ODB/SQL is currently converted into C code
    • 10 lines of SQL generate >> 100 lines of C code
• A standalone ODB installation (without IFS) is also available
    • Can be built in about 30 minutes on a Linux laptop
• Tested at least on the following machines
    • SGI/Altix, IBM Power3/4, Linux Intel/AMD, VPP, …
• Automatic binary data conversion guarantees database portability between different machines
… and some ODB "limitations"
• ODB software is clearly meant for large-scale computation: given lots of memory, lots of disk space and fast CPUs,
    • A single program can handle up to 2^31 ODB databases
    • A single database can have up to 2^31 data pools
    • A single database can have any number of tables
    • A single table in a data pool can have up to 2^31 rows and (by default) 9999 columns
    • A single ODB/SQL query over active data pools can retrieve up to 2^31 rows in one go
• These really big numbers show that ODB's potential is on parallel computers, but we haven't forgotten desktop PCs!
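A quick piece of arithmetic shows why these limits come with the proviso about memory. The Python lines below assume each retrieved value occupies 8 bytes, as in the real(8) data matrix of the Fortran90 example. That storage size is an assumption about the in-memory view, not about ODB's on-disk format.

```python
MAX_ROWS = 2**31       # row limit quoted on the slide
BYTES_PER_VALUE = 8    # assumed: one real(8) per matrix element

bytes_per_column = MAX_ROWS * BYTES_PER_VALUE
gib_per_column = bytes_per_column / 2**30

print(MAX_ROWS)        # 2147483648
print(gib_per_column)  # 16.0
```

So a single column of a maximal query result would already need 16 GiB of memory in this layout.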
Finally…
• ODB software was developed to allow unprecedented amounts of satellite data through the IFS/4DVAR system
• The software has been operational at ECMWF since June 2000, but is still evolving
    • Emphasis is now on graphical post-processing and on enabling fast access to very large amounts of data
• Other ECMWF member states and co-operating countries are also using ODB or are just becoming users
    • MeteoFrance, DWD, Hungary, Aladin/HIRLAM nations
    • MetOffice is considering it via collaboration with BoM
    • University of Vienna via the ERA40 re-analysis collaboration