Federated Databases for the Geosciences

Download Report

Transcript Federated Databases for the Geosciences

Federated Databases for
the Geosciences
CSIG
July 21, 2005
Douglas S. Greer
Overview
• Database Federation Primer
– Basic concepts and principles
– DB2 Information Integrator
• The CHRONOS Federated Database
– Integration of 7 independently developed
geoscience databases
Top-Level View of a Federated Database
Applications
Federated
Database
Data Source A
Data Source B
Data Source C
Data Source D
Federated DB Data Sources
• Geographically Distributed Data Sources
• Heterogeneous Data Sources
–
–
–
–
Relational Databases – most common
Non-relational Sources
Web Pages / Web Services
Flat Files
Federated Databases
• May or may not actually contain data
• Federated database can create Global Views
that define data in a uniform way across the
data sources
• Applications can then access data through
the global view using the standardized SQL
schema
IBM DB2 Information Integrator
• Provides a framework for strategic information
integration to help applications access, manipulate
and integrate diverse and distributed data sources
across multiple servers in real time.
• Can access structured and unstructured data types
including relational databases such as Oracle,
MySQL, PostgreSQL and MS SQL Server
Connecting to the Remote Database
• Step 1 – Create WRAPPER
– Mechanism that the federated server uses to
communicate with a data source
– Identifies “Driver” code
• Step 2 – Identify SERVER
– Identifies the connection to a data source
– Specifies which WRAPPER to use
– Directly or Indirectly specifies the server name, server
type, version, database name and special parameters
Connecting to the Remote Database
• Step 3 – Specify USER MAPPING
– Maps between a federated database user and an
authorized user (account and password) of a data source
• Step 4 – Define NICKNAMES
– Pointer to a table or view in a data source
– Creates a binding between a local name and the data
source name and hides the associated metadata details
A Simple Federated View
CREATE VIEW <Table_Name> AS
SELECT
(Database #1 SQL Command)
UNION
SELECT
(Database #2 SQL Command)
UNION
SELECT
(Database #3 SQL Command)
Identifying Data Sources
CREATE VIEW <Table_Name> AS
SELECT
‘PALEOSTRAT’ AS db_name
genus_id AS genus
…
FROM PSTRAT.tbl_taxonomy
…
UNION
SELECT
‘PALEOBIOLOGY’ AS db_name
genus_name AS genus
…
Materialized Views
• Federated databases normally do not store data
locally. Data from remote sites is fetched as
needed.
• Materialized Views create a local copy of a
Global-View.
– Advantage: faster access
– Disadvantages: Data may be stale. Refreshes required
• Several of the CHRONOS Global-Views have
versions that use materialized views to increase
performance
CHRONOS Project
• Create a dynamic, interactive and time-calibrated
framework for Earth history
• Network of chronostratigraphy databases
• Online stratigraphic record
• Visualization and analytical tools
• Develop a better understanding of fundamental
Earth processes through time
CHRONOS Federated Databases
• The following databases are all part of the
CHRONOS Federated Database at SDSC based on
IBM’s DB2 Information Integrator
–
–
–
–
–
–
–
Neptune
PaleoStrat
PaleoBiology
Janus
TimeScale
FAUNMAP
MIOMAP
Neptune Database
• Developed at ETH Zürich and currently hosted by
Iowa State University
• Contains microfossil occurrences reported in
DSDP and ODP samples
• PostgreSQL based
• Contains four basic types of data: Fossil Records,
Taxonomy, Age models and Biogeography data
• Schema contains approximately 20 tables with
hundreds of thousands of taxonomic occurrences
PaleoStrat Database
• Developed at Boise State University in
collaboration with the CHRONOS
• Designed to support geoscience tools with broad
applicability
• Contains sedimentary, paleontologic and
stratigraphic data
• MS SQL Server based
• Approximately 120 tables with thousands
taxonomic occurrences
• Data from other databases currently being loaded
PaleoBiology Database
• Hosted by the National Center for Ecological
Analysis and Synthesis (NCEAS) at the University
of California at Santa Barbara
• Contains collection-based occurrence and
taxonomic information about marine and
terrestrial animals and plants
• MySQL based
• 16 tables with hundreds of thousands of
taxonomic occurrences
Janus Database
• Database for the Integrated Ocean Drilling
Program (IODP) hosted at Texas A&M University
• Contains numerous types of ocean drilling data
collected by United States, Japanese and European
ships
• Oracle based
• Approximately 580 tables with millions of
taxonomic occurrences
TimeScale Database
• Contains data and information from the 2004
Global Time Scale of the International commission
on Stratigraphy and 19 other time scales
• Supports web service conversions tools
• PostgreSQL based
• Approximately 25 tables with thousands of data
records
FAUNMAP Database
• Hosted by Illinois State Museum
• Contains information about the historical
distribution of mammal species in the United
States
• MySQL based
• Approximately 30 tables with tens of thousands of
data records
MIOMAP Database
• Hosted by University of California, Berkeley
• Contains comprehensive spatial and temporal
analysis of Miocene mammal taxa for the Western
United States
• MySQL based
• Thousands of records in a relatively small number
of tables
The Taxa Global-View
• Simple View to list taxa in all of the databases
• CHRONOS Taxa
–
–
–
–
–
Database Name
Table_Name
Taxon_ID
Genus
Species
Taxa Global View Example
Conop9 Application
• Developed by Peter M. Sadler, Dept. of Earth
Sciences, Univ. of California Riverside
• Correlates stratigraphic sections by minimizing
the number of inconsistencies in the order of
first and last occurrences of fossils between
sections
• Originally developed for flat files then adapted
to CHRONOS DB2/II global-views
CONOP9 Data Correlation
Conop9 Global View
• Developed for the Conop9 Application
• The Conop9 SDSC global-view provides a much
larger collection of data than that available in the
older flat file system
• The CHRONOS global-view presents exactly the
data needed by Conop9 but uses different SQL
statements for each database – this involves joins
across four tables in Neptune, seven tables in
PaleoStrat and five tables in Janus
Conop9 Global-View Attributes
• CHRONOS Conop Global View Fields
–
–
–
–
–
–
Database Name
Genus
Species
Taxon_id – Used to create Conop9 input tables
Hole_id – Which stratigraphic section does this come from
LAD – Last Appearance Datum, newest observation of this
taxa for this hole
– FAD – First Appearance Datum, oldest observation of this
taxa for this hole
– LAD and FAD are the result of an SQL computation
Conop9 Global View Example
Age-Depth Plot
Age/Depth Plot Global-Views
• Uniform Global-View of hole location for ADP
application
• Surprisingly there are significant differences between
databases
• CHRONOS Hole_Summary
–
–
–
–
Database Name
Hole_ID
Latitude
Longitude
Age/Depth Plot Views
• Uniform Global-View for Hole/Taxa
Description for ADP application
• CHRONOS Hole_Desc
–
–
–
–
–
Database Name
Hole_ID
Elevation
Meters_of_Section
Taxa_Count
Age/Depth Global View Example
Questions ?