
A Cyberinfrastructure Framework for Discovery, Integration, and Analysis of Earth Science Data: A Prototype System
A. K. Sinha*, Z. Malik*, A. Rezgui*, A. Dalton*, K. Lin**
* Virginia Tech
** San Diego Supercomputer Center
Hypothesis Evaluation: Are A-Type Rocks in Virginia related to a Hot Spot Trace?
Spatio-Temporal Distribution of Igneous Rocks
[Figure: map with the labels "Hot Spot Trace?", "Laurentian Crust and Lithosphere", and "Plume Head"]
GEON’s DIA Engine
Evaluating a hypothesis requires:
- Discovery: access to data
- Integration of data: provide data products
- Analysis of data: verify the hypothesis
Data Discovery
Registration of data: a prerequisite for data discovery
- Level 1 registration: keywords
- Level 2 registration: ontologic classes
- Level 3 registration: item detail level
Registration of Data: Key to Discovery, Integration and Analysis
- Level 1: Discovery of data resources (e.g., gravity, geologic maps) requires registration through the use of high-level index terms. GEON has deployed an extension of the AGI Index terms, which will be cross-indexed to others such as GCMD and AGU.
- Level 2: Discovering item-level databases requires registration to data-level ontologies (e.g., bulk rock geochemistry, gravity database).
- Level 3: Item-detail-level registration (e.g., the column in a geochemical database that represents the SiO2 measurement). This level of registration is a requirement for semantic integration.
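The three registration levels can be pictured as progressively richer metadata attached to one resource. The Python sketch below is a hypothetical illustration, not GEON's schema: the `Registration` class, its field names, and the example terms are assumptions made for this example.

```python
from dataclasses import dataclass, field


# Hypothetical registration record; field names are illustrative assumptions.
@dataclass
class Registration:
    resource: str
    keywords: set[str] = field(default_factory=set)                # Level 1: index terms
    ontology_classes: set[str] = field(default_factory=set)        # Level 2: ontologic classes
    column_concepts: dict[str, str] = field(default_factory=dict)  # Level 3: column -> concept

    def level(self) -> int:
        """Return the highest registration level this record reaches (0 = unregistered)."""
        if self.column_concepts:
            return 3
        if self.ontology_classes:
            return 2
        return 1 if self.keywords else 0

    def supports_semantic_integration(self) -> bool:
        # Only item-detail (Level 3) registration tells tools what each column means.
        return self.level() == 3


vt = Registration(
    resource="Virginia Tech igneous rock database",
    keywords={"igneous rocks", "geochemistry"},
    ontology_classes={"RockGeoChemistry"},
    column_concepts={"SiO2": "AnalyticalOxideConcentration"},
)
print(vt.level())  # 3
```

A resource registered only with keywords can be found, but a tool cannot interpret its columns until Level 3 registration is added.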
Level 1 Registration
[Figure: GEON Index Ontology and AGI Index Terms, http://www.geoscienceworld.org/]
Level 2: Registration at the Item Level
[Figure: Ontological Look at Virginia Tech Igneous Rock Database. Classes shown: Methods & References, MapReference, References, AnalyticalMethods, BulkRockGeochemMethods, FeTreatmentMinerals, Structure, BodyShapes, Fractures, Fabric, Location, GeologicLocation, Isotope, Rb_Sr_Isotope_Whole_Rock, Rb_Sr_Isotope_Mineral, Sm_Nd_Isotope_Whole_Rock, Sm_Nd_Isotope_Mineral, U_Th_Pb_Isotope_Whole_Rock, U_Th_Pb_Isotope_Mineral, Rock, Mineral, RockGeoChemistry, MineralChemistry, ModelComposition, Element, Images, Geologic Images]
Level 3 Registration
[Figure: a section from the Planetary Material Ontology, showing the class AnalyticalOxideConcentration with attributes analyticalOxide : AnalyticalOxide, concentration : ValueWithUnit, and errorOfConcentration : ValueWithUnit; multiplicities 1 and 0..n]
GEON's approach of registering data to concepts removes structural (format) and semantic heterogeneity.
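A minimal Python sketch of item-level (Level 3) registration, assuming two hypothetical sources that store SiO2 under different column names and units. The `ValueWithUnit` and `AnalyticalOxideConcentration` classes borrow the concept names from the ontology section, but their fields, the `REGISTRATIONS` table, and the unit factors are illustrative assumptions (errorOfConcentration is omitted for brevity).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueWithUnit:
    value: float
    unit: str


@dataclass(frozen=True)
class AnalyticalOxideConcentration:
    analytical_oxide: str
    concentration: ValueWithUnit


# Hypothetical Level 3 registrations: each source column is mapped to an oxide
# and to the unit that source uses. Column names and units are illustrative.
REGISTRATIONS = {
    "source_a": {"SiO2_wt": ("SiO2", "wt%")},
    "source_b": {"silica_gkg": ("SiO2", "g/kg")},
}

# Factors to the common unit (wt%); 1 wt% = 10 g/kg.
TO_WT_PERCENT = {"wt%": 1.0, "g/kg": 0.1}


def register_row(source: str, row: dict[str, float]) -> list[AnalyticalOxideConcentration]:
    """Translate one source row into ontology instances expressed in wt%."""
    out = []
    for column, value in row.items():
        mapping = REGISTRATIONS[source].get(column)
        if mapping is None:
            continue  # unregistered columns cannot be interpreted, so they are skipped
        oxide, unit = mapping
        out.append(AnalyticalOxideConcentration(
            oxide, ValueWithUnit(value * TO_WT_PERCENT[unit], "wt%")))
    return out


print(register_row("source_a", {"SiO2_wt": 72.5}))
print(register_row("source_b", {"silica_gkg": 725.0}))
```

Both rows end up as the same concept in the same unit, which is the sense in which registration removes format (structural) and meaning (semantic) differences.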
DIA Engine (1): Discovery, Integration and Analysis Engine
How does GEON discover data?
- Search by keywords, resource type, temporal, and spatial criteria
- Invoke the GEON protocol for discovering databases
- Retrieve the discovered data from registered databases
- Emphasize both geospatial and aspatial discoveries (not everything has to be done through a map-based browser)
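The discovery criteria listed above (keywords, resource type, temporal, spatial) can be sketched as a filter over a registry of resources. This is a hypothetical Python illustration, not the GEON discovery protocol; the `Resource` fields, the bounding-box convention, and the numeric (start, end) time range are assumptions.

```python
from dataclasses import dataclass

Box = tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


@dataclass
class Resource:
    name: str
    keywords: set[str]
    resource_type: str               # e.g. "database" or "map"
    time_range: tuple[float, float]  # (start, end) with start <= end
    bbox: Box


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if two bounding boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def discover(registry, keyword=None, resource_type=None, time_range=None, bbox=None):
    """Return resources matching every criterion that is given (None = criterion not used)."""
    hits = []
    for r in registry:
        if keyword is not None and keyword not in r.keywords:
            continue
        if resource_type is not None and r.resource_type != resource_type:
            continue
        if time_range is not None and (
            r.time_range[1] < time_range[0] or time_range[1] < r.time_range[0]
        ):
            continue  # no temporal overlap
        if bbox is not None and not boxes_overlap(r.bbox, bbox):
            continue  # no spatial overlap
        hits.append(r)
    return hits
```

Because each criterion is optional, the same function serves a purely aspatial search (keywords only) and a map-driven search (bounding box only).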
DIA Engine (2)
Geospatial Engine
Aspatial Engine
Geoscience Templates
Geologic Map (USA)
Geologic Map (States)
Geologic Provinces
Terrane Map
Geophysical Map
- Experimental Databases
- Tools
High-Level View of the DIA Engine
[Figure: Raw Data, Query Tool, Data Product, Modeling, Computation]
- The user specifies the class of data for analysis
- The DIA Engine derives and retrieves the different data sets needed for the requested analysis
- The DIA Engine applies processing and filtering techniques to generate the requested data product
- Data products and query steps can be saved
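The retrieve, filter, and save cycle can be sketched as follows. `DIASession` is a hypothetical name; the sketch assumes retrieved records arrive as dictionaries and that saved query steps are stored as JSON so they can be replayed against the data later.

```python
import json
import operator

# Comparison operators a query step may use (an illustrative subset).
OPS = {"==": operator.eq, ">": operator.gt, "<": operator.lt}


class DIASession:
    def __init__(self, records):
        self.records = list(records)  # data retrieved from registered sources
        self.steps = []               # query steps, kept so they can be saved

    def filter(self, field, op, value):
        """Apply one filtering step and record it."""
        self.records = [r for r in self.records if OPS[op](r[field], value)]
        self.steps.append([field, op, value])
        return self

    def save(self) -> str:
        """Serialize the query steps (the data product itself could be saved separately)."""
        return json.dumps(self.steps)

    @classmethod
    def replay(cls, records, saved: str):
        """Rebuild a data product by re-applying saved steps to fresh records."""
        session = cls(records)
        for field, op, value in json.loads(saved):
            session.filter(field, op, value)
        return session
```

Saving the steps rather than only the result lets the same query be rerun when the registered databases change.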
Data products (1)
Data products can be in the form of interactive maps, interactive filtering diagrams, or Excel data files. Examples:
- A map showing the A-Type bodies in the Mid-Atlantic region
- An Excel file giving the ages of those A-Type bodies
- A gravity database table spatially related to the A-Type bodies, saved as a contoured gravity map
Data products (2)
Data products can be:
- Pre-packaged: quickly queried, but not flexible, and they provide little support for complex scientific discovery
- Created dynamically: may require extensive on-the-fly query processing, but enable far richer possibilities for scientific discovery; this requires semantic integration
Data Integration (1)
Semantic integration of data products requires:
- Ontologies: a common language to interpret data from different sources
- Data sharing, which requires data registration
  - Fine-grained (i.e., item-level) registration is necessary to enable the automatic processing (by tools) of shared data
Data Integration (2)
[Figure: data owners register raw data against a Geo-physics Ontology and a Geo-chemistry Ontology, producing ontologically registered data. Panel "Integration within an ontological class": Ontologically Registered Data 1 and 2 (Geo-chemistry), Query Tool QT 1, Data Product DP 1. Panel "Integration across ontological classes": Geo-physics and Geo-chemistry registered data linked through the integration class Location, Query Tool QT 2, Data Product DP 2.]
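A hypothetical Python sketch of the two integration modes, assuming records have already been registered to shared concepts. Within a class, registered sources can be combined directly; across classes, records are joined through a shared integration class, here Location, by pairing geochemical samples with gravity stations within a chosen distance. The record fields and the great-circle (haversine) distance join are assumptions for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a spherical Earth (radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def integrate_within(*sources):
    """Same ontological class: records share concepts, so they can simply be combined."""
    return [rec for src in sources for rec in src]


def integrate_across(samples, stations, max_km):
    """Different classes: pair each sample with every station within max_km (join on Location)."""
    pairs = []
    for s in samples:
        for g in stations:
            if haversine_km(s["lat"], s["lon"], g["lat"], g["lon"]) <= max_km:
                pairs.append((s["id"], g["id"]))
    return pairs
```

The nested loop is deliberately simple; a spatial index (as a spatial database would provide) is the usual choice for large tables.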
Limitations of Current Data Sharing Approaches
- Each research group adopts its own acronyms, notations, conventions, units, etc.
- Data sharing is of limited scope
  - Data discovery is ad hoc
  - Only a small community of scientists may be aware of and share a given data set
- Integration is difficult
  - Extensive conversion efforts may be needed
  - The absence of streamlined integration leads to a poor ability to answer complex scientific questions
Solution: Ontology-based Data Registration
Query Building
- Menu-based (used in the demo)
  - The GUI lets the user select only specific items, which in turn query only a subset of the data
  - The system informs the user of any incorrect input and guides them in the right direction
  - Results are guaranteed, since every query built from the menus can be answered
- Text-based
  - The entire database can be queried
  - Result sets may be empty
  - A small mistake in the query can return incorrect results without the user being able to spot the error
Menu-based Query Building
- In a selected “region of interest”, the user is provided with a number of options (the menu)
- The user clicks through the different menus to build an exact query
  - Click history is maintained to enable future reference
[Figure: Menu # 1 through Menu # 5]
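One way to see why menu-based queries always return results: if each menu offers only values present in the current subset, every click narrows a non-empty data set without emptying it. The `MenuQuery` class below is a hypothetical Python sketch of this idea, including the click history; it is not the demo's code.

```python
class MenuQuery:
    def __init__(self, records):
        self.records = list(records)  # data in the selected region of interest
        self.history = []             # click history, kept for future reference

    def menu(self, field):
        """Options for a field: only values present in the current subset."""
        return sorted({r[field] for r in self.records})

    def click(self, field, value):
        """Narrow the subset; reject anything the menu would not have offered."""
        if value not in self.menu(field):
            raise ValueError(f"{value!r} is not an option for {field!r}")
        self.records = [r for r in self.records if r[field] == value]
        self.history.append((field, value))
        return self
```

Rejecting unoffered values is the sketch's counterpart of the GUI "informing the user of incorrect input".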
Query Tool Selection
- Tools provided by GEON can be used to answer a query, OR
- Other geologic tools can be incorporated (invocation interfaces need to be defined)
  - Example: GCD-Kit can be used for classification, geotectonic, and normative calculations for igneous rocks
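Incorporating outside tools requires a common invocation interface. The sketch below assumes a hypothetical interface (`QueryTool`, with a `name` and a `run(records)` method) expressed as a Python `typing.Protocol`; an adapter shows how an external routine, such as a classification function, could be wrapped. None of these names come from GEON or GCD-Kit.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class QueryTool(Protocol):
    """Hypothetical invocation interface every pluggable tool must satisfy."""
    name: str

    def run(self, records: list[dict]) -> list[dict]: ...


class SilicaFilter:
    """Stand-in for a built-in tool: keep records with SiO2 at or above a cutoff."""
    name = "silica-filter"

    def __init__(self, min_sio2: float):
        self.min_sio2 = min_sio2

    def run(self, records):
        return [r for r in records if r["SiO2"] >= self.min_sio2]


class ExternalToolAdapter:
    """Wraps an external routine behind the same interface."""

    def __init__(self, name, func):
        self.name = name
        self._func = func

    def run(self, records):
        return self._func(records)


TOOLS: dict[str, QueryTool] = {}


def register_tool(tool) -> None:
    # isinstance on a runtime_checkable Protocol checks that the members exist.
    if not isinstance(tool, QueryTool):
        raise TypeError("tool does not implement the invocation interface")
    TOOLS[tool.name] = tool
```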
Analysis
- The data product(s) generated can be analyzed using various techniques:
  - Modeling
  - Computation
Workflow Associated with the Demo
[Figure: a user in a Java/VB Script-enabled Web browser asks "Q: A-Type polygons in a region R using discrimination diagram D?". Components shown: GEON Server; Virginia Tech Server; ESRI ArcGIS Server; Java/VB Script, ASP.net, VB.net, Visual Basic; Discrimination Functions; Geo-Spatial Data Server (ESRI ArcSDE, GeoSpatial Data); US National Gazetteer; Geo-Chemical Data Server 1, Virginia Tech (Mid-Atlantic), Geo-Chemical Data Server 2 (Wyoming), and Geo-Chemical Data Server 3 (Texas), each with GeoChemical Data in MS SQL Server; Web Server (SDSC) with the Rock Classification Ontology. Discrimination diagrams: 10000*Ga/Al vs. Zr; FeO*/MgO vs. Zr+Nb+Ce+Y; Y vs. Nb.]
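The discrimination step in the workflow can be sketched in Python for two of the quantities named in the figure, 10000*Ga/Al and Zr+Nb+Ce+Y, plus a region filter standing in for "region R". Assumptions: Ga, Zr, Nb, Ce, and Y are in ppm and Al is supplied as Al2O3 wt%; the cutoffs (2.6 and 350 ppm) are illustrative defaults, and requiring both is a simplification of reading the diagrams. The demo's filters were coded in Visual Basic; this is not that code.

```python
# Molar masses (g/mol) used to convert Al2O3 wt% to Al ppm.
M_AL = 26.982
M_O = 15.999
AL_FRACTION = 2 * M_AL / (2 * M_AL + 3 * M_O)  # mass fraction of Al in Al2O3


def ga_al_ratio(ga_ppm: float, al2o3_wt: float) -> float:
    """10000*Ga/Al with both elements in ppm."""
    al_ppm = al2o3_wt * 1e4 * AL_FRACTION  # 1 wt% = 10,000 ppm
    return 1e4 * ga_ppm / al_ppm


def zr_nb_ce_y(s: dict) -> float:
    return s["Zr"] + s["Nb"] + s["Ce"] + s["Y"]


def is_a_type(s: dict, ratio_cut: float = 2.6, sum_cut: float = 350.0) -> bool:
    # Assumed rule: both discriminants must exceed their (illustrative) cutoffs.
    return ga_al_ratio(s["Ga"], s["Al2O3"]) > ratio_cut and zr_nb_ce_y(s) > sum_cut


def a_type_in_region(samples, bbox):
    """IDs of samples inside bbox = (min_lon, min_lat, max_lon, max_lat) that classify as A-type."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return [s["id"] for s in samples
            if min_lon <= s["lon"] <= max_lon and min_lat <= s["lat"] <= max_lat
            and is_a_type(s)]
```

Keeping the cutoffs as parameters makes it easy to swap in whichever boundaries a given discrimination diagram D actually uses.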
Used Technologies
User Interface:
- Java / VB Script
- ASP.net
- VB.net
Back-End:
- ESRI ArcGIS Server 9.1
- ESRI ArcSDE 9.1 (spatial database)
- Microsoft SQL Server (geochemical database)
Functionality Coding:
- Visual Basic (to code the discrimination filters)
Demo Starts Here
Current Tool Sharing Approaches
- Each research group develops its own tools
- Tools developed by a research group are rarely used by other groups
  - Redundancy of development efforts
- Little interoperability amongst tools
  - Interaction amongst different tools is often not possible or requires extensive (re)coding
Solution: Wrap tools as Web services accessible to the scientific community worldwide
The Future: Integration through Ontologies and Web Services
Benefits of Web services:
- Facilitate integration
  - Tools developed independently may easily be integrated into new applications
  - Example: discrimination tools may be offered as Web services
- Provide high reusability
  - More tools available to the research community
  - Reduced development time, effort, and cost
Web Services Explained (1)
[Figure: Service Providers 1, 2, and 3 offer Web service Functions 1, 2, and 3. Step 1, Publish: providers publish WSDL service descriptions to a UDDI registry. Step 2, Discover: user applications (Application Provider 1 and 2) find services in the registry. Step 3, Invoke: applications call the services through SOAP messages. WS standards: WSDL (Web Services Description Language), UDDI (Universal Description, Discovery, and Integration), SOAP (Simple Object Access Protocol).]
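The publish, discover, and invoke cycle can be mimicked in memory to show the roles involved. This Python sketch is only an analogy: a real deployment would use WSDL descriptions, a UDDI registry, and SOAP messages rather than the hypothetical `Registry` and `ServiceDescription` classes and direct function calls used here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ServiceDescription:
    """Stand-in for a WSDL description: what the service is and how to call it."""
    name: str
    provider: str
    keywords: set[str]
    endpoint: Callable[..., object]


class Registry:
    """Stand-in for a UDDI registry: a directory of published services."""

    def __init__(self) -> None:
        self._entries: list[ServiceDescription] = []

    def publish(self, desc: ServiceDescription) -> None:  # step 1: Publish
        self._entries.append(desc)

    def discover(self, keyword: str) -> list[ServiceDescription]:  # step 2: Discover
        return [d for d in self._entries if keyword in d.keywords]


def invoke(desc: ServiceDescription, *args):  # step 3: Invoke (SOAP in a real system)
    return desc.endpoint(*args)


registry = Registry()
registry.publish(ServiceDescription(
    "hfse-sum", "Service Provider 1", {"discrimination"},
    lambda zr, nb, ce, y: zr + nb + ce + y))
[svc] = registry.discover("discrimination")
print(invoke(svc, 400, 40, 120, 60))  # 620
```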
Web Services Explained (2)
- WSDL (the service provider describes the service using WSDL)
  - An XML-based language to describe the capabilities of Web services
  - The capabilities of a WS are described as a set of end points that can exchange messages
- UDDI (the service provider publishes the service using UDDI)
  - UDDI registry entries refer to the services' WSDL descriptions
  - A Web-based directory where service providers may list their services and where service consumers may retrieve the services published by the providers (like the yellow pages)
- SOAP (clients and services communicate using SOAP)
  - An XML-based protocol used to encode the messages (requests and responses) exchanged between a Web service and its clients
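To make the SOAP description concrete, the sketch below builds and parses a SOAP 1.1 request envelope with Python's standard `xml.etree.ElementTree`. The envelope namespace is the SOAP 1.1 one; the service namespace `urn:example:discrimination` and the `ClassifySample` operation are hypothetical, and in practice their names would come from the service's WSDL description.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC_NS = "urn:example:discrimination"                   # hypothetical service namespace

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("d", SVC_NS)


def build_request(operation: str, params: dict) -> str:
    """Encode an operation call as a SOAP Envelope/Body message."""
    env = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{SVC_NS}}}{operation}")
    for key, value in params.items():
        ET.SubElement(op, f"{{{SVC_NS}}}{key}").text = str(value)
    return ET.tostring(env, encoding="unicode")


def parse_request(xml_text: str):
    """Service side: recover the operation name and its (string) parameters."""
    env = ET.fromstring(xml_text)
    op = env.find(f"{{{SOAP_ENV}}}Body")[0]
    name = op.tag.split("}", 1)[1]
    return name, {child.tag.split("}", 1)[1]: child.text for child in op}


msg = build_request("ClassifySample", {"Ga": 25, "Al2O3": 12.0})
print(msg)
print(parse_request(msg))
```

Parameters travel as text, so the service must convert them back to numbers; the types a real service expects would be declared in its WSDL.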
[Figure: summary of the DIA workflow. Discovery: geospatial query and aspatial query. Integration within the same ontologic class: VA., WY., and TX. ontologically registered geochemical data, used for A-Type identification. Integration between different ontologic classes: geochemical, geophysics, and geologic time data. Both paths yield data products, which feed Analysis.]
Hypothesis Evaluation: Are A-Type Rocks in Virginia related to a Hot Spot Trace?