Grid-based Database Integration in
AIST
Isao KOJIMA
Said Mirza Pahlevi
Data Intensive Computing Team
GTRC,AIST
{kojima,mirza}@ni.aist.go.jp
National Institute of Advanced Industrial Science and Technology
Overview
Background for Database Integration

Distributed and Heterogeneous
Target


Database Discovery
Multi level application specific View
(under Autonomous /Dynamic environment)
Approach

Bottom-up
We have just started; current results are still limited

GT3.0/OGSA-DAI based tools
Bring external web databases into Grid environment
Query conversion service to integrate different XML schema
Demo
Background
A.I.S.T.=National Research Institute
Research Information Databases

Online on the Web
Bio/Life science
Geo/Earth science
Chemical/Material
Patent/Bibliographic
AIST & Tsukuba Area

In AIST
Nearly 100 online DBs (URLs)
Tsukuba Science City


96 research institutes
52 public/governmental research labs.
> 880 URLs
The number is not very large,
but the problem is the same
(heterogeneous, distributed)
Current Status
Each database is separate/distributed

Can share some information
Chemical Structure, CAS Registry No,
Latitude, Longitude
Metadata structure (Dublin Core, GILS, MARC, ...)

Integration/Interconnection is useful
New research aspects/views
Multi Integration View(Organization, Area, Research Domain)
Most of them support form-based queries only
Limitation of Web interfaces

Need to combine with computing
Data Mining
Distributed Computation
Target
Integrate existing database/computing resources
within Grid framework
OGSA(OGSI),OGSA-DAI(S) framework
Provide Database Discovery function


Advanced Information Service
Autonomous Resource Management
Provide Application Specific Database View

Schema Integration, Virtualization, Ontology
Database Autonomy/Dynamism
Bottom-Up Approach for Research & Deployment
[Figure: layered diagram of the practical bottom-up approach. Labels: existing external web databases (EC sites, search portals, etc.); Web-Service (WS-XX), GGF-DAIS, OGSA-DAI; Remote Access; Distributed Query Processing; Transaction; Workflow; our application field (scientific data). Our target: Advanced Database Discovery, Application Specific View, Database Autonomy.]
Approach & Result: Summary
1: Grid Proxy/Mediator database service to access web databases
Problem: how to access external web databases by using the OGSA/DAIS framework
Approach: bring web databases into the OGSA environment; integrate within the OGSA/DAIS framework
Result: provides uniform OGSA-DAI SQL access to external web databases (including join)
2: Query Conversion Service based on XQuery and XML schema
Problem: how to integrate (bottom-up) different database schemas
Result: provides an integrated view of multiple databases with different XML schemas
Result: CrossSearch over multiple databases (not join)
Grid-based Integration Overview
[Figure: layered diagram. Labels: existing external web databases (EC sites, search portals, etc.); Remote Access (GGF-DAIS, OGSA-DAI); Distributed Query Processing; Transaction; Workflow; our application field (scientific data); our target. Two services are marked: a Proxy/Mediator Grid Database Service to access existing external web databases, and a Query Conversion Grid Service to make CrossSearch. Arrows are labeled "apply" and "Feedback".]
1: Grid Proxy/Mediator database service for external web databases
[Figure: in the OGSA environment, a Mediator offers OGSA-DAI compliant database access in SQL, including joins, over local/remote databases and proxy databases. Wrappers connect the proxy databases to the web databases DBLP, CiteSeer, and Delphion (login).]
Architecture
[Figure: OGSA-DAI based system inside an OGSA-based Grid environment, connected over the Internet to data services outside the Grid (e-Commerce sites, search portals, web databases). Components: a Grid Database Service that receives SQL and invokes the Mediator; the Mediator; a Wrapper (load & exec) that sends a site-specific query and receives XML/HTML; a Proxy Database holding proxy relations, queried with SQL; a Resource Management Grid Database Service with management relations, accessed with SQL; a Resource DB (wrappers, URLs, data formats).]
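The flow in this architecture can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the AIST implementation: the `RESOURCES` dictionary stands in for the Resource DB, the two wrapper functions return hard-coded rows instead of parsing real HTML/XML, the URLs are placeholders, and SQLite plays the role of the proxy database. None of these names come from the slides.

```python
import sqlite3

def wrap_bibliography():
    """Hypothetical wrapper for a bibliography site: returns (title, venue) rows."""
    return [("Paper A", "VLDB"), ("Paper B", "ICDE")]

def wrap_citations():
    """Hypothetical wrapper for a citation site: returns (title, citations) rows."""
    return [("Paper A", 12), ("Paper C", 3)]

# Stand-in for the Resource DB (wrappers, URLs, data formats); all values are assumptions.
RESOURCES = {
    "bib": {"url": "http://example.org/bib", "format": "HTML",
            "columns": "title TEXT, venue TEXT", "wrapper": wrap_bibliography},
    "cite": {"url": "http://example.org/cite", "format": "XML",
             "columns": "title TEXT, citations INTEGER", "wrapper": wrap_citations},
}

def mediate(sql, relations):
    """Load each named web source into a proxy relation, then run the SQL on the proxies."""
    conn = sqlite3.connect(":memory:")
    for name in relations:
        res = RESOURCES[name]
        # Relation names come from the trusted registry above, not from user input.
        conn.execute(f"CREATE TABLE {name} ({res['columns']})")
        rows = res["wrapper"]()  # "load & exec" of the site-specific wrapper
        if rows:
            marks = ", ".join("?" * len(rows[0]))
            conn.executemany(f"INSERT INTO {name} VALUES ({marks})", rows)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result

joined = mediate(
    "SELECT bib.title, venue, citations FROM bib JOIN cite ON bib.title = cite.title",
    ["bib", "cite"],
)
```

Because both sources end up as ordinary relations, a join across two web sites becomes a plain SQL join on the proxy database.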
Features
Globus3.0/OGSA-DAI based.

Compatible with OGSA-DAI RDB version.
Platform independent (DBMS,Wrapper)

Combined with wrapper generator tools (XFetch, WebL, ...)
Management functions are Grid services
Wrapper registration/deployment, external database definition
Proxy Relation within the Grid

Works as a cache for the external web database
An SQL condition is converted to a web database query statement
The external database is handled as a table, not as an SQL function (although that is also possible)
Simple query optimization utilizing proxy relations
An approximate query goes to the web database; exact processing is done on the proxy.
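The last two points can be illustrated with a minimal sketch: an SQL condition becomes a form-style web query, the coarse result is cached in a proxy relation, and the exact condition is evaluated there. Everything named here is an assumption made for illustration: the `example.org` URL, the `q` parameter, the sample rows, and the use of SQLite as the proxy database.

```python
import sqlite3
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical rows standing in for an external web database.
_SITE_ROWS = [
    ("Grid database access", "Kojima", 2003),
    ("Grid scheduling", "Tanaka", 1999),
    ("Web wrappers", "Sato", 2002),
]

def to_web_query(keyword):
    """Convert the SQL condition title LIKE '%keyword%' into a form-style URL."""
    return "http://example.org/search?" + urlencode({"q": keyword})

def wrapper_fetch(url):
    """Stand-in for a site-specific wrapper: the site supports keyword search only."""
    keyword = parse_qs(urlparse(url).query)["q"][0].lower()
    return [row for row in _SITE_ROWS if keyword in row[0].lower()]

def query_via_proxy(keyword, min_year):
    """Cache the approximate web result in a proxy relation, then filter exactly in SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE proxy_papers (title TEXT, author TEXT, year INTEGER)")
    conn.executemany("INSERT INTO proxy_papers VALUES (?, ?, ?)",
                     wrapper_fetch(to_web_query(keyword)))
    # The year condition cannot be expressed in the web form, so it runs on the proxy.
    rows = conn.execute(
        "SELECT title, author FROM proxy_papers WHERE year >= ? ORDER BY title",
        (min_year,),
    ).fetchall()
    conn.close()
    return rows
```

For example, `query_via_proxy("grid", 2000)` sends only the keyword to the (simulated) web site and applies the year restriction on the proxy relation.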
2:Query/Schema Conversion Service
Provides a schema/query conversion function between different
XML schemas

User can define multiple resources with XML schema

Databases for DB resources

User can define relationship/conversion between schema
(simple kind of Ontology)

XQuery-to-XQuery conversion service based on this information
[Figure: an XQuery enters the Conversion Service, which emits converted XQueries to the database resources. An XML Schema Management Service maintains the resource databases used by the conversion.]
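As a rough illustration of the idea, the conversion can be pictured as rewriting element names in a query according to user-defined relationships between schemas. The sketch below is not the service described on the slide: the mapping tables, the target names `gils` and `dc_geo`, and the element names are hypothetical, and the rewriting is a simple regular-expression substitution on path steps rather than a real XQuery parser.

```python
import re

# Hypothetical element-name relationships from a source schema to each target schema.
MAPPINGS = {
    "gils": {"record": "Rec", "title": "Title", "creator": "Originator"},
    "dc_geo": {"record": "item"},
}

# A path step is a name that directly follows "/" (this also covers "//").
# Caveat of this simplification: names after "/" inside string literals are rewritten too.
_STEP = re.compile(r"(?<=/)[A-Za-z_][\w.-]*")

def convert_query(xquery, target):
    """Rewrite every path step that has a mapping for the target schema."""
    mapping = MAPPINGS[target]
    return _STEP.sub(lambda m: mapping.get(m.group(0), m.group(0)), xquery)

def convert_for_all(xquery):
    """Produce one converted query per registered target schema."""
    return {target: convert_query(xquery, target) for target in MAPPINGS}

source_query = 'for $r in //record where contains($r/title, "map") return $r/creator'
converted = convert_for_all(source_query)
```

Names without a mapping are left untouched, so a target schema that shares most element names with the source needs only a small mapping table.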
Application Prototype
Geographic Metadata Query System

Multiple XML databases
Dublin Core
Dublin Core + Longitude/Latitude
GILS
[Figure: an application-specific XQuery enters the Distributed Metadata Query System, which sends converted XQueries to the DC, DC+, GILS, and JMP databases.]
Current version is not OGSI-based (in development)
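A CrossSearch of this kind (a union of per-source results, not a join) might look like the sketch below. It is an assumption-laden illustration: the XML fragments and element names are invented sample data, and Python's standard `xml.etree.ElementTree` is used in place of XQuery, which the standard library does not provide.

```python
import xml.etree.ElementTree as ET

# Hypothetical sources: (XML text, record element, title element) per schema.
SOURCES = {
    "DC": ("<records><record><title>Tsukuba geological map</title></record>"
           "<record><title>Patent index</title></record></records>",
           "record", "title"),
    "GILS": ("<Records><Rec><Title>Geological survey of Japan</Title></Rec></Records>",
             "Rec", "Title"),
}

def cross_search(keyword):
    """Run the same keyword search on every source and merge the hits (no join)."""
    hits = []
    for name, (xml_text, record_tag, title_tag) in SOURCES.items():
        root = ET.fromstring(xml_text)
        for rec in root.iter(record_tag):
            # Each source is read through its own element names.
            title = rec.findtext(title_tag, default="")
            if keyword.lower() in title.lower():
                hits.append((name, title))
    return hits
```

Each hit carries the name of the source it came from, so the application can present one merged list while the databases stay independent.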
Summary & Directions
2 prototype services


OGSA-DAI compliant grid service to bring web databases
into the grid
Lesson: dynamic scheduling is needed to cope with the uncertainty of
external web databases
Schema/Query conversion service to handle multiple XML
schemas
Lesson: a concise set is needed to handle ontology
Directions


Advanced Grid Database Discovery Service
Active/Autonomous Functions for DBMS