Transaction-based Grid Data Replication
Using OGSA-DAI
Presented by Yin Chen
February 2007
What is replication?
• Initial copying of data & synchronization of updates
• Is not caching
– Client phenomenon
– Only for improving response time
• Is not a backup
– Not automatically overwritten when the original data is modified
– Normally cannot be accessed directly
Why do we need it?
• Data consolidation (central audit & analysis)
• Data distribution (for branch labs)
• Performance
– Access efficiency (moving data near processing)
– Load balancing (distributing access load)
– Security (data protection)
– Availability (off-line access)
– Reliability (disaster recovery, avoiding single point of failure)
Challenges of Grid database replication
• How to copy large data sets among heterogeneous DBs
• How to maintain the consistency of data in a highly distributed network environment
• How to detect & self-repair failed parts
Problems of existing technologies
• Existing Grid “replication” systems
– E.g. the EDG replica manager / the Globus data replication service / SRB
– Support copying of large datasets
– Yet deal only with files
– Too simple (e.g. no support for updates, database replication, etc.)
– Do not keep replicas consistent
• Relational database replication tools
– E.g. Oracle / Sybase / DB2 / MySQL replication
– Very flexible (e.g. partial copy, bi-directional update)
– Yet not suited to virtual organizations (e.g. cannot copy large data sets, replicas are difficult to search for)
Architecture
[Figure: architecture diagram. Components: Replication Control Service, Transfer Service, Metadata Catalogue, Relational Database Replication Mechanism, Data Resource, Data Replica. Arrows indicate data flow directions.]
Replication control workflow
[Figure: replication control workflow diagram. Labels: Request; Replication Control Service; Metadata Search Engine; Initiator; Selector; Metadata Register; Transfer Service Starter; Relational Database Replication Mechanism; Data Resource; Metadata Catalogue; Replication Target.]
OGSA-DAI activities (ongoing)
• High-level APIs to interact with relational replication mechanisms:
– CreateReplicaDatabase()
– DropReplicaDatabase()
– ConfigReplication()
– CleanUp() -- to clean up replication configuration
– StartReplication()
– StopReplication()
– MonitorReplication() -- to check the status of each process
• Control the workflow of data replication, e.g.
sequence.addChild(createDB2ReplicaDB);
sequence.addChild(configDB2Replication);
sequence.addChild(startDB2Replication);
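The sequencing idea above can be sketched in plain Java. This is a minimal illustration only: Activity, NamedStep and Sequence are hypothetical stand-ins written for this example and are not the OGSA-DAI client API; the step names are borrowed from the activity list above. The sketch assumes a sequence runs its children in the order they were added and stops at the first failure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT OGSA-DAI classes.
interface Activity {
    String name();
    boolean run(); // returns false if the step fails
}

// A step that only reports a fixed outcome, so the sketch needs no database.
class NamedStep implements Activity {
    private final String name;
    private final boolean succeeds;

    NamedStep(String name, boolean succeeds) {
        this.name = name;
        this.succeeds = succeeds;
    }

    public String name() { return name; }
    public boolean run() { return succeeds; }
}

class Sequence {
    private final List<Activity> children = new ArrayList<>();

    void addChild(Activity activity) {
        children.add(activity);
    }

    // Runs children in insertion order and stops at the first failure.
    // Returns the names of the steps that completed.
    List<String> execute() {
        List<String> completed = new ArrayList<>();
        for (Activity activity : children) {
            if (!activity.run()) {
                break;
            }
            completed.add(activity.name());
        }
        return completed;
    }
}

class WorkflowSketch {
    public static void main(String[] args) {
        Sequence sequence = new Sequence();
        sequence.addChild(new NamedStep("CreateReplicaDatabase", true));
        sequence.addChild(new NamedStep("ConfigReplication", true));
        sequence.addChild(new NamedStep("StartReplication", true));
        System.out.println(sequence.execute());
    }
}
```

Stopping at the first failed step is a design assumption of this sketch: starting replication only makes sense once the replica database exists and the configuration step has succeeded.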
IBM DB2 SQL Replication
• Admin: create replication criteria → control table
• Capture: use log/trigger to capture the changes → temp table
• Apply: scheduled apply of accumulated transactions → target DB
• Alert Monitor: monitor and notify users
• Supports: after-image copy / before-image copy (can roll back)
• Allows subset / simple view / complex joins & unions copy
• Asynchronous replication, allows specifying a schedule
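The capture/apply split above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not DB2 code: ChangeRow and CaptureApplySketch are hypothetical names, the maps stand in for the source and target tables, and the staging list plays the role of the temp table. It shows why the target lags the source until the next apply, and how stored before-images allow an applied batch to be rolled back.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One captured change. Hypothetical type; it does not mirror DB2's table layout.
class ChangeRow {
    final String key;
    final String before; // before-image (null if the row did not exist)
    final String after;  // after-image (null if the row was deleted)

    ChangeRow(String key, String before, String after) {
        this.key = key;
        this.before = before;
        this.after = after;
    }
}

class CaptureApplySketch {
    final Map<String, String> source = new HashMap<>();
    final Map<String, String> target = new HashMap<>();
    final List<ChangeRow> staging = new ArrayList<>(); // stands in for the temp table

    // Capture: every source update is also recorded as a change row.
    void update(String key, String value) {
        String before = source.put(key, value); // previous value, or null
        staging.add(new ChangeRow(key, before, value));
    }

    // Apply: replay the accumulated changes on the target as one batch.
    // In a real system this would run on a schedule; here it is called by hand.
    List<ChangeRow> applyPending() {
        List<ChangeRow> batch = new ArrayList<>(staging);
        staging.clear();
        for (ChangeRow c : batch) {
            if (c.after == null) {
                target.remove(c.key);
            } else {
                target.put(c.key, c.after);
            }
        }
        return batch;
    }

    // Rollback: undo an applied batch in reverse order using before-images.
    void rollback(List<ChangeRow> batch) {
        for (int i = batch.size() - 1; i >= 0; i--) {
            ChangeRow c = batch.get(i);
            if (c.before == null) {
                target.remove(c.key);
            } else {
                target.put(c.key, c.before);
            }
        }
    }

    public static void main(String[] args) {
        CaptureApplySketch r = new CaptureApplySketch();
        r.update("sample-1", "v1");
        r.update("sample-1", "v2");
        System.out.println("before apply: " + r.target);
        r.applyPending();
        System.out.println("after apply:  " + r.target);
    }
}
```

Because changes accumulate in the staging list between applies, the source never waits for the target; that is the asynchronous behaviour listed above, at the cost of the target being temporarily out of date.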
Features
• Combine relational database replication with Grid technologies to gain the benefits of both
– Keep the features of relational database replication
– Support more scalable, secure, high-performance data access
• Explore the ability of OGSA-DAI to control workflows
Information
• Project members:
– Dave Berry (NeSC, UK)
– Patrick Dantressangle (IBM, Hursley)
– Yin Chen (NeSC, UK)
– Simon Laws (IBM, Hursley)
• Project website:
http://www.aiai.ed.ac.uk/~ychen/ibm_ogsadai/ibm-ogsadai-index.html