DB2 Information Integrator V8.1 Beta Training
Download
Report
Transcript DB2 Information Integrator V8.1 Beta Training
®
IBM Software Group
Database Replication and Change Propagation
Technologies for Continuous Availability
WMO TECO-WIS Convention
Seoul, November 8th, 2006
Serge Bourbonnais
Database Replication
Silicon Valley laboratory
IBM Software Group | DB2 Information Management Software
Abstract - IBM Database Replication Technologies
• Database replication technologies allow an IT infrastructure to achieve continuous
availability of the enterprise operations, by providing solutions for Disaster
Recovery, Workload Isolation, and Information Integration.
• When used for Disaster Recovery, each replicated database can be fully active,
and copies do not need to be identical. Some trade-offs include administrative
costs, and the overhead of capturing and applying changes.
• For Workload Isolation, the replication process can manage conflicts that may arise
from the application workload, database constraints, loading a target while changes
are still occurring at the source, or changes arriving out of order in a multi-node
configuration. Conflict resolution either relies on timestamp and the origin of each
change, or on a designated master. Configurations for data distribution and
consolidation to/from hundreds of databases can also be deployed.
• For Information Integration, the replication process deals with heterogeneous data
schemas, data stores, or even data models.
IBM Database replication technologies can capture and propagate changes with
low-latency at high-throughput over long distances, while preserving database
transactional integrity, and tolerating system outages or intermittent connectivity.
IBM Software Group | DB2 Information Management Software
Agenda
Database Replication Technologies for continuous availability
In support of the Global Enterprise: From Continuous Availability to
Business Integration
Where Database Replication fits
Where replication does not fit
IBM Product Architecture and Capabilities
Capture, Apply, Federation, and Transforms
Topologies, Conflict Detection and Resolution
Sample Implementations
IBM Software Group | DB2 Information Management Software
Why Replication in an Information System?
1. Disaster Recovery
Requirements
Goal: High-Availability
Applications: Standby copy for failover, Scheduled and Unscheduled
outages
Requirements: Minimize recovery time and eliminate or reduce data loss.
Preserve transactional consistency.
Requirements
Less
2. Workload Isolation
Goal: High-Availability, Improve Performance
Applications: Data Distribution/Consolidation, Regional Data Centers,
Caches
Requirements: Maintain live copies or subsets for working in disconnected
mode, often geographically distributed. Need to detect and resolve
conflicts, if any. Data mappings and transformations.
3. Information Integration
Goal: High-Availability, Improve Performance, Global Enterprise View
Applications: Analytics, Enterprise Business Integration
Requirements: Moving data to/from heterogeneous data stores. Cleansing
and transformations. Assembling objects with data from several sources.
More
IBM Software Group | DB2 Information Management Software
From Disaster Recovery to Information Integration:
Replication needs are cumulating - Semantics are increasing.
Requirements
and Scenarios
Maintain a full
database copy for
Disaster Recovery
Database
Maintain a logical
database subset for
Disaster Recovery,
Workload Isolation
Relational tables
Business Objects
<order><oid>197</oid>
<pid>AS207</pid>
<desc>Wheel</desc>
<qty>1</qty>
</order>
Propagated
Objects
Technologies
and Products
Publish changes
(with transformations) for
Disaster Recovery,
Workload Isolation,
Data Integration
Log Shipping
•HADR (LUW)
Disk Mirroring
•GDPS
•PPRC
Logical Replication
•Data Propagator
•Q Replication
Event Publishing
•Q Event Publish
•II Federation
Integration Software
•DataStage
Database Replication Application Space
IBM Software Group | DB2 Information Management Software
Application Space for Database Replication Technologies
Database Replication is a good fit when:
Asynchronous Capture and Delivery
Outages. Network, servers, a site, the RDBMS.
Occasionally connected
Non-identical sources and targets
Different platforms. OS, RDBMS, even data models
Different shapes. Sub-setting required
Row-level transformations., Codepages, Schemas
Update-Anywhere with possible conflicts
Only possible with replication
When some data loss is tolerable in case of a major disaster.
Often, solution can be designed to limit loss to a few seconds.
Fast delivery over large distances (1000s km)
Several 10 000s rows/second achievable (up to 100 000row/sec)
Avoid or minimize full-refresh of data at the target
Other factors:
Minimize down-time, administrative cost, application performance impact
Replication Technologies guarantee Transactional Consistency with Resilience
IBM Software Group | DB2 Information Management Software
Limits of Database Replication technologies
Non-zero Data Loss required in case of Disaster (fire, flood)
Use Synchronous technologies instead, i.e., HADR, PPRC
Set-level .transformations are required on the data
Use ETL software instead
However, replication can be used to feed a staging area for ETL tools.
Replication can hide the differences between the target and the source
(database schema, data model, codepage, hardware architecture) and
provide a continuous, asynchronous feed.
Business Objects need to be assembled
Develop applications in the application layer
Other factors
Cost-Benefit analysis of the solutions, given the requirements
IBM Software Group | DB2 Information Management Software
IBM SQL Replication
Staging is in relational tables
Control, and Monitoring information also in relational tables
Transport is over a database connection
Source server
Target servers
DB2
Staging Tables
DB2
Apply
DB2
database
recovery
log
Apply
SQL
Capture
DB2
Apply
Non-DB2
Control tables
Triggers
•Informix
•Oracle
•Sybase
•SQL Server
Apply
DB2
Information
Integrator
Non-DB2
•z/OS
•iSeries
•UDB LUW
•Informix
•Oracle
•Sybase
•SQL Server
•Teradata
IBM Software Group | DB2 Information Management Software
A parenthesis: Database Federation
Remote objects (structured files, tables, spreadsheets) appear to the
application as if local tables in a DB2 database
Local and non-local data can be manipulated in the same SQL statement
CREATE NICKNAME ORAT3 FOR
ORACLE9.SCOTT.T3
INSERT INTO ORAT3 VALUES(5)
SELECT * FROM ORAT3
T1 is a Table;
ORAT3 a
Nickname
Nicknames appear as local tables. For example:
> db2 list tables
Table/View
------------------------------T1
ORAT3
CUSTOMERS
Schema
--------------BOURBON
BOURBON
BOURBON
Type
----T
N
T
IBM Software Group | DB2 Information Management Software
IBM Q Replication
Staging and Transport over MQSeries persistent message queues
High-throughput, Low-latency. Apply with parallel agents
Source server
Target servers
WebSphere Queue
Manager (or client)
DB2
DB2
Apply
Apply
Admin queue
DB2
database
recovery
log
Capture
Apply
Apply
Send queues
•z/OS
•UDB LUW
Control Tables
Non-DB2
•z/OS
•UDB LUW
•VM/VSE
Control tables
Restart queue
Apply
Apply
Apply
DB2
Information
Integrator
Control Tables
•Informix
•Oracle
•Sybase
•SQL Server
•Teradata
IBM Software Group | DB2 Information Management Software
Performance
Q Replication is between 3 to 10 times faster than SQL
replication
Higher throughput and shorter latency
Capture measured throughput: 49000rows/second (V9.1)
Latency less than 2 seconds achievable over 1000s of
kilometers
Measured Time to clear up receive queues after an outage
1,000,000 rows accumulated in target receive queue
Continuous arrival rate: 5,000 rows per second
Time to re-sync target database = 91 seconds
(1) Turbo Freeway (2064-216) 2 LPARs 4CP for the source system and 4cp for the target system.
IBM Software Group | DB2 Information Management Software
Q Replication subscriptions - defining target copies
Projection over columns and rows of a table:
Only changes for subscribed tables are sent
Some transactions can be ignored (e.g., by owner ID, trans ID, with signal
or command)
Some operations can be ignored (e.g., delete)
Filter rows with a predicate (e.g., WHERE :LOCATION ='EAST' AND :SALES >
(SELECT SUM(expense) FROM STORES WHERE stores.deptno = :DEPTNO)
Database Schema mappings examples:
1 column to N columns, e.g., [ :C1 || :C2]
N columns 1 columns, e.g., [substr(:C2,2,3)]
Generated columns, e.g., [CURRENT TIMESTAMP]
Capture side:
Apply side
ORDERS
oid
price
Replication handles codepage conversion, architecture difference
IBMORDERS
ibmID
price
ts
IBM Software Group | DB2 Information Management Software
Q Replication Subscription Types
Unidirectional
Changes are replicated in one direction
1:N – N:1 topologies – Distribution and Consolidation
Source
Changes can be filtered and transformed
Target(s)
Bidirectional – master/slave
Changes replicated in both directions
Conflicts detected on data values:
•
Primary Secondary/backup
Conflict rules: Check key, changed only, or all
columns
One server designated as winner
•
Conflict action: Force, ignore, merge change
Tree topologies only
Minimum overhead
Peer to peer – no master, use timestamps
Conflicts resolved by using most recent version, no
master copy - Handles out of order arrivals (e.g., delete
before insert)
Requires extra columns and triggers
IBM Software Group | DB2 Information Management Software
Data Distribution from a (CCD) staging area
SQL Apply
Q Capture
Q Apply
Target Table
Q Replication
SQL Connections
Read/write
Source Table
MQSeries
SQL Apply
Read-only
CCD Table
…
Target Table
SQL Apply
…
Target Table
IBM Software Group | DB2 Information Management Software
Consistent Changed Data (CCD) Apply targets
Usages:
AUDIT trail of database changes.
•
Answer: Who changed what, when, and how?
Staging table for data distribution (with SQL Apply)
PARTS_CCD
COMMITSEQ AUTHID
1
USER_A
2
USER_B
3
USER_A
OPERATION LOGMARKER XPARTNO PARTNO XPRICE PRICE
current
U
A7571
A7571
4.31
timestamp
5.03
current
I
null
A7981
null 121.03
timestamp
current
D
null
A7981
null
null
timestamp
•For updates, before values can be optionally present in the CCD (e.g., XPARTNO)
•Condensed CCD: Contains only the latest changed value of each row
•Complete CCD: Initially created with values for all rows from the source table.
IBM Software Group | DB2 Information Management Software
Event Publishing
Function
Capture changed data in real time
Correlate by transactions within a single
database
Output: XML or CSV
Usage
Building the Data Warehouse
Business Integration
Auditing requirements
DB2 z/OS and LUW
IMS
VSAM
CA IDMS
DataStage
Log-based
capture
Capture
Software AG
Adabas
WebSphere
Business Integration
WebSphere MQ
WebSphere MQ
Integrator Broker
Target DBs
User Application
JMS-aware
Application
IBM Software Group
Mazda
Challenge
Support 700 dealers in USA
Trouble matching customer demand with
available inventory
More current data needed to track sales
achievements with period-end goals
Solution
Sales and inventory information is replicated
every minute to portal server
Improved access to current data without
changes to existing IT infrastructure
“Within 5 weeks of receiving the [WebSphere]
Information Integrator product we were able
to implement it in our … environments. It
now provides us up to the minute sales
activity.”
Joe Neria, Software
Consultant. Mazda
Business benefits
Increased auto sales
Improved dealer satisfaction
Currency of information improved by
93%
Technology benefits
Re-used existing application and data
base infrastructure
Decreased network load compared to
full data refreshes 4 times an hour
Ease and speed of deployment
17
IBM Software Group
International provider of financial & investment services
Challenge
Corporate initiative to provide
customers better performing real-time
queries by utilizing multiple sites.
Replication of critical order processing
details for core business functionality
Solution
Q Replication for high speed movement
of up to 10 Million transactions to
secondary site several thousand miles
away. Current implementation is UniDirectional with peer-to-peer plans.
Business benefits
Replicating 5-10 Million
transactions with less than 2
seconds latency.
More efficient and cost-effective
resource utilization
Secondary platform services
reporting and business intelligence
queries and acts as backup to
primary
Technology benefits
Real-time back up of secondary
system provides results in
increased capacity for peak
workloads.
18
IBM Software Group
CitiStreet
Challenge
Support single sign-on access through both
Web and IVR applications ensuring 24x7
portal access for plan participants and
sponsors
Solution
Support redundant, active single sign-on
applications for failover processing replicating
profile changes between them in real time.
“Since
nearly 10 million of CitiStreet
customers are offered 24-hour access to
their retirement accounts, the company can't
afford downtime and must be able to
replicate data changes when they happen.
We fully replicate our database over
redundancy data lines, so to us the stability
and speed of that asynchronous replication
is strategic for us."
Barry Strasnick , CIO
CitiStreet
Overview
• CitiStreet is one of the largest and
most experienced global benefits
providers servicing over 9 million plan
participants across all markets.
CitiStreet was formed in partnership
between subsidiaries of State Street
Corporation and Citigroup
Business benefits
Ensure application availability for plan
participants and sponsors
The new solutions from IBM will
improve data integrity with a reduced
level of maintenance
Technology benefits
Maintain bi-directional synchronization
of profile updates (approx 175,000
updates daily) in real time
19
IBM Software Group | DB2 Information Management Software
Summary
IBM develops Data Propagation technologies to provide Continuous
Availability and achieve a Global Integrated view of the enterprise in an
heterogeneous environment
Q Replication (IBM WebSphere Replication Server) delivers low latency,
high throughput, and resilience. It is best-of-breed for heavy OLTP
workloads, providing resilience and preserving transactional integrity
throughout outages while minimizing the need for full data refreshes.