DB2 Information Integrator V8.1 Beta Training

Download Report

Transcript DB2 Information Integrator V8.1 Beta Training

®
IBM Software Group
Database Replication and Change Propagation
Technologies for Continuous Availability
WMO TECO-WIS Convention
Seoul, November 8th, 2006
Serge Bourbonnais
Database Replication
Silicon Valley laboratory
IBM Software Group | DB2 Information Management Software
Abstract - IBM Database Replication Technologies
• Database replication technologies allow an IT infrastructure to achieve continuous
availability of the enterprise operations, by providing solutions for Disaster
Recovery, Workload Isolation, and Information Integration.
• When used for Disaster Recovery, each replicated database can be fully active,
and copies do not need to be identical. Some trade-offs include administrative
costs, and the overhead of capturing and applying changes.
• For Workload Isolation, the replication process can manage conflicts that may arise
from the application workload, database constraints, loading a target while changes
are still occurring at the source, or changes arriving out of order in a multi-node
configuration. Conflict resolution either relies on timestamp and the origin of each
change, or on a designated master. Configurations for data distribution and
consolidation to/from hundreds of databases can also be deployed.
• For Information Integration, the replication process deals with heterogeneous data
schemas, data stores, or even data models.
IBM Database replication technologies can capture and propagate changes with
low-latency at high-throughput over long distances, while preserving database
transactional integrity, and tolerating system outages or intermittent connectivity.
IBM Software Group | DB2 Information Management Software
Agenda
 Database Replication Technologies for continuous availability
 In support of the Global Enterprise: From Continuous Availability to
Business Integration
 Where Database Replication fits
 Where replication does not fit
 IBM Product Architecture and Capabilities
 Capture, Apply, Federation, and Transforms
 Topologies, Conflict Detection and Resolution
 Sample Implementations
IBM Software Group | DB2 Information Management Software
Why Replication in an Information System?
1. Disaster Recovery
Requirements

Goal: High-Availability

Applications: Standby copy for failover, Scheduled and Unscheduled
outages

Requirements: Minimize recovery time and eliminate or reduce data loss.
Preserve transactional consistency.
Requirements
Less
2. Workload Isolation

Goal: High-Availability, Improve Performance

Applications: Data Distribution/Consolidation, Regional Data Centers,
Caches

Requirements: Maintain live copies or subsets for working in disconnected
mode, often geographically distributed. Need to detect and resolve
conflicts, if any. Data mappings and transformations.
3. Information Integration

Goal: High-Availability, Improve Performance, Global Enterprise View

Applications: Analytics, Enterprise Business Integration

Requirements: Moving data to/from heterogeneous data stores. Cleansing
and transformations. Assembling objects with data from several sources.
More
IBM Software Group | DB2 Information Management Software
From Disaster Recovery to Information Integration:
Replication needs are cumulating - Semantics are increasing.
Requirements
and Scenarios
Maintain a full
database copy for
Disaster Recovery
Database
Maintain a logical
database subset for
Disaster Recovery,
Workload Isolation
Relational tables
Business Objects
<order><oid>197</oid>
<pid>AS207</pid>
<desc>Wheel</desc>
<qty>1</qty>
</order>
Propagated
Objects
Technologies
and Products
Publish changes
(with transformations) for
Disaster Recovery,
Workload Isolation,
Data Integration
Log Shipping
•HADR (LUW)
Disk Mirroring
•GDPS
•PPRC
Logical Replication
•Data Propagator
•Q Replication
Event Publishing
•Q Event Publish
•II Federation
Integration Software
•DataStage
Database Replication Application Space 
IBM Software Group | DB2 Information Management Software
Application Space for Database Replication Technologies
 Database Replication is a good fit when:
 Asynchronous Capture and Delivery
 Outages. Network, servers, a site, the RDBMS.
 Occasionally connected
 Non-identical sources and targets
 Different platforms. OS, RDBMS, even data models
 Different shapes. Sub-setting required
 Row-level transformations., Codepages, Schemas
 Update-Anywhere with possible conflicts
 Only possible with replication
 When some data loss is tolerable in case of a major disaster.
 Often, solution can be designed to limit loss to a few seconds.
 Fast delivery over large distances (1000s km)
 Several 10 000s rows/second achievable (up to 100 000row/sec)
 Avoid or minimize full-refresh of data at the target
 Other factors:
 Minimize down-time, administrative cost, application performance impact
Replication Technologies guarantee Transactional Consistency with Resilience
IBM Software Group | DB2 Information Management Software
Limits of Database Replication technologies
 Non-zero Data Loss required in case of Disaster (fire, flood)
 Use Synchronous technologies instead, i.e., HADR, PPRC
 Set-level .transformations are required on the data
 Use ETL software instead
 However, replication can be used to feed a staging area for ETL tools.
Replication can hide the differences between the target and the source
(database schema, data model, codepage, hardware architecture) and
provide a continuous, asynchronous feed.
 Business Objects need to be assembled
 Develop applications in the application layer
Other factors
 Cost-Benefit analysis of the solutions, given the requirements
IBM Software Group | DB2 Information Management Software
IBM SQL Replication
 Staging is in relational tables
 Control, and Monitoring information also in relational tables
 Transport is over a database connection
Source server
Target servers
DB2
Staging Tables
DB2
Apply
DB2
database
recovery
log
Apply
SQL
Capture
DB2
Apply
Non-DB2
Control tables
Triggers
•Informix
•Oracle
•Sybase
•SQL Server
Apply
DB2
Information
Integrator
Non-DB2
•z/OS
•iSeries
•UDB LUW
•Informix
•Oracle
•Sybase
•SQL Server
•Teradata
IBM Software Group | DB2 Information Management Software
A parenthesis: Database Federation
Remote objects (structured files, tables, spreadsheets) appear to the
application as if local tables in a DB2 database
 Local and non-local data can be manipulated in the same SQL statement
CREATE NICKNAME ORAT3 FOR
ORACLE9.SCOTT.T3
INSERT INTO ORAT3 VALUES(5)
SELECT * FROM ORAT3
T1 is a Table;
ORAT3 a
Nickname
Nicknames appear as local tables. For example:
> db2 list tables
Table/View
------------------------------T1
ORAT3
CUSTOMERS
Schema
--------------BOURBON
BOURBON
BOURBON
Type
----T
N
T
IBM Software Group | DB2 Information Management Software
IBM Q Replication
 Staging and Transport over MQSeries persistent message queues
 High-throughput, Low-latency. Apply with parallel agents
Source server
Target servers
WebSphere Queue
Manager (or client)
DB2
DB2
Apply
Apply
Admin queue
DB2
database
recovery
log
Capture
Apply
Apply
Send queues
•z/OS
•UDB LUW
Control Tables
Non-DB2
•z/OS
•UDB LUW
•VM/VSE
Control tables
Restart queue
Apply
Apply
Apply
DB2
Information
Integrator
Control Tables
•Informix
•Oracle
•Sybase
•SQL Server
•Teradata
IBM Software Group | DB2 Information Management Software
Performance
 Q Replication is between 3 to 10 times faster than SQL
replication
 Higher throughput and shorter latency
 Capture measured throughput: 49000rows/second (V9.1)
 Latency less than 2 seconds achievable over 1000s of
kilometers
 Measured Time to clear up receive queues after an outage
 1,000,000 rows accumulated in target receive queue
 Continuous arrival rate: 5,000 rows per second
 Time to re-sync target database = 91 seconds
(1) Turbo Freeway (2064-216) 2 LPARs 4CP for the source system and 4cp for the target system.
IBM Software Group | DB2 Information Management Software
Q Replication subscriptions - defining target copies
 Projection over columns and rows of a table:
 Only changes for subscribed tables are sent
 Some transactions can be ignored (e.g., by owner ID, trans ID, with signal
or command)
 Some operations can be ignored (e.g., delete)
 Filter rows with a predicate (e.g., WHERE :LOCATION ='EAST' AND :SALES >
(SELECT SUM(expense) FROM STORES WHERE stores.deptno = :DEPTNO)
 Database Schema mappings examples:
 1 column to N columns, e.g., [ :C1 || :C2]
 N columns  1 columns, e.g., [substr(:C2,2,3)]
 Generated columns, e.g., [CURRENT TIMESTAMP]
Capture side:
Apply side
ORDERS
oid
price
Replication handles codepage conversion, architecture difference
IBMORDERS
ibmID
price
ts
IBM Software Group | DB2 Information Management Software
Q Replication Subscription Types
 Unidirectional
 Changes are replicated in one direction
 1:N – N:1 topologies – Distribution and Consolidation
Source
 Changes can be filtered and transformed
Target(s)
 Bidirectional – master/slave
 Changes replicated in both directions
 Conflicts detected on data values:
•
Primary Secondary/backup
Conflict rules: Check key, changed only, or all
columns
 One server designated as winner
•
Conflict action: Force, ignore, merge change
 Tree topologies only
 Minimum overhead
 Peer to peer – no master, use timestamps
 Conflicts resolved by using most recent version, no
master copy - Handles out of order arrivals (e.g., delete
before insert)
 Requires extra columns and triggers
IBM Software Group | DB2 Information Management Software
Data Distribution from a (CCD) staging area
SQL Apply
Q Capture
Q Apply
Target Table
Q Replication
SQL Connections
Read/write
Source Table
MQSeries
SQL Apply
Read-only
CCD Table
…
Target Table
SQL Apply
…
Target Table
IBM Software Group | DB2 Information Management Software
Consistent Changed Data (CCD) Apply targets
 Usages:
 AUDIT trail of database changes.
•
Answer: Who changed what, when, and how?
 Staging table for data distribution (with SQL Apply)
PARTS_CCD
COMMITSEQ AUTHID
1
USER_A
2
USER_B
3
USER_A
OPERATION LOGMARKER XPARTNO PARTNO XPRICE PRICE
current
U
A7571
A7571
4.31
timestamp
5.03
current
I
null
A7981
null 121.03
timestamp
current
D
null
A7981
null
null
timestamp
•For updates, before values can be optionally present in the CCD (e.g., XPARTNO)
•Condensed CCD: Contains only the latest changed value of each row
•Complete CCD: Initially created with values for all rows from the source table.
IBM Software Group | DB2 Information Management Software
Event Publishing
Function
 Capture changed data in real time
 Correlate by transactions within a single
database
 Output: XML or CSV
Usage
 Building the Data Warehouse
 Business Integration
 Auditing requirements
DB2 z/OS and LUW
IMS
VSAM
CA IDMS
DataStage
Log-based
capture
Capture
Software AG
Adabas
WebSphere
Business Integration
WebSphere MQ
WebSphere MQ
Integrator Broker
Target DBs
User Application
JMS-aware
Application
IBM Software Group
Mazda
Challenge
 Support 700 dealers in USA
 Trouble matching customer demand with
available inventory
 More current data needed to track sales
achievements with period-end goals
Solution
 Sales and inventory information is replicated
every minute to portal server
 Improved access to current data without
changes to existing IT infrastructure
“Within 5 weeks of receiving the [WebSphere]
Information Integrator product we were able
to implement it in our … environments. It
now provides us up to the minute sales
activity.”
Joe Neria, Software
Consultant. Mazda
Business benefits
 Increased auto sales
 Improved dealer satisfaction
 Currency of information improved by
93%
Technology benefits
 Re-used existing application and data
base infrastructure
 Decreased network load compared to
full data refreshes 4 times an hour
 Ease and speed of deployment
17
IBM Software Group
International provider of financial & investment services
Challenge
Corporate initiative to provide
customers better performing real-time
queries by utilizing multiple sites.
Replication of critical order processing
details for core business functionality
Solution
Q Replication for high speed movement
of up to 10 Million transactions to
secondary site several thousand miles
away. Current implementation is UniDirectional with peer-to-peer plans.
Business benefits
 Replicating 5-10 Million
transactions with less than 2
seconds latency.
 More efficient and cost-effective
resource utilization
 Secondary platform services
reporting and business intelligence
queries and acts as backup to
primary
Technology benefits
 Real-time back up of secondary
system provides results in
increased capacity for peak
workloads.
18
IBM Software Group
CitiStreet
Challenge
 Support single sign-on access through both
Web and IVR applications ensuring 24x7
portal access for plan participants and
sponsors
Solution
 Support redundant, active single sign-on
applications for failover processing replicating
profile changes between them in real time.
“Since
nearly 10 million of CitiStreet
customers are offered 24-hour access to
their retirement accounts, the company can't
afford downtime and must be able to
replicate data changes when they happen.
We fully replicate our database over
redundancy data lines, so to us the stability
and speed of that asynchronous replication
is strategic for us."
Barry Strasnick , CIO
CitiStreet
Overview
• CitiStreet is one of the largest and
most experienced global benefits
providers servicing over 9 million plan
participants across all markets.
CitiStreet was formed in partnership
between subsidiaries of State Street
Corporation and Citigroup
Business benefits
 Ensure application availability for plan
participants and sponsors
 The new solutions from IBM will
improve data integrity with a reduced
level of maintenance
Technology benefits
 Maintain bi-directional synchronization
of profile updates (approx 175,000
updates daily) in real time
19
IBM Software Group | DB2 Information Management Software
Summary
 IBM develops Data Propagation technologies to provide Continuous
Availability and achieve a Global Integrated view of the enterprise in an
heterogeneous environment
 Q Replication (IBM WebSphere Replication Server) delivers low latency,
high throughput, and resilience. It is best-of-breed for heavy OLTP
workloads, providing resilience and preserving transactional integrity
throughout outages while minimizing the need for full data refreshes.