Choosing the correct data movement technology

Download Report

Transcript Choosing the correct data movement technology

EM416
Choosing the Correct Data
Movement Technology
Chris Kleisath
Director of Engineering
iAnywhere Solutions
[email protected]
Overview
When choosing a data movement
technology you need to consider:
The business requirements for the distributed database.
The technological limitations of your environment.
The development and administrative resources available.
Enterprise Data Movement
Central Datastore
Workgroup
Computing
Wired and Wireless
Communication Links
Embedded
Computing
Mobile
Computing
Distributed Systems
Using any data movement technology
means we have a distributed database.
This almost always implies some form of
distributed application.
What is a Distributed System?
C.J. Date’s working definition:
“A distributed database system consists
of a collection of sites, connected
together via some sort of network, in
which:
Each site is a database system in its own right
Sites have agreed to work together (if necessary), so that a
user at any site can access data anywhere in the
network exactly as if the data were all stored at the
user’s own site.”
Distributed Systems
Practical Factors
Not all systems require that all data be
available to all sites.
Not all systems require that all data be
consistent between all sites all of the
time.
The degree to which your system must
meet the ideal definition is the single
biggest factor in choosing your data
movement technology.
Issues when distributing data
•
•
•
•
•
•
Local autonomy
Data partitioning (fragmentation)
Consistency
Transaction control
Accessibility (connection)
Topology
Local autonomy
Each site should operate independently
of the other sites.
No site should depend on another site for
its successful functioning.
A centralized database provides the
lowest level of local autonomy.
Decentralized systems provide the
highest level of local autonomy.
Data Partitioning
Also known as fragmentation.
Only the data needed by a site is present
at the site.
The database at a site is a “complete”
subset of the data.
Some data will need to be duplicated
between sites.
Data Partitioning
Update Anywhere
Primary keys must be unique across the
entire distributed system.
If multiple sites insert into the same table.
Requires a conflict detection and
resolution mechanism.
If multiple sites are able to change the same row.
Tight vs. Loose Consistency
Which version of the data is being used?
Account
AcctNo
Bal
Acct200 1000
Waterloo
Account
AcctNo
Bal
Acct200 1000
Paris
Consistency
Tight consistency requires all data to be
in a consistent state.
Loose consistency permits data to be
“out-of-date”.
Latency is the measure of how long it
takes the data to become consistent.
In some cases it is never consistent
since there are always changes that
have not been moved.
Transaction Control
Your chosen technology must pass the
ACID test
Atomicity, Consistency, Isolation, Durability
Only committed data should move.
Committed data must move.
Failure to successfully move committed
data must be detectable.
Changes must be applied in the same
order on all databases.
Accessibility
What kind of “network” do you have
between the sites?
High-speed LAN/WAN
Low-speed Dial-up (RAS)
Wireless
Indirect (email, ftp)
Internet (HTTP)
Sneaker-net
Topology
What kind of relationship exists between
the sites.
Peer-to-peer
Each site can transfer data to any other site.
No centralized master copy can exist.
Conflict resolution is extremely difficult
• There is no place to detect and resolve the conflict
Topology
Hierarchical
Each site passes data up and down the hierarchy.
A central master copy (consolidated database) exists.
Data must pass through the consolidated to move to
another site.
Conflict detection and resolution is implemented on the
consolidated.
Other Issues
Number of sites
Some technologies are better suited to mass deployment.
Vendors
Are the databases at each site the same product?
Is the technology commercially available and supported?
Factors
Summary
Each of the following factors will
influence your choice of data
movement technologies:
Local autonomy
Data partitioning (fragmentation)
Consistency
Transaction control
Accessibility (connection)
Topology
Types of Data Movement
All technologies can be categorized as
one of:
Online
Synchronization
Replication
Online
Changes are made “simultaneously” on
all databases.
Account
AcctNo
Bal
Acct200 1000
Account
AcctNo
Bal
Acct200 1000
Please wait while
your account is
being updated…
Withdraw
$100
Waterloo
Paris
Online
In its “simplest” form the application
updates all of the databases directly.
Underlying technology is normally “Two
Phase Commits”
Sybase products:
EAServer
Not really data movement but is
appropriate in some systems.
Online Characteristics:
Local Autonomy
Very low level of local autonomy.
If one site is down the entire system is down.
Account
AcctNo
Bal
Acct200 1000
Sorry the System is Unavailable
Withdraw
$100
Waterloo
Account
AcctNo
Bal
X
Acct200 1000
Paris
Online Characteristics:
Data Partitioning
Data can be partitioned as required.
If the data is partitioned the application
must update the row(s) everywhere.
Since transactions are applied at all
databases simultaneously no primary
key or conflict issues arise.
Online Characteristics:
Consistency
Use when tight consistency is an
absolute requirement.
Transactions will succeed or fail on all
databases.
Online Characteristics:
Transaction Control
A Distributed Transaction Server (DTS)
should be used.
Ensures the transaction is applied on all sites or not at all.
Very expensive to code yourself.
Both ASA and ASE provide support for a DTS
Online Characteristics:
Accessibility
Requires a reliable network connection
between sites.
Transactions will fail if one database is unavailable.
Application speed will be affected by
network speed.
Online Characteristics
Typically a peer-to-peer topology.
Since all databases are updated at once no master copy is
required.
Online Characteristics
Other Issues
“Simple” to understand
Looks just like a centralized database.
Very few sites can be supported
Consider the cost of updating many databases at once.
Vendors
Heterogeneous environments are “easily” supported.
Synchronization
Current state of the data is moved
between databases.
Can be a complete refresh or only the
rows that have changed.
Sybase products:
MobiLink
Synchronization
Product
sku_key qty_oh
1234
10
X
8
UPDATE Product SET qty_oh = 8
WHERE sku_key = 1234
Product
sku_key qty_oh
1234
1X0
10
9
X
8
Synchronization Characteristics
Local Autonomy
High local autonomy
Site database must have all of the data
required for the application to run.
Synchronization Characteristics
Data Partitioning
Data is usually partitioned.
Each site has common data and site
specific data.
Update anywhere requires:
Unique primary keys.
Conflict detection and resolution mechanism.
Synchronization Characteristics
Consistency
Low to high consistency.
Data is only consistent immediately after
synchronization.
Frequency of synchronization affects
level of consistency but in all cases
there is some latency.
Synchronization Characteristics
Transaction Control
Transaction boundaries are not
maintained.
• Some operation sequences can not be
synchronized. (i.e. insert then delete of
a row with the same primary key value)
Most synchronization technologies
“batch” the operations.
• e.g. all deletes, then inserts, then
updates
Synchronization Characteristics
Accessibility
Requires a stable network connection
during the synchronization process.
Connection speed affects the amount of
data that can be reasonably
synchronized.
Synchronization Characteristics
Topology
Both peer-to-peer and hierarchical
topologies are possible.
Peer-to-peer is difficult if update
anywhere is permitted.
Which copy of the data is correct?
Who resolves an update conflict?
Synchronization Characteristics
Other Issues
Heterogeneous environments can be
supported.
• Be aware of compatibility issues. E.g.
Oracle allows 1 varchar column/table,
ASA has no limit.
Because each site synchronizes
independently many sites can be
supported.
MobiLink
ASA, ASE, Microsoft, Oracle, IBM
HTTP, TCPIP
Serial
HotSync, Wireless
ASA, PalmOS, CE, Pagers, Phones
MobiLink Characteristics
Complete local autonomy.
Complete control over data partitioning
on the consolidated through the use of
scripts.
Uses the consolidated database’s scripting language or
Java.
No partitioning allowed on the remote.
MobiLink Characteristics
Session based.
Only changed records are synchronized.
Connection only required while synchronizing.
Bi-directional by default.
Medium to high latency.
Low to medium data volume.
MobiLink Characteristics
Hierarchical topology.
Consolidated can be any ODBC-based database
• Sybase, Microsoft, Oracle, IBM
ASA and/or UltraLite remotes.
Optimized for thousands of remotes.
Scalable based on consolidated
database’s capabilities.
MobiLink Synchronization
Components
Consolidated Database Server
Consolidated
Data Store
ODBC
Remote Database Server
(ASA or UltraLite)
MobiLink Client
(ASA or UltraLite)
Remote Data
Store
MobiLink
Server
TCP/IP
TCP/IP
MobiLink Synchronization Server
Provides interface between consolidated
database and remote server.
Works with ODBC-based host databases.
Responsible for ensuring the
synchronization process completes.
Supports multiple simultaneous
synchronizations.
MobiLink Consolidated
Synchronization Logic
SQL statements executed against the
consolidated database.
Written in language of consolidated database or Java.
Guides the synchronization server.
Controls the flow of data in both directions.
Handles conflicts.
MobiLink Remote
Synchronization Logic
ASA and UltraLite keep track of changes
to the data.
A synchronization component is provided
to:
Scan for changes to create the upload stream
Receive the download stream and apply the changes to
the remote
Replication
Transactions (changes) are moved
between the databases.
Uses store and forward mechanism.
Site(s) must have a common starting
point.
Sybase Products:
SQL Remote
Replication Server
Replication
Product
sku_key qty_oh
1234
1X0
10
9
X
8
UPDATE Product SET qty_oh = 9
WHERE sku_key = 1234
UPDATE Product SET qty_oh = 8
WHERE sku_key = 1234
Product
sku_key qty_oh
1234
1X0
10
9
X
8
Replication Characteristics
Local Autonomy
High local autonomy.
Database must have all of the data
required for the application to run.
Replication Characteristics
Data Partitioning
Data is usually partitioned.
Each site has common data and site
specific data.
Update anywhere requires:
Unique primary keys.
Conflict detection and resolution mechanism.
Replication Characteristics
Consistency
Low to high consistency is possible.
Speed of store and forward messaging
system determines how consistent the
database is.
Some latency is always present.
Replication Characteristics
Transaction Control
Mechanism must exist to guarantee
transactions are:
Sent and applied in the correct order.
No transactions are skipped
Replication Characteristics
Accessibility
Whether a direct connection is required
or not is dependant on the latency
requirements.
Not required in high latency implementations.
Replication Characteristics
Topology
Both peer-to-peer and hierarchical
topologies can be used.
Conflict resolution normally requires a
hierarchical model.
Replication Characteristics
Other Issues
Only transactions are moved therefore:
• It is possible to support many sites.
• Throughput is usually independent of
database size.
Replication Server
Replicate Sites
Primary Sites
 Adaptive Server
Replication Replication
Agent
Server
 DirectCONNECT
(Native drivers)






Adaptive Server/Enterprise
Adaptive Server/Anywhere
Oracle
Informix
OS/390 DB2
Replication Toolkit for MVS
 DirectCONNECT/
Anywhere (ODBC)
Replication Server
Characteristics
Transactions are sent to Replication Server
which stores and forwards them to the
interested sites.
Assumes there is normally a high speed
connection.
Near real time (low latency).
High data volumes.
Moderate number of sites.
Heterogeneous databases supported.
Uni-directional by default.
Replication Server
Primary Sites
Replication
Agent
Replicate Sites
Replication
Server
Replication Server Components
Primary Site
Origin of the data being moved.
Multiple vendors RDBMS supported.
Keeps a record of all transactions.
Normally this is in the transaction log
but it depends on the RDBMS.
Replication Server Components
Replication Agent
Scans the primary site’s record of
transactions.
Passes the committed transactions, in
the order they were applied, to
Replication Server.
Replication Server Components
Replication Server
Receives transactions from the Replication
Agents.
Stores the transactions until they are
successfully applied on all replicate sites.
Maintains a connection to all replicate sites.
Automatically recovers when a connection is dropped and restored
Determines which site(s) require the transaction
and applies them in the correct order.
Replication Server Components
Replication Server
Prevents “circular” transactions.
Provides user programmable “function
strings” to allow manipulation of the
transaction.
Data conversions (e.g. date formats)
Conversion of SQL in heterogeneous environments.
Detects SQL errors.
Replication Server Components
Replicate Site
Applies SQL sent by Replication Server.
A replicate site can also be defined as a
primary site if bi-directional replication
is required.
SQL Remote
ASE
ASA
OR
MAPI
FTP
SMTP
FILE
ASA
VIM
MAPI
FTP
SMTP
FILE
ASA
VIM
SQL Remote Characteristics
Complete local autonomy.
Partitioning based on:
Column values
Subqueries
Where clauses
Message based (no connection)
MAPI (Microsoft), VIM (Lotus), SMTP, FTP and File
Very loose consistency.
SQL Remote Characteristics
Built in guaranteed message delivery.
Hierarchical
Consolidated is either ASA or ASE
Remotes are ASA
Homogeneous.
Many (thousands) of remotes.
Low to medium data volumes.
SQL Remote Components
Consolidated Database
Server
Message Agent
Consolidated
Data Store
Message System
Remote Database
Server
Remote Data
Store
Message Agent
SQL Remote Components
Consolidated Database
Contains a copy of all data that is
replicating.
Performs conflict detection and
resolution.
Transactions are recorded in the
transaction log.
Maintains additional data in the
transaction log about what
transactions are eligible to replicate
and how they are partitioned.
SQL Remote Components
Message Agent
Scans the transaction log for committed
transactions that are eligible to
replicate.
Builds messages for the sites that have
subscribed to the transactions.
Interfaces with the message system.
Guarantees that transactions are:
Sent in the correct order.
Applied in the correct order and only applied once.
No transactions are skipped.
SQL Remote Components
Message Agent
Receives transactions from the message
system.
Applies the transactions.
Prevents “circular” transactions.
Detects update conflicts.
Detects SQL errors.
SQL Remote Components
Message System
Provides the store and forward
technology.
Support for:
MAPI
SMTP
VIM
File
FTP
SQL Remote Components
Remote Database
Contains data the site is subscribed to.
Transactions are recorded in the
transaction log.
Maintains additional data in the
transaction log about what
transactions are eligible to replicate
and how they are partitioned.
Which Technology should I
Choose?
Depends on the business requirements
and technological infrastructure
available.
Consistency and latency are the biggest
factors.
Use EAServer When …
Absolute consistency is required (Zero
latency).
Transactions must fail when one of the
site databases is unavailable.
There are very few sites.
Use MobiLink When …
Latency is permitted.
Local autonomy is required.
A reliable connection exists.
You have low to medium data volumes.
There are heterogeneous databases.
You do not require transaction boundaries to be
maintained.
You have a hierarchical topology.
You have many remotes.
You must know when your changes have been
synchronized.
Use Replication Server When …
Near real-time consistency is required.
You have high data volumes.
Local autonomy is required.
There are heterogeneous databases.
You require transaction boundaries to be
maintained.
You have a peer-to-peer topology.
You can implement a hierarchical topology
A small number of sites.
Use SQL Remote when …
Latency is not a factor.
No direct connection exists (or is not permitted)
or the connection is unreliable,.
Local autonomy is required.
You have low to medium data volumes.
Homogeneous (ASA & ASE) databases.
Hierarchical topology.
You have many remotes.
You require transaction boundaries to be
maintained.
Or Combine them
All four products are compatible.
Use the strengths of each to solve your
business problems.
Replication Server or EAServer between main
geographical databases.
MobiLink or SQL Remote for mass deployed devices.
Summary
Sybase has many different methods of
maintaining data in distributed
databases.
Your business requirements dictate
which method is best.
All the technologies can be used together
on the same database.
EM 416
Choosing the Correct Data
Movement Technology
Chris Kleisath
Director of Engineering
iAnywhere Solutions
[email protected]