lesson20 - SCF Faculty Site Homepage

Planning Server Deployments
Lesson 20
Skills Matrix
Replication
• You use replication to put copies of the same data at
different locations throughout the enterprise.
• Common reasons to replicate are:
– To move data closer to the user.
– To reduce locking conflicts when multiple sites want to
work with the same data.
– To allow site autonomy so each location can set up its
own rules and procedures for working with its copy of
the data.
– To remove the impact of read-intensive operations,
such as report generation and ad hoc query
processing, from the OLTP database.
Replication
• SQL Server uses two strategies for
distributing data: replication itself and distributed
transactions managed by the Distributed
Transaction Coordinator (DTC). Whichever strategy
you use, the copies of the data are kept current
and consistent.
• You can also use both strategies in the same
environment.
Replication
• Timing is the main difference
between replication and distributed
transactions. With distributed transactions,
SQL Server keeps your data 100 percent
synchronized 100 percent of the time.
Replication involves some latency.
Publisher/Subscriber
• The publisher is the source database where
replication begins.
– It makes data available for replication.
• The subscriber is the destination database
where replication ends.
– It either receives a snapshot of all the
published data or applies transactions that
have been replicated to itself.
Publisher/Subscriber
• The distributor is the intermediary between the publisher
and subscriber.
– It receives published transactions or snapshots from
the publisher and then stores and forwards these
publications to the subscribers.
• The publication is the storage container for different
articles.
– A subscriber can subscribe to an individual article or
an entire publication.
• An article is the data, transactions, or stored procedures
that are stored within a publication.
– This is the actual information being replicated.
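The store-and-forward roles described above can be sketched in a few lines of Python. This is a toy model, not SQL Server code: the class names (Publication, Distributor, Subscriber) and the sample data are assumptions chosen only to mirror the terms on the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Publication:
    """A named container of articles (article name -> list of rows)."""
    name: str
    articles: dict = field(default_factory=dict)


@dataclass
class Subscriber:
    """Destination that keeps its own copy of each article it receives."""
    name: str
    data: dict = field(default_factory=dict)

    def receive(self, article, rows):
        self.data[article] = list(rows)


class Distributor:
    """Hypothetical intermediary: stores what the publisher sends, then forwards it."""

    def __init__(self):
        self.store = []           # stored (article, rows) pairs awaiting delivery
        self.subscriptions = []   # (subscriber, set of wanted article names)

    def subscribe(self, subscriber, publication, articles=None):
        # None means the entire publication; otherwise only the named articles.
        wanted = set(articles) if articles is not None else set(publication.articles)
        self.subscriptions.append((subscriber, wanted))

    def accept(self, publication):
        # "Store" step: take a copy of every article from the publisher.
        for article, rows in publication.articles.items():
            self.store.append((article, list(rows)))

    def forward(self):
        # "Forward" step: deliver each stored article to interested subscribers.
        for article, rows in self.store:
            for subscriber, wanted in self.subscriptions:
                if article in wanted:
                    subscriber.receive(article, rows)
        self.store.clear()


pub = Publication("Sales", {"Orders": [(1, "widget")], "Customers": [(7, "Ann")]})
dist = Distributor()
branch = Subscriber("branch")   # subscribes to the entire publication
kiosk = Subscriber("kiosk")     # subscribes to a single article
dist.subscribe(branch, pub)
dist.subscribe(kiosk, pub, articles=["Orders"])
dist.accept(pub)
dist.forward()
```

After forward() runs, branch holds both articles and kiosk holds only Orders, which matches the point that a subscriber can take an individual article or the whole publication.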
New Publication Wizard
Two-phase Commit
• Two-phase commit (sometimes referred to as 2PC)
is a form of real-time distribution in which
modifications are made to all involved databases
at the same time.
• Distributed transactions handle this.
• As with any transaction, either all statements
commit successfully or all modifications roll back.
• Two-phase commit uses the Microsoft DTC to
accomplish its tasks.
• The DTC implements a portion of the
functionality of Microsoft Transaction Server.
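The all-or-nothing behavior of two-phase commit can be illustrated with a minimal Python sketch. It is not how the Microsoft DTC is implemented; the Participant class and two_phase_commit function are hypothetical names used only to show the prepare phase followed by a commit or rollback phase.

```python
class Participant:
    """A toy database that can stage a change, then commit it or roll it back."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare   # assumption: simulates a failing server
        self.committed = {}
        self.staged = None

    def prepare(self, change):
        # Phase 1: stage the change and vote yes (True) or no (False).
        if not self.can_prepare:
            return False
        self.staged = dict(change)
        return True

    def commit(self):
        # Phase 2, success path: make the staged change permanent.
        if self.staged is not None:
            self.committed.update(self.staged)
        self.staged = None

    def rollback(self):
        # Phase 2, failure path: discard anything that was staged.
        self.staged = None


def two_phase_commit(participants, change):
    """Return True if every participant committed, False if all rolled back."""
    votes = [p.prepare(change) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

If any participant votes no during the prepare phase, every participant rolls back, so no database ends up with a partial change.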
Replication Factors
• Autonomy: This refers to how much
independence you want to give each
subscriber with regard to the replicated data.
• Latency: This refers to the time lag
between a change at the publisher and the
corresponding update on the subscriber.
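• Transactional consistency: This refers to
whether transactions must be applied at the
subscriber all at the same time and in a
specific order. Although several types of
replication exist, the most common method
moves transactions from the publisher
through the distributor and on to the
subscriber.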
Types of Replication
• Snapshot replication - Snapshot replication
distributes data exactly as it appears at a specific
moment in time and does not monitor for updates
to the data.
• Transactional replication - A type of replication that
typically starts with a snapshot of the publication
database objects and data. After that, only the
changes (transactions) made at the publisher are
moved to the subscriber.
• Merge replication - A type of replication that allows
sites to make autonomous changes to replicated
data, and at a later time, merge changes and
resolve conflicts when necessary.
Distribution Types
• Distributed transactions
• Transactional replication
• Transactional replication with immediate
updating subscribers
• Snapshot replication
• Snapshot replication with immediate
updating subscribers
• Merge replication
• Queued updating
Queued Updating
• With transactional and snapshot replication, you
can also configure queued updating.
• Like the immediate updating subscribers option,
this gives your users the ability to make changes to
the subscription database.
• But unlike immediate updating subscribers,
queued updating will store changes until the
publisher can be contacted.
• This can be extremely useful in networks where
you have subscribers who are not always
connected or the connection is unreliable.
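The store-until-connected idea behind queued updating can be sketched as follows. This is a simplified Python model under stated assumptions: the QueuedUpdatingSubscriber class is hypothetical, the publisher is represented by a plain dictionary, and conflict handling is left out entirely.

```python
from collections import deque


class QueuedUpdatingSubscriber:
    """Applies changes locally and queues them until the publisher is reachable."""

    def __init__(self):
        self.local = {}        # the subscription database (toy key/value store)
        self.queue = deque()   # changes waiting to be sent to the publisher

    def update(self, key, value):
        # The user's change takes effect locally right away and is queued.
        self.local[key] = value
        self.queue.append((key, value))

    def flush(self, publisher, connected):
        """Send queued changes in order; return how many were delivered."""
        if not connected:
            return 0           # keep everything queued for a later attempt
        sent = 0
        while self.queue:
            key, value = self.queue.popleft()
            publisher[key] = value
            sent += 1
        return sent
```

For example, a subscriber that makes two updates while offline delivers nothing on flush(publisher, connected=False) and delivers both, in the original order, once connected=True.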
Subscriptions
• When you set up your subscribers, you can
create either pull or push subscriptions.
Push Subscriptions
• Push subscriptions help centralize your
administrative duties because the subscription
itself is stored on the distribution server.
• In other words, the data can be pushed to the
subscribers based on the publisher’s schedule.
• Push subscriptions are most useful if a subscriber
needs to be updated whenever a change occurs at
the publisher.
• The publisher knows when the modification takes
place, so it can immediately push those changes to
the subscribers.
Pull Subscriptions
• Pull subscriptions are configured and maintained
at each subscriber.
• The subscribers will administer the synchronization
schedules and can pull changes whenever they
consider it necessary.
• This type of subscription also relieves the distribution
server of some of the overhead of processing.
• Pull subscriptions are also useful in situations in
which security is not a primary issue.
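The difference between push and pull comes down to which side starts the synchronization. The Python sketch below is a toy illustration only; DistributionStore, push_sync, and pull_sync are hypothetical names, and each subscriber's copy is modeled as a simple list of changes.

```python
class DistributionStore:
    """Holds pending changes per subscriber until they are delivered."""

    def __init__(self):
        self.pending = {}

    def register(self, name):
        self.pending[name] = []

    def publish(self, change):
        # Every registered subscriber gets the change added to its pending list.
        for queue in self.pending.values():
            queue.append(change)

    def take(self, name):
        # Hand over the pending changes for one subscriber and empty its list.
        changes, self.pending[name] = self.pending[name], []
        return changes


def push_sync(store, subscribers):
    """Push: the distribution side walks every subscriber and delivers."""
    for name, replica in subscribers.items():
        replica.extend(store.take(name))


def pull_sync(store, name, replica):
    """Pull: one subscriber asks for its own changes when it chooses to."""
    replica.extend(store.take(name))
```

With push_sync, one central call updates every subscriber. With pull_sync, a subscriber that never asks simply keeps its changes pending on the distribution side.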
Replication Agents
• Five replication agents handle the tasks of
moving data from the publisher to the
distributor and on to the subscribers.
– Log reader agent
– Distribution agent
– Snapshot agent
– Merge agent
– Queue reader agent
Merge Replication
• When you use merge replication, the merge
agent can be centrally located on the
distributor, or it can reside on every subscriber
involved in the merge replication process.
• When you have implemented push replication,
the merge agent will reside on the distributor.
• In a pull scenario, the merge agent is on every
subscriber.
Conflict Resolution in Merge Replication
• Performing updates to the same records at multiple
locations causes conflicts.
• To resolve these conflicts, SQL Server uses the
MSmerge_contents table and some settings from the
publication itself.
• When you first create a merge publication, you can
configure the conflict resolver with one of three
levels of conflict tracking:
– Row-level tracking
– Column-level tracking
– Logical record-level tracking
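The practical difference between row-level and column-level tracking can be shown with a small Python sketch. It models only conflict detection, not resolution, and it leaves out logical record-level tracking; the detect_conflict function and its arguments are assumptions made for illustration, not part of SQL Server.

```python
def detect_conflict(publisher_cols, subscriber_cols, tracking="row"):
    """Return True if two changes to the same row count as a conflict.

    publisher_cols and subscriber_cols are the sets of column names that
    each side changed in the row since the last synchronization.
    """
    if not publisher_cols or not subscriber_cols:
        return False   # only one side changed the row, so nothing conflicts
    if tracking == "row":
        return True    # any two changes to the same row conflict
    if tracking == "column":
        # Conflict only when both sides changed at least one common column.
        return bool(set(publisher_cols) & set(subscriber_cols))
    raise ValueError(f"unknown tracking level: {tracking!r}")
```

For example, if the publisher changes phone and a subscriber changes email in the same row, detect_conflict reports a conflict under "row" tracking but not under "column" tracking.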
Snapshot Replication
• When you use snapshot replication, an entire copy of
the publication moves from the publisher to the
subscriber.
• Everything on the subscriber database is overwritten,
allowing for autonomy, as well as transactional
consistency because all changes are made at once.
• Latency can be as high as you want for this type of
replication, because you decide how often a new
snapshot is applied.
• When you use snapshot replication, there is no
merge agent. Snapshot replication uses the
distribution agent.
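The overwrite behavior of a snapshot can be sketched in Python as a point-in-time copy that replaces whatever the subscriber held before. The apply_snapshot function and the dictionary-based databases are hypothetical stand-ins, not SQL Server objects.

```python
import copy


def apply_snapshot(publication, subscriber_db):
    """Replace the subscriber's contents with a point-in-time copy."""
    snapshot = copy.deepcopy(publication)   # freeze the data as it is right now
    subscriber_db.clear()                   # everything on the subscriber goes
    subscriber_db.update(snapshot)
    return subscriber_db
```

Because the copy is taken at one moment, later changes at the publisher do not reach the subscriber until the next snapshot is applied.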
Transactional Replication
• When you use transactional replication, only the
changes (transactions) made to the data are moved.
• Before these transactions can be applied at a
subscriber, however, the subscriber must have a copy
of the data as a base.
• Because of its speed and relatively low overhead on
the distribution server, transactional replication is
currently the most commonly used form of replication.
• Generally, data on the subscriber is treated as
read-only, unless you are implementing transactional
replication with immediate updating subscribers.
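The "base copy first, then changes only" pattern can be sketched in Python. This is an illustration under assumptions: the log format (sequence number, operation, key, value) and the apply_transactions function are invented for the example and do not reflect how the log reader or distribution agent stores data.

```python
def apply_transactions(replica, log, last_applied=0):
    """Apply logged changes newer than last_applied, in sequence order.

    Each log entry is (sequence_number, operation, key, value).
    Returns the highest sequence number applied so far.
    """
    for seq, op, key, value in sorted(log, key=lambda entry: entry[0]):
        if seq <= last_applied:
            continue                  # already applied in an earlier cycle
        if op in ("insert", "update"):
            replica[key] = value
        elif op == "delete":
            replica.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
        last_applied = seq
    return last_applied
```

The replica passed in stands for the subscriber after the initial snapshot. Remembering the last applied sequence number means a repeated run moves no data twice.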
Publication Issues
• Before you start your replication process, you
should consider a few more topics, including data
definition issues, IDENTITY column issues, and
some general rules involved when publishing.
• Keep the following data definition items in mind
when you are preparing to publish data:
– Timestamp data types
– Identity values
– User-defined data types
– Not for replication
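One common way to keep identity values from colliding when more than one server can insert rows is to give each server its own non-overlapping range. The Python sketch below shows only that general idea; the allocate_identity_ranges function, the node names, and the range size are all assumptions and not SQL Server settings.

```python
def allocate_identity_ranges(nodes, start=1, range_size=1000):
    """Give each node its own block of identity values.

    Returns a dict mapping node name -> range of values it may assign.
    """
    if range_size <= 0:
        raise ValueError("range_size must be positive")
    ranges = {}
    low = start
    for node in nodes:
        ranges[node] = range(low, low + range_size)
        low += range_size   # the next node starts where this one ends
    return ranges
```

For example, allocate_identity_ranges(["publisher", "sub1"]) gives the publisher the values 1 through 1000 and sub1 the values 1001 through 2000, so rows inserted at either server never share a key.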
Tips for Distribution Servers
• Here are some tips to keep in mind when selecting
a machine to be the distributor:
– Ensure you have enough hard disk space for the
Distribution working folder and the distribution
database.
– You must manage the distribution database’s
transaction log carefully.
– The distribution database will store all
transactions from the publisher to the
subscriber.
Tips for Distribution Servers
– Snapshots and merge data are stored in the
Distribution working folder.
– Be aware of the size and number of articles
being published.
– Text, ntext, and image datatypes are replicated
only when you use a snapshot.
– A higher degree of latency can significantly
increase your storage space requirements.
– Know how many transactions per
synchronization cycle there are.
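The link between latency and storage can be made concrete with a rough back-of-the-envelope calculation. The formula in this Python sketch is an assumption for illustration (transactions per cycle times average size times the number of cycles held before delivery); real sizing depends on many other factors, and all the numbers are made up.

```python
def estimate_distribution_storage(tx_per_cycle, avg_tx_bytes, cycles_retained):
    """Rough estimate, in bytes, of transaction data held on the distributor."""
    if min(tx_per_cycle, avg_tx_bytes, cycles_retained) < 0:
        raise ValueError("inputs must not be negative")
    return tx_per_cycle * avg_tx_bytes * cycles_retained


# Hypothetical workload: 5,000 transactions per cycle at about 200 bytes each.
low_latency = estimate_distribution_storage(5000, 200, cycles_retained=1)
high_latency = estimate_distribution_storage(5000, 200, cycles_retained=24)
```

Holding 24 cycles instead of 1 multiplies the estimate by 24, which is the sense in which higher latency increases storage requirements.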
Replication Models
• You can use one of several models for each
replication process that you implement:
– Central publisher/central distributor
– Remote distribution
– Central subscriber/multiple publishers
– Multiple publishers/multiple subscribers
Central Publisher Model
Remote Distribution
Central Subscriber/Multiple Publishers
Multiple Publishers/Multiple Subscribers
• Use this model when you need to maintain a
single table on multiple servers.
• Each server subscribes to the table and also
publishes the table to other servers.
• This model can be particularly useful in the
following business situations:
– Reservations systems
– Regional order-processing systems
– Multiple warehouse implementations
Heterogeneous Replication
• Heterogeneous database replication allows
you to replicate data to non-Microsoft
database servers, including databases
across the Internet.
• Heterogeneous replication occurs when you
publish to other databases through an OLE
DB connection.
Heterogeneous Replication
• When you publish to these non–SQL Server
subscribers, you need to keep the following
rules in mind:
– Only push subscriptions are supported.
– You can publish indexed views as tables; they
cannot be replicated as indexed views.
– Snapshot data will be sent using bulk copy’s
character format.
– Datatypes will be mapped as closely as
possible.
Heterogeneous Replication
– The account under which the distribution agent
runs must have read access to the install
directory of the OLE DB provider.
– If an article is added to or deleted from a
publication, subscriptions to non–SQL Server
subscribers must be reinitialized.
– NULL and NOT NULL are the only constraints
supported for all non–SQL Server subscribers.
– Primary key constraints are replicated as unique
indexes.
Replication over the Internet
• Replicating data over the Internet allows
remote, disconnected users to access data
stored, or “parked,” temporarily on an FTP
site whenever they have a connection to
the Internet. Replicate data over the Internet
using:
– A Virtual Private Network (VPN).
– The Web synchronization option for merge
replication.
Installing and Using Replication
• To successfully install and enable
replication, you must install a distribution
server, create your publications, and then
subscribe to them.
• Before any of this can take place, you must
first configure SQL Server.
• To install your replication scenario, you must
be a member of the sysadmin fixed server
role.
Installing and Using Replication
• Before you can configure your SQL Server for
replication, the computer itself must meet the
following requirements:
– All servers involved with replication must be
registered in Management Studio.
– If the servers are from different domains, Active
Directory trust relationships must be established
before replication can occur.
– Any account you use must have access rights to
the Distribution working folder on the
distribution server.
Installing and Using Replication
• Use a single Windows domain user account
for all your SQL Server Agents.
– Do not use the LocalSystem account because
this account has no network capabilities and
therefore will not allow replication.
– Also, you need to make the account a
member of the Domain Administrators group
because only administrators have access to
the system ($) shares.
Installing a Distribution Server
• Before you can enable a publication
database, you must be a member of the
sysadmin fixed server role.
• Once you have enabled publishing, any
member of that database’s db_owner role
can create and manage publications.
Adding a Publication
• The New Publication Wizard allows you to specify
the following options:
– Number of articles
– Schedule for the snapshot agent
– Whether to maintain the snapshot on the
distributor
– Tables and stored procedures you want to publish
– Publications that will share agents
– Whether to allow updating subscribers
– Whether to allow pull subscriptions
Creating a Subscription
• As part of the process of creating a
subscription, you will be able to specify the
publishers you want to subscribe to and a
destination database to receive the
published data, verify your security
credentials, and set up a default schedule.
Testing Replication
• You can now verify that replication is running
properly.
Replication Monitor
• The Replication Monitor gathers replication
information about the different replication
agents. This includes the agent history, with
information about inserts, updates, deletes,
and any other transactions that were
processed.
• Through the Replication Monitor, you can
also edit the various schedules and
properties of the replication agents.
Replication Scripts
• Now that you have replication set up and working
properly, you may want to save all your hard work
in the form of a replication script.
Replication Scripts
• Scripting your replication scenario has the following
advantages:
– You can use the scripts to track different
versions of your replication implementation.
– You can use the scripts (with some minor
tweaking) to create additional subscribers and
publishers with the same basic options.
– You can quickly customize your environment by
modifying the script and then rerunning it.
– You can use the scripts as part of your database
recovery process.
Replication Scripts
• From here you can script the distributor and
publications for the various replication items
stored with this distribution server.
• You can also script the options for any
subscribers and even the replication jobs.
• When you have made your choices, just click
the Script to File button and save the script
wherever you like.
Replication Resources
• Replication requires considerable memory and
processor resources.
• You can perform a number of tweaks to increase
the performance of your replication scheme:
– Set a minimum memory allocation limit.
– Use a separate hard disk for all the databases
used in replication.
Replication Resources
– Use multiple processors.
– Publish only the amount of data required.
– Place the snapshot folder on a drive that does
not have database or log files.
– Be sparing with horizontal partitioning.
– Use a fast network.
– Run agents continuously instead of on a frequent
schedule.
Summary
• Replication is a powerful tool for
distributing data to other database engines in
your enterprise. It puts data closer to your
users, which makes it faster and easier for
them to access.
Summary
• Microsoft uses a publisher/subscriber metaphor to
explain replication.
• The publisher contains the data that needs to be
copied.
• The subscribers get a copy of the data from the
publisher, and the distributor moves the data from
the publisher to the subscribers.
• The data are published in groups called
publications; a publication can contain several
articles, which are the actual data being replicated.
Summary
• You can choose from three main types of replication:
merge, transactional, and snapshot.
• Each has pros and cons, but you should consider three
main issues when picking a replication type: autonomy,
latency, and consistency.
• In other words, you need to know whether the data has
to be replicated right away or whether it can be late
(latency).
• You need to know whether subscribers can update the
data (autonomy); and you need to know whether the
transactions need to be applied all at the same time
and in a specific order (consistency).
Summary
• When you have picked the right type of
replication, you have a number of physical
models to choose from:
– Central publisher/central distributor
– Remote distribution
– Central subscriber/multiple publishers
– Multiple publishers/multiple subscribers
– Heterogeneous replication
Summary
• Once you have implemented a replication solution,
you need to back it up.
– You should back up all the databases involved in
replication, but especially the distributor,
because if you do not, the transaction log in the
distribution database will fill up and replication
will stop.
• You should also generate replication scripts so that
if your server ever suffers a catastrophic failure,
you will be able to rebuild the replication solution
much faster.
Summary
• Also keep in mind all the points for enhancing
replication performance.
• Once you have implemented replication, your
users will come to depend on it, and if it doesn’t
move fast enough, it will not be dependable,
and your users will not be happy.
• If you keep it in top shape, though, users will be
able to take full advantage of the power of
replication.
Summary for Certification Examination
• Know the publisher/subscriber metaphor:
Publishers contain the original copy of the
data where changes are made.
• Subscribers receive copies of the data from
the publishers.
• The data are disseminated to the
subscribers through the distributor.
Summary for Certification Examination
• Know the types of replication: Three basic types of
replication exist: snapshot, transactional, and merge.
• In transactional replication, transactions are read right from
the transaction log and copied from the publisher to the
subscribers.
• In snapshot replication, the entire publication is copied
every time the publication is replicated.
• In merge replication, data from the publisher is merged with
data from the subscribers, which are allowed to update.
• With the immediate updating subscribers and queued
updating options, subscribers can make changes to data
that has been replicated with transactional and snapshot
replication as well.
Summary for Certification Examination
• Know the replication models: You need to be
familiar with the various models—that is, who
publishes, who subscribes, and who distributes.
• In the central publisher/central distributor model, a
single server is both the publisher and distributor,
and there are multiple subscribers.
• The remote distribution model has one publishing
server, a separate distributor, and multiple
subscribers.
Summary for Certification Examination
• In the central subscriber/multiple publishers
model, multiple publishers all publish to a single
subscribing server.
• The multiple publishers/multiple subscribers
model contains multiple publishing servers and
multiple subscribing servers. The number of
distributors is undefined.
• Heterogeneous replication describes replication to
a third-party database engine, such as DB2 or
Oracle.
Summary for Certification Examination
• Understand how publications and articles
work: A publication comprises articles that
contain the data being replicated. An article
is actually a representation of a table. The
article can be partitioned either vertically or
horizontally, and it can be transformed.
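Horizontal and vertical partitioning of an article can be pictured as filtering rows and filtering columns. The Python sketch below is a toy model: rows are plain dictionaries, and the function names and sample table are assumptions made for the example.

```python
def horizontal_partition(rows, predicate):
    """Keep only the rows that satisfy the filter (a subset of rows)."""
    return [row for row in rows if predicate(row)]


def vertical_partition(rows, columns):
    """Keep only the named columns from every row (a subset of columns)."""
    return [{col: row[col] for col in columns} for row in rows]


# Hypothetical table used only for illustration.
customers = [
    {"id": 1, "name": "Ann", "region": "East", "credit_limit": 500},
    {"id": 2, "name": "Bob", "region": "West", "credit_limit": 900},
]
east_only = horizontal_partition(customers, lambda row: row["region"] == "East")
no_credit = vertical_partition(customers, ["id", "name", "region"])
```

Here east_only contains only Ann's row with all of its columns, while no_credit contains both rows without the credit_limit column.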