Distributed Databases - School of Information and Communication


Lecture 11
Distributed Databases and Cloud Computing
Copyright © 1999 Addison Wesley Longman, Inc.
TM 11-1
Definitions
• Distributed Database: A single logical
database that is spread physically across
computers in multiple locations that are
connected by a data communications link.
• Decentralized Database: A collection of
independent databases on non-networked
computers.
Reasons for Distributed Database
• Local business units want control over data.
• Consolidate data across local databases for
integrated decision making.
• Reduce telecommunications costs.
• Reduce the risk of telecommunications
failures.
Distributed Database Options
• Homogeneous - Same DBMS at each node.
– Autonomous - Independent DBMSs.
– Non-autonomous - Central, coordinating
DBMS.
• Heterogeneous - Different DBMSs at
different nodes.
– Gateways - Simple paths are created to other
databases without the benefits of one logical
database.
(Figure: Distributed database environments)
Homogeneous, Non-Autonomous Database
• Data is distributed across all the nodes.
• Same DBMS at each node.
• All data is managed by the distributed
DBMS (no exclusively local data).
• All access is through one global schema.
• The global schema is the union of all the
local schemas.
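A minimal sketch of the last two points, assuming hypothetical site names and tables: the global schema is built as the union of the local schemas, and a query names only a table, so the user never needs to know which site holds it. The catalog layout and the functions `build_global_schema` and `locate` are illustrative, not part of any particular DBMS, and the sketch assumes each table lives at exactly one site.

```python
# Hypothetical local schemas: site -> {table: columns}.
LOCAL_SCHEMAS = {
    "site_a": {"customer": ("cust_id", "name", "region")},
    "site_b": {"orders": ("order_id", "cust_id", "amount")},
    "site_c": {"product": ("prod_id", "description")},
}


def build_global_schema(local_schemas):
    """Union the local schemas into one global catalog: table -> (site, columns)."""
    global_schema = {}
    for site, tables in local_schemas.items():
        for table, columns in tables.items():
            if table in global_schema:
                # Simplifying assumption of this sketch: each table lives at one site only.
                raise ValueError(f"table {table!r} is defined at more than one site")
            global_schema[table] = (site, columns)
    return global_schema


def locate(global_schema, table):
    """Resolve a table name to its site, so callers never hard-code a location."""
    site, _columns = global_schema[table]
    return site


GLOBAL_SCHEMA = build_global_schema(LOCAL_SCHEMAS)
print(locate(GLOBAL_SCHEMA, "orders"))  # site_b
```

Because every lookup goes through the global catalog, moving a table to another site changes only the catalog entry, not the queries.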
Focus on the Following: Heterogeneous Environment
• Data is distributed across all the nodes.
• A different DBMS may be used at each node.
• Local access is done using the local DBMS
and schema.
• Remote access is done using the global
schema.
Objectives and Trade-offs
• Location Transparency - User does not have
to know the location of the data.
• Local Autonomy - A local site can operate
with its database when the central site is down.
• Synchronous Distributed Database - All
copies of the same data are always identical.
• Asynchronous Distributed Database - Some
data inconsistency is tolerated.
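The difference between synchronous and asynchronous distribution can be sketched with in-memory replicas. This is a simplified illustration, not a real replication protocol: the class names are hypothetical, failures are ignored, and a real synchronous system would need a commit protocol such as two-phase commit so that a failure midway cannot leave some copies updated and others not.

```python
from collections import deque


class Replica:
    """One copy of the data at one site."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class SynchronousCluster:
    """Synchronous: a write is applied to every copy before write() returns."""

    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        # Simplification: no failure handling; a real system would use a
        # commit protocol (e.g. two-phase commit) around these updates.
        for replica in self.replicas:
            replica.data[key] = value


class AsynchronousCluster:
    """Asynchronous: the primary is updated at once; other copies catch up later."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = secondaries
        self.pending = deque()  # updates not yet shipped to the secondaries

    def write(self, key, value):
        self.primary.data[key] = value
        self.pending.append((key, value))

    def propagate(self):
        """Ship queued updates in order; until this runs, secondaries are stale."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.secondaries:
                replica.data[key] = value


primary, secondary = Replica("hq"), Replica("branch")
async_cluster = AsynchronousCluster(primary, [secondary])
async_cluster.write("price", 10)
stale = secondary.data.get("price")  # None: inconsistency is tolerated for now
async_cluster.propagate()
fresh = secondary.data.get("price")  # 10 once the update has propagated
```

The window between `write` and `propagate` is exactly the "some data inconsistency is tolerated" trade-off; the synchronous version has no such window but makes every write wait on every site.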
Advantages of Distributed Database
• Increased reliability and availability.
• Local control over data.
• Modular growth.
• Lower communication costs.
• Faster response for certain queries.
Disadvantages of Distributed Database
• Software cost and complexity.
• Processing overhead.
• Data integrity exposure.
• Slower response for certain queries.
Options for Distributing a Database
• Data replication.
• Horizontal partitioning.
• Vertical partitioning.
• Combinations of the above.
Data Replication
• Advantages
– Reliability.
– Fast response.
– May avoid complicated distributed transaction
integrity routines (if replicated data is refreshed
at scheduled intervals).
– De-couples nodes (transactions proceed even if
some nodes are down).
– Reduced network traffic at prime time (if
updates can be delayed).
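The reliability advantage can be put into numbers under one simplifying assumption: nodes fail independently, so a replicated item is unavailable only when every site holding a copy is down. The per-node availability of 0.95 below is a hypothetical figure chosen for illustration, not a measurement.

```python
def availability(node_availability, copies):
    """Probability that at least one copy is reachable, assuming independent node failures."""
    return 1 - (1 - node_availability) ** copies


# Hypothetical per-node availability of 0.95.
for copies in (1, 2, 3):
    print(copies, round(availability(0.95, copies), 6))
# 1 0.95
# 2 0.9975
# 3 0.999875
```

In practice failures are often correlated (shared power, network, or software bugs), so real gains are smaller than this formula suggests.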
Data Replication
• Disadvantages
– Additional requirements for storage space.
– Additional time for update operations.
– Complexity and cost of updating.
– Integrity exposure of getting incorrect data if
replicated data is not updated simultaneously.
• Therefore, replication is better suited to non-volatile
data.
Types of Data Replication
• Snapshot Replication – Changes are periodically sent to a master site
which sends an updated snapshot out to the
other sites.
• Near Real-Time Replication – Broadcast update orders without requiring
confirmation.
• Pull Replication – Each site controls when it wants updates.
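The three replication types can be contrasted in a small in-memory sketch. The class and function names are hypothetical, the "network" is a Python loop, and the three modes are modeled independently (the near real-time broadcast, for instance, bypasses the master's snapshot versions). It only shows who initiates each transfer and when a site sees new data.

```python
import copy


class Site:
    """A site holding its own copy of the replicated data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.version = 0  # version of the last snapshot this site applied


class MasterSite(Site):
    """The master site that collects changes and publishes snapshots."""

    def __init__(self):
        super().__init__("master")
        self.change_log = []  # changes reported by sites since the last snapshot

    def receive_change(self, key, value):
        self.change_log.append((key, value))

    def publish_snapshot(self):
        """Fold pending changes into the master copy and bump the version."""
        for key, value in self.change_log:
            self.data[key] = value
        self.change_log.clear()
        self.version += 1


def snapshot_push(master, sites):
    """Snapshot replication: the master sends its updated snapshot to every site."""
    master.publish_snapshot()
    for site in sites:
        site.data = copy.deepcopy(master.data)
        site.version = master.version


def near_real_time_broadcast(sites, key, value):
    """Near real-time: send the update to every site without waiting for confirmation."""
    for site in sites:
        site.data[key] = value  # fire and forget: no acknowledgement is collected


def pull(site, master):
    """Pull replication: the site decides when to fetch the latest published snapshot."""
    if site.version < master.version:
        site.data = copy.deepcopy(master.data)
        site.version = master.version


master = MasterSite()
east, west = Site("east"), Site("west")
master.receive_change("rate", 1.1)
snapshot_push(master, [east])  # east receives the snapshot immediately
west_before = dict(west.data)  # {}: west has not asked for updates yet
pull(west, master)             # west catches up when it chooses to
```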
Issues in Data Replication Use
• Data timeliness.
• Useful if DBMS cannot reference data from more
than one node.
• Batched updates can cause performance problems.
• Updates complicated with heterogeneous DBMSs
or database design.
• Telecommunications speeds may limit mass
updates.
Horizontal Partitioning
• Different records of a file at different sites.
• Advantages
– Data stored close to where it is used.
– Local access optimization.
– Security.
• Disadvantages
– Accessing data across partitions.
– No data replication.
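Horizontal partitioning can be sketched as a placement rule that sends whole rows to a site. The region-to-site mapping, the sample rows, and the function names are hypothetical. A local query touches one site; a query across partitions has to visit every site and union the results, which is the first disadvantage listed above.

```python
# Hypothetical placement rule: each customer row lives at the site for its region.
SITE_FOR_REGION = {"north": "site_north", "south": "site_south"}


def partition_rows(rows):
    """Horizontal partitioning: whole rows are assigned to sites by region."""
    partitions = {site: [] for site in SITE_FOR_REGION.values()}
    for row in rows:
        partitions[SITE_FOR_REGION[row["region"]]].append(row)
    return partitions


def local_query(partitions, site, predicate):
    """Runs against one site's rows only: the fast, local case."""
    return [row for row in partitions[site] if predicate(row)]


def global_query(partitions, predicate):
    """Accessing data across partitions: every site is visited and results are unioned."""
    results = []
    for rows in partitions.values():
        results.extend(row for row in rows if predicate(row))
    return results


customers = [
    {"id": 1, "region": "north", "balance": 120},
    {"id": 2, "region": "south", "balance": 80},
    {"id": 3, "region": "north", "balance": 40},
]
parts = partition_rows(customers)
rich = global_query(parts, lambda row: row["balance"] > 50)  # customers 1 and 2
```

Each row exists at exactly one site here, so if `site_south` is down, customer 2 is simply unavailable: the "no data replication" disadvantage.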
Vertical Partitioning
• Different columns of a file at different sites.
• Advantages and disadvantages are the same
as for horizontal partitioning except that
combining data across partitions is more
difficult because it requires joins.
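Vertical partitioning can be sketched by splitting each row into column groups that all carry the primary key; reassembling a full row then needs a join on that key. The employee table, the column split, and the helper names are hypothetical.

```python
def vertical_partition(rows, key, column_groups):
    """Split each row into fragments by column group; every fragment keeps the key."""
    return [
        [{key: row[key], **{col: row[col] for col in columns}} for row in rows]
        for columns in column_groups
    ]


def rejoin(fragments, key):
    """Recombine fragments with an inner join on the shared key."""
    indexes = [{row[key]: row for row in fragment} for fragment in fragments]
    common_keys = set(indexes[0]).intersection(*indexes[1:])
    joined = []
    for k in sorted(common_keys):
        merged = {}
        for index in indexes:
            merged.update(index[k])
        joined.append(merged)
    return joined


employees = [
    {"emp_id": 1, "name": "Ada", "dept": "R&D", "salary": 5000},
    {"emp_id": 2, "name": "Lin", "dept": "Sales", "salary": 4200},
]
# Hypothetical split: public columns at one site, payroll columns at another.
public, payroll = vertical_partition(employees, "emp_id", [("name", "dept"), ("salary",)])
full_rows = rejoin([public, payroll], "emp_id")  # equals the original rows
```

Repeating the key in every fragment is what makes the join possible; without it the fragments could not be matched back up.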
Factors in Choice of Distributed Strategy
• Funding, autonomy, security.
• Site data referencing patterns.
• Growth and expansion needs.
• Technological capabilities.
• Costs of managing complex technologies.
• Need for reliable service.
Cloud computing
• Cloud computing is the latest evolution of Internet-based
computing.
• The potential benefits of cloud computing are
overwhelming. However, attaining these benefits requires
that each aspect of the cloud platform support the key
design principles of the cloud model.
• One of the core design principles is dynamic scalability, or
the ability to provision and decommission servers on
demand.
• Unfortunately, the majority of today’s database servers are
incapable of satisfying this requirement.
Key Benefits of Cloud Computing:
• Lower costs: All resources, including expensive networking
equipment, servers, IT personnel, etc., are shared, resulting in reduced
costs, especially for small to mid-sized applications and prototypes.
• Dynamic scalability: Most applications experience spikes in traffic.
Instead of over-buying your own equipment to accommodate these
spikes, many cloud services can smoothly and efficiently scale to
handle these spikes with a more cost-effective pay-as-you-go model.
• Simplified maintenance: Upgrades are rapidly deployed across the
shared infrastructure, as are backups.
• Large-scale testing: Cloud computing makes large-scale prototyping
and load testing much easier. You can easily spawn 1,000 servers in the
cloud to load test your application and then release them as soon as
you are done.
• Faster development: Cloud computing platforms provide many of the
core services that, under traditional development models, would
normally be built in-house. These services, plus templates and other
tools, can significantly accelerate the development cycle.
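The dynamic-scalability benefit can be illustrated with a small calculation. Everything here is hypothetical: the six-hour load profile, the assumption that each server handles a fixed 500 requests per second, and billing by the server-hour. The sketch compares buying enough servers for the peak and running them all the time against provisioning only what each hour needs.

```python
import math


def servers_needed(load, capacity_per_server, min_servers=1):
    """Smallest number of servers that can carry `load`, never below `min_servers`."""
    return max(min_servers, math.ceil(load / capacity_per_server))


def server_hours(hourly_load, capacity_per_server):
    """Return (fixed, on_demand) server-hours for the same load profile."""
    per_hour = [servers_needed(load, capacity_per_server) for load in hourly_load]
    fixed = max(per_hour) * len(per_hour)  # buy enough for the peak, run it all the time
    on_demand = sum(per_hour)              # provision and release hour by hour
    return fixed, on_demand


# Hypothetical profile: requests per second in each of six hours, with one spike.
HOURLY_LOAD = [200, 200, 250, 1800, 300, 200]
fixed, on_demand = server_hours(HOURLY_LOAD, capacity_per_server=500)
print(fixed, on_demand)  # 24 9
```

Whether on-demand is actually cheaper also depends on the per-hour price of cloud servers compared with owned hardware, which this sketch leaves out.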
Evolving Cloud Database Requirements
• Cloud database usage patterns are evolving, and business adoption of these
technologies accelerates that evolution. Initially, cloud databases served
consumer applications. These early applications prioritized read access,
because the ratio of reads to writes was very high. Delivering high-performance
read access was the primary purchase criterion. However, this is
changing.
• Consumer-centric cloud database applications have been evolving with the
adoption of Web 2.0 technologies. User-generated content, particularly in the
form of social networking, has placed more emphasis on updates.
Reads still outnumber writes, but the gap is narrowing.
Support for transactional business applications shrinks the gap between
database updates and reads further. Business applications also
demand that the cloud database be ACID compliant: providing Atomicity,
Consistency, Isolation and Durability.
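Atomicity, the "A" in ACID, is easiest to see in a money transfer: either both the debit and the credit happen, or neither does. The sketch below uses Python's built-in `sqlite3` module purely as a local, in-process stand-in (it is not a cloud database); the `account` table and the `transfer` helper are hypothetical. The `CHECK` constraint plays the role of a consistency rule, and the connection's context manager rolls back the whole transaction when it fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; return True if committed."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            # If this debit would make the balance negative, the CHECK constraint fails
            # and the credit above is rolled back too (atomicity).
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


print(transfer(conn, 1, 2, 30), balances(conn))   # True {1: 70, 2: 80}
print(transfer(conn, 1, 2, 500), balances(conn))  # False {1: 70, 2: 80}
```

The credit is deliberately executed first so that the failed transfer shows a partial change being undone rather than never attempted.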
The Achilles Heel of Cloud Databases
• Dynamic scalability, one of the core principles of cloud
computing, has proven to be a particular problem for
databases. The reason is simple: most databases use a shared-nothing
architecture. The shared-nothing architecture relies on
splitting (partitioning) the data into separate silos of data, one
per server.
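Why partitioning gets in the way of elastic scaling can be shown with the simplest placement rule, assumed here for illustration only: key `k` lives on server `k mod N`. The slide does not say which rule a given database uses, and `owner` and `moved_fraction` are hypothetical helpers. Adding a third server to a two-server cluster changes the owner of most keys, so that data must be physically moved before the new server is useful.

```python
def owner(key, num_servers):
    """Hypothetical placement rule: a key lives on server (key mod number of servers)."""
    return key % num_servers


def moved_fraction(num_keys, old_servers, new_servers):
    """Fraction of keys whose owning server changes when the cluster is resized."""
    moved = sum(
        1 for key in range(num_keys) if owner(key, old_servers) != owner(key, new_servers)
    )
    return moved / num_keys


print(moved_fraction(6000, 2, 3))  # about 0.667: two thirds of the keys change servers
```

Placement schemes such as consistent hashing reduce the share of keys that move when a server is added, but some data still has to be relocated, which is the friction the slide describes.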
Are Replicated Tables the Answer?
• Since data partitioning and cloud databases are inherently
incompatible, Amazon, Facebook and Google have taken
another approach to solve the cloud database challenge.
They have created persistence engines (technically not
databases) that abandon typical ACID compliance in
favor of replicated tables of data that store and retrieve
information while supporting dynamic or elastic
scalability.
• Google offers BigTable, Amazon has SimpleDB and
Facebook is working on Cassandra. However, these are not
a replacement for a real database, and they do not address
corporate cloud computing requirements.
The Shared-Disk Database Architecture is Ideal for Cloud Databases
• The database architecture called shared-disk, which eliminates the
need to partition data, is ideal for cloud databases. Shared-disk
databases allow clusters of low-cost servers to use a single collection
of data, typically served up by a Storage Area Network (SAN) or
Network Attached Storage (NAS). All of the data is available to all of
the servers; there is no partitioning of the data. As a result, if you are
using two servers and your query takes 0.5 seconds, you can
dynamically add another server and the same query might now take 0.35
seconds. In other words, shared-disk databases support elastic
scalability.
• In addition to elastic scalability, the shared-disk DBMS architecture
has other important advantages that make it very appealing for
deployment in the cloud.
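The shared-disk example (two servers at 0.5 seconds, three servers at about 0.35 seconds) can be checked against ideal linear scaling, assuming query time falls in exact proportion to the number of servers. Under that assumption three servers would need about 0.33 seconds, a 1.5x speedup; the slide's 0.35 seconds corresponds to roughly 1.43x, slightly below ideal, which fits its hedged "might now take". The helper names are hypothetical.

```python
def ideal_time(base_time, base_servers, new_servers):
    """Query time under perfectly linear scaling (time inversely proportional to servers)."""
    return base_time * base_servers / new_servers


def speedup(old_time, new_time):
    return old_time / new_time


ideal = ideal_time(0.5, 2, 3)
print(round(ideal, 3), round(speedup(0.5, ideal), 2))  # 0.333 1.5
print(round(speedup(0.5, 0.35), 2))                    # 1.43 (the slide's example)
```

Real clusters usually fall short of linear scaling because servers share the storage network and must coordinate access to the same data.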
Conclusion
• Whether you are assembling, managing or developing on a
cloud computing platform, you need a cloud-compatible
database.
• Shared-nothing databases require data partitioning, which
is structurally incompatible with dynamic scalability, a
core foundation of cloud computing.
• The shared-disk database architecture, on the other hand,
does support elastic scalability. It also supports other cloud
objectives such as lower costs for hardware, maintenance,
tuning and support.