2515 - Distributed Databases

G063 - Distributed Databases
Learning Objectives:
By the end of this topic you should be able to:
• explain how databases may be stored in more than
one physical location
• explain the methods by which this distribution may be
carried out
• explain reasons why distribution would be carried out
• explain the security issues of distributed databases
Database storage:
A Distributed Database is:
• a single logical database
– consisting of many entities
– possibly used by many users for different purposes
• not stored in its entirety at a single physical location
• spread physically across a number of computers
– the computers could be in multiple locations
  ▪ e.g. different buildings or sites
– the computers are connected by a data communications link
  ▪ a LAN and/or WAN
Why distribute a database:
• allows faster local queries
– faster searching
• speeds up other network operations
– because some queries are handled locally
  ▪ which reduces network traffic
• improved reliability
– data may be replicated at multiple sites
• allows for modular growth of the database
– can easily add new sites and/or uses
• the user does not need to know where the data is physically stored
– to the user, it looks like a single, centralised system at one location
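The last two points (local queries and the user not needing to know where data is stored) can be sketched in Python. This is a minimal illustration, not a real DBMS: it assumes a hypothetical catalogue (`CATALOGUE`) recording which site holds each table, and the site names, tables and the `DistributedDatabase.select` method are all invented for the example.

```python
# Hypothetical sites: each site stores some tables as lists of row dicts.
SITES = {
    "head_office": {"customers": [{"id": 1, "name": "Ali"}, {"id": 2, "name": "Beth"}]},
    "branch_1": {"orders": [{"id": 10, "customer_id": 1, "item": "desk"}]},
}

# Hypothetical catalogue: which site physically holds which table.
CATALOGUE = {"customers": "head_office", "orders": "branch_1"}


class DistributedDatabase:
    """Presents several sites to the user as one logical database."""

    def __init__(self, sites, catalogue, local_site):
        self.sites = sites
        self.catalogue = catalogue
        self.local_site = local_site
        self.remote_requests = 0  # counts simulated network round trips

    def select(self, table, where=lambda row: True):
        site = self.catalogue[table]  # looked up for the user, who never names a site
        if site != self.local_site:
            self.remote_requests += 1  # data has to cross the LAN/WAN
        return [row for row in self.sites[site][table] if where(row)]


db = DistributedDatabase(SITES, CATALOGUE, local_site="branch_1")
orders = db.select("orders", lambda r: r["customer_id"] == 1)  # answered locally
names = [c["name"] for c in db.select("customers")]            # fetched from head office
print(orders, names, db.remote_requests)
```

The query code is identical whether the table is local or remote; only the (hypothetical) request counter shows which queries needed the network.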
Types of Distributed Database
• Replicated
• Centralised
• Partitioned
Replicated Database
• the complete database is duplicated at each centre
• an exact copy of the database is stored & accessed locally
• duplicated versions are usually read only
– a transaction file of changes is created at each centre
• updates are only made on a master database
– a ‘new’, updated copy of the database is sent to each centre at regular intervals
Replicated Database
Advantages:
• reliability
– data is always available locally
– not reliant on the network or central server
– work carries on even if some nodes are down
• fast response to searches
– local access will be faster than WAN access
  ▪ data does not have to be transmitted over the network
• reduced network traffic at prime time
– so the network is faster when it is needed for other tasks
Replicated Database
Disadvantages:
• additional storage space required at each local site
• additional time for update operations
• complexity and cost of updating
• data integrity issues
– if replicated data is not updated simultaneously
– local copies of data may be different
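The replication cycle described above (read-only local copies, a transaction file at each centre, updates applied to a master, a new copy sent out at regular intervals) can be sketched as below, and the same sketch shows why local copies may differ until the refresh runs. It is a minimal illustration: the class names, the `periodic_update` function and the stock data are hypothetical.

```python
import copy


class MasterDatabase:
    """The master copy: the only place where updates are applied."""

    def __init__(self, data):
        self.data = dict(data)

    def apply(self, transactions):
        for key, value in transactions:
            self.data[key] = value

    def snapshot(self):
        # A 'new', updated copy of the whole database for one centre.
        return copy.deepcopy(self.data)


class ReplicaCentre:
    """A centre holding a read-only local copy and a transaction file."""

    def __init__(self, snapshot):
        self._copy = snapshot
        self.transaction_file = []

    def read(self, key):
        return self._copy.get(key)  # answered locally, no network access

    def record_change(self, key, value):
        # The local copy is read only, so the change is only logged.
        self.transaction_file.append((key, value))

    def refresh(self, snapshot):
        self._copy = snapshot


def periodic_update(master, centres):
    """Hypothetical refresh cycle, run 'at regular intervals'."""
    for centre in centres:
        master.apply(centre.transaction_file)
        centre.transaction_file = []
    for centre in centres:
        centre.refresh(master.snapshot())


master = MasterDatabase({"stock:A1": 5})
centre_a = ReplicaCentre(master.snapshot())
centre_b = ReplicaCentre(master.snapshot())

centre_a.record_change("stock:A1", 4)
before = (centre_a.read("stock:A1"), centre_b.read("stock:A1"))  # (5, 5): change not yet visible
periodic_update(master, [centre_a, centre_b])
after = (centre_a.read("stock:A1"), centre_b.read("stock:A1"))   # (4, 4)
print(before, after)
```

In this simplified sketch, if two centres log changes to the same item, whichever transaction file is applied last simply wins; a real replication scheme has to detect or resolve such conflicts.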
Centralised Database
• single database held centrally (possibly at Head Office)
• each node accesses the database through a network (WAN)
– access is available to all branches or offices
• an index to the central database is held locally at each
node
– speeds up queries/transactions
• booking systems need distributed access to a central
database if they are to work effectively
– sharing of up-to-date information is important
– avoids double bookings
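A centralised database with a local index at each node can be sketched as follows. This is a minimal illustration under stated assumptions: the class names, the `wan_requests` counter and the flight/seat records are hypothetical. The local index lets a branch find the record it needs without using the network, and because there is only one copy of each booking, a second branch cannot book the same seat.

```python
class CentralDatabase:
    """The single copy of the data, held centrally (e.g. at Head Office)."""

    def __init__(self, records):
        self.records = records  # record id -> record dict
        self.wan_requests = 0   # counts simulated WAN round trips

    def fetch(self, record_id):
        self.wan_requests += 1
        return dict(self.records[record_id])  # send a copy of one record

    def book(self, record_id, customer):
        self.wan_requests += 1
        record = self.records[record_id]
        if record["booked_by"] is not None:
            return False  # already booked elsewhere: no double booking
        record["booked_by"] = customer
        return True


class BranchNode:
    """A branch holding only a local index to the central database."""

    def __init__(self, central, index):
        self.central = central
        self.index = index  # search key -> record id, stored at the branch

    def find(self, key):
        record_id = self.index.get(key)  # resolved locally, no WAN traffic
        if record_id is None:
            return None
        return self.central.fetch(record_id)  # only the required record is sent

    def book(self, key, customer):
        return self.central.book(self.index[key], customer)


central = CentralDatabase({101: {"flight": "XY1", "seat": "12A", "booked_by": None}})
index = {("XY1", "12A"): 101}
branch_1 = BranchNode(central, index)
branch_2 = BranchNode(central, index)

first = branch_1.book(("XY1", "12A"), "customer_1")   # True
second = branch_2.book(("XY1", "12A"), "customer_2")  # False
missing = branch_2.find(("XY1", "99Z"))               # None, no WAN request made
print(first, second, branch_2.find(("XY1", "12A"))["booked_by"], central.wan_requests)
```

The sketch is single-threaded; in a real system the check and the update inside `book` would have to run as one transaction at the central server.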
Centralised Database
Advantages:
• better security of data
– one copy rather than several (replicated copies)
– security handled centrally
• good data integrity
– one copy rather than several
  ▪ so all sites always share the same data
• data can be updated in real time
– data always up-to-date
• centralised backup
– can be automated
Centralised Database
Advantages (from June 2011 Q13 mark scheme):
• storage is only required at the central location for the centralised database (1);
the local indexes stored at each site take up far less memory (1)
• queries are processed locally (1); this speeds up searches as only the required
data is retrieved from the central location (1)
• less data traffic than complete centralisation (1), as only data is sent and not
the additional information/forms/reports structure (1)
• increased security (1): only the central database needs increased security as that is
where the data is stored (1)
• integrity of data not compromised (1), as it is stored in only one location and there is
only one database to update (1)
• centralised back-up of data (1): managing backup is easier as it is just one
person’s responsibility (1)
Centralised Database
Drawbacks:
• a virus in the central system could spread throughout
all sites
• possibility of update clashes
– two sites trying to modify the same record at the same
time
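The slides name update clashes but not a remedy. One common technique, swapped in here for illustration, is optimistic concurrency control: each record carries a version number, and an update is rejected if the record has changed since it was read. The sketch below assumes a hypothetical `CentralTable` class and customer record; it is not taken from any particular DBMS.

```python
class UpdateClash(Exception):
    """Raised when a record was changed by another site after it was read."""


class CentralTable:
    def __init__(self):
        self.rows = {}  # key -> (version, value)

    def insert(self, key, value):
        self.rows[key] = (0, value)

    def read(self, key):
        return self.rows[key]  # returns (version, value)

    def update(self, key, read_version, new_value):
        current_version, _ = self.rows[key]
        if current_version != read_version:
            raise UpdateClash(f"{key!r} was changed by another site")
        self.rows[key] = (current_version + 1, new_value)


table = CentralTable()
table.insert("customer:42", {"address": "1 High St"})

version_a, _ = table.read("customer:42")  # site A reads the record
version_b, _ = table.read("customer:42")  # site B reads it at the same time

table.update("customer:42", version_a, {"address": "2 Low Rd"})  # site A succeeds
try:
    table.update("customer:42", version_b, {"address": "3 Mill Ln"})  # site B clashes
    clash_detected = False
except UpdateClash:
    clash_detected = True

print(clash_detected, table.read("customer:42"))
```

Site B's change is refused rather than silently overwriting site A's, so B can re-read the record and decide what to do. Record locking is the main alternative.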
Partitioned Database
• database is split into sections
• each node or site on the network stores local data
– i.e. the section of the database that relates to that site
  ▪ e.g. the section of the database that relates to a single supermarket’s stock is stored at that site
• other (global) data is held centrally
– changes to central data can be dealt with overnight by a
batch update from the sites
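The split between local data and central (global) data, with an overnight batch update, can be sketched as below. The `Site` and `CentralServer` classes, the `overnight_batch` method and the stock figures are hypothetical. The sketch also shows the consistency problem listed under the drawbacks: during the day the central copy lags behind the site.

```python
class Site:
    """A site storing its own section of the database plus queued changes."""

    def __init__(self, name, local_stock):
        self.name = name
        self.local_stock = dict(local_stock)  # this site's section only
        self.outbox = []                      # changes waiting for the batch run

    def sell(self, item, quantity):
        self.local_stock[item] -= quantity     # applied locally at once
        self.outbox.append((item, -quantity))  # sent to the centre later


class CentralServer:
    """Holds the global (company-wide) data."""

    def __init__(self):
        self.global_stock = {}  # (site name, item) -> quantity

    def register(self, site):
        for item, quantity in site.local_stock.items():
            self.global_stock[(site.name, item)] = quantity

    def overnight_batch(self, sites):
        for site in sites:
            for item, delta in site.outbox:
                self.global_stock[(site.name, item)] += delta
            site.outbox.clear()


central = CentralServer()
store = Site("store_1", {"milk": 40})
central.register(store)

store.sell("milk", 3)
during_day = central.global_stock[("store_1", "milk")]   # 40: central copy is out of date
central.overnight_batch([store])
after_batch = central.global_stock[("store_1", "milk")]  # 37
print(during_day, after_batch)
```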
Horizontal partitioning
• involves putting different rows into different tables.
• splitting the table into a number of smaller tables
– on the basis of rows (records)
  ▪ i.e. according to the contents of a specific field
Example:
• branch offices in an organization deal mostly with a set
of local customers
– Euston Road branch stores the fragment where contents of
the Branch field = 'Euston Road'
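Horizontal partitioning on the Branch field can be sketched as follows. The estate agency table from the slides is not reproduced here, so the rows, property references and prices are hypothetical, and `partition_horizontally` is an illustrative helper, not a library function. Each fragment keeps whole rows, and the logical table is the union of all the fragments.

```python
from collections import defaultdict

# Hypothetical rows from a single logical 'properties' table.
PROPERTIES = [
    {"ref": "P1", "branch": "Boldmere", "price": 250000},
    {"ref": "P2", "branch": "Euston Road", "price": 410000},
    {"ref": "P3", "branch": "Boldmere", "price": 180000},
]


def partition_horizontally(rows, field):
    """Split rows into fragments keyed by the contents of `field`."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[row[field]].append(row)  # whole rows, all columns kept
    return dict(fragments)


fragments = partition_horizontally(PROPERTIES, "branch")
boldmere_local = fragments["Boldmere"]  # the fragment stored on the Boldmere server

# The logical table is rebuilt as the union of all fragments.
rebuilt = [row for fragment in fragments.values() for row in fragment]
print([row["ref"] for row in boldmere_local])
```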
Horizontal partitioning
• this table represents the database for an estate
agency with 3 branches
Horizontal partitioning
• the database is horizontally partitioned
– so that the data for each branch is stored on the
server in that branch
– this will speed up local queries
  ▪ e.g. Boldmere staff searching for properties in Boldmere
Horizontal partitioning
• this means that the data is stored like this:
Vertical partitioning
• dividing the table based on the different columns.
• involves creating tables with fewer columns
– using additional tables to store the remaining columns.
• different columns of a table located at different sites
– e.g. stock descriptions (country of origin, supplier name) at
one site and prices at another site
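Vertical partitioning of the stock example can be sketched as below. The stock rows and the helpers `partition_vertically` and `rejoin` are hypothetical. Each fragment keeps the primary key (`stock_id`, also an assumption) so that the full rows can be rebuilt by joining the fragments on it.

```python
# Hypothetical stock table; 'stock_id' is assumed to be the primary key.
STOCK = [
    {"stock_id": 1, "description": "Tea", "country_of_origin": "India",
     "supplier_name": "Supplier A", "price": 2.50},
    {"stock_id": 2, "description": "Coffee", "country_of_origin": "Brazil",
     "supplier_name": "Supplier B", "price": 4.00},
]


def partition_vertically(rows, key, columns):
    """Return a fragment holding the key plus the chosen columns."""
    return [{key: row[key], **{c: row[c] for c in columns}} for row in rows]


def rejoin(left, right, key):
    """Rebuild full rows by joining two fragments on the shared key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


# Descriptions at one site, prices at another.
site_1 = partition_vertically(STOCK, "stock_id",
                              ["description", "country_of_origin", "supplier_name"])
site_2 = partition_vertically(STOCK, "stock_id", ["price"])

rebuilt = rejoin(site_1, site_2, "stock_id")
print(site_2)
```

Keeping the key in every fragment is what makes the split reversible; a user at the price site never receives the description columns.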
Vertical partitioning
From June 2011 Q13 mark scheme:
• only certain people see certain fields
– e.g. financial matters not revealed to all (1)
• to conform to the law, e.g. the Data Protection Act (DPA) (1)
– keeping personal information private (1)
• reduces amount of data being sent between locations (1)
– in order to speed up data transfer (1)
– allowing faster reaction time (1)
– meaning rescue services reach an emergency more quickly (1)
Partitioned Database
Advantages:
• speed:
– faster access to local data
  ▪ less network access required
• local control over local data
• scalability
– can add new sites as required
• not reliant on network or server for day-to-day tasks
• each partition can have its own transaction log
– local reporting (access/sales)
Partitioned Database
Drawbacks:
• data inconsistency
– the data held centrally may differ from the data held on a
partition
– regular batch updates are required to maintain consistency
• unsuitable for certain applications
– where data changes at one node must be seen instantly by all nodes
  ▪ e.g. holiday bookings
• high network usage during update process
– will slow down other network processes