UU - DIS - UDBL
DATABASE SYSTEMS - 10p
Course No. 2AD235
Spring 2002
A second course on development of
database systems
Kjell Orsborn
Uppsala Database Laboratory
Department of Information Technology, Uppsala University,
Uppsala, Sweden
Kjell Orsborn
2016-04-11
Introduction to Distributed DBMSs
(Elmasri/Navathe ch. 24)
Distributed DBMS (ch. 24.4 and 24.5 are omitted)
Distributed DBMSs
• A distributed database (DDB) is a collection of several logically
interrelated databases distributed over a computer network including
a number of computers (nodes).
• A distributed database management system (DDBMS) is a software
system that permits management of DDBs and that makes the
distribution transparent for the user.
• A DDB is not:
– a collection of files (need structure and DB manager)
– a client-server interface to a database
• data on one node, clients on other nodes in network
• (almost) every centralized DBMS has client-server interface
Background
• What is a Distributed System?
• A Distributed System is a number of autonomous computers
communicating over a network with software for integrated tasks.
• Examples of Distributed Systems:
• SUN’s Network File System (NFS), distributed file system
Distributed DBMSs . . .
[Figure: two panels, "Distributed database over several nodes in a network" and "Centralized database in a network", each showing Nodes 1-5 connected by a communication network.]
Centralized Database Server
• Stream (row-by-row) based client-server interfaces
• DBMS specific interfaces
• Compiler integrated interfaces (embedded SQL)
• ODBC: SQL-based standardized subroutine call library (Microsoft)
• JDBC: ODBC for Java (not Microsoft)
Distributed Databases
• Database seen as one unit; queries and updates to ONE database.
• Data in database transparently distributed over many DB nodes.
• Manual partitioning or fragmentation of data tables.
• DBMS automatically optimizes queries and updates to distributed database.
Multi-Databases
• Database seen as several heterogeneous units
• Multi-database query language needed to combine data from the
databases.
• Primitives needed to integrate (combine, fuse) data from the databases.
• Special query optimization techniques to deal with heterogeneity and
dynamism.
Example of Multi-Database
• Automatic Teller Machines, ATMs
Fragmentation of data
• data fragmentation (= data partitioning)
• division of data sets (e.g. a relation) into several pieces - fragments
transparently stored on several different nodes
• increased accessibility and performance
• several types of fragmentation:
– horizontal fragmentation
– vertical fragmentation
– mixed fragmentation
• good when nodes are far apart
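The fragmentation types listed above can be illustrated with a minimal Python sketch. The EMPLOYEE relation, its attribute names, and the split by city are illustrative assumptions, not part of the course material; a real DDBMS would store each fragment on a different node.

```python
# Hypothetical EMPLOYEE relation; attribute names and values are made up.
EMPLOYEE = [
    {"id": 1, "name": "Ann", "city": "Uppsala", "salary": 100},
    {"id": 2, "name": "Bo", "city": "Lund", "salary": 200},
    {"id": 3, "name": "Cy", "city": "Uppsala", "salary": 300},
]


def horizontal_fragment(relation, predicate):
    """Select whole rows: each fragment holds a subset of the tuples."""
    return [row for row in relation if predicate(row)]


def vertical_fragment(relation, attributes, key="id"):
    """Project columns: each fragment keeps the key so rows can be rejoined."""
    keep = [key] + [a for a in attributes if a != key]
    return [{a: row[a] for a in keep} for row in relation]


def rebuild_horizontal(*fragments, key="id"):
    """Reconstruct the relation as the union of its row fragments."""
    rows = [row for fragment in fragments for row in fragment]
    return sorted(rows, key=lambda row: row[key])


def rebuild_vertical(left, right, key="id"):
    """Reconstruct the relation by joining column fragments on the key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


# Horizontal: rows split by city, e.g. one fragment per site.
uppsala = horizontal_fragment(EMPLOYEE, lambda r: r["city"] == "Uppsala")
others = horizontal_fragment(EMPLOYEE, lambda r: r["city"] != "Uppsala")

# Vertical: columns split, both fragments keep the key.
public = vertical_fragment(EMPLOYEE, ["name", "city"])
payroll = vertical_fragment(EMPLOYEE, ["salary"])

# Mixed: a vertical fragment of a horizontal fragment.
uppsala_payroll = vertical_fragment(uppsala, ["salary"])
```

Horizontal fragments are recombined by union and vertical fragments by a join on the key, which is why every vertical fragment in this sketch keeps the `id` attribute.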
Replication of data
• copies of the same data on several nodes
• increased reliability and access performance
• more complex updating, transaction handling, and recovery.
– updates must be propagated to each replica!
– special procedures after failures to restore consistency
– more problematic transaction synchronization!
• types of replication:
– full replication (whole db at each node)
– no replication (each fragment only at one node)
– partial replication (certain fragments replicated)
– not necessary to replicate all tables
– full replication often not realistic!
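Update propagation and restoring consistency after a failure can be sketched as a simple read-one / write-all scheme. This is a minimal sketch under strong assumptions: the `Node` and `ReplicatedItemStore` classes are hypothetical, and concurrent transactions and network partitions are ignored entirely.

```python
class Node:
    """One site holding a local copy of some data items (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True


class ReplicatedItemStore:
    """Read-one / write-all over the nodes that hold a replica."""

    def __init__(self, nodes):
        self.nodes = nodes
        # Updates each node missed while it was down, to be replayed later.
        self.missed = {node.name: {} for node in nodes}

    def write(self, key, value):
        # Updates must be propagated to each replica.
        for node in self.nodes:
            if node.up:
                node.data[key] = value
            else:
                # Remember what a failed node missed so it can catch up.
                self.missed[node.name][key] = value

    def read(self, key):
        # Any available replica can serve a read.
        for node in self.nodes:
            if node.up:
                return node.data[key]
        raise RuntimeError("no replica available")

    def recover(self, node):
        # Special procedure after a failure: replay the missed updates.
        node.data.update(self.missed[node.name])
        self.missed[node.name].clear()
        node.up = True


a, b = Node("A"), Node("B")
store = ReplicatedItemStore([a, b])
store.write("x", 1)
b.up = False                 # node B crashes
store.write("x", 2)          # B misses this update
value_during_failure = store.read("x")
store.recover(b)             # B catches up and is consistent again
```

Reads stay available while B is down, which is the reliability gain; the bookkeeping in `missed` and `recover` is the extra cost that replication adds.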
Transparency in a DDBMS
• By transparency we here mean the hiding of basic implementation details from one abstraction
level to another.
• Data independence
– logical data independence
– physical data independence
• Network transparency
– protect user from operational details of network
– hides the existence of a network
– no machine names in database table references
• location transparency
• naming transparency
• Replication transparency
– user should not experience data replicas
– automatic handling of updates, such as replica propagation
– automatic handling of node crashes
• Fragmentation transparency
– hides the existence of fragments
• e.g. that a logical relation is horizontally fragmented into local physical tables
– handling of transformation of global queries to fragmented queries
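The transformation of a global query into fragment queries can be sketched as follows. This is a minimal Python sketch: the fragment catalog, the node and table names, and the `plan` and `query_employee` functions are hypothetical, and the "remote subquery" is simulated with in-memory lists.

```python
# Hypothetical fragmentation schema: the global relation EMPLOYEE is
# horizontally fragmented by city into local tables on two nodes.
FRAGMENTS = {
    "node1.emp_uppsala": {
        "city": "Uppsala",
        "rows": [{"id": 1, "name": "Ann", "city": "Uppsala"},
                 {"id": 3, "name": "Cy", "city": "Uppsala"}],
    },
    "node2.emp_lund": {
        "city": "Lund",
        "rows": [{"id": 2, "name": "Bo", "city": "Lund"}],
    },
}


def plan(city=None):
    """Pick the fragments a global query must visit.

    A fragment is skipped when its defining predicate contradicts the
    query's selection on city.
    """
    return [name for name, frag in FRAGMENTS.items()
            if city is None or frag["city"] == city]


def query_employee(city=None):
    """Global query on EMPLOYEE: the caller never names a fragment or node."""
    result = []
    for name in plan(city):
        rows = FRAGMENTS[name]["rows"]  # stands in for a remote subquery
        result.extend(r for r in rows if city is None or r["city"] == city)
    return sorted(result, key=lambda r: r["id"])
```

A caller writes `query_employee("Lund")` against the logical relation only; which node and local table answer the query is decided inside `plan`.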
Advantages of Distributed DBMSs . . .
• Data sharing
– uniform interface and sharing of data through the DDBMS
– natural to distribute certain database applications
• Increased reliability
– redundancy increases security and accessibility
– crashes less severe (if application not dependent on non-local data)
• Local independence
– allows sharing of data but keeps local control of data
• Improved performance
– avoid unnecessary data transfer
• Expandability
– easy to add new nodes (not always linear scale up due to central directory)
• Local autonomy
– local control
– local policies
Problems with Distributed DBMSs . . .
• Complexity
– database administration becomes more complex (such as recovery)
– increased complexity of system design, implementation and maintenance
• Security
– maintaining security in a network is harder
• Networking a known problem
• Distributed administration
– less control and more meetings
• Cost
– hardware - software - development/maintenance
Problems with Distributed DBMSs . . .
• Distributed schema management
– schema is accessed whenever SQL query issued!
– global directory => Central Database becomes hot spot
– local directories => Data replication
– => Since schema is not updated often but needs to be accessed very often, it
is normally fully replicated by the DDBMS.
• Distributed concurrency control
– consistency of replicas: mutual consistency
• Distributed deadlock management
• Reliability of DDBMS
– consistency of replicas
– bring up (fragmented) database at failed sites
• OS Support
– multiple layers of network software
Additional functionality required by DDBMS
• Access of physically divided databases - schema management
• Handling of distribution and replication of data
– e.g. which copy of the data should be used
• Handling of consistency of replicated data
• Handling of distributed queries
• Handling of distributed transactions (over several network
nodes)
• Handling of recovery/restart from crashes (of nodes) and new
types of errors such as communication errors/failures.
Distributed database design
• Goal:
– to minimize the combined cost of maintaining data while achieving efficient
communication and good performance for transactions.
• Problems:
– where (on which node/nodes) shall data and applications be placed
– partitioning of data (split data into distributed partitions)
– replication of data (copies of data on several nodes)
– NP-complete optimization problem.
– distributed query processing
• automatically done by distributed query processor of DDBMS
• analyze query --> distributed execution plan
• factors:
– data replication
– data availability
– communication costs
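How the three factors above can enter a distributed execution plan is sketched below in Python. The catalog contents, the per-row shipping costs, and the `execution_plan` function are illustrative assumptions; a real distributed query processor uses a far richer cost model (join ordering, local processing cost, and so on).

```python
# Hypothetical catalog: which nodes hold a copy of each fragment, and how many
# rows the fragment has. All names and numbers are illustrative.
REPLICAS = {"emp_uppsala": ["node1", "node3"], "emp_lund": ["node2", "node3"]}
ROWS = {"emp_uppsala": 2000, "emp_lund": 500}

# Assumed cost of shipping one row from a node to the query site (node1).
COST_PER_ROW = {"node1": 0.0, "node2": 2.0, "node3": 1.0}


def execution_plan(fragments, available):
    """Choose, per fragment, the cheapest available copy.

    Returns (plan, total_cost), where plan maps fragment -> chosen node.
    """
    plan, total = {}, 0.0
    for fragment in fragments:
        # Data replication: several nodes may hold a copy.
        # Data availability: only nodes that are currently up qualify.
        candidates = [n for n in REPLICAS[fragment] if n in available]
        if not candidates:
            raise RuntimeError(f"no available copy of {fragment}")
        # Communication cost: pick the copy that is cheapest to ship.
        best = min(candidates, key=lambda n: COST_PER_ROW[n])
        plan[fragment] = best
        total += ROWS[fragment] * COST_PER_ROW[best]
    return plan, total
```

With all three nodes up, the local copy on node1 is used for `emp_uppsala`; if node1 fails, the same query is still answerable from node3, at a higher communication cost.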