Distributed Databases

Dr. Lee 
By Alex Genadinik
Distributed Databases? What Is That?
• Distributed Database - a collection of
multiple logically interrelated databases
distributed over a computer network
Overview
• Because the database is distributed,
different users can access it without
interfering with one another.
• However, the DBMS must periodically
synchronize the scattered databases to
make sure that they all have consistent
data.
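The slides do not say how the DBMS synchronizes the scattered copies. As a minimal sketch, assume each copy stores a version number next to every value and the highest version wins; the `synchronize` function and the sample sites are hypothetical.

```python
# Hypothetical sketch: each replica maps key -> (version, value).
def synchronize(replicas):
    """Bring every replica up to the highest version seen for each key."""
    latest = {}
    for replica in replicas:
        for key, (version, value) in replica.items():
            if key not in latest or version > latest[key][0]:
                latest[key] = (version, value)
    # Overwrite every replica with the merged, most recent state.
    for replica in replicas:
        replica.clear()
        replica.update(latest)
    return latest


site_a = {"acct-1": (2, 500), "acct-2": (1, 75)}
site_b = {"acct-1": (1, 450), "acct-3": (1, 20)}
synchronize([site_a, site_b])
```

After the call both sites hold the same three accounts, and the newer version of `acct-1` has replaced the stale one. A real DBMS would also have to handle conflicting updates with equal versions, which this sketch ignores.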
[Figure: Visual Representation]
More Detailed List of Benefits
• No centralized point of failure (data is not
centralized).
• Local autonomy
• Ability to distribute data over multiple
storage drives (no need for a single
supercomputer)
• Replication of Data for Disaster Recovery
and High Availability
Closer look at the drawbacks
• Increased complexity of database design,
hardware and supporting software
• Requires more elaborate security software
and procedures
• Requires concurrency control for
simultaneous operations, and raises data
integrity issues
System Transparency
• Location Transparency – A command works the
same no matter where in the system it is issued
• Naming Transparency – We can refer to data by
the same name, from anywhere in the system,
with no further specification.
• Replication Transparency – Hides multiple
copies of data from user
• Fragmentation Transparency – Hides the fact that
data is fragmented (i.e., different sections of
correlated data may be in different locations)
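One way to picture replication and fragmentation transparency is a thin layer that searches every fragment and every copy, so the caller never names either. This is only an illustrative sketch; `TransparentTable` and the sample data are hypothetical.

```python
class TransparentTable:
    """Hypothetical facade: callers see one table, not fragments or replicas."""

    def __init__(self, fragments):
        # fragments: a list of fragments; each fragment is a list of replica
        # dicts that hold the same rows, keyed by primary key.
        self.fragments = fragments

    def get(self, key):
        for replicas in self.fragments:   # fragmentation is hidden here
            for replica in replicas:      # replication is hidden here
                if key in replica:
                    return replica[key]
        raise KeyError(key)


new_york_copy_1 = {1: "Ann"}
new_york_copy_2 = {1: "Ann"}
chicago_copy = {2: "Bob"}
clients = TransparentTable([[new_york_copy_1, new_york_copy_2], [chicago_copy]])
```

Calling `clients.get(2)` returns the row without the caller knowing that it lives in the Chicago fragment or how many copies exist.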
[Figure: Architecture, Visually]
[Figure: A More Conceptual View]
2 Basic Patterns
• Horizontal – Store
Whole Tuples on
Different machines.
• Vertical – Store
Different Fields of the
same tuples on
Different machines.
Horizontal pattern
• Entire tuples are on different machines
This is nice because we can use standard
relational algebra statements to define a
restriction on a relation that creates these:
σ City = "New York"
σ City = "Chicago"
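The horizontal pattern can be sketched in a few lines of Python: each fragment is a selection on the city, and the union of the fragments gives back the original relation. The sample rows, column names, and the `select` helper are assumptions made for illustration.

```python
# Hypothetical sample relation; the column names are illustrative only.
accounts = [
    {"acct": 1, "city": "New York", "balance": 500},
    {"acct": 2, "city": "Chicago", "balance": 75},
    {"acct": 3, "city": "New York", "balance": 20},
]


def select(rows, predicate):
    """Relational selection: keep the whole tuples that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


# One horizontal fragment per machine.
new_york = select(accounts, lambda row: row["city"] == "New York")
chicago = select(accounts, lambda row: row["city"] == "Chicago")

# The union of the fragments reconstructs the original relation.
reconstructed = sorted(new_york + chicago, key=lambda row: row["acct"])
```

Because every tuple satisfies exactly one of the two predicates, no tuple is lost or duplicated when the fragments are recombined.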
Vertical pattern
• Store Different Fields of the same tuples
on Different machines
Use the projection operator to declare these:
Π Acct #, Branch, Client Name (Account)
Π Acct #, Balance (Account)
(requires storing the primary key redundantly
in every fragment, so that each tuple can be
reassembled)
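The vertical pattern can be sketched the same way: each fragment is a projection that keeps the primary key, and a join on that key rebuilds the full tuples. The sample rows and the `project` and `join_on` helpers are hypothetical.

```python
# Hypothetical sample relation; the column names are illustrative only.
accounts = [
    {"acct": 1, "branch": "Midtown", "client": "Ann", "balance": 500},
    {"acct": 2, "branch": "Loop", "client": "Bob", "balance": 75},
]


def project(rows, columns):
    """Relational projection: keep only the named columns of every tuple."""
    return [{column: row[column] for column in columns} for row in rows]


# Both fragments carry the primary key "acct"; that is the redundant storage.
details = project(accounts, ["acct", "branch", "client"])
balances = project(accounts, ["acct", "balance"])


def join_on(left, right, key):
    """Rebuild full tuples by matching the two fragments on the shared key."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left]


rebuilt = join_on(details, balances, "acct")
```

Without the repeated `acct` column there would be no way to tell which balance belongs to which client.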
Few Comments Before Moving On
• Data is completely dispersed
• Data is replicated (helps in case of
accidents)
• There is no global directory
• Local-Master Directory
• Each node has its own catalog of data
• Each node has a directory of its own data
that is replicated elsewhere.
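As a rough sketch of the per-node bookkeeping described above, each node below keeps a catalog of what it stores and a directory of where its data is also replicated. The `Node` class and the object names are hypothetical.

```python
class Node:
    """Hypothetical node with a local catalog and a replica directory."""

    def __init__(self, name):
        self.name = name
        self.catalog = set()          # data objects stored on this node
        self.replica_directory = {}   # local object -> other nodes holding a copy

    def store(self, obj, replicated_on=()):
        self.catalog.add(obj)
        if replicated_on:
            self.replica_directory[obj] = set(replicated_on)

    def where_else(self, obj):
        """Return the other nodes that hold a copy of a locally stored object."""
        return self.replica_directory.get(obj, set())


node_a = Node("A")
node_a.store("accounts_ny", replicated_on=["B", "C"])
node_a.store("audit_log")
```

There is no global directory here: `node_a` can answer questions only about its own data, which matches the local-master arrangement in the list.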
Few Comments Before Moving On (cont.)
• Each database in a distributed database is
distinct from all other databases in the
system and has its own global database
name
Name Resolution
• Every data object in every schema in
every database has a unique identifying
name
• SELECT * FROM <remote table, referred to by
its globally unique name> WHERE
<condition>;
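To make name resolution concrete, the sketch below assumes a dotted `database.schema.object` notation and a small table that maps each database name to the node hosting it. Both the notation and the names are assumptions for illustration, not the syntax of any particular DBMS.

```python
from typing import NamedTuple


class GlobalName(NamedTuple):
    database: str
    schema: str
    obj: str


def parse_global_name(name):
    """Split a hypothetical 'database.schema.object' name into its parts."""
    parts = name.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected database.schema.object, got {name!r}")
    return GlobalName(*parts)


# Hypothetical mapping from each globally unique database name to its node.
DATABASE_HOSTS = {"sales_ny": "node-1", "sales_chi": "node-2"}


def resolve(name):
    """Return the node that hosts the object, plus the parsed name."""
    parsed = parse_global_name(name)
    return DATABASE_HOSTS[parsed.database], parsed


host, parsed = resolve("sales_chi.bank.account")
```

Because the database part of the name is unique across the whole system, the same name resolves to the same node no matter where the query is issued.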
Remote and Distributed SQL
Statements
• Remote update – modification of data in
one or more tables (all tables located on
the same remote node).
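• Remote query – retrieves information from
one or more tables, all located on the same
remote node.
• Distributed query – retrieves information
from two or more nodes.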
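The difference between remote and distributed statements comes down to how many nodes a statement touches. The sketch below assumes the common convention that a statement touching exactly one remote node is remote and one touching two or more nodes is distributed; the `classify` helper and the node names are hypothetical.

```python
def classify(local_node, nodes_touched):
    """Label a statement by the set of nodes whose tables it references."""
    nodes = set(nodes_touched)
    if nodes <= {local_node}:
        return "local"        # only local tables are involved
    if len(nodes) == 1:
        return "remote"       # every table lives on one remote node
    return "distributed"      # tables live on two or more nodes


kind_one_remote = classify("hq", ["branch_ny"])
kind_mixed = classify("hq", ["hq", "branch_ny"])
kind_two_remote = classify("hq", ["branch_ny", "branch_chi"])
```

Note that a statement mixing local tables with tables on one remote node already counts as distributed under this convention, because two nodes are involved.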
Case Study
One may think distributed databases are
required only in large corporations that
have large databases. This is not true.
Sometimes even in a single office, with
only two cubicles and two computers,
you may need to have your database on a
network, i.e., distributed.
Case Study (cont.)
If the two users need to use the
company's database and make changes
to some data, they need to have the
database hosted somewhere both can reach.
They cannot each make changes to a
private copy, because the other person
would not be able to see them and would be
working with an outdated database.
Conclusion
If you are not running a simple database
that is local to only your workstation, you
need a database that lives on a server and
is shared over the network, i.e., a
distributed database.
Conclusion (cont.)
Thank you, everyone, for your attention.
~ Alex