Parallel Database

Parallel Database Systems
The Future Of High Performance Database Systems
David DeWitt and Jim Gray
1992
Presented By – Ajith Karimpana
Parallel Databases
• History of Parallel Databases
• Why Parallel Databases ?
• How are they implemented ?
• Where are they implemented ?
• Future of Parallel Databases
Parallel Databases
The History
History of Parallel Databases
• Mainframes dominated most database and transaction processing tasks.
• Parallel Machines were practically written off.
• Specialized Database Machines came up with trendy hardware.
• Relational Data Model brought about a revolution.
History of Parallel Databases
Relational Data Model Revolution
• Uniform operations applied to uniform streams of data.
• Each operator produces a new relation.
• Pipelined Parallelism
• Partitioned Parallelism
History of Parallel Databases
Pipelined Parallelism
Streaming the output of one operator into the input of another operator.
Partitioned Parallelism
Partitioning the input data among multiple processors and memories, such
that an operator is split into many independent operators each working on a
part of the data.
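The two definitions above can be sketched in a few lines of Python. This is only an illustration, not code from the paper: the operator names (`scan`, `select`, `project`), the sample rows, and the `qty > 10` predicate are all made up for the example. Generators stand in for pipelined operators, and a thread pool stands in for independent processors (Python threads do not give true CPU parallelism, so this shows the structure rather than the performance).

```python
from concurrent.futures import ThreadPoolExecutor

def scan(rows):
    """Source operator: yields rows one at a time."""
    for row in rows:
        yield row

def select(rows, predicate):
    """Filter operator: consumes its input stream as it is produced."""
    for row in rows:
        if predicate(row):
            yield row

def project(rows, column):
    """Projection operator: keeps a single column of each row."""
    for row in rows:
        yield row[column]

def pipelined(rows):
    # Pipelined parallelism: the output of scan streams into the input
    # of select, whose output streams into the input of project.
    return list(project(select(scan(rows), lambda r: r["qty"] > 10), "id"))

def partitioned(rows, n_workers=3):
    # Partitioned parallelism: split the input into disjoint parts,
    # run the same operator pipeline on each part, then merge the results.
    parts = [rows[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(pipelined, parts)
    return sorted(x for part in results for x in part)

# Hypothetical sample relation.
ROWS = [{"id": i, "qty": i * 3} for i in range(10)]
```

Both `pipelined(ROWS)` and `partitioned(ROWS)` return the same set of ids; only the way the work is divided differs.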
Parallel Databases
WHY ?
Parallel Databases – Why ?
The Philosophy –
The ideal database machine would be a single infinitely fast
processor with an infinite memory with infinite bandwidth –
and it would be infinitely cheap (free).
But do we have such an ideal machine ?
NO
So the challenge is to build an infinitely fast processor out
of infinitely many processors of finite speed, and to build
an infinitely large memory with infinite memory bandwidth
from infinitely many storage units of finite speed.
Answer To This Challenge – Parallel Databases
Parallel Databases
The Implementation
Parallel Databases- Implementation
Parallel Database Implementation – The Basic Techniques
Two Key Properties - Linear Speedup and Linear Scale up
Parallel Databases- Implementation
Two Kinds of Scale up –
• Batch – Same query running on N-times larger database.
• Transactional – N-times as many clients, submitting N-times as many requests against an N-times larger database.
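As a small worked example, speedup and scale up can be computed from elapsed times. The ratios follow the definitions in the DeWitt and Gray paper (small-system elapsed time divided by big-system elapsed time); the function names, the tolerance, and the timings in the usage note are hypothetical.

```python
def speedup(small_system_time, big_system_time):
    """Same problem on an N-times larger system.
    Linear speedup means this ratio equals N."""
    return small_system_time / big_system_time

def scaleup(small_system_small_problem_time, big_system_big_problem_time):
    """N-times larger problem on an N-times larger system.
    Linear scale up means this ratio equals 1."""
    return small_system_small_problem_time / big_system_big_problem_time

def is_linear(measured, ideal, tolerance=0.05):
    """True if the measured ratio is within a relative tolerance of the ideal.
    The 5% default is an arbitrary choice for illustration."""
    return abs(measured - ideal) <= tolerance * ideal
```

For instance, if a query takes 120 seconds on 1 node and 30 seconds on 4 nodes, `speedup(120, 30)` is 4.0, which is linear for N = 4. If a 4-times larger database on 4 nodes also takes 120 seconds, `scaleup(120, 120)` is 1.0, which is linear batch scale up.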
Parallel Databases- Implementation
Threats To Linear Speedup/Scale up
Startup, Interference, Skew
Parallel Databases- Implementation
Hardware Architecture
Shared Memory
Shared Disk
Parallel Databases- Implementation
Hardware Architecture
Shared Nothing
Parallel Databases- Implementation
Parallel Dataflow Approach To SQL Software
• SQL was originally proposed to improve programmer productivity by offering a nonprocedural database language.
• SQL came with Data Independence, since programs do not specify how the query is to be executed.
• Relational Queries, with these properties, can be executed as a dataflow graph and can use both pipelined and partitioned parallelism.
Parallel Databases- Implementation
Data Partitioning
• Partitioning a relation involves distributing its tuples over several disks.
• Three Kinds –
  • Round-robin Partitioning
  • Range Partitioning
  • Hash Partitioning
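A minimal sketch of the three partitioning schemes, with in-memory lists standing in for disks. The function names, the tuple layout (dicts with a key attribute), the range boundaries, and the use of CRC32 as the hash function are all assumptions made for the example.

```python
import bisect
import zlib

def round_robin_partition(tuples, n_disks):
    """Tuple i goes to disk i mod n, regardless of its contents."""
    disks = [[] for _ in range(n_disks)]
    for i, t in enumerate(tuples):
        disks[i % n_disks].append(t)
    return disks

def range_partition(tuples, key, boundaries):
    """boundaries is a sorted list of split points; values below
    boundaries[0] go to disk 0, values from boundaries[0] up to (but not
    including) boundaries[1] go to disk 1, and so on."""
    disks = [[] for _ in range(len(boundaries) + 1)]
    for t in tuples:
        disks[bisect.bisect_right(boundaries, t[key])].append(t)
    return disks

def hash_partition(tuples, key, n_disks):
    """The disk is chosen by hashing the partitioning attribute.
    CRC32 is used only because it is deterministic across runs."""
    disks = [[] for _ in range(n_disks)]
    for t in tuples:
        h = zlib.crc32(str(t[key]).encode("utf-8"))
        disks[h % n_disks].append(t)
    return disks

# Hypothetical relation with a single attribute.
TUPLES = [{"id": i} for i in range(9)]
```

With `TUPLES` and 3 disks, round-robin places ids 0, 3, 6 on the first disk, while `range_partition(TUPLES, "id", [3, 6])` places ids 0, 1, 2 there.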
Parallel Databases- Implementation
[Figure: range, round-robin, and hash partitioning of a relation across disks]
Parallel Databases- Implementation
Round-Robin
Ideal for applications that wish to read entire relation sequentially
for each query.
Not ideal for point and range queries, since each of the n disks
must be searched.
Hash
Ideal for point queries based on the partitioning attribute.
Ideal for sequential scans of the entire relation.
Not ideal for point queries on non-partitioning attributes.
Not ideal for range queries on the partitioning attribute.
Range
Ideal for point and range queries on the partitioning attribute.
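The trade-offs above come down to how many disks a query has to touch. The sketch below returns the disks that must be searched for point and range queries on the partitioning attribute under each scheme. The function names, the scheme labels, and the CRC32 hash are assumptions for illustration, and for the range scheme `n_disks` is assumed to equal `len(boundaries) + 1`.

```python
import bisect
import zlib

def disks_for_point_query(scheme, value, n_disks, boundaries=None):
    """Disks that must be searched for attribute == value."""
    if scheme == "round_robin":
        return list(range(n_disks))      # placement ignores the value
    if scheme == "hash":
        h = zlib.crc32(str(value).encode("utf-8"))
        return [h % n_disks]             # exactly one disk
    if scheme == "range":
        return [bisect.bisect_right(boundaries, value)]
    raise ValueError("unknown scheme: " + scheme)

def disks_for_range_query(scheme, low, high, n_disks, boundaries=None):
    """Disks that must be searched for low <= attribute <= high."""
    if scheme in ("round_robin", "hash"):
        return list(range(n_disks))      # neighbouring values are scattered
    if scheme == "range":
        first = bisect.bisect_right(boundaries, low)
        last = bisect.bisect_right(boundaries, high)
        return list(range(first, last + 1))
    raise ValueError("unknown scheme: " + scheme)
```

With boundaries `[100, 200, 300]` on 4 disks, a range query for 120 to 180 touches only disk 1 under range partitioning, but all 4 disks under round-robin or hash partitioning.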
Parallel Databases- Implementation
Handling Of Skew
The distribution of tuples when a relation is
partitioned (except for Round-Robin) may be
skewed, with a high percentage of tuples placed
in some partitions and fewer tuples in other
partitions.
2 Kinds –
Data Skew (Attribute-value Skew)
Execution Skew (Partition Skew)
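To make data skew concrete, the sketch below counts how many tuples land in each partition when one attribute value is very common. The "skew ratio" (largest partition divided by the average partition size) is an illustrative metric chosen for this example, not one defined in the slides; the sample data and the CRC32 hash are also assumptions.

```python
import zlib

def hash_partition_sizes(values, n_partitions):
    """Number of tuples per partition when hashing on the attribute value."""
    sizes = [0] * n_partitions
    for v in values:
        sizes[zlib.crc32(str(v).encode("utf-8")) % n_partitions] += 1
    return sizes

def round_robin_partition_sizes(values, n_partitions):
    """Number of tuples per partition under round-robin placement."""
    sizes = [0] * n_partitions
    for i, _ in enumerate(values):
        sizes[i % n_partitions] += 1
    return sizes

def skew_ratio(sizes):
    """Largest partition divided by the mean size; 1.0 is perfectly balanced."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean

# Hypothetical attribute values: one value accounts for 70% of the tuples.
CITIES = ["NYC"] * 70 + ["city_%d" % i for i in range(30)]
```

Hashing `CITIES` into 4 partitions sends all 70 "NYC" tuples to the same partition, so the skew ratio is at least 2.8, while round-robin placement of the same 100 tuples gives 25 per partition and a ratio of 1.0.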
Parallel Databases- Implementation
Parallelism With Relational Operators
Consider a simple sequential query –
Parallel Databases- Implementation
A Relational Dataflow Graph
Parallel Databases- Implementation
Famous Implementations Of Parallel Databases
Teradata
Tandem NonStop SQL
Gamma
The Super Database Computer
Bubba
nCUBE
Parallel Databases
The Future
Parallel Databases- The Future
Research Problems
Parallel Query Optimization
Application Program Parallelism
Physical Database Design
On-line Data Reorganization and Utilities
Future Directions
Many commercial success stories.
But research issues still remain unresolved.
Some applications are not well supported by the relational data model.
Object-oriented design ??
Parallel Databases
Thank You
Grilling Time !!