Column-Oriented DB, IMDB
Advanced Database Systems:
DBS CB, 2nd Edition
Advanced Topics of Interest:
In-Memory DB (IMDB) and Column-Oriented DB
Outline
In-Memory Database (IMDB)
Column-Oriented Database (C-Store)
In-Memory Database (IMDB)
Introduction
In-Memory database system (IMDB)
Data resides permanently in main physical memory
Backup copy (optionally) on disk
Disk-Resident database system (DRDB)
Data resides on disk
Data may be cached into memory for access.
The main difference is that in an IMDB, the primary copy lives permanently in memory
Questions about IMDB
Is it reasonable to assume that the entire
database fits in memory?
Yes, for some applications!
What is the difference between an IMDB and a DRDB with a very large cache?
In DRDB, even if all data fits in memory, the data
structures and algorithms are designed for disk
access.
Differences in properties of main memory and disk
The access time for main memory is orders of
magnitude less than for disk storage
Main memory is normally volatile, while disk
storage is not
The layout of data on disk is much more critical
than the layout of data in main memory
Impact of memory resident data
The differences in properties of main-memory
and disk have important implications in:
Concurrency control
Commit processing
Access methods
Data representation
Query processing
Recovery
Performance
Concurrency control
Access to main memory is much faster than disk
access, so we can expect that transactions
complete more quickly in an IMDB system
Lock contention may not be as important as it is
when the data is disk resident
Commit Processing
As protection against media failure, it is
necessary to have a backup copy and to keep a
log of transaction activity
The need for a stable log threatens to
undermine the performance advantages that can
be achieved with memory resident data
Access Methods
The cost factors to be minimized by the access structures (indexes) are different, e.g., B-tree vs. T-tree…
Data representation
Main memory databases can take advantage of
efficient pointer following for data representation
A Study of Index Structures for Main Memory Database Management Systems
Tobin J. Lehman
Michael J. Carey
VLDB 1986
Disk versus In-Memory
Primary goals for a disk-oriented index structure
design:
Minimize the number of disk accesses
Minimize disk space
Primary goals of an in-memory index design:
Reduce overall computation time
Use as little memory as possible
Classic index structures
Arrays:
Advantage: uses minimal space, provided that the size is known in advance
Disadvantage: impractical for anything but a read-only environment
AVL Trees:
Balanced binary search tree
The tree is kept balanced by executing rotation operations when
needed
Advantage: fast search
Disadvantage: poor storage utilization
Classic index structures (cont)
B-trees:
Every node contains some ordered data items and pointers
Good storage utilization
Searching is reasonably fast
Updating is also fast
Hash-based indexing
Chained Bucket Hashing:
Static structure, used both in memory and on disk
Advantage: fast, if the proper table size is known
Disadvantage: poor behavior in a dynamic environment
Extendible (Dynamic) Hashing:
Dynamic hash table that grows with the data
A hash node contains several data items and splits in two when an overflow occurs
The directory grows in powers of two when a node overflows and has reached the max depth for a particular directory size
Hash-based indexing (cont)
Linear Hashing:
Uses a dynamic hash table
Nodes are split in predefined linear order
Buckets can be ordered sequentially, allowing the bucket address to be calculated from a base address (see the sketch after this list)
The event that triggers a node split can be based on storage utilization
Modified Linear Hashing:
More oriented towards main memory
Uses a directory that grows linearly
Chained single-item nodes
The splitting criterion is based on the average length of the hash chains
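To make the base-address idea concrete, here is a minimal sketch of the classic linear hashing address calculation; the function and parameter names are my own assumptions for illustration, not code from any of the papers above.

```python
# Minimal linear hashing address calculation (illustrative sketch).
# n_buckets is the initial bucket count, `level` counts completed
# doubling rounds, and `split` is the next bucket to split in linear order.
def bucket_address(key_hash: int, n_buckets: int, level: int, split: int) -> int:
    addr = key_hash % (n_buckets * 2 ** level)
    if addr < split:
        # Buckets before the split pointer were already split this round,
        # so they are addressed with the finer-grained hash function.
        addr = key_hash % (n_buckets * 2 ** (level + 1))
    return addr

# With 4 initial buckets, midway through the first round (bucket 0 split):
assert bucket_address(12, n_buckets=4, level=0, split=1) == 4  # 12 % 4 = 0 < 1, so 12 % 8
assert bucket_address(13, n_buckets=4, level=0, split=1) == 1  # 13 % 4 = 1, not yet split
```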
The T-tree
A binary tree with many elements kept in order in each node (evolved from the AVL tree and the B-tree)
Intrinsic binary search nature
Good update and storage characteristics
Every T-tree has an associated minimum and maximum node count
Internal nodes (nodes with two children) keep their occupancy in the range given by the minimum and maximum count (see the node sketch below)
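For concreteness, here is a minimal Python sketch of such a node; the class name, field names, and the tiny occupancy bounds are my own assumptions, not code from the paper. The later algorithm sketches reuse this TNode class.

```python
# Minimal T-tree node sketch (illustrative; names are assumptions).
class TNode:
    MIN_COUNT = 2   # minimum occupancy for internal nodes
    MAX_COUNT = 4   # tiny for illustration; real T-tree nodes hold many more

    def __init__(self, keys=None):
        self.keys = sorted(keys or [])  # the ordered data items in this node
        self.left = None                # subtree with keys below keys[0]
        self.right = None               # subtree with keys above keys[-1]

    def bounds(self, x):
        # A node "bounds" x if x lies between its smallest and largest key.
        return self.keys[0] <= x <= self.keys[-1]
```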
The T-tree
(Figure: a T-tree, with each node holding an ordered run of elements)
Search algorithm for T-tree
Similar to searching in a binary tree
Algorithm:
Start at the root of the tree
Loop:
If the search value is less than the minimum value of the node
Then search down the left sub-tree
If the search value is greater than the maximum value in the
node
Then search the right sub-tree
Else search the current node
The search fails when a node is searched and the item is not found, or when a node that bounds the search value cannot be found (see the sketch below)
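A minimal sketch of this search loop, reusing the TNode class from the earlier sketch:

```python
# T-tree search following the loop above (illustrative sketch).
def search(node, x):
    while node is not None:
        if x < node.keys[0]:
            node = node.left           # below this node's minimum
        elif x > node.keys[-1]:
            node = node.right          # above this node's maximum
        else:
            # This node bounds x: search within the node itself.
            return x if x in node.keys else None
    return None                        # no bounding node: the search fails
```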
Insert algorithm
Insert (x):
Search to locate the bounding node
If a bounding node is found:
Let a be this node
If the value fits, then insert it into a and STOP
Else remove the minimum element a_min from the node, insert x, go to the leaf containing the greatest lower bound of a, and insert a_min into this leaf
21
Insert algorithm (cont)
If a bounding node is not found:
Let a be the last node on the search path
If the insert value fits, then insert it into the node
Else create a new leaf with x in it
If a new leaf was added:
For each node in the search path (from leaf to root): if the two sub-trees' heights differ by more than one, then rotate and STOP (see the condensed sketch below)
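A condensed sketch of these insert steps, reusing the TNode class from above. It assumes a non-empty tree, and the final rebalancing pass is summarized in a comment rather than implemented, so this is an illustration of the control flow, not a complete implementation.

```python
import bisect

# Condensed T-tree insert sketch (illustrative; rebalancing omitted).
def insert(root, x):
    node, last = root, None
    while node is not None and not node.bounds(x):   # find the bounding node
        last = node
        node = node.left if x < node.keys[0] else node.right
    target = node if node is not None else last      # bounding node, or last on path
    if len(target.keys) < TNode.MAX_COUNT:           # the value fits: insert and STOP
        bisect.insort(target.keys, x)
        return
    if node is not None:
        # Full bounding node: evict its minimum, insert x, then push the
        # evicted minimum toward the greatest-lower-bound leaf.
        a_min = target.keys.pop(0)
        bisect.insort(target.keys, x)
        x = a_min
        leaf = target.left
        while leaf is not None and leaf.right is not None:
            leaf = leaf.right                        # rightmost leaf of left subtree
        if leaf is not None and len(leaf.keys) < TNode.MAX_COUNT:
            bisect.insort(leaf.keys, x)
            return
        if leaf is not None:
            target = leaf
    # No room on the path: hang a new leaf with x off `target`, then (omitted)
    # walk from the new leaf back to the root and rotate wherever the two
    # sub-trees' heights differ by more than one.
    new_leaf = TNode([x])
    if x < target.keys[0]:
        target.left = new_leaf
    else:
        target.right = new_leaf
```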
Delete algorithm
(1) Search for the node that bounds the delete value; search for the delete value within this node, reporting an error and stopping if it is not found
(2) If the delete will not cause an underflow, then delete the value and STOP
Else, if this is an internal node, then delete the value and ‘borrow’ the greatest lower bound from a leaf
Else delete the element
(3) If the node is a half-leaf and can be merged with a
leaf, do it, and go to (5)
Delete algorithm (cont)
(4) If the current node (a leaf) is not empty, then STOP
Else free the node and go to (5)
(5) For every node along the path from the leaf up to the
root, if the two sub-trees of the node differ in height by
more than one, then perform a rotation operation
STOP when all nodes have been examined or a node with even balance has been encountered (see the condensed sketch below)
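A condensed sketch of steps (1) and (2), reusing the TNode class from above; the merge/free steps (3)-(4) and the rotation pass (5) are summarized in a closing comment rather than implemented.

```python
# Condensed T-tree delete sketch (illustrative; steps (3)-(5) summarized).
def delete(node, x):
    while node is not None and not node.bounds(x):   # step (1): find bounding node
        node = node.left if x < node.keys[0] else node.right
    if node is None or x not in node.keys:
        raise KeyError(x)                            # delete value not found
    is_internal = node.left is not None and node.right is not None
    node.keys.remove(x)
    if is_internal and len(node.keys) < TNode.MIN_COUNT:
        # Step (2): internal-node underflow, so 'borrow' the greatest
        # lower bound from the rightmost leaf of the left subtree.
        glb_node = node.left
        while glb_node.right is not None:
            glb_node = glb_node.right
        node.keys.insert(0, glb_node.keys.pop())
    # Steps (3)-(5), omitted here: merge a half-leaf with its leaf where
    # possible, free the node if it became empty, then rotate on the way back
    # to the root wherever the two sub-trees' heights differ by more than one.
```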
LL Rotation
(Figure: a single left rotation)
LR Rotation
(Figure: a right rotation followed by a left rotation, rearranging nodes A, B, C and the sub-trees Bl, Cl, Cr, Ar)
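The rotations themselves are the standard AVL-style ones. Here is a sketch on the TNode fields from above; note that it returns the new subtree root without re-linking it into the parent, which a full implementation must do.

```python
# AVL-style rotations used for rebalancing (illustrative sketch).
def rotate_left(a):
    b = a.right                   # b moves up, a becomes b's left child
    a.right, b.left = b.left, a
    return b                      # new root of this subtree

def rotate_right(a):
    b = a.left                    # mirror image of rotate_left
    a.left, b.right = b.right, a
    return b

def rotate_lr(a):
    # LR rotation: rotate the left child left, then rotate `a` right.
    a.left = rotate_left(a.left)
    return rotate_right(a)
```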
Summary
We introduced a new in-memory index structure, the T-tree
For unordered data, Modified Linear Hashing
should give excellent performance for exact
match queries
For ordered data, the T-tree provides excellent
overall performance for a mix of searches,
inserts and deletes, and it does so at a relatively
low cost in storage space
But…
Even though T-tree nodes hold many keys, only the two end keys are actually used for comparisons
Since we store a record pointer for every key in a node, and most of the time these record pointers are not used, the space is ‘wasted’
Column-Oriented Database
C-Store: A Column-Oriented RDBMS, Michael Stonebraker
Traditional Row-Oriented Database
Store fields in one record contiguously on disk
Use B-tree indexing
Use small (e.g. 4K) disk blocks
Align fields on byte or word boundaries
Conventional (row-oriented) query optimizer and
executor (technology from 1979)
ARIES-style transactions
Terminology -- “Row Store”
(Figure: records 1-4 stored contiguously, one after another)
E.g. DB2, Oracle, Sybase, SQLServer, …
Row-Stores are Write Optimized
Can insert, update, and delete (IUD) a record in one physical write
Good for On-Line Transaction Processing (OLTP)
But not for read-mostly applications
Data warehouses
CRM
Elephants Have Extended Row Stores
With Bitmap indices
Better sequential read
Integration of “data cube” products
Materialized views
But there may be a better idea…….
Column Stores
At 100K Feet….
Ad-hoc queries read 2 columns out of 20
In a very large warehouse, the fact table is rarely clustered correctly
Column store reads 5-10% of what a row store reads
C-Store (Column Store) Project
Brandeis/Brown/MIT/UMass-Boston project
Usual suspects participating
Enough coded to get performance numbers for some
queries
Complete status later
Pioneering Work
Sybase IQ (early ’90s)
MonetDB (see CIDR ’05 for the most recent description)
C-Store Technical Ideas
Code the columns to save space
No alignment
Big disk blocks
Only materialized views (perhaps many)
Focus on sorting, not indexing
Automatic physical DBMS design
Optimize for grid computing
Innovative redundancy
Xacts – but no need for Mohan (no ARIES-style persistent log)
Data ordered on anything, not just time
Compression
Column optimizer and executor
No Alignment
Dense pack columns
E.g. a 5-bit field takes 5 bits
CPU speeds are increasing faster than disk bandwidth
It is faster to shift data in the CPU than to waste disk bandwidth (see the packing sketch below)
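As an illustration of dense packing, here is a toy sketch that stores 5-bit values back to back with no byte alignment; this encoding is my own, not C-Store's actual storage format.

```python
# Dense-packing sketch: width-bit values stored back to back (illustrative).
def pack(values, width=5):
    buf, bits = 0, 0
    for v in values:
        buf = (buf << width) | (v & ((1 << width) - 1))
        bits += width
    pad = (-bits) % 8                       # pad to whole bytes on the right
    return (buf << pad).to_bytes((bits + pad) // 8, "big"), bits

def unpack(data, nbits, width=5):
    buf = int.from_bytes(data, "big") >> (8 * len(data) - nbits)
    return [(buf >> shift) & ((1 << width) - 1)
            for shift in range(nbits - width, -1, -width)]

packed, nbits = pack([3, 17, 31, 0, 9])
assert unpack(packed, nbits) == [3, 17, 31, 0, 9]
```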
Big Disk Blocks
Tunable
Big (minimum size is 64K)
Only Materialized Views
A projection (materialized view) is some number of columns from a fact table
Plus columns in a dimension table, with a 1-n join between the fact and dimension tables
Stored in order of one or more storage keys
Several may be stored!
With a permutation, if necessary, to map between them
Table (as the user specified it and sees it) is not stored!
No secondary indexes (they are a one column sorted MV
plus a permutation, if you really want one)
Example:
User view:
EMP (name, age, salary, dept)
Dept (dname, floor)
Possible set of MVs:
MV-1 (name, dept, floor) in floor order
MV-2 (salary, age) in age order
MV-3 (dname, salary, name) in salary order
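A toy sketch of how these projections could be laid out as parallel column arrays, each sorted on its own sort key; the data and the layout here are my own illustration, not C-Store's internal representation (MV-3 would be built the same way).

```python
# Each projection is a set of parallel column arrays in sort-key order.
emp = [("alice", 25, 9000, "toys"), ("bob", 30, 8500, "shoes")]
dept = {"toys": 1, "shoes": 2}   # dname -> floor

# MV-1 (name, dept, floor) in floor order: join EMP with Dept, then sort.
rows = sorted(((n, d, dept[d]) for n, _, _, d in emp), key=lambda r: r[2])
mv1_name, mv1_dept, mv1_floor = map(list, zip(*rows))

# MV-2 (salary, age) in age order.
rows = sorted(((s, a) for _, a, s, _ in emp), key=lambda r: r[1])
mv2_salary, mv2_age = map(list, zip(*rows))
```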
Automatic Physical DBMS Design
Not enough 4-star wizards to go around
Accept a “training set” of queries and a space budget
Choose the MVs auto-magically
Re-optimize periodically based on a log of the interactions
Optimize for Grid Computing
I.e. shared-nothing
DeWitt (Gamma) was right
Horizontal partitioning and intra-query parallelism
as in Gamma
Innovative Redundancy
Hardly any warehouse is recovered by a redo from the log
Takes too long!
Store enough MVs at enough places to ensure K-safety
Rebuild dead objects from elsewhere in the
network
K-safety is a DBMS-design problem!
XACTS – No C. Mohan
Undo from a log (that does not need to be persistent)
Redo by rebuild from elsewhere in the network
Snapshot isolation (run queries as of a tunable time in the
recent past)
To solve read-write conflicts
Distributed Xacts
Without a prepare message (no two-phase commit)
Storage (Sort) Key(s) Is Not Necessarily Time
That would be too limiting
So how can we do fast, efficient updates to dense-packed column storage that is not in entry sequence?
Solution – a Hybrid Store (H-Store)
http://www.vldb2005.org/program/paper/thu/p553-stonebraker.pdf
(Much like MonetDB)
A write-optimized I/U store takes inserts and updates
A tuple mover (a batch rebuilder) periodically migrates tuples
Into a read-optimized column store (what we have been talking about so far)
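A toy sketch of this hybrid flow: inserts land in a small write-optimized buffer, and a tuple mover periodically merges them into the sorted read-optimized store; the class and method names here are my own, not C-Store's.

```python
# Hybrid write-store / read-store sketch (illustrative).
class HybridStore:
    def __init__(self):
        self.write_store = []    # unsorted recent inserts (I/U store)
        self.read_store = []     # sorted; dense-packed columns in a real system

    def insert(self, row):
        self.write_store.append(row)   # one cheap append per insert

    def move_tuples(self):
        # Batch rebuild: merge the write store into sort-key order.
        # (A real tuple mover rewrites dense-packed columns in bulk.)
        self.read_store = sorted(self.read_store + self.write_store)
        self.write_store = []

    def scan(self):
        # Queries must consult both stores until the mover has run.
        return sorted(self.read_store + self.write_store)
```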
Column Executor
Column operations – not row operations
Columns remain coded – if possible
Late materialization of columns (see the sketch below)
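A minimal sketch of late materialization: the executor operates on one column to compute matching positions, and only then fetches values from the other columns; the data here is my own illustration.

```python
# Late materialization sketch (illustrative).
price = [9, 12, 7, 15, 11]           # column 1 (possibly still coded)
product = ["a", "b", "c", "d", "e"]  # column 2

positions = [i for i, p in enumerate(price) if p > 10]  # column op only
result = [(product[i], price[i]) for i in positions]    # materialize late
# result == [("b", 12), ("d", 15), ("e", 11)]
```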
Column Optimizer
Chooses the MVs on which to run the query
Its most important task
Built in: snowflake schemas
Which are simple to optimize without exhaustive search
Looking at extensions
Performance
100X faster than a popular row store, in 40% of the space
7X faster than a popular row store, in 1/6th of the space
Code available with BSD license
University Research
Extension of algorithms to non-snowflake schemas
Study of L2 cache performance
Study of coding strategies
Study of executor options
Study of recovery tactics
Non-cursor interface
Study of optimizer primitives
END