Cache-Conscious Concurrency Control of Main-Memory Indexes on Shared-Memory Multiprocessor Systems
By: Sang K. Cha, Sangyong Hwang, Kihong Kim and Kunjoo Kwon
Presenter: Kaloian Manassiev
Presentation Plan
Need for main-memory DBs
Special considerations for in-memory operation
Main-memory indexing structures and concurrency control
OLFIT
Evaluation
Conclusions
Slide borrowed from time4change by Sang K. Cha
Main Memory DBMS
Database resident in memory
Read transactions simply read the in-memory data.
Update transactions do in-memory updates and write update log to the log disk.
Occasionally, checkpoint the dirty pages of the in-memory database to the disk-resident backup DB to shorten the recovery time.
[Figure: the MMDBMS holds the Primary DB in memory; Checkpointing goes to the Backup DB, Logging goes to the Log]
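To make the read, update, and checkpoint paths on this slide concrete, here is a minimal Python sketch. It is not taken from the slides or from any particular MMDBMS: the class name `TinyMMDB` and its methods are hypothetical, plain Python containers stand in for the log disk and the backup DB, and the choices to log before applying an update and to truncate the log at every checkpoint are simplifying assumptions.

```python
class TinyMMDB:
    """Toy main-memory store (hypothetical): reads hit memory, updates also log."""

    def __init__(self):
        self.primary = {}    # in-memory primary DB
        self.log = []        # stand-in for the log disk
        self.backup = {}     # stand-in for the disk-resident backup DB
        self.dirty = set()   # keys changed since the last checkpoint

    def read(self, key):
        # Read transactions touch only the in-memory data.
        return self.primary.get(key)

    def update(self, key, value):
        # Assumption: append the log record first, then apply in memory.
        self.log.append((key, value))
        self.primary[key] = value
        self.dirty.add(key)

    def checkpoint(self):
        # Flush dirty entries to the backup, then drop the log records
        # they cover, so recovery has less log to replay.
        for key in self.dirty:
            self.backup[key] = self.primary[key]
        self.dirty.clear()
        self.log.clear()

    def recover(self):
        # After a crash: reload the backup, then replay the remaining log.
        self.primary = dict(self.backup)
        for key, value in self.log:
            self.primary[key] = value
            self.dirty.add(key)
```

The shorter the log left after the last checkpoint, the fewer records `recover` has to replay, which is the "shorten the recovery time" point above.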
Slide borrowed from time4change by Sang K. Cha
Q: Is a Disk Database with a large buffer the same as a Main Memory Database?
No!
Complex mapping between disk and memory
E.g., traversing index blocks in the buffer requires bookkeeping of the mapping between disk and memory addresses
[Figure: Large Buffer (Index Blocks, Data Blocks); Database; Log; labels: record, disk address]
Disk index block design is not optimized to avoid hardware cache misses.
Slide borrowed from time4change by Sang K. Cha
Cache behavior of commercial DBMS
(on Uniprocessor Pentium II Xeon)
Anastassia Ailamaki et al., "DBMSs on a Modern Processor: Where Does Time Go?", VLDB '99
Memory related delays: 40-80% of execution time.
Data accesses on caches: 19-86% of memory stalls.
Multiprocessor cache behavior?
Probably worse because of coherence cache misses
Main-memory database index structures
Plain old B+-Tree – too much data stored in the nodes => low fanout, which incurs cold and capacity cache misses
Main-memory database index structures (2)
T-Tree – small amount of data stored in the nodes, but traversal mainly touches the two end keys in the node => poor L2 cache utilisation
Main-memory database index structures (3)
CSB+-Tree – keeps only one child pointer per node and combines child nodes with a common parent into a group
Increased fanout, cache-conscious, reduces the cache miss rate and improves the search performance
CSB+-tree: Does not consider concurrent operations!
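A rough worked example of the fanout argument on the last three slides. The node and field sizes are assumptions chosen for illustration (64-byte nodes, 4-byte keys, 4-byte pointers, no header fields), not figures from the slides; the 10 million keys match the workload size used later in the deck.

```python
def btree_keys(node_bytes, key_bytes, ptr_bytes):
    # B+-tree node: n keys and n + 1 child pointers must fit in one node.
    return (node_bytes - ptr_bytes) // (key_bytes + ptr_bytes)

def csb_keys(node_bytes, key_bytes, ptr_bytes):
    # CSB+-tree node: n keys and a single pointer to the child group.
    return (node_bytes - ptr_bytes) // key_bytes

def levels(num_keys, fanout):
    # Smallest h such that fanout ** h >= num_keys (rough height estimate
    # that assumes every node is full).
    h, reach = 0, 1
    while reach < num_keys:
        reach *= fanout
        h += 1
    return h

b_fanout = btree_keys(64, 4, 4) + 1    # 7 keys  -> fanout 8
c_fanout = csb_keys(64, 4, 4) + 1      # 15 keys -> fanout 16
print(levels(10_000_000, b_fanout))    # 8 levels
print(levels(10_000_000, c_fanout))    # 6 levels
```

Under these assumptions a search touches about two fewer nodes per lookup in the CSB+-tree, and each node touched is a potential cache miss.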
Concurrency control
Lock coupling
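Lock coupling can be sketched as a hand-over-hand descent: the child is latched before the parent's latch is released. The `Node` class and `search` function below are hypothetical names for illustration, and `threading.Lock` stands in for a node latch; this is a sketch of the general technique, not code from the paper.

```python
import bisect
import threading

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted separator keys, or leaf keys
        self.children = children or []  # empty list for a leaf
        self.lock = threading.Lock()    # stand-in for the node latch

def search(root, key):
    """Descend with lock coupling; return True if key is stored in a leaf."""
    node = root
    node.lock.acquire()
    while node.children:
        child = node.children[bisect.bisect_right(node.keys, key)]
        child.lock.acquire()   # latch the child first ...
        node.lock.release()    # ... and only then release the parent
        node = child
    found = key in node.keys
    node.lock.release()
    return found
```

Even a pure reader writes to every node's lock on the way down, so on a multiprocessor each lookup invalidates those cache lines in the other processors' caches. These are the coherence cache misses mentioned earlier.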
Concurrency control (2)
Blink-Tree
Removes the need for lock coupling by linking each node to its right neighbour
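The idea behind the right link can be sketched as follows: if a node was split after a reader chose it, the reader notices that its key is above the node's high key and follows the link instead of restarting. The names `Node`, `split`, and `find_in_level` are hypothetical, and this single-threaded sketch leaves out the per-node latching a real Blink-tree still needs.

```python
class Node:
    def __init__(self, keys, high_key=None, right=None):
        self.keys = keys          # sorted keys stored in this node
        self.high_key = high_key  # upper bound for this node; None = unbounded
        self.right = right        # link to the right neighbour on the same level

def split(node):
    # Move the upper half into a new right sibling and link it in.
    # Assumes the node holds at least two keys.
    mid = len(node.keys) // 2
    sibling = Node(node.keys[mid:], node.high_key, node.right)
    node.keys = node.keys[:mid]
    node.high_key = node.keys[-1]
    node.right = sibling
    return sibling

def find_in_level(node, key):
    # If a split moved the key range away, chase the right links.
    while node.high_key is not None and key > node.high_key:
        node = node.right
    return key in node.keys
```

A reader that still holds a reference to the old node after `split` finds the moved keys through `right`, so the parent does not have to stay latched while the child is examined.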
Concurrency control (3)
Tree-Level Locking
Concurrency control (4)
Physical Versioning
Use Copy-On-Write so that updaters do not interfere with concurrent readers
Severely limits the performance when the update load is high
Needs a garbage collection mechanism to release the dead versions
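A minimal copy-on-write sketch of the three points above. The class `VersionedNode` is a hypothetical illustration: it assumes a single updater at a time and uses a plain list as a stand-in for the garbage collector's queue of dead versions.

```python
class VersionedNode:
    """Holder whose `current` reference is replaced by updaters (hypothetical)."""

    def __init__(self, keys):
        self.current = tuple(keys)  # immutable version seen by readers
        self.dead = []              # old versions awaiting garbage collection

    def read(self):
        # Readers take no latch and never see a half-applied update.
        return self.current

    def insert(self, key):
        # Copy the whole node, modify the copy, then publish it.
        new_version = tuple(sorted(self.current + (key,)))
        self.dead.append(self.current)  # old version becomes garbage
        self.current = new_version      # one reference swap publishes it
```

Every insert copies the full node and leaves one dead version behind, which is where the cost under a high update load and the need for garbage collection come from.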
OLFIT
Probability of update with 100% insert workload (10 Million keys)
OLFIT (1)
Node structure
CCINFO
OLFIT (2)
Node read
OLFIT (3)
Node update
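The slides for node read and node update showed the primitives only as figures. The sketch below assumes the protocol as described in the OLFIT paper: CCINFO holds a latch and a version number, updaters latch the node and increment the version, and readers take no latch but retry if the node was latched or its version changed. The names `OlfitNode`, `update_node`, and `read_node` are hypothetical. A real implementation packs the latch and version into one word and needs atomic instructions and memory barriers, which this Python sketch does not model.

```python
import threading

class OlfitNode:
    def __init__(self, keys):
        self.keys = list(keys)
        self.latch = threading.Lock()  # stand-in for the latch in CCINFO
        self.version = 0               # stand-in for the version in CCINFO

def update_node(node, key):
    node.latch.acquire()        # U1: acquire the latch
    node.keys.append(key)       # U2: update the node content
    node.keys.sort()
    node.version += 1           # U3: increment the version
    node.latch.release()        # U4: release the latch

def read_node(node):
    while True:
        seen = node.version             # R1: remember the version
        snapshot = list(node.keys)      # R2: read the node content
        if node.latch.locked():         # R3: update in progress, retry
            continue
        if node.version != seen:        # R4: node changed meanwhile, retry
            continue
        return snapshot
```

The reader never writes to the node, so it causes no coherence cache misses. The price is the retry loop in `read_node`, which spins while an update is in progress.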
OLFIT (4)
Node split
Node deletion
Registers the node into a garbage collector
Evaluation
Algorithms & parameters
Evaluation (1)
Search performance
Evaluation (2)
Insert & delete (pure update) performance
Evaluation (3)
Varying update ratio performance (ST)
Evaluation (4)
Varying update ratio performance (MT)
Conclusions (pros)
Good algorithm, does not interfere with readers or other updaters
Minimises L2 cache misses
Avoids operating system locking calls
If used in a database, the database's transactional concurrency control should be layered on top of it
Conclusions (cons)
Uses busy waiting
The evaluation only considers very small key sizes, so busy waiting is not a problem
It would be interesting and more convincing to see the performance of this algorithm with longer keys, as is the case with databases. Then the cost of busy waiting and retries would be more pronounced
Questions?