Making B+-Trees Cache Conscious in Main memory

Download Report

Transcript Making B+-Trees Cache Conscious in Main memory

Making B+-Trees Cache
Conscious in Main Memory
Author:Jun Rao, Kenneth A. Ross
Members: Iris Zhang, Grace Yung, Kara Kwon, Jessica Wong
Outline
1. Introduction
2. Related Work
3. Cache Sensitive B+-Trees
4. Conclusion
Motivation
1. Significant portion
of execution time:
•
•
second level data
cache misses
first level
instruction cache
misses
System Hierarchy
Motivation (Cont’d)
2. CPU speeds have been
increasing at a much
faster rate than
memory speeds
• Conclusion: improving cache behavior is going to
be an imperative task in main memory data
processing
• Resolution: using memory index structure
Cache Memories
Cache memories are small fast static RAM memories
that improve performance by holding recently
referenced data.
Parameter:
• Capacity
• Block Size (cache line)
• Associativity
Memory reference:
• Hit
• Miss
Cache Optimization on Index
Structures—B+-Trees
B+-Tree (n =2)
• Height-balanced tree
• Minimum 50%
occupancy (except for
root). Each node contains
d <= m <= 2d entries.
The parameter d is called
the order of the tree.
(n=2d)
• Each node is 1 cache line
(cache-line based)
• Full pointer
Cache Optimization on Index
Structures—CSS-Trees
CSS-Tree
• Similar as B+-tree
• Eliminating child pointers
-Storing child nodes in a
fixed sized array.
-Nodes are numbered &
stored level by level, left
to right.
-Position of child node
can be calculated via
arithmetic.
-No pointer
Comparison between B+-Trees
and CSS-Trees
• Cache Line Size=12 bytes, Key Size=Pointer Size=4 bytes
• Search key =3
• B+-Tree
CSS-Tree
Comparison between B+-Trees
and CSS-Trees(cont’d)
B+ tree
• full pointer
• more cache access and
more cache misses
• efficient for updating
operation, e.g.
insertion and deletion
CSS tree
• no pointer
• fewer cache access
and fewer cache
misses
• acceptable for static
data updated in
batches
Conclusion: partial pointer elimination
Cache Sensitive B+-Trees
1. Cache Sensitive B+-Trees with One Child
Pointer
2. Segmented CSB+-Trees
3. Full CSB+-Trees
Cache Sensitive B+-Trees with
One Pointer
• Similar as B+-tree
• All the child nodes of any
given node are put into a
node group with one
pointer
• Nodes within a node
group are stored
continuously and can be
accessed using an offset to
the first node in the group
Cache Sensitive B+-Trees with
One Pointer (cont’d)
• Cache misses are reduced because a cache line can
hold more keys than B+-Trees and can satisfy one
more level comparison.
Cache Line Size=64 bytes, Key Size=Pointer Size=4 bytes
B+-Tree: 7 keys per node
CSB+-Tree: 14 keys per node
• CSB+-Tree can support incremental updates in a
way similar to B+-Tree
Operations on CSB+-Tree—
Bulkload
22|
7|
3|
13|19
30|
25|
33|
2|3 5|7 12|13 16|19 20|22 24|25 27|30 31|33 36|39
Operations on CSB+-Tree—
Insertion
1. Search the leaf node n to insert the new entry
2. If n is not full, insert the new entry in the appropriate place
3. Otherwise, split n. Let p be n’ parent node, f be the firstchild pointer in p and g be the node-group pointed by f
a. If p is not full, copy g to g' in which n is split in two
nodes. Let f point to g'
b. If p is full, copy half g to g'. Let f point to g'. Split the
node-group of p according to step a
Operations on CSB+-Tree—
Insertion (cont’d)
22|
key = 34
7|
3|
13|19
30|
25|
33|
2|3 5|7 12|13 16|19 20|22 24|25 27|30 31|33 36|39
a CSB+-Tree of Order 1
Operations on CSB+-Tree—
Insertion (cont’d)
22|
key = 34
7|
3|
13|19
30|
25|
33|36
2|3 5|7 12|13 16|19 20|22 24|25 27|30 31|33 34|36 39|
Operations on CSB+-Tree—
Search
•
Determine the rightmost key K in the node that is smaller
than the search key
•
Get the address of the child node
•
Goto first step until find the search key or there is no other
node can be checked
Search method in a node
basic approach
uniform approach
variable approach
Segmented Cache Sensitive B+Trees
• Problem: it’s time consuming
to split a node group
• Resolution:SCSB+-Tree
– method: divide node group
into two segments with one
child pointer per segment
– result: better split
performance, but worse
search
Full CSB+-Tree
• Motivation: reduce the split cost
• Method:
– pre-allocate space for a full node group
– shift part of the node group along by one node
when a node split
• Result:
– reduce the split cost, but increase the space
complexity
Conclusion
• CSB+-Trees are more cache conscious than
B+-Tree because of partial pointer
elimination
• CSB+-Trees support efficient incremental
updates, but CSS-Trees do not
• Partial pointer elimination is a general
technique which can be applied to other
memory structures