Data Representation Methods

Download Report

Transcript Data Representation Methods

Higher Order Tries
• Key = Social Security Number.
 441-12-1135
 9 decimal digits.
• 10-way trie (order 10 trie).
0 1 2 3 4 5 6 7 8 9
Height <= 10.
Social Security Trie
• 10-way trie
 Height <= 10.
 Search => <= 9 branches on digits plus 1 compare.
• 100-way trie
 441-12-1135
 Height <= 6.
 Search => <= 5 branches on digits plus 1 compare.
Social Security Trie
• 109-way trie
 Height <= 2.
 Search => <= 1 branch on a digit plus 1 compare.
Memory Accesses
• During a search, we can compute needed field of
a branch node.
• Access only that field.
T
key = 46…
0 1 2 3 4 5 6 7 8 9
Memory Accesses
• #memory accesses = 1 per encountered branch
node + those needed for an element node.
T
key = 46…
0 1 2 3 4 5 6 7 8 9
Binary Search Trees
• Red-black tree
 Height <= 2log2109 ~ 60.
 Search => up to 60 compares of 9 digit numbers and
up to 60 memory accesses.
• AVL tree
 Height <= 1.44log2109 ~ 40.
 Search => up to 40 compares of 9 digit numbers and
up to 40 memory accesses.
• Best binary tree.
 Height = log2109 ~ 30.
Higher Order Search Trees
10
30
50
• height can be < log2 n
• #nodes accessed is reduced
• but cache misses/node increases as node
size increases
• worst-case #compares remains >= log2 n
Compressed Social Security Trie
0 1 2 3 4 5 6 7 8 9
char# #ptr
• char# = character/digit used for branching.
 Equivalent to bit# field of compressed binary trie.
• #ptr = # of nonnull pointers in the node.
Insert
Insert 012345678.
012345678
Insert 015234567.
3 2 5
012345678
Null pointer fields not shown.
015234567
Insert
3 2 5
012345678
Insert 015231671.
015234567
Insert
3 2 5
6 1 4
012345678
015231671
Insert 079864231.
015234567
Insert
2 1 7
3 2 5
079864231
6 1 4
012345678
015231671
Insert 012345618.
015234567
Insert
2 1 7
3 2 5
8 1 7
012345618
079864231
1 4
012345678
015231671
Insert 011917352.
6
015234567
Insert
2 1 7
31 2 5
079864231
011917352
8 1 7
012345618
1 4
012345678
015231671
6
015234567
Delete
2 1 7
31 2 5
079864231
011917352
8 1 7
012345618
1 4
012345678
015231671
Delete 011917352.
6
015234567
Delete
2 1 7
3 2 5
079864231
8 1 7
012345618
1 4
012345678
015231671
Delete 012345678.
6
015234567
Delete
2 1 7
3 2 5
079864231
1 4
6
012345618
015231671
Delete 015231671.
015234567
Delete
2 1 7
3 2 5
079864231
012345618
015234567
Variable Length Keys
3 2 5
Insert 0123.
1 4 6
012345678
015231671
015234567
Problem arises only when one key is a (proper)
prefix of another.
Variable Length Keys
3 2 5
5
Insert 0123.
1 4 6
4 #
012345678
0123
015231671
End of key character (#) not shown.
015234567
Variable Length Keys
One trie per length.
T[] 1 2 3 4 5 6 7 8 9 10
Array of tries.
Hashtable of tries.
Tries With Edge Information
• Add a new field (element) to each branch
node.
• New field points to any one of the element
nodes in the subtree.
• Use this pointer on way down to figure out
skipped-over characters.
Example
3 2 5
5 4 #
012345678
1 4 6
0123
015231671
element field shown in blue.
015234567
Etc.
• Expected height of an order m trie is ~logmn.
• Limit height to h (say 6). Level h branch nodes
point to buckets that employ some other search
structure for all keys in subtrie.
Etc.
• Switch from trie scheme to simple array when
number of pairs in subtrie becomes <= s (say s =
6).
 Expected # of branch nodes for an order m trie when
n is large and m and s are small is n/(s ln m).
• Sample digits from right to left (instead of from
left to right) or using a pseudorandom number
generator so as to reduce trie height.
Web Resource
• See Web writeup for additional applications
of tries.
 Prefix search.
 Automatic command (or phone number or
URL) completion.
 LZW compression.
Web Resource
• See Web writeup for alternative node
structures for tries.




Array
Chain
Binary search tree
Hash table