Transcript i th

Data Structures &
Algorithms
Radix Search
Richard Newman
based on slides by S. Sahni
and book by R. Sedgewick
Radix-based Keys
•
Key has multiple parts
•
Each part is an element of some set
•
•
•
Character
•
Numeral
Key parts can be accessed (e.g.,
string s[i])
Size of set is radix
Advantages of Radix-based Search
• Good worst-case performance
• Simpler than balanced trees, etc.
• Fast access to data
• Easy way to handle variable-length
keys
• Save space (part of key in structure)
Disadvantages of Radix-based
Search
• May be space-inefficient
• Performance depends on access to
bytes of keys
• Must have distinct keys, or other
way to handle duplicate keys
Digital Search Trees
• Similar to binary search trees
• Difference is that we use bits of the
key to determine subtree to search
• Path in tree = prefix of key
Digital Search Trees
• Insert A-S-E-R-C-H-I-N-G
Key Repr
A 00001
S 10011
E 00101
R 10010
C 00011
H 01000
I 01001
N 01110
G 00111
0
E
C
0
0
1
1
0
G
0
H
1
1
S
0
R
0
1
1
N
I
10
A
1
0
1
Note that binary tree is not
sorted in BST sense
Digital Search Trees
Prop 15.1: A search or insertion into a
DST takes about lg N comparisons
on average, and about 2 lg N
comparisons in the worst case, in a
tree built from N keys. The number of
comparisons is never more than the
number of bits in the search key.
Tries
• Use bits of key to guide search like
DST
• But keep keys in order like BST
• Allow recursive sort, etc.
• Pronounced “try-ee” or “try”
• Keys kept at leaves of a binary tree
Tries
• Defn. 15.1: A trie is a binary tree that
has keys associated with each leaf,
defined as follows:
a trie for an empty set is a null link
a trie for a single key is a leaf w/key
a trie for > 1 key is an internal node
with left link referring to trie for keys
that start with 0, right for keys 1xxx
Tries
• Insert A-S-E-R-C-H-I-N-G
Key Repr
A 00001
S 10011
E 00101
R 10010
C 00011
H 01000
I 01001
N 01110
G 00111
A
0
A
1
S
Construct tree to point
where prefixes match
Tries
• Insert A-S-E-R-C-H-I-N-G
Key Repr
A 00001
S 10011
E 00101
R 10010
C 00011
H 01000
I 01001
N 01110
G 00111
0
A
A
1
S
0 1
0 1
0 1
0 1
A
E
0 1
R
0 1
S
Construct tree to point
where prefixes match
Tries
• Insert A-S-E-R-C-H-I-N-G
Key Repr
A 00001
S 10011
E 00101
R 10010
C 00011
H 01000
I 01001
N 01110
G 00111
0
0 1
1
0 1
H
0 1
0 1
A
E
0 1
0 1
A
C
R
0 1
S
Tries
• Insert A-S-E-R-C-H-I-NG
Key Repr
A
S
E
R
C
H
I
N
G
00001
10011
00101
10010
00011
01000
01001
01110
00111
0
0 1
0 1
A
C
0 1
H
0 1
0 1
0 1
E
1
0 1
01
H
0 1
I
R
0 1
S
Tries
• Prop. 15.2: The structure of a trie is
independent of key insertion order;
there is one unique trie for any given
set of distinct keys.
• Prop. 15.3: Insertion or search for a
random key in a trie built from N
random keys takes about lg N bit
comparisons on average, in the worst
case, bounded by bits in key
Tries
• Annoying feature of tries:
• One-way branching when keys
have common prefix
• Prop. 15.4: A trie built from N
random w-bit keys has about N/lg 2
nodes on the average (about 1.44 N)
Patricia Tries
• Annoying feature of tries:
• One-way branching when keys
have common prefix
• Two different types of nodes in trie
• Patricia tries: fix both of these
•
Practical Algorithm To Retrieve
Information Coded In Alphanumeric
Patricia Tries
• Avoid one-way branching:
• Keep at each node the index of the
next bit to test
•
Skip over common prefix!
• Avoid two types of nodes:
• Store data in internal nodes
•
Replace external links with back links
Patricia Tries
Key Repr
A 00001
S 10011
E 00101
R 10010
C 00011
H 01000
I 01001
N 01110
G 00111
0
1
2
E
3
C
4
A
H
S
4
R
Patricia Tries
• Prop 15.5: Insertion or search in a
patricia trie built from N random
bitstrings takes about lg N bit
comparisons on average, and about
2 lg N in the worst case, but never
more than the length of the key.
Map
• Radix search
• Digital Search Trees
• Tries
• Patricia Tries
• Multiway tries and TSTs
• Text string algorithms
Multiway Tries
• Like radix sort, can get benefit from
comparing more than one bit at a
time
• Compare r bits, speed up search by
a factor of r
• What could possibly be bad?
• Number of links is now R=2r
• Can waste a lot of space!
Multiway Tries
• Structure is (almost) the same as
binary tries
• Except there are R branches
• Search: start at root, leftmost digit
• Follow ith link if next R-ary digit is i
• If null link, then miss
• If reach leaf, it contains only key with
prefix matching path to it - compare
Existence Tries
• Only keys, no records
• Insert/search
• Defn. 15.2: The existence trie for a
set of keys is:
• Empty set: null link
• Non-empty set: internal node with
links for each possible digit to tries
built with the leading digit omitted
Existence Tries
• Convenient to return null on miss,
dummy record on hit
• Convenient to have no duplicate keys
and no key a prefix of another key
• Keys of fixed length, or
• Use termination character with
value NULLdigit, only used as
sentinel
Existence Tries
• No need to store any data
• All keys captured in trie structure
• If reach NULLdigit at the same time
we run out of key digits, search hit
• Otherwise, search miss
• Insert: search until find null link, then
add nodes for each of the remaining
digits in the key
Existence Tries
the
time
is
now
for
f
i
o
s
r
t
n
h
o
w
i
m
e
e
Multi-way Tries
• R-ary branching
• Keys stored at leaves
• Path to leaf defines prefix of key
stored at leaf
• Only build tree downward until
prefixes become distinct
Multi-way Tries
• Defn. 15.3: The multiway trie for a set
of keys associated with leaves is:
• Set empty: null link
• Singleton set: leaf with key
• Larger set: internal node with links
for each possible digit to tries built
with the leading digit omitted
Multi-way Tries
• Def
Summary
• Radix search
• Digital Search Trees
• Tries
• Patricia Tries
• Multiway tries and TSTs
• Text string algorithms