Sorted Maps and Skip Lists

Download Report

Transcript Sorted Maps and Skip Lists

Sorted Maps
© 2014 Goodrich, Tamassia, Goldwasser
Skip Lists
1
Sorted Maps (Ordered Maps)

Maps


exact match to the key
Sorted Maps


Allows inexact match to the key
E.g. Flights departing between a time
range
 (MLB, JFK, 3Oct, 1:00) to (MLB, JFK, 3Oct,
3:00)
 Kayak.com
© 2014 Goodrich, Tamassia, Goldwasser
Skip Lists
2
Main operations

floorEntry(k)

Entry with the greatest key less than or
equal to k
 The key “at or before” k

ceilingEntry(k)

Entry with the least key value greater than
or equal to k
 The key “at or after” k

Submap(k1,k2)

Entries between k1 and k2
© 2014 Goodrich, Tamassia, Goldwasser
Skip Lists
3
Example
Index
Element
0
1
2
3
4
5
6
A
D
F
K
M
O
Q
Search Key is G
•
F<G<K
•
•
F is the floor of G
K is the ceiling of G
© 2014 Goodrich, Tamassia, Goldwasser
Skip Lists
4
Sorted Search Tables


Sorted array by the key values
Exact match/search


Binary search
Inexact match/search

Modify binary search
 To handle “not found” differently
© 2014 Goodrich, Tamassia, Goldwasser
Skip Lists
5
Binary search (review)


Recall low and high specify the valid
range where key is
If the mid point is the key


Key is found
If low > high (no valid range)

Key is not found
© 2014 Goodrich, Tamassia, Goldwasser
Skip Lists
6
Binary Search

Binary search can perform nearest neighbor queries on an
ordered map that is implemented with an array, sorted by key




similar to the high-low children’s game
at each step, the number of candidate items is halved
terminates after O(log n) steps
Example: find(7)
0
1
3
4
5
7
1
0
3
4
5
m
l
0
9
11
14
16
18
m
l
0
8
1
1
3
3
7
19
h
8
9
11
14
16
18
19
8
9
11
14
16
18
19
8
9
11
14
16
18
19
h
4
5
7
l
m
h
4
5
7
l=m =h
© 2014 Goodrich, Tamassia, Goldwasser
Binary Search Trees
7
Binary Search (inexact)
Index
Element
0
1
2
3
4
5
6
A
D
F
K
M
O
Q
Search Key is G
1. Low=0, high=6, mid=3; G<K, check first half
© 2014 Goodrich, Tamassia, Goldwasser
Skip Lists
8
Binary Search (inexact)
Index
Element
0
1
2
3
4
5
6
A
D
F
K
M
O
Q
Search Key is G
1. Low=0, high=6, mid=3; G<K, check first half
2. Low=0, high=2, mid=1; G>D, check second half
© 2014 Goodrich, Tamassia, Goldwasser
Skip Lists
9
Binary Search (inexact)
Index
Element
0
1
2
3
4
5
6
A
D
F
K
M
O
Q
Search Key is G
1. Low=0, high=6, mid=3; G<K, check first half
2. Low=0, high=2, mid=1; G>D, check second half
3. Low=2, high=2, mid=2; G>F, check second half
© 2014 Goodrich, Tamassia, Goldwasser
Skip Lists
10
Binary Search (inexact)
Index
Element
0
1
2
3
4
5
6
A
D
F
K
M
O
Q
Search Key is G
1.
2.
3.
4.
•
•
Low=0, high=6, mid=3; G<K, check first half
Low=0, high=2, mid=1; G>D, check second half
Low=2, high=2, mid=2; G>F, check second half
Low=3, high=2; low>high, not found
F<G<K
Low=3 is the ceiling of G; high=2 is the floor of G
© 2014 Goodrich, Tamassia, Goldwasser
Skip Lists
11
Disadvantage of sorted array


Fixed maximum size
Put/remove could lead to moving a
number of entries
© 2014 Goodrich, Tamassia, Goldwasser
Skip Lists
12
Worst-case Time Complexity
n entries in a sorted array
Operation
Time Complexity
get(k)
put(k,v)
remove(k)
ceilingEntry(k)
floorEntry(k)
submap(k1,k2)
© 2014 Goodrich, Tamassia, Goldwasser
Skip Lists
13
Worst-case Time Complexity
n entries in a sorted array
Operation
Time Complexity
get(k)
O(log n)
put(k,v)
O(n)
remove(k)
O(n)
ceilingEntry(k)
O(log n)
floorEntry(k)
O(log n)
submap(k1,k2)
O(log n + s), s entries reported
O(n) if all entries are reported
© 2014 Goodrich, Tamassia, Goldwasser
Skip Lists
14
No Fixed Maximum Size

Sorted Linked List


Instead of sorted array
Can we still use binary search?
© 2014 Goodrich, Tamassia, Goldwasser
Skip Lists
15
Presentation for use with the textbook Data Structures and
Algorithms in Java, 6th edition, by M. T. Goodrich, R. Tamassia,
and M. H. Goldwasser, Wiley, 2014
Skip Lists
S3 -
S2 -
15
S1 -
15
23
15
23
S0 -
© 2014 Goodrich, Tamassia, Goldwasser
+
Skip Lists
10
+
+
36
+
16
Linked List and Search

Sorted array


Binary search
Sorted linked list

Can’t use binary search
 No direct access to a node


Only sequential access via the next pointer
Additional data structure
 Skip lists for faster search/updates
© 2014 Goodrich, Tamassia, Goldwasser
Skip Lists
17
What is a Skip List

A skip list for a set S of distinct (key, element) items is a series of
lists S0, S1 , … , Sh such that




S3
-
S2
-
S1
-
S0
-
Each list Si contains the special keys + and -
List S0 contains the keys of S in nondecreasing order
Each list is a subsequence of the previous one, i.e.,
S0  S1  …  Sh
List Sh contains only the two special keys
+
+
31
23
12
23
© 2014 Goodrich, Tamassia, Goldwasser
26
31
34
31
34
Skip Lists
+
64
44
56
64
78
+
18
Search for Key k

start at the first position of the top list

At the current position p, y  key(next(p))

Compare k with y
 k = y: we return element(next(p))
 k >= y: we “scan forward”
 k < y: we “drop down”


If we try to drop down past the bottom list, we return null
Example: search for 78
S3
-
S2
-
S1
-
S0
-
+
scan forward
+
31
drop down
23
12
23
© 2014 Goodrich, Tamassia, Goldwasser
26
31
34
31
34
Skip Lists
+
64
44
56
64
78
+
19
How to find the floor and ceiling of 70?
Search for Key k

start at the first position of the top list

At the current position p, y  key(next(p))

Compare k with y
 k = y: we return element(next(p))
 k >= y: we “scan forward”
 k < y: we “drop down”


If we try to drop down past the bottom list, we return null
Example: search for 78
S3
-
S2
-
S1
-
S0
-
+
scan forward
+
31
drop down
23
12
23
© 2014 Goodrich, Tamassia, Goldwasser
26
31
34
31
34
Skip Lists
+
64
44
56
64
78
+
20
Expected Height of Skip Lists

List Si+1



Randomly selected from Si based on a
“coin toss”
Si+1 has about half of the items in Si
For N entries

Expected number of skip lists (height) is
log N
© 2014 Goodrich, Tamassia, Goldwasser
Skip Lists
21
Pseudorandom Numbers

Library functions

Generate pseudorandom numbers
 Not truly random
 Starting with the same “seed”


The same sequence of pseudorandom numbers can
be generated
Simulate a coin toss


0 is tail, 1 is head
Even is tail, odd is head
© 2014 Goodrich, Tamassia, Goldwasser
Skip Lists
22
Randomized Algorithms

Based on pseudorandom numbers


Perform an arbitrary operation out of a
fixed set of operations
For example,
 operation a if odd
 operation b if even

Insertion to skip lists

uses a randomized algorithm
© 2014 Goodrich, Tamassia, Goldwasser
Skip Lists
23
Insertion: entry (x,o)

Repeatedly toss a coin until we get tails


If i  h, we add to the skip list new lists Sh+1, … , Si +1



each containing only the two special keys
For each list Sj,


i = number of times the coin came up heads
find the position pj that is right before x (floor)
Insert entry (x, o) into list Sj after position pj
Example: insert key 15, with i = 2
p2
S2 -
p1
S1 -
S0 -
S3 -
23
p0
10
23
36
© 2014 Goodrich, Tamassia, Goldwasser
+
+
S2 -
15
+
S1 -
15
23
+
S0 -
15
23
Skip Lists
10
+
+
36
+
24
Deletion: key x




search for x in the skip list and find the positions p0, p1 ,
…, pi of the items with key x, where position pj is in list Sj
remove positions p0, p1 , …, pi from the lists S0, S1, … , Si
remove all but one list containing only the two special
keys
Example: remove key 34
S3 -
+
p2
S2 -
34
p1
S1 -
S0 -
23
34
p0
12
23
34
© 2014 Goodrich, Tamassia, Goldwasser
45
+
S2 -
+
S1 -
+
S0 -
Skip Lists
+
+
23
12
23
45
+
25
Implementation


We can implement a skip list
with quad-nodes
A quad-node stores:






element
link to the
link to the
link to the
link to the
node
node
node
node
quad-node
prev
next
below
above
x
special keys PLUS_INF and
MINUS_INF
© 2014 Goodrich, Tamassia, Goldwasser
Skip Lists
26
Expected (not Worst-case)
Complexity
N entries
Expected Complexity
Height
O(log N)
Space Complexity
O(N)
Time Complexity
Search/get
O(log N)
Insertion/put
O(log N)
Deletion/remove
O(log N)
Skipping the proof (beyond the scope this course)
© 2014 Goodrich, Tamassia, Goldwasser
Skip Lists
27
Skipping the rest
© 2014 Goodrich, Tamassia, Goldwasser
Skip Lists
28
Space Usage


Depends on the random “coin toss” used by each
invocation of the insertion algorithm
Two basic probabilistic facts:
 Fact 1:
 The probability of getting i consecutive heads
when flipping a coin is (½)i =1/2i
 Fact 2:
 If each of n entries is present in a set with
probability p

the expected size of the set is np
© 2014 Goodrich, Tamassia, Goldwasser
Skip Lists
29
Space Usage

Consider a skip list with n entries



By Fact 1, we insert an entry in list Si with probability 1/2i
By Fact 2, the expected size of list Si is n/2i
The expected number of nodes used by the skip list is
h
h
n
1
=
n
 2i  2i < 2n
i =0
i =0
Thus, the expected space usage of a skip list with n
items is O(n)
© 2014 Goodrich, Tamassia, Goldwasser
Skip Lists
30
Height





Fact 3: If each of n events has probability p
 the probability that at least one event occurs is at most np
Consider a skip list with n entries
 By Fact 1, we insert an entry in list Si with probability 1/2i
 By Fact 3, the probability that list Si has at least one item is at
most n/2i
By picking i = 3log n, the probability that S3log n has at least one entry
is at most
n/23log n = n/n3 = 1/n2
Thus a skip list with n entries has height at most 3log n with
probability at least 1 - 1/n2
Highly likely, the height is O(log n)
© 2014 Goodrich, Tamassia, Goldwasser
Skip Lists
31
Time Complexity of Search



The search time in a skip list is proportional to
 the number of drop-down steps, plus
 the number of scan-forward steps
The drop-down steps are bounded by the height
 O(log n) with high probability
To analyze the scan-forward steps, we use:
Fact 4: The expected number of coin tosses required
to get tails is 2
© 2014 Goodrich, Tamassia, Goldwasser
Skip Lists
32
Time Complexity of Search


scan forward in a list (and key does not belong to a
higher list)
 A scan-forward step is associated with a former coin
toss that gave tails
By Fact 4, in each list the expected number of scanforward steps


2
Expected overall number of scan-forward steps

O(log n)
© 2014 Goodrich, Tamassia, Goldwasser
Skip Lists
33
Time Complexity of Search

Drop-down steps


O(log n)
Scan-forward steps

O(log n)

Overall: O(log n) expected time

The analysis of insertion and deletion gives similar results
© 2014 Goodrich, Tamassia, Goldwasser
Skip Lists
34
Summary


A skip list is a data
structure for maps that
uses a randomized
insertion algorithm
In a skip list with n
entries


The expected space used
is O(n)
The expected search,
insertion and deletion
time is O(log n)
© 2014 Goodrich, Tamassia, Goldwasser


Skip Lists
Using a more complex
probabilistic analysis,
one can show that
these performance
bounds also hold with
high probability
Skip lists are fast and
simple to implement in
practice
35