ws0708-lecture-12-PersistentDS

Download Report

Transcript ws0708-lecture-12-PersistentDS

Persistent Data Structures
Computational Geometry, WS 2007/08
Lecture 12
Prof. Dr. Thomas Ottmann
Khaireel A. Mohamed
Algorithmen & Datenstrukturen, Institut für Informatik
Fakultät für Angewandte Wissenschaften
Albert-Ludwigs-Universität Freiburg
Overview
• Versions and persistence in data structures
• Making structures persistent
• Partial persistence
– Fat node method
– Path-copying method
– Node-copying (DSST) method
• Revisit: Planar point-location
– Sarnak-Tarjan solution
– Dobkin-Lipton observation
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
2
Data Structures in the Temporal Sense
A data structure is called
• Ephemeral – no mechanisms to revert to previous states.
– Usually, a single transitory structure where a change to the structure
destroys the old version.
•
Persistent – supports access to multiple versions. Furthermore, a
structure is
– partially persistent if all versions can be accessed but only the newest
version can be modified, and
– fully persistent if every version can be both accessed and modified.
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
3
A Linked Data Structure
Pre-definitions:
• A linked data structure has a finite collection of nodes.
• Each node contains a fixed number of named fields.
• All nodes in the structure are of exactly the same type
• Access to the linked structure is by pointers indicating nodes of the
structure.
In our deliberations:
• We shall use the binary search tree as our linked data structure for
all running examples throughout the lecture.
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
4
Persisted Versions
Versions are directly related to the operations incurred on the data
structure, mainly:
• Update operations
• Access operations
• After an update operation, the current and all previous states of the
data structure are archived in a manner that makes them accessible
(via access operations) from their version identities.
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
5
Terminologies
•
Current version – Version vi of the data structure where a current
operation is about to be performed
•
Current operation – An update operation performed on the current
version vi of the data structure, which will result in the newest
version vi+1, spawned after a successful completion of the operation.
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
6
Making Structures Persistent: Naïve I
Naïve Structure-Copy Method
• Make a copy of the data structure each time it is changed
• At current operation:
– A new version vi+1 is spawned by completely copying the current version
– The update operation is performed on the newest version
• Costs (for structure of size n):
– Per update:
– For m updates:
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
Time
Time
Space
Space
7
Making Structures Persistent: Naïve II
Naïve Log-File Method
• Store a log-file of all updates
• At current operation:
– Update log-file
• To access version i:
– Sequentially carry out i updates, starting from the initial structure, to
generate version i.
• Costs (for structure of size n):
– Per update:
– For m updates:
– Per access:
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
Time
Time
Time
Space
Space
8
Hybrid Method
Structure-Copying with Log-file
• Store the complete sequence of updates in a log-file
• Store every kth version of the data structure, for a suitably chosen k
• To access version i:
– Retrieve structure from version k i/k
– Sequentially update structure to get version i
• Tradeoffs from Log-file method:
– Time and space requirement increase at least with a factor of
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
9
Ideals
We seek more efficient techniques:
Ideally, we want (on average) to have
• Storage space used by the persistent structure to be O(1) per
update step, and
• Time per operation to increase by only a constant factor over the
time in the ephemeral structure
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
10
Fat Node Method – Partial Persistence
• Record all changes made to the node field in the nodes themselves
• Nodes are allowed to become arbitrarily “fat” to include version
history; i.e. a list of version stamps
• A version stamp indicates the version in which the named field was
changed to the specified value
• Each fat node has its own version stamp to indicate the version in
which it was created
• However, a version stamp is not unique; i.e. several Fat nodes can
have the same version stamp
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
11
Update Operation – Fat Node Method
Consider update operation i.
Ephemeral
Persistent (Fat Node Method)
• Creates new node
• Creates new Fat node with version
stamp i, and all original field values
• Changes a field
• Store field value plus version stamp
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
12
Update Operation – Example
• (Versions 1 to 9) Insert: 5, 20, 8, 15, 6, 2, 1, 28, 12
5
2
20
1
8
28
6
15
12
• (Versions 10 to 12) Delete: 20, 5, 1
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
13
Access Operation – Fat Node Method
Accessing any version i  m in the persistent structure:
• Find the root node at version i.
• Then traverse nodes in the structure, choosing only version values
with the maximum version stamp  i.
Example: Given this persistent structure, access version v11
v1-v10
5
v6
v7
v11-v12
v2
2
20
v3
v12
1
8
v10
v5
v11
v4
6
v10
v8
28
v10
15
v9
12
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
v10
14
Analysis – Fat Node Method
Assumption: The version stamps in a Fat node are ordered and stored
in a balanced binary search tree.
Update operation
• Space per update:
• Time per update:
Access operation
• Time per access: (multiplicative slow-down)
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
15
Path-Copying Method – Partial Persistence
• Creates a set of search trees, one per update, having different roots
but sharing common subtrees
• Copy only the nodes in which changes are made, such that any
node in the current version that contains a pointer to a node must
itself be copied
• In our linked data structure, each node contains pointers to its
children
• Copying one node in the current version requires copying the entire
path from the node to the root – hence the name “Path-Copying”
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
16
Update Operation – Path-Copying
Consider update operation i.
• Identify the node in the current version that will be affected by the
update operation
• Make a copy of this node (and hence the path to the root in the
current version)
• Modify the path accordingly to the operation
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
17
Update Operation – Example (Insert)
• (Versions 1 to 9) Insert: 5, 20, 8, 15, 6, 2, 1, 28, 12
… v7
5
2
20
1
8
6
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
15
18
Update Operation – Example (Insert)
• (Versions 1 to 9) Insert: 5, 20, 8, 15, 6, 2, 1, 28, 12
… v7
v8
5
5
2
20
1
8
6
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
20
28
15
19
Update Operation – Example (Insert)
• (Versions 1 to 9) Insert: 5, 20, 8, 15, 6, 2, 1, 28, 12
… v7
5
2
20
1
8
6
v8
v9
5
5
20
20
28
8
15
15
12
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
20
Update Operation – Example (Delete)
• (Versions 10 to 12) Delete: 1, 20, 5
… v9
5
2
20
1
8
6
28
15
12
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
21
Update Operation – Example (Delete)
• (Versions 10 to 12) Delete: 1, 20, 5
… v9
v10
5
5
2
20
1
8
6
2
28
15
12
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
22
Update Operation – Example (Delete)
• (Versions 10 to 12) Delete: 1, 20, 5
… v9
5
2
20
1
8
6
v10
v11
5
5
2
28
15
8
15
12
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
23
Update Operation – Example (Delete)
• (Versions 10 to 12) Delete: 1, 20, 5
… v9
5
2
20
1
8
6
v10
v11
v12
5
5
2
2
28
15
8
15
12
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
24
Access Operation – Path-Copying
Assumption: The version roots are ordered and stored in some
accessible structure on top of all the m persisted versions.
To access any version vi:
• We only need to locate the correct root from the accessible top
structure to access the required version i
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
25
Analysis – Path-Copying Method
Update operation
• Space per update:
• Time per update:
Access operation
• Time per access:
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
26
Node-Copying Method – Partial Persistence
• An improvement to the Fat node method
• We do not allow nodes to become arbitrarily “fat”, but fix this number
• When we run out of space for version stamps, we then create a new
copy of the node
• In our deliberation, we allow only 1 additional pointer, contained in
the node and call it the version stamp modification box.
k
lp
Original left pointer
to left child with
version before vt
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
vt: ptr
rp
Version stamp
modification box
Original right pointer
to right child with
version before vt
27
Update Operation – Node-Copying
Consider update operation i.
• Identify the node in the current version that will be affected by the
update operation
• Make a copy of this node if the version stamp modification box is
not empty
• Modify the node accordingly to the operation
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
28
Update Operation – Example (Insert)
• (Versions 1 to 6) Insert: 15, 6, 2, 1, 28, 12
v0
5
20
8
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
29
Update Operation – Example (Insert)
• (Versions 1 to 6) Insert: 15, 6, 2, 1, 28, 12
v0
5
20
v2:lp
8
8
v1:rp
6
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
15
30
Update Operation – Example (Insert)
• (Versions 1 to 6) Insert: 15, 6, 2, 1, 28, 12
v0-v4
v5-v6
5
5
v3:lp
2
20
v4:lp
v2:lp
1
8
8
20
28
v1:rp
6
15
v6:lp
12
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
31
Update Operation – Example (Delete)
• (Versions 7 and 8) Delete: 1, 20
v0-v4
v5-v6
5
5
v3:lp
2
20
v4:lp
v2:lp
1
8
8
20
28
v1:rp
6
15
v6:lp
12
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
32
Access Operation – Node-Copying
Navigating through this persistent structure is exactly the same as the
Fat node method.
To access any version vi:
• Find the root node at version i.
• Then traverse nodes in the structure, choosing only version values
with the maximum version stamp  i.
Exercise: From the previous figure, access version v6
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
33
Analysis – Node-Copying Method
Update operation
• Space per update:
• Time per update:
Access operation
• Time per access:
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
34
Planar Point Location
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
35
Planar Point Location: Sarnak-Tarjan Solution
• Idea: (partial) persistence
– Query time: O(log n), Space: O(n)
– Relies on Dobkin-Lipton construction and Cole’s observation.
• Dobkin-Lipton:
– Partition the plane into vertical slabs by drawing a vertical line through
each endpoint.
– Within each slab, the lines are totally ordered.
– Allocate a search tree per slab containing the lines, and with each line
associate the polygon above it.
– Allocate another search tree on the x-coordinates of the vertical lines.
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
36
Dobkin-Lipton Construction
• Partition the plane into vertical slabs.
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
37
Dobkin-Lipton Construction
• Locate a point with two binary searches. Query time: O(log n).
• Nice but space inefficient! Can cause O(n2).
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
38
Worst-Case Example
• Θ(n) segments in each slabs, and Θ(n) slabs.
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
39
Cole’s Observation
A B
• Sets of line segments intersecting contiguous slabs are similar.
• Reduces the problem to storing a “persistent” sorted set.
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
40
Improving the Space Bound
•
•
Create the search tree for the first slab.
Then obtain the next one by deleting the lines that end at the corresponding
vertex and adding the lines that start at that vertex.
•
Total number of insertions / deletions:
– 2n
– One insertion and one deletion per segment.
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
41
Planar Point Location and Persistence
•
•
Updates should be persistent (since we need all search trees at the end).
Partial persistence is enough (Sarnak and Tarjan).
•
Method 1: Path-copying method; simple and powerful (Driscoll et al.,
Overmars).
– O(n log n) space + O(n log n) preprocessing time.
•
Method 2: Node-copying method
– We can improve the space bound to O(n).
Computational Geometry, WS 2007/08
Prof. Dr. Thomas Ottmann
42