Slides - Università di Pisa

Download Report

Transcript Slides - Università di Pisa

Rank-Sensitive Data Structures
Iwona Bialynicka-Birula and Roberto Grossi (Università di Pisa)
12th Symposium on String Processing and Information Retrieval (SPIRE 2005)
Overview
• Motivation
• Our results
• Solution outline
Motivation
• Output-sensitive data structures can still be too
costly
• Most often additional criteria exist
• Examples
•
•
•
•
•
•
Web pages – PageRank or similar
Geometrical objects – Z-order
Various databases – physical location
News items – time stamp
Biological databases – biological relevance
...
Rank-sensitive data structures
• Output-sensitive data structures
• Data structure holds n items
• A query reports all (say l) matching items
• Query time is O(t(n)+l)
• Rank-sensitive data structures
•
•
•
•
Additional, independent, a priori ranking function
Additional parameter k at query time
Query returns top k matching items sorted by rank
Query time is O(t(n)+k)
Rank-sensitive – key features
• Returns items sorted in rank order
• Query time depends on k and not on l
• Even if l is small with respect to n, it can still be
much too large to process (e.g. web search)
• Can be used in real-time systems
• No structure which requires sorting has this feature
Overview
• Motivation
• Our results
• Solution outline
Our model
• Tree data structures
• Result set is obtained from
• An interval of consecutive leaves or
• O(polylog(n)) such disjoint intervals
Our model – examples (1)
• Suffix tree (trie)
0
1
0
occurrences of „010”
Our model – examples (2)
• Range tree (1D)
1
2
3
5 7
11 13 17
Items in range 3; 19
19 23 29
31
Our model – examples (3)
• Range tree (2D)
Items in a 2D rectangle
Our model – examples (4)
• Hierarchy
Similar items
Our results in this model
• D – data structure as defined above
• D – size of D (in memory words)
• s(n) – number of items stored in D (incl. copies)
• O(t(n)+l) – query time
• D – rank-sensitive version of D (static case)
• O(t(n)+k) – query time
• DD + O(s(n)lgn) – for any 01
• D – rank-sensitive version of D (dynamic)
• O(t(n)+k) plus O(lgn/lglgn) per interval
• DD + O(s(n)lgn/lglgn)
• O(lgn) per copy – rank change
Overview
• Motivation
• Our results
• Solution outline
Basic idea
1
6
7
6
1
2
7
8
7
1
6
8
• O(nlgn) space
3
4
5
6
8
3
1
4
7
8
2
3
4
5
4
2
3
5
5
2
Query
1
6
7
6
1
2
7
8
7
1
6
8
3
4
5
6
8
3
1
4
7
8
2
3
4
5
4
2
3
5
5
2
• Reduced to merging O(lgn) lists on the fly
Space reduction in static case
1
6
1
7
6
0
1
0
2
1
7
0
8
1
7
0
1
6
8
• Chazelle 1988
• O(nlgn) space
3
1
4
1
5
1
6
0
8
0
3
1
1
4
7
0
8
0
2
1
3
0
4
0
5
1
4
0
2
1
3
5
5
0
2
Dynamic case
•
•
•
•
Store explicit values in lists
Weight-balanced B-tree
Degree proportional to lgn/lglgn
Dynamic fractional cascading
• Multi-Q-heaps
• Constant-depth hierarchical pipeline of heaps
Multi-Q-heaps
• Similar to Q-heap
•
•
•
•
Stores up to O(lgN/lglgN) integers
The integers are from the universe 0...O(N)
Search, find-min, insert, delete in O(1) time
Requires lookup tables of O(N) space
• Performs operations on any subset of items
• Simple implementation, no special instructions
Multi-Q-heaps in our solution (1)
3
2
®
Multi-Q 11
lgN
lglgN
2
1
Multi-Q
...
6
1
Multi-Q
7
®
lgN
lglgN
1
26
13 Multi-Q 13
16
Constant depth
®
lgN
lglgN
O(lgN )
Multi-Q-heaps in our solution (2)
• Nodes have non-constant degree
3
Multi-Q
21
3
9
1
15
Thank you
• Questions?