Error-Bounded Problem
Download
Report
Transcript Error-Bounded Problem
Wavelet Synopses with Predefined Error Bounds:
Windfalls of Duality
Panagiotis Karras
DB seminar, 23 March, 2006
Algorithms for Maximum-Error Wavelet Synopses
Restricted
Space-Bounded
Direct
Unrestricted
GK04,05
Guha05
GH05,06
Indirect
Muthukrishnan05
Error-Bounded
?
Compact Data Synopses useful in:
• Approximate Query Processing (exact answers not
always required)
• Learning, Classification, Event Detection
• Data Mining, Selectivity Estimation
• Situations where massive data arrives in a stream
Haar Wavelets
• Wavelet transform: orthogonal transform for the hierarchical
representation of functions and signals
• Haar wavelets: simplest wavelet system, easy to understand
and implement
• Haar tree: structure for
the visualization of
decomposition and
value reconstructions
• Synopsis: Wavelet
representation with few
non-zero terms.
34
18
0
18
18
7
-8
11
25
9
-9
16
2
26
10
10
20
20
10
0
36
16
Maximum-Error Metrics
• Error Metrics providing tight error guarantees for all
reconstructed values:
– Maximum Absolute Error
max i | dˆi di |
– Maximum Relative Error with Sanity Bound (to avoid
domination by small data values)
| dˆi d i |
max i
max{| d i |, s}
• Aim at minimization of these metrics
Restricted Synopses
• Compute Haar wavelet decomposition of D
• Preserve best coefficient subset that satisfies bound
• Space-Bounded Problem:
[GK04,05,Guha05]
Bound B on number of non-zero coefficients
• Error-Bounded Problem:
[Muthukrishnan05]
Bound ε on maximum error
Faster Indirect solution to Space-Bounded Problem
How does it work?
• Space-Bounded Problem
GK04,05: Global Tabulation
S = subset of selected
ancestors
root
EiL , b, S ,
min max
0bb
E
i
,
b
b
,
S
R
Ei, b, S min
E
i
,
b
,
S
c
,
L
i
min max
0bb1
EiR , b 1 b, S ci
+
i
iL
iR
Guha05: Local Tabulation
- Tabulate four one-dimensional arrays:
EiL ,*, S
EiR ,*, S
EiL ,*, S
i
EiR ,*, S
i
- Extract Ei,*, S from these four, delete them
- At most Olog n arrays concurrently stored
- Derive solution at the top, solve the problem again below
On 2 time, On space
How does it work?
root
root
S = subset of selected
ancestors
• Error-Bounded Problem
Muthukrishnan05
MiL , S MiR , S
Mi, S min
M
i
,
S
c
M
i
,
S
c
1
i
R
i
L
+
iL
i
iR
- At log log n levels from bottom
stop recursion, enter local
search
n
On space
- O log
time,
n
2
• No need to tabulate
• The solution to this problem is more economic
• Dual Space-Bounded solved Indirectly via binary search
Unrestricted Synopses
[GH05,06]
• Forget about actual coefficient values
• Choose a best set of non-zero wavelet terms of any values
• In practice:
Examined values are multiples of resolution step δ
EiL , b, v ,
min max
0bb
EiR , b b, v
Ei, b, v min
min max E iL , b , v z ,
z , 0bb1
EiR , b 1 b, v z
Unrestricted Synopses
[GH05,06]
• Approximation quality better than restricted
• Time asymptotically linear to n
But:
- Examined values bounded by M [GH05]
- Multiple Guesses of error result [GH06]
- Space-Bounded Problem:
Two-Dimensional Tabulation E(b,v) on each tree node
→ High Running Time and Space demands
Our Approach:
Wavelet Synopses with Predefined Error Bounds
• Error-Bounded Problem
DP algorithm:
- Demarcates examined values using error bound ε
- Tabulates only S(v), one dimension per node
• Space-Bounded Problem
Enhanced Solution:
- Calculate upper bound for error, use it to bound values
Indirect Solution:
- Use binary search on Error-Bounded problem
How does it work?
• One-dimensional tabulation on values only
Si, v min SiL , v z SiR , v z z 0
z
• Examined incoming values v bounded by error bound
vi v
• Examined assigned values z also bounded
zi ziv vi v
• Strong version of problem: minimize error within space
Complexity
• Error-Bounded Problem:
Time
2
O n
Space
n
O BM log
BM
or
2
O n log n
or
O log n n
• Space-Bounded Problem:
E 2
2
O n log B
Time
E
o n log n
vs.
Space
E
o log n n
E
n
vs. O B log n
B
2
Experiments: Error-Bounded Problem
Experiments: Error-Bounded Problem
Experiments: Space-Bounded Problem
Experiments : Space-Bounded Problem
Experiments : Space-Bounded Problem
Related Work
• M. Garofalakis and A. Kumar. Deterministic wavelet
thresholding for maximum-error metrics. PODS 2004
• S. Guha. Space efficiency in synopsis construction
algorithms. VLDB 2005
• S. Guha and B. Harb. Wavelet Synopses for Data
Streams: Minimizing Non-Euclidean Error. KDD 2005
• S. Muthukrishnan. Subquadratic algorithms for
workload-aware haar wavelet synopses. FSTTCS 2005
• S. Guha and B. Harb. Approxmation algorithms for
wavelet transform coding of data streams. SODA 2006
Thank you! Questions?