Error-Bounded Problem

Download Report

Transcript Error-Bounded Problem

Wavelet Synopses with Predefined Error Bounds:
Windfalls of Duality
Panagiotis Karras
DB seminar, 23 March, 2006
Algorithms for Maximum-Error Wavelet Synopses
Restricted
Space-Bounded
Direct
Unrestricted
GK04,05
Guha05
GH05,06
Indirect
Muthukrishnan05
Error-Bounded
?
Compact Data Synopses useful in:
• Approximate Query Processing (exact answers not
always required)
• Learning, Classification, Event Detection
• Data Mining, Selectivity Estimation
• Situations where massive data arrives in a stream
Haar Wavelets
• Wavelet transform: orthogonal transform for the hierarchical
representation of functions and signals
• Haar wavelets: simplest wavelet system, easy to understand
and implement
• Haar tree: structure for
the visualization of
decomposition and
value reconstructions
• Synopsis: Wavelet
representation with few
non-zero terms.
34
18
0
18
18
7
-8
11
25
9
-9
16
2
26
10
10
20
20
10
0
36
16
Maximum-Error Metrics
• Error Metrics providing tight error guarantees for all
reconstructed values:
– Maximum Absolute Error


max i | dˆi  di |
– Maximum Relative Error with Sanity Bound (to avoid
domination by small data values)

 | dˆi  d i | 

max i 


 max{| d i |, s} 

• Aim at minimization of these metrics
Restricted Synopses
• Compute Haar wavelet decomposition of D
• Preserve best coefficient subset that satisfies bound
• Space-Bounded Problem:
[GK04,05,Guha05]
Bound B on number of non-zero coefficients
• Error-Bounded Problem:
[Muthukrishnan05]
Bound ε on maximum error
Faster Indirect solution to Space-Bounded Problem
How does it work?
• Space-Bounded Problem
GK04,05: Global Tabulation
S = subset of selected
ancestors
root



EiL , b, S , 
min max 




0bb


E
i
,
b

b
,
S


 R


Ei, b, S   min 






E
i
,
b
,
S

c
,


L
i
 min max





 0bb1
EiR , b  1  b, S  ci 


+
i
iL
iR
Guha05: Local Tabulation
- Tabulate four one-dimensional arrays:
EiL ,*, S 
EiR ,*, S 
EiL ,*, S  
i
EiR ,*, S  
i
- Extract Ei,*, S  from these four, delete them
- At most Olog n arrays concurrently stored
- Derive solution at the top, solve the problem again below
On 2  time, On  space
How does it work?
root
root
S = subset of selected
ancestors
• Error-Bounded Problem
Muthukrishnan05
MiL , S   MiR , S 


Mi, S   min 





M
i
,
S

c

M
i
,
S

c

1
i
R
i
 L

+
iL
i
iR
- At log log n levels from bottom
stop recursion, enter local
search

n
On  space

- O log
time,

n
2


• No need to tabulate
• The solution to this problem is more economic
• Dual Space-Bounded solved Indirectly via binary search
Unrestricted Synopses
[GH05,06]
• Forget about actual coefficient values
• Choose a best set of non-zero wavelet terms of any values
• In practice:
Examined values are multiples of resolution step δ



EiL , b, v , 
min max 



0bb
EiR , b  b, v 




Ei, b, v   min 






 
 min max E iL , b , v  z ,

 
 z , 0bb1 
EiR , b  1  b, v  z 


Unrestricted Synopses
[GH05,06]
• Approximation quality better than restricted
• Time asymptotically linear to n
But:
- Examined values bounded by M [GH05]
- Multiple Guesses of error result [GH06]
- Space-Bounded Problem:
Two-Dimensional Tabulation E(b,v) on each tree node
→ High Running Time and Space demands
Our Approach:
Wavelet Synopses with Predefined Error Bounds
• Error-Bounded Problem
DP algorithm:
- Demarcates examined values using error bound ε
- Tabulates only S(v), one dimension per node
• Space-Bounded Problem
Enhanced Solution:
- Calculate upper bound for error, use it to bound values
Indirect Solution:
- Use binary search on Error-Bounded problem
How does it work?
• One-dimensional tabulation on values only
Si, v   min SiL , v  z   SiR , v  z   z  0
z
• Examined incoming values v bounded by error bound
vi  v  
• Examined assigned values z also bounded
zi  ziv    vi  v
• Strong version of problem: minimize error within space
Complexity
• Error-Bounded Problem:
Time
   2 
O   n 
   


Space

n 


O BM log
BM 

or
   2

O   n log n 
  



or


O log n  n 


• Space-Bounded Problem:
  E 2

2
O   n log B 
  



Time
 E 


o   n log n 
  



vs.
Space
E

o log n  n 


 E 

n
vs. O   B log  n 
B
  

2
Experiments: Error-Bounded Problem
Experiments: Error-Bounded Problem
Experiments: Space-Bounded Problem
Experiments : Space-Bounded Problem
Experiments : Space-Bounded Problem
Related Work
• M. Garofalakis and A. Kumar. Deterministic wavelet
thresholding for maximum-error metrics. PODS 2004
• S. Guha. Space efficiency in synopsis construction
algorithms. VLDB 2005
• S. Guha and B. Harb. Wavelet Synopses for Data
Streams: Minimizing Non-Euclidean Error. KDD 2005
• S. Muthukrishnan. Subquadratic algorithms for
workload-aware haar wavelet synopses. FSTTCS 2005
• S. Guha and B. Harb. Approxmation algorithms for
wavelet transform coding of data streams. SODA 2006
Thank you! Questions?