Transcript PPT
Streaming Algorithms
Piotr Indyk
MIT
Data Streams
• A data stream is a sequence
of data that is too large to be
stored in available memory
• Examples:
– Network traffic
– Sensor networks
– Approximate query
optimization and answering in
large databases
– Scientific data streams
– …and this talk
Outline
• Data stream model, motivations,
connections
• Streaming problems and techniques
– Estimating number of distinct elements
– Other quantities: Norms, moments, heavy
hitters…
– Sparse approximations and compressed
sensing
Basic Data Stream Model
• Single pass over the data: i_1, i_2, …, i_n
– Typically, we assume n is known
• Bounded storage (typically n^α or log^c n)
– Units of storage: bits, words, coordinates or
"elements" (e.g., points, nodes/edges)
• Fast processing time per element
– Randomness OK (in fact, almost always necessary)
8 2 1 9 1 9 2 4 6 3 9 4 2 3 4 2 3 8 5 2 5 6 ...
Connections
• External memory algorithms
– Linear scan works best
– Streaming algorithms yield efficient
external memory algorithms
• Communication
complexity/sketching
– Low-space algorithms yield low-communication protocols
• Metric embeddings, sub-linear time
algorithms, pseudo-randomness,
compressive sensing
[Figure: two communicating parties A and B]
Counting Distinct Elements
• Stream elements: numbers from {1...m}
• Goal: estimate the number of distinct elements DE in
the stream
– Up to a factor of 1±ε
– With probability 1-P
• Simpler goal: for a given T>0, provide an algorithm
which, with probability 1-P:
– Answers YES, if DE > (1+ε)T
– Answers NO, if DE < (1-ε)T
• Run, in parallel, the algorithm with
T = 1, (1+ε), (1+ε)^2, …, n
– Total space multiplied by log_{1+ε} n ≈ log(n)/ε
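A minimal sketch of this reduction in Python, assuming a decision procedure decide(T) with the YES/NO guarantee above (the function names and the way answers are combined are illustrative):

    def estimate_de(decide, n, eps):
        # Run the decision procedure for T = 1, (1+eps), (1+eps)^2, ..., n
        # and return the largest threshold that still answers YES.
        estimate, T = 1.0, 1.0
        while T <= n:
            if decide(T):        # YES: DE is (likely) > (1+eps)*T
                estimate = T
            T *= 1 + eps
        return estimate          # correct up to a 1±O(eps) factor, w.h.p.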
Vector Interpretation
Stream: 8 2 1 9 1 9 2 4 4 9 4 2 5 4 2 5 8 5 2 5
[Figure: vector x, coordinates indexed 1…9]
• Initially, x=0
• Insertion of i is interpreted as
x_i = x_i + 1
• Want to estimate DE(x) = the number of
non-zero elements of x
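For concreteness, a direct (non-streaming) Python rendering of this interpretation, using a dictionary for x:

    from collections import defaultdict

    x = defaultdict(int)                 # x[i] = current value of x_i
    for i in [8, 2, 1, 9, 1, 9, 2, 4]:   # a prefix of the stream above
        x[i] += 1                        # insertion of i: x_i = x_i + 1
    DE = sum(1 for v in x.values() if v != 0)
    print(DE)                            # 5 distinct elements so far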
Estimating DE(x)
(based on [Flajolet-Martin'85])
[Figure: vector x (coordinates 1…9) with a random set S of coordinates marked; here T = 4]
• Choose a random set S of coordinates
– For each i, we have Pr[i∈S] = 1/T
• Maintain Sum_S(x) = Σ_{i∈S} x_i
• Estimation algorithm I:
– YES, if Sum_S(x) > 0
– NO, if Sum_S(x) = 0
• Analysis:
– Pr = Pr[Sum_S(x) = 0] = (1-1/T)^{DE}
– For T large enough: (1-1/T)^{DE} ≈ e^{-DE/T}
– Using calculus, for ε small enough:
• If DE > (1+ε)T, then Pr ≈ e^{-(1+ε)} < 1/e - ε/3
• If DE < (1-ε)T, then Pr ≈ e^{-(1-ε)} > 1/e + ε/3
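A sketch of this one-threshold decision procedure in Python (make_decider, process, and answer are illustrative names; the universe size m is assumed known):

    import random

    def make_decider(T, m):
        # Random set S with Pr[i in S] = 1/T, drawn once up front.
        S = {i for i in range(1, m + 1) if random.random() < 1.0 / T}
        state = {"sum": 0}                  # maintains Sum_S(x)
        def process(i):
            if i in S:
                state["sum"] += 1           # insertion of an element in S
        def answer():
            return state["sum"] > 0         # YES iff Sum_S(x) > 0
        return process, answer

Feed every stream element to process, then call answer() once at the end; storing S explicitly is wasteful, which the Comments slide below fixes with a hash function.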
[Plot: Pr = (1-1/T)^{DE} as a function of DE = 1…19, decreasing from about 0.8 toward 0]
Estimating DE(x) ctd.
• We have an algorithm:
– If DE > (1+ε)T, then Pr < 1/e - ε/3
– If DE < (1-ε)T, then Pr > 1/e + ε/3
• Need to amplify the probability of correctness:
– Select sets S_1 … S_k, k = O(log(1/P)/ε^2)
– Let Z = number of Sum_{S_j}(x) that are equal to 0
– By Chernoff bound, with probability > 1-P:
• If DE > (1+ε)T, then Z < k/e
• If DE < (1-ε)T, then Z > k/e
• Total space:
– Decision version: O(log(1/P)/ε^2) numbers in range 0…n
– Estimation: O(log(n)/ε · log(1/P)/ε^2) numbers in range 0…n
(the probability of error is O(P log(n)/ε))
• Better algorithms known:
– Theory: O(1/ε^2 + log n) bits [Kane-Nelson-Woodruff'10]
– Practice: 128 bytes suffice for all works of Shakespeare, ε ≈ 10%
[Durand-Flajolet’03]
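Putting the pieces together, a sketch of the amplified decision procedure for one threshold T (names and parameters are illustrative):

    import math
    import random

    def decide(stream, T, m, eps, P):
        k = max(1, round(math.log(1 / P) / eps ** 2))  # k = O(log(1/P)/eps^2)
        sets = [{i for i in range(1, m + 1) if random.random() < 1.0 / T}
                for _ in range(k)]
        sums = [0] * k                                 # Sum_{S_j}(x) for each j
        for i in stream:
            for j, S in enumerate(sets):
                if i in S:
                    sums[j] += 1
        Z = sum(1 for s in sums if s == 0)             # number of zero sums
        return Z < k / math.e                          # YES iff Z < k/e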
Chernoff bound
Let X_1…X_n be a sequence of i.i.d. 0-1
random variables such that Pr[X_i = 1] = p.
Let X = Σ_i X_i. Then for any ε < 1 we have
Pr[ |X - pn| > ε·pn ] < 2e^{-ε^2 pn/3}
Comments
• Implementing S:
– Choose a uniform hash function h: {1…m} → {1…T}
– Define S = {i : h(i) = 1}
– We have i ∈ S iff h(i) = 1
• Implementing h
– In practice, to compute h(i):
• Seed the random number generator with i
• Return random(), a pseudorandom value in {1…T}
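A minimal Python rendering of this trick, assuming the standard library's random module as the pseudorandom generator:

    import random

    def h(i, T):
        rng = random.Random(i)       # Seed = i
        return rng.randint(1, T)     # pseudorandom value in {1..T}

    def in_S(i, T):
        return h(i, T) == 1          # S = {i : h(i) = 1}

This way S is never stored explicitly: membership is recomputed from the seed on every arrival.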
More comments
[Figure: vector x, coordinates indexed 1…9]
• The algorithm uses "linear sketches"
Sum_{S_j}(x) = Σ_{i∈S_j} x_i
• Can implement decrements x_i = x_i - 1
– I.e., the stream can contain deletions of elements
(as long as x≥0)
– Other names: dynamic model, turnstile model
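Since Sum_S is linear in x, the per-element update from the earlier sketch extends to deletions with a one-line change (delta is an illustrative parameter name):

    def process(i, delta, S, state):
        # delta = +1 for an insertion of i, -1 for a deletion
        # (valid as long as the vector x stays non-negative)
        if i in S:
            state["sum"] += delta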
More General Problem
• What other functions of a vector x can we maintain in small space?
• Lp norms:
||x||_p = ( Σ_i |x_i|^p )^{1/p}
– We also have ||x||_∞ = max_i |x_i|
– … and we can define ||x||_0 = DE(x), since ||x||_p^p = Σ_i |x_i|^p → DE(x) as p → 0
• Alternatively: frequency moments F_p = p-th power of L_p norms
(exception: F_0 = L_0)
• How much space do you need to estimate ||x||_p (for const. ε)?
• Theorem:
– For p ∈ [0,2]: polylog n space suffices
– For p > 2: n^{1-2/p} polylog n space suffices and is necessary
[Alon-Matias-Szegedy'96, Feigenbaum-Kannan-Strauss-Viswanathan'99,
Indyk'00, Coppersmith-Kumar'04, Ganguly'04, Bar-Yossef-Jayram-Kumar-Sivakumar'02,'03, Saks-Sun'03, Indyk-Woodruff'05]
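A quick numeric check of the p→0 limit used to define ||x||_0 above (the example vector is arbitrary):

    # ||x||_p^p = sum_i |x_i|^p tends to the number of non-zeros as p -> 0.
    x = [0, 3, 0, 1, 2]                       # DE(x) = 3
    for p in [1.0, 0.5, 0.1, 0.01]:
        fp = sum(abs(v) ** p for v in x if v != 0)
        print(p, fp)                          # 6.0, 4.15, 3.19, 3.02: approaches 3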
Estimating L2 norm: AMS
Alon-Matias-Szegedy’96
• Choose r_1 … r_m to be i.i.d. r.v., with
Pr[r_i = 1] = Pr[r_i = -1] = 1/2
• Maintain
Z = Σ_i r_i x_i
under increments/decrements to x_i
• Algorithm I:
Y = Z^2
• "Claim": Y "approximates" ||x||_2^2 with "good"
probability
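A minimal sketch of a single AMS estimator in Python (make_ams and the update/estimate names are illustrative):

    import random

    def make_ams(m, seed=None):
        rng = random.Random(seed)
        r = [0] + [rng.choice([-1, 1]) for _ in range(m)]   # signs r_1..r_m
        state = {"Z": 0}
        def update(i, delta=1):
            # increment (delta=+1) or decrement (delta=-1) of x_i
            state["Z"] += r[i] * delta                      # Z = sum_i r_i x_i
        def estimate():
            return state["Z"] ** 2                          # Y = Z^2
        return update, estimate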
Analysis
• We will use Chebyshev inequality
– Need expectation, variance
• The expectation of Z^2 = (Σ_i r_i x_i)^2 is equal to
E[Z^2] = E[Σ_{i,j} r_i x_i r_j x_j] = Σ_{i,j} x_i x_j E[r_i r_j]
• We have
– For i≠j, E[r_i r_j] = E[r_i] E[r_j] = 0 (term disappears)
– For i=j, E[r_i r_j] = 1
• Therefore
E[Z^2] = Σ_i x_i^2 = ||x||_2^2
(unbiased estimator)
Analysis, ctd.
• The second moment of Z^2 = (Σ_i r_i x_i)^2 is equal to the expectation of
Z^4 = (Σ_i r_i x_i)(Σ_i r_i x_i)(Σ_i r_i x_i)(Σ_i r_i x_i)
• This can be decomposed into a sum of
– Σ_i (r_i x_i)^4 → expectation = Σ_i x_i^4
– 6 Σ_{i<j} (r_i r_j x_i x_j)^2 → expectation = 6 Σ_{i<j} x_i^2 x_j^2
– Terms involving a single multiplier r_i x_i (e.g., r_1x_1 r_2x_2 r_3x_3 r_4x_4)
→ expectation = 0
Total: Σ_i x_i^4 + 6 Σ_{i<j} x_i^2 x_j^2
• The variance of Z^2 is equal to
E[Z^4] - E^2[Z^2] = Σ_i x_i^4 + 6 Σ_{i<j} x_i^2 x_j^2 - (Σ_i x_i^2)^2
= Σ_i x_i^4 + 6 Σ_{i<j} x_i^2 x_j^2 - Σ_i x_i^4 - 2 Σ_{i<j} x_i^2 x_j^2
= 4 Σ_{i<j} x_i^2 x_j^2
≤ 2 (Σ_i x_i^2)^2
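A quick Monte Carlo sanity check of these two facts on a small fixed vector (purely illustrative):

    import random

    x = [1.0, 2.0, -1.0, 3.0]          # ||x||_2^2 = 15
    trials = 100000
    samples = []
    for _ in range(trials):
        Z = sum(random.choice([-1, 1]) * xi for xi in x)
        samples.append(Z * Z)
    mean = sum(samples) / trials
    var = sum((s - mean) ** 2 for s in samples) / trials
    print(mean)                        # ≈ 15 = E[Z^2] = ||x||_2^2
    print(var, "<=", 2 * 15 ** 2)      # here Var[Z^2] = 252 <= 450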
Analysis, ctd.
• We have an estimator Y = Z^2
– E[Y] = Σ_i x_i^2
– σ^2 = Var[Y] ≤ 2 (Σ_i x_i^2)^2
• Chebyshev inequality:
Pr[ |E[Y] - Y| ≥ cσ ] ≤ 1/c^2
• Algorithm II:
– Maintain Z_1 … Z_k (and thus Y_1 … Y_k), define Y' = Σ_j Y_j / k
– E[Y'] = k Σ_i x_i^2 / k = Σ_i x_i^2
– σ'^2 = Var[Y'] ≤ 2k (Σ_i x_i^2)^2 / k^2 = 2 (Σ_i x_i^2)^2 / k
• Guarantee:
Pr[ |Y' - Σ_i x_i^2| ≥ c (2/k)^{1/2} Σ_i x_i^2 ] ≤ 1/c^2
• Setting c to a constant and k = O(1/ε^2) gives a (1±ε)-approximation with const. probability
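A sketch of Algorithm II over an in-memory vector (the constant c and function name are illustrative; a streaming version would maintain the k sums Z_1 … Z_k under updates):

    import random

    def ams_l2_squared(x, eps, c=3):
        # k = 2*c^2/eps^2 makes the deviation c*(2/k)^(1/2) equal eps,
        # so Pr[relative error >= eps] <= 1/c^2 by the guarantee above.
        k = max(1, round(2 * c * c / eps ** 2))
        total = 0.0
        for _ in range(k):
            r = [random.choice([-1, 1]) for _ in x]   # fresh signs per copy
            Z = sum(ri * xi for ri, xi in zip(r, x))  # Z_j = sum_i r_i x_i
            total += Z * Z                            # Y_j = Z_j^2
        return total / k                              # Y' = (Y_1+...+Y_k)/k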
Comments
• Only needed 4-wise independence of r_1…r_m
– I.e., for any subset S of {1…m}, |S| = 4, and any
b ∈ {-1,1}^4, we have Pr[r_S = b] = 1/2^4
– Can generate such vars from O(log m) random bits
• What we did:
– Maintain a "linear sketch" vector Z = [Z_1,…,Z_k] = Rx
– Estimator for ||x||_2^2: (Z_1^2 + … + Z_k^2)/k = ||Rx||_2^2 / k
– "Dimensionality reduction": x → Rx
… but the tail is somewhat "heavy"
– Reason: only used the second moment of the estimator
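One standard construction for such variables uses a random degree-3 polynomial over a prime field, needing only O(log m) random bits; a sketch (the prime and the sign mapping are assumptions of this example):

    import random

    P = (1 << 61) - 1                   # a Mersenne prime, assumed > m
    coeffs = [random.randrange(P) for _ in range(4)]   # O(log m) random bits

    def r(i):
        # Evaluate a_3*i^3 + a_2*i^2 + a_1*i + a_0 mod P via Horner's rule;
        # the values at any 4 distinct points i are mutually independent.
        v = 0
        for a in coeffs:
            v = (v * i + a) % P
        return 1 if v % 2 == 0 else -1  # fold to ±1 (negligible bias, P odd)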