Numerical Linear Algebra in the Streaming Model

Ken Clarkson - IBM
David Woodruff - IBM
The Problems
Given an n x d matrix A and an n x d' matrix B, we want estimators for
• The matrix product A^T B
• The matrix X* minimizing ||AX - B||
  – A slightly generalized version of least-squares regression
• Given an integer k, the matrix A_k of rank k minimizing ||A - A_k||
• We consider the Frobenius matrix norm: the square root of the sum of squares of the entries
General Properties of Our Algorithms
• 1 pass over the matrix entries, given in any order (multiple updates to an entry are allowed)
• Maintain compressed versions, or "sketches", of the matrices
• Do a small amount of work per entry to maintain the sketches
• Output result using the sketches
• Randomized approximation algorithms
Matrix Compression Methods
• In a line of similar efforts…
– Element-wise sampling [AM01], [AHK06]
– Sketching / Random Projection: maintain a
small number of random linear combinations
of rows or columns [S06]
– Row / column sampling [DK01], [DKM04],
[DMM08]
– Usually more than 1 pass
• Here: sketching
Outline
• Matrix Product
• Regression
• Low-rank approximation
Approximate Matrix Product
• A and B have n rows; we want to estimate A^T B
• Let S be an n x m sign (Rademacher) matrix
  – Each entry is +1 or -1 with probability ½
  – m is small, to be specified
  – Entries are O(log(1/δ))-wise independent
• The sketches are S^T A and S^T B
• Our estimate of A^T B is [A^T S][S^T B]/m
• It is easy to maintain the sketches given updates (see the sketch below)
  – O(m) time per update, O(mc log(nc)) bits of space for S^T A and S^T B
  – Output [A^T S][S^T B]/m using fast rectangular matrix multiplication
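To make the update rule concrete, here is a minimal Python/numpy sketch (not from the talk; class and function names are illustrative) of maintaining S^T M under streamed entry updates. For simplicity it draws fully independent pseudorandom signs per row from a seeded generator, standing in for the O(log(1/δ))-wise independent entries described above:

    import numpy as np

    class SignSketch:
        """Maintains S^T M (an m x d matrix) for an n x d matrix M given
        as a stream of updates (i, j, v) meaning M[i, j] += v.  The sign
        matrix S is never stored: row S[i, :] is regenerated from i."""

        def __init__(self, m, d, seed=0):
            self.m, self.seed = m, seed
            self.sketch = np.zeros((m, d))      # holds S^T M

        def _signs(self, i):
            # pseudorandom +/-1 row S[i, :], recomputable from i alone
            rng = np.random.default_rng((self.seed, i))
            return rng.integers(0, 2, self.m) * 2 - 1

        def update(self, i, j, v):
            # O(m) work per entry: add v * S[i, :] to column j of S^T M
            self.sketch[:, j] += v * self._signs(i)

    def product_estimate(sk_a, sk_b):
        # estimate A^T B as [A^T S][S^T B]/m = (S^T A)^T (S^T B)/m
        return sk_a.sketch.T @ sk_b.sketch / sk_a.m

Feeding the entries of A and B, in any order, into two SignSketch objects built with the same seed (so both use the same S) yields the product estimate above.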
Expected Error and a Tail Estimate
• Using linearity of expectation,
  E[A^T S S^T B / m] = A^T E[S S^T] B / m = A^T B
• Moreover, for δ, ε > 0, there is m = O(ε^-2 log(1/δ)) so that
  Pr[ ||A^T S S^T B / m - A^T B|| > ε ||A|| ||B|| ] ≤ δ
  (again ||C|| = [Σ_{i,j} C_{i,j}^2]^{1/2})
• This tail estimate seems to be new (an empirical check follows below)
  – It follows from bounding the O(log(1/δ))-th moment of ||A^T S S^T B / m - A^T B||
• Improves the space of Sarlós' 1-pass algorithm by a log c factor
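As a quick numerical illustration of the error bound (numpy again, with illustrative sizes), the relative error of the estimator should decay like 1/sqrt(m), i.e. roughly halve each time m quadruples:

    import numpy as np

    rng = np.random.default_rng(1)
    n, d1, d2 = 500, 20, 30
    A = rng.standard_normal((n, d1))
    B = rng.standard_normal((n, d2))
    scale = np.linalg.norm(A) * np.linalg.norm(B)   # ||A|| ||B||, Frobenius

    for m in (50, 200, 800):
        errs = []
        for _ in range(20):
            S = rng.integers(0, 2, (n, m)) * 2 - 1      # n x m sign matrix
            est = (A.T @ S) @ (S.T @ B) / m             # [A^T S][S^T B]/m
            errs.append(np.linalg.norm(est - A.T @ B) / scale)
        print(m, np.mean(errs))     # error ~ 1/sqrt(m)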
Matrix Product Lower Bound
• Our algorithm is space-optimal for constant δ
  – a new lower bound
• Reduction from a communication game
  – Augmented Indexing, with players Alice and Bob
  – Alice has a random x ∈ {0,1}^s
  – Bob has a random i ∈ {1, 2, …, s}
    • and also x_{i+1}, …, x_s
• Alice sends Bob one message
• Bob should output x_i with probability at least 2/3
• Theorem [MNSW]: the message must be Ω(s) bits on average
Lower Bound Proof
• Set s := Θ(c ε^-2 log(cn))
• Alice makes a matrix U
  – Uses x_1 … x_s
• Bob makes matrices U' and B
  – Uses i and x_{i+1}, …, x_s
• The Alg input will be A := U + U' and B
  – A and B are n x c/2
• Alice:
  – Runs the streaming matrix product Alg on U
  – Sends the Alg state to Bob
  – Bob continues the Alg with A := U + U' and B
• A^T B determines x_i with probability at least 2/3
  – By the choice of U, U', B
  – Solving Augmented Indexing
• So the space of the Alg must be Ω(s) = Ω(c ε^-2 log(cn)) bits
Lower Bound Proof
• U = U(1); U(2); …; U(log(cn)); 0s
  – U(k) is a Θ(ε^-2) x c/2 submatrix with entries in {-10^k, +10^k}
  – U(k)_{i,j} = 10^k if the matched entry of x is 0, else U(k)_{i,j} = -10^k
• Bob's index i corresponds to an entry U(k*)_{i*,j*}
• U' is such that A = U + U' = U(1); U(2); …; U(k*); 0s
  – U' is determined from x_{i+1}, …, x_s
• A^T B is the i*-th row of U(k*)
• ||A|| ≈ ||U(k*)||, since the entries of A grow geometrically (illustrated below)
• ε^2 ||A||^2 · ||B||^2, the squared error, is small, so most entries of the approximation to A^T B have the correct sign
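The "||A|| ≈ ||U(k*)||" step is just a geometric series; a tiny numerical illustration (numpy, with stand-in dimensions):

    import numpy as np

    # Block U(k) has entries +/-10^k, so ||U(k)||^2 = (#entries) * 100^k;
    # summing the geometric series, the last block U(k*) contributes all
    # but a small fraction of ||A||^2.
    rows, cols, k_star = 25, 10, 5    # stand-ins for Theta(eps^-2) x c/2
    blocks = [np.full((rows, cols), 10.0 ** k) for k in range(1, k_star + 1)]
    A = np.vstack(blocks)
    print(np.linalg.norm(A) / np.linalg.norm(blocks[-1]))   # ~1.005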
Linear Regression
• The problem: min_X ||AX - B||
• The X* minimizing this is X* = A^- B, where A^- is the pseudo-inverse of A
• The algorithm (a numpy rendering follows below):
  – Maintain S^T A and S^T B
  – Return X' solving min_X ||S^T (AX - B)||
• Main claim: if A has rank k, there is an m = O(k ε^-1 log(1/δ)) so that with probability at least 1 - δ, ||AX' - B|| ≤ (1+ε) ||AX* - B||
  – That is, the relative error of X' is small
• A new lower bound (omitted here) shows our space is optimal
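A minimal numpy sketch of this algorithm (sizes illustrative; a fully independent sign matrix stands in for the limited-independence one):

    import numpy as np

    rng = np.random.default_rng(2)
    n, k, m = 2000, 10, 400           # m on the order of k/eps, illustrative
    A = rng.standard_normal((n, k))
    B = A @ rng.standard_normal((k, 3)) + 0.1 * rng.standard_normal((n, 3))

    S = rng.integers(0, 2, (n, m)) * 2 - 1            # n x m sign matrix
    X_prime = np.linalg.lstsq(S.T @ A, S.T @ B, rcond=None)[0]  # sketched
    X_star  = np.linalg.lstsq(A, B, rcond=None)[0]              # exact

    # relative error ||AX' - B|| / ||AX* - B||: should be close to 1
    print(np.linalg.norm(A @ X_prime - B) / np.linalg.norm(A @ X_star - B))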
Regression Analysis
• Why should X' be good?
• First reduce to showing that ||A(X' - X*)|| is small
  – Use the normal equation A^T A X* = A^T B
  – It implies ||AX' - B||^2 = ||AX* - B||^2 + ||A(X' - X*)||^2 (spelled out below)
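Spelling out that implication (a standard Pythagorean argument, stated here in LaTeX): write AX' - B = A(X' - X*) + (AX* - B) and expand; the cross term is the trace inner product of X' - X* with A^T(AX* - B), which the normal equation makes zero:

    \|AX' - B\|^2
      = \|A(X' - X^*)\|^2 + \|AX^* - B\|^2
        + 2\,\mathrm{tr}\!\left[(X' - X^*)^T A^T (AX^* - B)\right],
    \qquad A^T(AX^* - B) = A^T B - A^T A X^* = 0.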
Regression Analysis Continued
• Bound ||A(X' - X*)||^2 using several tools:
  – The normal equation (S^T A)^T (S^T A) X' = (S^T A)^T (S^T B)
  – Our tail estimate for matrix product
  – Subspace JL: for m = O(k ε^-1 log(1/δ)), S^T approximately preserves the lengths of all vectors in a k-dimensional subspace
• Overall, ||A(X' - X*)||^2 = O(ε) ||AX* - B||^2
• But ||AX' - B||^2 = ||AX* - B||^2 + ||A(X' - X*)||^2, so ||AX' - B||^2 ≤ (1 + O(ε)) ||AX* - B||^2
Best Low-Rank Approximation
• For any matrix A and integer k, there is a matrix A_k of rank k that is closest to A among all matrices of rank k
• Applications: LSI, PCA, recommendation systems, clustering
• The sketch S^T A holds information about A
• There is a rank-k matrix A_k' in the rowspace of S^T A so that ||A - A_k'|| ≤ (1+ε) ||A - A_k||
Low-Rank Approximation via Regression
• Why is there such an A_k'? Apply the regression results with A → A_k, B → A
• The X' minimizing ||S^T (A_k X - A)|| has ||A_k X' - A|| ≤ (1+ε) ||A_k X* - A||
• But here X* = I, and X' = (S^T A_k)^- S^T A
• So the matrix A_k X' = A_k (S^T A_k)^- S^T A:
  – Has rank k
  – Is in the rowspace of S^T A
  – Is within a factor of 1+ε of the distance to the nearest rank-k matrix
• Problem: this seems to require 2 passes, since A_k must be known before S^T A_k can be formed
1 Pass Algorithm
• Suppose R is a d x m sign matrix. By the regression results transposed, the columnspace of AR contains a good rank-k approximation to A
• The X' minimizing ||AR X - A|| has ||AR X' - A|| ≤ (1+ε) ||A - A_k||
• Apply the regression results with A → AR and B → A, and X'' minimizing ||S^T (AR X - A)||
• So X'' = (S^T AR)^- S^T A has
  ||AR X'' - A|| ≤ (1+ε) ||AR X' - A|| ≤ (1+ε)^2 ||A - A_k||
• The algorithm maintains AR and S^T A, and computes AR X'' = AR (S^T AR)^- S^T A
• Compute the best rank-k approximation to AR X'' in the columnspace of AR (AR X'' has rank k ε^-2); a numpy rendering follows below
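A minimal numpy sketch of the 1-pass algorithm (sketch sizes illustrative; truncating the SVD of AR X'' is one way to get the best rank-k matrix in the columnspace of AR, since AR X'' already lies in that columnspace):

    import numpy as np

    rng = np.random.default_rng(3)
    n, d, k = 1000, 200, 5
    A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d)) \
        + 0.01 * rng.standard_normal((n, d))        # approximately rank k

    mR, mS = 50, 100                                # illustrative sketch sizes
    R = rng.integers(0, 2, (d, mR)) * 2 - 1         # d x mR sign matrix
    S = rng.integers(0, 2, (n, mS)) * 2 - 1         # n x mS sign matrix

    AR, StA = A @ R, S.T @ A                        # the two maintained sketches
    X2 = np.linalg.pinv(S.T @ AR) @ StA             # X'' = (S^T AR)^- S^T A
    ARX2 = AR @ X2                                  # lies in colspace(AR)

    U, s, Vt = np.linalg.svd(ARX2, full_matrices=False)
    Ak_hat = U[:, :k] * s[:k] @ Vt[:k]              # rank-k truncation

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Ak = U[:, :k] * s[:k] @ Vt[:k]                  # exact best rank k
    print(np.linalg.norm(A - Ak_hat) / np.linalg.norm(A - Ak))  # near 1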
Concluding Remarks
• Space bounds are tight for product and regression
• Get 1 pass and O(k ε^-2 (n + d ε^-2) log(nd)) space for low-rank approximation
  – First sublinear 1-pass streaming algorithm with relative error
  – Show our space is almost optimal via a new lower bound (omitted)
  – Total work is O(N k ε^-4), where N is the number of non-zero entries of A
  – In the next talk, [NDT] give a faster algorithm for dense matrices
• Improve the dependence on ε
• Look at tight bounds for multiple passes
A Lower Bound
[Figure: Alice's matrix A, built from her binary string x; Bob holds the index i and x_{i+1}, x_{i+2}, …. The first k rows of A are divided into blocks of k ε^-1 columns, whose nonzero entries are ±10, ±100, ±1000, ±10000, …, growing geometrically from block to block; the remaining n - k rows are 0s.]
Error is now dominated by the block of interest
Bob also inserts a k x k identity submatrix into the block of interest
Lower Bound Details
Block of interest:
[Figure: the block of interest, k rows by k ε^-1 columns, containing P * I_k (a k x k identity scaled by P) amid 0s; the remaining n - k rows are 0s; a shaded region marks where approximation error must appear.]
Bob inserts a k x k identity submatrix, scaled by a large value P
Show that any rank-k approximation must err on all of the shaded region
So a good rank-k approximation likely has the correct sign on Bob's entry