
Solving SDD Linear Systems
in Nearly m log^{1/2} n Time
Richard Peng
M.I.T.
A&C Seminar, Sep 12, 2014
OUTLINE
• The Problem
• Approach 1
• Approach 2
• Combining these
LARGE GRAPHS
• Images
• Roads
• Social networks
• Meshes
Algorithmic challenges:
• How to store?
• How to analyze?
• How to optimize?
LAPLACIAN PARADIGM
Graph ↔ Electrical network ↔ Linear systems
[Figure: example network with resistances 1Ω, 1Ω, 2Ω]
GRAPH LAPLACIAN
Electrical network / weighted graph
[Figure: example network with resistances 1Ω, 1Ω, 2Ω]
Row/column ↔ vertex
Off-diagonal ↔ -weight
Diagonal ↔ weighted degree
Properties:
• Symmetric
• Row/column sums = 0
• Non-positive off-diagonals
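As a sketch of this correspondence, here is a minimal Python construction of a graph Laplacian (illustrative code, not from the talk; the 3-vertex triangle with weights 1, 1, 2 is an assumption chosen to match the running example):

    import numpy as np

    def graph_laplacian(n, edges):
        """Laplacian of a weighted graph: off-diagonal (u, v) = -weight,
        diagonal = weighted degree. Weight = conductance = 1/resistance."""
        L = np.zeros((n, n))
        for u, v, w in edges:
            L[u, v] -= w
            L[v, u] -= w
            L[u, u] += w
            L[v, v] += w
        return L

    # Hypothetical triangle on vertices 1, 2, 3 (0-indexed) with
    # weights w(1,2) = 1, w(2,3) = 1, w(1,3) = 2.
    L = graph_laplacian(3, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 2.0)])
    assert np.allclose(L, L.T)            # symmetric
    assert np.allclose(L.sum(axis=1), 0)  # row/column sums = 0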
CORE ROUTINE
Input: graph Laplacian L, vector b
Output: vector x s.t. Lx ≈ b
To measure performance:
n: dimension, # vertices
m: nnz, O(# of edges)
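For reference, a minimal conjugate gradient loop ([HS `52] on the next slide) for this core routine; a generic sketch, not the solver developed in this talk. It assumes the graph is connected and b sums to zero, so the singular system is consistent:

    import numpy as np

    def cg_solve(L, b, iters=1000, tol=1e-8):
        """Conjugate gradient for L x = b with L symmetric PSD."""
        x = np.zeros_like(b)
        r = b - L @ x
        p = r.copy()
        rs = r @ r
        for _ in range(iters):
            Lp = L @ p
            alpha = rs / (p @ Lp)
            x += alpha * p
            r -= alpha * Lp
            rs_new = r @ r
            if rs_new ** 0.5 < tol:
                break
            p = r + (rs_new / rs) * p
            rs = rs_new
        return x

    L = np.array([[ 3., -1., -2.],
                  [-1.,  2., -1.],
                  [-2., -1.,  3.]])
    b = np.array([1.0, 0.0, -1.0])  # inject current at vertex 1, extract at 3
    x = cg_solve(L, b)              # Lx ≈ b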
APPLICATIONS
Directly related: SDD matrices, M-matrices, elliptic PDEs
Few iterations: eigenvectors, heat kernels, clustering, inference, image processing
Many iterations: combinatorial graph problems, random trees, flow / matching
GENERAL LINEAR SYSTEM SOLVES
Oldest studied algorithmic problem
• [Liu 179] … [Newton ~1710]
• [Gauss 1810]: O(n^3)
• [HS `52]: conjugate gradient, O(nm)*
• [Strassen `69]: O(n^2.8)
• [Coppersmith-Winograd `90]: O(n^2.3755)
• [Stothers `10]: O(n^2.3737)
• [Vassilevska Williams `11]: O(n^2.3727)
COMBINATORIAL PRECONDITIONING
[Figure: example network with resistances 1Ω, 1Ω, 2Ω]
Use the connection to graphs to design numerical solvers
• [Vaidya `91]: O(m^{7/4})
• [Boman-Hendrickson `01]: O(mn)
• [Spielman-Teng `03]: O(m^{1.31})
• [Spielman-Teng `04]: O(m log^c n)
• [Koutis-Miller-P `10, `11]: O(m log n)
Subsequent improvements:
• [Elkin-Emek-Spielman-Teng `05]
• [Andersen-Chung-Lang `06]
• [Koutis-Miller `07]
• [Spielman-Srivastava `08]
• [Abraham-Bartal-Neiman `08]
• [Batson-Spielman-Srivastava `09]
• [Andersen-Perez `09]
• [Kolla-Makarychev-Saberi-Teng `10]
• [Orecchia-Vishnoi `11]
• [Abraham-Neiman `12]
• [OveisGharan-Trevisan `12]
• …
ZENO’S DICHOTOMY PARADOX
O(m log^c n)
Fundamental theorem of Laplacian solvers:
each improvement decreases c by a factor between 2 and 3
c over time: 2004: 70 → 2006: 32 → 2009: 15 → 2010: 6 → 2010: 2 → 2011: 1 → 2014: 1/2 → OPT: 0?
[Miller]: Instance of a speedup theorem?
OUTLINE
• The Problem
• Approach 1
• Approach 2
• Combining these
WHAT IS A SOLVE?
What is b = Lx?
b_3 = 3x_3 − x_2 − 2x_1 = (x_3 − x_2) + 2(x_3 − x_1)
[Figure: network with voltages x_1, x_2, x_3 across resistances 1Ω, 1Ω, 2Ω]
Flows on edges
Ohm’s law: current = voltage × conductance
b: residue of electrical current at each vertex
x: voltages at vertices
WHAT IS A SOLVE?
Find voltages x whose flow meets demand b
Intermediate object: flows on edges, f
[Kirchhoff 1845, Doyle-Snell `84]:
f minimizes energy dissipation
Energy of f = Σ_e r(e) f(e)^2
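A quick numerical check of these identities on the running 3-vertex example (an illustrative sketch; edge weights are conductances, r(e) = 1/w(e)):

    import numpy as np

    edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 2.0)]   # (u, v, conductance)
    L = np.array([[ 3., -1., -2.],
                  [-1.,  2., -1.],
                  [-2., -1.,  3.]])

    x = np.array([0.5, 0.2, 0.0])   # voltages at vertices
    b = L @ x                       # residue of current at each vertex

    # Ohm's law on each edge: f(u,v) = w(e) * (x[u] - x[v])
    f = {(u, v): w * (x[u] - x[v]) for u, v, w in edges}

    # Energy dissipation Σ_e r(e) f(e)^2 equals x^T L x
    energy = sum((1.0 / w) * f[(u, v)] ** 2 for u, v, w in edges)
    assert np.isclose(energy, x @ L @ x)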
WHAT MAKES SOLVES HARD
Densely connected graphs, need:
• sampling
• error reduction
Long paths, need:
• divide-and-conquer
• data structures
Solvers need to combine both
KELNER-ORECCHIA-SIDFORD-ZHU `13:
ELECTRICAL FLOW SOLVER
Start with some flow f meeting demand b
Push flow along cycles in the graph to reduce energy
[KOSZ `13]: energy of f approaches optimum
Algorithmic challenges:
• How many cycles? Ω(m)
• How big is each cycle? Ω(n)
HOW TO FIND CYCLES?
Cycle = tree + edge:
• Pick a tree
• Sample off-tree edges
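A sketch of one cycle update in this style; the cycle (off-tree edge plus its tree path) is passed in explicitly here, an assumption made for brevity:

    def cycle_update(f, cycle):
        """Push the energy-minimizing amount of flow around one cycle.

        f: dict edge_id -> flow value.
        cycle: list of (edge_id, resistance, sign), sign = +1 if the edge
        is traversed along its flow direction, -1 otherwise.
        """
        # Net voltage drop around the cycle (Ohm's law edge by edge).
        drop = sum(sign * r * f[e] for e, r, sign in cycle)
        total_r = sum(r for _, r, _ in cycle)
        # Minimizing sum_e r (f_e + sign*delta)^2 over delta gives:
        delta = -drop / total_r
        for e, _, sign in cycle:
            f[e] += sign * delta   # a circulation: demands b are unchanged
        return f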
HOW TO SAMPLE EDGES
[KMP `10, `11, KOSZ `13]: Sample edges w.p. proportional to stretch
• Stretch = length of tree path / length of edge
• Unweighted graph: stretch = length of tree path
[Figure: example off-tree edges with stretch 4 and stretch 3]
Key quantity: total stretch, S
[KOSZ `13]: O(m + S) cycle updates halve the error in expectation
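A sketch of computing stretches, and hence S, against a given spanning tree (tree path lengths are found by BFS from each endpoint for simplicity; real constructions do this much faster):

    from collections import defaultdict, deque

    def tree_distances(tree_adj, src):
        """Path lengths in the tree from src (BFS suffices: trees are acyclic)."""
        dist = {src: 0.0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v, length in tree_adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + length
                    queue.append(v)
        return dist

    def total_stretch(tree_edges, off_tree_edges):
        """S = sum over off-tree edges of (tree path length / edge length)."""
        tree_adj = defaultdict(list)
        for u, v, length in tree_edges:
            tree_adj[u].append((v, length))
            tree_adj[v].append((u, length))
        return sum(tree_distances(tree_adj, u)[v] / length
                   for u, v, length in off_tree_edges)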
WHAT’S A LOW STRETCH TREE?
n^{1/2}-by-n^{1/2} unit weighted mesh
Candidate 1: ‘haircomb’:
• shortest path tree
• max weight spanning tree
stretch(e) = n^{1/2} for most horizontal edges,
total stretch = Ω(n^{3/2})
Candidate 2: recursive C ‘fractal’:
stretch(e) = O(1) for most edges,
but stretch(e) = Ω(n^{1/2}) for some;
only n^{1/2} such edges, contribution = O(n);
O(log n) such layers due to the fractal,
total stretch = O(n log n)
LOW STRETCH TREES
                 Runtime           Stretch
AKPW `91         O(m log log n)    exp((log n log log n)^{1/2})
Bartal `96, `98  O(m log n)        O(log n log log n)
FRT `03          O(m log^3 n)      O(log n)
EEST `05         O(m log^2 n)      O(log^2 n log log n)
ABN `08          O(m log n)        O(log n (log log n)^c)
AN `12           O(m log n)        O(log n log log n)
[CKMPX `14]: construction in O(m log log n) time;
will assume S = O(m log n) (actual bounds more intricate)
Bartal / FRT trees have Steiner vertices
[KOSZ `13]: embeddable trees are ok!
[KOSZ `13] IN A NUTSHELL
                        [KOSZ `13]
Converges on            Flows
Sampling distribution   Stretch
Sample size             1
# updates               O(S + m)
Cost per update         O(n), or O(log n) using data structures
Total (S = O(m log n))  O(mn log n), or O(m log^2 n)
OUTLINE
• The Problem
• Approach 1
• Approach 2
• Combining these
NUMERICAL VIEW OF SOLVERS
Iterative methods: given H similar to G,
solve L_G x = b by solving several L_H y = r
Similar: L_G ≼ L_H ≼ k·L_G
≼ : Loewner ordering, A ≼ B ⇔ x^T A x ≤ x^T B x for all x
Chebyshev iteration:
If L_G ≼ L_H ≼ k·L_G, can halve error in L_G x = b by
solving O(k^{1/2}) problems in H to O(1/poly(k)) accuracy
NUMERICAL VIEWS OF SOLVERS
Chebyshev iteration (ignoring errors):
If L_G ≼ L_H ≼ k·L_G, can solve problem in
G by solving O(k^{1/2}) problems in H
Preconditioner construction view:
• Given G, find H that’s easier to solve such that L_G ≼ L_H ≼ k·L_G
Perturbation stability view:
• Can adjust edge weights by factor of [1, k]
• Need to solve O(k^{1/2}) problems on the resulting graph
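A sketch of the Chebyshev iteration being invoked here, following the classical Chebyshev semi-iterative recurrence (`solve_H`, a black-box solver for L_H, is an assumption; the eigenvalue range [1/k, 1] follows from L_G ≼ L_H ≼ k·L_G):

    import numpy as np

    def chebyshev(L_G, b, solve_H, k, iters):
        """Solve L_G x = b, preconditioned by H with L_G ≼ L_H ≼ k·L_G.

        The preconditioned system has eigenvalues in [1/k, 1]; Chebyshev
        polynomials cut the error by a constant every O(k^{1/2}) steps.
        """
        lmin, lmax = 1.0 / k, 1.0
        theta, delta = (lmax + lmin) / 2, (lmax - lmin) / 2
        sigma = theta / delta
        x = np.zeros_like(b)
        r = b - L_G @ x
        d = solve_H(r) / theta
        rho = 1.0 / sigma
        for _ in range(iters):
            x = x + d
            r = r - L_G @ d
            rho_next = 1.0 / (2 * sigma - rho)
            d = rho_next * rho * d + (2 * rho_next / delta) * solve_H(r)
            rho = rho_next
        return x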
GENERATING H
[KOSZ `13]: sample 1 edge, reduce error by factor of 1 − 1/(m + S);
O(m + S) samples halve the error
Matrix Chernoff: O(S log n) samples give L_G ≼ 2L_H ≼ 3L_G
Can halve error in L_G x = b via O(1) solves in H
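A sketch of generating such an H by importance sampling (the constant C and the exact rescaling are assumptions; the real construction keeps the tree and is more careful about sample counts):

    import math
    import random
    from collections import defaultdict

    def sample_preconditioner(n, tree_edges, off_tree, stretch, C=50):
        """H = tree + O(S log n) sampled off-tree edges.

        off_tree[i] = (u, v, weight); stretch[i] = stretch of that edge.
        Edge i is drawn w.p. p_i = stretch_i / S per trial; each hit adds
        weight w_i / (p_i * N), so E[L_H] = L_G (importance sampling).
        """
        S = sum(stretch)
        N = max(1, int(C * S * math.log(n)))
        sampled = defaultdict(float)
        for _ in range(N):
            i = random.choices(range(len(off_tree)), weights=stretch)[0]
            u, v, w = off_tree[i]
            sampled[(u, v)] += w * S / (stretch[i] * N)
        return list(tree_edges) + [(u, v, w) for (u, v), w in sampled.items()]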
HOW DOES THIS HELP?
Go from G to H with m’ = O(S log n) off-tree edges
S = O(m log n), so m’ > m!
[KMP `10]: take the perturbation stability view,
scale up tree by some factor k
• factor k distortion, O(k^{1/2}) iterations
• S’ = S/k: total stretch decreases by a factor of k
SIZE REDUCTION
Scale up tree by factor of k so S’ = S/k
Sample m’ = O(S’ log n) = O(S log n / k) edges
• n − 1 tree edges + m’ off-tree edges
• Repeatedly remove degree 1 and 2 vertices
• New size: O(S log n / k)
T(m, S) = O(k^{1/2}) · (m + T(S log n / k, S/k))
(can show: errors don’t accumulate in recursion)
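A sketch of the degree-1 and degree-2 removals, which are the classical resistor reductions (leaves carry no current between other vertices; series resistances add; parallel resistances combine). The dict-of-dicts representation is an assumption:

    def eliminate_low_degree(adj):
        """adj: {u: {v: resistance}}. Removes degree-1 and degree-2 vertices."""
        queue = [u for u in adj if len(adj[u]) <= 2]
        while queue:
            u = queue.pop()
            if u not in adj or len(adj[u]) > 2:
                continue
            nbrs = list(adj[u].items())
            if len(nbrs) == 2:
                (a, r1), (b, r2) = nbrs
                r_new = r1 + r2                      # series rule
                if b in adj[a]:                      # merge parallel edges
                    r_new = 1.0 / (1.0 / adj[a][b] + 1.0 / r_new)
                adj[a][b] = adj[b][a] = r_new
            for v, _ in nbrs:                        # detach u
                adj[v].pop(u, None)
                if len(adj[v]) <= 2:
                    queue.append(v)
            del adj[u]
        return adj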
TWO TERM RECURRENCE?
W-cycle algorithm:
T(m, S) = O(k^{1/2}) · (m + T(S log n / k, S/k))
These are really the same parameter!
Key invariant from [KMP `11]: ‘spine-heavy’: S = m / O(log n)
TIME ON SPINE-HEAVY GRAPHS
T(m) = O(k^{1/2}) · (m + T(m/k)) = O(m)
Low stretch spanning tree: S_init = m log n
Initial scaling by O(log^2 n), giving O(log n) overhead
More important runtime parameter:
how small does S need to be to get O(m) runtime?
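Unrolling the recurrence above shows why a constant k already suffices; a quick sketch (assuming k ≥ 4 is a constant):

    T(m) = k^{1/2} (m + T(m/k))
         = k^{1/2} m (1 + k^{-1/2} + k^{-1} + k^{-3/2} + …)
         = k^{1/2} m / (1 − k^{-1/2})
         = O(m)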
NUMERICAL SOLVER
                        [KMP `10, `11]
Converges on            Vectors
Sampling distribution   Stretch
Cost per update         O(1) amortized*
O(m) ‘steps’ when       S = O(m/log n)
Increase when S → kS    O(k^{1/2})
Total (S = O(m log n))  O(m log n)
*All updates are matrix-vector multiplications
Byproduct of this view: solving to O(log^c n) error is
‘easy’; expected convergence becomes w.h.p.
via checking with O((log log n)^c) overhead
NUMERICAL VS. COMBINATORIAL
                        [KMP `10, `11]   [KOSZ `13]
Converges on            Vectors          Flows
Sampling distribution   Stretch          Stretch
Cost per update         O(1) amortized   O(log n)
O(m) ‘steps’ when       S = O(m/log n)   S = O(m)
Increase when S → kS    O(k^{1/2})       O(k)
Total (S = O(m log n))  O(m log n)       O(m log^2 n)
COMMONALITY: RANDOMNESS
Reason: need to handle the complete graph;
easiest expander constructions are randomized
                       [KMP `10, `11]              [KOSZ `13]
Sampling distribution  Stretch                     Stretch
Analysis method        Matrix Chernoff + black-    Expected convergence,
                       box numerical methods       single potential function
Sample count           arbitrary, O(m)             1
Overhead needed        O(log n) (from union        O(1)
for convergence        bound on n dimensions)
Analogs                Multigrid methods,          Stochastic methods,
                       preconditioning             Kaczmarz iteration,
                                                   randomized descent
Stretch is the ‘right’ parameter when we use
trees, due to the Sherman-Morrison formula
NUMERICAL VS. COMBINATORIAL
                      [KMP `10, `11]   [KOSZ `13]
Cost per update       O(1) amortized   O(log n)
O(m) ‘steps’ when     S = O(m/log n)   S = O(m)
Increase when S → kS  O(k^{1/2})       O(k)
[KMP `10, `11]:
• + Sublinear sample count
• + Recursive, O(1) per update
• − L_G ≼ L_H ≼ k·L_G, O(log n) overhead
• + Overhead: (S/m)^{1/2}
• − Recursive error propagation
[KOSZ `13]:
• − Ω(m) samples
• − Data structure, O(log n)
• + Adaptive convergence
• − Linear dependence on S
• + Simple error propagation
COMBINE THESE?
Can the adaptive convergence guarantees
work with the recursive Chebyshev?
Consequence: T(m, S) = O(k^{1/2}) · (m + T(S/k, S/k))
                        This Talk
Converges on            Vectors and Flows
Sampling distribution   Stretch
Cost per update         O(1) amortized
O(m) ‘steps’ when       S = O(m)
Increase when S → kS    O(k^{1/2})
Total (S = O(m log n))  O(m log^{1/2} n)
OUTLINE
• The Problem
• Approach 1
• Approach 2
• Combining these
DIRECT MODIFICATION
Thought experiment: use Chebyshev iteration outside of [KOSZ `13]?
                        [KOSZ `13]      Thought experiment
Converges on            Flows           ????????????
Sampling distribution   Stretch         Stretch
Cost per update         O(log n)        O(log n)
O(m) ‘steps’ when       S = O(m)        S = O(m)
Increase when S → kS    O(k)            O(k^{1/2})
Total (S = O(m log n))  O(m log^2 n)    O(m log^{3/2} n)
Within (log log n)^c of [LS `13], which
also uses Chebyshev-like methods
Our analyses do make this rigorous
CHALLENGES
                 [KMP `10, `11]   [KOSZ `13]
Sample size      m/k              1
Converges on     Vectors          Flows
Cost per update  O(1) amortized   O(log n)
1. Multiple updates, instead of a few
2. Flows vs. voltages
3. Remove data structure overhead
BATCHED CYCLE UPDATES
This only gives flows; Chebyshev iteration works with voltages
• Each cycle update lowers flow energy
• Changing one after another is still a valid
flow update when considering both
• Updating together can only be better!
MULTI-EDGE SAMPLING
Expected error reduction from [KOSZ `13]
extends to multiple edges, in voltage space
Major modifications:
• Voltages instead of flows
• Sublinear # of samples: O(m + S) → O(S)
• O(m) term goes into the iterative method
• Only prove this for iterative refinement
• Reduce to black-box Chebyshev
• Simpler analysis, at cost of (log log n)^c
WHAT HAPPENED TO THE DATA STRUCTURE?
Path updates/queries in a tree in O(log n) time
Amortized into:
• Vertex elimination
• Recursion
GETTING RID OF LOG
Recursive Chebyshev: T(m) = k^{1/2} · (m + T(m/k))
Call structure is ‘bottom light’:
• Total size of bottom-level calls is o(m)
• Cost dominated by top-level size, O(m),
instead of being the same at all O(log n) levels
M LOG^{1/2} N
                        [KMP `10, `11]   [KOSZ `13]     [CKMPPRX `14]
Converges on            Vectors          Flows          Vectors/Flows
Sampling distribution   Stretch          Stretch        Stretch
Cost per update         O(1) amortized   O(log n)       O(1) amortized
O(m) ‘steps’ when       S = O(m/log n)   S = O(m)       S = O(m/(log log n)^c)
Increase when S → kS    O(k^{1/2})       O(k)           O(k^{1/2})
Total (S = O(m log n))  O(m log n)       O(m log^2 n)   O(m log^{1/2} n (log log n)^c)
Expected convergence as shown in [KOSZ `13]:
• Is a general phenomenon
• Interacts well with numerical algorithms in
recursive settings
HIDDEN UNDER THE RUG
• Error propagation in recursion
• Vector-based analysis of Chebyshev
iteration (instead of operators)
• Numerical stability analysis
• Construction of better trees by
discounting higher stretch
OPEN PROBLEMS
• log log n factors: how many are real?
• Extend this analysis?
• Numerical stability: is double precision OK?
• O(m) time algorithm?
(maybe with O(m log n) pre-processing?)
• Other notions of stretch?
• Beyond Laplacians
• Code packages 2.0
THANK YOU!
Questions?