A High-Performance Interactive Tool for Exploring Large Graphs
John R. Gilbert
University of California, Santa Barbara
Aydin Buluc & Viral Shah (UCSB)
Brad McRae (NCEAS)
Steve Reinhardt (Interactive Supercomputing)
with thanks to Alan Edelman (MIT & ISC)
and Jeremy Kepner (MIT-LL)
Support: DOE Office of Science, NSF, DARPA, SGI, ISC

3D Spectral Coordinates [figure slide]
2D Histogram: RMAT Graph [figure slide]
Strongly Connected Components [figure slide]

Social Network Analysis in Matlab: 1993
[Figure: co-author graph from the 1993 Householder symposium]

Social Network Analysis in Matlab: 1993
[Figure: sparse adjacency matrix A]
Which author has the most collaborators?
>> [count, author] = max(sum(A))
count = 32
author = 1
>> name(author, :)
ans = Golub

Social Network Analysis in Matlab: 1993
Have Gene Golub and Cleve Moler ever been coauthors?
>> A(Golub, Moler)
ans = 0
No. But how many coauthors do they have in common?
>> AA = A^2;
>> AA(Golub, Moler)
ans = 2
And who are those common coauthors?
>> name( find( A(:, Golub) .* A(:, Moler) ), : )
ans =
Wilkinson
VanLoan

Outline
• Infrastructure: Array-based sparse graph computation
• An application: Computational ecology
• Some nuts and bolts: Sparse matrix multiplication

Combinatorial Scientific Computing
Emerging large-scale, high-performance applications:
• Web search and information retrieval
• Knowledge discovery
• Computational biology
• Dynamical systems
• Machine learning
• Bioinformatics
• Sparse matrix methods
• Geometric modeling
• ...
How will combinatorial methods be used by nonexperts?

Analogy: Matrix Division in Matlab
x = A \ b;
• Works for either full or sparse A
• Is A square?
  no => use QR to solve the least squares problem
• Is A triangular or permuted triangular?
  yes => sparse triangular solve
• Is A symmetric with positive diagonal elements?
  yes => attempt Cholesky after symmetric minimum degree ordering
• Otherwise
  => use LU on A(:, colamd(A))
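
The same dispatch can be written out in Matlab itself. This is only a sketch of the logic above for the sparse case, not the actual built-in polyalgorithm (which also detects permuted triangular forms and more); symamd, colamd, and the factorizations used here are real Matlab functions.

function x = backslash_sketch(A, b)
% Illustrative sketch of the \ dispatch above for sparse A.
[m, n] = size(A);
if m ~= n
    [c, R] = qr(A, b, 0);            % economy QR: least squares problem
    x = R \ c;
elseif istriu(A) || istril(A)
    x = A \ b;                       % built-in triangular solve
elseif issymmetric(A) && all(diag(A) > 0)
    p = symamd(A);                   % symmetric minimum degree ordering
    [R, flag] = chol(A(p, p));       % attempt Cholesky
    if flag == 0
        x(p, 1) = R \ (R' \ b(p));
    else
        x = lu_path(A, b);
    end
else
    x = lu_path(A, b);
end
end

function x = lu_path(A, b)
q = colamd(A);                       % column ordering for sparse LU
[L, U, P] = lu(A(:, q));
x(q, 1) = U \ (L \ (P * b));
end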

Matlab*P
A = rand(4000*p, 4000*p);            % "*p" marks a distributed dimension
x = randn(4000*p, 1);
y = zeros(size(x));
while norm(x-y) / norm(x) > 1e-11    % power iteration: converges to the
    y = x;                           % dominant eigenvector of A
    x = A*x;
    x = x / norm(x);
end

Star-P Architecture
[Architecture diagram: a MATLAB® client holding ordinary Matlab variables talks through the Star-P client manager to the server, matrix, and package managers; server processors #0 through #n-1 run the dense/sparse and sort libraries, ScaLAPACK, FFTW, an FPGA interface, and MPI or UPC user code.]

Distributed Sparse Array Structure
[Figure: a sparse matrix and its row blocks distributed across processors P0, P1, P2, ..., Pn]
Each processor stores its local vertices and edges in a compressed row structure. This has been scaled to more than 10^8 vertices and 10^9 edges in an interactive session.

The sparse( ) Constructor
A = sparse (I, J, V, nr, nc);
• Input: dense vectors I, J, V and dimensions nr, nc
• Output: A(I(k), J(k)) = V(k)
• Sums values with duplicate indices
• Sorts triples < i, j, v > by < i, j >
• Inverse: [I, J, V] = find(A);
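
A tiny worked example with made-up data:

% Five edges of a 4-vertex directed graph, as triples.
I = [1 2 2 3 4];                % row (source) indices
J = [2 3 4 1 1];                % column (destination) indices
V = [1 1 1 1 1];                % values (edge weights)
A = sparse(I, J, V, 4, 4);      % assemble the 4-by-4 sparse matrix
[I2, J2, V2] = find(A);         % recover the triples from A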

Sparse Array and Matrix Operations
• dsparse layout, same semantics as ordinary full & sparse
• Matrix arithmetic: +, max, sum, etc.
• matrix * matrix and matrix * vector
• Matrix indexing and concatenation:
  A(1:3, [4 5 2]) = [ B(:, J) C ];
• Linear solvers: x = A \ b; using SuperLU (MPI)
• Eigensolvers: [V, D] = eigs(A); using PARPACK (MPI)

Large-Scale Graph Algorithms
• Graph theory, algorithms, and data structures are ubiquitous in sparse matrix computation.
• Time to turn the relationship around!
• Represent a graph as a sparse adjacency matrix.
• A sparse matrix language is a good start on primitives for computing with graphs.
• Leverage the mature techniques and tools of high-performance numerical computation.

Sparse Adjacency Matrix and Graph
[Figure: a directed graph on vertices 1-7 beside its sparse adjacency matrix; the transposed matrix A^T times a vertex indicator vector x gives A^T x.]
• Adjacency matrix: sparse array w/ nonzeros for graph edges
• Storage-efficient implementation from sparse data structures

Breadth-First Search: Sparse mat * vec
[Figure, animated over three slides: starting from an indicator vector x on the graph of vertices 1-7, successive products A^T x, (A^T)^2 x, ... expand the search frontier one level per step.]
• Multiply by adjacency matrix => step to neighbor vertices
• Work-efficient implementation from sparse data structures
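
In that spirit, a minimal serial sketch of level-synchronous BFS (the function name is ours, not the toolbox interface):

function level = bfs_levels(A, source)
% BFS levels via repeated sparse mat*vec on adjacency matrix A.
n = size(A, 1);
level = inf(n, 1);
level(source) = 0;
x = logical(sparse(source, 1, 1, n, 1));  % current frontier
d = 0;
while nnz(x) > 0
    d = d + 1;
    y = A' * double(x);                   % step to neighbor vertices
    x = (y > 0) & isinf(level);           % keep only newly reached vertices
    level(x) = d;
end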

SSCA#2: “Graph Analysis” Benchmark (spec version 1)
Fine-grained, irregular data access; searching and clustering.
• Many tight clusters, loosely interconnected
• Input data is edge triples < i, j, label(i,j) >
• Vertices and edges permuted randomly

Clustering by Breadth-First Search
• Grow local clusters from many seeds in parallel
• Breadth-first search by sparse matrix * matrix
• Cluster vertices connected by many short paths

% Grow each seed to vertices reached
% by at least k paths of length 1 or 2
C = sparse(seeds, 1:ns, 1, n, ns);
C = A * C;
C = C + A * C;
C = C >= k;
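
A hedged usage sketch of the fragment above (the seed selection and the values of ns and k are illustrative):

n = size(A, 1);
ns = 100;  k = 3;                   % number of seeds, path-count threshold
seeds = randperm(n, ns);            % ns distinct random seed vertices
C = sparse(seeds, 1:ns, 1, n, ns);  % one indicator column per seed
C = A * C;                          % vertices reached by length-1 paths
C = C + A * C;                      % ... plus length-2 paths
C = C >= k;                         % reached by at least k such paths
[v, c] = find(C);                   % vertex v joins the cluster of seed c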

Toolbox for Graph Analysis and Pattern Discovery
Layer 1: Graph Theoretic Tools
• Graph operations
• Global structure of graphs
• Graph partitioning and clustering
• Graph generators
• Visualization and graphics
• Scan and combining operations
• Utilities

Typical Application Stack (layers from top to bottom)
Applications: computational ecology, CFD, data exploration
Preconditioned Iterative Methods: CG, BiCGStab, etc. + combinatorial preconditioners (AMG, Vaidya)
Graph Analysis & PD Toolbox: graph querying & manipulation, connectivity, spanning trees, geometric partitioning, nested dissection, NNMF, ...
Distributed Sparse Matrices: arithmetic, matrix multiplication, indexing, solvers (\, eigs)

Landscape Connectivity Modeling
• Landscape type and features facilitate or impede movement of members of a species
• Different species have different criteria, scales, etc.
• Habitat quality, gene flow, population stability
• Corridor identification, conservation planning

Pumas in Southern California
[Figure: habitat quality model map of the region around L.A., Palm Springs, and Joshua Tree N.P.]

Predicting Gene Flow with Resistive Networks
[Figure, two panels: genetic vs. geographic distance, and circuit model predictions (N = 100, m = 0.01).]

Early Experience with Real Genetic Data
• Good results with wolverines, mahogany, pumas
• Matlab implementation
• Needed:
– Finer resolution
– Larger landscapes
– Faster interaction
[Figure: map at 5 km resolution (too coarse)]

Combinatorics in Circuitscape
• Initial grid models connections to 4 or 8 neighbors.
• Partition landscape into connected components with GAPDT.
• Graph contraction from GAPDT contracts habitats into single nodes in the resistive network, since we need current flow between entire habitats; see the sketch below.
• Data-parallel computation on large graphs: graph construction, querying, and manipulation.
• Ideally, model the landscape at 100m resolution (for pumas); there is a tradeoff between resolution and time.
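
Contraction itself maps naturally onto sparse matrix-matrix multiplication. A minimal sketch (variable names ours), assuming comp(v) labels each vertex v with its habitat/component number:

% Contract same-labeled vertices of the weighted adjacency matrix A.
n = size(A, 1);
m = max(comp);
S = sparse(1:n, comp, 1, n, m);   % n-by-m component membership matrix
Ac = S' * A * S;                  % m-by-m contracted conductance graph
Ac = Ac - diag(diag(Ac));         % drop self-loops created by contraction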

Numerics in Circuitscape
• Resistance computations for pairs of habitats in the landscape
• Direct methods are too slow for the largest problems
• Use iterative solvers via Star-P:
– Hypre (PCG+AMG)
– Experimenting with support graph preconditioners
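
The kernel here is an effective-resistance computation on the graph Laplacian. A serial sketch using Matlab's pcg (the parallel runs call Hypre's AMG-preconditioned CG through Star-P); A is the conductance matrix, s and t are the two contracted habitat nodes, and grounding one node fixes the Laplacian's nullspace:

% Effective resistance between nodes s and t (assumes s, t ~= n).
n = size(A, 1);
L = diag(sum(A, 2)) - A;            % weighted graph Laplacian
b = zeros(n, 1);
b(s) = 1;  b(t) = -1;               % unit current in at s, out at t
L(n, :) = 0;  L(:, n) = 0;  L(n, n) = 1;   % ground node n
v = pcg(L, b, 1e-8, 500);           % plain CG here; AMG-preconditioned at scale
Reff = v(s) - v(t);                 % voltage drop = effective resistance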

Parallel Circuitscape Results
Pumas in southern California:
– 12 million nodes
– Under 1 hour (16 processors)
– Original code took 3 days at coarser resolution
Targeting much larger problems:
– Yellowstone-to-Yukon corridor
Figures courtesy of Brad McRae, NCEAS

Sparse Matrix times Sparse Matrix
• A primitive in many array-based graph algorithms:
– Parallel breadth-first search
– Shortest paths
– Graph contraction
– Subgraph / submatrix indexing
– Etc.
• Graphs are often not mesh-like, i.e., they lack geometric locality and good separators.
• Often we do not want to optimize for one repeated operation, as in matvec for iterative methods.

Sparse Matrix times Sparse Matrix
Current work:
– Parallel algorithms with 2D data layout
– Sequential hypersparse algorithms
– Matrices over semirings

ParSpGEMM
[Figure: SUMMA-style blocked multiplication; block C(I,J) accumulates A(I,K) * B(K,J) over blocks K, i.e. C(I,J) += A(I,K)*B(K,J).]
• Based on SUMMA
• Simple for non-square matrices, etc.
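
The serial skeleton of the blocked update is simple (block width illustrative); in the parallel algorithm, each stage broadcasts a block column of A along processor rows and a block row of B along processor columns:

% C = A*B accumulated block-by-block over the inner dimension.
[m, n] = size(A);
p2 = size(B, 2);
kb = 256;                           % block width (illustrative)
C = sparse(m, p2);
for k1 = 1:kb:n
    K = k1 : min(k1 + kb - 1, n);   % current inner-dimension block
    C = C + A(:, K) * B(K, :);      % C(I,J) += A(I,K) * B(K,J)
end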

How Sparse? HyperSparse!
[Figure: a matrix averaging nnz(j) = c nonzeros per column is split over a √p-by-√p grid of blocks; each block then averages only nnz(j) = c/√p nonzeros per column, which tends to 0 as p grows.]
=> Any local data structure that depends on the local submatrix dimension n (such as CSR or CSC) is too wasteful.

SparseDComp Data Structure
• “Doubly compressed” data structure (illustrated below)
• Maintains both DCSC and DCSR
• C = A*B needs only A.DCSC and B.DCSR
• 4*nnz values communicated for A*B in the worst case (though we usually get away with much less)
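
A rough Matlab illustration of the doubly compressed idea (array names ours, not the actual implementation): CSC stores a column pointer for every one of the n columns, empty or not, while DCSC keeps pointers only for the nonempty columns:

% Build DCSC-style arrays for a sparse matrix A.
[i, j, v] = find(A);                % triples, sorted by column
jc = unique(j);                     % the nonempty columns only
cnt = histc(j, jc);                 % nnz in each nonempty column
cp = [1; cumsum(cnt(:)) + 1];       % one pointer per nonempty column
ir = i;                             % row indices, column by column
num = v;                            % numerical values
% Column jc(k) holds entries ir(cp(k) : cp(k+1)-1);
% total storage is O(nnz), independent of the dimension n.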

Sequential Operation Counts
• Matlab: O(n + nnz(B) + f)
• SpGEMM: O(nzc(A) + nzr(B) + f·log k)
Here f is the number of required nonzero operations (flops) and nzc(A) is the number of columns of A containing at least one nonzero.
[Plot: operation counts for the two algorithms, showing the break-even point.]

Parallel Timings
[Plot: time vs. n/nnz, log-log.]
• 16-processor Opteron, HyperTransport, 64 GB memory
• R-MAT * R-MAT
• n = 2^20
• nnz = {8, 4, 2, 1, 0.5} × 2^20

Matrices over Semirings
• Matrix multiplication C = AB (or matrix/vector):
  C(i,j) = A(i,1)B(1,j) + A(i,2)B(2,j) + ··· + A(i,n)B(n,j)
• Replace the scalar operations × and + by ⊗ and ⊕:
  ⊗ : associative, distributes over ⊕, identity 1
  ⊕ : associative, commutative, identity 0, which annihilates under ⊗
• Then C(i,j) = A(i,1)⊗B(1,j) ⊕ A(i,2)⊗B(2,j) ⊕ ··· ⊕ A(i,n)⊗B(n,j)
• Examples: (×,+) ; (and,or) ; (+,min) ; ... — the (min,+) case is sketched below
• Same data reference pattern and control flow
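
For example, single-source shortest paths is repeated matrix-vector multiplication in the (min,+) semiring. A dense Matlab sketch (a real implementation would exploit sparsity):

% Bellman-Ford as (min,+) mat*vec. W(i,k) = length of edge k -> i,
% inf where no edge exists (inf is this semiring's 0, the scalar 0 its 1).
n = size(W, 1);
d = inf(n, 1);
d(source) = 0;                      % distances from the source vertex
for step = 1:n-1
    d = min(d, min(bsxfun(@plus, W, d'), [], 2));  % one (min,+) product
end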

Remarks
• Tools for combinatorial methods built on parallel sparse matrix infrastructure
• Easy-to-use interactive programming environment
– Rapid prototyping tool for algorithm development
– Interactive exploration and visualization of data
• Sparse matrix * sparse matrix is a key primitive
• Matrices over semirings like (min,+) as well as (+,*)