Graph models and applications
Jo Ellis-Monaghan
St. Michael's College, Colchester, VT 05439
e-mail: [email protected] website: http://academics.smcvt.edu/jellis-monaghan
Graphs and Networks
A Graph or Network is a set of vertices (dots) with edges (lines)
connecting them.
[Figure: three graphs on the vertices A, B, C, D, including one with a multiple edge and one with a loop.]
Two vertices are adjacent if there is a line between them. The vertices A and
B above are adjacent because the edge AB is between them. An edge is
incident to each of the vertices which are its end points.
The degree of a vertex is the number of edges sticking out from it.
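As a minimal sketch of these definitions, a graph with a multiple edge and a loop can be stored as a list of vertex pairs. The edge list below is a made-up example, not one of the pictured graphs, and the sketch assumes the usual convention that a loop adds 2 to the degree of its vertex, since both of its ends stick out from it.

```python
from collections import Counter

# Hypothetical example graph: a double edge A-B and a loop at D.
edges = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "D"), ("D", "D")]

def degrees(edge_list):
    """Count edge ends at each vertex; a loop contributes two ends."""
    degree = Counter()
    for u, v in edge_list:
        degree[u] += 1
        degree[v] += 1
    return degree

def adjacent(edge_list, x, y):
    """Two vertices are adjacent if some edge has them as its end points."""
    return any({u, v} == {x, y} for u, v in edge_list)

print(dict(degrees(edges)))
print(adjacent(edges, "A", "B"), adjacent(edges, "A", "C"))
```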
The Kevin Bacon Game, or 6 Degrees of Separation
http://www.spub.ksu.edu/issues/v100/FA/n069/feamaking-bacon-fuqua.html
Kevin Bacon is not even among the top 1000 most connected actors in Hollywood (1222nd).

Bacon number   # of people   Connery number   # of people
0              1             0                1
1              1766          1                2216
2              141840        2                204269
3              385670        3                330591
4              93598         4                32857
5              7304          5                2948
6              920           6                409
7              115           7                46
8              61            8                8
Total number of linkable actors: 631275
Weighted total of linkable actors: 1860181
Average Bacon number: 2.947
Average Connery number: 2.706
Data from The Oracle of Bacon at UVA
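The averages can be checked directly from the counts. The sketch below assumes only the data above: the weighted total is the sum of (number × people at that number), and the average is the weighted total divided by the number of linkable actors. The function name `summarize` is just for illustration.

```python
# Number of actors at each Bacon number and Connery number, from the data above.
bacon = {0: 1, 1: 1766, 2: 141840, 3: 385670, 4: 93598,
         5: 7304, 6: 920, 7: 115, 8: 61}
connery = {0: 1, 1: 2216, 2: 204269, 3: 330591, 4: 32857,
           5: 2948, 6: 409, 7: 46, 8: 8}

def summarize(counts):
    """Return (linkable actors, weighted total, average number)."""
    total = sum(counts.values())
    weighted = sum(number * people for number, people in counts.items())
    return total, weighted, weighted / total

for name, counts in (("Bacon", bacon), ("Connery", connery)):
    total, weighted, average = summarize(counts)
    print(name, total, weighted, round(average, 3))
```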
Maximal Matchings in Bipartite Graphs
A Bipartite Graph
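Start with any matching.
Find an alternating path: start at an unmatched vertex on the left and end at an unmatched vertex on the right.
Switch matching to nonmatching and vice versa along the path.
Each switch makes the matching one edge larger; repeat until no such path remains.
A maximal matching!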
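The alternating-path steps can be sketched as follows. This is one common way to organize the search (a depth-first search from each left vertex), not necessarily the one pictured on the slide. It starts from the empty matching, and the names `neighbors` and `match_of_right` are illustrative.

```python
def maximum_matching(left, neighbors):
    """neighbors[u] lists the right-side vertices adjacent to left vertex u."""
    match_of_right = {}  # right vertex -> left vertex it is matched to

    def augment(u, visited):
        # Look for an alternating path from u to an unmatched right vertex.
        for v in neighbors[u]:
            if v in visited:
                continue
            visited.add(v)
            # Either v is unmatched, or its partner can be re-matched elsewhere.
            if v not in match_of_right or augment(match_of_right[v], visited):
                match_of_right[v] = u  # switch the edges along the path
                return True
        return False

    for u in left:
        augment(u, set())
    return {u: v for v, u in match_of_right.items()}

# Hypothetical bipartite graph: left vertices 1, 2, 3 and right vertices x, y, z.
example = {1: ["x", "y"], 2: ["x"], 3: ["y", "z"]}
print(maximum_matching([1, 2, 3], example))
```

When vertex 2 is processed, its only neighbor x is already matched to 1, so the search re-matches 1 to y and gives x to 2, which is exactly the switch along an alternating path.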
The small world phenomenon
http://mathforum.org/mam/04/poster.html
Stanley Milgram sent a series of traceable letters from people in the Midwest to one of two destinations in Boston. The letters could be sent only to someone whom the current holder knew by first name. Milgram kept track of the letters and found a median chain length of about six, thus supporting the notion of "six degrees of separation."
Social Networks
• Stock Ownership (2001 NY Stock Exchange)
• Children’s Social Network
• Social Network of Sexual Contacts
http://mathforum.org/mam/04/poster.html
Infrastructure and Robustness
[Figure: number of vertices plotted against vertex degree for two networks: scale-free (JetBlue) and distributed (MapQuest).]
Rolling Blackouts in August 2003
http://encyclopedia.thefreedictionary.com/_/viewer.aspx?path=2/2f/&name=2003-blackout-after.jpg
Some Networks are more robust than others.
But how do we measure this?
http://www.caida.org/tools/visualization/mapnet/Backbones/
A network modeled by a graph
(electrical, communication, transportation)
Question: If each edge operates independently with probability p, what
is the probability that the whole network is functional?
[Figure: two copies of the network with vertices s and t marked.]
A functional network (can get from any vertex to any other along functioning edges)
A dysfunctional network (vertices s and t can’t communicate)
Deletion and Contraction is a Natural
Reduction for Network Reliability
If an edge is working (this happens with probability p), it’s as though the two vertices were “touching”, i.e. just contract the edge.
If an edge is not working (this happens with probability 1-p), it might as well not be there, i.e. just delete it.
Thus, if R(G;p) is the reliability of the network G where all edges function with a probability of p, and e is neither a bridge nor a loop, then
$R(G;p) = (1-p)\,R(G-e;p) + p\,R(G/e;p)$
Reliability Example
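Note that if every edge of the network is a bridge (i.e. the network is a disjoint union of trees), then $R(G;p) = p^{E}$, where E is the number of edges.
Also note that R(loop;p) = 1.
E.g., deleting and contracting one edge at a time (each pictured graph is written here as R of that graph; f is a second edge):
$R(G;p) = (1-p)\,R(G-e;p) + p\,R(G/e;p)$
$= (1-p)p^2 + p\,[(1-p)\,R(G/e-f;p) + p\,R(G/e/f;p)]$
$= (1-p)p^2 + p(1-p)p + p^2$
So $R(G;p) = 3p^2 - 2p^3$ gives the probability that the network is functioning.
E.g. R(G; .5) = .5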
Bothersome question: Does the order in which the edges are deleted and
contracted matter?
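The deletion-contraction reduction can be tried out on small networks with a short recursive sketch. It makes two assumptions beyond the slide: the recursion is applied to every non-loop edge (a bridge is handled because deleting it leaves a disconnected network, which has reliability 0), and a network with no edges is functional only if it is a single vertex. The test network is a triangle, whose reliability is 3p^2 - 2p^3.

```python
def reliability(vertices, edges, p):
    """Probability that all vertices stay connected when each edge works
    independently with probability p (multigraph given as a list of pairs)."""
    if not edges:
        # No edges left: functional only if everything merged into one vertex.
        return 1.0 if len(vertices) == 1 else 0.0
    (u, v), rest = edges[0], edges[1:]
    if u == v:
        # A loop never affects connectivity, so drop it.
        return reliability(vertices, rest, p)
    # Edge fails (probability 1-p): delete it.
    deleted = reliability(vertices, rest, p)
    # Edge works (probability p): contract it by merging v into u.
    merged = [(u if a == v else a, u if b == v else b) for a, b in rest]
    contracted = reliability(vertices - {v}, merged, p)
    return (1 - p) * deleted + p * contracted

triangle = [(0, 1), (1, 2), (0, 2)]
print(reliability({0, 1, 2}, triangle, 0.5))
```

Listing the edges of the triangle in a different order changes which edge is deleted and contracted first, so the sketch is also a way to experiment with the question about edge order on small examples.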
Conflict Scheduling
[Figure: classes A, B, C, D, E drawn as vertices, before and after adding edges.]
Draw edges between classes with conflicting times.
Color so that adjacent vertices have different colors.
Minimum number of colors = minimum required
classrooms.
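For a handful of classes, the minimum number of colors can be found by brute force: try 1 color, then 2, and so on, until some assignment gives different colors to every pair of conflicting classes. The conflict list below is hypothetical (the pictured edges are not reproduced here), and the search is exponential, so this is only a sketch for very small graphs.

```python
from itertools import product

def min_colors(vertices, edges):
    """Return (k, coloring) for the smallest k admitting a proper coloring."""
    for k in range(1, len(vertices) + 1):
        for colors in product(range(k), repeat=len(vertices)):
            coloring = dict(zip(vertices, colors))
            if all(coloring[u] != coloring[v] for u, v in edges):
                return k, coloring
    return None  # only reached if the graph has a loop or no vertices

classes = ["A", "B", "C", "D", "E"]
# Hypothetical conflicts: each pair meets at overlapping times.
conflicts = [("A", "B"), ("A", "E"), ("B", "C"),
             ("B", "E"), ("C", "D"), ("D", "E")]

rooms, assignment = min_colors(classes, conflicts)
print(rooms, assignment)
```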
Coloring Algorithm
The Chromatic Polynomial counts the ways to vertex color a graph:
C(G, n ) = # proper vertex colorings of G in n colors.
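[Figure: a graph G, the deletion G-e, and the contraction G/e.]
Recursively: Let e be an edge of G. Then,
$C(\bullet; n) = n$
$C(G; n) = C(G-e; n) - C(G/e; n)$
[Figure: worked example on a small graph, ending in]
$n(n-1)^2 + n(n-1) + 0 = n^2(n-1)$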
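The recursion can be sketched in a few lines by evaluating C(G; n) at a given number of colors n. Two base cases are assumed here beyond the single-vertex case on the slide: a graph with a loop has no proper colorings, and a graph with no edges on v vertices has n^v colorings. Contraction merges the second end point of e into the first.

```python
def chromatic_count(vertices, edges, n):
    """Number of proper vertex colorings in n colors, by deletion-contraction."""
    if any(u == v for u, v in edges):
        return 0  # a loop can never have differently colored ends
    if not edges:
        return n ** len(vertices)  # every vertex is colored freely
    (u, v), rest = edges[0], edges[1:]
    merged = [(u if a == v else a, u if b == v else b) for a, b in rest]
    return (chromatic_count(vertices, rest, n)
            - chromatic_count(vertices - {v}, merged, n))

path = [(0, 1), (1, 2)]            # known polynomial n(n-1)^2
triangle = [(0, 1), (1, 2), (0, 2)]  # known polynomial n(n-1)(n-2)
print(chromatic_count({0, 1, 2}, path, 3))
print(chromatic_count({0, 1, 2}, triangle, 3))
```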
Conflict Scheduling
Frequency Assignment
Assign frequencies to mobile radios and other users of the electromagnetic spectrum. Two customers that are sufficiently close must be assigned different frequencies, while those that are distant can share frequencies. Minimize the number of frequencies.
Vertices: users of mobile radios
Edges: between users whose frequencies might interfere
Colors: assignments of different frequencies
Need at least as many frequencies as the minimum number of colors required!

Register Allocation
Assign variables to hardware registers during program execution. Variables conflict with each other if one is used both before and after the other within a short period of time (for instance, within a subroutine). Minimize the use of non-register memory.
Vertices: the different variables
Edges: between variables which conflict with each other
Colors: assignment of registers
Need at least as many registers as the minimum number of colors required!
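A small sketch of the register allocation model: build the conflict graph, then color it greedily. Everything specific here is an assumption for illustration. The live ranges are invented, "conflict" is simplified to "the ranges of use overlap", and greedy coloring only gives some valid assignment, which in general may use more registers than the minimum.

```python
# Hypothetical live ranges: variable -> (first use, last use) as instruction indices.
live = {"a": (0, 4), "b": (1, 2), "c": (3, 6), "d": (5, 7)}

def conflicts(ranges):
    """Pairs of variables whose ranges of use overlap."""
    names = sorted(ranges)
    return [(x, y) for i, x in enumerate(names) for y in names[i + 1:]
            if ranges[x][0] <= ranges[y][1] and ranges[y][0] <= ranges[x][1]]

def greedy_registers(ranges):
    """Give each variable the lowest-numbered register no neighbor is using."""
    neighbors = {v: set() for v in ranges}
    for x, y in conflicts(ranges):
        neighbors[x].add(y)
        neighbors[y].add(x)
    register = {}
    for v in sorted(ranges, key=lambda name: ranges[name][0]):
        taken = {register[w] for w in neighbors[v] if w in register}
        register[v] = next(r for r in range(len(ranges)) if r not in taken)
    return register

print(conflicts(live))
print(greedy_registers(live))
```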
The Ising Model
Consider a sheet of metal:
It has the property that at low temperatures it is magnetized, but as the
temperature increases, the magnetism “melts away”.
We would like to model this behavior. We make some simplifying
assumptions to do so—
• The individual atoms have a “spin”, i.e., they act like little bar magnets, and can either point up (a spin of +1), or down (a spin of –1).
• Neighboring atoms with different spins have an interaction energy, which we will assume is constant.
• The atoms are arranged in a regular lattice.
At low temperature “coalescing” states are more probable and there is
non-zero magnetization
As the temperature rises, the states become more random, and the
magnetization “melts away”
Applet by Peter Young at http://bartok.ucsc.edu/peter/java/ising/keep/ising.html
Magnetization = $\frac{1}{N}\sum_i s_i$, Energy = $\frac{1}{N}\sum s_i s_j$,
where N is the number of lattice points.
Critical temperature is $\frac{2}{\ln(1+\sqrt{2})}$.
Lattice and Hamiltonian
A choice of spins at each point gives what is called a “state” of the lattice:
The Hamiltonian (total energy) of a state w is
$H(w) = \sum f(s_i, s_j)$
where the sum is over all adjacent points, and f is 0 if the spins are the same and 1 if they are different.
H(w) is just the total number of edges in the state with different spins on
their endpoints.
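As a small illustration of this definition, the sketch below builds the adjacent pairs of a square lattice and counts the edges whose end points carry different spins. The lattice size and the example state are made up, and the lattice is assumed to have no wrap-around at its boundary.

```python
def grid_edges(rows, cols):
    """Adjacent pairs of points in a rows x cols square lattice (no wrap-around)."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                edges.append(((r, c), (r, c + 1)))
            if r + 1 < rows:
                edges.append(((r, c), (r + 1, c)))
    return edges

def hamiltonian(state, edges):
    """H(w): number of edges whose two end points have different spins."""
    return sum(1 for a, b in edges if state[a] != state[b])

edges = grid_edges(2, 2)
# Hypothetical state: three spins up, one spin down.
state = {(0, 0): +1, (0, 1): +1, (1, 0): -1, (1, 1): +1}
print(hamiltonian(state, edges))
```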
A Little Thermodynamics
The probability of a state w occurring is:
$\frac{e^{-\beta H(w)}}{\sum_{\text{all states } w} e^{-\beta H(w)}}$
Here $\beta = \frac{1}{kT}$, where T is the temperature and k is the Boltzmann constant $1.38 \times 10^{-23}$ joules/Kelvin.
The numerator is easy. The denominator, called the partition function, is the interesting (hard) piece.
It has a deletion-contraction reduction!
Let $P(G;\beta) = \sum_{\text{all states } w} e^{-\beta H(w)}$. Then
$P(G;\beta) = e^{-\beta}\,P(G-e;\beta) + (1 - e^{-\beta})\,P(G/e;\beta)$
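A brute-force check on a tiny graph makes the reduction concrete. With H(w) counting the edges whose end points have different spins, and P(G) the sum of e^{-βH(w)} over all states, splitting the states by whether the two ends of e agree gives P(G) = e^{-β} P(G-e) + (1 - e^{-β}) P(G/e). The sketch below checks that identity numerically on a triangle; the value of beta is arbitrary and chosen only for illustration.

```python
from itertools import product
from math import exp, isclose

def partition(vertices, edges, beta):
    """Sum of exp(-beta * H(w)) over all +1/-1 states w, by brute force."""
    vertices = list(vertices)
    total = 0.0
    for spins in product((+1, -1), repeat=len(vertices)):
        state = dict(zip(vertices, spins))
        h = sum(1 for u, v in edges if state[u] != state[v])
        total += exp(-beta * h)
    return total

beta = 0.7  # arbitrary illustrative value
whole = partition([0, 1, 2], [(0, 1), (1, 2), (0, 2)], beta)
# Delete e = (0, 1).
deleted = partition([0, 1, 2], [(1, 2), (0, 2)], beta)
# Contract e = (0, 1): vertex 1 merges into 0, leaving a double edge 0-2.
contracted = partition([0, 2], [(0, 2), (0, 2)], beta)

combined = exp(-beta) * deleted + (1 - exp(-beta)) * contracted
print(whole, combined)
```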
Rectilinear pattern recognition
joint work with J. Cohn (IBM), R. Snapp and D. Nardi (UVM)
IBM’s objective is to check a chip’s design and find all occurrences of a
simple pattern to:
– Find possible error spots
– Check for already patented segments
– Locate particular devices for updating
The Haystack
The Needle…
Pre-Processing
(Raw data format)
BEGIN
/* GULP2A CALLED ON THU FEB 21 15:08:23 2002 */
EQUIV 1 1000 MICRON +X,+Y
MSGPER -1000000 -1000000 1000000 1000000 0 0
HEADER GYMGL1 'OUTPUT 2002/02/21/14/47/12/cohn'
LEVEL PC
LEVEL RX
CNAME ULTCB8AD
CELL ULTCB8AD PRIME
PGON N RX 1467923 780300 1468180 780300 1468180 780600 +
1469020 780600 1469020 780300 1469181 780300 1469181 +
781710 1469020 781710 1469020 781400 1468180 781400 +
1468180 781710 1467923 781710
PGON N PC 1468500 782100 1468300 782100 1468300 781700 +
1468260 781700 1468260 780300 1468500 780300 1468500 +
780500 1468380 780500 1468380 781500 1468500 781500
RECT N PC 1467800 780345 1503 298
ENDMSG
Two different layers/rectangles are combined into one layer that contains three shapes: one rectangle (purple) and two polygons (red and blue)
Algorithm is cutting edge, and not currently used for this application in industry.
Linear time subgraph search for target
Both target pattern and entire chip are encoded like this, with the vertices also holding geometric information about the shape they represent. Then we do a depth-first search for the target subgraph. The additional information in the vertices reduces the search to linear time, while the entire chip encoding is theoretically $N^2$ in the number of faces, but practically $N \log N$.
Netlist Layout
(joint work with J. Cohn, A. Dean, P. Gutwin, J. Lewis, G. Pangborn)
How do we convert this…
… into this?
Netlist
A set S of vertices (the pins): hundreds of thousands.
A partition P1 of the pins (the gates): 2 to 1000 pins per gate, average of about 3.5.
A partition P2 of the pins (the wires): again 2 to 1000 pins per wire, average of about 3.5.
A maximum permitted delay between pairs of pins.
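One way to hold this data is sketched below. The class and field names are illustrative, not taken from the joint work: the pins are a set, the gates and wires are each a list of disjoint sets of pins, and the delay bounds are a mapping from pairs of pins to a number. The only check made is that P1 and P2 really are partitions of the pins.

```python
from dataclasses import dataclass

@dataclass
class Netlist:
    pins: set        # the set S
    gates: list      # partition P1: a list of sets of pins
    wires: list      # partition P2: a list of sets of pins
    max_delay: dict  # (pin, pin) -> maximum permitted delay

    @staticmethod
    def is_partition(blocks, pins):
        """Every pin lies in exactly one block, and no block has stray pins."""
        seen = [p for block in blocks for p in block]
        return len(seen) == len(set(seen)) and set(seen) == set(pins)

    def is_valid(self):
        return (self.is_partition(self.gates, self.pins)
                and self.is_partition(self.wires, self.pins)
                and all(a in self.pins and b in self.pins
                        for a, b in self.max_delay))

# Hypothetical toy netlist: 6 pins, 2 gates, 3 wires, one delay constraint.
toy = Netlist(pins={1, 2, 3, 4, 5, 6},
              gates=[{1, 2, 3}, {4, 5, 6}],
              wires=[{1, 4}, {2, 5}, {3, 6}],
              max_delay={(1, 4): 2.0})
print(toy.is_valid())
```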
Example
[Figure labels: Gate, Pin, Wire]
The Wires
The Wiring Space
Placement layer: gates/pins go here
Horizontal wiring layer
Vias (vertical connectors)
Vertical wiring layer
Up to 12 or so layers
The general idea
Place the pins so that pins are in their gates on the placement layer with non-overlapping gates.
Place the wires in the wiring space so that the delay constraints on pairs of pins are met, where delay is proportional to minimum distance within the wiring, and via delay is negligible.
Lots of Problems…
Identify Congestion
Identify dense substructures from the netlist
Develop a congestion ‘metric’
[Figure: gates A through H with the congested areas marked; panels “What often happens” and “What would be good”.]
Automate Wiring Small Configurations
Some are easy to place and route
– Simple left to right logic
– No / few loops (circuits)
– Uniform, low fan-out
– Statistical models work
Some are very difficult
– E.g. ‘Crossbar Switches’
– Many loops (circuits)
– Non-uniform fan-out
– Statistical models don’t work
SPRING EMBEDDING
Random layout
Spring embedded layout
Biomolecular constructions
Nano-Origami: Scientists At
Scripps Research Create
Single, Clonable Strand Of
DNA That Folds Into An
Octahedron
A group of scientists at The Scripps
Research Institute has designed,
constructed, and imaged a single
strand of DNA that spontaneously
folds into a highly rigid, nanoscale
octahedron that is several million
times smaller than the length of a
standard ruler and about the size of
several other common biological
structures, such as a small virus or
a cellular ribosome.
http://www.sciencedaily.com/releases/2004/02/040212082529.htm
DNA Strands Forming a Cube
http://seemanlab4.chem.NYU.edu
Assuring cohesion
A problem from biomolecular computing—physically constructing
graphs by ‘zipping together’ single strands of DNA
(not allowed)
N. Jonoska, N. Saito, ’02
A Characterization
A theorem of C. Thomassen specifies precisely when a graph may be constructed from a single strand of DNA, and theorems of Hongbing and Zhu characterize graphs that require at least m strands of DNA in their construction.
Theorem: A graph G may be constructed from a single strand of DNA if
and only if G is connected, has no vertex of degree 1, and has a spanning
tree T such that every connected component of G – E(T) has an even
number of edges or a vertex v with degree greater than 3.
L. M. Adleman, Molecular Computation of
Solutions to Combinatorial Problems. Science,
266 (5187) Nov. 11 (1994) 1021-1024.
Oriented Walk Double Covering and Bidirectional
Double Tracing
Fan Hongbing, Xuding Zhu, 1998
“The authors of this paper came across the problem of bidirectional
double tracing by considering the so called “garbage collecting”
problem, where a garbage collecting truck needs to traverse each
side of every street exactly once, making as few U-turns
(retractions) as possible.”
DNA sequencing
(joint work with I. Sarmiento)
AGGCT
TCTAC
CTCTA
AGGCTC
GGCTC
CTACT
TTCTA
It is very hard in general to “read off” the sequence of a long strand of DNA. Instead, researchers probe for “snippets” of a fixed length, and read those.
The problem then becomes reconstructing the original long strand of DNA from the set of snippets.
Enumerating the reconstructions
This leads to a directed graph with the same number of in-arrows as out-arrows at each vertex.
The number of reconstructions is then equal to the number of paths through the graph that traverse all the edges, each exactly once, in the direction of their arrows.
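A small sketch of the counting, under stated assumptions: each snippet of length k is treated as an arrow from its first k-1 letters to its last k-1 letters, and every walk that uses each arrow exactly once is spelled out by backtracking. The snippets below are a made-up example for a linear strand, so the first and last vertices are the only ones out of balance; adding an arrow from the last vertex back to the first would balance every vertex as described above.

```python
from collections import Counter

def reconstructions(snippets):
    """All strings spelled by walks that use every snippet-arrow exactly once."""
    remaining = Counter((s[:-1], s[1:]) for s in snippets)
    out_deg, in_deg = Counter(), Counter()
    for (u, v), count in remaining.items():
        out_deg[u] += count
        in_deg[v] += count
    # Start where out-arrows exceed in-arrows by one, if there is such a vertex.
    starts = [u for u in out_deg if out_deg[u] - in_deg[u] == 1]
    start = starts[0] if starts else snippets[0][:-1]
    found = set()

    def walk(vertex, text, used):
        if used == len(snippets):
            found.add(text)
            return
        for (u, v) in list(remaining):
            if u == vertex and remaining[(u, v)] > 0:
                remaining[(u, v)] -= 1       # use this arrow
                walk(v, text + v[-1], used + 1)
                remaining[(u, v)] += 1       # backtrack

    walk(start, start, 0)
    return found

# Hypothetical snippets of length 3.
snippets = ["AAT", "ATG", "TGA", "GAT", "ATC", "TCA", "CAT", "ATT"]
print(sorted(reconstructions(snippets)))
```

In this example the vertex AT is visited three times, and the two loops through it can be taken in either order, which is why the same snippets admit two different strands.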
Graph Polynomials Encode
the Enumeration
A very fancy polynomial, the interlace polynomial of Arratia, Bollobás, and Sorkin (2000), encodes the number of ways to reassemble the original strand of DNA.
It is related, with a lot of work, to the contraction-deletion approach of the Chromatic and Reliability polynomials.
The interlace polynomial
is computed, not on the
“snippet” graph, but on
an associated circle
graph.
[Figure: the “snippet” graph, a chord diagram, and the associated circle graph, each labeled with a, b, c, d.]
Pendant Duplicate Graphs
Effect of adding a pendant vertex or duplicating a vertex
[Figure: adding a pendant vertex v' to v; duplicating the vertex v (the duplicate is v').]
Theorem
A set of subsequences of DNA permits exactly
two reconstructions iff the circle graph
associated to any Eulerian circuit of the
‘snippet’ graph is a pendant-duplicate graph.
Side note to the cognoscenti: Pendant-duplicate graphs correspond to series-parallel graphs via a medial graph construction, so having exactly two reconstructions is actually a new interpretation of the beta invariant.