Chapter 3 Graphs, Trees, and Tours

Download Report

Transcript Chapter 3 Graphs, Trees, and Tours

Chapter 3
Graphs, Trees, and Tours
Presented by Qibin Cai
Overview
Terminology in graph theory
Trees
- Minimum spanning tree (MST)
- Shortest path tree (SPT)
Tours
- TSP tours
Overview cont’d
Building trees
Building tours
- Kruskal’s algorithm
- Nearest-neighbor
- Prim’s algorithm
algorithm
- Improved nearestneighbor heuristic
- Divide and conquer
strategy
- Dijkstra’s algorithm
- Prim-Dijkstra
algorithm
Terminology
What is a graph?
Observation:
A graph is a set of points in a plane (or
in 3-space) and a set of line segments
(possibly curved), each of which either
joins two points or joins a point to itself.
Some definitions
Graphs
B
Y
C
P
Q
X
A
Z




D
A,B,C etc. are vertices(nodes)
(A,X), (X,Y) etc. are edges
P,Q,Z is a cycle
Degree of a node is the number of edges at the node
– Degree Y =3, degree C=1
Terminology cont’d
Definition in mathematical language?
A graph G = (V, E) is a mathematical
structure consisting of two sets V and E.
The elements of V are called vertices
(or nodes), and the elements of E are
called edges.
Digragh : a directed graph
Terminology cont’d
Endpoints : a set of one or two vertices
associated to each edge.
Loop: an edge where both endpoints
are the same. Also called a self-loop.
Parallel edges: a collection of two or
more edges having identical end.
Also called a multi-edge.
Terminology cont’d
A graph is simple if it has no loops or
parallel edges.
Most of our discussions will involve
simple graphs. Sometimes, when we
considering reliability, we will introduce
parallel edges if the network has parallel
links.
Terminology cont’d
The degree of a node: the number of
edges in the graph that have the node
as an endpoint (plus twice the number
of self-loops).
Indegree
Outdegree
Terminology cont’d
Adjacent vertices:
Two nodes are adjacent if there is an
edge that has them as endpoints.
Incidence:
The relationship between an edge and
its endpoints.
Terminology cont’d
Walk from vertex u to vertex v:
an alternating sequence of vertices and
edges, representing a continuous
traversal from vertex u to vertex v.
Trail: a walk with no repeated edges.
Path: a walk with no repeated vertices.
Terminology cont’d
Cycle:
a closed path with at least an edge.
Connected graph:
a graph in which every pair of distinct
vertices has a walk between them.
Terminology cont’d
Subgraph: A graph G’=(N’,A’) is a
subgraph of G=(N,A) if N’  N and
A’ A.
Component:
a maximal connected subgraph of a
graph.
Terminology cont’d
Isomorphism:
Two graphs G1 and G2 are isomorphic
if there is a 1-to-1 mapping f: v1 -> v2
such that (v1, v2)  E1 if and only if
(f(v1), f(v2))  E2.

Trees
Tree: a connected, simple graph without
cycles.
Star: a tree in which only 1 node has
degree greater than 1.
Chain: a tree in which no node has
degree greater than 2.
Any tree with n nodes has n-1 edges.
Trees
A tree is a connected simple graph with no cycles
e.g.
C
B
P
X
Q
Y
A
Z
D
Star
A tree is a star if only 1 node has degree >1
B
C
P
Y
Q
X
A
Z
D
Chains
A chain is a tree with no nodes of degree >2
B
C
P
X
Q
Y
A
Z
D
Weighted Graph
A graph G is weighted if there is a value associated with each
edge (e.g. link speed, cost, etc.)
 Weight of the edge ei = w(ei)
We often denote this graph (G, w). If G’ is any subgraph of G,
then w(G’) =
w(e)

eG '
To optimise a connected graph find the graph with the
minimum weight
The Minimal Spanning Tree (MST)
Minimal Spanning Trees
Let G be a connected weighted graph.
A spanning subgraph includes all the
nodes of G.
A tree T is a spanning tree of G if T is a
spanning subgraph of G.
MST: A spanning tree of G whose total
edge-weight is a minimum.
Finding the MST
Two algorithms Kruskal and Prim
Kruskal achieves the MST by starting
with a graph and cutting out edges
Prim
starts by selecting a node,
 adding the “least expensive edge”
 iterates until tree is built

Example MST
Use of MSTs
Small design problems - few nodes
Highly reliable links with low “downtime”

or network can tolerate unreliability
Nodes ‘v’ reliability
As the number of nodes increases
reliability decreases (exponentially!)

Kruskal’s Algorithm (1956)
1. Check that the graph G is connected.
If it is not connected, abort.
2. Sort the edges of the graph G in
ascending order of weight.
3. Mark each node as a separate
component.
4. Examine each of the sorted edges:
if the edge connects two separate
components, add it ; otherwise, discard.
Prim’s Algorithm (1957)
Input : a weighted connected graph
G=(N,E).
Output : a minimum spanning tree T.
U = set of all nodes in MST
V = set of all nodes that are NOT yet in
MST, but they’re adjacent to nodes in U.
Prim’s Algorithm (cont’d)
1. Place any node in U,
and update V .
2. Find the edge with smallest weight
that connects a node in V to a node in U
3. Add that edge to the tree,
and update U & V.
4. Repeat 2 & 3 until all nodes are
included, i.e., | U | = | N |.
How to use Delite to
Calculate MST’s
Invoke the code for Prim’s
algorithm from the Design menu.
Select to produce a trace file
Demonstration
Tree Designs
Overview
• Squareworld
• Coordinate systems:
•
-V&H
•
-L&L
• MSTs do not scale
• Definitions: hops(n1,n2), hops
Square World
We will create a little world with several properties that
make it a nice place to work on network design
problem.

The world is 1000 miles by 1000 miles.
 1 type of transmission line with a capacity of 1,000,000 bps.
 Given 2 sites, S1 at location (X1, Y1) and S2 at location (X2,Y2),
the cost of a link between them is
($1000 + $10 x d) / month
where
d  ( x2  x1 ) 2  ( y2  y1 ) 2
2 Coordinate Systems
We will use a problem generator to set up a series of network
design problems. Before we can do this, we need to know
something about the methods of locating sites are used in the
real world.
Vertical and horizontal (V&H)
- a grid of lines, or more accurately curves is drawn.
- allows for a simplified computation of distances.
Latitude and longitude (L&L)
- defined for all locations on the surface of the earth.
- The distance calculation is essentially an exercise in
spherical geometry.
MSTs Do Not scale
Why? First look at an example.
Figure 3.2 (An MST for 5 nodes in square world)
N2
N1
N4
N3
MAX_UTIL=0.6%
N5
Figure 3.3 (An MST for 10 nodes in square world)
N6
N2
N7
N10
N9
N1
N5
N4
N8
MAX_UTIL=2.5%
N3
The network is beginning to have a leggy look, which means that the
traffic is taking a circuitous route between its source and destination.
To qualify the legginess in the network, we make the following definition.
Two Definitions
Definitions 3.17
- The number of hops
between node n1 and n2 is
the number of edges in the
path chosen by the routing
algorithm for the traffic
flowing from n1 to n2.
Denoted by hops (n1,n2)
Definitions 3.18
- The average number of
hops in a network is:

n1, n 2
Traffic(n1, n2)  hops(n1, n2)

n1, n 2
Denoted by
Traffic(n1, n2)
hops
MSTs Do Not scale (cont’d)
We summarize the values of hops as below:
Number of nodes
hops
5
10
20
50
1.8
3.1778
4.4158
8.5159
100
13.9479
#hops grow past a
reasonable level, and
MSTs are not good
solutions as # nodes and
the traffic grow.
Then we will consider if
we can design better
trees.
Shortest-Path Trees (SPT)
Definition 3.19
Given a weighted graph (G,W) and nodes n1 and n2, the
shortest path from n1 to n2 is a path P such that  w(e)
eP
is a minimum.
Definition 3.20
Given a weighted graph (G,W) and a node n1, a shortest –
path tree rooted at n1 is a tree T such that, for any other
node n2  G, the path from n1 to n2 in the tree T is a
shortest path between the nodes.
Dijkstra’s Algorithm
1. Mark every node as unscanned and
give each node a label of 
2. Set the label of the root to 0 and the
predecessor of the root to itself. The
root will be the only node that is its own
predecessor.
Dijkstra’s Algorithm (cont’d)
3. Loop until you have scanned all the
nodes.
-Find the node n with the smallest label.
Since the label represents the distance to
the root we call it d_min.
-Mark the node as scanned.
-Scan all the adjacent nodes m and see if the
distance to the root through n is better than
the distance stored in the label of m. If it is,
update the label and update pred[m]=n.
4. When the loop finishes, we have a tree
stored in pred format rooted at root.
SPT vs. MST
Lower utilization of the links
More cost
Important:
Smaller average number of hops
Actually we will compare star (a kind of SPT)
with MST, because …
Star
SPT vs. MST
If we run Dijkstra’s algorithm on a
sparse graph, we will get a tree with a
fair number of nodes not connected
directly to the root.
If we run Dijkstra’s algorithm on a
complete graph (exactly what we’re
studying now), then we usually get a
star.
Star vs. MST (cont’d)
Design name
hops
MST
13.9479
0.493
$325.516
Star
1.9800
0.09
$453.861
MAX_UTIL
Cost
Prim’s algorithm produces much shorter paths but can
produce very expensive networks. SPT is not good, either.
Is there some middle ground between MST and SPT?
Prim – Dijkstra Trees
Algorithm : Label
1) Prim’s: min neighbors dist(node, neighbor)
2) Dijkstra’s:
min neighbors(dist (root , neighbor)  dist (neighbor, node))
3) Prim-Dijkstra’s:
min neighbors(  dist(root , neighbor)  dist(neighbor, node))
0  1
Prim – Dijkstra Trees (cont’d)
If   0, we build a MST.
If   1 , we build a SPT.
The  delay, and cost for various Prim-Dijkstra trees.

Design
0(MST)
hops
Link delay
N0
13.9479
0.3066
$325,516
0.1
N1
10.5717
0.1451
$280,162
0.2
N2
7.8640
0.1067
$247,217
0.3
N3
6.7762
0.0913
$243,551
0.4
N4
5.6679
0.0746
$248,650
0.5
N5
4.6303
0.0598
$253,579
0.6
N6
3.7063
0.0467
$273,742
0.7
N7
3.0186
0.0380
$295,012
0.8
N8
2.2879
0.0277
$378,792
0.9
N9
1.9800
0.0233
$453,861
Cost
If we have such a
large set of
designs shown in
the table, how to
select the best?
Dominance among Designs
If we have a large set of designs, the
problem is to decide which merit
consideration and which should be
discarded.
To help us do this we impose a partial
ordering on the designs.
Dominance among Designs
(cont’d)
Definition 3.21:
Given a set S and an operator  that
maps S x S
{TRUE,FALSE}, then we
call S a partially ordered set, or poset, if
1) For any s  S, s  s is FALSE.
2) For any s1, s2  S, s1  s2, if s1  s2 is
TRUE, then s2  s1 is FALSE.
3) If s1  s2 and s2  s3 are TRUE, then
s1  s3 is TRUE.
Dominance among Designs
(cont’d)
Definition 3.22:
Suppose design D1 has cost C1 and
performance P1. Suppose design D2
has cost C2 and performance P2. We
will say D1 dominates D2, or D1  D2, if
C1 < C2 and P1 > P2.
Dominance among Designs
(cont’d)
Design
N0
N1
N2
N3
N4
N5
N6
N7
N8
N9
Dominates

N0
N0,N1
N0,N1,N2
N0,N1
N0,N1
N0,N1
N0


Link delay
0.3066
0.1451
0.1067
0.0913
0.0746
0.0598
0.0467
0.0380
0.0277
0.0233
Cost
$325,516
$280,162
$247,217
$243,551
$248,650
$253,579
$273,742
$295,012
$378,792
$453,861
Dominance among Designs
(cont’d)
Show dominance relationships as a
directed graph:
A directed graph is a graph G=(V,E) in
which each edge e has been given an
orientation. If the edge has endpoints v1
and v2, we shall denote the edge
e=(v1,v2) if the orientation of v1 is the
source vertex.
Dominance among Designs
(cont’d)
Think of the designs (N0,N1,N2,…,N9) as the nodes of a graph.
A directed edge runs from Ni to Nj if Ni
Nj.
We can see we don’t want to consider N0, N1, or N2.

N9
N8
N3
N2
N5
N4
N1
N7
N6
N0
Further Analysis of
Prim-Dijkstra Trees
Given a pair of nondominating designs S1 and S2, 1
must be cheaper and 1 must have lower delay.
After rejecting the dominated designs, we still have 7
designs left to choose from. One way to clarify their
differences further is to discuss the marginal cost of
delay:
C1 - C2
P2 - P1
Using Delite to Produce
Prim-Dijkstra Trees
Unlike Prim’s algorithm, the choice of
the node at the center of the tree is
important in the Prim-Dijkstra
algorithm.
The value of .
Create trace file.
Tours
Sometimes a tree is just too unreliable to be a
good network design.
Tours are far more reliable yet only have 1
additional link.
In graph theory, a tour refers to a possible
solution of the traveling salesman problem
(TSP).
Tours (cont’d)
Definition 3.24:
Given a set of vertices {v1 , v2 ,..., vn } ,
a tour T is a set of n edges E such that
each vertex v has degree 2 and the
graph is connected.
vt1
Tours (cont’d)
The number of tours is
(n  1)! / 2
Proof :
1) Represent the tour as a permutation:
(vt1 , vt2 ,..., vt n )
2) There are n! such permutations, but the
reverse permutation also gives the same tour.
TSP
Definition 3.25:
Given a set of vertices (vt1 , vt 2 ,..., vt n )
and a distance function d : V V   ,
the traveling salesman problem is to find the
tour T such that
n
 d (vt , vt ) is a minimum.
i 1
i
i 1
In this notation we identify vtn1by vt1
Reliability
Definition 3.26: The reliability of a network is the
probability that the functioning nodes are connected
by working links.
Assumption:
1) The probability of each node working is 1;
2) The probability of a link failing is p .
Typically, p is a small number.
3) There is no correlation between the link failures.
Reliability (cont’d)
For the 5-node tree:
b
d
a
e
c
4
(
1

p
)
P(no_link_failures) =
P(failure) = 1  (1  p) 4
= 1  4 p  6 p 2  4 p3  p 4
 4p
Reliability (cont’d)
For the 5-node ring network :
b
d
a
e
c
If q = 1-p, then
Pring ( failure)  1  q 5  5 pq 4  10 p 2 q 3  10 p 3 q 2  5 p 4 q  p 5
 10 p 2 q 3
Reliability (cont’d)
p
0.1
0.01
0.001
10
…
4
4p (tree)
0.4
0.04
0.004
4  10
…
4
2
3
10 p q (ring )
0.0729
0.00097
10 5
10
7
…
Building Tours
TSP is NP-hard.
No polynomial – time algorithm.
We will use heuristic algorithms.
For our purpose a heuristic algorithm for tours
will build some tour but not necessarily the
TSP tour.
Nearest-neighbor Algorithm
1. Start at a distinguished node we call root and set
current_node = root.
2. Loop until we have all the nodes in the tour.
Now loop through the nodes and find the node
closest to the current_node that is not in the tour. We
call this best_node.
Create an edge between current_node and
best_node.
Reset the current_node to the best_node.
3. Finally create an edge between the last node and the
root to complete the tour.
Nearest-neighbor Algorithm
(cont’d)
Observation:
* Good (?):
We are trying to produce a short tour, we will always
move to the best possible next location.
* Bad (?):
When we look at the figure produced, we can see the
lines may cross frequently.
* Can we improve it? How can we measure the
goodness (creditability) of an algorithm?
Creditable Algorithms
After a little analysis the lines would be
uncrossed by hand, and the creditability of all
the work would be brought into question.
Definition 3.27:
A heuristic optimization algorithm
produces a creditable result if the result is
a local optimum for the problem.
Otherwise, it produces an uncreditable
result.
Creditable Algorithms
(cont’)
Example:
Prim’s and Dijkstra’s algorithms solve
the MST and SPT problems. Their
results are always creditable.
The nearest-neighbor tour-building
algorithm frequently produces
uncreditable designs.
Creditable Algorithms
(cont’)
Definition 3.28
A suite of network design problems S is a set of triples
( Locationi , Traffici , Costi ) for i ,..., S
Definition 3.29
A creditability test a program test (net, traffic, cost) that takes a
network problem as input and returns OK or FAIL depending on
whether or not test () can manipulate net into another valid network
of lower cost.
Definition 3.30
Given a suite of network design problems S, a design algorithm A,
and a creditability test t ( ) then
Cs ( A) 
net  S t (net )  OK 
S
Creditable Algorithms
(cont’)
First creditability test on the nearest-neighbor
heuristic. (Refer to the code in Appendix B “Computing
creditability of Simple Nearest Neighbor”)
The creditability list :
Sites
6
8
10
15
20
30
40
50 trials
500 trials
5000 trials
64.0%
56.0%
34.0%
22.0%
10.0%
8.0%
0.0%
59.4%
52.4%
37.8%
21.8%
13.8%
3.2%
0.8%
58.0%
47.94%
39.84%
22.62%
12.58%
3.84%
1.50%
A more creditable
nearest-neighbor algorithm
It differs from the simpler algorithm in two ways:
First, the closest node doesn’t mean the closest node to the last
added node; it means the closest node to any node in the partial
N
tour we have built.
Second, when we add best_node to the tour, we don’t add it at
the end of the tour, rather, we do a test to find the best place to
add it, such that
i
dist ( Ni , best _ node)  dist (best _ node, N j )  dist ( Ni , N j )
is the smallest possible value among all nodes N i and N j that
are adjacent in the partially built tour.
A more creditable
nearest-neighbor algorithm (cont’d)
The increasing of the creditability of the tour built by
the improved nearest-neighbor heuristic :
Sites
6
8
10
15
20
30
40
50 trials
64.0% -> 98.0%
56.0% -> 92.0%
34.0% -> 90.0%
22.0% -> 86.0%
10.0% -> 80.0%
8.0% -> 54.0%
0.0% -> 56.0%
500 trials
5000 trials
59.4%->94.8% 58.0%->95.32%
52.4% ->92.6% 47.94%->92.78%
37.8% ->90.8% 39.84%->91.04%
21.8% ->82.6% 22.62%->82.18%
13.8% ->72.6% 12.58%->74.06%
3.2% ->60.4%
3.84%->58.94%
0.8% ->55.0%
1.50%->48.84%
Time Complexity
Simple nearest-neighbor algorithm:
( n 2 )
Improved nearest-neighbor algorithm:
(2n 2 )  (n 2 )
Both heuristics have the same computational
complexity.
A related heuristic
The furthest-neighbor heuristic
vs
the nearest-neighbor heuristic :
The nearest-neighbor heuristic has a tendency to
strand the furthest sites, because it brings the site
with the smallest dtour into the tour at the best site.
The furthest-neighbor heuristic brings the site with
the largest value of dtour into the tour.
TSP Tours Do Not Scale
Theorem 3.3:
Given uniform traffic any TSP tour of n nodes has hops  n  1
4
2
n
if n is even. (Proof on text pp.86)
4( n  1)
Comparison of average number of hops for MST and TSP:
if n is odd and
Number of nodes
5
10
20
50
100
hops MST
1.8
3.1778
4.4158
8.5159
13.9479
hops TSP
1.5
2.777
5.263
12.755
25.252
Intolerable !
Tour Building and Delite
The Delite tool can build tours using both the
improved nearest-neighbor heuristic and the
furthest-neighbor heuristic.
Starting node.
Trace file.
Return to Graph Theory:
2 - Connectivity
Connectivity:
 Two nodes i and j are connected if the
graph contains at least one path from
node i and j.
 A graph is connected if every pair of its
nodes is connected.
Return to Graph Theory:
2 – Connectivity (cont’d)
The attraction of tours is that they are 2-connected.
Definition 3.31:
Given a connected graph G=(V,E), the vertex v is an articulation
point if removing the vertex and all the attached edges
disconnects the graph.
Definition 3.32:
If a connected graph G=(V,E) has no articulation points, then the
graph is 2-connected.
* Our goal is to produce a design that is 2-connected and that has
fewer hops. To this end we quote a helpful theorem.
Return to Graph Theory:
2 – Connectivity (cont’d)
Theorem 3.4:
G2  (V2 , E2 ) are 2-connected
graphs with V1 V2   . Let v1 , v2 V1 and v3 , v4 V2 .
Then the graph G with vertices V1  V2
and edges
E1  E2  (v1 , v3 )  (v2 , v4 )
is 2-connected.
Suppose
G1  (V1 , E1 ) and
How can we use this theorem on joining together 2connected graphs to help us with the hop count
problem in tours?
The answer is a divide-and-conquer strategy.
Divide and Conquer
In the figure, we have a TSP tour on 20 nodes. The average
number of hops is 5.263. We want to reduce the average hop
count but keep the 2-connectivity.
N20
N13
N6
N2
N7
N15
N9
N14
N10
N1
N9
N16
N12
N18
N17
N4
N8
N11
N3
N5
Divide and Conquer (cont’d)
1. Divide the 20 nodes into 2 “compact” clusters of 10 nodes
each. Call these clusters C1 and C2.
(We might divide the 20 nodes by ranges of their coordinates,
for example, to create the 2 clusters.)
2. Use the nearest-neighbor algorithm to design 2 TSP tours on
each cluster.
3. Select v1 C1 and v2 C2 to be the 2 nodes such that the
distance is the minimum.
4. Now select v3 C1-v1 and v4 C2-v2 to be the 2 nodes such
that the distance is the minimum.
5. Add the edges (v1,v2), (v3,v4) to the design.
Divide and Conquer (cont’d)
Many questions about the procedure will be
addressed later in the book. At this point, we
will focus on the generalization to more
clusters.
One approach to building networks
composed of multiple 2-connected clusters
can be stated in the following theorem.
Divide and Conquer (cont’d)
Theorem 3.5
Suppose that G  (V , E ) is a 2-connected graph with
V  2 . Suppose that each node vi  V
is
replaced by a 2-connected graph Gi . Suppose ,
each edge e  (u, w)  E is replaced by an edge e
from u ' Gu to v' Gw . Then if no 2 of these
replacement edges have a common vertex, the graph
H  ( i Vi , i Ei  E ' ) is a 2-connected graph.
Divide and Conquer (cont’d)
G=(V,E)
Divide and Conquer (cont’d)
H
Summary
If the traffic is small when compared to link size, then the optimal
networks are MSTs and TSP tours, depending on the reliability
desired.
Both MSTs and TSP tours do not scale.
The growth in the average # of hops is at the heart of the
problem. It’s better to build a Prim-Dijkstra tree or a “ring of
rings” to control the length of the routes.
The notion of creditability.
The End
Thank you !