Transcript Powerpoint

Introduction to graph theory and
applications
Introduction and definitions
Typical network modelling applications
Classic graph theory problems and proofs
The Seven Bridges of Königsburg
The four colour map colouring theorem
The three cottage problem
Data structures used for storing graphs
Incidence and adjancancy lists and matrixes

Definitions
Graph theory concerns the study of networks based on a
mathematical abstraction of the form of a graph. A graph is
made up of vertices (singular: vertex) and edges. An edge
connects exactly 2 vertices. These are the same vertex in the
special case where the edge is called a loop or loopback.
A vertex can have any number of edges connected to it.
Edges might be directed, and drawn as an arrow to indicate
that network flow or traffic or the connection represented by
the edge works in one direction only. In an undirected graph
the edges are bidirectional. Edges may be associated with a
numeric cost. The meaning of edge cost will depend upon the
graph application.
Vertices and bidirectional edges
More definitions
A path is a route through a graph visiting vertices and
edges in turn. A cycle is a path that ends at the starting
vertex. A Hamiltonian path or cycle visits all vertices
exactly once. A Eulerian path or cycle visits all edges
exactly once.
A graph is simple if no edge is a loop, and no 2 edges
have the same endpoints.
A planar graph can be drawn on a plane surface, e.g. a
piece of paper, so that no edges cross over other edges.
Even more definitions ...
A directed acyclic graph is a directed graph without
cycles, i.e. you can't get back to the vertex where you
started by following edges in their defined direction.
This is a way to map a forest of trees so that subtrees
can be shared between trees. Sources are vertices all
of whose edges lead out from them and sinks are
vertices all of whose edges lead into them. A tree is a
graph which is connected, acyclic and simple.
Terms to remember
➲
➲
➲
➲
➲
➲
➲
➲
graph
vertex (pl vertices)
edge
edge cost
undirected graph
directed graph
simple graph
loop
➲
➲
➲
➲
➲
➲
➲
➲
path
cycle
Hamiltonian
Eulerian
planar graph
acyclic directed
graph
forest
tree
Typical Graph Applications 1
Modelling a road network with vertexes as towns and
edge costs as distances.
Modelling a water supply network. A cost might relate
to current or a function of capacity and length. As
water flows in only 1 direction, from higher to lower
pressure connections or downhill, such a network is
inherently an acyclic directed graph.
Modelling the recent contacts of someone who has
become ill with a notifiable illness, e.g. Sars or
Meningitis. Edge costs might be a function of the
probability that the contact resulted in an infection.

Typical Graph Applications 2
Dynamically modelling the status of a set of routes by
which traffic might be directed over the Internet.
Modelling the connections between a number of
potential witnesses or suspects who were reported or
came forward as having been within the vicinity of a
serious crime within an hour of of when it occurred.
Minimising the cost and time taken for air travel when
direct flights don't exist between starting and ending
airports.
Using a directed graph to map the links between pages
within a website and to analyse ease of navigation
between different parts of the site.

Classic Graph Theory Problems
Graph theory started from a mathematical curiosity.
"The Seven Bridges of Königsberg is a problem inspired by
an actual place and situation. The city of Kaliningrad, Russia
(at the time, Königsberg, Germany) is set on the Pregolya
River, and included two large islands which were connected
to each other and the mainland by seven bridges. The
question is whether it is possible to walk with a route that
crosses each bridge exactly once, and return to the starting
point. In 1736, Leonhard Euler proved that it was not
possible."
Source: http://en.wikipedia.org/wiki/Seven_Bridges_of_Koenigsberg
Seven Bridges of Königsberg 2
"In proving the result, Euler formulated the problem in terms
of graph theory, by abstracting the case of Königsberg -- first,
by eliminating all features except the landmasses and the
bridges connecting them; second, by replacing each landmass
with a dot, called a vertex or node, and each bridge with a
line, called an edge or link. The resulting mathematical
structure is called a graph."
Seven Bridges of Königsberg 3
"The shape of a graph may be distorted in any way without
changing the graph itself, so long as the links between nodes
are unchanged. It does not matter whether the links are
straight or curved, or whether one node is to the left of
another.
Euler realized that the problem could be solved in terms of the
degrees of the nodes. The degree of a node is the number of
edges touching it; in the Königsberg bridge graph, three nodes
have degree 3 and one has degree 5. Euler proved that a
circuit of the desired form is possible if and only if there are no
nodes of odd degree. Such a walk is called an Eulerian circuit
or an Euler tour. Since the graph corresponding to Königsberg
has four nodes of odd degree, it cannot have an Eulerian
circuit."
Seven Bridges of Königsberg 4
"The problem can be modified to ask for a path that
traverses all bridges but does not have the same
starting and ending point. Such a walk is called an
Eulerian trail or Euler walk. Such a path exists if and
only if the graph has exactly two nodes of odd degree,
those nodes being the starting and ending points. (So
this too was impossible for the seven bridges of
Königsberg.)"
The four colour theorem
This theorem states that given any set of areas on a
planar surface, e.g. representing political regions on a
map, it is possible to colour every area so that no 2
regions sharing a border need use the same colour if 4
colours are used.
Sharing a border means a linear border between 2
areas of more than zero length; e.g. many regions can
meet at a point but this doesn't count as a border. Also
an area within the set must be single and contiguous.
The four colour theorem 2
Political geographic entities such as countries
sometimes break this requirement by having enclaves,
e.g. the USA includes Alaska which is not part of the
largest geographically contiguous region of the USA.
Old maps of English and Welsh counties split the
county of Flint into 2 seperate areas.
This theorem comes from a question (conjecture)
asked by a student Francis Guthrie of his maths
lecturer Augustus De Morgan in 1852. It was finally
proved in 1976.
Graph theory presentation of the theorem
"To formally state the theorem, it is easiest to rephrase it in graph
theory. It then states that the vertices of every planar graph can be
colored with at most four colors so that no two adjacent vertices
receive the same color. Or "every planar graph is four-colorable"
for short. Here, every region of the map is replaced by a vertex of
the graph, and two vertices are connected by an edge if and only if
the two regions share a border segment (not just a corner)"
Source: http://en.wikipedia.org/wiki/Four-color_theorem
Three cottage problem
Source: http://en.wikipedia.org/wiki/Three_cottage_problem
"The three cottage problem is a well-known mathematical
puzzle. It can be stated like this:
Suppose there are three cottages on a plane (or sphere) and
each needs to be connected to the gas, water, and electric
companies. Is there a way to do so without any of the lines
crossing each other?"
Three cottage problem 2
"The problem is part of the mathematical field of topological
graph theory which studies the embedding of graphs on
surfaces. In more formal graph theoretic terms the problem
asks whether the complete bipartite graph K3,3 is planar.
Kazimierz Kuratowski proved in 1930 that K3,3 is nonplanar,
and thus that the three cottage problem has no solution.
But K3,3 is toroidal, that is it can be embedded on the torus. In
terms of the three cottage problem this means the problem can
be solved by punching a hole through the plane (or the sphere).
This changes the topological properties of the surface and
using the hole we can connect the three cottages without
crossing lines."
Three cottage problem 3
From this description note that the term: "complete bipartite
graph" means a graph with 2 sets of vertices where every
vertex in one set is connected by an edge to every vertex in the
other set. The designation K3,3 for this graph indicates that
there are 3 vertices in each of the 2 sets, representing utilities
and cottages respectively. K5 is diagrammed on the right.
Three cottage problem 4
Kuratowski's proof is based on a process involving
simplification of complex graphs into simpler graphs
by removing vertices and edges in certain situations
and finding instances of one of the 2 simplest non
planar graphs as subsets of the more complex graph.
The other simplest non-planar graph is a complete
graph known as K5, because it has a single set of 5
vertices with each vertex connected to every other
vertices.
Data structures used for storing graphs
Before choosing one of various options or designing a
customised approach, programmers need to consider
application data, and the means by which this will be
attached to vertices and edges. Records will need to
be attached either to vertices, or to edges, or one type
of record might need attachment to vertices and
another to edges. The choice and design of data
structure(s) will influence and be influenced by
program performance and memory usage
requirements.
Incidence list
The connectivity of an edge is regular in that each represents a
connection between exactly 2 vertices. This lends itself to fixedsize record designs, e.g. an array of records known as an
incidence list.
The coding in the next slide uses pointers to the vertex records, as
a vertex may appear in any number of edge records.
Alternatively, if the vertices are stored in an array which is
populated before the edge records are created, and unchanged
during the runtime of the application, the edge records could
store the integer array indices. Which approach is optimal
depends upon how vertex records are stored and how this
collection of data is searched and updated.
Incidence list structure coding
Adjacency list
The connectivity of a vertex is irregular, in the sense
each can be connected using a variable number of
edges. Vertex information can be stored in a fixed
width record containing the head pointer of a linked
list. Each linked list item will represent an edge. The
latter will contains references (e.g. array indices, keys
or pointers) to edge record storage so this isn't
duplicated. If this isn't done within the edge records
themselves, the linked list nodes must also reference
other vertices connected via the edges. This approach
is known as an adjacency list.
Adjacency list structure coding
Adjacency list coding comments
The above structure is better suited to a directed graph than an
undirected graph. A link from a webpage to another page (i.e. a
directed edge as in the above example ) does not require a link in
the opposite direction. Using this kind of approach for an
undirected graph would either lead to duplication of information,
in the sense that each edge would have to be stored and updated
twice, or added complexity of path access and traversal. Edge
costs, if required, could be stored as an additional field within the
url_list structure, for which each node in the list is an edge. A
reason for having edge costs in this example might be if not all
links between pages are equally easy for a user to find.
Incidence matrix
This structure is a 2 dimensional array where the rows
index the edges, and columns index the vertices. In
the simplest implementation the array element is
boolean, with a 1 indicating a connection and a 0
indicating no connection. The advantage of this
structure is that it enables data to be rapidly indexed,
either for vertices or for edges. The disadvantage is
that memory proportional to V x E is required, where
V is the number of vertices, and E the number of
edges. List or array based structures require memory
proportional to V + E.
Adjacency matrix
This is a V x V 2D array where V is the number of
vertices. In the simplest case, if the data present at
index [i][j] is a 1 this indicates a (possibly directed)
edge from i to j, while a 0 indicates there being no
such edge. If the edges have costs, this could be a
floating point number present at index [i][j], with a
sentinel value, e.g. 0 or -1, indicating lack of an edge
from vertex i to vertex j. This data structure makes it
easier to search for subgraphs of particular patterns or
identities, e.g. K3,3 or K5.
Conclusions
Graph theory enables us to study and model networks and
solve some difficult problems inherently capable of being
modelled using networks.
Various terms e.g. vertex and edge, are associated with graph
theory which gives these terms special meanings. These
meanings need to be understood and remembered in order to
apply graph theoretic approaches to solving problems.
When solving a problem by developing a graph-based
program, careful attention must be given at the design stage to
the structuring of data to help make solving the problem
tractable, to enable linkages to be traced efficiently and to
avoid duplication of data.