Arjun - Fault-Tolerant Routing: A Genetic

Download Report

Transcript Arjun - Fault-Tolerant Routing: A Genetic

CS717
Fault-Tolerant Routing: A Genetic
Algorithm and CJC
Arjun Rao
CS 717
November 18, 2004
CS717
Next Paper
• [1] Loh, Peter K.K., “Artificial Intelligence
Search Techniques as Fault-Tolerant Routing
Strategies”
• [2] Loh, Shaw., “A Genetic-Based FaultTolerant Routing Strategy for Multiprocessor
Networks”
CS717
Our Little Problem…
• AI search techniques topology- and fault-type
independent…
• …but non-minimal routes utilized
• Follow-up work shows how genetic
algorithms (combined with heuristics) can find
minimal routes in presence of network faults
CS717
Genetic Algorithms: Overview
• Optimization strategy
• Population of potential solutions evolve over
series of generations
• Each element of population is chromosome;
each unit of chromosome is gene
• Chromosomes undergo crossover and
mutation
• Most fit chromosomes selected for next
generation, based upon fitness function
CS717
Abstract Model
• Same as before (including definitions of S
and G)
• Pure abstraction suffers from same caveats
as before
• Basic idea: Instead of AI search for adaptive
route, optimize over population of routes to
find best
CS717
Message Packets
• Simplified version:
CS717
Chromosome
• Route  Chromosome
• Node on route  Gene in chromosome
• Length of route  Size of chromosome
– Chromosome size directly reflects routing
performance!
• Distance traversed basis of fitness
CS717
Population Creation
CS717
Mutation and Crossover
• Mutation: Swap and/or shift
• Normal crossover destroys routes, messes
with source and destination; problem w/
different lengths
– Use one-point random crossover
CS717
Fitness Function
• F = (Dmax – Droute) / Dmax + 
– Dmax: Maximum distance between source and
destination
– Droute: Distance traveled by specific route
– : Predefined value to ensure non-zero fitness
• Higher value  More fit
CS717
Selection Scheme
• Roulette Wheel
– Sum of fitness values * random value from [0,1]
– Select chromosomes until fitness sum greater than product
• Tournament Selection
– Most fit chromosomes selected
• Stochastic Remainder
– Probabilities used to select route
• Which scheme has best performance selecting
optimal route?
CS717
Reroute
CS717
Genetic Hybrid Algorithm
CS717
Performance Testing
• RW, TS, SR tested in concert with RR and
HCR (previous algorithm was for RR)
• 25-node mesh network
• Varying fault percentages
• Ten tests, each over 20 generations
• Variations in mutation and crossover rates
CS717
Performance Testing (cont.)
• Randomness of RR bad for rerouting
• Unsuccessful routes increase with number of
generations
• SR + HCR performed best (prevented
premature convergence, maintained good
diversity)
CS717
Performance Testing (cont.)
CS717
Comparison With AI-only Strategies
CS717
Next (and Final) Paper
• [3] Loh, Schröder, Hsu., “Fault-Tolerant
Routing on Complete Josephus Cubes” (not
AI/GA-related but interesting nevertheless)
CS717
What is a Josephus Cube?
• New network topology
– Better embedding, communications performance
than hypercube topologies
• Can be used for processors, node clusters w/
optical channel architectures
– In clusters, fault tolerance sacrificed for scalability
– Symmetry (which is good for routing) somewhat
sacrificed as well
CS717
Complete Josephus Cube (CJC)
•
•
•
•
Augmented (i.e. more links) Josephus Cube
Better fault-tolerance
Routing efficiency maintained!
Properties
–
–
–
–
–
–
Uniform node degree of log2N + 2
N = 2r, r is order of network
Smaller diameter
Better symmetry
Guaranteed message delivery (w/ up to r + 1 faults)
No deadlocks/livelocks
• We examine CJC as node cluster topology
CS717
Defining the CJC
• CJC(N) is an undirected graph G = (V, E)
• Nodes labeled using function J:
– J(1) = 1
– J(2i) = 2J(i) – 1, i  1
– J(2i + 1) = 2J(i) + 1, i  1
CS717
Defining the CJC (cont.)
• V = {u | J(2r)  u  J(2r + 1 – 1)}
• Three types of edges: H, J, C
– E = EH  EJ  EC
• (x,y)  EH iff H(x,y) = 1, where H is the
Hamming distance
• (x,y)  EJ iff y = x XOR 6
• (x,y)  EC iff y = (x), where  finds 2’s
complement
CS717
Example: CJC(8), r = 3
CS717
Example: CJC(16), r = 4
CS717
Data Vectors
• Route vector Rv(u) = (Tr+1Tr…T1T0)
– Information on paths already traversed
– Tr+1 = 1 if |u  v| > CEIL(r / 2) else 0
– Tr = 1 if v = u(J), where u(J) is 2’s complement of
lowest order bits
– Tk = (uk+1  vk+1), 0  k < r
CS717
Data Vectors (cont.)
• Input link vector Iv(w) = (Lr+1Lr…L1L0)
– Li = 0 if message arrives on link i else 1
• Fault status vector Fv(w) = (Sr+1Sr…S1S0)
– Sj = 0 if node on link j or link j itself faulty else 1
• Navigation vector Nv(u) = Rv(u)  Iv(u)  Fv(w)
– Designates optimal (or best-possible) paths from
current node
CS717
CJC-FTROUTE() Algorithm
1)
2)
3)
4)
5)
6)
Extract route vector from packet header
Update current node’s fault status vector
If route vector all 0’s, destination reached
(Re)Initialize input link vector
Compute navigation vector Nv(u)
If enabled C link not faulty, set Rvr+1(u) to 0, take 2’s
complement of Rv(u), and route packet
7) Else, do the same with the J link (link r), except
complement Rv0(u) and Rv1(u) only
8) If both C and J links faulty and/or not enabled, find
an H link for routing (or misrouting), set Rvr+1(u) and
Rvr(u) to 0, and either route or discard
CS717
Example 1: CJC(16)
CS717
Example 2: CJC(16)
CS717
Routing With Bounded Faults
• Theorem: Routing between a node pair (x, y)
is optimal if FT < (r + 2) which includes faulty J
and C incident links at x
• Proof: Let fi faults be encountered at node wi
on path of length p. |R(wi)| - fi optimal links left
to y; routing from wi to wi+1 optimal if link or
|R(wi)| - fi > 0. Generalizing to whole path:
p 1
p 1
| R(w ) | f
i 0
i
i 0
i
 FT
CS717
Maximum Path Distance
• Define FTR(u, v) = Set of edges of path from
u to v
• Corollary: FTR guarantees path distance of
route upper-bounded by H(x, y) if FT < (r + 2)
• Corollary: FTR guarantees max distance r in
order r cluster
• Proof: Follows from above corollary with H(x,
y)  r
CS717
Final Set of Theorems
• For sub-optimal routing, FTR guarantees max
path distance of 2r – H(x, y) + 2 only if FT < (r
+ 2) (including disabled C and J links at x)
• FTR guarantees deadlock-free routes
• FTR guarantees livelock-free routes
– Proof of latter two utilizes notion of virtual
networks in CJC