A General Method For Maximizing the Error

Download Report

Transcript A General Method For Maximizing the Error

A General Method For Maximizing
the Error-Detecting Ability of
Distributed Algorithms
Martina Schollmeyer and Bruce McMillin
IEEE Transactions on parallel and distributed System,
VOL. 8, NO. 2, February 1997
元智大學 資訊工程所
陳桂慧
1999.04.21
Outline
• Terminology for MPS Topology
• Coloring algorithm
• Maximal fault index (MFI) for common
topologies
• Conclusion
Terminology for MPS Topology
• The communication environment (CE) of a processor Pi is
the set of processors from which Pi will receive information
during the execution a program. CE(Pi) is a subset of
{P1,P2,…Pn}
• A fault group of a processor, FG(Pi), of fault tolerance tl, is
the collection of faulty processors in CE(Pi), to guarantee error
detection for all errors caused by these faults, required that
• A collection of processors that must be nonfaulty to guarantee
detection of all errors induced by a set of faulty processor P is
called the nonfault group of P, NFG(P)
Coloring Algorithm
• Coloring algorithm - to color the graph, indicating
faultiness or nonfaultiness of components, when
determine the NFG of an individually faulty
processor.
– describe how the coloring is done for one fault in each
CE( tl =1)
– extend the algorithm for tl >1 to multicoloring, where
each vertex has a chromaticity of tl , to obtain the NFGs.
– this algorithm is used to obtain a possible distribution of
component failures for the whole MPS.
for i:=1 to n
// n is the total # of processors
color Pi faulty;
color (Pj)(i  j  Pj  CE ( Pi)) non-faulty; //NFG({pi})
save NFG({Pi});
reset colors;
end for
An algorithm to determine the NFG of a set P of faulty processor
To determine a permissible fault
distribution for the entire network.
1. Select 1 to be faulty
=> 2,3 nonfaulty
2a. Arbitrarily node 5 is
chosen to be faulty
=>4,6,7 nonfaulty
2b.Arbitrarily node 3 is
chosen to be faulty
=>all nonfaulty
• This provides a total of
three faulty processors, 1, 3,
and 5 with at most two
faulty component in each CE.
A fault matrix (FM) of an MPS gives, for all sets of faulty processor P, all
processors that must be nonfaulty if the elements in P are faulty. A FM corresponds to
a collection of NFGs for a specific tl .
MFI for Meshes - Square Pattern
• For an arbitrary n*m torus-connected mesh, with tl =1
MFI = div(m,2) * div(m,2)
MFI for Meshes - Star Pattern
• The number of
resource nodes in kary n-cube for
perfect 1-adjacency
as
X=k^n/(2n+1), k>2
• A torus-connected
mesh is a k-ary 2cube, if we can
guarantee that we
have only k*k
meshes
X = k^2/5
MFI for Binary Hypercubes
Conclusion
• The Maximal fault index was introduced to
demonstrate how a maximal number of simultaneous
component failures can be tolerated by an error
detecting algorithm, based on specific distributions of
the faults within the interconnection network.
• The assessment of an error-detecting algorithm
based on the concept of its minimal and maximal
fault index can be used for safety critical system,
especially with respect to the process-to-processor
mapping that can be obtained from it.