Certification of Computational Results Greg Bronevetsky

Download Report

Transcript Certification of Computational Results Greg Bronevetsky

Certification of Computational
Results
Greg Bronevetsky
Background
• Technique proposed by
• Gregory F. Sullivan
• Dwight S. Wilson
• Gerald B. Masson
• All from Johns Hopkins CS Department.
Overview
• Trying to do fault detection without the severe
overhead of replication.
• Certification Trails are a manual approach
that has that programmer provide additional
code to have the program check itself.
• A program generates a certification trail that
details its work.
• A checker program can use this trail to verify
that the output is correct in asymptotically
less time.
• Several examples provided. No automation.
Roadmap
• We will cover some algorithms to which the
Certification Trails technique has been
applied
• Sorting
• Convex Hull
• Heap Data Structures
• The addition of Certification Trails and the
creation of the Checker is done manually by
the programmer in all cases.
Trail for Sorting
• In order to verify the output of a sorting
algorithm we must check that
• The sorted items are a permutation of the
original input items.
• The sorted items appear in a non-decreasing
order in the sorter's output.
• Thus, the trail should contain all the items in
their original order, each labeled with its
location in the sorted list.
Sorting Checker
• A Sorting Checker must:
• Use the labels to place all elements into their
sorted spots and verify that this results in a nondecreasing order.
• Verify that no two elements are placed in the
same location in the ordered list.
• The Sorter takes O(n2) or O(n log n) time.
• The Checker takes O(n) time.
• Checker is asymptotically faster than Sorter.
Convex Hull Problem
Given a set of points on a 2D plane, find a subset of points that forms a
convex hull around all the points.
Convex Hull: Step 1
Points sorted in order of increasing slope relative to P1
P6
P8
P5
P7
P4
P1 is the
point with
the least xcoordinate.
P1
P3
P2
Convex Hull: Invariant
All the points not on the Hull are inside a triangle formed by P1 and two
successive points on the Hull.
P6
P8
P5
P7
P4
P1
P3
P2
Convex Hull: Invariant
We know that P3 is not a Hull point because the clockwise angle between
lines P2 P3 and P3 P4 is ≥ 180º.
P6
P8
P5
P7
P4
≥ 180º
P1
P3
P2
Convex Hull: Invariant
Note that if clockwise angle between
lines P2 P3 and P3 P4 is < 180º, then P3 is a Hull point
P6
P8
P5
P7
P4
P1
< 180º
P2
P3
Convex Hull Algorithm
• Add P1, P2 and P3 to the Hull.
(Note: P1, P2 and Pn must be on the Hull.)
• For Pk = P4 to Pn
• ... trying to add Pk to the Hull ...
• Let QA and QB be the two points most
recently added to the Hull:
• While the angle formed by QA, QB and Pk
≥180
• remove QB from the Hull since it is inside the
triangle: P1, QA, Pk.
• Add Pk to the Hull.
Trail for Convex Hull
• Augment Program to
• Output {q1, q2, ..., qm} = the indexes of the points
on the hull.
• Output a proof of correctness for {x1, x2, ..., xr} =
all points not on the Hull in the form of the
triangle that contains it.
Point not on Convex Hull
P3
P7
3 Surrounding Points
P1, P2, P4
P1, P6, P8
Convex Hull Checker
Checker must check that:
• There is a 1-1 correspondence between input
points and {q1, q2, ..., qm} U {x1, x2, ..., xr}.
• All points in the triangle proofs correspond to
input points.
• Each point in in the triangle proofs actually
lies in the given triangle.
• Every triple of supposed Hull points forms a
convex angle.
• There is a unique locally maximal point on
the hull.
Asymptotic Runtimes
• Original Convex Hull Algorithm takes
O(n log n) time to sort and the Hull
construction loop takes only O(n) time.
O(n log n)-time total.
• Convex Hull Checker runs thru the set of
points once for each check.
O(n)-time total.
• Checker asymptotically faster than
Original.
Certification Trails for Data
Structures
• Lets have a data structure for storing
value/key pairs, ordered lexicographically:
(key, val) < (key', val') iff
val<val' or (val=val' and key<key')
• Operations:
• member(key): returns whether key is mapped to
some val.
• insert(key, val): inserts a pair (key, val) into the
data structure.
• delete(key): deletes the pair that contains key.
Data Structure Specs
• Data Structure Operations
• changekey(key, newval): executed when the pair
(key, oldval) exists in the data structure.
Removes this pair and inserts the pair
(key, newval)
• deletemin(): deletes the smallest pair (according
to the ordering). Returns “empty” if the data
structure contains no pairs.
• predecessor(key): returns the key of the pair that
immediately precedes key's pair or “smallest”
if there is no such pair.
• empty(): returns whether the data structure is
empty.
Data Structure Implementation
• Such a Data Structure can be implemented
via an AVL tree, a red-black tree or a b-tree.
• Most operations will take O(log n) time.
• We can augement implementations to
generate a certification trail:
• insert(key, val): output the key of the
predecessor of the newly inserted pair
(key, val). If there is no predecessor, output
“smallest”.
• changekey(key, newval): output predecessor of
the new pair (key, newval). If there is no
predecessor, output “smallest”.
Data Structure Checker
• A Checker for any program using the above
data structure can use the certification trail to
implement a much faster data structure.
• All operations can be done in O(1) time.
• Resulting program will be faster than original
program. Maybe asymptotically faster.
Optimized Data Structure
• A doubly linked list of (key, val) pairs, sorted
according to the pair ordering relation.
• An array indexed by keys, containing pointers to
(key, val) pairs corresponding to the indexes.
• The first pair (with key=0) contains value=sm, which
is defined to be smaller than any other possible
value.
Optimized Data Structure
• Optimized data structure operations:
• insert(key, val):
• Read from trail prec_key = the key of the pair
preceding the new (key, val) pair.
• Check that it is a valid index.
• Look at the pair pointed to by array[prec_key].
• Verify that it is ≠null.
• Place the (key, val) pair at index key, following the
(prec_key, prec_val) pair.
• Check that before the insert() array[key] was =null.
• Ensure that (key, val) is greater than its
predecessor and less than its successor.
Optimized Insert Example
Result of the call insert(5, 62)
Optimized Data Structure
• Optimized data structure operations:
• delete(key): Remove the pair pointed to by array[key].
• Ensure that array[key]≠null.
• changekey(key, newval): Call delete(key), followed by
insert(key, newval). These calls will check all necessary
conditions.
• deletemin(): Look at the pair that follows the pair (0,sm)
(pointed to by array[0]).
• If no such pair, return “empty”.
• Else, if there exists pair (key, val), then remove it and
set array[key] to null.
• empty(): Return whether there is a pair following the pair
(0,sm).
Optimized Data Structure
• Optimized data structure operations:
• member(key): return whether array[key]=null.
• predecessor(key):
• Look at the pair pointed to by array[key].
• Follow its backward link to its predecessor pair.
• If the predecessor pair is (0,sm) then return “smallest”.
• Else, return the key field of that pair.
• Note that all the operations can be done in
O(1) time.
Shortest Path
• A Shortest Path algorithm was implemented
using the above algorithm.
• The original program used the original data
structure that produced a certification trail.
• The checker version was identical to the
original except that its data structure was the
optimized version that used the trail.
• Original runtime = O(m•log n)
• Checker runtime = O(m)
• (m=number of edges, n=number of nodes)
Performance: Sort
Basic
1st Execution
2 nd Execution Speedup % Savings
Algorithm (Generates Trail) (Uses Trail)
10000
0.28
0.30
0.04
7.00
39.29%
50000
1.80
1.90
0.19
9.47
41.94%
100000
3.96
4.08
0.41
9.66
43.31%
500000
23.95
24.69
2.14
11.19
43.99%
1000000
50.23
51.57
4.38
11.47
44.31%
Size
• Basic Algorithm – Sorting algorithm with no
certification trails.
• 1st Execution – Sorter that produces certification trail.
• 2nd Execution – Checking algorithm that uses the
trail.
• Speedup – factor of improvement of 2nd vs Basic.
• %Savings – of 1st + 2nd trails execution over running
Basic twice.
Performance: Sort
Size
10000
50000
100000
500000
1000000
Basic
1st Execution 2 nd Execution Speedup % Savings
Algorithm (Generates Trail) (Uses Trail)
0.28
0.30
0.04
7.00
39.29%
1.80
1.90
0.19
9.47
41.94%
3.96
4.08
0.41
9.66
43.31%
23.95
24.69
2.14
11.19
43.99%
50.23
51.57
4.38
11.47
44.31%
Performance: Convex Hull
Basic
2nd Execution Speedup % Savings
1st Execution
Algorithm (Generates Trail) (Uses Trail)
5000
0.61
0.62
0.07
8.73
43.62%
10000
1.33
1.34
0.14
9.56
44.54%
25000
3.68
3.68
0.36
10.22
45.12%
50000
7.68
7.74
0.71
10.75
44.94%
100000
16.23
16.30
1.43
11.35
45.39%
200000
33.93
34.37
2.84
11.94
45.16%
Size
Performance: Shortest Path
Basic
1st Execution
2nd Execution Speedup % Savings
Size
(n,m)
Algorithm (Generates Trail) (Uses Trail)
100,1000
0.04
0.05
0.02
2.00
12.50%
250,2500
0.15
0.16
0.06
2.50
26.67%
500,5000
0.31
0.33
0.11
2.82
29.03%
100,10000
0.70
0.76
0.23
3.04
29.29%
2000,20000
1.58
1.67
0.45
3.51
32.91%
2500,25000
2.06
2.15
0.55
3.75
34.47%
Summary of Experiments
• The overhead of generating a certification
trail is about 2%.
• The checker run is much faster than the
original. It can be run on much slower
hardware or use a formally verified language.
Application to Byzantine Failures
• Current technique is completely manual. No
known way to automatically convert a
program to generate a trail.
• We may develop libraries that use the
Certification Trails technique, allowing us to
catch errors in a large fraction of a program.
• Door open to Failure Recovery: when an
error is detected the checker goes back to
using original code to redo the work.