Techniques for Automated Localization
and Correction of Design Errors
Jaan Raik
Tallinn University of Technology
1
Design error debug
“There has never been an unexpectedly short
debugging period in the history of computers.”
Steven Levy
2
Designs are getting bigger
3
Designs are getting costlier
 25-30 % annually decreasing cost per function
 15 percent annual growth of the market for IC
• But …
 The cost of chip design keeps on growing.
• In 1981, development of a leading-edge CPU
cost 1 M$
• …today it costs more than 300 M$ !!!
• Why do the costs increase ???
4
Design automation crisis
• Productivity gap: technology's capabilities grow
58% annually versus 21% annual growth of
designers' productivity
[Chart: designer productivity in person-months per 20,000 logic gates
against transistors on the die, across methodology generations:
schematic entry (< 1979), placement & routing and simulation (~1983, 1986),
hierarchy and generators (1988-92), logic synthesis (1992-95),
(specialized) high-level and system-level synthesis (~1996-...)]
Verification and debugging
Development time: Specify → Design → Detect → Localise → Correct
• Verification = Detect; Debug = Localization + Correction
• ~2/3 of development time goes to verification
• ~2/3 of verification time goes to debug
• Thus debug takes nearly half of the
development cycle (2/3 × 2/3 ≈ 44%)
6
Bugs are getting "smarter"
CREDES Summer School, June 2-3, 2011, Tallinn, Estonia
7
Traditional debug flow
Spec → Design → Verification → Error! → ??? (manual debug)
The designer gets counter-examples
(waveforms), failed assertions,
...
• Too much information
• Too little information
8
Automated debug flow
Spec → Design → Verification → Error! →
Error localization → Error correction →
Corrected design, repair log, ...
9
Outline
• Verification basics
• Automated debug at the gate-level
• RTL debug methods
– Localization: SAT; correction: resynthesis
– Localization: path tracing; correction: mutation
• General discussion, future trends
• Prototype tools, on-going activities
10
Verification
“To err is human - and to blame it on a
computer is even more so.”
Robert Orben
11
Verification versus test
• The goal of verification is to check if a system is
designed correctly.
• Validation is similar to verification but we check
on a prototype device, not a model.
• By (manufacturing) test we understand
checking every instance of a produced chip
against manufacturing defects.
12
Abstraction levels and verification
13
Difficulties in verification
• Errors may be in implementation,
specification or verification environment
(constraints)
• No way to detect bugs in the spec itself,
because a reference object is missing.
Thus: verification by redundancy.
• Problem: How to assess verification quality
i.e. coverage? (except in equivalence
checking)
14
15
Verification flow
16
Dynamic verification
17
Dynamic verification
• Based on simulation
• Code coverage
• Assertions, functional coverage
18
Formal verification
19
Dynamic vs formal verification
20
Automated debug techniques
“Logic is a poor model of cause and effect.”
Gregory Bateson
21
Debugging design errors
• Concept of design error:
– Mostly modeled in implementation,
sometimes in specification
• Main applications:
– Checking the synthesis tools
– Engineering change, incremental
synthesis
– Debugging
22
Debugging design errors
What leads to debugging?
• Design behavior doesn’t match expected
behavior
When does this occur?
• During simulation of design
• Formal tools (property/equivalence check)
• Checkers identify the mismatch
23
Design error diagnosis
• Classification of methods:
– Structure-based/specification-based
– Explicit/Implicit fault model (model-free)
– Single/multiple error assumption
– Simulation-based/symbolic
24
Debugging combinational logic
• Thoroughly studied in 1990s
• Many works by Aas, Abadir, Wahba &
Borrione, others
• Also studied at TUT (Ubar & Jutman)
– Used structural BDDs for error
localization
25
Explicit error model (Abadir)
• Functional errors of gate elements:
– gate substitution
– extra gate
– missing gate
– extra inverter
– missing inverter
• Connection errors of signal lines:
– extra connection
– missing connection
– wrong connection
26
Missing gate error (Abadir)
27
Mapping stuck-at faults to
design errors
• Abadir: Complete s-a test detects all
single gate replacements
(AND,OR,NAND,NOR), extra gates
(simple case), missing gates (simple
case) and extra wires.
28
Combinational fault diagnosis
Fault localization by fault table:

        T1  T2  T3  T4  T5  T6
  F1     0   1   1   0   0   0
  F2     1   0   1   1   0   0
  F3     1   0   0   0   1   1
  F4     0   1   1   0   0   0
  F5     0   0   0   1   1   0
  F6     0   0   1   0   1   1
  F7     0   0   0   0   0   1

Test responses:
  E1     0   0   0   1   1   0   → fault F5 located
  E2     0   1   1   0   0   0   → faults F1 and F4 are not distinguishable
  E3     1   0   0   1   1   0   → no match, diagnosis not possible
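The table lookup above can be sketched in Python (signatures transcribed from the fault table; a toy illustration, not a diagnosis tool):

```python
# Fault dictionary: per-fault pass/fail signature over tests T1..T6,
# transcribed from the fault table above.
FAULT_TABLE = {
    "F1": (0, 1, 1, 0, 0, 0),
    "F2": (1, 0, 1, 1, 0, 0),
    "F3": (1, 0, 0, 0, 1, 1),
    "F4": (0, 1, 1, 0, 0, 0),
    "F5": (0, 0, 0, 1, 1, 0),
    "F6": (0, 0, 1, 0, 1, 1),
    "F7": (0, 0, 0, 0, 0, 1),
}

def diagnose(response):
    """Return the faults whose signature matches the observed response."""
    return [f for f, sig in FAULT_TABLE.items() if sig == tuple(response)]

print(diagnose((0, 0, 0, 1, 1, 0)))  # E1: ['F5'] -- fault located
print(diagnose((0, 1, 1, 0, 0, 0)))  # E2: ['F1', 'F4'] -- not distinguishable
print(diagnose((1, 0, 0, 1, 1, 0)))  # E3: [] -- no match, diagnosis impossible
```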
29
Mapping stuck-at faults to
design errors
30
Distribution of design errors
31
Explicit model: disadvantages
• High number of errors to model
• Some errors still not modeled
32
Implicit design error models
• Do not rely on structure
• Circuit under verification as a black
box
• I/O pin fault models
33
Design error correction
• Classification:
– Error matching approach
– Resynthesis approach
34
Design error correction
• Happens in a loop:
– An error is detected and localized
– Correction step is applied
– Corrected design must be reverified
– ...
• Until the design passes verification
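The loop above can be sketched generically; all names here are invented placeholders, and the toy "design" is just a single arithmetic operator that should be '+':

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def debug_loop(design, verify, localize, correct, max_iters=10):
    """Repeat localization and correction until the design passes
    verification, or the iteration budget runs out."""
    for _ in range(max_iters):
        failures = verify(design)
        if not failures:
            return design                      # passes verification
        suspects = localize(design, failures)  # error localization
        design = correct(design, suspects)     # error correction
    raise RuntimeError("no repair found within budget")

# Toy instantiation: compare against the reference behavior a + b.
tests = [(1, 2), (3, 4)]
verify = lambda d: [t for t in tests if OPS[d](*t) != t[0] + t[1]]
localize = lambda d, fails: [d]                # only one block to suspect
correct = lambda d, sus: {"-": "*", "*": "+"}.get(d, "+")  # next candidate

print(debug_loop("-", verify, localize, correct))  # → '+'
```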
35
Ambiguity of error location
• Since there is more than one way to synthesize a
given function, it is possible that there is more
than one way to model the error in an incorrect
implementation
• correction can be made at different locations
36
Crash course on SAT
37
Satisfiability aka SAT
• SAT: a Boolean function is satisfiable
iff there exists a variable assignment
that makes it evaluate to TRUE
• The Boolean function must be
represented in CNF, i.e. as a product
(conjunction) of sums of literals
Digital Systems Verification course
38
Satisfiability aka SAT
• The formula is transformed to CNF
(i.e. a product of sums)
• The sums are called clauses
• If every clause has at most 2 literals,
the instance is 2-SAT
 2-SAT is solvable in polynomial time
 3-SAT is an NP-complete problem
• N-SAT can be reduced to 3-SAT
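Satisfiability of a small CNF can be checked by brute force (a sketch for illustration only; real SAT solvers use DPLL/CDCL search rather than enumeration):

```python
from itertools import product

def sat(cnf, n_vars):
    """cnf: list of clauses; a literal is +i (variable i true) or
    -i (variable i false).  Returns a satisfying assignment as a
    tuple of bools, or None if the formula is unsatisfiable."""
    for assign in product([False, True], repeat=n_vars):
        if all(any(assign[abs(l) - 1] == (l > 0) for l in clause)
               for clause in cnf):
            return assign
    return None

# (x1 + ¬x2)(x2 + x3)(¬x1 + ¬x3): a 2-SAT instance
print(sat([[1, -2], [2, 3], [-1, -3]], 3))  # → (False, False, True)
# (x1)(¬x1): unsatisfiable
print(sat([[1], [-1]], 1))                  # → None
```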
39
SAT for circuits
• Characteristic function
• Build the CNF for logic gates using
implication: a → b = ¬a + b

  a  b  a→b
  0  0   1
  0  1   1
  1  0   0
  1  1   1
40
SAT for circuits
AND gate: c = a & b
• Implications for the AND gate:
(¬a → ¬c) & (¬b → ¬c) & (¬c → ¬a + ¬b)
• Characteristic function for AND as a CNF:
(a + ¬c)(b + ¬c)(c + ¬a + ¬b)
41
SAT for circuits
OR gate: c = a + b
• Implications for the OR gate:
(a → c) & (b → c) & (c → a + b)
• Characteristic function for OR as a CNF:
(¬a + c)(¬b + c)(¬c + a + b)
42
SAT for circuits
Circuit: d = a & b (AND), e = ¬c (inverter), f = d + e (OR)
Characteristic function for the circuit:
(a+¬d)(b+¬d)(d+¬a+¬b)(¬c+¬e)(c+e)(¬d+f)(¬e+f)(¬f+d+e)
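The characteristic function above can be validated by brute force: it should be satisfied exactly by the valuations consistent with simulating the circuit (d = AND(a,b), e = NOT(c), f = OR(d,e)). A sketch:

```python
from itertools import product

# Clauses over the variables (a, b, c, d, e, f), indexed 1..6;
# +i means the variable is true, -i means it is false.
CNF = [[1, -4], [2, -4], [4, -1, -2],         # d = a AND b
       [-3, -5], [3, 5],                      # e = NOT c
       [-4, 6], [-5, 6], [-6, 4, 5]]          # f = d OR e

def satisfies(assign):
    return all(any(assign[abs(l) - 1] == (l > 0) for l in cl) for cl in CNF)

# Every consistent valuation satisfies the CNF; flipping the output
# of any consistent valuation must falsify it.
for a, b, c in product([False, True], repeat=3):
    d, e = a and b, not c
    f = d or e
    assert satisfies((a, b, c, d, e, f))          # consistent: SAT
    assert not satisfies((a, b, c, d, e, not f))  # flipped output: UNSAT
print("characteristic function matches the circuit")
```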
43
SAT-based RTL debug
• Mux-enrichment
– Muxes added to RTL code blocks
– Mux select values select free inputs for the
symptom blocks
– Synthesis is applied to find logic expressions
generating the signatures for these free
inputs
• Cardinality constraints
• Test vector constraints
Smith, Veneris, et al., TCAD, 2005
44
SAT-based RTL debug
a) Mux enrichment, b) cardinality constraints
45
SAT-based RTL debug
• SAT provides locations of signals where errors
can be corrected
• Multiple errors considered!
• They also provide the partial truth table of the fix
• Correction by resynthesis
• This is also a disadvantage:
– Why should we want to replace a bug with a more
difficult one?
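The mux-enrichment idea can be illustrated on an invented two-gate toy circuit, with exhaustive search standing in for the SAT solver and for the cardinality constraint (exactly one mux active). Everything here is a hypothetical sketch, not the cited tool:

```python
from itertools import product

# Spec: z = (a AND b) OR c.  Buggy implementation: gate g1 computes a OR b.
def buggy(a, b, c, w1=None, w2=None):
    """Mux-enriched circuit: if wN is not None, the mux replaces
    gate gN's output with the free value wN."""
    g1 = (a or b) if w1 is None else w1    # suspect gate 1 (the bug)
    g2 = (g1 or c) if w2 is None else w2   # suspect gate 2
    return g2

tests = list(product([False, True], repeat=3))
spec = {t: (t[0] and t[1]) or t[2] for t in tests}

# For each single activated mux, ask whether free values exist that
# reproduce the specified output on every test vector.
for gate in ("g1", "g2"):
    kw = "w1" if gate == "g1" else "w2"
    repairable = all(any(buggy(*t, **{kw: v}) == spec[t]
                         for v in (False, True))
                     for t in tests)
    print(gate, "is a repair location" if repairable else "cannot repair")
```

Note that both gates come out as valid repair locations, which illustrates the ambiguity of error location discussed earlier.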
46
Path tracing for localization
• One of the first debug methods
• Backtracing mismatched outputs
(sometimes also matched outputs)
• Dynamic slicing → critical path tracing
(RTL)
47
Mutation-based correction
• Locate error suspects by backtracing
• Correct by mutating the faulty block
(replace by a different function from a
preset library)
• An error-matching approach
48
Testbench-based approach
1. Identify the injection location in the
original system description
2. Apply mutation operators accordingly,
producing the injected system description,
with mutants selected by a variable:
if (fn==1) mutant_1; else if (fn==2) mutant_2; ...
if (fn==4) mutant_4; else if (fn==5) mutant_5; ...
49
Arithmetic Operator Replacement
(AOR)
• Set of arithmetic operators = {addition, subtraction,
multiplication, division, modulo}
• Replace each occurrence of an arithmetic operator with all
the other operators in the set
Original: a = b - c;
Mutants:  a = b + c;  a = b * c;  a = b / c;  a = b % c;
50
Logical Connector
Replacement (LCR)
• Set of logical connectors = {and, nand, nor, or, xor}
• Replace each occurrence of a logical connector with all the
other connectors in the set
Original: if (a & b) …
Mutants:  if !(a & b) …  if !(a | b) …
          if (a | b) …   if (a ^ b) …
51
Relational Operator
Replacement (ROR)
• Set of relational operators = {equal, not_equal,
greater_than, less_than, greater_than_or_equal,
less_than_or_equal}
• Replace each occurrence of a relational operator with all
the other operators in the set
Original: if (a != b) …
Mutants:  if (a == b) …  if (a > b) …   if (a < b) …
          if (a >= b) …  if (a <= b) …
52
Unary Operator Injection
(OUI)
• Set of unary operators = {negative,
inversion}
• Replace each occurrence of unary operator
with the other operator in the set
a = !b;
a = ~b;
53
More mutation examples
• Constant value mutation
• Replacing signals with other signals
• Mutating control constructs
• .....
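The operator-replacement mutations (AOR, ROR, ...) can be sketched as a text-level mutant generator. This is a toy: real tools mutate the parsed design description, not raw text:

```python
import re

MUTATION_OPS = {
    "AOR": ["+", "-", "*", "/", "%"],
    "ROR": ["==", "!=", "<", ">", "<=", ">="],
}

def mutants(line, op_class):
    """Yield every single-operator mutant of `line` for one class."""
    ops = MUTATION_OPS[op_class]
    # Longer operators first, so "<=" is not matched as "<".
    pattern = "|".join(re.escape(o) for o in sorted(ops, key=len, reverse=True))
    for m in re.finditer(pattern, line):
        for repl in ops:
            if repl != m.group():
                yield line[:m.start()] + repl + line[m.end():]

print(list(mutants("a = b - c;", "AOR")))
# → ['a = b + c;', 'a = b * c;', 'a = b / c;', 'a = b % c;']
```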
54
Approaches for SW & HW
• V. Debroy and W. E. Wong, "Using Mutation
to Automatically Suggest Fixes for Faulty
Programs," Int. Conf. on Software Testing,
Verification and Validation, June 2010.
• J. Raik, U. Repinski, et al., "High-level
design error diagnosis using backtrace on
decision diagrams," 28th NORCHIP Conference,
15-16 November 2010.
55
Motivational example
a) VHDL-like description of a GCD design; the injected error in
state s4 computes b := a - b instead of b := b - a:

IF res = 1 THEN state := s0;
ELSE
  CASE state IS
    WHEN s0 => a := in1; b := in2; ready := 0; state := s1;
    WHEN s1 => IF a /= b THEN state := s2;
               ELSE state := s5; ENDIF;
    WHEN s2 => IF a > b THEN state := s3;
               ELSE state := s4; ENDIF;
    WHEN s3 => a := a - b; state := s1;
    WHEN s4 => b := b - a; state := s1;   -- mutated to b := a - b
    WHEN s5 => ready := 1; state := s5;
  END CASE;
END IF;

b) [Figure: the corresponding high-level decision diagrams for the
variables state, ready, a and b]
56
Motivational example
Passed sequence (in1 = 4, in2 = 2):

res in1 in2 | state  a  b  ready
 1   4   2  |  s0    4  2    0
 0          |  s1    4  2    0
 0          |  s2    4  2    0
 0          |  s3    4  2    0
 0          |  s1    2  2    0
 0          |  s5    2  2    0
 0          |  s5    2  2    1

Failed sequence (in1 = 2, in2 = 4); mismatching values are shown
as correct/observed:

res in1 in2 | state   a  b     ready
 1   2   4  |  s0     2  4     0
 0          |  s1     2  4     0
 0          |  s2     2  4     0
 0          |  s4     2  4     0
 0          |  s1     2  2/-2  0
 0          |  s5/s2  2  2/-2  0
 0          |  s5/s3  2  2/-2  1/0
57
Motivational example
[Figure: backtrace cones from output ready]
• Backtrace cone, passed sequence: ready:=0, ready:=1,
b:=in2, a:=in1, a:=a-b, res=1, a=b, a≠b, a>b,
state:=s0, state:=s1, state:=s2, state:=s3, state:=s5
• Backtrace cone, failed sequence: ready:=0, b:=a-b,
b:=in2, a:=in1, res=1, a≠b, a>b,
state:=s0, state:=s1, state:=s2, state:=s4
58
Statistical analysis
• Blocks are ranked according to suspiciousness
[Figure: bar chart of suspiciousness score per circuit block]
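The score can be sketched as the failed/(passed+failed) execution ratio per block, in the spirit of Tarantula-style statistical fault localization (the hit counts below are hypothetical):

```python
def suspiciousness(hit_counts):
    """hit_counts: block -> (passed_hits, failed_hits).
    Score = failed / (passed + failed); blocks touched mostly by
    failing runs rank highest."""
    scores = {blk: f / (p + f) if p + f else 0.0
              for blk, (p, f) in hit_counts.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical hit counts for the GCD example's statement blocks.
ranking = suspiciousness({
    "s3: a := a - b": (5, 0),
    "s4: b := b - a": (0, 4),   # only failing runs execute the buggy block
    "s0: init":       (6, 6),
})
print(ranking[0])  # → ('s4: b := b - a', 1.0)
```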
59
Fault localization experiments

       | success rate,        | average resolution, | worst resolution,
Design | # detected functions | # suspects          | # suspects
       | step1    step2       | step1    step2      | step1    step2
gcd    |  2/2      2/2        |  3        1         |  3        1
diffeq |  8/8      8/8        |  3.3      1.9       |  5.6      2.8
risc   | 16/16    13/16       |  7.6      1.4       | 11.6      2.3
crc    | 25/25    20/25       | 17.3      2.4       | 21        7

Step 1: critical path tracing of mismatched outputs (max Failed)
Step 2: max ratio Failed/(Passed+Failed) of backtrace cones
60
Advantages & open questions
• Mutation-based repair is readable
• Helps keep the user in the loop
• Provides a "global" repair, valid for all stimuli
• How does this backtracing-based method
perform in the case of multiple errors?
• What would be a good fault model for
high-level design errors?
61
Future trends
• The quality of localization and correction
depends on the input stimuli
• Thus, diagnostic test generation is needed
• A readable, small correction is preferred:
– a correction often holds only w.r.t. the given
input vectors (e.g. with resynthesis)
– why should we replace an easily detectable
bug with a more difficult one?!
62
Idea: HLDD-based correction
• A canonical form of high-level decision
diagrams (HLDD) using characteristic
polynomials
• It allows fast probabilistic proof of
equivalence of two different designs.
• Idea: Extend it towards correction
63
Prototype tools, activities
64
FP7 Project DIAMOND
• Start January 2010, duration 3 years
• Total budget 3.8M €
– EU contribution 2.9M €
• Effort 462.5 PM
DIAMOND Kick-off, Tallinn, February 2-3, 2010
65
The DIAMOND concept
[Figure: the DIAMOND concept - a design flow from Specification
through Implementation to Post-Silicon; design errors, soft errors,
etc. are addressed by holistic fault models, fault diagnosis and
fault correction, towards reliable nanoelectronic systems]
66
FORENSIC
• FoREnSiC – Formal Repair Engine for
Simple C
• For debugging system-level HW
• Idea by TUG, UNIB and TUT at DATE’10
• Front-end converting simple C descriptions
to flowchart model completed
• 1st release expected by the end of 2011
67
Forensic Flow
68
APRICOT: Design Verification
• Extensions of BDD: HLDD → THLDD
• APriCoT Verification System
– Assertion/Property checking, Code coverage
& Test generation
– The tools run on a uniform design model
based on high-level decision diagrams.
– The functionality currently includes:
• test generation,
• code coverage analysis,
• assertion checking,
• mutation analysis and
• design error localization
70
ZamiaCAD: IDE for HW Design
• ZamiaCAD is an Eclipse-based development
environment for hardware designs
• Design entry
• Analysis
• Navigation
• Simulation
• Scalable!
• Co-operation with IBM Germany, R. Dorsch
71
To probe further...
Functional Design Errors in Digital Circuits:
Diagnosis, Correction and Repair
K. H. Chang, I. L. Markov, V. Bertacco
Publisher: Springer
Pub Date: 2009
72