CA-BIST for Asynchronous Circuits:
A Case Study on the RAPPID Pentium® Pro Instruction Length Decoder
Marly Roncken, Ken Stevens, Shai Rotem - Intel Corporation
Rajesh Pendurkar - Sun Microsystems
Parimal Pal Chaudhuri - Bengal Engineering College
ASYNC’00 / MR
RAPPID
Revolving Asynchronous Pentium® Processor Instruction-length Decoder
[Figure: RAPPID next to the Pentium II (Processor Core), labeled 3.1 x 3.5 mm]
Throughput [instr/nsec]: RAPPID 3.5, Deschutes 1.2
Latency [nsec]: RAPPID 2.1, Deschutes 5
Baseline for Case Study
RAPPID = High-Speed Proof-of-Concept
synchronous basis
0.25 micron CMOS static + domino library
self-timed addition: Relative Timing
from handshake causality
to relative assumptions
Testability study started afterwards
no DfT in core design - but some debug features
considered a major risk in performance + fault coverage
Wanted: non-invasive test approach
outside RAPPID core
low performance penalty
no re-design
Objectives
Achieve 95% fault coverage with non-scan BIST
look for fast ways to tune BIST to RAPPID
beyond pseudo-random testing
use HDM coverage metric: min-terms in Length Decode PLA
follow design architecture : replication
use Cellular Automata BIST
wider behavior than LFSR solutions
expert in-house
Analyze testability impact of Relative Timing
take BIST solution - analyze undetected stuck-at faults
manageable: high fault coverage leaves relatively few undetected faults
in-focus: HDM coverage metric uncovers implementation-specific faults
Outline
Part I
CA-BIST solution for RAPPID
RAPPID interface and design hooks
CA-BIST architecture + algorithm + costs
CA test generation engine
bootstrapped test expansion
Part II
Stuck-at fault analysis for Relative Timing
fault coverage distribution
benign and suspicious escapees
Conclusion
CA-BIST - starting point
[Figure: block diagram built up in two steps: RAPPID alone, then RAPPID with CATPG and CARE blocks]
RAPPID - core Architecture
[Figure: RAPPID core block diagram with Columns 0 to 15 and Rows 0 to 3]
16x replication, used in CATPG
Blocks: Byte Latch, Byte Unit, Length Decode, CTRL, Tag Unit, Decode and Steer Unit, Crossbar Switch, Output Buffer
Outputs shared in CARE
optimal balance (common instr)
Rate annotations: 720 MHz, 3.6 GHz, 900 MHz
RAPPID - input FIFO
RAPPID
interface to the tester
data bytes in circular scan
at-speed performance testing
branch status register / byte
from external Branch Target Buffer
16/32-bit instruction modes
global setting
instruction based local setting
CATPG
interface to RAPPID
share circular scan
to bootstrap test generation
share 3-bit status register
direct access via test circuit BRT
2-byte-instruction based setting
share 16/32 global setting
test patterns cover local setting
CA-BIST - interfacing RAPPID
[Figure: block diagram with CATPG, CA, CARE, and RAPPID blocks]
One for All: CA test engine
Generate initial fillings for 16 instruction bytes
11-bit D1*CA + dual version
11 state bits - take first 8 bits as test instruction byte
48 pairs of state components with cycle length 16
no LFSR implementation
D1*CA component Pair
Normal: 16x S0-S0 cycle = 16 FIFO fillings
Dual: 16x S1-S1 cycle = 16 FIFO fillings
[Figure: state diagram of the normal cycle through S0 and the dual cycle through S1]
Traversal algorithm
Step 1
S0 := normal cyclic state := S2
Step 2
S1 := S0 with inverted MSB
run S1-S1 dual cycle
use state outputs as test bytes
Step 3
S0 := next normal state after S1
run S0-S0 normal cycle
use state outputs as test bytes
Step 4
STOP if ( S0=S2 ) else Step 2
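Steps 1 to 4 can be sketched as a small generator. This is a minimal illustration, not the authors' implementation: `traverse` and `cycle` are hypothetical names, "next normal state after S1" is read as one normal step applied to S1, and the 4-bit functions `toy_normal` and `toy_dual` are invented stand-ins with cycle length 4 (the real CA pair has 11 bits and cycle length 16). Both step functions are assumed to be permutations, so every state lies on a cycle.

```python
def cycle(first, step):
    """Yield the states of one full cycle that starts and ends at `first`."""
    state = first
    while True:
        yield state
        state = step(state)
        if state == first:
            return


def traverse(start, normal_step, dual_step, msb_mask, max_rounds=1000):
    """Yield states in the order of Steps 1-4; each state supplies a test byte."""
    s2 = s0 = start                          # Step 1: remember the start as S2
    for _ in range(max_rounds):
        s1 = s0 ^ msb_mask                   # Step 2: S1 := S0 with inverted MSB
        yield from cycle(s1, dual_step)      #         run S1-S1 dual cycle
        s0 = normal_step(s1)                 # Step 3: next normal state after S1
        yield from cycle(s0, normal_step)    #         run S0-S0 normal cycle
        if s0 == s2:                         # Step 4: stop once back at S2
            return
    raise RuntimeError("traversal did not return to its start state")


# Toy 4-bit stand-ins (assumptions, not the D1*CA pair).
def toy_normal(s):
    return (s & 0b1100) | ((s + 1) & 0b0011)


def toy_dual(s):
    return toy_normal(s) ^ 0b1000


states = list(traverse(0b0000, toy_normal, toy_dual, msb_mask=0b1000))
```

With the toy functions the traversal closes after four rounds and visits 32 states. The `max_rounds` guard is only there so that a badly chosen pair of step functions fails loudly instead of looping forever.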
All for One: test expansion
design replication
used in CATPG
Circular scan + Bootstrap Algorithm (256x)
Step 1: run Traversal Algorithm for next FIFO filling
Step 2: 128x ( left-rotate FIFO by 1 bit + test RAPPID )
Step 3: repeat Step 2 with right-rotate
Step 4: STOP if end-of-Traversal-Algorithm else Step 1
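Steps 2 and 3 of the bootstrap algorithm can be sketched as below. The sketch assumes the circular scan holds exactly the 16 instruction bytes (128 bits); `FIFO_BITS`, `rotl`, `rotr`, and `expand` are hypothetical names, and the point where the BIST would actually exercise RAPPID is only marked by a comment.

```python
FIFO_BITS = 128  # assumption: 16 instruction bytes x 8 bits in the circular scan
MASK = (1 << FIFO_BITS) - 1


def rotl(x):
    """Rotate the FIFO contents left by one bit."""
    return ((x << 1) | (x >> (FIFO_BITS - 1))) & MASK


def rotr(x):
    """Rotate the FIFO contents right by one bit."""
    return ((x >> 1) | ((x & 1) << (FIFO_BITS - 1))) & MASK


def expand(filling):
    """Yield the 256 FIFO contents derived from one initial 16-byte filling."""
    x = int.from_bytes(bytes(filling), "big")
    for rotate in (rotl, rotr):            # Step 2 (left), then Step 3 (right)
        for _ in range(128):
            x = rotate(x)
            yield x.to_bytes(16, "big")    # the BIST would test RAPPID here


patterns = list(expand(range(16)))
```

Each initial filling expands into 128 left-rotated plus 128 right-rotated FIFO contents, matching the 256x on the slide. After each group of 128 rotations the FIFO is back at its initial filling.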
Add Test circuitry to close test gaps
HDM coverage revealed 66-0F gap for long instruction
extra Test Circuitry to circulate 66-0F in test set
Test operation modes (4x)
16/32 bit instruction modes
BRT to test branch instructions
660F to cover remaining test gap
1024 x 32 tests
100% coverage
CA-BIST solution
[Figure: block diagram with CATPG, CA, CARE, and RAPPID blocks]
Costs
performance
latency: CATPG 1 gate delay (shared scan); CARE negligible output load + delay
throughput: CATPG + CARE 0% (off critical path)
area
5% (including circular scan), from schematics
fault coverage
for HDM: 100% PLA min-terms
at switch level: 94% testable stuck-at faults
Stuck-at fault Analysis
120,000 transistors
static + domino gates
pass + reset transistors
switch-level fault analysis
COSMOS
stuck-at input + output
simulated for full RAPPID
only injected in 1 column-row
[Figure: RAPPID columns 0 to 15 with Decode and Steer Unit, Tag Unit, and Crossbar Switch]
[Figure: chart of fault classes (Detected, Unexercised, Benign, Suspicious) with shares 81%, 9%, 5%, 5%; callout: excellent ATPG candidates]
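The idea of injecting stuck-at faults on gate inputs and outputs and counting the detected ones can be illustrated on a toy circuit. This is a gate-level sketch only: it is not COSMOS and not a switch-level model, and the circuit z = (a AND b) OR c, the net names, and the functions `simulate` and `coverage` are all invented for illustration.

```python
from itertools import product

NETS = ["a", "b", "c", "n1", "z"]  # inputs, one internal net, one output


def simulate(a, b, c, fault=None):
    """Evaluate z = (a AND b) OR c with an optional (net, value) stuck-at fault."""
    def force(net, value):
        # A faulty net ignores its driven value and holds the stuck value.
        return fault[1] if fault and fault[0] == net else value

    a, b, c = force("a", a), force("b", b), force("c", c)
    n1 = force("n1", a & b)
    return force("z", n1 | c)


def coverage(tests):
    """Return (detected, total) over all single stuck-at-0/1 faults."""
    faults = list(product(NETS, (0, 1)))
    detected = [
        f for f in faults
        if any(simulate(*t) != simulate(*t, fault=f) for t in tests)
    ]
    return len(detected), len(faults)


full = coverage(list(product((0, 1), repeat=3)))
partial = coverage([(1, 1, 0), (0, 0, 1)])
```

A fault counts as detected when at least one test produces an output that differs from the fault-free circuit. Faults that no test exposes are the escapees that then need closer analysis.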
Benign & Suspicious Escapees
full keeper for 0 & 1
half keeper for 1 only
footed protection
no foot
set-reset overlap ( c d )
for pulse-d evaluations
Domino Escapees
[Figure labels: fast, slow]
@0
pulse narrowing
d fights keeper (weak)
push-out of z:=0
slightly deteriorated 0
keeper helps reset c
push-forward of z:=1
benign
wide evaluation pulse
d stays valid during evaluation
redundant n-transistor in keeper
@1
floating gate output
z floats when neither set-reset
relative timing matters
fast reset OK, slow NOT OK
suspicious
test at low speed or frequency
within specification range
increases test application costs
noise sensitive
test for realistic noise conditions
similar for clocked design
Pulse domino Escapees
@0
pulse narrowing
d fights keeper (weak)
push-out of z:=0
slightly deteriorated 0
keeper helps reset c
push-forward of z:=1
benign
wide evaluation pulse
d stays valid during evaluation
redundant n-transistor in keeper
suspicious
d0d1d2 reset during evaluation
required for un-footed gate
crucial n-transistor in keeper
output pulse shrinks >50%
from 7 gate delays ( c loop )
to 3 gate delays ( d2 loop )
noise sensitive
small input pulse
noise can make pulse narrower
similar for self-reset clocks
in `synchronous’ pulse logic
Conclusion
Testability is no excuse
to avoid asynchronous + clocked high-performance
similar fault effects for RAPPID and clocked domino
CA-BIST without scan works for RAPPID
non-invasive with low performance + area penalty
covers 94% of testable faults
remaining 6% relate to missing data operands
suitable minority for tailored ATPG + off-chip testing
… and BIST tuning
RAPPID column replication tuned test expansion
Current focus
not always so obvious
HDM coverage metric tuned CATPG solution
… but missed 5% potentially catastrophic timing related faults