CA-BIST for Asynchronous Circuits: A Case Study on the RAPPID


CA-BIST for Asynchronous Circuits:
A Case Study on the RAPPID Pentium® Pro Instruction Length Decoder
Marly Roncken, Ken Stevens, Shai Rotem - Intel Corporation
Rajesh Pendurkar - Sun Microsystems
Parimal Pal Chaudhuri - Bengal Engineering College
ASYNC’00 / MR

RAPPID
Revolving Asynchronous Pentium® Processor Instruction-length Decoder
[Figure: bar charts comparing RAPPID with the Pentium II (Deschutes) processor core. Throughput [instr/nsec]: RAPPID 3.5, Deschutes 1.2. Latency [nsec]: RAPPID 2.1, Deschutes 5. Layout label: 3.1 x 3.5 mm.]

Baseline for Case Study
- RAPPID = High-Speed Proof-of-Concept
  - synchronous basis
    - 0.25 micron CMOS static + domino library
  - self-timed addition: Relative Timing
    - from handshake causality
    - to relative assumptions
- Testability study started afterwards
  - no DfT in core design - but some debug features
  - considered a major risk in performance + fault coverage
- Wanted: non-invasive test approach
  - outside RAPPID core
  - low performance penalty
  - no re-design

Objectives
- Achieve 95% fault coverage with non-scan BIST
  - look for fast ways to tune BIST to RAPPID
    - beyond pseudo-random testing
    - use HDM coverage metric: min-terms in Length Decode PLA
    - follow design architecture: replication
  - use Cellular Automata BIST
    - wider behavior than LFSR solutions
    - expert in-house
- Analyze testability impact of Relative Timing
  - take BIST solution - analyze undetected stuck-at faults
    - manageable: high fault coverage leaves relatively few undetected faults
    - in-focus: HDM coverage metric uncovers implementation-specific faults

Outline
- Part I: CA-BIST solution for RAPPID
  - RAPPID interface and design hooks
  - CA-BIST architecture + algorithm + costs
  - CA test generation engine
  - bootstrapped test expansion
- Part II: Stuck-at fault analysis for Relative Timing
  - fault coverage distribution
  - benign and suspicious escapees
- Conclusion

CA-BIST - starting point
[Diagram, two builds: RAPPID on its own, then RAPPID together with the CATPG and CARE blocks.]

RAPPID - core Architecture
[Figure: block diagram of the RAPPID core. Rows 0 to 3 and Columns 0 to 15 of the Decode and Steer Unit, each column showing Byte Latch, Byte Unit, Length Decode, Tag Units and CTRL, feeding Crossbar Switches and Output Buffers. Rate annotations: 720 MHz, 3.6 GHz, 900 MHz. Callouts: "16x replication, used in CATPG"; "Outputs shared in CARE"; "optimal balance (common instr)".]

RAPPID - input FIFO
RAPPID
- interface to the tester
- data bytes in circular scan
- branch status register / byte
- at-speed performance testing
- from external Branch Target Buffer
- 16/32-bit instruction modes
  - global setting
  - instruction based local setting
CATPG
- interface to RAPPID
- share circular scan
  - to bootstrap test generation
- share 3-bit status register
  - direct access via test circuit BRT
  - 2-byte-instruction based setting
- share 16/32 global setting
  - test patterns cover local setting

CA-BIST - interfacing RAPPID
[Diagram: CATPG (containing the CA), CARE and RAPPID.]

One for All: CA test engine
- Generate initial fillings for 16 instruction bytes
- 11-bit D1*CA + dual version
  - 11 state bits - take first 8 bits as test instruction byte
  - 48 pairs of state components with cycle length 16
  - no LFSR implementation

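For readers unfamiliar with cellular-automata pattern generators, the sketch below shows one synchronous update of a generic one-dimensional hybrid rule 90/150 CA with null boundaries. This is a textbook construction chosen only for illustration; it is not the D1*CA of the slides, whose rule vector is not given in this transcript. The function name `ca_step` and the example rule vector are assumptions.

```python
from typing import List

def ca_step(state: List[int], rules: List[int]) -> List[int]:
    """One synchronous update of a 1-D hybrid rule-90/150 cellular automaton.

    Rule 90:  next = left XOR right
    Rule 150: next = left XOR self XOR right
    Cells beyond either end are treated as constant 0 (null boundary).
    The rule vector is illustrative, NOT the D1*CA used for RAPPID.
    """
    n = len(state)
    nxt = []
    for i in range(n):
        left = state[i - 1] if i > 0 else 0
        right = state[i + 1] if i < n - 1 else 0
        centre = state[i] if rules[i] == 150 else 0
        nxt.append(left ^ centre ^ right)
    return nxt

# Example: a 4-cell automaton with alternating rules.
rules = [90, 150, 90, 150]
s = [1, 0, 0, 0]
s = ca_step(s, rules)   # [0, 1, 0, 0]
s = ca_step(s, rules)   # [1, 1, 1, 0]
```

Unlike an LFSR, where every stage simply shifts its neighbour's value, each cell here combines both neighbours, which is one reason CA generators are credited with "wider behavior".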
D1*CA component Pair
[Diagram: a Normal component and its Dual. Normal: 16x S0-S0 cycle = 16 FIFO fillings. Dual: 16x S1-S1 cycle = 16 FIFO fillings.]

Traversal algorithm
- Step 1
  - S0 := normal cyclic state := S2
- Step 2
  - S1 := S0 with inverted MSB
  - run S1-S1 dual cycle
  - use state outputs as test bytes
- Step 3
  - S0 := next normal state after S1
  - run S0-S0 normal cycle
  - use state outputs as test bytes
- Step 4
  - STOP if ( S0=S2 ) else Step 2

All for One: test expansion
(design replication used in CATPG)
- Circular scan + Bootstrap Algorithm (256x)
  - Step 1: run Traversal Algorithm for next FIFO filling
  - Step 2: 128x ( left-rotate FIFO by 1 bit + test RAPPID )
  - Step 3: repeat Step 2 with right-rotate
  - Step 4: STOP if end-of-Traversal-Algorithm else Step 1
- Add Test circuitry to close test gaps
  - HDM coverage revealed 66-0F gap for long instruction
  - extra Test Circuitry to circulate 66-0F in test set
- Test operation modes (4x)
  - 16/32 bit instruction modes
  - BRT to test branch instructions
  - 660F to cover remaining test gap
- 1024 x 32 tests: 100% coverage

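The bootstrap steps above can be sketched as follows. This is a software model under stated assumptions, not the on-chip circuit: the FIFO is modeled as a 128-bit integer (16 bytes of 8 bits), `fillings` stands for whatever the Traversal Algorithm produces, and `apply_test` is a hypothetical callback that stands in for "test RAPPID".

```python
from typing import Callable, Iterable

FIFO_BYTES = 16                  # 16 instruction bytes in the circular scan
FIFO_BITS = FIFO_BYTES * 8       # 128 bits in total
MASK = (1 << FIFO_BITS) - 1

def rotate_left(fifo: int) -> int:
    """Rotate the 128-bit FIFO contents left by one bit."""
    return ((fifo << 1) | (fifo >> (FIFO_BITS - 1))) & MASK

def rotate_right(fifo: int) -> int:
    """Rotate the 128-bit FIFO contents right by one bit."""
    return ((fifo >> 1) | (fifo << (FIFO_BITS - 1))) & MASK

def bootstrap(fillings: Iterable[bytes],
              apply_test: Callable[[bytes], None]) -> int:
    """Expand each FIFO filling into 256 tests; return the number applied."""
    count = 0
    for filling in fillings:                        # Step 1: next filling
        fifo = int.from_bytes(filling, "big")
        for rotate in (rotate_left, rotate_right):  # Steps 2 and 3
            for _ in range(FIFO_BITS):              # 128 one-bit rotations
                fifo = rotate(fifo)
                apply_test(fifo.to_bytes(FIFO_BYTES, "big"))
                count += 1
    return count                                    # Step 4: fillings exhausted

applied = []
n_tests = bootstrap([bytes(range(16))], applied.append)
```

Each filling yields 128 left-rotated plus 128 right-rotated patterns, which matches the "256x" in the slide; after 128 one-bit rotations the FIFO holds the original filling again, so the right-rotate pass starts from the same contents.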
CA-BIST solution
[Diagram: CATPG (containing the CA), CARE and RAPPID.]

Costs
- performance
  - latency
    - CATPG: 1 gate delay (shared scan)
    - CARE: negligible output load+delay
  - throughput
    - CATPG + CARE: 0% (off critical path)
- area (from schematics)
  - 5%
  - 5% (including circular scan)
- fault coverage
  - for HDM: 100% PLA min-terms
  - at switch level: 94% testable stuck-at faults

Stuck-at fault Analysis
- 120,000 transistors
  - static + domino gates
  - pass + reset transistors
- switch-level fault analysis
  - COSMOS
  - stuck-at input + output
  - simulated for full RAPPID
  - only injected in 1 column-row
[Figure: Decode and Steer Unit, Columns 0 to 15, with Tag Unit and Crossbar Switch. Pie chart: Detected 81%; the remaining slices of 9%, 5% and 5% are Unexercised, Benign and Suspicious. Callout: "excellent ATPG candidates".]

Benign & Suspicious Escapees
[Circuit figure labels: full keeper (for 0 & 1); half keeper (for 1 only); footed protection; no foot; set-reset overlap ( c d ) for pulse-d evaluations.]

Domino Escapees
@0
- pulse narrowing
  - d fights keeper (weak)
    - push-out of z:=0
    - slightly deteriorated 0
  - keeper helps reset c
    - push-forward of z:=1
- benign
  - wide evaluation pulse
  - d stays valid during evaluation
  - redundant n-transistor in keeper
@1
- floating gate output
  - z floats when neither set-reset
  - relative timing matters
  - fast reset OK, slow NOT OK
- suspicious
  - test at low speed or frequency
  - within specification range
  - increases test application costs
- noise sensitive
  - test for realistic noise conditions
  - similar for clocked design
[Figure labels: fast, slow.]

Pulse domino Escapees
@0
- pulse narrowing
  - d fights keeper (weak)
    - push-out of z:=0
    - slightly deteriorated 0
  - keeper helps reset c
    - push-forward of z:=1
- suspicious
  - d0d1d2 reset during evaluation
  - required for un-footed gate
  - crucial n-transistor in keeper
- benign
  - wide evaluation pulse
  - d stays valid during evaluation
  - redundant n-transistor in keeper
- output pulse shrinks >50%
  - from 7 gate delays ( c loop )
  - to 3 gate delays ( d2 loop )
- noise sensitive
  - small input pulse
  - noise can make pulse narrower
  - similar for self-reset clocks in 'synchronous' pulse logic

Conclusion
- Testability is no excuse
  - to avoid asynchronous + clocked high-performance
  - similar fault effects for RAPPID and clocked domino
- CA-BIST without scan works for RAPPID
  - non-invasive with low performance + area penalty
  - covers 94% of testable faults
  - remaining 6% relate to missing data operands
  - suitable minority for tailored ATPG + off-chip testing
- … and BIST tuning (current focus)
  - RAPPID column replication tuned test expansion
    - not always so obvious
  - HDM coverage metric tuned CATPG solution
    - … but missed 5% potentially catastrophic timing related faults