Sunil’s presentation - Texas A&M University

Design and Implementation of a Subthreshold BFSK Transmitter
By:
Suganth Paul#
Rajesh Garg$
Sunil P. Khatri$
Sheila Vaidya%
#Intel Corporation, Austin, TX
$Department of ECE, Texas A&M University, College Station, TX
%Lawrence Livermore National Lab., Livermore, CA
1
Outline
 Sub-threshold circuits – the opportunity
 Challenges
 Process/temperature/voltage variations
 Solution – dynamic body bias
 Validation via test chip
 Design methodology
 Silicon results
 Conclusions
2
The Opportunity
 Power consumption has become a major issue for recent ICs
 There is a large and growing class of applications where
power reduction is paramount – not speed.
 Such applications are ideal candidates for sub-threshold circuit design
Process   Traditional Ckt                    Sub-threshold Ckt (Vb = 0V)     Sub-threshold Ckt (Vb = VDD)
          Delay(ps)  Power(W)   P-D-P(J)     Delay ↑   Power ↓   P-D-P ↓     Delay ↑   Power ↓   P-D-P ↓
bsim70    14.157     4.08E-05   5.82E-07     17.01X    308.82X   18.50X      9.93X     141.10X   14.43X
bsim100   17.118     6.39E-05   1.08E-06     24.60X    497.54X   20.08X      12.00X    100.96X   8.20X
 Compared traditional circuit with sub-threshold (obtained by
simply setting VDD < VT)
 Performed simulations for 2 different processes on a 21 stage
ring oscillator.
 Impressive power reduction (100X – 500X)
 Power-Delay-Product (P-D-P) improves by as much as 20X
 P-D-P is an important metric to compare circuit design styles
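The ratios in the table can be cross-checked with a short sketch. It assumes that each sub-threshold entry is a ratio relative to the traditional circuit, so the P-D-P improvement should be roughly the power reduction divided by the delay penalty. The grouping of the values by process and body bias is read from the table and is itself an assumption.

```python
# Ratios read from the table: (delay increase, power reduction, reported P-D-P reduction).
# The assignment of values to columns is an assumption about the flattened table.
TABLE = {
    ("bsim70", "Vb = 0V"): (17.01, 308.82, 18.50),
    ("bsim70", "Vb = VDD"): (9.93, 141.10, 14.43),
    ("bsim100", "Vb = 0V"): (24.60, 497.54, 20.08),
    ("bsim100", "Vb = VDD"): (12.00, 100.96, 8.20),
}


def implied_pdp_gain(delay_increase, power_reduction):
    """P-D-P improvement implied by a power reduction and a delay penalty."""
    return power_reduction / delay_increase


def relative_error(implied, reported):
    """Fractional difference between the implied and the reported value."""
    return abs(implied - reported) / reported


for key, (delay_x, power_x, reported) in TABLE.items():
    implied = implied_pdp_gain(delay_x, power_x)
    print(key, f"implied {implied:.2f}X", f"reported {reported:.2f}X")
```

The implied and reported values agree to within a few percent, which is consistent with the reported figures having been derived from unrounded simulation data.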
3
Sub-threshold Logic
 Ids has an exponential dependence on process,
voltage and temperature (PVT)
$$I_{ds_{sub}} = I_0 \, \frac{W}{L} \, e^{\frac{V_{gs} - V_T - V_{off}}{n v_t}} \left( 1 - e^{-\frac{V_{ds}}{v_t}} \right)$$
 Need to stabilize the circuit performance by
compensating for PVT variations
 No approach to compensate sub-threshold delay
 Existing approaches compensate sub-threshold currents
 To compensate delay, need a representative
circuit
 Not easy to come up with representative circuit for
standard cells
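A minimal sketch of the exponential PVT sensitivity, assuming the usual sub-threshold current expression with thermal voltage vt = kT/q. The values chosen for I0, W/L, n, Voff and the bias points are hypothetical and only illustrate the trend; the temperature dependence of VT and I0 is not modeled.

```python
import math

K_OVER_Q = 8.617333e-5  # Boltzmann constant over electron charge, in V/K


def i_ds_sub(vgs, vds, vt, temp_k=300.0, i0=1e-7, w_over_l=1.0, n=1.5, voff=0.0):
    """Sub-threshold drain current; i0, w_over_l, n and voff are illustrative values."""
    v_therm = K_OVER_Q * temp_k
    gate_term = math.exp((vgs - vt - voff) / (n * v_therm))
    drain_term = 1.0 - math.exp(-vds / v_therm)
    return i0 * w_over_l * gate_term * drain_term


nominal = i_ds_sub(vgs=0.2, vds=0.2, vt=0.50)
low_vt = i_ds_sub(vgs=0.2, vds=0.2, vt=0.45)   # VT lowered by 10%
print(f"current ratio for a 10% lower VT: {low_vt / nominal:.2f}X")
```

With these assumed parameters, a 10% shift in VT changes the current by more than 3X, which is why delay needs active compensation.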
4
Our Solution
 We propose a technique that uses self-adjusting body-bias
to phase-lock the circuit delay to a beat clock.
 Use a network of PLAs to implement circuits.
 Several PLAs in a cluster share a common nbulk node.
 A representative PLA in each cluster is chosen to phase
lock the delay of the PLAs to the beat clock
 If the delay is too high, a forward body bias is applied to
speed up the representative PLA.
 If the delay is low, body bias is brought back down to zero
to slow down the representative PLA.
 All other PLAs exhibit the same delay as the representative
PLA, since they all share a common nbulk terminal
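The control idea can be sketched as a simple bang-bang loop. This is a behavioral toy model, not the circuit from the talk: the exponential delay model, the step size, the bias ceiling and the `slow_factor` parameter (standing in for a PVT shift) are all hypothetical.

```python
import math


def pla_delay(v_bulk, slow_factor=1.0, d0=1.0, k=6.0):
    """Toy delay model: forward body bias lowers VT, so delay falls exponentially."""
    return slow_factor * d0 * math.exp(-k * v_bulk)


def lock_to_beat_clock(target, slow_factor, cycles=200, step=0.005, v_max=0.45):
    """Adjust the shared nbulk voltage once per beat-clock cycle."""
    v_bulk = 0.0
    history = []
    for _ in range(cycles):
        delay = pla_delay(v_bulk, slow_factor)
        if delay > target:
            # Completion lags the beat clock: pump nbulk up (forward bias).
            v_bulk = min(v_max, v_bulk + step)
        else:
            # Completion leads the beat clock: bring nbulk back toward zero.
            v_bulk = max(0.0, v_bulk - step)
        history.append((v_bulk, delay))
    return history


for slow in (1.0, 1.3):
    v_final, d_final = lock_to_beat_clock(target=0.5, slow_factor=slow)[-1]
    print(f"slow_factor={slow}: nbulk={v_final:.3f} V, delay={d_final:.3f}")
```

In this model the slower corner settles at a higher body bias, and in both cases the delay ends up dithering around the beat-clock target.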
5
Objective
 Validate and verify flow by designing a sub-threshold
circuit for the application
 Choose a test application
 Low power, low speed
 Develop a sub-threshold circuit design flow
 Implement our delay compensation scheme to negate
PVT variations
 Implement the same application using a standard cell based
flow on the same die
 Fabricate and test the chip (TSMC 0.25 µm process)
 Compare the sub-threshold circuit with the standard
cell circuit in terms of power consumption
6
Test Application - Binary Frequency
Shift Keying (BFSK) Transmitter
[Block diagram: Binary Input Data → Digital BFSK Modulator (produces two tones: f1 if input is LOW, f2 if input is HIGH; digital block implemented using sub-threshold circuits) → DAC → Amplifier → Antenna]
 Specifications
 Input bit Rate: RB = 32kbps, Broadcast distance: D = 1000m
 FSK tones: f1=150kHz, f2=450kHz, Channel bandwidth: B = 300kHz
7
Sub-threshold Design
Approach
 Digital part of the circuit implemented as NPLA (Network of
Programmable Logic Arrays)
 NPLAs have low delay
 Critical path delay easy to find
 PLAs have common nbulk node
 Circuit level PVT compensation
 An external Beat Clock (BCLK) signal is phase locked
with the critical path delay
 Delay controlled by a charge pump that modulates the
bulk voltage of transistors in the circuit
 Compensates for both inter- and intra-die variations
8
Dynamic NOR-NOR PLA
[Figure: dynamic NOR-NOR PLA schematic with inputs, outputs, clk and completion signals; timing shows precharge and evaluate phases]
 We use precharged NOR-NOR PLAs as the structure of choice
 Wordlines run horizontally
 Inputs / their complements and outputs run vertically
 Each PLA has a “completion” signal that switches low after all the outputs switch
 Several PLAs in a cluster share a common nbulk node.
9
Network of PLAs (NPLA)
[Figure: combinational logic implemented as an NPLA; inputs feed PLAs at levels L1 to L4, which drive the outputs]
[Timing diagram: clk with the L1, L2, L3 and L4 PLAs evaluating in sequence]
Throughput = Tpchg + n·Teval
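A small sketch of how the throughput expression bounds the clock rate, assuming n is the number of PLA levels on the longest path and that the expression gives the time per clock cycle. The precharge and evaluate times below are hypothetical placeholders, not measured values.

```python
def npla_cycle_time(t_pchg, t_eval, levels):
    """Time for one precharge plus evaluation rippling through all PLA levels."""
    return t_pchg + levels * t_eval


def max_clock_hz(t_pchg, t_eval, levels):
    """Highest clock frequency that still fits one full NPLA cycle."""
    return 1.0 / npla_cycle_time(t_pchg, t_eval, levels)


# Hypothetical timings: 200 ns precharge, 150 ns per level, 4 levels (L1 to L4).
print(f"{max_clock_hz(200e-9, 150e-9, 4) / 1e6:.2f} MHz")
```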
10
The Charge Pump
- PLA “completion” signal lags beat clock
- nbulk node gets forward biased
- PLA “completion” signal leads beat clock
- nbulk goes back to zero bias
[Figure: charge pump schematic with pullup and pulldown devices]
11
Effectiveness of the
Approach
 We simulated a single PLA from 0°C to 100°C, and also applied VT variations (10%) and VDD variations (10%).
 The light region shows the variation in delay over all the corners without delay compensation.
 The red region shows the delays with the self-adjusting body-bias circuit.
12
Design Flow
[Flow chart stages: BFSK design, HDL, synthesis, logic verification, map to NPLA, Spice verification (functional, timing, charge pump), design of analog components, integrated Spice netlist, layout, LVS, RC extraction, full-chip Spice verification]
13
BFSK Design
[Block diagram: the binary input selects one of two phase increments Δ through a mux; a 9-bit phase accumulator (adder and DFF, clocked by Clk) addresses the sine lookup table, whose 8-bit output is registered by a second DFF]
Sine lookup table depth: 2^9 = 512
fout = fclk × Δ / 512
 fout < fclk/2 (Nyquist criterion) implies Δ < 256.
 Phase increments can be chosen based on fclk, or left programmable in real time to get Software Defined Radio (SDR) operation.
 We fix the phase increments to avoid the extra input pins required for SDR.
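A behavioral sketch of the phase accumulator and sine lookup table, assuming a 9-bit accumulator, a 512-entry table and an unsigned 8-bit output. The increments 59 and 177 are illustrative values chosen here, not taken from the slides, and the table contents are an assumption about how the lookup would be filled.

```python
import math

ACC_BITS = 9
TABLE_DEPTH = 1 << ACC_BITS  # 512 entries

# Unsigned 8-bit sine table (assumed encoding: mid-scale offset sine).
SINE_LUT = [
    round(127.5 + 127.5 * math.sin(2 * math.pi * i / TABLE_DEPTH))
    for i in range(TABLE_DEPTH)
]


def nco_samples(bits, increments, cycles_per_bit):
    """Emit one table sample per clock; the input bit selects the phase increment."""
    phase = 0
    samples = []
    for bit in bits:
        delta = increments[bit]
        for _ in range(cycles_per_bit):
            samples.append(SINE_LUT[phase])
            phase = (phase + delta) % TABLE_DEPTH  # 9-bit wrap-around
    return samples


def wraps_per_table_sweep(delta):
    """Count accumulator overflows in 512 clocks; each overflow is one output period."""
    phase, wraps = 0, 0
    for _ in range(TABLE_DEPTH):
        nxt = phase + delta
        if nxt >= TABLE_DEPTH:
            wraps += 1
        phase = nxt % TABLE_DEPTH
    return wraps


INCREMENTS = (59, 177)  # illustrative increments for input LOW and HIGH
print(wraps_per_table_sweep(INCREMENTS[0]), wraps_per_table_sweep(INCREMENTS[1]))
```

Because the accumulator overflows Δ times in every 512 clocks, the output completes Δ periods in that window, which is the relation fout = fclk × Δ / 512.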
14
Basic BFSK transmitter
Block Diagram
[Block diagram: Binary Input Data → Digital BFSK Modulator (produces two tones: f1 if input is LOW, f2 if input is HIGH; digital block implemented using NPLA based sub-threshold circuits) → DAC → Amplifier → Antenna]
16
System Architecture
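[Block diagram: the Input drives the digital BFSK modulator using NPLA (phase accumulator with 9-bit output, NCO with 8-bit output, DFFs clocked by CLK), followed by the binary-to-thermometer encoder (19 output lines), DAC, amplifier and antenna. A phase detector compares the reference PLA completion signal with BEAT CLK and drives the charge pump, which sets the common Bulkn node.]
4 LSBs - Binary
15 MSBs - Thermometer
Avoids glitches in DAC output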
17
Delay Compensated Subthreshold Design block diagram
[Block diagram: an NPLA (PLAs at levels L1 to L4) sits between input and output DFFs clocked by Clk. The completion signal of the reference PLA and the Beat Clk feed the phase detector, which drives the charge pump. The charge pump modulates the common nbulk node of a cluster of PLAs.]
18
HDL to Schematic of Digital BFSK
 Digital BFSK transmitter described using VHDL
 VHDL synthesized using FPGA synthesis tool, to get a
gate level netlist
 This is imported into SIS in “blif” format
 The “blif” file is logically optimized and mapped into
NPLA
 Technology Independent Optimization done on
circuit
 Circuit converted to a multi-level network of nodes with 5 or fewer inputs per node
 Circuit traversed from inputs to outputs, and nodes are implemented using PLAs of size (8/6/12)
 Using the NPLA throughput equation, fclk is estimated as 1.2MHz
 We choose f1 ≈ 0.115*fclk and f2 = 0.345*fclk
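The chosen ratios can be turned into integer phase increments with a short calculation. This sketch assumes the 512-entry lookup table described for the phase accumulator and simple rounding to the nearest integer; the resulting increments are derived here and are not stated in the slides.

```python
TABLE_DEPTH = 512  # 2**9 entries, per the phase accumulator design


def phase_increment(ratio):
    """Nearest integer increment for a desired fout/fclk ratio (rounding is an assumption)."""
    return round(ratio * TABLE_DEPTH)


def tone_hz(delta, f_clk_hz):
    """Output tone produced by a given increment and clock."""
    return f_clk_hz * delta / TABLE_DEPTH


F_CLK_HZ = 1.2e6  # estimated clock from the throughput equation
for name, ratio in (("f1", 0.115), ("f2", 0.345)):
    delta = phase_increment(ratio)
    print(name, delta, f"{tone_hz(delta, F_CLK_HZ) / 1e3:.1f} kHz")
```

Both increments stay below 256, so the Nyquist condition holds, and the second tone is three times the first.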
19
System Architecture
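[Block diagram: same system architecture as shown earlier, from the Input through the phase accumulator, NCO, DFFs, binary-to-thermometer encoder (19 lines), DAC and amplifier to the antenna, with the phase detector comparing the reference PLA completion signal against BEAT CLK and the charge pump driving the common Bulkn node]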
21
Thermometer Coded 8-BIT
DAC
[Block diagram: of the 8-bit digital BFSK output, 4 bits go through binary-to-thermometer code conversion to give 15 lines into the DAC; the 4 LSBs go to the DAC directly]

Binary   Therm
00       000
01       001
10       011
11       111

Adjacent values differ by 1 bit
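The binary-to-thermometer mapping in the table generalizes directly to the 4-bit, 15-line case. A minimal sketch, with function names chosen here for illustration:

```python
def to_thermometer(value, bits):
    """Thermometer code with 2**bits - 1 lines; the low `value` lines are set."""
    lines = (1 << bits) - 1
    if not 0 <= value <= lines:
        raise ValueError("value out of range for the given width")
    return "0" * (lines - value) + "1" * value


def hamming_distance(a, b):
    """Number of line positions in which two codes differ."""
    return sum(x != y for x, y in zip(a, b))


# Reproduces the 2-bit table: 00 -> 000, 01 -> 001, 10 -> 011, 11 -> 111.
for v in range(4):
    print(format(v, "02b"), to_thermometer(v, 2))

# In the 4-bit case, adjacent values still differ in exactly one of the 15 lines.
codes = [to_thermometer(v, 4) for v in range(16)]
print(all(hamming_distance(a, b) == 1 for a, b in zip(codes, codes[1:])))
```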
22
8-BIT DAC Schematic
 Currents flow through mirror legs based on input value
 Output current / voltage is set by the sum of weighted currents through Rout
 Thermometer codes prevent glitches at output
 DAC supply is 0.7V to handle 0.6V digital signals
 Rout, Rcm are off-chip resistances

[Schematic labels: CM leg, unit device width W1]

Leg        Device size
T4 - T18   16W1
B3         8W1
B2         4W1
B1         2W1
B0         W1
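The leg sizing can be checked with a small model in which each leg contributes current in proportion to its device width. This is a sketch under that assumption: the 15 thermometer legs (T4 to T18) weigh 16 units each and the binary legs B3 to B0 weigh 8, 4, 2 and 1. The function names are illustrative.

```python
def split_dac_word(code):
    """Split an 8-bit code into 15 thermometer lines (upper 4 bits) and 4 binary bits."""
    if not 0 <= code <= 255:
        raise ValueError("code must fit in 8 bits")
    msb, lsb = code >> 4, code & 0xF
    therm = [1 if i < msb else 0 for i in range(15)]   # legs T4..T18
    binary = [(lsb >> i) & 1 for i in range(4)]        # legs B0..B3
    return therm, binary


def dac_current_units(therm, binary):
    """Summed leg current in units of the W1 leg current."""
    thermometer_part = 16 * sum(therm)
    binary_part = sum(bit << i for i, bit in enumerate(binary))
    return thermometer_part + binary_part


# Every 8-bit code maps to a summed current equal to the code itself.
print(all(dac_current_units(*split_dac_word(c)) == c for c in range(256)))
```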
23
Amplifier Schematic
 Common-source amplifier
 Supply of 0.7V
 Rd, Rs are off-chip resistances
 M1 biased by the DAC Rout resistor
 CL: on-chip antenna load, 80pF
24
Testability Features added before
Integration
[Block diagram: the system architecture (Input, phase accumulator, NCO, DFFs, binary-to-thermometer encoder, DAC, amplifier, antenna, phase detector, charge pump, CLK, BEAT CLK, reference PLA completion) with test access added at the chip boundary: charge pump supply, Bulkn, common Bulkn, Amp Output, DAC Output, and an 8-bit port usable as BFSK output or DAC input]
25
Layout
 Manual PLA layout for every PLA in design
 NPLA routed using SEDSM
 I/O pad cells, ESD diodes layout done manually
 DAC, amplifier layout done manually
 Antenna coil layout done manually
26
PLA Layout
[Figure: PLA layout showing input bit lines, word lines, output lines, and transistors that are modified based on the logic to be implemented]
27
I/O PAD CELL Layout
 Fully Compliant with
TSMC Design rules
 ESD Diodes have
guard rings to prevent
latchup
I/O PAD
I/O
Drivers
Primary ESD Diodes
Secondary ESD Diodes
28
Die Photo
Digital BFSK inputs domain, 0.7V
Digital BFSK domain, 0.6V
Digital BFSK output domain, 2V
Std Cell domain, 2.5V
29
Experimental Results from
Silicon
 Output of the BFSK transmitter is shown
 As the input changes from 0 to 1, the output frequency changes, showing the modulation
 Fclk = 1MHz
 F1 = 117kHz
 F2 = 347kHz
 The adjacent peaks are around 10dB below the fundamental peaks
 We found from Matlab simulations that signals from the extracted Spice netlist could be demodulated at the receiver side
30
Results from Silicon
Operating Range
 Nbulk kept at 0V, 0.45V
 Maximum frequency shows a quadratic dependence on supply voltage
31
Power Comparison
Design Style    Operating Voltage   Frequency of Operation   Avg Current   Power Dissipated
Sub-threshold   0.6V                1.05MHz                  µA            26.8µW
Std Cell        2.5V                1.05MHz                  208µA         520µW

 Sub-threshold power calculated only for the Phase Accumulator and NCO blocks on the 0.6V power supply
 Std Cell implements only this portion of the BFSK circuit
 Sub-threshold gives 19.4X lower power
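The power figures can be checked with P = V × I. This sketch assumes the current and power entries are in microamps and microwatts. The sub-threshold current it prints is only the value implied by the stated power and supply voltage, not a figure from the slide.

```python
def power_w(voltage_v, current_a):
    """Average power from supply voltage and average current."""
    return voltage_v * current_a


STD_CELL_POWER_W = power_w(2.5, 208e-6)   # 2.5 V supply, 208 uA average current
SUB_VT_POWER_W = 26.8e-6                  # stated sub-threshold power

power_ratio = STD_CELL_POWER_W / SUB_VT_POWER_W
implied_sub_vt_current_a = SUB_VT_POWER_W / 0.6   # derived, assuming the 0.6 V supply

print(f"std cell power: {STD_CELL_POWER_W * 1e6:.0f} uW")
print(f"power ratio: {power_ratio:.1f}X")
print(f"implied sub-threshold current: {implied_sub_vt_current_a * 1e6:.1f} uA")
```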
32
Bulkn Node Modulation
 Bulk node modulates
when beat clock demands
speedup or slow-down
 Bulk node modulates as
supply voltage is changed,
so that circuit delay is
maintained constant.
33
Conclusion
 Validated a sub-threshold circuit design methodology based on dynamic body bias (first-of-kind)
 Validated design tools and techniques
 First-of-kind design automation flow, which will help bring sub-threshold design to the mainstream.
 We implemented an ultra-low-power, low-data-rate wireless BFSK transmitter
 The fabricated chip works as expected, validating our design flow.
 We compared the sub-threshold design with a Std Cell based design and showed a 19.4X reduction in power.
34
Thank you!!
35
Backup Slides
36
Introduction
 Power consumption has become a significant
hurdle for recent ICs
 Higher power consumption leads to
 Shorter battery life
 Higher on-chip temperatures – reduced operating
life of the chip
 There is a large and growing class of applications
where power reduction is paramount – not speed.
 Such applications are ideal candidates for subthreshold circuit design
 For sub-threshold circuits, VDD ≤ VT
37
TX/RX System Testing
TX PCB with subthreshold IC
TX antennas
RX board
38
RX setup
Solving the Problem of
Delay Sensitivity to
Process, Voltage and
Temperature Variations
"A Variation-tolerant Sub-threshold Design Approach",
Jayakumar, Khatri. Design Automation Conference (DAC)
2005 Anaheim, CA , June 13-17.
39
An Example Showing Phase
Locking
VDD change
0.2V to 0.22V
VDD change
0.22V to 0.18V
 This figure shows how
the body bias (and
hence the delay of the
PLA) changes with
changes in VDD.
 The adjustment is very
quick (within a few
clock cycles).
40
Energy and Speed
 We may be interested in the minimum energy operating
point for the design
 Minimizing VDD reduces power but minimum VDD does not
mean minimum energy
 The optimum VDD value increases with increased logical depth,
and with temperature
"Minimum Energy Near-threshold Network of PLA based Design", Jayakumar, Khatri.
International Conference on Computer Design (ICCD) 2005, Oct 2-5, San Jose, CA.
 Reclaiming the speed penalty
 Can be done for datapath circuits, using asynchronous
micropipelining
 Showed that a speedup of 7X is possible, with an area overhead of 44%
"A PLA based Asynchronous Micropipelining Approach for Subthreshold Circuit
Design", Jayakumar, Garg, Gamache, Khatri. IEEE/ACM Design Automation
Conference (DAC) 2006, July 24-28, San Francisco, CA.
41
On-chip Antenna
 Antenna size needs to be at least a 10th of the transmit
wavelength to radiate effectively
 Transmit wavelength around 600m
 Due to on-chip space constraints, antenna coil length is
only 0.2m
 We have the option of using an external antenna
 And we had a 60dB safety margin in the link budget
analysis.
 This could compensate for a lossy antenna
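The antenna-size argument follows from wavelength = c / f. A minimal sketch using the two specified FSK tones (150kHz and 450kHz) and the free-space speed of light; the one-tenth-wavelength rule is the one stated on the slide.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0


def wavelength_m(freq_hz):
    """Free-space wavelength for a given carrier frequency."""
    return SPEED_OF_LIGHT_M_S / freq_hz


def min_antenna_length_m(freq_hz, fraction=0.1):
    """Minimum antenna length under the one-tenth-wavelength rule of thumb."""
    return fraction * wavelength_m(freq_hz)


COIL_LENGTH_M = 0.2  # on-chip coil length quoted on the slide
for f_hz in (150e3, 450e3):
    print(
        f"{f_hz / 1e3:.0f} kHz: wavelength {wavelength_m(f_hz):.0f} m, "
        f"minimum antenna {min_antenna_length_m(f_hz):.0f} m, "
        f"coil is {COIL_LENGTH_M / min_antenna_length_m(f_hz):.4f} of that"
    )
```

Even at the higher tone the 0.2m coil is far shorter than the rule of thumb asks for, which is why the link-budget margin and the external antenna option matter.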
42
Spectrum of Amplifier Tones
 Fclk = 1MHz
 F1 = 117kHz
 F2 = 347kHz
 The adjacent peaks are around 10dB below the fundamental peaks
 We found from Matlab simulations that signals from the extracted Spice netlist could be demodulated at the receiver side
43