
A 16-Bit Low-Power Microcontroller with Monolithic MEMS-LC Clocking
Eric D. Marsman¹, Robert M. Senger¹, Michael S. McCorquodale², Matthew R. Guthaus¹, Rajiv A. Ravindran¹, Ganesh S. Dasika¹, Scott A. Mahlke¹, Richard B. Brown³
¹University of Michigan, ²Mobius Microsystems, ³University of Utah
IEEE International Symposium on Circuits and Systems
May 23rd – May 26th, 2005, Kobe, Japan
NSF ERC for Wireless Integrated MicroSystems (WIMS)
Overview
• Motivation
• Microsystem Architecture
– Microcontroller
– Clock Generation
– Dynamic Frequency Scaling (DFS)
• Microsystem Measured Results
– Microcontroller
– Compiler Utilization
– Instruction Level Power Modeling
– Clock Generation
– DFS
• Future Directions
• Conclusion
Motivation
Wireless Integrated Microsystems (WIMS)
• Environmental Sensors
– Heavy Metals
– µ Gas Chromatograph
• Biomedical Implants
– Cochlear Implant
– Deep Brain Implants
Motivation (cont)
• Power minimization
– Frequency scaling
– Voltage scaling
– Memory architecture
– Process technology
– Leakage current mitigation
Commercially available cores
Core              Process  Frequency  No. Bits  Core Power
ARM7TDMI          0.18µm   88MHz      32        22mW
Tensilica Xtensa  0.18µm   200MHz     32        80mW
MIPS32 M4K        0.13µm   300MHz     32        84mW
Infineon C166S    0.18µm   80MHz      16        160mW
Microsystem Architecture
• 16-bit, 3-stage pipeline
• Software controlled register interface to clock generator
• Peripheral communication interfaces for flexibility
[Block diagram: 3-stage pipeline (Fetch, Decode, Execute) with Register Files; CMOS-MEMS Clock Generator; Memory Management Unit with Boot ROM, 64KB SRAM, and Loop Cache; 64 KB External Memory; peripherals: USART X3, Test Int., SPI X2, Timer]
Microcontroller Architecture
• Primarily a Load-Store architecture
• 77 instructions, 8 addressing modes
• Data and address registers split into two windows
• Hardware support for one level of interrupts and subroutines
• Banked memory architecture with additional external memory interface
– Energy/area tradeoffs compared to single 64kB bank
• Low-power loop cache for commonly executed instructions
[Figure: Normalized Area and Power (0 to 1) vs. Ram Structure ('banks' x 'size in kB'): 1x64, 2x32, 4x16, 8x8, 16x4, 32x2. Series: Power, Area. Annotation: 15.9% more area, 69.2% less power]
Monolithic Clock Generation
• Complementary, cross coupled, negative-transconductance tank
• Frequency trimming via modulation of tail current with vtrim
• CMOS compatible
• 1.056GHz oscillation frequency
• Buffer amplifier removes amplitude variation
[Circuit diagram: LC tank (L, R, C, C) with vtrim input, followed by a chain of three DFFs dividing the 16fo output down to 2fo]
Dynamic Frequency Scaling
• Fully synthesized logic, no custom design
• Synchronization chain ensures glitch free output
• Optional external clock input
[Block diagram: Clock Divider driven by 2f0, producing f0, f1, ..., f15; Clock Sel picks fclk; External Clock Sel chooses between fclk and External Clock; Clock Synchronizer (FF0 through FF4) drives System Clock, To Clock Tree]
f_n = f_0 / 2^n ;  n = 1, 2, ..., 15
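As a rough numerical sketch of the divider relation f_n = f_0 / 2^n on this slide, the Python snippet below tabulates the selectable output frequencies. It assumes the 1.056GHz tank output corresponds to the 16fo node in the clock generator diagram, so that f0 = 66MHz. The names TANK_HZ, F0_HZ, and divider_outputs are illustrative and do not come from the original design.

```python
# Illustrative sketch only: tabulates f_n = f_0 / 2**n for n = 0..15.
# Assumption: the 1.056 GHz LC tank output is the "16fo" node, so f0 = 66 MHz.

TANK_HZ = 1.056e9        # LC oscillator frequency quoted in the slides
F0_HZ = TANK_HZ / 16     # assumed base system frequency (66 MHz)


def divider_outputs(f0_hz: float, n_max: int = 15) -> list:
    """Return [f_0, f_1, ..., f_n_max], where f_n = f_0 / 2**n."""
    return [f0_hz / (2 ** n) for n in range(n_max + 1)]


if __name__ == "__main__":
    for n, f_hz in enumerate(divider_outputs(F0_HZ)):
        print(f"f{n:<2d} = {f_hz / 1e6:10.6f} MHz")
```

Under this assumption the slowest setting, f15, comes out at roughly 2kHz, which agrees with the 0.002 – 66MHz output range listed in the clock generation results.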
Dynamic Frequency Scaling (cont)
• Glitch suppression example
[Timing diagram: Clock Sel stepping through f0, f2, f1; waveforms for 2f0, f0, f1, f2, fclk (with the glitch marked), FF0.Q, and FF4.Q]
Microsystem Measured Results
• TSMC 0.18µm MM/RF bulk CMOS
• 3.5 million transistors
• Operates up to 92MHz
• 33.9mW core power consumption @ 92MHz & 1.8V
• 1.4mW core power consumption @ 10MHz & 1.1V
• 17.28mW MEMS clock source power consumption @ 1.8V
• 740µW sleep power consumption @ 1.1V
[Die photo, 3.54mm dimension marked: CLK, ANALOG TEST, PIPELINE, PERIPHERALS, CACHE, and four 16KB SRAM blocks]
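The two measured core operating points can be sanity-checked against the first-order CMOS dynamic power relation, P proportional to Vdd² · f. This is a rough sketch, assuming switching power dominates and the effective switched capacitance is the same at both points; neither assumption is stated in the slides, and the helper names are hypothetical.

```python
# Rough sketch: first-order dynamic power scaling, P proportional to Vdd^2 * f.
# Assumptions (not from the slides): switching power dominates, and the
# effective switched capacitance is identical at both operating points.


def scale_dynamic_power(p_ref_mw, v_ref, f_ref_mhz, v_new, f_new_mhz):
    """Scale a reference power (mW) to a new (Vdd, frequency) operating point."""
    return p_ref_mw * (v_new / v_ref) ** 2 * (f_new_mhz / f_ref_mhz)


def energy_per_cycle_nj(p_mw, f_mhz):
    """Energy per clock cycle in nJ (mW divided by MHz gives nJ per cycle)."""
    return p_mw / f_mhz


if __name__ == "__main__":
    # Reference point from the slides: 33.9 mW at 92 MHz and 1.8 V.
    predicted = scale_dynamic_power(33.9, 1.8, 92.0, 1.1, 10.0)
    print(f"Predicted at 10 MHz, 1.1 V: {predicted:.2f} mW (measured: 1.4 mW)")
    print(f"Energy/cycle at 92 MHz, 1.8 V: {energy_per_cycle_nj(33.9, 92.0):.3f} nJ")
    print(f"Energy/cycle at 10 MHz, 1.1 V: {energy_per_cycle_nj(1.4, 10.0):.3f} nJ")
```

The scaled estimate lands close to the measured 1.4mW, which suggests the two data points are consistent with combined voltage and frequency scaling, at least to first order.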
Microcontroller Measured Results
• Static loop cache utilization provides 4 to 20% energy savings
• Vdd scaling across different frequencies allows for adjustment to program workload requirements
[Figure: Loop cache energy savings ("Power Savings using Loop Cache"). Measured Power (mW) for SRAM Only vs. SRAM and Loop Cache, with Percentage Savings, for Data1, Data2, Data3, Data4, Fetch1, Fetch2. Bar annotations: 56%, 93%, 29%, 23%, 28%, 23% LC accesses]
[Figure: Power vs. Vdd across frequency ranges. Core Power (mW) vs. Core Vdd (V), 1.10 to 2.20V, at 10MHz, 50MHz, and 90MHz for Chip #1 through Chip #4]
WIMS C Compiler
• Windowed versus non-windowed machine
– 19% reduction in power consumption
– 30% performance improvement
• Dynamic instruction placement in 512B loop cache achieves 43% energy savings over static placement
[Figure: Energy savings in 64B loop cache. % Energy Savings, Dynamic vs. Static, for sha, epic, gsmdec, rasta, pegwitdec, rawc, rawd, cjpeg, djpeg, blowfish, unepic, gsmenc, rijndael, and average]
Instruction Level Power Modeling
• Divide ISA into groups of similar instructions
• noops model inter-instruction pipeline switching
• Account for memory access energy separately

Energy per instruction group
Instruction Group  Energy (nJ)    Instruction Group  Energy (nJ)
add-sub            0.2403         win swap           0.1832
shift              0.1950         load imm           0.1961
boolean            0.2127         branch-nt          0.1720
compare            0.2082         branch-t           0.5741
multiply           2.7702         jmp abs            0.5372
divide             2.7160         jmp rel            0.4020
copy               0.2127         jmp abs sub        0.5658
bit                0.6137         jmp rel sub        0.3527
load abs           0.5249         return             0.3700
load rel           0.3661         swi                0.5585
store abs          0.4427
store rel          0.3070
noop               0.1931

Memory access energy
Access       Ext Mem (nJ)¹  Loop (nJ)  MMR (nJ)  Boot Rom (nJ)
inst fetch   -0.0554        -0.0507    -         -0.0420
bit²         -0.1643        -0.1615    -0.1909   -
load abs²    -0.0976        -0.1016    -0.0877   -
load rel²    -0.1039        -0.1039    -0.1091   -
store abs²   -0.0411        -0.0461    -0.0427   -
store rel²   -0.0525        -0.0633    -0.0575   -
¹ Excludes memory access energy as this is memory dependent
² Fetch energy counted separately
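One way to use the per-group energies is to weight them by how often each group executes. The sketch below assumes the base energy of a program is simply the sum of count times group energy, using the values from the "Energy per instruction group" table; the memory access terms are left out because the slide accounts for them separately. The instruction mix and the function name estimate_base_energy_nj are hypothetical.

```python
# Minimal sketch of an instruction-level energy estimate.
# Assumption: base energy = sum over groups of (count * energy per group).
# Memory access energy is accounted separately on the slide and omitted here.

BASE_ENERGY_NJ = {
    "add-sub": 0.2403, "shift": 0.1950, "boolean": 0.2127, "compare": 0.2082,
    "multiply": 2.7702, "divide": 2.7160, "copy": 0.2127, "bit": 0.6137,
    "load abs": 0.5249, "load rel": 0.3661, "store abs": 0.4427,
    "store rel": 0.3070, "noop": 0.1931, "win swap": 0.1832,
    "load imm": 0.1961, "branch-nt": 0.1720, "branch-t": 0.5741,
    "jmp abs": 0.5372, "jmp rel": 0.4020, "jmp abs sub": 0.5658,
    "jmp rel sub": 0.3527, "return": 0.3700, "swi": 0.5585,
}


def estimate_base_energy_nj(counts):
    """Sum count * per-group energy (nJ) for a dict of {group: count}."""
    unknown = set(counts) - set(BASE_ENERGY_NJ)
    if unknown:
        raise KeyError(f"unknown instruction groups: {sorted(unknown)}")
    return sum(BASE_ENERGY_NJ[group] * n for group, n in counts.items())


if __name__ == "__main__":
    # Hypothetical dynamic instruction mix for a small loop kernel.
    mix = {
        "load rel": 100, "add-sub": 200, "multiply": 10,
        "branch-t": 90, "branch-nt": 10, "store rel": 100,
    }
    print(f"Estimated base energy: {estimate_base_energy_nj(mix):.3f} nJ")
```

In a mix like this, the ten multiplies cost nearly as much as the hundred relative stores, which is the kind of imbalance a power-aware compiler could target.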
Clock Generation Results
• No external reference
• No PLL/DLL
• High frequency accuracy
• Low start-up latency
• Low temperature coefficient
• Broad operating temperature range
• Low jitter
• Minimal area overhead (3% of die)
• Low Power
• All Si technology

Metric/Parameter                 LC Clock
Reference frequency              1056MHz
Output frequencies               0.002 – 66MHz
Frequency accuracy across lot    ±0.75%
Frequency precision (no trim)    ±2%
Trimmed frequency accuracy       100ppm
Worst case duty cycle            48/52
Worst case RMS period jitter     <300ppm
Temperature stability            ±0.9% (-40 to 100C)
Max. operation temperature       150C
Power supply                     1.8V
Bias current                     9.6mA
Power dissipation                17.28mW
Min. operating power             7.2mW
Start-up latency (25C/125C)      18ns/28ns
Si footprint                     0.3mm²
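A few of the table entries can be cross-checked with simple arithmetic. The sketch below assumes power dissipation is supply voltage times bias current, that the minimum operating power is drawn at the same 1.8V supply, and that the ppm accuracy figure is applied to the 66MHz top output frequency. These are assumptions made for illustration, and the helper names are hypothetical.

```python
# Small consistency check on the LC clock figures.
# Assumptions (for illustration): P = V * I, the minimum operating power is
# drawn at the same 1.8 V supply, and ppm accuracy is applied to 66 MHz.


def power_mw(supply_v, current_ma):
    """Power in mW from supply voltage (V) and current (mA)."""
    return supply_v * current_ma


def current_ma(power_mw_value, supply_v):
    """Current in mA implied by a power (mW) at a given supply (V)."""
    return power_mw_value / supply_v


def ppm_to_hz(freq_hz, ppm):
    """Absolute frequency error in Hz for a given ppm tolerance."""
    return freq_hz * ppm / 1e6


if __name__ == "__main__":
    print(f"1.8 V * 9.6 mA = {power_mw(1.8, 9.6):.2f} mW")
    print(f"7.2 mW at 1.8 V implies {current_ma(7.2, 1.8):.1f} mA")
    print(f"100 ppm of 66 MHz = {ppm_to_hz(66e6, 100) / 1e3:.1f} kHz")
```

The first line reproduces the 17.28mW power dissipation entry from the 1.8V supply and 9.6mA bias current.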
MEMS Fabrication
• Post processing etch using PAD cut
• Suspended inductor
• Varactor etch unsuccessful
– No etch chemistry for MiM oxy-nitride dielectric
– Use transconductance modulation instead
DFS Results
• Glitch free switching
• Switching latency is 5/2f0, or 37.45ns for this implementation
[Oscilloscope captures: glitch-free frequency switching, with traces labeled 1MHz, 33MHz, 33MHz, 8MHz, 16MHz, 4MHz]
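The latency expression 5/2f0 reads naturally as five periods of the 2f0 clock, which would match the five synchronizer flip-flops (FF0 through FF4) if they are clocked at 2f0. The sketch below evaluates it under that reading. It assumes a nominal f0 of 66MHz (1.056GHz / 16), and the helper names are hypothetical.

```python
# Sketch: DFS switching latency read as 5 / (2 * f0), i.e. five 2f0 periods.
# Assumption: nominal f0 = 66 MHz (1.056 GHz tank frequency divided by 16).


def switching_latency_ns(f0_hz):
    """Latency in ns for five periods of the 2*f0 clock."""
    return 5.0 / (2.0 * f0_hz) * 1e9


def f0_from_latency_hz(latency_ns):
    """Invert the relation: the f0 (Hz) that yields a given latency (ns)."""
    return 5.0 / (2.0 * latency_ns * 1e-9)


if __name__ == "__main__":
    nominal_ns = switching_latency_ns(66e6)
    implied_mhz = f0_from_latency_hz(37.45) / 1e6
    print(f"Nominal latency at f0 = 66 MHz: {nominal_ns:.2f} ns")
    print(f"f0 implied by the quoted 37.45 ns: {implied_mhz:.2f} MHz")
```

The nominal value is slightly longer than the quoted 37.45ns. One possible reading is that the measured part ran about 1% above 66MHz, which would sit inside the ±2% untrimmed frequency precision reported for the clock, though the slides do not state this.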
Future Directions
• Add DSP for Cochlear Implants and other bio-medical devices
• Include ring oscillator for a lower power alternative
• ISA improvements to reduce compiler bottlenecks
– Address register support
– Separate data and address register windows
– DMA instructions
• Decrease sleep mode power
• Explore Microsystem design in advanced technologies
[Layout: Preliminary next generation system, 3.0mm dimension marked: four 8KB SRAM blocks, CACHE, PIPELINE, I/O CLK, DSP]
Conclusion
• Described a highly-functional, low-power Microsystem ideally suited for remote and bio-medical applications
• DFS allows on-the-fly, low-latency adaptation to workload requirements from 33.9mW @ 92MHz to 1.4mW @ 10MHz or sleep mode at 740µW
• Monolithic clock reference decreases system size, cost, and power consumption compared to other techniques
• Power-aware compiler takes advantage of low-power architectural features to achieve maximum power reduction
Acknowledgements
• NSF ERC for WIMS
• MOSIS Educational Program
• Artisan Components
• TSMC
• Cadence
• Synopsys
• Mentor Graphics
• Coventor