MTJ Switching Energy Analysis

Progress Update
Energy-Performance Characterization of CMOS/MTJ Hybrid Circuits
Fengbo Ren
05/28/2010
Modern MTJ
 Bias voltage/current controlled variable resistance
device
– Low: RP
– High: RAP
– TMR = (RAP - RP)/ RP
 Spin-transfer-torque (STT) Switching
– Switching is controlled by the direction of writing current.
– Writing current density has to exceed thresholds
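As a quick check of the TMR definition above, the following minimal sketch evaluates it for the MTJ resistances quoted later in the simulation setup (Rp = 700, Rap = 1400, taken here to be in ohms). The function name `tmr` is illustrative only.

```python
def tmr(r_p: float, r_ap: float) -> float:
    """Tunnel magnetoresistance ratio: TMR = (RAP - RP) / RP."""
    return (r_ap - r_p) / r_p


# Resistances from the simulation-setup slide (assumed to be in ohms).
ratio = tmr(700.0, 1400.0)
print(f"TMR = {ratio:.0%}")
```

With these values the ratio comes out to 100%, matching the TMR stated in the simulation setup.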
Motivations for Hybrid Logic
 Significant application in MRAM design.
 Why logic?
– CMOS-compatible
● Switching current: 200uA – 2mA
● 90nm transistor: 1mA/um gate width
– Non-volatility, high stability
● Introducing the MTJ's non-volatility into CMOS may suppress leakage in active mode and reduce leakage in idle mode to a minimum.
– 3D stacking
● Replacing CMOS with MTJs may increase density.
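The CMOS-compatibility argument above can be made concrete with a rough sizing estimate. This is a sketch under two assumptions that are not stated on the slide: drive current scales linearly with gate width, and a single transistor supplies the whole write current. The function name is hypothetical.

```python
DRIVE_MA_PER_UM = 1.0  # 90nm transistor drive strength quoted on the slide


def required_gate_width_um(i_write_ma: float,
                           drive_ma_per_um: float = DRIVE_MA_PER_UM) -> float:
    """Gate width (um) needed to source a given write current (mA).

    Assumes drive current scales linearly with gate width.
    """
    return i_write_ma / drive_ma_per_um


# Switching-current range quoted on the slide: 200 uA to 2 mA.
for i_ma in (0.2, 2.0):
    print(f"{i_ma:.1f} mA -> {required_gate_width_um(i_ma):.1f} um gate width")
```

Under these assumptions the write transistor needs roughly 0.2 um to 2 um of gate width, which is within reach of an ordinary 90nm device.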
Questions?
 What architecture can best utilize MTJ's non-volatility
feature to improve energy efficiency?
 Can an MTJ/CMOS hybrid circuit achieve a better energy-delay trade-off than a CMOS circuit?
 How much leakage power can be saved by introducing
MTJ to CMOS?
 Any overhead? How much is the switching power of
MTJ?
 What will be the trend of MTJ/CMOS hybrid circuits with technology scaling?
Logic-in-Memory MTJ (LIM-MTJ) Logic Style
 LIM-MTJ
– Use differential MTJ in Dynamic Current-mode Logic
(DyCML)
● Outputs are evaluated from the resistance difference of the pull-down networks through X-coupled PMOS.
● Claimed to have lower dynamic and static power than SCMOS.
[Figure: Schematic of LIM-MTJ 1-bit full adder (34 CMOS transistors + 4 MTJ). Labeled blocks: current comparator, external inputs, stored inputs (MTJ memory cells), sum circuit, carry circuit.]
Energy-Performance Characterization
 LIM-MTJ vs. SCMOS and DyCML
[Figures: Schematic of SCMOS 1-bit full adder; schematic of DyCML 1-bit full adder. Each shows a sum circuit and a carry circuit. Transistor counts labeled on the slide: 32 CMOS transistors and 28 CMOS transistors.]
– LIM-MTJ has no energy-performance advantage over the equivalent CMOS implementation.
MTJ Switching Energy Analysis
 Switching Energy
$E_W = I_W^2 \cdot R \cdot t$
– IW = JC∙A
● JC is the critical current density.
● A is the junction area: A = π∙W∙L = K∙L^2, where L is the junction size.
– R = δ/A
● δ is the resistance-area product, an intrinsic MTJ parameter. δ = 20 Ω∙um^2
– t is the switching time.
$E_W = K \cdot J_C^2 \cdot \delta \cdot L^2 \cdot t$
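The closed-form expression follows from the definitions on this slide by direct substitution. The worked steps below use only the slide's symbols, with $A = K L^2$:

```latex
\begin{aligned}
E_W &= I_W^2 \, R \, t \\
    &= (J_C A)^2 \cdot \frac{\delta}{A} \cdot t \\
    &= J_C^2 \, \delta \, A \, t \\
    &= K \, J_C^2 \, \delta \, L^2 \, t, \qquad A = K L^2
\end{aligned}
```

One factor of the area cancels, so for a fixed critical current density the switching energy scales linearly with junction area, that is, quadratically with junction size L.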
MTJ Switching Energy Analysis
 JC is a function of current pulse width.
– Switching time is a function of current density.
$J_C(t) = J_{C0} \left[ 1 - \ln(t/t_0)/\Delta \right]$, for $t > 8$ ns
$J_C(t) = J_{C0} + C/t$, for shorter pulses, where $C$ is a constant (the slide's piecewise form uses two constants, $C_1$ and $C_2$)
● Δ is the thermal stability factor (Δ≥40)
● t0 is the intrinsic switching time. t0 = 1 ns
● JC0 is the intrinsic critical current density, JC0 = JC at t= t0.
– Modern MTJs have been shown to have JC0 = 2-7 MA/cm2
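To give a feel for how weakly the critical current density depends on pulse width in the long-pulse regime, here is a minimal sketch. It assumes the long-pulse form J_C(t) = J_C0 [1 - ln(t/t0)/Δ] with Δ = 40 and t0 = 1 ns as listed above; the function name and the choice of J_C0 = 5 MA/cm2 (within the quoted 2-7 MA/cm2 range) are illustrative.

```python
import math


def jc_thermal(t_s: float, jc0: float, delta: float = 40.0,
               t0_s: float = 1e-9) -> float:
    """Critical current density for a pulse of width t_s (seconds).

    Assumed long-pulse form: J_C(t) = J_C0 * (1 - ln(t / t0) / delta).
    The result has the same units as jc0.
    """
    return jc0 * (1.0 - math.log(t_s / t0_s) / delta)


JC0 = 5.0  # MA/cm^2, illustrative value inside the 2-7 MA/cm^2 range
for t_ns in (10, 100, 1000):
    jc = jc_thermal(t_ns * 1e-9, JC0)
    print(f"t = {t_ns:5d} ns -> J_C = {jc:.2f} MA/cm^2")
```

Because the dependence is logarithmic and Δ is large, stretching the pulse by a factor of ten lowers J_C by only a few percent, which is why longer pulses do not reduce the write energy.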
MTJ Switching Energy Analysis
 Switching Energy
$E_W(t) = K \cdot J_C^2(t) \cdot \delta \cdot L^2 \cdot t$
– Function of switching time (t) given JC0, δ, L, Δ
– Ref. MTJ
● JC0 = 5 MA/cm2, δ = 20 Ω∙um2, L = 135 nm, W = 65 nm
● RP = 725 Ω, IC = 1.4 mA @ t = 1 ns
 Switching energy > 1 pJ
– CMOS/MTJ hybrid logic circuits that require frequent MTJ switching are hardly energy efficient.
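The reference-MTJ numbers can be reproduced from the definitions A = π∙W∙L, R = δ/A, IW = JC∙A and EW = IW^2∙R∙t. The sketch below assumes J_C = J_C0 at t = 1 ns, as the slide's IC value implies; the function and variable names are illustrative.

```python
import math

UM2_PER_CM2 = 1e8  # 1 cm^2 = 1e8 um^2


def mtj_write(jc_ma_per_cm2, ra_ohm_um2, length_nm, width_nm, t_s):
    """Return (area um^2, resistance ohm, current A, energy J) for one write."""
    area_um2 = math.pi * (width_nm * 1e-3) * (length_nm * 1e-3)  # A = pi*W*L
    jc_a_per_um2 = jc_ma_per_cm2 * 1e6 / UM2_PER_CM2
    current_a = jc_a_per_um2 * area_um2                           # I = J*A
    resistance_ohm = ra_ohm_um2 / area_um2                        # R = delta/A
    energy_j = current_a ** 2 * resistance_ohm * t_s              # E = I^2*R*t
    return area_um2, resistance_ohm, current_a, energy_j


# Reference MTJ from the slide, evaluated at t = 1 ns.
area_um2, resistance_ohm, current_a, energy_j = mtj_write(5.0, 20.0, 135.0, 65.0, 1e-9)
print(f"R = {resistance_ohm:.0f} ohm, I = {current_a * 1e3:.2f} mA, "
      f"E = {energy_j * 1e12:.2f} pJ")
```

This gives about 725 Ω, 1.38 mA and 1.38 pJ, consistent with the RP, IC and "> 1 pJ" figures on the slide.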
MTJ Switching Energy Analysis
 Switching energy with scaling: $E_W(t) = K \cdot J_C^2(t) \cdot \delta \cdot L^2 \cdot t$
– Scaling parameters: δ, L, JC0
 fJ-level switching requires:
– δ ≤ 5 Ω∙um2, JC0 ≤ 0.6 MA/cm2, and L ≤ 33 nm
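A minimal sketch of the scaling argument: evaluate EW = K∙JC^2∙δ∙L^2∙t at the reference point and at the scaled point. Two assumptions are not on the slide: the scaled junction keeps the reference aspect ratio (K = π∙W/L with W = 65 nm, L = 135 nm), and both points use a 1 ns pulse with J_C = J_C0. Names are illustrative.

```python
import math


def switching_energy_j(k, jc_ma_per_cm2, ra_ohm_um2, length_nm, t_s):
    """E_W = K * J_C^2 * delta * L^2 * t, returned in joules."""
    jc_a_per_um2 = jc_ma_per_cm2 * 1e6 / 1e8  # MA/cm^2 -> A/um^2
    length_um = length_nm * 1e-3
    return k * jc_a_per_um2 ** 2 * ra_ohm_um2 * length_um ** 2 * t_s


K_REF = math.pi * 65.0 / 135.0  # assumption: aspect ratio of the reference MTJ
T_PULSE = 1e-9                  # assumption: 1 ns write pulse for both cases

e_ref = switching_energy_j(K_REF, 5.0, 20.0, 135.0, T_PULSE)
e_scaled = switching_energy_j(K_REF, 0.6, 5.0, 33.0, T_PULSE)
print(f"reference: {e_ref * 1e12:.2f} pJ")
print(f"scaled:    {e_scaled * 1e15:.2f} fJ")
print(f"ratio:     {e_ref / e_scaled:.0f}x")
```

Under these assumptions the scaled device lands at roughly 0.3 fJ per switch, more than three orders of magnitude below the reference MTJ.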
LUT-based Logic
 Store the truth table in memory
 Read out the logic value based on input selection.
– Reconfigurable
– Can implement any type of logic, e.g., FPGA
 Replace storage cells with MTJs
– No MTJ switching during logic operation; the MTJs only need to be configured once.
– Non-volatile, minimum standby power.
– Instant boot-up.
[Figure: Example of a 3-input LUT]
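The LUT idea described above can be sketched in software: the truth table is written once (the configuration step), and logic evaluation is only a read indexed by the inputs. The example configures two 3-input LUTs as the sum and carry of a 1-bit full adder. It is an illustrative model, not the circuit from the slides; `make_lut` and `read_lut` are hypothetical helper names.

```python
from itertools import product


def make_lut(func, n_inputs=3):
    """Configuration step: precompute the truth table once.

    Entry i holds func applied to the bits of i, most significant bit first.
    """
    return [func(*bits) for bits in product((0, 1), repeat=n_inputs)]


def read_lut(table, *inputs):
    """Logic operation: select one stored bit; nothing is rewritten."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return table[index]


# Two 3-input LUTs (8 storage cells each) implement a 1-bit full adder.
SUM_LUT = make_lut(lambda a, b, ci: a ^ b ^ ci)
CARRY_LUT = make_lut(lambda a, b, ci: (a & b) | (ci & (a ^ b)))

for a, b, ci in product((0, 1), repeat=3):
    s, co = read_lut(SUM_LUT, a, b, ci), read_lut(CARRY_LUT, a, b, ci)
    print(f"A={a} B={b} Ci={ci} -> S={s} Co={co}")
```

The two tables hold 16 bits in total, which corresponds to the 16 storage cells (EDFFs or MTJs) in the full-adder LUT implementations compared later.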
MTJ Reading Circuit
 Conventional reading circuit based on a current-mirror sense amplifier (SA)
– Slow (2 stages)
– Power hungry (DC current)
[Figure: SA schematic with ∆V, VIP, and VIN labeled]
MTJ Reading Circuit
 X-coupled inverter based reading circuit (XSA)
– Fast
● ∆V is generated and amplified at the same time
– Power efficient
● No DC current; only capacitance charging and discharging
[Figure: XSA schematic. ∆V develops during the evaluation phase and is amplified by the X-coupled inverter; 1 MTJ and 1 Rref are accessed per read.]
Energy Performance Comparison
Instant Power
1 Bit Full Adder (CMOS_LUT)
 Transistor Count
– 16xEDFF
– 4xMUX4
– 2xMUX2
– 672 Transistors
1 Bit Full Adder (MTJ_LUT1)
 Transistor Count
– 16x READ1XMTJ
– 4x MUX4
– 2x MUX2
– 2x WRTCKT
– 448 Transistors
– 33% Reduction
– 16 MTJ
READ1XMTJ
 15T+1MTJ
 Need writing circuit
1 Bit Full Adder (MTJ_LUT2)
 Transistor Count
– 2x READ8XMTJ
– 1x 9-WORD DECODER
– 2x MUX2
– 1x INV
– 1x WRTCKT
– 174 Transistors
– 74% Reduction (174 vs. 672 transistors in CMOS_LUT)
– 16 MTJ
READ8XMTJ
 MTJs share the reading circuit
 1 MTJ + 1 Rref are accessed per read
 1 MTJ is accessed per write
 23T + 8 MTJ
Simulation Setup
 3 LUT architectures are compared
– CMOS-LUT
– MTJ-LUT1: MTJ reading circuit + MUX
– MTJ-LUT2: Shared MTJ reading circuit + decoder
 Configured to implement a 1-bit full adder
– 2 3-input LUTs
 ASU Predictive Technology Model (PTM)
– 90nm, 65nm (bulk)
– 45nm, 32nm (SOI)
 MTJ characteristics
– Rp = 700 Ω, Rap = 1400 Ω, TMR = 100%, Icap2p = 223uA, Icp2ap = 500uA
– Verilog-A MTJ model from Richard.
Configuration Power
 CMOS-LUT
– 1GHz
 MTJ-LUT
– 250MHz
– 750uA writing current
– About 3 ns writing time per MTJ
 MTJ-based LUTs have 10x larger configuration power
– Due to the switching energy of 16 MTJs
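A rough bound on the MTJ write energy behind the configuration cost can be obtained from E = I^2∙R∙t with the numbers above (750 uA, about 3 ns per MTJ). This is only a sketch: it assumes the MTJ resistance stays fixed at either Rp = 700 or Rap = 1400 (taken to be ohms) for the whole pulse, although it actually changes when the MTJ switches, and it ignores losses in the write driver.

```python
def write_energy_j(i_write_a: float, r_ohm: float, t_write_s: float) -> float:
    """Resistive energy dissipated in the MTJ during one write: I^2 * R * t."""
    return i_write_a ** 2 * r_ohm * t_write_s


I_WRITE = 750e-6   # A, writing current from the slide
T_WRITE = 3e-9     # s, approximate writing time per MTJ
N_MTJ = 16         # MTJs configured for the 1-bit full adder

e_low = write_energy_j(I_WRITE, 700.0, T_WRITE)    # resistance held at Rp
e_high = write_energy_j(I_WRITE, 1400.0, T_WRITE)  # resistance held at Rap
print(f"per MTJ: {e_low * 1e12:.2f} to {e_high * 1e12:.2f} pJ")
print(f"16 MTJs: {N_MTJ * e_low * 1e12:.1f} to {N_MTJ * e_high * 1e12:.1f} pJ")
```

The estimate is about 1.2 to 2.4 pJ per MTJ, or roughly 19 to 38 pJ to configure all 16 MTJs, in line with the "several pJ" writing energy quoted in the conclusions.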
Delay
 MTJ-LUT2 has 2.5x larger delay
Leakage Power
 MTJ-LUT1 has slightly higher leakage power
 MTJ-LUT2 has about 5x smaller total leakage power:
– 10x smaller storage leakage (due to MTJ)
– 2x smaller logic leakage (from MUX to decoder)
Energy (Operation Frequency:100MHz)
 LUT2
– 4x total energy saving @ 32nm
● 1/10 leakage_storage, ½ leakage_logic, bigger dynamic_logic
● Dynamic_storage overhead decreases with technology scaling
down.
Energy (Operation Frequency:250MHz)
 LUT2
– 3x total energy saving @ 32nm
● 1/10 leakage_storage, ½ leakage_logic, ½ dynamic_logic
● Dynamic_storage overhead decreases with technology scaling
down.
Energy (Operation Frequency:500MHz)
 LUT2
– 2x total energy saving @ 32nm
● 1/10 leakage_storage, ½ leakage_logic, ½ dynamic_logic
● Dynamic_storage overhead decreases with technology scaling
down.
Standby Power
Standby power (uW) by technology node:
Structure   90nm    65nm    45nm    32nm
CMOS-LUT    6.5     12.8    3.3     29.9
MTJ-LUT1    1.66    1.79    0.469   1.04
MTJ-LUT2    0.836   0.625   0.202   0.227
 Dynamic sleep transistor
– 50mV voltage drop across sleep transistor
 5-20X reduction
Conclusions
 What architecture can best utilize the MTJ's non-volatility to improve energy efficiency?
– LUT-based logic, which requires no MTJ switching during operation.
 Can an MTJ/CMOS hybrid circuit achieve a better energy-delay trade-off than a CMOS circuit?
– Yes.
 How much leakage power can be saved by introducing MTJs to CMOS?
– About 10x reduction.
 Any overhead? How much is the switching power of an MTJ?
– Yes. MTJ reading energy is an overhead. The writing energy of a modern MTJ is several pJ.
 What will be the trend of MTJ/CMOS hybrid circuits with technology scaling?
– They will play a significant role in suppressing leakage below 45 nm.