PPT - Electrical Engineering and Computer Science

EECS598
Non-Volatile Storage
Jerry Kao
Electrical Engineering & Computer Science Department
The University of Michigan, Ann Arbor
University of Michigan
A SURVEY OF CIRCUIT INNOVATIONS IN FERROELECTRIC RANDOM-ACCESS MEMORIES
Ali Sheikholeslami and P. Glenn Gulak
FRAM Structure



Motives for FRAM: short programming time and low power consumption.
Easy integration into an SoC.
Research is done in three areas: material processing, modeling, and circuit design.
FRAM Comparison




FRAM is superior in terms of write-access time and overall power consumption.
Target applications: contactless smart cards and digital cameras.
It is also hoped to become part of the mobile device market.
This paper focuses on six innovative circuit techniques.
Ferromagnetic Cores Background



Main memory technology of the 1950s and 1960s.
Coincident currents on the x-access and y-access wires magnetize a core in the “0” or “1” direction.
Read access consists of a write access followed by sensing.
  - Writing data opposite to the stored value flips the core and induces a large current.
  - The data held in the sense amp is written back to the cell after the read.
Ferroelectric Capacitors Background
The name “ferroelectric” was adopted to convey the similarity of the hysteresis loop to that of ferromagnetic materials.
Key concept: spontaneous polarization, a displacement that is inherent to the crystal structure and does not disappear in the absence of an electric field.
A popular material is lead zirconate titanate (PZT), a perovskite.
At 0 V, the cell has two possible states.
Techniques to Reduce Voltage Disturbance




A novel material process makes the hysteresis loop more square.
Add an access transistor to each cell (1T-1C).
Access transistor OFF:
  - The FE cap is disconnected from the bit line (BL).
Access transistor ON:
  - The FE cap is connected to BL and can be read or written from the plate line (PL).
A voltage boosted above VDD is applied to WL.
Step-Sensing Approach Timing Diagram
Step PL before sensing.
Precharge BL to 0 V.
Turn on WL, forming a capacitor divider of CFE and CBL between PL and ground.
Raise PL to VDD.
Sense the voltage on BL, Vx.
The sense amp restores the original data in the cell.
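The capacitor divider in the steps above can be sketched numerically. This is a minimal illustration, assuming the ferroelectric capacitor is approximated by a linear capacitance that is larger when the cell switches (stores “1”) and smaller when it does not. The function name and all capacitance and voltage values are hypothetical and are not taken from the slides.

```python
def bitline_voltage(c_fe, c_bl, vdd):
    """Voltage Vx on a bitline precharged to 0 V after PL steps to VDD.

    CFE and CBL form a series divider between PL and ground, so the
    bitline takes the fraction CFE / (CFE + CBL) of the PL step.
    """
    return vdd * c_fe / (c_fe + c_bl)


# Hypothetical values: the effective CFE differs for the two stored states.
v0 = bitline_voltage(c_fe=100e-15, c_bl=900e-15, vdd=3.0)  # non-switching cell
v1 = bitline_voltage(c_fe=300e-15, c_bl=900e-15, vdd=3.0)  # switching cell
```

Under these assumptions the switching cell develops the larger bitline voltage, which is the difference the sense amp has to resolve.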
Pulse-Sensing Approach



Pulse PL before the sense amp is activated.
Has a smaller common-mode voltage.
The step-sensing approach is preferred because of its higher common-mode voltage.
Reference Voltage Generation



A reference voltage between V0 and V1 is needed for the comparison.
V0 and V1 are not exact; they are process and time dependent.
Two types of ferroelectric imperfection:
  - Relaxation: a partial loss of remanent charge within microseconds if the cap is not accessed for a period of time. → V1↓ or V0↑
  - Imprint: the tendency of a cell to prefer one state over the other if it stays in that state for a long period of time. → shift in V1, V0, and VREF.
A variable reference is needed to track process variation.
One Oversized Reference Capacitor per Column


Two additional cells in each column (1C’/BL).
CREF is sized larger than CFE so that VREF is midway between V0 and V1.
WL0 and RWL0, or WL1 and RWL1, are turned on at the same time, and the sense amp amplifies the difference between BL and /BL.
Reset transistors are added to reduce voltage build-up on CREF.
VREF is tuned using an adjustable CREF, an adjustable RPL, or an adjustable voltage reference generator.

Two Half-Sized Reference Cap per Column



Also called (2×0.5C/BL).
Generates VREF = (V0+V1)/2.
CREF1 and CREF0 are each half the size of CFE.
In this case, VREF is going to be slightly larger than (V0+V1)/2.
CREF1 and CREF0 fatigue faster than CFE.

Two Full-Sized Reference Cap per Two Columns



Also called (2C/2BL).
CREF1 = CREF0 = CFE.
BL1 holds V1 and BL2 holds V0 before EQ turns ON.
After EQ turns ON, VBL1 = VBL2 = (V0+V1)/2.
At the end, a “0” and a “1” must be restored in CREF0 and CREF1 by pulsing RPL through the transistors driven by RP.
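The equalization step can be illustrated with a short charge-conservation sketch. It assumes the two bitlines behave as ideal capacitors that are shorted together when EQ turns ON. The function name and the voltages are hypothetical.

```python
def equalize(v_bl1, v_bl2, c_bl1=1.0, c_bl2=1.0):
    """Common voltage of two bitlines after EQ shorts them together.

    Total charge C1*V1 + C2*V2 is conserved and is redistributed over
    the combined capacitance C1 + C2.
    """
    return (c_bl1 * v_bl1 + c_bl2 * v_bl2) / (c_bl1 + c_bl2)


# Hypothetical V1 = 0.75 V on BL1 and V0 = 0.30 V on BL2, equal capacitances.
vref = equalize(0.75, 0.30)
```

With equal bitline capacitances the result is the midpoint (V0+V1)/2. Any mismatch between the two capacitances pulls the reference toward the voltage of the larger one.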
Adding Reference Cells to Rows



Also called (2C/WL).
Fatigues the reference-voltage circuit less.
The reference is generated by shorting RBL and /RBL.
Cext must be added to balance the capacitance due to RBL.
A Self-Reference Fully Differential Arch.
Also called (2T-2C).
The two CFEs store opposite values.
Gives twice the voltage difference between BL and /BL.
Only used in lower-density memories.
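A small numeric sketch of why the differential cell doubles the signal. It assumes that a 1T-1C cell is compared against a midpoint reference VREF = (V0+V1)/2, while the 2T-2C cell places V1 on one bitline and V0 on the other. The function names and voltages are hypothetical.

```python
def margin_1t1c(v0, v1):
    """Sense-amp input when BL (V0 or V1) is compared with VREF = (V0+V1)/2."""
    vref = (v0 + v1) / 2
    return min(v1 - vref, vref - v0)


def margin_2t2c(v0, v1):
    """Sense-amp input when BL and /BL carry opposite data (V1 versus V0)."""
    return v1 - v0


# Hypothetical bitline levels for a stored "0" and a stored "1".
single = margin_1t1c(0.30, 0.75)
differential = margin_2t2c(0.30, 0.75)
```

The differential margin is twice the single-ended one, at the cost of two transistors and two capacitors per bit.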
Summary
- 2T-2C is the most robust but has a density issue.
- Among the 1T-1C schemes, 2C/2BL and 2C/WL are superior in sensing complexity and fatigue immunity, respectively.
Ferroelectric Memory Architecture



A folded-bitline architecture is adopted to reduce bitline mismatch.
A constant-PL architecture is desirable since PL is slow to move.
Two disadvantages of a constant PL:
  - A refresh is required.
  - The voltage range across CFE is smaller.
Wordline-Parallel Plateline



Also called (WL//PL).
PL runs parallel to WL.
A row of cells is accessed at the same time.
If PL is shared between two rows, the unaccessed row can be disturbed.
When disturbed, a “0” is reinforced and a “1” might be flipped.
Bitline-Parallel Plateline



Also called (BL//PL).
Only a single cell can be selected.
Absorbs the y-decoder and reduces power significantly.
PL activation can disturb all the cells in the column.
Segmented Plateline


Also called (Segmented PL).
Breaks the PL into local segments.
  - Faster PL than WL//PL.
  - No disturbance to non-selected cells, unlike BL//PL.
Merged Wordline/Plateline (ML) Architecture
Since WL and PL are parallel, people thought of ways to merge them.
Either two 1T-1C cells or one 2T-2C cell.
Write “0” into C1 and “1” into C2.
Four-phase operation:
  - BLn = 0 V and BLn+1 = VDD.
  - ML1 and ML2 are set to VDD, forcing “0” into C1.
  - ML1 is pulled down to ground, leaving “0” in C1 and forcing “1” into C2.
  - ML1 is pulled to VDD and ML2 is pulled to ground, forcing “1” into C1 if BLn were at VDD.
Advantages:
  - Faster read access time.
  - Same read/write time.
  - Higher density.
[Figure panels: write access, read access]
Nondriven Plateline Architecture
- Also called Nondriven Plateline (NDP).
- A constant voltage on PL (PL = VDD/2) reduces read/write access time.
- Read operation:
    BL1 = BL2 = 0 V.
    Activate WL.
    VDD/2 is used to switch the cap storing “1”. Good for SrBi2Ta2O9.
    The sense amp restores the value by holding BL1 = BL2.
- The write operation is similar to the read operation except that BL is held at VDD or 0 V.
Bitline-Driven Architecture






PL = 0 V.
Full VDD is applied when reading, and no refresh is needed as with PL at VDD/2.
The shaded circuit precharges BL and /BL to VDD or 0 V before WL is activated.
PL is pulsed only after sensing.
This reduces the read access time but not the read cycle time.
Performance can be improved by combining it with a segmented PL.
Dual-Mode Ferroelectric Memories

Limits the switching of CFE to the power-down and power-up modes to reduce the fatigue problem.
During power shutdown:
  - STO is turned on.
  - PL is pulsed, writing data to CFE.
  - STO is pulled to ground, ready for power-off.
During the power-on sequence:

Transpolarizer-Based Architectures




Two CFEs connected in opposite directions.
Simpler reference voltage, since (V1+V0)/2 always equals VDD/2.
Although it is a 1T-2C structure, each C is smaller than in 1T-1C to get a small signal level on BL.
The read operation uses t4 and t5 for write-back.
Cross-Point Array of Ferroelectric Gain Cells
Memory architecture without PL and destructive read.
Consists of an array of gain cells.
Two caps form a capacitor divider, and the transistor amplifies the result.
In standby, WL = BL = VDD.
In a read, BL is precharged to VDD and WL is lowered slightly. A BL whose cell stores “0” carries a larger current than a BL whose cell stores “1”.
Chain FRAM (NAND Architecture)
- Similar to NAND flash.
- Organized in units of cell blocks.
- A cell block is terminated by a BL and a PL, one at each end.
In standby, all WL = VDD.
In active operation, WLx = 0 V and Block-Select (BS) is raised. The other WLs remain high, allowing the BL voltage and PL voltage to reach the selected cell.
Increasing the number of cells in a cell block increases density but also increases readout delay.
1024 cells per bit line and 16 cells per cell block reduces area by 63%.
Architecture Summary
Future Trends

Progress in density, access time, and SoC integration can be expected.
  - 62kb and 256kb have been achieved, with 1Mb expected.
  - Access time hasn’t improved, but it can through circuit innovation.
It is easier to integrate FRAM into an SoC than EEPROM.
ULTRALOW POWER DATA STORAGE FOR SENSOR NETWORKS
Gaurav Mathur, Peter Desnoyers, Deepak Ganesan, Prashant Shenoy
Motivation
What is the most energy-efficient storage platform for sensor networks, and what are the implications for sensor network design?

Results
  - Parallel NAND flash is 100X more energy-efficient than other flash memories and the radio on the MicaZ.
Background



NOR flash is less dense than NAND and uses more energy for erase and programming, but provides random read access times of less than 100 ns.
NAND flash has a significantly higher startup latency, but can stream subsequent bytes at high speed since it is always page-oriented.
Writes are “one-way”: an erase is needed before the next write. A microcontroller translates disk-like operations to the NAND interface and takes care of erasure, page remapping, ECC, and wear leveling. This also increases power consumption.
Flash Energy Consumption


Measured on a Mica mote using a 10 Ω resistor and a 3.3 V supply.
Toshiba NAND is 21X more efficient than Telos NOR.
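One way such a measurement can be turned into an energy figure is sketched below. It assumes the resistor sits in series with the supply and that the voltage across it is sampled at a fixed interval. The function name, the sample trace, and the sampling interval are hypothetical, and the small voltage drop across the resistor is ignored.

```python
def energy_joules(v_resistor_samples, dt, r_sense=10.0, v_supply=3.3):
    """Integrate supply power over time from resistor-voltage samples.

    Current is I = V_R / R, power drawn from the supply is V_supply * I,
    and energy is the sum of power * dt over all samples (rectangle rule).
    """
    return sum(v_supply * (v_r / r_sense) * dt for v_r in v_resistor_samples)


# Hypothetical trace: 0.1 V across the resistor (10 mA) for 4 samples of 1 ms.
e = energy_joules([0.1, 0.1, 0.1, 0.1], dt=1e-3)
```

Dividing such an energy by the number of bytes read or written gives a per-byte figure that can be compared across devices.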
Effect of Data Size on Energy Consumption


A read operation has a smaller energy overhead than a write operation.
A write buffer amortizes the fixed cost over a larger number of data bytes.
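The amortization argument can be made concrete with a simple cost model. It assumes each flash operation costs a fixed overhead plus a per-byte term. The function name and the energy values are hypothetical and are in arbitrary units, not measurements from the paper.

```python
def energy_per_byte(n_bytes, e_fixed, e_byte):
    """Average energy per byte when one operation transfers n_bytes at once."""
    return (e_fixed + n_bytes * e_byte) / n_bytes


# Hypothetical costs: fixed overhead of 10 units, 0.5 units per byte.
unbuffered = energy_per_byte(1, e_fixed=10.0, e_byte=0.5)    # one byte per write
buffered = energy_per_byte(100, e_fixed=10.0, e_byte=0.5)    # 100 bytes per write
```

As the buffer grows, the average approaches the per-byte term, so most of the saving comes from the first few doublings of the buffer size.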
Idle Current
NOR and NAND devices have idle currents between 2 µA and 5 µA, smaller than the mote CPU’s 5 µA to 15 µA or the 10 µA self-discharge current of an AA battery.

NOR and NAND devices have idle currents 17X smaller than MMC.
Summary



Parallel NAND flash is the most energy-efficient storage for sensor networks.
A desirable device would have the performance of parallel NAND and the pin count of serial NAND flash.
ECC is better handled by the microcontroller during idle cycles.
Implication on Sensor Systems





Compares the energy consumption of flash to that of the CPU and radio.
Writing a byte to flash is 11X more expensive than computation.
Radio transmission of a byte costs 200X a flash write access and 500X a flash read access.
Suggests that storage energy should be part of the trade-off.
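The ratios quoted above can be put on one scale with a few lines of arithmetic. This is only an illustration: it assumes all three ratios refer to the same per-byte basis, which the slide does not state explicitly, and the constant and function names are hypothetical.

```python
# Relative per-byte energy, normalized so that computation = 1.
COMPUTE = 1.0
FLASH_WRITE = 11 * COMPUTE      # a flash write is 11X computation
RADIO_TX = 200 * FLASH_WRITE    # radio transmission is 200X a flash write
FLASH_READ = RADIO_TX / 500     # radio transmission is 500X a flash read


def cheaper_to_store(n_writes=1, n_reads=1):
    """True if the given flash writes and reads of a byte cost less than sending it once."""
    return n_writes * FLASH_WRITE + n_reads * FLASH_READ < RADIO_TX
```

Under these assumptions a byte can be written and re-read many times before the storage cost reaches that of a single radio transmission, which is why storage belongs in the trade-off.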
Applications that benefit:
  - In-network query processing
  - Use of history
  - Network-level compression
  - Custody transfer
Re-thinking Sensor Net Design



A sensor network service involves three operations: computation, storage, and communication.
These operations are characterized by two parameters: frequency and magnitude.
They are modeled using a sensor service emulator.
Impact on Communication Service



NAND flash provides a significant energy gain for batch sizes greater than 128 bytes.
At a 1% duty cycle, it achieves 3.8 times less energy per byte with a batch size of 512 bytes and a 58-times improvement for a batch size of 65 kbytes.
The 7.5% duty cycle has a smaller preamble, resulting in a lower fixed energy cost per packet.
Impact on Data Aggregation




Effect of compression on energy consumption.
Three types of compression: lossless encoding, lossy encoding, and feature extraction.
Uses a benchmark wavelet compression scheme optimized for operation without floating point, with a computational complexity of 60N.
Concludes that data aggregation yields a 10X saving in energy consumption.
Conclusion



Parallel NAND flash is 100-fold more energy efficient than serial NOR flash.
This observation has implications for sensor network design.
The data show that communication and data aggregation achieve at least an order-of-magnitude energy reduction.
THE MISSING MEMRISTOR FOUND
Dmitri B. Strukov, Gregory S. Snider, Duncan R. Stewart & R. Stanley Williams
The four fundamental two-terminal circuit elements
Operation