Converting Behavioral Verilog to Transistor Counts

E-Voting Machine - Design Presentation
• Group M1
Mon. Sept 29
System Hardware Component Diagram
Gate-level Data path
Updated Transistor Estimates
Floorplan
Bohyun Jessica Kim
Jonathan Chiang
Chi Ho Yoon
Donald Cober
Secure Electronic Voting Terminal
Status Update
• Behavioral Verilog Entire System
• Gate-level Hardware Block Diagram
• Updated Transistor Count Calculations
• Initial Floorplan
• Structural Verilog Entire System
• Refined Floorplan
[Figure: gate-level hardware block diagram. Blocks: Card Reader, Machine Init FSM, Encryption Key SRAM, Key Register, Fingerprint Scanner, User ID SRAM, User ID FSM, Selection FSM, Selection Counter, Write-in SRAM, User Input, Choice SRAM, COMMS Register, Message ROM, Display, Shift Register In, Shift Register Out, TX_Check, Confirmation FSM, Data Bus. Datapath elements: 8-bit MUXes, 8-bit Full Adders, 8-bit Add/Sub, XORs, 8-bit REGs]
SUPER MUX!
SuperMux:
• Our data flow consists of shuffling 8 bits of data from a source to a destination
• These sources and destinations are SRAMs, User Input, Comms, etc.
• Many are bidirectional
• Since only one piece of data will be sent at a time, it makes sense to use a bus configuration for data movement rather than a set of giant muxes
• We can gate which srcs/dests (drop points) are connected to the bus with one level of pass logic
• This way the data will only ever go through two layers of pass logic to
  – Get onto the bus
  – Get off of the bus
• We will still call this the SuperMux for legacy purposes
• Layout will be fun
[Figure: data[7:0] bus with a row of drop points attached]
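As a rough behavioral illustration of the drop-point idea above, the following Python sketch models an 8-bit shared bus on which one source and one destination are enabled per transfer. The class name `DataBus` and the drop-point names are hypothetical and chosen only for this example; the real design uses T-gate pass logic, not software.

```python
class DataBus:
    """Behavioral model of an 8-bit bus with named drop points (names are hypothetical)."""

    def __init__(self, drop_points):
        # Each drop point holds one byte; all start at 0.
        self.values = {name: 0 for name in drop_points}

    def transfer(self, src, dest):
        # First layer of pass logic: only the selected source drives the bus.
        bus = self.values[src] & 0xFF
        # Second layer of pass logic: only the selected destination takes the byte.
        self.values[dest] = bus
        return bus


bus = DataBus(["CARD", "KEY_SRAM", "COMMS"])
bus.values["CARD"] = 0x5A
bus.transfer("CARD", "KEY_SRAM")
print(hex(bus.values["KEY_SRAM"]))  # 0x5a
```

Every transfer passes through exactly two selection steps regardless of how many drop points exist, which is the property the slide relies on.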
Tiny Encryption Algorithm Project Specs
Original Implementation:
64-bit blocks: Two 32-bit inputs
128-bit key: Four 32-bit keys (K[0], K[1], K[2], K[3])
Feistel Structure: Symmetric structure used in block ciphers
“Magic” constant:
9E3779B9 (Delta) = 2^32 / 1.6180339887 (golden ratio)
64 Feistel rounds = 32 cycles
E-Voting Machine Implementation:
16-bit blocks: Two 8-bit inputs
32-bit key: Four 8-bit keys
32 Feistel rounds = 16 cycles
Decision:
Scale the golden ratio (about 1.6) up by a factor of 10 to 16, scale 2^16 by 10 to 655360, and divide 655360 / 16 to get Delta. This avoids floating point in the key scheduler.
New Delta = 0xA000; truncate the least significant hex digit to 0xA00 so the sum fits in 16 bits when decrypting, since 0xA00 * 8 cycles = 0x5000
Hardware:
4, 5-bit Shifters
16-bit Multipliers
16-bit Adder / Subtractor
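The Delta arithmetic above can be checked with a few lines of Python. This is only a verification sketch of the numbers quoted on the slide; the variable names are made up for the example.

```python
import math

# Original TEA constant: 2^32 divided by the golden ratio.
phi = (1 + math.sqrt(5)) / 2
original_delta = int(2**32 / phi)

# Scaled-down version from the slide: (2^16 * 10) / 16, integer-only.
scaled_delta = (2**16 * 10) // 16

print(hex(original_delta))  # 0x9e3779b9
print(hex(scaled_delta))    # 0xa000
print(hex(0xA00 * 8))       # 0x5000
```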
COMMS BLOCK Hardware Implementation 1
State            (1)        (2)        (3)        (4)        (5)
inA[7:0]         delta      v1         v1 << 4    v1 >> 5    v0
inB[7:0]         sum[7:0]   sum[7:0]   k0         k1         out3
sel_out          0          0          1          1          0
sel_shift[1:0]   00         01         10         11         1
sel_sum          0          1          0          0          1
v_out[7:0] per state:
(1) v_out0 = sum[7:0]
(2) v_out1 = (C+D)
(3) v_out2 = (A+B) ^ (C+D)
(4) v_out3 = (A+B) ^ (C+D) ^ (E+F)
(5) v_outx = V0 + (A+B) ^ (C+D) ^ (E+F)
States (6)-(9) same as above except using k2, k3, and flip v1, v0
Implementation goes through 9 states/clk cycles each
iteration to update output function v_outx.
[Figure: Implementation 1 datapath. Components: two 3:1 8-bit MUXes, 2:1 8-bit MUXes (T: 32 each), 4:1 8-bit MUX (T: 64), 8-bit Full Adder/Sub (T: 128), XOR (T: 48), 1-bit REG, 8-bit REG (T: 88). Signals: inA[7:0], inB[7:0], sel_shift[1:0], sel_sum, 8'h00, clk, v_outx]
Logical Shifter Code
inA[7:0]   sel_shift[1:0]
delta      00
v1         01
v1 << 4    10
v1 >> 5    11
sum += delta;
v0 += ((v1<<4)+k0) ^ (v1+sum) ^ ((v1>>5)+k1);
v1 += ((v0<<4)+k2) ^ (v0+sum) ^ ((v0>>5)+k3);
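A minimal Python sketch of the scaled-down round function above, using 8-bit words, four 8-bit keys and 8 cycles as described in the project specs. The `DELTA` value and the example key are placeholders picked for illustration, not the constants used in the design, and the function names are hypothetical. The decryption routine simply runs the same steps in reverse, which is enough to show that the Feistel structure is invertible.

```python
MASK = 0xFF    # 8-bit words, per the scaled-down design
DELTA = 0x9E   # placeholder constant for illustration only


def _mix(v, s, ka, kb):
    # ((v << 4) + ka) ^ (v + s) ^ ((v >> 5) + kb), as in the round function
    return ((v << 4) + ka) ^ (v + s) ^ ((v >> 5) + kb)


def tea8_encrypt(v0, v1, key, delta=DELTA, cycles=8):
    k0, k1, k2, k3 = key
    s = 0
    for _ in range(cycles):
        s = (s + delta) & MASK
        v0 = (v0 + _mix(v1, s, k0, k1)) & MASK
        v1 = (v1 + _mix(v0, s, k2, k3)) & MASK
    return v0, v1


def tea8_decrypt(v0, v1, key, delta=DELTA, cycles=8):
    k0, k1, k2, k3 = key
    s = (delta * cycles) & MASK
    for _ in range(cycles):
        # Undo the two half-rounds in the opposite order.
        v1 = (v1 - _mix(v0, s, k2, k3)) & MASK
        v0 = (v0 - _mix(v1, s, k0, k1)) & MASK
        s = (s - delta) & MASK
    return v0, v1


key = (0x12, 0x34, 0x56, 0x78)  # example key, not from the slides
cipher = tea8_encrypt(0xAB, 0xCD, key)
print(tea8_decrypt(*cipher, key) == (0xAB, 0xCD))  # True
```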
Reuses:
(1x) 8-bit Full adder/sub (Ripple carry) [16*8 = 128]
(2x) 2:1 8-bit MUX for output pass-through [4*8*2 = 64]
(8x) 2-input XORs [6*8 = 48]
(1x) 8-bit REG [11*8 = 88]
(1x) 4:1 8-bit MUX for shifting selection [12*8 = 96]
In addition, the logic will iterate 8 times and be controlled via an FSM that uses:
(2x) 3:1 8-bit MUX for state input selection [8*8*2 = 128]
(2x) 1-bit Counter adder for updating cycle [16*2 = 32]
(2x) 1-bit REG for storing updated cycle [11*2 = 22]
Total: 606
Advantages:
Saves transistors and area for the Comms Block.
Disadvantages:
Very heavy pass logic from MUX layers and XOR.
High clk frequency required, since the same components are reused to calculate outx in stages. This translates to higher power consumption, since we are trying to do more with less hardware.
Tradeoff:
Every 8-bit MUX uses 4*8 = 32 transistors, compared to 16*8 = 128 transistors for an 8-bit Full Adder. However, MUXes have heavy pass logic, so the concern here is an area vs. power tradeoff.
COMMS BLOCK Hardware Implementation 2
Implementation 2 does concurrent calculations for all 3 parts of the function and completes a full iteration of calculations in 2 clk cycles.
sel_out   output
0         pass sum, V1
1         pass new sum, V0
[Figure: Implementation 2 datapath. Components: 8-bit Add/Sub (T: 128), three 8-bit Full Adders (T: 128 each), 2:1 8-bit MUXes (T: 32 each), 8-bit REG (T: 88). Signals: sum, delta, V0, V1, K0, K1, sel_out, clk. Shift wiring: {V1[3:0], 4'b0} for V1 << 4 and {5'b0, V1[7:5]} for V1 >> 5]
Uses:
(1x) 8 bit Full adder/sub (Ripple carry) [16*8 = 128]
(3x) 8 bit Full adder (Ripple carry) [16*8*3 = 384]
(4x) 2:1 8 bit MUX for output pass-through [4*8*4 = 128]
(16x) 2-input XORS [6*16 = 96]
(2x) 8 bit REG [11*8*2 = 176]
(1x) 1 bit Counter adder for updating cycle [16]
(1x) 1 bit REG for storing updated cycle [11]
Total: 939
In addition, logic will not need complex FSM, just
needs to do 8 iterations.
sum += delta;
v0 += ((v1<<4)+k0) ^ (v1+sum) ^ ((v1>>5)+k1);
v1 += ((v0<<4)+k2) ^ (v0+sum) ^ ((v0>>5)+k3);
Advantages:
Low pass logic, speed performance, low power; MUX logic transistor count essentially halved.
Disadvantages:
Higher transistor count and larger area.
Tradeoff:
Larger area, but less pass logic from fewer MUXes and a simpler FSM simplifies the design, increases speed and minimizes power.
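The two component lists can be tallied side by side with a short Python check. The per-component transistor costs are the ones quoted on the slides (16 per full-adder bit, 4 per 2:1 MUX bit, 6 per XOR, 11 per register bit); the dictionary names are just labels for this sketch.

```python
impl1 = {
    "1x 8-bit full adder/sub": 16 * 8,
    "2x 2:1 8-bit MUX": 4 * 8 * 2,
    "8x 2-input XOR": 6 * 8,
    "1x 8-bit REG": 11 * 8,
    "1x 4:1 8-bit MUX": 12 * 8,
    "2x 3:1 8-bit MUX": 8 * 8 * 2,
    "2x 1-bit counter adder": 16 * 2,
    "2x 1-bit REG": 11 * 2,
}

impl2 = {
    "1x 8-bit full adder/sub": 16 * 8,
    "3x 8-bit full adder": 16 * 8 * 3,
    "4x 2:1 8-bit MUX": 4 * 8 * 4,
    "16x 2-input XOR": 6 * 16,
    "2x 8-bit REG": 11 * 8 * 2,
    "1x 1-bit counter adder": 16,
    "1x 1-bit REG": 11,
}

total1 = sum(impl1.values())
total2 = sum(impl2.values())
mux1 = sum(v for k, v in impl1.items() if "MUX" in k)
mux2 = sum(v for k, v in impl2.items() if "MUX" in k)

print(total1, total2)  # 606 939
print(mux1, mux2)      # 288 128
```

Implementation 2 costs 333 more transistors overall but spends less than half as many on MUXes, which matches the area vs. pass-logic tradeoff described above.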
E-Voting TEA Gate Level Hardware
Full Adder
Common full adder
Mirror Adder
-Uses 28 transistors (including 4 transistors in inverters)
-NMOS and PMOS networks are completely symmetrical
logic :
S = a ⊕ b ⊕ Carryin
Carryout = (a ⊕ b) • Carryin +(a • b)
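The sum and carry equations above can be sanity-checked in Python by chaining eight 1-bit adders into a ripple-carry adder and comparing against ordinary integer addition. The function names are hypothetical; this models only the logic, not the transistor-level implementation.

```python
def full_adder(a, b, cin):
    # S = a xor b xor Carryin; Carryout = (a xor b) * Carryin + (a * b)
    p = a ^ b
    s = p ^ cin
    cout = (p & cin) | (a & b)
    return s, cout


def ripple_add8(x, y, cin=0):
    # Chain eight 1-bit full adders, least significant bit first.
    result = 0
    carry = cin
    for i in range(8):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


print(ripple_add8(200, 100))  # (44, 1)
```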
E-Voting TEA Gate Level Hardware
Full Adder
What we decided to use in this project…
1-bit full adder
-Uses pass-transistor logic for computing XNOR
-Sum bit equals A^B^Cin, where A and B are the 2 inputs and Cin is the carry-in input; a mux at the bottom selects the Cout bit.
-Will use this adder 8 times to compute all 8 bits of data
-Uses inverters to strengthen the signal at the end of each XNOR
-Uses only 16 transistors yet keeps a strong signal
E-Voting TEA Gate Level Hardware
XOR
XOR
-To avoid using two t-gates
-Uses 6 transistors (XNOR + inv)
MUX
T-gate Mux
-4 transistors
-very tiny hence difficult to layout
E-Voting TEA Gate Level Hardware
REG
TSPC Register
-True single phase clock flip-flop
-Advantage of single clock distribution, small area for clock lines, high speed and no clock skew
-We will use 8T instead of 9T
SRAM Gate Level Hardware
SRAM Cell
-6T SRAM Cell
-smaller transistor size
-lower energy dissipation
-efficient layout
SRAM Gate Level Hardware
Address Decoder
-Combination of inverters and nand gates
SRAM Gate Level Hardware
SRAM
-Input/Output tri-state buffers?
-Need of Sense amplifier?
[Figure: Machine Initialization FSM interface. Blocks: Card Reader, Data Bus, COMMS, Encryption Key SRAM (4 byte), Message ROM, User Input. Signals: 8bit Data, 1bit Card Detected Signal, 1bit Data Ready, 1bit Message, 4-bit Data bus control, 1bit Activate next, 2bit Address, 1bit Activate this, 1bit Reactivate this, 1bit Yes Signal, 1bit No Signal]
[Figure: User ID FSM interface. Blocks: Card Reader, Fingerprint Scanner, Data Bus, User ID SRAM (8 byte), COMMS, Message ROM, Display. Signals: 8bit Data, 1bit Card Detected Signal, 1bit Finger Scanned Signal, 3bit Address, 1bit Data Ready, 2bit Message, 7-bit Data bus control, 1bit Activate next, 1bit Activate this, 1bit Reactivate this, 1bit Next Page Signal]
[Figure: Selection FSM interface. Blocks: Selection Counter, Data Bus, Choice SRAM (4 byte), COMMS, Message ROM, Display, User Input, TX_Check. Signals: 8bit Data, 1bit Data Ready, 2bit Address, 2bit Message, 6-bit Data bus control, 1bit Activate next, 3bit Count, 1bit Previous Page Signal, 1bit Activate this, 1bit Yes Signal, 1bit No Signal]
[Figure: Confirmation FSM interface. Blocks: Data Bus, User ID SRAM (8 byte), Choice SRAM (4 byte), Write-in SRAM (64 byte), COMMS, Message ROM, Display. Signals: 1bit TX_good, 8bit Data, 3bit Address, 2bit Address, 6bit Address, 1bit Reset, 1bit Data Ready, 2bit Message, 8-bit Data bus control, 1bit Reactivate Selection, 1bit Reactivate User ID]
SUPER MUX!
The statement that we only transfer one byte of data at a time is technically false.
For example: when the Message ROM is sending a message to the COMMS, the COMMS are using data from the Encryption Key SRAM to encode the message.
[Figure: Encryption Key SRAM (4 byte) and Message ROM on the Data Bus, both feeding COMMS]
We can circumvent this by hardwiring the Encryption Key SRAM data to the COMMS Key input in addition to attaching it to the bus. This only works because the Key SRAM will never be active on the data bus while the COMMS are accessing it.
SUPER MUX!
Other hardwired connections:
TX Check to Choice SRAM
The transmission check confirms that the data sent to the main computer and held in its current session matches the choices stored in our SRAM.
During the Confirmation FSM the SRAM data is sent to the main computer and the main computer echoes it back.
The echo is streamed into the TX Check (as well as the display) and the TX Check compares it (as it is streaming) to the Choice SRAM.
User Input to Write-In SRAM
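The streaming comparison done by the TX Check can be pictured with a small Python sketch. The function name `tx_check` and the sample byte values are hypothetical; the point is only that each echoed byte is compared against the stored choice as it arrives, and a single sticky flag records any mismatch.

```python
def tx_check(choice_sram, echo_stream):
    """Compare echoed bytes against stored choices one byte at a time."""
    tx_good = True
    for stored, echoed in zip(choice_sram, echo_stream):
        # One byte arrives per step; any mismatch clears the flag for good.
        if stored != echoed:
            tx_good = False
    return tx_good


choices = [0x01, 0x03, 0x02, 0x00]  # example contents of the 4-byte Choice SRAM
print(tx_check(choices, [0x01, 0x03, 0x02, 0x00]))  # True
print(tx_check(choices, [0x01, 0x03, 0x07, 0x00]))  # False
```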
Converting Behavioral Verilog to Transistor Counts
Machine Init FSM
    //Initialize
    actNext = 0;
    state = 0;
    next_state = 1'b0;

    case (state)
      `s1: begin
        mux_src = 0;
        mux_dest = 0;
        //Wait for card data
        if(cardDetectSig) begin
          //Send card data to the Key SRAM
          next_address = 0;
          next_state = `s2;
        end
      end
      `s2: begin
        mux_src = `CARD_SRC;
        mux_dest = `KEY_SRAM_DEST;
        //read in 4 bytes from card reader
        if(address==3) begin
          next_state = `s3;
        end
        next_address = address + 1;
      end
      `s3: begin
        mux_src = `MESSAGE_SRC;
      end
      `s4: begin
        mux_src = 0;
        mux_dest = 0;
        next_address = 0;
        //Wait for data to arrive
        if(commDetectSig==0) begin
          next_state = `s4;
        end else begin
          next_state = `s5;
        end
      end
      `s5: begin
        //Send a key request to the comms
        message = `KEY_REQUEST;
        mux_dest = `COMMS_DEST;
        next_state = `s4;
      end
      //remaining state not shown on the slide
    endcase
[Figure: Machine Init FSM state register and output logic. D flip-flops with Q/~Q outputs drive the src (CARD, MESSAGE), dest (KEY, COMMS), message (KEY_REQUEST) and NEXT outputs]
1. Create registers:
• 6 states => 3 D-flip-flops
• + 2bit SRAM address
2. State Change Logic:
• Most changes are sequentially incrementing
• Flip-flops are configured as counters
3. Further Logic:
• Remaining logic consists of output signals generated mostly by state
• Random logic can be approximated based on number and configuration of outputs
5 distinct 1bit outputs
Each 1-bit output is derived from a 3-bit input (state)
Approx. 2 two-input gates for each output
~10 transistors for each distinct output
50 transistors total for random logic
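The three-step estimate above can be written as a small Python function. This is a sketch of the counting rule only: it assumes 11 transistors per register bit (the figure used for REGs in the COMMS block counts) and 10 transistors per distinct output, and the function name is hypothetical.

```python
TRANSISTORS_PER_REG_BIT = 11  # per-bit register cost used elsewhere in the slides
TRANSISTORS_PER_OUTPUT = 10   # "~10 transistors for each distinct output"


def fsm_estimate(states, address_bits, distinct_outputs):
    # Step 1: state flip-flops (enough bits to encode every state) plus address bits.
    state_bits = (states - 1).bit_length()
    registers = state_bits + address_bits
    # Step 3: random logic approximated from the number of distinct outputs.
    random_logic = distinct_outputs * TRANSISTORS_PER_OUTPUT
    total = registers * TRANSISTORS_PER_REG_BIT + random_logic
    return registers, random_logic, total


print(fsm_estimate(6, 2, 5))    # (5, 50, 105)   Machine Init FSM
print(fsm_estimate(12, 3, 13))  # (7, 130, 207)  User ID FSM
print(fsm_estimate(7, 2, 9))    # (5, 90, 145)   Selection FSM
```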
Converting Behavioral Verilog to Transistor Counts (cont)
Block               States   Address   Registers   Distinct Outputs   Random   Transistors
Machine Init FSM    6        2 bits    5           5                  50       105
User ID FSM         12       3 bits    7           13                 130      207
Selection FSM       7        2 bits    5           9                  90       145
Confirmation FSM    9        6 bits    10          8                  80       170
User Input          NA       6 bits    14          20                 90       244
Selection Counter   NA       NA        3           3                  0        33
TX Compare          NA       2 bit     3           1                  0        33

Block           Points on Bus   T-gates   Transistors
Data Bus MUX    13              104       208

Block          Messages     Inputs   ~ Gates / Bit        Transistors
Message ROM    8 (1 byte)   8        7 (35 transistors)   280

Total: 1425
Converting Behavioral Verilog to Transistor Counts (cont)
Block           Bits   Address transistors   Transistors
Key SRAM        32     8*(2^2)+2*2 = 36      228
User ID SRAM    64     8*(2^3)+2*3 = 70      454
Choice SRAM     32     8*(2^2)+2*2 = 36      228
Write-In SRAM   512    8*(2^6)+2*6 = 524     3596

Block              Bits        Transistors
COMMs              <slide 7>   939
Shift IN           8           88
Shift Out          8           88
Input/Output MUX   8           32
Register           16          176

Total: 7254
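The SRAM rows and the grand total can be reproduced with a short Python check. It assumes 6 transistors per bit (the 6T cell chosen earlier) plus the address-transistor formula shown in the table, 8*(2^n) + 2*n for n address bits; the function and variable names are made up for this sketch.

```python
def sram_transistors(address_bits, word_bits=8):
    words = 2 ** address_bits
    cells = 6 * words * word_bits            # 6T cell per stored bit
    address = 8 * words + 2 * address_bits   # address-transistor formula from the table
    return cells + address


srams = {
    "Key SRAM": sram_transistors(2),
    "User ID SRAM": sram_transistors(3),
    "Choice SRAM": sram_transistors(2),
    "Write-In SRAM": sram_transistors(6),
}
print(srams)  # {'Key SRAM': 228, 'User ID SRAM': 454, 'Choice SRAM': 228, 'Write-In SRAM': 3596}

control_total = 1425                      # FSMs, bus MUX and Message ROM (previous slide)
comms_total = 939 + 88 + 88 + 32 + 176    # COMMs, Shift IN, Shift Out, I/O MUX, Register
grand_total = control_total + sum(srams.values()) + comms_total
print(grand_total)  # 7254
```

The Write-In SRAM alone accounts for roughly half of the total, so its size is the main lever on overall transistor count.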
[Figure: floorplan. Blocks: Encryption Key SRAM, COMMS, Machine Init FSM, User ID SRAM, User ID FSM, Choice SRAM, Confirmation FSM, Comm Register, User Input, Write-In SRAM, MUX, Selection FSM, Shift In, Shift Out]
Questions?
Thank you!