Stream Cyphers for FPGA Security
Download
Report
Transcript Stream Cyphers for FPGA Security
Shemal Shroff
Shoaib Bhuria
Yash Naik
Peter Hall
Introduction to Security
Relevance to FPGA
Design and Manufacture flow for an FPGA
Things to secure and why?
Types of Attack
Prevention
PUFs
Provisions and policies adopted by a network administrator
To prevent and monitor:
Unauthorized access,
Misuse,
Modification,
Denial of a computer network and network-accessible resources.
Simmonds, A; Sandilands, P; van Ekert, L (2004). "An Ontology for Network Security Attacks". Lecture Notes in Computer Science. Lecture Notes in
Computer Science 3285: 317–323
Research on “FPGA Security” has been active since the early 2000s.
Several commercial and military applications employ programmable logic.
This makes design security important for safety and national security.
WP365, Solving Today’s Design Security Concerns, Xilinx White Paper.
To learn the confidential cryptographic key.
One-to-one copy or “cloning” together with its key.
Reverse engineering of encryption algorithm.
Execute certain cryptographic operation with presumably secret key.
E.g. pay-tv and in-government communications
Thomas Wollinger and Christoff Paar, Security Aspects of FPGAs in Cryptographic Applications in New Algorithms, Architectures and
Applications for Reconfigurable Computing, Springer, 2005, Ch. 21, pp 265-278.
Saar Drimer, Volatile FPGA Design Security – A Survey, v0.96, April 2008.
Figure: Simplified depiction of the FPGA design, manufacturing, packaging, and
testing processes.
Saar Drimer, Volatile FPGA Design Security – A Survey, v0.96, April 2008.
Figure: Development, manufacturing,
and distribution of an FPGA-based
system.
Saar Drimer, Volatile FPGA Design Security – A Survey, v0.96, April 2008.
B. Dipert. Cunning circuits confound crooks. http://www.e-insite.net/ednmag/contents/images/21df2.pdf, October 12 2000.
Bitstream
Configuration of the device
Bitstream has all the configuration bits required for programming the FPGA.
If the bitstream is compromised then your design can be cloned or reverse
engineered.
To protect the logic of FPGA
To prevent manipulation of design using JTAG.
Single Event Upset (SEU) or faults
Verify that the application is trusted to be correct.
Authenticate the application.
Black box
Attack
Reverse
engineering
Bitstream
Cloning of
sRAM FPGAs
Readback
Attack
Side Channel
Attack
Attacks
Fault injection
Hardware
virus
Configuration
of the device
Manipulating
design
through JTAG
Voltage
modification
Temperature
1.
Black Box Attack
2.
Reverse-Engineering of the
Bitstreams
3.
Cloning of sRAM FPGAs
4.
Readback Attack
5.
Side Channel Attacks
Step
1: The attacker inputs all
possible combinations, while saving
the corresponding outputs.
Step 2: Develops a K-map to simplify
the resulting tables
Step 3: Extracts the logic of the FPGA.
Thomas Wollinger and Christoff Paar, Security Aspects of FPGAs in Cryptographic Applications in New Algorithms, Architectures and Applications for Reconfigurable
Computing, Springer, 2005, Ch. 21, pp 265-278.
A
C
AB 00
01
11
10
B
C
Output
(Y)
0
0
0
0
0
0
1
0
0
0
1
0
0
0
1
0
1
1
0
0
0
0
0
1
1
0
1
0
0
0
1
0
1
0
1
1
0
0
1
1
1
0
Y = (A.B)’.B.C’
= A’BC’
Not a real threat nowadays, due to:
complexity of the designs
size of state-of-the-art FPGAs.
Common I/O pins which makes it difficult to connect to the right pin.
An attacker has to connect to device’s pin of a known function like,
Microprocessor interrupt input,
And also,
Figure out whether to:
Drive a pin with a voltage,
Sense its output state, or both isn’t a straightforward exercise.
B. Dipert. Cunning circuits confound crooks. http://www.e-insite.net/ednmag/contents/images/21df2.pdf, October 12 2000.
Thomas Wollinger and Christoff Paar, Security Aspects of FPGAs in Cryptographic Applications in New Algorithms, Architectures and Applications for Reconfigurable
Computing, Springer, 2005, Ch. 21, pp 265-278.
A = 32 bits
B = 32 bits
Adder
Output
We have, in total, 264 input combinations.
Lets assume that latency for the adder is 10 ns.
Therefore, time to apply all the combinations is 264 x10 ns.
This takes approximately 5849 years which is equivalent to 5.849 x 1011 hours.
Reconstructing the original circuit details
Altering the design
Incorporating it in other designs
Reverse Engineering
Saar Drimer, Volatile FPGA Design Security – A Survey, v0.96, April 2008.
Thomas Wollinger and Christoff Paar, Security Aspects of FPGAs in Cryptographic Applications in New Algorithms, Architectures and Applications for Reconfigurable
Computing, Springer, 2005, Ch. 21, pp 265-278.
These are the toughest to crack.Why?
Increase in gate counts w.r.t number of
I/O pins
Antifuse
Encryption
PUFs
B. Dipert. Cunning circuits confound crooks. http://www.einsite.net/ednmag/contents/images/21df2.pdf, October 12 2000.
Security implications of storing data unprotected and external to FPGA
Non-volatile memory
Transmitted during power up
Vulnerability = can be easily eavesdropped
Feasible
Thomas Wollinger and Christoff Paar, Security Aspects of FPGAs in Cryptographic Applications in New Algorithms, Architectures and Applications for Reconfigurable
Computing, Springer, 2005, Ch. 21, pp 265-278.
Non-volatile + FPGA on one chip
Battery-Backed RAM
eFUSE
Device DNA
Encryption
PUFs
Thomas Wollinger and Christoff Paar, Security Aspects of FPGAs in Cryptographic Applications in New Algorithms, Architectures and Applications for Reconfigurable
Computing, Springer, 2005, Ch. 21, pp 265-278.
Battery-Backed RAM
256-bit key stored in volatile on-chip memory cells.
Must receive continuous power from the external battery.
eFUSE
securely store bitstream decryption key.
No BB-RAM and external battery.
The OTP eFUSE links are permanently programmed.
No need battery backup.
Device DNA
Virtex-6 has embedded, unique device identifier (Device DNA).
unique 57-bit identifier is nonvolatile and permanently programmed
Present in all FPGAs.
For easy debugging.
Read the configuration of FPGA through JTAG.
Thomas Wollinger and Christoff Paar, Security Aspects of FPGAs in Cryptographic Applications in New Algorithms, Architectures and Applications for Reconfigurable
Computing, Springer, 2005, Ch. 21, pp 265-278.
A security bit can be used to prevent the readback functionality.
Although, fault injection has proven successful to overcome these countermeasures
in FPGA.
PUFs
side channel can leak important information.
Side channel can be:
power consumption
Light
Electromagnetic radiation.
Power analysis of bitstream
A. Bogdanov, A. Moradi et. Al, efficient and side-channel resistant authenticated encryption of FPGA Bitstreams, International Conference on Reconfigurable
computing and FPGAs, 2012.
Thomas Wollinger and Christoff Paar, Security Aspects of FPGAs in Cryptographic Applications in New Algorithms, Architectures and Applications for
Reconfigurable Computing, Springer, 2005, Ch. 21, pp 265-278.
Magnetic field surrounding FPGA
Loop antenna to pick variations of field
160 bit EC point Multiplication
Prior info of Encryption is must
Power trace from an RSA operation
Uses standard square and multiply
Square and multiply operations have visibly different power profiles
‘1’ relates to squaring step followed by a multiplication step
‘0’ in the exponent involves only a squaring step
CMOS transistors emit photons.
Electrons gain energy when current flows.
Emission energy is much higher for transition 0->1 than 1->0
To observe the light emitted, the chip needs to be opened either from its backside
or front side, depending on its package type.
Photons collected by high sensitivity photon sensor.
InGaAs detectors have best quantum efficiency.
J.Di. Battista, J. Courrege, B. Rouzeyre, L. Torres and P. Perdu, “When Failure Analysis meets Side-Channel Attacks”, CHES 2010, IACR, Santa Barbara,
California, USA.
First the light emission activity is localized by turning the cryptoprocessor is
on/off.
It is not necessary to know either the architecture of the algorithm, or its
implementation.
This technique is now less used because of the increasing number of metal layers
which act as a light screen.
There are two kinds of countermeasures: Hardware and software
Software countermeasures refer to algorithmic changes, such as masking of secret
keys with random values,
More Complex Algorithms
Hardware countermeasures often deal either with some form of power trace
smoothing or with transistor-level changes of the logic.
This technique is now less used because of the increasing number of metal layers
which act as a light screen.
Thomas Wollinger and Christoff Paar, Security Aspects of FPGAs in Cryptographic Applications in New Algorithms, Architectures and Applications for Reconfigurable
Computing, Springer, 2005, Ch. 21, pp 265-278.
Temperature Modification
Voltage Modification
Fault Injection/Single Event Upsets
Hardware Virus
Manipulating design through JTAG
Modify operating voltages or temperatures of FPGA.
Causes unintended behavior.
Can be used to extract data or bypass certain security features.
Monitor and correctly respond to fluctuations in the operating temperature and
voltage.
Virtex-6 FPGA System Monitor (SYSMON)
CRC circuitry
Zeroization of Device
Thomas Wollinger and Christoff Paar, Security Aspects of FPGAs in Cryptographic Applications in New Algorithms, Architectures and Applications for
Reconfigurable Computing, Springer, 2005, Ch. 21, pp 265-278.
Hardware virus or a hardware Trojan
Kill switch: thinning certain crucial wires, so that electro migration eventually
eliminates part of a wire, creating an open connection
Manipulating the design through JTAG
Disable write feature in JTAG
Don’t download untrusted designs.
Known as Physical Uncloneable
Function.
Physical entity easy to manufacture
but difficult to clone.
PUFs implement a challenge-
response authentication.
Unpredictable response.
This is because of the physical
factors.
PUFs generate different outputs for
same inputs.
Also, they can generate same outputs
for different inputs.
This
randomness is due to the
Challenge-Response Pairs.
Ideal for cryptographic applications
Arbiter PUFs
Based on MUXes and Arbiter
Ring Oscillator or RO-PUF
Based on Delay Circuit and Counters
Note: RO PUFs are more suitable for ASICs and FPGAs. Therefore, we will
concentrate on it.
Consists of N oscillators circuits.
Each Oscillator has a unique
frequency.
At any instance two oscillators are
picked by the MUXes.
Every counter will counter number of
cycles.
Output will be 0 or 1 depending on
counter values.
Sensitive to temperature variations
Limited number of Outputs
Limited number of Challenge
Response Pairs
It’s not over yet…
Reconfigurable hardware offers advantages for cryptographic applications.
Algorithm flexibility
Algorithm distribution
Algorithm modification
Resource efficiency
Architecture efficiency
Source: A. Coman, R. Frățilă. “Cryptographic Applications using FPGA Technology.” SECITC 2010.
Algorithm flexibility
Security protocols such as SSL or IPSec
are algorithm agnostic.
Encryption algorithm chosen when
setting up session.
A configuration controller can set up
hardware implementation of algorithm
for each session.
Source: A. Dandalis, V. K. Prasanna. “An Adaptive
Cryptographic Engine for IPSec Architectures.”
FCCM. April 2000.
Algorithm Distribution
Infeasible to upgrade algorithm in
ASIC deployment without new
hardware.
Reconfigurable hardware in field can
update algorithm or add algorithm to
protocol.
Source: A. Coman, R. Frățilă. “Cryptographic Applications using FPGA Technology.” SECITC 2010.
Algorithm Modification
Adapting preexisting algorithm to
application.
Standard library algorithm is not
sufficient.
No off-the-shelf ASIC will implement
your custom algorithm.
Passwd: DES 25 times in a row + salt.
Resource Efficiency
With security protocols, public-key is
used to send private session key.
Unrelated symmetric-key session uses
the transferred key.
Run-time reconfiguration to repurpose
resources.
Architecture Efficiency
Synthesize encrypt/decrypt hardware given the key.
General purpose multipliers use more area.
Fixed key for current configuration → Constant multiplier.
Can be implemented in LUT as times-table.
Non-constant multiplicand is input to LUT.
http://www.andraka.com/multipli.htm
Architecture Efficiency
Field arithmetic on GF(2n) used for both block and stream ciphers.
For example: GF(23) = {(000), (001), (010), (011), (100), (101), (110), (111)}
Squaring is a common operation.
Depending on basis used:
Simply a bit shift.
Simply in insertion of interlacing zeroes.
For example, in GF(27)
(1101)2 = (1010001)
because (x3 + x2 + 1)2 = x6 + x4 + 1
No nice algorithm for interlacing zeroes between bits.
Reconfigurable hardware – generate squaring register, depending on what
field will be used for algorithm.
Must perform FOIL of the
polynomials.
Followed by modulo reduction
the primitive polynomial.
General algorithm for GF(2w).
int galois_shift_multiply(int x, int y, int w){
int prod;
int i, j, ind;
int k;
int scratch[33];
prod = 0;
for (i = 0; i < w; i++) {
scratch[i] = y;
if (y & (1 << (w-1))) {
y = y << 1;
y = (y ^ prim_poly[w]) & nwm1[w];
} else {
y = y << 1;
}
}
for (i = 0; i < w; i++) {
ind = (1 << i);
if (ind & x) {
j = 1;
for (k = 0; k < w; k++) {
prod = prod ^ (j & scratch[i]);
j = (j << 1);
}
}
}
return prod;
}
Galois.tar - Fast Galois Field Arithmetic Library in C/C++Copyright (C) 2007 James S. Plank
Problems in software:
No ISA-level GF(2n) multiplication instruction.
Bit operations are required to examine one bit at-a-time.
Need to perform modular reduction
Computational bottleneck, especially if n is unknown at runtime.
Must loop and XOR.
A. Mahboob, N. Ikram. “A New Multiplication Technique for GF(2m) with Cryptographic Significance.” WISA 2004.
Many hardware implementations to
parallelize polynomial-basis finitefield multiplication.
For GF(24). Came from general
derivation for GF(2n).
Reconfigurable hardware can
synthesize multipliers needed, given
fields to use.
A. Reyhani-Masoleh, M. A. Hasan. “Low Complexity Bit Parallel Architectures for
Polynomial Basis Multiplication over GF(2m).” IEEE Trans on Computers, August 2004.
Block Cyphers
Like AES, DES
Stream Cyphers
Acterbahn,A5/1
They divide the complete plaintext in to blocks .
They work on symmetric key Cryptography
The key used is same for all blocks.
The number of bits per block is equal to number of bits in the key
They require more memory and more resources than stream Cyphers.
A one-time pad is theoretically unbreakable.
Cryptanalysis tools have nothing to go on.
Key as long as message.
Key generated by true random process, such as particle radiation.
Stream cyphers approach a one-time pad.
Generate cyphertext with a keystream.
Key as long as the plain-text message, or a very long period.
Note: Stream Cyphers are ideal for FPGAs.
Can be easily implemented with
Feedback Shift Register.
Small hardware footprint.
Just XOR keystream with plain-text.
Decryption is same.
Recurrence relation:
ak+4 = ak + ak+1, k≥0
Keystream exits at right.
Characteristic poly:
p(x) = 1 + x3 + x4
p(x) cannot be factored
Only one cycle
Source: https://en.wikipedia.org/wiki/Linear_feedback_shift_register
Linear complexity can also be defined as the shortest LFSR that can generate the sequence.
The feedback function of an LFSR consists only of XOR logic.
The taps chosen give rise to a characteristic polynomial.
Roots of this polynomial’s factors are in the cycles of states the register
can take on.
If polynomial is irreducible, only one cycle (can’t factor it!)
If polynomial is primitive, all states are roots, so LFSR generates a
maximum length sequence.
LFSR/polynomial is non-singular, so all states are reachable.
A non-linear feedback shift-register has more than addition
operation in recurrence relation.
Feedback logic may be XOR, AND and/or finite-field arithmetic.
An NLFSR from our project:
Source: K. Mandal, G. Gong. ”Probabilistic Generation of Good Span n Sequences from Nonlinear Feedback Shift
Registers”, CACR Tech Report. 2012.
A de Bruijn sequence has high linear complexity
Defined as a sequence where each subsequence of a given length occurs
only once.
Primitive LFSR seen earlier generated a de Bruijn sequence
Linear complexity was 4.
Implementing for the first time a stream cypher based on
“Cryptographically Strong de Bruijn Sequences with Large Periods” by
Mandal and Gong.
Presented at SAC 2012 conference.
Using WGT as a kernel NLFSR.
By changing length of NLFSR, and which kernel polynomial is used for
WGT, can change period of keystream generated!
Ideal for reconfig: synthesize different stream cypher given length of
message to be encrypted.
With 40-bit register, obtain keystream with:
Period: 240
Linear Complexity: > 216 >> 40 if using a LFSR.