Stream Cyphers for FPGA Security

Download Report

Transcript Stream Cyphers for FPGA Security

Shemal Shroff
Shoaib Bhuria
Yash Naik
Peter Hall
 Introduction to Security
 Relevance to FPGA
 Design and Manufacture flow for an FPGA
 Things to secure and why?
 Types of Attack
 Prevention
 PUFs
 Provisions and policies adopted by a network administrator
 To prevent and monitor:




Unauthorized access,
Misuse,
Modification,
Denial of a computer network and network-accessible resources.
Simmonds, A; Sandilands, P; van Ekert, L (2004). "An Ontology for Network Security Attacks". Lecture Notes in Computer Science. Lecture Notes in
Computer Science 3285: 317–323
 Research on “FPGA Security” has been active since the early 2000s.
 Several commercial and military applications employ programmable logic.
 This makes design security important for safety and national security.
WP365, Solving Today’s Design Security Concerns, Xilinx White Paper.
 To learn the confidential cryptographic key.
 One-to-one copy or “cloning” together with its key.
 Reverse engineering of encryption algorithm.
 Execute certain cryptographic operation with presumably secret key.
 E.g. pay-tv and in-government communications
Thomas Wollinger and Christoff Paar, Security Aspects of FPGAs in Cryptographic Applications in New Algorithms, Architectures and
Applications for Reconfigurable Computing, Springer, 2005, Ch. 21, pp 265-278.
Saar Drimer, Volatile FPGA Design Security – A Survey, v0.96, April 2008.
Figure: Simplified depiction of the FPGA design, manufacturing, packaging, and
testing processes.
Saar Drimer, Volatile FPGA Design Security – A Survey, v0.96, April 2008.
Figure: Development, manufacturing,
and distribution of an FPGA-based
system.
Saar Drimer, Volatile FPGA Design Security – A Survey, v0.96, April 2008.
B. Dipert. Cunning circuits confound crooks. http://www.e-insite.net/ednmag/contents/images/21df2.pdf, October 12 2000.
 Bitstream
 Configuration of the device
 Bitstream has all the configuration bits required for programming the FPGA.
 If the bitstream is compromised then your design can be cloned or reverse
engineered.
 To protect the logic of FPGA
 To prevent manipulation of design using JTAG.
 Single Event Upset (SEU) or faults
 Verify that the application is trusted to be correct.
 Authenticate the application.
Black box
Attack
Reverse
engineering
Bitstream
Cloning of
sRAM FPGAs
Readback
Attack
Side Channel
Attack
Attacks
Fault injection
Hardware
virus
Configuration
of the device
Manipulating
design
through JTAG
Voltage
modification
Temperature
1.
Black Box Attack
2.
Reverse-Engineering of the
Bitstreams
3.
Cloning of sRAM FPGAs
4.
Readback Attack
5.
Side Channel Attacks
 Step
1: The attacker inputs all
possible combinations, while saving
the corresponding outputs.
 Step 2: Develops a K-map to simplify
the resulting tables
 Step 3: Extracts the logic of the FPGA.
Thomas Wollinger and Christoff Paar, Security Aspects of FPGAs in Cryptographic Applications in New Algorithms, Architectures and Applications for Reconfigurable
Computing, Springer, 2005, Ch. 21, pp 265-278.
A
C
AB 00
01
11
10
B
C
Output
(Y)
0
0
0
0
0
0
1
0
0
0
1
0
0
0
1
0
1
1
0
0
0
0
0
1
1
0
1
0
0
0
1
0
1
0
1
1
0
0
1
1
1
0
 Y = (A.B)’.B.C’
= A’BC’
 Not a real threat nowadays, due to:
 complexity of the designs
 size of state-of-the-art FPGAs.
 Common I/O pins which makes it difficult to connect to the right pin.
 An attacker has to connect to device’s pin of a known function like,
 Microprocessor interrupt input,
And also,
 Figure out whether to:
 Drive a pin with a voltage,
 Sense its output state, or both isn’t a straightforward exercise.
B. Dipert. Cunning circuits confound crooks. http://www.e-insite.net/ednmag/contents/images/21df2.pdf, October 12 2000.
Thomas Wollinger and Christoff Paar, Security Aspects of FPGAs in Cryptographic Applications in New Algorithms, Architectures and Applications for Reconfigurable
Computing, Springer, 2005, Ch. 21, pp 265-278.
A = 32 bits
B = 32 bits
Adder
Output
 We have, in total, 264 input combinations.
 Lets assume that latency for the adder is 10 ns.
 Therefore, time to apply all the combinations is 264 x10 ns.
 This takes approximately 5849 years which is equivalent to 5.849 x 1011 hours.
 Reconstructing the original circuit details
 Altering the design
 Incorporating it in other designs
Reverse Engineering
Saar Drimer, Volatile FPGA Design Security – A Survey, v0.96, April 2008.
Thomas Wollinger and Christoff Paar, Security Aspects of FPGAs in Cryptographic Applications in New Algorithms, Architectures and Applications for Reconfigurable
Computing, Springer, 2005, Ch. 21, pp 265-278.
 These are the toughest to crack.Why?
 Increase in gate counts w.r.t number of
I/O pins
 Antifuse
 Encryption
 PUFs
B. Dipert. Cunning circuits confound crooks. http://www.einsite.net/ednmag/contents/images/21df2.pdf, October 12 2000.
 Security implications of storing data unprotected and external to FPGA
 Non-volatile memory
 Transmitted during power up
 Vulnerability = can be easily eavesdropped
 Feasible
Thomas Wollinger and Christoff Paar, Security Aspects of FPGAs in Cryptographic Applications in New Algorithms, Architectures and Applications for Reconfigurable
Computing, Springer, 2005, Ch. 21, pp 265-278.
 Non-volatile + FPGA on one chip
 Battery-Backed RAM
 eFUSE
 Device DNA
 Encryption
 PUFs
Thomas Wollinger and Christoff Paar, Security Aspects of FPGAs in Cryptographic Applications in New Algorithms, Architectures and Applications for Reconfigurable
Computing, Springer, 2005, Ch. 21, pp 265-278.
 Battery-Backed RAM
 256-bit key stored in volatile on-chip memory cells.
 Must receive continuous power from the external battery.
 eFUSE
 securely store bitstream decryption key.
 No BB-RAM and external battery.
 The OTP eFUSE links are permanently programmed.
 No need battery backup.
 Device DNA
 Virtex-6 has embedded, unique device identifier (Device DNA).
 unique 57-bit identifier is nonvolatile and permanently programmed
 Present in all FPGAs.
 For easy debugging.
 Read the configuration of FPGA through JTAG.
Thomas Wollinger and Christoff Paar, Security Aspects of FPGAs in Cryptographic Applications in New Algorithms, Architectures and Applications for Reconfigurable
Computing, Springer, 2005, Ch. 21, pp 265-278.
 A security bit can be used to prevent the readback functionality.
 Although, fault injection has proven successful to overcome these countermeasures
in FPGA.
 PUFs
 side channel can leak important information.
 Side channel can be:
 power consumption
 Light
 Electromagnetic radiation.
 Power analysis of bitstream
A. Bogdanov, A. Moradi et. Al, efficient and side-channel resistant authenticated encryption of FPGA Bitstreams, International Conference on Reconfigurable
computing and FPGAs, 2012.
Thomas Wollinger and Christoff Paar, Security Aspects of FPGAs in Cryptographic Applications in New Algorithms, Architectures and Applications for
Reconfigurable Computing, Springer, 2005, Ch. 21, pp 265-278.
 Magnetic field surrounding FPGA
 Loop antenna to pick variations of field
 160 bit EC point Multiplication
 Prior info of Encryption is must
 Power trace from an RSA operation
 Uses standard square and multiply
 Square and multiply operations have visibly different power profiles
 ‘1’ relates to squaring step followed by a multiplication step
 ‘0’ in the exponent involves only a squaring step
 CMOS transistors emit photons.
 Electrons gain energy when current flows.
 Emission energy is much higher for transition 0->1 than 1->0
 To observe the light emitted, the chip needs to be opened either from its backside
or front side, depending on its package type.
 Photons collected by high sensitivity photon sensor.
 InGaAs detectors have best quantum efficiency.
J.Di. Battista, J. Courrege, B. Rouzeyre, L. Torres and P. Perdu, “When Failure Analysis meets Side-Channel Attacks”, CHES 2010, IACR, Santa Barbara,
California, USA.
 First the light emission activity is localized by turning the cryptoprocessor is
on/off.
 It is not necessary to know either the architecture of the algorithm, or its
implementation.
 This technique is now less used because of the increasing number of metal layers
which act as a light screen.
 There are two kinds of countermeasures: Hardware and software
 Software countermeasures refer to algorithmic changes, such as masking of secret
keys with random values,
 More Complex Algorithms
 Hardware countermeasures often deal either with some form of power trace
smoothing or with transistor-level changes of the logic.
 This technique is now less used because of the increasing number of metal layers
which act as a light screen.
Thomas Wollinger and Christoff Paar, Security Aspects of FPGAs in Cryptographic Applications in New Algorithms, Architectures and Applications for Reconfigurable
Computing, Springer, 2005, Ch. 21, pp 265-278.
 Temperature Modification
 Voltage Modification
 Fault Injection/Single Event Upsets
 Hardware Virus
 Manipulating design through JTAG
 Modify operating voltages or temperatures of FPGA.
 Causes unintended behavior.
 Can be used to extract data or bypass certain security features.
 Monitor and correctly respond to fluctuations in the operating temperature and
voltage.
 Virtex-6 FPGA System Monitor (SYSMON)
 CRC circuitry
 Zeroization of Device
Thomas Wollinger and Christoff Paar, Security Aspects of FPGAs in Cryptographic Applications in New Algorithms, Architectures and Applications for
Reconfigurable Computing, Springer, 2005, Ch. 21, pp 265-278.
 Hardware virus or a hardware Trojan
 Kill switch: thinning certain crucial wires, so that electro migration eventually
eliminates part of a wire, creating an open connection
 Manipulating the design through JTAG
 Disable write feature in JTAG
 Don’t download untrusted designs.
 Known as Physical Uncloneable
Function.
 Physical entity easy to manufacture
but difficult to clone.
 PUFs implement a challenge-
response authentication.
 Unpredictable response.
 This is because of the physical
factors.
 PUFs generate different outputs for
same inputs.
 Also, they can generate same outputs
for different inputs.
 This
randomness is due to the
Challenge-Response Pairs.
 Ideal for cryptographic applications
 Arbiter PUFs
 Based on MUXes and Arbiter
 Ring Oscillator or RO-PUF
 Based on Delay Circuit and Counters
Note: RO PUFs are more suitable for ASICs and FPGAs. Therefore, we will
concentrate on it.
 Consists of N oscillators circuits.
 Each Oscillator has a unique
frequency.
 At any instance two oscillators are
picked by the MUXes.
 Every counter will counter number of
cycles.
 Output will be 0 or 1 depending on
counter values.
 Sensitive to temperature variations
 Limited number of Outputs
 Limited number of Challenge
Response Pairs
It’s not over yet…
 Reconfigurable hardware offers advantages for cryptographic applications.
 Algorithm flexibility
 Algorithm distribution
 Algorithm modification
 Resource efficiency
 Architecture efficiency
Source: A. Coman, R. Frățilă. “Cryptographic Applications using FPGA Technology.” SECITC 2010.
 Algorithm flexibility
 Security protocols such as SSL or IPSec
are algorithm agnostic.
 Encryption algorithm chosen when
setting up session.
 A configuration controller can set up
hardware implementation of algorithm
for each session.
Source: A. Dandalis, V. K. Prasanna. “An Adaptive
Cryptographic Engine for IPSec Architectures.”
FCCM. April 2000.
 Algorithm Distribution
 Infeasible to upgrade algorithm in
ASIC deployment without new
hardware.
 Reconfigurable hardware in field can
update algorithm or add algorithm to
protocol.
Source: A. Coman, R. Frățilă. “Cryptographic Applications using FPGA Technology.” SECITC 2010.
 Algorithm Modification
 Adapting preexisting algorithm to
application.
 Standard library algorithm is not
sufficient.
 No off-the-shelf ASIC will implement
your custom algorithm.
Passwd: DES 25 times in a row + salt.
 Resource Efficiency
 With security protocols, public-key is
used to send private session key.
 Unrelated symmetric-key session uses
the transferred key.
 Run-time reconfiguration to repurpose
resources.
 Architecture Efficiency
 Synthesize encrypt/decrypt hardware given the key.
 General purpose multipliers use more area.
 Fixed key for current configuration → Constant multiplier.
 Can be implemented in LUT as times-table.
 Non-constant multiplicand is input to LUT.
http://www.andraka.com/multipli.htm
 Architecture Efficiency
 Field arithmetic on GF(2n) used for both block and stream ciphers.
 For example: GF(23) = {(000), (001), (010), (011), (100), (101), (110), (111)}
 Squaring is a common operation.
 Depending on basis used:
 Simply a bit shift.
 Simply in insertion of interlacing zeroes.
 For example, in GF(27)
 (1101)2 = (1010001)
 because (x3 + x2 + 1)2 = x6 + x4 + 1
 No nice algorithm for interlacing zeroes between bits.
 Reconfigurable hardware – generate squaring register, depending on what
field will be used for algorithm.
 Must perform FOIL of the
polynomials.
 Followed by modulo reduction
the primitive polynomial.
 General algorithm for GF(2w).
int galois_shift_multiply(int x, int y, int w){
int prod;
int i, j, ind;
int k;
int scratch[33];
prod = 0;
for (i = 0; i < w; i++) {
scratch[i] = y;
if (y & (1 << (w-1))) {
y = y << 1;
y = (y ^ prim_poly[w]) & nwm1[w];
} else {
y = y << 1;
}
}
for (i = 0; i < w; i++) {
ind = (1 << i);
if (ind & x) {
j = 1;
for (k = 0; k < w; k++) {
prod = prod ^ (j & scratch[i]);
j = (j << 1);
}
}
}
return prod;
}
Galois.tar - Fast Galois Field Arithmetic Library in C/C++Copyright (C) 2007 James S. Plank
 Problems in software:
 No ISA-level GF(2n) multiplication instruction.
 Bit operations are required to examine one bit at-a-time.
 Need to perform modular reduction
 Computational bottleneck, especially if n is unknown at runtime.
 Must loop and XOR.
A. Mahboob, N. Ikram. “A New Multiplication Technique for GF(2m) with Cryptographic Significance.” WISA 2004.
 Many hardware implementations to
parallelize polynomial-basis finitefield multiplication.
 For GF(24). Came from general
derivation for GF(2n).
 Reconfigurable hardware can
synthesize multipliers needed, given
fields to use.
A. Reyhani-Masoleh, M. A. Hasan. “Low Complexity Bit Parallel Architectures for
Polynomial Basis Multiplication over GF(2m).” IEEE Trans on Computers, August 2004.
 Block Cyphers
 Like AES, DES
 Stream Cyphers
 Acterbahn,A5/1
 They divide the complete plaintext in to blocks .
 They work on symmetric key Cryptography
 The key used is same for all blocks.
 The number of bits per block is equal to number of bits in the key
 They require more memory and more resources than stream Cyphers.
 A one-time pad is theoretically unbreakable.
 Cryptanalysis tools have nothing to go on.
 Key as long as message.
 Key generated by true random process, such as particle radiation.
 Stream cyphers approach a one-time pad.
 Generate cyphertext with a keystream.
 Key as long as the plain-text message, or a very long period.
 Note: Stream Cyphers are ideal for FPGAs.
 Can be easily implemented with
Feedback Shift Register.
 Small hardware footprint.
 Just XOR keystream with plain-text.
 Decryption is same.
 Recurrence relation:
 ak+4 = ak + ak+1, k≥0
 Keystream exits at right.
 Characteristic poly:
 p(x) = 1 + x3 + x4
 p(x) cannot be factored
 Only one cycle
Source: https://en.wikipedia.org/wiki/Linear_feedback_shift_register
Linear complexity can also be defined as the shortest LFSR that can generate the sequence.
 The feedback function of an LFSR consists only of XOR logic.
 The taps chosen give rise to a characteristic polynomial.
 Roots of this polynomial’s factors are in the cycles of states the register
can take on.
 If polynomial is irreducible, only one cycle (can’t factor it!)
 If polynomial is primitive, all states are roots, so LFSR generates a
maximum length sequence.
 LFSR/polynomial is non-singular, so all states are reachable.
 A non-linear feedback shift-register has more than addition
operation in recurrence relation.
 Feedback logic may be XOR, AND and/or finite-field arithmetic.
 An NLFSR from our project:
Source: K. Mandal, G. Gong. ”Probabilistic Generation of Good Span n Sequences from Nonlinear Feedback Shift
Registers”, CACR Tech Report. 2012.
 A de Bruijn sequence has high linear complexity
 Defined as a sequence where each subsequence of a given length occurs
only once.
 Primitive LFSR seen earlier generated a de Bruijn sequence
 Linear complexity was 4.
 Implementing for the first time a stream cypher based on
“Cryptographically Strong de Bruijn Sequences with Large Periods” by
Mandal and Gong.
 Presented at SAC 2012 conference.
 Using WGT as a kernel NLFSR.
 By changing length of NLFSR, and which kernel polynomial is used for
WGT, can change period of keystream generated!
 Ideal for reconfig: synthesize different stream cypher given length of
message to be encrypted.
 With 40-bit register, obtain keystream with:
 Period: 240
 Linear Complexity: > 216 >> 40 if using a LFSR.