A DoS Limiting Network Architecture

Transcript A DoS Limiting Network Architecture

Remote Timing Attacks are Practical
An Overview by
- Rahul Deshpande
What are Timing Attacks


Extracting secrets by observing time to respond to
various queries
E.g.. Kocher designed a timing attack to expose secret
keys used for RSA.
Timing Attacks

Usually used to attack weak computing devices such as
Smart Cards

Also applicable to general software systems

Practical against network servers
Common Assumptions

Attack only applicable to hardware security devices

Attack cannot be used to against general purpose
servers since decryption times are masked by many
concurrent processes running on the system.
Challenging the Assumptions

Remote timing attack against OpenSSL developed.

OpenSSL: an SSL library commonly used in web servers
and other SSL applications.

Attack client measures the time an OpenSSL server
takes to respond to the decryption queries.

Client able to extract private key stored on the server.
Environments in which attack is applicable

Network: between two machines in different buildings
with multiple routers and switches between them.

Interprocess: Between two processes running on the
same machine.

Virtual Machines: extracting RSA private key from
secure Virtual Machine (VM), invalidating isolation
provided by Virtual Machine Monitor (VMM)
OpenSSL Decryption






RSA decryption done using modular exponentiation
M = cd mod N where N= pq is the RSA modulus.
OpenSSL uses Chinese Remainder Theorem to perform
exponentiation.
CRT computes exponentiation in two steps by computing
m1 and m2 and then combining the two to get m.
Decryption with CRT gives up to a factor of four speedup
Timing attack can expose the factors of N used in CRT.
The Chinese Remainder Theorem




It is possible to reconstruct integers in a certain range from their
residues modulo a set of pair wise relatively prime moduli.
E.g. The 10 integers in Z10(0,1….9) can be reconstructed from their
two residues modulo 2 and 5 (relatively prime factors of 10).
Provides a way to manipulate large numbers mod M in terms of
tuples of smaller numbers.
CRT can be formulated as:
k

M=∏
mi
i=1



Where, mi are pairwise relatively prime.
Any integer in Zm can be represented by a k-tuple whose elements
are in Zmi using the following correspondence;
A <-> (a1, a2,….ak)
The Chinese Remainder Theorem


n = n1n2…nk with gcd (ni; nj ) = 1 when i != j
The system of congruencies
x=x1(mod n1)=…=xk(mod nk)
has a simultaneous solution x to all of the
congruencies, and there exists exactly one
solution x between 0 and n-1.
Speedup RSA with CRT

Any message M<N is uniquely represented by the tuple [MP;MQ ],
where
MP = M(mod P)
and MQ = M(mod Q).
CP = C(mod P)
and CQ = C(mod Q).
DP = D(mod (P-1)) and DQ = D(mod (Q-1))
RP = QP-1(mod N) and RQ = PQ-1(mod N)
MP = CPDP(mod P) and MQ = CQDQ(mod Q)
SP = MPRP(mod N) and SQ = MQRQ(mod N)
M = SP + SQ. If M>=N then calc M=M-N.

Reference: Johann Großschädl, “The Chinese Remainder
Theorem and its Application in a High-Speed RSA Crypto Chip”
Exponentiation

Simplest algorithm to compute gd mod q is square and
multiply.

OpenSSL uses optimization of square and multiply
called sliding window exponentiation
Sliding Window Exponentiation





Block of bits (window) of d processed at each iteration.
Requires precomputing a multiplication table, taking time
proportional to 2w-1 +1 for a window size of w.
For a 1024-bit modulus, OpenSSL uses a window size of
five.
Attack: Querying on many inputs g, attacker exposes
information about bits of the factor q.
Attack on sliding windows harder than on square and
multiply because of fewer multiplications.
Montgomery Reduction





A reduction modulo q done via multiprecision division
and then returning the remainder is expensive.
Montgomery proposed method for implementing
reduction modulo q using series of operations efficient in
hardware and software.
Montgomery reduction transforms a reduction modulo q
into a reduction modulo some power of two denoted by
R
Reduction modulo power of 2 faster since easily
implemented in hardware.
All variables must be put into Montgomery form.
Montgomery Reduction




At the end of reduction, checked if output cR is greater
than q.
If cR>q, q subtracted from output to keep cR in the range
[o,q). This extra step is called Extra Reduction.
Extra Reduction causes timing difference for different
inputs.
Detecting timing differences from extra reduction tells
how close g is to a multiple of one of the factors.
Multiplication Routines




RSA operations make use of a multi-precision integer
multiplication routine.
OpenSSL implements two multiplication routines:
Karatsuba and Normal.
Karatsuba used when multiplying two numbers with
equal number of words. Takes time O(n1.58).
Normal Multiplication used when multiplying two
numbers with unequal word sizes n and m. Takes time
O(nm).
Multiplication Routines





Normal Multiplication takes quadratic time for numbers of
approximately same size.
Multiplication of two unequal size words takes longer
than multiplication of equal size words.
This fact used in timing attack on OpenSSL.
Underlying word multiplication algorithm dominates the
total time for a decryption.
In OpenSSL, it takes 30%-40% of total running time.
Comparison of Timing Differences

Two algorithmic data dependencies in OpenSSL that
cause time variance in RSA decryption:
1. Number of extra reductions in Montgomery Reduction.
2. Choice of multiplication routine.

Effects of these optimizations counteract each other.

Karatsuba: decryption of g<q faster than g>q and vice
versa for Montgomery Reduction.
A Timing Attack on OpenSSL





Exposes the factorization of the RSA modulus.
Approximations built which get progressively closer as
the attack proceeds.
Can be viewed as a binary search for q.
After recovering half-most bits of q, Coppersmith’s
algorithm used to retrieve complete factorization.
Value of decryption not needed, only the time required
for decryption needed.
Timing Attack on OpenSSL






g is an integer that has the same top i-1 bits as q and
remaining bits of g are 0.
ghi is same as g, with ith bit set to 1. If bit of q is 1 then g<
ghi<q, otherwise g<q< ghi.
Measure the time to decrypt both ug and ughi,
represented as t1 and t2.
Calculate the timing difference td = |t1-t2|.
If bit i of q is 0, then td is large
If bit i of q is 1, then td is small
Real World Scenarios

Timing attack applies to SSL applications such as
stunnel, Apache web server with mod_SSL, and trusted
computing projects such as Microsoft’s NGSCB.

RSA applications using a hardware crypto accelerator
not vulnerable.

Attacks apply to only software based RSA
implementations.
Example of an Attack on SSL server





In a standard full SSL handshake, SSL server performs
RSA decryption using its private key.
CLIENT-KEY-EXCHANGE message composed by
encrypting PKCS 1 padded random bytes with server’s
public key.
In the attack, client substitutes properly formatted
CLIENT-KEY-EXCHANGE with the guess g.
Server generates ALERT message.
Client computes time difference and repeats for various
values.
Experiments







Show that factorization of the RSA modulus N is
vulnerable.
Test effects of increasing decryption requests
Compare effectiveness based upon different keys
Compare effectiveness based upon machine
architecture and common compile-time optimizations
Compare effectiveness based upon source-based
optimizations
Compare inter-process vs. local network attacks
Compare effectiveness against two common SSL
applications: Apache web server with mod_SSL and
stunnel
Experiment Setup





Attack performed against OpenSSL 0.9.7 which does not
blind RSA operations by default.
Simple TCP server implemented that read an ASCII
string
Converted string to OpenSSL’s internal multi-precision
representation
The RSA decryption performed
Decryption time: writing the ciphertext over the socket to
receiving the reply.
Experiment 1- Number of
Ciphertexts

1.
2.

Parameters that determine the number of queries
needed to expose a single bit of RSA factor:
Neighborhood size: for every bit of q, measure the
decryption time for a neighborhood of values g, g+1,
g+2… g+n, denoted by n.
Sample Size: For each value g+i, sample decryption
time multiple time and compute mean decryption time.
Number of times g+i is queried on denoted by s.
Total number of queries needed to compute Tg= s*n.
Continued..




Zero-one gap: gap between when a bit of q is 0 and 1.
Larger the gap, stronger the indicator that bit is 0,
smaller the chance of error.
Increasing the neighborhood size increases zero-one
gap when bit is 0, but is steady when bit is 1.
Total number of queries to recover a factor:
(2ns)*log2(N/4) where N= RSA public modulus.
Experiment 2- Different Keys





Several 1024-bit keys attacked, to determine the ease of
breaking different moduli.
Zero-one gap positive for first 32 bits due to Montgomery
reductions.
Normally, resulting zero-one gap shifts occur around the
multiple of machine word size.
Attacker must be aware that zero-one gap may flip signs
when guessing bits that are around multiples of machine
word size.
If hard-to-guess bits encountered, neighborhood size
can be increased to increase the zero-one gap.
Experiment 3- Architecture and CompileTime effects




Computer Architecture and compile-time optimizations
affect the zero-one gap.
Effect of Architecture: Programs with similar retirement
counts may have different execution profiles.
This is due to different run-time factors such as branch
predictions, pipeline throughput, and the L1 and L2
cache behavior.
Compile-time optimizations change the number of
instructions and how efficiently instructions are executed
on the hardware.
Continued…

1.
2.
3.

Effects of compile-time optimizations tested by
compiling OpenSSL in three different ways:
Optimized
No Pentium flag
Unoptimized
Each different compile-time optimizations changes the
zero-one gap.
Experiment 4 – Source-Based
Optimizations






Patches can change the code profile of RSA libraries
resulting in timing vulnerability.
After a CRT decryption, OpenSSl re-encrypts the result
to verify if it is identical to original ciphertext.
OpenSSL calculates both Montgomery parameters on
every decryption.
A patch allows OpenSSL to cache both the values
between decryptions with the same key.
This shifts the zero-one gap since resulting code has
different execution profile.
Patches may be used to increase the zero-one gap
making the code vulnerable to timing attacks.
Experiment 5 – Interprocess vs. Local
Network Attacks

Noise from network eliminated by repeated sampling,
giving similar zero-one gap to inter-process.

Networks with less than1ms of variance are vulnerable.

Attacker can take advantage of higher CPU speeds for
increasing accuracy of timing measurements.
Experiment 6 – Attacking SSL
Applications on the Local Network






Apache+mod_SSL is a commonly used secure web
server.
Stunnel allows TCP/IP connections to be tunneled
through SSL.
Servers connected by a single switch are vulnerable to
the attack. Attacker has access to a machine near the
OpenSSL-based server.
Timing attacks also work in larger networks where client
and webserver are separated by multiple routers and
switches on the network backbone.
Run-time differences result in different zero-one gaps.
Experiment highlights difficulty in determining minimum
number of queries for a successful attack.
Defenses
Three Possible Defenses:
RSA Blinding:

1.




Calculates
x is then decrypted as normal, followed by division by r.
Since r is random, x is random and timing the decryption
does not reveal information about the key.
Performance penalty of 2%-10%.
Continued…
2. Try and make all RSA decryptions not dependent upon
the input ciphertext.

Harder to create and maintain the code when decryption time is
not dependant upon ciphertext.
3.
Require all RSA computations to be quantized i.e.
always take multiples of some predefined time
quantum.

Preferred method is Blinding.
Drawbacks is that it requires a good source of
randomness to prevent attacks on blinding factor
leading to a small performance degradation

Conclusion

Experiments show that timing attacks are effective when
carried out between machines separated by multiple
routers.

Timing attacks also effective on two processes on the
same computer.

Several Crypto libraries, including OpenSSL, now
implement blinding by default to prevent timing attacks.

A DoS Limiting Network Architecture

Transcript A DoS Limiting Network Architecture

Directory