Transcript Document
Adversaries and
Information Leaks
Geoffrey Smith
Florida International University
November 7, 2007
TGC 2007 Workshop on the Interplay of Programming Languages and Cryptography
1
Motivation
Suppose that a program c processes some
sensitive information.
How do we know that c will not leak the
information, either accidentally or
maliciously?
How can we ensure that c is trustworthy?
Secure information flow analysis: Do a
static analysis (e.g. type checking) on c
prior to executing it.
2
Two Adversaries
The program c:
Has direct access to the sensitive
information (H variables)
Behavior is constrained by the static
analysis
The observer O of c’s public output:
Has direct access only to c’s public output
(final values of L variables, etc.)
Behavior is unconstrained (except for
computational resource bounds)
3
The Denning Restrictions
Classification: An expression is H if it
contains any H variables; otherwise it is L.
Explicit flows: An H expression cannot be
assigned to an L variable.
Implicit flows: A guarded command with an
H guard cannot assign to L variables.
if ((secret % 2) == 0)   // H guard
  leak = 0;              // assignment to L variable
else
  leak = 1;              // assignment to L variable
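As a concrete illustration (mine, not from the slides), here is a minimal sketch in Python of a static check enforcing the Denning Restrictions on a tiny command AST; the AST encoding, the level and check functions, and the typing context ctx are all hypothetical names chosen for this example.

# A minimal sketch of the Denning Restrictions as a static check (illustrative only).
# Expressions: ('var', x), ('const', n), ('binop', e1, e2)
# Commands:    ('assign', x, e), ('if', e, c1, c2), ('while', e, c), ('seq', c1, c2), ('skip',)

def level(e, ctx):
    """An expression is H if it contains any H variable; otherwise it is L."""
    tag = e[0]
    if tag == 'var':
        return ctx[e[1]]
    if tag == 'const':
        return 'L'
    if tag == 'binop':
        return 'H' if 'H' in (level(e[1], ctx), level(e[2], ctx)) else 'L'
    raise ValueError(tag)

def check(c, ctx, pc='L'):
    """pc is the level of the guards enclosing c (used for implicit flows)."""
    tag = c[0]
    if tag == 'skip':
        return True
    if tag == 'assign':
        x, e = c[1], c[2]
        # Explicit flow: an H expression cannot be assigned to an L variable.
        # Implicit flow: under an H guard, no assignment to an L variable.
        if ctx[x] == 'L' and (level(e, ctx) == 'H' or pc == 'H'):
            return False
        return True
    if tag in ('if', 'while'):
        pc2 = 'H' if level(c[1], ctx) == 'H' else pc
        return all(check(b, ctx, pc2) for b in c[2:])
    if tag == 'seq':
        return check(c[1], ctx, pc) and check(c[2], ctx, pc)
    raise ValueError(tag)

# The parity-leak example above is rejected:
ctx = {'secret': 'H', 'leak': 'L'}
prog = ('if', ('binop', ('var', 'secret'), ('const', 2)),
        ('assign', 'leak', ('const', 0)),
        ('assign', 'leak', ('const', 1)))
print(check(prog, ctx))  # False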
4
Noninterference
If c satisfies the Denning Restrictions,
then (assuming c always terminates) the
final values of L variables are independent
of the initial values of H variables.
So observer O can deduce nothing about
the initial values of H variables.
Major practical challenge: How can we
relax noninterference to allow “small”
information leaks, while still preserving
security?
5
Talk Outline
I. Secure information flow for a language with encryption [FMSE'06]
II. Termination leaks in probabilistic programs [PLAS'07]
III. Foundations for quantitative information flow
Joint work with Rafael Alpízar.
6
I. Secure information flow for a
language with encryption
Suppose that E and D denote encryption
and decryption with a suitably-chosen
shared key K.
Programs can call E and D but cannot
access K directly.
Intuitively, we want the following rules:
If e is H, then E(e) is L.
If e is either L or H, then D(e) is H.
But is this sound? Note that it violates
noninterference, since E(e) depends on e.
7
It is unsound if encryption is
deterministic!
Assume secret : H, mask : L, leak : L.
leak := 0;
mask := 2^(n-1);
while mask ≠ 0 do (
if E(secret | mask) = E(secret) then
leak := leak | mask;
mask := mask >> 1
)
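To see the attack concretely, here is a small Python simulation (mine, not from the slides) in which encryption is modeled by a hypothetical deterministic function det_E; with any deterministic (injective) encryption, the loop recovers secret exactly.

import hashlib

def det_E(m: int) -> bytes:
    """A stand-in for a *deterministic* encryption of an n-bit value (hypothetical)."""
    return hashlib.sha256(m.to_bytes(16, 'big')).digest()

def attack(secret: int, n: int = 128) -> int:
    """The while-loop above: recovers secret bit by bit by comparing ciphertexts."""
    leak, mask = 0, 1 << (n - 1)
    while mask != 0:
        # If the bit under mask is already 1, secret | mask == secret,
        # so a deterministic E yields the same ciphertext.
        if det_E(secret | mask) == det_E(secret):
            leak |= mask
        mask >>= 1
    return leak

secret = 0xC0FFEE
assert attack(secret) == secret   # full recovery of the H value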
8
Symmetric Encryption Schemes
SE with security parameter k is a triple (𝒦, E, D) where
𝒦 is a randomized key-generation algorithm
  We write K <- 𝒦.
E is a randomized encryption algorithm
  We write C <- E_K(M).
D is a deterministic decryption algorithm
  We write M := D_K(C).
9
IND-CPA Security
[Diagram: the adversary A submits pairs (M1, M2) of equal length to the oracle E_K(LR(·,·,b)) and receives ciphertexts C in return.]
The box contains key K and selection bit b.
If b = 0, the left strings are encrypted.
If b = 1, the right strings are encrypted.
The adversary A wants to guess b.
10
IND-CPA advantage
Experiment Exp^{ind-cpa-b}(A):
  K <- 𝒦;
  d <- A^{E_K(LR(·,·,b))};
  return d
Adv^{ind-cpa}(A) = Pr[Exp^{ind-cpa-1}(A) = 1] − Pr[Exp^{ind-cpa-0}(A) = 1].
SE is IND-CPA secure if no polynomial-time
adversary A can achieve a non-negligible
IND-CPA advantage.
11
Our programming language
e ::= x | n | e1 + e2 | … | D(e1, e2)
c ::= x := e
  | x <- R          (random assignment according to distribution R)
  | (x, y) <- E(e)
  | skip
  | if e then c1 else c2
  | while e do c
  | c1; c2
Note: n-bit values, 2n-bit ciphertexts.
12
Our type system
Each variable is classified as H or L.
We just enforce the Denning restrictions,
with modifications for the new constructs.
E(e) is L, even if e is H.
D(e1,e2) is H, even if e1 and e2 are L.
R (a random value) is L.
13
Leaking adversary B
B has an H variable h and an L variable l, and
other variables typed arbitrarily.
h is initialized to 0 or 1, each with
probability ½.
B can call E() and D().
B tries to copy the initial value of h into l.
14
Leaking advantage of B
Experiment Exp^{leak}(B):
  K <- 𝒦;
  h0 <- {0,1};
  h := h0;
  initialize other variables to 0;
  run B^{E_K(·), D_K(·)};
  if l = h0 then return 1 else return 0
Adv^{leak}(B) = 2 · Pr[Exp^{leak}(B) = 1] − 1
(So a B that guesses at random has advantage 0, and one that always copies h into l has advantage 1.)
15
Soundness via Reduction
For now, drop D() from the language.
Theorem: Given a well-typed leaking
adversary B that runs in time p(k), there
exists an IND-CPA adversary A that runs
in time O(p(k)) and such that
Adv^{ind-cpa}(A) ≥ ½·Adv^{leak}(B).
Corollary: If SE is IND-CPA secure, then
no polynomial-time, well-typed leaking
adversary B achieves non-negligible
advantage.
16
Proof of Theorem
Given B, construct A that runs B with a
randomly-chosen 1-bit value of h.
Whenever B calls E(e), A passes (0^n, e) to
its oracle E_K(LR(·,·,b)).
So if b = 1, E(e) returns E_K(e).
And if b = 0, E(e) returns E_K(0^n), a random
number that has nothing to do with e!
If B terminates within p(k) steps and
succeeds in leaking h to l, then A guesses 1.
Otherwise A guesses 0.
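The reduction can be sketched in Python (my illustration, not the paper's construction; toy_encrypt, run_B, and the oracle plumbing are hypothetical stand-ins): A wraps B, routes every call E(e) to the left-right oracle as the pair (0^n, e), and guesses b = 1 exactly when B manages to leak h.

import os, hashlib

# A toy model of the reduction from a leaking adversary B to an IND-CPA adversary A.
# (The p(k) step bound from the proof is omitted in this toy.)

def toy_encrypt(key: bytes, m: bytes) -> bytes:
    r = os.urandom(16)                                   # randomized encryption
    pad = hashlib.sha256(key + r).digest()[:len(m)]
    return r + bytes(a ^ b for a, b in zip(m, pad))

def lr_oracle(key: bytes, b: int):
    """E_K(LR(·,·,b)): encrypts the left message if b = 0, the right one if b = 1."""
    def oracle(m0: bytes, m1: bytes) -> bytes:
        assert len(m0) == len(m1)
        return toy_encrypt(key, m1 if b == 1 else m0)
    return oracle

def adversary_A(oracle, run_B) -> int:
    """Run B with a random 1-bit h; B's calls to E(e) become oracle(0^n, e)."""
    h0 = os.urandom(1)[0] & 1
    E = lambda e: oracle(bytes(len(e)), e)               # pass (0^n, e) to the oracle
    l = run_B(h0, E)                                     # B tries to copy h0 into l
    return 1 if l == h0 else 0                           # guess b = 1 iff B leaked

def run_B(h0: int, E) -> int:
    """A placeholder leaking adversary: encrypts h and 'guesses' from the ciphertext."""
    return E(bytes([h0]))[-1] & 1

key, b = os.urandom(16), os.urandom(1)[0] & 1
print("A's guess for b:", adversary_A(lr_oracle(key, b), run_B))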
17
What is A’s IND-CPA advantage?
If b = 1, B is run faithfully.
Hence
Pr[Exp^{ind-cpa-1}(A) = 1]
= Pr[Exp^{leak}(B) = 1]
= ½·Adv^{leak}(B) + ½
If b = 0, B is not run faithfully.
Here B is just a well-typed program with
random assignment but no encryption.
18
More on the b = 0 case
In this case, we expect the type system to
prevent B from leaking h to l.
However, when B is run unfaithfully, it
might fail to terminate!
Some careful analysis is needed to deal
with this possibility — Part II of talk!
But in the end we get
Pr[Exp^{ind-cpa-0}(A) = 1] ≤ ½
So Adv^{ind-cpa}(A) ≥ ½·Adv^{leak}(B), as claimed. □
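Spelling out that last step (a short calculation combining the two cases above, not on the original slide):

\[
\mathrm{Adv}^{\mathrm{ind\text{-}cpa}}(A)
= \Pr[\mathrm{Exp}^{\mathrm{ind\text{-}cpa\text{-}1}}(A)=1] - \Pr[\mathrm{Exp}^{\mathrm{ind\text{-}cpa\text{-}0}}(A)=1]
\ge \left(\tfrac{1}{2}\,\mathrm{Adv}^{\mathrm{leak}}(B) + \tfrac{1}{2}\right) - \tfrac{1}{2}
= \tfrac{1}{2}\,\mathrm{Adv}^{\mathrm{leak}}(B).
\]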
19
Can we get a result more like
noninterference?
Let c be a well-typed, polynomial-time
program.
Let memories μ and ν agree on L variables.
Run c under either μ or ν, each with
probability ½.
A noninterference adversary O is given the
final values of the L variables of c.
O tries to guess which initial memory was
used.
20
A computational noninterference
result
Theorem: No polynomial-time adversary O
(for c, μ, and ν) can achieve a non-negligible
noninterference advantage.
Proof idea: Given O, we can construct a
well-typed leaking adversary B.
Note that O cannot be assumed to be well
typed!
But because O sees only the L variables of
c, it is automatically well typed under our
typing rules.
21
Construction of B
initialize L variables of c according to μ (equivalently ν, since they agree on L variables);
if h = 0 then
  initialize H variables of c according to μ
else
  initialize H variables of c according to ν;
c;
O;        // O puts its guess into g
l := g
22
Related work
Laud and Vene [FCT 2005].
Work on cryptographic protocols: Backes
and Pfitzmann [Oakland 2005], Laud [CCS
2005], …
Askarov, Hedin, Sabelfeld [SAS’06]
Laud [POPL’08]
Vaughan and Zdancewic [Oakland’07]
23
II. Termination leaks in
probabilistic programs
In Part I, we assumed that all adversaries
run in time polynomial in k, the key size.
This might seem to be “without loss of
generality” (practically speaking) since
otherwise the adversary takes too long.
But what if program c either terminates
quickly or else goes into an infinite loop?
In that case, observer O might quickly be
able to observe whether c terminates.
24
The Denning Restrictions and
Nontermination
The Denning Restrictions allow H variables
to affect nontermination:
while secret = 0 do
skip;
leak := 1
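As a concrete illustration (mine, not the slides'): an observer who merely watches whether the program terminates within some time budget learns whether secret = 0. The runner and the 0.5-second threshold below are hypothetical choices.

import multiprocessing

def c(secret: int):
    """The program above: loops forever iff secret = 0, else sets leak := 1."""
    while secret == 0:
        pass
    leak = 1
    return leak

def observer(secret: int, timeout: float = 0.5) -> str:
    """O's inference: quick termination reveals secret != 0."""
    p = multiprocessing.Process(target=c, args=(secret,))
    p.start()
    p.join(timeout)
    if p.is_alive():
        p.terminate()
        return "probably secret = 0 (no termination observed)"
    return "secret != 0 (terminated)"

if __name__ == "__main__":
    print(observer(0))   # probably secret = 0
    print(observer(7))   # secret != 0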
“If c satisfies the Denning Restrictions,
then (assuming c always terminates) the
final values of L variables are independent
of the initial values of H variables.”
Can we quantify such termination leaks?
25
Probabilistic Noninterference
Consider probabilistic programs with
random assignment but no encryption.
Such programs are modeled as Markov
chains of configurations (c,μ).
And noninterference becomes
The final probability distribution on L
variables is independent of the initial values
of H variables.
26
A program that violates
probabilistic noninterference
t <- {0,1};        (randomly assign 0 or 1 to t)
if t = 0 then
  while h = 1 do skip;
  l := 0
else
  while h = 0 do skip;
  l := 1
Here t : L, h : H, and l : L.
If h = 0, terminates with l = 0 with probability ½
and loops with probability ½.
If h = 1, terminates with l = 1 with probability ½
and loops with probability ½.
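A tiny Python check (my sketch) of these numbers: enumerate the coin flip t with probability ½ each and classify each run as terminating with some value of l or looping.

from fractions import Fraction

def outcome_distribution(h: int):
    """Enumerate t in {0,1} (probability 1/2 each) and classify the run."""
    dist = {'l=0': Fraction(0), 'l=1': Fraction(0), 'loop': Fraction(0)}
    for t in (0, 1):
        if t == 0:
            result = 'loop' if h == 1 else 'l=0'
        else:
            result = 'loop' if h == 0 else 'l=1'
        dist[result] += Fraction(1, 2)
    return dist

d0, d1 = outcome_distribution(0), outcome_distribution(1)
print(d0)   # h = 0: l=0 with probability 1/2, loop with probability 1/2
print(d1)   # h = 1: l=1 with probability 1/2, loop with probability 1/2
# Deviation on the terminating L-outcomes: |1/2 - 0| + |0 - 1/2| = 1 = 2p, with p = 1/2.
deviation = sum(abs(d0[k] - d1[k]) for k in ('l=0', 'l=1'))
print(deviation)  # 1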
27
Approximate probabilistic
noninterference
Theorem: If c satisfies the Denning
restrictions and loops with probability at
most p, then c’s deviation from
probabilistic noninterference is at most 2p.
In our example, p = ½, and the deviation is
|½ − 0| + |0 − ½| = 1 = 2p,
where the first term compares the probability that l = 0 when h = 0 and when h = 1, and the second term compares the probability that l = 1 when h = 0 and when h = 1.
28
Stripped program c̄
Replace all subcommands that don't assign
to L variables with skip.
Note: c̄ contains no H variables.
Original program c:
  t <- {0,1};
  if t = 0 then
    while h = 1 do skip;
    l := 0
  else
    while h = 0 do skip;
    l := 1
Stripped program c̄:
  t <- {0,1};
  if t = 0 then
    skip;
    l := 0
  else
    skip;
    l := 1
29
The Bucket Property
[Figure: the probability mass of (c,μ) shown as buckets labeled loop, l=0, l=1, l=2, alongside the buckets of the stripped program c̄. The stripped program's distribution is obtained by pouring water from c's loop bucket into the other buckets.]
30
Proof technique
In prior work on secure information flow,
probabilistic bisimulation has often been
useful.
Here we use a probabilistic simulation
[Jonsson and Larsen 1991] instead.
We define a modification of the weak
simulation considered by [Baier, Katoen,
Hermanns, Wolf 2005].
31
Fast Simulation on a Markov
chain (S,P)
A fast simulation R is a binary relation on S such that if s1 R s2, then the states reachable in one step from s1 can be partitioned into sets U and V such that
  v R s2 for every v ∈ V
  letting K = Σ_{u∈U} P(s1,u), if K > 0 then there exists a function Δ : S × S → [0,1] with
    Δ(u,w) > 0 implies u R w
    P(s1,u)/K = Σ_{w∈S} Δ(u,w) for all u ∈ U
    P(s2,w) = Σ_{u∈U} Δ(u,w) for all w ∈ S.
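As an illustration (my sketch, not from the paper), the conditions can be checked mechanically for a finite chain, given a candidate partition U, V and weight function Δ; all helper names below are made up for this example.

from fractions import Fraction as F

def check_fast_step(P, R, s1, s2, U, V, Delta):
    """Check the fast-simulation conditions at a single related pair s1 R s2.
    P: dict state -> dict state -> probability; R: set of related pairs;
    U, V: partition of the one-step successors of s1; Delta: dict (u, w) -> weight."""
    succ = {t for t, p in P[s1].items() if p > 0}
    assert U | V == succ and not (U & V), "U, V must partition the successors of s1"
    # Every v in V must itself be related to s2.
    if any((v, s2) not in R for v in V):
        return False
    K = sum(P[s1][u] for u in U)
    if K == 0:
        return True
    # Delta may put positive weight only on R-related pairs.
    if any(w > 0 and (u, t) not in R for (u, t), w in Delta.items()):
        return False
    # Row sums: P(s1,u)/K = sum_w Delta(u,w) for all u in U.
    for u in U:
        if sum(w for (u2, _), w in Delta.items() if u2 == u) != P[s1][u] / K:
            return False
    # Column sums: P(s2,w) = sum_{u in U} Delta(u,w) for all w.
    for t in set(P):
        if sum(w for (u2, t2), w in Delta.items() if t2 == t and u2 in U) != P[s2].get(t, 0):
            return False
    return True

# Toy chain: 'a' loops back to itself or moves to 'done'; 'b' moves straight to 'done'.
P = {'a': {'a': F(1, 2), 'done': F(1, 2)},
     'b': {'done': F(1)},
     'done': {'done': F(1)}}
R = {('a', 'b'), ('done', 'done'), ('a', 'a'), ('b', 'b')}
# From 'a': V = {'a'} (since a R b), U = {'done'}, K = 1/2, Delta sends done -> done with weight 1.
print(check_fast_step(P, R, 'a', 'b', U={'done'}, V={'a'}, Delta={('done', 'done'): F(1)}))  # True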
32
Proving the Bucket Property
Given R, a set T is upwards closed if s ∈ T
and s R s' implies s' ∈ T.
Pr(s,n,T) is the probability of going from s
to a state in T in at most n steps.
Theorem: If R is a fast simulation, T is
upwards closed, and s1 R s2, then
Pr(s1,n,T) ≤ Pr(s2,n,T) for every n.
We can define a fast simulation R_L such
that (c,μ) R_L (c̄,μ) for any well-typed c, where c̄ is the stripped version of c.
33
Approximate noninterference
[Figure: the bucket distributions (loop, l=0, l=1, l=2) of (c,μ), of the stripped program (c̄,μ) = (c̄,ν), and of (c,ν). Since μ and ν agree on L variables and c̄ contains no H variables, the middle distribution is the same for μ and ν; each of (c,μ) and (c,ν) differs from it by at most p, so they differ from each other by at most 2p.]
Remarks
Observer O’s ability to distinguish μ and ν
by statistical hypothesis testing could be
bounded as in [Di Pierro, Hankin, Wiklicky
2002].
The Bucket Property is also crucial to the
soundness proof of the type system
considered in Part I of this talk.
35
III. Foundations for quantitative
information flow
To allow “small” information leaks, we need
a quantitative theory of information.
Quite a lot of recent work:
Clark, Hunt, Malacaria [2002, 2005, 2007]
Köpf and Basin [CCS’07]
Clarkson, Myers, Schneider [CSFW’05]
Lowe [CSFW’02]
Di Pierro, Hankin, Wiklicky [CSFW’02]
36
Research Steps
1. Define a quantitative notion of
information flow.
2. Show that the notion gives appropriate
security guarantees.
3. Devise static analyses to enforce a given
quantitative flow policy.
4. Prove the soundness of the analyses.
Here we’ll discuss only steps 1 and 2.
37
Our Conceptual Framework
Rather than trying to tackle the general
problem, let’s consider important special
cases to better see what’s going on.
Assume that secret h is chosen from some
space S with some a priori distribution.
Assume that c is a program that has only h
as input and (maybe) leaks information
about h to its unique public output l.
Assume that c is deterministic and total.
38
Then…
[Köpf Basin 07]
There exists a function f such that l = f(h).
f induces an equivalence relation on S.
h1 ~ h2 iff f(h1) = f(h2)
So c partitions S into equivalence classes f^{-1}(l1), f^{-1}(l2), f^{-1}(l3), …
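As a small illustration (my sketch; the function f here is just the parity example from earlier), the partition is computed by grouping preimages:

from collections import defaultdict

def partition(S, f):
    """Group the secret space S into the equivalence classes f^{-1}(l)."""
    classes = defaultdict(set)
    for h in S:
        classes[f(h)].add(h)
    return dict(classes)

# Example: the parity-leak program from earlier, f(h) = h % 2, over a tiny S.
S = range(8)
print(partition(S, lambda h: h % 2))
# {0: {0, 2, 4, 6}, 1: {1, 3, 5, 7}}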
39
What is leaked?
The observer O sees the final value of l.
This tells O which equivalence class h
belonged to.
How bad is that?
Extreme 1: If f is a constant function, then
there is just one equivalence class, and
noninterference holds.
Extreme 2: If f is one-to-one, then the
equivalence classes are singletons, and we
have total leakage of h (in principle…).
40
Quantitative Measures
Consider a discrete random variable X
whose values have probabilities p1, p2, p3, …
Assume p_i ≥ p_{i+1}.
Shannon Entropy: H(X) = Σ_i p_i log(1/p_i)
  ("uncertainty" about X; expected number of bits to transmit X)
Guessing Entropy: G(X) = Σ_i i·p_i
  (expected number of guesses to guess X)
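As a small illustration (my sketch, not from the slides), both measures are easy to compute for a finite distribution:

from math import log2

def shannon_entropy(probs):
    """H(X) = sum of p_i * log2(1/p_i); the expected number of bits to describe X."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

def guessing_entropy(probs):
    """G(X) = sum of i * p_i with probabilities in decreasing order;
    the expected number of guesses under an optimal guessing order."""
    ordered = sorted(probs, reverse=True)
    return sum(i * p for i, p in enumerate(ordered, start=1))

# A uniform 3-bit secret: H = 3 bits, G = 4.5 guesses.
uniform8 = [1 / 8] * 8
print(shannon_entropy(uniform8))   # 3.0
print(guessing_entropy(uniform8))  # 4.5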
41
Shannon Entropy applied to the
partitions induced by c
Let’s assume that
  |S| = n and h is uniformly distributed
  the partition has r equivalence classes C_1, C_2, …, C_r, with |C_i| = n_i
H(h) = log n
  “initial uncertainty about h”
H(l) = Σ_i (n_i/n) log(n/n_i)
  Plausibly, “amount of information leaked”
Extreme 1: H(l) = 0
Extreme 2: H(l) = log n
42
How much uncertainty about h
remains after the attack?
This can be calculated as a conditional
Shannon entropy:
H(h|l) = Σ_i (n_i/n) H(C_i) = Σ_i (n_i/n) log n_i
Extreme 1: H(h|l) = log n
Extreme 2: H(h|l) = 0
There is a pretty equation relating these!
H(h) = H(l) + H(h|l)
(initial uncertainty = information leaked + remaining uncertainty)
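For the uniform case above, the identity can be verified directly (a one-line calculation, not on the original slide), using Σ_i n_i = n:

\[
H(l) + H(h|l) \;=\; \sum_i \frac{n_i}{n}\log\frac{n}{n_i} \;+\; \sum_i \frac{n_i}{n}\log n_i
\;=\; \sum_i \frac{n_i}{n}\log n \;=\; \log n \;=\; H(h).
\]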
43
So is Step 1 finished?
“1. Define a quantitative notion of information
flow”
In the special case that we are considering, it
seems promising to define the amount of
information leaked to be H(l), and the remaining
uncertainty to be H(h|l).
This seems to be the literature consensus:
Clark, Hunt, Malacaria
Köpf and Basin — also use G(l) and G(h|l)
Clarkson, Myers, Schneider (?) — Sec. 4.4, when the
attacker’s belief matches the a priori distribution
44
What about Step 2?
“2. Show that the notion gives appropriate
security guarantees.”
“Leakage = 0 iff noninterference holds”
Good, but this establishes only that the
zero/nonzero distinction is meaningful!
More interesting is the Fano Inequality.
But this gives extremely weak bounds.
Does the value of H(h|l) accurately
reflect the threat to h?
45
An Attack
Copy 1/10 of the bits of h into l.
l = h & 017777…7;
Gives 2^{.1 log n} = n^{.1} equivalence classes, each
of size 2^{.9 log n} = n^{.9}.
H(l) = .1 log n
H(h|l) = .9 log n
After this attack, 9/10 of the bits are
completely unknown.
46
Another Attack
Put 90% of the possible values of h into one
big equivalence class, and put each of the
remaining 10% into singleton classes:
if (h < n/10) l = h; else l = -1;
H(l) = .9 log (1/.9) + .1 log n ≈ .1 log n + .14
H(h|l) = .9 log (.9n) ≈ .9 log n – .14
Essentially the same as the previous attack!
But now O can guess h with probability 1/10.
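A numeric check of these two attacks (my sketch, using a concrete n = 2^20) makes the contrast vivid: the remaining Shannon entropy H(h|l) is nearly identical, yet the probability of guessing h in one try differs enormously.

from math import log2

def remaining_entropy(class_sizes, n):
    """H(h|l) = sum over classes of (n_i/n) * log2(n_i), for uniformly distributed h."""
    return sum((ni / n) * log2(ni) for ni in class_sizes)

def one_try_guess_prob(class_sizes, n):
    """V(h|l): probability of guessing h in one try, which works out to r/n."""
    return len(class_sizes) / n

n = 2 ** 20                       # a 20-bit secret, uniformly distributed
# Attack 1: copy 1/10 of the bits (here 2 of 20): 2^2 classes, each of size 2^18.
attack1 = [2 ** 18] * (2 ** 2)
# Attack 2: roughly 90% of the values in one big class, the rest in singletons.
attack2 = [n - n // 10] + [1] * (n // 10)

for name, classes in (("copy 1/10 of bits", attack1), ("big class + singletons", attack2)):
    print(name,
          " H(h|l) =", round(remaining_entropy(classes, n), 2),
          " V(h|l) =", round(one_try_guess_prob(classes, n), 6))
# copy 1/10 of bits:       H(h|l) = 18.0,   V(h|l) about 0.000004
# big class + singletons:  H(h|l) = 17.86,  V(h|l) about 0.1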
47
A New Measure
With H(h|l), we can’t do Step 2 well!
Why not use Step 2 to guide Step 1?
Define V(h|l), the vulnerability of h given l,
to be the probability that O can guess h
correctly in one try, given l.
48
Calculating V(h|l) when h is
uniformly distributed
As before, assume that
|S| = n and h is uniformly distributed
The partition has r equivalence classes C_1, C_2, …, C_r, with |C_i| = n_i.
V(h|l) = Σ_{i=1}^{r} (n_i/n)(1/n_i) = r/n
So all that matters is the number of
equivalence classes, not their sizes!
49
Examples
a. Noninterference case: r = 1, V(h|l) = 1/n
b. Total leakage case: r = n, V(h|l) = 1
c. Copy 1/10 of bits: r = n^{.1}, V(h|l) = 1/n^{.9}
d. Put 1/10 of h’s values into singleton
classes: r = 1 + n/10, V(h|l) ≈ 1/10
e. Put h’s values into classes, each of size 10:
r = n/10, V(h|l) = 1/10
f. Password checker: r = 2, V(h|l) = 2/n
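These values can be checked mechanically for a concrete n (my sketch, using V(h|l) = r/n):

n = 2 ** 20     # a concrete choice of |S|, with h uniformly distributed

examples = {
    "a. noninterference (one class)":           1,
    "b. total leakage (singleton classes)":     n,
    "c. copy 1/10 of the bits":                 round(n ** 0.1),
    "d. 1/10 of values in singleton classes":   1 + n // 10,
    "e. classes of size 10":                    n // 10,
    "f. password checker":                      2,
}

for name, r in examples.items():
    print(f"{name}: r = {r}, V(h|l) = {r / n:.3g}")
# a: 1/n   b: 1   c: 1/n^.9   d, e: about 1/10   f: 2/n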
50
Conclusion
Maybe V(h|l) is a better foundation for
quantitative information flow.
Using a single number is crude.
Compare examples d, e.
V(h|l) is not so good with respect to
compositionality.
51