Transcript Slideshow

ISIT 2000 Tutorial
Cryptography: An Introduction
for Information Theorists
James L. Massey
Prof. em. ETH-Zürich
Adjunct Professor, Lund University
Trondhjemsgade 3, 2t.h.
DK-2100 København Ø
[email protected]
© Copyright 2000, James L. Massey. All rights reserved.
1
Outline
Unconditional Security (“information-theoretic security”)
• Shannon’s theory of secrecy
• Simmons’ theory of authenticity
• Secret sharing
Computational Security:
• Shannon’s work function for a cipher.
• Diffie’s and Maurer’s use of a public randomizer.
• Hiltgen’s bounds on one-wayness for gate complexity.
Public-Key Cryptography
• Diffie and Hellman’s theory
• Public-key (two key) vs. secret-key (one key)
cryptography
2
In 1949, Shannon made the important distinction
between two kinds of security in secrecy systems, viz.
“Theoretical Secrecy” (or “unconditional security”
or “information-theoretic security”): “How secure is a
system against cryptanalysis when the enemy has
unlimited time and manpower available for the analysis
of intercepted cryptograms?”
“Practical Secrecy” (or “computational security”):
“An analysis of the basic weaknesses of secrecy
systems is made. This leads to methods for
constructing systems which will require a large amount
of work to solve.”
C.E. Shannon, "Communication Theory of Secrecy Systems", Bell
System Tech. J., vol. 28, pp. 656-715, Oct., 1949.
3
[Table: cryptographic goals (Secrecy, Authenticity) classified by type of
security (unconditional, computational) and by type of algorithm (one-key,
two-key).]
4
Shannon always made his assumptions clear!
[Figure: The message source produces the message M; the encipherer transforms
it into the cryptogram E = TK(M) using the key K supplied by the key source;
the enemy cryptanalyst observes E; the decipherer recovers M = TK^-1(E) using
the same key K.]
Shannon’s 1949 “Fig. 1—Schematic of a general secrecy
system.” [Tiger added]
This figure makes clear what Shannon’s assumptions are
for a ciphertext-only attack by the enemy cryptanalyst.
5
Shannon’s “Fig. 1” specifies the following assumptions:
• Message M and Key K are independent, i.e.,
PK|M=m(k) = PK(k)
for every k and every m. Equivalently,
H(K | M) = H(K).
• The Message M can be recovered from the Cryptogram
E and the Key K, i.e.,
H(M | EK) = 0.
• The Key K is somehow delivered to the sender and
receiver so that it is unknown a priori by the enemy
cryptanalyst (i.e., “secret-key” or “one-key” cryptography).
6
Shannon always stated his definitions clearly!
Perfect Secrecy: “‘Perfect Secrecy’ is defined by
requiring of a system that after a cryptogram is
intercepted by the enemy the a posteriori probabilities
of this cryptogram representing various messages be
identically the same as the a priori probabilities of the
same messages before the interception.”
In other words, perfect secrecy means that
PM|E=e(m) = PM(m)
for every e and every m, or equivalently, that
H(M | E) = H(M).
N.B.: For Shannon, M is the total message that is sent
before the key K is changed.
7
Shannon gave proofs of what he claimed!
Shannon proved that Vernam's 1926 cipher (now
usually called the “one-time pad") gave perfect secrecy.
(Vernam had mentioned that the “unbreakability” of his cipher was
demonstrated by field trials with the U.S. Army Signal Corps.)
[Figure: the one-time pad. A binary symmetric source (BSS) produces the key K,
delivered secretly to both ends; the cryptogram is E = M ⊕ K; the destination
recovers M = E ⊕ K; the enemy cryptanalyst observes only E.]
For every N-bit message m, PE|M=m(e) = 2^-N for every
N-bit cryptogram e.
⇒ M and E are statistically independent.
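The perfect secrecy of the one-time pad can be verified by brute-force enumeration for small N; a sketch (N = 3, names illustrative):

```python
from itertools import product

N = 3  # message and key length in bits (illustrative)

def otp(m, k):
    """One-time pad: the cryptogram is the bitwise XOR of message and key."""
    return tuple(a ^ b for a, b in zip(m, k))

words = list(product((0, 1), repeat=N))

# For each fixed message m, a uniform key maps m to each of the 2^N
# cryptograms exactly once, so P(E = e | M = m) = 2^-N independently of m.
for m in words:
    images = [otp(m, k) for k in words]
    assert sorted(images) == sorted(words)

print("P(E = e | M = m) = 2^-N for every m and e")
```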
8
Shannon’s Proposition on Key Length:
“For perfect secrecy, ... the number of different
keys is at least as great as the number of
[different messages].”
Proof: For perfect secrecy,
PE |M = m(e) = PE(e)  0
for every e and every m. The number of possible
cryptograms must be at least as great as the
number of different messages. “But all the keys
from a fixed m to different e’s must be different.”
This is actually a stronger statement about key
length than the more usual version of Shannon’s
proposition:
9
Shannon’s proposition on key length is usually
stated (even by Shannon later) as
For perfect secrecy, H(K) ≥ H(M).
Proof: Perfect Secrecy ⇒
H(M) = H(M | E)
     ≤ H(MK | E)
     = H(K | E) + H(M | KE)
     = H(K | E)
     ≤ H(K).
From my personal notes of a lecture given by Shannon in
his course at M.I.T. in May, 1961:
“It is obvious that H(M | E) ≤ H(K | E).”
10
Shannon’s Key-Equivocation Function: H(K | E1E2 … EN)
where Ei is the ith digit of the cryptogram.
[Plot: H(K | E1E2 … EN) versus N. For a strongly ideal cipher the key
equivocation remains at H(K) for all N; for a typical cipher it decreases
toward 0 as N grows.]
“… it is natural to use the equivocation as a
theoretical secrecy index.”
A cipher is strongly ideal if the cryptogram E is
statistically independent of the key K.
11
Equivalently, a cipher is strongly ideal if the
enemy cryptanalyst in a ciphertext-only attack can
never learn anything about the key.
Strongly ideal ciphers exist!
Proof:
[Figure: “Massey’s strongly ideal cipher”: the encipherer simply outputs
E = M and ignores the key K, so E is trivially independent of K.]
12
Less trivially, as Shannon observed:
“To approximate the ideal equivocation, one may
first operate on the message with a transducer
which removes all redundancies. After this almost
any simple ciphering system - substitution,
transposition, Vigenère, etc., is satisfactory.”
Shannon’s arguments directly imply that
[Figure: M → IDEAL DATA COMPRESSOR → ARBITRARY NON-EXPANDING CIPHER
(with key K) → E]
is a strongly ideal cipher!
This would be great, except that no one knows how
to build an ideal data compression system!
13
[Figure: A public randomizer supplies R to the encipherer, the decipherer, and
the enemy cryptanalyst; a private randomizer supplies S to the encipherer only.
The message source produces M; the encipherer forms E = TK(M, S, R) using the
key K from the key source; the decipherer recovers M = TK^-1(E, R); the enemy
cryptanalyst observes E and R.]
Ueli Maurer’s 1990 generalization of Shannon’s
“Fig. 1—Schematic of a general secrecy system.”
14
Maurer’s “Fig. 1” implies
H(E | MKSR) = 0 (encryption)
and
H(M | EKR) = 0 (decryption).
Shannon’s definition of perfect secrecy for a cipher
must be modified to read “M is independent of (E, R)”.
Shannon’s key-length proposition is unchanged:
For perfect secrecy, H(K) ≥ H(M).
Proof: Exercise!
15
[Figure: Simmons’ substitution-attack model with Maurer’s randomizers. The
encipherer forms E = TK(M, S, R) from the message M, the private randomizer
output S, and the public randomizer output R. The enemy cryptanalyst observes
E and R and may substitute a cryptogram E' of his own. E' is accepted if and
only if it is a valid cryptogram for the key K, in which case the decipherer
TK^-1 outputs M'; otherwise it is RECOGNIZED UNAUTHENTIC.]
E' can be the legitimate cryptogram E or a phony
cryptogram Ẽ (Ẽ ≠ E) inserted by the attacker.
Simmons’ Model of a Substitution Attack on an
Authenticity System (with Maurer’s randomizers added).
16
In an impersonation attack, the attacker forms E
without seeing a legitimate cryptogram E.
PI = Probability of successful impersonation when the
attacker uses an optimum attack.
PS = Probability of successful substitution when the
attacker uses an optimum attack.
Pd = Probability of Deception = max(PI, PS)
Simmons’ 1984 bound on the probability of deception:
Pd  2-I(E; K).
The only way to get unconditionally secure authenticity
is to let the cryptogram give information about the key!
17
Example: 1-bit messages with individual signatures.
K = (K1, K2, . . . Kn, Kn+1, . . . K2n) [2n-bit key]
assumed generated by a BSS.
M is 0 or 1.
M = 0 ⇒ E = (0, K1, K2, . . . , Kn)
M = 1 ⇒ E = (1, Kn+1, Kn+2, . . . , K2n)
Note that there is no secrecy!
Whether the attacker observes E or not, he must
guess n bits of key to produce a cryptogram E that
will be accepted as authentic.
⇒ PI = PS = Pd = 2^-n.
But I(E; K) = n bits, so Simmons’ bound holds
with equality!
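This example can be checked exhaustively for small n; a sketch (n = 3, names illustrative) confirming that the optimum impersonation attack succeeds with probability 2^-n:

```python
from itertools import product

n = 3  # tag length; the key has 2n bits

def accepts(key, cryptogram):
    """The receiver accepts (b, tag) iff tag equals the key half selected by b."""
    b, tag = cryptogram
    half = key[:n] if b == 0 else key[n:]
    return tag == half

keys = list(product((0, 1), repeat=2 * n))  # uniform 2n-bit key from the BSS

# Impersonation: the attacker commits to one cryptogram without seeing E;
# his success probability is the fraction of keys accepting it, maximized
# over all possible cryptograms.
P_I = max(
    sum(accepts(k, (b, tag)) for k in keys) / len(keys)
    for b in (0, 1)
    for tag in product((0, 1), repeat=n)
)
print(P_I, 2 ** -n)
```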
18
The informational divergence (or the “Kullback-Leibler
distance” or the “relative entropy” or the “discrimination”)
from P to Q, two probability distributions on the same
alphabet, is the quantity
D(P || Q) = Σ_{x ∈ supp(P)} P(x) log [P(x)/Q(x)].
Fundamental property of informational divergence:
D(P || Q) ≥ 0 with equality if and only if P = Q.
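A direct computation of the divergence, as a quick sanity check of the fundamental property (distributions illustrative):

```python
import math

def divergence(P, Q):
    """Informational divergence D(P||Q) in bits, summed over supp(P)."""
    return sum(p * math.log2(p / Q[x]) for x, p in P.items() if p > 0)

P = {0: 0.5, 1: 0.5}
Q = {0: 0.9, 1: 0.1}
print(divergence(P, P))       # 0.0: equality holds iff P = Q
print(divergence(P, Q) >= 0)  # True
```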
Let H0 and H1 be the two possible hypotheses and let Y
be the observation used to determine which hypothesis is
true. Let D0 and D1 be the regions of Y values in which
one decides for H0 or H1, respectively. Let α and β be the
error probabilities when H0 or H1 is true, respectively.
19
Let V (0 or 1) be the decision for which hypothesis is
true so that
α = PV|H0(1)
and
β = PV|H1(0).
Direct calculation gives
D(PV|H0 || PV|H1) = α log [α/(1-β)] + (1-α) log [(1-α)/β].
Fundamental information-theoretic bound for
hypothesis testing:
D(PY|H0 || PY|H1) ≥ α log [α/(1-β)] + (1-α) log [(1-α)/β]
with equality if and only if PY|H0(y)/PY|H1(y) has constant
value for all y ∈ D0 and has constant value for all y ∈ D1.
20
For the important special case where α = 0, i.e., where
we never err when H0 is true, the previous bound gives
β ≥ 2^-D(PY|H0 || PY|H1).
Now suppose that H0 is the hypothesis that Y is the
legitimate cryptogram E for the key k, i.e.,
PY|H0(y) = PE|K=k(y), and that H1 is the hypothesis that
Y is formed by an impersonating attacker according to
PY|H1(y) = PE(y) = Σk PK(k) PE|K=k(y),
which may not be the optimum attacking strategy. Let
βk be the error probability when K = k so that
βk ≥ 2^-D(PE|K=k || PE).
21
D(PE|K=k || PE) = Σy PE|K=k(y) log [PE|K=k(y)/PE(y)].
Then
β = Σk PK(k) βk ≥ Σk PK(k) 2^-D(PE|K=k || PE)
  ≥ 2^-Σk PK(k) D(PE|K=k || PE),
where we have used Jensen’s inequality. But
Σk PK(k) D(PE|K=k || PE) = H(E) - H(E | K) = I(E; K).
Moreover, β is just the probability PI of successful
impersonation, so the information-theoretic bound becomes
PI ≥ 2^-I(K; E).
22
Simmons’ proof of his bound on the probability of
deception appears in
G. J. Simmons, "Authentication Theory/Coding Theory," pp. 411-431 in Advances in
Cryptology - CRYPTO '84 (Eds. G. R. Blakley and D. Chaum), Lecture Notes in
Computer Science No. 196. Heidelberg and New York: Springer, 1985.
Several simplifications of his derivation have since been
given. The most insightful one, which we have followed, is
by Maurer, cf.
U. M. Maurer, "Information Theoretic Bounds in Authentication Theory," p. 12 in Proc.
IEEE Int. Symp. Info. Th., Whistler, Canada, Sept. 17-22, 1995.
U. M. Maurer, "A Unified and Generalized Treatment of Authentication Theory," pp. 387-398
in Proc. 13th Symp. on Theoretical Aspects of Computer Science (STACS '96),
Lecture Notes in Computer Science No. 1046, New York: Springer, 1996.
Maurer based his treatment on Blahut's information-theoretic approach to
hypothesis testing, cf.
R. E. Blahut, "Hypothesis testing and information theory," IEEE Trans. Inform. Theory,
vol. IT-20, pp. 405-417, July 1974.
23
Classical Secret Sharing
The secret:            1 0 0 1 0 1 1 0 0 1
Share 1 (first half):  1 0 0 1 0
Share 2 (second half): 1 1 0 0 1
• The secret “leaks out”
• Both crooks are needed to open
24
No-Leakage Secret Sharing
The secret
1 0 0 1 0 1 1 0 0 1
Share 1: BSS output
0 0 1 1 0 1 0 1 1 1
Share 2: Secret ⊕ BSS output
1 0 1 0 0 0 1 1 1 0
• No leakage (Perfect secrecy!)
• Both crooks are needed to open
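The no-leakage scheme above is two lines of code; a sketch (byte strings instead of 10-bit blocks, names illustrative):

```python
import secrets

def split(secret: bytes):
    """No-leakage 2-of-2 sharing: share1 is a BSS output,
    share2 = secret XOR share1."""
    share1 = secrets.token_bytes(len(secret))
    share2 = bytes(s ^ r for s, r in zip(secret, share1))
    return share1, share2

def combine(share1: bytes, share2: bytes) -> bytes:
    """XOR the two shares back together to recover the secret."""
    return bytes(a ^ b for a, b in zip(share1, share2))

s1, s2 = split(b"attack at dawn")
print(combine(s1, s2))  # b'attack at dawn'
```

Each share alone is a uniformly random string (perfect secrecy); only together do they open the safe.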
25
What is really going on here?
We are actually using the (n = 3, k = 2) linear code over
GF(2^10) with generator matrix and parity-check matrix

G = [1 0 1]
    [0 1 1]

and

H = [1 1 1],

respectively. This code has dual distance d = 3. We
are using this code in the following manner:
• Letting [v1, v2, v3] be the codeword, we choose the
first digit v1 to be the 10-bit secret;
• we choose the second digit v2 uniformly at random
in GF(2^10);
• we compute the third digit v3 so that [v1, v2, v3] Hᵀ = 0;
• and we select v2 and v3 as the two shares of the secret.
26
Proposition: If we choose the secret to be the first digit in a
systematic (n, k) maximum-distance separable (MDS) linear
code, choose the next k - 1 information digits uniformly at
random, compute the codeword [v1, v2, . . . , vn] to satisfy
the parity-check equations, and then take v2, v3, . . . , vn to
be the n - 1 shares of the secret, the resulting secret-sharing scheme has the properties that
• any k shares uniquely determine the secret, but
• any k - 1 (or fewer) shares give no information about the
secret [in the sense of perfect secrecy].
Proof: Any k positions in an MDS code form an information
set. The first position together with any k - 1 other positions
form an information set so if we choose the digits in these
other positions, then every value of the first digit (the secret)
is possible in exactly one codeword.
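This MDS construction is exactly what Shamir’s polynomial scheme (cited on a later slide) realizes via Reed-Solomon codes. A minimal k-of-n sketch over a prime field instead of GF(2^10), with illustrative names:

```python
import random

P = 2**31 - 1  # a prime; all arithmetic is in GF(P)

def share(secret, k, n):
    """Shamir k-of-n sharing: the secret is the constant term of a random
    degree-(k-1) polynomial; the shares are its values at x = 1..n."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def poly(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, poly(x)) for x in range(1, n + 1)]

def recover(shares):
    """Lagrange interpolation at x = 0 recovers the secret from any k shares."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if j != i:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

shares = share(12345, k=3, n=5)
print(recover(shares[:3]))  # any 3 of the 5 shares suffice: 12345
```

Any k shares determine the degree-(k-1) polynomial uniquely; any k-1 shares leave every value of the secret equally likely, just as in the proposition.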
27
This proposition generalizes for access structures to:
Proposition: If we choose the secret to be the first digit in a
systematic (n, k) linear code with dual distance d, choose
the next k - 1 information digits uniformly at random,
compute the codeword [v1, v2, . . . , vn] to satisfy the parity-check equations, and then take v2, v3, . . . , vn to be the n - 1
shares of the secret, the resulting secret-sharing scheme
has the property that
• a collection of shares gives no information about the
secret [in the sense of perfect secrecy] unless this
collection contains all the shares corresponding to the
remaining non-zero positions of a codeword in the dual
code whose first component is 1, in which case the
collection of shares uniquely determines the secret.
Proof: The only constraints among code digits are those
given by the codewords of the dual code.
28
Some references for secret sharing:
Secret sharing was introduced by Adi Shamir in his paper:
A. Shamir, "How to share a secret", Comm. ACM, vol. 22, pp. 612-613, November 1979,
in which he essentially reinvented the Reed-Solomon
codes.
The connection to Reed-Solomon codes (and maximum-distance-separable
codes in general) was pointed out in
McEliece, R. J. and Sarwate, D. V., "On sharing secrets and Reed-Solomon
codes," Communications of the ACM, vol. 24, pp. 583-584, 1981.
The generalization for arbitrary access sets is given in
J. L. Massey, "Minimal Codewords and Secret Sharing," in Proc. 6th
Joint Swedish-Russian Int. Workshop on Info. Theory, 1993, pp. 276-279.
29
Shannon defined the work function W(N) of a cipher
as the expected computational work to find all
consistent keys, given the first N bits of the ciphertext.
Shannon postulated that the work function of a typical
cipher would have the following form:
[Plot: the work function W(N) versus N, with the “unicity distance” marked:
the amount of intercepted ciphertext beyond which the key is essentially
uniquely determined.]
No one has ever calculated, or even given an
interesting lower bound on, the work function of
any practical cipher!
30
Shannon realized that practical ciphers would have to
depend on computational security.
“The problem of good cipher design is essentially one
of finding difficult problems, subject to certain other
conditions. This is a rather unusual situation, since one
is ordinarily seeking the simple and easily soluble
problems in a field.”
“How can we ever be sure that a system which is not
ideal and therefore has a unique solution for sufficiently
large N will require a large amount of work to break
with every method of analysis? ... We may construct
our cipher in such a way that breaking it is equivalent to
(or requires at some point in the process) the solution
of some problem known to be laborious.”
What is a hard problem?
31
The most commonly used definition today in
cryptography of a “hard problem” is:
“This is a hard problem—no one has been
able to solve this problem.”
(Vernam would have approved of this definition.)
“Problems worthy of attack
prove their worth by hitting
back!”
Piet Hein
“A hard problem is one that nobody works on.”
32
“A rose is a rose is a rose.”
Gertrude Stein
But in computational complexity theory,
“a Problem is not a problem is not a problem”
and
“a Function is not a function is not a function”.
A Problem (or Function) must have countably
infinitely many instances, each of which is an individual
problem (or function). [In simple words, a
Problem (or Function) is a family of problems (or
functions).]
33
Example (Jevons’ problem, 1873): m = 8 616 460 799
is the product of two distinct primes; what are
they?
Jevons stated that “I think it is unlikely that anyone will
ever know; for they are two large prime numbers.”
N.B.: 8 616 460 799 = 96 079 × 89 681
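Jevons’ instance falls in milliseconds to trial division on a modern machine, which illustrates how “hard” depends on the computation available; a quick sketch:

```python
def factor_semiprime(m):
    """Trial division: return the two prime factors of an odd semiprime,
    smaller factor first."""
    d = 3
    while d * d <= m:
        if m % d == 0:
            return d, m // d
        d += 2
    raise ValueError("no odd factor found")

print(factor_semiprime(8_616_460_799))  # (89681, 96079)
```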
Example: The Problem: Given the product m of
two distinct primes p1 and p2, find these primes.
(Jevons’ problem is an instance of the above
Problem.)
Breaking a cipher is an individual problem instance, not a
Problem (a family of problems). It seems to me that complexity
theory has little to say about the security of practical ciphers.
34
[Figure: instances (functions) f1, f2, f3, … of increasing “size”.]
Suppose that computing the Function f is NP-hard
and NP ≠ P.
⇒ No algorithm exists that can efficiently compute
the value of all instances of the Function f.
But, for every individual instance (function) of the
Function f, there might be an algorithm that easily
computes this function!
35
Diffie’s (unpublished) scheme:
[Figure: a bank of 2^L telephones, numbered 00...000 through 11...111.]
When one dials the L-bit telephone
number of one of these 2^L
telephones, one receives a recorded
message consisting of an N-bit
string (N >> L) that was produced
by a BSS for this telephone only.
• Alice and Bob somehow agree privately on an L-bit secret key K.
• Later, when Alice wants to send Bob a confidential message of
length N, she dials telephone K, gets its sequence, and does a
“one-time pad” encryption of the message.
• When Bob gets the cryptogram from Alice, he dials telephone K, gets
its sequence, and does a “one-time pad” decryption of the message.
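Diffie’s scheme is straightforward to simulate; a toy sketch with illustrative names (phone_book plays the public randomizer). Its security is only computational, since an enemy could in principle dial all 2^L telephones:

```python
import secrets

L, N = 4, 32  # L-bit telephone numbers; N-byte recorded pads (N >> L)

# The public randomizer: every telephone plays a fixed BSS-generated string.
phone_book = {num: secrets.token_bytes(N) for num in range(2**L)}

def otp(message: bytes, pad: bytes) -> bytes:
    """One-time-pad encryption/decryption: bitwise XOR with the pad."""
    return bytes(m ^ p for m, p in zip(message, pad))

K = secrets.randbelow(2**L)                        # Alice and Bob's shared L-bit key
cryptogram = otp(b"meet at noon", phone_book[K])   # Alice dials K and encrypts
recovered = otp(cryptogram, phone_book[K])         # Bob dials K and decrypts
print(recovered)  # b'meet at noon'
```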
36
y = f(x) is a one-way function (or permutation) if
• for every x, it is easy to compute f(x) but,
• for virtually all y, it is infeasibly difficult to find an x
(the x) such that f(x) = y.
y = f(x, z) is a secret-key one-way function (or
permutation) if
• it is easy to make a random choice of z, and
• when z is known then, for every x, it is easy to compute
f(x, z), but
• when z is not known, then for virtually all y, it is
infeasibly difficult to find an x (the x) such that f(x, z) = y,
even if one knows many (xi, yi) pairs for this z.
[Note that this differs from the definition, given later on these
transparencies, of a keyed one-way permutation in that f(x, z) need
not be invertible for fixed z nor, if one knows z and is given y, need it
be easy to find an x such that f(x, z) = y.]
37
Public-Key Cryptography
A trapdoor one-way permutation is a family of
permutations indexed by a parameter z (the trapdoor)
such that
• it is easy to make a random choice of z,
• if you know z you can easily find fast algorithms Ez
and Dz that implement the permutation fz(.) and its
inverse gz(.), respectively, but
• if you don’t know z then, for virtually all y in the
range of fz(.), it is infeasibly difficult when given y to
find the x such that y = fz(x) even if you know the
fast algorithm Ez for computing fz(.) [and hence can
act as your own magic genie (or “oracle”)].
38
The mechanical analogy of a
Trapdoor One-Way Permutation:
A lock with two very different keys--the key EZ is for
LOCKING and the key DZ is for UNLOCKING.
Public-key cryptography is thus sometimes called two-key
cryptography or asymmetric cryptography.
39
As was noted by Diffie and Hellman in 1976, with a
Trapdoor One-Way Permutation one can build a
computationally secure public-key cryptosystem:
[Figure: The message source emits x; Alice encrypts, computing y = fZB(x)
using Bob's public encryption algorithm EZB, obtained from a Trusted Public
Directory listing (A, EZA), (B, EZB), etc. The cryptogram y travels over the
PUBLIC CHANNEL; Bob decrypts, computing x = gZB(y) with his private inverse,
and delivers x to the message sink.]
Something new! Trusted
information instead of secrets.
40
A keyed one-way permutation (KOWP) is a family
of permutations indexed by a parameter z (the key)
such that
• it is easy to make a random choice of z,
• if you know z you can easily find fast algorithms Ez
and Dz that implement the permutation fz(.) and its
inverse gz(.), respectively, but
• if you don’t know z then, for virtually all y in the
range of fz(.), it is infeasibly difficult when given y to
find the x such that y = fz(x) even with the help of a
magic genie (or “oracle”), who when given any x'
immediately returns y' = fz(x').
41
The mechanical analogy of a
Keyed One-Way Permutation (KOWP):
A lock with only one key--the same key z is used for
LOCKING and for UNLOCKING.
Secret-key cryptography is thus sometimes called
one-key cryptography or symmetric cryptography.
42
With a Keyed One-Way Permutation one can build
a secret-key cryptosystem that is computationally
secure against a chosen-plaintext attack.
Note that a Trapdoor One-Way Permutation can be
used as a Keyed One-Way Permutation simply by
Alice and Bob sharing the trapdoor z as a secret key
and never publishing the encryption algorithm.
It follows that it is easier to build a computationally
secure secret-key cryptosystem than to build a
computationally secure public-key cryptosystem.
The advantage of public-key systems is not their
greater security, but their ability to create public
signatures and to ease the key-distribution problem.
43
As was also noted by Diffie and Hellman in 1976,
with a Trapdoor One-Way Permutation one can
create computationally-secure public signatures.
[Figure: A redundant message source emits x; Alice signs, computing the
random-looking string gZA(x) with her private inverse. Bob verifies: he
applies fZA(.), using Alice's public algorithm EZA from the Trusted Public
Directory, then applies a redundancy check to the result, either accepting
the intelligible message x or declaring it RECOGNIZED UNAUTHENTIC.]
Public signatures cannot be created by secret-key methods!
44
The quest for provable security:
To make real progress toward provably secure
cryptosystems, we need to use a complexity measure that
captures the complexity of individual instances of a
function. Gate complexity seems to be the most useful
such measure at this time.
A gate is a boolean function of two variables.
The gate complexity of a boolean function
f(x1, x2, … , xn) is the smallest number of gates in an
acyclic gate network that computes this function.
(If there is at least one gate, then the output of the network
is the output of a designated gate whose output is not
otherwise used.)
N.B. The functions 0, 1, x1, x2, … , xn and only these
functions have gate complexity 0.
45
Virtually all lower bounds on gate complexity rely on
Shannon’s “counting argument” that was given in
C. E. Shannon, “The Synthesis of Two-Terminal Switching Circuits,"
Bell System Tech. J., vol. 28, pp. 59-98, 1949.
If the number of different functions that can be
computed with K gates is NK and the number of
functions in a certain class is Nf where Nf > NK, then
there is at least one function in the class that has gate
complexity greater than K. Moreover, the fraction of
functions in the class that has gate complexity K or
less is at most NK/Nf.
To apply this argument, we first overbound NK with GK,
the number of different acyclic gate networks with K gates
(K ≥ 1), many of which compute the same function.
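Shannon's counting argument is easy to evaluate numerically. A sketch, assuming the overbound GK ≤ (16 (K + n)^2)^K (each of the K gates is one of 16 types, and each of its two inputs comes from at most K + n signals):

```python
import math

def log2_G(K, n):
    """log2 of the overbound (16 (K + n)^2)^K on the number of
    acyclic K-gate networks on n inputs."""
    return K * (4 + 2 * math.log2(K + n))

n = 10
K = 2**n // (2 * n)   # K = 51 gates
log2_Nf = 2.0**n      # there are 2^(2^n) boolean functions of n variables

# Fewer networks than functions: some function needs more than K gates,
# and the fraction computable with K gates is at most 2^(log2_G - log2_Nf).
print(log2_G(K, n) < log2_Nf)            # True
print(round(log2_G(K, n) - log2_Nf))     # roughly -215
```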
46
Overbounding GK:
[Figure: an acyclic network of gates 1, 2, …, K with inputs x1, x2, …, xn and
output f(x1, x2, …, xn).]
Each of the two inputs to gate i must be one of the n
inputs or the output of one of the i - 1 previous gates.
Gate i must be one of the 16 possible gates. Thus
GK ≤ Π_{i=1}^{K} 16 (n + i - 1)^2 ≤ 16^K (K + n)^{2K} = (4(K + n))^{2K}.
Evaluating this overbound at K = 2^n/(2n) shows that
G_{2^n/(2n)} < 2^(2^n) for all n ≥ 5,
i.e., there are fewer such networks than boolean functions of n variables.
47
The number of different boolean functions of n variables
is just Nf = 2^(2^n), so we have proved:
There exist boolean functions of n variables with gate
complexity at least (1/2)·(2^n/n) for all n ≥ 5. Virtually all
boolean functions of n variables have at least this gate
complexity.
Using careful bounding arguments, Hiltgen proved:
Proposition: The gate complexity of the most complex
boolean function of n variables is at least (1/n)·2^n and is less
than 3.8·(1/n)·2^n for all n ≥ 3 (except possibly 6 and 7).
N.B. The best construction, due to Blum, achieves gate
complexity only 3n - 3.
48
Perhaps the most fundamental question is: Do one-way
permutations exist? The strongest result on
computational asymmetry is due to Hiltgen:
Proposition: For all n sufficiently large, there exist
permutations on n binary digits (in which no output is
equal to one of the inputs) whose inverse has gate
complexity essentially twice that of the permutation itself.
Is this too pessimistic? We have proved:
Proposition: For all n  6, virtually all permutations on
n binary digits have gate complexity different by a
factor of less than 2.5 from the gate complexity of their
inverse.
49
You can find a proof of the previous proposition as well
as more information on gate complexity by going to
http://www.iacr.org and following the links to the IACR
Distinguished Lectures and then to my 1996 lecture.
If one-way functions turn out to exist, the important
practical problem will be to find one.
References to Hiltgen’s work:
A. P. L. Hiltgen, Cryptographically Relevant Contributions to
Combinatorial Complexity Theory, (ETH-Zürich Dissertation).
Konstanz: Hartung-Gorre Verlag, 1994. (ISBN 3-89191-745-7)
A. P. Hiltgen, “Constructions of Feebly-One-Way Families of
Permutations," in Advances in Cryptology--AUSCRYPT '92 (Eds. J.
Seberry and Y. Zheng), Lecture Notes in Computer Science No.
718. New York: Springer, 1993, pp. 422-434.
50