Shannon’s theory
Shannon’s theory part II
Ref. Douglas R. Stinson, Cryptography: Theory and Practice
Shannon’s theory
1949, “Communication Theory of Secrecy Systems,” Bell System Technical Journal
Two issues:
What is the concept of perfect secrecy? Does any cryptosystem provide perfect secrecy?
It is possible when a key is used for only one encryption
How to evaluate a cryptosystem when many
plaintexts are encrypted using the same key?
Perfect secrecy
Definition: A cryptosystem has perfect secrecy if Pr[x|y] = Pr[x] for all x ∈ P, y ∈ C
Idea: Oscar can obtain no information about
the plaintext by observing the ciphertext
[Diagram: Alice encrypts plaintext x and sends ciphertext y to Bob; Oscar observes y. The plaintext is a coin toss with Pr[Head] = 1/2, Pr[Tail] = 1/2.]
Case 1 (Oscar learns nothing): Pr[Head|y] = 1/2, Pr[Tail|y] = 1/2
Case 2 (Oscar learns everything): Pr[Head|y] = 1, Pr[Tail|y] = 0
Perfect secrecy when |K|=|C|=|P|
(P,C,K,E,D) is a cryptosystem where |K| = |C| = |P|.
The cryptosystem provides perfect secrecy iff
every key is used with equal probability 1/|K|, and
for every x ∈ P, y ∈ C, there is a unique key K such that e_K(x) = y
Ex. One-time pad over Z_2:
P: 010, K: 101 => C: 111
P: 111, K: 000 => C: 111
Given C: 111, every plaintext is possible under some key.
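As a minimal sketch (not from the slides; the helper name xor_bits is ours), the one-time pad over Z_2 is a one-liner in Python:

```python
# One-time pad over Z_2: encryption and decryption are both bitwise XOR.
def xor_bits(a: str, b: str) -> str:
    """XOR two equal-length bit strings."""
    return "".join(str(int(x) ^ int(y)) for x, y in zip(a, b))

# Two different (plaintext, key) pairs produce the same ciphertext:
print(xor_bits("010", "101"))  # C = 111
print(xor_bits("111", "000"))  # C = 111
# Given C = 111, every 3-bit plaintext is possible under some key,
# so observing C reveals nothing about the plaintext.
```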
Outline
Introduction
One-time pad
Elementary probability theory
Perfect secrecy
Entropy
Properties of entropy
Spurious keys and unicity distance
Product system
Preview (1)
[Diagram: plaintext x^n encrypted under a single key K into ciphertext y^n]
We want to know:
the average amount of ciphertext required
for an opponent to be able to uniquely
compute the key, given enough computing
time
Preview (2)
That is, we want to know how much uncertainty about the key remains once the ciphertext is observed,
i.e. the conditional entropy H(K|C^n)
We need the tools of entropy
Entropy (1)
Suppose we have a discrete random variable
X
What is the information gained by the outcome
of an experiment?
Ex. Let X represent the toss of a coin,
Pr[head]=Pr[tail]=1/2
For a coin toss, we could encode head as 1 and tail as 0, i.e. 1 bit of information
Entropy (2)
Ex. Random variable X with Pr[x1]=1/2,
Pr[x2]=1/4, Pr[x3]=1/4
The most efficient encoding is to encode x1
as 0, x2 as 10, x3 as 11.
uncertainty / information / codeword length:
Pr[x1] = 1/2 => 1 bit => length 1
Pr[x2] = 1/4 => 2 bits => length 2
Pr[x3] = 1/4 => 2 bits => length 2
Entropy (3)
Notice: probability 2^(-n) => n bits; in general, probability p => -log2 p bits.
Ex. (cont.) The average number of bits to encode X:
(1/2)(1) + (1/4)(2) + (1/4)(2) = 3/2
Entropy: definition
Suppose X is a discrete random variable
which takes on values from a finite set X.
Then, the entropy of the random variable X
is defined as
H(X) = - Σ_{x∈X} Pr[x] log2 Pr[x]
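This definition translates directly into Python; a minimal sketch (the function name entropy is ours):

```python
from math import log2

def entropy(probs) -> float:
    """H(X) = -sum of Pr[x] * log2 Pr[x], skipping zero-probability outcomes."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.25, 0.25]))  # 1.5, matching the worked example above
```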
Entropy: example
Let P = {a, b}, Pr[a] = 1/4, Pr[b] = 3/4.
K = {K1, K2, K3}, Pr[K1] = 1/2, Pr[K2] = Pr[K3] = 1/4.
Encryption matrix (rows are keys, columns are plaintexts a and b):
K1: a → 1, b → 2
K2: a → 2, b → 3
K3: a → 3, b → 4
H(P) = -(1/4) log2 (1/4) - (3/4) log2 (3/4) ≈ 0.81
H(K) = 1.5, H(C) ≈ 1.85
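Using the entropy function from the previous sketch, the three values can be checked numerically (the dictionary layout is ours; the matrix is the one above):

```python
pr_p = {"a": 1/4, "b": 3/4}
pr_k = {"K1": 1/2, "K2": 1/4, "K3": 1/4}
enc = {("K1", "a"): 1, ("K1", "b"): 2,
       ("K2", "a"): 2, ("K2", "b"): 3,
       ("K3", "a"): 3, ("K3", "b"): 4}

# Pr[C = y] = sum over all (key, plaintext) pairs that map to y.
pr_c = {}
for (k, x), y in enc.items():
    pr_c[y] = pr_c.get(y, 0) + pr_k[k] * pr_p[x]

print(round(entropy(pr_p.values()), 2))  # H(P) ≈ 0.81
print(round(entropy(pr_k.values()), 2))  # H(K) = 1.5
print(round(entropy(pr_c.values()), 2))  # H(C) ≈ 1.85
```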
Properties of entropy (1)
Def: A real-valued function f is a strictly concave function on an interval I if
f((x + y)/2) > (f(x) + f(y))/2 for all x ≠ y in I
[Figure: the graph of f lies above the chord joining (x, f(x)) and (y, f(y))]
Properties of entropy (2)
Jensen’s inequality: Suppose f is a continuous strictly concave function on I, and
Σ_{i=1}^n a_i = 1, a_i > 0, 1 ≤ i ≤ n
Then
Σ_{i=1}^n a_i f(x_i) ≤ f( Σ_{i=1}^n a_i x_i )
with equality iff x_1 = … = x_n
Properties of entropy (3)
Theorem: X is a random variable with probability distribution p_1, p_2, …, p_n, where p_i > 0 for 1 ≤ i ≤ n.
Then H(X) ≤ log2 n,
with equality iff p_i = 1/n for all i
* The uniform random variable has the maximum entropy
Properties of entropy (4)
Proof:
H(X) = Σ_{i=1}^n p_i log2 (1/p_i)
≤ log2 ( Σ_{i=1}^n p_i · (1/p_i) )   (Jensen’s inequality, with f = log2)
= log2 n
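A quick numeric check of the theorem, reusing the entropy sketch from above:

```python
from math import log2

print(log2(3))                     # ≈ 1.585, the maximum for 3 outcomes
print(entropy([1/3, 1/3, 1/3]))    # ≈ 1.585: uniform attains the bound
print(entropy([0.5, 0.25, 0.25]))  # 1.5: non-uniform falls strictly short
```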
Entropy of a natural language (1)
H_L: average information per letter in English
1. If the 26 letters were uniformly random,
H = log2 26 ≈ 4.70
2. Taking single-letter frequencies into account,
H(P) ≈ 4.19
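The 4.19 figure can be approximated from published single-letter frequency tables; the percentages below are commonly quoted approximations, so the result only comes close to 4.19:

```python
# Approximate English single-letter frequencies (percent).
freq = {'e': 12.7, 't': 9.1, 'a': 8.2, 'o': 7.5, 'i': 7.0, 'n': 6.7,
        's': 6.3, 'h': 6.1, 'r': 6.0, 'd': 4.3, 'l': 4.0, 'c': 2.8,
        'u': 2.8, 'm': 2.4, 'w': 2.4, 'f': 2.2, 'g': 2.0, 'y': 2.0,
        'p': 1.9, 'b': 1.5, 'v': 1.0, 'k': 0.8, 'j': 0.15, 'x': 0.15,
        'q': 0.10, 'z': 0.07}
total = sum(freq.values())
print(round(entropy(v / total for v in freq.values()), 2))  # ≈ 4.2
```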
Entropy of a natural language (2)
3. However, successive letters have correlations
Ex. digrams, trigrams
Q: entropy of two or more random variables?
Properties of entropy (5)
Def: H(X,Y) = - Σ_{x∈X, y∈Y} Pr[x,y] log2 Pr[x,y]
Theorem: H(X,Y) ≤ H(X) + H(Y),
with equality iff X and Y are independent
Proof: Let
p_i = Pr[X = x_i], 1 ≤ i ≤ m
q_j = Pr[Y = y_j], 1 ≤ j ≤ n
r_ij = Pr[X = x_i, Y = y_j], 1 ≤ i ≤ m, 1 ≤ j ≤ n
so that p_i = Σ_{j=1}^n r_ij and q_j = Σ_{i=1}^m r_ij
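A small numeric check of the theorem (a sketch; the two toy joint distributions are made up for illustration):

```python
def joint_entropy(joint) -> float:
    """H(X,Y) over a joint distribution given as {(x, y): Pr[x, y]}."""
    return entropy(joint.values())

# Independent X, Y (each uniform on {0,1}): equality holds.
indep = {(x, y): 0.25 for x in "01" for y in "01"}
print(joint_entropy(indep))  # 2.0 = H(X) + H(Y) = 1 + 1

# Fully dependent (Y = X): strict inequality.
dep = {("0", "0"): 0.5, ("1", "1"): 0.5}
print(joint_entropy(dep))    # 1.0 < H(X) + H(Y) = 2
```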
Entropy of a natural language (3)
3. Let P^n be the random variable whose probability distribution is that of all n-grams of plaintext.
tabulation of digrams => H(P^2)/2 ≈ 3.90
tabulation of trigrams => H(P^3)/3
…
tabulation of n-grams => H(P^n)/n
H_L = lim_{n→∞} H(P^n)/n
Empirically, 1.0 ≤ H_L ≤ 1.5
Entropy of a natural language (4)
Redundancy of L is defined as
R_L = 1 - H_L / log2 |P|
Take H_L = 1.25; then R_L ≈ 1 - 1.25/4.70 ≈ 0.75.
The English language is 75% redundant!
In principle, English text can be compressed to about one quarter of its original length
Conditional entropy
Given any fixed value y of Y, the information about the random variable X:
H(X|y) = - Σ_x Pr[x|y] log2 Pr[x|y]
Conditional entropy: the average remaining uncertainty about X once Y is known:
H(X|Y) = - Σ_y Σ_x Pr[y] Pr[x|y] log2 Pr[x|y]
Theorem: H(X,Y)=H(Y)+H(X|Y)
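Both the definition and the chain-rule theorem can be checked on the toy distributions from the previous sketch (the helper name is ours):

```python
def conditional_entropy(joint) -> float:
    """H(X|Y) = H(X,Y) - H(Y), from a joint distribution {(x, y): p}."""
    pr_y = {}
    for (x, y), p in joint.items():
        pr_y[y] = pr_y.get(y, 0) + p
    return entropy(joint.values()) - entropy(pr_y.values())

print(conditional_entropy(dep))    # 0.0: Y determines X completely
print(conditional_entropy(indep))  # 1.0: Y tells us nothing about X
```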
Theorem about H(K|C) (1)
Let (P,C,K,E,D) be a cryptosystem, then
H(K|C) = H(K) + H(P) – H(C)
Proof:
H(K,P,C) = H(C|K,P) + H(K,P)
Since the key and plaintext uniquely determine the ciphertext, H(C|K,P) = 0.
Since the key and plaintext are independent, H(K,P) = H(K) + H(P).
Hence H(K,P,C) = H(K,P) = H(K) + H(P).
Theorem about H(K|C) (2)
We have
H(K,P,C) = H(K,P) = H(K) + H(P)
Similarly, since the key and ciphertext uniquely determine the plaintext,
H(K,P,C) = H(K,C)
Now,
H(K|C) = H(K,C) - H(C)
= H(K,P,C) - H(C)
= H(K) + H(P) - H(C)
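Continuing the toy-cryptosystem sketch from the entropy example, both sides of the theorem can be evaluated numerically:

```python
# Joint distribution of (key, ciphertext) for the toy system above.
joint_kc = {}
for (k, x), y in enc.items():
    joint_kc[(k, y)] = joint_kc.get((k, y), 0) + pr_k[k] * pr_p[x]

h_k_given_c = entropy(joint_kc.values()) - entropy(pr_c.values())
print(round(h_k_given_c, 2))        # ≈ 0.46, via H(K|C) = H(K,C) - H(C)
print(round(1.5 + 0.81 - 1.85, 2))  # ≈ 0.46, via H(K) + H(P) - H(C)
```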
Results (1)
Define random variables P^n (the first n plaintext characters), C^n (the first n ciphertext characters), and K (the key).
H(K|C^n) = H(K) + H(P^n) - H(C^n)
Since H_L = lim_{n→∞} H(P^n)/n, for large n
H(P^n) ≈ n H_L = n (1 - R_L) log2 |P|
Also H(C^n) ≤ n log2 |C|. Setting |P| = |C|:
H(K|C^n) ≥ H(K) - n R_L log2 |P|
Spurious keys (1)
Ex. Oscar obtains ciphertext WNAJW, which
is encrypted using a shift cipher
K=5, plaintext river
K=22, plaintext arena
One is the correct key, and the other is
spurious
Goal: prove a bound on the expected
number of spurious keys
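A brute-force search over all 26 keys makes the two candidates visible; a minimal sketch (the function name is ours):

```python
def shift_decrypt(ciphertext: str, key: int) -> str:
    """Decrypt a shift cipher over A-Z."""
    return "".join(chr((ord(c) - ord("A") - key) % 26 + ord("A"))
                   for c in ciphertext)

# Try all 26 keys; only two decryptions are English words.
for k in range(26):
    p = shift_decrypt("WNAJW", k)
    if p in ("RIVER", "ARENA"):
        print(k, p)  # 5 RIVER, then 22 ARENA
```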
Spurious keys (2)
[Diagram: plaintext P^n encrypted under key K into ciphertext C^n]
Given y ∈ C^n, the set of possible keys is
K(y) = { K ∈ K : ∃ x ∈ P^n such that Pr[x] > 0 and e_K(x) = y }
The number of spurious keys is |K(y)| - 1.
The average number of spurious keys is
s̄_n = Σ_{y∈C^n} Pr[y] (|K(y)| - 1) = Σ_{y∈C^n} Pr[y] |K(y)| - 1
Relate H(K|C^n) to spurious keys (1)
By definition,
H(K|C^n) = Σ_{y∈C^n} Pr[y] H(K|y)
≤ Σ_{y∈C^n} Pr[y] log2 |K(y)|
≤ log2 Σ_{y∈C^n} Pr[y] |K(y)|   (Jensen’s inequality)
= log2 (s̄_n + 1)
Relate H(K|C^n) to spurious keys (2)
We have derived
H(K|C^n) ≤ log2 (s̄_n + 1)
H(K|C^n) ≥ H(K) - n R_L log2 |P|
So
log2 (s̄_n + 1) ≥ H(K) - n R_L log2 |P|
If keys are chosen equiprobably, H(K) = log2 |K|, and
log2 (s̄_n + 1) ≥ log2 |K| - n R_L log2 |P| = log2 ( |K| / |P|^(n R_L) )
Relate H(K|C^n) to spurious keys (3)
Theorem: Suppose |C| = |P| and keys are chosen equiprobably. Then the expected number of spurious keys satisfies
s̄_n ≥ |K| / |P|^(n R_L) - 1
As n increases, the right-hand side tends to 0.
Relate H(K|Cn) to spurious keys
(4)
Set sn 0
log 2 | K |
n0
RL log 2 P
For substitution cipher, |P|=|C|=26, |K|=26!
n0 88.4 /( 0.75 4.7) 25
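The n0 ≈ 25 estimate is easy to reproduce; a sketch assuming the textbook values R_L = 0.75 and the key counts above:

```python
from math import factorial, log2

def unicity_distance(key_count: float, alphabet_size: int,
                     redundancy: float = 0.75) -> float:
    """n0 ≈ log2|K| / (R_L * log2|P|)."""
    return log2(key_count) / (redundancy * log2(alphabet_size))

print(unicity_distance(factorial(26), 26))  # ≈ 25.1 (substitution cipher)
print(unicity_distance(26, 26))             # ≈ 1.3  (shift cipher)
```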
Unicity distance
The average amount of ciphertext required for an opponent to be able to uniquely compute the key, given enough computing time
Product cryptosystem
S1 = (P, P, K1, E1, D1), S2 = (P, P, K2, E2, D2)
The product of the two cryptosystems is
S1 × S2 = (P, P, K1 × K2, E, D)
Encryption: e_(K1,K2)(x) = e_K2(e_K1(x))
Decryption: d_(K1,K2)(y) = d_K1(d_K2(y))
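A sketch of the product construction using two shift ciphers (the function names are ours); it also previews the idempotency point below, since composing two shifts is just another shift:

```python
def shift_encrypt(plaintext: str, key: int) -> str:
    """Encrypt with a shift cipher over A-Z."""
    return "".join(chr((ord(c) - ord("A") + key) % 26 + ord("A"))
                   for c in plaintext)

def product_encrypt(x: str, k1: int, k2: int) -> str:
    """e_(K1,K2)(x) = e_K2(e_K1(x))."""
    return shift_encrypt(shift_encrypt(x, k1), k2)

# Composing two shifts is a single shift with key (k1 + k2) mod 26,
# which is why the shift cipher is idempotent (S x S = S):
print(product_encrypt("RIVER", 5, 22))        # SJWFS
print(shift_encrypt("RIVER", (5 + 22) % 26))  # SJWFS
```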
Product cryptosystem (cont.)
Two cryptosystems M and S commute if M × S = S × M
Idempotent cryptosystem: S² = S
Ex. Shift cipher
If a cryptosystem is not idempotent, then
there is a potential increase in security by
iterating it several times
How to find a non-idempotent cryptosystem?
Thm: If S and M are both idempotent, and they commute, then S × M will also be idempotent:
(S × M) × (S × M) = S × (M × S) × M
= S × (S × M) × M
= (S × S) × (M × M)
= S × M
Idea: find simple S and M such that they do not commute; then S × M is possibly non-idempotent.