Transcript Slide 1

Cryptography I
Lecture 3
Dimitrios Delivasilis
Department of Information and
Communication Systems Engineering
University of Aegean
Some modular arithmetic
•
When we write
r = (x mod m)
We mean that r is the remainder when x is divided by m. In
the above equation r satisfies the following two constraints:
1)
x = km +r, for some arbitrary k
2)
0≤r≤m
Polyalphabetic Ciphers
• A polyalphabetic cipher is just a set of monoalphabetic
ciphers applied sequentially
• A monoalphabetic cipher enciphers each character ci in the
plaintext in exactly the same way as illustrated below:
Plaintext
Ciphertext
co
f(co)
c1
f(c1)
… cp-1 cp
… f(cp-1) f(cp)
cp+1
f(cp+1)
• A polyalphabetic cipher cycles through a sequence of
distinct transformations, with the result that plaintext
characters are enciphered in several different ways as
below:
Plaintext
Ciphertext
co
f0(co)
c1
f1(c1)
… cp-1
… fp-1(cp-1)
cp
f0cp
cp+1
f1(cp+1)
Attacks on Polyalphabetic Ciphers…
• Index of Coincidence
– Natural language has an uneven distribution of character
probabilities.
– The use of polyalphabetic ciphers tends to smooth out the
distribution of characters in the ciphertext
– If a polyalphabetic cipher has a long period and if the
individual substitution alphabets are chosen randomly, then all
letters in the ciphertext will occur with equal frequencies
– One way to measure evenness of a distribution is to calculate
the sums of the squares of the probabilities of the characters
n
 p (a
i 1
i
)
2
where p(ai) is the probability of character ai
… Attacks on Polyalphabetic Ciphers…
• More on the Index of Coincidence
– For an even distribution with n characters, p(ai) = 1/n and this
sum will be:
n
 (1/ n)  (1/ n)
2
i 1
n
2
2
1

(
1
/
n
)
n  1/ n

i 1
– This value will range from 1/n to 1, with a higher value
indicating a less even distribution.
– This values is the probability that two characters are the same
if they are chosen at random according to the distribution. It is
called the index of coincidence and is written Ic.
– For a 26 letter alphabet an even distribution will result in a
value of 1/26 = 0.038. The actual frequencies with which the
26 letters of the alphabet actually occur in English
approximate 0.065
… Attacks on Polyalphabetic Ciphers…
• Obtaining the Index of Coincidence
– Suppose c(ai) is the number of occurrences of a character ai in
a text. Then we can estimate
n
2
p
(
a
)
 i
i 1
– With
n
n
i 1
i 1
2
2
2
(
c
(
a
)
/
c
)

1
/
c
c
(
a
)
 i
 i
– Where c is the total number of characters in the text
… Attacks on Polyalphabetic Ciphers…
• More on Obtaining the Index of Coincidence
– The value of the Index of Coincidence can be used to estimate
the period of a polyalphabetic cipher
– Given sufficiently long piece of ciphertext, the probabilities of
the individual characters can be estimated from their
frequencies and therefore the index of coincidence can be
calculated
– If the plaintext is known to be English and the value we obtain
is close to 0.065, then we can safely assume that a short
period cipher was used
– If the value is close to 1/n, then we can guess that a long
period polyalphabetic cipher was used.
… Attacks on Polyalphabetic Ciphers…
• Lets carefully consider the following example
Plaintext:
foraciphertextonlyattackweneedsomeknowledgeofthestatisticsofthesourceoftheplainte
xtthesourcemightbecomputerdataEnglishtextimagesdigitsedvoicesignalsandsoonitmig
htevenbepositionalinformationbroadcastbymilitarynavigationsatellitestheamericanmilit
aryhavesuchasysteminwhichtheinformationallowingacoarsepositiontobecalculatedisbr
oadcastplainandextrainformationallowinganaccuratefixisbroadcastencrypted
-Total of 397 characters
-Character frequencies:
a b c d e f g h
i
j
k
l
m
n
o
p
q
r
s
t
41 7
40 0
2
17 12
27
31
7
0
18
25
42
19
12 39
8
10
14
u
v
w
x
y
6
4
5
5
6
-Summing up the square of frequencies we obtain 10743
-Dividing by 3972 = 157609 we get 0.068
-Then encipher the plaintext using a) polyalphabetic cipher of period 3, b)
polyalphabetic cipher with period six and c) polyalphabetic cipher with period 10.
- The index of coincidence became: 0.049, 0.046 and 0.042 respectively
… Attacks on Polyalphabetic Ciphers…
• Estimating Period with the Index of Coincidence
– We cannot make a sharp estimate of the period of
polyalphabetic cipher from the letter frequencies in a
ciphertext
– Lets assume that a ciphertext was created by a polyalphabetic
cipher with period m. The ciphertext can then be rearranged
into m columns, one for each substitution alphabet
– The probability that two characters chosen from the same
column are the same is 0.065
– The probability that two characters chosen from different
columns is the same is just 1/26 = 0.038
… Attacks on Polyalphabetic Ciphers…
• Estimating Period with the Index of Coincidence (Cntd)
– If we choose two characters from anywhere in the text, the
probability that they belong to the same column is 1/m
– The probability that they are in different columns is (m-1)/m
– Therefore the probability that they are the same is:
0.065 0.038 (m  1) 0.065  0.038 (m  1)
Ic 


m
m
m
… Attacks on Polyalphabetic Ciphers…
• Estimating Period with the Index of Coincidence (Cntd)
– We can estimate Ic directly from the character frequencies in
the text and thus obtain m, since
0.065  0.038(m  1)

m
m Ic  0.065  0.038(m  1) 
Ic 
m Ic  0.065  0.038m  0.038 
m( I c  0.038)  0.065  0.038 
m
0.027
I c  0.038
– Thus finally we get:
m  0.027
 1
 2
c

c(a)  0.038

i 1

n
2
… Attacks on Polyalphabetic Ciphers…
• Determining the period with the index of coincidence
In the previous example we obtained Ic= 0.068, 0.049, 0.046 and
0.042 for m=1, 3, 6, and 10 respectively. Pluggin the values for Ic
in the last equation we get estimates for the period of 0.9, 2.45,
3.375 and 6.75 respectively.
These estimates are approximately correct.
This illustrates the importance of having sufficient ciphertext, and
of not relying in a single method for estimating the period.
… Attacks on Polyalphabetic Ciphers…
•
Confirmation methods for the estimated period:
1)
The method of Displacement
According to this method the text is compared against a shifted version of itself
and the number of coincidences is recorded for a range of displacements. If the
number of coincidences is plotted against the displacement, then there will be
peaks of multiples of the cipher period. Thus by inspection, or by calculating the
autocorrelation function, or by using the Fourier spectrum we may be able to find
a frequency component with period equal to the period of the cipher
2)
The Kasiski Test
In the Kasiski test the cryptanalyst looks for repeated sequences of letters in the
ciphertext. Any pair of sequences which are at positions in the plaintext which are
a multiple of the cipher period apart will be enciphered with the same sequence of
substitution alphabets. If such a pair of sequences are identical, then they will be
enciphered identically. To estimate the period we look at factors of distances
between identical sequences of letters. Displacements with the largest number of
occurrences of identical n-grams will tend to have the cipher period as a factor.
… Attacks on Polyalphabetic Ciphers
• Exaple: The Kasiski test
Suppose that the cipher period were 9 and the plaintext were:
the time they ate them
As it happens, the three instances of the are separated by exaclty 9 characters as
shown in the diagram below:
Each of the three instances of the will be enciphered identically