La sicurezza dei sistemi distribuiti

Download Report

Transcript La sicurezza dei sistemi distribuiti

PART FIVE:
Security issues in
distributed systems
Suggested reading

Crittografia, P. Ferragina e F. Luccio, Ed.
Bollati Boringhieri, € 16.
2
Roadmap

Introduction
Computer security vs network security
 Attacks to security


Criptography
Private key (symmetric) cryptography
 Public key (asymmetric) cryptography

3
Computer security vs network (i.e.,
distributed systems) security


Computer security: aims at
protecting information within a
computer
Network security: aims at protecting
the exchange of information among
computers
4
Basic problems in network security


Networks are insecure because most of the
communications occur in clear (e.g., e-mail,
http, ftp,...)
No physical point-to-point connections but




through shared lines
through third-party routers
Often there is no router authentication, but
only (and not always) of users
New security issues: e-commerce, e-banking,
e-government, etc.
5
Attacks to information security
6
Information security: main issues




Confidentiality: prevent the data sent from
one person A to person B to be understood by
a third party C.
Authentication: verify the identity of who
sends or receives data.
Integrity: be sure that the data received are
identical to those sent.
Non-repudiation: prevent users who send data
may in future negate to have sent them
(digital signature).
Ensuring security: cryptography


To address the above security issues, one has
to make use of cryptographic techniques, i.e.,
of communication protocols that overcome the
influence of adversaries
Cryptography (from the greek kryptos, hidden,
and graphein, writing) is the discipline that in
the old days dealt with the study of the
“secret scriptures”: today is a “set of
techniques that permit the construction of an
encrypted text and the decryption of a
cryptogram.” (Garzanti, 1972)
8
Cryptography: a brief history




Cryptography was used in the ancient antiquity to hide the
content of text messages (in modern terms, to ensure
confidentiality)
Cryptography experienced a tremendous development during the
Second World War, when British mathematician Alan Turing
formalized the theory needed to decrypt the Nazi German
cryptosystem known as Enigma
In 1949, Claude Shannon published a paper which gave start to
what is now called Theory of Information, that along with
Probability, Computational Complexity, and Number Theory
forms the basis of Modern Cryptography
Modern cryptography is concerned with all the aspects
(theoretical, computational, implementative) related to
information security
Cryptosystem

Definition: A cryptosystem is a quintuple
(P,C,K,Cod,Dec), where:
 P: finite set of plaintexts
 C: finite set of ciphertexts
 K: set of possible encryption
keys
 Cod: P x K → C: encryption function (injective and
invertible)
 Dec: C x K → P: decryption function


Kerckhoffs’ principle: a cryptosystem should be
secure even if everything about the system,
except the key, is public knowledge: thus, all its
strength is based on the inviolability of the key
It can be rephrased in the Shannon maxim: the
enemy knows the system!
Symmetric vs asymmetric cryptosystems

If Cod and Dec use the same key to encrypt
and to decrypt a given text, we talk about
symmetric cryptosystems, otherwise of
asymmetric cryptosystems
Symmetric cryptosystem: since the same key
is used to encrypt and to decrypt data,
sender A and receiver B must share such a
key
 since this key must be kept secret, the main
problem is the key exchange!

Symmetric key scenario
The problem of transmitting the key
Q: If you want to use a symmetric cipher
to protect the dataflow between two
parties, how to exchange the secret
key?
A: You must use a secure channel of
communication!
A first example of a symmetric
key cipher: The Caesar cipher



Let us consider the Italian alphabet, and let us construct
a cipher that replaces each letter of the alphabet by the
letter which is 3 (this is the key!) positions forward:
For example, the clear text “distributed algorithm" is
encrypted in the cryptogram “gnvzuneazhg dolrunzmpv”.
However, as most of the simple ciphers based on
transpositions and/or shifting, it can be easily attacked
by means of statistical analysis
Statistical cryptanalysis

The plaintext is obtained by means of the use of
statistical techniques on the frequency of characters or
substrings of the ciphertext, as compared to the
corresponding frequencies of the associated language
Perfect cryptosystems


A cryptosystem is called perfect if plaintext
and ciphertext are statistically independent
More formally, if we denote by P(m) the
probability that a message from A to B
contains m, and by P(m|c) the probability that
a message is equal to m after observing an
encrypted message c, then a cipher is perfect
iff for any mM and for any cC it holds
P(m|c)=P(m).
A necessary condition to be perfect
Theorem (Shannon): A cryptosystem is perfect only
if |K|≥|M|.
Proof: For the sake of contradiction, assume the
existence of a perfect cryptosystem with |K|<|M|.
By the injectivity of the Cod function, |M|≤|C|, i.e.,
|K|<|C|. Let m be a message s.t. P(m)=p≠0. Then, m
can generate at most |K|<|C| ciphered messages
(one for each key). It follows that there exists at
least a ciphered message c* which is not image of m,
namely
P(m|c*)=0≠p=P(m)
against the assumption of perfectness.
□
Two very unperfect ciphers



Assume that P(m)=p, 0<p<1, and that
P(m|c)=0≠p; then, a cryptoanalyst that sees c,
is able to infer that the encrypted message is
not m!
Assume now that P(m)=p, 0<p<1, and that
P(m|c)=1≠p; then, a cryptoanalyst that sees c,
is able to infer that the encrypted message is
exactly m!
In all the intermediate cases in which
P(m|c)≠p, a cryptoanalyst is able to infer
something by observing the ciphered
messages!
A perfect cryptosystem

One-time pad (Vernam G., AT&T, 1917):
Builds a large random (and not
pseudorandom) key, for example using a
detector of cosmic rays.
2. The ciphertext is constructed by a bitwise
XOR between the plaintext m and the key
k (recall 10=01=0; 11=00=1)
1.
3.
4.
5.
A sends c=mk
B decrypts m=ck (indeed xyy=x)
The key should never be reused (one-time
pad).
One-time pad is perfect!
We have to show that P(m|c)=P(m). Let m and c be of n bits;
by definition of conditional probability, we have:
P(m|c)=P(m∩c)/P(c)
where P(m∩c) is the probability that A generates m and
ciphers it as c; then
P(m∩c)=P(m∩c:=mk)
and since k is truly random, and the XOR transforms a 0 of
m in a 0 of c with probability 1/2, and 0 in 1 with probability
1/2, and a 1 in 0 with probability 1/2, and finally a 1 in 1 with
probability 1/2, it follows that any bit of c is statistically
independent of the corresponding bit of m. Thus, m and c are
independent, and so
P(m∩c:=mk) = P(m)∙P(c)
from which it follows that P(m|c)=P(c).
□
From perfection to reality ...


One-time pad is only theoretically perfect: how A
and B do actually exchange key k?!? If they
exchange it a priori (by not using a traditional
communication channel), then its length will bound
the length of the message to be encrypted!
(notice however that one-time pad was used for
the Moscow–Washington red line)
Instead of being perfect (i.e., provably secure but
practically unusable), all used cryptosystems are
computationally secure: the cryptanalytic problem
(namely, the decryption of a ciphertext without
knowing the key) is computationally intractable
The state-of-the-art in symmetric
key encryption: Rijndael



Developed by Joan Daemen and Vincent
Rijmen.
This algorithm has won the selection for
Advanced Encryption Standard (AES) in
2000. Officially, the Rijndael method has
become the standard for symmetric key
encryption of the XXI century.
The cipher uses a set of keys of variable
length (128, 192, 256 bits), and a network
of “shuffling of the message," in which
multiple operations of transposition,
substitution, and xoring of blocks of
fixed length are performed. Keys are
exchanged by encrypting them with RSA
cryptography (to be seen next).
Limits of symmetric key ciphers



Does a secure channel of communication to
exchange the secret key actually exist in
reality? And if it does exist, why using
encryption??
In addition, for secure communication
between n users, one must exchange a
quadratic number of keys!
Finally, the symmetric method does not
distinguish between sender and receiver, and
so it cannot address security issues like the
authentication and the non-repudiability of a
message
Asymmetric key (a.k.a. public key)
algorithms

Each subject S has a pair of keys:



A public key Kpu(S), known to all;
A private key Kpr(S), known only by himself.
The requirements that a public key algorithm
must enjoy are:



Data encoded with one key can be decrypted only
with the other one;
The private key should never be transmitted in
the network;
It must be very difficult to derive a key from the
other one (in particular, the private key from the
public key).
The various public key scenarios
First scenario: A encodes a message with the public key
associated with B, which then decodes the message by
using its own private key; in this way, confidentiality and
integrity are guaranteed (B only can read the message)
The various public key scenarios
Second scenario: A “signs” a message by encoding it with
its own private key, and then sends it to B, which then
authenticates the message by using the public key
associated with A; in this way, authenticity and nonrepudability are guaranteed (all can read the message, but
A only can have signed it)
The various public key scenarios
Third scenario: A “signs” a message by encoding it with its
own private key, then re-encodes it with the public key
associated with B; hence, it sends it to B, which decodes it
by using its own private key, and then authenticates it by
using the public key associated with A; in this way,
confidentiality, integrity, authenticity, and nonrepudability are guaranteed.
The birth of PKI systems
• Where do I find the public keys of my
recipients?
• Creation of archives of public keys, the public
key servers.
• But who guarantees the correspondence of
public keys with the respective owners?
 Birth of the Certification Authority (CA).
• At this point, who guarantees the validity of a
certificate authority?
Act of faith!
The mathematics of public key
systems
It was introduced by Diffie and Hellman in 1976:
Definition: A function f is called one-way if for
every x the computation of y=f(x) is simple
(i.e., it is in P), while the calculation of x=f-1(y)
is computationally hard (i.e., it is NP-hard).
Definition: A one-way function is called trapdoor
if the calculation x=f-1(y) can be made easy
once that additional information (private) are
known.
... But unfortunately for them, they were not able
to build a one-way trapdoor function!
The RSA algorithm



Designed in 1977 by Ron Rivest, Adi Shamir and
Leonard Adlemann, the cipher is patented, and has
become public knowledge until 2000.
Basic idea: given two prime numbers p and q (very
large), it is easy to calculate the product n = p∙q, while
it is very difficult to compute the factorization of n
(although this problem is not known to be NP-hard).
The best factorization algorithms currently available
(Quadratic Sieve, Elliptic Curve Method, Pollard’s
Heuristic, etc.). all have an exponential complexity, in
the order of:
The RSA algorithm

To ensure security, it is necessary that p and q are at
least 200 decimal digits. Indeed, in this way n=p∙q is 400
digits long, namely is in the order of 10400, and so:
≈ e79 ≈ 1034
which is computationally intractable.
 keys are typically 1024 bits long (21024 ≈ 10300)

RSA is much slower than symmetric key algorithms, and
it is often applied for the transmission of small amount
of data, like the private key in a symmetric key system
(as noticed before).
RSA at work: key generation
Recall: xy mod z  the remainder of the integer
division between x and z, and between y and z is the
same, namely x mod z = y mod z (or, equivalently, there
exists an integer k s.t. x=y+kz)
1. Choose two large primes p and q and computes n=p∙q.
2. Compute the Euler totient function w.r.t. n, i.e., the
cardinality of all numbers less than n and prime with it:
ϕ(n)=ϕ(pq)=pq-[(q-1)+(p-1)]-1=pq-(p+q)+1=
=(p-1)·(q-1)=ϕ(p)·ϕ(q)
(since there are q-1 multiples of p less than n, and p-1
multiples of q less than n)
3. Choose a number 0<e<ϕ(n) s.t. GCD(e,ϕ(n))=1 (i.e., e,ϕ(n)
are coprime)
4. Define the public key as (e,n).
5. Compute d such that e·d1 mod ϕ(n).
6. Define the private key as (d,n).
RSA at work
A sends a crypted message x to B
The encryption function of A is Cod(x):=xe mod n (with
x<n), where (e,n) is the public key of B.
The decryption function of B is:
Dec(Cod(x)):=Cod(x)d mod n = (xe mod n)d mod n
where (d,n) is the private key of B.
1.
2.
1.
2.
A sends a signed (i.e., non-repudable) message x to B
The encryption function of A is Cod(x)=xd mod n (with
x<n), where (d,n) is the private key of A.
The decryption function of B is:
Dec(Cod(x)):=Cod(x)e mod n = (xd mod n)e mod n
where (e,n) is the public key of A.
 Notice that public and private keys can be used
interchangably, since Dec(Cod(x))=Cod(Dec(x)).
Correctness of RSA: some theorems
of modular algebra



Theorem (modular equations): Equation axb mod
n has a solution iff GCD(a,n) divides b. In such a
case, there are exactly GCD(a,n) distinct solutions.
Corollary (existence of the inverse): If a and n
are coprime, then ax1 mod n has eactly one
positive solution less than n, known as the inverse
of a modulo n.
Euler’s Theorem: For any n>1, and for any a prime
with n, we have that aϕ(n)1 mod n.
Correctness of RSA



Notice that e and ϕ(n) are coprime, and so from the corollary
on the existence of the inverse, there exists a unique d less
than ϕ(n) s.t. e∙d1 mod ϕ(n).
Here is the strength of RSA: to compute d from e one must
know ϕ(n), i.e., p and q, and so one must be able to factorize
efficiently!
Secretation: we must prove that  x<n, Dec(Cod(x))=x. But
Dec(Cod(x))=(xe mod n)d mod n=xed mod n,
and so we have to show that x=xed mod n.
Prove it!
We distinguish two cases:
1. p and q do not divide x (and so GCD(p,x)=GCD(q,x)=1, since
they are prime);
2. p (or q) divides x, but q (or p) does not divide x.
(notice that p and q cannot both divide x, since otherwise
we should have x≥n, against the assumptions)
Correctness of RSA (2)
Case 1: We have GCD(x,n)=1, and so from Eulero’s theorem, we
have xϕ(n)1 mod n; since ed1 mod ϕ(n), we have
ed=1+kϕ(n), for some positive integer k. So, since x<n, we
have:
xed mod n = x1+kϕ(n) mod n = x·(xϕ(n))k mod n = x·1k mod n = x.
Case 2: Since p divides x, for any positive integer k we have
xxk0 mod p, namely (xk-x)0 mod p. Since instead q does
not divide x, similarly to Case 1, we have also xedx mod q,
and so (xed-x)0 mod q. It follows that (xed-x) is divided by
both p and q, and then by their product n, from which it
follows
(xed-x)0 mod n  xedx mod n  xed mod n = x mod n = x.
Authentication: it follows from the RSA property:
□
RSA at work: an example (1 of 2)








Assume that A wishes to send a secret message to B; then, by
the RSA protocol, B needs to provide its publik key to A.
B needs to generate its keys; then, it selects two large primes,
for instance p=3 e q=11 (ehmm, not very large, actually!)
Then, n=33 e ϕ(n)=2·10=20.
Then, B takes e=3, since 3 is coprime with 20  (3,33) is the
public key of B
Then, B searches d s.t. 3d1 mod 20. Hence, from 3d=1+k·20, by
setting k=1, we have d=7  (7,33) is the private key of B
Now, to encrypt a message, A divides it in blocks of bits whose
maximum value is less than n=33; then, a block P becomes:
C:=Cod(P)=P3 mod 33
A sends C to B; to decode it, B computes P=C7 mod 33
In our example, since n=33, a block contains at most 5 bits
(25<33); however, in the practice, n is in the order of 21024, and
so blocks have a size of 1024 bits, i.e., 128 ASCII characters (8
bits each).
RSA at work: an example (2 of 2)
To visualize the example, let us suppose that the 26 letters
of the English alphabet are represented by using 5 bits, and
so, since n=33, each block is made up by a single character:
Computational Complexity of RSA





It can be shown that the keys (and thus p,q,e,d) can be
generated in polynomial time w.r.t. to their binary
representation (namely, logarithmic in their value).
In particular, e is usually chosen by taking a quite small
prime number (e.g., e=3).
Instead, d is obtained by an extension (polynomial) of
the Euclidean algorithm for computing the GCD (based on
the fact that GCD(a,b)=GCD(b,a mod b)).
However, to find large prime numbers (i.e., p and q),
probabilistic primality testing algorithms are used, since
deterministic algorithms are too slow (although
polynomial, but in the order of a degree of 10).
Finally, note that the processes of encryption and
decryption can be performed efficiently by successive
exponentiation (so-called modular exponentiation).
Searching for p and q




Recall: remember the randomized algorithm for computing a
MIS: its answer was deterministically correct, while its
computational complexity was given in probabilistic form. This
is known as a randomized Las Vegas algorithm.
There exists another fundamental type of randomized
algorithm, known as randomized Monte Carlo algorithm., in
which the answer is probably correct, while the time
complexity is deterministically bounded.
Definition (Monte Carlo algorithm): A Monte Carlo "nobiased" algorithm is a randomized algorithm for solving a given
decision problem, such that the answer "no" is always correct,
while the answer "yes" may be incorrect with a fixed
probability ε. Monte Carlo "yes-biased" algorithms are
similarly defined.
The Miller&Rabin algorithm is a Monte Carlo "no-biased"
algorithm to test the primality of a number n. Its time
complexity is O(log3 n), and its probability of inaccuracy is
ε≈1/4 (i.e., YES answer is correct with probability ≈3/4).
Miller&Rabin algorithm
It is based on the following property: given an odd integer n
(for which we want to test primality), we write it as n=2sr+1,
with r odd (thus s is the multiplicity of factor 2 in the
decomposition of the even number n-1). Now, given 2 ≤ t ≤ n2, we define the following 2 predicates:
(P1): GCD(n,t)=1;
j
r
2
(P2): (t mod n=1) OR (it exists 0 ≤ j ≤ s-1 s.t. t r mod n=-1).
Theorem: If n is prime it satisfies both predicates, while if n
is composite, then the number of integers between 1 and n-1
that satisfy both predicates is less than n/4.
 We run MR(n) a number of k times, testing each time the
two predicates on a random integer 2 ≤ t ≤ n-2. If the
algorithm answers "no“, even only once, the number is
definitely composite, but if it always answers "yes", then the
probability that the number is composite is 4-k, and
therefore the probability that the number is prime is:
P(prime)=1-P(composite)=1-4-k
(e.g., if k=100, then P(prime)≈1-10-60 ≈ 1)

Miller&Rabin algorithm
Miller-Rabin(n)
1.
Set n-1=2sr with r odd
2.
For i=1 to k do
2.1 choose randomly an integer t s.t. 2≤t≤n-2
2.2 if GCD(n,t)>1 return composite %condition
(P1) is false
2.3 compute y=tr mod n
2.4 if y≠1 do %first condition of (P2) is false
2.4.1 j=0
2.4.2 while ((j≤s-1)
and (y≠n-1))
j
y:=t2 r mod n
j++
2.4.3 if y≠n-1 return composite %second
condition of (P2) is false as well
3.
Return prime (w.h.p. 1-4-k)
Is it easy to find prime numbers?



Despite the efficiency of the primality test, it is still
unknown if the primes are well distributed and
therefore easy to find at random (Riemann
hypothesis!). However, we know that their density is
quite high, as stated by the following result:
Gauss Theorem (prime numbers): Let π(n) be the
distribution function of prime numbers, i.e., the
number of primes less than n. Then the following is
satisfied:
So, if you search for a prime number of 100 digits,
and they would be uniformly distributed, you should
check "only" ln (10100) ≈ 230 consecutive numbers.