Transcript slides

PRIMALITY TESTING –
its importance for cryptography
Lynn Margaret Batten
Deakin University
Talk at RMIT
May 2003
Prime numbers have attracted much
attention from mathematicians for many
centuries.
Questions such as
• how many are there?
• is there a formula for generating them?
• how do you tell if a give number is a
prime?
have fascinated people for years.
However, the first actual use of prime
numbers in an important area outside
of the theory of numbers was
discovered only in the mid to late
1900s.
This was in the establishment of a
technical system to be used in
maintaining the secrecy of electronic
communications.
TRANSMITTER
Message
M
Source
Encrypt M
C = EK (M),
COMMUNICATION
CHANNEL
RECEIVER
Decrypt C
M = DK (C),
C
1
2
Using key
K1
C
Using Key
K2
Cryptanalyst
Key Source
#1
Random Key
K1 is
Produced
K1
KEY CHANNEL
K2
Key Source
#2
Decryption
Key K2
Determined
from K1
Conventional cryptosystem. The key channel must be secure.
The Diffie-Hellman scheme proposed in 1976, was a
radical departure from what, up to then, had all been
essentially ‘private key’ schemes.
The idea was that everyone would own both a ‘private key’
and a ‘public key’. The public key would be published in a
directory, like a telephone book.
If A wanted to send B an encrypted message, A simply
looked up B’s public key, applied it and sent the message.
Only B knew B’s private key and could use it to decrypt the
message.
PROBLEM? Diffie and Hellman had no concrete example
of an encryption/decryption pair which could pull this off!
Then along came the Rivest, Shamir,
Adleman
(RSA)
solution
in
1977:
Public information:
• n an integer which is a product of two
large primes (p and q kept secret), and
• e a positive integer less than (p-1)(q-1) with
gcd(e,(p-1)(q-1)) = 1.
Secret information:
• The two primes p and q such that n = pq,
and
• d such that ed  1 (mod (p – 1)(q – 1)).
To encrypt the message/number m:
c  me (mod n).
To decrypt c:
cd  med  m (mod n).
Example. Let n = 101 x 107 = 10807 and e = 7.
Note 7d  1 (mod 100x106),
or 7d  1 (mod 10600) so
d = 4543.
To encrypt the message m = 109 we find
c = 1097 (mod 10807) = 4836.
To decrypt find cd = 48364543  109.
The security of this scheme depends on the difficulty of
factoring n. In fact, it is easy to show that knowing d is
equivalent to factoring n.
No way of breaking RSA is known, other than finding the
secret information.
Thus the RSA scheme leads to the following two problems:
1.
Find a large pool of big ( >100 digits) primes.
(If very few of these are available, Oscar will easily be able to
get his hands on the list and simply try them all in order to
break the scheme.)
2.
Find a quick (polynomial time) algorithm to factor
integers. (There is no known deterministic, polynomial time
algorithm for factoring integers.)
We take a look at problem 1.
The primes p and q must be of sufficient size that
factorization of their product is beyond
computational reach.
Moreover, they should be random primes in the
sense that they be chosen as a function of a
random input which defines a pool of candidates of
sufficient cardinality that an exhaustive attack is
infeasible.
In practice, the resulting primes must also be of a
pre-determined bitlength, to meet system
specifications.
Since finding large primes is very
difficult. And also, since the known
primes are usually available in some
library or on some website, one of
the 'solutions' to problem 1 has been
to investigate numbers that are not
primes, but simply act like primes.
Generally speaking, we say that a composite
integer N is a pseudoprime if it satisfies some
condition that a prime must always satisfy.
One result for primes is the well-known:
FERMAT'S LITTLE THEOREM
Let p be a prime, and gcd(a,p) = 1.
Then ap-1  1 (mod p).
[Try a =2 and p=7.]
The converse of Fermat's theorem is false as we
see by the following example:
Let N = 2701 = 37•73.
Then 22700  1 (mod2701).
Now consider the following:
Definition We say that the composite integer N
is a base b pseudoprime (written b-psp) if
bN-1  1
(mod N).
(*)
Thus a b-psp acts like a prime with respect to
Fermat's theorem, but it is not a prime. If there
were only a few such numbers, this would not
improve our situation, but as early as 1903 Malo
showed that there exists an infinite number of
composite N satisfying (*).
There exists an infinite number of
base b pseudoprimes because:
Theorem If p is an odd prime,
p  b (b2  1) and
N = (b2p  1) / (b2  1), then N is
a b-psp.
The existence of so many pseudoprimes indicates that the question of
deciding whether a given number is
prime or composite is a difficult one.
This leads us back to RSA and its
second problem (factoring) which we
now approach from a different angle –
that of primality testing.
It was simply very difficult (if not impossible) to
prove that a randomly selected 100-digit
number was a prime back in 1978.
Furthermore, the primality proving methods
that were available did not lend themselves to
easy implementation in hardware, a necessary
condition for RSA to become widely useable.
A result of this situation was the refinement and
further development of what are called
probabilistic primality tests.
Probabilistic methods
Let S be any set. A Monte Carlo algorithm
for S is an algorithm, which, given x and a
source of random numbers for choosing x,
returns “yes” or “no” with the properties that:
If
If
x  S , then the answer is always “no”;
x  S , then the answer is “yes” with
probability at least ½.
Solovay-Strassen test
The Solovay-Strassen probabilistic primality
test (1977) was the first such test
popularized by the advent of public-key
cryptography.
There is no longer any reason to use this
test, because an alternative is available,the
Miller-Rabin test, which is both more efficient
and always at least as correct.
Miller-Rabin Test
The probabilistic primality test used most in
practice today is the Miller-Rabin test (1980), also
known as the strong pseudoprime test. The test is
based on a more complex version of Fermat’s
Little Theorem:
ap-1  1 (mod p) or ap-1 - 1  0 (mod p)
for p prime and gcd(a, p) =1.
For p odd, of course p – 1 = 2r is even.
Then ap-1 - 1 = a2r – 1 = (ar -1)(ar + 1).
So ap-1 – 1  0 (mod p) implies that the
prime p divides into ar – 1 or into ar + 1
and consequently
ar  1 (mod p)
or
ar  -1 (mod p).
This can be taken even further, by taking all
powers of 2 out of p – 1 to obtain the following
fact.
Fact 1.
Let n be an odd prime, and let
n – 1 = 2sr where r is odd.
Let a be any integer such that gcd(a, n) = 1.
Then either
r
a  1 (mod n) or
a
2j r
 -1 (mod n) for some j, 0  j  s – 1.
Definitions
Let n be an odd composite integer
and let n – 1 = 2sr where r is odd. Let a be an
integer in the interval [1, n – 1] relatively prime to n.
(i)
If ja  1 (mod n) and if
2 r
a  1 (mod n) for all j,
0  j  s – 1,
then a is called a strong witness (to
compositeness) for n.
(ii)
Otherwise, n is said to be a strong
pseudoprime to the base a . The integer
is called a strong liar (to primality) for n.
r
a
Example (strong pseudoprime)
Consider the composite integer n = 91 =7x13.
Try a = 9.
Since 91 – 1 = 90 = 2 x 45, s = 1 and r = 45.
Since 9r = 945  1 (mod 91), 91 is a strong
pseudoprime to the base 9. The set of all
strong liars for 91 is
{1, 9, 10, 12, 16, 17, 22, 29, 38, 53, 62, 69
74, 75, 79, 81, 82, 90}.
Notice that the number of strong liars for 91
is less than 90/4.
Fact 1 can be used as a basis for a
probabilistic primality test due to the
following result.
Fact 2
If n is an odd composite integer,
1
then at most 4 of all the numbers a, 1  a
 n –1, are strong liars for n.
Algorithm Miller-Rabin probabilistic primality test
MILLER-RABIN (n,t)
INPUT: an odd integer n  3 and security parameter t  1.
OUTPUT:
an answer ‘prime” or “composite”.
1.
2.
3.
Write n – 1 = 2sr such that r is odd.
For i from 1 to t do the following:
2.1 Choose a random integer a, 2  a  n – 2.
2.2 Compute y = ar mod n.
2.3 If y  1 and y  n – 1 then do the following:
j  1.
While j  s – 1 and y  n – 1 do the following:
Compute y  y2 mod n.
If y  1 then return (“composite”).
j j + 1.
If y  n – 1 then return (“composite”).
Return (“prime”).
If n is actually prime, this algorithm
will always declare ‘prime’.
However, if n is composite, Fact 2
can be used to deduce the following
probability of the algorithm
erroneously declaring ‘prime’.
FACT 3 (Miller-Rabin errorprobability bound)
For any odd composite integer n, the
probability that MILLER-RABIN (n, t)
incorrectly declares n to be “prime” is
less than  1 
t
 4
To perform the Miller-Rabin test on n N to base
, we will
a need no more than log2( )
n
(which is the number of bits in the binary
representation of n) modular exponentiations,
each using  (log 22 a) bit operations.
Hence, the Miller-Rabin test to base a takes
3
 (log 2 n) bit operations.
Since 2  a  n  2, we can run this up to n – 3
times, but the more values of a we run, the
slower the algorithm.
In 1983, Adleman, Pomerance and
Rumely gave the first deterministic
algorithm for primality testing that
runs in less than exponential time.
For n the number being tested, the
O ( logloglog n )
time needed is (log n)
.
In 1986, two independent
algorithms were developed by
Goldwasser and Kilian and by
Atkin which, under certain
assumptions, would guarantee
primality (but not necessary
compositness) in polynomial
time.
Then in August 2002, Agrawal, Kawal and
Saxena made public their unconditional
deterministic, polynomial-time algorithm for
primality testing.
For n the number being tested, this algorithm
runs in O((log n)12 ) time.
The proof that the algorithm works uses
relatively basic mathematics and we shall
outline it here.
The AKS algorithm is based on the following
identity for prime numbers:
( x  a) p  ( x p  a) (mod p)
for any
a
such that gcd( a, p)  1.
We expand the difference between the
polynomials.
Thus, for o  i  p, the coefficient of x
in (( x  a) p  ( x p  a)) is (1)i  pi a pi .
 
If p is prime,  pi  is divisible by p for all i.
If p is not prime, let q be a prime divisor of
p and q k || p.
k
pq
 p
.
The q does not divide  q  or a
p
p
((
x

a
)

(
x
 a)) is not zero
In this case
modulo p.
i
So, given p to test, one could choose a value
p
for a and test ( x  a) as above.
We would need to evaluate about p
coefficients however, in the worst case, which
is too slow.
The trick used to reduce the run time is a
standard one in algebra:
We ‘mod out’ by a polynomial to obtain
( x  a)  ( x  a)
p
p
(mod x  1)
r
Still working modulo p .
r
How is x  1 chosen? Will this work?
(
*
)
In fact, all primes p satisfy * for any choice of
r  1 and of a .
Unfortunately, some composites may also satisfy ( * )
for some choices of the pair ( a , r ).
Congruence ( * ) takes O(r log 2 p) time to check if
Fast Fourier Multiplication (Knuth, 1998) is used.
The authors show that a suitable choice of r is:
r
prime of order O(log 6 p ) where x  1 contains a
factor of a certain size.
They then verify their algorithm for a small
(O( r log p )) number of a ‘s.
The algorithm
_______________________________________________
Input:
1.
2.
3.
4.
5.
6.
Integer
n 1
If ( n is of the form a , b  1 ) output COMPOSITE;
r2 ;
While ( r  n) {
if (gcd( n, r )  1) output COMPOSITE
if ( r is prime)
let q be the largest prime factor of r  1 ;
b
r 1
q
 1 (mod r ))
7.
if ( q  4 r log n) and ( n
8.
break;
r  r 1 ;
9.
10.
}
11. For a  1 to 2 r log n
n
n
r
12.
if (( x  a)  ( x  a) (mod x  1, n)) output COMPOSITE;
13. output PRIME;
_______________________________________________________________
The first loop in the algorithm tries to find
a prime r such that r  1 has a large
prime factor q  4 r log n.
The authors show that r , as described in
line 7 of the algorithm, must exist, and
they are even able to establish bounds on
it.
They then use these bounds to establish
that if n is prime, the algorithm returns
PRIME.
In order to show that if n is composite, the
algorithm returns COMPOSITE, the following set is
constructed:
I g  {m | g ( x) m  g ( x m ) (mod x r 1)}
Where g (x ) is a polynomial of the type on line 12
of the algorithm. There are 2 r log n such
polynomials.
Thus, if the algorithm falsely declares PRIME,
every one of the 2 r log n incongruences in
line 12 must be false.
It follows that n  I g , and the authors show that
this leads to a contradiction.
Time Complexity
_______________________________________________
Input:
Integer n  1
b
1. If ( n is of the form a ,
b  1 ) output COMPOSITE;
O(log 3n)
3. While ( r  n)
{
4. If (gcd(n, r )  1)
output COMPOSITE; poly (log log r )
5. if ( r is prime)
1
r 2 poly (log log n)
6. let q be the largest prime factor of r  1 ;
7. If (q  4 r log n) and (n
10. }
 1 (mod r ))
poly (log log n)
8. break;
9. r  r  1
r 1
q
;



O(log 6 n) iterations
Total:
1
1
O((log n  r )poly (log n  r 2 ))
6
2
or
O(log 9 n poly (log 9 n))
6
11. for a  1 to 2 r log n

12
12
n
n
r
((
x

a
)

(
x

a
)
(mod
x

1
,
n
))
12. if
O(log n poly(log n))
output COMPOSITE
13. output PRIME;
Total:
O(log 12n poly (log 12n))
Implications for future work:
There is a good chance that people are
already looking at implementing the new
idea of using modulus by a polynomial to
find a polynomial algorithm for factoring.
REFERENCES
M. Agrawal, N. Kayal, N. Saxena, ‘PRIMES is in P’.
R. Crandall and C. Pomerance, ‘Prime numbers: A
computational perspective’. Springer, 2001.
D. Knuth, ‘Art of computer programming’, VII. AddisonWesley, 1998.
H. Williams, ‘Edouard Lucas and Primality Testing’, CMS
Monographs, Wiley, 1998.