d - Futureware

Download Report

Transcript d - Futureware

Experimenting with Shared
Generation of RSA Keys
Michael Malkin
Thomas Wu
Dan Boneh
Stanford University
*Supported by DARPA
Why Share Keys?
CA
CA
d
d1
d2
d3
The private key is never reconstructed!
Who generates the shared key?
1/4/99
2
Trusted Dealers
CA
d1
d2
Trusted
Dealer
d3
Drawbacks:
• Single point of failure
• May have to destroy dealer afterwards
1/4/99
3
Distributed Generation
Advantages:
• Nobody ever knows the entire key
• No single point of failure
Step 1
Step 2
Step 3
d1
d2
d3
1/4/99
N
e
4
RSA Keys
N
An n-bit modulus, N = pq
e
The encryption (public) key
d
The decryption (private) key
Sharing of d : d = d1 + d2 + d3
 Can apply key without reconstructing d
d is the secret
p or q  d
1/4/99
5
Distributed Generation*
1
p1
q1
p2
q2
3
p3
q3
Biprimality
Test
pi, qi are n/2 bit integers
2
p1
q1
p2
q2
p3
q3
N
4
p1,q1
d1
p2,q2
d2
p3,q3
d3
N = (p1 + p2 + p3)·(q1 + q2 + q3) = pq
Nobody ever knows p or q!
1/4/99
(*Boneh-Franklin)
6
How Do They Compare?
Non-Distributed:
• Pick prime p
• Pick prime q
• Multiply
Distributed:
• Pick N
• Hope N = pq is an RSA modulus
• Can’t test p and q separately
Distributed generation takes more iterations
1/4/99
7
Main Results
Initial time: 2.5 hours
(1024-bit key)
• Distributed Sieving
× 10
• Multithreading
×6
• Load Balancing
× 1.3
• Parallel Trial Division
× 1.3
Final time: 1.5 minutes
1/4/99
8
Minding Your p’s and q’s
• Bad N  probably divisible by 3 or 5 or 7 or …
• Idea: Ensure that N isn’t divisible by any small primes
 Distributed Sieving
• Can pick pi, qi so that p, q are not divisible by small primes
… But nobody actually knows p or q!
1/4/99
9
Using Idle Time
• Synchronous algorithm  synchronization delays
• Under-utilizing CPU — idle 80% of time
 Multithreading
• 6 threads optimal for 1024-bit key
• Almost 6 times faster!
(On 300Mhz Pentium II’s running Solaris 2.6)
1/4/99
10
Costly Biprimality Test
• Biprimality test involves time-consuming calculation
• Idea: Only one server needs to do this
 Load Balancing
• A different server does test for each iteration
• Probabilistic load balancing
1/4/99
11
More Small Primes
• What about small primes not covered by sieving?
• Trial division on N by small primes
 Parallel Trial Division
• Each server does trial division on different small primes
1/4/99
12
Private Key Generation
• Implemented method for small e
• In RSA usually use a small e
• After N is found, generate d1, d2, and d3 so:
d1 + d2 + d3 = d
… But do this so that nobody ever knows d
• There is an additional way to share d
• Only a fraction of servers need to be active
1/4/99
13
Implementation: Config File
Num_Servers:
Key_Length:
Threads:
3
Normal
2
TrialDiv_End:
Sieve:
Test_Mode:
Sequence_Numbers:
Transport:
10000
True
True
True
sslv3
Share_IP_Port_0:
Server_IP_Addr_0:
Server_Sequence_File_0:
Server_Cert_0:
Server_Key_0:
1/4/99
8080
ittc.stanford.edu
com_security/seq0
com_security/cert_s0.pem
com_security/key_s0.pem
14
Implementation: COM
• Abstraction layer
• Fault tolerance - non-blocking I/O
• Private, authenticated channels
• Based on SSLeay
• Authenticates share servers using a server certificate:
/C=US/ST=California/O=Stanford University/
OU=ITTC Project/CN=[SERVER 0]
1/4/99
15
Shared Key Storage
• Stored as PEM-encoded ASN.1 format
Data Type
Integer
Integer
Integer
Integer
Integer
M
Integer
1/4/99
Field
Version
N
e
k
d1
M
dk
16
Performance
Primality
Network
Key Size Threads Tests Iterations Total Time Traffic
512 bit
2
36
119
0.15 min 0.18 Mb
1024 bit
6
49
130
1.5 min 1.16 Mb
2048 bit
6
234
495
18 min 7.48 Mb
On three 300Mhz Pentium II’s running Solaris 2.6
• Network bandwidth is reasonable
• 1024-bit works well
• 2048-bit is reasonable
1/4/99
17
Effect of Number of Servers
Minutes
Time to generate a 1024-bit RSA key
6
5
4
3
2
1
0
3
4
5
WAN 1 CPU
Number of Servers
WAN:
• Two servers at Stanford
• One server at University of Wisconsin at Madison
• Difficult to find PC’s running Solaris
1/4/99
18
1200
7
1000
6
800
5
Minutes
Iterations per Thread
Effect of Threads
600
400
4
3
2
200
1
0
0
0
2
4
6
8
0
2
Threads
4
6
8
Threads
• Synchronization/CPU tradeoff
• Minimize time with 6 threads
*Generating a 1024-bit RSA key
1/4/99
19
12000
1200
10000
1000
Seconds
Iterations
Effect of Distributed Sieving
8000
6000
4000
2000
800
600
400
200
0
0
50
150
0
0
Sieve Bound
50
150
Sieve Bound
• Sieve bound is largest prime sieved
• Larger sieve  fewer iterations
• Diminishing returns
*Generating a 512-bit RSA key
1/4/99
20
Conclusions
key generation is practical:
 Distributed
• 1.5 minutes for 1024-bit key
 Several practical improvements to algorithm
• Distributed Sieving
• Multithreading
• Load Balancing
• Parallel Trial Division
 Optimized cryptographic algorithm
• Requires security proofs
http://www.stanford.edu/~dabo/ITTC
1/4/99
21