Lucas-Lehmer Primality Tester Presentation 8 March 22nd 2006

Download Report

Transcript Lucas-Lehmer Primality Tester Presentation 8 March 22nd 2006

Lucas-Lehmer Primality Tester
Team: W-4
Nathan Stohs W4-1
Brian Johnson W4-2
Joe Hurley W4-3
Marques Johnson W4-4
Design Manager: Prateek Goenka
1
Ancient Greeks
• 300 BC Euclid’s Proof
• Proved that were an infinite number of Prime
numbers that were irregularly spaced
2
How to find Prime Numbers
• The method used for smaller numbers is
called Sieve of Eratosthenes from 240 BC
• Trial Division is another method for smaller
numbers
3
43rd Known Mersenne Prime Found!!
• December 2005
• Dr. Curtis Cooper and Dr. Steven Boone
• Professors at Central Missouri State
University
• 230,402,457-1
4
rank
prime
digits
who
when
reference
1
230402457-1
9152052 G9
2005
Mersenne 43
2
225964951-1
7816230 G8
2005
Mersenne 42
3
224036583-1
7235733 G7
2004
Mersenne 41
4
220996011-1
6320430 G6
2003
Mersenne 40
5
213466917-1
4053946 G5
2001
Mersenne 39
6
27653.29167433+1
2759677 SB8
2005
7
28433.27830457+1
2357207 SB7
2004
8
26972593-1
2098960 G4
1999
9
5359.25054502+1
1521561 SB6
2003
10
4847.23321063+1
999744 SB9
2005
Mersenne 38
5
Prime Number Competitions
• Electronic Frontier Foundation
• $50,000 to the first individual or group who discovers
a prime number with at least 1,000,000 decimal digits (awarded
Apr. 6, 2000)
• $100,000 to the first individual or group who discovers
a prime number with at least 10,000,000 decimal digits
• $150,000 to the first individual or group who discovers
a prime number with at least 100,000,000 decimal digits
• $250,000 to the first individual or group who discovers
a prime number with at least 1,000,000,000 decimal digits
6
Mersenne Prime Algorithm
• For P > 2
• 2P-1 is prime if and only if Sp-2 is zero in
this sequence:
• S0 = 4
• SN = (SN-12 - 2) mod (2P-1)
7
Example to Show 27 - 1 is Prime
•
•
•
•
•
•
•
27 – 1 = 127
S0 = 4
S1 = (4 * 4 - 2) mod 127 = 14
S2 = (14 * 14 - 2) mod 127 = 67
S3 = (67 * 67 - 2) mod 127 = 42
S4 = (42 * 42 - 2) mod 127 = 111
S5 = (111 * 111 - 2) mod 127 = 0
8
Algorithmic description
We knew the computations needed, but how
to translate that to gates?
Computations needed:
-Squaring (not a problem…)
-Add/Subtract (not a problem…)
-Modulo (2^n – 1) multiplication (?)
9
Mechanisms behind the math
• If done with brute force, modulo 2^n-1 could have been
ugly.
– Would need to square and find the remainder via
division.
• Luckily, for that specific computation, math is on our
side, the 2^n-1 constraint saves us from division, as will
be seen.
• A quick search on www.ieee.com produced inspiration.
• Taken from
“Efficient VLSI Implementation of Modulo (2^n +- 1)
Addition and Multiplication”
Reto Zimmermann
Swiss Federal Institute of Technology (ETH)
10
Useful Math: Multiplication
Just like any other
multiplication, a
modulo multiplication
can be computed by
(modulo) summing
the partial products.
So modulo
multiplication is
multiplication using a
modulo adder.
11
Useful Math: The Modulo Adder
The more logic driven math that is the basis of our modulo
adder.
12
Last Bits: Modulo Reduction
At various points, such as when finding the partial
product, the result has to be reduced. There is a nifty
way to do that as well.
13
Block Diagram
16
Mod Calc
P
clk
start
1
16
4
FSM
1
16
Mod
Multiply
clk
2
16
2
done
Subtract 2
2
16
clk
Register r4
16
1
Count
clk
Out
Compare
1
14
Mod Multiply Block Diagram
from register
16
2
Counter
4
16
P
FSM
clk
Next Partial Product
16
16
Mod add
16
Register
2
2p-1
to sub 2
FSM
clk
16
15
Block Diagram
16
Mod Calc
P
clk
start
1
16
4
FSM
1
16
Mod
Multiply
clk
2
16
2
done
Subtract 2
2
16
clk
Register r2
16
1
Count
clk
Out
Compare
1
16
The Process So far:
Design Process
- Found Mathematical Means (core algorithm)
- Found Computational Means (modulo multiplier, adder)
From the above, a high level C program was written in a
manner that would easily translate to verilog and gates, or
at least more standard operations
int mod_square_minus(int value, int p, int offset) {
int acc, i;
int mod = (1 << p) - 1;
for(acc=offset, i=0; i<(sizeof(int)*8-1); i++) {
int a = (value >> i) & 1;
int temp;
if (a) {
if (i-p > 0)
temp = value << (i-p);
else
temp = value >> (p-i);
acc = acc + temp + ((value << i) & ((1 << p) - 1));
}
if (acc >= mod)
acc = acc - mod;
}
return acc;
}
This easily translated into behavorial
verilog, and readily turned into a gatelevel implementation. Essentially it
was written in a more low-level
manner.
17
Design Process
The rest of the design can simply be thought of as a
wrapper for the modulo multiplier.
The following slides contain Verilog code that was
directly taken from the C code below.
module mod_mult(out, itrCount, x, y, mod, p, reset, en, clk);
input [15:0] x, y, mod, p;
output [15:0] out;
input
reset, en, clk;
wire [15:0] pp, ma0, temp;
output [3:0] itrCount;
Top level of multiplier
counter mycount(itrCount, reset, en, clk);
partial_product ppg(pp, x, y, itrCount, mod, p);
mod_add modAdder(out, pp, temp, mod);
dff_16_lp partial(clk, out, temp, reset, en);
endmodule
18
module partial_product(out, x, y, i, mod, p);
output [15:0] out;
input [15:0] x, y, mod, p;
input [3:0] i;
Partial Product Unit w/ modulo reduction
wire [15:0] diff1, diff2, added, result, corrected, final;
wire [15:0] high, low, shifted, toadd;
wire
cout1, cout2, ithbith, toobig;
sub_16 difference1(diff1, cout1, {12'b0, i}, p);
sub_16 difference2(diff2, cout2, p, {12'b0, i});
shift_left shiftL(high, y, diff1[3:0]);
shift_right shiftR(low, y, diff2[3:0]);
mux16 choose(high, low, shifted, cout1);
shift_left shiftL2(toadd, y, i);
and16 bigand(added, toadd, mod);
fulladder_16 addhighlow(.out(result), .xin(added), .yin(shifted), .cin({1'b0}), .cout(nowhere));
sub_16 correct(.out(corrected), .cout(toobig), .xin(mod), .yin(result));
mux16 correctionMux(.out(final), .high(corrected), .low(result), .sel(toobig));
shift_right ibit({15'b0, ithbit}, x, i);
select16 checkfor0(.out(out), .x(result), .sel(ithbit));
endmodule
19
Modulo Adder
module mod_add(out, x, y, mod);
input [15:0] x, y, mod;
output [15:0] out;
wire
cout, isDouble, cin;
wire [15:0] plus, lowbits, done, mod_bar, check;
fulladder_16 add(.out(plus), .xin(x), .yin(y), .cin(cin), .cout());
invert_16 inverter(mod_bar, mod);
and16 hihnbits(check, plus, mod_bar);
and16 lownbits(done, plus, mod);
or8 (cin, check[0], check[1], check[2], check[3], check[4], check[5], check[6], check[7],
check[8], check[9], check[10], check[11], check[12], check[13], check[14], check[15]);
compare_16 checkfordouble(isDouble, done, 16'b1111_1111_1111_1111);
mux16 fixdouble(.out(out), .high(16'b0), .low(done), .sel(isDouble));
endmodule
20
Final Design Process Notes
• Lessons learned: Never tweak the schematics
without retesting the verilog first.
• Considering total time spent during this phase,
roughly half was on the “core” and the FSM, the
rest on the “wrapper”.
21
Road to verification : C
2 Examples of the high-level C implementations:
Tyrion:~/Desktop/15525 nstohs$ ./prime4 7
round 1: (4 * 4 - 2) mod 127 = 14
round 2: (14 * 14 - 2) mod 127 = 67
round 3: (67 * 67 - 2) mod 127 = 42
round 4: (42 * 42 - 2) mod 127 = 111
round 5: (111 * 111 - 2) mod 127 = 0
2^7-1 is prime
Tyrion:~/Desktop/15525 nstohs$ ./prime4 11
round 1: (4 * 4 - 2) mod 2047 = 14
round 2: (14 * 14 - 2) mod 2047 = 194
round 3: (194 * 194 - 2) mod 2047 = 788
round 4: (788 * 788 - 2) mod 2047 = 701
round 5: (701 * 701 - 2) mod 2047 = 119
round 6: (119 * 119 - 2) mod 2047 = 1877
round 7: (1877 * 1877 - 2) mod 2047 = 240
round 8: (240 * 240 - 2) mod 2047 = 282
round 9: (282 * 282 - 2) mod 2047 = 1736
2^11-1 is not prime
22
Road to verification: Verilog
Samples of Verilog Verification output:
Partial Product Unit p = 7
380 ppOut= 56, x= 14, y=
400 ppOut= 112, x= 14, y=
420 ppOut= 0, x= 14, y=
440 ppOut= 0, x= 14, y=
Top Level p = 7
itrOut= x
itrOut= 4
itrOut= 14
itrOut= 67
itrOut= 42
itrOut= 111
itrOut= 0
Tests were either specific tests on
important units such as
Partial_Product
14, i= 2, mod= 127, p= 7
14, i= 3, mod= 127, p= 7
14, i= 4, mod= 127, p= 7
14, i= 5, mod= 127, p= 7
Top Level p = 11
itrOut= x
itrOut= 4
itrOut= 14
itrOut= 194
itrOut= 788
itrOut= 701
itrOut= 119
itrOut= 1877
…
…our top level tests. Note that
these are the same results
generated from the C code
23
Road to verification: Schematic I
Schematic Test of our
modular adder.
128 + 68 Mod 127 = 69
24
Road to verification: Schematic II
Plot of the top level output after a single iteration, p=7
Output after a single iteration is 14, the expected value.
25
Road to verification: Schematic III
The simulation outputs after a full run, showing the results of all
iterations. Simulations start taking a long time. More on that later.
26
Road to verification: Intermission
Disk Space required for a full-length schematic test of p=7 : 6 GB
Time required for a full-length schematic test of p=7 : 4 hours
Disk Space required for a full-length extracted test of p=7 : more
Time required for a full-length extracted test of p=7 : longer
Disk Space required for a full-length extractedRC test of p=7 : 1 iPod
Time required for a full-length extractedRC test of p=7 : T_T
Simulations become very demanding and lengthy due to
tests needing to be “deep” to be useful. To meet such
demands, be sure to use Genuine AMD© CPUs.
27
Road to verification: Layout I
3 words: “the net-lists match”
Of course, there is far more to be concerned about.
Due to simulator issues, layout simulations were delayed on some major
modules.
Partial Product Sims In Progress (I Hope)
28
Road to verification: Layout II
Top Level layout Sims in Progress
29
Road to verification: Timing
Pathmill was useful to help us gauge our critical path, which is one
cycle through our modulo multiplier. When run on the top level, a
critical path of 12.703ns was found. This was in the ballpark relative to
our research.
Layout Timing Sims in progress
30
Issues
• extractedRC of partial_product module
• Registers switch
• Switching from parallel calculations to series
– Transistor count vs. clock cycles
• Syncing up design between people
– Transferring files
– Different design styles
• LONG simulation times
• Floorplanning
– Too much emphasis on aspect ratios and not enough on wiring
– Couldn’t decide on one set floorplan
31
Floorplan v1.0
Mod Multiplier
Prime
Logic
Memory
Mod Adder
FSM
32
Floorplan v2.0
33
Floorplan v3.0
34
Floorplan v4.0
35
Floorplan v5.0
36
Final Floorplan
37
Pin Specs
Pin
Type
# of Pins
Vdd!
In/Out
1
Gnd!
In/Out
1
p<0:15>
In
16
clk
In
1
start
In
1
Done
Out
1
out
Out
1
Total
-
22
38
Initial Part Specs
Module
Transistor
Count
Area
(µm²)
Transistor
Density
FSM
300
900
.33
mod_p
2,440
7,000
.35
mod_add
1,282
9,000
.14
partial_product
8,676
65,000
.13
count
1,656
6,000
.27
sub_16
704
3,500
.20
Registers
1,848
6,000
.30
compare
36
300
.12
Total
16,942
97,700
.17
39
Final Part Specs
Module
Transistor
Count
Area
(µm²)
Transistor
Density
Aspect
Ratio
FSM
152
1,200
.13
2.45
mod_p
1,280
8,603
.15
0.79
mod_add
1,168
5,603
.21
2.40
partial_product
7,520
54,680
.14
1.16
count
1,424
8,701
.16
6.88
sub_16
576
2,934
.20
4.49
Registers
896
6,028
.15
4.76
compare
56
201
.28
4.41
Total
13,702
86,621
.16
1.01
40
Chip Specs
•
•
•
•
•
Transistor Count: 13,702
Size: 296.51µm x 292.13µm
Area: 86,621µm²
Aspect Ratio: 1.01:1
Density: 0.16 transistors/µm²
41
Final Floorplan
42
Final Floorplan
43
Poly Layer
Density: 7.14%
44
Active Layer
Density: 8.76%
45
Metal1 Layer
Density: 23.86%
46
Metal2 Layer
Density: 19.97%
47
Metal3 Layer
Density: 11.30%
48
Metal4 Layer
Density: 10.34%
49
Conclusions
• Plan for buffers
– Can’t put them in after the fact
• Your design will change dramatically from
start to finish so be flexible
• Communication is key
• Do layout in parallel
50