Transcript Slide 1

Reconfigurable Computing
(EN2911X, Fall07)
Lab 2 presentations
Prof. Sherief Reda
Division of Engineering, Brown University
http://ic.engin.brown.edu
Reconfigurable Computing
S. Reda, Brown University
Runtimes by different teams
•
•
•
•
•
•
4.2 seconds
14 seconds
33 seconds
300 seconds
305 seconds
320 seconds
Reconfigurable Computing
S. Reda, Brown University
Palindrome Checker
Cesare Ferri
Rotor Le
Reconfigurable Computing
S. Reda, Brown University
Part I : Verilog Module
always @(posedge CLOCK_50)
begin
DECOMPOSE
not_palindrome = 1'd0;len = 0; tmp = number; //reset
THE NUMBER
for (i = 0; i<9 ; i = i + 4'd1)
IN DIGITS
begin
if (tmp > 0)
begin
modulo = tmp % 4'd10;
Room for
tmp = tmp / 10;
loop unrolling
vector[len % 9] = modulo;
here..
len = len + 1;
end
end
th = (len >> 1) ;
for (j=0; j<th; j = j + 4'd1)
begin
tmp2 = (len-1) - j; tmp3 = vector[j];tmp4 = vector[tmp2];
if ( tmp3 != tmp4 )
not_palindrome = 1'b1;
end
Reconfigurable
Computing = ~(not_palidrome);
result
S. Reda, Brown University
end
Part I : Verilog Module
always @(posedge CLOCK_50)
begin
not_palindrome = 1'd0;len = 0; tmp = number; //reset
for (i = 0; i<9 ; i = i + 4'd1)
begin
if (tmp > 0)
begin
modulo = tmp % 4'd10;
tmp = tmp / 10;
vector[len % 9] = modulo;
len = len + 1;
COMPARE THE
end
DIGITS STORED
end
INTO THE
th = (len >> 1) ;
VECTOR
for (j=0; j<th; j = j + 4'd1)
begin
tmp2 = (len-1) - j; tmp3 = vector[j];tmp4 = vector[tmp2];
if ( tmp3 != tmp4 )
loop
not_palindrome = 1'b1;
unrolling,
end
again..
Reconfigurable
Computing = ~(not_palidrome);
result
S. Reda, Brown University
end
Optimized Verilog Code
• Do loop unrolling to compare digits:
•
•
•
• if (digits[0] == digits[3] &&
•
digits[1] == digits[2])
•
not_palindrome = 1'd1;//reset
Reconfigurable Computing
S. Reda, Brown University
Unsolved things
• Our running time now depends on the way
that we extract digits from the number
• Some ideas to improve?
–Using shift register
–Using non-blocking instructions
Reconfigurable Computing
S. Reda, Brown University
Palindrome Homework Summary
ENGN2911X
Aaron Mandle
Bryant Mairs
Reconfigurable Computing
S. Reda, Brown University
Setup
• Two-cycle fixed length custom instruction
• Operates on 20 numbers at a time
• Returns total palindromes in that 20number block
Reconfigurable Computing
S. Reda, Brown University
Process
• Combinatorial conversion from binary to
BCD
• Check number of digits
• Compare digits based on length
• Total up number of valid palindromes
Reconfigurable Computing
S. Reda, Brown University
Binary to BCD Conversion
• Built using blocks of conditional add-3
modules and shifts
• Add-3 modules:
– 4-bit input
– Adds 3 if input was 5 or greater
• Based on adding 6 numbers > 9
Reconfigurable Computing
S. Reda, Brown University
module checkPalindrome(data, result);
input [31:0] data;
output [31:0] result;
wire [3:0] digits [10:0];
wire [3:0] digCount;
bin2bcd({digits[9], digits[8], digits[7], digits[6], digits[5], digits[4], digits[3], digits[2], digits[1], digits[0]}, data);
assign digCount = digits[9] != 0?10:
digits[8] != 0?9:
digits[7] != 0?8:
digits[6] != 0?7:
digits[5] != 0?6:
digits[4] != 0?5:
digits[3] != 0?4:
digits[2] != 0?3:
digits[1] != 0?2:
1;
assign result = digCount == 1 ||
digCount == 2 && (digits[0] == digits[1]) ||
digCount == 3 && (digits[0] == digits[2]) ||
digCount == 4 && (digits[0] == digits[3] && digits[1] == digits[2]) ||
digCount == 5 && (digits[0] == digits[4] && digits[1] == digits[3]) ||
digCount == 6 && (digits[0] == digits[5] && digits[1] == digits[4] && digits[2] == digits[3]) ||
digCount == 7 && (digits[0] == digits[6] && digits[1] == digits[5] && digits[2] == digits[4]) ||
digCount == 8 && (digits[0] == digits[7] && digits[1] == digits[6] && digits[2] == digits[5] && digits[3] == digits[4]) ||
digCount == 9 && (digits[0] == digits[8] && digits[1] == digits[7] && digits[2] == digits[6] && digits[3] == digits[5]);
endmodule
Reconfigurable Computing
S. Reda, Brown University
Yossi
Reconfigurable Computing
S. Reda, Brown University
For all solutions
• Finding the length of the decimal representation
(# digits) by:
typedef unsigned long UINT;
inline UINT GetMSDFIndx(UINT n) {
return
(n >= 100000000 ? 8
(n >= 10000000
? 7
(n >= 1000000
? 6
(n >= 100000
? 5
(n >= 10000
? 4
(n >= 1000
? 3
(n >= 100
? 2
(n >= 10
? 1
}
Reconfigurable Computing
S. Reda, Brown University
:
:
:
:
:
:
:
: 0))))))));
Software Only Solutions
• Times:
– On laptop (Intel 2333 MHz): 8 secs.
– On NIOS (100 MHz): 3500 secs.
• Inherently sequential
– Early false detection: quit the computation if
we find two digits that do not match.
 Brings down expected # divide operations to less than 2.2
Reconfigurable Computing
S. Reda, Brown University
Software Only Solutions
• Observations:
1. Detect whether the MSD is a given number without division
– MSD test: d is the MSD of number n of length L if and only if
d*10L-1 ≤ n < (d+1)* 10L-1
E.g 4*103 <= 4765 < 5*103
2. “Cut out” the MSD: 4665 – 4*103 = 665 and continue.
• Algorithm: find one LSD after another, compare
with MSDs, quit early if not a palindrome.
• Runs in 8 seconds on laptop
Reconfigurable Computing
S. Reda, Brown University
Software Only Solutions
• On NIOS, division is really expensive
• Division free algorithm:
Don’t test the MSD, find it with binary search
Reconfigurable Computing
S. Reda, Brown University
Software Only Solutions
• On NIOS, division is really expensive
• Algorithm:
– Start from left
– Find half of the digits
– Compute the palindrome whose left half matches
these digits
– Compare to the tested number
Loose the early false detection, but still better than division.
Runs in 3500 secs on NIOS 100 MHz.
Reconfigurable Computing
S. Reda, Brown University
Using the Hardware
• A general trick to divide by a constant
without using division.
Based on trick I read in “Hackers Delight” of how to divide by 3.
• Demonstrate on divide by 10:
– Given: number n < 230
– Needed: floor(n/10)
– Algorithm:
Multiply n by (231+2)/10 = 0xCCCCCCD, and
then shift right 31 positions.
Reconfigurable Computing
S. Reda, Brown University
Division Free divide by 10
• Algorithm:
Multiply n<230 by (231+2)/10 = 0xCCCCCCD, and then
shift right 31 positions.
• Proof: The above algorithm outputs:
floor(n/10) <= n/10 + 2*n/(10*231) < floor(n/10) + 1
floor[ n * ((231+2)/10) * 1/231 ] = floor [ n/10 + 2*n/(10*231) ]
n < 230 implies: 2*n < 231 
< 1/10
floor(n/10) <= n/10 <= floor(n/10) + 9/10
Reconfigurable Computing
S. Reda, Brown University
= floor(n/10)
Divide by Constant
• Similarly, to divide n by a constant C, we need to
find P and R such that:
– 2P + R = 0 mod C.
– R*n < 2P
And then multiply n by (2P + R)/C, and shift
right P positions.
• Found the constants to all powers of 10 needed.
Algorithm worst register to register delay: 25 ns.
• Run Time: 33 secs.
Reconfigurable Computing
S. Reda, Brown University
EN2911X Lab 2:
Palindromes
Brian Reggiannini and Chris
Erway
Reconfigurable Computing
S. Reda, Brown University
Checking a palindrome
• All combinational logic!
• Step 1: Convert 30-bit integer to 37-bit
binary-coded decimal (BCD) format
• Step 2: Detect the length of decimal
number
• Step 3: Compare pairs of digits with XOR
Reconfigurable Computing
S. Reda, Brown University
Binary to BCD converter
Reconfigurable Computing
S. Reda, Brown University
Binary to BCD converter
Reconfigurable Computing
S. Reda, Brown University
Integration with Nios II
• Worst-case propagation delay: 43ns, 5
cycles
• Don’t want to wait! Use 32-bit PIO
interface
• Array of 25 palindrome-checking units
• Write out 32-bit start value…
– Read back # of total palindromes found (from
next 25)
– While Nios is waiting: increment loop counter
Reconfigurable Computing
S. Reda, Brown University
Nios Software
Reconfigurable Computing
S. Reda, Brown University
Results
• Original C program: 49.59s/billion
• Unoptimized Nios C program:
7842s/100million
• Final result: 4.2s/billion (420000036 cycles
@ 100MHz)
– Total logic elements: 23,039 / 33,216 (69%)
Reconfigurable Computing
S. Reda, Brown University