T(n/2) - Lyle School of Engineering

Download Report

Transcript T(n/2) - Lyle School of Engineering

Dr. Peter-Michael Seidel
Computer Science and Engineering
Southern Methodist University
Dallas, TX, 75275
Basic Data Structures (1)
• Data Structure
– Single digits fixed to 32 bits
(1 Digit “Word”)
Long Precision Digit
bool
sign
1 bit
Word
32 bits
• Long Precision Data Structure
– numbers capable of fixed # digits 4096 bits (multiplication of two
2048 bit numbers)
->
128 digits,
fractions ?
– access to units of Words(32bit) & DWord(64bit)
– taking care of machine precision and available data types
– memory allocation has large impact on running times
• Operations
– basic arithmetic operations (Long Precision and Modular Arithmetic)
– Constructor, Destructor
– GetDigit() to access an arbitrary digit of a number
Variable Precision Data Structure
• Digit
– Implemented based on Long Precision “Digit” (32 or 16 bits)
• Long Precision Number
bool
Word
sign
1 bit 32 bits
Word
32 bits
Long precision number
Word
Word
32 bits
32 bits
m_iMaxLen: up to 128 “digits”
• Number Components
– magnitude:
Array[m_iMaxLen] of Word
– sign:
Bool
– actual length: m_MSD: pos. Integer in [0 : m_iMaxLen]
Operations
• Constructors (Destructor)
– Four different types possible (including copying existing numbers)
• Import/Export Functions
– To convert numbers and interface
• Addition
–
–
–
–
Use Assembler Macro to add
2 Digits + Carry
Addition only necessary for digits up to min{A.MSD, B.MSD}
Propagation of Carries for digits up to max{A.MSD, B.MSD}
Adjustment of actual result length
addition:
{ max{A.MSD, B.MSD}, max{A.MSD, B.MSD}+1 }
subtraction: [ 0 : max{A.MSD, B.MSD} ]
• Multiplication
– Consider recursive methods that reduce problem to addition,
e.g. Karatsuba’s Algorithm
– Comparing fixed solutions at 16 bits and at 32 bits
– Considering “School Method” for lowest level(s) (fewer additions)
FP Expansions
Represent long precision as sum of FP numbers:
• aligned significands:
• Range:
– Single precision
(exponent range: [-126,127])
• Could represent numbers up to 2128 with resolution of 2-139 (266 bits)
• Exact representation requires up to 12 FP numbers
– Double precision
(exponent range: [-1022,1023])
• Could represent numbers up to 21024 with resolution of 2-1074 (2096 bits)
• Exact representation requires up to 40 FP numbers
– Extension: Could be considered as “digits” in larger system
Exact Addition of 2 FP numbers
Find C,D, so that
A+B = C+D
C,D non-overlapping
D < C*2-p
A
B
C
D
Computation in 3 steps (assume A > B):
C = RNE(A+B)
E = RNE*(C-A)
(representable)
D = RNE*(B-E)
(representable)
Achieves most compact representation for FP Expansions:
– Sort, process from right to left
(effort can be reduced to 2 adds/FP number)
Multiplication
• Of two n-digit numbers
How to multiply 2 n-digit numbers.
X
2 n2
********
********
********
********
********
********
********
********
********
********
****************
Multiplication of 2 n-bit numbers
• X=
• Y=
a
c
• X = a 2n/2 + b
b
d
Y = c 2n/2 + d
• XY = ac 2n + (ad+bc) 2n/2 + bd
Mult of 2 n-bit(digit) numbers
a
b
• X=
c
d
• Y=
• XY = ac 2n + (ad+bc) 2n/2 + bd
MULT(X,Y):
If |X| = |Y| = 1 then RETURN XY
Break X into a;b and Y into c;d
RETURN
MULT(a,c) 2n + (MULT(a,d) + MULT(b,c)) 2n/2 + MULT(b,d)
Time required by MULT
T(n)
T(1)
= 4T(n/2) + 3TAdd(n)
= 1
Form: T(n) = an2 + bn + c for some a,b,c
Calculate: T(1) = 1  a + b + c = 1
T(n) = 4 T(n/2) + 3n
 an2 + bn + c = 4 [a(n/2)2 + b(n/2) + c] + 3n
= an2 + 2bn + 4c + 3n
 (b+3)n + 3c = 0
Therefore: b=-3 c=0 a=4
=> T(n) = 4n2 - 3n
MULT computation
MULT(X,Y):
If |X| = |Y| = 1 then RETURN XY
Break X into a;b and Y into c;d
RETURN
MULT(a,c) 2n + (MULT(a,d) + MULT(b,c)) 2n/2 + MULT(b,d)
MULT is recursively called 4 times.
Karatsuba-Ofman Multiplication
Input: a,b,c,d
Output: ac, ad+bc, bd
•
X1 = a + b
•
X2 = c + d
•
X3 = X1 X2 = ac + ad + bc + bd
•
X4 = ac
•
X5 = bd
•
X6 = X3 – X4 - X5 = ad + bc
T(n) = 3T(n/2) + 3TAdd(n)
T(1) = 1
Karatsuba-Ofman Mult 1962
MULT(X,Y):
If |X| = |Y| = 1 then RETURN XY
Break X into a;b and Y into c;d
e = MULT(a,c) and f =MULT(b,d)
RETURN e2n + (MULT(a+b, c+d) – e - f) 2n/2 + f
•T(n) = 3 T(n/2) + 3n
•Actually: T(n) = 2 T(n/2) + T(n/2 + 1) + 3n
T(n)
=
T(n/2)
3n
T(n/2)
T(n/2)
T(n)
=
3n
3n/2
T(n/4)T(n/4)T(n/4)
T(n/2)
T(n/2)
T(n)
=
3n
3n/2
3n/2
3n/2
T(n/4)T(n/4)T(n/4)
T(n/4)T(n/4)T(n/4)
T(n/4)T(n/4)T(n/4)
0
1
3n
3n/2
+
3n/2
+
3n/2
2
3n/4 + 3n/4 + 3n/4 + 3n/4 + 3n/4 + 3n/4 + 3n/4 + 3n/4 + 3n/4
i
Level i is the sum of 3i copies of 3n/2i
..........................
log2n
1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1
= 3n
= 9/2n
= 27/4n
= 3(3/2)in
 (3 / 2)log2n n
log2n
log2n
3n(1+3/2+(3/2)2+ . . . + (3/2) ) -2n(3/2) =
7n
log2 3
 6n
The Geometric Sum
S  1  x  x  x ... x
2
Sx 
k
x  x  x ... x  x
k 1
x
k 1
2
S  Sx  1
k 1
1
S
x1
x
3
3
k
Toom-Cook Multiplication
Recursive reduction of n bit/digit multiplication
to 5 multiplications of size n/3 + some additions
=>
T(n) = O(nlog5/log3) = O(n1.465…)
Multiplication Algorithms
conventional
n2
Karatsuba
n1.58…
Toom - Cook
n1.465…
FFT (Brigham)
n logn2
(Schönhage – Strassen)
n logn loglogn
Practical Comparisons
Implementation in GMP / NTL (GNU Multiple Precision)
REFERENCES
•Borodin, A. and Munro, I. The Computational Complexity of Algebraic and Numeric
Problems. New York: American Elsevier, 1975.
•Borwein, J. M.; Borwein, P. B.; and Bailey, D. H. "Ramanujan, Modular Equations, and
Approximations to Pi, or How to Compute One Billion Digits of Pi." Amer. Math. Monthly 96,
201-219, 1989.
•Brigham, E. O. The Fast Fourier Transform. Englewood Cliffs, NJ: Prentice-Hall, 1974.
•Brigham, E. O. Fast Fourier Transform and Applications. Englewood Cliffs, NJ: PrenticeHall, 1988.
•Cook, S. A. On the Minimum Computation Time of Functions. Ph.D. Thesis. Cambridge, MA:
Harvard University, pp. 51-77, 1966.
•Hollerbach, U. "Fast Multiplication & Division of Very Large Numbers." sci.math.research
posting, Jan. 23, 1996.
•Karatsuba, A. and Ofman, Yu. "Multiplication of Many-Digital Numbers by Automatic
Computers." Doklady Akad. Nauk SSSR 145, 293-294, 1962. Translation in Physics-Doklady
7, 595-596, 1963.
•Knuth, D. E. The Art of Computer Programming, Vol. 2: Seminumerical Algorithms, 3rd ed.
Reading, MA: Addison-Wesley, pp. 278-286, 1998.
•Schönhage, A. and Strassen, V. "Schnelle Multiplikation Grosser Zahlen." Computing 7,
281-292, 1971.
•Toom, A. L. "The Complexity of a Scheme of Functional Elements Simulating the
Multiplication of Integers." Dokl. Akad. Nauk SSSR 150, 496-498, 1963. English
translation in Soviet Mathematics 3, 714-716, 1963.
•Zuras, D. "More on Squaring and Multiplying Large Integers." IEEE Trans. Comput. 43,
899-908, 1994.