PowerPoint Presentation - German University in Cairo

ELECT 90X
Programmable Logic Circuits:
Floating-Point Numbers
Dr. Eng. Amr T. Abdel-Hamid
Slides based on slides prepared by:
• B. Parhami, Computer Arithmetic: Algorithms and Hardware
Design, Oxford University Press, 2000.
• I. Koren, Computer Arithmetic Algorithms, 2nd Edition, A.K.
Peters, Natick, MA, 2002.
Fall 2009
The ANSI/IEEE Floating-Point Standard
IEEE 754 Standard (now being revised to yield IEEE 754R)
Short (32-bit) format:
• Sign: 1 bit
• Exponent: 8 bits, bias = 127, range -126 to 127
• Significand: 23 bits for fractional part (plus hidden 1 in integer part)
Long (64-bit) format:
• Sign: 1 bit
• Exponent: 11 bits, bias = 1023, range -1022 to 1023
• Significand: 52 bits for fractional part (plus hidden 1 in integer part)
Figure: The ANSI/IEEE standard floating-point number representation formats.
Overview of IEEE 754 Standard Formats
Some features of the ANSI/IEEE standard floating-point number representation formats.
––––––––––––––––––––––––––––––––––––––––––––––––––––––––
Feature | Single / Short | Double / Long
––––––––––––––––––––––––––––––––––––––––––––––––––––––––
Word width (bits) | 32 | 64
Significand bits | 23 + 1 hidden | 52 + 1 hidden
Significand range | $[1, 2 - 2^{-23}]$ | $[1, 2 - 2^{-52}]$
Exponent bits | 8 | 11
Exponent bias | 127 | 1023
Zero ($\pm 0$) | e + bias = 0, f = 0 | e + bias = 0, f = 0
Denormal | e + bias = 0, $f \ne 0$; represents $\pm 0.f \times 2^{-126}$ | e + bias = 0, $f \ne 0$; represents $\pm 0.f \times 2^{-1022}$
Infinity ($\pm\infty$) | e + bias = 255, f = 0 | e + bias = 2047, f = 0
Not-a-number (NaN) | e + bias = 255, $f \ne 0$ | e + bias = 2047, $f \ne 0$
Ordinary number | e + bias $\in [1, 254]$; $e \in [-126, 127]$; represents $1.f \times 2^{e}$ | e + bias $\in [1, 2046]$; $e \in [-1022, 1023]$; represents $1.f \times 2^{e}$
min | $2^{-126} \approx 1.2 \times 10^{-38}$ | $2^{-1022} \approx 2.2 \times 10^{-308}$
max | $\approx 2^{128} \approx 3.4 \times 10^{38}$ | $\approx 2^{1024} \approx 1.8 \times 10^{308}$
––––––––––––––––––––––––––––––––––––––––––––––––––––––––
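The single/short column of the table can be exercised with a small decoder. This is a minimal sketch, not part of the original slides: the function names decode_float32 and float32_bits are made up for illustration, and the value is computed in Python's double precision, which holds every single-precision value exactly.

```python
import struct

def decode_float32(bits):
    """Classify a 32-bit pattern per the table and return (kind, value)."""
    sign = bits >> 31                 # 1 sign bit
    biased = (bits >> 23) & 0xFF      # 8 exponent bits, stored as e + bias
    frac = bits & 0x7FFFFF            # 23 fraction bits f
    if biased == 0:
        # e + bias = 0: zero (f = 0) or denormal (f != 0), value 0.f x 2^-126
        kind = "zero" if frac == 0 else "denormal"
        value = (frac / 2**23) * 2.0**-126
    elif biased == 255:
        # e + bias = 255: infinity (f = 0) or NaN (f != 0)
        kind = "infinity" if frac == 0 else "nan"
        value = float("inf") if frac == 0 else float("nan")
    else:
        # Ordinary number: 1.f x 2^e with e = biased - 127
        kind = "ordinary"
        value = (1 + frac / 2**23) * 2.0**(biased - 127)
    return kind, -value if sign else value

def float32_bits(x):
    """Return the single-precision bit pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

print(decode_float32(float32_bits(1.0)))
print(decode_float32(0x00000001))   # smallest denormal
print(decode_float32(0x7F7FFFFF))   # largest ordinary number
```

The three branches correspond directly to the rows "Zero / Denormal", "Infinity / NaN", and "Ordinary number".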
Exponent Encoding
Exponent encoding in 8 bits for the single/short (32-bit) ANSI/IEEE format
Decimal code | Hex code | Exponent value
0 | 00 | f = 0: representation of $\pm 0$; $f \ne 0$: representation of denormals, $\pm 0.f \times 2^{-126}$
1 | 01 | -126
. . . | . . . | . . .
126 | 7E | -1
127 | 7F | 0
128 | 80 | +1
. . . | . . . | . . .
254 | FE | +127
255 | FF | f = 0: representation of $\pm\infty$; $f \ne 0$: representation of NaNs
Codes 1 through 254 represent ordinary numbers $\pm 1.f \times 2^{e}$.
Exponent encoding in 11 bits for the double/long (64-bit) format is similar.
[Figure: the floating-point number line from $-\infty$ to $+\infty$. Negative numbers (FLP-) run from max- to min- and positive numbers (FLP+) from min+ to max+; representable values are denser near 0 and sparser toward max. Overflow regions lie beyond max- and max+; underflow regions lie between min- and 0 and between 0 and min+. Marked points: underflow example, midway example, typical example, overflow example.]
Special Operands and Denormals
Operations on special operands:
Ordinary number $\div (+\infty) = \pm 0$
$(+\infty) \times$ Ordinary number $= \pm\infty$
NaN + Ordinary number = NaN
[Figure: biased exponent values 0, 1, 2, . . . , 253, 254, 255. Biased value 0 holds $\pm 0$ and denormals ($\pm 0.f \times 2^{-126}$); biased values 1 to 254 hold ordinary FLP numbers with exponents -126, -125, . . . , 126, 127; biased value 255 holds $\pm\infty$ and NaN.]
Denormalized Numbers
• No hidden bit: significands are not normalized
• Exponent -126 is selected instead of 0 - 127 = -127; the smallest normalized number is $F^{+}_{min} = 1 \times 2^{-126}$
• Smallest representable number is $2^{-23} \times 2^{-126} = 2^{-149}$ instead of $2^{-126}$: gradual (or graceful) underflow
• Does not eliminate underflow, but reduces the gap between the smallest representable number and zero; $2^{-149}$ = distance between any two consecutive denormalized numbers = distance between two consecutive normalized numbers with the smallest exponent 1 - 127 = -126
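Gradual underflow can be observed from software. The following is a small sketch under the assumption that the host uses IEEE 754 single precision for the struct format "f" (true on common platforms); the helper name to_float32 is hypothetical.

```python
import struct

def to_float32(x):
    """Round a Python float (a double) to single precision; return it as a double."""
    return struct.unpack("f", struct.pack("f", x))[0]

F_MIN_NORMAL = 2.0**-126     # smallest normalized single-precision number
F_MIN_DENORMAL = 2.0**-149   # 2^-23 x 2^-126, smallest denormal

# The smallest denormal is representable, so the round trip preserves it.
print(to_float32(F_MIN_DENORMAL) == F_MIN_DENORMAL)

# The largest denormal lies exactly one step of 2^-149 below F_MIN_NORMAL.
largest_denormal = F_MIN_NORMAL - F_MIN_DENORMAL
print(to_float32(largest_denormal) == largest_denormal)

# Underflow is reduced, not eliminated: values well below 2^-149 become zero.
print(to_float32(2.0**-151))
```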
Requirements for Arithmetic
Results of the 4 basic arithmetic operations ($+$, $-$, $\times$, $\div$) as well as square-rooting must match those obtained if all intermediate computations were infinitely precise
That is, a floating-point arithmetic operation should introduce no more imprecision than the error attributable to the final rounding of a result that has no exact representation (this is the best possible)
Example: $(1 + 2^{-1}) \times (1 + 2^{-23})$
Exact result: $1 + 2^{-1} + 2^{-23} + 2^{-24}$
Rounded result: $1 + 2^{-1} + 2^{-22}$
Error = ½ ulp
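The example can be checked with exact rational arithmetic. This is an illustrative sketch, assuming the struct format "f" is IEEE 754 single precision with round to nearest even; to_float32 is a made-up helper name.

```python
import struct
from fractions import Fraction

def to_float32(x):
    """Round a Python float (a double) to single precision; return it as a double."""
    return struct.unpack("f", struct.pack("f", x))[0]

a = 1 + 2.0**-1
b = 1 + 2.0**-23

exact = Fraction(a) * Fraction(b)   # infinitely precise product
rounded = to_float32(a * b)         # a * b needs 25 bits, so it is exact in double
ulp = Fraction(1, 2**23)            # ulp of a single-precision value in [1, 2)

error = Fraction(rounded) - exact
print(exact == 1 + Fraction(1, 2) + Fraction(1, 2**23) + Fraction(1, 2**24))
print(Fraction(rounded) == 1 + Fraction(1, 2) + Fraction(1, 2**22))
print(error == ulp / 2)
```

The exact product is a tie between two neighbouring single-precision values, and round to nearest even picks the one with a 0 in the last position.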
Basic Floating-Point Algorithms
Addition
Assume $e_1 \ge e_2$; alignment shift (preshift) is needed if $e_1 > e_2$
$(\pm s_1 \times b^{e_1}) + (\pm s_2 \times b^{e_2}) = (\pm s_1 \times b^{e_1}) + (\pm s_2 / b^{e_1 - e_2}) \times b^{e_1}$
$= (\pm s_1 \pm s_2 / b^{e_1 - e_2}) \times b^{e_1} = \pm s \times b^{e}$
Rounding, overflow, and underflow issues discussed later
Example:
Numbers to be added (the operand with the smaller exponent is to be preshifted):
$x = 2^{5} \times 1.00101101$
$y = 2^{1} \times 1.11101101$
Operands after alignment shift:
$x = 2^{5} \times 1.00101101$
$y = 2^{5} \times 0.000111101101$
Result of addition:
$s = 2^{5} \times 1.010010111101$ (the extra bits 1101 are to be rounded off)
$s = 2^{5} \times 1.01001100$ (rounded sum)
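The addition example can be reproduced with integer significands. This is a minimal sketch for positive operands only; the function name fp_add, the 8-bit fraction width, and the choice of round to nearest even are assumptions made for illustration (the slide does not name the rounding scheme).

```python
def fp_add(sig1, e1, sig2, e2, frac_bits=8):
    """Add two positive binary FP numbers.

    Each significand is an integer holding 1.f with frac_bits fraction bits,
    so the value is sig / 2**frac_bits * 2**e. Returns (significand, exponent).
    """
    if e1 < e2:                       # make operand 1 the one with the larger exponent
        sig1, e1, sig2, e2 = sig2, e2, sig1, e1
    shift = e1 - e2                   # alignment shift (preshift) amount
    # Scale operand 1 left instead of shifting operand 2 right, so that
    # no bits are lost before rounding.
    total = (sig1 << shift) + sig2
    extra, e = shift, e1              # number of extra low-order bits to round off
    if total >> (frac_bits + extra + 1):   # sum >= 2: normalizing right shift
        extra += 1
        e += 1
    kept, rem = divmod(total, 1 << extra)
    half = (1 << extra) >> 1
    # Round to nearest, ties to even.
    if rem > half or (extra > 0 and rem == half and kept & 1):
        kept += 1
    if kept >> (frac_bits + 1):       # rounding overflowed the significand
        kept >>= 1
        e += 1
    return kept, e

sig, e = fp_add(0b100101101, 5, 0b111101101, 1)
print(f"2^{e} x 1.{sig & 0xFF:08b}")
```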
Floating-Point Multiplication and Division
Multiplication
$(\pm s_1 \times b^{e_1}) \times (\pm s_2 \times b^{e_2}) = (\pm s_1 \times s_2) \times b^{e_1 + e_2}$
Because $s_1 \times s_2 \in [1, 4)$, postshifting may be needed for normalization
Overflow or underflow can occur during multiplication or normalization
Division
$(\pm s_1 \times b^{e_1}) / (\pm s_2 \times b^{e_2}) = (\pm s_1 / s_2) \times b^{e_1 - e_2}$
Because $s_1 / s_2 \in (0.5, 2)$, postshifting may be needed for normalization
Overflow or underflow can occur during division or normalization
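The multiplication rule can be sketched with integer significands. This is an illustration only: the name fp_multiply, the single-precision default parameters, round to nearest even, and the treatment of underflow as an error (no denormals) are all assumptions, not something the slides specify.

```python
def fp_multiply(s1, m1, e1, s2, m2, e2, frac_bits=23, e_min=-126, e_max=127):
    """Multiply two normalized binary FP numbers.

    Each operand is (sign bit, integer significand holding 1.f with
    frac_bits fraction bits, unbiased exponent).
    """
    sign = s1 ^ s2                    # sign of the product
    prod = m1 * m2                    # in [1, 4), with 2 * frac_bits fraction bits
    e = e1 + e2
    extra = frac_bits                 # low-order bits to round off
    if prod >> (2 * frac_bits + 1):   # product >= 2: postshift right by one
        extra += 1
        e += 1
    kept, rem = divmod(prod, 1 << extra)
    half = 1 << (extra - 1)
    if rem > half or (rem == half and kept & 1):   # round to nearest even
        kept += 1
    if kept >> (frac_bits + 1):       # rounding carried the significand up to 2
        kept >>= 1
        e += 1
    if e > e_max:
        raise OverflowError("exponent overflow")
    if e < e_min:
        raise ArithmeticError("exponent underflow (denormals not modelled)")
    return sign, kept, e

ONE_AND_HALF = 3 << 22                # 1.5 with 23 fraction bits
print(fp_multiply(0, ONE_AND_HALF, 0, 1, ONE_AND_HALF, 0))   # -(1.125 x 2^1)
```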
Exceptions in Floating-Point Arithmetic
Divide by zero
Overflow
Underflow
Inexact result: Rounded value not the same as original
Invalid operation: examples include
Addition: $(+\infty) + (-\infty)$
Multiplication: $0 \times \infty$
Division: $0 / 0$ or $\infty / \infty$
Square-rooting: operand < 0
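Several of these cases can be observed from Python, whose float type is an IEEE 754 double on common platforms. A brief sketch: note that Python itself raises an exception for float division by zero and for math.sqrt of a negative number rather than returning a NaN, which is a property of Python and not of the standard.

```python
import math

inf = math.inf

# Invalid operations that produce a NaN result.
invalid = {
    "(+inf) + (-inf)": inf + (-inf),
    "0 * inf": 0.0 * inf,
    "inf / inf": inf / inf,
}
for name, value in invalid.items():
    print(name, "->", value)

# Overflow: the result is too large for the format and becomes infinity.
overflowed = 1e308 * 10.0
print("1e308 * 10 ->", overflowed)

# Python turns these two cases into exceptions instead of returning NaN.
for label, operation in (("0 / 0", lambda: 0.0 / 0.0),
                         ("sqrt(-1)", lambda: math.sqrt(-1.0))):
    try:
        operation()
    except (ZeroDivisionError, ValueError) as exc:
        print(label, "raised", type(exc).__name__)
```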
Rounding Schemes
Round: $x_{k-1}x_{k-2} \ldots x_1x_0 \,.\, x_{-1}x_{-2} \ldots x_{-l} \;\rightarrow\; y_{k-1}y_{k-2} \ldots y_1y_0$ (whole part and fractional part in, whole part out; the last retained position has weight ulp)
The simplest possible rounding scheme: chopping or truncation
Chop: $x_{k-1}x_{k-2} \ldots x_1x_0 \,.\, x_{-1}x_{-2} \ldots x_{-l} \;\rightarrow\; x_{k-1}x_{k-2} \ldots x_1x_0$
ulp: unit in the last position, the weight of the least-significant bit of the fractional significand
Truncation (Chopping)
• d extra digits removed, no change in the m remaining digits: rounding toward zero
• For $F_1 \le x < F_2$, Trunc(x) results in $F_1$ (Trunc(2.99) = 2)
• Fast method: no extra hardware
• Poor numerical performance: error up to ulp
Truncation or Chopping
[Figure: staircase plot of chop(x) versus x, both axes from -4 to 4.]
Truncation or chopping of a signed-magnitude number (same as round toward 0).
Round to Nearest Number
Rounding has a slight upward bias.
Consider rounding $(x_{k-1}x_{k-2} \ldots x_1x_0 \,.\, x_{-1}x_{-2})_{two}$ to an integer $(y_{k-1}y_{k-2} \ldots y_1y_0 \,.)_{two}$
The four possible cases, and their representation errors are:
$x_{-1}x_{-2}$ | Round | Error
00 | down | 0
01 | down | -0.25
10 | up | 0.5
11 | up | 0.25
With equal prob., mean = 0.125
[Figure: staircase plot of rtn(x) versus x, both axes from -4 to 4.]
Rounding of a signed-magnitude value to the nearest number.
Round to Nearest Even Number
[Figure: staircase plot of rtne(x) versus x, both axes from -4 to 4.]
Rounding to the nearest even number.
Round to Nearest Even
• In case of a tie (X.10), choose out of $F_1$ and $F_2$ the even one (with least-significant bit 0)
• Alternately rounding up and down: unbiased
• Round-to-Nearest-Odd: select the one with least-significant bit 1
• d = 2: sum of errors = 0, bias = 0
• Mandatory in the IEEE floating-point standard, where it is the default rounding mode
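The bias of each scheme can be measured by averaging the error over all values with two fractional bits (d = 2). This is a small sketch using exact fractions; the function names chop, rtn, rtne, and mean_error are chosen here for illustration.

```python
import math
from fractions import Fraction

def chop(x):
    """Truncate toward zero."""
    return math.trunc(x)

def rtn(x):
    """Round to nearest, ties away from zero (signed-magnitude rounding)."""
    magnitude = math.floor(abs(x) + Fraction(1, 2))
    return magnitude if x >= 0 else -magnitude

def rtne(x):
    """Round to nearest, ties to the even neighbour."""
    low = math.floor(x)
    diff = x - low
    if diff != Fraction(1, 2):
        return low if diff < Fraction(1, 2) else low + 1
    return low if low % 2 == 0 else low + 1

def mean_error(scheme):
    """Average of scheme(x) - x over x = n + q/4 for n in 0..7 and q in 0..3."""
    samples = [Fraction(4 * n + q, 4) for n in range(8) for q in range(4)]
    return sum(scheme(x) - x for x in samples) / len(samples)

for scheme in (chop, rtn, rtne):
    print(scheme.__name__, mean_error(scheme))
```

Over positive inputs, chopping is biased downward, round to nearest has the upward bias of 0.125 noted above, and round to nearest even has zero mean error.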
FLP Addition Hardware
[Figure: block diagram of a floating-point adder/subtractor. Operands x and y enter Unpack, which separates signs, exponents, and significands. The significands pass through selective complement and possible swap (Mux), align significands, Add (with $c_{in}$ and $c_{out}$), Normalize, round and selective complement, Add, and Normalize. The Add/Sub input, the signs, and the exponent difference E1 - E2 feed the control and sign logic. Sign, exponent, and significand enter Pack, which produces the sum/difference s.]
The adder computes $(-1)^{s_1} m_1 + (-1)^{s_2} m_2 = (-1)^{s} m$
Unpack:
• Isolate the sign, exponent, significand
• Reinstate the hidden 1
• Convert operands to internal format
• Identify special operands, exceptions
Pack:
• Combine sign, exponent, significand
• Hide (remove) the leading 1
• Identify special outcomes, exceptions
Pre- and Postshifting
[Figure: Four-stage combinational shifter for preshifting an operand by 0 to 15 bits. Inputs $x_{i+8} \ldots x_i$, outputs $y_{i+8} \ldots y_i$; the four stages are controlled by the 4-bit shift amount, LSB to MSB.]
Leading Zeros / Ones Detection or Prediction
[Figure: Leading zeros/ones counting versus prediction. Counting: Significand Adder, then Count Leading 0s/1s, whose shift amount drives Adjust Exponent and the Post-Shifter. Prediction: Predict Leading 0s/1s alongside the Significand Adder, with the shift amount again driving Adjust Exponent and the Post-Shifter.]
Rounding and Exceptions
Adder result = $(c_{out} z_1 z_0 \,.\, z_{-1} z_{-2} \ldots z_{-l}\; G\; R\; S)_{2's\text{-}compl}$
G: guard bit; R: round bit; S: sticky bit (OR of all bits shifted past R)
Why only 3 extra bits? Consider the amount of alignment right-shift:
One bit: G holds the bit that is shifted out, no precision is lost
Two bits or more: Shifted significand has a magnitude in [0, 1/2)
Unshifted significand has a magnitude in [1, 2)
Difference of aligned significands has a magnitude in [1/2, 2)
Normalization left-shift will be by at most one bit
If a normalization left-shift actually takes place:
R = 0, round down, discarded part < ulp/2
R = 1, round up, discarded part $\ge$ ulp/2
The only remaining question is establishing whether the discarded part is exactly ulp/2 (for round to nearest even); S provides this information
Implementation of Rounding for Addition
The effect of 1-bit normalization shifts on the rightmost few bits of the significand adder output is as follows:
Before postshifting (z): $\ldots\; z_{-l+1}\; z_{-l} \;|\; G\; R\; S$
1-bit normalizing right-shift: $\ldots\; z_{-l+2}\; z_{-l+1} \;|\; z_{-l}\; G\; R \lor S$
1-bit normalizing left-shift: $\ldots\; z_{-l}\; G \;|\; R\; S\; 0$
After normalization (Z): $\ldots\; Z_{-l+1}\; Z_{-l} \;|\; Z_{-l-1}\; Z_{-l-2}\; Z_{-l-3}$
Note that no rounding is needed in case of multibit left-shift, because full precision is preserved in this case
Round to nearest even:
Do nothing if $Z_{-l-1} = 0$ or $Z_{-l} = Z_{-l-2} = Z_{-l-3} = 0$
Add ulp = $2^{-l}$ otherwise
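The guard, round, and sticky bits and the round-to-nearest-even rule can be sketched on integer significands. This is an illustrative model rather than a hardware description; the names align_right and round_nearest_even are hypothetical.

```python
def align_right(sig, shift):
    """Right-shift an integer significand by `shift` bits.

    Returns (kept, G, R, S): the retained bits, the guard and round bits,
    and the sticky bit (OR of everything shifted past R).
    """
    extended = sig << 3                        # make room for G, R, S
    shifted = extended >> shift
    lost = extended & ((1 << shift) - 1)       # bits that fell off past S
    guard = (shifted >> 2) & 1
    round_bit = (shifted >> 1) & 1
    sticky = (shifted & 1) | (1 if lost else 0)
    return shifted >> 3, guard, round_bit, sticky

def round_nearest_even(kept, z1, z2, z3):
    """Apply the rule to Z_{-l} (the LSB of kept) and the three bits below it."""
    lsb = kept & 1
    if z1 == 0 or (lsb == 0 and z2 == 0 and z3 == 0):
        return kept                            # do nothing
    return kept + 1                            # add ulp

kept, g, r, s = align_right(0b111101101, 4)
print(bin(kept), g, r, s)
print(bin(round_nearest_even(kept, g, r, s)))
```

With a first extra bit of 1 and the remaining two at 0, the discarded part is exactly ulp/2, and the result depends only on whether the retained value is already even.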
Exceptions in Floating-Point Addition
• Overflow/underflow detected by exponent adjustment block
• Overflow can occur only for normalizing right-shift
• Underflow possible only with normalizing left shifts
• Exceptions involving NaNs and invalid operations handled by unpacking and packing blocks
• Zero detection: Special case of leading 0s detection
[Figure: the floating-point adder/subtractor block diagram, repeated from the FLP Addition Hardware slide.]
Floating-Point Multipliers
$(\pm s_1 \times b^{e_1}) \times (\pm s_2 \times b^{e_2}) = (\pm s_1 \times s_2) \times b^{e_1 + e_2}$
Overflow or underflow can occur during multiplication or normalization
Speed considerations
Many multipliers produce the lower half of the product (rounding info) early
Need for normalizing right-shift is known at or near the end
Hence, rounding can be integrated in the generation of the upper half, by producing two versions of these bits
[Figure: Block diagram of a floating-point multiplier. Floating-point operands enter Unpack; signs go to XOR, exponents to Add Exponents, significands to Multiply Significands; then Normalize with Adjust Exponent, Round, Normalize with Adjust Exponent again, and Pack, which produces the product.]
FP Multiplication
• Operands: $(-1)^{s_1} M_1 2^{E_1} \times (-1)^{s_2} M_2 2^{E_2}$
• Exact result: $(-1)^{s} M 2^{E}$
  • Sign s: s1 ^ s2
  • Significand M: $M_1 \times M_2$
  • Exponent E: $E_1 + E_2$
• Fixing
  • If $M \ge 2$, shift M right, increment E
  • If E out of range, overflow
  • Round M to fit precision
• Implementation
  • Biggest problem is multiplying significands
Computational Errors
FLP approximates exact computation with real numbers
Two sources of errors to understand and counteract:
Representation errors
e.g., no machine representation for $1/3$, $\sqrt{2}$, or $\pi$
Arithmetic errors
e.g., $(1 + 2^{-12})^2 = 1 + 2^{-11} + 2^{-24}$
not representable in IEEE 754 short format
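The arithmetic-error example can be confirmed numerically. A short sketch, assuming the struct format "f" is IEEE 754 single precision; to_float32 is a made-up helper name.

```python
import struct
from fractions import Fraction

def to_float32(x):
    """Round a Python float (a double) to single precision; return it as a double."""
    return struct.unpack("f", struct.pack("f", x))[0]

x = 1 + 2.0**-12                 # exactly representable in the short format
exact = Fraction(x) ** 2         # 1 + 2^-11 + 2^-24, needs 25 significand bits
stored = to_float32(x * x)       # x * x is exact in double, then rounded to 24 bits

print(to_float32(x) == x)
print(exact == 1 + Fraction(1, 2**11) + Fraction(1, 2**24))
print(Fraction(stored) == exact)     # False: the 2^-24 term is rounded away
```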
Representation and Arithmetic Errors
Example: Compute 1/99 - 1/100, using a decimal floating-point format with 4-digit significand and single-digit signed exponent
Precise result = $1/9900 \approx 1.010 \times 10^{-4}$ (error $\approx 10^{-8}$ or 0.01%)
$x = 1/99 \approx 1.010 \times 10^{-2}$ (chopped to 3 decimals); Error $\approx 10^{-6}$ or 0.01%
$y = 1/100 = 1.000 \times 10^{-2}$; Error = 0
$z = x -_{fp} y = 1.010 \times 10^{-2} - 1.000 \times 10^{-2} = 1.000 \times 10^{-4}$; Error $\approx 10^{-6}$ or 1%
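The example maps directly onto Python's decimal module with a 4-digit context and chopping (ROUND_DOWN). This is a sketch for illustration; the exponent range of the slide's format is not modelled.

```python
from decimal import Decimal, Context, ROUND_DOWN
from fractions import Fraction

ctx = Context(prec=4, rounding=ROUND_DOWN)    # 4 significant digits, chopping

x = ctx.divide(Decimal(1), Decimal(99))       # 1/99 chopped to 4 digits
y = ctx.divide(Decimal(1), Decimal(100))      # 1/100 is exact
z = ctx.subtract(x, y)                        # x -fp y

exact = Fraction(1, 9900)
relative_error = abs(Fraction(z) - exact) / exact

print(x, y, z)
print(float(relative_error))
```

A representation error of about 0.01% in x turns into an error of 1% in z, because the subtraction cancels the leading digits.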
Notation for a General Floating-Point System
Number representation in FLP(r, p, A)
Radix r (assumed to be the same as the exponent base b)
Precision p in terms of radix-r digits
Approximation scheme $A \in \{\text{chop}, \text{round}, \text{rtne}, \text{chop}(g), \ldots\}$
Let $x = r^{e} s$ be an unsigned real number, normalized such that $1/r \le s < 1$, and assume $x_{fp}$ is the representation of x in FLP(r, p, A)
$x_{fp} = r^{e} s_{fp} = (1 + \eta)x$, where $\eta$ is the relative representation error
A = chop: $-ulp < s_{fp} - s \le 0$, so $-r \times ulp < \eta \le 0$
A = round: $-ulp/2 < s_{fp} - s \le ulp/2$, so $|\eta| \le r \times ulp/2$
Arithmetic in FLP(r, p, A)
Obtain an infinite-precision result, then chop, round, . . .
Real machines approximate this process by keeping g > 0 guard digits, thus doing arithmetic in FLP(r, p, chop(g))
Error Analysis for Multiplication and Division
Errors in floating-point multiplication
Consider the positive operands $x_{fp} = (1 + \sigma)x$ and $y_{fp} = (1 + \tau)y$
$x_{fp} \times_{fp} y_{fp} = (1 + \eta)\, x_{fp}\, y_{fp}$
$= (1 + \eta)(1 + \sigma)(1 + \tau)\, xy$
$= (1 + \eta + \sigma + \tau + \eta\sigma + \eta\tau + \sigma\tau + \eta\sigma\tau)\, xy$
$\approx (1 + \eta + \sigma + \tau)\, xy$
Errors in floating-point division
Again, consider positive operands $x_{fp}$ and $y_{fp}$
$x_{fp} /_{fp}\, y_{fp} = (1 + \eta)\, x_{fp} / y_{fp}$
$= (1 + \eta)(1 + \sigma)x / [(1 + \tau)y]$
$= (1 + \eta)(1 + \sigma)(1 - \tau)(1 + \tau^2)(1 + \tau^4)(\ldots)\, x/y$
$\approx (1 + \eta + \sigma - \tau)\, x/y$
Error Analysis for Addition and Subtraction
Errors in floating-point addition
Consider the positive operands $x_{fp} = (1 + \sigma)x$ and $y_{fp} = (1 + \tau)y$
$x_{fp} +_{fp} y_{fp} = (1 + \eta)(x_{fp} + y_{fp})$
$= (1 + \eta)(x + \sigma x + y + \tau y)$
$= (1 + \eta)\left(1 + \frac{\sigma x + \tau y}{x + y}\right)(x + y)$
Magnitude of this ratio is upper-bounded by $\max(|\sigma|, |\tau|)$, so the overall error is no more than $|\eta| + \max(|\sigma|, |\tau|)$
Errors in floating-point subtraction
Again, consider positive operands $x_{fp}$ and $y_{fp}$
$x_{fp} -_{fp} y_{fp} = (1 + \eta)(x_{fp} - y_{fp})$
$= (1 + \eta)(x + \sigma x - y - \tau y)$
$= (1 + \eta)\left(1 + \frac{\sigma x - \tau y}{x - y}\right)(x - y)$
Magnitude of this ratio can be very large if x and y are both large but x - y is relatively small (recall that $\tau$ can be negative)
The term $\eta$ is also unbounded for subtraction
Cancellation Error in Subtraction
Subtraction result: $x_{fp} -_{fp} y_{fp} = (1 + \eta)\left(1 + \frac{\sigma x - \tau y}{x - y}\right)(x - y)$
Example: Decimal FLP system, r = 10, p = 6, no guard digit
$x = 0.100\,000\,000 \times 10^{3}$, $y = -0.999\,999\,456 \times 10^{2}$
$x_{fp} = .100\,000 \times 10^{3}$, $y_{fp} = -.999\,999 \times 10^{2}$
$x + y = 0.544 \times 10^{-4}$ and $x_{fp} + y_{fp} = 0.1 \times 10^{-3}$
$x_{fp} +_{fp} y_{fp} = 0.100\,000 \times 10^{3} -_{fp} 0.099\,999 \times 10^{3} = 0.100\,000 \times 10^{-2}$
Relative error = $(10^{-3} - 0.544 \times 10^{-4}) / (0.544 \times 10^{-4}) \approx 17.38 = 1738\%$
Now, ignore representation errors, so as to focus on the effect of $\eta$ (measure relative error with respect to $x_{fp} + y_{fp}$, not $x + y$)
Relative error = $(10^{-3} - 10^{-4}) / 10^{-4} = 9 = 900\%$
Bringing Cancellation Errors in Check
Subtraction result: $x_{fp} -_{fp} y_{fp} = (1 + \eta)\left(1 + \frac{\sigma x - \tau y}{x - y}\right)(x - y)$
Example: Decimal FLP system, r = 10, p = 6, 1 guard digit
$x = 0.100\,000\,000 \times 10^{3}$, $y = -0.999\,999\,456 \times 10^{2}$
$x_{fp} = .100\,000 \times 10^{3}$, $y_{fp} = -.999\,999 \times 10^{2}$
$x + y = 0.544 \times 10^{-4}$ and $x_{fp} + y_{fp} = 0.1 \times 10^{-3}$
$x_{fp} +_{fp} y_{fp} = 0.100\,000 \times 10^{3} -_{fp} 0.099\,999\,9 \times 10^{3} = 0.100\,000 \times 10^{-3}$
Relative error = $(10^{-4} - 0.544 \times 10^{-4}) / (0.544 \times 10^{-4}) \approx 0.838 = 83.8\%$
Now, ignore representation errors, so as to focus on the effect of $\eta$ (measure relative error with respect to $x_{fp} + y_{fp}$, not $x + y$)
Relative error = 0
Significantly better than 900%!
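The effect of the guard digit can be reproduced with integer digit strings. This is a minimal sketch for the case x > y > 0 only; the function name fp_sub_chop and its argument layout are assumptions made for illustration, and postnormalization is left out because the difference already fits in p digits here.

```python
from fractions import Fraction

def fp_sub_chop(x_digits, x_exp, y_digits, y_exp, p=6, g=0):
    """Compute x - y for x = 0.x_digits * 10**x_exp and y = 0.y_digits * 10**y_exp.

    Both significands hold p decimal digits. y is aligned to x's exponent and
    only p + g digits are kept (the rest are chopped) before subtracting.
    """
    shift = x_exp - y_exp
    xs = x_digits * 10**g                       # x extended with g guard digits
    ys = (y_digits * 10**g) // 10**shift        # aligned y, chopped past the guard digits
    return Fraction(xs - ys, 10**(p + g)) * Fraction(10) ** x_exp

x_fp = (100000, 3)    # 0.100 000 x 10^3
y_fp = (999999, 2)    # 0.999 999 x 10^2 (magnitude of y_fp)

reference = Fraction(1, 10**4)                  # x_fp + y_fp = 0.1 x 10^-3
for g in (0, 1):
    result = fp_sub_chop(*x_fp, *y_fp, g=g)
    print(g, float(result), float((result - reference) / reference))
```

With no guard digit the relative error with respect to $x_{fp} + y_{fp}$ is 9 (900%); with one guard digit it is 0.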
How Many Guard Digits Do We Need?
Subtraction result: $x_{fp} -_{fp} y_{fp} = (1 + \eta)\left(1 + \frac{\sigma x - \tau y}{x - y}\right)(x - y)$
Theorem: In the floating-point system FLP(r, p, chop(g)) with $g \ge 1$ and $-x < y < 0 < x$, we have:
$x_{fp} +_{fp} y_{fp} = (1 + \eta)(x_{fp} + y_{fp})$ with $-r^{-p+1} < \eta < r^{-p-g+2}$
Corollary: In FLP(r, p, chop(1))
$x_{fp} +_{fp} y_{fp} = (1 + \eta)(x_{fp} + y_{fp})$ with $|\eta| < r^{-p+1}$
So, a single guard digit is sufficient to make the relative arithmetic error in floating-point addition or subtraction comparable to relative representation error with truncation
Invalidated Laws of Algebra
Many laws of algebra do not hold for floating-point arithmetic (some don’t even hold approximately)
This can be a source of confusion and incompatibility
Associative law of addition: a + (b + c) = (a + b) + c
$a = 0.123\,41 \times 10^{5}$, $b = -0.123\,40 \times 10^{5}$, $c = 0.143\,21 \times 10^{1}$
$a +_{fp} (b +_{fp} c)$
$= 0.123\,41 \times 10^{5} +_{fp} (-0.123\,40 \times 10^{5} +_{fp} 0.143\,21 \times 10^{1})$
$= 0.123\,41 \times 10^{5} -_{fp} 0.123\,39 \times 10^{5}$
$= 0.200\,00 \times 10^{1}$
$(a +_{fp} b) +_{fp} c$
$= (0.123\,41 \times 10^{5} -_{fp} 0.123\,40 \times 10^{5}) +_{fp} 0.143\,21 \times 10^{1}$
$= 0.100\,00 \times 10^{1} +_{fp} 0.143\,21 \times 10^{1}$
$= 0.243\,21 \times 10^{1}$
Results differ by more than 20%!
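The associativity example runs as written in Python's decimal module with a 5-digit context. A short sketch, assuming round-half-even as the rounding scheme (the slide's 0.123 39 for b + c is consistent with rounding rather than chopping).

```python
from decimal import Decimal, localcontext, ROUND_HALF_EVEN

a = Decimal("12341")      # 0.123 41 x 10^5
b = Decimal("-12340")     # -0.123 40 x 10^5
c = Decimal("1.4321")     # 0.143 21 x 10^1

with localcontext() as ctx:
    ctx.prec = 5                      # 5-digit significand
    ctx.rounding = ROUND_HALF_EVEN
    left = a + (b + c)                # b + c is rounded to -12339 first
    right = (a + b) + c               # a + b = 1 is exact

print(left, right)
print(float((right - left) / left))   # relative difference between the two orders
```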
Do Guard Digits Help with Laws of Algebra?
Invalidated laws of algebra are intrinsic to FLP arithmetic; problems are reduced, but don’t disappear, with guard digits
Let’s redo our example with 2 guard digits
Associative law of addition: a + (b + c) = (a + b) + c
$a = 0.123\,41 \times 10^{5}$, $b = -0.123\,40 \times 10^{5}$, $c = 0.143\,21 \times 10^{1}$
$a +_{fp} (b +_{fp} c)$
$= 0.123\,41 \times 10^{5} +_{fp} (-0.123\,40 \times 10^{5} +_{fp} 0.143\,21 \times 10^{1})$
$= 0.123\,41 \times 10^{5} -_{fp} 0.123\,385\,7 \times 10^{5}$
$= 0.243\,00 \times 10^{1}$
$(a +_{fp} b) +_{fp} c$
$= (0.123\,41 \times 10^{5} -_{fp} 0.123\,40 \times 10^{5}) +_{fp} 0.143\,21 \times 10^{1}$
$= 0.100\,00 \times 10^{1} +_{fp} 0.143\,21 \times 10^{1}$
$= 0.243\,21 \times 10^{1}$
Difference of about 0.1% is better, but still too high!
Unnormalized Floating-Point Arithmetic
One way to reduce problems resulting from invalidated laws of algebra is to avoid normalizing computed floating-point results
Let’s redo our example with unnormalized arithmetic
Associative law of addition: a + (b + c) = (a + b) + c
$a = 0.123\,41 \times 10^{5}$, $b = -0.123\,40 \times 10^{5}$, $c = 0.143\,21 \times 10^{1}$
$a +_{fp} (b +_{fp} c)$
$= 0.123\,41 \times 10^{5} +_{fp} (-0.123\,40 \times 10^{5} +_{fp} 0.143\,21 \times 10^{1})$
$= 0.123\,41 \times 10^{5} -_{fp} 0.123\,39 \times 10^{5}$
$= 0.000\,02 \times 10^{5}$
$(a +_{fp} b) +_{fp} c$
$= (0.123\,41 \times 10^{5} -_{fp} 0.123\,40 \times 10^{5}) +_{fp} 0.143\,21 \times 10^{1}$
$= 0.000\,01 \times 10^{5} +_{fp} 0.143\,21 \times 10^{1}$
$= 0.000\,02 \times 10^{5}$
Other Invalidated Laws
Associative law of multiplication: $a \times (b \times c) = (a \times b) \times c$
Cancellation law (for a > 0): $a \times b = a \times c$ implies $b = c$
Distributive law: $a \times (b + c) = (a \times b) + (a \times c)$
Multiplication canceling division: $a \times (b / a) = b$
Before the ANSI-IEEE floating-point standard became available and widely adopted, these problems were exacerbated by the use of many incompatible formats
Worst-Case Error Accumulation
In a sequence of operations, round-off errors might add up
The larger the number of cascaded computation steps (that depend on results from previous steps), the greater the chance for, and the magnitude of, accumulated errors
With rounding, errors of opposite signs tend to cancel each other out in the long run, but one cannot count on such cancellations
Practical implications:
Perform intermediate computations with a higher precision than what is required in the final result
Implement multiply-accumulate in hardware (DSP chips)
Reduce the number of cascaded arithmetic operations; using computationally more efficient algorithms thus has the double benefit of reducing the execution time as well as accumulated errors
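A small experiment shows accumulation and one remedy. The sketch below compares a plain running sum in double precision with math.fsum from the Python standard library, which tracks the partial sums exactly and rounds only once at the end; it stands in here for "intermediate computations with a higher precision".

```python
import math

values = [0.1] * 10       # 0.1 has no exact binary representation

# Ten cascaded additions, each with its own rounding error.
running = 0.0
for v in values:
    running += v

# One rounding at the end of an exactly tracked sum.
accurate = math.fsum(values)

print(running)
print(accurate)
print(running == 1.0, accurate == 1.0)
```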
Floating Point Special Functions
• Tang’s Table-Driven Exponential Function
EXP(X) Algorithm
• The exponential also uses additive range reduction, where: [equation missing]
• This means we break x into two parts: 2 raised to some exponent plus a fractional residual.
• Consider X is a floating-point number where: [equation missing]
• Where: [equation missing]
• Then: [equation missing]
• Where: [equation missing]
• 4 steps are needed now:
• Step 1: Filter any out-of-bounds inputs (NaN, out of range, zero, positive or negative infinity). For these inputs the algorithm either computes the result with a simple approximate arithmetic operation or cannot produce one at all.
Step 2: Calculating N & r
• Start the computation by first calculating N, [equation missing] where INVL is a floating-point constant approximately equal to 32/log 2, and INTEGER converts to an integer using the default IEEE-754 round-to-nearest mode. This N is composed of two parts, [equation missing]
m = N1/32, and N2 = j
$r = X - (N \times \log 2 / 32)$
• Let $r = r_1 + r_2$
• Where [equation missing] |r2| <<
• L1 and L2 are constants, where L1 + L2 approximates log 2/32 to a precision higher than that of single precision (the working one).
Step 3: P(r)
[equation missing]
Step 4: $2^{j/32}$
• The values of $2^{j/32}$, j = 0, 1, ...., 32, are calculated beforehand and represented by working-precision numbers (single precision in our case), Slead and Strail. Their sum approximates $2^{j/32}$ to roughly double the working precision.
• Finally exp(x) is calculated as follows: [equation missing]
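The four steps can be sketched in Python. This is only an illustration of the table-driven idea under several assumptions not taken from the slides: it works in double precision instead of single, builds the $2^{j/32}$ table at run time for j = 0 to 31, uses a single constant for log 2/32 instead of the L1 + L2 split, replaces the missing polynomial P(r) with a degree-6 Taylor polynomial for $e^{r} - 1$, and uses made-up input thresholds. The name exp_table_driven is hypothetical.

```python
import math

LOG2_OVER_32 = math.log(2.0) / 32.0
INVL = 32.0 / math.log(2.0)
TABLE = [2.0 ** (j / 32.0) for j in range(32)]    # 2^(j/32), computed beforehand

def exp_table_driven(x):
    # Step 1: filter special and out-of-range inputs (thresholds are assumptions).
    if math.isnan(x):
        return x
    if x > 710.0:
        return math.inf
    if x < -746.0:
        return 0.0
    # Step 2: N = INTEGER(X * INVL), split into m and j, then the residual r.
    n = round(x * INVL)              # Python's round() uses round to nearest even
    j = n % 32
    m = (n - j) // 32
    r = x - n * LOG2_OVER_32         # |r| is at most about log 2 / 64
    # Step 3: polynomial for e^r - 1 (Taylor terms, assumed in place of P(r)).
    p = r * (1 + r * (1/2 + r * (1/6 + r * (1/24 + r * (1/120 + r / 720)))))
    # Step 4: exp(x) = 2^m * (2^(j/32) + 2^(j/32) * p).
    try:
        return math.ldexp(TABLE[j] + TABLE[j] * p, m)
    except OverflowError:
        return math.inf

for x in (-10.0, 0.0, 0.5, 1.0, 10.0):
    print(x, exp_table_driven(x), math.exp(x))
```

Splitting the reduction constant into L1 + L2 and the table entries into Slead + Strail, as the slides describe, is what keeps the reduction accurate when the working precision is only single; the sketch sidesteps this by computing in double.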