Chapter9 - davidspellman

Chapter 9
Floating Point Arithmetic
9.1 Floating Point Formats
Common Format Components
• Each codes a normalized number whose
“binary scientific notation” would be
±1.dd…d × 2^exp
• Sign bit
– 0 for positive and 1 for negative
• Exponent field
– Actual exponent exp plus a bias
– Bias gives an alternative to 2’s complement
• Fraction (“mantissa”) field
IEEE Single Precision Format
• 32-bit format
– Sign bit
– 8-bit biased exponent (the actual exponent in
the normalized binary “scientific” format plus
127)
– 23-bit fraction (the fraction in the scientific
format without the leading 1 bit)
• Generated by REAL4 directive
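To make the field layout concrete, here is a minimal Python sketch that splits a value into the sign bit, 8-bit biased exponent and 23-bit fraction described above. The helper name `decode_single` is hypothetical; it relies only on the standard `struct` module to obtain the 32-bit pattern.

```python
import struct

def decode_single(x):
    """Return (sign, biased exponent, fraction) of x as an IEEE single."""
    # Pack x as a 32-bit float, then reread the same four bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                  # 1 bit: 0 positive, 1 negative
    biased_exp = (bits >> 23) & 0xFF   # 8 bits: actual exponent plus 127
    fraction = bits & 0x7FFFFF         # 23 bits: leading 1 is not stored
    return sign, biased_exp, fraction

# -6.5 is -110.1 in binary, that is -1.101 x 2^2
sign, biased_exp, fraction = decode_single(-6.5)
print(sign, biased_exp - 127, format(fraction, "023b"))
```

For -6.5 the stored exponent field is 2 + 127 = 129 and the fraction field begins with the bits 101.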
IEEE Double Precision Format
• 64-bit format
– Sign bit
– 11-bit biased exponent (the actual exponent
in the normalized binary scientific format plus
1023)
– 52-bit fraction (the fraction in the scientific
format without the leading 1 bit)
• Generated by REAL8 directive
Double Extended Precision Format
• 80-bit format
– Sign bit
– 15-bit biased exponent (the actual exponent
in a normalized binary scientific format plus
16,383)
– 64-bit fraction (the fraction in the scientific
format including the leading 1 bit)
• Generated by REAL10 directive
Floating Point Formats
format           total bits  exponent bits  fraction bits  approximate maximum  approximate minimum  approximate decimal precision
single           32          8              23             3.40 × 10^38         1.18 × 10^-38        7 digits
double           64          11             52             1.79 × 10^308        2.23 × 10^-308       15 digits
extended double  80          15             64             1.19 × 10^4932       3.37 × 10^-4932      19 digits
• These are for normalized numbers
• Binary scientific notation mantissa written starting with 1 and
binary point
• Zero cannot be normalized
• +0 represented by a pattern of all 0 bits
• Also formats for ±∞ and NaN (“not a number”)
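The table entries for single and double precision can be checked from the field widths alone. The sketch below assumes that the all-zeros and all-ones exponent patterns are reserved (for zero, ±∞ and NaN), and it estimates decimal precision as the number of significand bits times log10(2), rounded down. The function names are hypothetical. The extended format is left out of the range calculation because its values overflow Python's own double precision floats.

```python
import math
import sys

def normalized_range(exp_bits, frac_bits):
    """Smallest and largest normalized values for a format with a hidden leading 1."""
    bias = 2 ** (exp_bits - 1) - 1
    min_exp = 1 - bias                    # exponent field 00...01
    max_exp = (2 ** exp_bits - 2) - bias  # exponent field 11...10
    smallest = 2.0 ** min_exp                             # 1.00...0 x 2^min_exp
    largest = (2 - 2.0 ** -frac_bits) * 2.0 ** max_exp    # 1.11...1 x 2^max_exp
    return smallest, largest

def decimal_digits(significand_bits):
    """Rough decimal precision carried by a binary significand."""
    return math.floor(significand_bits * math.log10(2))

print(normalized_range(8, 23))     # single precision
print(normalized_range(11, 52))    # double precision
print(decimal_digits(24), decimal_digits(53), decimal_digits(64))
```

The significand widths passed to `decimal_digits` are 24 and 53 for single and double (fraction bits plus the hidden 1) and 64 for extended double, whose leading 1 is stored explicitly.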
9.2 80x86 Floating Point Architecture
Floating Point Unit
• FPU is independent of integer unit
• Eight 80-bit registers, organized as a stack
– ST, the stack top, also called ST(0)
– ST(1), the register just below the stack top
– ST(2), the register just below ST(1)
– ST(3), ST(4), ST(5), ST(6)
– ST(7), the register at the bottom of the stack
• Several 16-bit control registers, including
status word
Load Instructions
• fld realMemoryOperand
– Loads stack top ST with floating point data
value
– Values already on the stack are pushed down
• fild integerMemoryOperand
– Converts integer value to corresponding fp
value that is pushed onto the stack
• fld st(nbr)
– Pushes a copy of st(nbr) onto the fp stack
More Loads and finit
• fld1
– Pushes 1.0 onto floating point stack
• fldz
– Pushes 0.0 onto fp stack
• fldpi
– Pushes π onto fp stack
…and others
• finit initializes the floating point
processor, clearing the stack
Store Instructions
• fst realMemoryOperand
– Copies stack top ST value to memory
• fstp realMemoryOperand
– Copies stack top ST value to memory and
pops the floating point stack
• fist integerMemoryOperand
– Copies stack top ST value to memory,
converting to integer
• fistp integerMemoryOperand
– Same as fist, but also pops the floating
point stack
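The push and pop behavior of the load and store instructions can be pictured with a toy model of the register stack. This is only a sketch: the class `FpuStack` is hypothetical, it raises a Python exception where the real FPU would signal a stack fault, and it assumes the default rounding mode (round to nearest, ties to even) for the integer stores.

```python
class FpuStack:
    """Toy model of the eight-register stack; regs[0] plays the role of ST."""

    def __init__(self):
        self.regs = []

    def fld(self, value):
        # Push: values already on the stack move down one position.
        if len(self.regs) == 8:
            raise OverflowError("all eight registers are in use")
        self.regs.insert(0, float(value))

    def fst(self):
        return self.regs[0]           # copy ST, stack unchanged

    def fstp(self):
        return self.regs.pop(0)       # copy ST, then pop

    def fist(self):
        return round(self.regs[0])    # convert to integer, stack unchanged

    def fistp(self):
        return round(self.regs.pop(0))

fpu = FpuStack()
fpu.fld(1.5)
fpu.fld(2.5)
print(fpu.regs)      # [2.5, 1.5]
print(fpu.fist())    # 2
print(fpu.fstp())    # 2.5
print(fpu.regs)      # [1.5]
```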
Exchange Instructions
• fxch
– Exchange values in ST and ST(1)
• fxch st(nbr)
– Exchange ST and ST(nbr)
Addition Instructions
• fadd
– pops ST and ST(1); adds these values; pushes sum onto the stack
• fadd st, st(nbr)
– adds ST(nbr) and ST; sum replaces ST
• fadd st(nbr), st
– adds ST(nbr) and ST; sum replaces ST(nbr)
• faddp st(nbr), st
– adds ST(nbr) and ST; sum replaces ST(nbr);
old ST popped from stack
More Addition Instructions
• fadd realMemoryOperand
– Adds ST and real memory operand;
sum replaces ST
• fiadd integerMemoryOperand
– Adds ST and integer memory operand;
sum replaces ST
Subtraction Instructions
• fsub
– pops ST and ST(1); calculates ST(1) - ST;
pushes difference onto the stack
• fsub st(nbr), st
– calculates ST(nbr) - ST;
replaces ST(nbr) by the difference
• fsub st, st(nbr)
– calculates ST - ST(nbr);
replaces ST by the difference
More Subtraction Instructions
• fsub realMemoryOperand
– calculates ST - real number from memory;
replaces ST by the difference
• fisub integerMemoryOperand
– calculates ST - integer from memory;
replaces ST by the difference
• fsubp st(nbr), st
– calculates ST(nbr) - ST;
replaces ST(nbr) by the difference;
pops ST from the stack
Reversed Subtraction Instructions
• fsubr
– pops ST and ST(1); calculates ST - ST(1);
pushes difference onto the stack
• fsubr st(nbr), st
– calculates ST - ST(nbr);
replaces ST(nbr) by the difference
• fsubr st, st(nbr)
– calculates ST(nbr) - ST;
replaces ST by the difference
More Reversed Subtraction Instructions
• fsubr realMemoryOperand
– calculates real number from memory - ST;
replaces ST by the difference
• fisubr integerMemoryOperand
– calculates integer from memory - ST;
replaces ST by the difference
• fsubrp st(nbr), st
– calculates ST - ST(nbr);
replaces ST(nbr) by the difference;
pops ST from the stack
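Because the only difference between fsub and fsubr is which operand is subtracted from which, a side-by-side model may help. In this sketch a Python list stands in for the register stack, with index 0 as ST; the function names are hypothetical and simply echo the operand order of each instruction form.

```python
def fsub_st_nbr(st, nbr):      # fsub st, st(nbr)
    st[0] = st[0] - st[nbr]

def fsub_nbr_st(st, nbr):      # fsub st(nbr), st
    st[nbr] = st[nbr] - st[0]

def fsubr_st_nbr(st, nbr):     # fsubr st, st(nbr)
    st[0] = st[nbr] - st[0]

def fsubr_nbr_st(st, nbr):     # fsubr st(nbr), st
    st[nbr] = st[0] - st[nbr]

def fsubp(st, nbr):            # fsubp st(nbr), st
    st[nbr] = st[nbr] - st[0]
    st.pop(0)                  # old ST is popped

def fsubrp(st, nbr):           # fsubrp st(nbr), st
    st[nbr] = st[0] - st[nbr]
    st.pop(0)

a = [10.0, 3.0]
fsub_st_nbr(a, 1)
b = [10.0, 3.0]
fsubr_st_nbr(b, 1)
print(a, b)    # [7.0, 3.0] [-7.0, 3.0]
```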
Multiplication Instructions
• fmul
– pops ST and ST(1);
multiplies these values;
pushes product onto the stack
• fmul st(nbr), st
– multiplies ST(nbr) and ST;
replaces ST(nbr) by the product
• fmul st, st(nbr)
– multiplies ST and ST(nbr);
replaces ST by the product
More Multiplication Instructions
• fmul realMemoryOperand
– multiplies ST and real number from memory;
replaces ST by the product
• fimul integerMemoryOperand
– multiplies ST and integer from memory;
replaces ST by the product
• fmulp st(nbr), st
– multiplies ST(nbr) and ST;
replaces ST(nbr) by the product;
pops ST from stack
Division Instructions
• fdiv
– pops ST and ST(1);
calculates ST(1) / ST;
pushes quotient onto the stack
• fdiv st(nbr), st
– calculates ST(nbr) / ST;
replaces ST(nbr) by the quotient
• fdiv st,st(nbr)
– calculates ST / ST(nbr);
replaces ST by the quotient
More Division Instructions
• fdiv realMemoryOperand
– calculates ST / real number from memory;
replaces ST by the quotient
• fidiv integerMemoryOperand
– calculates ST / integer from memory;
replaces ST by the quotient
• fdivp st(nbr),st
– calculates ST(nbr) / ST;
replaces ST(nbr) by the quotient;
pops ST from the stack
Reversed Division Instructions
• Similar to reversed subtraction: each
division instruction has a version that
reverses operands used as dividend and
divisor
• fdivr
• fdivr st(nbr), st
• fdivr st, st(nbr)
• fdivr realMemoryOperand
• fidivr integerMemoryOperand
• fdivrp st(nbr), st
Miscellaneous Instructions
• fabs
– Absolute value: ST := | ST |
• fchs
– Change sign: ST := - ST
• frndint
– Rounds ST to an integer value
• fsqrt
– Replace ST by its square root
• There are also trigonometric, exponential and
logarithmic functions
Comparisons
• Each instruction compares ST with some
other operand
• Sets “condition code” bits 14, 10 and 8 in
the status word register
– These bits are named C3, C2 and C0
result of comparison  C3  C2  C0
ST > operand          0   0   0
ST < operand          0   0   1
ST = operand          1   0   0
not comparable        1   1   1
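The condition code table translates directly into a small function. The sketch below uses hypothetical names, treats a NaN operand as the "not comparable" case, and places the three codes at bits 14, 10 and 8 as stated above; all other status word bits are left at zero for simplicity.

```python
import math

def fcom_codes(st, operand):
    """Return (C3, C2, C0) for a comparison of ST with operand."""
    if math.isnan(st) or math.isnan(operand):
        return (1, 1, 1)       # not comparable
    if st > operand:
        return (0, 0, 0)
    if st < operand:
        return (0, 0, 1)
    return (1, 0, 0)           # equal

def status_word_bits(c3, c2, c0):
    """Place C3, C2 and C0 at bits 14, 10 and 8 of a 16-bit word."""
    return (c3 << 14) | (c2 << 10) | (c0 << 8)

for st, op in [(2.0, 1.0), (1.0, 2.0), (1.0, 1.0), (float("nan"), 1.0)]:
    codes = fcom_codes(st, op)
    print(st, op, codes, format(status_word_bits(*codes), "016b"))
```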
Comparison Instructions
• fcom
– compares ST and ST(1)
• fcom st(nbr)
– compares ST and ST(nbr)
• fcom realMemoryOperand
– compares ST and real number in memory
• ficom integerMemoryOperand
– compares ST and integer in memory
More Comparison Instructions
• ftst
– compares ST and 0.0
• fcomp
– compares ST and ST(1); then pops stack
• fcompp
– compares ST and ST(1); then pops stack
twice
Yet More Comparison Instructions
• fcomp st(nbr)
– compares ST and ST(nbr); then pops stack
• fcomp realMemoryOperand
– compares ST and real number in memory;
then pops stack
• ficomp integerMemoryOperand
– compares ST and integer in memory; then
pops stack
Status Word Access
• Conditional jump instructions look at bits in
flags register, not in status word. The
fstsw instructions provide access to the
status word bits.
• fstsw memoryWord
– copies status register to memory word
• fstsw AX
– copies status register to AX
• Similar instructions available for control
word
Comparison in 32-bit Mode
fcom          ; ST > ST(1)?
fstsw ax      ; copy condition code bits to AX
sahf          ; shift condition bits to flags
jna endGT     ; skip if not
• sahf copies AH into the low order eight
bits of the EFLAGS register
– Puts C3 in the ZF position (bit 6) and C0 in
the CF position (bit 0)
– Makes it possible to use conditional jump
instructions (unsigned mnemonics)
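To see why the unsigned mnemonics work, it helps to trace where the bits end up. In this sketch (hypothetical function names) the status word is the value that fstsw leaves in AX, so AH is its high byte; bit 0 of AH lands in CF and bit 6 lands in ZF. The jna condition is "CF = 1 or ZF = 1", the same test used after an unsigned integer compare.

```python
def sahf_flags(status_word):
    """Return (CF, ZF) as set by sahf after fstsw ax."""
    ah = (status_word >> 8) & 0xFF   # high byte of AX
    cf = ah & 1                      # AH bit 0 = status bit 8  = C0
    zf = (ah >> 6) & 1               # AH bit 6 = status bit 14 = C3
    return cf, zf

def jna_taken(cf, zf):
    """jna (jump if not above) is taken when CF = 1 or ZF = 1."""
    return cf == 1 or zf == 1

# Status words holding only the condition codes for each comparison result
for label, sw in [("ST > ST(1)", 0x0000), ("ST < ST(1)", 0x0100), ("ST = ST(1)", 0x4000)]:
    cf, zf = sahf_flags(sw)
    print(label, cf, zf, jna_taken(cf, zf))
```

Only the ST > ST(1) case leaves both flags clear, so that is the only case in which the jump to endGT is not taken.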
Comparison in 64-bit Mode
• sahf not always available in 64-bit mode (support depends on the processor)
• Two instructions directly set flags in the
flags register
• fcomi st, st(nbr)
– compares ST and ST(nbr)
• fcomip st, st(nbr)
– compares ST and ST(nbr); pops stack
9.3 Converting Floating Point
To and From ASCII
ASCII to Floating Point
• Algorithm similar to ASCII to integer:
value := 0.0;
point at first character of source string;
while (source character is a digit) loop
    convert ASCII digit to 2's complement digit;
    value := 10*value + float(digit);
    point at next character of source string;
end while;
• Main difference is that you must divide the
final value by 10^dig, where dig is the
number of digits after a decimal point
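The algorithm above can be sketched in Python as follows. The function name `ascii_to_float` is hypothetical, and two details are assumptions that go beyond the pseudocode: an optional leading minus sign is accepted, and scanning stops at the first character that is neither a digit nor the first decimal point.

```python
def ascii_to_float(source):
    value = 0.0
    digits_after_point = 0      # "dig" in the description above
    seen_point = False
    i = 0
    negative = source.startswith("-")   # assumption: optional leading minus
    if negative:
        i += 1
    while i < len(source):
        ch = source[i]
        if ch == "." and not seen_point:
            seen_point = True
        elif "0" <= ch <= "9":
            digit = ord(ch) - ord("0")          # ASCII digit to integer
            value = 10 * value + float(digit)
            if seen_point:
                digits_after_point += 1
        else:
            break                               # stop at any other character
        i += 1
    value = value / 10.0 ** digits_after_point  # scale by 10^dig at the end
    return -value if negative else value

print(ascii_to_float("78.375"))
print(ascii_to_float("-0.5"))
```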
Floating Point to ASCII (1)
• Algorithm generates E-notation:
– a leading minus sign or a blank
– a digit
– a decimal point
– five digits
– the letter E
– a plus sign or a minus sign
– two digits
• These pieces generated one at a time
Floating Point to ASCII (2)
• Make leading character a minus sign or a blank:
point at first destination byte;
if value ≥ 0
then
    put blank in destination string;
else
    put minus in destination string;
    value := -value;
end if;
point at next destination byte;
Floating Point to ASCII (3)
• “Normalize” fp value to have single digit before decimal point:
exponent := 0;
if value ≥ 10
then
    repeat
        divide value by 10;
        add 1 to exponent;
    until value < 10;
else
    while value < 1 loop
        multiply value by 10;
        subtract 1 from exponent;
    end while;
end if;
Floating Point to ASCII (4)
• Continue to normalize floating point value; get first digit and decimal point:
add 0.000005 to value; { for rounding }
if value ≥ 10
then
    divide value by 10;
    add 1 to exponent;
end if;
digit := int(value); { truncate to integer }
convert digit to ASCII and store in destination string;
point at next destination byte;
store "." in destination string;
point at next destination byte;
Floating Point to ASCII (5)
• Generate five digits after the decimal point:
for i := 1 to 5 loop
    value := 10 * (value - float(digit));
    digit := int(value);
    convert digit to ASCII and store in destination string;
    point at next destination byte;
end for;
Floating Point to ASCII (6)
• Generate exponent:
store E in destination string;
point at next destination byte;
if exponent ≥ 0
then
    put + in destination string;
else
    put - in destination string;
    exponent := -exponent;
end if;
point at next destination byte;
convert exponent to two decimal digits;
convert two decimal digits of exponent to ASCII;
store characters of exponent in destination string;
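Putting the six slides together, the whole conversion might look like the Python sketch below. The function name `float_to_ascii` is hypothetical. One guard is an addition not present in the pseudocode: a value of zero skips the normalization loop, which would otherwise never terminate. The sketch also assumes the exponent fits in two decimal digits, which holds for single precision values but not for every Python float.

```python
def float_to_ascii(value):
    dest = []
    # Leading minus sign or blank
    if value >= 0:
        dest.append(" ")
    else:
        dest.append("-")
        value = -value
    # Normalize to a single digit before the decimal point
    exponent = 0
    if value >= 10:
        while True:
            value /= 10
            exponent += 1
            if value < 10:
                break
    elif value != 0:                 # added guard: zero cannot be normalized
        while value < 1:
            value *= 10
            exponent -= 1
    value += 0.000005                # for rounding to five decimal places
    if value >= 10:
        value /= 10
        exponent += 1
    # First digit and decimal point
    digit = int(value)               # truncate to integer
    dest.append(chr(ord("0") + digit))
    dest.append(".")
    # Five digits after the decimal point
    for _ in range(5):
        value = 10 * (value - float(digit))
        digit = int(value)
        dest.append(chr(ord("0") + digit))
    # Exponent: E, sign, two digits
    dest.append("E")
    if exponent >= 0:
        dest.append("+")
    else:
        dest.append("-")
        exponent = -exponent
    dest.append(chr(ord("0") + exponent // 10))
    dest.append(chr(ord("0") + exponent % 10))
    return "".join(dest)

print(float_to_ascii(78.375))
print(float_to_ascii(-0.00125))
```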
9.4 Single-Instruction
Multiple-Data Instructions
SIMD Instructions
• Single-instruction multiple-data (SIMD)
instructions operate on several operands
at once with a single instruction
• The Intel family has had some form of
SIMD instructions since the late 1990s
– MMX technology, introduced with the Pentium MMX and included in the Pentium II
– Several generations of streaming SIMD
extensions (SSE)
– All current 80x86 CPUs include these features
SSE
• First appeared in the Pentium III processor
• Eight new 128-bit registers, XMM0 through
XMM7
– 64-bit architecture added eight more XMM
registers, XMM8 through XMM15
• A single 128-bit register can hold four 32-bit floating point numbers
SSE Instructions
• Packed SSE instructions operate on four
pairs of floating point numbers
simultaneously
• Scalar SSE instructions operate only on
the low-order operands, ignoring the other
three
Selected Scalar SSE Instructions
mnemonic  operand 1 (dest)  operand 2 (source)  action
movss     xmmreg or mem32   xmmreg or mem32     destination := source (at least one operand must be a register)
addss     xmmreg            xmmreg or mem32     destination := destination + source
subss     xmmreg            xmmreg or mem32     destination := destination - source
mulss     xmmreg            xmmreg or mem32     destination := destination * source
divss     xmmreg            xmmreg or mem32     destination := destination / source
sqrtss    xmmreg            xmmreg or mem32     destination := sqrt(source)
rcpss     xmmreg            xmmreg or mem32     destination := 1/source
comiss    xmmreg            xmmreg or mem32     compare operand1 and operand2; set flags
Selected Packed SSE Instructions
mnemonic  operand 1 (dest)   operand 2 (source)  action
movups    xmmreg or mem128   xmmreg or mem128    destination := source (at least one operand must be a register)
addps     xmmreg             xmmreg or mem128    destination := destination + source (four additions)
subps     xmmreg             xmmreg or mem128    destination := destination - source (four subtractions)
mulps     xmmreg             xmmreg or mem128    destination := destination * source (four multiplications)
divps     xmmreg             xmmreg or mem128    destination := destination / source (four divisions)
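The difference between the packed and scalar forms can be illustrated with a four-element list standing in for an XMM register. This is only a model: the function names mirror the mnemonics but are ordinary Python functions, index 0 is assumed to be the low-order element, and Python floats are double precision rather than the 32-bit values a real register holds.

```python
def addps(dest, src):
    """Packed add: four additions, one per element pair."""
    return [d + s for d, s in zip(dest, src)]

def addss(dest, src):
    """Scalar add: only the low-order elements are added;
    the other three destination elements are left unchanged."""
    return [dest[0] + src[0]] + dest[1:]

xmm0 = [1.0, 2.0, 3.0, 4.0]
xmm1 = [10.0, 20.0, 30.0, 40.0]
print(addps(xmm0, xmm1))   # [11.0, 22.0, 33.0, 44.0]
print(addss(xmm0, xmm1))   # [11.0, 2.0, 3.0, 4.0]
```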
Using Scalar SSE Instructions
• Similar to programming integer operations
with general registers in the 32-bit or 64-bit
mode
• comiss comparison instruction sets flags
exactly as fcomi does for the floating
point unit
– “Unsigned” conditional jump instructions are
appropriate following comiss or fcomi
9.5 Floating Point Assembly
Language Procedures With C/C++
Why Use Assembly Language
Procedures?
• May be possible or easier or more efficient
to code parts of a program in assembly
language than in a high-level language
– Parts that need critical optimization
– Implementation of low-level algorithms
• The bulk of programming is usually better
done in a high level language
32-bit Linkages
• Decorate assembly language procedure
name with an underscore
– If C program calls roots, name the procedure
_roots
• To use cdecl protocol in a C++ program,
use the “C” declaration, for example,
extern "C" void roots(…);
• Push parameters on stack
• Return single float value in ST
64-bit Linkages
• No text decoration
• Pass floating point parameters in XMM0,
XMM1, XMM2 and XMM3
– Integer parameters in RCX, RDX, R8 and R9
• Return single float value in XMM0