Real Numbers

Floating Point Arithmetic
• The goal of floating point representation is to represent a large range of numbers.
• Important Terms
Given the number -123.154 x 10^5:
Sign = negative
Mantissa = 123.154
Exponent = 5
IEEE Binary Floating-Point Representation
Storage of Floating Point Binary Numbers
(Short Real or Single Precision Format, 32 bits)
Bit 31: Sign (1 bit)
Bits 30-23: Exponent (8 bits)
Bits 22-0: Mantissa (23 bits)
Long Real (double precision, 64 bits): 1 bit for sign, 11 bits for exponent, 52 bits for mantissa
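As a minimal sketch, the three single-precision fields can be pulled out of a C `float` with shifts and masks. This assumes `float` is the 32-bit IEEE format (the `static_assert` checks the size); `FloatFields`, `float_bits` and `split_float` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is IEEE 754 single precision (a short real). */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical container for the three stored fields. */
typedef struct {
    unsigned sign;      /* bit 31: 0 = positive, 1 = negative */
    unsigned exponent;  /* bits 30-23: biased exponent field */
    uint32_t fraction;  /* bits 22-0: mantissa without the leading 1 */
} FloatFields;

/* Copies the bit pattern of x into an integer; memcpy avoids aliasing issues. */
uint32_t float_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

FloatFields split_float(float x)
{
    uint32_t bits = float_bits(x);
    FloatFields f;
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}
```

For example, `split_float(-5.0f)` gives sign 1, exponent field 129 (81h) and fraction 200000h.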
Storage Components
• The Sign
– The sign is positive(a 0 bit) or negative (a 1 bit)
• The Mantissa (Significand)
– The bits to the right of the binary point are the mantissa or significand.
– The numeral to the left of the decimal point is ALWAYS 1 (normalized
notation).
• The Exponent
– The exponent can be either positive or negative. The exponent is
biased by +127.
The Significand
(Positional Notation)
The Significand Must be Normalized
• 1234.567 = 1.234567 x 10^3
• Numbers are normalized by moving the decimal point so that only one nonzero digit appears to the left of the decimal point.
• In binary, that digit is always 1:
  1101.101 = 1.101101 x 2^3 (exponent = 3)
  0.00101 = 1.01 x 2^{-3} (exponent = -3)
• Note that the leading 1 is omitted from storage.
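The normalization rule can be sketched in C as repeated halving or doubling until exactly one 1 sits left of the binary point. `normalize` is a hypothetical helper, and it assumes a positive, finite input.

```c
#include <assert.h>
#include <float.h>

/* Normalizes a positive, finite x into m x 2^e with 1.0 <= m < 2.0.
   Halving shifts the bits right (exponent + 1); doubling shifts them left
   (exponent - 1). Both steps are exact in binary floating point. */
double normalize(double x, int *exponent)
{
    assert(x > 0.0 && x <= DBL_MAX);
    int e = 0;
    while (x >= 2.0) { x /= 2.0; e++; }
    while (x < 1.0)  { x *= 2.0; e--; }
    *exponent = e;
    return x;
}
```

Since 13.625 is 1101.101b, `normalize(13.625, &e)` returns 1.703125 (1.101101b) with `e` = 3. Likewise 0.15625 (0.00101b) gives 1.25 (1.01b) with `e` = -3.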
IEEE Bit Representation
The Exponent is Biased by +127
Exponent Encoding
• Exponent encoding is bias 127. To get the encoding, take the exponent and add 127 to it.
• If exponent is -1, then exponent field = -1 + 127 = 126 = 7Eh.
• If exponent is 10, then exponent field = 10 + 127 = 137 = 89h.
• Smallest allowed exponent is -126, largest allowed exponent is +127. This leaves the encodings 00h and FFh unused for normal numbers.
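A minimal C sketch of the bias-127 rule, with a check against the bits a real `float` stores. `encode_exponent` and `exponent_field` are hypothetical names, and `float` is assumed to be IEEE single precision.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SINGLE_BIAS 127   /* exponent bias for short reals */

/* Exponent field for a true exponent in the normal range -126..+127. */
unsigned encode_exponent(int exponent)
{
    assert(exponent >= -126 && exponent <= 127);   /* fields 00h and FFh are reserved */
    return (unsigned)(exponent + SINGLE_BIAS);
}

/* Reads the biased exponent field (bits 30-23) out of a stored float. */
unsigned exponent_field(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (bits >> 23) & 0xFFu;
}
```

0.5 is 1.0 x 2^{-1} and 1024.0 is 1.0 x 2^{10}. So `exponent_field(0.5f)` is 7Eh and `exponent_field(1024.0f)` is 89h, the same as `encode_exponent(-1)` and `encode_exponent(10)`.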
BR 6/00
Floating Point Encoding
• The number of bits allocated for the exponent determines the maximum and minimum floating point numbers (the range):
  1.0 x 2^{-max} (small number) to 1.0 x 2^{+max} (large number)
• The number of bits allocated for the significand determines the precision of the floating point number.
• The sign needs only one bit (negative: 1, positive: 0).
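The split between range and precision shows up clearly when single-precision values are built directly from bit patterns. This is a sketch that assumes IEEE `float`; `float_from_bits` and the three small helpers are hypothetical. The results equal the standard `<float.h>` constants `FLT_MIN`, `FLT_MAX` and `FLT_EPSILON`.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Builds a float from a 32-bit pattern: sign | exponent field | fraction. */
float float_from_bits(uint32_t bits)
{
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Range is set by the exponent field. */
float smallest_normal(void) { return float_from_bits(0x00800000u); } /* field 1:   1.0 x 2^-126 */
float largest_finite(void)  { return float_from_bits(0x7F7FFFFFu); } /* field 254: 1.11...1b x 2^127 */

/* Precision is set by the 23 fraction bits: the gap just above 1.0 is 2^-23. */
float step_above_one(void)  { return float_from_bits(0x3F800001u) - 1.0f; }
```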
Convert Floating Point Binary Format to Decimal
1 10000001 01000000000000000000000
• What is the number shown?
• Sign bit = 1, so negative.
• Exponent field = 81h = 129. Actual exponent = exponent field - 127 = 129 - 127 = 2.
• Number is:
  -1.(01000...000) x 2^2
  = -(1 + 0 x 2^{-1} + 1 x 2^{-2} + 0 x 2^{-3} + ... + 0) x 4
  = -(1 + 0 + 0.25 + 0 + ... + 0) x 4
  = -1.25 x 4
  = -5.0
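The decoding steps above amount to value = (-1)^sign x (1 + fraction / 2^{23}) x 2^{(exponent field - 127)}. Here is a sketch of that formula as a hypothetical C helper `decode_fields`. It handles only normal numbers (exponent field 1 to 254).

```c
#include <assert.h>
#include <stdint.h>

/* Decodes normalized single-precision fields by hand:
   value = (-1)^sign x (1 + fraction / 2^23) x 2^(exponent - 127). */
double decode_fields(unsigned sign, unsigned exponent, uint32_t fraction)
{
    assert(sign <= 1 && exponent >= 1 && exponent <= 254 && fraction <= 0x7FFFFFu);
    double value = 1.0 + (double)fraction / 8388608.0;  /* 8388608 = 2^23; adds the hidden 1 */
    int e = (int)exponent - 127;                         /* remove the bias */
    for (; e > 0; e--) value *= 2.0;                     /* scale by 2^e exactly, */
    for (; e < 0; e++) value /= 2.0;                     /* one factor of 2 at a time */
    return sign ? -value : value;
}
```

With the fields from this slide, `decode_fields(1, 0x81, 0x200000)` gives -5.0.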
Convert FP Decimal to binary encoding
What is the number -28.75 in Single Precision Floating Point?
1. Ignore the sign, and convert the integer and fractional parts to binary first:
   a. 28 = 1Ch = 0001 1100
   b. .75 = .5 + .25 = 2^{-1} + 2^{-2} = .11b
   -28.75 in binary is -00011100.11 (ignore leading zeros)
2. Now NORMALIZE the number to the format 1.mmmm x 2^exp.
   Normalize by shifting. Each shift right adds one to the exponent; each shift left subtracts one from the exponent:
   -11100.11 x 2^0 = -1110.011 x 2^1
                   = -111.0011 x 2^2
                   = -11.10011 x 2^3
                   = -1.110011 x 2^4
Convert Decimal FP to binary encoding (cont)
Normalized number is: -1.110011 x 2^4
Sign bit = 1
Significand field = 110011000...000
Exponent field = 4 + 127 = 131 = 83h = 1000 0011
Complete 32-bit number (C1E60000h) is:
Sign  Exponent  Mantissa
1     10000011  110011000...000
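To check the hand encoding, a short C sketch packs the three fields and compares them with the bits actually stored for -28.75f. It assumes IEEE `float`; `pack_fields` and `float_bits` are hypothetical helpers. The significand field (110011 followed by 17 zeros) is 660000h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assembles a single-precision bit pattern from its three fields. */
uint32_t pack_fields(unsigned sign, unsigned exponent, uint32_t fraction)
{
    assert(sign <= 1 && exponent <= 0xFFu && fraction <= 0x7FFFFFu);
    return ((uint32_t)sign << 31) | ((uint32_t)exponent << 23) | fraction;
}

/* The bits actually stored for x. */
uint32_t float_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

Both `pack_fields(1, 131, 0x660000)` and `float_bits(-28.75f)` come out as C1E60000h.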
Algorithm for converting fractional decimal to Binary
• An algorithm for converting any fractional decimal number to its binary representation is successive multiplication by two (which shifts the fraction left one bit at a time).
• It determines the bits from MSB to LSB:
  a. Multiply the fraction by 2.
  b. If the result is >= 1.0, then the current bit = 1; else the current bit = 0.
  c. Take the fractional part of the result and go to 'a'. Continue until the fraction is 0 or the desired precision is reached.
• Example: Convert .5625 to binary
  .5625 x 2 = 1.125   (>= 1.0, so MSB = '1')
  .125 x 2 = .25      (< 1.0, so bit = '0')
  .25 x 2 = .5        (< 1.0, so bit = '0')
  .5 x 2 = 1.0        (>= 1.0, so bit = '1'); the fraction is now 0, finished.
  .5625 = .1001b
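Steps a to c carry over directly to C. `fraction_to_binary` is a hypothetical helper that writes the bits as characters, and `max_bits` plays the role of the "desired precision".

```c
#include <assert.h>
#include <stddef.h>

/* Converts a fraction 0 <= frac < 1 to binary digits by successive
   multiplication by two. Writes at most max_bits digits plus '\0' into out
   (which must hold max_bits + 1 chars) and stops early once the fraction is 0. */
void fraction_to_binary(double frac, char *out, size_t max_bits)
{
    assert(frac >= 0.0 && frac < 1.0);
    size_t i = 0;
    while (frac != 0.0 && i < max_bits) {
        frac *= 2.0;                 /* step a: multiply by 2 */
        if (frac >= 1.0) {           /* step b: bit = 1 when the result is >= 1.0 */
            out[i++] = '1';
            frac -= 1.0;             /* step c: keep only the fractional part */
        } else {
            out[i++] = '0';
        }
    }
    out[i] = '\0';
}
```

For .5625 this produces "1001". A fraction such as .1 never terminates in binary, so the `max_bits` limit stops the loop: `fraction_to_binary(0.1, buf, 8)` yields "00011001".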
Overflow/Underflow, Double Precision
• Overflow in floating point means producing a number that is too big; underflow means producing a number that is too small (too close to zero).
– Depends on exponent size
– Min/Max exponents are 2^{-126} to 2^{+127}, which is about 10^{-38} to 10^{+38}.
• To increase the range, you need to increase the number of bits in the exponent field.
• Double precision numbers are 64 bits: 1 sign bit, 11 exponent bits, 52 bits for the significand.
• Extra bits in the significand give more precision, not extended range.
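Here is a small C illustration, assuming IEEE `float` and `double` with default rounding. Squaring 1e30 overflows single precision to infinity and squaring 1e-30 underflows it to zero, but double precision's 11-bit exponent holds both results. `float_product` and `double_product` are hypothetical helpers.

```c
#include <assert.h>
#include <math.h>

/* Single-precision multiply: results beyond about 3.4E+38 overflow to infinity,
   results far below about 1.2E-38 underflow toward zero. */
float float_product(float a, float b)
{
    return a * b;
}

/* Double-precision multiply: the 11-bit exponent reaches about 1.8E+308. */
double double_product(double a, double b)
{
    return a * b;
}
```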
Special Numbers
• Min/Max exponents are 2^{-126} to 2^{+127}. This corresponds to exponent field values of 1 to 254.
• The exponent field values 0 and 255 are reserved for special numbers. Special numbers are zero, +/- infinity, and NaN (not a number).
• Zero is represented by ALL FIELDS = 0.
• +/- Infinity is exponent field = 255 = FFh, significand = 0. +/- Infinity is produced by dividing any nonzero number by 0.
• NaN (Not A Number) is exponent field = 255 = FFh, significand = nonzero. NaN is produced by invalid operations like zero divided by zero, or infinity - infinity.
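The special encodings can be checked from C under IEEE 754 arithmetic (C Annex F), where division by zero is defined. This is a sketch: `float_bits`, `is_infinity_bits`, `is_nan_bits` and `divide` are hypothetical helpers. It tests the fields rather than one exact NaN pattern, because hardware is free to pick any nonzero significand (x86 typically returns a NaN with the sign bit set).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies the bit pattern of an IEEE single-precision value into an integer. */
uint32_t float_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Exponent field 255 (FFh) and significand 0: +/- infinity. */
int is_infinity_bits(uint32_t bits)
{
    return ((bits >> 23) & 0xFFu) == 0xFFu && (bits & 0x7FFFFFu) == 0;
}

/* Exponent field 255 (FFh) and nonzero significand: NaN. */
int is_nan_bits(uint32_t bits)
{
    return ((bits >> 23) & 0xFFu) == 0xFFu && (bits & 0x7FFFFFu) != 0;
}

/* Plain IEEE division; a zero divisor gives infinity or NaN. */
float divide(float a, float b)
{
    return a / b;
}
```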
Comments on IEEE Format
• The sign bit is placed in the MSB for a reason: a quick test can be used to sort floating point numbers by sign; just test the MSB.
• If the sign bits are the same, then extracting and comparing the exponent fields can be used to sort floating point numbers. A larger exponent field means a larger magnitude, since the biased encoding stores the exponent as an unsigned number.
• All microprocessors that support Floating point
use the IEEE 754 standard. Only a few
supercomputers still use different formats.
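Taking this observation one step further gives a well-known trick, which the slide itself does not state. Once the sign is handled, the whole 32-bit pattern can be compared as an unsigned integer, because the exponent field sits above the significand. Here is a minimal C sketch, assuming IEEE `float`; `float_sort_key` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Maps a float to an unsigned key whose integer order matches numeric order
   (-0 sorts just below +0). */
uint32_t float_sort_key(float x)
{
    assert(x == x);                      /* NaN is unordered, so it is rejected */
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    if (bits & 0x80000000u)              /* sign bit set: negative number */
        return ~bits;                    /* larger magnitude -> smaller key */
    return bits | 0x80000000u;           /* positive: place above every negative */
}
```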
Assigning Storage for Large Numbers
• Dd (define doubleword): 4-byte storage; a real number stored as a doubleword is called a short real.
– Dd 12345.678
– Dd +1.5E+02
– Dd 2.56E+38      ;largest positive exponent
– Dd 3.3455E-39    ;largest negative exponent
• Dq (define quadword): 8-byte storage; a long real number (double in C, C++ and Visual)
– Dq 2.56E+307     ;largest exponent
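In C, a Dd-style short real corresponds to `float` and a Dq-style long real to `double`, assuming IEEE formats. A minimal sketch that rounds values through `float` shows what actually fits in a short real. 12345.678 is stored as 12345.677734375, because 23 fraction bits leave steps of 2^{-10} at that magnitude. 3.3455E-39 lies below the smallest normalized short real (about 1.18E-38), so it can only be kept in the reduced-precision form that uses exponent field 0. `as_short_real` is a hypothetical helper.

```c
#include <assert.h>
#include <float.h>

/* Rounds x the way storing it in a 4-byte short real (C float) does,
   then widens it back to double so the stored value can be inspected. */
double as_short_real(double x)
{
    /* Converting an out-of-range double to float is not portable C. */
    assert(x >= -FLT_MAX && x <= FLT_MAX);
    return (double)(float)x;
}
```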
JM 11/02
Floating Point Architecture
(8087 Coprocessor)
• So far we have only dealt with integers
• The 8087 was the math coprocessor for
the original PC.
• With the 486, the FPU (floating point unit)
became part of the CPU chip.
• We will only look at the instruction set of
the original 8087 chip.
• Handles both integer and floating point
calculations.
Floating Point Registers
80-bit data registers: ST(0) = ST, ST(1), ST(2), ST(3), ST(4), ST(5), ST(6), ST(7)
16-bit registers: Control Word, Status Word, Tag Word
32-bit registers: Instruction Pointer, Operand Pointer
Floating Point Unit (Coprocessor)
Data Registers
• 8 individually addressable 80-bit registers
– (ST(0), ST(1), ST(2)…ST(7))
– Arranged in stack format
• ST(0) = ST -> top of stack
• Control Registers
– 3 16-bit registers (control, status, tag)
– 2 32-bit registers (instruction pointer, operand
pointer)
Transfer of Data
• Data must be in memory (not in CPU registers) to be sent to the coprocessor.
• The coprocessor loads the number from
memory into its register stack, performs an
arithmetic operation, stores the result in
memory, and signals the CPU that it has
finished.
Instruction Formats
• Begins with the letter F (to distinguish from CPU instructions)
• 2nd letter
– B: binary coded decimal operand
– I: binary integer operand
– neither: assume real number format
– FBLD: load BCD number
– FILD: load integer number
– FMUL: real number multiply
• Cannot use CPU registers (such as AX, BX) as operands
Floating Point Operations
• Add: add source to destination
• Sub: subtract source from destination
• Subr: subtract destination from source
• Mul: multiply source by destination
• Div: divide destination by source
• Divr: divide source by destination
Basic Arithmetic Instructions
Instruction Form            | Mnemonic Form | Operands (Dest, Source) | Example
Classical Stack             | Fop           | {ST(1), ST}             | FADD
Classical Stack, Extra Pop  | FopP          | {ST(1), ST}             | FSUBP
Register                    | Fop           | ST(n), ST               | FMUL ST(1), ST
                            |               | ST, ST(n)               | FDIV ST, ST(3)
Register, pop               | FopP          | ST(n), ST               | FADDP ST(2), ST
Real Memory                 | Fop           | {ST}, memReal           | FDIVR
Integer Memory              | FIop          | {ST}, memInt            | FISUBR hours
(Operands in braces are implied and are not written in the instruction.)
Instruction Forms
• Classical stack
– No explicit operands needed
– (ST is the source; ST(1) is the destination)
– FADD    ; ST(1) = ST(1) + ST
          ; pop ST
  Before: ST = 100.0, ST(1) = 20.0
  After:  ST = 120.0
– FSUB    ; ST(1) = ST(1) - ST; pop ST
Instruction Forms
• Register
– Uses coprocessor registers as ordinary operands (one must be ST)
  FADD  st, st(1)     ;st = st + st(1)
  FDIVR st, st(3)     ;st = st(3) / st
  FMUL  st(2), st     ;st(2) = st(2) * st
Instruction Forms
• Register Pop
– Identical to register except ST is popped at the end
– FADDP st(1), st   ; ST(1) = ST(1) + ST
                    ; pop ST
                    ; the old ST(1) is now ST(0)
  Before:        ST = 200.0, ST(1) = 32.0
  Intermediate:  ST = 200.0, ST(1) = 232.0
  After:         ST = 232.0
Instruction Forms
• Real Memory and Integer Memory
– Have an implied first operand, ST
– The second operand, explicit, is an integer or real memory operand
– FADD  Myreal_op       ;st = st + myreal_op
– FIADD MyInteger_op    ;st = st + myinteger_op
Initialize Instruction
finit
• FINIT
– Initializes the floating point processor
– Should come first in code
– Clears the registers (all eight stack registers are marked empty)
Load Instructions
fld, fild
• FLD: load a real memory operand into ST(0)
• FILD: load an integer memory operand into ST(0)
.data
op1  dd  6.0        ;floating point value
op2  dw  3          ;integer value
.code
finit
fld  op1            ;ST = 6.0, ST(1) = ?? (empty)
fild op2            ;ST = 3.0, ST(1) = 6.0
Store Instructions
fst, fstp
• fst mem_location
– (Float store)
– Store the value in ST into memory
• fstp mem_location
– (Float store and pop)
– Store the value in ST(0) into memory and then pop the stack
Reverse Polish Notation
(operands are keyed in before their operators)
Evaluating a postfix expression
6 2 * 5 +
– When reading an operand from input
• push it on stack
– When reading an operator from input
• pop the two operands located at the top of
the stack
• perform the selected operation on the
operands
• push the result back on the stack.
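The evaluation rules above can be sketched in C with an array as the stack. `eval_rpn` is a hypothetical helper that supports only + - * / on space-separated tokens. The stack depth is capped at 8 to mirror the eight FPU registers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DEPTH 8   /* same depth as the eight registers ST(0)..ST(7) */

/* Evaluates a space-separated postfix expression using + - * / on doubles.
   Assumes well-formed input shorter than 256 characters. */
double eval_rpn(const char *expr)
{
    double stack[MAX_DEPTH];
    int depth = 0;                           /* number of values on the stack */
    char buf[256];

    assert(strlen(expr) < sizeof buf);
    strcpy(buf, expr);                       /* strtok modifies its input */

    for (char *tok = strtok(buf, " "); tok != NULL; tok = strtok(NULL, " ")) {
        if (tok[1] == '\0' && strchr("+-*/", tok[0]) != NULL) {
            /* Operator: pop the two operands at the top of the stack. */
            assert(depth >= 2);
            double right = stack[--depth];   /* most recently pushed operand */
            double left = stack[--depth];
            double result = 0.0;
            switch (tok[0]) {
            case '+': result = left + right; break;
            case '-': result = left - right; break;
            case '*': result = left * right; break;
            case '/': result = left / right; break;
            }
            stack[depth++] = result;         /* push the result back */
        } else {
            /* Operand: push it on the stack. */
            assert(depth < MAX_DEPTH);
            stack[depth++] = strtod(tok, NULL);
        }
    }
    assert(depth == 1);
    return stack[0];
}
```

For the slide's example, `eval_rpn("6 2 * 5 +")` returns 17.0.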
TITLE FPU Expression Evaluation      (Expr.asm)

; Implementation of the following expression:
;     (6.0 * 2.0) + (4.5 * 3.2)
; FPU instructions used.
; Last update: 10/8/01

INCLUDE Irvine32.inc            ; 32-bit Protected mode program.

.data
array      REAL4 6.0, 2.0, 4.5, 3.2
dotProduct REAL4 ?

.code
main PROC
    finit                       ; initialize FPU
    fld   array                 ; push 6.0 onto the stack
    fmul  array+4               ; ST(0) = 6.0 * 2.0
    fld   array+8               ; push 4.5 onto the stack
    fmul  array+12              ; ST(0) = 4.5 * 3.2
    fadd                        ; ST(0) = ST(0) + ST(1)
    fstp  dotProduct            ; pop stack into memory operand
    exit
main ENDP
END main
Register Stack Example
Instruction    Register Stack
fld op1        ST = 6.0
fld op2        ST = 2.0, ST(1) = 6.0
fmul           ST = 12.0
fld op3        ST = 5.0, ST(1) = 12.0
fsub           ST = 7.0
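The same sequence can be traced with a simplified C model of the register stack. It is a hypothetical sketch, not a real API: `FpuStack` and the `fpu_*` functions are illustrative. The real FPU rotates a TOP pointer over a fixed ring of eight registers; here the last array slot simply stands for ST(0), and the arithmetic functions use the classical-stack forms.

```c
#include <assert.h>

/* Simplified register stack: reg[count - 1] is ST(0), reg[count - 2] is ST(1). */
typedef struct {
    double reg[8];
    int count;          /* how many of the 8 registers are in use */
} FpuStack;

void fpu_finit(FpuStack *s) { s->count = 0; }   /* like FINIT: empty stack */

/* FLD: push a value; the old ST becomes ST(1). */
void fpu_fld(FpuStack *s, double value)
{
    assert(s->count < 8);
    s->reg[s->count++] = value;
}

/* Reads ST(i). */
double fpu_st(const FpuStack *s, int i)
{
    assert(i >= 0 && i < s->count);
    return s->reg[s->count - 1 - i];
}

/* Classical-stack FMUL: ST(1) = ST(1) * ST, then pop. */
void fpu_fmul(FpuStack *s)
{
    assert(s->count >= 2);
    s->reg[s->count - 2] *= s->reg[s->count - 1];
    s->count--;
}

/* Classical-stack FSUB: ST(1) = ST(1) - ST, then pop. */
void fpu_fsub(FpuStack *s)
{
    assert(s->count >= 2);
    s->reg[s->count - 2] -= s->reg[s->count - 1];
    s->count--;
}
```

Loading 6.0 and 2.0, multiplying, loading 5.0 and subtracting leaves ST = 7.0 with one register in use, as in the table above.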
Other Instructions
• fmul     ;st(1) = st(1) * st(0), pop
• fdiv     ;st(1) = st(1) / st(0), pop
• fdivr    ;st(1) = st(0) / st(1), pop
• fsqrt    ;st(0) = square root(st(0))
• fsin     ;st(0) = sine(st(0))      (80387 and later)
• fcos     ;st(0) = cosine(st(0))    (80387 and later)
