Block Floating Point



DSP C5000
Chapter 13
Numerical Issues
Copyright © 2003 Texas Instruments. All rights reserved.
Learning Objectives

Data formats

Fixed point: integer and fractional numbers

Use methods for handling multiplicative
and accumulative overflow

Floating point

Block floating point

Comparison of formats
ESIEE, Slide 2
Data Formats and Numerical Issues



Common data sizes: 8, 16, 24, 32 bits
Fixed or floating point
For a given technology:
- Fixed point is faster and less expensive
- But fixed point programming is more difficult
Processors of the 'C5000 family are fixed point processors.
- But they can also execute floating point operations through software
Digital Representation of a Signal


Sampling
ADC: Analog to Digital Conversion
- Quantization
- Coding of the quantized value
- Digital representation used in DSP
Digital Coding of Data and Arithmetic

Finite precision:
- Representation uses a given number of bits
- Fixed point
- Floating point
- Block floating point
Interface ADC - DSP - DAC
[Figure: block diagram ADC -> DSP -> DAC]
Possible conversions:
- fixed point
- floating point
- A-law or mu-law (compression-expansion)
- linear law
Binary Representation of Signed Integers
used in ADC-DAC or DSP
in Fixed Point Format




2’s Complement (digital processors)
1’s Complement
Sign, magnitude
Offset Binary
Fixed Point Arithmetic
2’s Complement Representation
Example of Size 3 bits for Integers,
Decimal and Binary Representations
Positive    Positive    Signed      Offset      Signed      Sign +
integers    integers    integers    binary      integers    magnitude
(decimal)   (binary)    (decimal)               (decimal)
7           111          3          111          3          011
6           110          2          110          2          010
5           101          1          101          1          001
4           100          0          100          0          000
3           011         -1          011          0          100
2           010         -2          010         -1          101
1           001         -3          001         -2          110
0           000         -4          000         -3          111
In sign + magnitude, 000 and 100 both code zero (positive and negative zero).
Weights of the three bit positions: 2^2  2^1  2^0
Example of Size 3 bits for Integers,
Decimal and Binary Representations
Signed integers    Signed integers    Signed integers
(decimal)          1's complement     2's complement
 3                 011                011
 2                 010                010
 1                 001                001
 0                 000 or 111         000
-1                 110                111
-2                 101                110
-3                 100                101
-4                 (none)             100
For a negative x on N bits, the code y is:
- 1's complement: \( y = 2^N - 1 - |x| \)
- 2's complement: \( y = 2^N - |x| \)
Representation of Signed Integers
in 2’s Complement Format
Bits of x: \( b_{N-1} \dots b_k \dots b_0 \)
For \( x \ge 0 \): \( x = \sum_{k=0}^{N-1} b_k 2^{k} \)
For \( x < 0 \): x is coded by \( y = 2^N - |x| \), with \( y = \sum_{k=0}^{N-1} b_k 2^{k} \)
In both cases: \( x = -2^{N-1} b_{N-1} + \sum_{k=0}^{N-2} b_k 2^{k} \)
Non-Integer Numbers Using Fixed Point


Format Qk: k fractional bits, associated with negative powers of 2.
The binary representation of a number x in format Qk is the 2's complement representation of the integer y:
\[ y = 2^k x \]
Bit layout: integer part \( b_{N-1-k} \dots b_1 b_0 \), binary point, fractional part \( b_{-1} \dots b_{-k} \)
\[ x = -b_{N-1-k} 2^{N-1-k} + b_{N-2-k} 2^{N-2-k} + \dots + b_0 + b_{-1} 2^{-1} + \dots + b_{-k} 2^{-k} \]
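The Qk definition can be sketched in a few lines of Python. This is only an illustration of y = round(x 2^k); the helper names `to_q` and `from_q` are hypothetical and not part of any TI tool.

```python
def to_q(x, k, n):
    """Return the n-bit 2's complement code (as an unsigned int) of x in format Qk."""
    y = round(x * (1 << k))                      # integer y = round(x * 2^k)
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if not lo <= y <= hi:
        raise OverflowError(f"{x} does not fit in {n} bits with {k} fractional bits")
    return y & ((1 << n) - 1)                    # keep the n low-order bits


def from_q(code, k, n):
    """Inverse of to_q: interpret an n-bit 2's complement code as a Qk value."""
    if code & (1 << (n - 1)):                    # sign bit set: negative number
        code -= 1 << n
    return code / (1 << k)


print(format(to_q(1.75, 3, 6), "06b"))    # 001110, read as 001.110
print(format(to_q(-1.75, 3, 6), "06b"))   # 110010, read as 110.010
print(from_q(0b110010, 3, 6))             # -1.75
```

Values that are not multiples of 2^-k are rounded to the nearest representable code, so `from_q(to_q(x, k, n), k, n)` returns x only approximately in general.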
Some Properties of
2’s Complement Representation
Max number = \( 2^{N-1} - 1 \)
Min number = \( -2^{N-1} \)
Circular representation (OVM, SATD): \( (2^{N-1} - 1) + 1 = 2^{N-1} \rightarrow -2^{N-1} \)
Sign bit extension (SXM, SXMD)
Related status bits in C5000 DSPs:
- OVM = OVerflow Mode on C54 DSPs
- SATD = SATuration mode of the D unit on C55 DSPs
- SXM = Sign eXtension Mode on C54 DSPs
- SXMD = Sign eXtension Mode of the D unit on C55 DSPs
Addition and Subtraction
Using 2’s Complement

Simple hardware operator: to add 2 signed N-bit integers with a result of size N bits, it is sufficient to add the 2's complement values, whatever the sign of the numbers.
[Figure: circular 3-bit number line 0, 1, 2, 3, -4, -3, -2, -1; an intermediate overflow sets OV=1]
Examples on 3 bits (the carry is shown to the left of the result):
  111 + 111 = 1 110    (-1) + (-1) = -2
  010 + 001 = 0 011      2  +   1  =  3
  110 + 011 = 1 001    (-2) +   3  =  1
  110 + 001 = 0 111    (-2) +   1  = -1
Multiplying and Shifting in 2’s Complement




Simple hardware operator, but more difficult than with a sign-magnitude representation.
The product of 2 N-bit numbers needs support for 2N-bit results.
Generally, the product register is of size 2N bits => 2 identical MSBs (1-bit left shift).
Booth algorithm (on 3 bits):
\[ AB = -4A(b_2 - b_1) - 2A(b_1 - b_0) - A(b_0 - 0) \]
Arithmetic right shift by k bits: sign bit extension is necessary.
Sign eXtension Mode SXM or SXMD



With 2's complement, when 16-bit data are loaded into a 32-bit accumulator, the sign bit is also extended.
This sign extension may be unwanted, e.g. in the calculation of 16-bit addresses.
The user can choose whether or not to use sign bit extension mode:
- SXM = Sign eXtension Mode bit in the status word ST1 in C54 DSPs.
- SXMD = Sign eXtension Mode bit for the D unit in the status word ST1_55 in C55 DSPs.
Sign Bit Extension

Example data size 6 bits, Accumulator size 12 bits
Data
1 0 1 0 0 1
Loading of ACCU with sign extension
1 1 1 1 1 1 1 0 1 0 0 1
Loading of ACCU without sign extension
0 0 0 0 0 0 1 0 1 0 0 1
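The 6-bit to 12-bit example above can be reproduced with a small Python model. The function `load_accumulator` and its parameters are hypothetical names used for illustration only; the flag `sign_extend` plays the role of SXM/SXMD.

```python
def load_accumulator(data, data_bits=6, acc_bits=12, sign_extend=True):
    """Model loading a data word into a wider accumulator, with or without sign extension."""
    data &= (1 << data_bits) - 1                     # keep only the data bits
    if sign_extend and data & (1 << (data_bits - 1)):
        # copy the sign bit into every accumulator bit above the data word
        data |= ((1 << acc_bits) - 1) & ~((1 << data_bits) - 1)
    return data


print(format(load_accumulator(0b101001, sign_extend=True), "012b"))   # 111111101001
print(format(load_accumulator(0b101001, sign_extend=False), "012b"))  # 000000101001
```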
Addition Overflow


When adding 2 numbers of size N bits, the result may need N+1 bits.
Example for integers of N=3 bits:
- 3+3 = 6 cannot be represented using 3 bits, but can be expressed using 4 bits.
- In format Q2 of N=3 bits, 0.75 + 0.5 = 1.25 cannot be represented using 3 bits; it needs 4 bits.
When adding M numbers of N bits, the result potentially needs N + log2(M) bits.
Using Saturation


Overflows in 2's complement create unexpected sign changes and peaks that are difficult to filter.
Saturation arithmetic detects the overflow and replaces the result with a saturation value.
[Figure: example with max value = 0.75; output versus input (input from 0 to 1, output from -1 to 1), comparing saturation at 0.75 with 2's complement overflow]
Setting saturation modes with OVM or SATD

The user can choose whether or not to use saturation mode by setting the corresponding mode bits.
OVM = OVerflow Mode bit in status word ST1 in C54 DSPs.
If OVM = 1:
- Positive results are saturated to 00 7FFF FFFF.
- Negative results are saturated to FF 8000 0000.
SATD = SATuration mode bit for the D unit in the status word ST1_55 in C55 DSPs.
If SATD = 1 and M40 = 0: same as for C54 DSPs.
If SATD = 1 and M40 = 1:
- Positive results are saturated to 7F FFFF FFFF.
- Negative results are saturated to 80 0000 0000.
Saturation mode for the A unit in C55 DSPs

SATA = SATuration mode bit for the A-unit ALU in the status word ST3_55 in C55 DSPs.
If SATA = 1 and a calculation in the A unit results in an overflow:
- Positive results are saturated to 7FFF.
- Negative results are saturated to 8000.
Effect of 2’s Complement Overflow



As 2's complement is a circular representation, intermediate overflows do not alter the final result, provided the final result fits in N bits.
This is not the case for saturation.
Example of N = 3 bits:
- Calculate x = 3+2-4; the theoretical result is 1.
- With 2's complement overflow:
  - Calculate first y = (3+2) = 011+010 = 101 = -3 (overflow)
  - Then (y-4) = 101+100 = 1 001 = 1 with carry = 1 (correct result)
- With saturation:
  - Calculate first y = (3+2) = 3 (saturation)
  - Then (y-4) = 011+100 = 111 = -1 (wrong result)
If a system has a unity gain, saturation should not be used.
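The 3-bit example can be checked with a short Python model. It is a sketch of the two overflow policies only, not of any particular C5000 instruction; `wrap`, `saturate` and `accumulate` are hypothetical helper names.

```python
def wrap(value, n=3):
    """Keep n bits and reinterpret them as 2's complement (circular overflow)."""
    value &= (1 << n) - 1
    return value - (1 << n) if value & (1 << (n - 1)) else value


def saturate(value, n=3):
    """Clamp to the n-bit signed range [-2^(n-1), 2^(n-1) - 1]."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    return max(lo, min(hi, value))


def accumulate(terms, overflow_handler):
    """Add the terms one by one, applying the overflow policy after every addition."""
    acc = 0
    for t in terms:
        acc = overflow_handler(acc + t)
    return acc


print(accumulate([3, 2, -4], wrap))      # 1  (intermediate overflow to -3 is harmless)
print(accumulate([3, 2, -4], saturate))  # -1 (intermediate saturation at 3 corrupts the result)
```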
Example of 2’s Complement Binary
Representations

Represent x = 1.75 using N=6 bits in format Q3
- Answer: 001.110 = 1 + 1/2 + 1/4
Represent x = -1.75 using N=6 bits in format Q3
- Answer: 110.010 = -4 + 2 + 1/4
Represent x = 1.805 using N=6 bits in format Q3
- Answer: 001.110 = 1 + 1/2 + 1/4 (the nearest representable value, 1.75)
Operations with Fractional Numbers using
Fixed Point Format

Addition: align on same size N and align bits with same weight.
\[ Q_k + Q_k \rightarrow Q_k \]
Multiplication: the product requires 2N bits.
\[ Q_k \times Q_{k'} \rightarrow Q_{k+k'} \]
Example of 2’s Complement Binary
Operations


Data size N=6, format Q3
Product: 12 bits, format Q6
- Product 1.75 x 2.5 = 4.375
- Binary representation: 001.110 x 010.100 = 000100.011000
Sum: 6 bits, format Q3
- Sum 1.75 + 1.5 = 3.25
- Binary representation: 001.110 + 001.100 = 011.010
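The two results above can be verified by treating the Q3 codes as plain integers. This is a minimal sketch for positive operands only; `q_string` is a hypothetical formatting helper.

```python
def q_string(code, n, k):
    """Format an n-bit code in binary with the point placed before the k fractional bits."""
    bits = format(code & ((1 << n) - 1), f"0{n}b")
    return bits[: n - k] + "." + bits[n - k :]


a = 0b001110   # 1.75 in Q3 (N = 6)
b = 0b010100   # 2.5  in Q3 (N = 6)
c = 0b001100   # 1.5  in Q3 (N = 6)

product = a * b   # Q3 x Q3 -> Q6 on 2N = 12 bits
total = a + c     # Q3 + Q3 -> Q3 on N = 6 bits

print(q_string(product, 12, 6), product / 2**6)   # 000100.011000 4.375
print(q_string(total, 6, 3), total / 2**3)        # 011.010 3.25
```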
Accumulator and size of the result




The final result of a calculation usually uses more than 16 bits (size of memory words).
ACCUs use 32, 40, 56, ... bits.
If we want to save the result in a single memory word, the question is:
Which pack of N bits must be saved from the accumulator?
- Possibility of overflow and underflow
- Overflow during accumulation or during saving
ACCUMULATOR


Possibility of overflow and underflow
Scaling when adding many products
[Figure: 40-bit accumulator layout]
- Guard bits: bits 39 to 32
- ACCU High: bits 31 to 16
- ACCU Low: bits 15 to 0
- 16 bits to save
Saturation on store mode, SST bit




SST = mode bit in PMST (C54) or
ST3_55 (C55) status word.
If SST is set, the CPU saturates a shifted
or unshifted accumulator value before
storing it.
The saturation value depends on the
value of the sign extension mode bit.
ACCU remains unchanged.
Example of Fixed Point Processing
\[ y(n) = x(n) + a_1 y(n-1) \]
Data size N=16, product size 32 bits, accumulator size 40 bits.
The coefficient a1 is smaller than 1: format Q15. Format of data = Q15.
[Figure: three views of the 40-bit accumulator (bits 39-32, 31-16, 15-0): the product a1 y(n-1) in Q30; x(n) in Q15 added to it; the result y(n) in Q15, the 16 bits to save]
Representation of Sum of Products

The basic sum of M products operation, for data and coefficients of size N bits:
\[ y(n) = \sum_{k=0}^{M-1} b_k \, x(n-k) \]
Needs 2N bits for each product + log2(M) bits for the sum of M products, or maximum 2N + log2(M) bits.
The C5000 DSP has accumulators of size 32+8 bits that allow for the sum of 256 products without overflow.
If M > 256, scaling of the data may be necessary.
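The bit-growth rule can be turned into a quick sizing check. The sketch below simply evaluates 2N + log2(M); the function names are hypothetical, and the bound is the conservative one used on the slide (each product is counted as a full 2N-bit value).

```python
import math


def accumulator_bits(n, m):
    """Bits needed for a sum of m products of n-bit values: 2n + ceil(log2(m))."""
    return 2 * n + math.ceil(math.log2(m))


def max_products_without_overflow(n, acc_bits):
    """Number of 2n-bit products an accumulator of acc_bits can sum under this bound."""
    return 1 << (acc_bits - 2 * n)


print(accumulator_bits(16, 256))              # 40
print(max_products_without_overflow(16, 40))  # 256

# Worst case: 256 products of (-32768) x (-32768) still fit in a signed 40-bit accumulator.
worst = 256 * (-32768) * (-32768)
print(worst <= 2**39 - 1)                     # True
```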
Solutions to Overflow



Overflow in multiplication can be prevented by using pure fractional numbers (< 1).
Saturation of the result.
Scaling of the inputs and use of fractional arithmetic.
- But loss of precision.
Use double precision or double word.
- But decreases speed of calculation.
Use DSP with larger accumulators.
- 8 guard bits in the 'C5000 accumulators.
Design system with unity gain.
Use floating point.
Products of size 2N or 2N-1 Bits? 1 of 3


The product of 2 data values of size N bits can be stored using 2N-1 bits, except where the two most negative numbers are multiplied together.
Example of size N=3 bits for integer values:
- The integer values are between -4 and +3.
- All the products lie between -16 and +15, so they can be written on 2N-1 = 5 bits,
- except -4 x -4 = 16.
Example on N=16 bits and Q15 format:
- -1 x -1 = +1 cannot be written on 31 bits in Q30.
Products of Size 2N or 2N-1 Bits? 2 of 3



Consider the case of data < 1 using N=16 bits, Q15 format.
Their products are < 1 and can be expressed using 32 bits in format Q30, with 2 sign bits.
It is possible with the C5000 DSP to automatically eliminate one sign bit by a left shift of 1 bit, thus obtaining a Q31 result.
- If bit FRCT in ST1 is set to 1, products are automatically shifted left by 1 bit.
Products of Size 2N or 2N-1 Bits? 3 of 3

The exception -1 x -1 can be treated using the SMUL status bit, which saturates the result of the multiplication before accumulation. -1 is equal to 8000 in hexadecimal on 16 bits.
If SMUL=1, SATD or OVM=1, and FRCT=1:
- The product of (1)8000 x (1)8000 is saturated to the positive number 7FFF FFFF after the multiplication and before accumulation in MAC or MAS instructions.
- Consistent with ETSI-GSM specifications.
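A Python model of the fractional multiply may help to see why -1 x -1 is the only problematic case. It is a simplified sketch: `q15_mul_frct` is a hypothetical name, the 1-bit left shift stands for FRCT=1, the `smul` flag stands for the saturation described above, and the non-saturated branch assumes a plain 32-bit register rather than the 40-bit accumulator of the real device.

```python
def q15_mul_frct(a, b, smul=True):
    """Multiply two signed 16-bit Q15 integers and return a Q31 result.

    The 32-bit Q30 product is shifted left by 1 bit to drop the redundant sign bit.
    """
    p = (a * b) << 1                 # Q15 x Q15 = Q30, then Q30 -> Q31
    if p > 0x7FFFFFFF:               # happens only for -1 x -1 (0x8000 x 0x8000)
        if smul:
            p = 0x7FFFFFFF           # saturate to the largest positive Q31 value
        else:
            p -= 1 << 32             # assumed 32-bit wrap: +1 becomes -1
    return p


print(hex(q15_mul_frct(-32768, -32768)))             # 0x7fffffff
print(q15_mul_frct(-32768, -32768, smul=False))      # -2147483648
print(q15_mul_frct(16384, 16384) / 2**31)            # 0.25  (0.5 x 0.5)
```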
Fixed Point Programming




Perpetual compromise between dynamic
range and precision constraints
Keep enough bits to represent the
integer part of the result
Keep enough bits in the fractional part
to satisfy the precision.
Rounding results.
Entering Non-Integer Values using the
Software Development Tools

The tools do not support fractions.
To store 0.707 in Q15 use:
  .word 32768*707/1000
To store 3.252 in Q13 use:
  .word 8192*3252/1000
Generally, to convert a real number x using 2's complement representation with size N bits and format Qk:
- Calculate the integer \( y = \mathrm{round}(x \, 2^k) \)
- The 2's complement representation of y is the 2's complement representation of x in format Qk.
Some more stuff on Saturation
Two saturation methods exist:
Manual: using the SAT instruction (ACx only)
  SAT AC0
  [Figure: effect of SAT AC0 on the value of AC0; axis marks 128, 1, 0, -1, -128]
Auto: using the SATA/SATD or OVM control bits
- SATA affects TAx registers (T0-3/AR0-7) in the A unit
  ex: 7FFFh + 2 = 7FFFh
  ex: 8001h - 3 = 8000h
- SATD affects AC0-3 registers in the D unit
  (ST1_55 M40 = 0) 00.7FFF.FFFF or FF.8000.0000
  (ST1_55 M40 = 1) 7F.FFFF.FFFF or 80.0000.0000
  - Affects ST0_55 ACxOV and can be tested
Rounding
How do you round this amount to the nearest $?
  Amount                              1.53
  - Add $0.50                         0.50
  - Partial result                    2.03
  - Truncate result (to nearest $)    2.
The RND instruction (C54 DSPs) or ROUND instruction (C55 DSPs) rounds the content of the accumulator.
For the C55, there are 2 kinds of rounding, biased or unbiased, depending on the RDM bit in ST2_55.
Biased rounding (ST2_55 RDM = 0), or round to infinity:
- Direct: ROUND AC0
- Store: MOV uns(rnd(HI(saturate(AC0)))),*AR1
rnd() and ROUND perform the following operation: add 1 to bit 15, then truncate:
(ACx+0x8000) & 0xFFFF0000
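The expression (ACx+0x8000) & 0xFFFF0000 can be tried directly on 32-bit patterns. The sketch below models only this biased rounding on the low 32 bits, with a hypothetical helper `round_acc`; guard bits, saturation and the unbiased mode are deliberately left out.

```python
def round_acc(acc):
    """Biased rounding of a 32-bit pattern to its upper 16 bits: add 1 to bit 15, then truncate."""
    return (acc + 0x8000) & 0xFFFF0000


print(hex(round_acc(0x00017FFF)))   # 0x10000  (low half below one half: rounds down)
print(hex(round_acc(0x00018000)))   # 0x20000  (low half exactly one half: rounds up)

# Same idea as the dollar example: add one half, then truncate.
print(int(1.53 + 0.50))             # 2
```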
Other Useful Stuff...

Absolute value:    ABS AC0,AC1
2's complement:    NEG AC0,AC1
1's complement:    NOT AC0,AC1
1-bit division:    SUBC Smem,ACx
Normalization:     MANT; EXP
Setting ST1_55 SMUL, FRCT, SATD = 1 will saturate (-1 x -1) to 7FFF_FFFFh prior to adding/subtracting to/from the accumulator. This ensures a 1-cycle ETSI-compatible operation and prevents temporary overflow.
Floating Point Arithmetic
Floating Point Representation

Number x -> Mantissa M and Exponent E
\[ x = M \cdot 2^{E} \]
If M is of size m bits and E is of size e bits, then x is of size N = m + e bits.
Range of positive numbers for \( 0.5 \le |M| < 1 \) and 2's complement representation of M and E:
\[ \left[ \tfrac{1}{2} \, 2^{-2^{e-1}}, \; \left( 1 - 2^{1-m} \right) 2^{2^{e-1}-1} \right] \]
Normalization of the mantissa

The decomposition of a real value x into the product of a mantissa and an exponent term is not unique:
- \( x = M_1 2^{E_1} = M_2 2^{E_2} = \dots \)
- Example: \( 12.8 = 0.8 \times 2^4 \) and also \( 12.8 = 1.6 \times 2^3 \)
M must be normalized to make the decomposition unique.
The normalization is a constraint applied to M:
- for example: \( 0.5 \le |M| < 1 \)
- The ratio of the limits of the interval must be smaller than 2 to have the same exponent.
Floating Point Representation


Non-linear scale: the precision decreases geometrically as the magnitude of the data increases.
[Figure: positive axis marked 0, xmin, 2xmin, 4xmin, 8xmin; the intervals 2xmin to 4xmin and 4xmin to 8xmin each contain \( 2^{m-2} \) values]
For a given number of bits:
- the number of bits of the mantissa determines the precision
- the number of bits of the exponent determines the dynamic range.
Floating Point Overflow or Underflow

Very unlikely to occur
Overflow: \( |x| \ge 2^{2^{e-1}-1} \)
Underflow: \( |x| < \tfrac{1}{2} \, 2^{-2^{e-1}} \)
Floating Point Addition Operator
\[ A + B = M_a 2^{E_a} + M_b 2^{E_b} = \left( M_a + M_b 2^{E_b - E_a} \right) 2^{E_a} \]
It is necessary to denormalize the smallest number (B):
- Its mantissa is multiplied by \( 2^{E_b - E_a} \) before being added to \( M_a \).
Loss of precision due to the rounding of the mantissa.
Floating Point Multiplication Operator
\[ A \times B = M_a M_b \, 2^{E_a + E_b} \]
It is necessary to normalize \( M_a M_b \).
1 extra bit would be necessary to prevent overflow of \( E_a + E_b \).
2m-1 bits are necessary to represent \( M_a M_b \).
If M is truncated to m bits, the absolute error increases rapidly.
Examples of Floating Point DSP

Some DSP devices of the C6000 family:
- C67xx devices support both single and double precision formats.
The C5000 DSPs are fixed point DSPs, but can be programmed in floating point if necessary.
Example of Floating Point Representation

Represent x=1.75 in floating point
- Use N=8, mantissa size m=5 bits, exponent size e=3 bits, M and E in 2's complement
- Mantissa normalized to \( 0.5 \le |M| < 1 \)
Solution:
- E=1, in binary representation: 001
- M=0.875, in binary representation: 0.1110
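Python's `math.frexp` returns exactly this kind of decomposition, with a mantissa normalized to 0.5 <= |M| < 1, so the solution can be cross-checked. The `encode` helper below is a hypothetical sketch of the 5+3-bit toy format of this slide; it does no range checking and does not handle a mantissa that rounds up to 1.0.

```python
import math


def encode(x, m=5, e=3):
    """Return (mantissa bits, exponent bits) of x, both in 2's complement."""
    mant, exp = math.frexp(x)                       # x = mant * 2**exp, 0.5 <= |mant| < 1
    mant_code = round(mant * (1 << (m - 1))) & ((1 << m) - 1)   # Q(m-1) code of M
    exp_code = exp & ((1 << e) - 1)
    return format(mant_code, f"0{m}b"), format(exp_code, f"0{e}b")


print(math.frexp(1.75))   # (0.875, 1)
print(encode(1.75))       # ('01110', '001'), i.e. M = 0.1110 and E = 001
```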
Comparison of Fixed and Floating Point
Formats

Fixed point: linear scale
- Absolute error more or less constant
- SNR decreases when the input decreases
Floating point: non-linear scale with a geometrical progression
- Relative error more or less constant
- SNR more or less constant over the full data range
Quantization Error and SNR with
Fixed-Point
Quantization step q, rounding error \( d = \hat{x} - x \):
\[ |d| \le \frac{q}{2}, \qquad E(d) = 0, \qquad E(d^2) = \sigma_d^2 = \frac{q^2}{12} \]
\[ \mathrm{SNR}_{dB} = 10 \log_{10}\!\left( \frac{\sigma_x^2}{\sigma_d^2} \right) \]
For \( |x| \le x_{max} \):
\[ \mathrm{SNR}_{dB} \approx 6N - 10 \log_{10}\!\left( \frac{x_{max}^2}{3} \right) + 10 \log_{10}\!\left( \sigma_x^2 \right) \]
Quantization Error with Floating Point
\( d = \hat{x} - x \) (rounding)
\( d_m \) = rounding error on the mantissa: \( 0 \le |d_m| \le \tfrac{1}{2} \, 2^{-(m-1)} \)
\( d_r \) = relative error on x: \( d_r = \frac{d}{x} = \frac{d_m}{M} \)
Quantization Error and SNR with
Floating Point
\[ |d_r| \le 2^{-(m-1)} \]
For x random with "fast variations", \( d_r \) is white noise uncorrelated with x:
\[ d = \hat{x} - x = x \, d_r, \qquad \sigma_d^2 = \sigma_x^2 \, \sigma_{d_r}^2 \]
\[ \mathrm{SNR}_{dB} = 6m + 1.44 \]
Comparison of Fixed Point and
Floating Point SNR
[Figure: SNR in dB (vertical axis, -20 to 100) versus signal power in dB (horizontal axis, -100 to 100), with one curve for fixed point SNR and one for floating point SNR; labelled levels: 86 dB and 73 dB]
Example for N=16 bits: m=12, e=4
Comparison of Fixed Point and
Floating Point


For N bits, with floating point format there is a compromise between dynamic range (E) and precision (M).
Example for N=32 bits:
                                Dynamic range            Precision
Fixed point, 32 bits            \( \approx 10^{9} \)     \( \approx \) 9 digits (max)
Floating point, m=24, e=8       \( \approx 10^{77} \)    \( \approx \) 7 digits
Dynamic range is defined as the ratio of the largest positive value to the smallest non-zero positive value.
Fixed Point vs. Floating Point

Fixed Point:
- Simple operators of addition and multiplication
- But it is necessary to monitor overflow and underflow in order to keep precision and dynamic range at their best.
Floating Point:
- Greater dynamic range and simpler programming
- More complex operators, so performance in terms of speed or power consumption is not as good as that of fixed point DSPs.
IEEE 754 Floating Point Format 1 of 4


Most processors respect the IEEE 754 format for floating point representation of numbers.
IEEE format for N=32 bits:
- 32 bits = 1 bit (sign bit) + 8 bits (exponent) + 23 bits (fraction)
- Exponent: offset binary, offset = 127, exponent = expo - 127
- Mantissa: sign-magnitude, normalized between 1.0...0 and 1.1...1
- Hidden bit 1: only the fractional part (fraction) is stored.
- When the exponent field is not equal to 0, |mantissa| = 1.fraction
e.g.: \( x = 28 = 1.75 \times 2^4 \) is coded 0 10000011 1100...0
\[ \text{value} = (-1)^{\text{sign}} \times (1.\text{fraction}) \times 2^{(\text{expo} - 127)} \]
for non-zero exponent
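The three fields of the example x = 28 can be extracted with Python's standard `struct` module. This is a small illustrative sketch; `float32_fields` and `float32_value` are hypothetical helper names, and the reconstruction formula applies only to a non-zero exponent field (normalized numbers).

```python
import struct


def float32_fields(x):
    """Return (sign, expo, fraction) of x encoded as IEEE 754 single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    expo = (bits >> 23) & 0xFF        # 8-bit exponent in offset binary (offset 127)
    fraction = bits & 0x7FFFFF        # 23 stored fraction bits (hidden bit not stored)
    return sign, expo, fraction


def float32_value(sign, expo, fraction):
    """value = (-1)^sign * (1.fraction) * 2^(expo - 127), valid for non-zero expo."""
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (expo - 127)


sign, expo, fraction = float32_fields(28.0)
print(sign, format(expo, "08b"), format(fraction, "023b"))
# 0 10000011 11000000000000000000000
print(float32_value(sign, expo, fraction))   # 28.0
```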
Dynamic Range of IEEE 754 Single Precision
Floating Point Format 2 of 4
\[ \text{value} = (-1)^{\text{sign}} \times (1.\text{fraction}) \times 2^{(\text{expo} - 127)} \]
for non-zero exponent
Largest positive number:
- Max exponent = 254 - 127 = 127
- Max mantissa = \( 2 - 2^{-23} \)
- Max positive value = \( (2 - 2^{-23}) \times 2^{127} \approx 2^{128} \)
Smallest positive number (non-zero):
- Min exponent = 1 - 127 = -126
- Min mantissa = 1.0
- Min positive value = \( 1.0 \times 2^{-126} \)
IEEE 754 Single precision
Floating Point Format, Special Cases 3 of 4



Zero: 32 bits are 0
Underflow: exponent < 1
Overflow: exponent > 254
IEEE 754 Floating Point Format 4 of 4

Double precision, 64 bits:
- 1+11+52
- Exponent in offset binary: offset = 1023
Extended single precision, 43 bits:
- 1+11+31
Extended double precision, 79 bits:
- 1+15+63
Block Floating Point
Block Floating Point



This is not a DSP format.
This is a way of doing floating point operations efficiently on a fixed point DSP.
Natural approach for block operations such as the Fast Fourier Transform (FFT).
- See details in chapter 19.
Block Floating Point





A register contains the value of the exponent (constant) to be applied to a block of data: the BLOCK EXPONENT.
The mantissa is of size N bits.
Each data block is tested and scaled by the exponent in order to avoid overflows.
Useful when N is small (e.g. N=16 bits).
Limits the loss of precision due to the increase in dynamic range of floating point.
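The idea of one shared exponent per block can be sketched as follows. This is a minimal illustration under stated assumptions, not the C5000 implementation: `block_normalize` is a hypothetical name, the block is assumed non-empty, and only down-scaling (to avoid overflow) is modelled, using arithmetic right shifts.

```python
def block_normalize(block, n=16):
    """Scale a block of integers by a common power of two so every value fits in n signed bits.

    Returns (mantissas, block_exponent); each original value is approximately
    mantissa * 2**block_exponent.
    """
    hi = (1 << (n - 1)) - 1
    peak = max(abs(v) for v in block)          # test the block: largest magnitude
    shift = 0
    while (peak >> shift) > hi:                # smallest shift that avoids overflow
        shift += 1
    mantissas = [v >> shift for v in block]    # same shift for the whole block
    return mantissas, shift


mantissas, exponent = block_normalize([100000, -70000, 1234])
print(mantissas, exponent)   # [25000, -17500, 308] 2
```

All values share the exponent 2, so the small sample 1234 loses its two low-order bits (308 x 4 = 1232): precision is set by the largest value in the block.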