Ch.5 Fixed-Point vs. Floating-Point

Ch.5 Fixed-Point vs. Floating-Point
5.1 Q-format Number Representation on Fixed-Point DSPs
• 2’s Complement Number
– $B = b_{N-1} \ldots b_1 b_0$
– Decimal value $D = -b_{N-1} 2^{N-1} + \ldots + b_1 2^{1} + b_0$
– The dynamic range is limited: an N-bit word holds only the integers from $-2^{N-1}$ to $2^{N-1} - 1$.
• The Q-format can be used to help prevent
overflow in multiplication.
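As a minimal sketch of the 2's complement value formula above, the following evaluates an N-bit pattern term by term. The function names `twos_complement_value` and `dynamic_range` are hypothetical, chosen only for this illustration.

```python
def twos_complement_value(bits: str) -> int:
    """Decimal value of an N-bit 2's complement pattern given MSB first."""
    n = len(bits)
    # The sign bit b_{N-1} carries the negative weight -2^(N-1).
    value = -int(bits[0]) * 2 ** (n - 1)
    # Every remaining bit b_i carries the positive weight 2^i.
    for i, b in enumerate(bits[1:]):
        value += int(b) * 2 ** (n - 2 - i)
    return value


def dynamic_range(n: int):
    """Smallest and largest integers representable in n bits."""
    return -(2 ** (n - 1)), 2 ** (n - 1) - 1


print(twos_complement_value("1111"))   # all ones is -1
print(dynamic_range(16))               # 16-bit limits
```

With 16-bit words, the product of two large values will not fit back into 16 bits, which is the overflow problem the Q-format is meant to address.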
5.1 Q-format
• Q-format or fractional representation
– The implied binary point is moved to the left, immediately after the sign bit.
– $F(B) = -b_{N-1} 2^{0} + b_{N-2} 2^{-1} + \ldots + b_1 2^{-(N-2)} + b_0 2^{-(N-1)}$
– The programmer keeps track of the binary point.
• Example: Q-15
– 16 bit numbers—1 sign bit and 15 fractional bits.
– Multiplying two such numbers gives a Q-30 number.
– The result can be truncated back to Q-15 by keeping the 15 most significant fractional bits and dropping the extended sign bit (see Fig. 5.2).
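The Q-15 multiplication described above can be sketched with plain integers. This is an illustration under stated assumptions, not the DSP's actual instruction sequence: the helper names `to_q15`, `q15_to_float` and `q15_mul` are hypothetical, and the saturation of the single out-of-range case (-1 times -1) is an added safeguard that the slides do not mention.

```python
Q15_ONE = 1 << 15          # scale factor 2^15
Q15_MIN, Q15_MAX = -32768, 32767


def to_q15(x: float) -> int:
    """Quantize x in [-1, 1) to a 16-bit Q-15 integer, saturating at the limits."""
    v = int(round(x * Q15_ONE))
    return max(Q15_MIN, min(Q15_MAX, v))


def q15_to_float(q: int) -> float:
    """Interpret a Q-15 integer as a fraction."""
    return q / Q15_ONE


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q-15 numbers and truncate the Q-30 product back to Q-15."""
    product_q30 = a * b                 # 30 fractional bits plus an extended sign bit
    result = product_q30 >> 15          # keep the 15 most significant fractional bits
    return max(Q15_MIN, min(Q15_MAX, result))   # assumption: saturate -1 * -1


half = to_q15(0.5)
print(q15_to_float(q15_mul(half, half)))   # 0.25
```

Because both operands have magnitude at most 1, the product also has magnitude at most 1, which is why the multiplication itself cannot overflow except in the -1 times -1 corner case.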
Problems with Q Format
• There can be precision loss with the Q-format. Figure 5-5 illustrates the concept with a Q-12 example.
• Addition and subtraction can still overflow; scaling can be used to help.
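Figure 5-5 is not reproduced here, so the following is only a rough sketch of the general trade-off. It assumes that Q-12 means 12 fractional bits in a 16-bit word, which widens the range to [-8, 8) but makes the quantization step coarser. The helper `quantize` and the test value 0.3 are arbitrary choices for illustration.

```python
def quantize(x: float, frac_bits: int) -> float:
    """Round x to the nearest multiple of 2^-frac_bits (no range check)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


x = 0.3
err_q12 = abs(quantize(x, 12) - x)   # step size 2^-12
err_q15 = abs(quantize(x, 15) - x)   # step size 2^-15

print(err_q12, err_q15)              # the Q-12 error is the larger one
```

Rounding to the nearest step bounds the error by half a step, so each fractional bit given up for extra range doubles the worst-case error.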
5.2 Finite Word Length Effects on Fixed-Point DSPs
• In fixed-point DSP implementations, digital filter coefficients are stored in a fixed-point format.
• Quantizing the coefficients to a finite word length has an effect similar to the quantization of input data introduced by an A/D converter.
5.2 Finite Word Length Effects (p.2)
• In IIR filters, the fixed-point representation
of the coefficients can cause the poles to
shift in the z-plane.
• The amount of shift due to the quantization
of a single coefficient is influenced by the
positions of all the other poles.
• To reduce this effect, IIR filters are often
implemented as a cascade of 2nd order
systems.
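The pole shift can be seen numerically in a single 2nd-order section, where the pole pair depends only on that section's two denominator coefficients. The sketch below is illustrative only: the pole radius 0.95, the angle 0.1 rad, and the deliberately coarse 6 fractional bits are assumed values, and the helper names are hypothetical.

```python
import cmath
import math


def quantize(c: float, frac_bits: int) -> float:
    """Round a coefficient to the nearest multiple of 2^-frac_bits."""
    scale = 1 << frac_bits
    return round(c * scale) / scale


def poles(a1: float, a2: float):
    """Roots of z^2 + a1*z + a2, the denominator of a 2nd-order section."""
    disc = cmath.sqrt(a1 * a1 - 4 * a2)
    return (-a1 + disc) / 2, (-a1 - disc) / 2


# Assumed design: a complex-conjugate pole pair at radius r and angle theta.
r, theta = 0.95, 0.1
a1 = -2 * r * math.cos(theta)
a2 = r * r

exact = poles(a1, a2)
coarse = poles(quantize(a1, 6), quantize(a2, 6))
shift = abs(exact[0] - coarse[0])

print(exact[0], coarse[0], shift)
```

In a high-order direct-form filter every pole depends on every coefficient, so the same rounding error would move all poles at once; the cascade confines each error to one pole pair.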
5.2 Finite Word Length Effects (p.3)
• The frequency response of the implemented
system is also affected by the quantization of
coefficients in the difference equation.
• Finally, coefficient quantization can also lead to limit cycles in IIR filters: with no further input, the response of a nominally stable system to a unit impulse may settle into an undamped oscillation instead of decaying to zero.
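A limit cycle is easy to reproduce with a first-order recursion in integer arithmetic. In this sketch the oscillation is sustained by rounding each product back to the word length. The coefficient -7/8, the starting value 10, and the function name `simulate` are arbitrary choices for illustration and do not come from the slides.

```python
def simulate(y0: int, steps: int, a_q3: int = -7) -> list:
    """Zero-input response of y[n] = round(a * y[n-1]) with a = a_q3 / 8."""
    out = []
    y = y0
    for _ in range(steps):
        # The product has 3 extra fractional bits; adding 4 (half an LSB)
        # before the arithmetic right shift rounds it to the nearest integer.
        y = (a_q3 * y + 4) >> 3
        out.append(y)
    return out


print(simulate(10, 12))
# [-9, 8, -7, 6, -5, 4, -3, 3, -3, 3, -3, 3]
```

With infinite precision the output would decay geometrically to zero because |a| < 1; with rounding it becomes trapped alternating between 3 and -3.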
5.3 Floating-Point Number
Representation
• The C67x processor supports single-precision and double-precision floating-point representations.
• The formats are shown in Figures 5.6 and 5.7.
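Figures 5.6 and 5.7 are not reproduced here. Assuming they show the IEEE 754 layouts (1 sign bit, 8 exponent bits and 23 fraction bits for single precision; 1, 11 and 52 for double precision), the fields of a value can be inspected as in this sketch. The function names are hypothetical.

```python
import struct


def float32_fields(x: float):
    """(sign, biased exponent, fraction) of x stored as an IEEE 754 single."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def float64_fields(x: float):
    """(sign, biased exponent, fraction) of x stored as an IEEE 754 double."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


print(float32_fields(1.0))    # (0, 127, 0): exponent bias is 127
print(float64_fields(1.0))    # (0, 1023, 0): exponent bias is 1023
```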
5.4 Overflow and Scaling
• Scaling is the simplest correction method for
overflows in fixed-point implementations.
• This can be implemented in most filtering and
transform applications.
• The input is scaled down for processing and
the output is then scaled back up.
• Right shifting (each one-bit shift divides by 2) is an easy way to implement scaling.
• The shift can be repeated until overflows no longer occur in the computations.
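The scale-down, process, scale-up procedure above can be sketched as follows. The "processing" here is just a multiplication by a hypothetical gain of 3 standing in for a filter or transform, the 16-bit limits are assumed, and the scaled-back output is assumed to live in a wider word.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def overflows(samples, gain: int) -> bool:
    """True if gain * sample leaves the 16-bit range for any sample."""
    return any(not INT16_MIN <= gain * s <= INT16_MAX for s in samples)


def scale_until_safe(samples, gain: int):
    """Right-shift the input one bit at a time until processing no longer overflows."""
    scaled, shift = list(samples), 0
    while overflows(scaled, gain) and shift < 15:
        scaled = [s >> 1 for s in scaled]   # divide by 2
        shift += 1
    return scaled, shift


samples = [12000, -20000, 30000]
scaled, shift = scale_until_safe(samples, gain=3)
processed = [3 * s for s in scaled]           # now fits in 16 bits
restored = [p << shift for p in processed]    # scale back up (wider word assumed)

print(shift, scaled, restored)
```

Each right shift discards one low-order bit of the input, so the price of avoiding overflow this way is reduced precision.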
5.4 Overflow and Scaling (p.2)
• Scaling of filter coefficients can also be
used to avoid overflows.
• It can be shown that the condition to
prevent overflow is
– $\sum_{k=0}^{N} |h[k]| \le 1$
• For IIR filters, whose impulse response is infinite, N is taken large enough that the remaining values of h[k] are negligible.
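The condition follows from the bound |y[n]| ≤ max|x| · ∑|h[k]|: if the input magnitude never exceeds 1 and the sum is at most 1, the output magnitude cannot exceed 1. A minimal sketch of coefficient scaling based on that sum is shown below; the function names and the example coefficients are hypothetical.

```python
def l1_norm(h) -> float:
    """Sum of |h[k]|, the worst-case gain for inputs bounded by 1."""
    return sum(abs(c) for c in h)


def scale_coefficients(h):
    """Divide the coefficients by their L1 norm when that norm exceeds 1."""
    norm = l1_norm(h)
    if norm <= 1.0:
        return list(h), 1.0
    return [c / norm for c in h], norm


h = [0.5, 1.0, 0.5]
scaled, g = scale_coefficients(h)

print(scaled, g)   # [0.25, 0.5, 0.25] 2.0
```

The returned factor `g` is what the output must be multiplied by to undo the scaling. On a fixed-point DSP one might round it up to a power of two so that the correction is a simple shift.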