Lecture 12: Quantization

Lecture 12: Number representation
and Quantization effects
Instructor:
Dr. Gleb V. Tcheslavski
Contact:
[email protected]
Office Hours: Room 2030
Class web site:
http://ee.lamar.edu/gleb/dsp/index.htm
ELEN 5346/4304
DSP and Filter Design
Fall 2008
Representation of numbers
Up to this point, we were considering implementations of discrete-time systems
without any considerations of finite-word-length effects that are inherent in any
digital realization, whether in hardware or software.
Let us consider first two different representations of numbers.
1. Fixed-point representation.
A real number X is represented as:
$$X = (b_{-A}, \ldots, b_{-1}, b_0, b_1, \ldots, b_B)_r = \sum_{i=-A}^{B} b_i r^{-i}, \qquad 0 \le b_i \le (r-1) \tag{12.2.1}$$
where $b_i$ represents the digit, r is the radix or base, A is the number of integer
digits, and B is the number of fractional digits. For example:
$$(123.45)_{10} = 1 \cdot 10^2 + 2 \cdot 10^1 + 3 \cdot 10^0 + 4 \cdot 10^{-1} + 5 \cdot 10^{-2}$$
$$(101.01)_2 = 1 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0 + 0 \cdot 2^{-1} + 1 \cdot 2^{-2}$$
We will focus our attention on the binary representation, as it is the most important
for DSP. In this case r = 2 and the digits {$b_i$} are called binary digits or bits. They
take the values {0, 1}. The binary digit $b_{-A}$ is called the most significant bit
(MSB) of the number, and the binary digit $b_B$ is called the least significant bit
(LSB) of the number. The “binary point” between the digits $b_0$ and $b_1$ is not
stored explicitly; the logic assumes its location.
By using an n-bit integer format (A = n-1, B = 0), we can represent unsigned
integer numbers from 0 to $2^n - 1$. More frequently, the fractional format (A = 0, B
= n-1) is used, with a binary point between $b_0$ and $b_1$, that can represent
numbers from 0 to $1 - 2^{-n}$. Any integer or mixed number can be represented in a
fraction format by factoring out the term $r^A$.
There are three formats to represent negative numbers. The format for
positive numbers is the same in all three: the MSB is set to zero
$$X = 0.b_1 b_2 \ldots b_B = \sum_{i=1}^{B} b_i 2^{-i}, \qquad X \ge 0 \tag{12.3.1}$$
The negative numbers can be represented by:
1) Sign-Magnitude format: MSB is set to 1 to represent “-”
$$X_{SM} = 1.b_1 b_2 \ldots b_B \quad \text{for } X \le 0 \tag{12.4.1}$$
2) One’s-Complement Format:
$$X_{1C} = 1.\bar{b}_1 \bar{b}_2 \ldots \bar{b}_B \quad \text{for } X \le 0 \tag{12.4.2}$$
where $\bar{b}_i = 1 - b_i$ is the complement of $b_i$ (i.e., we replace ones by zeros and
zeros by ones for all bits).
$$X_{1C} = 1 \cdot 2^0 + \sum_{i=1}^{B} (1 - b_i) \, 2^{-i} = 2 - |X| - 2^{-B} \tag{12.4.3}$$
3) Two’s-Complement Format:
$$X_{2C} = 1.\bar{b}_1 \bar{b}_2 \ldots \bar{b}_B \oplus 0.00 \ldots 01 \quad \text{for } X < 0 \tag{12.4.4}$$
where $\oplus$ is modulo-2 addition (any carry out of the sign bit is ignored). For
example, -3/8 is obtained by complementing 0011 (3/8) to obtain 1100 and then adding
0001, which yields 1101 to represent -3/8 in the two’s-complement format.
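The three formats can be compared on the -3/8 example. The sketch below is illustrative only: the function name encode and the format labels are hypothetical, and it assumes that x is an exact multiple of 2^-B with |x| < 1.

```python
from fractions import Fraction


def encode(x, B, fmt):
    """Return the (B+1)-bit code of x as a bit string, sign bit first.

    Assumes x is a multiple of 2**-B with |x| < 1.
    """
    x = Fraction(x)
    scaled = abs(x) * 2**B
    if scaled.denominator != 1 or scaled >= 2**B:
        raise ValueError("x must be a multiple of 2**-B with |x| < 1")
    magnitude = int(scaled)
    if x >= 0:
        # Positive numbers look the same in all three formats: sign bit 0.
        return "0" + format(magnitude, f"0{B}b")
    if fmt == "sign-magnitude":
        # Sign bit 1, magnitude bits unchanged.
        return "1" + format(magnitude, f"0{B}b")
    if fmt == "ones-complement":
        # Sign bit 1, every magnitude bit inverted.
        return "1" + format((2**B - 1) - magnitude, f"0{B}b")
    if fmt == "twos-complement":
        # Invert all bits of the positive code and add one LSB,
        # which equals 2**(B+1) - magnitude as an unsigned word.
        return format(2 ** (B + 1) - magnitude, f"0{B + 1}b")
    raise ValueError("unknown format")


for name in ("sign-magnitude", "ones-complement", "twos-complement"):
    print(name, encode(Fraction(-3, 8), 3, name))
```

For B = 3 the one's-complement and two's-complement codes reproduce the 1100 and 1101 patterns of the example in the text.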
The basic operations of addition and multiplication depend on the format used.
Most fixed-point digital signal processors use two’s-complement arithmetic;
therefore, a (B + 1)-bit number ranges from -1 to $1 - 2^{-B}$.
In general, the multiplication of two fixed-point numbers, each b bits in length,
results in a product 2b bits in length. The product is either truncated or
rounded back to b bits, resulting in either truncation or rounding errors.
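The word-length growth in multiplication can be illustrated with integers. This is a sketch under stated assumptions: each operand is stored as an integer scaled by 2^b (that is, with b fractional bits), the function name multiply_fixed is hypothetical, and truncation is done with an arithmetic right shift as in two's-complement hardware.

```python
def multiply_fixed(a: int, c: int, b: int, mode: str = "truncate") -> int:
    """Multiply two fixed-point values stored as integers scaled by 2**b.

    The exact product has 2b fractional bits; it is reduced back to b.
    """
    full = a * c  # exact product, scaled by 2**(2b)
    if mode == "truncate":
        return full >> b  # drop the b least significant bits
    if mode == "round":
        if b < 1:
            raise ValueError("rounding needs at least one discarded bit")
        return (full + (1 << (b - 1))) >> b  # add half an LSB, then drop bits
    raise ValueError("mode must be 'truncate' or 'round'")


# 3/8 times 5/8 with b = 3: the exact product is 15/64.
print(multiply_fixed(3, 5, 3, "truncate") / 8)  # truncated result
print(multiply_fixed(3, 5, 3, "round") / 8)     # rounded result
```

The two modes return different b-bit results for the same exact product, which is the source of the truncation and rounding errors discussed later.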
A fixed-point representation covers a range of numbers, say, $x_{max} - x_{min}$,
with a fixed resolution:
$$\Delta = \frac{x_{max} - x_{min}}{m - 1} \tag{12.5.1}$$
where $m = 2^b$ is the number of levels and b is the number of bits.
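As a quick numerical check of the resolution formula (12.5.1), the sketch below evaluates it for a 4-bit two's-complement fraction, whose range is -1 to 1 - 2^-3. The function name resolution is hypothetical.

```python
from fractions import Fraction


def resolution(x_min, x_max, b):
    """Fixed-point resolution for b bits covering the range [x_min, x_max]."""
    m = 2**b  # number of quantization levels
    return (Fraction(x_max) - Fraction(x_min)) / (m - 1)


# 4 bits in total, range from -1 to 1 - 2**-3.
print(resolution(-1, 1 - Fraction(1, 8), 4))
```

In this case the resolution equals the weight of the least significant bit.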
2. Floating-point representation.
Covers a larger dynamic range by representing the number X as
$$X = 2^E M \tag{12.6.1}$$
where M is the mantissa, the fractional part of the number, with $0.5 \le M < 1$, and
the exponent E is either a negative or a positive number. Both mantissa and exponent
require additional sign bits for representing negative numbers. For example, with the
leftmost bit of the exponent serving as its sign bit:
$$X_1 = 5: \quad M_1 = 0.101000, \; E_1 = 011 \qquad (5 = 0.625 \cdot 2^{3})$$
$$X_2 = 3/8: \quad M_2 = 0.110000, \; E_2 = 101 \qquad (3/8 = 0.75 \cdot 2^{-1})$$
Multiplication of two floating-point numbers is done by multiplying their mantissas
and adding their exponents. Addition of two floating-point numbers requires that
the exponents be equal, which can be achieved by shifting the mantissa of
the smaller number to the right and compensating by increasing the
corresponding exponent. The shift discards low-order mantissa bits, so this, in
general, may lead to loss of precision.
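The mantissa and exponent operations described above can be sketched with Python's standard math.frexp and math.ldexp, which use the same convention 0.5 <= |M| < 1. The helper names multiply and add are hypothetical, and the host double-precision format stands in for the short word lengths of the lecture, so no precision is actually lost in this small example.

```python
import math


def multiply(x1: float, x2: float) -> float:
    """Multiply by multiplying mantissas and adding exponents."""
    m1, e1 = math.frexp(x1)  # x1 = m1 * 2**e1 with 0.5 <= |m1| < 1
    m2, e2 = math.frexp(x2)
    return math.ldexp(m1 * m2, e1 + e2)


def add(x1: float, x2: float) -> float:
    """Add after aligning the exponents of the two operands."""
    m1, e1 = math.frexp(x1)
    m2, e2 = math.frexp(x2)
    if e1 < e2:
        # Swap so that (m1, e1) always has the larger exponent.
        m1, e1, m2, e2 = m2, e2, m1, e1
    # Shift the other mantissa right by the exponent difference.
    m2_aligned = m2 / 2 ** (e1 - e2)
    return math.ldexp(m1 + m2_aligned, e1)


print(math.frexp(5.0))    # mantissa and exponent of 5
print(math.frexp(0.375))  # mantissa and exponent of 3/8
print(multiply(5.0, 0.375), add(5.0, 0.375))
```

With a short mantissa, the right shift in add would push low-order bits out of the word, which is where the loss of precision arises.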
Overflow occurs in the multiplication of two floating-point numbers when the sum
of the exponents exceeds the dynamic range of the fixed-point representation of
the exponent.
The floating-point representation allows us to cover a larger dynamic range than
the fixed-point representation by varying the resolution across the range. The
distance between two successive floating-point numbers increases as the
numbers increase in size. Also, the floating-point representation provides finer
resolution for small numbers but coarser resolution for large numbers.
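The varying resolution can be made concrete. Assuming a mantissa with B fractional bits, successive mantissas differ by 2^-B, so successive numbers with exponent E differ by 2^(E-B). The sketch below computes that spacing; the function name spacing is hypothetical.

```python
import math


def spacing(x: float, B: int) -> float:
    """Distance between successive floating-point numbers near x.

    Assumes a mantissa 0.5 <= M < 1 stored with B fractional bits.
    """
    _, exponent = math.frexp(x)  # x = M * 2**exponent
    return math.ldexp(1.0, exponent - B)  # 2**(exponent - B)


# A 6-bit mantissa, as in the examples for X = 5 and X = 3/8.
for value in (0.375, 5.0, 100.0):
    print(value, spacing(value, 6))
```

The printed spacing grows with the magnitude of the number, while a fixed-point format would keep it constant.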
Quantization
1. Fixed-point: truncation
To truncate a fixed-point number from ($\beta + 1$) bits to ($b + 1$) bits, we just
discard the least significant ($\beta - b$) bits. The truncation error is denoted by
$$\varepsilon_t = Q(X) - X \tag{12.8.1}$$
Here Q(X) is the truncated version of the number X. For a positive X, the error is
equal to zero if all bits being discarded are zeros and is largest if all discarded bits
are ones.
$$-(2^{-b} - 2^{-\beta}) \le \varepsilon_t \le 0 \tag{12.8.2}$$
For a negative X, the truncation error will be different for three different formats:
1) Sign-Magnitude:
$$0 \le \varepsilon_t \le 2^{-b} - 2^{-\beta} \tag{12.9.1}$$
2) One’s-complement:
$$0 \le \varepsilon_t \le 2^{-b} - 2^{-\beta} \tag{12.9.2}$$
3) Two’s-complement:
$$-(2^{-b} - 2^{-\beta}) \le \varepsilon_t \le 0 \tag{12.9.3}$$
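The truncation error ranges can be verified by brute force. In the sketch below, discarding bits in sign-magnitude or one's-complement form shortens the magnitude (rounding toward zero), while discarding bits of a two's-complement word rounds toward minus infinity. The function names and format labels are hypothetical, and the symbol for the original number of fractional bits (beta in the code) is assumed.

```python
import math
from fractions import Fraction


def truncate(x: Fraction, b: int, fmt: str) -> Fraction:
    """Truncate x (|x| < 1) to b fractional bits in the given format."""
    scaled = Fraction(x) * 2**b
    if fmt in ("sign-magnitude", "ones-complement"):
        kept = math.trunc(scaled)  # magnitude bits dropped: toward zero
    elif fmt == "twos-complement":
        kept = math.floor(scaled)  # word bits dropped: toward minus infinity
    else:
        raise ValueError("unknown format")
    return Fraction(kept, 2**b)


def truncation_error(x: Fraction, b: int, fmt: str) -> Fraction:
    """Error Q(X) - X of the truncation quantizer."""
    return truncate(x, b, fmt) - x


# X = -5/16 has 4 fractional bits; keep only 2 of them.
x = Fraction(-5, 16)
for name in ("sign-magnitude", "ones-complement", "twos-complement"):
    print(name, truncation_error(x, 2, name))
```

For negative X the error is non-negative in the first two formats and non-positive in two's complement, in agreement with the ranges listed above.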
2. Fixed-point: rounding
In case of rounding, the number is quantized to the nearest quantization level.
The rounding error does not depend on the format used to represent negative
numbers:
$$-\frac{1}{2}\left(2^{-b} - 2^{-\beta}\right) < \varepsilon_r \le \frac{1}{2}\left(2^{-b} - 2^{-\beta}\right) \tag{12.10.1}$$
In practice, $\beta \gg b$; therefore, $2^{-\beta} \approx 0$ in all expressions considered.
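The rounding error can be checked in the same brute-force way. The sketch below assumes that ties are rounded upward (one of several possible tie rules) and tests only the simplified range obtained when the 2^-beta term is neglected, that is, an error magnitude of at most half of 2^-b. The function name round_fixed is hypothetical.

```python
import math
from fractions import Fraction


def round_fixed(x: Fraction, b: int) -> Fraction:
    """Round x to the nearest multiple of 2**-b (ties rounded up)."""
    return Fraction(math.floor(Fraction(x) * 2**b + Fraction(1, 2)), 2**b)


beta, b = 10, 3
half_step = Fraction(1, 2 ** (b + 1))  # half of the quantization step 2**-b
errors = [
    round_fixed(Fraction(k, 2**beta), b) - Fraction(k, 2**beta)
    for k in range(-(2**beta) + 1, 2**beta)
]
print(min(errors), max(errors))
print(all(-half_step < e <= half_step for e in errors))
```

The same range is obtained for positive and negative inputs, which reflects the format independence stated in the text.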
3. Floating-point
Consider the floating-point representation
$$Q(X) = 2^E Q(M) \tag{12.11.1}$$
of a number
$$X = 2^E M \tag{12.11.2}$$
Quantization is carried out on the mantissa only in case of floating-point
numbers. Therefore, it is more reasonable to consider the relative error.
$$\varepsilon = \frac{Q(X) - X}{X} = \frac{Q(M) - M}{M} \tag{12.11.3}$$
In practice, a rounding quantizer can be modeled as follows:
$$Q(X) = 2^{-B} \cdot \mathrm{round}\left(X \cdot 2^{B}\right) \tag{12.11.4}$$
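The rounding quantizer model and the relative error of a floating-point quantizer can be sketched together. The function names below are hypothetical. Ties are rounded upward with floor(x + 0.5), because Python's built-in round() rounds ties to the nearest even integer, and the host double-precision format is assumed to be much finer than the B-bit mantissa being modeled.

```python
import math


def quantize_fixed(x: float, B: int) -> float:
    """Rounding quantizer Q(X) = 2**-B * round(X * 2**B)."""
    return math.floor(x * 2**B + 0.5) / 2**B


def quantize_float(x: float, B: int) -> float:
    """Quantize only the mantissa of x to B fractional bits."""
    mantissa, exponent = math.frexp(x)  # x = mantissa * 2**exponent
    return math.ldexp(quantize_fixed(mantissa, B), exponent)


def relative_error(x: float, B: int) -> float:
    """Relative error (Q(X) - X) / X of the mantissa quantizer."""
    return (quantize_float(x, B) - x) / x


print(quantize_fixed(0.3, 3))    # nearest multiple of 1/8 to 0.3
print(quantize_float(100.0, 3))  # 100 with a 3-bit mantissa
print(relative_error(100.0, 3))
```

Since the mantissa is at least 0.5 and its rounding error is at most half of 2^-B in magnitude, the relative error in this sketch stays within 2^-B regardless of the size of X.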