CSE 1520 Computer Use: Fundamentals

Download Report

Transcript CSE 1520 Computer Use: Fundamentals

Week 4:
Data Representation: PART II
•
READING: Chapter 3
1
EECS 1520 -- Computer Use: Fundamentals
Representing Real Numbers
• Real numbers have a whole part and a fractional part
• Example of real numbers in base 10: 12.05, 35.75, ….
• We have learned how to convert the whole part to binary digits, the question
now is, how do convert the fractional part to binary numbers?
2
EECS 1520 -- Computer Use: Fundamentals
Representing Real Numbers
• Example:
12.45 =
1 * 101 + 2 * 100 + 4 * 10-1 + 5 * 10-2
6.451 =
6 * 100 + 4 * 10-1 + 5 * 10-2 + 1 * 10-3
• The positions to the right of the decimal point work the same way, except
that the powers are negative.
3
EECS 1520 -- Computer Use: Fundamentals
Representing Real Numbers
• Same rules apply to base 2
• Example:
What is the decimal representation of the binary number 01010.110?
1 * 23 + 1 * 20 + 1 * 2-1 + 1 * 2-2
= 8 + 2 + 0.5 + 0.25
= 10.75
4
EECS 1520 -- Computer Use: Fundamentals
Representing Real Numbers
• How do we convert a real number in base 10 to its binary representation?
• Example:
What is the binary representation of the decimal number 24.25?
2 steps:
1. find the binary number of the integer part
2. find the binary number of the fractional part
In step 2, we multiply by the new base, the fractional part of the result is
then multiplied by the new base. The process continues until the fractional
part of the result is zero
5
EECS 1520 -- Computer Use: Fundamentals
Number systems: Two’s Complement
Step 1: convert 24 to binary:
24/2
12/2
6/2
3/2
1/2
Q
12
6
3
1
0
R
0
0
0
1
1
11000
Step 2: convert 0.25 to binary
0.25*2
0.5*2
0.5
1.0
01
24.25 in binary is: 11000.01
6
EECS 1520 -- Computer Use: Fundamentals
Representing Real Numbers
• What is the binary representation of the decimal number 0.4?
0.4*2
0.8*2
0.6*2
0.2*2
0.4*2
0.8*2
.
.
.
0.8
1.6
1.2
0.4
0.8
1.6
.
.
.
Starts repeating
• We can say the binary representation of 0.4 is: 0.011001…..
• The more bits that you get, the closer you will get to 0.4
7
EECS 1520 -- Computer Use: Fundamentals
Representing Real Numbers
• Adding two unsigned binary numbers with the fractional part:
• Example:
0110.011
+ 0011.010
6.375
3.25
1001.101
9.625
8
EECS 1520 -- Computer Use: Fundamentals
Representing Real Numbers
• Subtracting two unsigned binary numbers with the fractional part:
• Example:
0101.101
- 0011.010
5.625
3.25
0010.011
2.375
• We can look at the corresponding decimal representation to confirm the final
results.
9
EECS 1520 -- Computer Use: Fundamentals
Representing Text
• English language includes 26 letters
• Uppercase and lowercase letter have to be treated separately
so 52 unique characters will be required
• Punctuation characters (i.e. ; “” ! ,)
• Numeric digits (i.e. actual character ‘0’, ‘1’,… ‘9’)
Etc…
Two character sets:
• 1. ASCII Character Set
• 2. The Unicode Character Set
10
EECS 1520 -- Computer Use: Fundamentals
ASCII Character Set
ASCII (American Standard Code for Information Interchange):
Each character is coded as 1 byte (8 bits)
• The codes are expressed as decimal numbers, these values are translated to
their binary equivalent for storage in the computer.
11
EECS 1520 -- Computer Use: Fundamentals
ASCII Character Set
ASCII (American Standard Code for Information Interchange):
Each character is coded as 1 byte (8 bits)
• For example, the character “5” is represented as ASCII value 53
8-bit binary string: 0011 0101
12
EECS 1520 -- Computer Use: Fundamentals
ASCII Character Set
• Example:
• Given that the ASCII code for B is 66, expressed as a decimal vale, what is
the ASCII code, in hexadecimal, for the letter G?
Characters in the ASCII table are arranged in alphabetical order, hence:
If the character “B” is 66 in decimal, ….”G” will be 71 in decimal
71 in decimal is 47 in hexadecimal
13
EECS 1520 -- Computer Use: Fundamentals
Unicode Character Set
• ASCII character set provides 256 characters (i.e. 8 bits)
• Only enough to cover the characters in English
• Unicode was designed to be a superset of ASCII, that is, the first 256
characters in the Unicode character set correspond exactly to the extended
ASCII character set
• 16-bit standard
• 65,536 possible codes (= 216)
14
EECS 1520 -- Computer Use: Fundamentals
Representing Text
• It is important to find ways to store and transmit text efficiently
Data compression: is a reduction in the amount of space needed to
store a piece of data
Compression ratio: is the size of the compressed data divided by the
size of the original data
15
EECS 1520 -- Computer Use: Fundamentals
Representing Text
• It is important to find ways to store and transmit text efficiently
• 3 types of text compression:
1. Keyword encoding
2. Run-length encoding
3. Huffman encoding
16
EECS 1520 -- Computer Use: Fundamentals
Representing Text: Keyword Encoding
• Idea: substitute a frequently used word with a single character
• Example:
Word
Symbol
as
*
the
-
and
+
that
$
must
&
well
%
“ To do well on the test”
“ To do % on - test”
22 characters (including space)
17 characters (including space)
Compression ratio = 17/22 = 0.773
17
EECS 1520 -- Computer Use: Fundamentals
Representing Text: Keyword Encoding
• Idea: substitute a frequently used word with a single character
Limitations:
- these characters can't be part of the text
- frequently used words tend to be short, so not much compression
- word variations not handled: The vs. the
18
EECS 1520 -- Computer Use: Fundamentals
Representing Text: Run-Length Encoding
• Also called recurrence coding
• A single character may be repeated over and over again in a long sentence.
This type of repetition doesn’t generally take place in English text, but can
occur in large data streams
• Idea: replace long series of a repeated character with a count of the repetition
• A sequence of repeated characters is replaced by:
• a “flag character”
• Followed by the repeated character
• Followed by a single digit that indicates how many times the character is
repeated
• Example: AAAAAAA can be replaced with *A7
19
EECS 1520 -- Computer Use: Fundamentals
Representing Text: Run-Length Encoding
• Example: replace AAAAAAA with *A7

The character “A” is represented as ASCII value 65

So replace: 01000001 01000001 01000001 01000001 01000001 01000001 01000001

With: 00101010 01000001 00000111
A
*
A
A
A
A
A
A
A
7
20
EECS 1520 -- Computer Use: Fundamentals
Representing Text: Run-Length Encoding
• Idea: substitute a frequently used word with a single character
Limitations:
- It’s not worth it to encode strings of two or three (i.e. it takes 3 characters to
encode a repetition sequence)
- In the case of two repeated characters, encoding would actually make the
string longer
21
EECS 1520 -- Computer Use: Fundamentals
Representing Text: Huffman Encoding
• Idea: Using variable-length bit strings to represent each character
• The approach is to:
Use only a few bits to represent characters that appear often and
Use longer bit strings for character that don’t appear often
• Example:
Huffman Code
Character
00
A
01
E
100
L
110
O
111
R
1010
B
1011
D
•
The word “BELL” would be:
1010 01 100 100
•
Only 12 bits are required
•
Compared to the fixed-size bit string, for
example, if 8 bits are required to represent each
character, it would need 32 bits
•
A compression ratio (vs ASCII) of 12/32 = 0.375!
22
EECS 1520 -- Computer Use: Fundamentals
Representing Text: Huffman Encoding
• Idea: Using variable-length bit strings to represent each character
• Example:
Huffman Code
Character
00
A
01
E
100
L
110
O
111
R
1010
B
1011
D
How about decoding the bit string with
Huffman Encoding?
For example, can we decode
0100111?
•
“0100111” would be decoded into the word “EAR”
23
EECS 1520 -- Computer Use: Fundamentals
Representing Audio Data
• A stereo sends an electrical signal to a speaker to produce sound, this signal
is an analog representation of the sound wave.
x(t)
The magnitude of the voltage signal
varies in direct proportion to the sound
wave
magnitude
t
24
EECS 1520 -- Computer Use: Fundamentals
Representing Audio Data
• To represent audio data on a computer, we must digitize the sound wave data.
• That is, we take the electric signal that represents the sound wave and
represent it as a series of discrete numeric values
Analog signal
Sampling period
To digitize the signal, we periodically measure the voltage of
the signal, a process called “Sampling”
Higher sampling rate produces better-quality sound
25
EECS 1520 -- Computer Use: Fundamentals
Representing Audio Data
• A sampling rate of around 40,000 times per second is enough to create a
reasonable sound reproduction
Part of the data is lost
High sampling rate
Low sampling rate
26
EECS 1520 -- Computer Use: Fundamentals
Representing Audio Data
• Popular audio data formats: WAV, AU, MP3
• All use some form of compression
• MP3 offers the strongest compression ratio than other formats
MP3
• MP3 is short for MPEG-2, audio layer 3 file
• Uses both lossy and lossless compression
- Lossless: bit stream compressed by a form of
Huffman encoding
- Lossy: uses mathematical models of human
psychoacoustics to discard information the
human ear can’t hear
27
EECS 1520 -- Computer Use: Fundamentals
Representing Images and Graphics
Representing Color:
• 3 basic colors: red, green, and blue
• Color is often expressed as an RGB (red-green-blue) value, which is actually
three numbers that indicate the relative contribution of each of these three
primary colors
• If each number in the triple is given on a scale of 0 to 255, 0 means no
contribution of that color and 255 means full contribution
28
EECS 1520 -- Computer Use: Fundamentals
Representing Images and Graphics
• The amount of data that is used to represent a color is called the “color depth”
• For example, TrueColor indicates a 24-bit color depth. With this scheme, each
number in an RGB value gets 8 bits, which gives the range of 0 to 255 for each
• This results in the ability to represent more than 16.7million unique colors!
29
EECS 1520 -- Computer Use: Fundamentals
Representing Images and Graphics
• For example, (255,255,0) means no contribution from “blue”, and this
corresponds to bright yellow”
“TrueColor” RGB values
Three-dimensional color space
30
EECS 1520 -- Computer Use: Fundamentals
Representing Images and Graphics
• A photograph is an analog representation of an image, it is continuous across
its surface
• Digitizing a picture means representing the picture as a collection of individual
dots called “pixels”
• Each pixel is composed of a single color
• The number of pixels used to represent a picture is called the “resolution”
31
EECS 1520 -- Computer Use: Fundamentals
Representing Images and Graphics
Digitized Images and Graphics:
• High resolution image
(i.e. more pixels)
• Low resolution image
• (i.e. less pixels)
32
EECS 1520 -- Computer Use: Fundamentals
Representing Images and Graphics
• The storage of image information on a pixel-by-pixel basis is called a rastergraphics format.
• Several popular raster-graphics file formats are: bitmap (BMP), JPEG
33
EECS 1520 -- Computer Use: Fundamentals
Representing Images and Graphics
• Another technique for representing images is called vector graphics.
• Instead of assigning colors to pixels, vector graphics format describes an
image in terms of lines and geometric shapes
• For example, a vector graphics can be a series of commands that describe a
line’s direction, thickness and color.
• Suitable for line art and cartoon-style drawing
34
EECS 1520 -- Computer Use: Fundamentals
Representing Videos
• Video (film) is a stream of images/frames (at 24 or 30 fps [frame per second])
• A video codec (COmpressor/DECompressor) refers to the methods used to
shrink the size of a movie
Two types of compression in video codec:
• Temporal compression: keyframe + series of delta frames that record only
changes from keyframe (good if image changes little, i.e. such as a scene that
has little movement)
• Spatial compression: removes redundant info within a frame (essentially jpeg
like compression on each frame). This compression often groups pixels into
blocks that have the same color (for example, a portion of a clear blue sky).
Instead of storing each pixel, the color and the coordinates of the area are stored.
35