Data Representation - United International College


Introduction to Information Technology
Lecture 3
Data Representation
Dr. Ken Tsang 曾镜涛
Email: [email protected]
http://www.uic.edu.hk/~kentsang/IT/IT3.htm
Room E408 R9
Outline









Distinguish between analogue and digital information
Explain data compression and compression ratios
Examine the binary formats for negative values
Describe the characteristics of the ASCII and
Unicode character sets
Explain the nature of sound and its representation
Explain how RGB values define a colour
Look at representing Audio Information
Look at representing Images & Graphics
Look at representing Video Information
Data Representation

Data comes in many forms



Numbers: 235, 11.01, -24, …
Text: “hello, world!” “你好!”
Audio: .mp3
Images and graphics: .bmp, .gif, .jpeg
Video: .avi
All of the data is stored in computers as binary digits
Data must be represented in a way that
• captures the essence of the information, and
• is in a form that is convenient for computer processing
Data Compression



Data compression
• Reduction in the amount of space needed to store a piece of data
Compression ratio
• The size of the compressed data divided by the size of the original data
Data compression techniques can be
• lossless, which means the data can be retrieved without any loss of the original information, or
• lossy, which means some information may be lost in the process of compression
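As a minimal sketch of lossless compression and the compression ratio, the Python snippet below uses the standard `zlib` module (chosen here only for illustration; it is not what any particular archiver uses). The input bytes are a made-up, highly repetitive example, which is why the ratio comes out well below 1.

```python
import zlib

def compression_ratio(original):
    """Compress losslessly with zlib; return (ratio, compressed bytes).

    The ratio is compressed size divided by original size,
    so a smaller ratio means better compression.
    """
    compressed = zlib.compress(original)
    return len(compressed) / len(original), compressed

data = b"ABABABABAB" * 100          # 1,000 highly repetitive bytes (example input)
ratio, packed = compression_ratio(data)
print(f"{len(data)} -> {len(packed)} bytes, ratio = {ratio:.3f}")

# Lossless: decompressing restores exactly the original bytes.
print(zlib.decompress(packed) == data)   # True
```

Random or already-compressed data would give a ratio close to (or even slightly above) 1, since there is little redundancy left to remove.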
WinRAR
• A popular file archiver that uses lossless compression
• WinRAR Tutorial: http://users.pandora.be/soulmaniacs/winrar.html
Data about the world around us
Is the physical world around us smooth and continuous?
• At the microscopic level, materials are all made up of molecules and atoms, and energy is exchanged in discrete units called quanta.
• A smooth and continuous physical world is an illusion due to our limited senses.
Analogue Data: an example




Analogue: something that is analogous or
similar to something else (Webster)
Analogue Data: The use of continuously
changing quantities to represent data.
A mercury thermometer is an analogue
device. The mercury rises and falls in a
continuous flow in the tube in direct
proportion to the temperature.
The mathematical idealization of this smooth change as a continuous function leads to “analogue data”: in principle an infinite amount of data, because a continuous function has a value at every one of infinitely many points
From Analogue to Digital data

Data can be represented in one of two ways, analogue or digital:
Analogue data: a continuous representation (using a mathematical function or smooth curve), analogous to the actual information it represents
Digital data: a discrete representation, breaking the information up into separate elements (data)
Digital data in computer
Computer components are discrete in nature
• Computer memory and other hardware (e.g. the CPU) have only finite room to store and manipulate data
• The goal is to represent enough of the world to satisfy our computational needs and our senses of sight and sound
Digitized Information



Computers cannot deal with analogue information directly
So we digitize information by breaking it into pieces and representing those pieces separately
Why do we use binary?
• Modern computers are designed to use and manage binary values because the devices that store and manage the data are far less expensive and far more reliable if they only have to represent one of two possible values
Electronic Signals




An analogue signal continually fluctuates in
voltage up and down
A digital signal has only a high or low state,
corresponding to the two binary digits
All electronic signals (both analogue and
digital) degrade as they move down a line
The voltage of the signal fluctuates due to
environmental effects
Analogue and Digital Information
Periodically, a digital signal is reclocked to regain its original shape
[Figure: An analogue and a digital signal]
[Figure: Degradation of analogue and digital signals]
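To illustrate why a digital signal can be restored while an analogue one cannot, here is a minimal Python sketch. It assumes idealized 0 V and 5 V levels and uniform random noise of up to ±1 V (all hypothetical values). It models only the level-restoring part of regeneration, not the timing part of reclocking: any received voltage above the midpoint is read back as a 1.

```python
import random

HIGH, LOW = 5.0, 0.0          # assumed voltage levels for 1 and 0
THRESHOLD = (HIGH + LOW) / 2  # midpoint decision level

def transmit(bits, noise, rng):
    """Map bits to voltages, then add uniform noise in [-noise, +noise]."""
    return [(HIGH if b else LOW) + rng.uniform(-noise, noise) for b in bits]

def regenerate(voltages):
    """Restore clean levels by comparing each voltage with the threshold."""
    return [1 if v > THRESHOLD else 0 for v in voltages]

rng = random.Random(42)       # fixed seed so the run is repeatable
bits = [1, 0, 1, 1, 0, 0, 1, 0]
received = transmit(bits, noise=1.0, rng=rng)
print([round(v, 2) for v in received])   # degraded voltages
print(regenerate(received))              # the same bits that were sent
```

As long as the noise stays smaller than half the gap between the two levels, the original bits are recovered exactly; an analogue signal has no such threshold to snap back to.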
Binary Representation




One bit can be either 0 or 1
Therefore, one bit can represent only two
things
To represent more than two things, we need
multiple bits
Two bits can represent four things because
there are four combinations of 0 and 1 that
can be made from two bits: 00, 01, 10, 11
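The combinations of bits can be listed mechanically. This short Python sketch (the helper name `bit_patterns` is hypothetical) uses `itertools.product` to enumerate every pattern of n bits in counting order.

```python
from itertools import product

def bit_patterns(n):
    """Return every n-bit pattern as a string, in counting order."""
    return ["".join(bits) for bits in product("01", repeat=n)]

print(bit_patterns(2))        # ['00', '01', '10', '11']
print(len(bit_patterns(3)))   # 8
```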
Binary Representation
Bits   Represents
1      2 numbers
2      4 numbers
3      8 numbers
4      16 numbers
5      32 numbers
Binary Representation



In general, n bits can represent $2^n$ things because there are $2^n$ combinations of 0 and 1 that can be made from n bits
Note that every time we increase the number of bits by 1, we double the number of things we can represent
Questions:
How many bits are needed to represent 128 things?
How many bits are needed to represent 67 things?
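The questions above ask for the smallest n with $2^n$ at least as large as the number of things. A minimal Python sketch (the helper name `bits_needed` is hypothetical) computes it with `int.bit_length`, which avoids floating-point logarithms.

```python
def bits_needed(count):
    """Smallest n such that 2**n >= count (at least 1 bit)."""
    if count < 1:
        raise ValueError("count must be positive")
    # (count - 1).bit_length() is the number of bits needed to write
    # the largest code 0..count-1 in binary.
    return max(1, (count - 1).bit_length())

for things in (2, 4, 67, 128, 129):
    print(things, "things need", bits_needed(things), "bits")
```

For example, 67 things still need 7 bits, since 6 bits give only 64 combinations; 129 things need 8.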
Representing Negative Values

You have already used the sign-magnitude representation of numbers, e.g. +24 and -24
• The sign represents the ordering/direction
• The digits represent the magnitude of the number
Representing Negative Values

Problems with the sign-magnitude representation
• There are two representations of zero (plus zero and minus zero, +0 and -0), which can cause unnecessary complexity
• The negative sign itself must somehow be encoded in the stored digits
If we allow only a fixed number of values (stored in n bits), we can instead represent numbers as plain integer values, letting half of them represent negative numbers
Representing Negative Values


For example, if the maximum number of decimal digits we can represent is two, we can let 1 through 49 be the positive numbers 1 through 49 and let 50 through 99 represent the negative numbers -50 through -1 (a negative number -k is stored as 100 - k)
This representation of negative numbers is called the ten’s complement
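A minimal Python sketch of the two-digit scheme, assuming the ranges given above (codes 0 to 49 are non-negative, 50 to 99 stand for -50 to -1). The names `encode` and `decode` are hypothetical; taking the value modulo 100 turns -k into 100 - k.

```python
MODULUS = 10 ** 2             # two decimal digits

def encode(value):
    """Ten's complement code for a value in [-50, 49]."""
    if not -50 <= value <= 49:
        raise ValueError("value out of range for two digits")
    return value % MODULUS    # -k becomes 100 - k

def decode(code):
    """Codes 0-49 are non-negative; codes 50-99 stand for -50 to -1."""
    return code - MODULUS if code >= MODULUS // 2 else code

print(encode(-1), encode(-50), encode(49))   # 99 50 49
print(decode(99), decode(50))                # -1 -50
```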
Advantages of Using 10’s Complement
To perform addition within this scheme, you just add the numbers together and discard any carry
Advantages of Using 10’s Complement
A - B = A + (-B): we can subtract one number from another by adding the negative of the second to the first
Addition and subtraction therefore become the same operation
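Both advantages can be seen in a small Python sketch of two-digit ten's complement arithmetic. The helper names are hypothetical; "discard any carry" is modelled as keeping the sum modulo 100, and subtraction reuses addition via A + (-B).

```python
MODULUS = 100                 # two decimal digits

def encode(value):
    """Ten's complement code for a value in [-50, 49]."""
    return value % MODULUS

def decode(code):
    """Codes 50-99 stand for -50 to -1."""
    return code - MODULUS if code >= MODULUS // 2 else code

def add(a_code, b_code):
    """Add two codes and discard any carry out of the two digits."""
    return (a_code + b_code) % MODULUS

def negate(code):
    """Code for -x, given the code for x."""
    return (MODULUS - code) % MODULUS

def subtract(a_code, b_code):
    """A - B computed as A + (-B): no separate subtraction step."""
    return add(a_code, negate(b_code))

print(decode(add(encode(-5), encode(7))))        # 2  (95 + 07 = 102, carry dropped)
print(decode(subtract(encode(3), encode(10))))    # -7 (03 + 90 = 93)
```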
2’s Complement
3 bits:
000    0
001   +1
010   +2
011   +3
100   -4
101   -3
110   -2
111   -1
8 bits:
[Table: 8-bit two's complement values]
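Two's complement is the binary counterpart of ten's complement, with modulus $2^n$ instead of 100. This Python sketch (function names hypothetical) reproduces the 3-bit table and shows a few 8-bit patterns.

```python
def to_twos(value, bits):
    """Bit string for value in n-bit two's complement."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise OverflowError(f"{value} does not fit in {bits} bits")
    # Masking keeps the low n bits, i.e. value modulo 2**n.
    return format(value & ((1 << bits) - 1), f"0{bits}b")

def from_twos(pattern):
    """Signed value of a two's complement bit string."""
    raw = int(pattern, 2)
    return raw - (1 << len(pattern)) if pattern[0] == "1" else raw

for v in range(-4, 4):
    print(to_twos(v, 3), v)
print(to_twos(-1, 8), to_twos(-128, 8), to_twos(127, 8))
# 11111111 10000000 01111111
```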
Overflow



Overflow occurs when the value that we
compute cannot fit into the number of bits
we have allocated for the result
For example, if each value is stored using eight bits in two's complement (range -128 to 127), adding 127 to 3 causes overflow
Overflow is a classic example of the type of problems we encounter by mapping an infinite world onto a finite machine
Overflow
   01111111     127
 + 00000011   +   3
   --------
   10000010    -126 in 8-bit two's complement (the true sum, 130, does not fit)
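The overflow example can be checked with a small Python sketch of fixed-width addition (the name `add_fixed` is hypothetical). It keeps only the low 8 bits of the sum, reinterprets them as two's complement, and flags overflow when the result differs from the true sum.

```python
BITS = 8

def add_fixed(a, b, bits=BITS):
    """Add two signed ints in n-bit two's complement; report overflow."""
    mask = (1 << bits) - 1
    raw = (a + b) & mask                     # keep only the low n bits
    result = raw - (1 << bits) if raw >> (bits - 1) else raw
    overflow = result != a + b               # the true sum did not fit
    return result, format(raw, f"0{bits}b"), overflow

print(add_fixed(127, 3))    # (-126, '10000010', True)
print(add_fixed(100, -3))   # (97, '01100001', False)
```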
Representing Text


A text document can be decomposed into chapters,
paragraphs, sentences, words, and ultimately
individual characters
To represent a text document in digital form, we
simply need to be able to represent every character
that may appear


In English, “a, b, …, z, A, B, …, Z”
The general approach for representing characters is to list them all and assign each a binary string
‘a’ = $(01100001)_2$ = $(97)_{10}$ = 61h
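In Python, the built-ins `ord` and `chr` convert between a character and its code, and `format` shows that code in binary or hexadecimal, reproducing the ‘a’ example.

```python
ch = "a"
code = ord(ch)                       # the character's numeric code
print(format(code, "08b"))           # 01100001 (binary, 8 bits)
print(code)                          # 97 (decimal)
print(format(code, "X") + "h")       # 61h (hexadecimal)
print(chr(0x61))                     # a (decoding goes the other way)
```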
Character Set
A character set is a list of characters and the codes used to represent them
• By agreeing to use a particular character set, computer manufacturers have made the processing of text data easier
• ASCII, Unicode, etc.
ASCII
ASCII stands for American Standard Code for Information Interchange
• The ASCII character set originally used seven bits to represent each character, allowing for 128 unique characters
• Later, extended ASCII character sets used all eight bits, allowing for 256 characters
ASCII
[Figure: the ASCII character chart]
ASCII

Note that the first 32 characters in the ASCII character chart (codes 0 to 31) do not have a simple character representation that you could print to the screen; they are unprintable control characters
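A quick Python check of the unprintable range. It uses Python's `str.isprintable`, which follows Unicode character categories, as a stand-in for "can be printed to the screen". Besides codes 0 to 31, it also flags code 127 (DEL), another ASCII control character.

```python
# Scan the 7-bit ASCII range for characters that are not printable.
unprintable = [c for c in range(128) if not chr(c).isprintable()]
print(unprintable)                    # 0 through 31, plus 127 (DEL)
print(repr(chr(9)), repr(chr(10)))    # tab and line feed are control codes
```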
Unicode Character Set


The extended ASCII character set is not enough for international use
The Unicode character set was originally designed with 16 bits per character
• Therefore, it could represent $2^{16}$ = 65,536 characters
• Unicode has since grown beyond 16 bits; its code points now run from U+0000 to U+10FFFF
Unicode was designed to be a superset of ASCII
• The first 256 characters in the Unicode character set correspond exactly to the ISO 8859-1 (Latin-1) extended ASCII character set
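A short Python sketch showing Unicode code points with `ord`, including a character from the “你好!” example, and checking that the first 256 code points match the Latin-1 encoding byte for byte.

```python
for ch in ("A", "é", "你"):
    # Code point in U+XXXX notation, and whether it fits in 8 bits.
    print(ch, f"U+{ord(ch):04X}", ord(ch) < 256)

# Decoding every byte value 0-255 as Latin-1 gives the same code points.
latin1 = bytes(range(256)).decode("latin-1")
print(all(ord(c) == i for i, c in enumerate(latin1)))   # True
```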
[Figure: Unicode]
Representing Audio Information
We perceive sound when a series of air compressions vibrate a membrane in our ear, which sends signals to our brain
• A stereo sends an electrical signal to a speaker to produce sound
• This signal is an analogue representation of the sound wave
• The voltage in the signal varies in direct proportion to the sound wave
Representing Audio Information

To digitize the signal, we periodically measure the voltage of the signal and record the appropriate numeric value
• This process is called sampling
In general, a sampling rate of around 40,000 times per second is enough to create a reasonable sound reproduction
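A minimal Python sketch of sampling, assuming a pure 440 Hz sine tone stands in for the signal voltage and that each sample is stored as a 16-bit integer. The rate of 44,100 samples per second is the one used by audio CDs; the function name `sample_tone` is hypothetical.

```python
import math

def sample_tone(freq_hz, rate_hz, duration_s, amplitude=32767):
    """Sample a pure sine tone and quantize each sample to a 16-bit integer."""
    n = round(rate_hz * duration_s)            # how many measurements to take
    return [
        round(amplitude * math.sin(2 * math.pi * freq_hz * i / rate_hz))
        for i in range(n)
    ]

samples = sample_tone(440, 44_100, 0.01)      # 10 ms of a 440 Hz tone
print(len(samples))                           # 441
print(samples[:5])
```

A higher sampling rate captures faster changes in the wave at the cost of more numbers to store.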
[Figure: Representing Audio Information]
Representing Audio Information
• A compact disc (CD) stores audio information digitally
• On the surface of the CD are microscopic pits that represent binary digits
• A low-intensity laser is pointed at the disc
• The laser light reflects strongly if the surface is smooth and reflects poorly if the surface is pitted
Representing Audio Information

Audio Formats
• WAV, AU, AIFF, VQF, and MP3
• MP3 is the dominant format
MP3 is short for MPEG Audio Layer III (MPEG: Moving Picture Experts Group)
• MP3 employs both lossy and lossless compression
• First it analyzes the frequency spread and compares it to mathematical models of human psychoacoustics (the study of the interrelation between the ear and the brain), then it discards information that humans cannot hear
• Then the remaining bit stream is compressed losslessly (using Huffman coding) to achieve additional compression
Representing Colour
Colour is our perception of the various frequencies of light that reach the retinas of our eyes
• Our retinas have three types of colour photoreceptor cone cells that respond to different sets of frequencies
• These photoreceptor categories correspond to the colours red, green, and blue
Representing Colour
Colour is often expressed in a computer as an RGB (red-green-blue) value, which is actually three numbers that indicate the relative contribution of each of these three primary colours
• For example, an RGB value of (255, 255, 0) maximizes the contribution of red and green and minimizes the contribution of blue, which results in a bright yellow
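RGB values with one byte per component are often written as a six-digit hexadecimal string, as in HTML and CSS `#RRGGBB` colours. A small Python sketch (the name `rgb_to_hex` is hypothetical):

```python
def rgb_to_hex(r, g, b):
    """Format an 8-bit-per-component RGB triple as a #RRGGBB string."""
    for c in (r, g, b):
        if not 0 <= c <= 255:
            raise ValueError("each component must be 0-255")
    return f"#{r:02X}{g:02X}{b:02X}"

print(rgb_to_hex(255, 255, 0))   # #FFFF00 (bright yellow)
print(rgb_to_hex(0, 0, 0))       # #000000 (black)
```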
Three-Dimensional Colour Space
[Figure: RGB colour cube, from (0,0,0) black to (1,1,1) white]
Representing Images and Graphics
The amount of data that is used to represent a colour is called the colour depth
HiColour is a term that indicates a 16-bit colour depth
• Five bits are used for each number in an RGB value and the extra bit is sometimes used to represent transparency
TrueColour indicates a 24-bit colour depth
• Each number in an RGB value gets eight bits
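A Python sketch of packing a colour into 24-bit TrueColour and 16-bit HiColour values. The bit layout is an assumption for illustration (transparency in the top bit, then red, green, blue); real 16-bit formats vary, and some use 5-6-5 bits instead. The function names are hypothetical.

```python
def pack_true_colour(r, g, b):
    """24-bit TrueColour: 8 bits each, laid out as R, G, B."""
    return (r << 16) | (g << 8) | b

def pack_hi_colour(r, g, b, a=0):
    """16-bit HiColour sketch: 1 transparency bit, then 5 bits per component.

    8-bit inputs are reduced to 5 bits by dropping their 3 low bits.
    """
    return (a << 15) | ((r >> 3) << 10) | ((g >> 3) << 5) | (b >> 3)

yellow = (255, 255, 0)
print(f"{pack_true_colour(*yellow):024b}")   # 111111111111111100000000
print(f"{pack_hi_colour(*yellow):016b}")     # 0111111111100000
```

Dropping the low bits is why HiColour can show only 32 levels per component, compared with 256 in TrueColour.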
Indexed Colour
• A particular application such as a browser may support only a certain number of specific colours, creating a palette from which to choose.
• For example:
[Figure: a sample colour palette]
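A minimal Python sketch of indexed colour, assuming a hypothetical 4-entry palette (real palettes, such as GIF's, hold up to 256 entries). Each pixel stores only a small index into the palette rather than a full RGB value.

```python
# Hypothetical 4-colour palette: 2 bits per pixel are enough to index it.
PALETTE = [
    (0, 0, 0),        # 0: black
    (255, 255, 255),  # 1: white
    (255, 0, 0),      # 2: red
    (255, 255, 0),    # 3: yellow
]

# A tiny 2 x 4 image stored as palette indices, not RGB triples.
image = [
    [0, 1, 1, 0],
    [2, 3, 3, 2],
]

def expand(indexed, palette):
    """Look up each pixel's index in the palette to get its RGB value."""
    return [[palette[i] for i in row] for row in indexed]

rgb_image = expand(image, PALETTE)
print(rgb_image[1][1])   # (255, 255, 0)
```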
Digitized Images and Graphics




Digitizing a picture is the act of representing
it as a collection of individual dots called
pixels
The number of pixels used to represent a picture is called the resolution
Storage of image information on a pixel-by-pixel basis is called a raster-graphics format
Several popular raster file formats exist, including bitmap (BMP), GIF, and JPEG
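The storage cost of a raster image follows directly from the resolution and the colour depth. This Python sketch computes the uncompressed pixel data only, ignoring file headers, padding, and compression; the image sizes are hypothetical examples.

```python
def raw_image_bytes(width, height, bits_per_pixel):
    """Uncompressed raster size: one colour value per pixel."""
    return width * height * bits_per_pixel // 8

print(raw_image_bytes(1024, 768, 24))   # 2359296 bytes (TrueColour)
print(raw_image_bytes(1024, 768, 16))   # 1572864 bytes (HiColour)
print(raw_image_bytes(512, 384, 24))    # 589824 bytes (half the width and height)
```

Halving both dimensions quarters the storage, which is the trade-off between the high- and low-resolution versions of a picture.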
[Figure: BMP]
[Figure: Digitized Images and Graphics, high resolution]
[Figure: Digitized Images and Graphics, low resolution]
Representing Video
A video codec (COmpressor/DECompressor) refers to the methods used to shrink the size of a movie so that it can be played on a computer or over a network
• Almost all video codecs use lossy compression to minimize the huge amounts of data associated with video
• The goal is not to lose information that affects the viewer's senses
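To see why video needs compression, this Python sketch computes the data rate of uncompressed video. The frame size (640 x 480), TrueColour depth, and 30 frames per second are hypothetical example values.

```python
def raw_video_rate(width, height, bits_per_pixel, fps):
    """Bytes per second for uncompressed video frames."""
    return width * height * bits_per_pixel // 8 * fps

rate = raw_video_rate(640, 480, 24, 30)
print(f"{rate:,} bytes/s")                       # 27,648,000 bytes/s
print(f"{rate * 60 / 1e9:.2f} GB per minute")    # 1.66 GB per minute
```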
Video Players




QuickTime Player (Apple)
RealPlayer
VLC media player
Windows Media Player (Microsoft)
Summary









Distinguished between analogue and digital information
Explained data compression and compression ratios
Examined the binary formats for negative values
Described the characteristics of the ASCII and
Unicode character sets
Explained the nature of sound and its representation
Explained how RGB values define a colour
Looked at representing Audio Information
Looked at representing Images & Graphics
Looked at representing Video Information
The links
(for your website) to the glossary, PDF (single) and PDF (2x2) are here:
http://www.uic.edu.hk/~davetowey/teaching/CS/it1010/lectures/3.Glossary.pdf
http://www.uic.edu.hk/~davetowey/teaching/CS/it1010/lectures/3.Data.Representation.pdf
http://www.uic.edu.hk/~davetowey/teaching/CS/it1010/lectures/2x2_3.Data.Representation.pdf