Introduction to Computers and Terminology


Data Representation
CS280 – 09/13/05
Binary (from a Hacker’s dictionary)
A base-2 numbering system with only two digits, 0 and
1, which is perfectly suited for electronic operations
since it can be expressed by power states (on/off),
voltage levels (high/low) or charge
(positive/negative), but is less than ideal for humans,
who find it awkward to say things like “It’s a Catch –
10110 situation,” “He’s the 11011010-pound gorilla,”
and “That’s the 10110010011011001-dollar question.”
“There are only 10 kinds of people in the world…those
that understand binary and those that do not.” (from
the ACM CS t-shirt).
Looking at more data
 Character representations
Data and Data Representation
 So how is this data that we operate on stored
in the computer?
Let’s start with numbers
 Binary codes are Base 2
 We “think” and operate in Base 10.
 What does this mean?
Counting
 Base 10 has 10 digits to represent different
numbers of things
 Base 2 has only 2 digits available.
 Counting
Base 10
0 1 2 3 4 5 6 7 8 9
 then we run out of unique digits.
So we move to a positional system.
10 means we have ten things: a 1 in the 10’s
position and no more things in the 1’s position.
Binary counting
0
1
 then we run out of digits
10
This represents the number 2. A 1 in the 2’s
position and a 0 in the 1’s position.
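To see the two counting systems side by side, here is a minimal Python sketch that writes the same quantities in both bases. The function name count_in_both_bases is made up for this illustration.

```python
def count_in_both_bases(limit):
    """Return (decimal, binary) string pairs for 0 through limit."""
    return [(str(n), format(n, "b")) for n in range(limit + 1)]


for decimal, binary in count_in_both_bases(4):
    print(decimal, "in base 10 is", binary, "in base 2")
```

Binary runs out of digits after 1, so the quantity two already needs a second position.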
Positional notation
10’s positions represented
 1 * 10^0 = 1
 1 * 10^1 = 10
 1 * 10^2 = 100
 1 * 10^3 = 1,000
 1 * 10^4 = 10,000
 1 * 10^5 = 100,000
 1 * 10^6 = 1,000,000
2’s positions represented
 1 * 2^0 = 1
 1 * 2^1 = 2
 1 * 2^2 = 4
 1 * 2^3 = 8
 1 * 2^4 = 16
 1 * 2^5 = 32
 1 * 2^6 = 64
How does the computer then store
numbers?
Let’s say we want to represent the number 53 in
binary.
53 (base 10) = 110101 (base 2)
Why?
See chart next page.
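One way to answer the "Why?" is to list the place values that have a 1 in them and add them up. The sketch below does that; place_values is a hypothetical helper name, not something from the textbook.

```python
def place_values(bits):
    """List the power-of-2 place values where the bit string has a 1."""
    values = []
    # Walk from the right-most bit (the 1's position) to the left.
    for position, bit in enumerate(reversed(bits)):
        if bit == "1":
            values.append(2 ** position)
    return values[::-1]  # largest place value first


used = place_values("110101")
print(used, "add up to", sum(used))  # [32, 16, 4, 1] add up to 53
```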
Converting from binary to decimal
Use chart
 1 * 2^0 = 1
 1 * 2^1 = 2
 1 * 2^2 = 4
 1 * 2^3 = 8
 1 * 2^4 = 16
 1 * 2^5 = 32
 1 * 2^6 = 64
 1 * 2^7 = 128
 1 * 2^8 = 256

73 decimal
Must use 7 bits
xxxxxxx
73 - 64 = 9
1xxxxxx
32 and 16 are not used
100xxxx
9 - 8 = 1
1001xxx
4 and 2 are not used; last digit is 1
1001001
What is the general subtraction
algorithm to convert from a decimal
number to binary?
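The steps on this slide can be sketched in Python as follows: go through the place values from largest to smallest, write a 1 and subtract whenever the place value fits, otherwise write a 0. The function name decimal_to_binary and its bits parameter are assumptions for this illustration.

```python
def decimal_to_binary(number, bits=7):
    """Convert a non-negative integer to a bit string by repeated subtraction."""
    if number < 0 or number >= 2 ** bits:
        raise ValueError("number does not fit in the given number of bits")
    digits = ""
    for position in range(bits - 1, -1, -1):
        place = 2 ** position
        if place <= number:
            digits += "1"      # this place value is used
            number -= place    # carry the remainder to the next step
        else:
            digits += "0"      # this place value is not used
    return digits


print(decimal_to_binary(73))          # 1001001
print(decimal_to_binary(53, bits=6))  # 110101
```

For 73 the remainders are 9 after subtracting 64 and 1 after subtracting 8, which matches the subtraction steps shown.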
Converting binary to decimal
 274 – decimal
 1011 – binary
 4 * 100 =
 1 * 20 =
4
 7 * 101 = 70
 2 * 102 = 200
total
274
 1 * 21 =
 0 * 22 =
 1 * 23 =
total
1
2
0
8
11
What is the general algorithm for taking a number in base X and
converting it to its base 2 equivalent?
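Both totals above follow one pattern: multiply each digit by the base raised to its position, then add. The sketch below applies that pattern to any base up to 16; once the value is known, it can be written out in base 2. The names DIGITS and to_value are made up for this example.

```python
DIGITS = "0123456789ABCDEF"


def to_value(text, base):
    """Add up digit * base**position, starting from the right-most digit."""
    total = 0
    for position, symbol in enumerate(reversed(text.upper())):
        digit = DIGITS.index(symbol)
        if digit >= base:
            raise ValueError(f"{symbol!r} is not a digit in base {base}")
        total += digit * base ** position
    return total


print(to_value("274", 10))  # 274
print(to_value("1011", 2))  # 11
# Writing the value in base 2 is then a separate step:
print(format(to_value("274", 10), "b"))
```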
Number representation
 Numbers are represented by their
corresponding binary representation
 We are disregarding sign
 We are disregarding floating point
 What about other kinds of data?
Think about the binary values as a
kind of code.
The binary values represent codes
 How many different values can be stored in 1
bit?
 How many in 2 bits?
 How many in 4 bits?
 How many in a byte?
General form encoding
 If you have x possible unique symbols, and y
positions for any one of those symbols, then
the general number of unique codes is
x^y
 Example, you have 2 dice each of which has
6 different face values, so there are 36 or 6^2
possible unique codes.
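The counting rule can be checked with a few lines of Python. This is only a sketch: count_codes is a hypothetical name, and the dice are counted as ordered pairs, so (1, 2) and (2, 1) are different codes.

```python
from itertools import product


def count_codes(symbols, positions):
    """Number of unique codes: symbols raised to the power of positions."""
    return symbols ** positions


# Bits have 2 symbols (0 and 1); a byte has 8 positions.
for bits in (1, 2, 4, 8):
    print(bits, "bit(s) can store", count_codes(2, bits), "different values")

# List every ordered outcome of two six-sided dice and count them.
dice_rolls = list(product(range(1, 7), repeat=2))
print(len(dice_rolls), "dice codes")
```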
ASCII codes represent characters of
data
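 Each character is stored in 1 byte (8 bits);
standard ASCII defines 128 codes, which need
only 7 of those bits.
 Unicode extends the ASCII codes, originally by
another byte (16 bits per character).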
 ASCII can form most of the characters used
by “Western” languages along with
punctuation symbols.
 Unicode allows for special symbols and
symbols in other languages like Japanese,
Chinese, Arabic
Figure 8.7. ASCII, The American Standard Code for
Information Interchange (page 220)
Reading the chart
 Left column is the left side of the byte (group
of 8 bits) (another term is the high order)
 Right column is the right side of the byte.
 Value is the corresponding binary code.
Binary to hex
 Hexadecimal (base 16) codes can be used to
represent groups of 4 binary digits.
 Hexadecimal counting:
0 1 2 3 4 5 6 7 8 9 A B C D E F
Hex   Decimal   Binary
A     10        1010
B     11        1011
C     12        1100
D     13        1101
E     14        1110
F     15        1111
So the letter Z can be abbreviated:
0101 1010 in binary
5A in hex
Commonly, binary numbers are written in
groups of 4 digits, with leading 0’s
used as placeholders.
Hex numbers are shown as 2-digit groups with a
space between each group of two.
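The grouping rule can be sketched in Python: split the bits into groups of 4 and replace each group with one hex digit. The helper name bits_to_hex is an assumption for this example.

```python
def bits_to_hex(bits):
    """Convert a bit string to hex, one hex digit per group of 4 bits."""
    bits = bits.replace(" ", "")
    # Pad on the left with placeholder zeros so the length is a multiple of 4.
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join("0123456789ABCDEF"[int(group, 2)] for group in groups)


print(bits_to_hex("0101 1010"))  # 5A
print(chr(int("01011010", 2)))   # Z
```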
Encoding – character string
 Text or character strings are typically
contiguously stored in memory.
 Assume that each character takes up one
byte of space. How many bytes would be
required for a phone number? (We are using a
slightly different example than the book. Note
the hyphen and spaces.)
568 - 8771 requires 10 bytes
Character   Binary      Hex (base 16)   Decimal (base 10)
5           0011 0101   35              53
6           0011 0110   36              54
8           0011 1000   38              56
(space)     0010 0000   20              32
-           0010 1101   2D              45
(space)     0010 0000   20              32
8           0011 1000   38              56
7           0011 0111   37              55
7           0011 0111   37              55
1           0011 0001   31              49
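The same table can be produced by a short Python sketch, assuming the phone number is typed with a plain ASCII hyphen and one space on each side. The function name ascii_table is made up for this illustration.

```python
def ascii_table(text):
    """Return (character, binary, hex, decimal) for each byte of the text."""
    rows = []
    for byte in text.encode("ascii"):
        binary = format(byte, "08b")  # 8 bits, with leading zeros kept
        grouped = binary[:4] + " " + binary[4:]
        rows.append((chr(byte), grouped, format(byte, "02X"), byte))
    return rows


phone = "568 - 8771"
for char, binary, hex_code, decimal in ascii_table(phone):
    print(repr(char), binary, hex_code, decimal)
print(len(phone.encode("ascii")), "bytes")  # 10 bytes
```

The spaces and the hyphen each take a byte, just like the digits.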
In class assignment
 Using the chart on page 220, what is your first
(or nick) name in ASCII binary codes?
 Work with your partner.
 Write the first name (spread out).
 Write the binary code for each letter of your
name based on the ASCII chart.
 Convert at least one of those binary codes to
the decimal (base 10) equivalent.
What about other kinds of data?
Chapter 11 material
Pixels
 A pixel is like a dot. Your computer screen is
composed of thousands of pixels.
 How many?
 Settings – Control Panel – Display – Settings
 Screen area is the dimensions expressed in
terms of pixels. The higher the numbers, the
better the resolution.
Each pixel
 Has a color associated with it.
 Colors are a combination of red, green, and
blue light – RGB
 The intensity of the particular color defines
how much of that color contributes to the
overall color displayed.
 Each color is associated with a 1-byte code.
In one byte we can have values from 0 (no
color) to 255 (full intensity).
Color
 See example in Word document
 Black is coded 0 0 0 (red green blue)
 White is coded 255 255 255
We will also use this feature when we code
HTML colors.
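As a preview of HTML colors, the sketch below turns three one-byte intensities into the #RRGGBB form, where each color gets two hex digits. The function name rgb_to_html is hypothetical.

```python
def rgb_to_html(red, green, blue):
    """Format three 0-255 intensities as an HTML #RRGGBB color code."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("each intensity must fit in one byte (0-255)")
    return "#{:02X}{:02X}{:02X}".format(red, green, blue)


print(rgb_to_html(0, 0, 0))        # #000000 (black)
print(rgb_to_html(255, 255, 255))  # #FFFFFF (white)
print(rgb_to_html(255, 0, 0))      # #FF0000 (full red, no green or blue)
```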
Sound
 Analog – real world – infinitely continuous
 Digital – representation - discrete
 Sound is a continuous series of sound waves.
 To digitize, we cannot capture every one of the
infinitely many values that hit our ears.
 But we can sample the values.
Figure 11.8. Sound wave. The horizontal axis is time;
the vertical axis is sound pressure.
Figure 11.9. Two sampling rates; the rate on the
right is twice as fast as that on the left.
Figure 11.11. (a) Three-bit precision for samples
requires that the indicated reading be approximated as
+10. (b) Adding another bit makes the sample twice as
accurate.
Figure 11.10. Schematic for analog-to-digital and digital-to-analog conversion.
Sampling
 While we lose some information in this
process, it is usually negligible in terms of our
ability to perceive the sounds.
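A rough sketch of sampling and rounding, under several assumptions that are not in the slides: the "sound" is a pure 440 Hz sine wave with values from -1 to 1, it is sampled 8,000 times per second, and each sample is rounded to the nearest of 2^bits evenly spaced levels. The names sample_wave and quantize are made up.

```python
import math


def sample_wave(frequency_hz, sample_rate_hz, count):
    """Read a sine wave (values from -1 to 1) at evenly spaced times."""
    return [math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
            for n in range(count)]


def quantize(value, bits):
    """Round a value in [-1, 1] to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 2 / (levels - 1)  # distance between neighboring levels
    return round((value + 1) / step) * step - 1


samples = sample_wave(frequency_hz=440, sample_rate_hz=8000, count=8)
for bits in (3, 4):
    worst = max(abs(s - quantize(s, bits)) for s in samples)
    print(bits, "bits: largest rounding error", round(worst, 3))
```

The rounding error can never exceed half a step, and each extra bit roughly halves the step size, which is the lost information the slide refers to.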
But to produce sounds
 Requires a large amount of data.
 For example, at a 16-bit representation of
each sample, it would take about 10 megabytes
to reproduce 1 minute of a song.
 Compression – Remove the parts of the
sound that we cannot hear. – MP3 format.
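The 10 megabyte figure can be checked with simple arithmetic. The sketch assumes CD-style audio, which the slide does not state: 44,100 samples per second and 2 channels (stereo). The function name audio_bytes is hypothetical.

```python
def audio_bytes(seconds, sample_rate_hz=44_100, bits_per_sample=16, channels=2):
    """Uncompressed size: seconds * samples per second * bytes per sample * channels."""
    return seconds * sample_rate_hz * (bits_per_sample // 8) * channels


size = audio_bytes(60)
print(size, "bytes for one minute")  # 10584000 bytes for one minute
```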
Images
 Images have the same problem.
 If each image is made up of thousands of
pixels, and each pixel requires 3 bytes of
data, then each image is huge.
 JPEG format compresses the digital
representation to remove the differences in
hues of a picture that we cannot perceive.
 Then we can compress by using run-length
compression to code the remaining bits.
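To get a feel for "huge", the sketch below multiplies out an uncompressed image size. The 1024 by 768 screen size is only an example value, and image_bytes is a made-up name.

```python
def image_bytes(width, height, bytes_per_pixel=3):
    """Uncompressed size: one red, green, and blue byte for every pixel."""
    return width * height * bytes_per_pixel


print(image_bytes(1024, 768), "bytes")  # 2359296 bytes
```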
Run-length compression
If my bit pattern is:
0000000000000000000000001111111111111100000000
0000000011111111111111001001
We can code a value to indicate that we have:
24 0’s followed by 14 1’s followed by 16 0’s, etc.
When the pattern changes value often, this will not
save us much space, but where there are long runs of
identical pixels, it can save a good deal of data
space.
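A minimal run-length coding sketch in Python. The pattern is rebuilt from the counts given above (24 0's, 14 1's, 16 0's); the part after that is my reading of the printed bit pattern. The function names are assumptions for this example.

```python
from itertools import groupby


def run_length_encode(bits):
    """Replace each run of identical symbols with a (symbol, count) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(bits)]


def run_length_decode(runs):
    """Expand (symbol, count) pairs back into the original string."""
    return "".join(symbol * count for symbol, count in runs)


pattern = "0" * 24 + "1" * 14 + "0" * 16 + "1" * 14 + "001001"
runs = run_length_encode(pattern)
print(runs[:3])  # [('0', 24), ('1', 14), ('0', 16)]
print(len(pattern), "bits stored as", len(runs), "runs")
print(run_length_decode(runs) == pattern)  # True
```

The long runs at the front collapse into one pair each, while the rapidly changing tail (001001) needs four pairs for six bits.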
Lossy vs lossless conversion
 Lossless – no loss of data in the conversion
 Lossy – there is loss of data
 Run-length coding is lossless. You can convert the
original to a compressed form and recover it exactly.
 Compression that removes some of the detail (things
that we cannot perceive) is lossy. You cannot
reproduce exactly the same sound/picture.
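The difference can be demonstrated with a small sketch. Two stand-ins are used here, neither of which is from the slides: zlib (a general-purpose lossless compressor in Python's standard library) plays the lossless role, and throwing away the low 4 bits of every byte is a toy version of removing detail. Real MP3 and JPEG coders are far more sophisticated.

```python
import zlib

original = bytes([10, 11, 12, 13, 200, 201, 202, 203] * 50)

# Lossless: compress, decompress, and every byte comes back.
restored = zlib.decompress(zlib.compress(original))
print("lossless round trip is exact:", restored == original)  # True


def drop_detail(data):
    """Toy lossy step: keep only the top 4 bits of each byte."""
    return bytes(value & 0xF0 for value in data)


# Lossy: the result is close to the original but cannot be restored exactly.
approx = drop_detail(original)
print("lossy copy is exact:", approx == original)  # False
```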