Part Two: Turning Information into Data

Download Report

Transcript Part Two: Turning Information into Data

Jim Williams
HONP-112
Week 4




We know a binary digit only has one of two
possible values.
In the context of a coin toss, a coin also has
only 2 possible values - heads (H) or tails (T)
If we flipped 2 coins together, how many
different ways can they come up?
What about if we flipped 3 coins together?
How many different possible combinations of
H and T are there?




Each individual coin can come up in one of 2
possible ways (H or T)
If we flip 3 coins together, here are the
possible combinations:
HHH, HHT, HTH, HTT, THH, THT,TTH,TTT
How many combinations is that?


We can see that flipping 3 coins, each having
2 possible values, results in 8 possible
combinations.
Why Is That? Because we must multiply
together the possible number of values of
each individual coin for each coin we are
flipping.


We flip 3 coins.
Let’s multiply the possibilities for each:
◦ 2 possibilities for coin 1
◦ x 2 possibilities for coin 2
◦ x 2 possibilities for coin 3

We end up with 2 x 2 x 2 possible
combinations, which is 8.




When dealing with a small number of coins
we can easily create a table for all the
possible combinations.
There are three coins so we will have 3
columns.
We know the total number of combinations
we will have (8).
So for the first column, split the result in half
– 4 rows of tails, 4 rows of heads.
◦ Note: By convention we usually start the lists with
the “0” value – in this case the “tails”.
Coin 1
T
T
T
T
H
H
H
H
Coin 2
Coin 3

Now in the next column, instead of groups of
4, divide that in half so you are working in
groups of 2 instead. Alternate T and H as
shown :
Coin 1
Coin 2
T
T
T
T
T
H
T
H
H
T
H
T
H
H
H
H
Coin 3

Now in the next column, instead of groups of
2, divide that in half so you are working in
groups of 1 instead. Alternate T and H as
shown :
Coin 1
Coin 2
Coin 3
T
T
T
T
T
H
T
H
T
T
H
H
H
T
T
H
T
H
H
H
T
H
H
H



As you can see the general rule for the
combination table is:
First column: Alternate the Tails/Heads (0/1)
values in groups of half of your total number
of combinations.
Each subsequent column: Alternate the
Tails/Heads (0/1) values in groups of half of
the size of the group from the previous
column.






In our example of three coins:
We had 8 possibilities
So, First column: Groups of 4 (8/2): 4 rows of
heads, 4 rows of tails.
Second column: Groups of 2 (4/2) –
alternating groups of heads/tails
Third column: Groups of 1 (2/2) – alternating
groups of heads/tails
Try a combination table of 4 coins now.




When we had three coins, there were 8 combinations
of tails/tails.
As you can see, if we add just one more coin, we now
have 16 possible combinations of tails/heads.
If you are observant … you will notice we are really
adding a column to the left of the existing table – 16
rows long, now we have 8 rows of tails and 8 rows of
heads
Leave the old table where it is (next to the 8 tails in
the new column), and make an entire copy of it in the
remaining “empty” rows nest to the heads in the new
column!
Coin 1
Coin 2
Coin 3
Coin 4
T
T
T
T
T
T
T
H
T
T
H
T
T
T
H
H
T
H
T
T
T
H
T
H
T
H
H
T
T
H
H
H
Coin 1
Coin 2
Coin 3
Coin 4
H
T
T
T
H
T
T
H
H
T
H
T
H
T
H
H
H
H
T
T
H
H
T
H
H
H
H
T
H
H
H
H




Likewise, if we added another coin, we would double
the combinations to 32, and so on.
The possible combinations therefore will increase
exponentially (where the exponent is a power of 2 of
course) with each single bit added to the mix!
Consider this for a moment – would you rather
receive a million dollars, or a penny one day and the
amount doubled each day for a month?
The next time you hear the much overused/misused
phrase “increasing exponentially”, keep this in mind
and see if someone saying this REALLY means it :-)



For our 3-coin example, our total number of
possible combinations was 2 x 2 x 2.
Does this look familiar? It is 2 to the 3rd
power, or 23
Since a coin has only 2 possible values, the
general formula for the number of possible of
combinations is 2n
◦ n=number of coins



So based on our understanding of the coin
examples, we can apply the same rule to
working with Bits:
Given n number of bits: the resulting number
of bit pattern combinations is 2n
This principle (and its “inverse” discussed
later) is the most important thing you need to
understand for this lesson, and for this entire
course!!!


What the bit pattern combinations mean is
dependent on the context.
But it is important to understand that it is
each distinct PATTERN of binary digits that
will then be “assigned” to represent a distinct
value in human terms.
Bit 1
Bit 2
Bit 3
Meaning
1
1
1
Sloth
1
1
0
Greed
1
0
1
Envy
1
0
0
Lust
0
1
1
Anger
0
1
0
Pride
0
0
1
Gluttony
0
0
0
(not used)
Bit 1
Bit 2
Bit 3
Meaning
1
1
1
(not used)
1
1
0
Arctic
1
0
1
Antarctic
1
0
0
North Atlantic
0
1
1
South Atlantic
0
1
0
North Pacific
0
0
1
South Pacific
0
0
0
Indian Ocean




As you can see the meaning is arbitrary and
depends on the list of distinct values I want to
find a digital bit pattern to represent.
Notice that I “mapped” a list of distinct values to
distinct bit patterns.
Also notice that in both cases the list was of 7
items, (do you know what they were?) , so I was
left with one pattern I did not use. That is fine,
and which one to “ignore” is again up to the
designer of a system.
Remember that this “mapping” is conceptual. We
will see an real-life example later on. For now
you need to understand the concept.





What if we already know how many
combinations we WILL need
For instance we know we want to have 30
possible patterns?
We know that 2n = number of patterns
So try to work it out in reverse and get the
smallest possible whole number value for n
Example: 30 = 2n. Solve for n; but n must be
a WHOLE NUMBER.


We know that we need n bits to represent 2n
different possible patterns. In our example we
have 30 possible bit patterns.
Let’s try to come up with a number of bits that
works.
◦ 4 bits = 24 combinations = 16 (not enough)
◦ 5 bits = 25 combinations = 32 (enough)

This is “trial and error” but it works and I
recommend you use this method to solve these
types of problems. The “technical” way is on the
next slide – for informational purposes only…



If we already know we need x number of
combinations, the “technical” answer to our
question of “how many bits do we need” is the
base-2 logarithm of the number of
combinations.
In other words, what power do we need to raise 2
by to get the number of patterns?
But in cases where the number of needed
patterns is not a whole power of 2, we would
have to round up the base-2 logarithm to the
next whole number anyway. So use the simpler
“trial and error” method instead!



If we have a given number of bits (n), know
how many possible patterns we have.
Be able to write out the patterns.
Conversely, if we already know how many
patterns we will need, then know how to find
the smallest possible number of bits so that
2n gets us an answer greater than or equal to
the number of patterns.



The term “digital” means that we are dealing
with electronics that only understand two
distinct “states” – high/low, on/off, 1/0, etc.
Therefore anything we wish to represent
digitally must be converted to various
patterns of 1/0.
The computer will “understand” what these
patterns mean and handle them accordingly.


Analog means that we are dealing with
electronics that can understand an infinite
number of “states”, or values.
Unlike digital signals and media (consisting
only of discrete values), analog technology
can reproduce a continuous set of values.




An analog way to represent music is a
phonograph record.
Grooves cut into the record is analogous to the
sound waves of the music. A record groove is a
continuous line, and there are an infinite number
of points on a line.
A digital way to represent music is an audio file.
But we cannot store an infinite number of values,
so instead of a “line” we have many discrete
“points” instead, each containing a set number of
digits which represent different values.

See below. Notice the digital signal only has two
possible values. FYI it is not really a continuous
line. But it appears that way because there are
more changes per second than the scope can show
(you don’t have to know what all that means…)




Remember that neither analog nor digital is
“reality.”
For instance, a 35mm film photograph
(analog), or a JPEG file of a picture (digital) is
not REALLY the scene you captured.
Both are only representations of that reality.
Keep this point in mind throughout the rest
of this course.



We now know that a computer works
exclusively with binary numbers.
In other words, it must understand human
information in terms of digital electronics
(“on” or “off”)
Digital Representation - or digitizing - is
when we represent human information as
binary digits.



A computer has its own set of “rules” about
how different kinds of information may
represented digitally.
These rules may be different for different
types of computers. The designers of a
computer make the decision about how this
is done.
We will not get into the technical aspects of
how this is done yet – but we will study the
functional/analytical aspects.

Almost any piece of human information can
be digitized.
◦
◦
◦
◦
◦
◦
Books, papers, and other written works
A checkbook register
Customer purchase records
Music
Photographs
etc.

Several things are necessary to come up with a
way to digitize human information:
◦ 1. An full understanding of what the information is
◦ 2. Finding the smallest meaningful piece of information
that the rest is built upon.
◦ 3. How many possible “values” the smallest meaningful
pieces of information can have.


After we know this, we can design a way to
represent each of these different values using
combinations of bits.
Then string them all together to have a complete
representation of the full set of information
(photo, written document, song, etc.).



When digitizing, we must think of our
information in the smallest possible terms.
Then, we can find a way to digitally represent
each small piece of information.
Combining these small digital representations
gives us our complete digitization of the
whole information.



This step is the most important, and has
nothing to do with computers.
In the next few slides, we will try to break a
piece of information into its parts.
Then, we can see what we need to do to
digitize the information.


As you can see the most important part of
designing a way to digitize information is the
analysis phase. This is where we do our
thinking, discovery, and understanding of
whatever it is we are trying to digitally
represent.
For instance, if we were developing a way to
digitally represent musical notes, and did not
know how many musical notes there were, we
would fail.



Let’s take a piece of human information we
should all be familiar with – the written word
(i.e. a paper, book, article, etc.).
We want to come up with a way to represent
this type of information in a digital form.
Think a while - what does a written paper
consist of?







Assume the paper is written in the English
language.
A written paper consists of paragraphs.
Each paragraph consists of sentences.
Each sentence consists of words.
Each word consists of letters.
We cannot break the written word down any
further than this. (A letter is not built up out of
anything smaller.)
So, let’s start with finding out how many possible
“values” a letter has.



We have to allow for any possible letter to be
used in the paper.
The English language consists of 26 different
letters.
However, each letter can be of upper or lower
case. So now we have to account for 52
different letters.



So far, we have determined that we have to
account for 52 different letters.
We will ignore other things that may be in a
written paper for now (can you think of any?)
If we can figure out a way to digitize each
letter, then we can put them all together and
digitize the entire paper.



We must come up with a way to digitally
represent a single small piece of information
in 52 different possible ways.
Remember - this means that we represent the
information using binary digits (bits). So we
need to define 52 different PATTERNS of 1s
and 0s – and each of these patterns will be
“assigned” to each of the corresponding
possible values.
How many bits do we need to do this?


We know that we need n bits to represent 2n
different possible values. We have 52
possible values for each letter.
Let’s try to come up with a number of bits
that works.
◦ 4 bits = 24 combinations = 16 (not enough)
◦ 5 bits = 25 combinations = 32 (not enough)
◦ 6 bits = 26 combinations = 64 (this works!)





We determined that there are 52 possible
values for a single letter.
We also discovered that we need 6 bits to
represent these 52 possible values.
With this information, we can now represent
our human information (letters) as binary
digits that the computer can understand.
We will take the bit patterns and assign them
to each possible value for a letter.
Of course, the computer must be “told” how
to interpret the patterns.

Of course, in real life, there are other characters
besides letters
◦ punctuation marks, spaces, tabs, page breaks
◦ Number characters
◦ other “odd” characters (Greek letters, etc.)


So most computers use a sequence of 8 bits to
represent characters - allowing for 256 (or 28)
different values.
Research the “Extended-ASCII” chart. It shows
the real 8-bit patterns that are assigned to
characters. Of course the computer is designed
to understand this scheme.




Originally only 7 bits were used per character
(the original ASCII scheme).
Then one extra bit was added, and this
doubled the total number of possible
patterns from 128 to 256 (remember about
increasing exponentially??).
Maybe you still don’t think 256 possible
characters is enough patterns???
Others have addressed this – see “Unicode”
for more information.





As you know, a bit is a single binary digit.
In computers, we sometimes think of a
group of 8 binary digits as a single binary
number.
A group of 8 binary digits is called a Byte.
A byte example: 10010101
Most of the time, we measure things in Bytes,
not bits.





So one can say that characters in most modern
computers are represented by a one-byte code (as
opposed to 8-bit code).
Sometimes you will see things referred to in terms of
bytes, and other times you will still see reference to
bits instead (especially when marketing people want
their computer’s features to appear “bigger”).
Just know that there are 8 bits to a byte, or
conversely there are 1/8 bytes to a bit.
Example: I have a 24-bit pattern, please express this
in bytes.
Example: How many bits are contained in a 4-byte
pattern?


Consider the byte 01110101
Depending on the context, this may possibly
represent either of these:
◦ The quantity 117 (read as a base-2 number)
◦ The Extended-ASCII character for the letter “u” (look it
up on the chart)
◦ Possibly other types of things we will discuss later

So, saying “it depends on the context” is not a
“cop-out.” Since the computer can ONLY
represent two possible values, we have to know
the context because the same combinations of 1s
and 0s mean something completely different (as
per our example above).


You are asked to design a way to digitally
represent a piece of sheet music.
What is the process to design a digital
representation of this?



What does a musical score on paper consist
of?
How could it be broken down?
How many different contexts may you need
within the SAME piece of information?




This is NOT a music theory class and the
following information will not be complete.
I am providing this as an interactive exercise.
I am trying to illustrate how vitally important
it is to fully understand what it is we are
designing a digital representation for –
BEFORE we actually do anything with
computers.
Also – this is only a THEORETICAL exercise. It
does not represent any real-life music
encoding system!





A musical score is made up of staves, which are
made up of bars, which are made up of notes.
For now let’s only consider a single staff.
There are 12 possible musical pitches that can be
represented on a staff. (Name them)
These 12 pitches repeat in octaves. So for
instance you can have an “A” note in one octave
and also an “A” note an octave higher, 2 octaves
lower, etc.
So how many octaves would we need to
represent??



There is also a special “note” called a “rest”
which means silence.
Notes are maintained for a certain DURATION
– i.e. how long is the note held over some
interval of time? There are different types of
notes (which are all powers of 2 by the way!)
that specify this.
The piece of music itself is also in a “time
signature” which in layman’s terms defines
the context for the duration of the notes..



So far, we know we need to represent 12
possible values (each basic note value) and a
special value for a “rest” so that makes 13.
Also we need to represent each of the “types”
of notes based on duration (full, half, quarter,
eighth, etc.). How many types of notes do
you want to represent? How would you
represent them?
We need to decide how many octaves of notes
we want to represent as well and come up
with a way to assign them bit patterns.




Then we have things like “dotted” notes, ties,
slurs, repeat marks, key signature changes,
etc.
Just something to think about.
How would you design something like this?
Let’s just try the notes themselves:





This will form the basis of our system
If we want to represent 12 different note values
(well actually 13 including the “rest”), how many
bits do we need?
23 – 8: not enough patterns
24 – 16 – that is enough. We will use 13 of the
patterns and just not use the remaining patterns
for anything else in this context.
So we know we will use bit patterns that are 4bits long. We will determine these possible
combinations and assign the patterns to a note.
Bit 1
Bit 2
Bit 3
Bit 4
What it
represents
0
0
0
0
rest
0
0
0
1
A note
0
0
1
0
A#/Bb note
0
0
1
1
B note
0
1
0
0
C note
0
1
0
1
C#/Db note
0
1
1
0
D note
0
1
1
1
D#/Eb note
Bit 1
Bit 2
Bit 3
Bit 4
What it
represents
1
0
0
0
E note
1
0
0
1
F note
1
0
1
0
F#/Gb note
1
0
1
1
G note
1
1
0
0
G#/Ab note
1
1
0
1
(not used)
1
1
1
0
(not used)
1
1
1
1
(not used)

Now that we designed a way to represent all the
different pitches/notes, we can come up with
other schemes to represent:
◦ Note durations
◦ Octave numbers
◦ Other things…


Each of these would be considered a separate
context and the system would need to know how
to interpret these bit patterns in the proper
context.
I’ll leave it up to you to expand the design
further and come up with a way to “string” the
values together to create a music score.



As you see … it is not easy to break down
information so it can be represented digitally. It
is not so much the mechanics as it is developing
a full understanding of the information at hand.
Sometimes, computer people get involved with
designs without the involvement of subject
matter experts – and that is one reason for faulty
systems.
But luckily for us, most design work for common
applications has already been done and is
available for us to use in tried and true systems.




We have been studying the difference between
human information and its digital representation.
We have only discussed various ways to represent
information using combinations of bits, and the
conceptual process to accomplish this..
But we did not discuss the technical aspects of
how real human information is actually converted
to bits using , and vice-versa.
We will study that more in future lessons. It is
important that you just grasp the concept for
now.

To digitize conceptually, we follow these steps
Understand the information
Break the information down into smallest parts.
Determine how many possible values there are for the small part.
determine the n, where 2n can accommodate the number of
possible values.
◦ “Map” a bit pattern to each of the possible values we want to
represent
◦ String these values together to create an entire digital
representation of the data.
◦
◦
◦
◦



Realize same bit patterns can mean different things in
different contexts.
Be able to refer to digital data in terms of bits or bytes.
Given a “simple” context, be able to determine the
possible values and “map” these to bit combinations (test
questions will be simpler than our music example!).