Transcript Coursenotes

Combinatorics
(Important to algorithm analysis)
Problem I:
How many N-bit strings contain at least 1 zero?
Problem II:
How many N-bit strings contain more than 1 zero?
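A sketch of the standard complementary-counting answers, with a brute-force check for small N (the variable names below are ours):

```python
from itertools import product

N = 4  # any small N works for a brute-force check

strings = ["".join(bits) for bits in product("01", repeat=N)]

# Complementary counting:
#   Problem I : only the all-ones string has no zero, so 2^N - 1 strings qualify.
#   Problem II: also exclude the N strings with exactly one zero: 2^N - 1 - N.
assert sum("0" in s for s in strings) == 2**N - 1
assert sum(s.count("0") > 1 for s in strings) == 2**N - 1 - N
```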
Which areas of CS need all this?
1. Information Theory
2. Data Storage, Retrieval
3. Data Transmission
4. Encoding
Information (Shannon) Entropy
quantifies, in the sense of an expected value, the
information contained in a message.
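The notes do not state the formula itself; for a source that emits symbol i with probability p_i, the standard definition is:

```latex
H = -\sum_i p_i \log_2 p_i \quad [\text{bits per symbol}]
```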
Example 1: A fair coin has an entropy of 1 [bit]. If the coin is not fair, the uncertainty is lower (asked to bet on the next outcome, we would bet preferentially on the more frequent result), so the Shannon entropy is lower than 1.
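A minimal check of Example 1 (the function name entropy is ours):

```python
from math import log2

def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # fair coin   -> 1.0 bit
print(entropy([0.9, 0.1]))  # unfair coin -> ~0.47 bits, lower as claimed
```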
Example 2: A long string of repeating characters: S = 0.
Example 3: English text: S ≈ 0.6 to 1.3 bits per character.
The source coding theorem: as the length of a stream of independent and identically distributed (i.i.d.) random-variable data tends to infinity, it is impossible to compress the data such that the code rate (average number of bits per symbol) is less than the Shannon entropy of the source, without information loss.
Combinatorics cont’d
Problem: “Random Play” on your iPod Touch works like this. When pressed once, it plays a random song from your library of N songs. When pressed again, it excludes the song just played from the library and draws another song at random from the remaining N-1 songs. Suppose you have pressed “Random Play” k times. What is the probability that you will have heard your one most favorite song?
Combinatorics cont’d
The “Random Play” problem.
1. Tactics: get your hands dirty.
2. P(1) = 1/N. P(2) = ? Need to be careful: what if the song has not played 1st? What if it has? This becomes complicated for large N.
3. Make a table of the press number k against the probability that this press plays the favorite song, given that it has not played yet:

   Press # (k)   P
   1             1/N
   2             1/(N-1)
   ...           ...
   k             1/(N-k+1)

4. Key tactic: find the complementary probability P(not) = (1 - 1/N)(1 - 1/(N-1)) ... (1 - 1/(N-k+1)); then P(k) = 1 - P(not).
5. Re-arrange: P(not) = (N-1)/N * (N-2)/(N-1) * (N-3)/(N-2) * ... * (N-k)/(N-k+1) = (N-k)/N, because the product telescopes. Thus P = 1 - P(not) = k/N.
6. If you try to guess the solution, make sure your guess works for simple cases where the answer is obvious, e.g. k = 1 and k = N. Also check that P <= 1.
7. The very simple answer suggests that a simpler solution may be possible. Can you find it? (A simulation sketch follows below.)
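A Monte Carlo sanity check of P = k/N (a sketch; random.sample models k no-repeat presses as a uniformly random k-subset of the library):

```python
import random

def prob_heard_favorite(N, k, trials=100_000):
    """Estimate the chance that song 0 (the favorite) appears among
    k "Random Play" presses drawn without replacement from N songs."""
    hits = sum(0 in random.sample(range(N), k) for _ in range(trials))
    return hits / trials

print(prob_heard_favorite(N=50, k=10))  # close to k/N = 0.2
```

The model also hints at one simpler argument: in a full random shuffle of the library, the favorite is equally likely to land in any of the N positions, so the chance that it falls among the first k presses is k/N.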
Data Compression
Lossless compression: 25.888888888 => 25.[9]8 (read “[9]8” as “the digit 8 repeated nine times”; the original is recovered exactly)
Lossy compression: 25.888888888 => 26
Lossless compression exploits statistical redundancy. For example, in English the letter “e” is common but “z” is not, and you never have a “q” followed by a “z”.
Drawback: not universal. If the data has no pattern, there is no compression.
Lossy compression (e.g. JPEG images).
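A toy illustration of the run-length idea behind the “[9]8” notation above (the function name rle is ours):

```python
from itertools import groupby

def rle(s):
    """Run-length encode a string into (count, char) pairs."""
    return [(len(list(g)), ch) for ch, g in groupby(s)]

print(rle("25.888888888"))  # [(1, '2'), (1, '5'), (1, '.'), (9, '8')]
print(rle("0123456789"))    # no runs -> no compression, as noted above
```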
Combinatorics Cont’d
Problem: A DNA sequence contains only 4 letters (A, T, G, and C). Short “words” made of K consecutive letters form the genetic code. Each word (called a “codon”) codes for a specific amino acid in proteins. There are a total of 20 biologically relevant amino acids. Prove that a genetic code based on a fixed K is degenerate, that is, there are amino acids which are coded for by more than one “word”. It is assumed that every word codes for an amino acid.
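A sketch of the pigeonhole argument the problem asks for:

```latex
% 4^K possible K-letter words must code for all 20 amino acids.
% 4^2 = 16 < 20, so K >= 3; but then there are 4^K >= 64 words.
% Since every word codes for some amino acid, 4^K > 20 words map
% onto only 20 amino acids, so some amino acid receives more than
% one word: the code is degenerate.
4^{2} = 16 < 20 < 64 = 4^{3} \le 4^{K}
```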
Permutations
• How many three-digit numbers (in decimal representation) are there if you cannot use a digit more than once?
P(n, r)
• The number of ways an ordered subset of r elements can be chosen from a set of n elements is given by

P(n, r) = n! / (n - r)!
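Python’s math.perm computes exactly this quantity (available since Python 3.8); the comment flags the leading-zero subtlety that the three-digit question leaves open:

```python
from math import perm  # perm(n, r) = n! / (n - r)!

print(perm(10, 3))  # 720: ordered 3-digit strings with no repeated digit
# If a "three digit number" may not start with 0, the count is instead
# 9 * 9 * 8 = 648 (9 choices for the first digit, then 9, then 8).
print(9 * 9 * 8)
```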
Theorem
• Suppose we have n objects of k different types, with ni identical objects of the i-th type (so n1 + n2 + ... + nk = n). Then the number of distinct arrangements of those n objects is equal to

C(n, n1) × C(n - n1, n2) × C(n - n1 - n2, n3) × ... × C(nk, nk) = n! / (n1! n2! ... nk!)

Visualize: ni identical balls of color i; total # of balls = n.
Combinatorics Cont’d
The “Mississippi formula”.
Example: What is the number of letter permutations of the word BOOBOO?
6!/(2! 4!) = 15 (two B’s, four O’s)
Permutations and Combinations with Repetitions
• How many distinct arrangements are there of the letters in the word MISSISSIPPI?
• Follow-up: now suppose each copy of a repeated letter has a unique color (e.g. the 1st “I” is blue, the 2nd is red, etc.). How many arrangements of the colored letters still read MISSISSIPPI?
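A quick check of both examples (the helper name arrangements is ours; the brute-force comparison is kept to the short word):

```python
from collections import Counter
from itertools import permutations
from math import factorial

def arrangements(word):
    """Distinct letter permutations: n! / (n1! n2! ... nk!)."""
    result = factorial(len(word))
    for count in Counter(word).values():
        result //= factorial(count)
    return result

print(arrangements("BOOBOO"))       # 6!/(2! 4!) = 15
print(arrangements("MISSISSIPPI"))  # 11!/(4! 4! 2!) = 34650
assert arrangements("BOOBOO") == len(set(permutations("BOOBOO")))
```

On one reading of the colored-letter question, the answer is 4! * 4! * 2! = 1152 colored arrangements that all read MISSISSIPPI: exactly the factor by which 11! is divided in the formula above.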