Part e - Department of Computer Science


Discrete Math
CS 2800
Prof. Bart Selman
[email protected]
Module: Probability --- Part e)
1) The Probabilistic Method
2) Randomized Algorithms
The Probabilistic Method
Method for providing non-constructive existence proofs:
Thm. If the probability that a randomly selected element of the set S does
not have a particular property is less than 1, then there exists an
element in S with this property.
Alternatively: If the probability that a random element of S has a
particular property is larger than 0, then there exists at least one
element with that property in S.
Note: We saw an earlier example of the probabilistic method when
discussing the 7/8 alg. for 3-CNF.
Example: Lower bound for Ramsey numbers
Recall the definition of Ramsey number R(k,k):
Let R(k,k) be the minimal n such that if the edges of the complete
graph on n nodes are colored Red and Blue, then there either is a
complete subgraph of k nodes with all edges Red or a complete
subgraph of k nodes with all edges Blue.
R(3,3) = 6. So, any Red/Blue edge coloring of the complete graph on 6 nodes has either a Red or a Blue triangle. (Proof: see “party problem”.)
Reminder: “The party problem”
Dinner party of six: Either there is a group of 3 who all know each other, or there is a group of 3 who are all strangers.
Proof by contradiction. Assume we have a party of six where no three people all know each other and no three people are all strangers.
Consider one person. She either knows or doesn’t know each other person. But there are 5 other people! So, she knows, or doesn’t know, at least 3 others.
Let’s say she knows 3 others. If any of those 3 know each other, we have 3 people who all know each other. Contradicts assumption. So they all must be strangers. But then we have three strangers. Contradicts assumption.
The case where she doesn’t know 3 others is similar. It also leads to a contradiction.
So, such a party does not exist!
QED
How do we get a lower bound on R(k,k)?
E.g., lower bound for R(3,3)?
We need to find an n and a Red/Blue edge coloring of the complete graph on n nodes that does not contain a Red or a Blue triangle.
E.g., for n = 5: color the edges of the 5-cycle Red and the diagonals Blue; then K_5 has no monochromatic triangle. So, R(3,3) > 5.
I.e., we have a lower
bound on the Ramsey
Number for k=3.
Can we do this for R(k,k) in general? Very difficult to construct
the colorings, but… we can prove they exist for non-trivial k.
Why?
Thm. For k ≥ 4, $R(k,k) \ge 2^{k/2}$.
(Aside: an unbiased coloring is not essential. It only matters that each possible coloring has some non-zero probability.)
So, e.g., for k = 20, there exists a Red/Blue coloring of the complete graph with 1023 nodes that does not have any complete monochromatic subgraph of size 20. (But we have no idea of how to find such a coloring!)
Proof: Consider a sample space where each possible coloring of
the n-node complete graph is equally likely. A sample coloring
can be obtained by randomly coloring each edge. I.e., with
probability ½ set edge to Blue, otherwise Red.
Let $n < 2^{k/2}$.
Aside: each particular coloring has probability $(1/2)^{n(n-1)/2}$.
We want to show that there is a larger than 0 probability
of getting a coloring with no monochromatic k-clique.
Consider the subsets of k nodes. There will be $\binom{n}{k}$ such subsets, $S_1, S_2, \dots, S_{\binom{n}{k}}$.
Let $E_i$ be the event that $S_i$ is a monochromatic subgraph (a Red or a Blue clique).
So, a sample is a randomly selected graph coloring. What is the probability of $E_i$? Why?
Note: the number of edges in a clique of size k is $\binom{k}{2} = k(k-1)/2$, and each of the two monochromatic colorings of these edges occurs with probability $(1/2)^{\binom{k}{2}}$. So,
$$p(E_i) = 2 \cdot (1/2)^{\binom{k}{2}} = 2^{1-\binom{k}{2}}.$$
So, the probability that the randomly selected coloring will have some monochromatic k-clique is $p\big(E_1 \cup E_2 \cup \dots\big)$.
If we can show that this probability is strictly less than 1, then
we know that there must exist some coloring that does not have
any monochromatic k-clique!
We’ll do this. It’s mainly a matter of rewriting the combinatorial
expression for the probability and finding an upper bound.
First step: Can we just add up the individual probabilities? But computing the exact value of $p(E_1 \cup E_2 \cup \dots)$ is very tricky. Why??
Events are not disjoint and not independent! A coloring can have multiple monochromatic cliques. Also, having e.g. several Blue cliques in one part of the graph makes it more likely to have more Blue cliques on other cliques that share many edges with the Blue cliques. So, all kinds of subtle dependencies! We would need to use the inclusion-exclusion formula (many terms)!
Fortunately, we “just” need to upper bound the probability. So, we can truncate the formula to the first set of terms. This gives us Boole’s Inequality:
$$p(E_1 \cup E_2) \le p(E_1) + p(E_2)$$
Why? Since:
$$p(E_1 \cup E_2) = p(E_1) + p(E_2) - p(E_1 \cap E_2)$$
(For the general form, see exercise 15, sect. 6.2.)
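The general form (the union bound) follows by induction on the number of events m, applying the two-event case at each step:
$$p\Big(\bigcup_{i=1}^{m} E_i\Big) \le p\Big(\bigcup_{i=1}^{m-1} E_i\Big) + p(E_m) \le \sum_{i=1}^{m} p(E_i).$$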
We want: $p\big(\bigcup_i E_i\big) < 1$. So, we get:
$$p\Big(\bigcup_i E_i\Big) \le \sum_i p(E_i) = \binom{n}{k}\, 2^{1-\binom{k}{2}},$$
and it remains to show this sum is < 1.
So, we have “upper bounded” our probability. What’s left?
We need to show that the left hand side is strictly less than 1.
“Just” combinatorics…
Note: We have $\binom{n}{k}$, i.e. $O(n^k)$, terms in the sum, but the $p(E_i)$’s can each be very small.
By Ex. 17, sect. 5.4, $\binom{n}{k} \le n^k / 2^{k-1}$. By assumption, $n < 2^{k/2}$, so $n^k < 2^{k^2/2}$. So,
$$\sum_i p(E_i) = \binom{n}{k}\, 2^{1-\binom{k}{2}} \le \frac{n^k}{2^{k-1}}\, 2^{1-\frac{k(k-1)}{2}} < 2^{\frac{k^2}{2} - (k-1) + 1 - \frac{k(k-1)}{2}} = 2^{2-\frac{k}{2}} \le 1 \quad\text{when } k \ge 4.$$
(Note: the proof is non-constructive. No effective method is known for finding such colorings!)
So, the probability of having a monochromatic k-clique is strictly less than 1. Therefore, there must exist edge colorings in our sample space that do not have such monochromatic k-cliques!
So, $R(k,k) \ge 2^{k/2}$.
QED
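Aside: for tiny cases, we can actually find such a coloring by sampling, in the spirit of the proof. Below is a minimal Python sketch (mine, not from the lecture) that samples random colorings of K_5 and stops at one with no monochromatic triangle, i.e., a witness that R(3,3) > 5. For k = 20 and 1023 nodes this brute-force search is hopeless, matching the remark above.

import random
from itertools import combinations

def random_coloring(n):
    # Color each edge of K_n Red or Blue independently with probability 1/2.
    return {e: random.choice("RB") for e in combinations(range(n), 2)}

def has_mono_clique(coloring, n, k):
    # Does some k-node subset have all its C(k,2) edges the same color?
    for nodes in combinations(range(n), k):
        edge_colors = {coloring[e] for e in combinations(nodes, 2)}
        if len(edge_colors) == 1:
            return True
    return False

random.seed(1)
tries = 0
while True:
    tries += 1
    if not has_mono_clique(random_coloring(5), 5, 3):
        print("witness for R(3,3) > 5 found after", tries, "samples")
        break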
Probabilistic Algorithms
Monte Carlo Algorithm
Probabilistic Algorithms --- algorithms that make random choices at one
or more steps.
Decision problems --- problems for which the answer is True or False.
E.g., is this number composite?
Monte Carlo algorithms --- there is a small probability that they will return an incorrect answer. In general, by running the algorithm longer, we can make the probability of error arbitrarily small.
Example I: Probabilistic Primality Test
Miller’s Test
Let n be a positive integer and let $n-1 = 2^s t$, where s is a non-negative integer (it takes out the factors of 2) and t is an odd positive integer.
E.g. n = 2047. n – 1 = 2046 = 2 × 1023. So, s = 1 and t = 1023.
n passes the Miller test for (likely) primality for the base b (1 < b < n) if either
$$b^t \equiv 1 \pmod{n} \quad\text{or}\quad b^{2^j t} \equiv -1 \pmod{n} \text{ for some } j,\; 0 \le j \le s-1.$$
With base b = 2, we have
$$2^{1023} = (2^{11})^{93} = 2048^{93} \equiv 1^{93} = 1 \pmod{2047},$$
so 2047 passes the test. But 2047 = 23 × 89 is composite!
All primes pass the Miller test. So, if the test fails, then n is composite. Unfortunately, a composite integer n may also pass Miller’s test, but only for fewer than n/4 of the bases b with 1 < b < n.
Example: Determine if n = 221 is prime.
We have $n - 1 = 220 = 2^2 \cdot 55$. So s = 2 and t = 55.
Randomly select a b < n, e.g. b = 174. We compute:
$b^t \bmod n = 174^{55} \bmod 221 = 47 \ne 1$
$b^{2^0 t} \bmod n = 174^{55} \bmod 221 = 47 \ne n-1$
$b^{2^1 t} \bmod n = 174^{110} \bmod 221 = 220 = n-1$. hmm…
Since $220 \equiv -1 \pmod{n}$, either 221 is prime, or 174 is a strong liar for 221.
We try another random b, e.g. 137:
$b^t \bmod n = 137^{55} \bmod 221 = 188 \ne 1$
$b^{2^0 t} \bmod n = 137^{55} \bmod 221 = 188 \ne n-1$
$b^{2^1 t} \bmod n = 137^{110} \bmod 221 = 205 \ne n-1$
So, base 137 is a witness for the compositeness of 221. And 174 was in fact a strong liar. Note that this tells us nothing about the factors of 221, which are 13 and 17.
Using Miller’s Test
Goal is to decide the question – “Is n composite?”
Algorithm:
1. Randomly pick a number 1 < b < n.
2. Does n pass the Miller test for b?
If n fails the test, then n is a composite number; answer is True; b is
known as a witness for the compositeness, and the test STOPS.
Otherwise the answer is “unknown”.
3. Repeat steps 1 and 2 up to k times, until the required certainty is achieved.
If after k iterations n has not been found to be composite, then it can be declared probably prime (each time through with outcome “unknown”, we gain some additional evidence of primality).
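A sketch of this procedure in Python, reusing miller_test from the sketch above (the default k = 30 is my choice, matching the error bounds below):

import random

def is_probably_prime(n, k=30):
    # Run Miller's test for k independently chosen random bases.
    # Returns False as soon as a witness is found (n is certainly composite);
    # returns True ("probably prime") if all k rounds come back "unknown".
    # For a composite n, the error probability is below (1/4)^k.
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    for _ in range(k):
        b = random.randrange(2, n - 1)   # random base with 1 < b < n - 1
        if not miller_test(n, b):
            return False                 # b is a witness: n is composite
    return True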
Observations:
Algorithm can only make “mistakes” when it returns “unknown.”
The probability that a composite integer n passes Miller’s test for a randomly selected base b is less than 1/4.
By repeating the test k times, given that the iterations are independent, the probability that n is composite but the algorithm has not found a witness for its compositeness is less than $(1/4)^k$.
Question: But, do we really need to select b at random in the algorithm? Why
not just try b’s starting from 1 through n? This might be just fine… Should
work for 75% of the b’s…
Perhaps a deterministic algorithm works just fine after all! Hmm…
It’s a subtle but key issue. The 25% of “bad” bases may all cluster together starting at b = 1. How long would the alg. take to find a correct base? Still poly time?
Actually, exponential in the length of the input! (Remember: we measure run time in terms of the number of digits needed to represent n.)
Randomization fundamentally gets around this: for any composite n, wherever the strong liars are, after k guesses the probability that we used only “liars” is at most $(1/4)^k$.
E.g. $(1/4)^{30} \approx 10^{-18}$, independent of the number of digits of n itself! While for the fixed scheme with n = 10,000, we may need to try 2,500 bases!
It’s hard to fool a randomized algorithm!!
By taking k sufficiently large, we can make the probability of error quite small.
10 iterations → error probability less than 1 in $10^6$
30 iterations → error probability less than 1 in $10^{18}$
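Checking the arithmetic:
$$(1/4)^{10} = 2^{-20} \approx 9.5 \times 10^{-7} < 10^{-6}, \qquad (1/4)^{30} = 2^{-60} \approx 8.7 \times 10^{-19} < 10^{-18}.$$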
Key Point:
In general, we can easily boost our confidence in the outcome of a
randomized procedure by doing repeated runs.
Compare to: taking more samples in statistics.
Note 1: If we use a probable prime n as one of the two primes in the RSA cryptosystem and n is actually composite, the procedures used to decrypt messages will not reproduce the original message. The key is then discarded and two new possible primes are used.
Note 2: The simplest probabilistic primality test is the Fermat primality test (discussed earlier). It is only a heuristic test; some composite numbers (Carmichael numbers) will be declared "probably prime" no matter what witness is chosen. Nevertheless, it is sometimes used when a rapid screening of numbers is needed, for instance in the key generation phase of the RSA public key cryptographic algorithm.
Note 3: The Miller-Rabin primality test is a more sophisticated variant which detects all composites (this means: for every composite number n, at least 3/4 (Miller-Rabin) or 1/2 (Solovay-Strassen) of the numbers b are witnesses of the compositeness of n). Such tests are often the methods of choice, as they are much faster than other general primality tests.
Example II: Finding The Median
--- Randomized Divide and Conquer
Problem:
Given a set of N distinct numbers, what is the median value?
Example with N = 11:
14, 18, 12, 19, 3, 4, 7, 15, 5, 11, 8
Median? Sort first:
3, 4, 5, 7, 8, 11, 12, 14, 15, 18, 19
A: 11
Runtime?? O(N log N)
Can we do better? Hmmm… What about finding the minimum element or the max element?? O(N)
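The baseline sort-first approach, as a one-line Python sketch (not from the slides):

nums = [14, 18, 12, 19, 3, 4, 7, 15, 5, 11, 8]
k = (len(nums) + 1) // 2                 # median position for odd n
print(sorted(nums)[k - 1])               # O(N log N) -> prints 11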
The median is the “middle position.”
More precisely, given a set S = {a_1, a_2, a_3, …, a_n}, the median is the kth smallest element of S, with
k = (n+1)/2 if n is odd, and
k = n/2 if n is even.
Note: let’s assume all numbers are distinct.
Let’s consider a more clever algorithm for finding the kth smallest element of a set S.
Note: the 1st smallest is the minimum, and the nth smallest is the max element.
Let’s try a “divide (of S) – and – conquer” strategy.
Select(S, k):
  Choose a splitter a_i from S = {a_1, a_2, a_3, …, a_n}  (Note: the a_i values are not ordered.)
  For each element a_j of S:
    Put a_j in S− if a_j < a_i
    Put a_j in S+ if a_j > a_i
  Endfor
  If |S−| = k−1 then
    The splitter a_i was in fact the desired answer
  Else if |S−| ≥ k then
    The kth smallest element lies in S−; recursively call Select(S−, k)
  Else suppose |S−| = l < k−1
    The kth smallest element lies in S+; recursively call Select(S+, k−1−l)
  Endif
[on board]
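A direct Python rendering of the pseudocode (a minimal sketch; choosing the splitter at random anticipates the randomized analysis below):

import random

def select(S, k):
    # Return the kth smallest element of S (k = 1 gives the minimum).
    # Assumes the elements of S are distinct and 1 <= k <= len(S).
    splitter = random.choice(S)                # choose a splitter a_i
    S_minus = [a for a in S if a < splitter]   # elements below the splitter
    S_plus = [a for a in S if a > splitter]    # elements above the splitter
    if len(S_minus) == k - 1:
        return splitter                        # the splitter is the answer
    elif len(S_minus) >= k:
        return select(S_minus, k)              # kth smallest lies in S-
    else:
        l = len(S_minus)                       # here l < k - 1
        return select(S_plus, k - 1 - l)       # kth smallest lies in S+

nums = [14, 18, 12, 19, 3, 4, 7, 15, 5, 11, 8]
print(select(nums, (len(nums) + 1) // 2))      # median -> 11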
Analysis
1) Correctness
2) Run time (randomized)
We’ll get: O(n)!
So, no need to sort first after all!
Correctness
1) Termination: yes, always calls on strictly smaller set.
2) Correctness:
By induction on n, the size of S.
Base case: |S| = 1.
Select(S,1) returns the right value.
Induction step: Assume the recursive calls Select(S’, x) (on smaller sets) correctly return the xth smallest value.
From the code and the induction assumption, we see that Select(S, k) correctly returns the kth smallest value.
Aside: What happens if we
consistently picked the
“worst” splitter?
Run time
First, some intuition. Consider the “splitter”.
What would we like a good splitter to do?
Note that after the split, we proceed with only one of the two recursive calls (on S− or S+).
So, we want the splitter to reduce the size of the set in the
recursive call as much as possible. We don’t know in advance
whether the call is on S- or S+. So, let’s make them both as
small as possible…
Best splitter in the middle: pick the median. Hmm… but that is exactly the problem we are trying to solve!
Run time cont.
Fortunately, our splitter does not have to be THAT good.
Let’s assume each of the sets S− and S+ has at least εn fewer elements than S (of size n), with 0 < ε ≤ 1. So, those sets are each of size at most (1 − ε)n.
Assume the work in Select, excluding the recursive calls, takes linear time, i.e., c·n for some constant c.
Now, the run time is bounded by:
$$T(n) \le T((1-\varepsilon)n) + cn$$
$$T(n) \le T((1-\varepsilon)n) + cn, \quad\text{with } 0 < \varepsilon \le 1.$$
Let’s “unwind” this recursion:
$$T(n) \le cn + c(1-\varepsilon)n + c(1-\varepsilon)^2 n + c(1-\varepsilon)^3 n + \dots = \big[1 + (1-\varepsilon) + (1-\varepsilon)^2 + (1-\varepsilon)^3 + \dots\big]\, cn = \frac{1}{\varepsilon}\, cn.$$
So, the run time is linear in n! (A smaller ε means a bigger constant factor.)
“Divide-and-conquer” worked!
Remaining issue?
But, how do we get a reasonable splitter? In particular, we can’t spend much time on finding a splitter…
Randomness and probability to the rescue!
We will show that simply picking a random element from S to split on will work just fine!
Let’s say we want ε = 1/4. Then, we want to avoid the bottom 1/4 and the top 1/4 of the elements.
On average (in expectation), how many random guesses do we need to get a good splitter (one in the middle half of the elements)?
A: 2. (Geometric r.v. with p = 1/2, so expectation 1/p = 2.)
A bit more formally
Let’s define “phases” in the run of the algorithm. We say the alg. is in phase j if the set under consideration is of size at most $n(3/4)^j$ but greater than $n(3/4)^{j+1}$.
Note that a “successful splitter” --- one within the middle half of the elements --- cuts the size of the set to at most 3/4 of its previous size.
Let r.v. X be the number of steps of the algorithm. So,
$$X = X_0 + X_1 + X_2 + X_3 + \dots,$$
where r.v. $X_j$ is the number of steps in phase j.
The expected time spent in phase j is
$$E[X_j] \le 2\,cn(3/4)^j.$$
Why? The time for one iteration in phase j is at most $cn(3/4)^j$ for some constant c, and, in expectation, there are 2 iterations per phase.
Putting it together:
For the total run time we have
$$X = X_0 + X_1 + X_2 + X_3 + \dots \quad\text{and}\quad E[X_j] \le 2\,cn(3/4)^j.$$
So,
$$E[X] = \sum_j E[X_j] \le \sum_j 2\,cn\Big(\frac{3}{4}\Big)^j = 2\,cn \sum_j \Big(\frac{3}{4}\Big)^j = 8\,cn.$$
We have proved:
Thm. The expected run time of Select(S,k) is O(n).
Note 1: Clever randomization and divide-and-conquer eliminated the need for O(n log n) sorting!
Note 2: Incidentally, a similar strategy and analysis gives us Quicksort (“calculating the rank of all numbers”) with expected run time O(n log n).
Probability theory is a very rich topic of increasing
importance in computer science (algorithms, AI,
machine learning, and data mining), but for now, this
ends our probabilistic adventures.
Fortune’s Formula: a good read about probabilities and the math behind “gambling” and investing. Example story: how to win a Nobel prize, lose your life savings, and go $20M in debt, all in the same year!!
This concludes our journey through Discrete Mathematics:
from logic and proofs,
through sets, functions, sequences, and induction,
through algorithms and growth rates,
through number theory,
and, finally, through probability and randomization!!
THE END!!