Transcript Slide 1

CS110: Introduction to Computer Science: Lab12
Birthdays Parties and Counting Blind
Using Probabilities to Build Estimates
Suppose you are throwing a surprise
party for your friends. However, it is
essential that no two people at the
party share a birthday. How many
people can you invite and still be
confident that this happens.
Interestingly, solutions to this
problem are useful for counting
things that are too big to capture.
Core Quantitative Issue
Computation
Probabilities
Estimation
Supporting Computational Issues
Random Numbers
Combinations
Experimental Science
Prepared by Fred Annexstein
University of Cincinnati
CC Some rights reserved. 2007
1
The Birthday Paradox
A simple problem in elementary probability and
statistics is the Birthday Problem: What is
the probability that at least two of N
randomly selected people have the same
birthday? (Same month and day, but not
necessarily the same year.)
Random num
from 0 to 364
BirthDay=
01/01 + num
1/1/2007
154
6/4/2007
216
8/5/2007
182
7/2/2007
344
12/11/2007
2
1/3/2007
338
12/5/2007
216
8/5/2007
253
9/11/2007
57
2/27/2007
Build the following table in Excel that gives a
random listing of Birthdays, starting with any
day (we chose 01/01)
181
7/1/2007
362
12/29/2007
164
6/14/2007
You will use Excel’s “conditional formatting” to
highlight those rows that indicate days that
are duplicates of previously appearing days
on the list.
116
4/27/2007
292
10/20/2007
23
1/24/2007
128
5/9/2007
57
2/27/2007
2
7/12/2007
The problem is usually simplified by assuming
two things: Nobody was born on February
29. People's birthdays are uniformly
distributed over the other 365 days of the
year.
Is it surprising how fast duplicates begin to
appear?!
192
The Birthday Problem in terms of Probabilities and Recursion
This problem, like others we have seen before, is easier to solve by
considering the complementary problem: What is the probability that
N randomly selected people have all different birthdays?
Well let’s see: Suppose for the moment that we knew the probability that
N-1 people all had different birthdays.
Let us call that value All_Different (N-1).
Good, put that in your pocket. Now focus on the probability that the Nth
person coming in to the party also had a different birthday from all
others. Sure, we know that would be the chance of hitting any of the
days not seen so far. Let us call that value
Diff_Person = 365-(N-1) / 365
Hence we know that:
All_Different(N) = All_Different(N-1) * Diff_Person
This is called a Recursive Function – an extremely important
concept in Computer Science. To solve it all you need is an
initial value, say for one person N=1, and then iterate.
3
Sample Spreadsheet
4
Questions:
1. When does the probability become greater than 50-50 that two
people at the party share a birthday?
2. When does the probability become greater than 75% that two
people at the party share a birthday?
3. When does the probability become greater than 99.9% that two
people at the party share a birthday?
4. On the planet Mars, there are 669 Martian days in a Martian year.
For a party on Mars when does the probability become greater
than 50-50 that two Martians at a party share a birthday?
5
BUT WHAT ABOUT LEAP YEARS?
What happens to the probabilities if we add February 29 to the mix? Do
you think it will increase or decrease the chance that 2 people share
a birthday?
To model the leap year, it gets somewhat more complicated, but we are
smart and can make some additional assumptions:
• Equal numbers of people are born on days other than February 29.
• The number of people born on February 29 is one-fourth of the
number of people born on any other day. Why?
Hence the probability that a randomly selected person was born on
February 29 is 1 out of 1461 = 0.25/365.25, and the probability that
a randomly selected person was born on another specified day is
1/365.25.
The probability that N persons, possibly including one born on February
29, have all different birthdays is the sum of two probabilities:
• That the N persons were born on N different days other than
February 29.
• That the N persons were born on N different days, and include one
person born on February 29.
We can add the probabilities because the two cases are mutually
6
exclusive. Now each probability can be expressed recursively:
Leap Year Recursion
First compute the case that excludes birthdays on Feb29. We have that
the probability that N persons were born on N different days other
than February 29 is:
All_Diff_no_Feb29(N) = All_Diff_no_Feb29 (N-1) *
Diff_Person_no_Feb29 (N)
Diff_Person_no_Feb29 (N) = (365- (N-1)) / 365.25
is the probability the Nth person has different birthday from all others.
For the second case, which includes one birthday on Feb 29:
All_Diff_yes_Feb29 (N) = All_Diff_yes_Feb29(N-1) *
Diff_Person_no_Feb_29 (N-1) +
All_Diff_no_Feb29(N-1) * 0.25 / 365.25;
Redo the calculations with the new ‘leap year’ formulas. Questions:
5. Does the new model increase or decrease the chance that 2 people share a birthday?
6. Do any of your previous answers (#1-3) change with the new calculations?
7
Estimating the Size of Big
We can apply the logic we just used in reverse to estimate the size of
something that might be very difficult to count directly, e.g., the number
of web-pages on the WWW, the number of deer in Mt. Airy, or the
number of ants in the Amazon rainforest.
Blind Counting Theorem: Given a large set of N objects. If we select
objects at random and with replacement, then the number of expected
draws before a repetition occurs is approximately:
=SQRT(N * pi() / 2)
Run 6 experiments using N=10,100,1000,10000,10^5,10^6 and using your
method from Slide 2 you must determine when a repetition occurs.
Compute the absolute and relative errors of your results with respect to
this Theorem on each experiment. Produce a table like the following:
Experiment
N
First repeat
Theorem Value
Error Abs
Error Rel
1
10
6
3.96
2.04
51.39%
2
100
16
12.53
3.47
27.66%
3
1000
50
39.63
10.37
26.16%
4
10000
120
125.33
5.33
4.25%
5
100000
301
396.33
95.33
24.05%
6
1000000
1400
1253.31
146.69
11.70%
8