Lecture 21 Approximation and Nested Problems

Hypergeometric Example
 Four players are playing a poker game with a deck
of 52 cards. Each player is dealt 13 cards. Let X be the
number of Kings a given player holds, and answer
the following questions.
 1. Is X a discrete or continuous random variable?
 2. Find an appropriate probability distribution that
can be used to describe X. Also, find the
corresponding parameter(s).
Hypergeometric Example
 3. Find the sample space of X and the probability
corresponding to each point in the sample space.
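One way to approach question 3 is to tabulate the probabilities numerically. The sketch below assumes X is modeled as hypergeometric with population size 52, 4 Kings, and a hand of 13 cards; the function name `hypergeom_pmf` and the variable names are illustrative and do not come from the lecture.

```python
from math import comb

# Assumed parameters: N = 52 cards, m = 4 Kings, n = 13 cards per hand.
N, m, n = 52, 4, 13

def hypergeom_pmf(k, N, m, n):
    """P(X = k): k successes in n draws without replacement."""
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

# X can only take the values 0, 1, 2, 3, 4 (there are only 4 Kings).
king_pmf = {k: hypergeom_pmf(k, N, m, n) for k in range(m + 1)}

for k, prob in king_pmf.items():
    print(f"P(X={k}) = {prob:.4f}")
```

The five probabilities should sum to 1, which is a quick check that the sample space {0, 1, 2, 3, 4} is complete.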
Binomial vs. Poisson
 Given that a random variable follows a Binomial
distribution with parameters n and p, X~BIN(n,p)
 Sometimes, we can approximate the distribution of X
with a Poisson whose λ=np.
 This is usually done when n is large and p is
small.
Example I
 A computer chip contains 1000 transistors. Each
transistor has probability 0.0025 of being defective.
What is the probability that the chip contains at most
4 defective transistors?
 Here X, the number of defective transistors, follows BIN(1000, 0.0025).
 P(X<=4)=P(X=0)+P(X=1)+P(X=2)+P(X=3)+P(X=4),
where P(X=k)=1000Ck*(0.0025)^k*(0.9975)^(1000-k)
Example I
 Or, we can consider the number of defective
transistors follows a Poisson distribution with
λ=1000*0.0025= 2.5.
 P(X<=4)=P(X=0)+P(X=1)+P(X=2)+P(X=3)+P(X=4)
, where P(X=k)=e^(-2.5)*2.5^k / k!
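The two calculations in Example I can be checked side by side. This is a minimal sketch using only the Python standard library; the helper names `binom_pmf` and `poisson_pmf` are illustrative choices, not part of the lecture.

```python
from math import comb, exp, factorial

n, p = 1000, 0.0025
lam = n * p  # Poisson rate used for the approximation (2.5)

def binom_pmf(k, n, p):
    """Exact binomial probability P(X = k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """Poisson probability P(X = k)."""
    return exp(-lam) * lam**k / factorial(k)

# P(X <= 4) is the sum of the probabilities for k = 0, 1, 2, 3, 4.
binom_cdf_4 = sum(binom_pmf(k, n, p) for k in range(5))
poisson_cdf_4 = sum(poisson_pmf(k, lam) for k in range(5))

print(f"Binomial: {binom_cdf_4:.6f}")
print(f"Poisson:  {poisson_cdf_4:.6f}")
```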
Another word on Poisson
 A Poisson experiment has the following properties:
 The probability of an occurrence is the same for any two
intervals of equal length/area
 The occurrence or non-occurrence in any interval/area is
independent of the occurrence or non-occurrence in any other
interval/area
Another word on Poisson
 In example I, there are 1000 transistors on our chip
and there are an average of 2.5 defective transistors.
 Given this, if we have some other chips with 10000
transistors, what is the average of the number of
defective transistors? How about a chip with 500
transistors?
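One way to reason about these questions is to scale the mean with the size of the chip. The sketch below assumes the defect rate per transistor stays constant across chips, which is what the equal-interval property suggests; the function name `expected_defects` is hypothetical.

```python
# Assumption: the defect rate per transistor is the same for every chip,
# so the Poisson mean scales linearly with the number of transistors.
rate_per_transistor = 2.5 / 1000  # 2.5 defects on average per 1000 transistors

def expected_defects(num_transistors):
    """Poisson mean for a chip with the given number of transistors."""
    return rate_per_transistor * num_transistors

mean_10000 = expected_defects(10_000)
mean_500 = expected_defects(500)

print(f"10000 transistors: mean = {mean_10000:.2f}")
print(f"  500 transistors: mean = {mean_500:.2f}")
```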
How good is the approximation?
 Look at two examples:
 1. Example Ia: chips with 1000 transistors, each with 0.0025
chance of being defective.
 2. Example Ib: chips with 1000 transistors, each with 25%
chance of being defective.
How good is the approximation?
 Let’s find the probability that there are at most 4
transistors that are defective on the chip using both
binomial and Poisson.
                        Example Ia           Example Ib
                        BIN(1000, 0.0025)    BIN(1000, 0.25)
Binomial                0.891429             6E-117
Poisson Approximation   0.89118              4.4E-101
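To put a number on how far apart the two answers are in each case, the ratio of the Poisson value to the binomial value can be computed. This is a sketch under the same setup as Examples Ia and Ib; the helper names `binom_cdf` and `poisson_cdf` and the dictionary `ratios` are illustrative.

```python
from math import comb, exp, factorial

def binom_cdf(k_max, n, p):
    """Exact P(X <= k_max) for X ~ BIN(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_max + 1))

def poisson_cdf(k_max, lam):
    """P(X <= k_max) for X ~ POI(lam)."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(k_max + 1))

ratios = {}
for label, p in [("Ia", 0.0025), ("Ib", 0.25)]:
    exact = binom_cdf(4, 1000, p)
    approx = poisson_cdf(4, 1000 * p)
    ratios[label] = approx / exact  # a ratio near 1 means a good approximation
    print(f"Example {label}: binomial={exact:.6g}, poisson={approx:.6g}, "
          f"ratio={ratios[label]:.6g}")
```

In Example Ia the ratio is very close to 1, while in Example Ib the two answers differ by many orders of magnitude, which is consistent with the rule that the approximation is used when n is large and p is small.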
Another approximation
 Let’s take a step back and consider two sampling
schemes and think about the probability (p) of each
unit being selected from the population.
 Sampling with replacement.
 Sampling without replacement.
 Will this p be different if the population has a size of 10,
1000, 1000000000, or infinite?
That reminds us of the binomial and hyper-geometric
distribution…
 Suppose we draw a sample of size n (fixed, say 100) from
a population.
 If you sample with replacement, p1=the probability
that each unit is selected.
 If you sample without replacement, p2=the
probability that each unit is selected.
 As the size of the population increases, the difference
between p1 and p2 decreases.
 In the extreme case of an infinitely large population,
there is no difference between sampling with and without
replacement.
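The shrinking difference can be illustrated with one concrete reading of p1 and p2, which is an assumption of this sketch rather than the lecture's definition: 20% of the population are "successes", and we look at the chance that the second draw is a success given that the first draw was one. The function name `second_draw_probs` is hypothetical.

```python
def second_draw_probs(N):
    """Chance the second draw is a success, given the first draw was one.

    Assumes (for illustration) that 20% of the N units are successes.
    """
    m = N // 5                        # successes in the population
    with_repl = m / N                 # population is unchanged after draw 1
    without_repl = (m - 1) / (N - 1)  # one success has been removed
    return with_repl, without_repl

gaps = {}
for N in (10, 1000, 1_000_000_000):
    p1, p2 = second_draw_probs(N)
    gaps[N] = p1 - p2
    print(f"N={N}: p1={p1:.10f}, p2={p2:.10f}, difference={gaps[N]:.2e}")
```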
Binomial approximation to Hyper-geometric
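 If the size of the population is large, we can use a
binomial distribution to approximate the hypergeometric
distribution, with n_bin=n_hg (the same sample size) and p=m/N,
where N is the population size and m is the number of
successes in the population.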
Example II
 In a population of size 10,000, suppose that 20% of
the individuals favor Policy A. A sample of size 100 is
taken without replacement, and let X denote the
number of sampled individuals in favor of Policy A.
 What is the exact distribution of X?
 How could we approximate the exact distribution of X?
How good is the approximation?
 Let’s find the probability that fewer than 30 people
in the sample are in favor of the policy, P(X<30).
 If we use hyper-geometric, the probability is:
0.989107184.
 If we use binomial, the probability is: 0.988751021.
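Both probabilities for Example II can be reproduced with exact integer binomial coefficients. This sketch assumes N=10,000, m=2,000 supporters (20%), and n=100; the function and variable names are illustrative.

```python
from math import comb

N, m, n = 10_000, 2_000, 100  # population, supporters (20%), sample size
p = m / N                     # success probability for the binomial

def hypergeom_pmf(k):
    """Exact P(X = k) when sampling without replacement."""
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

def binom_pmf(k):
    """Approximate P(X = k), treating the draws as independent."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# "Fewer than 30" means k = 0, 1, ..., 29.
p_hg = sum(hypergeom_pmf(k) for k in range(30))
p_bin = sum(binom_pmf(k) for k in range(30))

print(f"Hypergeometric: {p_hg:.9f}")
print(f"Binomial:       {p_bin:.9f}")
```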
Nested problems
 Suppose we have a problem and we decide to use
binomial distribution to analyze it. Then we will
need to figure out n and p.
 The same thing will happen to problems where we
decide to use other probability distributions.
 *** We always need to find the parameters of the
distribution to be able to work on it.
Nested Problems
 A problem is called a nested problem if it is one
problem inside another.
 For example, the problem first requires us to work
on a binomial (n, p), where in order to find the
parameter p, we have to work out another problem.
Example III
 There are 12 independent sections of stat225 and
each section has 35 students. From previous
experience, on average, 10 students in each section
will get an A. If the number of students getting an A
follows a Poisson distribution and the probability of
getting an A is equal for all sections, answer the
following questions.
Example III
 A. What is the probability that there are exactly 100 A’s
in this course?
 This question is an application of the properties of Poisson r.v.
 Let Xi be the number of students getting an A in section i;
then the total number of A’s in this course, X, is the sum of
all the Xi’s.
 Since each Xi~POI(10) and the sections are independent,
X~POI(12*10)=POI(120).
 Finally, the probability of interest is:
 P(X=100)=exp(-120)*120^100/100!=0.0068
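The value in part A can be checked numerically. The sketch below evaluates the Poisson probability in log space, a design choice made here to avoid forming 120^100 and 100! directly; the helper name `poisson_pmf` is illustrative.

```python
from math import exp, log, lgamma

def poisson_pmf(k, lam):
    """P(X = k) for X ~ POI(lam), computed in log space.

    lgamma(k + 1) equals log(k!), so the huge terms never appear directly.
    """
    return exp(-lam + k * log(lam) - lgamma(k + 1))

lam_total = 12 * 10  # sum of 12 independent POI(10) variables is POI(120)
p_100 = poisson_pmf(100, lam_total)

print(f"P(X=100) = {p_100:.4f}")
```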
Example III
 B. What is the probability that half of the sections
have less than 10 A’s.
 This is a nested problem.
 Each section may or may not have less than 10 A’s, and
that could be considered as a Bernoulli random variable
with parameter p (probability that a section has less than
10 A’s). This part can be solved using a Poisson random
variable
 Then the number of sections with less than 10 A’s in this
course can be considered as a binomial (n, p) where n=12
and p is what we found in the previous step.
 This is a Poisson nested within a binomial.
Example III
 Step 1. Find p
 Let Xi be the number of A’s in each section, since it
follows a Poi (10), then the probability that there are
less than 10 A’s in one section is:
 P(Xi<10)=P(Xi=0)+P(Xi=1)+P(Xi=2)+…+P(Xi=9),
where P(Xi=k)=exp(-10)*10^k/k!
 Finally, we have
 P(Xi<10)=0.46
 And this is the p for the binomial random variable
Example III
 Step 2: Now we know that each section has a 46%
probability of having less than 10 A’s. We have 12
independent sections, so the number of sections that
have less than 10 A’s, denoted by Y, follows a
binomial distribution with parameters (12, 0.46).
 We are interested in the probability that exactly half (6)
of the sections have less than 10 A’s, which is
calculated as:
 P(Y=6)=12C6*(0.46)^6*(1-0.46)^6=0.217, or 21.7%.
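The two nested steps of part B can be written out as a short calculation: first the Poisson step that produces p, then the binomial step that uses it. This is a sketch; the names `poisson_cdf`, `p_section`, `p_half`, and `p_half_rounded` are illustrative.

```python
from math import comb, exp, factorial

def poisson_cdf(k_max, lam):
    """P(X <= k_max) for X ~ POI(lam)."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(k_max + 1))

# Step 1 (inner problem): p = P(Xi < 10) = P(Xi <= 9) with Xi ~ POI(10).
p_section = poisson_cdf(9, 10)

# Step 2 (outer problem): Y ~ BIN(12, p), and we want P(Y = 6).
p_half = comb(12, 6) * p_section**6 * (1 - p_section)**6

# Same calculation with p rounded to 0.46, as on the slide.
p_half_rounded = comb(12, 6) * 0.46**6 * (1 - 0.46)**6

print(f"p = {p_section:.4f}")
print(f"P(Y=6) with unrounded p = {p_half:.4f}")
print(f"P(Y=6) with p = 0.46    = {p_half_rounded:.4f}")
```

Carrying the unrounded p into the second step changes the final answer slightly in the third decimal place, so it is worth rounding only at the end.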
Example IV
 Two players, A and B, are playing a card game. They
start with a deck of 26 cards (a standard deck with all
clubs and diamonds removed). A deals 13 cards at random
to B and keeps the other 13. A wins if he has more Aces
than B. If A and B each have one Ace, the one with the
Ace of spades wins. They repeat the game 10 times. Find
the probability that A wins exactly 6 times.
 Assuming the games are independent, the number of
games A wins follows a binomial distribution with
parameters (10, p).
Example IV
 Step 1, find p.
 A wins when he has both Aces or only the Ace of spades.
These two events cannot happen together, so we find the
probability of each separately and add them.
 A. A has two Aces: let X1 be the number of Aces in A’s hand.
Then X1~HG(26, 13, 2), and therefore
P(X1=2)=2C2*24C11/26C13=0.24
 B. A has only the Ace of spades: we can calculate the
probability directly, which is 24C12/26C13=0.26. (Note that
we do not use 25C12 in the numerator, since that would also
count the hands with 2 Aces.)
 Finally, P(A wins)=0.24+0.26=0.5
Example IV
 Step 2, find the probability that A wins 6 times.
 Let Y be the number of times A wins and Y~BIN(10, 0.5)
 P(Y=6)=10C6*0.5^10=0.205
 What if we want to calculate the probability that A wins 5 times?
 Then P(Y=5)=10C5*0.5^10=0.246
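Both steps of Example IV can be verified by counting hands. The sketch below assumes A holds 13 of the 26 cards and that the two Aces are the Ace of spades and the Ace of hearts; the variable names and the helper `binom_pmf` are illustrative.

```python
from math import comb

# Step 1 (inner problem): probability that A wins a single game.
total_hands = comb(26, 13)                            # all 13-card hands for A
p_two_aces = comb(2, 2) * comb(24, 11) / total_hands  # both Aces plus 11 others
p_spade_only = comb(24, 12) / total_hands             # Ace of spades plus 12 non-Aces
p_win = p_two_aces + p_spade_only

def binom_pmf(k, n, p):
    """P(Y = k) for Y ~ BIN(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Step 2 (outer problem): number of wins in 10 independent games.
p_6 = binom_pmf(6, 10, p_win)
p_5 = binom_pmf(5, 10, p_win)

print(f"P(A wins one game) = {p_win:.4f}")
print(f"P(Y=6) = {p_6:.3f}, P(Y=5) = {p_5:.3f}")
```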
Example IV
 If they repeat the game 100 times, how many times
do you expect B to win?
Example V
 Someone wants to open a store in downtown
Lafayette. He has decided to have his store open
Monday through Saturday but has not decided on the
hours yet. He is torn between opening at 8 and at 9. He
is willing to open the store at 8 if more than
10 customers visit between 8 and 9 on at least
four days of the week. Some quick research tells him that,
on average, about 5 customers visit a
store in the neighborhood between 8 and 9. What is
the probability that the storekeeper opens his
business at 8?
Example V
 This is also an example of a nested problem.
 The storekeeper’s decision is based on the number
of days from Mon. to Sat. that there are more than
10 customers visiting his store. This number follows
a binomial distribution.
 For a binomial r.v., we need to figure out the two
parameters, n and p to be able to work on it.
 In this case, n=6, Monday through Saturday.
 p is the probability that more than 10 customers visit
between 8 and 9 on a given day, where the number of customers
visiting is a Poisson r.v. with a mean of 5.
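The outline above can be carried through numerically. This is a sketch under the stated assumptions that daily counts are independent, that the neighborhood mean of 5 applies to this store, and that "at least four days" means 4, 5, or 6 of the 6 days; the names `poisson_cdf`, `p_busy`, and `p_open_at_8` are illustrative.

```python
from math import comb, exp, factorial

def poisson_cdf(k_max, lam):
    """P(X <= k_max) for X ~ POI(lam)."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(k_max + 1))

# Inner problem: p = P(more than 10 customers) = 1 - P(X <= 10), X ~ POI(5).
p_busy = 1 - poisson_cdf(10, 5)

# Outer problem: Y ~ BIN(6, p) counts the busy days; we want P(Y >= 4).
n_days = 6
p_open_at_8 = sum(
    comb(n_days, k) * p_busy**k * (1 - p_busy)**(n_days - k)
    for k in range(4, n_days + 1)
)

print(f"p = P(more than 10 customers in a day) = {p_busy:.4f}")
print(f"P(opens at 8) = P(Y >= 4) = {p_open_at_8:.3e}")
```

Under these assumptions the probability is very small, because a day with more than 10 customers is already rare when the mean is 5.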