Hypergeometric Random Variables

Sampling without replacement
• When sampling with replacement, each trial
remains independent. For example,…
• If balls are replaced, P(red ball on 2nd draw) =
P(red ball on 2nd draw | first ball was red).
• If balls are not replaced, then given that the first ball is red,
there is less chance of a red ball on the 2nd draw.
For a large population of balls, though,
the effect may be minimal.
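To make the contrast concrete, here is a minimal Python sketch of the conditional probability of a red ball on the 2nd draw, given a red ball on the 1st, with and without replacement. The urn sizes (10 and 10,000 balls, 30% red) and the function name are illustrative assumptions, not values from the slides.

```python
from fractions import Fraction

def p_second_red_given_first_red(red, total, replace):
    """P(red on 2nd draw | red on 1st draw) for an urn with `red` red balls out of `total`."""
    if replace:
        # The first ball goes back, so the urn is unchanged.
        return Fraction(red, total)
    # One red ball has been removed from the urn.
    return Fraction(red - 1, total - 1)

for total in (10, 10_000):
    red = 3 * total // 10  # 30% red (illustrative)
    with_repl = p_second_red_given_first_red(red, total, replace=True)
    without_repl = p_second_red_given_first_red(red, total, replace=False)
    print(total, float(with_repl), float(without_repl))
```

With 10 balls the two probabilities differ noticeably (3/10 versus 2/9); with 10,000 balls they nearly coincide.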
n trials, y red balls
• Suppose there are r red balls, and N – r other balls.
• Consider Y, the number of red balls in n selections,
where now the trials may be dependent.
(for sampling without replacement, when sample
size is significant relative to the population)
• The probability that y of the n selected balls are red is
$$p(y) = \frac{C^{r}_{y}\, C^{N-r}_{n-y}}{C^{N}_{n}}$$
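As a rough sketch, the probability above can be evaluated directly with Python's `math.comb`, which computes the "choose" counts written as C. The function name `hypergeom_pmf` and the small urn (N = 10, r = 4, n = 3) are assumptions chosen for illustration.

```python
from math import comb

def hypergeom_pmf(y, N, n, r):
    """P(Y = y): y red balls among n drawn without replacement from N balls, r of them red."""
    if y < 0 or y > n or y > r or n - y > N - r:
        return 0.0  # impossible outcome
    # (ways to pick y red) * (ways to pick n - y non-red) / (ways to pick any n)
    return comb(r, y) * comb(N - r, n - y) / comb(N, n)

# Illustrative urn: N = 10 balls, r = 4 red, draw n = 3.
print(hypergeom_pmf(1, N=10, n=3, r=4))  # 4 * 15 / 120 = 0.5
```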
Hypergeometric R. V.
• A random variable has a hypergeometric
distribution with parameters N, n, and r
if its probability function is given by
$$p(y) = \frac{C^{r}_{y}\, C^{N-r}_{n-y}}{C^{N}_{n}}$$
where y is an integer with 0 ≤ y ≤ min( n, r ) and n – y ≤ N – r.
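One way to check that this really defines a probability function is to list p(y) over every attainable y and confirm the values add to 1. The sketch below does so with exact fractions; `hypergeom_table` and the parameter values are hypothetical choices for illustration.

```python
from fractions import Fraction
from math import comb

def hypergeom_table(N, n, r):
    """Exact pmf over the attainable values of y, as a dict {y: p(y)}."""
    lo = max(0, n - (N - r))  # cannot draw more non-red balls than exist
    hi = min(n, r)            # cannot draw more red balls than exist, or than the sample size
    denom = comb(N, n)
    return {y: Fraction(comb(r, y) * comb(N - r, n - y), denom)
            for y in range(lo, hi + 1)}

table = hypergeom_table(N=10, n=3, r=4)
for y, p in table.items():
    print(y, p)
print("total:", sum(table.values()))
```

For N = 10, n = 3, r = 4 the attainable values are y = 0, 1, 2, 3, and the endpoints y = 0 and y = min(n, r) both carry positive probability.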
Hypergeometric mean, variance
• If Y is a hypergeometric random variable with
parameters N, n, and r, the expected value and variance of Y
are given by
$$E(Y) = \frac{nr}{N} \quad\text{and}\quad V(Y) = n\left(\frac{r}{N}\right)\left(\frac{N-r}{N}\right)\left(\frac{N-n}{N-1}\right)$$
(The proof is not as easy as for the previous distributions
and is not given at this time.)
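Although the proof is omitted, the formulas can at least be checked numerically. The following sketch computes the mean and variance straight from the probability function and compares them with the closed forms; the function name and the values N = 10, n = 3, r = 4 are assumptions for illustration only.

```python
from fractions import Fraction
from math import comb

def hypergeom_moments(N, n, r):
    """Mean and variance computed directly from the pmf (exact fractions)."""
    denom = comb(N, n)
    lo, hi = max(0, n - (N - r)), min(n, r)
    pmf = {y: Fraction(comb(r, y) * comb(N - r, n - y), denom)
           for y in range(lo, hi + 1)}
    mean = sum(y * p for y, p in pmf.items())
    var = sum((y - mean) ** 2 * p for y, p in pmf.items())
    return mean, var

N, n, r = 10, 3, 4  # illustrative values
mean, var = hypergeom_moments(N, n, r)
formula_mean = Fraction(n * r, N)
formula_var = n * Fraction(r, N) * Fraction(N - r, N) * Fraction(N - n, N - 1)
print(mean, formula_mean)  # both 6/5
print(var, formula_var)    # both 14/25
```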
Sounds like…
• If we let p = r/N and q = 1 – p = (N – r)/N, then the
hypergeometric measures
$$E(Y) = \frac{nr}{N} = np \quad\text{and}\quad V(Y) = n\left(\frac{r}{N}\right)\left(\frac{N-r}{N}\right)\left(\frac{N-n}{N-1}\right) = npq\left(\frac{N-n}{N-1}\right)$$
look quite similar to the expressions for the
binomial distribution, E(Y) = np and V(Y) = npq.
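The only difference from the binomial variance is the factor (N – n)/(N – 1), commonly called the finite population correction. A small sketch, with an assumed sample size n = 20 and proportion p = 0.10, shows how that factor moves toward 1 as the population grows:

```python
def finite_population_correction(N, n):
    """The factor (N - n) / (N - 1) multiplying npq in the hypergeometric variance."""
    return (N - n) / (N - 1)

n, p = 20, 0.10  # illustrative sample size and proportion of red balls
binom_var = n * p * (1 - p)
for N in (50, 200, 1000, 100_000):
    fpc = finite_population_correction(N, n)
    # Hypergeometric variance = binomial variance * correction factor
    print(N, round(fpc, 4), round(binom_var * fpc, 4))
```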
Rule of Thumb
• For cases when n / N < 0.05, it may be reasonable
to approximate the hypergeometric probabilities
using a binomial distribution.
• Suppose each hour, 1000 bottles are filled by a
machine and on average 10% are “underfilled”.
• Each hour 20 of the bottles are randomly selected.
Find the probability that at least 3 of the 20 are underfilled.
• Since 20/1000 = 0.02, perhaps we could use the
binomial distribution to approximate the answer.
Easy binomial probability?
• Let p = 0.10, the probability of “success” (a bottle being underfilled).
• P( at least 3 underfilled ) =
• 1 – P( 0, 1, or 2 underfilled)
= 1 – [ P(Y = 0) + P(Y = 1) + P(Y = 2)]
• Approximately equal to
1 – binomialcdf(20, 0.10, 2) = 0.32307
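For readers without a calculator's `binomialcdf` function, the same number can be reproduced with a few lines of Python. The helper `binom_cdf` below is a hypothetical stand-in written for this sketch, not a library call.

```python
from math import comb

def binom_cdf(n, p, k):
    """P(Y <= k) for Y ~ Binomial(n, p)."""
    return sum(comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(k + 1))

# P(at least 3 underfilled) = 1 - P(Y <= 2)
p_at_least_3 = 1 - binom_cdf(20, 0.10, 2)
print(round(p_at_least_3, 5))  # 0.32307
```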
How close is this to the actual hypergeometric probability?
A hypergeometric probability
• P( at least 3 underfilled ) =
• 1 – P( 0, 1, or 2 underfilled)
= 1 – [ P(Y = 0) + P(Y = 1) + P(Y = 2)]
$$= 1 - \frac{C^{100}_{0}\, C^{900}_{20}}{C^{1000}_{20}} - \frac{C^{100}_{1}\, C^{900}_{19}}{C^{1000}_{20}} - \frac{C^{100}_{2}\, C^{900}_{18}}{C^{1000}_{20}} \approx 0.3228$$
As compared to 0.32307 using a binomial approx.
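The two answers can be computed side by side. This sketch assumes exactly r = 100 underfilled bottles among the N = 1000 filled in the hour, as in the calculation above; the function name is illustrative.

```python
from math import comb

def hypergeom_pmf(y, N, n, r):
    """P(Y = y) when n items are drawn without replacement from N, r of which are 'successes'."""
    return comb(r, y) * comb(N - r, n - y) / comb(N, n)

N, n, r = 1000, 20, 100
# Exact hypergeometric: 1 - [P(Y = 0) + P(Y = 1) + P(Y = 2)]
exact = 1 - sum(hypergeom_pmf(y, N, n, r) for y in range(3))
# Binomial approximation with p = r / N = 0.10
approx = 1 - sum(comb(n, y) * 0.10**y * 0.90 ** (n - y) for y in range(3))
print(round(exact, 4), round(approx, 5))
```

The two values agree to about three decimal places, which is consistent with the n / N < 0.05 rule of thumb.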
The Binomial Approximation
[Figure: the hypergeometric distribution … and a very similar binomial distribution]
As population increases
• Let N get large as n and p=r/N remain constant,
and we would see that
$$\lim_{N \to \infty} \frac{C^{r}_{y}\, C^{N-r}_{n-y}}{C^{N}_{n}} = C^{n}_{y}\, p^{y} q^{n-y}$$
Hypergeometric probabilities converge to the
binomial probabilities, as the events become
“almost independent”.
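The convergence can be watched numerically. In this sketch n = 20, p = 0.10, and y = 2 are held fixed (illustrative choices) while N grows, and the gap between the hypergeometric and binomial probabilities is printed for each N.

```python
from math import comb

def hypergeom_pmf(y, N, n, r):
    return comb(r, y) * comb(N - r, n - y) / comb(N, n)

def binom_pmf(y, n, p):
    return comb(n, y) * p**y * (1 - p) ** (n - y)

n, p, y = 20, 0.10, 2  # held fixed while N grows (illustrative)
target = binom_pmf(y, n, p)
gaps = []
for N in (100, 1_000, 10_000, 100_000):
    r = N // 10  # keeps r / N = 0.10 for these values of N
    gap = abs(hypergeom_pmf(y, N, n, r) - target)
    gaps.append(gap)
    print(N, gap)
```

Each tenfold increase in N shrinks the gap, in line with the limit stated above.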
Proof ?
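One possible line of argument, offered as a sketch rather than as the proof intended in the course: write each C in terms of factorials, collect the n!/(y!(n – y)!) term, and pair the remaining factors so that each ratio tends to p or q as N grows with r = pN.

```latex
\begin{align*}
\frac{C^{r}_{y}\, C^{N-r}_{n-y}}{C^{N}_{n}}
  &= \frac{n!}{y!\,(n-y)!} \cdot
     \frac{\bigl[r(r-1)\cdots(r-y+1)\bigr]\,
           \bigl[(N-r)(N-r-1)\cdots(N-r-(n-y)+1)\bigr]}
          {N(N-1)\cdots(N-n+1)} \\
  &= C^{n}_{y}\,
     \underbrace{\frac{r}{N}\cdot\frac{r-1}{N-1}\cdots\frac{r-y+1}{N-y+1}}
       _{y \text{ factors, each } \to\, p}
     \;\cdot\;
     \underbrace{\frac{N-r}{N-y}\cdot\frac{N-r-1}{N-y-1}\cdots
                 \frac{N-r-(n-y)+1}{N-n+1}}
       _{n-y \text{ factors, each } \to\, q} \\
  &\longrightarrow C^{n}_{y}\, p^{y} q^{n-y}
     \quad \text{as } N \to \infty \text{ with } r = pN .
\end{align*}
```

Because n and y stay fixed, there are only finitely many factors, so the limit of the product is the product of the limits.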