
School of Information
University of Michigan
Discrete and continuous distributions
Where does the binomial coefficient come from?
Suppose I have 7 blue and pink balls, each of them uniquely marked (A, B, C, D, E, F, G) so that I can distinguish them.
How many different samples can I draw containing the same balls but
in a different order?
7!
[Figure: one such ordering, e.g. G, E, C, D, B, F, A]
I have 7 choices for the first spot, 6 choices for the second (since I’ve
picked 1 and now have only 6 to choose from),
5 choices for the third, etc.
7! = 7 * 6 * 5 * 4 * 3 * 2 * 1
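As a quick check in R (factorial() is base R):

> factorial(7)
[1] 5040
> prod(7:1)   # 7*6*5*4*3*2*1 spelled out
[1] 5040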
Now if I am just counting the number of blue and pink balls, I don’t care
about the order.
So all possible arrangements (3!) of the pink balls look the same to me
[Figure: the 3! = 6 equivalent orderings of the three pink balls: A B F D E C G, A B G D E C F, A B E D F C G, A B G D F C E, A B E D G C F, A B F D G C E]
So instead of having 7! arrangements, we have 7!/3! arrangements, because the 6 different ways of uniquely ordering the pink balls are now equivalent.
The same goes for the blue balls: if we can't tell them apart, we lose a factor of 4!
Binomial coefficient = C(n,k)
= (number of ways of arranging n different things) / [(# of ways to arrange k things) * (# of ways to arrange n-k things)]
= n! / (k! (n-k)!)
Note that the binomial coefficient is symmetric – there are the same
number of ways of choosing k or n-k things out of n
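A quick sketch in R of both the formula and the symmetry, using n = 7, k = 3 from the running example:

> factorial(7) / (factorial(3) * factorial(4))   # n!/(k!(n-k)!)
[1] 35
> choose(7, 3)   # R's built-in binomial coefficient
[1] 35
> choose(7, 4)   # symmetric: choosing n-k out of n gives the same count
[1] 35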
We've got the coefficient, what is the distribution about?
- Suppose your sample of 7 is actually drawn from a very large population (so large that it is basically unaffected by the removal of a measly 7 balls)
- p = probability that a ball is pink
- (1-p) = probability that a ball is not pink (blue)
- The probability that you draw a sample with 3 pink balls and 4 blue balls in a particular order, e.g. (two pinks, followed by 3 blues, followed by a pink, followed by a blue), is

prob(pink)*prob(pink)*prob(blue)*prob(blue)*prob(blue)*prob(pink)*prob(blue)
= p^3 * (1-p)^4
We've got the coefficient, what is the distribution about?
- But the binomial distribution just tells us the probability of drawing e.g. 3 pink balls, not 3 pink balls at particular positions in the draw
- The probability that you draw a sample with 3 pink balls and 4 blue balls in no particular order is the sum of the probabilities of all C(7,3) such orderings:

= C(7,3) * p^3 * (1-p)^4
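A minimal check in R, taking p = 0.5 for concreteness (any p would do):

> p = 0.5
> choose(7, 3) * p^3 * (1-p)^4   # the formula above
[1] 0.2734375
> dbinom(3, 7, p)                # R's built-in binomial probability agrees
[1] 0.2734375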
Probability distribution
- A probability distribution lists all the possible outcomes and their probabilities
- Outcomes are mutually exclusive
  - e.g. drawing 0, 1, 2, 3… pink balls
- Outcome probabilities sum to one
  - e.g. when drawing 7 balls, the number of pink balls has to be one of {0,1,2,3,4,5,6,7}
- Denote p(x) to mean P(X=x), that is, the probability that the outcome is x
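A quick check in R that the outcome probabilities sum to one, for the running example (n = 7, p = 0.5):

> sum(dbinom(0:7, 7, 0.5))
[1] 1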
Binomial distribution
- The binomial distribution tells us the probability of drawing k pink balls out of n
- It depends on:
  - n = the number of trials (draws)
  - k = the number of pink balls (successes)
  - p = the probability of drawing a pink ball (success)
n k
nk
p(n, k )    p (1  p)
k 
n!
k
nk

p (1  p)
k!(n  k )!
the binomial distribution in R
- dbinom(x, size, prob)
- if blue and pink balls are equally likely:

> barplot(dbinom(0:7, 7, 0.5), names.arg = 0:7)
> dbinom(3, 7, 0.5)
[1] 0.2734375

[Figure: bar plot of dbinom(0:7, 7, 0.5) over x = 0 to 7]
what if p ≠ 0.5?
> barplot(dbinom(0:7, 7, 0.1), names.arg = 0:7)

[Figure: bar plot of dbinom(0:7, 7, 0.1) over x = 0 to 7]
What is the mean?
- the mean of a binomial distribution is just n*p
- in general, μ = E(X) = Σ x·p(x)

[Figure: the mean as a weighted sum over the bar plot, 0·p(0) + 1·p(1) + 2·p(2) + … + 7·p(7) = 3.5, where the probabilities sum to 1]
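A quick check in R that Σ x·p(x) agrees with n*p:

> sum(0:7 * dbinom(0:7, 7, 0.5))   # Σ x·p(x)
[1] 3.5
> 7 * 0.5                          # n*p
[1] 3.5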
What is the variance?
- the variance of a binomial distribution is just n*p*(1-p)
- in general, σ² = E[(X-μ)²] = Σ (x-μ)²·p(x)

[Figure: the variance as a weighted sum over the bar plot, (0-3.5)²·p(0) + (1-3.5)²·p(1) + … + (7-3.5)²·p(7), where the probabilities sum to 1]
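And the corresponding check in R for the variance:

> sum((0:7 - 3.5)^2 * dbinom(0:7, 7, 0.5))   # Σ (x-μ)²·p(x)
[1] 1.75
> 7 * 0.5 * 0.5                              # n*p*(1-p)
[1] 1.75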
Which distribution has greater variance?
p = 0.5: var = n*p*(1-p) = 7*0.5*0.5 = 7*0.25 = 1.75
p = 0.1: var = n*p*(1-p) = 7*0.1*0.9 = 7*0.09 = 0.63

[Figure: bar plots of the two distributions over x = 0 to 7; the p = 0.5 distribution is the wider one]
briefly comparing an experiment to a distribution
[Figure: "Histogram of y" — the result of 1000 trials (Frequency vs y), with the theoretical distribution overlaid as a line]

experiments = 1000
tosses = 7
y = numeric(experiments)  # preallocate the results vector
for (i in 1:experiments) {
  x = sample(c("H","T"), tosses, replace = T)
  y[i] = sum(x == "H")    # number of heads in this trial
}
hist(y, breaks = -0.5:7.5)
lines(0:7, dbinom(0:7, 7, 0.5) * 1000)
cumulative distribution
- aka CDF = cumulative distribution function
- the probability that X is less than or equal to some value a

F_X(a) = Pr(X ≤ a) = Σ_{x≤a} Pr(X = x) = Σ_{x≤a} p(x)
probability distribution, P(X=x):
> barplot(dbinom(0:7, 7, 0.5), names.arg = 0:7)

cumulative distribution, P(X≤x):
> barplot(pbinom(0:7, 7, 0.5), names.arg = 0:7)

[Figure: side-by-side bar plots of the probability distribution and the cumulative distribution over x = 0 to 7]
example: surfers on a website
- Your site has a lot of visitors, 45% of whom are female
- You've created a new section on gardening
- Out of the first 100 visitors, 55 are female
- What is the probability that this many or more of the visitors are female?
- P(X≥55) = 1 - P(X≤54) = 1 - pbinom(54, 100, 0.45)
another way to calculate cumulative probabilities
- ?pbinom
- P(X≤x) = pbinom(x, size, prob, lower.tail = T)
- P(X>x) = pbinom(x, size, prob, lower.tail = F)

> 1 - pbinom(54, 100, 0.45)
[1] 0.02839342
> pbinom(54, 100, 0.45, lower.tail = F)
[1] 0.02839342
> 1 - pbinom(54, 100, 0.45)
[1] 0.02839342

[Figure: "female surfers visiting a section of a website" — the probability distribution (what is the area under the curve beyond 54?) and the cumulative distribution, showing that the answer is < 3%]
Another discrete distribution: hypergeometric
- randomly draw n elements without replacement from a set of N elements, r of which are S's (successes) and (N-r) of which are F's (failures)
- the hypergeometric random variable x is the number of S's in the draw of n elements

p(x) = [C(r, x) * C(N-r, n-x)] / C(N, n)
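In R this is dhyper(x, m, n, k), where m is the count of successes in the population (r above), n is the count of failures (N-r), and k is the number drawn. A quick check of the formula against dhyper(), with illustrative values N = 20, r = 18, n = 5, x = 4:

> choose(18, 4) * choose(2, 1) / choose(20, 5)   # C(r,x)·C(N-r,n-x)/C(N,n)
[1] 0.3947368
> dhyper(4, 18, 2, 5)
[1] 0.3947368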
hypergeometric example
- fortune cookies
- there are N = 20 fortune cookies
- r = 18 have a fortune, N-r = 2 are empty
- What is the probability that out of n = 5 cookies, x = 5 have a fortune (that is, we don't notice that some cookies are empty)?

> dhyper(5, 18, 2, 5)
[1] 0.5526316

- So there is a greater than 50% chance that we won't notice.
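The same number by direct counting: all 5 drawn cookies must come from the 18 full ones, so

> choose(18, 5) / choose(20, 5)
[1] 0.5526316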
hypergeometric and binomial
- When the population N is (very) big, whether one samples with or without replacement is pretty much the same
- 100 cookies, 10 of which are empty

[Figure: paired bar plot of hypergeometric vs binomial probabilities for the number of full cookies out of 5]
code aside
> x = 1:5
> y1 = dhyper(1:5, 90, 10, 5)   # hypergeometric probability
> y2 = dbinom(1:5, 5, 0.9)      # binomial probability
> tmp = as.matrix(t(cbind(y1, y2)))
> barplot(tmp, beside = T, names.arg = x)
Poisson distribution
- # of events in a given interval
  - e.g. number of light bulbs burning out in a building in a year
  - # of people arriving in a queue per minute

p(x) = λ^x · e^(-λ) / x!

- λ = mean # of events in a given interval
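A quick check of the formula against R's dpois(), with λ = 5 and x = 3 chosen arbitrarily:

> lambda = 5
> lambda^3 * exp(-lambda) / factorial(3)   # λ^x · e^(-λ) / x!
[1] 0.1403739
> dpois(3, lambda)
[1] 0.1403739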
Example: Poisson distribution
- You've got a box of 1,000 widgets.
- The manufacturer says that the failure rate is 5 per box on average.
- Your box contains 10 defective widgets. What are the odds?
- P(X≥10) = 1 - P(X≤9):

> ppois(9, 5, lower.tail = F)
[1] 0.03182806

- Less than 3%; maybe the manufacturer is not quite honest.
- Or the distribution is not Poisson?
Poisson approximation to binomial
- If n is large (e.g. > 100) and n*p is moderate (p should be small) (e.g. < 10), the Poisson with λ = n*p is a good approximation to the binomial

[Figure: binomial vs Poisson probabilities for x = 0 to 15]
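A sketch of the comparison in the style of the earlier code aside; n = 1000 and p = 0.005 (so λ = n*p = 5) are assumed values for illustration:

> x = 0:15
> y1 = dbinom(x, 1000, 0.005)   # exact binomial probability
> y2 = dpois(x, 5)              # Poisson approximation with λ = n*p
> barplot(t(cbind(y1, y2)), beside = T, names.arg = x)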
Continuous distributions
- Normal distribution (aka "bell curve")
  - fits many biological data well, e.g. height, weight
  - serves as an approximation to the binomial, hypergeometric, and Poisson
  - because of the Central Limit Theorem (more on this later), it is important in inference problems
sampling from a normal distribution
x <- rnorm(1000)
h <- hist(x, plot = F)
ylim <- range(0, h$density, dnorm(0))
hist(x, freq = F, ylim = ylim)
curve(dnorm(x), add = T)

[Figure: "Histogram of x" — density histogram of the 1000 samples with the normal density curve overlaid, x from -4 to 4]
plotting on log axes
- First of all, this is what a log function looks like
- y = log(x) is equivalent to x = exp(y) = e^y

> x = 1:1000
> y = log(x)
> plot(x, y)

[Figure: plot of y = log(x) for x = 1 to 1000]
plotting the function y = e^(-x)

> x = 1:20
> y = exp(-x)
> plot(x, y)

hard to tell what's going on here, all the values are so close to 0

[Figure: linear-scale plot of y = exp(-x) for x = 1 to 20]

just y on a log scale:

> plot(x, y, log = "y")

[Figure: the same data with y on a log scale, from 1e-09 to 1e-01]

changing the axes

both x and y on a log scale:

> plot(x, y, log = "xy")

[Figure: the same data with both x and y on log scales]
from PS: CO2 levels over last ~ 50 years
CO2 levels over last ~ 400,000 years