Binomial distribution


Probability theory 2
Tron Anders Moger
September 13th 2006
The Binomial distribution
• Bernoulli distribution: One experiment with
two possible outcomes, probability of
success P.
• If the experiment is repeated n times
• The probability P is constant in all
experiments
• The experiments are independent
• Then the number of successes follows a
binomial distribution
The Binomial distribution
If X has a Binomial distribution, its probability mass function (PMF) is defined as:

P(X = x) = \frac{n!}{x!(n-x)!} P^x (1-P)^{n-x}

E(X) = nP
Var(X) = nP(1-P)
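The PMF and the two moments above are easy to verify numerically. A minimal Python sketch (the helper name `binom_pmf` is just for illustration):

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p): comb(n, x) ways to place x successes."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.3
total = sum(binom_pmf(x, n, p) for x in range(n + 1))                # sums to 1
mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))             # equals nP = 3
var = sum((x - mean)**2 * binom_pmf(x, n, p) for x in range(n + 1))  # nP(1-P) = 2.1
```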
Example
• Since the early 1950s, 10,000 UFOs have been
reported in the U.S.
• Assume P(real observation)=1/100000
• Binomial experiments, n=10000, p=1/100000
• X counts the number of real observations
P(\text{at least one observation is real}) = P(X \ge 1) = 1 - P(X = 0)

= 1 - \binom{10000}{0} \left(\frac{1}{100000}\right)^0 \left(1 - \frac{1}{100000}\right)^{10000} \approx 0.095 = 9.5\%
The Hypergeometric distribution
• Randomly sample n objects from a group of
N, S of which are successes. The
distribution of the number of successes, X,
in the sample is hypergeometric:
P(X = x) = \frac{\binom{S}{x} \binom{N-S}{n-x}}{\binom{N}{n}}
         = \frac{\dfrac{S!}{x!(S-x)!} \cdot \dfrac{(N-S)!}{(n-x)!(N-S-n+x)!}}{\dfrac{N!}{n!(N-n)!}}
Example
• What is the probability of winning the
lottery, that is, getting all 7 numbers on your
coupon correct out of the total 34?
 7  34  7 
7!
(34  7)!
 


7
7

7
  7!(7  7)! (7  7)!(34  7  7  7)!  1.86  10 7
P( X  7)   
34!
 34 
 
7!(34  7)!
7 
The distribution of rare events:
The Poisson distribution
• Assume successes happen independently, at
a rate λ per time unit. The probability of x
successes during a time unit is given by the
Poisson distribution:
P(x) = \frac{e^{-\lambda} \lambda^x}{x!}

E(X) = \lambda
Var(X) = \lambda
Example: AIDS cases in 1991 (47
weeks)
• Cases per week:
1 1 0 1 2 1 3 0 0 0 0 0 0 2 1 2 2 1 3 0 1 0 0 0
1 1 1 1 1 0 2 1 0 2 0 2 1 6 1 0 0 1 0 2 0 0 0
• Mean number of cases per week:
λ=44/47=0.936
• Can model the data as a Poisson process
with rate λ=0.936
Example cont’d:
No. of cases   No. observed   Expected no. observed (from Poisson dist.)
0              20             18.4
1              16             17.2
2              8              8.1
3              2              2.5
4              0              0.6
5              0              0.11
6              1              0.017
• Calculation: P(X=2) = 0.936^2 · e^{-0.936}/2! = 0.17
• Multiply by the number of weeks: 0.17*47=8.1
• Poisson distribution fits data fairly well!
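The expected counts in the table come from multiplying each Poisson probability by the 47 weeks; a Python sketch of the same calculation:

```python
from math import exp, factorial

lam, weeks = 44 / 47, 47   # rate estimated from the data

def poisson_pmf(x, lam):
    return lam**x * exp(-lam) / factorial(x)

# Expected number of weeks with exactly x cases
expected = {x: weeks * poisson_pmf(x, lam) for x in range(7)}
# expected[0] ≈ 18.4 and expected[2] ≈ 8.1, as in the table
```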
The Poisson and the Binomial
• Assume X is Bin(n, P), E(X) = nP
• Probability of 0 successes: P(X = 0) = (1 − P)^n
• Can write λ = nP, hence P(X = 0) = (1 − λ/n)^n
• If n is large and P is small, this converges to e^{−λ},
the probability of 0 successes in a Poisson
distribution!
• Can show that this also applies for other
probabilities. Hence, Poisson approximates
Binomial when n is large and P is small (n>5,
P<0.05).
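The approximation is easy to see numerically; a Python comparison for one choice of large n and small P (the values n = 1000, P = 0.002 are illustrative):

```python
from math import comb, exp, factorial

n, p = 1000, 0.002
lam = n * p   # λ = nP = 2

binom   = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(6)]
poisson = [lam**x * exp(-lam) / factorial(x) for x in range(6)]
# The two lists agree to about three decimal places
```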
Bivariate distributions
• If X and Y are a pair of discrete random
variables, their joint probability function
expresses the probability that they
simultaneously take specific values:
– joint probability: P(x, y) = P(X = x ∩ Y = y)
– marginal probability: P(x) = \sum_y P(x, y)
– conditional probability: P(x | y) = \frac{P(x, y)}{P(y)}
– X and Y are independent if for all x and y:
P(x, y) = P(x) P(y)
Example
• The probabilities for
– A: Rain tomorrow
– B: Wind tomorrow
are given in the following table:
             No wind   Some wind   Strong wind   Storm
No rain      0.1       0.2         0.05          0.01
Light rain   0.05      0.1         0.15          0.04
Heavy rain   0.05      0.1         0.1           0.05
Example cont’d:
• Marginal probability of no rain: 0.1+0.2+0.05+0.01=0.36
• Similarly, marginal probabilities of light and heavy rain: 0.34 and
0.3. Hence the marginal distribution of rain is a PDF!
• Conditional probability of no rain given storm:
0.01/(0.01+0.04+0.05)=0.1
• Similarly, conditional probabilities of light and heavy rain given storm:
0.4 and 0.5. Hence the conditional distribution of rain given storm is a
PDF!
• Are rain and wind independent? Marginal probability of no wind:
0.1+0.05+0.05=0.2
P(no rain)·P(no wind) = 0.36·0.2 = 0.072 ≠ 0.1 = P(no rain, no wind),
so rain and wind are not independent.
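The same check can be scripted; a Python sketch using the probabilities from the table:

```python
# Joint probabilities for (rain, wind) from the table
joint = {
    ("no rain", "no wind"): 0.10,       ("no rain", "some wind"): 0.20,
    ("no rain", "strong wind"): 0.05,   ("no rain", "storm"): 0.01,
    ("light rain", "no wind"): 0.05,    ("light rain", "some wind"): 0.10,
    ("light rain", "strong wind"): 0.15, ("light rain", "storm"): 0.04,
    ("heavy rain", "no wind"): 0.05,    ("heavy rain", "some wind"): 0.10,
    ("heavy rain", "strong wind"): 0.10, ("heavy rain", "storm"): 0.05,
}
p_no_rain = sum(p for (r, w), p in joint.items() if r == "no rain")   # 0.36
p_no_wind = sum(p for (r, w), p in joint.items() if w == "no wind")   # 0.20
# Independence would require the product to equal the joint probability
product = p_no_rain * p_no_wind   # 0.072, but P(no rain, no wind) = 0.1
```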
Covariance and correlation
• Covariance measures how two variables vary
together:
Cov(X, Y) = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X) E(Y)
• Correlation is always between −1 and 1:

Corr(X, Y) = \frac{Cov(X, Y)}{\sqrt{Var(X) Var(Y)}} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}
• If X, Y independent, then E(XY) = E(X) E(Y)
• If X, Y independent, then Cov(X, Y) = 0
• If Cov(X, Y) = 0, then Var(X + Y) = Var(X) + Var(Y)
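For a discrete pair, all of these quantities reduce to sums over the joint table. A Python sketch with a made-up joint PMF (the numbers are hypothetical, chosen only for illustration):

```python
# Hypothetical joint pmf for (X, Y)
joint = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.4}

EX  = sum(x * p for (x, y), p in joint.items())       # 0.5
EY  = sum(y * p for (x, y), p in joint.items())       # 0.7
EXY = sum(x * y * p for (x, y), p in joint.items())   # 0.4
cov = EXY - EX * EY                                   # E(XY) - E(X)E(Y) = 0.05

VarX = sum(x * x * p for (x, y), p in joint.items()) - EX**2
VarY = sum(y * y * p for (x, y), p in joint.items()) - EY**2
corr = cov / (VarX * VarY) ** 0.5                     # always in [-1, 1]
```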
Continuous random variables
• Used when the outcomes can take any value
(with decimals) on a continuous scale
• Probabilities are assigned to intervals of
numbers; individual numbers generally
have probability zero
• Area under a curve: Integrals
Cdf for continuous random variables
• As before, the cumulative distribution
function F(x) is equal to the probability of
all outcomes less than or equal to x.
• Thus we get P(a < X ≤ b) = F(b) − F(a)
• The probability density function is however
now defined so that

P(a < X \le b) = \int_a^b f(x)\,dx

• We get that

F(x_0) = \int_{-\infty}^{x_0} f(x)\,dx
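The relation P(a < X ≤ b) = F(b) − F(a) = ∫ₐᵇ f(x) dx can be checked numerically for any density; here a Python sketch with the illustrative density f(x) = 2x on [0, 1], whose cdf is F(x) = x²:

```python
def f(x):          # an example density on [0, 1]
    return 2 * x

def F(x):          # its cdf
    return x ** 2

a, b, steps = 0.2, 0.7, 100_000
dx = (b - a) / steps
# Midpoint-rule approximation of the integral of f from a to b
integral = sum(f(a + (i + 0.5) * dx) * dx for i in range(steps))
# Matches F(b) - F(a) = 0.49 - 0.04 = 0.45
```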
Expected values
• The expectation of a continuous random
variable X is defined as
E ( X )   xf ( x)dx
• The variance, standard deviation,
covariance, and correlation are defined
exactly as before, in terms of the
expectation, and thus have the same
properties
Example: The uniform distribution
on the interval [0,1]
• f(x) = 1
• F(x) = x

E(X) = \int_0^1 x f(x)\,dx = \int_0^1 x\,dx = \left[\tfrac{1}{2} x^2\right]_0^1 = \tfrac{1}{2}

Var(X) = E(X^2) - E(X)^2 = \int_0^1 x^2\,dx - 0.5^2 = \tfrac{1}{3} - \tfrac{1}{4} = \tfrac{1}{12}
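A quick simulation confirms these values; a Python sketch drawing from Uniform[0, 1]:

```python
import random

random.seed(0)                      # reproducible draws
xs = [random.random() for _ in range(200_000)]
mean = sum(xs) / len(xs)            # ≈ E(X) = 1/2
var = sum((x - mean)**2 for x in xs) / len(xs)   # ≈ Var(X) = 1/12
```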
The normal distribution
• The most used continuous probability
distribution:
– Many observations tend to approximately
follow this distribution
– It is easy and nice to do computations with
– BUT: Using it can result in wrong conclusions
when it is not appropriate
Histogram of weight with normal curve
displayed
[Figure: Distribution of weight among 95 students; x-axis Weight (kg), 40.0–95.0, with fitted normal curve]
The normal distribution
• The probability density function is

f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x-\mu)^2 / 2\sigma^2}

where E(X) = \mu and Var(X) = \sigma^2
• Notation: N(\mu, \sigma^2)
• Standard normal distribution: N(0, 1)
• Using the normal density is often OK unless
the actual distribution is very skewed
• Also: µ ± σ covers ca 68% of the distribution
• µ ± 2σ covers ca 95% of the distribution
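The coverage figures can be computed from the standard normal cdf, which is available through the error function in Python's standard library:

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Normal cdf via erf; no table lookup needed."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

within_1sd = normal_cdf(1) - normal_cdf(-1)   # ≈ 0.683, i.e. ca 68%
within_2sd = normal_cdf(2) - normal_cdf(-2)   # ≈ 0.954, i.e. ca 95%
```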
The normal distribution with small and
large standard deviation σ
[Figure: two normal density curves over x = 2–20, one with small and one with large standard deviation σ]
Simple method for checking if data
are well approximated by a normal
distribution: Explore
• As before, choose Analyze->Descriptive
Statistics->Explore in SPSS.
• Move the variable to Dependent List (e.g.
weight).
• Under Plots, check Normality Plots with
tests.
Histogram of lung function for the
students
[Figure: histogram of average PEF value measured in a sitting position, 300–800; Std. Dev = 120.12, Mean = 503, N = 95]
Q-Q plot for lung function
[Figure: Normal Q-Q plot of PEFSITTM; observed values 200–800 against standard normal quantiles −3 to 3]
Age – not normal
[Figure: histogram of Age, 20.0–35.0; Std. Dev = 3.11, Mean = 22.4, N = 95]
Q-Q plot of age
[Figure: Normal Q-Q plot of AGE; observed values 10–40 against standard normal quantiles −2 to 3]
A trick for data that are skewed to
the right: Log-transformation!
[Figure: histogram of the variable SKEWED, 0.00–11.00; Std. Dev = 1.71, Mean = 1.50, N = 106]
Skewed distribution, with e.g. the observations 0.40, 0.96, 11.0
Log-transformed data
[Figure: histogram of the variable LNSKEWD, −2.75 to 2.25; Std. Dev = 1.05, Mean = −.12, N = 106]
ln(0.40) = −0.92
ln(0.96) = −0.04
ln(11) = 2.40
Do the analysis on log-transformed data
SPSS: Transform → Compute
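The same transformation in Python (natural log, applied to the slide's example values):

```python
from math import log

data = [0.40, 0.96, 11.0]          # right-skewed: one large observation
logged = [log(x) for x in data]    # ≈ [-0.92, -0.04, 2.40]
```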
OK, the data follows a normal
distribution, so what?
• First lecture, pairs of terms:
– Sample – population
– Histogram – distribution
– Mean – Expected value
• In statistics we would like the results from
analyzing a small sample to apply to the
population
• One has to collect a sample that is representative
w.r.t. age, gender, place of residence, etc.
New way of reading tables and
histograms:
• Histograms show that data can be described by a normal
distribution
• Want to conclude that data in the population are normally
distributed
• Mean calculated from the sample is an estimate of the
expected value µ of the population normal distribution
• Standard deviation in the sample is an estimate of σ in the
population normal distribution
• Mean±2*(standard deviation) as estimated from the sample
(hopefully) covers 95% of the population normal
distribution
In addition:
• Most standard methods for analyzing continuous
data assume a normal distribution.
• When n is large and P is not too close to 0 or 1, the
Binomial distribution can be approximated by the
normal distribution
• A similar phenomenon is true for the Poisson
distribution
• This is a phenomenon that occurs for all
distributions that can be seen as a sum of
independent observations.
• It means that the normal distribution appears
almost whenever you want to do statistics
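The normal approximation to the Binomial mentioned above can be illustrated numerically; a Python sketch comparing an exact Binomial probability with the matching normal one (the choices n = 100, P = 0.4, and the interval are illustrative):

```python
from math import comb, erf, sqrt

n, p = 100, 0.4
mu, sigma = n * p, sqrt(n * p * (1 - p))   # match mean and standard deviation

def Phi(z):                                 # standard normal cdf
    return 0.5 * (1 + erf(z / sqrt(2)))

# Exact P(31 <= X <= 50) under Bin(100, 0.4)
exact = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(31, 51))
# Normal approximation with continuity correction
approx = Phi((50.5 - mu) / sigma) - Phi((30.5 - mu) / sigma)
```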
The Exponential distribution
• The exponential distribution is a distribution for
positive numbers (parameter λ):
f(t) = \lambda e^{-\lambda t}

E(T) = 1/\lambda
Var(T) = 1/\lambda^2
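A simulation check of the mean and variance, using the standard library's exponential sampler (λ = 2 is an arbitrary illustration):

```python
import random

random.seed(1)
lam = 2.0
ts = [random.expovariate(lam) for _ in range(200_000)]
mean = sum(ts) / len(ts)                          # ≈ 1/λ = 0.5
var = sum((t - mean)**2 for t in ts) / len(ts)    # ≈ 1/λ² = 0.25
```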
Next time:
• Sampling and estimation
• Will talk much more in depth about the
topics mentioned in the last few slides today