Probability Distributions
Random Variables
Week 2
• A random variable, X, associates a unique numerical value with every outcome of an experiment. The value of the random variable will vary from trial to trial as the experiment is repeated.
• There are two types of random variable - discrete and continuous.
• A random variable has either an associated probability mass function (PMF, for a discrete random variable) or a probability density function (PDF, for a continuous random variable).
Random variables can be discrete or continuous
• Discrete random variables have a countable number of
outcomes
• Examples: Dead/alive, treatment/placebo, dice, counts, etc.
• Continuous random variables have an infinite
continuum of possible values.
• Examples: blood pressure, concentration, weight, the speed
of a car, the real numbers from 1 to 6.
What is a probability distribution?
The set of probabilities for the possible outcomes of a random variable
is called a “probability distribution.”
A probability distribution is a statistical function that describes all the
possible values that a random variable can take within a given range.
What is a probability distribution?
Let X be the number obtained in throwing a fair die. The
corresponding probability function is:
f(x) = 1/6,  x = 1, 2, …, 6

Σ P(X = x) = 1

[Bar chart of f(x) against x; the bar heights are the actual probabilities.]
What is a probability distribution?
Let X be the sum of the two numbers obtained in throwing
two fair dice. The corresponding probability mass function is:
Σ P(X = x) = 1

[Bar chart of f(x) against x; the bar heights are the actual probabilities.]
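The same PMF can be tabulated directly by enumerating the 36 equally likely outcomes of the two dice. A minimal Python sketch (one of many ways to do this):

```python
# Enumerate the 36 equally likely outcomes of two fair dice and build
# the PMF of their sum, using exact fractions.
from fractions import Fraction
from itertools import product

pmf = {}
for a, b in product(range(1, 7), repeat=2):
    s = a + b
    pmf[s] = pmf.get(s, Fraction(0)) + Fraction(1, 36)

for s in sorted(pmf):
    print(s, pmf[s])            # e.g. the sum 7 has probability 6/36
print(sum(pmf.values()))        # 1, as required of any PMF
```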
Probability Mass Function
Definition: p(x) is a probability mass function for the discrete random variable X if, for all x:

p(x) ≥ 0 and Σ p(x) = 1
The Mean or Expected Value of a PMF
Probability Mass Functions have means given by:
μ = E(X) = Σ xᵢ f(xᵢ)

For example, for the single die:

μ = 1×(1/6) + 2×(1/6) + 3×(1/6) + 4×(1/6) + 5×(1/6) + 6×(1/6) = 3.5
Expected Value = Long range average
Variance of a discrete distribution
σ² = Var(X) = Σ (xᵢ − μ)² f(xᵢ)
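A minimal Python sketch applying both formulas to the fair die:

```python
# Mean and variance of the fair-die PMF, computed from the formulas above.
xs = [1, 2, 3, 4, 5, 6]
f = 1 / 6                                   # f(x) = 1/6 for every face

mu = sum(x * f for x in xs)                 # E(X) = 3.5
var = sum((x - mu) ** 2 * f for x in xs)    # Var(X) = 35/12, about 2.92
print(mu, var)
```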
Cumulative Distribution Function (CDF)
CDF – cumulative (probability) distribution function; assigns to each value x the sum of the probabilities of all outcomes less than or equal to x
Cumulative distribution function (CDF)
[Step plot of the fair-die CDF: P(X ≤ x) rises in steps of 1/6, from 1/6 at x = 1 to 1.0 at x = 6.]
Cumulative distribution function
x    P(X ≤ x)
1    P(X ≤ 1) = 1/6
2    P(X ≤ 2) = 2/6
3    P(X ≤ 3) = 3/6
4    P(X ≤ 4) = 4/6
5    P(X ≤ 5) = 5/6
6    P(X ≤ 6) = 6/6
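The CDF is just the running sum of the PMF; a minimal sketch:

```python
# Build the fair-die CDF as the running sum of the PMF, with exact fractions.
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}
running = Fraction(0)
for x in sorted(pmf):
    running += pmf[x]
    print(x, running)           # P(X <= x): 1/6, 1/3, 1/2, 2/3, 5/6, 1
```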
Discrete Distributions
• Bernoulli Distribution – yes/no responses
• Binomial Distribution – sums of Bernoulli responses
• Geometric Distribution – number of trials until the first success
• Poisson Distribution – points in time or space
Bernoulli Distribution
• A Bernoulli event is one for which the probability that the event occurs is p and the probability that it does not occur is 1 − p; i.e., the event has two possible outcomes, usually viewed as success or failure.
• A Bernoulli trial is an instantiation of a Bernoulli event. So long as the probability of success remains the same from trial to trial and each trial is independent of the others, a sequence of Bernoulli trials is called a Bernoulli process.
Bernoulli Distribution
• A Bernoulli distribution is the pair of probabilities of a Bernoulli event, where success (1) and failure (0) have probabilities:
P(X = 1) = p,  P(X = 0) = 1 − p
Expectation: E(X) = p
Variance: Var(X) = p(1 − p)
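A minimal sketch with scipy.stats (the choice of library is just for convenience):

```python
# Bernoulli(p): probabilities, mean and variance.
from scipy.stats import bernoulli

p = 0.3
rv = bernoulli(p)
print(rv.pmf(1), rv.pmf(0))     # p = 0.3 and 1 - p = 0.7
print(rv.mean(), rv.var())      # E(X) = p = 0.3, Var(X) = p(1 - p) = 0.21
```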
Binomial Distribution
Let x be the number of “successes” in n trials. x is said to be binomially distributed provided:
1. The trials are identical and independent.
2. The number of trials, n, is fixed.
3. Each trial results in one of two possible outcomes: success or failure.
4. The probability of success on a single trial is p, and is constant from trial to trial.
x ~ B(n, p)
Binomial Distribution
• The binomial distribution is just n independent Bernoullis added up.
• It is the number of “successes” in n trials.
• If Z1, Z2, …, Zn are independent Bernoulli(p), then X = Z1 + Z2 + … + Zn is binomial.
Binomial Distribution
Bernoulli Distribution
For the case when n = 1, the distribution is called the Bernoulli Distribution.
x ~ B (1,p)
Binomial Distribution
There are n independent trials of the experiment
Let p denote the probability of success, so that 1 – p is the probability of failure
Let x denote the number of successes in n independent
trials of the experiment. So 0 ≤ x ≤ n
Binomial PMF vs CDF
• Abbreviation for binomial distribution is B(n,p)
• A binomial PMF gives the probability of a random variable equaling a particular value, i.e., P(x = 2)
• A binomial CDF gives the probability of a random variable equaling that value or less, i.e., P(x ≤ 2)
• P(x ≤ 2) = P(x = 0) + P(x = 1) + P(x = 2)
Binomial PMF
The probability of obtaining x successes in n independent trials of a binomial experiment, where the probability of success is p, is given by:

P(X = x) = C(n, x) p^x (1 − p)^(n − x),  x = 0, 1, …, n

C(n, x) = n! / (x! (n − x)!) is also called a binomial coefficient and is the number of combinations of n items taken x at a time.
• n = number of trials
• x = number of successes – the x-axis of the distribution
• p = probability of success
The notations C(n, x), nCx and “n choose x” are all equivalent ways of writing the binomial coefficient.
Examples
1. Suppose you independently flip a coin 4 times. What is the
probability of obtaining exactly 2 tails?
Number of trials = 4, x = 2, p = 0.5
2. A roulette wheel has 38 slots (US version). What is the
probability of winning twice in 50 spins?
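A minimal sketch of both examples with scipy.stats.binom (the library choice, and the single-number bet with p = 1/38 for roulette, are assumptions):

```python
from scipy.stats import binom

# 1. Exactly 2 tails in 4 independent flips of a fair coin.
print(binom.pmf(2, 4, 0.5))       # C(4,2) * 0.5^4 = 0.375

# 2. Exactly 2 wins in 50 spins, betting on a single number of a 38-slot wheel.
print(binom.pmf(2, 50, 1/38))     # about 0.24
```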
What does the Binomial Distribution Look Like?
Mean and Variance for the Binomial Distribution
• Mean or Expected Value
μ = np
• Variance
σ² = npq = np(1 − p)
Exercise
Plot various binomial distributions for:
n = 5 and p = 0.1, 0.2, 0.45, 0.8, 0.9
n = 50 and p = 0.1, 0.2, 0.45, 0.8, 0.9
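One possible way to do the exercise, sketched with scipy and matplotlib (the library choice is an assumption):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

for n in (5, 50):
    x = np.arange(0, n + 1)
    for p in (0.1, 0.2, 0.45, 0.8, 0.9):
        plt.plot(x, binom.pmf(x, n, p), marker="o", label=f"p = {p}")
    plt.title(f"Binomial PMF, n = {n}")
    plt.xlabel("number of successes x")
    plt.ylabel("P(X = x)")
    plt.legend()
    plt.show()
```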
Binomial Distribution Experiment
Basic Experiment: 5 fair coins are tossed.
Event of interest: total number of heads.
The probability of heads coming up (a success) is equal to 0.5.
So the number of heads in the five coins is a binomial random
variable with n=5 and p=0.5.
The Experiment is repeated 50 times.
Each group (or pair of people) throws the coins twice and collates the data on the board.
Binomial Distribution Example
Basic Experiment: 5 fair coins are tossed.
Event of interest: total number of heads.
The probability of heads coming up (a success) is equal to 0.5.
So the number of heads in the five coins is a binomial random
variable with n=5 and p=0.5.
The Experiment is repeated 50 times.
# of heads     0      1      2      3      4      5
Observed       1      11     11     19     6      2
Theoretical    1.56   7.81   15.63  15.63  7.81   1.56

[Bar chart comparing the observed and theoretical counts of heads, 0–5.]
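The “Theoretical” row is the 50 repetitions multiplied by the binomial PMF with n = 5 and p = 0.5; a minimal sketch:

```python
from scipy.stats import binom

for k in range(6):
    print(k, round(50 * binom.pmf(k, 5, 0.5), 2))   # 1.56, 7.81, 15.63, ...
```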
Week 3
Discrete Distributions
• Bernoulli Distribution – yes/no responses
• Binomial Distribution – sums of Bernoulli responses
• Geometric Distribution – number of trials until the first success
• Poisson Distribution – points in a given time or space
Geometric Distribution
The probability of transfecting a cell line is 30%. The probability of failure = 70%
Let us calculate the probability of success in a given set of transfections.
Experiment    Probability of Success
1             0.3 = 0.3
2             0.7 × 0.3 = 0.21
3             0.7 × 0.7 × 0.3 = 0.147
4             0.7 × 0.7 × 0.7 × 0.3 = 0.1029
…             …
N             (0.7)^(N−1) × 0.3
Geometric Distribution – what does it mean?
Experiment    Probability of Success
1             0.3 = 0.3
2             0.7 × 0.3 = 0.21
3             0.7 × 0.7 × 0.3 = 0.147
4             0.7 × 0.7 × 0.7 × 0.3 = 0.1029
…             …
N             (0.7)^(N−1) × 0.3 (tends to zero for large N)

At first glance it might appear that the more attempts you make at a transfection, the less likely it is to work.
This is NOT what the distribution is saying; in fact that statement makes no sense.
What the distribution tells us is the probability that the first successful transfection occurs on the kth experiment.
In other words, it is less and less likely that the first success will happen in a later experiment rather than an earlier one.
Geometric Distribution
If a single event or trial has two possible outcomes (say X can be 0 or 1, with P(X = 1) = p), the probability of having to observe k trials until the first “one” appears is given by the geometric distribution.
• Before we can succeed at trial k, we must first have had k − 1 failures.
• Each failure occurs with probability q = 1 − p, so there is a term q^(k−1).
• Finally, a single success occurs with probability p, so there is a term p.
Putting these together: P(X = k) = q^(k−1) p
Geometric Distribution
Mean: E(X) = 1/p
Variance: Var(X) = (1 − p)/p²
Geometric Distribution
Geometric distribution for p = 3/10
x represents the number of trials required to get a success
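scipy.stats.geom uses the same convention (x = number of trials required to get the first success); a minimal sketch reproducing the p = 3/10 values:

```python
from scipy.stats import geom

p = 0.3
for k in range(1, 5):
    print(k, geom.pmf(k, p))      # 0.3, 0.21, 0.147, 0.1029
print(geom.mean(p), geom.var(p))  # 1/p = 3.33... and (1 - p)/p^2 = 7.78...
```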
Geometric Distribution – CDF
Experiment    Probability of Success                 Probability of Success or Earlier (CDF)
1             0.3 = 0.3                              0.3
2             0.7 × 0.3 = 0.21                       0.51
3             0.7 × 0.7 × 0.3 = 0.147                0.657
4             0.7 × 0.7 × 0.7 × 0.3 = 0.1029         0.7599
…             …                                      …
N             (0.7)^(N−1) × 0.3 (tends to zero)      Tends to 1.0
Another way of looking at this is to consider the probability of success at a given trial N, or earlier.
To do this we must compute the cumulative distribution function, CDF. Without providing a proof, the CDF for the geometric distribution is given by:

P(X ≤ N) = 1 − (1 − p)^N

For example: there is a 76% chance that after 4 attempts we will have succeeded.
How many experiments must we do so that there is a 99% chance we will succeed at transfecting the cells?
0.99 = 1 − (1 − 0.3)^N. Solve for N (answer: 12.9, so 13 experiments).
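A minimal sketch of that calculation:

```python
import math
from scipy.stats import geom

p = 0.3
N = math.log(1 - 0.99) / math.log(1 - p)
print(N)                 # about 12.9, so 13 experiments are needed
print(geom.cdf(13, p))   # about 0.990, confirming the answer
```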
Class Problem
Many of the pharmaceuticals on the market today were found using
high throughput screening assays. In these assays, large numbers of random molecules
are tested and of these only a few show appreciable activity. If we assume that
the success rate in these screens is one in ten thousand (p=0.0001), then how
large of a library do we need to be 99% sure that we will find at least one active molecule?
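One way to attack the class problem is to reuse the geometric CDF formula from the transfection example (a sketch, not a worked answer from the slides):

```python
import math

p = 1e-4                                   # stated hit rate: one in ten thousand
N = math.log(1 - 0.99) / math.log(1 - p)   # solve 0.99 = 1 - (1 - p)^N
print(math.ceil(N))                        # library size needed for a 99% chance
```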
Poisson Distribution
Week 3
Poisson Distribution
The Poisson distribution arises in two important instances:
1) It is an approximation to the binomial distribution when n is large and p is
small.
2) The Poisson describes the number of events that will occur in a given time
period when the events occur randomly and are independent of one another.
Similarly, the Poisson distribution describes the number of events in a given area
when the presence or absence of a point is independent of occurrences at other
points.
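A minimal sketch of point (1), comparing binomial probabilities with their Poisson approximation (the particular n and p are illustrative assumptions):

```python
from scipy.stats import binom, poisson

n, p = 1000, 0.002           # large n, small p
lam = n * p
for k in range(5):
    print(k, binom.pmf(k, n, p), poisson.pmf(k, lam))   # the two columns nearly agree
```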
Poisson Distribution
Examples:
Events per unit time
1) Telephone calls received in an hour
2) Articles received in a day at an airline’s lost and found
3) Car accidents in a month at a busy intersection
4) Deaths per month due to a rare disease
Events per unit distance
1) Defects occurring in 50 meters of insulated wire
2) Deaths per 10,000 passenger miles
Events per unit area
1) Bacteria per square centimeter of culture plate
Events per unit volume
1) White blood cells in a cubic millimeter of blood
2) Hydrogen atoms per cubic light-year in intergalactic space
Poisson Distribution
X = number of occurrences of the event in a given time, distance, area, volume, etc.
1. The probability that an event occurs in an interval is proportional to the length of the interval.
2. An infinite number of occurrences are possible.
3. Events occur independently at a rate λ.
Poisson Distribution - Application
• Poisson distribution is applied where random events in space
or time are expected to occur
• Deviation from Poisson distribution may indicate some degree
of non-randomness in the events under study
• Investigation of cause may be of interest
Poisson Distribution
Source: http://en.wikipedia.org/wiki/Poisson_distribution
Poisson Distribution
The Poisson distribution has one parameter, λ:

P(X = x) = e^(−λ) λ^x / x!,  x = 0, 1, 2, …

Mean: μ = λ
Variance: σ² = λ
Standard Deviation: σ = √λ
Poisson Distribution - Example
In a small US town the number of accidents per year is 2.4.
a) What is the probability that in any particular year there will be no accidents?
b) What is the probability that in any particular year there will be 5 accidents?
𝜆 = 2.4 accidents per year
Poisson Distribution
Poisson with λ = 2.4
Poisson Distribution - Example
What is the probability that there will be more than 4 accidents per year?
𝜆 = 2.4 accidents per year
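A minimal sketch covering parts (a) and (b) of the previous slide together with this “more than 4” question:

```python
from scipy.stats import poisson

lam = 2.4
print(poisson.pmf(0, lam))      # a) P(X = 0) is about 0.091
print(poisson.pmf(5, lam))      # b) P(X = 5) is about 0.060
print(1 - poisson.cdf(4, lam))  #    P(X > 4) is about 0.096
```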
The Poisson Distribution
Emission of α-particles
• Rutherford, Geiger, and Bateman (1910) counted the number of α-particles emitted by a film of polonium in 2608 successive intervals of one-eighth of a minute. They counted 10,097 alpha particles.
• Do their data follow a Poisson distribution?
The Poisson Distribution
Emission of α-particles
• Calculation of λ:
λ = No. of particles per interval = 10097/2608 = 3.87
• Expected values:
Expected count for x particles per interval = 2608 × P(x) = 2608 × e^(−3.87) (3.87)^x / x!
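A minimal sketch of that calculation:

```python
import math

lam = 10097 / 2608                       # about 3.87 particles per interval
for x in range(11):
    expected = 2608 * math.exp(-lam) * lam**x / math.factorial(x)
    print(x, round(expected, 1))         # expected number of intervals with x particles
```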
The Poisson Distribution
Emission of α-particles
The fact that the observed and expected
align very closely tells us what?
The Poisson Distribution
Emission of α-particles
[Figure: spatial point patterns illustrating random, regular, and clumped events.]
The Poisson Distribution
If there are 3 x 109 base pairs in the human genome and the mutation
rate per generation per base pair is 10-9, what is the mean number of
new mutations that a child (=genome) will have, what is the variance in
this number, and what will the distribution look like?
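A sketch of the arithmetic, assuming the stated genome size and per-base mutation rate (the binomial count of mutations is very well approximated by a Poisson, as described earlier):

```python
from scipy.stats import poisson

n, p = 3e9, 1e-9
lam = n * p                        # mean number of new mutations = 3
print("mean =", lam)               # for a Poisson the variance equals the mean,
print("variance =", lam)           # so the variance is also 3
for k in range(10):
    print(k, poisson.pmf(k, lam))  # the distribution looks like Poisson(3)
```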
The Poisson Distribution
[Figure: probability distributions plotted over x = 0 to 12.]
Vocabulary
• Discrete – data that can take on only a set number of values
• Continuous – quantitative data that can take on any value between the minimum and maximum, and any value between two other values
• Trial – each repetition of an experiment
• Success – one assigned result of a binomial experiment
• Failure – the other result of a binomial experiment
• PDF – probability distribution function; assigns a probability to each value of X
• CDF – cumulative (probability) distribution function; assigns the sum of the probabilities of all values less than or equal to X
• Binomial Coefficient – the number of combinations of k successes in n trials