Continuous Distributions - Sys

Continuous Distributions
Foundations for much of statistical inference:
• Normal Distribution
• Log Normal Distribution
• Gamma Distribution
• Chi Square Distribution
• F Distribution
• t Distribution
• Weibull Distribution
• Extreme Value Distribution (Type I and II)
• Exponential Distribution
Typical uses noted alongside the list: environmental variables; time to failure and radioactivity; the basis for statistical tests; lifetime distributions; reaction kinetics.
Continuous random variables are defined for continuous
numbers on the real line. Probabilities have to be computed
for all possible sets of numbers.
Continuous Distributions
The distributions discussed so far have had only a discrete set of possible
outcomes (e.g. 0, 1, 2, ...).
Now we'll discuss continuous distributions, whose outcomes lie along the
real line.
Strange Observation:
One interesting point about continuous probability distributions is that,
because an infinite number of points lie on the real line, the probability
of observing any particular point is zero. This means that the
height of the curve does not represent a probability.
Continuous Distributions (PDF)
Continuous distributions are described by probability density functions, f(x).
What is the meaning of the probability density function f(x) when X is continuous?
First observe that it is meaningless to define events in terms of single
continuous values. The probability of an event occurring at exactly
2.35678935465457348204945023983598459830923… is zero.
A continuous random variable has an infinite number of possible values.
Thus for a continuous random variable, an event must be defined in terms
of an interval of values.
Continuous Distributions
One can therefore find the probability that a random variable X will fall
between two values a and b by integrating f(x) over the interval:
$P(a \le X \le b) = \int_a^b f(x)\,dx$
The total integral over the real line must equal one:
$\int_{-\infty}^{\infty} f(x)\,dx = 1$
Any one point has zero probability of occurrence.
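The interval-probability and total-area statements above can be checked numerically. The sketch below is only an illustration: it assumes the standard normal density as the example f(x), truncates the real line to [-10, 10], and uses a simple trapezoidal rule. The function names `normal_pdf` and `integrate` are hypothetical helpers, not part of the slides.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density, used here only as an example f(x)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(f, a, b, n=20_000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# Probability of an interval: area under the curve between -1 and 1
p_interval = integrate(normal_pdf, -1.0, 1.0)

# Total area: the real line is truncated to [-10, 10], where the tails are negligible
total_area = integrate(normal_pdf, -10.0, 10.0)

# A single point is an interval of zero width, so its area is zero
p_point = integrate(normal_pdf, 0.5, 0.5)

print(round(p_interval, 4), round(total_area, 4), p_point)
```

The height f(0.5) is clearly non-zero, yet the probability assigned to the single point 0.5 is zero, which is the distinction between height and area that the slides draw.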
Continuous Distributions
Big difference between discrete and continuous distributions:
Discrete: the height is the probability (sum of heights = 1).
Continuous: the area is the probability (total area = 1).
Probability Density Function (PDF)
A function which integrates to 1 over its range and from
which event probabilities can be determined.
[Figure: density curve f(x) plotted over the random variable's range; the area under the curve is one.]
Probability Density Function
[Figure: Chi Square density functions; horizontal axis from 0 to 30, vertical axis (density) from 0 to 0.5.]
The pdf does not have to be symmetric, nor be defined for all real numbers.
The shape of the curve, written fX(x|b), is determined by one or more distribution parameters.
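To make the Chi Square slide concrete, here is a minimal sketch that evaluates the Chi Square density for a few parameter values. It assumes the usual parameterisation by degrees of freedom k, which the slide does not spell out, and the helper name `chi_square_pdf` is hypothetical.

```python
import math

def chi_square_pdf(x, k):
    """Chi Square density with k degrees of freedom (taken as zero for x <= 0)."""
    if x <= 0:
        return 0.0
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

grid = [i * 0.01 for i in range(1, 3001)]  # x values in (0, 30]

# The location of the peak moves as the parameter k changes
peaks = {}
for k in (2, 5, 10):
    peaks[k] = max(grid, key=lambda x: chi_square_pdf(x, k))
    print(f"k={k}: peak near x={peaks[k]:.2f}")

# Not defined (zero density) for negative values
print(chi_square_pdf(-1.0, 5))

# Not symmetric: points equally far either side of the k=5 peak have different heights
print(chi_square_pdf(1.0, 5), chi_square_pdf(5.0, 5))
```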
Normal Distribution
Or Gaussian Distribution
The Gaussian distribution, or Normal distribution, is
probably the most commonly encountered continuous
distribution. Each time you summarize a set of data by its
average and standard deviation alone, you are often implicitly
assuming that the underlying distribution is Gaussian.
The normal distribution is the distribution
that is expected when measurements are
made up from a large number of independent 'noise'
components that are all distributed in the
same way as each other.
Many biological and physical
measurements have lots of sources of
inaccuracy and noise, and so the
distributions of those measurements will
be approximately normal, as long as the
distributions of those components are
similar (they don't have to be normal!).
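The claim that many similar noise components add up to something approximately normal can be illustrated with a small simulation. This is a sketch under stated assumptions: the noise terms are taken to be independent and uniform on [-0.5, 0.5], and the counts `N_COMPONENTS` and `N_MEASUREMENTS` are arbitrary illustrative choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

N_COMPONENTS = 50        # hypothetical number of noise sources per measurement
N_MEASUREMENTS = 20_000  # hypothetical number of simulated measurements

# Each measurement is the sum of many uniform (clearly non-normal) noise terms
measurements = [
    sum(random.uniform(-0.5, 0.5) for _ in range(N_COMPONENTS))
    for _ in range(N_MEASUREMENTS)
]

mean = statistics.fmean(measurements)
sd = statistics.stdev(measurements)

# For a normal distribution about 68% of values lie within one standard deviation
within_1sd = sum(abs(m - mean) <= sd for m in measurements) / N_MEASUREMENTS

print(f"mean={mean:.3f}, sd={sd:.3f}, fraction within 1 sd={within_1sd:.3f}")
```

The fraction within one standard deviation should come out close to 0.68, even though no individual noise term is normally distributed.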
Normal Distribution - Properties
1. The mean, median, and mode are equal.
2. The normal curve is bell-shaped and symmetric about the mean.
3. The total area under the curve is equal to one.
4. The normal curve asymptotically approaches zero on either side of the mean.
Standard Normal Distribution: Z score
Rescales any normal distribution axis from its true units (time, weight, dollars, barrels, and so forth) to
the standard measure referred to as a z-value. Thus, any value of the normally distributed continuous
random variable can be represented by a unique z-value.
1. Moves the mean to zero.
2. Rescales the standard deviation to 1, so that the 68% mark (one standard deviation from the mean) now falls at z = ±1.0.
Standard Normal Distribution
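All normal random variables can be related back to the standard normal random variable.
A Standard Normal random variable has mean 0 and standard deviation 1.
[Figure: normal curve with two horizontal scales: the original units 𝜇 − 3𝜎, 𝜇 − 2𝜎, 𝜇 − 𝜎, 𝜇, 𝜇 + 𝜎, 𝜇 + 2𝜎, 𝜇 + 3𝜎 and the matching z-values −3, −2, −1, 0, +1, +2, +3.]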
Illustration
[Figure: densities of X (centered at 𝜇, spread 𝜎), of X − 𝜇 (centered at 0, spread 𝜎), and of (X − 𝜇)/𝜎 (centered at 0, spread 1).]
Normal Distribution - Properties
To the left of 𝜇 − 𝜎 and to the right of 𝜇 + 𝜎 the graph curves upwards. Between 𝜇 − 𝜎 and 𝜇 + 𝜎 the graph curves
downwards. The points at which the curvature changes, 𝜇 − 𝜎 and 𝜇 + 𝜎, are
called the inflection points.
Inflection point: where the second derivative is zero and changes sign.
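The location of the inflection points can be checked numerically by estimating the second derivative of the normal density. This is a rough sketch: it assumes the standard normal case (mean 0, standard deviation 1) and a central finite-difference approximation, and the helper names are hypothetical.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def second_derivative(f, x, h=1e-3):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

mu, sigma = 0.0, 1.0  # illustrative values

curvature = {
    "mu - 2 sigma": second_derivative(normal_pdf, mu - 2 * sigma),
    "mu - sigma": second_derivative(normal_pdf, mu - sigma),
    "mu": second_derivative(normal_pdf, mu),
    "mu + sigma": second_derivative(normal_pdf, mu + sigma),
    "mu + 2 sigma": second_derivative(normal_pdf, mu + 2 * sigma),
}

for label, value in curvature.items():
    print(f"f''({label}) = {value:+.6f}")
```

The estimate is positive (curving upwards) in the tails, negative (curving downwards) at the mean, and approximately zero one standard deviation either side of the mean.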
Normal Distribution
A symmetric distribution defined on the range −∞ to +∞ whose shape is
defined by two parameters: the mean, denoted 𝜇, that centers the
distribution, and the standard deviation, 𝜎, that determines the spread
of the distribution.
[Figure: normal curve with the inflection points marked at 𝜇 − 𝜎 and 𝜇 + 𝜎; 68% of the total area lies between them.]
P(𝜇 − 𝜎 ≤ X ≤ 𝜇 + 𝜎) ≈ 68%
Normal Distribution
Notation
Suppose 𝑿 has a normal distribution with mean 𝝁 and standard deviation
𝝈; we often denote this by 𝑿 ~ 𝑵(𝝁, 𝝈).
A new random variable defined as 𝒁 = (𝑿 − 𝝁)/𝝈 has the standard
normal distribution, denoted 𝒁 ~ 𝑵(𝟎, 𝟏). Conversely,
𝝈𝒁 + 𝝁 = 𝑿
To create a random variable with specific mean and
standard deviation, we start with a standard normal
deviate, multiply it by the target standard deviation, and
then add the target mean.
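The recipe above can be sketched in a few lines. The target mean and standard deviation are illustrative values, and the sample size is an arbitrary choice made only so the sample statistics settle near the targets.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

target_mean, target_sd = 150.0, 25.0  # illustrative targets

# Start from standard normal deviates Z ~ N(0, 1) ...
z_values = [random.gauss(0.0, 1.0) for _ in range(50_000)]

# ... then multiply by the target standard deviation and add the target mean
x_values = [target_sd * z + target_mean for z in z_values]

# Standardizing again recovers the original deviates
z_back = [(x - target_mean) / target_sd for x in x_values]

print(round(statistics.fmean(x_values), 1), round(statistics.stdev(x_values), 1))
```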
Why is this important? Because in this way, the probability of any event on a normal
random variable with any given mean and standard deviation can be computed from
tables of the standard normal distribution.
Statistics textbooks often include pre-calculated tables that give the cumulative probability
P(Z ≤ z) for each z-score.
Example
Assume that a set of test scores has a mean of 150 and standard deviation of
25.
If a particular student had a score of 190, what is his/her z-score?
z = (x − 𝜇)/𝜎
Therefore z = (190 − 150)/25 = 1.6
That is, the score is 1.6 standard deviations above the mean.
What percentage of students have scores above this?
Look up table
p-value = P(Z > +1.6) = 1 − P(Z < 1.6) = 1 − 0.9452 = 0.0548 ≈ 5.5%
Look up table
Because of symmetry we could also have looked up the area from −∞ to −1.6.
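The table lookup for the test-score example can be cross-checked in software. The sketch below assumes Python's standard-library `statistics.NormalDist` as a stand-in for the printed table; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma, score = 150.0, 25.0, 190.0
z = (score - mu) / sigma  # z-score of the student's result

std_normal = NormalDist(mu=0.0, sigma=1.0)

upper_tail = 1.0 - std_normal.cdf(z)  # P(Z > z), as in the table lookup
lower_tail = std_normal.cdf(-z)       # P(Z < -z), equal to the upper tail by symmetry

print(f"z={z:.2f}, P(Z>z)={upper_tail:.4f}, P(Z<-z)={lower_tail:.4f}")
```

Both tail areas come out at about 0.0548, matching the value read from the table.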
Exercises
1. If z = 2.15, what is the area beyond z?
2. Find the area below z.
3. What is the sum of the above two areas?
4. What is the area between the mean and 2.15 standard deviations above the mean?
5. What is the probability of obtaining a z score between −2.20 and 0.25 on the standard normal curve?
6. What z score is exceeded by 10% of all scores under the normal curve?
Example
After running repeated experiments, we find that the doubling time for a
particular strain of E. coli is 58 minutes with a standard deviation of 10
minutes. Using z-scores, determine the range of expected doubling times at
the 95% and 99% confidence levels.
Rearrange the z-score formula to solve for x (both upper and lower):
𝑥 = 𝜇 ± 𝑧𝜎
Look up a standard table to find the z score for 95%: z = 1.645.
𝑥𝑢𝑝𝑝𝑒𝑟 = 58 + 1.645 × 10 = 74.45 mins
𝑥𝑙𝑜𝑤𝑒𝑟 = 58 − 1.645 × 10 = 41.55 mins
Note that z = 1.645 leaves 5% in one tail, so each limit is a one-sided 95% bound and the interval between them contains the central 90% of doubling times. A two-sided 95% range uses z = 1.96, giving 58 ± 19.6, i.e. 38.4 to 77.6 mins.
Class: you work out the 99% limits.
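The doubling-time limits can also be reproduced without a printed table. This is a sketch only: it assumes `statistics.NormalDist.inv_cdf` as the source of the z-values, and the helper name `limits` is hypothetical.

```python
from statistics import NormalDist

def limits(mu, sigma, z):
    """Return (lower, upper) = (mu - z*sigma, mu + z*sigma)."""
    return mu - z * sigma, mu + z * sigma

mu, sigma = 58.0, 10.0  # doubling time in minutes, from the example

# z leaving 5% in one tail (the value used on the slide, about 1.645)
z_one_tail = NormalDist().inv_cdf(0.95)
# z leaving 2.5% in each tail (central 95%, about 1.96)
z_two_tail = NormalDist().inv_cdf(0.975)

lower, upper = limits(mu, sigma, z_one_tail)
lower_2, upper_2 = limits(mu, sigma, z_two_tail)

print(f"z={z_one_tail:.3f}: {lower:.2f} to {upper:.2f} mins")
print(f"z={z_two_tail:.3f}: {lower_2:.2f} to {upper_2:.2f} mins")
```

Passing a different cumulative probability to `inv_cdf` gives the z-value needed for other levels, such as the 99% limits left as a class exercise.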
Week 4: Exercise
A pharmaceutical company manufactures stocks of Ebola vaccine. The
vaccine has a shelf life that is approximately normally distributed with mean
equal to 800 hours and standard deviation of 40 hours. Find the probability
that a random sample of 16 vials of vaccine will have an average shelf life of less than 775
hours.