Importance of the normal distribution
(Session 09)
SADC Course in Statistics
Learning Objectives
At the end of this session you will be able to:
• discuss reasons why the normal probability
distribution is important
• state the Central Limit Theorem and its
value in approximating Binomial and
Poisson probabilities by normal probabilities
• explain how the assumption of normality
for a given random variable can be checked
Importance of Normal Distribution
• Many measurements can be closely
approximated by the normal distribution
since many variables show normal variation
as the result of many minor influences acting
up and down
• Data that are not normal can often be
transformed so that they are approximately normal
• The normal distribution underpins many
inference ideas. We have seen that
probability statements about any normally
distributed variable can be made via N(0,1)
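As a small illustration of the last point, the sketch below standardises a value and looks up the probability in N(0,1). The mean 50, standard deviation 10 and cut-off 65 are hypothetical values chosen for the example, and `prob_below` is an illustrative helper name, not part of the course material.

```python
from statistics import NormalDist

def prob_below(x, mu, sigma):
    """P(X < x) for X ~ N(mu, sigma^2), computed via the standard normal."""
    z = (x - mu) / sigma          # standardise: Z = (X - mu) / sigma
    return NormalDist().cdf(z)    # probability from N(0,1)

# Hypothetical example: X ~ N(50, 10^2), so x = 65 corresponds to z = 1.5
p = prob_below(65, mu=50, sigma=10)
print(round(p, 4))  # 0.9332
```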
The Central Limit Theorem (CLT)
• One of the key reasons why the normal
distribution is important is because of the
Central Limit Theorem (CLT).
• This theorem states that the mean of a
sample of independent observations on any
random variable with finite variance has an
approximately normal distribution, provided
that the sample size is sufficiently large.
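The theorem can be explored by simulation. The sketch below is an illustration under stated assumptions, not part of the course material: it draws repeated samples from a strongly skewed Exponential distribution with mean 1 and checks how the sample means behave. The sample size 50, the 2000 repetitions and the seed are arbitrary choices.

```python
import random
from statistics import mean, stdev

def sample_means(n, reps, rng):
    """Means of `reps` independent samples, each of size n, from Exponential(1)."""
    return [mean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

rng = random.Random(9)            # fixed seed so the run is repeatable
n = 50
means = sample_means(n, 2000, rng)

# Exponential(1) has mean 1 and standard deviation 1, so the sample mean
# should be centred near 1 with standard error close to 1/sqrt(n).
se = 1 / n ** 0.5
within = sum(abs(m - 1) < 1.96 * se for m in means) / len(means)

print("mean of sample means:", round(mean(means), 3))
print("sd of sample means:  ", round(stdev(means), 3), "theory:", round(se, 3))
print("fraction within 1.96 standard errors:", round(within, 3))
```

If the normal approximation is reasonable, the last figure should be close to 0.95 even though the individual observations are far from normal.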
Consequences of the Central
Limit Theorem
• Many statistical techniques are based on the
assumption that the sample mean
follows a normal distribution
• As a consequence of the Central Limit
Theorem, the above assumption is not
invalidated as long as the sample size is large
enough, e.g. greater than about 30.
• The CLT also implies that the binomial and
Poisson probabilities approach the normal
probabilities as n becomes large (see below).
Normal approximation to the
binomial distribution
• Recall that the form of the binomial
distribution for p=0.5 closely resembles the
normal distribution
• This is because the binomial probabilities
are symmetric when p=0.5
• However, even with p ≠ 0.5, the normal
approximation holds for large n because a
binomial random variable is the sum of
n Bernoulli random variables, so the
proportion of successes is their mean and
the CLT applies
Normal approximation to the
Poisson distribution
• Recall from previous session (slides 8-12)
that as the Poisson parameter becomes
large, the shape of the Poisson distribution
becomes bell-shaped and symmetrical
• This is again a consequence of the CLT,
since λ is the mean of the Poisson
distribution and a Poisson count with large λ
can be viewed as a sum of many independent
Poisson counts
More formally…
If X is an average of a series of n Bernoulli
random variables (0,1 variables), each with
success probability p, then

$$Z = \frac{X - p}{\sqrt{p(1-p)/n}}$$

has approximately a normal distribution with
mean 0 and variance 1 (standard normal) when
the sample size n is large.
Note that X = r/n, where r = number of
successes in n trials, i.e. r is a binomial
random variable.
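The quality of this approximation can be checked numerically. The sketch below compares an exact binomial probability P(r ≤ 35) with the value obtained from Z. The values n = 100, p = 0.3 and the cut-off 35 are hypothetical, and the continuity correction (adding 0.5 to r) is a common refinement that is assumed here rather than taken from the slides.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_cdf(r, n, p):
    """Exact P(R <= r) for R ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r + 1))

def normal_approx_cdf(r, n, p):
    """Approximate P(R <= r) using Z = (X - p) / sqrt(p(1-p)/n) with X = r/n."""
    x = (r + 0.5) / n                     # continuity correction (an assumption)
    z = (x - p) / sqrt(p * (1 - p) / n)
    return NormalDist().cdf(z)

n, p, r = 100, 0.3, 35                    # hypothetical values
exact = binom_cdf(r, n, p)
approx = normal_approx_cdf(r, n, p)
print("exact:", round(exact, 4), "normal approximation:", round(approx, 4))
```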
and further …
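The same result is true for the Poisson
average, i.e. Z defined below can be
approximated by the standard normal
distribution for large values of λ.

$$Z = \frac{Y - \lambda}{\sqrt{\lambda/n}}$$

Here Y is the average of n Poisson
observations, each with mean λ.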
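A rough numerical check of this statement is sketched below for a single Poisson count (n = 1 in the formula above). It compares the exact probability P(Y ≤ λ) with the normal approximation for increasing λ. The chosen values 2, 20 and 200 are illustrative, and the continuity correction of 0.5 is an assumption that does not appear in the slides.

```python
from math import exp, sqrt
from statistics import NormalDist

def poisson_cdf(k, lam):
    """Exact P(Y <= k) for Y ~ Poisson(lam), summing the probabilities term by term."""
    term = exp(-lam)                 # P(Y = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i              # P(Y = i) from P(Y = i - 1)
        total += term
    return total

def normal_approx_cdf(k, lam):
    """Approximate P(Y <= k) using Z = (Y - lam) / sqrt(lam)."""
    z = (k + 0.5 - lam) / sqrt(lam)  # continuity correction (an assumption)
    return NormalDist().cdf(z)

errors = []
for lam in (2, 20, 200):             # illustrative values of the Poisson mean
    exact = poisson_cdf(lam, lam)
    approx = normal_approx_cdf(lam, lam)
    errors.append(abs(exact - approx))
    print(lam, round(exact, 4), round(approx, 4))
```

The absolute error stored in `errors` shrinks as λ grows, which is the behaviour the slide describes.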
Checking for normality
Thus the normal distribution plays an
important role in statistics.
Most of the techniques covered in Modules
H2 and H8 are based on assuming that the
key response of interest follows a normal
distribution.
We therefore need to be able to check
whether measurements on a given random
variable follow a normal distribution.
One way to do this is to produce a normal
probability plot.
Normal Probability Plot
Statistics software packages generally have
a facility for producing this plot.
Below is the plot for maize cob weights. In
this plot, the Y-axis corresponds to values
you would expect from an actual normal
distribution. The X-axis corresponds to your
data.
This implies that points lying close to a
straight line indicate that the normality
assumption is reasonable.
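The coordinates of such a plot can be sketched without special software. In the example below, the sorted data go on the X-axis and the values expected from a standard normal distribution go on the Y-axis, as described above. The plotting positions (i - 0.5)/n are one common convention and are an assumption here, the two data sets are simulated rather than the maize cob weights, and the correlation between the two axes is used only as a rough numerical measure of straightness.

```python
import random
from statistics import NormalDist, mean

def normal_plot_points(data):
    """Return (sorted data, expected standard normal values) for a normal probability plot."""
    xs = sorted(data)
    n = len(xs)
    # Plotting positions (i - 0.5)/n are an assumed convention; packages differ.
    expected = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return xs, expected

def correlation(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

rng = random.Random(1)
normal_like = [rng.gauss(150, 20) for _ in range(100)]      # simulated, roughly normal
skewed = [rng.expovariate(1 / 150) for _ in range(100)]     # simulated, strongly skewed

r_normal = correlation(*normal_plot_points(normal_like))
r_skewed = correlation(*normal_plot_points(skewed))
print("normal-like data:", round(r_normal, 3))
print("skewed data:     ", round(r_skewed, 3))
```

A correlation very close to 1 corresponds to points lying near a straight line; the skewed data give a visibly lower value because the plot curves.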
What do you deduce from the graph below?