NORMAL DISTRIBUTION
AND ITS APPLICATION
INTRODUCTION
Statistically, a population is the set of all
possible values of a variable.
Random selection of objects from the
population makes the variable a random
variable (it involves a chance mechanism)
Example: Let ‘x’ be the weight of a newly
born baby. ‘x’ is a random variable
representing the weight of the baby.
The weight of a particular baby is not
known until he/she is born.
Discrete random variable:
If a random variable can only take values
that are whole numbers, it is called a
discrete random variable.
Example: No. of daily admissions
No. of boys in a family of 5
No. of smokers in a group of 100
persons.
Continuous random variable:
If a random variable can take any value
within an interval, it is called a
continuous random variable.
Example: Weight, Height, Age & BP.
Continuous Probability Distributions
Continuous distribution has an infinite
number of values between any two
values assumed by the continuous
variable
As with other probability distributions,
the total area under the curve equals 1
Relative frequency (probability) of
occurrence of values between any two
points on the x-axis is equal to the total
area bounded by the curve, the x-axis,
and perpendicular lines erected at the
two points on the x-axis
The Normal or Gaussian distribution is the
most important continuous probability
distribution in statistics.
The term “Gaussian” refers to Carl Friedrich
Gauss, who developed this distribution.
The word ‘normal’ here does not mean
‘ordinary’ or ‘common’ nor does it mean
‘disease-free’.
It simply means that the distribution
conforms to a certain formula and shape.
Histograms
A kind of bar or line chart
Values on the x-axis (horizontal)
Numbers on the y-axis (vertical)
Normal distribution is defined by a
particular shape
Symmetrical
Bell-shaped
Histogram
Figure 1 Histogram of ages of 60 subjects (x-axis: Age, class boundaries 11.5 to 71.5; y-axis: Frequency)
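The counting behind a histogram such as Figure 1 can be sketched in a few lines of Python. The class boundaries 11.5 to 71.5 are read off the figure's x-axis. The list of ages is hypothetical, since the 60 original values are not given, and the helper name `class_frequencies` is made up for this illustration.

```python
# Class boundaries taken from the x-axis of Figure 1.
boundaries = [11.5, 21.5, 31.5, 41.5, 51.5, 61.5, 71.5]

# Hypothetical ages for illustration only (not the 60 subjects in the figure).
ages = [14, 19, 23, 25, 28, 30, 33, 34, 36, 38, 40, 41, 44, 47, 52, 55, 63, 68]

def class_frequencies(values, edges):
    """Count how many values fall in each class [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        for i in range(len(edges) - 1):
            if edges[i] <= value < edges[i + 1]:
                counts[i] += 1
                break
    return counts

frequencies = class_frequencies(ages, boundaries)

# Text histogram: one row per class, one '#' per subject in that class.
for i, count in enumerate(frequencies):
    label = f"{boundaries[i]:>4.1f} to {boundaries[i + 1]:>4.1f}"
    print(f"{label} | {'#' * count} ({count})")
```

The frequencies are what a histogram plots on the y-axis; the classes are what it shows on the x-axis.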
A Perfect Normal Distribution
Gaussian Distribution
Many biologic variables follow this pattern
Hemoglobin, Cholesterol, Serum Electrolytes, Blood
pressures, age, weight, height
One can use this information to define
what is normal and what is extreme
In clinical medicine, values within 2 Standard
deviations of the mean (about 95% of
individuals) are treated as normal
Clinically, about 5% of “normal” individuals
are therefore labeled as extreme/abnormal
We just accept this and move on.
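As a rough sketch of how such a reference range works, the snippet below takes a hypothetical laboratory measurement (the mean of 14.0 and SD of 1.0 are invented for illustration) and computes the mean ± 2 SD limits, together with the share of healthy individuals who fall outside them. It assumes the measurement is normally distributed and uses `NormalDist` from Python's standard library.

```python
from statistics import NormalDist

# Hypothetical healthy-population values, chosen only for illustration.
mean, sd = 14.0, 1.0
healthy = NormalDist(mu=mean, sigma=sd)

# Reference range: mean plus or minus 2 standard deviations.
lower_limit = mean - 2 * sd
upper_limit = mean + 2 * sd

# Share of healthy individuals falling outside the range (both tails).
fraction_flagged = healthy.cdf(lower_limit) + (1.0 - healthy.cdf(upper_limit))

# Multiplier of the SD that cuts off exactly 2.5% in each tail.
exact_multiplier = NormalDist().inv_cdf(0.975)

print(f"reference range: {lower_limit} to {upper_limit}")
print(f"healthy individuals flagged as abnormal: {fraction_flagged:.2%}")
print(f"SD multiplier for an exact 95% range: {exact_multiplier:.2f}")
```

With ± 2 SD the flagged share comes out a little under 5%; limits that cut off exactly 5% sit at about 1.96 SD from the mean.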
Normal distribution
Most important distribution in statistics
Also called the Gaussian distribution
Density given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty$$
where μ is the mean and σ the standard
deviation
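The density can be evaluated directly from its formula. The sketch below does so and compares the result with `NormalDist.pdf` from Python's standard library. The function name `normal_density` and the values mean 50, SD 10 are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist

def normal_density(x, mu, sigma):
    """Normal density: 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2 * sigma^2))."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Hypothetical parameters for illustration.
mu, sigma = 50.0, 10.0

for x in (30.0, 50.0, 70.0):
    by_formula = normal_density(x, mu, sigma)
    by_library = NormalDist(mu, sigma).pdf(x)
    print(f"x = {x}: formula = {by_formula:.6f}, library = {by_library:.6f}")
```

The values at 30 and 70 agree because both points lie the same distance from the mean, and the density is highest at the mean itself.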
Gaussian or Normal Distribution Curve
Characteristics of Normal Distribution
Symmetrical about the mean, μ
Mean, median, and mode are equal
Total area under the curve above the x-axis is one square unit
1 standard deviation on both sides of
the mean includes approximately 68%
of the total area
2 standard deviations includes
approximately 95%
3 standard deviations includes
approximately 99.7%
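These areas can be checked numerically. The sketch below computes the area within 1, 2, and 3 standard deviations of the mean using `NormalDist` from Python's standard library. The helper name `area_within` is made up for this example.

```python
from statistics import NormalDist

def area_within(k, mu=0.0, sigma=1.0):
    """Area under the normal curve between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {area_within(k):.4f}")
```

The results are about 0.68, 0.95, and 0.997, and they do not depend on the particular mean or standard deviation passed in.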
Characteristics of the Normal Curve
Values on the
horizontal axis are Z
values (standard deviations from the mean),
ranging from -∞ to +∞
Areas under the curve are in
probability units, ranging from 0 to 1
The mean is the center
and the values in
Standard Deviations
account for
proportions of the
population
±1 SD = 68% of the sample
±2 SD = 95% of the sample
±3 SD = 99.7% of the sample
Characteristics of the Normal Distribution
Normal distribution is completely
determined by the parameters μ and σ
Different values of μ shift the
distribution along the x-axis
Different values of σ determine
degree of flatness or peakedness of
the graph
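A small numerical sketch of the two parameters at work: changing the mean moves the peak without changing its height, while doubling the standard deviation halves the height of the peak (a flatter curve). The three parameter sets and the helper name `peak_height` are hypothetical.

```python
from statistics import NormalDist

def peak_height(dist):
    """Height of the density at its centre (the mean)."""
    return dist.pdf(dist.mean)

reference = NormalDist(mu=0.0, sigma=1.0)
shifted = NormalDist(mu=3.0, sigma=1.0)   # same spread, different mean
wider = NormalDist(mu=0.0, sigma=2.0)     # same mean, larger spread

for name, dist in (("reference", reference), ("shifted", shifted), ("wider", wider)):
    print(f"{name}: peak at x = {dist.mean}, height = {peak_height(dist):.4f}")
```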
Applications of Normal Distribution
Frequently, data are normally
distributed
Essential for some statistical procedures
If not, possible to transform to a more
normal form
Approximations for other distributions
Because of the frequent occurrence of
the normal distribution in nature, much
statistical theory has been developed
for it.
What’s so Great about the
Normal Distribution?
If you know two things, you know
everything about the distribution
Mean
Standard deviation
You know the probability of any
range of values arising
Standardised Scores
My diastolic blood pressure is 100
So what?
Normal is 90 (for my age and sex)
Mine is high
But how high?
Express it in standardised scores
How many SDs above the mean is that?
Mean = 90, SD = 4 (my age and sex)
$$z = \frac{\text{My Score} - \text{Mean Score}}{\text{SD}} = \frac{100 - 90}{4} = 2.5$$
This is a standardised score, or z-score
Can consult tables (or computer)
See how often a score this high (or higher)
occurs
99.38% of people have lower scores
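The "or computer" route for the blood pressure example can be sketched as follows, using `NormalDist` from Python's standard library in place of a printed table. The function name `z_score` is made up for this illustration; the numbers 100, 90, and 4 come from the example.

```python
from statistics import NormalDist

def z_score(value, mean, sd):
    """Number of standard deviations that value lies above the mean."""
    return (value - mean) / sd

# Values from the diastolic blood pressure example.
z = z_score(100, 90, 4)

# Proportion of the standard normal distribution lying below z.
proportion_lower = NormalDist().cdf(z)

print(f"z-score: {z}")
print(f"proportion with lower scores: {proportion_lower:.4f}")
print(f"proportion with this score or higher: {1 - proportion_lower:.4f}")
```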
A Z-score Table
Z-Score    Proportion Scoring Lower    % (Rounded to whole number)
3.5        0.9998                      100%
3.0        0.9987                      100%
2.5        0.9938                      99%
2.0        0.9772                      98%
1.5        0.9332                      93%
1.0        0.8413                      84%
0.5        0.6915                      69%
0.0        0.5000                      50%
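A table like this can be regenerated rather than looked up. The sketch below rebuilds the same rows with `NormalDist` from Python's standard library. The z-values are those in the table; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

z_values = [3.5, 3.0, 2.5, 2.0, 1.5, 1.0, 0.5, 0.0]

# Each row pairs a z-score with the proportion of scores below it.
table = [(z, standard_normal.cdf(z)) for z in z_values]

print("Z-Score  Proportion  Percent")
for z, proportion in table:
    print(f"{z:>7.1f}  {proportion:>10.4f}  {proportion:>7.0%}")
```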
Standard Normal Distribution
Normal distribution is really a family of
curves determined by μ and σ
Standard normal distribution is one with
μ = 0 and σ = 1
Standard normal density given by:
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \qquad -\infty < z < \infty$$
where z = (x - μ) / σ
Standard Normal Distribution
To find the probability that z takes on
a value between any two points on
the z-axis, find the area
bounded by perpendiculars erected
at these points, the curve, and the
z-axis
Values are tabled.
Standard normal distribution is
symmetric
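The area between two points on the z-axis is the difference of two cumulative values, which is what a printed table supplies. The sketch below computes it with `NormalDist` from Python's standard library and also checks the symmetry property. The points -1.0, 2.0, and 1.5 are arbitrary illustrations, and `prob_between` is a made-up helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def prob_between(z1, z2):
    """Area under the standard normal curve between z1 and z2 (z1 <= z2)."""
    return standard_normal.cdf(z2) - standard_normal.cdf(z1)

print(f"P(-1.0 < Z < 2.0) = {prob_between(-1.0, 2.0):.4f}")

# Symmetry: the area below -1.5 equals the area above +1.5.
left_tail = standard_normal.cdf(-1.5)
right_tail = 1.0 - standard_normal.cdf(1.5)
print(f"P(Z < -1.5) = {left_tail:.4f}, P(Z > 1.5) = {right_tail:.4f}")
```

Because of the symmetry, a table only needs to list non-negative z-values.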
Examples of Standard Normal Distribution
Height and weight
Calculate z-statistics
Pr(X < x)
Pr(X > x)
Pr(x1 < X < x2)
Why?
Determine percentiles
Comparisons between different
distributions
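The calculations listed above can be sketched for height and weight as follows. All the means, standard deviations, and cut-off values here are hypothetical, invented only to make the example concrete, and both variables are assumed to be normally distributed.

```python
from statistics import NormalDist

# Hypothetical populations (assumptions, not data from the slides).
height = NormalDist(mu=170.0, sigma=10.0)  # cm
weight = NormalDist(mu=70.0, sigma=12.0)   # kg

# Pr(X < x), Pr(X > x), and Pr(x1 < X < x2) for height.
p_below = height.cdf(180.0)
p_above = 1.0 - height.cdf(180.0)
p_between = height.cdf(180.0) - height.cdf(160.0)

# Percentile: the height below which 90% of the population falls.
percentile_90 = height.inv_cdf(0.90)

# Comparison between distributions: z-scores put cm and kg on one scale.
z_height = (185.0 - height.mean) / height.stdev
z_weight = (82.0 - weight.mean) / weight.stdev

print(f"Pr(height < 180) = {p_below:.4f}")
print(f"Pr(height > 180) = {p_above:.4f}")
print(f"Pr(160 < height < 180) = {p_between:.4f}")
print(f"90th percentile of height = {percentile_90:.1f} cm")
print(f"z for 185 cm = {z_height}, z for 82 kg = {z_weight}")
```

In this made-up case a height of 185 cm is more unusual than a weight of 82 kg, because it lies more standard deviations above its own mean.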
Normal Distributions Go Wrong
Wrong shape
Non-symmetrical
Skew
Too fat or too narrow
Kurtosis
Aberrant values
Outliers
Effects of Non-Normality
Skew
Biases parameter estimates
E.g. mean
Kurtosis
Doesn’t affect parameter estimates
Does affect standard errors
Outliers
Depends
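The effect of skew on the mean can be illustrated with a small sketch. The slides give no formulas for skew or kurtosis, so the moment-based definitions below are an assumption (one common choice among several), and both samples are hypothetical.

```python
from statistics import fmean, median

def central_moment(values, order):
    """Average of (value - mean) raised to the given power."""
    centre = fmean(values)
    return sum((v - centre) ** order for v in values) / len(values)

def moment_skewness(values):
    """Third central moment scaled by the variance to the power 3/2."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Fourth central moment over squared variance, minus 3 (the normal value)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

# Hypothetical samples for illustration.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]

for name, sample in (("symmetric", symmetric), ("right-skewed", right_skewed)):
    print(f"{name}: mean = {fmean(sample):.2f}, median = {median(sample):.2f}, "
          f"skewness = {moment_skewness(sample):.2f}, "
          f"excess kurtosis = {excess_kurtosis(sample):.2f}")
```

In the right-skewed sample the single large value pulls the mean well above the median, which is the bias in the mean that skew produces.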
Distributions
Bell-Shaped (also
known as “symmetric”
or “normal”)
Skewed:
positively (skewed to
the right) – it tails off
toward larger values
negatively (skewed to
the left) – it tails off
toward smaller values
Kurtosis
Outliers
(Figure omitted; x-axis: Value)
Dealing with Outliers
Error
Data entry error
Correct it
Real value
Difficult
Delete it
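Before an outlier can be corrected or deleted it has to be found. The slides do not say how, so the sketch below assumes one common convention: flag any value more than 3 standard deviations from the mean. The ages, the cut-off of 3, and the helper name `flag_outliers` are all hypothetical.

```python
from statistics import mean, stdev

# Hypothetical ages: 19 plausible values plus one suspicious entry of 350.
ages = list(range(21, 58, 2)) + [350]

def flag_outliers(values, cutoff=3.0):
    """Return values lying more than `cutoff` standard deviations from the mean.

    The 3-SD cut-off is an assumed convention, not a rule from the slides.
    """
    centre = mean(values)
    spread = stdev(values)
    return [v for v in values if abs(v - centre) / spread > cutoff]

suspects = flag_outliers(ages)
print(f"values to check against the source records: {suspects}")
```

A flagged value is only a candidate. Whether it is a data entry error to correct or a real value still has to be decided case by case.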
ANY QUESTIONS?