5 normal distributio..

NORMAL DISTRIBUTION AND ITS APPLICATION
INTRODUCTION
Statistically, a population is the set of all
possible values of a variable.
Random selection of objects from the
population makes the variable a random
variable (it involves a chance mechanism).
Example: Let ‘x’ be the weight of a newborn
baby. ‘x’ is a random variable
representing the weight of the baby.
The weight of a particular baby is not
known until he/she is born.
Discrete random variable:
If a random variable can only take values
that are whole numbers, it is called a
discrete random variable.
Example: No. of daily admissions
No. of boys in a family of 5
No. of smokers in a group of 100
persons.
Continuous random variable:
If a random variable can take any value
within an interval, it is called a
continuous random variable.
Example: Weight, height, age and blood pressure (BP).
Continuous Probability Distributions
• A continuous distribution has an infinite number of values between any two values assumed by the continuous variable
• As with other probability distributions, the total area under the curve equals 1
• The relative frequency (probability) of occurrence of values between any two points on the x-axis is equal to the total area bounded by the curve, the x-axis, and perpendicular lines erected at the two points on the x-axis

The Normal or Gaussian distribution is the
most important continuous probability
distribution in statistics.
The term “Gaussian” refers to Carl Friedrich
Gauss, who developed this distribution.
The word ‘normal’ here does not mean
‘ordinary’ or ‘common’ nor does it mean
‘disease-free’.
It simply means that the distribution
conforms to a certain formula and shape.
Histograms

• A kind of bar or line chart
  - Values on the x-axis (horizontal)
  - Numbers on the y-axis (vertical)
• Normal distribution is defined by a particular shape
  - Symmetrical
  - Bell-shaped
Histogram
Figure 1. Histogram of ages of 60 subjects (x-axis: Age; y-axis: Frequency)
A Perfect Normal Distribution
Gaussian Distribution

• Many biologic variables follow this pattern
  - Hemoglobin, cholesterol, serum electrolytes, blood pressures, age, weight, height
• One can use this information to define what is normal and what is extreme
• In clinical medicine, the central 95% of values (within about 2 standard deviations of the mean) is taken as normal
  - Clinically, 5% of “normal” individuals are therefore labeled as extreme/abnormal
• We just accept this and move on.
Normal distribution
• Most important distribution in statistics
• Also called the Gaussian distribution
• Density given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
  for −∞ < x < ∞
• where μ is the mean and σ the standard deviation
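As a minimal sketch of the density formula, the function below evaluates it directly and checks numerically that the total area under the curve is close to 1. The helper names (`normal_pdf`, `area_under_curve`), the trapezoidal integration and the parameter values (mean 90, standard deviation 4) are illustrative assumptions, not part of the original slides.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

def area_under_curve(lo, hi, mu=0.0, sigma=1.0, steps=20000):
    """Trapezoidal approximation of the area under the density between lo and hi."""
    width = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lo + i * width, mu, sigma)
    return total * width

# Illustrative parameters (assumed): mean 90, standard deviation 4.
peak = normal_pdf(90, mu=90, sigma=4)  # the density is highest at the mean
# Integrating over mean +/- 10 SD captures essentially all of the area.
total_area = area_under_curve(90 - 40, 90 + 40, mu=90, sigma=4)
print(f"peak density: {peak:.4f}, total area: {total_area:.6f}")
```

The integration range of ten standard deviations either side of the mean is a practical stand-in for the infinite range in the formula.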
Gaussian or Normal Distribution Curve
Characteristics of Normal Distribution
• Symmetrical about the mean, μ
• Mean, median, and mode are equal
• Total area under the curve above the x-axis is one square unit
• 1 standard deviation on both sides of the mean includes approximately 68% of the total area
• 2 standard deviations includes approximately 95%
• 3 standard deviations includes approximately 99.7%
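The areas within 1, 2 and 3 standard deviations can be checked with the error function, since the area within k standard deviations of the mean of any normal distribution is erf(k/√2). This is a small sketch; the function name `area_within` is an assumption made for illustration.

```python
import math

def area_within(k):
    """Area under a normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

coverage = {k: area_within(k) for k in (1, 2, 3)}
for k, area in coverage.items():
    print(f"within {k} SD: {area:.4f}")
```

The exact figures are close to 68.3%, 95.4% and 99.7%, so the third value is nearer 99.7% than 99%.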

Characteristics of the Normal Curve


• Values on the horizontal axis are z values (in standard deviation units); areas under the curve are probabilities, ranging from 0 to 1
• The mean is the center, and the values in standard deviations account for proportions of the population
  - 1 SD = 68% of the sample
  - 2 SD = 95% of the sample
  - 3 SD = 99.7% of the sample

Characteristics of the Normal Distribution

• The normal distribution is completely determined by the parameters μ and σ
  - Different values of μ shift the distribution along the x-axis
  - Different values of σ determine the degree of flatness or peakedness of the graph
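The roles of the two parameters can be illustrated with Python's standard-library `statistics.NormalDist`. The three distributions below use made-up parameter values: changing the mean moves the peak without changing its height, while doubling the standard deviation halves the peak height (a flatter curve).

```python
from statistics import NormalDist

# Parameter values are assumed, purely for illustration.
narrow = NormalDist(mu=0, sigma=1)
wide = NormalDist(mu=0, sigma=2)     # same center, larger spread
shifted = NormalDist(mu=5, sigma=1)  # same spread, different center

# Height of each curve at its own mean (the peak).
peak_narrow = narrow.pdf(narrow.mean)
peak_wide = wide.pdf(wide.mean)
peak_shifted = shifted.pdf(shifted.mean)

print(f"sigma=1 peak: {peak_narrow:.4f}")
print(f"sigma=2 peak: {peak_wide:.4f}")
print(f"mu=5 peak:    {peak_shifted:.4f}")
```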
Applications of Normal Distribution

• Frequently, data are normally distributed
  - Essential for some statistical procedures
  - If not, possible to transform to a more normal form
• Approximations for other distributions
• Because of the frequent occurrence of the normal distribution in nature, much statistical theory has been developed for it.

What’s so Great about the
Normal Distribution?

• If you know two things, you know everything about the distribution
  - Mean
  - Standard deviation
• You know the probability of any value arising
Standardised Scores

• My diastolic blood pressure is 100
  - So what?
• Normal is 90 (for my age and sex)
  - Mine is high
  - But how high?
• Express it in standardised scores
  - How many SDs above the mean is that?
• Mean = 90, SD = 4 (my age and sex)
$$\frac{\text{My Score} - \text{Mean Score}}{\text{SD}} = \frac{100 - 90}{4} = 2.5$$
• This is a standardised score, or z-score
  - Can consult tables (or computer)
  - See how often a score this high (or higher) occurs
  - 99.38% of people have lower scores
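The blood pressure example can be reproduced on a computer instead of with tables. The sketch below uses the standard-library `statistics.NormalDist`; the function name `z_score` is a hypothetical helper, and the numbers (score 100, mean 90, SD 4) are those of the example.

```python
from statistics import NormalDist

def z_score(value, mean, sd):
    """Number of standard deviations by which value lies above the mean."""
    return (value - mean) / sd

z = z_score(100, mean=90, sd=4)
# NormalDist() with no arguments is the standard normal (mean 0, SD 1).
proportion_lower = NormalDist().cdf(z)
proportion_higher = 1 - proportion_lower

print(f"z = {z}")
print(f"proportion scoring lower:  {proportion_lower:.4f}")
print(f"proportion scoring higher: {proportion_higher:.4f}")
```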
A Z-score Table
Z-Score   Proportion Scoring Lower   % (Rounded to whole number)
3.5       0.9998                     100%
3.0       0.9987                     100%
2.5       0.9938                     99%
2.0       0.9772                     98%
1.5       0.9332                     93%
1.0       0.8413                     84%
0.5       0.6915                     69%
0.0       0.5000                     50%
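A table like this can be regenerated from the standard normal cumulative distribution, which can be written with the error function as Φ(z) = ½(1 + erf(z/√2)). The following is a sketch under that identity; the function name `proportion_scoring_lower` is chosen here to match the column heading.

```python
import math

def proportion_scoring_lower(z):
    """Standard normal cumulative probability: the area to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z_values = [3.5, 3.0, 2.5, 2.0, 1.5, 1.0, 0.5, 0.0]
table = [(z, round(proportion_scoring_lower(z), 4)) for z in z_values]

for z, p in table:
    # Columns: z-score, proportion to 4 decimals, whole-number percentage.
    print(f"{z:>4.1f}  {p:.4f}  {p:.0%}")
```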
Standard Normal Distribution
• The normal distribution is really a family of curves determined by μ and σ
• The standard normal distribution is the one with μ = 0 and σ = 1
• Standard normal density given by:
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$$
  for −∞ < z < ∞
• where z = (x − μ) / σ
Standard Normal Distribution

• To find the probability that z takes on a value between any two points on the z-axis, we need to find the area bounded by perpendiculars erected at these points, the curve, and the z-axis
  - Values are tabled.
• The standard normal distribution is symmetric
Examples of Standard Normal Distribution
• Height and weight
• Calculate z-statistics
  - Pr(X < x)
  - Pr(X > x)
  - Pr(x1 < X < x2)
• Why?
  - Determine percentiles
  - Comparisons between different distributions
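The three kinds of probability, a percentile and a cross-distribution comparison can be sketched as follows. All population values here are hypothetical (height with mean 170 cm and SD 10 cm, weight with mean 70 kg and SD 12 kg) and are chosen only to make the arithmetic easy to follow; they are not from the slides.

```python
from statistics import NormalDist

# Hypothetical populations, for illustration only.
heights = NormalDist(mu=170, sigma=10)  # cm
weights = NormalDist(mu=70, sigma=12)   # kg

p_below = heights.cdf(180)                       # Pr(X < 180)
p_above = 1 - heights.cdf(180)                   # Pr(X > 180)
p_between = heights.cdf(180) - heights.cdf(160)  # Pr(160 < X < 180)

# Percentile: the height below which 97.5% of the population falls.
p97_5 = heights.inv_cdf(0.975)

# Comparison between different distributions via z-statistics:
# is a height of 185 cm more extreme than a weight of 82 kg?
z_height = (185 - heights.mean) / heights.stdev
z_weight = (82 - weights.mean) / weights.stdev

print(f"Pr(X < 180) = {p_below:.4f}")
print(f"Pr(X > 180) = {p_above:.4f}")
print(f"Pr(160 < X < 180) = {p_between:.4f}")
print(f"97.5th percentile = {p97_5:.1f} cm")
print(f"z for height = {z_height}, z for weight = {z_weight}")
```

Because z-statistics are unit-free, the height (1.5 SD above its mean) can be said to be more extreme than the weight (1.0 SD above its mean) even though the two are measured on different scales.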
Normal Distributions Go Wrong

• Wrong shape
  - Non-symmetrical
    - Skew
  - Too fat or too narrow
    - Kurtosis
• Aberrant values
  - Outliers
Effects of Non-Normality

• Skew
  - Biases parameter estimates
    - E.g. the mean
• Kurtosis
  - Doesn’t affect parameter estimates
  - Does affect standard errors
• Outliers
  - Depends
Distributions

• Bell-shaped (also known as “symmetric” or “normal”)
• Skewed:
  - positively (skewed to the right) – it tails off toward larger values
  - negatively (skewed to the left) – it tails off toward smaller values
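The direction of skew can be quantified. The sketch below uses one common definition, the moment coefficient of skewness (third central moment divided by the 1.5 power of the second central moment). The slides do not name a particular formula, so this choice, the function name `sample_skewness` and the small data sets are all assumptions for illustration.

```python
def sample_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Made-up data sets.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]     # tails off toward larger values
left_tailed = [-v for v in right_tailed]     # mirror image

skew_symmetric = sample_skewness(symmetric)
skew_right = sample_skewness(right_tailed)
skew_left = sample_skewness(left_tailed)

print(f"symmetric:    {skew_symmetric:.3f}")
print(f"right-tailed: {skew_right:.3f}")
print(f"left-tailed:  {skew_left:.3f}")
```

A value near zero indicates symmetry, a positive value indicates positive (right) skew, and a negative value indicates negative (left) skew.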
Kurtosis
Outliers
[Figure: chart illustrating outliers; x-axis: Value]
Dealing with Outliers
• Error
  - Data entry error
  - Correct it
• Real value
  - Difficult
  - Delete it
ANY QUESTIONS?