Transcript Item VII

The Gaussian (Normal) Distribution:
Briefly, More Details & Some Applications
The Gaussian (Normal) Distribution
• The Gaussian Distribution is one of the most
used distributions in all of science. It is also called
the “bell curve” or the Normal Distribution.
If this is the “Normal Distribution”, logically,
shouldn’t there also be an “Abnormal Distribution”?
Johann Carl Friedrich Gauss
(1777–1855, Germany)
• Mathematician, Astronomer & Physicist.
• Sometimes called the
“Prince of Mathematicians”
• A child prodigy in math.
(Do you have trouble believing some of the following? I do!)
• Age 3: He informed his father of a mistake in a payroll
calculation & gave the correct answer!!
• Age 7: His teacher gave the class the problem of summing all integers
from 1 to 100 to keep them busy. Gauss quickly wrote
the correct answer, 5050, on his slate!!
• Whether or not you believe all of this, it is 100% true that he
made a HUGE number of contributions to
Mathematics, Physics, & Astronomy!!
Johann Carl Friedrich Gauss
Genius! Made a HUGE number of
contributions to Math, Physics, & Astronomy
1. Proved the Fundamental Theorem of Algebra,
that every non-constant polynomial has a root of the form a+bi.
2. Proved the Fundamental Theorem of Arithmetic,
that every natural number greater than 1 can be represented as a
product of primes in only one way (up to the order of the factors).
3. Proved that every positive integer is the sum of at most 3 triangular numbers.
4. Developed the method of least squares fitting & many other methods
in statistics & probability.
5. Proved many theorems of integral calculus, including the divergence
theorem (when applied to the E field, it is what is called Gauss’s Law).
6. Proved many theorems of number theory.
7. Made many contributions to the orbital mechanics of the solar system.
8. Made many contributions to Non-Euclidean geometry.
9. One of the first to rigorously study the Earth’s magnetic field.
Characteristics of a Normal
or Gaussian Distribution
[Figure: normal distribution with m = 0, s² = 1; f(x) plotted against x, vertical axis from 0.0 to 0.4]
It is Symmetric
Its Mean, Median, & Mode are Equal
A 2-Dimensional Gaussian
Gaussian or Normal Distribution
• It is a symmetrical, bell-shaped curve.
• It has points of inflection 1 standard deviation
on either side of the mean.
Formula:
$$f(X) = \frac{1}{s\sqrt{2\pi}} \, e^{-\frac{(X - m)^2}{2s^2}}$$
[Figure: f(X) plotted against X, with the peak at X = m]
The Normal Distribution
$$f(x) = \frac{1}{s\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x - m}{s}\right)^2}$$
Note the constants:
π = 3.14159
e = 2.71828
This is a bell-shaped curve
with different centers &
spreads depending on m & s
• There are only 2 parameters that determine the
curve, the mean m & the standard deviation s (the variance is s²).
The rest are constants.
• For “z scores” (m = 0, s = 1), the equation
becomes:
$$f(z) = \frac{1}{\sqrt{2\pi}} \, e^{-z^2/2}$$
• The negative exponent means that big |z|
values give small function values in the tails.
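A minimal Python sketch of the density formula, using only the standard library. The function name normal_pdf and the sample points are illustrative choices, not part of the slides. It checks the three properties mentioned so far: the peak height of the standard curve, the symmetry about the mean, and the small values in the tails.

```python
import math

def normal_pdf(x, m=0.0, s=1.0):
    """Density of a normal distribution with mean m and standard deviation s."""
    z = (x - m) / s
    return math.exp(-0.5 * z * z) / (s * math.sqrt(2.0 * math.pi))

peak = normal_pdf(0.0)                            # height of the standard curve at its mean
left, right = normal_pdf(-1.3), normal_pdf(1.3)   # symmetric points about the mean
tail = normal_pdf(5.0)                            # a big |z| value, far out in the tail

print(f"f(0) = {peak:.4f}, f(-1.3) = {left:.4f}, f(1.3) = {right:.4f}, f(5) = {tail:.2e}")
```

The peak value is 1/sqrt(2π), about 0.399, which matches the top of the vertical axis (0.4) on the plotted curve.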
Normal Distribution
• It’s a probability function, so no matter what
the values of m and s, it must integrate to 1!
$$\int_{-\infty}^{\infty} \frac{1}{s\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x - m}{s}\right)^2} dx = 1$$
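The Normal Distribution is Defined by
its Mean & Standard Deviation.
$$m = \int_{-\infty}^{\infty} x \, \frac{1}{s\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x - m}{s}\right)^2} dx$$
$$s^2 = \left( \int_{-\infty}^{\infty} x^2 \, \frac{1}{s\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x - m}{s}\right)^2} dx \right) - m^2$$
Standard Deviation = s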
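The integrals on these slides can be checked numerically. The sketch below is one way to do it, assuming a simple trapezoid rule over m ± 10s in place of the infinite range (the tails beyond that are negligible). The helper names and the values m = 100, s = 10 are illustrative assumptions.

```python
import math

def normal_pdf(x, m, s):
    """Density of a normal distribution with mean m and standard deviation s."""
    z = (x - m) / s
    return math.exp(-0.5 * z * z) / (s * math.sqrt(2.0 * math.pi))

def integrate(g, a, b, steps=20000):
    """Trapezoid-rule estimate of the integral of g over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (g(a) + g(b))
    for i in range(1, steps):
        total += g(a + i * h)
    return total * h

m, s = 100.0, 10.0                      # illustrative values
a, b = m - 10.0 * s, m + 10.0 * s       # finite stand-in for (-infinity, +infinity)

area = integrate(lambda x: normal_pdf(x, m, s), a, b)                   # should be close to 1
mean = integrate(lambda x: x * normal_pdf(x, m, s), a, b)               # should be close to m
second_moment = integrate(lambda x: x * x * normal_pdf(x, m, s), a, b)
variance = second_moment - mean ** 2                                    # should be close to s squared

print(f"area = {area:.6f}, mean = {mean:.4f}, variance = {variance:.4f}")
```

Changing m and s moves and stretches the curve, but the area stays at 1 and the computed mean and variance follow the chosen parameters.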
Normal Distribution
• A normally distributed variable can take on an
infinite number of possible values.
• The probability of any one exact value occurring
is zero; only intervals of values have
nonzero probability.
• Curve has area or
probability = 1
• A normal distribution with a mean m = 0
& a standard deviation s = 1 is called
the standard normal distribution.
• Z Value: The distance between a
selected value, designated X, and the
population mean m, divided by the
population standard deviation, s:
$$Z = \frac{X - m}{s}$$
Example 1
• The monthly incomes of recent MBA graduates in
a large corporation are normally distributed with a
mean of $2000 and a standard deviation of $200.
What is the Z value for an income of $2200? An
income of $1700?
• For X = $2200, Z= (2200-2000)/200 = 1.
• For X = $1700, Z = (1700-2000)/200 = -1.5
• A Z value of 1 indicates that the value of $2200 is
1 standard deviation above the mean of $2000,
while a Z value of -1.5 indicates that the value of $1700
is 1.5 standard deviations below the mean of $2000.
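The arithmetic in Example 1 can be written as a short Python sketch. The function name z_value is an illustrative choice; the numbers are the ones given in the example.

```python
def z_value(x, m, s):
    """Distance of x from the mean m, measured in standard deviations s."""
    return (x - m) / s

m, s = 2000.0, 200.0             # mean and standard deviation of monthly income
z_2200 = z_value(2200.0, m, s)   # income above the mean gives a positive Z
z_1700 = z_value(1700.0, m, s)   # income below the mean gives a negative Z

print(z_2200, z_1700)            # prints 1.0 -1.5
```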
Probabilities Depicted by Areas
Under the Curve
• Total area under the curve is 1
• The area in red is equal to
p(z > 1)
• The area in blue is equal to
p(-1< z <0)
• Since the properties of the
normal distribution are
known, areas can be looked
up on tables or calculated on a
computer.
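As a sketch of the "calculated on a computer" route, the two areas named above can be obtained from the standard normal cumulative probability, which the Python standard library exposes through the error function math.erf. The function name std_normal_cdf is an illustrative choice.

```python
import math

def std_normal_cdf(z):
    """Cumulative probability p(Z <= z) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p_red = 1.0 - std_normal_cdf(1.0)                     # area to the right of z = 1
p_blue = std_normal_cdf(0.0) - std_normal_cdf(-1.0)   # area between z = -1 and z = 0

print(f"p(z > 1) = {p_red:.4f}, p(-1 < z < 0) = {p_blue:.4f}")
```

The two areas come out near 0.159 and 0.341. Together with the mirror-image areas on the other side of the mean they add up to the total area of 1.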
Probability of an Interval
$$F(2) - F(1) = p(1 \le X \le 2)$$
Cumulative Probability
$$F(a) = p(X \le a)$$
$$1 - F(a) = p(a < X)$$
[Figure: normal curve, probability density plotted against Z from -3 to 3, with a = X marked on the axis]
• Given any positive value for z, the corresponding
probability can be looked up in standard tables.
[Figure: standard normal curve with the area between 0 and a given positive z shaded; a table will give this probability]
The probability found using a table is the
probability of having a standard normal variable
between 0 & the given positive z.
Areas Under the Standard Normal Curve
Areas and Probabilities
• The Table shows cumulative normal
probabilities. Some selected entries:
z     F(z)     z     F(z)     z     F(z)
0     .50      .3    .62      1     .84
.1    .54      .4    .66      2     .98
.2    .58      .5    .69      3     .99
• About 54 % of scores fall below z of .1. About 46 % of
scores fall below a z of -.1 (1-.54 = .46). About 14% of
scores fall between z of 1 and 2 (.98-.84).
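The selected table entries can be reproduced with a few lines of Python. This is a sketch using math.erf from the standard library; the function name std_normal_cdf is an illustrative choice. Printing four decimals shows that F(3) is about 0.9987, which the table lists as .99.

```python
import math

def std_normal_cdf(z):
    """Cumulative probability F(z) = p(Z <= z) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z_values = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 1.0, 2.0, 3.0]
table = {z: std_normal_cdf(z) for z in z_values}
for z, f in table.items():
    print(f"z = {z:3.1f}   F(z) = {f:.4f}")

below_minus_point1 = std_normal_cdf(-0.1)   # same as 1 - F(0.1), by symmetry
between_1_and_2 = table[2.0] - table[1.0]   # F(2) - F(1)
print(f"{below_minus_point1:.2f} {between_1_and_2:.2f}")
```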
Areas Under the Normal Curve
• About 68 percent of the area under the normal
curve is within one standard deviation of the mean:
m - s < X < m + s
• About 95 percent is within two standard deviations
of the mean:
m - 2s < X < m + 2s
• About 99.74 percent is within three standard
deviations of the mean:
m - 3s < X < m + 3s
Areas Under the Normal Curve
[Figure: normal distribution with m = 0, s² = 1; f(x) plotted against x, with the x axis marked at m - 3s, m - 2s, m - 1s, m, m + 1s, m + 2s, m + 3s]
Between:
1. m ± 1s: 68.26%
2. m ± 2s: 95.44%
3. m ± 3s: 99.74%
Irwin/McGraw-Hill, © The McGraw-Hill Companies, Inc., 1999
Key Areas Under the Curve
For normal
distributions
± 1 s ~ 68%
± 2 s ~ 95%
± 3 s ~ 99.7%
“68-95-99.7 Rule”
68% of
the data
95% of the data
99.7% of the data
68.26-95.44-99.74 Rule
For a Normally distributed variable:
1. 68.26% of all possible observations lie within
one standard deviation on either side of the mean
(between m - s and m + s).
2. 95.44% of all possible observations lie within
two standard deviations on either side of the mean
(between m - 2s and m + 2s).
3. 99.74% of all possible observations lie within
three standard deviations on either side of the
mean (between m - 3s and m + 3s).
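The rule can be checked directly, since the probability of falling within k standard deviations of the mean equals erf(k / sqrt(2)) for any m and s. The sketch below uses Python's math.erf; the function name prob_within is an illustrative choice. Computed this way the three values are 68.27%, 95.45% and 99.73%, so the figures quoted in the rule differ in the last digit, which is what doubling four-decimal table entries (for example 2 × .3413) produces.

```python
import math

def prob_within(k):
    """p(m - k*s < X < m + k*s) for a normal variable; does not depend on m or s."""
    return math.erf(k / math.sqrt(2.0))

percentages = {k: 100.0 * prob_within(k) for k in (1, 2, 3)}
for k, pct in percentages.items():
    print(f"within {k} standard deviation(s): {pct:.2f}%")
```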
• Using the unit normal (z), we can find areas and
probabilities for any normal distribution.
• Suppose X = 120, m =100, s =10.
• Then z = (120-100)/10 = 2.
• About 98% of cases fall below a score of 120 if
the distribution is normal. In the normal, most
(95%) are within 2 s of the mean. Nearly
everybody (99.7%) is within 3 s of the mean.
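68.26-95.44-99.74 Rule
68-95-99.7 Rule in Math terms…
$$\int_{m-s}^{m+s} \frac{1}{s\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x - m}{s}\right)^2} dx = .68$$
$$\int_{m-2s}^{m+2s} \frac{1}{s\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x - m}{s}\right)^2} dx = .95$$
$$\int_{m-3s}^{m+3s} \frac{1}{s\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x - m}{s}\right)^2} dx = .997$$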
Example 2
• The daily water usage per person in New
Providence, New Jersey is normally distributed
with a mean of 20 gallons and a standard
deviation of 5 gallons.
• About 68% of the daily water usage per person in
New Providence lies between what two values?
• m ± 1s = 20 ± 1(5). That is, about 68% of the
daily water usage will lie between 15 and 25 gallons.
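Example 2 as a short Python sketch: it computes the interval one standard deviation either side of the mean and then confirms, via math.erf, that roughly 68% of a normal distribution lies inside it. The function name normal_cdf is an illustrative choice; the numbers are those of the example.

```python
import math

def normal_cdf(x, m, s):
    """Cumulative probability p(X <= x) for a normal variable with mean m, std. dev. s."""
    return 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))

m, s = 20.0, 5.0                 # gallons per person per day
low, high = m - s, m + s         # one standard deviation either side of the mean
p_inside = normal_cdf(high, m, s) - normal_cdf(low, m, s)

print(low, high, f"{p_inside:.4f}")
```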
Normal Approximation to the Binomial
• Using the normal distribution (a continuous
distribution) as a substitute for a binomial
distribution (a discrete distribution) for large
values of n seems reasonable because as n
increases, a binomial distribution gets closer and
closer to a normal distribution.
• The normal probability distribution is generally
deemed a good approximation to the binomial
probability distribution when np and n(1 - p) are both
greater than 5, where p is the probability of success on each trial.
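A sketch comparing an exact binomial probability with its normal approximation, in Python. Two things here are assumptions rather than slide content: the example values n = 20, p = 0.5, k = 8, and the half-unit continuity correction (evaluating the normal curve at k + 0.5), which is a common refinement when a continuous curve stands in for a discrete distribution. The approximating normal uses mean n·p and standard deviation sqrt(n·p·(1 - p)), and the usual rule of thumb asks for n·p and n·(1 - p) to both exceed 5.

```python
import math

def binomial_cdf(k, n, p):
    """Exact p(X <= k) for a binomial distribution with n trials, success probability p."""
    return sum(math.comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, m, s):
    """Cumulative probability p(X <= x) for a normal variable with mean m, std. dev. s."""
    return 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))

def normal_approx_cdf(k, n, p):
    """Normal approximation to p(X <= k), with a half-unit continuity correction."""
    m = n * p
    s = math.sqrt(n * p * (1.0 - p))
    return normal_cdf(k + 0.5, m, s)

n, p, k = 20, 0.5, 8                             # illustrative values
rule_ok = n * p > 5 and n * (1.0 - p) > 5        # rule-of-thumb check
exact = binomial_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)

print(f"rule satisfied: {rule_ok}, exact = {exact:.4f}, approx = {approx:.4f}")
```

For these values the two probabilities agree to about three decimal places; with small n or with p far from 0.5 the agreement gets worse.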
Binomial Distribution for n = 3 & n = 20
[Figure: two bar charts of P(x) against number of occurrences, one for n = 3 (x from 0 to 3) and one for n = 20 (x from 0 to 20)]
Central Limit Theorem
• Flip coin N times
• Each outcome has an associated random variable
Xi (= 1 if heads, otherwise 0)
• Number of heads:
NH = X1 + X2 + … + XN
• NH is a random variable
Central Limit Theorem
• Coin flip problem.
• Probability function of NH
– P(Head) = 0.5 (fair coin)
[Figure: probability function of NH for N = 5, N = 10 and N = 40]
Central Limit Theorem
The distribution of the sum of N independent random variables
(with finite variance) becomes increasingly Gaussian as N grows.
Example: N uniform [0,1] random variables.
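A small simulation of the uniform example, sketched in Python with the standard library. The choices of N = 12 terms, 20000 samples and a fixed seed are illustrative assumptions. A sum of N uniform [0,1] variables has mean N/2 and variance N/12, and if the sum is close to Gaussian then roughly 68% of the samples should land within one standard deviation of the mean.

```python
import math
import random
import statistics

def sum_of_uniforms(n_terms, rng):
    """One draw of the sum of n_terms independent uniform [0,1] random variables."""
    return sum(rng.random() for _ in range(n_terms))

rng = random.Random(0)               # fixed seed so the run is repeatable
n_terms, n_samples = 12, 20000       # illustrative choices
sums = [sum_of_uniforms(n_terms, rng) for _ in range(n_samples)]

expected_mean = n_terms / 2.0                # each uniform has mean 1/2
expected_sd = math.sqrt(n_terms / 12.0)      # each uniform has variance 1/12

sample_mean = statistics.fmean(sums)
sample_sd = statistics.pstdev(sums)
within_1sd = sum(abs(x - expected_mean) <= expected_sd for x in sums) / n_samples

print(f"mean = {sample_mean:.3f}, sd = {sample_sd:.3f}, within 1 sd = {within_1sd:.3f}")
```

Rerunning with n_terms set to 1 or 2 gives a visibly different fraction within one standard deviation, which is one way to see the convergence as N grows.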
[Figure: histogram of Percent (0 to 25) against POUNDS (80 to 160), with 112.3, 127.8 and 143.3 marked]
[Figure: Normal Distribution, Probability / %]
Normal Distribution
Why are normal distributions so important?
• Many dependent variables are commonly assumed to
be normally distributed in the population
• If a variable is approximately normally distributed we
can make inferences about values of that variable
• Example: Sampling distribution of the mean
• So what?
• Remember the Binomial distribution
– With a few trials we were able to calculate
possible outcomes and the probabilities of
those outcomes
• Now try it for a continuous distribution with an
infinite number of possible outcomes. Yikes!
• The normal distribution and its properties are
well known, and if our variable of interest is
normally distributed, we can apply what we know
about the normal distribution to our situation, and
find the probabilities associated with particular
outcomes.
• Since we know the shape of the normal curve,
we can calculate the area under the curve
• The percentage of that area can be used to
determine the probability that a given value
could be pulled from a given distribution.
• The area under the curve tells us about the
probability; in other words, we can obtain a p-value for our result (data) by treating it as a
normally distributed data set.