Normal and Standard Normal Distributions

June 29, 2004
Histogram
Data are divided into 10-pound groups (called “bins”). Each bar shows the percent of the total that falls in that 10-pound interval.
With only one woman <100 lbs, this bin represents <1% of the total 120 women sampled.
[Figure: histogram of weight. x-axis: POUNDS (80 to 160); y-axis: Percent (0 to 25); bins: 85-95, 95-105, 105-115, 115-125, 125-135, 135-145, 145-155, 155-165.]
What’s the shape of the distribution?
[Figure: the histogram of weight again. x-axis: POUNDS (80 to 160); y-axis: Percent (0 to 25).]
~ Normal Distribution
[Figure: the histogram of weight again. x-axis: POUNDS (80 to 160); y-axis: Percent (0 to 25).]
The Normal Distribution

Equivalently, the shape is described as “Gaussian” or “Bell Curve”.
Every normal curve is defined by 2 parameters:
– 1. mean (μ): the curve’s center
– 2. standard deviation (σ): how fat the curve is (spread)

X ~ N(μ, σ²)
Examples:
– height
– weight
– age
– bone density
– IQ (mean=100; SD=15)
– SAT scores
– blood pressure
– ANYTHING YOU AVERAGE OVER A LARGE ENOUGH #
[Figures: three normal curves of increasing spread, titled “A Skinny Normal Distribution”, “More Spread Out...”, and “Wider Still...”.]
The Normal Distribution: as mathematical function
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$
Note constants: π = 3.14159, e = 2.71828
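To make the formula concrete, here is a minimal Python sketch (Python rather than the SAS used elsewhere in these slides) that evaluates the density directly from the expression above. The helper name `normal_pdf` and the test points are illustrative choices, and μ=3, σ=1 are just example parameter values. The standard library's `statistics.NormalDist` is used only as a cross-check.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x, written out term by term from the formula."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Compare the hand-written formula with the standard library at a few points.
for x in (2.0, 3.0, 4.5):
    print(x, normal_pdf(x, mu=3, sigma=1), NormalDist(3, 1).pdf(x))
```

The two columns should agree to floating-point precision, and the density is highest at x = μ.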
Integrates to 1
$$\int_{-\infty}^{+\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\, dx = 1$$
Expected Value
$$E(X) = \int_{-\infty}^{+\infty} x\, \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\, dx = \mu$$
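Variance
$$\mathrm{Var}(X) = \left(\int_{-\infty}^{+\infty} x^{2}\, \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\, dx\right) - \mu^{2} = \sigma^{2}$$
Standard Deviation(X) = σ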
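The three integrals (total area, expected value, variance) can be checked numerically. The sketch below uses a simple midpoint rule in Python. The helper names, the example parameters μ=3 and σ=1, the ±10 SD integration range, and the step count are all assumptions made for illustration, not part of the slides.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def integrate(g, lo, hi, n=20_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))


mu, sigma = 3.0, 1.0
# The tails beyond 10 standard deviations contribute a negligible amount.
lo, hi = mu - 10 * sigma, mu + 10 * sigma

total = integrate(lambda x: normal_pdf(x, mu, sigma), lo, hi)          # should be ~1
mean = integrate(lambda x: x * normal_pdf(x, mu, sigma), lo, hi)       # should be ~mu
second_moment = integrate(lambda x: x * x * normal_pdf(x, mu, sigma), lo, hi)
variance = second_moment - mean ** 2                                   # should be ~sigma^2

print(total, mean, variance)
```

Changing `mu` and `sigma` should leave the total at 1 while the mean and variance track the new parameters.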
[Figure: normal curve with μ=3 and σ=1]
**The beauty of the normal curve:
No matter what μ and σ are, the area between μ-σ and μ+σ is about 68%; the area between μ-2σ and μ+2σ is about 95%; and the area between μ-3σ and μ+3σ is about 99.7%. Almost all values fall within 3 standard deviations.
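The claim that these areas do not depend on μ and σ is easy to verify. This is a small Python sketch using the standard library's `statistics.NormalDist`. The function name `area_within` is hypothetical, and the second parameter set (127.8, 15.5) is simply the mean and SD of the weight data from these slides.

```python
from statistics import NormalDist


def area_within(k: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Area under N(mu, sigma^2) between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    standard = area_within(k)                        # standard normal
    weights = area_within(k, mu=127.8, sigma=15.5)   # a different normal curve
    print(k, round(standard, 4), round(weights, 4))
```

Both columns should show the same areas, roughly 0.68, 0.95, and 0.997.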
68-95-99.7 Rule
[Figure: normal curve with nested regions labeled “68% of the data”, “95% of the data”, and “99.7% of the data”.]
How good is the rule for real data?
Check the example data:
The mean of the weight of the women = 127.8
The standard deviation (SD) = 15.5
68% of 120 = .68 x 120 = ~82 runners
In fact, 79 runners fall within 1 SD (15.5 lbs) of the mean.
[Figure: histogram of weight (x-axis: POUNDS; y-axis: Percent) with 112.3, 127.8, and 143.3 marked, i.e. the mean ± 1 SD.]
95% of 120 = .95 x 120 = ~114 runners
In fact, 115 runners fall within 2 SDs of the mean.
[Figure: histogram of weight (x-axis: POUNDS; y-axis: Percent) with 96.8, 127.8, and 158.8 marked, i.e. the mean ± 2 SDs.]
99.7% of 120 = .997 x 120 = 119.6 runners
In fact, all 120 runners fall within 3 SDs of the mean.
[Figure: histogram of weight (x-axis: POUNDS; y-axis: Percent) with 81.3, 127.8, and 174.3 marked, i.e. the mean ± 3 SDs.]
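The arithmetic behind this comparison can be reproduced in a few lines of Python. The raw weights are not included in the slides, so this sketch only recomputes the interval endpoints and the expected counts. The observed counts (79, 115, 120) are copied from the slides, not computed, and the variable names are illustrative.

```python
mean, sd, n = 127.8, 15.5, 120

# Observed counts within k SDs of the mean, as reported on the slides.
observed = {1: 79, 2: 115, 3: 120}

rows = []
for k, share in ((1, 0.68), (2, 0.95), (3, 0.997)):
    lo, hi = mean - k * sd, mean + k * sd   # interval: mean +/- k SDs
    expected = share * n                    # count predicted by the rule
    rows.append((k, lo, hi, expected, observed[k]))
    print(f"{k} SD: {lo:.1f} to {hi:.1f} lbs, "
          f"expected {expected:.1f}, observed {observed[k]}")
```

The observed counts stay within a few runners of what the rule predicts at every level.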
Example

Suppose SAT scores roughly follow a normal distribution in the U.S. population of college-bound students (with range restricted to 200-800), and the average math SAT is 500 with a standard deviation of 50. Then:
– 68% of students will have scores between 450 and 550
– 95% will be between 400 and 600
– 99.7% will be between 350 and 650
Example
BUT…
What if you wanted to know the math SAT score corresponding to the 90th percentile (=90% of students are lower)?
$$P(X \le Q) = .90 \;\Rightarrow\; \int_{200}^{Q} \frac{1}{(50)\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-500}{50}\right)^{2}}\, dx = .90$$
Solve for Q?….Yikes!
The Standard Normal Distribution
“Universal Currency”

Standard normal curve: μ=0 and σ=1

Z ~ N(0, 1)
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^{2}}$$
The Standard Normal Distribution (Z)
All normal distributions can be converted into
the standard normal curve by subtracting the
mean and dividing by the standard deviation:
$$Z = \frac{X - \mu}{\sigma}$$
Somebody calculated all the integrals for the standard
normal and put them in a table! So we never have to
integrate!
Even better, computers now do all the integration.
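As a sketch of what "letting the computer do it" looks like, the Python below standardizes a value and confirms that the area to its left is the same on the original curve and on the standard normal curve. The function name `z_score` and the score of 600 are illustrative choices. The mean of 500 and SD of 50 are the math SAT values used in these slides.

```python
from statistics import NormalDist


def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


mu, sigma = 500, 50   # math SAT example values from the slides
x = 600               # illustrative score, chosen for this sketch

z = z_score(x, mu, sigma)
area_original = NormalDist(mu, sigma).cdf(x)   # P(X <= 600) under N(500, 50^2)
area_standard = NormalDist(0, 1).cdf(z)        # P(Z <= z) under N(0, 1)

print(z, area_original, area_standard)
```

The two areas should match to floating-point precision, which is why a single standard normal table serves every normal distribution.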
Example

For example: What’s the probability of getting a math SAT score of 575 or less, with μ=500 and σ=50?
$$Z = \frac{575 - 500}{50} = 1.5$$
i.e., a score of 575 is 1.5 standard deviations above the mean.
$$\therefore\; P(X \le 575) = \int_{200}^{575} \frac{1}{(50)\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-500}{50}\right)^{2}}\, dx \;\longrightarrow\; \int_{-\infty}^{1.5} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^{2}}\, dz$$
Yikes!
But looking up Z = 1.5 in a standard normal chart (or entering it into SAS) is no problem: the area is .9332.
Use SAS to get area
You can also use SAS:
data _null_;
  /* probnorm(z): area to the left of z under the standard normal curve */
  theArea=probnorm(1.5);
  put theArea;  /* write the result to the SAS log */
run;
0.9331927987
This function gives the area to the left of X standard deviations in a standard normal curve.
In-Class Exercise
If birth weights in a population are normally distributed with a mean of 109 oz and a standard deviation of 13 oz,
a. What is the chance of obtaining a birth weight of 141 oz or heavier when sampling birth records at random?
b. What is the chance of obtaining a birth weight of 120 oz or lighter?
Answer
a. What is the chance of obtaining a birth weight of 141 oz or heavier when sampling birth records at random?
$$Z = \frac{141 - 109}{13} = 2.46$$
From the chart, a Z of 2.46 corresponds to a right-tail (greater than) area of: P(Z≥2.46) = 1 - (.9931) = .0069 or .69%
Answer
b. What is the chance of obtaining a birth weight of 120 oz or lighter?
$$Z = \frac{120 - 109}{13} = .85$$
From the chart, a Z of .85 corresponds to a left-tail area of:
P(Z≤.85) = .8023 = 80.23%
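Both answers can be cross-checked in software. The Python sketch below (again Python rather than SAS, with illustrative variable names) computes each probability twice: once with Z rounded to two decimals, as a printed chart requires, and once with the unrounded Z. The two versions differ slightly, which comes from rounding Z and not from any mistake in the method.

```python
from statistics import NormalDist

std = NormalDist()     # standard normal, N(0, 1)
mu, sigma = 109, 13    # birth weight mean and SD in oz, from the exercise

# a. P(X >= 141): right-tail area
z_a = (141 - mu) / sigma
p_a_table = 1 - std.cdf(round(z_a, 2))   # Z rounded to 2.46, as in the chart
p_a_exact = 1 - std.cdf(z_a)             # unrounded Z

# b. P(X <= 120): left-tail area
z_b = (120 - mu) / sigma
p_b_table = std.cdf(round(z_b, 2))       # Z rounded to .85, as in the chart
p_b_exact = std.cdf(z_b)                 # unrounded Z

print(round(z_a, 2), p_a_table, p_a_exact)
print(round(z_b, 2), p_b_table, p_b_exact)
```

With the rounded Z values the results agree with the chart answers of .0069 and .8023. The unrounded values come out marginally lower in both cases.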
Reading for this week

Walker: 1.3-1.6 (p. 10-22), Chapters 2 and 3
(p. 23-54)