Chapter 1.3 - People Server at UNCW


1.3 Density Curves and Normal Distributions








Density curves
Measuring center and spread for density curves
Normal distributions
The 68-95-99.7 rule
Standardizing observations
Using the standard Normal Table
Inverse Normal calculations
Normal quantile plots
Exploring Quantitative Data
We now have a kit of graphical and numerical tools for describing
distributions. We also have a strategy for exploring data on a single
quantitative variable. Now, we’ll add one more step to the strategy.
Exploring Quantitative Data
1. Always plot your data: make a graph.
2. Look for the overall pattern (shape, center, and spread) and
for striking departures such as outliers.
3. Calculate a numerical summary to briefly describe center
and spread.
4. Sometimes the overall pattern of a large number of
observations is so regular that we can describe it by a
smooth curve.
Recall: Histogram
Table 1.3
Introduction to the Practice of Statistics, Sixth Edition
© 2009 W.H. Freeman and Company
Recall: From Section 1.1, we have:
Q: What percent of the chosen fifth-grade students have IQ scores of 105
or less?
An important property of a density curve is that areas under the curve
correspond to relative frequencies.
Density Curves
Example: Here is a histogram
of vocabulary scores of 947
seventh graders.
The smooth curve drawn over
the histogram is a
mathematical model for the
distribution.
Density Curves

An important property of a density curve is that areas under
the curve correspond to relative frequencies
Relative frequency = 0.303
Area = 0.293
Note that the relative frequency of vocabulary scores ≤ 6 is
roughly equal to the area under the density curve for scores ≤ 6.
Density Curves and Normal Distribution
Density curves come in any
imaginable shape.
Some are well known
mathematically and others aren’t.
Density Curves and Normal Distribution
Definition, pg 56
Normal distributions
Normal – or Gaussian – distributions are a family of symmetric,
bell-shaped density curves defined by a mean µ (mu) and a
standard deviation σ (sigma): N(µ, σ).
$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} $$
e = 2.71828… The base of the natural logarithm
π = pi = 3.14159…
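The Normal density formula can be evaluated directly. The following is a minimal Python sketch, not part of the original slides: `normal_density` is a hypothetical helper name, and the standard library's `statistics.NormalDist` is used only as a cross-check.

```python
import math
from statistics import NormalDist

def normal_density(x, mu, sigma):
    """Height of the N(mu, sigma) density curve at x (hypothetical helper)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Peak of the standard Normal curve, reached at x = mu = 0
peak = normal_density(0, 0, 1)
print(round(peak, 4))  # 0.3989

# Cross-check against the standard library for N(15, 2) at x = 16
ours = normal_density(16, 15, 2)
ref = NormalDist(mu=15, sigma=2).pdf(16)
print(abs(ours - ref) < 1e-12)
```

A larger sigma lowers and widens the curve, because sigma appears in the denominator of the leading factor.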
A family of density curves
Here the means are the same (µ = 15) while the standard deviations
are different (σ = 2, 4, and 6).
Here the means are different (µ = 10, 15, and 20) while the standard
deviations are the same (σ = 3).
The 68-95-99.7 Rule
In the Normal distribution with mean µ and standard deviation σ:
- Approximately 68% of the observations fall within σ of µ.
- Approximately 95% of the observations fall within 2σ of µ.
- Approximately 99.7% of the observations fall within 3σ of µ.
Standard Normal Distribution N(0, 1)
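The three percentages in the 68-95-99.7 rule are rounded values. The sketch below recomputes them on the standard Normal curve; it uses Python's standard-library `statistics.NormalDist`, and `within` is a hypothetical helper name introduced only for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal, N(0, 1)

def within(k):
    """Proportion of a Normal distribution within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

Because standardizing maps any N(µ, σ) onto N(0, 1), the same three proportions hold for every Normal distribution.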
The standard Normal distribution
Because all Normal distributions share the same properties, we can
standardize our data to transform any Normal curve N(µ, σ) into the
standard Normal curve N(0, 1).
[Figure: the N(64.5, 2.5) curve for x is transformed into the N(0, 1)
curve for z, the standardized height (no units).]
For each x we calculate a new value, z (called a z-score).
Standardizing: calculating z-scores
A z-score measures the number of standard deviations that a data
value x is from the mean µ.
$$ z = \frac{x - \mu}{\sigma} $$
When x is 1 standard deviation larger than the mean, then z = 1:
$$ \text{for } x = \mu + \sigma, \quad z = \frac{\mu + \sigma - \mu}{\sigma} = \frac{\sigma}{\sigma} = 1 $$
When x is 2 standard deviations larger than the mean, then z = 2:
$$ \text{for } x = \mu + 2\sigma, \quad z = \frac{\mu + 2\sigma - \mu}{\sigma} = \frac{2\sigma}{\sigma} = 2 $$
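Standardizing can be sketched in a few lines of Python. The example uses the N(64.5, 2.5) height distribution that appears on the slides; the function name `z_score` and the specific x values 69.5 and 62.0 are illustrative assumptions.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

mu, sigma = 64.5, 2.5  # heights in inches, N(64.5, 2.5)

one_sd_above = z_score(67.0, mu, sigma)
two_sd_above = z_score(69.5, mu, sigma)
one_sd_below = z_score(62.0, mu, sigma)

print(one_sd_above)  # 1.0
print(two_sd_above)  # 2.0
print(one_sd_below)  # -1.0
```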
When x is larger than the mean, z is positive.
When x is smaller than the mean, z is negative.
Use normalcdf(start, end, 0, 1) to find prob=area under N(0, 1).
Prob = Area = normalcdf(-999, -1, 0, 1) = 0.1587
Prob = Area = normalcdf(-999, 1, 0, 1) = 0.8413
[Figure: standard Normal curve with regions A and B marked]
Prob = Area = normalcdf(-1, 2, 0, 1) = 0.8186
For Part A: Prob = Area = normalcdf(1, 999, 0, 1) = 0.1587
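For readers without the calculator, the same areas can be computed in Python. This is a sketch under stated assumptions: the function below only imitates the calculator's `normalcdf(start, end, mean, sd)` argument order and is not the calculator's own implementation. It relies on the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist

def normalcdf(start, end, mu=0.0, sigma=1.0):
    """Area under the N(mu, sigma) curve between start and end."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(end) - dist.cdf(start)

left_of_minus_1 = normalcdf(-999, -1, 0, 1)
left_of_1 = normalcdf(-999, 1, 0, 1)
between = normalcdf(-1, 2, 0, 1)
right_of_1 = normalcdf(1, 999, 0, 1)

print(round(left_of_minus_1, 4))  # 0.1587
print(round(left_of_1, 4))        # 0.8413
print(round(between, 4))          # 0.8186
print(round(right_of_1, 4))       # 0.1587
```

The bounds -999 and 999 stand in for minus and plus infinity, as on the calculator; the area beyond them is negligible.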
Example: Let Z follow a standard Normal distribution, Z ~ N(0, 1).
Find:
(1) Pr(Z > 0).
(2) Pr(Z > 3).
(3) Pr(Z < -1).
(4) Pr(-1 < Z < 1).
(5) Pr(-2 < Z < 1).
(6) Pr(1.5 < Z < 2.3).
(7) Pr(-4 < Z < 4).
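One way to check answers to these seven exercises is to compute each probability as a difference of cumulative areas. The sketch below is an illustration rather than the slides' own solution; it uses `statistics.NormalDist` from the Python standard library, and the dictionary name `answers` is arbitrary.

```python
from statistics import NormalDist

Z = NormalDist()  # Z ~ N(0, 1)

# Each probability is (area to the left of the upper bound)
# minus (area to the left of the lower bound).
answers = {
    "Pr(Z > 0)": 1 - Z.cdf(0),
    "Pr(Z > 3)": 1 - Z.cdf(3),
    "Pr(Z < -1)": Z.cdf(-1),
    "Pr(-1 < Z < 1)": Z.cdf(1) - Z.cdf(-1),
    "Pr(-2 < Z < 1)": Z.cdf(1) - Z.cdf(-2),
    "Pr(1.5 < Z < 2.3)": Z.cdf(2.3) - Z.cdf(1.5),
    "Pr(-4 < Z < 4)": Z.cdf(4) - Z.cdf(-4),
}

for label, p in answers.items():
    print(f"{label} = {p:.4f}")
```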
Standard Normal Distribution N(0, 1)
Normal Calculations
How to Solve Problems Involving Normal Distributions
Express the problem in terms of the observed variable x, list the
values of µ and σ.
Perform calculations.
- Step 1: Standardize x to restate the problem in terms of a
standard Normal variable z.
- Step 2: Draw a picture of N(0, 1), and shade the area of
interest under the curve.
- Step 3: Use normalcdf(start, end, 0, 1) to find:
PROB = required area under the standard Normal curve.
Write your conclusion in the context of the problem.
Example 1:
The National Collegiate Athletic Association (NCAA) requires Division I athletes to
score at least 820 on the combined math and verbal SAT exam to compete in their
first college year. The SAT scores of 2003 were approximately normal with mean
1026 and standard deviation 209.
What proportion of all students would be NCAA qualifiers (SAT ≥ 820)?
x = 820, µ = 1026, σ = 209
$$ z = \frac{x - \mu}{\sigma} = \frac{820 - 1026}{209} = \frac{-206}{209} \approx -0.99 $$
Use the calculator to find:
normalcdf(-0.99, 999, 0, 1) ≈ 0.84
About 84% of all students would be NCAA qualifiers.
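Example 1 can be reproduced in Python as a check on the arithmetic. This is a sketch using the standard library's `statistics.NormalDist`; the variable names are illustrative, and the code works with the unrounded z-score rather than -0.99, which does not change the answer at two decimal places.

```python
from statistics import NormalDist

# SAT scores, approximately N(1026, 209) as stated in the example
sat = NormalDist(mu=1026, sigma=209)

# Step 1: standardize the cutoff x = 820
z = (820 - sat.mean) / sat.stdev

# Step 3: area to the right of the cutoff
p_qualify = 1 - sat.cdf(820)

print(round(z, 2))          # -0.99
print(round(p_qualify, 2))  # 0.84
```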
Example 2:
Recall: The SAT scores of 2003 were approximately
normal with mean 1026 and standard deviation 209.
The NCAA defines a “partial qualifier,” eligible to practice and receive an athletic
scholarship but not to compete, as a student with a combined SAT score of at least 720.
What proportion of all students who take the SAT would be partial qualifiers?
That is, what proportion have scores between 720 and 820?
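For x = 720, with µ = 1026 and σ = 209:
$$ z = \frac{x - \mu}{\sigma} = \frac{720 - 1026}{209} = \frac{-306}{209} \approx -1.46 $$
For x = 820, with the same µ and σ:
$$ z = \frac{x - \mu}{\sigma} = \frac{820 - 1026}{209} = \frac{-206}{209} \approx -0.99 $$
Use the calculator to find:
normalcdf(-1.46, -0.99, 0, 1) ≈ 0.09, or about 9%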
About 9% of all students who take the SAT
have scores between 720 and 820.
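The partial-qualifier proportion is the area between two cutoffs, which can be checked with a short Python sketch. It uses `statistics.NormalDist` from the standard library and unrounded z-scores; the variable names are illustrative.

```python
from statistics import NormalDist

# SAT scores, approximately N(1026, 209) as stated in the example
sat = NormalDist(mu=1026, sigma=209)

z_low = (720 - sat.mean) / sat.stdev   # about -1.46
z_high = (820 - sat.mean) / sat.stdev  # about -0.99

# Area between the two cutoffs: left of 820 minus left of 720
p_partial = sat.cdf(820) - sat.cdf(720)

print(round(z_low, 2), round(z_high, 2))
print(round(p_partial, 2))  # 0.09
```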
Example 1.25, Page 59: Heights of young women
The distribution of heights of young women aged 18 to 24 is approximately
Normal with mean µ = 64.5 inches and standard deviation σ = 2.5
inches. That is, heights approximately follow the N(64.5", 2.5") distribution.
Let X be the height of a woman aged 18 to 24. X ~ N(64.5", 2.5") approx.
Question: What percent of women are shorter than 67 inches tall (i.e. 5'7")?
mean µ = 64.5"
standard deviation σ = 2.5"
x (height) = 67"
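We calculate z, the standardized value of x:
$$ z = \frac{x - \mu}{\sigma} = \frac{67 - 64.5}{2.5} = \frac{2.5}{2.5} = 1 $$
The area to the left of z = 1 is normalcdf(-999, 1, 0, 1) = 0.8413.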
Conclusion:
84.13% of women are shorter than 67”.
By subtraction, 1 - 0.8413, or 15.87% of women are taller than 67".
EX 1.25, Page 59: Women heights (Cont.)
Let X=height (inches) of young women aged 18-24 years.
X ~N(64.5", 2.5") approx.
Question:
a) What percent of these women's heights are between 63" and 68"?
b) What percent of these women are taller than 65 inches tall?
c) What height is the 90th percentile for women in this age group?
Question c) is what is called a
"backwards problem," since you are
solving for an X value when you know
an area…
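Questions a) through c) can be worked in Python as a check. The slides do not give these answers, so the following is an illustrative sketch only. It uses `statistics.NormalDist` from the standard library, whose `inv_cdf` method handles the backwards problem by returning the value with a given area to its left.

```python
from statistics import NormalDist

# X ~ N(64.5, 2.5): heights in inches, as given in the example
heights = NormalDist(mu=64.5, sigma=2.5)

# a) proportion of heights between 63" and 68"
p_between = heights.cdf(68) - heights.cdf(63)

# b) proportion taller than 65"
p_taller = 1 - heights.cdf(65)

# c) backwards problem: the height with 90% of the area to its left
h90 = heights.inv_cdf(0.90)

print(f"a) {p_between:.4f}")
print(f"b) {p_taller:.4f}")
print(f"c) {h90:.2f} inches")
```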
Review: p-th percentile

The p-th percentile of a distribution is the value such that p
percent of the observations fall at or below it. (Recall Q1,
Median, and Q3.)
Inverse normal calculations
For N(0, 1), find the value z that corresponds to a given proportion
(area under the curve to its left) by using invnorm(area, 0, 1), where
area is written as a proportion (for example, 0.25 for the 25th percentile).
EX: Find
(1) the 25th percentile.
(2) the 55th percentile.
(3) the 10th percentile.
(4) the 90th percentile.
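The four percentiles above can be checked with a Python stand-in for the calculator function. This is a sketch under an assumption: the `invnorm` helper below only imitates the calculator's argument order (left-tail area, mean, standard deviation) and delegates to `statistics.NormalDist.inv_cdf` from the standard library.

```python
from statistics import NormalDist

def invnorm(area, mu=0.0, sigma=1.0):
    """Value of N(mu, sigma) with the given proportion of area to its left."""
    return NormalDist(mu, sigma).inv_cdf(area)

z25 = invnorm(0.25, 0, 1)
z55 = invnorm(0.55, 0, 1)
z10 = invnorm(0.10, 0, 1)
z90 = invnorm(0.90, 0, 1)

for name, z in (("25th", z25), ("55th", z55), ("10th", z10), ("90th", z90)):
    print(f"{name} percentile: z = {z:.4f}")
```

By symmetry of the standard Normal curve, the 10th and 90th percentiles are equal in size and opposite in sign.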
Inverse normal calculations
Example 1: Suppose the height of a randomly selected 5-year-old
child follows a Normal distribution with µ = 100 cm and σ = 6 cm.
(1) What’s the 90th percentile?
(2) What’s the 50th percentile?
(3) What’s the 10th percentile?
(4) What’s the 25th percentile?
(5) What’s the 56th percentile?
Solution (1):
Step 1: From the calculator, z ≈ 1.28.
Step 2: So the z-score is 1.28, which means that
$$ \frac{x - \mu}{\sigma} = \frac{x - 100}{6} = 1.28 $$
Step 3: x = 100 + (1.28 × 6), so x = 107.68.
Answer Key:
(1) 107.68;
(2) 100;
(3) 92.31;
(4) 95.95;
(5) 100.91.
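The answer key can be checked in Python. The sketch below uses `statistics.NormalDist` from the standard library with unrounded z-scores, so a result may differ from the key in the last decimal place; for instance, the key's 107.68 comes from rounding z to 1.28. The variable names are illustrative.

```python
from statistics import NormalDist

# Heights of 5-year-olds in cm, N(100, 6) as stated in the example
child = NormalDist(mu=100, sigma=6)

# Percentiles asked for in parts (1) through (5), as left-tail areas
areas = [0.90, 0.50, 0.10, 0.25, 0.56]
percentiles = [child.inv_cdf(a) for a in areas]

for a, x in zip(areas, percentiles):
    print(f"{a:.0%} percentile: {x:.2f} cm")
```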
Inverse normal calculations
Example 2: A soft-drink machine is regulated so that it discharges
an average of 200 milliliters per cup, with a standard deviation of
15 milliliters. Assume the amounts are Normally distributed.
(1) Find the probability that a cup will contain more than 220 milliliters.
(2) Find the probability that a cup will contain between 180 and 230
milliliters.
(3) Find the 40th percentile of the discharge amount.
(4) Find the 89th percentile of the discharge amount.
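Example 2 mixes forward problems (find an area) with backwards problems (find a value), and both can be sketched with one distribution object. The slides give no answers for this example, so the code below is an illustrative check only. It uses `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

# Discharge amount in milliliters, N(200, 15) under the Normality assumption
machine = NormalDist(mu=200, sigma=15)

# (1) forward problem: area to the right of 220
p_more_than_220 = 1 - machine.cdf(220)

# (2) forward problem: area between 180 and 230
p_between = machine.cdf(230) - machine.cdf(180)

# (3), (4) backwards problems: values with 40% and 89% of the area to the left
q40 = machine.inv_cdf(0.40)
q89 = machine.inv_cdf(0.89)

print(f"(1) {p_more_than_220:.4f}")
print(f"(2) {p_between:.4f}")
print(f"(3) {q40:.1f} ml")
print(f"(4) {q89:.1f} ml")
```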
Normal quantile plots
One way to assess if a distribution is indeed approximately normal is to
plot the data on a normal quantile plot.
The data points are ranked and the percentile ranks are converted to
z-scores with Table A. The z-scores are then used for the x axis against
which the data are plotted on the y axis of the normal quantile plot.

If the distribution is indeed normal the plot will show a straight line,
indicating a good match between the data and a normal distribution.

Systematic deviations from a straight line indicate a nonnormal
distribution. Outliers appear as points that are far away from the overall
pattern of the plot.
Good fit to a straight line: the
distribution of rainwater pH
values is close to normal.
Curved pattern: the data are not
normally distributed. Instead, they show
a right skew: a few individuals have
particularly long survival times.
Normal quantile plots are complex to do by hand, but they are standard
features in most statistical software.
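The construction described above (rank the data, convert percentile ranks to z-scores, pair each z-score with its data value) can be sketched in Python. Several assumptions apply: the percentile rank of the i-th smallest of n values is taken as (i - 0.5) / n, which is one common convention and may differ from what a given package uses; `normal_quantile_points` is a hypothetical helper name; and the sample values are made up for illustration, not data from the text. The z-scores come from `statistics.NormalDist.inv_cdf` rather than Table A.

```python
from statistics import NormalDist

def normal_quantile_points(data):
    """Return (z, x) pairs for a Normal quantile plot.

    Assumption: the i-th smallest of n values gets percentile rank (i - 0.5) / n.
    """
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # N(0, 1)
    return [
        (std_normal.inv_cdf((i - 0.5) / n), x)
        for i, x in enumerate(ordered, start=1)
    ]

# Made-up illustrative values, not data from the text
sample = [4.9, 5.6, 5.1, 5.4, 5.0, 5.8, 5.3, 5.2, 5.5]
points = normal_quantile_points(sample)

for z, x in points:
    print(f"{z:6.2f}  {x}")
```

Plotting x against z with any graphing tool gives the Normal quantile plot; points lying close to a straight line suggest an approximately Normal distribution.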
Normal quantile plot of CO2 – Table 1.6 on page 33
Notice the systematic failure of the points to fall on the line, especially at the
low end where the data is “piled up”. Also, note the outliers at the high
end… Conclusion: Not normal
Normal quantile plot of the IQ scores of 78 seventh-grade students - Data in
Table 1.9 on page 39
Notice that the data points follow the line fairly well, though there is a slight
curve at the low-middle, indicating more data than would be expected for a
Normal distribution. The y-intercept is around 110 (mean is approx. 110) and the
slope is around 10 (s.d. is approx. 10). Conclusion: Normal