Normal distribution

Chapter 2 Modeling Distributions of Data
Objectives
SWBAT:
1) Find and interpret the percentile of an individual value within a distribution of data.
2) Find and interpret the standardized score (z-score) of an individual value within a
distribution of data.
3) Use the 68-95-99.7 rule to estimate areas (proportions of values) in a Normal
distribution.
4) Use Table A or technology to find (i) the proportion of z-values in a specified interval,
or (ii) a z-score from a percentile in the standard Normal distribution.
5) Use Table A or technology to find (i) the proportion of values in a specified interval, or
(ii) the value that corresponds to a given percentile in any Normal distribution.
The Normal Distribution
• A Normal distribution is described by a
Normal density curve (bell-shaped
curve).
• Any particular Normal distribution is
completely specified by two numbers: its
mean µ and its standard deviation σ.
• The mean of the Normal distribution is
at the center of the symmetric Normal
curve.
Important Properties of a Normal Distribution
• The Normal distribution is symmetric, unimodal, and bell-shaped.
• The mean, median, and mode are equal, and this common value is
located exactly at the center of the distribution.
• The total area under the Normal curve equals 1. The area to the left of
the mean is .5 and the area to the right of the mean is .5.
• Data that lie beyond two standard deviations from the mean are rare,
and data that lie beyond three standard deviations from the mean are
very rare. By the 1.5 × IQR rule (the quartiles are at z ≈ -0.67 and 0.67, so
IQR ≈ 1.34), outliers are values falling below a z-score of -2.68 (area to the
left is .0037) or above a z-score of 2.68 (area to the left is .9963, so the
area to the right is .0037).
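• Many variables are approximately Normally distributed. Even if a
population is skewed, the distribution of sample means from large samples
will likely be approximately Normal (the Central Limit Theorem). A single
sample drawn from a skewed population, however, will tend to be skewed too.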
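The ±2.68 outlier cutoffs can be reproduced by applying the 1.5 × IQR rule to the standard Normal curve. This is a minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+) in place of Table A; rounding the quartiles to two decimals is an assumption made to match a table lookup.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard Normal: mean 0, standard deviation 1

# Quartiles of the standard Normal, rounded to two decimals as Table A would give them
q1 = round(std_normal.inv_cdf(0.25), 2)  # -0.67
q3 = round(std_normal.inv_cdf(0.75), 2)  #  0.67
iqr = q3 - q1                            # about 1.34

# 1.5 x IQR rule: outliers fall below Q1 - 1.5*IQR or above Q3 + 1.5*IQR
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

print(round(low_fence, 2), round(high_fence, 2))  # -2.68 2.68
print(round(std_normal.cdf(low_fence), 4))        # 0.0037 (area below the low fence)
```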
The 68-95-99.7 Rule (aka the Empirical Rule)
In the Normal distribution with mean µ and standard deviation σ:
• Approximately 68% of the observations fall within σ of µ.
• Approximately 95% of the observations fall within 2σ of µ.
• Approximately 99.7% of the observations fall within 3σ of µ.
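The rule's percentages are rounded. As a quick check, this sketch (assuming Python 3.8+ for `statistics.NormalDist`) computes the exact proportion of a Normal distribution lying within k standard deviations of the mean; because every Normal distribution looks the same in σ units, the standard Normal is enough.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mu = 0, sigma = 1

# Exact area between -k and +k standard deviations, for k = 1, 2, 3
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(k, round(area, 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```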
Example: Suppose a sample of scores yields a mean of 100 and a standard
deviation of 15. Assume that the distribution is Normal.
Approximately what percent of scores should fall between 85 and 115? (Hint:
Draw a diagram first!)
85 and 115 are each one standard deviation from the mean (100 - 15 and 100 + 15), so
approximately 68% of scores fall between 85 and 115.
Let’s try some more with the same distribution…
What percent of scores should fall:
a) Between 70 and 130?
95%
b) Between 55 and 145?
99.7%
c) Between 70 and 115?
13.5% + 68% = 81.5%
d) Greater than 115?
13.5% + 2.5% = 16%
e) Less than 70?
2.5%
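To see how close the 68-95-99.7 estimates are, this sketch computes exact areas for the same N(100, 15) model with `statistics.NormalDist` (Python 3.8+). The exact values differ from the rule's answers by at most about half a percentage point.

```python
from statistics import NormalDist

scores = NormalDist(mu=100, sigma=15)  # the N(100, 15) model from the example

# Exact proportions, with the 68-95-99.7 rule estimate noted for comparison
exact = {
    "85 to 115": scores.cdf(115) - scores.cdf(85),  # rule: 68%
    "70 to 130": scores.cdf(130) - scores.cdf(70),  # rule: 95%
    "55 to 145": scores.cdf(145) - scores.cdf(55),  # rule: 99.7%
    "70 to 115": scores.cdf(115) - scores.cdf(70),  # rule: 81.5%
    "above 115": 1 - scores.cdf(115),               # rule: 16%
    "below 70": scores.cdf(70),                     # rule: 2.5%
}

for label, p in exact.items():
    print(f"{label}: {p:.1%}")
```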
• We know approximately what percent of data fall within 1, 2, and 3
standard deviations of the mean. However, how can we find a
percent when a value does not fall exactly 1, 2, or 3 standard deviations from
the mean?
• The first thing we have to do is standardize our score(s). This is referred
to as finding the z-score.
The Standard Normal Distribution
All Normal distributions are the same if we measure in units of size σ from the mean µ as center.
The standard Normal distribution is the Normal distribution with mean 0 and standard deviation 1.
If a variable x has any Normal distribution N(µ,σ) with mean µ and standard deviation σ, then the
standardized variable
$$z = \frac{x - \mu}{\sigma}$$
has the standard Normal distribution, N(0,1).
Some notes on z-scores:
• Z-scores can be negative, positive, or 0.
• A positive z-score would indicate the original value is above the mean. For
example, a z-score of 1.24 would mean that the score is 1.24 standard
deviations above the mean.
• A negative z-score would indicate the original value is below the mean. For
example, a z-score of -0.27 would mean that the score is 0.27 standard
deviations below the mean.
• A z-score of 0 would indicate that the original value was the same as the
mean.
• Question: Are all negative z-scores bad?
Example: A distribution is approximately Normal with a mean of 26 and a
standard deviation of 7. Calculate and interpret the z-score for a value of 21.
z = (21 - 26)/7 ≈ -0.71. The value 21 is 0.71 standard deviations
below the mean.
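The standardization formula is a one-line function. In this sketch, `z_score` is a hypothetical helper name, not a library function.

```python
def z_score(x, mean, sd):
    """Standardize x: how many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z = z_score(21, mean=26, sd=7)
print(round(z, 2))  # -0.71
```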
• Example: Seth recently took two tests in school. On his history test he
scored a 75. The class average on the test was a 63 and the test had a
standard deviation of 3. On his biology test, Seth scored an 81. The
class average on the test was a 76 and the test had a standard
deviation of 7. Relatively speaking, on which test did he perform
better?
History: z = (75 - 63)/3 = 4. Biology: z = (81 - 76)/7 ≈ 0.71.
He performed better on his history test, since his score there was
more standard deviations above that class's mean.
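Comparing the two tests amounts to comparing z-scores. A small sketch of that comparison; `z_score` is again a hypothetical helper, redefined so the block stands alone.

```python
def z_score(x, mean, sd):
    """Standardize x relative to its own class mean and standard deviation."""
    return (x - mean) / sd

history_z = z_score(75, mean=63, sd=3)  # 4.0
biology_z = z_score(81, mean=76, sd=7)  # about 0.71

# The larger z-score is the relatively better performance
better = "history" if history_z > biology_z else "biology"
print(history_z, round(biology_z, 2), better)  # 4.0 0.71 history
```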
Steps to Use the standard Normal table
1) Draw the Normal curve. Make sure to identify the mean and standard
deviation.
2) Standardize the score(s) of interest.
3) Plot the score(s), draw vertical line(s), and shade the area of interest.
4) Look up the z-score on the standard Normal table.
• If the area of interest is shaded to the left, the value in the table is the desired area.
• If the area of interest is shaded to the right, we need to subtract the area in the table
from 1.
• If the area of interest is shaded between two z-scores, we need to look up the area
for both z-scores and subtract the smaller area from the larger.
Note for the AP test: For all Normal distribution problems (and other
distributions which we will get to) you need to do 3 things:
1) state the distribution and identify values of interest
2) show work
3) answer the question
Example: A data set is Normally distributed with a mean of 259 and a
standard deviation of 74. Find the area under the curve less than a
score of 180.
• Standardize: z = (180 - 259)/74 ≈ -1.07. We have to look up the z-score of -1.07
on the table.
• Since the z-score begins with “-1.0”, go to
the z column and go down until you
reach -1.0.
• Since there is a 7 in the hundredths place (.07), go to the right
until you reach the .07 column. You should now be located in the
spot that has -1.0 to the left, and .07 on the top.
This value is .1423, so about .1423 of the area under the curve lies below 180.
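The same lookup can be sketched with `statistics.NormalDist` (Python 3.8+) standing in for Table A. Rounding z to two decimals mimics the table; the unrounded calculation differs only in the fourth decimal.

```python
from statistics import NormalDist

mu, sigma = 259, 74
z = round((180 - mu) / sigma, 2)             # -1.07, rounded like a Table A lookup
table_area = NormalDist().cdf(z)             # area to the LEFT of z under N(0, 1)
exact_area = NormalDist(mu, sigma).cdf(180)  # same area without rounding z

print(z, round(table_area, 4), round(exact_area, 4))
```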
Example: A data set is Normally distributed with a mean of 26 and a
standard deviation of 2.4. Find the area under the curve more than a
score of 29.
N(26, 2.4)
Standardize: z = (29 - 26)/2.4 = 1.25. A z-score of 1.25 on the table gives an area of .8944. However,
this is the area to the LEFT of the score. We want the area to
the right of the score. Since the area under the Normal curve
totals 1, we need to subtract .8944 from 1. This gives us our
desired area, which is .1056.
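A right-tail area is the complement of the left-tail area the table gives. A minimal sketch with `statistics.NormalDist` (Python 3.8+):

```python
from statistics import NormalDist

mu, sigma = 26, 2.4
z = (29 - mu) / sigma            # 1.25
left_area = NormalDist().cdf(z)  # what the table gives: area to the LEFT of z
right_area = 1 - left_area       # total area is 1, so the RIGHT tail is the complement

print(round(z, 2), round(left_area, 4), round(right_area, 4))
```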
Example: In the 2008 Wimbledon tennis tournament, Rafael Nadal
averaged 115 miles per hour on his first serves. Assuming that the
distribution of his first-serve speeds is Normal with a standard
deviation of 6 mph, find what proportion of his first serves you would
expect to be between 110 and 125 mph.
N(115, 6)
Standardize: z = (110 - 115)/6 ≈ -0.83 and z = (125 - 115)/6 ≈ 1.67.
Look up both areas and subtract.
.9525 - .2033 = .7492
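For an area between two values, standardize both boundaries and subtract the smaller left-area from the larger. This sketch uses `statistics.NormalDist` (Python 3.8+); because the printed table entries are themselves rounded, the computed result may differ from .7492 in the fourth decimal.

```python
from statistics import NormalDist

std_normal = NormalDist()
mu, sigma = 115, 6

# Standardize both boundaries, rounded to two decimals as for a Table A lookup
z_low = round((110 - mu) / sigma, 2)   # -0.83
z_high = round((125 - mu) / sigma, 2)  #  1.67

# Larger left-area minus smaller left-area
between = std_normal.cdf(z_high) - std_normal.cdf(z_low)
print(round(between, 4))
```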
Using the Normal Distribution in Reverse
• In 2008, the distribution of batting averages for MLB players with at least 300
plate appearances was approximately Normal with a mean of 0.272 and a
standard deviation of 0.027.
• Suppose a player gets a salary bonus if his batting average is in the top 10% of
all players. How well must a player hit for his batting average to be in the top
10%?
• We need to find the boundary between the lowest 90% of the distribution
and the highest 10%.
• The boundary value is called the 90th percentile, because 90% of the values
fall below it.
• A percentile describes location in a distribution, like quartiles.
• N(.272, .027)
• We know that the area under the
curve to the left of the boundary is .90. Therefore, we want to
look in the interior of the standard
Normal table for a proportion closest
to 0.9000 and get the z-score
associated with this proportion.
• The closest value is 0.8997. This corresponds to a z-score of
1.28. This means the 90th percentile is 1.28 standard deviations
above the mean.
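• Now let’s find the batting average associated with this z-score:
1.28 = (x - 0.272)/0.027, so x = 0.272 + 1.28(0.027) ≈ 0.307.
A player must bat about .307 or better to be in the top 10%.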
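Working backwards means "unstandardizing": x = µ + zσ. This sketch does it by hand with the table's z = 1.28 and, for comparison, with `NormalDist.inv_cdf` (Python 3.8+), which plays the role of an inverse-Normal lookup.

```python
from statistics import NormalDist

mu, sigma = 0.272, 0.027
z = 1.28                                          # z-score read from Table A for area 0.90
by_hand = mu + z * sigma                          # unstandardize: x = mu + z * sigma
by_inverse = NormalDist(mu, sigma).inv_cdf(0.90)  # 90th percentile computed directly

print(round(by_hand, 3), round(by_inverse, 3))  # 0.307 0.307
```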
How can we do these calculations on the TI-84?
• To find areas: normalcdf(lower, upper, mean, SD)
• 2nd, DISTR, 2: normalcdf
• To find boundaries: invNorm(area to left, mean, SD)
• 2nd, DISTR, 3: invNorm
• Note: mean and SD default to 0,1 if not entered.
• Note: You must show your three steps!
• State distribution and identify values of interest
• Show work
• Answer
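Outside the calculator, Python's standard-library `statistics.NormalDist` (Python 3.8+) covers the same two tasks. The names `normalcdf` and `inv_norm` below are hypothetical wrappers chosen to mirror the TI-84 commands; they are not part of any library. Their defaults of mean 0 and SD 1 mirror the calculator's defaults.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0, sd=1):
    """Stand-in for the TI-84 normalcdf: area between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

def inv_norm(area_left, mean=0, sd=1):
    """Stand-in for the TI-84 invNorm: the value with area_left to its left."""
    return NormalDist(mean, sd).inv_cdf(area_left)

print(round(normalcdf(-1, 1), 4))  # 0.6827
print(round(inv_norm(0.90), 2))    # 1.28
```

Unlike the calculator, `float("inf")` and `float("-inf")` can be passed as bounds, so a placeholder like 100000 is not needed.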
Example: Suppose that Clayton Kershaw of the LA Dodgers throws his
fastball with a mean velocity of 94 miles per hour (mph) and a standard
deviation of 2 mph, and that the distribution of his fastball speeds can
be modeled by a Normal distribution.
a) About what proportion of his fastballs will travel at least 100 mph?
N(94, 2)
normalcdf (100, 100000, 94, 2)
Lower bound: 100
Upper bound: 100000
Mean: 94
SD: 2
Approximately .0013 of his fastballs will travel at least 100 mph.
Note: (a) and (b) in the notes are the same question.
c) About what proportion of his fastballs will travel less than 90 mph?
N(94, 2)
normalcdf (0, 90, 94, 2)
Lower bound: 0 (use 0 because a pitch cannot have a negative speed)
Upper bound: 90
Mean: 94
SD: 2
Approximately .0228 of his fastballs will travel less than 90 mph.
d) About what proportion of his fastballs will travel between 93 and 95 mph?
N(94, 2)
normalcdf (93, 95, 94, 2)
Lower bound: 93
Upper bound: 95
Mean: 94
SD: 2
Approximately .3829 of his fastballs will travel between 93 and 95 mph.
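Parts (a), (c), and (d) can be checked with `statistics.NormalDist` (Python 3.8+). With a CDF, a right tail is the complement of a left area, so no artificial bound such as 100000 or 0 is needed.

```python
from statistics import NormalDist

fastball = NormalDist(mu=94, sigma=2)  # N(94, 2) model for Kershaw's fastball speeds

p_a = 1 - fastball.cdf(100)               # (a) at least 100 mph: right tail
p_c = fastball.cdf(90)                    # (c) less than 90 mph: left tail
p_d = fastball.cdf(95) - fastball.cdf(93) # (d) between 93 and 95 mph

print(round(p_a, 4), round(p_c, 4), round(p_d, 4))  # 0.0013 0.0228 0.3829
```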
e) What is the 30th percentile of Kershaw’s distribution of fastball
velocities?
N(94, 2)
invNorm(.3, 94, 2)
area to left: .30
mean: 94
SD: 2
The 30th percentile is 92.9512 mph.
f) What fastball velocities would be considered low outliers for Kershaw?
N(94, 2)
Low outliers fall below a z-score of -2.68 (area of .0037 to the left).
By hand: x = µ + zσ = 94 + (-2.68)(2) = 88.64 mph.
On the calculator:
invNorm(.0037, 94, 2)
Area to the left: .0037
Mean: 94
SD: 2
The calculator gives the same answer, about 88.64 mph.
Fastballs below 88.64 mph would be
considered outliers.
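Parts (e) and (f) both run the Normal model in reverse. This sketch uses `NormalDist.inv_cdf` (Python 3.8+) where the calculator uses invNorm, and repeats the by-hand unstandardizing for (f).

```python
from statistics import NormalDist

fastball = NormalDist(mu=94, sigma=2)

p30 = fastball.inv_cdf(0.30)                # (e) 30th percentile
low_cutoff_calc = fastball.inv_cdf(0.0037)  # (f) value with area .0037 to its left
low_cutoff_hand = 94 + (-2.68) * 2          # (f) by hand: x = mu + z * sigma

print(round(p30, 4), round(low_cutoff_calc, 2), round(low_cutoff_hand, 2))  # 92.9512 88.64 88.64
```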
g) Suppose that a different pitcher’s fastballs have a mean velocity of
92 mph and 40% of his fastballs go less than 90 mph. What is the
standard deviation of his fastball velocities, assuming his distribution of
velocities can be modeled by a Normal distribution?
N(92, ?)
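Use your table and work backwards to find the z-score associated with
an area of .40 to the left.
z = -0.25 (the closest table area is .4013).
Now substitute into the z-score formula: -0.25 = (90 - 92)/σ, so
σ = (90 - 92)/(-0.25) = 8 mph.
To check: use invNorm(.4, 92, 8) and you should get approximately 90 (89.97).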
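Solving for σ is the z-score formula rearranged: σ = (x − µ)/z. This sketch uses the table's rounded z = -0.25, runs the check with `NormalDist.inv_cdf` (Python 3.8+), and also shows what the unrounded z gives.

```python
from statistics import NormalDist

mu, x = 92, 90
z = -0.25  # from Table A: area closest to 0.40 is 0.4013 at z = -0.25

sigma = (x - mu) / z  # rearranged from z = (x - mu) / sigma
print(sigma)  # 8.0

# Check: the 40th percentile of N(92, 8) is close to, but not exactly, 90
print(round(NormalDist(mu, sigma).inv_cdf(0.40), 2))  # 89.97

# Using the unrounded z for area 0.40 instead of the table value
exact_sigma = (x - mu) / NormalDist().inv_cdf(0.40)
print(round(exact_sigma, 2))  # 7.89
```

The small gap between 8 and 7.89 comes entirely from rounding z to two decimals in the table.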
Normal Distribution Calculations
We can answer a question about areas in any Normal distribution by
standardizing and using Table A or by using technology.
How To Find Areas In Any Normal Distribution
Step 1: State the distribution and the values of interest. Draw a Normal curve with
the area of interest shaded and the mean, standard deviation, and boundary
value(s) clearly identified.
Step 2: Perform calculations—show your work! Do one of the following: (i)
Compute a z-score for each boundary value and use Table A or technology to
find the desired area under the standard Normal curve; or (ii) use the
normalcdf command and label each of the inputs.
Step 3: Answer the question.
Working Backwards: Normal Distribution Calculations
Sometimes, we may want to find the observed value that
corresponds to a given percentile. There are again three steps.
How To Find Values From Areas In Any Normal Distribution
Step 1: State the distribution and the values of interest. Draw a Normal curve with
the area of interest shaded and the mean, standard deviation, and unknown
boundary value clearly identified.
Step 2: Perform calculations—show your work! Do one of the following: (i) Use
Table A or technology to find the value of z with the indicated area under the
standard Normal curve, then “unstandardize” to transform back to the original
distribution; or (ii) Use the invNorm command and label each of the inputs.
Step 3: Answer the question.
Assessing Normality
The Normal distributions provide good models for some distributions of real data.
Many statistical inference procedures are based on the assumption that the population
is approximately Normally distributed.
A Normal probability plot provides a good assessment of whether a data set follows
a Normal distribution.
Interpreting Normal Probability Plots
If the points on a Normal probability plot lie close to a straight line, the plot indicates
that the data are approximately Normal.
Systematic deviations from a straight line indicate a non-Normal distribution.
Outliers appear as points that are far away from the overall pattern of the plot.
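A Normal probability plot pairs each sorted data value with the z-score expected at its position in a Normal distribution. This sketch computes those pairs; the plotting position (i − 0.5)/n is one common convention and is an assumption here (calculators and software may use slightly different formulas), and the data set is hypothetical, for illustration only.

```python
from statistics import NormalDist

def normal_scores(data):
    """Pair each sorted value with its expected z-score.

    Assumes the plotting position (i - 0.5) / n for the i-th smallest value.
    """
    values = sorted(data)
    n = len(values)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((i - 0.5) / n), v) for i, v in enumerate(values, start=1)]

# Hypothetical data set, for illustration only
sample = [12.1, 9.8, 10.4, 11.0, 10.9, 8.7, 10.1, 9.5]

# Plot z against the data value; a roughly straight pattern suggests approximate Normality
for z, v in normal_scores(sample):
    print(f"{z:6.2f}  {v}")
```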
Transforming Data
Effect of Adding (or Subtracting) a Constant
Adding the same number a to (subtracting a from) each observation:
• adds a to (subtracts a from) measures of center and location
(mean, median, quartiles, percentiles), but
• does not change the shape of the distribution or measures of
spread (range, IQR, standard deviation).
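A quick numerical check of the adding-a-constant rule, using Python's `statistics` module on a hypothetical data set:

```python
from statistics import mean, median, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations
shifted = [x + 10 for x in data]  # add a = 10 to every observation

print(mean(data), mean(shifted))                        # center moves up by 10
print(median(data), median(shifted))                    # so does every measure of location
print(round(stdev(data), 4), round(stdev(shifted), 4))  # spread is unchanged
```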
Effect of Multiplying (or Dividing) by a Constant
Multiplying (or dividing) each observation by the same number b:
• multiplies (divides) measures of center and location (mean,
median, quartiles, percentiles) by b
• multiplies (divides) measures of spread (range, IQR, standard
deviation) by |b|, but
• does not change the shape of the distribution
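The multiplying rule can be checked the same way. The sketch uses a negative multiplier on a hypothetical data set to show that the center is multiplied by b while the standard deviation is multiplied by |b|.

```python
from statistics import mean, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
b = -3
scaled = [b * x for x in data]   # multiply every observation by b

print(mean(data), mean(scaled))                        # center multiplied by b: 5 becomes -15
print(round(stdev(data), 4), round(stdev(scaled), 4))  # spread multiplied by |b| = 3
```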