Chapter 2: Modeling Distributions of Data

2.1 Describing Location in a Distribution
2.2 Density Curves and Normal Distributions
2.1 Describing Location in a Distribution
Objectives
Students will be able to (SWBAT):
1) Find and interpret the standardized score (z-score) of an individual value within a distribution of data.
2) Describe the effect of adding, subtracting, multiplying by, or dividing by a constant on the shape, center, and spread of a distribution of data.
What is a percentile? On a test, is a student’s percentile the same as the
percent correct?
• A percentile describes location in a distribution, like quartiles.
• You can think of percentiles as the values of the variable that divide a ranked data set into 100 roughly equal parts.
• They ARE NOT the same as the percent correct on a test.
• If you ever received a percentile for a score on a standardized test, it tells you the percentage of test takers you did better than. So if you scored at the 91st percentile, you did better than 91% of test takers, and 9% of test takers did better than you.
Macy, a 3-year-old female, is 100 cm tall. Brody, her 12-year-old brother, is 158 cm tall. Obviously, Brody is taller than Macy, but who is taller, relatively speaking? That is, relative to other kids of the same age and sex, who is taller?
According to the Centers for Disease Control and Prevention, the heights of three-year-old females have a mean of 94.5 cm and a standard deviation of 4 cm. The mean height for 12-year-old males is 149 cm, with a standard deviation of 8 cm.
• Brody is clearly taller in absolute terms, but which sibling is farther above the average for their own distribution? And does this matter? Should we take anything else into account?
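Macy: 100 - 94.5 = 5.5 cm above the mean
Brody: 158 - 149 = 9 cm above the mean
• Raw differences can mislead because the two distributions have different spreads, so we need to account for the standard deviations of the respective distributions:
Macy: $z = \frac{100 - 94.5}{4} = 1.375$
Brody: $z = \frac{158 - 149}{8} = 1.125$
Relatively speaking, Macy is taller: her height is 1.375 standard deviations above the mean for 3-year-old females, while Brody's height is only 1.125 standard deviations above the mean for 12-year-old males.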
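As a quick check of the comparison above, here is a minimal Python sketch. The helper name `z_score` is our own choice for illustration; it simply applies "subtract the mean, divide by the standard deviation" to each sibling.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# CDC figures quoted above
z_macy = z_score(100, 94.5, 4)   # 3-year-old females: mean 94.5 cm, SD 4 cm
z_brody = z_score(158, 149, 8)   # 12-year-old males: mean 149 cm, SD 8 cm

print(f"Macy:  z = {z_macy:.3f}")   # Macy:  z = 1.375
print(f"Brody: z = {z_brody:.3f}")  # Brody: z = 1.125
print("Relatively taller:", "Macy" if z_macy > z_brody else "Brody")  # Relatively taller: Macy
```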
How do you calculate and interpret a standardized score (z-score)? Do z-scores
have units? What does the sign of a standardized score tell you?
• To calculate a z-score, we take the observation we want to standardize, subtract the mean of the distribution, and then divide by the standard deviation of the distribution:
$z = \frac{x - \text{mean}}{\text{standard deviation}}$
• The "units" are standard deviations.
• Z-scores can be negative, positive, or 0.
• A positive z-score indicates that the observation is above the mean. For example, a z-score of 1.24 indicates that the observation is 1.24 standard deviations above the mean.
• A negative z-score indicates that the observation is below the mean. For example, a z-score of -0.27 indicates that the observation is 0.27 standard deviations below the mean.
• A z-score of 0 indicates that the observation is equal to the mean.
Carucci Example: According to mrbigglesworth.com, domestic short hair cats
meow an average of 18 times a day, with a standard deviation of 3.1. On a
given day, Trig meows 27 times (he really wants some treats). Calculate and
interpret the z-score for Trig’s meows.
$z = \frac{27 - 18}{3.1} \approx 2.90$
Trig's meows are 2.90 standard deviations above the daily mean for domestic short hair cats.
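The same calculation in Python, as a small sketch: the variable names are ours, and the sign of z is used to pick "above" or "below" exactly as in the interpretation rules above.

```python
mean_meows = 18   # daily mean for domestic short hair cats
sd_meows = 3.1    # daily standard deviation
trig_meows = 27   # Trig's count on the day in question

z = (trig_meows - mean_meows) / sd_meows
direction = "above" if z > 0 else "below"
print(f"Trig's meows are {abs(z):.2f} standard deviations {direction} the mean.")
# Trig's meows are 2.90 standard deviations above the mean.
```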
In 2001, Arizona Diamondback Mark Grace's home run total had a standardized score of z = -0.48. Interpret this value.
• Grace's home run total is 0.48 standard deviations below the MLB mean.
The mean number of home runs hit in 2001 was 21.4, with a standard deviation of 13.2. Calculate the number of home runs that Mark Grace hit.
$-0.48 = \frac{x - 21.4}{13.2}$, so $x = 21.4 + (-0.48)(13.2) \approx 15.06$
Grace hit approximately 15 home runs.
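Solving the z-score formula for x gives x = mean + z × (standard deviation). A minimal sketch of that "un-standardizing" step; the function name `unstandardize` is hypothetical, chosen only for this example.

```python
def unstandardize(z, mean, sd):
    """Invert z = (x - mean) / sd to recover the original value x."""
    return mean + z * sd

homeruns = unstandardize(-0.48, 21.4, 13.2)
print(f"{homeruns:.2f}")   # 15.06
print(round(homeruns))     # 15
```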
What is the effect of adding or subtracting a constant from each observation?
Effect of Adding (or Subtracting) a Constant
Adding the same number a to (subtracting a from) each observation:
• adds a to (subtracts a from) measures of center and location
(mean, median, quartiles, percentiles), but
• does not change the shape of the distribution or measures of
spread (range, IQR, standard deviation).
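To see the rule in action, here is a small Python sketch on a made-up data set (the numbers in `data` are hypothetical, chosen only for illustration). It adds a = 10 to every observation and compares measures of center and spread before and after.

```python
from statistics import mean, median, stdev

data = [2, 4, 4, 5, 7, 8]   # hypothetical data set, for illustration only
a = 10
shifted = [x + a for x in data]

def summary(values):
    """Center (mean, median) and spread (range, SD) of a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "sd": stdev(values),
    }

before, after = summary(data), summary(shifted)
for key in before:
    print(f"{key:>6}: {before[key]:7.3f} -> {after[key]:7.3f}")
```

The mean and median both increase by 10, while the range and standard deviation stay the same.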
What is the effect of multiplying or dividing each observation by a constant?
Effect of Multiplying (or Dividing) by a Constant
Multiplying (or dividing) each observation by the same number b:
• multiplies (divides) measures of center and location (mean,
median, quartiles, percentiles) by b
• multiplies (divides) measures of spread (range, IQR, standard
deviation) by |b|, but
• does not change the shape of the distribution
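A companion sketch for multiplication, again on a hypothetical data set. A negative multiplier b = -3 is used on purpose: the center is multiplied by b itself, but the spread is multiplied by |b|, because a standard deviation or range can never be negative.

```python
from statistics import mean, median, stdev

data = [2, 4, 4, 5, 7, 8]   # hypothetical data set, for illustration only
b = -3                      # negative multiplier, to show why spread uses |b|
scaled = [b * x for x in data]

# Center and location are multiplied by b
print(mean(scaled), b * mean(data))
print(median(scaled), b * median(data))

# Spread is multiplied by |b|
print(stdev(scaled), abs(b) * stdev(data))
print(max(scaled) - min(scaled), abs(b) * (max(data) - min(data)))
```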
Example: The scores on Mr. Carucci’s statistics quiz had a mean of 12 and a
standard deviation of 3. Mr. Carucci wants to transform the scores to have a
mean of 75 and a standard deviation of 12. What transformations should he
apply to each quiz score?
• Take care of the multiplication part first, because multiplying changes both the mean and the standard deviation, while adding a constant changes only the mean. So to go from a standard deviation of 3 to a standard deviation of 12, we need to multiply each quiz score by 4.
• After we do this, the effect on the mean is that the mean is also multiplied
by 4, so our new mean is currently 48. We have to get to a mean of 75, so
we now need to add 27 to each quiz score.
• So to recap, our transformations are multiplying each quiz score by 4, and
then adding 27.
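The "multiply first, then add" reasoning can be written as two lines of arithmetic. This is a sketch assuming a positive multiplier; the function name `linear_transform` and the three-score check set are hypothetical.

```python
from statistics import mean, stdev

def linear_transform(old_mean, old_sd, target_mean, target_sd):
    """Return (b, a) so that y = b*x + a has the target mean and SD (assumes b > 0)."""
    b = target_sd / old_sd           # multiply first: only this step changes the spread
    a = target_mean - b * old_mean   # then add: shift the already-scaled mean into place
    return b, a

b, a = linear_transform(12, 3, 75, 12)
print(b, a)   # 4.0 27.0

# Sanity check on a tiny hypothetical set of quiz scores with mean 12 and SD 3
quiz = [9, 12, 15]
new_scores = [b * x + a for x in quiz]
print(mean(new_scores), stdev(new_scores))   # 75.0 12.0
```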
2.2 Density Curves and Normal Distributions
What is the Normal distribution? What are some characteristics of the
Normal distribution?
• A Normal distribution is described by a Normal density curve (bell-shaped
curve).
• Any particular Normal distribution is completely specified by two numbers: its mean µ and its standard deviation σ. We write N(µ, σ).
• The mean of the Normal distribution is at the center of the symmetric Normal curve.
Important Properties of a Normal Distribution
• The Normal distribution is symmetric, unimodal, and bell-shaped.
• The mean, median, and mode are equal, and they are located exactly at the center of the distribution.
• The total area under the Normal curve equals 1. The area to the left of the
mean is .5 and the area to the right of the mean is .5.
• Data that lie beyond two standard deviations from the mean are rare, and data that lie beyond three standard deviations from the mean are very rare. Values are considered outliers if they fall below a z-score of -2.68 (area to the left is .0037) or above a z-score of 2.68 (area to the left is .9963).
• Many variables are approximately Normally distributed. However, even if a population is skewed, the sampling distribution of the sample mean is often approximately Normal when the sample size is large enough (much more on this later in the course).
What is the 68-95-99.7 rule? When does it apply?
The 68-95-99.7 Rule
In the Normal distribution with mean µ and standard deviation σ:
• Approximately 68% of the observations fall within σ of µ.
• Approximately 95% of the observations fall within 2σ of µ.
• Approximately 99.7% of the observations fall within 3σ of µ.
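The rule's percentages are rounded areas under the standard Normal curve. Here is a minimal sketch that recovers them using Python's built-in `statistics.NormalDist` (Python 3.8+), which stands in here for the table or calculator.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1
within = {}
for k in (1, 2, 3):
    # Area between -k and +k standard deviations of the mean
    within[k] = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD: {within[k]:.4f}")
```

This prints 0.6827, 0.9545, and 0.9973, which round to 68%, 95%, and 99.7%.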
Example: Suppose a sample of scores yields a mean of 100 and a standard
deviation of 15. Assume that the distribution is Normal.
Approximately what percent of scores should fall between 85 and 115? (Hint:
Draw a diagram first!)
85 and 115 are each one standard deviation from the mean (85 = 100 - 15 and 115 = 100 + 15), so approximately 68% of scores fall between 85 and 115.
Let’s try some more with the same distribution…
What percent of scores should fall:
a) Between 70 and 130?
95%
b) Between 55 and 145?
99.7%
c) Between 70 and 115?
13.5% + 68% = 81.5%
d) Greater than 115?
13.5% + 2.5% = 16%
e) Less than 70?
2.5%
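For comparison, this sketch computes the exact Normal areas for parts a) through e) with `statistics.NormalDist` and prints them beside the 68-95-99.7 estimates. The infinite bounds stand in for "no lower limit" and "no upper limit".

```python
from math import inf
from statistics import NormalDist

scores = NormalDist(mu=100, sigma=15)

def area(lower, upper):
    """Proportion of the N(100, 15) distribution between lower and upper."""
    return scores.cdf(upper) - scores.cdf(lower)

# (question, lower bound, upper bound, 68-95-99.7 rule answer in percent)
questions = [
    ("a) between 70 and 130", 70, 130, 95.0),
    ("b) between 55 and 145", 55, 145, 99.7),
    ("c) between 70 and 115", 70, 115, 81.5),
    ("d) greater than 115", 115, inf, 16.0),
    ("e) less than 70", -inf, 70, 2.5),
]
exact = {}
for label, low, high, rule in questions:
    exact[label] = 100 * area(low, high)
    print(f"{label}: rule {rule:5.1f}%, exact {exact[label]:5.2f}%")
```

Every rule-based answer lands within half a percentage point of the exact area.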
• We know approximately what percent of data fall within 1, 2, and 3 standard deviations of the mean.
• However, how can we find a percent if a value does not fall exactly 1, 2, or 3
standard deviations from the mean?
• This is where z-scores come into play, along with our standard Normal table (or
calculator).
When using the standard Normal table, we begin by standardizing our score. After we
have our z-score, we look up the z-score on the standard Normal table.
• The table gives the area to the left of a z-score.
• If the area of interest is shaded to the left, the value in the table is the desired area.
• If the area of interest is shaded to the right, we need to subtract the area in the table from 1.
• If the area of interest is shaded between two z-scores, we need to look up the area for both z-scores and subtract the smaller area from the larger one.
Note for the AP test: For all Normal distribution problems (and other distributions
which we will get to) you need to do 3 things:
1) state the distribution and identify values of interest
2) show work
3) answer the question
Example: Find the proportion of observations from the standard Normal
distribution that are:
a) less than 0.54
0.7054
b) greater than –1.12
1 – 0.1314 = 0.8686
c) greater than 3.89
Approximately 0
d) between 0.49 and 1.82
0.9656 – 0.6879 = 0.2777
e) within 1.5 standard deviations of the mean
0.9332 – 0.0668 = 0.8664
How can we find the same proportions using the TI-84? Use normalcdf(lower, upper, mean, standard deviation):
a) less than 0.54
b) greater than –1.12
c) greater than 3.89
d) between 0.49 and 1.82
e) within 1.5 standard deviations of the mean
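Outside the calculator, the same areas can be sketched in Python. The helper below is hypothetical: it only mirrors the argument order of the TI-84's normalcdf and is built on `statistics.NormalDist`. Where the calculator uses -1E99 or 1E99 for an unbounded side, this sketch uses `math.inf`.

```python
from math import inf
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Hypothetical helper mirroring the TI-84's normalcdf(lower, upper, mean, sd):
    area under the N(mean, sd) curve between lower and upper."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)

answers = {
    "a": normalcdf(-inf, 0.54),   # less than 0.54
    "b": normalcdf(-1.12, inf),   # greater than -1.12
    "c": normalcdf(3.89, inf),    # greater than 3.89
    "d": normalcdf(0.49, 1.82),   # between 0.49 and 1.82
    "e": normalcdf(-1.5, 1.5),    # within 1.5 SD of the mean
}
for part, p in answers.items():
    print(f"{part}) {p:.4f}")
```

The results agree with the table answers to about four decimal places. For part c), the area is about 0.00005, which is why the table answer is "approximately 0".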
Alternate Example: In the 2008 Wimbledon tennis tournament, Rafael Nadal
averaged 115 miles per hour (mph) on his first serves. Assume that the
distribution of his first serve speeds is Normal with a mean of 115 mph and a
standard deviation of 6 mph.
a) About what proportion of his first serves would you expect to be slower than
103 mph?
About 2.28% of his first serves will be slower than 103 mph.
b) About what proportion of his first serves would you expect to exceed 120 mph?
About 20.23% of his first serves will exceed 120 mph.
c) What percent of Rafael Nadal’s first serves are between 100 and 110 mph?
About 19.61% of his first serves will be between 100 and 110 mph.
d) The fastest 30% of Nadal’s first serves go at least what speed?
The fastest 30% of Nadal’s first serves will go at least 118.146 mph.
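A short sketch that reproduces parts a) through d) with `statistics.NormalDist`. For part d), "the fastest 30%" means 70% of serves are slower, so the cutoff is the 70th percentile (`inv_cdf(0.70)`, the counterpart of invNorm).

```python
from statistics import NormalDist

serves = NormalDist(mu=115, sigma=6)   # Nadal's first-serve speeds, mph

slower_103 = serves.cdf(103)                          # (a) area to the left of 103
over_120 = 1 - serves.cdf(120)                        # (b) area to the right of 120
between_100_110 = serves.cdf(110) - serves.cdf(100)   # (c) area between 100 and 110
fastest_30_cutoff = serves.inv_cdf(0.70)              # (d) 70% of serves are slower

print(f"(a) {slower_103:.4f}")
print(f"(b) {over_120:.4f}")
print(f"(c) {between_100_110:.4f}")
print(f"(d) {fastest_30_cutoff:.3f} mph")
```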
e) A different player has a standard deviation of 8 mph on his first serves and
20% of his serves go less than 100 mph. If the distribution of his serve speeds
is approximately Normal, what is his average first serve speed?
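• We are working with a Normal distribution that has a standard deviation of 8 mph and an unknown mean µ.
• First, find the z-score associated with an area of 0.2 to its left.
• You can look on the inside of the table, or use the command invNorm(area: 0.2, mean: 0, st dev: 1). Either way, z ≈ -0.84.
• Now, substitute into the z-score formula and solve for the mean:
$-0.84 = \frac{100 - \mu}{8}$, so $\mu = 100 + 0.84(8) = 106.72$
His average first-serve speed is approximately 106.72 mph.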
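The "work backward" step can be sketched as follows. Rearranging z = (x - µ)/σ gives µ = x - zσ. With the table value z = -0.84 this gives 106.72 mph. The unrounded z from `NormalDist().inv_cdf(0.20)` (about -0.8416) gives roughly 106.73 mph, so the small difference comes only from rounding z.

```python
from statistics import NormalDist

sd = 8     # known standard deviation, mph
x = 100    # 20% of serves are slower than this speed

z_table = -0.84                       # value read from the standard Normal table
z_exact = NormalDist().inv_cdf(0.20)  # unrounded z with area 0.20 to its left

# From z = (x - mu) / sd, solve for mu = x - z * sd
mu_table = x - z_table * sd
mu_exact = x - z_exact * sd
print(f"{mu_table:.2f} {mu_exact:.2f}")
```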
Example: Suppose that Clayton Kershaw of the Los Angeles Dodgers throws
his fastball with a mean velocity of 94 miles per hour (mph) and a standard
deviation of 2 mph and that the distribution of his fastball speeds can be
modeled by a Normal distribution.
a) About what proportion of his fastballs will travel at least 100 mph?
About 0.13% of his fastballs will travel at least 100 mph.
b) About what proportion of his fastballs will travel greater than 100 mph?
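This is the same question as part (a). For a continuous distribution such as the Normal, the probability that a fastball travels exactly 100 mph is 0, so "at least 100" and "greater than 100" give the same proportion.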
c) About what proportion of his fastballs will travel less than 90 mph?
Approximately 0.0228 of his fastballs will travel less than 90 mph.
d) About what proportion of his fastballs will travel between 93 and 95 mph?
Approximately 0.3829 of his fastballs will travel between 93 and 95 mph.
e) What is the 30th percentile of Kershaw’s distribution of fastball velocities?
The 30th percentile is 92.95 mph.
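A quick sketch that checks parts a) through e) with `statistics.NormalDist`. As in part (b), the same expression covers "at least" and "greater than" because the distribution is continuous.

```python
from statistics import NormalDist

fastballs = NormalDist(mu=94, sigma=2)   # Kershaw's fastball speeds, mph

at_least_100 = 1 - fastballs.cdf(100)                   # (a) and (b)
under_90 = fastballs.cdf(90)                            # (c)
between_93_95 = fastballs.cdf(95) - fastballs.cdf(93)   # (d)
p30 = fastballs.inv_cdf(0.30)                           # (e) 30th percentile

print(f"(a)/(b) {at_least_100:.4f}")
print(f"(c) {under_90:.4f}")
print(f"(d) {between_93_95:.4f}")
print(f"(e) {p30:.2f} mph")
```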
g) Suppose that a different pitcher's fastballs have a mean velocity of 92 mph and 40% of his fastballs go less than 90 mph. What is the standard deviation of his fastball velocities, assuming his distribution of velocities can be modeled by a Normal distribution?
N(92, σ)
Find the z-score associated with an area of 0.40 to its left: z ≈ -0.25.
Now substitute into our equation:
$-0.25 = \frac{90 - 92}{\sigma}$, so $\sigma = \frac{-2}{-0.25} = 8$ mph
To check: use invNorm(area: 0.4, mean: 92, st dev: 8) and you should get approximately 90.
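Here is the same "solve for σ" step in Python, as a sketch. Rearranging z = (x - µ)/σ gives σ = (x - µ)/z. The table value z ≈ -0.25 gives exactly 8 mph. The unrounded z from `inv_cdf(0.40)` (about -0.2533) gives roughly 7.89 mph. The last line repeats the invNorm check.

```python
from statistics import NormalDist

mu = 92   # known mean, mph
x = 90    # 40% of fastballs are slower than this speed

z_table = -0.25                       # value read from the standard Normal table
z_exact = NormalDist().inv_cdf(0.40)  # unrounded z with area 0.40 to its left

# From z = (x - mu) / sigma, solve for sigma = (x - mu) / z
sigma_table = (x - mu) / z_table
sigma_exact = (x - mu) / z_exact
print(f"{sigma_table:.2f} {sigma_exact:.2f}")

# Check, like invNorm(area: 0.4, mean: 92, st dev: sigma)
print(f"{NormalDist(mu, sigma_exact).inv_cdf(0.40):.2f}")
```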