Transcript Ch6x

Warm-up
• The women’s heptathlon in the Olympics consists of 7 track
and field events. In the 2000 Olympics the best 800-m time,
run by Gertrud Bacher of Italy, was 129 seconds while the
mean was 137 seconds. The winning long jump by the Russian
Yelena Prokhorova was 6.6 meters and the mean was 6
meters.
• Which performance deserves more points?
Ch. 6
The Standard Deviation as a Ruler and the
Normal Model
Standard Dev. = Ruler
• Because the standard deviation measures roughly how far a
typical data value lies from the mean, it works well for comparing
values that look very different (e.g. seconds run vs. meters
jumped)
• We can measure an individual data value by finding how many
standard deviations it lies from the mean. This is called
standardizing
Some more info
• For Bacher: The std. deviation of all the qualifying times was 5
seconds
• For Prokhorova: The std. deviation of all long jumps was 30 cm
• Now who did better?
Standardizing w/ z-scores
• The result is called the z-score: $z = \frac{y - \bar{y}}{s}$
• A z-score of 2 would mean the value is 2 std devs above the
mean while -1 would mean 1 std dev below
• The biggest advantage of using z-scores to measure data is we
can compare different variables of different events, units, etc.
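As a rough sketch of the arithmetic, the z-scores for the two warm-up performances can be computed as below. The function name `z_score` is our own, and the long jump is converted to centimeters so that it matches the 30 cm standard deviation given earlier.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev

# Bacher, 800-m run: 129 s against a mean of 137 s and a std. dev. of 5 s
bacher_800m = z_score(129, 137, 5)

# Prokhorova, long jump: 660 cm against a mean of 600 cm and a std. dev. of 30 cm
prokhorova_jump = z_score(660, 600, 30)

print(f"Bacher 800 m:         z = {bacher_800m:+.2f}")
print(f"Prokhorova long jump: z = {prokhorova_jump:+.2f}")
```

Bacher's z-score is negative because her time is below the mean. In a race a smaller time is better, so she was 1.6 std devs better than average, while Prokhorova's jump was 2 std devs better than average.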
Who Won?
• Now that we found the z-scores, who do you think did better
in the heptathlon?
• In the long jump, Bacher jumped 5.84 meters.
• For the 800m run, Prokhorova ran it in 130.32 seconds.
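One possible way to settle the question is to add up each athlete's z-scores across both events. The sketch below assumes the means and standard deviations from the earlier slides (137 s and 5 s for the run, 6 m and 30 cm for the jump). Flipping the sign of the running z-score, so that positive always means better than average, is our own choice, as are all the names.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev

# Means and standard deviations from the earlier slides
RUN_MEAN_S, RUN_SD_S = 137, 5         # 800-m run, in seconds
JUMP_MEAN_CM, JUMP_SD_CM = 600, 30    # long jump, in centimeters

# athlete -> (800-m time in seconds, long jump in centimeters)
performances = {
    "Bacher": (129, 584),
    "Prokhorova": (130.32, 660),
}

totals = {}
for athlete, (run_s, jump_cm) in performances.items():
    # A faster (smaller) time is better, so negate the running z-score
    run_z = -z_score(run_s, RUN_MEAN_S, RUN_SD_S)
    jump_z = z_score(jump_cm, JUMP_MEAN_CM, JUMP_SD_CM)
    totals[athlete] = run_z + jump_z
    print(f"{athlete}: run {run_z:+.2f}, jump {jump_z:+.2f}, total {totals[athlete]:+.2f}")

best = max(totals, key=totals.get)
print("Better combined performance:", best)
```

Under these assumptions Prokhorova comes out ahead: she is above average in both events, while Bacher's long jump is about half a std dev below the mean.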
Practice w/ Z-scores
• Montana: 12.21
• Connecticut: 12.53
• New York: 12.83
• Washington: 12.84
• California: 12.88
• Maine: 13.56
• Rhode Island: 14.32
• Oregon: 14.45
• Massachusetts: 14.55
• New Hampshire: 14.88
• Colorado: 15.09
• Vermont: 15.97
• Alaska: 16.29
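The slide does not say what these values measure or which question goes with them, so the following is only a sketch of how the list could be standardized using its own mean and sample standard deviation. The variable names are our own.

```python
from statistics import mean, stdev

values = {
    "Montana": 12.21, "Connecticut": 12.53, "New York": 12.83,
    "Washington": 12.84, "California": 12.88, "Maine": 13.56,
    "Rhode Island": 14.32, "Oregon": 14.45, "Massachusetts": 14.55,
    "New Hampshire": 14.88, "Colorado": 15.09, "Vermont": 15.97,
    "Alaska": 16.29,
}

data = list(values.values())
y_bar = mean(data)
s = stdev(data)   # sample standard deviation (divides by n - 1)

# Standardize every value: subtract the mean, divide by the std. dev.
z_scores = {state: (y - y_bar) / s for state, y in values.items()}

for state, z in z_scores.items():
    print(f"{state:<14} {z:+.2f}")
```

States with negative z-scores are below the mean of this list, and states with positive z-scores are above it.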
Warm-Up
• In your own words, describe the process of calculating z-scores.
• Then explain what a z-score of 0 means.
Effects of Standardizing
• What is happening to our data when we standardize it?
• All data values are shifted by the same amount, since we
subtract the mean from each one!
• What are the new measures of position?
• Measures of spread?
Shifting Data
• When adding or subtracting a constant to every data value, all
measures of position (mean, median, quartiles, min/max) will
increase/decrease by that same constant
• Measures of spread (range, IQR, standard deviation) are not
changed
• What happens when we divide the data values by the
standard deviation?
• When in doubt, test it yourself!
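In the spirit of testing it yourself, here is a small sketch that subtracts the mean from a made-up data set and compares the summaries before and after. The data values and the helper `summary` are invented for illustration only.

```python
from statistics import mean, median, stdev

data = [2, 4, 4, 5, 7, 9, 11]              # made-up values
shifted = [y - mean(data) for y in data]   # subtract the mean from every value

def summary(values):
    """Collect a few measures of position and spread."""
    return {
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "std dev": stdev(values),
    }

before, after = summary(data), summary(shifted)
for name in before:
    print(f"{name:<8} before: {before[name]:6.2f}   after: {after[name]:6.2f}")
```

The mean, median, min, and max all drop by the same constant (the original mean), while the range and standard deviation stay the same.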
Rescaling Data
• Rescaling (multiplying/dividing) of data occurs in situations
such as converting units of the data values (if your data is in kg
and you want it in lbs)
• Effects of rescaling: All measures of position and measures of
spread are multiplied/divided by that same constant
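The kg-to-lbs conversion can be checked the same way. This sketch uses made-up weights and the approximate factor of 2.20462 lbs per kg, then looks at the ratio of each summary after and before rescaling.

```python
from statistics import mean, median, stdev

KG_TO_LB = 2.20462   # approximate pounds per kilogram

weights_kg = [50, 55, 60, 62, 70, 75, 90]          # made-up values
weights_lb = [w * KG_TO_LB for w in weights_kg]    # rescale every value

def value_range(values):
    return max(values) - min(values)

# Ratio of each summary after rescaling to the same summary before
ratios = {
    "mean": mean(weights_lb) / mean(weights_kg),
    "median": median(weights_lb) / median(weights_kg),
    "range": value_range(weights_lb) / value_range(weights_kg),
    "std dev": stdev(weights_lb) / stdev(weights_kg),
}

for name, ratio in ratios.items():
    print(f"{name:<8} grew by a factor of {ratio:.5f}")
```

Every ratio equals the conversion factor, so measures of position and measures of spread are both multiplied by the same constant.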
Why is this important for z-scores?
• Take a look at the z-score equation again: $z = \frac{y - \bar{y}}{s}$
• We are shifting the data by the mean and rescaling by the
standard deviation
• What are the effects of standardizing? What are some
advantages of using a set of z-scores instead of the original
data values?
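As a short worked sketch of the two steps: subtracting the mean moves the center to 0 without changing the spread, and dividing by the standard deviation then rescales the spread to 1. Here $\bar{y}$ and $s$ are the mean and standard deviation of the original data.

```latex
\begin{align*}
\text{mean}(z) &= \frac{\bar{y} - \bar{y}}{s} = 0, \\
\text{SD}(z)   &= \frac{s}{s} = 1.
\end{align*}
```

The shape of the distribution does not change, so every set of z-scores has mean 0 and standard deviation 1 regardless of the original units.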
Warm-up
• If you found out you scored 3 standard deviations away from
the mean (for the quiz), would you be surprised?
Main Topic
• How big of a z-score is shocking?
• Just like how we used bar charts, histograms, etc. to model
data, we can answer this question using a model
The Normal Model
• Normal models are “bell-shaped curves” that help us visualize
how extreme a z-score is
• ONLY USE NORMAL MODELS FOR DISTRIBUTIONS WHOSE
SHAPES ARE UNIMODAL AND ROUGHLY SYMMETRIC
• We write N(μ, σ) to denote a Normal model with mean μ and standard deviation σ
• If we standardize the data first, we can use N(0,1) which is
called the standard Normal distribution
• http://www.shmoop.com/video/normal-distribution-curve
The 68-95-99.7 Rule
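In a Normal model, about 68% of the values fall within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3. The sketch below checks those percentages with Python's built-in `statistics.NormalDist`; the variable names are our own.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

within = {}
for k in (1, 2, 3):
    # Area under the curve between -k and +k standard deviations
    within[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} std dev(s) of the mean: {within[k]:.1%}")
```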
Using the Normal Model
• Imagine that you scored a 600 on the Math section of the SAT.
You find out that the mean this year was 500 with a standard
deviation of 100. Where do you stand among all students who
took the test?
• This is denoted by Area(y≥600), the area under the Normal
curve from 600 and above.
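One way to work this out by hand, assuming the N(500, 100) model fits the scores, is to standardize 600 and then use the 68-95-99.7 Rule:

```latex
\begin{align*}
z &= \frac{600 - 500}{100} = 1, \\
\text{Area}(y \ge 600) &\approx \frac{100\% - 68\%}{2} = 16\%.
\end{align*}
```

So a score of 600 is better than roughly 84% of all scores.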
Another example.
• The previous example had a nice number (600 being 1 std dev
away). How would we approach the same problem if your
score was 680?
• You have to use a z-table, which only accepts z-scores, or a
graphing calculator
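As an alternative to the table, the same lookup can be sketched with Python's built-in `statistics.NormalDist`, which here plays the role of the z-table. The mean of 500 and standard deviation of 100 are the ones from the previous example.

```python
from statistics import NormalDist

# Standardize the score first, exactly as you would before using a z-table
z = (680 - 500) / 100

# cdf(z) is the area to the left of z, so the area above is 1 minus that
area_above = 1 - NormalDist().cdf(z)

print(f"z = {z:.1f}, Area(y >= 680) = {area_above:.4f}")
```

A z-score of 1.8 leaves about 3.6% of the scores above it.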
How about % in between?
• I want to find the proportion of SAT scores that fall between
450 and 600. How do I do this?
• We want Area(450<y<600).
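A sketch of one approach: find the area to the left of each endpoint and subtract. The code assumes the same N(500, 100) model as before and uses `statistics.NormalDist`, which takes the mean and standard deviation directly, so the standardizing happens behind the scenes.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)

# z-scores of the two endpoints, shown for comparison with a z-table
z_low = (450 - 500) / 100    # -0.5
z_high = (600 - 500) / 100   #  1.0

# Area to the left of 600 minus area to the left of 450
area_between = sat.cdf(600) - sat.cdf(450)

print(f"z from {z_low} to {z_high}: Area(450 < y < 600) = {area_between:.4f}")
```

About 53% of the scores fall between 450 and 600 under this model.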
From Percentiles to Scores
• Finding areas from z-scores is the simplest way to work with
Normal models, but sometimes we start with areas and are
asked to work backward to find the corresponding z-score
• Example 1: What z-score represents the first quartile in a
Normal model?
• Example 2: Suppose a college says it admits only people with
SAT Verbal test scores among the top 10%. How high a score
does it take to be eligible?
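Working backward can be sketched with the inverse of the area function, `inv_cdf`, from `statistics.NormalDist`. The slide does not give the mean and standard deviation for SAT Verbal scores, so Example 2 assumes N(500, 100) as in the Math example; treat that as an assumption.

```python
from statistics import NormalDist

standard_normal = NormalDist()

# Example 1: the first quartile has 25% of the area below it
z_q1 = standard_normal.inv_cdf(0.25)

# Example 2: being in the top 10% means 90% of scores lie below the cutoff
z_cutoff = standard_normal.inv_cdf(0.90)

# Assumption: SAT Verbal scores follow N(500, 100); not stated on the slide
cutoff_score = 500 + z_cutoff * 100

print(f"First quartile:  z = {z_q1:.2f}")
print(f"Top 10% cutoff:  z = {z_cutoff:.2f}, score of about {cutoff_score:.0f}")
```

The first quartile sits about 0.67 std devs below the mean, and under the assumed model the top-10% cutoff is a score of roughly 630.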