Transcript 3.3x

Section 3.3
Measures of Relative Position
With some added content
by D.R.S., University of Cordele
HAWKES LEARNING SYSTEMS
Students Matter. Success Counts.
Copyright © 2013 by Hawkes Learning
Systems/Quant Systems, Inc.
All rights reserved.
Measures of Relative Position
“How do I compare with everybody else?”
1. nth place
2. Percentiles
a. Given percentile P, find data value there.
b. Given data value, what’s its percentile?
3. Quartiles
4. Five Number Summary and the Box Plot diagram
5. Standard Score (also known as z-score)
6. Outliers
Nth Place
The highest and the lowest
2nd highest, 3rd highest, etc.
“Olin earned $41,246. He’s in ___th place out of ___.”
Getting a handle on the idea of Percentiles
“Olin’s $41,246 salary is the same as or higher than ____% of the population.”
FRACTION: (how many salaries is Olin’s > or = ?) ÷ (how many in the population?) = _____
and convert it to a percent: _____ %
If your test score were at this percentile, do you
consider it to be high or low or middleish?
90th percentile is _______________ (≥90% of the pop.)
70th percentile is _______________ (≥70% of the pop.)
40th percentile is _______________ (≥40% of the pop.)
10th percentile is _______________ (≥10% of the pop.)
Two Kinds of Percentile Problems
• Percentile is given. You have to find the data value.
  The question is like this: “The salary at the 90th percentile is $how much?”
  Example 3.18 is this kind of problem.
• Data value is given. They ask for the percentile.
  The question is like this: “A $50,000 salary puts you in the ?th percentile?”
  Example 3.19 is this kind of problem.
Either way, you are connecting “The ______th Percentile” with “The Data Value is _______.”
“What is the data value at the Pth percentile?”
This is like Example 3.18.
Formula: Location $L = \frac{n \cdot P}{100}$ (n = number of data values)
• Your data values must be in order from lowest to highest.
• Compute Location L and then:
  – If L happens to be an exact integer, take the average of the values in positions L and (L + 1).
  – If L is NOT an exact integer, bump L up to the next higher integer (the “ceiling”) and take the value in that position. Never round down; always bump up.
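The location rule can be written as a short Python sketch. It assumes P is a whole number strictly between 0 and 100 and counts positions from 1, as the slide does; `value_at_percentile` is a hypothetical name, not a library function.

```python
def value_at_percentile(data, p):
    """Return the data value at the pth percentile using Location L = n*P/100.

    Hypothetical helper: assumes p is a whole number with 0 < p < 100
    and counts positions from 1, as in the slide.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    values = sorted(data)               # put the data in order, lowest to highest
    n = len(values)
    # divmod keeps the arithmetic exact, so "is L an exact integer?" is reliable.
    whole, remainder = divmod(n * p, 100)
    if remainder == 0:
        # L is an exact integer: average the values in positions L and L + 1.
        L = whole
        return (values[L - 1] + values[L]) / 2
    # L is not an integer: bump up to the next integer (never round down).
    L = whole + 1
    return values[L - 1]
```

For instance, with the ten values 1 through 10 and P = 25, L = 2.5, which bumps up to position 3, so the sketch returns 3. With the twenty values 1 through 20 and P = 10, L = 2 exactly, so it averages positions 2 and 3 and returns 2.5.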
Example 3.18: Finding Data Values Given the
Percentiles
A car manufacturer is studying the highway miles per
gallon (mpg) for a wide range of makes and models of
vehicles. See separate handout for the data.
a. Find the value of the 10th percentile.
b. Find the value of the 20th percentile.
Example 3.18:
(a) Find the mpg value for the 10th percentile
a. There are ____ values in this data set, thus n = ___.
We want the 10th percentile, so P = ___.
Compute Location $L = \frac{n \cdot P}{100}$ = ______
Is it an exact integer? No. ALWAYS BUMP UP,
so take the data value in position # ______,
which is ______ mpg.
Answer: “The 10th percentile is _____ mpg.”
Example 3.18:
(b) Find the mpg value for the 20th percentile
b. There are ____ values in this data set, thus n = ___.
We want the 20th percentile, so P = ___.
Location $L = \frac{n \cdot P}{100}$
Calculate: ______
Is it an exact integer? ________.
If yes, take the data values in positions # ______ and #______, and average them.
Answer: “The 20th percentile is ___ mpg.”
If you know the value, what’s its percentile?
Pth Percentile of a Data Value
Figure out 𝐿 = how many values are < or = your data value.
Formula: $P = \frac{L}{n} \cdot 100$
For this formula, always ROUND in the usual way
(5 or higher rounds up; 4 or lower rounds down),
where P is the percentile rounded to the nearest whole number,
L is the number of values in the data set less than or
equal to the given value, and
n is the number of data values in the data set.
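A matching Python sketch goes the other direction, again with a hypothetical helper name. One detail matters: Python's built-in `round()` rounds exact halves to the nearest even number (`round(12.5)` gives `12`), which is not the “5 or higher rounds up” rule above, so the sketch rounds with integer arithmetic instead.

```python
def percentile_of_value(data, x):
    """Return P = (L / n) * 100 rounded to the nearest whole number, halves up.

    Hypothetical helper: L counts the data values less than or equal to x.
    """
    n = len(data)
    L = sum(1 for v in data if v <= x)
    # floor(100*L/n + 1/2) computed exactly with integers: (200*L + n) // (2*n)
    return (200 * L + n) // (2 * n)
```

With the eight values 1 through 8 and a data value of 1, L = 1 and P = 12.5, which this rule reports as the 13th percentile. Duplicates are handled by counting every copy that is ≤ the value, which is the same as using the largest position, as Example 3.19 does.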
Example 3.19: Finding the Percentile of a Given
Data Value
In the data set from the previous example, the Nissan
Xterra averaged 21.1 mpg. In what percentile is this
value?
Solution
We begin by making sure that the data are in order
from smallest to largest. We know from the previous
example that they are, so we can proceed with the next
step.
Example 3.19: Finding the Percentile of a Given
Data Value (cont.)
The Xterra’s value of _____ mpg is repeated in the data
set, in both the 48th and 49th positions, so we will pick
the one with the larger location value, which is the
____th. Using a sample size of n = ____ and a location
of L = ____, we can substitute these values into the
formula for the percentile of a given data value, which
gives us the following.
Your calculation here: $P = \frac{L}{n} \cdot 100$ = ______
And round to the nearest whole number: ______th percentile
This is your answer.
Avoid this common error:
• If your answer is “36%”, you are WRONG.
• The correct answer is “The 36th Percentile”.
• Percents and Percentiles are related, sure.
• But good grammar and proper usage matter.
Excel gives different answers
Excel does some fancy interpolation between neighboring data values,
so its percentiles can differ from the ones our rules give.
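Here is a Python sketch of the kind of interpolation involved. It assumes Excel's PERCENTILE.INC places the Pth percentile at fractional position (n − 1)·P/100 (counting from 0) and interpolates linearly between the two neighboring sorted values; treat that match as an assumption to check in your own copy of Excel. The function name is hypothetical.

```python
import math


def percentile_interpolated(data, p):
    """Linear interpolation between neighboring sorted values.

    Hypothetical helper modeled on the assumed PERCENTILE.INC rule; 0 <= p <= 100.
    """
    values = sorted(data)
    n = len(values)
    h = (n - 1) * p / 100          # fractional position, counting from 0
    lower = math.floor(h)
    upper = min(lower + 1, n - 1)  # stay inside the list when p == 100
    fraction = h - lower
    # Move 'fraction' of the way from the lower neighbor to the upper one.
    return values[lower] + fraction * (values[upper] - values[lower])
```

With the values 1 through 10 and P = 25, the fractional position is 2.25, so the sketch returns 3.25, a number that is not even in the data set. The bump-up rule from this section gives L = 2.5, bumps up to position 3, and answers 3.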
Quartiles
Q1 = First Quartile: 25% of the data are less than or
equal to this value.
Q2 = Second Quartile: 50% of the data are less than or
equal to this value.
Q3 = Third Quartile: 75% of the data are less than or
equal to this value.
Example 3.20: Finding the Quartiles of a Given
Data Set – TWO DIFFERENT WAYS
Using the mpg data from the previous examples, find
the quartiles.
a. Use the percentile method to find the quartiles.
The Percentile method says to find the 25th percentile and that’s Q1.
And find the 50th percentile and that’s Q2.
And find the 75th percentile and that’s Q3.
b. Use the approximation method to find the quartiles.
The approximation method says to find the median and that’s Q2.
If the median landed on an actual data value (odd number of data values), don’t include it in the next steps.
Q1 is the median of the values to the left of Q2.
Q3 is the median of the values to the right of Q2.
c. How do these values compare?
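The approximation method in part b can be sketched in Python as “median of each half.” This is a minimal sketch assuming at least two data values; the helper names are hypothetical. When n is odd the median itself is left out of both halves, as part b says, so for n = 135 the lower half is positions 1 to 67 and the upper half is positions 69 to 135.

```python
def median(values):
    """Median of an already-sorted list."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]                      # odd count: the middle value
    return (values[mid - 1] + values[mid]) / 2  # even count: average the middle two


def quartiles_approximation(data):
    """Return (Q1, Q2, Q3) by the approximation (median-of-halves) method.

    Hypothetical helper; assumes at least 2 data values.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]           # values to the left of the median
    upper = values[half + n % 2:]   # values to the right; skip the median itself if n is odd
    return median(lower), median(values), median(upper)
```

For the outlier data later in this section (1 through 10, then 20) it returns Q1 = 3, Q2 = 6, Q3 = 9.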
Example 3.20: Finding the Quartiles of a Given
Data Set with the Percentile Method
a. Percentile Method
(If we hurry through these, it’s because most or all of the problems
seem to be done with the Approximation Method instead.)
First quartile is the 25th percentile.
Position $L = 135 \cdot \frac{25}{100} = 33.75$, bump up to 34.
Count up to the 34th position: “Q1 is 19.8 mpg”
Second quartile is the 50th percentile.
Position $L = 135 \cdot \frac{50}{100} = 67.5$, bump up to 68.
Count up to the 68th position: “Q2 is 23.6 mpg” “Median is 23.6 mpg”
Third quartile is the 75th percentile.
Position $L = 135 \cdot \frac{75}{100} = 101.25$, bump up to 102.
Count up to the 102nd position: “Q3 is 25.3 mpg”
Example 3.20: Finding the Quartiles
with the Approximation Method
b. Approximation Method (probably more common in this course,
and also the same as the TI-84’s 1-Var Stats)
First find the median; that’s the same as Q2.
Positions #1, 2, 3, …, 67 | Position #______ has data value ______ mpg (this is Q2) | Positions #69, 70, 71, …, 135
Q1 = median of positions #1 through #67:
Positions #1 thru ____ | Pos. #____: _____ mpg (this is Q1) | Positions #___ thru #67
Q3 = median of positions #69 through #135:
Positions #69 thru ____ | Pos. #____: _____ mpg (this is Q3) | Positions #____ thru #135
Example 3.20: Finding the Quartiles of a Given
Data Set (cont.)
If you put all 135 data values into a TI-84 list and ran 1-Var
Stats, you would scroll down to the second page of results
to read the values below.
c.
n is __________________________________
minX is _______________________________
Q1 is _________________________________
Q2 is same as Med which is _______________
Q3 is_________________________________
maxX is _______________________________
Additional examples of finding Quartiles and the
Five-Number Summary for a data set:
http://www.drscompany.com/edu/Examples/index.htm#!stats/chapter3/section3/position/5number
A difficult, challenging example is at this link:
http://www.drscompany.com/edu/Examples/index.htm#!stats/chapter3/section3/position/other/03
Quintiles and Deciles
You might also encounter
– Quintiles, dividing a data set into 5 groups.
– Deciles, dividing a data set into 10 groups.
These are done by the Percentile method:
– Deciles correspond to percentiles 10, 20, …, 90
– Quintiles correspond to percentiles 20, 40, 60, 80
Five-Number Summary and Box Plots
Interquartile Range (IQR)
The interquartile range is the range of the middle 50%
of the data (how “wide” is the “middle half” of the data set?), given by
IQR = Q3 − Q1
where Q3 is the third quartile and
Q1 is the first quartile.
For the vehicle mpg ratings example,
IQR = _____ - _____ = _____ mpg
Example 3.23: Creating a Box Plot
Draw a box plot to represent the five-number summary
from the previous example. Recall that the five-number
summary was 12.1, 19.8, 23.6, 25.3, 35.9.
Solution
Step 1: Label the horizontal axis at even intervals.
Example 3.23: Creating a Box Plot (cont.)
Step 2: Place a small line segment above each of the
numbers in the five-number summary.
Example 3.23: Creating a Box Plot (cont.)
Step 3: Connect the line segment that represents Q1 to
the line segment that represents Q3, forming a
box with the median’s line segment in
between.
Example 3.23: Creating a Box Plot (cont.)
Step 4: Connect the “box” to the line segments
representing the minimum and maximum to
form the “whiskers.”
TI-84 Boxplot information is at this link:
http://www.drscompany.com/edu/QuickNotes/Statistics/DataDescription/Boxplot_TI84.pdf
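Steps 1 through 4 can be imitated in plain text. This Python sketch is only an illustration, not how a TI-84 draws its plot: it maps each of the five numbers onto an evenly spaced axis, draws whiskers from the minimum to the maximum, a box of `=` from Q1 to Q3, and a `|` segment at each of the five numbers. The function name and the axis limits are assumptions, and the axis labels from Step 1 are not printed.

```python
def text_box_plot(summary, axis_min, axis_max, width=31):
    """Draw a one-line text box plot from (min, Q1, median, Q3, max).

    Hypothetical helper; assumes axis_min <= min and max <= axis_max.
    """
    minimum, q1, med, q3, maximum = summary

    def column(value):
        # Step 1: place a value on an evenly spaced horizontal axis.
        return round((value - axis_min) / (axis_max - axis_min) * (width - 1))

    row = [" "] * width
    for c in range(column(minimum), column(maximum) + 1):
        row[c] = "-"                # Step 4: whiskers from the minimum to the maximum
    for c in range(column(q1), column(q3) + 1):
        row[c] = "="                # Step 3: the box from Q1 to Q3
    for value in summary:
        row[column(value)] = "|"    # Step 2: a short segment at each of the five numbers
    return "".join(row)


print(text_box_plot((12.1, 19.8, 23.6, 25.3, 35.9), 10, 40))
```

With the summary 12.1, 19.8, 23.6, 25.3, 35.9 and an axis from 10 to 40, the median’s segment lands right next to Q3’s, showing that the upper part of the box is much narrower than the lower part.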
Standard Scores
Standard Score
The standard score for a population value is given by
$z = \frac{x - \mu}{\sigma}$
where x is the value of interest from the population,
μ is the population _____________
σ is the population ___________________.
Standard Scores
The standard score for a sample value is given by
$z = \frac{x - \bar{x}}{s}$
where x is the value of interest from the sample,
$\bar{x}$ is the sample _____________
s is the sample ____________________.
Standard Score answers the question
“How does my 𝑥 compare to the mean?”
“Am I in the middle of the pack?”
“Am I above or below the middle?”
“Am I extremely high or extremely low?”
𝑧 Score is the measuring stick
If z= 0, then I’m ________________________.
If z > 0, then I’m ________________________.
If z < 0, then I’m ________________________.
z is almost always between _____ and _____.
Example 3.25: Calculating a Standard Score
If the mean score on the math section of the SAT test is
500 with a standard deviation of 150 points, what is the
standard score for a student who scored a 630?
Solution (note this formula is for a ______________)
μ = 500 and σ = 150. The value of interest is x = 630, so
we have the following.
$z = \frac{x - \mu}{\sigma}$
𝑧 = ______
Excel STANDARDIZE function to convert a data
value (x) to a standard score (z)
𝒛 Score: 𝑥 is how many standard deviations away from the mean?
If you know the x value:
• Population: $z = \frac{x - \mu}{\sigma}$
• Sample: $z = \frac{x - \bar{x}}{s}$
To work backward from z to x:
• Population: $x = z \cdot \sigma + \mu$
• Sample: $x = z \cdot s + \bar{x}$
These formulas agree with the labeling of the axes you did in the Empirical
Rule and Chebyshev’s Theorem problems. In those problems, the z values
were always nice integers: -3, -2, -1, 0, 1, 2, 3.
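The forward and backward formulas can be sketched as two small Python functions. The same code serves a population (pass μ and σ) or a sample (pass x̄ and s). Excel’s STANDARDIZE(x, mean, standard_dev) computes the same quantity as `z_score`; the Python function names and the numbers in the usage line are hypothetical.

```python
def z_score(x, mean, sd):
    """Standard score: how many standard deviations x is from the mean.

    Works for a population (mean = mu, sd = sigma) or a sample (mean = x-bar, sd = s).
    """
    return (x - mean) / sd


def x_from_z(z, mean, sd):
    """Work backward from a standard score: x = z * sd + mean."""
    return z * sd + mean
```

For a hypothetical test with mean 70 and standard deviation 10, a score of 85 gives z = 1.5 (reported as 1.50), and working backward from z = 1.5 returns 85. Plugging in the integer z values −3 through 3 reproduces the axis labels from the Empirical Rule problems.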
𝑧 score values
Typically round to two decimal places.
– Don’t say “0.2589”, say “0.26”
If there are fewer than two decimal places, pad with zeros
– Don’t say “2”, say “2.00”
– Don’t say “-1.1”, say “-1.10”
𝑧 scores are almost always in the interval −4 < 𝑧 < 4.
Be very suspicious if you calculate a 𝑧 score that’s not a
small number.
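Python’s fixed-point formatting does the rounding and the zero-padding in one step. This sketch (the helper name is hypothetical) also flags any z score outside −4 < z < 4 as worth double-checking.

```python
def report_z(z):
    """Format a z score to exactly two decimal places and flag unusual values."""
    text = f"{z:.2f}"   # rounds and pads: 0.2589 -> "0.26", 2 -> "2.00", -1.1 -> "-1.10"
    if abs(z) >= 4:
        text += "  (check your work: z is rarely outside -4 < z < 4)"
    return text
```

One caution: formatting rounds the stored binary value and breaks exact ties toward an even digit, so `f"{0.125:.2f}"` gives 0.12 where hand rounding gives 0.13. Values like that rarely come out of real z calculations, but check by hand if one does.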
Example: Using 𝑧 scores to compare
unlike items
The Literature test
• The mean score was 77
points.
• The standard deviation was
11 points
• Sue earned 91 points
• Find her z score for this test:
The Biology test
• The mean score was 47
points
• The standard deviation was
6 points
• Sue earned 55 points
• Find her z score for this test:
On which test did she have the “better” performance?
𝑧 scores caution with negatives
Example: compare test scores on two different tests to
ascertain “Which score was the more outstanding of
the two?”
Be careful if the 𝑧 scores turn out to be negative.
Which is the better performance? 𝑧 = −1.99
or 𝑧 = −0.34 ?
Stop and think back to your basic number line and the
meaning of “<” and “>”.
Interquartile Range and Outliers
Extra topic for awareness
Concept: An OUTLIER is a wacky far-out abnormally
small or large data value compared to the rest of the
data set.
We’d like something more precise.
Define: IQR = Interquartile Range = Q3 – Q1.
Define: If $x < \bar{x} - 1.5 \cdot IQR$, x is an Outlier.
Define: If $x > \bar{x} + 1.5 \cdot IQR$, x is an Outlier.
(Other books might make different definitions)
Outliers Example
Here’s a quick elementary example:
Data values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20
Mean $\bar{x} \approx 6.8$ and 𝐼𝑄𝑅 = 9 − 3 = 6
Outliers Example
Data values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20
We found IQR = 6 and the mean is 6.8
One definition uses 𝐼𝑄𝑅 ∗ 1.5 to define outliers
Here, 6 ∗ 1.5 = 9
Anything more than 9 units away from $\bar{x}$ is then
considered to be abnormally small or large.
6.8 − 9 = −2.2: nothing is smaller than −2.2.
6.8 + 9 = 15.8: the 20 is an outlier.
No-Outliers Example
Data values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10
Mean $\bar{x} \approx 5.9$ and 𝐼𝑄𝑅 = 9 − 3 = 6
(It’s a coincidence that $\bar{x}$ is close to the 𝐼𝑄𝑅; it means nothing.)
𝐼𝑄𝑅 ∗ 1.5 = 9
Anything more than 9 units away from $\bar{x}$ is abnormal.
5.9 − 9 = −3.1; 5.9 + 9 = 14.9
This data set has No Outliers.
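The outlier rule on these slides (mean ± 1.5·IQR) can be sketched in Python as follows. This sketch assumes the quartiles come from the approximation (median-of-halves) method, which gives IQR = 9 − 3 = 6 for both example data sets above; the helper names are hypothetical.

```python
from statistics import mean


def median(values):
    """Median of an already-sorted list."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def iqr_approximation(data):
    """IQR = Q3 - Q1, with quartiles from the median-of-halves method."""
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]                     # left of the median
    upper = values[half + len(values) % 2:]   # right of the median (median skipped if n is odd)
    return median(upper) - median(lower)


def outliers_mean_fence(data, k=1.5):
    """Values more than k * IQR away from the mean (this slide's definition)."""
    center = mean(data)
    reach = k * iqr_approximation(data)
    return [x for x in data if x < center - reach or x > center + reach]
```

It returns [20] for the first data set and [] for the second. Many other books instead use the fences Q1 − 1.5·IQR and Q3 + 1.5·IQR; following that definition would mean replacing the mean-centered bounds in `outliers_mean_fence`.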
Outliers: Good or Bad?
“I have an outlier in my data set.
Should I be concerned?”
– Could be bad data. A bad measurement.
Somebody not being honest with the pollster.
– Could be legitimately remarkable data, genuine
true data that’s extraordinarily high or low.
“What should I do about it?”
– The presence of an outlier is shouting for
attention. Evaluate it and make an executive
decision.