Transcript residual

CCSS – STATISTICS THREAD
LOUISVILLE, KY 2012
Lisa Fisher-Comfort, MN Regional Coordinator,
Michael Long, HI Regional Coordinator
Earlier in Chapter 4…
Lesson 4.2.1 Math Notes
Lesson 6.1.1 - Closure
Residuals
Battle Creek Cereal Scatterplot
Lesson 6.1.2 - Closure





The first step in any statistical analysis is to graph the data. Graphs do not
necessarily start at the origin; indeed, frequently in statistical analyses they do
not.
A residual is a measure of how far our prediction using the best-fit model is from
what was actually observed.
A residual has the same units as the y-axis.
A residual can be graphed with a vertical segment. The length of this segment
(in the units of the y-axis) is the residual.
A positive residual means the actual observed y-value of a piece of data is
greater than the y-value that was predicted by the LSRL.

A negative residual means the actual observed y-value is less than the y-value
that was predicted by the LSRL.
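
As a minimal sketch of the sign convention described above, a residual can be computed as the observed value minus the predicted value. The function name `residual` and the numbers used here are hypothetical and are not taken from the lesson.

```python
def residual(actual_y, predicted_y):
    """Residual = observed y-value minus the y-value predicted by the model."""
    return actual_y - predicted_y

# Hypothetical numbers for illustration only (not from the lesson data).
above_line = residual(12.0, 10.5)  # positive: the point lies above the line
below_line = residual(8.0, 10.5)   # negative: the point lies below the line
print(above_line, below_line)
```

The result carries the units of the y-axis, which matches the vertical-segment picture of a residual.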

Extrapolation of a statistical model can lead to nonsensical results.
The following table shows data for one season of the El Toro professional basketball
team. El Toro team member Antonio Kusoc was inadvertently left off of the list.
Antonio Kusoc played for 2103 minutes. We would like to predict how many points he
scored in the season.
Player Name          Minutes Played   Total Points Scored in a Season
Sordan, Scottie           3090                2491
Lippen, Mike              2825                1496
Karper, Don               1886                 594
Shortley, Luc             1641                 564
Gerr, Bill                1919                 688
Jodman, Dennis            2088                 351
Kennington, Steve         1065                 376
Bailey, John                 7                   5
Bookler, Jack              740                 278
Dimkins, Rickie            685                 216
Edwards, Jason             274                  98
Gaffey, James              545                 182
Black, Sandy               671                 185
Talley, Dan                191                  36
checksum                 17627                7560
Your Task (6-30)
a. Obtain a Lesson 6.1.4 Resource Page from your
teacher. Draw a line of best fit for the data and then
use it to write an equation that models the relationship
between total points in the season and minutes played.
b. Which data point is an outlier for this data? Whose
data does that point represent? What is his residual?
c. Would a player be more proud of a negative or
positive residual?
d. Predict how many points Antonio Kusoc made.
LSRL on a Calculator
6-33. A least squares regression line (LSRL) is a unique
line that has the smallest possible value for the sum of the
squares of the residuals.
a. Your teacher will show you how to use your calculator to
make a scatterplot. (Graphing calculator instructions can
also be downloaded from www.cpm.org/technology.) Be
sure to use the checksum at the bottom of the table in
problem 6-30 to verify that you entered the data into
your calculator accurately.
b. Your teacher will show you how to find the LSRL and
graph it on your calculator. Sketch your scatterplot and
LSRL on your paper.
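
For readers without a graphing calculator, the same LSRL can be sketched in Python. This is an illustrative sketch, not the calculator procedure from the lesson: the helper names `lsrl` and `sum_sq_resid` are assumptions, and the slope and intercept come from the standard least-squares formulas. The data and checksums are those of problem 6-30.

```python
# El Toro data from problem 6-30: (player, minutes played, points scored).
data = [
    ("Sordan, Scottie", 3090, 2491), ("Lippen, Mike", 2825, 1496),
    ("Karper, Don", 1886, 594), ("Shortley, Luc", 1641, 564),
    ("Gerr, Bill", 1919, 688), ("Jodman, Dennis", 2088, 351),
    ("Kennington, Steve", 1065, 376), ("Bailey, John", 7, 5),
    ("Bookler, Jack", 740, 278), ("Dimkins, Rickie", 685, 216),
    ("Edwards, Jason", 274, 98), ("Gaffey, James", 545, 182),
    ("Black, Sandy", 671, 185), ("Talley, Dan", 191, 36),
]
minutes = [m for _, m, _ in data]
points = [p for _, _, p in data]

# The checksums from the table guard against data-entry mistakes.
assert sum(minutes) == 17627 and sum(points) == 7560

def lsrl(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def sum_sq_resid(slope, intercept, xs, ys):
    """Sum of the squared residuals for the line y = slope * x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

slope, intercept = lsrl(minutes, points)

# Residual for each player: actual points minus points predicted by the LSRL.
residuals = {name: p - (slope * m + intercept) for name, m, p in data}
largest = max(residuals, key=lambda name: abs(residuals[name]))

# Prediction for Antonio Kusoc, who played 2103 minutes.
kusoc_points = slope * 2103 + intercept

print(f"LSRL: points = {slope:.2f} * minutes + ({intercept:.1f})")
print(f"Largest residual: {largest} ({residuals[largest]:.0f} points)")
print(f"Predicted points for Kusoc: {kusoc_points:.0f}")
```

Changing the slope or intercept slightly and calling `sum_sq_resid` again gives a larger sum, which is the defining property of the LSRL stated in problem 6-33.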
Lesson 6.1.4 - Closure


This is a two-day lesson. Problem 6-34 is a Least
Squares Demo that can be teacher-led to
summarize students' understanding of Least Squares
Regression Lines.
LeastSquaresDemo.html
Find the Correlation Coefficient for the
El Toro Basketball team

Describe the form, direction, strength, and outliers of the
association.
Form could be linear, but the residual plot indicates that
another model might be better.
Direction is positive with a slope of 0.59; an increase of
one minute played is associated with an average increase of
0.59 points scored.
Strength is fairly strong because r = 0.865.
Outliers: Scottie Sordan (a.k.a. Michael Jordan).
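
The correlation coefficient can be checked without a calculator. The sketch below assumes the usual Pearson formula; the function name `correlation` is hypothetical, and the data are the El Toro minutes and points from problem 6-30.

```python
from math import sqrt

# El Toro data from problem 6-30, in table order.
minutes = [3090, 2825, 1886, 1641, 1919, 2088, 1065,
           7, 740, 685, 274, 545, 671, 191]
points = [2491, 1496, 594, 564, 688, 351, 376,
          5, 278, 216, 98, 182, 185, 36]

def correlation(xs, ys):
    """Pearson correlation coefficient r of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = correlation(minutes, points)
print(f"r = {r:.3f}")
```

A positive r agrees with the positive slope of the LSRL.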
Lesson 6.2.2 Closure
Computer Exploration of Correlation Coefficients Problem 6-72
Students are asked to create scatterplots with the
following associations and record r:
Strong positive linear association
Weak positive linear association
Strong negative linear association
No linear association (random scatter)
http://illuminations.nctm.org/LessonDetail.aspx?ID=L456#first
High-Temperature Data
 #   City                 1975 (°F)   2000 (°F)
 1   Anchorage, AK           13          33
 2   Spokane, WA             52          44
 3   Billings, MT            62          44
 4   Juneau, AK              29          45
 5   Bangor, ME              53          48
 6   Bellingham, WA          52          53
 7   Albuquerque, NM         67          53
 8   Denver, CO              60          54
 9   Portland, OR            57          54
10   Seattle, WA             54          54
11   Boston, MA              60          56
12   New York, NY            56          58
13   Duluth, MN              55          60
14   Bismarck, ND            66          61
15   Baltimore, MD           61          62
16   Washington, D.C.        62          62
17   Philadelphia, PA        59          62
18   El Paso, TX             83          65
19   Lansing, MI             55          66
20   Phoenix, AZ             77          67
21   San Francisco, CA       67          67
22   Sacramento, CA          71          68
23   Los Angeles, CA         71          69
24   Raleigh, NC             63          70
25   Des Moines, IA          72          73
26   Kansas City, MO         73          74
27   Chicago, IL             60          75
28   Oklahoma City, OK       76          76
29   Louisville, KY          70          76
30   Topeka, KS              74          77
31   Atlanta, GA             66          79
32   Orlando, FL             79          82
33   Baton Rouge, LA         81          84
34   Honolulu, HI            84          85
35   New Orleans, LA         80          86
Displaying Temperatures


Is the planet getting hotter? Experts look at the temperature of the air and the
oceans, the kinds of molecules in the atmosphere, and many other kinds of data
to try to determine how the earth is changing. However, sometimes the same data
can lead to different conclusions because of how the data is represented.
Your teacher will provide you with temperature data from November 1, 1975,
and from November 1, 2000. To make sense of this data, you will first need to
organize it in a useful way.





Your teacher will assign you a city and give you two sticky notes. Label the
appropriately colored sticky note with the name of the city and its temperature in
1975. Label the other sticky note with its city name and temperature in 2000.
Follow the directions of your teacher to place your sticky notes on the class histogram.
Use the axis at the bottom of the graph to place your sticky note.
How many cities were measured for this study?
Describe the spread and shape of each of the histograms that you have created.
Which measure of central tendency would you use to describe a typical temperature
for each year? Justify your choice.
http://www.almanac.com/weather
Histograms and World Temperatures
Boxplots and Temperatures

The histograms your class made in problem 8-44 display data along the horizontal axis.
Another way to display the data is to form a box plot, which divides the data into four equal
parts, or quartiles. To create a box plot, follow the steps below with the class or in your
team.
With a sticky dot provided by your teacher, plot the 1975 temperature for your city on a number line in
front of the class.
What is the median temperature for 1975? Place a vertical line segment about one-half inch long
marking this position above the number line on your resource page.
How far does the data extend from the median? That is, what are the minimum and maximum
temperatures in 1975? Place vertical line segments marking these positions above the number line.
The median splits the data into two sets: those that come before it and those that come after it when the
data is ordered from least to greatest, like it is on the number line. Find the median of the lower set
(called the first quartile). Mark the first quartile with a vertical line segment above the number line.
Look at the temperatures that come after the median on your number line. The median of this portion of
data is called the third quartile. Mark the third quartile with a vertical line segment above the
number line.
Draw a box that contains all of the data points between the first and third quartiles. Your graph should
be similar to a box with outer segments like the one shown below.
What does the box plot tell you about the temperatures of the cities in 1975 that the dot plot did not?
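
The box plot steps above can be sketched in Python for the 1975 temperatures. This sketch assumes the quartile method described in the steps (the median of the values before and after the overall median); other software may use a different quartile rule and give slightly different values. The function name `five_number_summary` is hypothetical.

```python
from statistics import median

# 1975 high temperatures (°F) for the 35 cities, in table order.
temps_1975 = [13, 52, 62, 29, 53, 52, 67, 60, 57, 54,
              60, 56, 55, 66, 61, 62, 59, 83, 55, 77,
              67, 71, 71, 63, 72, 73, 60, 76, 70, 74,
              66, 79, 81, 84, 80]

def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd count, the median itself belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower), median(ordered),
            median(upper), ordered[-1])

summary_1975 = five_number_summary(temps_1975)
print(summary_1975)
```

The box spans the first to third quartile, and the outer segments reach the minimum and maximum.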


Two-day lesson
Students use center, shape, spread and outliers to
compare two sets of numerical data.
Box Plots and Histograms
8-30. Mrs. Ross is the school basketball coach. She
wants to compare the scoring results for her team from
two different games. The number of points scored by
each player in each of the games are shown below.
Game 1: 12, 10, 10, 8, 11, 4, 10, 14, 12, 9
Game 2: 7, 14, 11, 12, 8, 13, 9, 14, 4, 8
a. How many total players are on the team?
b. What is the mean number of points per player for each game?
c. What is the median number of points per player for each game?
d. What is the range of points scored by each player for each game?
e. With your team, discuss and find another method for comparing the data.
f. Do you think the scoring in the two games is equivalent?
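
Parts (b) through (d) can be checked with a short Python sketch using the standard `statistics` module. The helper name `describe` is hypothetical; the scores are those given for Mrs. Ross's team.

```python
from statistics import mean, median

game1 = [12, 10, 10, 8, 11, 4, 10, 14, 12, 9]
game2 = [7, 14, 11, 12, 8, 13, 9, 14, 4, 8]

def describe(scores):
    """Return (mean, median, range) for a list of scores."""
    return mean(scores), median(scores), max(scores) - min(scores)

stats1 = describe(game1)
stats2 = describe(game2)
print("Game 1:", stats1)
print("Game 2:", stats2)
```

Both games produce the same three values, which is why part (e) asks for another method of comparison.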
Follow Up Question
Math Notes
MATH NOTES
METHODS AND MEANINGS
Mean Absolute Deviation
One method for measuring the spread (variability) in a set of data
is to calculate the average distance each data point is from the mean.
This distance is called the mean absolute deviation. Since the
calculation is based on the mean, it is best to use this measure of spread
when the distribution is symmetric.
For example, the points shown below left are not spread very far from the
mean. There is not a lot of variability. The points have a small average
distance from the mean, and therefore a small mean absolute deviation.
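[Figure: two dot plots, each with the mean marked. Left: points clustered close to the mean. Right: points spread far from the mean.]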
The points above right are spread far from the mean. There is more variability.
They have a large average distance from the mean, and therefore a large mean
absolute deviation.
Mean Absolute Deviation
Data from Game 1: 12, 10, 10, 8, 11, 4, 10, 14, 12, 9
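
A minimal sketch of the mean absolute deviation for the Game 1 data, following the definition in the Math Notes (the average distance of each data point from the mean). The function name `mean_absolute_deviation` is hypothetical; the Game 2 scores from problem 8-30 are included for comparison.

```python
def mean_absolute_deviation(values):
    """Average distance of each data point from the mean."""
    center = sum(values) / len(values)
    return sum(abs(v - center) for v in values) / len(values)

game1 = [12, 10, 10, 8, 11, 4, 10, 14, 12, 9]
game2 = [7, 14, 11, 12, 8, 13, 9, 14, 4, 8]

mad1 = mean_absolute_deviation(game1)
mad2 = mean_absolute_deviation(game2)
print(f"Game 1 MAD = {mad1:.1f}, Game 2 MAD = {mad2:.1f}")
```

The two games share the same mean, median, and range, yet the mean absolute deviation shows that Game 2 scores vary more.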
New tool – Standard Deviation
Standard Deviation

Data from 11-60
Sugar W: 10, 32, 32, 34, 34, 36, 37, 39, 39, 40,
41, 43, 43, 44, 45, 46, 46, 49, 70 checksum 760
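
The standard deviation of the Sugar W data can be sketched with Python's `statistics` module. The slide does not say whether the population or the sample formula is intended, so both are shown here; that choice is an assumption left to the reader. The checksum from the slide is used to verify the data entry.

```python
from statistics import mean, pstdev, stdev

sugar_w = [10, 32, 32, 34, 34, 36, 37, 39, 39, 40,
           41, 43, 43, 44, 45, 46, 46, 49, 70]

# The checksum from the slide guards against data-entry mistakes.
assert sum(sugar_w) == 760

center = mean(sugar_w)
population_sd = pstdev(sugar_w)  # divides by n
sample_sd = stdev(sugar_w)       # divides by n - 1

print(f"mean = {center}")
print(f"population SD = {population_sd:.2f}")
print(f"sample SD = {sample_sd:.2f}")
```

Like the mean absolute deviation, the standard deviation measures spread about the mean, but it squares each distance before averaging, so the extreme values 10 and 70 weigh more heavily.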