It's Never Too Soon for a Practice AP Question
Warm-up
The numbers of deaths among persons aged 15
to 24 years in the United States in 1997 due to
the seven leading causes of death for this age
group were: accidents, 12,958; homicide, 5,793;
suicide, 4,146; cancer, 1,583; heart disease,
1,013; congenital defects, 383; AIDS, 276.
Make a bar graph to display these data.
What additional information do you need to make a pie
chart?
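As a rough sketch of the bar graph, the counts above can be drawn as text bars. The function name `text_bar_graph` and the bar width of 50 characters are illustrative choices, not part of the original exercise.

```python
# Deaths among persons aged 15 to 24 in the US in 1997 (from the warm-up).
deaths = {
    "accidents": 12958,
    "homicide": 5793,
    "suicide": 4146,
    "cancer": 1583,
    "heart disease": 1013,
    "congenital defects": 383,
    "AIDS": 276,
}


def text_bar_graph(counts, width=50):
    """Return one line per category, with a bar scaled to the largest count."""
    largest = max(counts.values())
    lines = []
    for cause, count in counts.items():
        # Every category gets at least one mark so small counts stay visible.
        bar = "#" * max(1, round(width * count / largest))
        lines.append(f"{cause:<20}{bar} {count}")
    return lines


for line in text_bar_graph(deaths):
    print(line)
```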
Warm up 2
1. The histogram shows the number of major hurricanes that reached the East Coast of
the United States from 1944 to 2000. Describe the shape, center, and spread of the
distribution.
[Histogram; horizontal axis: Hurricanes (0 to 7); vertical axis: Frequency (0 to 15)]
WU 2 Cont.
Hallux abducto valgus (call it HAV) is a deformation of the big toe that is not
common in youth and often requires surgery. Doctors used X-rays to measure the
angle (in degrees) of deformity in 38 consecutive patients under the age of 21 who
came to a medical center for surgery to correct HAV. The angle is a measure of the
seriousness of the deformity. Here are the data.
28 32 25 34 38 26 25 18 30 26 28 13 20
21 17 16 21 23 14 32 25 21 22 20 18 26
16 30 30 20 50 25 26 28 31 38 32 21
Make a stemplot and give a numerical description of this distribution. Are there any
outliers? Write a brief discussion of the shape, center, and spread of the angle of
deformity among young patients needing surgery for this condition.
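One way to check a hand-drawn stemplot is a short sketch like the following, which uses the tens digit as the stem and the ones digit as the leaf. The function name `stemplot` is hypothetical, and splitting stems is not attempted.

```python
from collections import defaultdict

# HAV angles (degrees) for the 38 patients, copied from the warm-up.
angles = [
    28, 32, 25, 34, 38, 26, 25, 18, 30, 26, 28, 13, 20,
    21, 17, 16, 21, 23, 14, 32, 25, 21, 22, 20, 18, 26,
    16, 30, 30, 20, 50, 25, 26, 28, 31, 38, 32, 21,
]


def stemplot(values):
    """Return stemplot rows: tens digit as the stem, ones digits as leaves."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    rows = []
    # Walk every stem from smallest to largest so empty stems are kept.
    for stem in range(min(leaves), max(leaves) + 1):
        rows.append(f"{stem} | " + "".join(str(d) for d in leaves[stem]))
    return rows


for row in stemplot(angles):
    print(row)
```

Keeping the empty stem for the 40s makes the gap between the bulk of the data and the value 50 visible, which is useful when judging possible outliers.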
Section 1.2
Describing Distributions with
Numbers
Specific Ways to Describe Shape,
Center and Spread
Center:
Mean – the ordinary arithmetic average, written $\bar{x}$ and pronounced
“x-bar.”
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Σ, pronounced “sigma,”
means “the sum of.” In
other words, you add
up the terms $x_1$ through
$x_n$, then divide by n.
•Median – the midpoint of the data set.
Denoted M.
Bonds vs. Aaron
Home runs per season
Barry Bonds: 16 19 24 25 25 33 33 34 34 37 37 40 42 46 49 73
Hank Aaron: 13 27 26 44 30 39 40 34 45 44 24 32 44 39 29 44 38 47 34 40 20
Have no fear… Your calculator is here!
You can get all this information from your
calculator.
Type your data in L1 and L2.
Stat, 1-Var Stats, L1.
Do the same thing for L2.
Compare Centers
Find the mean and median of both Bonds’
and Aaron’s home runs.
Bonds: $\bar{x} = 35.4375$, $M_X = 34$
Aaron: $\bar{y} = 34.9$, $M_Y = 38$
Bonds has a higher average
number of home runs, but this
average is affected by the
extreme value of 73. The
median for Aaron is higher
than the median for Bonds, indicating that Aaron
hit more home runs than
Bonds in a typical season.
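The calculator results can be double-checked with a minimal sketch using Python's standard `statistics` module. The two lists are the home run counts from the Bonds vs. Aaron slide; the variable names are illustrative.

```python
from statistics import mean, median

# Home runs per season, as listed on the Bonds vs. Aaron slide.
bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
aaron = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24,
         32, 44, 39, 29, 44, 38, 47, 34, 40, 20]

# Print the two measures of center for each player.
for name, data in (("Bonds", bonds), ("Aaron", aaron)):
    print(f"{name}: mean = {mean(data):.4f}, median = {median(data)}")
```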
Resistant and Non-resistant
The mean is affected by extreme
observations, such as Bonds’ single
season record of 73 home runs. It is a
non-resistant measure of center.
The median, however, is resistant to
extreme observations. It is preferable when a
data set has outliers.
Think About This
Change Bonds’ single season record from
73 home runs to 100 home runs. How is
the mean affected? The median?
How do the mean and median compare to
each other in a symmetric distribution?
In a (unimodal) skewed right distribution?
In a (unimodal) skewed left distribution?
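The first question can be explored with a small sketch that swaps the 73 for 100 and recomputes both measures of center. The list name `bonds_100` is made up for this illustration.

```python
from statistics import mean, median

# Bonds' home runs per season, from the Bonds vs. Aaron slide.
bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]

# Hypothetical data set: the record season changed from 73 to 100.
bonds_100 = [100 if x == 73 else x for x in bonds]

print("original:", mean(bonds), median(bonds))
print("with 100:", mean(bonds_100), median(bonds_100))
```

Only the largest value moved, so the middle of the ordered list is untouched while the sum (and therefore the mean) grows.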
Introduction to Measures of Spread
Today, we’ll learn about quartiles. Oddly
enough, they divide a data set into fourths
(25% sections).
Finding quartiles is like finding the median.
You locate the midpoint, and average the
middle two numbers if there is an even
number of data points.
A Visual Representation of Quartiles
[Figure: the ordered data split into four 25% sections by Q1 (lower quartile, 25th percentile), Q2 (median, 50th percentile), and Q3 (upper quartile, 75th percentile)]
So, there are really only THREE quartiles, and the middle one
isn’t usually called a quartile (it’s called the median). We
generally refer to Q1, M, and Q3.
To find Q1, you find the median of the
lower half of the data.
To find Q3, you find the median of the
upper half of the data.
Try it!
Find the Range, Median, Q1, and Q3
16 19 24 25 25 33 33 34 34 37 37 40 42 46 49 73
Solution
16 19 24 25 | 25 33 33 34 | 34 37 37 40 | 42 46 49 73
(the bars mark Q1, the median, and Q3)
Q1 = 25
Median = 34
Q3 = 41
So, the Range is 73 – 16 = 57. This gives us a little
information about the variability of Bonds’ home runs
in a season.
The middle 50% of the data lies between 25 and 41,
so we see where the spread of the middle half of the
data lies.
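The median-of-halves procedure can be sketched in a few lines. The function name `quartiles` is hypothetical, and leaving the overall median out of both halves when the count is odd is an assumption, since the slides only work an even-sized example. Other software may use a different quartile convention and give slightly different values.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, M, Q3) using the median of each half of the sorted data.

    Assumption: with an odd number of observations, the overall median
    is excluded from both halves.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]    # smallest half of the observations
    upper = data[-half:]   # largest half of the observations
    return median(lower), median(data), median(upper)


bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
q1, m, q3 = quartiles(bonds)
data_range = max(bonds) - min(bonds)
print(q1, m, q3, data_range)
```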
Interquartile Range and the Outlier
Rule
IQR is simply Q3 - Q1.
In our Barry Bonds example, IQR = 41 –
25 = 16.
The IQR is a resistant measure of spread
and is paired with the median.
We use the IQR to define what an outlier
is. An outlier is any value (or values) that
falls more than 1.5*IQR above the upper
quartile or below the lower quartile.
“Fences”
Think of the 1.5*IQR rule as fences. They draw
the boundary line beyond which values are
outliers.
Is Barry Bonds’ 73 homer season an outlier???
Recall: Q1 = 25; Q3 = 41; IQR = 16
So, 1.5*IQR = 1.5*16 = 24.
Add 24 to Q3 and Subtract 24 from Q1:
Upper boundary = 24 + 41 = 65
Lower boundary = 25 – 24 = 1
Conclusion: 73 falls above the outlier boundary of 65, so
it is an outlier!!!
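The fence arithmetic above can be sketched as a small helper. The name `outlier_fences` and the parameter `k` are illustrative; the quartiles 25 and 41 are taken from the slides rather than recomputed.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) for the k*IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]

# Quartiles from the slides: Q1 = 25, Q3 = 41.
lower, upper = outlier_fences(25, 41)

# A value is flagged only if it falls strictly outside the fences.
outliers = [x for x in bonds if x < lower or x > upper]
print(lower, upper, outliers)
```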
5 Number Summary
The five number summary consists of the
lowest value, Q1, the Median, Q3, and the
highest value. It is important because
we’ll use it to create a new kind of graph:
a boxplot (also called a box-and-whiskers
plot).
Bonds’ Boxplot
Recall his 5 number summary:
L = 16; Q1 = 25; M = 34; Q3 = 41; H = 73
[Boxplot of Bonds' home runs; axis: Number of home runs in a season, 10 to 70]
Modified Boxplots
Modified boxplots show outliers as isolated points.
Bonds’ 73 home run season was an outlier, so the
whisker in a modified boxplot only extends to the
last data point that was NOT an outlier. Any outlier
is shown as a star (*).
CAUTION: Many students extend the whisker to the
outlier “fence” (i.e., 65). This is WRONG! The
whisker should stop at the last actual data point that is not an outlier. So
tell me – where should the upper whisker end in a
modified boxplot of Bonds’ home runs per
season?
Answer: 49
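A minimal sketch of how the whisker ends of a modified boxplot could be found: keep only the data points inside the fences, then take their smallest and largest values. The quartiles are the ones given in the slides, and all variable names are illustrative.

```python
bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]

# Quartiles from the slides, and the 1.5*IQR fences built from them.
q1, q3 = 25, 41
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at actual data points, never at the fences themselves.
inside = [x for x in bonds if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Anything outside the fences is plotted as an isolated point.
outliers = [x for x in bonds if x < lower_fence or x > upper_fence]
print(whisker_low, whisker_high, outliers)
```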
We can look at these in the calculator as
well.
Go to StatPlot.
It’s Never Too Soon for a
Practice AP Question
2005 AP Statistics Problem #1
http://apcentral.collegeboard.com/apc/public/repository/_ap05_frq_statistics_45546.pdf
Question 1 Part a)
Part a) is graded Essentially Correct, Partially
Correct, or Incorrect
To receive an Essentially Correct, a student
must successfully compare center, shape and
spread. Specific numeric values are not
required.
To receive a Partially Correct, a student must
successfully compare 2 of the 3 measures of
center, shape and spread.
All other responses are graded as Incorrect.
Special Notes
Compare means you state which is larger.
For example, “the mean of the rural
students’ daily caloric intake is greater
than the mean for the urban students” is a
correct comparison. However, stating “the
mean of the rural students’ daily caloric
intake is 40.45 while the mean for the
urban students is 32.6” is not a
COMPARISON.
In Conclusion
Graders were looking for three
comparisons:
Center—the mean caloric intake of the rural
students is greater than the mean caloric
intake of the urban students
Spread—the spread of the rural students’
distribution is larger than the spread of the
urban students
Shape—the rural students’ caloric intakes are
roughly symmetric while the urban students’
caloric intakes are skewed right.
There’s More to Spread than
IQR
Section 1.2 Standard Deviation
Describing Data with Numbers
So far, we’ve learned the 5 Number
Summary to describe a set of data: Min,
Q1, M, Q3, and Max.
We’ve also used the mean as another
measure of center.
Measuring Spread: Standard
Deviation
The most commonly used measure of
spread is the standard deviation.
Standard deviation tells us, on average,
how far the observations are away from
the mean.
Standard Deviation and Variance
Variance is the average of the squares of the
deviations of the observations from the mean.
WHAT???
But your calculator can tell you all of this!
$s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$
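For anyone who wants to see what the calculator is doing, here is a sketch of the sample variance computed by hand: sum the squared deviations from the mean and divide by n - 1. The function name `sample_variance` is hypothetical; the result is compared against the standard library's `stdev` as a sanity check.

```python
from math import sqrt
from statistics import stdev


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]

s = sqrt(sample_variance(bonds))  # standard deviation = square root of variance
print(s, stdev(bonds))            # the two values should agree
```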
Properties of Standard Deviation
$s^2$ is called the variance. The square root of $s^2$ is
__.
s measures spread about the mean and is
called standard deviation.
s = 0 only when there is NO SPREAD (in
other words, all the data values are the
same). As the observations become more
spread out about their mean, s gets larger.
s is not resistant to skewness or outliers.
WHY?
Recap
Measures of spread:
IQR, standard deviation
Measures of center:
Median, Mean
When to use which???
The mean and the std. dev. are not resistant to
outliers, so use them only when the distribution is
roughly symmetric and there aren’t outliers.
Use the 5 Number Summary when the distribution is
strongly skewed or has outliers.
How the AP Folks Test Your Ability to
Reason
How do the following affect the mean?
The median? The Std. Dev.?
Adding a certain amount to every value in
a data set
Multiplying each value in a data set by the
same number
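These two questions can be explored numerically with a short sketch. The shift of 10 and the multiplier of 3 are arbitrary values chosen for illustration, and `summary` is a hypothetical helper.

```python
from statistics import mean, median, stdev


def summary(values):
    """Return (mean, median, standard deviation) of a data set."""
    return mean(values), median(values), stdev(values)


bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]

shifted = [x + 10 for x in bonds]  # add the same amount to every value
scaled = [3 * x for x in bonds]    # multiply every value by the same number

for label, data in (("original", bonds), ("plus 10", shifted), ("times 3", scaled)):
    m, med, s = summary(data)
    print(f"{label:>8}: mean = {m:.4f}, median = {med}, std dev = {s:.4f}")
```

Comparing the three printed rows shows which summaries move with a shift and which only respond to rescaling.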
Homework
Day 1: Chapter 1 #40, 41, 45, 50,
52
Day 2: Chapter 1 #63, 91, 94, 96,
101