It's Never Too Soon for a Practice AP Question
Section 1.2
Describing Distributions with Numbers
Specific Ways to Describe
Shape, Center and Spread
• Center:
– Mean – the ordinary arithmetic average. Written $\bar{x}$ and pronounced “x-bar.”
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
– Median – the midpoint of the data set. Denoted M.
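Bonds vs. Aaron
Home runs per season (listed in increasing order):
Barry Bonds (16 seasons): 16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73
Hank Aaron (21 seasons): 13, 20, 24, 26, 27, 29, 30, 32, 34, 34, 38, 39, 39, 40, 40, 44, 44, 44, 44, 45, 47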
Compare Centers
Find the mean and median of both
Bonds’ and Aaron’s home runs.
Bonds has a higher mean of home runs,
but Aaron has a higher median. Why?
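The comparison on this slide can be checked with a short Python sketch. It assumes the 16 Bonds values are the ones listed in the Try it! slide and that the remaining 21 values on the Bonds vs. Aaron slide belong to Aaron; the variable names are illustrative only.

```python
from statistics import mean, median

# Home runs per season from the Bonds vs. Aaron slide.
# Assumption: the split between players follows the Try it! slide (Bonds)
# with the remaining values assigned to Aaron.
bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
aaron = [13, 20, 24, 26, 27, 29, 30, 32, 34, 34, 38, 39, 39,
         40, 40, 44, 44, 44, 44, 45, 47]

bonds_mean, bonds_median = mean(bonds), median(bonds)
aaron_mean, aaron_median = mean(aaron), median(aaron)

print(f"Bonds: mean = {bonds_mean:.2f}, median = {bonds_median:g}")
print(f"Aaron: mean = {aaron_mean:.2f}, median = {aaron_median:g}")
```

Under these assumptions, Bonds' mean comes out slightly above Aaron's while Aaron's median is higher, which is the pattern the slide asks you to explain.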
Resistant and Non-resistant
Means are affected by extreme
observations and outliers. The mean is
a non-resistant measure of center.
The median is resistant to extreme
observations. It is preferable when a data
set has outliers.
Think About This
Change Bonds’ single season record
from 73 home runs to 100 home runs.
How is the mean affected? The
median?
How do the mean and median compare
to each other in a symmetric
distribution? In a (uni-modal) skewed
right distribution? In a (uni-modal)
skewed left distribution?
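The first question above (changing 73 to 100) can be tried numerically. This is a minimal sketch using Bonds' values from the Try it! slide; the name `changed` is hypothetical.

```python
from statistics import mean, median

# Bonds' home runs per season (from the Try it! slide).
bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]

# Hypothetical data set: the 73-homer season replaced by 100.
changed = [100 if x == 73 else x for x in bonds]

print(f"Original: mean = {mean(bonds):.3f}, median = {median(bonds):g}")
print(f"Changed:  mean = {mean(changed):.3f}, median = {median(changed):g}")
```

The mean moves because every value enters the sum, while the median stays put because the middle positions of the ordered list do not change.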
Introduction to Measures of
Spread
One measure of spread you’ve already
studied is range, where you subtract the
lowest value from the highest value. It is not
a dependable measure of spread, because it
only depends on two values in the data set.
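Today, we’ll learn about quartiles. They
divide a data set into fourths.
Finding quartiles is like finding the median:
Q1 is the median of the lower half of the
ordered data, and Q3 is the median of the
upper half. In each half, you find the
midpoint, and average the middle two
numbers if there is an even number of
data points.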
A Visual Representation of
Quartiles
Q1
Lower
Quartile
25th
%ile
Q2
Median
50th
%ile
Q3
Upper
Quartile
75th
%ile
So, there are really only THREE quartiles, and the middle one
isn’t usually called a quartile (it’s called the median). We
generally refer to Q1, M, and Q3.
Try it!
16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73
Find the Range, Median, Q1, and Q3
Solution
16  19  24  25 | 25  33  33  34 | 34  37  37  40 | 42  46  49  73
Q1 = 25 (average of the 4th and 5th values)
Median = 34 (average of the 8th and 9th values)
Q3 = 41 (average of the 12th and 13th values)
So, the Range is 73 – 16 = 57. This gives us a little
information about the variability of Bonds’ home runs
in a season.
The middle 50% of the data lies between 25 and 41,
so we see where the spread of the middle half of the
data lies.
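The solution above can be reproduced with a small Python sketch of the median-of-halves method. The helper name `quartiles` is hypothetical, and other software (including Python's own `statistics.quantiles`) may use slightly different conventions and return slightly different quartiles.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]            # values below the median position
    upper = data[n // 2 + n % 2 :]    # skip the middle value when n is odd
    return median(lower), median(data), median(upper)

bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]

q1, m, q3 = quartiles(bonds)
data_range = max(bonds) - min(bonds)

print(f"Range = {data_range}, Q1 = {q1:g}, Median = {m:g}, Q3 = {q3:g}")
```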
Interquartile Range and the
Outlier Rule
IQR is simply the difference between the upper
quartile and the lower quartile.
In our Barry Bonds example, IQR = 41 – 25 = 16.
We use the IQR to define what an outlier is. An outlier
is any value (or values) that falls more than 1.5*IQR
above the upper quartile or below the lower quartile.
“Fences”
Think of the 1.5*IQR rule as fences. They draw the
boundary line beyond which values are outliers.
Is Barry Bonds’ 73 homer season an outlier???
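One way to answer the question is to compute the fences directly. The sketch below takes Q1 = 25 and Q3 = 41 from the Bonds example; the variable names are illustrative.

```python
# Quartiles from the Barry Bonds example.
q1, q3 = 25, 41
iqr = q3 - q1

# Fences: 1.5 * IQR below Q1 and above Q3.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]

# Any value outside the fences is flagged as an outlier by this rule.
outliers = [x for x in bonds if x < lower_fence or x > upper_fence]

print(f"IQR = {iqr}, fences = ({lower_fence:g}, {upper_fence:g})")
print(f"Outliers: {outliers}")
```

With these quartiles the upper fence is 41 + 24 = 65, so the 73-homer season falls beyond it.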
5 Number Summary
The five number summary consists of:
minimum, Q1, the Median, Q3, and maximum.
It is important because we’ll use it to create a
boxplot
(also called a box-and-whiskers plot).
Bonds’ Boxplot
Recall his 5 number summary:
L = 16; Q1 = 25; M = 34; Q3 = 41; H = 73
[Boxplot: Barry Bonds’ Home Runs in a Season. Horizontal axis: Number of Home Runs, marked from 10 to 70.]
Describing Distributions using
a Boxplot
Spread: IQR or Range
Center: Median
Outliers: Use formula or a Modified Boxplot
Shape:
If the Median is approx. centered: Roughly
symmetric
If the Median is closer to the maximum:
skewed left
If the Median is closer to the minimum: skewed
right
Graph Choices for Comparing
Distributions
Boxplots alone contain little detail, but side-by-side
boxplots effectively compare large
sets of quantitative data.
Let’s plot Bonds’ vs. Aaron’s home runs and compare. =]
Keys to Remember
**Plot both distributions using the
same scale.
**Always compare apples to apples. By
that, I mean compare mean to mean,
median to median, Q1 to Q1, etc.
Students lose points on the AP exam
when they make comparisons between
two different measures.
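As a sketch of comparing like with like, the five-number summaries of both players can be printed side by side. It assumes the same split of values between Bonds and Aaron as in the Try it! slide, uses the median-of-halves method for quartiles, and the function name `five_number_summary` is hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]
    upper = data[n // 2 + n % 2 :]    # skip the middle value when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]

# Assumption: Bonds' values are those in the Try it! slide; the rest are Aaron's.
bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
aaron = [13, 20, 24, 26, 27, 29, 30, 32, 34, 34, 38, 39, 39,
         40, 40, 44, 44, 44, 44, 45, 47]

bonds_summary = five_number_summary(bonds)
aaron_summary = five_number_summary(aaron)

# Compare each measure with the same measure for the other player.
labels = ["Min", "Q1", "Median", "Q3", "Max"]
for label, b, a in zip(labels, bonds_summary, aaron_summary):
    print(f"{label:>6}: Bonds = {b:g}, Aaron = {a:g}")
```

Each printed row compares one measure for Bonds with the same measure for Aaron, which is the kind of statement the AP rubric rewards.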
Measuring Spread: Standard
Deviation
The most commonly used measure of
spread is the standard deviation.
Standard deviation tells us roughly how
far, on average, the observations are
from the mean.
Standard Deviation and
Variance
Variance is the average of the squares of
the deviations of the observations from the
mean.
WHAT???
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
Find this on your formula sheet!
Let Me ‘xplain
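Observation    Deviation from Mean       Squared Deviation
1792           1792 - 1600 = 192         192^2 = 36,864
1666           1666 - 1600 = 66          66^2 = 4,356
1362           1362 - 1600 = -238        (-238)^2 = 56,644
1614           1614 - 1600 = 14          14^2 = 196
1460           1460 - 1600 = -140        (-140)^2 = 19,600
1867           1867 - 1600 = 267         267^2 = 71,289
1439           1439 - 1600 = -161        (-161)^2 = 25,921
Mean = 1600    Sum = 0                   Sum = 214,870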
Standard Deviation Calculation
Continued
s^2 = 214,870/6 = 35,811.67
This is the variance.
s = √35,811.67 ≈ 189.24 calories
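The calculation can be retraced step by step in Python. This sketch uses the seven observations from the table; the variable names are illustrative, and the last two lines cross-check the hand calculation against the standard library's sample variance and standard deviation.

```python
from math import sqrt
from statistics import stdev, variance

# The seven observations from the table (in calories).
observations = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

n = len(observations)
x_bar = sum(observations) / n                       # the mean

# Squared deviation of each observation from the mean.
squared_devs = [(x - x_bar) ** 2 for x in observations]

s_squared = sum(squared_devs) / (n - 1)             # variance: divide by n - 1
s = sqrt(s_squared)                                 # standard deviation

print(f"mean = {x_bar:g}, sum of squared deviations = {sum(squared_devs):g}")
print(f"variance = {s_squared:.2f}, standard deviation = {s:.2f}")

# Cross-check with the standard library (both use the n - 1 divisor).
assert abs(variance(observations) - s_squared) < 1e-6
assert abs(stdev(observations) - s) < 1e-6
```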
Properties of Standard
Deviation
s measures spread about the mean
s = 0 only when there is NO SPREAD
(meaning all the values are the same).
As the observations become more
spread out about their mean, s gets
larger.
s is not resistant to skewness or
outliers. WHY?
How the AP Folks Test Your
Ability to Reason
How do the following affect the mean?
The median? The Std. Dev.?
Adding a certain amount to every value in
a data set
Multiplying each value in a data set by the
same number
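A quick experiment can suggest the answers before you reason them out. The sketch below reuses the seven calorie observations from the standard deviation example; the shift of 100 and the factor of 2 are arbitrary choices made only for illustration.

```python
from statistics import mean, median, stdev

# Observations reused from the standard deviation example.
data = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

shifted = [x + 100 for x in data]   # add the same amount to every value
scaled = [x * 2 for x in data]      # multiply every value by the same number

for name, values in [("original", data), ("add 100", shifted), ("times 2", scaled)]:
    print(f"{name:>8}: mean = {mean(values):g}, "
          f"median = {median(values):g}, s = {stdev(values):.2f}")
```

In this example, adding a constant shifts the mean and median by that constant and leaves s unchanged, while multiplying by 2 doubles all three.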
Recap
Measures of spread: Range, IQR, standard deviation
Measures of center: Median, Mean
When to use which???
The mean and the std. dev. are not resistant
to outliers, so use them only when the
distribution is roughly symmetric and there
aren’t any outliers.
Use the 5 Number Summary when the
distribution is strongly skewed or has outliers.
Height Project due ____day!
Height Project:
Collect heights, in inches, from 50 high school girls and 50 high
school boys (keep them separate). Organize these in two
frequency tables and create side-by-side boxplots. Describe the
distributions of both the boys’ and girls’ heights. Write 3
statements comparing the distributions.
Grading Rubric:
Frequency table of values (one for boys, one for girls): 10 points
Side-by-side boxplots: 30 points
Describe the distributions for both the boys’ and girls’ heights: 30 points
Comparison statements: 30 points