Transcript Chapter 3
Chapter 3
Numerical Summaries
of Center and Variation
Copyright © 2014 Pearson Education, Inc. All rights reserved
Learning Objectives
Understand how measures of center and spread are used
to describe characteristics of real-life samples of data.
Understand when it is appropriate to use the mean and
standard deviation and when it is better to use the
median and interquartile range.
Understand the mean as the balancing point of the
distribution of a sample of data and the median as the
point that has roughly 50% of the distribution below it.
Be able to write comparisons between samples of data
in context.
3.1
Summaries for
Symmetric
Distributions
Summaries for Symmetric Distributions
The mean describes the center.
A numerical summary.
The balancing point for the distribution.
Can be used as a typical value for symmetric, mound shaped distributions.
The standard deviation describes the spread.
A numerical summary.
Measures a typical distance of the observations from the mean.
Measures the variability when the distribution is symmetric.
The Mean as a Balancing Point
If we place a finger at the mean, the
histogram will balance perfectly.
Major League Baseball 2010
Skewness and the Mean
For a skewed right histogram, the mean is to
the right of the typical value.
Major League Baseball 2010
Symmetric Distributions and the Mean
For a symmetric distribution, the mean is at
the center.
The Formula for the Mean
To calculate the mean, use the formula:
x̄ = Σx / n
Σ, read “sigma”, means “add”.
x represents all of the data values.
n represents the sample size.
x̄ represents the sample mean.
Calculating the Sample Mean
Find the mean of the number of siblings for
the 8 students questioned:
3,2,2,1,2,3,5,2
The sample size: n = 8.
x̄ = Σx / n = (3 + 2 + 2 + 1 + 2 + 3 + 5 + 2) / 8 = 20 / 8 = 2.5
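The same calculation can be sketched in a few lines of Python. This is only an illustration of the formula above; the variable names are chosen for this example and are not from the slides.

```python
# Number of siblings reported by the 8 students (data from the slide).
siblings = [3, 2, 2, 1, 2, 3, 5, 2]

n = len(siblings)       # sample size
total = sum(siblings)   # the sum of all data values (Σx)
mean = total / n        # sample mean: Σx divided by n

print(f"n = {n}, sum = {total}, mean = {mean}")
```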
Standard Deviation
The Standard Deviation, s, is a measure of
the spread.
It represents a typical distance from the mean
of the observations.
For mound shaped distributions, the majority
of the observations are less than one standard
deviation from the mean.
The square of the standard deviation is called
the variance.
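As a rough illustration, the sibling data from the earlier mean example can be used to compute both quantities. The slides do not show the formula for s; the sketch below assumes the usual sample convention (dividing by n - 1), which is what Python's statistics module uses.

```python
import statistics

# Sibling counts from the earlier mean example.
siblings = [3, 2, 2, 1, 2, 3, 5, 2]

# Sample variance and sample standard deviation (both divide by n - 1).
variance = statistics.variance(siblings)
s = statistics.stdev(siblings)

# The variance is the square of the standard deviation.
print(f"variance = {variance:.3f}, s = {s:.3f}")
```

For these data s comes out to about 1.2 siblings, so a typical student's count is roughly one sibling away from the mean of 2.5.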
Put the Following in Order From
Smallest Standard Deviation to Largest
Solution: (c), (b), (a)
The Standard Deviation and the Mean
In San Francisco, the mean high temperature is 65
degrees and the standard deviation is 8 degrees. In
Provo the mean is 67 and the standard deviation is
21. Is a high temperature of 52 rarer in San
Francisco or in Provo?
One standard deviation below the mean: SF: 65 – 8 = 57, Provo: 67 – 21 = 46
Since 52 degrees is within one standard deviation of
Provo’s mean and not of San Francisco’s mean, a
temperature of 52 is rarer in San Francisco.
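One way to check this comparison is to express 52 degrees as a number of standard deviations from each city's mean. The helper name below is hypothetical and the sketch only restates the slide's arithmetic.

```python
def sds_from_mean(value, mean, sd):
    """Return how many standard deviations `value` lies from `mean` (signed)."""
    return (value - mean) / sd

sf = sds_from_mean(52, mean=65, sd=8)      # San Francisco
provo = sds_from_mean(52, mean=67, sd=21)  # Provo

# The city where 52 is farther from the mean (in SDs) is where it is rarer.
rarer = "San Francisco" if abs(sf) > abs(provo) else "Provo"
print(f"SF: {sf:.2f} SDs, Provo: {provo:.2f} SDs, rarer in {rarer}")
```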
Using StatCrunch to Find the Mean and
Standard Deviation
Enter Data, then go to
Stat → Summary Stats → Columns
Click on the variable name and
hit Calculate.
This calculates the mean,
standard deviation and other
statistics that will be used later.
3.2
What’s Unusual?
The Empirical Rule
and z-Scores
The Empirical Rule Graphically
Empirical Rule
The Empirical Rule: If a distribution is
unimodal and symmetric, then
Approximately 68% of the observations (roughly two-thirds) will be within one
standard deviation of the mean.
Approximately 95% of the observations will be
within two standard deviations of the mean.
Nearly all the observations will be within three
standard deviations of the mean.
Empirical Rule Example
The mean body weight for women between 18 and
25 years old is 134 lbs and the standard deviation is
26 lbs. Assume a mound shaped distribution.
134 – 26 = 108, 134 + 26 = 160
About 68% of women in this age group weigh
between 108 and 160 lbs.
134 – 2(26) = 82, 134 + 2(26) = 186
About 95% weigh between 82 and 186 lbs.
134 – 3(26) = 56, 134 + 3(26) = 212
Almost all weigh between 56 and 212 lbs.
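The three ranges can be generated with a small helper. This is a minimal sketch that assumes a mound shaped distribution, as the example does; the function name is made up for illustration.

```python
def empirical_rule_intervals(mean, sd):
    """Return the intervals within 1, 2, and 3 standard deviations of the mean."""
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

# Body weights for women aged 18 to 25 (mean and SD from the slide).
intervals = empirical_rule_intervals(mean=134, sd=26)

labels = {1: "about 68%", 2: "about 95%", 3: "nearly all"}
for k, (low, high) in intervals.items():
    print(f"{labels[k]}: {low} to {high} lbs")
```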
Using the Empirical Rule
High temperatures in San Francisco follow a
unimodal and symmetric distribution with
mean 65 degrees and standard deviation 8
degrees. Give a range of temperatures that
includes the middle 95% of high temperature
days in San Francisco.
65 – 2(8) = 49,
65 + 2(8) = 81
About 95% of all days in San Francisco have
high temperatures between 49 and 81 degrees.
Empirical Rule Example
Daily cash register receipts at a local store follow a
mound shaped distribution with mean $9,200 and
standard deviation $150. The day a new employee
was hired the store took in $4,500. Should the
manager be concerned?
9200 – 3(150) = 8750
Nearly all daily totals should be at least $8,750, and $4,500 is far below that.
Yes, the manager should be concerned, since it is
highly unlikely that such a low receipt total for the
day would happen by random chance alone.
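A quick check of the arithmetic, written as a sketch with illustrative variable names: compute the lower edge of the three standard deviation range and see how far below the mean the observed total falls.

```python
mean, sd = 9200, 150   # daily receipts in dollars (from the slide)
observed = 4500        # receipts on the new employee's first day

lower_3sd = mean - 3 * sd            # lower edge of the "nearly all" range
sds_below = (mean - observed) / sd   # distance below the mean, in SDs

print(f"3-SD lower bound: {lower_3sd}")
print(f"observed total is {sds_below:.1f} SDs below the mean")
```

A total more than 30 standard deviations below the mean is far outside anything the Empirical Rule would treat as ordinary variation.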
The Trouble With Evaluating if a Data
Value is Unusual
Is a height that is 2 less than the mean male height short? It depends on the units.
2 feet shorter is much shorter.
2 millimeters shorter is not much shorter.
Instead, statisticians normalize the values by
citing the Z-Score.
The Z-Score
The Z-Score measures the number of
standard deviations the value is from the
mean.
The resulting units are called Standard Units.
The Z-Score is used to compare values
measured in different units such as feet and
millimeters.
The Z-Score Formula
z = (x – x̄) / s
The mean price for a loaf of bread is $3.12
and the standard deviation is $0.89. Find the
z-Score for a loaf of bread that costs $2.00.
z = (2.00 – 3.12) / 0.89 ≈ -1.26
The z-Score is about -1.26.
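The bread example can be reproduced with a one-line function. This is a sketch of the z-score formula; the function and parameter names are illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean (signed)."""
    return (x - mean) / sd

# A $2.00 loaf when the mean price is $3.12 and the SD is $0.89.
z = z_score(2.00, mean=3.12, sd=0.89)
print(f"z = {z:.2f}")
```

The negative sign indicates that the loaf costs less than the mean price.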
Comparing values
What is more unusual: a value of 0.26 from a
distribution with mean 0.37 and standard deviation
0.03 or a value of 45 from a distribution with mean
38 and standard deviation 4?
z for 0.26: (0.26 – 0.37) / 0.03 ≈ -3.67
z for 45: (45 – 38) / 4 = 1.75
The value of 0.26 is more unusual since it has a z-score that is farther from 0.
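The comparison comes down to which z-score has the larger absolute value. A minimal sketch, with names chosen for this example:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean (signed)."""
    return (x - mean) / sd

z_a = z_score(0.26, mean=0.37, sd=0.03)  # first distribution
z_b = z_score(45, mean=38, sd=4)         # second distribution

# The value whose z-score is farther from 0 is the more unusual one.
more_unusual = 0.26 if abs(z_a) > abs(z_b) else 45
print(f"z_a = {z_a:.2f}, z_b = {z_b:.2f}, more unusual: {more_unusual}")
```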
3.3
Summaries for Skewed
Distributions
Skewness and the Trouble with the Mean
For a skewed distribution, the mean gets
“pulled” towards the tail.
The mean is also “pulled” towards outliers.
For a skewed distribution or a distribution
with only upper or only lower outliers, the
mean does not represent a typical value.
The Median to Represent the Center
The middle value, called the median, is often
a better representation of the center.
The median is defined by the middle number
or the average of the two middle numbers if
the sample size is even.
The median cuts the data in half. Typically
half the values are below the median and half
are above.
Median vs. Mean
The median income of $18,000 better
represents the typical income than the much
higher mean income.
The right tail greatly increases the mean but
only slightly increases the median.
Calculating the Median
Sort the data from smallest to largest.
If the set contains an odd number of observed
values, the median is the middle observed value.
If the set contains an even number of observed
values, the median is the average of the two
middle observed values. This places the median
precisely halfway between the two middle
values.
Example
The following data represent eight home prices in
thousands of dollars. Find the median:
123, 457, 278, 184, 216, 336, 192, 184
First sort from smallest to largest:
123, 184, 184, 192, 216, 278, 336, 457
Since there is an even number of values, take the
average of the middle two:
(192 + 216) / 2 = 204
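The sort-then-pick-the-middle procedure can be written out directly. The sketch below follows the steps on the slide and cross-checks the result against Python's built-in statistics.median.

```python
import statistics

# Eight home prices, in thousands of dollars (from the slide).
prices = [123, 457, 278, 184, 216, 336, 192, 184]

ordered = sorted(prices)   # sort from smallest to largest
n = len(ordered)

if n % 2 == 1:
    # Odd count: the median is the single middle value.
    median = ordered[n // 2]
else:
    # Even count: average the two middle values.
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(f"median = {median}")
```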
Quartiles
The First Quartile (Q1) is the value such that
25% of the data lie at or below this value.
Q1 is roughly the median of the lower half of
the data.
The Third Quartile (Q3) is the value such that
75% of the data lie at or below this value.
Q3 is roughly the median of the upper half of
the data.
The Interquartile Range (IQR)
The Interquartile Range (IQR) represents the
range of the middle 50% of the data.
Cut the ordered data into four equal parts.
The distance taken up by the middle two
parts is the interquartile range.
IQR = Q3 – Q1
Interpreting Q1, Q3, and IQR
The first quartile for birth weights is 3.1 kg and
the third quartile is 3.7 kg. Interpret Q1, Q3,
and the IQR.
Q1 = 3.1. This means that 25% of all babies are
born weighing at or below 3.1 kg.
Q3 = 3.7. This means that 75% of all babies are
born weighing at or below 3.7 kg.
Q3 – Q1 = 0.6. The middle half of all birth
weights has a range of 0.6 kg.
How the Quartiles and IQR are Used
Quartiles and the IQR are primarily used
when there are large data sets, for example:
National Exam Scores
Physical Measurements: Weight, Height
Cholesterol Levels, BMI, etc.
Income of state residents
Time to run one mile
The Range
The range is the distance spanned by the entire
data set.
Range = Maximum – Minimum
The range is easy to calculate, but is subject to
peculiarities of the data set and is very sensitive
to outliers.
A smaller sample size is likely to produce a
smaller range. The range of a sample is a poor
predictor of the range for the population.
3.4
Comparing Measures
of Center
Mean and Standard Deviation or Median
and IQR?
Use the mean and standard deviation when
the distribution is mound shaped.
Use the Median and IQR when the
distribution is skewed left or skewed right.
If the distribution is not unimodal, it may be
better to split the data.
Song lengths
Song lengths are skewed right because there are many short songs,
no negative length songs, but a few long songs.
The mean is influenced greatly by the right tail.
The median isn’t.
The median of 226 seconds better represents the
typical song. The IQR of 117 seconds covers the
high bars of the histogram.
San Francisco Temperatures
The distribution is approximately mound shaped.
With mound shaped distributions the mean and
median are nearly the same number.
The mean is preferred over the median if they are
close together.
One standard deviation from the mean gives a lower
bound of 57 and an upper bound of 73. This covers
the high bars of the histogram.
The Effect of Outliers
Number of employees at several businesses
on main street: 6, 7, 14, 18, 23, 25, 26
Mean = 17
Median = 18
If the 26 employee business is turned into
a Wal-Mart: 6, 7, 14, 18, 23, 25, 334
Mean = 61
Median = 18
Conclusion: The mean is strongly affected
by outliers, while the median is not affected
by outliers.
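The effect is easy to reproduce. The following sketch recomputes both summaries for the two versions of the main street data using Python's statistics module.

```python
import statistics

# Employees per business before and after one store becomes a Wal-Mart.
before = [6, 7, 14, 18, 23, 25, 26]
after = [6, 7, 14, 18, 23, 25, 334]

for label, data in (("before", before), ("after", after)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{label}: mean = {mean}, median = {median}")
```

Only the largest value changed, so the middle value stays at 18 while the mean moves from 17 to 61.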
Affected by Outliers?
Affected by Outliers:
Mean
Standard Deviation
Range
Not Affected by Outliers:
Median
Interquartile Range (IQR)
Bimodal Distributions
For most bimodal distributions, neither the
mean nor the median represents a typical value.
Investigate further to see if there are two
separate sub-populations.
Consider separating the two populations and
present their graphs and statistics
individually.
Trouble with Bimodal Distributions
There are two typical values.
Neither the mean nor the median describes the
typical values.
The data should be separated out by lunch
customers and dinner customers.
Separating Lunch and Dinner
Displaying the data with two histograms
allows a comparison between lunch and
dinner.
Separating Lunch and Dinner
The Lunch distribution is mound shaped and the
Dinner distribution is skewed right.
Do not compare the mean of one data set with the
median of another.
Use the medians for comparisons. The Lunch median is
$8 and the Dinner median is $22.
3.5
Using Boxplots for
Displaying Summaries
The Five Point Summary
When the data are partitioned into four equal
segments, five important numbers arise. They are
called the Five Point Summary:
Minimum – Smallest Value
First Quartile (Q1) – The Median of the Lower Half
Median – The Middle Number or Center
Third Quartile (Q3) – The Median of the Upper Half
Maximum – Largest Value
How many Boyfriends/Girlfriends?
The results of a survey asking how many
boyfriends/girlfriends people have had are
shown below:
0, 1, 1, 2, 3, 4, 4, 5, 6, 8, 10
The five point summary is:
Minimum = 0
Q1 = 1
Median = 4
Q3 = 6
Maximum = 10
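The summary can be computed with a short function. This is a sketch that assumes the "median of each half" convention for quartiles, leaving the overall median out of both halves when the sample size is odd; statistical software often uses slightly different quartile rules, so results can differ a little on other data sets. The function name is hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using medians of the lower and upper halves."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]           # values below the median position
    upper = ordered[half + n % 2:]   # values above the median position
    return (
        ordered[0],
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
        ordered[-1],
    )

survey = [0, 1, 1, 2, 3, 4, 4, 5, 6, 8, 10]
print(five_number_summary(survey))
```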
Potential Outliers
A Potential Outlier is a data value that is a
distance of more than 1.5 interquartile ranges
below the first quartile or above the third
quartile.
1. Calculate IQR = Q3 – Q1
2. Find m = Q1 – (1.5)(IQR)
3. Find M = Q3 + (1.5)(IQR)
4. Any values less than m or more than M are
potential outliers.
Finding Possible Outliers
The first quartile, Q1, for triglycerides is 109
mg/dL. The third quartile, Q3, is 150 mg/dL.
Determine which if any of the following triglyceride
readings are potential outliers:
38, 200, 225
IQR = 150 – 109 = 41
Q1 – (1.5)(IQR) = 109 – (1.5)(41) = 47.5
Q3 + (1.5)(IQR) = 150 + (1.5)(41) = 211.5
38 and 225 are potential outliers since 38 < 47.5
and 225 > 211.5.
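The 1.5 IQR rule translates into a few lines of code. A minimal sketch with illustrative names, applied to the triglyceride example:

```python
def outlier_fences(q1, q3):
    """Return the lower and upper cutoffs: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Triglyceride quartiles in mg/dL (from the slide).
lower, upper = outlier_fences(q1=109, q3=150)

readings = [38, 200, 225]
potential_outliers = [x for x in readings if x < lower or x > upper]

print(f"fences: {lower} and {upper}")
print(f"potential outliers: {potential_outliers}")
```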
Boxplots
A Boxplot is a chart that visually displays Q1, the
median, Q3, and the potential outliers.
To create a boxplot:
1. Plot the potential outliers.
2. Draw small vertical line segments at Q1, Q3, and
the median.
3. Draw a box with base from Q1 to Q3.
4. Sketch horizontal line segments from the ends of
the box to the smallest and largest values that are
not potential outliers.
Box Plot
Interpreting a Boxplot
What percent of students scored below 83%?
Answer: 25%
What percent of students scored between
83% and 92%?
Answer: 50%
Comparing Distributions with Boxplots
Both cities have similar typical temperatures.
Both cities have fairly symmetric distributions.
Provo has a much greater variation in
temperatures than San Francisco.
What Boxplots Show and Don’t Show
Boxplots Show:
Typical Range of Values
Possible Outliers
Variation
Boxplots Don’t Show:
Modality
Mean
Anything for small data sets, especially < 5.
Chapter 3
Case Study
Perceived Risk
Perceived Risk of Appliances
Skewed right for both men and women.
Unimodal for both men and women.
Women’s typical value slightly higher than men’s.
Five Point Summary appropriate for both.
Risk of Appliance:
Statistics
Men’s median is 10, women’s median is
higher at 15.
The middle 50% of men varied by 20, while
the variation was higher, 25 for women.
Perceived Risk X-rays
Relatively symmetric for both men and women.
Unimodal for both men and women.
Women’s typical value close to men’s.
Mean and standard deviation appropriate for both.
Risk of X-rays: Statistics
        Mean    Standard Deviation
Men     46.8    20
Women   47.8    20.8
Men and Women have similar mean and
standard deviation risk perception for X-rays.
About 68% of men perceive a risk between
26.8 and 66.8.
About 68% of women perceive a risk
between 27 and 68.6.
Chapter 3
Guided Exercise 1
The mean rate of violent crime in the west
was 406 per 100,000 people, and the standard
deviation was 177. Assume the distribution is
approximately unimodal and symmetric.
Between which two values would you expect to find
about 95% of the violent crime rates?
Between which two values would you expect to find
about 68% of the violent crime rates?
If a western state had a violent crime rate of 584
crimes per 100,000 people, would you consider this
unusual?
Would 30 crimes per 100,000 people be unusual?
By the Empirical Rule,
about 95% of the data
is within two standard
deviations of the mean.
This represents the green and blue areas together.
The number 583 represents one standard deviation
more than the mean: 406 + 177 = 583.
406 – 177 = 229
406 – 2(177) = 52
406 + 2(177) = 760
Between which two values
would you expect to find
about 95% of the violent
crime rates?
About 95% of the violent
crime rates are between
52 and 760 crimes per 100,000 people.
Between which two values would you expect to find
about 68% of the violent crime rates?
About 68% of the violent crime rates are between 229 and
583 crimes per 100,000 people.
If a western state had a violent
crime rate of 584 crimes per
100,000 people, would you
consider this unusual?
No, since 584 is within 2
standard deviations of
the mean.
Would 30 crimes per 100,000 people be unusual?
Yes, since 30 is more than 2 standard deviations below the
mean (below 52), and fewer than 5% of values fall that far from the mean.
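Both answers use the same test: a rate counts as unusual if it falls outside two standard deviations of the mean. A sketch of that check, with a hypothetical helper name:

```python
mean, sd = 406, 177   # violent crimes per 100,000 people (from the exercise)

# Range expected to hold about 95% of rates under the Empirical Rule.
low_95 = mean - 2 * sd
high_95 = mean + 2 * sd

def is_unusual(rate):
    """True when the rate lies more than 2 standard deviations from the mean."""
    return rate < low_95 or rate > high_95

for rate in (584, 30):
    print(f"{rate}: unusual = {is_unusual(rate)}")
```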
Chapter 3
Guided Exercise 2
The head circumferences in centimeters
for some men and women in a statistics
class are given.
Men: 58, 60, 62.5, 63, 59.5, 59, 60, 57, 55
Women: 63, 55, 54.5, 53.5, 53, 58.5, 56,
54.5, 55, 56, 56, 54, 56, 53, 51
Compare the circumferences of the men’s
and women’s heads.
Histograms of the two sets of Data.
Shapes
The distribution for
men is unimodal and
not too far from
symmetric.
The distribution for
women is unimodal
and nearly symmetric
except one possible
outlier.
Mean and Standard Deviation or
Quartiles and IQR?
Since the women’s
distribution has a
possible outlier, the
quartiles and IQR
should be used for
comparisons.
Compare Centers
The median head circumference for the men
was 59.5 cm, and the median head
circumference for the women was 55 cm.
This shows that the men tended to have
larger heads.
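The two medians can be verified directly from the data in the exercise. This sketch uses Python's statistics.median; the variable names are illustrative.

```python
import statistics

# Head circumferences in centimeters (from the exercise).
men = [58, 60, 62.5, 63, 59.5, 59, 60, 57, 55]
women = [63, 55, 54.5, 53.5, 53, 58.5, 56, 54.5, 55, 56, 56, 54, 56, 53, 51]

median_men = statistics.median(men)
median_women = statistics.median(women)
difference = median_men - median_women

print(f"men: {median_men} cm, women: {median_women} cm, difference: {difference} cm")
```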
Compare Variation
The interquartile range for the head
circumferences for the men was 2 cm, and
the interquartile range for the women was 2.5
cm. This shows that the women tended to
have more variation, as measured by the
interquartile range.
Outliers
Men: 58, 60, 62.5, 63, 59.5, 59, 60, 57, 55
Women: 63, 55, 54.5, 53.5, 53, 58.5, 56, 54.5, 55,
56, 56, 54, 56, 53, 51
Men: Q1 – (1.5)(IQR) = 55, Q3 + (1.5)(IQR) = 63
No possible outliers for the men.
Women: Q1 – (1.5)(IQR) = 49.75, Q3 + (1.5)(IQR) = 59.75
63 is a possible outlier for the women.
Final Comparison
The typical head circumference for men is
about 4.5 cm larger than the head
circumference for women. The women’s
head circumference had slightly more
variation than the men’s.