Describing Quantitative Data with Numbers

08.23.2016
Going over the HW
37. (a)

(b) The distribution is roughly symmetric with a midpoint of 6
hours. The hours of sleep vary from 3 to 11. There do not appear to
be any outliers
38.
(a)
 The distribution is skewed to the right with a mode of 0. There are three
outliers: China with 51 gold medals, USA with 36, and Great Britain with
19. The rest of the countries earned 7 or fewer.
 (b) No, this does not seem to be a representative sample because 17 out of
the 30 countries in the sample won gold medals. Overall, we might expect
less than half to win gold medals
39. (a) These dots represent games won by the other team


(b) Only two of the 34 scores are negative, meaning that the team was
very successful, scoring at least as many goals as the opponent in 32
games. In one extreme game, they beat the opponent by eight goals
40. (a) These dots represent cars that get 6 mpg better efficiency on the
highway than in the city
(b) The highway fuel efficiency is higher than the city fuel efficiency for all
cars. Most of the cars got at least 9 mpg more on the highway than in the
city. Only two cars got less than 7 mpg more on the highway than in the
city
41. (a) Answers will vary. The tail should be longer on the left

(b) Older coins are less common (due to being lost, destroyed, or taken
out of circulation), so there would be fewer of them
42. The shape of the distribution is fairly uniform—each number appears
with roughly the same frequency as each of the others in the last digit
of telephone numbers
43. While both groups have a range of about 20, the internal-reasons
group has a higher center (about 21 versus about 17.5 for the
external-reasons group)
44. This claim seems partially supported. Instead of the lowest shelf,
stores seem to put cereals with the most sugar on the middle shelf
45. (a) If we had not split the stems, most of our data would have
appeared on only a few stems


(b) 16%. This may be due to the power of the Mormon church in Utah
(c) Roughly symmetric with a center of 13% and a spread of roughly
3.5%. No outliers (other than Utah)
46. (a) If we had not split the stems, most of our data would
have appeared on only a few stems


(b) Key: 2|3 means that an 8-ounce serving of that soft drink
has 23 mg of caffeine
(c) Somewhat skewed to the right. The center is 28 mg and
the values range from 15 to 47 mg. All drinks meet the FDA
limits. No outliers.
47. (a) Answers vary. Split stems provide more detail, but are
possibly harder to interpret


(b) Relatively symmetric with center near 780 mm and a
range of 353 mm
(c) El Niño seems to reduce monsoon rainfall. Rainfall was
below average in 18 of 23 El Niño years
53.
54.

(a)
(b) Skewed to the right with center near 3 metric tons per person.
The range is 19.5 metric tons per person. There appear to be three
outliers: Canada, Australia, and USA
55. Somewhat skewed to the left with a center at 35. The
smallest DRP score is 14 and the largest is 54. There are no
gaps or outliers
56. Roughly symmetric. No clear outliers
57.
58. (a)
[Graph: vertical axis from 0 to 1200; horizontal axis: Chest Size in inches]

(b) Symmetric with a center around 40 inches and a range of
15 inches. This information might be useful to the military
when ordering uniforms
59.
60.
68.


The scale on the x-axis is not the same on the two graphs
Both graphs are skewed to the right, but the Yankees have a longer right tail.
This means the Yankees have a larger center and a larger spread. The median
salary for the Yankees is between $4,000,000 and $8,000,000, while the Phillies’ is
between $0 and $4,000,000. The Yankees have a range about twice as large as
the Phillies’ ($32 million vs. $16 million)
(a) Bar graph—radio station is categorical
(b) Dotplot, stemplot, or histogram—quantitative variable
(c) Dotplot, stemplot, or histogram—quantitative variable
69. A
70. D
71. C
72. B
73. B
74. D
Section 1.3
 Mean, median, and mode
 What is the difference?
The Mean
Note: the x-bar notation only applies to the mean of a sample, not the mean of a
population
However, the calculations are the same
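For reference, the usual definition of the sample mean in x-bar notation. The symbols n (number of observations) and x_1, ..., x_n (the individual observations) follow common textbook convention and are an assumption here:

```latex
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i
```

A population mean is computed the same way but is usually written with the Greek letter mu instead of x-bar.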
Let’s Try it
 There are only 4 of us that are 18 years or older (including
me)
 For that sample of four people, we have lived in Colorado for
the following lengths of time: 24 years, 18 years, 18 years, and
7 years.
 What is the mean of these data?
 16.75 years
 Now let’s remove the 7 years observation
 What happens to the mean?
 Now 20 years—big difference
 What does this tell us about the mean as a way to measure the center of a
dataset?
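The arithmetic above can be checked with a minimal Python sketch. The variable names are illustrative; the data are the four observations from the slide:

```python
from statistics import mean

# Years lived in Colorado for the sample of four people
years_in_colorado = [24, 18, 18, 7]

# Mean of all four observations: (24 + 18 + 18 + 7) / 4
mean_all = mean(years_in_colorado)

# Drop the 7-year observation and recompute: (24 + 18 + 18) / 3
without_7 = [x for x in years_in_colorado if x != 7]
mean_without_7 = mean(without_7)

print(mean_all)        # 16.75
print(mean_without_7)  # 20
```

Removing a single low value moves the mean by more than 3 years, which suggests the mean is sensitive to extreme observations.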
An Alternative: The Median
Let’s Try it
 Same data: 24 years, 18 years, 18 years, and 7 years
 What is the median?
 18 years
 Now do it removing the 7 years observation
 What is the new median?
 Still 18 years
 What does this tell us about the median?
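The same check for the median, as a small Python sketch (variable names are illustrative):

```python
from statistics import median

years_in_colorado = [24, 18, 18, 7]

# Sorted: 7, 18, 18, 24 -> even count, so average the two middle values
median_all = median(years_in_colorado)

# Sorted: 18, 18, 24 -> odd count, so take the middle value
median_without_7 = median([24, 18, 18])

# Both medians equal 18
print(median_all == 18, median_without_7 == 18)
```

Dropping the 7-year observation leaves the median unchanged, which is one sense in which the median resists extreme values.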
Mean or Median?
 It depends…
 When describing a distribution, especially a skewed one or one with outliers, the median is usually more useful
 For some calculations, the mean might be more appropriate
 Taxes
 GPA
What about the mode?
 The least often used—except on standardized tests
 Simply the most common value for a variable
 So…in our 4-observation dataset of years spent in Colorado,
what is the mode?
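A quick way to check the answer, sketched in Python (the variable names are illustrative):

```python
from statistics import mode

years_in_colorado = [24, 18, 18, 7]

# The mode is the value that occurs most often
most_common = mode(years_in_colorado)

print(most_common)  # 18
```

Here 18 appears twice and every other value appears once, so 18 is the mode.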
Beyond the Center
 In practice, we often care about much more than just the
center of the data
 The average temperature is the same in San Francisco as in
Springfield (MO)
 Despite very different temperatures
 What does the mean/median fail to capture?
 Variability
 Can be measured in terms of the range
 Any problems with using the range to describe variability?
Variability
 The Range
 Weakness: depends on the minimum and maximum values
 Particularly if they are outliers, this could be a problem
 Interquartile range (IQR)
 Looks at the range of the middle half (50%) of the data
 1st quartile is the point that separates the bottom quarter of data from
the second-from-the-bottom quarter
 2nd quartile is the median
 3rd quartile is the point that separates the top quarter of data from the
second-from-the-top quarter
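The range and IQR can be compared on the Colorado data with a short Python sketch. Quartile conventions differ between textbooks and software; this sketch assumes the median-of-halves method (Q1 is the median of the values below the median position, Q3 the median of the values above it). The function name `quartiles` is hypothetical:

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method (an assumed convention)."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # values below the median position
    upper = ordered[half + n % 2:]  # values above the median position
    return median(lower), median(ordered), median(upper)

years = [24, 18, 18, 7]
q1, med, q3 = quartiles(years)

data_range = max(years) - min(years)  # depends only on the two extremes
iqr = q3 - q1                         # spread of the middle half

print(data_range)  # 17
print(iqr)         # 8.5
```

The range is driven entirely by the 7 and the 24, while the IQR depends on the middle half of the data.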
Defining Outliers
5-number summary
So, let’s do it using our heights:
Min   Q1   Med   Q3   Max
60    64   67    70   77
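One common way to define outliers from a five-number summary is the 1.5 x IQR rule. The rule is assumed here rather than taken from the slide. A minimal Python sketch using the heights summary above:

```python
# Five-number summary of the heights from the slide
minimum, q1, med, q3, maximum = 60, 64, 67, 70, 77

iqr = q3 - q1  # 70 - 64 = 6

# Assumed convention: a value is an outlier if it lies more than
# 1.5 * IQR below Q1 or above Q3
lower_fence = q1 - 1.5 * iqr  # 64 - 9 = 55.0
upper_fence = q3 + 1.5 * iqr  # 70 + 9 = 79.0

has_low_outlier = minimum < lower_fence
has_high_outlier = maximum > upper_fence

print(lower_fence, upper_fence)           # 55.0 79.0
print(has_low_outlier, has_high_outlier)  # False False
```

Under this rule, neither the minimum (60) nor the maximum (77) falls outside the fences, so the heights data would have no outliers.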
Boxplots
 In our example, the median is exactly halfway between the 1st and
3rd quartiles. This does not always happen
 Similarly, you’ll notice that one whisker is longer than the
other
 This is totally normal
 What does that tell us about the skewness of our data?
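One way to think about the skewness question is to compare the two whisker lengths. This Python sketch uses the heights summary (min 60, Q1 64, median 67, Q3 70, max 77) and assumes the whiskers run all the way to the minimum and maximum, which holds when there are no outliers:

```python
# Five-number summary of the heights
minimum, q1, med, q3, maximum = 60, 64, 67, 70, 77

# Whisker lengths, assuming whiskers extend to the min and max
lower_whisker = q1 - minimum  # 64 - 60 = 4
upper_whisker = maximum - q3  # 77 - 70 = 7

# The two halves of the box
lower_box = med - q1  # 67 - 64 = 3
upper_box = q3 - med  # 70 - 67 = 3

print(lower_whisker, upper_whisker)  # 4 7
print(lower_box, upper_box)          # 3 3
```

The box is balanced around the median, but the upper whisker is longer than the lower one, which hints at a mild skew to the right.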
Standard Deviation
 Most common way of measuring the spread of a distribution
 Essentially measuring how far, on average, the values in the
distribution are from the mean
 So the mean is important here
 If you have reason to think the mean is not ideal, standard
deviation might not be ideal either
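For reference, the usual formula for the standard deviation of a sample. The notation s_x, n, x_i, and x-bar follows common textbook convention and is an assumption here:

```latex
s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}
```

Each term measures how far one observation is from the mean. Dividing by n - 1 rather than n means this is only approximately the "average" distance.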
 On the AP test, you will be given the formula for standard
deviation
 You do not need to memorize it
 But you do need to understand what the formula means
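To see what the formula means, here is a step-by-step Python sketch using the Colorado data from earlier (24, 18, 18, and 7 years). It assumes the sample version of the standard deviation, which divides by n - 1; the variable names are illustrative:

```python
from math import sqrt
from statistics import mean, stdev

years = [24, 18, 18, 7]

# Step 1: find the mean
x_bar = mean(years)  # 16.75

# Step 2: how far is each value from the mean?
deviations = [x - x_bar for x in years]  # 7.25, 1.25, 1.25, -9.75

# Step 3: square the deviations so negatives do not cancel positives
squared = [d ** 2 for d in deviations]

# Step 4: "average" the squared deviations, dividing by n - 1 (sample version)
variance = sum(squared) / (len(years) - 1)  # 150.75 / 3 = 50.25

# Step 5: take the square root to get back to the original units (years)
s = sqrt(variance)

print(round(s, 2))                   # 7.09
print(abs(s - stdev(years)) < 1e-9)  # True: matches the library's sample standard deviation
```

The 7-year observation contributes the largest squared deviation, so it has a strong influence on the standard deviation, just as it does on the mean.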
You may now begin your homework
 Chapter 1: 91-97, 103-110