Chapter 4 Displaying Quantitative Data

Dealing With a Lot of Numbers...
• When looking at large sets of quantitative data, it can be difficult to get a sense of what the numbers are telling us without summarizing the numbers in some way.
• In this chapter, we will concentrate on graphical displays of quantitative data.

Percent of Population over 65 per state (1996)
13.0 14.3 12.5 5.2 12.8 12.6 13.2 18.5 15.2 14.4
9.9 13.7 10.5 12.9 12.6 11.0 11.4 11.4 13.8 13.2
13.9 11.4 14.1 12.4 12.4 12.3 13.8 11.4 12.0 13.8
11.0 13.4 12.5 14.5 13.4 13.5 13.4 15.9 15.8 12.1
14.4 12.5 10.2 8.8 12.1 11.2 11.6 15.2 13.3 11.2
What do these data tell us?

Make a picture
• Histogram
• Stem-and-Leaf Display
• Dot plot

First three things to do with data
1. Make a picture
2. Make a picture
3. Make a picture
Displaying Quantitative Data

Histogram
• Give each graph a title
• Give each of the axes a label
• Make it as neat as possible:
  – Computer
  – Grid paper
Displaying Quantitative Data

Histogram
• Divide the data values into equal-width piles (called bins)
• Count the number of values in each bin
• Plot the bins on the x-axis
• Plot the bin counts on the y-axis
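The four histogram steps above can be sketched in a few lines of Python. This is an illustration under simple assumptions, not a plotting tool: the helper names `bin_counts` and `text_histogram` are hypothetical, the data are made-up values, and the "plot" is text, with one `*` per value.

```python
import math


def bin_counts(values, start, width, num_bins):
    """Count how many values fall in each equal-width bin.

    Bin k covers start + k*width <= x < start + (k+1)*width.
    """
    counts = [0] * num_bins
    for x in values:
        k = math.floor((x - start) / width)  # index of the bin containing x
        if not 0 <= k < num_bins:
            raise ValueError(f"{x} is outside the chosen bins")
        counts[k] += 1
    return counts


def text_histogram(counts, start, width):
    """Print one row per bin: the bin on the left, a bar of '*' for its count."""
    for k, c in enumerate(counts):
        low = start + k * width
        print(f"{low:5.1f} <= X < {low + width:5.1f} | {'*' * c}")


# Hypothetical data, only to show the steps.
data = [1.0, 1.5, 2.2, 3.9, 3.1, 3.4]
counts = bin_counts(data, start=1.0, width=1.0, num_bins=3)
text_histogram(counts, start=1.0, width=1.0)
```

The bars here run sideways (bins listed down the left, counts stretching to the right), which is the same picture as a histogram turned 90 degrees; a computer graphing program or grid paper would put the bins on the x-axis as described above.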

Example – Population Over 65

Decide on bin values
• Low value is 5.2 and high value is 18.5
• Bins are 5.0 up to 6.0, 6.0 up to 7.0, etc.
• Written as 5.0 ≤ X < 6.0, 6.0 ≤ X < 7.0, ...
Count the number of values in each bin
• Bin 5.0 ≤ X < 6.0 has 1 value
• Bin 6.0 ≤ X < 7.0 has 0 values
• Bin 7.0 ≤ X < 8.0 has 0 values
• Bin 8.0 ≤ X < 9.0 has 1 value
• Continue counting the values in each bin
Example – Population Over 65

Plot the bins on the x-axis
• 14 bins, from 5.0 ≤ X < 6.0 to 18.0 ≤ X < 19.0
Plot the bin counts on the y-axis
• Bin counts are: 1, 0, 0, 1, 1, 2, 9, 13, 13, 5, 4, 0, 0, 1
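As a check on the counts above, the following sketch recounts the 50 state values. It assumes bins of width 1 that start at whole numbers, so the bin for a value is simply its whole-number part (`math.floor`).

```python
import math
from collections import Counter

# Percent of population over 65, per state (1996), as listed above.
pct_over_65 = [
    13.0, 14.3, 12.5, 5.2, 12.8, 12.6, 13.2, 18.5, 15.2, 14.4,
    9.9, 13.7, 10.5, 12.9, 12.6, 11.0, 11.4, 11.4, 13.8, 13.2,
    13.9, 11.4, 14.1, 12.4, 12.4, 12.3, 13.8, 11.4, 12.0, 13.8,
    11.0, 13.4, 12.5, 14.5, 13.4, 13.5, 13.4, 15.9, 15.8, 12.1,
    14.4, 12.5, 10.2, 8.8, 12.1, 11.2, 11.6, 15.2, 13.3, 11.2,
]

# With width-1 bins starting at whole numbers, a value's bin is its
# whole-number part: 13.4 goes in 13.0 <= X < 14.0, and so on.
by_bin = Counter(math.floor(x) for x in pct_over_65)

# Read off the 14 bins from 5.0 <= X < 6.0 up to 18.0 <= X < 19.0,
# including the empty ones (a Counter reports 0 for missing keys).
bin_counts = [by_bin[b] for b in range(5, 19)]
print(bin_counts)
```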
Displaying Quantitative Data

Stem-and-Leaf Display
• A picture of the distribution
• Generally used for smaller data sets
• Groups data like a histogram does
• Still shows the original values (unlike a histogram)
• Two columns:
  – Left column: stem
  – Right column: leaf
Displaying Quantitative Data

Stem-and-Leaf Display

Leaf
• Contains the last digit of each value
• Arranged in increasing order away from the stem

Stem
• Contains the rest of each value
• Arranged in increasing order from top to bottom
Example – Population Over 65
• Leaf = tenths digit
• Stem = tens and ones digits
• Ex. 5 | 2 means 5.2
• Ex. 10 | 2 5 means 10.2 and 10.5
• Ex. 14 | 1 3 4 4 5 means 14.1, 14.3, 14.4, 14.4, 14.5

Percent of Population over Age 65 (by state) in 1996
 5 | 2
 6 |
 7 |
 8 | 8
 9 | 9
10 | 2 5
11 | 0 0 2 2 4 4 4 4 6
12 | 0 1 1 3 4 4 5 5 5 6 6 8 9
13 | 0 2 2 3 4 4 4 5 7 8 8 8 9
14 | 1 3 4 4 5
15 | 2 2 8 9
16 |
17 |
18 | 5
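A display like this one can also be generated. The sketch below assumes the rule from the example (stem = tens and ones digits, leaf = tenths digit); `stem_and_leaf` is a hypothetical helper name, and each value is converted to whole tenths with `round(x * 10)` so that floating-point rounding cannot shift a leaf.

```python
from collections import defaultdict

# Percent of population over 65, per state (1996), as listed earlier.
pct_over_65 = [
    13.0, 14.3, 12.5, 5.2, 12.8, 12.6, 13.2, 18.5, 15.2, 14.4,
    9.9, 13.7, 10.5, 12.9, 12.6, 11.0, 11.4, 11.4, 13.8, 13.2,
    13.9, 11.4, 14.1, 12.4, 12.4, 12.3, 13.8, 11.4, 12.0, 13.8,
    11.0, 13.4, 12.5, 14.5, 13.4, 13.5, 13.4, 15.9, 15.8, 12.1,
    14.4, 12.5, 10.2, 8.8, 12.1, 11.2, 11.6, 15.2, 13.3, 11.2,
]


def stem_and_leaf(values, first_stem, last_stem):
    """Return display lines with stem = tens and ones digits, leaf = tenths digit."""
    leaves = defaultdict(list)
    for x in values:
        tenths = round(x * 10)           # 12.4 -> 124 (whole tenths)
        stem, leaf = divmod(tenths, 10)  # 124 -> stem 12, leaf 4
        leaves[stem].append(leaf)
    lines = []
    for stem in range(first_stem, last_stem + 1):  # keep empty stems too
        row = " ".join(str(leaf) for leaf in sorted(leaves[stem]))
        lines.append(f"{stem:>2} | {row}".rstrip())
    return lines


display = stem_and_leaf(pct_over_65, first_stem=5, last_stem=18)
print("\n".join(display))
```

Empty stems (6, 7, 16, 17) are kept on purpose: leaving them out would hide the gaps that make 5.2 and 18.5 stand out.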
Example – Frank Thomas

Career Home Runs (1990-2004)
4 7 15 18 24 28 29 32 35 38 40 40 41 42 43
0 | 4 7
1 | 5 8
2 | 4 8 9
3 | 2 5 8
4 | 0 0 1 2 3
Displaying Quantitative Data

Back-to-Back Stem-and-Leaf Display
• Used to compare two sets of data (for example, the same variable for two groups)
• Stems in the center column
• Leaves for one data set – right side
• Leaves for the other data set – left side
• Arrange leaves in increasing order, AWAY FROM THE STEM!

Example – Compare Frank Thomas to Ryne Sandberg

Career Home Runs for Ryne Sandberg (1981-1997)
0 5 7 8 9 12 14 16 19 19 25 26 26 26 30 40

Left leaves: Sandberg   Right leaves: Thomas
9 8 7 5 0 | 0 | 4 7
9 9 6 4 2 | 1 | 5 8
  6 6 6 5 | 2 | 4 8 9
        0 | 3 | 2 5 8
        0 | 4 | 0 0 1 2 3
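The back-to-back rule (leaves increase away from the stem on both sides) can be sketched in Python. Here stems are tens digits and leaves are ones digits, as in the home-run displays; the helper names `leaves_by_stem` and `back_to_back` are hypothetical. The only trick is reversing the left-hand leaves so that they grow leftward.

```python
from collections import defaultdict

# Career home runs per season, as listed on the slides.
thomas = [4, 7, 15, 18, 24, 28, 29, 32, 35, 38, 40, 40, 41, 42, 43]
sandberg = [0, 5, 7, 8, 9, 12, 14, 16, 19, 19, 25, 26, 26, 26, 30, 40]


def leaves_by_stem(values):
    """Group values by tens digit (stem); leaves are ones digits in increasing order."""
    groups = defaultdict(list)
    for v in sorted(values):
        groups[v // 10].append(v % 10)
    return groups


def back_to_back(left, right):
    """Build display lines; leaves increase away from the stem on both sides."""
    left_groups, right_groups = leaves_by_stem(left), leaves_by_stem(right)
    stems = range(min(min(left_groups), min(right_groups)),
                  max(max(left_groups), max(right_groups)) + 1)
    # Reverse the left-hand leaves so the smallest leaf sits next to the stem.
    left_text = {s: " ".join(str(x) for x in reversed(left_groups[s])) for s in stems}
    width = max(len(t) for t in left_text.values())
    lines = []
    for s in stems:
        right_text = " ".join(str(x) for x in right_groups[s])
        lines.append(f"{left_text[s]:>{width}} | {s} | {right_text}".rstrip())
    return lines


b2b = back_to_back(sandberg, thomas)  # Sandberg on the left, Thomas on the right
print("\n".join(b2b))
```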
Displaying Quantitative Data
• If a large number of observations fall on only a few stems, we can split stems.
• Split each stem into two stems:
  – The first stem holds leaves 0 – 4.
  – The second stem holds leaves 5 – 9.
• If you choose to split one stem, you MUST split them all!
Example – Population Over 65
12 | 0 1 1 3 4 4 5 5 5 6 6 8 9
13 | 0 2 2 3 4 4 4 5 7 8 8 8 9
Split stems:
12 | 0 1 1 3 4 4
12 | 5 5 5 6 6 8 9
13 | 0 2 2 3 4 4 4
13 | 5 7 8 8 8 9
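Splitting stems is mechanical: leaves 0-4 go on the first line for a stem, leaves 5-9 on the second, and every stem gets split. The sketch below (hypothetical helper `split_stems`) reproduces the split of stems 12 and 13 in this example.

```python
def _join(leaves):
    """Format a list of leaves as space-separated digits."""
    return " ".join(str(x) for x in leaves)


def split_stems(stem, leaves):
    """Split one stem into two lines: leaves 0-4 first, then leaves 5-9."""
    leaves = sorted(leaves)
    low = [x for x in leaves if x <= 4]
    high = [x for x in leaves if x >= 5]
    return [f"{stem} | {_join(low)}".rstrip(), f"{stem} | {_join(high)}".rstrip()]


# Stems 12 and 13 of the population-over-65 display.
rows = {
    12: [0, 1, 1, 3, 4, 4, 5, 5, 5, 6, 6, 8, 9],
    13: [0, 2, 2, 3, 4, 4, 4, 5, 7, 8, 8, 8, 9],
}

split_display = []
for stem in sorted(rows):  # split every stem, never just one
    split_display.extend(split_stems(stem, rows[stem]))
print("\n".join(split_display))
```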
Looking at Distributions

Always report 3 things when describing a distribution:
1. Shape
2. Center
3. Spread
Looking at Distributions

Shape

How many humps (called modes)?
• None = uniform
• One = unimodal
• Two = bimodal
• Three or more = multimodal
Unimodal vs Bimodal
[Figure: two histograms, "Histogram of Octane Rating" (Frequency vs. Octane, 86 to 96) and "Size of Diamonds (carats)" (Frequency vs. Size, 0.1 to 0.4 carats)]
Looking at Distributions

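Shape

Is it symmetric?
• Symmetric = roughly the same shape on both sides of the center
• Skewed = one tail stretches out farther than the other
  – Skewed right = tail stretches toward large values
  – Skewed left = tail stretches toward small values

Are there any outliers?
• Values that stand apart from the overall pattern; they are interesting observations in the data
• They can strongly affect statistical methods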
Examples of Skewness
Looking at Distributions

Center
• A single number that describes a typical value of the data
• There are different ways to calculate the center
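Two common ways of calculating a center are the mean (the average) and the median (the middle value of the sorted data). They are used here only to illustrate "different numbers for center"; the sketch applies Python's standard `statistics.mean` and `statistics.median` to the two players' home-run data from the earlier examples.

```python
from statistics import mean, median

# Career home runs per season, as listed on the slides.
thomas = [4, 7, 15, 18, 24, 28, 29, 32, 35, 38, 40, 40, 41, 42, 43]
sandberg = [0, 5, 7, 8, 9, 12, 14, 16, 19, 19, 25, 26, 26, 26, 30, 40]

centers = {}
for name, hrs in [("Thomas", thomas), ("Sandberg", sandberg)]:
    # mean: the arithmetic average
    # median: the middle value of the sorted data (the average of the two
    # middle values when the count is even)
    centers[name] = (mean(hrs), median(hrs))
    print(f"{name:8} mean = {centers[name][0]:.1f}, median = {centers[name][1]}")
```

For these data the medians are 32 for Thomas and 17.5 for Sandberg; Sandberg has 16 seasons, an even count, so his median is the average of the two middle values, 16 and 19.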

Looking at Distributions

Spread

How much the data values vary
• From the smallest observation to the largest observation
• Note any outliers, since they stretch this range
• Later, spread will be summarized by a single number
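For now, spread is described by the smallest and largest observations. The minimal sketch below does this for the population-over-65 data and also shows how much the two most extreme values stretch that range; dropping exactly one value from each end is an assumption for this example, not a general rule for handling outliers.

```python
# Percent of population over 65, per state (1996), as listed earlier.
pct_over_65 = [
    13.0, 14.3, 12.5, 5.2, 12.8, 12.6, 13.2, 18.5, 15.2, 14.4,
    9.9, 13.7, 10.5, 12.9, 12.6, 11.0, 11.4, 11.4, 13.8, 13.2,
    13.9, 11.4, 14.1, 12.4, 12.4, 12.3, 13.8, 11.4, 12.0, 13.8,
    11.0, 13.4, 12.5, 14.5, 13.4, 13.5, 13.4, 15.9, 15.8, 12.1,
    14.4, 12.5, 10.2, 8.8, 12.1, 11.2, 11.6, 15.2, 13.3, 11.2,
]

smallest, largest = min(pct_over_65), max(pct_over_65)
print(f"All 50 values lie between {smallest} and {largest}")

# Drop one value from each end (here 5.2 and 18.5, the two values that
# stand apart from the rest) to see where almost all observations lie.
middle = sorted(pct_over_65)[1:-1]
print(f"The other 48 lie between {middle[0]} and {middle[-1]}")
```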
Example – Population Over 65

Shape
• Unimodal
• Symmetric
• Two outliers (5.2% and 18.5%)

Center - about 12.7% (median)
Spread - almost all observations are between 8% and 16%

Example – Frank Thomas
0 | 4 7
1 | 5 8
2 | 4 8 9
3 | 2 5 8
4 | 0 0 1 2 3
• Shape
  – Unimodal
  – Skewed left
  – No outliers
• Center - about 32 (median)
• Spread - between 4 and 43
Example – Compare Frank
Thomas to Ryne Sandberg
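Left leaves: Sandberg   Right leaves: Thomas
9 8 7 5 0 | 0 | 4 7
9 9 6 4 2 | 1 | 5 8
  6 6 6 5 | 2 | 4 8 9
        0 | 3 | 2 5 8
        0 | 4 | 0 0 1 2 3
Ryne Sandberg:
• Shape
  – Unimodal
  – Skewed right
  – No outliers
• Center - about 17.5 (median)
• Spread - between 0 and 40
Comparison:
• Both players have about the same spread
• Thomas has more high values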
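The comparison above can be checked numerically. In this sketch, a "high value" is taken to mean a season with 30 or more home runs; that cutoff (`HIGH`) is an assumption chosen for illustration, not something the slides define.

```python
# Career home runs per season, as listed on the slides.
thomas = [4, 7, 15, 18, 24, 28, 29, 32, 35, 38, 40, 40, 41, 42, 43]
sandberg = [0, 5, 7, 8, 9, 12, 14, 16, 19, 19, 25, 26, 26, 26, 30, 40]

HIGH = 30  # assumed cutoff for a "high" season; not defined on the slides

summary = {}
for name, hrs in [("Thomas", thomas), ("Sandberg", sandberg)]:
    high_seasons = sum(h >= HIGH for h in hrs)  # each True counts as 1
    summary[name] = (min(hrs), max(hrs), high_seasons)
    low, top, n_high = summary[name]
    print(f"{name:8} spread {low} to {top} (width {top - low}), "
          f"seasons with {HIGH}+ HR: {n_high}")
```

For these data Thomas spans 4 to 43 and Sandberg 0 to 40, nearly the same width, while Thomas has 8 seasons of 30 or more home runs to Sandberg's 2.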
What Do We Know?


• Histograms, stem-and-leaf displays, back-to-back stem-and-leaf displays
When describing a display, always mention:
• Shape: number of modes, symmetric or skewed
• Spread
• Center
• Outliers (mention them if they exist; otherwise, say there are no outliers)
What Do We Know? (cont.)
• A graph is either symmetric or skewed, not both!
• If a graph is skewed, be sure to specify the direction: skewed left or skewed right