Chapter 4 - peacock
Chapter 4
Displaying and
Summarizing
Quantitative Data
Performance Scales:
1. Know the goals and the plots on the real number line
(dot plots, histograms, and box plots).
2. Use the goals to understand the plots on the real
number line (dot plots, histograms, and box plots).
3. Use the goals to represent data with plots on the real
number line (dot plots, histograms, and box plots)
MAFS.912.S-ID.1.1.
4. Adapts and applies the goals to represent data with
plots on the real number line (dot plots, histograms,
and box plots) in different and more complex
problems.
Learning Goals
1. Know how to display the distribution of a
quantitative variable with a histogram, a
stem-and-leaf display, or a dotplot.
2. Know how to display the relative position of a
quantitative variable with a Cumulative
Frequency Curve and analyze the
Cumulative Frequency Curve.
3. Be able to describe the distribution of a
quantitative variable in terms of its shape.
4. Be able to describe any anomalies or
extraordinary features revealed by the
display of a variable.
Learning Goals
5. Be able to determine the shape of the
distribution of a variable by knowing
something about the data.
6. Know the basic properties and how to
compute the mean and median of a set of
data.
7. Understand the properties of a skewed
distribution.
8. Know the basic properties and how to
compute the standard deviation and IQR of
a set of data.
Learning Goals
9. Understand which measures of center and
spread are resistant and which are not.
10. Be able to select a suitable measure of
center and a suitable measure of spread for
a variable based on information about its
distribution.
11. Be able to describe the distribution of a
quantitative variable in terms of its shape,
center, and spread.
Learning Goal 1
Know how to display
the distribution of a
quantitative variable
with a histogram, a
stem-and-leaf display,
or a dotplot
Learning Goal 1:
Ways to Graph Quantitative Data
Histograms and Stemplots
These are summary graphs for a single variable. They are
very useful to understand the pattern of variability in the
data.
Dotplots
Quick and easy graph for small data sets.
Cumulative Frequency Curves (Ogive)
Used to compare relative standings of the data.
Line Graphs: Time Plots
Use when there is a meaningful sequence, like time. The
line connecting the points helps emphasize any change
over time.
Learning Goal 1:
Dealing With a Lot of Numbers…
Summarizing the data will help us
when we look at large sets of
quantitative data.
Without summaries of the data, it’s
hard to grasp what the data tell us.
The best thing to do is to make a
picture…
We can’t use bar charts or pie charts
for quantitative data, since those
displays are for categorical variables.
Learning Goal 1:
Tabulating Numerical Data
What is a Frequency Distribution (table)?
A frequency distribution is a list or a table
containing class groupings (ranges within
which the data fall)
and the corresponding frequencies with
which data fall within each grouping or class.
Learning Goal 1:
Why Use a Frequency Distribution?
It is a way to summarize numerical
data.
It condenses the raw data into a more
useful form.
It allows for a quick visual
interpretation of the data.
Quantitative Data
HISTOGRAM
Learning Goal 1:
Histograms
A Histogram is a graph
that uses bars to portray
the frequencies or the
relative frequencies of the
possible outcomes for a
quantitative variable.
Learning Goal 1:
Histograms
The most common graph used to
display one-variable quantitative data.
Learning Goal 1:
Histograms
To make a histogram we first need to
organize the data using a quantitative
frequency table.
Two types of quantitative data
1. Discrete – use ungrouped frequency table to
organize.
2. Continuous – use grouped frequency table to
organize.
Learning Goal 1:
Quantitative Frequency Tables – Ungrouped
• What is an ungrouped frequency
table? An ungrouped frequency
table simply lists the data values
with the corresponding frequency
counts with which each value
occurs.
• Commonly used with discrete
quantitative data.
Learning Goal 1:
Quantitative Frequency Tables – Ungrouped
• Example: The at-rest pulse rates for 16
athletes at a meet were
57, 57, 56, 57, 58, 56, 54, 64, 53,
54, 54, 55, 57, 55, 60, and 58.
Summarize the information with an
ungrouped frequency distribution.
Learning Goal 1:
Quantitative Frequency Tables – Ungrouped
Example continued: 57, 57, 56, 57, 58, 56, 54, 64, 53,
54, 54, 55, 57, 55, 60, 58.
Note: The (ungrouped) classes are the observed values themselves.

Class (pulse rate)    Frequency, f
53                    1
54                    3
55                    2
56                    2
57                    4
58                    2
59                    0
60                    1
61                    0
62                    0
63                    0
64                    1
Total                 n = 16
Learning Goal 1:
Quantitative Relative Freq. Tables - Ungrouped
Class (pulse rate)    Frequency, f    Relative Frequency
53                    1               0.0625
54                    3               0.1875
55                    2               0.1250
56                    2               0.1250
57                    4               0.2500
58                    2               0.1250
59                    0               0
60                    1               0.0625
61                    0               0
62                    0               0
63                    0               0
64                    1               0.0625
Total                 n = 16          1

Note: The relative freq. for a class is obtained by computing f/n.
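The tally above can be reproduced with a few lines of Python. This is only a sketch of the f/n computation; the helper name `ungrouped_table` is made up for illustration, and listing every integer between the smallest and largest value (including zero-frequency classes) is an assumption that matches the table on this slide.

```python
from collections import Counter

# At-rest pulse rates for the 16 athletes (from the example).
pulse_rates = [57, 57, 56, 57, 58, 56, 54, 64, 53, 54, 54, 55, 57, 55, 60, 58]

def ungrouped_table(values):
    """Return (class, frequency, relative frequency) rows for every integer
    from the smallest to the largest observed value."""
    counts = Counter(values)   # missing values count as 0
    n = len(values)
    return [(v, counts[v], counts[v] / n)
            for v in range(min(values), max(values) + 1)]

table = ungrouped_table(pulse_rates)
for value, f, rel in table:
    print(f"{value:>5} {f:>3} {rel:.4f}")
```

The relative frequencies should always add up to 1, which is a quick check on the tally.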
Learning Goal 1:
Relative Freq. Tables – Your Turn
TVs per Household. Trends in Television,
published by the Television Bureau of
Advertising, provides information on television
ownership. The table gives the number of TV
sets per household for 50 randomly selected
households. Use classes based on a single
value to construct an ungrouped-data relative
frequency table for these data.
Learning Goal 1:
Relative Freq. Tables – Solution
Learning Goal 1:
Quantitative Frequency Tables – Grouped
• What is a grouped frequency table? A
grouped frequency table is obtained by
constructing classes (or intervals) for
the data, and then listing the
corresponding number of values
(frequency counts) in each interval.
• Commonly used with continuous
quantitative data.
Learning Goal 1:
Quantitative Frequency Tables – Grouped
Class: an interval of values.
Example: 61 ≤ x ≤ 70.
Frequency: the number of data values
that fall within a class.
“Five data values fall within the class 61 ≤ x ≤ 70”.
Relative Frequency: the proportion of
data values that fall within a class.
“18% of the data fall within the class
61 ≤ x ≤ 70”.
Learning Goal 1:
Grouped Frequency Tables – Example
A frequency table
organizes quantitative data.
partitions data into classes (intervals).
shows how many data values are in each class.
Test Score    Number of Students
61-70         4
71-80         8
81-90         15
91-100        7
Learning Goal 1:
Grouped Frequency Table Terminology
Class - non-overlapping intervals the data is
divided into.
Class Limits – The smallest and largest
values that can belong to a given class.
Class Boundaries – Fall halfway between
the upper class limit for the smaller class and
the lower class limit for larger class. Used to
close the gap between classes.
Class Width – The difference between the
class boundaries for a given class.
Class Midpoint or Mark – The midpoint of a
class.
Learning Goal 1:
Grouped Frequency Tables – Classes
• A grouped frequency table should
have a minimum of 5 classes and a
maximum of 20 classes.
• For small data sets, one can use
between 5 and 10 classes.
• For large data sets, one can use up to
20 classes.
Learning Goal 1:
Number of Classes
Same data set
Too Many Classes - Not
summarized enough.
Learning Goal 1:
Number of Classes
Same data set
Too Few Classes –
summarized too much.
Learning Goal 1:
Number of Classes
Same data set
Correct Number of Classes –
5 to 10.
Learning Goal 1:
Class Limits
Lower Class Limits are the smallest
numbers that can actually belong to different
classes.
Lower Class
Limits
Learning Goal 1:
Class Limits
Upper Class Limits are the
largest numbers that can
actually belong to different
classes.
Upper Class
Limits
Learning Goal 1:
Class Boundaries
Class Boundaries are the numbers used to
separate classes, but without the gaps
created by class limits.
Class boundaries split the gap, created by
the class limits between two consecutive
classes, in half.
Half of the gap is given to the upper class
and half given to the lower class. Thus,
bringing the bars of the two consecutive
classes together, with no gap.
Learning Goal 1:
Structure of a Data Class
A “class” is basically an
interval on a number line.
It has:
A lower limit a and an
upper limit b.
A width: (b + 0.5) - (a - 0.5)
for integer data.
A lower boundary and
an upper boundary
(integer data).
A midpoint.
Learning Goal 1:
Structure of a Data Class - Problem
(b + 0.5) - (a - 0.5)
If a = 60 and b = 69
for integer data,
what is the value of
the lower boundary?
a). 60
b). 59.5
c). 9
d). 64.5
Learning Goal 1:
Structure of a Data Class - Problem
Answer: b). 59.5
For integer data the lower boundary is
a - 0.5 = 60 - 0.5 = 59.5.
Learning Goal 1:
Class Boundaries
Class Boundaries are the numbers separating
classes.
Class Boundaries: -0.5, 99.5, 199.5, 299.5, 399.5, 499.5
Learning Goal 1:
Class Midpoints or Class Mark
Class Midpoint or Class Mark is the
midpoint of each class.
Class midpoints can be found by
adding the lower class limit to the
upper class limit and dividing the sum
by two.
Learning Goal 1:
Class Midpoints
Class Midpoint is
the midpoint of
each class.
Class Midpoints: 49.5, 149.5, 249.5, 349.5, 449.5
Learning Goal 1:
Class Width
Class Width is the
difference between two
consecutive lower class
limits or two consecutive
lower class boundaries.
Class Width: 100 for each of the five classes.
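The boundary, midpoint, and width definitions can be checked numerically. The sketch below assumes integer data, so each boundary sits 0.5 beyond the class limit, and it assumes the five classes on these slides have limits 0 – 99, 100 – 199, and so on, which is inferred from the boundaries and midpoints shown. The function name `class_structure` is hypothetical.

```python
def class_structure(lower_limit, upper_limit, unit=1):
    """Boundaries, midpoint, and width of one class.

    `unit` is the spacing between possible data values (1 for integer data),
    so half a unit closes the gap between neighbouring classes.
    """
    half = unit / 2
    lower_boundary = lower_limit - half
    upper_boundary = upper_limit + half
    return {
        "boundaries": (lower_boundary, upper_boundary),
        "midpoint": (lower_limit + upper_limit) / 2,
        "width": upper_boundary - lower_boundary,
    }

# Assumed class limits for the slide example.
classes = [(0, 99), (100, 199), (200, 299), (300, 399), (400, 499)]
summary = [class_structure(a, b) for a, b in classes]
for (a, b), info in zip(classes, summary):
    print(a, b, info)
```

The same function applied to a = 60 and b = 69 gives a lower boundary of 59.5 and a width of 10.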
Learning Goal 1:
Constructing A Frequency Table
1. Decide on the number of classes (should be between
5 and 20) .
2. Calculate the class width (round up):
$\text{class width} = \dfrac{(\text{highest value}) - (\text{lowest value})}{\text{number of classes}}$
3. Starting point: Begin by choosing a lower limit of the
first class.
4. Using the lower limit of the first class and class width,
proceed to list the lower class limits.
5. List the lower class limits in a vertical column and
proceed to enter the upper class limits.
6. Go through the data set putting a tally in the
appropriate class for each data value.
Learning Goal 1:
Constructing A Frequency Table - Example
A manufacturer of insulation randomly selects
20 winter days and records the daily high
temperature.
24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Learning Goal 1:
Constructing A Frequency Table - Example
Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38,
41, 43, 44, 46, 53, 58
Find range: 58 - 12 = 46
Select number of classes: 5 (usually between 5 and
10)
Compute class interval (width): 10 (46/5 then round
up)
Determine lower class (limits): 10, 20, 30, 40, 50. List
in a vertical column.
Compute upper class limits 19, 29, 39, 49, 59, and
then class midpoints: 14.5, 24.5, 34.5, 44.5, 54.5.
Count observations & assign to classes
(continued)
Learning Goal 1:
Constructing A Frequency Table - Example
(continued)
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Class      Frequency    Relative Frequency    Percentage
10 - 19    3            .15                   15%
20 - 29    6            .30                   30%
30 - 39    5            .25                   25%
40 - 49    4            .20                   20%
50 - 59    2            .10                   10%
Total      20           1.00                  100%
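The construction steps for this example can be sketched in Python. This is an illustration rather than a prescribed method: the function name `grouped_table` is made up, the data are assumed to be integers (so each upper limit is one less than the next lower limit), and the starting lower limit of 10 is the choice made in the example.

```python
import math

# Daily high temperatures for the 20 winter days (from the example).
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

def grouped_table(values, num_classes, first_lower_limit):
    """Return (lower limit, upper limit, frequency, relative frequency) rows."""
    # Class width = range / number of classes, rounded up.
    width = math.ceil((max(values) - min(values)) / num_classes)
    rows = []
    for i in range(num_classes):
        lower = first_lower_limit + i * width
        upper = lower + width - 1          # integer data: limits leave a gap of 1
        f = sum(lower <= v <= upper for v in values)
        rows.append((lower, upper, f, f / len(values)))
    return rows

rows = grouped_table(temps, num_classes=5, first_lower_limit=10)
for lower, upper, f, rel in rows:
    print(f"{lower} - {upper}: {f}  ({rel:.2f})")
```

If the range divides evenly by the number of classes, rounding up changes nothing and the last class may miss the largest value; in that case a slightly larger width or an extra class is needed.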
Learning Goal 1:
Tip for Constructing A Frequency Table
Use Tally marks to count the data in each
class.
Record the frequencies (and relative
frequencies if desired) on the table.
Learning Goal 1:
Histogram
Then to make the Histogram, graph the Frequency Table data.
Learning Goal 1:
Making a Histogram
• Make a frequency table.
• Choose appropriate scale for vertical axis
(freq. or relative freq.) and horizontal axis
(based on classes). Label both axes.
• Place class boundaries on horizontal axis.
Place frequencies on vertical axis.
• For each class, draw a bar with height equal
to the class frequency and width equal to the
class width.
• Title the graph.
Learning Goal 1:
Making a Histogram
Class      Class Midpoint    Frequency
10 - 19    15                3
20 - 29    25                6
30 - 39    35                5
40 - 49    45                4
50 - 59    55                2

Figure: Histogram: Daily High Temperature. Horizontal axis: Temperatures (degrees), marked from 5 to 65; vertical axis: Frequency, from 0 to 7. (No gaps between bars.)
Learning Goal 1:
Frequency Table From a Histogram
• There are several procedures that one
can use to construct a grouped
frequency table.
• However, because of the many
statistical software packages
(MINITAB, SPSS etc.) and graphing
calculators (TI-84 etc.) available today,
it is not necessary to try to construct
such distributions using pencil and
paper.
Learning Goal 1:
Frequency Table From a Histogram
• The weights of 30 female students
majoring in Physical Education on a
college campus are as follows: 143,
113, 107, 151, 90, 139, 136, 126, 122,
127, 123, 137, 132, 121, 112, 132,
133, 121, 126, 104, 140, 138, 99, 134,
119, 112, 133, 104, 129, and 123.
• Summarize the data with a frequency
distribution using seven classes.
Learning Goal 1:
Frequency Table From a Histogram
• The MINITAB statistical software was
used to generate the histogram
(similar to the histogram on our TI-84)
in the next slide.
• The histogram has seven classes.
• Classes for the weights are along the
x-axis and frequencies are along the
y-axis.
• The number at the top of each
rectangular box, represents the
frequency for the class.
Learning Goal 1:
Frequency Table From a Histogram
Histogram with 7 classes for the weights.
Learning Goal 1:
Frequency Table From a Histogram
• Observations
• From the histogram, the classes (intervals)
are 85 – 95, 95 – 105, 105 – 115, etc. with
corresponding frequencies of 1, 3, 4, etc.
• We will use this information to construct the
grouped frequency distribution.
Learning Goal 1:
Frequency Table From a Histogram
• Observations (continued)
• Observe that the upper class limit of 95 for the
class 85 – 95 is listed as the lower class limit
for the class 95 – 105.
• Since the value of 95 cannot be included in
both classes, we will use the convention that
the upper class limit is not included in the class.
Learning Goal 1:
Frequency Table From a Histogram
• Observations (continued)
• That is, the class 85 – 95 should be
interpreted as having the values 85 and up to
95 but not including the value of 95.
• Using these observations, the grouped
frequency distribution is constructed from the
histogram and is given on the next slide.
Learning Goal 1:
Frequency Table From a Histogram
Class (weight)    Frequency
85 – 95           1
95 – 105          3
105 – 115         4
115 – 125         6
125 – 135         9
135 – 145         6
145 – 155         1
Total             n = 30
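The convention that the upper class limit is not included in the class can be checked directly. The sketch below counts the 30 weights into half-open classes of width 10 starting at 85; the function name `half_open_counts` is hypothetical, and the start and width are read off the histogram classes listed above.

```python
# Weights of the 30 students (from the example).
weights = [143, 113, 107, 151, 90, 139, 136, 126, 122, 127,
           123, 137, 132, 121, 112, 132, 133, 121, 126, 104,
           140, 138, 99, 134, 119, 112, 133, 104, 129, 123]

def half_open_counts(values, start, width, num_classes):
    """Count values into classes [start, start + width), [start + width, ...), etc.

    Each class includes its lower limit and excludes its upper limit.
    """
    counts = [0] * num_classes
    for v in values:
        index = (v - start) // width
        if 0 <= index < num_classes:
            counts[index] += 1
    return counts

counts = half_open_counts(weights, start=85, width=10, num_classes=7)
for i, f in enumerate(counts):
    print(f"{85 + 10 * i} - {95 + 10 * i}: {f}")
```

A value of exactly 95 would land in the class 95 – 105, not 85 – 95, which is the interpretation described in the observations.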
Learning Goal 1:
Using the TI-84 to Make Histograms
Start by entering data into a list (STAT /
Edit / L1).
Example: Enter the presidential data on
the next slide into list L1.
Learning Goal 1:
Using the TI-84 to Make Histograms
Learning Goal 1:
Using the TI-84 to Make Histograms
Choose 2nd: Stat Plot to choose a
histogram plot.
Caution: Watch out for other plots that
might be “turned on” or equations that
might be graphed.
Learning Goal 1:
Using the TI-84 to Make Histograms
Turn the plot “on”,
Choose the histogram plot.
Xlist should point to the location of the
data.
Learning Goal 1:
Using the TI-84 to Make Histograms
Under the “Zoom” menu,
choose option 9: ZoomStat
Learning Goal 1:
Using the TI-84 to Make Histograms
The result is a histogram where the calculator
has decided the width and location of the ranges.
You can use the Trace key to get information
about the ranges and the frequencies.
Learning Goal 1:
Using the TI-84 to Make Histograms
You can change the size and location of the
ranges by using the Window button.
Use the Xscl to change the class width on the
graph.
Press the Graph button to see the results
Learning Goal 1:
Using the TI-84 to Make Histograms
Voila!
Of course, you can still change the ranges if
you don’t like the results.
And you can construct a frequency table from
the histogram.
Learning Goal 1:
Using the TI-84 to Make Histograms – Your Turn
Using the data given, on
sodium in cereals, construct
a histogram on your TI – 84
and then using your
histogram construct a
frequency/relative frequency
table.
Use 8 classes, with a lower
class limit of 0.
Sodium
Data:
0 210
260 125
220 290
210 140
220 200
125 170
250 150
170 70
230 200
290 180
Learning Goal 1:
Using the TI-84 to Make Histograms – Solution
STAT, EDIT, (enter data)
STAT PLOT
ZOOM, #9:ZoomStat
Learning Goal 1:
TI-84 to Make Histogram Using Freq. Table Data
Class Limits    Frequency
350 to < 450    11
450 to < 550    10
550 to < 650    2
650 to < 750    2
750 to < 850    2
850 to < 950    1

Same as raw data, using the class midpoint to represent the class.
Learning Goal 1:
TI-84 to Make Histogram Using Freq. Table Data
Enter the data into 2 lists. L1 is the classes
(class midpoint) and L2 is the frequency.
Learning Goal 1:
TI-84 to Make Histogram Using Freq. Table Data
Turn on Stats Plot1 and select the histogram.
Xlist is L1 the classes and Freq is L2 the
frequencies.
Learning Goal 1:
TI-84 to Make Histogram Using Freq. Table Data
Select ZoomStat to graph the histogram.
Learning Goal 1:
TI-84 to Make Histogram Using Freq. Table Data
Adjust the WINDOW to improve the picture
and/or make the values better.
Learning Goal 1:
TI-84 to Make Histogram Using Freq. Table Data
Use the Trace Key to determine values on the
graph.
Learning Goal 1:
Freq. Histogram vs Relative Freq. Histogram
Frequency Histogram - a bar graph in which the
horizontal scale represents the classes of data values
and the vertical scale represents the frequencies.
Learning Goal 1:
Freq. Histogram vs Relative Freq. Histogram
Relative Frequency Histogram - has the same shape
and horizontal scale as a histogram, but the vertical
scale is marked with relative frequencies.
Learning Goal 1:
Freq. Histogram vs Relative Freq. Histogram
They look the same with the exception
of the vertical axis scale.
Learning Goal 1:
Freq. Histogram vs Relative Freq. Histogram - Example
Learning Goal 1:
Histograms - Facts
• Histograms are useful when the data
values are quantitative.
• A histogram gives an estimate of the
shape of the distribution of the
population from which the sample was
taken.
• If the relative frequencies were plotted
along the vertical axis to produce the
histogram, the shape will be the same
as when the frequencies are used.
Learning Goal 1:
Anatomy of a Histogram
Title
Note that there are
no spaces between bars.
(continuous data)
Number of
observations.
Height of each bar
represents the
frequency in
each class.
Number of
occurrences (frequencies)
are shown on the
vertical axis.
Empty Class: No
data were
recorded between
75 and 80.
Each bar represents a class. The
number of classes is usually between 5 and 20.
Here, there are 17 classes. The width of each class
is determined by dividing the range of the data
set by the number of classes, and rounding up.
In this data set, the range is 82.
82/17 = 4.8, rounded up to 5. This class goes
from 5 to 10.
Label both
horizontal and
vertical
axes.
The numbers shown on the
horizontal axis are the boundaries of
each class.
NOTE: Sometimes the numbers shown on the
horizontal axis are the midpoints of each class.
(A class midpoint is also referred to as the mark
of the class.)
Quantitative Data
STEM AND LEAF PLOT
Learning Goal 1:
Stem-and-Leaf Plots
• What is a stem-and-leaf plot? A stem-and-leaf plot is a data plot that uses part of a data
value as the stem to form groups or classes
and part of the data value as the leaf.
• Most often used for small or medium sized
data sets. For larger data sets, histograms do
a better job.
• Note: A stem-and-leaf plot has an advantage
over a grouped frequency table or histogram,
since a stem-and-leaf plot retains the actual
data by showing them in graphic form.
Learning Goal 1:
Stem-and-Leaf Plots
Stem-and-leaf plots are used for summarizing
quantitative variables.
Separate each observation into a stem (first part of the
number) and a leaf (typically the last digit of the
number).
Write the stems in a vertical column ordered from
smallest to largest, including empty stems; draw a
vertical line to the right of the stems.
Write each leaf in the row to the right
of its stem in order.
Learning Goal 1:
Stem and Leaf Plot Construction
Learning Goal 1:
Stem-and-Leaf Plots
How to make a stemplot:
1) Separate each observation into a
stem, consisting of all but the final
(rightmost) digit, and a leaf, which is
that remaining final digit. Stems may
have as many digits as needed. Use
only one digit for each leaf—either
round or truncate the data values to
one decimal place after the stem.
2) Write the stems in a vertical column
with the smallest value at the top, and
draw a vertical line at the right of this
column.
3) Write each leaf in the row to the right
of its stem, in increasing order out from
the stem. Title and include key.
Original data: 9, 9, 22, 32, 33, 39, 39, 42, 49, 52,
58, 70.
STEM | LEAVES
0 | 9 9
1 |
2 | 2
3 | 2 3 9 9
4 | 2 9
5 | 2 8
6 |
7 | 0
Include key – how to read the stemplot: 0|9 = 9
Learning Goal 1:
Stem-and-Leaf Plots – Picking Stems
Data in ordered array:
21, 24, 24, 26, 27, 27, 30, 32, 38, 41
Here, use the 10’s digit for the stem unit:
                  Stem | Leaf
21 is shown as    2    | 1
38 is shown as    3    | 8
41 is shown as    4    | 1
Learning Goal 1:
Stem-and-Leaf Plots – Picking Stems
(continued)
Completed stem-and-leaf diagram:
Data in ordered array:
21, 24, 24, 26, 27, 27, 30, 32, 38, 41
Stem | Leaves
2 | 1 4 4 6 7 7
3 | 0 2 8
4 | 1
Key: 3|0 = 30
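A stem-and-leaf display like the one above can be generated with a short Python function. This is a minimal sketch that assumes non-negative whole numbers with the tens digit (and anything above it) as the stem and the ones digit as the leaf; the function name `stemplot` is made up for illustration.

```python
from collections import defaultdict

def stemplot(values):
    """Return one text row per stem, including empty stems in between."""
    leaves = defaultdict(list)
    for v in sorted(values):          # sorting keeps leaves in increasing order
        leaves[v // 10].append(v % 10)
    lowest, highest = min(leaves), max(leaves)
    return [
        f"{stem} | {' '.join(str(d) for d in leaves[stem])}".rstrip()
        for stem in range(lowest, highest + 1)
    ]

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
plot = stemplot(data)
print("\n".join(plot))
print("Key: 3|0 = 30")
```

Because the original values can be read back from the rows, the display keeps the data that a histogram would hide.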
Learning Goal 1:
Stem-and-Leaf Plots - Using Other Stem Units
Using the 100’s digit as the stem:
Round off the 10’s digit to form the leaves
                            Stem | Leaf
613 would become (610)      6    | 1
776 would become (780)      7    | 8
...
1224 becomes (1220)         12   | 2
Learning Goal 1:
Stem-and-Leaf Plots - Using Other Stem Units
(continued)
Using the 100’s digit as the stem:
The completed stem-and-leaf display:
Data:
613, 632, 658, 717,
722, 750, 776, 827,
841, 859, 863, 891,
894, 906, 928, 933,
955, 982, 1034,
1047, 1056, 1140,
1169, 1224

Stem | Leaves
6  | 136
7  | 2258
8  | 346699
9  | 13368
10 | 356
11 | 47
12 | 2
Key: 6|3 = 630
Learning Goal 1:
Stem-and-Leaf Plots - Example
Construct a stem-and-leaf diagram, which
simultaneously groups the data and provides a
graphical display similar to a histogram.
Learning Goal 1:
Stem-and-Leaf Plots - Example
Put the data in a List in the TI – 84
(STAT/EDIT/L1).
Order the data using sort ascending function
(STAT/EDIT/2:SortA(… ) and List 1.
Learning Goal 1:
Stem-and-Leaf Plots - Example
Return to the list (STAT/EDIT) to view ordered
data.
Learning Goal 1:
Stem-and-Leaf Plots - Example
First, list the leading digits of the numbers in
the table (3, 4, . . . , 9) in a column, as shown
to the left of the vertical rule.
Next, write the final digit of each number from
the table to the right of the vertical rule in the
row containing the appropriate leading digit.
Do not
forget the
title and key.
Learning Goal 1:
Stem-and-Leaf Plots - Variation
Splitting Stems – (too few stems or
classes) Split stems to double the
number of stems when all the leaves
would otherwise fall on just a few
stems.
Each stem appears twice.
Leaves 0-4 go on the 1st stem.
Leaves 5-9 go on the 2nd stem.
Learning Goal 1:
Stem-and-Leaf Plots – Split Stems Example
A pediatrician tested the cholesterol levels of several
young patients and was alarmed to find that many had
levels higher than 200 mg per 100 mL. The table below
presents the readings of 20 patients with high levels.
Construct a stem-and-leaf diagram for these data by
using
a. one line per stem.
b. Split Stems - two lines per
stem.
Learning Goal 1:
Stem-and-Leaf Plots – Split Stems Example
The stem-and-leaf diagram in (a) is only moderately helpful
because there are so few stems. (b) is a better stem-and-leaf
diagram for these data. It uses Split Stems - two lines for each
stem, with the first line for the leaf digits 0-4 and the second line
for the leaf digits 5-9.
Figures: (a) Cholesterol Levels, one line per stem; (b) Cholesterol Levels, two lines per stem.
Key: 19|9 = 199
Learning Goal 1:
Stem-and-Leaf Plots - Your Turn
• A sample of the number of admissions
to a psychiatric ward at a local hospital
during the full phases of the moon is
as follows: 22, 30, 21, 27, 31, 36, 20,
28, 25, 33, 21, 38, 32, 35, 26, 19, 43,
30, 30, 34, 27, and 41.
• Display the data in an appropriate
stem-and-leaf plot.
Learning Goal 1:
Stem-and-Leaf Plots – Correct Solution
Admissions to Psychiatric Ward
1 |
1 | 9
2 | 0112
2 | 56778
3 | 0001234
3 | 568
4 | 13
4 |
Key: 3|5 = 35
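Split stems follow the same idea with two rows per stem. The sketch below is one possible way to do it, assuming non-negative whole numbers; the name `split_stemplot` is hypothetical. Unlike the slide, it does not print the empty half-stems before the first leaf or after the last one.

```python
def split_stemplot(values):
    """Return text rows with two lines per stem: leaves 0-4, then leaves 5-9."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = 0 if leaf <= 4 else 1      # 0 = first line of the stem, 1 = second
        rows.setdefault((stem, half), []).append(str(leaf))
    first, last = min(rows), max(rows)
    lines = []
    for stem in range(first[0], last[0] + 1):
        for half in (0, 1):
            if first <= (stem, half) <= last:
                leaves = "".join(rows.get((stem, half), []))
                lines.append(f"{stem} | {leaves}".rstrip())
    return lines

admissions = [22, 30, 21, 27, 31, 36, 20, 28, 25, 33, 21,
              38, 32, 35, 26, 19, 43, 30, 30, 34, 27, 41]
split_plot = split_stemplot(admissions)
print("\n".join(split_plot))
print("Key: 3|5 = 35")
```

With one line per stem, most of these leaves pile up on stems 2 and 3; splitting spreads them over four rows and makes the shape easier to see.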
Learning Goal 1:
Stem-and-Leaf Plots – Incorrect Solution
Key: 1|9 = 19
Admissions to Psychiatric Ward
1 | 9
2 | 0 1 1 2 5 6 7 7 8
3 | 0 0 0 1 2 3 4 5 6 8
4 | 1 3
Learning Goal 1:
Stemplots versus Histograms
Stemplots are quick and dirty histograms that can easily
be done by hand, which makes them very convenient for
back-of-the-envelope calculations. However, they are rarely found
in scientific or lay publications.
Learning Goal 1:
Stemplots versus Histograms
Stem-and-leaf displays show the distribution
of a quantitative variable, like histograms do,
while preserving the individual values.
Stem-and-leaf displays contain all the
information found in a histogram and, when
carefully drawn, satisfy the area principle and
show the distribution.
Quantitative Data
DOTPLOTS
Learning Goal 1:
Dotplots
• What is a dot plot? A dot plot is a plot
that displays a dot for each value in a
data set along a number line. If there
are multiple occurrences of a specific
value, then the dots will be stacked
vertically.
Learning Goal 1:
Dotplots
A dotplot is a simple
display. It just places
a dot along an axis
for each case in the
data.
The dotplot to the
right shows Kentucky
Derby winning times,
plotting each race as
its own dot.
You may see a
dotplot displayed
horizontally or
vertically.
Learning Goal 1:
Dotplots
To construct a dot plot:
1. Draw a horizontal line.
2. Label it with the name of the variable.
3. Mark regular values of the variable (scale) on it.
4. For each observation, place a dot above its value
on the number line.
Figure: dotplot of Sodium in Cereals.
Learning Goal 1:
Dotplots - Example:
The following data show the lengths of 50
movies in minutes. Construct a dot plot for
the data.
64, 64, 69, 70, 71, 71, 71, 72, 73, 73, 74, 74,
74, 74, 75, 75, 75, 75, 75, 75, 76, 76, 76, 77,
77, 78, 78, 79, 79, 80, 80, 81, 81, 81, 82, 82,
82, 83, 83, 83, 84, 86, 88, 89, 89, 90, 90, 92,
94, 120.
Length of 50 Movies
Figure 2-5
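A rough text version of the dot plot can be produced in Python by stacking one mark per observation above each value. This is only a sketch: the function name `text_dotplot` is made up, it assumes whole-number data, and it uses `*` in place of a dot.

```python
from collections import Counter

# Lengths of the 50 movies in minutes (from the example).
movies = [64, 64, 69, 70, 71, 71, 71, 72, 73, 73, 74, 74,
          74, 74, 75, 75, 75, 75, 75, 75, 76, 76, 76, 77,
          77, 78, 78, 79, 79, 80, 80, 81, 81, 81, 82, 82,
          82, 83, 83, 83, 84, 86, 88, 89, 89, 90, 90, 92,
          94, 120]

def text_dotplot(values):
    """Return rows of text, top row first; column i stands for min(values) + i."""
    counts = Counter(values)
    lowest, highest = min(values), max(values)
    tallest = max(counts.values())
    rows = []
    for level in range(tallest, 0, -1):
        row = "".join("*" if counts[v] >= level else " "
                      for v in range(lowest, highest + 1))
        rows.append(row.rstrip())
    return rows

rows = text_dotplot(movies)
print("\n".join(rows))
print(f"{min(movies)} ... {max(movies)} (minutes)")
```

The isolated mark at the far right of the bottom row is the 120-minute movie, which stands apart from the rest of the data.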
Learning Goal 1:
Dotplots – Frequency Table Data
The following frequency distribution shows the
number of defectives observed by a quality control
officer over a 30 day period. Construct a dot plot
for the data.
Learning Goal 1:
Dotplots – Solution
Learning Goal 1:
Dotplots – Your Turn
One of Professor Weiss’s sons wanted to add a
new DVD player to his home theater system. He
used the Internet to shop and went to
pricewatch.com. There he found 16 quotes on
different brands and styles of DVD players.
Construct a dotplot for these data.
Learning Goal 1:
Dotplots – Solution
To construct a dotplot for the data, we begin by
drawing a horizontal axis that displays the possible
prices.
Then we record each price by placing a dot over the
appropriate value on the horizontal axis.
For instance, the first price is $210, which calls for a
dot over the “210” on the horizontal axis.
Learning Goal 1:
Think Before You Draw
Remember the “Make a picture” rule?
Now that we have options for data
displays, you need to Think carefully
about which type of display to make.
Before making a stem-and-leaf
display, a histogram, or a dotplot,
check the
Quantitative Data Condition: The data are
values of a quantitative variable whose
units are known.
Learning Goal 2
Know how to display
the relative position of a
quantitative variable
with a Cumulative
Frequency Curve and
analyze the Cumulative
Frequency Curve.
Quantitative Data
OGIVE - CUMULATIVE
FREQUENCY CURVE
Learning Goal 2:
Cumulative Frequency and the Ogive
A histogram displays the distribution of a
quantitative variable. It tells little about the
relative standing (percentile, quartile, etc.) of
an individual observation.
For this information, we use a Cumulative
Frequency graph, called an Ogive
(pronounced O-JIVE).
Learning Goal 2:
Measures of Relative Standing
How many measurements lie below the
measurement of interest? This is measured
by the pth percentile.
Figure: p% of the measurements lie below the p-th percentile x, and (100 - p)% lie above it.
Learning Goal 2:
Percentile
The pth percentile is a value such that p
percent of the observations fall below or at
that value.
Learning Goal 2:
Special Percentiles – Deciles and Quartiles
• Deciles and quartiles are special
percentiles.
• Deciles divide an ordered data set into
10 equal parts.
• Quartiles divide the ordered data set
into 4 equal parts.
• We usually denote the deciles by D1,
D2, D3, … , D9.
• We usually denote the quartiles by Q1,
Q2, and Q3.
Learning Goal 2:
Special Percentiles – Deciles and Quartiles
• There are 9 deciles and 3 quartiles.
• Q1 = first quartile = P25
• Q2 = second quartile = P50
• Q3 = third quartile = P75
• D1 = first decile = P10
• D2 = second decile = P20 . . .
• D9 = ninth decile = P90
Learning Goal 2:
Percentile - Examples
90% of all men (16 and older) earn more than
$319 per week (Bureau of Labor Statistics), so 10% earn
$319 or less: $319 is the 10th percentile.
50th Percentile = Median
25th Percentile = Lower Quartile (Q1)
75th Percentile = Upper Quartile (Q3)
Learning Goal 2:
Calculating Percentile
• The percentile corresponding to a
given data value, say x, in a set is
obtained by using the following
formula.
$\text{Percentile} = \dfrac{\text{Number of values at or below } x}{\text{Number of values in data set}} \times 100\%$
Learning Goal 2:
Calculating Percentile - Example
• Example: The shoe sizes, in whole
numbers, for a sample of 12 male
students in a statistics class were as
follows: 13, 11, 10, 13, 11, 10, 8, 12, 9,
9, 8, and 9.
• What is the percentile rank for a shoe
size of 12?
Learning Goal 2:
Calculating Percentile - Solution
• Solution: First, we need to arrange the
values from smallest to largest.
• The ordered array is given below: 8, 8,
9, 9, 9, 10, 10, 11, 11, 12, 13, 13.
• Observe that the number of values at
or below the value of 12 is 10.
Learning Goal 2:
Calculating Percentile - Solution
• Solution (continued): The total number
of values in the data set is 12.
• Thus, using the formula, the
corresponding percentile is:
$\text{Percentile} = \dfrac{10}{12} \times 100\% \approx 83.3\%$
The value of 12
corresponds to
approximately the
83rd percentile.
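The percentile formula translates directly into code. The sketch below counts the values at or below x and divides by the size of the data set; the function name `percentile_rank` is made up for illustration.

```python
def percentile_rank(values, x):
    """Percentage of the data values that are at or below x."""
    at_or_below = sum(v <= x for v in values)
    return 100 * at_or_below / len(values)

# Shoe sizes for the 12 students (from the example).
shoe_sizes = [13, 11, 10, 13, 11, 10, 8, 12, 9, 9, 8, 9]
rank = percentile_rank(shoe_sizes, 12)
print(f"A shoe size of 12 is at about the {rank:.0f}rd percentile")
```

Sorting is not needed for this calculation, since only the count of values at or below x matters.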
Learning Goal 2:
Calculating Percentile - Example
• Example: The data given below
represents the 19 countries with the
largest numbers of total Olympic
medals – excluding the United States,
which had 101 medals – for the 1996
Atlanta games. Find the 65th percentile
for the data set.
• 63, 65, 50, 37, 35, 41, 25, 23, 27, 21,
17, 17, 20, 19, 22, 15, 15, 15, 15.
Learning Goal 2:
Calculating Percentile - Solution
• Solution: First, we need to arrange the
data set in order. The ordered set is:
• 15, 15, 15, 15, 17, 17, 19, 20, 21, 22,
23, 25, 27, 35, 37, 41, 50, 63, 65.
• Next, compute the position of the
percentile.
• Here n = 19, k = 65.
• Thus, $c = (19 \times 65)/100 = 12.35$.
• We need to round up to 13.
Learning Goal 2:
Calculating Percentile - Solution
• Solution (continued): Thus, the 13th value in
the ordered data set will correspond to the
65th percentile.
• That is P65 = 27.
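The position calculation for the kth percentile can be sketched as follows. The function name `kth_percentile` is hypothetical. The slides only cover the case where c is not a whole number (round up); when c is already a whole number this sketch simply uses that position, which is an assumption, since some textbooks average that value with the next one.

```python
import math

def kth_percentile(values, k):
    """Value at position c = n * k / 100 (rounded up) in the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    position = max(1, math.ceil(n * k / 100))   # 1-based position in the ordered list
    return ordered[position - 1]

# Olympic medal counts for the 19 countries (from the example).
medals = [63, 65, 50, 37, 35, 41, 25, 23, 27, 21,
          17, 17, 20, 19, 22, 15, 15, 15, 15]
p65 = kth_percentile(medals, 65)
print("P65 =", p65)
```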
Learning Goal 2:
Cumulative Frequency
• What is a cumulative frequency for a
class? The cumulative frequency for a
specific class in a frequency table is
the sum of the frequencies for all
values at or below the given class.
Learning Goal 2:
Cumulative Frequency Tables
Cumulative frequencies for a class are the
sums of all the frequencies up to and
including that class.
Example
Learning Goal 2:
Cumulative Frequency Tables
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Class      Frequency    Percentage    Cumulative Frequency    Cumulative Percentage
10 - 20    3            15            3                       15
20 - 30    6            30            9                       45
30 - 40    5            25            14                      70
40 - 50    4            20            18                      90
50 - 60    2            10            20                      100
Total      20           100
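The cumulative columns are running totals of the frequency column, which Python's `itertools.accumulate` computes directly. A small sketch using the frequencies from the table:

```python
from itertools import accumulate

# Class frequencies for the daily high temperatures (from the table).
frequencies = [3, 6, 5, 4, 2]

cumulative = list(accumulate(frequencies))           # running totals
total = cumulative[-1]                                # last running total = n
cumulative_pct = [100 * c / total for c in cumulative]

for f, c, pct in zip(frequencies, cumulative, cumulative_pct):
    print(f"{f:>3} {c:>3} {pct:>6.1f}")
```

The last cumulative frequency must equal the number of observations, and the last cumulative percentage must be 100.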
Learning Goal 2:
Cumulative Frequency Curve - Ogive
A line graph that depicts cumulative
frequencies.
Used to Find Quartiles and
Percentiles.
[Figure: Ogive: Daily High Temperature. Vertical axis: cumulative percentage (0 to 100); horizontal axis scaled from 10 to 60.]
Learning Goal 2:
Constructing an Ogive
1. Make a frequency table and add a cumulative
frequency column.
2. To fill in the cumulative frequency column, add the
counts in the frequency column that fall in or below
the current class interval.
3. Label and scale the axes and title the graph.
Horizontal axis “classes” and vertical axis
“cumulative frequency or relative cumulative
frequency”.
4. Begin the ogive at zero on the vertical axis and lower
boundary of the first class on the horizontal axis.
Then graph each additional upper class boundary
vs. cumulative frequency for that class.
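As a rough illustration of these steps, the sketch below builds the ogive points for the 10 - 20 through 50 - 60 classes shown earlier and reads a percentile off the curve. The helper names are hypothetical, and joining the points with straight line segments (linear interpolation) is an assumption made here for simplicity; a hand-drawn smooth curve may give slightly different readings.

```python
boundaries = [10, 20, 30, 40, 50, 60]   # class boundaries
frequencies = [3, 6, 5, 4, 2]           # one frequency per class


def ogive_points(boundaries, frequencies):
    """Return (boundary, cumulative percentage) points, starting at zero."""
    total = sum(frequencies)
    points = [(boundaries[0], 0.0)]      # lower boundary of the first class
    running = 0
    for upper, freq in zip(boundaries[1:], frequencies):
        running += freq
        points.append((upper, 100 * running / total))
    return points


def value_at_percent(points, percent):
    """Estimate the data value at a cumulative percentage.

    ASSUMPTION: neighbouring points are joined by straight lines.
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y1 > y0 and y0 <= percent <= y1:
            return x0 + (x1 - x0) * (percent - y0) / (y1 - y0)
    raise ValueError("percent is outside the range of the ogive")


points = ogive_points(boundaries, frequencies)
median_estimate = value_at_percent(points, 50)
print(points)
print(median_estimate)
```

Reading the 25% and 75% levels the same way gives estimates of the quartiles.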
Learning Goal 2:
Ogive - Example
Learning Goal 2:
Cumulative Frequency Curve – Example
The frequencies of the scores of 80 students
in a test are given in the following table.
Complete the corresponding cumulative
frequency table.
A suitable table is as follows:
Learning Goal 2:
Cumulative Frequency Curve – Example
The information provided by a cumulative frequency
table can be displayed in graphical form by plotting the
cumulative frequencies given in the table against the
upper class boundaries, and joining these points with a
smooth curve. Construct the Cumulative Frequency Curve.
The cumulative frequency curve corresponding to the
data is as follows:
Learning Goal 2:
Cumulative Frequency Curve – Class Problem
The results obtained by 200 students in a mathematics test
are given in the following table.
Draw a cumulative frequency curve and use it to estimate.
a) The median mark.
b) The number of students who scored less than 22
marks.
c) The pass mark if 120 students passed the test.
d) The min. mark required to obtain an A grade if 10% of
the students received an A grade.
Learning Goal 2:
Cumulative Frequency Curve – Solution
The required cumulative frequency curve is as follows:
a) The median mark: median mark is 26
b) The number of students who scored less than 22 marks: approximately 69
students scored less than 22 marks
c) The pass mark if 120 students passed the test: pass mark is 28
d) The min. mark required to obtain an A grade if 10% of the students received
an A grade: min. mark required for an A is 38
Learning Goal 3
Be able to describe the
distribution of a
quantitative variable in
terms of its shape.
Learning Goal 3:
What is the Shape of the Distribution?
1. Does the histogram have a single
central peak or several separated
peaks?
2. Is the histogram symmetric?
3. Do any unusual features stick out?
In any graph, look for the overall pattern
and any striking deviations from that
pattern.
Learning Goal 3:
Shape, Center, and Spread
When describing a distribution, make
sure to always talk about three things:
shape, center, and spread…
Actually you should comment on four
things when describing a distribution.
The three above and any deviations
from the shape.
These deviations from the shape are
called ‘outliers’ and will be discussed
later.
Learning Goal 3:
Shape - Peaks
Does the histogram have a single
central peak or several separated
peaks?
Peaks in a histogram are also called
modes.
A histogram with one main peak is
dubbed unimodal; histograms with two
peaks are bimodal; histograms with
three or more peaks are called
multimodal.
Learning Goal 3:
Shape: Unimodal - Example
Unimodal – single peak.
Learning Goal 3:
Shape: Bimodal - Example
Bimodal - two peaks.
Learning Goal 3:
Shape: Multimodal - Example
Multimodal – three or more peaks.
Learning Goal 3:
Shape: Bimodal or Multimodal
A bimodal or multimodal shape distribution
might indicate that the data are from two or
more different populations.
[Figure: Height of plants by color (red, pink, blue). Horizontal axis: height in centimeters; vertical axis: number of plants.]
Learning Goal 3:
Shape: Uniform
A histogram that doesn’t appear to have any
mode and in which all the bars are
approximately the same height is called
uniform or rectangular.
A distribution in which every class has equal
frequency, no mode.
A uniform distribution is symmetrical with the
added property that the bars are the same
height.
Learning Goal 3:
Shape: Uniform - Example
Uniform – no mode, symmetrical.
Learning Goal 3:
Shape: Modal Comparison
Learning Goal 3:
Shape: Symmetrical
• In a symmetrical distribution, the data
values are evenly distributed on both
sides of the mean.
• If you can fold the histogram along a
vertical line through the middle and
have the edges match pretty closely,
the histogram is symmetric.
Learning Goal 3:
Shape: Symmetrical - Example
Symmetrical – The distribution’s shape is
generally the same if folded down the
middle.
Learning Goal 3:
Shape: Skewed
The (usually) thinner ends of a distribution
are called the tails. If one tail stretches out
farther than the other, the histogram is said to
be skewed to the side of the longer tail.
In the figure below, the histogram on the left
is said to be skewed left, while the histogram
on the right is said to be skewed right.
Learning Goal 3:
Shape: Skewed Right - Example
In a skewed right distribution, most of the
data values fall to the left, and the “tail” of
the distribution is to the right.
Learning Goal 3:
Shape: Skewed Left - Example
In a skewed left distribution, most of the
data values fall to the right, and the “tail”
of the distribution is to the left.
Learning Goal 3:
Shape: Skewed - Comparison
A distribution is skewed to
the left if the left tail is
longer than the right tail
A distribution is skewed to
the right if the right tail is
longer than the left tail
Learning Goal 3:
Shape: Other Common Terms
Hump – high bar
Valley – between 2 peaks
Gap – no data
Learning Goal 3:
Shapes
Learning Goal 4
Be able to describe any
anomalies or
extraordinary features
revealed by the display
of a variable.
Learning Goal 4:
Overall Pattern - Anything Unusual?
Do any unusual features stick out?
Sometimes it’s the unusual features that
tell us something interesting or exciting
about the data.
You should always mention any
stragglers, or outliers, that stand off away
from the body of the distribution.
Are there any gaps in the distribution? If
so, we might have data from more than
one group.
Learning Goal 4:
Deviations from the Overall Pattern
Outliers – An individual observation that falls
outside the overall pattern of the distribution.
Extreme Values – either high or low.
Outliers
Causes:
1. Data Mistake
2. Special nature of some observations
Learning Goal 4:
Outliers
An Outlier falls far from the rest of the data.
Learning Goal 4:
Outliers
Outliers are observations that lie outside the
overall pattern of a distribution. Always look for
outliers and try to explain them.
Alaska
Florida
The overall pattern is
fairly symmetrical
except for two states
clearly not belonging
to the main trend.
Alaska and Florida
have unusual
representation of the
elderly in their
population.
A large gap in the
distribution is typically
a sign of an outlier.
Learning Goal 5
Be able to determine
the shape of the
distribution of a variable
by knowing something
about the data.
Learning Goal 5:
Determine the Shape of a Distribution - Example
Learning Goal 5:
Determine the Shape of a Distribution - Example
It’s often a good idea to think about what the distribution of a
data set might look like before we collect the data. What do
you think the distribution of each of the following data sets
will look like?
1. Number of Miles run by Saturday morning joggers at a park.
Roughly symmetric, slightly skewed right.
2. Hours spent by U.S. adults watching football on
Thanksgiving Day.
Bimodal. Many people watch no football, others watch most
of one or more games.
3. Amount of winnings of all people playing a particular state’s
lottery last week.
Strongly skewed to the right, with almost everyone at $0, a
few small prizes, with the winner an outlier.
Learning Goal 5:
Determine the Shape of a Distribution – Your Turn
Consider a data set containing IQ scores for the
general public. What shape would you expect a
histogram of this data set to have?
a. Symmetric
b. Skewed to the left
c. Skewed to the right
d. Bimodal
Learning Goal 5:
Determine the Shape of a Distribution – Your Turn
Answer: a. Symmetric
Learning Goal 5:
Determine the Shape of a Distribution – Your Turn
Consider a data set of the scores of students on
a very easy exam in which most score very well
but a few score very poorly. What shape would
you expect a histogram of this data set to have?
a. Symmetric
b. Skewed to the left
c. Skewed to the right
d. Bimodal
Learning Goal 5:
Determine the Shape of a Distribution – Your Turn
Answer: b. Skewed to the left
Learning Goal 6
Know the basic
properties and how to
compute the mean and
median of a set of data.
Learning Goal 6:
Measures of Central Tendency
A measure of central tendency for a
collection of data values is a number
that is meant to convey the idea of
centralness or center of the data set.
The most commonly used measures of
central tendency for sample data are
the: mean, median, and mode.
Learning Goal 6:
Measures of Central Tendency
Overview
Central Tendency
Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Median: midpoint of ranked values
Mode: most frequently observed value
Learning Goal 6:
The Mean
• Mean: The mean of a set of numerical
(data) values is the (arithmetic)
average for the set of values.
• When computing the value of the
mean, the data values can be
population values or sample values.
• Hence we can compute either the
population mean or the sample mean
Learning Goal 6:
Mean Notation
• NOTATION: The population mean
is denoted by the Greek letter µ
(read as “mu”).
• NOTATION: The sample mean is
denoted by $\bar{x}$ (read as “x-bar”).
• Normally the population mean is
unknown.
Learning Goal 6:
The Mean
The mean is the most common measure of
central tendency.
The mean is also the preferred measure of
center, because it uses all the data in
calculating the center.
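For a sample of size n:
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$
where n is the sample size and X_1, X_2, ..., X_n are the observed values.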
Learning Goal 6:
The Mean - Example
• What is the mean of the following 11
sample values?
3   8   6   14   0   0   12   -7   0   -10   -4
Learning Goal 6:
The Mean - Example (Continued)
• Solution:
$\bar{x} = \frac{3 + 8 + 6 + 14 + 0 + (-4) + 0 + 12 + (-7) + 0 + (-10)}{11} = \frac{22}{11} = 2$
Learning Goal 6:
Mean – Frequency Table
• When a data set has a large number of
values, we summarize it as a
frequency table.
• The frequencies represent the number
of times each value occurs.
• When the mean is calculated from a
frequency table it is an approximation,
because the raw data is not known.
Learning Goal 6:
Mean – Frequency Table Example
What is the mean of the following 11 sample
values (the same data as before)?
Class          Frequency
-10 to < -4        2
-4 to < 2          4
2 to < 8           2
8 to < 14          2
14 to < 20         1
Learning Goal 6:
Mean – Frequency Table Example
Solution:
Class          Midpoint   Frequency
-10 to < -4       -7          2
-4 to < 2         -1          4
2 to < 8           5          2
8 to < 14         11          2
14 to < 20        17          1
$\bar{x} \approx \frac{2(-7) + 4(-1) + 2(5) + 2(11) + 1(17)}{11} = \frac{31}{11} \approx 2.82$
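The two calculations can be compared side by side in a short Python sketch. The name `grouped_mean` is hypothetical; the class midpoints are taken as the average of the class limits, as in the table above.

```python
values = [3, 8, 6, 14, 0, 0, 12, -7, 0, -10, -4]

# Exact mean from the raw data.
raw_mean = sum(values) / len(values)

# (lower limit, upper limit, frequency) for each class in the table.
classes = [(-10, -4, 2), (-4, 2, 4), (2, 8, 2), (8, 14, 2), (14, 20, 1)]


def grouped_mean(classes):
    """Approximate the mean by treating every value as its class midpoint."""
    total = sum(freq for _, _, freq in classes)
    weighted = sum(((low + high) / 2) * freq for low, high, freq in classes)
    return weighted / total


approx_mean = grouped_mean(classes)
print(raw_mean)               # 2.0
print(round(approx_mean, 2))  # 2.82
```

The gap between 2 and 2.82 shows why the frequency-table mean is only an approximation: the midpoints stand in for raw values that are no longer known.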
Learning Goal 6:
Calculate Mean on TI-84 Raw Data
1. Enter the raw data into a list, STAT/Edit.
2. Calculate the mean, STAT/CALC/1-Var
Stats
List: L1
FreqList: (leave blank)
Calculate
Learning Goal 6:
Calculate Mean on TI-84 Frequency Table Data
Same Data
Class      Mark   Freq
0-50         25     1
50-100       75     1
100-150     125     3
150-200     175     4
200-250     225     7
250-300     275     4
1. Enter the Frequency table data into two
lists (L1 – Class Midpoint, L2 – Frequency),
STAT/Edit.
2. Calculate the mean,
STAT/CALC/1-Var Stats
List: L1
FreqList: L2
Calculate
Learning Goal 6:
Calculate Mean on TI-84 – Your Turn
Raw Data: 548, 405, 375, 400, 475, 450, 412
375, 364, 492, 482, 384, 490, 492
490, 435, 390, 500, 400, 491, 945
435, 848, 792, 700, 572, 739, 572
Solution: 516.2
Learning Goal 6:
Calculate Mean on TI-84 – Your Turn
Frequency Table Data (same):
Class Limits     Frequency
350 to < 450        11
450 to < 550        10
550 to < 650         2
650 to < 750         2
750 to < 850         2
850 to < 950         1
Solution: 517.9
Learning Goal 6:
Median
The median is the midpoint of the
observations when they are ordered
from the smallest to the largest (or
from the largest to smallest)
If the number of observations is:
Odd, then the median is the middle
observation
Even, then the median is the average of
the two middle observations
Center of a Distribution -- Median
The median is the value with exactly half the
data values below it and half above it.
It is the middle data value (once the data
values have been ordered) that divides
the histogram into two equal areas.
It has the same units as the data.
Learning Goal 6:
Finding the Median
The location of the median:
Median position = $\frac{n+1}{2}$ position in the ordered data
If the number of values is odd, the median is the
middle number.
If the number of values is even, the median is the
average of the two middle numbers.
Note that $\frac{n+1}{2}$ is not the value of the median,
only the position of the median in the ranked
data.
Learning Goal 6:
Finding the Median – Example (n odd)
• What is the median for the following
sample values?
3   8   6   2   12   -7   14   0   -1   -10   -4
Learning Goal 6:
Finding the Median – Example (n odd)
• First of all, we need to arrange the data set in
order (STAT/SortA).
• The ordered set is:
• -10 -7 -4 -1 0 2 3 6 8 12 14 (the 6th value is 2)
• Since the number of values is odd, the
median will be found in the 6th position in the
ordered set (to find the position, divide the number of data values by
2 and round up: 11/2 = 5.5 ⇒ 6).
• Thus, the value of the median is 2.
Learning Goal 6:
Finding the Median – Example (n even)
• Find the median age for the following
eight college students.
23 19 32 25 26 22 24 20
Learning Goal 6:
Finding the Median – Example (n even)
• First we have to order the values as shown
below.
19 20 22 23 24 25 26 32
(the middle two values, 23 and 24, are averaged)
• Since there is an even number of ages, the
median will be the average of the two middle
values (to find them, divide the number of data values by 2;
that position and the next hold the two middle
values: 8/2 = 4 ⇒ the 4th and 5th values are the middle
values).
• Thus, median = (23 + 24)/2 = 23.5.
Learning Goal 6:
The Median - Summary
The median is the midpoint of a distribution—the number such that half
of the observations are smaller and half are larger.
1. Sort observations from smallest to largest. n = number of observations.
2. If n is odd, the median is observation (n + 1)/2 down the list (equivalently, n/2 rounded up).
Example (n = 25):
0.6 1.2 1.5 1.6 1.9 2.1 2.3 2.3 2.5 2.8 2.9 3.3 3.4 3.6 3.7 3.8 3.9 4.1 4.2 4.5 4.7 4.9 5.3 5.6 6.1
n/2 = 25/2 = 12.5, rounded up to 13
Median = 3.4 (the 13th observation)
3. If n is even, the median is the mean of the two center observations.
Example (n = 24): the same values without the largest one, 6.1
n/2 = 12, so the center observations are the 12th and 13th
Median = (3.3 + 3.4)/2 = 3.35
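The odd and even cases can be written as one small Python function. This is a sketch with a hypothetical function name; Python's standard `statistics.median` follows the same rule.

```python
def median(values):
    """Return the median: the middle value, or the mean of the two middle values."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: position (n + 1)/2 in 1-based counting is index n // 2.
        return ordered[middle]
    # Even n: average the two center observations.
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_example = [3, 8, 6, 2, 12, -7, 14, 0, -1, -10, -4]
even_example = [23, 19, 32, 25, 26, 22, 24, 20]
print(median(odd_example))   # 2
print(median(even_example))  # 23.5
```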
Learning Goal 6:
Finding the Median on the TI-84
1. Enter data into L1
2. STAT; CALC; 1:1-Var Stats
Learning Goal 6:
Find the Mean and Median – Your Turn
CO2 Pollution levels in 8 largest nations
measured in metric tons per person:
2.3 1.1 19.7 9.8 1.8 1.2 0.7 0.2
a. Mean = 4.6, Median = 1.5
b. Mean = 4.6, Median = 5.8
c. Mean = 1.5, Median = 4.6
Learning Goal 6:
Find the Mean and Median – Your Turn
Answer: a. Mean = 4.6, Median = 1.5
Learning Goal 6:
Mode
A measure of central tendency.
The value that occurs most often.
Used for either numerical or categorical data.
There may be no mode or several modes.
Not used as a measure of center.
[Dot plots: on a 0 to 14 axis, Mode = 9; on a 0 to 6 axis, No Mode]
Learning Goal 6:
Mode - Example
The mode is the measurement which
occurs most frequently.
The set: 2, 4, 9, 8, 8, 5, 3
The mode is 8, which occurs twice
The set: 2, 2, 9, 8, 8, 5, 3
There are two modes - 8 and 2
(bimodal)
The set: 2, 4, 9, 8, 5, 3
There is no mode (each value is
unique).
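The three cases above (one mode, two modes, no mode) can be checked with a small Python sketch. The function name `modes` is hypothetical, and returning an empty list when every value is unique is a choice made here to match the slide's "no mode" convention.

```python
from collections import Counter


def modes(values):
    """Return all modes in ascending order, or [] if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Every value is unique, so the set has no mode.
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([2, 4, 9, 8, 8, 5, 3]))  # [8]
print(modes([2, 2, 9, 8, 8, 5, 3]))  # [2, 8]
print(modes([2, 4, 9, 8, 5, 3]))     # []
```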
Learning Goal 6:
Summary Measures of Center
Learning Goal 7
Understand the
properties of a
skewed distribution.
Learning Goal 7:
Where is the Center of the Distribution?
If you had to pick a single number to
describe all the data what would you
pick?
It’s easy to find the center when a
histogram is unimodal and
symmetric—it’s right in the middle.
On the other hand, it’s not so easy to
find the center of a skewed histogram
or a histogram with outliers.
Learning Goal 7:
Meaningful measure of Center
Your measure of center must be meaningful.
The distribution of women’s height appears coherent and
symmetrical. The mean is a good measure of center.
Height of 25 women in a class
$\bar{x} = 69.3$
Is the mean always a good measure of center?
Learning Goal 7:
Impact of Skewed Data
Mean and median of a symmetric distribution
Disease X: $\bar{x} = 3.4$, $M = 3.4$
Mean and median are the same.
Mean and median of a skewed distribution
Multiple myeloma: $\bar{x} = 3.4$, $M = 2.5$
The mean is pulled toward the skew.
Learning Goal 7:
The Mean
Nonresistant – The mean is sensitive to the
influence of extreme values and/or outliers.
Skewed distributions pull the mean away
from the center towards the longer tail.
The mean is located at the balancing point of
the histogram. For a skewed distribution, the mean is
not a good measure of center.
Learning Goal 7:
Mean – Nonresistant Example
The most common measure of central tendency.
Affected by extreme values (skewed dist. or outliers).
Values 1, 2, 3, 4, 5 (plotted on a 0 to 10 axis): Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
Values 1, 2, 3, 4, 10 (plotted on a 0 to 10 axis): Mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4
Learning Goal 7:
The Median
Resistant – The median is said to be
resistant, because extreme values
and/or outliers have little effect on the
median.
In an ordered array, the median is the
“middle” number (50% above, 50%
below).
Learning Goal 7:
Median – Resistant Example
Not affected by extreme values (skewed
distributions or outliers).
[Dot plots on a 0 to 10 axis: Median = 3 in both plots]
Learning Goal 7:
Mean vs. Median with Outliers
Percent of people dying
Without the outliers: $\bar{x} = 3.4$
With the outliers: $\bar{x} = 4.2$
The mean (non-resistant) is pulled to the right a lot by the
outliers (from 3.4 to 4.2).
The median (resistant), on the other hand, is only slightly
pulled to the right by the outliers (from 3.4 to 3.6).
Learning Goal 7:
Effect of Skewed Distributions
• The figure below shows the relative positions of the
mean and median for right-skewed, symmetric, and
left-skewed distributions.
• Note that the mean is pulled in the direction of
skewness, that is, in the direction of the extreme
observations.
• For a right-skewed distribution, the mean is greater
than the median; for a symmetric distribution, the mean
and the median are equal; and, for a left-skewed
distribution, the mean is less than the median.
Learning Goal 7:
Comparing the mean and the median
The mean and the median are the same only if the distribution is symmetrical. The
median is a measure of center that is resistant to skew and outliers. The mean is not.
[Figure: positions of the mean and median for a symmetric distribution and for skewed distributions (left skew and right skew)]
Learning Goal 7:
Which measure of location is the “best”?
Because the median considers only the order
of values, it is resistant to values that are
extraordinarily large or small; it simply notes
that they are one of the “big ones” or “small
ones” and ignores their distance from center.
To choose between the mean and median,
start by looking at the distribution.
The mean is used for unimodal symmetric
distributions, unless extreme values (outliers)
exist.
The median is used for skewed distributions or
when there are outliers present, since the
median is not sensitive to extreme values.
Learning Goal 7:
Class Problem
Observed mean = 2.28, median = 3,
mode = 3.1
What is the shape of the
distribution and why?
Learning Goal 7:
Solution
Solution: Skewed left, because mean (2.28) < median (3) < mode (3.1).
Left-skewed: Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-skewed: Mode < Median < Mean
Learning Goal 7:
Example
Five houses on a hill by the beach.
House Prices:
$2,000,000
$500,000
$300,000
$100,000
$100,000
Learning Goal 7:
Example – Measures of Center
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Sum: $3,000,000
Mean: ($3,000,000/5) = $600,000
Median: middle value of ranked data = $300,000
Mode: most frequent value = $100,000
Which is the best measure of center? Median
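The three measures for the house prices can be reproduced with Python's standard `statistics` module. This is a brief sketch; the variable names are illustrative only.

```python
import statistics

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(house_prices)      # pulled up by the $2,000,000 house
median_price = statistics.median(house_prices)  # middle value of the ranked data
mode_price = statistics.mode(house_prices)      # most frequent value

print(mean_price, median_price, mode_price)
```

Only one of the five houses costs more than the mean, which is why the median describes a typical house here better than the mean does.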
Conclusion – Mean or Median?
Mean – use with symmetrical
distributions (no outliers),
because it is nonresistant.
Median – use with skewed
distribution or distribution with
outliers, because it is resistant.
Learning Goal 8
Know the basic
properties and how to
compute the standard
deviation and IQR of a
set of data.
Learning Goal 8:
How Spread Out is the Distribution?
Variation matters, and Statistics is
about variation.
Are the values of the distribution tightly
clustered around the center or more
spread out?
Always report a measure of spread
along with a measure of center when
describing a distribution numerically.
Learning Goal 8:
Measures of Spread
A measure of variability for a collection
of data values is a number that is
meant to convey the idea of spread for
the data set.
The most commonly used measures of
variability for sample data are the:
range
interquartile range
variance and standard deviation
Learning Goal 8:
Measures of Variation
Variation
• Range
• Interquartile Range
• Variance
• Standard Deviation
Measures of variation give information on the
spread or variability of the data values.
[Figure: two distributions with the same center but different variation]
Learning Goal 8:
The Interquartile Range
One way to describe the spread of a
set of data might be to ignore the
extremes and concentrate on the
middle of the data.
The interquartile range (IQR) lets us
ignore extreme data values and
concentrate on the middle of the data.
To find the IQR, we first need to know
what quartiles are…
Learning Goal 8:
The Interquartile Range
Quartiles divide the data into four equal
sections.
One quarter of the data lies below the lower
quartile, Q1
One quarter of the data lies above the upper
quartile, Q3.
The quartiles border the middle half of the data.
The difference between the quartiles is the
interquartile range (IQR), so
IQR = upper quartile(Q3) – lower quartile(Q1)
Learning Goal 8:
Interquartile Range
Eliminate some outlier or extreme
value problems by using the
interquartile range.
Eliminate some high- and low-valued
observations and calculate the range
from the remaining values.
IQR = 3rd quartile – 1st quartile
IQR = Q3 – Q1
Learning Goal 8:
Finding Quartiles
1. Order the data.
2. Find the median; this divides the data into a
lower and upper half (the median itself is in
neither half).
3. Q1 is then the median of the lower half.
4. Q3 is the median of the upper half.
Example
Even data: Q1 = 27, M = 39, Q3 = 50.5
IQR = 50.5 – 27 = 23.5
Odd data: Q1 = 35, M = 46, Q3 = 54
IQR = 54 – 35 = 19
Learning Goal 8:
Quartiles
Example:
X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70
Each of the four segments between these values holds 25% of the data; Q1 to Q3 is the middle fifty percent.
Interquartile range
= 57 – 30 = 27
Not influenced by extreme values (Resistant).
Learning Goal 8:
Quartiles
Quartiles split the ranked data into 4
segments with an equal number of values per
segment.
[Diagram: Q1, Q2, and Q3 split the ranked data into four segments of 25% each]
The first quartile, Q1, is the value for which
25% of the observations are smaller and 75%
are larger.
Q2 is the same as the median (50% are
smaller, 50% are larger).
Only 25% of the observations are greater
than the third quartile.
Learning Goal 8:
The Interquartile Range - Histogram
The lower and upper quartiles are the 25th
and 75th percentiles of the data, so…
The IQR contains the middle 50% of the
values of the distribution, as shown in figure:
Learning Goal 8:
Find and Interpret IQR
Travel times to work for 20 randomly selected New Yorkers
Raw data: 10 30 5 25 40 20 10 15 30 20 15 20 85 15 65 15 60 60 40 45
Ordered data: 5 10 10 15 15 15 15 20 20 20 25 30 30 40 40 45 60 60 65 85
Q1 = 15
M = 22.5
Q3 = 42.5
IQR = Q3 – Q1
= 42.5 – 15
= 27.5 minutes
Interpretation: The range of the middle half of travel times
for the New Yorkers in the sample is 27.5 minutes.
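The "median of each half" method for quartiles can be sketched in Python as follows. The helper names are hypothetical. Note that software packages use several different quartile conventions, so other tools may report slightly different values of Q1 and Q3 for the same data.

```python
def median_of_sorted(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n the median itself belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (median_of_sorted(lower),
            median_of_sorted(ordered),
            median_of_sorted(upper))


travel_times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
                15, 20, 85, 15, 65, 15, 60, 60, 40, 45]
q1, m, q3 = quartiles(travel_times)
iqr = q3 - q1
print(q1, m, q3, iqr)
```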
Learning Goal 8:
Interquartile Range on the TI-84
• Use STAT/CALC/1-Var Stats to find
Q1 and Q3.
• Then calculate IQR = Q3 – Q1.
Interquartile range = Q3 – Q1 = 9 – 6 = 3.
Learning Goal 8:
Calculate IQR - Your Turn
The following scores for a statistics 10-point quiz were reported. What is the
value of the interquartile range?
7 8 9 6 8 0 9 9 9
0 0 7 10 9 8 5 7 9
Solution: IQR = 3
Learning Goal 8:
5-Number Summary
Definition:
The five-number summary of a distribution consists
of the smallest observation, the first quartile, the
median, the third quartile, and the largest observation,
written in order from smallest to largest.
Minimum
Q1
M
Q3
Maximum
Learning Goal 8:
5-Number Summary
The 5-number summary of a distribution
reports its minimum, 1st quartile Q1, median,
3rd quartile Q3, and maximum in that order.
Obtain 5-number summary from 1-Var Stats.
Min. = 3.7, Q1 = 6.6, Med. = 7, Q3 = 7.6, Max. = 9
Learning Goal 8:
Calculate 5 Number Summary
1. Enter data into L1.
2. STAT; CALC; 1:1-Var Stats; Enter.
3. List: L1.
4. Calculate.
5. Scroll down to 5 number summary.
Learning Goal 8:
Calculate 5 Number Summary – Your Turn
The grades of 25 students are given
below :
42, 63, 47, 77, 46, 71, 68, 83, 91, 55,
67, 66, 63, 57, 50, 69, 73, 82, 77, 58,
66, 79, 88, 97, 86.
Calculate the 5 number summary for the
students' grades.
Solution: 42, 57.5, 68, 80.5, 97
Learning Goal 8:
Calculate 5 Number Summary – Your Turn
A group of University students took part in a
sponsored race. The number of laps completed is
given in the table.
number of laps   frequency
1 – 5                2
6 – 10               9
11 – 15             15
16 – 20             20
21 – 25             17
26 – 30             25
31 – 35              2
36 – 40              1
Calculate the 5 number summary.
Solution: 3, 13, 18, 28, 38
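One way to arrive at this solution is to replace each class by its midpoint, repeated as many times as its frequency, and then take the five-number summary of the expanded list. The sketch below assumes that approach and the median-of-halves rule for quartiles; the helper names are hypothetical.

```python
def median_of_sorted(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median_of_sorted(lower), median_of_sorted(ordered),
            median_of_sorted(upper), ordered[-1])


# ((lower limit, upper limit), frequency) for each class of laps.
laps = [((1, 5), 2), ((6, 10), 9), ((11, 15), 15), ((16, 20), 20),
        ((21, 25), 17), ((26, 30), 25), ((31, 35), 2), ((36, 40), 1)]

# ASSUMPTION: every observation in a class is represented by the class midpoint.
expanded = []
for (low, high), freq in laps:
    expanded.extend([(low + high) / 2] * freq)

summary = five_number_summary(expanded)
print(summary)
```

On the TI-84 the same result comes from entering the midpoints in L1 and the frequencies in L2, then running 1-Var Stats with FreqList: L2.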
Learning Goal 8:
Standard Deviation
A more powerful measure of spread
than the IQR is the standard deviation,
which takes into account how far each
data value is from the mean.
A deviation is the distance that a data
value is from the mean.
Since adding all deviations together would
total zero, we square each deviation and
find an average of sorts for the deviations.
But to calculate the standard deviation
you must first calculate the variance.
Learning Goal 8:
Variance
The variance is a measure of variability
that uses all the data.
It measures the average squared deviation of
the measurements about their mean.
Learning Goal 8:
Variance
The variance, notated by $s^2$, is found by
summing the squared deviations and
(almost) averaging them:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Used to calculate Standard Deviation.
The variance will play a role later in our
study, but it is problematic as a measure of
spread - it is measured in squared units – not
the same units as the data, a serious
disadvantage!
Learning Goal 8:
Variance
The variance of a population of N
measurements is the average of the squared
deviations of the measurements about their
mean $\mu$.
Sigma squared: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
The variance of a sample of n measurements
is the sum of the squared deviations of the
measurements about their mean, divided by
(n – 1).
S squared: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Learning Goal 8:
Standard Deviation
The standard deviation, s, is just the
square root of the variance.
It is measured in the same units as the
original data, which is why it is preferred over
the variance.
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Learning Goal 8:
Standard Deviation
In calculating the variance, we squared
all of the deviations, and in doing so
changed the scale of the
measurements.
To return this measure of variability to
the original units of measure, we
calculate the standard deviation, the
positive square root of the variance.
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample standard deviation: $s = \sqrt{s^2}$
Learning Goal 8:
Finding Standard Deviation
The most common measure of spread looks at how far
each observation is from the mean. This measure is
called the standard deviation. Let’s explore it!
Consider the following data on the number of pets
owned by a group of 9 children.
1) Calculate the mean: $\bar{x} = 5$.
2) Calculate each deviation.
deviation = observation – mean
deviation: 1 - 5 = -4
deviation: 8 - 5 = 3
Learning Goal 8:
Finding Standard Deviation
3) Square each deviation.
4) Find the “average” squared deviation. Calculate the sum of
the squared deviations divided by (n-1)…this is called the
variance.
5) Calculate the square root of the variance…this is the standard
deviation.

xi     (xi - mean)     (xi - mean)^2
1      1 - 5 = -4      (-4)^2 = 16
3      3 - 5 = -2      (-2)^2 = 4
4      4 - 5 = -1      (-1)^2 = 1
4      4 - 5 = -1      (-1)^2 = 1
4      4 - 5 = -1      (-1)^2 = 1
5      5 - 5 = 0       (0)^2 = 0
7      7 - 5 = 2       (2)^2 = 4
8      8 - 5 = 3       (3)^2 = 9
9      9 - 5 = 4       (4)^2 = 16
Sum    0               52

“average” squared deviation = 52/(9-1) = 6.5. This is the variance.
Standard deviation = square root of variance = $\sqrt{6.5} \approx 2.55$
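The five steps can be traced in Python for the pets data. This is a minimal sketch with illustrative variable names; it follows the slide's steps one at a time rather than calling a library routine.

```python
import math

pets = [1, 3, 4, 4, 4, 5, 7, 8, 9]
n = len(pets)

mean = sum(pets) / n                          # step 1: the mean
deviations = [x - mean for x in pets]         # step 2: observation - mean
squared = [d ** 2 for d in deviations]        # step 3: square each deviation
variance = sum(squared) / (n - 1)             # step 4: divide the sum by n - 1
std_dev = math.sqrt(variance)                 # step 5: square root of the variance

print(mean)               # 5.0
print(sum(deviations))    # 0.0, the deviations always cancel out
print(sum(squared))       # 52.0
print(variance)           # 6.5
print(round(std_dev, 2))  # 2.55
```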
Learning Goal 8:
Standard Deviation - Example
The standard deviation is used to describe the variation around the mean.
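1) First calculate the variance $s^2$.
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
2) Then take the square root to get
the standard deviation s.
$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
[Figure: distribution marked with the mean $\bar{x}$ and ± 1 s.d.]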
Learning Goal 8:
Standard Deviation - Procedure
1. Compute the mean $\bar{x}$.
2. Subtract the mean from each individual
value to get a list of the deviations from the
mean $(x - \bar{x})$.
3. Square each of the differences to produce
the squares of the deviations from the mean
$(x - \bar{x})^2$.
4. Add all of the squares of the deviations from
the mean to get $\sum (x - \bar{x})^2$.
5. Divide the sum $\sum (x - \bar{x})^2$
by $(n - 1)$. [variance]
6. Find the square root of the result.
Learning Goal 8:
Standard Deviation - Example
Find the standard deviation of the Mulberry
Bank customer waiting times. Those times (in
minutes) are 1, 3, 14. Use a Table.
We will not normally calculate standard deviation by hand.
Learning Goal 8:
Calculate Standard Deviation
1. Enter data into L1.
2. STAT; CALC; 1:1-Var Stats; Enter.
3. List: L1; Calculate.
4. Sx is the sample standard deviation.
5. σx is the population standard
deviation.
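The difference between Sx and σx can be seen with Python's standard `statistics` module, where `stdev` divides by n - 1 (sample) and `pstdev` divides by N (population). The sketch below uses the Mulberry Bank waiting times 1, 3, 14 from the earlier example; the variable names are illustrative.

```python
import statistics

waiting_times = [1, 3, 14]  # minutes

sx = statistics.stdev(waiting_times)       # sample standard deviation (divides by n - 1)
sigma_x = statistics.pstdev(waiting_times)  # population standard deviation (divides by N)

print(sx)
print(sigma_x)
```

For the same data the sample value is always at least as large as the population value, because it divides the same sum of squared deviations by a smaller number.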
Learning Goal 8:
Calculate Standard Deviation – Your Turn
The prices ($) of 18 brands of walking
shoes:
90 70 70 70 75 70
65 68 60 74 70 95
75 70 68 65 40 65
Calculate the standard deviation.
Solution: Sx = $11.31
Learning Goal 8:
Calculate Standard Deviation – Your Turn
During 3 hours at Heathrow airport 55 aircraft
arrived late. The number of minutes they
were late is shown in the grouped frequency
table.
minutes late   frequency
0 - 9             27
10 - 19           10
20 - 29            7
30 - 39            5
40 - 49            4
50 - 59            2
Calculate the standard deviation for the
number of minutes late.
Solution: 14.9 min.
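For grouped data the standard deviation can be approximated from the class midpoints and frequencies. The sketch below assumes each midpoint is the average of the class limits (4.5, 14.5, and so on) and uses the sample formula with n - 1, which corresponds to Sx on the calculator; the function name is hypothetical.

```python
import math

# ((lower limit, upper limit), frequency) for each class of minutes late.
late = [((0, 9), 27), ((10, 19), 10), ((20, 29), 7),
        ((30, 39), 5), ((40, 49), 4), ((50, 59), 2)]


def grouped_sample_sd(classes):
    """Approximate the sample standard deviation from a grouped frequency table."""
    n = sum(freq for _, freq in classes)
    # ASSUMPTION: every observation in a class sits at the class midpoint.
    mids = [((low + high) / 2, freq) for (low, high), freq in classes]
    mean = sum(mid * freq for mid, freq in mids) / n
    sum_sq = sum(freq * (mid - mean) ** 2 for mid, freq in mids)
    return math.sqrt(sum_sq / (n - 1))


sd_late = grouped_sample_sd(late)
print(round(sd_late, 1))  # 14.9
```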
Learning Goal 8:
Standard Deviation - Properties
The value of s is never negative.
s is zero only when all of the data values are the
same number.
Larger values of s indicate greater amounts of
variation.
The units of s are the same as the units of the
original data. This is one reason s is preferred to s².
Measures spread about the mean and should
only be used to describe the spread of a
distribution when the mean is used to describe
the center (i.e., symmetrical distributions).
Nonresistant (like the mean), s can increase
dramatically due to extreme values or outliers.
Learning Goal 8:
Standard Deviation - Example
Larger values of standard deviation indicate greater amounts
of variation.
Small standard deviation
Large standard deviation
Learning Goal 8:
Standard Deviation - Example
Standard Deviation: the more variation, the
larger the standard deviation. Data set II has
greater variation.
Learning Goal 8:
Standard Deviation - Example
Data Set I
Data Set II
Data set II has greater variation and the visual clearly
shows that it is more spread out.
Learning Goal 8:
Comparing Standard Deviations
The more variation, the larger the standard deviation.
[Dot plots of three data sets on a common axis from 11 to 21]
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.926
Data C: Mean = 15.5, S = 4.567
Values far from the mean are given extra weight
(because deviations from the mean are squared).
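The individual dots in the three plots are not recoverable from the text, so the sketch below uses assumed data sets, chosen only because they reproduce the reported mean and S for A, B, and C. It shows how the squared deviations drive the size of S.

```python
import statistics

# Assumed values: the original dot plots are not recoverable, so these
# hypothetical data sets were picked to match the reported mean and S.
data_sets = {
    "A": [11, 12, 13, 16, 16, 17, 18, 21],
    "B": [14, 15, 15, 15, 16, 16, 16, 17],
    "C": [11, 11, 11, 12, 19, 20, 20, 20],
}

def summarize(values):
    """Return (mean, sample standard deviation, sum of squared deviations)."""
    mean = statistics.mean(values)
    squared = sum((x - mean) ** 2 for x in values)
    return mean, statistics.stdev(values), squared

results = {name: summarize(values) for name, values in data_sets.items()}
for name, (mean, s, ss) in results.items():
    print(f"Data {name}: mean = {mean}, S = {s:.3f}, sum of squared deviations = {ss}")
```

All three sets share the same mean, but the set whose values sit farthest from 15.5 has by far the largest sum of squared deviations and therefore the largest S.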
Learning Goal 8:
Spread: Range
The range of the data is the difference
between the maximum and minimum
values:
Range = max – min
A disadvantage of the range is that a
single extreme value can make it very
large and, thus, not representative of
the data overall.
Learning Goal 8:
Range
Simplest measure of variation.
Difference between the largest and the
smallest values in a set of data.
Range = Xlargest – Xsmallest
Example (dot plot on an axis from 0 to 14):
Range = 14 - 1 = 13
Learning Goal 8:
Disadvantages of the Range
Ignores the way in which data are distributed
[Two dot plots on the same axis, 7 to 12, with differently distributed values]
Range = 12 - 7 = 5 (first data set)
Range = 12 - 7 = 5 (second data set)
Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
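The two ranges above can be checked with a short Python sketch. The function name `data_range` is illustrative.

```python
# The two data sets from the slide: identical except for the last value.
without_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

print(data_range(without_outlier))
print(data_range(with_outlier))
```

Changing one value out of 25 moves the range from 4 to 119, which is what "sensitive to outliers" means here.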
Learning Goal 8:
Range
• The range is affected by outliers (large
or small values relative to the rest of
the data set).
• The range does not utilize all the
information in the data set, only the
largest and smallest values.
• Thus, range is not a very useful
measure of spread or variation.
Learning Goal 8:
Summary Measures
Describing Data Numerically
Central Tendency: Mean, Median, Mode
Quartiles
Variation: Range, Interquartile Range, Variance, Standard Deviation
Shape: Skewness
Learning Goal 9
Understand which
measures of center and
spread are resistant
and which are not.
Learning Goal 9:
Resistant or Non-Resistant
Which measures of center and spread
are resistant?
1. Median – Extreme values and outliers
have little effect.
2. IQR – Measures the spread of the middle
50% of the data, so extreme
values and outliers have little or no effect.
3. When using Median to measure the
center of a distribution, use IQR to
measure the spread of the distribution.
Learning Goal 9:
Resistant or Non-Resistant
Which measures of center and spread
are Non-Resistant?
1. Mean – Extreme values and outliers pull
the mean towards those values.
2. Standard Deviation – Measures the
spread relative to the mean. Extreme
values or outliers will increase the
standard deviation of the distribution.
3. When using Mean to measure the center
of a distribution, use Standard Deviation
to measure the spread of the distribution.
Learning Goal 9:
Resistant or Non-Resistant
Measures of Center:
Mean (not resistant)
Median (resistant)
Measures of Spread:
Standard deviation (not resistant)
IQR (resistant)
Range (not resistant)
When the distribution is reasonably symmetric with no
outliers, the mean and the standard deviation are preferred,
because they are calculated from all the data values and
so use all the available information.
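To see resistance in numbers, the sketch below reuses the 25-value data set from the range slides and replaces the largest value, 5, with 120. The quartiles are taken as the medians of the lower and upper halves, which is an assumption about the convention in use; the helper name `median_iqr` is illustrative.

```python
import statistics

def median_iqr(values):
    """Median and IQR, taking quartiles as medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the median position
    upper = ordered[-half:]   # values above the median position
    iqr = statistics.median(upper) - statistics.median(lower)
    return statistics.median(ordered), iqr

base = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
extreme = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

for label, values in (("max = 5", base), ("max = 120", extreme)):
    median, iqr = median_iqr(values)
    print(f"{label}: mean = {statistics.mean(values):.2f}, "
          f"s = {statistics.stdev(values):.2f}, "
          f"median = {median}, IQR = {iqr}")
```

The median and IQR are identical for both versions of the data, while the mean and s both jump when the extreme value is introduced.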
Learning Goal 9:
Resistant or Non-Resistant
Center and Spread (animated dot plot of Quiz Scores, axis 20 to 100)
Values shown during the animation:
Mean     Median   S       IQR
63.33    70       16.84   30
68.82    70       12.56   20
72.5     72.5     10.16   15
What is the difference between the center and spread of a
distribution?
Which measure of center (mean or median) was affected
more by adding data points that skewed the distribution?
Explain your answer.
In a symmetric distribution:
• The mean, non-resistant, is used to represent the center.
• The standard deviation (S), non-resistant, is used to represent the spread.
In a skewed distribution:
• The median, resistant, is used to represent the center.
• The interquartile range (IQR), resistant, is used to represent the spread.
©2013 All rights reserved.
For each distribution below, which measure of center and
spread would you use? How do you know?
A: Mean & S
B: Median & IQR
CCSS 6th Grade Statistics and Probability 2.0
Describe the distribution of a data set.
Lesson to be used by EDI-trained teachers only.
Learning Goal 9:
Resistant or Non-Resistant
Median and IQR are paired together –
Resistant.
Mean and Standard Deviation are
paired together – Non-Resistant.
Learning Goal 10
Be able to select a
suitable measure of
center and a suitable
measure of spread for
a variable based on
information about its
distribution.
Learning Goal 10:
Choosing Measures of Center and Spread
We now have a choice between two descriptions for center and
spread:
•Mean and Standard Deviation
•Median and Interquartile Range
Choosing Measures of Center and Spread
•The median and IQR are usually better than the mean and
standard deviation for describing a skewed distribution or a
distribution with outliers.
•Use mean and standard deviation only for reasonably
symmetric distributions that don’t have outliers.
•NOTE: Numerical summaries do not fully describe the
shape of a distribution. ALWAYS PLOT YOUR DATA!
Learning Goal 10:
Choosing Measures of Center and Spread
Plot your data
Dotplot, Stemplot, Histogram
Interpret what you see:
Shape, Outliers, Center, Spread
Choose numerical summary:
x̄ and s, or
Median and IQR
Learning Goal 10:
Choosing Center and Spread - Practice
The distribution of a data set shows the arrangement of values in the data set.
The center of a distribution is a number that represents all the values in the data set.
The spread of a distribution is a number that describes the variability in the data set.
The dot plots below show the ratings given to a new movie by two different audiences.
Answer choices for each plot: Symmetric (Center: Mean, Spread: S) or Skewed (Center: Median, Spread: IQR).
1. Audience #1 (dot plot; axis: Audience Rating, 1 to 10)
Mean: 7   Median: 7   S: 1.43   IQR: 2
Shape: The shape of the distribution is mostly symmetric.
Center: Because the distribution is symmetric, the mean of 7 can be used as the measure of center.
Spread: The S of the distribution is 1.43.
2. Audience #2 (dot plot; axis: Audience Rating, 1 to 10)
Mean: 5.71   Median: 6   S: 1.67   IQR: 3
Shape: The shape of the distribution is mostly symmetric.
Center: Because the distribution is symmetric, the mean of 5.71 can be used as the measure of center.
Spread: The S of the distribution is 1.67.
Learning Goal 10:
Choosing Center and Spread - Practice
The distribution of a data set shows the arrangement of values in the data set.
The center of a distribution is a number that represents all the values in the data set.
The spread of a distribution is a number that describes the variability in the data set.
The histograms below show the number of hours studied in a week for students in two math classes.
Answer choices for each plot: Symmetric (Center: Mean, Spread: S) or Skewed (Center: Median, Spread: IQR).
3. Class #1 (histogram; x-axis: Hours Studied, bins 0-2, 3-5, 6-8, 9-11, 12-14, 15-17; y-axis: Students)
Mean: 9.69   Median: 10.5   S: 3.6   IQR: 6.5
Shape: The shape of the distribution is skewed to the left.
Center: Because the distribution is skewed, the median of 10.5 can be used as the measure of center.
Spread: The IQR of the distribution is 6.5.
4. Class #2 (histogram; x-axis: Hours Studied, bins 0-2, 3-5, 6-8, 9-11, 12-14, 15-17; y-axis: Students)
Mean: 7.75   Median: 7   S: 2.93   IQR: 4.5
Shape: The shape of the distribution is skewed to the right.
Center: Because the distribution is skewed, the median of 7 can be used as the measure of center.
Spread: The IQR of the distribution is 4.5.
Learning Goal 10:
Choosing Center and Spread - Practice
The distribution of a data set shows the arrangement of values in the data set.
The center of a distribution is a number that represents all the values in the data set.
The spread of a distribution is a number that describes the variability in the data set.
The dot plot below shows the number of hours of sleep per night for 33 students in a 6th-grade class.
The histogram below shows the number of hours of sleep per night for 33 adults selected at random.
Answer choices for each plot: Symmetric (Center: Mean, Spread: S) or Skewed (Center: Median, Spread: IQR).
1. Students (dot plot; axis: Hours of Sleep, 4 to 11)
Mean: 8.4   Median: 9   S: 1.53   IQR: 3
Shape: The shape of the distribution is skewed left.
Center: Because the distribution is skewed, the median of 9 can be used as the measure of center.
Spread: The IQR of the distribution is 3.
2. Adults (histogram; x-axis: Hours Slept, bins 0-1, 2-3, 4-5, 6-7, 8-9, 10+; y-axis: Adults)
Mean: 6.8   Median: 7   S: 1.54   IQR: 2.5
Shape: The shape of the distribution is fairly symmetric, with a slight skew to the left.
Center: Because the distribution is mostly symmetric, the mean of 6.8 can be used as the measure of center.
Spread: The S of the distribution is 1.54.
Learning Goal 10:
Choosing Center and Spread - Practice
The histograms below show the scores of 31 students on a pretest and posttest.
1. Pretest (histogram; x-axis: Score, bins 41-50, 51-60, 61-70, 71-80, 81-90, 91-100; y-axis: Students)
Mean: 57.67   Median: 54   S: 9.07   IQR: 14
Shape: The shape of the distribution is skewed right.
Center: Because the distribution is skewed, the median of 54 can be used as the measure of center.
Spread: The IQR of the distribution is 14.
2. Posttest (histogram; x-axis: Score, bins 41-50, 51-60, 61-70, 71-80, 81-90, 91-100; y-axis: Students)
Mean: 76   Median: 76   S: 9.81   IQR: 24
Shape: The shape of the distribution is mostly symmetric.
Center: Because the distribution is mostly symmetric, the mean of 76 can be used as the measure of center.
Spread: The S of the distribution is 9.81.
Did scores on the test improve from the pretest
to the posttest? Explain your answer.
Yes, test scores improved from the pretest to the posttest. It
can be seen by the noticeably higher center in the distribution
of scores for the posttest.
Learning Goal 10:
Choosing Center and Spread - Practice
The dot plot below shows the number of pets in each
household of 28 students in a 6th-grade class.
1. (dot plot; axis: Number of Pets, 0 to 9)
Mean: 1.82   Median: 2   S: 1.13   IQR: 1.5
Shape: The shape of the distribution is skewed right.
Center: Because the distribution is skewed, the
median of 2 can be used as the measure of center.
Spread: The IQR of the distribution is 1.5.
Learning Goal 10:
Choosing Center and Spread - Questions
Choose Yes or No to indicate whether each statement is
true about these distributions.
A. Both distributions are symmetric.   O Yes O No
B. The median is the best measure of center
for Distribution A.   O Yes O No
C. Overall, scores were higher in Distribution A
than Distribution B.   O Yes O No
D. There is more variability in scores for
Distribution A than Distribution B.   O Yes O No
E. Distribution A is skewed to the right.   O Yes O No
F. The Standard Deviation can be used
to describe the spread for Distribution B.   O Yes O No
Learning Goal 11
Be able to describe the
distribution of a
quantitative variable in
terms of its shape,
center, and spread.
Learning Goal 11:
How to Analyze Quantitative Data
Examine each variable by itself.
Then study relationships among the variables.
Start with a graph or graphs.
Add numerical summaries.
(Example data shown: 2009 Fuel Economy Guide, a table of MODEL and MPG for 24 midsize cars.)
Learning Goal 11:
How to Describe a Quantitative Distribution
The purpose of a graph is to help us understand the data. After you
make a graph, always ask, “What do I see?”
How to Describe the Distribution of a Quantitative Variable
In any graph, look for the overall pattern and for striking
departures from that pattern.
Describe the overall pattern of a distribution by its:
•Shape
•Center
•Spread
•Outliers
Note individual values that fall outside the overall pattern.
These departures are called outliers.
Don’t forget your SOCS!
Learning Goal 11:
Describing a Quantitative Distribution
We describe a distribution (the values the variable
takes on and how often it takes these values) using
the acronym SOCS.
Shape– We describe the shape of a distribution in one of
two ways:
Symmetric/Approx. Symmetric
or
Skewed right/Skewed left
Example: Approx. Symmetric (with extreme values)
[Dot plot: Babe Ruth’s Single Season Home Runs; axis: Number of Home Runs in a Single Season, 20 to 65]
Learning Goal 11:
Describing a Quantitative Distribution
Outliers: Observations that we would consider
“unusual”. Data that don’t “fit” the overall pattern of
the distribution.
Babe Ruth had two seasons that appear to be
somewhat different than the rest of his career.
These may be “outliers”. (We’ll learn a numerical way
to determine if observations are truly “unusual” later).
Possible outliers: 22 and 25
[Dot plot: Babe Ruth’s Single Season Home Runs; axis: Number of Home Runs in a Single Season, 20 to 65; the two lowest values are marked “Unusual observation???”]
Learning Goal 11:
Describing a Quantitative Distribution
Center: A single value that describes the entire
distribution. Symmetric distributions use mean and
skewed distributions use median.
[Dot plot: Babe Ruth’s Single Season Home Runs; axis: Number of Home Runs in a Single Season, 20 to 65]
Median is 46
Learning Goal 11:
Describing a Quantitative Distribution
Spread: Talk about the variation of a distribution.
Symmetric distributions use standard deviation and
skewed distributions use IQR.
[Dot plot: Babe Ruth’s Single Season Home Runs; axis: Number of Home Runs in a Single Season, 20 to 65; Q1 and Q3 are marked]
IQR is 19
Learning Goal 11:
Distribution Description using SOCS
The distribution of Babe Ruth’s
number of home runs in a single
season is approximately symmetric [1]
with two possible outlier observations
at 22 and 25 home runs [2]. He typically
hits about 46 [3] home runs in a season.
Over his career, the number of home
runs has normally varied between
35 and 54 [4].
[1] Shape
[2] Outliers
[3] Center
[4] Spread
Learning Goal 11:
Describe the Distribution – Your Turn
The table and dotplot below display the
Environmental Protection Agency’s estimates
of highway gas mileage in miles per gallon
(MPG) for a sample of 24 model year 2009
midsize cars.
Describe the shape, center, and spread of
the distribution. Are there any outliers?
2009 Fuel Economy Guide
 #  MODEL              MPG    #  MODEL               MPG    #  MODEL                MPG
 1  Acura RL           22     9  Dodge Avenger       30    17  Mercury Milan        29
 2  Audi A6 Quattro    23    10  Hyundai Elantra     33    18  Mitsubishi Galant    27
 3  Bentley Arnage     14    11  Jaguar XF           25    19  Nissan Maxima        26
 4  BMW 528I           28    12  Kia Optima          32    20  Rolls Royce Phantom  18
 5  Buick Lacrosse     28    13  Lexus GS 350        26    21  Saturn Aura          33
 6  Cadillac CTS       25    14  Lincoln MKZ         28    22  Toyota Camry         31
 7  Chevrolet Malibu   33    15  Mazda 6             29    23  Volkswagen Passat    29
 8  Chrysler Sebring   30    16  Mercedes-Benz E350  24    24  Volvo S80            25
Learning Goal 11:
Describe the Distribution – Solution
The distribution of highway gas
mileage in miles per gallon (MPG) for
a sample of 24 model year 2009
midsize cars is skewed left with two
possible outliers at 18 and 14 miles
per gallon. The gas mileage of a
typical 2009 midsize car in the sample
is 28 mpg. The gas mileage normally
varied between 24.5 and 30
mpg.
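The center and spread in this solution can be checked with a short Python sketch. It assumes quartiles are the medians of the lower and upper halves of the sorted data. With that convention Q1 comes out as 25 rather than the 24.5 quoted above; quartile conventions differ between textbooks and software, so small differences like this are common.

```python
import statistics

# Highway MPG for the 24 midsize cars in the table (2009 Fuel Economy Guide).
mpg = [22, 23, 14, 28, 28, 25, 33, 30,
       30, 33, 25, 32, 26, 28, 29, 24,
       29, 27, 26, 18, 33, 31, 29, 25]

ordered = sorted(mpg)
half = len(ordered) // 2
median = statistics.median(ordered)
q1 = statistics.median(ordered[:half])    # median of the lower half
q3 = statistics.median(ordered[-half:])   # median of the upper half
mean = statistics.mean(mpg)

print(f"median = {median}, Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")
print(f"mean = {mean:.2f}")
```

The mean falls below the median, which is consistent with the low values at 14 and 18 mpg pulling the mean down in a left-skewed distribution.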
Learning Goal 11:
Describe the Distribution – Your Turn
Smart Phone Battery Life: Here is the estimated battery
life for each of 9 different smart phones in minutes.
Describe the distribution.
Smart Phone         Battery Life (minutes)
Apple iPhone        300
Motorola Droid      385
Palm Pre            300
Blackberry Bold     360
Blackberry Storm    330
Motorola Cliq       360
Samsung Moment      330
Blackberry Tour     300
HTC Droid           460
Learning Goal 11:
Describe the Distribution – Solution
Solution:
[Dot plot: BatteryLife (minutes), axis 300 to 460]
Sorted data: 300 300 300 330 330 360 360 385 460
Shape: There is a peak at 300 and the distribution has a
long tail to the right (skewed to the right).
Center: The median value is 330 minutes.
Spread: The IQR is 72.5 minutes.
Outliers: There is one phone with an unusually long
battery life, the HTC Droid at 460 minutes.
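The median and IQR in this solution can be reproduced with the sketch below, which also prints a rough text dot plot. It assumes quartiles are the medians of the lower and upper halves, with the middle value left out because n is odd.

```python
import statistics

# Battery life (minutes) for the nine phones in the table.
battery = [300, 385, 300, 360, 330, 360, 330, 300, 460]

ordered = sorted(battery)
half = len(ordered) // 2
median = statistics.median(ordered)
q1 = statistics.median(ordered[:half])    # lower half, middle value excluded
q3 = statistics.median(ordered[-half:])   # upper half, middle value excluded

# A rough text dot plot: one asterisk per phone at each value.
for value in sorted(set(ordered)):
    print(f"{value:>4} {'*' * ordered.count(value)}")

print(f"median = {median}, Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")
```

The text plot shows the peak at 300 and the single value at 460 that stretches the right tail.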
Cartoon Time
Assignment
Chapter 4 Notes Worksheet
Exercises pg. 72 – 79: #5 - 18, 30 - 33, 43,
44, 48
Read Ch-5, pg. 80 - 94