Transcript Box Plots

Why use boxplots?
• ease of construction
• convenient handling of outliers
• construction is not subjective (unlike histograms)
• used with medium or large data sets (n > 10)
• useful for comparative displays
Disadvantages of boxplots
• do not retain the individual observations
• should not be used with small data sets (n < 10)
How to construct
• find the five-number summary: Min, Q1, Med, Q3, Max
• draw a box from Q1 to Q3
• draw the median as a line inside the box
• extend whiskers to the min & max
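The first construction step can be sketched in Python. This is only an illustration: it assumes quartiles are taken as the medians of the lower and upper halves of the sorted data (a common classroom convention; other software may interpolate differently), and the function name and sample values are hypothetical.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]            # values below the median position
    upper = values[half + n % 2:]    # values above it (middle value skipped when n is odd)
    return values[0], median(lower), median(values), median(upper), values[-1]


# Hypothetical sample, only to show the call.
sample = [2, 4, 4, 5, 7, 8, 9, 11, 12, 15, 20]
print(five_number_summary(sample))
```

The box is then drawn from the second to the fourth value of the result, with the median line at the third.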
Modified boxplots
• display outliers
• fences mark off mild & extreme outliers
• whiskers extend to the largest (smallest) data value inside the fence
ALWAYS use modified boxplots in this class!!!
Inner fence
Q1 – 1.5IQR
Q3 + 1.5IQR
Interquartile Range (IQR) is the range (length) of the box: Q3 – Q1
Any observation outside this fence is an outlier! Put a dot for the outliers.
Modified Boxplot . . .
Draw each “whisker” from the quartile to the most extreme
observation that is still within the fence!
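The whisker rule can be sketched as a small Python helper. The function name and the sample values are hypothetical, and the quartiles are passed in rather than computed, so the sketch does not depend on any particular quartile convention. An observation lying exactly on a fence is treated as inside it.

```python
def whisker_ends(data, q1, q3):
    """Return the smallest and largest observations inside the 1.5*IQR fences."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return min(inside), max(inside)


# Hypothetical data with Q1 = 10 and Q3 = 20, so the fences sit at -5 and 35.
observations = [3, 9, 11, 14, 16, 19, 22, 30, 60]
print(whisker_ends(observations, q1=10, q3=20))
```

Here 60 lies beyond the upper fence, so the upper whisker stops at 30 and 60 is drawn as a separate dot.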
Outer fence
Q1 – 3IQR
Q3 + 3IQR
Any observation outside this fence is an extreme outlier!
Any observation between the inner and outer fences is considered a mild outlier.
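As a rough sketch, the two sets of fences can be turned into a classifier. The function name and the example values below are hypothetical; only the 1.5IQR and 3IQR rules come from the slides.

```python
def classify_outliers(data, q1, q3):
    """Split observations into mild and extreme outliers using the fences."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # inner fences
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)       # outer fences
    mild, extreme = [], []
    for x in sorted(data):
        if x < outer[0] or x > outer[1]:
            extreme.append(x)                  # beyond the outer fence
        elif x < inner[0] or x > inner[1]:
            mild.append(x)                     # between inner and outer fences
    return inner, outer, mild, extreme


# Hypothetical values with Q1 = 10 and Q3 = 20 (IQR = 10).
inner, outer, mild, extreme = classify_outliers([12, 18, 36, 40, 51, 90], 10, 20)
print("inner fences:", inner, "outer fences:", outer)
print("mild:", mild, "extreme:", extreme)
```

With these numbers the inner fences are -5 and 35 and the outer fences are -20 and 50, so 36 and 40 are mild outliers while 51 and 90 are extreme.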
For the AP Exam . . .
. . . you just need to find outliers,
you DO NOT need to identify them
as mild or extreme.
Therefore, you just need to use the 1.5IQR fences.
A report from the U.S. Department of Justice gave
the following percent increase in federal prison
populations in 20 northeastern & mid-western
states in 1999.
5.9 4.5 1.3 3.5 5.0 7.2 5.9 6.4 4.5 5.5
5.6 5.3 4.1 8.0 6.3 4.4 4.8 7.2 6.9 3.2
Create a modified boxplot. Describe the distribution.
Use the calculator to create a modified boxplot.
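For checking a calculator result, the same numbers can be worked through in Python. This is a sketch, assuming quartiles are the medians of the lower and upper halves of the sorted data; the variable names are illustrative.

```python
from statistics import median

increases = [5.9, 4.5, 1.3, 3.5, 5.0, 7.2, 5.9, 6.4, 4.5, 5.5,
             5.6, 5.3, 4.1, 8.0, 6.3, 4.4, 4.8, 7.2, 6.9, 3.2]

values = sorted(increases)
half = len(values) // 2
q1 = median(values[:half])                       # median of the lower half
q3 = median(values[half + len(values) % 2:])     # median of the upper half
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in values if x < low_fence or x > high_fence]
inside = [x for x in values if low_fence <= x <= high_fence]

print("five-number summary:", values[0], q1, median(values), q3, values[-1])
print("fences:", low_fence, high_fence)
print("outliers:", outliers)
print("whiskers end at:", inside[0], inside[-1])
```

Under that convention Q1 is 4.45, the median is 5.4, and Q3 is 6.35, which puts the fences at about 1.6 and 9.2. The value 1.3 falls below the lower fence, so it is plotted as an outlier and the lower whisker stops at 3.2.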
Symmetrical boxplots
Approximately symmetrical boxplot
Skewed boxplot
Evidence suggests that a high indoor radon
concentration might be linked to the development of
childhood cancers. The data that follow are the
radon concentrations in two different samples of
houses. The first sample consisted of houses in
which a child was diagnosed with cancer. Houses in
the second sample had no recorded cases of
childhood cancer.
Cancer
10 21 5 23 15 11 9 13 27 13 39 22 7
20 45 12 15 3 8 11 18 16 23 16 9 57
16 21 18 38 37 10 15 11 18 210 22 11 16
17 33 10
No Cancer
9 38 11 12 29 5 7 6 8 29 24 12 17
11 11 3 9 33 17 55 11 29 13 24 7 11
21 6 39 29 7 8 55 9 21 9 3 85 11 14
Create parallel boxplots. Compare the distributions.
Cancer’s 5 # Summary:
Min  Q1  Med  Q3  Max
3    11  16   22  210
IQR = 11
No Cancer’s 5 # Summary:
Min  Q1   Med   Q3    Max
3    8.5  11.5  26.5  85
IQR = 18
Calculating the fences (Cancer):
Q1 – 1.5 IQR = 11 – 1.5*11 = -5.5
Q3 + 1.5 IQR = 22 + 1.5*11 = 38.5
Calculating the fences (No Cancer):
Q1 – 1.5 IQR = 8.5 – 1.5*18 = -18.5
Q3 + 1.5 IQR = 26.5 + 1.5*18 = 53.5
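The summaries and fences for both groups can be reproduced with a short Python sketch. It assumes the median-of-halves quartile rule, which matches the values shown on the slides; the function name `summarize` is illustrative.

```python
from statistics import median

cancer = [10, 21, 5, 23, 15, 11, 9, 13, 27, 13, 39, 22, 7,
          20, 45, 12, 15, 3, 8, 11, 18, 16, 23, 16, 9, 57,
          16, 21, 18, 38, 37, 10, 15, 11, 18, 210, 22, 11, 16,
          17, 33, 10]
no_cancer = [9, 38, 11, 12, 29, 5, 7, 6, 8, 29, 24, 12, 17,
             11, 11, 3, 9, 33, 17, 55, 11, 29, 13, 24, 7, 11,
             21, 6, 39, 29, 7, 8, 55, 9, 21, 9, 3, 85, 11, 14]


def summarize(data):
    """Five-number summary, IQR, 1.5*IQR fences, and outliers for one group."""
    values = sorted(data)
    half = len(values) // 2
    q1 = median(values[:half])                     # median of the lower half
    q3 = median(values[half + len(values) % 2:])   # median of the upper half
    iqr = q3 - q1
    fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outliers = [x for x in values if x < fences[0] or x > fences[1]]
    return {"summary": (values[0], q1, median(values), q3, values[-1]),
            "iqr": iqr, "fences": fences, "outliers": outliers}


for name, group in [("Cancer", cancer), ("No Cancer", no_cancer)]:
    print(name, summarize(group))
```

Both lower fences are negative, so neither group can have low outliers. The value 55 occurs twice in the no cancer sample, so it appears twice in that group's outlier list.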
Creating a Box Plot
[Figure: parallel boxplots of the Cancer and No Cancer groups on a common Radon axis from 0 to 200]
The median radon concentration for the no cancer
group is lower than the median for the cancer
group. The range of the cancer group is larger than
the range for the no cancer group. Both
distributions are skewed right. The cancer group
has outliers at 39, 45, 57, and 210. The no cancer
group has outliers at 55 and 85.
Creating a Box Plot on your Calculator
Knowing about the DATA
• Which terms best represent the data?
– The median and IQR best describe skewed data
– While the mean and standard deviation (or variance) describe
symmetrical data
– Spread – how far from the mean the data stretch
– To calculate the variance, square the differences between the
mean and each data value, then divide their sum by n – 1.
– Variance (s²) – a measure of how far a set of numbers
is spread out.
A variance of zero indicates that all the values are identical.
Example:
• A person’s metabolic rate is the rate at which
the body consumes energy. Metabolic rate is
important in studies of weight gain, dieting and
exercise. Here are the metabolic rates of 7 men
who took part in a study of dieting (units per
24 hours):
• Data: 1792 1666 1362 1614 1460 1867 1439
• Calculating standard deviation and variance on
the calculator
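To check a calculator result, the same quantities can be computed in Python. This sketch assumes the sample definitions (dividing by n – 1); the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev, variance

rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

# By hand: square each deviation from the mean, then divide the sum by n - 1.
x_bar = mean(rates)
squared_deviations = [(x - x_bar) ** 2 for x in rates]
s2_by_hand = sum(squared_deviations) / (len(rates) - 1)
s_by_hand = sqrt(s2_by_hand)

# The standard library functions use the same n - 1 (sample) definitions.
print("mean:", x_bar)
print("variance:", round(s2_by_hand, 2), round(variance(rates), 2))
print("standard deviation:", round(s_by_hand, 2), round(stdev(rates), 2))
```

The mean works out to 1600, the squared deviations sum to 214870, and dividing by 6 gives a variance of about 35811.67 and a standard deviation of about 189.24.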