Section 2.5 - Web4students

Chapter 2: Exploring Data with Graphs and Numerical Summaries
Section 2.5: Using Measures of Position to Describe Variability
Copyright © 2013, 2009, and 2007, Pearson Education, Inc.
Percentile
The p th percentile is a value such that p percent of the
observations fall below or at that value.
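The definition above leaves some freedom in small samples, and statistical software uses several different conventions. Below is a minimal Python sketch of one common convention, the nearest-rank method (our choice for illustration, not necessarily the convention used elsewhere in this section): take the smallest observation that has at least p percent of the observations at or below it. The function name `percentile_nearest_rank` and the data are hypothetical. This convention can give different quartiles from the median-of-halves rule used for quartiles in this section.

```python
import math

def percentile_nearest_rank(data, p):
    """Smallest observation with at least p percent of the observations
    at or below it (nearest-rank convention, assumed; 0 < p <= 100)."""
    if not data:
        raise ValueError("data must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(data)
    # 1-based rank of the first value that has p% of the data at or below it.
    # Multiplying before dividing keeps the result exact for integer p.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

values = [12, 7, 3, 18, 9, 15, 4, 10]      # hypothetical data
print(percentile_nearest_rank(values, 25))  # prints 4 (2 of 8 values are <= 4)
```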
Quartiles
Figure 2.14 The Quartiles Split the Distribution Into Four Parts. 25% is below the first
quartile (Q1), 25% is between the first quartile and the second quartile (the median, Q2), 25%
is between the second quartile and the third quartile (Q3), and 25% is above the third quartile.
Question: Why is the second quartile also the median?
SUMMARY: Finding Quartiles
- Arrange the data in order.
- Consider the median. This is the second quartile, Q2.
- Consider the lower half of the observations (excluding the median itself if n is odd). The median of these observations is the first quartile, Q1.
- Consider the upper half of the observations (excluding the median itself if n is odd). Their median is the third quartile, Q3.
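The four steps above can be sketched in Python as follows, assuming the median-of-halves rule exactly as stated. The function name `quartiles` and the example data are illustrative. The sketch uses `statistics.median` from the standard library, which averages the two middle values when the count is even.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) by the median-of-halves rule."""
    ordered = sorted(data)               # step 1: arrange the data in order
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    q2 = median(ordered)                 # step 2: the median is Q2
    half = n // 2
    lower = ordered[:half]               # step 3: lower half (median excluded when n is odd)
    upper = ordered[half + n % 2:]       # step 4: upper half (median excluded when n is odd)
    return median(lower), q2, median(upper)

print(quartiles([3, 4, 7, 9, 10, 12, 15, 18]))  # hypothetical data; prints (5.5, 9.5, 13.5)
print(quartiles([1, 2, 3, 4, 5, 6, 7]))         # odd n; prints (2, 4, 6)
```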
Example: Cereal Sodium Data
Consider the sodium values for the 20 breakfast cereals. What are the
quartiles for the 20 cereal sodium values? From Table 2.3, the sodium values, in
ascending order, are:
The median of the 20 values is the average of the 10th and 11th
observations, 180 and 180, which is Q2 = 180 mg.
The first quartile Q1 is the median of the 10 smallest observations (in the
top row), which is the average of 130 and 140, Q1 = 135 mg.
The third quartile Q3 is the median of the 10 largest observations (in the
bottom row), which is the average of 200 and 210, Q3 = 205 mg.
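With n = 20, each half holds 10 observations, so its median is the average of its 5th and 6th values. These are the 5th and 6th smallest observations overall for Q1 and the 15th and 16th smallest overall for Q3. The three averages in the example work out as:

$$
\begin{aligned}
Q_2 &= \frac{180 + 180}{2} = 180 \text{ mg} \\
Q_1 &= \frac{130 + 140}{2} = 135 \text{ mg} \\
Q_3 &= \frac{200 + 210}{2} = 205 \text{ mg}
\end{aligned}
$$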
The Interquartile Range (IQR)
The interquartile range is the distance between the third
quartile and first quartile:
IQR = Q3 - Q1
The IQR gives the spread of the middle 50% of the data.
In Words: If the interquartile range of U.S. music teacher salaries equals $16,000, then for the middle 50% of the distribution of salaries, $16,000 is the distance between the largest and smallest salaries (that is, between Q3 and Q1).
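For the cereal sodium data, IQR = Q3 - Q1 = 205 - 135 = 70 mg, so the middle half of the 20 sodium values spans 70 mg. A minimal Python sketch of the same calculation, assuming the median-of-halves quartile rule from this section (the function name `iqr` and the data are hypothetical):

```python
from statistics import median

def iqr(data):
    """Interquartile range Q3 - Q1, with Q1 and Q3 taken as the medians of the
    lower and upper halves (median excluded when n is odd)."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[half + n % 2:])
    return q3 - q1

print(iqr([3, 4, 7, 9, 10, 12, 15, 18]))  # hypothetical data; prints 8.0 (13.5 - 5.5)
```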
Detecting Potential Outliers
Examining the data for unusual observations, such as
outliers, is important in any statistical analysis. Is there a
formula for flagging an observation as potentially being an
outlier?
The 1.5 × IQR Criterion for Identifying Potential Outliers
An observation is a potential outlier if it falls more than
1.5 × IQR below the first quartile or more than 1.5 × IQR
above the third quartile.
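Applied to the cereal quartiles (IQR = 205 - 135 = 70 mg), the criterion gives a lower fence of 135 - 1.5 × 70 = 30 mg and an upper fence of 205 + 1.5 × 70 = 310 mg; an observation of 0 mg, for instance, would be flagged. A minimal Python sketch, again assuming median-of-halves quartiles. The function name `potential_outliers` and the multiplier parameter `k` are hypothetical, with the default `k=1.5` matching the criterion.

```python
from statistics import median

def potential_outliers(data, k=1.5):
    """Return (lower_fence, upper_fence, outliers) under the k x IQR criterion."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    q1 = median(ordered[:half])             # median of the lower half
    q3 = median(ordered[half + n % 2:])     # median of the upper half
    spread = q3 - q1                        # the IQR
    lower_fence = q1 - k * spread
    upper_fence = q3 + k * spread
    # "More than k x IQR" beyond a quartile means strictly outside the fences.
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

print(potential_outliers([3, 4, 7, 9, 10, 12, 15, 18, 40]))  # hypothetical data; prints (-11.0, 33.0, [40])
```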
5-Number Summary of Positions
The 5-number summary is the basis of a graphical display called the box plot. It consists of:
- Minimum value
- First quartile (Q1)
- Median (Q2)
- Third quartile (Q3)
- Maximum value
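A minimal Python sketch that collects the five-number summary into a named tuple, assuming median-of-halves quartiles as in the rest of this section. The class `FiveNumberSummary`, the function `five_number_summary`, and the data are hypothetical names chosen for illustration.

```python
import statistics
from typing import NamedTuple

class FiveNumberSummary(NamedTuple):
    minimum: float
    q1: float
    median: float
    q3: float
    maximum: float

def five_number_summary(data):
    """Minimum, Q1, median, Q3, maximum; quartiles by the median-of-halves rule."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    return FiveNumberSummary(
        minimum=ordered[0],
        q1=statistics.median(ordered[:half]),           # lower half, median excluded if n is odd
        median=statistics.median(ordered),
        q3=statistics.median(ordered[half + n % 2:]),   # upper half, median excluded if n is odd
        maximum=ordered[-1],
    )

print(five_number_summary([3, 4, 7, 9, 10, 12, 15, 18]))  # hypothetical data
# prints FiveNumberSummary(minimum=3, q1=5.5, median=9.5, q3=13.5, maximum=18)
```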
SUMMARY: Constructing a Box Plot
- A box goes from Q1 to Q3.
- A line is drawn inside the box at the median.
- A line goes from the lower end of the box to the smallest observation that is not a potential outlier and from the upper end of the box to the largest observation that is not a potential outlier.
- The potential outliers are shown separately.
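The quantities a box plot draws can be computed before any plotting. A minimal Python sketch that follows the four steps above and returns them in a dictionary; the function name, dictionary keys, and data are hypothetical, and no plotting library is assumed. Quartiles use the median-of-halves rule and outliers use the 1.5 × IQR criterion.

```python
import statistics

def box_plot_elements(data, k=1.5):
    """Box ends, median line, whisker ends, and separately shown potential outliers."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[half + n % 2:])
    spread = q3 - q1
    lower_fence, upper_fence = q1 - k * spread, q3 + k * spread
    # Observations that are not potential outliers; never empty, because the
    # middle observation(s) always lie between Q1 and Q3.
    inside = [x for x in ordered if lower_fence <= x <= upper_fence]
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return {
        "box": (q1, q3),                              # box from Q1 to Q3
        "median_line": statistics.median(ordered),    # line inside the box
        "whiskers": (inside[0], inside[-1]),          # smallest/largest non-outliers
        "outliers": outliers,                         # shown separately
    }

print(box_plot_elements([3, 4, 7, 9, 10, 12, 15, 18, 40]))  # hypothetical data
```

For this hypothetical data the box runs from 5.5 to 16.5, the median line is at 10, the whiskers end at 3 and 18, and 40 is shown separately.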
Example: Boxplot for Cereal Sodium Data
Figure 2.15 shows a box plot for the sodium values. Labels
are also given for the five-number summary of positions.
Figure 2.15 Box Plot and Five-Number Summary for 20 Breakfast Cereal Sodium Values.
The central box contains the middle 50% of the data. The line in the box marks the median.
Whiskers extend from the box to the smallest and largest observations that are not identified
as potential outliers. Potential outliers are marked separately. Question: Why is the left whisker
drawn down only to 50 rather than to 0?
Comparing Distributions
A box plot does not portray certain features of a distribution, such as
distinct mounds and possible gaps, as clearly as does a histogram.
Box plots are useful for identifying potential outliers.
Figure 2.16 Box Plots of Male and Female College Student Heights. The box plots use the
same scale for height. Question: What are approximate values of the quartiles for the two
groups?
Z-Score
The z-score also identifies position and potential outliers.
The z-score for an observation is the number of standard
deviations that it falls from the mean. A positive z-score
indicates the observation is above the mean. A negative z-score indicates the observation is below the mean. For sample
data, the z-score is calculated as:
$$z = \frac{\text{observation} - \text{mean}}{\text{standard deviation}}$$
An observation from a bell-shaped distribution is a potential
outlier if its z-score is less than -3 or greater than +3 (the 3 standard deviation
criterion).
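A minimal Python sketch of the z-score formula and the 3 standard deviation criterion. It assumes that "standard deviation" for sample data means the sample standard deviation s (divisor n - 1), which is what `statistics.stdev` computes. The function names `z_scores` and `three_sd_outliers` are hypothetical, and the criterion is only meaningful for roughly bell-shaped data.

```python
import statistics

def z_scores(data):
    """z = (observation - mean) / standard deviation, using the sample
    standard deviation (n - 1 divisor), as assumed above."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)        # raises StatisticsError with fewer than 2 values
    if sd == 0:
        raise ValueError("all observations are equal; z-scores are undefined")
    return [(x - mean) / sd for x in data]

def three_sd_outliers(data):
    """Observations with z < -3 or z > +3 (intended for bell-shaped data)."""
    return [x for x, z in zip(data, z_scores(data)) if z < -3 or z > 3]

print(z_scores([1, 2, 3]))  # prints [-1.0, 0.0, 1.0]
```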