Transcript Chapter 4

Describing Data:
Displaying and Exploring Data
Chapter 4
McGraw-Hill/Irwin
©The McGraw-Hill Companies, Inc. 2008
GOALS







Develop and interpret a dot plot.
Develop and interpret a stem-and-leaf display.
Compute and understand quartiles, deciles, and percentiles.
Construct and interpret box plots.
Compute and understand the coefficient of skewness.
Draw and interpret a scatter diagram.
Construct and interpret a contingency table.
Dot Plots

A dot plot groups the data as little as possible, identifying all individual observations.
– In a dot plot, each observation is simply displayed as a dot along a horizontal number line indicating the possible values of the data.
– If there are identical observations or observations are too close to be shown individually, the dots are “piled” on top of each other.
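The piling of dots along a number line can be sketched in a few lines of Python. This is only an illustrative sketch: the function name text_dot_plot and the sample values are hypothetical (they are not the dealers' data), and it assumes whole-number observations.

```python
from collections import Counter


def text_dot_plot(values, symbol="*"):
    """Return a text dot plot: one column per possible value, dots piled upward."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)   # the horizontal number line
    width = max(len(str(v)) for v in axis) + 1   # column width for each value
    rows = []
    # Build the plot from the tallest pile down to the first level of dots.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(
            (symbol if counts[v] >= level else " ").rjust(width) for v in axis
        )
        rows.append(row.rstrip())
    # The last row labels the number line.
    rows.append("".join(str(v).rjust(width) for v in axis))
    return "\n".join(rows)


# Hypothetical observations, used only to show the piling of repeated values.
monthly_sales = [3, 5, 5, 6, 6, 6, 8]
plot = text_dot_plot(monthly_sales)
print(plot)
```

Each observation contributes exactly one dot, so no individual value is lost in the display.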
Dot Plots - Examples
Reported below are the numbers of vehicles sold in the last 24
months at Smith Ford Mercury Jeep, Inc., and Brophy Honda
Volkswagen. Construct dot plots and report summary statistics
for the two dealers.
Dot Plot – Minitab Example
Stem-and-Leaf

In Chapter 2, we organized data into a frequency distribution.
– This gives a quick visual picture of the shape of the distribution.

A technique used to display quantitative information in a condensed form is the stem-and-leaf display.

A stem-and-leaf display is a statistical technique to present a set of data.
– Each numerical value is divided into two parts. The leading digit(s) becomes the stem and the trailing digit the leaf.
– The stems are located along the vertical axis, and the leaf values are stacked against each other along the horizontal axis.

Advantage of the stem-and-leaf display over a frequency distribution: the identity of each observation is not lost.
Stem-and-Leaf – Example
Suppose the seven observations in the 90 up to 100 class are: 96, 94, 93, 94, 95, 96, and 97.
The stem value is the leading digit or digits, in this case 9. The leaves are the trailing digits.
The stem is placed to the left of a vertical line and the leaf values to the right.
Then, we sort the values within each stem from smallest to largest.
Thus, the stem-and-leaf display would appear as follows:
9 | 3 4 4 5 6 6 7
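The same construction can be sketched in Python for the seven observations above. The helper name stem_and_leaf is hypothetical, and the sketch assumes non-negative whole numbers with a single trailing digit as the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return one 'stem | leaves' line per stem, leaves sorted smallest to largest."""
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # leading digit(s) and trailing digit
        groups[stem].append(leaf)
    return [
        f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(groups.items())
    ]


observations = [96, 94, 93, 94, 95, 96, 97]
display = stem_and_leaf(observations)
print("\n".join(display))
```

Because every leaf is kept, the original observations can be read back from the display, which is the advantage over a frequency distribution noted earlier.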
Stem-and-leaf: Another Example
Listed in Table 4–1 is the number of 30-second radio advertising
spots purchased by each of the 45 members of the Automobile
Dealers Association. Organize the data into a stem-and-leaf
display. Around what values does the number of advertising spots
tend to cluster? What is the smallest number of spots purchased
by a dealer? The largest number purchased?
Stem-and-leaf: Another Example (cont.)
Quartiles, Deciles and Percentiles

The standard deviation is the most widely used
measure of dispersion.

Alternative ways of describing spread of data include
determining the location of values that divide a set
of observations into equal parts.

These measures include quartiles, deciles, and
percentiles.
Percentile Computation

To formalize the computational procedure, let L_p refer to the
location of a desired percentile P, so that L_p = (n + 1)(P/100),
where n is the number of observations. If we want to find the 33rd
percentile we would use L_33, and if we wanted the median, the
50th percentile, then L_50.

For example, with n observations, the median (L_50) is located at
position (n + 1)/2 and the first quartile (L_25) at position
(n + 1)/4.
Percentiles - Example
Listed below are the commissions earned by a sample
of 15 brokers at one office of Salomon Smith Barney,
which is an investment company.
$2,038  $2,097  $2,287  $2,406  $1,758
$2,047  $1,940  $1,471  $1,721  $1,637
$2,205  $1,787  $2,311  $2,054  $1,460
Locate the median, the first quartile, and the third
quartile for the commissions earned.
Percentiles – Example (cont.)
Step 1: Organize the data from lowest to
largest value
$1,460  $1,471  $1,637  $1,721
$1,758  $1,787  $1,940  $2,038
$2,047  $2,054  $2,097  $2,205
$2,287  $2,311  $2,406
Percentiles – Example (cont.)
Step 2: Compute the first and third quartiles.
Locate L_25 and L_75 using:
\[ L_{25} = (15 + 1)\frac{25}{100} = 4 \qquad L_{75} = (15 + 1)\frac{75}{100} = 12 \]
Therefore, the first and third quartiles are the 4th and 12th
observations in the array, respectively:
\[ L_{25} = \$1{,}721 \qquad L_{75} = \$2{,}205 \]
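The two steps can be checked with a short Python sketch using the 15 commissions. The function names are hypothetical. The handling of a fractional location (interpolating between the two neighbouring observations) is an assumption that is not needed for this example, where all three locations are whole numbers.

```python
def percentile_location(n, p):
    """Location L_p = (n + 1) * p / 100 of the p-th percentile (1-based position)."""
    return (n + 1) * p / 100


def percentile(sorted_values, p):
    """Value at location L_p in data already sorted from smallest to largest."""
    location = percentile_location(len(sorted_values), p)
    position = int(location)          # whole part of the location
    fraction = location - position    # fractional part, 0 for a whole location
    if position < 1:
        return sorted_values[0]
    if position >= len(sorted_values):
        return sorted_values[-1]
    below = sorted_values[position - 1]
    above = sorted_values[position]
    # Assumed convention: move the fractional distance toward the next value.
    return below + fraction * (above - below)


commissions = sorted([
    2038, 2097, 2287, 2406, 1758,
    2047, 1940, 1471, 1721, 1637,
    2205, 1787, 2311, 2054, 1460,
])

q1 = percentile(commissions, 25)
median = percentile(commissions, 50)
q3 = percentile(commissions, 75)
print(q1, median, q3)
```

With n = 15 the locations are 4, 8, and 12, so the first quartile, median, and third quartile are the 4th, 8th, and 12th values in the sorted array.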
Percentiles – Example (Excel)
Boxplot - Example
Boxplot Example
Skewness



In Chapter 3, measures of central location (the mean, median, and mode) and measures of dispersion (e.g., the range and the standard deviation) were introduced.
Another characteristic of a set of data is the shape.
There are four shapes commonly observed:
– symmetric,
– positively skewed,
– negatively skewed,
– bimodal.
Commonly Observed Shapes
Skewness - Formulas for Computing
There are several measures of skewness, the simplest being Pearson’s coefficient of skewness, which ranges from -3 up to 3.
– A value near -3 indicates considerable negative skewness.
– A value such as 1.63 indicates moderate positive skewness.
– A value of 0, which will occur when the mean and median are equal, indicates the distribution is symmetrical and that there is no skewness.

Another measure is the software coefficient of skewness, based on the cubes of the standardized deviations from the mean.
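Both coefficients can be sketched in Python. Pearson's coefficient is taken as 3(mean - median)/s, the form used in the worked example later in this chapter. The transcript does not show the software formula, so the version below, n/((n - 1)(n - 2)) times the sum of cubed standardized deviations, is an assumption based on the adjusted formula commonly used by statistical software. The function names and the two small samples are hypothetical.

```python
import statistics


def pearson_skewness(data):
    """Pearson's coefficient: 3 * (mean - median) / sample standard deviation."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    s = statistics.stdev(data)
    return 3 * (mean - median) / s


def software_skewness(data):
    """Assumed software coefficient: n/((n-1)(n-2)) * sum(((x - mean)/s)**3)."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    cubes = sum(((x - mean) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * cubes


# Hypothetical samples: one symmetric, one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]

print(pearson_skewness(symmetric), software_skewness(symmetric))
print(pearson_skewness(right_tailed), software_skewness(right_tailed))
```

For the symmetric sample both measures are zero (up to rounding), and for the right-tailed sample both are positive, in line with the interpretation given above.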
Skewness – An Example

Following are the earnings per share for a sample of
15 software companies in 2005. The values are
arranged from smallest to largest.

Compute the mean, median, and standard deviation.
Find the coefficient of skewness using Pearson’s
estimate. What is your conclusion regarding the
shape of the distribution?
Skewness – An Example Using
Pearson’s Coefficient
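\[ \bar{X} = \frac{\sum X}{n} = \frac{\$74.26}{15} = \$4.95 \]
\[ s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{(\$0.09 - \$4.95)^2 + \cdots + (\$16.40 - \$4.95)^2}{15 - 1}} = \$5.22 \]
\[ sk = \frac{3(\bar{X} - \text{Median})}{s} = \frac{3(\$4.95 - \$3.18)}{\$5.22} = 1.017 \]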
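The arithmetic in this example can be verified from the reported summary figures alone. The sketch below uses only numbers that appear in the example (the total of $74.26, n = 15, the median of $3.18, and s = $5.22). The individual earnings values are not reproduced in this transcript, so the standard deviation is taken as given rather than recomputed.

```python
# Summary figures as reported in the example (dollars per share).
total_earnings = 74.26
n = 15
median = 3.18
s = 5.22  # taken as given; the raw data are not available in the transcript

mean = round(total_earnings / n, 2)   # mean rounded to cents, as in the example
sk = 3 * (mean - median) / s          # Pearson's coefficient of skewness

print(mean)
print(round(sk, 3))
```

A positive coefficient of about 1.0 points to positive skewness in the earnings data.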
Describing Relationship between Two
Variables


One graphical technique we use to show the relationship between variables is called a scatter diagram.
To draw a scatter diagram we need two variables. We scale one variable along the horizontal axis (X-axis) and the other variable along the vertical axis (Y-axis) of a graph.
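The scaling of two variables onto the two axes can be illustrated with a rough character-grid sketch in Python. The function name text_scatter, the grid size, and the paired values are all hypothetical; a charting tool would normally be used instead.

```python
def text_scatter(xs, ys, width=40, height=10, symbol="*"):
    """Return rows of text with one symbol per (x, y) pair, y increasing upward."""
    if len(xs) != len(ys):
        raise ValueError("each observation needs both an x and a y value")
    x_min, y_min = min(xs), min(ys)
    x_span = (max(xs) - x_min) or 1   # avoid dividing by zero if all x are equal
    y_span = (max(ys) - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))    # horizontal (X) scale
        row = round((y - y_min) / y_span * (height - 1))   # vertical (Y) scale
        grid[height - 1 - row][col] = symbol               # row 0 is the top line
    return ["".join(line) for line in grid]


# Hypothetical paired observations (not the dealership data).
ages = [25, 32, 41, 47, 53, 60]
prices = [18, 20, 23, 22, 27, 30]   # in thousands of dollars

lines = text_scatter(ages, prices)
print("\n".join(lines))
```

Points that rise from the lower left to the upper right, as in this made-up sample, suggest a positive relationship between the two variables.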
Describing Relationship between Two
Variables – Scatter Diagram Examples
Describing Relationship between Two
Variables – Scatter Diagram Excel Example
In Chapter 2 we saw data on the prices of 80 vehicles sold
at an automobile dealer. The data include the
selling price of each vehicle as well as the age
of the purchaser.
Is there a relationship between the selling price
of a vehicle and the age of the purchaser?
Would it be reasonable to conclude that the
more expensive vehicles are purchased by
older buyers?
Describing Relationship between Two
Variables – Scatter Diagram Excel Example (cont.)
Contingency Tables


A scatter diagram requires that both of the variables be at least interval scale.
What if we wish to study the relationship between two variables when one or both are nominal or ordinal scale? In this case we tally the results in a contingency table.
Contingency Tables – An Example
A manufacturer of windows produced 50 windows yesterday. The
inspector reviewed each window for its quality. Each was
classified as acceptable or defective and by the shift on which it
was produced. Thus two variables, shift and quality, were recorded
for each window. The results are reported in
the following table.
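Tallying two categorical variables into a contingency table can be sketched as follows. The function name contingency_table and the eight records are hypothetical and only illustrate the tallying; they are not the 50 windows from the example.

```python
from collections import Counter


def contingency_table(records):
    """Tally (row_category, column_category) pairs into a nested dict of counts."""
    counts = Counter(records)
    row_labels = sorted({row for row, _ in records})
    col_labels = sorted({col for _, col in records})
    return {
        row: {col: counts[(row, col)] for col in col_labels}
        for row in row_labels
    }


# Hypothetical inspection records: (shift, quality) for each window.
records = (
    [("Day", "Acceptable")] * 3
    + [("Day", "Defective")]
    + [("Night", "Acceptable")] * 2
    + [("Night", "Defective")] * 2
)

table = contingency_table(records)
for shift, row in table.items():
    print(shift, row, "total:", sum(row.values()))
```

Each cell counts the items that share one category of each variable, so the cell counts add up to the number of items inspected.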
End of Chapter 4