Transcript Document

Chapter 3
Descriptive Statistics II: Additional Descriptive Measures and Data Displays
PERCENTILES
If the value A is the pth percentile value for a data set, then at least p% of the values are less than or equal to A and at least (100 - p)% of the values are greater than or equal to A.
Percentile Rules
Rule 1: If the position calculator, $\frac{p}{100} \times n$ (where n is the number of values in the data set), produces an integer, average the value occupying that position in the ordered list with the value in the next higher position and use the result as the pth percentile value.
Rule 2: If the position calculator, $\frac{p}{100} \times n$, produces a non-integer, round the position result up to the next higher integer. The pth percentile value will be the value occupying that position in the ordered list.
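The two rules can be sketched in Python as follows. This is a minimal illustration, not the textbook's own procedure: the function name `percentile` and the sample list are hypothetical, positions are counted from 1 as in the rules, and p is assumed to lie strictly between 0 and 100 (at p = 100, Rule 1 would point past the end of the list).

```python
import math
from fractions import Fraction


def percentile(values, p):
    """Return the pth percentile of values using the two position rules.

    Assumes 0 < p < 100. Fraction keeps (p/100) * n exact, so the
    integer test in Rule 1 is not fooled by floating-point rounding.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    position = Fraction(p) * len(ordered) / 100  # the position calculator
    if position.denominator == 1:
        # Rule 1: integer position, average it with the next higher position
        i = int(position)  # 1-based position
        return (ordered[i - 1] + ordered[i]) / 2
    # Rule 2: non-integer position, round up and take that 1-based position
    return ordered[math.ceil(position) - 1]


data = [20, 4, 16, 8, 12, 2, 18, 6, 14, 10]  # hypothetical, n = 10
print(percentile(data, 25))  # position 2.5 -> 3rd value: 6
print(percentile(data, 50))  # position 5 -> (10 + 12) / 2 = 11.0
print(percentile(data, 95))  # position 9.5 -> 10th value: 20
```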
Quartiles
Quartiles Q1, Q2, and Q3 break an ordered list of numbers into four approximately equal subgroups, each containing about 25% of the values. Q1, Q2 and Q3 are the 25th, 50th and 75th percentiles; Q2 is the median.
Interquartile Range (3.1)
IQR = Q3 – Q1
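Because Q1 and Q3 are the 25th and 75th percentiles, the IQR follows directly from the percentile rules. The sketch below repeats a compact percentile helper so that it runs on its own; the helper names and the twelve-value list are hypothetical.

```python
import math
from fractions import Fraction


def percentile(values, p):
    """pth percentile (0 < p < 100) by the position rules: (p/100) * n."""
    ordered = sorted(values)
    position = Fraction(p) * len(ordered) / 100
    if position.denominator == 1:  # integer: average with the next position
        i = int(position)
        return (ordered[i - 1] + ordered[i]) / 2
    return ordered[math.ceil(position) - 1]  # non-integer: round up


def quartiles(values):
    """Q1, Q2 (median) and Q3 as the 25th, 50th and 75th percentiles."""
    return tuple(percentile(values, p) for p in (25, 50, 75))


def interquartile_range(values):
    """IQR = Q3 - Q1 (equation 3.1)."""
    q1, _, q3 = quartiles(values)
    return q3 - q1


data = [7, 1, 12, 4, 9, 2, 11, 5, 3, 10, 6, 8]  # hypothetical, n = 12
print(quartiles(data))            # (3.5, 6.5, 9.5)
print(interquartile_range(data))  # 6.0
```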
Stem-and-Leaf Illustration
89
80
57
95
82
95
88 55 66
85 60 85
65 70 99 100 74
90 80 90 92 95
70 85 72 75
98 65 80 89
The stem-and-leaf diagram for the data appears below:

 5 | 7 5
 6 | 6 5 0 5
 7 | 0 4 0 2 5
 8 | 9 2 8 5 0 5 5 0 0 9
 9 | 9 5 5 0 0 2 5 8
10 | 0

The 6 row shows the values 66, 65, 60 and 65, in the order in which they appear in the data list.
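A stem-and-leaf display can be built by splitting each whole number into a stem (all digits but the last) and a leaf (the last digit), keeping the leaves in the order the values appear. This sketch assumes non-negative integers; the function name `stem_and_leaf` is hypothetical. The example uses the six values in the 50s and 60s from the data, listed in an order that reproduces the first two rows of the display.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return stem-and-leaf rows for non-negative whole numbers.

    Stem = value // 10 (tens and up), leaf = value % 10 (units digit).
    Leaves keep the order in which the values appear in the data list.
    """
    rows = defaultdict(list)
    for v in values:
        rows[v // 10].append(v % 10)
    lines = []
    for stem in range(min(rows), max(rows) + 1):  # show empty stems too
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines


for line in stem_and_leaf([66, 57, 65, 60, 55, 65]):
    print(line)
#  5 | 7 5
#  6 | 6 5 0 5
```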
Figure 3.1 Box Plot Illustration [scale 220 to 250; labels: Smallest, Q1, Q2 (median), Q3, Largest, Middle 50%]
In a standard box plot, the box extends from the first quartile to the third
quartile. The position of the median is indicated inside the box.
The “whiskers” extend to the largest and smallest values.
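The five points a standard box plot displays (smallest value, Q1, median, Q3, largest value) can be computed as below. This sketch applies the percentile rules to the 30 values from the stem-and-leaf illustration; the helper names and the variable `scores` are hypothetical, and drawing the box itself would need a plotting library.

```python
import math
from fractions import Fraction


def percentile(values, p):
    """pth percentile (0 < p < 100) by the position rules: (p/100) * n."""
    ordered = sorted(values)
    position = Fraction(p) * len(ordered) / 100
    if position.denominator == 1:  # integer: average with the next position
        i = int(position)
        return (ordered[i - 1] + ordered[i]) / 2
    return ordered[math.ceil(position) - 1]  # non-integer: round up


def five_number_summary(values):
    """(smallest, Q1, Q2, Q3, largest): the points a standard box plot marks."""
    return (min(values), percentile(values, 25), percentile(values, 50),
            percentile(values, 75), max(values))


scores = [89, 80, 57, 95, 82, 95, 88, 55, 66, 85, 60, 85, 65, 70, 99,
          100, 74, 90, 80, 90, 92, 95, 70, 85, 72, 75, 98, 65, 80, 89]
print(five_number_summary(scores))  # (55, 70, 83.5, 90, 100)
```

For these data the box would run from 70 to 90, with the median line at 83.5.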
Figure 3.2 A Second Box Plot [scale 220 to 250]
This box plot represents a symmetric data set, with the median centered
inside the box.
Identifying Outliers
• 1.5 x Interquartile Range
• Chebyshev’s Rule
• Empirical Rule
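Of the three approaches listed, the 1.5 x IQR test is the easiest to automate. A common convention flags any value below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR; the sketch below assumes that convention and the percentile rules for the quartiles. The function names and the data are hypothetical.

```python
import math
from fractions import Fraction


def percentile(values, p):
    """pth percentile (0 < p < 100) by the position rules: (p/100) * n."""
    ordered = sorted(values)
    position = Fraction(p) * len(ordered) / 100
    if position.denominator == 1:  # integer: average with the next position
        i = int(position)
        return (ordered[i - 1] + ordered[i]) / 2
    return ordered[math.ceil(position) - 1]  # non-integer: round up


def iqr_outliers(values, multiplier=1.5):
    """Return values beyond Q1 - 1.5*IQR or Q3 + 1.5*IQR (assumed fences)."""
    q1, q3 = percentile(values, 25), percentile(values, 75)
    spread = multiplier * (q3 - q1)
    lower, upper = q1 - spread, q3 + spread
    return [v for v in values if v < lower or v > upper]


data = [7, 1, 12, 4, 9, 2, 11, 5, 3, 10, 6, 8, 40]  # hypothetical, n = 13
print(iqr_outliers(data))  # [40]
```

Here Q1 = 4 and Q3 = 10, so the fences sit at -5 and 19 and only 40 is flagged.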
Chebyshev’s Rule (3.2)
For any set of values, at least
$(1 - 1/k^2) \times 100\%$
of them will be within plus or minus k standard
deviations of the mean, where k is a number
greater than 1.
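Chebyshev's bound is a one-line calculation, and it can be checked against any data set. In this sketch the helper names and the data are hypothetical, and the standard deviation is assumed to be the population standard deviation (`statistics.pstdev`) of the values.

```python
import statistics


def chebyshev_bound(k):
    """Minimum share of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's Rule requires k > 1")
    return 1 - 1 / k**2


def share_within(values, k):
    """Actual share of values within mean +/- k population standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(abs(v - mean) <= k * sd for v in values) / len(values)


for k in (1.5, 2, 3):
    print(k, round(chebyshev_bound(k), 3))
# 1.5 0.556
# 2 0.75
# 3 0.889

data = [2, 4, 4, 4, 5, 5, 7, 9, 30]  # hypothetical, with one extreme value
for k in (1.5, 2, 3):
    print(k, share_within(data, k) >= chebyshev_bound(k))  # True for any data
```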
The Empirical Rule
For a Bell-Shaped Distribution:
• 68.3% of the values will be within 1 standard
deviation of the mean.
• 95.5% of the values will be within 2 standard
deviations of the mean, and
• 99.7% (almost all) of the values will be within 3
standard deviations of the mean.
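For an exactly normal (bell-shaped) distribution, the share of values within k standard deviations of the mean equals erf(k/√2), which the Empirical Rule rounds to 68.3%, 95.5% and 99.7%. The sketch below uses only `math.erf`; the function name is hypothetical.

```python
import math


def normal_share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(normal_share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

Compare k = 2: Chebyshev's Rule guarantees only 75% for arbitrary data, while the bell-shape assumption raises this to about 95.5%.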
Figure 3.3 A Bell-Shaped (Normal) Distribution [68.3%, 95.5% and 99.7% of the area marked; horizontal axis from -3 to 3 standard deviations from the mean]
Calculating z scores
$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} \qquad (3.3)$$
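Equation 3.3 translates directly into code. In this sketch the function name and the numbers (values of 85 and 62 from a data set with mean 70 and standard deviation 5) are hypothetical.

```python
def z_score(value, mean, standard_deviation):
    """Number of standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / standard_deviation


print(z_score(85, 70, 5))  # 3.0
print(z_score(62, 70, 5))  # -1.6
```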
Covariance
(Population)
$$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N} \qquad (3.4)$$
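A direct translation of equation 3.4, treating the paired lists as a complete population (dividing by N). The function name and the data are hypothetical; the two calls show a positive and a negative covariance.

```python
def population_covariance(xs, ys):
    """sigma_xy = sum((x_i - mu_x) * (y_i - mu_y)) / N for paired population data."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty lists of equal length")
    n = len(xs)
    mu_x, mu_y = sum(xs) / n, sum(ys) / n
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n


x = [1, 2, 3, 4]
print(population_covariance(x, [2, 4, 6, 8]))  # 2.5 (y rises with x)
print(population_covariance(x, [8, 6, 4, 2]))  # -2.5 (y falls as x rises)
```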
Figure 3.4 Covariance Possibilities [three scatter plots of y against x: (a) Positive, (b) Negative, (c) Zero]
In (a), an upward-sloping line best describes the points, indicating a positive covariance. In (b), a downward-sloping line best describes the points, implying a negative covariance. In (c), the best-fitting line has zero slope, which means a covariance of 0.
Correlation Coefficient
(Population)
$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \qquad (3.5)$$
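Equation 3.5 rescales the covariance by the two standard deviations, which confines the result to the range -1 to +1. In this sketch the function names and data are hypothetical, and each population standard deviation is computed as the square root of a variable's covariance with itself.

```python
import math


def population_covariance(xs, ys):
    """sigma_xy: mean product of deviations from the population means."""
    n = len(xs)
    mu_x, mu_y = sum(xs) / n, sum(ys) / n
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n


def population_correlation(xs, ys):
    """rho_xy = sigma_xy / (sigma_x * sigma_y); sigma_x is sqrt(sigma_xx)."""
    sigma_x = math.sqrt(population_covariance(xs, xs))
    sigma_y = math.sqrt(population_covariance(ys, ys))
    return population_covariance(xs, ys) / (sigma_x * sigma_y)


x = [1, 2, 3, 4, 5]
print(population_correlation(x, [2, 1, 4, 3, 5]))   # approximately 0.8
print(population_correlation(x, [10, 8, 6, 4, 2]))  # approximately -1.0
```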
Covariance
(Sample)
$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \qquad (3.6)$$
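The sample version differs from equation 3.4 only in using the sample means and the n - 1 divisor. The function name and data below are hypothetical; Python 3.10's `statistics.covariance` uses the same n - 1 divisor and could serve as a cross-check.

```python
def sample_covariance(xs, ys):
    """s_xy = sum((x_i - x_bar) * (y_i - y_bar)) / (n - 1) for sample data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


print(sample_covariance([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # 2.0
```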
Correlation Coefficient
(Sample)
$$r_{xy} = \frac{s_{xy}}{s_x s_y} \qquad (3.7)$$
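Equation 3.7 has the same shape as equation 3.5, built from sample quantities. Because the n - 1 divisors in s_xy, s_x and s_y cancel, the result matches what the population formula gives on the same numbers. The function names and data are hypothetical.

```python
import math


def sample_covariance(xs, ys):
    """s_xy with the n - 1 divisor (equation 3.6)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """r_xy = s_xy / (s_x * s_y); s_x is the sample standard deviation."""
    s_x = math.sqrt(sample_covariance(xs, xs))
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


print(sample_correlation([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # approximately 0.8
```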
Coefficient of Variation
(Population)
$$CV = \frac{\sigma}{\mu} \qquad (3.8)$$
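The coefficient of variation expresses the standard deviation as a fraction of the mean (often quoted as a percentage), so it assumes a nonzero mean. This sketch uses the population standard deviation and mean from Python's `statistics` module; the function name and data are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """CV = sigma / mu, using the population standard deviation and mean."""
    mu = statistics.mean(values)
    if mu == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return statistics.pstdev(values) / mu


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical: mean 5, sigma 2
print(coefficient_of_variation(data))  # 0.4, i.e. 40%
```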
Geometric Mean (Version 1)
$$GM = \sqrt[n]{x_1 x_2 \cdots x_n} \qquad (3.9)$$
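Version 1 multiplies the n values and takes the nth root, so every value must be positive. The function name and data below are hypothetical; Python 3.8's `statistics.geometric_mean` computes the same quantity through logarithms, which avoids overflow on long lists.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive values (equation 3.9)."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive values")
    return math.prod(values) ** (1 / len(values))


print(geometric_mean([2, 8]))     # 4.0
print(geometric_mean([1, 3, 9]))  # approximately 3.0
```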
Geometric Mean (Version 2)
$$GM = \sqrt[n]{\frac{\text{Ending Amount}}{\text{Beginning Amount}}} \qquad (3.10)$$
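Version 2 is typically used for average growth. Assuming n counts the periods between the beginning and ending amounts, the result is the average growth factor per period, and subtracting 1 gives the average growth rate. The function name and amounts are hypothetical.

```python
import math


def geometric_mean_growth(beginning_amount, ending_amount, periods):
    """nth root of (ending / beginning): the average growth factor per period."""
    return (ending_amount / beginning_amount) ** (1 / periods)


# Hypothetical: 100 grows to 110, then to 121, over 2 periods
factor = geometric_mean_growth(100, 121, 2)
print(round(factor, 6))      # 1.1
print(round(factor - 1, 6))  # 0.1, i.e. 10% average growth per period

# Same answer as Version 1 applied to the period-by-period growth factors
print(math.isclose(factor, math.prod([110 / 100, 121 / 110]) ** (1 / 2)))  # True
```

The two versions agree because the product of the period growth factors telescopes to Ending Amount / Beginning Amount.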
Weighted Average
$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} \qquad (3.11)$$
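A minimal sketch of equation 3.11: each value is multiplied by its weight, and the sum is divided by the total weight. The function name and the exam-score example are hypothetical.

```python
def weighted_average(values, weights):
    """x_bar_w = sum(w_i * x_i) / sum(w_i) (equation 3.11)."""
    if len(values) != len(weights) or sum(weights) == 0:
        raise ValueError("need equal-length lists and a nonzero total weight")
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Hypothetical: exam score 80 with weight 1, exam score 90 with weight 3
print(weighted_average([80, 90], [1, 3]))  # 87.5
```

With equal weights the result reduces to the ordinary mean.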