Transcript Chapter 4

Summary Statistics:
Measures of Location and Dispersion
The sum of values, $x_1 + x_2 + \cdots + x_n$, can be denoted as $\sum_{i=1}^{n} x_i$.
• Select 4 students and ask "how many brothers and sisters do you have?"
• Data: 2, 3, 1, 3
$\sum_{i=1}^{4} x_i = 2 + 3 + 1 + 3 = 9$
Or we can write $\sum x = 9$.
Summation rules (c is a constant):
• $\sum cx = c \sum x$
• $\sum c = nc$
• $\sum (x + c) = \sum x + \sum c = \sum x + nc$
• Solve the following:
  • $\sum 4x$
  • $\sum (x + 3)$
  • $\sum (4x + 3)$
  • $\sum (4x + 3)^2$
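One way to check the exercises is to compute each sum directly and compare it with the summation rules. This is a minimal sketch in Python, assuming the sibling data 2, 3, 1, 3 and reading the last exercise as $\sum (4x + 3)^2$; the variable names are illustrative only.

```python
# Sibling data from the example: x = 2, 3, 1, 3
x = [2, 3, 1, 3]
n = len(x)
sum_x = sum(x)                 # sum of x = 9
sum_x2 = sum(xi ** 2 for xi in x)

# Each exercise computed directly from its definition
sum_4x = sum(4 * xi for xi in x)                       # sum of 4x
sum_x_plus_3 = sum(xi + 3 for xi in x)                 # sum of (x + 3)
sum_4x_plus_3 = sum(4 * xi + 3 for xi in x)            # sum of (4x + 3)
sum_4x_plus_3_sq = sum((4 * xi + 3) ** 2 for xi in x)  # sum of (4x + 3)^2

# The same answers from the rules: sum(cx) = c*sum(x), sum(c) = n*c
assert sum_4x == 4 * sum_x
assert sum_x_plus_3 == sum_x + n * 3
assert sum_4x_plus_3 == 4 * sum_x + n * 3
# Squaring first, then expanding: (4x + 3)^2 = 16x^2 + 24x + 9
assert sum_4x_plus_3_sq == 16 * sum_x2 + 24 * sum_x + 9 * n

print(sum_x, sum_4x, sum_x_plus_3, sum_4x_plus_3, sum_4x_plus_3_sq)
# prints: 9 36 21 48 620
```

Note that $\sum (4x + 3)^2$ has no shortcut in terms of $\sum x$ alone; it also needs $\sum x^2$.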
Measures of Central Tendency: Description of an Average (Typical Value)
• Sample Mean: $\bar{x} = \frac{\sum x}{n}$
  • Number of siblings – Data: 2, 3, 1, 3, so $\bar{x} = 9/4 = 2.25$
• Suppose we had selected a 5th person for our sample who had 10 siblings.
  • New Data: 2, 3, 1, 3, 10, so $\bar{x} = 19/5 = 3.8$
• The sample mean is sensitive to extreme values and does not have to be a possible data value.
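A short sketch of the sample mean applied to both sibling data sets shows the sensitivity: the single value of 10 pulls the mean from 2.25 up to 3.8, and neither value is a possible number of siblings. The function name `sample_mean` is a hypothetical helper, not from the source.

```python
def sample_mean(data):
    """Sample mean: the sum of the values divided by n."""
    return sum(data) / len(data)

siblings = [2, 3, 1, 3]
new_siblings = [2, 3, 1, 3, 10]  # adds the 5th person with 10 siblings

print(sample_mean(siblings))      # prints 2.25
print(sample_mean(new_siblings))  # prints 3.8
```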
• Sample Median ($\tilde{x}$):
  • Rank the data from smallest to largest.
  • If n is odd, the median is the middle score.
  • If n is even, the median is the mean of the two middle scores.
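• Number of siblings – Data: 2, 3, 1, 3 (ranked: 1, 2, 3, 3), so $\tilde{x} = (2 + 3)/2 = 2.5$
• New Data: 2, 3, 1, 3, 10 (ranked: 1, 2, 3, 3, 10), so $\tilde{x} = 3$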
• The sample median is not sensitive to extreme scores.
• Half the data will fall above the sample median and half below it.
• The median is a better measure of central tendency if extreme scores exist.
• If extreme scores are unlikely, the mean varies less from sample to sample than the median and is the better measure.
• If the distribution is right skewed, $\tilde{x} < \bar{x}$
• If the distribution is symmetric, $\tilde{x} \approx \bar{x}$
• If the distribution is left skewed, $\tilde{x} > \bar{x}$
• Sample Mode: the most frequent score
• Example: number of siblings – Data: 2, 3, 1, 3; Mode = 3
• New Data: 2, 3, 1, 3, 10; Mode = 3
• The mode does not always exist, and there can be more than one.
• Also, it is unstable: a small change in the data can change the mode.
• It is best suited to qualitative data.
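Because the mode may not exist or may not be unique, a sketch that returns a list of modes is more honest than one returning a single value. This uses `collections.Counter` from the Python standard library; treating "no value repeats" as "no mode" is an assumed convention (textbooks differ), and `sample_modes` is a hypothetical name.

```python
from collections import Counter

def sample_modes(data):
    """Return every most-frequent score, sorted.

    Assumed convention: if no score occurs more than once,
    there is no mode and an empty list is returned.
    """
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode
    return sorted(value for value, c in counts.items() if c == top)

print(sample_modes([2, 3, 1, 3]))           # prints [3]
print(sample_modes([2, 3, 1, 3, 10]))       # prints [3]
print(sample_modes([1, 1, 2, 2, 5]))        # prints [1, 2] (two modes)
print(sample_modes([4, 7, 9]))              # prints [] (no mode)
print(sample_modes(["red", "blue", "red"]))  # qualitative data: ['red']
```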
• Midrange $= \frac{\text{Low} + \text{High}}{2}$
• Example: number of siblings – Data: 2, 3, 1, 3
  Midrange $= \frac{1 + 3}{2} = \frac{4}{2} = 2$
• New Data: 2, 3, 1, 3, 10
  Midrange $= \frac{1 + 10}{2} = \frac{11}{2} = 5.5$
• The midrange is totally dependent on the extreme scores.
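The midrange uses only the lowest and highest scores, which this small sketch makes explicit (the helper name `midrange` is hypothetical). The one extreme value moves it from 2 to 5.5, a larger jump than either the mean or the median.

```python
def midrange(data):
    """Midrange = (Low + High) / 2; only the two extreme scores matter."""
    return (min(data) + max(data)) / 2

print(midrange([2, 3, 1, 3]))      # prints 2.0
print(midrange([2, 3, 1, 3, 10]))  # prints 5.5
```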
• Percentiles – a percentile gives the percentage of the data that falls below an observation
• Quartiles – divide the data into four equally sized parts
  • Q1, First Quartile: 25th percentile
  • Q2, Second Quartile ($\tilde{x}$): 50th percentile
  • Q3, Third Quartile: 75th percentile
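The "percentage below an observation" idea can be sketched as a count. This assumes values strictly below the observation are counted; some textbooks count ties as half or use "at or below", so the helper `percent_below` is hypothetical and other conventions give slightly different numbers.

```python
def percent_below(data, value):
    """Percentage of observations strictly below `value` (assumed convention)."""
    below = sum(1 for v in data if v < value)
    return 100 * below / len(data)

siblings = [2, 3, 1, 3, 10]
print(percent_below(siblings, 3))   # prints 40.0 (1 and 2 are below 3)
print(percent_below(siblings, 10))  # prints 80.0
```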
• Finding the quartiles:
  • Order the data from smallest to largest.
  • Find $\tilde{x}$. This is Q2.
  • Q1 is the median of the lower half of the data; that is, it is the median of the data falling below Q2 (not including Q2).
  • Q3 is the median of the upper half of the data; that is, it is the median of the data falling above Q2 (not including Q2).
• Interquartile range (IQR) = Q3 – Q1
  • Range of the middle 50% of the data
• 5 number summary – the low score, Q1, Q2, Q3, and the high score
Stem-and-leaf plots (stem = tens digit, leaf = ones digit):

Students           Faculty
0 | 0013555678     0 |
1 | 0              1 | 055
2 |                2 | 04588
3 |                3 | 1
4 |                4 | 3
5 |                5 |
6 |                6 |
7 |                7 | 3
5 number summaries:

        Students   Faculty
Low        0         10
Q1         1         15
Q2         5         25
Q3         7         31
High      10         73
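The quartile procedure above (Q1 and Q3 as medians of the lower and upper halves, excluding Q2 when n is odd) can be sketched as follows, with the data read off the two stem-and-leaf plots. The function names are hypothetical. Statistical software such as NumPy's percentile function interpolates by default, so it can report slightly different quartiles than this halves method.

```python
def median(ranked):
    """Median of an already-sorted list."""
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

def five_number_summary(data):
    """(Low, Q1, Q2, Q3, High) using the halves method: Q1 and Q3 are the
    medians of the data below and above Q2, leaving out the middle value
    when n is odd."""
    ranked = sorted(data)
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]           # data below the median position
    upper = ranked[half + n % 2:]   # skip the middle value when n is odd
    return (ranked[0], median(lower), median(ranked), median(upper), ranked[-1])

# Data read off the stem-and-leaf plots (stem = tens, leaf = ones)
students = [0, 0, 1, 3, 5, 5, 5, 6, 7, 8, 10]
faculty = [10, 15, 15, 20, 24, 25, 28, 28, 31, 43, 73]

for name, data in [("Students", students), ("Faculty", faculty)]:
    low, q1, q2, q3, high = five_number_summary(data)
    print(name, (low, q1, q2, q3, high), "IQR =", q3 - q1)
# prints:
# Students (0, 1, 5, 7, 10) IQR = 6
# Faculty (10, 15, 25, 31, 73) IQR = 16
```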
• The box goes from Q1 to Q3 and represents the IQR.
• The line through the box is Q2 ($\tilde{x}$).
• Extreme values are identified by *'s.
• Lines, called whiskers, run from Q1 to the lowest value and from Q3 to the highest value. (If the low or high score is extreme, the whisker goes only to the next value.)
[Figure: side-by-side boxplots of the Students and Faculty data, vertical axis from 0 to 80]
[Figure: distributions labeled A, B, and C; axis marked at 33, 38, and 43]
Distribution #1        Distribution #2
1 |                    1 | 5
2 | 5                  2 | 55
3 | 5555555            3 | 555
4 | 5                  4 | 55
5 |                    5 | 5
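Distribution #1: $\bar{x} = 35$, $\tilde{x} = 35$, mode = 35, midrange = 35
Distribution #2: $\bar{x} = 35$, $\tilde{x} = 35$, mode = 35, midrange = 35
Both distributions have the same center, but Distribution #2 is more spread out, so measures of location alone cannot tell them apart.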
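A sketch using Python's standard `statistics` module confirms that all four measures of center agree for the two distributions, while a measure of spread (here the range, High – Low) separates them: 20 for Distribution #1 versus 40 for Distribution #2. The data are read off the stem-and-leaf plots; the helper `describe` is hypothetical.

```python
from statistics import mean, median, mode

# Data read off the stem-and-leaf plots (stem = tens, leaf = ones)
dist1 = [25, 35, 35, 35, 35, 35, 35, 35, 45]
dist2 = [15, 25, 25, 35, 35, 35, 45, 45, 55]

def describe(data):
    """Measures of center plus the range as a measure of spread."""
    return {
        "mean": mean(data),
        "median": median(data),
        "mode": mode(data),
        "midrange": (min(data) + max(data)) / 2,
        "range": max(data) - min(data),
    }

d1, d2 = describe(dist1), describe(dist2)
print(d1)
print(d2)
```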
Example: Years of experience of faculty
Data: 1, 30, 22, 10, 5
• Range = High – Low = 30 – 1 = 29
• The range is sensitive to extreme scores (it is based entirely on the high and low).
• The range is easy to compute.
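The range needs only the two extreme values, as this minimal sketch for the faculty experience data shows.

```python
experience = [1, 30, 22, 10, 5]  # years of experience of faculty

data_range = max(experience) - min(experience)  # High - Low
print(data_range)  # prints 29
```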
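• Sample variance:
  $S^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{SSX}{n - 1} = \frac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)}$
  where SSX (the sum of squares of x) is the sum of the squared deviations from $\bar{x}$.
• Large values of $S^2$ suggest large variability.
• It is difficult to interpret since it is in squared units.
• Keep in mind it can never be negative.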
Example: Years of experience of faculty
Data: 1, 30, 22, 10, 5
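Working this example both ways gives $\bar{x} = 68/5 = 13.6$, $SSX = 585.2$, and $S^2 = 585.2/4 = 146.3$ years². The sketch below computes the definitional form and the computational (shortcut) form and compares both with `statistics.variance` from the Python standard library, which also divides by $n - 1$.

```python
from statistics import variance

experience = [1, 30, 22, 10, 5]  # years of experience of faculty
n = len(experience)
x_bar = sum(experience) / n      # 68 / 5 = 13.6

# Definitional form: SSX / (n - 1), SSX = sum of squared deviations from x_bar
ssx = sum((x - x_bar) ** 2 for x in experience)
s2_definition = ssx / (n - 1)

# Computational form: (n * sum(x^2) - (sum x)^2) / (n * (n - 1))
sum_x = sum(experience)                   # 68
sum_x2 = sum(x * x for x in experience)   # 1510
s2_shortcut = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))  # 2926 / 20

print(round(ssx, 4), round(s2_definition, 4), s2_shortcut, variance(experience))
```

The shortcut form avoids computing each deviation, which is why it was popular for hand calculation; both forms agree up to floating-point rounding.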
• Sample standard deviation – measures the typical (roughly, the average) distance of the data points from $\bar{x}$
  $S = \sqrt{S^2}$
• The standard deviation is in the same units as the data.
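Taking the square root brings the faculty example back into years: $S = \sqrt{146.3} \approx 12.1$ years. This sketch uses `math.sqrt` and checks the result against `statistics.stdev`, both from the Python standard library.

```python
import math
from statistics import stdev, variance

experience = [1, 30, 22, 10, 5]  # years of experience of faculty

s2 = variance(experience)  # sample variance, n - 1 in the denominator
s = math.sqrt(s2)          # S = square root of S^2, in years

print(round(s, 2))                         # prints 12.1
print(math.isclose(s, stdev(experience)))  # prints True
```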
• Z-score – gives the number of standard deviations an observation is above or below the mean:
  $z = \frac{x - \bar{x}}{s}$
Example: Test scores, $\bar{x} = 79$, s = 9
• If your score is 88%, what is your z-score?
• If your score is 63%, what is your z-score?
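Plugging the two scores into the formula gives $z = (88 - 79)/9 = 1$ and $z = (63 - 79)/9 \approx -1.78$: one standard deviation above the mean and about 1.78 below it. A minimal sketch, with `z_score` as a hypothetical helper name:

```python
def z_score(x, x_bar, s):
    """Standard deviations that x lies above (+) or below (-) the mean."""
    return (x - x_bar) / s

print(z_score(88, 79, 9))            # prints 1.0
print(round(z_score(63, 79, 9), 2))  # prints -1.78
```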
Empirical rule (for mound-shaped distributions):
• Approximately 68% of the data fall within 1 standard deviation of the mean, $(\bar{x} - s, \bar{x} + s)$.
• Approximately 95% of the data fall within 2 standard deviations of the mean, $(\bar{x} - 2s, \bar{x} + 2s)$.
• Approximately 99.7% of the data fall within 3 standard deviations of the mean, $(\bar{x} - 3s, \bar{x} + 3s)$.
Example:
Suppose that the amount of liquid in "12 oz." Pepsi cans has a mound-shaped distribution with $\bar{x} = 12$ oz. and s = 0.1 oz.
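Applying the three intervals of the empirical rule to the Pepsi cans gives roughly 68% of cans between 11.9 and 12.1 oz., 95% between 11.8 and 12.2 oz., and 99.7% between 11.7 and 12.3 oz. This sketch computes those intervals; `empirical_intervals` is a hypothetical helper, and the percentages are only approximate because the rule assumes a mound-shaped distribution.

```python
def empirical_intervals(x_bar, s):
    """Intervals (x_bar - k*s, x_bar + k*s) for k = 1, 2, 3, each paired with
    the approximate share of a mound-shaped distribution it should contain."""
    shares = {1: 0.68, 2: 0.95, 3: 0.997}
    return {k: (x_bar - k * s, x_bar + k * s, shares[k]) for k in (1, 2, 3)}

for k, (low, high, share) in empirical_intervals(12.0, 0.1).items():
    print(f"within {k} s: {low:.1f} to {high:.1f} oz. (about {share:.1%})")
```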