Comparing the Mean and Median


Chapter 5: Describing Distributions Numerically

Describing a Quantitative Variable using Percentiles

Percentile
– A given percent of the observations are less than this value.
– Ex. 10th percentile: 10% of the observations of the variable are less than the 10th percentile.
– Ex. 90th percentile: 90% of the observations of the variable are less than the 90th percentile.
Important Percentiles
Minimum – 0th percentile
Q1 – 25th percentile (called the first quartile)
Median – 50th percentile
Q3 – 75th percentile (called the third quartile)
Maximum – 100th percentile

Median

50th percentile
– 50% of the observations are below the median
– 50% of the observations are above the median

Median is the middle value of the ordered observations
Measures the center of the observations
Properties of the Median

Which observations affect the median?

73 is an outlier
– Does this observation affect the median?
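One way to check this question is to compute the median with and without the largest season. The sketch below uses the Barry Bonds home-run data from these slides; the variable names are illustrative only.

```python
from statistics import median

# Barry Bonds home runs per season (data set used in these slides)
home_runs = [5, 16, 19, 24, 25, 25, 26, 33, 33, 34, 34,
             37, 37, 40, 42, 45, 45, 46, 46, 49, 73]

median_with = median(home_runs)

# Drop the 73 home-run season and recompute
without_73 = [x for x in home_runs if x != 73]
median_without = median(without_73)

print(median_with)     # 34
print(median_without)  # 34.0
```

The median stays at 34 either way, because only the middle of the ordered list determines it.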
Range
Measures spread (variability)
Minimum – 0th percentile
Maximum – 100th percentile
Range = Maximum – Minimum

Properties of the Range

Which observations affect the range?

73 is an outlier
– Does this observation affect the range?
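The same check can be run for the range. This is a minimal sketch on the Barry Bonds home-run data from these slides; `data_range` is a hypothetical helper name.

```python
# Barry Bonds home runs per season (data set used in these slides)
home_runs = [5, 16, 19, 24, 25, 25, 26, 33, 33, 34, 34,
             37, 37, 40, 42, 45, 45, 46, 46, 49, 73]

def data_range(values):
    """Range = maximum minus minimum."""
    return max(values) - min(values)

range_with = data_range(home_runs)

# Drop the 73 home-run season and recompute
range_without = data_range([x for x in home_runs if x != 73])

print(range_with)     # 68
print(range_without)  # 44
```

Because the range uses only the two most extreme observations, removing one of them changes it sharply.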
IQR (Interquartile Range)
Measures spread (variability)
IQR = Q3 – Q1
Spread of the center 50% of the observations

Finding Q1 and Q3

In general,
– Q1 is the median of the lower half of the ordered observations.
– Q3 is the median of the upper half of the ordered observations.

Actual calculations from textbook and R may be slightly different.
IQR of Home Runs Per Season for Barry Bonds

Order the home runs from smallest to largest
5 16 19 24 25 25 26 33 33 34 34
37 37 40 42 45 45 46 46 49 73
Lower Half
– 5 16 19 24 25 25 26 33 33 34 34
– Q1 = 25

Upper Half
– 34 37 37 40 42 45 45 46 46 49 73
– Q3 = 45

IQR = 45 – 25 = 20
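The median-of-halves calculation above can be sketched in code. The function `quartiles_by_halves` is a hypothetical helper, and it assumes the convention the slide appears to use: with an odd number of observations, the middle value is counted in both halves. Library routines such as Python's `statistics.quantiles` interpolate instead, so they can give slightly different values on other data sets.

```python
from statistics import median, quantiles

# Barry Bonds home runs per season (data set used in these slides)
home_runs = [5, 16, 19, 24, 25, 25, 26, 33, 33, 34, 34,
             37, 37, 40, 42, 45, 45, 46, 46, 49, 73]

def quartiles_by_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: when n is odd, the middle value belongs to both halves.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2
    lower = data[:half]       # smallest `half` observations
    upper = data[n - half:]   # largest `half` observations
    return median(lower), median(upper)

q1, q3 = quartiles_by_halves(home_runs)
iqr = q3 - q1
print(q1, q3, iqr)  # 25 45 20

# An interpolation-based alternative, for comparison
print(quantiles(home_runs, n=4))
```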
Five Number Summary
– Min = 5
– Q1 = 25
– Median = 34
– Q3 = 45
– Max = 73
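As a rough illustration, the five numbers can be computed for the Barry Bonds home-run data from these slides. The helper `five_number_summary` is hypothetical, and it assumes the median-of-halves rule for quartiles, with the middle value counted in both halves when n is odd.

```python
from statistics import median

# Barry Bonds home runs per season (data set used in these slides)
home_runs = [5, 16, 19, 24, 25, 25, 26, 33, 33, 34, 34,
             37, 37, 40, 42, 45, 45, 46, 46, 49, 73]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using medians of the two halves."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # assumption: middle value is in both halves if n is odd
    q1 = median(data[:half])
    q3 = median(data[n - half:])
    return data[0], q1, median(data), q3, data[-1]

summary = five_number_summary(home_runs)
print(summary)  # (5, 25, 34, 45, 73)
```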
Graph of Five Number Summary

Boxplot
– Box spans from Q1 to Q3.
– Line in the box marks the median.
– Lines extend out from box to the most
extreme data point which is no more than
1.5 times the IQR from the box.
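The whisker rule can be sketched as follows, using the Barry Bonds home-run data and the quartiles Q1 = 25 and Q3 = 45 from these slides. The function name `whisker_ends` is illustrative, and plotting software may compute quartiles slightly differently.

```python
# Barry Bonds home runs per season (data set used in these slides)
home_runs = [5, 16, 19, 24, 25, 25, 26, 33, 33, 34, 34,
             37, 37, 40, 42, 45, 45, 46, 46, 49, 73]

def whisker_ends(values, q1, q3):
    """Return (lower whisker, upper whisker, points beyond the whiskers)."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data points inside the fences
    inside = [x for x in values if lower_fence <= x <= upper_fence]
    outside = [x for x in values if x < lower_fence or x > upper_fence]
    return min(inside), max(inside), outside

low, high, beyond = whisker_ends(home_runs, q1=25, q3=45)
print(low, high, beyond)  # 5 73 []
```

With these quartiles the fences sit at -5 and 75, so under this particular rule the upper whisker reaches 73.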
[Figure: plot with categories A through F on one axis and a scale from 0 to 25 on the other]
Mean

Ordinary average
– Add up all observations.
– Divide by the number of observations.
Mean

Formula
– n observations
– y1, y2, y3, …, yn are the observations.
\[
\bar{y} = \frac{y_1 + y_2 + y_3 + \cdots + y_n}{n} = \frac{\sum_{i=1}^{n} y_i}{n}
\]
Properties of the Mean

What effect do the observations have
on the mean?

73 is an outlier. What effect does this
observation have on the mean?
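A short sketch of the mean calculation on the Barry Bonds home-run data from these slides, with and without the 73 home-run season. The variable names are illustrative.

```python
# Barry Bonds home runs per season (data set used in these slides)
home_runs = [5, 16, 19, 24, 25, 25, 26, 33, 33, 34, 34,
             37, 37, 40, 42, 45, 45, 46, 46, 49, 73]

# Add up all observations, then divide by the number of observations
mean_with = sum(home_runs) / len(home_runs)

# Drop the 73 home-run season and recompute
without_73 = [x for x in home_runs if x != 73]
mean_without = sum(without_73) / len(without_73)

print(f"{mean_with:.2f}")     # 34.95
print(f"{mean_without:.2f}")  # 33.05
```

Every observation enters the sum, so a single extreme season pulls the mean toward it by almost two home runs here.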
Standard Deviation
Measures spread (variability)
“Average” spread from mean.
Denoted by letter s.

Standard Deviation
\[
s = \sqrt{\frac{(y_1 - \bar{y})^2 + (y_2 - \bar{y})^2 + \cdots + (y_n - \bar{y})^2}{n - 1}} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}
\]
Standard Deviation

Usually calculate using computer or
calculator.
– Choose n-1 option on calculator.

Do once by hand
– Make a table.
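The by-hand table can be mimicked in code: one row per observation, holding the value, its deviation from the mean, and the squared deviation. This is a sketch on the Barry Bonds home-run data from these slides; the column layout is an assumption about what such a table might look like.

```python
from math import sqrt

# Barry Bonds home runs per season (data set used in these slides)
home_runs = [5, 16, 19, 24, 25, 25, 26, 33, 33, 34, 34,
             37, 37, 40, 42, 45, 45, 46, 46, 49, 73]

n = len(home_runs)
y_bar = sum(home_runs) / n

# One row per observation: value, deviation, squared deviation
rows = [(y, y - y_bar, (y - y_bar) ** 2) for y in home_runs]

print(f"{'y':>4} {'y - mean':>10} {'(y - mean)^2':>14}")
for y, dev, sq in rows:
    print(f"{y:>4} {dev:>10.2f} {sq:>14.2f}")

# Sum the squared deviations, divide by n - 1, take the square root
sum_sq = sum(sq for _, _, sq in rows)
s = sqrt(sum_sq / (n - 1))
print(f"s = {s:.2f}")
```

The deviations always sum to zero (up to rounding), which is a quick check that the table was filled in correctly.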
Properties of s

s≥0
– s = 0 only when all observations are equal.
– s > 0 in all other cases.

s has the same units as the data.
Properties of s

What effect do the observations have
on the value of s?

73 is an outlier. What effect does this
observation have on the value of s?
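Both properties of s can be checked numerically. The sketch below uses `statistics.stdev`, which divides by n - 1, on the Barry Bonds home-run data from these slides; the constant list is a made-up example.

```python
from statistics import stdev

# Barry Bonds home runs per season (data set used in these slides)
home_runs = [5, 16, 19, 24, 25, 25, 26, 33, 33, 34, 34,
             37, 37, 40, 42, 45, 45, 46, 46, 49, 73]

s_with = stdev(home_runs)

# Drop the 73 home-run season and recompute
s_without = stdev([x for x in home_runs if x != 73])

# Hypothetical data where every observation is equal
s_constant = stdev([30, 30, 30, 30])

print(f"{s_with:.1f}")     # 14.3
print(f"{s_without:.1f}")  # 11.6
print(s_constant)          # 0.0
```

Since the deviations are squared, an observation far from the mean contributes heavily to s.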
Comparison of the Mean and Median

Median

Mean
Mean vs. Median

Mean and median are generally similar when
– Distribution is symmetric

Mean and median are generally different when either
– Distribution is skewed
– Outliers are present.
Influence of Outliers on the Mean
and Median

Small Example: Income in a small town
of 6 people
$25,000 $27,000 $29,000
$35,000 $37,000 $38,000
Mean income is about $31,833
Median income is $32,000

Influence of Outliers on the Mean
and Median
– Bill Gates moves to town.
$25,000 $27,000 $29,000
$35,000 $37,000 $38,000 $100,000,000
– The mean income is $14,313,000
– The median income is $35,000
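The arithmetic for the small-town example can be reproduced as follows, using the incomes listed on the slides. The variable names are illustrative.

```python
from statistics import mean, median

# Incomes of the 6 original residents
incomes = [25_000, 27_000, 29_000, 35_000, 37_000, 38_000]

# The same town after one very high income is added
with_gates = incomes + [100_000_000]

mean_before, median_before = mean(incomes), median(incomes)
mean_after, median_after = mean(with_gates), median(with_gates)

print(f"before: mean ${mean_before:,.0f}, median ${median_before:,.0f}")
# before: mean $31,833, median $32,000
print(f"after:  mean ${mean_after:,.0f}, median ${median_after:,.0f}")
# after:  mean $14,313,000, median $35,000
```

One new observation moves the mean by more than 14 million dollars, while the median shifts only from $32,000 to $35,000.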
Influence of Skewness on the Mean
and Median

The observations in the tail influence
the mean. These observations do not
influence the median.
– Skewed to the right (large values): mean is larger than the median
– Skewed to the left (small values): mean is smaller than the median
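A small made-up example can show the direction in which a tail pulls the mean. Both data sets below are hypothetical; the left-skewed one is the mirror image of the right-skewed one.

```python
from statistics import mean, median

# Hypothetical right-skewed data: a few large values form the tail
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10, 20]

# Mirror image of the same data: the tail is now on the small side
left_skewed = [21 - x for x in right_skewed]

right_mean, right_median = mean(right_skewed), median(right_skewed)
left_mean, left_median = mean(left_skewed), median(left_skewed)

print(f"right-skewed: mean {right_mean:.2f}, median {right_median}")
# right-skewed: mean 5.33, median 3
print(f"left-skewed:  mean {left_mean:.2f}, median {left_median}")
# left-skewed:  mean 15.67, median 18
```

In both cases the mean sits on the tail side of the median.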
Final Word - Mean vs. Median

Always question when means are
reported for skewed data
– Income
– Housing prices
– Course grades
Which summaries are the best?

Five Number Summary
– Skewed distributions
– Distributions with outliers

Mean and Standard Deviation
– Roughly symmetric distributions with no outliers

ALWAYS GET A PICTURE OF YOUR
DATA.