Transcript ch 5
Center
• Midrange: the average of the maximum &
the minimum values. Very sensitive to outliers,
since it uses only the two extremes.
• Median: the middle value of the ordered data; it
divides the histogram into 2 equal areas. Resistant
to outliers, but it varies more from sample to sample
than the mean (less efficient).
• Mean: the sum of all the data values divided by the
count. The point at which the histogram would
balance. Outliers & skewness “pull” the mean toward
the outlier or the long tail.
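A minimal Python sketch comparing the three measures of center. The data are hypothetical (not from the slides), and `midrange` is a helper written here for illustration; `median` and `mean` come from Python's standard `statistics` module.

```python
from statistics import mean, median

def midrange(data):
    """Average of the maximum and minimum values."""
    return (max(data) + min(data)) / 2

# Hypothetical data: five values, then the same set with the 6 replaced by an outlier (60).
values = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 60]

for label, data in [("no outlier", values), ("with outlier", with_outlier)]:
    print(f"{label}: midrange={midrange(data)}, median={median(data)}, mean={mean(data)}")
```

The single outlier moves the midrange from 4 to 31 and the mean from 4 to 14.8, while the median stays at 4.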
Spread
• Range: the difference between the
maximum and minimum values
• Interquartile range (IQR): divide the ordered
data into quartiles; the IQR is the difference
between the third and first quartiles, Q3 - Q1,
where Q1 = 25th percentile and Q3 = 75th percentile.
• 5 Number Summary: reports the median,
quartiles, maximum and minimum values.
Used to construct a boxplot.
• Constructing a boxplot:
– 1) mark Q1, Q3 & median to create your box.
– 2) Erect “fences” 1.5 IQRs and 3 IQRs below Q1
and above Q3. Values beyond the 1.5 IQR fences
are OUTLIERS! (beyond 3 IQRs: far outliers)
– 3) Extend whiskers to the most extreme data
values still inside the fences
– 4) Mark any outliers with special symbols
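A minimal Python sketch of the boxplot steps above. It assumes one common quartile rule (Q1 and Q3 are the medians of the lower and upper halves, leaving out the overall median when n is odd); calculators and software may use other rules and give slightly different quartiles. The function names and the sample data are hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max); needs at least 2 values.

    Assumed quartile rule: Q1 and Q3 are the medians of the lower and upper
    halves of the sorted data, with the overall median left out when n is odd.
    """
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]
    upper = xs[(n + 1) // 2 :]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def boxplot_parts(data):
    """Range, IQR, fences, whisker ends and outliers for a boxplot."""
    lo, q1, _, q3, hi = five_number_summary(data)
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)  # step 2: fences 1.5 IQRs out
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)      # step 2: fences 3 IQRs out
    inside = [x for x in data if inner[0] <= x <= inner[1]]
    outliers = sorted(x for x in data if x < inner[0] or x > inner[1])
    far = [x for x in outliers if x < outer[0] or x > outer[1]]
    return {
        "range": hi - lo,
        "iqr": iqr,
        "inner_fences": inner,
        "outer_fences": outer,
        "whiskers": (min(inside), max(inside)),  # step 3: most extreme values inside the fences
        "outliers": outliers,                    # step 4: mark these with a special symbol
        "far_outliers": far,
    }

# Hypothetical data with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 20]
print(five_number_summary(data))
print(boxplot_parts(data))
```

Here 20 lies beyond the 1.5 IQR fence (15.0) but inside the 3 IQR fence (22.5), so it is an outlier but not a far outlier, and the upper whisker stops at 8.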
County Size of New Jersey
• Interpretation of boxplots
– Median centered in box…middle half is
symmetric
• If not, skewed toward the side farther from the median
– Whiskers same length…symmetric
• If not, skewed to side with longer whisker
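The two reading rules can be written as a small Python function. This is a sketch only: `describe_box` and its `tol` parameter (how close two lengths must be to count as "the same") are hypothetical, and the example positions are made up.

```python
def describe_box(q1, med, q3, whisker_low, whisker_high, tol=0.0):
    """Read rough shape from a boxplot's five positions.

    'tol' is a hypothetical tolerance for treating two lengths as equal;
    with the default 0.0, any difference counts as skew.
    """
    notes = []
    # Rule 1: compare the two parts of the box on either side of the median.
    below, above = med - q1, q3 - med
    if abs(above - below) <= tol:
        notes.append("median centered in box: middle half symmetric")
    elif above > below:
        notes.append("median closer to Q1: skewed right")
    else:
        notes.append("median closer to Q3: skewed left")
    # Rule 2: compare whisker lengths.
    low_len, high_len = q1 - whisker_low, whisker_high - q3
    if abs(high_len - low_len) <= tol:
        notes.append("whiskers same length: symmetric")
    elif high_len > low_len:
        notes.append("longer upper whisker: skewed right")
    else:
        notes.append("longer lower whisker: skewed left")
    return notes

# Hypothetical boxplot: Q1 = 20, median = 25, Q3 = 40, whiskers at 10 and 70.
print(describe_box(20, 25, 40, 10, 70))
```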
Histogram
Aaron vs. Bonds: home runs per season
Barry Bonds (22 seasons): 16, 25, 24, 19, 33, 28, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73, 46, 45, 45, 5, 26, 28
Hank Aaron (23 seasons): 13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24, 32, 44, 39, 29, 44, 38, 47, 34, 40, 20, 12, 10
Comparative Box & Whisker Plots
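A sketch that summarizes the Aaron and Bonds home run seasons for a side-by-side comparison. Splitting the slide's 45 numbers into a Bonds list (22 values) and an Aaron list (23 values) is a reconstruction of the flattened table, and the quartile rule (medians of the lower and upper halves) is an assumption; other rules can shift Q1 and Q3 slightly. The helper names are hypothetical.

```python
from statistics import mean, median

# Season home run totals as read from the slide (column split reconstructed).
bonds = [16, 25, 24, 19, 33, 28, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73, 46, 45, 45, 5, 26, 28]
aaron = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24, 32, 44, 39, 29, 44, 38, 47, 34, 40, 20, 12, 10]

def five_number_summary(data):
    # Assumed rule: Q1/Q3 are medians of the lower/upper halves (median left out when n is odd).
    xs = sorted(data)
    n = len(xs)
    return xs[0], median(xs[: n // 2]), median(xs), median(xs[(n + 1) // 2 :]), xs[-1]

def outliers(data):
    """Values outside the 1.5 IQR fences."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    return [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

for name, hr in [("Bonds", bonds), ("Aaron", aaron)]:
    print(name, five_number_summary(hr), "mean:", round(mean(hr), 2), "outliers:", outliers(hr))
```

With this quartile rule, Bonds's 73-home-run season sits just inside his upper fence (45 + 1.5 × 19 = 73.5), so neither player shows an outlier; a different quartile rule could change that call.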
• Deviation: how far each value is from the
mean, (y - ȳ).
• Variance: add up the squared deviations
and (almost) average them, dividing by n - 1 rather than n.
$s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$
• Standard Deviation: the square root of the variance.
$s = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}$
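The variance and standard deviation formulas can be checked step by step in Python. The data are hypothetical; `statistics.variance` and `statistics.stdev` from the standard library also divide by n - 1, so they should agree with the hand-written versions.

```python
import math
import statistics

def sample_variance(data):
    """s^2 = sum of squared deviations / (n - 1); needs at least 2 values."""
    n = len(data)
    ybar = sum(data) / n
    deviations = [y - ybar for y in data]            # how far each value is from the mean
    return sum(d * d for d in deviations) / (n - 1)  # add squared deviations, divide by n - 1

def sample_sd(data):
    """s = square root of the variance."""
    return math.sqrt(sample_variance(data))

# Hypothetical data: mean 5, squared deviations sum to 32.
ys = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(ys), sample_sd(ys))
# Standard-library cross-check (also divides by n - 1).
print(statistics.variance(ys), statistics.stdev(ys))
```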
Counting Chips
• Rules for reporting center & spread, based on shape:
– Skewed data: report median & IQR
– Symmetric data: report mean & standard
deviation
– If there are outliers, report mean & standard deviation
with and without the outliers.
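A minimal sketch of these reporting rules in Python. Deciding whether data are skewed is left to the analyst (the `skewed` flag is a hypothetical parameter, not an automatic test), and outliers are flagged with the 1.5 IQR fence rule using an assumed quartile convention (medians of the lower and upper halves). The data are made up.

```python
from statistics import mean, median, stdev

def quartiles(data):
    # Assumed rule: Q1/Q3 are medians of the lower/upper halves (median left out when n is odd).
    xs = sorted(data)
    n = len(xs)
    return median(xs[: n // 2]), median(xs[(n + 1) // 2 :])

def report(data, skewed):
    """Choose summary statistics by shape; 'skewed' is the analyst's call."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    if skewed:
        # Skewed data: median and IQR.
        return {"median": median(data), "IQR": iqr}
    # Symmetric data: mean and standard deviation.
    summary = {"mean": mean(data), "sd": stdev(data)}
    outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
    if outliers:
        # Outliers present: also report mean and SD with them set aside.
        kept = [x for x in data if x not in outliers]
        summary["without_outliers"] = {"mean": mean(kept), "sd": stdev(kept), "removed": outliers}
    return summary

# Hypothetical data with one high value.
data = [10, 11, 12, 12, 13, 14, 40]
print(report(data, skewed=False))
print(report(data, skewed=True))
```

For these values, 40 is beyond the upper fence, so the mean is reported both with it (16) and without it (12), which shows how far one outlier pulls the mean.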
MC ???