Transcript Slide 1

Section 2.4
How Can We Describe the Spread of
Quantitative Data?
Agresti/Franklin Statistics, 1 of 63
Measuring Spread: Range

Range: difference between the largest
and smallest observations
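A minimal Python sketch of the range. It reuses the NYSE stock prices from the quartile example later in this section as illustrative data; the function name `data_range` is hypothetical.

```python
def data_range(values):
    """Range: largest observation minus smallest observation."""
    if not values:
        raise ValueError("the range is undefined for an empty data set")
    return max(values) - min(values)


# Stock prices from the NYSE quartile example in this section
prices = [2, 4, 11, 12, 13, 15, 31, 31, 37, 47]
print(data_range(prices))  # 45
```

Because it uses only the two most extreme observations, a single unusual value can change the range a lot.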
Measuring Spread: Standard
Deviation

Creates a measure of variation by
summarizing the deviations of each
observation from the mean and
calculating an adjusted average of these
deviations
$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$
Empirical Rule
For bell-shaped data sets:

Approximately 68% of the observations fall
within 1 standard deviation of the mean

Approximately 95% of the observations fall
within 2 standard deviations of the mean

All or nearly all (about 99.7%) of the observations fall
within 3 standard deviations of the mean
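The percentages in the rule come from the normal distribution, the usual model for bell-shaped data. A small Python sketch (assuming Python 3.8+ for `statistics.NormalDist`) computes the exact normal proportions behind the rule; real bell-shaped data will only match them approximately. The function name `proportion_within` is hypothetical.

```python
from statistics import NormalDist


def proportion_within(k, dist=NormalDist()):
    """Proportion of a normal distribution lying within k standard deviations of its mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)


for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.1%}")
```

The result does not depend on the mean or standard deviation chosen, which is why the rule is stated in standard deviation units.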
Parameter and Statistic

A parameter is a numerical summary of
the population

A statistic is a numerical summary of a
sample taken from a population
Section 2.5
How Can Measures of Position
Describe Spread?
Quartiles




• Quartiles split the ordered data into four parts, each containing about one quarter of the observations
• The median is the second quartile, Q2
• The first quartile, Q1, is the median of the lower half of the observations
• The third quartile, Q3, is the median of the upper half of the observations
Example: Find the first and third quartiles
Prices per share of the 10 most actively traded stocks on the
NYSE (rounded to the nearest $):
2 4 11 12 13 15 31 31 37 47
a. Q1 = 2, Q3 = 47
b. Q1 = 12, Q3 = 31
c. Q1 = 11, Q3 = 31
d. Q1 = 11.5, Q3 = 32
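A Python sketch of the median-of-halves method for the stock prices. It assumes that, when n is odd, the middle observation is left out of both halves; statistical software (for example `statistics.quantiles` or NumPy) uses other interpolation rules and may report slightly different quartiles. The helper names are hypothetical.

```python
def median(sorted_values):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) as medians of the lower half, whole data set, and upper half.

    For an odd number of observations, the median itself is excluded from both halves.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 == 1 else data[half:]
    return median(lower), median(data), median(upper)


prices = [2, 4, 11, 12, 13, 15, 31, 31, 37, 47]
print(quartiles(prices))  # (11, 14.0, 31)
```

With 10 prices the lower half is 2 4 11 12 13 and the upper half is 15 31 31 37 47, so Q1 and Q3 are simply their middle values.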
Measuring Spread: Interquartile
Range

The interquartile range is the distance
between the third quartile and first
quartile:
IQR = Q3 – Q1
Detecting Potential Outliers

An observation is a potential outlier if
it falls more than 1.5 × IQR below the
first quartile or more than 1.5 × IQR
above the third quartile
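A Python sketch of the 1.5 × IQR criterion, applied to the sodium data listed on the boxplot slides of this section. Quartiles are computed as medians of the lower and upper halves (middle value excluded when n is odd), which is an assumption about the convention; the helper names are hypothetical.

```python
def median(sorted_values):
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles_1_3(values):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data."""
    data = sorted(values)
    half = len(data) // 2
    upper_start = half + 1 if len(data) % 2 else half
    return median(data[:half]), median(data[upper_start:])


def potential_outliers(values):
    """Observations more than 1.5 * IQR below Q1 or more than 1.5 * IQR above Q3."""
    q1, q3 = quartiles_1_3(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]


sodium = [0, 70, 125, 125, 140, 150, 170, 170, 180, 200,
          200, 210, 210, 220, 220, 230, 250, 260, 290, 290]
print(quartiles_1_3(sodium))       # (145.0, 225.0)
print(potential_outliers(sodium))  # [0]
```

Here IQR = 225 - 145 = 80, so the fences are 145 - 120 = 25 and 225 + 120 = 345; only 0 lies outside them.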
The Five-Number Summary

The five-number summary of a
data set consists of:
• Minimum value
• First Quartile
• Median
• Third Quartile
• Maximum value
Boxplot

A box is constructed from Q1 to Q3

A line is drawn inside the box at the median

A line extends outward from the lower end of
the box to the smallest observation that is not
a potential outlier

A line extends outward from the upper end of
the box to the largest observation that is not a
potential outlier
Boxplot for Sodium Data
Sodium Data (in increasing order):
0, 70, 125, 125, 140, 150, 170, 170, 180, 200,
200, 210, 210, 220, 220, 230, 250, 260, 290, 290
Five-Number Summary:
Min: 0
Q1: 145
Med: 200
Q3: 225
Max: 290
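The five-number summary above and the whisker ends of the boxplot can be reproduced with a short Python sketch. It assumes the median-of-halves convention for quartiles and the 1.5 × IQR outlier criterion from the earlier slides; the function name `boxplot_stats` is hypothetical.

```python
def median(sorted_values):
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def boxplot_stats(values):
    """Five-number summary plus the whisker ends of a boxplot.

    Whiskers run to the most extreme observations that are not potential
    outliers (that is, within 1.5 * IQR of the box).
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])
    q3 = median(data[half + 1:] if n % 2 else data[half:])
    iqr = q3 - q1
    inside = [x for x in data if q1 - 1.5 * iqr <= x <= q3 + 1.5 * iqr]
    return {
        "min": data[0], "Q1": q1, "median": median(data), "Q3": q3, "max": data[-1],
        "lower_whisker": inside[0], "upper_whisker": inside[-1],
    }


sodium = [0, 70, 125, 125, 140, 150, 170, 170, 180, 200,
          200, 210, 210, 220, 220, 230, 250, 260, 290, 290]
print(boxplot_stats(sodium))
```

The lower whisker stops at 70 rather than at the minimum, because 0 falls below the lower fence and is drawn as a potential outlier.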
Boxplot for Sodium in Cereals
Sodium Data:
0 210
260 125
220 290
210 140
220 200
125 170
250 150
170 70
230 200
290 180
Z-Score

The z-score for an observation measures how far
an observation is from the mean in standard
deviation units
$$ z = \frac{\text{observation} - \text{mean}}{\text{standard deviation}} $$

An observation in a bell-shaped distribution is a
potential outlier if its z-score < -3 or > +3
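A Python sketch of z-scores and the |z| > 3 rule, assuming the sample standard deviation (divisor n - 1) is used. The function names are hypothetical; the sodium data come from the boxplot slides.

```python
import statistics


def z_scores(values):
    """z = (observation - mean) / standard deviation, using the sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]


def z_outliers(values, cutoff=3.0):
    """Observations whose z-score is below -cutoff or above +cutoff."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > cutoff]


print(z_scores([1, 2, 3]))  # [-1.0, 0.0, 1.0]

sodium = [0, 70, 125, 125, 140, 150, 170, 170, 180, 200,
          200, 210, 210, 220, 220, 230, 250, 260, 290, 290]
print(z_outliers(sodium))  # []
```

For the sodium data, 0 has a z-score of roughly -2.6, so this rule does not flag it even though the 1.5 × IQR criterion does; the z-score rule is intended for bell-shaped distributions, and these data are skewed.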
Chapter 3
Association: Contingency, Correlation, and Regression
• Learn how to examine links between two variables
Variables
• Response variable: the outcome variable
• Explanatory variable: the variable that explains the outcome variable
Association

An association exists between the
two variables if a particular value for
one variable is more likely to occur
with certain values of the other
variable
Section 3.1
How Can We Explore the Association
Between Two Categorical Variables?
Example: Food Type and
Pesticide Status
Example: Food Type and
Pesticide Status


What is the response variable?
What is the explanatory variable?
Food Type       Pesticides: Yes   Pesticides: No
Organic         29                98
Conventional    19485             7086
Example: Food Type and
Pesticide Status


What proportion of organic foods contain
pesticides?
What proportion of conventionally grown foods
contain pesticides?
Food Type       Pesticides: Yes   Pesticides: No
Organic         29                98
Conventional    19485             7086
Example: Food Type and
Pesticide Status

What proportion of all sampled items contain
pesticide residues?
Food Type       Pesticides: Yes   Pesticides: No
Organic         29                98
Conventional    19485             7086
Contingency Table

The Food Type and Pesticide Status
table is called a contingency table

A contingency table:
• Displays two categorical variables
• The rows list the categories of one variable
• The columns list the categories of the other
variable
• Entries in the table are frequencies
Example: Food Type and
Pesticide Status

Contingency Table Showing Conditional
Proportions
Example: Food Type and
Pesticide Status



What is the sum over each row?
What proportion of organic foods contained
pesticide residues?
What proportion of conventional foods
contained pesticide residues?
Food Type       Pesticides: Yes   Pesticides: No
Organic         0.23              0.77
Conventional    0.73              0.27
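The conditional proportions come from dividing each cell count by its row total. A minimal Python sketch, using the counts from the Food Type and Pesticide Status table; the dictionary layout and function names are illustrative choices, not part of the original example.

```python
# Counts from the Food Type and Pesticide Status contingency table
counts = {
    "Organic": {"Yes": 29, "No": 98},
    "Conventional": {"Yes": 19485, "No": 7086},
}


def conditional_proportions(table):
    """Divide each cell by its row total, so each row sums to 1."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: count / total for col, count in cells.items()}
    return result


def overall_proportion(table, column):
    """Proportion of all observations (both rows combined) in the given column."""
    grand_total = sum(sum(cells.values()) for cells in table.values())
    return sum(cells[column] for cells in table.values()) / grand_total


for food, props in conditional_proportions(counts).items():
    print(food, {k: round(v, 2) for k, v in props.items()})
print("Overall Yes:", round(overall_proportion(counts, "Yes"), 2))  # 0.73
```

For example, 29 / (29 + 98) ≈ 0.23 of the organic items had residues, compared with 19485 / (19485 + 7086) ≈ 0.73 of the conventional items.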
Example: Food Type and
Pesticide Status
Example: For the following pair of variables,
which is the response variable and which is
the explanatory variable?
College grade point average (GPA) and high
school GPA
a. College GPA: response variable; high school GPA: explanatory variable
b. College GPA: explanatory variable; high school GPA: response variable
Section 3.2
How Can We Explore the Association
Between Two Quantitative
Variables?
Scatterplot

Graphical display of two quantitative
variables:
• Horizontal Axis: Explanatory variable, x
• Vertical Axis: Response variable, y
Example: Internet Usage and
Gross Domestic Product (GDP)
Positive Association

Two quantitative variables, x and y, are
said to have a positive association when
high values of x tend to occur with high
values of y, and when low values of x
tend to occur with low values of y
Negative Association

Two quantitative variables, x and y,
are said to have a negative
association when high values of x
tend to occur with low values of y,
and when low values of x tend to
occur with high values of y
Example: Did the Butterfly Ballot Cost
Al Gore the 2000 Presidential
Election?
Linear Correlation: r

Measures the strength of the linear
association between x and y
• A positive r-value indicates a positive association
• A negative r-value indicates a negative association
• An r-value close to +1 or -1 indicates a strong linear association
• An r-value close to 0 indicates a weak linear association
Calculating the correlation, r
$$ r = \frac{1}{n - 1} \sum \left( \frac{x - \bar{x}}{s_x} \right) \left( \frac{y - \bar{y}}{s_y} \right) $$
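A Python sketch of the formula: standardize each x and y value, multiply the paired z-scores, sum, and divide by n - 1. Here s_x and s_y are the sample standard deviations (divisor n - 1). The function name and the small data set in the usage line are hypothetical, chosen only for illustration.

```python
import math


def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of products of the z-scores of x and y."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    total = sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical data: y tends to rise with x, but not perfectly
print(round(correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 2))  # 0.77
```

Points above both means, or below both, contribute positive products; mixed points contribute negative products, which is why the sign of r matches the direction of the association.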
Example: 100 cars on the lot of a
used-car dealership
Would you expect a positive association, a
negative association or no association between
the age of the car and the mileage on the
odometer?



• Positive association
• Negative association
• No association