Transcript Ch 3 Notes
Ch 3 – Examining Relationships
YMS – 3.1
Scatterplots
Some Vocabulary
Response Variable
– Measures an outcome of a study
– AKA dependent variable
Explanatory Variable
– Attempts to explain the observed outcomes
– AKA independent variable
Scatterplot
– Shows the relationship between two quantitative
variables measured on the same individuals
Scatterplots
Examining
– Look for overall pattern and any deviations
– Describe pattern with form, strength, and direction
Drawing
– Uniformly scale the vertical and horizontal axes
– Label both axes
– Adopt a scale that uses the entire available grid
Categorical Variables
– Use a different color or symbol for each category
of a categorical variable
Classwork p125 #3.7, 3.10-3.11
Homework: #3.16, 3.22 and 3.2 Blueprint
YMS – 3.2
Correlation
Correlation
Measures the direction and strength of the
linear relationship between two
quantitative variables
Facts About Correlation
Makes no distinction between explanatory and
response variables
Requires both variables be quantitative
Does not change when we change the units of
measurement
Sign of r indicates positive or negative association
r is always between -1 and 1, inclusive
Only measures strength of linear relationships
Is not resistant (strongly affected by outliers)
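Several of these facts can be checked numerically. The sketch below is only an illustration: the height and weight values are made up, and the helper name `correlation` is an assumption, not something from the text. It computes r as the sum of the products of standardized x and y values, divided by n - 1.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: sum of products of standardized values over n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

# Made-up data for illustration only.
heights_in = [60, 62, 65, 68, 71]
weights_lb = [110, 120, 140, 155, 170]

r = correlation(heights_in, weights_lb)

# No distinction between explanatory and response: swapping x and y gives the same r.
r_swapped = correlation(weights_lb, heights_in)

# Changing units (inches to centimeters) does not change r.
heights_cm = [h * 2.54 for h in heights_in]
r_cm = correlation(heights_cm, weights_lb)

# Only linear relationships are measured: a perfect curve can still give r = 0.
xs_curved = [-2, -1, 0, 1, 2]
ys_curved = [x ** 2 for x in xs_curved]
r_curved = correlation(xs_curved, ys_curved)

# Not resistant: one unusual point (tall but light) pulls r down sharply.
r_with_outlier = correlation(heights_in + [75], weights_lb + [100])

print(f"r = {r:.3f}, swapped = {r_swapped:.3f}, in cm = {r_cm:.3f}")
print(f"curved data r = {r_curved:.3f}, with outlier r = {r_with_outlier:.3f}")
```

The sign of `r` is positive here because taller individuals in the made-up data are also heavier.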
In Class Exercises
p146 #3.28, 3.34 and 3.37
Correlation Guessing Game
Homework
3.3 Blueprint
YMS – 3.3
Least-Squares Regression
Regression
Regression Line
– Describes how a response variable y changes as an
explanatory variable x changes
LSRL of y on x
– Makes the sum of the squares of the vertical
distances of the data points from the line as small as
possible
Line should be as close as possible to the points
in the vertical direction
– Error = Observed (Actual) – Predicted
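To see what "as small as possible" means, the sketch below compares the sum of squared errors for three candidate lines on a small made-up data set. The data, the function name `sse`, and the candidate lines are all assumptions for illustration; the first candidate happens to be the least-squares line for this particular data.

```python
def sse(a, b, xs, ys):
    """Sum of squared vertical errors (observed - predicted) for y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

sse_lsrl = sse(2.2, 0.6, x, y)    # the least-squares line for this data
sse_other = sse(2.0, 0.7, x, y)   # a nearby line that looks reasonable by eye
sse_flat = sse(4.0, 0.0, x, y)    # horizontal line at the mean of y

print(f"LSRL: {sse_lsrl:.2f}, nearby line: {sse_other:.2f}, flat line: {sse_flat:.2f}")
```

Any other choice of intercept and slope gives a larger sum of squared errors than the least-squares line on the same data.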
LSRL
Equation of the LSRL: $\hat{y} = a + bx$
Slope: $b = r \frac{s_y}{s_x}$
Intercept: $a = \bar{y} - b\bar{x}$
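As a rough sketch, the line can be computed from summary statistics alone, using the standard formulas: slope b = r(s_y / s_x) and intercept a = y-bar - b(x-bar). The data and the function name `lsrl` are made up for illustration.

```python
from statistics import mean, stdev

def lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    # Correlation r from the deviations about the means.
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x          # slope
    a = y_bar - b * x_bar      # intercept
    return a, b

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = lsrl(x, y)
print(f"y-hat = {a:.2f} + {b:.2f}x")   # prints: y-hat = 2.20 + 0.60x
```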
Coefficient of determination – $r^2$
The fraction of the variation in the values of y
that is explained by the least-squares regression
of y on x
Measures the contribution of x in predicting y
If x is a poor predictor of y, then the sum of the
squares of the deviations about the mean
(SST) and the sum of the squares of deviations
about the regression line (SSE) would be
approximately the same.
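The comparison of SST and SSE can be sketched in a few lines, taking r-squared as (SST - SSE) / SST. Both data sets and the function names are made up for illustration; the second data set is chosen so that x is useless for predicting y.

```python
from statistics import mean, stdev

def lsrl(xs, ys):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x
    return y_bar - b * x_bar, b, r

def sst_and_sse(xs, ys):
    """SST: squared deviations about the mean. SSE: squared deviations about the LSRL."""
    a, b, _ = lsrl(xs, ys)
    y_bar = mean(ys)
    sst = sum((y - y_bar) ** 2 for y in ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return sst, sse

# Made-up data: x is a fairly good predictor of y.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
sst, sse = sst_and_sse(x, y)
r_squared = (sst - sse) / sst
_, _, r = lsrl(x, y)
print(f"SST = {sst:.2f}, SSE = {sse:.2f}, r^2 = {r_squared:.2f}, r*r = {r * r:.2f}")

# Made-up data: x is a poor predictor of y, so SSE is about the same as SST.
y_poor = [3, 5, 4, 5, 3]
sst_poor, sse_poor = sst_and_sse(x, y_poor)
print(f"SST = {sst_poor:.2f}, SSE = {sse_poor:.2f}")
```

In the first data set the regression line removes 60% of the squared error left by the mean alone; in the second it removes essentially none.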
Understanding r-squared:
A single point simplification
Al Coons
Buckingham Browne & Nichols School
Cambridge, MA
[Figure: for a single point, the error w.r.t. the mean model and the part of that error eliminated by the y-hat model]
Proportion of error eliminated by y-hat model
= (Error eliminated by y-hat model) / (Error w.r.t. mean model)
$r^2$ = proportion of variability accounted for by
the given model (w.r.t. the mean model).
Facts about
Least-Squares Regression
Distinction between explanatory and response
variables is essential
A change of one standard deviation in x
corresponds to a change of r standard
deviations in y
LSRL always passes through the point $(\bar{x}, \bar{y})$
The square of the correlation is the fraction of
the variation in the values of y that is explained
by the least-squares regression of y on x
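These facts can be checked on a small example. The sketch below uses made-up data and an assumed helper `lsrl`; it fits y on x and then x on y to show that the choice of explanatory variable changes the line, even though r stays the same.

```python
from statistics import mean, stdev

def lsrl(xs, ys):
    """Return (a, b, r) for the least-squares line of ys on xs: y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x
    return y_bar - b * x_bar, b, r

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b, r = lsrl(x, y)        # y on x: y-hat = a + b*x
c, d, r_rev = lsrl(y, x)    # x on y: x-hat = c + d*y (a different line)

# The line passes through the point of means (x-bar, y-bar).
predicted_at_x_bar = a + b * mean(x)

# In standard units the slope is r: one s_x increase in x changes y-hat by r * s_y.
slope_in_sd_units = b * stdev(x) / stdev(y)

print(f"y on x: y-hat = {a:.2f} + {b:.2f}x")
print(f"x on y: x-hat = {c:.2f} + {d:.2f}y")
print(f"y-hat at x-bar = {predicted_at_x_bar:.2f}, y-bar = {mean(y):.2f}")
print(f"slope in SD units = {slope_in_sd_units:.3f}, r = {r:.3f}")
```

For this data the y-on-x slope is 0.6, while the x-on-y line rewritten in terms of y has slope 1, so the two regressions do not describe the same line.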
Classwork: Transformations and LSRL WS
Homework: #3.39 and ABS Matching to Plots Extension
Question (we’ll finish the others in class)
Residuals
observed y – predicted y, or $y - \hat{y}$
Positive values show that the data point lies above
the LSRL
The mean of residuals is always zero
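A short sketch of these points, using made-up data and an assumed helper `lsrl`: each residual is observed y minus predicted y, positive residuals belong to points above the line, and the residuals from a least-squares line average to zero (up to rounding).

```python
from statistics import mean, stdev

def lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x
    return y_bar - b * x_bar, b

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = lsrl(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

for xi, yi, res in zip(x, y, residuals):
    side = "above" if res > 0 else "below"
    print(f"x = {xi}, y = {yi}, residual = {res:+.2f} ({side} the line)")

print(f"mean of residuals = {mean(residuals):.6f}")
```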
Residual Plots
A scatterplot of the regression residuals
against the explanatory variable
Helps us assess the fit of a regression line
Want a random pattern
Watch for individual points with large
residuals or that are extreme in the x
direction
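The sketch below shows why the residual plot is worth checking even when r is high. It fits a line to made-up data that actually follow a curve (y = x squared) and lists the (x, residual) pairs that a residual plot would display. The data and helper name are assumptions for illustration, and the output is a table rather than a drawn plot.

```python
from statistics import mean, stdev

def lsrl(xs, ys):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x
    return y_bar - b * x_bar, b, r

# Made-up curved data: y = x squared.
x = [1, 2, 3, 4, 5]
y = [xi ** 2 for xi in x]

a, b, r = lsrl(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

print(f"r = {r:.3f}")
print("x   residual")
for xi, res in zip(x, residuals):
    print(f"{xi}   {res:+.2f}")
```

The residuals come out as roughly 2, -1, -2, -1, 2: positive at both ends and negative in the middle. That U shape is not a random pattern, so a straight line is a poor description of this data despite the high correlation.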
Outliers vs. Influential
Observations
Outlier
– An observation that lies outside the overall
pattern of the other observations
Influential observation
– Removing this point would markedly change
the result of the calculation
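One way to check whether a point is influential is to fit the line with and without it. In the sketch below the data are made up, the helper name is an assumption, and the added point (15, 3) is chosen to be extreme in the x direction.

```python
from statistics import mean, stdev

def lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x
    return y_bar - b * x_bar, b

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Same data plus one point far out in the x direction.
x_with = x + [15]
y_with = y + [3]

a_without, b_without = lsrl(x, y)
a_with, b_with = lsrl(x_with, y_with)

print(f"without the point: y-hat = {a_without:.2f} + {b_without:.2f}x")
print(f"with the point:    y-hat = {a_with:.2f} + {b_with:.2f}x")
```

Here the single added point drags the slope from 0.6 down to slightly below zero, so removing it would markedly change the fitted line.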
Classwork: Residual Plots WS
Homework: p177 #3.52 and 3.61
Doctors for the Poor
This will be graded for accuracy!
Ch 3 Review
p176 #3.50-3.51, 3.56, 3.59, 3.69,
3.76-3.77