Welcome to Applied Multiple Regression (Psych 308c)! Spring 2012
Welcome to Applied Multiple Regression (Psych 308c)! Spring 2013
Professor Dale Berger, [email protected], 909-621-8084
ACB 101, Tuesday 2:00-4:00

Teaching Associates:
Aly, [email protected], Monday 4:00-5:50, ACB 119
Val, [email protected], Tuesday 4:00-5:50, ACB 208
Maggie, [email protected], Wednesday 11:00-12:50, ACB 119
Nic, [email protected], Wednesday 4:00-5:50, ACB 208
Stephen, [email protected], Thursday 4:00-5:50, ACB 208
Overview of the course
Lecture: conceptual, interpretation, presentation
(take notes, rewrite notes, discuss)
Homework: conceptual, computer, writing
(discuss, critique, check your mastery)
Lab/Review sessions: discussion, guidance
Sources: Packet, Howell, Sakai, Internet
Exams: conceptual (interpretation, explanation)
Plan for today
• Review the packet
• Brief history of correlation/regression
• Some vocabulary and notation
• Univariate and bivariate normal distributions
Brief history
Helen M. Walker (1891-1983)
Studies in the history of statistical method (1929)
• In the mid-1800s a number of individuals were hovering on the verge of discovering correlation, but the conceptual breakthrough came from Sir Francis Galton (1822-1911).
Galton was interested in genetics
• Cousin of Charles Darwin
• Collected vast amounts of data
• Among first to graph data
Chart from Sir Francis Galton
Galton, F. (1886). Regression towards mediocrity in hereditary stature. Journal of the Anthropological Institute of Great Britain and Ireland, 15, 246-263.
Slope of the line (r) is an index of the strength of the regression
Co-relations and their measurement (1888)
The Pearsons and Fisher
• Galton (1822 – 1911) left money to endow a chair in
“eugenics” at University College in London
– He asked Karl Pearson to hold the first chair in Applied Statistics
– Second holder was Sir Ronald Fisher (1890-1962)
• Department split
– Eugenics with Fisher as chair
– Applied Statistics with Egon Pearson (1895-1980) as chair
• Helen Walker (1891-1983) worked with both men
– Retired in Claremont, taught statistics in Psychology Department
at Claremont Graduate School until Spring 1970
– Four degrees of separation:
Galton → Fisher/Pearson → Walker → Berger → YOU
Goal: Predict Y from X
Example: X = Likert scale of Job Satisfaction; Y = Likert scale of Intention to Stay
[Figure: scatterplot of Y against X, both scored 1-7]
Model 1: Use $\bar{Y}$ (Y bar) to predict $Y_i$
[Figure: scatterplot of Y against X, both scored 1-7]
The sum of the squared deviations from the mean is $SS_{Total} = \sum (Y_i - \bar{Y})^2 = 10$.
This is the numerator of the variance of Y, an index of the variability of Y that could potentially be explained.
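A minimal Python sketch of Model 1. The slide's actual data points cannot be recovered from this transcript, so the scores below are a hypothetical five-case data set (not the course data), chosen only so that it reproduces the summary values on these slides ($SS_{Total} = 10$, b = .60, a = 1.60). Model 1 predicts every $Y_i$ with $\bar{Y}$, so its errors are simply the deviations from the mean.

```python
from statistics import fmean

# Hypothetical Intention-to-Stay scores on a 1-7 scale (NOT the slide's data),
# chosen so that SS_Total = 10 as on the slide.
y = [4, 3, 2, 5, 6]

def ss_total(scores):
    """Sum of squared deviations of each score from the mean (Model 1's errors)."""
    y_bar = fmean(scores)  # Model 1 predicts Y-bar for every case
    return sum((yi - y_bar) ** 2 for yi in scores)

print(ss_total(y))  # 10.0, the numerator of the variance of Y
```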
Model 2: Use regression to predict $Y_i$
[Figure: scatterplot of Y against X, both scored 1-7]
The sum of the squared errors is $SS_{error} = \sum (Y_i - \hat{Y}_i)^2 = 6.4$.
This model leaves 6.4/10 = 64% of the variance of Y unexplained, so it accounts for 36% of the Y variance.
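The same hypothetical five-case data (not the slide's data) can illustrate Model 2: predict each case from the line $\hat{Y}_i = 1.6 + .6X_i$ and square the errors. The helper name `ss_error` is only illustrative.

```python
from statistics import fmean

# Hypothetical data (NOT the slide's data), chosen to match the slide's summaries.
x = [2, 3, 4, 5, 6]  # Job Satisfaction
y = [4, 3, 2, 5, 6]  # Intention to Stay
a, b = 1.6, 0.6      # intercept and slope reported on the slides

def ss_error(x, y, a, b):
    """Sum of squared errors (residuals) for predictions Y-hat = a + b*X."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

y_bar = fmean(y)
sst = sum((yi - y_bar) ** 2 for yi in y)
sse = ss_error(x, y, a, b)
print(f"SS_error = {sse:.1f}; unexplained = {sse / sst:.0%}; explained = {1 - sse / sst:.0%}")
```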
Model 2b: Portion of $SS_{Total}$ explained
[Figure: scatterplot of Y against X, both scored 1-7]
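The sum of the squared deviations of the regression predictions from the mean is $SS_{reg} = \sum (\hat{Y}_i - \bar{Y})^2 = 3.6$. The proportion of Y variance accounted for by the model is 3.6/10 = 36%. The two pieces add up to the total: $SS_{reg} + SS_{error} = 3.6 + 6.4 = 10 = SS_{Total}$.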
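A short numerical check of the partition $SS_{Total} = SS_{reg} + SS_{error}$, again on the hypothetical five-case data (not the slide's data). The split is exact here because 1.6 and .6 are the least-squares intercept and slope for these scores; for an arbitrary line the two pieces would not add up.

```python
from statistics import fmean

# Hypothetical data (NOT the slide's data), chosen to match the slide's summaries.
x = [2, 3, 4, 5, 6]
y = [4, 3, 2, 5, 6]
a, b = 1.6, 0.6

y_bar = fmean(y)
y_hat = [a + b * xi for xi in x]                            # regression predictions
ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained part
ss_error = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained part
ss_total = sum((yi - y_bar) ** 2 for yi in y)

print(round(ss_reg, 2), round(ss_error, 2), round(ss_total, 2))
print(f"R^2 = {ss_reg / ss_total:.2f}")
```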
The Regression Model
$\hat{Y}_i = a + bX_i$
[Figure: scatterplot of Y against X, both scored 1-7]
For these data, r = .60, b = .60, and a = 1.60.
$r^2 = .60^2 = .36 = 36\%$; predicted $\hat{Y}_i = 1.6 + .6X_i$
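A sketch of the usual least-squares formulas for one predictor: $b = SP/SS_X$, $a = \bar{Y} - b\bar{X}$, and $r = SP/\sqrt{SS_X SS_Y}$, where SP is the sum of cross-products. The function name `fit_line` and the five-case data are hypothetical; the data are not the slide's, only chosen to give the same r, b, and a. Here b equals r only because X and Y happen to have equal standard deviations, since in general $b = r \, s_Y / s_X$.

```python
import math
from statistics import fmean

def fit_line(x, y):
    """Least-squares intercept a, slope b, and Pearson r for one predictor."""
    x_bar, y_bar = fmean(x), fmean(y)
    sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # sum of cross-products
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    b = sp / ss_x                   # slope
    a = y_bar - b * x_bar           # the line passes through (X-bar, Y-bar)
    r = sp / math.sqrt(ss_x * ss_y)
    return a, b, r

# Hypothetical data (NOT the slide's data), chosen to reproduce r = b = .60, a = 1.60.
a, b, r = fit_line([2, 3, 4, 5, 6], [4, 3, 2, 5, 6])
print(f"a = {a:.2f}, b = {b:.2f}, r = {r:.2f}, r^2 = {r * r:.2f}")
```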
The Regression Model
$\hat{Y}_i = 1.6 + .6X_i$
Intercept = 1.6
Slope = +.60
[Figure: regression line with predicted values 2.2 at X = 1, 4.0 at X = 4, and 5.8 at X = 7]
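The three predicted values marked on the regression line follow directly from the slide's equation; this tiny sketch (the function name `predict` is illustrative) just plugs in X = 1, 4, and 7.

```python
def predict(x, a=1.6, b=0.6):
    """Predicted Y from the slide's equation Y-hat = 1.6 + .6X."""
    return a + b * x

for x in (1, 4, 7):
    print(x, round(predict(x), 1))  # each 3-point step in X raises Y-hat by 3 * .6 = 1.8
```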
Example: Use height to predict weight
X = Height; Y = Weight
$\hat{Y}_i = a + bX_i$
[Figure: Weight (Y), 110 to 160, plotted against Height (X), 60 to 70, with the rise and run of the line marked]
Slope = rise / run = (160 - 110)/(70 - 60) = 50/10 = 5
Intercept = (predicted Y when X = 0) = -190
Check: at X = 60, $\hat{Y} = -190 + 5 \times 60 = -190 + 300 = 110$
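The rise/run arithmetic can be sketched as a small function that recovers slope and intercept from two points on the line. The points (60, 110) and (70, 160) are read off the slide's axes; the function name `line_through` is illustrative.

```python
def line_through(p1, p2):
    """Slope (rise/run) and intercept of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)  # rise / run
    intercept = y1 - slope * x1    # predicted Y when X = 0
    return slope, intercept

b, a = line_through((60, 110), (70, 160))
print(b, a)        # 5.0 -190.0
print(a + b * 65)  # 135.0, the predicted weight at height 65
```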
General Linear Model (GLM)
Applications of Regression
• Describe relationships: r, multiple R², models
• Test hypotheses: F test of R and R² change, t-test of B
• Predict: formulas to predict
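A sketch of the F test named above, written in its R²-change form: $F = \frac{(R^2_{full} - R^2_{reduced})/(k_{full} - k_{reduced})}{(1 - R^2_{full})/(N - k_{full} - 1)}$. With $R^2_{reduced} = 0$ and no predictors in the reduced model it becomes the overall test of $R^2$. The function name `f_change` is illustrative. As a check, Anscombe's summary values later in these notes (Regression SS = 27.50, Residual SS = 13.75, N = 11) should reproduce Bumble's F(1, 9) = 18.00. A p value would additionally need the upper-tail probability of the F distribution, which is not computed here.

```python
def f_change(r2_full, r2_reduced, n, k_full, k_reduced=0):
    """F for the increase in R^2 from a reduced model (k_reduced predictors) to a
    full model (k_full predictors); df = (k_full - k_reduced, n - k_full - 1).
    With r2_reduced = 0 and k_reduced = 0 this is the overall F test of R^2."""
    df1 = k_full - k_reduced
    df2 = n - k_full - 1
    return ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)

# Anscombe's summary values: SS_reg = 27.50, SS_res = 13.75, N = 11, one predictor.
r2 = 27.50 / (27.50 + 13.75)
f = f_change(r2, 0.0, n=11, k_full=1)
print(round(f, 2))
```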
Assumptions depend upon the application
• Describe a sample: assume a linear relationship
– R² is the proportion of Y variance explained by a linear composite of X variables
• Test hypotheses: random sampling, normally distributed and independent errors, homogeneity of error variance
• Predict the future: also need to assume that the system is stable
Normal Distribution
[Figure: normal curve plotted on the Z scale]
Bivariate Normal Distribution
Bivariate plot (scatterplot)
Could come from a bivariate normal distribution
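A minimal simulation sketch, assuming a standard bivariate normal (both means 0, both SDs 1) with correlation rho. It uses the common construction $Z_Y = \rho Z_X + \sqrt{1 - \rho^2}\,E$, where E is independent standard normal noise. The names `bivariate_normal` and `pearson_r` are illustrative. A scatterplot of such draws shows the elliptical cloud that a bivariate normal distribution produces, and the sample r lands near rho.

```python
import math
import random

def bivariate_normal(n, rho, seed=0):
    """Draw n (Zx, Zy) pairs from a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        zx = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)                      # independent noise
        zy = rho * zx + math.sqrt(1 - rho ** 2) * e  # Var(Zy) = 1, Corr(Zx, Zy) = rho
        pairs.append((zx, zy))
    return pairs

def pearson_r(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(px for px, _ in pairs) / n
    my = sum(py for _, py in pairs) / n
    sxy = sum((px - mx) * (py - my) for px, py in pairs)
    sxx = sum((px - mx) ** 2 for px, _ in pairs)
    syy = sum((py - my) ** 2 for _, py in pairs)
    return sxy / math.sqrt(sxx * syy)

sample = bivariate_normal(10_000, rho=0.6)
print(f"sample r = {pearson_r(sample):.3f}")  # close to, but not exactly, .60
```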
Bivariate data example
Anscombe constructed four sets of data, each consisting of 11 pairs of X-Y scores.
Can these four sets of data be pooled?
Bumble applied regression to each set separately and recorded the summary statistics.
Anscombe, F. J. (1973). Graphs in statistical analysis. The American Statistician, 27, 17-21.
These summary statistics are identical in all four sets:
Sample size (N): 11
Mean of X: 9.0
Mean of Y: 7.5
Correlation: 0.816
Linear equation: y′ = 3 + .5x
Regression SS: 27.50
Residual SS: 13.75 (df = 9)
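A sketch that recomputes Bumble's summary statistics for each of Anscombe's four sets. The X-Y values are Anscombe's (1973) published quartet as commonly reproduced, typed in here; the function name `summarize` is illustrative. Each set matches the table to about two decimals (small differences come from rounding in the published data). The numbers alone cannot tell the sets apart, which is the point of the next slides.

```python
import math

# Anscombe's (1973) quartet. Sets 1-3 share the same X values; set 4 has its own.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    1: (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summarize(x, y):
    """Summary statistics for a simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_reg = b * sxy                 # equals sxy**2 / sxx
    ss_res = syy - ss_reg
    f = ss_reg / (ss_res / (n - 2))  # F(1, n - 2)
    return {"n": n, "mean_x": mx, "mean_y": my, "r": sxy / math.sqrt(sxx * syy),
            "a": a, "b": b, "ss_reg": ss_reg, "ss_res": ss_res, "F": f}

for k, (x, y) in ANSCOMBE.items():
    s = summarize(x, y)
    print(f"Set {k}: N={s['n']}, mean X={s['mean_x']:.1f}, mean Y={s['mean_y']:.2f}, "
          f"r={s['r']:.3f}, y'={s['a']:.2f} + {s['b']:.3f}x, "
          f"SSreg={s['ss_reg']:.2f}, SSres={s['ss_res']:.2f}, F={s['F']:.2f}")
```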
Bumble’s Conclusions:
• There is a strong linear relationship between X and Y, as is apparent from r = .816, F(1, 9) = 18.00, p = .00217.
• All four data sets are equivalent and probably were sampled from the same population.
• What more would you like to know?
• LOOK at the data!!
Anscombe’s four data sets
[Figures: scatterplots of Data Set 1, Data Set 2, Data Set 3, and Data Set 4, one per slide]
You can see a lot by just looking.
-- Yogi Berra
If you don’t look, you won’t see it.
-- DB
Describing distributions
• What information is needed?
– Univariate
• Shape (normal?)
• Mean, standard deviation
– Bivariate
• Shape (Bivariate normal? Linear? Correlation?)
• For X and Y: mean and standard deviation
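A small sketch of a univariate description. Mean and standard deviation come from the slide; the moment-based skewness index is an added assumption, one common numeric summary of shape (0 for a symmetric distribution), and it does not replace looking at a plot. The function name `describe` is illustrative. For the bivariate case, the same summary would be applied to X and to Y, together with r.

```python
from statistics import fmean, stdev

def describe(scores):
    """Mean, sample SD, and moment-based skewness (0 for a symmetric distribution)."""
    m = fmean(scores)
    n = len(scores)
    m2 = sum((s - m) ** 2 for s in scores) / n  # second central moment
    m3 = sum((s - m) ** 3 for s in scores) / n  # third central moment
    return {"mean": m, "sd": stdev(scores), "skew": m3 / m2 ** 1.5}

print(describe([1, 2, 3, 4, 5]))  # symmetric: skew is 0
print(describe([1, 1, 1, 2, 9]))  # one high score pulls the skew positive
```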
Selected references
Anscombe, F. J. (1973). Graphs in statistical analysis. The American Statistician, 27, 17-21.
Berger, D. E., et al. (2014). Web Interface for Statistics Education: WISE. http://wise.cgu.edu
Galton, F. (1886). Regression towards mediocrity in hereditary stature. The Journal of the Anthropological Institute of Great Britain and Ireland, 15, 246-263.
Galton, F. (1888). Co-relations and their measurement, chiefly from anthropometric data. Proceedings of the Royal Society, 45, 135-145.
Walker, H. M. (1929). Studies in the history of statistical method. Baltimore: The Williams and Wilkins Co.