Bivariate Regression Analysis


Interpreting Bi-variate OLS Regression
• Stata Regression Output
• Regression plots and RSS
• R² -- Coefficient of Determination
– Adjusted R²
• Sample Covariance/Correlation
• Hypothesis Testing
– Standard Errors
– T-tests and P-values
February 14, 2006
Lecture 5a
Slide #1
Stata Regression Model:
Regressing “Militant” on the Political Ideology Scale
Y is the average of variables 81 and 82 (reversed), 84, 85 and 86
X is variable 98: Political Ideology
1 = “strong Lib”, 7 = “strong Cons”
[Figure: density plots of militant and p98_ideol; x-axes run from 0 to 8, y-axes show Density]
Slide #2
Regression Output
regress militant p98_ideo, beta

      Source |       SS       df       MS              Number of obs =    2584
-------------+------------------------------           F(  1,  2582) =  914.58
       Model |  885.217261     1  885.217261           Prob > F      =  0.0000
    Residual |  2499.09626  2582  .967891658           R-squared     =  0.2616
-------------+------------------------------           Adj R-squared =  0.2613
       Total |  3384.31352  2583  1.31022591           Root MSE      =  .98381

------------------------------------------------------------------------------
    militant |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
    p98_ideo |   .3650334   .0120704    30.24   0.000                 .5114341
       _cons |   2.289487   .0547324    41.83   0.000                        .
------------------------------------------------------------------------------
Slide #3
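The F statistic and adjusted R-squared in the output header can be recomputed from the sums of squares in the table. The following is a minimal Python sketch (not Stata), using only the values reported above; the variable names are illustrative assumptions.

```python
# Values copied from the Stata output above.
ess, rss = 885.217261, 2499.09626   # Model and Residual sums of squares
n, k = 2584, 2                      # observations; estimated parameters (slope + constant)

tss = ess + rss                     # Total sum of squares
ms_model = ess / (k - 1)            # Model mean square (df = 1)
ms_resid = rss / (n - k)            # Residual mean square (df = 2582)
ms_total = tss / (n - 1)            # Total mean square (df = 2583)

f_stat = ms_model / ms_resid        # F(1, 2582)
adj_r2 = 1 - ms_resid / ms_total    # adjusted R-squared

print(f"F({k - 1}, {n - k}) = {f_stat:.2f}")
print(f"Adj R-squared = {adj_r2:.4f}")
```

Up to rounding, the two printed values should agree with the F and Adj R-squared lines of the output.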
Regression Descriptive Statistics
corr militant p98_ideo, means

    Variable |       Mean   Std. Dev.        Min        Max
-------------+---------------------------------------------
    militant |   3.837771   1.144651          1          7
    p98_ideo |   4.241486   1.603726          1          7

             | militant p98_ideo
-------------+------------------
    militant |   1.0000
    p98_ideo |   0.5114   1.0000
Slide #4
Regression Plot
[Figure: militant plotted against p98_ideol (both axes run from 0 to 8), showing fitted values and the 95% CI]
Slide #5
Measuring “Goodness of Fit”
• Root of Mean Squared Error (“Root MSE”)
$s_e = \sqrt{\dfrac{RSS}{n - K}}$, where $RSS = \sum e_i^2$ and $K$ = number of parameters
– Measures spread around the regression line
• Coefficient of Determination (R²)
$ESS = \sum (\hat{Y}_i - \bar{Y})^2$ (“model” or explained sum of squares) and $TSS = \sum (Y_i - \bar{Y})^2$ (“total” sum of squares)
$R^2 = \dfrac{ESS}{TSS}$ and $(1 - R^2) = \dfrac{RSS}{TSS} = \dfrac{\sum e_i^2}{\sum (Y_i - \bar{Y})^2}$
Slide #6
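As a quick check on these definitions, Root MSE and R² can be recomputed from the Residual and Total sums of squares reported in the regression output. This is a small Python sketch under the assumption that K = 2 (slope and constant); the variable names are illustrative.

```python
import math

# Residual and Total sums of squares from the Stata regression output.
rss, tss = 2499.09626, 3384.31352
n, k = 2584, 2                       # observations; parameters (assumed: slope + constant)

root_mse = math.sqrt(rss / (n - k))  # s_e: spread around the regression line
r2 = 1 - rss / tss                   # equivalently ESS / TSS

print(f"Root MSE = {root_mse:.5f}")
print(f"R-squared = {r2:.4f}")
```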
Explaining R²
For each observation Yi, variation around the mean can be decomposed into that which is “explained” by the regression and that which is not:
[Figure: for one observation, the unexplained deviation (from Yi to the fitted value Ŷ) and the explained deviation (from Ŷ to the mean of Y)]
Each term below is summed over all observations.
Book terminology:
TSS = (all)²
RSS = (unexplained)²
ESS = (explained)²
Stata terminology:
Total = (all)²
Residual = (unexplained)²
Model = (explained)²
Slide #7
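The decomposition can be verified numerically. The sketch below uses a small hypothetical data set (five made-up points, not the survey data) and fits the bivariate OLS line by hand, then checks that the explained and unexplained sums of squares add up to the total.

```python
# Hypothetical toy data, chosen only to illustrate the decomposition.
x = [1, 2, 3, 4, 5]
y = [2.0, 2.5, 4.0, 4.5, 6.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Bivariate OLS slope and intercept.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)               # (all)^2, "Total"
ess = sum((yh - y_bar) ** 2 for yh in y_hat)           # (explained)^2, "Model"
rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # (unexplained)^2, "Residual"

print(f"TSS = {tss:.3f}, ESS + RSS = {ess + rss:.3f}")
```

The identity TSS = ESS + RSS holds for any OLS fit that includes a constant, so the two printed numbers should match.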
Sample Covariance & Correlation
• Sample covariance for a bivariate model is defined as:
$s_{XY} = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$
• Sample correlations (r) “standardize” covariance by dividing by the product of the X and Y standard deviations:
$r = \dfrac{s_{XY}}{s_X s_Y}$
Sample correlations range from -1 (perfect negative relationship) to +1 (perfect positive relationship)
Slide #8
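The two definitions translate directly into code. This is a minimal Python sketch with hypothetical helper names (`sample_cov`, `sample_corr`) and made-up toy data; it is meant to mirror the formulas, not to replace Stata's corr command.

```python
import math

def sample_cov(x, y):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def sample_corr(x, y):
    """Sample correlation: covariance divided by the product of the standard deviations."""
    s_x = math.sqrt(sample_cov(x, x))  # the covariance of a variable with itself is its variance
    s_y = math.sqrt(sample_cov(y, y))
    return sample_cov(x, y) / (s_x * s_y)

# Hypothetical toy data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.0, 2.5, 4.0, 4.5, 6.0]

print(f"s_XY = {sample_cov(x, y):.3f}, r = {sample_corr(x, y):.3f}")
```

Feeding the function a perfectly linear pair, such as x and its negation, gives the boundary values of +1 or -1.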
Standardized Regression Coefficients
(aka “Beta Weights” or “Betas”)
• Formula:
$b_1^* = b_1 \dfrac{s_X}{s_Y}$
• In our example:
$0.365 \times \dfrac{1.604}{1.145} = 0.511$
• Interpretation: the number of standard deviations of change in Y one should expect from a one-standard-deviation change in X.
Slide #9
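The same arithmetic with the unrounded values from the Stata output can be sketched in Python (variable names are illustrative):

```python
# Unrounded values from the regression and descriptive-statistics output.
b1 = 0.3650334                  # slope on p98_ideo
s_x, s_y = 1.603726, 1.144651   # std. dev. of p98_ideo (X) and militant (Y)

beta = b1 * s_x / s_y           # standardized coefficient ("beta weight")
print(f"beta = {beta:.4f}")
```

In a bivariate regression the beta weight equals the sample correlation, which is why the result matches both the Beta column (.5114341) and the reported r of 0.5114.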
Hypothesis Tests for Regression
Coefficients
• For our model: Yi = 2.289 + 0.365*Xi + ei
• Another sample of 2584 observations would lead to different estimates for b0 and b1. If we drew many such samples, we’d get the sampling distribution of the estimates.
• We need to estimate the sampling distribution (because we usually can’t see it) based on our sample size and variance.
Slide #10
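The idea of repeated sampling can be illustrated with a small simulation. The sketch below is an assumption-laden toy: it treats the fitted values (2.289, 0.365, and an error standard deviation of about 0.984) as if they were the true population values, draws X uniformly from the integers 1 to 7, and uses 500 samples of 200 observations each. None of these choices come from the survey data.

```python
import random
import statistics

def ols_slope(x, y):
    """Bivariate OLS slope: sum of cross-deviations over sum of squared X deviations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(0)                 # fixed seed so the run is repeatable
B0, B1, SIGMA = 2.289, 0.365, 0.984    # assumed "true" parameters (borrowed from the fit)

slopes = []
for _ in range(500):                   # 500 hypothetical samples
    x = [rng.randint(1, 7) for _ in range(200)]
    y = [B0 + B1 * xi + rng.gauss(0, SIGMA) for xi in x]
    slopes.append(ols_slope(x, y))

# The spread of these estimates is what a standard error tries to estimate.
print(f"mean of b1 estimates = {statistics.mean(slopes):.3f}")
print(f"std. dev. of b1 estimates = {statistics.stdev(slopes):.3f}")
```

The estimates cluster around the assumed slope, and their standard deviation plays the role that the standard error plays when only one sample is available.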
To do that we calculate the standard errors of the coefficients (SEb)
(Bivariate case only)
$SE_{b_1} = \dfrac{s_e}{\sqrt{TSS_X}}$, where $TSS_X = \sum (X_i - \bar{X})^2$
$SE_{b_0} = s_e \sqrt{\dfrac{1}{n} + \dfrac{\bar{X}^2}{TSS_X}}$
Slide #11
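Both standard errors can be reproduced from the summary statistics already reported. The Python sketch below relies on one identity not stated on the slide: because the sample variance of X is TSS_X / (n - 1), TSS_X can be recovered as (n - 1) times the squared standard deviation of X. Variable names are illustrative.

```python
import math

n = 2584
s_e = 0.98381                      # Root MSE from the regression output
x_bar, s_x = 4.241486, 1.603726    # mean and std. dev. of p98_ideo

# TSS_X recovered from the sample variance: s_x^2 = TSS_X / (n - 1).
tss_x = (n - 1) * s_x ** 2

se_b1 = s_e / math.sqrt(tss_x)
se_b0 = s_e * math.sqrt(1 / n + x_bar ** 2 / tss_x)

print(f"SE(b1) = {se_b1:.7f}")
print(f"SE(b0) = {se_b0:.7f}")
```

The results should agree with the Std. Err. column of the regression output (.0120704 and .0547324) up to rounding of the inputs.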
Interpreting Standard Errors
• For our model:
– b0 = 2.289, and SEb0 = 0.055
– b1 = 0.365, and SEb1 = 0.012
Assuming that we estimated the standard error correctly, we can identify how many standard errors our estimate is away from zero.
The t-test reports the number of standard errors our estimate falls away from zero. Thus, the “t” for b1 is 30.24 for our model. (Dividing the rounded values, 0.365 / 0.012, gives a slightly different number: rounding!)
[Figure: Estimated Sampling Distribution for b1, centered on b1 = 0.365, with b1 - SEb1 = 0.353 and b1 + SEb1 = 0.377 marked; 0 lies 30.24 SEb1 “units” away from b1]
Slide #12
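The rounding remark is easy to see by computing the t-statistic twice, once with the unrounded Stata values and once with the three-decimal values. A short Python sketch:

```python
# Unrounded coefficient and standard error from the Stata output.
b1, se_b1 = 0.3650334, 0.0120704

t_full = b1 / se_b1        # t-statistic as Stata computes it
t_rounded = 0.365 / 0.012  # same ratio using the rounded values

print(f"t (unrounded inputs) = {t_full:.2f}")
print(f"t (rounded inputs)   = {t_rounded:.2f}")
```

The small gap between the two results comes entirely from rounding the standard error to 0.012.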
Classical Hypothesis Testing
Assume that b1 is zero. What is the probability that your sample would have resulted in an estimate for b1 that is 30.24 SEb1’s away from zero?
To find out, determine the share of the estimated sampling distribution (its cumulative density) that falls more than 30.24 SEb1’s away from zero.
See Table A4.1, page 350, in Hamilton. It reports discrete “p-values”, given the sample size and t-values. Note the distinction between one- and two-sided tests.
In general, if the t-statistic is above 2 in absolute value, the two-sided p-value will be < 0.05 in reasonably large samples. That is the conventional upper limit in a classical hypothesis test.
Note: in Stata-speak, a p-value is a “P>|t|”.
[Figure labels: Assume that b1 = 0.0 (null hypothesis); Estimated b1 = 0.365 (working hypothesis)]
Slide #13
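A rough version of the table lookup can be sketched in Python. This is an approximation, not what Stata does: it substitutes the standard normal distribution for the t distribution, which is reasonable here only because the residual degrees of freedom (2582) are very large. The function name is a hypothetical helper.

```python
import math

def two_sided_p_normal(t):
    """Two-sided p-value, using the standard normal as a large-sample stand-in for t."""
    # P(|Z| > |t|) for a standard normal Z equals erfc(|t| / sqrt(2)).
    return math.erfc(abs(t) / math.sqrt(2))

for t in (1.96, 2.0, 30.24):
    print(f"t = {t:>5}: approximate two-sided p = {two_sided_p_normal(t):.3g}")
```

A t-statistic of about 2 lands just under 0.05, while 30.24 gives a p-value so small that Stata displays it as 0.000.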
Coming up...
• For Tuesday
– Use variables 87-89 to make an “egalitarian” index for
your dependent variable (Y)
– Use p98_ideo (ideology) as the independent variable
(X) to predict egalitarianism. Fully interpret the results.
• Walk through the entire interpretation
• Build a Stata do-file as you go
• For Next Week:
– Remainder of Chapter 2
• Schedule:
– Feb 21: Residual Analysis & Exam Review
– Feb 28: Exam
Slide #14