Dummies and Interactions


Dummies (no, this lecture is not about you)
POL 242
Renan Levine
February 13/15, 2007
Regression Interpretation Review

Interpretation of a Regression
"Countries" dataset.


The dependent variable is military expenditures as a % of
GDP.
There are two independent variables: % of GDP from
industry [INDUSTRY] and Civil Liberties Index [CIVLIB].
Variables in the Equation

Variable      B        SE B     Beta     T        Sig T
INDUSTRY      .040     .022     .135     1.82     .0705
CIVLIB        -.696    .147     -.351    -4.73    .0000
(Constant)    4.86     .967              5.03     .0000
What to look at first



The primary question you are asking when you run a
regression is "what is the effect of x [independent variable]
on y [dependent variable]?"
So first look at the value of the unstandardized coefficients (B)
The unstandardized coefficient tells you how much the value of
the dependent variable is expected to change if the value of the
independent variable increases by one unit, controlling for all
other independent variables.
- Ex: For INDUSTRY, B = 0.04.
- You can write this in a sentence like this: All else being equal, for
every one percentage point increase in the % of GDP from industry,
military expenditures as a % of GDP go up by 0.04 percentage
points [or 4 hundredths of a percent of GDP].
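As a minimal sketch of what B means, the estimates from the regression output above can be plugged into the regression equation. The function name `predict_military` and the example country (INDUSTRY of 30 or 31, CIVLIB of 3) are illustrative assumptions, not values from the dataset.

```python
# Coefficients reported in the "Countries" regression output.
CONSTANT = 4.86
B_INDUSTRY = 0.040
B_CIVLIB = -0.696


def predict_military(industry, civlib):
    """Predicted military expenditures as a % of GDP."""
    return CONSTANT + B_INDUSTRY * industry + B_CIVLIB * civlib


# Hypothetical country: hold CIVLIB fixed, raise INDUSTRY by one point.
low = predict_military(industry=30, civlib=3)
high = predict_military(industry=31, civlib=3)
print(round(low, 3), round(high, 3), round(high - low, 3))
```

Whatever value CIVLIB is held at, the gap between the two predictions is B for INDUSTRY. That is what "all else being equal" means here.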
Are you sure?

Can we be sure that this is not the result of
sampling error?



For this we look at the significance (Sig T).
Ex: P = 0.0705.
You can write this result in a sentence like this:
The relationship between the % of GDP from
industry and the % of military expenditures is
significant at the 0.10 level [P < 0.10], but not at the 0.05 level.
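A rough sketch of how the T and Sig T columns fit together: T is B divided by its standard error, and Sig T is compared against whatever cut-off you choose. The 0.10 and 0.05 cut-offs below are common conventions used here as assumptions. The Sig T of .0000 is the rounded value from the output.

```python
# B, SE B and Sig T as reported in the regression output.
results = {
    "INDUSTRY": {"b": 0.040, "se": 0.022, "p": 0.0705},
    "CIVLIB": {"b": -0.696, "se": 0.147, "p": 0.0000},
}

# t statistic = coefficient / standard error
t_values = {name: r["b"] / r["se"] for name, r in results.items()}

for name, r in results.items():
    print(name, round(t_values[name], 2),
          "p < 0.10:", r["p"] < 0.10,
          "p < 0.05:", r["p"] < 0.05)
```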
Are you worried about multicollinearity?


In this case, both variables are significant, so
we do not need to check the tolerance.
If two or more variables are insignificant,
what should you examine?
Which variable has the biggest effect?

Now that we have observed the effect of each independent
variable on the dependent variable, we can turn to the question
of which variable has the biggest effect on the dependent
variable.
- For this, we look at the standardized coefficient (Beta).
- In this example, the largest beta (in absolute value) is for Civil Liberties.
- You can report this result in three sentences like this: According
to the standardized coefficients, the level of civil liberties has a
bigger effect on a country's percentage of GDP spent on military
expenditures than the percentage of GDP from industry. A
change of one standard deviation in the level of civil liberties
causes a 0.35 standard deviation decrease in the percentage of GDP
spent on military expenditures when controlling for percentage of GDP
from industry. In comparison, a change of one standard deviation in the
percentage of GDP from industry causes a 0.13 standard deviation
increase in the percentage of GDP spent on military expenditures.
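To see where a Beta comes from, here is a small sketch of the conversion Beta = B x (SD of X / SD of Y). It uses invented data and only one predictor to keep the arithmetic short. The lists `x` and `y` are assumptions, not values from the "Countries" dataset.

```python
from statistics import mean, stdev

# Invented illustrative data (not from the "Countries" dataset).
x = [10.0, 20.0, 25.0, 30.0, 40.0, 45.0]
y = [1.5, 2.0, 3.5, 3.0, 4.5, 5.0]

mx, my = mean(x), mean(y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)

b = sxy / sxx                    # unstandardized slope (B)
beta = b * stdev(x) / stdev(y)   # standardized slope (Beta)
print(round(b, 3), round(beta, 3))
```

Because Beta is measured in standard deviations of both variables, it does not depend on the units of X or Y, which is why Betas can be compared across independent variables. With a single predictor, Beta equals the correlation between X and Y.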
Last



Finally, we can turn to the final question of
how much total variation the model explains.
The R-squared of the model provides this
information.
Remember that R-squared values will be
lower if your DV has fewer categories.

R-squared tends to be very high when
researching country-level data.
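As a rough sketch, R-squared can be computed as one minus the ratio of unexplained variation to total variation in the DV. The observed and predicted values below are invented for illustration and are not output from the "Countries" model.

```python
from statistics import mean

# Invented observed values and model predictions, for illustration only.
observed = [2.0, 3.5, 1.0, 4.5, 3.0]
predicted = [2.4, 3.1, 1.5, 4.0, 3.0]

y_bar = mean(observed)
ss_total = sum((y - y_bar) ** 2 for y in observed)                 # total variation in the DV
ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))  # variation left unexplained

r_squared = 1 - ss_resid / ss_total
print(round(r_squared, 3))
```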
Multivariate Regression






You finished a worksheet, demonstrating you know the
mechanics of a regression.
Now you need to start developing a model of the
world.
You are trying to explain your DV.
You must choose all those IVs that you (or others)
think explain the DV.
Today: how to use nominal IVs.
Two tools to make your model a more accurate
description of the social world
Coefficient


Remember that the unstandardized
coefficient tells us how much a one-unit
increase in the independent variable affects
the dependent variable.
The dependent variable must be (at least)
ordinal.


The change from “strongly agree” to “agree” or
“disagree” to “strongly disagree” causes…
For each additional percentage point increase in
literacy…
Nominal Variables

Makes no sense to use nominal variables this way.



The change from New Brunswick to Ontario or P.E.I. to
Quebec causes !?!?!?!
Being a Liberal rather than a Conservative, or being Green
rather than an NDP supporter?!?!?
But you can say..


The difference between New Brunswick and all other
provinces, OR Ontario and all other provinces, OR
Atlantic provinces and all others…
NDP supporters vs. Liberals AND/OR Conservatives vs.
Liberals, Bloc vs. Liberals AND/OR Others vs. Liberals.
Make Nominal Vars Dichotomous!








Quebec vs. R.O.C.
Alberta vs. all other provinces
Men vs. Women
Africa vs. all other countries
City residents vs. all others
NDP supporters only
Catholics only
Or even…



People over 65 only
University educated vs. non-University educated
Wealthiest countries vs. all others
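A minimal sketch of this kind of recoding: each new variable is 1 for the group of interest and 0 for everyone else. The respondent records and the field names (`province`, `age`, `quebec`, `over65`) are hypothetical.

```python
# Hypothetical survey respondents; field names are assumptions.
respondents = [
    {"province": "Quebec", "age": 70},
    {"province": "Ontario", "age": 34},
    {"province": "Alberta", "age": 66},
]

for r in respondents:
    r["quebec"] = 1 if r["province"] == "Quebec" else 0  # Quebec vs. R.O.C.
    r["over65"] = 1 if r["age"] > 65 else 0              # over 65 vs. everyone else

print([(r["quebec"], r["over65"]) for r in respondents])
```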
Rules for Dummies


“Dummy” Variables are dichotomous.
You can use a dummy variable for each category of a nominal
variable, minus one.


Minimum: use just one dummy


Example. Original variable has five categories (Liberal, Conservative, NDP,
Bloc, & Other)
1= NDP, 0= All others
Maximum: use # of categories – 1 = 4




1= NDP, 0= Liberals; *
1= Tory, 0= Liberals; *
1= Bloc, 0= Liberals; *
1= Other, 0= Liberals;*
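* Each dummy is 0 for everyone outside its category. When # of categories - 1 dummies are created and Liberals are excluded, Liberals are the respondents coded 0 on all four dummies.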
All options in between are also okay!
One remaining category is the reference
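The maximum case (# of categories - 1 dummies) can be sketched as follows. The helper `make_dummies` and the five sample respondents are hypothetical, and Liberals are chosen as the omitted reference only to match the example above.

```python
def make_dummies(values, categories, reference):
    """Return one 0/1 column per category except the reference."""
    kept = [c for c in categories if c != reference]
    return {c: [1 if v == c else 0 for v in values] for c in kept}


parties = ["Liberal", "Conservative", "NDP", "Bloc", "Other"]
# Hypothetical respondents' party identification.
sample = ["NDP", "Liberal", "Bloc", "Conservative", "Liberal"]

dummies = make_dummies(sample, parties, reference="Liberal")
for party, column in dummies.items():
    print(party, column)
```

Five categories give four columns. The two Liberal respondents are 0 in every column, which is how the reference category shows up in the data.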


What is the reference?


Reference category is the baseline.
When you use maximum number of
categories, you are comparing each dummy
to the [omitted] reference.

It is often the most popular category.


Ex. For Canadian partisanship, Liberals are most
popular, so if we have four dummies we often omit the
Liberals and compare NDP-ers, Tories, Separatists and
Others to the Liberals.
Your call as to what you want baseline to be.
What does a dummy do?
[Figure: plot of Y against X, labelling the Constant (Y intercept) and the Dummy's B.]
Since X is zero or one, it is just like adding another constant!
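The "another constant" idea can be sketched with a small prediction function. All three coefficients below are hypothetical, chosen only so the arithmetic is easy to follow, and the continuous IV `x` is an added assumption to show that the shift is the same everywhere.

```python
# Hypothetical coefficients chosen only to illustrate the idea.
CONSTANT = 2.0   # Y intercept when the dummy is 0
B_X = 0.5        # slope on a continuous IV
B_DUMMY = 1.5    # the dummy's B


def predict(x, dummy):
    """Predicted Y for a given x and a 0/1 dummy."""
    return CONSTANT + B_X * x + B_DUMMY * dummy


for x in (0, 10, 20):
    gap = predict(x, 1) - predict(x, 0)
    print(x, predict(x, 0), predict(x, 1), gap)
```

At every value of x the gap between the two groups is the dummy's B, so the group coded 1 simply has an intercept of CONSTANT + B_DUMMY.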
Next



Work on the lab!
You have a write-up to do.
You can use previous results, but you should
really think about your DV and IVs.


You may use dummies in your write-up.
You may need to use dummies (and/or
interactions) in your final paper (when we most
care about the accuracy of your model).