Parameter Estimation, Dummy Variables, & Goodness of Fit
Parameter Estimation,
Dummies, & Model Fit
We know mechanically how to “run a
regression”…but how are the
parameters actually estimated?
How can we handle “categorical”
explanatory (independent)
variables?
What is a measure of “goodness of
fit” of a statistical model to data?
Example: Alien Species
Exotic species cause economic and
ecological damage
Not all countries equally invaded
Want to understand characteristics
of country that make it more likely
to be “invaded”.
Understanding Invasive
Species
Steps to improving our understanding:
1. Generate a set of hypotheses (so they
can be “accepted” or “rejected”)
2. Develop a statistical model. Interpret
hypotheses in context of statistical
model.
3. Collect data. Estimate parameters of
model.
4. Test hypotheses.
2 Hypotheses (in words)
• We’ll measure “invasiveness” as the ratio of alien to native species (article by Dalmazzone).
1. Population density plays a role in a country’s invasiveness.
2. Island nations are more invaded than mainland nations.
Population Density
[Figure: plot of A.N (alien/native ratio, axis 0.0 to 2.0) against Pop.dens (axis 0 to 1200)]
Island vs. Mainland
[Figure: plot of A.N (alien/native ratio, axis 0.0 to 2.0) against Island (0/1 indicator, axis -0.1 to 1.1)]
Variables
Variables:
Dependent: Ratio of the number of alien species to the number of native species in each country.
Independent:
• Island?
• Population Density
• GDP per capita
• Agricultural activity
Computer Minimizes $\sum_i e_i^2$
Remember, OLS finds the coefficients
that minimize the sum of squared
residuals
Graphical representation
Why is this appropriate?
Can show that, under the standard
OLS assumptions, this criterion gives
the most precise (minimum-variance)
linear unbiased estimates.
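As a rough illustration of the least-squares criterion, the sketch below fits a line to synthetic data (not the invasive-species data) with NumPy and compares the sum of squared residuals at the OLS solution with the value at a slightly perturbed coefficient vector. The data, the helper name `ssr`, and the perturbation are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative only, not the invasive-species data set).
n = 50
x = rng.uniform(0, 10, size=n)
y = 1.0 + 0.5 * x + rng.normal(0, 1, size=n)

# Design matrix: a column of ones (intercept) plus the regressor.
X = np.column_stack([np.ones(n), x])


def ssr(beta):
    """Sum of squared residuals for the coefficient vector beta."""
    resid = y - X @ beta
    return float(resid @ resid)


# OLS estimate: the least-squares solution of X @ beta ~ y.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Moving away from the OLS coefficients increases the criterion.
perturbed = beta_hat + np.array([0.1, -0.05])
print("SSR at OLS estimate:     ", ssr(beta_hat))
print("SSR at perturbed vector: ", ssr(perturbed))
```

At the minimum the residuals are orthogonal to every column of the design matrix, which is the condition the normal equations express.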
Dummy Variable
Generally:
Use a “Dummy Variable”. Value = 1 if
country is Island, 0 otherwise.
More generally, if n categories, use n-1
dummies.
E.g. Male/Female; Pre-regulation/Post-regulation;
etc.
E.g. to distinguish between 6
continents, use 5 dummies.
Problem: Lose “degrees of freedom”.
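A minimal sketch of the n-1 rule, assuming a hypothetical list of continent labels (these observations are made up for illustration). One category is treated as the baseline and is absorbed by the intercept; each remaining category gets its own 0/1 column.

```python
import numpy as np

# Hypothetical continent labels for seven observations (illustrative only).
continents = [
    "Africa", "Asia", "Europe", "North America",
    "Oceania", "South America", "Asia",
]

# Sorted unique categories; the first one serves as the baseline.
categories = sorted(set(continents))
baseline, others = categories[0], categories[1:]

# One 0/1 column per non-baseline category: n categories -> n-1 dummies.
dummies = np.array(
    [[1 if c == cat else 0 for cat in others] for c in continents]
)

print("baseline:", baseline)
print("dummy columns:", others)
print("shape:", dummies.shape)
```

Each extra dummy column is one more parameter to estimate, which is the loss of degrees of freedom noted above.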
A Simple Model
A simple linear model looks like this:
$A_i = \beta_1 + \beta_2 ISL_i + \beta_3 P_i + \beta_4 GDP_i + \beta_5 AGR_i + \epsilon_i$
Dummy changes intercept (explain).
Interaction dummy variable?
E.g. Invasions of island nations more strongly
affected by agricultural activity.
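To spell out why the dummy shifts the intercept, the model can be written separately for the two groups. The last line sketches one possible interaction specification; the coefficient name $\beta_6$ and the product term are assumptions for illustration, not part of the original model.

```latex
\begin{align*}
\text{Mainland } (ISL_i = 0):\quad
  & E[A_i] = \beta_1 + \beta_3 P_i + \beta_4 GDP_i + \beta_5 AGR_i \\
\text{Island } (ISL_i = 1):\quad
  & E[A_i] = (\beta_1 + \beta_2) + \beta_3 P_i + \beta_4 GDP_i + \beta_5 AGR_i \\
\text{With interaction:}\quad
  & A_i = \beta_1 + \beta_2 ISL_i + \beta_3 P_i + \beta_4 GDP_i
        + \beta_5 AGR_i + \beta_6 (ISL_i \times AGR_i) + \epsilon_i
\end{align*}
```

In the interaction version the effect of agricultural activity is $\beta_5$ for mainland nations and $\beta_5 + \beta_6$ for island nations, so the dummy changes a slope as well as the intercept.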
Translating our Hypotheses
2 Hypotheses
Hypothesis 1: Population: Focus on $\beta_3$
Hypothesis 2: Island: Focus on $\beta_2$
“Hypothesis Testing”… forthcoming in course.
Parameter Estimates:
               Value  Std.Error  t value  Pr(>|t|)
(Intercept)  -0.0184     0.0859  -0.2141    0.8326
Island        0.0623     0.0821   0.7588    0.4564
Pop.dens      0.0010     0.0002   6.2330    0.0000
GDP           0.0000     0.0000   3.3249    0.0032
Agr          -0.0014     0.0015  -0.9039    0.3763
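The t value column is the estimate divided by its standard error. A small check on the two rows whose reported figures carry enough digits for the comparison; the dictionary layout is just a convenience for this sketch, and small differences are expected because the table values are rounded to four decimals.

```python
# Estimates, standard errors, and t values as reported in the table above
# (all rounded to four decimals).
reported = {
    "(Intercept)": (-0.0184, 0.0859, -0.2141),
    "Island": (0.0623, 0.0821, 0.7588),
}

t_ratios = {}
for name, (value, std_error, t_reported) in reported.items():
    # t statistic for the null hypothesis that the coefficient is zero.
    t_ratios[name] = value / std_error
    print(f"{name}: computed t = {t_ratios[name]:.4f}, reported t = {t_reported:.4f}")
```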
“Goodness of Fit”: $R^2$
“Coefficient of Determination”
$R^2$ = Squared correlation between Y
and OLS prediction of Y
$R^2$ = share of total variation that is
explained by regression, in [0,1]
OLS maximizes $R^2$.
Adding an independent variable cannot decrease $R^2$
Adjusted $R^2$ penalizes for # vars.
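The properties listed above can be checked numerically. The sketch below uses synthetic data (not the invasive-species data) and one common definition of adjusted $R^2$, namely 1 - (1 - R^2)(n - 1)/(n - k - 1) with k regressors excluding the intercept; both the data and that choice of formula are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (illustrative only): two regressors plus noise.
n, k = 40, 2
X_vars = rng.normal(size=(n, k))
y = 2.0 + X_vars @ np.array([1.5, -0.7]) + rng.normal(size=n)

# OLS fit with an intercept.
X = np.column_stack([np.ones(n), X_vars])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

# R^2 as the explained share of total variation.
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

# R^2 as the squared correlation between Y and its OLS prediction.
r2_corr = np.corrcoef(y, y_hat)[0, 1] ** 2

# Adjusted R^2 (one common definition) penalizes the number of regressors.
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Adding an unrelated regressor cannot lower R^2.
X_more = np.column_stack([X, rng.normal(size=n)])
b_more, *_ = np.linalg.lstsq(X_more, y, rcond=None)
r2_more = 1.0 - np.sum((y - X_more @ b_more) ** 2) / ss_tot

print("R^2 (variation share):   ", r2)
print("R^2 (squared correlation):", r2_corr)
print("adjusted R^2:            ", adj_r2)
print("R^2 with extra regressor:", r2_more)
```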
Answers
Island nations are more heavily invaded
(.0623)
Not significant (p=.46)
Population density has impact on
invasions (.001)
Significant (p < .0001)
$R^2$=.80; about 80% of variation in
dependent variable explained by model.
Also, corr(A,Ahat)=.89