G492/G751
Eric Rasmusen
29 September 2009
Statistics and Regression
1
Professor Rasmusen,
Greetings, I hope all is well. I was your student in G492 and G406 in the spring of 2007,
so it has been a while. I wrote my thesis for your class on charter schools, in case
that rings any bells. I am writing because a short series of events prompted me to
contact the BEPP program at Kelley. I graduated last May from Kelley with Honors
through the Mitte Program and with distinction in my class. I thoroughly enjoyed the entire
program, and BEPP offered an outstanding opportunity to couple my business degree with
economics and policy education. I am currently a congressional staffer working in
Washington for Congressman John W. Olver of Massachusetts. Just this morning I was
working on some data for the Congressman, used many of the tools I learned at
Kelley, and impressed the audience in our meeting. Coming out of the meeting, I felt a
great sense of pride that I could use what I learned to “show off” my Kelley education
(most of my colleagues went to small East Coast private schools). Out of curiosity I
visited the BEPP website and noticed a listing of recent and older graduates who have
gone through BEPP. Most of the graduates listed went on to consulting firms and other
private-sector jobs, and I think it could benefit the program to also highlight a student
who went on to work in policy.
I wasn’t certain whom to contact with this thought, so I decided to write to you, my
former professor, to see if you could forward this to whoever is in charge of the
website’s information. I hope the semester has started off well and Bloomington is
warming up quickly!
2
How would you explain how
regressions work?
Here is a fun way.
3
What does it mean to use a
test that rejects the hypothesis
that a coin is fair, at the 5%
significance level?
4
It means that, if the coin really is fair, the test
will falsely reject fairness 5% of the times it is used.
It is NOT the probability that the null
hypothesis is true, given that you
accepted it.
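A short simulation can make the definition concrete. This is a minimal sketch, assuming a test based on 100 flips; the flip count, the random seed, and the function names are illustrative choices, not part of the slides. It finds the widest symmetric rejection region whose exact false-rejection probability under a fair coin is at most 5%, then repeats the fair-coin experiment many times and counts how often the test wrongly rejects fairness.

```python
import random
from math import comb


def exact_size(n: int, k: int) -> float:
    """Probability that a fair coin's head count lands in the rejection
    region {heads <= k} or {heads >= n - k} (the two tails are symmetric)."""
    return 2 * sum(comb(n, j) for j in range(k + 1)) / 2 ** n


def rejection_cutoff(n: int, alpha: float = 0.05) -> int:
    """Largest k whose symmetric rejection region has size at most alpha
    (returns -1 if even k = 0 is too large, i.e. the test never rejects)."""
    best = -1
    for k in range(n // 2):
        if exact_size(n, k) <= alpha:
            best = k
        else:
            break  # the tail probability only grows as k rises
    return best


def simulated_false_rejection_rate(n: int, k: int, trials: int, seed: int = 0) -> float:
    """Share of simulated fair-coin experiments in which the test rejects fairness."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        heads = sum(rng.random() < 0.5 for _ in range(n))
        if heads <= k or heads >= n - k:
            rejections += 1
    return rejections / trials


n = 100
k = rejection_cutoff(n)
print("Reject fairness if heads <=", k, "or heads >=", n - k)
print("Exact false-rejection probability:", round(exact_size(n, k), 4))
print("Simulated false-rejection rate:", simulated_false_rejection_rate(n, k, trials=10_000))
```

Because head counts are whole numbers, no symmetric region has size exactly 5%, so the exact size comes out a little below it; the simulated rate should land close to that exact figure.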
5
Consider testing whether a coin is
fair by flipping it once and rejecting
fairness if heads comes up. If the coin is
in fact fair, that test will falsely reject 50% of the time.
Even a test that rejects fairness only if heads
comes up more than 100 times in 110 throws
will still falsely reject with some (very small)
probability.
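Both tests can be checked exactly. This sketch (the function name is hypothetical) computes the probability that a fair coin produces more than a given number of heads, which is each test's false-rejection probability. It uses exact fractions so the tiny second probability is not lost to rounding.

```python
from fractions import Fraction
from math import comb


def prob_more_than(threshold: int, flips: int) -> Fraction:
    """P(heads > threshold) in `flips` tosses of a fair coin, as an exact fraction."""
    favorable = sum(comb(flips, k) for k in range(threshold + 1, flips + 1))
    return Fraction(favorable, 2 ** flips)


# One flip, reject fairness if heads comes up (heads > 0).
one_flip = prob_more_than(0, 1)
# 110 flips, reject fairness if heads comes up more than 100 times.
strict = prob_more_than(100, 110)

print("One-flip test:", one_flip, "=", float(one_flip))
print("100-of-110 test:", float(strict))
```

The second probability is astronomically small, but it is not zero, which is the slide's point.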
Rice Applet: Confidence Levels
6
KNOW THY DATA!
Careful data management is important to being
persuasive. Anyone can understand a
data-entry mistake, and one such mistake
destroys credibility.
Consider having one member of the team handle
nothing but the data.
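One way to catch data-entry mistakes before an audience does is to run a few mechanical checks on every file. This is a minimal sketch, assuming the data sit in a list of dictionaries with an "id" field and that you can state a plausible range for each variable; the function name, field names, ranges, and records are all hypothetical.

```python
from collections import Counter


def data_entry_warnings(rows, bounds):
    """Flag duplicated ids, missing values, and implausible values.

    rows: list of dicts, one per observation, each with an "id" key.
    bounds: dict mapping a variable name to its (low, high) plausible range.
    """
    warnings = []
    for obs_id, count in Counter(row["id"] for row in rows).items():
        if count > 1:
            warnings.append(f"id {obs_id} appears {count} times")
    for row in rows:
        for var, (low, high) in bounds.items():
            value = row.get(var)
            if value is None:
                warnings.append(f"id {row['id']}: {var} is missing")
            elif not low <= value <= high:
                warnings.append(f"id {row['id']}: {var}={value} outside [{low}, {high}]")
    return warnings


# Hypothetical records: one duplicate, one income typed with an extra digit, one blank.
rows = [
    {"id": "A", "income": 33_000},
    {"id": "B", "income": 510_000},
    {"id": "C", "income": None},
    {"id": "A", "income": 33_000},
]
for warning in data_entry_warnings(rows, {"income": (10_000, 100_000)}):
    print(warning)
```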
7
ETHICS
“… the problem is not that one is always
being asked to step across a well-defined line by unscrupulous lawyers.
Rather, it is that one becomes caught
up in the adversary proceeding itself
and acquires the desire to win. …”
“Continuing to regard oneself as objective,
one can slip little by little from true
objectivity.”
8
An Excel file with
the data is here.
9
An Excel file with
the data is here.
10
An Excel file with
the data is here.
The correlation coefficient is, in absolute value, the square root of the R² for a regression of one
variable on a constant and the other variable; its sign is the sign of the slope.
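A quick numerical check of this identity, using made-up numbers: the code computes the correlation directly and the R² from an OLS regression with a constant, and the two agree once the correlation is squared. The helper names are illustrative. The data slope downward, which shows why the sign has to come from the slope.

```python
from statistics import mean


def ols_simple(x, y):
    """Intercept, slope, and R^2 from regressing y on a constant and x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    total_ss = sum((yi - my) ** 2 for yi in y)
    return intercept, slope, 1 - residual_ss / total_ss


def correlation(x, y):
    """Pearson correlation coefficient of x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


# Made-up numbers, chosen only to illustrate the identity.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [9.0, 7.5, 7.9, 5.1, 4.2, 2.8]

_, slope, r2 = ols_simple(x, y)
r = correlation(x, y)
print("correlation:", round(r, 4))
print("square root of R^2:", round(r2 ** 0.5, 4))
print("slope is negative:", slope < 0)
```

Regressing x on y instead gives the same R², so it does not matter which variable is put on the left.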
11
An Excel file with
the data is here.
If income per capita
rises by xxx, how
much does the
abortion rate rise?
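The question is answered by the slope of a regression of the abortion rate on a constant and income per capita. In the notation below (the symbols are illustrative, not from the slides), y is the abortion rate, x is income per capita, and a rise of Δx in income raises the fitted abortion rate by b times Δx, measured in the abortion rate's units per unit of income.

```latex
\hat{y}_i = a + b\,x_i, \qquad
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
a = \bar{y} - b\,\bar{x}, \qquad
\Delta \hat{y} = b\,\Delta x
```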
12
Principles of Regression Presentation
1. Only present relevant and meaningful numbers.
2. Do not write 1.23423 when rounding to 1.23 will do just
as well. Fewer digits yield greater clarity.
3. Use correlation matrices to show the simple correlations
between important variables.
4. Give summary statistics. Think about which are most
useful. Think about presenting the mean, median, mode,
minimum, maximum, standard deviations, and number of
observations. Do not present all of these; think about which matter.
13
5. Use words for variable names, not computer codes.
6. Present the coefficients, standard errors or t-statistics
(not both), R², and number of observations. Do not
present other statistics (e.g., an F-test for all coefficients
equalling zero) unless you have a reason to.
7. If the left-hand variable (y-variable, dependent variable,
endogenous variable) takes only a few values (e.g., 0
and 1), then use a special technique such as logit or probit;
if it is censored (e.g., many observations piled up at zero), use tobit.
If a right-hand variable (x-variable, independent
variable, exogenous variable) takes only a few values,
that does not create a need to use anything besides
OLS.
14
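To see why a 0-1 left-hand variable calls for something like logit, the sketch below fits both OLS (the linear probability model) and a logit to a small hypothetical data set. The logit is estimated by maximum likelihood with Newton-Raphson steps, a standard method written out by hand here for illustration; statistical packages may use other algorithms, and probit and tobit are not shown. All names and numbers are assumptions.

```python
from math import exp
from statistics import mean


def logistic(z):
    """Logistic CDF: maps any real number into the interval (0, 1)."""
    return 1 / (1 + exp(-z))


def fit_logit(x, y, iterations=25):
    """Maximum-likelihood logit of a 0-1 variable y on a constant and x,
    found with Newton-Raphson steps on the log-likelihood."""
    a, b = 0.0, 0.0  # intercept and slope
    for _ in range(iterations):
        p = [logistic(a + b * xi) for xi in x]
        # Gradient of the log-likelihood with respect to (a, b).
        g_a = sum(yi - pi for yi, pi in zip(y, p))
        g_b = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (minus the Hessian), a symmetric 2x2 matrix.
        w = [pi * (1 - pi) for pi in p]
        h_aa = sum(w)
        h_ab = sum(wi * xi for wi, xi in zip(w, x))
        h_bb = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h_aa * h_bb - h_ab ** 2
        # Newton step: solve (information) * step = gradient.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


def fit_ols(x, y):
    """Intercept and slope from OLS of y on a constant and x."""
    mx, my = mean(x), mean(y)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - slope * mx, slope


# Hypothetical 0-1 outcome that becomes more likely as x rises (no perfect separation).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

logit_a, logit_b = fit_logit(x, y)
ols_a, ols_b = fit_ols(x, y)

# Predictions just beyond the data.
ols_prediction_at_15 = ols_a + ols_b * 15
logit_prediction_at_15 = logistic(logit_a + logit_b * 15)
print("OLS prediction at x = 15:", round(ols_prediction_at_15, 2))
print("Logit probability at x = 15:", round(logit_prediction_at_15, 2))
```

Just beyond the data, the OLS line predicts a "probability" above 1, while the logit prediction stays between 0 and 1.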
Consider using nonlinear specifications,
as illustrated in the Java applet at
http://www.ruf.rice.edu/~lane/stat_sim/transformations/index.html
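The applet's point can also be reproduced numerically. In this sketch, with hypothetical data that rise quickly and then level off, regressing y on the logarithm of x fits much better than regressing y on x itself. The helper name is illustrative; comparing R² across the two fits is fair here only because the left-hand variable is the same in both.

```python
from math import log
from statistics import mean


def r_squared(x, y):
    """R^2 from regressing y on a constant and x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical data lying close to y = 2 + 3 ln(x).
x = [1, 2, 4, 8, 16, 32, 64]
y = [2.1, 4.0, 6.2, 8.3, 10.2, 12.4, 14.5]

linear = r_squared(x, y)
logged = r_squared([log(v) for v in x], y)
print("R^2, y on x:", round(linear, 3))
print("R^2, y on ln(x):", round(logged, 3))
```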
15
Principles of Tables
1. Keep the data-ink ratio (the share of the ink that shows data) high.
2. Leave out dividing lines and boxes unless you have a
good reason for them.
3. Leave off repetitive, useless numbers.
4. Don't set text in all capital letters.
5. Circle or otherwise mark important numbers, in
particular, ones you mention in the text or talk.
6. Make the table self-contained. Don't require the reader to
refer to the text or a previous table. Include the source
and the units of measurement.
7. Number and title every table.
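A minimal plain-text sketch of a table that follows these principles: numbered and titled, word labels with units, rounded numbers, no boxes or dividing lines, and a source line. The function name, column widths, and data are hypothetical; a word processor or spreadsheet can do the same job.

```python
from statistics import mean, median, stdev


def format_summary_table(number, title, variables, source, digits=2):
    """Plain-text summary-statistics table: numbered and titled, word labels
    with units, rounded numbers, no boxes or dividing lines, and a source line."""
    rows = [("Variable", "Mean", "Median", "Std. dev.", "N")]
    for name, values in variables.items():
        rows.append((
            name,
            f"{mean(values):.{digits}f}",
            f"{median(values):.{digits}f}",
            f"{stdev(values):.{digits}f}",
            str(len(values)),
        ))
    label_width = max(len(row[0]) for row in rows)
    lines = [f"Table {number}: {title}", ""]
    for row in rows:
        # Left-align the label; right-align every statistic in a fixed-width column.
        lines.append(row[0].ljust(label_width) + "".join(cell.rjust(11) for cell in row[1:]))
    lines += ["", f"Source: {source}"]
    return "\n".join(lines)


# Hypothetical numbers, for illustration only.
table_text = format_summary_table(
    1,
    "Summary statistics",
    {
        "Income per capita ($1000s)": [31.2, 28.4, 35.9, 40.1, 26.7],
        "Unemployment rate (%)": [5.1, 6.3, 4.2, 3.8, 7.0],
    },
    source="hypothetical data",
)
print(table_text)
```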
16
Uniform Crime Reports: Crime in the United States, Section IV
17
Regression Tables
• Ginger Jin and Phillip Leslie (2003): "The
Effect of Information on Product Quality:
Evidence from Restaurant Hygiene Grade
Cards," Quarterly Journal of Economics,
118(2): 409-451 (May 2003)
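Principle 6 on the earlier slide lists what a regression table needs: coefficients, standard errors or t-statistics (not both), R², and the number of observations. The sketch below computes exactly those for a regression of y on a constant and one x-variable, using the classical (homoskedastic) OLS standard-error formulas; the function name and data are hypothetical, and real work would normally use a statistics package.

```python
from statistics import mean


def simple_regression_table(x, y, digits=2):
    """OLS of y on a constant and x, reporting only coefficients, standard
    errors, R^2, and the number of observations, rounded for presentation."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    residual_ss = sum(e ** 2 for e in residuals)
    sigma2 = residual_ss / (n - 2)  # estimated error variance (two coefficients)
    se_slope = (sigma2 / sxx) ** 0.5
    se_intercept = (sigma2 * (1 / n + mx ** 2 / sxx)) ** 0.5
    r_squared = 1 - residual_ss / sum((yi - my) ** 2 for yi in y)
    return {
        "Constant": (round(intercept, digits), round(se_intercept, digits)),
        "Slope": (round(slope, digits), round(se_slope, digits)),
        "R-squared": round(r_squared, digits),
        "Observations": n,
    }


# Hypothetical data, for illustration only.
table = simple_regression_table([1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 7.8, 10.1, 12.0])
for label in ("Constant", "Slope"):
    coef, se = table[label]
    print(f"{label:<14}{coef:>8}  ({se})")
print(f"{'R-squared':<14}{table['R-squared']:>8}")
print(f"{'Observations':<14}{table['Observations']:>8}")
```

Standard errors appear in parentheses under each coefficient's row label, and nothing else is reported, in line with principle 6.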
18