Transcript Power 14
Goodness of Fit & Contingency Tables
II. Goodness of Fit & Chi Square
Rolling a Fair Die
The Multinomial Distribution
Experiment: 600 Tosses
The Expected Frequencies
Outcome   Probability   Expected Frequency
1         1/6           100
2         1/6           100
3         1/6           100
4         1/6           100
5         1/6           100
6         1/6           100
The Expected Frequencies & Empirical Frequencies
Outcome   Expected Frequency   Empirical Frequency
1         100                  114
2         100                  92
3         100                  84
4         100                  101
5         100                  107
6         100                  107
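Hypothesis Test
Null hypothesis, H0: the distribution is multinomial with probability 1/6 for each face (the die is fair)
Statistic: $\sum_j (O_j - E_j)^2 / E_j$, the sum over outcomes of observed minus expected, squared, divided by expected
Set the Type I error at 5%, for example
The statistic is approximately chi-square distributed, with 6 - 1 = 5 degrees of freedom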
One throw, side one comes up: multinomial distribution
$$P(n_1, n_2, \dots, n_6) = \frac{n!}{\prod_{j=1}^{6} n_j!} \prod_{j=1}^{6} p_j^{n_j}$$
$$P(n_1=1, n_2=0, n_3=0, n_4=0, n_5=0, n_6=0) = \frac{1!}{1!\,0!\,0!\,0!\,0!\,0!}\,(1/6)^1 (1/6)^0 (1/6)^0 (1/6)^0 (1/6)^0 (1/6)^0 = 1/6$$
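A minimal Python sketch of the multinomial probability above. `multinomial_pmf` is a hypothetical helper name, and the probabilities are assumed to sum to one (the function does not check this).

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(n_1, ..., n_k) = n! / (n_1! ... n_k!) * p_1^n_1 * ... * p_k^n_k."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)
    # The multinomial coefficient is always an integer, so integer division is exact
    coefficient = factorial(n) // prod(factorial(c) for c in counts)
    return coefficient * prod(p ** c for p, c in zip(probs, counts))

fair_die = [1 / 6] * 6
# One throw in which side one comes up
print(multinomial_pmf([1, 0, 0, 0, 0, 0], fair_die))
```

The same function covers any number of throws, e.g. `multinomial_pmf([1, 1, 0, 0, 0, 0], fair_die)` is the probability that two throws show one 1 and one 2 in either order.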
Chi Square: $\chi^2 = \sum_j (O_j - E_j)^2 / E_j = 6.15$
Face   Observed, O_j   Expected, E_j   O_j - E_j   (O_j - E_j)^2 / E_j
1      114             100             14          196/100 = 1.96
2      92              100             -8          64/100 = 0.64
3      84              100             -16         256/100 = 2.56
4      101             100             1           1/100 = 0.01
5      107             100             7           49/100 = 0.49
6      107             100             7           49/100 = 0.49
                                                   Sum = 6.15
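The table's arithmetic can be reproduced in a few lines of Python. This sketch assumes, as on the slide, an expected count of 100 per face; `chi_square_statistic` is an illustrative name, not a library function.

```python
def chi_square_statistic(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O_j - E_j)^2 / E_j."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [114, 92, 84, 101, 107, 107]   # faces 1..6, from the table above
expected = [100] * 6                      # 600 tosses * 1/6, as on the slide
stat = chi_square_statistic(observed, expected)
print(round(stat, 2))  # 6.15
```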
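[Figure: Chi Square Density for 5 degrees of freedom, with the upper 5% tail shaded beyond the critical value 11.07]
Since 6.15 < 11.07, we cannot reject the null hypothesis that the die is fair at the 5% level.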
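The critical value and the p-value can be checked numerically. In practice one would likely call `scipy.stats.chi2.sf`; to keep this sketch dependency-free it instead uses the closed-form chi-square tail formulas for integer degrees of freedom (an implementation choice, not the lecture's method). `chi2_sf` is a hypothetical name.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: e^(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * e^(-x/2) * sum_{j=1}^{(df-1)/2} x^(j - 1/2) / (1*3*...*(2j-1))
    term, total = math.sqrt(x), 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total

# Die example: 6 faces -> 5 degrees of freedom
print(round(chi2_sf(11.07, 5), 3))  # about 0.05: 11.07 is the 5% critical value
print(round(chi2_sf(6.15, 5), 2))   # p-value of the observed statistic, about 0.29
```

A p-value well above 0.05 is the same conclusion as comparing 6.15 with 11.07.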
Contingency Table Analysis
Tests for Association vs. Independence for Qualitative Variables
Does Consumer Knowledge Affect Purchases?
Frost-free refrigerators use more electricity.
                          Purchase
                          Frost Free   Not Frost Free   Totals
Consumer Informed
Consumer Not Informed
Totals
Marginal Counts
                          Purchase
                          Frost Free   Not Frost Free   Totals
Consumer Informed                                       540
Consumer Not Informed                                   180
Totals                    432          288              720
Marginal Distributions, f(x) & f(y)
                          Purchase
                          Frost Free   Not Frost Free   Totals
Consumer Informed                                       0.75
Consumer Not Informed                                   0.25
Totals                    0.6          0.4              1
Joint Distribution Under Independence
f(x,y) = f(x)*f(y)
                          Purchase
                          Frost Free   Not Frost Free   Totals
Consumer Informed         0.45         0.3              0.75
Consumer Not Informed     0.15         0.1              0.25
Totals                    0.6          0.4              1
Expected Cell Frequencies Under Independence
Each cell is 720 × f(x,y); for example, 720 × 0.45 = 324.
                          Purchase
                          Frost Free   Not Frost Free   Totals
Consumer Informed         324          216              540
Consumer Not Informed     108          72               180
Totals                    432          288              720
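A sketch of the expected-count rule: under independence each cell's expected count is the grand total times f(x)·f(y), which equals (row total × column total) / grand total. The helper name `expected_counts` is hypothetical.

```python
def expected_counts(row_totals, col_totals):
    """E_ij = (row total i) * (column total j) / grand total, i.e. N * f(x) * f(y)."""
    grand_total = sum(row_totals)
    if grand_total != sum(col_totals):
        raise ValueError("row and column totals must sum to the same grand total")
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

rows = [540, 180]   # consumer informed, consumer not informed
cols = [432, 288]   # frost free, not frost free
print(expected_counts(rows, cols))  # [[324.0, 216.0], [108.0, 72.0]]
```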
Observed Cell Counts
                          Purchase
                          Frost Free   Not Frost Free   Totals
Consumer Informed         314          226              540
Consumer Not Informed     118          62               180
Totals                    432          288              720
Contribution to Chi Square: (Observed - Expected)^2 / Expected
                          Purchase
                          Frost Free   Not Frost Free
Consumer Informed         0.31         0.46
Consumer Not Informed     0.93         1.39
Upper left cell: (314 - 324)^2 / 324 = 100/324 = 0.31
Chi Square = 0.31 + 0.46 + 0.93 + 1.39 = 3.09
Degrees of freedom for an m × n table: (m - 1)*(n - 1) = 1*1 = 1
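The whole contingency-table test can be sketched as follows. The function name is hypothetical, and the p-value line relies on the identity P(X > x) = erfc(√(x/2)), which holds only for one degree of freedom.

```python
import math

def contingency_chi_square(observed):
    """Pearson chi-square and degrees of freedom for an m x n table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected count under independence
            stat += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df

observed = [[314, 226],   # consumer informed: frost free, not frost free
            [118, 62]]    # consumer not informed: frost free, not frost free
stat, df = contingency_chi_square(observed)
# Valid for df = 1 only: P(X > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(stat / 2))
print(round(stat, 2), df)   # 3.09 1
print(round(p_value, 3))
```

The p-value comes out a little under 0.08, above the 5% level.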
[Figure 4: Chi-Square Density, One Degree of Freedom; x-axis: Chi-Square Variable; the upper 5% tail is shaded beyond the 5% critical value of 3.84]
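Conclusion
Since 3.09 is below the 5% critical value for one degree of freedom (3.84), we cannot reject independence: there is no significant association between consumer knowledge about electricity use and consumer choice of a frost-free refrigerator.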
Using Goodness of Fit to Choose Between Competing Probability Models
Men on base when a home run is hit
Men on base when a home run is hit
# on base   0       1       2       3       Sum
Observed    421     227     96      21      765
Fraction    0.550   0.298   0.125   0.027   1
Conjecture
The distribution is binomial, with n = 3 trials (at most three men can be on base)
Average # of men on base
#          0       1       2       3
Fraction   0.550   0.298   0.125   0.027
Product    0       0.298   0.250   0.081
Sum of products = n*p = 0.298 + 0.250 + 0.081 ≈ 0.63
$\hat{p} = \widehat{np}/n = 0.63/3 = 0.21$
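A short sketch of the same estimate computed from the raw counts rather than the rounded fractions, so the mean comes out as 482/765 ≈ 0.630 instead of 0.629; variable names are illustrative.

```python
counts = {0: 421, 1: 227, 2: 96, 3: 21}   # men on base -> number of home runs
total = sum(counts.values())               # 765
mean_on_base = sum(k * c for k, c in counts.items()) / total   # estimate of n*p
n_trials = 3                                # at most three men can be on base
p_hat = mean_on_base / n_trials
print(total, round(mean_on_base, 2), round(p_hat, 2))  # 765 0.63 0.21
```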
Using the binomial
k = men on base, n = # of trials = 3
$P(k=0) = \frac{3!}{0!\,3!}(0.21)^0(0.79)^3 = 0.493$
$P(k=1) = \frac{3!}{1!\,2!}(0.21)^1(0.79)^2 = 0.393$
$P(k=2) = \frac{3!}{2!\,1!}(0.21)^2(0.79)^1 = 0.105$
$P(k=3) = \frac{3!}{3!\,0!}(0.21)^3(0.79)^0 = 0.009$
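These binomial probabilities can be checked with Python's `math.comb` (Python 3.8+); `binomial_pmf` is a hypothetical helper name.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(K = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

p_hat = 0.21
probs = [binomial_pmf(k, 3, p_hat) for k in range(4)]
print([round(q, 3) for q in probs])  # [0.493, 0.393, 0.105, 0.009]
```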
Assuming the binomial
The probability of zero men on base is 0.493, and the total number of observations is 765, so the expected number of observations for zero men on base is 0.493*765 = 377.1.
Goodness of Fit
#                      0       1        2       3       Sum
Observed               421     227      96      21      765
Binomial, E_j          377.1   300.6    80.3    6.9     764.9
O_j - E_j              43.9    -73.6    15.7    14.1
(O_j - E_j)^2 / E_j    5.1     18.0     3.1     28.8    55.0
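A sketch of the binomial goodness-of-fit calculation. It uses unrounded probabilities, so the expected counts and the statistic (roughly 54) differ slightly from the table, which multiplies rounded probabilities; either way the statistic is far above the 5% critical value of 7.81.

```python
from math import comb

observed = [421, 227, 96, 21]   # home runs with 0, 1, 2, 3 men on base
total = sum(observed)           # 765
p_hat = 0.21                    # estimated from the average number of men on base
expected = [total * comb(3, k) * p_hat ** k * (1 - p_hat) ** (3 - k) for k in range(4)]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print([round(e, 1) for e in expected])
print(round(chi_sq, 1), chi_sq > 7.81)   # compare with the 3-df 5% critical value
```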
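[Figure: Chi Square density, 3 degrees of freedom, with the upper 5% tail shaded beyond the critical value 7.81]
The binomial chi-square statistic, about 55, is far above 7.81, so the binomial model is rejected at the 5% level.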
Conjecture: Poisson, with mean m = np = 0.63
$P(k) = e^{-m} m^k / k!$
$P(k=0) = e^{-0.63}(0.63)^0/0! = 0.5326$
$P(k=1) = e^{-0.63}(0.63)^1/1! = 0.3355$
$P(k=2) = e^{-0.63}(0.63)^2/2! = 0.1057$
$P(k=3) = 1 - P(k=2) - P(k=1) - P(k=0) = 0.0262$, so this last cell absorbs the Poisson probability of 3 or more
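A minimal sketch of the Poisson cell probabilities, with the last cell absorbing P(k ≥ 3) exactly as the slide's P(k=3) = 1 - P(k=2) - P(k=1) - P(k=0) does; the helper name is hypothetical.

```python
import math

def poisson_probs_with_tail(m, k_max):
    """P(K = k) for k < k_max, plus P(K >= k_max) lumped into the last cell."""
    probs = [math.exp(-m) * m ** k / math.factorial(k) for k in range(k_max)]
    probs.append(1.0 - sum(probs))    # last cell: k_max or more
    return probs

probs = poisson_probs_with_tail(0.63, 3)
print([round(q, 4) for q in probs])  # [0.5326, 0.3355, 0.1057, 0.0262]
```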
Goodness of Fit
#                      0       1        2       3       Sum
Observed               421     227      96      21      765
Poisson, E_j           407.4   256.7    80.9    20.0    765
(O_j - E_j)^2 / E_j    0.454   3.44     2.82    0.05    6.76
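The Poisson goodness-of-fit statistic can be reproduced the same way; unrounded probabilities give about 6.77 versus 6.76 in the table. The slide compares it with 3 degrees of freedom. A common convention also subtracts one degree of freedom for each parameter estimated from the data (here m), which would leave 2 degrees of freedom and a 5% critical value of 5.99, so the degrees-of-freedom choice is worth stating when the test is reported.

```python
import math

observed = [421, 227, 96, 21]   # home runs with 0, 1, 2, 3 men on base
total = sum(observed)
m = 0.63                        # Poisson mean, set equal to the estimated n*p
probs = [math.exp(-m) * m ** k / math.factorial(k) for k in range(3)]
probs.append(1.0 - sum(probs))  # last cell is 3 or more
expected = [total * q for q in probs]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
critical_3df = 7.81             # 5% critical value used on the slide
print(round(chi_sq, 2), chi_sq < critical_3df)
```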
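[Figure: Chi Square density, 3 degrees of freedom, with the upper 5% tail shaded beyond the critical value 7.81]
The Poisson chi-square statistic, 6.76, is below 7.81, so the Poisson model is not rejected at the 5% level; it fits much better than the binomial.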
Likelihood Functions
Review: OLS Likelihood
Proceed in a similar fashion for the probit
Likelihood function
The joint density of the estimated residuals can be written as:
$$g(\hat{e}_0, \hat{e}_1, \hat{e}_2, \dots, \hat{e}_{n-1})$$
If the sample of observations on the dependent variable, y, and the independent variable, x, is random, then the observations are independent of one another. If the errors are also identically distributed with density f, i.e. i.i.d., then
Likelihood function (continued)
If i.i.d., then
$$g(\hat{e}_0, \hat{e}_1, \dots, \hat{e}_{n-1}) = f(\hat{e}_0)\, f(\hat{e}_1) \cdots f(\hat{e}_{n-1})$$
If the residuals are normally distributed, $\hat{e}_i \sim N(0, \sigma^2)$:
$$f(\hat{e}_i) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left[(\hat{e}_i - 0)/\sigma\right]^2}$$
This is one of the assumptions of linear regression: the errors are i.i.d. normal. Then the joint distribution, or likelihood function, L, can be written as:
Likelihood function
$$L = g(\hat{e}_0, \hat{e}_1, \dots, \hat{e}_{n-1}) = \prod_{i=0}^{n-1} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left[(\hat{e}_i - 0)/\sigma\right]^2}$$
$$L = (1/\sigma^2)^{n/2} \cdot (1/2\pi)^{n/2} \cdot e^{-\frac{1}{2}\sum_{i=0}^{n-1}\left[\hat{e}_i/\sigma\right]^2}$$
Taking natural logarithms of both sides: the logarithm is a monotonically increasing function, so if ln L is maximized, so is L:
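Log-Likelihood
$$\ln L = -(n/2)\ln[\sigma^2] - (n/2)\ln(2\pi) - \frac{1}{2\sigma^2}\sum_{i=0}^{n-1}\hat{e}_i^2$$
$$\ln L = -(n/2)\ln[\sigma^2] - (n/2)\ln(2\pi) - \frac{1}{2\sigma^2}\sum_{i=0}^{n-1}\left[y_i - \hat{a} - \hat{b}\, x_i\right]^2$$
Only the last term depends on $\hat{a}$ and $\hat{b}$, so maximizing ln L with respect to them is the same as minimizing the sum of squared residuals. Setting the derivatives of ln L with respect to $\hat{a}$ and $\hat{b}$ to zero therefore yields the same estimators for the parameters a and b as ordinary least squares; the difference is that we now assume the errors are normally distributed.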
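A small numerical check of this claim on hypothetical data (not from the lecture): the closed-form OLS coefficients are plugged into ln L, and nudging either coefficient away from them lowers ln L. Function names are illustrative.

```python
import math

def log_likelihood(a, b, sigma2, xs, ys):
    """ln L for y_i = a + b*x_i + e_i with i.i.d. e_i ~ N(0, sigma2)."""
    n = len(xs)
    ssr = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return -(n / 2) * math.log(sigma2) - (n / 2) * math.log(2 * math.pi) - ssr / (2 * sigma2)

def ols(xs, ys):
    """Closed-form least-squares estimates (a_hat, b_hat)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b_hat = sxy / sxx
    return y_bar - b_hat * x_bar, b_hat

# Hypothetical data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a_hat, b_hat = ols(xs, ys)
# Maximum-likelihood estimate of sigma^2: SSR / n at the OLS coefficients
sigma2_hat = sum((y - a_hat - b_hat * x) ** 2 for x, y in zip(xs, ys)) / len(xs)
best = log_likelihood(a_hat, b_hat, sigma2_hat, xs, ys)

# Nudging either coefficient away from OLS lowers ln L
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    print(da, db, log_likelihood(a_hat + da, b_hat + db, sigma2_hat, xs, ys) < best)
```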
Probit
Example: expenditures on the lottery as a % of household income
$lottery_i = a + b \cdot income_i + e_i$
If $lottery_i > 0$, i.e. $a + b \cdot income_i + e_i > 0$, then $Bern_i$, the yes-no indicator variable, is equal to one and $e_i > -a - b \cdot income_i$.
This determines a threshold for observation i in the distribution of the error $e_i$.
Assume $e_i \sim N(0, \sigma^2)$.
Density Function for the Standardized Normal Variate
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\left[(z - 0)/1\right]^2}$$
Threshold: $(e_i - 0)/\sigma > -(a + b \cdot income_i - 0)/\sigma$
[Figure: standard normal density plotted from -5 to 5 standard deviations, with the threshold for observation i marked]
The area above the threshold is the probability of playing the lottery for observation i, $P_i^{yes}$; the area below the threshold is $P_i^{no}$ for observation i.
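Probit
Likelihood function for the observed sample
$$LIK = \frac{n!}{n_{yes}!\, n_{no}!} \prod_{Bern=0} P^{no} \prod_{Bern=1} P^{yes}$$
$$LIK = \frac{n!}{n_{yes}!\, n_{no}!} \prod_{i=1}^{n} \left(P_i^{no}\right)^{1 - Bern_i} \left(P_i^{yes}\right)^{Bern_i}$$
Log likelihood:
$$\ln LIK = \ln\left[\frac{n!}{n_{yes}!\, n_{no}!}\right] + \sum_{i=1}^{n}\left[(1 - Bern_i)\ln P_i^{no} + Bern_i \ln P_i^{yes}\right]$$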
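A sketch of the log likelihood above for given probabilities; the indicators and probabilities are hypothetical values. `math.lgamma(n + 1)` is used for ln n! to avoid large factorials. The combinatorial term does not involve a or b, so it does not change where the maximum lies.

```python
import math

def probit_log_likelihood(bern, p_yes):
    """ln LIK = ln[n!/(n_yes! n_no!)] + sum_i [(1 - Bern_i) ln P_i^no + Bern_i ln P_i^yes]."""
    n = len(bern)
    n_yes = sum(bern)
    n_no = n - n_yes
    log_coef = math.lgamma(n + 1) - math.lgamma(n_yes + 1) - math.lgamma(n_no + 1)
    total = 0.0
    for b, p in zip(bern, p_yes):
        # Only the term selected by Bern_i contributes, so log(0) is never evaluated needlessly
        total += math.log(p) if b == 1 else math.log(1.0 - p)
    return log_coef + total

# Hypothetical indicators and model probabilities for four households
bern = [1, 0, 1, 1]
p_yes = [0.7, 0.4, 0.6, 0.8]
print(probit_log_likelihood(bern, p_yes))
```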
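$$P_i^{no} = \int_{-\infty}^{-(a + b \cdot income_i)} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left([e_i - 0]/\sigma\right)^2}\, de_i$$
$$P_i^{yes} = \int_{-(a + b \cdot income_i)}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left([e_i - 0]/\sigma\right)^2}\, de_i$$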
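A sketch of these two integrals via the standard normal CDF, Φ(z) = ½[1 + erf(z/√2)]: standardizing e_i by σ gives P_i^no = Φ(-(a + b·income_i)/σ) and P_i^yes = 1 - P_i^no. The parameter values a, b, σ below are hypothetical, and the function names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z), written with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_probabilities(a, b, sigma, income):
    """Return (P_no, P_yes) for one observation when e_i ~ N(0, sigma^2).

    Playing (Bern = 1) requires e_i > -(a + b*income), so
    P_no = Phi(-(a + b*income)/sigma) and P_yes = 1 - P_no.
    """
    threshold = -(a + b * income) / sigma   # standardized threshold for observation i
    p_no = normal_cdf(threshold)
    return p_no, 1.0 - p_no

# Hypothetical parameter values, chosen only for illustration
a, b, sigma = -1.0, 0.02, 1.0
for income in (25.0, 50.0, 75.0):
    p_no, p_yes = probit_probabilities(a, b, sigma, income)
    print(income, round(p_no, 3), round(p_yes, 3))
```

With these values the threshold is zero at an income of 50, so both probabilities are 0.5 there, and P_yes rises with income because b > 0.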
Probit
Substituting these expressions for $P^{no}$ and $P^{yes}$ into the ln likelihood function gives the complete expression.
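Written out with Φ, the standard normal CDF (a notational choice, not on the slide), the substitution gives the following; it relies only on the normal-error assumption and the change of variable z = e_i/σ.

```latex
\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt, \qquad
P_i^{no} = \Phi\!\left(-\frac{a + b \cdot income_i}{\sigma}\right), \qquad
P_i^{yes} = 1 - P_i^{no} = \Phi\!\left(\frac{a + b \cdot income_i}{\sigma}\right)

\ln LIK = \ln\left[\frac{n!}{n_{yes}!\, n_{no}!}\right]
  + \sum_{i=1}^{n}\left[(1 - Bern_i)\ln \Phi\!\left(-\frac{a + b \cdot income_i}{\sigma}\right)
  + Bern_i \ln \Phi\!\left(\frac{a + b \cdot income_i}{\sigma}\right)\right]
```

Because a, b and σ enter only through (a + b·income_i)/σ, only the ratios a/σ and b/σ can be estimated from the data, which is why σ is commonly normalized to 1 in probit estimation.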
Outline
I. Projects
II. Goodness of Fit & Chi Square
III. Contingency Tables
Part I: Projects
Teams
Assignments
Presentations
Data Sources
Grades
Team One:
– Project choice
– Data Retrieval
– Statistical Analysis
– PowerPoint Presentation
– Executive Summary
– Technical Appendix
– Graphics (Excel, EViews, other)
Assignments
1. Project choice: Markus Ansmann
2. Data Retrieval: Theodore Ehlert
3. Statistical Analysis: David Sheehan
4. PowerPoint Presentation: Qun Luo
5. Executive Summary: Steven Comstock
6. Technical Appendix: Alan Weinberg
7. Graphics: Gregory Adams
PowerPoint Presentations: Member 4
1. Introduction: Members 1, 2, 3
– What
– Why
– How
2. Executive Summary: Member 5
3. Exploratory Data Analysis: Members 3, 7
4. Descriptive Statistics: Members 3, 7
5. Statistical Analysis: Member 3
6. Conclusions: Members 3 & 5
7. Technical Appendix: Table of Contents, Member 6
Executive Summary and Technical Appendix
I. Your report should have an executive summary of one to one and a half pages that summarizes your findings in words for a nontechnical reader. It should explain the problem being examined from an economic perspective, i.e. it should motivate the reader's interest in the issue. Your report should explain, in simple language, how you are investigating the issue and why you are approaching the problem in this particular fashion. Your executive summary should explain the economic importance of your findings.
You can attach the technical details of your findings as an appendix.
Grades
Component                    A    B    C
Introduction
Executive Summary
Exploratory Data Analysis
Descriptive Statistics
Statistical Analysis
Conclusions
Technical Appendix
Graphics
Overall Project
Data Sources
FRED: Federal Reserve Bank of St. Louis, http://research.stlouisfed.org/fred/
– Business/Fiscal
  Index of Consumer Sentiment, Monthly (1952:11)
  Light Weight Vehicle Sales, Auto and Light Truck, Monthly (1976:01)
Economagic, http://www.economagic.com/
U.S. Dept. of Commerce, http://www.commerce.gov/
– Population
– Economic Analysis, http://www.bea.gov/
Data Sources (Cont.)
Bureau of Labor Statistics, http://stats.bls.gov/
California Dept. of Finance, http://www.dof.ca.gov/