Transcript Power 14

Power 14: Goodness of Fit & Contingency Tables

II. Goodness of Fit & Chi Square
• Rolling a Fair Die
• The Multinomial Distribution
• Experiment: 600 Tosses

The Expected Frequencies
Outcome   Probability   Expected Frequency
1         1/6           100
2         1/6           100
3         1/6           100
4         1/6           100
5         1/6           100
6         1/6           100

The Expected Frequencies & Empirical Frequencies
Outcome   Expected Frequency   Empirical Frequency
1         100                  114
2         100                  92
3         100                  84
4         100                  101
5         100                  107
6         100                  107

Hypothesis Test
• Null H0: Distribution is Multinomial
• Statistic: $\sum_i (O_i - E_i)^2 / E_i$, i.e. observed minus expected, squared, divided by expected, summed over the outcomes
• Set Type I Error at 5%, for example
• Distribution of Statistic is Chi Square
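One Throw, side one comes up: multinomial distribution
With n throws in total, n_j the number of times face j comes up, and p_j the probability of face j:
$P(n_1, n_2, \ldots, n_6) = \frac{n!}{\prod_{j=1}^{6} n_j!} \prod_{j=1}^{6} p_j^{\,n_j}$
$P(n_1=1, n_2=0, n_3=0, n_4=0, n_5=0, n_6=0) = \frac{1!}{1!\,0!\,0!\,0!\,0!\,0!} (1/6)^1 (1/6)^0 (1/6)^0 (1/6)^0 (1/6)^0 (1/6)^0 = 1/6$
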
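The multinomial formula can be checked numerically. The following is a minimal sketch, not part of the original slides; the function name `multinomial_pmf` is an assumption made for illustration, and exact fractions are used so that the one-throw case returns 1/6 without rounding.

```python
from fractions import Fraction
from math import factorial

def multinomial_pmf(counts, probs):
    """P(n_1, ..., n_k) = n! / (n_1! ... n_k!) * prod_j p_j ** n_j."""
    n = sum(counts)
    coefficient = factorial(n)
    for n_j in counts:
        coefficient //= factorial(n_j)  # each successive division is exact
    probability = Fraction(coefficient)
    for n_j, p_j in zip(counts, probs):
        probability *= Fraction(p_j) ** n_j
    return probability

fair_die = [Fraction(1, 6)] * 6

# One throw, side one comes up.
one_throw = multinomial_pmf([1, 0, 0, 0, 0, 0], fair_die)
print(one_throw)
```

For a fair die every single-throw outcome has the same probability, so the call above reproduces the hand calculation on the slide.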
Chi Square: $\chi^2 = \sum_j (O_j - E_j)^2 / E_j = 6.15$
Face   Observed, Oj   Expected, Ej   Oj - Ej   (Oj - Ej)^2/Ej
1      114            100            14        196/100 = 1.96
2      92             100            -8        64/100 = 0.64
3      84             100            -16       256/100 = 2.56
4      101            100            1         1/100 = 0.01
5      107            100            7         49/100 = 0.49
6      107            100            7         49/100 = 0.49
                                               Sum = 6.15

[Figure: Chi Square Density for 5 degrees of freedom. Horizontal axis: CHI; vertical axis: DENSITY. The upper 5% tail lies to the right of the critical value 11.07.]

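The die test can be reproduced in a few lines. This is a minimal sketch rather than the lecture's own code: the function name `chi_square_statistic` is an assumption, the observed counts are those used in the chi-square table, and the critical value 11.07 is copied from the figure rather than computed.

```python
observed = [114, 92, 84, 101, 107, 107]  # counts from the chi-square table
expected = [100] * 6                     # expected frequency per face, as in the slides

CRITICAL_5PCT_DF5 = 11.07  # 5% critical value for 5 degrees of freedom (from the figure)

def chi_square_statistic(observed, expected):
    """Sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

statistic = chi_square_statistic(observed, expected)
reject_null = statistic > CRITICAL_5PCT_DF5
print(f"chi-square = {statistic:.2f}, reject H0: {reject_null}")
```

Because the statistic falls below the critical value, the hypothesis of a fair die is not rejected at the 5% level.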
Contingency Table Analysis
• Tests for Association vs. Independence for Qualitative Variables

Does Consumer Knowledge Affect Purchases?
Frost Free Refrigerators Use More Electricity
Purchase                Frost Free   Not Frost Free   Totals
Consumer Informed
Consumer Not Informed
Totals

Marginal Counts
Purchase                Frost Free   Not Frost Free   Totals
Consumer Informed                                     540
Consumer Not Informed                                 180
Totals                  432          288              720

Marginal Distributions, f(x) & f(y)
Purchase                Frost Free   Not Frost Free   Totals
Consumer Informed                                     0.75
Consumer Not Informed                                 0.25
Totals                  0.6          0.4              1

Joint Distribution Under Independence
f(x,y) = f(x)*f(y)
Purchase                Frost Free   Not Frost Free   Totals
Consumer Informed       0.45         0.3              0.75
Consumer Not Informed   0.15         0.1              0.25
Totals                  0.6          0.4              1

Expected Cell Frequencies Under Independence
Purchase                Frost Free   Not Frost Free   Totals
Consumer Informed       324          216              540
Consumer Not Informed   108          72               180
Totals                  432          288              720

Observed Cell Counts
Purchase                Frost Free   Not Frost Free   Totals
Consumer Informed       314          226
Consumer Not Informed   118          62
Totals

Contribution to Chi Square: (Observed - Expected)^2/Expected
Purchase                Frost Free   Not Frost Free   Totals
Consumer Informed       0.31         0.46
Consumer Not Informed   0.93         1.39
Totals
Upper Left Cell: (314 - 324)^2/324 = 100/324 = 0.31
Chi Square = 0.31 + 0.93 + 0.46 + 1.39 = 3.09
(m-1)*(n-1) = 1*1 = 1 degree of freedom

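The contingency-table steps (marginal totals, expected counts under independence, cell contributions, degrees of freedom) can be sketched as follows. This is an illustrative sketch, not the lecture's code; the variable names are assumptions, and only the four observed cell counts are taken from the slides.

```python
# Rows: consumer informed, consumer not informed.
# Columns: frost free, not frost free.
observed = [[314, 226],
            [118, 62]]

row_totals = [sum(row) for row in observed]        # 540, 180
col_totals = [sum(col) for col in zip(*observed)]  # 432, 288
grand_total = sum(row_totals)                      # 720

# Under independence f(x, y) = f(x) * f(y), so the expected count in a cell
# is (row total * column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

contributions = [[(o - e) ** 2 / e for o, e in zip(o_row, e_row)]
                 for o_row, e_row in zip(observed, expected)]
statistic = sum(sum(row) for row in contributions)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected)
print(f"chi-square = {statistic:.2f} with {degrees_of_freedom} degree(s) of freedom")
```

The same code works for any m-by-n table of counts, since the degrees of freedom are computed as (m-1)*(n-1).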
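[Figure 4: Chi-Square Density, One Degree of Freedom. Horizontal axis: Chi-Square Variable; vertical axis: Density. The figure marks the upper tail beyond 5.02. For one degree of freedom, 5.02 is the 2.5% critical value; the 5% critical value is 3.84.]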
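Conclusion
• No significant association between consumer knowledge about electricity use and consumer choice of a frost-free refrigerator: the chi-square statistic of 3.09 falls below the 5% critical value, so independence is not rejected.
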
Using Goodness of Fit to Choose Between Competing Probability Models
• Men on base when a home run is hit

Men on base when a home run is hit
#          0       1       2       3       Sum
Observed   421     227     96      21      765
Fraction   0.550   0.298   0.125   0.027   1

Conjecture
• Distribution is binomial

Average # of men on base
#          0       1       2       3
fraction   0.550   0.298   0.125   0.027
product    0       0.298   0.250   0.081
Sum of products = n*p = 0.298 + 0.250 + 0.081 = 0.629 ≈ 0.63
$\hat{p} = n\hat{p}/n = 0.63/3 = 0.21$

Using the binomial
k = men on base, n = # of trials
• P(k=0) = [3!/(0!3!)] (0.21)^0 (0.79)^3 = 0.493
• P(k=1) = [3!/(1!2!)] (0.21)^1 (0.79)^2 = 0.393
• P(k=2) = [3!/(2!1!)] (0.21)^2 (0.79)^1 = 0.105
• P(k=3) = [3!/(3!0!)] (0.21)^3 (0.79)^0 = 0.009

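The estimate of p and the four binomial probabilities can be reproduced as below. This is a minimal sketch under the slides' own assumptions (three trials, p estimated from the sample mean and rounded to 0.21); the variable names are illustrative and not from the lecture.

```python
from math import comb

fractions = [0.550, 0.298, 0.125, 0.027]  # observed shares for 0, 1, 2, 3 men on base
n_trials = 3                              # at most three men can be on base

# Sample mean = sum of k * fraction(k), which estimates n * p.
mean_men_on_base = sum(k * share for k, share in enumerate(fractions))
p_hat = round(mean_men_on_base / n_trials, 2)  # the slides round to 0.21

binomial_probs = [comb(n_trials, k) * p_hat ** k * (1 - p_hat) ** (n_trials - k)
                  for k in range(n_trials + 1)]

for k, prob in enumerate(binomial_probs):
    print(f"P(k={k}) = {prob:.3f}")
```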
Assuming the binomial
• The probability of zero men on base is 0.493
• the total number of observations is 765
• so the expected number of observations for zero men on base is 0.493*765 = 377.1

Goodness of Fit
#                  0       1       2      3      Sum
Observed           421     227     96     21     765
binomial           377.1   300.6   80.3   6.9    764.9
(Oj - Ej)          43.9    -73.6   15.7   14.1
(Oj - Ej)^2/Ej     5.1     18.0    3.1    28.8   55.0

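The binomial goodness-of-fit statistic can be recomputed directly from the observed counts and the rounded binomial probabilities. This is an illustrative sketch, not the lecture's code; the names are assumptions, and the critical value 7.81 (5%, 3 degrees of freedom) is the one the lecture uses rather than a computed quantity. Because the expected counts are not rounded to one decimal here, the total differs slightly from a hand calculation.

```python
observed = [421, 227, 96, 21]                  # 0, 1, 2, 3 men on base
binomial_probs = [0.493, 0.393, 0.105, 0.009]  # rounded as on the slides
CRITICAL_5PCT_DF3 = 7.81                       # value used in the lecture

total = sum(observed)
expected = [prob * total for prob in binomial_probs]
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
statistic = sum(contributions)

print([round(e, 1) for e in expected])
print([round(c, 1) for c in contributions])
print(f"chi-square = {statistic:.1f}, reject binomial: {statistic > CRITICAL_5PCT_DF3}")
```

The statistic is far above the critical value, so the binomial conjecture is rejected; most of the discrepancy comes from the categories with one and three men on base.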
[Figure: Chi Square density, 3 degrees of freedom. Horizontal axis: CHI; vertical axis: DENSITY. The upper 5% tail lies to the right of the critical value 7.81.]

Conjecture: Poisson where m = np = 0.63
• P(k=0) = e^{-m} m^k/k! = e^{-0.63} (0.63)^0/0! = 0.5326
• P(k=1) = e^{-m} m^k/k! = e^{-0.63} (0.63)^1/1! = 0.3355
• P(k=2) = e^{-m} m^k/k! = e^{-0.63} (0.63)^2/2! = 0.1057
• P(k=3) = 1 - P(k=2) - P(k=1) - P(k=0) = 0.0262

Goodness of Fit
#                  0       1       2      3      Sum
Observed           421     227     96     21     765
Poisson            407.4   256.7   80.9   20.0   765
(Oj - Ej)^2/Ej     0.454   3.44    2.82   0.05   6.76

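The Poisson fit can be checked the same way. This is a minimal sketch, not the lecture's code: the names are illustrative, the mean 0.63 is the slides' estimate, and, as on the slides, the last category receives whatever probability remains after k = 0, 1, 2. The critical value 7.81 is again the lecture's value for 3 degrees of freedom.

```python
from math import exp, factorial

observed = [421, 227, 96, 21]  # 0, 1, 2, 3 men on base
mean_rate = 0.63               # m = n*p, estimated from the sample mean
CRITICAL_5PCT_DF3 = 7.81       # value used in the lecture

# Poisson probabilities for k = 0, 1, 2; the remainder is assigned to k = 3.
poisson_probs = [exp(-mean_rate) * mean_rate ** k / factorial(k) for k in range(3)]
poisson_probs.append(1 - sum(poisson_probs))

total = sum(observed)
expected = [prob * total for prob in poisson_probs]
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print([round(p, 4) for p in poisson_probs])
print([round(e, 1) for e in expected])
print(f"chi-square = {statistic:.2f}, reject Poisson: {statistic > CRITICAL_5PCT_DF3}")
```

Here the statistic stays below 7.81, so the Poisson conjecture is not rejected at the 5% level, unlike the binomial one.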
[Figure: Chi Square density, 3 degrees of freedom. Horizontal axis: CHI; vertical axis: DENSITY. The upper 5% tail lies to the right of the critical value 7.81.]

Likelihood Functions
• Review OLS Likelihood
• Proceed in a similar fashion for the probit

Likelihood function
• The joint density of the estimated residuals can be written as:
$g(\hat{e}_0, \hat{e}_1, \hat{e}_2, \ldots, \hat{e}_{n-1})$
• If the sample of observations on the dependent variable, y, and the independent variable, x, is random, then the observations are independent of one another. If the errors are also identically distributed with density f, i.e. i.i.d., then

Likelihood function
• Continued: If i.i.d., then
$g(\hat{e}_0, \hat{e}_1, \ldots, \hat{e}_{n-1}) = f(\hat{e}_0) \cdot f(\hat{e}_1) \cdots f(\hat{e}_{n-1})$
• If the residuals are normally distributed:
$\hat{e}_i \sim N(0, \sigma^2), \quad f(\hat{e}_i) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}[(\hat{e}_i - 0)/\sigma]^2}$
• This is one of the assumptions of linear regression: errors are i.i.d. normal
• then the joint distribution or likelihood function, L, can be written as:

Likelihood function
$L = g(\hat{e}_0, \hat{e}_1, \ldots, \hat{e}_{n-1}) = \prod_{i=0}^{n-1} \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}[(\hat{e}_i - 0)/\sigma]^2}$
$L = (1/\sigma^2)^{n/2} \cdot (1/2\pi)^{n/2} \cdot e^{-\frac{1}{2\sigma^2} \sum_{i=0}^{n-1} \hat{e}_i^2}$
• and taking natural logarithms of both sides, where the logarithm is a monotonically increasing function so that if lnL is maximized, so is L:

Log-Likelihood
$\ln L = -(n/2) \ln[\sigma^2] - (n/2) \ln(2\pi) - \frac{1}{2\sigma^2} \sum_{i=0}^{n-1} \hat{e}_i^2$
$\ln L = -(n/2) \ln[\sigma^2] - (n/2) \ln(2\pi) - \frac{1}{2\sigma^2} \sum_{i=0}^{n-1} [y_i - \hat{a} - \hat{b} x_i]^2$
• Taking the derivative of lnL with respect to either a-hat or b-hat and setting it to zero yields the same estimators for the parameters a and b as with ordinary least squares, except that the errors are now assumed to be normally distributed.

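The equivalence between maximizing lnL and ordinary least squares can be illustrated numerically. This is a sketch only: the five data points are hypothetical values invented for the example, the names are assumptions, and sigma squared is held fixed because it does not affect which (a, b) pair maximizes lnL.

```python
from math import log, pi

# Hypothetical sample, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

def log_likelihood(a, b, sigma2):
    """ln L = -(n/2) ln(sigma^2) - (n/2) ln(2 pi) - SSR / (2 sigma^2)."""
    ssr = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return -(n / 2) * log(sigma2) - (n / 2) * log(2 * pi) - ssr / (2 * sigma2)

# Closed-form ordinary least squares estimates.
x_bar = sum(x) / n
y_bar = sum(y) / n
b_ols = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
a_ols = y_bar - b_ols * x_bar

sigma2 = 1.0  # held fixed for the comparison
at_ols = log_likelihood(a_ols, b_ols, sigma2)

# Evaluate ln L at eight points surrounding the OLS estimates.
steps = (-0.1, 0.0, 0.1)
nearby = [log_likelihood(a_ols + da, b_ols + db, sigma2)
          for da in steps for db in steps if (da, db) != (0.0, 0.0)]

print(f"a = {a_ols:.2f}, b = {b_ols:.2f}")
print(at_ols > max(nearby))  # True when the OLS point beats every perturbed point
```

Only the sum of squared residuals depends on a and b, which is why the point that minimizes it also maximizes lnL.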
Probit
• Example: expenditures on lottery as a % of household income
• lottery_i = a + b*income_i + e_i
• if lottery_i > 0, i.e. a + b*income_i + e_i > 0, then Bern_i, the yes-no indicator variable, is equal to one and e_i > -a - b*income_i
• this determines a threshold for observation i in the distribution of the error e_i
• assume $e_i \sim N(0, \sigma^2)$

[Figure: Density Function for the Standardized Normal Variate, $f(z) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2}[(z - 0)/1]^2}$. Horizontal axis: Standard Deviations (-5 to 5); vertical axis: Density. The threshold for observation i is marked at $(-a - b \cdot income_i - 0)/\sigma$, on the scale of $(e_i - 0)/\sigma$. The area above the threshold is the probability of playing the lottery for observation i, $P^i_{yes}$; the area below the threshold is $P^i_{no}$ for observation i.]
Probit
• Likelihood function for the observed sample
$LIK = \frac{n!}{n_{yes}! \, n_{no}!} \prod_{Bern=0} P_{no} \prod_{Bern=1} P_{yes}$
$LIK = \frac{n!}{n_{yes}! \, n_{no}!} \prod_{i=1}^{n} (P^i_{no})^{(1 - Bern_i)} (P^i_{yes})^{Bern_i}$
• Log likelihood:
$\ln LIK = \ln\left[\frac{n!}{n_{yes}! \, n_{no}!}\right] + \sum_{i=1}^{n} \left[ (1 - Bern_i) \ln P^i_{no} + Bern_i \ln P^i_{yes} \right]$

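$P^i_{no} = \int_{-\infty}^{-a - b \cdot income_i} \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}([e_i - 0]/\sigma)^2} \, de_i$
$P^i_{yes} = \int_{-a - b \cdot income_i}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}([e_i - 0]/\sigma)^2} \, de_i = 1 - P^i_{no}$
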
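The two integrals are areas under a normal density, so they can be evaluated with the normal cumulative distribution function. The sketch below is illustrative only: the function names and the parameter values (a = -1, b = 0.02, sigma = 1) are hypothetical choices, not estimates from the lecture.

```python
from math import erf, sqrt

def normal_cdf(value, mean=0.0, sigma=1.0):
    """Area under the N(mean, sigma^2) density to the left of value."""
    return 0.5 * (1.0 + erf((value - mean) / (sigma * sqrt(2.0))))

def probit_probabilities(a, b, income, sigma=1.0):
    """Return (P_no, P_yes) for one observation."""
    threshold = -a - b * income               # e_i must exceed this for a "yes"
    p_no = normal_cdf(threshold, 0.0, sigma)  # area below the threshold
    p_yes = 1.0 - p_no                        # area above the threshold
    return p_no, p_yes

# Hypothetical parameter values, chosen only to illustrate the calculation.
p_no, p_yes = probit_probabilities(a=-1.0, b=0.02, income=50.0)
print(f"P_no = {p_no:.4f}, P_yes = {p_yes:.4f}")
```

With these made-up values the threshold sits at zero, so the two areas are equal; a higher income moves the threshold to the left and raises P_yes when b is positive.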
Probit
• Substituting these expressions for Pno and Pyes in the ln Likelihood function gives the complete expression.

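Putting the pieces together, the probit log likelihood can be evaluated for any trial values of a and b. This is a minimal sketch under stated assumptions: the incomes and yes-no indicators are hypothetical data invented for the example, the function names are illustrative, and sigma is fixed at 1. A real estimation would search over a and b for the maximum, which is not shown here.

```python
from math import comb, erf, log, sqrt

def standard_normal_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def probit_log_likelihood(a, b, incomes, bern, sigma=1.0):
    """ln LIK = ln[n!/(n_yes! n_no!)] + sum_i [(1 - Bern_i) ln P_no + Bern_i ln P_yes]."""
    n = len(bern)
    n_yes = sum(bern)
    total = log(comb(n, n_yes))  # comb(n, n_yes) equals n!/(n_yes! n_no!)
    for income, bern_i in zip(incomes, bern):
        p_no = standard_normal_cdf((-a - b * income) / sigma)
        p_yes = 1.0 - p_no
        total += (1 - bern_i) * log(p_no) + bern_i * log(p_yes)
    return total

# Hypothetical sample: household incomes and whether the household plays (1) or not (0).
incomes = [20.0, 35.0, 50.0, 65.0, 80.0]
bern = [0, 0, 1, 1, 1]

ll_flat = probit_log_likelihood(0.0, 0.0, incomes, bern)     # income has no effect
ll_slope = probit_log_likelihood(-1.0, 0.02, incomes, bern)  # a trial guess with b > 0
print(f"ln LIK, a=0, b=0:     {ll_flat:.4f}")
print(f"ln LIK, a=-1, b=0.02: {ll_slope:.4f}")
```

For this made-up sample the guess with a positive income effect gives the larger log likelihood, which is the kind of comparison a maximum likelihood routine repeats until no better (a, b) can be found.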
Outline
• I. Projects
• II. Goodness of Fit & Chi Square
• III. Contingency Tables

Part I: Projects
• Teams
• Assignments
• Presentations
• Data Sources
• Grades

Team One
• : Project choice
• : Data Retrieval
• : Statistical Analysis
• : PowerPoint Presentation
• : Executive Summary
• : Technical Appendix
• : Graphics (Excel, Eviews, other)

Assignments
• 1. Project choice: Markus Ansmann
• 2. Data Retrieval: Theodore Ehlert
• 3. Statistical Analysis: David Sheehan
• 4. PowerPoint Presentation: Qun Luo
• 5. Executive Summary: Steven Comstock
• 6. Technical Appendix: Alan Weinberg
• 7. Graphics: Gregory Adams

PowerPoint Presentations: Member 4
• 1. Introduction: Members 1, 2, 3
  – What
  – Why
  – How
• 2. Executive Summary: Member 5
• 3. Exploratory Data Analysis: Members 3, 7
• 4. Descriptive Statistics: Members 3, 7
• 5. Statistical Analysis: Member 3
• 6. Conclusions: Members 3 & 5
• 7. Technical Appendix: Table of Contents, Member 6

Executive Summary and Technical Appendix

I. Your report should have an executive summary of one to one and a half pages that summarizes your findings in words for a nontechnical reader. It should explain the problem being examined from an economic perspective, i.e. it should motivate the reader's interest in the issue. Your report should explain, in simple language, how you are investigating the issue and why you are approaching the problem in this particular fashion. Your executive summary should explain the economic importance of your findings.
You can attach the technical details of your findings as an appendix.

Grades
Component        A   B   C
Introduction
Exec. Summary
Explor.
Descriptive
Stat. Anal.
Conclusions
Tech. Appen.
Graphics
Overall Proj.

Data Sources
• FRED: Federal Reserve Bank of St. Louis, http://research.stlouisfed.org/fred/
  – Business/Fiscal
    • Index of Consumer Sentiment, Monthly (1952:11)
    • Light Weight Vehicle Sales, Auto and Light Truck, Monthly (1976.01)
• Economagic, http://www.economagic.com/
• U S Dept. of Commerce, http://www.commerce.gov/
  – Population
  – Economic Analysis, http://www.bea.gov/

Data Sources (Cont.)
• Bureau of Labor Statistics, http://stats.bls.gov/
• California Dept of Finance, http://www.dof.ca.gov/