Log-linear Part One


Log-linear Models
Please read Chapter Two
We are interested in relationships between variables.
                  White Victim              Black Victim
White Prisoner    151  (151/160 = 0.94)     9
Black Prisoner    63   (63/166 = 0.38)      103
Pearson Chi-square test of Independence
Based on P(A,B) = P(A) P(B)
$$
\begin{array}{cccc|c}
p_{11} & p_{12} & p_{13} & p_{14} & p_{1+} \\
p_{21} & p_{22} & p_{23} & p_{24} & p_{2+} \\
\hline
p_{+1} & p_{+2} & p_{+3} & p_{+4} & p_{++} = 1
\end{array}
$$
Under $H_0$ of independence, $p_{ij} = p_{i+}\,p_{+j}$.
$$
\begin{array}{cccc|c}
x_{11} & x_{12} & x_{13} & x_{14} & x_{1+} \\
x_{21} & x_{22} & x_{23} & x_{24} & x_{2+} \\
\hline
x_{+1} & x_{+2} & x_{+3} & x_{+4} & x_{++} = N
\end{array}
$$
Computing the Pearson chisquare test of independence
• Calculate (estimated) expected frequencies $\hat{\mu}_{ij} = x_{i+}\,x_{+j}/N$
• Calculate $X^2 = \sum_{i,j} \frac{(x_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}}$ (an R sketch of these two steps follows below)
• For large samples, $X^2$ has an approximate chisquare distribution if H0 is true
• Degrees of freedom (I-1)(J-1)
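As a minimal sketch, the two calculation steps can be done directly in R from the definitions; the victim-by-prisoner counts above are used as the example, and the object name `tab` is just an illustration.

```r
# Pearson chisquare test of independence, computed from the definitions
tab <- matrix(c(151, 9, 63, 103), nrow = 2)   # rows: victim race; columns: prisoner race

expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)  # estimated expected frequencies
X2 <- sum((tab - expected)^2 / expected)                  # Pearson statistic
df <- (nrow(tab) - 1) * (ncol(tab) - 1)
pval <- pchisq(X2, df, lower.tail = FALSE)
c(X2 = X2, df = df, p.value = pval)
```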
Numerical example of Pearson chisquare

                 White Prisoner   Black Prisoner   Total
White Victim     151 (105)        63 (109)         214
Black Victim     9 (55)           103 (57)         112
Total            160              166              326

Expected frequencies are shown in parentheses.
With R
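A minimal sketch of what the R computation could look like, assuming the victim-by-prisoner table above; `chisq.test` with `correct = FALSE` gives the uncorrected Pearson statistic.

```r
# Pearson chisquare test of independence with R's built-in function
tab <- matrix(c(151, 9, 63, 103), nrow = 2,
              dimnames = list(Victim   = c("White", "Black"),
                              Prisoner = c("White", "Black")))
chisq.test(tab, correct = FALSE)   # X-squared is about 115 with df = 1
```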
Conclusions
• X2 = 115, df = (2-1)(2-1) = 1
• Critical value at alpha = 0.05 is 3.84
• Reject H0
• Conclude race of prisoner and race of victim are not independent.
• That’s not good enough! Murder victims and the persons convicted of murdering them tend to be of the same race. (Say what happened!)
Two treatments for Kidney Stones

              Treatment A   Treatment B
Effective     273           289
Ineffective   77            61

X2 = 2.3106, df = 1, p = 0.1285
These results are consistent with no difference in effectiveness between treatments.
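The same calculation in R, as a sketch (outcome in rows, treatment in columns, matching the table above):

```r
# Kidney stone data: rows = outcome, columns = treatment
stones <- matrix(c(273, 77, 289, 61), nrow = 2,
                 dimnames = list(Outcome   = c("Effective", "Ineffective"),
                                 Treatment = c("A", "B")))
chisq.test(stones, correct = FALSE)   # X-squared = 2.3106, df = 1, p-value = 0.1285
```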
All this applies to the multinomial, but there are 3 main sampling models
• Multinomial
• Poisson
• Product Multinomial
Fortunately, the same statistical methods work with all.
Poisson
• Independent Poisson processes generate the counts in each category (for ex., traffic accidents).
• In homework you proved that conditionally upon the total number of events, the joint distribution of the counts is multinomial (a sketch of the argument follows below).
• Justifies use of multinomial theory
• But in hard cases, Poisson probability calculations can be easier.
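A sketch of the conditioning argument (not the homework solution itself): if $X_1, \ldots, X_k$ are independent Poisson with means $\lambda_1, \ldots, \lambda_k$ and $\lambda = \sum_j \lambda_j$, then for counts $x_1, \ldots, x_k$ adding up to $n$,

$$
P\!\left(X_1 = x_1, \ldots, X_k = x_k \,\middle|\, \textstyle\sum_j X_j = n\right)
= \frac{\prod_{j} e^{-\lambda_j} \lambda_j^{x_j} / x_j!}{e^{-\lambda} \lambda^{n} / n!}
= \frac{n!}{x_1! \cdots x_k!} \prod_{j=1}^{k} \left(\frac{\lambda_j}{\lambda}\right)^{x_j},
$$

which is the multinomial with probabilities $p_j = \lambda_j / \lambda$.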
Product multinomial
• Take independent random samples of sizes N1, N2, …, NI from I sub-populations.
• In each, observe a multinomial with J categories. Compare.
• Examples: Vitamin C study, Kidney stone study.
• Likelihood: A product of I multinomial likelihoods, because of independent sampling from sub-populations (written out below).
• This is almost always the right model for experimental studies.
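As a sketch, with cell counts $x_{ij}$, sample sizes $N_i$ and within-row probabilities $p_{ij}$ (so $\sum_{j} p_{ij} = 1$ for each $i$), that product of multinomial likelihoods is

$$
L = \prod_{i=1}^{I} \frac{N_i!}{x_{i1}! \cdots x_{iJ}!}\; p_{i1}^{x_{i1}} \cdots p_{iJ}^{x_{iJ}}.
$$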
Suppose the null hypothesis is no differences among the I vectors of multinomial probabilities.

$$
\begin{array}{cccc|c}
x_{11} & x_{12} & x_{13} & x_{14} & x_{1+} = N_1 \\
x_{21} & x_{22} & x_{23} & x_{24} & x_{2+} = N_2 \\
\hline
x_{+1} & x_{+2} & x_{+3} & x_{+4} & x_{++} = N
\end{array}
$$
• Then under H0, the MLE of the (common) pj is the sample proportion, pooling data across the I rows: $\hat{p}_j = x_{+j}/N$.
• And the expected cell frequency is $N_i\,\hat{p}_j = x_{i+}\,x_{+j}/N$.
Same as for the usual chisquare test of independence.
So let’s concentrate on the multinomial
Assume a multinomial and test independence? Messy!

$$
\begin{array}{cc|c}
p_1 & p_2 & p_1 + p_2 \\
p_3 & p_4 & p_3 + p_4 \\
\hline
p_1 + p_3 & p_2 + p_4 & 1
\end{array}
$$
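To see why it is messy, write the independence hypothesis directly in terms of the four cell probabilities above: each cell probability must equal the product of its row and column totals, for example

$$
H_0:\; p_1 = (p_1 + p_2)(p_1 + p_3),
$$

and since $p_1 + p_2 + p_3 + p_4 = 1$ this amounts to the nonlinear constraint $p_1 p_4 = p_2 p_3$, which is awkward to impose directly on the multinomial parameters.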
Log-linear models
• Linear model for the (natural) logs of the expected frequencies
• Looks like ANOVA notation (STA332)
• First, one-factor (not in the text)
• Then two-factor (in the text)
• Start with the familiar normal example, testing for differences among means.
Compare 3 means
• Grand Mean
• Effects are deviations from the grand mean, one for each of the 3 means
• Testing for differences among the means is the same as testing that all the effects are zero (see the sketch below).
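A sketch in standard one-way ANOVA notation (the symbols here are an assumed reconstruction, not necessarily those on the original slide): with population means $\mu_1, \mu_2, \mu_3$,

$$
\bar{\mu} = \tfrac{1}{3}(\mu_1 + \mu_2 + \mu_3), \qquad
\alpha_j = \mu_j - \bar{\mu} \;\; (j = 1, 2, 3), \qquad
\alpha_1 + \alpha_2 + \alpha_3 = 0,
$$

so that $H_0\colon \mu_1 = \mu_2 = \mu_3$ is equivalent to $H_0\colon \alpha_1 = \alpha_2 = \alpha_3 = 0$.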
Single categorical variable, k categories
Linear model for log of expected frequencies
No probability can equal zero!
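A sketch of that model in one common parameterization (the symbols $\lambda$ and $\alpha_j$ are assumptions here; the text may use different ones): with $n$ observations and category probabilities $p_1, \ldots, p_k$, the expected frequencies are $\mu_j = n p_j$, and the model says

$$
\log \mu_j = \lambda + \alpha_j, \qquad j = 1, \ldots, k,
$$

with one side condition on the effects (for example $\sum_j \alpha_j = 0$, or $\alpha_k = 0$) so that only $k-1$ of them are free. The logs exist precisely because no probability can equal zero.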
This is a Re-Parameterization
Substitute into likelihood function and do maximum likelihood
How many parameters, k or k-1?
There are still k-1 parameters
• All “effects” zero corresponds to equal probabilities
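In the sketch above this is immediate: if every $\alpha_j = 0$, then $\mu_j = e^{\lambda}$ is the same for all $j$, so $p_j = \mu_j / n = 1/k$ for every category.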
Maximum Likelihood
Log Likelihood
k = 3 Categories
Numerical MLE
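A sketch of the log likelihood in the assumed parameterization: solving $\sum_j \mu_j = n$ for $e^{\lambda}$ gives $p_j = e^{\alpha_j} / \sum_{\ell} e^{\alpha_\ell}$, so up to an additive constant (with $n = \sum_j x_j$),

$$
\ell(\alpha) = \sum_{j=1}^{k} x_j \alpha_j - n \log \sum_{\ell=1}^{k} e^{\alpha_\ell}.
$$

With a side condition such as $\alpha_k = 0$, this is maximized numerically over the $k-1$ free effects.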
Remember the employment study?
• 106 Employed in a job related to field of study
• 74 Employed in a job unrelated to their field of study
• 20 Unemployed
• Use R to
  – Estimate the effects
  – Test equal probabilities (senseless)
Generic MLE with R
Estimate the probabilities and test
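A minimal R sketch of a generic numerical MLE for this model, using the employment counts above; the function names and the side condition $\alpha_k = 0$ are assumptions, not the code from the original slides.

```r
x <- c(106, 74, 20)   # related to field, unrelated to field, unemployed
n <- sum(x)
k <- length(x)

# Minus log likelihood in the k-1 free effects, with alpha_k fixed at 0
mll <- function(theta, x) {
  alpha <- c(theta, 0)
  p <- exp(alpha) / sum(exp(alpha))
  -sum(x * log(p))
}

fit <- optim(rep(0, k - 1), mll, x = x, method = "BFGS")
alpha.hat <- c(fit$par, 0)                      # estimated effects
p.hat <- exp(alpha.hat) / sum(exp(alpha.hat))   # estimated probabilities (should equal x/n)

# Likelihood ratio test of equal probabilities: all effects zero
G2 <- 2 * (mll(rep(0, k - 1), x) - fit$value)
pval <- pchisq(G2, df = k - 1, lower.tail = FALSE)
c(G2 = G2, df = k - 1, p.value = pval)
```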
This seems like a lot of trouble just to estimate some probabilities and test if they are equal.
But the payoff comes for tables of two or more dimensions.