Jacek Wallusch
_________________________________
Introduction to Econometrics
Lecture 4:
Count Data Models
definitions
Count Data
____________________________________________________________________________________________
Count
1. to name numbers in regular order;
2. to have a specified value [a touchdown counts for six
points]*;
Integer
1. anything complete in itself; entity;
2. any positive or negative whole number or zero;
ItE: 4
Webster’s New World College Dictionary
* That’s Webster as well!!!
Example
Count Data
____________________________________________________________________________________________
Doctor Visits: number of doctor visits per one respondent
within a specified period of time
explanatory variables:
socioeconomic – gender, age, squared age, income
health insurance status indicators – insured, not insured
recent health status measure – number of illnesses in past
2 weeks, number of days of reduced activity
long-term health status measure – health questionnaire
score
Example: Cameron and Trivedi 1998
Count Data
Example
____________________________________________________________________________________________
Doctor Visits: see cont_docvisits.xls
explained variable:
avg. = 6.823 std.dev. = 7.395
skew = 4.176 kurt = 46.744
[Figure: histogram of doctor visits; axis ticks run from 0 to 144 (steps of 6) and from 0 to 1200 (steps of 200)]
Example: Katchova 2013
Examples
Count Data
____________________________________________________________________________________________
Other Topics:
research and development: patents granted, domestic
patents, foreign patents etc.
sales: number of items sold
quality control: number of defective items produced
Important note: time structure and/or cross-section
Count Data
Background
____________________________________________________________________________________________
Poisson distribution and rare events
Poisson density:
$\Pr(Y = y) = \dfrac{e^{-\lambda t} (\lambda t)^{y}}{y!}, \quad y = 0, 1, 2, \ldots$
expected value and variance:
$E[Y] = \mathrm{Var}[Y] = \lambda t$
Warning: equidispersion property is often violated in
practice, i.e.
$E[Y] \neq \mathrm{Var}[Y]$
$\lambda$ – intensity (rate) coefficient, $t$ – exposure (length of time during which the
events are recorded)
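A minimal numerical check of the equidispersion property, assuming the density with rate $\lambda$ and exposure $t$ given above. The values of `lam` and `t` are hypothetical, and the support is truncated at 60, where the remaining tail mass is negligible for this mean.

```python
import math

def poisson_pmf(y, lam, t=1.0):
    """Pr(Y = y) for a Poisson count with rate lam observed over exposure t."""
    m = lam * t
    return math.exp(-m) * m ** y / math.factorial(y)

lam, t = 1.5, 2.0          # hypothetical rate and exposure, so lam * t = 3
support = range(0, 60)     # truncated support
probs = [poisson_pmf(y, lam, t) for y in support]

mean = sum(y * p for y, p in zip(support, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(support, probs))
print(mean, var)           # both should be very close to lam * t
```

In the doctor-visits sample the standard deviation (7.395) implies a variance far above the mean (6.823), which is the kind of violation the warning refers to.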
Estimation
coefficients
____________________________________________________________________________________________
Poisson distribution and rare events
$f(y_i \mid x_i) = \dfrac{e^{-\lambda_i} \lambda_i^{y_i}}{y_i!}$
The model:
left-hand variable y: number of occurrences of the event of
interest (e.g. doctor visits)
mean coefficient:
$\lambda_i = \exp(x_i' b)$
b – coefficient vector, x – vector of linearly independent regressors
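The slides do not show how b is obtained. One common route is maximum likelihood, sketched here with Newton-Raphson steps for a constant and a single regressor. The data and the function name `fit_poisson` are hypothetical and serve only as illustration.

```python
import math

# Hypothetical toy data: x is a 0/1 indicator, y is a count.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1, 2, 3, 2, 4, 6, 5, 5]

def fit_poisson(x, y, iterations=25):
    """Newton-Raphson for the Poisson model lambda_i = exp(b0 + b1 * x_i)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0   # start at the constant-only fit
    for _ in range(iterations):
        lam = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: sum of (y_i - lambda_i) * (1, x_i)
        g0 = sum(yi - li for yi, li in zip(y, lam))
        g1 = sum((yi - li) * xi for yi, li, xi in zip(y, lam, x))
        # Information matrix: sum of lambda_i * (1, x_i)(1, x_i)'
        h00 = sum(lam)
        h01 = sum(li * xi for li, xi in zip(lam, x))
        h11 = sum(li * xi * xi for li, xi in zip(lam, x))
        det = h00 * h11 - h01 * h01
        # Newton step: b <- b + H^{-1} g, using the explicit 2x2 inverse
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1

b0, b1 = fit_poisson(x, y)
print(b0, b1, math.exp(b0), math.exp(b0 + b1))
```

With a constant and one indicator the fitted means reproduce the two group averages of y, which gives a simple way to check the result by hand.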
Probability
Poisson model
____________________________________________________________________________________________
Poisson distribution and estimated probabilities
Using the estimated mean to calculate probabilities:
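$\lambda_i = \exp(x_i' b)$
$\Pr(y_i = y \mid \lambda_i) = \dfrac{e^{-\lambda_i} \lambda_i^{y}}{y!}$
Left-hand side: conditional probability that the left-hand variable equals y; $\lambda_i$ is the condition.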
b – coefficient vector, x – vector of linearly independent regressors
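A short sketch of this calculation, assuming the estimated mean takes the exponential form above. The coefficient values in `b` and the regressor vector `x_i` are hypothetical.

```python
import math

def poisson_prob(y, x, b):
    """Pr(y_i = y | lambda_i) with lambda_i = exp(x'b)."""
    lam = math.exp(sum(xk * bk for xk, bk in zip(x, b)))
    return math.exp(-lam) * lam ** y / math.factorial(y)

b = [math.log(2.0), math.log(2.5)]   # hypothetical estimates (constant, slope)
x_i = [1.0, 1.0]                     # constant term and one regressor value

# Estimated probabilities of observing 0, 1, 2 or 3 events for this individual
probs = [poisson_prob(y, x_i, b) for y in range(4)]
print(probs)
```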
negative binomial
Other Distributions
____________________________________________________________________________________________
Important limitation: Overdispersion
Solution: negative binomial model
$P(X = x \mid p, r) = \binom{x + r - 1}{r - 1} p^{r} (1 - p)^{x}$
mean coefficient:
$\mu = r(1 - p)/p$
variance:
$\sigma^2 = r(1 - p)/p^2 = \mu + \frac{1}{r} \mu^2$
r – number of successes at which counting stops (x – number of failures before the r-th success), p – probability of success
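A numerical check of the mean and variance, assuming the parametrization in which x counts failures before the r-th success. The values of `r` and `p` are hypothetical, and the support is truncated where the tail mass is negligible.

```python
import math

def negbin_pmf(x, r, p):
    """P(X = x | p, r): x failures before the r-th success."""
    return math.comb(x + r - 1, r - 1) * p ** r * (1 - p) ** x

r, p = 3, 0.4               # hypothetical parameters
support = range(0, 400)     # truncated support
probs = [negbin_pmf(x, r, p) for x in support]

mean = sum(x * q for x, q in zip(support, probs))
var = sum((x - mean) ** 2 * q for x, q in zip(support, probs))

# Compare with the closed forms r(1-p)/p and mean + mean^2 / r
print(mean, r * (1 - p) / p)
print(var, mean + mean ** 2 / r)
```

The variance exceeds the mean for any finite r, which is why this distribution can accommodate overdispersion.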
Distributions
examples
____________________________________________________________________________________________
Randomly generated distributions
[Figure: histograms of the randomly generated Poisson and Binomial samples; values 0 to 9, frequencies 0 to 300]
descriptive statistics:
            Poisson   Binomial
min         0         0
max         9         9
average     3.024     4.495
variance    2.884     2.286
Results
Interpretation
____________________________________________________________________________________________
Marginal Effects
Exponential conditional mean:
$E[y \mid x] = \exp(x' b)$
differentiation:
$\dfrac{\partial E[y \mid x]}{\partial x_i} = b_i \exp(x' b)$
Procedures:
(1) $\dfrac{1}{T} \sum_{j=1}^{T} b_i \exp(x_j' b)$
(2) $b_i \exp(\bar{x}' b)$
(1) average response after aggregating over all individuals vs. (2) response for the
individual with average characteristics
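The two procedures can be compared on a small example. This is a sketch under the exponential conditional mean; the coefficients in `b` and the regressor rows in `xs` are hypothetical.

```python
import math

b = [0.2, 0.5]   # hypothetical coefficients (constant, slope)
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]   # one row per individual

def cond_mean(x, b):
    """E[y | x] = exp(x'b)."""
    return math.exp(sum(xk * bk for xk, bk in zip(x, b)))

i = 1   # index of the regressor whose effect is evaluated

# (1) average of the individual responses
avg_effect = sum(b[i] * cond_mean(x, b) for x in xs) / len(xs)

# (2) response for the individual with average characteristics
x_bar = [sum(col) / len(xs) for col in zip(*xs)]
effect_at_mean = b[i] * cond_mean(x_bar, b)

print(avg_effect, effect_at_mean)
```

Because the exponential function is convex, the average of the exponentials is at least the exponential of the average, so for a positive coefficient procedure (1) never yields a smaller value than procedure (2).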
Overdispersion
Methods of Estimation
____________________________________________________________________________________________
GRETL
Test for overdispersion:
null hypothesis: no overdispersion
$E[y \mid x] = \mathrm{Var}[y \mid x]$
interpretation: rejection of the null suggests the need for
a different distribution
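The slides do not state which statistic GRETL reports. One common regression-based check, in the spirit of Cameron and Trivedi, regresses $((y_i - \hat{\mu}_i)^2 - y_i)/\hat{\mu}_i$ on $\hat{\mu}_i$ without a constant and looks at the t-ratio of the slope. The sketch below assumes that approach, with hypothetical counts `y` and hypothetical fitted means `mu`; it may differ in detail from GRETL's output.

```python
import math

# Hypothetical counts and fitted Poisson means
y = [0, 0, 5, 3, 1, 12, 2, 5]
mu = [2.0, 2.0, 2.0, 2.0, 5.0, 5.0, 5.0, 5.0]

# Auxiliary dependent variable: ((y - mu)^2 - y) / mu
z = [((yi - mi) ** 2 - yi) / mi for yi, mi in zip(y, mu)]

# OLS of z on mu without a constant: alpha_hat = sum(mu * z) / sum(mu^2)
s_mm = sum(mi * mi for mi in mu)
alpha_hat = sum(mi * zi for mi, zi in zip(mu, z)) / s_mm

# Conventional OLS standard error and t-ratio of alpha_hat
resid = [zi - alpha_hat * mi for zi, mi in zip(z, mu)]
s2 = sum(e * e for e in resid) / (len(y) - 1)
t_stat = alpha_hat / math.sqrt(s2 / s_mm)

print(alpha_hat, t_stat)
```

A slope close to zero is in line with the null of equidispersion; a clearly positive slope points towards overdispersion.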
Overdispersion
Methods of Estimation
____________________________________________________________________________________________
GRETL
NEG BIN 2:
$\mathrm{Var}[y \mid x] = E[y \mid x] + \alpha \, E[y \mid x]^2$
$\alpha$-coefficient: measure of heterogeneity between
individuals
NEG BIN 1:
$\mathrm{Var}[y \mid x] = (1 + \alpha) \, E[y \mid x]$
conditional variance: scalar multiple $(1 + \alpha)$ of the conditional
mean
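To see how the two variance functions differ, the sketch below evaluates both at a few conditional means. It assumes the NEG BIN 2 form mean + alpha * mean^2 and the NEG BIN 1 form (1 + alpha) * mean; the value of `alpha` and the means are hypothetical.

```python
def var_negbin2(mu, alpha):
    """NEG BIN 2: variance is quadratic in the conditional mean."""
    return mu + alpha * mu ** 2

def var_negbin1(mu, alpha):
    """NEG BIN 1: variance is a scalar multiple of the conditional mean."""
    return (1 + alpha) * mu

alpha = 0.5                # hypothetical dispersion coefficient
means = [1.0, 5.0, 10.0]   # hypothetical conditional means

table = [(mu, var_negbin1(mu, alpha), var_negbin2(mu, alpha)) for mu in means]
for mu, v1, v2 in table:
    print(mu, v1, v2)
```

Under NEG BIN 1 the variance-to-mean ratio is constant, while under NEG BIN 2 it grows with the mean; with alpha = 0 both reduce to the Poisson case.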