Dr. Ka-fu WONG
ECON1003
Analysis of Economic Data
Ka-fu Wong © 2003
Lab 1-1
Counting Green Beans in the Bottle
• We want to know the number of green beans in the bottle.
• Tools:
  - We do not have a balance. If we had one, we could take out a small number of beans and weigh them. We could then estimate the number of beans in the bottle.
  - We do have a pack of red beans.
• What do we need to do to obtain a reasonable estimate?
Lab 1-2
In-class Lab
Capture/Re-capture
GOAL:
1. Illustrate how to estimate the population size when the cost of counting all individuals is prohibitive.
2. Illustrate how easy and intuitive statistics can be. Statistics need not be completely deep, murky, and mysterious. Our common sense can help us negotiate our way through the course.
Lab 1-3
History and examples of the capture/recapture method
• Capture-recapture methods were originally developed in wildlife biology to monitor bird, fish, and insect populations (where counting all individuals is prohibitive). More recently, these methods have been used extensively in disease and event monitoring.
• http://www.pitt.edu/~yuc2/cr/history.htm
Lab 1-4
The fish example

Estimating the number of fish in a lake or pond:
• C fish are caught, tagged, and returned to the lake.
• Later on, R fish are caught and checked for tags. Say T of them have tags.
• The numbers C, R, and T are used to estimate the fish population.
Lab 1-5
Green beans in a bottle
• The objective is to estimate the number of green beans in a bottle.
• Capture one cup of beans. Count them and call the count C. Replace these green beans with the same number of red beans, and put the red beans back into the bottle.
• Capture another cup of beans. Count the total number of beans (R) and the number of red beans (T).
• Based on this information, how can we obtain a reasonable estimate of the number of beans in the bottle?
Lab 1-6
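Green beans in a bottle
• Let N be the unknown number of beans in the bottle. The share of red beans in the bottle (C/N) should be close to the share of red beans in the second cup (T/R), so C/N ≈ T/R.
• Hence, a simple estimate of N is CR/T, where
  - C = the number of beans captured in the first round,
  - R = the total number of beans captured in the second round,
  - T = the number of red beans captured in the second round.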
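As a minimal sketch of the formula above, written in Python purely for illustration (the function name and the counts are hypothetical, not taken from the lab), the estimate could be computed as:

```python
def estimate_population(c: int, r: int, t: int) -> float:
    """Simple capture/recapture estimate of N, i.e. C*R/T."""
    if t <= 0:
        # With no red (marked) beans in the second cup, C*R/T is undefined.
        raise ValueError("T must be positive for the estimate C*R/T")
    return c * r / t


# Hypothetical counts, chosen only for illustration.
print(estimate_population(c=120, r=100, t=24))  # 120 * 100 / 24 = 500.0
```

The explicit check for T = 0 is a design choice of this sketch: if no red bean shows up in the second cup, the formula gives no usable estimate.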
Lab 1-7
Simulations to see the properties of this proposed estimator
• How good is the proposed estimator?
• To see its properties, I used MATLAB to simulate our capture-recapture experiment with different numbers of captures (C) and recaptures (R), relative to the total number of fish in the pond.
• Throughout,
  - N = 500, and
  - each setting is simulated 1000 times.
Lab 1-8
Simulation design – via MATLAB
• Individual simulation experiment:
  - Create 500 fish, labelled 1 to 500.
  - Capture a random sample of C fish and mark them by setting their labels to zero.
  - Capture another random sample of R fish. Count the number of marked fish in the sample and call it T.
  - Compute the estimate as CR/T.
• Repeat this experiment 1000 times, giving 1000 estimates.
• Compute the mean and standard deviation of these 1000 estimates.
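The slides report MATLAB results but do not show the code. The steps above can be sketched in Python as follows. The function name `simulate`, the fixed seed, and the choice to draw both samples without replacement are assumptions of this sketch, so its output need not match the tabulated figures.

```python
import random
import statistics


def simulate(n, c, r, reps=1000, seed=0):
    """Return (S, mean, std) of the estimates C*R/T over `reps` experiments.

    S counts the experiments with at least one marked fish in the recapture;
    only those experiments yield an estimate.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    estimates = []
    for _ in range(reps):
        # First capture: mark C of the N fish.
        marked = set(rng.sample(range(n), c))
        # Second capture: draw R fish and count the marked ones.
        recapture = rng.sample(range(n), r)
        t = sum(1 for fish in recapture if fish in marked)
        if t > 0:
            estimates.append(c * r / t)
    # Assumes at least two experiments produced an estimate.
    return len(estimates), statistics.mean(estimates), statistics.stdev(estimates)


s, mean, std = simulate(n=500, c=120, r=120)
print(f"S={s} mean={mean:.2f} std={std:.2f}")
```

Experiments with T = 0 are skipped rather than counted as infinite estimates, which is how this sketch interprets the S column.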
Lab 1-9
Properties of our estimator
Increasing C and R
N     C     R     S      Mean      Std
500   40    40    971    640.76    401.57
500   60    60    1000   579.22    321.54
500   80    80    1000   533.61    154.67
500   100   100   1000   522.85    104.29
500   120   120   1000   513.82    77.41
500   140   140   1000   507.04    60.98
500   250   250   1000   500.64    22.93
500   500   500   1000   500.00    0.00
•N = Total number of fish in the pond.
•C = number of captured fish.
•R = number of re-captured fish.
•S = number of simulations with a non-zero number of marked fish in the recapture.
Lab 1-10
Properties of our estimator
Constant C and increasing R
N     C     R     S      Mean      Std
500   120   40    1000   507.86    75.07
500   120   60    1000   513.40    79.55
500   120   80    1000   508.19    73.56
500   120   100   1000   511.24    74.55
500   120   120   1000   510.93    75.41
500   120   140   1000   511.21    75.63
500   120   250   1000   510.49    74.04
500   120   500   1000   507.47    77.32
•N = Total number of fish in the pond.
•C = number of captured fish.
•R = number of re-captured fish.
•S = number of simulations with a non-zero number of marked fish in the recapture.
Lab 1-11
Properties of our estimator
Increasing C and constant R
N     C     R     S      Mean      Std
500   40    120   961    646.59    405.72
500   60    120   1000   582.17    327.97
500   80    120   1000   533.28    142.23
500   100   120   1000   512.28    95.40
500   120   120   1000   508.78    78.75
500   140   120   1000   507.50    60.61
500   250   120   1000   500.86    22.38
500   500   120   1000   500.00    0.00
•N = Total number of fish in the pond.
•C = number of captured fish.
•R = number of re-captured fish.
•S = number of simulations with a non-zero number of marked fish in the recapture.
Lab 1-12
Conclusion from the simulations
• The proposed estimator generally overestimates the number of fish in the pond, i.e., on average the estimate is larger than the true number of fish.
  - That is, there is an upward bias.
• Holding R constant, increasing the number of captures (C) helps:
  - Bias is reduced, i.e., the mean is closer to the true population size.
  - The estimator is more precise, i.e., its standard deviation is smaller.
• Holding C constant, increasing the number of recaptures (R) does not help in these simulations:
  - Bias is more or less unchanged.
  - The precision of the estimator is more or less unchanged.
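One way to look at the bias without simulation noise is to compute the mean of CR/T exactly. The Python sketch below assumes both samples are drawn without replacement, so that T follows a hypergeometric distribution; that assumption and the function name are this sketch's own, not taken from the slides.

```python
from math import comb


def exact_mean_simple(n, c, r):
    """Exact mean of C*R/T given T > 0, with T ~ Hypergeometric(n, c, r)."""
    total = comb(n, r)  # number of possible recapture samples
    weighted_sum = 0.0
    prob_positive = 0.0
    # T can range from max(0, c + r - n) to min(c, r); T = 0 is excluded.
    for t in range(max(1, c + r - n), min(c, r) + 1):
        prob = comb(c, t) * comb(n - c, r - t) / total
        prob_positive += prob
        weighted_sum += prob * (c * r / t)
    return weighted_sum / prob_positive


for c in (80, 120, 250):
    print(c, round(exact_mean_simple(n=500, c=c, r=120), 2))
```

Under these assumptions the exact mean can be compared directly with the true N = 500, which separates the bias from the run-to-run noise of 1000 simulations.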
Lab 1-13
Additional issues
• Our proposed estimator is good enough, but it can be improved. Alternative estimators have been developed to reduce or eliminate the bias in estimating N.
• For instance, Seber (1982, p. 60) suggests the following estimator of N:
(C+1)(R+1)/(T+1) - 1
(Note that our proposed formula is CR/T.)
Seber, G. (1982): The Estimation of Animal Abundance and Related
Parameters, second edition, Charles.
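A minimal Python sketch comparing the two formulas on the same counts follows. The function names and the counts are hypothetical and serve only as illustration. Unlike CR/T, the modified formula remains defined when T = 0.

```python
def estimate_simple(c: int, r: int, t: int) -> float:
    """Simple estimate C*R/T (requires T > 0)."""
    return c * r / t


def estimate_modified(c: int, r: int, t: int) -> float:
    """Modified estimate (C+1)(R+1)/(T+1) - 1."""
    return (c + 1) * (r + 1) / (t + 1) - 1


# Hypothetical counts, chosen only for illustration.
c, r, t = 120, 100, 24
print(estimate_simple(c, r, t))    # 120 * 100 / 24
print(estimate_modified(c, r, t))  # 121 * 101 / 25 - 1

# With no marked individuals recaptured, only the modified formula works.
print(estimate_modified(c, r, 0))  # 121 * 101 / 1 - 1
```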
Lab 1-14
Simulations to see the properties of this modified estimator
• How good is the modified estimator?
• To see its properties, we repeat the above simulation exercise with the new formula:
(C+1)(R+1)/(T+1) - 1
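Such a rerun might look like the following Python sketch, which applies both formulas to the same simulated draws. The function name, the seed, and sampling without replacement are assumptions of this sketch; the slides do not show the MATLAB code, so the numbers printed here need not match the tables.

```python
import random
import statistics


def compare_estimators(n, c, r, reps=1000, seed=0):
    """Return ((mean, std) of C*R/T, (mean, std) of the modified estimate)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    simple, modified = [], []
    for _ in range(reps):
        marked = set(rng.sample(range(n), c))
        t = sum(1 for fish in rng.sample(range(n), r) if fish in marked)
        # The modified formula is defined for every draw, including T = 0.
        modified.append((c + 1) * (r + 1) / (t + 1) - 1)
        if t > 0:  # C*R/T is only defined when a marked fish is recaptured
            simple.append(c * r / t)
    return (
        (statistics.mean(simple), statistics.stdev(simple)),
        (statistics.mean(modified), statistics.stdev(modified)),
    )


(simple_mean, simple_std), (mod_mean, mod_std) = compare_estimators(500, 120, 120)
print(f"simple:   mean={simple_mean:.2f} std={simple_std:.2f}")
print(f"modified: mean={mod_mean:.2f} std={mod_std:.2f}")
```

Using the same draws for both formulas means any difference in the summary statistics comes from the formulas themselves rather than from different random samples.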
Lab 1-15
Properties of modified estimator
Increasing C and R
N     C     R     S      Mean      Std
500   40    40    1000   488.60    271.05
500   60    60    1000   504.39    202.16
500   80    80    1000   498.88    121.47
500   100   100   1000   501.72    91.20
500   120   120   1000   498.10    72.01
500   140   140   1000   501.14    58.44
500   250   250   1000   498.60    21.72
500   500   500   1000   500.00    0.00
•N = Total number of fish in the pond.
•C = number of captured fish.
•R = number of re-captured fish.
•S = number of simulations with a non-zero number of marked fish in the recapture.
Lab 1-16
Properties of modified estimator
Constant C and increasing R
N     C     R     S      Mean      Std
500   120   40    1000   498.55    67.38
500   120   60    1000   500.05    71.54
500   120   80    1000   495.58    69.22
500   120   100   1000   497.01    71.14
500   120   120   1000   498.45    71.05
500   120   140   1000   495.17    67.46
500   120   250   1000   500.41    75.29
500   120   500   1000   496.73    74.27
•N = Total number of fish in the pond.
•C = number of captured fish.
•R = number of re-captured fish.
•S = number of simulations with a non-zero number of marked fish in the recapture.
Lab 1-17
Properties of modified estimator
Increasing C and constant R
N     C     R     S      Mean      Std
500   40    120   1000   491.84    291.00
500   60    120   1000   499.33    216.81
500   80    120   1000   496.51    117.05
500   100   120   1000   493.50    87.53
500   120   120   1000   503.24    73.65
500   140   120   1000   498.59    56.30
500   250   120   1000   499.76    22.58
500   500   120   1000   500.00    0.00
•N = Total number of fish in the pond.
•C = number of captured fish.
•R = number of re-captured fish.
•S = number of simulations with a non-zero number of marked fish in the recapture.
Lab 1-18
Conclusion from the simulations
• The modified estimator performs better than the original estimator:
  - There is no apparent bias.
  - The estimator is more precise.
• Holding R constant, increasing the number of captures (C) helps:
  - The estimator is more precise, i.e., its standard deviation is smaller.
• Holding C constant, increasing the number of recaptures (R) does not help in these simulations:
  - The precision of the estimator is more or less unchanged.
Lab 1-19
What to take away today
• Statistics can be easy and intuitive.
• Statistics need not be completely deep, murky, and mysterious.
• Our common sense can help us negotiate our way through the course.
The syllabus will be distributed and discussed on Wednesday 22 January 2003.
Lab 1-20
In-class Lab
Capture / recapture
- END -
Lab 1-21