Statistical Modeling of SARS
Epidemic Propagation via
Branching Processes
V.Kamalesh, V.Kuralmani, Goh Li Ping, Qian Long, Fu Xiuju, Terence Hung
Software & Computing Programme
Institute of High Performance Computing
“To succeed in containing SARS in Singapore, everyone must
cooperate and play his part.” - Prime Minister Goh Chok Tong
History of Branching Process
The study of branching processes originated
with a mathematical puzzle posed by Sir
Francis Galton, the noted cousin of Charles
Darwin, in the Educational Times of 1 April
1873.
A branching process may be viewed as a
mathematical representation of the evolution of
a population in which reproduction and
death are subject to the laws of chance.
Galton’s Puzzle
A large nation, of whom we will only concern ourselves with the
adult males, N in number, and who each bear separate surnames,
colonise a district. Their law of population is such that, in each
generation, P0 per cent of the adult males have no male children
who reach adult life; P1 have only one such male child; P2 have 2,
and so on up to P5 who have 5.
Find
(1) What proportion of the surnames will have become extinct after
r generations; and
(2) how many instances there will be of the same surname being
held by m persons
A solution was proffered by the Rev. Henry William Watson, and
from his 1874 joint paper with Galton the mathematical tool of
branching, the Galton-Watson process, emerged.
Examples of BP
Propagation of human and animal species and
genes
Nuclear chain reaction
Electronic cascade phenomena
Epidemic Models
Branching Process
[Figure: example branching process tree with generation sizes X0 = 1, X1 = 3, X2 = 9, X3 = 26 and per-individual offspring counts labelled Y1, Y2, …]
Bienayme-Galton-Watson BP
Bienayme-Galton-Watson BP can be thought of as a
stochastic model of an evolving population of particles
or individuals.
It starts at time 0 with Z(0) particles, each of which splits
into a random number of offspring that constitute the
first generation, and so on.
The number of “offspring” produced by a single
“parent” particle at any time is independent of the
history of the process, and of other particles existing at
the present.
The archetypal branching process (Galton-Watson):
Discrete reproduction periods (‘generations’; no
overlap or parents equivalent to offspring)
1 type of individuals, with identical offspring
distribution
They do not affect each other’s reproduction
Distributions of offspring numbers do not change in
time
BP as an epidemic Model
Branching processes can be adopted as models for the
spread of epidemic diseases.
Infections directly due to an infective are the offspring
One can approximate the infective population during the
early stages of the epidemic by a branching process
Minor epidemic: Extinction of the branching process
Major epidemic: Non-extinction of the branching process
Specification & standard details
A Galton-Watson process $\{x_n;\ n = 0, 1, 2, \dots\}$ is a Markov chain
defined on a probability space $(\Omega, \Gamma, P)$ with state space $\Delta = \{0, 1, \dots\}$,
and it has the representation
$x_0 = N$, some specified positive integer,
$x_1 = \xi_1 + \xi_2 + \dots + \xi_{x_0}$
$x_2 = \xi_{x_0+1} + \xi_{x_0+2} + \dots + \xi_{x_0+x_1}$
$\vdots$
$x_n = \xi_{x_0+x_1+\dots+x_{n-2}+1} + \dots + \xi_{x_0+x_1+\dots+x_{n-1}}$
and $x_n = 0$ if $x_{n-1} = 0$, $n \ge 1$,
where $\xi_i$, $i = 1, 2, \dots$, are independent and identically distributed (iid)
non-negative integer-valued random variables on $(\Omega, \Gamma, P)$ and their common
probability law is given by
$P(\xi_i = k) = p_k$, $k = 0, 1, \dots$; $\sum_k p_k = 1$
The Model
A Galton-Watson process is a Markov chain
{X(n); n ≥ 0} on the non-negative integers,
where for n ≥ 0
X(n+1) = ξ(n+1,1) + … + ξ(n+1,X(n)) if X(n) > 0
X(n+1) = 0 if X(n) = 0
and {ξ(n,r); r,n ≥ 1} are independent random
variables, identically distributed like ξ (say) and
with other additional assumptions. Also
E(ξ) = m
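The recursion above can be sketched as a short simulation. This is a minimal illustration, not the program used for the results in these slides; the function name, the `offspring_pmf` argument and the example distribution are all hypothetical.

```python
import random

def simulate_galton_watson(offspring_pmf, generations, x0=1, rng=None):
    """Return [X(0), X(1), ..., X(generations)] for one simulated path.

    offspring_pmf[k] is P(xi = k); it is a hypothetical input,
    not a distribution taken from the slides.
    """
    rng = rng or random.Random()
    outcomes = list(range(len(offspring_pmf)))
    sizes = [x0]
    for _ in range(generations):
        current = sizes[-1]
        if current == 0:
            # Extinction is absorbing: X(n+1) = 0 whenever X(n) = 0.
            sizes.append(0)
            continue
        # X(n+1) is the sum of X(n) iid offspring counts.
        children = rng.choices(outcomes, weights=offspring_pmf, k=current)
        sizes.append(sum(children))
    return sizes

# One path of a hypothetical process with offspring mean 0.2 + 2 * 0.3 = 0.8.
print(simulate_galton_watson([0.5, 0.2, 0.3], generations=10, rng=random.Random(1)))
```

Passing a seeded `random.Random` makes a path reproducible; a degenerate distribution such as `[0.0, 0.0, 1.0]` (always two offspring) gives the deterministic doubling 1, 2, 4, 8, which is a convenient sanity check.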
Offspring mean (m)
Since the offspring mean of a branching process
indicates almost sure extinction or possible explosion
of a population, there is considerable interest in
knowing the value of this criticality parameter (growth
rate parameter, basic reproductive rate)
The offspring mean (m) is also known as the infection
rate and its estimation is of great interest
The problem of estimation of ‘m’ arises when we deal
with the problem of determining vaccination policies
aimed at preventing major epidemics
Estimation of offspring mean
Galton-Watson BP is classified as:
Sub-critical if m < 1 (always extinction, finite expected time to
extinction)
Critical if m = 1 (always extinction, infinite expected time to
extinction)
Super-critical if m > 1 (probability of extinction smaller than 1)
Offspring mean indicates the (almost) sure extinction or possible
explosion of a population
One of the basic problems of the statistics of a G-W process is to
find a ‘good’ estimator for m
Estimation methods:
MLE, Least-squares, Ratio, Moment type, Bayes, etc.
Probability of extinction
A parameter of special interest is the following:
$q = P\left(\bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} \{x_k = 0\}\right) = P(x_n \to 0) = P(E)$ (say)
This is referred to as the probability of extinction of a
G-W process with x0 = 1
It can be verified that:
q = 1 if m ≤ 1, and q < 1 if m > 1
Estimation of q is relevant when one is dealing with the
recognition of a new mutation in a genetic population
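The statement about q can be made concrete with the offspring probability generating function. The symbols f, s and q_n below are introduced here for illustration and do not appear in the slides; the relations are the standard ones for a process started with x_0 = 1.

```latex
f(s) = \sum_{k=0}^{\infty} p_k s^k, \quad 0 \le s \le 1,
\qquad m = f'(1) = \sum_{k=0}^{\infty} k\, p_k

q_n = P(x_n = 0), \qquad q_0 = 0, \qquad q_n = f(q_{n-1}),
\qquad q = \lim_{n \to \infty} q_n

q \text{ is the smallest root of } f(s) = s \text{ in } [0, 1]
```

Iterating q_n = f(q_{n-1}) from q_0 = 0 therefore gives the probability of extinction by generation n, and the limit gives q.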
Immigration Process
Estimation of the offspring mean ‘m’ breaks down in the
sub-critical case (when 0 < m < 1), in view of extinction
being almost certain in such situations.
The introduction of an immigration process into the
system facilitates the estimation of the offspring and
immigration mean under the sub-critical case.
The analysis of a G-W process with immigration has
some interesting conclusions: for example, if the mean of
the offspring distribution is > 1, immigration makes very
little difference to the eventual behaviour of the process.
BP with immigration
The simple subcritical G-W process X = {X(t); t = 0,1,2,…}
with immigration has the specification that X(0) is a non-negative integer-valued random variable, and for t ≥ 1,
X(t) = z(t,1) + … + z(t,X(t-1)) + Y(t) if X(t-1) > 0
X(t) = Y(t) if X(t-1) = 0
and {z(t,r); r,t ≥ 1} are independent random variables,
identically distributed like z (say) and with other
additional assumptions.
Y(t) is the immigration component
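To illustrate how immigration keeps a subcritical process alive long enough to estimate both means, the sketch below simulates the recursion and fits a least-squares line of X(t) on X(t-1). The two distributions, the function names and the choice of least squares are assumptions made for this example; the slides do not say which estimator was intended.

```python
import random

def simulate_with_immigration(offspring_pmf, immigration_pmf, steps, x0=0, seed=0):
    """Simulate X(t) = z(t,1) + ... + z(t,X(t-1)) + Y(t) for t = 1..steps."""
    rng = random.Random(seed)
    z_values = list(range(len(offspring_pmf)))
    y_values = list(range(len(immigration_pmf)))
    path = [x0]
    for _ in range(steps):
        previous = path[-1]
        # Offspring of the previous generation (an empty sum when X(t-1) = 0).
        offspring = 0
        if previous > 0:
            offspring = sum(rng.choices(z_values, weights=offspring_pmf, k=previous))
        immigrants = rng.choices(y_values, weights=immigration_pmf)[0]
        path.append(offspring + immigrants)
    return path

def least_squares_means(path):
    """Regress X(t) on X(t-1): the slope estimates the offspring mean,
    the intercept estimates the immigration mean."""
    xs, ys = path[:-1], path[1:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m_hat = sxy / sxx
    return m_hat, mean_y - m_hat * mean_x

# Hypothetical distributions: offspring mean 0.5, immigration mean 1.0.
path = simulate_with_immigration([0.6, 0.3, 0.1], [0.3, 0.4, 0.3], steps=20000)
m_hat, lam_hat = least_squares_means(path)
print(round(m_hat, 2), round(lam_hat, 2))
```

On a long path the two estimates should land close to the true means 0.5 and 1.0, whereas without the Y(t) term the same process would die out after a few generations and leave almost no data.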
Data Source
The data was taken from the following
website:
http://sarstracker.blogspot.com/
(source: Straits Times 12 April 2003).
After careful study of the data, we
transformed it into a format which could be
used to fit the Galton-Watson branching
process.
Singapore SARS Data
SARS Tree Diagram
[Figure: transmission tree of the Singapore SARS cases, starting from Super Spreader 1 (Esther Mok) and linking the cases (family members, friends, hospital and health care staff, patients in the same ward, visitors) generation by generation; Super Spreaders 4 and 5 are also marked]
Methodology
Study the links between the SARS affected patients
and identify the generation they belong to.
For example, z(0) is the initial number of patients,
z(1) the next generation and so on.
Hence z(0) is the parent and z(1) is the offspring for
the first generation. Similarly z(1) is the parent and
z(2) is the offspring for the second generation
The parents are the infectives and the offspring are the
infections they cause
Methodology (Cont.)
Calculate the following probabilities:
p(0) – probability of 0 person infected
p(1) – probability of 1 person infected
p(2) – probability of 2 persons infected
p(3) – probability of 3 persons infected
p(4) – probability of 4 or more persons infected
(super spreader)
Determine the time period and fit the Galton-Watson branching
process
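The probability calculation amounts to tallying how many people each infective infected directly and pooling the counts of 4 or more. A minimal sketch follows; the function name and the example tally are hypothetical and are not the Singapore data.

```python
from collections import Counter

def offspring_distribution(offspring_counts, cap=4):
    """Return [p(0), ..., p(cap)], where p(cap) pools counts of cap or more."""
    pooled = Counter(min(count, cap) for count in offspring_counts)
    total = len(offspring_counts)
    return [pooled[k] / total for k in range(cap + 1)]

# Hypothetical tally of direct infections per infective (not the Singapore data).
counts = [0, 0, 0, 0, 0, 0, 1, 1, 2, 7]
print(offspring_distribution(counts))
```

In this made-up tally the single infective with 7 direct infections falls into the pooled "4 or more" class, so p(4) = 0.1.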
Generation Size
Z(n) is the size of generation n, for the generations Z(0) to Z(5).
The generations have been colour-banded to show clearly the
number of offspring at each point. For example, Z(4) = 17.
The population size of each generation is:
Generation   Size   Females   Males
Z(0)            1         1       0
Z(1)           25        14      11
Z(2)           36        21      15
Z(3)           72        46      26
Z(4)           17        10       7
Z(5)            6         4       2
Total         157        96      61
61.2% of the SARS infected are females and 38.8% of them are males.
Super Infectors
Super Spreader   Sex      No. infected directly
1                Female   25 (14 female + 11 male)
2                Female   23 (13 female + 10 male)
3                Female   24 (18 female + 6 male)
4                Male     43 (25 female + 18 male)
5                Male     11 (4 female + 7 male)
Probability Calculation
p(0) – probability of 0 person infected = 0.8344
p(1) – probability of 1 person infected = 0.0927
p(2) – probability of 2 persons infected = 0.01986
p(3) – probability of 3 persons infected = 0.01986
p(4) – probability of 4 or more persons infected (super
spreader) = 0.0331
The mean of the offspring distribution is 1.0331
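The slides do not state how this mean was obtained. One estimator that reproduces it is the ratio of total offspring to total parents, applied to the generation sizes reported in the Generation Size slide; treating this as the authors' method is an assumption, and the function name is hypothetical.

```python
def ratio_estimate(generation_sizes):
    """Total offspring divided by total parents: sum Z(1..n) / sum Z(0..n-1)."""
    parents = sum(generation_sizes[:-1])
    offspring = sum(generation_sizes[1:])
    return offspring / parents

# Generation sizes Z(0)..Z(5) as reported in the Generation Size slide.
sizes = [1, 25, 36, 72, 17, 6]
print(round(ratio_estimate(sizes), 4))  # 156 / 151, which rounds to 1.0331
```

The last generation is excluded from the parents because its offspring have not yet been observed.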
Software
To model the SARS epidemic we use a Java program
which simulates a single-type BP and computes the
extinction probabilities.
In this program we specify the offspring distribution
of the BP and "Maximum generations", the number
of generations over which we wish to observe the BP.
The program computes and displays the probabilities
that the branching process will die out by generation g,
for g = 1 to Maximum Generations.
Source: Written by Julian Devlin, 8/97, for the text book “Introduction to
Probability”, by Charles M. Grinstead & J. Laurie Snell
Probability of extinction
We set the maximum generations to 30 and the results are:
Generation   Extinction Probability
1            0.83400005
2            0.9530404
3            0.98533565
4            0.99529344
5            0.99847656
6            0.9995056
7            0.9998395
8            0.99994797
9            0.99998313
10           0.9999946
11           0.99999833
12           0.9999995
13           0.9999999
14           1.0
15           1.0
16           1.0
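The tabulated values can be approximated by iterating the offspring probability generating function. The sketch below is not the Java program cited above; it assumes the inputs were the slide's probabilities rounded to three decimals, with the "4 or more" class treated as exactly 4 offspring. Both are assumptions, chosen because they reproduce the first rows of the table to about six decimals.

```python
def extinction_by_generation(offspring_pmf, max_generations):
    """Return [q_1, ..., q_max], where q_g = P(process extinct by generation g)."""
    def pgf(s):
        # Probability generating function f(s) = sum_k p_k * s**k.
        return sum(p * s ** k for k, p in enumerate(offspring_pmf))

    probabilities = []
    q = 0.0  # q_0 = 0 because the process starts with one individual
    for _ in range(max_generations):
        q = pgf(q)  # q_g = f(q_{g-1})
        probabilities.append(q)
    return probabilities

# Assumed inputs: the slide's probabilities rounded to three decimals,
# with "4 or more" treated as exactly 4 offspring.
pmf = [0.834, 0.093, 0.02, 0.02, 0.033]
for g, q in enumerate(extinction_by_generation(pmf, 5), start=1):
    print(g, round(q, 5))
```

With these assumed inputs the capped distribution has a mean of about 0.325, which is why the iteration converges to 1; a distribution that kept the actual super-spreader counts instead of capping them at 4 could behave differently.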
Probability of Extinction of the SARS epidemic
[Figure: probability of extinction (vertical axis, 0.75 to 1.05) against generation (horizontal axis, 1 to 19)]
Some Conclusions
The probability that the SARS epidemic will eventually
become extinct is 1.
This is likely to happen by the 14th generation.
Since the data already cover 5 generations,
there can be at most 9 more generations.
Assuming each generation takes a maximum of 10 days,
the given data imply that the epidemic will last for at
most 90 more days from 8 April 2003.
This result is conditional upon the environment
and quarantine conditions remaining the same.
Other related work @ IHPC
Auto-Regressive (AR) model
• Assumptions
Every time series consists of both deterministic
and stochastic components.
The deterministic component gives rise to trends,
seasonal patterns and cycles.
The stochastic component causes statistical
fluctuations which have a short-term correlation
structure.
Auto-Regressive (AR) model
• Methodology
– Step 1: determine the maximum number of the sample data
– Step 2: calculate the mean value of the sample data for
previous time
– Step 3: estimate the unknown parameters from historical
data
– Step 4: use the estimated parameters to predict future case
numbers
• Software
– In-house software written in FORTRAN has been
developed. It is compatible with Windows and UNIX
systems
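The estimate-then-predict steps above can be sketched for the simplest case, a first-order model fitted by least squares. The order of the model, the function names and the example series are assumptions; the in-house FORTRAN software is not reproduced here.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1] + noise; returns (c, phi)."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    phi = sxy / sxx
    return mean_y - phi * mean_x, phi

def predict_ahead(last_value, c, phi, steps):
    """Iterate the fitted recursion to forecast `steps` days ahead."""
    value = last_value
    for _ in range(steps):
        value = c + phi * value
    return value

# Hypothetical case counts generated by x[t] = 6.5 + 0.75 * x[t-1],
# not the Singapore series.
cases = [10.0, 14.0, 17.0, 19.25, 20.9375]
c, phi = fit_ar1(cases)
print(predict_ahead(cases[-1], c, phi, steps=2))
```

Setting `steps=2` or `steps=3` corresponds to the two-day and three-day predictions reported in the results.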
Auto-Regressive (AR) model
Result: two-day prediction
Use the previous data to predict the data two days later
[Figure: "Two day prediction", predicted and observed series; vertical axis 0 to 200; horizontal axis "Day number of patient starting from Mar 16", 0 to 50]
Auto-Regressive (AR) model
Result: three-day prediction
Use the previous data to predict the data three days later
[Figure: "Three day prediction", predicted and observed series; vertical axis 0 to 200; horizontal axis "Day number of patient starting from Mar 16", 0 to 50]
Future Research …
A Time Series approach to the study of a Branching Process
Motivation: Venkataraman, K. N. (1982) A Time Series approach to the
study of the simple subcritical Galton-Watson process with
immigration, Adv. Appl. Prob., 14, 1-20.
Let ε(t) = 0 for t<0; ε(0) = X(0); and for t≥1,
ε(t) = X(t) – m X(t-1) – λ
Heyde and Seneta (1972) were the first to observe that the above
equation is analogous to the first-order autoregressive model for time
series
Vital difference: In BP ε(t) is determined by X(t) whereas in the
analogous time series model X(t) will be determined in terms of ε(t)
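The analogy can be written out explicitly. The derivation below assumes that m = E(z) is the offspring mean, that λ is the immigration mean E(Y(t)), and that Y(t) is independent of the past; the slides do not spell out these definitions.

```latex
E\bigl[X(t) \mid X(t-1)\bigr] = X(t-1)\,E(z) + E\bigl(Y(t)\bigr) = m\,X(t-1) + \lambda

X(t) = m\,X(t-1) + \lambda + \varepsilon(t), \qquad
E\bigl[\varepsilon(t) \mid X(t-1)\bigr] = 0, \quad t \ge 1
```

This has the form of a first-order autoregression with intercept λ, coefficient m and zero-mean innovations ε(t).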
Thank you !!!