
Stochastic Processes
Eberhard O. Voit
Integrative Core
Problem Solving with Models
November 2011
Agenda
Definitions and Introductory Example
Binomial (Bernoulli) Process
Poisson Process
(In some respects, these are the ultimate stochastic processes.)
Random Walk
Random Walk as Markov Chain
Mean time to absorption
Diffusion
Random Number Generation
(Other stochastic processes include Markov, normal, queuing, and branching processes.)
“Stochastic”
stoco: Aim, goal, target; guess, solution to a puzzle
stocastiko: Able to hit the target, smart
stocastik: Fortuneteller
K.-G. Hagstroem:
Skandinavisk Aktuarientidskrift 23, 54-57, 1940
Uses: English, Latin, Greek, French, German, Dutch, Swedish, Italian
Stochastic Process
Phenomenon changing over time and containing random component(s)
A mathematical representation of this phenomenon
~ Probabilistic model of a dynamic process
Example:
Draw cards from a deck repeatedly (with or without replacement)
T = {discrete time points}; t ∈ T
At each t an observation is made on the random variable X(t) (e.g., X(5) = Ace)
Then, {X(t): t ∈ T} is a random process or stochastic process (SP)
The process may be multivariate; T is called the index set.
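As a quick illustration, here is a minimal sketch (hypothetical, not from the slides) of one realization of this card-drawing process in Python:

```python
# One realization of the stochastic process {X(t): t in T}:
# draw cards without replacement at discrete times t = 0, ..., 9.
import random

T = range(10)  # discrete index set T
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = [r for r in ranks for _ in range(4)]  # 52 cards (suits ignored)
random.shuffle(deck)

X = {t: deck.pop() for t in T}  # X(t) = card observed at time t
print(X)                        # e.g., X(5) may be "A" (an Ace)
```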
Stochastic Process
Is anything truly stochastic?
Does stochasticity depend on the observer or degree of knowledge?
Examples:
Random number generator
The unfortunate flower pot accident
Stochastic Process
Aims:
Simulation of scenarios
Best, worst, most likely outcomes; outcomes within certain ranges
Make statements about general, average, or collective behavior
Mean and variance, e.g., with respect to time
In many cases, these statements refer to the statistical distribution of
collective outcomes and its properties (mean, variance, moments, …)
or to other distributions, which the target distribution approaches
for large numbers of “trials”
Recall Central Limit Theorem
Stochastic Process
More Aims:
Make predictions (in the face of randomness), based on distributions
Estimate functional form of SP
Estimate key parameters (separate signal from noise)
Note:
Seamless transition to time series analysis (e.g., weather forecasting)
[Figure: sample path of X(t) plotted against time t]
Old Example: Population Growth by Cell Doubling
P_{t+\Delta t} = 2\,P_t
P_{t+2\Delta t} = 2\,P_{t+\Delta t} = 2 \cdot 2 \cdot P_t = 2^2\,P_t
Randomize:
Replace "2" with a random number? Make \Delta t random? Are these equivalent?
Voit and Dick (1983): At division, cells are assigned a cycle duration (probabilistic)
No cell death
Things get out of hand quickly!
Voit, E.O. and G. Dick: Mathem. Biosci. 66, 229-246 and 247-262, 1983
Example: Population Growth by Cell Doubling
Voit and Dick (1983): Example: Cycle durations 5, 6, or 8
Things get out of hand quickly! Linear algebra comes to the rescue!
Example: Population Growth by Cell Doubling
Compute short-term and long-term population sizes: ultimately exponential.
Compute age distributions over time: ultimately stable.
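A minimal Monte Carlo sketch of this model (assuming, for illustration, that each daughter cell draws its cycle duration uniformly from {5, 6, 8}; the slides do not specify the probabilities):

```python
# Event-queue simulation of cell doubling with random cycle durations.
import heapq
import random

def grow(n0=1, t_end=60):
    # One queue entry per living cell: its next division time.
    events = [random.choice([5, 6, 8]) for _ in range(n0)]
    heapq.heapify(events)
    while events[0] <= t_end:
        t = heapq.heappop(events)       # mother divides at time t ...
        for _ in range(2):              # ... and is replaced by two daughters
            heapq.heappush(events, t + random.choice([5, 6, 8]))
    return len(events)                  # population size at t_end

print([grow() for _ in range(5)])       # a few realizations
```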
Other Old Examples:
Dynamics of Red Blood Cells
Original Equations:
R_n = (1 - f) \cdot R_{n-1} + M_{n-1}
M_n = g \cdot f \cdot R_{n-1}
Could make f and g probabilistic
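A sketch of what "making f and g probabilistic" could look like; the nominal values f0 and g0, the initial conditions, and the Gaussian jitter are illustrative assumptions, not values from the slides:

```python
# Red-blood-cell recursion with randomly perturbed f (loss fraction)
# and g (production factor).
import random

f0, g0 = 0.01, 1.0        # assumed nominal parameter values
R, M = 1000.0, 10.0       # assumed initial conditions
for n in range(1, 11):
    f = min(max(random.gauss(f0, 0.002), 0.0), 1.0)  # keep f in [0, 1]
    g = max(random.gauss(g0, 0.05), 0.0)
    # R_n = (1 - f) R_{n-1} + M_{n-1};  M_n = g f R_{n-1}
    R, M = (1 - f) * R + M, g * f * R
    print(f"n={n}: R={R:.1f}, M={M:.2f}")
```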
Leslie Matrix for Growth of Age-Structured Populations
\begin{pmatrix} P_1 \\ P_2 \\ \vdots \\ P_m \end{pmatrix}_{t+\Delta t} = \begin{pmatrix} a_1 & a_2 & \cdots & a_{m-1} & a_m \\ s_1 & 0 & \cdots & 0 & 0 \\ 0 & s_2 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & s_{m-1} & 0 \end{pmatrix} \begin{pmatrix} P_1 \\ P_2 \\ \vdots \\ P_m \end{pmatrix}_t
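A small numerical sketch (fecundities and survival rates are made up for illustration) that iterates a Leslie matrix; it shows the two results stated above: growth becomes exponential at the dominant eigenvalue, and the age distribution stabilizes:

```python
# Power iteration with a 3-age-class Leslie matrix.
import numpy as np

a = [0.0, 1.5, 1.0]      # age-specific fecundities a_1 .. a_m (assumed)
s = [0.8, 0.5]           # survival probabilities s_1 .. s_{m-1} (assumed)
L = np.zeros((3, 3))
L[0, :] = a              # top row: reproduction
L[1, 0], L[2, 1] = s     # subdiagonal: aging with survival

P = np.array([100.0, 0.0, 0.0])
for _ in range(50):
    P = L @ P

print("per-step growth factor:", (L @ P).sum() / P.sum())
print("dominant eigenvalue:   ", max(abs(np.linalg.eigvals(L))))
print("stable age distribution:", P / P.sum())
```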
Markov (Chain) Models
M = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1,m-1} & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2,m-1} & p_{2m} \\ p_{31} & p_{32} & \cdots & p_{3,m-1} & p_{3m} \\ \vdots & \vdots & & \vdots & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{m,m-1} & p_{mm} \end{pmatrix}
Binomial (Bernoulli) Process
Bernoullis: Swiss-Dutch family of mathematicians (over three centuries)
Stochastic processes: Daniel Bernoulli (1700 – 1782)
Sequence of “trials” leading to “success” or “failure”
Flipping coin; checking for faulty items; mutations within DNA sequence
Trials independent of each other; no learning
Number of trials: n
Number of successes: k
Success probability per trial: p
Probability of having exactly k successes:
P(k; n, p) = \frac{n!}{k!\,(n-k)!}\; p^k (1-p)^{n-k}
Bernoulli Process
P(k; n, p) = \frac{n!}{k!\,(n-k)!}\; p^k (1-p)^{n-k}
Examples:
Suppose: Mutation rate per DNA base is 0.0035
What is probability of finding exactly one mutation in a stretch of 10 bases?
P(1; 10, 0.0035) = \frac{10!}{1!\,9!}\; 0.0035^1 \, (1-0.0035)^9 \approx 0.0339
What is probability of finding exactly two mutations in a stretch of 10 bases?
P(2; 10, 0.0035) = \frac{10!}{2!\,8!}\; 0.0035^2 \, (1-0.0035)^8 \approx 0.000537
Bernoulli Process
P(k; n, p) = \frac{n!}{k!\,(n-k)!}\; p^k (1-p)^{n-k}
Examples:
What is probability of not finding a single mutation in a stretch of 10 bases?
P(0; 10, 0.0035) = \frac{10!}{0!\,10!}\; 0.0035^0 \, (1-0.0035)^{10} \approx 0.9655
What is probability of finding more than two mutations in a stretch of 10 bases?
P(k > 2) = 1 - 0.965546 - 0.033913 - 0.000536 \approx 0.000005
(about 5 in a million; note that the probabilities must be carried to more decimal places than shown above)
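The numbers above are easy to reproduce exactly; a minimal sketch using Python's integer binomial coefficients (math.comb):

```python
# Binomial (Bernoulli-process) probabilities for the mutation example.
from math import comb

def bernoulli_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.0035
for k in range(3):
    print(k, bernoulli_pmf(k, n, p))    # ~0.96555, ~0.03391, ~0.000536
print("P(k > 2) =", 1 - sum(bernoulli_pmf(k, n, p) for k in range(3)))
```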
Bernoulli Process
P(k; n, p) = \frac{n!}{k!\,(n-k)!}\; p^k (1-p)^{n-k}
Increase length of DNA; expect more mutations. E.g., n = 1,000
For exactly one mutation:
P(1; 1000, 0.0035) = \frac{1000!}{1!\,999!}\; 0.0035^1 \, (1-0.0035)^{999}
1000! is a real problem!
But: 1000! = 1 · 2 · 3 · … · 1000; 999! = 1 · 2 · 3 · … · 999; 1! = 1
Thus: 1000! / 999! = 1000
This cancellation does not help much when k is in the hundreds, though!
Bernoulli Process
P(k; n, p) = \frac{n!}{k!\,(n-k)!}\; p^k (1-p)^{n-k}
Circumvent problems:
1. Stirling's formula (James Stirling, 1692-1770, Scot):

n! \approx \sqrt{2\pi n}\,\left(\frac{n}{e}\right)^n

\ln(n!) \approx \frac{1}{2}\ln(2\pi) + \left(n + \frac{1}{2}\right)\ln(n) - n

For n = 15: \ln(15!) = 27.8993; the approximation gives 0.9189 + 15.5 \cdot \ln(15) - 15 = 0.9189 + 41.9748 - 15 = 27.8937.
2. Use Poisson process instead of Bernoulli process
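A quick numerical check of Stirling's formula for n = 15, using math.lgamma (ln Γ(n+1) = ln n!) for the exact value:

```python
# Stirling's approximation of ln(n!) versus the exact value.
from math import lgamma, log, pi

def stirling_ln_factorial(n):
    return 0.5 * log(2 * pi) + (n + 0.5) * log(n) - n

n = 15
print("exact ln(15!):   ", lgamma(n + 1))             # ~27.8993
print("Stirling approx.:", stirling_ln_factorial(n))  # ~27.8937
```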
Poisson Process
Siméon Denis Poisson (1781 – 1840)
Poisson distribution first published in 1837: Recherches sur la probabilité
des jugements en matière criminelle et en matière civile (“Research on the
Probability of Judgments in Criminal and Civil Matters”).
Classical examples:
1. Decay of a radioactive sample: decay is believed to be random;
once a particle has decayed, it does not decay again.
2. Raindrops on a tiled sidewalk
3. Mutations within DNA sequence
4. Events in continuous spaces (number of rust spots on a
metal rod)
Poisson Process
If the expected number of occurrences in a given interval is \lambda, then the
probability that there are exactly k occurrences (k = 0, 1, 2, ...) is equal to

P(k; \lambda) = \frac{e^{-\lambda}\,\lambda^k}{k!}
Example from before:
Suppose: Mutation rate per DNA base is 0.0035
What is probability of finding exactly one mutation in a stretch of 10 bases?
Bernoulli:
P = 0.0339
For large numbers of trials and/or small rates, the Bernoulli process is well
approximated by the Poisson process:
Poisson: “rate” with respect to 10 nucleotides is 0.035 (no n in formula)
P(1; 0.035) = e^{-0.035} \cdot 0.035^1 / 1! = 0.9656 \cdot 0.035 \approx 0.0338
Poisson Process
Example where approximation is not good
Suppose: Mutation rate per base is 0.2
What is probability of finding exactly three mutations in a stretch of 10
bases?
Bernoulli: P = \frac{10!}{3!\,7!}\; 0.2^3 \cdot 0.8^7 \approx 0.2013

Poisson: P = \frac{e^{-2}\,2^3}{3!} \approx 0.1805

(about 10% error)

P(k; \lambda) = \frac{e^{-\lambda}\,\lambda^k}{k!}
Poisson Process
Example from before:
Probability of one mutation in a 1,000-nucleotide segment
Bernoulli:
P(1; 1000, 0.0035) = \frac{1000!}{1!\,999!}\; 0.0035^1 \, (1-0.0035)^{999}
= 1000 \cdot 0.0035 \cdot 0.9965^{999}
= 3.5 \cdot 0.0301 = 0.1054
Poisson: rate with respect to 1,000 nucleotides is \lambda = 3.5
P(1; 3.5) = e^{-3.5} \cdot 3.5^1 / 1! \approx 0.1057
Poisson Process
Example from before:
Probability of 25 mutations in a 1,000-nucleotide segment
Bernoulli:
P(25; 1000, 0.0035) = \frac{1000!}{25!\,975!}\; 0.0035^{25} \, (1-0.0035)^{975}
= a mess; risk of inaccuracies due to large numbers
Poisson:
P(25; 3.5) = e^{-3.5} \cdot 3.5^{25} / 25! \approx 7.78 \times 10^{-14}
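One way to sidestep this "mess" is to evaluate both pmfs in log space; a sketch (the use of math.lgamma is an implementation choice, not from the slides) that reproduces the k = 1 and k = 25 comparisons:

```python
# Binomial vs. Poisson pmf, computed via logarithms to avoid 1000!.
from math import exp, lgamma, log

def log_binom_pmf(k, n, p):
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

def log_poisson_pmf(k, lam):
    return -lam + k * log(lam) - lgamma(k + 1)

n, p = 1000, 0.0035
lam = n * p  # 3.5
for k in (1, 25):
    print(k, exp(log_binom_pmf(k, n, p)), exp(log_poisson_pmf(k, lam)))
# k = 1: ~0.1054 vs ~0.1057; k = 25: both on the order of 1e-13.
```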
Poisson Process
P(k; \lambda) = \frac{e^{-\lambda}\,\lambda^k}{k!}

Notes:
1. Both mean and variance are \lambda.
2. The normal distribution with mean \lambda and standard deviation \sqrt{\lambda} is an excellent approximation for large \lambda.
3. \lambda can be made a function of time: non-homogeneous Poisson process.
HW:
1. Compute the probability (Bernoulli and Poisson) that there are exactly 3 or 4
mutations in a stretch of 1,000 nucleotides.
2. Compute probability that there are at least 4 mutations.
Random Walk (Gambling Model)
Play game; at every discrete instance: win or lose
Game: win $1, lose $1; probability of winning or losing is p or q = 1-p, respectively
Fair game: p = q = 0.5
Formulate as SP
[Diagram: states 0, 1, ..., i-1, i, i+1, ..., N on a line. From each "transient" state i, the walk moves to i+1 with probability p and to i-1 with probability q. States 0 and N are "absorbing" (self-transition probability 1).]
Random Walk (Gambling Model)
Define transition probabilities
P_{i,i+1} = p; \quad P_{i,i-1} = q = 1-p \quad (for i = 1, 2, …, N-1)
P_{00} = P_{NN} = 1
Can formulate this as Markov Chain
 1 0 0 ...

 q 0 p ...
M   0 q 0 ...

   

0 0 0 0
0

0
0



1
Expected (mean) duration of a fair game (time to absorption):
Tk = k (N-k), where k is the initial fortune.
N = 100, k = 50: Tk = 2500; variance large
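A Monte Carlo sketch (with the parameters from the example above) that checks the mean absorption time T_k = k(N-k):

```python
# Fair gambler's ruin: average time until the fortune hits 0 or N.
import random

def time_to_absorption(k, N, p=0.5):
    t = 0
    while 0 < k < N:
        k += 1 if random.random() < p else -1
        t += 1
    return t

runs = [time_to_absorption(k=50, N=100) for _ in range(200)]
print("mean duration:", sum(runs) / len(runs))  # ~2500, with large variance
```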
HW: Start with a budget of $5. Flip coins* (win/loss) until broke or MaxWin;
MaxWin = $8, $10, or $20
* Real coin or computer program
Random Walk
Notes:
The general random walk usually has no absorbing states; the index set is infinite.
1-dim lattice:
Probability = 1 that the process hits any particular state
Probability = 1 that the process returns to a state it left
Given infinite time, the process hits any state infinitely many times!
2-dim lattice:
Probability = 1 that the process hits any particular state
Probability = 1 that two particles meet
3-dim lattice:
Probability < 1 that the process hits a particular state
Probability < 1 that two particles meet
(Pólya, 1926)
Diffusion (Random Walk (Brownian Motion) of Particles)
[Diagram: positions x-Δx, x, x+Δx on a line; a particle hops left with probability L and right with probability R.]

C(x, t)·Δx = number of particles within the segment [x, x+Δx] at time t

C(x, t+\Delta t) = C(x, t) + R\,C(x-\Delta x, t) - R\,C(x, t) + L\,C(x+\Delta x, t) - L\,C(x, t)
Use Taylor approximation (w.r.t. t and x):

C(x, t+\Delta t) = C(x, t) + \frac{\partial C}{\partial t}\,\Delta t + \frac{1}{2}\,\frac{\partial^2 C}{\partial t^2}\,\Delta t^2 + \ldots \text{(HOT)}

C(x \pm \Delta x, t) = C(x, t) \pm \frac{\partial C}{\partial x}\,\Delta x + \frac{1}{2}\,\frac{\partial^2 C}{\partial x^2}\,\Delta x^2 + \ldots \text{(HOT)}
Assume L = R = 0.5 and substitute in
C( x, t  t )  C( x, t )  RC( x  Dx, t )  RC( x, t )
 LC( x  Dx, t )  LC( x, t )
Result:
C
1  2C 2
1  2C 2
t
t  ... 
Dx  ...
2
2
t
2 t
2 x
C
1  2C 2
1  2C 2
t
t  ... 
Dx  ...
2
2
t
2 t
2 x
Divide by \Delta t and look at the limit \Delta t \to 0, \Delta x \to 0 such that \frac{(\Delta x)^2}{2\,\Delta t} = D remains constant:

\frac{\partial C}{\partial t} = \frac{(\Delta x)^2}{2\,\Delta t}\,\frac{\partial^2 C}{\partial x^2}

\frac{\partial C}{\partial t} = D\,\frac{\partial^2 C}{\partial x^2}

This is the typical diffusion equation (PDE), also known as the heat equation; it has exponential and trigonometric solutions.
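A sketch tying the two pictures together numerically: simulate many independent ±1 random walks and compare their empirical variance with 2Dt from the diffusion equation (step sizes Δx = Δt = 1, hence D = 0.5; the particle and step counts are illustrative):

```python
# Random-walk cloud of particles versus the diffusion-equation prediction.
import random

steps, particles = 200, 5000
positions = [sum(random.choice((-1, 1)) for _ in range(steps))
             for _ in range(particles)]

var = sum(x * x for x in positions) / particles   # mean is 0 by symmetry
print("empirical variance:", var)
print("theory, 2 D t:     ", 2 * 0.5 * steps)     # = 200
```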
Random Number Generation
Monte Carlo simulations
Random numbers should collectively follow a particular distribution
Can use uniformly random numbers and convert them
One method: Use S-distribution with random variable X; cdf F is solution of:
\frac{dF}{dX} = a\,(F^g - F^h), \qquad F(X_0) = 0.5
Logistic function is a special case with g = 1, h = 2, K = 1
Note that pdf f = dF/dX
Voit, E.O.: Biometrical J. 34 (7), 855-878, 1992.
Voit, E.O., and L.H. Schwacke: Risk Analysis 20(1): 59-71, 2000.
Flexibility in Shape
[Figure: three representative S-distributions with a = 1 and F(0) = 0.01; from left to right: g = 0.25, h = 0.5; g = 0.75, h = 1.5; g = 1.2, h = 5.0. Vertical axis 0 to 0.5; horizontal axis 0 to 20.]
Random Number Generation
Task: Determine random numbers that collectively form a desired distribution
1. Model distribution as S-distribution
2. Random numbers will be distributed according to this distribution if their cdf
values are uniformly distributed
3. Select uniformly between 0 and 1 on the Y-axis and use corresponding X-value
[Figure: a uniform random number RU ~ U(0,1) on the vertical (cdf) axis is mapped through the cdf onto an S-distributed value RS ~ S(a, g, h, F0) on the horizontal axis.]
Express X as a function of F:

\frac{dF}{dX} = a\,(F^g - F^h) \quad \Rightarrow \quad \frac{dX}{dF} = \frac{1}{a\,(F^g - F^h)}

"Quantile equation"

Compute the solution once; create a comprehensive table of values (F, X); then
look up S-distributed random numbers (corresponding to the desired distribution).
Note: S-distribution is an approximation, but often sufficiently good.
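A sketch of the table-based sampler described above; the grid resolution, the simple Euler integration, and the parameter values are implementation choices for illustration, not prescriptions from the slides:

```python
# Build an (F, X) table by integrating the quantile equation
# dX/dF = 1 / (a (F^g - F^h)), then sample via the inverse cdf.
import numpy as np

a, g, h, X0 = 1.0, 0.75, 1.5, 0.0        # assumed S-distribution parameters
dF = 0.001
F_tab = np.arange(dF, 1.0 - dF / 2, dF)  # stay away from F = 0 and F = 1
X_tab = np.empty_like(F_tab)

i0 = np.argmin(np.abs(F_tab - 0.5))      # anchor: F(X0) = 0.5
X_tab[i0] = X0
for i in range(i0 + 1, len(F_tab)):      # Euler march toward F -> 1
    X_tab[i] = X_tab[i - 1] + dF / (a * (F_tab[i - 1]**g - F_tab[i - 1]**h))
for i in range(i0 - 1, -1, -1):          # Euler march toward F -> 0
    X_tab[i] = X_tab[i + 1] - dF / (a * (F_tab[i + 1]**g - F_tab[i + 1]**h))

# Uniform cdf values -> S-distributed random numbers (table lookup).
u = np.random.uniform(F_tab[0], F_tab[-1], size=1000)
samples = np.interp(u, F_tab, X_tab)
print(samples.mean(), samples.std())
```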
HW: Create at least two different S-distributions; solve quantile equation;
draw 100 random numbers; confirm that they are correctly distributed.