Transcript pdf file

Why CpG islands?
CSE, Marmara University
mimoza.marmara.edu.tr/~m.sakalli/cse546
Including some slides of Papoulis. These notes will be further modified.
Dec/15/09
Notes on probability are from A. Papoulis and S. U. Pillai.
RNA interference, and DNA methylation (R–COOH → R–COOCH3)
• Methylation is involved in the regulation of gene expression, protein
function, and RNA metabolism.
• A cell is a combination of numerous proteins, each determining how the cell
functions. Disproportionately expressed proteins will have devastating effects.
• Two possible vulnerabilities:
– one is at the transcriptional level: while DNA is converted to mRNA, a fraction of
antisense oligonucleotide binds to the unprocessed gene in the DNA and creates a three-strand complex, as a result blocking the transcription process,
– and the second vulnerability is at the level of translation. Translation is a
ribosome-guided process for manufacturing a protein from mRNA. There, once
antisense RNA hybridizes to the mRNA, protein generation is inhibited, since the editing
enzymes that splice introns from RNAs are blocked. RNase H recognizes the double-helix
complex of the antisense bound to the mRNA, somehow frees the antisense strand, and
cleaves the mRNA.
• Antisense therapy: used against HIV, influenza, and cancer, where replication and
transcription are targeted.
RNA interference, and DNA methylation (R–COOH → R–COOCH3)
• RNA interference (RNAi) is a system controlling (either increasing or
decreasing) the activity of RNAs. MicroRNA (miRNA) and small interfering
RNA (siRNA) are direct products of genes and can bind to other
specific RNAs. They play roles in defending cells against parasitic genes –
viruses and transposons – but also in gene expression in general. It is universal.
• The methylation process differs in prokaryotic and eukaryotic cells: in the former
it occurs at the C5 of the cytosine pyrimidine ring and at the N6 nitrogen of the
adenine purine ring, while in the latter it occurs at the C5 carbon of
cytosine pyrimidine sites.
• In mammals, methylation occurs at the C5 of the CpG dinucleotide. CpG makes up
about 1% of the human genome, and most CpG sites are methylated. Unmethylated CpG
islands are present in the regulatory regions of genes, including promoter regions;
when these become methylated, transcription and protein production are impeded
(via chromatin and histones).
• One abnormality caused by incomplete methylation, for example, is Rett
syndrome, an epigenetic abnormality. Methylated histones hold DNA tightly and
block transcription.
• The occurrence of CpG sequences is the least frequent in many
genomes, rarer than would be expected from the independent
probabilities of C and G. This is because the C in a CpG has a
tendency to methylate and become methyl-C, and the methylation
process is suppressed in areas around genes; hence these areas have
a relatively higher concentration of CpG, in islands (see the small
counting sketch after this list).
• Epigenetic importance: methyl-C has a high chance of mutating
to T, and is therefore important in epigenetic inheritance, as well as
in controlling gene expression and regulation.
• Questions: how close is a short sequence to being a CpG island, what is
the likelihood of a long sequence containing one or more CpG
islands, and, more importantly, is the relation it bears coincidental or
due to some functional reason?
• Therefore Markov chains.
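As a rough illustration of the under-representation mentioned above (not part of the original notes; the sequence is made up), one can compare the observed CpG frequency with the frequency expected if C and G occurred independently:

```python
# Observed vs. expected CpG ("C followed by G") frequency in a made-up sequence.
seq = "TTACGCGGCTTAACGTAGCGCGATATTTACGT"   # hypothetical example sequence

n = len(seq)
p_c = seq.count("C") / n                     # marginal frequency of C
p_g = seq.count("G") / n                     # marginal frequency of G
obs_cpg = sum(seq[i:i + 2] == "CG" for i in range(n - 1)) / (n - 1)
exp_cpg = p_c * p_g                          # expected if C and G were independent

print(obs_cpg / exp_cpg)                     # obs/exp ratio; well below 1 in most genomic DNA
```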
A Markov chain is a stochastic process, a discrete process { Xn } where
n ∈ { 0, 1, 2, . . . }, with the Markov property: the conditional
probability distribution of the future state depends only upon the
current state (or, for an m-th order chain, upon a fixed number m of
past states), and not on the rest of the history. A continuous-time MC
has a continuous time index.
Pr{ X_{m+1} = j | X_0 = k_0, . . . , X_{m-1} = k_{m-1}, X_m = i } =
Pr{ X_{m+1} = j | X_m = i }  → the transition probabilities,
for every i, j, k_0, . . . , k_{m-1} and for every m.
(Compare: a finite state machine; an i.i.d. sequence.)
Stationary (time-homogeneous): for all n, the transition matrix does not
change over time, and the next state depends only on the current state i
and not on the previous states:
Pr{ X_{n+1} = j | X_n = i } = Pr{ X_1 = j | X_0 = i }.
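As a concrete illustration of these definitions (a sketch, not from the notes; the transition matrix values are made up), the following simulates a time-homogeneous Markov chain and checks that the empirical one-step transition frequencies approach the fixed matrix P:

```python
import numpy as np

# Made-up one-step transition matrix for states S = {0, 1, 2}; rows sum to 1.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])
rng = np.random.default_rng(0)

def simulate(P, x0, n_steps):
    """Simulate a time-homogeneous Markov chain: the next state depends only on the current one."""
    path = [x0]
    for _ in range(n_steps):
        path.append(rng.choice(len(P), p=P[path[-1]]))
    return np.array(path)

path = simulate(P, x0=0, n_steps=200_000)

# Empirical one-step transition frequencies; these should be close to P.
counts = np.zeros_like(P)
for i, j in zip(path[:-1], path[1:]):
    counts[i, j] += 1
print(counts / counts.sum(axis=1, keepdims=True))
```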
[State-transition diagram: states 0–4, with edge probabilities (0.5) and (1).]
The one-step transition matrix for a Markov chain with
states S = { 0, 1, 2 } is [………, …, …],
where Pr{ X_1 = j | X_0 = i } = p_ij ≥ 0.
Accessibility: a Markov chain is irreducible if it is possible to
communicate between any two states i and j, i.e., all states communicate;
an irreducible, aperiodic (finite) chain is ergodic.
Periodic if it returns to the same state at every k steps (periodicity k);
aperiodic if there is no such repeating k.
A state that locks the system in (once entered, it is never left) is an
absorbing state. A chain containing an absorbing state together with other
states cannot be irreducible.
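These structural properties can be checked mechanically from a transition matrix; a minimal sketch (the example matrix is hypothetical, not from the notes):

```python
import numpy as np

def reachable(P, i):
    """States reachable from i by following transitions of nonzero probability."""
    seen, stack = {i}, [i]
    while stack:
        u = stack.pop()
        for v in np.flatnonzero(P[u] > 0):
            if v not in seen:
                seen.add(int(v))
                stack.append(int(v))
    return seen

def is_irreducible(P):
    """Irreducible: every state communicates with every other state."""
    return all(reachable(P, i) == set(range(len(P))) for i in range(len(P)))

def absorbing_states(P):
    """Absorbing state: once entered, it is never left (p_ii = 1)."""
    return [i for i in range(len(P)) if P[i, i] == 1.0]

# Hypothetical chain in which state 2 is absorbing, so the chain is not irreducible.
P = np.array([[0.5, 0.5, 0.0],
              [0.3, 0.3, 0.4],
              [0.0, 0.0, 1.0]])
print(is_irreducible(P), absorbing_states(P))   # False [2]
```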
Transition-structure examples (an X marks a nonzero one-step transition probability):

State  0  1  2  3  4
  0    X  X  0  0  X
  1    X  X  0  0  0
  2    0  0  X  X  0
  3    0  0  0  X  0
  4    0  0  0  0  0

[State-transition diagram over states 0–4.]

State  0  1  2  3
  0    0  X  0  X
  1    X  0  0  0
  2    X  0  0  0
  3    0  0  X  X

[State-transition diagram over states 0–3.]
Learn these:
• Conditional probability, joint probability.
• Independence of occurrences of events.
• Bayesian process.
• Expressing sequences statistically with their distribution. Discriminating states.
• MLE, EM.
• MCMC, for producing a desired posterior distribution: (1) Metropolis–Hastings,
RWMC, (2) Gibbs sampling.
• Markov chains and the properties maintained: stationary, ergodic, irreducible,
aperiodic.
• Hidden Markov Models (the goal is to detect the sequence of underlying states
that is likely to give rise to an observed sequence).
• This is the Viterbi algorithm (a minimal sketch follows below).
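Since the list ends with the Viterbi algorithm, here is a minimal Viterbi sketch for a toy two-state HMM ("island" vs. "background"); all transition and emission probabilities are invented for illustration and are not fitted CpG-island parameters:

```python
import numpy as np

# Toy two-state HMM: hidden states 'I' (CpG island) and 'B' (background).
# All probabilities below are made up for illustration.
states = ['I', 'B']
log_start = np.log([0.5, 0.5])
log_trans = np.log([[0.9, 0.1],    # I -> I, I -> B
                    [0.1, 0.9]])   # B -> I, B -> B
emit = {'I': {'A': 0.15, 'C': 0.35, 'G': 0.35, 'T': 0.15},
        'B': {'A': 0.30, 'C': 0.20, 'G': 0.20, 'T': 0.30}}

def viterbi(seq):
    """Most likely hidden-state path for an observed nucleotide sequence."""
    n, k = len(seq), len(states)
    score = np.full((n, k), -np.inf)            # best log-probability ending in each state
    back = np.zeros((n, k), dtype=int)          # argmax backpointers
    for s in range(k):
        score[0, s] = log_start[s] + np.log(emit[states[s]][seq[0]])
    for t in range(1, n):
        for s in range(k):
            cand = score[t - 1] + log_trans[:, s]
            back[t, s] = np.argmax(cand)
            score[t, s] = cand[back[t, s]] + np.log(emit[states[s]][seq[t]])
    path = [int(np.argmax(score[-1]))]          # trace back the best path
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return ''.join(states[s] for s in reversed(path))

print(viterbi("ATATCGCGCGCGATAT"))              # decoded states: BBBBIIIIIIIIBBBB
```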
Independence: A and B are said to be independent events if
P(AB) = P(A) · P(B).    (1-45)
Notice that the above definition is a probabilistic statement,
not a set-theoretic notion such as mutual exclusiveness.
Suppose A and B are independent; then
P(A | B) = P(AB)/P(B) = P(A) P(B)/P(B) = P(A).    (1-46)
Thus if A and B are independent, the event that B has
occurred does not give any clue on the occurrence of the
event A. It makes no difference to A whether B has occurred
or not.
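A quick numerical illustration of (1-45), not from the notes: for two fair coin tosses, the events "first toss is heads" and "second toss is heads" are independent by construction, and enumeration confirms P(AB) = P(A)P(B).

```python
from itertools import product

omega = list(product("HT", repeat=2))        # sample space of two fair tosses
A = {w for w in omega if w[0] == "H"}        # first toss is heads
B = {w for w in omega if w[1] == "H"}        # second toss is heads

prob = lambda E: len(E) / len(omega)         # equally likely outcomes
print(prob(A & B), prob(A) * prob(B))        # 0.25 0.25  ->  P(AB) = P(A) P(B)
```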
Example 1.2: A box contains 6 white and 4 black balls.
Remove two balls at random without replacement. What
is the probability that the first one is white and the second
one is black?
Let W1 = “first ball removed is white”
B2 = “second ball removed is black”
We need P(W1 ∩ B2) = ?  We have W1 ∩ B2 = W1B2 = B2W1.
Using the conditional probability rule,
P(W1B2) = P(B2W1) = P(B2 | W1) P(W1).    (1-47)
Here
P(W1) = 6/(6 + 4) = 6/10 = 3/5,
and
P(B2 | W1) = 4/(5 + 4) = 4/9,
and hence
P(W1B2) = (3/5) · (4/9) = 4/15 ≈ 0.27.
Are the events W1 and B2 independent? Our common sense
says No. To verify this we need to compute P(B2). Of course
the fate of the second ball very much depends on that of the
first ball. The first ball has two options: W1 = "first ball is
white" or B1 = "first ball is black". Note that W1 ∩ B1 = ∅
and W1 ∪ B1 = Ω. Hence W1 together with B1 form a partition.
Thus (see (1-42)-(1-44))
P(B2) = P(B2 | W1) P(W1) + P(B2 | B1) P(B1)
      = [4/(5 + 4)] · (3/5) + [3/(6 + 3)] · (4/10)
      = (4/9)(3/5) + (1/3)(2/5) = 4/15 + 2/15 = 6/15 = 2/5,
and
P(B2) P(W1) = (2/5) · (3/5) = 6/25 ≠ P(B2W1) = 4/15.
As expected, the events W1 and B2 are dependent.
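A short check of these numbers (a sketch, not part of the notes): enumerating all ordered draws of two balls without replacement reproduces P(W1B2) = 4/15 and P(B2) = 2/5.

```python
from itertools import permutations
from fractions import Fraction

balls = ["w"] * 6 + ["b"] * 4
draws = list(permutations(range(10), 2))          # ordered pairs, no replacement

def prob(event):
    """Probability of an event over equally likely ordered draws (i, j)."""
    return Fraction(sum(event(i, j) for i, j in draws), len(draws))

P_W1B2 = prob(lambda i, j: balls[i] == "w" and balls[j] == "b")
P_W1 = prob(lambda i, j: balls[i] == "w")
P_B2 = prob(lambda i, j: balls[j] == "b")
print(P_W1B2, P_B2, P_B2 * P_W1)                  # 4/15, 2/5, 6/25 -> dependent
```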
From (1-35),
P(AB) = P(A | B) P(B).    (1-48)
Similarly, from (1-35),
P(B | A) = P(BA)/P(A) = P(AB)/P(A),
or
P(AB) = P(B | A) P(A).    (1-49)
From (1-48) and (1-49),
P(A | B) P(B) = P(B | A) P(A),
or Bayes' theorem:
P(A | B) = [P(B | A)/P(B)] · P(A).    (1-50)
Although simple enough, Bayes’ theorem has an interesting
interpretation:
P(A|B): the a-posteriori probability of A given B.
P(B): the evidence that "B has occurred" (the new information).
P(B|A): the likelihood of B given A.
P(A): the a-priori probability of the event A.
We can also view the event B as new knowledge obtained
from a fresh experiment. We know something about A as
P(A). The new information is available in terms of B. The
new information should be used to improve our
knowledge/understanding of A. Bayes’ theorem gives the
exact mechanism for incorporating such new information.
A more general version of Bayes' theorem involves a
partition of Ω. From (1-50),
P(Ai | B) = P(B | Ai) P(Ai) / P(B)
          = P(B | Ai) P(Ai) / Σ_{i=1}^{n} P(B | Ai) P(Ai),    (1-51)
where we have made use of (1-44). In (1-51), Ai, i = 1, ..., n,
represent a set of mutually exclusive events with
associated a-priori probabilities P(Ai), i = 1, ..., n. With the
new information "B has occurred", the information about
Ai can be updated by the n conditional probabilities
P(B | Ai), i = 1, ..., n, using (1-47).
Example 1.3: Two boxes B1 and B2 contain 100 and 200
light bulbs respectively. The first box (B1) has 15 defective
bulbs and the second 5. Suppose a box is selected at
random and one bulb is picked out.
(a) What is the probability that it is defective?
Solution: Note that box B1 has 85 good and 15 defective
bulbs. Similarly box B2 has 195 good and 5 defective
bulbs. Let D = “Defective bulb is picked out”.
Then
P(D | B1) = 15/100 = 0.15,    P(D | B2) = 5/200 = 0.025.
Since a box is selected at random, they are equally likely:
P(B1) = P(B2) = 1/2.
Thus B1 and B2 form a partition as in (1-43), and using
(1-44) we obtain
P(D) = P(D | B1) P(B1) + P(D | B2) P(B2)
     = 0.15 · (1/2) + 0.025 · (1/2) = 0.0875.
Thus, there is about 9% probability that a bulb picked at
random is defective.
(b) Suppose we test the bulb and it is found to be defective.
What is the probability that it came from box 1, i.e., P(B1 | D) = ?
P(B1 | D) = P(D | B1) P(B1) / P(D) = (0.15 · 1/2) / 0.0875 ≈ 0.8571.    (1-52)
Notice that initially P(B1) = 0.5; then we picked out a box
at random and tested a bulb that turned out to be defective.
Can this information shed some light on which box we
might have picked? From (1-52), P(B1 | D) = 0.857 > 0.5, and
indeed it is now more likely that we chose box 1 rather than
box 2. (Recall that box 1 has a six times higher proportion of
defective bulbs than box 2.)
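The same computation in a few lines (a sketch, not from the notes):

```python
# Example 1.3 via the total probability theorem and Bayes' theorem.
P_B1 = P_B2 = 0.5                 # box picked at random
P_D_given_B1 = 15 / 100           # defective fraction in box 1
P_D_given_B2 = 5 / 200            # defective fraction in box 2

P_D = P_D_given_B1 * P_B1 + P_D_given_B2 * P_B2       # total probability: 0.0875
P_B1_given_D = P_D_given_B1 * P_B1 / P_D              # Bayes' theorem (1-52): ~0.857
print(P_D, P_B1_given_D)
```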
14. Stochastic Processes
Introduction
Let ξ denote the random outcome of an experiment. To every such
outcome suppose a waveform X(t, ξ) is assigned. The collection of such
waveforms forms a stochastic process. The set of { ξ_k } and the time
index t can be continuous or discrete (countably infinite or finite) as well.
[Fig. 14.1: an ensemble of waveforms X(t, ξ_1), X(t, ξ_2), ..., X(t, ξ_n) plotted against t.]
For fixed ξ_i ∈ S (the set of all experimental outcomes), X(t, ξ) is a
specific time function. For fixed t = t1, X1 = X(t1, ξ_i) is a random
variable. The ensemble of all such realizations X(t, ξ) over time
represents the stochastic
process X(t) (see Fig. 14.1). For example,
X(t) = a cos(ω0 t + φ),
where φ is a uniformly distributed random variable in (0, 2π),
represents a stochastic process. Stochastic processes are everywhere:
Brownian motion, stock market fluctuations, various queuing systems
all represent stochastic phenomena.
If X(t) is a stochastic process, then for fixed t, X(t) represents
a random variable. Its distribution function is given by
F_X(x, t) = P{ X(t) ≤ x }.    (14-1)
Notice that F_X(x, t) depends on t, since for a different t we obtain
a different random variable. Further,
f_X(x, t) = ∂F_X(x, t)/∂x    (14-2)
represents the first-order probability density function of the
process X(t).
For t = t1 and t = t2, X(t) represents two different random variables
X1 = X(t1) and X2 = X(t2) respectively. Their joint distribution is
given by
F_X(x1, x2, t1, t2) = P{ X(t1) ≤ x1, X(t2) ≤ x2 }    (14-3)
and
f_X(x1, x2, t1, t2) = ∂²F_X(x1, x2, t1, t2) / (∂x1 ∂x2)    (14-4)
represents the second-order density function of the process X(t).
Similarly f_X(x1, x2, ..., xn, t1, t2, ..., tn) represents the nth-order density
function of the process X(t). Complete specification of the stochastic
process X(t) requires the knowledge of f_X(x1, x2, ..., xn, t1, t2, ..., tn)
for all ti, i = 1, 2, ..., n, and for all n (an almost impossible task
in reality).
Mean of a Stochastic Process:
μ_X(t) = E{ X(t) } = ∫ x f_X(x, t) dx    (14-5)
represents the mean value of a process X(t). In general, the mean of
a process can depend on the time index t.
The autocorrelation function of a process X(t) is defined as
R_XX(t1, t2) = E{ X(t1) X*(t2) } = ∫∫ x1 x2* f_X(x1, x2, t1, t2) dx1 dx2,    (14-6)
and it represents the interrelationship between the random variables
X1 = X(t1) and X2 = X(t2) generated from the process X(t).
Properties:
1. R_XX(t1, t2) = R*_XX(t2, t1) = [E{ X(t2) X*(t1) }]*    (14-7)
2. R_XX(t, t) = E{ |X(t)|² } ≥ 0. (Average instantaneous power)
3. R_XX(t1, t2) represents a nonnegative definite function, i.e., for any
set of constants { a_i }, i = 1, ..., n,
Σ_{i=1}^{n} Σ_{j=1}^{n} a_i a_j* R_XX(t_i, t_j) ≥ 0.    (14-8)
Eq. (14-8) follows by noticing that E{ |Y|² } ≥ 0 for Y = Σ_{i=1}^{n} a_i X(t_i).
The function
C_XX(t1, t2) = R_XX(t1, t2) − μ_X(t1) μ_X*(t2)    (14-9)
represents the autocovariance function of the process X(t).
Example 14.1
Let z = ∫_{−T}^{T} X(t) dt. Then
E[ |z|² ] = ∫_{−T}^{T} ∫_{−T}^{T} E{ X(t1) X*(t2) } dt1 dt2
          = ∫_{−T}^{T} ∫_{−T}^{T} R_XX(t1, t2) dt1 dt2.    (14-10)
Example 14.2
X(t) = a cos(ω0 t + φ),  φ ~ U(0, 2π).    (14-11)
This gives
μ_X(t) = E{ X(t) } = a E{ cos(ω0 t + φ) }
       = a cos ω0 t E{ cos φ } − a sin ω0 t E{ sin φ } = 0,    (14-12)
since E{ cos φ } = (1/2π) ∫_0^{2π} cos φ dφ = 0 = E{ sin φ }.
Similarly,
R_XX(t1, t2) = a² E{ cos(ω0 t1 + φ) cos(ω0 t2 + φ) }
             = (a²/2) E{ cos ω0 (t1 − t2) + cos(ω0 (t1 + t2) + 2φ) }
             = (a²/2) cos ω0 (t1 − t2).    (14-13)
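A quick ensemble (Monte Carlo) check of (14-12)-(14-13); the sketch and its parameter values are illustrative, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(1)
a, w0 = 2.0, 3.0
t1, t2 = 0.7, 0.2
phi = rng.uniform(0.0, 2.0 * np.pi, size=1_000_000)   # phi ~ U(0, 2*pi)

x1 = a * np.cos(w0 * t1 + phi)                         # X(t1) across the ensemble
x2 = a * np.cos(w0 * t2 + phi)                         # X(t2) across the ensemble

print(x1.mean())                                       # ~ 0, as in (14-12)
print((x1 * x2).mean())                                # ~ (a**2 / 2) * cos(w0 * (t1 - t2))
print(a**2 / 2 * np.cos(w0 * (t1 - t2)))               # exact value from (14-13)
```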
Stationary Stochastic Processes
Stationary processes exhibit statistical properties that are
invariant to shift in the time index. Thus, for example, second-order
stationarity implies that the statistical properties of the pairs
{X(t1) , X(t2) } and {X(t1+c) , X(t2+c)} are the same for any c.
Similarly first-order stationarity implies that the statistical properties
of X(ti) and X(ti+c) are the same for any c.
In strict terms, the statistical properties are governed by the
joint probability density function. Hence a process is nth-order
Strict-Sense Stationary (S.S.S) if
f X ( x1 , x2 , xn , t1 , t2 , tn )  f X ( x1 , x2 , xn , t1  c, t2  c, tn  c)
(14-14)
for any c, where the left side represents the joint density function of
the random variables X 1  X (t1 ), X 2  X (t2 ), , X n  X (tn ) and
the right side corresponds to the joint density function of the random
variables X 1  X (t1  c), X 2  X (t2  c), , X n  X (tn  c).
A process X(t) is said to be strict-sense stationary if (14-14) is 25
true for all ti , i  1, 2, , n, n  1, 2,  and any c.
PILLAI/Cha
For a first-order strict sense stationary process,
from (14-14) we have
f_X(x, t) = f_X(x, t + c)    (14-15)
for any c. In particular c = −t gives
f_X(x, t) = f_X(x),    (14-16)
i.e., the first-order density of X(t) is independent of t. In that case
E[ X(t) ] = ∫ x f(x) dx = μ, a constant.    (14-17)
Similarly, for a second-order strict-sense stationary process
we have from (14-14)
f_X(x1, x2, t1, t2) = f_X(x1, x2, t1 + c, t2 + c)
for any c. For c = −t2 we get
f_X(x1, x2, t1, t2) = f_X(x1, x2, t1 − t2),    (14-18)
i.e., the second-order density function of a strict-sense stationary
process depends only on the difference of the time indices, t1 − t2 = τ.
In that case the autocorrelation function is given by
R_XX(t1, t2) = E{ X(t1) X*(t2) }
             = ∫∫ x1 x2* f_X(x1, x2, τ = t1 − t2) dx1 dx2
             = R_XX(t1 − t2) = R_XX(τ) = R*_XX(−τ),    (14-19)
i.e., the autocorrelation function of a second-order strict-sense
stationary process depends only on the difference of the time
indices τ = t1 − t2.
Notice that (14-17) and (14-19) are consequences of the stochastic
process being first- and second-order strict-sense stationary.
On the other hand, the basic conditions for first- and second-order
stationarity – Eqs. (14-16) and (14-18) – are usually difficult to verify.
In that case, we often resort to a looser definition of stationarity,
known as Wide-Sense Stationarity (W.S.S), by making use of
(14-17) and (14-19) as the necessary conditions. Thus, a process X(t)
is said to be Wide-Sense Stationary if
(i) E{ X(t) } = μ    (14-20)
and
(ii) E{ X(t1) X*(t2) } = R_XX(t1 − t2),    (14-21)
i.e., for wide-sense stationary processes, the mean is a constant and
the autocorrelation function depends only on the difference between
the time indices. Notice that (14-20)-(14-21) do not say anything
about the nature of the probability density functions, and instead deal
with the average behavior of the process. Since (14-20)-(14-21)
follow from (14-16) and (14-18), strict-sense stationarity always
implies wide-sense stationarity. However, the converse is not true in
general, the only exception being the Gaussian process.
This follows, since if X(t) is a Gaussian process, then by definition
X1 = X(t1), X2 = X(t2), ..., Xn = X(tn) are jointly Gaussian random
variables for any t1, t2, ..., tn, whose joint characteristic function
is given by
φ_X(ω1, ω2, ..., ωn) = exp{ j Σ_{k=1}^{n} μ(t_k) ω_k − (1/2) Σ_{l,k} C_XX(t_l, t_k) ω_l ω_k },    (14-22)
where C_XX(t_i, t_k) is as defined in (14-9). If X(t) is wide-sense
stationary, then using (14-20)-(14-21) in (14-22) we get
φ_X(ω1, ω2, ..., ωn) = exp{ j μ Σ_{k=1}^{n} ω_k − (1/2) Σ_{l=1}^{n} Σ_{k=1}^{n} C_XX(t_l − t_k) ω_l ω_k },    (14-23)
and hence if the set of time indices is shifted by a constant c to
generate a new set of jointly Gaussian random variables X1′ = X(t1 + c),
X2′ = X(t2 + c), ..., Xn′ = X(tn + c), then their joint characteristic
function is identical to (14-23). Thus the sets of random variables
{ Xi } and { Xi′ }, i = 1, ..., n, have the same joint probability
distribution for all n and all c, establishing the strict-sense
stationarity of the Gaussian process from its wide-sense stationarity.
To summarize, if X(t) is a Gaussian process, then
wide-sense stationarity (w.s.s.) ⇒ strict-sense stationarity (s.s.s.).
This holds because the joint p.d.f. of Gaussian random variables depends
only on their second-order statistics (mean and covariance), which are
exactly the quantities constrained by wide-sense stationarity.
The ergodic hypothesis: an isolated system in thermal equilibrium, evolving in
time, will pass through all the accessible microstates at the same
recurrence rate, i.e., all accessible microstates are equally probable.
The average over long times will equal the average over the ensemble of all
equi-energetic microstates: if we take a snapshot of a system with N microstates,
we will find the system in any of these microstates with the same probability.