Molecular Information Theory

Niru Chennagiri
Probability and Statistics
Fall 2004
Dr. Michael Partensky
Overview

• Why do we study molecular information theory?
• What are molecular machines?
• The power of the logarithm
• Components of a communication system
• Discrete noiseless systems
• Channel capacity
• Molecular machine capacity
Motivation

• A needle-in-a-haystack situation.
• How will you go about looking for the needle?
• How much energy do you need to spend?
• How fast can you find the needle?
• Haystack = DNA, needle = binding site, you = ribosome.
What is a Molecular Machine?

• One or more molecules, or a molecular complex: not a macroscopic reaction.
• Performs a specific function.
• Is energized before the reaction.
• Dissipates energy during the reaction.
• Gains information.
• An isothermal engine.
Where is the candy?

• Is it in the left four boxes?
• Is it in the bottom four boxes?
• Is it in the front four boxes?

You need answers to three questions to find the candy.
Box labels: 000, 001, 010, 011, 100, 101, 110, 111
Need log 8 = 3 bits of information (logs are base 2 throughout).
More candies…

• Box labels: 00, 01, 10, 11, 00, 01, 10, 11
• Candy in both boxes labeled 01.
• Need only log 8 - log 2 = 2 bits of information.

In general, m boxes with n candies need
log m - log n = log(m/n) bits of information.
Ribosomes

• 2600 binding sites in 4.7 million base pairs.
• Need log(4.7 million) - log(2600) ≈ 10.8 bits of information.
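As a sanity check on this counting argument, here is a minimal Python sketch; the helper name bits_needed is mine, not from the slides:

```python
import math

def bits_needed(m_boxes: float, n_candies: float) -> float:
    """Bits needed to locate a target when any of n_candies of the
    m_boxes is an acceptable answer: log2(m) - log2(n) = log2(m/n)."""
    return math.log2(m_boxes / n_candies)

print(bits_needed(8, 1))          # 3.0 bits: one candy, eight boxes
print(bits_needed(8, 2))          # 2.0 bits: two candies, eight boxes
print(bits_needed(4.7e6, 2600))   # ~10.8 bits: the ribosome's problem
```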
Communication System

Information Source
• Represented by a stochastic process.
• Mathematically, a Markov chain.
• We are interested in ergodic sources: every sequence is statistically the same as every other sequence.
How much information is produced?

A measure of uncertainty H should be:
• Continuous in the probabilities.
• A monotonically increasing function of the number of equally likely events.
• Such that when a choice is broken down into two successive choices, the total H is the weighted sum of the individual H values.
Enter Entropy

H = -K Σ_{i=1}^{n} p_i log p_i

[Plot: entropy H of a two-outcome source versus the probability p of one outcome; both axes run from 0 to 1, with H peaking at 1 bit at p = 1/2.]
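A minimal Python sketch of this formula, taking K = 1 and logs base 2 so that H comes out in bits:

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum(p_i * log2(p_i)), in bits.
    Zero-probability terms contribute nothing (p log p -> 0 as p -> 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # 1.0 bit: the peak of the curve above
print(entropy([0.25] * 4))    # 2.0 bits: four equally likely symbols
print(entropy([1.0]))         # 0.0 bits: no uncertainty at all
```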
Properties of Entropy

• H is zero iff all but one of the p_i are zero.
• H is never negative.
• H is maximum when all the events are equally probable.
• If x and y are two events, H(x, y) ≤ H(x) + H(y).
• Conditional entropy: H_x(y) = -Σ_{i,j} p(i, j) log p_i(j)
• H_x(y) ≤ H(y)
Why is entropy important?

• Entropy is a measure of uncertainty.
• ΔH = H_After - H_Before; the information gained is R = -ΔH.
• Entropy relation from thermodynamics:
  ΔS = (k_B ln 2) ΔH = -(k_B ln 2) R
• Also from thermodynamics:
  ΔS = q / T
• So for every bit of information gained, the machine dissipates at least k_B T ln 2 joules.
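To put a number on this, a small sketch evaluating k_B T ln 2; the choice T = 300 K (roughly room temperature) is my assumption, not from the slides:

```python
import math

k_B = 1.380649e-23  # Boltzmann constant, J/K
T = 300.0           # assumed temperature in kelvin (not given in the slides)

# Minimum heat dissipated per bit of information gained
joules_per_bit = k_B * T * math.log(2)
print(f"{joules_per_bit:.3e} J/bit")  # ~2.87e-21 J/bit
```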
Ribosome binding sites
Information in sequence

Position   p (base frequencies)   H_Before   H_After   Change in H
1          A: 1/2, G: 1/2         2          1         1
2          U: 1                   2          0         2
3          G: 1                   2          0         2
Information curve

H(l) = -Σ_{b ∈ {A,C,G,T}} f(b, l) log f(b, l)

The information gain at position l is

R_sequence(l) = 2 - H(l)

Plotting this across the positions of the site gives the information curve.
For E. coli, the total information is about 11 bits: the same as what the ribosome needs.
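A sketch of this per-position computation in Python, using the three example positions tabulated above (the dictionary layout is mine):

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# f(b, l): observed base frequencies at each aligned position l,
# taken from the table of example positions above
site_frequencies = {
    1: {"A": 0.5, "G": 0.5},
    2: {"U": 1.0},
    3: {"G": 1.0},
}

for l, freqs in site_frequencies.items():
    H = entropy(freqs.values())
    R = 2 - H  # R_sequence(l): information gain in bits
    print(f"position {l}: H(l) = {H:.1f} bits, R_sequence(l) = {R:.1f} bits")
```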
Sequence Logo
Channel capacity

• A source transmits 0s and 1s at 1000 symbols/sec.
• 1 in 100 symbols is received in error.
• What is the rate of transmission?
• We need to apply a correction: the uncertainty in x for a given received value of y, which is exactly the conditional entropy.

H_y(x) = -(0.99 log 0.99 + 0.01 log 0.01) ≈ 0.081 bits/symbol

At 1000 symbols/sec this is a correction of 81 bits/sec, so the rate is 1000 - 81 = 919 bits/sec.
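A quick check of that arithmetic:

```python
import math

symbols_per_sec = 1000
p_ok, p_err = 0.99, 0.01

# Equivocation H_y(x): remaining uncertainty about the sent symbol
# once the received symbol is known
H_y_x = -(p_ok * math.log2(p_ok) + p_err * math.log2(p_err))

print(f"H_y(x) ~ {H_y_x:.4f} bits/symbol")                     # ~0.0808
print(f"correction ~ {symbols_per_sec * H_y_x:.0f} bits/sec")  # ~81
print(f"rate ~ {symbols_per_sec * (1 - H_y_x):.0f} bits/sec")  # ~919
```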
Channel capacity contd.

C = max { H(x) - H_y(x) }

For a continuous channel with white noise,

C = W log(1 + P/N)

where W is the bandwidth and P/N is the signal-to-noise ratio.

Shannon's theorem:
As long as the rate of transmission is below C, the number of errors can be made as small as needed.
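For concreteness, a sketch evaluating the formula; the bandwidth and signal-to-noise values below are illustrative assumptions, not from the slides:

```python
import math

def shannon_capacity(bandwidth_hz, snr):
    """Capacity C = W log2(1 + P/N) of a band-limited white-noise channel."""
    return bandwidth_hz * math.log2(1 + snr)

# Example: a 3 kHz channel with P/N = 1000 (30 dB)
print(f"{shannon_capacity(3000, 1000):.0f} bits/sec")  # ~29,900
```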
Molecular Machine Capacity

• Lock-and-key mechanism.
• Each pin on the ribosome is a simple harmonic oscillator in a thermal bath.
• The velocity of each pin is represented by a point in a 2-D velocity space.
• More pins mean more dimensions.
• The distribution of points is spherical.
Machine capacity

In high dimensions:
• All points lie in a thin spherical shell.
• The radius of the shell is the velocity, and hence goes as the square root of the energy.

Before binding: r_before = √(P_y + N_y)
After binding:  r_after = √(N_y)

Number of choices
= number of 'after' spheres that can fit inside the 'before' sphere
= volume of the 'before' sphere / volume of the 'after' sphere

Machine capacity = logarithm of the number of choices:

C = d log(1 + P/N)

(With d pins the space has 2d dimensions, so the volume ratio is ((P_y + N_y)/N_y)^d, whose logarithm is d log(1 + P/N).)
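A sketch of this count in Python; the pin count and power-to-noise ratio used here are illustrative assumptions:

```python
import math

def machine_capacity(d_pins, P, N):
    """Machine capacity C = d * log2(1 + P/N), bits per operation.
    With d pins the velocity space has 2d dimensions, so the
    before/after sphere-volume ratio is ((P + N) / N) ** d_pins."""
    return d_pins * math.log2(1 + P / N)

print(machine_capacity(4, 3.0, 1.0))  # 4 pins, P/N = 3 -> 8.0 bits
```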
References

• C. E. Shannon, "A Mathematical Theory of Communication," The Bell System Technical Journal, Vol. 27, pp. 379–423, 623–656, July and October 1948 (reprinted with corrections).
• C. E. Shannon and W. Weaver, The Mathematical Theory of Communication.
• T. D. Schneider, "Sequence Logos, Machine/Channel Capacity, Maxwell's Demon, and Molecular Computers: a Review of the Theory of Molecular Machines," Nanotechnology, 5: 1–18, 1994.
• T. D. Schneider, "Theory of Molecular Machines. I. Channel Capacity of Molecular Machines," J. Theor. Biol., 148: 83–123, 1991.
• "How (and why) to find a needle in a haystack," The Economist, April 5th–11th, 1997 (British version: pp. 105–107; American version: pp. 73–75; Asian version: pp. 79–81).
• http://www.math.tamu.edu/~rahe/Math664/gene1.html
• http://www.lecb.ncifcrf.gov/~toms/