Molecular Information Theory
Niru Chennagiri
Probability and Statistics
Fall 2004
Dr. Michael Partensky
Overview
Why do we study Molecular Information Theory?
What are molecular machines?
Power of Logarithm
Components of a Communication System
Discrete Noiseless System
Channel Capacity
Molecular Machine Capacity
Motivation
Needle in a haystack situation.
How will you go about looking for the
needle?
How much energy do you need to spend?
How fast can you find the needle?
Haystack = DNA, Needle = Binding site,
You = Ribosome
What is a Molecular Machine?
One or more molecules or a molecular
complex: not a macroscopic reaction.
Performs a specific function.
Energized before the reaction.
Dissipates energy during reaction.
Gains information.
An isothermal engine.
Where is the candy?
Is it in the left four boxes?
Is it in the bottom four boxes?
Is it in the front four boxes?
You need answers to three questions to find the candy
Box labels: 000, 001, 010, 011, 100, 101, 110, 111
Need log2(8) = 3 bits of information
More candies…
Box labels: 00, 01, 10, 11, 00, 01, 10, 11
Candy in both boxes labeled 01.
Need only log2(8) - log2(2) = 2 bits of
information.
In general,
m boxes with n candies need
log2(m) - log2(n) bits of information
Ribosomes
2600 binding sites from
4.7 million base pairs
Need
log2(4.7 million) - log2(2600)
≈ 10.8 bits of information.
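A quick numerical check of this box-counting arithmetic (a minimal sketch; the function name bits_needed is ours, not from the talk):

```python
from math import log2

def bits_needed(m_boxes, n_candies=1):
    """Bits to locate n candies among m equally likely boxes: log2(m) - log2(n)."""
    return log2(m_boxes) - log2(n_candies)

print(bits_needed(8))             # one candy in 8 boxes: 3.0 bits
print(bits_needed(8, 2))          # two candies: 2.0 bits
print(bits_needed(4.7e6, 2600))   # ribosome: ~10.8 bits
```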
Communication System
Information Source
Represented by a stochastic process
Mathematically a Markov chain
We are interested in ergodic sources: every
sufficiently long sequence has the same statistical
properties as every other.
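A toy sketch of such a source (the two-symbol alphabet and transition probabilities below are hypothetical, chosen only for illustration):

```python
import random

# Hypothetical transition probabilities for a two-symbol Markov source.
transitions = {"A": [("A", 0.7), ("B", 0.3)],
               "B": [("A", 0.4), ("B", 0.6)]}

def emit(n, state="A"):
    """Emit n symbols, each drawn from the current state's transition row."""
    out = []
    for _ in range(n):
        symbols, probs = zip(*transitions[state])
        state = random.choices(symbols, probs)[0]
        out.append(state)
    return "".join(out)

print(emit(20))   # e.g. "AABABBBAAB..."
```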
How much information is
produced?
Measure of uncertainty H should be:
Continuous in the probabilities.
A monotonic increasing function of the number of equally likely events.
When a choice is broken down into two successive choices, the total H should be the weighted sum of the individual H values.
Enter Entropy
$$H = -K \sum_{i=1}^{n} p_i \log p_i$$
[Plot: binary entropy H(p) of a two-outcome source vs. p, on axes from 0 to 1; H peaks at 1 bit when p = 1/2.]
Properties of Entropy
H is zero iff all but one of the p_i are zero.
H is never negative.
H is maximum when all the events are
equally probable
If x and y are two events:
$H(x, y) \le H(x) + H(y)$
Conditional entropy:
$H_x(y) = -\sum_{i,j} p(i, j) \log p_i(j)$
$H_x(y) \le H(y)$
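A minimal numerical sketch of these properties (the joint distribution below is hypothetical, chosen only for illustration):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical joint distribution p(x, y) over two binary events.
p_xy = np.array([[0.4, 0.1],
                 [0.2, 0.3]])
p_x = p_xy.sum(axis=1)            # marginal of x
p_y = p_xy.sum(axis=0)            # marginal of y

H_xy = entropy(p_xy)
H_x, H_y = entropy(p_x), entropy(p_y)
H_y_given_x = H_xy - H_x          # chain rule: H(x,y) = H(x) + H_x(y)

print(H_xy <= H_x + H_y)          # True: H(x,y) <= H(x) + H(y)
print(H_y_given_x <= H_y)         # True: H_x(y) <= H(y)
```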
Why is entropy important?
Entropy is a measure of uncertainty.
Information gained R = H_Before - H_After (the decrease in uncertainty).
Entropy relation from thermodynamics
$\Delta S = (k_B \ln 2)\,\Delta H = -(k_B \ln 2)\,R$
Also from thermodynamics
$\Delta S = \frac{q}{T}$ (heat q dissipated at temperature T)
For every bit of information gained, the
machine dissipates at least k_B T ln 2 joules.
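Plugging in numbers (a quick check; room temperature T = 298 K is our assumption):

```python
from math import log

k_B = 1.380649e-23            # Boltzmann constant, J/K
T = 298.0                     # assumed room temperature, K
print(k_B * T * log(2))       # ~2.9e-21 J dissipated per bit gained
```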
Ribosome binding sites
Information in sequence
Position | p (base: frequency) | H Before | H After | Change in H
1        | A: 1/2, G: 1/2      | 2        | 1       | 1
2        | U: 1                | 2        | 0       | 2
3        | G: 1                | 2        | 0       | 2
Information curve
$$H(l) = -\sum_{b \in \{A,C,G,T\}} f(b, l) \log_2 f(b, l)$$
Information gain for site position l is
$$R_{sequence}(l) = 2 - H(l)$$
Plotting this across the positions of the site gives the information curve.
For E. coli, the total information is about 11 bits.
… the same as what the ribosome needs.
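A sketch of the per-position calculation behind the information curve (the four aligned sites below are made up for illustration; real analyses use thousands of aligned binding sites):

```python
import numpy as np

# Hypothetical aligned binding-site sequences, one row per site.
sites = ["AUGGA", "GUGGA", "AUGCA", "AUGGU"]
bases = "ACGU"

R_sequence = []
for l in range(len(sites[0])):
    column = [s[l] for s in sites]
    f = np.array([column.count(b) for b in bases]) / len(sites)
    f = f[f > 0]
    H_l = -np.sum(f * np.log2(f))    # uncertainty remaining at position l
    R_sequence.append(2.0 - H_l)     # 2 bits before minus H(l) after

print(R_sequence)                    # the information curve
print(sum(R_sequence))               # total information of the site
```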
Sequence Logo
Channel capacity
Source transmitting 0 and 1 at 1000 symbols/sec.
1 in 100 symbols is received in error.
What is the rate of transmission?
Need to apply a correction
correction = uncertainty in x for a given value of y
Same as conditional entropy
$H_y(x) = -(0.99 \log_2 0.99 + 0.01 \log_2 0.01) \approx 0.081$ bits/symbol
= 81 bits/sec of correction, so the actual rate is 1000 - 81 = 919 bits/sec.
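Checking the correction numerically (a minimal sketch of Shannon's example):

```python
from math import log2

rate = 1000                                        # symbols/sec
H_yx = -(0.99 * log2(0.99) + 0.01 * log2(0.01))    # equivocation, bits/symbol
print(H_yx * rate)                                 # ~81 bits/sec of correction
print(rate * (1 - H_yx))                           # ~919 bits/sec actually conveyed
```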
Channel capacity contd.
$C = \max\{H(x) - H_y(x)\}$
For a continuous channel with white noise,
$$C = W \log_2\left(1 + \frac{P}{N}\right)$$
where W is the bandwidth and P/N is the signal-to-noise ratio.
Shannon’s theorem:
As long as the rate of transmission is below
C, the number of errors can be made as
small as needed.
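The formula in code, with hypothetical numbers (a telephone-line-like bandwidth and SNR chosen just for illustration):

```python
from math import log2

def shannon_capacity(W, snr):
    """Channel capacity in bits/sec for bandwidth W (Hz) and power ratio P/N."""
    return W * log2(1 + snr)

print(shannon_capacity(3000, 1000))   # ~29,900 bits/sec
```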
Molecular Machine Capacity
Lock and key mechanism.
Each pin on the ribosome is a simple harmonic oscillator in a thermal bath.
The velocity of each pin is represented by a point in a 2-d velocity space.
More pins -> more dimensions.
The distribution of points is spherical.
Machine capacity
For larger dimensions:
All points lie in a thin spherical shell.
The radius of the shell is the velocity, and hence the square root of the energy.
Before binding:
$r_{before} \propto \sqrt{P_y + N_y}$
After binding:
$r_{after} \propto \sqrt{N_y}$
(P_y is the power the machine dissipates; N_y is the thermal noise.)
Number of choices
= Number of ‘after’ spheres that can sit in
the ‘before’ sphere
= Vol. of before sphere / Vol. of after sphere
Machine capacity = logarithm of number of
choices
$$C = d \log_2\left(1 + \frac{P}{N}\right)$$
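The step from the volume ratio to this formula, sketched under the assumption that d pins contribute D = 2d velocity dimensions (each pin has a 2-d velocity space, as above):

$$\text{choices} = \frac{V_{before}}{V_{after}} = \left(\frac{r_{before}}{r_{after}}\right)^{D} = \left(\frac{P + N}{N}\right)^{D/2}$$

$$C = \log_2(\text{choices}) = \frac{D}{2} \log_2\left(1 + \frac{P}{N}\right) = d \log_2\left(1 + \frac{P}{N}\right)$$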
References
Claude E. Shannon, A Mathematical Theory of Communication, The Bell System Technical Journal, Vol. 27, pp. 379–423, 623–656, July, October 1948 (reprinted with corrections).
Claude E. Shannon and Warren Weaver, The Mathematical Theory of Communication.
T. D. Schneider, Sequence Logos, Machine/Channel Capacity, Maxwell's Demon, and Molecular Computers: a Review of the Theory of Molecular Machines, Nanotechnology, 5: 1–18, 1994.
T. D. Schneider, Theory of Molecular Machines. I. Channel Capacity of Molecular Machines, J. Theor. Biol., 148: 83–123, 1991.
How (and why) to find a needle in a haystack, The Economist, April 5th–11th, 1997 (British version: pp. 105–107; American version: pp. 73–75; Asian version: pp. 79–81).
http://www.math.tamu.edu/~rahe/Math664/gene1.html
http://www.lecb.ncifcrf.gov/~toms/