GeneNetworkConnectivityEffiKenigsberg
Inferring the nature of the gene
network connectivity
Dynamic modeling of gene expression data
Neal S. Holter, Amos Maritan, Marek Cieplak, Nina V. Fedoroff, and
Jayanth R. Banavar
Topics in biophysics
13.1.2009
Effi Kenigsberg
Outline
Gene networks
basics
what can be measured
microarray technology - the explosion of
datasets
Holter’s paper – trying to simplify the problem
Once upon a time
“the father of genetics”
Gene : the basic unit of
heredity in a living organism
Gregor Mendel
1822-1884
From DNA to protein: the flow of information
Across different tissues, conditions and cell-cycle phases:
DNA sequence is (almost) identical
Number of mRNA and protein copies is highly
variable
Cells within the same tissues and
conditions show similar gene expression
profiles
Proteins are crucial functional units of the
living cell
Cells that function similarly express similar
protein profiles
How is protein abundance
regulated?
The key variables
Abundance (concentration) of proteins: high-
throughput measurement hasn’t been done
yet.
mRNA expression: a fair predictor of protein
abundance (r ~ 0.7 in yeast).
Before 1995, measuring it was not practical.
Nowadays it is relatively easy.
How is mRNA expression measured?
Microarray technology
Allows detection of thousands of DNA
molecules simultaneously
Two competing array types:
Gene chip (DNA chip, Affymetrix chip)
cDNA chip (DNA microarray, two-channel array)
Affymetrix chip
Consists of an arrayed series of thousands of
microscopic spots of DNA oligonucleotides
Target
probe
Making a labeled DNA from mRNA
sample
Extract mRNA from the cell
Convert mRNA into colored cDNA
(complementary fluorescently labeled DNA)
Hybridize cDNA with array
Each cDNA sequence hybridizes (attaches)
specifically with the corresponding gene
sequence in the array
Wash unhybridized cDNA off
Scanning the array
The laser-excited array is scanned.
The scanned result for a given gene is the
average over all probes that correspond to
this gene.
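The averaging step can be sketched as follows. This is a minimal illustration, assuming a hypothetical input format of (gene, probe intensity) pairs; the gene names and intensities are made up and do not come from any real scan.

```python
from collections import defaultdict

def average_probes(probe_readings):
    """Average probe intensities per gene.

    probe_readings: iterable of (gene_name, intensity) pairs
    (a hypothetical input format, chosen for illustration).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for gene, intensity in probe_readings:
        sums[gene] += intensity
        counts[gene] += 1
    return {gene: sums[gene] / counts[gene] for gene in sums}

# Made-up readings: three probes for one gene, two for another.
readings = [("geneA", 10.0), ("geneA", 12.0), ("geneA", 14.0),
            ("geneB", 3.0), ("geneB", 5.0)]
gene_levels = average_probes(readings)
```

Real scanner pipelines typically also correct for background and normalize across arrays; those steps are omitted here.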
Analyzing the array scans
Schena, Brown, et al.
Data Explosion!
Hundreds of thousands (or maybe millions?)
of microarray experiments are conducted every
year
Will we ever understand this data?
Usage of mRNA expression data
How can gene expression levels at time t
describe gene expression levels at time t+Δ?
The budding yeast - Saccharomyces
cerevisiae (sugar fungi of beer)
5–10 micrometers
doubling time of ~2 hours
~4800 genes
Cell cycle in budding yeast
A succession of events whereby a cell grows and divides into two
daughter cells that each contain the information and machinery
necessary to repeat the process
S. cerevisiae regulatory network
Ananko et al. 2002
Less than 100 genes
The dataset (yeast cell cycle)
800 genes
12 equally spaced time points
(12 microarrays)
Two cell cycles long
[Figure: expression heatmap, axes labeled "genes" and "t"]
Red – high mRNA expression
Green – low mRNA expression
(relative to a control)
The linear interaction model
the expression levels of the n genes at a given time
are postulated to be linear combinations of their
levels at a previous time
To learn the n² gene interactions, about n time
points are needed (each time step yields n equations),
far more than the 12 available
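Written out, the postulate reads as below. Here A_j(t) is the expression level of gene j at time t; the symbol W for the gene-level interaction matrix is an assumption introduced for illustration and does not appear in the slides.

```latex
A_j(t + \Delta t) = \sum_{k=1}^{n} W_{j,k}\, A_k(t), \qquad j = 1, \dots, n
```

W has n² unknown entries, while each pair of consecutive time points contributes only n scalar equations.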
Simplifying gene interactions using SVD
Singular Value Decomposition
Let A be our dataset (an n x m matrix). Then there
exists a factorization of the form:
A = U S V^T
where:
U is an n x n unitary matrix (U U^T = I)
S is an n x m diagonal matrix, with non-negative values on the
diagonal
V is an m x m unitary matrix (V V^T = I)
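The factorization can be checked numerically with NumPy. This is a minimal sketch on a random stand-in matrix; the sizes n = 6 and m = 4 are arbitrary assumptions, not the cell cycle data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 4                      # toy sizes: n genes, m time points (assumed)
A = rng.normal(size=(n, m))      # random stand-in for an expression matrix

# full_matrices=True returns U as n x n and Vt (= V^T) as m x m
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Place the singular values on the diagonal of an n x m matrix S
k = min(n, m)
S = np.zeros((n, m))
S[:k, :k] = np.diag(s)

A_rebuilt = U @ S @ Vt           # should reproduce A up to rounding
```

NumPy returns the singular values as a 1-D array sorted in decreasing order, which is why S has to be assembled by hand.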
Wikipedia’s SVD example
The singular values
S
Using SVD
The modes: the first r rows of the matrix S V^T,
X_i, i = 1..r
r = number of singular values
Expression of each gene is a linear
combination of the modes:
A_j(t) = \sum_{i=1}^{r} U_{j,i} X_i(t)
How do modes affect each other?
Time translation matrix, M, represents the
interactions between modes
When r = #(singular values), M can be calculated directly
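A minimal sketch of estimating M, assuming the modes obey X(t+Δ) = M X(t). The least-squares (pseudo-inverse) fit used here is an illustrative choice, not necessarily the procedure of the paper, and the modes are synthetic: a hypothetical 2 x 2 rotation with a period of 6 time steps, so that 12 time points span two cycles.

```python
import numpy as np

# Hypothetical ground truth: a rotation, which makes two modes oscillate
theta = 2 * np.pi / 6
M_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])

# Generate r = 2 modes at 12 time points via X(t + dt) = M X(t)
X = np.empty((2, 12))
X[:, 0] = [1.0, 0.0]
for t in range(11):
    X[:, t + 1] = M_true @ X[:, t]

# Least-squares estimate: find M such that M @ X_past is closest to X_future
X_past, X_future = X[:, :-1], X[:, 1:]
M_est = X_future @ np.linalg.pinv(X_past)
```

With few modes and many time points the system is overdetermined and the fit averages over noise; with as many modes as time points there is no such slack.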
Cell cycle singular values
[Plot: singular value (axis "Value", 0 to 18) versus index (1 to 11)]
Complexity may be reduced by using only the
modes corresponding to the highest singular values
Gene expression profiles are well
reconstructed using only 2 modes
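A minimal sketch of a two-mode reconstruction, using synthetic data rather than the yeast measurements: 50 made-up genes are built from two oscillating profiles plus a small amount of noise, and the matrix is then rebuilt from its two largest singular values only.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(12)
# Two made-up oscillating profiles spanning two cycles over 12 time points
profiles = np.vstack([np.sin(2 * np.pi * t / 6), np.cos(2 * np.pi * t / 6)])
weights = rng.normal(size=(50, 2))             # 50 synthetic genes
A = weights @ profiles + 0.01 * rng.normal(size=(50, 12))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

r = 2                                          # keep the two largest singular values
A_approx = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

# Relative Frobenius-norm error of the rank-2 approximation
rel_error = np.linalg.norm(A - A_approx) / np.linalg.norm(A)
```

The error is small here by construction; on real data, how many modes suffice has to be read off the singular value spectrum.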
[Figure: the first two characteristic modes for the cell cycle data (Mode 1, Mode 2); o = measured, - = approximated]
Simplify gene interactions using clustering
Alon, Barkai et al. 1999
Clustering genes by similarity and learning the
interactions between clusters may simplify the
problem
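A minimal sketch of grouping genes by similarity of their expression profiles. The k-means routine below is an illustrative stand-in, not the method of the cited papers; the function name `kmeans_profiles` and the synthetic data (20 genes following one profile, 30 following another) are assumptions.

```python
import numpy as np

def kmeans_profiles(A, k, n_iter=20):
    """Minimal k-means over gene expression profiles (rows of A)."""
    # Deterministic farthest-point initialization, starting from the first gene
    centers = [A[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(A - c, axis=1) for c in centers], axis=0)
        centers.append(A[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each gene to its nearest center (Euclidean distance)
        dists = np.linalg.norm(A[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Move each center to the mean profile of its cluster
        for c in range(k):
            if np.any(labels == c):
                centers[c] = A[labels == c].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(2)
t = np.arange(12)
profile_a = np.sin(2 * np.pi * t / 6)
profile_b = np.cos(2 * np.pi * t / 6)
A = np.vstack([profile_a + 0.1 * rng.normal(size=(20, 12)),
               profile_b + 0.1 * rng.normal(size=(30, 12))])

labels, centers = kmeans_profiles(A, k=2)
```

The k cluster-mean profiles in `centers` can then take the place of the individual genes as the variables of an interaction model, reducing the number of unknown interactions from n² to k².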
Spellman et al.
Conclusions
Gene connectivity networks are highly
redundant
It is possible to describe some of the variability of
huge biological datasets by simple interaction
models
There is a lot of biological data out there