Introduction to Bioinformatics.

LECTURE 9: Clustering gene expression
* Chapter 9: The genomics of wine-making
9.1 Chateau Hajji Feruz Tepe
* Wine making dates back to at least 5000 BC, based on
archeological finds at Hajji Feruz Tepe in Iran.
Overview of Neolithic houses at Hajji Feruz Tepe that
yielded six wine jars in the floor along one wall of the room.
One of six jars once filled with wine from the Neolithic residence at
Hajji Feruz Tepe (Iran). Chemical analysis of patches of a reddish
residue covering the interior of this vessel showed that this
originally was resinated wine.
* Recipe for wine making:
1. fruit juice (or other sugar-rich liquid)
2. yeast: Saccharomyces cerevisiae
Yeast (Saccharomyces cerevisiae) is a unicellular
fungus found naturally on grapevines. It is responsible
for wine-making: it ferments sugars and produces
alcohol.
From being budded off from its parent cell, to
reproducing its own offspring, each yeast cell goes
through a number of typical steps that also involve
changes in gene expression, turning whole
pathways on and off.
Remember, a gene is an on-off switch, and RNA and
proteins are messengers between the genes.
If a gene is ‘on’ the gene is ‘expressed’. The degree to
which the gene is expressed is called the expression
level of the gene.
If a gene is off, it can be said that it has expression level
zero.
Today the study of such phenomena is possible through
microarray technology, which can measure the
expression level of every gene in a cell.
With the gene expression data, genes can be clustered on
the basis of the similarity of their expression profiles.
* With water, sugar and flour, yeast ferments the sugars
in the dough and produces carbon dioxide CO2 (this
causes the dough to rise). In this process it produces
alcohol as a by-product (originally perhaps as near-toxic
protection!).
* When the sugar supply is exhausted S. cerevisiae must
find a new source of energy: when oxygen is available it
shifts to respiration: alcohol now becomes the source of
energy.
* This state change is called the diauxic shift
* S. cerevisiae is one of the most studied organisms in
biology
* S. cerevisiae is a complex unicellular eukaryote
* 12.5 Mbp genome in 16 linear chromosomes (excluding the
mitochondrial genome) containing 6400 genes (2000 more than E.
coli).
* S. cerevisiae can be regarded as a complex factory
transforming many raw materials to final materials,
involving many ‘conveyor belts’ between the genes
* Such a conveyor belt of coupled expressed genes is
called a genetic pathway
* The diauxic shift means that the whole system has to be
transformed from the old process to the new process,
meaning that entire new pathways are formed, and old
pathways are shut off.
* Therefore it is useful to monitor the genome-wide
expression of S. cerevisiae in time, including the diauxic shift.
* This monitoring can be done with microarrays, among the
most important tools in bioinformatics.
* Other dynamical processes, such as the cell cycle, can also be
studied with microarrays.
* This requires data analysis of the microarrays. Here we
study the clustering of expression profiles: time series of
expression levels.
9.2 Monitoring cellular communication
* Purpose of microarrays: snap-shot of the expression levels
in the cell.
* Expressed gene = DNA → mRNA → proteins ….
* In the cell therefore expressed genes cause high numbers
of mRNA molecules.
* Idea of microarrays: measure the concentrations of mRNA,
and reverse-compute the DNA belonging to this mRNA.
* As the mRNA has been spliced (introns removed, exons joined),
the reverse-computed DNA is not identical to the genomic DNA: it is
called cDNA: complementary DNA.
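The "reverse-computation" from mRNA to cDNA can be sketched in a few lines of Python. This is only an illustration of the base-pairing rule, not a model of the laboratory protocol. It assumes the mRNA is given as a plain string over A, C, G, U written 5' to 3'; the function name first_strand_cdna is our own.

```python
# Base-pairing rule RNA -> complementary DNA
# (assumption: plain A/C/G/U alphabet, no ambiguity codes).
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """Return the complementary DNA strand, written 5'->3',
    for an mRNA sequence that is also written 5'->3'."""
    # The two strands run antiparallel, so we read the mRNA backwards
    # while complementing each base.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


print(first_strand_cdna("AUGGCC"))  # GGCCAT
```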
* The cDNA computed from mRNA hints at an expressed
gene; the cDNA is stored as an EST: Expressed
Sequence Tag.
* EST sequencing can identify genes that are ‘missed’ with
ab initio gene-finding methods, such as ORF-finder.
9.3 Microarray technologies
* A microarray is an array of sensitive spots, each containing
a stretch of DNA, e.g. based on an EST
* Hybridization (=chemical binding) of the DNA with
components in the substrate indicates the presence of the
associated mRNA
* The hybridization can be made visible by attaching
fluorescent molecules to the DNA (red, green) and later
illuminating them with a suitable laser
Until recently we lacked tools to observe genome-wide
expression.
1989 saw the introduction of the microarray technique by
Stephen Fodor.
Only in 1992 did this technique become generally
available, and it was still very costly.
Stephen Fodor, developer of the microarray
Example of an Affymetrix microarray simulation. (a) Example of the simulated single-channel oligonucleotide microarray slide image (crop from top left corner). We have
used an Affymetrix .cel file as the ground truth data; thus the text about the slide type is
observable. (b) Real Affymetrix slide image, shown for comparison.
9.4 The diauxic shift and yeast gene expression
* In 1997 DeRisi et al. used microarrays to measure the
genome-wide expression of S. cerevisiae during the
diauxic shift.
* 9 initial hours of growth, 6 hours before the diauxic shift,
and 6 hours thereafter.
* They measured the mRNAs on the array at t time steps
before the diauxic shift, and compared those with the
mRNA levels at time 0.
* This experiment gave a set of about 43,000 ratios: seven time points (t1, t2, …, t7) of 6400 gene expression levels
normalized to their start value.
* This is the reference design in the microarray literature
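As a minimal sketch of this normalization: assume the expression levels of one gene are stored in a plain Python list with the start (time 0) value first. The function name ratios_to_start and the sample numbers are made up for illustration and are not taken from the experiment.

```python
def ratios_to_start(levels):
    """Divide every later expression level by the level at the start.

    `levels[0]` is taken to be the reference (time 0) measurement;
    the result has one ratio per remaining time point.
    """
    start = levels[0]
    if start == 0:
        raise ValueError("cannot normalize: start value is zero")
    return [value / start for value in levels[1:]]


# Hypothetical gene: reference level 2.0, then three later time points.
print(ratios_to_start([2.0, 2.0, 4.0, 1.0]))  # [1.0, 2.0, 0.5]
```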
* This experiment typically provides a time series that is
small relative to the size of the genome; here m=7
time points for n=6400 genes.
* This is due to the cost of an array: about 1000 euro per array
* With this kind of experiment we can in principle also
reconstruct the gene regulatory networks
9.4.1 Data Description
* First analyse the relative change in activity.
* Less than 5% of the genes change more than 1.5-fold,
or less than 0.67-fold.
* fold-change: f = new_value/old_value; if f > 1 the fold-change
is f, if f < 1 then the fold-change is -1/f
* Example: x0 = 1, x1 = 0.3333, fold-change is -3,
x0 = 1, x1 = 3, fold-change is +3.
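The signed fold-change defined above can be written as a small Python function. This is a sketch under two assumptions not stated on the slide: both values must be strictly positive, and the case f = 1 (no change) returns 1. The function name is our own.

```python
def signed_fold_change(old_value: float, new_value: float) -> float:
    """Signed fold-change: f if f >= 1, otherwise -1/f, with f = new/old."""
    if old_value <= 0 or new_value <= 0:
        # Assumption: expression levels are strictly positive.
        raise ValueError("expression levels must be positive")
    f = new_value / old_value
    return f if f >= 1 else -1.0 / f


print(signed_fold_change(1.0, 3.0))   # 3.0
print(signed_fold_change(1.0, 0.25))  # -4.0
```

With x0 = 1 and x1 = 0.3333 the function returns approximately -3, in line with the example on the slide.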
* Now select only those genes with an absolute
fold-change above a certain threshold:
abs(fold-change) > threshold
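A possible way to apply this selection, assuming the signed fold-changes have already been computed and are held in a dictionary keyed by gene name. The gene names and values below are hypothetical.

```python
def select_genes(fold_changes, threshold):
    """Keep only the genes whose absolute fold-change exceeds the threshold."""
    return {
        gene: fc
        for gene, fc in fold_changes.items()
        if abs(fc) > threshold
    }


# Hypothetical signed fold-changes for four genes.
example = {"geneA": 3.0, "geneB": -1.2, "geneC": -2.5, "geneD": 1.4}
print(sorted(select_genes(example, 1.5)))  # ['geneA', 'geneC']
```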
9.4.2 Data Clustering
* Next, cluster the genes according to their expression
levels.
* Aim for high intra-cluster similarity and low inter-cluster
similarity.
* Use a distance/similarity measure and a clustering
algorithm.
Data Clustering
1. Define a suitable distance measure d(x1,x2), e.g. one based on
Pearson’s correlation coefficient, or a normalized distance
like the Mahalanobis distance, or a metric like the
generalized p-norm.
2. Define a clustering criterion, e.g.:
C = \sum_{i,j \text{ in same cluster}} d_{ij} - \sum_{i,j \text{ in different clusters}} d_{ij}
3. Apply a suitable clustering algorithm, e.g. hierarchical or
K-means clustering.
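Steps 1 and 2 can be sketched in plain Python. The choices below are assumptions for illustration: the correlation-based distance is taken as 1 - r (r being Pearson's correlation coefficient), each unordered pair of genes is counted once in the criterion, and all function names are our own.

```python
import math
from itertools import combinations


def pearson_distance(x, y):
    """Distance 1 - r, where r is Pearson's correlation of two profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / (norm_x * norm_y)


def p_norm_distance(x, y, p=2.0):
    """Generalized p-norm distance; p=2 gives the Euclidean distance."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def clustering_criterion(profiles, labels, dist):
    """C = (sum of within-cluster distances) - (sum of between-cluster distances).

    Each unordered pair (i, j) is counted once (an assumption).
    """
    within = between = 0.0
    for i, j in combinations(range(len(profiles)), 2):
        d = dist(profiles[i], profiles[j])
        if labels[i] == labels[j]:
            within += d
        else:
            between += d
    return within - between
```

With this sign convention a smaller C indicates a better partition: distances inside clusters are small and distances between clusters are large.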
Hierarchical clustering
K-means clustering
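The slide only names K-means. The usual textbook version (Lloyd's iteration) can be sketched as follows. This is a minimal sketch, not necessarily the variant used in the lecture: it assumes squared Euclidean distance, initial centroids drawn at random from the data with a fixed seed, and function names of our own choosing.

```python
import random


def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def mean_profile(rows):
    """Component-wise mean of a non-empty list of profiles."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def k_means(profiles, k, iterations=100, seed=0):
    """Return (labels, centroids) after alternating assignment and update steps."""
    rng = random.Random(seed)
    # Initial centroids: k distinct profiles picked at random.
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each profile goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(profiles, labels) if label == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = mean_profile(members)
    return labels, centroids


# Hypothetical two-time-point profiles forming two obvious groups.
labels, centroids = k_means([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]], k=2)
print(labels)
```

The number of clusters k has to be chosen in advance, and different seeds can give different results on less clear-cut data.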
Gene function and Clustering
1. Genes with similar expression profiles often have similar
functions.
2. Define a clustering criterion, e.g.:
C = \sum_{i,j \text{ in same cluster}} d_{ij} - \sum_{i,j \text{ in different clusters}} d_{ij}
3. Apply a suitable clustering algorithm, e.g. hierarchical or
K-means clustering.
Gene function and Clustering
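Distances between two clusters A and B (x_i in A, y_j in B; m_A, m_B the cluster means):
1. Single linkage: d_{AB} = \min_{i,j} \| x_i - y_j \|
2. Average linkage: d_{AB} = \operatorname{mean}_{i,j} \| x_i - y_j \|
3. Centroid distance: d_{AB} = \| m_A - m_B \|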
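The three cluster distances can be plugged into a simple bottom-up (agglomerative) procedure: start with every gene in its own cluster and repeatedly merge the two closest clusters. The sketch below assumes the Euclidean norm for ||.|| and stops at a requested number of clusters. It is a naive implementation meant for reading, not for 6400 genes, and the function names are our own.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def single_linkage(A, B):
    return min(euclidean(x, y) for x in A for y in B)


def average_linkage(A, B):
    return sum(euclidean(x, y) for x in A for y in B) / (len(A) * len(B))


def centroid_distance(A, B):
    m_a = [sum(col) / len(A) for col in zip(*A)]
    m_b = [sum(col) / len(B) for col in zip(*B)]
    return euclidean(m_a, m_b)


def agglomerate(profiles, n_clusters, linkage=average_linkage):
    """Merge the two closest clusters until n_clusters remain.

    Returns a list of clusters, each a list of indices into `profiles`.
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index a, index b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(
                    [profiles[i] for i in clusters[a]],
                    [profiles[i] for i in clusters[b]],
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical one-dimensional profiles with two obvious groups.
print(agglomerate([[0.0], [1.0], [10.0], [11.0]], n_clusters=2))  # [[0, 1], [2, 3]]
```

Recording the order of the merges instead of stopping early would give the full tree. Stopping at n_clusters corresponds to cutting that tree at one level.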
9.4.3 Data Visualisation
* In a tree, using hierarchical clustering.
* In a plane, using MDS (multidimensional scaling).
Gene function and Clustering
2. Multidimensional Scaling
Gene function and Clustering
1. Hierarchical clustering: level of cut-off
Pre-processing
* Select only genes with ‘enough’ fold-change
* Delete missing values
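One simple reading of "delete missing values" is to drop every gene whose profile has a gap. The sketch below assumes exactly that, with missing measurements encoded as None or NaN; other strategies, such as filling in the gaps, are equally possible. Gene names and numbers are hypothetical.

```python
import math


def drop_missing(profiles):
    """Keep only genes whose profile contains no missing value (None or NaN)."""

    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    return {
        gene: values
        for gene, values in profiles.items()
        if not any(is_missing(v) for v in values)
    }


example = {
    "geneA": [1.0, 2.0, 0.5],
    "geneB": [1.0, None, 0.7],
    "geneC": [float("nan"), 1.1, 0.9],
}
print(sorted(drop_missing(example)))  # ['geneA']
```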
Heatmap (axes: timesteps; genes ordered by hierarchical cluster)
9.5 CASE STUDY: Cell-cycle regulated genes
* A set of microarrays over the cell-cycle of yeast.
From being budded off from its parent cell, to
reproducing its own offspring, each yeast cell goes
through a number of typical steps that also involve
changes in gene expression, turning whole
pathways on and off.
Here we examine the expression of the entire yeast
genome through two rounds of the cell cycle.
The temporal expression of genes is measured by
microarray at 24 time points, one every five hours. In detail we
have the expression profile of about 6400 genes.
9.5 THE CELL CYCLE
END of LECTURE 9