
Microarrays and the shadows
in Plato's cave
Matthias E. Futschik
Institute for Theoretical Biology
Humboldt-University, Berlin, Germany
Overview

● General Introduction
   ● Plato and the cave: What can we learn from Greek philosophy?
   ● The statement of the problem: Where do we start & where do we want to go?
   ● Microarray data analysis: The way out of the cave and the challenges that we are meeting.

● Introduction to microarray techniques:
   ● Experimental background
   ● Types of microarrays
   ● Microarray platforms
   ● Image processing and data storage

● Design of experiments
   ● Reference design
   ● Factorial design
Yeast cDNA microarray
Plato’s Cave
AND now, I said, let me show in a figure
how far our nature is enlightened or unenlightened:
Behold! Human beings living in an underground den,
which has a mouth open towards the light and
reaching all along the den; here they have been from their childhood, and have their legs and
necks chained so that they cannot move, and can only see before them, being prevented by the
chains from turning round their heads. Above and behind them a fire is blazing at a distance, and
between the fire and the prisoners there is a raised way; and you will see, if you look, a low wall
built along the way, like the screen which marionette players have in front of them, over which
they show the puppets….
To them, I said, the truth would be literally nothing but the shadows of the images…
And now look again, and see what will naturally follow if the prisoners are released and disabused of their error.
At first, when any of them is liberated and compelled suddenly to stand up and
turn his neck round and walk and look towards the light, he will suffer sharp pains; the glare will
distress him, and he will be unable to see the realities of which in his former state he had seen the
shadows; and then conceive some one saying to him, that what he saw before was an illusion, but
that now, when he is approaching nearer to being and his eye is turned towards more real
existence, he has a clearer vision,…
Plato’s Cave
And suppose once more, that he is reluctantly dragged up
a steep and rugged ascent, and held fast until he's forced
into the presence of the sun himself, is he not likely to be
pained and irritated?
When he approaches the light his eyes will be dazzled, and he will not be able to see anything at
all of what are now called realities… He will require to grow accustomed to the sight of the upper world.
Imagine once more, I said, such an one coming suddenly out of the sun to be replaced in his old
situation; would he not be certain to have his eyes full of darkness?
And if there were a contest, and he had to compete in measuring the shadows with the prisoners who had never
moved out of the den, while his sight was still weak, and before his eyes had become steady (and the time which
would be needed to acquire this new habit of sight might be very considerable) would he not be ridiculous?
Men would say of him that up he went and down he came without his eyes; and that it was better not
even to think of ascending; and if any one tried to loose another and lead him up to the light,
let them only catch the offender, and they would put him to death.
The Problem
The Fire: Genetic networks
Complex regulation of gene expression
The Sun:
New drug discovery by knowledge discovery
The Shadow: Microarrays
Thousands of simultaneously measured gene activities
Methodology
1. From Image to Numbers: Image-Analysis
2. Well begun is half done: Design of experiments
3. Cleaning up: Preprocessing and Normalisation
4. Go fishing: Significance of Differences in Gene Expression Data
5. Who is with whom: Clustering of samples and genes
6. Gattaca comes alive: Classification and Disease Profiling
7. The whole picture: An integrative approach
What are microarrays?
● Microarrays consist of localised spots of oligonucleotides or cDNA attached to a glass surface or nylon filter
● Different production:
   ● Spotted microarrays
   ● Photolithographically synthesised microarrays (Affymetrix)
● Different read-outs:
   ● Two-channel (or two-colour) microarrays
   ● One-channel (or one-colour) microarrays
Applications:
● Clustering of genes: Co-expression and co-regulation go together, enabling functional annotation
● Clustering of time series
● Classification of tissue samples and marker gene identification
● Clustering of arrays: finding new disease subclasses
● Reconstruction of gene networks (reverse engineering)
Typical cDNA microarray experiment
Microarray technology I

Two-colour microarray (cDNA and spotted oligonucleotide microarrays)

● Probes are PCR products based on a chosen cDNA library or synthesized oligonucleotides (length 50-70 nucleotides) optimized for specificity and binding properties (probe design).
● Probes are mechanically spotted. To control variation in the amount of printed cDNA/oligos and in spot morphology, a reference RNA sample is included. Thus, ratios are considered the basic units for analysing gene expression. Absolute intensities should be interpreted with care.
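Since ratios are the basic unit for two-colour arrays, a minimal sketch of a per-spot ratio may help. The helper name `log_ratio`, the use of a log2 scale and the intensities are assumptions made for illustration; they are not taken from the slides.

```python
import math

def log_ratio(sample_intensity, reference_intensity):
    """Return log2(sample / reference) for one spot.

    Hypothetical helper: both intensities are assumed to be positive
    numbers, e.g. background-corrected scanner read-outs.
    """
    if sample_intensity <= 0 or reference_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(sample_intensity / reference_intensity)

# Made-up intensities for three spots: (sample channel, reference channel)
spots = [(2000.0, 1000.0), (500.0, 500.0), (250.0, 1000.0)]
ratios = [log_ratio(s, r) for s, r in spots]
print(ratios)
```

On the log2 scale a two-fold increase and a two-fold decrease are symmetric around zero, which is one reason this scale is commonly chosen.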
Array production - Galbraith lab
Affymetrix GeneChip technology
Production by photolithography
Hybridisation process and biotin labelling; fragmentation aims to destroy higher-order structures of cRNA
Microarray technologies II

One-colour microarrays (Affymetrix GeneChips)

● Measurement of hybridisation of target RNA to sets of 25-mer oligonucleotides (probes).
● Probes are paired: perfect match (PM) and mismatch (MM). PMs are complementary to the gene sequence of interest. MMs include a single nucleotide changed in the middle position of the oligonucleotide. MMs serve to control for experimental variation and non-specific cross-hybridisation. Thus, MMs constitute internal references (on the probe site).
● The average (PM-MM) delivers a measure for gene expression. However, different methods to calculate summary indices exist (e.g. MAS, dChip, RMA ...)
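The plain average of PM-MM differences can be sketched as follows. This is only the simple average named on the slide, not the procedure used by MAS, dChip or RMA; the function name and the probe intensities are made up for illustration.

```python
def average_difference(pm, mm):
    """Mean of (PM - MM) over the probe pairs of one probe set."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("need equally many PM and MM values, at least one pair")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)

# Made-up intensities for a probe set with four probe pairs
pm_values = [1200.0, 900.0, 1500.0, 1100.0]
mm_values = [300.0, 400.0, 500.0, 300.0]

signal = average_difference(pm_values, mm_values)
print(signal)
```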
From Images to Numbers
● Impurities, overlapping spots
● Donut-shaped spots, inhomogeneous intensities
● Local background
● Spatial bias
Image Analysis
1. Localisation of spots: locate centres after (manual) adjustment of the grid.
2. Segmentation: classification of pixels either as signal or background. Different procedures exist to define the background.
3. Signal extraction: for each spot of the array, calculate signal intensity pairs, background and quality measures.
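Steps 2 and 3 can be illustrated with a small sketch. It assumes fixed-circle segmentation (one of several possible procedures) and medians as summary values; the function name, the window and the pixel values are hypothetical.

```python
from statistics import median

def extract_spot(image, centre_row, centre_col, radius):
    """Fixed-circle segmentation of one spot window.

    Pixels within `radius` of the centre are classified as signal,
    all other pixels of the window as local background.
    """
    signal, background = [], []
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            inside = (r - centre_row) ** 2 + (c - centre_col) ** 2 <= radius ** 2
            (signal if inside else background).append(value)
    return {
        "signal": median(signal),
        "background": median(background),
        "n_signal_pixels": len(signal),
    }

# Made-up 5x5 pixel window around one spot (one channel only)
window = [
    [10, 12, 11, 10, 13],
    [11, 90, 95, 92, 12],
    [10, 94, 99, 96, 11],
    [12, 91, 97, 93, 10],
    [13, 11, 10, 12, 11],
]
result = extract_spot(window, centre_row=2, centre_col=2, radius=1.5)
print(result)
```

For a two-colour array the same extraction would be run on both channels, giving the signal intensity pair for the spot.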
Data acquisition

● Scans of slides are usually stored in 16-bit TIFF files. Thus, scanned intensities vary between 0 and 2^16 - 1 (65535).
● Scanning of the separate channels can be adjusted by selecting the laser power and the gain of the photomultiplier.
   ● Common aim: balancing of channels.
   ● Common problem: avoiding saturation of high-intensity spots while increasing the signal-to-noise ratio.
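A quick check for saturation could look like the sketch below. The 16-bit limit follows from the file format; the function name and the pixel values of the two channels are made up for illustration.

```python
MAX_16BIT = 2 ** 16 - 1  # 65535, largest value a 16-bit pixel can hold

def saturated_fraction(pixels, limit=MAX_16BIT):
    """Fraction of pixel values at or above the saturation limit."""
    if not pixels:
        raise ValueError("no pixels given")
    return sum(1 for p in pixels if p >= limit) / len(pixels)

# Made-up pixel values for the two channels of one scan
red_channel = [1200, 65535, 40000, 65535, 300, 8000, 150, 65535]
green_channel = [1100, 52000, 36000, 61000, 280, 7500, 160, 64000]

red_saturated = saturated_fraction(red_channel)
green_saturated = saturated_fraction(green_channel)
print(red_saturated, green_saturated)
```

In this made-up scan the red channel is saturated in some pixels while the green one is not, which would suggest lowering the laser power or PMT gain for red.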
Data acquisition

● Image processing software produces a variety of measures: spot intensities, local background, spot morphology measures. Software packages vary in their computational approaches to image segmentation and read-out.
● Open issues:
   ● local background correction
   ● derivation of ratios for spot intensities
   ● flagging of spots
   ● multiple scanning procedures
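Two of the open issues, local background correction and flagging, can be made concrete with a sketch. Simple subtraction and the flagging rule used here are assumptions chosen for illustration, not a recommended procedure; all names and values are hypothetical.

```python
def correct_and_flag(foreground, background, min_signal=0.0):
    """Subtract local background; flag spots at or below `min_signal`."""
    corrected = foreground - background
    return corrected, corrected <= min_signal

# Made-up (foreground, local background) values for three spots
spots = {
    "spot_a": (1500.0, 200.0),
    "spot_b": (180.0, 210.0),  # foreground below local background
    "spot_c": (400.0, 150.0),
}

results = {name: correct_and_flag(fg, bg) for name, (fg, bg) in spots.items()}
flagged = sorted(name for name, (_, flag) in results.items() if flag)
print(results, flagged)
```

The second spot shows why the issue is open: plain subtraction yields a negative intensity, for which no ratio on a log scale can be formed.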
Design of experiment
Two-channel microarrays incorporate a reference sample. The choice of reference determines the follow-up analysis.
Reference design: all samples are co-hybridised with a common reference sample.
[Diagram: samples A1, A2, A3 and reference R]
● Advantage: Robust and scalable. The length of the path of a direct comparison equals 2.
● Disadvantage: Half of the measurements are made on the reference sample, which is commonly of little or no interest.
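The path length of 2 in the reference design means that two samples are compared via two arrays. A minimal sketch, assuming log2 ratios against the common reference R and made-up values:

```python
def indirect_log_ratio(m_a1_vs_ref, m_a2_vs_ref):
    """log2(A1/A2) derived from two arrays that share the reference R.

    Uses log2(A1/A2) = log2(A1/R) - log2(A2/R).
    """
    return m_a1_vs_ref - m_a2_vs_ref

m_a1 = 1.5   # made-up log2(A1 / R) for one gene
m_a2 = -0.5  # made-up log2(A2 / R) for the same gene

m_a1_vs_a2 = indirect_log_ratio(m_a1, m_a2)
print(m_a1_vs_a2)
```

Because the comparison combines two measurements, the noise of both arrays enters the result.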
Alternative Designs:
[Diagram: samples A1, A2, A3]
● Dye-swap design: each comparison includes a dye-swap to distinguish dye effects from differential expression (important for the direct labelling method)
● Loop design: No reference sample is involved. The increase in efficiency is, however, accompanied by a decrease in robustness.
● Latin-square design: classical design to separate the effects of different experimental factors
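How a dye-swap separates dye effects from differential expression can be sketched under a simple assumption: the dye bias adds the same amount to the log ratio on both arrays, while the expression effect changes sign when the dyes are swapped. This additive model and all values are illustrative assumptions.

```python
def dye_swap_estimates(m_forward, m_swapped):
    """Separate expression effect and dye bias from a dye-swap pair.

    Assumed model (log2 scale):
        m_forward =  effect + dye_bias
        m_swapped = -effect + dye_bias
    """
    effect = (m_forward - m_swapped) / 2
    dye_bias = (m_forward + m_swapped) / 2
    return effect, dye_bias

# Made-up example: true effect 1.0, dye bias 0.25
m_forward = 1.0 + 0.25
m_swapped = -1.0 + 0.25

effect, dye_bias = dye_swap_estimates(m_forward, m_swapped)
print(effect, dye_bias)
```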
Comparison of designs: Yang and Speed, Nature Reviews Genetics, 2002
Define before the experiment which differences (contrasts) should be determined, to make the best use of the (usually) limited number of arrays.
Sources of variation in gene expression measurements using microarrays
● Microarray platform
● Manufacturing or spotting process
   ● Manufacturing batch
   ● Amplification by PCR and purification
   ● Amount of cDNA spotted, morphology of spot and binding of cDNA to substrate
● mRNA extraction and preparation
   ● Protocol of mRNA extraction and amplification
   ● Labelling of mRNA
Sources of variation in gene expression measurements using microarrays II
● Hybridisation
   ● Hybridisation conditions such as temperature, humidity, hyb-buffer
● Scanning
   ● scanner
   ● scanning intensity and PMT settings
● Imaging
   ● software
   ● flagging, background correction, ...
Design of experiment
Important issues for DOE:
● Technical replicates assess the variability induced by experimental procedures.
● Biological replicates assess the generality of results.
● The number of replicates depends on the desired sensitivity and specificity of the measurements and on the research goal.
● Randomisation to avoid confounding of experimental factors. Blocking to reduce the number of experimental factors.
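Randomisation can be as simple as shuffling the samples before assigning them to slide batches, so that batch is not systematically tied to treatment. The sample names, the number of batches and the helper name below are assumptions for illustration.

```python
import random

def randomise_to_batches(samples, n_batches, seed=0):
    """Shuffle samples and deal them round-robin into batches.

    A fixed seed keeps the assignment reproducible and documentable.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return [shuffled[i::n_batches] for i in range(n_batches)]

# Hypothetical samples: three treated, three controls
samples = ["treated_1", "treated_2", "treated_3",
           "control_1", "control_2", "control_3"]

batches = randomise_to_batches(samples, n_batches=2)
for number, batch in enumerate(batches, start=1):
    print(f"slide batch {number}: {batch}")
```

Plain shuffling does not guarantee that each batch holds equally many treated and control samples; a blocked design would shuffle within each group instead.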
Design of experiment
● Control spots
   ● assess reproducibility within and between arrays, background intensity, cross-hybridisation and/or sensitivity of measurement
   ● can consist of empty spots or hybridisation buffer, genomic DNA, foreign DNA, housekeeping genes
   ● Foreign (non-cross-hybridising) cDNA can be 'spiked in'. Use of dilution series can assess sensitivity of detecting differential expression by 'ratio controls'.
● Validation of results:
   ● by other experimental techniques (e.g. Northern, RT-PCR)
   ● by comparison with independent experiments.
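One way to read 'ratio controls' is to compare the ratio at which a foreign cDNA was spiked in with the ratio actually measured. The sketch below assumes a log2 scale; the expected and observed ratios are made-up numbers, not measured data.

```python
import math

def ratio_control_errors(expected_ratios, observed_ratios):
    """Observed minus expected log2 ratio for each spiked-in control."""
    return [math.log2(obs) - math.log2(exp)
            for exp, obs in zip(expected_ratios, observed_ratios)]

# Made-up dilution series: ratios at which the controls were spiked in
expected = [1.0, 2.0, 4.0, 8.0]
# Made-up measured ratios for the same controls
observed = [1.0, 2.0, 2.0, 4.0]

errors = ratio_control_errors(expected, observed)
print(errors)
```

In this invented example the larger ratios are underestimated, the kind of deviation such controls are meant to reveal.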
Data storage
Microarray experiments produce large amounts of data: data storage and accessibility are of major importance for the follow-up analysis.
Not only signal values have to be stored but also:
● TIFF images and imaging read-out
● Gene annotation
● Experimental protocol
● Information about samples
● Results of pre-processing, normalization and further analysis
In fact, the data of the whole experiment have to be stored together with its internal structure, i.e. which sample was extracted by which method and hybridised on which batch of slides, by whom and when.
Type of storage
● Flat file: for small-scale experiments and one-off analysis. Example: NCBI GenBank
● Database: necessary for large-scale experiments. Microarray DBs are typically relational, SQL-based models. Their internal relational structure should reflect the experiment structure.
These types of databases will become essential tools for post-genomic analysis.
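A relational structure that mirrors the experiment might look like the sketch below, using Python's built-in sqlite3 module and an in-memory database. The table names, column names and inserted values are hypothetical; real microarray databases are considerably richer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (
    sample_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    extraction_method TEXT
);
CREATE TABLE hybridisation (
    hyb_id INTEGER PRIMARY KEY,
    sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
    slide_batch TEXT,
    operator TEXT,
    hyb_date TEXT
);
CREATE TABLE measurement (
    hyb_id INTEGER NOT NULL REFERENCES hybridisation(hyb_id),
    gene TEXT NOT NULL,
    log_ratio REAL
);
""")

# Made-up records: one sample, one hybridisation, one measured gene
conn.execute("INSERT INTO sample VALUES (1, 'A1', 'hypothetical protocol X')")
conn.execute("INSERT INTO hybridisation VALUES "
             "(1, 1, 'batch-01', 'operator-1', '2000-01-01')")
conn.execute("INSERT INTO measurement VALUES (1, 'gene_1', 0.8)")

# Which sample, extracted how, hybridised on which batch, by whom and when?
rows = conn.execute("""
    SELECT s.name, s.extraction_method, h.slide_batch,
           h.operator, h.hyb_date, m.gene, m.log_ratio
    FROM measurement m
    JOIN hybridisation h ON h.hyb_id = m.hyb_id
    JOIN sample s ON s.sample_id = h.sample_id
""").fetchall()
print(rows)
```

Each measured value can be traced back through the joins to the sample and the hybridisation it came from, which a flat file of signal values alone cannot offer.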
Data storage II
Sharing microarray data:
• NCBI
• EBI
• Stanford
• Journals, e.g. Nature
Standardization of information by MGED:
● MIAME (Minimum Information About a Microarray Experiment)
● MAGE-ML, based on XML, for data exchange
Take-home messages for today
• Remember: Microarrays are shadows of genetic networks
• Watch out for experimental variation
• The complexity of microarray experiments should be reflected in the structure used for data storage
• For soccer tonight: Go, Italy, go!