Plate effects in cDNA microarray data
Henrik Bengtsson
Mathematical Statistics
Centre for Mathematical Sciences
Lund University, Sweden
Outline
• Data
• Known systematic variation / artifacts
• New way of plotting microarray data
• Print order / Plate effects
• Normalization of plate effects
• Normalization strategies
• Finding the best strategy: Measure of Reproducibility
• Results
• Discussion
2 of 21
Data
• Matt Callow’s apoAI experiment (2000):
– (8 apoAI-KO mice vs. pool of 8 control mice; not used here)
– 8 control mice vs. pool of 8 control mice, i.e. eight hybridized slides.
– 5357 ESTs/genes (6 triplicates, 175 duplicates, 4989 single-spotted) & 840 blanks => 6384 spots in all.
– Labeled using Cy3-dUTP and Cy5-dUTP.
– Signals extracted from the images using the Spot image-analysis software.
3 of 21
Intensity dependent effects
The log-ratio, M, depends on the intensity of the spot, A.
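For reference, a minimal sketch of how M and A are usually computed for one spot, assuming the standard MA-plot definitions M = log2(R/G) and A = ½·log2(R·G) (the slide itself does not spell them out). Whether R and G are raw foreground or background-subtracted intensities is left to the caller, and the function name `ma_values` is hypothetical.

```python
import math


def ma_values(red: float, green: float) -> tuple[float, float]:
    """Return (M, A) for one spot.

    Assumes the usual MA-plot definitions:
      M = log2(R / G)        (log-ratio)
      A = 0.5 * log2(R * G)  (average log-intensity)
    """
    if red <= 0 or green <= 0:
        # Logarithms are undefined for non-positive intensities, which can
        # occur after background subtraction.
        raise ValueError("intensities must be positive")
    m = math.log2(red / green)
    a = 0.5 * math.log2(red * green)
    return m, a


m, a = ma_values(4000.0, 1000.0)  # a spot four times brighter in red
print(m)  # 2.0
```

Plotting M against A for all spots of a slide gives the intensity-dependent trend described on this slide.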
4 of 21
Print-tip/spatial intensity effects
The log-ratio (and its variance) varies with print-tip group.
But how are the spots printed?
5 of 21
6384 spots are printed onto N slides in a total of 399 print turns using 4x4 print-tips:
4 · 4 · 399 = 6384
6 of 21
Print order plot
The spots are ordered according to when they were spotted/dipped onto the glass slide(s). Note that it takes hours or days to print all spots on all slides.
7 of 21
Print dip plot
Median of the 16 log-ratios deposited at each dip, for each of the 399 print turns.
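The print dip summary can be sketched as follows, assuming (hypothetically) that the log-ratios are already sorted in print order, so that each consecutive block of 16 values was deposited by the 4x4 print-tips in the same dip. The real mapping from array position to print order depends on the arrayer and is not given on the slides; `print_dip_medians` is an illustrative name.

```python
from statistics import median

N_TIPS = 16  # 4x4 print-tips, one spot per tip per dip


def print_dip_medians(m_in_print_order: list[float], n_tips: int = N_TIPS) -> list[float]:
    """Median log-ratio of the spots deposited at each dip (print turn)."""
    if len(m_in_print_order) % n_tips != 0:
        raise ValueError("number of spots must be a multiple of n_tips")
    return [
        median(m_in_print_order[start:start + n_tips])
        for start in range(0, len(m_in_print_order), n_tips)
    ]


# 6384 spots / 16 tips = 399 print turns, matching 4 * 4 * 399 = 6384.
medians = print_dip_medians([0.0] * 6384)
print(len(medians))  # 399
```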
8 of 21
Sources of artifacts
[Figure: the production and hybridization pipeline. Production: cDNA clones, PCR amplification, purification, printing. Hybridization: test and reference sample RNA and cDNA, hybridization, red and green laser excitation, emission, scanning, overlay images, data: (Rfg, Gfg, Rbg, Gbg, ...). Annotated sources of artifacts: plate effects (clone sets, ..., product?), print order effects (climate, print-tips, ...), intensity effects (labeling efficiency) and intensity effects (quenching).]
9 of 21
Plate effects
The log-ratios depend on the plate the spotted clone comes from
(384-well plates from 6 different labs were used).
10 of 21
Normalizing plate by plate
Assumption:
The genes from one plate are on average non-differentially expressed.
Is this correct?
Are the clones on the plates selected randomly?
Spots from the same plate are less random than, for instance, spots in the same print-tip group.
Recall that in the current setup we compare 8 control mice with the pool of them, so no gene is expected to be differentially expressed and the assumption holds here.
11 of 21
Removing (constant) plate biases
Will remove some of the intensity dependent effects...
...and some of the spatial artifacts.
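A minimal sketch of constant platewise normalization under the assumption on the previous slide: each plate's median log-ratio is taken as that plate's bias and subtracted from its spots. Using the median rather than the mean is a robustness choice made here, per-spot plate labels are assumed to be available, and `constant_plate_normalize` is a hypothetical name.

```python
from collections import defaultdict
from statistics import median


def constant_plate_normalize(m: list[float], plate: list[str]) -> list[float]:
    """Subtract each plate's median log-ratio from the spots of that plate.

    Relies on the assumption that genes on one plate are, on average, not
    differentially expressed, so the plate median estimates a constant bias.
    """
    if len(m) != len(plate):
        raise ValueError("m and plate must have the same length")
    by_plate: dict[str, list[float]] = defaultdict(list)
    for value, p in zip(m, plate):
        by_plate[p].append(value)
    plate_median = {p: median(values) for p, values in by_plate.items()}
    return [value - plate_median[p] for value, p in zip(m, plate)]


print(constant_plate_normalize([1.0, 2.0, 3.0, -1.0, 0.0, 1.0],
                               ["a", "a", "a", "b", "b", "b"]))
# [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
```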
12 of 21
...and then an intensity normalization?
• Intensity normalization => reintroduces plate biases!
Why? Because the intensities of the
spots, A, also show plate effects.
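Intensity dependent normalization is commonly done by fitting a smooth curve of M against A and subtracting it. The sketch below uses a simplified lowess (tricube-weighted local linear regression without the robustness iterations); the `span` value is an arbitrary assumption, the function names are hypothetical, and the quadratic running time makes it an illustration rather than a tool for full slides. Because the fitted curve depends only on A, and A itself differs between plates, this step can shift plate-level biases back in.

```python
def local_linear_fit(x: list[float], y: list[float], span: float = 0.4) -> list[float]:
    """Fitted values of a tricube-weighted local linear regression of y on x.

    A simplified lowess: no robustness iterations. `span` is the fraction of
    points that define each local neighbourhood.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    k = min(n, max(2, round(span * n)))  # neighbourhood size
    fitted = []
    for x0 in x:
        dists = [abs(xi - x0) for xi in x]
        h = sorted(dists)[k - 1]  # distance to the k-th nearest neighbour
        if h == 0.0:
            # All k nearest points share x0: give them equal weight.
            w = [1.0 if d == 0.0 else 0.0 for d in dists]
        else:
            w = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dists]
        sw = sum(w)  # > 0, since x0 itself always has weight 1
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (x0 - xbar))
    return fitted


def intensity_normalize(m: list[float], a: list[float], span: float = 0.4) -> list[float]:
    """Subtract the fitted M-vs-A curve from each log-ratio (slidewise, like Sl(A))."""
    curve = local_linear_fit(a, m, span)
    return [mi - ci for mi, ci in zip(m, curve)]
```

A library lowess would normally replace `local_linear_fit`; the point here is only that the correction is a function of A alone.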
13 of 21
Should we normalize A for plate effects? No!
Less DNA hybridized to the blanks and to the “brain” spots than to the rest (the “liver” clones).
Intensity dependent normalization plate by plate
...plus a print-tip normalization?
Removes the plate effects...
...and most of the spatial artifacts.
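The two-step strategy on this slide, intensity dependent normalization within each plate followed by print-tip normalization, amounts to running the same curve fit separately inside each group. A sketch, assuming per-spot plate and print-tip labels and reusing a simplified lowess (local linear, tricube weights, no robustness iterations); the function names and `span` are illustrative assumptions. In line with the previous slide, A is not normalized.

```python
from collections import defaultdict


def local_linear_fit(x: list[float], y: list[float], span: float = 0.4) -> list[float]:
    """Simplified lowess: tricube-weighted local linear fit, no robustness steps."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("each group needs at least two spots")
    k = min(n, max(2, round(span * n)))
    fitted = []
    for x0 in x:
        d = [abs(xi - x0) for xi in x]
        h = sorted(d)[k - 1]
        w = ([1.0 if di == 0.0 else 0.0 for di in d] if h == 0.0
             else [(1 - (di / h) ** 3) ** 3 if di < h else 0.0 for di in d])
        sw = sum(w)
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
        fitted.append(ybar + (sxy / sxx if sxx > 0 else 0.0) * (x0 - xbar))
    return fitted


def normalize_within_groups(m, a, groups, span=0.4):
    """Subtract an M-vs-A curve fitted separately within each group; A is unchanged."""
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    out = list(m)
    for idx in members.values():
        curve = local_linear_fit([a[i] for i in idx], [m[i] for i in idx], span)
        for i, c in zip(idx, curve):
            out[i] = m[i] - c
    return out


def plate_then_print_tip(m, a, plates, tips, span=0.4):
    """Platewise intensity normalization, then print-tip-wise intensity normalization."""
    m_plate = normalize_within_groups(m, a, plates, span)
    return normalize_within_groups(m_plate, a, tips, span)
```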
14 of 21
Multiple ways to normalize
• Component-wise normalization methods, which will work in the general case, e.g.
– print-tip normalization + constant plate normalization
– plate intensity normalization + print-tip normalization
– ...
• Simultaneous normalization methods (not covered here), which require a model and will not be applicable in the general case, e.g.
– print-tip & plate intensity normalization (two dimensions)
– ...
We need a way to compare the outcomes of the different strategies...
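A component-wise strategy can be thought of as an ordered list of steps, each normalizing the log-ratios within one grouping of the spots. The sketch below uses constant (median-centering) steps for brevity; an intensity dependent step would plug in the same way. `Step`, `median_center_within` and `run_strategy` are hypothetical names, and the four-spot layout is made up.

```python
from collections import defaultdict
from statistics import median
from typing import Callable, Sequence

# A step maps (M, A) to normalized M; A is passed along but left unchanged.
Step = Callable[[list[float], list[float]], list[float]]


def median_center_within(groups: Sequence[str]) -> Step:
    """Build a step that subtracts each group's median log-ratio."""
    def step(m: list[float], a: list[float]) -> list[float]:
        by_group: dict[str, list[float]] = defaultdict(list)
        for value, g in zip(m, groups):
            by_group[g].append(value)
        centre = {g: median(v) for g, v in by_group.items()}
        return [value - centre[g] for value, g in zip(m, groups)]
    return step


def run_strategy(m: list[float], a: list[float], steps: Sequence[Step]) -> list[float]:
    """Apply the steps in order; each one sees the output of the previous one."""
    for step in steps:
        m = step(m, a)
    return m


# Hypothetical layout: 4 spots, two print-tip groups crossed with two plates.
tips = ["t1", "t1", "t2", "t2"]
plates = ["p1", "p2", "p1", "p2"]
m = [1.0, 2.0, 3.0, 4.0]
a = [10.0, 11.0, 12.0, 13.0]
print(run_strategy(m, a, [median_center_within(tips), median_center_within(plates)]))
# [0.0, 0.0, 0.0, 0.0]
```

Each strategy then yields one set of normalized log-ratios, and these outcomes are what the measure of reproducibility compares.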
15 of 21
Measure of Reproducibility
Example: two different genes with $d_a < d_b$, i.e. gene a has the more reproducible replicates.
Median absolute deviation (MAD) for gene i with replicates j = 1, 2, ..., J:
$d_i = 1.4826 \cdot \operatorname{median}_j |r_{ij}|$,
where $r_{ij} = M_{ij} - \operatorname{median}_j M_{ij}$ is residual j for gene i.
The measure of reproducibility (small is good) is a scalar defined as the mean of all genewise MADs:
$\text{M.O.R.} = \frac{1}{N} \sum_{i=1}^{N} d_i$,
where N is the number of genes.
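The measure of reproducibility translates directly into code. The sketch assumes the replicate log-ratios have already been grouped per gene (for instance a gene's M values across the eight slides, plus any within-slide duplicates); how replicates are pooled, and the names used, are assumptions here.

```python
from statistics import mean, median

MAD_CONSTANT = 1.4826  # scales the MAD so it estimates the SD for normal data


def genewise_mad(replicates: list[float]) -> float:
    """d_i = 1.4826 * median_j |M_ij - median_j M_ij| for one gene."""
    centre = median(replicates)
    return MAD_CONSTANT * median([abs(x - centre) for x in replicates])


def measure_of_reproducibility(replicates_by_gene: dict[str, list[float]]) -> float:
    """M.O.R. = mean of the genewise MADs over all N genes (smaller is better)."""
    return mean([genewise_mad(r) for r in replicates_by_gene.values()])


example = {"gene_a": [0.1, 0.2, 0.3], "gene_b": [-1.0, 0.0, 1.0]}
print(measure_of_reproducibility(example))
```

Here gene_a plays the role of the gene with $d_a < d_b$: its replicates agree more closely, so its MAD is smaller.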
16 of 21
Results
21 different normalization strategies were applied to both background-subtracted and non-background-subtracted data, i.e. 42 runs in total.
Pl – Constant platewise normalization,
Pl(A) – Intensity dependent platewise normalization,
Sl(A) – Intensity dependent slidewise normalization,
Pr(A) – Intensity dependent print-tip-wise normalization,
sPr(A) – Scaled intensity dependent print-tip-wise normalization,
bg – background corrected data.
17 of 21
Results
• Doing platewise intensity dependent normalization after print-tip normalization lowers the gene variability by another ~10%.
• In all cases it is better not to do background correction.
• The measure of reproducibility is helpful in deciding what to do.
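The "another ~10%" can be checked against the M.O.R. values reported for the visual comparison (0.270 without normalization, 0.123 after scaled print-tip intensity normalization, 0.110 after additionally normalizing plate by plate). A small worked calculation:

```python
baseline = 0.270         # no normalization
print_tip = 0.123        # scaled print-tip intensity normalization
print_tip_plate = 0.110  # ... followed by plate intensity normalization

print(round(100 * print_tip / baseline))                  # 46 (% of baseline)
print(round(100 * print_tip_plate / baseline))            # 41 (% of baseline)
print(round(100 * (1 - print_tip_plate / print_tip), 1))  # 10.6 (% further reduction)
```

So the plate step removes about 10.6% of the variability left after print-tip normalization, which is 5 percentage points of the unnormalized M.O.R. (46% to 41%).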
Pl – Constant platewise norm., Pl(A) – Intensity dep. platewise norm., Sl(A) – Intensity dep. slidewise norm., Pr(A) – Intensity dep. print-tip-wise norm., sPr(A) – Scaled intensity dep. print-tip-wise norm., bg – background corrected data.
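The scaling in sPr(A) is not spelled out on the slides. One common approach, assumed here, is MAD-based scale normalization: after location (intensity) normalization, the log-ratios in each print-tip group are divided by that group's MAD relative to the geometric mean of all group MADs, so every group ends up with the same spread. The function below is a hypothetical sketch of that step only.

```python
import math
from collections import defaultdict
from statistics import median


def mad(values: list[float]) -> float:
    """Unscaled median absolute deviation (the 1.4826 factor cancels below)."""
    centre = median(values)
    return median([abs(v - centre) for v in values])


def scale_normalize(m: list[float], groups: list[str]) -> list[float]:
    """Divide each group's log-ratios by MAD_g / (geometric mean of all group MADs).

    Assumes m is already location-normalized (centered) within each group.
    """
    by_group: dict[str, list[float]] = defaultdict(list)
    for value, g in zip(m, groups):
        by_group[g].append(value)
    group_mad = {g: mad(v) for g, v in by_group.items()}
    if any(s <= 0 for s in group_mad.values()):
        raise ValueError("every group needs a positive MAD")
    geo_mean = math.exp(sum(math.log(s) for s in group_mad.values()) / len(group_mad))
    return [value * geo_mean / group_mad[g] for value, g in zip(m, groups)]
```

Passing plate labels instead of print-tip labels would give one possible platewise normalization of variances, the open question raised in the Discussion.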
18 of 21
Visual comparison
No normalization:
(M.O.R. = 0.270; 100%)
Scaled print-tip intensity normalization:
(M.O.R. = 0.123; 46%)
Scaled print-tip followed by plate intensity normalization:
(M.O.R. = 0.110; 41%)
19 of 21
Discussion
• What are the reasons for seeing plate effects and
where do they actually occur?
i) in the clone setup, ii) on the plates, iii) during printing,
iv) at hybridization, or elsewhere?
• Look at the behavior of the variance in addition to the
bias. Are there any reasons for doing platewise
normalization of variances too?
• How general is the result that not doing background
subtraction performs better than doing it?
20 of 21
Acknowledgements
Statistics Dept, UC Berkeley:
* Sandrine Dudoit
* Terry Speed
* Yee Hwa Yang
Lawrence Berkeley National Laboratory:
* Matt Callow
Mathematical Statistics, Lund University:
* Ola Hössjer, Jan Holst
Ernest Gallo Research Center, UCSF:
* Karen Berger
com.braju.sma – object-oriented extension to sma (free):
http://www.braju.com/R/
[R] Software (free):
http://www.r-project.org/
The Statistical Microarray Analysis (sma) library (free):
http://www.stat.berkeley.edu/users/terry/zarray/Software/smacode.html
21 of 21
Extra slides
22 of 21