Microarray Pitfalls
Stem Cell Network
Microarray Course, Unit 3
October 2006
Goals
• To provide some guidelines on Affymetrix
microarrays:
– How to use them
– How not to use them
– Things to keep in mind when designing
experiments and analyzing data
• This is a general discussion of issues and
is by no means exhaustive
Inconsistent Annotations
• Affymetrix-provided probeset annotations
change over time
• The gene symbol associated with a given
probeset is not necessarily stable
• This is due to changes in gene prediction
as new information becomes available.
Inconsistent Annotations (2)
(Figure: an inconsistently annotated probeset)
• Perez-Iratxeta, C. and M.A. Andrade.
2005. Inconsistencies over time in 5% of
NetAffx probe-to-gene annotations. BMC
Bioinformatics. 6, 183.
– 5% of probesets have gene identifiers that
change over the two-year time span covered
by this analysis
Inconsistent Annotations (3)
• How do we deal with this?
– Always note the annotation version used in an
analysis, especially when it is for publication
– Report probeset name as well as gene
symbol
– Remember that re-analysis with later
annotations may yield different results
– Keep your annotation files up to date
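One way to see how much a re-analysis might be affected is to compare two annotation releases directly. The following is a minimal sketch, assuming both releases have already been loaded as probeset-to-symbol dictionaries. The probeset IDs, gene symbols, and the function name are made up for illustration and are not taken from any NetAffx file.

```python
def changed_annotations(old, new):
    """Return probesets whose gene symbol differs between two releases.

    old, new: dicts mapping probeset ID -> gene symbol (None if unannotated).
    """
    changed = {}
    for probeset in sorted(set(old) | set(new)):
        before = old.get(probeset)
        after = new.get(probeset)
        if before != after:
            changed[probeset] = (before, after)
    return changed


# Hypothetical annotation releases (illustrative IDs and symbols only).
release_a = {"ps_0001": "GENE1", "ps_0002": "GENE2", "ps_0003": None}
release_b = {"ps_0001": "GENE1", "ps_0002": "GENE2B", "ps_0003": "GENE3"}

for probeset, (before, after) in changed_annotations(release_a, release_b).items():
    print(f"{probeset}: {before} -> {after}")
```

Reporting the probeset ID alongside the symbol, as recommended above, is what makes this kind of comparison possible later.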
Old chips, new data
• Expression microarrays are designed
based on the best available model of the
genome of interest
• The model for the HG-U133 microarrays
was a human genome assembly that was
only 25% complete!
• The human assembly is >99% complete
now
Old chips, new data (2)
• How do we deal with this?
– A number of groups provide re-mappings of
probes to probesets based upon the latest
data available, for example:
• Dai M, et al. Evolving gene/transcript definitions
significantly alter the interpretation of GeneChip
data. Nucleic Acids Res. 2005;33:e175
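The general idea behind re-mapping can be sketched as follows: individual probes are regrouped by the gene they match under current annotations, and groups with too few probes are dropped. This is only an illustration of the concept, not the procedure used by Dai et al. The probe IDs, gene IDs, function name, and the minimum group size of 3 are all assumptions made for this sketch.

```python
from collections import defaultdict


def regroup_probes(probe_to_gene, min_probes=3):
    """Group probes into gene-level probesets using an updated mapping.

    probe_to_gene: dict mapping probe ID -> current gene ID, or None if the
    probe no longer matches a single known gene.
    min_probes: smallest group that is kept (assumed threshold).
    """
    groups = defaultdict(list)
    for probe, gene in probe_to_gene.items():
        if gene is not None:  # drop probes without a unique current target
            groups[gene].append(probe)
    return {
        gene: sorted(probes)
        for gene, probes in groups.items()
        if len(probes) >= min_probes
    }


# Hypothetical probe-level mapping (illustrative IDs only).
mapping = {"p1": "G1", "p2": "G1", "p3": "G1", "p4": "G2", "p5": None, "p6": "G2"}
custom_probesets = regroup_probes(mapping)
print(custom_probesets)
```

In this example the group for G2 is discarded because only two probes support it, and probe p5 is discarded because it has no current target.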
Multiple Testing Corrections
• A single expression microarray experiment
actually consists of hundreds of thousands
of simultaneous parallel experiments
• This means you can test many hypotheses
simultaneously
• This is not free: the significance of any
given result decreases as a function of
the number of hypotheses tested
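A short calculation shows why this matters. The sketch below assumes that every null hypothesis is true and that the tests are independent, which is a simplification (genes on an array are not independent). The function names and the significance level of 0.05 are chosen for illustration.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of false positives when all null hypotheses are true."""
    return n_tests * alpha


def familywise_error_rate(n_tests, alpha):
    """Probability of at least one false positive.

    Assumes independent tests and that every null hypothesis is true.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


for n in (1, 10, 100, 10000):
    fp = expected_false_positives(n, 0.05)
    fwer = familywise_error_rate(n, 0.05)
    print(f"{n:>6} tests: expected false positives = {fp:.1f}, "
          f"P(at least one) = {fwer:.4f}")
```

With a single test the chance of a false positive is 5%. With ten tests it is already about 40%, and with thousands of tests at least one false positive is a near certainty.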
Multiple Testing Corrections (2)
• How do we deal with this?
– Limit the number of hypotheses you are testing
instead of just ‘fishing’ in the whole data set.
– Do this by selecting a set of candidate genes
ahead of time based on your knowledge of
the biology of the system.
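The benefit of a pre-selected candidate list can be illustrated with a Bonferroni correction, which divides the significance level by the number of tests. Bonferroni is used here only because it is the simplest correction; the slides do not prescribe a particular method. All gene names and p-values below are invented, and the candidate list is assumed to have been fixed before looking at the data.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff under a Bonferroni correction."""
    return alpha / n_tests


def significant(p_values, alpha=0.05):
    """Return the names whose p-value passes the Bonferroni cutoff."""
    cutoff = bonferroni_threshold(alpha, len(p_values))
    return sorted(name for name, p in p_values.items() if p <= cutoff)


# Hypothetical p-values for three candidate genes chosen ahead of time.
candidates = {"cand_a": 0.0004, "cand_b": 0.012, "cand_c": 0.3}

# The same three genes tested alongside 9,997 other probesets
# (10,000 tests in total, all with made-up p-values).
whole_array = dict(candidates)
whole_array.update({f"other_{i}": 0.5 for i in range(9997)})

print("Candidate set only:", significant(candidates))
print("Whole array:       ", significant(whole_array))
```

With three tests the cutoff is about 0.017 and two candidates pass. With 10,000 tests the cutoff drops to 0.000005 and the same p-values no longer pass.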
Multiple Testing Corrections (3)
• Dudoit S, Shaffer JP, Boldrick JC. Multiple
hypothesis testing in microarray experiments.
Statistical Science. 2003;18(1):71–103
– “The biological question of differential expression can
be restated as a problem in multiple hypothesis
testing: the simultaneous test for each gene of the
null hypothesis of no association between the
expression levels and the responses”
• Talk to a statistician if you have doubts
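When the whole array must be tested, a less conservative alternative to controlling the chance of any false positive is to control the false discovery rate. The sketch below implements the Benjamini-Hochberg step-up procedure, a widely used method of this kind. It is not named in these slides and is offered only as an example; the p-values are invented, and the procedure's guarantees rest on assumptions about dependence between tests that a statistician can help assess.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the indices of hypotheses rejected at false discovery rate q.

    Step-up rule: sort the p-values, find the largest rank k (1-based) with
    p_(k) <= k / m * q, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            largest_k = rank
    return sorted(order[:largest_k])


# Hypothetical p-values for five probesets.
pvals = [0.039, 0.001, 0.9, 0.012, 0.035]
print(benjamini_hochberg(pvals))
```

Here four of the five hypotheses are rejected. The p-value 0.035 fails its own rank threshold (0.03) but is still rejected because a larger p-value, 0.039, passes the threshold at its rank (0.04). A Bonferroni cutoff of 0.01 would have kept only the smallest p-value.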
Not everything is in the array
• Probesets are designed with a bias
towards the 3’ end of the gene.
• They won’t distinguish splice variants
• They won’t pick up alternative 3’ ends
Not everything is in the array (2)
• What can we do about this?
– You should be aware of this, but not much can
be done.
– Use other technologies to complement your
microarray results (PCR, sequencing)
What are you measuring?
• Remember that you are detecting the
average mRNA over a population of cells.
• Is your sample homogeneous?
• If it’s not homogeneous then what are you
measuring? How many types of cells, and in
what states?
• Time series of differentiating cells are
particularly problematic.
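The effect of a mixed population can be shown with a simple weighted average. In the sketch below the per-cell expression of a gene never changes within either cell type, yet the bulk signal changes because the proportions of the two types shift, as they might during differentiation. The cell-type names, fractions, and expression levels are invented for illustration.

```python
def bulk_signal(fractions, per_cell_expression):
    """Population-average expression: sum of fraction * per-cell level."""
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("cell-type fractions must sum to 1")
    return sum(fractions[cell] * per_cell_expression[cell] for cell in fractions)


# Hypothetical per-cell expression of one gene, constant over time.
per_cell = {"stem": 100.0, "differentiated": 10.0}

# Hypothetical population composition at two time points.
day0 = {"stem": 0.9, "differentiated": 0.1}
day7 = {"stem": 0.3, "differentiated": 0.7}

print("Day 0 bulk signal:", bulk_signal(day0, per_cell))
print("Day 7 bulk signal:", bulk_signal(day7, per_cell))
```

The bulk signal falls from about 91 to about 37 even though no individual cell type changed its expression of the gene. An apparent down-regulation of this kind reflects population composition, not gene regulation.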
Inhomogeneous Samples?
• Many sources of inhomogeneity
– Source organism gender
– Cell cycle
– Tissue source
– Diet
• Some can be eliminated
• All should be documented where possible
Chips don’t detect protein
• Central assumption of microarray analysis:
The level of mRNA is positively correlated
with protein expression levels.
– Higher mRNA levels mean higher protein
expression, lower mRNA means lower protein
expression
• Other factors:
– Protein degradation, mRNA degradation,
polyadenylation, codon preference, translation
rates, etc.
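A toy model makes the role of these other factors concrete. Assume protein is produced at a rate proportional to mRNA and degraded at a rate proportional to its own level. At steady state the protein level is then (translation rate × mRNA) / degradation rate. This model and all numbers below are assumptions for illustration only and are not part of the original presentation.

```python
def steady_state_protein(mrna, translation_rate, degradation_rate):
    """Steady-state protein level for dP/dt = translation_rate * mrna
    - degradation_rate * P (a deliberately simplified toy model)."""
    if degradation_rate <= 0:
        raise ValueError("degradation_rate must be positive")
    return translation_rate * mrna / degradation_rate


# Two hypothetical genes with identical mRNA levels on the array.
gene_a = steady_state_protein(mrna=50.0, translation_rate=2.0, degradation_rate=0.5)
gene_b = steady_state_protein(mrna=50.0, translation_rate=0.5, degradation_rate=2.0)

print("Gene A protein:", gene_a)
print("Gene B protein:", gene_b)
```

Both genes would give the same microarray signal, yet under these assumed rates their protein levels differ sixteen-fold. Within a single gene, higher mRNA still means higher protein in this model, which is the sense in which the central assumption holds.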
Conclusion
• This is a general discussion of issues and
does not cover all pitfalls.
• Please contact [email protected] if you
have any comments, corrections or
questions.
• See associated bibliography for references
from this presentation and further reading.
• Thanks for your attention!