Microarray Quality Assessment

Issues in High-Throughput Data Analysis
BIOS 691-803 Spring 2010
Dr Mark Reimers
Quality Assessment
• Are there any factors that would lead you
to doubt or distrust a particular datum
(array)?
• Quality of inputs – e.g. RNA quality
• Statistical QA – evidence of systematic
variation different from others
BioAnalyzer
Ideal: Two sharp peaks for 18S & 28S RNA
Spot QA for cDNA Spotted Arrays
• Spot Measures
– Signal/Noise
• Foreground / background, or
• Foreground / SD
– Uniformity
– Spot Area
• Global Measures
– Qualitative assessments
– Averages of spot measures
• Inspect images for artifacts
– Streaks of dye, scratches etc.
• Are there biases in regions?
With commercial arrays we assume these issues are under control
Statistical Approaches
• Question: Are any samples different from
others on technical grounds?
• Exploratory Data Analysis (EDA)
• Boxplots, clustering, PCA
– Are there any outliers?
– Are there associations with technical factors?
• Technician; date of sample prep; etc.
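As a rough illustration of the boxplot check above, the sketch below computes the quartiles that a boxplot would draw for each chip and flags chips whose median sits far from the others. The function names, the dictionary input, and the cutoff of 3 MADs are assumptions made for this example, not part of the lecture.

```python
from statistics import median, quantiles

def chip_summaries(log_signal):
    """Return {chip: (q1, median, q3)} of log intensities, i.e. the box of a boxplot."""
    out = {}
    for chip, values in log_signal.items():
        q1, q2, q3 = quantiles(values, n=4, method="inclusive")
        out[chip] = (q1, q2, q3)
    return out

def flag_outlier_chips(log_signal, k=3.0):
    """Flag chips whose median is more than k MADs from the median of chip medians.

    The threshold k is an arbitrary illustrative choice.
    """
    meds = {chip: s[1] for chip, s in chip_summaries(log_signal).items()}
    center = median(meds.values())
    mad = median(abs(m - center) for m in meds.values())
    if mad == 0:
        # All typical chips agree exactly; anything different stands out.
        return [chip for chip, m in meds.items() if m != center]
    return [chip for chip, m in meds.items() if abs(m - center) / mad > k]

# Hypothetical toy data: chip "D" is shifted upward relative to the rest.
toy = {
    "A": [6, 7, 8, 9, 10],
    "B": [6.1, 7.1, 8.1, 9.1, 10.1],
    "C": [5.9, 6.9, 7.9, 8.9, 9.9],
    "D": [9, 10, 11, 12, 13],
}
flagged = flag_outlier_chips(toy)
```

A flagged chip is only a candidate for closer inspection; whether it reflects biology or a technical factor still has to be checked against the sample records.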
EDA - Boxplots
• Boxplot of 16 chips from Cheung et al
Nature 2005
Another Portrait - Densities
Density Plots: before and after
[Figure: density of log(Signal) for each chip; x-axis log(Signal), y-axis Density; legend lists chips GSM25524 to GSM25531, GSM25540 to GSM25543, and GSM25548 to GSM25551 (.CEL files)]
Probe Intensities in 23 Replicates
Some Causes of Technical Variation
• Temperature of hybridization differs
• Amount of RNA differs
• RNA degraded in some samples
• Yield of conversion to cDNA or cRNA differs
• Strength of ionic buffers differs
• Stringency of wash differs
• Scratches on some chips
• Ozone (affects Cy5) at some times
Borrow an Idea from Model Testing
• Question: Is the model adequate? Or do
hidden factors cause systematic errors?
• Examine residuals after fitting model
– Should be IID Normal
– Is there structure in residuals?
– Plot against known technical covariates, such
as order of sample
• How to adapt residual examination for
high-throughput assays?
Statistical QA for Arrays
• Model for signal of probe i on chip j: y_ij ~ m_i + e_ij
– Each gene has same mean in all arrays (mostly true)
– Look at residuals after fitting model
• New twist for high-throughput assays:
– Examine residuals within each chip (fix j; vary i)
– Plot against known technical factors of probes
– Is there any factor that seems to be predicting
systematic errors?
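The residual idea above can be sketched in a few lines. This example estimates the per-probe mean m_i by the median across chips (a robust stand-in chosen for this sketch, not necessarily what the lecture used) and then summarizes the residuals e_ij within each chip. The function names and the nested-list layout are assumptions.

```python
from statistics import median

def probe_residuals(y):
    """y[i][j] is the log signal of probe i on chip j.

    Returns e[i][j] = y[i][j] - m_i, with m_i taken as the median of row i.
    """
    resid = []
    for row in y:
        m_i = median(row)
        resid.append([v - m_i for v in row])
    return resid

def chip_residual_summary(resid):
    """For each chip j, return (median residual, median absolute residual) over probes."""
    n_chips = len(resid[0])
    summary = []
    for j in range(n_chips):
        col = [row[j] for row in resid]
        summary.append((median(col), median(abs(r) for r in col)))
    return summary

# Hypothetical toy data: three probes on three chips; the third chip reads high.
y = [[8, 8, 10], [6, 6, 8], [10, 10, 12]]
resid = probe_residuals(y)
summary = chip_residual_summary(resid)
```

A chip whose residuals are systematically shifted or unusually large, as the third chip is here, is the kind of array to examine against the technical covariates of the probes.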
Statistical QA of Arrays
• Significant artifacts may not be obvious
from visual inspection or bulk statistics
• General approach: plot deviations from
average or residuals from fit against any
technical variable:
– Average Intensity across chips
– CG content or Tm
– Probe position relative to 3’ end of gene (for
poly-T primed RNA)
– Physical location on chip
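One simple way to look for a trend against a technical variable without a plotting library is to bin the probes by that variable and take the median residual in each bin. The sketch below does this for a single chip; the name `binned_median`, the equal-count binning, and the toy covariate are assumptions for illustration.

```python
from statistics import median

def binned_median(covariate, residual, n_bins=4):
    """Sort probes by covariate, split into n_bins equal-count bins,
    and return a list of (median covariate, median residual) per bin."""
    pairs = sorted(zip(covariate, residual))
    if len(pairs) < n_bins:
        raise ValueError("need at least one probe per bin")
    size = len(pairs) // n_bins
    out = []
    for b in range(n_bins):
        start = b * size
        # The last bin absorbs any leftover probes.
        end = (b + 1) * size if b < n_bins - 1 else len(pairs)
        chunk = pairs[start:end]
        out.append((median(x for x, _ in chunk), median(r for _, r in chunk)))
    return out

# Hypothetical toy data: residuals on one chip that rise with a probe covariate
# (it could stand for GC content, Tm, or average intensity).
covariate = list(range(8))
residual = [x / 10 for x in range(8)]
trend = binned_median(covariate, residual)
```

If the bin medians drift steadily in one direction, as they do in this toy case, the covariate is predicting systematic error on that chip.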
Ratio vs Intensity Plots:
Saturation & Quenching
• Saturation
– Decreasing rate of
binding of RNA at
higher occupancies on
probe
• Quenching:
– Light emitted by one
dye molecule may be
re-absorbed by a
nearby dye molecule
– Then lost as heat
– Effect proportional to
square of density
Plot of log ratio against
average log intensity
across chips
GSM25377 from the
CEPH expression data
GSE2552
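The quantities in a ratio-intensity plot are straightforward to compute. As a minimal sketch, assuming log-scale signals stored as `y[i][j]` for probe i on chip j (a layout chosen for this example), the function below returns the average log intensity across chips and the log ratio of one chip to that average.

```python
def ratio_intensity(y, j):
    """Return (A, M) for chip j.

    A[i] = average log intensity of probe i across all chips
    M[i] = log signal of probe i on chip j minus A[i] (the log ratio to average)
    """
    a = [sum(row) / len(row) for row in y]
    m = [row[j] - avg for row, avg in zip(y, a)]
    return a, m

# Hypothetical toy data: two probes on two chips.
y = [[8, 10], [12, 12]]
a, m = ratio_intensity(y, 1)
```

Plotting M against A for each chip would give the kind of picture described here; curvature at high A is what saturation or quenching would be expected to produce.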
How Much Variability on R-I?
• Ratio-Intensity plots for six arrays at
random from Cheung et al Nature (2005)
Covariation with Probe Tm
• MAQC project
• Agilent 44K
– Array 1C3
– Performed by
Agilent
• Plot of log ratios to average against Tm
• Bimodal distribution because two
samples are very different
Covariation with Probe Position
• RNA degrades
from 5’ end
• Intensity should
decrease from
3’ end uniformly
across chips
• affyRNAdeg
plots in affy
package
Plot of average intensity for
each probe position across all
genes against probe position
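The plotted quantity can be approximated by hand. The sketch below is not the affy package's implementation; it only averages log intensity at each probe position over genes for one chip and fits a least-squares slope. It assumes every probe set has the same number of probes, listed in a consistent order along the transcript (both assumptions of this example).

```python
def mean_by_position(probesets):
    """probesets[g][k] = log intensity of the k-th probe of gene g on one chip.

    Returns the average over genes at each probe position k.
    """
    n_pos = len(probesets[0])
    n_genes = len(probesets)
    return [sum(ps[k] for ps in probesets) / n_genes for k in range(n_pos)]

def slope(values):
    """Least-squares slope of values against position index 0, 1, 2, ..."""
    n = len(values)
    xbar = (n - 1) / 2
    ybar = sum(values) / n
    sxy = sum((k - xbar) * (v - ybar) for k, v in enumerate(values))
    sxx = sum((k - xbar) ** 2 for k in range(n))
    return sxy / sxx

# Hypothetical toy data: two genes, three probe positions each.
probesets = [[6, 7, 8], [8, 9, 10]]
profile = mean_by_position(probesets)
trend = slope(profile)
```

Comparing the slope across chips is the point: a chip whose slope differs markedly from the rest is a candidate for differing RNA degradation.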
Effect of Runs of Guanines
• A run of 4 G’s allows
quadruplex
structure
Spatial Variation Across Chips
Red/Green ratios show variation, probably concentrated
Ratios of ratios on slide to ratios on standard show consistent biases
In House Spotted Arrays
Ratio of ratios shows
much clearer
concentration of red
spots on some slides
Note non-random but
highly irregular
concentration of red
Bioconductor arrayQuality Package
Background Subtraction (1)
• We think that local
background
contributes to bias
• Does subtracting
background remove
bias?
Local off-spot background may
not be the best estimate of
spot background (nonspecific hyb)
[Figure panels: Spots; BG subtracted]
Background Subtraction (2)
Raw spot ratios show
a mild bias relative to
average
After subtracting a
high green bg in the
center a red bias
results
[Figure panels: Raw Ratios; Background; BG-subtracted]
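A small numerical sketch shows how subtracting background can create a bias rather than remove one. The function name and the intensity values are hypothetical, chosen only to mimic a spot with equal raw red and green signal sitting in a region of high green background.

```python
from math import log2

def log_ratio(red, green, red_bg=0.0, green_bg=0.0):
    """log2 of red/green after subtracting the given backgrounds.

    Returns None when a background-subtracted channel is not positive,
    since the log ratio is undefined there.
    """
    r = red - red_bg
    g = green - green_bg
    if r <= 0 or g <= 0:
        return None
    return log2(r / g)

# Hypothetical spot: equal raw signal in both channels.
raw = log_ratio(1000, 1000)
# Same spot after subtracting a high local green background.
subtracted = log_ratio(1000, 1000, red_bg=100, green_bg=300)
```

Here the raw log ratio is zero while the subtracted one is positive, which is the red bias described above. If the local off-spot background overstates the true nonspecific signal under the spot, the correction overshoots.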
Other Bias Patterns
Processed
This spotted oligo array
shows strong biases at
the beginning and end of
each print-tip group
The background shows
a milder version of this
effect
Subtracting background
compensates for about
half this effect
[Figure panels: Raw Spot; Background]
Local Bias on Affymetrix Chips
Image of raw data on a log2
scale shows striations but no
obvious artifacts
Image of ratios of probes to
standard shows a smudge
Noncoding
probes
Images show high values as red, low values as yellow
Spatial Artifacts on Affy Chips
Bubbles (yellow) in
hybridization chamber
Scratches on cover slip
Touching cover slip and
wiping incompletely
QC in Bioconductor
• Robust Multi-array Average (RMA)
– fits a linear model to each probe set
– High residuals show regional patterns
High residuals in green
See http://plmimagegallery.bmbolstad.com/
Available in affyQCReport package at www.bioconductor.org
Affy QC Metrics in Bioconductor
• affyPLM package fits
probe level model to
Affymetrix raw data
• NUSE - Normalized
Unscaled Standard
Errors
– normalized relative to
each gene
• How many big errors?
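The normalization behind NUSE can be sketched as follows: for each gene (probe set), the standard error on each chip is divided by that gene's median standard error across chips, so a typical value is 1. This is a hand-rolled illustration, not the affyPLM code; the input layout `se[i][j]` and the function names are assumptions.

```python
from statistics import median

def nuse(se):
    """se[i][j] = standard error of the expression estimate for gene i on chip j.

    Returns values normalized per gene: se[i][j] / median over j of se[i][j].
    """
    out = []
    for row in se:
        med = median(row)
        out.append([v / med for v in row])
    return out

def chip_median_nuse(nuse_values):
    """Median normalized standard error per chip; values well above 1 suggest a poorer chip."""
    n_chips = len(nuse_values[0])
    return [median(row[j] for row in nuse_values) for j in range(n_chips)]

# Hypothetical toy data: two genes on three chips; the third chip is noisier.
se = [[0.1, 0.1, 0.2], [0.2, 0.2, 0.4]]
normalized = nuse(se)
per_chip = chip_median_nuse(normalized)
```

Counting how many genes on a chip have normalized errors well above 1 gives a rough answer to the question of how many big errors that chip carries.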
Spatial Artifacts in Agilent
• Usually not so strong
as on other array
types
• More diffuse artifacts –
probably reflecting
washing irregularities
Spatial Artifacts in Nimblegen
• More common than
Agilent
• Usually more diffuse,
probably reflecting
washing
• Some sharp artifacts of
unclear origin
Spatial Artifacts in Illumina Arrays
• Often bigger artifacts than
Affy
• Less consequential because
more beads, and all have
same sequence