Microarrays and array normalization

Gene Expression Microarrays
Microarray Normalization
Xiaole Shirley Liu
STAT115, STAT215, BIO512, BIST298
Announcement: you can register for
ANY of the above 4 courses now
Microarrays
• Grow cells under a certain condition, collect
the mRNA population, and label it
• A microarray has high-density, sequence-specific
probes at a known location for
each gene/RNA
• Sample hybridizes to the microarray probes by
DNA (A-T, G-C) base pairing; wash away nonspecific binding
• Measure the sample mRNA level by checking
labeled signals at each probe location
Spotted cDNA Arrays
• Pat Brown Lab, Stanford
University
• Robotic spotting of cDNA
(mRNA converted back to
DNA, no introns)
• Several thousand probes /
array
• One long probe per gene
Spotted cDNA Arrays
• Competing hybridization
– Control
– Treatment
• Detection
– Green: high control
– Red: high treatment
– Yellow: equally high
– Black: equally low
Why Competing Hybridization?
• DNA concentration in probes is not the
same, and probes are not spotted evenly, so the
treatment/control ratio at a spot is more reliable than either absolute signal
Oligonucleotide Arrays
• Some Design Considerations
– More or fewer probes / array?
– Long or short oligos?
– Same or different probe lengths?
– How many probes / gene?
– How are probes placed on the array?
– One- or two-color assay?
Affymetrix Oligo Arrays
• GeneChip® by Affymetrix
• Parallel synthesis of
oligonucleotide probes (25mer) on a slide using
photolithographic methods
• Millions of probes /
microarray
• Multiple probes per gene
• One-color arrays
Affymetrix GeneChip Probes
Labeled Samples Hybridize to DNA Probes on GeneChip
Shining Laser Light Causes Tagged Fragments to Glow
Perfect Match (PM) vs MisMatch (MM) (control for cross hybridization)
NimbleGen Oligo Arrays
Agilent Oligo Arrays
Why do we bother learning about
microarrays now?
• RNA-seq is probably preferred for new
expression experiments
• A large amount of useful public microarray data exists
• The data analysis techniques remain applicable
Public Microarray Resources
• GEO: Gene Expression Omnibus, an NCBI
repository for gene expression and hybridization
data, growing quickly
• TCGA: The Cancer Genome Atlas
– http://www.cbioportal.org/public-portal/
– https://cghub.ucsc.edu/
– http://www.broadinstitute.org/cancer/cga/
• Oncomine: Cancer Microarray Database
– Published cancer related microarrays
– Raw data all processed, nice interface
Affymetrix Microarray Image Analysis
• Gridding: based on spike-in DNA
• Affymetrix GeneChip Operating System
(GCOS)
– cel file
X    Y    MEAN   STDV  NPIXELS
701  523  311.0  76.5  16
702  523  48.0   10.5  16
– cdf file
• Which probe at (X,Y) corresponds to which probe
sequence and targeted transcript
• The MM probe is always at (X,Y+1) relative to its PM probe
Replicates
• Always preferred
• Biological replicates:
– Different animals, tissues, etc
• Technical replicates:
– Repeated measures of the same sample
• In between:
– Same cell line grown on different days
Normalization
• Try to preserve biological variation and
minimize experimental variation, so
different experiments can be compared
• Assumption: most genes / probes don’t
change between two conditions
• Normalization can have a larger effect on the
analysis than downstream steps (e.g. group
comparisons)
Median Scaling
• Linear scaling
– Ensure the different arrays have the same
median value and same dynamic range
– X' = (X – c1) * c2
[Figure: array1 plotted against array2]
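The linear scaling above can be sketched in a few lines of Python. This is only an illustration: the slide does not say how "dynamic range" is measured, so the interquartile range is used here as an assumption, and the names `iqr` and `linear_scale` are hypothetical.

```python
from statistics import median, quantiles

def iqr(values):
    """Interquartile range, used here as a stand-in for 'dynamic range' (an assumption)."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

def linear_scale(x, reference):
    """Rescale x as X' = (X - c1) * c2 so its median and IQR match the reference array."""
    c2 = iqr(reference) / iqr(x)              # matches the spread
    c1 = median(x) - median(reference) / c2   # matches the median after scaling
    return [(v - c1) * c2 for v in x]

array1 = [2.0, 4.0, 6.0, 8.0, 10.0]
array2 = [3.0 * v + 5.0 for v in array1]      # same signal, different offset and gain
scaled = linear_scale(array2, array1)
print([round(v, 6) for v in scaled])
```

Because a single pair (c1, c2) is applied to the whole array, this can only correct a global offset and gain, not intensity-dependent effects.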
LOESS
• LOWESS: LOcally WEighted Scatterplot Smoothing;
the more general form is LOESS
• Fit a smooth curve
– Use robust local linear fits
– Effectively applies different scaling factors at
different intensity levels
– Y = f(X)
– Transform X to X' = f(X)
– Y and X' are comparable
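A minimal sketch of the curve fit, assuming tricube weights and a nearest-neighbour bandwidth (common choices, not stated on the slide). It deliberately omits the robustness reweighting iterations mentioned above, so it is a plain local linear fit; the name `lowess_fit` and the `frac` parameter are hypothetical.

```python
def lowess_fit(x, y, frac=0.3):
    """Fit Y = f(X) by tricube-weighted local linear regression; return f at each x."""
    n = len(x)
    k = max(3, int(round(frac * n)))           # neighbours used in each local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[min(k, n) - 1] or 1e-12      # bandwidth: distance to the k-th nearest point
        w = [(1.0 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))  # value of the local line at x0
    return fitted

# Toy data: array Y differs from array X by an intensity-dependent curve.
x = [float(i) for i in range(20)]
y = [xi + 0.05 * (xi - 8.0) ** 2 for xi in x]
x_prime = lowess_fit(x, y, frac=0.3)           # X' = f(X), now comparable to Y
```

Each local fit uses only nearby intensities, which is why the correction can differ between low and high intensity levels.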
Quantile Normalization
• Bolstad et al Bioinformatics 2003
– Currently considered the best normalization method
– Assume most of the probes/genes don’t change between samples
• Calculate the mean for each quantile and reassign each probe
the mean of its quantile
• No experiment retains its original values, but all experiments have
exactly the same distribution
[Figure: matrix of probes by experiments, with the mean across experiments]
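The procedure can be sketched as follows. This is a simplified illustration rather than the reference implementation from Bolstad et al.: ties are broken by position instead of being averaged, and the function name `quantile_normalize` is hypothetical.

```python
def quantile_normalize(columns):
    """columns: one list of probe intensities per experiment, all of equal length."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the k-th smallest value across experiments
    quantile_means = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # probe indices, low to high
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = quantile_means[rank]               # probe gets the mean of its rank
        normalized.append(out)
    return normalized

exp1 = [5.0, 2.0, 3.0, 4.0]
exp2 = [4.0, 1.0, 6.0, 2.0]
exp3 = [3.0, 4.0, 6.0, 8.0]
normalized = quantile_normalize([exp1, exp2, exp3])
for col in normalized:
    print([round(v, 3) for v in col])
```

After normalization every experiment contains the same set of values; only the assignment of values to probes differs, following each probe's rank within its own experiment.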
How to Visualize Microarray Normalization?
Dilution Series
• RNA sample in 5 different concentrations
• 5 replicates scanned on 5 different scanners
• Before and after quantile normalization
MvA Plot
• log2R vs log2G: values should be on the diagonal
• M = log2R – log2G, A = (log2R + log2G)/2: values should scatter around M = 0
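As a small worked example of the M and A definitions, the sketch below computes both for a handful of made-up red/green intensities (the values and the name `ma_values` are illustrative only). Plotting M against A with any plotting library gives the MvA plot.

```python
from math import log2

def ma_values(r, g):
    """Return (M, A) lists for paired red and green intensities."""
    m = [log2(ri) - log2(gi) for ri, gi in zip(r, g)]        # log ratio
    a = [(log2(ri) + log2(gi)) / 2 for ri, gi in zip(r, g)]  # average log intensity
    return m, a

red = [1024.0, 256.0, 64.0, 16.0]
green = [512.0, 256.0, 64.0, 32.0]
m, a = ma_values(red, green)
print(m)  # probes with equal red and green signal give M = 0
print(a)
```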
Before Normalization
• Pairwise MA plot for 5 arrays, probe level (PM)
$M = \log_2 (PM_i / PM_j)$
$A = \tfrac{1}{2} \log_2 (PM_i \cdot PM_j)$
After Normalization
• Pairwise MA plot for 5 arrays, probe level (PM)
$M = \log_2 (PM_i / PM_j)$
$A = \tfrac{1}{2} \log_2 (PM_i \cdot PM_j)$
When Might qnorm Fail?
• Loven et al, Cell 2012
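One way to see the failure mode is a toy case in which every gene in one sample is amplified by the same factor, which violates the assumption that most genes do not change. The sketch below uses made-up numbers and a simplified, hypothetical `quantile_normalize` helper; it is an illustration of the idea, not a reproduction of the analysis in the cited paper.

```python
def quantile_normalize(columns):
    """Simplified quantile normalization (ties broken by position)."""
    n = len(columns[0])
    sorted_cols = [sorted(c) for c in columns]
    means = [sum(c[k] for c in sorted_cols) / len(columns) for k in range(n)]
    result = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = means[rank]
        result.append(out)
    return result

control = [10.0, 20.0, 40.0, 80.0]
amplified = [3.0 * v for v in control]   # every gene is 3x higher (global change)
norm_control, norm_amplified = quantile_normalize([control, amplified])

ratio_before = [a / c for a, c in zip(amplified, control)]
ratio_after = [a / c for a, c in zip(norm_amplified, norm_control)]
print(ratio_before)  # the global 3x difference is visible
print(ratio_after)   # the difference has been normalized away
```

Since both samples are forced onto the same distribution, a genuine global shift is indistinguishable from a technical one after normalization.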
Summary
• Microarrays: Different oligo arrays
• Array normalization: Loess, qnorm
– Assumptions
• Normalization visualization: MA plots
• We will cover batch effect removal after
clustering analysis…