cDNA Microarray

Design and Pre-processing
By H. Bjørn Nielsen
Why Experimental Design
1. To enable statistical hypothesis verification/falsification
2. To balance the effects of undesired controllable factors
3. To ensure sufficient statistical power
1. To enable statistical hypothesis verification/falsification
Typically, we want to identify differentially expressed genes between a set of conditions using t-test or ANOVA-like statistics.
This implies that we replicate sampling from a set of fixed conditions.
• Control vs. Treatment
• Treatment 1, Treatment 2, Treatment 3
• Multi-factorial: Control, Mutant, Treatment, Mutant Treated
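As an illustration of the t-test idea for a Control vs. Treatment design, the following is a minimal sketch of a per-gene two-sample t statistic with pooled variance. The gene names and log2 expression values are hypothetical, and a real analysis would normally use a dedicated statistics package and correct for multiple testing.

```python
import math
from statistics import mean, variance

def two_sample_t(group_a, group_b):
    """Student's t statistic (pooled variance) for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each condition needs at least two replicates")
    # Pooled sample variance across both groups
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err

# Hypothetical log2 expression values: gene -> (control, treatment) replicates
expression = {
    "geneA": ([8.1, 7.9, 8.0], [10.1, 9.9, 10.0]),
    "geneB": ([6.0, 6.4, 5.6], [6.1, 5.7, 6.2]),
}

t_scores = {gene: two_sample_t(ctrl, trt)
            for gene, (ctrl, trt) in expression.items()}
for gene, t in t_scores.items():
    print(gene, round(t, 2))
```

Without replicates in each condition the variance term cannot be estimated, which is why the function refuses groups with fewer than two samples.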
But we may also fit a trend using alternative statistics (Bayesian fit, bootstrapping, ANOVA, etc.).
Series: T0, T1, T2, ..., Tn
The length of the series or the sampling density may be most important.
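One way to fit a trend to a series is sketched below, assuming a simple linear trend and a percentile bootstrap over (time, value) pairs. The time points and log expression values are hypothetical, and the linear model is only one of many possible choices; it is not necessarily the method intended on the slide.

```python
import random
from statistics import mean

def fit_slope(times, values):
    """Ordinary least-squares slope of values against times."""
    t_bar, v_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("need at least two distinct time points")
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    return sxy / sxx

def bootstrap_slope_interval(times, values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the slope, resampling pairs."""
    rng = random.Random(seed)
    pairs = list(zip(times, values))
    slopes = []
    while len(slopes) < n_boot:
        sample = [rng.choice(pairs) for _ in pairs]
        ts, vs = zip(*sample)
        if len(set(ts)) < 2:
            continue  # degenerate resample: slope undefined, draw again
        slopes.append(fit_slope(ts, vs))
    slopes.sort()
    lower = slopes[int((alpha / 2) * n_boot)]
    upper = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical series T0..T5 for one gene
times = [0, 1, 2, 3, 4, 5]
log_expr = [5.0, 5.6, 5.9, 6.7, 7.1, 7.4]
slope = fit_slope(times, log_expr)
low, high = bootstrap_slope_interval(times, log_expr)
print(round(slope, 3), round(low, 3), round(high, 3))
```

With only a few time points many resamples reuse the same points, so the interval is coarse; a longer or denser series gives the bootstrap more to work with.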
Replication is essential in all of these designs:
• Control vs. Treatment
• Treatment 1, Treatment 2, Treatment 3
• Multi-factorial: Control, Mutant, Treatment, Mutant Treated
2. To balance the effects of undesired controllable factors
Typical controllable effects:
• Labeling dye
• Microarray slide
• Sampling time
• Growth conditions
Minimize and balance these.
Typical uncontrollable effects:
• Random effects
• Unintended deviations in sample handling, growth conditions, etc.
3. To ensure sufficient statistical power
An appropriate number of replicates is required to distinguish noise from 'effect'.
Gene expression studies typically require 3 or more replicates.
Make sure to replicate over the most important sources of variance.
Typical order of noise contributions:
1. Biological variation
2. Sample preparation batch
3. Hybridization/slide effect
4. Dye effect/spot effect
t = (difference between group means) / (standard error of the difference)
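To make the link between replicates and power concrete, here is a small sketch of how the expected t statistic grows with the number of replicates. It assumes two groups of equal size with a common standard deviation; the effect size and standard deviation are hypothetical numbers chosen for illustration.

```python
import math

def expected_t(effect, sd, n_per_group):
    """Approximate t for a true mean difference `effect`, a common standard
    deviation `sd`, and `n_per_group` replicates in each of two groups."""
    std_err = sd * math.sqrt(2.0 / n_per_group)
    return effect / std_err

# Hypothetical effect of 1 log2 unit with a standard deviation of 0.5
for n in (2, 3, 5, 10):
    print(n, round(expected_t(effect=1.0, sd=0.5, n_per_group=n), 2))
```

The standard error shrinks with the square root of the number of replicates, so the same effect becomes easier to separate from noise as replicates are added, provided they are replicates over the dominant source of variance.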
An example
Aim: Identify differentially expressed genes between ill and healthy patients.
Samples: 4 ill and 4 healthy patients.
Using a two-channel cDNA array.
How should we design the experiment?

Slide     Dye   Condition
Slide 1   Cy3   ill
Slide 1   Cy5   ...
...       ...   ...
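One possible layout, given as a sketch rather than as the answer from the slide, pairs one ill and one healthy sample on each slide and swaps the dye assignment on every second slide so that dye and condition are not confounded. The sample identifiers (I1..I4, H1..H4) and the function name are hypothetical.

```python
from collections import Counter

def paired_dye_swap_design(ill, healthy):
    """Pair one ill with one healthy sample per slide, alternating the dyes."""
    if len(ill) != len(healthy):
        raise ValueError("this sketch assumes equal group sizes")
    design = []
    for i, (ill_id, healthy_id) in enumerate(zip(ill, healthy), start=1):
        # Odd slides: ill on Cy3; even slides: ill on Cy5 (dye swap)
        ill_dye, healthy_dye = ("Cy3", "Cy5") if i % 2 == 1 else ("Cy5", "Cy3")
        design.append((f"Slide {i}", ill_dye, "ill", ill_id))
        design.append((f"Slide {i}", healthy_dye, "healthy", healthy_id))
    return design

design = paired_dye_swap_design(["I1", "I2", "I3", "I4"],
                                ["H1", "H2", "H3", "H4"])
# Count how often each dye is used for each condition
dye_by_condition = Counter((dye, cond) for _, dye, cond, _ in design)
for row in design:
    print(*row, sep="\t")
```

Each condition ends up on each dye equally often, so a dye effect cancels out when the ill and healthy groups are compared.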
Another example
Aim: Identify differentially expressed genes between ill and healthy patients.
Samples: 4 ill (2xM + 2xF) and 4 healthy (2xM + 2xF).
Using a two-channel cDNA array.
How should we design the experiment?

Slide     Dye   Sex   Condition
Slide 1   Cy3   M     ill
Slide 1   Cy5   ...   ...
...       ...   ...   ...
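Whether a candidate design balances dye, sex, and condition can be checked by counting how often each combination of levels occurs. The sketch below uses a hypothetical design with sex-matched pairs on each slide; it is one option for illustration, not the layout from the slide.

```python
from collections import Counter
from itertools import product

# Hypothetical design rows: (slide, dye, sex, condition)
design = [
    ("Slide 1", "Cy3", "M", "ill"), ("Slide 1", "Cy5", "M", "healthy"),
    ("Slide 2", "Cy5", "M", "ill"), ("Slide 2", "Cy3", "M", "healthy"),
    ("Slide 3", "Cy3", "F", "ill"), ("Slide 3", "Cy5", "F", "healthy"),
    ("Slide 4", "Cy5", "F", "ill"), ("Slide 4", "Cy3", "F", "healthy"),
]
DYE, SEX, CONDITION = 1, 2, 3  # column positions in each row

def is_balanced(rows, columns):
    """True if every combination of levels in `columns` occurs equally often."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    levels = [sorted({row[c] for row in rows}) for c in columns]
    return len({counts[combo] for combo in product(*levels)}) == 1

# A confounded variant: ill is always labelled Cy3, healthy always Cy5
confounded = [(slide, "Cy3" if cond == "ill" else "Cy5", sex, cond)
              for slide, _, sex, cond in design]

print(is_balanced(design, (DYE, CONDITION)))
print(is_balanced(design, (DYE, SEX, CONDITION)))
print(is_balanced(confounded, (DYE, CONDITION)))
```

In the confounded variant the dye effect and the disease effect cannot be separated, which is what the balance check flags.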
Yet another example
Aim: Identify genes differentially affected by starving in obese and lean people.
Samples: 4 obese (2x starving + 2x not starving) and 4 lean (2x starving + 2x not starving).
Using a one-channel GeneChip.
How should we design the experiment?

Chip #   BMI   Food
1        O     S
2        L     N
...      ...   ...
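In a 2x2 design like this, "differentially affected by starving" corresponds to an interaction: the starving effect in obese people minus the starving effect in lean people. A minimal sketch for a single gene follows, assuming O/L stand for obese/lean and S/N for starving/not starving as in the table; the log2 intensities are hypothetical.

```python
from statistics import mean

# Hypothetical log2 intensities for one gene: (BMI, food) -> replicate chips
gene_values = {
    ("O", "S"): [9.0, 9.2],
    ("O", "N"): [7.0, 7.2],
    ("L", "S"): [8.1, 7.9],
    ("L", "N"): [7.6, 7.4],
}

def interaction_contrast(values):
    """(starving - not starving) in obese minus the same difference in lean."""
    cell_mean = {cell: mean(reps) for cell, reps in values.items()}
    obese_effect = cell_mean[("O", "S")] - cell_mean[("O", "N")]
    lean_effect = cell_mean[("L", "S")] - cell_mean[("L", "N")]
    return obese_effect - lean_effect

contrast = interaction_contrast(gene_values)
print(round(contrast, 2))
```

A contrast near zero means starving changes the gene equally in both groups; a test of this contrast, for example within a two-way ANOVA, needs the replicates in every cell.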
cDNA pre-processing
• Background correction
• Normalization
– Within slide
– Between slide
Background correction
Is it meaningful?
Methods:
– subtraction
– movingmin (3x3)
– normexp
– none
Ritchie et al. 2007, Bioinformatics
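Two of the listed methods are simple enough to sketch directly: plain subtraction, and a moving minimum in which each spot's background is replaced by the minimum background over the spot and its neighbours in a 3x3 grid. The intensities below are hypothetical, the spots are assumed to lie on a rectangular grid, and edge handling in real software may differ.

```python
def subtract_background(foreground, background):
    """Plain subtraction; results can be zero or negative for weak spots."""
    return [[f - b for f, b in zip(f_row, b_row)]
            for f_row, b_row in zip(foreground, background)]

def moving_min_background(background):
    """Replace each background value by the minimum over its 3x3 neighbourhood."""
    n_rows, n_cols = len(background), len(background[0])
    smoothed = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # Neighbourhood is clipped at the edges of the grid
            window = [background[i][j]
                      for i in range(max(0, r - 1), min(n_rows, r + 2))
                      for j in range(max(0, c - 1), min(n_cols, c + 2))]
            row.append(min(window))
        smoothed.append(row)
    return smoothed

# Hypothetical 3x3 block of spots
fg = [[500, 120, 300], [90, 800, 110], [200, 150, 95]]
bg = [[100, 130, 90], [95, 400, 100], [80, 85, 105]]
plain = subtract_background(fg, bg)
movmin = subtract_background(fg, moving_min_background(bg))
print(plain)
print(movmin)
```

Plain subtraction produces negative values for weak spots, which cannot be log-transformed; the moving minimum lowers the background estimate and reduces how often that happens.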
Normalization within array
Correct for any bias that follows an undesired uncontrollable effect:
– Print tip
– Microtiter plate
– Printing order
– Spatial trends (uneven hybridization)
As well as intensity-dependent biases.
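The sketch below computes the usual log ratio M = log2(R/G) and average log intensity A for each spot, then centers M within each print-tip group by subtracting the group median. Median centering is a deliberately simplified stand-in for a per-group loess fit and does not remove intensity-dependent bias; the red/green intensities and print-tip labels are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median

def ma_values(red, green):
    """M = log2(R/G), A = mean of log2(R) and log2(G), per spot."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * (math.log2(r) + math.log2(g)) for r, g in zip(red, green)]
    return m, a

def center_by_group(m, groups):
    """Subtract each group's median M (e.g. one group per print tip)."""
    by_group = defaultdict(list)
    for value, group in zip(m, groups):
        by_group[group].append(value)
    medians = {g: median(vals) for g, vals in by_group.items()}
    return [value - medians[group] for value, group in zip(m, groups)]

# Hypothetical background-corrected intensities and print-tip labels
red = [1200.0, 800.0, 1600.0, 300.0, 640.0, 150.0]
green = [600.0, 400.0, 800.0, 300.0, 320.0, 300.0]
print_tip = [1, 1, 1, 2, 2, 2]

m, a = ma_values(red, green)
m_centered = center_by_group(m, print_tip)
print(m_centered)
```

The same grouping function could be applied to microtiter plate or printing order labels instead of print tips.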
Normalization between arrays
Correction for intensity-dependent biases:
– Lowess
– Qspline
– Quantiles
– And more
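Of the listed methods, quantile normalization is the easiest to sketch: every array is forced onto the same distribution, taken as the mean of the sorted values across arrays. The example assumes arrays of equal length with no missing values and breaks ties by position; the values are hypothetical.

```python
def quantile_normalize(arrays):
    """Give every array the same distribution: the mean of the sorted arrays."""
    n_arrays, n_genes = len(arrays), len(arrays[0])
    if any(len(a) != n_genes for a in arrays):
        raise ValueError("all arrays must have the same number of genes")
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean across arrays at each rank
    reference = [sum(s[rank] for s in sorted_arrays) / n_arrays
                 for rank in range(n_genes)]
    normalized = []
    for a in arrays:
        # Gene indices ordered from lowest to highest value on this array
        order = sorted(range(n_genes), key=lambda i: a[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = reference[rank]
        normalized.append(out)
    return normalized

# Hypothetical log intensities: three arrays, four genes each
arrays = [[5.0, 2.0, 3.0, 4.0], [4.0, 1.0, 4.5, 2.0], [3.0, 4.0, 6.0, 8.0]]
normalized = quantile_normalize(arrays)
for row in normalized:
    print([round(v, 3) for v in row])
```

After normalization each array contains the same set of values, and only the assignment of values to genes differs between arrays.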
[Figure: MA plot (axes M and A)]