Design and Analysis of cDNA
Microarray Experiments at
CSIRO Livestock Industries
Toni Reverter ([email protected])
CLI Bioinformatics
Queensland Bioscience Precinct
Brisbane, 4067 Australia
MMI Genomics/UCD & MSU Visits – May 2003
Design and Analysis of cDNA Microarray Experiments at
CSIRO Livestock Industries (Toni Reverter)
Contents
Introduction:
Analysis possibilities
Challenges
Process for microarray
Technical Concerns:
Design
Image (data) quality
Data analysis
Analysis Possibilities
(adapted from Hongzhe Li, 2002)
Determine genes which are differentially expressed (DE).
Connect DE genes to sequence databases to search for
common upstream regions.
Overlay DE genes on pathway diagrams.
Relate expression levels to other information on cells, e.g.
tumor types.
Identify temporal and spatial trends in gene expression.
Seek roles of genes based on patterns of co-regulation.
…many more!!!
Challenges
Time Dependent (Chronology; columns labelled Logical and cDNA):
1800s – DATA
30-60s – METHODS
50-70s – SOFTWARE
1980s – COMPUTER
Human Dependent (Skill Integration):
Quantitative (BANANA): Computer Sci., Statisticians, Mathematicians, …
Non-Q (EGG): Biochemists, Physiologists, Pathologists, …
“banana omelette”
Data Dependent (Paradigm):
Distribution
Source
Size
[Diagram labels: Historical, Excitement, Balance, Interdisciplinary]
Array Process
1. Tissue Samples (Treat A, Treat B)
2. mRNA Extraction & Amplification
3. Labelling: cDNA “A” Cy5 + cDNA “B” Cy3
4. Hybridization
5. Image Capture (Optical Scanner: Laser 1, Laser 2)
6. Analysis
What you see is
What you get
Technical Concerns
Egg Level (Biochemist):
1. Preparation (Printing) of the Chip
2. RNA Extraction, Amplification and Hybridisation
3. Optical Scanner (Reading)
Banana Level (Quantitative):
1. Design
2. Image (data) Quality
3. Data Analysis
Replication:
1. Animal
2. Sample
3. Array
4. Spot
Technical Concerns: Image (Data) Quality GP3xCLI
##############################################################
# GP3xCLI
# GenePix Processing Program by CSIRO Livestock Industries
# Enquiries: [email protected]
# Copyright (c) 2003 CSIRO-LI
##############################################################
GPR Input:     F12.gpr
Processed on:  Tue Apr 8 13:40:01 EST 2003

=-=-=-=-=-=-= IMAGE QUALITY =-=-=-=-=-=-=-=
Total No. of Spots ------------------------> 19200
Spots with Flag = -50 ---------------------> 4720
Spots with Flag = -100 --------------------> 12
Red dye with Background >= Foreground -----> 892
Green dye with Background >= Foreground ---> 915

Median to Mean Correlation Analysis:
                 DATA LEFT
              RED             GREEN
Corr       Raw    Log2     Raw    Log2
______________________________________
> 0.00    19200  19200    19200  19200
> 0.20    19199  19200    19199  19200
> 0.40    19183  19200    19192  19200
> 0.60    19008  19200    19102  19200
> 0.80    17061  19199    18541  19198
> 0.85    14466  19193    17872  19196
> 0.90    10491  19137    15786  19181

=-=-=-=-=-=-= VALID SPOTS* =-=-=-=-=-=-=-=
Total No. of Valid Spots -----------------> 14433
Percentage of Valid Spots ----------------> 75.2
Total No. of Genes -----------------------> 7220
Mean No. Repetitions ----->  2 for 6600 Genes
Min. No. Repetitions ----->  1 for  580 Genes
Max. No. Repetitions -----> 24 for    8 Genes

               Log(R/G)   vs   0.5*Log(R*G)
               ________        ____________
N                 14433               14433
Mean             -0.017              10.327
Std               0.617               2.079
Min              -8.711               3.246
Max               4.030              15.994
Correlation       0.362

Log(R/G) across Intensity Values
Intensity     Spots    % <0    % >0
__________________________________
( 0 ,  4)         4   100.0     0.0
( 4 ,  8)      1499    74.1    25.9
( 8 , 12)      9847    40.4    59.6
(12 , 16)      3083    17.3    82.7
__________________________________
*NB: Valid Spot defined as spots with Background < Foreground for
both Red and Green channels and with a Quality Flag of 0.
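The "Log(R/G) across Intensity Values" table in the report can be reproduced in outline with a few lines of awk. The sketch below is not the GP3xCLI code. The function name ma_bins and the two-column input (background-corrected red and green intensities of valid spots, one spot per line) are assumptions made for illustration.

```shell
# Hypothetical helper: bin spots by A = 0.5*log2(R*G) and report, per bin,
# the number of spots and the percentage with M = log2(R/G) below and above 0.
# Input on stdin: "R G" per line (assumed background-corrected, valid spots only).
ma_bins() {
  awk '
  $1 > 0 && $2 > 0 {
    m = log($1 / $2) / log(2)           # M = log2(R/G)
    a = 0.5 * log($1 * $2) / log(2)     # A = 0.5*log2(R*G)
    if (a < 0 || a >= 16) next          # outside the four bins used in the report
    b = int(a / 4)                      # 0 -> (0,4), 1 -> (4,8), 2 -> (8,12), 3 -> (12,16)
    n[b]++
    if (m < 0) neg[b]++
    if (m > 0) pos[b]++
  }
  END {
    for (b = 0; b < 4; b++)
      if (n[b] > 0)
        printf "(%2d,%2d) %d %.1f %.1f\n", 4*b, 4*b + 4, n[b], 100 * neg[b] / n[b], 100 * pos[b] / n[b]
  }'
}
```

A call such as `printf '16 4\n4 16\n1024 256\n' | ma_bins` prints one line per non-empty intensity bin. A steadily rising "% >0" column across bins, as in the F12.gpr report, suggests an intensity-dependent dye effect.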
Clever Programming
Tailored to your needs
N=1
for filename in R16T0S1.gpr R16T0S2.gpr R16T24S1.gpr R16T24S2.gpr \
                S32T0S1.gpr S32T0S2.gpr S32T24S1.gpr S32T24S2.gpr
do
  # Get valid readings (flag >= 0, not a control spot, foreground > background
  # in both channels) and compute background-corrected log2 intensities
  awk 'NR>30 && $NF>=0 && $4!="no_spot" && substr($4,1,5)!="score" &&
       substr($4,1,5)!="custo" && substr($4,1,6)!="spotre" &&
       $9>$12 && $18>$21 {print $4, $9-$12, $18-$21, log($9-$12)/log(2.0),
       log($18-$21)/log(2.0)}' "$filename" | sort > junk1
  # Append the log ratio (column 6) and the mean log intensity (column 7)
  awk '$2!=$3 {print $0, $4-$5, 0.5*($4+$5)}' junk1 > junk2
  # Get the median of the log ratios
  REC=$(wc -l < junk2 | awk '{print int($1/2)}')
  MED=$(sort -n -k6,6 junk2 | awk -v rec="$REC" 'NR==rec {print $6}')
  echo "Median of file" $filename " = " $MED
  # Global normalization: subtract the median from each log ratio
  awk -v median="$MED" -v slide="$N" '{print "Slide_"slide, int(slide/2+.5),
      $1, $6-median}' junk2 | sort -k3 > dat.$N
  N=$(expr $N + 1)
done
cat dat.* > total.dat
Technical Concerns: Experimental Design
[Figure: Reference, Loop and All-Pairs designs, each connecting the four samples O, A, B and AB]
Variance of Estimated Effects
(Relative to the All-Pairs)
            Main effect of A   Main effect of B   Interaction AB   Contrast A-B
Reference          1                  1                 3               2
Loop              4/3                 1                8/3              1
All-Pairs          1                  1                 2               1
Technical Concerns: Experimental Design
Glonek & Solomon
Factorial and Time Course Designs for
cDNA Microarray Experiments
• Read pp 1-2
• Definition
A design with a total of n slides and design matrix X is said to be admissible
if there exists no other design with n slides and design matrix X* such that
$c_i^* \le c_i$
for all i, with strict inequality for at least one i, where $c_i^*$ and $c_i$ are respectively
the diagonal elements of $(X^{*\prime}X^*)^{-1}$ and $(X'X)^{-1}$.
• Read pp 24
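As a worked illustration of the c_i in the definition, the sketch below computes the diagonal of (X'X)^-1 for a three-slide reference design of a 2x2 factorial. The design matrix is an assumption made here, not taken from the paper: the rows are the log ratios A vs O, B vs O and AB vs O, and the columns are the main effect of A, the main effect of B and the interaction. The function name design_variances is hypothetical. Under these assumptions the result agrees with the Reference row of the variance table (1, 1, 3, and 2 for the contrast A-B).

```shell
# Hypothetical helper: diagonal of (X'X)^-1 for an assumed 3-slide reference design.
# Prints: var(main A) var(main B) var(interaction AB) var(contrast A-B)
design_variances() {
  awk '
  BEGIN {
    n = 3; p = 3
    # Assumed design matrix X, row by row: A-O, B-O, AB-O
    split("1 0 0 0 1 0 1 1 1", v, " ")
    for (i = 1; i <= n; i++)
      for (j = 1; j <= p; j++)
        X[i,j] = v[(i-1)*p + j]
    # Build [X'"'"'X | I]
    for (i = 1; i <= p; i++)
      for (j = 1; j <= p; j++) {
        s = 0
        for (k = 1; k <= n; k++) s += X[k,i] * X[k,j]
        M[i,j] = s
        M[i,j+p] = (i == j) ? 1 : 0
      }
    # Gauss-Jordan elimination; no pivoting is needed for a positive definite matrix
    for (c = 1; c <= p; c++) {
      piv = M[c,c]
      for (j = 1; j <= 2*p; j++) M[c,j] /= piv
      for (r = 1; r <= p; r++)
        if (r != c) {
          f = M[r,c]
          for (j = 1; j <= 2*p; j++) M[r,j] -= f * M[c,j]
        }
    }
    # The right-hand block now holds the inverse; var(a - b) = c_a + c_b - 2*cov(a,b)
    printf "%.3f %.3f %.3f %.3f\n", M[1,1+p], M[2,2+p], M[3,3+p], M[1,1+p] + M[2,2+p] - 2 * M[1,2+p]
  }'
}
```

To compare candidate designs for admissibility, the string passed to split (and n) would be swapped for another design with the same number of slides and the diagonals compared element by element.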
No. of Arrays:
          (S-1)   to   S·(S-1)
S = 3       2             6
S = 4       3            12
S = 12     11           132
What is the No. of Possible Configurations?
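The two bounds in the table are simple arithmetic. The sketch below reproduces them, reading S as the number of samples to be compared (an interpretation of the slide). The function name array_bounds is hypothetical.

```shell
# Hypothetical helper: minimum (S-1) and maximum S*(S-1) number of arrays for S samples.
array_bounds() {
  s=$1
  echo "$((s - 1)) $((s * (s - 1)))"
}
```

For example, `array_bounds 12` gives the last row of the table.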
$S^{A-1}$
Pie-Bald black
Non-Pie-Bald black
Normal
White
Recessive
$S^{A-1} = 5^3 = 125$
[Figure: 25 items, each labelled “x5”]
0 hr
24 hr
$S^{A-1} = 12^{10} \approx 62$ Billion!
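The count of configurations grows very quickly. A minimal sketch of the arithmetic follows, reading S as the number of samples and A as the number of arrays. That reading is an interpretation of the slides, and the function name n_configurations is hypothetical.

```shell
# Hypothetical helper: S^(A-1), computed by repeated multiplication so the
# result stays an exact integer for the sizes discussed here.
n_configurations() {
  awk -v s="$1" -v a="$2" 'BEGIN {
    r = 1
    for (i = 1; i < a; i++) r *= s
    printf "%.0f\n", r
  }'
}
```

With S = 5 and A = 4 this gives 125. With S = 12 and A = 11 it gives 61917364224, the "62 Billion" quoted on the slide.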
0 hr
24 hr
[Figure: design diagram with red (R) and green (G) dye labels on the arrays]
S = 24: 23 to 552 arrays
pooling
S = 14: 13 to 182 arrays
[Figure: design diagrams with nodes labelled M, F, HS and TM and red (R) / green (G) dye labels]
Slide  RES  SUS    0    3   24       M       F      HS      TM
    1    0    0    1    0   -1   0.066  -0.066   0.266  -0.266
    2    0    0   -1    1    0   0.600  -0.600  -0.600   0.600
    3    1   -1    1   -1    0  -0.600   0.600  -0.400   0.400
    4   -1    1   -1    0    1   0.600  -0.600   0.400  -0.400
    5   -1    1    1   -1    0  -0.600   0.600       1      -1
    6    1   -1    1   -1    0   0.666  -0.666  -0.400   0.400
    7    1   -1   -1    0    1  -0.666   0.666       0       0
    8    0    0    1    0   -1  -0.333   0.333       0       0
    9    0    0    0   -1    1   0.333  -0.333  -0.666   0.666
   10    0    0    0    1   -1  -1.000   1.000       0       0
   11   -1    1    0   -1    1  -0.500   0.500       0       0
   12    1   -1    0    1   -1  -1.000   1.000      -1       1
   13   -1    1    0    1   -1   0.666  -0.666   0.666  -0.666
   14    0    0    0    1   -1  -1.000   1.000       0       0
               RES     SUS       0       3      24       M       F      HS      TM
RES              8      -8       1       0      -1  -1.766   1.766  -3.866   3.866
SUS                      8      -1       0       1   1.766  -1.766   3.866  -3.866
0                                8      -4      -4  -1.335   1.335   0.666  -0.666
3                                       10      -6  -1.033   1.033  -0.468   0.468
24                                              10   2.368  -2.368  -0.198   0.198
M                                                    6.247  -6.247   0.493  -0.493
F                                                            6.247  -0.493   0.493
HS                                                                   3.798  -3.798
TM                                                                           3.798
Sum(ABS)      29.3    29.3    22.0    23.0    27.1    21.7    21.7    17.6    17.6
Reference Design
Sum(ABS)      26.8    26.8    17.3    39.1    23.1     7.1     7.1    14.3    14.3
Technical Concerns: Statistical Analysis
Data Beautifying Techniques
Technique            Choice             Aim (Real)              Aim (Ideal)
1. Transformation    Base-2 Log         Numerically tractable   Gaussian
2. Normalisation     Location: M - c    Systematic effects      Gaussian
3. Standardisation   Scale Parameter    Stabilise variance      Gaussian
Normalisation choices:
2.i. Global: - Mean
             - Median
             - Regr. Coeff (LOWESS)
2.ii. Local: - LOWESS within print-tip-group
Technical Concerns: Statistical Analysis
Assumption: The proportion of genes that are DE is minimal
Q: Which genes to use?
A: Only the ones that we know are not DE (housekeeping)
NB: “Boutique” arrays become a nuisance
Pin group (sub-array) effects
[Figure: Lowess lines through points from pin groups]
[Figure: Boxplots of log ratios by pin group]
Adapted from T Speed 2002
Technical Concerns: Statistical Analysis
Data Beautifying Techniques
Except Log2, everything else applies only to Ratios:
M = log2(R/G)
Except Log2, everything else applies only within slide
Everything is beautified to identify DE genes straight
from “M vs A” plot (A = Average) from a single slide or
from a function of M’s (t-stat) across slides
Technical Concerns: Statistical Analysis
Whenever possible, avoid ratios.
Include the “possible systematic sources of variation”
into a model-based (eg. ANOVA) analysis and the
data will be implicitly normalised. Then check the
residuals.
Log2(Intensity) = Array + Dye + A*D + Treatment + Sample     (Normalisation Model)
                + Gene + Gene*Treatment + Gene*Sample        (Gene Model)
                + Residuals
Rockhampton Model
Log2(Intensity) = Design + Array + Dye + Array*Dye + (Diet +) Gene + Gene*Diet + Residuals
Term          N Levels
Design               2
Array               14
Dye                  2
Array*Dye           28
Diet                 3
Gene             7,347
Gene*Diet       22,041
Residuals      300,936
[Figure: design diagram labelled Reference and All Pairs, with samples MEDIUM (4 Animals); LOW (3 Ani, 1 Rep) (pooled 3 Anim); HIGH (pooled 2 Anim); MEDIUM (Pooled & Ampl); LOW (Pooled & Ampl)]
Japanese Model
Log2(Intensity) = Array + Dye + Array*Dye + Breed + Time + Breed*Time + Gene + Gene*Br*Ti + Residuals
Term          N Levels
Array               12
Dye                  2
Array*Dye           24
Breed                2
Time                 3
Breed*Time           6
Gene              5900
Gene*Br*Ti      35,400
Residuals      259,080
[Figure: design diagram with JAPANESE and HOLSTEIN samples at JANUARY 01, JUNE 01 and OCTOBER 01]
REML
                  Rockhampton   Japanese
Residual              0.720       0.889
Gene                  3.664       2.883
Gene*Treatment        0.137       0.133
% Total Variance Explained by
Gene                  81.1        73.8
Gene*Treatment         3.0         3.4
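The percentages follow from the variance components if the total variance is taken as Residual + Gene + Gene*Treatment. That definition of the total is an assumption here, and the function name variance_shares is hypothetical. With the rounded components shown, the Japanese column is reproduced exactly. The Rockhampton Gene share comes out at 81.0 rather than 81.1, presumably because the slide used unrounded estimates.

```shell
# Hypothetical helper: share of total variance explained by Gene and by
# Gene*Treatment. Arguments: residual, gene, gene-by-treatment variance components.
# Assumes total variance = sum of the three components.
variance_shares() {
  awk -v r="$1" -v g="$2" -v gt="$3" 'BEGIN {
    total = r + g + gt
    printf "%.1f %.1f\n", 100 * g / total, 100 * gt / total
  }'
}
```

For example, `variance_shares 0.889 2.883 0.133` gives the Japanese column.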
Clever Programming
Tailored to your needs
Your Needs: “Important values are…”
1. Away from (0,0)
2. In quadrants 1 and 4.
[Figure: “Interaction Solutions” plot of T24 - T0, with axes labelled Resistant and Disease, both running from -4 to 4]
Generate a new variable:
+1.0*[(R24-R0)+(S0-S24)] if R0<R24 & S0>S24
+0.5*[(R24-R0)+(S24-S0)] if R0<R24 & S0<S24
-0.5*[(R0-R24)+(S0-S24)] if R0>R24 & S0>S24
-1.0*[(R0-R24)+(S24-S0)] if R0>R24 & S0<S24
…then apply model-based clustering.
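The four-case rule for the new variable can be sketched as a small awk function. R0, R24, S0 and S24 are read here as the solutions for the resistant and susceptible groups at 0 and 24 hr, which is an interpretation of the slide. The function name interaction_score is hypothetical, and printing NA when either pair is tied is an assumption, since the slide does not cover that case.

```shell
# Hypothetical helper implementing the four-case rule.
# Arguments: R0 R24 S0 S24. Prints NA for ties, which the rule leaves undefined.
interaction_score() {
  awk -v r0="$1" -v r24="$2" -v s0="$3" -v s24="$4" 'BEGIN {
    if      (r0 < r24 && s0 > s24) v =  1.0 * ((r24 - r0) + (s0 - s24))
    else if (r0 < r24 && s0 < s24) v =  0.5 * ((r24 - r0) + (s24 - s0))
    else if (r0 > r24 && s0 > s24) v = -0.5 * ((r0 - r24) + (s0 - s24))
    else if (r0 > r24 && s0 < s24) v = -1.0 * ((r0 - r24) + (s24 - s0))
    else { print "NA"; exit }
    printf "%.2f\n", v
  }'
}
```

The full weights (+1.0 and -1.0) go to genes whose two groups move in opposite directions. Genes whose groups move in the same direction are down-weighted by half.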
Clever Programming
BAYESMIX
Challenges
Human Dependent
Interdisciplinary Skills
Minimal knowledge of the application discipline is needed
…..failing that, the Statisticians will win,
..…but with the wrong weapons.
1. Amount of Expression = Amount of Response
2. Same cut-off point to judge all genes
3. Over-emphasis in normalization (hence, despise “Boutique Arrays”)
4. Over-emphasis in variance stabilization
5. Over-emphasis in controlling false-positives
Conclusion
The Statistical Analysis of cDNA Microarray Data:
General:
1. Still in its infancy (…possibly even embryonic stage)
2. Many decisions have a heuristic rather than a
theoretical foundation
3. No hope for a “One size fits all” software (even method)
4. Safer to aim towards “Tailor to one’s needs”
5. Integration of interdisciplinary skills is a must
Livestock Species:
1. Tailing humans (…at the moment)
2. Strong background knowledge of genetics accumulated
3. Journals will soon be inundated
4. We have the opportunity to participate
Thank You!