A New Statistical Method for Analyzing Longitudinal
Multifactor Expression Data and Its Application to
Time Course Burn Data
Baiyu Zhou
Department of Statistics
Stanford University
10/06/2008
Outline
• Data description
• Brief review: current statistical methods
• Proposed statistical method
• Application to burn data
Data Description
• Two data sets:

(1) Burn + gender: gender effect on burn patients

              Burn patients   Controls   Total
  Male                   66         17      83
  Female                 20         21      41
  Total                  86         38     124

(2) Burn + age: age effect on burn patients

              Burn patients   Controls   Total
  Adult                  86         38     124
  Children               33         31      64
  Total                 119         69     188

• Gene expression was measured in blood samples from each patient at several
time points after the burn.
The data sets are therefore longitudinal (time course) and involve multiple factors
(burn vs. control; gender or age).
Brief Review : Current methods (1)
Time course microarray data analysis
• Time-course clustering: identify co-expressed genes
Ma et al., Nucleic Acids Res. 2006 Mar 1;34(4):1261-9
• Fit a smooth function to each gene's profile and use a gene-specific summary statistic to
assess the significance of changes over time or between biological conditions
Storey et al., Proc Natl Acad Sci U S A. 2005 Sep 6; 102(36):12837-42.
• Empirical Bayes method to rank differentially expressed genes between
biological conditions.
Tai et al. Annals of Statistics 34(5), 2387–2412.
Brief Review : Current methods (2)
Multifactor microarray data analysis
• ANOVA for gene selection
Pavlidis et al., Methods. 2003;31:282–289.
• Nonparametric ANOVA, but with restrictions on the number of replicates and the noise
distribution
Gao et al., Bioinformatics 2006 22(12):1486-1494.
• We have developed a nonparametric ANOVA (NANOVA) method and a
gene classification algorithm for microarray data analysis that
• easily handles balanced and unbalanced experimental designs
• is free of distributional assumptions
• estimates the FDR
• is robust to outliers
Zhou et al., in manuscript
There is no existing method for analyzing longitudinal multifactor
expression data!
Methodology
Let $x = (x_1, \ldots, x_p)^T$ be the expression of a gene in one individual over p time
points, where $x_j$ is the expression at time $t_j$. Each individual is associated with two
factors (e.g., gender and burn status).
We want to identify genes that:
(1) respond differently in male and female burn patients
(2) respond to burn
…
Some genes might respond to burn at:
• the early stage
• the late stage
Which time point should we use: $t_1, t_2, \ldots, t_p$, or their average?
We call (1), (2), … ANOVA structures: (1) is an interaction effect and (2) is a main effect.
In the p-dimensional space there is a direction along which the ANOVA structure of interest
is most prominent. We first estimate this direction, project the data onto it, and then apply
NANOVA and the gene classification algorithm to the projected data.
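The slides do not say how the direction is estimated. One plausible stand-in, assumed here and not necessarily the authors' estimator, is a Fisher-discriminant criterion: for a main effect, choose the unit vector $w$ that maximizes $w^T B w / w^T W w$, where $B$ is the between-group scatter of the p-dimensional profiles and $W$ the pooled within-group scatter. The function name `estimate_direction` and the small `ridge` term are hypothetical, and an interaction structure would need a different between-group matrix.

```python
import numpy as np


def estimate_direction(X, labels, ridge=1e-8):
    """Hypothetical Fisher-type direction for one gene and one main effect.

    X      : (n_individuals, p) array; row i is one individual's profile
             over the p time points.
    labels : length-n array with the level of the factor of interest
             (e.g. burn vs. control) for each individual.
    Returns a unit vector w maximizing (w' B w) / (w' W w).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    p = X.shape[1]
    grand_mean = X.mean(axis=0)

    B = np.zeros((p, p))  # between-group scatter
    W = np.zeros((p, p))  # pooled within-group scatter
    for level in np.unique(labels):
        Xg = X[labels == level]
        d = (Xg.mean(axis=0) - grand_mean)[:, None]
        B += len(Xg) * (d @ d.T)
        centered = Xg - Xg.mean(axis=0)
        W += centered.T @ centered
    W += ridge * np.eye(p)  # keeps W positive definite

    # Reduce the generalized eigenproblem B w = lambda W w to a symmetric
    # one using the Cholesky factor W = L L'.
    L = np.linalg.cholesky(W)
    L_inv = np.linalg.inv(L)
    _, vecs = np.linalg.eigh(L_inv @ B @ L_inv.T)  # eigenvalues ascending
    w = L_inv.T @ vecs[:, -1]                      # back-transform top vector
    w /= np.linalg.norm(w)
    # Fix the sign so the largest-magnitude component is positive.
    if w[np.argmax(np.abs(w))] < 0:
        w = -w
    return w
```

Each individual's projected score is then `X @ w`, one number per individual, which is the kind of scalar response a one-dimensional NANOVA step could take as input.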
Gene Classification
We use NANOVA to classify genes into five classes according to their factor
effects:
• C1 (interaction): the factor effects are dependent
• C2 (additive): both factor effects are present and additive (no interaction)
• C3 (burn effect): has only the burn effect
• C4 (gender/age effect): has only the gender/age effect
• C5: no factor effects
[Figure: schematic interaction plots for classes C1 to C4; y-axis: gene expression; x-axis: burn vs. control; separate lines for male and female]
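The slides do not spell out how significance calls become classes. A minimal sketch, assuming each gene already has yes/no calls (e.g., at a fixed FDR) for the interaction and for each main effect, and assuming the interaction call takes precedence; the function and argument names are hypothetical.

```python
def classify_gene(interaction: bool, burn: bool, other: bool) -> str:
    """Map significance calls to the five classes C1..C5.

    interaction : burn x (gender or age) interaction is significant
    burn        : burn main effect is significant
    other       : gender (or age) main effect is significant
    Assumes the interaction call takes precedence over the main effects.
    """
    if interaction:
        return "C1"  # factor effects are dependent
    if burn and other:
        return "C2"  # additive: both effects, no interaction
    if burn:
        return "C3"  # burn effect only
    if other:
        return "C4"  # gender/age effect only
    return "C5"      # no factor effect
```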
Burn Data Analysis
Data preprocessing
• We used two time points, the early and middle stages, and included only patients
with data at both.

  Post-burn day     Min   Median    Max
  Early stage       0.1      2.2   10.2
  Middle stage     10.5     19.9   48.6

• Filtering probe sets: coefficient of variation (CV) > 0.5 and median expression > 50

                  Probe sets   Arrays (patients)   Arrays (controls)
  Burn + gender         6060                 172                  38
  Burn + age            6491                 238                  69
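The probe-set filter above can be sketched as follows. This assumes the CV is computed per probe set across all arrays, with the sample standard deviation, on the unlogged expression scale; the slides do not state these details, and the function name is hypothetical.

```python
import numpy as np


def filter_probe_sets(expr, cv_min=0.5, median_min=50.0):
    """Boolean mask of probe sets (rows) passing both filters.

    expr : (n_probe_sets, n_arrays) array of positive expression values
           on the original (non-log) scale.
    """
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1)
    sd = expr.std(axis=1, ddof=1)      # sample standard deviation
    cv = sd / mean                     # assumes positive means
    median = np.median(expr, axis=1)
    return (cv > cv_min) & (median > median_min)
```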
Burn Data Analysis
• After applying the proposed method, we classified genes (probe sets) into gene sets
at FDR = 0.05 (number of probe sets per class):

                  C1      C2      C3     C4
  Burn + gender   51     755    4110    180
  Burn + age    2181    1183    2562    151
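• The burn effect dominates: C3 (burn effect only) is the largest class in both data sets.
• The burn effect depends on age for a large set of genes (2181 C1 probe sets in the burn + age data).
• Gender modifies the burn response much less than age does (51 vs. 2181 C1 probe sets).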
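NANOVA's own FDR estimate is not described in these slides. As a concrete reference point only, the Benjamini-Hochberg step-up procedure, a standard but different technique, is sketched below; given one p-value per probe set, it flags the probe sets declared significant at FDR level `alpha`.

```python
import numpy as np


def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of hypotheses rejected by the BH step-up procedure."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m   # alpha * i / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Largest rank i with p_(i) <= alpha * i / m; reject ranks 1..i.
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject
```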
C1 Genes
Both burn and age/gender effects are present, and the burn effect depends on age/gender.
[Figure: mid-stage signal vs. early-stage signal for the top C1 probe sets 231629_x_at, 205583_s_at, 204153_s_at and 222606_at]
Red: burn; green: control; circle: adult; triangle: children
Each point is a group mean (e.g. burn children)
Top-ranking C1 genes: burn + gender
Top-ranking C1 genes: burn + age
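Each plotted point is a group mean over the individuals in one cell (e.g., burn children). A minimal sketch of that computation, assuming each individual contributes both an early and a mid value; the function name and the plain-list inputs are hypothetical.

```python
from collections import defaultdict


def group_means(early, mid, burn_status, age_group):
    """Mean (early, mid) signal for each (burn status, age group) cell.

    Each argument is a sequence with one entry per individual.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # [sum early, sum mid, count]
    for e, m, b, a in zip(early, mid, burn_status, age_group):
        cell = sums[(b, a)]
        cell[0] += e
        cell[1] += m
        cell[2] += 1
    return {key: (s_e / n, s_m / n) for key, (s_e, s_m, n) in sums.items()}
```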
C2 Genes
Both burn and age/gender effects are present; the burn effect does not depend on age/gender.
[Figure: mid-stage signal vs. early-stage signal for the top C2 probe sets 213398_s_at, 211914_x_at, 225612_s_at and 216379_x_at]
Red: burn; green: control; circle: adult; triangle: children
Top-ranking C2 genes: burn + gender
Top-ranking C2 genes: burn + age
C3 Genes
Only a burn effect; no age/gender effect.
[Figure: mid-stage signal vs. early-stage signal for the top C3 probe sets 202592_at, 218244_at, 227626_at and 1569263_at]
Red: burn; green: control; circle: adult; triangle: children
Top-ranking C3 genes: burn + gender
Top-ranking C3 genes: burn + age
C4 Genes
Only an age/gender effect; no burn effect.
[Figure: mid-stage signal vs. early-stage signal for the top C4 probe sets 211105_s_at, 228590_at, 202206_at and 226348_at]
Red: burn; green: control; circle: adult; triangle: children
Top-ranking C4 genes: burn + gender
Top-ranking C4 genes: burn + age
GO Enrichment Analysis
Top-ranking pathways in C3 (burn + gender)
http://david.abcc.ncifcrf.gov/
GO Enrichment Analysis
Top-ranking pathways in C3 (burn + age)
http://david.abcc.ncifcrf.gov/
GO Enrichment Analysis
Top-ranking pathways in C2 (burn + gender)
Top-ranking pathways in C2 (burn + age)
http://david.abcc.ncifcrf.gov/
GO Enrichment Analysis
Top-ranking pathways in C1 (burn + age)
http://david.abcc.ncifcrf.gov/
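The enrichment results above come from DAVID, which reports a modified Fisher exact test (the EASE score). For intuition only, the sketch below computes the plain one-sided hypergeometric tail probability that such tests build on; it is not DAVID's exact computation, and the function name is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """One-sided enrichment p-value P(X >= k), X ~ Hypergeometric.

    N : genes in the background (e.g. all genes on the array)
    K : background genes annotated to the GO term or pathway
    n : genes in the gene set being tested (e.g. one class)
    k : genes in the gene set annotated to the term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("need 0 <= k <= n <= N and 0 <= K <= N")
    hi = min(n, K)
    # math.comb(a, b) returns 0 when b > a, so impossible terms vanish.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, hi + 1))
    return tail / comb(N, n)
```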
A Few Interesting Pathways
Some pathways that are important in burn patients show no gender difference but differ
markedly between adult and pediatric patients.
Interpretation of Projection Direction
• The projection direction is gene-specific
• The following four genes are from C3 (burn + gender). For these genes, respectively, the
burn effect is most prominent:
(1) at the early stage
(2) at the middle stage
(3) on the average of the two stages
(4) on the change in gene expression between the early and middle stages
• The projection direction carries temporal information about gene expression:
(1) which time points are important
(2) which kinds of patterns (e.g., the average or the change) are important
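With two time points, the four cases above correspond to directions in the (early, middle) plane: (1, 0), (0, 1), (1, 1)/√2 and (1, -1)/√2. The sketch below projects a profile onto a direction and names the reference pattern closest to a given direction; the `PATTERNS` table and both function names are illustrative assumptions, not part of the original method.

```python
import numpy as np

# Reference directions in the (early, middle) plane; names are illustrative.
PATTERNS = {
    "early": np.array([1.0, 0.0]),
    "middle": np.array([0.0, 1.0]),
    "average": np.array([1.0, 1.0]) / np.sqrt(2.0),
    "change": np.array([1.0, -1.0]) / np.sqrt(2.0),  # early minus middle; sign arbitrary
}


def project(x, w):
    """Projected score of a profile x = (early, middle) onto direction w."""
    return float(np.dot(x, w))


def nearest_pattern(w):
    """Reference pattern most aligned with direction w (sign ignored)."""
    w = np.asarray(w, dtype=float)
    w = w / np.linalg.norm(w)
    return max(PATTERNS, key=lambda name: abs(np.dot(w, PATTERNS[name])))
```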
Temporal Information in Projection Direction
We performed GO analysis on 200 probe sets from C3 (burn + gender) that have
(1) strong early-stage signals or (2) strong middle-stage signals.
[Figure, two panels: "Projection vectors of C3 genes" and "Projection direction of C3 genes"; x-axis: early (0 to 1); y-axis: mid (-1 to 1)]
(1) Enriched in acute-response genes: kinase cascade, immune response, …
(2) Enriched in DNA repair, metabolism, and cell-cycle genes, …
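How the 200 probe sets were chosen is not stated. A minimal sketch, assuming selection by the absolute loading of each probe set's unit projection vector on the early or the middle coordinate; the function name and ranking rule are hypothetical.

```python
import numpy as np


def top_by_component(W, component, k):
    """Indices of the k probe sets loading most heavily on one time point.

    W         : (n_probe_sets, 2) array of unit projection vectors,
                columns = (early, middle).
    component : 0 for early, 1 for middle.
    """
    W = np.asarray(W, dtype=float)
    scores = np.abs(W[:, component])
    # Stable sort on the negated scores gives a descending ranking.
    return np.argsort(-scores, kind="stable")[:k]
```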
Temporal Information of Pathways
Projection direction contains temporal information about pathways
Example 1: T cell receptor signaling pathway (C3 of burn + gender)
[Figure: projection directions of C3 (burn + gender) genes in hsa04660: T cell receptor signaling pathway; x-axis: early (0 to 1); y-axis: mid (-1 to 1)]
Most genes cluster together; their projection directions indicate that the pathway is
important at both the early and middle stages.
Temporal Information of Pathways
Example 2: Hematopoietic cell lineage (C3 of burn + gender)
[Figure: projection directions of C3 (burn + gender) genes in hsa04640: Hematopoietic cell lineage; x-axis: early (0 to 1); y-axis: mid (-1 to 1)]
Most genes fall into two subclusters, which may be worth analyzing separately.
Summary
• A new approach to analyzing longitudinal multifactor expression data
(1) Classifies genes into gene sets based on their factor effects; suited to exploratory
studies
(2) The projection direction carries temporal information
• Applying it to the burn data highlighted important genes and pathways and
their roles in male/female and adult/child burn patients.
References
• Ma et al., Nucleic Acids Res. 2006;34(4):1261-9.
• Storey et al., Proc Natl Acad Sci USA. 2005;102(36):12837-42.
• Tai et al., Ann. Statist. 2006;34(5):2387-2412.
• Pavlidis et al., Methods. 2003;31:282-289.
• Gao et al., Bioinformatics. 2006;22(12):1486-1494.
• Anderson et al., Ann. Statist. 1985;13(2).
• Dennis et al., Genome Biology. 2003;4(5):P3.
Acknowledgement
• Wing Wong
• Weihong Xu, Wenzhong Xiao
• Ted Anderson