Two-way ANOVA


Using 2-way ANOVA to dissect the immune response to
hookworm infection in mouse lung
Eric Olson
Using 2-way ANOVA to dissect the immune response to hookworm
infection in mouse lung
General microarray data analysis workflow
From raw data to biological significance
Comparison statistics
Two-way ANOVA
GeneSifter Overview
The Gene Expression Omnibus (GEO)
Microarray analysis of gene expression following hookworm infection
Data overview
Dissection of the immune response using 2-way ANOVA
The Microarray Data Analysis Process
Experimental Design
Number of groups, factors, replicates
Data management
Data, sample annotation, gene annotation, databases
Differential Expression
Comparison statistics, Correction for multiple testing, Clustering
Biological significance
Individual genes, Biological themes
Platform Selection
One-color, two-color, platform comparisons
System access
Ease of use, accessibility
Making data public and using public data
MIAME, Journals, GEO, meta-analysis
Experiment Design
•Type of experiment
  – Two groups
    • Normal vs. cancer
    • Control vs. treated
  – Three or more groups, single factor
    • Time series
    • Dose response
    • Multiple treatment
  – Four or more groups, multiple factors
    • Time series with control and treated cells
The type of experiment and the number of groups and factors determine the statistical methods needed to detect differential expression.
•Replicates
  – The more the better, but at least 3
  – Biological better than technical
Rigorous statistical inferences cannot be made with a sample size of one. The more replicates, the stronger the inference.
Pavlidis P, Li Q, Noble WS. The effect of replication on gene expression microarray experiments. Bioinformatics. 2003 Sep 1;19(13):1620-7.
Experimental Design and Other Issues in Microarray Studies - Kathleen Kerr http://ra.microslu.washington.edu/learning/documents/KerrNAS.pdf
Differential Expression
The fundamental goal of microarray experiments is to identify genes that are differentially
expressed in the conditions being studied. Comparison statistics can be used to help identify
differentially expressed genes and cluster analysis can be used to identify patterns of gene
expression and to segregate a subset of genes based on these patterns.
•Statistical Significance
  – Fold change
    Fold change does not address the reproducibility of the observed difference and cannot be used to determine the statistical significance.
  – Comparison statistics
    • 2 groups
      – t-test, Welch’s t-test, Wilcoxon Rank Sum
    • 3 or more groups, single factor
      – One-way ANOVA, Kruskal-Wallis
    • 4 or more groups, multiple factors
      – Two-way ANOVA
    Comparison tests require replicates and use the variability within the replicates to assign a confidence level as to whether the gene is differentially expressed.
Supporting material: Draghici S. (2002) Statistical intelligence: effective analysis of high-density microarray data. Drug Discov Today, 7(11 Suppl): S55-63.
t-test for comparison of two groups
Calculate t statistic
$$t = \frac{\text{difference between groups}}{\text{difference within groups}} = \frac{\text{Mean grp 1} - \text{Mean grp 2}}{\sqrt{(s_1^2/n_1) + (s_2^2/n_2)}}$$
s^2 = variance
n = size of sample
Determine confidence level for t (probability that t could occur by chance)
df = n_1 + n_2 - 2
The larger the difference between the groups and the lower the variance, the bigger t will be and the lower p will be.
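The t statistic on this slide can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not GeneSifter's implementation; the function name `t_statistic` and the signal values are made up for the example. It stops at t and df, since turning t into a p-value needs the t distribution, which the standard library does not provide.

```python
import math
import statistics

def t_statistic(group1, group2):
    """Return (t, df) using the formula on the slide."""
    n1, n2 = len(group1), len(group2)
    mean_diff = statistics.mean(group1) - statistics.mean(group2)
    # statistics.variance is the sample variance (n - 1 denominator)
    se = math.sqrt(statistics.variance(group1) / n1
                   + statistics.variance(group2) / n2)
    return mean_diff / se, n1 + n2 - 2

# Hypothetical signal values: 2 groups, 4 replicates each
exp = [10.0, 12.0, 11.0, 13.0]
con = [5.0, 6.0, 7.0, 6.0]
t, df = t_statistic(exp, con)
print(f"t = {t:.2f}, df = {df}")
```

With equal group sizes, as here, this statistic is the same as the pooled-variance t statistic, which is the one that goes with df = n1 + n2 - 2.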
Differential Expression
2 groups, 4 replicates each
Mean, standard deviation, fold change and p-value calculated
[Figure: bar charts of mean signal for Exp vs. Con, one panel per gene]
Gene 1: Fold Change = 5.3, p = 0.19
Gene 2: Fold Change = 5.3, p = 0.03
Fold change vs. p value
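The point of this comparison, that two genes can share a fold change yet differ in significance, can be reproduced with made-up numbers. The sketch below is an assumption-laden illustration: the replicate values are hypothetical (they are not the data behind the figure), and `fold_change_and_t` is a name introduced here. It reports the t statistic rather than p, but a larger |t| at the same df means a smaller p.

```python
import math
import statistics

def fold_change_and_t(exp, con):
    """Return (fold change, t statistic) for two groups of replicates."""
    mean_exp, mean_con = statistics.mean(exp), statistics.mean(con)
    se = math.sqrt(statistics.variance(exp) / len(exp)
                   + statistics.variance(con) / len(con))
    return mean_exp / mean_con, (mean_exp - mean_con) / se

# Hypothetical replicates: same group means, different spread
con = [0.9, 1.1, 1.0, 1.0]
exp_noisy = [1.0, 12.0, 2.0, 5.0]   # mean 5.0, highly variable
exp_tight = [4.8, 5.2, 5.0, 5.0]    # mean 5.0, reproducible

fold_noisy, t_noisy = fold_change_and_t(exp_noisy, con)
fold_tight, t_tight = fold_change_and_t(exp_tight, con)
print(f"noisy: fold = {fold_noisy:.1f}, t = {t_noisy:.2f}")
print(f"tight: fold = {fold_tight:.1f}, t = {t_tight:.2f}")
```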
Analysis of Variance (ANOVA)
•Like t-test, identifies genes with large differences between groups
and small differences within groups
•For use with 3 or more groups
•One-way and two-way
•One-way examines effects of one factor on gene expression
•Two-way can examine effects of two factors on gene expression as
well as the interaction of the two factors
Pavlidis P. Using ANOVA for gene selection from microarray studies of the nervous system. Methods. 2003
Dec;31(4):282-9.
Glantz S. Primer of Biostatistics. 5th Edition. McGraw-Hill.
Glantz S, Slinker B. Primer of Regression and Analysis of Variance. McGraw-Hill.
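As a rough sketch of "large differences between groups and small differences within groups", the one-way ANOVA F statistic is the between-group mean square divided by the within-group mean square. The code below is a textbook calculation in plain Python with hypothetical values; `one_way_anova_f` is a name chosen for this example, not a GeneSifter or Bioconductor function.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of replicate groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    # Variation of the replicates around their own group mean
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical signal for one gene: 3 groups, 3 replicates each
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f, df_b, df_w = one_way_anova_f(groups)
print(f"F = {f:.1f} on ({df_b}, {df_w}) degrees of freedom")
```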
Two-way ANOVA Example
Triple treatment in Huntington’s Disease model
(R6/2 mice, GSE857, Affymetrix U74Av2)
Design: Disease (WT, R6/2) x Treatment (-, +); groups WT -, WT +, R6/2 -, R6/2 +, 3 per group
[Figure: gene expression patterns showing a disease effect, a treatment effect, a disease and treatment effect (no interaction), and an interaction]
Two-way ANOVA compared to t-test
Triple treatment in Huntington’s Disease model
(R6/2 mice, GSE857, Affymetrix U74Av2)
Design: Disease (WT, R6/2) x Treatment (-, +), 3 per group
Disease Differences
t-test: 274
Two-way: 791
Pavlidis P, Noble WS. Analysis of strain and regional variation in gene expression in mouse brain. Genome Biol.
2001;2(10):RESEARCH0042.
Analysis Workflow Examples
2 groups (apoE -/- aorta vs. wt aorta)
  t-test -> BH (FDR) -> Up regulated, Down regulated -> Gene Lists
5 groups, single factor (Drosophila Innate Immune Response Time Series)
  One-way ANOVA -> BH (FDR) -> Clustering -> Gene Lists
12 groups, two factors (Immune response to hookworms in mouse lung)
  Two-way ANOVA -> BH (FDR) -> Clustering -> Gene Lists
Gene Lists -> Individual genes of interest; Biological themes (Pathways, molecular functions, etc.)
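Each workflow applies the Benjamini and Hochberg (BH) correction to control the false discovery rate after the comparison test. A minimal sketch of the standard BH adjustment follows; the function name and the raw p-values are hypothetical, and this is not a description of how GeneSifter computes it internally.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for five genes
raw_p = [0.001, 0.008, 0.039, 0.041, 0.6]
adj_p = benjamini_hochberg(raw_p)
for raw, adj in zip(raw_p, adj_p):
    print(f"raw p = {raw:.3f}  adjusted p = {adj:.5f}")
```

Genes whose adjusted p-value falls below the chosen cutoff go on to the gene lists.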
GeneSifter – Microarray Data Analysis
Accessibility
Web-based
Secure
Data management
Data
Annotation (MIAME)
Multiple upload tools
CodeLink
Affymetrix
Illumina
Agilent
Custom
Differential Expression - Powerful, accessible tools for
determining Statistical Significance
R based statistics
Bioconductor
Comparison Tests
t-test, Welch’s t-test, Wilcoxon Rank sum test,
one-way ANOVA, two-way ANOVA
Correction for Multiple Testing
Bonferroni, Holm,
Westfall and Young maxT,
Benjamini and Hochberg
Unsupervised Clustering
PAM, CLARA, Hierarchical clustering
Silhouettes
GeneSifter – Microarray Data Analysis
Integrated tools for determining Biological Significance
One Click Gene Summary™
Ontology Report
Pathway Report
Search by ontology terms
Search by KEGG terms or Chromosome
The GeneSifter Data Center
• Free resource
Training
Research
Publishing
• 6 areas
Cardiovascular
Cancer
Endocrinology
Neuroscience
Immunology
Oral Biology
• Access to :
Data
Analysis summary
Tutorials
WebEx
The GeneSifter Data Center
www.genesifter.net/dc
The Gene Expression Omnibus (GEO)
Gene expression data repository
(mostly microarrays)
Over 3000 data sets
All array platforms represented
Searchable by
Platform
Species
Experiment annotation
Downloadable data
Using the Gene Expression Omnibus (http://www.microarraysuccess.org/newsletter)
Project Analysis : Two-way ANOVA
Scott lab, Johns Hopkins University
(Bloomberg School of Public Health )
Affymetrix Mouse 430 2.0
Wild type and SCID mice
Control and 5 time points after infection
CEL files available
(loaded and MAS5 processed in GeneSifter)
Alex Loukas, and Paul Prociv. Immune Responses in Hookworm Infections.
Clinical Microbiology Reviews, October 2001, p. 689-703, Vol. 14, No. 4
Project Analysis : Two-way ANOVA
Factor One: Strain (2 levels, SCID, WT)
Factor Two: Time after infection (6 levels, con, 2,3,4,8,12 dpi)
[Figure: gene expression patterns for WT and SCID mice over time, showing a strain effect, a time effect, and an interaction]
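The three effects (strain, time, interaction) each get their own F statistic in a two-way ANOVA. The sketch below shows the textbook calculation for a balanced design in plain Python. It is an illustration under stated assumptions: the values are hypothetical, the example uses 2 strains x 3 time points x 2 replicates rather than the 2 x 6 x 3 design of this project, and `two_way_anova_f` is a name introduced here.

```python
def _mean(values):
    return sum(values) / len(values)

def two_way_anova_f(cells):
    """cells[i][j] holds the replicates for strain level i and time level j.
    Assumes a balanced design (same number of replicates in every cell)."""
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    grand = _mean([v for row in cells for cell in row for v in cell])
    strain_means = [_mean([v for cell in row for v in cell]) for row in cells]
    time_means = [_mean([v for row in cells for v in row[j]]) for j in range(b)]
    cell_means = [[_mean(cell) for cell in row] for row in cells]

    ss_strain = b * n * sum((m - grand) ** 2 for m in strain_means)
    ss_time = a * n * sum((m - grand) ** 2 for m in time_means)
    ss_inter = n * sum(
        (cell_means[i][j] - strain_means[i] - time_means[j] + grand) ** 2
        for i in range(a) for j in range(b))
    ss_error = sum((v - cell_means[i][j]) ** 2
                   for i in range(a) for j in range(b) for v in cells[i][j])
    ms_error = ss_error / (a * b * (n - 1))
    return {
        "strain": (ss_strain / (a - 1)) / ms_error,
        "time": (ss_time / (b - 1)) / ms_error,
        "interaction": (ss_inter / ((a - 1) * (b - 1))) / ms_error,
    }

# Hypothetical signal for one gene
cells = [
    [[1.0, 3.0], [2.0, 4.0], [3.0, 5.0]],  # WT at three time points
    [[1.0, 3.0], [4.0, 6.0], [7.0, 9.0]],  # SCID at three time points
]
f_stats = two_way_anova_f(cells)
print(f_stats)
```

Each F statistic would then be converted to a p-value with its own degrees of freedom, which is what allows filtering on strain, time, or interaction separately.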
Project Analysis : Two-way ANOVA
Identify Factors
Indicate number of levels for each
Identify levels for each factor
Project Analysis : Two-way ANOVA
Assign levels for each factor to cells
Include fold-change cutoff if desired
Select effect to filter on first
(you can switch later)
Two-way ANOVA : Strain Effects
Biological Significance
Gene Annotation Sources
• UniGene - organizes GenBank sequences into a non-redundant set of gene-oriented clusters. Gene titles are assigned to the clusters and these titles are commonly used by researchers to refer to that particular gene.
• LocusLink (Entrez Gene) - provides a single query interface to curated sequence and descriptive information, including function, about genes.
• Gene Ontologies - The Gene Ontology™ Consortium provides controlled vocabularies for the description of the molecular function, biological process and cellular component of gene products, that can be used by databases such as Entrez Gene.
• KEGG - Kyoto Encyclopedia of Genes and Genomes provides information about both regulatory and metabolic pathways for genes.
• Reference Sequences - The NCBI Reference Sequence project (RefSeq) provides reference sequences for both the mRNA and protein products of included genes.
GeneSifter maintains its own copies of these databases and updates them automatically.
One-Click Gene Summary
Two-way ANOVA : Strain Effects
Ontology Report
Ontology Report : z-score
R = total number of genes meeting selection criteria
N = total number of genes measured
r = number of genes meeting selection criteria with the specified GO term
n = total number of genes measured with the specific GO term
Reference:
Scott W Doniger, Nathan Salomonis, Kam D Dahlquist, Karen Vranizan, Steven C Lawlor and Bruce R Conklin; MAPPFinder: using Gene Ontology and GenMAPP to create a global gene-expression profile from microarray data, Genome Biology 2003, 4:R7
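The formula itself did not survive in this transcript. As a hedged sketch, the function below uses the z-score as it is usually stated for MAPPFinder, z = (r - n·R/N) / sqrt(n·(R/N)·(1 - R/N)·(1 - (n - 1)/(N - 1))), with the symbols defined above. Treat the formula as an assumption to check against the cited paper; the gene counts in the example are hypothetical.

```python
import math

def go_z_score(r, n, R, N):
    """z-score for a GO term, assuming the MAPPFinder-style formula.

    r: genes meeting the criteria with the GO term
    n: genes measured with the GO term
    R: genes meeting the criteria
    N: genes measured
    """
    expected = n * R / N
    variance = n * (R / N) * (1 - R / N) * (1 - (n - 1) / (N - 1))
    return (r - expected) / math.sqrt(variance)

# Hypothetical counts: 500 of 10000 genes pass the filter,
# and 20 of the 100 genes annotated with the term are among them
z = go_z_score(r=20, n=100, R=500, N=10000)
print(f"z = {z:.2f}")
```

A positive z-score means the term appears in the gene list more often than expected by chance; a z-score near zero means the list holds about as many genes with the term as expected.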
Z-score Report
KEGG Report
Two-way ANOVA : Strain Effects
Strain effects - Visualization
Visualization of 517 genes
(strain effect p < 0.001)
Strain effects - Partitioning
Segregation of expression patterns using k-medoids clustering
Strain effects - Partitioning
Silhouette widths are used to find “best” number of clusters
k    mean sil. width
2    0.71
4    0.41
6    0.25
Dudoit S, Fridlyand J. A prediction-based resampling method for estimating the number of clusters in a dataset.
Genome Biol. 2002 Jun 25;3(7):RESEARCH0036. Epub 2002 Jun 25.
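A silhouette width compares how close a profile is to its own cluster (a) with how close it is to the nearest other cluster (b), as (b - a) / max(a, b); the mean over all profiles is then compared across values of k. The sketch below is a minimal plain-Python version with made-up 2-D points standing in for expression profiles. It assumes at least two clusters and distinct points, and it is not the routine GeneSifter uses.

```python
import math

def mean_silhouette_width(points, labels):
    """Mean silhouette width, using Euclidean distance."""
    clusters = sorted(set(labels))
    widths = []
    for i, p in enumerate(points):
        # Average distance from point i to the other members of each cluster
        avg = {}
        for c in clusters:
            members = [q for j, q in enumerate(points)
                       if labels[j] == c and j != i]
            if members:
                avg[c] = sum(math.dist(p, q) for q in members) / len(members)
        own = labels[i]
        if own not in avg:  # point is alone in its cluster: width is 0
            widths.append(0.0)
            continue
        a = avg[own]
        b = min(v for c, v in avg.items() if c != own)
        widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)

# Hypothetical profiles forming two obvious groups
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = mean_silhouette_width(points, [0, 0, 1, 1])
bad = mean_silhouette_width(points, [0, 1, 0, 1])
print(f"sensible partition: {good:.2f}, poor partition: {bad:.2f}")
```

In the table above, k = 2 gives the largest mean silhouette width (0.71), so two clusters would be taken as the "best" partition.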
Strain : Cluster 1
Strain : Cluster 2
Two-way ANOVA : Time Effects
Time : Cluster 1
Time : Cluster 2
Two-way ANOVA : Interaction
Interaction : Cluster 3
Interaction : Cluster 2
Two-way ANOVA : Summary
Immune response to hookworms in mouse lung
12 groups (3 biological replicates)
2 factors (Strain and Time)
Two-way ANOVA
~39,000 genes
Interaction: 56 genes
Pattern selection: Hierarchical clustering, PAM
(Interaction)
Z-scores
Biological process: Transcription (4), Circadian Rhythm (3)
Strain: 517 genes
Time: 1054 genes
Biological process: Immune response (8), Chitin catabolism (4)
Strain effects, time effects and interaction
Resources
Monthly Webinar Series
8/10/06 - Microarray analysis of gene expression in Huntington's Disease peripheral blood - a platform comparison
Archived - Using 2-way ANOVA to dissect gene expression following myocardial infarction in mice
Archived - Using 2-way ANOVA to dissect the immune response to hookworm infection in mouse lung
Archived - The microarray data analysis process - from raw data to biological significance
Archived - Microarray analysis of gene expression in androgen-independent prostate cancer
Archived - Microarray analysis of gene expression in male germ cell tumors
Thank You
www.genesifter.net
Trial account, tutorials, sample data and Data Center
Eric Olson