Transcript Slide 1
Microarray Experiment Design
and Data Interpretation
Susan Hester, Ph.D.
Environmental Carcinogenesis Division
Toxicogenomic Core Facility
US EPA
[email protected]
919-541-1320
1
Presentation Outline
• Traditional biology versus genomics
• Basics of genomics
• Data mining goals and approaches using parallel analyses
-some examples
• Interpreting changes in gene expression to identify altered
molecular pathways
• Evaluating pathway alterations in concert with traditional
toxicology data for greater understanding of mode of action
2
Traditional Biology
Measure one tree
at a time
Measure one element
in 10-50 samples
3
“Omic” Biology
Measure tens of
thousands of elements
in 2 to 4 samples
Measure Forests
(groups of trees)
4
Genomic research is a data-rich technology
• Microarrays are called chips or arrays
• Takes advantage of the natural property of DNA to pair
with its complementary strand
• One strand is built into the array and then is used as a
probe for the complementary strand in the biologic
sample
• The binding confirms the presence of mRNA or cDNA
in the sample
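The pairing rule described above can be sketched in a few lines. This is a minimal illustration, assuming a probe is stored as a plain DNA string; the function names and the example sequence are hypothetical and do not come from the slides.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the strand that base-pairs with `seq`, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def hybridizes(probe: str, target: str) -> bool:
    """True if `target` is the exact complementary strand of `probe`.

    Real hybridization tolerates some mismatches; exact matching is a
    simplification made for this sketch.
    """
    return target.upper() == reverse_complement(probe)


probe = "ATGGCA"  # hypothetical probe sequence built into the array
print(reverse_complement(probe))
print(hybridizes(probe, "TGCCAT"))
```

Only a sample strand that matches the reverse complement of the probe counts as bound here, which is the idea behind using one strand as a probe for the other.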
5
Genomic Profiling-Find “Significantly
Changed Genes”
From:
All probesets
Typical experiment
is ~ 1M datapoints
To:
Reduce to a much
smaller number
of “meaningful genes”
6
Finding genes in samples-1st step
1 genechip cell location
1 genechip
apply sample
7
2nd step
Tagged DNA fragments that
base pair will glow
shine light
final image
text file with
gene intensities
8
Experimental Design
• Use adequate controls
• Sample collection
• Choose time-points and doses
• Hybridization schemes-1 or 2 colors
9
Data Quality and Data Mining
• RNA quality
• Scans
• Summary statistics
10
RNA quality:
• Agilent 2100 Bioanalyzer
• Measure RNA quality and quantity
• Uses a small sample and takes minutes
[Figure: Agilent gel image comparing good-quality RNA with degraded RNA]
11
QC Assessment of Scanned Slide
• Showing good dynamic range of signal intensity
• Low background signal
[Figure: poor scan and good scan]
12
Summary Statistics for each array
Raw gene intensity distribution
for each array
After normalization shows
reduced variance
[Figure: intensity distribution (min, median, max) for each array, groups 1 to 6]
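The slides do not name the normalization method. As one simple possibility, the sketch below log2-transforms each array and shifts it so that its median is zero, which puts arrays scanned at different brightness on a common scale. The function name and the intensity values are hypothetical.

```python
import math
from statistics import median


def log2_median_center(arrays):
    """Log2-transform each array, then shift it so its median is 0.

    `arrays` maps an array name to a list of raw, positive intensities.
    Median centering is an assumed method, not one named in the slides.
    """
    normalized = {}
    for name, intensities in arrays.items():
        logged = [math.log2(x) for x in intensities]
        center = median(logged)
        normalized[name] = [value - center for value in logged]
    return normalized


# Hypothetical raw intensities: chip_B reads about 4x brighter than chip_A.
raw = {
    "chip_A": [100.0, 200.0, 400.0, 800.0, 1600.0],
    "chip_B": [400.0, 800.0, 1600.0, 3200.0, 6400.0],
}
norm = log2_median_center(raw)
for name, values in norm.items():
    print(name, [round(v, 3) for v in values])
```

After centering, the two hypothetical chips have the same distribution, so the overall brightness difference no longer looks like a change in gene expression.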
13
Example of within-group outliers
Example of 2 array outliers
(high and low median values)
Arrays
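Outlier arrays with unusually high or low medians can be flagged automatically. The following is a rough sketch, assuming log2 intensities per array and a cutoff of 1 log2 unit from the typical median; the cutoff, the function name and the data are all hypothetical.

```python
from statistics import median


def flag_outlier_arrays(log_intensities, max_shift=1.0):
    """Return names of arrays whose median lies more than `max_shift`
    log2 units from the median of all array medians.

    `max_shift` is a hypothetical cutoff chosen for illustration.
    """
    medians = {name: median(vals) for name, vals in log_intensities.items()}
    center = median(medians.values())
    return sorted(n for n, m in medians.items() if abs(m - center) > max_shift)


# Hypothetical log2 intensities for five arrays in one group.
arrays = {
    "a1": [7.9, 8.0, 8.1],
    "a2": [8.0, 8.1, 8.2],
    "a3": [7.8, 8.0, 8.3],
    "a4": [10.4, 10.5, 10.6],  # high median
    "a5": [5.4, 5.5, 5.6],     # low median
}
print(flag_outlier_arrays(arrays))
```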
14
Goals of Data Mining
• Reduce the large dataset by first excluding “unchanging
genes”
• Early microarray papers used a simple “fold change”
to find differences
• Most analyses now rely on statistical tests to identify
changed genes-supervised versus unsupervised
• Find genes that distinguish the various biologic classes
(“significant genes”)
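A small sketch can show why a simple fold change gave way to statistical tests. Both hypothetical genes below have the same 2-fold change in mean intensity, but one of them gets there through a single extreme replicate. The function name and the intensities are made up for illustration.

```python
import math
from statistics import mean


def log2_fold_change(treated, control):
    """log2 of the ratio of mean treated to mean control intensity."""
    return math.log2(mean(treated) / mean(control))


# Hypothetical replicate intensities.
control = [100.0, 100.0, 100.0]
stable_gene = log2_fold_change([210.0, 190.0, 200.0], control)
noisy_gene = log2_fold_change([500.0, 50.0, 50.0], control)

print(stable_gene, noisy_gene)
```

Fold change alone cannot tell these two genes apart because it ignores the spread among replicates, which is what a statistical test takes into account.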
15
General Approach: From many genes to a few
28,000 rat genes
34,000 mouse genes
normalize data to
compare across arrays
analysis begins here
supervised (prior knowledge)
T test, ANOVA, etc.
and
unsupervised (no prior knowledge)
PCA, KNN, clustering
genes…now associate with gene name
using databases to assign gene function
characterize genes into pathways
explore pathways by combining into networks
16
Array Image Inspection Confirms the
Induction of Many Genes
1 uM As
50 uM As
17
Statistical Filter shows more
significant genes at higher doses
1 uM As
50 uM As
genes with fold change > 1.5
and p < 0.05
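The two-part filter (fold change above 1.5 and p below 0.05) might be sketched as follows. Two assumptions are made here: an exact permutation test stands in for the t-test so that the example needs only the standard library, and a decrease of more than 1.5-fold is also counted as a change. All names and data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def fold_change(treated, control):
    """Ratio of mean treated to mean control intensity (linear scale)."""
    return mean(treated) / mean(control)


def permutation_p_value(treated, control):
    """Exact two-sided permutation p-value for a difference in means.

    Used in place of a t-test; this is a substitution, not the slides' method.
    """
    pooled = list(treated) + list(control)
    observed = abs(mean(treated) - mean(control))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(treated)):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


def is_significant(treated, control, min_fold=1.5, alpha=0.05):
    """Apply both criteria: fold change beyond 1.5 (up or down) and p < 0.05."""
    fc = fold_change(treated, control)
    changed = fc > min_fold or fc < 1.0 / min_fold
    return changed and permutation_p_value(treated, control) < alpha


# Hypothetical genes: (treated replicates, control replicates).
genes = {
    "gene_up": ([300.0, 320.0, 310.0, 330.0], [100.0, 110.0, 105.0, 95.0]),
    "gene_flat": ([100.0, 105.0, 98.0, 102.0], [101.0, 99.0, 103.0, 97.0]),
}
significant = [name for name, (t, c) in genes.items() if is_significant(t, c)]
print(significant)
```

With four replicates per group there are 70 possible splits, so the smallest achievable two-sided p-value is 2/70, just under the 0.05 cutoff. With fewer replicates this particular test could never pass the filter.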
18
Many Views of the Data
• Table of filtered genes
• Principal Component Analysis (PCA)
• Venn Diagrams-gene level
• Correlate Transcription with Functional Assays
• Map genes to pathways
• Venn Diagram-pathway level
19
Table view:
Significantly Altered Genes by Chemical, Day and Dose in rat liver
Columns: Myclobutanil, Propiconazole, Triadimefon
Rows: 4 Day (Low-Dose, Mid-Dose, High-Dose); 30 Day (Low-Dose, Mid-Dose, High-Dose); 90 Day (Mid-Dose, High-Dose)
4
3
228
0
1
396
5
7
1275
381
1164
220
2536
1395
419
1033
2522
1134
2
36
272
8452
10446
10337
20
Principal Component Analysis
• Identifies dose-response, if present
• Assess experiment
• Worth analyzing?
• Identify outliers-bad chips
• Find samples with similar expression patterns
What it looks like:
What it does
• uses all samples and genes
• using statistics, reduces and plots the data
• helps visualize data in 2 or 3 dimensions (2D or 3D)
What it tells
• groups samples or genes with similar
profiles
• differentiates treatment or exposure
groups
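To make "reduces and plots the data" concrete, here is a teaching sketch that finds only the first principal component by power iteration on the gene covariance matrix. Analysis software would normally compute several components with a linear algebra library. The function name and the tiny expression matrix are hypothetical.

```python
import math


def first_principal_component(samples, iterations=200):
    """Return (loadings, scores) for the first principal component.

    `samples` is a list of rows, one per sample, each a list of gene values.
    Power iteration on the covariance matrix; a sketch, not production code.
    """
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    vec = [1.0 / math.sqrt(p)] * p  # arbitrary starting direction
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Project each centered sample onto the component to get its score.
    scores = [sum(r[j] * vec[j] for j in range(p)) for r in centered]
    return vec, scores


# Hypothetical log expression values: rows are samples, columns are 3 genes.
expression = [
    [1.0, 5.0, 3.0],  # control 1
    [1.2, 5.1, 3.1],  # control 2
    [4.0, 5.0, 0.9],  # treated 1
    [4.2, 4.9, 1.0],  # treated 2
]
loadings, scores = first_principal_component(expression)
print([round(s, 2) for s in scores])
```

In this toy example the two control samples get scores of one sign and the two treated samples get scores of the opposite sign, which is how PCA separates treatment groups on a plot.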
21
Principal Component Analysis Rat Liver
Principal Component Plot 30d HD
[Figure: 3D principal component plot, axes PCA1, PCA2 and PCA3]
Legend: control, Myclobutanil, Propiconazole, Triadimefon
22
Numbers of Common and Unique Genes
Over Time (High Dose)-rat liver
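The counts of common and unique genes in a three-way Venn diagram come from set operations on the significant-gene lists. The sketch below assumes one list per time point; the gene identifiers are placeholders, not results from the study.

```python
def venn_counts(a, b, c):
    """Count genes in each of the seven regions of a three-set Venn diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        "only_a": len(a - b - c),
        "only_b": len(b - a - c),
        "only_c": len(c - a - b),
        "a_and_b_only": len((a & b) - c),
        "a_and_c_only": len((a & c) - b),
        "b_and_c_only": len((b & c) - a),
        "all_three": len(a & b & c),
    }


# Placeholder gene identifiers for three time points (hypothetical).
day4 = {"g1", "g2", "g3"}
day30 = {"g2", "g3", "g4", "g5"}
day90 = {"g3", "g5", "g6"}

counts = venn_counts(day4, day30, day90)
print(counts)
```

The seven region counts always add up to the number of distinct genes across the three lists, which is a quick sanity check on any Venn diagram.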
23
Dose response corresponds to
functional assays
Functional assays
[Figure panels: EROD Activities in 30 day Conazole treated livers; PROD Activities in 30 day Conazole treated livers. Axes: Fold Induction versus Dose (low, mid, high). Legend: Triadimefon, Propiconazole, Myclobutanil]
Better description of dose response by genomics
[Figure panel: Stress Response Genes, 30 day dose curve for T/C. Axes: Fold Change versus T/C, L; T/C, M; T/C, Hi. Genes: Cyp2b15, Cyp4a12, Cyp1a1, Gsta2, Aldh1a1, Ces2, Udpgtr2]
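One way to check whether expression tracks a functional assay across doses is a Pearson correlation between the two dose curves. This is only a sketch: the slides do not say how the correspondence was assessed, and the fold values below are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical fold values at low, mid and high dose.
assay_fold_induction = [1.5, 4.0, 9.0]  # enzyme activity assay
gene_fold_change = [1.2, 3.0, 7.5]      # transcript for the related gene

r = pearson_r(assay_fold_induction, gene_fold_change)
print(round(r, 3))
```

With only three dose levels a high correlation is weak evidence on its own, so a check like this supports the dose-response plots but does not replace them.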
24
Mapping genes to pathways
Pathway | Process | p-Value | # of genes Expressed | # of genes in Pathway | %
Transcription of Retinoid-Target genes | Cell signaling/Regulation of transcription | 7.56E-09 | 68 | 125 | 54
Regulation activity of EIF2 | Cell signaling/Translation regulation | 5.86E-05 | 31 | 56 | 55
IGF-R signaling | Growth and differentiation | 4.57E-06 | 40 | 72 | 56
AKT signaling | Growth and differentiation | 9.50E-06 | 33 | 57 | 58
PTEN pathway | Growth and differentiation | 3.65E-05 | 31 | 55 | 56
Tryptophan metabolism | Metabolic maps/Amino acid metabolism | 3.99E-05 | 17 | 24 | 71
Cholesterol Biosynthesis | Metabolic maps/Steroid metabolism | 6.25E-06 | 16 | 22 | 82
GTP-XTP metabolism | Metabolic maps/Nucleotide metabolism | 4.58E-07 | 34 | 54 | 63
CTP/UTP metabolism | Metabolic maps/Nucleotide metabolism | 1.32E-05 | 34 | 60 | 57
ATP/ITP metabolism | Metabolic maps/Nucleotide metabolism | 1.49E-05 | 36 | 65 | 55
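The percentage column is the number of expressed genes divided by the number of genes in the pathway. The slides do not state how the p-values were calculated; a hypergeometric (one-sided Fisher) test is a common choice for pathway enrichment and is assumed in the sketch below. The function names and the toy numbers in the enrichment example are hypothetical.

```python
from math import comb


def percent_coverage(n_expressed, n_in_pathway):
    """Share of a pathway's genes that are expressed, as a whole percent."""
    return round(100 * n_expressed / n_in_pathway)


def hypergeom_enrichment_p(k, pathway_size, n_changed, n_background):
    """P(X >= k): chance of seeing at least k pathway genes among
    n_changed genes drawn from n_background genes, of which
    pathway_size belong to the pathway.

    An assumed enrichment test, not necessarily the one behind the slides.
    """
    total = comb(n_background, n_changed)
    upper = min(pathway_size, n_changed)
    tail = sum(
        comb(pathway_size, i) * comb(n_background - pathway_size, n_changed - i)
        for i in range(k, upper + 1)
    )
    return tail / total


# Coverage for two rows of the pathway table (68 of 125, 17 of 24).
print(percent_coverage(68, 125), percent_coverage(17, 24))

# Toy enrichment example with made-up numbers: 10 background genes,
# 4 in the pathway, 3 changed genes, 2 of them in the pathway.
print(hypergeom_enrichment_p(2, 4, 3, 10))
```

A small p-value means that the overlap between the changed genes and the pathway is larger than expected if the changed genes had been picked at random from the background.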
25
Pathway Venn
Unique and common pathways over time
26
Pathway and network visualizations
• cellular
• molecular
• network
• metabolic
• transcription
27
Example of a molecular pathway with
gene intensity values added
Oxidative Phosphorylation pathway
red=gene induced
green=gene repressed
rainbow=mixed
Labeled components: ATPase, Oxidoreductase, NADH dehydrogenase, succinate dehydrogenase complex, cytochrome c oxidase subunit
28
Cellular pathway
Compartments: extracellular, cytoplasmic, nuclear
Note c-Jun, JNK1, ERK1 repression*
Expression legend
Green=decreased
Red=increased
Rainbow=mixed
29
Gene Network:
One Transcription factor:
30
Network objects mapped to cellular localization
31
Conclusions
Steps for a successful microarray experiment:
• Experiment design-focus your research question
• Data quality assessment
• Supervised and unsupervised analyses
• Integrating gene expression results with other
phenotypic endpoints
32