Doug Selinger, Graduate Student
Dept. of Genetics
Advisor: Dr. George Church
High Resolution Genome Array
Analysis of E. coli
- The Affymetrix E. coli chip: the first true “genome array”
- Log vs. Stationary Phase gene expression analysis
- Introduction to GAPS (Genome Array Processing Software)
- Array Normalization
- RNA Abundance Measures
- Strand-specific Transcript Detection
- High Resolution Analysis
Why a genome chip?
• Promoter studies
– DNA binding protein studies (Saeed Tavazoie,
Martha Bulyk)
– Multiple promoters (within coding regions,
between genes of an operon, alternate
transcriptional starts)
• Determination of operon structure
• Studies of 5’ and 3’ UTR function
• Identification of untranslated and antisense
RNAs
Affymetrix® E. coli Chip
- mRNAs: 60 base average resolution
- Intergenic regions: 6 base average resolution
- rRNAs, tRNAs
Chip Images:
[Figure: images of the whole chip, coding regions, and intergenic regions, with detail of the 5’ end and 3’ end of the lpp coding region; intensity scale from low to high.]
Log vs. Stationary Expression Analysis
Cells were grown at 37° C in LB to either:
1) Mid-log phase (OD600 = 0.6) in a fermentor
or
2) Late stationary phase (overnight incubation) in a
shaken culture
Cells were taken directly into acid phenol:chloroform,
RNA was purified, labeled in duplicate, and hybridized
to chips.
Significance of Changes
Significant Changes: 1,529 genes (including rRNAs, tRNAs)
1) Must be called present in at least one condition
2) Must meet one of the following criteria:
i) >95% confident with Student’s t-test
or
ii) at least 11/13 of probe pairs agree in direction.
(The highest and lowest are discarded, leaving 13.)
This yields:
False Positive Rate: 0% (0/52)
Sensitivity: 100% (52/52)
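The two-part rule above can be sketched in Python. This is a minimal illustration, not the GAPS implementation: the function names are hypothetical, the t-test p-value is assumed to be computed elsewhere and passed in, and "highest and lowest" is assumed to refer to the per-probe-pair change between the two conditions.

```python
def direction_agreement(log_diffs, stat_diffs, min_agree=11):
    """Return True if enough probe pairs agree on the direction of change.

    log_diffs and stat_diffs hold PM - MM values for the same 15 probe
    pairs in log and stationary phase. Assumption: the highest and lowest
    per-pair changes are the two values discarded, leaving 13.
    """
    changes = sorted(s - l for l, s in zip(log_diffs, stat_diffs))
    trimmed = changes[1:-1]  # drop the lowest and the highest change
    up = sum(1 for c in trimmed if c > 0)
    down = sum(1 for c in trimmed if c < 0)
    return max(up, down) >= min_agree


def is_significant(present_log, present_stat, t_test_p, log_diffs, stat_diffs):
    """Apply criteria 1) and 2) from the slide to one gene."""
    # 1) Must be called present in at least one condition.
    if not (present_log or present_stat):
        return False
    # 2) Either >95% confidence by t-test, or probe pairs agree in direction.
    return t_test_p < 0.05 or direction_agreement(log_diffs, stat_diffs)
```

A gene that is absent in both conditions is rejected before either statistical criterion is consulted, which mirrors the ordering of the two requirements on the slide.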
25 Genes Most Increased* in Stationary Phase
Bnumber  Gene   Fold Change  Annotation
b1005    ycdF   102.0        orf, hypothetical protein
b0836    -      >896.5       putative receptor
b0953    rmf    17.1         ribosome modulation factor
b3049    glgS   155.8        glycogen biosynthesis, rpoS dependent
b4045    yjbJ   9.0          orf, hypothetical protein
b3510    hdeA   41.4         orf, hypothetical protein
b0812    dps    55.2         global regulator, starvation conditions
b1480    rpsV   48.2         30S ribosomal subunit protein S22
b2665    ygaU   59.8         orf, hypothetical protein
b3555    yiaG   11.9         orf, hypothetical protein
b3239    yhcO   139.5        orf, hypothetical protein
b1240    -      3.8          orf, hypothetical protein
b1635    gst    80.7         glutathione S-transferase
b1051    msyB   10.8         acidic protein suppresses mutants lacking function of protein export
b0966    yccV   15.7         orf, hypothetical protein
b1318    ycjV   75.2         putative ATP-binding component of a transport system
b1154    ycfK   >123.6       orf, hypothetical protein
b1566    flxA   13.1         orf, hypothetical protein
b2212    alkB   5.6          DNA repair system specific for alkylated DNA
b1492    xasA   84.6         acid sensitivity protein, putative transporter
b2266    elaB   >92.1        orf, hypothetical protein
b1164    ycgZ   3.0          orf, hypothetical protein
b3183    yhbZ   6.7          putative GTP-binding factor
b1262    trpC   6.8          N-(5-phosphoribosyl)anthranilate isomerase and indole-3-glycerolphosphate synthetase
b1739    osmE   23.6         activator of ntrL gene
*ranked by absolute Smax change
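Several fold changes in the table carry a ">" prefix, which conventionally marks a lower bound. The slides do not say how these were computed; the sketch below shows one plausible approach, assuming the log-phase signal fell below a detection floor and the floor was substituted as the denominator. The `floor` parameter and both function names are hypothetical.

```python
def fold_change(log_smax, stat_smax, floor):
    """Return (fold, is_lower_bound) for stationary vs. log phase.

    Assumption: when the log-phase Smax is below `floor`, the floor is
    used as the denominator and the result is only a lower bound.
    """
    if floor <= 0:
        raise ValueError("floor must be positive")
    if log_smax < floor:
        return stat_smax / floor, True
    return stat_smax / log_smax, False


def format_fold(fold, is_lower_bound):
    """Render a fold change the way the table does, e.g. '>20.0'."""
    prefix = ">" if is_lower_bound else ""
    return f"{prefix}{fold:.1f}"
```

For example, `format_fold(*fold_change(10.0, 1000.0, 50.0))` renders a bounded value with the ">" prefix, while a log-phase signal above the floor yields a plain ratio.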
Genes Known to be Growth Phase Regulated
Gene  Fold Change  Annotation
rmf   17.09        ribosome modulation factor
glgS  155.78       glycogen biosynthesis, rpoS dependent
hdeA  41.43        orf, hypothetical protein
dps   55.23        global regulator, starvation conditions
hdeB  4.96         orf, hypothetical protein
osmY  8.87         hyperosmotically inducible periplasmic protein
himA  23.35        integration host factor (IHF), alpha subunit; site specific recombination
csgB  >20.70       minor curlin subunit precursor, similar to CsgA
clpA  7.70         ATP-binding component of serine protease
wrbA  6.96         trp repressor binding protein; affects association of trp repressor and operator
fic   25.79        induced in stationary phase, recognized by rpoS, affects cell division
htrE  >16.54       probable outer membrane porin protein involved in fimbrial assembly
cstA  11.21        carbon starvation protein
sspA  3.68         regulator of transcription; stringent starvation protein A
ftsA  >3.62        ATP-binding cell division protein, septation process, complexes with FtsZ, associated with junctions of inner and outer membranes
hyaE  >2.92        processing of HyaA and HyaB proteins
dacC  7.54         D-alanyl-D-alanine carboxypeptidase; penicillin-binding protein 6
emrA  >2.89        multidrug resistance secretion protein
otsB  2.38         trehalose-6-phosphate phosphatase, biosynthetic
cfa   >2.88        cyclopropane fatty acyl phospholipid synthase
iciA  >2.51        replication initiation inhibitor, binds to 13-mers at oriC
rpoH  0.35         RNA polymerase, sigma(32) factor; regulation of proteins induced at high temperatures
hns   0.04         DNA-binding protein HLP-II (HU, BH2, HD, NS); pleiotropic regulator
GAPS
GAPS = Genome Array Processing Software
Flexible and exhaustive analysis
of high-density microarrays
- More flexibility in the way the analysis is done,
including a choice of abundance measures, instead of
the single measure (average difference) used in
GeneChip.
- An algorithm which corrects for varying signal in
different regions of the chip
- High resolution analysis, where data can be analyzed
oligo by oligo
- Handles replicates and multiple timepoints
- Automatic annotation of results
- All output files can easily be opened and analyzed in
Excel
- Detection of transcripts is decided by a simple
statistical test, rather than a "black box" algorithm, as
in GeneChip.
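The slides do not name the statistical test used for detection. As an illustration of what a simple, transparent test could look like, the sketch below applies a one-sided sign test to the probe pairs of one probe set: under the null hypothesis of no specific hybridization, PM exceeds MM in about half of the pairs. The choice of the sign test, the function names, and the 0.05 cutoff are all assumptions, not the GAPS method.

```python
from math import comb


def sign_test_p(n_positive, n_pairs):
    """One-sided P(X >= n_positive) for X ~ Binomial(n_pairs, 0.5)."""
    tail = sum(comb(n_pairs, k) for k in range(n_positive, n_pairs + 1))
    return tail / 2 ** n_pairs


def is_detected(pm, mm, alpha=0.05):
    """Call a transcript present if PM > MM in significantly many pairs."""
    n_positive = sum(1 for p, m in zip(pm, mm) if p > m)
    return sign_test_p(n_positive, len(pm)) < alpha
```

With 15 probe pairs and a 0.05 cutoff, this sketch calls a transcript present when at least 12 pairs have PM greater than MM.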
GAPS
Array Normalization
How can experiments be normalized between
arrays and corrected for the variation of
fluorescence within an array?
Chip Signal Interpretation and Normalization
Perfect match (PM)
Mismatch (MM)
Probe Set = 15 Probe Pairs targeting the same transcript
Background-subtracted PM - MM values are multiplied by:
1) Spatial correction factor
2) Inter-chip scaling factor
Absolute RNA level: scale to 16S ribosomal features.
Relative levels: scale to total chip intensity.
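The two multiplicative corrections can be sketched as follows for the relative-level case (scaling to total chip intensity). This is a minimal illustration under stated assumptions: the PM - MM values are taken as already background-subtracted, the per-probe spatial factors are supplied by the caller, and `normalize_chip` and `target_total` are hypothetical names.

```python
def normalize_chip(diffs, spatial_factors, target_total):
    """Apply spatial correction, then scale the chip to a common total.

    diffs: background-subtracted PM - MM values, one per probe pair.
    spatial_factors: per-probe-pair spatial correction factors.
    target_total: total intensity every chip is scaled to (assumption).
    Returns (normalized values, inter-chip scaling factor).
    """
    if len(diffs) != len(spatial_factors):
        raise ValueError("diffs and spatial_factors must have equal length")
    # 1) Spatial correction factor, applied probe pair by probe pair.
    corrected = [d * s for d, s in zip(diffs, spatial_factors)]
    total = sum(corrected)
    if total <= 0:
        raise ValueError("total chip intensity must be positive")
    # 2) Inter-chip scaling factor, shared by every probe pair on the chip.
    chip_scale = target_total / total
    return [c * chip_scale for c in corrected], chip_scale
```

For absolute RNA levels, the same structure would apply with the scaling factor derived from the 16S ribosomal features rather than from the chip total.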
GAPS
Correction for spatial variation in probe intensity
[Figure: two plots of Intensity against chip X and Y position for the Chip Control Grid, one for Log Phase 1 and one for Log Phase 2.]
GAPS
Transcript Abundance
Given multiple probes per transcript, what is
the best way to measure transcript abundance?
Measuring RNA abundance:
from chip intensity to concentration
Perfect match (PM)
Mismatch (MM)
Probe Set = 15 Probe Pairs targeting the same transcript
Standard Affymetrix Metric: Average Difference (AD)
Remove outliers, sum all PM - MM, divide by # of probe pairs.
Other possible metrics
Second Maximal (Smax) PM-MM
Median PM-MM
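The three metrics can be compared side by side with a short sketch. The function names are hypothetical, and the outlier rule for Average Difference is an assumption: here the single highest and lowest values are dropped, which may differ from the rule GeneChip applies.

```python
from statistics import median


def pm_mm(pm, mm):
    """Per-probe-pair differences, PM - MM."""
    return [p - m for p, m in zip(pm, mm)]


def average_difference(diffs):
    """AD: mean of PM - MM after removing outliers.

    Assumption: outliers are the single lowest and highest values.
    """
    if len(diffs) < 3:
        raise ValueError("need at least three probe pairs")
    trimmed = sorted(diffs)[1:-1]
    return sum(trimmed) / len(trimmed)


def smax(diffs):
    """Smax: the second-largest PM - MM value in the probe set."""
    if len(diffs) < 2:
        raise ValueError("need at least two probe pairs")
    return sorted(diffs)[-2]


def median_difference(diffs):
    """Median PM - MM value in the probe set."""
    return median(diffs)
```

Smax depends only on the two brightest probe pairs, so it stays high when most probes in a set respond weakly, whereas AD and the median are pulled down by them.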
Sensitivity With Different Metrics
[Figure: Signal vs. Copies / Cell (axis ticks 0.01, 0.1, 1, 10, 100) for the Smax, Median, and AD metrics.]
Genes detected:
Metric  Log  Stationary
Smax    87%  97%
Median  20%  90%
AD      18%  87%
Fold Change With Different Metrics
[Figure: Observed Fold Change (1 to 100) vs. Known Fold Change (1 to 1000) for the Smax, Median, and AD metrics, with fitted curves:]
Smax:   y = 1.11 x^0.46, R = 0.94
Median: y = 1.95 x^0.37, R = 0.94
AD:     y = 1.08 x^0.54, R = 0.95
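The fits on this slide have the power-law form y = a x^b. A fit of that form can be obtained by ordinary least squares on log-transformed values, as in the sketch below. Whether the original fits were computed this way is an assumption, and `fit_power_law` is a hypothetical name.

```python
from math import exp, log


def fit_power_law(known, observed):
    """Fit observed = a * known**b by least squares in log-log space.

    Returns (a, b). All inputs must be positive.
    """
    xs = [log(k) for k in known]
    ys = [log(o) for o in observed]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                  # slope in log-log space is the exponent
    a = exp(mean_y - b * mean_x)   # intercept gives the prefactor
    return a, b
```

An exponent below 1 means observed fold changes compress the known ones: a curve of y = 1.11 x^0.46, for instance, maps a known 100-fold change to an observed change of about 9-fold.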
GAPS
Transcript Detection
What is the best way to distinguish fluorescence
due to specific hybridization from cross-hybridization
and random noise?
Detection of Antisense and Untranslated RNAs
[Figure: probe layout for two examples.]
A) csrB: Crick Strand (csrB) and Watson Strand on the same array (csrB, untranscribed strand)
B) b1365: Expression Array (b1365, sense) and Reverse Complement Array (b1365, antisense; pos. 171, 228)
Detection of csrB and b1365
[Figure: Intensity (PM-MM) vs. Probe Pair Rank (axis 0 to 16) for csrB sense, csrB antisense, b1365 antisense, and negative controls (average of 20).]
Antisense "ORFs" Detected
[Figure: bar chart of the number of antisense "ORFs" detected vs. Rank(s) Used (1 through 10, and 4 to 8). Bar values, in extraction order: 4033, 3399, 3470, 3270, 3079, 3036, 2181, 2061, 1726, 1222, 587.]
Detection of Transcription on Both Strands
of Coding Regions
[Figure: Sense and Antisense panels, ORF Analysis (data from Dan Janse).]
GAPS
High Resolution Analysis
- Chips provide a sensitive, accurate readout
of global transcription patterns.
- Can we make use of intergenic representation
and the relatively high density of probes
throughout the genome?
Detection of Transcription Start of lpp
[Figure: Intensity (PM - MM) / Smax vs. Bases from Translation Start (-300 to 400) for Log, Stationary, and Genomic DNA samples. Marked features: Known Hairpin; Known Transcription Start (position -33); Translation Stop (237 bases).]
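A probe-by-probe profile like this one suggests a simple way to estimate where a transcript begins: divide each probe's RNA signal by its genomic DNA signal to factor out probe-specific hybridization efficiency, then look for the first sustained run of probes above a threshold. The sketch below is an assumption-laden illustration; the function names, the threshold, the run length, and the minimum genomic DNA signal are all hypothetical.

```python
def normalize_to_gdna(rna_signal, gdna_signal, min_gdna=1.0):
    """Per-probe ratio of RNA signal to genomic DNA signal.

    Probes whose genomic DNA signal is below min_gdna are treated as
    uninformative and reported as None.
    """
    ratios = []
    for rna, gdna in zip(rna_signal, gdna_signal):
        ratios.append(rna / gdna if gdna >= min_gdna else None)
    return ratios


def first_transcribed_position(positions, ratios, threshold, run=3):
    """Position of the first probe that starts `run` consecutive probes
    with ratio >= threshold, or None if no such run exists."""
    for i in range(len(ratios) - run + 1):
        window = ratios[i:i + run]
        if all(r is not None and r >= threshold for r in window):
            return positions[i]
    return None
```

Requiring a run of several probes rather than a single one makes the estimate less sensitive to isolated cross-hybridizing probes, at the cost of resolution near the true start.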
lpp Watson
[Figure: SP Sig and SP Stop Score vs. Genome Position (bp), 1754600 to 1755800, with ORF 5' and ORF 3' marked.]
ela operon (Watson)
[Figure: SP sig/gDNA sig and SP stop vs. Genome Position (bp), 2378000 to 2382500, with ORF 5' and ORF 3' marked for elaC and elaD.]
Hi-Res Analysis
[Figure: b1365 (antisense), pos. 171, 228.]
Possible Effect of Secondary Structure
on rpsO Signal
[Figure: Intensity (PM - MM) / Smax vs. Bases from Translation Start (-200 to 500) for Log, Stationary, and Genomic DNA samples. Marked features: Reported Transcription Start; Known region of secondary structure; Translation Stop (270 bases).]
Results Summary
We describe the design and use of a whole genome chip
for E. coli and introduce the GAPS package for genome
array analysis.
87 - 97% of genes detected (of 4,403 bnumbers) including
a large number of antisense “ORFs”
1,529 significant changes
Sensitivity ~= 0.1 copies/cell
Dynamic Range ~= 10^3
High Resolution (30 base overall average)
- Genes: average 60 base resolution
- Intergenic regions: average 6 base resolution
Potential use for RNA secondary structure analysis and
non-steady state studies, such as RNA degradation
Future Directions
1) Calibrate probe intensity using genomic DNA chip hybridizations
and derive absolute RNA levels.
2) Study RNA processing, alternative promoters and operon structure.
3) Map actual start and stop sites for all transcripts, identify new RNAs.
4) Apply this analysis to non steady-state experiments, e.g. RNA
degradation.
Acknowledgements
Advisor
George Church
Wet Lab Support
Kevin Cheung
Jeremy Edwards
Dan Janse
Felix Lam
Bioinformatics
Adnan Derti
Allegra Petti
Affymetrix
Eric Johanson
Rui Mei
Novartis
David Lockhart
Many members of the Church
lab for useful discussions, advice,
and encouragement
Questions, comments to: [email protected]