Double-Ended Shotgun Sequencing of PA14
Daniel G. Lee
10/30/02
Determination of PA14 Genomic Sequence and Whole-Genome Alignment with PAO1
• The complete genome of a related P. aeruginosa strain, PAO1, has been determined. The genome size is 6.2 Mb.
• PAO1 is less virulent than PA14 in almost all of our model hosts.
• PA14 contains additional DNA (sometimes large islands of DNA) not found in PAO1. Some of these additional genes may be responsible for the enhanced virulence of PA14.
• A complete PA14 genomic sequence will allow us to:
  – identify all DNA differences between PA14 and PAO1 (and later evaluate their contribution to virulence).
  – simplify the bioinformatics component of the PA14 Unigene library.
  – design a microarray (whole-genome or PA14-specific).
PA14 Sequencing - Outline
1. Sequencing workflow.
2. Finishing.
3. Annotation and whole-genome alignment.
4. Integration with PA14 insertion library.
5. Requirements for publication.
PA14 Genomic DNA Prep
1. Original PA14 RifR isolate from LGR.
2. 500 ml culture.
3. Alkaline lysis, with:
   1. CTAB ppt.
   2. 2 x Chloroform/Isoamyl Alcohol extraction.
   3. 3 x Phenol extraction.
   4. 1 x Phenol/Chloroform/Isoamyl Alcohol extraction.
   5. 1 x Chloroform/Isoamyl Alcohol extraction.
4. Isoamyl alcohol ppt.
5. Resuspended in 5 ml TE @ 1.174 mg/ml (5.87 mg total).
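As a quick arithmetic check, the total yield in step 5 is simply concentration times volume. The variable names below are illustrative only.

```python
# Total genomic DNA = concentration x resuspension volume.
volume_ml = 5.0                  # resuspension volume in TE
concentration_mg_per_ml = 1.174  # measured concentration

total_mg = volume_ml * concentration_mg_per_ml
print(f"Total genomic DNA: {total_mg:.2f} mg")
```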
Workflow for Double-Ended Shotgun Sequencing of PA14
1. PA14 genomic DNA prep.
2. Shear PA14 DNA and size fractionate.
3. Ligate PA14 fragments into vector and transform E. coli.
4. Plasmid preps of PA14 library.
5. Linear amplification of inserts using dideoxy terminators.
6. Sequencing of amplification products.
7. Contig assembly (PHRED and PHRAP).
8. Genome-wide alignment with PAO1:
   • order contigs.
   • identify gaps for sequence finishing.
   • identify differences between PA14 and PAO1.
9. Finishing and annotation.
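"Double-ended" means each cloned insert is sequenced once from each end, which is why the status slide counts two sequences per clone. The toy model below illustrates this under stated assumptions: inserts are random substrings of a known genome, insert and read lengths are hypothetical parameters, and the reverse read is reported as the reverse complement of the insert's far end. It is a sketch of the idea, not the lab's pipeline.

```python
import random

def simulate_paired_reads(genome, n_clones, insert_min, insert_max, read_len, seed=0):
    """Toy double-ended shotgun model.

    Each clone is a random fragment of `genome` whose length is drawn uniformly
    from [insert_min, insert_max]. It is read from both ends: the forward read
    is the first `read_len` bases, and the reverse read is the reverse
    complement of the last `read_len` bases (a mate pair).
    """
    rng = random.Random(seed)
    comp = str.maketrans("ACGT", "TGCA")
    pairs = []
    for _ in range(n_clones):
        insert_len = rng.randint(insert_min, insert_max)
        start = rng.randint(0, len(genome) - insert_len)
        insert = genome[start:start + insert_len]
        forward = insert[:read_len]
        reverse = insert[-read_len:].translate(comp)[::-1]
        pairs.append((start, insert_len, forward, reverse))
    return pairs
```

Keeping the insert start and length alongside each pair mirrors the extra information mate pairs give an assembler: the two reads are known to lie roughly one insert length apart.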
Plasmid Library Construction
1. Shear DNA using nitrogen (cleavage more random than sonication).
2. Fill in to produce blunt ends.
3. Size fractionate on low-melt agarose gel.
   • 1-3 kb fragments (700 bp).
   • 3-7 kb fragments.
4. Ligate.
5. Transform.
6. Pick colonies.
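Steps 1 and 3 can be pictured with a simple model, assuming breakpoints fall uniformly at random along a single genome copy (the "random cleavage" the slide aims for) and that gel excision keeps only fragments inside a chosen size window. The function name and parameters are hypothetical.

```python
import random

def shear_and_size_select(genome_len, n_breaks, min_size, max_size, seed=0):
    """Place `n_breaks` distinct breakpoints uniformly along one genome copy,
    then keep only fragments whose length lies in [min_size, max_size],
    mimicking excision of a size window from a low-melt agarose gel.
    Returns (start, end) coordinates, end exclusive, in genome order."""
    rng = random.Random(seed)
    cuts = sorted(rng.sample(range(1, genome_len), n_breaks))
    bounds = [0] + cuts + [genome_len]
    fragments = zip(bounds[:-1], bounds[1:])
    return [(s, e) for s, e in fragments if min_size <= e - s <= max_size]

# Example: ~1 Mb of sequence with 500 random breaks, keeping 1-3 kb fragments.
kept = shear_and_size_select(1_000_000, 500, 1000, 3000)
```

With uniform breakpoints, fragment lengths are roughly exponentially distributed, so only a fraction of the sheared DNA falls in any given gel window.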
Plasmid Preps
1. O/N cultures in 96-well plates.
2. Freeze cell pellets.
3. Alkaline-lysis mini-preps in 96-well plates.
   • 604/650 plates done.
4. Dry DNA pellets O/N.
5. Resuspend DNA in H2O.
6. Transfer to 384-well plate.
   • QC by agarose gel.
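Step 6 combines four 96-well plates into one 384-well plate. The slide does not say which layout was used; the sketch below assumes the common interleaved-quadrant layout, where each 96-well plate fills every other row and column of the 384-well plate.

```python
ROWS_96, COLS_96 = "ABCDEFGH", 12
ROWS_384 = "ABCDEFGHIJKLMNOP"

def well_96_to_384(quadrant, well):
    """Map well `well` (e.g. 'B7') of 96-well plate number `quadrant` (0-3)
    onto a 384-well plate, assuming an interleaved-quadrant layout:
    quadrant 0 -> odd rows/odd columns, 1 -> odd rows/even columns,
    2 -> even rows/odd columns, 3 -> even rows/even columns (1-based)."""
    if quadrant not in range(4):
        raise ValueError("quadrant must be 0-3")
    row = ROWS_96.index(well[0])      # raises ValueError for a bad row letter
    col = int(well[1:]) - 1
    if not 0 <= col < COLS_96:
        raise ValueError(f"bad column in {well!r}")
    row_384 = 2 * row + quadrant // 2
    col_384 = 2 * col + quadrant % 2
    return f"{ROWS_384[row_384]}{col_384 + 1}"
```

For example, well A1 of the four source plates lands in A1, A2, B1 and B2 respectively, so every 384-well position is used exactly once.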
Sequencing Reactions
1. Set up reaction mix:
   • Labelled ddNTPs.
   • dNTPs.
   • Buffer.
   • Taq.
   • Forward or reverse sequencing primer.
2. Aliquot rxn mix to 384-well PCR plate; freeze.
3. Add 3 µl DNA to each well (or 3 µl vector for “PCR control”).
4. “PCR” (linear amplification with dideoxy terminators).
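The principle behind the reaction mix is chain termination. The new strand grows from the primer, and wherever a labelled ddNTP is incorporated instead of a dNTP, extension stops. The idealized sketch below assumes a termination product exists at every position and ignores reaction chemistry. Sorting the products by length, as the capillary does, and reading each terminal dye recovers the synthesized sequence. Function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def synthesized_strand(template):
    """The new strand (5'->3') is complementary and antiparallel to the
    template region downstream of the primer (given 5'->3')."""
    return template.translate(COMPLEMENT)[::-1]

def terminated_fragments(synthesized):
    """Every chain-termination product as (length, terminal base): a
    dye-labelled ddNTP can end the chain at any position."""
    return [(i + 1, base) for i, base in enumerate(synthesized)]

def read_by_size(fragments):
    """Mimic capillary electrophoresis: fragments reach the detector shortest
    first, and the dye on each terminal ddNTP identifies the base."""
    return "".join(base for _, base in sorted(fragments))
```

Because the four ddNTPs carry different dyes, all four terminations can share one reaction per well, which is what allows a single mix per primer in the 384-well plate.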
DNA Sequencing
1. EtOH ppt. PCR reactions.
2. Dry.
3. Resuspend in H2O.
4. Add previously characterized PCR reactions as “sequencing controls”.
5. Run on ABI Prism sequencers (liquid-polymer capillary sequencers, 96 reactions at a time).
   • The ABI sequencer outputs electropherograms.
   • PHRED determines the identity of each base and assigns it a quality score.
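PHRED quality scores are logarithmic: a score Q corresponds to an estimated error probability P = 10^(-Q/10), so Q20 means a 1 in 100 chance that the base call is wrong and Q30 means 1 in 1,000. The helper names below are illustrative, not part of PHRED.

```python
import math

def phred_to_error_prob(q):
    """Error probability implied by a Phred quality score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred quality for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)

def expected_errors(quals):
    """Expected number of miscalled bases in a read: the sum of per-base
    error probabilities."""
    return sum(phred_to_error_prob(q) for q in quals)
```

Summing per-base error probabilities is one simple way to compare reads of different lengths and quality profiles.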
Contig Assembly
1. Electropherograms.
2. PHRED - determines base identity and quality score for each position.
3. PHRAP - aligns sequences to assemble contigs, determines consensus sequence and quality score for each position.
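To illustrate what "consensus sequence and quality score for each position" means, the sketch below takes reads already placed on common contig coordinates and calls, at each column, the base with the highest summed PHRED quality. This quality-weighted vote is a simplified stand-in chosen for illustration; PHRAP's own consensus procedure differs. The input format (offset, bases, qualities) and the "margin" score are assumptions.

```python
from collections import defaultdict

def weighted_consensus(aligned_reads):
    """aligned_reads: list of (offset, bases, quals) on shared contig
    coordinates; '-' may appear in `bases` as a gap.
    Returns (consensus, margins), where each margin is the winning base's
    summed quality minus the runner-up's, a crude per-position confidence.
    Uncovered columns are called 'N' with margin 0."""
    columns = defaultdict(lambda: defaultdict(int))
    for offset, bases, quals in aligned_reads:
        for i, (base, q) in enumerate(zip(bases, quals)):
            columns[offset + i][base] += q
    seq, margins = [], []
    for pos in range(min(columns), max(columns) + 1):
        votes = columns.get(pos)
        if not votes:
            seq.append("N")
            margins.append(0)
            continue
        ranked = sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
        best_base, best_score = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        seq.append(best_base)
        margins.append(best_score - runner_up)
    return "".join(seq), margins
```

For example, with one high-quality read and two overlapping reads that disagree at a single column, the column is called for the base with more total quality, and its margin is lower than that of columns where all reads agree.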
Contig Assembly
CCG-AATTCCGGCTTTACGACGACTTGGGGTACCA
ccg-aatt-cggctttacg
aatgccggcattacg
tt--ggc-ttacgaccctttg-ggt
t--ggc-ttacg--gactaggggtacca
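The core idea of contig assembly, joining reads whose ends overlap, can be shown with a textbook greedy merge: repeatedly join the two sequences with the longest exact suffix-prefix overlap. This is a swapped-in teaching technique, not PHRAP's algorithm. PHRAP tolerates mismatches and gaps like those in the reads above and uses base qualities, while this sketch handles exact overlaps only. Function names and the `min_len` threshold are hypothetical.

```python
def suffix_prefix_overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b`,
    or 0 if no such overlap is at least `min_len` long."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if k > 0 and a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge the pair with the longest exact overlap until no overlap of at
    least `min_len` remains; return the resulting contigs."""
    contigs = list(dict.fromkeys(reads))            # drop exact duplicates
    contigs = [r for r in contigs                   # drop reads contained in others
               if not any(r != other and r in other for other in contigs)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = suffix_prefix_overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs)
                   if n not in (best_i, best_j)] + [merged]
```

Reads with no sufficient overlap remain separate contigs, which is how gaps between contigs arise even before any alignment to PAO1.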
PA14 Sequencing: Current Status (as of 10/17/02)
1. Total amount of sequence: ~6 Mb (6.5X coverage)
   • 72,000 sequences (36,000 clones).
   • 390 (out of ~650) 96-well plates sequenced.
   • 604 plates mini-prepped.
2. Total number of “contigs”: < 2000?
   • 1 contig ~44 kb
   • 1 contig ~35 kb
   • ~10 contigs > 25 kb
   • ~12 contigs 20-25 kb
   • Most contigs are 5-10 kb.
   • Library consists of ~1 kb inserts (current plans to introduce a library of 3-6 kb inserts).
As of 10/21/02, one contig 73 kb, many > 50 kb.
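Coverage figures such as "6.5X" are conventionally computed as c = N·L/G (number of reads times mean read length divided by genome size). The idealized Lander-Waterman model then predicts that a fraction e^(-c) of the genome is not covered by any read and that the number of contigs is about N·e^(-c). The sketch below assumes random, independent read placement and ignores the minimum overlap needed to detect a join. The mean read length is not given on these slides, so any value plugged in is an assumption.

```python
import math

def coverage(n_reads, read_len, genome_len):
    """Redundancy of coverage: c = N * L / G."""
    return n_reads * read_len / genome_len

def lander_waterman(n_reads, read_len, genome_len):
    """Idealized Lander-Waterman expectations (zero required overlap):
    uncovered fraction e^-c and expected number of contigs N * e^-c."""
    c = coverage(n_reads, read_len, genome_len)
    return {
        "coverage": c,
        "uncovered_fraction": math.exp(-c),
        "expected_contigs": n_reads * math.exp(-c),
    }
```

Real projects typically exceed the predicted contig count, because repeats, cloning bias and hard-to-sequence regions violate the random-placement assumption.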
Comparisons of PAO1 and PA14
A: gaps in regions corresponding to PAO1 sequence.
B: gaps in PA14-specific regions.
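Once contigs have been aligned to PAO1, ordering them and finding gaps of type A reduces to sorting intervals on the PAO1 coordinate system. The sketch assumes each contig has a single placement given as a half-open (start, end) interval on PAO1. Parsing of real alignment output is not shown. Gaps of type B, in PA14-specific regions, have no PAO1 coordinates and cannot be found this way.

```python
def order_and_find_gaps(placements, min_gap=1):
    """placements: {contig_name: (start, end)} on the reference (PAO1),
    half-open with start < end.
    Returns (contig names sorted by start,
             list of (left_contig, right_contig, gap_length)) for reference
    stretches of at least `min_gap` bases covered by no contig."""
    ordered = sorted(placements, key=lambda name: placements[name][0])
    gaps = []
    covered_to, prev = None, None
    for name in ordered:
        start, end = placements[name]
        if covered_to is not None and start - covered_to >= min_gap:
            gaps.append((prev, name, start - covered_to))
        if covered_to is None or end > covered_to:
            covered_to, prev = end, name    # rightmost-reaching contig so far
    return ordered, gaps
```

The gap length in PAO1 coordinates gives a first estimate of the anticipated gap size, one of the considerations for choosing a finishing method.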
Tools for Genome-Wide Alignments of PAO1 and PA14
1. Software packages available from TIGR for alignments:
   • MUMmer 2.1 - aligns MUMs (maximal unique matches) for two input sequences (two 3-4 Mb genomes aligned in under 30 seconds, using less than 100 MB of memory, on a typical desktop computer running Unix/Linux).
   • NUCmer - alignments of highly similar sequences that may have large rearrangements (e.g., a group of assembly contigs vs. a complete genome).
   • PROmer - amino acid translation in all 6 frames for protein/peptide alignments. Useful for comparative genome annotation.
2. DisplayMUMs for graphical analysis of MUMmer output.
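A MUM is an exact match between the two sequences that occurs exactly once in each and cannot be extended to the left or right. MUMmer finds these efficiently with a suffix tree. The naive sketch below only illustrates the definition, compares the forward strands, and would be far too slow for whole genomes. The function names and the `min_len` default are hypothetical.

```python
def count_occurrences(text, pattern):
    """Count occurrences of `pattern` in `text`, including overlapping ones."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count

def find_mums(a, b, min_len=20):
    """Naive maximal unique matches between strings `a` and `b`.

    Returns (pos_in_a, pos_in_b, length) for exact matches of at least
    `min_len` that cannot be extended left or right and whose sequence
    occurs exactly once in each string."""
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue                      # extendable to the left
            length = 0
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1                   # extend to the right
            if length < min_len:
                continue
            match = a[i:i + length]
            if count_occurrences(a, match) == 1 and count_occurrences(b, match) == 1:
                mums.append((i, j, length))
    return mums
```

Because MUMs are unique in both genomes, they serve as reliable anchors; the stretches between consecutive anchors are where differences between the strains show up.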
Tools for Annotation of PA14
1. PROmer - amino acid translation in all 6 frames
for protein/peptide alignments. Useful for
comparative genome annotation.
2. Jonathan’s automated suite of annotation tools
(Hrp project)
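PROmer's first step is translating DNA in all six reading frames: three offsets on the given strand and three on its reverse complement. The sketch below shows that translation step only, not PROmer's alignment. It uses the standard codon assignments; bacterial translation table 11 differs only in its alternative start codons. Incomplete trailing codons are dropped and codons containing non-ACGT characters become 'X'. The frame labels are an assumed convention.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(seq):
    """Translate complete codons; '*' marks stop codons."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def six_frame_translation(dna):
    """Return {'+1','+2','+3','-1','-2','-3': protein}, where '-' frames
    are read from the reverse complement."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames
```

For "ATGGCCTAA", frame +1 gives "MA*" and frame -1 gives "LGH".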
Approaches for Finishing
1. PCR amplification and directed sequencing of gapped regions.
2. Isolation of cosmid clones spanning gaps, subcloning, and sequencing of subclones (using universal primers).
3. (Direct genomic sequencing).
4. (Altering sequencing reaction conditions for regions that are difficult to sequence through).
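For approach 1, amplifying across a gap needs a primer pair pointing into the gap from the two flanking contigs. The sketch below picks candidates a fixed distance back from the contig ends, since sequence quality at the junction is a stated concern, and reports a rough melting temperature using the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C. Primer length and offset are hypothetical defaults. Real primer design would also check uniqueness in the genome and secondary structure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def wallace_tm(primer):
    """Rule-of-thumb melting temperature in deg C: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))

def gc_fraction(primer):
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)

def gap_primers(left_contig, right_contig, length=20, offset=100):
    """Candidate primers for PCR across a gap between two ordered contigs:
    a forward primer ending `offset` bases before the 3' end of the left
    contig, and a reverse primer (reverse complement) starting `offset`
    bases into the right contig."""
    if min(len(left_contig), len(right_contig)) < offset + length:
        raise ValueError("contigs too short for the requested primer placement")
    fwd = left_contig[len(left_contig) - offset - length:len(left_contig) - offset]
    rev_site = right_contig[offset:offset + length]
    rev = rev_site.translate(COMPLEMENT)[::-1]
    return fwd, rev
```

Backing the primers away from the contig ends trades a slightly longer product for primer sites in better-quality sequence.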
Finishing
Methods:
• PCR.
• Cosmids.
• (Directed sequencing).
• (Altered rxn. conditions).
Considerations:
• Type of gap.
• Anticipated size of gap.
• Quality/nature of sequence at junction.
Integration with PA14 Unigene Library
Subject for BLAST searches:
• PAO1
• PA14 contigs
• Finished PA14 sequence (annotated)
Verify PA14 sequences (close gaps, improve sequence quality).
Assign insert coordinates.
Assign identity of disrupted ORF.
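Once insertion sites from the PA14 insertion library have coordinates on the annotated sequence, assigning the disrupted ORF is an interval lookup. The sketch assumes ORFs are given as 1-based inclusive (start, end, name) tuples, sorted by start and non-overlapping. Real annotations include overlapping ORFs, which would require returning all hits. ORF names in any example are hypothetical.

```python
import bisect

def find_disrupted_orf(insert_pos, orfs):
    """orfs: list of (start, end, name), 1-based inclusive, sorted by start
    and assumed non-overlapping. Returns the name of the ORF containing
    `insert_pos`, or None if the insertion falls between ORFs."""
    starts = [start for start, _, _ in orfs]
    idx = bisect.bisect_right(starts, insert_pos) - 1   # last ORF starting at or before the site
    if idx >= 0:
        start, end, name = orfs[idx]
        if start <= insert_pos <= end:
            return name
    return None
```

Binary search keeps the lookup fast even when thousands of insertion sites are mapped against several thousand annotated ORFs.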
Requirements for Publication
1. “Finished” PA14 sequence.
   a) Sufficient quality.
   b) No gaps?
2. Annotation.
3. Comparison to PAO1.
4. What else?
a) Virulence data?
b) Proteomics?
c) Others?
ACKNOWLEDGEMENTS
MGH:
N. Liberati
S. Miyata
J. Urbach
F. Ausubel
X. He
M. Saucier
L. Rahme
Harvard Partners Genome Center:
K. Montgomery
G. Grills
L. Li
W. Brown
J. Decker
R. Elliot
L. Gendal
K. Osborn
A. Parerra
C. Xi
P. Juels
R. Kucherlapati