Finishing & Sequence Features Helen Beasley
Solanum lycopersicum
Chromosome 4
Mapping and Finishing Update
Wellcome Trust Medical Photographic Library
SRC-UK and
Wellcome Trust Sanger Institute
SOL Korea – September 2007
Tomato Physical Map
BACs are selected for sequencing on chromosome 4 using the physical
map assembled in fpc.
The map has been assembled using fingerprinted clones from 2 BAC
libraries. Extending and gap filling clones are identified using end
sequences. Clones are fingerprinted, entered in fpc and overlaps
checked before being selected for sequencing.
Tomato BAC libraries
Library    No. of clones   Average insert   Genome equivalents   Fingerprints
LE_HBa     129,024         117 kb           15X                  88,000 (AGI)
SL_MboI    52,992          135 kb           7X                   43,000 (WTSI)
SL_EcoI    72,264          95-100 kb        7X                   -
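The genome-equivalents column can be roughly reproduced from the clone counts and insert sizes. The sketch below is illustrative only: the genome size of about 950 Mb is an assumption (the slides do not state the value used), and the SL_EcoI insert is taken as the midpoint of the quoted range.

```python
# Genome equivalents = (number of clones x average insert size) / genome size.
# ASSUMPTION: a tomato genome of ~950 Mb; the slides do not give this figure.
GENOME_SIZE_KB = 950_000

# (number of clones, average insert in kb), taken from the library table.
libraries = {
    "LE_HBa": (129_024, 117.0),
    "SL_MboI": (52_992, 135.0),
    "SL_EcoI": (72_264, 97.5),  # midpoint of the quoted 95-100 kb range
}


def genome_equivalents(n_clones, insert_kb, genome_kb=GENOME_SIZE_KB):
    """Return the fold coverage of the genome provided by one library."""
    return n_clones * insert_kb / genome_kb


coverage = {name: genome_equivalents(n, kb) for name, (n, kb) in libraries.items()}
for name, fold in coverage.items():
    print(f"{name}: {fold:.1f}X")
```

Under that assumed genome size the results fall close to the quoted 15X, 7X and 7X; a slightly larger assumed genome would bring them closer still.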
Map Coverage – Chromosome 4
Chromosome 4 is represented by 45 FPC contigs that cover
approximately 22.2Mb, estimated from fingerprints (5 bands/kb). 40
clones have been selected to extend the original contigs based on clone
end sequence matches.
All contigs are anchored to the chromosome by SGN chromosome 4
markers.
FISH (H. de Jong, Wageningen) has confirmed the placement of some
contigs on chromosome 4, but may refute the placement of at least 7 contigs.
Confirmation of chromosome 4 contigs is a high priority.
Of the 907 SGN chromosome 4 markers, 142 are missing from the
current fpc build. Overgo probes are being used to screen the
BAC libraries and may identify ~47 additional clones.
The Syngenta marker data will also be used to identify additional
BACs.
FISH Data
Confirmation of chromosome location
Verification of contig and marker placement
Assessment of heterochromatin & euchromatin distribution
This image demonstrates:
- LE_HBa114C15 on short arm
- LE_HBa308B7 on heterochromatin/centromere border
- LE_HBa20F17 on long arm
FISH performed by S. B. Chang at Prof S. Stack’s Laboratory, University of Colorado, USA.
Chromosome 4 –
Distribution of contigs
[Figure: chromosome 4 with mapped markers and FISH-confirmed contigs. Contigs shown: ctg503, ctg5014, ctg5716, ctg5252, ctg15, ctg1189, ctg1406, ctg916, ctg5711, ctg1795]
Clones for sequencing have been selected from seed contigs along the
length of the chromosome, including contigs from putative
heterochromatic regions, to try to assess the boundary domains.
Distribution of Chromosome 4 Contigs
[Figure: chromosome 4 with mapped markers and FISH-confirmed contigs; euchromatin, heterochromatin and the centromere are marked]
Chr4 mapped markers: TG485, T0635, T0954, T1322, CT_At5g37360, T1068, TG287, P74, P41, TG163
Contigs: ctg503, ctg5014, ctg5252, ctg15, ctg1189, ctg1406, ctg5716, ctg916, ctg5711, ctg1795
Analysed BAC   Number of gene models
bTH8H22        4
bTH36C23       2
bTH50I18       3
bTH114C15      2
bTH308B7       0
bTH198L24      0
bTH31H5        1
bTH132O11      3
bTH53M2        5
bTH59M16       7
This shows that clones for sequencing have been selected from seed
contigs along the length of the chromosome. Ten contigs shown are from
the current 45 fpc contigs on chr4 - including those selected from
putative heterochromatic regions to try to assess the boundary domains.
The number of gene models obtained from the gene
prediction training set
Sequence Plot of ctg916 (euchromatin)
Sequence Plot of ctg5711 (euchromatin)
Sequence Plot of ctg15 (heterochromatic/euchromatic boundary region)
Same plot as before, with greyscale adjusted to view repeat features
Sequence Plot of ctg5014 (near centromere)
Same plot as before, with greyscale adjusted to view repeat features
TPF File
Tile Path Format file – tab delimited flat file
GAP        type-3           ?
?          LE_HBa-24G5      ctg145
CT990489   LE_HBa-20F17     ctg145
GAP        type-3           ?
CT990488   LE_HBa-114C15    ctg5716
?          SL_MboI-143K21   ctg5716
GAP        type-3           ?
?          LE_HBa-147F16    ctg5014
CT990558   LE_HBa-308B7     ctg5014
GAP        type-3           ?
CT990624   LE_HBa-27G19     ctg15
CT476825   LE_HBa-198L24    ctg15
CT573298   LE_HBa-119A16    ctg15
CT485992   LE_HBa-31H5      ctg15
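A tile path of this shape is easy to summarise programmatically. The sketch below assumes three tab-delimited columns (accession, clone name, map contig), gap rows beginning with GAP, and "?" marking a clone with no accession yet. The sample rows follow one reading of the slide's columns, and the function name is hypothetical.

```python
from collections import defaultdict

# Sample rows; the pairing of accession and clone name is a reconstruction.
TPF_TEXT = "\n".join([
    "GAP\ttype-3\t?",
    "?\tLE_HBa-147F16\tctg5014",
    "CT990558\tLE_HBa-308B7\tctg5014",
    "GAP\ttype-3\t?",
    "CT990624\tLE_HBa-27G19\tctg15",
    "CT476825\tLE_HBa-198L24\tctg15",
])


def summarise_tpf(text):
    """Count gap rows, and accessioned versus pending clones per map contig."""
    gaps = 0
    per_contig = defaultdict(lambda: {"accessioned": 0, "pending": 0})
    for line in text.splitlines():
        if not line.strip():
            continue
        accession, _clone_name, contig = line.split("\t")[:3]
        if accession == "GAP":
            gaps += 1
        elif accession == "?":
            per_contig[contig]["pending"] += 1
        else:
            per_contig[contig]["accessioned"] += 1
    return gaps, dict(per_contig)


gaps, summary = summarise_tpf(TPF_TEXT)
print(gaps, summary)
```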
AGP File
Accessioned Golden Path – tab delimited flat file
Order and alignment of Phase 3 finished accessions
chr4   1        50000    1    N   50000        clone    no
chr4   50001    100000   2    N   50000        clone    no
chr4   100001   150000   3    N   50000        contig   no
chr4   150001   200000   4    N   50000        clone    no
chr4   200001   360432   5    F   CT476825.1   1        160432   +
chr4   360433   370113   6    F   CT573298.1   2001     11681    +
chr4   370114   532277   7    F   CT485992.1   2001     164164   +
chr4   532278   582277   8    N   50000        contig   no
chr4   582278   632277   9    N   50000        clone    no
chr4   632278   682277   10   N   50000        contig   no
Gaps and unfinished clones are entered as 50,000bp sections to
more accurately represent the chromosome in each build.
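The internal consistency of AGP rows like these can be checked mechanically: each part should start one base after the previous part ends, a gap row's length should equal its span on the chromosome, and a finished component's span should equal its span on the chromosome. The sketch below is a minimal check under those assumptions, using five consecutive rows from the slide; the function name is hypothetical.

```python
# Five consecutive rows (parts 4-8) from the chromosome 4 AGP on the slide.
AGP_TEXT = "\n".join("\t".join(row) for row in [
    ("chr4", "150001", "200000", "4", "N", "50000", "clone", "no"),
    ("chr4", "200001", "360432", "5", "F", "CT476825.1", "1", "160432", "+"),
    ("chr4", "360433", "370113", "6", "F", "CT573298.1", "2001", "11681", "+"),
    ("chr4", "370114", "532277", "7", "F", "CT485992.1", "2001", "164164", "+"),
    ("chr4", "532278", "582277", "8", "N", "50000", "contig", "no"),
])


def check_agp(text):
    """Return (list of problems, total bases covered by component rows)."""
    problems = []
    component_bp = 0
    prev_end = None
    for line in text.splitlines():
        if not line.strip():
            continue
        f = line.split("\t")
        start, end, part, kind = int(f[1]), int(f[2]), f[3], f[4]
        span = end - start + 1
        if prev_end is not None and start != prev_end + 1:
            problems.append(f"part {part}: does not follow previous part")
        if kind == "N":
            # Gap row: column 6 holds the gap length.
            if int(f[5]) != span:
                problems.append(f"part {part}: gap length differs from span")
        else:
            # Component row: columns 7 and 8 hold the component start and end.
            if int(f[7]) - int(f[6]) + 1 != span:
                problems.append(f"part {part}: component span differs")
            component_bp += span
        prev_end = end
    return problems, component_bp


problems, component_bp = check_agp(AGP_TEXT)
print(problems, component_bp)
```

For these rows the three finished accessions account for 332,277 bp of the path, and the two flanking gaps are the 50,000 bp placeholders.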
AGP View on SGN
PseudoGoldenPath analysis for Contig
Extension and Gap Closure
A PGP viewer is being developed to visualise sequence alignments and
contig positioning
Contains finished and unfinished sequence
Unfinished clones are represented as sequence contigs
Unmasked BES are aligned to the PGP sequence using ssaha2
Example parameters: minimum percentage identity of 95%, and at least 60% of the
end sequence aligned
Map gaps are assigned an arbitrary 5kb size
Clone candidates for contig extension checked with BLAST and fingerprinted
Aim to incorporate other data such as markers
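The alignment thresholds quoted above can be applied as a simple post-filter on alignment records. The sketch below is an illustration only: the record fields, the sample reads, and the function name are hypothetical, and it does not reproduce ssaha2's own options or output format.

```python
from dataclasses import dataclass

MIN_IDENTITY = 95.0   # minimum percentage identity, from the slide
MIN_FRACTION = 0.60   # minimum fraction of the end sequence aligned, from the slide


@dataclass
class BesHit:
    """One BAC end sequence alignment (hypothetical record layout)."""
    read_name: str
    read_length: int
    aligned_bases: int
    percent_identity: float
    target: str


def passes_thresholds(hit):
    """True if the hit meets both the identity and the coverage cut-off."""
    fraction_aligned = hit.aligned_bases / hit.read_length
    return hit.percent_identity >= MIN_IDENTITY and fraction_aligned >= MIN_FRACTION


# Made-up example hits, for illustration only.
hits = [
    BesHit("end_1", 700, 650, 98.5, "seq_A"),   # passes both cut-offs
    BesHit("end_2", 700, 300, 99.0, "seq_A"),   # too little of the read aligned
    BesHit("end_3", 700, 690, 91.0, "seq_B"),   # identity too low
]
kept = [h.read_name for h in hits if passes_thresholds(h)]
print(kept)
```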
Closing the Map using PGP
[Figure: bridging clones, identified from BES alignments to sequence, spanning a map gap between sequenced clones]
53 clone extensions have been identified, including 5 merges with previously
unplaced contigs. Two merges of chromosome 4 contigs have also been made.
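The distinction between bridging and extending clones can be sketched from paired end-sequence placements: a clone whose two ends align to different sequences is a candidate bridge across a gap, while a clone with only one end placed is a candidate extender. The clone and sequence names below are made up, and the function is hypothetical. Real selection would also consider alignment position and orientation, followed by the BLAST and fingerprint checks mentioned earlier.

```python
from collections import defaultdict

# (clone, which end, sequence hit); made-up placements for illustration.
placements = [
    ("clone_1", "fwd", "seq_A"),
    ("clone_1", "rev", "seq_B"),   # ends on different sequences: bridge
    ("clone_2", "fwd", "seq_A"),   # only one end placed: extender
    ("clone_3", "fwd", "seq_C"),
    ("clone_3", "rev", "seq_C"),   # both ends on one sequence: internal
]


def classify_clones(placements):
    """Split clones into bridge candidates and extension candidates."""
    ends_by_clone = defaultdict(dict)
    for clone, end, target in placements:
        ends_by_clone[clone][end] = target
    bridges, extenders = {}, {}
    for clone, ends in ends_by_clone.items():
        targets = set(ends.values())
        if len(ends) == 2 and len(targets) == 2:
            bridges[clone] = tuple(sorted(targets))
        elif len(ends) == 1:
            extenders[clone] = next(iter(targets))
    return bridges, extenders


bridges, extenders = classify_clones(placements)
print(bridges, extenders)
```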
Extender from Fosmid Library
Fosmid end sequences deposited by Cornell have been aligned
to chromosome 4 sequence.
[Figure label: Potential Extender]
A copy of the fosmid library has been received at WTSI;
~50,000 clones will be end sequenced by December and the
sequences deposited in the Ensembl / NCBI Trace repositories.
WTSI Tomato Clone Pipeline
Pipeline Stage      Number of BACs
Subcloning          34
Shotgun             21
Assembly Start      7
Auto-prefinishing   3
Finishing           11
QC Checking         4
Finished            63
Total               143
HTGS phases marked alongside the pipeline stages: Phase 1, Phase 2, Phase 3 (finished)
Chromosome 4
Sequence Generated
Total Sequence Available
10,666,227 bp
Total Unique Sequence
10,633,995 bp
Total amount of Finished Sequence = 7,543,322 bp
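A few derived figures follow directly from these totals. The short calculation below uses only numbers quoted in this presentation; comparing unique sequence against the 22.2 Mb fingerprint-based map estimate is an added illustration, not a figure from the slides.

```python
total_bp = 10_666_227         # total sequence available
unique_bp = 10_633_995        # total unique sequence
finished_bp = 7_543_322       # finished sequence
map_estimate_bp = 22_200_000  # FPC contig coverage estimated from fingerprints

# Sequence counted more than once, e.g. in overlaps between clones.
redundant_bp = total_bp - unique_bp
# Share of the available sequence that is finished.
finished_fraction = finished_bp / total_bp
# Unique sequence relative to the estimated size of the mapped contigs.
map_fraction = unique_bp / map_estimate_bp

print(f"Redundant sequence: {redundant_bp:,} bp")
print(f"Finished: {finished_fraction:.1%} of available sequence")
print(f"Unique sequence: {map_fraction:.1%} of the map estimate")
```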
Summary of Progress on Chromosome 4
45 map contigs have been built on chromosome 4
Clone end sequence alignments visualised with the PGP viewer are being
used to extend contigs and close gaps
~100,000 fosmid end sequences will be generated by end 2007
10.6Mb of sequence has been generated, of which 7.5Mb is finished
All sequence assemblies >2kb are deposited in HTGS divisions of
EMBL/GenBank/DDBJ
Acknowledgements
Wellcome Trust Sanger Institute:
Jane Rogers
Sean Humphray
Clare Riddle and Mapping Core Group
Karen McLaren and Finishing Team 46
Stuart McLaren and Pre-finishing Team 58
Christine Lloyd and QC Team 57
Karen Oliver
Matt Jones
Carol Scott
Imperial College London:
Gerard Bishop
Daniel Buchan
James Abbott
Sarah Butcher
University of Nottingham:
Graham Seymour
Scottish Crop Research Institute:
Glenn Bryan
FUNDING
Cornell University:
Lukas Mueller
Jim Giovannoni
MIPS/IBI Institute for Bioinformatics:
Klaus Mayer
Remy Bruggmann
FISH Resources
Stephen Stack Group (Colorado)
Hans de Jong (Wageningen)