ppt - Sol Genomics Network

The US Contribution to the International Tomato Genome Sequencing Project
Overview of Presentation
Background on the International Solanaceae Initiative (SOL) and the International Tomato Genome Sequencing Project
Sequencing strategy
Resources available
SOL Genomics Network (sgn.cornell.edu)
Details about resources
Informatics pipelines
Educational outreach activities
An International Workshop to Discuss
Sequencing of the Tomato Genome: Feasibility, Benefits and Strategy
November 3, 2003, Washington, D.C.
* funded in part by the National Science Foundation
On November 3, 2003, an international meeting was held in Washington, D.C., attended
by 70 scientists from 11 countries. The outcome was a 10-year vision for research in
the family Solanaceae, referred to as “The International Solanaceae Genome Project”,
or SOL. SOL, which includes sequencing the tomato genome, will create a worldwide
research and informational infrastructure in which a systems biology approach can be
taken to address key questions in biology and agriculture for which the Solanaceae
are ideally suited.
For details, see: http://sgn.cornell.edu/solanaceae-project/
The SOL Vision
[Figure. Labels: Tomato reference genome sequence; Potato; Eggplant; Petunia; Coffee*; Pepper; Nicotiana; Arabidopsis and other genomes. Themes: Understanding Diversification & Adaptation; Exploring the Role of Natural Diversity in the Genetic Improvement of Crops.]
* Coffee is closely related to the Solanaceae and has a similar genome size and chromosome karyotype. A comparative map of coffee with solanaceous species is part of the SOL project.
Objectives of Tomato Sequencing Project
Produce a contiguous sequence of the gene rich, euchromatic arms of each of the 12 tomato chromosomes.
 Groups from 10 countries are partners in the project.
 Our group is sequencing 3 of the chromosomes; the remaining 9 are each being sequenced by a group in a different country.
Process and annotate this sequence in a manner consistent and compatible with similar data from Arabidopsis, rice and other plant species.
Create an international bioinformatics portal for comparative Solanaceae genomics which can store, process, and make available to the public the sequence data and derived information from this project and associated genomics activities in other solanaceous plants.
Tomato Euchromatin Gene Space Sequencing Strategy
The tomato genome contains approximately 950 Mb of DNA, of which 23% is euchromatin.
 Peterson et al., 1996, Genome 39:77-82
The majority of tomato genes reside in the euchromatin.
 Approximately 85% of the tomato genes
 Gene rich and repeat poor
 Supported by BAC (Bacterial Artificial Chromosome) sequence data available from BACs isolated on the basis of target genes
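The figures on this slide imply the size of the sequencing target. The short sketch below simply multiplies them out; the constant and function names are illustrative and not part of the project's software.

```python
# Figures quoted on the slide (Peterson et al., 1996).
GENOME_SIZE_MB = 950.0        # approximate tomato genome size
EUCHROMATIN_FRACTION = 0.23   # share of the genome that is euchromatin
CHROMOSOMES = 12              # number of tomato chromosomes


def euchromatin_size_mb(genome_mb: float, fraction: float) -> float:
    """Return the size of the euchromatic portion of the genome in Mb."""
    return genome_mb * fraction


target_mb = euchromatin_size_mb(GENOME_SIZE_MB, EUCHROMATIN_FRACTION)
heterochromatin_mb = GENOME_SIZE_MB - target_mb
# Simple average; real chromosomes differ in how much euchromatin they carry.
mean_per_chromosome_mb = target_mb / CHROMOSOMES

print(f"Euchromatin to sequence: about {target_mb:.1f} Mb")
print(f"Heterochromatin not targeted: about {heterochromatin_mb:.1f} Mb")
print(f"Average euchromatin per chromosome: about {mean_per_chromosome_mb:.1f} Mb")
```

In other words, the project targets roughly 220 Mb of sequence rather than the full 950 Mb.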
Organization of tomato genome & impact on sequencing strategy
[Figure, panels A-C. Labels: telomere structure (7 bp telomeric repeat, 162 bp subtelomeric repeat); telomere, euchromatin, pericentric heterochromatin, centromere; BAC hybridization; BAC hybridization in euchromatin.]
US Project
Initiated in September 2004
Chromosomes 1, 10, 11
Funding from NSF Plant Genome Research Program
DNA sequencing is sub-contracted to a high-capacity sequencer
Distribution of materials to sequencing partners
Coordination of international efforts
Bioinformatics portal
 SOL Genomics Network (SGN)
 sgn.cornell.edu
Jim Giovannoni
PI, BTI
• Overall operation of project
• Interactions among co-PIs
• Generation of BAC libraries
• Clone distribution to international project members
• Clone handling & storage
• Computational analysis of regulatory domains
Steve Tanksley
Co-PI, Cornell
• Selection of seed BACs and extension BACs for sequencing
• Overgo anchoring of genetic markers
• Genetic mapping of BACs
• Comparative mapping
Lukas Mueller
Co-PI, Cornell
• Bioinformatics
• Interaction with sequencing center
• BAC assembly
• Annotation
• Data integration with other countries
• Training
Stephen Stack
Co-PI, CSU
• Distal/proximal BAC anchoring
• FISH for gap estimates
• Heterochromatin BAC identification of sequencing
• International coordination for in situ research
Joyce Van Eck
Co-PI, BTI
• Day-to-day coordination/operations of project
• Planning and running teleconferencing of co-PIs
• Assist in preparing annual reports and conference presentations
• Educational outreach activities
Outline of Approach
Sequencing is following a BAC-by-BAC strategy.
Starting point for sequencing is approximately 1000 “seed” BACs individually anchored to a high density genetic map.
Each sequenced anchor BAC serves as a seed from which to radiate out into the minimum tiling path.
Especially interested in BACs located as close as possible to telomeres and euchromatin/heterochromatin borders.
Fluorescence In Situ Hybridization (FISH) is being utilized for BAC localization.
 To steer sequencing activities into the euchromatin and away from the heterochromatin
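The idea of radiating out from a seed BAC into a minimum tiling path can be illustrated with a small greedy sketch. It assumes every clone already has start and end coordinates on a contig, which is a simplification: in the project, overlaps are inferred from BAC end sequences and FPC fingerprints. The `Bac` type, the coordinates and the function name are hypothetical and not taken from the project's pipeline.

```python
from typing import List, NamedTuple


class Bac(NamedTuple):
    name: str
    start: int  # hypothetical contig coordinate, in kb
    end: int


def extend_right(seed: Bac, candidates: List[Bac], target_end: int) -> List[Bac]:
    """Greedy tiling to the right of a seed BAC.

    At each step, pick the clone that overlaps the current path and
    reaches furthest right. Stop when target_end is covered or when no
    overlapping clone extends the path (a gap).
    """
    path = [seed]
    reach = seed.end
    while reach < target_end:
        overlapping = [b for b in candidates if b.start <= reach and b.end > reach]
        if not overlapping:
            break  # gap: nothing overlaps and extends the current path
        best = max(overlapping, key=lambda b: b.end)
        path.append(best)
        reach = best.end
    return path


seed = Bac("seed", 0, 110)
library = [
    Bac("A", 60, 170),
    Bac("B", 100, 215),
    Bac("C", 150, 260),
    Bac("D", 210, 330),
    Bac("E", 300, 410),
]
tiling = extend_right(seed, library, target_end=400)
print([b.name for b in tiling])
```

Choosing the furthest-reaching overlapping clone at each step is the standard greedy solution to interval covering, so it uses the fewest clones for the region it spans. Extension toward the other side of the seed would mirror the same logic.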
Resources Available
High density genetic map
 Solanum lycopersicum x S. pennellii F2 population (Tanksley et al. 1992, Genetics 132:1141-1160)
 Various types of molecular markers
Overgo probes (overlapping oligonucleotide probes)
 Developed from genetic markers
 Integrate the genetic with the physical map
Physical map
 Fingerprint Contigs (FPC)
 Assemble the BAC collection into contigs
 Rod Wing and Wellcome Trust Sanger Institute
Seed BACs (Bacterial Artificial Chromosomes)
BAC libraries and corresponding hybridization filters
BAC end sequences (~400,000)
 Accounts for 20% of the genome sequence
FISH (Fluorescence In Situ Hybridization)
Future Resource
Fosmid Library
 Use for filling small gap intervals
 Made from sheared genomic DNA
 Average insert size of 40 kb (~12x physical coverage)
 End-sequence 400,000 clones
Selection and Verification of Seed BACs
Selection
 Choose two seed BACs (>100 kb) that are well within the euchromatic region
 Only one needs to be confirmed to move ahead
Verification (at least one method should be chosen)
 Verify marker-BAC association by sequencing with marker-specific primers
 Rehybridizing BAC clones using overgo probes
 PCR amplification of genetic markers from the BAC clones
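The selection and verification rules above amount to a simple decision: a candidate counts as confirmed if it is larger than 100 kb and passes at least one of the three verification methods, and work can move ahead once one of the two candidates is confirmed. A minimal sketch of that logic follows; the class, the method labels and the clone names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set

MIN_INSERT_KB = 100  # slide: seed BACs should be larger than 100 kb

# Hypothetical labels for the three verification methods on the slide.
VERIFICATION_METHODS = {
    "marker_primer_sequencing",
    "overgo_rehybridization",
    "marker_pcr",
}


@dataclass
class SeedBacCandidate:
    name: str
    insert_kb: float
    passed_methods: Set[str] = field(default_factory=set)

    def is_confirmed(self) -> bool:
        """Large enough and verified by at least one recognised method."""
        big_enough = self.insert_kb > MIN_INSERT_KB
        verified = bool(self.passed_methods & VERIFICATION_METHODS)
        return big_enough and verified


def can_move_ahead(candidates: Iterable[SeedBacCandidate]) -> bool:
    """Only one of the chosen seed BACs needs to be confirmed."""
    return any(c.is_confirmed() for c in candidates)


pair = [
    SeedBacCandidate("clone_1", 120, {"marker_pcr"}),
    SeedBacCandidate("clone_2", 135),  # not yet verified
]
print(can_move_ahead(pair))
```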
Methods To Verify Locations of Seed BACs
Map BACs in tomato Introgression Lines (ILs)
 CAPS markers
Fluorescence In Situ Hybridization (FISH)
 Steve Stack’s lab, Colorado State University
 US and countries not set up to do FISH
 Countries doing FISH
 China
 The Netherlands
 France sent a participant to Stack lab to learn FISH.
FISH Image
BAC Libraries
DNA from Heinz 1706

Library (enzyme)   Total # of clones   Cloning vector   Average insert size (kb)   # of BAC end sequences
HindIII            129,024             pBeloBAC11       117                        188,130
MboI               50,688              pECBAC1          135                        112,507
EcoRI              75,000              pIndigoBAC-5     95 - 100                   101,375
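The table figures can be combined with the roughly 950 Mb genome size quoted earlier to estimate the physical coverage of each library (clones x average insert size / genome size) and the total number of BAC end sequences. This is a back-of-the-envelope sketch, not a project tool; the 97.5 kb value for the EcoRI library is an assumed midpoint of the stated 95 - 100 kb range.

```python
GENOME_SIZE_KB = 950_000  # ~950 Mb, as quoted earlier in the presentation

# (library, total clones, average insert size in kb, BAC end sequences)
LIBRARIES = [
    ("HindIII", 129_024, 117.0, 188_130),
    ("MboI", 50_688, 135.0, 112_507),
    ("EcoRI", 75_000, 97.5, 101_375),  # 97.5 kb: assumed midpoint of 95 - 100
]


def physical_coverage(clones: int, insert_kb: float,
                      genome_kb: float = GENOME_SIZE_KB) -> float:
    """Fold coverage of the genome by the cloned inserts."""
    return clones * insert_kb / genome_kb


for name, clones, insert_kb, _ends in LIBRARIES:
    print(f"{name}: about {physical_coverage(clones, insert_kb):.1f}x coverage")

total_coverage = sum(physical_coverage(c, i) for _, c, i, _ in LIBRARIES)
total_end_sequences = sum(ends for *_, ends in LIBRARIES)
print(f"All libraries: about {total_coverage:.1f}x coverage")
print(f"BAC end sequences in total: {total_end_sequences:,}")
```

The end-sequence total from the three libraries agrees with the figure of about 400,000 BAC end sequences listed under the available resources.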
Seed BACs for each chromosome are distributed to the country sequencing that chromosome.
[Figure. Labels: pachytene chromosome with euchromatin regions; FISH; seed BAC; genetic map; genetic markers anchored via overgo hybridizations.]
Seed BACs (solid) are anchored to the genetic map and to pachytene chromosomes via FISH; bridging clones (dashed) in the minimum tiling path (MTP) are identified through a combination of the BAC end sequence database and FPC.
Informatics Pipelines
SOL Genomics Network (sgn.cornell.edu)
BAC registry database
 Project members can log in to upload BAC information.
Project-wide Sequencing Quality Control (QC)
 Implemented various global QC checks
Functional and Structural Annotation
 The International Tomato Annotation Group (ITAG)
 Formed at a meeting in Ghent, Belgium (October 2006)
 Established an annotation protocol for the tomato genome.
Summary of Tomato Genome Annotation Pipeline
Repeat Content in Annotated BACs on Chromosome 1
GBrowse on SGN
Euchromatic BAC
Heterochromatic BAC
Tomato FISH Map on SGN
-Represents FISH analyses done at several labs involved in the project
[Figure legend: one color indicates euchromatin; another indicates heterochromatin, with darker blue at the ends representing the telomeres; FISH localized BACs are marked.]
Outreach
Bioinformatics Summer Internship
 SOL Genomics Network
 Undergraduates and high school students
 Each student has her/his own project
 Housing provided
The Solanaceae Family goes to School
 Geared towards kindergarten - 5th grade
 Elementary schools
 Afterschool programs
Others
 Presentations to various groups
 High school teacher workshop
2005 and 2006 Bioinformatics Summer Interns
The Solanaceae Family goes to School
SOL Newsletter
-bimonthly
-sent by e-mail
-list of ~400 members worldwide
-also posted as pdf on SGN (sgn.cornell.edu)
-Send e-mail to [email protected] to be added to list
Acknowledgements
Boyce Thompson Institute
 Julia Vrebalov
 Ruth White
Colorado State University
 Lorinda Anderson
 Suzanne Rogers
 Song-Bin Chang
Cornell University
 Yimin Xu
 Nancy Eanetta
 Rob Buels
 Beth Skwarecki
 Marty Kreuter
 Naama Menda
 John Binns
 Chenwei Lin
SeqWright
Agencourt Bioscience
SymBio
Funding
 NSF Plant Genome Research Program