PPT - NC BioGrid


Center for Integrated Fungal Research
Fungal Genomics Laboratory
Industrial applications
• glutamic acid
• citric acid
• amylases
• proteases
• lipases
Bioterrorism
Biologically interesting and genetically tractable
Insight into eukaryotic gene regulation and development
Framework of rice blast genome
Deep (25X), large-insert (130 kb), single-enzyme (HindIII) BAC library from rice-infecting strain 70-15: 9,216 clones
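As a rough check on the library numbers, the sketch below relates clone count, mean insert size, and fold coverage. The function names are illustrative, and treating every clone as exactly 130 kb is a simplifying assumption. The genome size it prints is only what the stated 25X figure implies, not a measured value.

```python
def library_coverage(n_clones, mean_insert_bp, genome_size_bp):
    """Fold coverage = total cloned bases / genome size."""
    return n_clones * mean_insert_bp / genome_size_bp


def implied_genome_size(n_clones, mean_insert_bp, coverage):
    """Genome size (bp) implied by a stated fold coverage."""
    return n_clones * mean_insert_bp / coverage


# Figures taken from the slide; the uniform insert size is an assumption.
N_CLONES = 9216
MEAN_INSERT_BP = 130_000
STATED_COVERAGE = 25

if __name__ == "__main__":
    total_bp = N_CLONES * MEAN_INSERT_BP
    size_bp = implied_genome_size(N_CLONES, MEAN_INSERT_BP, STATED_COVERAGE)
    print(f"Total cloned DNA: {total_bp / 1e6:.0f} Mb")
    print(f"Implied genome size: {size_bp / 1e6:.1f} Mb")
```

The same two functions can be turned around to ask how many clones a different target coverage would need.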
A. BAC-end sequence provides “Sequence Tag Connectors”
B. BAC fingerprints used to create contigs
C. BAC contigs anchored to genetic map
[Figure: BAC 1 to BAC 6 with forward and reverse end reads (1f to 6f, 1r to 6r), assembled into contigs and anchored to markers RFLP 1 to RFLP 3]
STC: ~500 bp sequence every 3-4 kb across genome
USDA-IFAFS project, Oct 2000
“Gene discovery in the rice blast fungus: ESTs and sequence of chromosome 7”
1. Generate ~5X draft sequence of chromosome 7 (4.2 Mb).
2. Generate 35,000 ESTs and create a set of ~5,000 ESTs representing unique genes.
3. Provide basic sequence analysis and integration of data into physical map of chromosome 7.
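To give a feel for what a ~5X draft of a 4.2 Mb chromosome involves, here is a minimal sketch of the read-count arithmetic. The 500 bp read length is an assumption chosen for illustration, not a figure from the project, and the coverage estimate uses the idealized Lander-Waterman model (uniformly random reads), which real shotgun data only approximates.

```python
import math


def reads_needed(coverage, target_bp, read_len_bp):
    """Number of reads so that total read bases = coverage * target size."""
    return math.ceil(coverage * target_bp / read_len_bp)


def expected_fraction_covered(coverage):
    """Lander-Waterman estimate of the fraction of bases hit by >= 1 read."""
    return 1.0 - math.exp(-coverage)


# Coverage and target size come from the slide.
TARGET_BP = 4_200_000
COVERAGE = 5
ASSUMED_READ_LEN_BP = 500  # hypothetical read length

if __name__ == "__main__":
    n_reads = reads_needed(COVERAGE, TARGET_BP, ASSUMED_READ_LEN_BP)
    covered = expected_fraction_covered(COVERAGE)
    print(f"Reads needed: {n_reads}")
    print(f"Expected fraction of bases covered: {covered:.4f}")
```

Changing `ASSUMED_READ_LEN_BP` or `COVERAGE` shows how sensitive the sequencing workload is to either parameter.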
NSF-IFAFS project, Oct 2001
Whole genome sequence
Host-pathogen function analysis
• Generate ~7X draft sequence of M. grisea
• Generate 50,000 knockouts
• Analyze host-pathogen interaction
• Provide basic sequence analysis
Consequences of Scaling
[Chart: lab processing capability (Moore's law) versus sequence data (base pairs and sequences), 1994 to 2002]
• Moore’s law has allowed labs to keep ahead of data
• Sequence data is now outpacing processing capability
• Bioinformatics processing will be a real problem
Computational platforms
• Modern biology requires robust computational platforms
• Computer technology implementation is expensive (from a biologist's viewpoint)
• Computer technology development is even more expensive (you want how much?!)
• This detracts from research for small labs
On the brink
• Significant investment in off-the-shelf components and cross-training people
• Moderate-sized genomes
• 20 to 50 megabases
• Takes 2 weeks for initial analyses
• Homology searches take days
Local blast (www.fungalgenomics.ncsu.edu)
Federated database
Select a chromosome
Link to genetic information (blue)
Link to marker data and other data at http://ascus.cit.cornell.edu/blastdb/
High Throughput Genomic Processing and Display
Rice blast and N. crassa synteny
[Figure: synteny map of M. grisea BAC 6J18 against N. crassa contigs 1.515, 1.13, 1.513 and 1.841, with segment sizes labeled from 0.5 kb to 185 kb]
97 out of 179 unique ESTs from chromosome 7 gave a significant (E < 10^-5) tBlastX match to the N. crassa genome shotgun assembly
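The significance filter behind the 97-of-179 figure can be sketched as below. This is an illustration, not the lab's pipeline: it assumes BLAST results in the common 12-column tab-separated layout with the query ID in column 1 and the E-value in column 11, and the sample rows and the name `queries_with_hit` are made up for the example.

```python
def queries_with_hit(lines, max_evalue=1e-5):
    """Return the set of query IDs with at least one hit below max_evalue.

    Assumes 12-column tabular BLAST output (query ID first, E-value 11th).
    """
    hits = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue < max_evalue:
            hits.add(query_id)
    return hits


# Hypothetical rows, for illustration only.
SAMPLE = """\
est001\tcontig1.13\t62.5\t120\t45\t0\t1\t360\t1000\t1359\t3e-20\t98.2
est001\tcontig1.515\t40.0\t60\t36\t0\t10\t189\t500\t679\t0.002\t35.1
est002\tcontig1.841\t38.0\t50\t31\t0\t1\t150\t200\t349\t0.5\t28.0
est003\tcontig1.513\t70.1\t200\t60\t0\t1\t600\t1\t600\t1e-45\t180.0
"""

if __name__ == "__main__":
    matched = queries_with_hit(SAMPLE.splitlines())
    print(sorted(matched))
    # Proportion reported on the slide: 97 of 179 unique ESTs.
    print(f"{97 / 179:.1%} of unique ESTs matched")
```

Counting queries rather than hit rows matters here, because one EST can match several contigs.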
CIFR BioInformatics
[Diagram: CIFR bioinformatics architecture in four tiers (Foundation, Advanced, Research, and Higher Order BioInformatics) leading to biological results]
Components shown:
• GRL, Rube
• Sequence pipeline: mask, Phred, Phrap, consed, Artemis, curation, extract, load
• Blast report db, data loading, relational data model, curation work area
• High throughput WebBlaster, sequence data, OO genomic analysis
• Public HTTP exposure, AlkaEST data mining, HTTP Blast report, genome browser
• BioPerl interface, submissions, extraction
• Homology, repeat analysis, gene prediction, EST analysis
• Synteny, cluster analysis, pathway analysis, in-silico mutation, cellular models
• PBS/LSF grid access, NC BioInformatics Super computing Grid, Genbank
Legend: developed at CIFR; ongoing work at CIFR; open source and others
And over the . . .
• Our whole genome arrives Spring 2002
• Everyone wants immediate results
• Host (rice) genome size far greater than the pathogen
• Comparative genomics likely to require N-way analyses
• And then there’s proteomics ...
Research Biology
• NCSU GRL (Romulus, Remus)
• Excellent foundation work
• ~6 years to sequence M. grisea
Industrial Scale Biology
• High Throughput Sequence Centers (Whitehead)
• ~4 days to sequence M. grisea
Research Bioinformatics
• CIFR FGL (Mycelial mat)
• Excellent foundation work
• est. 4 years to analyze M. grisea
Industrial Scale Bioinformatics
• North Carolina BioGrid
• Hopefully 4 hours to analyze M. grisea
Islands of Capability
• There are not enough resources for every lab to re-implement technologies
• Individual centers specialize according to their research focus
• Grid ties together disparate systems
• Share knowledge and capabilities
• Standards-based for interoperability
Future directions
5 years*
• Organized distributed research - “Virtual Centers”
• Bioinformatics
• Tool development
• Gene prediction algorithms for filamentous fungi
• Gene Indexing
• “Distributed Annotation Systems (DAS)”
• Develop better search features “Queries”
• Integrate sequenced and annotated BAC clones
• Integrate ESTs and expression profiles etc
• Functional Genomics
• Comparative studies - saprophyte vs pathogen etc
• Coordinate IRBGC and PGI etc
• Complete nucleotide sequence, full-length ESTs
• Knock out/silence all genes
• Transcriptional profiling in various backgrounds (path mutants)
• Construct protein-protein linkage maps (signaling pathways)
* The biologist's view
Future Directions
5 years*
• Collaborative knowledge sharing
• New data mining approaches
• New ways of visualizing the information
• In-silico experimentation
• Gene knockouts
• Regulatory modification
• Pathway models
• Cellular models
* The bioinformatician's view
Finding solutions to practical problems
• Seeking answers requires asking questions
• Takes 1-2 weeks per question
• BioGrid may give near real-time response
• BioGrid will bridge the islands of capability
• Focus resources back on our work
• Consequently, we are going to further accelerate the rate of discovery