Flexible genome retrieval for
supporting in-silico studies of
endobacteria-AMFs
S. Montani (1), G. Leonardi (1),
S. Ghignone (2), L. Lanfranco (2)
(1) Dipartimento di Informatica, University of Piemonte Orientale, Alessandria, Italy
(2) Dipartimento di Biologia Vegetale, University of Turin, Italy
Arbuscular mycorrhizal fungi
(AMFs)
Obligate symbionts in strict association with the roots of
land plants
In soil: positive impacts on plant health and productivity
Often in further symbiosis with bacteria
Tripartite system:
(i) endobacterium
(ii) AMF
(iii) plant roots
[Figure labels: AMF spore, AMF hypha, endobacteria]
Studying the tripartite system
Potentially strong practical impacts
symbiotic consortia may lead to:
new metabolic pathways
appearance of interesting molecules for sustainable agriculture and
(possibly) for industrial biotechnological applications
Comparative genomics approach to infer
phylogenetic relationships
genome evolution
metabolic functions of a given organism (also with few available data)
Key part of the study:
genomic data of the endobacteria and AMF-endobacteria interaction
A computational environment for
AMF-endobacteria interaction
Genomic study of the system AMF
Gigaspora margarita (isolate BEG34)
and of its endobacterium Candidatus
Glomeribacter gigasporarum
BIOBITS project, Regione Piemonte Converging Technologies
Generic Model Organism Database
(GMOD) project: open source tools
for creating and managing
genome-scale biological databases
Modular architecture
Database
Synteny and visualization tools
BIOBITS research tools
Architecture of the system
Flexible retrieval
Data storage
CHADO DB
Bacterial genomes, known annotations, proteins
and metabolic pathways, and newly discovered
annotations
Manually loaded with genomes of Candidatus
Glomeribacter’s relatives
Import modules and RRE - Queries
information retrieved from the biological
databases accessible through the Internet (e.g.
GenBank)
Data visualization
GMOD customizable modules for comparative
genomics
CMap displays comparisons of genetic and
physical maps
GBrowse_syn is a synteny browser to display
multiple genomes, with a central reference species
SyBil is a system for comparative genomics
visualizations
New applications (BIOBITS research
tools)
BioMart-based tools
reorganize the information into a data warehouse
analyze the data by means of clustering and data
mining techniques
Flexible retrieval tool
Case-based reasoning paradigm
Case-based retrieval
• retrieve past cases
similar to the current
one
• reuse past successful
solutions, after properly
revising them if
necessary
• retain the current case
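A minimal sketch of the retrieve, reuse and retain steps listed above. The Case class, the Manhattan distance and the toy feature vectors are all assumptions made for illustration, not the tool's actual data model; the revise step is omitted because it depends on domain knowledge.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Case:
    features: List[float]  # hypothetical: similarity percentages per region
    solution: str          # hypothetical: label attached to the past case


def distance(a: List[float], b: List[float]) -> float:
    """Manhattan distance between two feature vectors (an assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def retrieve(case_base: List[Case], query: List[float], k: int = 1) -> List[Case]:
    """Return the k past cases closest to the query."""
    return sorted(case_base, key=lambda c: distance(c.features, query))[:k]


case_base = [
    Case([100.0, 75.0, 50.0], "relative A"),
    Case([20.0, 30.0, 10.0], "relative B"),
]
query = [95.0, 80.0, 55.0]

best = retrieve(case_base, query)[0]      # retrieve
proposed = best.solution                  # reuse (no revision in this sketch)
case_base.append(Case(query, proposed))   # retain
```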
Case representation
Sequence of nucleotides, properly aligned with
the same reference organism
Percentage of similarity with the aligned
nucleotide in the reference organism
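As an illustration of this representation, the sketch below turns an aligned sequence into a series of similarity percentages with respect to the reference. Computing the percentage over fixed-size windows of aligned columns, the window size and the function name are assumptions; the slides do not state how the percentage is obtained.

```python
from typing import List


def window_similarity(seq: str, ref: str, window: int = 4) -> List[float]:
    """Percent identity to the reference, per window of aligned columns."""
    if len(seq) != len(ref):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for start in range(0, len(seq), window):
        s = seq[start:start + window]
        r = ref[start:start + window]
        # A gap ('-') never counts as a match
        matches = sum(1 for a, b in zip(s, r) if a == b and a != "-")
        scores.append(100.0 * matches / len(s))
    return scores


# Toy aligned sequences, invented for the example
case = window_similarity("ACGTACGA--GT", "ACGTTCGATTGT")
```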
Flexible retrieval
Abstracting the data at different levels in a
taxonomy
“Bird’s eye” view of similarity
Example:
• DCW region (cell division)
• About 10 genes
• The region is conserved in relatives
• A single gene may not be
Flexible retrieval
Abstracting the data at different “states”
granularity levels
Similar to the (state) Temporal Abstraction
technique: from points to intervals sharing a common
persistent behavior
Each state specialized in
further subdivisions
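The abstraction from points to intervals can be sketched as follows. The state labels (L, M, H and their subdivisions such as Hv), the thresholds and the function names are hypothetical; only the idea of mapping values to states at two granularity levels and merging equal neighbours into intervals comes from the slides.

```python
from itertools import groupby
from typing import List, Tuple


def fine_state(p: float) -> str:
    """Map a similarity percentage to a fine-grained state (assumed thresholds)."""
    bounds = [(25, "Lv"), (50, "Lm"), (65, "Ml"), (80, "Mh"), (90, "Hm")]
    for upper, label in bounds:
        if p < upper:
            return label
    return "Hv"


def coarse_state(label: str) -> str:
    """Parent state in the assumed taxonomy: Hv and Hm both generalize to H."""
    return label[0]


def to_intervals(states: List[str]) -> List[Tuple[str, int, int]]:
    """Merge runs of equal adjacent states into (state, start, end), end inclusive."""
    intervals = []
    pos = 0
    for state, run in groupby(states):
        length = len(list(run))
        intervals.append((state, pos, pos + length - 1))
        pos += length
    return intervals


similarities = [95.0, 92.0, 85.0, 60.0, 20.0, 10.0]  # toy values
fine = [fine_state(p) for p in similarities]
coarse = [coarse_state(s) for s in fine]
fine_intervals = to_intervals(fine)
coarse_intervals = to_intervals(coarse)
```

At the coarse level the first three points collapse into a single H interval, while the fine level still distinguishes Hv from Hm.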
Efficient retrieval
Multi-dimensional index structures
Queries at any level of detail
Interactivity
Query answering
Query: similarity string at any detail level (Hv..Hv)
Query generalization to find index root
Hv..Hv -> H..H -> H
Index navigation proceeds backwards with respect to the query
generalization steps
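A toy sketch of the two phases above: the query is generalized step by step until it matches an index root, then the index is descended by replaying those steps in reverse. The nested-dictionary index, the case identifiers and the reading of "H..H -> H" as merging equal neighbours are assumptions made for illustration; the actual multi-dimensional index structure is not described in the slides.

```python
from itertools import groupby
from typing import List

# Hypothetical index: root key -> coarse string -> fine string -> case identifiers
INDEX = {
    ("H",): {
        ("H", "H"): {
            ("Hv", "Hv"): ["case_1", "case_7"],
            ("Hv", "Hm"): ["case_3"],
        },
    },
}


def generalize(query: List[str]) -> List[List[str]]:
    """Generalization chain, from the query itself up to the root key."""
    coarse = [s[0] for s in query]            # taxonomy step, e.g. Hv -> H
    root = [s for s, _ in groupby(coarse)]    # merge equal neighbours, e.g. H, H -> H
    return [query, coarse, root]


def answer(query: List[str]) -> List[str]:
    """Descend the index from the root, undoing the generalization steps."""
    node = INDEX
    for key in reversed(generalize(query)):
        node = node.get(tuple(key))
        if node is None:
            return []  # no indexed case matches at this level
    return node


hits = answer(["Hv", "Hv"])
misses = answer(["Lv", "Lv"])
```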
Computation time
Efficient retrieval particularly critical in very large
databases (bacteria genome DBs growing very fast)
Existing implementation in the haemodialysis domain
1475 real haemodialysis patient cases
Index-based TA is fast (41 msec on an Intel Core 2
Duo T9400 processor running at 2.53 GHz,
equipped with 4 GB of DDR2 RAM)
Conclusions
Modular architecture for in-silico comparative
genomics studies of AMF-endobacteria
interaction
Flexible genome retrieval tool
Flexible query definition, at different levels of
abstraction
Efficient index-based retrieval
Interactive query refinement/generalization
Future work
Complete tool implementation
Experiments on RefSeq NCBI data
Tool usability
New applications published as new GMOD
modules