Flexible genome retrieval for supporting in-silico studies of endobacteria-AMFs
S. Montani (1), G. Leonardi (1), S. Ghignone (2), L. Lanfranco (2)
(1) Dipartimento di Informatica, University of Piemonte Orientale, Alessandria, Italy
(2) Dipartimento di Biologia Vegetale, University of Turin, Italy
Arbuscular mycorrhizal fungi (AMFs)

• Obligate symbionts in strict association with the roots of land plants
• In soil: positive impacts on plant health and productivity
• Often in further symbiosis with bacteria
Tripartite system:
(i) endobacterium
(ii) AMF
(iii) plant roots
[Figure labels: AMF spore, AMF hypha, endobacteria]
Studying the tripartite system

Potentially strong practical impacts
Symbiotic consortia may lead to:
• new metabolic pathways
• appearance of interesting molecules for sustainable agriculture and (possibly) for industrial biotechnological applications
Comparative genomics approach to infer:
• phylogenetic relationships
• genome evolution
• metabolic functions of a given organism (also with few available data)
Key part of the study:
• genomic data of the endobacteria and AMF-endobacteria interaction
A computational environment for AMF-endobacteria interaction

• Genomic study of the system formed by the AMF Gigaspora margarita (isolate BEG34) and its endobacterium Candidatus Glomeribacter gigasporarum
• BIOBITS project, Regione Piemonte Converging Technologies
Generic Model Organism Database (GMOD) project: open source tools for creating and managing genome-scale biological databases
Modular architecture:
• Database
• Synteny and visualization tools
• BIOBITS research tools
Architecture of the system
[Figure: system architecture diagram; surviving label: Flexible retrieval]
Data storage

CHADO DB
• Bacterial genomes, known annotations, proteins and metabolic pathways, and newly discovered annotations
• Manually loaded with genomes of Candidatus Glomeribacter’s relatives
Import modules and RRE - Queries
• Information retrieved from the biological databases accessible through the Internet (e.g. GenBank)
Data visualization

GMOD customizable modules for comparative genomics
• CMap allows users to view comparisons of genetic and physical maps
• GBrowse_syn is a synteny browser that displays multiple genomes, with a central reference species
• SyBil is a system for comparative genomics visualizations
New applications (BIOBITS research tools)

BioMart-based tools
• reorganize the information into a data warehouse
• analyze the data by means of clustering and data mining techniques
Flexible retrieval tool

Case-based reasoning paradigm
Case-based retrieval
• retrieve past cases similar to the current one
• reuse past successful solutions after, if necessary, properly revising them
• retain the current case
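The retrieve and retain steps listed above can be sketched as follows. This is only an illustrative sketch, not the tool's implementation: the `CaseLibrary` class, the `distance` function (mean absolute difference between similarity profiles) and the example case names are all assumptions introduced here.

```python
def distance(a, b):
    """Mean absolute difference between two equal-length similarity profiles.

    Hypothetical metric chosen for illustration; the slides do not specify one.
    """
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


class CaseLibrary:
    """Minimal case base supporting the 'retain' and 'retrieve' steps."""

    def __init__(self):
        self.cases = {}  # case id -> list of percent similarities

    def retain(self, case_id, profile):
        """Store the current case so it can be retrieved later."""
        self.cases[case_id] = list(profile)

    def retrieve(self, query, k=1):
        """Return the ids of the k stored cases closest to the query."""
        ranked = sorted(self.cases, key=lambda cid: distance(self.cases[cid], query))
        return ranked[:k]


library = CaseLibrary()
library.retain("relative_A", [90.0, 90.0])  # hypothetical past cases
library.retain("relative_B", [10.0, 10.0])
nearest = library.retrieve([80.0, 85.0], k=1)
```

The reuse and revise steps depend on what a "solution" is in the application, so they are left out of this sketch.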
Case representation

• Sequence of nucleotides, properly aligned with the same reference organism
• Percentage of similarity with the aligned nucleotide in the reference organism
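One possible way to hold such a case in memory is sketched below. The `GenomeCase` class, its field names and the sample values are assumptions made for illustration; the slides only state that a case is an aligned sequence annotated with percentages of similarity to a reference organism.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GenomeCase:
    """A case: one organism aligned against a shared reference organism."""

    organism: str            # organism the case describes (hypothetical name)
    reference: str           # reference organism used for the alignment
    similarity: List[float]  # percent similarity per aligned position

    def mean_similarity(self) -> float:
        """Average similarity over the aligned positions."""
        return sum(self.similarity) / len(self.similarity)


# Made-up values, only to show the shape of the data.
case = GenomeCase("relative_A", "reference_X", [98.0, 95.5, 40.0, 42.5])
```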
Flexible retrieval

Abstracting the data at different levels in a taxonomy
“Bird’s eye” view of similarity
Example:
• DCW region (cell division)
• About 10 genes
• Region conserved in relatives
• a single gene may not
Flexible retrieval

Abstracting the data at different “state” granularity levels
Similar to the (state) Temporal Abstraction technique: from points to intervals sharing a common persistent behavior
Each state is specialized into further subdivisions
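The abstraction from points to intervals can be sketched as follows. The symbols, the thresholds and the two-level taxonomy (`Hv` and `Hm` as subdivisions of `H`) are assumptions made for this example; the slides mention `H` and `Hv` but do not define them.

```python
from itertools import groupby

# Hypothetical taxonomy: fine-grained state -> coarser parent state.
PARENT = {"Hv": "H", "Hm": "H", "M": "M", "L": "L"}


def fine_symbol(pct):
    """Map a percent similarity to a fine-grained state (assumed thresholds)."""
    if pct >= 95:
        return "Hv"  # very high
    if pct >= 80:
        return "Hm"  # moderately high
    if pct >= 50:
        return "M"   # medium
    return "L"       # low


def to_intervals(symbols):
    """Merge adjacent equal states into (state, start, end), end inclusive."""
    intervals = []
    pos = 0
    for sym, run in groupby(symbols):
        length = len(list(run))
        intervals.append((sym, pos, pos + length - 1))
        pos += length
    return intervals


def abstract(similarity, level="fine"):
    """Abstract point-wise similarities into state intervals at a given level."""
    symbols = [fine_symbol(p) for p in similarity]
    if level == "coarse":
        symbols = [PARENT[s] for s in symbols]
    return to_intervals(symbols)


similarity = [98.0, 96.0, 85.0, 40.0, 45.0]
fine = abstract(similarity, "fine")
coarse = abstract(similarity, "coarse")
```

At the coarse level the first three positions merge into a single `H` interval, which is the kind of "bird's eye" view a coarser granularity is meant to give.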
Efficient retrieval

• Multi-dimensional index structures
• Queries at any level of detail
• Interactivity
Query answering

Query: similarity string at any detail level (Hv..Hv)
Query generalization to find the index root:
Hv..Hv -> H..H -> H
Index navigation backwards with respect to the query generalization steps
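A minimal sketch of this generalize-then-navigate scheme is given below. It is not the authors' index structure: the nested-dictionary index, the `PARENT` taxonomy and the example cases are assumptions, chosen only to show how a query is generalized up to a root and how the index is then descended in the opposite order.

```python
from itertools import groupby

# Hypothetical taxonomy: fine-grained state -> coarser parent state.
PARENT = {"Hv": "H", "Hm": "H", "M": "M", "L": "L"}


def collapse(symbols):
    """Merge runs of equal states, e.g. (H, H, H) -> (H,)."""
    return tuple(sym for sym, _ in groupby(symbols))


def generalize(fine):
    """Return the generalization chain: fine string, coarse string, root."""
    fine = tuple(fine)
    coarse = tuple(PARENT[s] for s in fine)
    return fine, coarse, collapse(coarse)


def build_index(cases):
    """Index cases as root -> coarse string -> fine string -> case ids."""
    index = {}
    for case_id, symbols in cases.items():
        fine, coarse, root = generalize(symbols)
        index.setdefault(root, {}).setdefault(coarse, {}).setdefault(fine, []).append(case_id)
    return index


def answer(index, query, level="fine"):
    """Find the root by generalization, then walk back down the index."""
    if level == "fine":
        fine, coarse, root = generalize(query)
        return index.get(root, {}).get(coarse, {}).get(fine, [])
    # Coarse-level query: stop one level earlier and collect all leaves.
    coarse = tuple(query)
    leaves = index.get(collapse(coarse), {}).get(coarse, {})
    return sorted(cid for ids in leaves.values() for cid in ids)


cases = {  # made-up cases, already abstracted into state strings
    "c1": ["Hv", "Hv", "Hv"],
    "c2": ["Hv", "Hm", "Hv"],
    "c3": ["L", "L", "Hv"],
}
index = build_index(cases)
exact = answer(index, ["Hv", "Hv", "Hv"])
broader = answer(index, ["H", "H", "H"], level="coarse")
```

Querying at the coarse level returns a superset of the fine-level answer, which is what makes interactive refinement and generalization of a query cheap in this kind of layout.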
Computation time

Efficient retrieval is particularly critical in very large databases (bacterial genome DBs are growing very fast)
Existing implementation in the haemodialysis domain
• 1475 real haemodialysis patient cases
• Index-based TA is fast (41 msec on an Intel Core 2 Duo T9400 processor running at 2.53 GHz, equipped with 4 GB of DDR2 RAM)
Conclusions

Modular architecture for in-silico comparative genomics studies of AMF-endobacteria interaction
Flexible genome retrieval tool
• Flexible query definition, at different levels of abstraction
• Efficient index-based retrieval
• Interactive query refinement/generalization
Future work

Complete tool implementation
Experiments on RefSeq NCBI data
Tool usability
New applications published as new GMOD modules