genome - Microme


First Microme Jamboree, Monday 27 and Tuesday 28 June
MicroScope functionalities to support pathway curation
LABGeM team
Laboratory of Bioinformatics Analyses for Genomics and Metabolism
CEA/DSV/IG/Genoscope & CNRS UMR8030
The MicroScope platform
http://www.genoscope.cns.fr/agc/microscope
Labelled in 2006 (RIO) and in 2009
October 2002: beginning of the Acinetobacter baylyi ADP1 genome annotation
Computational platform for the annotation and comparative analysis of bacterial genomes:
- equipment (servers, disk storage, backups)
- software and data
- human resources (development, training, support)
=> it offers the community of microbiologists high-end technological resources for the automatic and expert analysis of genomic data.
Usage of the platform
859 personal accounts: 493 in France, 175 in Europe, 81 in the USA and 110 in other countries
Expert annotations: 370 000 in total, about 5000 per month (2010)
About 980 bacterial genomes: 345 genomes annotated in the system (mostly sequenced at Genoscope and in the USA...) and 635 from public databanks
Since 2004: 33 ‘genome’ papers (including 4 genome announcements)
Specific genomic analyses: 22 other publications
Three MicroScope components
- Data management: PkGDB (primary databanks, internal genomic objects, computational results) and MicroCyc
- Process management: JBPM workflows (primary databank update, syntactic annotation, functional/relational analyses) and a JBPM database (DB release, job history)
  => full automation: more than 25 methods integrated in a workflow management system
- Visualization: MaGe web interface
MaGe functionalities: Login, Genome overview, Data export, Genome browser and synteny maps, Artemis, CGView, LinePlot, Synton display, Gene editor, Pathway Genome Databases, Keyword search, Blast and pattern search, Phylogenetic profile, Fusion / Fission, Tandem duplications, Minimal gene set, RGPfinder, SNPs / InDels, Tutorial, Gene cart, KEGG, MicroCyc, Metabolic profile, Pathway / Synteny
• genome annotation
• primary data up-to-date
Vallenet D. et al. «MicroScope - a platform for microbial genome annotation and comparative genomics». Database, 2009.
Vallenet D. et al. «MaGe - a microbial genome annotation system supported by synteny results». Nucleic Acids Research, 2006.
Tools for the syntactic & functional annotation
Syntactic annotation
• Public tools: RepSeek (repeats), Oriloc (oriC/terC position), tRNAscan-SE (tRNA genes), Blast on Rfam (snRNA genes).
• “Homemade” tools: findrRNA (rRNA genes), AMIMat (gene models according to codon usage), AMIGene (based on GeneMark), MICheck (reannotation of public bacterial genomes).
Functional annotation
• Public tools: BLAST (searches in specialized databases and UniProt), InterProScan (domains and functional sites), COGnitor (COG protein families), PRIAM (enzymatic functions), Pathway Tools (metabolic pathway reconstruction), SignalP, TMHMM and PSORT (protein localisation).
• “Homemade” tools: Syntonizer (gene context analysis) and, as a final step, AutoFAssign, an automatic functional annotation procedure that uses its evidence sources in this order of priority:
Blast on ‘reference genome annotations’ & syntenies > HAMAP results > TIGRfam/Pfam results & Blast on UniProt
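The AutoFAssign ordering reads as a priority cascade: the first evidence tier that provides a candidate annotation wins. The minimal Python sketch below illustrates that idea only. It assumes each tier has already produced at most one candidate per gene, and the `Candidate` type, tier labels and example annotations are hypothetical; the actual AutoFAssign scoring rules are not described here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for the evidence tiers, highest priority first,
# following the order given for AutoFAssign. Each "&" pair on the slide
# is treated here as a single tier.
PRIORITY = [
    "reference_genomes_blast_and_synteny",
    "hamap",
    "tigrfam_pfam_and_uniprot_blast",
]


@dataclass
class Candidate:
    source: str                      # one of the PRIORITY labels
    product: str                     # proposed protein name
    ec_number: Optional[str] = None  # proposed EC number, if any


def auto_assign(candidates: list[Candidate]) -> Optional[Candidate]:
    """Return the candidate from the highest-priority tier, or None.

    Assumes at most one candidate per tier; if several are given,
    the last one for that tier is kept.
    """
    by_source = {c.source: c for c in candidates}
    for source in PRIORITY:
        if source in by_source:
            return by_source[source]
    return None  # no evidence at all: the gene stays unassigned


if __name__ == "__main__":
    hits = [
        Candidate("tigrfam_pfam_and_uniprot_blast", "putative dehydrogenase"),
        Candidate("hamap", "tryptophan synthase alpha chain", "4.2.1.20"),
    ]
    best = auto_assign(hits)
    print(best.source, best.product)  # hamap tryptophan synthase alpha chain
```

A strict first-match cascade keeps the decision easy to explain to an annotator: the chosen annotation can always be traced to a single evidence tier.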
Classification of protein-coding genes
Functional classifications from annotation tools
• Gene Ontology (GO classification) <- InterProScan results
• COG classification <- COGnitor results
Functional classifications (Gene Editor)
• MultiFun (E. coli; M. Riley)
• TIGR main roles
Another kind of classification
Inspired by the ‘protein name confidence’ defined in PseudoCAP (Pseudomonas aeruginosa Community Annotation Project)
Results available to correct/complete annotation
• Annotations from reference genomes
• MicroScope curated annotations
• Synteny results on available complete bacterial genomes
TrEMBL contains functional annotations that often come from automatic procedures only: the ‘IPMed?’ flag (for ‘Interesting PubMed?’) marks proteins that may have an experimentally validated function.
TrEMBL Blast similarities: example
The MicroScope platform: data management (1)
Relational database PkGDB (Prokaryotic Genome DataBase)
Data organisation and persistence:
• Public/primary data
• Data generated during the annotation process (analysis results and expert annotations)
One instance of PkGDB for all MicroScope projects:
• Collaborative annotation
• Annotator accounts and rights on sequences
• Annotation history
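One way to picture a single shared database with per-sequence annotator rights and an annotation history is a small relational schema where every expert annotation is appended rather than overwritten. The sketch below uses Python's built-in `sqlite3`. All table and column names, as well as the `alice`/`bob` accounts, are hypothetical and do not reflect the actual PkGDB schema.

```python
import sqlite3

# Hypothetical, much simplified schema: NOT the real PkGDB tables.
SCHEMA = """
CREATE TABLE genomic_sequence (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE gene (
    id INTEGER PRIMARY KEY,
    label TEXT NOT NULL,
    sequence_id INTEGER NOT NULL REFERENCES genomic_sequence(id)
);
CREATE TABLE annotator (id INTEGER PRIMARY KEY, login TEXT UNIQUE NOT NULL);
-- Rights are granted per sequence, so many projects can share one database.
CREATE TABLE access_right (
    annotator_id INTEGER NOT NULL REFERENCES annotator(id),
    sequence_id INTEGER NOT NULL REFERENCES genomic_sequence(id),
    PRIMARY KEY (annotator_id, sequence_id)
);
-- Each expert annotation is a new row, so older versions stay as history.
CREATE TABLE annotation (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    gene_id INTEGER NOT NULL REFERENCES gene(id),
    annotator_id INTEGER NOT NULL REFERENCES annotator(id),
    product TEXT NOT NULL
);
"""


def annotate(db: sqlite3.Connection, login: str, gene_label: str, product: str) -> None:
    """Append an annotation if `login` has rights on the gene's sequence."""
    row = db.execute(
        """SELECT g.id, a.id FROM gene AS g
           JOIN access_right AS r ON r.sequence_id = g.sequence_id
           JOIN annotator AS a ON a.id = r.annotator_id
           WHERE g.label = ? AND a.login = ?""",
        (gene_label, login),
    ).fetchone()
    if row is None:
        raise PermissionError(f"{login} has no rights on {gene_label}")
    db.execute(
        "INSERT INTO annotation (gene_id, annotator_id, product) VALUES (?, ?, ?)",
        row + (product,),
    )


def history(db: sqlite3.Connection, gene_label: str) -> list[str]:
    """All annotations of a gene, oldest first."""
    rows = db.execute(
        """SELECT annotation.product FROM annotation
           JOIN gene ON gene.id = annotation.gene_id
           WHERE gene.label = ? ORDER BY annotation.id""",
        (gene_label,),
    )
    return [product for (product,) in rows]


def demo_db() -> sqlite3.Connection:
    """Toy database: one gene, two annotators, only 'alice' has rights."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO genomic_sequence VALUES (1, 'P. putida chromosome')")
    db.execute("INSERT INTO gene VALUES (1, 'PP0082', 1)")
    db.executemany("INSERT INTO annotator VALUES (?, ?)", [(1, "alice"), (2, "bob")])
    db.execute("INSERT INTO access_right VALUES (1, 1)")
    return db


if __name__ == "__main__":
    db = demo_db()
    annotate(db, "alice", "PP0082", "protein of unknown function")
    annotate(db, "alice", "PP0082", "tryptophan synthase alpha chain")
    print(history(db, "PP0082"))
```

Appending rows instead of updating them is the simplest way to keep an annotation history. The current annotation is then just the most recent row for a gene.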
The MicroScope platform: data management (2)
Enzymatic activities are predicted on each bacterial genome (PRIAM); the resulting EC numbers are put in correspondence with MetaCyc:
• Experimentally elucidated metabolic pathways
• 1600 pathways from 2000 organisms
(P. Karp, SRI, USA)
Pathway Tools: a metabolic database is built for each annotated microbial genome
PGDB = Pathway/Genome Database (orgname_Cyc)
http://www.genoscope.cns.fr/agc/microcyc
Today: 977 organisms, 20 GB
«Metabolic profiles» functionality (data from PkGDB)
• Select organisms to compare
• Select pathway classes
• For each pathway x and each organism, the profile shows: number of reactions of pathway x found in the organism / total number of reactions in pathway x
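Each cell of a metabolic profile is the number of reactions of pathway x found in an organism, over the total number of reactions in pathway x. The minimal sketch below assumes that each organism's predicted reactions and each pathway's reactions are available as sets of identifiers. The identifiers, pathway and organism names are hypothetical placeholders, not MetaCyc IDs.

```python
def pathway_completeness(organism_reactions: set[str],
                         pathway_reactions: set[str]) -> tuple[int, int]:
    """Return (reactions of the pathway found in the organism, total in pathway)."""
    found = len(organism_reactions & pathway_reactions)
    return found, len(pathway_reactions)


def metabolic_profile(organisms: dict[str, set[str]],
                      pathways: dict[str, set[str]]) -> dict[str, dict[str, str]]:
    """Build a pathway x organism table of 'found/total' strings."""
    table: dict[str, dict[str, str]] = {}
    for pathway, reactions in pathways.items():
        table[pathway] = {}
        for organism, predicted in organisms.items():
            found, total = pathway_completeness(predicted, reactions)
            table[pathway][organism] = f"{found}/{total}"
    return table


if __name__ == "__main__":
    pathways = {"pathway_x": {"R1", "R2", "R3", "R4"}}       # hypothetical
    organisms = {"org_A": {"R1", "R2", "R4", "R9"}, "org_B": {"R3"}}
    for pathway, row in metabolic_profile(organisms, pathways).items():
        print(pathway, row)  # pathway_x {'org_A': '3/4', 'org_B': '1/4'}
```

Keeping both counts, instead of a single percentage, shows whether a low score comes from a short pathway with one missing step or from a long pathway that is mostly absent.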
Metabolic phyloprofile : example of results
Using the “Keywords Search” functionality
Which datasets can be explored?
• Automatically annotated genes + validated genes
• Only all/personal validated genes
• Only annotations from databank files or from our annotation pipeline
• Gene/Protein features: G+C%, MW, pI
• Specific fields of the gene editor: Comments/Note
• BlastP/Synteny results against:
  - the set of genomes of the MicroScope project
  - Escherichia coli (updated annotation) or Bacillus subtilis (SubtiList database) annotations
• The set of E. coli, B. subtilis or P. aeruginosa essential genes
• Genes involved in synteny groups and annotated as Protein of Unknown Function or Putative enzyme
• The set of similarities obtained with different sources:
  - HAMAP (High-quality Automated/Manual Annotation)
  - SwissProt or TrEMBL databank, limited or not to Blast hits having a possibly interesting PubMedID
  - PRIAM enzymatic profiles (Enzyme Commission)
  - COG databank
  - InterPro databank
• Genes encoding enzymes involved in KEGG and BioCyc metabolic pathways
• The results obtained with SignalP, TMHMM, PSORTb and Coiled Coil
Query on P. putida annotation
Step 1: genes annotated as « unknown function » => 2093 results (35%)
Step 2: which of them have Blast similarities with UniProt entries that are not of unknown function and are linked to a PubMedID?
Results of the query: 216 genes (123 in SwissProt and 93 in TrEMBL)
« Get gene » => 114 genes (can be re-annotated)
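The two-step query chains two filters. Step 1 keeps genes annotated as unknown function. Step 2 keeps those with at least one Blast hit to a UniProt entry (SwissProt or TrEMBL) that is not itself of unknown function and is linked to a PubMed ID. The minimal sketch below works on hypothetical in-memory records. The `Gene` and `Hit` types, field names, keyword matching and example data are assumptions, not the Keywords Search implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    accession: str
    databank: str                  # "SwissProt" or "TrEMBL"
    description: str               # annotation of the UniProt entry
    pubmed_ids: list[str] = field(default_factory=list)


@dataclass
class Gene:
    label: str
    product: str
    hits: list[Hit] = field(default_factory=list)


def is_unknown(text: str) -> bool:
    # Assumption: a plain case-insensitive keyword match.
    return "unknown function" in text.lower()


def informative_hits(gene: Gene) -> list[Hit]:
    """Hits to entries that are not of unknown function and cite PubMed."""
    return [h for h in gene.hits if h.pubmed_ids and not is_unknown(h.description)]


def reannotation_candidates(genes: list[Gene]) -> list[Gene]:
    step1 = [g for g in genes if is_unknown(g.product)]   # step 1
    return [g for g in step1 if informative_hits(g)]      # step 2


if __name__ == "__main__":
    genes = [
        Gene("gene_1", "protein of unknown function",
             [Hit("ACC_1", "SwissProt", "amine dehydrogenase", ["pmid_1"])]),
        Gene("gene_2", "Protein of unknown function",
             [Hit("ACC_2", "TrEMBL", "protein of unknown function", ["pmid_2"])]),
        Gene("gene_3", "tryptophan synthase alpha chain"),
    ]
    print([g.label for g in reannotation_candidates(genes)])  # ['gene_1']
```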
Syntactic re-annotation of P. putida
[Figure: synteny map around the quinohemoprotein amine dehydrogenase locus; genes PP3459 to PP3466, PSEPK3868, PSEPK3872 and PSEPK3873]
Bacterial synteny: parameters
• Correspondence relationship = sequence similarity (BlastP): Bidirectional Best Hit OR at least 30% identity over 80% of the shortest sequence
• Co-localization: gap = 5
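The two synteny parameters can be sketched as follows. This is a simplified illustration, not the MaGe algorithm. It assumes BlastP results have already been parsed into best-hit tables and alignment statistics, and it reads "gap = 5" as the maximum number of intervening genes allowed between two consecutive correspondences on each genome. Genome circularity is ignored. All names are hypothetical.

```python
from dataclasses import dataclass

MIN_IDENTITY = 30.0   # percent identity
MIN_COVERAGE = 0.80   # fraction of the shortest sequence
MAX_GAP = 5           # assumed: max genes skipped between consecutive pairs


@dataclass
class Alignment:
    identity: float      # percent identity of the BlastP alignment
    aligned_length: int  # alignment length in residues
    len_query: int
    len_subject: int


def is_bbh(a: str, b: str, best_ab: dict[str, str], best_ba: dict[str, str]) -> bool:
    """Bidirectional best hit: b is a's best hit and a is b's best hit."""
    return best_ab.get(a) == b and best_ba.get(b) == a


def passes_threshold(aln: Alignment) -> bool:
    """At least 30% identity over at least 80% of the shortest sequence."""
    shortest = min(aln.len_query, aln.len_subject)
    return aln.identity >= MIN_IDENTITY and aln.aligned_length >= MIN_COVERAGE * shortest


def corresponds(a: str, b: str, aln: Alignment,
                best_ab: dict[str, str], best_ba: dict[str, str]) -> bool:
    """Correspondence relationship: BBH OR identity/coverage threshold."""
    return is_bbh(a, b, best_ab, best_ba) or passes_threshold(aln)


def synteny_groups(pairs: list[tuple[int, int]]) -> list[list[tuple[int, int]]]:
    """Chain correspondences (gene index in A, gene index in B) into groups.

    Greedy single pass sorted on genome A; abs() accepts inversions.
    Only groups with at least two correspondences are kept.
    """
    groups: list[list[tuple[int, int]]] = []
    current: list[tuple[int, int]] = []
    for a, b in sorted(pairs):
        if current:
            last_a, last_b = current[-1]
            if a - last_a - 1 <= MAX_GAP and abs(b - last_b) - 1 <= MAX_GAP:
                current.append((a, b))
                continue
            if len(current) >= 2:
                groups.append(current)
        current = [(a, b)]
    if len(current) >= 2:
        groups.append(current)
    return groups


if __name__ == "__main__":
    print(passes_threshold(Alignment(35.0, 250, 300, 320)))  # True
    pairs = [(10, 200), (11, 201), (14, 203), (40, 90), (41, 500)]
    print(synteny_groups(pairs))  # [[(10, 200), (11, 201), (14, 203)]]
```

The OR in the correspondence rule lets the BBH criterion catch distant orthologs that fall below the 30% identity threshold, while the threshold also admits non-best hits such as paralogs.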
How to read the synteny maps? (example of ACIAD2450)
• A putative ortholog of ACIAD2440 on the E. coli genome
• ACIAD2440: a putative paralog of ACIAD2450, with two other co-localized ADP1 genes (in yellow)
• Another putative paralog of ACIAD2450, elsewhere on the ADP1 chromosome
• This P. putida « ortholog » (PP0114) is in synteny with two other genes (coloured in blue-purple).
• These two P. putida genes (PP0220 and PP4425) are similar to ACIAD2450 (putative paralogs of PP0114?)
How are genes organized in a synteny group? (2)
« Syntonome » results in the gene annotation editor, for PkGDB proteomes and for NCBI + WGS proteomes
MicroScope web interfaces: MaGe
MicroScope project, Authentication, Annotation editor (EXPERT CURATION), Help, Options
Genome: Overview, Export, Artemis, Metabolic pathways, LinePlot, Synteny map, Synton visualization, CGView
Exploration: KeyWords, Blast / Motifs, Phylogenetic profiles, Fusions / Fissions, Genomic islands, Metabolic profiles
MicroScope tutorial
Annotation data in the ‘Gene Validation’ section of the editor
• Automatic information: does not need to be changed
• Information that must be completed or corrected by the annotator, with the help of the ‘Analysis Results’ section
• Optional information
New: adding gene-protein-reaction associations (MetaCyc reactions)
PP0082 = trpA gene
• List of the predicted reactions linked to the gene
• Click on the EC number to search for all MetaCyc reactions corresponding to the annotated EC number
PP0082 = trpA gene
Added for PP
PP0083 = trpB gene
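Conceptually, a gene-protein-reaction association links a gene to reactions through its annotated EC number, and the annotator then keeps the relevant reactions. The minimal sketch below assumes a local EC-to-reaction index. The reaction identifiers (`RXN_A`, ...) and the `GPRStore` class are hypothetical placeholders, not actual MetaCyc frame IDs or a MicroScope API. EC 4.2.1.20 is the tryptophan synthase activity shared by the TrpA and TrpB subunits.

```python
from collections import defaultdict

# Hypothetical EC -> reaction index (placeholder IDs, not real MetaCyc IDs).
EC_TO_REACTIONS: dict[str, list[str]] = {
    "4.2.1.20": ["RXN_A", "RXN_B", "RXN_C"],
}


class GPRStore:
    """Stores curated gene -> reaction associations (hypothetical helper)."""

    def __init__(self) -> None:
        self.associations: dict[str, set[str]] = defaultdict(set)

    def candidate_reactions(self, ec_number: str) -> list[str]:
        """All reactions matching the gene's annotated EC number."""
        return EC_TO_REACTIONS.get(ec_number, [])

    def add(self, gene: str, ec_number: str, chosen: list[str]) -> None:
        """Record the annotator's choices, which must be EC-matched candidates."""
        allowed = set(self.candidate_reactions(ec_number))
        unknown = set(chosen) - allowed
        if unknown:
            raise ValueError(f"not linked to EC {ec_number}: {sorted(unknown)}")
        self.associations[gene].update(chosen)


if __name__ == "__main__":
    store = GPRStore()
    store.add("PP0082", "4.2.1.20", ["RXN_A"])
    store.add("PP0083", "4.2.1.20", ["RXN_B"])
    print(dict(store.associations))  # {'PP0082': {'RXN_A'}, 'PP0083': {'RXN_B'}}
```

Restricting choices to EC-matched candidates mirrors the editor workflow of searching reactions from the annotated EC number, while still leaving the final selection to the annotator.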
Demo by David Vallenet: please go to
http://www.genoscope.cns.fr/agc/microscope/