Virulence Factor Ontology


Bioscience
Discovering virulence genes present
in novel strains and metagenomes
Chris Stubben
IC postdoc, B-7
Operated by Los Alamos National Security, LLC for NNSA
Overview
• Review current functional classification systems
• Discuss Virulence Factor Ontology
• Identify virulence genes in novel strains and
metagenomes
Slide 3
Functional classification systems
• EC numbers for enzymes (1956)
• Swiss-Prot keywords (1986)
• E. coli gene functions, M. Riley (1993)
• TIGR role categories (1995)
• Gene Ontology (1998)
[Figure: gene linked to function]
Slide 4
What functions are related to virulence?
• Some systems have a few terms
– Swiss-Prot keywords = virulence,
toxin, antibiotic resistance
– TIGR roles = pathogenesis, toxin
production and resistance
• Gene Ontology (GO) also has
pathogenesis, resistance to
antibiotics, plus many more
GO terms related to the enzymatic
activity of toxins
Slide 5
Gene Ontology (GO)
• 25,688 terms in three structured controlled vocabularies
(ontologies)
– 15,098 biological processes
– 2,186 cellular components
– 8,404 molecular functions
• Standard for eukaryotic gene annotation
• Increasingly used for prokaryotes
– TIGR (2002)
– Plant pathogens by PAMGO at VBI (2005)
– Human pathogens at 8 BRCs (2006)
Slide 6
Bioinformatics Resource Centers (BRC)
• NIAID-funded, $100 million effort to create eight
bioinformatics centers for human pathogens
• Goal is to provide easy access to genomic data from
multiple strains, as eukaryotic model organism databases do
BRCs = ?
Slide 7
Example: Toxin annotation in GO
Step 1: Assign GO terms, maybe
– activation of Rho GTPase activity
– N-terminal peptidyl-glutamine deamination
– actin cytoskeleton reorganization
– stress fiber formation
Slide 8
Step 2: Add references and evidence codes
[Figure: evidence linking a virulence protein to its function]
• Experimental
– Knockout mutants (IMP)
– Overexpression phenotypes (IDA)
– Genetic interactions (IGI)
– Microarrays (IEP or RCA)
• Computational, sequence similarity
– BLAST alignments (ISA)
– Orthologous proteins (ISO)
– Hidden Markov models of protein families or domains (ISM)
• Computational, genomic context
– Phylogenetic profiles, conserved neighborhoods, gene fusion,
shared regulatory sites, etc. (IGC)
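The evidence codes above can be used to separate experimentally supported annotations from computational ones. The following is a minimal sketch, not part of the original slides: the `Annotation` record, the gene names, and the reference strings are hypothetical, and placing RCA with the computational codes follows the GO evidence code documentation rather than the slide layout.

```python
from dataclasses import dataclass

# GO evidence codes named on the slide, grouped by how the evidence was produced.
EXPERIMENTAL = {"IMP", "IDA", "IGI", "IEP"}
COMPUTATIONAL = {"ISA", "ISO", "ISM", "IGC", "RCA"}


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record: one GO term assigned to one gene."""
    gene: str
    go_term: str
    evidence: str
    reference: str


def evidence_class(code: str) -> str:
    """Classify an evidence code as experimental, computational, or other."""
    if code in EXPERIMENTAL:
        return "experimental"
    if code in COMPUTATIONAL:
        return "computational"
    return "other"


# Hypothetical annotations; gene names and references are placeholders.
annotations = [
    Annotation("geneA", "actin cytoskeleton reorganization", "IMP", "REF:1"),
    Annotation("geneA", "stress fiber formation", "ISA", "REF:2"),
    Annotation("geneB", "activation of Rho GTPase activity", "IGC", "REF:3"),
]

experimental_only = [
    a for a in annotations if evidence_class(a.evidence) == "experimental"
]

for a in experimental_only:
    print(a.gene, a.go_term, a.evidence, a.reference)
```

Keeping the reference next to the evidence code means each annotation can be traced back to its source.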
Slide 9
Example: Toxin searches in GO
• If a gene is annotated to ‘adenylate
cyclase activity’, how do you know it’s a
toxin?
• It may also be annotated to “cell killing” or a
related term, but is that enough?
• An alternative is to define
virulence factors and toxins (both
outside the scope of GO) in a new
ontology
Slide 10
Why we need a Virulence Factor ontology
• Lots of effort to characterize pathogenic processes and
systems (e.g., BRCs)
• Many different definitions of pathogen, virulence and
virulence factors
• Not clear what terms in GO may be related to toxins
and virulence (BRCs have already assigned 750,000
GO terms to 300,000 genes)
Slide 11
Virulence Factor Ontology working group
• Goal is to combine existing toxin and virulence terms from
various groups into a single ontology
– TVFac and antibiotic resistance (AR) terms at LANL
– Gemina virulence factors and AR terms at U. of Maryland
– PAMGO terms in GO
• Participants
– MITRE. Lynette Hirschman, Marc Colosimo, and others
– LANL. Chris Stubben, Murray Wolinsky and Jian Song
– U of Maryland IGS. Lynn Schriml and Michelle Gwinn
Slide 12
Virulence Factor Ontology (VFO)
• Three new ontologies, one very simple that points
to additional terms in GO or to new ontologies
• Virulence factor (definition needed!)
– toxin associated processes (new)
– antibiotic resistance (new)
– adhesion
– entry into host
– acquisition of nutrients from host
– avoidance of host defenses
– growth within host
– modification of host morphology
– dissemination from host
• The branches from adhesion to dissemination from host are
simplified GO trees (slims)
Slide 13
Virulence genes in novel strains
• Emerging, engineered and novel strains will most likely
be sequenced quickly using next-generation sequencing
technologies
• and then compared to near-neighbor strains using
sequence similarity (BLAST) or models (HMMs like
Pfams, TIGRFams, FIGFams, EnteroFams, etc.)
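As a rough illustration of the sequence-similarity route, the sketch below filters BLAST tabular output (the 12-column `-outfmt 6` layout) for confident hits against a reference set of known virulence genes. The sample lines, identifiers, and cutoffs are all hypothetical and are not taken from the slides.

```python
# Illustrative BLAST tabular output (-outfmt 6); every identifier is hypothetical.
# Columns: qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
SAMPLE_HITS = "\n".join([
    "novel_0001\tknown_toxin_1\t98.5\t300\t4\t0\t1\t300\t1\t300\t1e-150\t540",
    "novel_0002\tknown_adhesin_1\t35.0\t120\t70\t5\t10\t130\t15\t135\t2e-05\t48.1",
    "novel_0003\tknown_resistance_1\t91.2\t280\t24\t1\t1\t280\t3\t282\t3e-120\t430",
])


def best_virulence_hits(tabular_text, min_identity=90.0, max_evalue=1e-10):
    """Return {query: (subject, bitscore)} for the best hit passing both cutoffs."""
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity = float(fields[2])
        evalue = float(fields[10])
        bitscore = float(fields[11])
        if identity < min_identity or evalue > max_evalue:
            continue  # hit is too weak under the assumed cutoffs
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


hits = best_virulence_hits(SAMPLE_HITS)
for query, (subject, bitscore) in sorted(hits.items()):
    print(query, subject, bitscore)
```

The cutoffs are assumptions; in practice they would depend on how closely related the novel strain is to its near neighbors.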
Slide 14
Compare novel strains to what?
• Very few manual annotations available for prokaryotes,
especially in public databases like NCBI and UniProt
Table 1. Percentage of genes in UniProt with functional
assignments to Gene Ontology terms based on
experimental evidence in the primary literature.
“Curated information from the literature serves as the
gold-standard data set for comparative analyses”
- Nature, Sep 10, 2008
Use BRCs!
Slide 15
BRC annotations
• Genome annotations should have references and
evidence codes signifying whether annotations were
produced experimentally or computationally
3.8% of Y. pestis CO92 with manual annotations
Slide 16
Y. pestis CO92 annotations at ERIC
Tables 1 and 2. Sequence features and coding sequence
annotations for Y. pestis CO92 at ERIC
Slide 17
Yersinia antibiotic resistance genes
Tables 1 and 2. Antibiotic resistance genes found using Swiss-Prot
keyword search ‘antibiotic resistance’ in UniProt and using GO term
search ‘response to antibiotic’ in ERIC.
Only one gene in common!
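The overlap between two search results is a set intersection. A minimal sketch follows; the locus tags are hypothetical placeholders, not the actual Yersinia genes from the two tables.

```python
# Hypothetical locus tags standing in for the two result sets on the slide.
uniprot_keyword_hits = {"locus_0001", "locus_0002", "locus_0003"}  # Swiss-Prot keyword search
eric_go_hits = {"locus_0003", "locus_0004"}                        # GO term search in ERIC

shared = uniprot_keyword_hits & eric_go_hits        # found by both searches
only_uniprot = uniprot_keyword_hits - eric_go_hits  # missed by the GO term search
only_eric = eric_go_hits - uniprot_keyword_hits     # missed by the keyword search

print("shared:", sorted(shared))
print("UniProt only:", sorted(only_uniprot))
print("ERIC only:", sorted(only_eric))
```

A small intersection relative to either set suggests the two vocabularies are not interchangeable for finding the same genes.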
Slide 18
Vibrio toxins in GO, UniProt, and NMPDR
Slide 19
Virulence genes in metagenomes
• Recent comparison of virulence genes in chicken, cow,
mouse and human gut metagenomes (metavirulomes)
was based on SEED subsystem categories at NMPDR
• Another alternative is to use GO
term mappings to protein family
and domain databases like Pfam
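A sketch of how such a mapping could be read, assuming lines in the layout used by GO's external mapping files (`Pfam:<id> <name> > GO:<term> ; GO:<id>`). The sample lines are illustrative, not copied from a release, and the last one is a made-up placeholder. The choice of "virulence" terms is likewise an assumption taken from the terms named in this talk.

```python
import re

# Illustrative lines in the pfam2go layout; the last entry is a made-up placeholder.
SAMPLE_PFAM2GO = """\
!illustrative sample, not a real release
Pfam:PF00144 Beta-lactamase > GO:response to antibiotic ; GO:0046677
Pfam:PF01289 Thiol_cytolysin > GO:pathogenesis ; GO:0009405
Pfam:PF00000 Example_family > GO:example term ; GO:0000000
"""

LINE = re.compile(r"^Pfam:(PF\d+)\s+(\S+)\s+>\s+GO:(.+?)\s+;\s+(GO:\d+)$")

# GO terms treated as virulence-related here (an assumption for this sketch).
VIRULENCE_TERMS = {"pathogenesis", "response to antibiotic"}


def parse_pfam2go(text):
    """Return {pfam_id: [(go_term_name, go_id), ...]} from mapping-file text."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and header comments
        match = LINE.match(line)
        if match is None:
            continue  # ignore lines that do not fit the expected layout
        pfam_id, _name, term, go_id = match.groups()
        mapping.setdefault(pfam_id, []).append((term, go_id))
    return mapping


def virulence_pfams(mapping):
    """Pfam IDs with at least one mapped term in VIRULENCE_TERMS."""
    return sorted(
        pfam_id
        for pfam_id, terms in mapping.items()
        if any(term in VIRULENCE_TERMS for term, _ in terms)
    )


print(virulence_pfams(parse_pfam2go(SAMPLE_PFAM2GO)))
```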
Slide 20
IMG/metagenomes from JGI
• Select metagenomes and save
Slide 21
Create abundance profiles
• Compare using Pfam, COG, or TIGRfam abundance
profiles
Slide 22
Find virulence genes
• Use GO term mappings to the Pfam database to find
virulence genes

ID | Pfam | Maps to GO term | Air 1 | Air 2 | Soil | Whalefall | Human 7
PF00144 | Beta-lactamase | response to antibiotic | 0.3094 | 0.2349 | 0.2757 | 0.1087 | 0.0191
PF05139 | Erythromycin esterase | response to antibiotic | 0.0041 | 0.0114 | 0.0240 | 0.0000 | 0.0064
PF05223 | NTF2-like N-terminal transpeptidase domain | response to antibiotic | 0.0000 | 0.0000 | 0.0010 | 0.0000 | 0.0000
PF07091 | Ribosomal RNA methyltransferase (FmrO) | pathogenesis | 0.0000 | 0.0000 | 0.0021 | 0.0000 | 0.0000
PF01289 | Thiol-activated cytolysin | pathogenesis | 0.0000 | 0.0000 | 0.0021 | 0.0000 | 0.0000
PF01376 | Heat-labile enterotoxin beta chain | pathogenesis | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000
PF03023 | MviN-like protein | pathogenesis | 0.0247 | 0.0341 | 0.0459 | 0.0225 | 0.1146
PF03077 | Putative vacuolating cytotoxin | pathogenesis | 0.0000 | 0.0000 | 0.0000 | 0.0037 | 0.0000
PF03945 | delta endotoxin, N-terminal domain | pathogenesis | 0.0041 | 0.0000 | 0.0000 | 0.0000 | 0.0000
PF05394 | Avirulence protein | pathogenesis | 0.0000 | 0.0000 | 0.0010 | 0.0000 | 0.0000
PF05480 | Staphylococcus haemolytic protein | pathogenesis | 0.0000 | 0.0000 | 0.0010 | 0.0000 | 0.0000
PF05658 | Hep_Hag | pathogenesis | 0.0289 | 0.0265 | 0.0073 | 0.0112 | 0.0127
PF05662 | Haemagglutinin | pathogenesis | 0.0289 | 0.0379 | 0.0021 | 0.0000 | 0.0191
PF05932 | Tir chaperone protein (CesT) | pathogenesis | 0.0000 | 0.0000 | 0.0010 | 0.0000 | 0.0000
PF07269 | T-complex transport apparatus lipoprotein VirB7 | pathogenesis | 0.0000 | 0.0000 | 0.0010 | 0.0000 | 0.0000
PF07675 | Cleaved Adhesin Domain | pathogenesis | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0064
PF07822 | Neurotoxin B-IV-like protein | pathogenesis | 0.0000 | 0.0000 | 0.0010 | 0.0000 | 0.0000
PF09025 | YopR Core | pathogenesis | 0.0000 | 0.0000 | 0.0000 | 0.0037 | 0.0000
PF09207 | Yeast killer toxin | pathogenesis | 0.0000 | 0.0000 | 0.0010 | 0.0000 | 0.0000
PF06414 | Zeta toxin | pathogenesis | 0.0000 | 0.0076 | 0.0010 | 0.0000 | 0.0000
PF06769 | Plasmid encoded toxin Txe | pathogenesis | 0.0041 | 0.0038 | 0.0010 | 0.0037 | 0.0191
PF02794 | RTX toxin acyltransferase family | pathogenesis | 0.0000 | 0.0038 | 0.0000 | 0.0000 | 0.0000
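Given an abundance profile like the one above, a simple question is which virulence-mapped families are detected at all in a given sample. The sketch below uses a few rows of the slide's values, read in the column order Air 1, Air 2, Soil, Whalefall, Human 7 (that order is how the transcript is read here). The zero threshold is an assumption.

```python
SAMPLES = ("Air 1", "Air 2", "Soil", "Whalefall", "Human 7")

# A subset of rows from the slide's abundance table, one value per sample.
ABUNDANCE = {
    "Beta-lactamase": (0.3094, 0.2349, 0.2757, 0.1087, 0.0191),
    "Erythromycin esterase": (0.0041, 0.0114, 0.0240, 0.0000, 0.0064),
    "MviN-like protein": (0.0247, 0.0341, 0.0459, 0.0225, 0.1146),
    "Cleaved Adhesin Domain": (0.0000, 0.0000, 0.0000, 0.0000, 0.0064),
    "YopR Core": (0.0000, 0.0000, 0.0000, 0.0037, 0.0000),
}


def detected(sample, threshold=0.0):
    """Family names whose abundance in `sample` exceeds `threshold`."""
    column = SAMPLES.index(sample)
    return sorted(
        name for name, values in ABUNDANCE.items() if values[column] > threshold
    )


for sample in SAMPLES:
    print(sample, detected(sample))
```

Raising `threshold` would drop families supported by very few reads, which may matter for values as small as 0.0010.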
Slide 23
Need better mappings to virulence genes
• Current GO term mappings miss most virulence-associated genes.
Tables 1 and 2. Pfams and TIGRfams overrepresented in air compared to soil

ID | Pfam
PF00593 | TonB dependent receptor
PF07715 | TonB-dependent Receptor Plug Domain
PF03466 | LysR substrate binding domain
PF00126 | Bacterial regulatory helix-turn-helix protein, lysR family
PF00440 | Bacterial regulatory proteins, tetR family
PF00873 | AcrB/AcrD/AcrF family
PF00015 | Methyl-accepting chemotaxis protein (MCP) signaling domain
PF07992 | Pyridine nucleotide-disulphide oxidoreductase
PF00106 | short chain dehydrogenase
PF01381 | Helix-turn-helix

ID | TIGRfam
TIGR00014 | arsenate reductase (glutaredoxin)
TIGR01297 | cation diffusion facilitator family transporter
TIGR01782 | TonB-dependent receptor
TIGR02606 | putative addiction module antidote protein
TIGR01552 | prevent-host-death family protein
TIGR01435 | putative glutamate--cysteine ligase
TIGR01352 | TonB family C-terminal domain
TIGR01509 | haloacid dehalogenase superfamily
TIGR00093 | pseudouridine synthase family
TIGR02690 | arsenical resistance protein ArsH
[Abundance columns for both tables (Air 1, Air 2, Soil, Whalefall, Human 7). The column alignment did not survive in the transcript; values are grouped ten per line in their original order under the headers as they appeared.]
Air 1, Air 2:
0.90 0.94 0.68 0.48 0.42 0.77 0.29 0.78 0.99 0.31
Air 1, Soil:
0.87 0.95 0.58 0.42 0.43 0.58 0.22 0.79 0.89 0.38
Air 2:
0.40 0.29 0.23 0.25 0.25 0.20 1.01 0.49 0.54 0.11
0.16 0.33 0.16 0.14 0.17 0.43 0.05 0.58 0.74 0.17
Whalefall, Human 7:
0.31 0.00 0.37 0.02 0.52 0.18 0.38 0.27 0.28 0.20
0.64 0.03 0.42 0.00 0.77 0.49 0.58 0.23 0.23 0.64
0.05 0.13 0.01 0.03 0.14 0.00 0.68 0.28 0.28 0.02
Whalefall, Human 7:
0.04 0.00 0.16 0.24 0.00 0.00 0.00 0.00 0.18 0.16
0.22 0.42 0.37 0.03 0.28 0.42 0.18 0.16 0.00 0.00
Soil:
0.30 0.46 0.26 0.28 0.48 0.20 0.74 0.46 0.40 0.28
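The slides do not say how overrepresentation in air relative to soil was measured. One simple possibility, shown here only as an assumption, is a ratio of mean air abundance to soil abundance, with a small pseudocount to avoid division by zero. The example rows reuse Air 1, Air 2, and Soil values from the Pfam abundance table on the "Find virulence genes" slide; the pseudocount and the twofold cutoff are arbitrary choices for illustration.

```python
def air_to_soil_ratio(air1, air2, soil, pseudocount=0.001):
    """Ratio of mean air abundance to soil abundance, stabilised by a pseudocount."""
    mean_air = (air1 + air2) / 2
    return (mean_air + pseudocount) / (soil + pseudocount)


# (Air 1, Air 2, Soil) values taken from the earlier Pfam abundance table.
ROWS = {
    "Beta-lactamase": (0.3094, 0.2349, 0.2757),
    "Hep_Hag": (0.0289, 0.0265, 0.0073),
    "Haemagglutinin": (0.0289, 0.0379, 0.0021),
    "MviN-like protein": (0.0247, 0.0341, 0.0459),
}

MIN_RATIO = 2.0  # assumed cutoff for calling a family overrepresented in air

overrepresented = sorted(
    name for name, values in ROWS.items() if air_to_soil_ratio(*values) >= MIN_RATIO
)

for name in overrepresented:
    print(name, round(air_to_soil_ratio(*ROWS[name]), 2))
```

A ratio like this ignores sampling depth and replicate variance, so it is only a first-pass screen and not a statistical test.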
Slide 24