ppt - Bio-Ontologies 2016


Using Ontology Reasoning to Classify
Protein Phosphatases
K. Wolstencroft, P. Lord, L. Tabernero, A. Brass, R. Stevens
University of Manchester
Introduction
Automated classification of proteins into protein subfamilies
1. Background
2. Architecture
3. Advantages
4. Results
5. Future directions
Motivation
Biological data production is fast
- High-throughput techniques
- Large numbers of species being sequenced
- Large amounts of data remain uncharacterised
Data analysis is now the rate-limiting step
Why Classify?
• Classification and curation of a genome is the
first step in understanding the processes and
functions happening in an organism
• Classification enables comparative genomic
studies, drawing on what is already known in other
organisms
• The similarities and differences between
processes and functions in related organisms
often provide the greatest insight into the biology
Protein Classification
• Proteins are divided into broad functional classes,
“Protein Families”, based on:
- evolutionary relationships
- common domain architecture
• The relationship between sequence and structure
allows searching for distinct structural (and
functional) domains within the sequence
• Domains may be only a few amino acids long, or
may span most of the protein
Example
A search of the linear sequence of receptor-type protein
tyrosine phosphatase kappa (PTPK) identified 9 functional
domains
>uniprot|Q15262|PTPK_HUMAN Receptor-type protein-tyrosine phosphatase kappa precursor (EC 3.1.3.48) (R-PTP-kappa).
MDTTAAAALPAFVALLLLSPWPLLGSAQGQFSAGGCTFDDGPGACDYHQDLYDDFEWVHV
SAQEPHYLPPEMPQGSYMIVDSSDHDPGEKARLQLPTMKENDTHCIDFSYLLYSQKGLNP
GTLNILVRVNKGPLANPIWNVTGFTGRDWLRAELAVSSFWPNEYQVIFEAEVSGGRSGYI
AIDDIQVLSYPCDKSPHFLRLGDVEVNAGQNATFQCIATGRDAVHNKLWLQRRNGEDIPV………..
Protein Family Classification
• Diagnostic domains/motifs often signify family membership,
e.g. ALL proteins with a tyrosine protein kinase-specific active site (IPR008266)
are types of tyrosine kinase
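A rule of this kind can be written as a lookup from a diagnostic InterPro entry to the family it implies. A minimal sketch, assuming a hypothetical dictionary layout and function name; only IPR008266 and the tyrosine kinase family come from the slide.

```python
# Hypothetical rule table: diagnostic InterPro entry -> family it implies.
# Only IPR008266 comes from the slide; the table format is an illustration.
DIAGNOSTIC_RULES = {
    "IPR008266": "tyrosine kinase",  # tyrosine protein kinase-specific active site
}

def families_from_domains(domains):
    """Return the set of families implied by any diagnostic entry present."""
    return {DIAGNOSTIC_RULES[d] for d in domains if d in DIAGNOSTIC_RULES}

print(families_from_domains(["IPR008266"]))  # {'tyrosine kinase'}
print(families_from_domains(["IPR000340"]))  # set()
```

A flat rule table like this captures "one diagnostic domain implies one family" but not rules that depend on the number or order of domains, which is where the ontology approach later in the talk comes in.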
Current Techniques
• Human expert classification
– gold standard
– human knowledge applied to results from
bioinformatics analysis tools
• Automated use of bioinformatics analysis tools
– quick
– less detailed
Automated Methods
Bioinformatics analysis tools
• Top BLAST hit: annotating as ‘similar to’ other
known proteins
- Can result in chains: protein A is similar to protein B,
which is similar to protein C, which is similar to
protein D, and so on
• InterProScan analysis
- shows the number and types of domains, but does
not provide interpretations
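The ‘similar to’ chain problem can be illustrated by following top hits until a characterised protein is reached. A minimal sketch with entirely hypothetical proteins and annotations; it does not call BLAST, it only shows how annotation transferred through top hits can end up several steps away from any direct evidence.

```python
# Hypothetical top-BLAST-hit table: query -> best-scoring hit.
TOP_HIT = {"A": "B", "B": "C", "C": "D"}
# Hypothetical: only D carries a characterised annotation.
CHARACTERISED = {"D": "phosphatase"}

def transfer_annotation(protein):
    """Follow top hits until a characterised protein is reached.

    Returns (annotation, chain), where chain lists every 'similar to' step.
    Returns (None, chain) if no characterised protein is reachable.
    """
    chain = [protein]
    seen = {protein}
    while chain[-1] not in CHARACTERISED:
        nxt = TOP_HIT.get(chain[-1])
        if nxt is None or nxt in seen:  # dead end or cycle
            return None, chain
        chain.append(nxt)
        seen.add(nxt)
    return CHARACTERISED[chain[-1]], chain

label, chain = transfer_annotation("A")
print(label, " -> ".join(chain))  # phosphatase A -> B -> C -> D
```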
Human Expert Annotation
• The same similarity-searching tools are used for
domain/motif identification
• Humans use expert knowledge to classify
proteins according to domain arrangements
The presence, order and number of each domain are important
Can an ontology be used to capture this
knowledge to the standard of a human
annotator?
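The three features a human annotator looks at (presence, order, number) can all be derived from a list of domain hits. A minimal sketch, assuming each hit is a hypothetical (start position, domain label) tuple rather than any real InterProScan output format:

```python
from collections import Counter

def architecture_features(hits):
    """Summarise a domain architecture from (start, domain_label) hits.

    Returns presence (set of labels), order (labels sorted by start
    position) and number (Counter of label occurrences).
    """
    ordered = [label for _, label in sorted(hits)]
    return set(ordered), ordered, Counter(ordered)

# Hypothetical hits for a receptor phosphatase with two PTP domains.
hits = [(900, "PTP"), (780, "TM"), (1200, "PTP"), (300, "FN3")]
presence, order, number = architecture_features(hits)
print(order)          # ['FN3', 'TM', 'PTP', 'PTP']
print(number["PTP"])  # 2
```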
Ontology Approach
• Use an ontology to capture the ‘rules’ for protein
family membership in a formal OWL
representation
• The ontology contains the human expert knowledge
• Ontology reasoning can take the place of human
analysis of the data
The Protein Phosphatases
• Large superfamily of proteins involved in the
removal of phosphate groups from molecules
• Important proteins in almost all cellular
processes
• Involved in diseases such as diabetes and cancer
• Human phosphatases are well characterised
Phosphatase Functional Domains
Andersen et al. (2001) Mol. Cell. Biol. 21:7117-36
Determining Class Definitions
R5
- Contains 2 protein tyrosine phosphatase domains
- Contains 1 transmembrane domain
- Contains 1 fibronectin domain
- Contains 1 carbonic anhydrase domain
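The R5 definition is a set of exact counts, so checking a protein against it amounts to comparing domain counts. A minimal sketch, assuming hypothetical short labels (PTP, TM, FN3, CA) for the four domain types; it is a plain counting stand-in, not the OWL cardinality reasoning used in the actual system.

```python
from collections import Counter

# Hypothetical encoding of the R5 definition as exact domain counts.
R5_DEFINITION = {"PTP": 2, "TM": 1, "FN3": 1, "CA": 1}

def satisfies_definition(domain_hits, definition):
    """True if each domain type in `definition` occurs exactly the stated
    number of times. Domain types the definition does not mention are not
    checked, since the definition says nothing about them."""
    counts = Counter(domain_hits)
    return all(counts[d] == n for d, n in definition.items())

print(satisfies_definition(["CA", "FN3", "TM", "PTP", "PTP"], R5_DEFINITION))  # True
print(satisfies_definition(["CA", "FN3", "TM", "PTP"], R5_DEFINITION))         # False
```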
Protégé OWL Modelling
Requirements
• Extract phosphatase sequences from the rest of the protein
sequences in a whole genome
• Identify the domains present in each
• Compare these sequences to the formal ontology
descriptions
• Classify each protein instance to a place in the hierarchy
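The four requirements form a single pass over a proteome. A minimal sketch in which every step is a hypothetical stand-in passed in as a function: the toy phosphatase check uses the well-known CX5R active-site core as a regular expression, the domain finder and classifier are placeholders, and none of this reproduces the myGrid services or the reasoner.

```python
import re
from typing import Callable, Dict, List

def run_pipeline(
    proteome: Dict[str, str],
    is_phosphatase: Callable[[str], bool],
    find_domains: Callable[[str], List[str]],
    classify: Callable[[List[str]], str],
) -> Dict[str, str]:
    """One pass over a proteome covering the four requirements."""
    results = {}
    for protein_id, sequence in proteome.items():
        if not is_phosphatase(sequence):         # 1. keep phosphatases only
            continue
        domains = find_domains(sequence)          # 2. identify domains
        results[protein_id] = classify(domains)   # 3-4. compare and place in hierarchy
    return results

# Toy stand-ins (all hypothetical).
proteome = {"p1": "AAHCSAGVGRTGAA", "p2": "MKKLLAA"}
result = run_pipeline(
    proteome,
    is_phosphatase=lambda s: re.search(r"C.{5}R", s) is not None,
    find_domains=lambda s: ["PTP"],
    classify=lambda d: "PTP" if "PTP" in d else "unclassified",
)
print(result)  # {'p1': 'PTP'}
```

Passing the steps in as functions keeps the sketch agnostic about how each one is implemented, mirroring the way the architecture delegates each step to a separate service.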
Architecture
Raw protein sequences → myGrid Services → Instance Store → Classified Protein Phosphatases
(the Instance Store draws on the OWL DL ontology and the Reasoner (racer))
myGrid Services
• Extract protein phosphatase sequences from a whole
genome using simple filtering
– the EMBOSS tool patmatdb is used to extract proteins with
phosphatase diagnostic motifs
• Perform InterProScan to determine the domain architecture
• Transform the InterProScan results into abstract OWL
instance descriptions
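patmatdb searches sequences with a PROSITE-style motif. The converter below is a simplified stand-in showing how such a pattern can be turned into a regular expression and used as a filter; it is not patmatdb's implementation, it ignores anchors and other PROSITE extensions, and the example pattern (modelled on the PTP active-site region) and sequences are illustrative assumptions.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Convert a simplified PROSITE-style pattern to a Python regex.

    Handles '-' separators, x (any residue), [..] (allowed set),
    {..} (excluded set) and repeats like x(2) or x(2,4).
    Anchors '<' and '>' are not handled.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            regex = "."
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"
        else:
            regex = core  # single residue or [..] set is already valid regex
        if lo and hi:
            regex += "{%s,%s}" % (lo, hi)
        elif lo:
            regex += "{%s}" % lo
        parts.append(regex)
    return "".join(parts)

# Illustrative pattern and hypothetical sequences.
motif = re.compile(prosite_to_regex("[LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY]"))
proteome = {"p1": "MKVHCSAGVGRSGTFLL", "p2": "MKKLLAAEEG"}
print([pid for pid, seq in proteome.items() if motif.search(seq)])  # ['p1']
```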
InterProScan Results
Conversion to abstract OWL format
restriction(<http://www.owlontologies.com/unnamed.owl#containsDomainIPR000340> cardinality(1))
restriction(<http://www.owlontologies.com/unnamed.owl#containsDomainIPR001763> cardinality(1))
restriction(<http://www.owlontologies.com/unnamed.owl#containsDomainIPR000387> cardinality(1))
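The conversion shown above can be sketched as counting InterPro hits per entry and emitting one cardinality restriction per distinct entry. A minimal sketch, assuming the input is simply a list of InterPro IDs (one per hit); the function name is hypothetical, the output is sorted by ID rather than in the slide's order, and only the restriction string format and base URI come from the slide.

```python
from collections import Counter

BASE = "http://www.owlontologies.com/unnamed.owl#containsDomain"

def to_abstract_owl(interpro_ids):
    """Turn a list of InterPro hits (one entry per hit) into abstract-syntax
    cardinality restrictions, one per distinct InterPro entry."""
    counts = Counter(interpro_ids)
    return [
        "restriction(<%s%s> cardinality(%d))" % (BASE, ipr, n)
        for ipr, n in sorted(counts.items())
    ]

for line in to_abstract_owl(["IPR000340", "IPR001763", "IPR000387"]):
    print(line)
```

Counting before emitting means a protein with two copies of a domain gets `cardinality(2)`, which is what definitions such as R5 (two protein tyrosine phosphatase domains) need to match against.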
Instance Store
• The Instance Store enables reasoning over
individuals
• It can support much larger numbers of individuals
than reasoning over them directly in a DL reasoner
• The OWL ontology is loaded into the Instance Store
• A DL reasoner (racer) is used to compare
individuals to the OWL ontology definitions
Example Instances
• Protein Individual: Dual Specificity Phosphatase DUSE
restriction(<http://www.owlontologies.com/unnamed.owl#containsDomainIPR000340> cardinality(1))
restriction(<http://www.owlontologies.com/unnamed.owl#containsDomainIPR000387> cardinality(1))
• Ontology Definition of Dual Specificity Phosphatase
containsDomain IPR000340: Necessary and Sufficient for class membership
Also inherits containsDomain IPR000387 from Parent Class PTP
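The example combines a class's own necessary-and-sufficient condition with the condition inherited from its parent. A minimal sketch of that idea using set containment; the dictionary hierarchy and function names are hypothetical, domain presence stands in for the `cardinality(1)` restrictions, and this is a simplified substitute for racer, not DL reasoning.

```python
# Hypothetical in-memory hierarchy mirroring the slide's example.
CLASSES = {
    # class: (parent, domains required at this level)
    "PTP": (None, {"IPR000387"}),
    "DualSpecificityPhosphatase": ("PTP", {"IPR000340"}),
}

def full_definition(cls):
    """Own conditions plus everything inherited from ancestors."""
    parent, own = CLASSES[cls]
    return own | (full_definition(parent) if parent else set())

def most_specific_classes(domains):
    """Classes whose full definition is satisfied, minus any that are
    ancestors of another satisfied class."""
    satisfied = {c for c in CLASSES if full_definition(c) <= set(domains)}
    ancestors = set()
    for c in satisfied:
        parent = CLASSES[c][0]
        while parent:
            ancestors.add(parent)
            parent = CLASSES[parent][0]
    return satisfied - ancestors

print(most_specific_classes(["IPR000340", "IPR000387"]))  # {'DualSpecificityPhosphatase'}
print(most_specific_classes(["IPR000387"]))               # {'PTP'}
```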
Results
• Human phosphatases have been classified
using the system
• The ontology classification performed as well
as expert classification
• The ontology system refined the classification
- DUSC contains a zinc finger domain: characterised
and conserved, but not reflected in the classification
- DUSA contains a disintegrin domain: previously
uncharacterised, and evolutionarily conserved
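Refinements like these come from domains a protein has beyond those its class definition requires. A minimal sketch of flagging such extras with a set difference; the domain labels are hypothetical placeholders, and the check is a simplification of how the system highlights anomalies.

```python
def unexplained_domains(protein_domains, class_definition):
    """Domains found in a protein that its assigned class does not mention.

    Such extra domains are candidates for refining the classification.
    """
    return sorted(set(protein_domains) - set(class_definition))

# Hypothetical labels: a dual specificity phosphatase with an extra zinc finger.
print(unexplained_domains(["DSP_catalytic", "PTP", "zinc_finger"],
                          {"DSP_catalytic", "PTP"}))  # ['zinc_finger']
```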
Aspergillus fumigatus
• The phosphatase complement is very different from human:
>100 in human, <50 in A. fumigatus
• Whole subfamilies ‘missing’
Different fungi-specific phosphorylation pathways?
No requirement for tissue-specific variations?
• Novel serine/threonine phosphatase with a
homeobox, conserved in Aspergillus and closely related
species but not in any others: possibly linked to virulence
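Once two genomes have been classified, ‘missing’ subfamilies fall out of a comparison of subfamily assignments. A minimal sketch, assuming each genome is a hypothetical dictionary from protein ID to assigned subfamily; the subfamily names and counts below are placeholders, not results.

```python
from collections import Counter

def compare_subfamilies(genome_a, genome_b):
    """Report subfamilies present in one genome but absent from the other,
    plus per-subfamily counts as (count_in_a, count_in_b)."""
    a, b = Counter(genome_a.values()), Counter(genome_b.values())
    return {
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
        "counts": {s: (a[s], b[s]) for s in sorted(set(a) | set(b))},
    }

# Hypothetical assignments.
human = {"h1": "fam1", "h2": "fam2", "h3": "fam2"}
fungus = {"f1": "fam2", "f2": "fam3"}
report = compare_subfamilies(human, fungus)
print(report["only_in_a"])  # ['fam1']
print(report["counts"])     # {'fam1': (1, 0), 'fam2': (2, 1), 'fam3': (0, 1)}
```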
Ongoing Work
• Phosphatases in other genomes
– Trypanosomes
– Plasmodium falciparum
• Other protein families
– Ion Channels
– ABC transporters
– Nuclear receptors
Conclusions
• Using an ontology allows automated classification
to reach the standard of human expert
annotation
• Reasoning capabilities allow interpretation of
domain organisation
• Highlights anomalies and variations from what is
known
• Allows fast, efficient comparative genomics
studies
Acknowledgements
PhD Supervisors: Andy Brass, Robert Stevens
Group: myGrid, Phil Lord, Carole Goble
Phosphatase Biologist: Lydia Tabernero
Medical Research Council