InterPro Presentation - European Bioinformatics Institute

Duncan Legge
EMBL-EBI
Introduction to Protein Signatures & InterPro
Introduction to InterPro
http://www.ebi.ac.uk/interpro
Protein Signatures
Protein Signature = an amino acid sequence (not necessarily consecutive) associated with a protein characteristic.
Foundations of InterPro
• Integration of signatures
• Manual curation
InterPro Consortium
• Consortium of 11 major signature databases
What value are signatures?
• Better at finding proteins with common function
  - Find more distant homologues than BLAST
• Classification of proteins
  - Associate proteins that share: function, domains, sequence, structure
• Annotation of protein sequences
  - Define conserved regions of a protein, e.g. location and type of domains, key structural or functional sites
Protein Signature Methods
How are protein signatures made?
1) Protein family/domain
2) Multiple sequence alignment
3) Build model
4) Search
5) Significant matches
6) Refine
7) Protein signature
ITWKGPVCGLDGKTYRNECALL   E-value 1e-49
AVPRSPVCGSDDVTYANECELK   E-value 3e-42
SVPRSPVCGSDGVTYGTECDLK   E-value 5e-39
HPPPGPVCGTDGLTYDNRCELR   E-value 6e-10
Types of Protein signatures (sequence based)
Multiple protein alignment
Single motif methods
Regular expression patterns
C - C - {P} - x(2) - C - [STDNEKPI] - C
  C = must be this residue
  x = any AA
  ( ) = number of AAs
  { } = cannot be any of
  [ ] = any of
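The pattern notation above maps closely onto ordinary regular expressions. The following is a minimal sketch of that mapping in Python; the function name `prosite_to_regex` is hypothetical, and only the elements explained on the slide (literal residues, `x`, `( )`, `{ }`, `[ ]`) are handled, so other PROSITE syntax such as anchors is left out.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Handles only: literal residues, x (any residue), (n) or (n,m) repeat
    counts, {..} forbidden residues and [..] allowed residues.
    """
    parts = []
    for element in pattern.replace(" ", "").split("-"):
        # Separate the element from an optional repeat count, e.g. x(2).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, count = match.group(1), match.group(2)
        if core == "x":
            regex = "."                        # any amino acid
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"    # cannot be any of these
        else:
            regex = core                       # literal residue or [..] set
        if count:
            regex += "{" + count + "}"         # number of residues
        parts.append(regex)
    return "".join(parts)


slide_pattern = "C - C - {P} - x(2) - C - [STDNEKPI] - C"
slide_regex = prosite_to_regex(slide_pattern)
```

With this sketch, `re.search(slide_regex, sequence)` finds the motif in a sequence such as `AAACCGAVCSCKK` (an invented test string), while a proline in the third position, as in `CCPAVCSC`, prevents a match.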
Multiple motif methods
Identity matrices
Fingerprints
Full domain alignment methods
Profiles (Profile Library)
Hidden Markov Models
[Figure: HMM state diagram with states labelled M1-M4, I1-I3 and D2-D3]
Mathematical model of amino acid probability
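To make "mathematical model of amino acid probability" concrete, here is a deliberately simplified sketch using the four aligned sequences shown earlier in the deck. It estimates a per-column amino acid probability table and scores a sequence against it. This is an assumption-laden illustration, not how any member database builds its models: a real profile HMM also has insert and delete states and uses more careful priors, and the pseudocount, uniform background and function names here are all hypothetical choices.

```python
import math
from collections import Counter

# The four aligned sequences from the alignment slide.
ALIGNMENT = [
    "ITWKGPVCGLDGKTYRNECALL",
    "AVPRSPVCGSDDVTYANECELK",
    "SVPRSPVCGSDGVTYGTECDLK",
    "HPPPGPVCGTDGLTYDNRCELR",
]
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Return one {residue: probability} table per alignment column."""
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


def log_odds_score(sequence, profile, background=1.0 / len(AMINO_ACIDS)):
    """Sum log2(column probability / background) over aligned positions."""
    return sum(
        math.log2(column[aa] / background)
        for aa, column in zip(sequence, profile)
    )


profile = build_profile(ALIGNMENT)
member_score = log_odds_score(ALIGNMENT[0], profile)
unrelated_score = log_odds_score("M" * len(profile), profile)
```

A sequence from the alignment scores above zero, while a sequence made only of a residue that never occurs in the alignment scores below zero. That difference is the basic idea behind separating significant matches from background.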
CONTRIBUTING MEMBER DATABASES
Models built on either sequence or structural alignments
Each MDB has its own focus
[Diagram labels: Hidden Markov Models; Fingerprints; Profiles; Patterns; Sequence Clusters; Structural Domains; Protein features (active sites…); Functional annotation of families/domains; Prediction of conserved domains]
Database | Basis | Institution | Built from | Focus | URL
Pfam | HMM | Sanger Institute | Sequence alignment | Family & Domain based on conserved sequence | http://pfam.sanger.ac.uk/
Gene3D | HMM | UCL | Structure alignment | Structural Domain | http://gene3d.biochem.ucl.ac.uk/Gene3D/
Superfamily | HMM | Uni. of Bristol | Structure alignment | Evolutionary domain relationships | http://supfam.cs.bris.ac.uk/SUPERFAMILY/
SMART | HMM | EMBL Heidelberg | Sequence alignment | Functional domain annotation | http://smart.embl-heidelberg.de/
TIGRFAM | HMM | J. Craig Venter Inst. | Sequence alignment | Microbial Functional Family Classification | http://www.jcvi.org/cms/research/projects/tigrfams/overview/
Panther | HMM | Uni. S. California | Sequence alignment | Family functional classification | http://www.pantherdb.org/
PIRSF | HMM | PIR, Georgetown, Washington D.C. | Sequence alignment | Functional classification | http://pir.georgetown.edu/pirwww/dbinfo/pirsf.shtml
PRINTS | Fingerprints | Uni. of Manchester | Sequence alignment | Family functional classification | http://www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/index.php
PROSITE | Patterns & Profiles | SIB | Sequence alignment | Functional annotation | http://expasy.org/prosite/
HAMAP | Profiles | SIB | Sequence alignment | Microbial protein family classification | http://expasy.org/sprot/hamap/
A Closer look at InterPro
Foundations of InterPro
• Integration of signatures
• Manual curation
InterPro Curation Principles
- To represent MDB signatures as closely as possible to what the MDBs intended
- To reflect biological reality as accurately as possible in the entries we create, by using types, relationships and GO mapping
- To provide the end user with as much information as possible about each signature, by annotating signatures and providing links to other databases
InterPro Entry
• Links related signatures
• Groups similar signatures together
• Adds extensive annotation
• Linked to other databases
• Structural information and viewers
Link related signatures - relationships
1) Parent - Child (subgroup of more closely related proteins)
Applies to domains and families
[Figure: Protein kinase (100), Serine kinase (75) and Tyrosine kinase (25) signatures from PFAM, SMART and PROSITE]
Parent: Protein kinase (PFAM, SMART, PROSITE)
Children: Serine kinase (SMART), Tyrosine kinase (PROSITE)
No proteins in common
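The parent-child rule can be read as a statement about sets of matched proteins. The sketch below uses Python sets for this and takes the slide's numbers (100, 75, 25) as counts of matched proteins, which is an assumption. The protein identifiers and function names are hypothetical, and real curation weighs more than set overlap alone.

```python
# Hypothetical protein identifiers, not real accessions.
protein_kinase = {f"P{i:03d}" for i in range(100)}        # parent: 100 proteins
serine_kinase = {f"P{i:03d}" for i in range(75)}          # child: 75 of them
tyrosine_kinase = {f"P{i:03d}" for i in range(75, 100)}   # child: the other 25


def is_child_of(child: set, parent: set) -> bool:
    """A child matches a proper subset of the proteins its parent matches."""
    return child < parent


def are_siblings(a: set, b: set, parent: set) -> bool:
    """Two children of one parent that have no proteins in common."""
    return is_child_of(a, parent) and is_child_of(b, parent) and a.isdisjoint(b)


serine_is_child = is_child_of(serine_kinase, protein_kinase)
kinases_are_siblings = are_siblings(serine_kinase, tyrosine_kinase, protein_kinase)
```

Using a proper subset (`<`) rather than `<=` reflects the slide's wording that a child is a subgroup of more closely related proteins, not a duplicate of the parent.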
The InterPro entry types
Family: proteins share a common evolutionary origin, as reflected in their related functions, sequences or structure
Domain: biological units with defined boundaries
Repeat: short sequences typically repeated within a protein
Sites: PTM, Active Site, Binding Site, Conserved Site
Searching InterPro
• protein ID
• Paste in unknown sequence
InterPro Search Results
• Family
• Link to PDBe
• Domains and sites
• Unintegrated signatures
• Structural data
• Link to InterPro entry
• Links to signature databases
https://www.ebi.ac.uk/Tools/pfa/iprscan/
Select member databases
Caveats
InterPro entries are based on signatures supplied to us by our member databases
• ...this means no signature, no entry!
We need your feedback!
• missing/additional references
• reporting problems
• requests
ACKNOWLEDGEMENTS
InterPro Team:
Sarah Hunter
Phil Jones
Siew-Yit Yong
Alex Mitchell
Amaia Sangrador
Craig McAnulla
Matthew Fraser
Maxim Scheremetjew
Sebastien Pesseat