Pfam - University of North Carolina at Charlotte

Pfam (Protein families)
Pfam 27.0
(March 2013, 14831 families)
http://pfam.sanger.ac.uk/
Protein family
• A protein family is a group of evolutionarily related proteins.
• Proteins in a family descend from a common ancestor
(homology) and typically have similar three-dimensional
structures, functions, and significant sequence similarity.
While it is difficult to evaluate the significance of functional
or structural similarity, there is a fairly well-developed
framework for evaluating the significance of similarity
among a group of sequences using sequence
alignment methods.
• Proteins that do not share a common ancestor are very
unlikely to show statistically significant sequence similarity,
making sequence alignment a powerful tool for identifying
the members of protein families.
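One simple way to make "statistically significant sequence similarity" concrete is a shuffle test: score an alignment, then compare that score with the scores obtained after randomly shuffling one of the sequences. The sketch below is only an illustration under stated assumptions. The scoring values (match +1, mismatch -1, gap -2), the function names, and the number of shuffles are arbitrary choices made here; real tools typically use substitution matrices and analytical score statistics instead.

```python
import random


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score (score only, no traceback).

    The match/mismatch/gap values are illustrative assumptions.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def shuffle_p_value(a, b, n_shuffles=200, seed=0):
    """Empirical p-value: how often does a shuffled b score at least as well?"""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = global_alignment_score(a, b)
    residues = list(b)
    at_least_as_good = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # keeps composition, destroys residue order
        if global_alignment_score(a, "".join(residues)) >= observed:
            at_least_as_good += 1
    # add-one correction so the estimate is never exactly zero
    return observed, (1 + at_least_as_good) / (1 + n_shuffles)
```

Shuffling preserves the amino acid composition of the sequence, so a low p-value suggests that the residue order, not just the composition, is responsible for the observed score.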
Superfamily – family – subfamily
• A common usage is that superfamilies contain
families, which in turn contain subfamilies.
• Many proteins comprise multiple independent
structural and functional units or domains. Due
to evolutionary shuffling, different domains in a
protein have evolved independently. This has led,
in recent years, to a focus on families of protein
domains. A number of online resources are
devoted to identifying and cataloging such
domains.
Superfamily – family – subfamily
• Superfamily: The domains in a fold are grouped into
superfamilies, which have at least a distant common
ancestor.
• Family: The domains in a superfamily are grouped into
families, which have a more recent common ancestor.
• Protein domain: The domains in families are grouped
into protein domains, which are essentially the same
protein.
• Species: The domains in "protein domains" are
grouped according to species.
• Domain: part of a protein.
An example
The human cyclophilin family, as
represented by the structures of the
isomerase domains of some of its
members.
Protein family resources
• There are many biological databases that
record examples of protein families and
allow users to identify if newly identified
proteins belong to a known family. Here
are a few examples:
• Pfam - Protein families database of alignments and
HMMs
• PROSITE - Database of protein domains, families and
functional sites
• PIRSF - SuperFamily Classification System
• PASS2 - Protein Alignment as Structural Superfamilies
v2
• SUPERFAMILY - Library of HMMs representing
superfamilies and database of (superfamily and family)
annotations for all completely sequenced organisms
• SCOP and CATH - classifications of protein structures
into superfamilies, families and domains
Pfam
• The Pfam database is a large collection of
protein families, each represented by multiple
sequence alignments and hidden Markov models
(HMMs).
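To give a feel for how a multiple sequence alignment can be turned into a scoring model, the sketch below builds a position-specific log-odds profile from a tiny gap-free alignment and scores candidate sequences against it. This is a deliberate simplification and not the Pfam method: a profile HMM also models insertions and deletions, whereas this toy version only has the equivalent of match states. The uniform background frequencies, the pseudocount, the function names, and the example sequences are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Build per-column log-odds scores from a gap-free alignment.

    Assumes a uniform background (1/20 per residue), which is a
    simplification chosen for this sketch.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score_sequence(profile, seq):
    """Sum the log-odds scores of seq against the profile, column by column."""
    if len(seq) != len(profile):
        raise ValueError("sequence length must match profile length")
    return sum(column[aa] for column, aa in zip(profile, seq))


# Made-up toy alignment, not taken from any real family
toy_alignment = ["ACDK", "ACEK", "ACDR"]
toy_profile = build_profile(toy_alignment)
member_score = score_sequence(toy_profile, "ACDK")
unrelated_score = score_sequence(toy_profile, "WWWW")
```

A sequence that resembles the aligned members gets a positive score, while one built from residues never seen in the alignment gets a negative score. Searching with a family model follows the same idea, with a score threshold deciding membership.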
SUPERFAMILY
• SUPERFAMILY is a database of structural and
functional annotation for all proteins and
genomes.
• The SUPERFAMILY annotation is based on a
collection of hidden Markov models, which
represent structural protein domains at
the SCOP superfamily level.
• The Structural Classification of Proteins
(SCOP) database is a largely manual
classification of protein structural domains based
on similarities of their structures and amino
acid sequences.
SUPERFAMILY
• SUPERFAMILY classifies amino acid
sequences into known structural domains,
especially into SCOP superfamilies.
Superfamilies are groups of proteins that
have structural evidence to support a
common evolutionary ancestor but may
not have detectable sequence similarity.
SUPERFAMILY