tutorial9_12

Tutorial 9
Protein and Function Databases
-UniProt - SwissProt/TrEMBL
-PROSITE
-Pfam
-Gene Ontology
-DAVID
Glossary
Domain
A structural unit which can be found in multiple
protein contexts.
Repeat
A short unit which is unstable in isolation but forms a
stable structure when multiple copies are present.
Family
A collection of related proteins.
UniProt
http://www.uniprot.org/
The Universal Protein Resource (UniProt) is a
central repository of protein sequences,
functions, classifications and cross-references.
It was created by merging the information
contained in Swiss-Prot and TrEMBL.
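The Swiss-Prot/TrEMBL split is visible in the FASTA headers of downloaded sequences: reviewed (Swiss-Prot) entries start with "sp" and unreviewed (TrEMBL) entries with "tr". The following is a minimal sketch of reading such a header. The function name and the example header line are illustrative assumptions, not part of the tutorial.

```python
import re

# Header layout assumed here: >db|ACCESSION|ENTRY_NAME Description OS=Organism ...
HEADER_RE = re.compile(r"^>(sp|tr)\|([A-Z0-9-]+)\|(\S+)\s+(.*)$")


def parse_uniprot_header(header):
    """Split a UniProt-style FASTA header into its main fields."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError("not a UniProt-style FASTA header: " + header)
    db, accession, entry_name, rest = match.groups()
    # The free-text description runs up to the first " OS=" field.
    description = rest.split(" OS=", 1)[0]
    # The organism name runs from "OS=" up to the next KEY= field (or line end).
    os_match = re.search(r"OS=(.+?)(?= OX=| GN=| PE=| SV=|$)", rest)
    return {
        "reviewed": db == "sp",  # "sp" = Swiss-Prot, "tr" = TrEMBL
        "accession": accession,
        "entry_name": entry_name,
        "description": description,
        "organism": os_match.group(1) if os_match else None,
    }


# Illustrative header for human alpha-synuclein.
example_header = (
    ">sp|P37840|SYUA_HUMAN Alpha-synuclein OS=Homo sapiens "
    "OX=9606 GN=SNCA PE=1 SV=1"
)
record = parse_uniprot_header(example_header)
print(record)
```

The returned dictionary carries the same fields that the UniProt result table shows (accession number, status, organism), so it can be used to filter a downloaded FASTA file for reviewed entries only.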
UniProt input: protein search, reviewed protein, sequence download
UniProt output: accession number, protein status, organism, length
Information for one protein: general information, general annotations, keywords, GO annotation (MF, BP, CC), alternative splicing isoforms, features in the sequence, sequences, references
Alignment for two or more proteins: MSA, BLAST
Pfam
• http://pfam.sanger.ac.uk/
• Pfam is a database of multiple alignments of
protein domains or conserved protein
regions.
What kind of domains can we find in Pfam?
Trusted Domains
Repeats
Fragment Domains
Nested Domains
Disulfide bonds
Important residues
(e.g. active sites)
Transmembrane domains
Context domains: domains that do not score above the
family threshold but are expected to be real, based on the other
domains found in the protein.
Signal peptides:
(indicate a protein that will be secreted)
Low complexity regions
Coiled coils:
(two or three alpha helices that wind around each other)
Pfam input
Screenshot labels: domains, domain range and score, description, structure info, Gene Ontology, links
PROSITE
• http://www.expasy.org/tools/scanprosite
• PROSITE is a database of protein domains and
motifs that can be searched using either regular
expression patterns or sequence profiles.
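To illustrate what a regular expression pattern means here, the sketch below translates PROSITE pattern syntax into a Python regular expression and scans a sequence with it. The pattern N-{P}-[ST]-{P} is the PROSITE N-glycosylation site pattern; the helper name and the test sequence are made up for illustration, and the translator only covers the basic syntax (x, [..], {..}, repeat counts, and the < and > anchors).

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a Python regex string."""
    pattern = pattern.rstrip(".")  # PROSITE patterns may end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored to the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored to the C-terminus
            suffix, element = "$", element[:-1]
        # Separate the residue part from an optional count such as (3) or (2,4).
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":  # any residue
            core_re = "."
        elif core.startswith("{") and core.endswith("}"):  # excluded residues
            core_re = "[^" + core[1:-1] + "]"
        else:  # a single residue or an [..] set, already valid regex
            core_re = core
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        parts.append(prefix + core_re + repeat + suffix)
    return "".join(parts)


glyco_regex = prosite_to_regex("N-{P}-[ST]-{P}.")
sequence = "AANGSAANPSA"  # made-up test sequence
hits = [(m.start(), m.group()) for m in re.finditer(glyco_regex, sequence)]
print(glyco_regex, hits)
```

Note that `re.finditer` reports non-overlapping matches only, so overlapping motif occurrences would need a lookahead-based scan instead.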
Search Results
Domains architecture
Gene Ontology (GO)
http://www.geneontology.org/
• It is a database of biological processes,
molecular functions and cellular components.
• GO does not contain sequence information or gene
or protein descriptions.
• GO is linked to gene and protein databases.
• The GO database is structured as a hierarchy of terms
Search by AmiGO
Three principal branches
http://www.geneontology.org/amigo/
GO structure is a Directed Acyclic Graph
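The practical difference between a directed acyclic graph and a tree is that a term may have more than one parent. A minimal sketch with a toy graph follows; the term names are hypothetical placeholders, not real GO identifiers.

```python
# Toy child -> parents mapping; term names are hypothetical placeholders.
parents = {
    "root": [],
    "term_b": ["root"],
    "term_c": ["root"],
    "term_d": ["term_b", "term_c"],  # two parents: allowed in a DAG, not in a tree
}


def ancestors(term, parents):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


term_d_ancestors = ancestors("term_d", parents)
print(sorted(term_d_ancestors))
```

Because of the shared ancestry, a gene annotated with term_d is implicitly annotated with all of its ancestors through both paths.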
GO sources
ISS   Inferred from Sequence/Structural Similarity
IDA   Inferred from Direct Assay
IPI   Inferred from Physical Interaction
TAS   Traceable Author Statement
NAS   Non-traceable Author Statement
IMP   Inferred from Mutant Phenotype
IGI   Inferred from Genetic Interaction
IEP   Inferred from Expression Pattern
IC    Inferred by Curator
ND    No Data available
IEA   Inferred from Electronic Annotation
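Evidence codes are commonly used to filter annotations, for example to keep only those backed by experiments and to drop purely electronic ones (IEA). The sketch below assumes annotations are held as (gene, term, evidence code) tuples; the gene and term names are hypothetical placeholders.

```python
EVIDENCE_CODES = {
    "ISS": "Inferred from Sequence/Structural Similarity",
    "IDA": "Inferred from Direct Assay",
    "IPI": "Inferred from Physical Interaction",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IEP": "Inferred from Expression Pattern",
    "IC": "Inferred by Curator",
    "ND": "No Data available",
    "IEA": "Inferred from Electronic Annotation",
}

# Codes from the list above that are backed by a direct experiment.
EXPERIMENTAL = {"IDA", "IPI", "IMP", "IGI", "IEP"}


def filter_by_evidence(annotations, allowed):
    """Keep only annotations whose evidence code is in `allowed`."""
    return [a for a in annotations if a[2] in allowed]


# Hypothetical annotations: (gene, GO term, evidence code).
annotations = [
    ("gene_a", "term_1", "IDA"),
    ("gene_a", "term_2", "IEA"),
    ("gene_b", "term_1", "ISS"),
    ("gene_c", "term_3", "IMP"),
]
experimental_only = filter_by_evidence(annotations, EXPERIMENTAL)
print(experimental_only)
```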
Results for alpha-synuclein
DAVID
Functional Annotation Bioinformatics Microarray Analysis
• Identify enriched biological themes, particularly GO
terms
• Discover enriched functional-related gene/protein
groups
• Cluster redundant annotation terms
• Explore gene names in batch
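The idea behind "enriched" terms can be sketched with a one-sided hypergeometric test: given how many background genes carry a term, how surprising is the number of genes in the submitted list that carry it? This is a simplified illustration with made-up numbers, not DAVID's own implementation; DAVID reports the EASE score, a more conservative variant of Fisher's exact test.

```python
from math import comb


def enrichment_p_value(list_hits, list_size, background_hits, background_size):
    """P(X >= list_hits) when list_size genes are drawn without replacement
    from a background in which background_hits genes carry the term."""
    total = comb(background_size, list_size)
    upper = min(list_size, background_hits)
    favourable = 0
    for k in range(list_hits, upper + 1):
        favourable += comb(background_hits, k) * comb(
            background_size - background_hits, list_size - k
        )
    return favourable / total


# Made-up example: 3 of 10 background genes carry the term,
# and 2 of the 3 genes in the submitted list carry it.
p_value = enrichment_p_value(
    list_hits=2, list_size=3, background_hits=3, background_size=10
)
print(round(p_value, 4))
```

A small p-value suggests the term is over-represented in the list. In practice many terms are tested at once, so a multiple-testing correction is also needed.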
Screenshot labels: annotation, classification, ID conversion
Functional annotation: upload, annotation options