Protein Structure &
Analysis
Biology 224
Dr. Tom Peavy
Sept 28 & 30
<Images from Bioinformatics and Functional
Genomics by Jonathan Pevsner>
Protein families
Protein localization
protein
Protein function
Gene ontology (GO):
--cellular component
--biological process
--molecular function
Physical properties
The Human Proteome Organisation (HUPO)
Proteomics Standards Initiative (PSI)
Work groups
• Gel Electrophoresis
• Mass Spectrometry
• Molecular Interactions
• Protein Modifications
• Proteomics Informatics
• Sample Processing
Themes
• Controlled vocabularies
• MIAPE: Minimum information about a proteomics
experiment
http://www.psidev.info/
Protein domains, motifs
& signatures
Definitions
Signature:
• a protein category such as a domain or motif
(a defining property of the protein or family)
Domain:
• a region of a protein that can adopt a 3D structure
• a fold
• a family is a group of proteins that share a domain
• examples:
zinc finger domain
immunoglobulin domain
Motif (or fingerprint):
• a short, conserved region of a protein
• typically 10 to 20 contiguous amino acid residues
Definition of a domain
According to InterPro at EBI (http://www.ebi.ac.uk/interpro/):
A domain is an independent structural unit, found alone
or in conjunction with other domains or repeats.
Domains are evolutionarily related.
According to SMART (http://smart.embl-heidelberg.de):
A domain is a conserved structural entity with distinctive
secondary structure content and a hydrophobic core.
Homologous domains with common functions usually
show sequence similarities.
15 most common domains (human)
Domain                          Number of proteins
Zn finger, C2H2 type            1093
Immunoglobulin                  1032
EGF-like                        471
Zn-finger, RING                 458
Homeobox                        417
Pleckstrin-like                 405
RNA-binding region RNP-1        400
SH3                             394
Calcium-binding EF-hand         392
Fibronectin, type III           300
PDZ/DHR/GLGF                    280
Small GTP-binding protein       261
BTB/POZ                         236
bHLH                            226
Cadherin                        226
Varieties of protein domains
Extending along the length of a protein
Occupying a subset of a protein sequence
Occurring one or more times
Example of a protein with domains:
Methyl CpG binding protein 2 (MeCP2)
MBD
TRD
The protein includes a methylated DNA binding domain
(MBD) and a transcriptional repression domain (TRD).
MeCP2 is a transcriptional repressor.
Mutations in the gene encoding MeCP2 cause Rett
Syndrome, a neurological disorder affecting girls
primarily.
Result of an MeCP2 blastp search:
A methyl-binding domain shared by several proteins
Are proteins that share only a domain homologous?
Proteins can have both domains and patterns (motifs)
[Figure: a protein with two domains (aspartyl protease; reverse
transcriptase) and two patterns (several residues each)]
The SwissProt entry for
any protein provides
highly useful information…
SwissProt entry for HIV-1 pol links to many databases
Definition of a motif
A motif (or fingerprint) is a short, conserved region
of a protein. Its size is often 10 to 20 amino acids.
Simple motifs include transmembrane domains and
phosphorylation sites. These do not imply homology
when found in a group of proteins.
PROSITE (www.expasy.org/prosite) is a dictionary of
motifs (there are currently 1600 entries). In PROSITE,
a pattern is a qualitative motif description (a protein
either matches a pattern, or not). In contrast, a profile
is a quantitative motif description. Profiles are found
in Pfam, ProDom, SMART, and other databases.
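The qualitative nature of a pattern (a sequence either matches or it does not) can be sketched in a few lines of Python. The helper names below (prosite_to_regex, find_matches) are hypothetical, the translator handles only the common parts of the PROSITE pattern syntax, and the test sequence is made up. The pattern shown, N-{P}-[ST]-{P}, is the PROSITE N-glycosylation site pattern.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex.

    Simplified sketch: handles x, [..], {..}, (n), (n,m), < and >.
    It does not cover every corner of the real PROSITE syntax.
    """
    pattern = pattern.rstrip(".")
    parts = []
    for element in pattern.split("-"):
        # {P} means "any residue except P" -> [^P]
        element = element.replace("{", "[^").replace("}", "]")
        # x(2,4) means "2 to 4 residues" -> x{2,4}
        element = element.replace("(", "{").replace(")", "}")
        # x means "any residue" (residues are upper case, so this is safe)
        element = element.replace("x", ".")
        # < and > anchor the pattern to the N- or C-terminus
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)

def find_matches(pattern: str, sequence: str):
    """Return (start_index, matched_text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]

if __name__ == "__main__":
    # Made-up sequence used only for illustration
    print(find_matches("N-{P}-[ST]-{P}", "MKNGSAANPTQ"))
```

A profile, by contrast, would assign a score to each position rather than a yes/no answer, which is why profile databases rely on alignments and hidden Markov models instead of regular expressions.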
Page 231-233
http://www.ebi.ac.uk/Databases/
InterPro
InterPro is a database of protein families,
domains and functional sites in which identifiable
features found in known proteins can be applied
to unknown protein sequences.
http://www.ebi.ac.uk/interpro/
ExPASy Proteomics Server
The ExPASy (Expert Protein Analysis System) proteomics server of the
Swiss Institute of Bioinformatics (SIB) is dedicated to the analysis of protein
sequences and structures as well as 2-D PAGE.
http://ca.expasy.org/
PROSITE
Database of protein families and domains
http://ca.expasy.org/prosite/
Pfam is a large collection of multiple sequence
alignments and hidden Markov models covering
many common protein domains.
http://www.sanger.ac.uk/Software/Pfam/index.shtml
PRINTS is a compendium of protein fingerprints
http://umber.sbs.man.ac.uk/dbbrowser/PRINTS/
The ProDom protein domain database consists of an
automatic compilation of homologous domains.
http://prodes.toulouse.inra.fr/prodom/current/html/home.php
ProDom entry for HIV-1 pol shows many related proteins
Page 231
SMART (a Simple Modular Architecture Research Tool)
allows the identification and annotation of genetically mobile
domains and the analysis of domain architectures.
http://smart.embl-heidelberg.de/
The Protein Information Resource (PIR) houses the
PIRSF, iProClass and iProLINK databases
http://pir.georgetown.edu/
www.uniprot.org
Three protein databases recently merged to form UniProt:
• SwissProt
• TrEMBL (translated European Molecular Biology Lab)
• Protein Information Resource (PIR)
You can search for information on your favorite protein
there; a BLAST server is provided.
1. Go to ExPASy (http://www.expasy.ch/)
2. If you know the SwissProt accession of your protein,
enter it at top.
3. Otherwise go into Swiss-Prot/TrEMBL,
click SRS (Sequence Retrieval System),
click Start, then click continue,
then search for your protein of interest.
Page 230
Protein family classification and databases
TIGRFAMs
http://www.tigr.org/TIGRFAMs/index.shtml
SUPERFAMILY
http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/
PANTHER
http://www.pantherdb.org/
PIRSF
http://pir.georgetown.edu/iproclass/
Gene3D
http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/
Physical properties of proteins
Many websites are available for the analysis of
individual proteins. ExPASy and ISREC are two
excellent resources.
The accuracy of these programs is variable.
Predictions based on primary amino acid sequence
(such as molecular weight prediction) are likely to be
more trustworthy. For many other properties (such as
posttranslational modification of proteins by
specific sugars), experimental evidence may be
required rather than prediction algorithms.
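Molecular weight is a good example of a property that follows directly from the primary sequence. A minimal sketch, assuming standard average residue masses and one water added for the free termini; the function name molecular_weight is hypothetical, and the result ignores any posttranslational modification:

```python
# Average residue masses in daltons (amino acid mass minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the free N- and C-termini

def molecular_weight(sequence: str) -> float:
    """Estimate the average molecular weight of an unmodified protein."""
    seq = sequence.upper()
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"Unrecognized residues: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[r] for r in seq) + WATER_MASS

if __name__ == "__main__":
    # Short made-up peptide, for illustration only
    print(round(molecular_weight("MKNGSAANPTQ"), 2))
```

The calculation is a simple sum, which is why it tends to be more trustworthy than predictions of modifications that depend on cellular context.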
Page 236
http://www.expasy.ch/
Page 230
Access a variety of protein analysis programs
from the top right of the ExPASy home page
Page 235
Page 244
Proteomics: High throughput protein analysis
Proteomics is the study of the entire collection
of proteins encoded by a genome
“Proteome” refers to all the proteins in a cell
and/or all the proteins in an organism
Large-scale protein analysis
2D protein gels
Yeast two-hybrid
Rosetta Stone approach
Pathways
Page 247
Two-dimensional protein gels
First dimension: isoelectric focusing
Second dimension: SDS-PAGE
Page 248
Two-dimensional protein gels
First dimension: isoelectric focusing
Electrophorese ampholytes to establish
a pH gradient
Can use a pre-made strip
Proteins migrate to their isoelectric point
(pI) then stop (net charge is zero)
Range of pI typically 4-9 (5-8 most common)
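The idea that a protein stops where its net charge is zero can be sketched numerically: estimate the net charge at a given pH from the ionizable groups, then search for the pH where it crosses zero. The pKa values below are illustrative assumptions (published pKa sets differ, and tools such as those on ExPASy may use other values), and the function names are hypothetical.

```python
# Illustrative pKa values (assumption; published sets differ).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6

def net_charge(sequence: str, ph: float) -> float:
    """Approximate net charge at a given pH (Henderson-Hasselbalch)."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))   # free amino terminus
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))  # free carboxyl terminus
    for residue in sequence.upper():
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge

def isoelectric_point(sequence: str, tolerance: float = 1e-4) -> float:
    """Find the pH where the net charge is zero, by bisection on pH 0-14."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        # Net charge falls as pH rises, so a positive charge means pI is higher
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0

if __name__ == "__main__":
    for peptide in ("KKKK", "DDDD"):  # made-up peptides
        print(peptide, round(isoelectric_point(peptide), 2))
```

Bisection works here because the net charge decreases monotonically with pH, so there is exactly one crossing point.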
Page 248
Two-dimensional protein gels
Second dimension: SDS-PAGE
Electrophorese proteins through an acrylamide
matrix
Proteins are charged and migrate through an
electric field
Conditions are denaturing (SDS)
and reducing (2-mercaptoethanol)
Can resolve hundreds to thousands of proteins
Page 248
Proteins identified on 2D gels (IEF/SDS-PAGE)
Direct protein microsequencing by
Edman degradation
-- done at many core facilities (e.g. UC Davis)
-- typically need 5 picomoles
-- often get 10 to 20 amino acids sequenced
Protein mass analysis by MALDI-TOF
-- matrix-assisted laser desorption/ionization
time-of-flight mass spectrometry
-- done at core facilities
-- often detects posttranslational
modifications
Page 250-1
Page 252
Evaluation of 2D gels (IEF/SDS-PAGE)
Advantages:
Visualize hundreds to thousands of proteins
Improved identification of protein spots
Disadvantages:
Limited number of samples can be processed
Mostly abundant proteins visualized
Technically difficult
Page 251
Gene Ontology (GO) Consortium
The Gene Ontology Consortium
An ontology is a description of concepts. The GO
Consortium compiles a dynamic, controlled vocabulary
of terms related to gene products.
There are three organizing principles:
Molecular function
Biological process
Cellular component
GO terms are assigned to Entrez Gene entries
Page 241
Example
Gene product: cytochrome c
GO entry terms:
• molecular function = electron transporter activity
• biological process = oxidative phosphorylation and
induction of cell death
• cellular component = mitochondrial matrix and
mitochondrial inner membrane
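One way to picture how a gene product is annotated under the three organizing principles is a small data structure. This is a minimal sketch: the class and function names are hypothetical, the terms are the ones given in the cytochrome c example above, and no GO identifiers are included because none are given here.

```python
from dataclasses import dataclass

ASPECTS = ("molecular function", "biological process", "cellular component")

@dataclass(frozen=True)
class GOAnnotation:
    gene_product: str
    aspect: str   # one of ASPECTS
    term: str     # GO term name (identifiers omitted in this sketch)

def group_by_aspect(annotations):
    """Collect term names under each of the three GO aspects."""
    grouped = {aspect: [] for aspect in ASPECTS}
    for annotation in annotations:
        if annotation.aspect not in grouped:
            raise ValueError(f"Unknown aspect: {annotation.aspect}")
        grouped[annotation.aspect].append(annotation.term)
    return grouped

CYTOCHROME_C = [
    GOAnnotation("cytochrome c", "molecular function", "electron transporter activity"),
    GOAnnotation("cytochrome c", "biological process", "oxidative phosphorylation"),
    GOAnnotation("cytochrome c", "biological process", "induction of cell death"),
    GOAnnotation("cytochrome c", "cellular component", "mitochondrial matrix"),
    GOAnnotation("cytochrome c", "cellular component", "mitochondrial inner membrane"),
]

if __name__ == "__main__":
    for aspect, terms in group_by_aspect(CYTOCHROME_C).items():
        print(f"{aspect}: {', '.join(terms)}")
```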
GO consortium (http://www.geneontology.org)
There is no centralized GO database. Instead, curators
of organism-specific databases assign GO terms
to gene products for each organism.
AmiGO is the searchable portion of GO
-- gene symbols, names, UniProt accession numbers, and
text searches can be used to find GO entries
The Gene Ontology Consortium: Evidence Codes
IC    Inferred by curator
IDA   Inferred from direct assay
IEA   Inferred from electronic annotation
IEP   Inferred from expression pattern
IGI   Inferred from genetic interaction
IMP   Inferred from mutant phenotype
IPI   Inferred from physical interaction
ISS   Inferred from sequence or structural similarity
NAS   Non-traceable author statement
ND    No biological data
TAS   Traceable author statement
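Evidence codes are useful for filtering annotations by how they were obtained. The sketch below stores the codes listed above in a lookup table and keeps only annotations backed by experimental codes. Treating IDA, IEP, IGI, IMP and IPI as the experimental group is an assumption made for this example, the function names are hypothetical, and the annotation tuples are placeholder data, not real annotations.

```python
EVIDENCE_CODES = {
    "IC": "Inferred by curator",
    "IDA": "Inferred from direct assay",
    "IEA": "Inferred from electronic annotation",
    "IEP": "Inferred from expression pattern",
    "IGI": "Inferred from genetic interaction",
    "IMP": "Inferred from mutant phenotype",
    "IPI": "Inferred from physical interaction",
    "ISS": "Inferred from sequence or structural similarity",
    "NAS": "Non-traceable author statement",
    "ND": "No biological data",
    "TAS": "Traceable author statement",
}

# Assumption for this sketch: codes that rest on a laboratory experiment.
EXPERIMENTAL_CODES = {"IDA", "IEP", "IGI", "IMP", "IPI"}

def describe(code: str) -> str:
    """Return the meaning of an evidence code, or raise if it is unknown."""
    try:
        return EVIDENCE_CODES[code.upper()]
    except KeyError:
        raise ValueError(f"Unknown evidence code: {code}") from None

def experimental_only(annotations):
    """Keep (term, code) pairs whose code is in EXPERIMENTAL_CODES."""
    return [(term, code) for term, code in annotations
            if code.upper() in EXPERIMENTAL_CODES]

if __name__ == "__main__":
    # Placeholder annotations, invented purely to exercise the filter
    example = [("term A", "IDA"), ("term B", "IEA"), ("term C", "IMP")]
    for term, code in experimental_only(example):
        print(term, code, describe(code))
```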