Bioinformatics and Functional Genomics, Chapter 8, Part 1

Protein analysis and proteomics
(Part 1 of 2)
Monday, September 29, 2003
Introduction to Bioinformatics
ME:440.714
J. Pevsner
Copyright notice
Many of the images in this powerpoint presentation
are from Bioinformatics and Functional Genomics
by Jonathan Pevsner (ISBN 0-471-21004-8).
Copyright © 2003 by John Wiley & Sons, Inc.
These images and materials may not be used
without permission from the publisher. We welcome
instructors to use these powerpoints for educational
purposes, but please acknowledge the source.
The book has a homepage at http://www.bioinfbook.org
that includes hyperlinks to the book chapters.
Outline for the course
Today: protein analysis and proteomics
Wednesday October 1: protein structure
Monday October 6: no class
Wednesday Oct. 8: multiple sequence alignment
Monday October 13: phylogeny
Wednesday October 15: phylogeny (continued)
Then we begin studying the Tree of Life
(final part of course)
Outline for today
Protein analysis and proteomics
Individual proteins
Protein families
Physical properties
Localization
Function
Large-scale protein analysis
2D protein gels
Yeast two-hybrid
Rosetta Stone approach
Pathways
DNA
RNA
protein
[1] Protein families
[2] Physical properties
[3] Protein localization
[4] Protein function
Gene ontology (GO):
--cellular component
--biological process
--molecular function
Page 224
Perspective 1:
Protein domains and motifs
Page 225
Definitions
Signature:
• a protein category such as a domain or motif
Domain:
• a region of a protein that can adopt a 3D structure
• a fold
• a family is a group of proteins that share a domain
• examples:
zinc finger domain
immunoglobulin domain
Motif (or fingerprint):
• a short, conserved region of a protein
• typically 10 to 20 contiguous amino acid residues
Page 225
15 most common domains (human)
Domain                        Number of proteins
Zn finger, C2H2 type          1093
Immunoglobulin                1032
EGF-like                       471
Zn-finger, RING                458
Homeobox                       417
Pleckstrin-like                405
RNA-binding region RNP-1       400
SH3                            394
Calcium-binding EF-hand        392
Fibronectin, type III          300
PDZ/DHR/GLGF                   280
Small GTP-binding protein      261
BTB/POZ                        236
bHLH                           226
Cadherin                       226
Page 227
15 most common domains (various species)
The European Bioinformatics Institute (EBI)
offers many key proteomics resources:
http://www.ebi.ac.uk/proteome/
Page 227
Definition of a domain
According to InterPro at EBI (http://www.ebi.ac.uk/interpro/):
A domain is an independent structural unit, found alone
or in conjunction with other domains or repeats.
Domains are evolutionarily related.
According to SMART (http://smart.embl-heidelberg.de):
A domain is a conserved structural entity with distinctive
secondary structure content and a hydrophobic core.
Homologous domains with common functions usually
show sequence similarities.
Page 226
Varieties of protein domains
Extending along the length of a protein
Occupying a subset of a protein sequence
Occurring one or more times
Page 228
Example of a protein with domains:
Methyl CpG binding protein 2 (MeCP2)
MBD
TRD
The protein includes a methylated DNA binding domain
(MBD) and a transcriptional repression domain (TRD).
MeCP2 is a transcriptional repressor.
Mutations in the gene encoding MeCP2 cause Rett
syndrome, a neurological disorder that primarily
affects girls.
Page 227
Result of an MeCP2 blastp search:
A methyl-binding domain shared by several proteins
Page 228
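The idea that a family is a group of proteins sharing a domain can be sketched in a few lines of Python. This is only an illustration: the MeCP2 domains (MBD, TRD) come from the slide above, but protein_X and protein_Y and their domain lists are hypothetical placeholders, and the function names are invented for this example.

```python
from collections import defaultdict

# Toy annotations: protein name -> list of domains.
# MeCP2 is from the slide; protein_X and protein_Y are hypothetical.
ANNOTATIONS = {
    "MeCP2": ["MBD", "TRD"],
    "protein_X": ["MBD"],
    "protein_Y": ["zinc finger"],
}


def group_by_domain(annotations):
    """Invert protein -> domains into domain -> set of proteins (a 'family')."""
    families = defaultdict(set)
    for protein, domains in annotations.items():
        for domain in domains:
            families[domain].add(protein)
    return dict(families)


def shared_domains(annotations, protein_a, protein_b):
    """Return the set of domains that two proteins have in common."""
    return set(annotations[protein_a]) & set(annotations[protein_b])


if __name__ == "__main__":
    for domain, members in sorted(group_by_domain(ANNOTATIONS).items()):
        print(domain, sorted(members))
    print(shared_domains(ANNOTATIONS, "MeCP2", "protein_X"))
```

Note that sharing a domain in this table says nothing about the rest of each sequence, which is the point of the question on the next slide.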
Are proteins that share only a domain homologous?
Page 228
Example of a multidomain protein: HIV-1 pol
• 1003 amino acids long
• cleaved into three proteins with distinct activities:
-- aspartyl protease
-- reverse transcriptase
-- integrase
We will explore HIV-1 pol and other proteins at the
Expert Protein Analysis System (ExPASy) server.
Visit www.expasy.org/
Page 229
Page 230
SwissProt entry for HIV-1 pol links to many databases
Page 230
ProDom entry for HIV-1 pol shows many related proteins
Page 231
Proteins can have both domains and patterns (motifs)
[Figure: HIV-1 pol with two domains (aspartyl protease,
reverse transcriptase) and two patterns (several residues each)]
Page 231
Page 232
Definition of a motif
A motif (or fingerprint) is a short, conserved region
of a protein. Its size is often 10 to 20 amino acids.
Simple motifs include transmembrane domains and
phosphorylation sites. These do not imply homology
when found in a group of proteins.
PROSITE (www.expasy.org/prosite) is a dictionary of
motifs (there are currently 1600 entries). In PROSITE,
a pattern is a qualitative motif description (a protein
either matches a pattern, or not). In contrast, a profile
is a quantitative motif description. We will encounter
profiles in Pfam, ProDom, SMART, and other databases.
Page 231-233
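To make the "matches or not" nature of a pattern concrete, here is a minimal Python sketch that translates a PROSITE-style pattern into a regular expression and scans a sequence. It is a simplified translator written for this example, not PROSITE's own software: it handles x, [..], {..}, repeat counts such as x(2,4), and the < and > anchors, and ignores rarer syntax. The pattern shown is the N-glycosylation site pattern N-{P}-[ST]-{P}; the test sequence is made up.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    pattern = pattern.rstrip(".")  # PROSITE patterns end with a period
    parts = []
    for element in pattern.split("-"):
        # {P} means "any residue except P" -> [^P]
        element = element.replace("{", "[^").replace("}", "]")
        # x means "any residue" -> .
        element = element.replace("x", ".")
        # (n) or (n,m) are repeat counts -> {n} or {n,m}
        element = element.replace("(", "{").replace(")", "}")
        # < and > anchor the pattern to the N- or C-terminus
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


def find_matches(pattern, sequence):
    """Return (1-based start position, matched text) for non-overlapping hits."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # Made-up sequence: NGSA fits the pattern, NPTQ does not (P is forbidden).
    print(prosite_to_regex("N-{P}-[ST]-{P}"))
    print(find_matches("N-{P}-[ST]-{P}", "MKNGSAANPTQ"))
```

A profile, by contrast, would assign a score to every position rather than a yes/no answer, so it cannot be expressed as a regular expression like this.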
Perspective 2:
Physical properties of proteins
Page 233
Page 234
Physical properties of proteins
Many websites are available for the analysis of
individual proteins. ExPASy and ISREC are two
excellent resources.
The accuracy of these programs is variable.
Predictions based on primary amino acid sequence
(such as molecular weight prediction) are likely to be
more trustworthy. For many other properties (such as
posttranslational modification of proteins by
specific sugars), experimental evidence may be
required rather than prediction algorithms.
Page 236
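Molecular weight is a good example of a property that follows directly from the primary sequence. The sketch below sums average residue masses and adds one water for the free termini. The mass table is the standard set of average residue masses (it is not taken from the slides), and the calculation assumes an unmodified, single-chain protein, so it ignores posttranslational modifications.

```python
# Average residue masses in daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one H2O for the free N- and C-termini


def molecular_weight(sequence):
    """Estimate the average molecular weight (Da) of an unmodified protein."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    unknown = set(sequence) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unrecognized residues: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


if __name__ == "__main__":
    # Free glycine weighs about 75.07 Da; a useful sanity check.
    print(round(molecular_weight("G"), 2))
```

Properties such as glycosylation cannot be computed this way, which is why the slide recommends experimental evidence for them.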
Syntaxin, SNAP-25 and VAMP are three proteins that
interact via coiled-coil domains
Introduction to Perspectives 3 and 4:
Gene Ontology (GO) Consortium
Page 237
The Gene Ontology Consortium
An ontology is a description of concepts. The GO
Consortium compiles a dynamic, controlled vocabulary
of terms related to gene products.
There are three organizing principles:
Molecular function
Biological process
Cellular component
You can visit GO at http://www.geneontology.org.
There is no centralized GO database. Instead, curators
of organism-specific databases assign GO terms
to gene products for each organism.
Page 237
GO terms are assigned to LocusLink entries
Page 241
The Gene Ontology Consortium: Evidence Codes
IC    Inferred by curator
IDA   Inferred from direct assay
IEA   Inferred from electronic annotation
IEP   Inferred from expression pattern
IGI   Inferred from genetic interaction
IMP   Inferred from mutant phenotype
IPI   Inferred from physical interaction
ISS   Inferred from sequence or structural similarity
NAS   Non-traceable author statement
ND    No biological data
TAS   Traceable author statement
Page 240
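Evidence codes are useful in practice for filtering annotations by how they were obtained. The Python sketch below stores the eleven codes from this slide in a dictionary and drops annotations that rest on electronic inference or no data. The annotation tuples (gene product, GO term, evidence code) and all names in them are hypothetical placeholders, not real GO records, and the choice of which codes to exclude is only one possible policy.

```python
# Evidence codes as listed on the slide.
EVIDENCE_CODES = {
    "IC": "Inferred by curator",
    "IDA": "Inferred from direct assay",
    "IEA": "Inferred from electronic annotation",
    "IEP": "Inferred from expression pattern",
    "IGI": "Inferred from genetic interaction",
    "IMP": "Inferred from mutant phenotype",
    "IPI": "Inferred from physical interaction",
    "ISS": "Inferred from sequence or structural similarity",
    "NAS": "Non-traceable author statement",
    "ND": "No biological data",
    "TAS": "Traceable author statement",
}


def filter_annotations(annotations, exclude=("IEA", "ND")):
    """Keep annotations whose evidence code is known and not excluded."""
    kept = []
    for gene_product, go_term, code in annotations:
        if code not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {code}")
        if code not in exclude:
            kept.append((gene_product, go_term, code))
    return kept


if __name__ == "__main__":
    # Hypothetical annotations, for illustration only.
    toy = [
        ("gene_A", "term_1", "IDA"),
        ("gene_A", "term_2", "IEA"),
        ("gene_B", "term_3", "ND"),
        ("gene_C", "term_1", "TAS"),
    ]
    for gene_product, go_term, code in filter_annotations(toy):
        print(gene_product, go_term, code, "-", EVIDENCE_CODES[code])
```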
Perspective 3:
Protein localization
Page 242
Protein localization
Proteins may be localized to intracellular compartments,
the cytosol, or the plasma membrane, or they may be secreted.
Many proteins shuttle between multiple compartments.
A variety of algorithms predict localization, but this
is essentially a cell biological question.
Page 240
Localization of 2,900 yeast proteins
Michael Snyder and colleagues incorporated epitope
tags into thousands of S. cerevisiae cDNAs,
and systematically localized proteins (Kumar et al., 2002).
See http://ygac.med.yale.edu for a database including
2,900 fluorescence micrographs.
Page 243
Perspective 4:
Protein function
Page 243
Protein function
Function refers to the role of a protein in the cell.
We can consider protein function from a variety
of perspectives.
Page 243
1. Biochemical function (molecular function)
RBP binds retinol, could be a carrier
2. Functional assignment based on homology
RBP could be a carrier too, like other carrier proteins
3. Function based on structure
RBP forms a calyx
4. Function based on ligand binding specificity
RBP binds vitamin A
5. Function based on cellular process
[Figure labels: DNA, RNA]
RBP is abundant, soluble, secreted
6. Function based on biological process
RBP is essential for vision
7. Function based on “proteomics” or high throughput “functional genomics”
High throughput analyses show...
RBP levels elevated in renal failure
RBP levels decreased in liver disease
Page 245
Functional assignment of enzymes:
the EC (Enzyme Commission) system
Class              Number
Oxidoreductases    1,003
Transferases       1,076
Hydrolases         1,125
Lyases               356
Isomerases           156
Ligases              126
Page 246
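In the EC system the first digit of an EC number identifies the enzyme class, with 1 through 6 corresponding to the six classes in the order listed on this slide. A minimal Python sketch of that lookup follows; the function name is invented for this example, and only the six classes shown here are covered.

```python
# Top-level EC classes, numbered in the order shown on the slide.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def ec_class(ec_number):
    """Return the enzyme class name for an EC number such as 'EC 1.1.1.1'."""
    text = ec_number.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()  # drop the optional 'EC' prefix
    first_field = text.split(".")[0]
    if not first_field.isdigit() or int(first_field) not in EC_CLASSES:
        raise ValueError(f"unrecognized EC number: {ec_number!r}")
    return EC_CLASSES[int(first_field)]


if __name__ == "__main__":
    print(ec_class("EC 1.1.1.1"))  # alcohol dehydrogenase
    print(ec_class("3.4.23.16"))
```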
Functional assignment of proteins:
Clusters of Orthologous Groups (COGs)
Information storage and processing
Cellular processes
Metabolism
Poorly characterized
(Most useful for prokaryotes;
we will describe COGs on Oct. 20)
Page 247
This lecture continues in part 2
with a discussion of two dimensional
gels and the yeast two-hybrid system
http://pevsnerlab.kennedykrieger.org/ppts/lecture_bioinf_ch8_part2.ppt