PROTEIN IDENTIFICATION BY PEPTIDE MASS MAPPING

Download Report

Transcript PROTEIN IDENTIFICATION BY PEPTIDE MASS MAPPING

蛋白質體與質譜分析
Speaker: 黃弘文 中山大學生醫所
Defining proteomics
• Functional proteomics: study of the expressed proteins of a
genome using 2D and MS.
– 2D: two dimensional gel analysis, usually referred as a method
that sample first run a IEF (iso-electrofocusing) electrophoresis.
Following a SDS PAGE.
– MS: mass spectrometry, basically, a method to “accurate”
determine molecule weight.
Summary of the systems
• -ome--- Compete set.
• -ics --- the utility for analyzing these
domains.
• Functional genomics: hierachical
approaches for studying the functional
analysis of novel genes, also, Phenomics is
suggested for this definition)
Functional Genomics (or Phenomics)
轉錄
DNA
基因
mRNA
功能性基因
t-RNA
t-RNA
轉譯
核糖體
蛋白質
Microarray
基因體學
(Genomics)
(....)
轉譯後修釋
X
(....)
X
蛋白質體學
(Proteomics)
功能性蛋白質
(生化反應)
X
CHO 醣質化
PO4 磷酸化
Proteomic
1.蛋白質體學與基因體學兩者相互互補(如表現量)但不一定相同。
2.蛋白質體學能發現在基因體學無法完成得到的結果(如醣質化及磷酸化)。
2001 Proteomics Group, Institute of Biological Chemistry, Academia Sinica
SEPARATION
IDENTIFICATION
General flow
for
proteomics
analysis
2D-SDS PAGE gel
The first dimension
(separation by isoelectric focusing)
- gel with an immobilised pH gradient
- electric current causes charged
proteins to move until it reaches the
isoelectric point
(pH gradient makes the net charge 0)
Ettan IPGphor IEF System
• Accommodates all lengths of Immobline DryStrip gels
(7, 11, 13, 18, and 24 cm) and can run 12 gels
simultaneously.
1. Apply rehydration solution.
2. Remove protective film from
3.Drystrip in strip holder
DryStrip gel.
4. Apply Cover oil
5. Cover on strip holder.
6. Place assembled strip holder on Ettan™ IPGphor™ platform.
• Analysis of protein spots with melanie
software
• Maldi-Tof analysis
http://proteome.sinica.edu.tw/pro_technology.asp
Proteomic databases and data
analysis
•
•
•
•
Genomic databases
Protein databases
Proteomics databases
Maldi-tof data analysis
How many databases?
ExPASy Life Sciences Directory(Amos' WWW links page)
Last Update: March 22, 2005 (~1200 links!!)
This list contains almost exclusively pointers to information
sources for life scientists with an interest in biological
macromolecules.
Links to protein sequence,
3D structure and
2D-gel analytical tools are provided on the ExPASy server, and
more specifically from its Proteomics tools page.
Links to Geneva and Swiss biological servers, institutes, etc. are
on the Local page of ExPASy. Finally, if you don't find what
you want here, do not forget to use BioHunt molecular biology
information search engine.
http://tw.expasy.org/alinks.html
Categories of databases for Life Sciences
• Sequences (DNA, protein)
• Genomics
• Mutation/polymorphism
• Protein domain/family (----> tools)
• Proteomics (2D gel, Mass Spectrometry)
• 3D structure
• Metabolism
• Bibliography
• ‘Others’ (Microarrays,…)
Sequence databases
• DNA/RNA & Proteins
–
–
–
–
–
–
–
–
Sequences !!
Accession number (AC)
Taxonomic data
References
ANNOTATION/CURATION
Keywords
Cross-references
Documentation
Nucleotide based databases
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
http://tw.expasy.org/alinks.html
EMBL - EMBL Nucleotide sequence db (EBI)
Genbank - GenBank Nucleotide Sequence db (NCBI)
DDBJ - DNA Data Bank of Japan
dbEST - dbEST (Expressed Sequence Tags) db (NCBI)
dbSTS - dbSTS (Sequence Tagged Sites) db (NCBI)
NDB - Nucleic Acid Databank (3D structures)
BNASDB - Nucleic acid structure db from University of Pune
AsDb - Aberrant Splicing db
ACUTS - Ancient conserved untranslated DNA sequences db
Codon Usage Db
EPD - Eukaryotic Promoter db
HOVERGEN - Homologous Vertebrate Genes db
IMGT - ImMunoGeneTics db [Mirror at EBI]
ISIS - Intron Sequence and Information System
RDP - Ribosomal db Project
gRNAs db - Guide RNA db
PLACE - Plant cis-acting regulatory DNA elements db
PlantCARE - Plant cis-acting regulatory DNA elements db
sRNA db - Small RNA db
ssu rRNA - Small ribosomal subunit db
lsu rRNA - Large ribosomal subunit db
5S rRNA - 5S ribosomal RNA db
tmRNA Website
tmRDB - tmRNA dB
dbEST
• Expression in various organs (or species)
– ESTs are partial sequences of cDNA libray clones.
– cDNA is made by mRNA which represents
proteins expressed on each organism.
– ESTs of various organisms and many organs of
human are now under investigation.
http://www.ncbi.nlm.nih.gov
• Swiss-2D
• http://tw.expasy.org/
• To reveal proteins on 2D PAGE and find out
the possible properties.
Question: Find the theoretical MW and PI of a
known protein and The position on 2D gel
--Beta actin from human
Browse information and view structure.
•Open an internet browser, go to http://tw.expasy.org/
•Select SWISS-PROT and TrEMBL database
– Type:alkaline phosphatase E. coli (P00634 )
“beta actin human” Will result in information about protein
precursor (P60709)
Select Compute pI/Mw
Select residues 2-375 (this is the protein without the
signal peptide), This will provide information about the
protein’s average MW
P60709
Function
And cross
reference
Feature: note
preprotein!
aa, Mw and
sequence
Analysis tools:
Include: pI and many
other protein
biochemistry properties
AC
Description
aa
Theoretical
pI/Mw
Experimental
pI/Mw
P06106 o-acetylhomoserine
sulfhydralase
443
5.98/48.540
6.0/46.0 (4)
5.9/45.0 (3)
6.2/48.53 (2)
P32582 cystathionine betasynthase
507
6.25/56022
6.35/52.5 (3)
6.47/55.886
(2)
P31373 cystathionine gamma- 393
lyase
6.05/42410
6.29/42.402
(2)
Theoretical and experimental differences
Find 2D data
Beta actin
expressed
in many
tissues
Beta actin
expression
pattern in
human liver
Note: many
clustered spots
---PTM
Other databases
• Further analysis:
– Download gel image data from Swiss 2D
– You need:
• FTP
• Software to analysis the gel data (.mel)
– Melanie (viewer free download from Swiss 2D)
– (ImageMaster by Amersham Biosciences or
PDquest by BioRad)
ARABIDOPSIS.mel
BAT_MOUSE.mel
CEC_HUMAN.mel
CEC_HUMAN.mel
CSF_HUMAN.mel
DICTYSLUG.mel
DLD1_HUMAN.mel
ECOLI.mel
ELC_HUMAN.mel
HEPG2_HUMAN.mel
HL60_HUMAN.mel
ISLETS_MOUSE.mel
KIDNEY_HUMAN.mel
LIVER_HUMAN.mel
LIVER_MOUSE.mel
LYMPHOCYTE_HUMAN.mel
etc…
• Melanie Analysis of protein spots
–
–
–
–
pI and Mw
Auto/manual detect spots
Analysis of spots on different gels
Quantization of spots
ARABIDOPSIS.mel
BAT_MOUSE.mel
CEC_HUMAN.mel
CEC_HUMAN.mel
CSF_HUMAN.mel
DICTYSLUG.mel
DLD1_HUMAN.mel
ECOLI.mel
ELC_HUMAN.mel
HEPG2_HUMAN.mel
HL60_HUMAN.mel
ISLETS_MOUSE.mel
KIDNEY_HUMAN.mel
LIVER_HUMAN.mel
LIVER_MOUSE.mel
LYMPHOCYTE_HUMAN.mel
etc…
To define pI_MW annotations on a gel:
• Open the gel for which you know the pI and MW values of several
protein spots. In this gel, spots may or may not have already been
detected.
• Activate the Annotation tool.
• Double click on a spot (pixel) for which you know the pI and/or MW
values.
• Select the pI_MW category in the Create Annotation by Click window.
• Enter the known pI and MW values, respectively, separated by a space.
Replacing one of the values with -1 means that no value is set.
• Do this for a sufficient number of protein spots, well distributed over
the whole gel. Of course, the more spots and the more annotations,
then the better the approximated pI and MW values will be.
• ImageMaster: looks up the two closest annotations to the left and to
the right of the spot for which the pI will be determined, and then
interpolates between these 2 points.
• Calculate MW: closest spots above and below the spot for which the
MW will be determined and it makes a logarithmic interpolation.
MS analysis
• ProteinProspector (MSFIT):
http://jpsl.ludwig.edu.au/
• MSCOT: http://www.matrixscience.com/
http://jpsl.ludwig.edu.au/
COMPUTER EXERCISE #1: Find the theoretical MW of a known
protein and compare that to the MW observed by MALDI-TOF mass
spectrometry
(Alkaline Phosphatase from E.coli), browse information and view
structure.
• Open an internet browser, go to http://tw.expasy.org/
• Select SWISS-PROT and TrEMBL database
– Type: alkaline phosphatase E. coli
Will result in information about protein precursor (P00634)
Explore the information (Feature Table, Amino Acid Sequence)
Select Compute pI/Mw
Select residues 22-471 (this is the protein without the signal)
This will provide information about the protein’s average MW
(Molecular
weight: 47199.79 Theoretical pI: 5.54)
Compare the mass spectrum (Below) to the average MW. Calculate the m/z difference
and % error between the observed and theoretical protein.
• Figure 1: MALDI-TOF mass spectrum of Alkaline Phosphatase.
Open a new browser page, go to: The Protein Data Bank
http://www.rcsb.org/pdb/
Type in the protein number for Alkaline phosphotase E.coli
(P00634)
• Select Find a structure
• Select EXPLORE for any of the structures listed (suggest nonmutant)
• Select View Structure and then choose ribbons or cylinders &
file size.
• Select Quick PDB and explore the options (rotate protein, etc)
• Explore the information provided by this page.
• Try other proteins, such as Lysozyme, chick, P00698
(a) Identify a protein from measured peptide masses
Identify unknown protein #66 using the data provided in excel(below). Data is
provided for 4 unknown proteins (#66, 166, 55, 36) Search for more if time
permits.
You will copy and paste these m/z values into the search described below.
Delete the m/z values 904.4681 and 2465.199 (these are internal calibrants).
Open an internet browser, go to http://jpsl.ludwig.edu.au/
(http://prospector.ucsf.edu/)
Select the MS-Fit program
At the bottom of the screen is a data paste area filled with masses.
Delete these and copy/paste the mass list (excel spreadsheet).
Search these masses while changing different options.
Options to vary:
Database (Swiss-Prot, Owl, NCB)
Mass Tolerance (try 50, 70, 100, 120 ppm)
Look for proteins with strong MOWSE scores with low mass errors.
PROTEIN IDENTIFICATION BY PEPTIDE MASS MAPPING
• COMPUTER EXERCISE #3: Investigate peptide mass mapping used for
protein identification
(b) find a protein’s theoretical peptides to confirm
the identity
–
–
–
–
–
Open an internet browser, go to http://prospector.ucsf.edu/
Select the MS-Digest program
Options to choose:
Retrieve entry by accession number
Enter the number for the protein identified in #3a (excel
data #116)
– Select the database where your protein was found
– The digestion enzyme is Trypsin, with 0 missed cleavages
– The result is a list of theoretical peptides that can be
compared to the measured masses
(c) Identify an unknown protein using data acquired
with reduced mass accuracy.
– Identify an unknown protein – mass spectrum shown on
next page. This spectrum was acquired using EXTERNAL
CALIBRANTS, so the mass accuracy is not as good as we
saw for the proteins in (a).
– This protein is from a 2-D gel: MW is approx. 40,000, pI
is approx 5.
– In MS-Fit, type in the masses from the mass spectrum
below. Search these masses while changing different
options (including MW and pI range) and look for proteins
with strong MOWSE scores with low mass errors.
Figure : Unknown protein to identify in Computer Exercise #3c.
2-D GEL ELECTROPHORESIS
COMPUTER EXERCISE #3: Explore information on 2Dimensional gels.
• Go to http://tw.expasy.org/ and select SWISS-2DPAGE.
• Choose access to Swiss 2-D page by accession number.
• Search for P00359 (Glyceraldehyde 3-phosphate
dehydrogenase from yeast)
• Enlarge the gel and notice where the spot is observed.
• Return to SWISS-2DPAGE, and choose access by clicking on
a spot:
• Find the gel for yeast (Saccharomyces cerevisiae) and try to
find the same protein.
PROTEIN IDENTIFICATION BY PEPTIDE MASS MAPPING
• COMPUTER EXERCISE #2: explore 2D data
#Open Expasy WWW site, locate E. coli 2 D image.
1. Trying find spot on the image, find out “Superoxide dismutase”
2. Try to list all the annotated genes of the E. coli 2D image.
3. Find out this gene on other species.
PROTEIN IDENTIFICATION BY MS/MS
• COMPUTER EXERCISE #4: Investigate the use
of MS-MS data for peptide sequence
identification – and ultimately protein
identification.
– An unknown protein was digested with trypsin and MS-MS
spectra was acquired for many peptides that were generated
from that digestion. A list of peptide masses for one of
these is included in the excel spreadsheet (labeled
fragments 316-318).
– Go to Protein Prospector, select the MS-Tag program
– In the mass region, delete the sample masses and
copy/paste the mass list. First on this list should be the
mass of the selected peptide. Select a database and Search
– This will result in possible peptides and proteins that match.
ESI-MS/MS spectrum of a doubly charged ion (m/z 523.29) of a trypsin
autolylsis product from porcine trypsin. Subtraction of the masses of
adjacent fragment ion peaks (y-type) corresponds to the masses of the
amino acids in the peptide chain. Hence, the complete sequence of the
peptide is LSSPATLNSR.
EDMAN SEQUENCING
COMPUTER EXERCISE #5: Investigate the use of Edman
Sequence data for protein identification – and perform a
BLAST search to look for homology.
– A protein was subjected to Edman Sequencing, and the first 15 residues
were identified. These residues are RTPEMPVLENSAAQ.
– To find the protein and/or the proteins, perform a BLAST search.
– Open a browser, go to the ExPASy Molecular Biology Server
– Under the category Proteomics and sequence analysis tools,
– Select [BLAST]
– Type in the sequence found by Edman, select a database and run
BLAST.
– Look over the results and test the various parameters
– For example, run a multiple alignment, look at the various results
– (note the similarities and difference between proteins)
• END
• Homework
• Install the melanie view:
Analyze the gel on folder \gels\94-0002.mel, 94-0005.mel,
and ECOLI.mel
– Set ECOLI as reference gel, select Glucose-6-phosphate 1dehydrogenase (G6PD) as group on three gels, and analyze
the expression ratio.
• According to MALDI-TOF data #66 (derived from
human sample)
– 1.what is the most possible protein?
– 2.Mw/PI of the protein?
– 3.in Swiss-2D, in 2D-PAGE of nuclear proteins from
Human HeLa cells, what is PI and Mw found on the gel?