lecture 2 - Helsingin yliopisto

“Proteomics & Bioinformatics”
MBI, Master's Degree Program in Helsinki, Finland
Lecture 2
8 May, 2007
Sophia Kossida, BRF, Academy of Athens, Greece
Esa Pitkänen, University of Helsinki, Finland
Juho Rousu, University of Helsinki, Finland
Gel Image Analysis
staining
Image acquisition
Image analysis /quantification
Image analysis extracts measurable data from the 2DE image and stores it in a database.
It involves detecting spots and warping separate images to align corresponding spots of the same proteins.
Spot data come from the darkness of each spot, which is proportional to the amount of protein stain, or of dye label bound to particular amino acids.
Reiner Westermeier, GE Healthcare LifeSciences, Munich, Germany
Image analysis software
Some commercially available software packages:
ImageMaster2D/ Melanie
PDQuest (Bio-Rad, USA)
Progenesis (Nonlinear, UK)
Delta2D (Decodon, Germany)
Melanie
http://www.2d-gel-analysis.com/
Melanie
http://au.expasy.org/melanie/
PDQuest
http://www.bio-rad.com/
Progenesis
http://www.nonlinear.com/products/progenesis/
Delta 2D
http://www.decodon.com/Solutions/Delta2D/
Staining
Detection
Pre-labeling - radioisotopes, stable
isotopes, fluorescence
Intermediate labeling - Fluorescence
during equilibration of IPG-strip
Staining of gel background - imidazole-zinc staining
Staining of proteins- Organic dyes,
silver, fluorescent dyes
Blotting – Immuno / affinity detection
Scanning
Avoiding background, artifacts and noise is essential:
insufficient destaining, contamination (fingerprints, fluorescent speckles, bubbles),
gel breakage, gel pieces
Use grayscale (complete range) instead of color images.
Scan all gel images using the same orientation, placing each gel at the same position on the
scanner plate.
Avoid scanning too much of the area around the gel.
Limit post-processing to crop, mirror and rotation by 90, 180, 270 degrees
Avoid producing TIFF files if you can process calibrated image file formats such as *.IMG/INF and *.GEL.
TIFF files are usually produced without grayscale calibration.
This means you lose precision, or grayscales are distorted nonlinearly, making quantitation questionable. Avoid using JPEG files for quantitative analysis.
Image analysis
Manipulation of image/ normalization
-separation of overlapping spots, removing lines and speckles
Spot detection/ quantification
-background subtraction, spot segmentation, landmarking, spot matching
Gel comparison
-matching of gels (e.g. normal, diseased, treated),alignment
Data analysis
-changes in expression
Data representation
-annotation of spots, linking of data: spots -intensity - MS data
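The background subtraction, spot segmentation and quantification steps can be sketched in a few lines. This is a minimal illustration and not the algorithm of any package named in this lecture. It assumes the gel is a small grid of intensities in which higher values mean more stain, uses the image median as a flat background estimate, and treats 4-connected pixels above a threshold as one spot. The function names and the toy image are hypothetical.

```python
from statistics import median


def subtract_background(image):
    """Subtract a flat background (the image median) and clip at zero."""
    background = median(v for row in image for v in row)
    return [[max(v - background, 0) for v in row] for row in image]


def detect_spots(image, threshold):
    """Group 4-connected pixels above `threshold` into spots.

    Returns one dict per spot with its pixel area and integrated volume.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    spots = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Flood fill starting from this unvisited foreground pixel.
            stack = [(r, c)]
            seen[r][c] = True
            pixels = []
            while stack:
                y, x = stack.pop()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            volume = sum(image[y][x] for y, x in pixels)
            spots.append({"area": len(pixels), "volume": volume})
    return spots


# Toy 5 x 6 "gel" with two spots on a uniform background of 10.
toy_gel = [
    [10, 10, 10, 10, 10, 10],
    [10, 60, 80, 10, 10, 10],
    [10, 70, 90, 10, 10, 10],
    [10, 10, 10, 10, 50, 10],
    [10, 10, 10, 10, 40, 10],
]
spots = detect_spots(subtract_background(toy_gel), threshold=5)
```

Real software additionally models spot shape, splits overlapping spots and removes streaks, none of which this sketch attempts.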
Organizing experiments
Organizing the experiment:
Creating projects, folders and
subfolders.
Importing gel images
Melanie/ImageMaster 2D Platinum 6.0
Import gels
Toolbox to easily manipulate gels
Melanie/ImageMaster 2D Platinum 6.0
Viewing and manipulating images
Adjusting contrast
Intensity variations in x- and y-direction
3D-view
Automatically subtracted
background
Melanie/ImageMaster 2D Platinum 6.0
Spot detection
Adjust the
separation between
spots
Split overlap
Eliminate artifacts/noise
Stain saturation
Incomplete
resolution
Melanie/ImageMaster 2D Platinum 6.0
Spots report
A spot report summarizes the information
about the selected spots
Melanie/ImageMaster 2D Platinum 6.0
Detection/matching
Spot detection
Spot matching
Normalization of spot intensities
PTM?
Downregulation?
Modified from: mouse cardiac; 250 µg loading; pH 3-10 IEF strips; 12.5% SDS-PAGE; file ID: sc5bcon vs. sc15iso
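One common way to normalize spot intensities is to express each spot volume as a percentage of the total spot volume on its gel, so that differences in loading or staining between gels cancel out. The sketch below assumes this total-volume normalization, which is only one of several schemes. The spot identifiers and volumes are hypothetical.

```python
def relative_volumes(volumes):
    """Convert raw spot volumes to percent of total volume on the gel."""
    total = sum(volumes.values())
    return {spot: 100.0 * v / total for spot, v in volumes.items()}


def expression_ratios(control, treated):
    """Ratio treated/control of normalized volumes for matched spots."""
    norm_c = relative_volumes(control)
    norm_t = relative_volumes(treated)
    return {spot: norm_t[spot] / norm_c[spot]
            for spot in norm_c if spot in norm_t}


# Hypothetical matched spots on a control and a treated gel.
control = {"s1": 200.0, "s2": 600.0, "s3": 200.0}
treated = {"s1": 100.0, "s2": 700.0, "s3": 200.0}
ratios = expression_ratios(control, treated)
```

A ratio well below 1 (here spot s1) would flag a candidate for downregulation, to be confirmed on replicate gels.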
Matching
Reference gel
Combining 2D gel images -creating
a master gel, a “typical profile”.
Melanie/ImageMaster 2D Platinum 6.0
Master gels
Combine several images, creating the master image
•all the spots on a single image
–even those that will never be expressed at the same time,
•a summary of groups of replicate gels (average gel)
Delta 2D
Any point on a gel can be labeled, and automatically
transferred from one gel to another.
Gel image warping
Variations in migration, protein separation, stain artifacts and stain
saturation complicate gel matching and quantitation.
Compensates for running differences
between gels
After warping, corresponding spots will
have the same position on every
image.
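A strongly simplified view of warping is to fit a transform from landmark spots that have been paired between a sample gel and the reference gel, then apply it to every other position. The sketch below assumes an independent stretch and shift along each axis, fitted by least squares. Real gel warping is usually nonlinear and local, so this only illustrates the idea. All names and coordinates are hypothetical.

```python
def fit_axis(source, target):
    """Least-squares fit of target = scale * source + shift along one axis."""
    n = len(source)
    mean_s = sum(source) / n
    mean_t = sum(target) / n
    sxx = sum((s - mean_s) ** 2 for s in source)
    sxy = sum((s - mean_s) * (t - mean_t) for s, t in zip(source, target))
    scale = sxy / sxx
    shift = mean_t - scale * mean_s
    return scale, shift


def fit_warp(sample_landmarks, reference_landmarks):
    """Fit one (scale, shift) pair per axis from paired landmark spots."""
    xs, ys = zip(*sample_landmarks)
    xr, yr = zip(*reference_landmarks)
    return fit_axis(xs, xr), fit_axis(ys, yr)


def warp(point, transform):
    """Map a point on the sample gel to reference-gel coordinates."""
    (ax, bx), (ay, by) = transform
    x, y = point
    return ax * x + bx, ay * y + by


# Hypothetical landmark pairs: (x, y) on the sample gel and on the reference.
sample = [(10, 20), (50, 60), (90, 40)]
reference = [(12, 22), (54, 66), (96, 44)]
transform = fit_warp(sample, reference)
mapped = warp((30, 50), transform)
```

After such a transform, a spot at (30, 50) on the sample gel is looked up at the mapped position on the reference gel.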
Expression
Comparison of individual
experimental gels to master gels.
Identification of variant spots
Miscellaneous
Automatic retrieval of web information.
Send out a “Scout” to the web and bring back
corresponding data like pI, MW, sequence, function
Create a PowerPoint slide from a gel image
Delta 2D
2D Gel Databases
Swiss-2DPAGE www.expasy.ch
GelBank http://www.gelscape.ualberta.ca:8080/htm/gdbIndex.html
Cornea 2D-PAGE
http://www.cornea-proteomics.com/
World 2DPAGE, Index of 2D gel databases
http://ca.expasy.org/ch2d/2d-index.html
Swiss 2D PAGE viewer
Gel bank
cornea
World-2DPAGE
http://ca.expasy.org/ch2d/2d-index.html
Make 2D database
A software package to create, convert, publish, interconnect and keep up to date 2DE databases. Provided by ExPASy.
The database is queryable via description, accession or spot clicking.
Cross-references are provided to other federated 2D PAGE database entries,
Medline and SWISS-PROT
Entries are linked to images showing the experimentally determined and
theoretical protein locations.
Search via –clickable images, -keywords
It runs on most UNIX-based operating systems
(Linux, Solaris/SunOS, IRIX). Being continuously
developed, the tool is evolving in concert with the
current Proteomics Standards Initiative of the
Human Proteome Organization (HUPO).
Data can be marked to be public, as well as
fully or partially private.
An administration Web interface, highly
secured, makes external data integration,
data export, data privacy control, database
publication and versions' control a very easy
task to perform.
Federated databases
A collection of databases that are treated as one entity and viewed
through a single user interface (pcmag.com)
Robustness
Consistency
Maintenance of the database
Data quality
Limitations of current databases:
Do not contain strict/detailed descriptions of protocol (buffers, sample
volume, staining techniques all important information for gel comparisons).
Designed as 2D (and not proteomics) databases and therefore not readily
expandable to incorporate other proteomics data e.g. MS, MDLC.
Designed for reference gels, not on-going projects.
Guidelines for building a federated 2-DE database
http://ca.expasy.org/ch2d/fed-rules.html
Individual entries in the database must be accessible by a keyword search. Other
methods are possible but not required.
The database must be linked to other databases by active hypertext cross-references, linking together all related databases. Database entries must be at least linked to the main index.
A main index has to be supplied that provides a means of querying all databases
through one unique query point.
Individual protein entries must be available through clickable images.
2DE analysis software designed for use with federated databases, must be able to
access individual entries in any federated 2DE databases.
For a complete reference, see Appel et al., Electrophoresis 17, 1996, 540-546.
SWISS 2D PAGE
http://au.expasy.org/ch2d/
Swiss 2D PAGE viewer
Which gel
you want to
look at
Swiss 2D PAGE
Swiss-2D PAGE
Estimated position
Estimated position in
human liver sample
Vimentin_human
(P08670)
Peptide Mass Fingerprinting
A protein identification technique, that correlates experimental data with
theoretical data.
Protein
Proteolytic digestion
“Experimental” MS
Computer
search
Protein sequence from
database
In silico digestion
Theoretical MS
Peptide Mass Fingerprinting
• Protein digestion with protease (trypsin)
• Determination of the mass by MS -Calibration
• Database searching -Generation of the peptide map
• Comparison with theoretical peptide maps of known proteins -In silico
digestion
• Identification of the protein on a probabilistic basis - percent coverage, similarity etc.
Protein digestion with protease (trypsin)
The molecule is cleaved at all the possible sites, which will produce a
set of peptides, of varying masses, that are characteristic of that
protein.
The mass of each peptide will be the sum of the amino acids present
including any modifications that those amino acids might have
undergone.
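The sum described above can be written out directly. The sketch below adds standard monoisotopic residue masses, one water for the peptide termini, any modification mass shifts, and one proton for the singly charged [M+H]+ ion. The function names are hypothetical, and modifications are passed simply as a list of mass shifts. The example peptide is the hemoglobin alpha tryptic peptide VGAHAGEYGAEALER.

```python
# Standard monoisotopic residue masses in Da (amino acid minus water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added for the free N- and C-termini
PROTON = 1.00728   # mass of the charge-carrying proton


def peptide_mass(sequence, modifications=()):
    """Neutral monoisotopic mass: residues + water + modification shifts."""
    return (sum(RESIDUE_MASS[aa] for aa in sequence)
            + WATER + sum(modifications))


def mz_singly_charged(sequence, modifications=()):
    """m/z of the [M+H]+ ion."""
    return peptide_mass(sequence, modifications) + PROTON


mh = mz_singly_charged("VGAHAGEYGAEALER")  # about 1529.734
```

A modification is handled by adding its mass shift, for example `peptide_mass("MFLSFPTTK", modifications=[15.99491])` for one oxidized methionine.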
trypsin
Cleaves on the C-terminal side of lysine and arginine,
unless either is followed by proline
from tutorial written by: Dr J. R. Jefferies, Parasitology Group,
Institute of Biological Sciences, University of Wales, UK
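The trypsin rule can be turned into a small in silico digestion. This is a minimal sketch assuming only the rule stated here (cleave after K or R, not before P) plus an optional number of missed cleavages. The function name and the toy sequence are hypothetical.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return tryptic peptides of `sequence`.

    Cleaves after K or R unless the next residue is P. Peptides spanning
    up to `missed_cleavages` uncut sites are also returned.
    """
    # Collect cut positions, always including both ends of the sequence.
    sites = [0]
    for i in range(len(sequence) - 1):
        if sequence[i] in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))

    peptides = []
    for i in range(len(sites) - 1):
        last = min(i + 1 + missed_cleavages, len(sites) - 1)
        for j in range(i + 1, last + 1):
            peptides.append(sequence[sites[i]:sites[j]])
    return peptides


# Toy sequence: the R at position 4 is followed by P and is not cleaved.
complete = tryptic_digest("AAKRPGGRCCK")
with_missed = tryptic_digest("AAKRPGGRCCK", missed_cleavages=1)
```

Here `complete` holds the peptides AAK, RPGGR and CCK. Search tools usually let the user allow one or two missed cleavages because digestion is rarely complete.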
Determination of mass
MALDI - MS is used to measure the masses of the proteolytic
peptide fragments.
Every peak corresponds to the exact mass (m/z) of a peptide ion
Select:
Monoisotopic peaks
[M+H]+ i.e. singly charged
1051.54
1086.52
1094.56
1111.59
1244.64
1421.7
1476.67
1542.84
1613.88
1664.97
1763.79
1777.82
Peak list
Isotopes
Isotopes are different forms of an element, each having
different atomic mass.
Their nuclei have the same number of protons (same atomic number) but different numbers of neutrons.
Naturally occurring isotopes
Isotope (A)  Mass            Abundance, %   Isotope (A+1)  Mass            Abundance, %   Isotope (A+2)  Mass
12C          12              98.93          13C            13.0033548378   1.07           14C            14.003241988
1H           1.0078250321    99.9885        2H             2.0141017780    0.0115         3H             3.0160492675
14N          14.0030740052   99.632         15N            15.0001088984   0.368
16O          15.9949146221   99.757         17O            16.99913150     0.038          18O            17.9991604
modified from: http://www.ionsource.com/Card/Mass/mass.htm
Monoisotopic- /Average mass
The monoisotopic mass is the mass of the isotopic peak whose elemental
composition is composed of the most abundant isotopes of those elements
Figure: a simulated isotopic distribution of the [M+H]+ ion of a compound (polyalanine); the average mass is 1156.3.
Monoisotopic mass is expressed in atomic mass units (amu) or in daltons (Da).
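The difference between the two kinds of mass can be checked at the level of single elements. The sketch below takes the isotope masses and natural abundances for C, H and N from the isotope table in this lecture. It returns the mass of the most abundant isotope as the monoisotopic mass and the abundance-weighted mean as the average mass. Oxygen is left out because the table gives no abundance for 18O. The function names are hypothetical.

```python
# (isotope mass in Da, natural abundance in %) for each element.
ISOTOPES = {
    "C": [(12.0, 98.93), (13.0033548378, 1.07)],
    "H": [(1.0078250321, 99.9885), (2.0141017780, 0.0115)],
    "N": [(14.0030740052, 99.632), (15.0001088984, 0.368)],
}


def monoisotopic_mass(element):
    """Mass of the most abundant isotope of the element."""
    return max(ISOTOPES[element], key=lambda iso: iso[1])[0]


def average_mass(element):
    """Abundance-weighted mean mass over the listed isotopes."""
    isotopes = ISOTOPES[element]
    total = sum(abundance for _, abundance in isotopes)
    return sum(mass * abundance for mass, abundance in isotopes) / total


carbon_mono = monoisotopic_mass("C")   # 12.0
carbon_avg = average_mass("C")         # about 12.0107
```

Because every carbon atom adds about 0.011 Da to the average mass relative to the monoisotopic mass, the gap between the two grows with the size of the peptide.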
Accuracy
The higher the accuracy, the better and more specific the protein hit.
Accurate measurements of peptide masses
Accurate databases
Relies on the ability to search data already present in
various databases
Effect of Mass Accuracy and Mass Tolerance
search m/z    mass tolerance (Da)    # hits
1529          1                      478
1529.7        0.1                    164
1529.73       0.01                   25
1529.734      0.001                  4
1529.7348     0.0001                 2
Tryptic digestion of human hemoglobin alpha chain yields 14 tryptic peptides, of which the peptide VGAHAGEYGAEALER has an exact monoisotopic mass of 1528.7348 Da.
The singly charged ion of this peptide has an m/z value of 1529.7348. The hit counts are the result of searching the SWISS-PROT database with this value against all human and mouse proteins.
Liebler, Introduction to Proteomics
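The effect of the tolerance window can be reproduced on a toy scale. The sketch below counts how many candidate peptide masses fall within a given tolerance of the measured m/z, and converts a tolerance given in ppm to daltons. The candidate list is hypothetical and far smaller than a real database, and the function names are assumptions for illustration.

```python
def count_hits(query_mz, candidate_mz, tolerance_da):
    """Number of candidates within +/- tolerance_da of the query m/z."""
    return sum(1 for m in candidate_mz if abs(m - query_mz) <= tolerance_da)


def ppm_to_da(mz, ppm):
    """Convert a relative tolerance in ppm to an absolute one in Da."""
    return mz * ppm / 1e6


# Hypothetical theoretical [M+H]+ values near the measured peak.
candidates = [1528.90, 1529.20, 1529.70, 1529.731,
              1529.7343, 1529.76, 1530.10, 1530.60]
query = 1529.7348

hits = {tol: count_hits(query, candidates, tol)
        for tol in (1.0, 0.1, 0.01, 0.001)}
```

As with the database search above, each tenfold tightening of the tolerance removes candidates, which is why accurate mass measurement makes the identification more specific.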
Database search
Peptide mass fingerprinting provides evidence for the most
probable identity of a protein.
Ideally, a verified genome sequence should be available for the organism that you are working on.
If not, the next best situation is that good cDNA data are available.
If neither is the case, it is worth checking whether there are any expressed sequence tags (ESTs) that can be used.
The quality of the protein identification will depend upon:
Quality of the mass spectrometry data
The accuracy of the database
The power of the search algorithms and software used
Tools for fingerprinting
Mascot (Matrix Science)
Aldente (ExPasy)
Profound
MS-Fit (Prospector; UCSF)
Several of the available peptide mass fingerprinting programs use more sophisticated scoring algorithms. These correct for scoring bias due to protein size (larger proteins give rise to a greater number of peptides) and for the tendency of smaller peptides in databases to have a greater number of matches with search m/z values.
Some of these algorithms also apply probability based statistics to better define the
significance of protein identifications.
Mascot PMF
Mascot PMF score
> 5% probability that the match is a random event, of no significance
The significance of the result
depends on the size of the
database being searched
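The link between score, probability and database size can be made concrete. Mascot reports scores as -10*log10(P), where P is the probability that the match is a random event. The sketch below additionally assumes the commonly described threshold in which a match counts as significant when P is below 0.05 divided by the number of database entries considered. The entry counts are hypothetical, and the actual threshold for a given search is the one Mascot reports.

```python
import math


def score_from_probability(p):
    """Probability-based score: -10 * log10(P)."""
    return -10.0 * math.log10(p)


def significance_threshold(n_entries, alpha=0.05):
    """Score above which a match is significant at level `alpha`,
    assuming a random-match probability cutoff of alpha / n_entries."""
    return -10.0 * math.log10(alpha / n_entries)


small_db = significance_threshold(50_000)     # about 60
large_db = significance_threshold(500_000)    # about 70
```

Under this assumption, a tenfold larger database raises the required score by 10, so the same spectrum can be significant against a restricted taxonomy and not significant against a full database.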
Mascot PMF results
Probability to be
random
Entry name
Coverage similarity
coverage
% of protein length covered by the
experimental peptides
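Coverage as defined here can be computed by marking every residue that lies inside at least one matched peptide. This is a minimal sketch in which peptides are located by exact substring search. The function name and the toy protein are hypothetical.

```python
def sequence_coverage(protein, matched_peptides):
    """Percent of protein residues covered by the matched peptides."""
    covered = [False] * len(protein)
    for peptide in matched_peptides:
        start = protein.find(peptide)
        while start != -1:
            # Mark every residue of this occurrence as covered.
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Toy protein of 11 residues; two of its three tryptic peptides matched.
coverage = sequence_coverage("AAKRPGGRCCK", ["AAK", "CCK"])
```

Overlapping peptides, for example those with missed cleavages, are counted once because coverage is tracked per residue.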
Mascot protein view
ALDENTE
Aldente is a tool to identify proteins from peptide mass
fingerprinting data
(http://www.expasy.org/tools/aldente/):
Aldente, protein window
Aldente, peptide view
Aldente, results
S = S1 * S2
S = final score for this entry
S1 = sum of each peptide score
S2 = protein level score
The scoring is tunable:
the weights of each parameter in the score, can be
defined independently
newt
Swiss prot entry
Aldente results
Profound
http://prowl.rockefeller.edu/prowl-cgi/profound.exe
Profound results
Graphic display of results
MS-FIT
University of California, San Francisco (UCSF) Mass Spectrometry Facility
http://prospector.ucsf.edu/prospector/4.0.8/html/msfit.htm
MS-Fit
MS-Fit detailed report