Transcript Step 1

Bioinformatics and Protein
Database Concepts
With the emergence of high-throughput techniques
for generation of protein sequences, computational
tools are required for storing, sharing, analyzing and
updating this data. Databases and their associated
features provide tools for the meaningful
storage of biological data.
Surabhi Agarwal
Master Layout: Part 1
1
This animation consists of 2 parts:
Part 1: From wet lab to Bioinformatics
Part 2: Database concepts and Protein databases
Extract protein, purify and cleave it into
smaller peptides.
Protein extract
Derive the amino acid sequence of the
peptide using methods like:
Edman degradation
Mass spectrometry
Protein Sequence
MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKD
LGEENFKALVLIAFAQYLQQCPFEDHVKLVNEVTEFAK
TCVADESAENCDKSLHTLFGDKLCTVATLRETYGEMA
DCCAKQEPERNECFLQHKDDNP
Protein sequences determined and stored in
databases for future usage
Re-draw all images
Reference: Biochemistry by Stryer et al., 5th edition
1
Definitions of the components:
Part 1 – From wet lab to bioinformatics
1. Protein: A protein is a biomolecule made of chains of amino acid residues.
These chains are formed by joining amino acids through the elimination of a water
molecule, which creates a “peptide bond”. Proteins perform the structural,
functional and regulatory functions of the cell.
2. Peptide: Short chains of amino acids, typically fewer than about 50
residues, are called peptides.
3. Amino acid sequence: The order of amino acids and their linear arrangement is
known as the amino acid sequence. It is also known as the primary structure of the
protein.
4. Edman degradation: This is a chemical method for sequencing amino acid
residues in a protein or a peptide. The N-terminal residue is labelled using phenyl
isothiocyanate and then cleaved from the remaining peptide chain without
disrupting any of the other peptide bonds. This labelled amino acid is then detected
and the procedure is repeated to identify each N-terminal amino acid sequentially.
5. Mass spectrometry: A technique for the production and detection of charged
molecular species in vacuum, after their separation by magnetic and electric fields
based on their mass-to-charge (m/z) ratio.
Step 1: Protein Extraction
Break open the cells
2
3
4
Protein source
(usually a cultured
tissue or microbial
extract)
Re-suspend the extract
in lysis buffer
Centrifugation
Supernatant containing
proteins is isolated
Action
As shown in
animation
CENTRIFUGE
Description of the action
Redraw all the figures. Animator has to re-draw the figure
titled “CENTRIFUGE” with all the labeling, as it has been
taken from a web-resource. On ‘protein source’ show the
zoom in effect focusing on purple molecule. Show the
arrow that leads to breaking the molecule. Add a spin
effect on the crude extract to depict centrifugation. Remove
the supernatant (orange liquid in last figure) .
Crude Extract
Audio Narration
The cells present in the tissue culture are lysed open thereby
releasing crude extract. This extract is centrifuged to separate the
protein mixture from the cell debris. The supernatant obtained is
made up of a mixture of proteins having a variety of properties.
Protein of interest must then be isolated from this mixture.
5
http://3.bp.blogspot.com/_xW3FQUQ2DYI/Rp4DF1r_0HI/AAAAAAAAAhY/B5MzdxVSV6I/s400/centrifugation.png
Biochemistry by Stryer et al., 5th edition
Step 1: Protein Extraction
Proteins are purified using various techniques such as
Chromatography
Electrophoresis
Solution containing purified protein
extract. Proteins are cleaved into
smaller peptides using proteases.
Action
Description of the action
As shown in
animation
This slide is in continuation with the
previous slide. Show the arrow from
first figure to the two techniques.
Then show converging arrows to the
last figure
5
Biochemistry by Stryer et al., 5th edition
Biochemistry by A.L.Lehninger et al., 3rd edition
Audio Narration
The protein of interest is separated from the
protein mixture present in the supernatant.
This is carried out by suitable techniques
such as chromatography or electrophoresis
which make use of various properties of the
proteins such as their charge, mass, etc. for
separation.
1
Step 2: Edman Degradation
Peptide to be sequenced:
Ala-Gly-Asp-Phe-Arg-Gly
First round
2
3
4
5
Action
Description of the action
Breakdown of
Molecule
Re-draw all images. Both sides depict the
same process. Left side is the schematic
and right side is the same process at
molecular level. Show the steps of both
processes in a parallel fashion
Biochemistry by Stryer et al., 5th edition
Audio Narration
Edman degradation employs the phenyl isothiocyanate reagent,
which reacts with the amino-terminal residue of the peptide
to give the phenylthiocarbamoyl derivative of that amino acid residue. Under mildly acidic conditions, a cyclic derivative of
the terminal amino acid is released in the form of a PTH-amino acid,
which can then be identified by chromatographic techniques.
The procedure is then repeated to identify each N-terminal
amino acid sequentially.
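The repeated rounds described above can be pictured with a short Python sketch. It only models the bookkeeping of the method (which residue is released in each round and what remains), not the chemistry; the function name edman_degradation is an illustrative assumption.

```python
def edman_degradation(peptide):
    """Yield (round, released_residue, remaining_peptide) for each cycle.

    The peptide is written with three-letter codes joined by hyphens,
    N-terminus first, as on the slide.
    """
    residues = peptide.split("-")
    for cycle in range(1, len(residues) + 1):
        released = residues[cycle - 1]          # current N-terminal residue
        remaining = "-".join(residues[cycle:])  # shortened peptide for the next round
        yield cycle, released, remaining


for cycle, released, remaining in edman_degradation("Ala-Gly-Asp-Phe-Arg-Gly"):
    print(f"Round {cycle}: PTH-{released} identified, remaining peptide: {remaining or '(none)'}")
```

Reading the released residues in order reproduces the sequence of the peptide from its N-terminus.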
Step 3: Mass Spectrometry
Vacuum Envelope
Ionization
Ion
Source
2
Forms ions
(charged molecules)
3
Mass
Analyzer
Ion
Detector
Sort Ions by Mass (m/z)
Detects ions
Data Processing
Action
Description of the action
Experimental
Process as
shown in
animation
From the 1st figure show an arrow leading to figure
2, “Ion Source”. From there an arrow leads to the
“Mass Analyzer” followed by the “Ion Detector”.
Enclose all these figures in a box titled “Vacuum
Envelope”. From there on, an arrow leads to the
“Data System” and then to “Data Processing”
Data System
Relative
Abundance
Sample
Inlet
4
5
Detection
Mass Analyzer (filtering)
Mass Spectrum
1000
m/z
2000
Audio Narration
The mass spectrometer is an instrument that produces charged molecular
species in vacuum, separates them by means of electric and magnetic fields and
measures the mass-to-charge ratios and relative abundances of the ions thus
produced. A tandem mass spectrometer makes use of a combination of two
mass analyzers, separated by a collision cell, in order to provide improved
resolution of the fragment ions. The first mass analyzer usually operates in a
scanning mode in order to select only a particular peptide ion which is further
fragmented and resolved in the second analyzer. This can be used for protein
sequencing studies.
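As a worked illustration of the mass-to-charge ratio mentioned above, the following Python sketch computes m/z for the example peptide Ala-Gly-Asp-Phe-Arg-Gly (AGDFRG) at different charge states. It assumes standard monoisotopic residue masses and positive-mode protonation; the function names are illustrative and only the residues needed for this peptide are tabulated.

```python
# Monoisotopic residue masses in daltons (only the residues used here).
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "D": 115.02694,
    "F": 147.06841,
    "R": 156.10111,
}
WATER = 18.01056   # one water is added for the free N- and C-termini
PROTON = 1.00728   # mass of the proton that carries each positive charge


def peptide_mass(sequence):
    """Neutral monoisotopic mass of a peptide given in one-letter code."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mz(sequence, charge):
    """m/z of the peptide carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (peptide_mass(sequence) + charge * PROTON) / charge


for z in (1, 2):
    print(f"AGDFRG, charge {z}+: m/z = {mz('AGDFRG', z):.4f}")
```

The same peptide appears at a lower m/z when it carries more charges, which is why the analyzer sorts ions by m/z rather than by mass alone.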
Master Layout: Part 2
1
This animation consists of 2 parts:
Part 1: From wet lab to Bioinformatics
Part 2: Database concepts and Protein databases
Based on the type of the data and its
prospected usage, design a
database schema.
Establish, Maintain and Disseminate
Protein Sequence Database which can
be:
Primary (Amino acid sequence)
Secondary (Patterns, Motifs)
Structure (Folds, Domains)
Provide software and analysis
tools to access this data
Re-draw all images.
Reference: Biochemistry by Stryer et al., 5th edition
1
Definitions of the components:
Part 2 – Database concepts and Protein databases
1. Type of data: The data stored in biological databases can be of various
types such as pure sequences, sequences with structure, meta-data about the
source of the sequence, experimental details, etc.
2. Prospected usage: The databases are primarily used to store all the
information in a single web-based resource. They also provide analysis tools for
various sequence analysis functions such as pair-wise sequence alignment,
multiple sequence alignment, homology modelling, etc.
3. Database schema: The design of the database at various levels is called a
database schema. It includes the attributes of all individual tables and the
relationships between them. The schema is defined at three levels, namely,
“Physical”, “Logical” and “View”.
4. Primary Database: In biological database studies, primary databases store
only the protein sequence information.
5. Secondary Database: In biological database studies, secondary databases
refer to the repository of domains and patterns that occur within a sequence.
This information can be stored in the form of signature patterns, fingerprints,
etc.
6. Structure Database: In biological database studies, structural databases store
the three-dimensional geometry of the protein. They store the atomic coordinates
of individual atoms in the protein molecule and other geometrical parameters
along with sequence information.
7. Analysis tools: Analysis tools are the software tools that are available on most
of the web-based database sites. These tools help in conducting further studies
and analysis on protein sequences such as alignment, phylogenetic predictions,
etc.
8. Meta data: Meta-data is the information about the data that is
documented in a database. It covers various features such as the source of
data, methods for retrieval, etc.
Step 1: A generic protein DB: Types of data
Sequence
• Amino-acid sequence
• Location
• Length of the sequence
• Molecular type and classification
• Accession and version, Gene ID
• Keywords and Feature table
• Patterns and Domains
Source
• Source organism
• Scientific name and common name
• Taxonomy
• Organelle
Reference
• Author
• Title
• Journal
• Cross references
• Comments
Gene
• Source gene
• Corresponding mRNA
• Corresponding Coding Sequence (CDS)
Action
Categories of data
Description of the action
Animate the sub-parts according to the
order given in this animation, i.e.
“Sequence” followed by its descriptive
blue box. Similarly for “Source”,
“Reference” and “Gene”. Re-draw all
images
http://www.ncbi.nlm.nih.gov/
http://expasy.org/
http://www.pdb.org/pdb/home/home.do
http://www.ddbj.nig.ac.jp/
Audio Narration
All data related to a protein can be divided into four broad
categories, namely Sequence details, Source, Gene details and
References. “Sequence” details contain the features of a protein’s
amino acid sequence such as the length, location, patterns and
identifiers of the protein sequence. “Source” contains
information about the biological source from which the
protein was obtained. “Gene” contains details of the gene from which the protein
is expressed. “Reference” contains the details of the
research publication in which the study was reported.
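The four categories described above can be sketched as a small record structure in Python. This is only an illustration under stated assumptions: the class and field names are made up for this example, the accession "DEMO0001" and gene name "DEMO_GENE" are hypothetical placeholders, and the sequence is the first 30 residues of the example sequence shown earlier in this storyboard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SequenceDetails:
    amino_acid_sequence: str
    accession: str
    keywords: List[str] = field(default_factory=list)

    @property
    def length(self) -> int:
        # Length is derived from the sequence instead of being stored twice.
        return len(self.amino_acid_sequence)


@dataclass
class Source:
    scientific_name: str
    common_name: str = ""
    organelle: str = ""


@dataclass
class Gene:
    source_gene: str
    mrna: str = ""
    cds: str = ""


@dataclass
class Reference:
    author: str
    title: str
    journal: str


@dataclass
class ProteinRecord:
    sequence: SequenceDetails
    source: Source
    gene: Gene
    references: List[Reference] = field(default_factory=list)


# Hypothetical example entry (placeholder identifiers, not a real database record).
record = ProteinRecord(
    sequence=SequenceDetails("MKWVTFISLLFLFSSAYSRGVFRRDAHKSE", accession="DEMO0001"),
    source=Source("Homo sapiens", common_name="human"),
    gene=Gene("DEMO_GENE"),
)
print(record.sequence.length, record.source.scientific_name)
```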
Step 2: A Generic Protein DB Schema
1
PHYSICAL
2
Describes the physical
location of storage of
the data within a
database.
3
4
Action
Defines the
various Database
schemata
5
LOGICAL
Describes which type
of data will be stored
in which particular
table and the
relationships between
these tables.
Description of the action
Show the three boxes as the first step while the
narrator speaks the first line of audio narration
“Database …. and View”. In the next step of
animation, show the text of each box
VIEW
Describes the user
interface of the
database and the
view that will be
shown to the user.
Audio Narration
Database design is done at various levels such as Physical,
Logical and View. At the physical level, we define how and where
the data is stored, in accordance with the prospected usage.
At the logical level, we define the tables, the attributes of the tables
and the relationships between tables. The logical level is the most
complex and important schema for databases and requires a
thorough understanding of the data and its contexts and
relationships. At the View level we define the views and
appearance of the database.
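The logical and view levels can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table names, column names and the example row are assumptions made for this illustration; they are not the schema of any real protein database. The physical level is left to the database engine, which here simply keeps the data in memory.

```python
import sqlite3

# Physical level: the engine decides how data is stored; here it lives in memory.
conn = sqlite3.connect(":memory:")

conn.executescript("""
-- Logical level: tables, their attributes and the relationship between them.
CREATE TABLE organism (
    organism_id     INTEGER PRIMARY KEY,
    scientific_name TEXT NOT NULL
);
CREATE TABLE protein (
    accession   TEXT PRIMARY KEY,
    name        TEXT NOT NULL,
    sequence    TEXT NOT NULL,
    organism_id INTEGER NOT NULL REFERENCES organism(organism_id)
);
-- View level: what is shown to the user.
CREATE VIEW protein_summary AS
SELECT p.accession, p.name, LENGTH(p.sequence) AS length, o.scientific_name
FROM protein AS p
JOIN organism AS o ON o.organism_id = p.organism_id;
""")

# Hypothetical example row (placeholder accession).
conn.execute("INSERT INTO organism VALUES (1, 'Homo sapiens')")
conn.execute(
    "INSERT INTO protein VALUES (?, ?, ?, ?)",
    ("DEMO0001", "Serum albumin (fragment)", "MKWVTFISLLFLFSSAYSRGVFRRDAHKSE", 1),
)

rows = conn.execute("SELECT * FROM protein_summary").fetchall()
print(rows)
```

The user only queries the view, so the underlying tables can be reorganized without changing what the user sees.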
Step 3: Protein Database characteristics
PRIMARY/SEQUENCE DATABASE
• SWISS-PROT
• UNI-PROT
• NCBI
DERIVED/SECONDARY DATABASE
• Prosite
• ProDom
• Pfam
STRUCTURAL DATABASE
• PDB
• Proteopedia
• Biological Structural Database from EBI
ANALYSIS TOOLS
• BLAST
• FASTA
• Multiple Sequence Alignment
• Structure Prediction
• Functional annotation
• Search engine
• Pattern and Domain alignment/search
Action
Defines the various Database schemata
Description of the action
Show the central round figure followed by the 3
types of DB on the left and their examples. In the
end, show the Analysis tools and its examples
TYPE
TOOLS
Audio Narration
A typical biological database can be characterized by its
“Type” and its “Tools”. The “Type” defines the category of
data that it includes, such as sequence, domains or
structure. This implies that the particular database’s most
prominent feature includes either sequences, domains or
structure and it will primarily be used for their analysis.
The analysis tools define the platforms that the site
provides for gaining insight into the protein data.
http://www.ncbi.nlm.nih.gov/, http://expasy.org/, http://www.pdb.org/pdb/home/home.do, http://www.ddbj.nig.ac.jp/, http://www.ebi.ac.uk/Databases/structure.html
http://www.uniprot.org/, http://expasy.org/prosite/, http://prodom.prabi.fr/prodom/current/html/home.php, http://pfam.sanger.ac.uk/,
http://www.proteopedia.org/wiki/index.php/Main_Page
1
Step 4: Database input formats
PROTEIN DATABASE
Enter your Query term
SEARCH DATABASES
UNIQUE ID: P01009
MOLECULE NAME: Serum albumin
AMINO-ACID SEQUENCE: MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDLGEENFKALVLIAFAQYLQQCPFEDHVKLVNEVTEFAKTCVADESAENCDKSLHTLFGDKLCTVATLRETYGEMADCCAKQEPERNECFLQHKDDNP
KEYWORD: Acute phase OR blood coagulation OR Protease inhibitor
LITERATURE: Full-length cDNA libraries and normalization
GENE: SERPINA1
TAXONOMY: 9606[NCBI]
SEARCH
Action
Shows the general functions in a database
Description of the action
Follow the steps as shown in the
animation. DO NOT animate the yellow
box. As the animated cursor goes to
“Unique ID” narrator will read the text that
comes in the yellow box displayed along
with “Unique ID” entry. This will be followed
by an example entry in the white box.
Similarly, “Molecular Name” will be followed
by its corresponding narration in yellow
box, and so on.
Unique ID: Enter the unique identification number for the protein. These IDs vary according to the database, such as accession number, GeneID, ODB ID, etc.
Molecule name: Enter the name of the molecule to be searched.
Amino-acid sequence: Enter the amino-acid sequence of the protein to be analyzed.
Keyword: Enter the key-word to identify the protein, peptide, gene or other related information.
Literature: Enter the literature related information, like the name of the journal, citation or title of the research paper.
Gene: Enter the name of the gene that codes for the protein.
Taxonomy: Enter the taxonomic identifiers.
Audio Narration
For extracting the protein information from a database,
users can give a variety of input terms. These can be:
1. Unique ID: <Read the text in the yellow box in each case>
2. Molecular Name
3. Amino-acid sequence
4. Keyword
5. Literature
6. Gene
7. Taxonomy
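The seven input types listed above can be combined into a single fielded query. The Python sketch below shows one way this might be done; the field names and the "field:value" syntax joined by AND are assumptions for illustration only and do not reproduce the query syntax of any particular database.

```python
# Hypothetical field names, one per input type on the slide.
ALLOWED_FIELDS = (
    "unique_id", "molecule_name", "sequence",
    "keyword", "literature", "gene", "taxonomy",
)


def build_query(terms):
    """Combine {field: value} search terms into one fielded query string."""
    unknown = set(terms) - set(ALLOWED_FIELDS)
    if unknown:
        raise ValueError(f"unsupported search fields: {sorted(unknown)}")
    parts = []
    for name in ALLOWED_FIELDS:  # fixed order gives reproducible query strings
        if name in terms:
            value = terms[name]
            if " " in value:
                value = f'"{value}"'  # quote multi-word terms
            parts.append(f"{name}:{value}")
    return " AND ".join(parts)


print(build_query({"gene": "SERPINA1", "taxonomy": "9606", "keyword": "Protease inhibitor"}))
```

Restricting the field names up front means a mistyped field is reported immediately instead of silently returning no hits.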
1
Step 5: Database output formats
PROTEIN DATABASE
Enter your Query term
SEARCH DATABASES
SEARCH
MOLECULAR DESCRIPTION
ANNOTATIONS
GENE NAMES AND DESCRIPTION
IDs OF ENTRIES IN RELATED DATABASE
EXPERIMENT DETAILS
SECONDARY STRUCTURAL DETAILS
SOURCE ORGANISM DETAILS
CITATIONS
PATTERN ANALYSIS
Action
Output from database
Description of the action
Co-ordinate the animation with the audio
narration. For example, in animation mode,
the first step is to display “Molecular
Description”. This display must have the first
point of audio narration spoken along with it.
Show each output tab as and when it is
narrated
Audio Narration
Once the user submits the query, the output can be in multiple formats. The
generalized information that users can obtain from protein databases includes:
1. General description of the protein molecule
2. Annotations of the protein
3. Name and description of the gene that encodes it
4. ID of the same protein in other relevant databases
5. Details of the experiment conducted for characterizing the protein
6. Details of the protein’s secondary structure
7. Details of the organism which was used as a source for obtaining the
protein
8. Citations of research conducted on this protein
9. Patterns occurring within the sequence and their analysis
1
Step 6: Database Analysis Tools
INPUT
MAPWMHLLTVLALLALWGPNSV
QAYSSQHLCGSNLVEALYMTCG
RSGFYRPHDRRELEDLQVEQAE
LGLEAGGLQPSALEMILQKRGIV
DQCCNNICTFNQLQNYCNVP
ANALYSIS TOOLS
OUTPUT
Identify protein from sequence
Identify physico-chemical properties such as chemical formula, half-life, iso-electric point, molecular weight, etc.
Aligned sequences and structures
Variable and conserved residues
Predicted Secondary and Tertiary Structures
Synonyms and Scientific terminology of proteins
Action
Input Output Slide
Description of the action
Display the panel on the left. In the first step the input
appears, followed by the arrow embossed with the letters
“Analysis Tools”. The output panel appears thereafter,
with each output appearing one after the other. At the
display of each output, the narrator reads aloud the
text written
Audio Narration
This slide shows the different kinds of analysis that can be conducted on a given protein sequence. The
query can be the protein name, sequence or any other identifier of the protein. In this example, we
provide the protein sequence as Input. Once the query protein sequence is entered into the Analysis
tool, it can give various kinds of results such as
1. Identify protein from sequence
2. Identify physico-chemical properties such as chemical formula, half-life, iso-electric point, molecular
weight, etc.
3. Aligned sequences and structures
4. Variable and conserved residues
5. Predicted Secondary and Tertiary Structures
6. Synonyms and Scientific terminology of proteins
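To make the idea of “variable and conserved residues” concrete, the following Python sketch marks the conserved columns of a set of already-aligned sequences. The three short sequences are hypothetical toy data, and the function name is illustrative; real tools also compute the alignment itself, which is not attempted here.

```python
def conservation_line(aligned):
    """Return a string with '*' under fully conserved columns and ' ' elsewhere.

    `aligned` is a list of sequences of equal length (gaps written as '-').
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        # A column is conserved when every sequence has the same residue there.
        conserved = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if conserved else " ")
    return "".join(marks)


# Hypothetical toy alignment, used only to show the output format.
alignment = ["MKWVT", "MKWIT", "MRWVT"]
for seq in alignment:
    print(seq)
print(conservation_line(alignment))
```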
1
2
Step 1: Case study: To study the characteristics of
human serum albumin
OBTAIN FASTA
SEQUENCE
PHYSICOCHEMICAL
PROPERTIES
DOMAIN
ANALYSIS
STRUCTURAL
ANALYSIS
3
View Full Animation
4
Action
Slides with
options to
choose a step
or view the full case
study
Description of the action
Display the 4 panels in the animation.
These 4 steps are in sequence, but the
user must be given an option to directly go
to the specific step if they want to. In the
bottom, give a link to view full case study
5
http://www.pdb.org/pdb/home/home.do
Audio Narration
We explain the usage of Protein databases using
the example of “Human Serum Albumin” protein. If
you want to view a specific step in the case study,
click on the relevant panel. Else click on “View Full
Animation”
1
Step 1.a : Obtain FASTA Sequence– SWISS PROT
2
3
4
Serum Albumin
Action
Retrieving data
5
Description of the action
All the screenshots taken from the website need to be remade by the animator to
simulate the web-based environment.
None of the images should be taken directly from the
web database. Follow the steps as shown
in the animated flowchart
http://expasy.org/sprot/
Audio Narration
Open a web browser and go to the website listed
for this slide. On the top right corner of
the page, there will be a search box. Click on the
drop-down arrow next to the “Search” box (indicated by the
arrow). We get a list of databases to
search from. Select UniProtKB. Type the name of
the protein of your choice (e.g. Serum Albumin) in
the text box after the word “for”.
1
Step 1.b : Obtain FASTA Sequence– SWISS PROT
2
3
4
Action
Retrieving data
Description of the action
Re-make all the screen shots. Follow the
steps as shown in the animated flowchart
5
http://expasy.org/sprot/
Audio Narration
The results page for the search shows 179 hits for our
query. It is shown on the top of the page. The first 25
of them are shown on the first page, which can be
viewed by scrolling down the page. Click on the entry
of your choice. Here we click on the human Albumin hit
(ALBU_HUMAN)
1
Step 1.c : Obtain FASTA Sequence– SWISS PROT
2
3
4
Place for headings. Scroll
down to find the word
“Sequences” in this position
Action
Retrieving data
Description of the action
The first image is displayed parallel to the
narration “The top… like this”. When the arrow
appears read the second line of narration
“Search for…the page”. The second panel of
images in this slide goes parallel to narration
“Click on tab…new tab”. In the last panel
5
http://expasy.org/sprot/
Audio Narration
The top of the result page looks like this. Search for the
heading “Sequences”, by scrolling down the page. Click on
the tab FASTA next to the sequence of your interest. The
FASTA sequence opens on a new tab. Save this FASTA
sequence in your computer.
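A saved FASTA file consists of a descriptive line starting with “>” followed by the sequence split over one or more lines. The Python sketch below reads such text into (header, sequence) pairs. The header text in the example is a hypothetical placeholder, and the two sequence lines are taken from the example sequence shown earlier in this storyboard.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if header is not None:        # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)           # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical header; real entries carry the database's own descriptive line.
fasta_text = """>demo_albumin_fragment hypothetical header line
MKWVTFISLLFLFSSAYSRGVFRRDAHKSE
VAHRFKDLGEENFKALVLIAFAQYLQQCPF
"""

for header, sequence in parse_fasta(fasta_text):
    print(header, len(sequence))
```

Keeping the header separate from the sequence makes it easy to paste only the amino acid sequence into tools that do not accept the descriptive line.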
Step 1.d : Analysis Tools
1
ProtParam
HeliQuest
2
Radar
SAPS
3
Three to One
ColorSeq
4
Action
Types of Tools
Description of the action
Show the chart with the color coded division for
types of tools as shown in figure. Highlight the
“Primary Structural Analysis” and follow it up by
the display of all the tabs on the right. Highlight
the first tool “ProtParam”
5
http://expasy.org/sprot/
Audio Narration
Once the FASTA sequence is retrieved, we can subject it to a
variety of protein analysis tools, which are broadly classified
into “Sequence Similarity search tools”, “Primary structural
analysis tools”, “Phylogenetic Analysis tools”, “Molecular
Modeling and Visualisation Tools” and “Structure Prediction
tools”. Here we explore the web-based service called
ProtParam, which belongs to the “Primary Structural Analysis
tools”. For exploring other such services, users can visit
http://expasy.org/sprot/
Step 2.a : Physico-chemical Properties– SWISS PROT
1
Enter the accession number
OR paste the sequence here
Delete the first line (descriptive
line) from your FASTA sequence,
such that only the amino-acid
sequence is there
Click on Compute Parameters
3
4
Action
Tool Input
Description of the action
Re-make all the screen shots. Follow the
steps as shown in the animated flowchart
5
http://expasy.org/tools/protparam.html
Audio Narration
The front-end for the tool will ask you to input the
accession ID of the protein under study OR the sequence of
that protein. Delete the first line (descriptive line) from your
FASTA sequence, such that only the amino acid sequence
is there. Click on “Compute Parameters”. On the results
page, scroll down to find the various physico-chemical
parameters of this protein
Step 2.c : Physico-chemical Properties– SWISS PROT
1
CSV stands for “Comma Separated Values”.
Files with the .csv extension can be easily
accessed in plain text as well as spreadsheet
formats
2
3
4
Action
Tool Output
Description of the action
Re-make all the screen shots. Follow the steps
as shown in the animated flowchart. When the
user clicks on the green highlighted tab, the
definition must be read aloud along with the
written display of the definition in a separate
box as shown in the slide animation
5
http://expasy.org/tools/protparam.html
Audio Narration
This part of the results gives the percentage of each amino
acid in the sequence. The highlighted region indicates the
CSV file link. CSV stands for “Comma Separated Values”;
such files can be opened in plain text as well as spreadsheet
formats. The file can be downloaded in its comma-separated
format by clicking on the link. CSV files can also be
opened with Microsoft Excel
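The amino acid percentages and the CSV format described above can be reproduced in a few lines of Python. This is a minimal sketch, assuming a simple three-column layout (amino_acid, count, percent) chosen for this example; it is not the layout of the file produced by the web tool.

```python
import csv
import io
from collections import Counter


def composition_csv(sequence):
    """Return the amino acid composition of `sequence` as CSV text."""
    counts = Counter(sequence)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["amino_acid", "count", "percent"])
    for aa in sorted(counts):
        percent = 100 * counts[aa] / len(sequence)
        writer.writerow([aa, counts[aa], f"{percent:.1f}"])
    return buffer.getvalue()


print(composition_csv("MKWVTFISLLFLFSSAYSRGVFRRDAHKSE"))
```

Because the values are separated by commas, the same text opens as plain text in an editor or as a table in a spreadsheet program.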
Step 2.d : Physico-chemical Properties– SWISS PROT
1
Formula represents the chemical formula for the query molecule
Represents the number of atoms present in the molecule
This shows the charge states of the amino
acid residues within the protein molecule
Half-life describes the estimated time required for
half of the amount of the protein to be
degraded
Defines the solubility of the proteins.
Hydrophobic molecules exhibit a positive
GRAVY value while hydrophilic molecules
show a negative GRAVY value.
3
4
Action
Tool Output
Description of the action
Re-make all the screen shots. Follow the steps
as shown in the animated flowchart. When the
user clicks on the green highlighted tab, the
definition must be read aloud along with the
written display of the definition in a separate
box as shown in the slide animation
5
http://expasy.org/tools/protparam.html
Audio Narration
Other information that can be obtained from these databases
includes the chemical formula of the protein, the total number of
atoms present in the protein, the total number of negatively and
positively charged residues, the estimated half-life of the protein,
i.e. the time in which half of the amount of protein is degraded,
and the average hydropathicity, which gives an insight
into the solubility of the protein. Hydrophobic molecules
exhibit a positive GRAVY value while hydrophilic molecules
show a negative GRAVY value
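The GRAVY value (grand average of hydropathicity) and the charged-residue counts mentioned above can be sketched in Python as follows. The sketch assumes the Kyte-Doolittle hydropathy scale and counts Asp and Glu as negatively charged and Arg and Lys as positively charged; the function names are illustrative.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Sum of the hydropathy values divided by the number of residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def charged_residues(sequence):
    """Count negatively (Asp + Glu) and positively (Arg + Lys) charged residues."""
    return {
        "negative": sequence.count("D") + sequence.count("E"),
        "positive": sequence.count("R") + sequence.count("K"),
    }


for peptide in ("AGDFRG", "ILV"):
    print(peptide, round(gravy(peptide), 3), charged_residues(peptide))
```

A peptide rich in Ile, Leu and Val gives a positive GRAVY value, while one containing charged residues such as Asp and Arg is pulled towards negative values.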
Step 3.a : Domain Analysis– PROSITE
1
2
3
4
Action
Tool Input
Description of the action
Re-Draw all screen shots. Display
the sequence and then minimize it to
fit into the input window of the web
based tool. Show the clicking effect
on the button named “Scan”
5
http://expasy.org/prosite/
Audio Narration
Go to http://expasy.org/prosite/. Input the FASTA
sequence obtained in previous steps into the
input box of the server. Click on Scan.
Step 3.b : Domain Analysis– PROSITE
1
HITS BY PROFILE
HIT 1
2
HIT 2
3
4
HIGHEST SCORE
HIT 3
Action
Tool Output
Description of the action
Re-Draw all screen shots. Show the
3 results and then emphasize the
score of the 2nd hit as it is the
highest. Display a clicking effect on the 2nd
hit
5
http://expasy.org/prosite/
Audio Narration
The results page shows the profiles that are most
likely to occur in the sequence, and each is
assigned a score on that basis. You
should select the hit with the highest score.
Step 3.c : Domain Analysis– PROSITE
1
Location of Albumin
Domain in the
sequence – amino acid
position 210-402
PROSITE figure of the
albumin domain
CONSERVED CYSTEINE INVOLVED IN
DISULPHIDE BOND
POSITION OF THE
PATTERN MATCHED
FOR IDENTIFYING
DOMAIN
Structure of an
albumin domain
Action
Tool Output
Description of the action
Re-Draw all screen shots. Type the
name of the query in the search box.
Click on Go. Follow it up by an arrow
and the output image
5
http://expasy.org/prosite/
Audio Narration
The result displays the position of the Albumin domain
highlighted in the sequence from position 210-402. It also
displays a graphical view in the form of a downloadable PNG
image where the profile hits are represented as colored
shapes with their PROSITE name. It then displays the
structure of the Albumin domain, highlighting the disulphide-bonding cysteine residues as “C” and its
signature pattern as “*”
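Signature patterns of the kind mentioned above are written in a compact notation that can be translated into a regular expression. The Python sketch below handles the common elements of that notation (x for any residue, [..] for allowed residues, {..} for excluded residues, (n) or (n,m) for repetition, < and > for the sequence ends). It is a simplified illustration rather than a complete implementation, and it uses the well-known N-glycosylation site pattern N-{P}-[ST]-{P} as an example instead of the albumin profile, which is a weight matrix and not a pattern.

```python
import re

# One pattern element: a core, optionally followed by (n) or (n,m).
_ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""  # "<" anchors at the N-terminus
        suffix = "$" if element.endswith(">") else ""    # ">" anchors at the C-terminus
        core, low, high = _ELEMENT.fullmatch(element.strip("<>")).groups()
        if core == "x":                  # x = any amino acid
            core = "."
        elif core.startswith("{"):       # {P} = any amino acid except P
            core = "[^" + core[1:-1] + "]"
        # "[ST]" and single letters are already valid regex fragments
        if low:                          # (n) or (n,m) = repetition
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}")
print(regex)
# Report 1-based start positions of matches in a toy sequence.
print([m.start() + 1 for m in re.finditer(regex, "AANGSAA")])
```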
Step 4.a : Structural Analysis– RCSB PDB
Summary
Biology and Chemistry
Geometry
Molecular Description
Related PDB entries
Ligand chemical components
Derived data
Classification: Transport Protein
Structure Weight: 133377.93
Molecule: Serum albumin
Polymer: 1
Type: polypeptide(L)
Length: 585
Chains: A, B
Action
Tool Output Display
slide
Description of the action
Re-draw the tabs. The first panel of tabs is the
horizontal one. Of these, the “Summary” tab is active
in this slide; that is, the tab in white is active.
Under “Summary” there are 4 more tabs, which are
vertical. Of these, the blue tab is active. Slides 4.a
to 4.d show the vertical tabs active one by one.
Slides 4.e to 4.g show the remaining two horizontal
tabs active. Follow them one by one while reading the audio
narration of each slide, along with the display it carries
http://www.pdb.org/pdb/home/home.do
Audio Narration
Once the user enters “Serum Albumin” in the PDB search box, in the
output page of the selected PDB entry, we find the following tabs.
The horizontal tabs summarize the entire result page. The vertical
tabs occur as the initial description in the first page. Each of these
tabs can be explored in detail. The structural analysis of the protein
can display a wide range of properties such as the description of the
protein molecule including classification of the protein, the chains it
contains, number of amino acids, etc.
Step 4.b : Structural Analysis– RCSB PDB
Summary
Biology and Chemistry
Geometry
Molecular Description
Related PDB entries
Ligand chemical components
Derived data
1AO6  Crystal structure of human serum albumin
1BM0  Crystal structure of human serum albumin
1E7E  Human serum albumin complexed with decanoic acid
2BXC  Human serum albumin complexed with phenylbutazone
2BXF  Human serum albumin complexed with diazepam
2BXN  Human serum albumin complexed with myristate and iodipamide
Action
Tool Output Display
slide
Description of the action
Re-draw the tabs. The first panel of tabs is the
horizontal one. Of these, the “Summary” tab is active
in this slide; that is, the tab in white is active.
Under “Summary” there are 4 more tabs, which are
vertical. Of these, the blue tab is active. Slides 4.a
to 4.d show the vertical tabs active one by one.
Slides 4.e to 4.g show the remaining two horizontal
tabs active. Follow them one by one while reading the audio
narration of each slide, along with the display it carries
http://www.pdb.org/pdb/home/home.do
Audio Narration
The display also shows entries that are closely related to the user’s
query, such as in the case of the same protein characterized from a
different organism.
Step 4.c : Structural Analysis– RCSB PDB
Summary
Biology and Chemistry
Geometry
Molecular Description
Related PDB entries
Ligand chemical components
Derived data
Identifier: LQZ
Name: 2-(diethylamino)-N-(2,6-dimethylphenyl)ethanamide
Formula: C14 H22 N2 O
Interaction View: Ligand Explorer
Action
Tool Output Display
slide
Description of the action
Re-draw the tabs. The first panel of tabs is the
horizontal one. Of these, the “Summary” tab is active
in this slide; that is, the tab in white is active.
Under “Summary” there are 4 more tabs, which are
vertical. Of these, the blue tab is active. Slides 4.a
to 4.d show the vertical tabs active one by one.
Slides 4.e to 4.g show the remaining two horizontal
tabs active. Follow them one by one while reading the audio
narration of each slide, along with the display it carries
http://www.pdb.org/pdb/home/home.do
Audio Narration
Protein molecules are often structurally characterized
in complex with a ligand, with the structure determined by
experimental techniques. The description of these ligands is
given in the result summary of the query protein.
Step 4.d : Structural Analysis– RCSB PDB
Summary
Biology and Chemistry
Geometry
Molecular Description
Related PDB entries
Ligand chemical components
Derived data
Molecular Function
• DNA binding
• fatty acid binding
• copper ion binding
• protein binding
• drug binding
• lipid binding
• metal ion binding
• chaperone binding
Cellular Component
• extracellular region
• extracellular space
• platelet alpha granule lumen
• protein complex
Action
Tool Output Display
slide
Description of the action
Re-draw the tabs. The first panel of tabs is the
horizontal one. Of these, the “Summary” tab is active
in this slide; that is, the tab in white is active.
Under “Summary” there are 4 more tabs, which are
vertical. Of these, the blue tab is active. Slides 4.a
to 4.d show the vertical tabs active one by one.
Slides 4.e to 4.g show the remaining two horizontal
tabs active. Follow them one by one while reading the audio
narration of each slide, along with the display it carries
http://www.pdb.org/pdb/home/home.do
Audio Narration
Result summary displays derived data for the Serum Albumin such
as the molecular and biological functions that the protein is involved
in.
Step 4.e : Structural Analysis– RCSB PDB
Summary
Biology and Chemistry
Geometry
SNP ID        Amino Acid Change   PDB position
rs11538232    N->S                18
rs59066571    E->V                48
rs11538221    E->G                57
rs11538216    S->L                65
rs11538226    T->A                79
rs11538217    F->L                134
rs58624704    R->Q                186
rs11538220    K->E                190
rs17400586    K->N                190
rs3204504     A->V                191
rs3210154     A->T                191
rs11538228    A->G                194
rs3210163     Q->L                196
rs28930975    E->K                297
rs72552710    K->N                313
rs11538223    N->D                318
rs72552711    E->K                321
rs3210210     L->R                327
rs28930976    D->V                340
rs11538215    C->R                369
rs11538214    E->V                382
rs1140449     D->Y                451
rs1063469     K->E                466
rs60826059    T->I                478
rs11538227    S->P                517
rs57636959    K->N                536
rs61579038    T->P                540
rs72552712    K->E                545
rs11538208    M->K                548
Action
Tool Output
Display slide
Description of the action
Re-draw the tabs. The first panel of tabs is the
horizontal one. Of these, the “Biology and
Chemistry” tab is active in this slide
http://www.pdb.org/pdb/home/home.do
Audio Narration
The biological aspects of Serum Albumin are also displayed as
results. The unique feature of this tab is that it gives a
complete list of Single Nucleotide Polymorphisms (SNPs) affecting
the protein sequence. It shows the change in amino acids
as well as the locations of the SNPs and the SNP IDs.
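Each row of such a SNP listing pairs an amino acid change (for example K->N) with a position. The Python sketch below shows how one such change could be applied to a sequence while checking that the expected original residue is really present. The sequence used is a short hypothetical example with its own numbering; positions in a structure file may be numbered differently from a plain sequence, so that mapping is not attempted here.

```python
def apply_substitution(sequence, change, position):
    """Apply a change written as 'X->Y' at a 1-based position of `sequence`."""
    original, variant = change.split("->")
    index = position - 1
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[index] != original:
        raise ValueError(
            f"expected {original} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + variant + sequence[index + 1:]


# Hypothetical toy sequence and change, used only to show the mechanics.
print(apply_substitution("MKWVTF", "K->N", 2))
```

Checking the original residue first guards against applying a change with the wrong numbering scheme.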
Step 4.f : Structural Analysis– RCSB PDB
Summary
Biology and Chemistry
Geometry
Bond length: The length of the covalent bonds between
two adjacent atoms in a protein molecule
Bond angle: The angle formed by 3 consecutive atoms in
the native conformation of a protein
Dihedral angle: The angle formed by 2 consecutive planes of
4 linearly bonded atoms
Action
Tool Output
Display slide
Description of the action
Re-Draw the tabs. The first panel of tabs is
horizontal one. Out of them “Geometry” tab is
active in this slide
5
http://www.pdb.org/pdb/home/home.do
Audio Narration
The 3-D visualization of Serum Albumin is given as a part of
the results which can be viewed from a tool called Jmol. Along
with the image analysis from Jmol, users can also study and
download the structural characteristics of the protein such as
its Bond Length along with the place and frequency of its
occurrence. Structural results also summarize the Bond Angle
and the Dihedral Angles including the chain where they occur
and the frequency of its occurrence.
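The three geometric quantities named above can be computed directly from atomic coordinates. The following Python sketch does so for points given as (x, y, z) tuples. The coordinates in the example are simple made-up points chosen so the results are easy to check by eye; they are not atoms from a real structure, and the sign convention of the dihedral angle is not specified by the source.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def bond_length(p1, p2):
    """Distance between two bonded atoms."""
    return _norm(_sub(p2, p1))


def bond_angle(p1, p2, p3):
    """Angle in degrees at p2 formed by three consecutive atoms."""
    v1, v2 = _sub(p1, p2), _sub(p3, p2)
    cosine = _dot(v1, v2) / (_norm(v1) * _norm(v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def dihedral_angle(p1, p2, p3, p4):
    """Angle in degrees between the planes (p1, p2, p3) and (p2, p3, p4)."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)       # normals of the two planes
    b2_unit = tuple(x / _norm(b2) for x in b2)
    m1 = _cross(n1, b2_unit)
    return math.degrees(math.atan2(_dot(m1, n2), _dot(n1, n2)))


# Made-up coordinates for illustration.
a, b, c, d = (0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)
print(bond_length(a, b), bond_angle(a, b, c), abs(dihedral_angle(a, b, c, d)))
```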
Interactivity option 1: Step No 1: To find the sequence corresponding to
the beta chain of insulin and compare their lengths in different organisms
Interactivity Type
Arrange the steps in the
order to be performed.
Options
1. Choose and open a primary sequence database of your choice
2. Input the term insulin in the search box
3. Click on the entry corresponding to beta chains
4. Check the names of the source organism
5. Store the sequence ID, source organism and length of the sequence in a separate text file
6. Sort the file according to sequence lengths
Boundary/limits
Remove the step number from
the bottom of the tab. Show all
the steps in mixed order.
The user must click on the tabs
in the right order. If the user clicks on a
tab which is not in the right
order, then flash a message
saying “try again”
Results
All the tabs must be
arranged in the right order.
The numbers mentioned
indicate the correct order.
Questionnaire
1. Which of the following is a Protein Sequence Database?
Answers: a) Swiss-Prot b) PDB c) CSD d) GEO
2. Which server should be used for identifying Protein Domains?
Answers: a) NCBI b) DDBJ c) PROSITE d) All
3. Which reagent is used for Edman Degradation?
Answers: a) Dabsyl Chloride b) Ninhydrin c) Phenyl isothiocyanate d) Cyanogen Bromide
4. Which amongst the following can be used for retrieving proteins from a
database?
Answers: a) Protein Name b) Corresponding Gene Name c) Unique Identifier d) All
5. Which one is NOT a step in sequence identification using Mass
spectrometry?
Answers: a) Labelling terminal residue b) Electro-spray ionization c)
Peptide fragmentation d) Calculating m/z ratio
6. Which one is NOT a derived protein Database?
Answers: a) Prosite b) Pfam c) Swiss-Prot d) ProDom
7. The most complex and important database schema is
Answers: a) Physical b) Logical c) View d) All
Links for further reading
Reference websites:
http://www.proteopedia.org/wiki/index.php/Main_Page
http://www.pdb.org/pdb/home/home.do
http://www.ncbi.nlm.nih.gov
http://expasy.org/sprot/
http://prodom.prabi.fr/prodom/current/html/home.php
http://expasy.org/prosite/
http://pfam.sanger.ac.uk
Books:
Biochemistry by Stryer et al., 5th edition
Biochemistry by A.L.Lehninger et al., 3rd edition
Database System Concepts by Korth et al., 5th edition