Computation in Biology

Nagasuma Chandra
Bioinformatics Centre & SERC
IISc

Next-generation biologists must
straddle computation and biology
Hierarchical structures in living systems
Macromolecule
Supramolecular assembly
Organelle
Cell
Tissue
Organ
Organism
Genome sequence: a book of life
Source: DOE-Genomes.org
examplesfromenglishtext
genomicbiologytakesaholisticapproachtomolecularbiologyandev
olutionbystudyingthecompletegenomeitsgenesanditsproteinexpre
ssionpatternsncbiprovidesseveralgenomicbiologytoolsandresourc
esincludingorganismspecificpagesthatincludelinkstomanywebsite
sanddatabasesrelevanttothatspeciesweinviteyoutoexplorethelinks
providedonthispage.
Genomic biology takes a holistic approach to molecular biology
and evolution by studying the complete genome, its genes, and
its protein expression patterns. NCBI provides several genomic
biology tools and resources, including organism-specific pages
that include links to many web sites and databases relevant to
that species. We invite you to explore the links provided on this
page.
Molecular circuitry in the cell
Biochemical networks
www.expasy.ch
Cellular networks
Characteristics of the yeast proteome: map of protein-protein interactions.
H. Jeong, S.P. Mason, A.-L. Barabási, Z.N. Oltvai, Nature 411, 41-42 (2001).
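A protein-protein interaction map like this one can be treated as a graph, with proteins as nodes and interactions as edges. The following is a minimal sketch of counting interaction partners per protein and listing highly connected "hub" proteins. The protein names and edges are hypothetical toy data, not taken from the yeast map.

```python
from collections import defaultdict


def degree_table(edges):
    """Count distinct interaction partners for each protein in an undirected edge list."""
    partners = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # self-interactions are ignored in this sketch
        partners[a].add(b)
        partners[b].add(a)
    return {protein: len(found) for protein, found in partners.items()}


def hubs(edges, min_degree):
    """Return the proteins with at least min_degree partners, sorted by name."""
    degrees = degree_table(edges)
    return sorted(p for p, d in degrees.items() if d >= min_degree)


# Hypothetical toy network (illustrative names only)
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P5", "P1")]

if __name__ == "__main__":
    print(degree_table(toy_edges))
    print(hubs(toy_edges, min_degree=3))
```

Using sets for the partner lists means a duplicated edge in the input does not inflate the count.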
Role of computation
Data management
Data Analysis & Interpretation
Prediction
Application
What you need
A model
A computational tool
Models
Levels of modelling
Abstraction level
Hierarchy in living organisms
Abstraction level of the model
Molecular models
Sequences
Structures
Genome Sequences
The ‘omics’ era
Software tools
Accelrys
Tripos
MOE
BioSuite
Schrödinger
+ hundreds of academic software packages
What you can do
Sequence Space
Determine identity of the molecule
Predict physicochemical properties
Predict three-dimensional structure
Predict function
Apply in pharmaceutical and other industries
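As an illustration of predicting physicochemical properties directly from sequence, here is a minimal sketch that computes amino acid composition and the grand average of hydropathy (GRAVY) using the Kyte-Doolittle scale. The function names and the example peptide are assumptions made for illustration; the packages named in these slides provide their own, more complete tools.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Return residue counts for a protein sequence (case-insensitive)."""
    return Counter(seq.upper())


def gravy(seq):
    """Mean Kyte-Doolittle hydropathy; positive values suggest a more hydrophobic sequence."""
    seq = seq.upper()
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if not seq or unknown:
        raise ValueError(f"empty sequence or unsupported residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


if __name__ == "__main__":
    peptide = "MKTAYIAKQR"  # hypothetical peptide, for illustration only
    print(composition(peptide).most_common(3))
    print(round(gravy(peptide), 3))
```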
Examples
Accelrys GCG
MOE
BioSuite
Example usage
Examples of GCG capabilities
Sequence Comparison
Database Searching and Retrieval
DNA/RNA Secondary Structure Prediction
Editing and Publication
Evolution
Fragment Assembly
Gene Finding and Pattern Recognition
Sequence Importing and Exporting
Mapping
Primer Selection
Protein Analysis
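To make the "Gene Finding and Pattern Recognition" item concrete, the sketch below scans the three forward reading frames of a DNA string for open reading frames (ATG to the first in-frame stop codon). This is a deliberately simplified illustration, not how GCG implements gene finding: it ignores the reverse strand, introns, and any statistical scoring, and the example sequence is made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=1):
    """Return (start, end, sequence) for forward-strand ORFs; end is exclusive.

    min_codons counts codons from ATG through the stop codon, inclusive.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # close the ORF and look for the next start
    return sorted(orfs)


if __name__ == "__main__":
    example = "CCATGAAATAGGG"  # hypothetical sequence
    for start, end, seq in find_orfs(example):
        print(start, end, seq)
```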
Single gene/protein sequence analysis: MOE
The colored bars over the sequences reflect the
secondary structure of those sequences having
associated atomic coordinates. Chains with
sequence-only data have no such bars. In this
instance, seven of the chains in the family have
structural data and can therefore be used as
structural templates.
This image illustrates the Residue Identity matrix in MOE, which
shows that chains 13 and 14 have the highest percent identity to
the query sequence.
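A residue identity matrix of this kind can be sketched in a few lines once the sequences are aligned. The code below is an illustrative assumption, not MOE's implementation: it counts identical residues over the aligned columns where neither sequence has a gap. Other definitions of percent identity use a different denominator. The chain names and sequences are hypothetical.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned):
    """Map each (name, name) pair to its percent identity."""
    names = list(aligned)
    return {(p, q): percent_identity(aligned[p], aligned[q]) for p in names for q in names}


def closest_to(query, aligned):
    """Name of the chain (other than the query) with the highest identity to the query."""
    others = [name for name in aligned if name != query]
    return max(others, key=lambda name: percent_identity(aligned[query], aligned[name]))


# Hypothetical aligned chains, for illustration only
toy_alignment = {
    "query": "ACDEFGHIK",
    "chain1": "ACDEFGHLK",
    "chain2": "AC-QFGYLR",
}

if __name__ == "__main__":
    print(closest_to("query", toy_alignment))
```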
Whole genome sequence analysis: BioSuite
Structures
Advantages of structural-level studies
The protein folding problem
Sequence-Structure Gap
Need to predict structure using computational methods
Applications
Four levels of protein structure
What you can do
Structure Space
Visualize structures
Build molecular models
Manipulate
Analyse
Simulate molecular behaviour
Apply in Drug Discovery
Visualization:
Viewer Module of InsightII
Pulldowns
Module Icon
Icon Palette
Command prompt
Information Area
Visualizations
Ligand-Protein Interaction
Aiding NMR structure determination
Aiding crystal structure determination (X-ray crystallography)
Building molecular models
Small molecules
Protein / nucleic acid / carbohydrates
Predicting Protein Structure
Homology modelling
Threading
Modifications: site-directed mutants
Protein-ligand complexes
BIOPOLYMER
The Biopolymer module provides tools for building and modifying a wide
range of biological macromolecules, including proteins, peptides, nucleic
acids, and carbohydrates.
It is useful in:
Building Proteins and Peptides
Structural Domain Analysis
Building Carbohydrates
Building Nucleic Acids
Structural Database Searching
Structures built in this module can in turn be used later by other
programs for structure refinement and analysis of small and large
molecules.
Backbone structure of the C-terminal fragment of E. coli
50S ribosomal protein (in yellow), predicted from the
carbon trace using the Protein/Backbone command of the
Biopolymer module. The crystallographic backbone
structure is shown superimposed in blue. The RMS
deviation between corresponding backbone atoms of the
two structures is 0.52 Angstroms.
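The RMS deviation quoted here is the square root of the mean squared distance between corresponding atoms. A minimal sketch follows, assuming the two coordinate sets are already superimposed and listed in the same atom order. It does not perform the optimal fitting step that modelling packages apply before reporting RMSD, and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_a, coords_b):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Hypothetical backbone atom positions in Angstroms
    predicted = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.2, 0.0)]
    crystal = [(0.1, 0.0, 0.0), (1.4, 0.1, 0.0), (3.0, 0.0, 0.1)]
    print(round(rmsd(predicted, crystal), 3))
```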
Manipulations
E.g., conformation tweaking
HIS_229
ASP_187
The following images are examples of this method of predicting conformations of a
few long sidechains of PDB protein 1IC6.A. In each of the following figures, the
native conformation is shown colored by element. In the left image, the predicted
rotamer (the rotamer with the lowest deltaG) is shown in white. In the right image,
all other rotamers generated by the conformational search are shown.
MODELER
MODELER uses a comparative modeling methodology to rapidly build
structural models for protein sequences without a known structure. It derives
3D protein models without the time-consuming separate stages of core region
identification and loop region building or searching that are inherent to
manual homology modeling schemes.
MODELER can create a model even with only one
source protein. In this case, the structure of
dihydrofolate reductase from Lactobacillus casei is
used to generate a model for the E. coli protein. The
model has a 2.2 Å RMS deviation from the crystal structure
of the E. coli protein.
PROFILES-3D
Profiles-3D offers a unique approach to structure prediction by
measuring the compatibility between protein sequences and known
protein structures, and then using this information to address the
inverse protein folding problem. Profiles-3D enables you to investigate
which particular fold an amino acid sequence is likely to adopt.
Benefits:
Profiles-3D can test the validity of a model or
preliminary structures derived from experimental data
or modeling studies.
Profiles-3D can suggest which 3D structure an amino
acid sequence is likely to adopt by relating structural
properties to amino acid sequence information.
Reference template proteins identified by Profiles-3D
can be used as input to the Insight II Homology and MODELER
modules.
This image shows the result of a “Profiles-3D
Verify” run: a ribbon drawing of a model of
myoglobin, where a single alpha-helix has been
purposely misfolded. Profiles-3D has detected the
misfolded region, and Insight II has automatically
created the subset that was used to color the
structure and ribbon.
MATCHMAKER
MatchMaker uses an inverse-folding method to predict the 3D structure
of a protein from its amino acid sequence. By comparing a new protein
sequence to its topology fingerprint database, MatchMaker assesses the
ability of a sequence to adopt characteristic topologies.
Even in the absence of strong sequence similarity, MatchMaker
can generate high-quality structural models.
Examples of MatchMaker output,
including a histogram of
sequence-structure compatibility
(upper right), a sub-optimal
alignment plot (upper left), an
energy profile (middle left), and a
prediction of structural elements
(helix/beta strand,
buried/exposed) for the input
sequence.
Simulations: ‘Discover’
Analysis
Protein characterization
Protein Comparison
Sequence-Structure-Function relationships
Active site detection
Ligand Binding mode analysis
Electrostatic analysis
Structure Analysis
Quality Check
PROTABLE
ProTable is used to analyze and evaluate protein structures. ProTable creates
Ramachandran plots, assesses deviation of local geometries and side chain
rotameric states from standard protein values, and determines the energetics of
each residue.
These images show the results of a ProTable evaluation of a theoretical model of
prostate-specific antigen (2PSA).
MatchMaker energies reveal a loop (highlighted in green) that may require
further refinement. Structures are colored by probability (purple and blue are
low probability; orange and red are high probability). An automated
Ramachandran analysis (right) identifies backbone torsions in borderline or
disallowed regions.
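A Ramachandran plot is built from the backbone torsion angles phi and psi of each residue. The sketch below shows one common way to compute a torsion (dihedral) angle from four atom positions, using the standard atan2 formulation. It is an illustration, not ProTable's implementation, and the helper names are assumptions. Phi is the torsion C(i-1), N, CA, C and psi is the torsion N, CA, C, N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180 to 180) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone torsions for one residue from five atom positions."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


if __name__ == "__main__":
    # Simple geometric check: four points in a trans arrangement
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

Flagging residues as borderline or disallowed would additionally require region boundaries for the plot, which are not given in these slides and are therefore left out.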
DELPHI
DelPhi is a Poisson-Boltzmann electrostatics simulation engine. It helps
characterize the specificity of ligand-receptor interactions, which can aid
drug discovery.
DelPhi calculates:
Electrostatic properties, including the effects of
bulk solvent and ionic strength, for nucleic acids,
polysaccharides, and complexes such as
glycoproteins and protein/DNA.
HIV protease, rendered with an electrostatic
contour surface, with a stick rendering of the drug
inside the surface. Blue is positive charge, red is
negative charge, and gray is neutral.
Applications: Drug Discovery
SITEID
SiteID provides analysis and visualization tools leading to the
identification of potential binding sites within or at the surface of
biological targets.
Applications:
Locate ligand binding pockets on a macromolecule.
Identify protein-protein interaction surfaces.
Identify constraints in a novel protein structure for 3D database searching to
find or optimize lead compounds.
The binding pocket of dihydrofolate reductase located by SiteID and shown as a
MOLCAD surface. The red areas of the surface indicate contact atoms in the
pocket, while the yellow areas show the residues in which those atoms are
contained. The inhibitor (methotrexate) is shown in green.
STRUCTURE BASED DESIGN TOOLS
Active Site Detection: MOE uses a fast geometric algorithm, based on
Edelsbrunner’s alpha shapes, to detect candidate protein-ligand and
protein-protein binding sites. Individual sites can be visualized or populated
with “dummy atoms” for docking calculations or used as starting points for
de novo ligand design efforts.
Left: PDB 1AAQ (HIV-1 protease) and the first site located by the MOE Site Finder.
Middle: 1AAQ with the complexed ligand (hydroxyethylene isostere). Right:
hydroxyethylene isostere overlaid with calculated alpha spheres of the first site.
FLEXX
FlexX rapidly docks a conformationally flexible ligand into a binding site, using
an incremental construction algorithm that builds the ligand in the active site.
FlexX is composed of four basic components:
Conformational flexibility.
Set of possible protein-ligand interactions.
Scoring function for the interactions.
Algorithm for placement and incremental growth of the ligand from a defined core.
A set of inhibitors docked into the active site of Carboxypeptidase A by FlexX. The protein
backbone and the active site surface were rendered using MOLCAD. The active site surface
is color-coded by electrostatic potential.
RACHEL
RACHEL performs automated combinatorial optimization of lead compounds
by systematically derivatizing user-defined sites on the ligand.
Applications:
Combinatorially enumerate user-defined sites on a lead scaffold to optimize
binding within a receptor.
Bridge high-affinity ligand fragments positioned within the active site.
The X-ray structure of N9 influenza virus neuraminidase (2QWK) shown with five
ligands generated using RACHEL that are predicted to be active. Hydrogen bonds
between the ligands and residues are indicated by dashed yellow lines. The surface was
rendered using MOLCAD. Dark purple regions contain a greater acceptor/donor density,
and light purple regions indicate areas where hydrogen bonding is less likely to occur.
HIGH THROUGHPUT DISCOVERY TOOLS
HTS-QSAR: CCG’s Binary QSAR methodology is designed for building pass/fail
models from high-error-content data and standard molecular descriptors. The resulting
probabilistic models (based on Bayesian statistical inference) are used as a biasing
agent in the design of focused combinatorial libraries.
CHEMINFORMATICS TOOLS
Molecular Databases: The MOE Molecular Database is a disk-based
spreadsheet central to the manipulation and visualization of large collections
of compounds. Data can be imported and exported in various standard file
formats and merged with structural or biological activity data.
MOLECULAR DATABASE VIEWER
MOLECULAR DATABASE CALCULATOR
SEARCH COMPARE
Search Compare provides systematic conformational search and analysis,
as well as superimposition and molecular similarity analysis.
Using Search Compare, two angiotensin II antagonists are
flexibly superimposed based on the field similarity
(combined steric and electrostatic potentials).
UNITY
Unity locates compounds in databases that match a pharmacophore or fit
a receptor site.
Applications:
Exploration of databases for compounds
consistent with a pharmacophore hypothesis
Lead explosion by retrieving similar compounds
Virtual screening of compound databases to
discover lead compounds
Determining reagents in commercial databases
that support combinatorial chemistry synthesis
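Retrieving similar compounds from a database is often done by comparing structural fingerprints. The sketch below ranks compounds by Tanimoto similarity between sets of feature identifiers. It is a generic illustration and makes no claim about how UNITY scores similarity. The compound names and fingerprints are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared features divided by total distinct features."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints: treated as no similarity in this sketch
    return len(a & b) / len(union)


def rank_by_similarity(query_fp, database, threshold=0.0):
    """Return (name, score) pairs at or above threshold, best match first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in database.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical fingerprints: each integer stands for a structural feature that is present
toy_database = {
    "cmpd_A": {1, 2, 3, 4},
    "cmpd_B": {2, 3, 4, 5, 6},
    "cmpd_C": {7, 8},
}

if __name__ == "__main__":
    lead = {1, 2, 3, 4, 5}
    for name, score in rank_by_similarity(lead, toy_database, threshold=0.3):
        print(name, round(score, 2))
```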
A UNITY query constructed at the active site of the streptavidin/biotin complex (1STP).
Yellow lines originate at hydrogen bonding sites of the protein (shown as spheres) and
terminate within the spatial constraint for complementary ligand sites. A surface
constraint at the protein/ligand interface is shown in green. The spatial cap in red
accounts for a bifurcated interaction with an Asp carboxyl. Partial match groups are
shown in different colors: red, yellow, or green.
CATALYST/SHAPE
Catalyst/SHAPE identifies compounds that possess similar 3D shapes to a
specified 3D conformation.
FEATURES:
• Performs flexible shape-based database searches.
• Performs statistical analysis of shape indices of a particular database.
• Simultaneously performs shape and pharmacophore searches via a merged query.
Methotrexate, an inhibitor of the enzyme dihydrofolate reductase, is
displayed (left: hydrogens removed) in its enzyme-bound conformation.
On the right are 3D compounds retrieved from Derwent’s
World Drug Index that best fit the shape of the bound
conformation of methotrexate. This shape-based 3D search was
performed with Accelrys’ Catalyst/SHAPE.
HypoGen
Given only available experimental information such as 2D structures and
biological activities of a set of molecules, Catalyst can be used to generate
general interaction hypotheses that explain variations in activity across a set of
molecules.
Two 5HT3 antagonists (green and yellow) mapped
on to a six-feature hypothesis.
C2-LIGAND FIT
C2•LigandFit provides active site finding, flexible docking and scoring
capabilities, allowing evaluation of compounds against a receptor site.
Features
• Active site search by flood filling method
• Fast conformational search for ligand in protein cavity
• Fast grid method for evaluation of protein-ligand interactions
• Clustering of docked conformers
• Multiple scoring functions
Active site identification for HIV protease
using the C2•LigandFit flood filling technique
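The idea behind flood filling can be sketched on a 3D grid: starting from a seed point, repeatedly visit neighbouring free grid points until a connected region has been collected. The code below is a simplified illustration, not the C2•LigandFit algorithm itself. It assumes the free (non-protein) grid points have already been identified, and the toy points are hypothetical.

```python
from collections import deque

# The six face-sharing neighbours of a grid point
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def flood_fill_sites(free_points):
    """Group free grid points into connected clusters, largest first."""
    remaining = set(free_points)
    sites = []
    while remaining:
        seed = remaining.pop()
        site = {seed}
        queue = deque([seed])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBOURS:
                neighbour = (x + dx, y + dy, z + dz)
                if neighbour in remaining:
                    remaining.remove(neighbour)  # mark as visited
                    site.add(neighbour)
                    queue.append(neighbour)
        sites.append(site)
    return sorted(sites, key=len, reverse=True)


if __name__ == "__main__":
    # Hypothetical free grid points: one three-point pocket and one isolated point
    toy_points = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (5, 5, 5)]
    for site in flood_fill_sites(toy_points):
        print(len(site), sorted(site))
```

Ranking the clusters by size gives a simple way to shortlist candidate sites, with the largest connected region listed first.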
C2ADME TOOL
C2•ADME provides computational models for the prediction of absorption,
distribution, metabolism, and excretion (ADME) properties derived from
chemical structures.
Features:
C2•ADME provides computational
ADME/Tox prediction tools with the ability to
predict problematic New Chemical Entities at
an early stage of the development process.
C2•ADME currently includes models for
passive intestinal absorption, blood-brain
barrier (BBB) penetration, and aqueous
solubility at 25°C.
Plot of Polar Surface Area (PSA) vs. LogP for a sample of
the World Drug Index (WDI) database showing the 95%
and 99% confidence limit ellipses corresponding to the
Absorption Model. The points are color coded by
Absorption level (Good, Moderate, Poor and Very Poor).
In-built utilities
Scripting: automation
Session Folders
Log files
What you should remember
Good computational practices
Other users are as important as yourself
Do not use up licenses unduly
Preparation
Evaluate the protocol and the choice of package; follow job submission rules
Access details
Insight / Catalyst / Cerius: SGI machines, base modules, several licenses
Tripos: SGI machines
MOE: Linux platform / Windows / SGI
BioSuite: Linux
