Bridging Bioinformatics
and Chem(o)informatics
Gary Wiggins
School of Informatics
Indiana University
Yan He (SLIS MLS Student)
Meredith Saba (SLIS MLS Student)
Provocative Thought
“While much bioscience is published with
the knowledge that machines will be
expected to understand at least part of it,
almost all chemistry is published purely for
humans to read.”

Murray-Rust et al. Org. Biomol. Chem. 2004,
2, 3201.
Overview of the Talk
• Review of ACS CINF 2004 Papers
• Review of Relevant Articles
• Public Chemistry Databases and Data Repositories with Bioinformatics Info/Links
• Overview of Web Services
• NIH-funded Projects Underway or Planned at Indiana University
“The Bigger Picture — Linking
Bioinformatics to Cheminformatics”

American Chemical Society Division of Chemical Information (CINF) Symposium, Anaheim, Spring 2004
• All-day session with 16 papers
• http://www.acscinf.org/new/docs/meetings/227nm/227cinfabstracts.htm
Problems from ACS CINF 2004
• Both technical and people factors hinder knowledge exchange between biology and chemistry. (Lipinski)
• People problems per Chris Lipinski:
  - Metadata capture is complicated by people issues, particularly those between chemists and biologists.
  - Discipline-based disconnects occur distressingly often and are frequently overlooked as a cause of lost productivity.
Interdisciplinary Collaborations:
Biology and Chemistry

[What’s] “... important for these collaborations is,
not only do you have to accept the other guy’s
paradigm or at least live with it; you have to be
willing to accept the other guy’s foibles or your
perception of the other guy’s foibles (and
recognize the opposite of this). We each have
our own approaches to how we do science, and
it’s just different cultures.”
--Thom Kauffman interview in ACS LiveWire, March 2005, 7.3.
http://pubs.acs.org/4librarians/livewire/2006/7.3/profile.html
Some Questions from the ACS CINF 2004 Symposium
• “Find all proteins related to protein A (i.e., within a given path length of A) in a protein interaction graph, and retrieve related assay results and compound structures.”
• “Find all pathways where compound X inhibits or slows a reaction, and retrieve Gene Ontology classifications for all proteins involved in the reaction.”
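The first question is, at its core, a bounded graph traversal followed by a lookup. A minimal sketch follows, assuming the interaction graph fits in memory as an adjacency list; the protein names, the assay table, and the function name `proteins_within` are hypothetical and are not taken from any symposium paper.

```python
from collections import deque


def proteins_within(graph, start, max_hops):
    """Breadth-first search: map each protein within max_hops of start to its hop distance."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the allowed path length
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]  # the query protein is not "related to" itself
    return distances


# Toy undirected interaction graph (hypothetical protein names).
INTERACTIONS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}
# Hypothetical assay table: protein -> compounds tested against it.
ASSAYS = {"B": ["cmpd-1"], "D": ["cmpd-2", "cmpd-3"], "E": ["cmpd-4"]}

related = proteins_within(INTERACTIONS, "A", 2)
compounds = sorted(c for p in related for c in ASSAYS.get(p, []))
```

The hard part the symposium pointed to is not the traversal but the second step: the assay and structure data usually live in separate chemistry systems with different identifiers.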
Problems from ACS CINF 2004
• Commercial vs. public data
• Batch mode data processing possible in biology, but primitive in chemistry
• Primary HTS data has a very high noise factor
• Data format standardization problem
  - Chemoinformatics and bioinformatics use completely different data formats and analysis tools
  - Chemical and protein sequence information has been largely analyzed separately
Solutions from ACS CINF 2004
• Linking biological and chemical information in computational approaches to predict biological activity, ADME profiles, and adverse drug reactions (ADR)
• Energetics of binding for more accurate and sensitive chemical representation of DNA-protein interactions
• A discovery informatics platform that facilitates archival, sharing, integration, and exploration of synthetic methods and biological activity data
Solutions from ACS CINF 2004
• Data pipelining approach makes it possible to apply bioinformatics and chemoinformatics data and analyses together.
• Visualizations are the best way for people to understand data.
Solutions from ACS CINF 2004
• Cabinet (Chemical And Biological Information NETwork, formerly Fedora) servers include:
  - Metabolic pathway network chart (Empath)
  - Protein-Ligand Association Network (Planet)
  - Enzyme Commission Codebook (EC Book)
  - Traditional Chinese Medicines (TCM)
  - World Drug Index (WDI), and others.
• Built on the Daylight HTTP toolkit
• http://www.metaphorics.com/products/cabinet.html
Overview of the Talk
• Review of ACS CINF 2004 Papers
• Review of Relevant Articles
• Public Chemistry Databases and Data Repositories with Bioinformatics Info/Links
• Overview of Web Services
• NIH-funded Projects Underway or Planned at Indiana University
What is Chemoinformatics? (Brown)
• “…the essence of chemoinformatics is integration and focus rather than its components, which are independent disciplines.”
• Supporting disciplines:
  - Chemical information
  - Computational chemistry
  - Chemometrics
Chemoinformatics and Disease
Toolkits as Integrators (Brown)
• Companies such as Daylight, Advanced Visual Systems, OpenEye, and SciTegic provide integration systems for:
  - Statistical methods
  - Text mining
  - Computational chemistry
  - Visualization
GeneGo’s MetaDrug Product
• Toxicogenomics platform for the prediction of human drug metabolism and toxicity of novel compounds
• Enables the visualization of pre-clinical and clinical high-throughput data in the context of the complete biological system
• Integrates chemical, biological, and protein function data
• http://www.genego.com/
BioWisdom
• Examination of vast amounts of available information using its Sofia KnowledgeScan methodology
• SRS data integration platform
• http://www.biowisdom.com/
Lessons from Hip Hop (Salamone)
• Mashup technique
  - Bring together disparate informatics, biological, chemical, and imaging information when conducting research
• Example of an integration tool: iSpecies.org
  - A search for a species returns a page with NCBI genomics information, Yahoo images of the species, and articles culled from Google Scholar
iSpecies.org Search
• For Mus musculus
Chemogenomics and Chemoproteomics (Gagna)
• Chemogenomics (def.): The description of all potential drugs that can be used against all possible target sites, OR the actions of target-specific chemical ligands and how they are used to globally examine genes
• Chemoproteomics (def.): Uses chemistry to characterize protein structure and functions
• They are “. . . a form of chemical biology brought up to date in the area of genome and proteome analysis.”
New Interdisciplinary Journals
• ACS Chemical Biology (ACS)
• ChemBioChem; A European Journal of Chemical Biology (Wiley/VCH)
• Chemical Biology and Drug Design (Blackwell)
• JBIC; Journal of Biological and Inorganic Chemistry (Springer)
• Journal of Biochemical and Molecular Toxicology (Wiley)
• Molecular Biosystems (RSC)
• Nature Chemical Biology (Nature Publishing)
• Organic & Biomolecular Chemistry (RSC)
Open Source Software (Geldenhuys)
• Log P calculator from Interactive Analysis
  - http://www.logp.com
• University of Utah’s Computational Science and Engineering Online
  - Can submit jobs for molecular mechanics, quantum chemical calculations, and biomolecular interfaces for viewing PDB files
  - http://www.cse-online.net
• Virtual Computational Chemistry Laboratory
  - http://www.vcclab.org
The Blue Obelisk (Guha)
• Several open chemistry and chemoinformatics projects that have pooled forces to enhance interoperability
• Maintain:
  - Chemoinformatics Algorithms Dictionary
  - Data Repository for standardized data for chemical properties and other facts (e.g., mass)
• http://www.blueobelisk.org/
BlueObelisk.org
• Working collaboratively on projects such as:
  - Chemistry Development Kit (CDK)
  - JChemPaint
  - Jmol
  - JUMBO
  - NMRShiftDB
  - Octet
  - Open Babel
  - QSAR
  - World Wide Molecular Matrix (WWMM)
Barriers to the Use of Open Source Software
• Unix command line
• Problem: Lack of known standards and datasets of compounds for validation, e.g., in docking programs
Lessons from the Human Genome Project (Austin)
• Keys to success in the HGP were:
  - Comprehensiveness
  - Commitment to open access to the sequence as a research tool without encumbrance
• Proposed tools for a “genome functionation toolbox”:
  - Whole-genome transcriptome and proteome characterization
  - Development of small inhibitory RNAs (siRNAs) and knockout mice for every gene
  - Small molecules and the druggable genome
ChemDB
http://cdb.ics.uci.edu/CHEM/Web/
ChEBI, Chemical Entities of Biological Interest
• Dictionary of molecular entities focused on small chemical compounds
• Features an ontological classification, showing the relationships between molecular entities or classes of entities and their parents and/or children
Vioxx Entry in ChEBI
The IUPAC International Chemical Identifier (InChI)
• Open source, non-proprietary, public-domain identifier for chemicals
• String of characters that uniquely represents a molecular substance
• Independent of the way the chemical structure is drawn
• Enables reliable structure recognition and easy linking of diverse data compilations
• Accepts as input MOLfiles (or SDfiles) and CML files
• Download the program to your computer at:
  - http://www.iupac.org/inchi/license.html
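An InChI string is a sequence of layers separated by `/`, which makes it easy to use as a join key across data compilations. The sketch below splits a simple InChI into its layers with the standard library only. The ethanol example uses the present-day "standard InChI" form (version token `1S`), which postdates this talk. The function name `split_inchi` is hypothetical, and the parser is a simplification that assumes a formula layer is present and that no layer prefix repeats.

```python
def split_inchi(inchi):
    """Split a simple InChI string into a dict of layers keyed by their prefix letter."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    # The first two fields are the version token and the chemical formula.
    version, formula, *rest = inchi[len(prefix):].split("/")
    layers = {"version": version, "formula": formula}
    for layer in rest:
        # Remaining layers start with a one-letter prefix, e.g. 'c' (connections)
        # and 'h' (hydrogens).
        layers[layer[0]] = layer[1:]
    return layers


# Ethanol
ethanol = split_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
```

Generating the identifier from a drawn structure still requires the IUPAC software or a toolkit that wraps it; this sketch only reads an identifier that already exists.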
Generation of InChI for Vioxx with
wInChI
Vioxx Entry in PubChem
Compounds Found with InChI
Vioxx Bioassay Data in PubChem
Vioxx PubChem Link to External
Sources of Information
The Elsevier MDL/NIH Link via PubChem and DiscoveryGate
• Cross-indexes PubChem to the Compound Index hosted on Elsevier MDL’s DiscoveryGate platform
• MDL added 5 million structures from PubChem to their index, resulting in over 14 million unique chemical structures
• Links go both ways
  - Can move from biological data in PubChem to bioactivity, chemical sourcing, synthetic methodology, and EHS data in DiscoveryGate sources
Elsevier MDL’s xPharm
• Comprehensive set of records linking:
  - Agents (compounds) (2300)
  - Targets (600)
  - Disorders (450)
  - Principles that govern their interactions (180)
• Answers questions such as:
  - What targets are associated with control of blood pressure?
  - What adverse effects are associated with monoamine oxidase inhibitors?
Text Datamining (Banville)
• “In the pharmaceutical field, it is ideally the marriage of biological and chemical information that needs to be the ultimate focus of text data mining applications.”
• Problems:
  - Lack of universal publication standards for identifying each unique chemical entity
  - Selective indexing policies of A&I services
  - Need to understand how chemical structures link to biological processes
Chemical Datamining Software
• SureChem
  - http://surechem.reeltwo.com/
• CLiDE
  - Recognizes structures, reactions, and text
  - http://www.simbiosys.ca/clide/
• OSCAR
  - “OSCAR1” to check experimental data
    • http://www.ch.cam.ac.uk/magnus/checker.html
    • http://www.rsc.org/Publishing/ReSourCe/AuthorGuidelines/AuthoringTools/ExperimentalDataChecker/
• CSR (Chemical Structure Reconstruction)
  - http://www.scai.fraunhofer.de/uploads/media/MZ-ERCIM05_04.pdf
• MDL DocSearch: combines MDL’s Isentris platform and EMC’s Documentum
Overview of the Talk
• Review of ACS CINF 2004 Papers
• Review of Relevant Articles
• Public Chemistry Databases and Data Repositories with Bioinformatics Info/Links
• Overview of Web Services
• NIH-funded Projects Underway or Planned at Indiana University
Themes from SwissProt’s 20th Anniversary Conference, “In silico Analysis of Proteins”
• Knowledgebases, databases and other information resources for proteins
• Sequence searches and alignments
• Protein sequence analysis
• Protein structure prediction, analysis and visualization
• Proteomics data analysis
Chemoinformatics Databases (Jónsdóttir)
• Lists databases relevant to drug discovery and development, including:
  - General databases
  - DBs for screening compounds
  - DBs for medicinal agents
  - DBs with ADMET properties
  - DBs with physico-chemical properties
• Curiously does not mention Chemical Abstracts
Databases with Protein and Ligand Information (Jónsdóttir)
• Protein Data Bank
  - Target Registration Database
  - Relibase: uses structural info to analyze protein-ligand interactions; Relibase+ for protein-protein interaction searching
• Cambridge Structural Database
• KEGG LIGAND DB for enzyme reactions
  - http://www.genome.ad.jp/ligand
Other Databases with Protein and Ligand Information
• SitesBase: a database of known ligand binding sites within the PDB
  - http://www.bioinformatics.leeds.ac.uk/sb/main.html
• Binding MOAD
  - http://www.bindingmoad.org/
• sc-PDB (Kellenberger)
  - http://bioinfo-pharma.u-strasbg.fr:8080/scPDB/index.jsp
sc-PDB
http://bioinfo-pharma.u-strasbg.fr:8080/scPDB/index.jsp
Isatin Search on sc-PDB
Other Databases with Protein-Protein Interaction Data (Jónsdóttir)
• YPD, Yeast Proteome Database (for proteins from S. cerevisiae)
  - http://www.biobase.de/pages/index.php?id=139
• Human Protein Reference Database
  - http://www.hprd.org/
• BIND, Biomolecular Interaction Network Database (ceased as of 11/16/2005?)
  - http://www.bind.ca/Action
International Molecular Exchange (IMEx) Consortium
http://imex.sourceforge.net/
• BIND (http://www.blueprint.org), The Blueprint Initiative Asia Pte. Ltd, Singapore, and The Blueprint Initiative North America, Toronto, Canada
• DIP (http://dip.doe-mbi.ucla.edu), UCLA-DOE Institute for Genomics & Proteomics
• IntAct (http://www.ebi.ac.uk/intact), EMBL-European Bioinformatics Institute, Hinxton, UK
• MINT (http://mint.bio.uniroma2.it/mint/), University of Rome “Tor Vergata”, Rome, Italy
• MPact (http://mips.gsf.de/genre/proj/mpact), MIPS / Institute for Bioinformatics, Munich, Germany
Protein Sites from IU I533 Students and others
• LigandDepot: integrated source for small molecules
  - http://ligand-depot.rutgers.edu/index.html
• PSIPRED Protein Structure Prediction Server
  - http://bioinf.cs.ucl.ac.uk/psipred/
• DSSP: a database of secondary structure assignments (and much more) for all protein entries in the PDB
  - http://swift.cmbi.ru.nl/gv/dssp/
• Dr. Predrag Radivojac’s I690 class on Structural Bioinformatics
  - http://www.informatics.indiana.edu/predrag/2006springi690/2006springi690.htm
Protein Secondary Structure Prediction
• Methods
  - Neural Network
  - Rule Based
  - Other Machine Learning
  - Homology Based
Protein Secondary Structure Prediction Software
• PredictProtein
  - http://www.predictprotein.org/
• Chou-Fasman
  - http://fasta.bioch.virginia.edu/fasta_www/chofas.htm
• NN Predict
  - http://www.cmpharm.ucsf.edu/~nomi/nnpredict.html
Structure-Based Docking Methods
• Method
  - Scans many small molecules and “docks” them to a site of interest on a protein structure
  - Predicts free energy of binding
  - Filters thousands of compounds relatively quickly
  - Top hits can be used for more rigorous computational/experimental characterization and optimization
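To make "predicts free energy of binding" and "top hits" concrete, the sketch below uses the standard thermodynamic relation ΔG° = RT ln(Kd) to put a dissociation constant on the same kcal/mol scale that docking scores try to approximate, then keeps the most favorable ligands. The ligand names and predicted energies are made-up illustration values, and real docking scores are only rough estimates of ΔG.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    return R_KCAL * temperature_k * math.log(kd_molar)


def top_hits(scores, n):
    """Return the n ligand names with the most negative predicted free energy."""
    return sorted(scores, key=scores.get)[:n]


# Hypothetical predicted binding energies (kcal/mol) from a docking run.
predicted = {"lig-1": -6.2, "lig-2": -9.8, "lig-3": -7.5, "lig-4": -11.1}
best = top_hits(predicted, 2)

# A nanomolar binder corresponds to roughly -12 kcal/mol at room temperature.
nanomolar = binding_free_energy(1e-9)
```

Because the scores are approximate, the ranking is treated as a filter: the surviving ligands go on to the more rigorous characterization the slide mentions.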
Structure-Based Docking Methods
• DOCK
  - http://dock.compbio.ucsf.edu/
  - Accelrys’s Insight (built on DOCK)
    • http://www.accelrys.com/products/insight/
• FlexX
  - http://www.biosolveit.de/FlexX/
• Glide
  - http://www.schrodinger.com/ProductDescription.php?mID=6&sID=6
• GOLD
  - http://www.ccdc.cam.ac.uk/products/life_sciences/gold/
Useful Structure Databases
• ModBase
  - http://modbase.compbio.ucsf.edu/modbase-cginew/search_form.cgi
• Dali Database (Fold classification; based on PDB)
  - http://ekhidna.biocenter.helsinki.fi/dali/start
• Protein Structure Analysis, Comparison, &/or Classification [Guide]
  - http://www.bio.vu.nl/nvtb/Structures.html
SCOP, Structural Classification of Proteins
• Curated database of structural and evolutionary relationships
  - All known protein folds (v. 1.69, July 2005)
    • 70,859 domains organized into 2,845 families, 1,539 superfamilies, and 945 folds
  - Detailed information about close relatives
• Links to coordinates, images of structures, interactive viewers, and literature references
  - http://scop.mrc-lmb.cam.ac.uk/scop/
SCOP Search Options
• Homology search yields a list of structures with significant levels of sequence similarity
• Keyword search matches words in SCOP and PDB
CATH Protein Structure Classification
• Like SCOP, structured hierarchically by:
  - Class (determined by secondary structure)
  - Architecture (overall shape, e.g., barrel, sandwich, roll, etc.) – no equivalent in SCOP
  - Topology (grouped into fold families based on overall shape and connectivity of secondary structures)
  - Homologous Superfamily (domains thought to share a common ancestor)
• As of January 2005, had 43,229 domains classified into 1,467 superfamilies and 5,107 sequence families; a protein family database (CATH-PFDB) contained a total of 616,470 domain sequences classified into 23,876 sequence families
  - http://cathwww.biochem.ucl.ac.uk/latest/index.html
CATH Search Options
• Can browse or search the classification by CATH code
• CATH codes can be used to search other databases, e.g., DHS, Gene3D, and Impala
Gasteiger’s Biochemical Pathways Database
• Database of biochemical pathways that represents chemical structures and reactions on the atomic level
• Gives access to each atom and bond of the substrates of enzyme reactions
• Allows the study of transition state hypotheses of enzyme reactions
• Analysis of the physicochemical effects operating at the reaction site allows a classification of enzyme reactions that goes beyond the traditional EC code for enzymes.
• 1533 biochemical molecules and 2175 reactions
• http://www2.chemie.uni-erlangen.de/services/biopath/index.html
A Gene Expression Database for NCI60 (Scherf)
• Published in Nature Genetics, 2000
• First study to integrate gene expression with molecular pharmacology databases
• Gene expression profiles for NCI60 assessed using microarray technology
• Gene-drug relationships investigated by how the gene transcription levels vary with respect to drug activities
Correlation Matrix Between Drug
Activity and Gene Expression
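A gene-by-drug correlation matrix of this kind can be sketched in a few lines: each gene has an expression profile across the cell lines, each drug has an activity profile across the same cell lines, and each matrix cell holds the correlation between the two. The example assumes Pearson correlation as the measure and uses invented four-cell-line profiles; the gene and drug names are placeholders.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical profiles across the same four cell lines, in the same order.
expression = {"gene-1": [1.0, 2.0, 3.0, 4.0], "gene-2": [4.0, 3.0, 2.0, 1.0]}
activity = {"drug-1": [2.0, 4.0, 6.0, 8.0], "drug-2": [1.0, 3.0, 2.0, 4.0]}

# matrix[gene][drug] is the correlation of expression with drug activity.
matrix = {
    gene: {drug: pearson(expr, act) for drug, act in activity.items()}
    for gene, expr in expression.items()
}
```

A strongly positive cell suggests cell lines expressing the gene tend to be sensitive to the drug; a strongly negative cell suggests the opposite. Either way the value is a hypothesis to follow up, not evidence of mechanism.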
Other Relevant Databases/Servers
• Each year Nucleic Acids Research publishes a Database Issue in January and a Web Server Issue in July (see refs in Bibliography section). Examples from the most recent issues:
  - Databases: KEGG, PDB, PINT, MutDB, GLIDA, DrugBank
  - Servers: BASys, BRIDGEP, SCRATCH, Glyprot, I2I-SiteEng, PatchDock, SPACE, SymmDock, DeNovoID
Overview of the Talk
• Review of ACS CINF 2004 Papers
• Review of Relevant Articles
• Public Chemistry Databases and Data Repositories with Bioinformatics Info/Links
• Overview of Web Services
• NIH-funded Projects Underway or Planned at Indiana University
Web Services Overview
• What are “Web Services”?
  - A distributed invocation system built on Grid computing
    • Independent of platform and programming language
    • Built on existing Web standards
  - A service oriented architecture with
    • Interfaces based on Internet protocols
    • Messages in XML (except for binary data attachments)
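As an illustration of "messages in XML", the sketch below builds a SOAP 1.1 request envelope with the Python standard library. The envelope namespace is the standard SOAP 1.1 one; the service namespace, the `FindSimilar` operation, and its parameters are hypothetical and do not describe any real chemistry service.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:chem-service"  # hypothetical service namespace


def build_request(operation, params):
    """Serialize a SOAP 1.1 request: Envelope > Body > operation > one element per parameter."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical operation: ask a service for structures similar to ethanol.
message = build_request("FindSimilar", {"smiles": "CCO", "cutoff": 0.8})
```

Because the message is plain XML sent over an Internet protocol, the client and the service can be written in different languages on different platforms, which is the independence the slide refers to.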
Service-Oriented Architecture
[Figure from Curcin et al. DDT, 2005, 10(12), 867]
Web Services for Chemistry: Problems
• Performance and scalability
• Proprietary data
• Competition from high-performance desktop applications
-- Geoff Hutchison, it’s a puzzle blog, 2005-01-05
ALSO:
• Lack of a substantial body of trustworthy Open Access databases
• Non-standard chemical data formats (over 40 in regular use and requiring normalization to one another)
Overview of the Talk
• Review of ACS CINF 2004 Papers
• Review of Relevant Articles
• Public Chemistry Databases and Data Repositories with Bioinformatics Info/Links
• Overview of Web Services
• NIH-funded Projects Underway or Planned at Indiana University
Indiana University Planned Projects:
http://www.chembiogrid.org
• Design of a Grid-based distributed data architecture
• Development of tools for HTS data analysis and virtual screening
• Database for quantum mechanical simulation data
• Chemical prototype projects
  - Novel routes to enzymatic reaction mechanisms
  - Mechanism-based drug design
  - Data-inquiry-based development of new methods in natural product synthesis
Web Services for Chemistry at IU
• Interaction Layer
  - Purpose: Interactive software for creative access and exploitation of information by humans
  - Technologies: Microsoft .NET Smart Clients, portlets, Java applets, email and browser clients, visualization technologies
• Aggregation Layer
  - Purpose: Workflows and data schemas customized for particular domains, applications and users
  - Technologies: BPEL, Taverna and other workflow modeling tools, aggregate web services
• Web Service Layer
  - Purpose: Comprehensive data and computation provision including storage, calculation, semantics and meta-data exposed as web services
  - Technologies: Apache web services, SOAP wrappers, WSDL, UDDI, XML, Microsoft .NET
NCI Developmental Therapeutics Program (DTP)
• Downloadable data:
  - In vitro 60 cell line results
  - In vitro anti-HIV results
  - Yeast assay
  - 200,000+ chemical structures
  - Molecular targets
  - Microarray data
• Or search the database at:
  - http://dtp.nci.nih.gov/docs/dtp_search.html
IU Database of NIH DTP Data
• Contains over 200,000 chemical structures tested in 60 cellular assays from different human tumor cell lines
• Also includes microarray assay profiles for the untreated cell lines (~14,000 datapoints)
• A local PostgreSQL database containing the data that is exposed as a web service
• Using workflows and complex SQL queries, we can do advanced data mining that exploits the chemical, biological and genomic information for particular audiences (chemists, biologists, etc.)
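The kind of query this enables can be sketched with a toy schema. The example below uses Python's built-in `sqlite3` as a stand-in for PostgreSQL. The table names, columns, and values are hypothetical and do not reflect the actual IU schema; the query finds compounds whose activity meets a threshold in every cell line tested.

```python
import sqlite3

# In-memory stand-in for the local database (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compound (id INTEGER PRIMARY KEY, smiles TEXT);
CREATE TABLE assay (compound_id INTEGER, cell_line TEXT, activity REAL);
INSERT INTO compound VALUES (1, 'CCO'), (2, 'c1ccccc1'), (3, 'CC(=O)O');
INSERT INTO assay VALUES
  (1, 'line-A', 5.1), (1, 'line-B', 5.3),
  (2, 'line-A', 7.9), (2, 'line-B', 8.2),
  (3, 'line-A', 6.0), (3, 'line-B', 4.0);
""")


def active_everywhere(connection, threshold):
    """Compounds whose weakest activity across all cell lines still meets the threshold."""
    rows = connection.execute(
        """
        SELECT c.id, c.smiles, MIN(a.activity) AS weakest
        FROM compound AS c
        JOIN assay AS a ON a.compound_id = c.id
        GROUP BY c.id, c.smiles
        HAVING MIN(a.activity) >= ?
        ORDER BY weakest DESC
        """,
        (threshold,),
    )
    return rows.fetchall()


hits = active_everywhere(conn, 5.0)
```

Wrapping a parameterized query like this behind a web service lets a workflow tool call it without knowing the schema.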
Mining the NIH DTP database
[Figure: 60 cell lines, ~200,000 compounds]
• Cell lines can be clustered based on gene expression similarity
• Compounds can be clustered based on similarity of profile across cell lines, or by chemical structure fingerprint similarity
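For the fingerprint case, a common similarity measure is the Tanimoto coefficient: the number of bits two fingerprints share divided by the number of bits set in either. The sketch below assumes Tanimoto as the measure, since the slide does not name one, and represents each fingerprint as a set of "on" bit positions; the compound names and bit positions are invented.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bit positions."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# Hypothetical fingerprints: compound name -> set of bit positions that are set.
fingerprints = {
    "cmpd-1": {1, 4, 7, 9},
    "cmpd-2": {1, 4, 7, 12},
    "cmpd-3": {2, 3, 5},
}


def neighbors(query, cutoff):
    """Names of all other compounds whose similarity to the query meets the cutoff."""
    reference = fingerprints[query]
    return sorted(
        name
        for name, bits in fingerprints.items()
        if name != query and tanimoto(reference, bits) >= cutoff
    )


similar_to_first = neighbors("cmpd-1", 0.5)
```

Pairwise similarities like these are the input to whatever clustering method is applied afterwards. Clustering on activity profiles across cell lines works the same way, with a correlation measure in place of Tanimoto.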
Use of Taverna at IU
• A protein implicated in tumor growth is supplied to the docking program (in this case HSP90 taken from the PDB 1Y4 complex)
• A 2D structure is supplied for input into the similarity search (in this case, the extracted bound ligand from the PDB IY4 complex)
• The workflow employs our local NIH DTP database service to search 200,000 compounds tested in human tumor cellular assays for similar structures to the ligand.
• Client portlets are used to browse these structures
• Similar structures are filtered for drugability, and are automatically passed to the OpenEye FRED docking program for docking into the target protein.
• Once docking is complete, the user visualizes the high-scoring docked structures in a portlet using the JMOL applet.
• Correlation of docking results and “biological fingerprints” across the human tumor cell lines can help identify potential mechanisms of action of DTP compounds
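The workflow on this slide is a chain of steps, each consuming the previous step's output. A minimal sketch of that chaining follows, with every step stubbed out. The `Candidate` record, the molecular-weight cutoff used as a stand-in drugability filter, and the placeholder scoring function are all assumptions; none of this is the actual Taverna workflow or the FRED API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Candidate:
    name: str
    similarity: float      # precomputed similarity to the query ligand
    mol_weight: float      # molecular weight in daltons
    dock_score: float = 0.0


def similarity_search(library, cutoff):
    """Step 1: keep library compounds similar enough to the query ligand."""
    return [c for c in library if c.similarity >= cutoff]


def drugability_filter(candidates, max_weight=500.0):
    """Step 2: stand-in drugability filter (a molecular weight cutoff only)."""
    return [c for c in candidates if c.mol_weight <= max_weight]


def dock(candidates, scorer):
    """Step 3: attach a docking score from an external scorer (stubbed here)."""
    return [replace(c, dock_score=scorer(c)) for c in candidates]


def run_workflow(library, scorer, cutoff=0.7):
    """Chain the steps and rank by score, most negative (best) first."""
    docked = dock(drugability_filter(similarity_search(library, cutoff)), scorer)
    return sorted(docked, key=lambda c: c.dock_score)


# Hypothetical library and a placeholder scorer (not a real docking function).
LIBRARY = [
    Candidate("a", similarity=0.9, mol_weight=320.0),
    Candidate("b", similarity=0.8, mol_weight=610.0),
    Candidate("c", similarity=0.5, mol_weight=300.0),
    Candidate("d", similarity=0.75, mol_weight=450.0),
]
ranked = run_workflow(LIBRARY, scorer=lambda c: -10 * c.similarity)
```

In a workflow tool each function would instead be a call to a separate web service, so one step (for example, the docking program) can be swapped without touching the others.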
Taverna Workflow
[Screenshot: workflow definition; available web services (WSDL); visual depiction of workflow]
Taverna in Action
Overall Workflow
Pre-Closing Quote
• “There is not going to be a ‘voila’ moment at the computer terminal. Instead, there is systematic use of wide-ranging computational tools to facilitate and enhance the drug discovery process.”
Jorgensen. Science, March 19, 2004, 303, 1814.
Closing quote
“The future of chemistry depends on the
automated analysis of chemical
knowledge, combining disparate data
sources in a single resource, such as the
World-Wide Molecular Matrix, which can
be analysed using computational
techniques to assess and build on these
data.”

Townsend et al. Org. Biomol. Chem. 2004, 2,
3299.
Post-closing quote: zzzzzCAS
• “In an industry first, Chemical Abstracts Service (CAS) has unveiled a revolutionary new literature searching tool which will permit scientists to search and retrieve the world’s chemical literature, including patents and obscure technical reports, in their sleep.”
--Author unknown
Acknowledgements
• Randy Arnold
• Xiao Dong
• Sean Mooney
• Peter Murray-Rust
• David J. Wild
• I533 Chemical Informatics Seminar Students
• Elsevier Science
Bibliography: Articles, Books, and Conference Papers

“The Bigger Picture: Linking Bioinformatics to Cheminformatics”
[CINF Symposium] Abstracts [1-16], 227th ACS National Meeting,
Anaheim, CA, March 28-April 1, 2004.
http://www.acscinf.org/new/docs/meetings/227nm/227cinfabstracts.htm
Austin, C.P. “The completed human genome: implications for
chemical biology.” Current Opinion in Chemical Biology 2003, 7,
511-515.
Bajorath, Jürgen, ed. Chemoinformatics: concepts, methods, and
tools for drug discovery. Totowa, N.J. : Humana Press, c2004.
(Methods in molecular biology ; v. 275)
Banville, Debra L. “Mining chemical structural information from the
drug literature.” Drug Discovery Today January 2006, 11(1/2), 35-42.
Brown F. “Editorial opinion: chemoinformatics - a ten year update.”
Current Opinion in Drug Discovery and Development 2005 May;
8(3): 298-302.
Bibliography: Articles (cont’d)





Coles, Simon J.; Day, Nick E.; Murray-Rust, Peter; Rzepa, Henry S.;
Zhang, Yong. “Enhancement of the chemical semantic web through
InChIfication.” Organic & Biomolecular Chemistry 2005, 3, 1832-1834.
Curcin, Vera; Ghanem, Moustafa; Guo, Yike. "Web services in the
life sciences." Drug Discovery Today 2005, 10(12), 865-871.
Gagna CE, Winokur D, Clark Lambert W. “Cell biology,
chemogenomics and chemoproteomics.” Cell Biol Int. 2004; 28(11):
755-64.
Geldenhuys, W.J.; Gaasch, K.E.; Watson, M.; Allen, D.D.;Van Der
Schyf, C.J. “Optimizing the use of open-source software applications
in drug discovery.” Drug Discovery Today February 2006, 11(3/4),
127-132.
Guha, R.; Howard, M.T.; Hutchison, G.R.; Murray-Rust, P.; Rzepa,
H.; Steinbeck, C; Wegner, J.; Willighagen, E.L. “The Blue Obelisk—
Interoperability in chemical informatics.” Journal of Chemical
Information and Modeling 2006 Web Release Date: 22-Feb-2006;
DOI: 10.1021/ci050400b
Bibliography: Articles (cont’d)






Jónsdóttir, S.O.; Jorgensen, F.S.; Brunak, S. “Prediction methods
and databases within chemoinformatics: emphasis on drugs and
drug candidates.” Bioinformatics 2005 May 15; 21(10): 2145-60.
Jorgensen, William L. “The many roles of computation in drug
discovery.” Science March 19, 2004, 303, 1813-1818.
Kauffman, Thom. “Profile.” [interview] LiveWire, March 2005, 7.3;
http://pubs.acs.org/4librarians/livewire/2006/7.3/profile.html
Murray-Rust, Peter S.; Mitchell, John B.O.; Rzepa, Henry S.
“Communication and re-use of chemical information in bioscience.”
BMC Bioinformatics 2005, 6, 180.
Murray-Rust, Peter; Mitchell, John B.O.; Rzepa, Henry S.
“Chemistry in bioinformatics.” BMC Bioinformatics 2005, 6, 141-144.
Povolna, Vera; Dixon, Scott; Weininger, David. “Cabinet—Chemical
and Biological Informatics NETwork.” in: Oprea, Tudor I., ed.
Chemoinformatics in Drug Discovery. Weinheim: Wiley-VCH, 2004,
241-269.
Bibliography: Articles (cont’d)




Salamone, Salvatore. “Hip Hop offers lessons on life sciences data
integration.” Bio-IT World February 2006, 36.
Scherf Uwe, Ross Douglas T., Waltham Mark, Smith Lawrence H.,
Lee Jae K., Tanabe Lorraine, Kohn Kurt W., Reinhold William C.,
Myers Timothy G., Andrews Darren T., Scudiero Dominic A., Eisen
Michael B., Sausville Edward A., Pommier Yves, Botstein David,
Brown Patrick O., Weinstein John N. “A gene expression database
for the molecular pharmacology of cancer.” Nature Genetics 2000,
24, 236-244.
Souchelnytskyi, S. "Bridging proteomics and systems biology: What
are the roads to be traveled?" Proteomics 2005 (November), 5(16),
4123-4137.
Tetko, Igor V. “Computing chemistry on the web.” Drug Discovery
Today November 2005, 10(22), 1497-1500.
Bibliography: Articles (cont’d)


Zimmermann, Marc; Thi, Le Thuy Bui; Hofmann, Martin. “Combating
illiteracy in chemistry: Towards computer-based chemical structure
reconstruction.” ERCIM News January 2005, 60, 40-41.

http://www.scai.fraunhofer.de/uploads/media/MZ-ERCIM05_04.pdf
Zimmermann, Marc; Fluck, Juliane; Thi, Le Thuy Bui; Kolarik,
Corinna; Kumpf, Kai; Hofmann, Martin. “Information extraction in the
life sciences: Perspectives for medicinal chemistry, pharmacology
and toxicology.” Current Topics in Medicinal Chemistry 2005, 5(8),
785-796.
Bibliography: Databases




Andreeva, A.; Howorth, D.; Brenner, S.E.; Hubbard, T.J.P.; Chothia,
C.; Murzin, A.G. “SCOP database in 2004: refinements integrate
structure and sequence family data.” Nucleic Acids Research 2004,
32 Database issue D226-D229 doi: 10.1093/nar/gkh039
Chen J, Swamidass SJ, Dou Y, Bruand J, Baldi P. “ChemDB: a
public database of small molecules and related chemoinformatics
resources.” Bioinformatics. 2005 Nov 15; 21(22): 4133-9.
Dunkel, M.; Fullbeck, M.; Neumann, S.; Preissner, R. “SuperNatural:
a searchable database of available natural compounds.” Nucleic
Acids Research 2006, 34, Database issue D678-D683 doi:
10.1093/nar/gkj132
Gold, Nicola D.; Jackson, Richard M. “A searchable database for
comparing protein-ligand binding site for the analysis of
structure-function relationships.” Journal of Chemical Information and
Modeling 2006, 46(2), 736-742.
Bibliography: Databases (cont’d)





Kanehisa, M.; Goto, S.; Hattori, M.; Aoki-Kinoshita, F.; Itoh, M.;
Kawashima, S.; Katayama, T.; Araki, M.; Hirakawa, M. “From
genomics to chemical genomics: new developments in KEGG.”
Nucleic Acids Research 2006, 34, Database issue D354-D357. doi:
10.1093/nar/gkj102.
Kellenberger, Esther; Muller, Pascal; Schalon, Claire; Bret,
Guillaume; Foata, Nicolas; Rognan, Didier. “sc-PDB: An annotated
database of druggable binding sites from the Protein Data Bank.”
Journal of Chemical Information and Modeling 2006, 46(2), 717-727.
Irwin, J.J.; Shoichet, B.K. “ZINC - A free database of commercially
available compounds for virtual screening.” Journal of Chemical
Information and Modeling 2005, 45, 177-182.
Kouranov, A.; Xie, L.; de la Cruz, J.; Chen, L.; Westbrook, J.; Bourne,
P.E.; Berman, H.M. “The RCSB PDB information portal for structural
genomics.” Nucleic Acids Research 2006, 34, Database issue
D302-D305 doi: 10.1093/nar/gkj120
Kumar, M.D.S.; Gromiha, M.M. “PINT: Protein-protein interactions
thermodynamic database.” Nucleic Acids Research 2006, 34
Database issue D195-D198 doi: 10.1093/nar/gkj017
Bibliography: Databases (cont’d)




Lo Conte, L.; Brenner, S.E.; Hubbard, T.J.P.; Chothia, C.; Murzin,
A.G. “SCOP database in 2002: refinements accommodate structural
genomics.” Nucleic Acids Research 2002, 30(1): 264-267.
Murzin, A.G.; Brenner, S.E.; Hubbard, T.; Chothia, C. “SCOP: A
structural classification of proteins database for the investigation of
sequences and structures.” Journal of Molecular Biology 1995, 247,
536-540.
Okuno, Y.; Yang, J.; Taneishi, K.; Yabuuchi, H.; Tsujimoto, G.
“GLIDA: GPCR-ligand database for chemical genomic drug
discovery.” Nucleic Acids Research 2006, 34, Database issue
D673-D677 doi: 10.1093/nar/gkj028.
Pearl F, Todd A, Sillitoe I, Dibley M, Redfern O, Lewis T, Bennett C,
Marsden R, Grant A, Lee D, Akpor A, Maibaum M, Harrison A,
Dallman T, Reeves G, Diboun I, Addou S, Lise S, Johnston C, Sillero
A, Thornton J, Orengo C. “The CATH Domain Structure Database
and related resources Gene3D and DHS provide comprehensive
domain family information for genome analysis.” Nucleic Acids
Research. 2005, 33 Database Issue D247-D251.
Bibliography: Databases (cont’d)


Wheeler, D.L. et al. “Database resources of the National Center for
Biotechnology Information.” Nucleic Acids Research 2006, 34
Database Issue D173-D180 doi: 10.1093/nar/gkj158
Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard
P, Chang Z, Woolsey, Jennifer. “DrugBank: a comprehensive
resource for in silico drug discovery and exploration.”
Nucleic Acids Res. 2006 Jan 1;34(Database issue): D668-72.
Biotech Validation Suite for Protein Structures
• Send the server a PDB file
• Server provides a comprehensive check of the protein, including:
  - Atomic volume analysis
  - Full geometric analysis
  - NMR restraint data
• http://biotech.ebi.ac.uk:8400/
Knowledge-Driven Bioinformatics
Enhanced with Chemistry
ToxTree
• An in silico toxicology prediction suite
• Based on the CDK toolkit
• Built on CML
• Released as Open Source under the GPL
• Standalone PC software
• User Manual: http://ecb.jrc.it/DOCUMENTS/QSAR/TOXTREE/toxTree_user_manual.pdf
Tools for Genomic and Proteomic Scientists vis-à-vis Cell Biology (Gagna et al.)
• Tools to fully exploit the techniques in cellular biology
  - Light microscopy for high resolution images
  - Fractionation of cells into basic components via ultracentrifugation
  - Analysis of individual cells through flow cytometry
  - LCM, normal and diseased TMAs (tissue microarrays), quantitative computer image analysis, cell micromanipulation, and high-throughput microscopy
InChI Generation on the Web
• The following websites provide the facility to generate InChIs:
  - www.acdlabs.com/download/chemsk.html
    ACD/Labs' freely available structure-drawing program ChemSketch includes the facility to generate InChIs from drawn structures.
  - pubchem.ncbi.nlm.nih.gov/edit/
    PubChem Server Side Structure Editor v1.8 includes a facility for generating InChIs as you draw the structure.
Advances in Macromolecular Crystallography by CCG
• More protein structures available now
  - Use of 3D info in bioinformatics makes functional inferences more dependable
• CCG Structural Family Database distributed with MOE
  - Includes fold detection methodology to ID structurally similar proteins
  - Simultaneous sequence and structural alignment of large collections of proteins
  - 3D structural family analysis for insight into conserved geometry, water molecules, salt bridges, hydrogen bonds, hydrophobic contacts, and disulfide bonds
CCG’s Cheminformatics Offerings
• MOE Molecular Database
• Molecular Descriptors calculated and used for classification, clustering, filtering, and predictive model construction
• QSAR/QSPR Predictive Modeling
• Diversity and Similarity Searching
• High Throughput Conformational Search
• 3D Pharmacophore Search
Components of the Semantic Web for Chemistry
• XML – eXtensible Markup Language
• RDF – Resource Description Framework
• RSS – Rich Site Summary
• Dublin Core – allows metadata-based newsfeeds
• OWL – for ontologies
• BPEL4WS – for workflow and web services
Murray-Rust et al. Org. Biomol. Chem. 2004, 2, 3192-3203.
Web Services Integration Projects: Biosciences
• myGrid
  - http://www.mygrid.org.uk/
• BIOPIPE
  - http://biopipe.org/
• BioMOBY
  - http://biomoby.org/
BIOT 2006

Major themes, areas and suggested topics include:
- Bio-molecular and Phylogenetic Databases
- Molecular Evolution and Phylogenetic analysis
- Drug Delivery Systems
- Bio-Ontology and Data Mining
- Sequence Search and Alignment
- Microarray Analysis
- System Biology
- Pathway analysis
- Identification and Classification of Genes
- Protein Structure Prediction and Molecular Simulation
- Functional Genomics
- Proteomics
- Tertiary structure prediction
- Drug Docking
- Gene Expression Analysis
- Biomedical Imaging
Proteomics: What is it?
• Proteomics is the study of protein expression, regulation, modification, and function in living systems for understanding how living systems use proteins. Using a variety of techniques, proteomics can be used to study how proteins interact within a system, or how proteins change due to applied stresses.
• Requires advanced measurement techniques, especially separations and mass spectrometry
Proteomics Needs Informatics for:
• Locating peaks in 2 or more dimensions
• MS/MS spectra interpretation
• Protein/Peptide quantification
• Peptide detectability
• Experimental data → Biological information
  - enzyme or pathway regulation
  - disease susceptibility
  - drug efficacy