The University PowerPoint Template
A bibliometric analysis of
chemoinformatics
Presented at the 25th Anniversary Meeting of the Molecular
Graphics and Modelling Society, School of Oriental and
African Studies, London 13th March 2007
Peter Willett, University of Sheffield, UK
Overview of talk
• Bibliometrics
• Chemoinformatics
• Growth of the subject
• Subject coverage
• Author productivity
• The Journal of Molecular Graphics (and
Modelling)
Bibliometrics: what is it?
• Bibliometrics is:
• “The application of mathematical and statistical
methods to books and other media” (A. Pritchard
(1969), Statistical bibliography or bibliometrics?, J.
Docum., Vol. 25, pp. 348-349)
• “The study, or measurement, of texts and information”
(Wikipedia)
• See also:
• Webometrics
• “the study of the quantitative aspects of the construction and
use of information resources, structures and technologies on
the Web drawing on bibliometric and informetric approaches"
(L. Björneborn and P. Ingwersen (2004), Toward a basic
framework for webometrics. J. Amer. Soc. Inf. Sci. Technol.,
Vol. 55, pp. 1216-1227)
• Cybermetrics, informetrics, scientometrics
Bibliometrics: subjects of
study
• Bibliometric distributions
• Highly skewed frequency distributions (Bradford,
Lotka, Zipf) and their implications
• Citation analysis
• Analysis of individuals, institutions and journals
• Use as performance indicators for the evaluation of research
• Philosophy of science
• Subject coverage
• Academic collaborations
• Now extended to linkages between Web sites
• “Sitations”, cf. citations
From chemical documentation
to chemoinformatics
• Chemical documentation is long established
• Chemisches Journal started in 1778
• Chemical Abstracts started in 1907
• First computer-based information systems
and services in Sixties
• Chemical Titles in 1961
• Morgan and Sussenguth algorithms in 1965
• Recent emergence of chemoinformatics
• M. Hann and R. Green (1999), Chemoinformatics
- a new name for an old problem?, Curr. Opin.
Chem. Biol., Vol. 3, pp. 379-383.
Chemoinformatics:
definitions
• “The use of information technology and management has
become a critical part of the drug discovery process.
Chemoinformatics is the mixing of those information resources
to transform data into information and information into
knowledge for the intended purpose of making better decisions
faster in the area of drug lead identification and optimization”
F.K. Brown (1998), Chemoinformatics: What is it and how does
it impact drug discovery?, Ann. Reports Med. Chem., Vol. 33,
pp. 375-384
• Take 1998 as the starting point for the bibliometric analyses
• Many alternatives, e.g.
• “Chem(o)informatics is a generic term that encompasses the
design, creation, organization, management, retrieval, analysis,
dissemination, visualization and use of chemical information”
G. Paris (August 1999 ACS meeting), quoted by W.A. Warr at
http://www.warr.com/warrzone.htm
• “Chemoinformatics is the application of informatics methods to
solve chemical problems” J. Gasteiger and T. Engel (2003),
Chemoinformatics: a Textbook, Wiley-VCH.
Bibliometric studies in
chemoinformatics
• Onodera (2001)
• Analysis of the subject coverage of Journal of
Chemical Information and Computer Sciences
• Redman et al. (2001)
• Applications of the Cambridge Structural Database
• Bishop et al. (2003)
• Citations to Sheffield chemoinformatics research
• Warr (2005)
• Most cited papers in Journal of Chemical Information
and Computer Sciences
• Behrens and Luksch (2006)
• Contents of the Inorganic Crystal Structure Database
Data sources for
bibliometric research
• Web of Knowledge (WOK)
• Long established as the data source for bibliometric
analyses
• Recent addition of analysis tools (Analyse Results
and Citation Reports)
• Probably still the most comprehensive
• New sources
• Google
• Google Scholar restricted to the scholarly literature
• Scopus
• New service from Elsevier, offering similar facilities
to WOK
What shall we call it?
Term or phrase                    Google     Google Scholar   WOK   Scopus
Chemical documentation            695,000    66               1     34
Chemical informatics              50,400     129              20    39
Chemical information management   978        42               4     28
Chemical information science      779        17               2     5
Chemiinformatics                  2,230      2                2     2
Cheminformatics                   320,000    447              83    250
Chemoinformatics                  191,000    5,636            99    473
Google postings from
http://www.molinspiration.com/chemoinformatics.html
[Figure: chart of Google postings by year, 2001-2006, for the series Chemo* and Chem*; vertical axis 0 to 350,000]
• WOK search of the title, keyword and abstract
fields for:
• chemoinformatics OR cheminformatics OR “chemical
informatics”
• This search retrieved 197 records for the period
1998-2006 in 87 different sources
• Of these, Journal of Chemical Information and
Modeling (and its predecessor) is clearly the core
journal
Most frequently occurring sources
Source                                                                 Citations
Abstracts of papers of ACS meeting                                     44
Journal of Chemical Information and Computer Sciences/
  Journal of Chemical Information and Modeling                         22
Drug Discovery Today                                                   11
Combinatorial Chemistry and High-Throughput Screening                  5
Bioinformatics                                                         5
Current Opinion in Drug Discovery and Development                      4
Journal of Computer-Aided Molecular Design                             4
Molecular Diversity                                                    4
Quantitative Structure-Activity Relationships/
  QSAR & Combinatorial Science                                         4
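The counts listed above can be put in proportion to the 197 records retrieved by the search. The following is a minimal Python sketch, assuming that the figures are record counts per source; the variable and function names are illustrative only, and just the three largest sources are included.

```python
# Records per source, copied from the list above (three largest sources only).
TOTAL_RECORDS = 197  # records retrieved by the WOK search for 1998-2006

source_counts = {
    "Abstracts of papers of ACS meeting": 44,
    "J. Chem. Inf. Comput. Sci. / J. Chem. Inf. Model.": 22,
    "Drug Discovery Today": 11,
}


def share_of_total(count, total=TOTAL_RECORDS):
    """Return the percentage of all retrieved records contributed by one source."""
    return 100.0 * count / total


shares = {name: share_of_total(n) for name, n in source_counts.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

On these figures the three sources together account for 77 of the 197 records.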
Inter-journal relationships
• L. Leydesdorff (2007), "Visualization of the
citation impact environment of scientific
journals", J. Amer. Soc. Inf. Sci. Technol., Vol.
58, pp. 25-38.
• Analysis of 2003-04 WOK data to identify journals
that provide >= 1% of the citations to/from a given
journal
• For Journal of Chemical Information and
Computer Sciences
• 14 other “to” journals but only 5 other “from” journals
• Multi-disciplinary nature of the field means that a wide
range of sources are used
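The 1% threshold described above can be sketched in Python as follows. This is only an illustration of the filtering step, not Leydesdorff's implementation: the journal names and counts are hypothetical, and whether the journal's self-citations belong in the total is an assumption left to the caller.

```python
def significant_partners(citation_counts, threshold=0.01):
    """Return the journals that supply at least `threshold` of all citations.

    citation_counts maps a journal name to the number of citations exchanged
    with the journal under study (hypothetical input format).
    """
    total = sum(citation_counts.values())
    if total == 0:
        return []
    return sorted(
        journal
        for journal, count in citation_counts.items()
        if count / total >= threshold
    )


# Hypothetical counts, for illustration only.
example_counts = {"Journal W": 500, "Journal X": 480, "Journal Y": 15, "Journal Z": 5}
partners = significant_partners(example_counts)
print(partners)
```

The same function can be applied once to the "to" counts and once to the "from" counts, which is how the two lists for a journal can differ in length.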
Author productivity: I
• Analysis of the authors of all articles published
1998-2006 in:
• Bioinformatics, Combinatorial Chemistry and High-Throughput Screening and Journal of Biomolecular
Screening
• Journal of Chemical Information and Modeling,
Journal of Computer-Aided Molecular Design,
Molecular Diversity and QSAR & Combinatorial
Science
• Journal of Molecular Graphics and Modelling, Journal
of Molecular Modeling and SAR and QSAR in
Environmental Research
• Identification of the 20 most productive authors
for each of these journals in 1998-2006
Author productivity: II
• Productive authors in the first group of journals did not
publish frequently in the other two groups of journals, but
there is a fair degree of overlap between the journals in the other
two groups (Molecular Diversity the least)
• There is one author in the top-20 for four journals, two authors in
the top-20 for three journals and 12 authors in the top-20 for two
journals
• Eight of the top-20 authors in Journal of Chemical Information
and Computer Sciences are also top-20 authors in other journals
• Main degrees of overlap between
• Journal of Chemical Information and Modeling and Journal of
Computer-Aided Molecular Design
• QSAR & Combinatorial Science and SAR and QSAR in
Environmental Research
Overlap in “top-20” authors
         JCICS   JCAMD   MD   QSAR   JMGM   JMM
JCAMD    5
MD       1       0
QSAR     1       1       0
JMGM     2       2       2    0
JMM      1       0       1    0      2
SAR      3       1       0    5      0      0
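Pairwise overlap counts of this kind can be obtained by intersecting the sets of top authors for each pair of journals. The sketch below shows the idea in Python; the author labels are placeholders rather than real names, and the sets are cut to a few members for brevity.

```python
from itertools import combinations


def top_author_overlap(top_authors):
    """Map each unordered pair of journals to the number of shared top authors.

    top_authors maps a journal abbreviation to the set of its most
    productive authors.
    """
    return {
        (a, b): len(top_authors[a] & top_authors[b])
        for a, b in combinations(sorted(top_authors), 2)
    }


# Placeholder data, for illustration only (not taken from the talk).
example_top_authors = {
    "JCICS": {"author_1", "author_2", "author_3"},
    "JCAMD": {"author_2", "author_3", "author_4"},
    "MD": {"author_5"},
}
overlap = top_author_overlap(example_top_authors)
for pair, shared in overlap.items():
    print(pair, shared)
```

Because set intersection is symmetric, only one triangle of the matrix needs to be computed.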
The core literature
• A basic principle of bibliometrics is that citation
corresponds to use, i.e., frequently cited papers are the
most scientifically valuable
• NB the many exceptions…
• “Classic” citations
• Critical citations
• Self-citation and close collaborators
• Journal Impact Factor games
• …but generally a valid assumption
• The 4,411 articles published in seven chemoinformatics
journals in 1998-2006 attracted a
total of 35,228 citations
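As a rough guide to scale, the two totals above give the mean number of citations per article. The Python sketch below does the arithmetic; the function name is illustrative. The mean hides the skew of the distribution: the most-cited paper alone has 854 citations.

```python
# Totals quoted above for the seven journals, 1998-2006.
ARTICLES = 4411
CITATIONS = 35228


def mean_citations(citations, articles):
    """Return the mean number of citations per article."""
    if articles <= 0:
        raise ValueError("articles must be positive")
    return citations / articles


mean = mean_citations(CITATIONS, ARTICLES)
print(f"Mean citations per article: {mean:.2f}")
```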
Most-cited papers: I
E. Lindahl et al. (2001), GROMACS 3.0: a package for molecular simulation and trajectory
analysis, J. Mol. Model., Vol. 7, pp. 306-317.
Citations: 854
G. Schaftenaar and J.H. Noordik (2000), Molden: a pre- and post-processing program for
molecular and electronic structures, J. Comput.-Aid. Mol. Design, Vol. 14, pp. 123-134.
Citations: 701
A.K. Dunker et al. (2001), Intrinsically disordered protein, J. Mol. Graph. Model., Vol. 19,
pp. 26-59.
Citations: 239
T.J.A. Ewing et al. (2001), DOCK 4.0: search strategies for automated molecular docking of
flexible molecule databases, J. Comput.-Aid. Mol. Design, Vol. 15, pp. 411-428.
Citations: 181
M.D. Wessel et al. (1998), Prediction of human intestinal absorption of drug compounds
from molecular structure, J. Chem. Inf. Comput. Sci., Vol. 38, pp. 726-735.
Citations: 157
T.I. Oprea et al. (2001), Is there a difference between leads and drugs? A historical
perspective, J. Chem. Inf. Comput. Sci., Vol. 41, pp. 1308-1315.
Citations: 145
H.-J. Böhm (1998), Prediction of binding constants of protein ligands: A fast method for the
prioritization of hits obtained from de novo design or 3D database search programs, J.
Comput.-Aid. Mol. Design, Vol. 12, pp. 309-323.
Citations: 143
J.A. Platts et al. (1999), Estimation of molecular linear free energy relation descriptors
using a group contribution approach, J. Chem. Inf. Comput. Sci., Vol. 39, pp. 835-845.
Citations: 137
Most-cited papers: II
• Certain types of article are strongly
represented in the top-30 positions
• Software descriptions (9)
• Reviews (4)
• Drug-likeness (4)
• Binding energies (4)
• The first of these might be thought of as the
field’s “classic” citations (cf. Journal of Chemical
Information and Computer Sciences’ two most-cited articles)
Institutional productivity
• The following institutions all provide at least 1%
of the papers in all of the seven journals
• National Institute of Chemistry, Ljubljana, University of
Erlangen-Nurnberg, University of Sheffield, University
of Minnesota, Environmental Protection Agency,
Russian Academy of Sciences, Liverpool John
Moores University, Pennsylvania State University,
Chinese Academy of Sciences and the University of
Cambridge
• Of top-50 institutions, only Tripos (no. 27) and
Pfizer (no. 36) are for-profit organisations
National productivity: the ten countries providing the most articles in the seven journals
[Figure: chart with segments for USA, Germany, England, PR China, France, Spain, Italy, Japan, India, Switzerland and All others]
The Journal of Molecular
Graphics and Modelling
• The journal, then the Journal of Molecular Graphics, was
started in 1983 and changed to its current name with
Volume 15 in 1997
• The journal is:
• “devoted to the publication of papers on the uses of computers in
theoretical investigations of molecular structure, function,
interaction, and design. The scope of the journal includes all
aspects of molecular modelling and computational chemistry,
including, for instance, the study of molecular shape and
properties, molecular simulations, protein and polymer
engineering, drug design, materials design, structure-activity and
structure-property relationships, database mining, and
compound library design”
• See
http://www.elsevier.com/wps/find/journaldescription.cws_home/525012/description#description
Bibliometric distributions: I
• Many bibliometric distributions are characterised
by inverse, highly skewed frequency
distributions
• Zipf’s Law for word occurrences
• Lotka’s Law for author productivity
• Bradford’s Law for subject spread in journals
• Many other examples
• Design of storage systems
• Language acquisition
• Income distribution (Pareto distribution)
Bibliometric distributions: II
• All of the bibliometric distributions can be
represented by an equation of the form
$f(k) = C / k^{\alpha}$
where f(k) is the frequency of occurrence of
some bibliometric item that is associated with
each member of a population (k=1,2...) that is
producing examples of these items, and where
C and α are constants
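To make the inverse power-law form concrete, the short Python sketch below evaluates f(k) = C / k^alpha for the first few values of k. It assumes the exponent is written alpha, sets C to 1 so that the values are relative to k = 1, and uses alpha = 2, the value in Lotka's original formulation; the function name is illustrative.

```python
def power_law_frequency(k, c, alpha):
    """Evaluate the inverse power law f(k) = c / k**alpha for k >= 1."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return c / k ** alpha


# Frequencies relative to k = 1, with c = 1 and alpha = 2.
relative = [power_law_frequency(k, 1.0, 2.0) for k in range(1, 5)]
for k, value in enumerate(relative, start=1):
    print(k, round(value, 4))
```

With alpha = 2, the frequency at k = 2 is a quarter of that at k = 1, and at k = 4 a sixteenth, which is the rapid fall-off that makes these distributions so skewed.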
Lotka’s Law
• The original formulation (A. Lotka (1926), The frequency
distribution of scientific productivity, Journal of the
Washington Academy of Sciences, Vol. 16, pp. 317-323)
suggested α = 2, but a wide range of values is observed in
practice, e.g., 1.78-3.78 (M.L. Pao (1986), An empirical
examination of Lotka's Law, J. Amer. Soc. Inf. Sci., Vol.
37, pp. 26-33)
• WOK lists 859 articles appearing in Vols. 2-24 of the
journal
• Reasonable Lotka plot with C = 0.834 and α = 3.02
• Well-known authors with >= 6 papers: Arteca, Bajorath, Brasseur,
Chatterjee, Ferrin, Flower, Gaber, Goodsell, Griffith, Maigret,
Martin, Mornon, Nakamura, Olson, Richards, Tapia, Toma,
Umeyama, Welsh, White, Willett
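One simple way to estimate the two Lotka constants is a least-squares straight-line fit to the log-log data, since log f(k) = log C - alpha log k. The Python sketch below shows this under stated assumptions: it is not necessarily the procedure used for the plot in this talk, the input format is hypothetical, and the example counts are synthetic (generated from C = 1000, alpha = 2), not the journal's data. Fitting raw counts rather than proportions of authors changes only C, not the exponent.

```python
import math


def fit_lotka(author_counts):
    """Fit f(k) = c / k**alpha by least squares on log10-log10 data.

    author_counts maps k (number of papers) to the number of authors with
    exactly k papers. Returns the pair (c, alpha).
    """
    points = [
        (math.log10(k), math.log10(n))
        for k, n in author_counts.items()
        if k > 0 and n > 0
    ]
    if len(points) < 2:
        raise ValueError("need at least two non-zero data points")
    size = len(points)
    mean_x = sum(x for x, _ in points) / size
    mean_y = sum(y for _, y in points) / size
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The slope of the log-log line is -alpha; the intercept is log10(c).
    return 10 ** intercept, -slope


# Synthetic counts that follow an exact inverse-square law.
synthetic = {1: 1000, 2: 250, 5: 40, 10: 10}
c_hat, alpha_hat = fit_lotka(synthetic)
print(f"C = {c_hat:.1f}, alpha = {alpha_hat:.2f}")
```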
Lotka data for 859 articles published in Volumes 2-24 of the journal
[Figure: Lotka plot of log authors (vertical axis, 0 to 8) against log papers (horizontal axis, 0 to 2.5)]
Types of paper in Volumes 4 (1986), 14 (1996) and 24 (2006)
[Figure: chart comparing Software and Applications papers in 1986, 1996 and 2006; vertical axis 0 to 70]
Most-cited papers
R. Koradi et al. (1996), MOLMOL: A program for display and analysis of
macromolecular structures, J. Mol. Graph., Vol. 14, pp. 51-55.
Citations: 3298
W. Humphrey et al. (1996), VMD: Visual molecular dynamics, J. Mol. Graph.,
Vol. 14, pp. 33-38.
Citations: 1732
G. Vriend (1990), What-If – a molecular modelling and drug design program, J.
Mol. Graph., Vol. 8, pp. 52-56.
Citations: 1505
R.M. Esnouf (1997), An extensively modified version of MolScript that includes
greatly enhanced coloring capabilities, J. Mol. Graph. Model., Vol. 15, pp. 132-134.
Citations: 1316
S.V. Evans (1993), SETOR – hardware-lighted 3-dimensional solid model
representations of macromolecules, J. Mol. Graph., Vol. 11, pp. 134-138.
Citations: 1151
T.E. Ferrin et al. (1988), The MIDAS display system, J. Mol. Graph., Vol. 6, pp.
13-27.
Citations: 982
M. Carson (1987), Ribbon models of macromolecules, J. Mol. Graph., Vol. 5,
pp. 103-106.
Citations: 514
W. Smith and T.R. Forester (1996), DL_POLY_2.0: A general-purpose parallel
molecular dynamics simulation package, J. Mol. Graph., Vol. 14, pp. 136-141.
Citations: 314
Inter-journal relatedness
• The Journal Citation Reports database provides a
further way of analysing the degree of co-citation
between journals
• Let A and B be journals publishing $P_A$ and $P_B$ articles;
let $C_{AB}$ be the number of times that A cites B and let
$CT_A$ be the total number of citations in A. Then the
relatedness of A to B is defined as
$C_{AB} / (P_B \times CT_A)$
• A similar calculation can be made of the relatedness of
B to A
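The relatedness definition above translates directly into a few lines of Python. The sketch below is illustrative: the function and argument names are not from the talk, the input figures are hypothetical, and the default scaling by 10^6 is an assumption made to match how the values are reported in this talk.

```python
def relatedness(c_ab, p_b, ct_a, scale=1_000_000):
    """Relatedness of journal A to journal B.

    c_ab:  number of times A cites B
    p_b:   number of articles published by B
    ct_a:  total number of citations made in A
    scale: multiplier applied for readability (10**6 by default)
    """
    if p_b <= 0 or ct_a <= 0:
        raise ValueError("p_b and ct_a must be positive")
    return scale * c_ab / (p_b * ct_a)


# Hypothetical figures: A cites B 50 times, B published 200 articles,
# and A contains 10,000 citations in total.
a_to_b = relatedness(c_ab=50, p_b=200, ct_a=10_000)
print(a_to_b)
```

Swapping the roles of the two journals (B's citations to A, A's article count, B's citation total) gives the relatedness of B to A, which in general is a different number.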
Relatedness values (× 10^6)
Journal (J)                                      JMGM to J   J to JMGM
Journal of Computer-Aided Molecular Design       250.35      256.16
Journal of Chemical Information and Modeling     62.95       186.84
Journal of Computational Chemistry               162.85      66.96
Structure                                        30.00       141.65
Proteins                                         55.99       116.33
Acta Crystallographica D                         15.04       111.91
SAR and QSAR in Environmental Research           31.48       98.87
Journal of Molecular Modeling                    22.36       96.27
Current Opinion in Structural Biology            84.66       41.70
Protein Science                                  26.56       79.73
Countries providing at least 3% of the articles in Volumes 2-24 of the journal
[Figure: chart with segments for USA, England, Japan, France, Germany, Australia, Spain, Switzerland and All others]
Conclusions
• Most academics are interested in their personal
citation counts and in the impact factors for their
favourite journals
• Bibliometrics has more general applications
• Subject coverage
• Key players and articles
• Relationships between journals
• Recent developments facilitate the carrying-out
of such analyses