The University PowerPoint Template
A bibliometric analysis of chemoinformatics
Presented at the 233rd National Meeting of the American
Chemical Society, Chicago, 25th March 2007
Peter Willett, University of Sheffield, UK
Overview of talk
• Bibliometrics
• Chemoinformatics
• Growth of the subject
• Subject coverage
• Key journals
• Author productivity
• Yvonne Martin’s contribution
Bibliometrics: what is it?
• Bibliometrics is:
• “The application of mathematical and statistical
methods to books and other media” (A. Pritchard
(1969), Statistical bibliography or bibliometrics?, J.
Docum., Vol. 25, pp. 348-349)
• “The study, or measurement, of texts and information”
(Wikipedia)
• See also:
• Webometrics
• “The study of the quantitative aspects of the construction and
use of information resources, structures and technologies on
the Web drawing on bibliometric and informetric approaches"
(L. Björneborn and P. Ingwersen (2004), Toward a basic
framework for webometrics. J. Amer. Soc. Inf. Sci. Technol.,
Vol. 55, pp. 1216-1227)
• Cybermetrics, informetrics, scientometrics
Bibliometrics: subjects of study
• Bibliometric distributions
• Highly skewed frequency distributions (Bradford,
Lotka, Zipf) and their implications
• Citation analysis
• Analysis of individuals, institutions and journals
• Use as performance indicators for the evaluation of research
• Philosophy of science
• Subject coverage
• Academic collaborations
• Now extension to linkages between Web sites
• Sitations, cf citations
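As a small illustration of the skewed distributions named above: Lotka's law, in its classical inverse-square form, says that the number of authors with n papers is roughly the number of single-paper authors divided by n squared. The sketch below assumes that form; the function name and the figure of 1000 single-paper authors are hypothetical and are not data from this talk.

```python
def lotka_expected_authors(single_paper_authors, max_papers, exponent=2.0):
    """Expected number of authors with n = 1..max_papers papers under Lotka's law.

    Assumes the classical form: authors(n) = authors(1) / n**exponent.
    """
    return {
        n: single_paper_authors / n ** exponent
        for n in range(1, max_papers + 1)
    }


# Hypothetical field with 1000 authors who each wrote exactly one paper.
expected = lotka_expected_authors(1000, 5)
for n, count in expected.items():
    print(f"{n} paper(s): about {count:.1f} authors")
```

The rapid fall-off is the point: under this assumption only a quarter as many authors write two papers as write one, which is why a short list of highly productive authors can account for a large share of a literature.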
From chemical documentation to chemoinformatics
• Chemistry is an information-rich discipline
• Chemisches Journal (1778); Chemical Abstracts (1907)
• Chemical Titles (1961); Morgan and Sussenguth algorithms
(1965)
• But chemoinformatics a new arrival
• M. Hann and R. Green (1999), Chemoinformatics - a new
name for an old problem?, Curr. Opin. Chem. Biol., Vol. 3,
pp. 379-383
• “The use of information technology and management has
become a critical part of the drug discovery process.
Chemoinformatics is the mixing of those information
resources to transform data into information and information
into knowledge for the intended purpose of making better
decisions faster in the area of drug lead identification and
optimization” F.K. Brown (1998), Chemoinformatics: What is
it and how does it impact drug discovery?, Ann. Reports
Med. Chem., Vol. 33, pp. 375-384
• Take 1998 as the starting point for the bibliometric
analyses
Bibliometric studies in chemoinformatics
• Onodera (2001)
• Analysis of the subject coverage of Journal of
Chemical Information and Computer Sciences
• Redman et al. (2001)
• Applications of the Cambridge Structural Database
• Bishop et al. (2003)
• Citations to Sheffield chemoinformatics research
• Warr (2005)
• Most cited papers in Journal of Chemical Information
and Computer Sciences
• Behrens and Luksch (2006)
• Contents of the Inorganic Crystal Structure Database
Data sources for bibliometric research
• Web of Knowledge (WOK)
• Long established as the data source for bibliometric
analyses
• Recent addition of analysis tools (Analyse Results
and Citation Reports)
• Probably still the most comprehensive
• New sources
• Google
• Google Scholar restricted to the scholarly literature
• Scopus
• New service from Elsevier, offering similar facilities to WOK
What shall we call it?
Term or phrase                     Google     Google Scholar   WOK   Scopus
Chemical documentation             695,000    66               1     34
Chemical informatics               50,400     129              20    39
Chemical information management    978        42               4     28
Chemical information science       779        17               2     5
Chemiinformatics                   2,230      2                2     2
Cheminformatics                    320,000    447              83    250
Chemoinformatics                   191,000    5,636            99    473
Google postings from
http://www.molinspiration.com/chemoinformatics.html
[Chart: Google postings by year, 2001-2006, for the two series Chemo* and Chem*; vertical axis from 0 to 350,000]
• WOK search of the title, keyword and abstract
fields for:
• chemoinformatics OR cheminformatics OR “chemical
informatics”
• This search retrieved 197 records for the period
1998-2006 in 87 different sources
• Of these, Journal of Chemical Information and
Modeling (and its predecessor) is clearly the core
journal
Most frequently occurring sources
Source                                                                  Citations
Abstracts of papers of ACS meeting                                      44
Journal of Chemical Information and Computer Sciences/
  Journal of Chemical Information and Modeling                          22
Drug Discovery Today                                                    11
Combinatorial Chemistry and High-Throughput Screening                   5
Bioinformatics                                                          5
Current Opinion in Drug Discovery and Development                       4
Journal of Computer-Aided Molecular Design                              4
Molecular Diversity                                                     4
Quantitative Structure-Activity Relationships/
  QSAR & Combinatorial Science                                          4
Inter-journal relationships
• L. Leydesdorff (2007), "Visualization of the
citation impact environment of scientific
journals", J. Amer. Soc. Inf. Sci. Technol., Vol.
58, pp. 25-38.
• Analysis of 2003-04 WOK data to identify journals
that provide >= 1% of the citations to/from a given
journal
• For Journal of Chemical Information and
Computer Sciences
• 14 other “to” journals but only 5 other “from” journals
• Emerging, multi-disciplinary nature of the field means
that a wide range of sources is used
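The 1% cut-off described above can be sketched in a few lines of Python. This is only an illustration of the thresholding step, not Leydesdorff's implementation; the function name, the journal names and the counts are all hypothetical.

```python
def journals_above_threshold(citation_counts, threshold=0.01):
    """Return {journal: share} for journals supplying at least `threshold`
    of all citations to (or from) the journal being studied."""
    total = sum(citation_counts.values())
    if total == 0:
        return {}
    return {
        journal: count / total
        for journal, count in citation_counts.items()
        if count / total >= threshold
    }


# Hypothetical citation counts for one journal (not WOK data).
cited = {"Journal A": 120, "Journal B": 45, "Journal C": 3, "Journal D": 832}
core = journals_above_threshold(cited)
for journal, share in sorted(core.items(), key=lambda item: -item[1]):
    print(f"{journal}: {share:.1%}")
```

Running the same filter once on the "to" counts and once on the "from" counts would give the two journal sets whose sizes are compared on this slide.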
Author productivity: I
• Analysis of the authors of all articles published
1998-2006 in:
• Bioinformatics, Combinatorial Chemistry and High-Throughput Screening and Journal of Biomolecular
Screening
• Journal of Chemical Information and Modeling,
Journal of Computer-Aided Molecular Design,
Molecular Diversity and QSAR & Combinatorial
Science
• Journal of Molecular Graphics and Modelling, Journal
of Molecular Modeling and SAR and QSAR in
Environmental Research
• Identification of the 20 most productive authors
for each of these journals in 1998-2006
Author productivity: II
• Productive authors in the first group of journals did not
publish frequently in the other two groups of journals, but
there is a fair degree of overlap between the journals in the other
two groups
• There are two authors in the top-20 for four journals, one author
in the top-20 for three journals and 12 authors in the top-20 for
two journals
• Eight of the top-20 authors in Journal of Chemical Information
and Computer Sciences are also top-20 authors in other journals
• Main degrees of overlap between
• Journal of Chemical Information and Modeling and Journal of
Computer-Aided Molecular Design
• QSAR & Combinatorial Science and SAR and QSAR in
Environmental Research
Overlap in “top-20” authors
         JCAMD   MD   QSAR   JMGM   JMM   SAR
JCICS    5       1    1      2      1     3
JCAMD            0    1      2      0     1
MD                    0      2      1     0
QSAR                         0      0     5
JMGM                                2     0
JMM                                       0
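Pairwise counts of this kind could be produced along the following lines. This is a minimal sketch under stated assumptions: each journal is represented as a list of per-article author lists, author names are already disambiguated, and the helper names and toy data are hypothetical (the toy example uses a top-2 rather than a top-20 cut-off).

```python
from collections import Counter
from itertools import combinations


def top_authors(articles, k=20):
    """articles: one list of author names per article.
    Returns the set of the k authors with the most articles."""
    counts = Counter(name for authors in articles for name in authors)
    return {name for name, _ in counts.most_common(k)}


def pairwise_overlap(top_sets):
    """Number of shared top authors for every pair of journals."""
    return {
        (a, b): len(top_sets[a] & top_sets[b])
        for a, b in combinations(sorted(top_sets), 2)
    }


# Toy data for two hypothetical journals, X and Y.
journals = {
    "X": [["Ann", "Bob"], ["Ann"], ["Bob", "Cy"], ["Ann"]],
    "Y": [["Bob"], ["Bob", "Dee"], ["Dee"], ["Ann"]],
}
top_sets = {name: top_authors(arts, k=2) for name, arts in journals.items()}
overlap = pairwise_overlap(top_sets)
print(overlap)
```

In practice the hard part is the input rather than the counting: name variants and namesakes in the source database need cleaning before the top-20 sets mean anything.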
The core literature
• A basic principle of bibliometrics is that citation
corresponds to use, i.e., frequently cited papers are the
most scientifically valuable
• NB the many exceptions…
• “Classic” citations
• Critical citations
• Self-citation and close collaborators
• Journal Impact Factor games
• …but generally a valid assumption
• Analysis of citations: the 4411 articles published in seven
chemoinformatics journals in 1998-2006 attracted a
total of 35,228 citations
Most-cited papers: I
• E. Lindahl et al. (2001), GROMACS 3.0: a package for molecular simulation and trajectory analysis, J. Mol. Model., Vol. 7, pp. 306-317 (854 citations)
• G. Schaftenaar and J.H. Noordik (2000), Molden: a pre- and post-processing program for molecular and electronic structures, J. Comput.-Aid. Mol. Design, Vol. 14, pp. 123-134 (701 citations)
• P. Willett et al. (1998), Chemical similarity searching, J. Chem. Inf. Comput. Sci., Vol. 38, pp. 983-996 (291 citations)
• A.K. Dunker et al. (2001), Intrinsically disordered protein, J. Mol. Graph. Model., Vol. 19, pp. 26-59 (239 citations)
• T.J.A. Ewing et al. (2001), DOCK 4.0: search strategies for automated molecular docking of flexible molecule databases, J. Comput.-Aid. Mol. Design, Vol. 15, pp. 411-428 (181 citations)
• A. Golbraikh and A. Tropsha (2002), Beware of q2!, J. Mol. Graph. Model., Vol. 20, pp. 269-276 (167 citations)
• M.D. Wessel et al. (1998), Prediction of human intestinal absorption of drug compounds from molecular structure, J. Chem. Inf. Comput. Sci., Vol. 38, pp. 726-735 (157 citations)
• T.I. Oprea et al. (2001), Is there a difference between leads and drugs? A historical perspective, J. Chem. Inf. Comput. Sci., Vol. 41, pp. 1308-1315 (145 citations)
Most-cited papers: II
• Certain types of article strongly
represented in the top-30 positions
• Software descriptions (9)
• Reviews (4)
• Drug-likeness (4)
• Binding energies (4)
• The first of these article-types might be thought
of as the field’s “classic” citations (cf Journal of
Chemical Information and Computer Sciences’
two most-cited articles)
Institutional productivity
• The following institutions all provide at least 1%
of the papers in the seven journals
• National Institute of Chemistry, Ljubljana; University of
Erlangen-Nurnberg; University of Sheffield; University
of Minnesota; Environmental Protection Agency;
Russian Academy of Sciences; Liverpool John
Moores University; Pennsylvania State University;
Chinese Academy of Sciences; and the University of
Cambridge
• Of top-50 institutions, Tripos (no. 27) and Pfizer
(no. 36) are for-profit organisations
National productivity: the ten countries providing the most articles in the seven journals
[Chart legend: USA, Germany, England, PR China, France, Spain, Italy, Japan, India, Switzerland, All others]
Yvonne Martin’s contribution
• Since starting her career in 1958 as a Research
Assistant at Abbott Laboratories she has
produced:
• One authored and six edited books
• 39 book chapters
• Seven patents
• 22 review articles and 60 refereed articles
• References to 73 of the articles in WOK, with
>=2 articles in
• J. Med. Chem. (28), J. Pharm. Sci. (7), Perspect.
Drug Discov. Design (7), J. Comput.-Aid. Mol. Design
(5) and J. Chem. Inf. Model. (3)
• Also three references on optical microscopy by a
(presumed) namesake
Citation analysis
• A total of 2714 citations to these 73 publications
(plus a few more to conference abstracts)
• The notable difference between the mean (37.2)
and median (13) numbers of citations means that
some of her publications have been very
influential
• Eight (and counting) have more than 100
citations
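The mean quoted above follows directly from the totals on this slide (2714 citations over 73 publications). The sketch below checks that arithmetic and then shows, on a hypothetical list of citation counts (not the actual per-paper data), how a few heavily cited items pull the mean well above the median.

```python
from statistics import mean, median

# Figures taken from the slide: 2714 citations to 73 publications.
reported_mean = round(2714 / 73, 1)
print(reported_mean)

# Hypothetical, deliberately skewed citation counts for ten papers.
counts = [0, 2, 4, 7, 13, 13, 21, 45, 160, 300]
print("mean:", mean(counts), "median:", median(counts))
```

With a distribution this skewed the median describes the typical paper, while the gap between mean and median signals the presence of a small number of highly cited ones.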
Most-cited papers
• R.D. Brown and Y.C. Martin (1996), Use of structure-activity data to compare structure-based clustering methods and descriptors for use in compound selection, J. Chem. Inf. Comput. Sci., Vol. 36, pp. 572-584 (321 citations)
• I. Muegge and Y.C. Martin (1999), A general and fast scoring function for protein-ligand interactions: A simplified potential approach, J. Med. Chem., Vol. 42, pp. 791-804 (256 citations)
• Y.C. Martin et al. (1993), Fast new approach to pharmacophore mapping and its application to dopaminergic and benzodiazepine agonists, J. Comput.-Aid. Mol. Design, Vol. 7, pp. 83-102 (223 citations)
• R.D. Brown and Y.C. Martin (1997), The information content of 2D and 3D structural descriptors relevant to ligand-receptor binding, J. Chem. Inf. Comput. Sci., Vol. 37, pp. 1-9 (154 citations)
• M.A. Abreo et al. (1996), Novel 3-pyridyl ethers with subnanomolar affinity for central neuronal nicotinic acetylcholine receptors, J. Med. Chem., Vol. 39, pp. 817-825 (142 citations)
• Y.C. Martin (1992), 3D database searching in drug design, J. Med. Chem., Vol. 35, pp. 2145-2154 (138 citations)
• Y.C. Martin (1981), A practitioner’s perspective of the role of quantitative structure-activity analysis in medicinal chemistry, J. Med. Chem., Vol. 24, pp. 229-237 (121 citations)
• Y.C. Martin et al. (2002), Do structurally similar molecules have similar biological activity?, J. Med. Chem., Vol. 45, pp. 4350-4358 (110 citations)
Brown and Martin (1996)
• The 321 citations are in 80 journals,
the most frequent (>=10 citations) being
• J. Chem. Inf. Model. (109), J. Med.
Chem. (24), J. Comput.-Aid. Mol.
Design (15), J. Mol. Graph. Model.
(15), Perspect. Drug Discov. Design
(13), Comb. Chem. High-Through.
Screen. (11) and Drug Discov. Today
(11)
• Wide range of disciplines with
singleton-sources including
• Advances in Informatics, Canadian
Journal of Physiology and
Pharmacology, Grid Computing in
the Life Sciences, IBM Journal of
Research and Development, Journal
of Immunology, Mathematics of
Operations Research, and
Technometrics
Muegge and Martin (1999)
• The 256 citations are in 76
journals, the most frequent (>=5 citations) being
• J. Med. Chem. (69), J. Chem. Inf.
Model. (19), J. Comput.-Aid. Mol.
Design (19), Proteins (13), J. Mol.
Graph. Model. (9), J. Comput.
Chem. (7), Bioorg. Med. Chem.
(7), Bioorg. Med. Chem. Lett. (6),
Curr. Med. Chem. (6), J. Mol. Biol.
(5)
• Slightly more focused range of
disciplines with singleton-sources
including
• Biomaterials, FEBS Letters,
International Journal of Quantum
Chemistry, Journal of Biomedical
Materials Research, Nucleic Acids
Research, Oncology Reports,
Organometallic Chemistry, and
Protein Simulations
Martin et al. (1993)
• The 223 citations are in 75 journals,
the most frequent (>=5 citations) being
• J. Med. Chem. (38), J. Comput.-Aid.
Mol. Design (35), J. Chem. Inf.
Model. (25), Perspect. Drug Discov.
Design (11), Acta Chim. Sinica (7),
Bioorg. Med. Chem. (5), Mol. Pharm.
(5)
• Wide range of disciplines with
singleton-sources including
• Algorithmica, Computational
Geometry, Cytochrome P450,
European Journal of Operational
Research, IEEE Transactions on
Pattern Analysis and Machine
Intelligence, Letters in Peptide
Science, Machine Learning, and
Trends in Cardiovascular Medicine
Conclusions
• Most academics are interested in their personal
citation counts and in the impact factors for their
favourite journals
• Bibliometrics has more general applications
• Subject coverage
• Key players and articles
• Relationships between journals
• Recent developments facilitate the carrying-out
of such analyses