EMTREE and MeSH: A Thesaurus comparison

Comparison of the EMTREE and MeSH indexing
systems on their effectiveness to retrieve drug
related information
Informatio Medicata 2008, Budapest
25 September 2008
by
Arthur Eger MSc, Account Development Manager
Aim of this study
• What are the differences between
EMTREE and MeSH?
• How important are these differences
when searching for drug related
information?
• Why would you choose EMTREE or
MeSH?
Methods
• As an introduction, a short synopsis of the EMBASE
and Medline databases is given, followed by a
comparison of the EMTREE and MeSH
indexing systems to identify the advantages of
each indexing system
• Experimental data were collected from several drug
searches in both EMBASE and MEDLINE/PubMed,
using a term-to-term comparison for the 10 leading
pharmaceutical products
• The outcome of the searches was analysed to
reach a conclusion
Introduction: EMBASE
• Biomedical database produced by Elsevier
• Comprehensive coverage of biomedicine with
focus on drugs & pharmacology
• Indexed with EMTREE (with 55,666 Preferred
Terms incl. 27,058 drugs and chemicals)
• Over 11 million records from 1974-present;
currently covers 4,901 journals.
• Overlap with MEDLINE ca. 60% (at journal
level)
• Unique records especially in drugs titles +
European literature
Introduction: EMBASE.com
• 11 million EMBASE records AND 7
million Medline records (only unique
Medline records added)
• Daily updates (more than 3,000 records
added every working day)
• Linking retrieved articles to full text
• Combined drug and disease searches
with easy to select biomedical limits and
subheadings
• EMTREE facilitated searching
Introduction: Medline
• Biomedical database produced by NLM (U.S.
National Library of Medicine)
• Focus on all of biomedicine incl. nursing,
dentistry and veterinary science
• Indexed with MeSH (= Medical Subject
Headings with 24,219 total MeSH terms)
• 15 million records from 1966-present; currently
covers 5,000 journals
• Overlap with EMBASE ca. 60% (at journal level)
• Unique records especially in US titles + nursing
literature etc.
Comparing EMTREE and MeSH: what do they
have in common?
• Biomedical / life science terminologies
• Similar facet structure (EMTREE was
modelled on MeSH 18 years ago)
• Broader/narrower terms and synonyms
• Major annual updates with hundreds of
new terms
• Links to CAS registry numbers and
enzyme commission (EC) numbers
Comparing EMTREE and MeSH: what are the
differences?
| EMTREE | MeSH | Importance (for the user) |
| --- | --- | --- |
| “Natural language terminology” (e.g. myeloid leukemia) | Has many “inverted terms” (e.g. leukemia, myeloid) | Intuitive and recognisable terms for ease of use |
| Has more than 225,829 synonyms (incl. almost 142,323 drug synonyms) | Has far fewer synonyms (exact number unknown) | High probability that term used by user is in EMTREE |
| Includes all MeSH terms (many as synonyms) | - | EMTREE can easily be used by MeSH users |
| Relies upon “meaning” invested in terms by authors using them | Has many scope notes to describe how terms are used | No dependence on or need to look up scope notes |
| Larger (55,666 preferred terms) | Smaller (24,219 preferred terms) | Best chance of finding both drug and non-drug terminology |
| Drugs facet has 27,058 preferred terms | Drugs facet has only 2,711 terms | Drugs terminology is better and more up-to-date |
| New drug terms are updated earlier in EMTREE [1] | Drug terms are only updated when they become established | Better results for new drugs |
| Polyhierarchical structure with duplicated trees | Polyhierarchical structure with differences between trees | Unambiguous and context-free explosion searches |
| Check tags and document types included within EMTREE | Check tags and document types are in separate lists | All the terminology you need is in one place (i.e. EMTREE) |
| All drug and chemical information is included in EMTREE | Detailed drug information is in a separate (“supplementary”) file | All the drug information you need is in EMTREE |
MeSH advantages:
• Extensive history notes, which can be used to track
earlier literature predating the introduction of particular
terms.
• Extensive scope notes.
• Ad-hoc updates during the year (e.g. SARS in 2003).
• Extensive terminology in nursing, veterinary medicine
and dentistry.
• EMTREE has a record of updating terminology in
response to the needs of user groups in relevant areas
such as pharmacoeconomics (1996), pharmaceutical
vehicles and additives (1997), biotechnology (2001),
alternatives to animal experimentation (2002) and
nursing (2006).
EMTREE advantages:
EMTREE drugs terminology is:
– extensive, and supported by synonyms
(including many trade names).
– updated regularly and comprehensively;
746 drug preferred terms were added in
2007.
– organized in a comprehensive tree
structure, with drug terms structured from
many points of view, including structure,
activity and therapeutic use.
– useful in identifying, finding out about and
searching new drugs.
Why consider EMTREE in drug searches?
EMTREE terminology in general:
– already includes all of MeSH (i.e. all MeSH
terms are linked into EMTREE).
– can be extended further in response to
user needs, subject to agreement with
overall policy.
– Elsevier provides support to ensure that
EMTREE is optimally embedded in user
applications.
Experimental data collection
• Online study conducted 18th August
2008, on EMBASE.com and on Medline
via PubMed.
• The drugs were the top 10 leading products
based on global pharmaceutical sales
(source: Wood Mackenzie Top 100
Ethical Drugs by Sales (http://www.p-dr.com/ranking/ranking.html)). Search
period: literature publication years 2004-2008.
Experimental data collection
• Search method: A term-to-term comparison in which
each database was searched for EMTREE as well as
for MeSH preferred terms. The fields searched were
descriptors.
• For preferred terms that were not yet incorporated in
the MeSH thesaurus, the Substance name, available
from the MeSH Supplementary Concept Data, was
used instead.
• Where the MeSH term differs from the EMTREE
term, the generic name was checked at official
online sites for the drug, and EMTREE was found
to be correct according to these sites. The results
in EMBASE are very different if you search for
Omeprazole rather than Esomeprazole: for example,
there are 5,425 results for Omeprazole in EMBASE.
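The PubMed side of such a term-to-term search can be sketched roughly as below. This is only an illustration: the slides do not say how the queries were typed in, so the helper names `build_pubmed_query` and `build_count_url` are hypothetical, and the use of the NCBI E-utilities ESearch endpoint with the PubMed field tags `[mh]` (MeSH Terms), `[nm]` (Supplementary Concept / substance name) and `[dp]` (publication date) is an assumption about one way to reproduce the counts. The code only builds the query strings; it does not contact PubMed.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint. Using it here is an assumption;
# the original study was run interactively on PubMed and EMBASE.com.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# PubMed field tags: [mh] = MeSH Terms, [nm] = Supplementary Concept
# (substance name), [dp] = date of publication.
FIELD_TAGS = {"mesh": "mh", "substance": "nm"}


def build_pubmed_query(term: str, kind: str, first_year: int, last_year: int) -> str:
    """Return a PubMed query for one term limited to a range of publication years."""
    tag = FIELD_TAGS[kind]
    return f'"{term}"[{tag}] AND {first_year}:{last_year}[dp]'


def build_count_url(query: str) -> str:
    """Return an ESearch URL that asks only for the number of matching records."""
    params = {"db": "pubmed", "term": query, "rettype": "count"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Omeprazole is the MeSH heading used for Nexium; atorvastatin is a substance name.
mesh_query = build_pubmed_query("Omeprazole", "mesh", 2004, 2008)
substance_query = build_pubmed_query("atorvastatin", "substance", 2004, 2008)

print(mesh_query)
print(substance_query)
print(build_count_url(mesh_query))
```

Counts obtained this way today would not match the 2008 figures, since both databases have continued to add and re-index records.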
Experimental data collection
| Product name | EMTREE preferred term | MeSH preferred term or Substance name² | Citations in EMBASE | Citations in MEDLINE |
| --- | --- | --- | --- | --- |
| Lipitor | Atorvastatin | Atorvastatin | 7,668 | 1,654 |
| Advair | Fluticasone propionate plus Salmeterol | Fluticasone propionate plus Salmeterol | 839 | 39 |
| Plavix | Clopidogrel | Clopidogrel | 9,552 | 1,613 |
| Nexium | Esomeprazole | Omeprazole | 1,493 | 1,558 |
| Norvasc | Amlodipine besylate | Amlodipine | 439 | 621 |
| Enbrel | Etanercept | TNFR-Fc fusion | 5,197 | 1,375 |
| Zyprexa | Olanzapine | Olanzapine | 7,775 | 1,498 |
| Remicade | Infliximab | Infliximab | 8,223 | 2,461 |
| Diovan | Valsartan | Valsartan | 2,725 | 508 |
| Risperdal | Risperidone | Risperidone | 7,991 | 1,488 |
Discussion of the findings: Lipitor
• EMBASE found 7,668 records and MEDLINE found
1,654 for publication years 2004 to 2008
• Both appear to search the same term.
EMBASE does have unique
pharmacological titles, but this does not
completely explain the larger number of
records found in EMBASE compared to
MEDLINE.
Discussion of the findings: Looking at Lipitor
in EMTREE
The EMTREE preferred term is atorvastatin, so this is the term we use to search
in EMBASE. The indexers will have used this term when indexing
regardless of how the author referred to it.
Discussion of the findings: Looking at Lipitor
in EMTREE
We can see the many tree structures for Lipitor, based for example on
therapeutic use.
Discussion of the findings: Looking at Lipitor
in EMTREE
Note the large number of synonyms. A search for atorvastatin will also search for
all of the synonyms mentioned here.
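The effect of synonym mapping can be illustrated with a minimal sketch. The miniature thesaurus below is hypothetical: only the two brand-name links (Lipitor to atorvastatin, Nexium to esomeprazole) are taken from these slides, whereas a real EMTREE entry carries far more synonyms. The function names are likewise assumptions for illustration, not part of any EMBASE interface.

```python
# Hypothetical miniature thesaurus: preferred term -> list of synonyms.
# Real EMTREE entries contain many more synonyms than shown here.
THESAURUS = {
    "atorvastatin": ["lipitor"],
    "esomeprazole": ["nexium"],
}


def build_lookup(thesaurus):
    """Map every preferred term and synonym (lower-cased) to its preferred term."""
    lookup = {}
    for preferred, synonyms in thesaurus.items():
        lookup[preferred.lower()] = preferred
        for synonym in synonyms:
            lookup[synonym.lower()] = preferred
    return lookup


LOOKUP = build_lookup(THESAURUS)


def to_preferred_term(user_term):
    """Return the preferred term for whatever the user typed, or None if unknown."""
    return LOOKUP.get(user_term.strip().lower())


def expand(preferred):
    """Return the preferred term followed by all synonyms searched along with it."""
    return [preferred] + THESAURUS.get(preferred, [])


print(to_preferred_term("Lipitor"))
print(expand("atorvastatin"))
```

The more synonyms a thesaurus holds, the more likely it is that the word a user types resolves to a preferred term, which is the point the slide makes about EMTREE.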
Discussion of the findings: Looking at Lipitor
in MeSH
Compare this to a search in PubMed and you see that atorvastatin is NOT a MeSH
heading; it is a substance name. You must use heptanoic acid or pyrroles if you want to
search a MeSH heading.
The entry terms are the equivalent of synonyms, but you see there are fewer than in
EMTREE.
Discussion of the findings: Conclusion
• Atorvastatin is not a preferred term in
MeSH and so it is not mapped in the
same way as a preferred term
• A larger number of synonyms are
included in the EMBASE search and so
more records may be found
• This conclusion applies to many of the
studied drugs in the table
Discussion of the findings: Looking at
Nexium
• 1,493 records found in EMBASE and
1,558 records found in MEDLINE
This time we find more results in MEDLINE.
Why?
Discussion of the findings: Looking at
Nexium In EMTREE
The EMTREE preferred term is esomeprazole, so this is the term we
use to search in EMBASE. The indexers will have used this term
when indexing regardless of how the author referred to it.
Discussion of the findings: Looking at
Nexium In EMTREE
We can see the many tree structures for esomeprazole, based for example on
therapeutic use.
Discussion of the findings: Looking at
Nexium in MeSH
Checking the generic term for Nexium on official sites finds it to be Esomeprazole.
Here you can see that Omeprazole, as used in MEDLINE, is an exploded term for
proton pump inhibitor or pyridine derivative.
Discussion of the findings:
Omeprazole or esomeprazole?
• MeSH has chosen a term which is not
used on official sites for Nexium
• Omeprazole is an exploded term in the
same tree structure as esomeprazole
• A search for omeprazole in EMBASE
gives very different results to a search for
esomeprazole (5,425 results in EMBASE
vs. 1,558 in Medline)
Summary of the findings
| Drug name | Results in EMBASE | Results in Medline | More found in EMBASE | Found in EMBASE in % of total |
| --- | --- | --- | --- | --- |
| Lipitor | 7,668 | 1,654 | 6,014 | 82% |
| Advair | 839 | 39 | 800 | 96% |
| Plavix | 9,552 | 1,613 | 7,939 | 86% |
| Nexium | 1,493 | 1,558 | -65 | 49% |
| Norvasc | 439 | 621 | -182 | 41% |
| Enbrel | 5,197 | 1,375 | 3,822 | 79% |
| Zyprexa | 7,775 | 1,498 | 6,277 | 84% |
| Remicade | 8,223 | 2,461 | 5,762 | 77% |
| Diovan | 2,725 | 508 | 2,217 | 84% |
| Risperdal | 7,991 | 1,488 | 6,503 | 84% |
| Total | 51,902 | 12,815 | 39,087 | 80% |
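The derived columns of the summary can be recomputed from the raw citation counts. The sketch below assumes that "More found in EMBASE" is the EMBASE count minus the Medline count, and that "Found in EMBASE in % of total" is the EMBASE count divided by the sum of both counts, rounded to a whole percent; the slides do not state the formulas, but these reproduce the figures shown. The name `summarise` is chosen here for illustration.

```python
# Citation counts per product for publication years 2004-2008: (EMBASE, Medline).
COUNTS = {
    "Lipitor": (7668, 1654),
    "Advair": (839, 39),
    "Plavix": (9552, 1613),
    "Nexium": (1493, 1558),
    "Norvasc": (439, 621),
    "Enbrel": (5197, 1375),
    "Zyprexa": (7775, 1498),
    "Remicade": (8223, 2461),
    "Diovan": (2725, 508),
    "Risperdal": (7991, 1488),
}


def summarise(embase, medline):
    """Return (extra records in EMBASE, EMBASE share of both counts in whole %)."""
    difference = embase - medline
    share = round(100 * embase / (embase + medline))
    return difference, share


rows = {drug: summarise(e, m) for drug, (e, m) in COUNTS.items()}

total_embase = sum(e for e, _ in COUNTS.values())
total_medline = sum(m for _, m in COUNTS.values())
total_row = summarise(total_embase, total_medline)

for drug, (difference, share) in rows.items():
    print(f"{drug:10} {difference:7,d} {share:3d}%")
print(f"{'Total':10} {total_row[0]:7,d} {total_row[1]:3d}%")
```

Note that the percentage adds the two databases' counts without removing records indexed in both, so it describes the share of retrieved hits rather than the share of unique articles.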
Conclusion
• Of the 10 product names searched, 80%
of all retrieved records were retrieved in
EMBASE
• In 2 cases more records were
found in Medline (Nexium and Norvasc).
For Nexium the preferred term made the
difference, and it would appear that a search
for the EMTREE preferred term gives you
more accurate results.
• The same applies to Norvasc
Acknowledgements
The presenter would like to acknowledge
the work of Mrs. Ann-Marie Roche,
Senior Product Specialist in Elsevier’s
Pharma Development Group, in preparing
the searches and the basis of the slides
shown in this presentation.