PubMed - Marshall University


On-line access to searching the Biomedical Literature
PubMed
PubMed Tutorial
http://www.nlm.nih.gov/bsd/pubmed_tutorial/m1001.html
 This is a useful tool.
 You need to have the correct download
for the interactive animations.
 http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=helppubmed&part=pubmedhelp
 Another useful tool with lots of Quick
Start hints.
 I have summarized some in this lecture

NCBI Entrez



Entrez links between
databases
It is a life sciences
search engine
http://www.ncbi.nlm.nih.gov/sites/gquery
Interrelationships
PubMed vs. Google

PubMed
◦ Peer reviewed journals
 Multiple layers of quality
control
 Edited and reviewed text
and grammar
◦ Combines automated
and manual searching
◦ Structured links to
other data sets (nucleic
acid and protein
sequences)

Google
◦ The internet
 Free, but you get what you
pay for
 Variable document
structure and grammar
◦ Fully automated search
◦ Unstructured links
Entrez PubMed
Access http://www.ncbi.nlm.nih.gov/entrez/
Entrez covers biomedical research broadly.
Even recent journals are indexed.
 Lack of coverage in
 CS and engineering
 Physical chemistry
 Plant science
 Searchable content
◦ Free text search
 title, abstract, indexing, address
◦ Controlled vocabulary
 MeSH indexing, journal, dates, substance
names, secondary indices


PubMedCentral
U.S. National Library of Medicine's digital
archive of life sciences journal literature
 Full text of many journal archives

◦ Not the most recent issues
◦ Limited journal collection
Access to PMC is free and unrestricted
http://www.pubmedcentral.nih.gov/about/faq.html

PubMed Entry

The PubMed
Entry
includes:
◦ Citation
◦ Link to paper
(maybe)
◦ Abstract
◦ PMID#
◦ UID#
Searching the Biomedical Literature
The PubMed literature is also in a flat file
format with various fields.
 Knowledge of the fields in the file can allow you
to focus your search and find what you are
looking for more quickly.
 For example, you can search by author and
journal if you are looking for a specific person’s
work and know where it was published.
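A minimal sketch of a field-restricted query, assuming PubMed's [au] (author) and [ta] (journal title abbreviation) search tags. The function name and the author "Smith J" are hypothetical; "Nat Genet" is the MEDLINE abbreviation shown in the Journals Database entry of this lecture.

```python
def field_query(author=None, journal=None, free_text=None):
    """Combine optional author, journal, and free-text parts with AND."""
    parts = []
    if author:
        parts.append(f"{author}[au]")      # restrict to the author field
    if journal:
        parts.append(f'"{journal}"[ta]')   # restrict to the journal field
    if free_text:
        parts.append(free_text)            # untagged terms search all fields
    return " AND ".join(parts)


query = field_query(author="Smith J", journal="Nat Genet")
print(query)  # Smith J[au] AND "Nat Genet"[ta]
```

The resulting string can be pasted into the PubMed search box as-is.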

Uses and Limits of MeSH
Manually indexed
◦ Major topics => intelligent filtering
◦ Pick up things that are not in the title/abstract
◦ Takes time to add new headings (no MeSH headings
for most recent several months)
◦ People are fallible, so some misclassification occurs
◦ Subheadings can be very useful, but are less reliable
Strong medical focus
◦ Good for biomedical searches
◦ Not as useful in technical areas, agriculture and plants
MeSH Vocabulary
The MeSH controlled vocabulary is a distinctive
feature of MEDLINE.
 It imposes uniformity and consistency on the
indexing of biomedical literature.
 MeSH terms are arranged in a hierarchical,
categorized system.
 These MeSH Tree Structures are updated
annually.

Curating-Not in a Museum
Curating in Bioinformatics is an action taken by
someone (often a scientist trained in technical
areas) to regularize the language of Science.
 Science uses too many synonyms: words that
mean roughly the same thing.
 Curating regularizes the vocabulary so we can search
for things.
 MeSH is a form of curation.

MeSH Homepage
http://www.nlm.nih.gov/mesh/meshhome.html
 MeSH helps organize the literature so that
searching is efficient.
 It reduces the effect of synonyms and abbreviations in
the biomedical literature.
 Human help with sorting the headings is
“curation.”

Structure of MeSH
Divisions
Anatomy [A]
Organisms [B]
Diseases [C]
Chemicals and Drugs [D]
Analytical, Diagnostic and Therapeutic Techniques
and Equipment [E]
Psychiatry and Psychology [F]
Biological Sciences [G]
Physical Sciences [H]
Anthropology, Education, Sociology and Social
Phenomena [I]
Technology and Food and Beverages [J]
Humanities [K]
Information Science [L]
Persons [M]
Health Care [N]
Geographic Locations [Z]
Hierarchy with Multiple Inheritance
Amino Acids, Peptides, and Proteins [D12]
 Proteins [D12.776]
  DNA-Binding Proteins [D12.776.260]
   NF-kappa B [D12.776.260.600]
Amino Acids, Peptides, and Proteins [D12]
 Proteins [D12.776]
  Nuclear Proteins [D12.776.660]
   NF-kappa B [D12.776.660.600]
Amino Acids, Peptides, and Proteins [D12]
 Proteins [D12.776]
  Transcription Factors [D12.776.930]
   NF-kappa B [D12.776.930.600]
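The three branches above can be modelled as a graph in which one heading has several parents. This is only a sketch: the dictionary holds just the headings shown on this slide, and the function name is hypothetical.

```python
# Parent links copied from the three branches shown above.
PARENTS = {
    "NF-kappa B": ["DNA-Binding Proteins", "Nuclear Proteins", "Transcription Factors"],
    "DNA-Binding Proteins": ["Proteins"],
    "Nuclear Proteins": ["Proteins"],
    "Transcription Factors": ["Proteins"],
    "Proteins": ["Amino Acids, Peptides, and Proteins"],
    "Amino Acids, Peptides, and Proteins": [],
}


def paths_to_root(term):
    """Return every path from a top-level heading down to `term`."""
    parents = PARENTS[term]
    if not parents:
        return [[term]]
    # One path per parent: a term with three parents appears in three places.
    return [path + [term] for p in parents for path in paths_to_root(p)]


for path in paths_to_root("NF-kappa B"):
    print(" > ".join(path))
```

Because NF-kappa B has three parents, the loop prints three paths, one for each place the heading appears in the tree.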
MeSH Full Listing
NF-kappa B
Ubiquitous, inducible, nuclear transcriptional
activator that binds to enhancer elements in
many different cell types and is activated by
pathogenic stimuli. The NF-kappa B complex is a
heterodimer composed of two DNA-binding
subunits: NF-kappa B1 and relA.
Year introduced: 1991
Subheadings:
administration and dosage, agonists, analysis,
antagonists and inhibitors, biosynthesis, blood,
cerebrospinal fluid, chemistry, classification,
deficiency, diagnostic use, drug effects, genetics,
immunology, isolation and purification,
metabolism, pharmacokinetics, pharmacology,
physiology, radiation effects, secretion,
therapeutic use, toxicity, ultrastructure
Restrict Search to
Major Topic headings only
Do Not Explode this term
(i.e., do not include MeSH terms found
below this term in the MeSH tree).
Entry Terms:
NF-kB
NF kB
Nuclear Factor kappa B
kappa B Enhancer Binding Protein
Immunoglobulin Enhancer-Binding Protein
Enhancer-Binding Protein, Immunoglobulin
Immunoglobulin Enhancer Binding Protein
Transcription Factor NF-kB
Factor NF-kB, Transcription
NF-kB, Transcription Factor
Transcription Factor NF kB
Ig-EBP-1
Ig EBP 1
Previous Indexing:
DNA-Binding Proteins (1987-1990)
Transcription Factors (1987-1990)
See Also:
I-kappa B
All MeSH Categories
Chemicals and Drugs Category
Amino Acids, Peptides, and Proteins
Proteins
DNA-Binding Proteins
NF-kappa B
All MeSH Categories
Chemicals and Drugs Category
Amino Acids, Peptides, and Proteins
Proteins
Nuclear Proteins
NF-kappa B
All MeSH Categories
Chemicals and Drugs Category
Amino Acids, Peptides, and Proteins
Proteins
Transcription Factors
NF-kappa B
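The Entry Terms in the listing above are what make curation useful for searching: many spellings map to one heading. A minimal sketch of that mapping follows, using a few of the entry terms listed above. The function names are hypothetical, and this is not how NLM implements the lookup.

```python
# A subset of the Entry Terms from the NF-kappa B record above.
ENTRY_TERMS = {
    "NF-kappa B": [
        "NF-kB",
        "NF kB",
        "Nuclear Factor kappa B",
        "Transcription Factor NF-kB",
        "Ig-EBP-1",
    ],
}


def build_lookup(entry_terms):
    """Map every synonym (and the heading itself) to the preferred heading."""
    lookup = {}
    for heading, synonyms in entry_terms.items():
        for name in [heading, *synonyms]:
            lookup[name.lower()] = heading
    return lookup


LOOKUP = build_lookup(ENTRY_TERMS)


def to_heading(term):
    """Return the preferred heading for a term, or None if it is unknown."""
    return LOOKUP.get(term.strip().lower())


print(to_heading("nf-kb"))     # NF-kappa B
print(to_heading("Ig-EBP-1"))  # NF-kappa B
print(to_heading("p53"))       # None (not in this small table)
```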
Journals Database
Entrez -> Journals
A database of journal names and information
Entry structure:
Nature genetics.
pISSN: 1061-4036
MEDLINE Abbr: Nat Genet
ISO Abbr: Nat. Genet.
NLM ID: 9216904
Boolean Logic
Boolean logic symbolically represents
relationships between entities. There are three
Boolean operators:
 AND

◦ Use the AND operator to retrieve a set in which
each citation contains ALL the search terms. This
operator places no condition on where the terms are
found in relation to one another; the terms simply
have to appear somewhere in the same citation.

OR
◦ Use the OR operator to retrieve documents that
contain at least one of the specified search terms.
◦ Use OR when you want to pull together articles on
similar subjects.

NOT
◦ Use the NOT operator to exclude citations that contain
the specified term from your results.
◦ Be careful with NOT as you can exclude things you
might want
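The three operators behave like set operations on lists of citations. A small sketch with made-up citation numbers (the sets and their names are hypothetical):

```python
# Hypothetical citation IDs returned by three separate searches.
ras = {1, 2, 3, 4}
pi3k = {3, 4, 5}
review = {4, 5, 6}

print(sorted(ras & pi3k))    # AND: [3, 4]
print(sorted(ras | pi3k))    # OR:  [1, 2, 3, 4, 5]
print(sorted(ras - review))  # NOT: [1, 2, 3]
```

Note how NOT removed citation 4, which also matched the first search. That is the risk the last bullet warns about.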
Boolean Logic in PubMed
Boolean operators -- AND, OR, NOT -- must
be entered in uppercase letters.
 Boolean operators are processed from left to
right.
 Use parentheses to nest terms together so they
will be processed as a unit and then
incorporated into the overall strategy.
 Boolean Logic is revealed by clicking Details

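To see why parentheses matter when operators are processed from left to right, here is a small sketch of such an evaluator. It is an illustration only, not PubMed's parser: the result sets, term names and function names are hypothetical, and it recognizes only uppercase operators.

```python
import re

# Hypothetical result sets (citation IDs) for three search terms.
RESULTS = {"ras": {1, 2, 3}, "pi3k": {3, 4}, "akt": {2, 4, 5}}

OPS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "NOT": lambda a, b: a - b,
}


def tokenize(query):
    """Split a query into parentheses, operators, and terms."""
    return re.findall(r"\(|\)|[^\s()]+", query)


def evaluate(tokens):
    """Consume tokens left to right; a parenthesized group is one unit."""
    def operand():
        tok = tokens.pop(0)
        if tok == "(":
            value = evaluate(tokens)
            tokens.pop(0)  # drop the closing ")"
            return value
        return RESULTS[tok]

    result = operand()
    while tokens and tokens[0] != ")":
        op = OPS[tokens.pop(0)]
        result = op(result, operand())
    return result


def search(query):
    return evaluate(tokenize(query))


print(sorted(search("ras OR pi3k AND akt")))    # [2, 4]
print(sorted(search("ras OR (pi3k AND akt)")))  # [1, 2, 3, 4]
```

Without parentheses the OR is applied first and the AND then narrows the whole result. With parentheses the AND is resolved as a unit before the OR.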
Phrase Searching

Specify with quotes
“transcription factor” vs. “transcription” “factor”

Precomputed
◦ Fast
◦ Often mapped to synonyms and MeSH terms
◦ Just because you get a “phrase not found” message
does not mean it is not present
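The difference between a quoted phrase and two separate terms can be sketched as follows. This sketch scans the text directly, which is an assumption made for illustration; as noted above, PubMed works from a precomputed phrase list instead. The function names and the sample title are hypothetical.

```python
def words(text):
    return text.lower().split()


def has_all_terms(text, terms):
    """True if every term occurs somewhere in the text (like AND)."""
    present = set(words(text))
    return all(t.lower() in present for t in terms)


def has_phrase(text, phrase):
    """True if the words of the phrase occur adjacent and in order."""
    w, p = words(text), words(phrase)
    return any(w[i:i + len(p)] == p for i in range(len(w) - len(p) + 1))


title = "A factor that regulates transcription in yeast"
print(has_all_terms(title, ["transcription", "factor"]))  # True
print(has_phrase(title, "transcription factor"))          # False
```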
Text Neighboring
Related articles link (single or multiple articles)
◦ Term usage similarity
 Articles talking about the same thing are likely to
use the same words
◦ Good recall (sensitivity)
◦ Precomputed and fast
Limitations
◦ Strictly algorithmic, no understanding
 “Ras activates PI3K” vs. “PI3K activates Ras”
◦ Historical and author biases in vocabulary
◦ Poor precision (specificity)
◦ Ranking can not satisfy everyone
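The "no understanding" limitation is easy to demonstrate. If articles are compared only by which words they use, the two sentences above are identical. A minimal sketch (the function name is hypothetical):

```python
from collections import Counter


def bag_of_words(text):
    """Count words, ignoring their order."""
    return Counter(text.lower().split())


a = bag_of_words("Ras activates PI3K")
b = bag_of_words("PI3K activates Ras")
print(a == b)  # True: word order, and therefore meaning, is lost
```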
Computational Issues in Statistical Text
Retrieval
Stop words
◦ Simple words like “the” and “and” are not worth
scoring
 Term weights
◦ We should weight matches of rare words more
heavily than matches of common words
 Stemming and synonyms
◦ Need to stem verbs and plural forms
◦ May or may not be able to reduce to a
normalized set of synonyms
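The three issues above can be sketched in a few lines. Everything here is an assumption made for illustration: the stop-word list is tiny, the "stemmer" only strips a final s, and the weight is a plain inverse document frequency rather than whatever PubMed actually uses. The sample documents are made up.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "and", "of", "in", "a", "by"}  # tiny illustrative list


def stem(word):
    """Very crude stemmer: strip a trailing 's' from longer words."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def terms(text):
    """Lowercase, drop stop words, and stem what is left."""
    return [stem(w) for w in text.lower().split() if w not in STOP_WORDS]


def idf_weights(docs):
    """Weight each term by log(N / number of documents containing it)."""
    n = len(docs)
    doc_freq = Counter(t for d in docs for t in set(terms(d)))
    return {t: math.log(n / df) for t, df in doc_freq.items()}


docs = [
    "activation of the kinase by Ras",
    "the kinase binds proteins",
    "proteins and the kinase",
]
weights = idf_weights(docs)
print(weights["kinase"])                    # 0.0 (appears in every document)
print(weights["ras"] > weights["protein"])  # True (rarer words weigh more)
print("the" in weights)                     # False (stop word, never scored)
```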

Normalizing for length
◦ Don’t want to exclude short articles or
articles without an abstract

All vs. all comparison is not feasible
◦ 10^7 articles => 10^14 comparisons, not feasible
◦ Compute demands of the task are growing
faster than Moore’s law
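A short sketch of both points. Cosine similarity is used here as one common way to normalize for length; it is an assumption, not necessarily the scoring PubMed uses. The arithmetic at the end works out the all-vs-all count for 10^7 articles.

```python
import math
from collections import Counter


def cosine(text_a, text_b):
    """Cosine similarity of word counts.

    Dividing by the vector lengths removes the effect of document size.
    """
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


short = "ras signaling"
long_ = "ras signaling " * 50  # same content, fifty times longer
print(round(cosine(short, long_), 6))  # 1.0

n = 10**7
print(n * n)             # 100000000000000 ordered pairs (10^14)
print(n * (n - 1) // 2)  # 49999995000000 distinct unordered pairs
```

Even when each pair is counted only once, the total is about 5 x 10^13, which is still far too many to compute directly.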
Entrez Clipboard
The Clipboard gives you a place to
collect selected citations from one
search or several searches.
 After you add citations to the
Clipboard, you may then want to use
the print, save, or order buttons.
 The maximum number of items that
can be placed in the Clipboard is 500.

Once you have added items to the
Clipboard, you can click on Clipboard
from the Features bar to view your
selections.
 PubMed Central uses cookies to add
your selections to the Clipboard. To
use this feature, your web browser
must be set to accept cookies.

Using Clipboard

Add to Clipboard
◦ To place an item in the Clipboard, click on the check
box to the left of the citation.
◦ Select Clipboard from the Send to pull-down menu.
◦ Then click the Send to button. Once you have added a
citation to the Clipboard, the record number color
will change to green.
◦ You can save results collected from multiple searches
◦ The Clipboard will hold a maximum of 500 items.
◦ Clipboard items will be lost after 8 hours of inactivity.
Saving from the Clipboard
Citations are initially displayed in the
summary format in the relevancy order.
 Use Sort to change the order. You can
select all or individual citations to display or
save in one of the citation display formats.

Select the desired format from the pull-down menu, click Save to save your
selections to a file, or use the Print feature
of your web browser to print the citations.
Printing from your web browser will only
print the information and citations listed on
the web page.
 You may also display citations as plain text
without the sidebar menu and toolbars by
clicking the Text button.

Modifying the Display

PubMed Central citations are initially displayed
in a summary format. You can choose to display
other formats:
◦ Click on the Abstract, Full Text, PDF or PubLink
hyperlink for a specific citation.
◦ All Citations - Select a display format from the Display
pull-down menu and then click Display to view a
different display or Links for all citations on the page.
◦ Selected Citations - Click on the boxes to the left of
each author to select specific citations and then
select a format or Links from the Display pull-down
menu and click Display.
◦ You can also use the link-out function in the display
menu which can be handy.
Entrez History

Retrieve and use your search history
◦ Boolean combinations of search results. To combine
searches use # before search number, e.g., #2 AND
#6.
◦ Filtering of previous search results
◦ This can help you on big searches to remember and
build on your terms
◦ Search History will be lost after eight hours of
inactivity
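Combining numbered searches, as in #2 AND #6, can be mimicked locally to see what the History feature does. This is only a sketch: the class and its methods are hypothetical, the citation IDs are made up, and parentheses are not supported.

```python
import re


class SearchHistory:
    """Numbered result sets that can be combined with '#n' references."""

    def __init__(self):
        self.results = []

    def add(self, ids):
        """Store a result set and return its search number (starting at 1)."""
        self.results.append(set(ids))
        return len(self.results)

    def _lookup(self, ref):
        match = re.fullmatch(r"#(\d+)", ref)
        if not match:
            raise ValueError(f"expected a search number like #2, got {ref!r}")
        return self.results[int(match.group(1)) - 1]

    def combine(self, expr):
        """Evaluate e.g. '#1 AND #2' from left to right."""
        tokens = expr.split()
        result = set(self._lookup(tokens[0]))
        for op, ref in zip(tokens[1::2], tokens[2::2]):
            other = self._lookup(ref)
            if op == "AND":
                result &= other
            elif op == "OR":
                result |= other
            elif op == "NOT":
                result -= other
            else:
                raise ValueError(f"unknown operator: {op}")
        return result


history = SearchHistory()
history.add({1, 2, 3})  # search #1
history.add({2, 3, 4})  # search #2
history.add({3})        # search #3
print(sorted(history.combine("#1 AND #2")))         # [2, 3]
print(sorted(history.combine("#1 AND #2 NOT #3")))  # [2]
```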
Address Fields
Find a local expert in PubMed
“Marshall University” AND (25755 [ad] OR “West Virginia” [ad]
NOT WVU [ad])
Need to think about all the ways people write addresses.
“Joan C. Edwards” fails to pick up “MUSOM.” Zip codes are very
specific, but retrieve only about 70% of articles, since records might
not list every author’s zip code.
Won’t catch co-authored articles with a remote
collaborator
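Because addresses are written in many ways, it helps to build the query from a list of variants. A minimal sketch, assuming the [ad] tag used above; the function name is hypothetical, and the variants are the ones mentioned on this slide.

```python
def address_query(variants, exclude=()):
    """OR together address variants tagged [ad]; exclude others with NOT."""
    included = " OR ".join(f'"{v}"[ad]' for v in variants)
    query = f"({included})"
    for term in exclude:
        query += f' NOT "{term}"[ad]'
    return query


print(address_query(
    ["Marshall University", "Joan C. Edwards", "MUSOM", "25755"],
    exclude=["WVU"],
))
```

Adding a newly discovered spelling is then a one-line change to the list rather than a rewrite of the whole query.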
Related Articles
PubMed uses a powerful word-weighted
algorithm to compare words from the Title
and Abstract of each citation, as well as the
MeSH headings assigned.
 The best matches for each citation are precalculated and stored as a set.
 THIS MAKES IT FAST.
 You may see a few citations without the
Related Articles link. These citations have
not yet gone through the algorithm, which
takes several days.
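Why pre-calculation makes the link fast can be sketched as follows. The similarity score here is a plain word-overlap measure standing in for the real word-weighted algorithm, and the citation IDs, titles and function names are hypothetical.

```python
def jaccard(a, b):
    """Word-set overlap; a stand-in for the real word-weighted score."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def precompute_related(titles, top_k=2):
    """Build, once, a table of each citation's best matches."""
    related = {}
    for pmid, text in titles.items():
        scored = sorted(
            (
                (jaccard(text, other), other_id)
                for other_id, other in titles.items()
                if other_id != pmid
            ),
            reverse=True,
        )
        related[pmid] = [other_id for score, other_id in scored[:top_k] if score > 0]
    return related


# Hypothetical citation IDs and titles.
TITLES = {
    1: "NF-kappa B activation in T cells",
    2: "Inhibition of NF-kappa B activation",
    3: "Ras signaling in yeast",
}
RELATED = precompute_related(TITLES)  # the slow step, done ahead of time
print(RELATED[1])  # [2, 3]; at query time this is a plain dictionary lookup
```

A citation that has not yet been through the slow step simply has no entry in the table, which is why a few records show no Related Articles link.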

Online Books
NCBI Handbook