
PubMed
Online access to searching the biomedical literature
PubMed Tutorial
• http://www.nlm.nih.gov/bsd/pubmed_tutorial/m1001.html
• This is an invaluable resource for learning to use this tool.
• You need the correct software download installed to view the interactive animations.
NCBI Entrez
• Common retrieval interface to many databases
• Controlled links between databases
• Maintained at the National Center for Biotechnology Information (NCBI) in the National Library of Medicine (NLM)
Entrez 2004
PubMed vs. Google
• PubMed
– Peer-reviewed journals
• Multiple layers of quality control
• Edited and reviewed text and grammar
– Combines automated and manual searching
– Structured links to other data sets (nucleic acid and protein sequences)
• Google
– The internet
• Free, but you get what you pay for
• Variable document structure and grammar
– Fully automated search
– Unstructured links
Entrez PubMed
• Access
http://www.ncbi.nlm.nih.gov/entrez/
• Coverage
– Biomedical research broadly
– Partial indexing of the most recent couple of months of journals
– Limited coverage of
• CS and engineering
• Physical chemistry
• Plant science
• Searchable content
– Free text search
• title, abstract, indexing, address
– Controlled vocabulary
• MeSH indexing, journal, dates, substance names, secondary indices
Relationships between the Entrez primary information resources
Searching the Biomedical Literature
• PubMed records are also available in a flat-file format with various named fields.
• Knowing the fields in the file lets you focus your search and find what you are looking for more quickly.
• For example, you can search by author and journal if you are looking for a specific person’s work and know where it was published.
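To make the flat-file idea concrete, here is a minimal Python sketch that parses records in the MEDLINE text layout and then filters them by author (AU) and journal abbreviation (TA). In that layout, a tag of up to four characters is padded to four columns and followed by "- " and the value, and continuation lines are indented six spaces. The sample record is hand-built from the example abstract later in these slides. The `parse_medline` and `search` helpers are hypothetical illustrations, not an NCBI tool.

```python
from collections import defaultdict

# Hand-built sample in MEDLINE text layout (values from the example abstract
# in these slides). Tag in columns 1-4, "- " in columns 5-6, then the value.
SAMPLE = """\
PMID- 12947093
TI  - Phosphorylation of serine S337 of NF-kappa B p50 is critical for DNA
      binding.
AU  - Hou S
AU  - Guan H
AU  - Ricciardi RP
TA  - J Biol Chem
"""


def parse_medline(text):
    """Parse MEDLINE-style text into a list of {tag: [values]} dicts."""
    records, current, last_tag = [], defaultdict(list), None
    for line in text.splitlines():
        if not line.strip():                         # blank line ends a record
            if current:
                records.append(dict(current))
            current, last_tag = defaultdict(list), None
        elif line.startswith("      ") and last_tag:  # continuation of previous field
            current[last_tag][-1] += " " + line.strip()
        else:
            last_tag = line[:4].strip()
            current[last_tag].append(line[6:].strip())
    if current:
        records.append(dict(current))
    return records


def search(records, author=None, journal=None):
    """Return PMIDs whose AU list contains `author` and whose TA matches `journal`."""
    hits = []
    for rec in records:
        if author and author not in rec.get("AU", []):
            continue
        if journal and journal not in rec.get("TA", []):
            continue
        hits.append(rec["PMID"][0])
    return hits


records = parse_medline(SAMPLE)
print(search(records, author="Guan H", journal="J Biol Chem"))  # ['12947093']
```

Because each field is kept separate, "Guan H" matches only as an author. A free-text search for the same string could also match a title or an address.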
PubMed Search results
• Search results for PubMed include the citations of the articles that your search has returned.
• Using the Limits option, you can restrict your search by:
– Author/Keyword/Title/Journal
– Language of publication
– Date
– Organism (human or animal)
– Sex (male or female)
– Type of publication (Clinical, Review, Editorial, etc.)
PubMed Entry
• The PubMed entry includes:
– Citation
– Link to the paper (when available)
– Abstract
– PMID#
– UID#
Uses and Limits of MeSH
Manually indexed
– Major topics => intelligent filtering
– Picks up concepts that are not in the title/abstract
– Indexing takes time (no MeSH headings for roughly the most recent couple of months)
– People are fallible, so some misclassification occurs
– Subheadings can be very useful, but are less reliable
Strong medical bias
– Good for biomedical searches
– Not as useful in technical areas, agriculture and plants
MeSH Vocabulary
• The MeSH controlled vocabulary is a distinctive feature of MEDLINE.
• It imposes uniformity and consistency on the indexing of biomedical literature.
• MeSH terms are arranged in a hierarchical, categorized system.
• These MeSH Tree Structures are updated annually.
MeSH Homepage
• http://www.nlm.nih.gov/mesh/meshhome.html
• MeSH helps organize searching for efficiency.
• The biomedical literature contains too many synonyms and abbreviations to search for every variant by hand.
• Humans still help sort the headings; this is called “curation.”
Structure of MeSH
Divisions
Anatomy [A]
Organisms [B]
Diseases [C]
Chemicals and Drugs [D]
Analytical, Diagnostic and Therapeutic Techniques and Equipment [E]
Psychiatry and Psychology [F]
Biological Sciences [G]
Physical Sciences [H]
Anthropology, Education, Sociology and Social Phenomena [I]
Technology and Food and Beverages [J]
Humanities [K]
Information Science [L]
Persons [M]
Health Care [N]
Geographic Locations [Z]
Hierarchy with Multiple Inheritance
Amino Acids, Peptides, and Proteins [D12]
  Proteins [D12.776]
    DNA-Binding Proteins [D12.776.260]
      NF-kappa B [D12.776.260.600]
Amino Acids, Peptides, and Proteins [D12]
  Proteins [D12.776]
    Nuclear Proteins [D12.776.660]
      NF-kappa B [D12.776.260.600]
Amino Acids, Peptides, and Proteins [D12]
  Proteins [D12.776]
    Transcription Factors [D12.776.930]
      NF-kappa B [D12.776.260.600]
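Multiple inheritance can be sketched in Python by storing each descriptor's parents. The same model shows what "exploding" a term means: the search also includes every MeSH term found below that term in the tree, as the full listing for NF-kappa B explains. This is a simplified model built from the slide's example. It uses parent links instead of tree numbers, and the `explode` and `broader_terms` helpers are hypothetical.

```python
# Fragment of the hierarchy above, stored as child -> list of parents, so a
# descriptor such as NF-kappa B can sit under several broader terms at once.
PARENTS = {
    "Proteins": ["Amino Acids, Peptides, and Proteins"],
    "DNA-Binding Proteins": ["Proteins"],
    "Nuclear Proteins": ["Proteins"],
    "Transcription Factors": ["Proteins"],
    "NF-kappa B": ["DNA-Binding Proteins", "Nuclear Proteins", "Transcription Factors"],
}


def children_map(parents):
    """Invert child -> parents into parent -> set of children."""
    kids = {}
    for child, ps in parents.items():
        for p in ps:
            kids.setdefault(p, set()).add(child)
    return kids


def explode(term, parents=PARENTS):
    """Return the term plus every descriptor below it (an 'exploded' search)."""
    kids = children_map(parents)
    result, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in result:
            result.add(t)
            stack.extend(kids.get(t, ()))
    return result


def broader_terms(term, parents=PARENTS):
    """All terms above `term`, following every parent link."""
    seen, stack = set(), list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


print(sorted(explode("Nuclear Proteins")))  # ['NF-kappa B', 'Nuclear Proteins']
print(len(broader_terms("NF-kappa B")))     # 5
```

With this model, an exploded search on Nuclear Proteins also retrieves citations indexed only under NF-kappa B. "Do Not Explode" corresponds to searching the term by itself.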
MeSH Full Listing
NF-kappa B
Ubiquitous, inducible, nuclear transcriptional activator that binds
to enhancer elements in many different cell types and is
activated by pathogenic stimuli. The NF-kappa B complex is
a heterodimer composed of two DNA-binding subunits: NF-kappa B1 and relA.
Year introduced: 1991
Subheadings:
administration and dosage, agonists, analysis, antagonists and inhibitors, biosynthesis, blood, cerebrospinal fluid, chemistry, classification, deficiency, diagnostic use, drug effects, genetics, immunology, isolation and purification, metabolism, pharmacokinetics, pharmacology, physiology, radiation effects, secretion, therapeutic use, toxicity, ultrastructure
Restrict Search to Major Topic headings only
Do Not Explode this term (i.e., do not include MeSH terms found below this term in the MeSH tree).
Entry Terms:
NF-kB
NF kB
Nuclear Factor kappa B
kappa B Enhancer Binding Protein
Immunoglobulin Enhancer-Binding Protein
Enhancer-Binding Protein, Immunoglobulin
Immunoglobulin Enhancer Binding Protein
Transcription Factor NF-kB
Factor NF-kB, Transcription
NF-kB, Transcription Factor
Transcription Factor NF kB
Ig-EBP-1
Ig EBP 1
Previous Indexing:
DNA-Binding Proteins (1987-1990)
Transcription Factors (1987-1990)
See Also:
I-kappa B
All MeSH Categories
  Chemicals and Drugs Category
    Amino Acids, Peptides, and Proteins
      Proteins
        DNA-Binding Proteins
          NF-kappa B
All MeSH Categories
  Chemicals and Drugs Category
    Amino Acids, Peptides, and Proteins
      Proteins
        Nuclear Proteins
          NF-kappa B
All MeSH Categories
  Chemicals and Drugs Category
    Amino Acids, Peptides, and Proteins
      Proteins
        Transcription Factors
          NF-kappa B
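Entry terms are what let many spellings reach a single heading. The Python sketch below resolves a subset of the NF-kappa B entry terms listed above to the descriptor. It assumes a simple normalization (lower-casing and treating hyphens and spaces alike) rather than PubMed's actual term-mapping rules. `ENTRY_TERMS`, `normalize`, and `to_mesh` are hypothetical names.

```python
import re

# A subset of the NF-kappa B entry terms listed above (illustrative table).
ENTRY_TERMS = {
    "NF-kappa B": [
        "NF-kB", "NF kB", "Nuclear Factor kappa B",
        "kappa B Enhancer Binding Protein",
        "Immunoglobulin Enhancer-Binding Protein",
        "Transcription Factor NF-kB", "Ig-EBP-1",
    ],
}


def normalize(text):
    """Lower-case and collapse runs of hyphens/whitespace to one space (assumed rule)."""
    return re.sub(r"[\s-]+", " ", text.lower()).strip()


# Lookup from each normalized entry term (and the heading itself) to the heading
LOOKUP = {normalize(t): heading
          for heading, terms in ENTRY_TERMS.items()
          for t in [heading, *terms]}


def to_mesh(user_term):
    """Return the MeSH heading for a user's term, or None if it is not an entry term."""
    return LOOKUP.get(normalize(user_term))


print(to_mesh("nf-kb"))                                    # NF-kappa B
print(to_mesh("Immunoglobulin Enhancer Binding Protein"))  # NF-kappa B
print(to_mesh("macrophage"))                               # None
```

Normalizing before lookup is why the listing can name both "NF-kB" and "NF kB" without a searcher having to guess which form an author used.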
Journals Database
Entrez -> Journals
A database of journal names and information
Entry structure:
Nature genetics.
pISSN: 1061-4036
MEDLINE Abbr: Nat Genet
ISO Abbr: Nat. Genet.
NLM ID: 9216904
• See also: ISI databases
Boolean Logic
• Boolean logic symbolically represents relationships between entities. There are three Boolean operators:
• AND
– Use the AND operator to retrieve a set in which each citation contains ALL the search terms. This operator places no condition on where the terms are found in relation to one another; the terms simply have to appear somewhere in the same citation.
• OR
– Use the OR operator to retrieve documents that contain at least one of the specified search terms.
– Use OR when you want to pull together articles on similar subjects.
• NOT
– Use the NOT operator to exclude citations containing a term from your search results.
– Be careful with NOT, as you can exclude citations you might want.
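The three operators behave like set operations on the citations each term retrieves. Here is a minimal Python sketch using made-up PMID sets for three hypothetical single-term searches:

```python
# Made-up PMID sets standing in for three single-term searches
dna = {101, 102, 103, 104}
binding = {102, 104, 105}
macrophage = {104, 106}

print(sorted(dna & binding))     # AND -> [102, 104]: citations with both terms
print(sorted(dna | macrophage))  # OR  -> [101, 102, 103, 104, 106]: either term
print(sorted(dna - macrophage))  # NOT -> [101, 102, 103]: 104 is dropped
```

The NOT line shows the caution above. Citation 104 mentions both DNA and macrophages and may be exactly what you wanted, but NOT removes it.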
Boolean Logic in PubMed
• Boolean operators (AND, OR, NOT) must be entered in uppercase letters.
• Boolean operators are processed from left to right.
• Use parentheses to nest terms together so they are processed as a unit and then incorporated into the overall strategy.
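Left-to-right processing means there is no operator precedence, so parentheses can change the result. A minimal Python sketch with hypothetical result sets for terms A, B and C; `left_to_right` is an illustrative helper, not PubMed code:

```python
# Map each Boolean operator to the matching set operation
OPS = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "NOT": lambda x, y: x - y,
}


def left_to_right(sets, ops):
    """Apply operators strictly left to right, with no precedence."""
    result = sets[0]
    for op, s in zip(ops, sets[1:]):
        result = OPS[op](result, s)
    return result


a = {1, 2}      # hypothetical results for term A
b = {3}         # term B
c = {2, 3, 4}   # term C

# A OR B AND C is processed as (A OR B) AND C
print(sorted(left_to_right([a, b, c], ["OR", "AND"])))  # [2, 3]
# A OR (B AND C): the parentheses make B AND C a unit evaluated first
print(sorted(a | (b & c)))                              # [1, 2, 3]
```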
The Boolean logic of a query is revealed by clicking Details
• Entrez attempts to intelligently parse your query
Query: dna binding transcription factor macrophage
Details => (((("dna"[MeSH Terms] OR dna[Text Word]) AND (("pharmacokinetics"[MeSH Subheading] OR "pharmacokinetics"[MeSH Terms]) OR binding[Text Word])) AND ("transcription factors"[MeSH Terms] OR transcription factor[Text Word])) AND ("macrophages"[MeSH Terms] OR macrophage[Text Word]))
• Note that “binding” was mapped to the pharmacokinetics MeSH terms rather than kept with “dna” as a phrase
• You can force a Boolean search
Query: "dna binding" AND "transcription factor" AND macrophage
Details => (("dna binding"[All Fields] AND "transcription factor"[All Fields]) AND ("macrophages"[MeSH Terms] OR macrophage[Text Word]))
Phrase Searching
• Specify with quotes
“transcription factor” vs. “transcription” “factor”
• Precomputed
– Fast
– Often mapped to synonyms and MeSH terms
– A “phrase not found” message does not necessarily mean the phrase is absent
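The difference between a quoted phrase and separate words comes down to adjacency versus mere co-occurrence. This Python sketch scans the text directly, whereas the slide notes that PubMed precomputes its phrases, so treat it only as an illustration of the matching rule. The sample sentence and the helper names are hypothetical.

```python
def tokens(text):
    """Lower-case words with periods stripped."""
    return text.lower().replace(".", " ").split()


def has_phrase(text, phrase):
    """True if the phrase's words occur consecutively, in order."""
    words, target = tokens(text), tokens(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def has_all_words(text, phrase):
    """True if every word occurs somewhere, in any order (separate terms)."""
    return set(tokens(phrase)) <= set(tokens(text))


doc = "This factor regulates transcription in macrophages."
print(has_all_words(doc, "transcription factor"))  # True
print(has_phrase(doc, "transcription factor"))     # False
```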
Text Neighboring
Related articles link (single or multiple articles)
– Term usage similarity
• Articles talking about the same thing are likely to use the same words
– Good recall (sensitivity)
– Precomputed and fast
Limitations
– Strictly algorithmic, no understanding
• “Ras activates PI3K” vs. “PI3K activates Ras”
– Historical and author biases in vocabulary
– Poor precision (specificity)
– Ranking cannot satisfy everyone
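The "no understanding" limitation shows up clearly in a bag-of-words comparison. This minimal Python sketch assumes plain term counts and cosine similarity, a common textbook choice and not necessarily PubMed's exact weighting. The two sentences from this slide score as identical even though they state opposite facts.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of bag-of-words (term count) vectors; word order is ignored."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[term] for term, count in va.items())
    norm_sq = sum(c * c for c in va.values()) * sum(c * c for c in vb.values())
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0


print(cosine("Ras activates PI3K", "PI3K activates Ras"))       # 1.0
print(round(cosine("Ras activates PI3K", "Ras binds Raf"), 3))  # 0.333
```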
Computational Issues in Statistical Text Retrieval
• Stop words
– Simple words like “the” and “and” are not worth scoring
• Term weights
– We should weight matches of rare words more heavily than
matches of common words
• Stemming and synonyms
– Need to stem verbs and plural forms
– May or may not be able to reduce to a normalized set of synonyms
• Normalizing for length
– Don’t want to exclude short articles or articles without an abstract
• All vs. all comparison is not feasible
– 10^7 articles => 10^14 comparisons, not feasible
– Compute demands of the task are growing faster than Moore’s law
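The four issues above can be combined into one small pipeline. This Python sketch is illustrative only: the stop-word list and the crude suffix-stripping `stem` are assumptions. The weighting is the standard textbook tf × log(N/df) with unit-length normalization, not necessarily the one PubMed uses. The last lines check the all-vs-all arithmetic. Counting unordered pairs, 10^7 articles give about 5 × 10^13 comparisons, the same order of magnitude as the slide's figure.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "and", "of", "in", "a", "is", "are", "by"}  # tiny illustrative list


def stem(word):
    """Crude suffix stripper (an assumption; real systems use e.g. Porter stemming)."""
    for suffix in ("ing", "ed", "es", "s", "e"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def terms(text):
    """Lower-case, drop stop words, then stem."""
    return [stem(w) for w in text.lower().split() if w not in STOP_WORDS]


def tfidf_vectors(docs):
    """Weight terms by tf * log(N / df): rare words count more than common ones.

    Each vector is scaled to unit length so long documents are not favored.
    """
    bags = [Counter(terms(d)) for d in docs]
    df = Counter(t for bag in bags for t in bag)  # documents containing each term
    n = len(docs)
    vectors = []
    for bag in bags:
        v = {t: tf * math.log(n / df[t]) for t, tf in bag.items()}
        length = math.sqrt(sum(w * w for w in v.values()))
        vectors.append({t: w / length for t, w in v.items()} if length else {})
    return vectors


def similarity(u, v):
    """Dot product of unit-length vectors (cosine similarity)."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


docs = [
    "Ras activates the PI3K kinase",
    "PI3K kinases are activated by Ras",
    "NF-kappa B binding in macrophages",
]
vecs = tfidf_vectors(docs)
print(round(similarity(vecs[0], vecs[1]), 3))  # 1.0 after stop words and stemming
print(similarity(vecs[0], vecs[2]))            # 0.0: no shared terms

# All-vs-all cost for 10**7 articles (unordered pairs)
n_articles = 10**7
print(f"{n_articles * (n_articles - 1) // 2:.2e}")  # 5.00e+13
```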
Entrez Clipboard
• The Clipboard gives you a place to collect selected
citations from one search or several searches.
• After you add citations to the Clipboard, you may
then want to use the print, save, or order buttons.
• The maximum number of items that can be placed
in the Clipboard is 500.
• Once you have added items to the Clipboard, you
can click on Clipboard from the Features bar to
view your selections.
• PubMed Central uses cookies to add your
selections to the Clipboard. To use this feature,
your web browser must be set to accept cookies.
Using Clipboard
• Add to Clipboard
– To place an item in the Clipboard, click the check box to the left of the citation.
– Select Clipboard from the Send to pull-down menu.
– Then click the Send to button. Once you have added a citation to the Clipboard, the record number color changes to green.
– You can save results collected from multiple searches
– The Clipboard will hold a maximum of 500 items.
– Clipboard items will be lost after 1 hour of inactivity.
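The two Clipboard rules, a 500-item cap and loss of contents after an hour of inactivity, can be modeled as a small data structure. This Python sketch is a hypothetical model, not how Entrez implements it. Refusing new items when the Clipboard is full is an assumption, and the injectable clock exists only so the timeout can be demonstrated.

```python
import time
from collections import OrderedDict


class Clipboard:
    """Toy clipboard with an item cap and an inactivity timeout (hypothetical model)."""

    def __init__(self, max_items=500, idle_seconds=3600, clock=time.monotonic):
        self.max_items = max_items        # 500-item cap from the slide
        self.idle_seconds = idle_seconds  # one-hour timeout from the slide
        self.clock = clock                # injectable clock, handy for testing
        self.items = OrderedDict()        # PMID -> None, in the order added
        self.last_used = clock()

    def _touch(self):
        """Clear the contents if idle too long, then record this activity."""
        now = self.clock()
        if now - self.last_used > self.idle_seconds:
            self.items.clear()
        self.last_used = now

    def add(self, pmid):
        """Add a PMID; return False if the clipboard is already full."""
        self._touch()
        if pmid in self.items:
            return True
        if len(self.items) >= self.max_items:
            return False                  # assumption: refuse rather than evict
        self.items[pmid] = None
        return True

    def contents(self):
        """List saved PMIDs in the order they were added."""
        self._touch()
        return list(self.items)


now = [0.0]
cb = Clipboard(clock=lambda: now[0])
cb.add(12947093)
print(cb.contents())  # [12947093]
now[0] += 3601        # just over an hour with no activity
print(cb.contents())  # []
```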
Saving from the Clipboard
• Citations are initially displayed in summary format, in relevancy order.
• Use Sort to change the order. You can select all or
individual citations to display or save in one of the
citation display formats.
• Select the desired format from the pull-down
menu, click Save to save your selections to a file,
or use the Print feature of your web browser to
print the citations.
• Printing from your web browser will only print the
information and citations listed on the web page.
• You may also display citations as plain text without
the sidebar menu and toolbars by clicking the
Text button.
Document Display in PubMed
• PubMed Central displays your search results in relevancy order, in batches; the default is 20 citations per page.
• The Show pull-down menu allows you to change the number of citations displayed on a single page, up to a maximum of 500 items. To do this:
• From the Summary page, click the Show pull-down menu and select a number. To display all of the citations on a single page, select a number higher than the total number of your search results.
• Click the Display button to redisplay your citations according to your selection.
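The tip about picking a number larger than your result count is just page arithmetic. Here is a small Python sketch using the 20-per-page default and the 500-item cap from this slide. `pages_needed` is a hypothetical helper.

```python
import math


def pages_needed(total_results, per_page=20, max_per_page=500):
    """Pages required to show all results at a given page size (capped at 500)."""
    per_page = min(per_page, max_per_page)
    return max(1, math.ceil(total_results / per_page))


print(pages_needed(137))                  # 7 pages at the default 20 per page
print(pages_needed(137, per_page=200))    # 1: all results on a single page
print(pages_needed(1200, per_page=1000))  # 3: the 500-item cap still applies
```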
Modifying the Display
• PubMed Central citations are initially
displayed in a summary format. You can
choose to display other formats:
– Click the Abstract, Full Text, PDF, or PubLink hyperlink for a specific citation.
– All Citations: select a display format from the Display pull-down menu and then click Display to view a different display, or Links, for all citations on the page.
– Selected Citations: click the boxes to the left of each citation to select specific citations, then select a format or Links from the Display pull-down menu and click Display.
Entrez History
• Retrieve and use your search history
– Boolean combinations of search results. To combine searches, put # before the search number, e.g., #2 AND #6.
– Filtering of previous search results
– On big searches, this helps you remember and build on your earlier terms
– Search History will be lost after eight hours of inactivity.
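One way to model #n references is to expand each one into the earlier query wrapped in parentheses, so it is processed as a unit. This Python sketch assumes that textual-expansion model; it is an illustration, not a description of PubMed's internals. `history` and `run` are hypothetical names.

```python
import re

history = []  # history[i] holds the query text of search #(i + 1)


def run(query):
    """Record a query, expanding each #n into (text of search n); return its number."""
    expanded = re.sub(r"#(\d+)",
                      lambda m: "(" + history[int(m.group(1)) - 1] + ")",
                      query)
    history.append(expanded)
    return len(history)


run("dna binding")        # search #1
run("macrophage")         # search #2
n = run("#1 AND #2")      # search #3
print(history[n - 1])     # (dna binding) AND (macrophage)
print(history[run("#3 NOT review") - 1])
# ((dna binding) AND (macrophage)) NOT review
```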
Address Fields
Find a local expert in PubMed
"Marshall University" AND (25755[ad] OR "West Virginia"[ad] NOT WVU[ad])
Need to think about all the ways people write addresses
Searching “Joan C. Edwards” fails to pick up “MUSOM.” Zip codes are very specific but catch only about 70% of articles, since authors might not list zip codes
Won’t catch co-authored articles where a remote collaborator is first author, since only the first author’s address is indexed
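The address field matches only what authors actually wrote, so it helps to generate the query from a list of variants. Here is a minimal Python sketch that builds the string to paste into PubMed. `field_query` is a hypothetical helper, and the variant list just reuses the examples on this slide.

```python
def field_query(variants, tag="ad"):
    """OR together quoted variants, each restricted to one field tag."""
    return "(" + " OR ".join(f'"{v}"[{tag}]' for v in variants) + ")"


local = field_query(["Marshall University", "Joan C. Edwards", "MUSOM", "25755"])
query = local + ' NOT "WVU"[ad]'
print(query)
# ("Marshall University"[ad] OR "Joan C. Edwards"[ad] OR "MUSOM"[ad] OR "25755"[ad]) NOT "WVU"[ad]
```

The parentheses keep the variants together as one unit, so the trailing NOT applies to the whole set under PubMed's left-to-right processing.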
Secondary Indexes
Find articles about a GenBank entry – Query: “L44140 [si]”
1: Robertson SP, Twigg SR, Sutherland-Smith AJ, Biancalana V, Gorlin RJ, Horn D, Kenwrick SJ, Kim CA, Morava E, Newbury-Ecob R, Orstavik KH, Quarrell OW, Schwartz CE, Shears DJ, Suri M, Kendrick-Jones J, Wilkie AO; OPD-spectrum Disorders Clinical Collaborative Group. Localized mutations in the gene encoding the cytoskeletal protein filamin A cause diverse malformations in humans. Nat Genet. 2003 Apr;33(4):487-91.
2: Rivella S, Palermo B, Pelizon C, Sala C, Arrigo G, Toniolo D. Selection and mapping of replication origins from a 500-kb region of the human X chromosome and their relationship to gene expression. Genomics. 1999 Nov 15;62(1):11-20.
3: Small K, Iber J, Warren ST. Emerin deletion reveals a common X-chromosome inversion mediated by inverted repeats. Nat Genet. 1997 May;16(1):96-9.
4: Chen EY, Zollo M, Mazzarella R, Ciccodicola A, Chen CN, Zuo L, Heiner C, Burough F, Repetto M, Schlessinger D, D'Urso M. Long-range sequence analysis in Xq28: thirteen known and six candidate genes in 219.4 kb of high GC DNA between the RCP/GCP and G6PD loci. Hum Mol Genet. 1996 May;5(5):659-68.
PubMed Central
• U.S. National Library of Medicine's digital
archive of life sciences journal literature
• Full text of many journal archives
– Not the most recent issues
– Limited journal collection
• Access to PMC is free and unrestricted
http://www.pubmedcentral.nih.gov/about/faq.html
Related Articles
• PubMed uses a powerful word-weighted algorithm to compare words from the Title and Abstract of each citation, as well as the MeSH headings assigned. The best matches for each citation are precalculated and stored as a set.
• You may see a few citations without the Related Articles link. These citations have not yet been processed by the algorithm, which takes several days.
Links
• The Links pull-down menu provides
access to the links between records
in the Entrez databases. All links,
except for Related Articles, are
included in the pull-down menu.
LinkOut
• LinkOut provides links from PubMed and
other Entrez databases to a wide variety
of relevant web-accessible online resources
including full-text publications.
• To see the full list of web-accessible online
resources for an item, select LinkOut from
the Links pull-down menu.
• View the Abstract or Citation display formats to see if there is an icon link to the full text.
PubMed Link
NCBI Bookshelf
• The Bookshelf is a growing collection of biomedical books that can be searched directly.
• Accessible as in-text annotations on many PubMed abstracts
– “Links” => “Books”
– Automated phrase indices hyperlinked to textbooks
OnlineBooks
Example of Books Links
J Biol Chem. 2003 Aug 28 [Epub ahead of print].
Phosphorylation of serine S337 of NF-kappa B p50 is critical for DNA binding.
Hou S, Guan H, Ricciardi RP.
Microbiology Biochemistry, University of Pennsylvania, Philadelphia, PA 19342.
It has been demonstrated that phosphorylation of the p50 subunit of NF-kappa B
is required for efficient DNA binding, yet the specific phospho-residues of p50
have not been determined. In this study, we substituted all of the serine and
conserved threonine residues in the p50 Rel homology domain and identified three
serine residues, S65, S337 and S342, as critical for DNA binding without
affecting dimerization. While substitution with negatively charged aspartic acid at
each of these positions failed to restore DNA binding, substitution with threonine,
a potential phospho-acceptor, retained DNA binding for residues 65 and 337. In
particular, S337, in a consensus site for PKA and other kinases, was shown to be
phosphorylated both in vitro and in vivo. Importantly, phosphorylation of S337 by
PKA in vitro dramatically increased DNA binding of p50. This study shows for the
first time that DNA binding ability of NF-kB p50 subunit is regulated through
phosphorylation of residue S337 and has implications for both positive and
negative control of NF-kappa B transcription.
PMID: 12947093 [PubMed - as supplied by publisher]
NCBI Handbook