No Slide Title

The UIC Arrowsmith Site
1. The UIC Arrowsmith website and the U of C Kiwi site
– two means, same end.
http://arrowsmith.psych.uic.edu and http://kiwi.uchicago.edu
(and don’t forget http://arrowsmith2.psych.uic.edu, too!)
Five modes of discovering inferences
across two or more literatures….
Mode 1. The transitive model
A – B – C and variations
• Two literatures can be connected by transitive relations
such as “affects,” “causes,” “stimulates,” “implies,”
“correlates positively with,” “is a subset of”.
• Transitive relations may not be simple: if A is correlated (r = 0.8) with
B, and B is correlated (r = 0.8) with C, then A and C are likely to be
positively correlated, but the r value might be higher or lower than 0.8
(and A and C may not be correlated at all, in fact, if the correlations A – B
and B – C occur along entirely independent dimensions).
• Relations can be “suggestive” even when not truly logical or transitive:
  • Stress depletes body stores of magnesium
  • Stress is associated with migraine
  Together, these “suggest” that Mg depletion might be one of the
  intervening steps predisposing to migraine.
• Is an inference A – B – C likely to have biological validity?
• Are there multiple, coherent B-links between A and C across
the two literatures?
• Do A – B and B – C assertions correspond to known
biological mechanisms?
• Why has the inference A – C not been stated explicitly in the
literature? Perhaps it simply has not been published yet. In
such cases, Arrowsmith analyses can help to identify hot,
promising research questions.
• Or the inferences may represent true “gaps”, overlooked
by investigators because A – B and B – C are stated in widely
different disciplines or at different time periods.
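The caveat above about chaining two correlations can be checked numerically. This is a minimal sketch, assuming all three variables are measured in the same sample, so that the correlation matrix must be positive semidefinite; the function name `correlation_bounds` is illustrative. When A – B and B – C come from separate studies, as is usual across literatures, even this bound is not guaranteed.

```python
import math
from typing import Tuple


def correlation_bounds(r_ab: float, r_bc: float) -> Tuple[float, float]:
    """Range of r(A, C) compatible with r(A, B) and r(B, C) in one sample.

    Follows from requiring the 3x3 correlation matrix to be positive
    semidefinite: r_ac = r_ab * r_bc +/- sqrt((1 - r_ab^2) * (1 - r_bc^2)).
    """
    slack = math.sqrt((1 - r_ab ** 2) * (1 - r_bc ** 2))
    return (r_ab * r_bc - slack, r_ab * r_bc + slack)


# Two strong links (r = 0.8 each) versus two moderate links (r = 0.5 each).
for r in (0.8, 0.5):
    low, high = correlation_bounds(r, r)
    print(f"r = {r}: r(A, C) between {low:.2f} and {high:.2f}")
```

Under this same-sample assumption, two links of r = 0.8 keep r(A, C) positive, while two links of r = 0.5 already allow zero or negative values, so weaker chains carry much less inferential weight.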
Mode 2 – A and C have B in common.
Here, B is a common feature or property of both A and C.
• What signs/symptoms B1,2,3,… occur in both diseases A and C?
• Or, given two different signs or symptoms A and C, in which diseases
B1,2,3,… have both symptoms been reported in case studies?
E.g. A = retinal detachment and C = aortic aneurysm.
Most of the A papers are in the ophthalmology literature, and most of the
C papers are in the cardiology literature. Arrowsmith searches can readily
construct such a list of B terms (e.g. Marfan syndrome), even when A
and C are mentioned in large, non-overlapping sets of papers.
• Conventional Medline searches can only find papers that
mention A and C together.
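The idea of a B-list can be sketched as a set intersection over title words. This is only an illustration, not the Arrowsmith implementation: the titles are made up, the stopword list is a tiny stand-in, and only single words (not phrases) are considered.

```python
import re

# Tiny illustrative stopword list; a real system would use a much larger one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def title_terms(titles):
    """Collect the lowercase words used in a list of titles, minus stopwords."""
    terms = set()
    for title in titles:
        for word in re.findall(r"[a-z][a-z0-9-]*", title.lower()):
            if word not in STOPWORDS:
                terms.add(word)
    return terms


def shared_b_terms(a_titles, c_titles):
    """Words that appear in titles of both literatures (candidate B-terms)."""
    return sorted(title_terms(a_titles) & title_terms(c_titles))


# Made-up titles standing in for the A and C literatures.
a_titles = [
    "Retinal detachment in Marfan syndrome",
    "Surgical repair of retinal detachment",
]
c_titles = [
    "Aortic aneurysm and dissection in Marfan syndrome",
    "Screening for abdominal aortic aneurysm",
]
print(shared_b_terms(a_titles, c_titles))  # ['marfan', 'syndrome']
```

No title here mentions both A and C, yet the shared words still surface the linking disease, which is the point of this mode.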

Change in terminology or context, or conceptual reformulation in some field.

Are “radial glial cells” and “neural stem cells” distinct cell types? No, they
are the same cells regarded from two different perspectives ( ). Thus, two large
bodies of literature now need to be entirely re-evaluated in the light of this
discovery.

A = “radial glial cells”, C = “neural stem cells”
The B-terms comprise a list of items that have been studied in both
literatures (albeit in different contexts).

“reelin” enhances radial glial cell phenotype in the developing cortex
and stimulates neural stem cell differentiation in vitro and in adult cortex ( ),
which leads to the inference that reelin should affect neuroblast proliferation
in developing cortex.
Mode 3 – Generalizing literatures within categories.
A known linkage A1 – B1 – C1 can be converted to a new
hypothesized linkage by replacing A1, B1 or C1 with new,
more specific or more general terms.
• Aspirin (A1) inhibits cyclooxygenase (B1) and thereby helps headaches
(C1).
• One may hypothesize that other drugs A2,3… that have the same
molecular actions may also help headaches. So, A1 – B1 – C1 can be
generalized to A2 – B1 – C1, etc.
• Hierarchical controlled vocabularies such as UMLS (Unified Medical
Language System) and MeSH can automatically generate terms of
greater generality.
• Thus, in principle, a single Arrowsmith search can be automatically
extended to comprise an envelope of different searches.
• Or, the “related articles” feature of PubMed ( ) can define a larger
set of articles related to any given literature.
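An envelope of searches can be sketched by swapping each end of the A – B – C triple for related terms in a hierarchy. The hierarchy below is a hand-written toy, not taken from MeSH or UMLS, and the function names are hypothetical.

```python
from itertools import product

# Hand-written toy hierarchy (term -> broader term); NOT real MeSH/UMLS data.
BROADER = {
    "aspirin": "cyclooxygenase inhibitors",
    "ibuprofen": "cyclooxygenase inhibitors",
    "naproxen": "cyclooxygenase inhibitors",
    "migraine": "headache",
}


def variants(term):
    """The term itself, then its broader term, siblings, and narrower terms."""
    related = [term]
    parent = BROADER.get(term)
    if parent is not None:
        related.append(parent)
        related += sorted(t for t, p in BROADER.items() if p == parent and t != term)
    related += sorted(t for t, p in BROADER.items() if p == term)
    return related


def search_envelope(a, b, c):
    """All (A, B, C) triples obtained by varying A and C while keeping B fixed."""
    return list(product(variants(a), [b], variants(c)))


envelope = search_envelope("aspirin", "cyclooxygenase", "headache")
for triple in envelope:
    print(" – ".join(triple))
```

Here B is held fixed, matching the A2 – B1 – C1 example; varying B as well would only require passing `variants(b)` to `product`.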
Mode 4 – Find a subset of one literature that is related to
another.
Prof. Jones is an expert in field (A), but needs to obtain information in an
unfamiliar field (C). The goal is not to learn everything, just the
material most likely to be relevant to A.
• One can read review articles, but if fields A and C are not often
studied in the same context (e.g., herpetology and weather
forecasting), relevant review articles may be lacking.
• This subset of C can be readily defined as the BC papers, i.e. those
papers within C having one or more B-terms in common with the A
literature.
• The B-terms can be title words and phrases or Medical Subject
Headings (MeSH terms).
• Thus, the goal of this mode is not to make inferences across
literatures, but to use one literature to define a relevant “slice” or
subset of the other.
• This mode shows all the BC titles, clustered into distinct subjects or
themes.
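Selecting the BC papers can be sketched as a filter over the C titles. This is a rough illustration with made-up titles for the herpetology and weather-forecasting example; it matches single title words only and does not attempt the clustering into themes.

```python
import re

# Tiny illustrative stopword list.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def terms(title):
    """Lowercase title words, minus stopwords."""
    words = re.findall(r"[a-z][a-z0-9-]*", title.lower())
    return {w for w in words if w not in STOPWORDS}


def bc_subset(a_titles, c_titles):
    """C titles sharing at least one term with literature A, with those terms."""
    a_terms = set().union(*(terms(t) for t in a_titles))
    hits = []
    for title in c_titles:
        shared = terms(title) & a_terms
        if shared:
            hits.append((title, sorted(shared)))
    return hits


# Made-up titles: A = herpetology, C = weather forecasting.
a_titles = [
    "Effects of temperature on desert lizard activity",
    "Frog calling behavior and rainfall",
]
c_titles = [
    "Ensemble methods for rainfall prediction",
    "Numerical models of jet stream dynamics",
    "Forecasting surface temperature extremes",
]
for title, shared in bc_subset(a_titles, c_titles):
    print(f"{title}  [B-terms: {', '.join(shared)}]")
```

Grouping the hits by their shared B-terms would be one simple starting point for the clustering step.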
Mode 5 – Literature-based Discovery in BioInformatics.
(see Vetle’s talk)
The UIC Arrowsmith website currently is set up
to link literatures via title words and phrases. Why?
• Titles of biomedical papers are informative
  • Usually state the main finding, clearly and concisely
  • Much less noise than in abstracts or full-text
• Words can bridge different fields more easily than concepts
  • “epigenetics” changed its meaning about 4 times this century
  • ambiguity/change of meaning is GOOD for linking literatures
• Can link to non-Medline papers readily (cf. MeSH or UMLS links)
Research explorations:
• Add terms from abstracts?
• Or, full-text of online papers? But need to distinguish
sections.
• Link literatures via other characteristics of papers:
  • UMLS concepts (Weeber)
  • MeSH headings (Swanson, Srinivasan, Hristovski)
  • Affiliations (Swanson)
  • Authors (see later)
  • “Related articles” to expand A and C literatures
  until they overlap.
B-term filters: shorten and rank the B-list.
• Semantic filter to assign terms to UMLS categories
• GO list for anatomical terms, gene and protein names
• Synonym metric to relate and merge terms
that are within a single semantic category
• Frequency filter (low frequency, high frequency terms)
• Recency filter
• MeSH filter
  • To assist in ranking title B-terms
  • As an alternative means of linking literatures via shared MeSH terms
• Enrichment filter to identify terms more frequent in A and/or C
relative to Medline as a whole.
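One way to read the enrichment filter is as a ratio of document rates: how often a term occurs in a literature compared with a background collection. The sketch below assumes that reading; the scoring formula, the function names, and the tiny term sets are all illustrative, not the filter actually used on the site.

```python
from collections import Counter


def document_frequency(term_sets):
    """Count in how many documents each term appears."""
    counts = Counter()
    for terms in term_sets:
        counts.update(set(terms))
    return counts


def enrichment_scores(literature, background):
    """Rate of each term in the literature divided by its background rate."""
    lit_df = document_frequency(literature)
    bg_df = document_frequency(background)
    scores = {}
    for term, count in lit_df.items():
        lit_rate = count / len(literature)
        bg_rate = bg_df[term] / len(background)
        scores[term] = lit_rate / bg_rate if bg_rate else float("inf")
    return scores


# Made-up term sets: three papers in literature A, six in the background.
literature_a = [
    {"magnesium", "migraine"},
    {"magnesium", "stress"},
    {"patients", "migraine"},
]
background = literature_a + [
    {"patients", "surgery"},
    {"patients", "diabetes"},
    {"patients", "magnesium"},
]
scores = enrichment_scores(literature_a, background)
for term in sorted(scores, key=scores.get, reverse=True):
    print(f"{term}: {scores[term]:.2f}")
```

Scores above 1 mark terms over-represented in the literature; generic words such as "patients" fall below 1 and can be dropped or ranked lower.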
The “one node” search (on Kiwi now,
soon to be on Arrowsmith)
• Making a chain of inferences.
• Special cases of disease-drug, gene-gene or protein-protein associations.
Automating aspects of Arrowsmith searching:
• Automatically generalizing an A – B – C search
as an envelope
• Automatic one-node or two-node searching
Alerting Service to identify papers that cross disciplines
• New papers within a user-specified literature that
introduce terms or concepts for the first time into that
literature.
• Examine all new papers in Medline, find those that
introduce terms or concepts into one of the disciplines
covered by the paper.
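The first kind of alert can be sketched as a set difference between a new paper's terms and everything the literature has used so far. The paper identifiers and term sets below are hypothetical, and the sketch ignores how the terms would be extracted.

```python
def alerts(existing_papers, new_papers):
    """Return (paper id, novel terms) for new papers that introduce unseen terms.

    existing_papers: iterable of term sets already in the literature.
    new_papers: list of (paper id, term set) in order of arrival.
    """
    seen = set()
    for terms in existing_papers:
        seen |= set(terms)

    flagged = []
    for paper_id, terms in new_papers:
        novel = sorted(set(terms) - seen)
        if novel:
            flagged.append((paper_id, novel))
        # Later papers are compared against this one as well.
        seen |= set(terms)
    return flagged


# Hypothetical literature and incoming papers.
existing = [{"migraine", "serotonin"}, {"migraine", "stress"}]
incoming = [
    ("paper-1", {"migraine", "magnesium"}),
    ("paper-2", {"migraine", "stress"}),
    ("paper-3", {"magnesium", "serotonin"}),
]
print(alerts(existing, incoming))  # [('paper-1', ['magnesium'])]
```

Only the first paper to bring a term into the literature is flagged; later papers reusing it are not.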