Search Strategies

Online Search Techniques
Universal Search Techniques
• Precision- getting results that are relevant, “on topic.”
• Recall- getting all of the relevant results, i.e. not being
too narrow in search scope
• Subject vs. Keyword
• Strategies:
– Boolean
– Nesting
– Phrase Searching
– Truncation
– Proximity
– Field Searching
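
The precision and recall definitions above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming each record has an ID and that we somehow know which records are truly relevant; the function name and the sample IDs are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved record IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant records that the search found
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the search returned 4 records, 2 of them relevant,
# while the database holds 5 relevant records in total.
retrieved = {1, 2, 3, 4}
relevant = {2, 3, 5, 6, 7}
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

A narrow search tends to raise precision and lower recall; a broad search tends to do the opposite.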
Subject vs. Keyword
• All cataloged and indexed materials have
assigned headings called “subjects”
• Subject headings describe the “aboutness” or
topic of the work, bring together all of the works
on the same topic, despite differences in text.
• Subject headings are “controlled”- they are
carefully selected from existing lists called
“controlled vocabularies”
• Subject searches only search within the assigned
subject field within a database record
Subject vs. Keyword
• Keywords are natural language
• Different people (including authors) use
different words to describe the same topic
• Keywords are not controlled
• Keyword searches typically search an entire
database record, which increasingly includes
document text.
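
The difference between the two search types can be sketched in a few lines. This is an illustration only: the records, field names, and function names below are made up, and real databases match terms in more sophisticated ways.

```python
# Hypothetical catalog records with a controlled "subjects" field.
RECORDS = [
    {"title": "Heavy kids: a growing concern",
     "abstract": "Survey of weight trends among school pupils.",
     "subjects": ["Obesity in children"]},
    {"title": "Obesity of the national budget",
     "abstract": "An essay on government spending.",
     "subjects": ["Public finance"]},
]


def subject_search(term, records):
    """Look only inside the assigned subject headings."""
    term = term.lower()
    return [r["title"] for r in records
            if any(term in s.lower() for s in r["subjects"])]


def keyword_search(term, records):
    """Look anywhere in the record: title, abstract, and subjects."""
    term = term.lower()
    return [r["title"] for r in records
            if term in " ".join([r["title"], r["abstract"], *r["subjects"]]).lower()]


print(subject_search("obesity", RECORDS))
print(keyword_search("obesity", RECORDS))
```

The subject search finds the first record even though its title never uses the word, while the keyword search also picks up the off-topic budget essay.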
Subject Searching
• Advantages
– More precise; fewer
irrelevant results
– More manageable
numbers of results
– Increases recall by
disambiguating and co-locating like terms
• Disadvantages
– Unfamiliar to users
– Controlled vocab difficult
to discover and manage
Keyword Searching
• Advantages
– “Natural” language requires no special
knowledge
– Increases recall by
searching full record
– Can lead to subject
searching
• Disadvantages
– Less precision; more
irrelevant results (due to
ambiguous terms,
retrieving terms from
irrelevant parts of record,
i.e. notes, author, etc.)
– Synonyms/ different terms
mean loss of recall unless
all terms are searched
– More weeding/ larger
numbers of results
Boolean Searching
• Boolean Operators: And, Or, Not
• Allows us to broaden our search by adding like
terms (using “or”)
• Allows us to narrow our search by searching
more than one topic at a time (using “and”)
• Allows us to eliminate unwanted results (using
“not”)
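
One way to picture the three operators is as set operations on the records that contain each term. The sketch below assumes a tiny, made-up index mapping each term to a set of record IDs.

```python
# Hypothetical index: term -> IDs of the records that contain it.
index = {
    "obesity": {1, 2, 3},
    "overweight": {3, 4},
    "child": {1, 3, 4, 5},
    "adult": {3, 5},
}

broader = index["obesity"] | index["overweight"]   # obesity OR overweight
narrower = index["obesity"] & index["child"]       # obesity AND child
filtered = index["obesity"] - index["adult"]       # obesity NOT adult

print(sorted(broader), sorted(narrower), sorted(filtered))
```

"Or" can only grow the result set, while "and" and "not" can only shrink it.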
Boolean Searching
• Keyword grid- helps organize thoughts:
• Childhood obesity

Childhood    AND   Obesity      AND   Rates
OR                 OR                 OR
Youth        AND   Overweight   AND   Statistics
OR                 OR                 OR
Adolescent   AND   BMI Index    AND   Prevalence
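
A keyword grid translates fairly directly into a search string: like terms in each column are joined with OR, and the columns are joined with AND. The following sketch assumes the database accepts parentheses and double-quoted phrases; the function name is hypothetical.

```python
# One inner list per concept; each inner list holds interchangeable terms.
grid = [
    ["childhood", "youth", "adolescent"],
    ["obesity", "overweight", "BMI index"],
    ["rates", "statistics", "prevalence"],
]


def build_query(grid):
    """Join like terms with OR inside parentheses, then AND the groups."""
    groups = []
    for synonyms in grid:
        # Multi-word terms are quoted so they are searched as a phrase.
        terms = [f'"{t}"' if " " in t else t for t in synonyms]
        groups.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(groups)


print(build_query(grid))
```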
Nesting
• When we combine several terms, we have to
group like terms together
• Search engines often “read” search strings like
a sentence
• Child* and obesity or overweight will look for:
• Child and obesity as one search, with both
terms appearing in the document, and overweight
as a separate search, not combined with child
Nesting
• To avoid confusion, we nest terms with
parentheses
• In general, interchangeable terms (those
connected with “or”) go in parentheses.
• Child* and (obesity or overweight) will look
for the word child with either obesity or
overweight- every record will have one of
these combinations.
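
The effect of the parentheses can be shown with ordinary Boolean logic. In Python, "and" binds more tightly than "or", which happens to mirror the left-to-right reading described above; the order a given database actually uses is an assumption here and may differ. The function names are hypothetical.

```python
def has_child(words):
    """True if any word starts with 'child' (child, children, childhood...)."""
    return any(w.startswith("child") for w in words)


def matches_unnested(words):
    # child* and obesity or overweight
    # Evaluated as: (child* and obesity) or overweight
    return has_child(words) and "obesity" in words or "overweight" in words


def matches_nested(words):
    # child* and (obesity or overweight)
    return has_child(words) and ("obesity" in words or "overweight" in words)


doc = "overweight adults and exercise".split()
print(matches_unnested(doc))  # the stray "overweight" alone is enough
print(matches_nested(doc))    # no child term, so the record is rejected
```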
Phrase Searching
• Most databases automatically “and” search terms
together if no Boolean operator is specified.
• With “and” both words will appear in the record
(or full text) but they may not be anywhere near
each other, may not be related.
• To specify a phrase, use quotation marks
• “information literacy” will search for those two
words as a phrase.
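
The contrast between an automatic "and" and a phrase search can be sketched as follows. This is a simplification that splits text on whitespace and ignores punctuation; the function names are hypothetical.

```python
def and_search(terms, text):
    """True if every term appears somewhere in the text."""
    words = text.lower().split()
    return all(t.lower() in words for t in terms)


def phrase_search(phrase, text):
    """True only if the words appear side by side, in the given order."""
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


text = "Literacy programs improve access to health information"
print(and_search(["information", "literacy"], text))
print(phrase_search("information literacy", text))
```

Both words occur in the sample text, so the "and" search matches, but they are far apart and unrelated, so the phrase search does not.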
Truncation
• Allows us to search word variations
automatically
• The symbol varies by database, but it is usually
an asterisk (“*”)
– childhood = child*
– obesity = obes*
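
Truncation behaves much like a "starts with" match. The sketch below converts a truncated term into a regular expression using Python's standard re module; the function name is hypothetical, and real databases may limit how many characters the symbol can stand for.

```python
import re


def truncation_to_regex(term):
    """Turn a term such as 'child*' into a pattern matching any ending."""
    stem = term.rstrip("*")
    # \b anchors the stem at the start of a word; \w* allows any ending.
    return re.compile(r"\b" + re.escape(stem) + r"\w*", re.IGNORECASE)


pattern = truncation_to_regex("child*")
print(pattern.findall("Child, children and childhood, but not grandchild"))
```

Note that "grandchild" is not matched, because the stem must appear at the start of the word.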
Wildcard
• Like truncation, but allows for spelling
variations within a word
– color, colour = colo*r
– behavior, behaviour = behavio*r
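
An internal wildcard can be sketched the same way as truncation. The example assumes the symbol stands for at most one character, which is enough for the spelling variants shown above; some databases use a different symbol or allow more characters. The function name is hypothetical.

```python
import re


def wildcard_to_regex(term):
    """Turn 'colo*r' into a pattern where * is zero or one word character."""
    parts = [re.escape(p) for p in term.split("*")]
    return re.compile(r"\b" + r"\w?".join(parts) + r"\b", re.IGNORECASE)


print(wildcard_to_regex("colo*r").findall("color colour colorful collector"))
print(wildcard_to_regex("behavio*r").findall("Behavior and behaviour"))
```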
Proximity
• Specify the distance and/or word order of search
terms
• Operators: “within,” usually written as “w,” and
“near,” usually written as “n”
– Childhood w5 obesity = the words childhood and
obesity must both appear, in that order, with no more
than 5 words between.
– Childhood n5 obesity = the words childhood and
obesity must both appear, with no more than 5 words
between, in any order.
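
The two operators can be expressed as a check on word positions. This is a minimal sketch that splits text on whitespace and ignores punctuation; the function names are hypothetical, and "no more than n words between" is taken to mean the positions differ by at most n + 1.

```python
def positions(term, words):
    """Indexes at which the term occurs in the word list."""
    return [i for i, w in enumerate(words) if w == term.lower()]


def within(a, b, n, text):
    """a w<n> b: a comes before b with at most n words between them."""
    words = text.lower().split()
    return any(0 < j - i <= n + 1
               for i in positions(a, words) for j in positions(b, words))


def near(a, b, n, text):
    """a n<n> b: at most n words between them, in either order."""
    words = text.lower().split()
    return any(0 < abs(j - i) <= n + 1
               for i in positions(a, words) for j in positions(b, words))


text = "obesity is rising fast in early childhood"
print(within("childhood", "obesity", 5, text))  # wrong order for w5
print(near("childhood", "obesity", 5, text))    # order ignored for n5
```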
Proximity
• Increases precision, as words close together
are more likely related
• Databases vary in level of specificity
– Some allow you to search terms up to 25 words
apart.
– Some allow you to search within sentence,
paragraph, or page rather than give a word count.
Field Searching
• Databases offer many additional ways to limit
our search
• Typical are author, title, subject, journal title,
date, document type, etc.
• These are called fields; they vary by database
• There may be more fields in an “advanced” or
“expert” search than are available on the basic
search screen.
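
Field searching amounts to testing a term against one named part of the record instead of the whole record. In the sketch below, the records, field names, and function name are all made up for illustration.

```python
# Hypothetical database records.
RECORDS = [
    {"author": "Smith, J.", "title": "Childhood obesity trends",
     "journal": "Pediatrics Today", "year": 2015},
    {"author": "Lee, A.", "title": "Smith charts in radio design",
     "journal": "RF Engineering", "year": 2009},
]


def field_search(records, **criteria):
    """Return titles of records where every named field contains its value."""
    def ok(record):
        return all(str(value).lower() in str(record.get(field, "")).lower()
                   for field, value in criteria.items())
    return [r["title"] for r in records if ok(r)]


print(field_search(RECORDS, author="smith"))
print(field_search(RECORDS, title="smith"))
print(field_search(RECORDS, author="smith", year=2015))
```

Searching "smith" as an author and as a title word returns different records, which is the kind of ambiguity a plain keyword search cannot resolve.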
Other considerations
• Stop words- a number of words, considered
superfluous, are automatically dropped from searches.
These include the Boolean operators (and, or, not) as
well as articles and prepositions (a, an, the, of, etc.). If
these words are essential, use quotation marks to indicate a
phrase search and they will be included.
• Thesaurus- many databases allow you to search their
controlled vocabulary through a “thesaurus” feature
• Browse index- similarly, you may be able to search or
browse other fields, such as journal names
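
The stop-word behavior described above can be sketched as a small query pre-processor. The stop-word list here is an assumed sample, not any particular database's list, and the function name is hypothetical.

```python
import re

# Assumed sample list; each database maintains its own.
STOP_WORDS = {"a", "an", "the", "of", "and", "or", "not", "in", "to"}


def searchable_terms(query):
    """Drop stop words, except inside double-quoted phrases."""
    terms = []
    # Each match is either a quoted phrase or a single unquoted word.
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query):
        if phrase:
            terms.append(phrase.lower())   # kept whole, stop words included
        elif word.lower() not in STOP_WORDS:
            terms.append(word.lower())
    return terms


print(searchable_terms('the war of the worlds'))
print(searchable_terms('"the war of the worlds"'))
```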
Other considerations
• Inconsistencies- there is no standardization or
universal control of databases. As a result, field
names, search operators, available fields, etc.
can vary, e.g. the field for the title of a journal
is variously called “journal name,” “journal
title,” and “source.”
• Time outs- some databases automatically time
out after a certain period of inactivity, and all
work will be lost
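
When results from several databases are combined, one way to cope with inconsistent field names is to map them onto a single label. The sketch below assumes only the three journal-field names mentioned above; the alias table and function name are hypothetical.

```python
# Assumed aliases for the same field across different databases.
FIELD_ALIASES = {
    "journal name": "journal",
    "journal title": "journal",
    "source": "journal",
}


def normalize_record(record):
    """Rename known field aliases to one common label."""
    return {FIELD_ALIASES.get(key.lower(), key.lower()): value
            for key, value in record.items()}


print(normalize_record({"Journal Title": "Pediatrics Today", "Author": "Smith, J."}))
```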