preparing to search and Dialog 1

LIS618 lecture 1
Thomas Krichel
2004-01-2
Structure of talk
• Recap on Boolean (aurally)
• Before online searching
• Working with DIALOG
– Overview
– Search command
– Bluesheets
– Basic and additional index
before a search I
• What is the purpose?
– brief overview
– comprehensive search
• What perspective on the topic?
– scholarly
– technical
– business
– popular
before search II
• What type of information
– Fulltext
– Bibliographic
– Directory
– Numeric
• Are there any known sources?
– Authors
– Journals
– Papers
– Conferences
before search III
• What are the language restrictions?
• What, if any, are the cost restrictions?
• How current need the data to be?
• How much of each record is required?
Concept analysis
• This is the art/science of taking the topic to
be searched and developing facets from it. Example:
“Internet filtering in Libraries”
– Internet filter
– Libraries
– Controversy, not technical issues
• We may also need to think about the aim
of the search.
Search aims
• a known needle in a known haystack
• a known needle in an unknown haystack
• an unknown needle in an unknown
haystack
• any needle in a haystack
• the sharpest needle in a haystack
• most of the sharpest needles in a haystack
Search aims
• all the needles in a haystack
• affirmation of no needles in a haystack
• things like needles in a haystack
• is there a new needle in the haystack
• where are the haystacks
• needles, haystacks, anything
types of searches
• known-item searches
• negative searches
• selective dissemination of information
• topical or subject searches
• passage searching, where the user is only
interested in part of the item
search strategies I
• Building block approach
– Do a number of elementary searches
– Combine the resulting sets with Boolean
operators
• This is what I did in the example in the
previous lecture
• Works only with the Boolean model
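The building block approach can be sketched with Python sets standing in for result sets. The tiny collection and the `search` helper are made up for illustration; they are not part of DIALOG.

```python
# Toy collection: record number -> text (invented for illustration).
RECORDS = {
    1: "internet filter software in public libraries",
    2: "library budgets and staffing",
    3: "internet filter controversy in school libraries",
    4: "technical design of an internet filter",
}


def search(term):
    """Elementary search: record numbers whose text contains the exact word."""
    return {num for num, text in RECORDS.items() if term in text.split()}


# Step 1: one elementary search per facet.
s1 = search("filter")
s2 = search("libraries")
s3 = search("controversy")

# Step 2: combine the resulting sets with Boolean operators.
both = s1 & s2       # filter AND libraries
either = s2 | s3     # libraries OR controversy
without = s1 - s3    # filter NOT controversy

print(sorted(both), sorted(either), sorted(without))
```

Each elementary result stays available, so a facet can be swapped or dropped without redoing the other searches.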
search strategies II
• Snowballing approach
– Start with a very specific query
– Think of other terms that can be added to get
more results
– Stop when a reasonable number of results
is reached.
• Not sure this really works well in practice.
search strategies III
• The successive fraction approach is the
opposite of the snowballing approach
– First search for a broad concept
– Then repeat the query, adding various
limiting factors.
• Can work well if the IR system allows you to
repeat and edit queries.
• But queries can become unwieldy.
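A minimal sketch of successive fractions: start broad, then cut the result down one limiting factor at a time. The `Record` fields, the sample records and the `narrow` helper are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    title: str
    year: int
    language: str


# Invented sample records.
RECORDS = [
    Record("Internet filtering in libraries", 2003, "en"),
    Record("Filtrado de Internet en bibliotecas", 2002, "es"),
    Record("Internet use in schools", 1998, "en"),
    Record("Library cataloguing rules", 2003, "en"),
]


def narrow(results, predicate):
    """Keep only the records that satisfy one more limiting factor."""
    return [r for r in results if predicate(r)]


# Broad concept first, then one limiting factor at a time.
step1 = narrow(RECORDS, lambda r: "internet" in r.title.lower())
step2 = narrow(step1, lambda r: r.language == "en")
step3 = narrow(step2, lambda r: r.year >= 2000)

for step in (step1, step2, step3):
    print(len(step), [r.title for r in step])
```

Keeping every intermediate step makes it easy to back off when a limit turns out to be too strict.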
search strategies IV
• Most specific facet first
– Conduct concept analysis
– Look for the most specific facet
– Search that first, add others later
• Presupposes that you have done a decent
concept analysis.
two steps in DIALOG
• step one: select databases (aka files) to
look at
• step two: perform searches on the
selected databases
• You may wonder why one does not have
one single step like in a search engine.
Discuss.
• today we concentrate on the second step
working on selected files
• We assume that we have selected a
database that we know and we look at the
search interface on the selected database.
• The database selection process is a bit
more complicated, covered next week.
• First, let us login and look at the command
prompt.
• Then we select the first database (file) with
the begin command
The begin command
• As its name suggests, this is usually the first
command.
• "begin number, number, …" selects the files
with the given numbers
• Once files are selected, they can be
searched.
• Now select ERIC: "begin 1"
• "begin 1" can be abbreviated as "b 1"
Substeps in the second step
• Identify search terms
• Use Dialog basic commands to conduct a
search
• View records online or print the results
the 's' (select) command
• Once we have issued the "begin" command to select a
database, we issue the "s" command on the
database.
• "s query_terms", where query_terms are the
query terms
• This will search the index of the selected database
in full-text view for the query issued
• It will not find any of the following: "an and by for
from of the to with". They are stop words.
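A small sketch of what the stop word list means for a query. The stop words are the nine listed above; the `searchable_terms` helper is hypothetical and only shows which words of a query are left to search on.

```python
# The nine stop words listed on the slide.
STOP_WORDS = {"an", "and", "by", "for", "from", "of", "the", "to", "with"}


def searchable_terms(query):
    """Return the words of a query that are not stop words."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]


print(searchable_terms("history of the library"))
```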
connectors
• If you want to use several keywords there
are three ways
– you can truncate search terms
– you can build an expression by putting
several keywords together. This is achieved
by DIALOG's connectors.
– you can combine several expressions with the
use of Boolean operators
• we will cover these in turn now
truncation of terms
• Open Truncation
– "select path?" retrieves all words that begin
with path: paths, pathos, pathway, pathology
• Controlled-Length Truncation
– "select path??" retrieves the root and up to
two additional characters: paths, pathos
truncation of terms II
• Embedded Character truncation can be used
for variant spellings:
– "select organi?ation" -> organization organisation
– "select fib??board" -> fiberboard fibreboard
• This truncation feature is also useful for
searching for unusual plural forms:
– "select wom?n" -> woman women
• Apparently you can also truncate on the left by
putting the ? at the beginning.
– "?mobile" -> automobile metamobile
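The truncation rules can be mimicked with regular expressions. This sketch follows the behaviour as described on these slides; it has not been checked against the DIALOG documentation, and the function names are made up.

```python
import re


def truncation_to_regex(pattern):
    """Translate a truncation pattern, as described on the slides, to a regex."""
    body = pattern.rstrip("?")
    trailing = len(pattern) - len(body)   # number of ? at the end
    core = body.lstrip("?")
    leading = len(body) - len(core)       # number of ? at the beginning

    parts = []
    if leading:
        parts.append(r"\w*")              # left truncation: any prefix
    # An embedded ? stands for exactly one character.
    parts.append("".join(r"\w" if ch == "?" else re.escape(ch) for ch in core))
    if trailing == 1:
        parts.append(r"\w*")              # open truncation: any ending
    elif trailing > 1:
        parts.append(r"\w{0,%d}" % trailing)  # controlled-length truncation
    return re.compile("".join(parts), re.IGNORECASE)


def matches(pattern, word):
    """True if the whole word fits the truncation pattern."""
    return truncation_to_regex(pattern).fullmatch(word) is not None


print(matches("path?", "pathology"), matches("path??", "pathway"))
print(matches("organi?ation", "organisation"), matches("?mobile", "automobile"))
```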
Use of connectors
• Connectors are used to put several words
together.
• One instance where this is useful is when
you have words that on their own mean
different things.
• For example "mate" is a herbal beverage
consumed in South America. Looking for
mate on the Internet retrieves a lot of
singles' pages.
example: terms related to "mate"
What other terms could be used?
– matear (drink mate)
– matero (mate drinker)
– cebar (prepare mate)
– cebador (mate preparer)
– yerba (mate herb)
– bombilla (mate straw)
connectors I
• '(W)' requires the terms to appear next to
each other in the given order, e.g.
'yerba(W)mate?' matches "yerba mate".
• '(i W)', where i is an integer, means the second
term follows the first with at most i words in
between, e.g. 'ceba?(3W)mate?' matches "cebar un
maravilloso mate" but not "cebador guapo
mirando un buen mate"
connectors II
• '(N)' requires the terms to be next to each
other in either order, e.g. 'yerba(N)mate?'
matches "yerba mate" or "mate yerba".
• '(i N)', where i is an integer, means the terms
are at most i words apart, in either order, e.g.
'ceba?(3N)mate?' matches "cebar mate"
or "matear con la cebadora".
• '(S)' searches for the occurrence of
connected terms in the same paragraph.
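The (W) and (N) connectors can be sketched as a word-distance check. The `near` function below is an illustration written for these slides, not DIALOG's implementation; it supports only open truncation with a trailing ? and leaves out '(S)'.

```python
def term_matches(term, word):
    """A trailing ? means open truncation; otherwise the word must be equal."""
    if term.endswith("?"):
        return word.startswith(term[:-1])
    return word == term


def near(first, second, text, gap=0, ordered=True):
    """True if both terms occur with at most `gap` words between them.

    ordered=True mimics (W) and (iW); ordered=False mimics (N) and (iN).
    """
    words = text.lower().split()
    for i, left in enumerate(words):
        if not term_matches(first, left):
            continue
        for j, right in enumerate(words):
            if i == j or not term_matches(second, right):
                continue
            between = abs(j - i) - 1
            if between <= gap and (j > i or not ordered):
                return True
    return False


# (W) and (3W): order matters.
print(near("yerba", "mate?", "yerba mate"))
print(near("ceba?", "mate?", "cebar un maravilloso mate", gap=3))
print(near("ceba?", "mate?", "cebador guapo mirando un buen mate", gap=3))

# (N) and (3N): either order.
print(near("yerba", "mate?", "mate yerba", ordered=False))
print(near("ceba?", "mate?", "matear con la cebadora", gap=3, ordered=False))
```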
using Boolean operators
• In your query, you can combine several
expressions with Boolean operators
• Example: "S LIBRARY(W)SCHOOL? AND
DISTANCE(W)EDUCATION"
• But I usually do not issue such fancy
queries.
executing several searches
• Several searches can be done
sequentially, and the result sets are
saved by the system.
• Each time the system assigns a set
number, Si.
• These can be combined in Boolean
expressions, e.g. 's S1 or S2 and S3'
• Remember that Boolean operations are
set-theoretic!
Boolean operators on sets
• when using Booleans, be aware that "and"
has higher precedence than "or".
• Thus:
a or b and c
is not the same as
(a or b) and c
but it is
a or (b and c)
• use parentheses when in doubt
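The precedence rule can be checked with Python sets, where & (and) also binds more tightly than | (or). The three sets are arbitrary examples.

```python
a = {1, 2}
b = {2, 3}
c = {3, 4}

default = a | b & c      # read as a or (b and c)
explicit = a | (b & c)
grouped = (a | b) & c    # a different result

print(sorted(default), sorted(explicit), sorted(grouped))
```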
DS (display sets)
• This command can be executed any time
to review the sets that have been formed
since the last B (begin) command.
• This can be useful to review your search
history.
the target command
• "target set" where set is a search result
set creates a subset of the "statistically
most relevant results" in the original set.
• I have not seen details about how this
subset is computed.
• A new result set is formed.
display: the type command
type set/format/range
• set is a result set
• format is a format
• range can be
– start – end
• start is a record number to start
• end is a record number to end
– all
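To make the three parts of the command concrete, here is a hypothetical parser for the "type set/format/range" shape shown above. It assumes the range is typed either as "start-end" or as "all"; it illustrates the syntax only and does not talk to DIALOG.

```python
def parse_type(command):
    """Split 'type set/format/range' into its three parts.

    Returns (set, format, range), where range is None for 'all'
    or a (start, end) tuple of record numbers.
    """
    verb, _, rest = command.strip().partition(" ")
    if verb.lower() != "type":
        raise ValueError("not a type command")
    result_set, fmt, rng = rest.strip().split("/")
    if rng.lower() == "all":
        return result_set, fmt, None
    start, end = (int(number) for number in rng.split("-"))
    return result_set, fmt, (start, end)


print(parse_type("type s1/5/1-10"))
print(parse_type("type s2/full/all"))
```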
standard delivery formats
• 2 – full record except abstract
• 3 or medium – citation
• 5 or long – full except full text
• 6 or free – title and dialog number
• 8 or short – title plus indexing terms
– useful to find other indexing terms
• 9 or full – everything
• KWIC or K – keywords in context
options for delivery
• I once tried to email results to myself, to no
avail
• You can save the html of the search
results in the browser.
• You can print the results within the
browser.
http://openlib.org/home/krichel
Thank you for your attention!