Basics of Information Retrieval and Query Formulation

Bekele Negeri Duresa
Nuclear Information Specialist
Outline
- Information Retrieval
- The INIS Collection
- Planning searches
- Formulating queries
- Measuring search effectiveness
- Using search results and queries
Information Retrieval
- Information Retrieval (IR) is finding material (usually documents) that satisfies an information need from within large collections (usually stored on computers).
- Today we frequently think first of “web search”, but IR also includes:
  - E-mail search
  - Searching your laptop
  - Corporate knowledge bases
  - Structured bibliographic databases
The INIS Collection
- The INIS Collection comprises:
  - The INIS Bibliographic Database (3,842,516 records)
  - The INIS Nonconventional Literature (NCL) collection (510,458 in total, of which 369,140 are publicly available)
- A bibliographic database consists of data (records) whose attributes are described in fields, together with a means of searching these fields: a search engine.
- Databases may look different on screen, but the underlying principles of searching and formulating search strategies are common to all.
Bibliographic Databases (Types)
- Bibliographic databases: provide only citation or reference information (author(s), title, subject(s), publisher...); with this information you should be able to locate the item in the library.
- Bibliographic databases with some full-text content: these enhanced databases include keywords and abstracts and often, but not always, include the full text of a set of records.
- Bibliographic databases with full-text content: these databases include the entire full text of all articles and other documents indexed.
Planning your search
- “Understanding the problem is half of the solution”
- Define precisely the information you are seeking
- Identify the concepts that represent the problem
Planning a search: example
Risk of medical radiation exposure?
- To whom: to patients, or to doctors and medical technicians?
- If to patients: are you concerned about exposure due to radiodiagnostics (CT, X-ray...), due to radiation therapy, or both?
- If radiotherapy: are you concerned about radionuclide therapy or external irradiation therapy?
- If to personnel: are you interested in the safety policies of medical establishments?
Planning your search (Cont’d)
- Topic: Risk of radiation exposure of medical staff in a radiotherapy department
- Concepts:
  - Exposure to radiation
  - Medical staff
  - Radiotherapy
- Are we looking for documents:
  - in a certain language, or from a certain country?
  - from the latest publications only?
  - only records with full-text documents, or journal citations?
INIS Database Fields
- Numerical (exact or range search)
  - Year of publication (PY)
  - Reference Number (RN)
- Free text
  - Title (TI)
  - Source (SO)
  - Authors (AU)
  - Abstract (AB)
- Controlled vocabulary
  - Language (LA)
  - Country of Input (CO)
  - Descriptors (DE)
    - Indexer-assigned descriptors (DEI)
    - Computer-upposted descriptors (DEC)
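To make the three kinds of field concrete, here is a minimal Python sketch of searching a few hypothetical records: a range search on a numerical field (PY), a word search in a free-text field (TI) and an exact match on a controlled-vocabulary field (DEI/DEC). The `Record` class, the helper functions and the sample data are illustrative assumptions, not the INIS search engine or real INIS records; only the field codes come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    # Hypothetical, simplified record using the slide's field codes.
    rn: str                                   # Reference Number (RN)
    py: int                                   # Year of publication (PY), numerical
    ti: str                                   # Title (TI), free text
    la: str                                   # Language (LA), controlled
    dei: set = field(default_factory=set)     # Indexer-assigned descriptors
    dec: set = field(default_factory=set)     # Computer-upposted descriptors


def year_range(records, start, end):
    """Numerical field: inclusive range search on PY."""
    return [r for r in records if start <= r.py <= end]


def title_has_word(records, word):
    """Free-text field: case-insensitive word match in TI."""
    word = word.lower()
    return [r for r in records if word in r.ti.lower().split()]


def has_descriptor(records, term, include_dec=True):
    """Controlled vocabulary: exact descriptor match in DEI (optionally DEC)."""
    return [r for r in records
            if term in r.dei or (include_dec and term in r.dec)]


# Hypothetical sample records (not real INIS data).
records = [
    Record("R1", 2012, "Occupational exposure in radiotherapy departments",
           "English", {"OCCUPATIONAL EXPOSURE", "RADIOTHERAPY"},
           {"MEDICAL PERSONNEL"}),
    Record("R2", 1998, "Radiation protection of patients in CT",
           "French", {"RADIATION PROTECTION"}),
]

print([r.rn for r in year_range(records, 2000, 2015)])               # ['R1']
print([r.rn for r in title_has_word(records, "Radiotherapy")])       # ['R1']
print([r.rn for r in has_descriptor(records, "MEDICAL PERSONNEL")])  # ['R1']
print([r.rn for r in has_descriptor(records, "MEDICAL PERSONNEL",
                                    include_dec=False)])             # []
```

In this toy data MEDICAL PERSONNEL appears only among the computer-upposted descriptors, so restricting the search to DEI returns nothing; the choice of field directly changes what is retrieved.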
Search Strategy
- Simple search
  - a single search term or phrase
  - e.g. “Oncology”, “nuclear safety”
- Advanced search (combining concepts)
  - Boolean operators: OR, AND, NOT
  - Text operators: any (includes any of the words), all (includes all of the words), exact phrase
  - Numeric operators: equal, greater than, less than, greater than or equal, less than or equal
  - Truncation, wildcard
  - Multilingual search
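The text operators, truncation/wildcards and Boolean operators listed above can be illustrated with a small in-memory sketch. This is a toy model under stated assumptions: the four documents are hypothetical, matching is done on lower-cased word tokens, and `*` / `?` follow Python's `fnmatch` conventions (`*` any run of characters, `?` exactly one). Real databases, INIS included, may use different wildcard symbols and matching rules.

```python
import fnmatch
import re

# Hypothetical mini-collection: record ID -> searchable text.
DOCS = {
    1: "Occupational exposure of radiotherapy staff",
    2: "Radiation protection in nuclear medicine",
    3: "Patient doses in radiodiagnostics",
    4: "Nuclear safety culture at power plants",
}


def words(text):
    """Lower-case word tokens of a text."""
    return re.findall(r"[a-z0-9]+", text.lower())


def any_of(*terms):
    """Text operator 'any': the record contains at least one of the terms."""
    wanted = {t.lower() for t in terms}
    return {i for i, t in DOCS.items() if wanted & set(words(t))}


def all_of(*terms):
    """Text operator 'all': the record contains every term."""
    wanted = {t.lower() for t in terms}
    return {i for i, t in DOCS.items() if wanted <= set(words(t))}


def phrase(text):
    """Exact phrase: the words appear consecutively and in this order."""
    target = words(text)
    n = len(target)

    def contains(toks):
        return any(toks[k:k + n] == target for k in range(len(toks) - n + 1))

    return {i for i, t in DOCS.items() if contains(words(t))}


def pattern(pat):
    """Truncation/wildcard: '*' matches any run of characters, '?' exactly one."""
    return {i for i, t in DOCS.items()
            if any(fnmatch.fnmatchcase(w, pat.lower()) for w in words(t))}


# Boolean operators map onto set operations on result sets:
# AND -> & (intersection), OR -> | (union), NOT -> - (difference).
print(pattern("radio*"))                              # {1, 3}
print(pattern("radi?tion"))                           # {2}
print(all_of("radiation", "protection"))              # {2}
print(phrase("nuclear safety"))                       # {4}
print(any_of("nuclear") - phrase("nuclear safety"))   # {2}
print(pattern("radi*") & any_of("staff", "patient"))  # {1, 3}
```

Note that `radio*` does not match "radiation", while the shorter stem `radi*` does: where you truncate a term decides how much it broadens the search.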
Query Syntax
[Figure: query syntax of the Google Search Appliance]
Query Formulation
- Translating your search concepts into proper search syntax

For the topic: Risk of radiation exposure of medical staff in a radiotherapy department

Simple to complex:
- MEDICAL PERSONNEL (the broader term, BT, of RADIOLOGICAL PERSONNEL)
- OCCUPATIONAL EXPOSURE OR OCCUPATIONAL SAFETY OR RADIATION PROTECTION
- RADIOTHERAPY

Try searching for the individual terms and explore the database; you may identify other key concepts, such as radiation doses, ALARA and dose limits.
Then combine them using Boolean operators (AND, OR, NOT).
Some databases allow you to combine searches, while others allow you to combine your results while selecting records.
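The usual way of combining concepts is to join alternative terms for the same concept with OR and then join the concepts with AND. The following Python sketch builds such a query string from the terms on this slide. The function name `build_query` and the output syntax (double quotes for phrases, parentheses around OR-groups, upper-case operators) are assumptions for illustration; check the syntax your database actually accepts.

```python
def build_query(concepts):
    """Build a Boolean query string from a list of concepts.

    Each concept is a list of alternative terms (synonyms, related or
    broader terms). Terms within a concept are joined with OR; concepts
    are joined with AND. Multi-word terms are quoted as exact phrases.
    """
    groups = []
    for terms in concepts:
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        joined = " OR ".join(quoted)
        # Parenthesise OR-groups so that AND does not bind to a single synonym.
        groups.append(f"({joined})" if len(quoted) > 1 else joined)
    return " AND ".join(groups)


query = build_query([
    ["MEDICAL PERSONNEL"],
    ["OCCUPATIONAL EXPOSURE", "OCCUPATIONAL SAFETY", "RADIATION PROTECTION"],
    ["RADIOTHERAPY"],
])
print(query)
```

This prints `"MEDICAL PERSONNEL" AND ("OCCUPATIONAL EXPOSURE" OR "OCCUPATIONAL SAFETY" OR "RADIATION PROTECTION") AND RADIOTHERAPY`. Starting from one term per concept and adding alternatives to a group is one way to follow the "simple to complex" approach.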
Measuring Search Effectiveness
- Precision & Recall
  - Recall: the ratio of the number of relevant records retrieved to the total number of relevant records in the database.
  - Precision: the ratio of the number of relevant records retrieved to the total number of records retrieved (relevant and irrelevant).
- Precision and recall tend to be inversely related:
  - High recall = comprehensive retrieval, but high noise
  - High precision = only relevant records, but some good records are missed
Source: http://www.creighton.edu/fileadmin/user/HSL/docs/ref/Searching_-_Recall_Precision.pdf
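Both definitions reduce to simple ratios over sets of record IDs: precision = (relevant and retrieved) / retrieved, recall = (relevant and retrieved) / all relevant. A minimal Python sketch follows; the record IDs are hypothetical, and in practice the full set of relevant records in a database is rarely known exactly, so recall is usually an estimate.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for collections of record IDs.

    precision = |relevant & retrieved| / |retrieved|
    recall    = |relevant & retrieved| / |relevant|
    Empty denominators are reported as 0.0.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 8 records retrieved (IDs 1-8), of which 6 (IDs 3-8)
# are relevant; the database holds 12 relevant records (IDs 3-8 and 20-25).
retrieved = range(1, 9)
relevant = list(range(3, 9)) + list(range(20, 26))
print(precision_recall(retrieved, relevant))  # (0.75, 0.5)
```

Here 6 of the 8 retrieved records are relevant (precision 0.75), but only 6 of the 12 relevant records were found (recall 0.5).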
Optimize your search strategy
- Precision
  - Search in a particular field
  - Search in DEI (indexer-assigned descriptors)
  - Use exact phrase
  - Combine using “AND”
- Recall
  - Search across fields
  - Combine synonyms, related terms, broader or more general terms
  - Use “any” or “all” words
- Optimize
  - Use your best judgment
  - Go from simple to complex
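The effect of these tactics can be seen on a toy example. The Python sketch below runs a narrow search (DEI field only, concepts combined with AND) and a broad search (title and descriptors, concepts combined with OR) over four hypothetical records whose relevance is assumed to be known in advance, then scores each result. The records, relevance judgements and function names are illustrative assumptions only; real results depend entirely on the database and topic.

```python
# Hypothetical records with assumed relevance judgements for the topic
# "radiation exposure of medical staff in radiotherapy" (toy data only).
RECORDS = [
    {"id": 1, "TI": "Staff doses in radiotherapy",
     "DEI": {"RADIOTHERAPY", "OCCUPATIONAL EXPOSURE"}, "relevant": True},
    {"id": 2, "TI": "Occupational exposure of radiotherapy technologists",
     "DEI": {"OCCUPATIONAL EXPOSURE"}, "relevant": True},
    {"id": 3, "TI": "Radiotherapy planning software",
     "DEI": {"RADIOTHERAPY", "COMPUTER CODES"}, "relevant": False},
    {"id": 4, "TI": "Occupational exposure in uranium mines",
     "DEI": {"OCCUPATIONAL EXPOSURE", "URANIUM MINES"}, "relevant": False},
]


def narrow_search(records):
    """Precision tactics: search the DEI field only and combine with AND."""
    wanted = {"RADIOTHERAPY", "OCCUPATIONAL EXPOSURE"}
    return {r["id"] for r in records if wanted <= r["DEI"]}


def broad_search(records):
    """Recall tactics: search across fields (TI and DEI) and combine with OR."""
    def found(r, term):
        return term.lower() in r["TI"].lower() or term.upper() in r["DEI"]
    return {r["id"] for r in records
            if found(r, "radiotherapy") or found(r, "occupational exposure")}


def score(hits, records):
    """Return (precision, recall) of a result set against the judgements."""
    relevant = {r["id"] for r in records if r["relevant"]}
    tp = len(hits & relevant)
    precision = tp / len(hits) if hits else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall


print(score(narrow_search(RECORDS), RECORDS))  # (1.0, 0.5)
print(score(broad_search(RECORDS), RECORDS))   # (0.5, 1.0)
```

The narrow search returns only relevant material but misses record 2; the broad search finds every relevant record at the cost of two irrelevant ones. Where to sit between these two extremes is the judgment call the slide refers to.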
Using your query and search results
- Selecting relevant records
  - Select format (PDF/HTML/Excel...)
  - Printing/saving
  - Emailing search results
- Storing queries
  - Save and rerun queries
  - Subscribe to feeds
Thank you!