Basics of Information Retrieval and Query Formulation
Bekele Negeri Duresa
Nuclear Information Specialist
Outline
Information Retrieval
The INIS Collection
Planning searches
Formulating queries
Measuring search effectiveness
Using search results and queries
Information Retrieval
Information Retrieval (IR) is finding material (usually
documents) that satisfies an information need from
within large collections (usually stored on
computers).
Today we frequently think first of “web search”, but IR also
includes:
E-mail search
Searching your laptop
Corporate knowledge bases
Structured bibliographic databases
The INIS Collection
The INIS Collection comprises:
The INIS bibliographic Database (3,842,516 records)
The INIS Nonconventional Literature (NCL) collection
(510,458 total and 369,140 publicly available)
A bibliographic database consists of data (records) whose
attributes are described in fields, and a means by which to
search these fields: a search engine.
Databases may look different on screen but the underlying
principles for searching and formulating search strategies are
common to all.
Bibliographic Databases (Types)
Bibliographic databases: provide only citation or reference
information (author(s), title, subject(s), publisher, etc.); with this
information you should be able to locate the item in the
library.
Bibliographic databases with some full-text content: these enhanced
databases include keywords and abstracts and often, but
not always, include the full text of a set of records.
Bibliographic databases with full text content: these
databases include the entire full text for all articles and
other documents indexed.
Planning your search
“Understanding the Problem is Half of the
Solution”
Define precisely the information you are
seeking
Identify the concepts that represent the
problem
Planning a search: example
Risk of medical radiation exposure?
To whom: to patients or doctors and medical technicians?
If to patients: are you concerned about exposure due to
radiodiagnostics (CT, X-ray, etc.), due to radiation therapy, or
both?
If radiotherapy: are you concerned about radionuclide
therapy or external irradiation therapy?
If about personnel: are you interested in the safety policies
of medical establishments?
Planning your search (Cont’d)
Topic: Risk of radiation exposure of medical staff in a
radiotherapy department
Concepts:
Exposure to radiation
Medical staff
Radiotherapy
Are we looking for documents
In a certain language, from a certain country, etc.
Latest publications
Only records with full-text documents, or journal citations, etc.?
INIS Database Fields
Numerical (Exact or Range search)
Year of publication (PY)
Reference Number (RN)
Free Text
Title (TI)
Source (SO)
Authors (AU)
Abstract (AB)
Controlled Vocabulary
Language (LA)
Country of Input (CO)
Descriptors (DE)
Indexer-assigned descriptors (DEI)
Computer-upposted descriptors (DEC)
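The three kinds of field behave differently at search time. The following is a minimal sketch in Python, not the INIS search engine: the `Record` class, the sample records and the helper functions are all hypothetical, and only the field codes (RN, PY, TI, LA, DE) come from the list above.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    rn: int    # Reference Number (numerical field)
    py: int    # Year of publication (numerical field)
    ti: str    # Title (free-text field)
    la: str    # Language (controlled vocabulary)
    de: set = field(default_factory=set)  # Descriptors (controlled vocabulary)

# Made-up sample records, for illustration only.
RECORDS = [
    Record(1, 2009, "Occupational exposure in radiotherapy departments",
           "English", {"RADIOTHERAPY", "OCCUPATIONAL EXPOSURE"}),
    Record(2, 2012, "Dose limits for medical personnel",
           "English", {"MEDICAL PERSONNEL", "DOSE LIMITS"}),
    Record(3, 2015, "Radiotherapie et radioprotection",
           "French", {"RADIOTHERAPY", "RADIATION PROTECTION"}),
]

def year_range(records, start, end):
    """Numerical field: range search on the publication year (inclusive)."""
    return [r for r in records if start <= r.py <= end]

def title_contains(records, text):
    """Free-text field: case-insensitive substring match on the title."""
    return [r for r in records if text.lower() in r.ti.lower()]

def has_descriptor(records, term):
    """Controlled vocabulary: the term must match a descriptor exactly."""
    return [r for r in records if term in r.de]

print([r.rn for r in year_range(RECORDS, 2010, 2015)])
print([r.rn for r in title_contains(RECORDS, "radiotherap")])
print([r.rn for r in has_descriptor(RECORDS, "RADIOTHERAPY")])
```

Note the design difference: a free-text match depends on the wording the author happened to use, while a controlled-vocabulary match depends only on the descriptor assigned to the record.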
Search Strategy
Simple search
single search term or phrase
“Oncology”, “nuclear safety”
Advanced search (combining concepts)
Boolean Operators:
OR, AND, NOT
Text Operators
any (includes any), all (includes all), exact phrase
Numeric Operators
equal, more, less, more or equal, less or equal
Truncation, Wildcard
Multilingual Search
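One way to picture the Boolean operators and truncation is as operations on sets of record numbers. The sketch below assumes a tiny hypothetical index (`INDEX`) that maps each term to the records containing it; the terms, record numbers and the `*` wildcard convention are illustrative and not taken from any particular database.

```python
import fnmatch

# Hypothetical inverted index: term -> set of record numbers containing it.
INDEX = {
    "radiotherapy": {1, 3, 4},
    "radiation": {1, 2, 3},
    "radiology": {2, 5},
    "safety": {2, 4},
}

def lookup(term):
    """Return the set of records that contain the term (empty if unknown)."""
    return INDEX.get(term, set())

def truncated(pattern):
    """Truncation/wildcard: union of the records for every matching term."""
    hits = set()
    for term, postings in INDEX.items():
        if fnmatch.fnmatchcase(term, pattern):
            hits |= postings
    return hits

both = lookup("radiotherapy") & lookup("safety")    # AND: set intersection
either = lookup("radiotherapy") | lookup("safety")  # OR: set union
without = lookup("radiation") - lookup("safety")    # NOT: set difference
radio = truncated("radio*")  # matches "radiotherapy" and "radiology"

print(both, either, without, radio)
```

AND narrows the result set, OR widens it, and NOT removes records; truncation behaves like an OR over every term that shares the stem.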
Query Syntax
Google Search Appliance
Query Formulation
Translating your search concepts into proper
search syntax
For the Topic: Risk of radiation exposure of medical staff in a radiotherapy
department
Simple to complex
MEDICAL PERSONNEL (a broader term, BT, for RADIOLOGICAL PERSONNEL)
OCCUPATIONAL EXPOSURE or OCCUPATIONAL SAFETY or RADIATION PROTECTION
RADIOTHERAPY
Try to search for individual terms and explore the database; you may identify other key concepts like
radiation doses, ALARA, dose limits…
Then combine them using Boolean operators (AND, OR, NOT)
Some databases allow you to combine searches while others allow you to combine your results
during selection of records
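The combination step for the example topic can be sketched as follows: alternative terms for one concept are joined with OR, and the concepts themselves are joined with AND. The `build_query` helper is hypothetical, and the quoting and parenthesis style is a generic convention; the exact syntax accepted by a given database may differ.

```python
def build_query(concepts):
    """OR the alternative terms within each concept, then AND the concepts."""
    groups = []
    for terms in concepts:
        quoted = ['"%s"' % term for term in terms]  # exact-phrase quoting
        group = " OR ".join(quoted)
        if len(quoted) > 1:
            group = "(" + group + ")"  # keep the OR group together
        groups.append(group)
    return " AND ".join(groups)

# Concepts and terms taken from the example topic above.
concepts = [
    ["MEDICAL PERSONNEL"],
    ["OCCUPATIONAL EXPOSURE", "OCCUPATIONAL SAFETY", "RADIATION PROTECTION"],
    ["RADIOTHERAPY"],
]

query = build_query(concepts)
print(query)
```

Terms discovered while exploring the database (for example dose limits or ALARA) can be appended to the relevant concept list and the query rebuilt.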
Measuring Search Effectiveness
Precision & Recall
Recall: the ratio of the number of relevant records retrieved to the total
number of relevant records in the database.
Precision: the ratio of the number of
relevant records retrieved to the total
number of irrelevant and relevant records
retrieved.
Precision and recall tend to be
inversely related
High recall = comprehensive retrieval
but high noise
High precision = only relevant records
but misses some good records
Source: http://www.creighton.edu/fileadmin/user/HSL/docs/ref/Searching_-_Recall_Precision.pdf
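The two ratios can be computed directly from sets of record numbers. This is a minimal sketch with made-up numbers: `relevant` stands for all relevant records in the database (rarely known in practice), while `narrow` and `broad` are two hypothetical result sets.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved record numbers."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant records that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

relevant = {1, 2, 3, 4, 5}         # assumed: all relevant records
narrow = {1, 2}                    # a tight query: few records, all relevant
broad = {1, 2, 3, 4, 6, 7, 8, 9}   # a loose query: more hits, more noise

print(precision_recall(narrow, relevant))  # (1.0, 0.4)
print(precision_recall(broad, relevant))   # (0.5, 0.8)
```

The narrow query returns nothing irrelevant but finds only two of the five relevant records; the broad query finds four of them at the cost of four irrelevant ones.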
Optimize your search strategy
Precision
Search in a particular field
Search in DEI (indexer-assigned descriptors)
Use exact phrase
Combine using “AND”
Recall
Search across fields
Combine synonyms, related terms, broad or general terms
Use “any” or “all” words
Optimize
Use your best judgment
From simple to complex
Using your Query and search results
Selecting relevant records
Select format (PDF, HTML, Excel, etc.)
Printing/saving
Email search results
Storing query
Save and run query
Subscribe to feeds
Thank you!