Basics
Computer
Internet
Search
Strategy
Computer Basics
IP address: Internet Protocol Address
An identifier for a computer or device on a
network
The format of an IP address is four numbers
separated by periods. Each number can be zero
to 255. For example, 134.140.112.9
Can be static (fixed) or dynamic (assigned on the fly)
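The four-number format described above can be checked with Python's standard ipaddress module. This is only a sketch: the helper names is_valid_ipv4 and octets are made up for illustration, and only the example address comes from the slide.

```python
import ipaddress


def is_valid_ipv4(text):
    """Return True if text is four dot-separated numbers, each 0 to 255."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        # Raised for out-of-range numbers, missing parts, or stray characters
        return False


def octets(text):
    """Split an address into its four numbers (assumes it is already valid)."""
    return [int(part) for part in text.split(".")]


print(is_valid_ipv4("134.140.112.9"))    # the slide's example address
print(is_valid_ipv4("134.140.112.256"))  # 256 is outside the 0 to 255 range
print(octets("134.140.112.9"))
```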
Internet Basics
The Internet vs. The World Wide Web
The Internet is not synonymous with
the World Wide Web
Internet Basics
The Internet is a network of computer networks
Computers connected so they can communicate
with any other computer also connected, or
networked, to the internet
Used for communicating many kinds of
information using protocols including SMTP for
email and HTTP for web pages
The World Wide Web is only one part of the
Internet, which also includes email, newsgroups,
and instant messaging
Internet Basics
The World Wide Web is a “web” of documents,
called web pages, connected via hyperlinks
One way of communicating on the internet
Uses the HTTP protocol
Accessed via browsers, such as Internet
Explorer or Netscape
Web pages can include graphics, audio, text,
and video.
Internet Basics
Web pages vs. websites
A web page is a document on the World
Wide Web
A website is a collection of web pages
including a home page, the main page on
the site and first to be viewed, plus
additional, related, hyperlinked pages
Internet Basics
URL: Uniform Resource Locator
The unique address of a web page
Can be persistent or dynamic
Format:
http://web.simmons.edu/~krajewsk
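To see what the parts of a URL are, the example address can be split with Python's standard urllib.parse module. The labels in the comments (protocol, host, location) are informal descriptions added here, not terms from the slide.

```python
from urllib.parse import urlparse

# The example URL from the slide
parts = urlparse("http://web.simmons.edu/~krajewsk")

print(parts.scheme)  # the protocol used to fetch the page
print(parts.netloc)  # the host computer that serves the page
print(parts.path)    # the location of the page on that host
```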
Internet Basics
A hyperlink is an element in a web page
that links to another place in the same
page or to an entirely different web page
Click on the hyperlink to access the linked
web page
Internet Basics
A domain is a group of computers sharing
a part of an IP address
Consists of a range of IP addresses
Computers in a domain share the same base URL
www.simmons.edu/libraries and
www.simmons.edu/gslis are both part of the
simmons.edu domain
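A rough way to test whether an address belongs to a domain is to compare the host name with the domain name. This is a sketch only: in_domain is a hypothetical helper, and it looks at names alone, not at IP address ranges.

```python
from urllib.parse import urlparse


def in_domain(url, domain):
    """Return True if the URL's host is the domain or a host inside it."""
    # urlparse only recognizes the host when a scheme is present, so add one
    if "://" not in url:
        url = "http://" + url
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)


print(in_domain("www.simmons.edu/libraries", "simmons.edu"))
print(in_domain("www.simmons.edu/gslis", "simmons.edu"))
print(in_domain("www.example.org/simmons.edu", "simmons.edu"))
```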
Cache
Copies of frequently used data stored on a
local hard drive
Allows information to be accessed more
quickly because it does not have to be
retrieved from the internet each time it is
called
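The idea behind a cache can be sketched in a few lines. Everything here is illustrative: CachedFetcher and fake_fetch are hypothetical names, the "slow" retrieval is faked, and the copies are kept in memory rather than on a hard drive.

```python
class CachedFetcher:
    """Keep a local copy of each item so repeat requests skip the slow fetch."""

    def __init__(self, fetch):
        self._fetch = fetch      # the slow way to get the data
        self._cache = {}         # local copies, keyed by address
        self.remote_calls = 0    # how many times the slow fetch was used

    def get(self, key):
        if key not in self._cache:
            self.remote_calls += 1
            self._cache[key] = self._fetch(key)
        return self._cache[key]


def fake_fetch(url):
    # Stand-in for retrieving a page from the internet
    return "contents of " + url


fetcher = CachedFetcher(fake_fetch)
first = fetcher.get("http://web.simmons.edu/~krajewsk")
second = fetcher.get("http://web.simmons.edu/~krajewsk")  # served from the cache
print(fetcher.remote_calls)
```

The second request returns the same data without calling the slow fetch again.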
Browser
A Web browser is a software application
used to locate and display Web pages.
Most browsers can display text, graphics,
audio, and video
Internet Basics
On the web vs. Access via the web
On the web: online, free, available to
everyone
Via the web: online, but in a special,
restricted database, requiring a login and/or
subscription fee
Internet Basics
A search engine is a program that
searches the web for specified keywords
and returns a list of the web pages where
the keywords were found
Search Basics
Searching is the process of querying a
database—a library catalog, periodical
index, or search engine—to find relevant
information
Search Basics
Each item in a database is called a RECORD
All records are INDEXED by specialists or
computers that pull out key pieces of information
Each key piece of information indexed belongs
in a specific FIELD, which is generally
searchable (author, title, or subject-specific fields)
HITS are the number of records in the entire
database that match your search terms
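The terms record, index, field, and hits can be illustrated with a toy database. The records, field names, and the helpers build_index and search below are all invented for this sketch; real catalogs and search engines are far more elaborate.

```python
# Each dictionary is one RECORD; each key is a FIELD
records = [
    {"author": "Smith", "title": "Training Dogs", "subject": "pets"},
    {"author": "Jones", "title": "Cats at Home", "subject": "pets"},
    {"author": "Smith", "title": "Garden Birds", "subject": "wildlife"},
]


def build_index(records, field):
    """INDEX one field: map each word to the ids of the records containing it."""
    index = {}
    for record_id, record in enumerate(records):
        for word in record[field].lower().split():
            index.setdefault(word, set()).add(record_id)
    return index


def search(index, term):
    """Return the ids of matching records; the HITS are these records."""
    return sorted(index.get(term.lower(), set()))


author_index = build_index(records, "author")
title_index = build_index(records, "title")

hits = search(author_index, "Smith")
print(len(hits), "hits:", hits)
```

Searching the title index for "Smith" finds nothing, because that word was indexed only in the author field.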
Search Basics
Syntax – The “language” of the database
you are searching
HOW you translate your information need
into a query
Search Basics
Boolean operators are connectors used to define the
relationship between or among your search terms:
OR – Either Term A or Term B must be present on a
web page for it to be included in your results list
AND – Both Term A and Term B must be present for the
web page to appear on the results list
NOT – Term A must be present and Term B must not be
present for the web page to appear on the results list
Dogs OR Cats: gets both, might be overlap
Dogs AND Cats: only gets records where both appear
Dogs NOT Cats: eliminates records where Cats appears
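The three Boolean operators behave like set operations. In this sketch the numbers are made-up record ids: dogs holds the records that mention Dogs and cats holds the records that mention Cats.

```python
dogs = {1, 2, 3, 4}  # hypothetical ids of records containing "Dogs"
cats = {3, 4, 5}     # hypothetical ids of records containing "Cats"

dogs_or_cats = dogs | cats   # OR: union, records with either term
dogs_and_cats = dogs & cats  # AND: intersection, records with both terms
dogs_not_cats = dogs - cats  # NOT: difference, Dogs records without Cats

print(sorted(dogs_or_cats))
print(sorted(dogs_and_cats))
print(sorted(dogs_not_cats))
```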
Parentheses: Nesting
(Dog? or Pupp?) and (Cat? or Kitten?)
Dog or Dogs or Puppy or Puppies
Cat or Cats or Kitten or Kittens
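The nested query can be mimicked in code, where parentheses group the OR sets before the AND is applied. This sketch assumes the ? symbol means "any ending", which is modeled as a simple prefix match; the sample pages and the helpers has_prefix and matches are invented.

```python
pages = {
    "a": "my dog chased the kitten",
    "b": "puppies and dogs",
    "c": "cats sleeping",
    "d": "a puppy met two cats",
}


def has_prefix(text, prefix):
    """True if any word in the text starts with the given root."""
    return any(word.startswith(prefix) for word in text.lower().split())


def matches(text):
    # (Dog? or Pupp?) and (Cat? or Kitten?)
    dog_side = has_prefix(text, "dog") or has_prefix(text, "pupp")
    cat_side = has_prefix(text, "cat") or has_prefix(text, "kitten")
    return dog_side and cat_side


results = sorted(name for name, text in pages.items() if matches(text))
print(results)
```

Note that a bare prefix match would also accept unrelated words such as "catalog", which is one reason truncated searches can return unexpected results.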
Search Basics
Proximity operators specify how close search terms must appear
together in a web page to be included in the results list:
Next to – Term A and Term B must appear right next to each other
for the web page to appear in the results list
Near – Term A and Term B must be near each other for the web
page to appear in the results list
Within # - Term A and Term B must appear within a certain number
of words for the web page to appear in the results list
Same paragraph - Term A and Term B must appear in the same
paragraph for the web page to appear in the results list
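A proximity check can be sketched by comparing word positions. The functions within and next_to are hypothetical, and "distance" is assumed to mean the difference between word positions; each database defines its own counting rules.

```python
def positions(words, term):
    """Return every position at which the term occurs."""
    return [i for i, word in enumerate(words) if word == term]


def within(text, term_a, term_b, max_distance):
    """True if some occurrence of term_a is at most max_distance words from term_b."""
    words = text.lower().split()
    return any(
        abs(i - j) <= max_distance
        for i in positions(words, term_a)
        for j in positions(words, term_b)
    )


def next_to(text, term_a, term_b):
    """True if the two terms appear side by side, in either order."""
    return within(text, term_a, term_b, 1)


text = "the dog sat quietly beside the cat"
print(within(text, "dog", "cat", 5))
print(within(text, "dog", "cat", 4))
print(next_to(text, "dog", "cat"))
```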
Search Basics
Truncation is the use of a symbol to stand
for any possible ending of a root
Eliminates the need for long searches with
similar words separated by the Boolean
operator OR
Example: child* The asterisk * can stand for any
possible ending of the root child, such as child,
children, childhood, child’s, children’s
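Truncation amounts to matching on the root of a word. In this sketch truncation_matches is a hypothetical helper and the word list is invented; only the child* example comes from the slide.

```python
def truncation_matches(pattern, words):
    """Return the words that start with the root before the trailing asterisk."""
    root = pattern.rstrip("*").lower()
    return [word for word in words if word.lower().startswith(root)]


words = ["child", "children", "childhood", "child's", "children's", "chili", "adult"]
found = truncation_matches("child*", words)
print(found)
```

One pattern replaces the longer search child OR children OR childhood OR child's OR children's.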
Search Basics
Wildcard symbols can stand for any
character or characters within a word
Useful for roots that have many unrelated
endings
Example: wom?n can stand for woman,
women, womon, womyn
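Python's standard fnmatch module uses a similar ? symbol, so it can illustrate the wildcard idea. One assumption to note: in fnmatch the ? stands for exactly one character, while some databases let a wildcard stand for several. The word list is invented.

```python
from fnmatch import fnmatchcase

words = ["woman", "women", "womyn", "womon", "wombat", "won"]

# Keep only the words where a single character fills the ? position
matches = [word for word in words if fnmatchcase(word, "wom?n")]
print(matches)
```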
Search Basics
Searching terms as a phrase dictates that they
appear in the order specified, right next to each
other, in the web page
Sometimes automatic
Useful in searching for short quotations
Example: “hot cross bun” finds only web pages
with that exact phrase, eliminating those that
have the words hot, cross, and bun unrelated to
one another
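Phrase searching can be sketched as looking for the words in order, side by side. The helper contains_phrase is hypothetical, and it splits on spaces only, so punctuation handling is left out.

```python
def contains_phrase(text, phrase):
    """True if the phrase's words appear in the text in order, right next to each other."""
    words = text.lower().split()
    target = phrase.lower().split()
    size = len(target)
    return any(
        words[start:start + size] == target
        for start in range(len(words) - size + 1)
    )


print(contains_phrase("a hot cross bun for breakfast", "hot cross bun"))
print(contains_phrase("the bun was hot and I was cross", "hot cross bun"))
```

The second page contains all three words, but not as a phrase, so it is excluded.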
Search Basics
Limits restrict what part of the web page is
searched
Search engines offer few limiting capabilities
Usually searches metadata, information that
cannot be seen on the web page itself
Example: Language:English finds only web
pages with English language text
Search Basics
Search Index syntax varies
Usually no field searching
Limited truncation and wildcards
Boolean “AND” may be assumed
Phrase syntax important
Effectiveness of limits depends on the
metadata that web page creators include
Search Basics
Natural language searching common with
search engines
No connectors (no boolean, proximity, etc.)
Statistical algorithm for “relevance”
Term frequency
Term location
Proximity of terms to each other
Uniqueness of term
Possibly “popularity” of document
Build taxonomy “on the fly”
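Two of the factors listed above, term frequency and uniqueness of term, can be combined in a simple score. The sketch below uses a TF-IDF style formula, which is one common choice and not necessarily what any particular search engine uses; term location, proximity, and popularity are left out. The score function and sample documents are invented.

```python
import math


def score(query, doc, all_docs):
    """Sum, over query terms, of term frequency times a uniqueness weight."""
    doc_words = doc.lower().split()
    total = 0.0
    for term in query.lower().split():
        frequency = doc_words.count(term)
        containing = sum(1 for d in all_docs if term in d.lower().split())
        if frequency == 0 or containing == 0:
            continue
        # Terms found in fewer documents are more unique and weigh more
        uniqueness = math.log(len(all_docs) / containing) + 1.0
        total += frequency * uniqueness
    return total


docs = [
    "hot cross bun recipe with a hot oven",
    "the weather is hot today",
    "a recipe for bread",
]
ranked = sorted(docs, key=lambda d: score("hot bun", d, docs), reverse=True)
for doc in ranked:
    print(round(score("hot bun", doc, docs), 2), doc)
```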
Search Strategies
Key Factors for Successful Web Searching:
Which search engines/resources you
choose for the search
How carefully you formulate & execute
the search terms & search logic
How much information is actually
available
Search Strategies
Precision & Recall are traditional measures of a
successful search
Recall: % of relevant records found of all the relevant
records (possible hits) in file
“How much of the good stuff did your search produce?”
measured against all the possible relevant hits
Precision: % of relevant records within search
results
“How much of what your search produced was good stuff rather than bad stuff?”
measured against what you actually retrieved
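Both measures can be computed directly from two sets of record ids. The ids and the helper precision_recall below are made up for illustration; in practice the full set of relevant records is rarely known, so recall is usually an estimate.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) as fractions between 0 and 1."""
    retrieved, relevant = set(retrieved), set(relevant)
    good_hits = retrieved & relevant  # relevant records the search found
    precision = len(good_hits) / len(retrieved) if retrieved else 0.0
    recall = len(good_hits) / len(relevant) if relevant else 0.0
    return precision, recall


retrieved = {1, 2, 3, 4, 5}             # what the search returned
relevant = {1, 2, 6, 7, 8, 9, 10, 11}   # all relevant records in the file

precision, recall = precision_recall(retrieved, relevant)
print(f"precision: {precision:.0%}")  # share of the results that is relevant
print(f"recall: {recall:.0%}")        # share of the relevant records that was found
```

Here 2 of the 5 retrieved records are relevant (precision 40%), and 2 of the 8 relevant records were found (recall 25%).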
Search Strategies
Precision vs. Recall
Usually inverse relationship
[Graph: Recall and Precision axes, each running from Low to High]
Search Strategies
Precision is assured by choosing enough
appropriate concepts
Recall is assured by choosing enough
appropriate synonyms
Search Strategies
Choosing Search Words
Make a list of concrete words that define
the topic
Identify alternatives
Search Strategies
Simplify words:
Plurals: s, es, y-ies often will automatically
be searched
Truncate (often * or !): packag* or wrap*
Search Strategies
Eliminate general and assumed/implied terms
Leave out “science” if you are searching a science
search engine
Consider whether or not to search “efforts”: it is a
general term implied in most articles, and a web
page discussing an effort to do something is likely
not to include the word “efforts”
Search Strategies
Be the most specific (narrowest) when:
Sure of target document(s)
Don’t care about recall (precision first)
Don’t have time to plan
“Quick & Dirty”
Search Strategies
To narrow a search
“And” in a new concept
Use fewer terms in concept sets
Be more specific - use proximity over ands
Go from free text to controlled vocabulary/fields
Truncate further right
Use narrower, more specific vocabulary terms
Qualify search strategies to titles, abstracts, descriptors
Limit by language, publication year, type
Search Strategies
Be less specific (broadest) when:
Need comprehensive retrieval (high
recall)
“Feeling your way”
Unsure of terms
Unsure of database content
Fuzzy topic
Search Strategies
To broaden a search:
Eliminate a concept set - the least crucial
OR in more terms
Be less specific - go from descriptors to free text
Use broader ands instead of proximity
Truncate further left
Use broader controlled vocabulary terms
Remove qualifiers; search full text
Remove limitations