Invisible or deep web


WISER Humanities:
Quality Information on the Internet
Johanneke Sytsema
Linguistics Subject Consultant
[email protected]
Judy Reading
Reader Services
[email protected]
Aims of the session
• An overview of the types of web
search tools
• Functionality and Focus of different
search tools
• Summary of helpful search
techniques
• Evaluating results
• Using gateways
Web basics
• Organisation
• Size
• Scope
• Invisible web
Wikipedia on “The deep web”
Invisible or deep web
• Subscription content
• Private sites (eg registration or log-in
protected)
• Dynamic content – returned from a query or
completed form
• Unlinked content
• File types which search engines can’t index,
eg multimedia files (PDF files used to be
unsearchable too)
Primary Search tools
• Search engines
– General – Google
– Specific – Google Scholar (Oxford full text links)
– Google Blog Search to search blogs
– OpenDOAR www.opendoar.org to search repositories
• Meta search engines – crawlers
• Gateways - Intute
• Reference tools - OxLip
Search engines
• Major players
– Google (US and UK versions)
– Google Scholar (www.scholar.google.com)
– Yahoo search (www.yahoo.com or co.uk)
– Ask Jeeves (www.ask.com or co.uk)
Search Engines
Advantages
• Index a large proportion of the public web
• Word for word indexing
• Easy to use and available
Disadvantages
• Huge number of hits generated
• No quality control
• Different advanced searching techniques
• Public pages only, no databases
Google
• Worth looking in the Advanced search and Advanced Search
Tips to see what Google offers.
You can search:
• for a particular resource or kind of information such as books,
blogs, images or news
• Within a particular site or create a customised search engine
searching sites you select
• Sites which link to a particular site or are similar to a site
• The order of search terms matters and you can repeat key
words to influence what you retrieve
• Normally Google automatically truncates search terms and
combines them with “and”.
• Phrase searching (using quotation marks) very useful for
making searches specific
• Results are ranked for relevance through a secret formula which
includes the popularity of pages
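The phrase and site options above can also be typed straight into the search box. A minimal sketch of how such a query string might be put together, assuming the widely documented quotation-mark and `site:` operators; the helper name `build_query` and the example terms are illustrative only, not part of the presentation:

```python
def build_query(terms, phrase=None, site=None):
    """Combine keywords, an optional exact phrase and an optional site limit."""
    parts = list(terms)
    if phrase:
        # Quotation marks ask for the words as an exact phrase
        parts.append(f'"{phrase}"')
    if site:
        # site: limits results to one site or domain (assumed operator)
        parts.append(f"site:{site}")
    return " ".join(parts)


# Hypothetical example terms
query = build_query(["papyri"], phrase="Dead Sea scrolls", site="ox.ac.uk")
print(query)
```

The resulting string can be pasted into the ordinary search box; the Advanced search form offers the same options through separate fields.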
Google Scholar
• You need the Virtual Private Network set up to gain
access to all Oxford subscription full-text if using a
home PC or laptop
• Make sure you have Oxford selected in your
preferences so the links to Oxford holdings work.
Google Scholar
• Advantages: quick and easy to use, full-text
searching, cited by links, huge general resource
linked to local holdings as well as full-text if available
• Disadvantages: not comprehensive – most recent
and oldest material may be missing, doesn’t have the
full functionality of subscription bibliographic
databases eg can’t mark from a list or save searches
to combine sets, no controlled vocabulary
• Use Advanced search options and read through the
search tips and help offered to make the most of
Google Scholar
• Getting better all the time – may provide “good
enough” quick results but not (yet?) sufficient for a
thorough literature search
Google Scholar
• Use ‘exact phrase’
• Use ‘with at least one of
the words’
• Use ‘Exclude the words’
• Set preferences to
choose Oxford full text
option
• Not comprehensive at
all
• Cannot sort by date
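The three advanced search boxes named above correspond roughly to operators that can be typed by hand. A rough sketch, assuming the usual conventions (quotation marks for an exact phrase, `OR` between alternatives, a leading minus sign to exclude a word); the function name `advanced_query` and the example terms are hypothetical:

```python
def advanced_query(all_words=(), exact_phrase=None, any_words=(), none_words=()):
    """Turn the advanced-search form fields into one query string."""
    parts = list(all_words)
    if exact_phrase:
        # 'exact phrase' box: wrap the phrase in quotation marks
        parts.append(f'"{exact_phrase}"')
    if any_words:
        # 'with at least one of the words' box: join alternatives with OR
        parts.append(" OR ".join(any_words))
    # 'Exclude the words' box: prefix each word with a minus sign
    parts.extend(f"-{word}" for word in none_words)
    return " ".join(parts)


print(advanced_query(
    all_words=["papyrus"],
    exact_phrase="Dead Sea scrolls",
    any_words=["Qumran", "Isaiah"],
    none_words=["tourism"],
))
```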
Google Scholar
[Screenshot: 613 hits, with link to Oxford full text]
Search techniques (1)
Too many results?
• Add more concepts
• Link terms
• Search in a particular field e.g. title
• Limit to UK pages
• Advanced searching options
• Search for exact phrase using “..”
• Set preferences
Search techniques (2)
Too few results?
• Broaden search term
• Add alternative phrases
• Try a meta search engine
Are you searching in the right place?
Have you tried subscription databases?
Meta Search engines (1)
A tool that searches across a number of
individual search engines retrieving the ‘top’
results from each
• Clusty (www.clusty.com) – ranks results
according to subject
• Metacrawler (www.metacrawler.com) – links
to the top 10 hits in other search engines
• Dogpile (www.dogpile.com) – no ranking;
searches by ‘exact phrase’
Clusty
Metacrawler: compare top ten
results in Google, Yahoo, Ask
Meta-Search engines (2)
Advantages
• Search across a number of engines using a single
interface
• Ranking according to subject (Clusty)
• Compare top ten results of Google, Yahoo, Ask in
Metacrawler
• Can save time searching
• More of the web searched
• Duplicates removed
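How a meta-search engine might combine the top results of several engines and remove duplicates can be sketched as below. This is only an illustration under stated assumptions: the round-robin merge is one plausible approach, not a description of how Clusty, Metacrawler or Dogpile actually work, and the result lists are invented example data.

```python
from itertools import zip_longest


def merge_results(*ranked_lists, top=10):
    """Interleave the top results of each engine, keeping the first copy of each URL."""
    merged, seen = [], set()
    # Take rank 1 from every engine, then rank 2, and so on
    for tier in zip_longest(*(results[:top] for results in ranked_lists)):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Invented result lists standing in for two search engines
engine_a = ["http://a.example/1", "http://b.example/2", "http://c.example/3"]
engine_b = ["http://b.example/2", "http://d.example/4"]
print(merge_results(engine_a, engine_b))
```

Because only the top few hits of each engine are taken, a merge of this kind can save time but cannot offer the fine-grained limits of a single engine's advanced search.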
Meta-Search engines (3)
Disadvantages
• Difficult to limit searches
• Search engine coverage:
• Metacrawler: Google, Yahoo, Ask
• Clusty: Ask, Open Directory, Gigablast and
others
Directories/indexes/Gateways
Lists of web resources grouped together in a
structured manner
• INTUTE http://www.intute.ac.uk/ subject
based web resources for education and
research (part of RDN Resource Discovery
Network)
• British Academy Portal
http://www.britac.ac.uk/portal/index.html
(humanities and social sciences, academic)
• INFOMINE scholarly internet resource
collections http://infomine.ucr.edu/
Directories/indexes/gateways
Advantages
• Created by people who have evaluated the
sites
• Quality resources
• Subject structure allows browsing
• Smaller and more manageable than engines
Disadvantages
• Browsing can return a long list of sites
• Difficult to identify which category to search in
• Indexed by title rather than word-for-word
INTUTE
• Search Example:
archaeology/papyrology
• ‘Dead Sea scrolls and
Qumran’ contains info and
is a gateway to other sites
• Duke papyrus archive
contains information about
over 1,300 papyri
• Great Isaiah Scroll: images,
translation and discussion
of the text
INTUTE offers RSS feeds
You will be automatically updated when new quality sites in your
subject area are added to the database
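An RSS feed is an XML file that a feed reader checks regularly for new items. A minimal sketch of that check, using an invented sample feed in RSS 2.0 layout rather than a real INTUTE feed; the titles, links and the function name `new_items` are assumptions for illustration:

```python
import xml.etree.ElementTree as ET

# Invented sample feed; a real reader would download this from the gateway
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Hypothetical gateway: new sites</title>
    <item><title>Site A</title><link>http://example.org/a</link></item>
    <item><title>Site B</title><link>http://example.org/b</link></item>
  </channel>
</rss>"""


def new_items(feed_xml, already_seen):
    """Return (title, link) pairs for feed items whose link has not been seen yet."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title")
        link = item.findtext("link")
        if link not in already_seen:
            items.append((title, link))
    return items


print(new_items(SAMPLE_FEED, already_seen={"http://example.org/a"}))
```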
Gateways
• STELLA: Gateway
produced by Glasgow
University
– English and Scottish
language links
– www.arts.gla.ac.uk/SESLL/STELLA/links.htm
• Refers to organisations,
gateways, text archives
• No search
• Points to useful
resources
Evaluating results
• When was it produced?
• Who is responsible for the information?
• Why has it been published on the Internet?
• Where is the page situated?
• What is the value to you?
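For the "Where is the page situated?" question, the host name in a URL often gives a first clue. A small sketch, assuming a hand-made table of domain suffixes; the table `SUFFIX_HINTS` and the function name `describe_location` are illustrative, and a suffix is only a hint, never proof of quality:

```python
from urllib.parse import urlparse

# Hand-made, deliberately short table of suffix hints (an assumption)
SUFFIX_HINTS = {
    ".ac.uk": "UK academic institution",
    ".edu": "US higher education",
}


def describe_location(url):
    """Return the host name of a URL and a rough hint about who runs the site."""
    if "://" not in url:
        # Bare addresses such as www.kb.nl need a scheme before parsing
        url = "http://" + url
    host = urlparse(url).hostname or ""
    for suffix, hint in SUFFIX_HINTS.items():
        if host.endswith(suffix):
            return host, hint
    return host, "no hint from the domain"


print(describe_location("http://www.intute.ac.uk/"))
print(describe_location("www.kb.nl"))
```

Where the domain gives no hint, the site's own "About us" page is the next place to look.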
example
• http://home.wanadoo.nl/mpaginae/
• Aim of this site?
• Who made it?
• When was it made?
• What can be searched?
example
• www.kb.nl
• This site in English
• About us
• Copyright & colofon
Summary
For focussed results use
• Specific search engines
• Meta-search engines
• A focussed search strategy
For quality results use evaluated resources:
• Directories
• Gateways
• Databases
This presentation will be available from the
WISER Presentations Archive
www.ouls.ox.ac.uk
OULS Home > E-resources > Information
skills and induction > WISER > WISER
Presentations Archive