Searching the Deep Web


LEMA, February 2011
Deep Web Video
Surface Web (25%): 1 trillion+ pages; accessible via general-purpose search engines such as Google and Yahoo!
Deep Web (75%): 500 trillion+ pages; not accessible via typical search engines; primarily databases
AKA visible vs. invisible web
Image from express.howstuffworks.com, 14 Feb 11
The “deep web” contains …
 Databases which use dynamic or temporary links
 Often ?, &, CGI, other elements in the URL
 Websites which aren’t indexed, by design or because there
are no links to them
 Deep web sites
 Google limits the amount of a web site it indexes, an
unpublished factor in its secret algorithm
 At one point, only 110K
 Formats that aren’t currently supported
 Google now shows results for .pdf, .doc, .ppt
 Boundary between surface and deep web always in flux as
search engines incorporate more of the deep web at the
same time more is being added to the deep web
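For anyone who wants to show students what a "dynamic" link looks like, the URL clues above (?, &, CGI) can be turned into a rough check. This is only a sketch under stated assumptions: the function name looks_dynamic and the example.org addresses are made up for illustration, and a URL with these elements is not guaranteed to be deep web content.

```python
from urllib.parse import urlparse

def looks_dynamic(url: str) -> bool:
    """Rough heuristic (an assumption, not a rule): True if the URL
    carries a query string or a CGI-style path."""
    parts = urlparse(url)
    # Anything after "?" counts as a query; parameters are usually joined by "&".
    has_query = bool(parts.query)
    # CGI scripts commonly live under /cgi-bin/ or end in .cgi.
    path = parts.path.lower()
    has_cgi = "/cgi-bin/" in path or path.endswith(".cgi")
    return has_query or has_cgi

# Hypothetical example addresses
print(looks_dynamic("http://example.org/cgi-bin/search?term=metabolism&page=2"))
print(looks_dynamic("http://example.org/about.html"))
```

The first address has both a query string and a CGI path, so it is flagged; the second is a plain static page and is not.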
Deep Web: Why important?
 Studies show that students’ searching habits are fairly
ingrained by college
 Use Google for everything
 Only look at the 1st page of results
 Assume trustworthiness of web sites
 Rich source of in-depth material not accessible through
a typical Google search
 Expose students now to richer and more authoritative
resources.
Students need to understand ….
 The best results are NOT in the top 10
 Everything’s NOT on the web
 Google does NOT search the whole web
 Everything’s NOT free
 Everything’s NOT trustworthy
 Searching/Research is NOT always easy
How can we help our students be
better searchers?
 Introduce them to the idea that Google isn’t everything &
why
 Reinforce the idea of evaluating resources
 Make them better “surface” searchers
 Many information needs can be met with the surface web
 Easy yet “advanced” Google searching techniques
 Better alternatives to the “surface web” & how to effectively
search these alternatives
 Databases!
 Familiarity with “deep” sites on a particular topic
 Example: Primary materials available at Library of Congress
 Example: Legislative info at thomas.loc.gov
 Familiarity with portals and directories
Three simple techniques for being a
better Google searcher ….
 Phrase searching
 “xxx xxxx”
 Searching the title of web pages
 intitle:xxx or intitle:“xxx xxxx”
 Example: intitle:“climate change”
 Example: intitle:unicorn
 Specifying a site
 site:.xxx or site:xxx.com
 egypt site:washingtonpost.com
 “climate change” site:.gov
NOTE:
1. No space after colon
2. Lowercase commands
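The three techniques can also be sketched as small helpers that assemble the query text for you. The function names phrase, intitle, and site are hypothetical and exist only for this illustration; the helpers just build strings to paste into the Google search box and follow the two notes above (no space after the colon, lowercase commands).

```python
def phrase(words: str) -> str:
    """Wrap words in quotation marks for a phrase search."""
    return f'"{words}"'

def intitle(term: str) -> str:
    """Search page titles; multi-word terms are quoted as a phrase."""
    if " " in term:
        term = phrase(term)
    return f"intitle:{term}"      # lowercase command, no space after the colon

def site(domain: str) -> str:
    """Limit results to one site (washingtonpost.com) or one domain type (.gov)."""
    return f"site:{domain}"

print(intitle("climate change"))
print(intitle("unicorn"))
print("egypt " + site("washingtonpost.com"))
print(phrase("climate change") + " " + site(".gov"))
```

Each printed line matches one of the examples on the slide, so students can compare what they typed by hand against the generated query.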
Let’s try a site: search ….
 Look for a Washington Post article on the B-52s
Now let’s try a phrase search…
 First, try Howard Morris as a simple keyword search --
How many hits?
 Now try it as a phrase “Howard Morris”
 How many hits?
Now let’s try an intitle: search
 First, just search for “climate change” – how many
hits?
An intitle: search
 Now try searching for “climate change” in the title of
the web page – how many hits?
Searching the Deep Web
 LVHS Library Web Page – Deep Web link on the left
 Run a Google search for your topic and add the keyword
database
 Ex: Plane crashes database
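The "topic plus database" tip can be sketched as a tiny helper that builds a ready-to-click search link. The function name database_search_url is hypothetical, and the link format (google.com/search with a q parameter) is assumed here for illustration; typing the same words into the search box works just as well.

```python
from urllib.parse import urlencode

def database_search_url(topic: str) -> str:
    """Build a Google search link for a topic with the keyword 'database' added."""
    query = f"{topic} database"
    # urlencode escapes spaces and punctuation so the link is safe to share.
    return "https://www.google.com/search?" + urlencode({"q": query})

print(database_search_url("plane crashes"))
```

A list of class topics could be run through the same helper to prepare links ahead of a lesson.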
The Deep Web: A Comparison
 Using Google, search on the term metabolism
 Open a separate tab, go to www.science.gov and
search metabolism again
 Looking at the top ten results of each, which provided
generally “better” information?
 How difficult/easy is it to pursue your search in related
fields?
Directories/Portals of Interest
 Ipl2
 January 2010
 Merger of the Internet Public Library and the Librarians’
Internet Index
 Librarians and Information Science Professionals
 Hosted by Drexel University’s College of Information Science
& Technology
 Infomine
 University-level scholarly resources
 Librarian built and maintained
 University of California
 Virtual Private Library
Other Resources
 LVHS Library Web Page – Deep Web link on the left
 Going Beyond Google: The Invisible Web in
Learning and Teaching by Jane Devine and Francine
Egger-Sider, 2009
 Not as up-to-date as web resources, but very focused on
teaching
Any questions?