Searching the Deep Web
LEMA, February 2011
Deep Web Video
Surface Web (25%, 1 trillion+ pages): accessible via general-purpose search engines such as Google and Yahoo!
Deep Web (75%, 500 trillion+ pages!!): not accessible via typical search engines; primarily databases
AKA visible vs. invisible web
Image from express.howstuffworks.com, 14 Feb 11
The “deep web” contains …
Databases which use dynamic or temporary links
Often signaled by ?, &, CGI, or similar elements in the URL
Websites which aren’t indexed, by design or because there
are no links to them
Deep web sites
Google limits the amount of a web site it indexes, an
unpublished factor in its secret algorithm
At one point, only 110K
Formats that aren’t currently supported
Google now shows results for .pdf, .doc, .ppt
Boundary between surface and deep web always in flux as
search engines incorporate more of the deep web at the
same time more is being added to the deep web
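The URL clues mentioned above (?, &, CGI) can be checked with a few lines of Python. This is only a rough sketch: the function name looks_dynamic and the example.org addresses are made up for illustration, and a URL that passes the check is not guaranteed to be deep web content.

```python
from urllib.parse import urlparse

def looks_dynamic(url: str) -> bool:
    """Rough heuristic: does this URL look like a database-generated link?"""
    parts = urlparse(url)
    # Anything after "?" is the query string; its parameters are joined with "&".
    has_query = bool(parts.query)
    # Script paths such as /cgi-bin/ usually contain "cgi" (may give false positives).
    has_cgi = "cgi" in parts.path.lower()
    return has_query or has_cgi

# Hypothetical example URLs
print(looks_dynamic("http://example.org/cgi-bin/search?term=metabolism&page=2"))  # True
print(looks_dynamic("http://example.org/about.html"))                             # False
```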
Deep Web: Why important?
Studies show that students’ searching habits are fairly
ingrained by college
Use Google for everything
Only look at the 1st page of results
Assume trustworthiness of web sites
Rich source of in-depth material not accessible through
a typical Google search
Expose students now to richer and more authoritative
resources.
Students need to understand ….
The best results are NOT in the top 10
Everything’s NOT on the web
Google does NOT search the whole web
Everything’s NOT free
Everything’s NOT trustworthy
Searching/Research is NOT always easy
How can we help our students be
better searchers?
Introduce them to the idea that Google isn’t everything &
why
Reinforce the idea of evaluating resources
Make them better “surface” searchers
Many information needs can be met with the surface web
Easy yet “advanced” Google searching techniques
Better alternatives to the “surface web” & how to effectively
search these alternatives
Databases!
Familiarity with “deep” sites on a particular topic
Example: Primary materials available at Library of Congress
Example: Legislative info at thomas.loc.gov
Familiarity with portals and directories
Three simple techniques for becoming a
better Google searcher ….
Phrase searching
“xxx xxxx”
Searching the title of web pages
intitle:xxx or intitle:“xxx xxxx”
Example: intitle:“climate change”
Example: intitle:unicorn
Specifying a site
site:.xxx or site:xxx.com
egypt site:washingtonpost.com
“climate change” site:.gov
NOTE:
1. No space after colon
2. Lowercase commands
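The three techniques and the two rules in the NOTE can be sketched as small Python helpers that assemble the text to type into the search box. The helper names (phrase, intitle, site, build_query) are made up for this illustration and are not part of any Google tool.

```python
def phrase(text: str) -> str:
    """Phrase searching: wrap the words in double quotes."""
    return f'"{text}"'

def intitle(text: str) -> str:
    """Title searching: lowercase command, no space after the colon."""
    # Multi-word titles need quotes so the whole phrase is matched.
    value = phrase(text) if " " in text else text
    return f"intitle:{value}"

def site(domain: str) -> str:
    """Site searching: accepts a full site (xxx.com) or a domain type (.gov)."""
    return f"site:{domain}"

def build_query(*parts: str) -> str:
    """Join keywords and commands with single spaces."""
    return " ".join(parts)

print(build_query("egypt", site("washingtonpost.com")))   # egypt site:washingtonpost.com
print(build_query(phrase("climate change"), site(".gov")))  # "climate change" site:.gov
print(intitle("climate change"))                          # intitle:"climate change"
print(intitle("unicorn"))                                 # intitle:unicorn
```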
Let’s try a site: search ….
Look for a Washington Post article on the B-52s
Now let’s try a phrase search…
First, try Howard Morris as a simple keyword search --
How many hits?
Now try it as a phrase “Howard Morris”
How many hits?
Now let’s try an intitle: search
First, just search for “climate change” – how many
hits?
An intitle: search
Now try searching for “climate change” in the title of
the web page – how many hits?
Searching the Deep Web
LVHS Library Web Page – Deep Web link on the left
Search Google for your topic and add the keyword
database
Ex: Plane crashes database
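The "topic + database" tip can also be turned into a ready-made search link. A minimal sketch, assuming the common https://www.google.com/search?q=... address format; the function name database_search_url is made up for this example.

```python
from urllib.parse import urlencode

def database_search_url(topic: str) -> str:
    """Build a Google search link for a topic with the keyword 'database' added."""
    query = f"{topic} database"
    # urlencode turns spaces into "+" so the link is safe to paste into a browser.
    return "https://www.google.com/search?" + urlencode({"q": query})

print(database_search_url("plane crashes"))
# https://www.google.com/search?q=plane+crashes+database
```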
The Deep Web: A Comparison
Using Google, search on the term metabolism
Open a separate tab, go to www.science.gov and
search metabolism again
Looking at the top ten results of each, which provided
generally “better” information?
How difficult/easy is it to pursue your search in related
fields?
Directories/Portals of Interest
ipl2
January 2010: merger of the Internet Public Library and the
Librarians’ Internet Index
Librarians and Information Science Professionals
Hosted by Drexel University’s College of Information Science
& Technology
Infomine
University-level scholarly resources
Librarian built and maintained
University of California
Virtual Private Library
Other Resources
LVHS Library Web Page – Deep Web link on the left
Going Beyond Google: The Invisible Web in
Learning and Teaching by Jane Devine and Francine
Egger-Sider, 2009
Not as up-to-date as web resources, but very focused on teaching
Any questions?