Slides for 10/25 -- The Deep Web
The Invisible/Deep Web
The Private Web
Sites excluded by a Webmaster
Password protection
“Noindex” metatag
Robot exclusion protocol
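The last two exclusion mechanisms above can be illustrated directly. A minimal sketch (the path and file contents are hypothetical examples, not from any particular site): a "noindex" metatag placed in a page's HTML head, and a robots.txt file at the site root implementing the robot exclusion protocol.

```html
<!-- In the page's <head>: asks compliant crawlers not to index this page -->
<meta name="robots" content="noindex">
```

```text
# robots.txt at the site root: asks all crawlers to skip the /private/ path
User-agent: *
Disallow: /private/
```

Both mechanisms rely on crawler cooperation; they request exclusion rather than enforce it, unlike password protection.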
The Opaque Web
Files not indexed by search engines because of:
Depth of crawl
Frequency of crawl
Disconnected URLs
Sherman and Price (2001)
The Proprietary Web
Registered Sites
e.g., New York Times
Fee-based Sites
e.g., Wall Street Journal
Sherman and Price (2001)
Truly Invisible Web
Technical Reasons
Crawlers cannot handle newer file formats
Dynamically generated information
Content in relational databases
(e.g., Census data)
Sherman and Price (2001)
Dynamically Generated Content
stock quotes
airline flights
weather
phone directories
library catalogs
people finders
dictionary definitions
online store products
(e.g., eBay)
census data
patents
news
Why the Web Is “Invisible” to Search Engines
Password protected sites
Noindex metatag
Robot exclusion protocol
Sites that require forms to be filled out
Dynamically generated content
New file formats
Newly added Web pages
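As a concrete illustration of the robot exclusion protocol listed above, Python's standard-library `urllib.robotparser` can check whether a given robots.txt rule blocks a URL. This is a minimal sketch; the rules and URLs are hypothetical examples.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules: block all crawlers from /private/
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = RobotFileParser()
rp.parse(rules)

# A compliant crawler would skip the first URL and fetch the second
print(rp.can_fetch("*", "https://example.com/private/report.html"))  # False
print(rp.can_fetch("*", "https://example.com/index.html"))           # True
```

Pages excluded this way belong to the "private Web": the content exists and is reachable by URL, but well-behaved crawlers never index it.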
Finding the Invisible/Deep Web
Try these search terms:
webliography, libguides, database, research guides, resource guides
Use Invisible Web directories.
Explore library websites
– Pathfinders, Research Guides, etc.
Chris Sherman
Gary Price