What is the Invisible Web?

Invisible Web
• AKA: Hidden Web, Deep Web
(First labeled the “Invisible Web” in 1994 by Jill H. Ellsworth)
Internet vs. World Wide Web
Internet



A network created so that computers could talk
to each other
1969
Funded by the U.S. Department of Defense's Advanced Research Projects Agency (ARPA)
WWW




Software that runs on the Internet that allows
users to access files
1990
Created by programmer Tim Berners-Lee at the European Organization for Nuclear Research (CERN)
What is the Invisible Web?


Areas of the web that search engine crawlers
cannot access.
Areas of the web that are not directly searchable
by the basic search function of general search
engines such as Google.
Search Engine Crawler


The program a search engine uses to discover and index webpages by following links.
A webpage that is invisible to one search engine's crawler may be visible to another's.
What is the Surface Web?


Areas of the web that search engine crawlers can
access.
Areas of the web that are directly searchable by
the basic search function of general search
engines such as Google.
Web Evolution
Size of the Web

Surface Web vs. Invisible Web

A white paper published by BrightPlanet estimates that the Invisible Web is about 500 times larger than the Surface Web.
Visible vs. Invisible
A Visible Website?

Has static HTML webpages (depends on the search engine being used)
• Google Scholar (PDFs)
• Google Video
A Visible Website?
• Can be a database or directory
• Do you have to enter a query?
• Are there links to pages?
• Is a subscription needed?
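The three questions above can be read as a rough checklist. The sketch below is one possible interpretation, not a rule any search engine publishes: the function name `invisibility_reasons` and its three boolean parameters are hypothetical, chosen only to mirror the questions on the slide.

```python
def invisibility_reasons(requires_query, has_links_to_pages, requires_subscription):
    """Return the reasons a database or directory page may be invisible.

    An empty list means none of the checklist questions raised a concern.
    This is an illustrative heuristic, not a search engine's actual logic.
    """
    reasons = []
    if requires_subscription:
        # A crawler cannot log in, so content behind a paywall stays hidden.
        reasons.append("subscription needed")
    if requires_query and not has_links_to_pages:
        # Content that appears only after a form is filled in has no link
        # for a crawler to follow.
        reasons.append("reachable only by entering a query")
    return reasons


# A browsable directory: every record has a link, no login needed.
print(invisibility_reasons(False, True, False))  # []

# A subscription database with a search box and no browsable links.
print(invisibility_reasons(True, False, True))
# ['subscription needed', 'reachable only by entering a query']
```

An empty result does not guarantee visibility; it only means this checklist found no obvious barrier.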
A Visible Website?
• Needs to be linked to from a page that is already being crawled
• If not, it is called a “disconnected page”
• A crawler has no way to discover a disconnected page, even if the page has no technical barriers to indexing
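To illustrate why a disconnected page is never found, here is a minimal sketch of link-following over a small, made-up link graph. The page names and the `crawl` function are hypothetical; real crawlers fetch pages over HTTP and are far more involved.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "home.html": ["about.html", "news.html"],
    "about.html": ["home.html"],
    "news.html": ["archive.html"],
    "archive.html": [],
    "orphan.html": ["home.html"],  # links out, but no page links to it
}


def crawl(seed, links):
    """Return every page reachable from the seed by following links."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


indexed = crawl("home.html", LINKS)
disconnected = sorted(set(LINKS) - indexed)
print(disconnected)  # ['orphan.html']
```

The orphan page is ordinary HTML with no technical problem, yet the crawl never reaches it because no crawled page points to it.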
A Visible Website?
• May depend on the number of webpages on a website
• Mark Ludwig (Univ. at Buffalo, State Univ. of New York): of 2.2 million webpages created, Google crawled only 20,000 (under 1%)
A Visible Website?
• Depends on whether webpages are static or dynamic
• Dynamic examples: weather information, stock reports
Opaque Web
• Sites that can technically be indexed but that search engines choose not to index, for whatever reason.



Size of a website
Dynamic WebPages
Essentially Invisible
Why is this important?


Locating information
Disseminating information

If you want a website or digital project to be visible
Locating Information
When Searching


Include the word “database”, “directory”, “search engine”, or a synonym in your search
Add a term for your topic
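The search tip above amounts to pairing a topic term with a resource word. A minimal sketch follows; the function name `build_queries` and the sample topic are assumptions made for illustration.

```python
# Resource words taken from the slide; add synonyms as needed.
RESOURCE_WORDS = ["database", "directory", "search engine"]


def build_queries(topic, resource_words=RESOURCE_WORDS):
    """Combine a topic term with each resource word to form search queries."""
    return [f"{topic} {word}" for word in resource_words]


for query in build_queries("water rights"):
    print(query)
# water rights database
# water rights directory
# water rights search engine
```

Entering each query into a general search engine may surface the entry page of a database whose records the crawler itself cannot reach.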
Disseminating Information
A Visible Website?


Difficult to say if a whole website is visible
Need to make the distinction one Webpage at a
time
Disseminating Information
Do these Websites have invisible
or surface WebPages?
JSTOR
Earth Trends Environmental Information
Chicago Tribune
Directory of Open Access Journals
Western Waters Digital Library
Invisible Web Directories
• CompletePlanet: http://aip.completeplanet.com
• INFOMINE: http://infomine.ucr.edu/
• Librarians’ Internet Index: http://www.lii.org/
Disseminating Information
Google

For example, the Google search engine crawler will index webpages when you:
• Link to them from other webpages
• Submit the “add URL” form: www.google.com/addurl.html
• Pay for indexing
Note: Google indexes only the first 101 KB of a webpage
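The effect of a size cutoff can be sketched as follows. The 101 KB figure comes from the slide; treating 1 KB as 1024 bytes and the helper names `indexed_portion` and `is_term_indexed` are assumptions for illustration, not a description of how Google actually works.

```python
# Limit quoted on the slide; 1 KB = 1024 bytes is an assumption.
INDEX_LIMIT_BYTES = 101 * 1024


def indexed_portion(page_bytes, limit=INDEX_LIMIT_BYTES):
    """Return only the leading bytes a size-limited indexer would read."""
    return page_bytes[:limit]


def is_term_indexed(page_bytes, term, limit=INDEX_LIMIT_BYTES):
    """Check whether a term falls inside the indexed portion of a page."""
    return term.encode("utf-8") in indexed_portion(page_bytes, limit)


# A made-up page: a short intro, 120 KB of filler, then an appendix.
page = b"<p>intro</p>" + b"x" * (120 * 1024) + b"<p>appendix keyword</p>"

print(is_term_indexed(page, "intro"))     # True
print(is_term_indexed(page, "appendix"))  # False
```

For anyone publishing content, the practical point is to place important text early in a long page or to split the page into smaller ones.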

Future of the Invisible Web



The size of the Invisible Web will continue to
grow.
There will most likely be an increase in the number of websites that can be crawled (as crawling speed increases).
More indexing of non-HTML formats
Ex.: Google Scholar or Google Video
Conclusion



Use more than one search tool
Use invisible web directories
Remember how web search engine crawlers
function for the purpose of effectively locating
and disseminating information
Questions?