The Invisible Web
Definition
Searching
The Invisible Web
Also called:
deep content
hidden internet
dark matter
The Invisible Web
The vast number of pages that search engines cannot or will not
index
Restricted: login, password (such as intranets, databases; private,
proprietary)
Sites not linked from anywhere (undiscovered)
Sites that use a robots.txt file to keep files off limits from spiders
Unsearchable or un-indexable file formats
Non-static - searchable databases that only produce results
dynamically in response to a specific search request (such as CGI,
ASP, CFM)
Real-time data – changes rapidly – too “fresh”
Sites that are too “deep”
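As an illustration of the robots.txt item above, here is a minimal sketch of how a well-behaved spider might decide whether a page is off limits. It uses Python's standard `urllib.robotparser`; the robots.txt rules and the `www.example.com` URLs are made up for the example, and the `allowed` helper is a hypothetical name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an illustrative site (an assumption,
# not taken from any real server).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(url: str, user_agent: str = "*") -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed("http://www.example.com/index.html"))           # True
print(allowed("http://www.example.com/private/report.html"))  # False
```

Pages under the disallowed path stay out of the index even though they are publicly reachable, which is one way content becomes "invisible".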
The Invisible Web
Search engines often avoid indexing web pages that are
delivered dynamically, such as via database programs:
Often, the search engine may not like the URL used in
order to retrieve the document. Many dynamic delivery
mechanisms make use of the ? symbol.
For example, a page may be found this way:
http://www.website.com/cgi-bin/getpage.cgi?name=sitemap
Most search engines will not read past the ? in that URL.
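To make the role of the ? concrete, the sketch below splits the example URL into its path and query string using Python's standard `urllib.parse`. The `is_dynamic` helper is a hypothetical rule of thumb of the kind a cautious spider might apply, not the documented behavior of any particular search engine.

```python
from urllib.parse import urlsplit, parse_qs


def is_dynamic(url: str) -> bool:
    """Treat any URL carrying a query string (text after '?') as dynamic."""
    return bool(urlsplit(url).query)


url = "http://www.website.com/cgi-bin/getpage.cgi?name=sitemap"
parts = urlsplit(url)

print(parts.path)             # /cgi-bin/getpage.cgi
print(parse_qs(parts.query))  # {'name': ['sitemap']}
print(is_dynamic(url))        # True
```

A spider that stops at the ? sees only the script path, so every page the script can generate looks like the same document.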
The Invisible Web
Invisible Web sources tend to be:
More current
More comprehensive
Searchable (however, not by search engines)
More specific/targeted
Deeper breadth
Often better quality
The Invisible Web
Top types of “invisible” information
News
RSS
Blogs
Public company filings, stock prices
Customized maps and directions
Clinical trials
Telephone numbers and addresses, postal codes
Definitions
Job postings
Grant information
Statistics
Weather
Museum, gallery, and library holdings
Finding the “Dark Matter”
Search Engines
Specialized Search Engines
Directories
Vortals
Traditional Search Engines
Traditional Search Engines’ incorporation
of “Invisible” Databases
Weather
Maps
Phone directories
Catalogs
Stock prices
Traditional Search Engines
Unless specially programmed, though,
spiders can’t find all the valuable
resources available
Specialized Search Engines
Search deeper into sites:
Go beyond top page, or homepage
Choose sources to spider—topical sites
only
“Smart” ranking and indexing based on
knowledge of the specific subject
Specialized Search Engines
There are hundreds of specialized search
engines for almost every topic:
Search Engine Guide
Specialty Search Engines
Directories
Collections of pre-screened web-sites into
categories based on a controlled ontology
Ontology: classification of human
knowledge into topics, similar to traditional
library catalogs
Directories
Closed Model: paid editors; quality control
(LookSmart, Yahoo)
Open Model: volunteer editors (Open
Directory Project, Google)
Directories
Easier access to relevant results
Faster
Access to materials not always indexed by
search engines—content in databases or
file types not searched by spiders
Directories
Issues with directories:
Inherently small
Unseen editorial policies
May charge for listing
Lopsided coverage
Timeliness--Harder to keep updated
Search
Vortals
Vortals: vertical-portal. Instead of being a
horizontal, all-inclusive entry point into the Web,
they are vertical, specialized entry points.
Comprehensive sites focusing on gathering and
providing links to the best resources in a specific
topic.
Usually are combined subject-specific search
engines and subject-specific directories
Also called “focused crawlers”; metasites; guru;
authority; industry guide; subject directory site
Vortals
Advantages – best of directories and
subject-specific search engines
More up-to-date - crawl subject-specific
pages more often
Deeper crawl - gets more of the content on
each server
More precision, less recall
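The precision/recall trade-off can be shown with a small worked example. Precision is the share of returned results that are relevant; recall is the share of all relevant documents that were returned. The document IDs and result sets below are invented for illustration, and `precision_recall` is a hypothetical helper name.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one set of search results."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: four documents on the Web are truly relevant.
relevant = {"a", "b", "c", "d"}

# A vortal returns few results, all on topic.
vortal_results = {"a", "b"}
# A general engine returns everything relevant plus off-topic pages.
general_results = {"a", "b", "c", "d", "w", "x", "y", "z"}

print(precision_recall(vortal_results, relevant))   # (1.0, 0.5)
print(precision_recall(general_results, relevant))  # (0.5, 1.0)
```

In this toy case the vortal wastes none of the searcher's time but misses half of what exists, which is the sense of "more precision, less recall".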
Searching the Invisible Web
How do you find these sites?
Use known directories to find
invisible web searching and browsing
tools:
Librarians’ Index to the Internet
Open Directory
Google Directory
Teoma works well, too.
Searching the Invisible Web
Rethink your search:
Think key terms first, then specific details – macro vs. micro
Example: you want to find the melting point of hydrogen
peroxide. On the general web, you’d enter the keywords
melting, point, and “hydrogen peroxide”. On the
invisible web, you look for chemical databases, which
include melting points as one feature of the database.
Once in the database, you’d search for hydrogen
peroxide.
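The two-step approach in the example can be sketched as follows: first pick the right database (the macro step), then query it for the compound (the micro step). The in-memory `CHEMICAL_DB` dictionary is a hypothetical stand-in for a real chemical database behind a search form, and the values in it are approximate, commonly cited figures included only for illustration.

```python
# Hypothetical stand-in for a chemical database. A real one would be reached
# through its own search form, which is why spiders never see its records.
# Values are approximate and for illustration only.
CHEMICAL_DB = {
    "hydrogen peroxide": {"formula": "H2O2", "melting_point_c": -0.43},
    "water": {"formula": "H2O", "melting_point_c": 0.0},
}


def lookup(compound: str, prop: str):
    """Search the database by compound name, then read off one property."""
    record = CHEMICAL_DB.get(compound.strip().lower())
    return None if record is None else record.get(prop)


# Micro step: inside the database, the query is just the compound name.
print(lookup("Hydrogen Peroxide", "melting_point_c"))  # -0.43
```

Note that the words "melting" and "point" never appear in the query; the database supplies that context.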
Searching the Invisible Web
Remember some concepts are assumed
Do not use the subject as a search term
Example: If you are looking for information
on gender inequity in math education,
exclude terms like education from your
search in AskERIC, an education-specific
search tool
Mining the Invisible Web
Tips: Certain kinds of sites can prove to be
clearinghouses of information:
Government - statistics of all kinds
Professional organizations - archives of relevant
research and statistics
Media sites (TV and Radio) – transcripts and
speeches
College and university professor sites – lectures
and personal publications
Mining the Invisible Web
Look for library guides and commercial
portals for more guidance in finding the
hidden, valuable content available for free
on the Web (more on this in the next
lesson):
My Ready Reference on the Web Resource