Information Trapping
Mark DuBois
Illinois Central College
[email protected]
Your background
Why are you here?
What do you hope to gain from this presentation?
What do you know about?
RSS feeds (live bookmarks)
Microformats
e-Mail
Tagging and social bookmarks
Source of a lot of this information
Information trapping book
Information Trapping:
Real-Time Research on
the Web. Tara Calishain.
(2006)
ISBN: 0321491718
Why trapping?
Suppose you need to keep up to date with a
given technology
You could
Subscribe to various specialty magazines and e-news
letters
Use search engines to methodically obtain information
Searching is so 1990s
Much of the information available on the WWW
today differs from what was there yesterday
Why not set up RSS feeds and other traps?
Once these are established, you review the results
periodically
Huh?
Consider the process (contrast with a search)
1. Examine your subject and carefully develop
search queries
2. Evaluate places to search
3. Establish your queries
4. Receive and periodically evaluate the results
The initial process is more time consuming
It is not as easy to tweak the traps as it is to
modify a search query
However, once you have the traps set, you can
collect results for months or years
Simple Example
Initial questions
What is the topic you are interested in?
What are the likely sources of information on this
topic?
This likely includes questions such as what and
where (in the event a geographic locality is involved
or you wish to focus your results on particular
institutions or individuals)
How frequently do you wish to receive results?
How do you want to receive the results?
Do you prefer e-mail, RSS feeds, or something else?
RSS fundamentals
Wikipedia definition (slightly modified)
“Family of web feed formats used to
publish frequently updated content such
as blog entries, news headlines, or
podcasts.”
“An RSS document, which is called a ‘feed,’
‘web feed,’ or ‘channel,’ contains either a summary
of content from an associated web site or the full
text. RSS makes it possible for people to keep up
with their favorite web sites in an automated
manner that's easier than checking them manually.”
RSS fundamentals
Consider current versions of Firefox – subscribe
to this page (instead of bookmark this page)
Firefox addons
Wizz RSS - https://addons.mozilla.org/en-US/firefox/addon/424
Purpose: to read and manage RSS feeds
Useful for a small number of feeds
Perhaps only critical ones
Public and Private
Need Wizz account for latter
Limited security
Firefox addons
Sage - https://addons.mozilla.org/en-US/firefox/addon/77
No need for an account
Linked to Technorati
(see what others link to
for items of interest)
A lightweight alternative
(like Wizz)
Web based RSS readers
http://www.bloglines.com/
Lots of options
http://www.newsburst.com/
Part of CNET
Can use OPML (Outline Processor Markup Language),
an XML-based file format for importing and exporting
lists of RSS feeds
http://www.google.com/reader
Public page if you want to share
http://www.feedbucket.com/
http://reader.rocketinfo.com/desktop/
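Moving a list of feeds between readers with OPML can be sketched in a few lines of Python. This is a minimal illustration, not any reader's actual export format: the feed titles and example.com URLs are made up, and real exports usually carry extra attributes.

```python
import xml.etree.ElementTree as ET


def feeds_to_opml(feeds):
    """Build an OPML document (as a string) from (title, feed_url) pairs."""
    opml = ET.Element("opml", version="1.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = "My traps"
    body = ET.SubElement(opml, "body")
    for title, url in feeds:
        # Each feed becomes one <outline> element; xmlUrl holds the feed address.
        ET.SubElement(body, "outline", type="rss", text=title, xmlUrl=url)
    return ET.tostring(opml, encoding="unicode")


def opml_to_feeds(opml_text):
    """Read (title, feed_url) pairs back out of an OPML document."""
    root = ET.fromstring(opml_text)
    return [
        (outline.get("text"), outline.get("xmlUrl"))
        for outline in root.iter("outline")
        if outline.get("xmlUrl")
    ]


# Hypothetical feeds, used only to show the round trip.
my_feeds = [
    ("Ant news", "http://www.example.com/ants.rss"),
    ("Web standards", "http://www.example.com/standards.rss"),
]
exported = feeds_to_opml(my_feeds)
imported = opml_to_feeds(exported)
```

Exporting from one reader and importing into another is the same round trip, with the file on disk in between.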
Client side RSS readers
http://www.jwizz.com/ - may recall Wizz
Java based version for desktop
http://www.superwaba.com.br/en/default.asp
Mobile device RSS reader (based on Wizz)
http://www.sharpreader.net/
Requires .Net platform
There are many others, but a fair number cost money
NetNewsWire (for Mac) $29.95
NewzCrawler (for Windows) $24.95
Ok, now I have the software…
So what?
First need to identify possible sources of
information (next slide)
Need to understand the technology so you
can use it effectively
Some sites are updated frequently, others not very
often
Before you try to set up traps to monitor sites, I
recommend you understand the capabilities of the
technology and the nuances of the sites you plan
to monitor
RSS fundamentals
Sources of feeds
http://www.newsgator.com/
http://feedster.com/
http://www.syndic8.com/
http://newsisfree.com/
http://technorati.com/blogs (for weblogs)
http://2rss.com/
http://www.rss-network.com/
Types of feeds
Static
Keyword based
What does RSS look like?
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="http://rss.cnn.com/~d/styles/rss2full.xsl" type="text/xsl" media="screen"?>
<?xml-stylesheet href="http://rss.cnn.com/~d/styles/itemcontent.css" type="text/css" media="screen"?>
<rss xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">
<channel>
<title>CNN.com</title>
<link>http://www.cnn.com/?eref=rss_topstories</link>
<description>CNN.com delivers up-to-the-minute news and information on the latest top stories, weather, entertainment, politics and more.</description>
<language>en-us</language>
<copyright>© 2007 Cable News Network LP, LLLP.</copyright>
<pubDate>Sun, 04 Nov 2007 12:22:34 EST</pubDate>
<ttl>5</ttl>
<image>
<title>CNN.com</title>
<link>http://www.cnn.com/?eref=rss_topstories</link>
<url>http://i.cnn.net/cnn/.element/img/1.0/logo/cnn.logo.rss.gif</url>
</image>
<!-- item elements omitted from this excerpt -->
</channel>
</rss>
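What a feed reader does with a document like this can be sketched with Python's standard library. The sample feed below is a trimmed-down, hypothetical channel modeled on the excerpt; the two item entries are invented for illustration and are not real CNN content.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: same element names as the excerpt, made-up content.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://www.example.com/</link>
    <description>Sample channel</description>
    <ttl>5</ttl>
    <item>
      <title>First headline</title>
      <link>http://www.example.com/1</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://www.example.com/2</link>
    </item>
  </channel>
</rss>"""


def read_feed(xml_text):
    """Return the channel title, its ttl in minutes, and (title, link) per item."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    items = [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]
    return {
        "title": channel.findtext("title"),
        # ttl tells a reader how many minutes it may cache the feed.
        "ttl_minutes": int(channel.findtext("ttl", default="0")),
        "items": items,
    }


feed = read_feed(SAMPLE_FEED)
```

A reader repeats this on a schedule and shows only the items it has not seen before.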
Page Monitors
Isn’t RSS enough?
Sometimes the content is not available via RSS
Sometimes you only need a little information
What is a page monitor?
Automated tool that takes a “snapshot” of a web
page
Returns later and takes another
Compares the two and reports on differences
Can have false positives (perhaps someone changed the
spelling)
Web based or client side tools
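The snapshot-and-compare idea can be sketched as follows. This is a toy version under simple assumptions: the two snapshots are already plain text, and the sample page content is invented. Real page monitors also fetch the page, strip markup, and store snapshots between visits.

```python
import difflib
import hashlib


def fingerprint(page_text):
    """Hash a snapshot so two visits can be compared cheaply."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def report_changes(old_text, new_text):
    """Return the removed and added lines between two snapshots."""
    if fingerprint(old_text) == fingerprint(new_text):
        return []  # nothing changed since the last visit
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff marks removed lines with "- " and added lines with "+ ".
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Hypothetical snapshots of the same page on two visits.
yesterday = "Price: $50\nFree trial: 30 days"
today = "Price: $45\nFree trial: 30 days"
changes = report_changes(yesterday, today)
```

Because any textual difference is reported, a corrected spelling shows up exactly like a real update, which is where the false positives come from.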
Page Monitors (2)
Web based
http://watchthatpage.com/
Free, must register
Has been reported on some blacklists
http://trackengine.com/
Free (for up to 5 sources)
http://changedetect.com/
Free (up to 5 sources)
http://www.changedetection.com/monitor.html
Somewhat limited options (no frequency of monitoring)
http://www.pagehammer.com/
Free as well
Page Monitors (3)
Desktop
http://aignes.com/ (Website-Watcher)
Free trial (relatively inexpensive)
http://www.copernic.com/en/products/tracker/
$50 (free 30 day trial)
http://www.safe-install.com/programs/internet-owl.html (Internet Owl)
Free
Mac
http://chaoticsoftware.com/ProductPages/WebWatcher.html (Web Watcher)
$20 shareware
e-Mail alerts
Why?
May want to monitor entire sites (not just selected
pages with a page monitor)
Most don’t have as many false positives as page
monitors
May want to have content sent to places other than
your computer (perhaps a cell phone)
There are quite a few of these
Search for entomology ("email alerts" OR "e-mail
alerts") gave me 257,000 possibilities
e-Mail alerts sites
http://www.google.com/alerts is one
http://alerts.yahoo.com is another
http://googlealert.com/ (not affiliated with Google –
this site came before Google alerts)
Microformats
These are small islands of HTML data
Yes, HTML is a data type these days, just like XML,
SQL databases and so forth
As long as everyone agrees on underlying format/
names
Can actually use these to interchange data
Microformats
Solve a specific problem
Have a low barrier to entry
Design for humans first, machines second
Reuse building blocks from existing standards
Are modular and can be embedded in web pages
Encourage decentralized content and services
Types of microformats
hCard – for marking up contact information for people
and organizations
hCalendar – for marking up event information for
meetings and conferences
hReview – for marking up reviews including products
and events
Example sites
http://corkd.com/ - wine reviews (hReview), contact
(hCard)
http://flickr.com/ - profiles (hCard)
http://www.last.fm/ - concerts (hCalendar)
http://upcoming.yahoo.com/ - events (hCalendar),
profiles (hCard)
Microformats
Desire to re-use bits of HTML
http://microformats.org/ (good reference site)
Operator (Firefox add-on) -
https://addons.mozilla.org/en-US/firefox/addon/4106
Dreamweaver microformats extension
http://www.webstandards.org/action/dwtf/microformats/
Consider a few examples
hCard (for people and organizations)
http://microformats.org/code/hcard/creator
hCard creator
Microformats.org with Operator
Screen capture below
hCard Example
<div id="hcard-Mark-DuBois" class="vcard">
<a class="url fn"
href="http://www.markdubois.info">Mark DuBois</a>
<div class="org">WOW</div>
<div class="adr">
<div class="street-address">1 College Drive</div>
<span class="locality">East Peoria</span>,
<span class="region">IL</span>,
<span class="postal-code">61635</span>
<span class="country-name">USA</span>
</div>
</div>
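Because the class names are agreed on, a program can pull the contact data back out of the page. The sketch below uses Python's built-in HTML parser on the card above. It is a simplified reader that only handles the properties listed in HCARD_FIELDS and assumes every tag is closed; it is not a full microformats parser.

```python
from html.parser import HTMLParser

# The hCard properties this sketch looks for (a subset of the format).
HCARD_FIELDS = {
    "fn", "org", "street-address", "locality",
    "region", "postal-code", "country-name",
}

HCARD_HTML = """<div id="hcard-Mark-DuBois" class="vcard">
<a class="url fn" href="http://www.markdubois.info">Mark DuBois</a>
<div class="org">WOW</div>
<div class="adr">
<div class="street-address">1 College Drive</div>
<span class="locality">East Peoria</span>,
<span class="region">IL</span>,
<span class="postal-code">61635</span>
<span class="country-name">USA</span>
</div>
</div>"""


class HCardParser(HTMLParser):
    """Collect the text of elements whose class names are hCard properties."""

    def __init__(self):
        super().__init__()
        self.card = {}
        self._stack = []  # one entry per open element: its hCard properties

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self._stack.append([c for c in classes if c in HCARD_FIELDS])

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack:
            # Text belongs to the innermost open element only.
            for field in self._stack[-1]:
                self.card[field] = self.card.get(field, "") + text


parser = HCardParser()
parser.feed(HCARD_HTML)
card = parser.card
```

This is roughly what a tool such as Operator does when it offers to export a contact from a page.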
hCalendar
http://microformats.org/code/hcalendar/creator
Compact Code example
<div class="vevent" id="hcalendar-WOW-Meeting">
<abbr class="dtstart" title="20071101">November 1st</abbr> &mdash;
<abbr class="dtend" title="20071103">2nd, 2007</abbr>
<span class="summary">WOW Meeting</span> &mdash; at
<span class="location">Las Vegas</span>
<div class="description">Review established curriculum model and current technology trends as they affect web curricula</div>
</div>
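Note that the machine-readable end date (20071103) is one day later than the visible text (2nd). hCalendar follows the iCalendar convention in which the end of a whole-day event is exclusive. A short sketch of the date arithmetic, with helper names that are purely illustrative:

```python
from datetime import date, datetime, timedelta


def parse_hcal_date(value):
    """Turn a date-only title value such as '20071101' into a date."""
    return datetime.strptime(value, "%Y%m%d").date()


def event_days(dtstart, dtend):
    """Return (first day, last day, number of days) for a whole-day event."""
    start = parse_hcal_date(dtstart)
    end_exclusive = parse_hcal_date(dtend)
    # dtend is exclusive, so the last day of the event is the day before it.
    last_day = end_exclusive - timedelta(days=1)
    return start, last_day, (end_exclusive - start).days


first, last, length = event_days("20071101", "20071103")
```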
Queries
Yes, we also will need to use search engines
We need to verify we have the desired sites (and have
not overlooked something)
How should I actually create queries to obtain needed
information?
Many just plug a couple of words into the text input box
Consider
Using unique language – instead of ants (which turns up
Java related terms in addition to insects), I might look for
Formicidae
Use more words (I believe Google has a limit of 32
words)
How many have ever approached that limit?
Try to be as narrow as possible
Searching
Basic syntax
Caterpillar -tractor (a minus sign directly in front of a
word excludes pages containing that word from the results)
JavaScript tutorials examples – since I did not specify,
most search engines today assume a Boolean AND is
between each word
Sidebar - http://www.googlewhack.com/
Special searching syntax
intitle:keyword (for Google and Yahoo) – word must be
in title
inurl:keyword (for Google and Yahoo) – word must be
in URL
site:domain (.edu, .com, etc.) – might help if looking for
academic information
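Once a query works, it helps to build it the same way every time so it can be reused as a trap. The sketch below assembles the operators from this slide into a query string and a search URL. The function names are illustrative, and the base URL with its q parameter is an assumption about the search engine's address format.

```python
from urllib.parse import urlencode


def build_query(words, exclude=(), intitle=None, inurl=None, site=None):
    """Combine plain words and special syntax into one query string."""
    parts = list(words)
    parts += ["-" + word for word in exclude]  # minus sign excludes a word
    if intitle:
        parts.append("intitle:" + intitle)
    if inurl:
        parts.append("inurl:" + inurl)
    if site:
        parts.append("site:" + site)
    return " ".join(parts)


def search_url(query, base="http://www.google.com/search"):
    """Encode the query for use in a link or bookmark (base URL is assumed)."""
    return base + "?" + urlencode({"q": query})


query = build_query(["caterpillar"], exclude=["tractor"], site=".edu")
url = search_url(query)
```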
Tags and conversations
Tags – keyword someone uses to describe a resource
in a directory
People often build a folksonomy (or collaborative
tagging)
http://en.wikipedia.org/wiki/Folksonomy
These are not full descriptions, only a few words
Conversations – discussions on mailing lists or
forums
There are specialty search engines which index
conversations
http://www.omgili.com/ is an example
Why treat these differently?
Language
Searching within tags
Potentially working with huge datasets
Consider that by 2010 Gartner Group estimates
there will be 1 zettabyte of information generated
annually
10 to the 21st power bytes
(2 to the 70th power bytes in binary terms, about 18% more)
“Grains of sand”
A lot of this information is in the form of audio,
images, and video
This is why tagging has become so popular –
it helps people find content that is not text
What do we look for?
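The slide gives both 2 to the 70th and 10 to the 21st as the size of a zettabyte. A quick check shows the two figures are close but not equal:

```python
decimal_zettabyte = 10 ** 21  # SI definition, in bytes
binary_zettabyte = 2 ** 70    # binary counterpart (a zebibyte), in bytes

# How much larger the binary figure is than the decimal one.
ratio = binary_zettabyte / decimal_zettabyte
```

The binary figure is roughly 18% larger, which does not change the point: either way it is far more than anyone can search by hand.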
Searching within tags (2)
Tags are only a couple of words
Consider that you can look for different levels of
information
Insects
Ants
Labor Day ants
Lasius neoniger
Last one might be appropriate for website search
but is probably too specific for tag search
Try to stay simple and general
Searching within conversations
Create queries that reflect how you would discuss
a topic
If you are interested in professional
conversations, use their vocabulary
Many of the conversation search sites have
advanced search options
Use them
Example on next slide
Advanced search - conversations
Tagging information
Might want to use some existing sites as well
http://del.icio.us/ (doesn’t look or act like a search
engine)
Yes, there is a search box, but also try
http://del.icio.us/tag/keyword1+keyword2
http://www.spurl.net/
http://www.blinklist.com/
http://rawsugar.com/
http://technorati.com/tag
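The del.icio.us tag address above follows a simple pattern, so tag traps can be generated from a list of keywords. A minimal sketch (the function name is illustrative):

```python
from urllib.parse import quote


def delicious_tag_url(*keywords):
    """Build a del.icio.us address that matches items carrying every tag."""
    encoded = [quote(keyword.strip(), safe="") for keyword in keywords]
    return "http://del.icio.us/tag/" + "+".join(encoded)


general = delicious_tag_url("insects")
narrower = delicious_tag_url("ants", "insects")
```

Starting with one general tag and adding a second only if the results are too broad fits the advice to stay simple and general.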
Filtering the input
You may have set a number of traps
RSS feeds can be organized in the software itself
eMail tends to accumulate and may hinder your
best efforts to control it
One alternative is Gmail
Lot of storage space (4.5 GB at this time)
Good filtering ability
Excellent anti-spam capabilities
Great searching capabilities
Can also create multiple addresses
[email protected] – send me a message
sometime
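One way to get multiple addresses from a single Gmail account is plus addressing: mail sent to name+tag@gmail.com arrives in the inbox for name@gmail.com. Assuming that is what the bullet above refers to, each trap can have its own address and a filter can label mail by the tag. The account name and subjects below are placeholders, and the grouping function only imitates what a Gmail filter would do.

```python
def trap_address(account, tag):
    """Return the plus address for a trap, e.g. name+tag@gmail.com."""
    local, domain = account.split("@", 1)
    return f"{local}+{tag}@{domain}"


def trap_tag(recipient):
    """Return the tag part of a plus address, or None if there is none."""
    local = recipient.split("@", 1)[0]
    if "+" not in local:
        return None
    return local.split("+", 1)[1]


def sort_by_trap(messages):
    """Group (recipient, subject) pairs by tag, like one label per filter."""
    labels = {}
    for recipient, subject in messages:
        labels.setdefault(trap_tag(recipient) or "inbox", []).append(subject)
    return labels


# Placeholder account and messages, for illustration only.
ants_address = trap_address("yourname@gmail.com", "ants")
sorted_mail = sort_by_trap([
    (ants_address, "Alert: Formicidae"),
    ("yourname@gmail.com", "Lunch on Friday?"),
    ("yourname+rss@gmail.com", "Feed digest"),
])
```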
Gmail example
Results of filter for mail sent to
[email protected]
Gmail example (2)
Setting the filter
Gmail example (3)
Setting the filter – part 2
Gmail example (4)
Searching
Show search options
Note that when you select a label, you are doing a
search in your inbox
Organizing the information
Consider starting with a simple text editor
I use Notepad++
http://notepad-plus.sourceforge.net/uk/site.htm
Multiple sources of information
If you use a tool like MS-Word, you get all sorts of
formatting (yes, you can deactivate it, but it can be
a pain)
Could also use a wiki (a portable one is TiddlyWiki)
http://www.tiddlywiki.com/
That is how I will provide all these links
Can download from
http://www.markdubois.info/IBEA/
Some of the items we covered
RSS
Web page monitors
eMail alerts
Microformats
Queries
Searching
Tags
Conversations
Filtering the results
Organizing the results
References
Information trapping book
Information Trapping:
Real-Time Research on
the Web. Tara Calishain.
(2006)
ISBN: 0321491718
Microformats: Empowering
Your Markup for Web 2.0
John Allsopp (2007)
ISBN 1590598148
Information Trapping
Mark DuBois
Illinois Central College
[email protected]
[email protected]