Information Trapping

Mark DuBois
Illinois Central College
[email protected]
Your background
 Why are you here?
 What do you hope to gain from this presentation?
 What do you know about?
 RSS feeds (live bookmarks)
 Micro-formats
 e-Mail
 Tagging and social bookmarks
Source of a lot of this information
 Information trapping book
Information Trapping:
Real-Time Research on
the Web. Tara Calishain.
(2006)
ISBN: 0321491718
Why trapping?
 Suppose you need to keep up to date with a
given technology
 You could
 Subscribe to various specialty magazines and e-news
letters
 Use search engines to methodically obtain information
 Searching is so 1990s
 Much of the information available on the WWW
today differs from what was there yesterday
 Why not set up RSS feeds and other traps?
 Once these are established, you review the results
periodically
Huh?
 Consider the process (contrast with a search)
1. Examine your subject and carefully develop
search queries
2. Evaluate places to search
3. Establish your queries
4. Receive and periodically evaluate the results
 The initial process is more time consuming
 It is not as easy to tweak the traps as it is to
modify a search query
 However, once you have the traps set, you can
collect results for months or years
Simple Example
Initial questions
 What is the topic you are interested in?
 What are the likely sources of information on this
topic?
 This likely includes questions such as what and
where (in the event a geographic locality is involved
or you wish to focus your results on particular
institutions or individuals)
 How frequently do you wish to receive results?
 How do you want to receive the results?
 Do you prefer e-mail, RSS feeds or what?
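The answers to these questions can be written down as a small record per trap. The sketch below is only an illustration: the Trap class, its field names, and the sample values are hypothetical and do not come from the book or from any tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Trap:
    """One information trap: what to watch, where, how often, how it arrives."""
    topic: str                                         # topic of interest
    sources: List[str] = field(default_factory=list)   # likely sources
    frequency: str = "daily"                           # how often you want results
    delivery: str = "rss"                              # "rss" or "email"

    def summary(self) -> str:
        return (f"{self.topic}: {len(self.sources)} source(s), "
                f"{self.frequency} via {self.delivery}")


# Hypothetical example values for an entomology topic.
ant_trap = Trap(
    topic="Formicidae",
    sources=["http://www.google.com/alerts"],
    frequency="weekly",
    delivery="email",
)
print(ant_trap.summary())
```

Writing the trap down this way makes it easier to review later which traps exist and how each one reports.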
RSS fundamentals
 WikiPedia definition (slightly modified)
 “Family of web feed formats used to
publish frequently updated content such
as blog entries, news headlines, or
podcasts.”
 “An RSS document, which is called a ‘feed,’
‘web feed,’ or ‘channel,’ contains either a summary
of content from an associated web site or the full
text. RSS makes it possible for people to keep up
with their favorite web sites in an automated
manner that's easier than checking them manually.”
RSS fundamentals
 Consider current versions of Firefox – subscribe
to this page (instead of bookmark this page)
Firefox addons
 Wizz RSS - https://addons.mozilla.org/en-US/firefox/addon/424
 Purpose: to read and manage
RSS feeds
 Useful for a small number
of feeds
 Perhaps only critical ones
 Public and Private
 Need Wizz account for latter
 Limited security
Firefox addons
 Sage - https://addons.mozilla.org/en-US/firefox/addon/77
 No need for an account
 Linked to Technorati
(see what others link to
for items of interest)
 A lightweight alternative
(like Wizz)
Web based RSS readers
 http://www.bloglines.com/
 Lots of options
 http://www.newsburst.com/
 Part of CNET
 Can use OPML (Outline Processor Markup Language) –
an XML-based file format that allows importing/exporting of RSS
feeds
 http://www.google.com/reader
 Public page if you want to share
 http://www.feedbucket.com/
 http://reader.rocketinfo.com/desktop/
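Since OPML is plain XML, moving a feed list between readers amounts to writing and reading outline elements. The sketch below is a minimal illustration using Python's standard library; the function names and the example.com feed address are made up, and real readers may add further attributes.

```python
import xml.etree.ElementTree as ET


def build_opml(title, feeds):
    """Return an OPML document (as a string) listing the given feeds.

    feeds is a list of (name, feed_url) pairs.
    """
    opml = ET.Element("opml", version="1.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for name, url in feeds:
        # One <outline> per feed; xmlUrl holds the feed address.
        ET.SubElement(body, "outline", type="rss", text=name, xmlUrl=url)
    return ET.tostring(opml, encoding="unicode")


def read_opml(opml_text):
    """Return the (name, feed_url) pairs found in an OPML document."""
    root = ET.fromstring(opml_text)
    return [(o.get("text"), o.get("xmlUrl"))
            for o in root.iter("outline") if o.get("xmlUrl")]


# Hypothetical feed list, exported and then re-imported.
exported = build_opml("My traps", [("Example feed", "http://example.com/feed.xml")])
print(read_opml(exported))
```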
Client side RSS readers
 http://www.jwizz.com/ - you may recall Wizz
 Java based version for desktop
 http://www.superwaba.com.br/en/default.asp
 Mobile device RSS reader (based on Wizz)
 http://www.sharpreader.net/
 Requires .Net platform
 There are many others, but a fair number are not free
 NetNewsWire (for Mac) $29.95
 NewzCrawler (for Windows) $24.95
Ok, now I have the software…
 So what?
 First need to identify possible sources of
information (next slide)
 Need to understand the technology so you
can use it effectively
 Some sites are updated frequently; others, not very
often
 Before you try to set up traps to monitor sites, I
recommend you understand the capabilities of the
technology and the nuances of the sites you plan
to monitor
RSS fundamentals
 Sources of feeds
 http://www.newsgator.com/
 http://feedster.com/
 http://www.syndic8.com/
 http://newsisfree.com/
 http://technorati.com/blogs (for weblogs)
 http://2rss.com/
 http://www.rss-network.com/
 Types of feeds
 Static
 Keyword based
What does RSS look like?

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="http://rss.cnn.com/~d/styles/rss2full.xsl" type="text/xsl" media="screen"?>
<?xml-stylesheet href="http://rss.cnn.com/~d/styles/itemcontent.css" type="text/css" media="screen"?>
<rss xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">
  <channel>
    <title>CNN.com</title>
    <link>http://www.cnn.com/?eref=rss_topstories</link>
    <description>CNN.com delivers up-to-the-minute news and information on the latest top stories, weather, entertainment, politics and more.</description>
    <language>en-us</language>
    <copyright>© 2007 Cable News Network LP, LLLP.</copyright>
    <pubDate>Sun, 04 Nov 2007 12:22:34 EST</pubDate>
    <ttl>5</ttl>
    <image>
      <title>CNN.com</title>
      <link>http://www.cnn.com/?eref=rss_topstories</link>
      <url>http://i.cnn.net/cnn/.element/img/1.0/logo/cnn.logo.rss.gif</url>
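A feed reader does little more than parse this XML and list the items. The sketch below shows the idea with Python's standard library. SAMPLE_FEED is a made-up, shortened document that reuses a few channel fields from the excerpt above and adds one hypothetical item; read_feed is an illustrative name, not part of any reader.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 document: channel fields as in the excerpt, plus one
# hypothetical <item> so there is something to list.
SAMPLE_FEED = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>CNN.com</title>
    <link>http://www.cnn.com/?eref=rss_topstories</link>
    <language>en-us</language>
    <ttl>5</ttl>
    <item>
      <title>Example headline</title>
      <link>http://example.com/story</link>
    </item>
  </channel>
</rss>"""


def read_feed(xml_bytes):
    """Return the channel title, its ttl (minutes), and (title, link) per item."""
    root = ET.fromstring(xml_bytes)
    channel = root.find("channel")
    items = [(item.findtext("title"), item.findtext("link"))
             for item in channel.findall("item")]
    return {
        "title": channel.findtext("title"),
        "ttl": int(channel.findtext("ttl", "0")),
        "items": items,
    }


feed = read_feed(SAMPLE_FEED.encode("utf-8"))
print(feed["title"], feed["ttl"], feed["items"])
```

The ttl value tells a reader how many minutes it may cache the feed before checking again, which matters when deciding how often a trap needs review.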
Page Monitors
 Isn’t RSS enough?
 Sometimes the content is not available via RSS
 Sometimes you only need a little information
 What is a page monitor?
 Automated tool that takes a “snapshot” of a web
page
 Returns later and takes another
 Compares the two and reports on differences
 Can have false positives (for example, someone only corrected a
spelling)
 Web based or client side tools
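The snapshot-and-compare step can be sketched in a few lines. This is only an illustration of the idea, not how any of the tools listed in this talk works; changed_lines and the sample page text are hypothetical, and fetching the page is left out.

```python
import difflib


def changed_lines(old_snapshot, new_snapshot):
    """Return the lines removed from ("- ") or added to ("+ ") a page."""
    diff = difflib.ndiff(old_snapshot.splitlines(), new_snapshot.splitlines())
    # ndiff also emits unchanged ("  ") and hint ("? ") lines; keep only changes.
    return [line for line in diff if line.startswith(("+ ", "- "))]


# Two made-up snapshots of the same page, taken at different times.
yesterday = "Welcome\nPrice: $10\nContact us"
today = "Welcome\nPrice: $12\nContact us"

for line in changed_lines(yesterday, today):
    print(line)
```

A corrected spelling would be reported in exactly the same way as the price change here, which is where the false positives come from.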
Page Monitors (2)
 Web based
 http://watchthatpage.com/
 Free, must register
 Has been reported on some blacklists
 http://trackengine.com/
 Free (for up to 5 sources)
 http://changedetect.com/
 Free (up to 5 sources)
 http://www.changedetection.com/monitor.html
 Somewhat limited options (no frequency of monitoring)
 http://www.pagehammer.com/
 Free as well
Page Monitors (3)
 Desktop
 http://aignes.com/ (Website-Watcher)
 Free trial (relatively inexpensive)
 http://www.copernic.com/en/products/tracker/
 $50 (free 30 day trial)
 http://www.safe-install.com/programs/internet-owl.html (Internet Owl)
 Free
 Mac
 http://chaoticsoftware.com/ProductPages/WebWatcher.html (Web Watcher)
 $20 shareware
e-Mail alerts
 Why?
 May want to monitor entire sites (not just selected
pages with a page monitor)
 Most don’t have as many false positives as page
monitors
 May want to have content sent to places other than
your computer (perhaps a cell phone)
 There are quite a few of these
 Search for entomology ("email alerts" OR "e-mail
alerts") gave me 257,000 possibilities
e-Mail alerts sites
 http://www.google.com/alerts is one
 http://alerts.yahoo.com is another
 http://googlealert.com/ (not affiliated with Google –
this site came before Google alerts)
Microformats
 These are small islands of HTML data
 Yes, HTML is a data type these days, just like XML,
SQL databases and so forth
 As long as everyone agrees on underlying format/
names
 Can actually use these to interchange data
Microformats
 Solve a specific problem
 Have a low barrier to entry
 Design for humans first, machines second
 Reuse building blocks from existing standards
 Are modular and can be embedded in web pages
 Encourage decentralized content and services
Types of microformats
 hCard – for marking up contact information for people
and organizations
 hCalendar – for marking up event information for
meetings and conferences
 hReview – for marking up reviews including products
and events
 Example sites
 http://corkd.com/ - wine reviews (hReview), contact
(hCard)
 http://flickr.com/ - profiles (hCard)
 http://www.last.fm/ - concerts (hCalendar)
 http://upcoming.yahoo.com/ - events (hCalendar),
profiles (hCard)
Microformats
 Desire to re-use bits of HTML
 http://microformats.org/ (good reference site)
 Operator (Firefox add-on) -
https://addons.mozilla.org/en-US/firefox/addon/4106
 Dreamweaver microformats extension
 http://www.webstandards.org/action/dwtf/microformats/
 Consider a few examples
 hCard (for people and organizations)
 http://microformats.org/code/hcard/creator
 hCard creator
Microformats.org with Operator
 Screen capture below
hCard Example
<div id="hcard-Mark-DuBois" class="vcard">
<a class="url fn"
href="http://www.markdubois.info">Mark DuBois</a>
<div class="org">WOW</div>
<div class="adr">
<div class="street-address">1 College Drive</div>
<span class="locality">East Peoria</span>,
<span class="region">IL</span>,
<span class="postal-code">61635</span>
<span class="country-name">USA</span>
</div>
</div>
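Because the class names are agreed on, a program can pull the contact data back out of the markup. The sketch below does this with Python's standard HTML parser. HCardParser and HCARD_FIELDS are illustrative names, the field list covers only the properties used on this slide, and the parser assumes every tag is closed (no img or br elements).

```python
from html.parser import HTMLParser

# Only the hCard properties that appear in the example above.
HCARD_FIELDS = {"fn", "org", "street-address", "locality", "region",
                "postal-code", "country-name"}

HCARD_HTML = """<div id="hcard-Mark-DuBois" class="vcard">
<a class="url fn" href="http://www.markdubois.info">Mark DuBois</a>
<div class="org">WOW</div>
<div class="adr">
<div class="street-address">1 College Drive</div>
<span class="locality">East Peoria</span>,
<span class="region">IL</span>,
<span class="postal-code">61635</span>
<span class="country-name">USA</span>
</div>
</div>"""


class HCardParser(HTMLParser):
    """Collect the text of elements whose class names are hCard properties."""

    def __init__(self):
        super().__init__()
        self.card = {}
        self._open = []  # per open element: the hCard properties it carries

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self._open.append([c for c in classes if c in HCARD_FIELDS])

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._open:
            # Text belongs to the innermost open element only.
            for name in self._open[-1]:
                self.card[name] = self.card.get(name, "") + text


parser = HCardParser()
parser.feed(HCARD_HTML)
print(parser.card)
```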
hCalendar
 http://microformats.org/code/hcalendar/creator
 Compact Code example
<div class="vevent" id="hcalendar-WOW-Meeting">
<abbr class="dtstart" title="20071101">November 1st</abbr> &mdash;
<abbr class="dtend" title="20071103">2nd, 2007</abbr>
<span class="summary">WOW Meeting</span>&mdash; at
<span class="location">Las Vegas</span>
<div class="description">Review established curriculum model and current technology trends as they affect web curricula</div>
</div>
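Note that the machine-readable dtend (20071103) is one day later than the date shown to people (the 2nd). The sketch below assumes the end date is exclusive for date-only values, which is consistent with this example; event_days is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def event_days(dtstart, dtend):
    """Return (first_day, last_day) for date-only values written as YYYYMMDD.

    Assumes dtend is exclusive, so the last day is the day before dtend.
    """
    first = datetime.strptime(dtstart, "%Y%m%d").date()
    end = datetime.strptime(dtend, "%Y%m%d").date()
    return first, end - timedelta(days=1)


# Values taken from the title attributes in the example above.
first_day, last_day = event_days("20071101", "20071103")
print(first_day, "to", last_day)
```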
Queries
 Yes, we also will need to use search engines
 We need to verify we have the desired sites (and have
not overlooked something)
 How should I actually create queries to obtain needed
information?
 Many just plug a couple of words into the text input box
 Consider
 Using unique language – instead of ants (which turns up
Java related terms in addition to insects), I might look for
Formicidae
 Use more words (I believe Google has a limit of 32
words)
 How many have ever approached that limit?
 Try to be as narrow as possible
Searching
 Basic syntax
 Caterpillar -tractor (a minus sign directly in front of a
word excludes pages containing that word from the results)
 JavaScript tutorials examples – since I did not specify an operator,
most search engines today assume a Boolean AND
between the words
 Sidebar - http://www.googlewhack.com/
 Special searching syntax
 intitle:keyword (for Google and Yahoo) – word must be
in title
 inurl:keyword (for Google and Yahoo) – word must be
in URL
 site:domain (.edu, .com, etc.) – might help if looking for
academic information
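When the same query is reused as a trap, it helps to build it from parts instead of retyping it. The sketch below assembles a query from plain words and the operators on this slide. build_query and search_url are hypothetical helpers, and the base search URL is an assumption that may differ between engines.

```python
from urllib.parse import urlencode


def build_query(words, exclude=(), intitle=None, inurl=None, site=None):
    """Combine plain words and special operators into one query string."""
    parts = list(words)                          # implicit AND between words
    parts += ["-" + word for word in exclude]    # minus sign excludes a word
    if intitle:
        parts.append("intitle:" + intitle)
    if inurl:
        parts.append("inurl:" + inurl)
    if site:
        parts.append("site:" + site)
    return " ".join(parts)


def search_url(query, base="http://www.google.com/search"):
    """Return a search URL for the query (the base URL is an assumption)."""
    return base + "?" + urlencode({"q": query})


query = build_query(["Caterpillar"], exclude=["tractor"])
print(query)
print(search_url(query))
```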
Tags and conversations
 Tags – keyword someone uses to describe a resource
in a directory
 People often build a folksonomy (or collaborative
tagging)
 http://en.wikipedia.org/wiki/Folksonomy
 These are not full descriptions, only a few words
 Conversations – discussions on mailing lists or
forums
 There are specialty search engines which index
conversations
 http://www.omgili.com/ is an example
 Why treat these differently?
 Language
Searching within tags
 Potentially working with huge datasets
 Consider that Gartner Group estimates that, by 2010,
1 zettabyte of information will be generated
annually
 2 to the 70th power
 10 to the 21st power
 “Grains of sand”
 A lot of this information is in the form of audio,
images, and video
 This is why tagging has become so popular –
helpful to find
 What do we look for?
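The two powers quoted for a zettabyte are close but not equal: one is the binary figure and the other the decimal one. A quick check of the arithmetic:

```python
binary_zettabyte = 2 ** 70     # binary interpretation
decimal_zettabyte = 10 ** 21   # decimal interpretation

ratio = binary_zettabyte / decimal_zettabyte
print(binary_zettabyte)
print(decimal_zettabyte)
print(round(ratio, 2))         # the binary figure is about 18% larger
```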
Searching within tags (2)
 Tags are only a couple of words
 Consider that you can look for different levels of
information
 Insects
 Ants
 Labor Day ants
 Lasius neoniger
 Last one might be appropriate for website search
but is probably too specific for tag search
 Try to stay simple and general
Searching within conversations
 Create queries that reflect how you would discuss
a topic
 If you are interested in professional
conversations, use their vocabulary
 Many of the conversation search sites have
advanced search options
 Use them
 Example on next slide
Advanced search - conversations
Tagging information
 Might want to use some existing sites as well
 http://del.icio.us/ (doesn’t look or act like a search
engine)
 Yes, there is a search box, but also try
 http://del.icio.us/tag/keyword1+keyword2
 http://www.spurl.net/
 http://www.blinklist.com/
 http://rawsugar.com/
 http://technorati.com/tag
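The tag URL pattern shown above for del.icio.us can be generated for any combination of tags. A minimal sketch, assuming the pattern is exactly as written on the slide; delicious_tag_url is a hypothetical helper name.

```python
from urllib.parse import quote


def delicious_tag_url(*tags):
    """Build a tag-intersection URL of the form .../tag/keyword1+keyword2."""
    # Escape each tag separately so the "+" separators stay literal.
    return "http://del.icio.us/tag/" + "+".join(quote(tag, safe="") for tag in tags)


print(delicious_tag_url("insects", "ants"))
```

Short, general tags such as these tend to match more bookmarks than a species name would.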
Filtering the input
 You may have set a number of traps
 RSS feeds can be organized in the software itself
 eMail tends to accumulate and may hinder your
best efforts to control it
 One alternative is Gmail
 Lots of storage space (4.5 GB at this time)
 Good filtering ability
 Excellent anti-spam capabilities
 Great searching capabilities
 Can also create multiple addresses
 [email protected] – send me a message
sometime
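One way to get multiple addresses in Gmail is plus addressing, where anything after a "+" in the part before the "@" is ignored for delivery but can be used by a filter. The address on the slide is not legible in this transcript, so the sketch below uses made-up example.com addresses; address_tag and label_for are hypothetical helpers, not Gmail features.

```python
def address_tag(address):
    """Return the tag in a 'user+tag@domain' address, or None if there is none."""
    local, _, _domain = address.partition("@")
    _user, _, tag = local.partition("+")
    return tag or None


def label_for(address, default="Inbox"):
    """Pick a label for a message from the address it was sent to."""
    return address_tag(address) or default


# Hypothetical addresses: one tagged for traps, one plain.
print(label_for("someone+traps@example.com"))
print(label_for("someone@example.com"))
```

Giving each e-mail alert its own tagged address lets a single filter per tag keep trap results out of the main inbox.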
Gmail example
 Results of filter for mail sent to
[email protected]
Gmail example (2)
 Setting the filter
Gmail example (3)
 Setting the filter – part 2
Gmail example (4)
 Searching
 Show search options
 Note that when you select a label, you are doing a
search in your inbox
Organizing the information
 Consider starting with a simple text editor
 I use Notepad++
 http://notepad-plus.sourceforge.net/uk/site.htm
 Multiple sources of information
 If you use a tool like MS-Word, you get all sorts of
formatting (yes, you can deactivate it, but it can be
a pain)
 Could also use a wiki (a portable one is TiddlyWiki)
 http://www.tiddlywiki.com/
 That is the format I will use to provide all these links
 Can download from
http://www.markdubois.info/IBEA/
Some of the items we covered
 RSS
 Web page monitors
 eMail alerts
 Microformats
 Queries
 Searching
 Tags
 Conversations
 Filtering the results
 Organizing the results
References
 Information trapping book
Information Trapping:
Real-Time Research on
the Web. Tara Calishain.
(2006)
ISBN: 0321491718
 Microformats: Empowering
Your Markup for Web 2.0
John Allsopp (2007)
ISBN 1590598148