WWW and InfoSphere - Georgia Institute of Technology

WWW and Internet
CS 7450 - Information Visualization
March 3, 2005
John Stasko
Internet and WWW
• By nature, abstract, so good target for
visualization
• Often described in terms of metaphors
 “Information Superhighway”
Spring 2005
CS 7450
Agenda
• Two main topics
 Presentations of the Internet and WWW
Focus on topology and navigation, similar to the
graph visualization work
 Visual aids for browsing and using the WWW
and the Internet
Assistive visualizations not focusing on presenting
net structure and connectivity
1. Internet and WWW Topology
• Fundamentally, the Internet is a graph
with some existing physical topology,
though that is often not how we want to
conceptualize it
 Might think of it as having a structure
• Our discussions from graph visualization
are germane here
The Problem
Mukherjea & Foley
WWW ‘95
The Problem
• Websites simply are too big
• Huge graphs
• Layout is challenging
Step Back
• Why would someone want to visualize the
WWW?
Some Reasons
• Aid authors and webmasters with
production and organization of content
• Assist Web surfers making sense of the
information
• Help researchers understand the Web
Depictions of the Web
• Great web site that presents many
different conceptualizations of cyberspace
 Atlas of Cyberspace
http://www.cybergeography.org/atlas/
• Let’s take a few minutes to browse...
Mapping the Internet
• Bill Cheswick at AT&T
• Interesting visualizations plus the data
sets are available
• www.cs.bell-labs.com/who/ches/map/index.html
Internet Traffic Paths
www.caida.org/tools/measurement/skitter/
Mbone Map
www.cs.berkeley.edu/~elan/mbone/map.html
Immersive Systems
www.pnl.gov/remote/projects/starlight/
View of Web Site’s Pages
www.dynamicdiagrams.com/
Web Site
www.mos.ics.keio.ac.jp/NattoView
Web Site Visitations
www.inventix.com
Task Analysis
• Potential web-related tasks
 How and when has info been accessed?
 Where do people enter and spend time?
 How do they move about?
 What paths aren’t traversed?
 Where are they coming from?
 What has been added, changed, deleted?
 Do changes affect navigation patterns?
 Do we need to do a redesign?
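Several of these questions reduce to simple counts once requests are grouped into per-visitor sessions. A rough sketch, assuming sessions are already available as ordered lists of URLs and the site's links as (source, target) pairs; both inputs and all function names are hypothetical, not taken from any tool shown in the slides.

```python
from collections import Counter

def entry_pages(sessions):
    """Count how often each page is the first one requested in a session."""
    return Counter(s[0] for s in sessions if s)

def traversed_links(sessions):
    """Count consecutive (from, to) page pairs across all sessions."""
    pairs = Counter()
    for s in sessions:
        pairs.update(zip(s, s[1:]))
    return pairs

def untraversed_links(site_links, sessions):
    """Links that exist on the site but that no session followed."""
    return set(site_links) - set(traversed_links(sessions))
```

Entry counts speak to "Where do people enter?", and the set difference speaks to "What paths aren't traversed?".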
Data Set
• Each server request is a data case
• Example variables
 IP Address/Client host
 Timestamp
 URL requested
 HTTP status (success, not found, …)
 Bytes delivered
 Referencing URL (HTTP-Referrer)
 User agent (browser and OS info)
 ...
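A minimal sketch of turning one server request into a data case, assuming the server writes the widely used "combined" log layout (the slides do not say which format was used). The regular expression, the field names, and the sample line are illustrative assumptions.

```python
import re
from datetime import datetime

# Assumed layout: host ident user [time] "request" status bytes "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_request(line):
    """Return one data case (a dict) for a log line, or None if it does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    case = m.groupdict()
    case["time"] = datetime.strptime(case["time"], "%d/%b/%Y:%H:%M:%S %z")
    case["status"] = int(case["status"])
    # Servers commonly write "-" when no body was sent
    case["bytes"] = 0 if case["bytes"] == "-" else int(case["bytes"])
    return case

# Made-up sample line using reserved example addresses
SAMPLE = (
    '192.0.2.7 - - [03/Mar/2005:10:15:32 -0500] '
    '"GET /products/index.html HTTP/1.0" 200 5120 '
    '"http://www.example.com/" "Mozilla/4.0 (compatible; MSIE 6.0)"'
)
```

Each returned dict carries the variables listed above, so it can be loaded as one row into a general-purpose analysis tool.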
One Approach
• Use existing InfoVis tool (Eureka, Spotfire,
InfoZoom, etc.), load the data set, and
analyze it
• Get all the strengths and weaknesses of the
InfoVis tool for supporting particular
analysis tasks
Web Ecology
• Problem: Most visualizations of the web
fail to present the dynamically changing
ecology of users and documents on the
web
• What do we mean by ecology metaphor?
Chi, et al
CHI ‘98
Web Ecology
• By understanding set of relationships
(ecology) among users and their
information environment, and its change
through time (evolution) individuals can
better understand
 Web Content
 Layout of physical and topological space
 Usage through time
Existing Visualizations
• Despite useful functions, problems
 Difficulty visualizing large number of
documents
 Considerable amount of screen real-estate
used
 Only permits the visualization of a site at a
particular point in time, very difficult to make
comparisons across times
 No mechanisms provided that allow
differences in usage to be identified
Techniques
• Disk Tree
 Center-rooted tree that represents the
hyperlink structure of a web site
• Time Tube
 Set of disk trees that organizes and visualizes
the evolution of web sites
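One way a center-rooted layout could be computed is sketched here. The slides only say the tree is center-rooted; placing each level on its own ring and giving each subtree a wedge proportional to its leaf count are assumptions made for illustration, as are the function and parameter names.

```python
import math

def leaf_count(children, node):
    """Number of leaves in the subtree rooted at node."""
    kids = children.get(node, [])
    if not kids:
        return 1
    return sum(leaf_count(children, k) for k in kids)

def disk_layout(children, root, ring_spacing=1.0):
    """Place the root at the center and each tree level on its own ring.
    Every subtree receives a wedge proportional to its number of leaves."""
    positions = {}

    def place(node, depth, start, end):
        angle = (start + end) / 2.0          # center of this node's wedge
        radius = depth * ring_spacing
        positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
        kids = children.get(node, [])
        total = sum(leaf_count(children, k) for k in kids)
        cursor = start
        for k in kids:
            span = (end - start) * leaf_count(children, k) / total
            place(k, depth + 1, cursor, cursor + span)
            cursor += span

    place(root, 0, 0.0, 2.0 * math.pi)
    return positions
```

A Time Tube could then reuse the same positions for every time slice, so that a page stays in the same spot from one disk to the next.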
Task Application
• Visualizations designed to be useful for
 Local - Finding specific content
 Comparison - Comparing info at two places
 Global - Discovering a trend or pattern in the
site
Analysis Domain
• www.xerox.com, April ‘97
 7,588 items across a 30-day period
 889 new items
 Daily log kept of additions, modifications, and
deletions of content
 Base data comes from link info, usage log
from web servers
 Topological info from custom hyperlink
database
Disk Trees
• Interested in shortest number of hops
from one document to another
• Breadth-first traversal transforms the web
graph into a tree by placing the node as
close to the root node as possible
• After obtaining this tree we then visualize
the structure using the Disk Tree
technique
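The graph-to-tree step can be sketched as a plain breadth-first traversal. This is a minimal illustration, assuming the site's hyperlinks are given as a dict from each page to the pages it links to; the names are hypothetical.

```python
from collections import deque

def bfs_tree(links, root):
    """Turn a link graph into a tree: each page's parent is the page that
    first reaches it in a breadth-first traversal from the root."""
    parent = {root: None}
    depth = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in parent:      # first visit = fewest hops from root
                parent[target] = page
                depth[target] = depth[page] + 1
                queue.append(target)
    return parent, depth
```

Because pages are discovered in order of distance, every page ends up at the shallowest level it can reach, and all other links into it are simply dropped from the tree.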
Disk Tree
Lines - tree links
Line size & brightness - page access frequency
Color - page lifecycle stage
new: red
continued: green
deleted: yellow
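The encoding above could be expressed as a small lookup plus a scaling rule. A sketch only: the colors follow the legend, but the linear scaling of brightness and width with access frequency, the width range, and the function name are assumptions.

```python
# Life-cycle colors from the legend, as RGB triples
STAGE_COLORS = {
    "new": (255, 0, 0),        # red
    "continued": (0, 255, 0),  # green
    "deleted": (255, 255, 0),  # yellow
}

def link_style(stage, accesses, max_accesses, max_width=4.0):
    """Return (rgb, width) for a tree link.
    Brightness and width grow linearly with the page's share of accesses."""
    share = 0.0 if max_accesses <= 0 else min(accesses / max_accesses, 1.0)
    rgb = tuple(round(channel * share) for channel in STAGE_COLORS[stage])
    width = 1.0 + (max_width - 1.0) * share
    return rgb, width
```

With this rule an unused link fades to black and a heavily used one is drawn thick and fully saturated.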
Advantages
• Structure is compact, with pattern easily
recognizable
• When viewed straight on or at slight
angles, no occlusion problems, since
entire layout is on a 2-D plane
• Unlike cone trees, this 2-D representation
can utilize a third dimension for other
information, such as time
• Circularity pleasing to the eye
Time Tubes
• Time Tubes are multiple disk trees laid
out along a spatial axis
• Advantages
 By using a spatial axis to represent time, we
see information space-time in a single
visualization
 Focus and Context
 Possibility for Animation
Time Tubes
Key Point
• Pages that exist at any time during the studied
period are shown in every disk tree for that
period, even if they didn’t exist yet
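This point can be sketched as follows: take the union of all pages seen during the period, then give every time slice an entry for every page. The stage names reuse the disk tree legend; "absent", the treatment of the first snapshot as a baseline, and the function name are assumptions for illustration.

```python
def time_tube_slices(snapshots):
    """snapshots: list of sets of page URLs, one set per time step.
    Returns one dict per step mapping every page seen in the period to a stage."""
    all_pages = set().union(*snapshots)
    slices = []
    # Assumption: the first snapshot is the baseline, so its pages count as continued
    previous = snapshots[0] if snapshots else set()
    for current in snapshots:
        stage = {}
        for page in all_pages:
            if page in current:
                stage[page] = "continued" if page in previous else "new"
            elif page in previous:
                stage[page] = "deleted"
            else:
                stage[page] = "absent"   # still gets a slot so the disks line up
        slices.append(stage)
        previous = current
    return slices
```

Since every slice has the same keys, one layout can be shared by all disks, which makes changes between adjacent slices easier to spot.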
Real Use
• Time Tube answers the following questions:
 What devolved into dead wood? When did it?
Was there a correlation with the restructuring
of the web?
Product safety pages got darker and darker,
indicating lower usage
Doesn’t tell why page is less popular, just raises a
flag to explore page further
Real Use
• What evolved into a popular page? When
did it? Was there a correlation with the
restructuring of the Web site?
 Redesign of site called attention to Fact Book
page
 Became more popular and the corresponding
Disk Trees became greener and greener in
successive weeks
Real Use
• How was usage affected by items added
over time?
 Press release issued for new family of
products, shown as red links
 Usage in the third week jumped from 1
access to 871 accesses, which suggests
that this was probably a well-received
product line
Real Use
• How was usage affected by items deleted
over time?
 Change in removing direct link from home
page to main driver page did not negatively
affect the overall use of driver information
 Info stayed green indicating usage, but link
from home page was black, showing not
much traffic
E-Commerce Applications
• What if your focus is on understanding
user access patterns for web sites selling
products to consumers?
• What tasks are important?
One Approach
• Blue Martini Software
• Aggregate web data and visualize
simplified graph of user movements
through web site
• Highlight places where people leave
before purchasing
• ...
Brainerd & Becker
InfoVis ‘01
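The aggregation idea can be sketched without reference to the actual product. This is a rough illustration, assuming sessions are ordered lists of page names and that a single known page marks a purchase; the function names and inputs are hypothetical and do not describe Blue Martini's implementation.

```python
from collections import Counter

def movement_graph(sessions, purchase_page):
    """Aggregate sessions into edge counts plus, per page, how many
    sessions ended there without ever reaching the purchase page."""
    edges = Counter()
    visits = Counter()
    abandons = Counter()
    for pages in sessions:
        if not pages:
            continue
        visits.update(set(pages))            # count each session once per page
        edges.update(zip(pages, pages[1:]))  # consecutive page-to-page moves
        if purchase_page not in pages:
            abandons[pages[-1]] += 1
    return edges, visits, abandons

def abandon_rates(visits, abandons):
    """Share of visiting sessions that left the site from each page without buying."""
    return {page: abandons[page] / visits[page] for page in visits}
```

Pages with a high abandon rate are candidates for highlighting, and keeping only the pages with the largest visit counts gives the simplified graph.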
Different icons represent
different kinds of pages
Only show most-used pages
E-Commerce mimics
mall shopping :^)
Gender differences in
purchase paths at
websites
2. Aiding WWW Browsing
• Can we utilize information visualization
techniques to help people interact with
the WWW and the Internet?
• Battle “lost in hyperspace” problem
• Help us know what’s there
• Help us find things
WebBook and Web Forager
• Personal computers viewed as knowledge
processors before
 Spreadsheets and calculators
• Now viewed as knowledge sources,
portals to vast information worlds
 Networking and WWW
Card, Robertson and York
CHI ‘96
WWW Problems
• Pages are hard to find
• Users get lost, can’t relocate pages
• Difficulty organizing things once found
• Difficulty doing knowledge processing on
found things
• Interacting with web is too slow to
incorporate gracefully into other activities
Information Foraging Theory
• From Ecological Biology
• Idea: user stalks certain types of
information
• Users have tendency to interact
repeatedly with small clusters of
information (locality of reference)
• Information encountered at certain rate
 Users evolve to increase finding rate
 Sources evolve to be more attractive
Mechanisms Evolved
• 3 mechanisms in the evolution of the web
on the server side
 Indexes - Lycos search
 Table of contents - Yahoo
 Home pages provided by users with big lists
of related links
Assisting People
• To provide insight
 must support sensemaking
 restructuring
 recoding
• Hotlists are one mechanism in this
direction
Improvements
• WebBook and Web Forager try to do two
things to foster information sensemaking
 Move away from a single web page, and
group and manipulate related pages
 Move from a work environment containing a
single element to a workspace in which the
page is contained with multiple other entities,
including Web Books
WebBook
Features
• WebBook allows rapid interaction
with objects at a higher level of
aggregation than pages
• 3D book representation, uses animation
• Can ruffle through pages, leave
bookmarks
Applications
• Hot List books
• Topic books
• Search reports
• Book books
• ...
Web Forager
• Application that embeds the WebBook and
other objects in a hierarchical 3D
information workspace
• Workspace is intended to create patches
from the web where a high density of
relevant pages (grouped together in Web
Books) can be combined with rapid access
Constituents
• Hierarchical Workspace - 3 levels
 Focus Place - full page shown, direct
interaction
 Intermediate memory space - books or pages
placed when they are in use but not
immediate focus
 Tertiary space - Storage (bookcase)
Video
Discussion
• Strengths/Weaknesses
Data Mountain
• 3D document management system
• Prototype is an alternative to web browser
“bookmarks” or “favorites”
• Could be used for any kind of document
management
Robertson, et al
UIST ‘98
Make-Up
• 3D inclined plane in which thumbnails of
web pages are placed to serve as
favorites
• User is responsible for organization
• Uses smooth animation and audio to
assist interaction
Video
User Study
• Data Mountain versus IE4 “Favorites”
• Experienced IE4 users
• Stored 100 pages, then retrieved them
• DM fared about as well with “title” cue
• DM fared better for all other cues
Leveraging Human Capabilities
• Spatial memory: analogy with paper
placed on a pile on your desk
 User is responsible for personal organization
• 3D perception: minimal cognitive load,
good utilization of screen space
Interaction Techniques
• Placing pages: confinement to inclined
plane makes normal 2D drag-and-drop
sufficient; no unfamiliar 3D navigation
needed
• Continuous feedback: both audio and
visual feedback are natural; minimized
unexpected interactions/surprises
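Why 2D dragging suffices can be seen in a small sketch: if every page lives at a 2D coordinate on the plane, its 3D position is fully determined by the plane's tilt. The tilt angle, plane size, coordinate convention, and function names below are assumptions for illustration, not details from the Data Mountain paper.

```python
import math

def plane_to_world(u, v, tilt_degrees=30.0):
    """Map a point on the inclined plane to 3D world coordinates.
    u runs across the plane, v runs up the slope away from the viewer."""
    tilt = math.radians(tilt_degrees)
    return (u, v * math.sin(tilt), -v * math.cos(tilt))

def drag(position, du, dv, width=10.0, depth=10.0):
    """Apply a 2D drag to a page's (u, v) position, clamped to the plane."""
    u = min(max(position[0] + du, 0.0), width)
    v = min(max(position[1] + dv, 0.0), depth)
    return (u, v)
```

The user only ever changes (u, v); height and distance follow from the plane, so no separate 3D navigation control is needed.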
Limitations/Future
• Limits number of pages stored
• No explicit support for grouping
• Landmarks/contours as helpers
Discussion
• Strengths/Weaknesses
• Could it be used elsewhere?
Upcoming
• Trees & Hierarchies (2 days)
 Reading
Chapter 8
Lamping & Rao
References
• Spence and CMS texts
• All referred to papers and websites
• McNamara & Defnet and Craighill,
Robeson & Sheridan F ‘99 slides