Understanding Web Searching
Secondary Readings and So On…
Will Meurer for WIRED
October 7, 2004
Introduction
• Why do we care about how people use the Web?
• Today’s topics (10/7, not the present age):
– Implicit vs. explicit feedback
– Representation effectiveness
– Browser-based activities
– History mechanisms
– How do we cater to the people?
– Resources
– Research
Implicit vs. Explicit Feedback
Reading Time, Scrolling and… (Kelly & Belkin, 2001)
• Implicit feedback (Morita & Shinoda):
– Time spent on a page is directly related to user
interest. Backed by many studies.
• Explicit feedback (this study)
– Time spent on a page is similar for relevant and
irrelevant content.
• Results suggest:
– “Generalizability” is severely affected by explicit
feedback methods.
– Spend time to choose the right feedback type!
Implicit vs. Explicit Feedback
Reading Time, Scrolling and… (Kelly & Belkin, 2001)
• Why do the results differ?
– Relevance was difficult to distinguish this time
– Participants were truly interested in the content in former studies
– Users may have rushed to complete the task in this experimental context
Representation Effectiveness
How we really use the Web (Krug, 2000)
Three “facts of life”:
1. “We don’t read pages. We scan them.”
– Why? hurry, necessity, habit
– If we are to read a page in its entirety, we save or print it!
(ClearType project)
Representation Effectiveness
How we really use the Web (Krug, 2000)
2. “We don’t make optimal choices. We satisfice.”
– Why? hurry, quick access to and fro, less
work than thinking
– Generally, it’s more productive to guess.
Representation Effectiveness
How we really use the Web (Krug, 2000)
3. “We don’t figure out how things work.”
– Why? not important, “if it ain’t broke
(baroque)…”
– Is it important to us whether the user
understands how it works or not? Why?
Representation Effectiveness
Cognitive Strategies in Web… (Navarro-Prieto, et al, 1999)
• Users get lost on the Web. Why?
• It is not just interactivity between user and
system, rather user, task, and information
• Analysis structure of browsing behavior
presented and tested
“The Interactivity Framework” or “How we
should analyze cognitive strategies”
Representation Effectiveness
Cognitive Strategies in Web… (Navarro-Prieto, et al, 1999)
• The Interactivity Framework
– User Level – Web experience, cognitive
processes, cognitive style, knowledge (CS
majors knew more about SE processes)
– User Strategies – based on searching
structure (or lack of), task nature
SEARCHING CONDITIONS (task type by information structure)
– DISPERSED STRUCTURE
• Fact finding: look for a database algorithm in Java; look for criteria for the diagnosis of diseases
• Exploratory: find all the available jobs for a profession
– CATEGORY STRUCTURE
• Fact finding: look for a word definition
• Exploratory: find all information about the 1997 Nobel Prize for Literature
Representation Effectiveness
Cognitive Strategies in Web… (Navarro-Prieto, et al, 1999)
– Information Structure
• Internal (user’s) representation
• External (system’s) representation
• Computational Offloading – How much work does the
user have to do to understand and how much does a
representation help?
– Re-representation – How much it makes problem solving easier
or more difficult
– Graphical Constraining – How it constrains inferences
– Temporal and Spatial Constraining – How it helps when
distributed over time and space
Representation Effectiveness
Cognitive Strategies in Web… (Navarro-Prieto, et al, 1999)
SEARCHING TASK
EXPERIENCED WEB PARTICIPANTS
INFORMATION IN WEB: DISPERSED STRUCTURE
SPECIFIC FACT FINDING (e.g. find criteria for a psychological disease):
• Bottom-up
• Mixed strategy at the beginning and selecting bottom-up
NOVICE WEB PARTICIPANTS
• Start with top-down and change at the end to bottom-up
• Start typing without knowing why
EXPLORATORY:
• Top-down
INFORMATION IN WEB: CATEGORY STRUCTURE (e.g. find a job opening)
• Mixed strategy at the beginning and then selecting top-down
• Top-down
• Top-down following browser categories
• Start with bottom-up and change to top-down
Representation Effectiveness
Cognitive Strategies in Web… (Navarro-Prieto, et al, 1999)
• More Results
– Experienced users searched with a plan
– By having a plan you keep a more internal
representation and focus your search
– Inexperienced users were more influenced by
external representations
– Computational Offloading Results
• Must explain
– How have these issues changed?
Representation Effectiveness
Cognitive Strategies in Web… (Navarro-Prieto, et al, 1999)
• Conclusions
– Cognitive strategies used by the participants
depend on how the information is structured.
– Interaction is a multidimensional concept.
– Search engine interfaces should be designed
to have less restrictive external
representation.
Browser-based Activities
Characterizing Browsing… (Catledge & Pitkow, 1995)
• User study of browsing events at Georgia Tech (xMosaic browser)
• Three main browsing strategies identified:
– Search browsing – directed search, goal known
– General purpose browsing – consulting highly likely
sources for needed information (dictionary.com)
– Serendipitous browsing – random
– Most people use a combination of these
Browser-based Activities
Characterizing Browsing… (Catledge & Pitkow, 1995)
• Results
– Users were patient 99% of the time for long page loads
– 1222 unique sites accessed outside of GATech (~16% of Web servers)
– Paths were calculated (sequences of page navigation)
• Per session, paths of 7 different sites occurred 5 times
• Per user, paths of 8 different sites occurred 9 times
Browser-based Activities
Characterizing Browsing… (Catledge & Pitkow, 1995)
• More Results
– 2% of the retrieved pages were saved or printed
– Based on user’s slope, browsing strategy categories were
applied
– Slope can also categorize usage
patterns of Web documents
– Users tended to operate in one
small area of a site
Browser-based Activities
Characterizing Browsing… (Catledge & Pitkow, 1995)
• Design Strategies
– Users averaged 10 pages per server
• Keep the most important info within 2 or 3 jumps of the index
• Do not put too many links on one page: it increases search time (back, forward, back, site map, etc.)
– Facilitate the likely visitor browsing patterns
• Maybe make more than one version of your page?
• Most work well in a “hub and spoke” environment
• The Future
– Offer site tour based on most frequently traveled
paths
– Alter page design dynamically based on site trends
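One way the "most frequently traveled paths" idea could be prototyped is by counting contiguous page sequences across logged sessions. The sketch below is only an illustration, not the method used by Catledge & Pitkow; the function name `frequent_paths`, the path length of 3, and the sample session data are all assumptions made for the example.

```python
from collections import Counter

def frequent_paths(sessions, length=3, top=5):
    """Count every contiguous run of `length` pages across all sessions
    and return the `top` most common runs with their counts."""
    counts = Counter()
    for pages in sessions:
        for i in range(len(pages) - length + 1):
            counts[tuple(pages[i:i + length])] += 1
    return counts.most_common(top)

# Hypothetical session logs: each list is the ordered pages one visitor requested.
sessions = [
    ["/", "/courses", "/courses/wired", "/syllabus"],
    ["/", "/courses", "/courses/wired"],
    ["/", "/people", "/", "/courses", "/courses/wired"],
]

print(frequent_paths(sessions, length=3, top=1))
```

A site tour could then be seeded from the top few paths, though a real log would need session splitting and filtering first.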
History Mechanisms (in browsers)
Revisitation Patterns in… (Tauscher & Greenberg, 1997)
• Purpose: Provide empirical data to aid in
the development of effective history
mechanisms
– Understand revisitation patterns
– Evaluate current mechanisms and suggest
best practices and methods
• Data Collection
– Altered version of xMosaic to record activity
– Survey of users afterward
History Mechanisms (in browsers)
Revisitation Patterns in… (Tauscher & Greenberg, 1997)
• Revisitation Results
– 58% recurrence rate (>40% are new pages!)
– As people search they build their vocabulary
– 7 browsing strategies
• First-time visits to cluster of pages
• Revisits to pages
• Authoring of pages (high reload percentage)
• Regular use of web-based apps
• Hub-and-spoke (breadth-first approach)
• Guided tour (e.g. next page links)
• Depth-first search (following links deeply before returning to the index)
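The 58% recurrence rate above can be read as the share of page visits that were repeats. A minimal sketch, assuming the rate is computed as (total visits minus distinct URLs) divided by total visits; the function name and the sample visit log are made up for illustration.

```python
def recurrence_rate(visits):
    """Percentage of visits that go to a URL already seen earlier in the log."""
    total = len(visits)
    if total == 0:
        return 0.0
    distinct = len(set(visits))
    return 100 * (total - distinct) / total

# Hypothetical visit log: 10 visits to 4 distinct URLs.
visits = ["a", "b", "a", "c", "a", "b", "d", "a", "b", "a"]
print(recurrence_rate(visits))
```

The complement of this number is the share of visits that land on pages the user has not seen before.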
History Mechanisms (in browsers)
Revisitation Patterns in… (Tauscher & Greenberg, 1997)
• Revisitation Results
– Visit frequency as a function of distance
• Users mostly revisit recently visited pages (within about 6 jumps)
• 39% chance that the next URL will match one of the previous 6
pages visited
– Access frequency
• 60% of pages visited only once
• 19% visited twice
• 8% visited 3 times
• 4% visited 4 times
– Locality (not valuable for predicting next page)
• Most locality sets were small
• Only 2.5 to 4.5 URLs per set
• Only 15% of pages were part of a locality set
– Paths (not valuable for predicting next page)
• Could these be captured and offered in a history mechanism?
• Time per page could indicate path
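The "next URL matches one of the previous 6 pages" figure can be estimated from any visit log with a sliding window. This is a rough sketch under my own assumptions (how the paper handled the first few visits and duplicate entries inside the window is not stated on the slide); `window_hit_rate` and the sample log are hypothetical.

```python
def window_hit_rate(visits, window=6):
    """Fraction of visits (after the first) whose URL appears among
    the `window` visits immediately before it."""
    if len(visits) < 2:
        return 0.0
    hits = 0
    for i in range(1, len(visits)):
        recent = visits[max(0, i - window):i]
        if visits[i] in recent:
            hits += 1
    return hits / (len(visits) - 1)

# Hypothetical visit log.
visits = ["a", "b", "a", "c", "d", "e", "f", "g", "h", "a"]
print(window_hit_rate(visits, window=6))
```

Running the same function with different window sizes gives a crude picture of visit frequency as a function of distance.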
History Mechanisms (in browsers)
Revisitation Patterns in… (Tauscher & Greenberg, 1997)
• Mechanism types
– Recency Ordered
• Sequential order based on time accessed
• Repeated entries for revisitation
• “Pruned” by keeping only first instance or only last
• Simple for users to understand (they remember paths)
– Frequency Ordered
• Most visited at top, least visited at bottom
• User interest changes, but the latest URLs must first build up frequency to rise in the list
• How to break ties: last visited or earliest visited
• When few items are on the list, this suffers
• Difficult for users to understand
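The two orderings above can be compared on the same visit log. The sketch below is one possible reading, not the paper's implementation: `recency_ordered` prunes duplicates by keeping either the first or the last instance, and `frequency_ordered` breaks ties by last visit (one of the two tie-breaks mentioned). Both function names and the sample log are assumptions.

```python
from collections import Counter

def recency_ordered(visits, keep="last"):
    """Most recent first, with duplicates pruned to each URL's
    first or last occurrence."""
    source = visits if keep == "first" else list(reversed(visits))
    seen, kept = set(), []
    for url in source:
        if url not in seen:
            seen.add(url)
            kept.append(url)
    # kept is oldest-first for keep="first", newest-first for keep="last"
    return list(reversed(kept)) if keep == "first" else kept

def frequency_ordered(visits):
    """Most visited first; ties go to the more recently visited URL."""
    counts = Counter(visits)
    last_seen = {url: i for i, url in enumerate(visits)}
    return sorted(counts, key=lambda u: (-counts[u], -last_seen[u]))

# Hypothetical visit log, oldest visit first.
visits = ["a", "b", "c", "a", "b", "a", "d"]
print(recency_ordered(visits, keep="last"))
print(recency_ordered(visits, keep="first"))
print(frequency_ordered(visits))
```

Note how the newly visited page "d" tops both recency lists but sits below "a" and "b" in the frequency list, which is the "latest URLs" problem in miniature.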
History Mechanisms (in browsers)
Revisitation Patterns in… (Tauscher & Greenberg, 1997)
• Stack-based
– Recently visited at top
– Order and availability depend on:
• Loading – causes page to be added to the top
• Recalling – changes pointer to the currently displayed page
• Revisiting – user reloads the page, has no effect on the stack
– Keeps duplicates
– Non-persistent vs. persistent (between sessions)
– Better than recency at short distances
– Users have difficulty understanding this model
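The loading, recalling, and revisiting rules can be sketched as a small class. This is a hedged illustration: the class name `StackHistory` and its method names are hypothetical, and the rule that loading a new page discards everything above the pointer is my assumption about typical browser behavior, not something the slide states.

```python
class StackHistory:
    """Session history as a stack with a pointer to the displayed page."""

    def __init__(self):
        self.pages = []     # bottom of the stack first
        self.pointer = -1   # index of the currently displayed page

    def load(self, url):
        # Assumption: pages above the pointer are dropped before the push.
        del self.pages[self.pointer + 1:]
        self.pages.append(url)   # duplicates are kept
        self.pointer = len(self.pages) - 1

    def back(self):
        # Recalling: only the pointer moves.
        if self.pointer > 0:
            self.pointer -= 1
        return self.current()

    def forward(self):
        if self.pointer < len(self.pages) - 1:
            self.pointer += 1
        return self.current()

    def reload(self):
        # Revisiting: no effect on the stack.
        return self.current()

    def current(self):
        return self.pages[self.pointer] if self.pages else None

history = StackHistory()
for url in ["a", "b", "c"]:
    history.load(url)
history.back()
history.back()
history.load("d")
print(history.pages)
```

After going back twice and loading "d", pages "b" and "c" are no longer reachable, which may be part of why users find this model hard to understand.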
History Mechanisms (in browsers)
Revisitation Patterns in… (Tauscher & Greenberg, 1997)
• Hierarchically Structured
– Recency ordered hyperlink sublists
• Like recency w/ latest position saved
• Each URL has its own sublist of links from that page
• Helps with common linking paths
• Easier to understand
– Context-sensitive web subspace
• Somewhat of a combination of the above-mentioned and
stack-based approaches
• Gives user better understanding of context of his/her
searches
• May be difficult to remember where a certain URL was
• I THINK this approach would be a great tool
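For the recency ordered hyperlink sublists, a minimal sketch of the data structure might look like the following. It assumes the browser logs each click as a (page the link was on, page the link led to) pair; the function name `hyperlink_sublists` and the sample clicks are hypothetical, and the paper's actual design may differ.

```python
def hyperlink_sublists(clicks):
    """Map each page to the links followed from it, most recent first."""
    sublists = {}
    for parent, child in clicks:
        links = sublists.setdefault(parent, [])
        if child in links:
            links.remove(child)   # keep one entry per link
        links.insert(0, child)    # most recently followed link goes on top
    return sublists

# Hypothetical click log: (page the link was on, page the link led to).
clicks = [("/", "/a"), ("/", "/b"), ("/a", "/a1"), ("/", "/a")]
print(hyperlink_sublists(clicks))
```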
History Mechanisms (in browsers)
Revisitation Patterns in… (Tauscher & Greenberg, 1997)
• Do users actually use history mechanisms?
– Less than 1% of navigation
– 3% involve favorites
– 30% of navigation was back button usage
How do we cater to the people?
• Inter-site browsing strategies are not easy to
tackle. How would you control that?
• Why should we attempt to understand user
behavior and search strategies?
– Formulate general design principles (e.g. 3 level
depth)
– Design for multiple searching personalities
– Understand how to survey your intended users or get
feedback most appropriately
– Identify importance of all aspects of the development
process and allocate resources accordingly
How do we cater to the people?
Some Bright Ideas
• Personalized search
– Learning systems – You might also like…
– www.a9.com (history, favorites, personalized
interface)
– But what about changing for different types of user
behavior based on the user’s path history on your
server?
• Researched since 1995 and earlier!
• What has resulted?
• Microsoft ASP.NET 2.0 – Web Parts
What resources are out there?
• xMosaic 2.6 download, for those of you so excited
• Architecture of the World Wide Web
http://www.w3.org/TR/webarch/
• Sum Sun Sug Gestions
http://www.sun.com/980713/webwriting/
• Jakob Nielsen – research on content usability,
http://useit.com/alertbox/9710a.html
Research
• Vox Populi: The Public Searching Of The Web (2001)
– Compares statistics from two studies
– Shows how public searching changed from 1997 to 1999
• Usage Patterns of a Web-Based Library Catalog (2001),
Michael D. Cooper
• Real Life, Real Users, and Real Needs: A Study and
Analysis of User Queries on the Web (2000), Jansen,
Spink & Saracevic
• Redefining the Browser History in Hypertext Terms (),
Mark Ollerenshaw