hcc00 - UC Berkeley School of Information

Search
Text Mining
Web Site Usability
Marti Hearst
SIMS
UCB HCC Retreat
BAILANDO Projects
Better Access to Information using Language Analysis and Novel Dynamic Organizations

Current BAILANDO Projects
CHA-CHA:
  Web Search results in Context
LINDI:
  UI support for Search
  Text Data Mining
TANGO:
  Automated Web Site Usability

Search UIs
Combine Browsing & Search
Place Search Results in Context
Large Category Hierarchies

Cha-Cha
Students: Mike Chen, Jamie Laflen, Jason Hong, Jimmy Lin, Shiang Chen

Medical Category Hierarchy
Medicine
  Disease
    Migraine
    MS
  Anatomy
    Carotid Artery
    Spinal Cord
  Drugs
    Tamoxifen
    Steroids
DynaCat (Pratt, Hearst, & Fagan 99)

DynaCat Study
Design
  Three queries
  24 cancer patients
  Compared three interfaces: ranked list, clusters, categories
Results
  Participants strongly preferred categories
  Participants found more answers using categories
  Participants took same amount of time with all three interfaces
Similar results have been verified by another study by Chen and Dumais (CHI 2000)

Cat-a-Cone Interface (Hearst & Karadi 97)
Improving Search via Large Category Hierarchies
  How to show intersections across category types?
  How to preview related categories in a user-tailored, dynamic manner?

Information retrieval
Text Data Mining

Information retrieval
Selection or rejection of existing documents based on a function of word match.

Text Data Mining
Relationships between information in documents can create new facts, not previously known.

Imagine
You are a medical researcher
Your patient has
  spinal inflammation
  numbness in fingers
  low TC levels
  negative results for all tests
How can you help her?

Idea
A new way of searching text.
Link pieces of information together to formulate hypotheses …

LINDI
Linking Information for New DIscoveries
Students: Barbara Rosario, David Blei
Three main parts
  Search UI for building and reusing hypothesis-seeking strategies.
  Statistical language analysis techniques for interpreting the text.
  Backend for interfacing with various databases and translating different formats.

Gathering Evidence
Spinal Inflammation
Numbness in fingers
Low TC Levels
Find diseases associated with each

Supporting Cascaded Search Operations
Spinal Inflammation
Numbness in fingers
Low TC Levels
New Language Analysis
First use category labels to retrieve candidate documents
Then use language analysis to detect causal relationships between concepts
  Title: Magnesium deficiency implicated in increased stress levels.
  Interpretation: <nutrient><reduction> related-to <increase><symptom>
Use these to find relationships and formulate hypotheses

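The title-to-interpretation mapping on the New Language Analysis slide can be illustrated with a toy sketch. This is not the LINDI method: the slides call for statistical language analysis, whereas this sketch swaps in a hand-written word list. The `LEXICON` and `RELATION_CUES` entries and the `interpret` function are assumptions made up for illustration.

```python
import re

# Hypothetical hand-built lexicon mapping words to semantic classes.
# A real system would learn such associations statistically.
LEXICON = {
    "magnesium": "nutrient",
    "deficiency": "reduction",
    "increased": "increase",
    "stress": "symptom",
}
# Hypothetical cue words that signal a relationship between concepts.
RELATION_CUES = {"implicated", "associated", "linked"}


def interpret(title):
    """Map a title to a '<class>... related-to <class>...' template."""
    words = re.findall(r"[a-z]+", title.lower())
    left, right, seen_cue = [], [], False
    for word in words:
        if word in RELATION_CUES:
            seen_cue = True
        elif word in LEXICON:
            # Concepts before the cue go on the left, those after on the right.
            (right if seen_cue else left).append(f"<{LEXICON[word]}>")
    if not (seen_cue and left and right):
        return None  # no relationship detected
    return "".join(left) + " related-to " + "".join(right)


print(interpret("Magnesium deficiency implicated in increased stress levels."))
# prints: <nutrient><reduction> related-to <increase><symptom>
```

Templates produced this way could then be matched against each other to chain relationships into candidate hypotheses.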
Statistical Semantic Parsing
Modern statistical techniques
  Mainly applied to syntactic structure
Probabilistic knowledge representation
  Represent hypotheses with different degrees of certainty.

Automating Assessment of Web Site Usability

Why Worry?
Problem: IBM's extranet
  Heavy use of help and search
  Unhappy users
Solution
  Massive web site redesign
  Focus on info-organization, not the purchasing process.
  Cost: "in the millions"
Results
  Not announced or trumped up
  Use of "help" decreased 84%
  Sales increased 400%

Web TANGO
Tool for Assessing NaviGation & Organization
Student: Melody Ivory
Goal: automated support for comparing design alternatives
How: Assess usability of the information architecture
  Approximate people’s information-seeking behavior (Monte Carlo simulation)
  Output quantitative usability metrics

Anatomy of Web Site Design
  Information Architecture
  Information Design
  Navigation Design
  Graphic Design
Courtesy of Mark Newman

Usability Evaluation
Standard Techniques
User studies
  Have people use the interface to complete some tasks
  Requires an implemented interface
  "Discount" vs. Scientific Results
Heuristic Evaluation
  An expert assesses a design or implementation according to certain guidelines

Automated Usability Evaluation
Logging/capture
  Pro: Easy
  Con: Requires implemented system
  Con: Don't know the user task (web)
  Con: Don't present alternatives
  Con: Don't distinguish error from success
Analytical Modeling
  Pro: doable at design phase
  Con: models an expert
  Con: academic exercise
Simulation

Existing Metrics
Web metric analysis tools report on what is easy to measure, e.g.:
  Predicted download time
  Depth/breadth of site
We want to worry about
  Content
  User goals/tasks
    Not available from logs
We also want to compare alternative designs.

Monte Carlo Simulation
Have a model of information structure
Have a set of user goals
Want to assess navigation structure
  Compare alternatives/tradeoffs
  Identify bottlenecks
  Identify critically important pages/links
  Check all pairs of start/end points
  Check overall reachability before and after a change.

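The last two checks on the slide above (all pairs of start/end points, and reachability before and after a change) can be sketched on a small link graph. The sketch below is an assumption-laden illustration, not Web TANGO itself: the `SITE` pages and links are invented, and it uses an exhaustive breadth-first search rather than sampling.

```python
from collections import deque

# Hypothetical site structure: page -> pages it links to.
SITE = {
    "home": ["listings", "support"],
    "listings": ["home", "details"],
    "details": ["listings"],
    "support": ["home"],
}


def distances_from(site, start):
    """Breadth-first search: fewest clicks from start to each reachable page."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in site.get(page, []):
            if nxt not in dist:
                dist[nxt] = dist[page] + 1
                queue.append(nxt)
    return dist


def unreachable_pairs(site):
    """Return every (start, end) pair with no navigation path."""
    pages = set(site) | {p for links in site.values() for p in links}
    missing = []
    for start in sorted(pages):
        dist = distances_from(site, start)
        missing.extend((start, end) for end in sorted(pages) if end not in dist)
    return missing


before = unreachable_pairs(SITE)
# Simulate a design change: drop the link from "details" back to "listings".
changed = dict(SITE, details=[])
after = unreachable_pairs(changed)
print(len(before), "unreachable pairs before;", len(after), "after")
```

In this invented example the change turns "details" into a dead end, which is the kind of regression a before/after reachability check is meant to catch.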
Monte Carlo Simulation
At each step in the simulation
  Assume a probability distribution over a set of next choices.
  The next choice is a function of:
    The current goal
    The understandability of the choice
    The overall complexity of the set of choices
    Prior interaction history
    These can use models of "scent"
  Varying the distribution corresponds to varying properties of the links
  Spot-check important choices

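The simulation step described above can be sketched as a weighted random walk over a site graph. Everything concrete here is an assumption for illustration: the `SITE` pages, the link weights (a crude stand-in for goal, understandability, and "scent"), and the function names. Prior interaction history is not modeled.

```python
import random

# Hypothetical site: page -> {link target: weight}. Weights stand in for how
# strongly each link's label suggests the goal (a crude "scent" model).
SITE = {
    "home": {"listings": 1.0, "support": 3.0},
    "listings": {"home": 1.0, "details": 1.0},
    "details": {"listings": 1.0},
    "support": {"home": 1.0, "renter_support": 4.0},
    "renter_support": {},
}


def simulate_once(site, start, goal, rng, max_steps=50):
    """Walk the site, sampling each next link from its weighted distribution.

    Returns the number of clicks taken, or None if the goal was not reached.
    """
    page = start
    for clicks in range(max_steps + 1):
        if page == goal:
            return clicks
        links = site[page]
        if not links:
            return None  # dead end
        targets = list(links)
        page = rng.choices(targets, weights=[links[t] for t in targets])[0]
    return None


def estimate(site, start, goal, runs=2000, seed=0):
    """Monte Carlo estimate of success rate and mean clicks to the goal."""
    rng = random.Random(seed)
    results = [simulate_once(site, start, goal, rng) for _ in range(runs)]
    hits = [r for r in results if r is not None]
    mean_clicks = sum(hits) / len(hits) if hits else None
    return len(hits) / runs, mean_clicks


rate, clicks = estimate(SITE, "home", "renter_support")
print(f"success rate {rate:.2f}, mean clicks {clicks:.1f}")
```

Raising or lowering a weight in `SITE` corresponds to varying a property of that link, so two designs can be compared by running `estimate` on each with the same seed.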
[Figure] One Monte Carlo simulation step for Design 1, Task 1. Simulation starts from the home page and the target information is at Renter Support.

[Figure] Monte Carlo simulation results for Design 1, Task 1. Simulation runs start from all pages in the site. Average Navigation times are shown for Tasks 2 & 3.

Using Simulator Results
Design Decisions
  Use Design 1
  Improve Tasks 1 & 2
Next Steps
  Analyze results for Tasks 1 & 2
  Create new Design 1
  Repeat simulation to compare old & new designs
  Iterate if necessary

        Design 1          Design 2
Task    Time     Errors   Time     Errors
1       41 sec   2        38 sec   4
2       38 sec   4        43 sec   5
3       32 sec   2        74 sec   6

Research Issues: Navigation Predictions
Develop IR model for predicting link selection
Requirements
  Information need (task metadata)
  Representation of pages (page metadata)
  Method for selecting links (relevance ranking)
  Maintaining user’s conceptual model during site traversal (scent [Fur97,LC98,Pir97])
One possible approach
  Information Foraging Theory [PC95,Pir97,PPR96]
    Functional categorization of pages based on features
    Prediction of relevance to current page
      Consider link connectivity, text similarity & usage

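The "method for selecting links (relevance ranking)" requirement can be sketched with a plain bag-of-words cosine similarity between the task description and each link label. This is one simple stand-in, not the model the slides propose: it covers text similarity only (no link connectivity or usage data), does no stemming, and the task text and link labels are invented.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_links(task, links):
    """Order link labels by similarity to the task description, best first."""
    need = bag_of_words(task)
    scored = [(cosine(need, bag_of_words(label)), label) for label in links]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical task metadata and link labels (page metadata).
task = "find support information for renters"
links = ["Renter Support", "Property Listings", "About Us"]
ranking = rank_links(task, links)
for score, label in ranking:
    print(f"{score:.2f}  {label}")
```

Scores like these could serve as the per-link weights in a navigation simulation, with the highest-ranked link being the most likely next choice.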
Other HCC-Related Projects
Using a large digital desk in design
  Ame Elliot
Using visualization for light design
  Dan Glaser
User interfaces and computer security
  Prof. Doug Tygar, Rachna Dhamija