prospective-students - UC Berkeley School of Information

Search
Text Mining
Web Site Usability
Marti Hearst
SIMS
UCB CS Research Fair
BAILANDO Projects
Better Access to Information using Language Analysis and Novel Dynamic Organizations
Current BAILANDO Projects
CHA-CHA & FLAMENCO: Better Search Interfaces
LINDI: UI support for Search; Text Data Mining
TANGO: Automated Web Site Usability
Search UIs
Combine Browsing & Search
Place Search Results in Context
Large Category Hierarchies
Cha-Cha
Students: Mike Chen, Jamie Laflen, Jason Hong, Jimmy Lin, Shiang Chen
Medical Category Hierarchy
Medicine
Disease: Migraine, MS
Anatomy: Carotid Artery, Spinal Cord
Drugs: Tamoxifen, Steroids
DynaCat (Pratt, Hearst, & Fagan 99)
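Placing search results in the context of a category hierarchy can be illustrated by grouping a ranked result list under category labels. The following is a minimal Python sketch, not DynaCat's published algorithm: it assumes each retrieved document already carries one or more category paths, and the data and the `group_by_category` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical ranked results: (doc_id, category paths). IDs and paths are made up
# for illustration; a real system would take paths from a hierarchy such as MeSH.
RESULTS = [
    ("doc1", [("Drugs", "Tamoxifen")]),
    ("doc2", [("Disease", "Migraine"), ("Drugs", "Steroids")]),
    ("doc3", [("Anatomy", "Spinal Cord")]),
    ("doc4", [("Drugs", "Tamoxifen")]),
]


def group_by_category(results, level=0):
    """Map each category label at `level` to the doc IDs filed under it, keeping rank order."""
    groups = defaultdict(list)
    for doc_id, paths in results:
        for path in paths:
            if len(path) > level:
                bucket = groups[path[level]]
                if doc_id not in bucket:  # a doc may reach the same label via two paths
                    bucket.append(doc_id)
    return dict(groups)


print(group_by_category(RESULTS))
# {'Drugs': ['doc1', 'doc2', 'doc4'], 'Disease': ['doc2'], 'Anatomy': ['doc3']}
```

Passing `level=1` groups by the second level of each path (for example, individual drugs under Drugs), which is one way a display could let users drill down.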
DynaCat Study
Design:
Three queries
24 cancer patients
Compared three interfaces: ranked list, clusters, categories
Results:
Participants strongly preferred categories
Participants found more answers using categories
Participants took the same amount of time with all three interfaces
Similar results were confirmed in a separate study by Chen and Dumais (CHI 2000)
Cat-a-Cone Interface (Hearst & Karadi 97)
FLAMENCO: Improving Search via Large Category Hierarchies
How to show intersections across category types?
How to preview related categories in a user-tailored, dynamic manner?
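The first question, showing intersections across category types, can be made concrete by counting how many items fall into each pair of categories drawn from two different types; the same counts can serve as a simple, non-personalized preview of related categories. This sketch assumes items are tagged with sets of category labels per type; the data and function names are hypothetical, and it does not address the user-tailored part of the second question.

```python
from collections import Counter
from itertools import product

# Hypothetical items tagged with categories from two category types (facets).
ITEMS = [
    {"Disease": {"Migraine"}, "Drugs": {"Steroids"}},
    {"Disease": {"MS"}, "Drugs": {"Steroids", "Tamoxifen"}},
    {"Disease": {"Migraine"}, "Drugs": {"Tamoxifen"}},
    {"Disease": {"MS"}, "Drugs": {"Steroids"}},
]


def intersection_counts(items, type_a, type_b):
    """Count items in every (type_a category, type_b category) intersection."""
    counts = Counter()
    for item in items:
        for pair in product(item.get(type_a, ()), item.get(type_b, ())):
            counts[pair] += 1
    return counts


def preview(items, selected_type, selected, other_type):
    """Preview other_type categories (with counts) among items in the selected category."""
    counts = Counter()
    for item in items:
        if selected in item.get(selected_type, ()):
            counts.update(item.get(other_type, ()))
    return counts


print(intersection_counts(ITEMS, "Disease", "Drugs")[("MS", "Steroids")])  # 2
print(preview(ITEMS, "Disease", "Migraine", "Drugs"))
```

Showing these counts next to each category label lets a user see, before clicking, how large each intersection would be.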
Text Data Mining
Relationships between pieces of information in documents can yield new facts that were not previously known.
Imagine
You are a medical researcher
Your patient has
spinal inflammation
numbness in fingers
low TC levels
negative results for all tests
How can you help her?
Idea
A new way of searching text.
Link pieces of information together to formulate hypotheses …
LINDI
Linking Information for New DIscoveries

Three main parts:
Search UI for building and reusing hypothesis-seeking strategies
Statistical language analysis techniques for interpreting the text
Backend for interfacing with various databases and translating between different formats
Gathering Evidence
Spinal Inflammation
Numbness in fingers
Low TC Levels
Supporting Cascaded Search Operations
Find diseases associated with each:
Spinal Inflammation
Numbness in fingers
Low TC Levels
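A cascaded search operation of this kind (find diseases associated with each finding, then combine the results) might be sketched as running one search per finding and ranking candidates by how many findings they account for. The `lookup` function, the association table, and the placeholder disease names are all hypothetical; a real system would query literature databases through a backend.

```python
# Hypothetical stand-in for one search per finding: each finding maps to the set of
# candidate diseases a literature search might return. Placeholder names, not real data.
ASSOCIATIONS = {
    "spinal inflammation": {"disease A", "disease B", "disease C"},
    "numbness in fingers": {"disease B", "disease C", "disease D"},
    "low TC levels": {"disease C", "disease E"},
}


def cascade(findings, search):
    """Run `search` once per finding and rank candidates by how many findings they explain."""
    support = {}
    for finding in findings:
        for disease in search(finding):
            support.setdefault(disease, set()).add(finding)
    # Most-supported candidates first; ties broken alphabetically for a stable order.
    return sorted(support.items(), key=lambda item: (-len(item[1]), item[0]))


def lookup(finding):
    return ASSOCIATIONS.get(finding, set())


for disease, findings in cascade(ASSOCIATIONS, lookup):
    print(disease, len(findings))
```

Keeping the per-finding result sets, rather than only their intersection, means a candidate that explains most but not all findings still shows up in the ranking.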
New Language Analysis
First use category labels to retrieve candidate documents
Then use language analysis to detect causal relationships between concepts
Title: Magnesium deficiency implicated in increased stress levels.
Interpretation: <nutrient><reduction> related-to <increase><symptom>
Use these to find relationships and formulate hypotheses
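To make the Title/Interpretation example concrete, the sketch below maps words to semantic classes and splits the title at a relation cue word. It is deliberately rule-based: the slide refers to statistical language analysis, which this hand-written lexicon (`LEXICON`) and cue list (`RELATION_CUES`), both hypothetical, do not reproduce.

```python
import re

# Hypothetical lexicon and cue words. The slide describes statistical analysis;
# this rule-based lookup is a simplified stand-in, not that technique.
LEXICON = {
    "magnesium": "nutrient",
    "deficiency": "reduction",
    "increased": "increase",
    "stress": "symptom",
}
RELATION_CUES = {"implicated", "associated", "linked"}


def interpret(title):
    """Return '<class>... related-to <class>...' for a title, or None if no pattern is found."""
    words = re.findall(r"[a-z]+", title.lower())
    for i, word in enumerate(words):
        if word in RELATION_CUES:
            left = "".join(f"<{LEXICON[w]}>" for w in words[:i] if w in LEXICON)
            right = "".join(f"<{LEXICON[w]}>" for w in words[i + 1:] if w in LEXICON)
            if left and right:
                return f"{left} related-to {right}"
    return None


print(interpret("Magnesium deficiency implicated in increased stress levels."))
# <nutrient><reduction> related-to <increase><symptom>
```

A statistical approach would presumably learn these word-to-class mappings and relation cues from annotated text rather than listing them by hand.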
Statistical Semantic Parsing

Modern statistical techniques
Mainly applied to syntactic structure
Probabilistic knowledge representation
Represent hypotheses with different degrees of certainty
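One simple way to give a hypothesis a degree of certainty is to attach a probability to each supporting piece of evidence and combine them. The sketch below uses a noisy-OR rule, which assumes the pieces of evidence are independent; this combination rule and the `Hypothesis` class are assumptions for illustration, not necessarily the representation the project uses.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class Hypothesis:
    """A candidate relationship plus per-source confidence scores (hypothetical structure)."""

    statement: str
    evidence: list = field(default_factory=list)  # probabilities in [0, 1]

    def add_evidence(self, p):
        if not 0.0 <= p <= 1.0:
            raise ValueError("evidence must be a probability")
        self.evidence.append(p)

    def certainty(self):
        # Noisy-OR: the hypothesis is unsupported only if every independent piece
        # of evidence fails to support it.
        return 1.0 - prod(1.0 - p for p in self.evidence)


h = Hypothesis("<nutrient><reduction> related-to <increase><symptom>")
h.add_evidence(0.5)
h.add_evidence(0.5)
print(h.certainty())  # 0.75
```

Each added piece of evidence can only raise certainty under noisy-OR; a rule that also weighs contradicting evidence would need a richer model.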
Automating Assessment of Web Site Usability
Why Worry?

Problem: IBM's extranet
Heavy use of help and search
Unhappy users
Solution:
Massive web site redesign
Focus on information organization, not the purchasing process
Cost: "in the millions"
Results:
Not announced or trumped up
Use of "help" decreased 84%
Sales increased 400%
Web TANGO
Tool for Assessing NaviGation & Organization


Goal: automated support for comparing design alternatives
How: assess the usability of the information architecture
Approximate people’s information-seeking behavior (Monte Carlo simulation)
Output quantitative usability metrics
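Approximating information-seeking behavior with Monte Carlo simulation can be sketched as repeatedly sending a simulated user through a site's link graph and measuring how often, and in how many clicks, the user reaches a target page. Running the same simulation on two designs gives a quantitative basis for comparing them. The user model (follow a link that gets closer to the target with probability `p_correct`, otherwise click at random), the two site graphs, and all names and parameter values are assumptions, not Web TANGO's actual model.

```python
import random
from collections import deque

# Two hypothetical designs for the same content: page -> pages it links to.
DEEP = {
    "home": ["a", "b"],
    "a": ["home", "a1", "a2"],
    "b": ["home", "b1"],
    "a1": ["a"], "a2": ["a"], "b1": ["b"],
}
FLAT = {
    "home": ["a1", "a2", "b1"],
    "a1": ["home"], "a2": ["home"], "b1": ["home"],
}


def clicks_to(site, target):
    """Breadth-first search over reversed links: fewest clicks from each page to target."""
    reverse = {}
    for page, links in site.items():
        for dest in links:
            reverse.setdefault(dest, []).append(page)
    dist, queue = {target: 0}, deque([target])
    while queue:
        page = queue.popleft()
        for src in reverse.get(page, []):
            if src not in dist:
                dist[src] = dist[page] + 1
                queue.append(src)
    return dist


def simulate(site, start, target, p_correct=0.7, max_clicks=10, trials=10_000, seed=0):
    """Monte Carlo estimate of (success rate, mean clicks on success).

    Assumed user model: on each page, follow a link that gets closer to the
    target with probability p_correct; otherwise click a link at random.
    """
    rng = random.Random(seed)
    dist = clicks_to(site, target)
    successes = total_clicks = 0
    for _ in range(trials):
        page, clicks = start, 0
        while page != target and clicks < max_clicks:
            links = site.get(page, [])
            if not links:  # dead end: the simulated user gives up
                break
            here = dist.get(page, float("inf"))
            closer = [p for p in links if dist.get(p, float("inf")) < here]
            if closer and rng.random() < p_correct:
                page = rng.choice(closer)
            else:
                page = rng.choice(links)
            clicks += 1
        if page == target:
            successes += 1
            total_clicks += clicks
    mean_clicks = total_clicks / successes if successes else float("inf")
    return successes / trials, mean_clicks


for name, site in (("deep", DEEP), ("flat", FLAT)):
    rate, clicks = simulate(site, "home", "b1")
    print(f"{name}: success {rate:.2f}, mean clicks {clicks:.2f}")
```

With these settings the flat design needs fewer clicks on average than the deeper one; changing `p_correct` models users who are better or worse at judging link labels.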
Guidelines



There are many usability guidelines
A survey of 21 sets of web guidelines found little overlap (Ratner et al. 96)
Why?
Our hypothesis: the guidelines are not empirically validated
So … let’s figure out what works!
An Empirical Study: Which features distinguish well-designed web pages?
Methodology

Data collection:
1108 pages
163 sites
3 levels per site
14 metrics
About 85% accurate
Text cluster and text positioning counts are less accurate
Metrics
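The slides do not list the 14 metrics, so the sketch below computes three simple page-level counts (words of visible text, links, and images) chosen only for illustration. It uses Python's standard `html.parser` module; the metric names and the `page_metrics` helper are hypothetical.

```python
import re
from html.parser import HTMLParser


class PageMetrics(HTMLParser):
    """Counts words, links, and images in one HTML page (an illustrative subset of metrics)."""

    def __init__(self):
        super().__init__()
        self.words = 0
        self.links = 0
        self.images = 0
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.links += 1
        elif tag == "img":
            self.images += 1
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:  # only visible text counts toward the word total
            self.words += len(re.findall(r"\w+", data))


def page_metrics(html):
    parser = PageMetrics()
    parser.feed(html)
    parser.close()
    return {"words": parser.words, "links": parser.links, "images": parser.images}


SAMPLE = (
    '<html><body><h1>Hello world</h1><p>See <a href="/a">this page</a>.</p>'
    '<img src="x.png"><script>var x = 1;</script></body></html>'
)
print(page_metrics(SAMPLE))  # {'words': 5, 'links': 1, 'images': 1}
```

Features that depend on visual layout, such as text positioning, would need rendering information that a plain HTML parser does not provide.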
Preliminary Results



Linear regression to predict Webby judges' ratings
Top 30% vs. bottom 30%
Prediction accuracy:
72% if categories are not taken into account
83% if categories are assessed separately
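The prediction setup can be sketched with a one-metric least-squares fit: fit ratings on all pages, then check whether pages in the top 30% and bottom 30% by actual rating land on the correct side of a cutoff. The study regressed on multiple metrics; the single metric, the mean-rating cutoff, the helper names, and the made-up data here are simplifying assumptions.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for rating = a + b * metric (one metric, for brevity)."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def extreme_group_accuracy(xs, ys, frac=0.3):
    """Fit on all pages, then score top-vs-bottom classification on the extreme groups."""
    k = int(len(ys) * frac)
    if k == 0:
        raise ValueError("need enough pages to form top and bottom groups")
    a, b = fit_line(xs, ys)
    order = sorted(range(len(ys)), key=lambda i: ys[i])  # pages by actual rating
    bottom, top = order[:k], order[-k:]
    cutoff = mean(ys)  # assumed rule: predicted rating above the mean means "top"
    correct = sum(a + b * xs[i] > cutoff for i in top)
    correct += sum(a + b * xs[i] <= cutoff for i in bottom)
    return correct / (2 * k)


# Made-up data: one metric value and one judge rating per page.
metric = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
rating = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
print(extreme_group_accuracy(metric, rating))  # 1.0
```

Assessing categories separately, as in the 83% figure, would correspond to fitting and scoring one model per site category. A fairer estimate would also score pages held out from the fit rather than the training pages used here.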
Goals


Create empirical foundations for what is still guesswork
Next step: a free online tool
Long-term goal: a Monte Carlo simulator for comparing potential designs
For More Information
http://webtango.berkeley.edu
[email protected]