Transcript UIST 2006

1 of 20
Enabling web browsers to
augment web sites’
filtering and sorting functionalities
David Huynh · Rob Miller · David Karger
MIT Computer Science & Artificial Intelligence Laboratory
UIST 2006 · Montreux, Switzerland
2 of 20
WHERE TO GO NEXT?
• Letizia · Lieberman, 1995
• WebWatcher · Joachims et al., 1996
• WBI · Barrett et al., 1997
3 of 20
WHAT’S THE BIG PICTURE?
• WebBook · Card et al., 1996
• WebNet · Cockburn, Jones, 2000
4 of 20
DATA TIDBITS
• Selection Recognition Agent · Pandit, Kalbag, 1997
• Apple Data Detectors · Nardi et al., 1998
• Microsoft Smart Tags
• Google AutoLink
• Creo · Faaborg, Lieberman, 2006
5 of 20
Internet Scrapbook · Sugiura, Koseki, UIST 1998
6 of 20
Hunter Gatherer · schraefel et al., WWW 2002
7 of 20
Piggy Bank · Huynh et al., Int’l Semantic Web Conf 2005
8 of 20
Dontcheva et al., UIST 2006
9 of 20
OUTLINE
• Motivations
• Contributions
• interaction design
• data extraction algorithm
• Evaluations
• Future Work
10 of 20
MOTIVATIONS
• Internet Scrapbook, Hunter Gatherer, Piggy Bank, Mira’s
work: all address the user need to “save for later use”
• Immediate use
• Intervention by the web browser
• Unique user needs; web sites not effective
• Private user information
• Sub-goal of “save for later use”
11 of 20
UNIQUE NEEDS
12 of 20
MOTIVATIONS
• Immediate use → in-place interaction
• Keep context
• Leverage web site
• (Ultimately, seamless integration of features provided by browser and by web site)
13 of 20
CONTRIBUTIONS
• UI design for
• Extracting, then
• Sorting/filtering
structured data in sequences of web pages
• Data extraction algorithm
• For sequences of web pages
• Yielding field values
→ Firefox extension “Sifter”, open source
14 of 20
UI DESIGN – EXTRACTION INTERFACE
• Not a user’s goal
• Poorly understood
• Browser and web server are “Mr. Computer”
• It doesn’t let me do this…
• WYSI-all-YG
• Semantic/structure recovery from HTML
• Seemingly unnecessary
• Lossy recovery
• WYSI-what-the-computer-sees (understands)
• Lengthy, error-prone
15 of 20
UI DESIGN – EXTRACTION INTERFACE
• Preview of what Sifter will do
• Give user better sense of control
• Offer a chance to make corrections early
• Make lengthy wait more acceptable
• While keeping the confirmation steps short
16 of 20
UI DESIGN – AUGMENTATION INTERFACE
• Consistency with rest of page
• Paging controls, Browsing controls, Sorting controls, Indicators
17 of 20
UI DESIGN – EXTRACTION INTERFACE
18 of 20
UI DESIGN – AUGMENTATION INTERFACE
• Possible solutions?
• Remove them
• Leave them as-is
• Take control of them
• Dim them out
• Focus interaction on items
• Suggest disconnection with rest of page
• Still support interaction
19 of 20
UI DESIGN – AUGMENTATION INTERFACE
20 of 20
UI DESIGN – AUGMENTATION INTERFACE
21 of 20
DATA EXTRACTION
• Item Detection
• Items can be precisely addressed by an XPath
• Each item contains a link
• On search results pages, items occupy most of the page area
• No handling of items made up of sibling/cousin nodes
• Subsequent Page Detection
• Link label heuristic (e.g., “Pages: [1] 2 3 4 … 23”)
• URL parameter heuristic
• http://.../...?...&idx=10&...
• http://.../...?...&idx=20&...
• http://.../...?...&idx=30&...
• Field Detection
• Greedy tree alignment
• Yields typed field values
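
The URL parameter heuristic above can be illustrated with a minimal sketch. This is not Sifter's actual implementation; it only assumes that paging links share the same host, path, and parameter names, and differ in exactly one numeric query parameter whose values are evenly spaced. The function name `guess_paging_parameter` and the example.com URLs are hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl


def guess_paging_parameter(urls):
    """Return (parameter name, sorted offsets) if exactly one numeric query
    parameter varies across the URLs in evenly spaced steps, else None."""
    if len(urls) < 2:
        return None
    parts = [urlsplit(u) for u in urls]
    queries = [dict(parse_qsl(p.query)) for p in parts]
    # Candidate paging links must point at the same host and path.
    if len({(p.netloc, p.path) for p in parts}) != 1:
        return None
    # They must also carry the same set of parameter names.
    names = set(queries[0])
    if any(set(q) != names for q in queries):
        return None
    # Exactly one parameter may take different values across the links.
    varying = [n for n in sorted(names) if len({q[n] for q in queries}) > 1]
    if len(varying) != 1:
        return None
    name = varying[0]
    values = {q[name] for q in queries}
    if not all(v.isdecimal() for v in values):
        return None
    # The values should form an arithmetic progression (e.g., 10, 20, 30).
    offsets = sorted(int(v) for v in values)
    steps = {b - a for a, b in zip(offsets, offsets[1:])}
    if len(steps) != 1:
        return None
    return name, offsets


# Hypothetical paging links, shaped like the idx=10/20/30 pattern on the slide.
example_urls = [
    "http://example.com/search?q=grisham&idx=10&sort=price",
    "http://example.com/search?q=grisham&idx=20&sort=price",
    "http://example.com/search?q=grisham&idx=30&sort=price",
]
paging = guess_paging_parameter(example_urls)
```

Once the parameter and step are known, further page URLs could be generated by continuing the progression. A real detector would also need a fallback such as the link label heuristic.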
22 of 20
EVALUATIONS
• Data Extraction
• Collection-level accuracy, not item-level accuracy
• 30 collections
• 19 perfectly extracted (63% collection-level accuracy)
• With some user intervention
• (Link to test data and detailed results is in paper)
• UI Design
• Web augmentation is a novel concept
• Mira needed a tutorial in her informal evaluation
• Formative evaluation: usable? useful?
• Assuming that extraction is perfect
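
The difference between collection-level and item-level accuracy can be made concrete with a small sketch. It assumes each collection is recorded as a list of booleans (one per item, True if extracted correctly); that representation and the sample data are hypothetical, not the paper's test data.

```python
def collection_level_accuracy(collections):
    """Fraction of collections in which every item was extracted correctly."""
    perfect = sum(1 for items in collections if all(items))
    return perfect / len(collections)


def item_level_accuracy(collections):
    """Fraction of individual items extracted correctly, pooled over collections."""
    total = sum(len(items) for items in collections)
    correct = sum(sum(items) for items in collections)
    return correct / total


# Hypothetical data: three collections, one of which has a single bad item.
example = [[True, True], [True, False, True, True], [True]]
collection_acc = collection_level_accuracy(example)  # 2 of 3 collections are perfect
item_acc = item_level_accuracy(example)              # 6 of 7 items are correct

# The slide's figure: 19 perfectly extracted collections out of 30.
reported = round(19 / 30, 2)
```

Collection-level accuracy is the stricter measure, since a single wrong item makes the whole collection count as a failure.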
23 of 20
EVALUATIONS – UI DESIGN
• Similar to the Flamenco study · Yee et al., CHI 2003
• Task 1: Structured
• Perform specified complex filter/sort operations
• 3 cheapest paperbacks (if bought used) by John Grisham in 2005
• Using Sifter
• Using only the web site (Amazon) in 5 minutes
• High-level instruction build-up
• No specific help
• Task 2: Unstructured
• Judge whether a sale is good
• Using Sifter
• Using only the web site (Ashford.com)
24 of 20
EVALUATIONS – UI DESIGN
• Task 1
• 8/8 successful using Sifter
• 5/8 successful using Amazon
• Only 1 knew Advanced Search
• Unified browsing UI useful
• Task 2
• 7/8 successful using Sifter
• 6/7 looked at distribution of percent saving
25 of 20
SUMMARY
• In-place interaction
• Structured data in web pages
• Advanced browsing features
• Over sequences of pages
• Correction UI
• More in-place interaction
• Interactors