The Case for Browser Provenance

Daniel W. Margo and Margo Seltzer
Harvard School of Engineering and Applied Sciences
Overview
• Problem: Browser Data Management
• Solution: Provenance for Web Browsers
• Use Cases
• Details and Challenges
• Implementation
The Modern Browser: A Super-Application
• Originally a distributed document reader.
• But now most documents are distributed.
• And the definition of “document” has changed:
– Webmail
– YouTube
– Google Apps
• It is difficult for users to manage all this data.
– e.g., recall a specific web page.
Browser Data Management (I)
• A “little big data” problem…
– My history: ~25k objects in ~2 months.
– Tractable for computers, but not for users.
• Traditional solution: Bookmarks.
– Requires users to tag their data in advance…
– …and to manage the bookmarks.
• Advanced solutions:
– History Search (Google Chrome’s “New Tab” page)
– Autocompletion (form history, saved passwords)
Browser Data Management (II)
• Firefox 3’s “Smart Location Bar”
from http://support.mozilla.com/en-US/kb/Smart+Location+Bar
• Most solutions powered by history and usage statistics.
• “History and usage statistics” = provenance.
Traditional Browser History
Web Graphs (Firefox 3 Places)
Browser Provenance
Use Case: Contextual History Search
• Most history search is textual.
• Edges imply contextual relationships.
– E.g., “rosebud” → “Citizen Kane”.
• 2-phase contextual search (Shah et al.):
– Perform a textual history search.
– Then, push the weight of results to neighbors.
• Similar to modern web search…
– And good for the same reasons.
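
The two phases above can be sketched in a few lines. This is a minimal illustration, not the algorithm of Shah et al.: the function name `contextual_search`, the `damping` parameter, the substring-count scoring, and the choice to treat edges as undirected are all assumptions made for the example.

```python
from collections import defaultdict


def contextual_search(pages, edges, query, damping=0.5):
    """Rank history entries by text match, then spread weight to neighbors.

    pages: dict mapping a URL to its text (hypothetical history store).
    edges: list of (source, destination) pairs from the provenance graph.
    damping: assumed fraction of a hit's weight shared among its neighbors.
    """
    # Phase 1: plain textual search (here, a case-insensitive substring count).
    q = query.lower()
    scores = {url: float(text.lower().count(q)) for url, text in pages.items()}

    # Treat edges as undirected so context flows both ways (an assumption).
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    # Phase 2: push part of each textual hit's weight to its graph neighbors.
    final = dict(scores)
    for url, score in scores.items():
        if score > 0 and neighbors[url]:
            share = damping * score / len(neighbors[url])
            for n in neighbors[url]:
                final[n] = final.get(n, 0.0) + share

    # Highest score first; ties broken by URL for a stable order.
    hits = [(u, s) for u, s in final.items() if s > 0]
    return sorted(hits, key=lambda item: (-item[1], item[0]))
```

With a history in which a "rosebud" search led to a Citizen Kane page, the Citizen Kane page is returned even though its text never mentions "rosebud", because it inherits weight from its neighbor.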
Use Case: Personalizing Web Search
• Context is created by the user.
– So a gardener relates “rosebud” → “flower”.
– Frustrating if Google returns “Citizen Kane”.
• Browser could clarify context to search engine!
– Naïve: Just insert “flower” into “rosebud” searches.
– If engine had a better interface, we could do better.
• Personalization with privacy.
– Browser knows more about user than cookies can.
– No need to give third parties raw personal data.
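
The naïve approach can be sketched as a query-expansion step that runs inside the browser. Everything here is an assumption made for illustration: the function name `expand_query`, the use of page titles as a stand-in for provenance context, and the small stopword list. Only the expanded query string would leave the browser; the history itself stays local.

```python
from collections import Counter

# A tiny stopword list, assumed for the example.
STOPWORDS = frozenset({"the", "a", "an", "and", "of", "for", "in", "to"})


def expand_query(query, history_titles):
    """Append the word that most often accompanies `query` in local history.

    history_titles: titles of pages the user has visited (hypothetical input).
    Returns the query unchanged when the history offers no context.
    """
    q = query.lower()
    counts = Counter()
    for title in history_titles:
        words = title.lower().split()
        if q in words:
            counts.update(w for w in words if w != q and w not in STOPWORDS)
    if not counts:
        return query
    # Most frequent co-occurring word; ties broken alphabetically.
    best = min(counts, key=lambda w: (-counts[w], w))
    return f"{query} {best}"
```

For a gardener whose history pairs "rosebud" with "flower", `expand_query("rosebud", titles)` yields "rosebud flower" without revealing any individual page visit to the search engine.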
Use Case: Time-Contextual History Search
• Current histories can’t recreate prior state.
– e.g., “were these two pages open simultaneously?”
• Time relationships…
– Are natural: “rosebud, and I think I was also looking at gardening tools around that time.”
– Narrow the search space a great deal.
• Related Work:
– Gyllstrom and Soules’ “SeeTrieve”
– Dumais et al.’s “Stuff I’ve Seen”
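
If the browser recorded when each page was opened and closed, the "open simultaneously" question reduces to an interval-overlap test. The sketch below assumes a hypothetical `Visit` record with `opened` and `closed` timestamps; current history stores do not necessarily keep a close time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    """Hypothetical history record with an open and a close timestamp."""
    url: str
    opened: float  # seconds since some epoch
    closed: float


def open_simultaneously(a, b):
    """True if the two visits overlapped in time."""
    return a.opened < b.closed and b.opened < a.closed


def open_alongside(visits, target_url):
    """URLs of pages that were open while `target_url` was open."""
    targets = [v for v in visits if v.url == target_url]
    return sorted({
        v.url
        for v in visits
        for t in targets
        if v.url != target_url and open_simultaneously(v, t)
    })
```

A query such as "rosebud, around the time I was looking at gardening tools" can then intersect the textual hits for "rosebud" with `open_alongside(visits, tools_url)`, which narrows the candidate set considerably.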
Use Case: Download Lineage
• Need to know where data comes from.
– For source attribution, finding updates, etc.
• URL is not always sufficient.
– “This image came from…ImageShack!”
• This is exactly what provenance is for!
– Just query ancestors!
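
An ancestor query is a plain graph traversal. The sketch below assumes the provenance graph is available as a dictionary from each object to the objects it was derived from; the name `ancestors`, that representation, and the example URLs are all hypothetical.

```python
from collections import deque


def ancestors(parents, node):
    """Return every ancestor of `node`, nearest first (breadth-first order).

    parents: dict mapping an object to the list of objects it came from.
    """
    found = []
    visited = {node}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in visited:  # guards against cycles in the graph
                visited.add(parent)
                found.append(parent)
                queue.append(parent)
    return found


# Hypothetical lineage: a downloaded image, its hosting page, and how the
# user got there.
lineage = {
    "photo.jpg": ["imageshack.example/abc"],
    "imageshack.example/abc": ["forum.example/thread"],
    "forum.example/thread": ["search?q=rosebud"],
}
```

Here `ancestors(lineage, "photo.jpg")` walks past the image host to the forum thread and the search that led to it, which is the attribution a bare download URL cannot provide.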
Conclusion
• Browsers record many statistics.
• These statistics are provenance records.
• Provenance techniques can improve:
– History search, via context.
– Web search, via personalization.
– Data management, via lineage.
• Some details in the paper.
• Excruciating details in future work.