From Browsing to Interacting: DBMS Support for Responsive Websites

Challenges in Scalable Data Mining:
Support for DB-Backed Web Sites
Raghu Ramakrishnan
Professor, UW-Madison
Talk Outline

Introduction
– Personalization
– User interactivity
– DB implications
Background
– Tracking users
– Content management and delivery
Challenges
Introduction
Evolution of Websites
Standard content, passive users -> personalized content, active users
Fundamental Shift

User-centric design of websites.
– The Web, unlike print, phones, TV, and other media, offers a unique opportunity to present each customer with a customized experience.
– Exploiting this potential is becoming a key differentiator across sites.
See http://www.personalization.com/resources/vendors/
Personalization

Adapt the site to each user, even each visit.
– Bugs Bunny is different from Michael Jordan.
– Last time, Bugs was shopping for himself; this time he’s looking for a gift for Michael.
Personalization

Technical implications:
– Need to know something about the user and the current visit
– Need to dynamically alter the requested page
Privacy concerns:
– Will an individual user’s profile be disclosed or sold to others? Will the profile information be used for anything other than improving that user’s site experience in ways the user approves?
User Interactions

Traditionally: searches, purchases.
– Doesn’t leverage a site’s biggest asset: its users. The site itself is not changed by users in this model.
Richer interactions: Web communities.
– Put up for auction, bid
– Comment, rate
– Form groups and work together
– Ask, answer
– User-generated content
Web Communities

Site content is driven by users and changes rapidly.
Viral growth patterns lead to high volumes of traffic.
Need to validate content and review it for quality.
– Must track user activity
Greater need for personalization, push technologies.
– Again, need to track users and serve dynamic pages
DBs, Mining, and Websites



Personalization and increased user interactivity
both lead to websites that deliver dynamically
constructed pages, based on data in a DB.
Ergo, we have a vast new application domain
for database management systems.
Ergo, we have a challenge: How best to adapt
each page to the current user and context.
Background
Tracking Users: Cookies

GET
– Browser issues this command to retrieve a document; the request includes all cookies visible to the target server
– Server responds with header info, including document size, server location, cookie directives, etc., plus the document
GET … Cookie: visits=10 …
Set-Cookie: visits=11
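
The visit-counter exchange above can be sketched in a few lines of Python. This is a minimal illustration only, using the standard library's `http.cookies` module; the function name `handle_get` is hypothetical, and a real server would also attach expiry and scope attributes to the cookie.

```python
from http.cookies import SimpleCookie


def handle_get(cookie_header: str) -> str:
    """Return the Set-Cookie value a server might send for one GET request.

    cookie_header is the raw value of the request's Cookie header,
    e.g. "visits=10". (handle_get is a hypothetical name for this sketch.)
    """
    jar = SimpleCookie()
    jar.load(cookie_header)  # parse the cookies the browser sent

    # A first-time visitor sends no "visits" cookie, so start from zero.
    visits = int(jar["visits"].value) if "visits" in jar else 0

    reply = SimpleCookie()
    reply["visits"] = str(visits + 1)
    return reply["visits"].OutputString()


print(handle_get("visits=10"))  # visits=11
```

The counter lives entirely in the browser here; a site that wants to keep the count in its database would store only an opaque user identifier in the cookie.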
Cookies

Server can set the following parameters for cookies:
– Name and value
– When the cookie expires
– Which pages on the server “see” the cookie
– Which servers can “see” the cookie
  E.g., DoubleClick servers can see cookies set at many sites
Alternative to cookies:
– Carry request history along: modify each requested page to “attach” history to every link on the page!
– Allows session tracking, but not across sessions.
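
A rough sketch of the link-rewriting alternative follows. It assumes the history is carried in a query parameter named `hist` (a made-up name) and that links appear as double-quoted `href` attributes; a production rewriter would use a real HTML parser instead of a regular expression.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def attach_history(html: str, history: list[str]) -> str:
    """Rewrite every href in html so that it carries the request history.

    The "hist" parameter name is an assumption made for this sketch.
    """
    token = ",".join(history)

    def rewrite(match: re.Match) -> str:
        parts = urlsplit(match.group(2))
        # Drop any stale history, keep the link's other parameters.
        query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
                 if k != "hist"]
        query.append(("hist", token))
        new_url = urlunsplit(parts._replace(query=urlencode(query)))
        return f"{match.group(1)}{new_url}{match.group(3)}"

    return re.sub(r'(href=")([^"]*)(")', rewrite, html)


page = '<a href="/athlon?id=7">Athlon</a>'
print(attach_history(page, ["home", "cpu"]))
```

Because the history exists only in the links of the pages served, it disappears when the user closes the browser or types in a new URL, which is why this approach tracks a session but not a returning visitor.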
Vignette StoryServer

A platform for developing dynamic web sites:
– Content personalization and delivery
– Content management
An elaborate gateway that sits between web servers and DBMSs (and file systems).
Spin-off from CNET’s efforts to develop their own site.
Vignette StoryServer

A page is assembled dynamically from components:
– Adaptive navigation bars
– Summary components (e.g., top-ten lists)
– Personalized elements (e.g., selected news); integration with recommendation engines such as Net Perceptions’ GroupLens is supported
Caching support for components provides the ability to trade off the degree of dynamism (and customization)
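
The idea of assembling a page from separately cached components can be illustrated as follows. This is not StoryServer's API; `ComponentCache`, `assemble_page`, the component names, and the time-to-live values are all invented for the sketch. Each component gets its own lifetime, so a shared navigation bar can be reused for an hour while a personalized element is rebuilt on every request.

```python
import time
from typing import Callable, Dict, Tuple


class ComponentCache:
    """Caches rendered page fragments, each with its own time-to-live."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._store: Dict[Tuple[str, str], Tuple[float, str]] = {}

    def render(self, name: str, key: str, ttl: float,
               build: Callable[[], str]) -> str:
        now = self._clock()
        hit = self._store.get((name, key))
        if hit is not None and now < hit[0]:
            return hit[1]              # still fresh: reuse the fragment
        html = build()                 # expired or missing: rebuild it
        self._store[(name, key)] = (now + ttl, html)
        return html


def assemble_page(cache: ComponentCache, user: str) -> str:
    # Shared components are keyed by "all"; personalized ones by user.
    nav = cache.render("nav", "all", 3600, lambda: "<nav>Home | Reviews</nav>")
    top = cache.render("top-ten", "all", 300, lambda: "<ol><li>Athlon</li></ol>")
    news = cache.render("news", user, 0, lambda: f"<p>News for {user}</p>")
    return nav + top + news


print(assemble_page(ComponentCache(), "joe"))
```

Raising a component's time-to-live makes the page cheaper to produce but less dynamic; a value of 0 means the component is regenerated for every request.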
Data Mining Challenges
A List of Challenges




Similarity (real-time)
Matching (real-time)
Trends (off-line)
Correlation (off-line)
The Similarity Problem

Find users with similar tastes, in context.
– Joe’s looking at an Athlon processor; which users are similar to Joe in their PC tastes? Whose recommendations is Joe likely to follow?
Find similar content, in context.
– Which processors are similar in that they appeal to the same groups of people?
– Which processors are similar in that they have similar performance characteristics?
– Which articles appeal to the same people?
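
One common way to make these questions concrete, though not necessarily the approach the talk has in mind, is to compare rating vectors. The sketch below uses cosine similarity between users and Jaccard overlap between the audiences of two products. The ratings table, the user names other than Joe, and the "liked" threshold of 4 are invented for illustration.

```python
from math import sqrt

# Hypothetical ratings within the "processors" context: user -> {product: rating}
ratings = {
    "joe": {"athlon": 5, "pentium3": 2, "celeron": 1},
    "ann": {"athlon": 4, "pentium3": 1, "k6": 4},
    "bob": {"athlon": 1, "pentium3": 5, "celeron": 4},
}


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse rating vectors."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    dot = sum(a[p] * b[p] for p in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def most_similar_users(user: str, table: dict) -> list:
    """Other users ranked by similarity to `user`, most similar first."""
    scored = [(cosine(table[user], r), u) for u, r in table.items() if u != user]
    return sorted(scored, reverse=True)


def audience_jaccard(item_a: str, item_b: str, table: dict, liked: int = 4) -> float:
    """Overlap between the sets of users who liked each product."""
    fans_a = {u for u, r in table.items() if r.get(item_a, 0) >= liked}
    fans_b = {u for u, r in table.items() if r.get(item_b, 0) >= liked}
    union = fans_a | fans_b
    return len(fans_a & fans_b) / len(union) if union else 0.0


print(most_similar_users("joe", ratings)[0][1])   # ann
print(audience_jaccard("athlon", "k6", ratings))  # 0.5
```

Restricting the table to one product category is what puts the comparison "in context": Joe's closest match for processors need not be his closest match for music. A scan over all users like this one is far too slow for real-time use on large sites, which is part of the challenge.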
The Matching Problem

Match user to data, in context.
– What related information should you recommend to Joe when he is looking at the Athlon PC product?
  Related products: graphics cards, monitors
  Related reviews, discussions
  If Joe’s been looking only at AMD products, other AMD chips; if not, show alternatives from Intel
Match data to user, in context.
– Which expert is best qualified to answer Joe’s question?
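
The brand rule in the Athlon example can be written down directly. The sketch below is a hand-coded rule over a made-up catalog, meant only to show what "in context" means here: the answer depends both on the product being viewed and on what the user has viewed before. A real site would derive such rules from data instead of hard-coding them.

```python
# Hypothetical catalog: product -> attributes
catalog = {
    "athlon": {"brand": "AMD", "kind": "cpu"},
    "duron": {"brand": "AMD", "kind": "cpu"},
    "pentium3": {"brand": "Intel", "kind": "cpu"},
    "graphics-card": {"brand": "Other", "kind": "accessory"},
}


def recommend(current: str, viewed: list, items: dict) -> list:
    """Suggest items to show next to `current`, given the user's history."""
    cur = items[current]
    # Brands the user has looked at within the same kind of product.
    seen = {items[p]["brand"] for p in viewed if items[p]["kind"] == cur["kind"]}
    seen.add(cur["brand"])
    loyal = seen == {cur["brand"]}

    picks = []
    for name, info in items.items():
        if name == current:
            continue
        if info["kind"] != cur["kind"]:
            picks.append(name)  # related products such as graphics cards
        elif loyal == (info["brand"] == cur["brand"]):
            # Single-brand history: same-brand chips; otherwise alternatives.
            picks.append(name)
    return picks


print(recommend("athlon", ["duron"], catalog))     # ['duron', 'graphics-card']
print(recommend("athlon", ["pentium3"], catalog))  # ['pentium3', 'graphics-card']
```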
The Trends Problem

Identify trends in sales.
Identify trends in overall user preferences, user segmentation.
Identify trends for individual users.
Identify trends in overall product popularity, product segmentation.
Identify trends for specific products.
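
As one simple and deliberately naive reading of "trend", the slope of a least-squares line fitted to a time series says whether a quantity is rising or falling. The sketch below assumes equally spaced observations (for example weekly sales); the function name and the sample figures are invented.

```python
def trend_slope(values: list) -> float:
    """Least-squares slope of values observed at times 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


# Hypothetical weekly unit sales per product
weekly_sales = {"athlon": [10, 12, 14, 16], "k6": [9, 9, 9, 9]}
for product, series in weekly_sales.items():
    print(product, trend_slope(series))
```

The same function applies whether the series is overall sales, one user's activity, or one product's popularity; the off-line cost comes from how many such series there are to examine.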
The Correlations Problem

Given a set of trends (e.g., in pricing), track the impact on other trends.
– Are there correlated trends?
– Are there causal relationships?
Note that correlating a given trend to an overall trend is hard enough, but trying to find all other individual or product-specific trends that happen to be correlated is much harder!
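
To see why the all-pairs version is so much harder, consider the brute-force approach sketched below: compute the Pearson correlation for every pair of trend series. With n series that is n(n-1)/2 comparisons. The series names, values, and the 0.9 threshold are invented for the sketch, and a high correlation says nothing by itself about causation.

```python
from itertools import combinations
from math import sqrt


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant series has no defined correlation
    return cov / (sd_x * sd_y)


def correlated_pairs(trends: dict, threshold: float = 0.9) -> list:
    """All pairs of series whose absolute correlation reaches the threshold."""
    return [(a, b) for a, b in combinations(sorted(trends), 2)
            if abs(pearson(trends[a], trends[b])) >= threshold]


# Hypothetical trend series
trends = {
    "price": [10, 9, 8, 7],
    "sales": [100, 110, 120, 130],
    "page_views": [5, 9, 4, 8],
}
print(correlated_pairs(trends))  # [('price', 'sales')]
```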
Problem Characteristics

Large datasets: many users, huge activity levels, lots of products, lots of documents, …
Real-time recommendations: “in context”
Constantly evolving data: data mining models can get outdated; want to find trends.
Variations:
– Attach recommendation engine to a user’s browser, rather than to the web server. (Purple Swami)
– Look for similar documents across sites and extract relevant metadata. (Whizbang)
Summary

Lots of challenges.
Lots of players.
– Companies that provide applications and integrate data mining into the application logic.
  E.g., ATG, BroadVision, QUIQ, Vignette
– Companies that provide data mining tools.
  E.g., Blaze, Broadbase, DataSage, Engage, E.piphany, Net Perceptions, Manna