No Slide Title - UCLA Computer Science
Download
Report
Transcript No Slide Title - UCLA Computer Science
Database-Centric Data Mining
and Inductive DBMS
Carlo Zaniolo
Computer Science Department
UCLA
E-Commerce Applications
ZAIAS Corp: decision support and e-Services for
web-based auctions—technology to:
monitor multiple ongoing auctions,
determine the right price for auctioned goods, &
secure well-priced items by timely bids.
Sophisticated analysis of sequence patterns and
instant response is needed for first two points.
Similar requirements in many other applications
Sequence Analysis: many applications
Mining web access logs for:
Customer characterization/segmentation
Target advertising—banner optimization
Frequent sequences in market baskets
Analysis of stock market trends
Double bottoms and similar patterns
Fraud detection examples examples
Stolen credit cards, theft of cellular phones and user ids,
Intrusion Detection:
Example: A peripatetic intruder that attempts successive
logins from proximity workstations---spatio-temporal
criteria used to detect such an attack
State of The Art
ADT (e.g.. Informix Datablades): Not
flexible enough, no Optimization
In particular not suitable for infinite data
streams
SEQ: Enhanced ADTs (e g. sets and
sequences) with their own query language
SRQL: Adding sequence algebra operators
to relational model
SQL-TS
A query language for finding complex
patterns in sequences
Completely based on SQL---Minimal
extensions, only the from clause affected
A powerful query optimization technique based
on extensions of the Knuth, Morris & Pratt
(KMP) string-search algorithm
Example in Mining Weblogs
Consider a table or a stream of tuples:
Sessions(SessNo, ClickTime, PageNo, PageType)
That keeps track of pages visited in a session
(sequence of requests from the same user)
Page types include:
content (‘c’)
description of product (‘d’)
purchase (‘p’)
Ads a web merchant dream of: c, d, p
3 clicks for a purchase
SQL-TS queries to find the ideal 3-click
scenario
SELECT B.PageNo, C.ClickTime
FROM Sessions
CLUSTER BY SessNo
SEQUENCE BY ClickTime
AS (A, B, C)
WHERE
A.PageType=‘c’
AND
B.PageType=‘d’
AND
C.PageType=‘p’
Credit Card Spending
Consider a Table log that keeps track of
credit card transactions:
Spending(Date, AccountNo, Amount)
A surge in average spending might be sign of
credit card theft.
Credit Card Fraud Detection in SQL-TS
Track 30-day average spending and when it increases considerably
for two consecutive days:
Select Z.AccountNo, Z.Date
FROM Spending
CLUSTER BY AccountNo
SEQUENCE BY Date
AS (*X, Y, Z)
WHERE COUNT(*X)=30
AND
Y.Amount > 5 * AVG(*X.Amount)
AND
Z.Amount > 5 * AVG(*X.Amount)
*X denotes 1 or more occurrences of X
Aggregates can be computed on the stars
Example in Online Auction
stream containg ongoing bids:
Bids(auctn_id, Amount, Time)
Table describing auctions:
auctions(auctn_id, item_id, min_bid, deadline,…)
Find bids that are converging to a fixed price during
the last 15 minutes of the auction.
Example in Online Auction in SQL-TS
Find three successive bids that raise the last bid by
less than 2% during the last 15 minutes of auction:
SELECT T.auction_id, T.time, T.amount
FROM Autctions AS A,
Bids CLUSTER BY auctn_id
SEQUENCE BY time
AS (X, Y, Z, T)
WHERE A.auctn_id=X.auctn_id
AND
X.time + 15 Minute > A.deadline
AND
Y.amount - X.amount < 0.02 * X.amount
AND
Z.amount - Y.amount < 0.02 * Y.amount
AND
T.amount - Z.amount < 0.02 * Z.amount
Conclusion
SQL-TS is a simple but powerful SQL
extension to searche for pattern in
sequences and time series.
QL-TS is supported by powerful query
optimization techniques based on a
generalization of the Knuth, Morris and
Pratt text search algorithm.
References
1.
2.
3.
4.
5.
6.
Reza Sadri, Carlo Zaniolo, Amir Zarkesh, Jafar Adibi:
Expressing and optimizing sequence queries in database
systems. ACM Transactions on Database Systems
(TODS)Volume 29 , Issue 2 (June 2004)
Reza Sadri, Carlo Zaniolo, Amir M. Zarkesh, Jafar Adibi:
Optimization of Sequence Queries in Database Systems.
PODS 2001.
Reza Sadri, Carlo Zaniolo, Amir M. Zarkesh, Jafar Adibi: A
Sequential Pattern Query Language for Supporting Instant
Data Minining for e-Services, VLDB 2001.
R. Sadri, Optimization of Sequence Queries in Database
Systems Ph.D. Thesis, UCLA, 2001.
P. Seshadri, M. Livny, and R. Ramakrishnan. SEQ: A model for
sequence databases. In ICDE,, 1995.
P. Seshadri, M. Livny, and R. Ramakrishnan. Sequence query
processing. ACM SIGMOD Conference on Management of
Data,, May 1994