No Slide Title - UCLA Computer Science

Download Report

Transcript No Slide Title - UCLA Computer Science

Patterns in Sequences and
Data Streams
Carlo Zaniolo
Computer Science Department
UCLA
E-Commerce Applications
 ZAIAS Corp: decision support and e-Services for
web-based auctions—technology to:
monitor multiple ongoing auctions,
determine the right price for auctioned goods, &
secure well-priced items by timely bids.
 Sophisticated analysis of sequence patterns and
instant response is needed for first two points.
 Similar requirements in many other applications
Sequence Analysis: many applications
 Mining web access logs for:
Customer characterization/segmentation
Target advertising—banner optimization
 Frequent sequences in market baskets
 Analysis of stock market trends
Double bottoms and similar patterns
 Fraud detection examples examples
Stolen credit cards, theft of cellular phones and user ids,
 Intrusion Detection:
Example: A peripatetic intruder that attempts successive
logins from proximity workstations---spatio-temporal
criteria used to detect such an attack
State of The Art
ADT (e.g.. Informix Datablades): Not
flexible enough, no Optimization
 In particular not suitable for infinite data
streams
SEQ: Enhanced ADTs (e g. sets and
sequences) with their own query language
SRQL: Adding sequence algebra operators
to relational model
SQL-TS
A query language for finding complex
patterns in sequences
Completely based on SQL---Minimal
extensions, only the from clause affected
A powerful query optimization technique based
on extensions of the Knuth, Morris & Pratt
(KMP) string-search algorithm
Example in Mining Weblogs
Consider a table or a stream of tuples:
Sessions(SessNo, ClickTime, PageNo, PageType)
That keeps track of pages visited in a session
(sequence of requests from the same user)
Page types include:
content (‘c’)
description of product (‘d’)
purchase (‘p’)
Ads a web merchant dream of: c, d, p
3 clicks for a purchase
 SQL-TS queries to find the ideal 3-click scenario
SELECT B.PageNo, C.ClickTime
FROM Sessions
PARTITION BY SessNo
ORDER BY ClickTime
AS (A, B, C)
WHERE
A.PageType=‘c’
AND
B.PageType=‘d’
AND
C.PageType=‘p’
PARTITION BY and SEQUENCE BY were used in the original
paper
Credit Card Spending
Consider a Table log that keeps track of
credit card transactions:
Spending(Date, AccountNo, Amount)
A surge in average spending might be sign of
credit card theft.
Credit Card Fraud Detection in SQL-TS
Track 30-day average spending and when it increases considerably
for two consecutive days:
Select Z.AccountNo, Z.Date
FROM Spending
PARTITION BY AccountNo
SEQUENCE BY Date
AS (X+, Y, Z)
WHERE COUNT(X+)=30
AND
Y.Amount> 5*AVG(X+.Amount)
AND
Z.Amount > 5*AVG(X+.Amount)
 +X denotes 1 or more occurrences of X
 Aggregates can be computed on the stars
Example in Online Auction
 stream containing ongoing bids:
Bids(auctn_id, Amount, Time)
 Table describing auctions:
auctions(auctn_id, item_id, min_bid, deadline,…)
Find bids that are converging to a fixed price during
the last 15 minutes of the auction.
Example in Online Auction in SQL-TS
Find three successive bids that raise the last bid by
less than 2% during the last 15 minutes of auction:
SELECT T.auction_id, T.time, T.amount
FROM Auctions AS A,
Bids PARTITION BY auctn_id
ORDER BY time
AS (X, Y, Z, T)
WHERE A.auctn_id=X.auctn_id
AND
X.time + 15 Minute > A.deadline
AND
Y.amount - X.amount < 0.02*X.amount
AND
Z.amount - Y.amount < 0.02*Y.amount
AND
T.amount - Z.amount < 0.02*Z.amount
Conclusion
 SQL-TS is a simple but powerful SQL
extension to searche for pattern in
sequences and time series.
QL-TS is supported by powerful query
optimization techniques based on a
generalization of the Knuth, Morris and
Pratt text search algorithm.
References
1.
2.
3.
4.
5.
6.
Reza Sadri, Carlo Zaniolo, Amir Zarkesh, Jafar Adibi:
Expressing and optimizing sequence queries in database
systems. ACM Transactions on Database Systems
(TODS)Volume 29 , Issue 2 (June 2004)
Reza Sadri, Carlo Zaniolo, Amir M. Zarkesh, Jafar Adibi:
Optimization of Sequence Queries in Database Systems.
PODS 2001.
Reza Sadri, Carlo Zaniolo, Amir M. Zarkesh, Jafar Adibi: A
Sequential Pattern Query Language for Supporting Instant
Data Minining for e-Services, VLDB 2001.
R. Sadri, Optimization of Sequence Queries in Database
Systems Ph.D. Thesis, UCLA, 2001.
P. Seshadri, M. Livny, and R. Ramakrishnan. SEQ: A model for
sequence databases. In ICDE,, 1995.
P. Seshadri, M. Livny, and R. Ramakrishnan. Sequence query
processing. ACM SIGMOD Conference on Management of
Data,, May 1994