Transcript slides

G54DMT – Data Mining Techniques and
Applications
http://www.cs.nott.ac.uk/~jqb/G54DMT
Dr. Jaume Bacardit
[email protected]
Topic 3: Data Mining
Lecture 5: Regression and Association Rules
Some slides from chapter 5 of Data Mining: Concepts and Techniques by Han & Kamber
Outline of the lecture
• Regression
– Definition
– Representations
• Association rules
– Definition
– Methods
• Resources
Regression
• Regression problems are supervised problems
where the output variable is continuous
• Many techniques with different names are
included in this category
– Regression
– Function approximation
– Modelling
– Curve-fitting
• Given an input vector X and a corresponding
output y, we want to find a function f such
that y’=f(X) is as close as possible to the true y
Evaluating regression
• Supervised learning: we know the true outputs, so we check how different they are from the predicted ones
– Mean Absolute Error: MAE = (1/n) Σ_{i=1..n} |f(X_i) − y_i|
– Mean Squared Error: MSE = (1/n) Σ_{i=1..n} (f(X_i) − y_i)²
– Root Mean Squared Error: RMSE = √MSE
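As a quick illustration, the three metrics can be computed in a few lines of NumPy (a minimal sketch; the example arrays are made up):

import numpy as np

y_true = np.array([3.0, 1.5, 4.2, 2.0])   # true outputs y_i
y_pred = np.array([2.8, 1.9, 4.0, 2.5])   # predictions f(X_i)

mae = np.mean(np.abs(y_pred - y_true))    # Mean Absolute Error
mse = np.mean((y_pred - y_true) ** 2)     # Mean Squared Error
rmse = np.sqrt(mse)                       # Root Mean Squared Error
print(mae, mse, rmse)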
Linear Regression
• Most classic (and widespread in statistics)
type of regression
• f(X) is modelled as
– y’=w0+w1x1+w2x2+…+wnxn
http://upload.wikimedia.org/wikipedia/en/thumb/1/13/Linear_regression.png/400px-Linear_regression.png
Linear regression
• Simple but limited in expressive power
– The same model would apply to these four
datasets
http://en.wikipedia.org/wiki/Anscombe%27s_quartet
Linear regression
• How to find W?
– Many mathematical methods available
• Least squares
• Ridge regression
• Lasso
• Etc.
– We can also use some kind of metaheuristic (e.g. a
Genetic Algorithm)
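For instance, plain least squares can be done directly with NumPy. The following is a minimal sketch on synthetic data (the data and true weights are made up for illustration):

import numpy as np

# synthetic data following y = 2 + 3*x1 - x2 plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2 + 3 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=100)

# prepend a column of ones so that w0 (the intercept) is learned too
X1 = np.column_stack([np.ones(len(X)), X])

# ordinary least squares: W that minimises the sum of squared errors
w, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(w)   # approximately [2, 3, -1]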
Polynomial regression
• More complex and sophisticated functions
– y = w0 + w1x + w2x² + …
– y = w0 + w1x1 + w2x2 + w3x1x2 + …
• Now the job is twofold
– Choosing the correct function (human inspection
may help)
– Adjusting the weights of the model
• Still, would a single mathematical function fit
any type of data?
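Note that the weights still enter the model linearly, so the same least-squares machinery applies once the inputs are expanded into polynomial terms. A minimal sketch with scikit-learn (assuming it is available; the quadratic target is made up for the example):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# noisy samples of a quadratic function
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=(200, 1))
y = 1 + 2 * x[:, 0] - 0.5 * x[:, 0] ** 2 + rng.normal(scale=0.2, size=200)

# expand x into [1, x, x^2], then fit the weights by least squares
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)
print(model.predict([[1.0]]))   # close to 1 + 2 - 0.5 = 2.5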
Piece-wise regression
• The input space is partitioned into regions
• A local regression model is generated from the training examples that fall inside each region
– Approximating a sine function with linear
regressions
(Butz, 2010)
Piece-wise regression
• How to partition the input space
– Using a series of rules
• With a (hyper)rectangular condition (XCSF)
• With a (hyper)ellipsoidal condition (XCSF,LWPR)
• With a neural condition (XCSF)
– Using a tree-like structure (CART, M5)
• How to perform the regression process for
each local approximation
– Pick any of the functions discussed before
– Plus some truly non-linear methods (SVR)
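The sketch below shows the idea in its simplest form: fixed intervals over the input and an independent linear fit inside each. This is not XCSF, LWPR or M5, just a hand-rolled piece-wise approximation of the sine example:

import numpy as np

# approximate sin(x) on [0, 2*pi] with a local linear fit per region
x = np.linspace(0, 2 * np.pi, 400)
y = np.sin(x)

n_regions = 8
edges = np.linspace(0, 2 * np.pi, n_regions + 1)
y_hat = np.empty_like(y)

for lo, hi in zip(edges[:-1], edges[1:]):
    mask = (x >= lo) & (x <= hi)
    # local linear regression: fit w0 + w1*x on the points in this region
    w1, w0 = np.polyfit(x[mask], y[mask], deg=1)
    y_hat[mask] = w0 + w1 * x[mask]

print(np.max(np.abs(y_hat - y)))   # the error shrinks as n_regions grows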
Piece-wise approximation with
hyperellipsoids
• Using XCSF (Wilson, 02) with hyperellipsoid
conditions (Butz et al, 08)
[Figure: XCSF's population approximating the test function (Stalph et al., 2010)]
Other regression methods
• Neural networks
– An MLP is natively a regression method
• Classification is done by discretising the output of the network
– It has been proven that an MLP with enough hidden nodes can approximate any function
• Support Vector Regression
– As in SVM, depending on the kernel we get linear or non-linear regression
– The margin specifies a tube around the approximated function. All points inside the tube have their errors ignored
– Support Vectors are the points that lie outside the tube
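A minimal sketch of the tube idea with scikit-learn's SVR (assuming it is available); epsilon is the half-width of the tube, and the fitted support vectors are the training points on or outside it:

import numpy as np
from sklearn.svm import SVR

# noisy one-dimensional regression problem
rng = np.random.default_rng(2)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=80)

# RBF kernel -> non-linear regression; points inside the epsilon-tube
# contribute no error to the objective
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1)
svr.fit(X, y)
print(len(svr.support_), "support vectors out of", len(X))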
Association Rules
• Association rules try to find patterns in the dataset that frequently appear together
• It can use the class label, but it does not have to, so we can consider it an unsupervised learning paradigm
• Two types of elements are generated
– Association rules: they have an antecedent and a consequent
– Frequent itemsets: they just have an antecedent
• Both antecedent and consequent are logic
predicates (generally of conjunctive form)
Association rules mining
Witten and Frank, 2005 (http://www.cs.waikato.ac.nz/~eibe/Slides2edRev2.zip)
Origin of Association Rules
• These methods were originally employed to
analyse shopping carts
• Database is specified as a set of transactions.
Each of them includes one or more of a set of
items
• A frequent itemset is a set of items that
appears in many transactions
Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E
• These databases are extremely sparse
Beers and diapers
• An urban myth about association rules says
that when applied to analyse a very large
volume of shopping carts they discovered a
very simple pattern
– “Customers that buy beer also tend to buy
diapers”
• This story has changed through time. You can
find an article about it here
• It is a good example of data mining, as it was
able to find an unexpected pattern
Why Is Freq. Pattern Mining Important?
• Discloses an intrinsic and important property of data sets
• Forms the foundation for many essential data mining tasks
– Association, correlation, and causality analysis
– Sequential, structural (e.g., sub-graph) patterns
– Pattern analysis in spatiotemporal, multimedia, time-series,
and stream data
– Classification: associative classification
– Cluster analysis: frequent pattern-based clustering
– Data warehousing: iceberg cube and cube-gradient
– Semantic data compression: fascicles
– Broad applications
Evaluation of association rules
• Support
– Percentage of examples covered by the predicate in the
antecedent
– Applies to both association rules and frequent itemsets
• Confidence
– Percentage of the examples matched by the antecedent that also match the consequent
– Only applies to association rules
• Typically, the user specifies a minimum support and
confidence and the algorithm finds all rules above
the thresholds
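Both measures are easy to state in code. A minimal sketch using the toy transaction database from the earlier slide (the helper names are made up for the example):

# toy transaction database from the earlier slide
transactions = [
    {"A", "C", "D"},       # Tid 10
    {"B", "C", "E"},       # Tid 20
    {"A", "B", "C", "E"},  # Tid 30
    {"B", "E"},            # Tid 40
]

def support(itemset, transactions):
    # fraction of transactions that contain every item in the itemset
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    # support(antecedent together with consequent) / support(antecedent)
    joint = set(antecedent) | set(consequent)
    return support(joint, transactions) / support(antecedent, transactions)

print(support({"B", "E"}, transactions))       # 0.75
print(confidence({"C"}, {"E"}, transactions))  # 2/3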
Scalable Methods for Mining Frequent Patterns
• The downward closure property of frequent patterns
– Any subset of a frequent itemset must be frequent
– If {beer, diaper, nuts} is frequent, so is {beer, diaper}
– i.e., every transaction having {beer, diaper, nuts} also contains
{beer, diaper}
• Scalable mining methods: Three major approaches
– Apriori (Agrawal & Srikant@VLDB’94)
– Freq. pattern growth (FPgrowth—Han, Pei & Yin
@SIGMOD’00)
– Vertical data format approach (Charm—Zaki & Hsiao
@SDM’02)
Apriori: A Candidate Generation-and-Test Approach
• Apriori pruning principle: If there is any itemset which is
infrequent, its superset should not be generated/tested!
(Agrawal & Srikant @VLDB’94, Mannila, et al. @ KDD’ 94)
• Method:
– Initially, scan DB once to get frequent 1-itemsets
– Generate length (k+1) candidate itemsets from length k
frequent itemsets
– Test the candidates against DB
– Terminate when no frequent or candidate set can be
generated
The Apriori Algorithm—An Example
Sup_min = 2

Database TDB:
Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E

1st scan → C1:
Itemset   sup
{A}       2
{B}       3
{C}       3
{D}       1
{E}       3

L1 (candidates in C1 with sup ≥ 2):
Itemset   sup
{A}       2
{B}       3
{C}       3
{E}       3

C2 (generated from L1):
{A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}

2nd scan → C2 with counts:
Itemset   sup
{A, B}    1
{A, C}    2
{A, E}    1
{B, C}    2
{B, E}    3
{C, E}    2

L2:
Itemset   sup
{A, C}    2
{B, C}    2
{B, E}    3
{C, E}    2

C3 (generated from L2): {B, C, E}

3rd scan → L3:
Itemset     sup
{B, C, E}   2
The Apriori Algorithm
• Pseudo-code:
Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with min_support;
end
return ∪k Lk;
Resources
• “The Elements of Statistical Learning” by
Hastie et al. contains a lot of detail about
statistical regression
• List of Regression and association rules
methods in KEEL
• Weka also contains both kinds of methods
• Chapter 5 of the Han and Kamber book is all about association rules (Han created the FPgrowth method)
• Review of evolutionary algorithms for
association rule mining
Questions?