Distributed Action Rules Mining and Feasibility

Transcript Distributed Action Rules Mining and Feasibility

ACTION RULES
Slides by A A Tzacheva.
Knowledge Discovery

The nontrivial process of identifying valid, novel, potentially
useful, and ultimately understandable patterns in data.
(Fayyad, et al 1996)
Data mining is the process of discovering meaningful new
correlations, patterns and trends by sifting through large
amounts of data stored in repositories.
Gartner Group
“The Saying that Knowledge Is Power Is
Not Quite True…
Used Knowledge Is Power”
Edward E. Free
2
Knowledge Discovery

Knowledge Discovery of Databases
(KDD) is a new area of research that
combines many algorithms and techniques
used in artificial intelligence, statistics,
databases, machine learning, etc.

KDD is the process of extracting
previously unknown, not obvious, new,
and interesting information from huge
amount of data

Past research on data mining has mostly
been focused on techniques for generating
rules from datasets
3
Knowledge Discovery
Cleaning
Integration
Selection
Transformation
Data
Warehouse
Data
Mining
Evaluation
Visualization
Prepared
data
Patterns
Knowledge
Knowledge
Base
Data
4
Knowledge Discovery
[Pohle, 2003]
Many data mining systems are great in deriving useful
statistics and patterns from huge amounts of data, but
they are not very smart in interpreting these results,
which is crucial for turning them into interesting,
understandable and actionable knowledge.
Lack of sophisticated tool support for incorporating
human domain knowledge into the mining process. This
domain knowledge should be updated with the mining
results.
Mining Process (Fayyad): [[Business Understanding] 
[Domain Knowledge]]  [Data Understanding] 
[Data Preparation]  [Modeling/Mining] 
[Evaluation]  [Deployment]
5
Interestingness Function
Associations: two conditions occur together, with some
confidence
E = [Cond1  Cond2]
Presumptive
Objective
Data Mining Task:
For a given dataset D, language of facts L, interestingness
function ID, L and threshold c, find association E such
that ID,L(E) > c efficiently.
Knowledge Engineer defines c
6
Are All the “Discovered” Patterns Interesting?

The rules discovered by data mining algorithm are large and we want a subset of
rules, which are interesting, because these algorithms discover accurate rules
rather than interesting rules.

Association is interesting if it is easily understood by humans, valid on new or
test data with some degree of certainty, potentially useful, novel, or validates
some hypothesis that a user seeks to confirm .

There are two aspects of rules’ interestingness that have been studied in data
mining literature, objective and subjective measures

Objective measures are data-driven and domain-independent. Generally, these
measures evaluate the rules based on the quality as well as the similarity between
them, rather than considering the user belief about the domain.
* Note: Domain here is meant in a sense of the type of data – ex. financial data
(means financial domain), medical data (means medical domain), therefore if the
measures being calculated are independent of the domain, then they can always
be calculated - no matter if the data is financial, medical or any other type.
7
Objective Measure Examples
Assume    is an association rule
Some objective measures are:
Support or Strength: card[  ]
Confidence or Certainty Factor: card[]/card[]
Coverage Factor: card[]/card[]
Leverage: card[]/n – [card[]/n]*[card[]/n]
Lift: n  card[]/[card[]*card[]]
8
Problem: Subjective Interestingness

Rule is:
 unexpected, if it contradicts the user belief about
the domain and therefore surprises the user
 novel, if to some extent contributes to new
knowledge
 actionable, if the user can take an action to his/her
advantage based on this rule
In the data mining literature the actionability has been
quantified in terms of unexpectedness. For example:
- the most of actionable knowledge is unexpected.
- the most of unexpected knowledge is actionable.

9
Actionability - Subjective

So, if we are able to calculate the actionability of a rule
(like the way we are able to calculate support and
confidence) then, we have a way of knowing whether our
rule is unexpected and interesting (we have a way to
measure the interestingness of the rule)

In the data mining literature, data mining is viewed as the
process of turning data into information, information into
action, and action into value or profit.

However, the task of finding actionable rules is not trivial.
As actionability is seen as an elusive concept because it is
difficult to know the space of all rules and the actions to
be attached to them.
10
Argument: Objective Unexpectedness
Unexpectedness
/does not depend on domain knowledge/
If r = [A  B1] has a high confidence and r1 = [A*C  B2]
has a high confidence, then r1 is unexpected.
Unexpectedness is inherently subjective and prior beliefs of
the user form its important component.
[Padmanabhan & Tuzhilin]
A  B is unexpected with respect to the belief   
on the dataset D if the following conditions hold:
 B   = False [ B and  logically contradict each other]
 A   happen together on a large subset of D
 A*  B is true, which means A*  
11
Actionability

The actionability measure is based on the rules’ benefit to
the user, that is, the user can do something to his/her
interest with the rule.

This measure is very important for the rules to be
interesting in the sense that the users always are looking
for patterns to improve their performance and establishing
better work.

The practical implication of getting information is to
improve the business, that is, the information must ensure
the success of business for decision-making. Actions can
be performed to make the business succeed.
12
Actionable Rules and Action Rules

Actionable rule mining deals with benefit-  There are methods which define
actionability as an approximation of
driven actions required for decision
unexpectedness.
making

Rules are unexpected if they "surprise"
the user, and rules are actionable if the
user can do something with them to
his/her advantage

For example, a user may be able to
change the nondesirable/non-profitable
patterns to desirable/profitable patterns

Although both unexpectedness and actionability are important, actionability is
the key concept in most applications because actionable rules allow the user to do
his/her job better by taking some specific actions in response to the discovered
knowledge.

In order to produce unexpected
and/or actionable rules, the system
must know what the user expects,
i.e., his/her existing knowledge or
concepts about the domain.

Machine learning also typically
assumes that the domain knowledge
is correct or at least partially correct.
13
ActionAction
Rules Rules

Next, we will focus on a special type of rules, called action rules,
which are actionable rules. We will study a well-defined algorithm for
discovering such rules.

These rules can be constructed from classification rules to suggest a
way to re-classify objects (for instance customers, or patients) to a
desired state.

In e-commerce applications, this re-classification may mean that a
consumer not interested in a certain product, now may buy it, and
therefore may fall into a group of more profitable customers. In
medical domain, this re-classification may mean how to change the
class of a tumor from malignant to benign.
14
ActionAction
Rules Rules


These groups are described by
values of classification
attributes in a decision table
schema. By a decision table we
mean any information system
where the set of attributes is
partitioned into conditions and
decisions.

For example, date of birth is a
stable attribute, and interest rate
on any customer account is a
flexible attribute (dependable
on bank).

The assumption that the
decision attribute d is flexible is
quite essential.
To discover action rules it is
required that the set of
conditions is partitioned into
stable conditions and flexible
conditions/attributes. For
simplicity reason, we also
assume that there is only one
decision attribute.
15
Action Rules
Decision table
* Note: the objects are
the rows of the
database (decision
table), and the
attributes are the
columns of the
database.
Any information system S of the form
S = ( AFl  ASt  {d} ), where
 d is a distinguished attribute called decision.
 the elements of ASt are called stable conditions
 the elements of AFl  {d} are called flexible conditions
Example of action rule:
[ (b1, v1 w1)  (b2, v2  w2)  …  (bp, vp  wp)](x) 
[(d, k1  k2)](x)
This means that, if we change the value of attribute b1 from
v1 to  w1 ,and the value of attribute b2 from v2 to  w2
,and so on, and the value of attribute bp from vp to  wp ,
then the value of the decision attribute d, will change from
k1  to the desired value k2 .
Assumption: (i)[(1 i  p)  (bi  AFl)] – in other
words, the attributes b1, b2, …, bp are all flexible
16
Action Rules
X
x1
x2
x3
x4
x5
x6
x7
a
0
0
0
0
2
2
2
b
S
R
S
R
P
P
S
c
0
1
1
1
2
2
2
d
L
L
L
L
L
L
H
Decision Table
Rules discovered:
r1 = [
(b, P)  (d, L)]
r2 = [(a, 2) ^ (b, S)  (d, H)]
{a, c} - stable attributes,
{b,d} - flexible attributes,
d - decision attribute.
(its values are L – Low
profitability customer,
and H – High profitability
customer)
(r1, r2)- action rule:
[(b, P S)](x) [(d, L H)](x)
17
Practical Examples

Next, we see some action rules extracted
from 3 different databases – 2 in medical
domain, and 1 in financial domain:
Binding to thrombin database
 Insurance company benchmark database
 Breast cancer database


However, we first need to introduce one
more notation – the cost of an action rule
18
Cost of Action Rule

Usually, there is a cost (monetary or moral) association with
undertaking some kind of an action. For example:



decreasing the interest rate on a customer account (reclassifying the customer from one interest rate group to
another) may cost us mailing a letter to them, and doing some
internal administration to the account, say $5.
relocating an employee from one city to another (reclassifying the employee from one division to another) may
cost us the moving expense, in addition to a moral cost – some
negative emotions of the employee about it may influence
his/her future performance or perception of our organization.
We will denote the cost with  - a number from 0 to + . The
cost will be close to 0 if the action is trivial (very easy to
accomplish) and the cost will be close to plus infinity + if the
action is very difficulty (almost impossible) to accomplish.
19
Cost of Action Rule
Assumption: If we have an information system S, S = (X, A,
V) , where X are the objects, A are the attributes, and V
are the values of the attributes, assume attribute b  A
is flexible, and b1, b2  Vb (b1 and b2 are some of the
values of b).
By S(X, b1, b2) we mean a number from (0, +] which
describes the average predicted cost of approved action
associated with a possible re-classification of qualifying
objects X from class b1 to class b2.
Object X qualifies for re-classification from b1 to b2, if
b(X) = b1 (currently the value of b is b1 for that object)
20
Cost of Action Rule
Action rule r:
[(b1, v1→ w1)  (b2, v2→ w2)  … ( bp, vp→ wp)](x)  (d, k1→ k2)(x)
The cost of the left hand side of the rule r (costLeft) equal to the
sum of the costs of the terms listed in the left hand side:
costLeft = {S(vi , wi) : 1  i  p}
Action rule r is feasible in S, if costLeft < S(k1 , k2).
For any feasible action rule r, the cost of the conditional
Part (left hand side) of r is lower than the cost of its decision
part (right hand side, where the decision attribute is listed)
21
Binding to Thrombin Database



The first database, is the Binding to Thrombin database, is
used for drug design, and provided in the KDD Cup 2001
Competition.
Drugs are typically small organic molecules that achieve their
desired activity by binding to a target site on a receptor. The
first step in the discovery of a new drug is usually to identify
and isolate the receptor to which it should bind, followed by
testing many small molecules for their ability to bind to the
target site.
This leaves researchers with the task of determining what
separates the active (binding) compounds from the inactive
(non-binding) ones. Such a determination can then be used in
the design of new compounds that not only bind, but also have
all the other properties required for a drug (solubility, oral
absorption, lack of side effects, appropriate duration of
action, toxicity, etc.).
22
Binding to Thrombin Database



The data set consists of 1909 compounds (the objects /rows
in the database) tested for their ability to bind to a target
site on thrombin, a key receptor in blood clotting.
Each compound is described by binary features (the
attributes / columns in the database), which describe threedimensional properties of the molecule. Biological activity in
general, and receptor binding affinity in particular, correlate
with various structural and physical properties of small
organic molecules. The task with KDD Cup 2001 was to
determine which of these properties are critical in this case
and to learn to accurately predict the class value: Active or
Inactive.
In this testing we use the class attribute, which has value A
for active and I for inactive, as the re-classification
attribute for the actionRules. In this way, we provide
suggestions to the user to what molecular properties can be
changed in order to reclassify the chemical compound from
inactive to active class, in order to bind to thrombin.
23
Binding to Thrombin Database

The following results were found with LowestCostReclassifier
(software for extracting action rules of lowest cost) :
decisionAttribute = activity
valueFrom = 0
valueTo = 1
dataFile= thrombin
numberOfObjects: 1908
minConfidenceL1 = 0.65
minFeasibilityL3 = 0.0001
knownCost = 0.3
maxCostL2 = 0.01
---- Goal Node:
(f304, 1->0 | 0.12327) => (activity, 0->1 | 0.3)
1
(f172, 0->1 | 0.00472118) => (f304, 1->0 | 0.12327)
---- Action Rule of Min Cost Found: ---(f172, 0->1 | 0.00472118) => (activity, 0->1 | 0.3)
0.998424
0.998424
24
Insurance Company Benchmark database




The next database used is in the financial
domain, the Insurance Company Benchmark
(COIL 2000) database used with the CoIL
2000 Challenge.
The data contains 5,822 tuples (the customers /rows in the
database). The features (the attributes / columns in the
database) include product usage data and socio-demographic
data derived from zip area codes.
The data was supplied by the Dutch data mining company
Sentient Machine Research and is based on a real world
business problem.
In our testing the user would like to reclassify the attribute
Contribution car policies from a value of 5 to 6.
25
Insurance Company Benchmark database

The following results were found with LowestCostReclassifier :
decisionAttribute = Contribution car policies
valueFrom = 5
valueTo = 6
dataFile= Insurance
numberOfObjects: 5822
minConfidenceL1 = 0.72
minFeasibilityL3 = 0.0001
knownCost = 0.4
maxCostL2 = 0.02
---- Goal Node:
(Private health insurance, 3->4 | 0.02423844) => (Contribution
car policies, 5->6 | 0.4) 0.833333
(High level education, 2->3 | 0.00176027) ^ (Social class B1,
2->4 | 0.0146667) => (Private health insurance, 3->4 | 0.3)
0.714286
---- Action Rule of Min Cost Found: ---(High level education, 2->3 | 0.00176027) ^ (Social class B1,
2->4 | 0.0146667) => (Contribution car policies, 5->6 | 0.4)
0.714286
26
Breast Cancer Database




Another database used is a breast
cancer database. It was obtained at
the University of Wisconsin
Hospitals, Madison from Dr. William
H. Wolberg.
Time
It contains a class attribute which classifies the tumor as
benign or malignant. The rest of the attributes (columns in
the database) contain descriptions of common factors
radiologists, and pathologists examine in order to place the
diagnosis, such as Clump Thickness, Uniformity of Cell Shape,
Bare Nuclei, etc.
The database has 700 instances (the objects /rows in the
database), Benign: 458 (65.5%) and Malignant: 241 (34.5%).
The class attribute is used as the re-classification attribute
with LowestCostReclassifier. This way, we will provide
suggestions/actions to be undertaken in order to change the
class from malignant to benign.
27
Breast Cancer Database

The following results were found with LowestCostReclassifier :
decisionAttribute = Class
valueFrom = 4
valueTo = 2
dataFile= brcancer
numberOfObjects: 699
minConfidenceL1 = 0.7
minFeasibilityL3 = 0.0001
knownCost = 0.3
maxCostL2 = 0.00059
.
.
.
!!!!!!!List of sifted action rules is empty
i.e. no feasible action rules were found, so
this state/child has no descendents:
(Marginal Adhesion, 1->3 | 0.00128449)
@@@@@@@@ Traversing next node
The list Q of nodes traversed was empty.
28
Breast Cancer Database
---- Best Node ---(Uniformity of Cell Size, 8->1 | 0.00179706) ^ (Normal
Nucleoli, 5->3 | 0.00721622) => (Class, 4->2 | 0.3)
1
(Marginal Adhesion, 1->3 | 0.00128449) => (Uniformity of Cell
Size, 8->1 | 0.00179706) 0.85
---- Action Rule of Min Cost Found: ---(Marginal Adhesion, 1->3 | 0.00128449) => (Class, 4->2 | 0.3)
0.85

In this case, the goal could not be reached,
possibly because the desired maximum cost
specified, 0.00059 was too low. However,
still the best node found thus far was
returned, which cost 0.00128449 is still
lower that the currently known cost to the
user 0.3.
29

Distributed Action Rules Mining and Feasibility

Transcript Distributed Action Rules Mining and Feasibility

Directory