Discovery Robust Knowledge from Databases that Change

Download Report

Transcript Discovery Robust Knowledge from Databases that Change

Discovering Robust Knowledge
from Databases that Change
Chun-Nan Hsu
Craig A. Knoblock
Arizona State University
University of Southern California
Journal of Data Mining and Knowledge Discovery,
Volume 2, 1998
Presented & Edited by: Mark Stebbins
CS331/295 Data Mining – Spring 2009
1
Outline






Introduction
Preliminaries
Robustness of Knowledge
Applying Robustness in Knowledge
Discovery
Experimental Results
Conclusions
2
Introduction: Importance
Many applications of knowledge discovery and data mining
require the knowledge to be consistent with data.

Databases usually changes over time and make machinediscovered knowledge inconsistent.
 How can we detect, check & update these inconsistent
rules without high maintenance costs?
Useful knowledge should be robust against database changes
so that it is unlikely to become inconsistent after database
changes.
 Defines a notion of robustness in the context of relational
databases
 Describes how robustness can be estimated and applied in
knowledge discovery.

3
Introduction: Definition of Robustness


Why do we need robustness?
It is important to know whether a discovered
knowledge is robust against database changes.
Robustness: A measure of the quality of discovered
knowledge.
General definition of robustness:
Robustness of discovered knowledge can be defined as
the probability that the knowledge is consistent with a
database state. Or estimated in ratio form as:
4
Introduction: Applications of Robustness Estimation



Minimize the maintenance cost of inconsistent rules in the
presence of database changes.
When an inconsistent rule is detected, a system may:
 remove the rule: repeatedly invoke discovery system
 repair the rule: repaired frequently after data changes
Apply to rule discovery for semantic query optimization (SQO).
 Rule: all Maltese seaports have railroad access.
 Query: find all Maltese seaports with railroad access and 2,000 sq.ft
of storage.
5
Applications of Robustness Estimation (cont.)

Apply to guide the data mining and evaluation of semantic rules.
Two stages:



Data mining stage: uses a rule pruning approach to prune
antecedents of a discovered rule to increase its robustness.
Evaluation stage: eliminates rules when their estimated
robustness values are below a given threshold.
Apply to rule maintenance. Similar to rule pruning
approach.

Use the estimated robustness of the resulting partially repaired
rules to search for the best sequence of repair operators so that the
repaired rule is more robust than the original one.
6
Introduction: Robustness Comparison

Comparison with Support count:



Support count for an association rule expresses the
probability that a data instance satisfies a rule.
While the robustness expresses the probability that an entire
database state is consistent with a rule.
Comparison with predictive accuracy:
 Predictive accuracy for classification rule measures the
probability that knowledge is consistent with randomly
selected unseen data instead of with database states.
 The difference is significant in databases that are interpreted
using the closed-world assumption (CWA).
7
Introduction: Closed-world Assumption

Information that is not explicitly present in the database is taken to be false.

Closed-world databases are widely used in relational db’s, deductive db’s and
rule-based information systems because of


The characteristics of application domains.
The limitation of the representation systems.

Databases change by deletions, insertions and updates, and in a close-world
database, they may affect the validity of a rule.

Closed-world data tends to be dynamic, important for knowledge discovery
systems to handle dynamic and closed world data.

An instance of closed-world data usually represents a dynamic state in the
world.
E.g: an instance of employee information in a personnel database.
8
Example of a relational database

Schema:
ship_class(class_name, ship_type,max_draft, length,container_cap),
ship(ship_name, ship_class, status, fleet, year, built),
geoloc(name, glc_cd, country, latitude, longitude),
seaport( name, glc_code, storage, rail, road, anch_offshore),
wharf(wharf_id, glc_code, depth, length, crane_qty).

Rules:
R1:The latitude of a Maltese geographic location is greater than or equal to 35.89.
geoloc(_,_,?country,?latitude,_) ^ ?country = ”Malta”  ?latitude >= 35.89
R2: All Maltese geographic locations are seaports.
geoloc(_,?glc_cd, ?country,_,_) ^ ?country = ”Malta”  seaport(_, ?glc_cd, _, _,_,_)
9
Outline






Introduction
Preliminaries
Robustness of Knowledge
Applying Robustness in Knowledge
Discovery
Experimental Results
Conclusions
10
Preliminaries

Relational database: a set of relations

A relation contains a set of instances (tuples) of attributevalue vectors

Two types of literals:



Database literal: literal defined on database relations (e.g.:
seaport(_,?glc_cd,?storage,_,_,_)
Built-in literal: literal on built-in relations (e.g.: latitude > 35.89)
Two types of rules:


Range rule: rules with a positive built-in literal (e.g.: R1)
Relational rule: rules with a database literal as their consequent
(e.g.: R2)
11
Preliminaries (cont.)


Database state at time t: the collection of the instances
presents in the database at time t.
Consistent rule: given a database state, if all variable
instantiations that satisfy the antecedents of the rule also
satisfy the consequent of the rule.
12
Outline






Introduction
Preliminaries
Robustness of Knowledge
Applying Robustness in Knowledge
Discovery
Experimental Results
Conclusions
13
Definitions of Robustness


Robust rule: does not become inconsistent
(invalid) after database changes
Definition 1: (Robustness for all states)
Given a rule r, let D be the event that a database
is in a state that is consistent with r.
Robust
1
(r)
Pr
(D)
Robust1(r) 
# of database states consistent with r
# of all possible database states
14
Definitions of Robustness (cont.)


Intuitively, a rule is robust if it is unlikely that the
transactions will invalidate the rule.
Definition 2: (Robustness for accessible states)
Given a rule r, and a database in a state d, in which r is
consistent. Let t denote the event of performing a
transaction on d that results in new database states
inconsistent with r. The robustness of r in accessible states
from the current state d is defined as
Robust(r|d
)

Pr
(

t|d)

1
Pr
(t|d
15
Definitions of Robustness (cont.)

Observation: If all transactions are equally probable, then
Robust1(r )  Robust (r | d )

The robustness of a rule could be different in different
database states.
E.g.: Suppose there are two db states d1 and d2 of a given db. To
reach a state inconsistent with r, we need to delete 10 tuples in d1
and only 1 tuple in d2.
Robust(r|d1) > Robust(r|d2)
16
Estimate Robustness



Key idea: estimates the probabilities of data
changes, rather than the number of possible
database states.
Estimate the robustness of a rule based on the
probability of transactions that may invalidate the
rule.
Decompose data changing transactions and
estimate their probabilities using the Laplace law
of succession.
17
Estimate Robustness (cont.)

Laplace law of Succession:
Given a repeatable experiment with an outcome of one of
any of k classes. Suppose we have conducted this
experiment n times, r of which have resulted in some
outcome C, in which we are interested. The probability
that the outcome of the next experiment will be C can be
estimated as
r 1
nk
18
Examples to Estimate Robustness

R1:The latitude of a Maltese geographic location is greater than or
equal to 35.89.
geoloc(_,_,?country,?latitude,_) ^ ?country = “Malta”  ?latitude >=
35.89
Find the robustness of R1 ?

T1 (Update a satisfied tuple): One of the existing tuples of geoloc with
its ?country = “Malta” is updated such that its ?latitude < 35.89

T2 (Insert a new tuple): A new tuple of geoloc with its ?country =
“Malta” and ?latitude < 35.89 is inserted to the database.

T3 (Update an unsatisfied tuple): One of the existing tuples of geoloc
with its ?latitude < 35.89 and ?country != “Malta” is updated such that
its ?country = “Malta”.
19
Examples to Estimate Robustness
(cont.)


T1, T2, and T3: transactions that invalidate R1.
Since T1, T2, and T3 are mutually exclusive, we have
Pr(T 1  T 2  T 3)  Pr(T 1)  Pr(T 2)  Pr(T 3)


The robustness of R1 can be estimated from the
probabilities of T1, T2, and T3.
Requirements for Transactions Classes
 Transaction classes must be mutually exclusive.
 Minimal, no redundant conditions are specified.
?country = ‘Malta’ ?latitude < 35.89 and ?longitude > 130.00
20
Decomposition of Transactions
How to estimate Pr(T1), Pr(T2), Pr(T3)?



Decompose the transaction into more primitive
statements and estimate their local
probabilities first
Use Bayesian network model to decompose
the transaction (next slide).
21
Bayesian network model of
transaction
X3:
on which tuples?
X1:
type of transaction?
X2:
on which relation?
X4:
on which attributes?
X5:
what new attribute
value?

Nodes in the network represent the random variables involved in the
transaction.

The arc from node x_i to x_j indicates that x_j is dependent on x_i.
22
Decomposition of Transaction

The probability of a transaction can be estimated as the
joint probability of all variables
Pr(T 1)  Pr( x1  x 2  x3  x 4  x5)
 Pr( x1)  Pr( x 2 | x1)  Pr( x3 | x 2  x1)  Pr( x 4 | x 2  x1)  Pr( x5 | x 4  x 2  x1)

Example for the transaction T1, their semantics are as
follows:





x1: a tuple is updated
x2: a tuple of geoloc is updated
x3: a tuple of geoloc, whose ?country = ‘Malta’ is updated
x4: a tuple of geoloc whose ?latitude is updated
x5: a tuple of geoloc whose ?latitude is updated to a new value less
than 35.89
23
Estimation of Probability of T1
Apply the Laplace law to estimate each local conditional
probability.
 x1: a tuple is updated:


x2|x1: a tuple of geoloc is updated, given that a tuple is
updated:
x3|x2^x1: a tuple of geoloc whose ?country=“Malta” is
updated, given that a tuple of geoloc is updated:
24
Estimation of Probability of T1


x4|x2^x1: a tuple of geoloc whose ?latitude is updated
x5|x4^x2^x1: a tuple of geoloc whose ?latitude is
updated to a new value less than 35.89
25
Estimation of Probability of T1

Assume the size of the relation geoloc is 616, ten of them
with ?country=“Malta”

Pr(T1) = (1/3)*(1/5)*(10/616)*(1/5)*(1/2) = 0.000108

Similarly, we can estimate Pr(T2) and Pr(T3).

Suppose, that Pr(T2) = 0.000265 and Pr(T3) = 0.00002

Then the robustness of the rule can be estimated as 1(0.000108 + 0.000265 + 0.00002 ) = 0.999606
26
Estimate Robustness


Estimation accuracy depends on available information.
However, even given only database schemas, the method
can still come up with some estimates.
How to derive transactions that invalidate an arbitrary logic
statement? It is not a trivial problem!!!


Most knowledge discovery systems have strong restrictions on the
syntax of discovered knowledge.
Hence, the invalidating transactions can be manually generalized
into a small sets of transaction templates, as well as templates of
probability estimates for robustness estimation.
27
Outline






Introduction
Preliminaries
Robustness of Knowledge
Applying Robustness in Knowledge
Discovery
Experimental Results
Conclusions
28
Applying Robustness in Knowledge
Discovery



A rule pruning algorithm which can increase the robustness and
applicability of machine discovered rules by pruning their antecedent
literals.
Specification of rule pruning problem:
Take a machine-generated rule as input, which is consistent with a
database but potentially overly-specific, and remove antecedent literals
of the rule so that it remains consistent but is short and robust.
Basic idea of pruning algorithm:


To search for a subset of antecedent literals to remove until any further
removal will yield an inconsistent rule.
However the search space can be exponentially large with respect to the
number of literals in rule.
 A beam-search algorithm was presented to trim the search space.
29
Pruning Algorithm

Beam-search algorithm



Apply the robustness estimation approach to estimate
the robustness of a partially pruned rule and guide the
pruning search.
Two properties to optimize: robustness and length.
For each set of equally short rules, the algorithm
searches for the rule that is as robust as possible while
still being consistent.
30
Antecedent pruning algorithm

Pruning rule antecedents
1. INPUT R = rules (initially the rule to be pruned), B = beam size;
2. LET O = results;( initially empty);
3.
WHILE (R is not empty) DO
4.
Move the first rule r in R to O;
5.
Prune r, LET R’ = resulting rules;
6.
Remove visited dangling or inconsistent rules in R’;
7.
Estimate and sort on the robustness of rules in R’;
8.
Retain top B rules in R’ and remove the rest;
9.
Merge sorted R’ into R in sorted order of the robustness;
10. RETURN O ;
31
Empirical Demonstration of Rule
Pruning

A detailed empirical study has been conducted on rule R3.
wharf(_,?code,?depth,?length,?crane) ^
seaport(?name,?code,_,_,_,_) ^
geoloc(?name,_,?country,_,_) ^
?country = “Malta” ^
?depth <= 50 ^
?crane > 0
 ?length >= 1200

The relationship between length and robustness of the rules is shown
on the next slide.

Pruned rules
R7: wharf(_,?code,?depth,?length,?crane) ^
seaport(?name,?code,_,_,_,_) ^
geoloc(?name,_,?country,_,_) ^
?crane > 0
 ?length >= 1200
32
Empirical Demonstration of Rule
Pruning
33
Pruned rules and their estimated
robustness
34
Experimental Results
Experiment setups:
 Use the rule discovery system BASIL to derive rules from
two large ORACLE relations databases



BASIL part of SIMS information mediator developed by authors
Synthesize 123 sample transactions to represent possible
transactions of the experimental databases (27 updates, 29
deletions and 67 insertions)
Steps of the experiments:



Train BASIL to discover a set of rules and estimate their robustness,
generates 355 rules
Use another set of 202 sample transactions to assist the robustness
estimation.
Apply the set of 123 transactions to the two relational database and
check consistency of all 355 rules
35
Experimental results
36
Experimental results

Evaluation of predictive power of the
robustness estimation





Recall
= |I L| / |I |
Precision = |I L| / |L |
I is set of inconsistent rules
L is set of rules likely to become inconsistent
Consistency threshold = 0.75
Experimental results
Conclusions

Formalize the notion of robustness of a rule r in a given
database state d

Estimate probabilities of rule invalidating transactions



Apply in a rule discovery system



Decomposes the probability of a transactions into local probabilities
that can be estimated using Laplace law.
No need to provide additional information for the estimation.
Pruning antecedents of a discovered rule so that the rule will be
highly robust and widely applicable.
Beam-search algorithm
Conduct empirical experiments to demonstrate the
algorithm.
39
Real-world Applications

Face recognition software



Envirionment is the “data set” – constantly
changing
Rules must resist variability in lighting and pose
Systems prognostics


Data set continuously updated with diagnostic or
calibration information
Predictive failure rules used to dictate system
resource planning
Questions for final exam prep.
•
Define robustness. (Slides 4, 14, 15)
– the probability that the knowledge is
consistent with a database state
– Robustness for all states:
Robust1(r) 
# of database states consistent with r
# of all possible database states
– Robustness for accessible states
Robust(r|d
)

Pr
(

t|d)

1
Pr
(t|d)
41
Questions for final exam prep.
•
What is the closed world assumption and
the significance of CWA? Where is it used
in data mining? (Slide 8)
– Information that is not explicitly present in the
database is taken to be false.
– Significant because the data tends to be dynamic
– Used in relational db’s, deductive db’s and rulebased information systems
42
Questions for final exam prep.
•
Compare and contrast robustness, support
count and predicative accuracy (Slide 7)
– Support count for an association rule expresses the
probability that a data instance satisfies a rule.
– Predictive accuracy for classification rule measures
the probability that knowledge is consistent with
randomly selected unseen data
– Robustness expresses the probability that an entire
database state is consistent with a rule.
43

Thanks and questions
44