Discovering Robust Knowledge from Databases that Change
Download
Report
Transcript Discovering Robust Knowledge from Databases that Change
Discovering Robust
Knowledge from
Databases that Change
Author: Chun-Nan Hsu,
Craig A. Knoblock
Advisor: Dr. Hsu
Graduate: Yu-Wei Su
2001/12/18
1/50
Abstract
Databases usually change over time and
make machine-discovered knowledge
inconsistent
Useful knowledge should be robust against
database changes so that it is unlikely to
become inconsistent after database changes
2001/12/18
2/50
Abstract( cont.)
Defines this notion of robustness in database
and describes how robustness of first-order
Horn-clause rules can be estimated and
applied in knowledge discovery
2001/12/18
3/50
outline
Motivation
Objective
Terminology
Robustness of knowledge
Definitions of robustness
Estimating robustness
Templates for estimating robustness
Empirical demonstration of robustness estimation
Applying robustness in knowledge discovery
Experimental results
Conclusion and future work
opinion
2001/12/18
4/50
Motivation
Many application require discovery
knowledge to be consistent in all database
states
Most solution approach to these problem
assume static databases
2001/12/18
5/50
Objective
To discover robust knowledge that is unlikely
to become inconsistent with new database
states
To presents an efficient approach to the
estimation and use of the new measure
2001/12/18
6/50
Terminology
Robustness can be defined as the probability that
the knowledge is consistent with a database state
This paper considers relational databases, which
consist of a set of relations
2001/12/18
7/50
Terminology( cont.)
Horn-clause rules that express the regularity of data
To literals defined on database relation as database
literal and literals on built-in relations as built-in
literals
Range rule built-in literal
Relational rule database literals
2001/12/18
8/50
Terminology( cont.)
A database state at a given time t is the
collection of the instances present in the
database at time t
To use the close-world assumption( CWA) to
interpret the semantic of a database state
A rule is consistent with a database state if all
variable instantiations that satisfy the
antecedents of the rule also satisfy the
consequent of the rule
2001/12/18
9/50
Terminology( cont.)
2001/12/18
10/50
Robustness of knowledge
Definitions of robustness
Estimating robustness
Templates for estimating robustness
Empirical demonstration of robustness
estimation
2001/12/18
11/50
Definitions of robustness
Definition 1(Robustness for all states)
Given a rule r, let D be the event that a database is in
a state that is consistent with r. The robustness of r is
Robust1(r)=Pr(D)
# of database states consistent with r
Robust1(r)=
# of all possible database states
Two problems: treats all database states are equally
probable and possible database states is intractably
large
2001/12/18
12/50
Definitions of robustness( cont.)
Definition 2( Robustness for accessible states)
Given a rule r, a database in a state denoted
as d, in which r is consistent. New database
states are accessible from d by performing
transactions. Let t denote the event of
performing a transaction on d that result in
new database state inconsistent with r
2001/12/18
13/50
Definitions of robustness( cont.)
The robustness of r in accessible states from
the current state d is
Robust (r | d ) Pr(t | d ) 1 Pr(t | d )
2001/12/18
14/50
Definitions of robustness( cont.)
Corollary 3
If r is consistent with d, and if new database
states are accessible from d only by performing
transaction, and all transaction are equally
probable, then
Robust1(r)=Robust(r|d)
2001/12/18
15/50
Definitions of robustness( cont.)
Example
to reach a state inconsistent with r
d1 delete ten tuples
d2 delete one tuple
Robust(r|d1) > Robust(r|d2)
2001/12/18
16/50
Estimating robustness
Laplace law of succession
Given a repeatable experiment with an outcome
of one of any of k classes
experiment n times
r of which have resulted in some outcome C
The probability that the outcome of the next
experiment will be C can be
r 1
nk
2001/12/18
17/50
Estimating robustness( cont.)
m-probability
Let r, n, C be as laplace law
Pr(C) is probability that has an outcome C
m is an adjusting constant that indicates our
confidence in Pr(c)
The probability that the outcome of the next
experiment will be C can be
r m Pr(c)
nm
2001/12/18
18/50
Estimating robustness( cont.)
Laplace law is a special case of m-probability
with Pc(c)=1/k and m=k
To estimate the robustness of a rule based on
the probability of transactions that may
invalidate the rule
To decomposed into the transactions of
deriving a set of invalidating transaction and
estimating the probability of those
transactions
2001/12/18
19/50
Estimating robustness( cont.)
example
2001/12/18
20/50
Estimating robustness( cont.)
T1, T2 and T3 are mutually exclusive with
each other and these cover all possible
transactions will invalidate R2.1
Pr(T 1 T 2 T 3) Pr(T 1) Pr(T 2) Pr(T 3)
Robust ( R 2.1 | d ) 1 Pr(T1 T 2 T 3)
2001/12/18
21/50
Estimating robustness( cont.)
To decompose the transaction into more primitive
statements and estimate their local probabilities.
The decomposition is based on a bayesian network
model
2001/12/18
22/50
Estimating robustness-example
Pr(T 1) Pr( x1 x 2 x3 x 4 x5)
Pr( x1) Pr( x 2 | x1) Pr( x3 | x 2 x1) Pr( x 4 | x 2 x1) Pr( x5 | x 4 x 2 x1)
X1: a tuple is updated
X2: a tuple of geoloc is updated
X3: a tuple of geoloc, whose ?country=“malta”, is
updated
X4: a tuple of geoloc whose ?latitude is updated
X5: a tuple of geoloc whose ?latitude is updated to a
new value less than 35.89
2001/12/18
23/50
Estimating robustness-example( cont.)
X1: a tuple is updated
tu is the number of pervious updates
t is the total number of pervious transactions
tu 1
Pr( x1)
t 3
If no information is available, assume tu=t=0
2001/12/18
24/50
Estimating robustness-example( cont.)
X2: a tuple of geoloc is updated
R is the number of relations in the database
tu,geoloc is the number of updates made to tuples of
relation geoloc
tu , geoloc 1
Pr( x 2 | x1)
tu R
2001/12/18
25/50
Estimating robustness-example( cont.)
X3: a tuple of geoloc, whose ?country=“malta”,
is updated
G is the size of relation geoloc
Ia3 is the number of tuples in geoloc
satisfy ?country=“malta”
Tu,a3 is the number of updates made on the tuples
in geoloc that satisfy ?country=“malta”
Pr( x3 | x2 x1)
2001/12/18
tu , a 3 1
tu , geoloc G / Ia3
26/50
Estimating robustness-example( cont.)
X4: a tuple of geoloc whose ?latitude is
updated
A is the number of attributes of geoloc
Tu,geoloc,latitude is the number of updates made on
the latitude attribute of the geoloc relation
tu , geoloc, latitude 1
Pr( x4 | x2 x1)
tu , geoloc A
2001/12/18
27/50
Estimating robustness-example( cont.)
X5: a tuple of geoloc whose ?latitude is
updated to a new value less than 35.89
0.5 no information available
Pr( x5 | x 4 x 2 x1)
0.398 with range information
2001/12/18
28/50
Templates for estimating
robustness
The templates allow the system to
automatically estimate the robustness of
knowledge
Parameters of these equations can be
evaluate by accessing database schema or
transaction log
2001/12/18
29/50
Templates for estimating
robustness( cont.)
2001/12/18
30/50
Templates for estimating
robustness( cont.)
2001/12/18
31/50
Empirical demonstration of
robustness estimation
2001/12/18
32/50
Empirical demonstration of
robustness estimation( cont.)
2001/12/18
33/50
Empirical demonstration of
robustness estimation( cont.)
2001/12/18
34/50
Empirical demonstration of
robustness estimation( cont.)
Definition 4(probability of consistency)
Given a rule r, a database state d and a set of n
transactions, the probability of consistency for a
rule r after applying n transactions to the
database state d is defined
Pc(r , n | d ) (robust (r | d )) n
2001/12/18
35/50
Empirical demonstration of
robustness estimation( cont.)
2001/12/18
36/50
Applying robustness in
knowledge discovery
Using robustness alone is not enough to
guide the discovery
Use robustness together with other measures
of usefulness
One of the measure of usefulness is
applicability
A pruning discovered rule is both highly
applicable and robust
2001/12/18
37/50
Applying robustness in
knowledge discovery( cont.)
A rule is more applicable if it is shorter
To dividing a learning process into a twostage rule construction and rule pruning
Specification of rule pruning
2001/12/18
Take a machine-generated rule as input which is
consistent with a database but overly-specific
Remove antecedent literals of the rule so that it
remains consistent but is short and robust
38/50
Applying robustness in
knowledge discovery( cont.)
To search for a subset of antecedent literals
to remove until any further removal will yield
an inconsistent rule
To present a beam-search algorithm to trim
the search space
Two property-robustness and length
2001/12/18
39/50
Applying robustness in
knowledge discovery( cont.)
2001/12/18
40/50
Applying robustness in
knowledge discovery( cont.)
The pruner removes the pruned rules that are
inconsistent or dangling literal in the rule
To identify an inconsistent rule, the pruner can
consult the database directly
A set of literals are dangling if the variables
occurring in those literals do not occur in any other
literals in a rule
2001/12/18
41/50
Applying robustness in
knowledge discovery( cont.)
To ensure removing a database literal L does
not yield dangling literals, L must satisfy
following
2001/12/18
No built-in literal in the antecedents of the rule is
defined on the variables occurring in L
If a variable occurring in the consequent of r also
occurs in L, this variable must occurs in some
other database literals in the rule
Removing L from the rule does not disconnet
existing join paths between any database literals
in the rule
42/50
Applying robustness in
knowledge discovery( cont.)
2001/12/18
43/50
Applying robustness in
knowledge discovery( cont.)
2001/12/18
44/50
Applying robustness in
knowledge discovery( cont.)
2001/12/18
45/50
Experimental results
To used the rule discovery system BASIL
Two large ORACLE relational databases
123 synthesized transactions contains 27
updates, 29 deletions and 67 insertions
2001/12/18
46/50
Experimental results
Experiment design
Train BASIL to discover a set of rules and
estimate their robustness
2001/12/18
Exhaust its search space during the rule discovery
and generated 355 rules. Meanwhile BASIL estimated
the robustness of rules with another 202 sample
transactions
Use the 123 transactions to generate a new
database state
47/50
Experimental results( cont.)
2001/12/18
Check if high robust rules have a better chance
to remain consistent with the data in the new
database state
48/50
Conclusion and future work
To formalize the notion of the robustness
against database changes
Applying approaches to a variety of KDD
applications in database management
To improve the precision of the robustness
estimation by refining the estimation
templates to prevent overestimating
2001/12/18
49/50
Opinion
Applying this approach in reliability test
2001/12/18
50/50