Discovering Robust Knowledge from Databases that Change

Download Report

Transcript Discovering Robust Knowledge from Databases that Change

Discovering Robust
Knowledge from
Databases that Change
Author: Chun-Nan Hsu,
Craig A. Knoblock
Advisor: Dr. Hsu
Graduate: Yu-Wei Su
2001/12/18
1/50
Abstract


Databases usually change over time and
make machine-discovered knowledge
inconsistent
Useful knowledge should be robust against
database changes so that it is unlikely to
become inconsistent after database changes
2001/12/18
2/50
Abstract( cont.)

Defines this notion of robustness in database
and describes how robustness of first-order
Horn-clause rules can be estimated and
applied in knowledge discovery
2001/12/18
3/50
outline




Motivation
Objective
Terminology
Robustness of knowledge








Definitions of robustness
Estimating robustness
Templates for estimating robustness
Empirical demonstration of robustness estimation
Applying robustness in knowledge discovery
Experimental results
Conclusion and future work
opinion
2001/12/18
4/50
Motivation


Many application require discovery
knowledge to be consistent in all database
states
Most solution approach to these problem
assume static databases
2001/12/18
5/50
Objective


To discover robust knowledge that is unlikely
to become inconsistent with new database
states
To presents an efficient approach to the
estimation and use of the new measure
2001/12/18
6/50
Terminology


Robustness can be defined as the probability that
the knowledge is consistent with a database state
This paper considers relational databases, which
consist of a set of relations
2001/12/18
7/50
Terminology( cont.)

Horn-clause rules that express the regularity of data

To literals defined on database relation as database
literal and literals on built-in relations as built-in
literals
Range rule  built-in literal
Relational rule  database literals


2001/12/18
8/50
Terminology( cont.)



A database state at a given time t is the
collection of the instances present in the
database at time t
To use the close-world assumption( CWA) to
interpret the semantic of a database state
A rule is consistent with a database state if all
variable instantiations that satisfy the
antecedents of the rule also satisfy the
consequent of the rule
2001/12/18
9/50
Terminology( cont.)
2001/12/18
10/50
Robustness of knowledge




Definitions of robustness
Estimating robustness
Templates for estimating robustness
Empirical demonstration of robustness
estimation
2001/12/18
11/50
Definitions of robustness

Definition 1(Robustness for all states)
Given a rule r, let D be the event that a database is in
a state that is consistent with r. The robustness of r is
Robust1(r)=Pr(D)
# of database states consistent with r
Robust1(r)=
# of all possible database states

Two problems: treats all database states are equally
probable and possible database states is intractably
large
2001/12/18
12/50
Definitions of robustness( cont.)

Definition 2( Robustness for accessible states)
Given a rule r, a database in a state denoted
as d, in which r is consistent. New database
states are accessible from d by performing
transactions. Let t denote the event of
performing a transaction on d that result in
new database state inconsistent with r
2001/12/18
13/50
Definitions of robustness( cont.)

The robustness of r in accessible states from
the current state d is
Robust (r | d )  Pr(t | d )  1  Pr(t | d )
2001/12/18
14/50
Definitions of robustness( cont.)

Corollary 3
If r is consistent with d, and if new database
states are accessible from d only by performing
transaction, and all transaction are equally
probable, then
Robust1(r)=Robust(r|d)
2001/12/18
15/50
Definitions of robustness( cont.)

Example
to reach a state inconsistent with r
d1  delete ten tuples
d2  delete one tuple
Robust(r|d1) > Robust(r|d2)
2001/12/18
16/50
Estimating robustness

Laplace law of succession




Given a repeatable experiment with an outcome
of one of any of k classes
experiment n times
r of which have resulted in some outcome C
The probability that the outcome of the next
experiment will be C can be
r 1
nk
2001/12/18
17/50
Estimating robustness( cont.)

m-probability




Let r, n, C be as laplace law
Pr(C) is probability that has an outcome C
m is an adjusting constant that indicates our
confidence in Pr(c)
The probability that the outcome of the next
experiment will be C can be
r  m  Pr(c)
nm
2001/12/18
18/50
Estimating robustness( cont.)



Laplace law is a special case of m-probability
with Pc(c)=1/k and m=k
To estimate the robustness of a rule based on
the probability of transactions that may
invalidate the rule
To decomposed into the transactions of
deriving a set of invalidating transaction and
estimating the probability of those
transactions
2001/12/18
19/50
Estimating robustness( cont.)

example
2001/12/18
20/50
Estimating robustness( cont.)

T1, T2 and T3 are mutually exclusive with
each other and these cover all possible
transactions will invalidate R2.1
Pr(T 1  T 2  T 3)  Pr(T 1)  Pr(T 2)  Pr(T 3)
Robust ( R 2.1 | d )  1  Pr(T1  T 2  T 3)
2001/12/18
21/50
Estimating robustness( cont.)


To decompose the transaction into more primitive
statements and estimate their local probabilities.
The decomposition is based on a bayesian network
model
2001/12/18
22/50
Estimating robustness-example
Pr(T 1)  Pr( x1  x 2  x3  x 4  x5)
 Pr( x1)  Pr( x 2 | x1)  Pr( x3 | x 2  x1)  Pr( x 4 | x 2  x1)  Pr( x5 | x 4  x 2  x1)





X1: a tuple is updated
X2: a tuple of geoloc is updated
X3: a tuple of geoloc, whose ?country=“malta”, is
updated
X4: a tuple of geoloc whose ?latitude is updated
X5: a tuple of geoloc whose ?latitude is updated to a
new value less than 35.89
2001/12/18
23/50
Estimating robustness-example( cont.)

X1: a tuple is updated


tu is the number of pervious updates
t is the total number of pervious transactions
tu  1
Pr( x1) 
t 3

If no information is available, assume tu=t=0
2001/12/18
24/50
Estimating robustness-example( cont.)

X2: a tuple of geoloc is updated


R is the number of relations in the database
tu,geoloc is the number of updates made to tuples of
relation geoloc
tu , geoloc  1
Pr( x 2 | x1) 
tu  R
2001/12/18
25/50
Estimating robustness-example( cont.)

X3: a tuple of geoloc, whose ?country=“malta”,
is updated



G is the size of relation geoloc
Ia3 is the number of tuples in geoloc
satisfy ?country=“malta”
Tu,a3 is the number of updates made on the tuples
in geoloc that satisfy ?country=“malta”
Pr( x3 | x2  x1) 
2001/12/18
tu , a 3  1
tu , geoloc  G / Ia3
26/50
Estimating robustness-example( cont.)

X4: a tuple of geoloc whose ?latitude is
updated


A is the number of attributes of geoloc
Tu,geoloc,latitude is the number of updates made on
the latitude attribute of the geoloc relation
tu , geoloc, latitude  1
Pr( x4 | x2  x1) 
tu , geoloc  A
2001/12/18
27/50
Estimating robustness-example( cont.)

X5: a tuple of geoloc whose ?latitude is
updated to a new value less than 35.89
 0.5 no information available
Pr( x5 | x 4  x 2  x1)  
0.398 with range information
2001/12/18
28/50
Templates for estimating
robustness


The templates allow the system to
automatically estimate the robustness of
knowledge
Parameters of these equations can be
evaluate by accessing database schema or
transaction log
2001/12/18
29/50
Templates for estimating
robustness( cont.)
2001/12/18
30/50
Templates for estimating
robustness( cont.)
2001/12/18
31/50
Empirical demonstration of
robustness estimation
2001/12/18
32/50
Empirical demonstration of
robustness estimation( cont.)
2001/12/18
33/50
Empirical demonstration of
robustness estimation( cont.)
2001/12/18
34/50
Empirical demonstration of
robustness estimation( cont.)

Definition 4(probability of consistency)

Given a rule r, a database state d and a set of n
transactions, the probability of consistency for a
rule r after applying n transactions to the
database state d is defined
Pc(r , n | d )  (robust (r | d )) n
2001/12/18
35/50
Empirical demonstration of
robustness estimation( cont.)
2001/12/18
36/50
Applying robustness in
knowledge discovery




Using robustness alone is not enough to
guide the discovery
Use robustness together with other measures
of usefulness
One of the measure of usefulness is
applicability
A pruning discovered rule is both highly
applicable and robust
2001/12/18
37/50
Applying robustness in
knowledge discovery( cont.)



A rule is more applicable if it is shorter
To dividing a learning process into a twostage rule construction and rule pruning
Specification of rule pruning


2001/12/18
Take a machine-generated rule as input which is
consistent with a database but overly-specific
Remove antecedent literals of the rule so that it
remains consistent but is short and robust
38/50
Applying robustness in
knowledge discovery( cont.)



To search for a subset of antecedent literals
to remove until any further removal will yield
an inconsistent rule
To present a beam-search algorithm to trim
the search space
Two property-robustness and length
2001/12/18
39/50
Applying robustness in
knowledge discovery( cont.)
2001/12/18
40/50
Applying robustness in
knowledge discovery( cont.)



The pruner removes the pruned rules that are
inconsistent or dangling literal in the rule
To identify an inconsistent rule, the pruner can
consult the database directly
A set of literals are dangling if the variables
occurring in those literals do not occur in any other
literals in a rule
2001/12/18
41/50
Applying robustness in
knowledge discovery( cont.)

To ensure removing a database literal L does
not yield dangling literals, L must satisfy
following



2001/12/18
No built-in literal in the antecedents of the rule is
defined on the variables occurring in L
If a variable occurring in the consequent of r also
occurs in L, this variable must occurs in some
other database literals in the rule
Removing L from the rule does not disconnet
existing join paths between any database literals
in the rule
42/50
Applying robustness in
knowledge discovery( cont.)
2001/12/18
43/50
Applying robustness in
knowledge discovery( cont.)
2001/12/18
44/50
Applying robustness in
knowledge discovery( cont.)
2001/12/18
45/50
Experimental results



To used the rule discovery system BASIL
Two large ORACLE relational databases
123 synthesized transactions contains 27
updates, 29 deletions and 67 insertions
2001/12/18
46/50
Experimental results

Experiment design

Train BASIL to discover a set of rules and
estimate their robustness


2001/12/18
Exhaust its search space during the rule discovery
and generated 355 rules. Meanwhile BASIL estimated
the robustness of rules with another 202 sample
transactions
Use the 123 transactions to generate a new
database state
47/50
Experimental results( cont.)

2001/12/18
Check if high robust rules have a better chance
to remain consistent with the data in the new
database state
48/50
Conclusion and future work



To formalize the notion of the robustness
against database changes
Applying approaches to a variety of KDD
applications in database management
To improve the precision of the robustness
estimation by refining the estimation
templates to prevent overestimating
2001/12/18
49/50
Opinion

Applying this approach in reliability test
2001/12/18
50/50