slides - UCLA Computer Science

Download Report

Transcript slides - UCLA Computer Science

CS 240A: Databases and Knowledge Bases
Analysis of Active Databases
Carlo Zaniolo
Department of Computer Science
University of California, Los Angeles
Notes From Chapter 4 of
Advanced Database Systems
by Zaniolo, Ceri, Faloutsos,
Snodgrass, Subrahmanian and
Zicari Morgan Kaufmann, 1997
Properties of rule execution

Termination: For any legal transaction, the
subsequent rule execution terminates.

Confluence:For any legal transaction, the
subsequent rule execution terminates in the same
final state.

Identical observable behavior: For any legal
transaction, the subsequent rule execution is
confluent and produces the same output
sequence.
Analysis of Termination
An active database has NonTerminating Behavior iff
there exists at least one transaction which
produces nonterminating rule processing.

Triggering Graph (TG):

Directed graph {V, E}
 Node vi
 V correspond to rule ri  R
rj,rk   E means that the action of rule rj generates
events which trigger rule rk
 Arc
 Acyclicity
of the triggering graph implies the absence of
non-terminating behaviors
Cyclic Rules

With termination:
CREATE RULE SalaryControl ON Emp
WHEN INSERTED, DELETED, UPDATED (Sal)
IF ( SELECT AVG (Sal) FROM Emp ) > 100
THEN UPDATE Emp
SET Sal = .9 * Sal
Cyclic Rules

Without termination:
CREATE RULE SalaryControl2 ON Emp
WHEN INSERTED, DELETED, UPDATED (Sal)
IF ( SELECT AVG (Sal) FROM Emp ) > 100
THEN UPDATE Emp
SET Sal = 1.1 * Sal
Improving Rule Analysis

Eliminate edge < ri,rj > between two rules when:

The condition of rj is guaranteed to be false after
the execution of ri.

The new data produced by ri do not satisfy the
condition of rj.
Example of second case: Grade range rules
CREATE TRIGGER CheckGradeDomain1
AFTER UPDATE OF Exam ON Grade
REFERENCING NEW AS N
FOR EACH ROW
WHEN (N.Grade > 30)
UPDATE Exam
SET Grade = NULL
WHERE ExamNumber = N.ExamNumber
CREATE TRIGGER CheckGradeDomain2
AFTER UPDATE OF Exam ON Grade
REFERENCING NEW AS N
FOR EACH ROW
WHEN (N.Grade < 18)
THEN UPDATE Exam
SET Grade = NULL
WHERE ExamNumber = N.ExamNumber
Analysis of Confluence

Confluence is the property that there only one final result
(even when execution is nondeterministic)

This is often called Church-Rosser property, and the
Knuth-Bendix algorithm can be used (in some cases) to
determine if a given set of rules has this property

Set-oriented Rules (I.e., statement level triggering event)

Confluence is an issue when there is no total order between rules.

For tuple-oriented rules (e.g., for each row) confluence is
an issue even if the rules are totally ordered.

In general confluence is hard to assure and might not be
necessary.
Confluence (cont.)

Tuple-oriented Rules
 Confluence
is much harder to ensure; it requires that
the final state does not depend on the system's
controlled order in which tuples are accessed.
 But:
Confluence is not necessary or desirable in many
cases.
 Mutating
table exception, when a table that is currently
being updated also needs to be changed by a rule; may
reveal lack of confluence.
 Sufficient
condition for confluence: Commutativity of two
rules r1 and r2 : if for any database state, rule ri and then
rule r1 followed by r2 produces the same database as r2
followed by r1.
Confluence: Commutative Rules
create trigger T1 for C1
events modify(A1)
condition C1(X), X.A1=7
actions modify(C1.A2,X,A2+1)
end;
create trigger T2 for C1
events modify(A1)
condition C1(X), X.A1=7
actions modify(C1.A2,X,A2+2)
end;
Observable Determinism
For every input application, rule processing execution
terminates in the same final state with the same output
sequence (messages, display activities, reports).
 Unlike confluence, observable determinism is neither
necessary nor desirable in most cases.
 Rules may be confluent but not satisfy observable
determinism:
create trigger T2 for C1
create trigger T1 for C1
events modify(A1)
events modify(A1)
condition C1(X), X.A1=7
condition C1(X), X.A1=7
actions
actions
modify(C1.A2,X,A2+2),
modify(C1.A2,X,A2+1),
display(I where integer(I), I=X.A2)
display(I where integer(I), I=X.A2)
end;
end;

Modularization Techniques for Active Rule
Design

The main problem with rules: understanding their
interaction.

This is not just a design issue, but rather a maintenance
problem--The AI experience: XCON

Understanding rules after their design is quite hard.

Adding one rule to a rule set may lead to unexpected
behaviors.

The main reason: lack of modularization devices for active
rules.

Can we improve predictability of active rules by better
structuring: next slide.
Eventual consistency
From Wikipedia, the free encyclopedia

Eventual consistency is used in distributed computing to
maximize achieve high availability [as opposed to
determinism or serializability--CZ]

It guarantees that, if no new updates are made to a given
data item, eventually all accesses to that item will return
the last updated value.

A system that has achieved eventual consistency is often
said to have converged, or achieved replica convergence.

Analysis based on Datalog and monotonicity in generalized
lattices [UCB]
Modularization by Stratification

Stratification: partitioning of rules into disjoint
strata such that each strata includes ``uniform''
rules.

Benefit: Rule behavior can be abstracted by:

 Reasoning
``locally" on each individual stratum
 Reasoning
``globally" of the behavior across strata
A key design principle for providing rule
modularization and control.
Rule Debugging and Monitoring

Purpose: runtime debugging and monitoring for
tuning active rules' behavior.

Commercial systems lack of debugging and
monitoring tools.

Minimum requirement: trace facility-but even this
often unavailable.

Some research prototypes offer powerful
debuggers (e.g., Chimera)
Conclusions

Database production rules are:




Very powerful
Very widespread (on almost all relational products)
Very difficult to program—even harder to maintain
Active rule products must be enhanced:




Many products are ``implemented'', with severe limitation and
sometimes illdefined semantics
SQL:1999 is now published cleans up several problems --but many
others remain.
Concepts such as stratification will hopefully soon become a
widespread notion for improving design
Declarative interfaces are possible for many active database
applications integrity constraints, materialized views, virtual views,
authorization, dependency control, deduction, versions,workflow
management, resource management, ....