Transcript 24SpDatalog

Datalog
Presented by:
Michael Lam
CS 157A section 1
Dr. Sin Min Lee
Outline
•
•
•
•
•
Basic Structure/Terminolgoy (5.2.1)
Syntax of Datalog Rules (5.2.2)
Safety (5.2.4)
Recursion in Datalog (5.2.6)
Relational Operations in Datalog
(5.2.5)
• Examples
What is Datalog?
Datalog is a nonprocedural query language based on the logicprogramming language Prolog. A user describes the information desired
without giving a specific procedure for obtaining that information.
Datalog simplifies writing simple queries and makes query optimization
easier.
Terminology
Variable
A variable is like a variable in a mathematical equation, it represents some
constant that has yet to be determined.
By convention variables always start with an upper case letter, e.g., City,
Firstname,Code, A, B, C, etc
Constant
A constant is an attribute value. By convention all constants begin with
lower-case letters such as: sammy, joe, xxy2003, etc
fact
A fact is the same things a tuple in a relation. A fact has the following
structure. (called Ground facts, because there are no variables)
<predicate name>(<list of constants>)
Example: Facts about student enrollment:
The following set of facts shows some students and the subjects in which
they are enrolled.
enrolledIn(joe, cp2001)
enrolledIn(sue, cp2001)
enrolledIn(joe, cp2003).
In relational database terms, it would be the same as the following relation.
enrolledIn
Rules
A rule is a way to derive new facts. It is part of a query.
enrolledIn
JOE
CP2001
SUE
CP2001
JOE
CP2003
The major difference in the representation is that in the Datalog facts the
attributes are not named. So there is nothing to tell us that the first attribute is
a Name and the second is a Code.
Syntax of Datalog Rules
A rule has the following form
<head> :-<body>
Head is a single predicate and the body is a list of predicates. Each predicate
has the form
positive literal:
<relation name>(<list of constants or variables>)
negative literal:
not <relation name>(<list of constants or variables>)
What does :- mean?
Assume we have a rule: Q :- P
Then :- means, if P is true then Q is true
Example: A rule about students
The following rule identifies courses students are enrolled in.
result(Name, Code, Phone) :- student(Name, Phone), enrolledIn(Name, Code).
student
enrolledIn
Name
Phone
Name
Code
joe
408-489-5454
joe
CP2004
sue
CP2001
joe
CP2003
Test every combination of student and enrooledIn that unifies
student(joe,408-489-5454), enrolledIn(joe, CP2004)
student(joe,408-489-5454), enrolledIn(sue, CP2001)
student(joe,408-489-5454), enrolledIn(joe, CP2003)
Result:
result(joe, CP2004, 408-489-5454)
result(joe, CP2003, 408-489-5454)
Safety
It may be possible to write rules that generate an an infinite number of answers. For
example:
employee(X,Y) :- X>Y
Since the relation defining > is infinite, this rule would generate an infinite number
of facts for the relation employee. Also the use of negation can cause similar
problems. For example:
not-in-loan(L, B, A) :- not loan(L, B, A)
We what to retrieve all tuples that are not in the loan relation but if account numbers,
branch-name and balance is infinite then the relation not-in-loan would also be
infinite.
To avoid such problems Datalog rules are required to satisfy the following safety
conditions:
1. Every variable that appears in the head of the rule also appears in a
nonarithmetic positive literal in the body of the rule
2. Every variable appearing in a negative literal in the body of the rule also
appears in some positive literal in the body of the rule.
Recursion in Datalog
Suppose now that we want to find out which employees are
supervised, directly or indirectly by a given manager, "Jones".
(1) People whose manager is "Jones"
(2) People whose manager is supervised by "Jones"
Ex. Tim manager is Jones
Kelly manager is Tim
employee-jones(X) :- manager(X, "Jones")
employee-jones(X) :- manager(X,Y), employee-jones(Y)
Procedure
I = set of facts in the database
repeat
Old_I = I
I = I U infer( R, I)
until I = Old_I
employee-jones(X) :- manager(X, "Jones")
employee-jones(X) :- manager(X,Y), employee-jones(Y)
manager relation
Iteration
employee-name
manager-name
Alon
Barinsky
Barinsky
Estovar
Corbin
Duarte
Duarte
Jones
Estovar
Jones
Jones
Klinger
Rensal
Klinger
Tuples in employee-jones
0
1
(Duarte), (Estovar)
2
(Duarte), (Estovar), (Barinsky), (Corbin)
3
(Duarte), (Estovar), (Barinsky), (Corbin), (Alon)
4
(Duarte), (Estovar), (Barinsky), (Corbin), (Alon)
Relational Operations in
Datalog
student(Name,StudentID)
courses(Cname, StudentID)
Relational Algebra:
names = Πname(student)
Datalog Rule: names(Name) :- student(Name,_)
Relational Algebra:
joeInfo = ΠstudentID(σname="joe"(student))
Datalog Rule: joeInfo(StudentID) :- student(joe,StudentID)
Relational Algebra: ΠcoursName(σname="joe"(student
|><| courses))
Datalog Rule:
joeInfo(Name,Cname) :- student(joe, StudentID), courses(Cname, StudentID)
Examples
Relational Algebra: joeInfo = ΠstudentID(σname="joe", studentID > 25(student))
Datalog Rule: joeInfo(StudentID) :- student(joe,StudentID), StudentID > 25
Relational Algebra: ΠcName(courses) |><| Πname (student)
Datalog Rule:
joeInfo(Name,Cname) :- student(Name, StudentID), courses(Cname, StudentID)
Relational Algebra: ΠstudentID(σname "joe", studentID > 25(student))
Datelog Rule:
joeInfo(StudentID) :- student(Name,StudentID), StudentID > 25, Name <> joe