Transcript slides
Data Models and Query
Languages
CSE 590DB, Winter 1999
Theory of Databases
Zack Ives
January 10, 1999
1
The Relational Model
Database consists of relations, where columns are
attributes and rows are tuples
Declarative queries using an algebraic or logic language
(or SQL, which we shall ignore)
NewMovies
Name
The Phantom Menace
Patch Adams
Stepmom
attribute
Rating
PG
PG-13
PG-13
NewMovies(Name, Rating, Director)
Director
George Lucas
Tom Shadyac
Chris Columbus
arity = 3, cardinality = 3
tuple
2
Schema
Each attribute in a relation has a type:
NewMovies(Name: String, Rating: RatingDomain, Director: String),
where RatingDomain = {G, PG, PG-13, R, NC-17}
(In many cases, we will omit the type, and assume the type most
appropriate to the expression)
Relation schema: relation name, attributes, and types
Relation instance: a set of tuples for a given schema
Database schema: set of relation schemas
Database instance: relation instance for every relation
schema
3
More on Tuples
Formally, a tuple is a mapping from attribute names to
(correctly typed) values:
Director -> “George Lucas”
Name -> “Stepmom”
May specify a tuple using values & schema notation:
NewMovies(“The Phantom Menace”, “PG”, “George Lucas”)
Can have constraints on attribute values:
Integrity constraints
Keys
Foreign keys
Functional dependencies
4
Relational Algebra
All operators take one or more sets of tuples as input,
and produce a new set as output
Basic set operators (, , —, but normally no
complement)
Selection ()
Projection ()
Cartesian Product ()
Join ()
Division ()
5
Set Operations
Sets must be compatible!
Same number of attributes
Same types/domains
R1 R2 = {all tuples in either R1 or R2}
R1 R2 = {all tuples in both R1 and R2}
Complement???
Negation: R1 — R2 = {all tuples in R1 not in R2}
Query without unions is conjunctive
6
Selection
Chooses a subset of tuples satisfying a predicate
NewMovies
Name
The Phantom Menace
Patch Adams
Stepmom
Rating
PG
PG-13
PG-13
Director
George Lucas
Tom Shadyac
Chris Columbus
Rating = “PG-13” (NewMovies)
Name
Patch Adams
Stepmom
Rating
PG-13
PG-13
Director
Tom Shadyac
Chris Columbus
7
Projection
Chooses a subset of attributes, removes duplicates
NewMovies
Name
The Phantom Menace
Patch Adams
Stepmom
Rating
PG
PG-13
PG-13
Director
George Lucas
Tom Shadyac
Chris Columbus
Name, Director (NewMovies)
Name
The Phantom Menace
Patch Adams
Stepmom
Director
George Lucas
Tom Shadyac
Chris Columbus
8
Cartesian Product
Combine two relations - all pairings of tuples
NewMovie
Name
Phantom Menace
Patch Adams
Stepmom
Director
George Lucas
Tom Shadyac
Chris Columbus
NewMovie MetroShows
Name
Director
Phantom Menace George Lucas
Phantom Menace George Lucas
Phantom Menace George Lucas
Phantom Menace George Lucas
Patch Adams
Tom Shadyac
…
…
MetroShows
Name
Patch Adams
Patch Adams
Stepmom
Stepmom
Name
Patch Adams
Patch Adams
Stepmom
Stepmom
Patch Adams
…
Time
7:00
9:40
6:50
9:00
7:00
...
Time
7:00
9:40
6:50
9:00
9
Join
Combine tuples from two relations by predicate. If
predicate is equality, remove duplicate attribute.
NewMovie
Name
Phantom Menace
Patch Adams
Stepmom
Director
George Lucas
Tom Shadyac
Chris Columbus
MetroShows
Name
Patch Adams
Patch Adams
Stepmom
Stepmom
Time
7:00
9:40
6:50
9:00
NewMovie NewMovie.Name = MetroShows.Name MetroShows
Name
Patch Adams
Patch Adams
Stepmom
Stepmom
Director
Tom Shadyac
Tom Shadyac
Chris Columbus
Chris Columbus
Time
7:00
9:40
6:50
9:00
Equivalent to:
Name, Director, Time
( NewMovie.Name = MetroShows.Name
(NewMovie MetroShows))
10
Division
Only returns results from dividend which match
attributes of all of tuples in divisor (“for all”)
Customers
name
Johnson
Smith
Lindsay
Green
Green
branch
Downtown
Brighton
Redwood
Brighton
Downtown
Branches
branch
Brighton
Downtown
Customers Branches
name
Green
11
Logic-Based Query Languages
Tuple-relational calculus
Based on first-order predicate calculus, but imposes
restrictions (e.g., no function symbols)
Equivalent in power to relational algebra
Datalog
More powerful than tuple-relational calculus
Supports recursion and thus transitive closure
Equivalent to set of Horn clauses
12
Predicates & Atoms
Relations represented by predicates
Tuples represented by atoms:
Flight(“Alon”, “British Airways”, 1234, “Seattle”, “Israel”,
“1/9/1999”, “10:00”)
Arithmetic atoms:
X < 100, X + Y + 5 > Z * 2
In certain cases, negated atoms:
not Flight(“Zack”, “British Airways”, 1234, “Seattle”, “Israel”,
“1/9/1999”, “10:00”)
13
Datalog Rules & Programs
“Pure” datalog rules:
head :- atom1, atom2, atom3
where all atoms are non-negated and relational
Datalog program is set of datalog rules
Conjunctive query has single rule:
HanksDirector(D) :- ActorIn(“Hanks”, M) & DirectedBy(D, M)
Disjunctive query:
HanksDirector(D) :- ActorIn(“Hanks”, M) & DirectedBy(D, M)
HanksDirector(D) :- ProducedBy(“Hanks”, M) & DirectedBy(D, M)
14
Intension & Extension
Distinction between EDB & IDB:
Extensional DB
what’s stored in the database
can only be in the body of a Datalog rule
Intensional DB
result of a datalog rule
can be in head or body of a Datalog rule
15
Evaluating Rules
Evaluating rules:
Start with EDB, iteratively derive facts for IDB’s
Repeat until cannot derive any new facts:
Consider every possible assignment from database to
variables in body
If each atom in body is made true by assignment, add tuple
for the head into the head’s relation
Conjunctive queries are inexpensive
Disjunctive are more expensive (even though they’re
Horn clauses)
16
Transitive Closure
Suppose we to know all possible destination airports
reachable from SeaTac, given an EDB Flight(From, To):
SFO
SEA
LAX
LGA
DEN
JFK
We need to express the query:
Find all airports reachable from SEA
17
Recursion in Datalog
Program:
Dest(X, Y) :- Flight(X, Y)
Dest(X, Y) :- Dest(X, Z), Flight(Z, Y)
Evaluate unknown # of times until fixpoint
Flight: {(SEA,SFO), (SEA,LAX), (SEA,JFK), (LAX,LGA),
(JFK,LGA),(LGA,DEN)}
Dest0: {}
Dest1: Dest0 {(SEA,SFO), (SEA,LAX), (SEA,JFK)}
Dest2: Dest1 {(SEA,LGA),(LAX,DEN),(JFK,DEN)}
Dest3: Dest2 {(SEA,DEN)}
No more changes, so stop (minimum fixpoint)
18
Built-in Predicates
Rules may include atoms with built-in
predicates:
Expensive(X) :- Product(X, Y, P) & P > 100
But need to restrict use of built-in atoms:
P(X) :- R(X) & X < Y
When is X < Y?
We require that every variable from a built-in
atom appear in a relational atom
19
Negated Subgoals
Restrict forms:
P(X, Y) :- Between(X, Y, Z) & NOT Direct(X,Z)
Bad:
Q(X, Y) :- From(X) & NOT To(Y)
Bad but fixable:
P(X) :- From(X) & NOT Connects(X, Y)
Rewrite as:
Connects’(X) :- Connects(X,Y)
P(X) :- End(X) & NOT Connects’(X)
20
Stratified/Stratisfiable Rules
Predicate P depends on predicate Q if Q appears
negated in a rule defining P
If cycle in dependency graph, program is not
stratifiable:
p(X) :- r(X) & NOT q(X)
q(X) :- r(X) & NOT p(X)
Suppose r has tuple {1}
Intuitively, stratification provides a way of executing
program P as sequence of subprograms P1 … Pn defining
IDB relations w/o “forward references”. Results
independent of stratification.
21