Transcript slides

Data Models and Query
Languages
CSE 590DB, Winter 1999
Theory of Databases
Zack Ives
January 10, 1999
1
The Relational Model
 Database consists of relations, where columns are
attributes and rows are tuples
 Declarative queries using an algebraic or logic language
(or SQL, which we shall ignore)
NewMovies
Name
The Phantom Menace
Patch Adams
Stepmom
attribute
Rating
PG
PG-13
PG-13
NewMovies(Name, Rating, Director)
Director
George Lucas
Tom Shadyac
Chris Columbus
arity = 3, cardinality = 3
tuple
2
Schema
 Each attribute in a relation has a type:
NewMovies(Name: String, Rating: RatingDomain, Director: String),
where RatingDomain = {G, PG, PG-13, R, NC-17}
(In many cases, we will omit the type, and assume the type most
appropriate to the expression)
 Relation schema: relation name, attributes, and types
 Relation instance: a set of tuples for a given schema
 Database schema: set of relation schemas
 Database instance: relation instance for every relation
schema
3
More on Tuples
 Formally, a tuple is a mapping from attribute names to
(correctly typed) values:
Director -> “George Lucas”
Name -> “Stepmom”
 May specify a tuple using values & schema notation:
NewMovies(“The Phantom Menace”, “PG”, “George Lucas”)
 Can have constraints on attribute values:
Integrity constraints
Keys
Foreign keys
Functional dependencies
4
Relational Algebra
All operators take one or more sets of tuples as input,
and produce a new set as output
Basic set operators (, , —, but normally no
complement)
Selection ()
Projection ()
Cartesian Product ()
Join ()
Division ()
5
Set Operations
Sets must be compatible!
Same number of attributes
Same types/domains
R1  R2 = {all tuples in either R1 or R2}
R1  R2 = {all tuples in both R1 and R2}
Complement???
Negation: R1 — R2 = {all tuples in R1 not in R2}
Query without unions is conjunctive
6
Selection
Chooses a subset of tuples satisfying a predicate
NewMovies
Name
The Phantom Menace
Patch Adams
Stepmom
Rating
PG
PG-13
PG-13
Director
George Lucas
Tom Shadyac
Chris Columbus
Rating = “PG-13” (NewMovies)
Name
Patch Adams
Stepmom
Rating
PG-13
PG-13
Director
Tom Shadyac
Chris Columbus
7
Projection
Chooses a subset of attributes, removes duplicates
NewMovies
Name
The Phantom Menace
Patch Adams
Stepmom
Rating
PG
PG-13
PG-13
Director
George Lucas
Tom Shadyac
Chris Columbus
Name, Director (NewMovies)
Name
The Phantom Menace
Patch Adams
Stepmom
Director
George Lucas
Tom Shadyac
Chris Columbus
8
Cartesian Product
Combine two relations - all pairings of tuples
NewMovie
Name
Phantom Menace
Patch Adams
Stepmom
Director
George Lucas
Tom Shadyac
Chris Columbus
NewMovie  MetroShows
Name
Director
Phantom Menace George Lucas
Phantom Menace George Lucas
Phantom Menace George Lucas
Phantom Menace George Lucas
Patch Adams
Tom Shadyac
…
…
MetroShows
Name
Patch Adams
Patch Adams
Stepmom
Stepmom
Name
Patch Adams
Patch Adams
Stepmom
Stepmom
Patch Adams
…
Time
7:00
9:40
6:50
9:00
7:00
...
Time
7:00
9:40
6:50
9:00
9
Join
 Combine tuples from two relations by predicate. If
predicate is equality, remove duplicate attribute.
NewMovie
Name
Phantom Menace
Patch Adams
Stepmom
Director
George Lucas
Tom Shadyac
Chris Columbus
MetroShows
Name
Patch Adams
Patch Adams
Stepmom
Stepmom
Time
7:00
9:40
6:50
9:00
NewMovie  NewMovie.Name = MetroShows.Name MetroShows
Name
Patch Adams
Patch Adams
Stepmom
Stepmom
Director
Tom Shadyac
Tom Shadyac
Chris Columbus
Chris Columbus
Time
7:00
9:40
6:50
9:00
Equivalent to:
Name, Director, Time
( NewMovie.Name = MetroShows.Name
(NewMovie  MetroShows))
10
Division
 Only returns results from dividend which match
attributes of all of tuples in divisor (“for all”)
Customers
name
Johnson
Smith
Lindsay
Green
Green
branch
Downtown
Brighton
Redwood
Brighton
Downtown
Branches
branch
Brighton
Downtown
Customers  Branches
name
Green
11
Logic-Based Query Languages
Tuple-relational calculus
Based on first-order predicate calculus, but imposes
restrictions (e.g., no function symbols)
Equivalent in power to relational algebra
Datalog
More powerful than tuple-relational calculus
Supports recursion and thus transitive closure
Equivalent to set of Horn clauses
12
Predicates & Atoms
Relations represented by predicates
Tuples represented by atoms:
Flight(“Alon”, “British Airways”, 1234, “Seattle”, “Israel”,
“1/9/1999”, “10:00”)
Arithmetic atoms:
X < 100, X + Y + 5 > Z * 2
In certain cases, negated atoms:
not Flight(“Zack”, “British Airways”, 1234, “Seattle”, “Israel”,
“1/9/1999”, “10:00”)
13
Datalog Rules & Programs
 “Pure” datalog rules:
head :- atom1, atom2, atom3
where all atoms are non-negated and relational
 Datalog program is set of datalog rules
 Conjunctive query has single rule:
HanksDirector(D) :- ActorIn(“Hanks”, M) & DirectedBy(D, M)
 Disjunctive query:
HanksDirector(D) :- ActorIn(“Hanks”, M) & DirectedBy(D, M)
HanksDirector(D) :- ProducedBy(“Hanks”, M) & DirectedBy(D, M)
14
Intension & Extension
Distinction between EDB & IDB:
Extensional DB
what’s stored in the database
can only be in the body of a Datalog rule
Intensional DB
result of a datalog rule
can be in head or body of a Datalog rule
15
Evaluating Rules
 Evaluating rules:
Start with EDB, iteratively derive facts for IDB’s
Repeat until cannot derive any new facts:
Consider every possible assignment from database to
variables in body
If each atom in body is made true by assignment, add tuple
for the head into the head’s relation
 Conjunctive queries are inexpensive
 Disjunctive are more expensive (even though they’re
Horn clauses)
16
Transitive Closure
 Suppose we to know all possible destination airports
reachable from SeaTac, given an EDB Flight(From, To):
SFO
SEA
LAX
LGA
DEN
JFK
We need to express the query:
Find all airports reachable from SEA
17
Recursion in Datalog
 Program:
Dest(X, Y) :- Flight(X, Y)
Dest(X, Y) :- Dest(X, Z), Flight(Z, Y)
 Evaluate unknown # of times until fixpoint
Flight: {(SEA,SFO), (SEA,LAX), (SEA,JFK), (LAX,LGA),
(JFK,LGA),(LGA,DEN)}
Dest0: {}
Dest1: Dest0  {(SEA,SFO), (SEA,LAX), (SEA,JFK)}
Dest2: Dest1  {(SEA,LGA),(LAX,DEN),(JFK,DEN)}
Dest3: Dest2  {(SEA,DEN)}
No more changes, so stop (minimum fixpoint)
18
Built-in Predicates
Rules may include atoms with built-in
predicates:
Expensive(X) :- Product(X, Y, P) & P > 100
But need to restrict use of built-in atoms:
P(X) :- R(X) & X < Y
 When is X < Y?
We require that every variable from a built-in
atom appear in a relational atom
19
Negated Subgoals
Restrict forms:
P(X, Y) :- Between(X, Y, Z) & NOT Direct(X,Z)
Bad:
Q(X, Y) :- From(X) & NOT To(Y)
Bad but fixable:
P(X) :- From(X) & NOT Connects(X, Y)
Rewrite as:
Connects’(X) :- Connects(X,Y)
P(X) :- End(X) & NOT Connects’(X)
20
Stratified/Stratisfiable Rules
 Predicate P depends on predicate Q if Q appears
negated in a rule defining P
 If cycle in dependency graph, program is not
stratifiable:
p(X) :- r(X) & NOT q(X)
q(X) :- r(X) & NOT p(X)
 Suppose r has tuple {1}
Intuitively, stratification provides a way of executing
program P as sequence of subprograms P1 … Pn defining
IDB relations w/o “forward references”. Results
independent of stratification.
21