Database Systems – Set Theory

Download Report

Transcript Database Systems – Set Theory

Database Systems – Set Theory
RELATIONS




A relational database consists of tables, each of which is assigned a unique name.
A row in a table represents a relationship among a set of values.
A table is a collection of such relationships.
Column Headers are commonly referred to as attributes
Websites-Schema=(website, organization, first-year, category)
websites relation:
website
organization
first-year category
www.zojjed.com
Walking Promotions
2006
Fiction
www.racewalk.com
Walking Promotions
1995
Health
www.greattreks.com
Walking Promotions
2006
Travel
www.twofeetgallery.com
Walking Promotions
2004
Photograph
www.walkinghealthy.com
Walking Promotions
2002
Health
www.cs.drexel.edu/~jsalvage
Drexel University
2005
Education
Database Systems – Set Theory
DOMAINS
A Domain is the set of permitted values for a column/attribute.
 The domain can be any positive number as in the case with first year
 The domain can be a series of letters up to a maximum number of letters as in the
case with organization.
 The domain can be valid web addresses, whose rules might be slightly more
complicated.

If
D1
D2
D3
D4
denotes
denotes
denotes
denotes
the
the
the
the
set
set
set
set
of
of
of
of
all
all
all
all
websites
organizations
first years
categories
Any row of websites must contain a 4-tuple(v1,v2,v3, v4) where
v1 is a website in the domain D1
v2 is a organization in the domain D2
v3 is year in the domain D3
V4 is a category in the domain D4
Therefore, account is a subset of D1xD2xD3xD4.
Database Systems – Set Theory
DOMAINS
In general a table must be a subset of D1xD2x…xDn-1xDn
Tables vs. Relations
There exists a close relationship between this language and the terminology used in
databases.
Instead of numbers DB’s use names.
Relation -> table
tuple -> row
Websites table has 6 tuples.
Database Systems – Set Theory
TUPLE NOTATION
If t is a variable denoting the first tuple relationship, then t[website] denotes the
website attribute of the tuple t.
t[website] = “www.zojjed.com”
t[organization]=”Walking Promotions”
t[first-year] = 2006
t[category] = “Fiction”
Alternatively, using numbered attributes:
t[1] = “www.zojjed.com”
t[2]= ”Walking Promotions”
t[3] = 2006
t[4] = “Fiction”
t  r, indicate the tuple t is in the relation r
Database Systems – Set Theory
DOMAINS
It is possible for several attributes to have the same domain, i.e. several attributes can
contain the same domain of possible values.
Later we will introduce a customer relation. It has a customer name; if Ialso had an
employee table with the field employee name, technically they both have the same
domain.
It depends upon how you look at it. If the domain is the set of all possible names, this
is true.
What about the domains website and first-year. They are incompatible.
What about website and category? While they both store character strings, the website
domain has a more limited domain.
In a set, an attribute may contain the value Null.
For now we will assume they do not.
Database Systems – Set Theory
DATABASE SCHEMAS
 Logical design of the database
 defines the type definition of a variable
DATABASE INSTANCE
 Snapshot of the database at a given time
 an instance of a variable
A database schema in relations is defined by using a capitalized name for the
relationship-schema and a lowercase name of each attribute. An instance of a relation
is represented by a lowercase name.
Websites-schema(website, organization, first-year, category)
A relation on the Website-schema is as follows:
websites(Website-schema)
Side notes, very important:
A relation has no order
A relation cannot contain duplicate tuples
Database Systems – Set Theory
Customers-Schema=(website, first-name, last-name)
customers Relation
website
first-name last-name
www.zojjed.com
Derek
Jeter
www.zojjed.com
Chase
Utley
www.drexel.edu/~jsalvage
Jeremy
Johnson
www.racewalk.com
Ryan
Howard
www.zojjed.com
Ryan
Howard
Notice the website attribute appears in both the customers relation and
Websites relation.
This is not a coincidence, fields often are repeated.
This allows distinct relations to be related.
If we wanted to gather website details for customer websites we need
information from both relations
Database Systems – Set Theory
Combined information from website and customers relations
website
category
first-name
last-name
www.zojjed.com
Fiction
Derek
Jeter
www.zojjed.com
Fiction
Chase
Utley
www.cs.drexel.edu/~jsalvage
Education
Jeremy
Johnson
www.racewalk.com
Health
Ryan
Howard
www.zojjed.com
Fiction
Ryan
Howard
In real databases, unique id fields would be used to identify the
customer and the website so the website name would not be repeated
in both tables.
Database Systems – Set Theory
Instead of having two schemas, it’s possible to have one schema as follows:
WebsiteCustomers(website, organization, first-year, category, first-name, last-name)
What is wrong with this?
Database Systems – Set Theory
Redundant data as well as null fields.
website
organization
first-year
category
first-name
last-name
www.zojjed.com
Walking Promotions
2006
Fiction
Derek
Jeter
www.racewalk.com
Walking Promotions
1995
Health
Ryan
Howard
www.greattreks.com
Walking Promotions
2006
Travel
Null
Null
www.twofeetgallery.com
Walking Promotions
2004
Photographs
Null
Null
www.walkinghealthy.com
Walking Promotions
2002
Health
Null
Null
www.zojjed.com
Walking Promotions
2006
Fiction
Ryan
Howard
www.zojjed.com
Walking Promotions
2006
Fiction
Chase
Utley
www.cs.drexel.edu/~jsalvage
Drexel University
2005
Education
Jeremy
Johnson
Database Systems – Set Theory
hit-counts-Schema= (website, date, hit-count)
hit-counts relation
website
date
hit-count
www.zojjed.com
5/20/2007
5
www.racewalk.com
5/20/2007
2019
www.greattreks.com
5/20/2007
1050
www.twofeetgallery.com
5/20/2007
32
www.walkinghealthy.com
5/20/2007
159
www.zojjed.com
5/21/2007
6
www.zojjed.com
5/22/2007
5
www.cs.drexel.edu/~jsalvage
5/20/2007
376
www.racewalk.com
5/21/2007
2099
Is there anything wrong with the above relation?
Database Systems – Set Theory
hit-counts-Schema= (website, date, hit-count)
hit-counts relation
website
date
hit-count
www.zojjed.com
5/20/2007
5
www.racewalk.com
5/20/2007
2019
www.greattreks.com
5/20/2007
1050
www.twofeetgallery.com
5/20/2007
32
www.walkinghealthy.com
5/20/2007
159
www.zojjed.com
5/21/2007
6
www.zojjed.com
5/22/2007
5
www.cs.drexel.edu/~jsalvage
5/20/2007
376
www.racewalk.com
5/21/2007
2099
Is there anything wrong with the above relation?
No, there is no reason why we cannot list a website more than once.
Database Systems – Set Theory
If we did not care about the date and only cared about the hit count, could we define
the hit-counts Schema as follows:
hit-counts-Schema= (website, hit-count)
hit-counts relation:
website
hit-count
www.zojjed.com
5
www.racewalk.com
2019
www.greattreks.com
1050
www.twofeetgallery.com
32
www.walkinghealthy.com
159
www.zojjed.com
6
www.zojjed.com
5
www.cs.drexel.edu/~jsalvage
376
www.racewalk.com
2099
Database Systems – Set Theory
If we did not care about the date and only cared about the hit count, could we define
the hit-counts Schema as follows:
hit-counts-Schema= (website, hit-count)
hit-counts relation:
website
hit-count
www.zojjed.com
5
www.racewalk.com
2019
www.greattreks.com
1050
www.twofeetgallery.com
32
www.walkinghealthy.com
159
www.zojjed.com
6
www.zojjed.com
5
www.cs.drexel.edu/~jsalvage
376
www.racewalk.com
2099
In real databases there would be no problem, but we said that you cannot repeat
tuples in a relation. So the answer is no, we cannot use this schema because of
duplicates.
Database Systems – Set Theory
QUERY LANGUAGES
A query language is a language in which the user requests information from the
database.
Can be procedural or non-procedural.
We will study Relational Algebra
It Is a procedural language consisting of sets of operations that take one or two
relations as input and output a relation. Operations include:
 select
 project
 union
 set difference
 Cartesian product
 Rename
 Intersection
 Aggregate functions
We will also study various forms of joining relations.
Database Systems – Set Theory
Unary- operates on one relation
Binary – operates on a pair of relations
The Select Operation
Unary operation
Selects tuples that satisfy a given predicate
 - represents a select operation - sigma
<select condition>(R)
<selection condition> = <attribute name> <comparison op> <constant value> or
<selection condition> = <attribute name> <comparison op> <attribute name>
comparison operators are: =, <>, <, <=, >, >=
equal, not equal, less than, less than or equal to, greater than, greater than or equal
to
Database Systems – Set Theory
To select those tuples of the hit-counts relation where the website is “www.zojjed.com”
we write.
website = “www.zojjed.com”(hit-counts)
This returns the relation:
hit-counts relation
website
date
hit-count
website
date
hit-count
www.zojjed.com
5/20/2007
5
www.zojjed.com
5/20/2007
5
www.zojjed.com
5/21/2007
6
www.racewalk.com
5/20/2007
2019
www.zojjed.com
5/22/2007
5
www.greattreks.com
5/20/2007
1050
www.twofeetgallery.com
5/20/2007
32
www.walkinghealthy.com
5/20/2007
159
www.zojjed.com
5/21/2007
6
www.zojjed.com
5/22/2007
5
www.cs.drexel.edu/~jsalvage
5/20/2007
376
www.racewalk.com
5/21/2007
2099
Database Systems – Set Theory
To select those tuples of the hit-counts relation where the hit-count is greater than
1000 we write.
hit-count > 1000 (hit-counts)
This returns the relation:
hit-counts relation
website
date
hit-count
website
date
hit-count
www.racewalk.com
5/20/2007
2019
www.zojjed.com
5/20/2007
5
www.greattreks.com
5/20/2007
1050
www.racewalk.com
5/20/2007
2019
ww.racewalk.com
5/21/2007
2099
www.greattreks.com
5/20/2007
1050
www.twofeetgallery.com
5/20/2007
32
www.walkinghealthy.com
5/20/2007
159
www.zojjed.com
5/21/2007
6
www.zojjed.com
5/22/2007
5
www.cs.drexel.edu/~jsalvage
5/20/2007
376
www.racewalk.com
5/21/2007
2099
Database Systems – Set Theory
Can combine predicates with and, or, and not
To select those tuples of the hit-counts relation where the hit-count is greater than 5
and the website is www.zojjed.com, we write.
hit-count > 5
and website = “www.zojjed.com”
This returns the relation:
(hit-counts)
hit-count > 5
website
date
hit-count
website
date
hit-count
www.zojjed.com
5/21/2007
6
www.racewalk.com
5/20/2007
2019
www.greattreks.com
5/20/2007
1050
www.twofeetgallery.com
5/20/2007
32
website = “www.zojjed.com”
website
date
hit-count
www.walkinghealthy.com
5/20/2007
159
www.zojjed.com
5/20/2007
5
www.zojjed.com
5/21/2007
6
www.zojjed.com
5/21/2007
6
www.cs.drexel.edu/~jsalvage
5/20/2007
376
www.zojjed.com
5/22/2007
5
www.racewalk.com
5/21/2007
2099
Database Systems – Set Theory
The Project Operation
unary
returns arguments in relation without all attributes
duplicates are removed
 - represent project operation - pi
 <attribute list> (R)
website, category(Websites)
website
category
www.zojjed.com
Fiction
www.racewalk.com
Health
www.greattreks.com
Travel
www.twofeetgallery.com
Photographs
www.walkinghealthy.com
Health
www.cs.drexel.edu/~jsalvage
Education
Database Systems – Set Theory
Composition of Relational Operations
Often we need to combine operations. Often we wish to select a set of tuples and limit
the relation returned to a few attributes.
What if we want to find out only the websites that have had greater than 1000 hits in a
given day?
First we must find out what tuples have hit counts greater than 1000.
We can accomplish this with the following relational query:

(hit-counts)
hit-count>1000
By using the Project operation we can remove the extra attributes such as hit-count
and date and only return the values in the website column.
website(
(hit-counts))
hit-count>1000
What is the relation that is returned?
Database Systems – Set Theory
What if we want to find out only the websites that have had greater than 1000 hits in a
single day?
website(
(hit-counts))
hit-count>1000
What is the relation that is returned?
hit-counts relation
website
website
date
hit-count
www.racewalk.com
www.zojjed.com
5/20/2007
5
www.greattreks.com
www.racewalk.com
5/20/2007
2019
www.greattreks.com
5/20/2007
1050
www.twofeetgallery.com
5/20/2007
32
www.walkinghealthy.com
5/20/2007
159
www.zojjed.com
5/21/2007
6
www.zojjed.com
5/22/2007
5
www.cs.drexel.edu/~jsalvage
5/20/2007
376
www.racewalk.com
5/21/2007
2099
Database Systems – Set Theory
Union Operator
binary
 - union operator
It is often useful to combine the results of queries.
Again, remember that set theory removes duplicates.
Relation 1  Relation 2 = Result Set
Database Systems – Set Theory
What is a query that returns all websites that have customers OR a hit count greater
than 1000?
We need information from both the customers relation as well as the hit count relation.
First we need the names of all websites that have customers
website(customers)
customers Relation
website
website
first-name
last-name
www.drexel.edu/~jsalvage
www.zojjed.com
Derek
Jeter
www.racewalk.com
www.zojjed.com
Chase
Utley
www.zojjed.com
www.drexel.edu/~jsalvage
Jeremy
Johnson
www.racewalk.com
Ryan
Howard
www.zojjed.com
Ryan
Howard
Database Systems – Set Theory
Then we need the names of the websites that have a hit count greater than 1000:
website(hit-count>1000 (hit-counts))
website
www.racewalk.com
www.greattreks.com
hit-counts relation
website
date
hit-count
www.zojjed.com
5/20/2007
5
www.racewalk.com
5/20/2007
2019
www.greattreks.com
5/20/2007
1050
www.twofeetgallery.com
5/20/2007
32
www.walkinghealthy.com
5/20/2007
159
www.zojjed.com
5/21/2007
6
www.zojjed.com
5/22/2007
5
www.cs.drexel.edu/~jsalvage
5/20/2007
376
www.racewalk.com
5/21/2007
2099
Database Systems – Set Theory
Combine the results using a union operation
website(customers)  website(
(hit-counts))
hit-count>1000
website
www.drexel.edu/~jsalvage
www.racewalk.com
www.greattreks.com
www.zojjed.com
website(hit-count>1000 (hit-counts))
website(customers)
website
website
www.racewalk.com
www.drexel.edu/~jsalvage
www.greattreks.com
www.racewalk.com
www.zojjed.com
Remember, order is not important!
• Unions MUST be of similar types
• They MUST have the same number of attributes
• The domains of the attributes MUST be the same
Database Systems – Set Theory
Intersection Operator
binary
∩ - intersection operator
Returns all tuples contained within both relations
Relation 1 ∩ Relation 2 = Result Set
Database Systems – Set Theory
What is a query that returns all websites that have customers AND a hit count greater
than 1000?
First we need the names of all websites that have customers
website(customers)
website
www.drexel.edu/~jsalvage
www.racewalk.com
www.zojjed.com
Then we need the names of the websites that have a hit count greater than 1000:
website(
(hit-counts))
hit-count>1000
website
www.racewalk.com
www.greattreks.com
Database Systems – Set Theory
What is a query that returns all websites that have customers AND a hit
count greater than 1000?
website(customers) ∩ website(hit-count>1000 (hit-counts))
website
www.racewalk.com
website(hit-count>1000 (hit-counts))
website(customers)
website
website
www.racewalk.com
www.drexel.edu/~jsalvage
www.greattreks.com
www.racewalk.com
www.zojjed.com
Database Systems – Set Theory
The Set Difference Operation (MINUS)
binary
-, denotes set difference
Relation 1 - Relation 2 = Result Set
Finds tuples in one set but not in another
r – s, produces a set containing those tuples in r but not in s, operand order is
significant.
Database Systems – Set Theory
Produce a list of websites who have a hit count > 1000 and do not have a
customer.
We need the names of all websites that have customers
website(customers)
website
www.drexel.edu/~jsalvage
www.racewalk.com
www.zojjed.com
Then we need the names of the websites that have a hit count greater than 1000:
website(
(hit-counts))
hit-count>1000
website
www.racewalk.com
www.greattreks.com
website(hit-count>1000 (hit-counts)) - website(customers)
Database Systems – Set Theory
Produce a list of websites who have a hit count > 1000 and do not have a customer.
website(hit-count>1000 (hit-counts)) - website(customers)
website
www.greattreks.com
website(hit-count>1000 (hit-counts))
website(customers)
website
website
www.racewalk.com
www.drexel.edu/~jsalvage
www.greattreks.com
www.racewalk.com
www.zojjed.com
Notice the attributes in R1 that are not in R2 are included in the result set, but the
attributes in R2 that are not in R1 are not included in the result set, i.e. operand order
is significant.
Database Systems – Set Theory
Given the relation websites below:
website
organization
first-year
category
www.zojjed.com
Walking Promotions
2006
Fiction
www.racewalk.com
Walking Promotions
1995
Health
www.greattreks.com
Walking Promotions
2006
Travel
www.twofeetgallery.com
Walking Promotions
2004
Photographs
www.walkinghealthy.com
Walking Promotions
2002
Health
www.cs.drexel.edu/~jsalvage
Drexel University
2005
Education
What is the result of the following?
website(empty set) - website(websites)
Database Systems – Set Theory
Given the relation websites below:
website
organization
first-year
category
www.zojjed.com
Walking Promotions
2006
Fiction
www.racewalk.com
Walking Promotions
1995
Health
www.greattreks.com
Walking Promotions
2006
Travel
www.twofeetgallery.com
Walking Promotions
2004
Photographs
www.walkinghealthy.com
Walking Promotions
2002
Health
www.cs.drexel.edu/~jsalvage
Drexel University
2005
Education
What is the result of the following?
website(empty set) - website(websites)
The result is the empty set, because no records are contained in “R1” and only the
records in R1 that are not in R2 are returned from the MINUS operator.
Database Systems – Set Theory
Given the relation websites below:
website
organization
first-year
category
www.zojjed.com
Walking Promotions
2006
Fiction
www.racewalk.com
Walking Promotions
1995
Health
www.greattreks.com
Walking Promotions
2006
Travel
www.twofeetgallery.com
Walking Promotions
2004
Photographs
www.walkinghealthy.com
Walking Promotions
2002
Health
www.cs.drexel.edu/~jsalvage
Drexel University
2005
Education
What is the result of the following?
website(websites) - website(empty set)
Database Systems – Set Theory
The result of website(websites) - website(empty set) is the complete relation
websites, since no records are contained in the empty set all records from the
websites relation are included in the result set.
website
www.zojjed.com
www.racewalk.com
www.greattreks.com
www.twofeetgallery.com
www.walkinghealthy.com
www.cs.drexel.edu/~jsalvage
Database Systems – Set Theory
The Cartesian-Product Operation
binary
x – combines information in two relations
Relation 1 x Relation 2 = Result Set
because attributes can be repeated in different relations, a notation relation.attribute
will be used.
Therefore, the resulting schema of r = websites x customers is:
(websites.website, websites.organization, websites.first-year, websites.category,
customers.website, customers.first-name, customers.last-name)
Note this does not support cases where you wish to use the same relation twice; we
will address this with the rename operation shortly.
What tuples exist in r if r = websites x customers?
The combination of all tuples in websites with every tuple in customers.
Given r1 with n1 tuples and r2 with n2 tuples then r1xr2 has n1*n2 tuples
Database Systems – Set Theory
Let’s look at a simplified example first.
If relation R1 contains the following:
Then R1 x R2 contains the
following:
R1.Value 1
R2.Value 2
1
A
1
B
1
C
2
A
2
B
Value2
2
C
A
3
A
B
3
B
C
3
C
Value1
1
2
3
and if relation R2 contains the following:
Database Systems – Set Theory
Similarly, websites x customers appears as follows:
websites.website
organization
category
customers.website
first-name
last-name
Walking Promotions
firstyear
2006
www.zojjed.com
Fiction
www.zojjed.com
Derek
Jeter
www.zojjed.com
Walking Promotions
2006
Fiction
www.zojjed.com
Chase
Utley
www.zojjed.com
Walking Promotions
2006
Fiction
www.drexel.edu/~jsalvage
Jeremy
Johnson
www.zojjed.com
Walking Promotions
2006
Fiction
www.racewalk.com
Ryan
Howard
www.zojjed.com
Walking Promotions
2006
Fiction
www.zojjed.com
Ryan
Howard
www.racewalk.com
Walking Promotions
1995
Health
www.zojjed.com
Derek
Jeter
www.racewalk.com
Walking Promotions
1995
Health
www.zojjed.com
Chase
Utley
www.racewalk.com
Walking Promotions
1995
Health
www.drexel.edu/~jsalvage
Jeremy
Johnson
www.racewalk.com
Walking Promotions
1995
Health
www.racewalk.com
Ryan
Howard
www.racewalk.com
Walking Promotions
1995
Health
www.zojjed.com
Ryan
Howard
www.greattreks.com
Walking Promotions
2006
Travel
www.zojjed.com
Derek
Jeter
www.greattreks.com
Walking Promotions
2006
Travel
www.zojjed.com
Chase
Utley
www.greattreks.com
Walking Promotions
2006
Travel
www.drexel.edu/~jsalvage
Jeremy
Johnson
www.greattreks.com
Walking Promotions
2006
Travel
www.racewalk.com
Ryan
Howard
www.greattreks.com
Walking Promotions
2006
Travel
www.zojjed.com
Ryan
Howard
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Database Systems – Set Theory
What if we want to find all the customers who bought from a website created before
the year 2000?
We could try the following:
first-year <2000 (websites x customers)
Note that we are not using a projection to reduce the number of attributes to show
what is really happening. In the end, you would use a projection to show only the
fields requested by the query.
websites.website
organization
first-year
category
customers.website
first-name
last-name
www.racewalk.com
Walking Promotions
1995
Health
www.zojjed.com
Derek
Jeter
www.racewalk.com
Walking Promotions
1995
Health
www.zojjed.com
Chase
Utley
www.racewalk.com
Walking Promotions
1995
Health
www.drexel.edu/~jsalvage
Jeremy
Johnson
www.racewalk.com
Walking Promotions
1995
Health
www.racewalk.com
Ryan
Howard
www.racewalk.com
Walking Promotions
1995
Health
www.zojjed.com
Ryan
Howard
Oops, too many tuples!
Database Systems – Set Theory
Because the Cartesian-product pairs all possible tuples, all tuples from websites are
combined with all tuples from customers. While only those with the first-year < 2000
are selected, it still returns 5 tuples.
Of those sets, we only want the ones where the websites relation’s website attribute
equals the customers relation’s website attribute.
websites.website
organization
first-year
category
customers.website
first-name
last-name
www.racewalk.com
Walking Promotions
1995
Health
www.zojjed.com
Derek
Jeter
www.racewalk.com
Walking Promotions
1995
Health
www.zojjed.com
Chase
Utley
www.racewalk.com
Walking Promotions
1995
Health
www.drexel.edu/~jsalvage
Jeremy
Johnson
www.racewalk.com
www.racewalk.com
Walking Promotions
Walking Promotions
1995
1995
Health
Health
www.racewalk.com
www.zojjed.com
Ryan
Ryan
The only tuple we truly want is the highlighted tuple.
we can write this as follows:
websites.website = customers.website(first-year <2000 (websites x customers))
Howard
Howard
Database Systems – Set Theory
Since,
websites.website = customers.website(first-year <2000 (websites x customers))
Returns the following tuple with too many attributes, we must also use a
projection to remove the excessive attributes.
websites.website
organization
first-year
category
customers.website
first-name
last-name
www.racewalk.com
Walking Promotions
1995
Health
www.racewalk.com
Ryan
Howard
Applying the projection of first-name, last name to the previous query gives us the
following query:
first-name, last-name(websites.website = customers.website(first-year <2000 (websites
x customers)))
Database Systems – Set Theory
The Assignment Operator
unary
allows an expression to be assigned to a variable
NewRelation  OldRelation
For example:
1200loans  amount > 1200(loan)
or
result  loan-number(1200loans)
Or
The Rename Operation
Unary
x(E) renames the expression E to x.
Relational-algebra expressions do not have a name that we can refer to them by using
the rename operator,
 is roh.
Database Systems – Set Theory
Example, without using an aggregation function (not yet shown), find the largest hit
count of any website for a single day. If the same max hit count exists more than
once, you are only allowed to return a single tuple containing the answer.
We accomplish this in two steps:
First, compute a temporary relation consisting of hit counts less than the largest hit
count.
Second, take the set difference (minus) between the relation hit-count(hit-counts)
and the temporary relation
To accomplish the first step, compute all the websites hit counts compared to all the
websites hit counts, in other words compute the Cartesian product of the relation
hit-counts with itself.
hit-counts x hit-counts
However, we must rename one of the hit-counts relations so that we can identify the
balance distinctly
Database Systems – Set Theory
Given the projection of only the hit-count field from the relation hit-counts via
hit-count(hit-counts)
If we rename the result of this projection
We have:
d (hit-count(hit-counts))
hit-count
5
2019
1050
32
159
6
376
2099
And thus create a cross product of the two relations as
hit-count(hit-counts) x d (hit-count(hit-counts))
Database Systems – Set Theory
hit-count(hit-counts) x d (hit-count(hit-counts))
hit-count
d(hit-count)
5
5
2019
5
1050
5
32
5
159
5
6
5
376
5
2099
5
5
2019
2019
2019
1050
2019
32
2019
159
2019
6
2019
376
2019
2099
2019
hit-count
d(hit-count)
hit-count
d(hit-count)
5
1050
5
159
2019
1050
2019
159
1050
1050
1050
159
32
1050
32
159
159
1050
159
159
6
1050
6
159
376
1050
376
159
2099
1050
2099
159
5
32
5
6
2019
32
2019
6
1050
32
1050
6
32
32
32
6
159
32
159
6
6
32
6
6
376
32
376
6
2099
32
2099
6
Database Systems – Set Theory
hit-count(hit-counts) x d (hit-count(hit-counts))
hit-count
d(hit-count)
5
376
2019
376
1050
376
32
376
159
376
6
376
376
376
2099
376
5
2099
2019
2099
1050
2099
32
2099
159
2099
6
2099
376
2099
2099
2099
Database Systems – Set Theory
Now we select only those tuples that have the first attribute containing a
value less than the second attribute, we do so with the following query:
hitcounts.hit-count < d.hit-count (hit-count(hit-counts) x d (hit-count(hit-counts)))
hit-count
d(hit-count)
5
5
2019
5
1050
5
32
5
159
5
6
5
376
5
2099
5
5
2019
2019
2019
1050
2019
32
2019
159
2019
6
2019
376
2019
2099
2019
hit-count
d(hit-count)
hit-count
d(hit-count)
5
1050
5
159
2019
1050
2019
159
1050
1050
1050
159
32
1050
32
159
159
1050
159
159
6
1050
6
159
376
1050
376
159
2099
1050
2099
159
5
32
5
6
2019
32
2019
6
1050
32
1050
6
32
32
32
6
159
32
159
6
6
32
6
6
376
32
376
6
2099
32
2099
6
Database Systems – Set Theory
hit-count(hit-counts) x d (hit-count(hit-counts))
hit-count
d(hit-count)
5
376
2019
376
1050
376
32
376
159
376
6
376
376
376
2099
376
5
2099
2019
2099
1050
2099
32
2099
159
2099
6
2099
376
2099
2099
2099
Database Systems – Set Theory
This certainly gives us a lot of tuples, but if we then project just the hit-count from the
first column and remove the duplicates, we are left with the following:
hit-count
5
2019
1050
32
159
6
376
This is the set containing most hit counts, but it does not include the largest hit count.
Database Systems – Set Theory
To get just the largest hit count we now simply subtract our result set from the
projection of the original hit count relation as follows:
hit-count(hit-counts) - hit-count(hitcounts.hit-count < d.hit-count (hit-count(hit-counts) x
d (hit-count(hit-counts))))
hit-count
2099
Database Systems – Set Theory
We need a better way to represent certain queries because the notation for joining two
relations and only selecting records where the attributes match is too cumbersome.
Therefore we have:
The Natural Join Operation
Binary
Result Set = R1 |x| R2
The natural join operation finds the Cartesian product of two relations, but only returns
tuples where the attributes whose names are the same in both relations contain the
same values.
Database Systems – Set Theory
Let’s look at a simplified example first.
If relation R1 contains the following:
and if relation R2 contains the following:
Value1
Value2
Value2
Value3
1
X
X
A
2
Y
Z
B
3
Z
A
C
Database Systems – Set Theory
Then R1 x R2 contains the following:
R1.Value 1
R1.Value 2
R2.Value2
R2.Value3
1
X
X
A
1
X
Z
B
1
X
A
C
2
Y
X
A
2
Y
Z
B
2
Y
A
C
3
Z
X
A
3
Z
Z
B
3
Z
A
C
Then R1 |x| R2 contains the following:
R1.Value 1
R1.Value 2
R2.Value2
R2.Value3
1
X
X
A
3
Z
Z
B
Database Systems – Set Theory
Example:
Find the names of all customers who have made a purchase from a health or travel
website. Return the name of the customer, the website, and the category of the
website.
The old way:
Form a Cartesian product of the websites and customers relations.
Select the tuples of the same website as well as a category equal to “health” or
“travel.”
Project the first-name, last-name, website, and category

first-name, last-name, website, category
websites.website = customers.website and
(websites x customers))
category = “Health” or category = “Travel”
(
Database Systems – Set Theory
Another example:
Find all the names of websites and the dates they have a hit count for web sites that
are in the health category.
Database Systems – Set Theory
Another example:
Find all the web site names and the dates that have hit counts for web sites in the
health category.

website, date
(category = “Health” (websites |x| hit-counts))
Database Systems – Set Theory
Generalized Projections
Allows basic arithmetic operations within fields of a tuple
Observe the Sales relation:
product
first-name
last-name
tax
total-cost
Zojjed!
Derek
Jeter
1.00
17.95
Zojjed!
Chase
Utley
1.00
17.95
VB .NET Coach
Jeremy
Johnson
0
54.95
Race Walk Like A Champion
Ryan
Howard
1.25
25.95
Zojjed!
Ryan
Howard
0
16.95
What was the price of the product sold minus the tax paid?
 product, first-name, last-name, (total-cost – tax) as net-pay (Sales)
Database Systems – Set Theory
Aggregate Functions
takes a collection of values and returns a single value as a result
i.e
sum {1, 1, 3, 4, 4, 11} returns the value 24.
avg {1, 1, 3, 4, 4, 11} returns the value 4.
count {1, 1, 3, 4, 4, 11} returns the value 6.
min {1, 1, 3, 4, 4, 11} returns the value 1.
max {1, 1, 3, 4, 4, 11} returns the value 11.
count-distinct {1, 1, 3, 4, 4, 11} returns the value 4.
Ignore the fact that we said sets can’t contain duplicate values
Database Systems – Set Theory
Operations to Modify the Contents of Relations
Deletion
rr–E
This uses set difference (minus) and variable assignment.
Delete all of the sale of “Zojjed!” from the Sales relation
sales  sales - product = “Zojjed!” (sales)
Delete all sales with no tax collected
sales  sales - tax = 0 (sales)
Insertion
rrE
This uses union and variable assignment.
Add a record to the sales relation
sales  sales  {(“I Walk to Eat”, “Chase”, “Utley”, 0, 15.95)}
Add a record to the websites relation
websites  websites  {(“www.mediatitan.net”, “Walking Promotions”, 2006,
“Fiction”)}
Database Systems – Set Theory
Updating
Zero all tax attributes in the sales relation
sales   product, first-name, last-name, 0, total-cost(sales)
Database Systems – Set Theory
Joins
There are other forms of joins. Let’s look at the following two simple relations:
employee-name
city
employee-name
team
Jeter
New York City
Glavine
Mets
Howard
Philadelphia
Howard
Phillies
Utley
Philadelphia
Bonds
Giants
Schilling
Boston
Schilling
Choke Sox
cities relation
teams relation
employee-name
city
employee-name
team
Howard
Philadelphia
Howard
Phillies
Schilling
Boston
Schilling
Choke Sox
Natural Join -> cities |x| teams
The natural join omits records that do not match, so we do not have
records for Jeter, Utley, Glavine, or Bonds.
Database Systems – Set Theory
employee-name
city
employee-name
team
Jeter
New York City
Glavine
Mets
Howard
Philadelphia
Howard
Phillies
Utley
Philadelphia
Bonds
Giants
Schilling
Boston
Schilling
Choke Sox
cities relation
teams relation
employee-name
city
employee-name
team
Jeter
New York City
Null
Null
Howard
Philadelphia
Howard
Phillies
Utley
Philadelphia
Null
Null
Schilling
Boston
Schilling
Choke Sox
Left Outer Join -> cities LOJ teams
Includes all records from the left and only those records on the right
that match
Database Systems – Set Theory
A subtle note for Left Outer Join.
If relation R1 contains the following:
and if relation R2 contains the following:
Value1
Value2
Value2
Value3
1
X
X
A
2
Y
Z
B
3
Z
Z
C
Database Systems – Set Theory
Then R1 LOJ R2 contains the following:
R1.Value 1
R1.Value 2
R2.Value2
R2.Value3
1
X
X
A
2
Y
Null
Null
3
Z
Z
B
3
Z
Z
C
Database Systems – Set Theory
employee-name city
employee-name
team
Jeter
New York City
Glavine
Mets
Howard
Philadelphia
Howard
Phillies
Utley
Philadelphia
Bonds
Giants
Schilling
Boston
Schilling
Choke Sox
cities relation
teams relation
employee-name
city
employee-name
team
Null
Null
Glavine
Mets
Howard
Philadelphia
Howard
Phillies
Null
Null
Bonds
Giants
Schilling
Boston
Schilling
Choke Sox
Right Outer Join -> cities ROJ teams
Includes all records from the right and only those records on the left
that match
Database Systems – Set Theory
employee-name city
employee-name
team
Jeter
New York City
Glavine
Mets
Howard
Philadelphia
Howard
Phillies
Utley
Philadelphia
Bonds
Giants
Schilling
Boston
Schilling
Choke Sox
cities relation
teams relation
employee-name
city
employee-name
team
Null
Null
Glavine
Mets
Howard
Philadelphia
Howard
Phillies
Null
Null
Bonds
Giants
Schilling
Boston
Schilling
Choke Sox
Jeter
New York City
Null
Null
Utley
Philadelphia
Null
Null
Full Outer Join -> cities FOJ teams
Includes all records from the right and left, tuples that do not match
have nulls in their place.
Database Systems – Set Theory
NULLS
And:
true and unknown = unknown
false and unknown = false
unknown and unknown = unknown
Or
true or unknown = true
false or unknown = unknown
unknown or unknown = unknown
Not
not unknown = unknown
Database Systems – Set Theory
REFERENTIAL INTEGRITY
Superkey of R
A unique identifier for a tuple.
A set of attributes SK of R such that no two tuples in any valid relation
instance r(R) will have the same value for SK. That is, for any distinct
tuples t1 and t2 in r(R), t1[SK]  t2[SK].
Key of R
A "minimal" superkey; that is, a superkey K such that removal of any
attribute from K results in a set of attributes that is not a superkey, i.e. every
attribute of K is needed to maintain its superkey status.
Example: The CAR relation schema
CAR(State, Reg#, SerialNo, Make, Model, Year) has two keys
Key1 = {State, Reg#}
Key2 = {SerialNo}
Both are superkeys. {SerialNo, Make} is a superkey but not a key.
If a relation has several candidate keys, one is chosen arbitrarily to be the
primary key. The primary key attributes are underlined.
Database Systems – Set Theory
REFERNTIAL INTEGRITY
Relational Database Schema
A set S of relation schemas that belong to the same database. S is the name
of the database.
S = {R1, R2, ..., Rn}
Entity Integrity:
The primary key attributes PK of each relation schema R in S cannot have
null values in any tuple of r(R). This is because primary key values are used
to identify the individual tuples.
t[PK]  null for any tuple t in r(R)
Note, other attributes of R may be similarly constrained to disallow null
values, even though they are not members of the primary key.
Database Systems – Set Theory
Referential Integrity
A constraint involving two relations (the previous constraints involve a single
relation).
Used to specify a relationship among tuples in two relations: the referencing
relation and the referenced relation.
Tuples in the referencing relation R1 have attributes FK (called foreign key
attributes) that reference the primary key attributes PK of the referenced
relation R2.
A tuple t1 in R1 is said to reference a tuple t2 in R2 if t1[FK] = t2[PK].
A referential integrity constraint can be displayed in a relational database
schema as a directed arc from R1.FK to R2.