Transcript ppt - DUET
Data Models
How to structure data
1
What is a Data Model?
Having formed a model of the
enterprise, we now need to
represent the data.
The data model tells us the
structure of the database.
Historically, three data models:
Hierarchical data model
Network data model
Relational data model
2
Hierarchical and Network
Data Models
Hierarchical and network data
models have been superseded
by the relational data model.
Reasons:
Lack of expressive power
E.g., one cannot express many-tomany relationships in the
hierarchical model
More closely tied to the
underlying implementation.
Hence, less data independence.
Relational data model has a
clean mathematical basis.
3
The Relational Model
Due to Codd.
Everything is represented as a
relation in the mathematical
sense. Also called tables.
A database therefore is a
collection of tables, each of
which has a unique name, and
each of which is described by a
schema.
In addition, Codd defined a data
manipulation language.
4
Example of Schemas in
the Relational Model
Example of a representation of
entity sets:
Student(sid,name,addr)
Course(cid,title,eid)
Empl(eid, ename, deptid)
Dept(deptid, dname, loc)
Primary keys are underlined.
Recall that a primary key is one
that uniquely identifies an entity.
An entity is a row in a table.
5
More Example Schemas
Relationship sets between entity
sets are also represented in
tables.
Example of a table
corresponding to a relationship:
Enrol(sid, cid, grade)
Again, a relationship is
represented by a row (or a tuple)
in a relation.
6
Relational Databases:
Basic Concepts I
Attribute:
A column in a table
Domain
The set of values from which the
values of an attribute are drawn.
Null value
A special value, meaning “not
known” or “not applicable”.
Relation schema
A set of attribute names
7
Relational Databases:
Basic Concepts II
Tuple
A set of values, one for each
attribute in the relation scheme
over which the tuple is defined,
i.e. a mapping from attributes to
the appropriate domains
Relation instance
A set of tuples over the scheme
of the relation
8
Relational Databases:
Basic Concepts III
Relational Database
A set of relations, each with a
unique name
Normalized Relation
A relation in which every value is
atomic (non-decomposable).
Hence, every attribute in every
tuple has a single value.
9
Keys
Candidate Key
A minimal set of attributes that
uniquely identifies a tuple
Primary Key
The candidate key chosen as the
identifying key of the relation
Alternate Key
Candidate keys which are not
primary keys
10
Foreign Key
An attribute (or set of attributes)
in table R1 which also occurs as
the primary key of relation R2.
R2 is called the referenced
relation.
Foreign keys are also called
connection keys or reference
attributes.
11
Integrity Rules: Entity
Constraint
Entity constraint
All attributes in a primary key
must be non-null.
Motivation: If the primary key
uniquely identifies an entity in an
entity set, then we must ensure
that we have all the relevant
information
12
Integrity Rules:
Referential Integrity
Referential integrity
A database cannot contain a
tuple with a value for a foreign
key that does not match a
primary key value in the
referenced relation.
Or, a foreign key must refer to a
tuple that exists.
Motivation: If referential integrity
were violated, we could have
relationships between entities
that we do not have any
information about.
13
Data Manipulation
Languages
In order for a database to be
useful, it should be possible to
store and retrieve information
from it. This is the role of the
data manipulation language.
One of the attractions of the
relational data model is that it
comes with a well-defined data
manipulation language.
14
Types of DML
Two types of data manipulation
languages
Navigational (procedural)
The query specifies (to some
extent) the strategy used to find
the desired result e.g. relational
algebra.
Non-navigational(nonprocedural)
The query only specifies what data
is wanted, not how to find it e.g.
relational calculus.
15
Relational Algebra
Codd defined a number of
algebraic operations for the
relational model.
Unary operations take as input a
single table and produce as
output another table.
Binary operations take as input
two tables and produce as output
another table.
16
Unary Operations:
Select
Select produces a table that only
contains the tuples that satisfy a
particular condition, in other
words a “horizontal” subset.
Appearance:
sC(R)
where C is a selection condition
and R is the relation over which
the selection takes place
17
Example of Select
Student
sid
123
345
567
name
Fred
John
Ann
addr
3 Oxford
6 Hope Rd.
5 Garden
s sid > 300(Student) yields
345
John
6 Hope Rd.
567
Ann
5 Garden
18
Unary Operations:
Project
Project produces a table
consisting of only some of the
attributes. It creates a “vertical”
subset.
Note that a project eliminates
duplicates.
Appearance:
ПA(R)
where A is a set of attributes of R
and R is the relation over which
the project takes place.
19
Example of Project
Enrol
sid
123
234
345
cid
CS51T
CS52S
CS52S
grade
76
50
55
Пcid(Enrol) yields
CS51T
CS52S
20
Binary Operations
Two relations are (union)
compatible if they have the same
set of attributes.
Example, one table may
represent suppliers in one
country, while another table with
same schema represents
suppliers in another country.
For the union, intersection and
set-difference operations, the
relations must be compatible.
21
Union, Intersection, Setdifference
R1 R2
The union is the table comprised
of all tuples in R1 or R2.
R1 R2
The intersection is the table
comprised of all tuples in R1 and
R2
R1 - R2
The set-difference between R1
and R2 is the table consisting of
all tuples in R1 but not in R2.
22
Cartesian Product
R1 R2
The Cartesian product is the
table consisting of all tuples
formed by concatenating each
tuple in R1 with a tuple in R2, for
all tuples in R2.
23
Example of a Cartesian
Product
R1
R2
R1 R2
A
1
2
C
a
b
c
A
1
1
1
2
2
2
B
x
y
D
s
t
u
B
x
x
x
y
y
y
C
a
b
c
a
b
c
D
s
t
u
s
t
u
24
Natural Join
R1
R2
Assume R1 and R2 have
attributes A in common. Natural
join is formed by concatenating
all tuples from R1 and R2 with
same values for A, and dropping
the occurrences of A in R2
R1
R2 = ПA’(sC(R1 R2))
where C is the condition that the
values for R1 and R2 are the
same for all attributes in A and A’ is
all attributes in R1 and R2 apart
from the occurrences of A in R2.
hence, natural join is syntactic
sugar
25
Example of a Natural Join
I
Course
cid
CS51T
CS52S
CS52T
CS51S
title
DBMS
OS
Networking
ES
Instructor
eid
123
345
456
ename
Rao
Allen
Mansingh
eid
123
345
345
456
26
Example of a Natural Join
II
Course
cid
CS51T
CS52S
CS52T
CS51S
Instructor
title
DBMS
OS
Net...
ES
eid
123
345
345
456
ename
Rao
Allen
Allen
Mansingh
27
Division
R1 R2
Assume that the schema for R2
is a proper subset of the one for
R1.
We form the division by
Ordering the tuples in R1 so that
all the tuples with the same value
for the non-common attributes are
grouped together.
Each group contributes a tuple to
the result if the group’s values on
the common attributes form a
superset of the values of these
attributes in R2.
28
Example of Division I
Enrol
cid
CS51T
CS52S
CS51T
CS52S
CS51T
CS52S
Temp
sid
123
234
sid
123
123
234
234
345
345
grade
A
A
C
B
C
C
grade
A
B
29
Example of Division II
Enrol
Enrol Temp
cid
sid
CS51T 123
CS51T 234
CS51T 345
CS52S 123
CS52S 234
CS52S 345
grade
A
C
C
A
B
C
cid
CS52S
Thus, the division gives all courses for
which 123 got an A and 234 a B.
30
Assignment
Allows the expression to be
written in parts.
Assigns the part to a temporary
variable.
This variable can be used in
subsequent expressions.
E.g.
sid(stitle = ‘DBMS’ (Enrol
Could be re-written as:
Course)
r
Enrol
Course
sid(stitle = ‘DBMS’(r))
31
Rename Operation
Names the result of an
expression.
x(A1,A2,…,An) (E)
returns the result of expression E
under the name x with the
attributes renamed as
A1,A2,…,An.
E.g. S (Student)
Renames Student table to S.
32
Database Modification
Insert
r
r
E
e.g.
Course
Delete
r
e.g.
Course
{(‘CS51T’,’DBMS’)}
r - E
Student
Student - ssid=‘1’(Student)
Update
r
F ,F ,…,F (r)
1 2
n
e.g.
Enrol
sid,cid,grade
grade + 2 (Enrol)
33
Examples
Assume the following schema:
Student(sid,sname,saddr)
Course(cid,title,lid)
Enrol(sid, cid, grade)
Lecturer(lid,lname,deptname)
Query 1: Find the name of all
students that have taken the course
entitled ‘Expert Systems’.
Query 2: Find the titles of all
courses that student ‘Mark Smith’ has
done.
Query 3: Find the id of students that
have enrolled in all the courses that
lecturer with id. = ‘234’ has taught.
Query 4: Find the highest grade for
‘CS51T’.
34
Relational Calculus
A relational calculus expression
defines a new relation in terms of
other relations.
A tuple variable ranges over a
named relation. So, its values
are tuples from that relation.
Example:
Get grades for CS51T
e(Enrol)
{<e.grade>: e.cid = ‘CS51T’ }
35
Basic Syntax for Relational
Calculus Expressions
r(R),…,s(S)
{ <target> : predicate}
where
R,..,S are tables
r,..,s are tuple variables
target specifies the attributes of
the resulting relation
predicate is a formula giving a
condition that tuples must satisfy
to qualify for the resulting
relation.
36
The Predicate
Predicate is constructed from
attribute names
constants
comparison operators
logical connectives
quantified tuple variables
t(R), t(R)
37
Examples of Relational
Calculus
Example 2
Get names and grades for
students enrolled in CS51T
e(Enrol), s(Student)
{<s.name, e.grade>:
e.cid = ‘CS51T’
s.sid = e.sid}
In relation algebra
Пcid, name( s CID =‘ CS51T’(Grade
Student))
38
Example 3
Give the names of all students
who got at least one A.
s(Student)
{<s.name>:
e(Enrol)
(e.grade = ‘A’
s.sid = e.sid)}
Tuple variables not mentioned in
the target list must be bound in
the predicate.
39
Example 4
Get the names of all students
who only got A’s
s(Student)
{<s.name>:
e(Enrol)( s.sid = e.sid
e.grade = ‘A’)
e2(Enrol) (s.sid = e2.sid)}
40
Example 5
Get the names of all students
who got an A and a B
s(Student)
{<s.name>:
e(Enrol) (e.grade = ‘B’
s.sid = e.sid)
e2(Enrol) (e2.grade = ‘A’
s.sid = e2.sid)}
41
Example 6
Get the course titles and names
for the courses for which the
student did not get an A
c(Course), s(Student)
{<s.name, c.title>:
g(Enrol) s.sid = g.sid
g.cid = c.cid
g.grade ‘A’}
42