Introduction to Database System

Download Report

Transcript Introduction to Database System

The Relational Model
What DB models exist?

Most widely current model: Relational Model
 Vendors: IBM DB2, Microsoft, Oracle, Sybase, etc…

“Legacy systems” have older models
 E.g., IBM’s IMS (hierarchical), CODASYL network
model

Recent and future competitors:
 object-oriented model: ObjectStore, Versant
 Object-relational model: Oracle, DB2
 Semi-structured: XML


Integrated in all major relational database systems
Native XML database systems
421B: Database Systems - The Relational Model
2
Definitions


Relational Database: a set of relations
Relation: Consists of two parts:
 Schema: specifies name of relation, plus a set of attributes,
plus the domain/type of each attribute

E.g., Students(sid:string, name:string, login:string,
faculty:string, gpa:real)
 Instance: set of tuples (all tuples are distinct).

A relation can be seen as a table:
 Column headers = attribute names, rows = tuples/records,
columns/fields = attribute values




#Rows = cardinality
#Fields = degree / arity
If clear from context, we say instead of “instance of a
relation” simply “relation”
Database Schema: collection of relation schemas
421B: Database Systems - The Relational Model
3
Example of Students Relation
sid
53666
53688
53650
name
Bartoli
Chang
Chang
login
faculty gpa
bartoli@cs Science 3.4
chang@eecs Eng
3.2
chang@math Science 3.8
Column headers
= attributes
Cardinality = 3
Degree = 5
...



All rows are distinct (set-oriented)
Rows are not ordered (a permutation of rows
represents still the same table)
Columns are per definition not ordered but in practice
we often assume a fixed order
 with this, a single tuple can be represented as
(53666, Bartoli, bartoli@cs, Science, 3.4)
421B: Database Systems - The Relational Model
4
Relational DDL and DML


Data Definition Language (DDL): defines the schema of a
database
Data Manipulation Language (DML): “manipulates” the
data, i.e., the instances of the relations
 Insert, update, delete tuples
 “Query” the relations: retrieve tuples that fulfill certain criteria
(hence, often called “query language”)

The Relational Model offers simple and powerful querying
of data with precise semantics independent of how data
is stored or whether changes in the physical structure
are made (physical data independence)
421B: Database Systems - The Relational Model
5
The SQL Query Language
Developed by IBM (system R) in the 1970s
 Need for a standard since it is used by many
vendors
 Standards:

 SQL-86
…
 SQL-99 / SQL3 (adds object-relational features)
 SQL:2003 (adds XML features)
421B: Database Systems - The Relational Model
6
SQL Data Types



All attributes must have a data type.
SQL supports several basic data types
Character and string types
 CHAR(n) denotes a character string of fixed length (containing
trailing blanks for padding if necessary).
 VARCHAR(n) denotes a string of up to n characters (between 0
and n characters).
 SQL permits reasonable coercion between values of characterstring types

Integer Types
 INT or INTEGER (names are synonyms)
 SHORTINT
421B: Database Systems - The Relational Model
7
Data Types (contd.)

Floating point numbers
 FLOAT or REAL (names are synonyms)
 DOUBLE PRECISION
 DECIMAL(n,d): real number with fixed decimal point. Value consists
of n digits, with the decimal point d positions from the right.

Dates and time:
 DATE: has the form ‘YYYY-MM-DD’
 TIME: has the form ‘15:00:02’ or ‘15:00:02.5’
 May be compared and converted to string types


Bit strings
User defined domains
 New name for a data type
 Possibility to define restrictions on values of domain (< 10)
421B: Database Systems - The Relational Model
8
Data Definition: Table Creation






Defines all attributes of the
relation
The type/domain of each
attribute is specified
DBMS enforce correct type
whenever a tuple is added or
modified
SQL is case insensitive
It is possible to define default
values
Special NULL value: ‘unknown’
421B: Database Systems - The Relational Model
CREATE TABLE Students
(sid CHAR(20),
name VARCHAR2(20),
login CHAR(10),
faculty VARCHAR(20),
gpa REAL
DEFAULT 0.0)
CREATE TABLE Enrolled
(sid CHAR(20),
cid CHAR(20),
grade CHAR(2))
9
Data Definition: Destroying and
Altering Relations
DROP TABLE Students

Destroys the relation Students. The schema
information and the tuples are deleted.
ALTER TABLE Students
ADD COLUMN firstYear:integer

The schema of students is altered by adding a
new field; every existing tuple in the current
instance is extended with a null value in the
new field.
421B: Database Systems - The Relational Model
10
Data Manipulation: insert, delete,
update

Insert a single tuple using:
INSERT INTO Students (sid,name,login,faculty,gpa)
VALUES(53688,’Chang’,’cheng@eecs’,’Eng’,3.2)
INSERT INTO Students (sid,name,login,faculty)
VALUES(53688,’Chang’,NULL,’Eng’)

Can delete all tuples satisfying some condition

Can update all tuples satisfying some condition
DELETE
FROM Students
WHERE name = ‘Chang’
UPDATE Students
SET gpa = 3.4
WHERE sid = 53688
421B: Database Systems - The Relational Model
11
Querying the Data

Find the names and gpa of all students with gpa less
than 3.5
SELECT name, gpa
FROM Students
WHERE gpa < 3.5
sid
53666
53688
53650
name
Bartoli
Chang
Chang
login
faculty
gpa
bartoli@cs Science 3.4
chang@eecs Eng
3.2
chang@math Science 3.8
421B: Database Systems - The Relational Model
name
Bartoli
Chang
gpa
3.4
3.2
12
Integrity Constraints (ICs)

Integrity Constraints must be true for any
instance of the database;
 e.g., domain constraints
 ICs are specified when schema is defined
 ICs are checked when relations are modified

A legal instance of a relation is one that
satisfies all specified ICs.
 DBMS should not allow illegal instances
 If DBMS checks ICs, stored data is more faithful to
real-world meaning (also checks for entry errors)
 Of course, DBMS can only check what is specified in
the schema
421B: Database Systems - The Relational Model
13
Primary Key Constraints

A set of fields is a key for a relation if
 No two distinct tuples can have same values in all key fields, and
 This is not true for any subset of the key.
If there are two or more keys, one of the candidates is
chosen to be the primary key.
 The primary key attributes of a tuple may not be NULL.
 A set of fields that contains a subset of fields fulfilling
the key constraint is called a superkey
 E.g. sid is a key for Students. (What about name?). The
set (sid,gpa) is a superkey

421B: Database Systems - The Relational Model
14
Primary and Candidate Keys in SQL

Possibly many candidate keys
exist, one of which is chosen as
the primary key
 Each student has a unique id.
 For a given student and course,
CREATE TABLE Students
(sid CHAR(20) PRIMARY KEY,
name VARCHAR2(20),
…
there is a single grade; further, no
two students in a course receive
CREATE TABLE Enrolled
the same grade


Application dependent
Defined carelessly, an IC can
prevent the storage of database
instances that arise in practice!
421B: Database Systems - The Relational Model
(sid CHAR(20),
cid CHAR(20),
grade CHAR(2),
PRIMARY KEY (sid,cid),
UNIQUE (cid,grade))
15
Foreign Key

Foreign Key: Set of fields in one relation that is used to
“refer” to a tuple in another relation.
 Must correspond to the primary key of the second relation.
 Represents a “logical pointer”.

Examples
 in relation Enrolled, sid is a foreign key referring to Students:
Students(sid:CHAR(20),name:VARCHAR(20),login:CHAR(10),
faculty:VARCHAR(20), gpa:REAL)
Enrolled(sid:CHAR(20),cid:CAHR(20),grade:CHAR(2))
421B: Database Systems - The Relational Model
16
Referential Integrity

Foreign Key Constraint: the foreign key value
of a tuple must represent an existing tuple in
the referred relation
 Enrollment may only contain a tuple of a student who
exists in the Students relation

If all foreign key constraints are enforced,
referential integrity is achieved, i.e., no
dangling references
421B: Database Systems - The Relational Model
17
Foreign Keys in SQL

Only students listed in
the Students relation
should be allowed to
enroll for courses
sid
53666
53688
53650
name
Bartoli
Chang
Chang
login
CREATE TABLE Enrolled
(sid CHAR(20),
cid CHAR(20),
grade CHAR(2),
PRIMARY KEY (sid,cid)
FOREIGN KEY (sid)
REFERENCES Students)
faculty
gpa
bartoli@cs Science 3.4
chang@eecs Eng
3.2
chang@math Science 3.8
421B: Database Systems - The Relational Model
sid
cid
grade
53666 Topology112 C
53666 Reggae203
B
53650 Topology112 A
53668 History105
B
18
Enforcing Referential Integrity

An Enrolled tuple with a sid is inserted such that no
tuple with this sid exists in Students
 Disallow insertion

A Students tuple is deleted
 Delete all Enrolled tuples that refer to it
 Disallow the deletion of a Students tuple to which Enrolled
tuples point
 Set sid in Enrolled tuples that refer to it to “default sid”
 (in SQL set sid in Enrolled tuples that refer to it to NULL value)

The primary key of a Students tuple is changed
 Update the sid of all Enrolled tuples that refer to the original
value
 Further options similar to delete
421B: Database Systems - The Relational Model
19
Referential Integrity in SQL/92

SQL standard supports all CREATE TABLE Enrolled
4 options on deletes and
(sid CHAR(20),
updates
cid CHAR(20),
 Default is NO ACTION

delete/update is rejected
 CASCADE

also delete/update all
tuples that refer to the
deleted/updated tuple
 SET NULL / SET DEFAULT

grade CHAR(2),
PRIMARY KEY (sid,cid)
FOREIGN KEY (sid)
REFERENCES Students
ON DELETE CASCADE
ON UPDATE SET NULL)
Set foreign key value of
referencing tuple to NULL
/ given default
421B: Database Systems - The Relational Model
20
Where do ICs come from?
ICs are based on the semantics of the real-world
enterprise that is being described in the database
relations
 We can check a database instance to see if an IC is
violated, but we can NEVER infer that an IC is true by
looking at an instance

 An IC is a statement about all possible instances
 For example, we might think that the student name is a key
because in a given instance there do not exist two tuples with
the same name; but it might happen in the future.

Key and foreign key ICs are the most common; more
general ICs supported, too. (Discussed later in the
course)
421B: Database Systems - The Relational Model
21
Logical Design: ER to Relational

Entity sets to tables
name
eid
salary

CREATE TABLE Employees
(eid CHAR(11),
name VARCHAR(20),
salary REAL,
PRIMARY KEY (eid))
Employees

Departments
budget
did
Employees(eid, name, salary)
Departments(did, dname, budget)
CREATE TABLE Departments
(did INTEGER,
dname CHAR(20),
budget REAL,
PRIMARY KEY (did))
dname
421B: Database Systems - The Relational Model
22
Many-to-many Relationship Sets

Map relationship set to table. Attributes of the table
must include
 Keys for each participating entity set (as foreign keys)
 This set of attributes forms the key for the relation
 All descriptive attributes

eid
name
salary
Employees
since
Works_in
Departments
did dname budget
421B: Database Systems - The Relational Model
Works_in(eid, did, since)
CREATE TABLE Works_In
(eid CHAR(11),
did INTEGER,
since DATE,
PRIMARY KEY (eid,did),
FOREIGN KEY (eid)
REFERENCES Employees,
FOREIGN KEY (did)
REFERENCES Departments)
23
Relationships Sets with Key Constraints

Alternative 1: map relationship set to table
 Many-one from entity set E1 to entity set E2: key of E1
 i.e., key of entity-set with the key constraint is the key for the new
relationship table (did is now the key)
 One-one: key of either entity set
 Separate tables for entity sets (Employees and Departments)
eid
name
salary
Employees
Manages
since
Departments
did
dname
budget
421B: Database Systems - The Relational Model

Manages(eid, did, since)
CREATE TABLE Manages
(eid CHAR(11),
did INTEGER,
since DATE,
PRIMARY KEY (did),
FOREIGN KEY (eid)
REFERENCES Employees,
FOREIGN KEY (did)
REFERENCES Departments)
24
Relationships Sets with Key Constraints
(contd.)

Alternative II: include relationship set in table of the
entity set with the key constraint
 Possible because there is at most one relationship per entity
 Not useful if many entities do not have a relationship (wasted
space, many not filled values)

eid
name
salary
Employees
Manages
since
Departments
did
dname
budget
421B: Database Systems - The Relational Model
DepartmentsM(did, dname,
budget, eidmgr, since)
CREATE TABLE DepartmentsM
(did INTEGER,
dname CHAR(20),
budget REAL,
eidmgr CHAR(11),
since DATE,
PRIMARY KEY (did),
FOREIGN KEY (eidmgr)
REFERENCES EMPLOYEES(eid))
25
Relationship Sets with Participation
Constraints and Key Constraints

Include relationship set in table of the entity set with
the key constraint and the participation constraint
 We can capture participation constraints involving one entity set in a
binary relationship if it also has a key constraint, but little else (at
least within the table definitions)

eid
name
salary
Employees
Manages
since
Departments
did
dname
budget
421B: Database Systems - The Relational Model
DepartmentsM(did, dname,
budget, eidmgr, since)
CREATE TABLE DepartmentsM
(did INTEGER,
dname CHAR(20),
budget REAL,
eidmgr CHAR(11) NOT NULL,
since DATE,
PRIMARY KEY (did),
FOREIGN KEY (eidmgr)
REFERENCES EMPLOYEES(eid))
26
Renaming

In the case the keys of the participating entity sets
have the same names we must rename attributes
accordingly

name
eid
Reports_To(super-eid, sub-eid)
salary
CREATE TABLE Reports_To
Employees
(supervisor_eid CHAR(11),
subordinate_eid CHAR(11),
supervisor
subordinate
PRIMARY KEY (supervisor_eid,
subordinate_eid),
Reports_To
FOREIGN KEY (supervisor_eid)
REFERENCES Employees(eid),
FOREIGN KEY (subordinate_eid)
REFERENCES Employees(eid))
421B: Database Systems - The Relational Model
27
Translating Weak Entity Sets

Weak entity set and identifying relationship set are
translated into a single table
 When the owner entity is deleted, all owned weak entities must
also be deleted

eid
name
salary
Employees
Policy
cost
Dependants
pname
dateofbirth
421B: Database Systems - The Relational Model
Dependants_Policy(pname, dob,
cost, eid)
CREATE TABLE Dependants_Policy
(pname CHAR(20),
dob DATE,
cost REAL,
eid CHAR(11),
PRIMARY KEY (pname,eid),
FOREIGN KEY (eid)
REFERENCES Employees,
ON DELETE CASCADE)
28
Translating ISA Hierarchies

General Approach: distribute information among relations
 Relation of superclass stores the general attributes and defines key
 Relations of subclasses have key of superclass and addit. attributes
 when a tuple in super-relation is deleted, corresponding tuples in
sub-relation must be deleted


Employees(eid, name, salary)
Contract_Emps(contract_id,eid)
CREATE TABLE Employees
(eid CHAR(20),
name salary
eid
name CHAR(20),salary REAL,
PRIMARY KEY (ssn))
Employees
CREATE TABLE Contract_Emps
hours_worked
(contract_id INTEGER,
Contract_id
hourly_wages
eid CHAR(11),
ISA
PRIMARY KEY (eid),
Hourly_Emps
Contract_Emps FOREIGN KEY (eid)
REFERENCES Employees,
ON DELETE CASCADE)
421B: Database Systems - The Relational Model
29
Translating ISA Hierarchies (contd.)

Object-oriented approach:

 Sub-classes have all attributes;
 if an entity is in a sub-class it does not
appear in the super-class relation;

Pro/Contra:

Employees(eid, name, salary)
Hourly_Emps(eid,name,salary,
hourly_wages,hours_worked)
Contract_Emps(eid, name,
salary,contract_id)
+ A query asking for all hourly employees
only has to go to one relation (in
general approach it has to read two
CREATE TABLE Employees
relations)
(eid CHAR(20) PRIMARY KEY,
- Query on general attributes of all
name CHAR(20),salary REAL)
employees has to read all three tables
- If an entity is both Hourly_emps and CREATE TABLE Hourly_Emps
Contract_emps, name and salary are
(eid CHAR(20) PRIMARY KEY,
stored twice => undesired redundancy
name CHAR(20),salary REAL,
hourly_wages REAL,
hours_worked REAL)
CREATE TABLE Contract_Emps
(eid CHAR(20) PRIMARY KEY,
name CHAR(20),salary REAL,
contract_id INTEGER)
421B: Database Systems - The Relational Model
30
Translating ISA Hierarchies (contd.)

Last Alternative: one big
relation
 Create only one relation
for the root entity set
with all attributes found
anywhere in its network of
subclasses.
 Put NULL in attributes not
relevant to a given entity
421B: Database Systems - The Relational Model

Employees(eid,name,salary,
hourly_wages,hours_worked,
contract_id)
CREATE TABLE Employees
(eid CHAR(20),
name CHAR(20),
salary REAL,
hourly_wages REAL,
hourly_worked REAL,
contract_id INTEGER,
PRIMARY KEY (eid))
31
Translating Aggregation

No key constraints







Projects(pid,started_on,pbudget)
Departments(did,dname,budget)
Employees(eid,name,salary)
Sponsors(pid,did,since)
Monitors(pid,did,eid,until)
Key constraint from
Projects to Departments
 Projects(pid, started_on,
pbudget, did, since)
 No Sponsors
 Monitors(pid, eid, until)
Key constraint from Sponsorsname
to Employees
 Sponsors(pid,did,eid,since,until)
 No Monitors
Started_on
pid
pbudget
Projects
421B: Database Systems - The Relational Model
eid
Employees
salary
Monitors until
since
Sponsors
did dname budget
Departments
32
Relational Model: Summary
A tabular representation of data
 Simple and intuitive, currently the most widely
used model.
 Integrity constraints can be specified based on
application semantics (up to a certain degree).
DBMS checks for violations

 Two important ICs : primary and foreign keys
 In addition, we always have domain constraints
Powerful and natural query languages exist
 Rules to translate ER to relational model

421B: Database Systems - The Relational Model
33
Review: Binary vs. Ternary

Putting any constraints in ssn
upper picture:
 Key constraint on Policies
name
Employees
(to guarantee that each
policy only owned by one
employee), would also mean
that the policy can only
cover one dependent

Constraints of lower
picture:
 Each dependant
determined by one policy
 Each policy is owned by
one employee
covers
pid
Policies
ssn
name
Dependants
cost
pname dob
lot
Employees
Dependants
beneficiary
purchaser
pid
421B: Database Systems - The Relational Model
pname dob
lot
Policies
cost
34
Binary vs. Ternary



Key constraints allow us to
combine Purchaser with
Policies and Beneficiary with
Dependants
Participation constraints
lead to NOT NULL
constraints (or primary key
in case of weak entity)
What if policy is a weak
entity?
421B: Database Systems - The Relational Model


Policies(pid, cost, eid)
Dependants(pname, dob, pid)
CREATE TABLE Policies
(pid INTEGER,
cost REAL,
eid CHAR(11) NOT NULL,
PRIMARY KEY (pid),
FOREIGN KEY (eid)
REFERENCES Employees
ON DELETE CASCADE)
CREATE TABLE Dependants
(pname CHAR(20),
dob DATE,
pid INTEGER,
PRIMARY KEY (pname,pid),
FOREIGN KEY (pid)
REFERENCES Policies,
ON DELETE CASCADE)
35