Transcript lecture 3_2

An introduction to
relational database RDB
Sept. 2005
1
What are behind the scene of
RDBMS?
• Functional components
–
–
–
–
–
–
SQL language parser and processor
Query engine
Query optimizer
Storage management
Transaction Management (concurrency, recovery)
…
2
Outline: Relational Data Model
•Data Model
•Relational Data Model
-
relation schema, relations
-
database schema, database state
-
integrity constraints and updating
•Relational algebra (Won’t be discussed in detail,
there will be a separate presentation on it)
Sept. 2005
-
select, project, join, cartesian product
-
set operations:
union, intersection, difference, division
3
What is a model?
• In general, a model is a theoretical
construct that represents complex
information and processes by a set of
variables and a set of logical and
quantitative relationships between them.
• A model supports a set of operands and a
set of functions (also called operators).
Sept. 2005
4
Other definition
• Models are simplified frameworks designed to represent
the all individual complex processes which can be fit into
the model.
• An abstract construct or idea that is a simplified view of
reality. It must enable you to make testable predictions of
what will happen under new circumstances.
• A stylized simplification of reality in which behavior is
represented by variables and assumptions about how
they are determined and interact. Models enable one to
think consistently and logically about complex issues, to
work out how changes in a modeled system matter, and
(sometimes) to response to user requirements.
Sept. 2005
5
• A DBMS allows a user to define the data
to be stored in terms of a data model,
which is a collection of high-level (abstract
level) data description constructs that hide
many low level storage details.
Sept. 2005
6
• A database model is a theory or specification
describing how a database is structured and
used. Several such models have been
suggested:
–
–
–
–
–
–
Hierarchical model
Network model
Relational model
Object model
Object-Relational model
…
Sept. 2005
7
• A data model is a model that is specifically
designed for data. It is a collection of
descriptions of data structures and their
contained fields, together with the
operations or functions that manipulate
them.
Sept. 2005
8
A data model should support
• Structure: defines how data are organized
(hierarchical, network, relational, objectoriented).
• Integrity: provides a language for the definition
of rules that restrict which instances of the
defined structure are allowed.
• Manipulation: provides a language in which
updates of the data can be expressed.
• Querying: provides a language in which the data
can be queried.
Sept. 2005
9
• A description of data in terms of a data
model is called a schema.
• Data themselves in terms of a data model
are called instances.
Sept. 2005
10
Three-schema architecture
External
view
Sept. 2005
External
view
A specific user or
groups view of the
database
logical
schema
Describes the
whole database
for all users
Internal
schema
Physical storage
structures and
details
11
Data independence
Ability to change the database at one level with no impact to
the next higher level
•physical data independence - the ability to change the
physical schema without affecting the conceptual schema
•typical example: add a new index
•logical data independence - the ability to change the
conceptual schema without affecting existing external views
or application programs
•typical example: add an attribute
Sept. 2005
12
Relational Data Model
• Most database management systems
today are based on the relational data
model, which is dominant in marketplace.
• Relational data model was first introduced
in 1970 by Ted Codd (IBM).
Sept. 2005
13
• The fundamental assumption of the relational
model is that all data are represented as
mathematical relations, i.e., a subset of the
Cartesian product of n sets.
• In the mathematical model, reasoning about
such data is done in two-valued predicate logic
meaning there are two possible evaluations for
each proposition: either true or false. Data are
operated upon by means of a relational calculus
and algebra.
• If NULL is considered, three values (true, false,
and NULLs).
Sept. 2005
14
How the Programmer Sees
the RDBMS
• Tables:
Students:
SSN
123-45-6789
234-56-7890
Takes:
Name
Charles
Dan
…
Category
undergrad
grad
…
SSN
123-45-6789
123-45-6789
234-56-7890
Courses:
CID
CSE444
CSE541
Name
Databases
Operating systems
CID
CSE444
CSE444
CSE142
…
Quarter
fall
winter
• Still implemented as files, but behind the scenes
can be quite complex
“data independence” = separate logical view
from physical implementation
15
A Relation is a Table
Attributes
(column
headers)
Tuples
(rows)
Sept. 2005
name
manf
Winterbrew
Pete’s
Bud Lite
Anheuser-Busch
Beers
16
Relation schema vs. relation
instances
• The main construct for representing data
in the relational model is a relation.
• A relation consists of a relation schema
and relation instances.
• Intuitively, the relation instance is a table,
and the relation schema describes the
column heads for the table.
Sept. 2005
17
Relational Model
• Each relation is a table with rows and
columns, which can be thought of as a set
of records, each of which has the same
number (and type) of fields.
• A relational database is a collection of one
or more relations (tables).
• The idea is very simple and elegant.
Sept. 2005
18
Schema
• In the relational model, the schema for a
relation specifies its name, the name of
each field (or attribute or column), and the
domain (type) of each field.
• Also constraints, as we will see later.
Sept. 2005
19
A relation schema R, denoted by R(A1, …, An), is made up
of a relation name R and a list of attributes A1, …, An.
A relation r(R) is a mathematical relation of degree n on the
domains dom(A1), dom(A2), … dom(An), which is a subset
of the Cartesian product of the domains that define R:
r(R)  (dom(A1)  (dom(A2)  …  (dom(An))
formal terms
relation
tuple
attribute
domain
Sept. 2005
informal
table
row
column header
data type describing column values
20
Cartesian product
Emp(SSN, name, sex)
1
2
3
J
D
m
f
1
J = {(1, J), (1, D), (2, J), (2, D), (3, J), (3, D)}
2  D
3
Sept. 2005
21
Cartesian product
1
J
m


2
D
f
3
= {(1, J, m), (1, D, m), (2, J, m), (2, D, m), (3, J, m), (3, D, m),
(1, J, f), (1, D, f), (2, J, f), (2, D, f), (3, J, f), (3, D, f)}
Emp(SSN, name, sex)
1
2
Sept. 2005
J
D
m
f
22
Domain
A domain is a set of atomic values from which values can be
drawn
•examples
- social security numbers: set of valid 9-digit social
insurance numbers
- names: set of names of persons
- grade point average: possible values of computed grade
point averages; each must be a real number between 0 and
4.5.
Sept. 2005
23
Domain
In many systems one specifies a data type (e.g. integer, date,
string(20), …) and writes supporting application code to
enforce any specific constraints (e.g. a SIN must be a 9-digit
number).
Domain = proper Data type + proper check constraints (you can
think so and we will discuss why)
Attribute
An attribute Ai is a name given to the role a domain plays in a
relation schema R.
Relation (or Relation State)
A relation, or relation state, r of the relation schema R(A1, A2,
… An) is a set of n-tuples r={t1, t2, … tm}, where each n-tuple is
Sept. an
2005ordered list of n values t =< v , v , … v > (i = 1, …, m). 24
i
1 2
n
Relation Schema example
EMPLOYEE(Name, SSN, HomePhone, Address, OfficePhone, …)
EMPLOYEE Relation example:
EMPLOYEE
Sept. 2005
Name
SSN
HomePhone
Address
Benjamin Bayer 305-61-2435 373-1616
2918 Bluebonnet Lane ...
Katherine Ashly 381-62-1245 375-4409
125 Kirby Road
...
Dick Davidson
3452 Elgin Road
...
422-11-2320 null
25
Some characteristics of relations
•Unordered attributes from schema point of view, but when
instances are considered, values order should match attribute
order.
•no ordering of tuples
•each value in a tuple is atomic
•no composite values
•separate relation tuples for multivalued attributes
•some attributes may be null
•no value
•value missing/unknown
•a relation is an assertion
•e.g. a employee entity has a Name, SSN, HomePhone, etc
Sept. 2005•each tuple is a fact or a particular instance
26
•some relations store facts about relationships
employee
works
project
Z
jones
A
B
smith
jones
z
jones
A
jones
C
smith
z
smith
B
smith
D
C
D
Sept. 2005
27
Relational Database
•a relational database schema S is a set of relation schemas
S = {R1, R2, ...} and a set of integrity constraints IC.
•A relational database state DB of S is a set of relation states
DB={r(R1), r(R2), ...}
Sept. 2005
28
EMPLOYEE
fname, minit, lname, ssn, bdate, address, sex, salary, superssn, dno
DEPARTMENT
Dname, dnumber, mgrssn, mgrstartdate
DEPT _LOCATIONS
Dnumber, dlocation
PROJECT
Pname, pnumber, plocation, dnum
WORKS ON
DEPENDENT
Essn pno, hours
A database
schema:
Essn, dependentname, sex, bdate, relationship
Sept. 2005
29
EMPLOYEE
fname, minit, lname, ssn, bdate, address, sex, salary, superssn, dno
John
B
Smith
123489 1965-01-09 731 Fondren
M
40000
343488
5
Franklin
T
Wong
239979 1955-01-10
M
50000
343488
5
638 Voss
DEPARTMENT
r(EMPLOYEE)
Dname, dnumber, mgrssn, mgrstartdate
Research
5
343488
1988-05-22
DEPT _LOCATIONS
Dnumber, dlocation
5
Houston
6
Stafford
r(DEPARTMENT)
A database
state:
r(DEPT_LOCATION)
Sept. 2005
30
Integrity Constraints
•any database will have some number of constraints that must
be applied to ensure correct data (valid states)
1. domain constraints
•a domain is a restriction on the set of valid values
•domain constraints specify that the value of each
attribute A must be an atomic value from the domain
dom(A).
2. key constraints
•a superkey is any combination of attributes that
uniquely identify a tuple: t1[superkey]  t2[superkey].
- Example: <Name, SSN> (in Employee)
•a key is superkey that has a minimal set of attributes
- Example: <SSN> (in Employee)
Sept. 2005
31
Integrity Constraints
•If a relation schema has more than one key, each of them is
called a candidate key.
•one candidate key is chosen as the primary key (PK)
•foreign key (FK) is defined as follows:
i) Consider two relation schemas R1 and R2;
ii) The attributes in FK in R1 have the same domain(s) as the
primary key attributes PK in R2; the attributes FK are said to
reference or refer to the relation R2;
iii) A value of FK in a tuple t1 of the current state r(R1) either
occurs as a value of PK for some tuple t2 in the current state
r(R2) or is null. In the former case, we have t1[FK] = t2[PK],
and we say that the tuple t1 references or refers to the tuple t2.
Example:
FK
Employee(SSN, …, Dno)
Sept. 2005
Dept(Dno, … )
32
Integrity Constraints
3. entity integrity
•no part of a PK can be null
4. referential integrity
•domain of FK must be same as domain of PK
•FK must be null or have a value that appears as a PK
value
5. semantic integrity
•other rules that the application domain requires:
•state constraint: gross salary > net income
•transition constraint: Widowed can only follow
Married; salary of an employee cannot decrease
Sept. 2005
33
More about keys
• Surrogate key
• Composite key
Sept. 2005
34
Summarizing different Constraints
• Rules that restrict the data values that you can enter into a field
in a database table
• Integrity constraints: define primary and foreign keys
• Value constraints: define specific data values or data ranges
that must be inserted into columns and whether values must be
unique or not NULL
• Table constraint: restricts the data value with respect to all
other values in the table
• Field constraint: limits the value that can be placed in a specific
field, irrespective of values that exist in other table records
Sept. 2005
35
EMPLOYEE
fname, minit, lname, ssn, bdate, address, sex, salary, superssn, dno
DEPARTMENT
Dname, dnumber, mgrssn, mgrstartdate
Dnumber, dlocation
DEPT _LOCATIONS
PROJECT
Pname, pnumber, plocation, dnum
Essn, pno, hours
WORKS_ON
DEPENDENT
reference integrity
Essn, dependentname, sex, bdate, relationship
Sept. 2005
36
Updating and constraints
insert
•Insert the following tuple into EMPLOYEE:
<‘Cecilia’, ‘F’, ‘Kolonsky’, ‘677678989’, ‘1960-04-05’, ‘6357 Windy
Lane, Katy, TX’, F, 40000, null, 4>
•When inserting, the integrity constraints should be checked:
domain, key, entity, referential, semantic integrity
update
•Update the SALARY of the EMPLOYEE tuple with ssn =
‘999887777’ to 30000.
•When updating, the integrity constraints should be checked:
domain, key, entity, referential, semantic integrity
Sept. 2005
37
Updating and constraints
delete
•Delete the WORK_ON tuple with Essn = ‘999887777’ and
pno = 10.
•When deleting, the referential constraint will be checked.
- The following deletion is not acceptable:
Delete the EMPLOYEE tuple with ssn = ‘999887777’
- reject, cascade, modify
Sept. 2005
38
cascade – a strategy to enforce referential integrity
Employee
ssn
...
123456789
...
Works-on
Essn
123456789
...
Sept. 2005
delete
Pno
5
...
delete
39
cascade – a strategy to enforce referential integrity
Employee
ssn
... ...
supervisor
234589710
123456789
... ...
234589710
null
delete
Employee
ssn
123456789
... ...
supervisor
234589710
... ...
234589710
Sept. 2005
not reasonable
delete
null
delete
40
Modify – a strategy to enforce referential integrity
Employee
ssn
...
123456789
...
Works-on
Essn
123456789
...
delete
Pno
5
...
Works-on
Essn
null
...
Pno
5
...
This violates the entity constraint.
Sept. 2005
41
Modify – a strategy to enforce referential integrity
Employee
ssn
...
123456789
...
Department
...
Dno
5
...
delete
chairman
123456789
Department
...
Dno
5
...
chairman
null
This does not violate the entity constraint.
Sept. 2005
42
Schema: a description of a database -- meta data
•the intension of the database
Schema evolution: if the database definition changes, we
say it evolves.
Database State: the data in the database at a particular point
in time
•the extension of the schema
Sept. 2005
43
Sample database
Course CName CNo CrHrs Dept
Database 8803 3
CS
C
2606 3
CS
Student Name StNo Class Major
Smith 17
1
CS
Brown
8
2
CS
Grades StNo SId Grade
17 25 A
17 43 B
Section SId CNo Semester Yr
Instructor
32 8803 Spring 2000 Smith
25 8803 Winter 2000 Smith
43
Sept. 2005
2606
Spring 2000 Jones
44
University database schema:
Student
Course
Cname
CNo
CrHrs
Dept.
string(20)
string(20)
integer
String(20)
Sept. 2005
string(20)
integer
integer
String(20)
Section
grades
StNo
SId
Grade
Name
StNo
Class
Major
integer
integer
String(20)
SID
CNo
Semester
Yr
Instructor
integer
string(20)
integer
integer
string(20)
45
Schema evolution:
Student
Course
Cname
CNo
CrHrs
Dept.
string(20)
string(20)
integer
String(20)
Name
StNo
Class
Major
string(20)
integer
integer
String(20)
Prerequisite
… ...
Sept. 2005
CNo
Pre-CNo
string(20)
string(20)
46
Database evolution:
Student
Course
Cname
Cno
CrHrs
Dept.
string(20)
string(20)
integer
String(20)
Name
StNo
Class
Major
Age
Sex
string(20)
integer
integer
String(20)
integer
string(20)
… ...
Sept. 2005
47
Sample database
Course CName CNo CrHrs Dept
Database 8803 3
CS
C
2606 3
CS
Student Name StNo Class Major
Smith 17
1
CS
Brown
8
2
CS
Grades StNo Sid Grade
17 25 A
17 43 B
Section SId CNo Semester Yr
Instructor
32 8803 Spring 2000 Smith
25 8803 Winter 2000 Smith
43
Sept. 2005
2606
Spring 2000 Jones
48
Database state changed:
Course CName CNo CrHrs Dept
Database 8803 3
CS
C
2606 3
CS
Student Name StNo Class Major
Smith 17
1
CS
Brown
8
2
CS
Grades StNo Sid Grade
17 25 A
17 43 B
Section SId CNo Semester Yr
Instructor
32 8803 Spring 2000 Smith
25 8803 Winter 2000 Smith
Sept. 2005
49
We define the database
Definition
Construction
Manipulation
database state is the empty state with no
data
initial state when database is populated
(loaded)
current state changes with each insert,
delete, update
hopefully, the database goes from one
correct/valid state to another
Sept. 2005
50
Mathematical Foundations
• Relational Model is based on Strong
Mathematical Foundations. The
Relational model is based on
• (1) Set Theory,
• (2) Mathematical Relations,
• (3) Modern Algebra,
• (4) Logic, and
• (5) Lattice Theory.
Sept. 2005
51
Relational Algebra
a set of relations
relation specific
a set of operations
set operations
Sept. 2005
select
project
join
division
union
intersection
difference
cartesian product
52
Relational algebra
select
•horizontal subset
project
•vertical subset
join (equijoin, natural join, inner, outer)
•combine multiple relations
cartesian product
union, intersection, difference
division
Sept. 2005
53
Relational algebra
select
•horizontal subset
project
•vertical subset
join (equijoin, natural join, inner, outer)
•combine multiple relations
cartesian product
union, intersection, difference
Division (won’t be tested)
Sept. 2005
54
Relational algebra - Select
•horizontal subset
•symbol: 
•boolean condition for row filter
•e.g. employees earning more than 30,000
•salary>30000(Employee)
fname
minit … salary ...
Franklin T
… 40000
...
Jennifer S
… 43000
...
James
… 55000
...
Sept. 2005
E
55
Relational algebra - Project
•vertical subset
fname
minit lname
•symbol: 
John
B
Sarah
•e.g. names of employees
Franklin
T
Wong
• fname, minit, lname(Employee)
Alicia
J
Zalaya
Jennifer
S
Wallace
Ramesh
K
Narayan
Joyce
A
English
Ahmad
V
Jabbar
James
E
Borg
Sept. 2005
56
Relational algebra - Join
•join or combine tuples from two relations into single tuples
•symbol:
•boolean condition specifies the join condition
•e.g. to report on employees and their dependents
•Employee
fname
Sept. 2005
ssn=essn
minit … essn
Dependent
dependent_name …
57
Relational algebra - Join
•Employee
ssn=essn
Dependent
Essn
fname
minit
…
ssn
Franklin
T
… 333445555
Jennifer
S
… 987654321
John
B
… 123456789
dependent_name ...
333445555 Alice
333445555 Theodore
333445555 Joy
987654321 Abner
123456789 Michael
123456789 Alice
123456789 Elizabeth
Sept. 2005
58
Employee ssn=essn Dependent
fname
minit
Franklin
T
Franklin
T
…
333445555
Theodore
Franklin
T
…
333445555
Joy
Jennifer
S
…
987654321
Abner
John
B
…
123456789
Michael
John
B
…
123456789
Alice
John
B
…
123456789
Elizabeth
Sept. 2005
ssn
essn
333445555 333445555
dependent_name ...
Alice
59
Relational algebra - Join
•what is the result of
•Employee
Dependent ?
•Note there is no join condition
•List all attributes from two tables
fname
Sept. 2005
minit
…
essn
dependent_name ...
60
Sample data
Emplname
A
ssn
Brith-date grade
seniority salary
121
1948/2/12 2
7
80k
A
132
1952/3/24 2
5
60k
B
C
101
106
1972/1/2
1987/3/3
6
7
70k
80k
Sept. 2005
4
3
61
Sample queries
Find out names of all employees born before 1960/11/1
Emplname
A
ssn
Brith-date grade
seniority salary
121
1948/2/12 2
7
80k
A
132
1952/3/24 2
5
60k
B
C
101
106
1972/1/2
1987/3/3
6
7
70k
80k
Sept. 2005
4
3
62
• More detailed discussion of relational
algebra will be presented in a separate
lecture!
Sept. 2005
63
Transactions
• A transaction = sequence of statements
that either all succeed, or all fail
• Transactions have the ACID properties:
A = atomicity
C = consistency
I = isolation
D = durability
64
Transactions
• Enroll “Mary Johnson” in “CSCI242”:
BEGIN TRANSACTION;
INSERT INTO Takes
SELECT Students.SSN, Courses.CID
FROM Students, Courses
WHERE Students.name = ‘Mary Johnson’ and
Courses.name = ‘CSCI242’
-- More updates here....
IF everything-went-OK
THEN COMMIT;
ELSE ROLLBACK
If system crashes, the transaction is still either committed or aborted
65