(b) from - Rutgers ECE
Download
Report
Transcript (b) from - Rutgers ECE
ECE 569
Database System Engineering
Spring 2004
Yanyong Zhang www.ece.rutgers.edu/~yyzhang
Course URL www.ece.rutgers.edu/~yyzhang/spring04
ECE569 Lecture 02.1
Spring 2004
Today’s topic
Relational data model
Different data models exist: relational, network,
hierarchical, object-oriented
Powerful, simple, declarative
Relational algebra
ECE569 Lecture 02.2
Spring 2004
Overview of Database Design
Conceptual design (ER model is used at this stage)
What are the entities and relationships in the enterprise?
What information about these entities and relationships
should we store in the database?
What are the integrity constraints or business rules that
hold
A database ‘schema’ in the ER model can be represented
pictorially
Can map an ER diagram into a relational schema
ECE569 Lecture 02.3
Spring 2004
Entity-Relationship model
To describe the conceptual scheme.
An entity is a thing that exists and is distinguishable.
Entity sets consist of a group of all “similar” entities.
Each entity has certain attributes.
A relationship among entity sets is an ordered list of entity
sets.
Relationship set: collection of similar relationships
An n-ary relationship set R relates n entity sets E1, …, En; each
relationship in R involves entities e1 E1, …, en En
- Same entity set could participate in different relationship sets, or in
different “roles” in same set
ECE569 Lecture 02.4
Spring 2004
Example of entity-relationship model
Date
Balance
Account
Made To
Billed
Patient
From
Address
ECE569 Lecture 02.5
Disease
Vital Sign
Time
Blood
Diagnosed
Treatment
Amount
Pulse
Name
Room#
Payment
Name
Spring 2004
Data Model
A data model is a mathematical formalism with two
parts:
A notation for describing data
A set of operations used to manipulate that data
ECE569 Lecture 02.6
Spring 2004
Why Study the Relational Model?
Most widely used model
“legacy systems” in older models
Vendors: IBM, informix, Microsoft,Oracle,Sybase,etc.
e.g., IBM’s IMS
Recent competitor: object-oriented model
ObjectStore, Versant, Ontos
A synthesis emerging: object-relational model
- Informix Universal server, UniSQL, O2, Oracle, DB2
ECE569 Lecture 02.7
Spring 2004
Relational Database: Definitions
Relational database: a set of relations
Relation: made up of 2 parts
Instance: a table, with rows and columns
- # Rows = cardinality, # columns = degree
Schema: specifies name of relation, plus name and type
of each column
- e.g., students(sid:string, name:string,
login:string,age:integer, gpa:real)
Can think of a relation as a set of rows or tuples (i.e., all
rows are distinct)
ECE569 Lecture 02.8
Spring 2004
Set-theoretic notion of a relation
A domain D is a set of values
colors = {red, blue, green}
age = set of positive integers less than 20
last_name = {a-zA-Z}+
A relation over domains D1,D2,…,Dn is a subset of
D1D2D3…Dn.
Example
R(colors, age) {(red,1), (blue,1),(green,1),
(red,2), (blue,2),(green,2), …}
ECE569 Lecture 02.9
Spring 2004
Set-theoretic notion of a relation (cont)
We are only interested in finite relations.
The members of a relation are called tuples.
Each relation that is a subset of the product of k
domains is k-degree.
A relation can be represented as a table, where
each row is a tuple and each column corresponds
to one domain.
PERSON (NAME, AGE)
Name
Zhang
Merril
Stewart
ECE569 Lecture 02.10
Age
27
2
40
Spring 2004
Set-theoretic notion of a relation (cont)
Because a relation is a set of tuples, the order of
tuples in the table is insignificant (but the order of
domains does matter).
Name
Zhang
Merril
Stewart
Age
27
2
40
Name
Zhang
Stewart
Merril
Age
27
40
2
Age
27
2
40
ECE569 Lecture 02.11
Name
Zhang
Merill
Stewart
Spring 2004
An alternative definition – set of mappings
A relation schema R = {A1,A2,…,An} (no order imposed on
attributes)
A relation instance of R is a finite set of mappings {1, 2, …,
m} where
2: R D
where D = null dom(A1) dom(A2)… dom(An).
Each tuple i maps each attribute in tuple i to a value.
Note that attributes must be given names that are unique in a
relation schema.
Under this definition, all three instances above are
equivalent.
ECE569 Lecture 02.12
Spring 2004
Keys
A set S of attributes is a candidate key for R if
No semantically correct instance of R can include two
tuples such that [s] = [s], and
No proper subset of S satisfies (a)
A set of attributes satisfying (a) but not necessarily
(b) is a superkey.
A relation may have multiple candidate keys. One
can be designated as the primary key.
ECE569 Lecture 02.13
Spring 2004
Example
Schema for a relation of degree four
patient(name,address,balance_due,room#)
An instance of patient with three tuples that
satisfies the key constraint (name is a candidate
key)
patient
Name
Address
Balance_due
Room #
Zhang
1964 highland dr
$5,230
518
Stewart
2342 K St.
$25
203
Brinkley
2342 K St.
$10,200
203
No other candidate key is possible (assuming that
this is a semantically correct instance of patient).
ECE569 Lecture 02.14
Spring 2004
Example – Healthcare DB
patient(name, address, balance_due, room#)
payments(name, amount, date)
vital_signs(name, pulse, blood, date, time)
diagnosis(patient_name, disease_name)
disease(disease_name, treatment)
treats(patient, doctor)
ECE569 Lecture 02.15
Spring 2004
Example instance of students relation
Sid
53666
53688
53650
ECE569 Lecture 02.16
Name
Jones
Smith
Smith
Login
jones@cse
smith@eecs
smith@math
Age
18
18
19
Gpa
3.4
3.2
3.8
Spring 2004
Relational Query Language
A major strength of the relational model: supports
simple, powerful querying of data
Queries can be written intuitively, and the DBMS is
responsible for efficient evaluation
Key: precise semantics for relational queries
Allows the optimizer to extensively re-order operations,
and still ensure that the answer does not change
ECE569 Lecture 02.17
Spring 2004
The SQL Query Language
Developed by IBM (system R) in the 1970s
Need for a standard since it is used by many
vendors
Standards
SQL-86
SQL-89 (minor revision)
SQL-92 (major revision)
SQL-99 (major extensions, current standard)
ECE569 Lecture 02.18
Spring 2004
The SQL Query Language
To find all 18 year old students, we can write
Sid
select *
from student S 53666
where s.age=18 53688
Name
Jones
Smith
Login
jones@cse
smith@eecs
Age Gpa
18 3.4
18 3.2
To find just names and logins, replace the first line:
select s.name, s.login
ECE569 Lecture 02.19
Spring 2004
Logical DB design
Representing entity-relationship diagrams
An entity set E can be represented by a relation whose
relation scheme consists of all the attributes of the entity
set.
Each tuple of the relation represents one entity in the
current instance of E.
A relationship R among E1, E2, …, Ek is represented by a
relation whose relation scheme consists of the attributes
in the keys for each of E1, E2, …, Ek.
ECE569 Lecture 02.20
Spring 2004
Relational Query language
Query language: Allow manipulation and retrieval
of data from a database
Relational model supports simple, powerful QLs:
Strong formal foundation based on logic
Allows for much optimization
Query language != programming language
ECE569 Lecture 02.21
Spring 2004
Formal Relational Query Languages
Two mathematical query languages form the basis
for “real” languages (e.g., SQL) and for
implementation:
Relational algebra: more operational, very useful for
representing execution plans
Relational calculus: let users describe what they want,
rather than how to compute it (Non-operational,
declarative)
ECE569 Lecture 02.22
Spring 2004
Relational Algebra
A set of five basic operations that map one or more
relations to a new relation.
Union: R S t tR or tS
The set of tuples in R or S or both
The degrees of R and S must match
a
b
c
A
B
C
D
E
F
G
H
a
d
c
a
b
c
d
e
d
e
b
e
a
b
a
e
d
a
c
b
b
c
a
a
d
b
b
b
c
b
b
d
e
b
a
d
b
b
b
b
R
ECE569 Lecture 02.23
S
T
RT
Spring 2004
Relational Algebra (cont)
Difference: R - S t tR and tS
The set of tuples in R, but not in S
The degrees of R and S must match
A
B
C
D
E
F
G
H
a
d
c
a
b
c
d
e
d
e
b
e
a
b
a
e
d
a
c
b
b
c
a
a
d
b
b
b
c
b
b
R
ECE569 Lecture 02.24
S
R-T
T
Spring 2004
Relational Algebra (cont)
Cartesian Product: R X S
(a1, a2, …, ar+s) (a1, a2, …, ar) R and (ar+1, ar+2, …, ar+s) S
R x S is of degree r+s where r=degree(R) and s=degree(s)
a tuple t is in R x S if its first r components match those
of a tuple in R and its last s components from a tuple in S
R.A R.B R.C S.D S.E
A
B
C
D
E
F
G
H
A
B
C
D
E
a
b
c
d
e
d
e
b
a
b
c
d
e
a
e
d
a
c
b
b
c
a
a
d
b
b
b
c
b
b
a
b
c
b
c
a
d
c
d
e
a
d
c
b
c
e
a
b
d
e
e
a
b
b
c
R
S
T
RxS
ECE569 Lecture 02.25
Spring 2004
Relational Algebra (cont)
Projection: i1, i2, …, im (R) (ai1, ai2, …, aim) (a1, a2, …, an) R
i1, i2, …, im (R) is of degree m
A tuple t in the result is obtained by removing and/or
rearranging the attributes of a tuple of R
A
B
C
D
E
F
G
H
G
F
a
b
c
d
e
d
e
b
e
d
a
e
d
a
c
b
b
c
a
a
d
b
b
b
c
b
b
b
a
b
d
R
ECE569 Lecture 02.26
S
T
2,1 (T)
Spring 2004
Relational Algebra (cont)
Selection: F(R) t t R and F(t)
F(R) is of the same degree as R
All tuples in R that satisfy predicate F are included in
result
F is a formula involving
- Constants or components of the tuple – component i is
denoted $I
- Comparison operators, <, =, >, , ,
- Logical operators and, or, not
F
G
H
d
e
b
a
a
d
b
b
b
c
b
b
T
ECE569 Lecture 02.27
F
G
H
d
e
b
a
b
b
d
b
b
$1=d or $2=$3 (T)
Spring 2004
Examples
Find the names of all patients with a balance of
more than $10,000.
Find the names of all patients that have a pulse
less than 50 or have been diagnosed with
hypertension.
ECE569 Lecture 02.28
Spring 2004
Examples (cont)
Find all patients whose treatment includes the
“application of leeches”
Print the names of all pairs of patients that
occupy the same hospital room and have been
diagnosed with the same disease.
ECE569 Lecture 02.29
Spring 2004
Examples (cont)
Print the name of the patient with the highest pulse
recorded today
Find all patients that had pulse measured today
Find the set of all ‘losers’, i.e., those whose pulse is less
than or equal to that of some other patient
Subtract (b) from (a) and project out patient name
ECE569 Lecture 02.30
Spring 2004
Examples (cont)
List all patients that suffer from at least one illness
that no other patient suffers from
ECE569 Lecture 02.31
Spring 2004
Additional Operations
Intersection: R S t tR and tS
The set of tuples in both R and S
The degrees of R and S must match
R S = R – (R – S)
S
R
R–S
R – ( R – S)
ECE569 Lecture 02.32
Spring 2004
Additional Operations (cont)
Theta-join: R ij S $i $r+j(R S) where r =
degree(R) and <, =, >, , ,
When is ‘=‘ operation is called equijoin.
Find patient with highest pulse
ECE569 Lecture 02.33
Spring 2004
Additional Operations (cont)
Natural Join: R
S
An equijoin for all attributes that R and S have in
common.
The shared columns originating in S are projected out
R
S i1, i2, …, im R.A1 = S. A1 and … and R.Ak = S.Ak (R x S)
where i1, i2, … im is the list of all attributes of R x S, in
order, except the attributes S.A1, …, S.Ak.
Example: Names of patients being treated with leeches.
ECE569 Lecture 02.34
Spring 2004
Sample Queries
Consider the following set of relations
frequents (drinker, bar)
serves (bar, beer)
likes (drinker, beer)
List the drinkers that frequent at least one bar that
serves a beer they like
Construct the relation should_visit(drinker,bar)
consisting of all tuples <d, b> where bar b serves a
beer that drinker d likes.
ECE569 Lecture 02.35
Spring 2004
Sample Queries (cont)
List the drinkers that frequent only bars that serve
some beer they like. (Assume every drinker
frequents at least one bar.)
Print the drinkers that frequent no bar that serves a
beer that they like
ECE569 Lecture 02.36
Spring 2004
Questions on Healthcare DB
Retrieve all pairs of patients that shared a room at
the same time
List all patients that finished paying their bill
before one of their roomates began paying theirs.
Print the names of all pairs of patients that occupy
the same hospital room and have been diagnosed
with the same disease.
ECE569 Lecture 02.37
Spring 2004
Components of Data-Intensive Systems
Three separate types of functionalities
Data management
Application logic
Presentation
The system architecture determines whether these three
components reside on a single system (“tier”) or are
distributed across several tiers
ECE569 Lecture 02.38
Spring 2004
Single-tier architecture
All functionality combined into a single tier, usually
on a mainframe
Advantages:
Users access through dumb terminals
Easy maintenance and administration
Disadvantages:
Today, users expect graphical user interface
Centralized computation of all of them is too much for a
central system
ECE569 Lecture 02.39
Spring 2004
Client-Server Architecture
Thin client: work division
Client implements only the graphical user interface
Server implements business logic and data management
Thick client:
work division
- Client implements both the graphical user interface and the
business logic
- Server implements data management
Disadvantages:
- No central place to update the business logic
- Security issues: server needs to trust clients
- Scalability: does not scale more than several 100s of clients
ECE569 Lecture 02.40
Spring 2004
The three-tier architecture
Client program
(web browser)
Application Server
(Tomcat, Apache)
Database System
(DB2)
ECE569 Lecture 02.41
Spring 2004