(b) from - Rutgers ECE

Download Report

Transcript (b) from - Rutgers ECE

ECE 569
Database System Engineering
Spring 2004
Yanyong Zhang www.ece.rutgers.edu/~yyzhang
Course URL www.ece.rutgers.edu/~yyzhang/spring04
ECE569 Lecture 02.1
Spring 2004
Today’s topic


Relational data model

Different data models exist: relational, network,
hierarchical, object-oriented

Powerful, simple, declarative
Relational algebra
ECE569 Lecture 02.2
Spring 2004
Overview of Database Design

Conceptual design (ER model is used at this stage)

What are the entities and relationships in the enterprise?

What information about these entities and relationships
should we store in the database?

What are the integrity constraints or business rules that
hold

A database ‘schema’ in the ER model can be represented
pictorially

Can map an ER diagram into a relational schema
ECE569 Lecture 02.3
Spring 2004
Entity-Relationship model

To describe the conceptual scheme.

An entity is a thing that exists and is distinguishable.

Entity sets consist of a group of all “similar” entities.

Each entity has certain attributes.

A relationship among entity sets is an ordered list of entity
sets.

Relationship set: collection of similar relationships

An n-ary relationship set R relates n entity sets E1, …, En; each
relationship in R involves entities e1 E1, …, en En
- Same entity set could participate in different relationship sets, or in
different “roles” in same set
ECE569 Lecture 02.4
Spring 2004
Example of entity-relationship model
Date
Balance
Account
Made To
Billed
Patient
From
Address
ECE569 Lecture 02.5
Disease
Vital Sign
Time
Blood
Diagnosed
Treatment
Amount
Pulse
Name
Room#
Payment
Name
Spring 2004
Data Model

A data model is a mathematical formalism with two
parts:

A notation for describing data

A set of operations used to manipulate that data
ECE569 Lecture 02.6
Spring 2004
Why Study the Relational Model?

Most widely used model


“legacy systems” in older models


Vendors: IBM, informix, Microsoft,Oracle,Sybase,etc.
e.g., IBM’s IMS
Recent competitor: object-oriented model

ObjectStore, Versant, Ontos

A synthesis emerging: object-relational model
- Informix Universal server, UniSQL, O2, Oracle, DB2
ECE569 Lecture 02.7
Spring 2004
Relational Database: Definitions

Relational database: a set of relations

Relation: made up of 2 parts

Instance: a table, with rows and columns
- # Rows = cardinality, # columns = degree

Schema: specifies name of relation, plus name and type
of each column
- e.g., students(sid:string, name:string,
login:string,age:integer, gpa:real)

Can think of a relation as a set of rows or tuples (i.e., all
rows are distinct)
ECE569 Lecture 02.8
Spring 2004
Set-theoretic notion of a relation

A domain D is a set of values

colors = {red, blue, green}

age = set of positive integers less than 20

last_name = {a-zA-Z}+

A relation over domains D1,D2,…,Dn is a subset of
D1D2D3…Dn.

Example

R(colors, age)  {(red,1), (blue,1),(green,1),
(red,2), (blue,2),(green,2), …}
ECE569 Lecture 02.9
Spring 2004
Set-theoretic notion of a relation (cont)

We are only interested in finite relations.

The members of a relation are called tuples.

Each relation that is a subset of the product of k
domains is k-degree.

A relation can be represented as a table, where
each row is a tuple and each column corresponds
to one domain.
PERSON (NAME, AGE)
Name
Zhang
Merril
Stewart
ECE569 Lecture 02.10
Age
27
2
40
Spring 2004
Set-theoretic notion of a relation (cont)

Because a relation is a set of tuples, the order of
tuples in the table is insignificant (but the order of
domains does matter).
Name
Zhang
Merril
Stewart
Age
27
2
40

Name
Zhang
Stewart
Merril
Age
27
40
2

Age
27
2
40
ECE569 Lecture 02.11
Name
Zhang
Merill
Stewart
Spring 2004
An alternative definition – set of mappings

A relation schema R = {A1,A2,…,An} (no order imposed on
attributes)

A relation instance of R is a finite set of mappings {1, 2, …,
m} where
2: R  D
where D = null  dom(A1)  dom(A2)…  dom(An).

Each tuple i maps each attribute in tuple i to a value.

Note that attributes must be given names that are unique in a
relation schema.

Under this definition, all three instances above are
equivalent.
ECE569 Lecture 02.12
Spring 2004
Keys

A set S of attributes is a candidate key for R if

No semantically correct instance of R can include two
tuples    such that [s] = [s], and

No proper subset of S satisfies (a)

A set of attributes satisfying (a) but not necessarily
(b) is a superkey.

A relation may have multiple candidate keys. One
can be designated as the primary key.
ECE569 Lecture 02.13
Spring 2004
Example

Schema for a relation of degree four
patient(name,address,balance_due,room#)

An instance of patient with three tuples that
satisfies the key constraint (name is a candidate
key)
patient

Name
Address
Balance_due
Room #
Zhang
1964 highland dr
$5,230
518
Stewart
2342 K St.
$25
203
Brinkley
2342 K St.
$10,200
203
No other candidate key is possible (assuming that
this is a semantically correct instance of patient).
ECE569 Lecture 02.14
Spring 2004
Example – Healthcare DB
patient(name, address, balance_due, room#)
payments(name, amount, date)
vital_signs(name, pulse, blood, date, time)
diagnosis(patient_name, disease_name)
disease(disease_name, treatment)
treats(patient, doctor)
ECE569 Lecture 02.15
Spring 2004
Example instance of students relation
Sid
53666
53688
53650
ECE569 Lecture 02.16
Name
Jones
Smith
Smith
Login
jones@cse
smith@eecs
smith@math
Age
18
18
19
Gpa
3.4
3.2
3.8
Spring 2004
Relational Query Language

A major strength of the relational model: supports
simple, powerful querying of data

Queries can be written intuitively, and the DBMS is
responsible for efficient evaluation

Key: precise semantics for relational queries

Allows the optimizer to extensively re-order operations,
and still ensure that the answer does not change
ECE569 Lecture 02.17
Spring 2004
The SQL Query Language

Developed by IBM (system R) in the 1970s

Need for a standard since it is used by many
vendors

Standards

SQL-86

SQL-89 (minor revision)

SQL-92 (major revision)

SQL-99 (major extensions, current standard)
ECE569 Lecture 02.18
Spring 2004
The SQL Query Language

To find all 18 year old students, we can write
Sid
select *
from student S 53666
where s.age=18 53688

Name
Jones
Smith
Login
jones@cse
smith@eecs
Age Gpa
18 3.4
18 3.2
To find just names and logins, replace the first line:
select s.name, s.login
ECE569 Lecture 02.19
Spring 2004
Logical DB design

Representing entity-relationship diagrams

An entity set E can be represented by a relation whose
relation scheme consists of all the attributes of the entity
set.

Each tuple of the relation represents one entity in the
current instance of E.

A relationship R among E1, E2, …, Ek is represented by a
relation whose relation scheme consists of the attributes
in the keys for each of E1, E2, …, Ek.
ECE569 Lecture 02.20
Spring 2004
Relational Query language

Query language: Allow manipulation and retrieval
of data from a database

Relational model supports simple, powerful QLs:


Strong formal foundation based on logic

Allows for much optimization
Query language != programming language
ECE569 Lecture 02.21
Spring 2004
Formal Relational Query Languages

Two mathematical query languages form the basis
for “real” languages (e.g., SQL) and for
implementation:

Relational algebra: more operational, very useful for
representing execution plans

Relational calculus: let users describe what they want,
rather than how to compute it (Non-operational,
declarative)
ECE569 Lecture 02.22
Spring 2004
Relational Algebra

A set of five basic operations that map one or more
relations to a new relation.

Union: R  S  t  tR or tS 

The set of tuples in R or S or both

The degrees of R and S must match
a
b
c
A
B
C
D
E
F
G
H
a
d
c
a
b
c
d
e
d
e
b
e
a
b
a
e
d
a
c
b
b
c
a
a
d
b
b
b
c
b
b
d
e
b
a
d
b
b
b
b
R
ECE569 Lecture 02.23
S
T
RT
Spring 2004
Relational Algebra (cont)

Difference: R - S  t  tR and tS 

The set of tuples in R, but not in S

The degrees of R and S must match
A
B
C
D
E
F
G
H
a
d
c
a
b
c
d
e
d
e
b
e
a
b
a
e
d
a
c
b
b
c
a
a
d
b
b
b
c
b
b
R
ECE569 Lecture 02.24
S
R-T
T
Spring 2004
Relational Algebra (cont)

Cartesian Product: R X S 
 (a1, a2, …, ar+s) (a1, a2, …, ar) R and (ar+1, ar+2, …, ar+s)  S 

R x S is of degree r+s where r=degree(R) and s=degree(s)

a tuple t is in R x S if its first r components match those
of a tuple in R and its last s components from a tuple in S
R.A R.B R.C S.D S.E
A
B
C
D
E
F
G
H
A
B
C
D
E
a
b
c
d
e
d
e
b
a
b
c
d
e
a
e
d
a
c
b
b
c
a
a
d
b
b
b
c
b
b
a
b
c
b
c
a
d
c
d
e
a
d
c
b
c
e
a
b
d
e
e
a
b
b
c
R
S
T
RxS
ECE569 Lecture 02.25
Spring 2004
Relational Algebra (cont)

Projection: i1, i2, …, im (R)   (ai1, ai2, …, aim) (a1, a2, …, an) R

i1, i2, …, im (R) is of degree m

A tuple t in the result is obtained by removing and/or
rearranging the attributes of a tuple of R
A
B
C
D
E
F
G
H
G
F
a
b
c
d
e
d
e
b
e
d
a
e
d
a
c
b
b
c
a
a
d
b
b
b
c
b
b
b
a
b
d
R
ECE569 Lecture 02.26
S
T
2,1 (T)
Spring 2004
Relational Algebra (cont)

Selection: F(R)  t  t R and F(t)

F(R) is of the same degree as R

All tuples in R that satisfy predicate F are included in
result

F is a formula involving
- Constants or components of the tuple – component i is
denoted $I
- Comparison operators, <, =, >, , , 
- Logical operators and, or, not
F
G
H
d
e
b
a
a
d
b
b
b
c
b
b
T
ECE569 Lecture 02.27
F
G
H
d
e
b
a
b
b
d
b
b
$1=d or $2=$3 (T)
Spring 2004
Examples

Find the names of all patients with a balance of
more than $10,000.

Find the names of all patients that have a pulse
less than 50 or have been diagnosed with
hypertension.
ECE569 Lecture 02.28
Spring 2004
Examples (cont)

Find all patients whose treatment includes the
“application of leeches”

Print the names of all pairs of patients that
occupy the same hospital room and have been
diagnosed with the same disease.
ECE569 Lecture 02.29
Spring 2004
Examples (cont)

Print the name of the patient with the highest pulse
recorded today

Find all patients that had pulse measured today

Find the set of all ‘losers’, i.e., those whose pulse is less
than or equal to that of some other patient

Subtract (b) from (a) and project out patient name
ECE569 Lecture 02.30
Spring 2004
Examples (cont)

List all patients that suffer from at least one illness
that no other patient suffers from
ECE569 Lecture 02.31
Spring 2004
Additional Operations

Intersection: R  S  t  tR and tS 

The set of tuples in both R and S

The degrees of R and S must match

R  S = R – (R – S)
S
R
R–S
R – ( R – S)
ECE569 Lecture 02.32
Spring 2004
Additional Operations (cont)

Theta-join: R ij S  $i  $r+j(R  S) where r =
degree(R) and    <, =, >, , ,  


When  is ‘=‘ operation is called equijoin.
Find patient with highest pulse
ECE569 Lecture 02.33
Spring 2004
Additional Operations (cont)

Natural Join: R
S

An equijoin for all attributes that R and S have in
common.

The shared columns originating in S are projected out

R
S   i1, i2, …, im  R.A1 = S. A1 and … and R.Ak = S.Ak (R x S)
where i1, i2, … im is the list of all attributes of R x S, in
order, except the attributes S.A1, …, S.Ak.

Example: Names of patients being treated with leeches.
ECE569 Lecture 02.34
Spring 2004
Sample Queries

Consider the following set of relations
frequents (drinker, bar)
serves (bar, beer)
likes (drinker, beer)

List the drinkers that frequent at least one bar that
serves a beer they like

Construct the relation should_visit(drinker,bar)
consisting of all tuples <d, b> where bar b serves a
beer that drinker d likes.
ECE569 Lecture 02.35
Spring 2004
Sample Queries (cont)

List the drinkers that frequent only bars that serve
some beer they like. (Assume every drinker
frequents at least one bar.)

Print the drinkers that frequent no bar that serves a
beer that they like
ECE569 Lecture 02.36
Spring 2004
Questions on Healthcare DB

Retrieve all pairs of patients that shared a room at
the same time

List all patients that finished paying their bill
before one of their roomates began paying theirs.

Print the names of all pairs of patients that occupy
the same hospital room and have been diagnosed
with the same disease.
ECE569 Lecture 02.37
Spring 2004
Components of Data-Intensive Systems

Three separate types of functionalities

Data management

Application logic

Presentation

The system architecture determines whether these three
components reside on a single system (“tier”) or are
distributed across several tiers
ECE569 Lecture 02.38
Spring 2004
Single-tier architecture

All functionality combined into a single tier, usually
on a mainframe


Advantages:


Users access through dumb terminals
Easy maintenance and administration
Disadvantages:

Today, users expect graphical user interface

Centralized computation of all of them is too much for a
central system
ECE569 Lecture 02.39
Spring 2004
Client-Server Architecture


Thin client: work division

Client implements only the graphical user interface

Server implements business logic and data management
Thick client:

work division
- Client implements both the graphical user interface and the
business logic
- Server implements data management

Disadvantages:
- No central place to update the business logic
- Security issues: server needs to trust clients
- Scalability: does not scale more than several 100s of clients
ECE569 Lecture 02.40
Spring 2004
The three-tier architecture
Client program
(web browser)
Application Server
(Tomcat, Apache)
Database System
(DB2)
ECE569 Lecture 02.41
Spring 2004