CS186: Introduction to Database Systems
Download
Report
Transcript CS186: Introduction to Database Systems
EECS 647: Introduction to
Database Systems
Instructor: Luke Huan
Spring 2007
Administrative
Take home background survey will be posted at the
class website
http://people.eecs.ku.edu/~jhuan/EECS647/schedule.ht
ml today
Due Feb 5th (next Monday) at the class meeting time.
7/18/2015
Luke Huan Univ. of Kansas
2
Administrative (cont.)
Your PostgreSQL account has been created.
Check your EECS email account and get your passwd
Do not change your PostgreSQL passwd!
Go to
http://people.eecs.ku.edu/~jhuan/EECS647/additionalFile
s.html (linked through the class website) and read through
the tutorial
7/18/2015
Luke Huan Univ. of Kansas
3
Review
A data model is
What are the two terms used by ER model to describe a
miniworld?
Entity
Relationship
Student (sid: string, name: string, login:
string, age: integer, gpa:real)
A database schema is
a group of concepts for describing data.
a description of a database, using a given data model
10101
(relational model by default).
11101
What are the three types of database schema that we
usually see in database design?
7/18/2015
Luke Huan Univ. of Kansas
4
SUMMARY OF ER-DIAGRAM
NOTATION FOR ER SCHEMAS
Symbol
Meaning
ENTITY TYPE
WEAK ENTITY TYPE
RELATIONSHIP TYPE
IDENTIFYING RELATIONSHIP TYPE
ATTRIBUTE
KEY ATTRIBUTE
MULTIVALUED ATTRIBUTE
COMPOSITE ATTRIBUTE
DERIVED ATTRIBUTE
E1
E1
R
R
7/18/2015
E2
R
N
(min,max)
TOTAL PARTICIPATION OF E2 IN R
E2
CARDINALITY RATIO 1:N FOR E1:E2 IN R
E
STRUCTURAL CONSTRAINT (min, max)
ON PARTICIPATION OF E IN R
Luke Huan Univ. of Kansas
5
ER Case Study I
Works_In does
not allow an
employee to
work in a
department
for two or more
periods.
We want to
record several
values of the
descriptive
attributes for
each instance of
this relationship.
7/18/2015
from
name
ssn
to
dname
lot
did
Departments
Works_In
Employees
budget
name
dname
ssn
lot
Employees
from
Luke Huan Univ. of Kansas
did
Works_In
Duration
budget
Departments
to
6
ER Case study II
Design a database representing cities, counties, and states
For states, record name and capital (city)
For counties, record name, area, and location (state)
For cities, record name, population, and location (county and state)
Assume the following:
Names of states are unique
Names of counties are only unique within a state
Names of cities are only unique within a county
A city is always located in a single county
A county is always located in a single state
7/18/2015
Luke Huan Univ. of Kansas
7
Case study : first design
name
name
Cities
In
population
States
capital
county_name
county_area
County area information is repeated for every city in the
county
Redundancy is bad (why?)
State capital should really be a city
Should “reference” entities through explicit relationships
7/18/2015
Luke Huan Univ. of Kansas
8
Case study : second design
name
Cities
population
In
IsCapitalOf
name
Counties
In
States
name
area
Technically, nothing in this design could prevent a city
in state X from being the capital of another state Y, but
oh well…
7/18/2015
Luke Huan Univ. of Kansas
9
Today’s Outline
Relational Model
Relational Model Constraints and Relational Database
Schemas
Update Operations
Insertion
Deletion
Update
Transaction
Dealing with Constraint Violations
7/18/2015
Luke Huan Univ. of Kansas
10
Why Study the Relational Model?
Most widely used model.
“Legacy systems” in older models
e.g., IBM’s IMS
Object-oriented concepts merged in
“Object-Relational” model
Early work done in POSTGRES research project at Berkeley
XML features in most relational systems
Can export XML interfaces
Can embed XML inside relational fields
7/18/2015
Luke Huan Univ. of Kansas
11
Database Design
7/18/2015
Luke Huan Univ. of Kansas
12
Relational Model Concepts
Relational database: a set of relations.
Relation: made up of 2 parts:
7/18/2015
Schema : specifies name of relation, plus name and
type of each column.
E.g. Students(sid: string, name: string, login:
string, age: integer, gpa: real)
Instance : a table, with rows and columns.
#rows = cardinality
#fields = degree / arity
Luke Huan Univ. of Kansas
13
Informally
RELATION: A table of values
A relation may be thought of as a set of rows (table
view).
Each row represents a fact that corresponds to a realworld entity or relationship.
Each row has a value of an item or set of items that
uniquely identifies that row in the table.
Sometimes row-ids or sequential numbers are assigned
to identify the rows in the table.
A relation may alternately be though of as a set of
columns (schema view).
Each column typically is called by its column name or
column header or attribute name.
7/18/2015
Luke Huan Univ. of Kansas
14
Example
7/18/2015
Luke Huan Univ. of Kansas
15
Historically
The model was first proposed by Dr. E.F. Codd of IBM in
1970 in the following paper:
"A Relational Model for Large Shared Data Banks,"
Communications of the ACM, June 1970.
The above paper caused a major revolution in the field of Database
management and earned Ted Codd the coveted ACM Turing Award.
The picture is from wikipedia
7/18/2015
Luke Huan Univ. of Kansas
16
A (Slightly) Formal Definition
A database is a collection of relations (or tables)
Each relation is identified by a name and a list of
attributes (or columns)
Each attribute has a name and a domain (or type)
Set-valued attributes not allowed
Simplicity is a virtue!
7/18/2015
Luke Huan Univ. of Kansas
17
Schema versus instance
Schema (metadata)
Specification of how data is to be structured logically
Defined at set-up
Rarely changes
Instance
Content
Changes rapidly, but always conforms to the schema
Compare to type and objects of type in a programming
language
7/18/2015
Luke Huan Univ. of Kansas
18
Example
Schema
Student (SID integer, name string, age integer, GPA float)
Course (CID string, title string)
Enroll (SID integer, CID integer)
Instance
{ h142, Bart, 10, 2.3i, h123, Milhouse, 10, 3.1i, ...}
{ hCPS116, Intro. to Database Systemsi, ...}
{ h142, CPS116i, h142, CPS114i, ...}
7/18/2015
Luke Huan Univ. of Kansas
19
Formal Definition (Set Theory)
Formally, given sets D1, D2, …. Dn a relation r is a
subset of D1 x D2 x … x Dn
Thus, a relation is a set of n-tuples (a1, a2, …, an) where
each ai Di
7/18/2015
Luke Huan Univ. of Kansas
20
Example
Example: If
customer_name = {Jones, Smith, Curry, Lindsay, …}
customer_street = {Main, North, Park, …}
customer_city
= {Harrison, Rye, Pittsfield, …}
Then r = { (Jones, Main, Harrison), (Smith,
North, Rye),
(Curry, North, Rye), (Lindsay, Park, Pittsfield) }
is a relation over
customer_name × customer_street × customer_city
7/18/2015
Luke Huan Univ. of Kansas
21
Attribute Types
Each attribute of a relation has a name, designating
the role of the attribute
The set of allowed values for each attribute is
called the domain of the attribute
Attribute values are (normally) required to be
atomic; that is, indivisible
7/18/2015
E.g. the value of an attribute can be an account
number, but cannot be a set of account numbers
Domain is said to be atomic if all its members are
atomic
The special value null is a member of every domain
Luke Huan Univ. of Kansas
22
Relation Schema
A1, A2, …, An are attributes
R = (A1, A2, …, An ) is a relation schema
Example:
Customer_schema = (customer_name,
customer_street, customer_city)
7/18/2015
r(R) denotes a relation r on the relation schema R
Example:
customer (Customer_schema)
Luke Huan Univ. of Kansas
23
Relation Instance
The current values (relation instance) of a
relation are specified by a table
An element t of r is a tuple, represented by a
row in a table
attributes
(or columns)
customer_name customer_street
Jones
Smith
Curry
Lindsay
Main
North
North
Park
customer_city
Harrison
Rye
Rye
Pittsfield
tuples
(or rows)
customer
7/18/2015
Luke Huan Univ. of Kansas
24
Definition Summary
Informal Terms
Formal Terms
Table
Relation
Column
Attribute/Domain
Row
Tuple
Values in a column
Domain
Table Definition
Schema of a Relation
Populated Table
Extension
7/18/2015
Luke Huan Univ. of Kansas
25
Characteristics of Relation
The tuples in a ration r(R) are not considered to be
ordered, even though they appear to be in the tabular
form.
We consider the attributes in R(A1, A2, ..., An) and the
values in t=<v1, v2, ..., vn> to be ordered .
(However, a more general alternative definition of
relation does not require this ordering).
All values are considered atomic (indivisible). A
special null value is used to represent values that are
unknown or inapplicable to certain tuples.
7/18/2015
Luke Huan Univ. of Kansas
26
Characteristics of Relation
Notation: we refer to component values of a tuple t
by t[Ai] = vi (the value of attribute Ai for tuple t).
Similarly, t[Au, Av, ..., Aw] refers to the subtuple of
t containing the values of attributes Au, Av, ..., Aw,
respectively.
7/18/2015
Luke Huan Univ. of Kansas
27
Relational Integrity Constraints
Constraints are conditions that must hold on all valid
relation instances. There are four main types of
constraints:
1.
Domain constraints
1.
2.
3.
4.
7/18/2015
The value of a attribute must come from its domain
Key constraints
Entity integrity constraints
Referential integrity constraints
Luke Huan Univ. of Kansas
28
Primary Key Constraints
A set of fields is a candidate key for a relation if :
1. No two distinct tuples can have same values in all key
fields, and
2. This is not true for any subset of the key.
Part 2 false? A superkey.
If there’s >1 key for a relation, one of the keys is chosen
(by DBA) to be the primary key.
E.g., given a schema Student(sid: string, name: string,
gpa: float) we have:
sid is a key for Students. (What about name?) The set
{sid, gpa} is a superkey.
Key Example
CAR (licence_num: string, Engine_serial_num: string,
make: string, model: string, year: integer)
What is the candidate key(s)
Which one you may use as a primary key
What are the super keys
7/18/2015
Luke Huan Univ. of Kansas
30
Entity Integrity
Entity Integrity: The primary key attributes PK of
each relation schema R in S cannot have null values
in any tuple of r(R).
7/18/2015
Other attributes of R may be similarly constrained to
disallow null values, even though they are not
members of the primary key.
Luke Huan Univ. of Kansas
31
Foreign Keys, Referential Integrity
Foreign key : Set of fields in one relation that is used to
`refer’ to a tuple in another relation. (Must correspond
to primary key of the second relation.) Like a `logical
pointer’.
E.g. sid is a foreign key referring to Students:
Student(sid: string, name: string, gpa: float)
Enrolled(sid: string, cid: string, grade: string)
If all foreign key constraints are enforced, referential
integrity is achieved, i.e., no dangling references.
Can you name a data model w/o referential integrity?
Links in HTML!
Foreign Keys
Only students listed in the Students relation should be
allowed to enroll for courses.
Enrolled
sid
53666
53666
53650
53666
cid
grade
Carnatic101
C
Reggae203
B
Topology112
A
History105
B
Students
sid
53666
53688
53650
name
login
Jones jones@cs
Smith smith@eecs
Smith smith@math
age
18
18
19
gpa
3.4
3.2
3.8
Or, use NULL as the value for the foreign key in the
referencing tuple when the referenced tuple does not exist
In-Class Exercise
(Taken from Exercise 5.16)
Consider the following relations for a database that keeps track of student
enrollment in courses and the books adopted for each course:
STUDENT(SSN, Name, Major, Bdate)
COURSE(Course#, Cname, Dept)
ENROLL(SSN, Course#, Quarter, Grade)
BOOK_ADOPTION(Course#, Quarter, Book_ISBN)
TEXT(Book_ISBN, Book_Title, Publisher, Author)
Draw a relational schema diagram specifying the foreign keys for this
schema.
7/18/2015
Luke Huan Univ. of Kansas
34
Other Types of Constraints
Semantic Integrity Constraints:
7/18/2015
based on application semantics and cannot be
expressed by the model per se
e.g., “the max. no. of hours per employee for all
projects he or she works on is 56 hrs per week”
A constraint specification language may have to be
used to express these
SQL-99 allows triggers and ASSERTIONS to allow
for some of these
Luke Huan Univ. of Kansas
35
Update Operations on Relations
Update operations
INSERT a tuple.
DELETE a tuple.
MODIFY a tuple.
Constraints should not be violated in updates
7/18/2015
Luke Huan Univ. of Kansas
36
Update Operations on Relations
In case of integrity violation, several actions can
be taken:
7/18/2015
Cancel the operation that causes the violation
(REJECT option)
Perform the operation but inform the user of the
violation
Trigger additional updates so the violation is
corrected (CASCADE option, SET NULL option)
Execute a user-specified error-correction routine
Luke Huan Univ. of Kansas
37
Key concept: Transaction
an atomic sequence of database actions (reads/writes)
takes DB from one consistent state to another
consistent state 1
7/18/2015
transaction
Luke Huan Univ. of Kansas
consistent state 2
38
Example
checking: $200
savings: $1000
Transaction
“transfer $100
from Saving to
checking”
checking: $300
savings: $900
Here, consistency is based on our knowledge of banking
“semantics”
In general, up to writer of transaction to ensure transaction
preserves consistency
DBMS provides (limited) automatic enforcement, via integrity
constraints
e.g., balances must be >= 0
7/18/2015
Luke Huan Univ. of Kansas
39