cs610a - Arms-A
Download
Report
Transcript cs610a - Arms-A
CS610 / CS710 Database Systems I
Daisy Wong
1
© D. Wong 2003
2002
Week 1
2
© D. Wong 2003
2002
Objectives
Modeling and design of databases.
Programming: queries and DB operations
3
© D. Wong 2003
2002
Database
A large
collection of data:
–stored in mass storage
–exists over a long period of time
–Can take on a variety of appearances depending
on the requirements at the time
–Can serve as the data source for a variety of
applications
4
© D. Wong 2003
2002
Database Examples
Examples:
– Wal-Mart : records every item purchased in
every store
– L. L. Bean: records detail information of each
customer and their purchases
– Hospitals: record patient demographics,
conditions and progress, test results, etc
–...
This course, refer to a collection of data managed
by a Database Management System (DBMS)
5
© D. Wong 2003
2002
DBMS
A combination of software, data, and structure of
the data that support:
– Users to create databases and specify their
schema
– Users to query and modify the data
– Storage of very large amount of data, secure the
data from accident or unauthorized use, and
allow efficient access
– Control concurrent access from many users,
presenting correct data to each user, and
prevent accidental corruption of the data from
simultaneous accesses
6
© D. Wong 2003
2002
DBMS lingo
Schema
– logical structure of the data. Use Data Definition
Language (DDL)
Query
– A question about the data. Use Data
Manipulation Language (DML)
Transaction (or atomic transaction)
– A logical unit of work that must be completed as
a whole or not at all.
7
© D. Wong 2003
2002
Three levels of database schemas
External View1
External View2
External View3
External level
Logical to external mappings
Logical Schema
Logical level
Internal to Logical mappings
Disk
Internal schema
8
Internal level
© D. Wong 2003
2002
DBMS User Roles
End users
– access the database for information to do their jobs.
Casual (use simple user interfaces) vs. Sophisticated (
use DML)
Database designers
– specify information content (use DDL) to create
database systems
Application developers
– design and develop applications that extend the
functionality of the dbms. E.g. user interface, data
analysis and data mining, various business services
Database administrators (DBAs)
– administer databases: control access, maintain data
accuracy and integrity, monitor and improve database
9
© D. Wong 2003
2002
performance
A little history
First attempt – file systems
Hierarchical model (tree based)
Network model (graph base)
Relational model
– Proposed by E. F. Codd (1970)
– Data should be presented to user as tables
(relations)
– Queries expressed in a very high-level language
– SQL (Structured Query Language) – most
important language based on relational model
10
© D. Wong 2003
2002
Relational Model
A conceptual model that represents data as relations.
Relations – tables of data
Accounts
accountNo
12345
34567
…
name
Sally
Sue
…
balance
1000.21
285.48
…
Query using SQL:
SELECT balance FROM Accounts
WHERE accountNo = 34567;
Relational DBMS finds an efficient way to answer the
query
11
© D. Wong 2003
2002
Major components of a DBMS (Simplified. Figure 1.1)
External
level
Logical
level
Users / applications
DBA
DDL commands
DDL
compiler
Queries/updates
Query
Manager
Transaction
Commands
Transaction
Manager
Buffer/File
Storage
Manager
Internal
level
Data
Metadata
12
© D. Wong 2003
2002
Figure 1.1: Database management system components
Ref. FCDS 2ed. by Ullman
13
© D. Wong 2003
2002
Storage Manager
• Obtains the requested information from data storage
• Modifies the information if requested and re-store
• Indexes are used (data structures that help us find data
items quickly given a part of their value). Advanced data
structures such as B-tree are used for efficient access
• Indexes are part of the data, their description is part of the
metadata
• Consists of 2 components: file manger and buffer manager
1. File manager keeps track of the location of files on disk
and obtains the blocks containing the requested data
2. Buffer manager handles main memory. It manages the
memory blocks, obtaining disk blocks from the file
manager, trying to optimize the access to data
14
© D. Wong 2003
2002
Query Manager
Parse and optimize the query using a query
compiler
Execute the resulting query plan (sequence of
actions for the DBMS to perform)
– Issues a sequence of requests to storage
manager for small pieces of data
Return the result to the requester
15
© D. Wong 2003
2002
Transaction Manager
Assure all transactions are executed properly
ACID properties of “proper” execution:
– Atomicity : All of the updates of a transaction are
successful, or no update take place
– Consistency: Each transaction should leave the
database in a consistent state
– Isolation: Each transaction, when executed
concurrently with other transactions, should have the
same effect as if it had been executed by itself
– Durability: Once a transaction has completed
successfully, its changes to the database should be
permanent. Even serious failures should not affect the
permanence of a transaction.
16
© D. Wong 2003
2002
Techniques to enforce ACID
Locking – granularity of locks is important.
Logging – write a log to nonvolatile storage.
Assure durability.
Transaction Commitment – for durability and
atomicity, transactions are computed
“tentatively”, recorded, but no changes are made
to the db until the transaction gets committed.
Changes copied to the log, then copied to db.
17
© D. Wong 2003
2002
Trends
Object Oriented DB
– Richer data types
– Classes and class hierarchy enable share or reuse of sw
and schemas
– Protect misuse through abstract data types
Object Relational DB
Constraints and triggers handling
Multimedia data
Data Integration to support advance data analysis such as
data mining. E.g. data warehouses, data marts
Multi-tier Client-Server architecture, move more
processing to the client
Parallel processing
Support Web sites
18
© D. Wong 2003
2002
Knowledge Discovery in Databases (KDD)
Data Mining
Interpretation
/ Evaluation
Transformation
Preprocessing
Knowledge
Selection
Patterns
Preprocessed
Data
Data
Transformed
Data
Target
Data
Reference: Fayyad; Smyth: "From Data Mining to Knowledge Discovery: An Overview" 1996
19
© D. Wong 2003
2002
Ch. 2 Entity-Relationship Data Model
Data models
Entity-Relationship diagrams
Design Principles
Modeling of constraints
Weak entity sets
20
© D. Wong 2003
2002
Data Modeling
Used for conceptual database design
Object-oriented
Classes /
Objects
DBMS
ODL
Ideas
Relational
Schema
Relational
DBMS
E/R
21
© D. Wong 2003
2002
Data Models
Value-Oriented
Examples
Object-Oriented
–Relational
–Object-oriented
–Logic
–Network
–Hierarchical
Distinct objects by
Data values
Redundancy
handling
Reduced by Design Use pointers to
capabilities
objects
Many-to-many
relationships
Handled the same
22
Object identity
Only binary
relationships
© D. Wong 2003
2002
Relational Model
Based on mathematically defined relations of
entities
Consists of:
– Attributes (fields)
– Domain legitimate values of attributes (data
range)
– Views of data presented in table format
– Create new views by projections of the database
– Records are n-tuples, where n = # of attributes
23
© D. Wong 2003
2002
Entity-Relationship (E/R) Diagrams
Represents the schematic of a database
Useful for designing the conceptual model
Entity set : classes of objects
Entity : member of an entity set (data)
Attributes : properties of the entities in an entity
set
Relationship – describe how entities relate to each
other
24
© D. Wong 2003
2002
Entity-Relationship Diagrams Symbols
attributes
relationships
entity sets
one - one
Cardinality:
m
m
many - one
n
25
many - many
© D. Wong 2003
2002
Entity-Relationship Diagrams (continue)
Entity
– Has a set of attributes
– The key is underlined
SS#
Key
– Attributes which uniquely identify an entity in
an entity set
– An inherent property of the data (e.g. movie
title and year)
– Serve as a constraint
26
© D. Wong 2003
2002
Week 2
27
© D. Wong 2003
2002
Ch. 2 Entity-Relationship Data Model (continue)
Data models
Entity-Relationship diagrams
Design Principles
Modeling of constraints
Weak entity sets
28
© D. Wong 2003
2002
Entity-Relationship Diagrams Symbols
attributes
relationships
entity sets
Cardinality (Multiplicity):
one - one
m
m
many - one
n
29
many - many
© D. Wong 2003
2002
Entity-Relationship Diagrams (continue 1)
Entity
– Has a set of attributes
– The key is underlined
SS#
Key
– Attributes which uniquely identify an entity in
an entity set
– An inherent property of the data (e.g. movie
title and year)
– Serve as a constraint
30
© D. Wong 2003
2002
Entity-Relationship Diagrams (continue 2)
Multiway relationships
Relationships with roles
Attributes on relationships
– E.g. salary of a star for a movie
– Not necessary by adding new entity set, whose
entities have the attributes
31
© D. Wong 2003
2002
E/R Diagrams Multiway relationships
Can be converted to a collection of binary, manyone relationships
– Introduce new entity set (connecting entity set)
Entities
are tuples of the relationship set for the
multiway relationship
– Introduce many-one relationships from the
connecting entity set to each of the entity sets
that provide components of tuples in the
original, multiway relationship
32
© D. Wong 2003
2002
ER Diagrams –Inheritance
Subclasses
– special-case entity sets with own special
attributes and/or relationships
isa relationship : One-One relationship to handle
inheritance, to connect entity set to its subclasses.
– E.g. every Cartoon is a Movie
Super class
ER symbol for isa :
(arrows not drawn)
isa
Subclass
33
© D. Wong 2003
2002
Design Principles
Faithfulness : Reflects the reality of the data and
the problem with the appropriate:
–
attributes
–
data ranges
–
relationships
–
cardinality
E.g.
Stars
Stars
-in
34
Movies
© D. Wong 2003
2002
Design Principles (2)
Avoid Redundancy
– Redundancy : duplicate representation of data,
lead to
Inconsistency,
Waste
violate data integrity
space
E.g.
Studios
Owns
Studio_name
Movies
Studio_name
35
© D. Wong 2003
2002
Design Principles (3)
Keep it simple : Avoid more elements than needed
Pick the right kind of element
– attributes or entity set
– what should the relationships be
36
© D. Wong 2003
2002
DB design
Centers on the entity sets and relationships making up the
database.
Users and designers must work together to:
– Specify the entity sets and relationships and their
meanings
– Specify the attributes and the meaning of it
– Specify the constraints
– Identify key attributes of entity sets
– Specify how the data would be viewed
– Specify who can view and modify what and when
Better to do a good job in the original design
– Need collaboration of users and designers
Reminder: Changing database design later usually very
37
© D. Wong 2003
2002
costly.
Constraints
Constraints are part of the schema, not of an
instance of the database.
Classification of Constraints
– Keys
– Single-value constraints
– Referential integrity constraints
– Domain constraints
– General constrains
38
© D. Wong 2003
2002
Keys in the E/R Model
Key for an entity set E is a set K of one or more attributes
such that, given any two distinct entities e1 and e2 in E, e1
and e2 cannot have identical values for each of the
attributes in the key K. (Ref. 2.3.2)
– E.g. Movie(title, year, length, filmtype), E1 = (titanic,
1960, 120, color), E2 = (titanic, 1997, 180, color)
– A key can consist of more than one attribute
– There can be more than one possible key (candidate
keys), but designate one as the “primary key”.
– Attributes that constitute the primary key cannot be
null
– If the entity set is in an isa-hierarchy, require the root
entity set have all the attributes needed for a key.
– In E/R diagram, attributes of a key are underlined
39
© D. Wong 2003
2002
Single-Value Constraints
Assertion: at most one value exists in a given
context
Ways to express single-value constraints in E/R:
– Each attribute of an entity set has a single value
Some
attributes may allow null
Key attributes should not be null
No formal representation in E/R diagram, make
a notation beside the attribute
– Many-one relationship
E
R
40
F
© D. Wong 2003
2002
Referential Integrity
Assertion: exactly one value exists in a given role
Enforced at database implementation
– Forbid the deletion of a referenced entity
– Require that if referenced entity is deleted,
then all entities that reference it are deleted as
well.
Use a rounded arrow in the diagram
Movies
Owned
by
41
Studios
© D. Wong 2003
2002
Other kinds of constraints
Domain constraints : restrict the value of an
attribute to be in a limited set.
– No formal notation in E/R, place a notation
next to the attribute
– E.g. declaring the type of an attribute
– E.g specify range for the attribute value
General constraints
– E.g. degree of the relationships
Movies
Stars-in
42
<=10
Stars
© D. Wong 2003
2002
Weak Entity Sets
Definition: An entity set whose key is composed of attributes
some or all of which belong to another entity set.
Causes:
1. entity sets in a hierarchy based on classification
Crew
Unit-of
number
Studios
name
addr
2. Connecting entity sets to eliminate a muliway
relationship.
43
© D. Wong 2003
2002
Weak Entity Sets Requirements
1.
2.
3.
Key consists of zero or more of its own attributes, and
Key attributes from entity sets that are reached by
supporting relationships (many-one)
R is a supporting relationship for a weak entity set E to
some entity set F, R must:
– Be a binary, many-one relationship from E to F
– R must have referential integrity from E to F
– The attributes that F supplies for the key of E must be
key attributes of F
– If F is weak, then key attributes of F supplied to E may
be attributes of some entity set to which F is connected
by a supporting relationship (recursive)
–
For multiple supporting relationship from E to F, each
relationship is used to supply a copy of the key
attributes of F
44
© D. Wong 2003
2002
Week 3
45
© D. Wong 2003
2002
Ch. 3 (part 1)
Relational Model basics
From E/R diagram to Relations
46
© D. Wong 2003
2002
Relational Model
Based on mathematically defined relations of entities
Consists of:
– Attributes (fields)
– Domain legitimate values of attributes (data range)
– Views of data presented in table format
– Create new views by projections of the database
– Records are n-tuples, where n = # of attributes (arity)
Use relational algebra operations to compute data
Result of the operations are also relations
47
© D. Wong 2003
2002
Relations
Sets of tuples presented in a table (order of tuples
immaterial)
Attributes are the column header
Schema of a relation: name and the set of attributes. E.g.
Movies (title, year, length, filmType)
Database schema – the set of schemas for the relations in
a design
Domain – elementary data type of each attribute
Relation instances
Movies title
year
length
filmType
Star Wars
1977
124
color
Mighty Ducks
1991
104
color
Wayne’s World
1992
48
95
color
© D. Wong 2003
2002
E/R Diagram to Relational Designs
1.
Turn each entity set into a relation with the same
set of attributes. E.g. Star (name, address)
2.
Represent relationships by relations
3.
Combining relations when appropriate
4.
Handle weak entity sets
5.
Convert subclass structures to relations
49
© D. Wong 2003
2002
E/R Relationship to Relations
Replace a relationship by a relation (say R):
1. For each entity set involved in relationship R, take its
key attribute(s) as part of the schema of the relation
for R.
2. If the relationship has attributes, add to R’s attribute
set
3. If one entity set has multiple roles in a relationship,
then its key attributes appear as many times as there
are roles. Need to rename the attributes to avoid name
duplication.
e.g. Contracts (starName, title, year, studioOfStar,
producingStudio) Ref. Fig. 3.6
50
© D. Wong 2003
2002
Discovering Keys for Relations
If relation R is constructed from:
Entity set – key attributes of the entity set
Relationship R, key of R is :
–
many-many : composed of keys of both
connected entity sets
–
many-one from E1 to E2 : consisted of key of
E1
–
one-one between E1 and E2: either the key of
E1 or E2
51
© D. Wong 2003
2002
Combining Relations
In general, if 2 or more relations has exactly the
same keys, combine them.
For many-one relationship:
– E (ea1, ea2)
– F (fa1, fa2, fa3)
– R (ea1, fa2, ra1) -- key is from the many relation
– Result: E(ea1, ea2, fa2, ra1)
E
F
R
52
© D. Wong 2003
2002
Handling weak entity set
1.
The relation for the weak entity set includes it’s
own attributes and the attritubes of the other
entity sets that help form it’s key
e.g. 1: Contracts (starName, studioName, title,
year, salary)
e.g. 2: Crew (number, studioName)
2.
3.
The relation for any relationship create as usual
(rename attributes to avoid confusion)
Don’t need to create a relation for supporting
relationships
53
© D. Wong 2003
2002
Weak entity set handling example
Studios(name, addr)
Crews(number, studioName)
Unit-of(number, studioName, name) – but not
needed because it’s a supporting relationship
Crew
Uint-of
number
Studios
name
54
addr
© D. Wong 2003
2002
Converting Subclass Structures to Relations
Strategies:
1.
E/R approach
2.
OO approach
3.
Use null values
55
© D. Wong 2003
2002
E/R approach
For each entity set E in the hierarchy, create a
relation that includes the key attributes from the
root and any attributes belonging to E
Do not create a relation for isa relationship
Example:
– Movies(title, year, length, filmType)
– MurderMysteris(title, year, weapon)
– Cartoons(title, year)
56
© D. Wong 2003
2002
OO approach
Treat entities as objects belonging to a single class
– for each possible subtree including the root,
create one relation, whose schema includes all the
attributes of all the entity sets in the subtree
Example:
– Movies(title, year, length, filmType)
– MoviesC(title, year, length, filmType)
– MoviesMM( title, year, length, filmType, weapon)
– MoviesCMM(title, year, length, filmType, weapon)
– Voices(title, year, starName)
Then combine relations that have the same set of
attributes.
57
© D. Wong 2003
2002
Use null values
Create one relation with all the attributes of all the
entity sets in the hierarchy. Each entity is
represented by one tuple, and that tuple has a null
value for whatever attributes the entity does not
have.
Example:
– Movie(title, year, length, filmType, weapon)
58
© D. Wong 2003
2002
Comparison of the 3 approaches
Pros and Cons in each approach. Factors to
consider:
– Number of relations involved in a query
(depends on the query of interest)
– Total number of relations in the schema
– Space usage
59
© D. Wong 2003
2002
Operations in the Relational Data Model
Algebraic notation (Relational Algebra)
sets
bags
Constraints on relations
Extensions to the relational model
60
© D. Wong 2003
2002
Ch. 5 Relational Algebra
Relational Algebra expressions are composed of:
– Operands - relations represented by:
relation’s
name or
the list of tuples in a relation
– Operators
Results of the operations are also relations
Build complex expressions from simple ones
Queries – relational algebra expressions
61
© D. Wong 2003
2002
Classes of Relational Algebra Operations
Classes of relational algebra operations:
1.
Set operations: union, difference, intersection
2.
Remove parts of a relation:
–
Selection – eliminates rows (tuples)
–
Projection – eliminates columns
3.
Combine tuples of two relations : cartesian
product and joins
4.
Renaming: changes relation schema (i.e. relation
name, and/or attribute names)
62
© D. Wong 2003
2002
Example relations 1
R
S
A
B
C
a
b
c
d
a
f
c
b
d
A
B
C
b
g
a
d
a
f
63
© D. Wong 2003
2002
Set Operations
Union, Difference, Intersection
Given relations R and S:
– R and S have schemas with identical set of
attributes, and domain types for each attributes
(=> same arity)
– Columns of R and S must be ordered so that the
order of attributes is the same for both relations
64
© D. Wong 2003
2002
Union
RS
– Result: a set of tuples that are in R or S or both.
– A tuple appears only once in the result even if it is
present in both R and S
– Reminder: R and S has same arity
Example:
A
a
d
c
b
B
b
a
b
g
65
C
c
f
d
a
© D. Wong 2003
2002
Difference
R–S
– Result: set of tuples in R but not in S
– Reminder: Not commutative: R – S ≠ S – R
– Reminder: R and S has same arity
Example:
A
B
C
a
b
c
c
b
d
66
© D. Wong 2003
2002
Intersection
RS
– Result: set of tuples in both R and S, i.e. R-(R-S)
– Reminder: R and S has same arity
Example:
A
B
C
d
a
f
67
© D. Wong 2003
2002
Projection
A1, A2, …, An( R )
Result:
– a new relation that has only the columns for
attributes A1, A2, … An of R
– Schema of the new relation is {A1, A2, … An}
Example 1: A, C( R )
Example 2: B( R )
A
C
a
c
d
f
c
d
B
b
a
68
© D. Wong 2003
2002
Selection
sc(
R ) , where C is a conditional expression:
– operands are constants or attributes of R
– Operators are : <, =, >, , , , and, or, not
Result:
– a new relation with a subset of R’s tuples that
satisfy condition C
– Schema is the same as R’s schema
Example: sB=b(R)
A
B
C
a
b
c
c
b
d
69
© D. Wong 2003
2002
Cartesian Product
RxS
Let arity of R = k1 and arity of S = k2
Result: the set of all possible (k1 + k2)-tuples whose first k1
components form a tuple in R, and whose last k2
components form a tuple in S
Example:
R.A
R.B
R.C
S.A
S.B
S.C
a
b
c
b
g
a
a
b
c
d
a
f
d
a
f
b
g
a
d
a
f
d
a
f
c
b
d
b
g
a
c
b
d
d
a
f
70
© D. Wong 2003
2002
Example Relation 2
R
A
B
C
1
2
4
7
D
E
3
3
1
5
6
6
2
8
9
71
S
© D. Wong 2003
2002
Theta-Joins
R
Result: tuples in R x S such that C is satisfied, i.e.
C
S
where C is an arbitrary condition
sC(R x S)
To construct the result:
1. Take the product of R and S
2. Select from the product only those tuples that
satisfy the condition C
Arity of result = arity of R + arity of S
Example: R
A
1
1
4
B
2
2
5
C
S where C is B < D
C
3
3
6
72
D
3
6
6
E
1
2
2
© D. Wong 2003
2002
Example Relation 3
R
A
B
C
a
b
d
S
B
C
D
c
b
c
d
b
c
b
c
e
b
b
f
a
d
b
c
a
d
73
© D. Wong 2003
2002
Natural Joins
R
S
Result:
i1, i2, …, im sR.A1=S.A1 … R.Ak=S.Ak (R x S) where i1, i2, …,
im is the list of all components of R x S in order, except
the components S.A1, … S.Ak
Example:
A
a
B
b
C
c
D
d
a
d
b
b
c
c
e
d
d
c
b
a
c
d
e
b
74
© D. Wong 2003
2002
Renaming
S(A1, A2, …, An)(T) or S(T)
Result: renames relation T to S
Example: R x S(X, Y, Z)(S)
A
a
a
B
b
b
C
c
c
X
b
d
Y
g
a
Z
a
f
d
d
c
a
a
b
f
f
d
b
d
b
g
a
g
a
f
a
c
b
d
d
a
f
75
© D. Wong 2003
2002
Week 4
76
© D. Wong 2003
2002
The rest of Ch. 5 – Relational Algebra
Ch. 10 – Logical query languages
77
© D. Wong 2003
2002
Combining Operations to Form Queries
Construct more complex expressions by applying
operators to sub-expressions
Use parentheses to indicate operands grouping
Multiple ways to write equivalent queries
Expression tree for visualizing complex expression
Query optimizer
Example (ref. Fig. 5.8):
title,year(slength≥100(Movie) sstudioName=‘Fox’(Movie))
or
title,year(slength≥100 AND studioName=‘Fox’(Movie))
78
© D. Wong 2003
2002
Algebraic Laws
R (S T) = (R S) T
Associative: e.g.
Commutative: e.g. R S = S R
79
© D. Wong 2003
2002
Dependent and Independent Operations
1.
Set operations: union, difference, intersection
2.
Remove parts of a relation:
–
Selection – eliminates rows (tuples)
–
Projection – eliminates columns
3.
Combine tuples of two relations : cartesian
product and joins
4.
Renaming: changes relation schema (i.e. relation
name, and/or attribute names)
80
© D. Wong 2003
2002
Constraints on Relations
Two ways to express constraints in relational algebra:
1.
R=Ø
// R Ø
2.
RS
// R - S = Ø
where R and S are relational algebra expressions
81
© D. Wong 2003
2002
Referential Integrity Constraints
Assertion: a value appearing in one context also
appears in another, related context.
Example:
Movie(title, year, length, inColor, studioName, producerC#)
MovieExec(name, address, cert#, networth)
Constraint: the producer of every movie is a certified movie
executive, i.e. appear in the MovieExec relation
producerC#(Movie) cert#(MovieExec)
or
producerC#(Movie) - cert#(MovieExec) = Ø
82
© D. Wong 2003
2002
Other Constraints
Domain constraints example:
MovieStars(name, address, gender, birthdate)
Constraint: acceptable values for the “gender” attribute are ‘F’ or
‘M’
sgender’F’ AND gender‘M’(MovieStar) = Ø
Other constraints example:
MovieExec(name, address, cert#, networth)
Studio(name, address, presC#)
Constraint: president of a movie studio must have a net worth of at
least $10,000,000
snetworth<10000000(Studio
presC#=cert#
MovieExec) = Ø
Functional dependency constraints
83
© D. Wong 2003
2002
Relational Operations on Bags
Bag
– a “set” that is allowed to have more than one
occurrence of an element
– => duplicate tuples in a relation
– Constraint representations work with bags
Reason:
– For implementation efficiency when duplication is
acceptable
– When actual no. of tuples is needed for aggregate
Example:
A
B
1
2
3
4
1
2
1
2
84
© D. Wong 2003
2002
Relational Operations of Bags (continue)
Given: R and S are bags, and tuple t appears in R n
times, and in S m times
R S : contains n + m tuple t
R – S : contains max(0, n-m) tuple t
R S : contains min(n, m) tuple t
A,B(R) : each tuple is processed independently,
resulting duplicate tuples are not eliminated
sc(R) : apply the selection condition to each tuple
independently, resulting duplicate tuples are not
eliminated
85
© D. Wong 2003
2002
Product and Joins of Bags
Given: R and S are bags, and tuple r appears in R m
times, and tuple s appears in S n times
R x S : the resulting tuple rs will appear mn times.
R S : each tuple of R is compared to each tuple
of S to decide if the pair tuples joins successfully,
do not eliminate duplicates
R CS : each tuple of R is compared to each tuple
of S to decide if the condition C is met, do not
eliminate duplicates
86
© D. Wong 2003
2002
Extended Operation to Relational Algebra
Duplicate elimination : to convert a bag to a set
Aggregation: count, sum, max, min, average
Grouping
Extended Projection
Sorting
Outerjoins
87
© D. Wong 2003
2002
OUTERJOINS
Dangling tuples: tuples that failed to match any tuple of
the other relation in the common attributes.
An operator to augment the result of a join by the
dangling tuples, padded with null values.
R
S : Full outerjoin of R1 and R2 is a join that
includes all rows from R1 and R2 matched or not.
Unmatched rows are padded with special null symbols .
LEFT outerjoin of R1 and R2 is a join that includes all
rows from R1, matched or not, plus the matching values
from R2. Unmatched rows are padded with
.
RIGHT outerjoin of R1 and R2 is a join that includes all
rows from R2, matched or not, plus the matching values
from R1. Unmatched rows are padded with
.
The joining may be NATURAL or theta join
88
© D. Wong 2003
2002
Outer Join Example
R
A
B
C
a
b
d
S
B
C
D
c
b
c
d
b
c
b
c
e
b
b
f
a
d
b
c
a
d
c
d
b
Full Outer Join
Natural join
A
B
C
D
A
B
C
D
a
b
c
d
a
b
c
d
a
b
c
e
a
b
c
e
d
b
c
d
d
b
c
d
d
b
c
e
d
b
c
e
b
b
f
c
a
d
b
c
a
d
b
c
d
b
89
© D. Wong 2003
2002
Extensions to the Relational Model
Modifications : insert, delete, update
Views : relational expression with a name to be
applied real relations to produce the relation
defined by the expression. Views can used as
arguments to other expressions.
Null values : common interpretations:
– Value unknown
– Value inapplicable
– Value withheld
90
© D. Wong 2003
2002
Ch. 10 Logical Query Languages
Motivation
Datalog
Relational Algebra to Datalog
Recursion in Datalog
Negation in Recursive Rules
91
© D. Wong 2003
2002
Motivation
Logical rules is more natural in representing
recursive queries
Logical rules form the basis of many informationintegration applications
92
© D. Wong 2003
2002
A Datalog rule example
Relation:
Movie(title, year, length, inColor, studioName,
producerC#)
LongMovie(t, y) Movie(t, y, l, c, s, p) AND l 100
head
subgoals
body
LongMovie = title, year(slength≥100(Movie))
93
© D. Wong 2003
2002
Datalog rule
Relational Atoms : predicate followed by arguments
Arithmetic Atoms : comparison between two
arithmetic expressions (e.g. x ≠ Y)
Predicate = relation name or arithmetic comparison
predicates (e.g. =, <, ≠, etc)
Head – a relational atom
Body – one or more atoms (subgoals) connected by
AND
Subgoals (not head) may be optionally negated by
NOT
Local variables – variables in body, not in head
94
© D. Wong 2003
2002
Datalog
A logic based data model
The underlying mathematical model of data is
essentially that of the relational model
Predicate symbols denote relations
Relational algebra operations are described by
rules
Query
: a collection of one or more rules.
The relation in the rule head is the answer to the
query
95
© D. Wong 2003
2002
Extensional and Intensional Predicates
Extensional Predicates (EDB)
The set of relations which ARE defined as part of the
actual database (i.e. physically stored).
e.g. R = {1}
Intensional Predicates (IDB)
The set of relations which are NOT defined as part of the
actual database but are instead abstracted from logical
rules.
e.g. P (x) Q (x)
Q (x) R (x)
A predicate must be IDB or EDB but not both.
–IDB predicate can appear in the body or head of a rule
–EDB predicate can appear in the rule body only
96
© D. Wong 2003
2002
3 Different Interpretations of Logical Rules
Proof-Theoretic Interpretaton
Model-Theoretic Interpretation
Computational Interpretation
The following discussions will use the EDB:
R = {1}
97
© D. Wong 2003
2002
Proof-Theoretic Interpretation
As axioms to be used in a proof.
–From the facts in the database, see what other
facts can be proved using the rules in all possible
ways.
–All facts derivable using the rules are derivable
by applying the rules in the forward direction
only
–Example:
P = {1}, Q = {1}
98
© D. Wong 2003
2002
Model-Theoretic Interpretation
As definition of possible worlds or models.
– To be a model, an interpretation must make the rules
true, no matter what assignment of values is made for
the variables in each rule.
– Multiple models are possible
– With no negations, a unique minimal model exists
that gives the same result as the proof-theoretic
interpretation.
– Minimal model : cannot make any true fact false and
still have a model consistent with the EDB
– Example:
1.
P = {1, 2, 3}, Q = {1, 2}
2.
P = {1}, Q={1}
99
© D. Wong 2003
2002
Computational Interpretation
By providing an algorithm for “executing” the rules
to determine whether a potential fact is true or false.
– E.g. Prolog – uses a particular algorithm that
involves searching for proofs of the potential
fact.
– Drawback:
the set of facts Prolog finds a proof is not
necessarily the same as the set of all facts for
which a proof exists
The set of facts Prolog finds true is not
necessarily a model.
100
© D. Wong 2003
2002
Week 5
101
© D. Wong 2003
2002
Example
1.
2.
3.
sibling(X, Y) parent(X, Z) AND parent(Y, Z) AND X ≠ Y.
cousin(X, Y) parent (X, Xp) AND parent (Y, Yp) AND
sibling(Xp, Yp).
cousin(X, Y) parent (X, Xp) AND parent (Y, Yp) AND
cousin((Xp, Yp).
4.
related(X, Y) sibling(X, Y).
5.
related(X, Y) related(X, Z) AND parent(Y, Z).
6.
related(X, Y) related(Z, Y) AND parent(X, Z).
102
© D. Wong 2003
2002
Meaning of Logical Rules
Head is true of its arguments if there exist values
for local variables that make all the subgoals true
If rule contains only non-negated relational
predicates, then value of head variables is the
result of natural join the subgoals and project onto
the head variables
Example:
Cousin(X, Y) = X,Y(P(X, Xp)
Yp))
103
P(Y, Yp)
S(Xp,
© D. Wong 2003
2002
Evaluation of Rules – Variable based
Consider all possible assignments of values to the
variables. If all subgoals are true, add the head to the
result relation.
Example:
–
Only assignments that make the first subgoal true:
Case 1: x = 1, z = 2
Case 2: x = 2, z = 3
–
In case 1, y = 3 makes second subgoal true. Since (1, 3) is
not in R, so the third subgoal is also true.
So, add (x, y) = (1, 3) to S
–
In case 2, no value of y makes the second subgoal true
–
So, S = {(1, 3)}
104
© D. Wong 2003
2002
Evaluation of Rules – Tuple based
Consider all assignments of tuples to subgoals that make
each subgoal true. If the variables are assigned consistent
values, add the head to the result.
Start with non-negated relational subgoals only
Example:
– Start with R(x,z) and R(z, y). Possible tuple assignments
to subgoals:
R(x, z)
R(z, y)
– Only 1 assignment gives
consistent value to z
(1, 2)
(1, 2)
(1, 2)
(2, 3)
– It also makes the third
subgoal true
– So, S = {(1, 3)}
(2, 3)
(1, 2)
(2, 3)
(2, 3)
105
© D. Wong 2003
2002
Safe rules
A rule is safe if all of its variables are limited.
Variable, X, in a rule is limited if:
1. Any variable that appears as an argument in a
relational predicate of the body is limited
2. Any variable that appears in a subgoal X=a,
or a=X, where a is a constant, is limited
3.Variable x is limited if it appears in a subgoal
X=Y or Y=X, when Y is a variable already
known to be limited
Rules must be safe to make sense (i.e. to produce
finite relation as the result)
106
© D. Wong 2003
2002
Safe and unsafe rule Examples
1.
biggerThan(X, Y) X > Y.
unsafe
2.
S(X) R(Y).
unsafe
3.
S(X) R(Y) AND X < Y
unsafe
4.
P(X, Y) Q(X, Z) & W=a & Y=W.
safe
107
© D. Wong 2003
2002
Datalog Rules and Relational Algebra summary
Intersection : AND
Union: multiple rules with same IDB predicate as
rule head
Difference: NOT
Projection: rule head atom has arguments of the
projected attributes
Selection: condition represented by arithmetic
subgoals
Product: AND two subgoals, with head atom
contain all attributes of the two subgoals
Joins
108
© D. Wong 2003
2002
A Datalog rule example
Relation: parent(C,P), where P is a parent of C
sibling(X, Y) parent(X, Z) AND parent(Y, Z) AND X ≠ Y.
head
subgoals
body
S(X,Y) = X,Y(sX≠Y(P(X, Z)
109
P(Y, Z)))
© D. Wong 2003
2002
Recursion
P Q
P depends on Q
Draw a graph: node = IDB predicates, arc P -> Q
means P depends on Q
Cycles means recursion
Example:
cousin(X, Y) parent (X, Xp) AND parent (Y, Yp) AND
cousin((Xp, Yp).
110
© D. Wong 2003
2002
Dependency Graph
related
cousin
sibling
parent
111
© D. Wong 2003
2002
Least Fixedpoint
The smallest IDB relations that contain all tuples
that the rules require us to follow
Iterative Fixed-point Evaluation of recursive
rules:
Start
IDB =
yes
Apply rules
to IDB,
EDB
Changes
to IDB?
112
No
Done
© D. Wong 2003
2002
Fixedpoint Computation Example (Ref. Text)
1. FollowOn(x, y) SequelOf(x, y)
2. FollowOn(x, y) SequelOf(x, z) AND FollowOn(z, y)
Evaluation note:
At round i, the only new tuples added to any IDB
relation comes from match to tuples added in
the round i-1
113
© D. Wong 2003
2002
Negation in Recursive Rules
We can compute a minimal fixedpoint only for
stratifiable recursion with negation.
To determine stratifiability:
1. Draw a graph whose nodes are the IDB
predicates
2. Draw an arc (labeled -) from node A to node B
for rule A not B
3. Draw an arc (no label) from node A to node B
for rule A B
114
© D. Wong 2003
2002
Negation in Recursive Rules (continued)
If graph has a cycle containing one or more negative
arcs, the the recursion is NOT stratified. (When
negation appears in the least fixedpoint operation.)
Stratum of IDB predicate A = the largest no. of
negative arcs on a path beginning from A.
If no negative arcs on any paths from predicate A,
stratum number of A = 0
Evaluate IDB predicates in the order of their strata,
starting with the lowest.
115
© D. Wong 2003
2002
Week 6
116
© D. Wong 2003
2002
Ch. 3 (continued)
Database design problems
Functional Dependency
Keys of relations
Decompositions based on Functional Dependency
(normalization)
– Boyce-Codd Normal Form (BCNF)
– Third Normal Form
– Recovering information from a decomposition
Multivalued Dependencies and decomposition
– Fourth Normal Form
Relationships Among Normal Forms
117
© D. Wong 2003
2002
Database Design Problems
Principal kinds of anomalies that constitute bad
db design:
– Redundancy
– Update Anomalies
– Deletion Anomalies
– Insertion Anomalies
Want to find a good relation schema design for the
relational model
118
© D. Wong 2003
2002
Redundancy
Information repeated unnecessarily in several
tuples.
Example:
– length and filmType for Movie
119
© D. Wong 2003
2002
Update Anomalies
Change information in one tuple but leave the
same information unchanged in another
A consequence of redundancy
Cause potential inconsistency
Example:
– Change length of Star Wars in one tuple but not
the others
120
© D. Wong 2003
2002
Deletion Anomalies
A set of values becomes empty (deleted), may lose
other information as side effect
Example:
– Deleting the only star listed for the movie
Mighty Ducks
121
© D. Wong 2003
2002
Insertion Anomalies
Cannot insert a tuple because some of the data not yet
available
Inverse to deletion anomalies
Problem of using null value to fill the missing / unavailable
data:
– When the data becomes available, will we remember to
delete the one with nulls
– If the missing data is part of a key, then can’t use null
Example:
– Cannot start to keep track of the information of a new
movie when the cast is not yet determined
122
© D. Wong 2003
2002
Dependencies
Dependency is an assertion that only a subset of all
possible relations are ‘legal’. An assertion about the real
world, cannot be proved.
It’s a form of constraints.
Dependencies in a relation means some sort of redundancy
in the legal relations.
Example: title, year length
title
year
length
filmType
studioName
starName
Star wars 1977
124
Color
Fox
CarrieFisher
Star wars 1977
???
Color
Fox
Mark Hamil
Functional dependencies (FD)
Multivalued dependencies (MVD)
123
© D. Wong 2003
2002
Functional Dependencies (FD)
Given: relation schema R(A1, …, An), and X and Y be
subsets of (A1, … An).
FD : X Y means X functionally determines Y
e.g. A1A2…An B1B2…Bm
X Y is an assertion about R that whenever two tuples
agree on all the attributes of X, then they must also agree
on attributes of Y.
It’s a constraint on the data that may appear within a
relation. (i.e. schema level control of data)
It’s a restriction on relations that depend only on the
equality or inequality of values, i.e. value-oblivious
It’s the most important type of value-oblivious constraints.
Value-oblivious constraints have the greatest impact for
designing database schemas
124
© D. Wong 2003
2002
Significance of Functional Dependencies
Key:
1. Entity set R (A1, …, An), and X is a subset of
A1, …, An that form a key for R, then may
assert:
X Y where Y is any subset of {A1, …, An}
2. R(A1, …, An) is a many-one relationship from
entity set E1 to entity set E2, and among the
Ai’s are attributes that form a key X for E1
and a key Y for E2, then may assert: X Y
Tool for explaining the process of normalization
125
© D. Wong 2003
2002
Representing FD using Relational Algebra
Example 5.31
MovieStar (name, address, gender, birthdate)
FD: name address
sMS1.name=MS2.name AND MS1.address MS2.address
(MS1(MovieStar) x MS2(MovieStar)) =
126
© D. Wong 2003
2002
Keys in the E/R Model
key for an entity set E is a set K of one or more attributes
such that, given any two distinct entities e1 and e2 in E, e1
and e2 cannot have identical values for each of the
attributes in the key K.
– A key can consist of more than one attribute
– There can be more than one possible key (candidate
keys), but designate one as the “primary key”.
– Attributes that constitute the primary key cannot be
null
– If the entity set is in an isa-hierarchy, require the root
entity set have all the attributes needed for a key.
– In E/R diagram, attributes of a key are underlined
127
© D. Wong 2003
2002
Keys of Relations
K is a key for relation R if:
1. K all attributes of R
2. For no proper subset of K is (1) true (i.e. a key
must be minimal)
3. If K at least satisfies (1), then K is a superkey
Example: {title, year, starName) forms a key for the
Movie relation
Superkeys:
A set of attributes that contains a key is called a
superkey
128
© D. Wong 2003
2002
Discovering Keys for relations
Consider relation R:
1.
If R comes from an entity set, then the key for the
relation is the key attributes of this entity set
2.
If R is from a many-many relationship, then the keys of
both connected entity sets are the key attributes for R
3.
If R is from a many-one relationship from entity set E1 to
entity set E2, then the keys attributes of E1 are the key
attributes for R
4.
If R is from a one-one relationship, then the key
attributes for either of the connected entity sets are key
attributes of R. More than 1 candidate key in this case.
129
© D. Wong 2003
2002
FD Rules
Splitting / Combining rule
Trivial Dependencies
Armstrong’s Axioms:
1. Reflexivity
2. Augmentatioin
3. Transitivity
Computing closure of attributes
Finding all implied FD’s ( Ref. section: 3.5.7)
130
© D. Wong 2003
2002
Splitting / Combining rule
FD:
A1A2…An B1B2…Bm
I
vs.
A1A2…An B1
II
...
A1A2…An Bm
Splitting rule: replace FD I by the set of FDs II
Combining rule: replace the set of FDs II by FD I
131
© D. Wong 2003
2002
Trivial Dependencies
FD A1A2…An B is trivial if B is one of the A’s
Every trivial dependency holds in every relation
For FD A1A2…An B1B2…Bm
– Trivial if the B’s are a subset of the A’s
e.g. title year title
– Nontrivial if at least one of the B’s is not among the A’s
– Completely nontrivial if none of the B’s is also one of
the A’s
Trivial-dependency rule:
– A1A2…An B1B2…Bm A1A2…An C1C2…Ck where
the C’s are all those B’s that are not also A’s
132
© D. Wong 2003
2002
Armstrong’s Axioms:
A set of inference rules: (Pg. 135, 2nd ed. Pg.99)
1.
Reflexivity
If {B1, B2, …, Bm} {A1, A2, …, An}, then
A1A2…An B1B2…Bm. // trivial dependencies
2.
Augmentatioin
If A1A2…An B1B2…Bm, then
A1A2…AnC1…Ck B1B2…BmC1…Ck for any set of
attributes C1…Ck
3.
Transitivity
If A1A2…An B1B2…Bm and B1B2…Bm C1…Ck then
A1A2…An C1…Ck
133
© D. Wong 2003
2002
Week 7
134
© D. Wong 2003
2002
Functional Dependencies (FD)
Given: relation schema R(A1, …, An), and X and Y be
subsets of (A1, … An).
FD : X Y means X functionally determines Y
e.g. A1A2…An B1B2…Bm
Significance:
– Key:
1. Entity set R (A1, …, An), and X is a subset of A1, …, An
that form a key for R, then may assert:
X Y where Y is any subset of {A1, …, An}
2. R(A1, …, An) is a many-one relationship from entity set
E1 to entity set E2, and among the Ai’s are attributes
that form a key X for E1 and a key Y for E2, then may
assert: X Y
135
© D. Wong 2003
2002
Keys of Relations
K is a key for relation R if:
1. K all attributes of R
2. For no proper subset of K is (1) true (i.e. a key
must be minimal)
3. If K at least satisfies (1), then K is a superkey
Example: {title, year, starName) forms a key for the
Movie relation
Superkeys:
A set of attributes that contains a key is called a
superkey
136
© D. Wong 2003
2002
FD Rules
Splitting / Combining rule
Trivial Dependencies
Armstrong’s Axioms:
1. Reflexivity
2. Augmentatioin
3. Transitivity
Computing closure of attributes
Finding all implied FD’s ( Ref. section: 3.5.7)
137
© D. Wong 2003
2002
Given vs. Derived FD’s
Given FD’s: stated initially for a relation. FD’s
These are known to hold for a relation R.
Derived FD’s are FD’s that follow logically from
the given FD’s (using the inference rules) or by the
closure algorithm of attributes.
Basis : any set of given FD’s from which all FD’s
for a relation can be inferred.
Minimal basis: if no proper subset of the FD’s in a
basis can also derive the complete set of FD’s.
Example: 3.22 – pp. 98
138
© D. Wong 2003
2002
Closure of attributes
Let {A1,A2,…,An} is a set of attributes, S is a set of
FD’s
Closure of {A1,A2,…,An} under the dependencies in
S = set of attributes B such that every relation that
satisfies all the dependencies in set S also satisfies
A1A2…An B. I.e. B is the set of attributes
functionally determined by {A1,A2,…,An}
Notation: {A1,A2,…,An}+
Allow trivial dependencies, so A1A2…An are
always in {A1,A2,…,An}+
139
© D. Wong 2003
2002
Closure Algorithm
Let X be {A1,A2,…,An}+
1. X = {A1,A2,…,An}
// initialize X
2. Search for some FD B1B2…Bm C where
B1B2…Bm X, but C is not in X. If found, add
C to X
3. Repeat step 2 until no more attributes can be
added to X.
4. The resulting X is the correct value of
{A1,A2,…,An}+
140
© D. Wong 2003
2002
Attributes Closure Example. 3.19 pp. 93
Given: R(A, B, C, D, E, F) and FD’s
AB C
BC AD
DE
CF B
To find: {A, B}+
Procedure:
1. Let X = {A, B}
2. Add C to X
3. Add D to X
// AB C, X = {A, B, C}
// BC AD, X = {A, B, C, D}
4. Add E to X
// D E, X = {A, B, C, D, E}
5. End, {A, B}+ = {A, B, C, D, E}
141
© D. Wong 2003
2002
To infer if a FD follows a set of FD in a relation
AB D follows from dependencies in example 3.19?
– Compute {A, B}+ , if D ends up in the closure,
then AB D follows from the dependencies
– Since {A, B}+ = {A, B, C, D, E}, AB D follows.
D A follows from dependencies in example 3.19?
– Compute {D}+ , if A ends up in the closure, then
D A follows from the dependencies
– But {D}+ = {D, E}, D A does NOT follows.
142
© D. Wong 2003
2002
Attributes Closures and Keys
To test if A1,A2,…,An is a key for a relation R:
Compute {A1,A2,…,An}+
1. Check if {A1,A2,…,An}+ is the set of all
attributes of R
2. If yes, then check no subset of A1,A2,…,An, say
S, such that S+ is the set of all attributes of R
If A1,A2,…,An only satisfies 1 but not 2, then
A1,A2,…,An is a superkey
143
© D. Wong 2003
2002
Finding all implied FD’s
Motivation: Given relation R with FDs F. When
projecting R to form new relation S, want to know what
FD’s hold in S
Method: compute all FD’s that
1. Follow from F
2. Involve only attributes of S
Problem: no. of FDs may be large (could be exponential
in the number of attributes of S)
So, make simplifications in computing attribute closure:
– Ignore empty set and set of all attributes (trivial FD’s)
–
–
Drop XY A if X A holds.
If closure of some set X is all attributes, then ignore
supersets of X.
144
© D. Wong 2003
2002
Algorithm to find all implied FD’s
1.
2.
3.
For each set of attributes X compute X+. Start
with closures of singleton set, and then move onto
doubleton if necessary.
Add X A for each A in X+ - X
Ignore “obvious” dependencies that follow from
others as described in the simplification
guidelines.
145
© D. Wong 2003
2002
Finding all implied FD’s Example
Given: R(ABCD), and FD’s F: AB C, C D, D A
What FD’s follow from F?
•
A+ = A; B+ = B
// nothing
•
C+ = ACD
// add C A
•
D+ = DA
// nothing new
•
(AB)+ = ABCD
// add AB D; skip all supersets of AB
•
(BC)+ = ABCD
// nothing new; skip all supersets of BC
•
(BD)+ = ABCD
// add BD C; skip all supersets of BD
•
(AC)+ = ACD ; (AD)+ = (AD); (CD)+ = ACD // nothing new
•
(ACD)+ = ACD
•
All other sets contain AB, BC, or BD, so skip
•
the only interesting FD’s that follow from F are :
// nothing new;
C A, AB D, BD C
146
© D. Wong 2003
2002
Normalization
Purpose: process to eliminate redundancy in
relations due to functional or multi-valued
dependencies.
Normal forms:
– Boyce-Codd Normal Form (BCNF)
– Third Normal Form (3NF)
– Fourth Normal Form (4NF)
147
© D. Wong 2003
2002
Boyce-Codd Normal Form (BCNF)
Definition:
A relation R is in BCNF iff whenever there is a nontrivial
dependency A1A2…An B holds for R, {A1,A2,…,An}
must be a superkey for R
Alternative definition after applying the combining rule to
all FD’s with a common L.S.:
A relation R is in BCNF iff whenever nontrivial
dependency A1A2…An B1B2…Bm holds for R,
{A1,A2,…,An} must be a superkey for R
Guarantees no redundancy, prevents update, deletion, and
insertion anomalies
E.g. title year length filmType studioName // violates
BCNF because key of Movie is {title, year, starName}, so
Movie is not in BCNF
148
© D. Wong 2003
2002
Decomposition to BCNF
Given: relation R, FD’s F.
Decomposition strategy:
1. look for a non-trivial FD A1A2…An B1B2…Bm that
violates BCNF. (Heuristic: add to the RS as many
attributes as are functionally determined by
A1A2…An)
2. Compute {A1,A2,…,An}+ // all the attributes involved
in the violating dependency
Others
3. Decompose R into
1.
2.
A’s
{A1,A2,…,An}+ , and
(R - {A1,A2,…,An}+) {A1,A2,…,An}
B’s
// LS + all
attributes not involved in the violating dependency
149
© D. Wong 2003
2002
BCNF Decomposition Example 3.24 pp 104
Relation: Movie(title, year, length, filmType, studioName,
starName)
Key: {title, year, starName}
FD’s: title year length filmType studioName is a BCNF
violation, so Movie not in BCNF
Decomposition:
Schema 1: {title, year, length, filmType, studioName}
Schema 2: {title, year, starName}
To obtain the new relations, project the schemas onto Movie
To recover information (I.e. Movie) from the new relations:
natural join the new relations. Does not lose information.
150
© D. Wong 2003
2002
Additional points about BCNF
A relation may have more than 1 keys
Need some key in the LHS of any nontrivial FD
Only nontrivial FD’s are potential BCNF violation
candidates
Any 2-attributes relations is in BCNF
Decomposition must be based on FD that holds in
the relation, otherwise can’t recover the original
relation from the new relations.
Eliminates all redundancies
Some decomposition may not preserve the FD’s
(e.g. 3.32 pp 114)
151
© D. Wong 2003
2002
Third Normal Form (3NF)
Definition:
A relation R is in 3NF if: whenever there is a nontrivial
dependency A1A2…An B holds for R, either
{A1,A2,…,An} is a superkey for R, or B is a member of
some key (i.e. B is prime).
BCNF with a relaxed condition
Some redundancy might be left in 3NF if resulting relations
are not in BCNF
Preserves all the FD’s
Process similar to BCNF with the added condition
To recover information (I.e. Movie) from the new relations:
natural join the new relations. Does not lose information.
152
© D. Wong 2003
2002
Week 9
153
© D. Wong 2003
2002
Normalization
Purpose: process to eliminate redundancy in
relations due to functional or multi-valued
dependencies.
Decompose relation schema into Normal forms:
– Boyce-Codd Normal Form (BCNF)
– Third Normal Form (3NF)
– Fourth Normal Form (4NF)
To obtain the new relations, project the schemas
onto the original relation schema (e.g. Movie)
To recover information (I.e. Movie) from the new
relations: natural join the new relations.
154
© D. Wong 2003
2002
BCNF Decomposition Example 3.24 pp 104
Relation: Movie(title, year, length, filmType, studioName,
starName)
Key: {title, year, starName}
FD’s: title year length filmType studioName is a BCNF
violation, so Movie not in BCNF
Decomposition:
Schema 1: {title, year, length, filmType, studioName}
Schema 2: {title, year, starName}
To obtain the new relations, project the schemas onto Movie
To recover information (I.e. Movie) from the new relations:
natural join the new relations. Does not lose information.
155
© D. Wong 2003
2002
Functional Dependencies (FD)
Given: relation schema R(A1, …, An), and X and
Y be subsets of (A1, … An).
FD : X Y means X functionally determines Y
e.g. A1A2…An B1B2…Bm
A1A2…An B1B2…Bm is an assertion about R
that two attributes or sets of attributes in R are
dependent of one another.
156
© D. Wong 2003
2002
Mutivalued Dependencies (MVD)
relation schema R, and A1A2…An and B1B2…Bm be
subsets of attributes of R.
Given:
MVD : A1A2…An B1B2…Bm holds in R if :
For each pair of tuples t and u of relation R that agree on
all the A’s, we can find in R some tuple v that agrees:
1. With both t and u on the A’s,
2. With t on the B’s, and
3. With u on all attributes of R that are not among the
A’s or B’s
A1A2…An B1B2…Bm is an assertion about R that two
attributes or sets of attributes in R are independent of one
another.
Cause redundancy not related to FD’s in a BCNF schema.
Most common source: putting 2 or more many-many
relationships in a single relation.
157
© D. Wong 2003
2002
MVD Rules
Trivial dependencies rule
If A1A2…An B1B2…Bm holds for R, then A1A2…An
C1C2…Ck holds where the C’s are the B’s + one or
more of the A’s. The converse also hold.
Transitive rule
If A1A2…An B1B2…Bm and B1B2…Bm C1C2…Ck
then A1A2…An C1C2…Ck
Splitting rule does not hold
E.g. name street city, but not name street
So, always start with set of attributes on the R.S. because
splitting rule does not hold.
158
© D. Wong 2003
2002
More MVD Rules
Every FD is an MVD
Because If FD A1A2…An B1B2…Bm, then swapping B’s
between tuples that agree on A’s doesn’t create new tuples.
A’s
B’s
t
u
Complementation rule
If X Y, then X Z, where Z is all attributes not in
X or Y
e.g. Star_Star_In {name, street, city, title, year}
name street city
name title year
159
© D. Wong 2003
2002
Nontrivial MVD
A1A2…An B1B2…Bm for a relation R is
nontrivial if:
1.
B1B2…Bm is not a subset of A1A2…An
2.
A1A2…An B1B2…Bm is not all attributes of R
160
© D. Wong 2003
2002
Fourth Normal Form (4NF)
Decompose relations that has MVD’s into 4NF to
eliminate MVD’s.
Definition:
R is in 4NF if A1A2…An B1B2…Bm is a
nontrivial MVD, {A1A2…An} is a superkey.
Since every FD is an MVD, so 4NF is more
stringent than BCNF
Only nontrivial MVD’s has the potential to violate
4NF
161
© D. Wong 2003
2002
4NF Decomposition
Given: relation R, and nontrivial MVD X Y that violate
4NF
1.
Decompose X Y into XY and X (R-Y)
R
X
Y
2.
Produce the relations by projecting R onto XY and
(R-Y)
X
3.
Reconstruct R from the new relations using natural join
e.g. Star_Star_In {name, street, city, title, year} and
name street city
Decompose Star_Star_In using name street city into
{name, street, city} and {name, title, year}
162
© D. Wong 2003
2002
Relationships among normal forms
4NF is the most stringent
4NF BCNF 3NF
163
© D. Wong 2003
2002
Lossless-join decomposition
Given: Relation R, decomposed into schemes R1, R2, …
Rk, and D is a set of dependencies.
Definition: R1, R2, … Rk is a lossless-join (w.r.t. D) if for
every relation r for R satisfying D:
r = R1(r)
R2(r)
…
Rk(r)
i.e. Every relation r for R is the natural join of
its projections onto the Ri’s.
The lossless-join property is necessary if the decomposed
relation is to be recoverable from its
decomposition.
However, joins are expensive. So, don’t over decompose!
164
© D. Wong 2003
2002
Structured Query Language (SQL)
A DDL and DML for relational DBMSs
History: ANSI SQL, , SQL-92 (SQL2), SQL-99 (SQL3)
SQL-99 extends SQL2 with object-relational features and
other new features
Most DBMS vendors implements the core, and then add
bells and whistles and variations
Query capability is close to relational algebra, with lots of
extensions.
Case insensitive except characters inside quoted strings ' '
e.g. 'Smith' 'SMITH'
; as statement delimiter
165
© D. Wong 2003
2002
Example database schema
Movie(title, year, length, inColor, studioName, producerC#)
StartIn(movieTitle, movieYear, starName)
MovieStar(name, address, gender, birthdate)
MovieExec(name, address, cert#, netWorth)
Studio(name, address, presC#)
166
© D. Wong 2003
2002
SQL Quries – basic form
SELECT attribute/s
FROM relations / views /subqury
WHERE conditional expression;
167
© D. Wong 2003
2002
SQL query examples
1.
Example 1:
SELECT *
FROM Movie;
2.
-- * => all attributes of Movie
Example 2:
SELECT *
FROM Movie
WHERE studioName = 'Disney' AND year = 1990;
3.
Example 3:
SELECT title, length
FROM Movie
WHERE studioName = 'Disney' AND year = 1990;
168
© D. Wong 2003
2002
Duplicates
SQL generally operates using bags instead of sets
Exception: UNION, INTERSECT, EXCEPT
operation
To eliminate duplicates, add keyword DISTINCT
to the SELECT clause
e.g. SELECT DISTINCT starName
FROM StarsIn;
Duplicate elimination is costly. Use judiciously.
169
© D. Wong 2003
2002
SQL Correspondence to Relational Algebra
SELECT L
-- R.A. project
FROM R
-- R.A. operands
WHERE C ;
-- R.A. select
R.A. expression: L(sC(R))
When reading and writing queries:
1.
FROM
-- what relations are involved
2.
WHERE
-- what's the tuples selection criteria
3.
SELECT
-- what columns to output
170
© D. Wong 2003
2002
Union, Intersection, Difference of Queries
UNION : R1 UNION R2
or (Q1) UNION (Q2)
e.g. (SELECT title, year FROM Movie)
UNION
(SELECT movieTitle AS title, movieYear AS
year FROM StarsIn);
INTERSECT : R1 INTERSECT R2 or
(Q1) INTERSECT (Q2)
EXCEPT: R1 EXCEPT R2
-- difference
(Q1) EXCEPT (Q2)
171
© D. Wong 2003
2002
Union, Intersection, Difference of Queries (continued)
Q1 and Q2 are queries that produce relations
R1 and R2, or results of Q1 and Q2 should have
the same list of attributes and attribute types.
Rename if necessary.
Duplicates are eliminated automatically
Add the keyword ALL after UNION,
INTERSECT, or EXCEPT to prevent duplicates
elimination
172
© D. Wong 2003
2002
SQL and Relational Algebra
The six independent operations are implemented
by SQL
SQL is relational complete
173
© D. Wong 2003
2002
Some data values in SQL
1.
Strings
2.
Dates and Times
3.
Null values
4.
Truth value of Unknown
174
© D. Wong 2003
2002
1. Strings
Comparison operators (according to lexicographical order)
<, >, <=, >= =
LIKE -- pattern matching
% -- matches any sequence of 0 or more characters
_ -- matches any one character
E.g.: title LIKE 'Star _ _ _ _'
E.g.: title LIKE '%''s%'
Can specify escape character
E.g.
title LIKE 'x%%x%' ESCAPE 'x'
175
© D. Wong 2003
2002
2. Dates and Times
Date constant: DATE '2002-10-01'
Time constant: TIME '15:00:02.5'
Timestamp (combines dates and times):
TIMESTAMP '2002-10-01 15:00:02.5‘
(beware of implementation differences!)
Comparison operators apply
176
© D. Wong 2003
2002
3. Null Values
NULL to represent:
1. Value unknown
2. Value inapplicable
3. Value withheld
Operations involving NULL
1. Arithmetic operation: result is NULL
2. Comparison: result is UNKNOWN
NULL is not a constant, therefore NULL cannot be used
explicitly as an operand.
IS NULL and IS NOT NULL checks
Read "Pitfalls Regarding Nulls" pp. 250
177
© D. Wong 2003
2002
4. UNKNOWN
Consider TRUE = 1, FALSE = 0, UNKNOWN =
0.5
1. AND of 2 truth-value = min. of the 2 values
2. OR of 2 truth-value = max. of the 2 values
3. Negation of v = 1-v
Refer Fig. 6.2 pp. 250 for truth table for 3-valued
logic
178
© D. Wong 2003
2002
The Six Clauses in SQL Queries
1.
SELECT
-- required
2.
FROM
-- required
3.
WHERE
4.
GROUP BY
5.
HAVING
6.
ORDER BY
Subqueries may appear in the FROM clause and the
WHERE clause
Comments begins with ‘--’
-- if used, must follows a group by clause
179
© D. Wong 2003
2002
Table level SQL (ref. 6.6, pp. 292)
Create table – to define the schema of a base table
(Ref. 6.6.1 for data types syntax)
E.g.
create table EMP
(
empno int not null,
lastName varchar(30) not null,
firstName varchar(30) not null,
num_of_children int,
constraint pk_EMP primary key (empno)
);
Drop table – to destroy a base table
e.g. drop table EMP;
180
© D. Wong 2003
2002
Tuple Modification Statements (ref. 6.5, pp. 286)
Insert – to add a row
Syntax: insert into R(A1..An) values (v1…vn)
– E.g. insert into emp(empno, lastName, firstName,
num_of_children) values (12345, ‘Doe’, ‘John’, 1)
– Or insert into emp values (12345, ‘Doe’, ‘John’, 1)
Delete – to remove a row
Syntax: delete from R where <condition>
– E.g. delete from emp where empno = 12345
Update – to modify the contents of a row
Syntax: update R set Ai = value where Aj = targetValue
– E.g. update emp set num_of_children = 2 where empno =
12345
181
© D. Wong 2003
2002
Some JOINS in SQL. (ref. pp. 270)
CROSS JOIN
-- R.A. cartesian product
e.g. Movie CROSS JOIN StarsIn;
JOIN … ON
-- R.A. theta-join
e.g. Movie JOIN StarsIn ON title = movieTitle AND year =
movieYear;
[NATURAL] JOIN
-- R.A. natural join
e.g. MovieStar NATURAL JOIN MovieExec; or
MovieStar JOIN MovieExec;
OUTERJOINS
-- joins that include dangling tuples
182
© D. Wong 2003
2002
OUTERJOINS
An operator to augment the result of a join by the
dangling tuples, padded with null values.
Full outerjoin of R1 and R2 is a join that includes all
rows from R1 and R2 matched or not. Unmatched rows
are padded with NULLs.
LEFT outerjoin of R1 and R2 is a join that includes all
rows from R1, matched or not, plus the matching values
from R2. Unmatched rows are padded with NULLs.
RIGHT outerjoin of R1 and R2 is a join that includes all
rows from R2, matched or not, plus the matching values
from R1. Unmatched rows are padded with NULLs.
The joining may be NATURAL or theta join
183
© D. Wong 2003
2002
Outerjoins Syntax
1.
R1 NATURAL {FULL | LEFT | RIGHT} OUTER
JOIN R2;
E.g. 1. MovieStar NATURAL FULL OUTER
JOIN MovieExec;
E.g. 2. MovieStar NATURAL LEFT OUTER
JOIN MovieExec;
E.g. 3. MovieStar NATURAL RIGHT OUTER
JOIN MovieExec;
184
© D. Wong 2003
2002
Outerjoins Syntax (continued)
1.
R1 {FULL | LEFT | RIGHT} OUTER JOIN R2
ON conditional expression;
E.g. 1. Movie FULL OUTER JOIN StarsIn ON
title = movieTitle AND year = movieYear;
E.g. 2. MovieStar LEFT OUTER JOIN StarsIn
ON title = movieTitle AND year = movieYear;
E.g. 3. MovieStar RIGHT OUTER JOIN StarsIn
ON title = movieTitle AND year = movieYear;
185
© D. Wong 2003
2002
Use result of joins as subqueries in queries
E.g.
SELECT title, year, length, inColor, studioName,
producerC#, starName
FROM Movie JOIN StarsIn ON
title = movieTitle AND year = movieYear;
186
© D. Wong 2003
2002
Week 10
187
© D. Wong 2003
2002
SQL (continued)
SQL continued
Oracle Sequence
Views
188
© D. Wong 2003
2002
SQL Correspondence to Relational Algebra
SELECT L
-- R.A. project
FROM R
-- R.A. operands
WHERE C ;
-- R.A. select
R.A. expression: L(sC(R))
When reading and writing queries:
1.
FROM
-- what relations are involved
2.
WHERE
-- what's the tuples selection criteria
3.
SELECT
-- what columns to output
189
© D. Wong 2003
2002
SQL and Relational Algebra
The six independent operations are implemented
by SQL
SQL is relational complete
190
© D. Wong 2003
2002
Some data values in SQL
1.
Strings
2.
Dates and Times
3.
Null values
4.
Truth value of Unknown
191
© D. Wong 2003
2002
4. UNKNOWN
Consider TRUE = 1, FALSE = 0, UNKNOWN =
0.5
1. AND of 2 truth-value = min. of the 2 values
2. OR of 2 truth-value = max. of the 2 values
3. Negation of v = 1-v
Refer Fig. 6.2 pp. 250 for truth table for 3-valued
logic
192
© D. Wong 2003
2002
SQL data types (pp. 292, 6.6.1)
Character strings:
char(n)
-- fixed length
varchar(n)
-- varying length
Integer
int or integer
Floating-point numbers
float, real
dec (m, n), or decimal(m, n) -- floating point number
with fixed decimal places, where m > n.
e.g. dec(7, 2)
Dates and times
date, time, timestamp, datetime
Refer to handout for SQL more data types description.
193
© D. Wong 2003
2002
SQL syntax: the Six Clauses in SQL Queries
1.
2.
3.
4.
5.
6.
SELECT
FROM
WHERE
-- required
-- required
GROUP BY
HAVING
-- if used, must follows a group by
clause, to choose the groups based on some aggregate
properties of the group itself
ORDER BY
-- to sort the output
E.g.
Select name, sum(length) from MovieExec, Movie
Where producerC# = cert#
group by name having min(year) < 1930
Order by name desc
194
© D. Wong 2003
2002
SQL syntax (continued)
Subqueries may appear in the FROM clause and
the WHERE clause
Comments begins with ‘--’
Tuple variables : an alias to a Relation, defined in
the from clause
– E.g.
Select Star1.name, Star2.name
From MovieStar Star1, MovieStar Star2
Where Star1.address = Star2.address and
Star1.name < Star2.name
195
© D. Wong 2003
2002
Interpreting Multi-relation Queries in SQL
To define the meaning of select-from-where
MovieStar(name, addr, gender, bDate)
name
addr
gender
bDate
J. Anniston
Addr ABC
F
8/88/88
H. Ford
Addr LMN
M
7/7/77
B. Pitt
Addr ABC
M
8/1/88
SELECT Star1.name, Star2.name
FROM MovieStar Star1, MovieStar Star2
WHERE Star1.addr = Star2.addr AND Star1.name <
Star2.name;
196
© D. Wong 2003
2002
3 Equivalent ways of interpretataion
1.
2.
Nested Loops
Parallel assignment
– No explicitly created nested loops
–
–
–
–
Consider all possible assignments of tuples from the
relations to the tuple variables
Each assignment that produce a true value from the
WHERE clause contribute a tuple to the answer
Construct the resulting tuples using the attributes of
the SELECT clause
Similar to the tuple based Datalog rule evaluation, the
WHERE clause serves the purpose of the arithmetic
subgoals
197
© D. Wong 2003
2002
Evaluation of Rules – Tuple based
Consider all assignments of tuples to subgoals that make
each subgoal true. If the variables are assigned consistent
values, add the head to the result.
Start with non-negated relational subgoals only
Example:
– Start with R(x,z) and R(z, y). Possible tuple assignments
to subgoals:
R(x, z)
R(z, y)
– Only 1 assignment gives
consistent value to z
(1, 2)
(1, 2)
(1, 2)
(2, 3)
– It also makes the third
subgoal true
– So, S = {(1, 3)}
(2, 3)
(1, 2)
(2, 3)
(2, 3)
198
© D. Wong 2003
2002
3. Conversion to R.A
FROM : Cartesian product (rename if necessary)
WHERE : as the selection criteria
SELECT : specify the projection operation
attributes
199
© D. Wong 2003
2002
Unintuitive Result of SQL Interpretation
Given: unary relations R, S, T
To compute: R (ST)
Consider query:
SELECT R.A
FROM R, S, T
WHERE R.A = S.A OR R.A = T.A;
Consider the case if T is empty.
200
© D. Wong 2003
2002
Some JOINS in SQL. (ref. pp. 270)
CROSS JOIN
-- R.A. cartesian product
e.g. Movie CROSS JOIN StarsIn;
JOIN … ON
-- R.A. theta-join
e.g. Movie JOIN StarsIn ON title = movieTitle AND year =
movieYear;
[NATURAL] JOIN
-- R.A. natural join
e.g. MovieStar NATURAL JOIN MovieExec; or
MovieStar JOIN MovieExec;
OUTERJOINS
-- joins that include dangling tuples
201
© D. Wong 2003
2002
OUTERJOINS
An operator to augment the result of a join by the
dangling tuples, padded with null values.
Full outerjoin of R1 and R2 is a join that includes all
rows from R1 and R2 matched or not. Unmatched rows
are padded with NULLs.
LEFT outerjoin of R1 and R2 is a join that includes all
rows from R1, matched or not, plus the matching values
from R2. Unmatched rows are padded with NULLs.
RIGHT outerjoin of R1 and R2 is a join that includes all
rows from R2, matched or not, plus the matching values
from R1. Unmatched rows are padded with NULLs.
The joining may be NATURAL or theta join
202
© D. Wong 2003
2002
Outerjoins Syntax
1.
R1 [NATURAL] {FULL | LEFT | RIGHT}
OUTER JOIN R2;
E.g. 1. MovieStar NATURAL FULL OUTER
JOIN MovieExec;
E.g. 2. MovieStar NATURAL LEFT OUTER
JOIN MovieExec;
E.g. 3. MovieStar NATURAL RIGHT OUTER
JOIN MovieExec;
203
© D. Wong 2003
2002
Outerjoins Syntax (continued)
2.
R1 {FULL | LEFT | RIGHT} OUTER JOIN R2
ON conditional expression;
E.g. 1. Movie FULL OUTER JOIN StarsIn ON
title = movieTitle AND year = movieYear;
E.g. 2. MovieStar LEFT OUTER JOIN StarsIn
ON title = movieTitle AND year = movieYear;
E.g. 3. MovieStar RIGHT OUTER JOIN StarsIn
ON title = movieTitle AND year = movieYear;
204
© D. Wong 2003
2002
Use result of joins as subqueries in queries
E.g.
SELECT title, year, length, inColor, studioName,
producerC#, starName
FROM Movie JOIN StarsIn ON
title = movieTitle AND year = movieYear;
205
© D. Wong 2003
2002
Conditions involving Relations
1.
2.
3.
4.
EXIST R – true iff R is not empty
S IN R – true iff S = one of the values in R
S NOT IN R
S > ALL R – true iff S > every value in R (> can be the
other comparison operators). Operator NOT applies.
S > ANY R – true iff S > at least 1 value in R (> can be the
other comparison operators). Operator NOT applies.
Case 1: S is scalar, R is unary
Case 2: S is a tuple, R is a multi-attribute relation (arity S
must = arity R)
When compare tuples, the first value has highest priority, and
subsequent value the next highest. E.g. (1, 7, 8) < (2, 0, 1)
Ref: 6.3-6.4 .
206
© D. Wong 2003
2002
Table level SQL (ref. 6.6, pp. 292)
Create table – to define the schema of a base table
(Ref. 6.6.1 for data types syntax)
E.g.
create table EMP
(
empno int not null,
lastName varchar(30) not null,
firstName varchar(30) not null,
num_of_children int,
constraint pk_EMP primary key (empno)
);
Drop table – to destroy a base table
e.g. drop table EMP;
207
© D. Wong 2003
2002
Tuple Modification Statements (ref. 6.5, pp. 286)
Insert – to add a row
Syntax: insert into R(A1..An) values (v1…vn)
– E.g. insert into emp(empno, lastName, firstName,
num_of_children) values (12345, ‘Doe’, ‘John’, 1)
– Or insert into emp values (12345, ‘Doe’, ‘John’, 1)
Read: P. 288 – Timing of Insertion.
Delete – to remove a row
Syntax: delete from R where <condition>
– E.g. delete from emp where empno = 12345
Update – to modify the contents of a row
Syntax: update R set Ai = value where Aj = targetValue
– E.g. update emp set num_of_children = 2 where empno =
12345
208
© D. Wong 2003
2002
Modifying Relation Schema
DROP TABLE – eliminate all data
Alternative is to ALTER TABLE:
1. To add a column
ALTER TABLE tablename ADD columnName typespec;
2. To delete a column
ALTER TABLE tablename DROP columnName;
3. To change an attribute size
ALTER TABLE tablename MODIFY (columnName
typespec);
209
© D. Wong 2003
2002
Default Value
Used when a value is not supplied for a tuple when
create or modify. System assume null when none
specified except for NOT NULL attributes.
Defined with data declarations:
e.g. Phone CHAR(16) DEFAULT ‘unlisted’
210
© D. Wong 2003
2002
Views
Relations that are defined by an expression like a
query.
Views can be queried, and can even be modified
in some cases
Declaring views in SQL:
CREATE VIEW <viewName> AS <view
definition>;
Use view to present what the user is allowed to
see
A view exists only within the environment of a
session.
211
© D. Wong 2003
2002
View Example
Given: base table
Movie(title, year, length, inColor, studioName,
presidentC#)
CREATE VIEW ParamountMovie AS
SELECT title, year
FROM Movie
WHERE StudioName = 'Paramount';
212
© D. Wong 2003
2002
SQL queries to View
1.
2.
3.
Has the same syntax as a stored relation
E.g.
Select title
From ParamountMovie
Where year = 1979;
When query to view, the tuples are obtained from the
base table.
Equivalent to the following query without view:
Select title
From Movie
Where studioName = 'Paramount' and ear = 1979;
Ref. Pp.308, 6.7.5 (Interpreting queries involving views)
213
© D. Wong 2003
2002
Update views
A view update is translated to modify the base table.
Only updatable views can be modified.
Views that can be updated:
–
–
Simple
Created by SELECT , but not SELECT DISTINCT,
some attributes from R :
1.
The WHERE clause must not involve R in a
subquery
2.
The select clause must include enough attributes
that for every tuple inserted into the view, the
other attributes can be filled out with NULL of
default value, and have a tuple of the base relation
that will yield the inserted tuple of the view.
214
© D. Wong 2003
2002
Update view problem e.g.
E.g.
INSERT into ParamountMovie VALUES ('Star
Trek', 1979);
Does not meet criteria 2 because:
The base relation Movie would have NULL rather
than 'Paramount' as StudioName
215
© D. Wong 2003
2002
Other operations on Views
DELETE tuple from a view = delete from base
table, with the condition defining the view added
to the condition of the delete statement
UPDATE view = update the base table, with the
condition defining the view added to the condition
of the update statement
DELETE view: does not affect the base table
DROP VIEW <viewName>;
216
© D. Wong 2003
2002
Oracle's Sequences
Sequential lists of numbers that are generated
automatically by the database.
Useful for creating unique surrogate key values
for primary key attribute when no desirable
attribute exist to use as the primary key
Data type of attributes using sequences is
NUMBER
217
© D. Wong 2003
2002
Oracle's Sequences (continued 1)
To create a sequence (SIMPLE FORM):
CREATE SEQUENCE <sequence name> [INCREMENT
BY <number>] [START WITH <start value number>];
e.g. CREATE SEQUENCE ClassSequence INCREMENT
BY 1 START WITH 1;
To access the next sequence value:
<sequence name>.NEXTVAL
To use <sequence name>.NEXTVAL in an insert
statement:
INSERT INTO <table name> VALUES (<sequence
name>.NEXTVAL, <field2> … );
e.g INSERT INTO class (classkey, classid, description,
section, location) VALUES (ClassSequence.NEXTVAL,
?, ?, ?, ?);
218
© D. Wong 2003
2002
Oracle's Sequences (continued 2)
To access the current sequence value:
<sequence name>.CURRVAL
may use in select, insert statements
May be used immediately after using the
NEXTVAL in the same database session.
To view property of a sequence using the
"user_sequences" system table:
SELECT * FROM user_sequences;
To delete a sequence:
DROP SEQUENCE <sequence name>;
219
© D. Wong 2003
2002
Week 11
220
© D. Wong 2003
2002
Indexes
JDBC
JDBC in J2EE (Java 2 Enterprise Edition)
221
© D. Wong 2003
2002
Indexes
An index on an attribute A is a data structure to
improve query performance efficiency
Reason: not efficient to scan all tuples (for large
relations) in order to find the few that meet a
given condition
E.g.
SELECT * FROM Movie
WHERE studioName = ‘Disney’ AND year = 1990;
Not part of SQL standard
222
© D. Wong 2003
2002
Indexes Typical Syntax
To create index
CREATE INDEX indexName ON R(A1,…An)
E.g.
1. CREATE INDEX YearIndex ON Movie(year);
2. CREATE INDEX KeyIndex ON Movie(title, year);
Delete Index
DELETE INDEX yearIndex;
223
© D. Wong 2003
2002
Selection of Indexes
Index design requires an estimate of the typical
mix of queries and other operations on the db
Example of good use of indexes:
1. An attribute frequently compared to constant
in a where clause of a query
2. Attribute that appear frequently in join
operations
e.g. SELECT name FROM Movie, MovieExec
WHERE title = ‘status’ AND producerC#
= cert#;
224
© D. Wong 2003
2002
Decision factors
Important to strike a balance.
Factors:
1. Given attribute A, and index on A will:
–
–
Greatly speed up queries with a condition
on that attribute
May speed up joins involving A
2. Index make insertion, deletion, and updates
more complex and time-consuming
225
© D. Wong 2003
2002
Indexes (continued)
Techniques to execute SQL queries are intimately
associated with storage structures. Typically, a
relation is stored in many disk blocks.
An index is an auxiliary structure, perhaps stored
in a separate file, that support fast access to the
rows of a table.
Main cost of a query or modification is I/O:
No. of disk blocks to be read into memory and
write onto disk
226
© D. Wong 2003
2002
Index Selection Example
Given Relation: StarsIn(movieTitle, movieYear, starName)
3 operations to perform:
1. Q1: select movieTitle, movieYear From StarsIn
where starName = s;
2. Q2: select starName from StarsIn where movieTitle
= t and movieYear = y;
3. I: insert into StarsIn values(t, y, s);
where s, t, y are some constants
227
© D. Wong 2003
2002
Example's assumptions
1.
Cost for examining 1 disk block = 1 unit
2.
StarsIn is stored in 10 disk blocks
3.
Average no. of stars in a movie = 3
4.
Average no. of movies that a star appeared in = 3
5.
Tuples for a given star or movie are likely to be
spread over the 10 disk blocks of StarsIn
6.
One disk access is needed to read a block of the
index every time when the index is used to locate
tuples with a given value for the indexd
attribute(s)
228
© D. Wong 2003
2002
Example's Estimated Cost of actions
Action No Index
Star Index
Movie Index Both Indexes
Q1
10
4
10
4
Q2
10
10
4
4
I
2
4
4
6
Cost of 3
actions
2+8p1+8p2
4+6p2
4+6p1
6-2p1-2p2
Costs associated with the three actions, as a function of which
indexes are selected (Ref. Fig. 6.17 2nd ed.)
Star index is an index on StarName,
Movie index is an index on MovieTitle and movieYear.
The numbers in rows 2-5 of the table are no. of disk accesses for the action.
229
© D. Wong 2003
2002
Example's usage scenarios
The fraction of the time to do Q1 = p1, Q2 = p2,
I=1-p1-p2
Consider:
– Case 1: p1 = p2 = 0.1
– Case 2: p1 = p2 = 0.4
– Case 3: p1=0.5, p2=0.1
What is the best index strategy for each case?
Create only the index that helps the most
frequently used query type (e.g. query about stars
=> create Star index)
230
© D. Wong 2003
2002
JDBC (Java DataBase Connectivity)
An API to the database driver manager
Provides call-level interface for the execution of
SQL statements from a Java language program
Developed by SUN Microsystems
An integral part of the Java language
Java applications use the JDBC dialect of SQL,
independent of the DBMS used
Support applications to request information about
the schema from the DBMS at run time.
231
© D. Wong 2003
2002
JDBC Interaction with SQL DB
1.
Make connection to the database
2.
Create SQL statement
3.
Execute the SQL statement
4.
Result table from the SQL "select" statement is
returned as a Java object. JDBC provide
methods to access the rows and columns.
OR
5.
SQL statements return simple integer results that
represent the number of affected rows (e.g.
insert)
232
© D. Wong 2003
2002
Connecting to a db through JDBC
Driver 1
Application
Driver
Manager
Driver 2
Driver 3
DBMS
server
JDBC Modules
233
© D. Wong 2003
2002
JDBC - java.sql package
Defines a collection of interfaces and classes that allow
programs to interact with db.
Interfaces for primary SQL execution:
–
Driver: supports data connection creation
–
Connection: represents connection between a java client
and and SQL database server
–
Statement: includes methods for executing text queries
–
PreparedStatement: represents a precompiled and stored
query
–
CallableStatement: used to execute SQL stored procedures
–
ResultlSet: contains the results of a query
–
ResultSetMetaData: information about a ResultSet,
including the attribute names and types
234
© D. Wong 2003
2002
Client-server Architectures
User Interface /
Application
User Interface /
Application
User and
Application tier
Middleware
Middle tier
Application
Database
Server
Database
Server
235
Database
Server tier
© D. Wong 2003
2002
JDBC in J2EE
J2EE – Java 2 Enterprise Edition, a middle layer server
Connection to DBMS using JDBC (e.g. Cloudscape,
Oracle, MS SQL)
J2EE Platform – services and architecture
Enterprise JavaBeans (EJB)
– Session Beans vs. Entity Beans
EJB access to databases using JDBC
– Database connection
– Persistence management (Entity Bean e.g.)
– Transaction management (Session Bean e.g.)
236
© D. Wong 2003
2002
J2EE Services
HTTP - enables Web browsers to access servlets
and JavaServer PagesTM (JSP) files
EJB - allows clients to invoke methods on
enterprise beans
Authentication - enforces security by requiring
users to log in
Naming and Directory - allows programs to locate
services and components through the Java
Naming and Directory InterfaceTM (JNDI) API
237
© D. Wong 2003
2002
J2EE Architecture
Ref. JavaTM 2 Enterprise Edition Developer's Guide, Figure 1-2
238
© D. Wong 2003
2002
Enterprise JavaBeans (EJB)
Server-side Java components
Contain the business logic of enterprise application
Support database access
Transactional
Multi-user secure
Managed by the EJB container
Prohibited from a set of operations
239
© D. Wong 2003
2002
Session Bean vs. Entity Bean
Session Bean
Entity Bean
Purpose
Performs a task for a Represents a business
client
entity object that exists in
persistent storage.
Shared
Access
May have one client.
May be shared by multiple
clients.
Persistence
Not persistent.
Persistent. Entity state
remains in a database.
Ref. JavaTM 2 Enterprise Edition Developer's Guide, Table 1-1
240
© D. Wong 2003
2002
EJB Access to Databases Using JDBC API
J2EE uses
1. JDBC 2.0 (java.sql) and
2. JDBC 2.0 Optional package (javax.sql)
To make a connection to database in J2EE :
1. Should not hardcode the actual name (URL) of the
database in EJB
2. Should refer to the database with a logical name
3. Use a JNDI lookup when obtaining the database
connection.
241
© D. Wong 2003
2002
Driver and Data source properties
In J2EE configuration file, resource.properties, specify:
Driver
e.g. 1 Cloudscape that is packaged with Sun’s J2EE
jdbcDriver.0.name=COM.cloudscape.core.RmiJdbcDriver
e.g. 2 Oracle
jdbcDriver.0.name= oracle.jdbc.driver.OracleDriver
JDBC URL
e.g. 1 Cloudscape
jdbcDataSource.0.name=jdbc/Cloudscape
jdbcDataSource.0.url=jdbc:cloudscape:rmi:CloudscapeDB;create=true
e.g. 2 Oracle
jdbcDataSource.0.name=jdbc/Oracle
jdbcDataSource.0.url= jdbc:oracle:thin:@bigoh.cis.uab.edu:1521:cs610
242
© D. Wong 2003
2002
Making a connection to database example
1. Specify the logical database name.
private String dbName = "java:comp/env/jdbc/AccountDB";
2. Obtain the DataSource associated with the logical name.
InitialContext ic = new InitialContext();
DataSource ds = (DataSource) ic.lookup(dbName);
3. Get the Connection from the DataSource.
Connection con = ds.getConnection(username, password);
243
© D. Wong 2003
2002
Specifying JNDI name for deployment Step 1: Enter the code name
244
© D. Wong 2003
2002
Step 2: Map the coded name to the JNDI name
245
© D. Wong 2003
2002
Persistence Management
Container-Managed Persistence
• Entity bean code does not contain database access calls.
• The EJB container generates the SQL statements.
Bean-Managed Persistence
• Entity bean code contains the database access calls
(SQLs) (i.e. you write the code!)
246
© D. Wong 2003
2002
Container Managed example: Product entity bean
ProductEJB.java
ProductHome.java
Product.java
ProductClient.java
Bean Managed example: Account entity bean
AccountEJB.java
AccountHome.java
Account.java
AccountClient.java
247
© D. Wong 2003
2002
248
© D. Wong 2003
2002
249
© D. Wong 2003
2002
250
© D. Wong 2003
2002
Week 12
251
© D. Wong 2003
2002
EJB Access to Databases Using JDBC API
J2EE uses
1. JDBC 2.0 (java.sql) and
2. JDBC 2.0 Optional package (javax.sql)
To make a connection to database in J2EE :
1. Should not hardcode the actual name (URL) of the
database in EJB
2. Should refer to the database with a logical name
3. Use a JNDI lookup when obtaining the database
connection.
252
© D. Wong 2003
2002
Driver and Data source properties
In J2EE configuration file, resource.properties, specify:
Driver
e.g. 1 Cloudscape that is packaged with Sun’s J2EE
jdbcDriver.0.name=COM.cloudscape.core.RmiJdbcDriver
e.g. 2 Oracle
jdbcDriver.0.name= oracle.jdbc.driver.OracleDriver
JDBC URL
e.g. 1 Cloudscape
jdbcDataSource.0.name=jdbc/Cloudscape
jdbcDataSource.0.url=jdbc:cloudscape:rmi:CloudscapeDB;create=true
e.g. 2 Oracle
jdbcDataSource.0.name=jdbc/Oracle
jdbcDataSource.0.url= jdbc:oracle:thin:@bigoh.cis.uab.edu:1521:cs610
253
© D. Wong 2003
2002
Making a connection to database example
1. Specify the logical database name.
private String dbName = "java:comp/env/jdbc/AccountDB";
2. Obtain the DataSource associated with the logical name.
InitialContext ic = new InitialContext();
DataSource ds = (DataSource) ic.lookup(dbName);
3. Get the Connection from the DataSource.
Connection con = ds.getConnection(username, password);
254
© D. Wong 2003
2002
Specifying JNDI name for deployment Step 1: Enter the code name
255
© D. Wong 2003
2002
Step 2: Map the coded name to the JNDI name
256
© D. Wong 2003
2002
Persistence Management
Container-Managed Persistence
• Entity bean code does not contain database access calls.
• The EJB container generates the SQL statements.
Bean-Managed Persistence
• Entity bean code contains the database access calls
(SQLs) (i.e. you write the code!)
257
© D. Wong 2003
2002
Container Managed example: Product entity bean
ProductEJB.java
ProductHome.java
Product.java
ProductClient.java
Bean Managed example: Account entity bean
AccountEJB.java
AccountHome.java
Account.java
AccountClient.java
258
© D. Wong 2003
2002
259
© D. Wong 2003
2002
260
© D. Wong 2003
2002
261
© D. Wong 2003
2002
Transaction Management
Ref. JavaTM 2 Enterprise Edition Developer's Guide
262
© D. Wong 2003
2002
J2EE Resources
• JavaTM 2 SDK, Enterprise Edition Technical Documentation
JavaTM 2 Enterprise Edition Developer's Guide
• http://java.sun.com/j2ee/j2sdkee/
• http://archives.java.sun.com/archives/j2ee-interest.html
• Developing Enterprise Applications with the JavaTM 2
Platform Enterprise Edition
http://java.sun.com/j2ee/blueprints/
• Major commercial implementations:
•WebLogic - Bea
•Websphere - IBM
265
© D. Wong 2003
2002
.Net Data Providers
Purpose:
– connect, read, and execute commands against
data sources
– other functions, such as the management of
input and output parameters, security,
transactions, and database server errors.
266
© D. Wong 2003
2002
.NET data provider classes
1.
SqlConnection
2.
SqlCommand
3.
SqlDataReader - can use for read-only
applications from a SQL Server data source
4.
SqlDataAdapter - acts as a bridge between a
remote SQL Server data source and a DataSet
class instance inside a Visual Basic .NET solution
267
© D. Wong 2003
2002
Semi-structured Data Model (Ref. 4.6)
Blends class and relation for
1. Flexibility, therefore suitable for integration of
database.
2. Serve as a document model in notation such
as XML
3. Schemaless – the data is “self-describing”, the
schema is attached to the data itself, which
can vary arbitrarily over time and within a
single db.
268
© D. Wong 2003
2002
Semi-structured Data Model Utility
An information integration tool to integrate legacy
databases
Legacy database problem:
The problem of integrating multiple databases
designed independently, and used over time for
many different applications.
269
© D. Wong 2003
2002
Semi-structured Data Representation
A db of semi-structured data is a collection of
nodes
Node types:
– Leaf – have associated data of atomic types (e.g.
numbers, strings)
– Interior – have 1 or more arcs out
– Root – an interior node has no arcs entering it,
representing the entire db (I.e. the entry point)
Every node must be reachable from the root.
270
© D. Wong 2003
2002
Semistructured Data Representation (continued)
Arc – has a label to indicate relationship between
2 nodes. Consider arc L connecting nodes M, N,
the 2 roles that an arc serves:
1. If M is an object, N is an attribute, then L
represents the name of the attribute
2. If M and N are both objects, L is the name of
the relationship from N to M
The structure is a graph, not necessarily a tree.
E.g. Figure 4.19
271
© D. Wong 2003
2002
XML
XML : eXensible Markup Language.
Tag-based notation for “marking” documents
Plain text
Formats of tag:
1. <tag> string of data </tag>
e.g. <phone>123-456-7890</phone>
2. <tag/>
<!-- empty tag-->
e.g. <flag/>
Two modes:
– Well-formed XML
– Valid XML
Comment format: <!– your comments -->
272
© D. Wong 2003
2002
HTML vs. XML
HTML tags
For describing the presentation of the data (e.g.
<B> abc </B>) by a web browser
XML tags
For describing the meaning of the data (e.g.
<phone> 123-456-7890 </phone>)
273
© D. Wong 2003
2002
Well-formed XML
Requirements:
Document begin with a declaration (prologue):
<?xml version="1.0"? STANDLONE=“yes”?>
It has a root.
Every opening tag is followed by a matching closing tag,
and the elements are properly nested inside each other.
E.g. <BODY> … </BODY>
Any attribute can occur at most once in a given opening
tag, its value must be provided and quoted.
E.g Figure 4.21
274
© D. Wong 2003
2002
Well-formed XML (continued)
Invent your own tags.
Resembles semi-structure data model:
– Schemaless
– Self-describing
– Object like
275
© D. Wong 2003
2002
Valid XML
Uses a Document Type Definition (DTD) to
specify:
– Allowable tags
– Grammar for how the tags may be nested.
More flexible than a strict-schema model
– E.g. allow optional or missing fields
More restrictive than a completely schemaless
model
276
© D. Wong 2003
2002
XML elements and Entities/Objects (1)
XML evolved from a document markup language
(SGML) rather than a database language
e.g. <Address>
John lives on
<Street> Main St </Street>
house number
<number>123</number>
In a remote township
</Address>
Mixed data and text is a hindrance for database
277
© D. Wong 2003
2002
XML elements and Entities/Objects (2)
XML elements are ordered whereas the attributes
and the tuples in a relation are not
E.g.
<Address> <Number>123-4567</Number>
<Street>Main St. </Street> <Address>
<Address> <Street>Main St. </Street>
<Number>123-4567</Number> </Address>
But, these tuples are equivalent:
Number
Street
123-4567 Main St.
278
Street
Number
Main St.
123-4567
© D. Wong 2003
2002
XML Element types
Complex types
– Elements that contain sub-elements or carry attributes
e.g. <message to=“[email protected]"
from=“[email protected]" subject="XML">
<text>
XML basics …
</text>
</message>
Simple types
– elements that contain numbers (and strings, and dates,
etc.) but do not contain any subelements
– Built-in or author defined
Attributes always have simple types.
279
© D. Wong 2003
2002
Document Type Definition (DTD)
It’s set of rules for constructing an XML document.
– i.e. a grammar that specifies the schema of a legal
XML document
Specified by XML authors:
– As part of the document itself, or
– Stored separate from the document. The
document refers to the URL where the DTD is
stored.
An XML document that conforms to it’s DTD is
said to be valid.
280
© D. Wong 2003
2002
DTD Structure
Basic DTD structure:
<!DOCTYPE root-tag [
<!ELEMENT element-name(components)>
more elements
]>
The name of the DTD must coincide with the tag name of
the root element of the document that conforms to that
DTD.
One ELEMENT statement for each allowed tag, including
the root tag
For each tag that can have attributes, the ATTLIST
statement specifies the allowed attributes and their type.
Component of #PCDATA means simple text (i.e. no tags
nested within)
281
© D. Wong 2003
2002
DTD Structure (continue)
Component occurrence indicators:
– * : zero or more
– + : one or more
– ? : zero or one time
– | : either or
e.g. (#PCDATA | (STREET CITY)
Empty tags: <!ELEMENT elementName, EMPTY>
e.g. <!ELEMENT Flag EMPTY>
E.g. Figure 4.22 (2n. Ed.) – a DTD for movie stars
Figure 4.23 (2n. Ed.) – a document following DTD in
Figure 4.22
282
© D. Wong 2003
2002
Using a DTD
Two ways:
1. Include the DTD at the beginning of the
document
2. In the document’s opening line, refer to the
DTD that is stored separately in a file system
accessible to the application that is processing
the document.
e.g.
<?XML VERSION = “1.0” STANDALONE = “no”?>
<!DOCTYPE Stars SYSTEM “star.dtd”>
283
© D. Wong 2003
2002
Attribute Lists
Format:
<!ATTLIST ElementName
attributeName type [Modifier]
...
>
284
© D. Wong 2003
2002
Some attribute types
1.
2.
CDATA
<!-- simple text -->
ID
An attribute of type ID must have a unique value through
out the document
E.g. if attr1 and attr2 are ID type, then
<elt1 attr1="123"> and <elt2 attr2="123"> is illegal
Imitate key in relational db
3.
IDREF
An attribute of type IDREF must refer to a valid ID
declared in the same document
Imitate foreign key
4.
IDREFS
Refer to a list of valid IDs declared in the document
285
© D. Wong 2003
2002
Some Modifier Types
#IMPLIED
– The attribute is optional.
– Can remain unspecified
#REQUIRED
– The attribute is mandatory.
– E.g.
movieId ID #REQUIRED
286
© D. Wong 2003
2002
Example: Use DTD to alleviate the order problem
E.g.:
<!ELEMENT Person ((Name, Id, Address) |
(Name, Address, Id) | (Id, Address, Name) | (Id,
Name, Address) | (Address, Id, Name) | (Address,
Name, Id))>
But, it's awkward!
287
© D. Wong 2003
2002
XML Schema
A DDL for XML documents
Purpose: to define a class of XML documents
Instance document: an XML document that
conforms to a particular schema (schema valid).
Instances and schemas may exist as:
– documents in files
– streams of bytes sent between applications
– fields in a database record
– collections of XML "Infoset "Information Items
288
© D. Wong 2003
2002
Triggers
Available in Oracle and SQL99
Event-Condition-Action rules: define an action
the db should take when some db-related events
occurs
289
© D. Wong 2003
2002
Trigger vs. other constraints
1.
Triggers are awakened when certain events,
specified by db programmer, occur. E.g. update,
insert, delete to a relation; end of a transaction
2.
Other constraints immediately prevent the event
if the constraint is violated. For trigger, it's
condition is tested when the event occurs, if the
condition does not hold, then the action
associated with the trigger will not happen (I.e.
trigger will not be fired)
3.
If the trigger condition is true, the action is
performed by the DBMS (I.e. trigger fired). So,
it's transparent to the user
290
© D. Wong 2003
2002
Trigger Syntax
CREATE [OR REPLACE] TRIGGER <triggerName>
{BEFORE | AFTER | INSTEADOF}
{DELETE | INSERT | UPDATE [of column, …] }
[OR {DELETE | INSERT | UPDATE [of column, …] }
ON {tableName | viewName}
[REFERENCING { OLD [AS] <oldName> , NEW [AS]
<newName> …]
FOR EACH {ROW | STATEMENT}
[WHEN (condition) ] <action>;
291
© D. Wong 2003
2002
Oracle Triggers
<action> is PL/SQL block
Simplest is :
BEGIN
<SQL statement>
END;
In the PL/SQL action block, variables OLD and NEW are preceded
by : e.g. :OLD
Follow the create trigger statement with a Dot (.) and then RUN to
store the definition in the db
The action cannot change the relation that triggers the action, nor to
relations connected to the triggering relation by a constraint e.g. FK
constraint
Read 7.4.2 – 7.4.4
292
© D. Wong 2003
2002
SQL3 Triggers
<action> can be:
1.
a single SQL statement
2.
A SQL statements, separated by ; enclosed in a
BEGIN
<SQL statements>
END;
293
© D. Wong 2003
2002
Constraints Summary
1.
Primary Key declaration
2.
Foreign Key – referential integrity constraint
3.
Constraints within relations:
Attribute constraints: 1. NOT NULL; 2.
CHECK
Tuple based CHECKs
4.
5.
Schema level constraints – SQL2 assertions (not
in Oracle)
Triggers – Oracles's and SQL99's
294
© D. Wong 2003
2002
Week 13
295
© D. Wong 2003
2002
Java API for XML Processing (JAXP)
For processing XML data using applications written in
Java
JAXP APIs - javax.xml.parsers package
Leverages 2 parser standards :
SAX (Simple API for XML Parsing)
parse the data as a stream of events, a serial-access
mechanism that does element-by-element processing
DOM (Document Object Model) (easier to use)
build a tree structure of objects to represent the
data
org.w3c.dom
Detail review of the enrollment example (files posted
on the course web page as JAXP example)
http://java.sun.com/j2ee/1.4/docs/tutorial/doc/JAXPIntro5.html
296
© D. Wong 2003
2002
Ch. 7 Constraints and Triggers
Keys as constraint
Foreign key constraints and referential integrity
Constraints on attributes
Constraints on tuples
Assertions
Triggers
297
© D. Wong 2003
2002
Keys as Constraint
One primary key per relation
Declared in CREATE TABLE
2 ways:
1. List with attribute (when only 1 attribute in key) e.g.
snum char (5) PRIMARY KEY
2. Add to the list of items declared in the schema. E.g.
CREATE TABLE supplier (
snum CHAR(3),
sname VARCHAR(15),
status INT,
city VARCHAR(15)
CONSTRAINT pk_supplier PRIMARY KEY(snum)
);
298
© D. Wong 2003
2002
Keys (continued)
Primary Key attribute cannot be NULL
2 tuples in R cannot agree on all of the attributes in the
primary key. DBMS rejects violating insertions or updates.
Can also declare a key with UNIQUE
– Can be used instead of PRIMARY KEY
– Can have multiple UNIQUE in a table
– UNIQUE attributes permit NULL
Key constraints are checked at insert and updates
Use of index created with key attributes for efficient key
enforcement checks
299
© D. Wong 2003
2002
Foreign-Key Constraints
To enforce referential integrity constraints
E.g. SPJ (type 1), S, P, J (type 2)
The reference attribute of the relation of type 2
must be declared PRIMARY or UNIQUE
Again 2 ways to declare in CREATE TABLE
1. Inline with declaration
2. Add to the list of items declared in the schema
300
© D. Wong 2003
2002
FK as list of items declared in the schema e.g.
CREATE TABLE spj (
snum CHAR(3),
pnum CHAR(3),
jnum CHAR(3),
qty INT,
CONSTRAINT fk_spj_1 FOREIGN KEY (snum) REFERENCES
S(snum) DEFERRABLE INITIALLY DEFERRED,
CONSTRAINT fk_spj_2 FOREIGN KEY (pnum) REFERENCES
P(pnum) DEFERRABLE INITIALLY IMMEDIATE,
CONSTRAINT fk_spj_3 FOREIGN KEY (jnum) REFERENCES
J(jnum)
-- this one just use default, i.e. not DEFERRABLE
);
301
© D. Wong 2003
2002
FK – maintaining referential integrity
3 Policies:
1. Default : Reject violating Modifications
2. Cascade : changes to the referenced attributes are
mimicked at the foreign key
3. Set Null : set the foreign key to null when the
referenced attributes are changed (delete / update)
Declare for delete and update separately
Declare with the declaration of foreign key
Dangling tuples – violate referential integrity constraint
for the foreign key. Dealt with the 3 policies.
302
© D. Wong 2003
2002
Maintaining referential integrity - Example
CREATE TABLE spj (
snum CHAR(3),
pnum CHAR(3),
jnum CHAR(3),
qty INT,
CONSTRAINT fk_spj_1 FOREIGN KEY (snum) REFERENCES
S(snum) ON UPDATE CASCADE DEFERRABLE INITIALLY
DEFERRED,
CONSTRAINT fk_spj_2 FOREIGN KEY (pnum) REFERENCES
P(pnum) ON DELETE SET NULL DEFERRABLE INITIALLY
IMMEDIATE,
CONSTRAINT fk_spj_3 FOREIGN KEY (jnum) REFERENCES
J(jnum)
-- just use default
);
303
© D. Wong 2003
2002
Constraints on Attributes and Tuples
Limits the values of some attributes.
Constraints within relations
Defined in relation's schema as:
1. Constraints on an attribute
2. Constraints on tuples
304
© D. Wong 2003
2002
Constraints on the attributes
Not-null constraints
e.g. pnum CHAR(3) NOT NULL,
CONSTRAINT fk_spj_2 FOREIGN KEY (pnum)
REFERENCES P(pnum) ,
Attribute-Based CHECK
– Checked whenever any tuple gets a new value for this
attribute (e.g. update, insert)
– Reject violating operations
– The check is expressed as a conditional expression
e.g. qty INT CHECK (qty <= 10000),
– Invisible to other relations (so, can be violated if the
relations referenced in the condition are changed)
– More examples in text Pp. 328 ex. 7.8, 7.9
305
© D. Wong 2003
2002
Tuple-Based CHECK Constraints
Applies to one or more attributes of a table
Checked when a tuple is inserted or updated. Reject if it's
a violating operation.
Invisible to other relations (so, can be violated if the
relations referenced in the condition are changed)
E.g.
CREATE TABLE MovieStar(
name CHAR(30) PRIMARY KEY,
gender CHAR(1),
CHECK (gender = 'F' OR name NOT LIKE'Ms.%')
);
306
© D. Wong 2003
2002
Modification of Constraints
Only to constraints that are named.
Naming of constraints examples:
1. snum char (5) CONSTRAINT pk_supplier
PRIMARY KEY ,
2. CONSTRAINT fk_spj_1 FOREIGN KEY (snum)
REFERENCES S(snum) DEFERRABLE,
3. Gender CHAR(1) CONSTRAINT NoAndro
CHECK (gender in ('F', 'M')),
307
© D. Wong 2003
2002
Modifications
Delete constraints
ALTER TABLE <tableName> DROP CONSTRAINT
<constraintName>;
Add constraint
ALTER TABLE <tableName> ADD CONSTRAINT
<constraintName> <constraint specification>;
e.g. ALTER TABLE student ADD CONSTRAINT
PK_student PRIMARY KEY (studentkey);
Set constraint – to change a deferrable constraint from
immediate to deferred and vise versa
SET CONSTRAINT <constraintName> DEFERRED;
308
© D. Wong 2003
2002
Assertions
An assertion is a schema level CHECK constraint.
Exist independent of any particular table.
It's condition can refer to any table in the db schema
Declaring an assertion:
CREATE ASSERTION <constraintName>
CHECK <condition>
[constraint attributes];
constraint attributes: [NOT] DEFERRABLE; INITIALLY
IMMEDIATE; INITIALLY DEFERRED
The constraint is violated if the condition is false
Any db modifications that causes a violation will be
rejected
309
© D. Wong 2003
2002
Triggers
Available in Oracle and SQL99
Event-Condition-Action rules: define an action
the db should take when some db-related events
occurs
Triggers are useful for:
1. To maintain complex integrity constraints
2. To audit changes made to table
3. To signal to other programs that changes were
made to a table
310
© D. Wong 2003
2002
Trigger vs. other constraints
1.
Triggers are awakened when certain events,
specified by db programmer, occur. E.g. update,
insert, delete to a relation; end of a transaction
2.
Other constraints immediately prevent the event
if the constraint is violated. For trigger, it's
condition is tested when the event occurs, if the
condition does not hold, then the action
associated with the trigger will not happen (I.e.
trigger will not be fired)
3.
If the trigger condition is true, the action is
performed by the DBMS (I.e. trigger fired). So,
it's transparent to the user
311
© D. Wong 2003
2002
Trigger Syntax
CREATE [OR REPLACE] TRIGGER <triggerName>
{BEFORE | AFTER | INSTEADOF}
{DELETE | INSERT | UPDATE [of column, …] }
[OR {DELETE | INSERT | UPDATE [of column, …] }
ON {tableName | viewName}
[REFERENCING { OLD [AS] <oldName> , NEW [AS]
<newName> …]
FOR EACH {ROW | STATEMENT}
[WHEN (condition) ] <action>;
312
© D. Wong 2003
2002
Oracle Triggers
<action> is PL/SQL block
Simplest is :
BEGIN
<SQL statement>
END;
In the PL/SQL action block, variables OLD and NEW are preceded
by : e.g. :OLD
Follow the create trigger statement with a Dot (.) and then RUN to
store the definition in the db
The action cannot change the relation that triggers the action, nor to
relations connected to the triggering relation by a constraint e.g. FK
constraint
Read 7.4.2 – 7.4.4
313
© D. Wong 2003
2002
SQL3 Triggers
<action> can be:
1.
a single SQL statement
2.
A SQL statements, separated by ; enclosed in a
BEGIN
<SQL statements>
END;
314
© D. Wong 2003
2002
Constraints Summary
1.
Primary Key declaration
2.
Foreign Key – referential integrity constraint
3.
Constraints within relations:
Attribute constraints: 1. NOT NULL; 2.
CHECK
Tuple based CHECKs
4.
5.
Schema level constraints – SQL2 assertions (not
in Oracle)
Triggers – Oracles's and SQL99's
315
© D. Wong 2003
2002
Ch. 8 - System Aspects of SQL
SQL in programming environments
– Statement Level Interface
Embedded
SQL, SQLJ
Dynamic SQL
– PSM
Oracle’s
PL/SQL
– Call Level Interface
SQL environment
316
© D. Wong 2003
2002
Host Language Interface
Host language: conventional language that applications are
written in. e.g C, Java, C#, Visual Basic .Net
Statement level interface
– Static: Embedded SQL, SQLJ
– Dynamic SQL
Call level interface (CLI)
Use of CLI (e.g. JDBC) for static transaction is less
efficient at run time than statement-level interfaces
because CLI’s preparation and execution generally involve
separate communications with the DBMS, which is costly.
317
© D. Wong 2003
2002
Embedded SQL
Shared variables
– Host language variables that can be read/written by
SQL statement
– Serves as interface between the host language and the
SQL execution system
– When used in embedded SQL statements, precede the
variable with a : e.g. :name
Embedded SQL statements are prefixed with EXEC SQL
Preprocessor translate the EXEC SQL statements into
function calls in host language, then compile, linked with
SQL-related library to form executable code. Fig. 8.1 pp
351
318
© D. Wong 2003
2002
Embedded SQL (continued)
SQLSTATE
– a special variable (an array of 5 char)
– Serves to connect the host-language program
with the SQL execution system
– updated each time a SQL library function is
called
– Some of the codes are:
‘00000’
= no error
‘02000’ = no tuple found
319
© D. Wong 2003
2002
Embeddable SQL statements
Any SQL statement that does not return a result (I.e. not
a select-from-where query) can be embedded. E.g. insert,
delete, create, update.
select-from-where queries are not embeddable directly.
For connecting the result of queries with a host-language
program, must use one of the following methods:
1. Use single-row select for query that produces a single
tuple : select-into-from-where. Ref. fig. 8.3 pp 355
EXEC SQL SELECT A INTO :var FROM R WHERE
<cond>;
2. Use CURSOR for queries producing more than one
tuple Fig. 8.4 pp 357
320
© D. Wong 2003
2002
CURSOR - Ref. Fig. 7.4 pp.379 (Fig. 8.4 pp 357)
An object used to store the output of a query for processing
in an application. It provides the mechanism to reference
the current position in a result set, and to do positioned
updates and deletes.
Declare cursor in embedded SQL by:
EXEC SQL DECLARE <cursor> CURSOR FOR <query>
Open cursor before use:
EXEC SQL OPEN <cursor>
Use Fetch statement to get the next tuple of the relation
over which the cursor range
EXEC SQLFETCH FROM <cursor> INTO <list of shared
variables>
SQLSTATE of ‘02000’ means no more tuples found
To close a cursor when done: EXEC SQL CLOSE
321
© D. Wong 2003
2002
Cursor options
INSENSITIVE
guarantee that changes to underlying relation made
between one opening and closing of cursor will not
affect the set of tuples already fetched.
FOR READ ONLY
DBMS will prevent modifications to the underlying
relation through this cursor
SCROLL
The cursor may be accessed with any one of the :
FETCH {NEXT / PRIOR/ FIRST / LAST /RELATVE n/
ABSOLUTE n}
ResultSet in JDBC is a cursor
322
© D. Wong 2003
2002
SQLJ
ANSI standard Statement-level Interface to JAVA
A dialect of embedded SQL that can be included in JAVA
program
Goal:
To obtain the run-time efficiency of embedded SQL for
Java applications while retaining the advantage of
accessing DBMS through JDBC
Embedded SQLJ constructs are replaced by calls to an
SQLJ run-time package, which access database using
JDBC.
Benefit: the pre-compiler (translator in Oracle) can check
SQL syntax and the number and types of arguments and
results.
323
© D. Wong 2003
2002
Differences between SQLJ, Embedded SQL and JDBC
1.
SQLJ supports essentially SQL-92, much more portable
across DBMS vendors. For embedded SQL, each DBMS
vendor supports its own proprietary version of SQL.
2.
An SQLJ clause in a JAVA program begin with #SQL
instead of EXEC SQL, and can contain select-from-where
statement inside { }
Any JAVA variables can be included as a parameter in an
SQL statement (prefix the variable with : ) , same as in
embedded SQL
In SQLJ, a query returns an SQLJ interator object
instead of a ResultSet object (as in JDBC). But, it’s
similar to ResultSet in that they provide a cursor
mechanism.
Both SQLJ statement and JDBC calls can be included in
the same Java program
3.
4.
5.
324
© D. Wong 2003
2002
Oracle’s SQLJ script
To use: sqlj file.sqlj
Many command lines options are availabe, refer to
Oracle developer guide at
http://otn.oracle.com/tech/java/sqlj_jdbc/pdf/a96655.pdf
Invokes translator to preprocess java programs
with SQLJ clauses (suffix of those files can be
.sqlj, .java). Translator generates .java file which
contains the JDBC calls for the SQL statements
Invokes javac to compile
Invokes JVM to execute
325
© D. Wong 2003
2002
Dynamic SQL
Used when the SQL statement is not known at
compile time
E.g. an application that prompt the user for an
SQL query, read, and then execute the query
(can you think of an application?)
Host language program must instruct SQL to:
1. Take a string and turn it into an executable
SQL statement:
EXEC SQL PREPARE sqlvar FROM sharedVar;
2. Execute the prepared statement:
EXEC SQL EXECUTE sqlvar;
326
© D. Wong 2003
2002
Dynamic SQL (continued)
Steps 1 and 2 can be combined by:
EXEC SQL EXECUTE IMMEDIATE sharedVar;
The 2 steps approach is beneficial if a prepared
statement is execute multiple times.
Ref. Fig. 8.7 pp. 368
327
© D. Wong 2003
2002
Week 14
328
© D. Wong 2003
2002
Persistent, Stored Modules (PSM) (Ref. 8.2 )
A way to create and store procedures or functions
with a database schema
The procedure/functions can be used in SQL
queries or other SQL statements to perform within
the database computations that cannot be
expressed in SQL query language
Like a simple, general-purpose language
SQL/PSM standard (PSM-96). Commercial
DBMS offers such capabilities with variations (e.g.
Oracle, MS SQL)
329
© D. Wong 2003
2002
PSM Modules
Collection of function and procedure definitions,
temporary relation declarations, and other declarations.
(PSM modules in Oracle are Packages)
PSM Procedure
CREATE PROCEDURE
<name> (parameters)
PSM Function
CREATE FUNCTION
<name> (parameters)
local declarations
return <type>
procedure body;
local declarations
procedure body;
Parameters: mode-name-type triples
Modes: IN, OUT, INOUT
330
© D. Wong 2003
2002
Statement Forms in PSM
1.
CALL –statement (for procedures only)
CALL <procedure name> (<parameter list>);
i.
From a host-language program:
e.g. EXEC SQL CALL proc(:x, 3); // x is a
shared variable
ii. As a statement of another PSM function or
procedure
iii. As an SQL command in an SQL client (e.g.
SQLPLUS). E.g CALL proc (1, 3)
331
© D. Wong 2003
2002
Statement Forms in PSM (continued)
2.
RETURN statement (for functions only)
3.
Declaration of local variables (precede executable
statements):
DECLARE <name> <type>;
4.
Assignment statement:
SET <variable> = <expression>;
5.
Statement group in BEGIN … END block
6.
Statement label -
7.
Branching statement (ref. Fig. 8.9, 8.10 )
labelName:
332
© D. Wong 2003
2002
Statement Forms in PSM (continued 2)
8.
9.
Queries (select-from-where):
Loops
a) LOOP <statement list> END LOOP;
b) Only to iterate over a cursor:
FOR <loop name> AS <cursor name> CURSOR FOR
<query> DO
<statement list>
END FOR
c) WHILE <condition> DO <statement list> END
WHILE;
d) REPEAT <statement list> UNTIL <condition> END
REPEAT;
333
© D. Wong 2003
2002
PL/SQL of Oracle
Oracle’s procedural language (PL). It’s a superset
of SQL.
Used to create stored procedures/functions and
packages, to trigger database events, or to add
programming logic to SQL commands
Ref. : Ullman and students’s introduction
http://www-db.stanford.edu/~ullman/fcdb/oracle/or-plsql.html
334
© D. Wong 2003
2002
Call-level Interface (CLI)
The application is written entirely in the host language.
SQL statements are the values of string variables
constructed at run time
The string variables are passed as arguments to host
language procedures provided by the CLI library of
functions.
No special syntax, no pre-compiler is needed.
System independence – in principle!
SQL 99's CLI standard is an adaptation of ODBC (Open
Database Connectivity) and the host language is C
SQL/CLI in Java is JDBC (discussed in week 9)
335
© D. Wong 2003
2002
The SQL Environment
It's the framework in which data may exist and
SQL operations on data may be executed. I.e. a
DBMS running at an installation.
SQL standard of elements hierarchy: Fig. 8.15
2nd ed
1. Schema – collection of tables, views,
assertions, triggers, PSM, modules, etc.
2. Catalogs – collections of schemas
3. Clusters – collections of catalogs. Each user
has an associated cluster.
336
© D. Wong 2003
2002
The SQL Environment (continued)
SQL clients and servers – processes that runs of the same
or different hosts.
– Server : support operations on database elements
– Client : allow a user to connect to a server and operate
on a database (e.g. SQLPLUS)
Connection – a "connection" must be made between the
server and the client by executing the CONNECT
statement on the client host. Default connection is made by
an SQL client (e.g. SQLPLUS).
Session – SQL operations that are performed while a
connection is active. Each session has a current catalog,
schema, and an authorized user. Reminder: temporary
elements such as view exists only within the environment of
a session.
337
© D. Wong 2003
2002
The SQL Environment (continued 2)
Modules
–
An SQL term for an application program
–
3 kinds of modules in SQL standard:
1.
2.
3.
–
Generic SQL interface, e.g. SQLPLUS
Embedded SQL
True modules, e.g. PSM
SQL agent – a SQL module in execution Ref.
Fig. 8.17 pp. 384
338
© D. Wong 2003
2002
SQL Transactions
– Example in JDBC
Security and User Authorization in SQL
Object Oriented Model
– ODL
– OQL
Object Relational Systems
Recursion in SQL
339
© D. Wong 2003
2002
Transactions in SQL (8.6 - 2nd ed.)
Concerns with enforcing the ACID properties of transactions
Review: ACID properties of “proper” execution:
– Atomicity : All of the updates of a transaction are
successful, or no update take place
– Consistency: Each transaction should leave the database in
a consistent state
– Isolation: Each transaction, when executed concurrently
with other transactions, should have the same effect as if it
had been executed by itself (serializable behavior)
– Durability: Once a transaction has completed successfully,
its changes to the database should be permanent. Even
serious failures should not affect the permanence of a
transaction.
340
© D. Wong 2003
2002
Serializability and Atomicity
Serializable function execution:
If the function executions behave as if they were
run serially, even though their execution may
overlap in time.
Example of problem with concurrent function
executions that are not serialized:
Ex. 8.26, Fig. 8.22 and Fig. 8.23 pp. 398
Example of problem when failure (system,
network, power) occurs during a database
operation:
Ex. 8.27, Fig. 8.24 pp. 400
341
© D. Wong 2003
2002
Transactions
To solve the serializability and atomicity problem.
Def.: A collection of one or more operations on the
database that must be executed atomically, (i.e.
either all are done or none are) and in a serializable
manner (i.e. isolation).
End of a SQL transaction is marked by either
operations:
1. COMMIT - the database will be permanently
updated
2. ROLLBACK – no changes by the transaction
appear in the database (abort the transaction)
342
© D. Wong 2003
2002
Transactions (continued)
In query interface (e.g. SQLPLUS):
Transactions are single queries or modification
statements.
In program interface (e.g. JDBC):
1. Single SQL statement: use automatic commit mode. A
transaction:
starts when the statement begins execution
ends when an automatic commit or rollback is
completed
2. More than 1 SQL statement:
–
Disable auto-commit
–
After all SQL statements in a transaction are
executed, call commit; or rollback if error occurs
343
© D. Wong 2003
2002
Deferred constraints
When the commit method is executed, if there are
deferred constraints that need to be checked, and
these constraints are now violated, then the
transaction is not committed.
Instead, the transaction is rolled back and the
error is reported via the SQLSTATE variable.
In JDBC, this SQLSTATE is reported via the
SQLException object's getSQLState() method.
344
© D. Wong 2003
2002
Code segment to show transaction control in JDBC:
try { // ... add code here for making connection ...
PreparedStatement updateSales = con.prepareStatement("update COFFEES " +
"set SALES = ? where COF_NAME like ?");
PreparedStatement updateTotal = con.prepareStatement( "update COFFEES " +
"set TOTAL = TOTAL + ? where COF_NAME like ?") ;
con.setAutoCommit(false);
updateSales.setInt(1, 50);
updateSales.setString(2, "Colombian");
updateSales.executeUpdate();
updateTotal.setInt(1, 50);
// first update
updateTotal.setString(2, "Colombian");
updateTotal.executeUpdate();
// second update
con.commit();
con.setAutoCommit(true);
} catch(SQLException ex) {
if (con != null) {
try { System.err.print("Error detected! Transaction is being rolled back!");
con.rollback();
} catch(SQLException excep) {
System.err.print("SQLException: " + excep.getMessage());
} } // end if there is a connection
}
345
© D. Wong 2003
2002
Transaction Isolation Levels
Determine whether and how a transaction will be affected
by concurrent transactions – transactions from different
SQL sessions that are accessing the same data.
SQL isolation levels are to address:
1. Dirty Read: a read of dirty data. Dirty data is data
written by a transaction that has not yet committed.
2. Non-repeatable Read: One transaction reads a row.
Another deletes or modifies it and COMMIT before
the first one does. Now, if the first transaction
perform the same read again will get different result.
3. Phantom tuples: tuples that are inserted into the
database by another transaction while one's
transaction is running.
346
© D. Wong 2003
2002
Isolation levels in JDBC
The 5 isolation levels:
1. TRANSACTION_NONE - transactions not supported (This level is
not in the SQL standard)
2. TRANSACTION_READ_UNCOMMITTED - allows dirty reads,
nonrepeatable reads, and phantom reads
3. TRANSACTION_READ_COMMITTED - prevents dirty reads,
but does not prevent nonrepeatable reads, and phantom reads
4. TRANSACTION_REPEATABLE_READ - disallow dirty reads,
nonrepeatable reads, but allow phantom reads
5. TRANSACTION_SERIALIZABLE - disallow dirty reads,
nonrepeatable reads, and phantom reads (The highest level, the
default)
Connection methods to get and set them:
1.
2.
getTransactionIsolation() - to find out the transaction isolation
level the database is set
setTransactionIsolation() - to set the appropriate isolation level
347
© D. Wong 2003
2002
Choosing Isolation Levels
A choice of compromise between performance and
a slightly inconsistent view of data (because data is
being updated concurrently by other users).
Ref. Ex. 8.32, 8.33 pp. 408
348
© D. Wong 2003
2002
DBMS Techniques to enforce ACID
Locking – granularity of locks is important. Locks
are obtained at the beginning of a transaction.
Locks are released at the end of commit or
rollback.
Logging – write a log to nonvolatile storage.
Assure durability.
Transaction Commitment – for durability and
atomicity, transactions are computed
“tentatively”, recorded, but no changes are made
to the db until the transaction gets committed.
Changes are copied to the log, then copied to db.
349
© D. Wong 2003
2002
J2EE Transaction Management in Enterprise JavaBeans (EJBs)
Ref. JavaTM 2 Enterprise Edition Developer's Guide
350
© D. Wong 2003
2002
Container-Managed Transactions (see note page of this
slide)
Description
• Code does not include statements that begin and end the transaction
• Immediately before an EJB method starts - transaction begins
• Just before the method exits - commits
• Each method can be associated with a single transaction
Prohibited methods, e.g.:
• commit, setAutoCommit, and rollback methods of
java.sql.Connection
Limitation:
• When a method is executing, it can be associated with either a single
transaction or no transaction at all
351
© D. Wong 2003
2002
Bean Managed Transaction (see note page of this slide)
• Session bean code invokes methods that mark the boundaries of
the transaction - setAutoCommit(); commit(); rollback();
• An entity bean may not have bean-managed transactions
Example:
public void ship (String productId, String orderId, int quantity) {
try {
con.setAutoCommit(false);
updateOrderItem(productId, orderId);
updateInventory(productId, quantity);
con.commit();
} catch (Exception ex) {
try {
con.rollback();
throw new EJBException("Transaction failed: " + ex.getMessage());
} catch (SQLException sqx) {
throw new EJBException("Rollback failed: " + sqx.getMessage());
}}}
Ref.
JavaTM
352
2 Enterprise Edition Developer's Guide, JDBC Transaction
© D. Wong 2003
2002
Week 15
353
© D. Wong 2003
2002
Security and User Authorization in SQL
8.7 pp. 410
Authorization ID = user name
Special authorization ID: PUBLIC
Privileges for:
SELECT, INSERT, UPDATE, DELETE,
REFERENCE, USAGE, TRIGGER,
EXECUTE, UNDER
For SELECT, INSERT, UPDATE, may also
specify on attribute level
Privileges are needed for relations in the
subqueries also. e.g. Fig. 8.25 pp 411
354
© D. Wong 2003
2002
Creating privileges
Owner of schema or modules has all privileges
Establish ownership at:
1. When a schema is created.
2. When a session is initiated by a CONNECT
statement.
e.g. CONNECT TO ABC_server AS conn1
AUTHORIZATION smith;
3. When a module is created, use an optional
AUTHORIZATION clause
355
© D. Wong 2003
2002
Granting privileges
Owner of a relation has GRANT privilege.
If you have the "GRANT" privilege to a set of privileges, you
may grant them to any user.
GRANT <privilege list> ON <database element>
TO <user list> [WITH GRANT OPTION]
e.g.
GRANT SELECT, INSERT ON Studio TO kirk, picard
WITH GRANT OPTION;
-- by Janeway
GRANT SELECT, INSERT ON Studio TO sisko; -- by picard
GRANT SELECT, INSERT(name) ON Studio TO sisko; -- by kirk
Grant diagram e.g. Fig. 8.26 pp. 417
356
© D. Wong 2003
2002
Revoking Privileges
Privileges can be revoked:
REVOKE [GRANT OPTION FOR] <privilege list> ON <database
element> FROM <user list> {CASCADE | RESTRICT}
e.g.
REVOKE SELECT, INSERT ON Studio FROM picard CASCADE ;
If A has been given a privilege by several different people
on the same element, then all of them have to revoke in
order for A to lose the privilege
If A granted privilege P to B, who granted P to C, then A
revokes P from B will also revoke P from C. e.g. Fig 8.29 pp
420
357
© D. Wong 2003
2002
Object-Oriented Data Model
ODMG
– Object Database Management Group
– Deals with OO standard for database
– Also deals with ORDBMS (Object Relational DBMS)
Major parts of ODMG standard:
– ODL: Object Definition Language, how to specify the
db schema
– OQL: the SQL-like Object Query Language
– Host language binding: how to use ODL and OQL from
within procedural languages. The standard define
bindings for C++, SmallTalk, and Java. In ODMG, the
host language also serves as the object manipulation
language.
358
© D. Wong 2003
2002
ODMG database management system
Application is written in a host language e.g. C++, Java
In order to access the db, the application must be linked with
the ODBMS libraries and with the code that implements its
class methods.
Much of the code that manipulates objects is part of the
database itself.
Each class has a set of methods. Method signatures are
specified in the schema using ODL.
The code for these methods is stored on the database server.
ODBMS invokes the appropriate code whenever a method is
called.
OODMG database data is modified directly in the host
language
e.g. Stud.Name = "Joe";
// Stud contains the oid of a
// persistent Student object
359
© D. Wong 2003
2002
Architecture of an ODMG database
Schema Spec. in
ODL(Embedded in
C++, Java, etc)
ODL
Preprocessor
Source code for class
methods in host language
(C++, Java, …)
Host language
compiler
ODBMS
Software
Method
Implementation
Obj. code
ODBMS
Libraries
Linker
Information stored at the Server
Metadata
Data Access
Method Implementation
Binaries Stored in DBMS
Object Data
360
Ref. "Databases and Transaction Processing" – Lewis, Addison Wesley
© D. Wong 2003
2002
Structure of ODMG Applications
Application source
code in host language
ODBMS
Host language
compiler
ODBMS library
Application
Object code
Method
implementation
binaries stored in
DBMS
Linker
Executable
code
361
Ref. "Databases and Transaction Processing" – Lewis, Addison Wesley
© D. Wong 2003
2002
Object Definition Language (ODL)
Conceptual model to describe the attributes, methods,
and relationships of each object type (class), including it's
inheritance properties.
ODL classes describes 3 kinds of elements:
1. Attributes: values associated with the object
2. Relationship: connection between the object itself and
other objects
3. Methods: functions that may be applied to objects of
the class.
Methods are specified by it's signature: name,
arguments (names, order, and type), return value
type, name of any exceptions it can raise.
e.g. Fig. 4.2 pp137
362
© D. Wong 2003
2002
Object Definition Language (ODL) (continued)
Class declaration
Class include:
1. Class Name
2. Key declaration(s). Optional.
3. Extent Declaration = name for the set of currently
existing objects of a class (I.e. relation instance in
relational model)
4. Element declarations: attributes, relationships,
methods
class <name> [(extent names)]
{ < list of elements> }
363
© D. Wong 2003
2002
Object Definition Language (ODL) (continued 2)
Attribute declaration (non-objects):
attribute <type> <name>;
e.g. 1 attribute string name;
e.g. 2 attribute Struct Addr{ string street, string city}
address;
Relationship (and inverse relationship) declaration
(objects):
relationship [rangetype]<className> <name> inverse
className::<relationship name>;
e.g. relationship Set<Star> stars
inverse Star::starredIn;
364
© D. Wong 2003
2002
Method declaration
<returnType> <methodName> (arguments) raises
(<exception>);
e.g. 1: void lengthInhours() raises (noLengthFound);
e.g. 2: void starName(out Set<String>) ;
Arguments:
in : read-only
out: for returning values
inout: for both
365
© D. Wong 2003
2002
ODL Relationships
Only binary relationships supported
–
Use a connecting class to represent multiway
relationships Fig. 2.9 pp. 34.
Relationships are defined in inverse pairs. Fig.
4.3 pp 140
1. Many-many: have a set type of class in each
direction
2. Many-one: a set type for the one, and a simple
class name for the many
3. One-one: simple class name in both
366
© D. Wong 2003
2002
Subclass (S is a subclass of D)
Class C extends D { class C's declarations }
e.g. class Cartoon extends Movie {
relationship Set<Star> voices;
}
Multiple inheritance (separate the super classes by : in the
extend declaration)
e.g. class CartoonMurderMystery
extends MurderMystery : Cartoon
Name conflict resolutions with Multiple inheritance pp.
151
367
© D. Wong 2003
2002
ODL data types
Basis:
1. Atomic type: integer, float, characters, string,
boolean, enum
2. Class names
Structured types:
1. Set: Set<T> // finite sets of elements of type T
2.
3.
4.
5.
Bag: Bag<T> // finite bags of element type T
List: List<T> // finite lists of 0 or more elements T
Array: Array <T, i> // T = type, i = no. of elements
Dictionary: Dictionary <T, S>, T is key type, S is
range type. Each pair has unique key value.
6. Structures : Struct N {<type1> field1, …}
368
© D. Wong 2003
2002
Keys declaration in ODL
Optional because each object is identified by an internal
OID
May declare one or more keys in the extent declaration
e.g. class Movie
(extent Movies key (title, year))
{
attribute string title;
attribute integer year;
…
}
369
© D. Wong 2003
2002
ODL to Relational Design
Invent a new attribute to serve as key when there
is no key in the ODL design
ODL attributes that are not atomic are converted
into relation attributes that usually are redesigned
with normalization
Methods are not converted to relational design.
But can have methods in Object Relational design
370
© D. Wong 2003
2002
Object-Relational DB (ORDB)
SQL-99 adopted a limited subset of the object relational
model
ORDBMS is a conservative extension to the existing
RDBMS.
In general, ORDB consists of:
– A set of relations (which can be viewed as classes)
– Each relation consists of a set of tuples (which can be
viewed as instances of the class that represents the
relation)
– Each tuple is of the form (oid, val) where oid is an
object id and val is a tuple value whose components can
be arbitrary values (e.g. primitive values, sets of tuples,
and references to other objects)
371
© D. Wong 2003
2002
ORDB, ODB, RDB
Difference between ORDB and ODB
– In ORDB, the top-level structure of each object
instance is always a tuple. In ODB, top-level
structure can be an arbitrary value.
Difference between ORDB and RDB:
– RDB tuple components must be primitive
values
– ORDB tuple components can be arbitrary
values
372
© D. Wong 2003
2002
Oracle Object example
create type ADDRESS_TY as object
(Street
VARCHAR2(50),
City
VARCHAR2(25),
State
CHAR(2),
Zip
NUMBER);
create type PERSON_TY as object
(Name
VARCHAR2(25),
BirthDate DATE;
Address ADDRESS_TY
member function AGE_DAYS (BirthDate IN DATE)
return NUMBER);
373
© D. Wong 2003
2002
Oracle Object example (continued)
Defining methods for user defined types using PL/SQL:
Create type body PERSON_TY as
Member function AGE_DAYS (BirthDate DATE)
return NUMBER is
begin
RETURN ROUND(SysDate – BirthDate);
end;
-- if there are more methods to the data type, may define here
end;
/
374
© D. Wong 2003
2002
Oracle Object example (continued 2)
Create table with user defined abstract data types:
create table CUSTOMER
(Customer_ID NUMBER,
Person
PERSON_TY);
Use constructors for inserting data:
insert into CUSTOMER values (1, PERSON_TY('Joe Smith', '01JAN-90', ADDRESS_TY('10 Spring ST', 'BHM', 'AL', 35110)));
Use path names to access the attributes:
SELECT Person.Address.Street
FROM CUSTOMER;
SELECT Person.AGE_DAYS(Person.BirthDate)
FROM CUSTOMER;
UPDATE CUSTOMER
SET Person.Address.City = 'Birmingham'
WHERE Person.Address.City = 'BHM';
375
© D. Wong 2003
2002
Object-Orient Analysis and Design
Normalization in relational model relates each attribute to
its primary key
e.g. The following is in 3NF:
create table CUSTOMER
(Customer_ID NUMBER,
Name
VARCHAR2(25),
BirthDate DATE;
Street
VARCHAR2(50),
City
VARCHAR2(25),
State
CHAR(2),
ZipNUMBER);
For OO, further group related columns into abstract data
types (ADT) (e.g. ADDRESS_TY) for reuse.
Then look for relationships among ADTs to determine if
nesting is appropriate (e.g. PERSON_TY);
376
© D. Wong 2003
2002