Select - FSU Computer Science
Download
Report
Transcript Select - FSU Computer Science
COP 4710 Databases
Fall, 2000
Today’s Topic
Review for Final Exam
David A. Gaitros
November 6, 2000
Department of Computer Science
Copyright by Dr. Greg Riccardi
1
Outline of Course
Study of principals and techniques of
databases
Grades assigned as in information sheet
Examples of use of databases
Programming projects in database design and
implementation
– Programming in Microsoft Access
– Programming in Java with a Unix database
– Development of a web site with database support
Course notes in
http://www.cs.fsu.edu/cop4710/lectures
Next class, Chapter 2
2
Representation of Information
Data is collections of bits
– physical database
Information is data with meaning
– logical database
Representation of meta-data
– database system is self-describing
Database Management System (DBMS)
–
–
–
–
define information content
construct database
manipulate by queries, reports and updates
data plus software
3
Vocabulary
Glossary of terms
Define the terms as used in this subject
– Database literature is filled with terms
Example of terms
–
–
–
–
Data, bits
Information, bits with meaning (type)
Entity
Schema
4
Data Modeling
A data model is a specification of the
information content of a system
– conceptual data model describes information in
terms the users will understand
– logical data model describes information in a way
that can be used to build a database
– physical data model describes information in
terms of its representation in physical storage
5
Schemas and Instances
Schema is the structure of a database
–
–
–
–
intention or meaning of the data
data models are schemas
table definitions are schemas
class definitions are schemas
Instances are the contents of a database
– extension or values of the data
– objects are instances
– objects in a database are typically rows in a table
6
Levels of database schemas
Different schemas are presented to different
users
External View 1
External View 2
External View 3
External level
logical to external mappings
Logical Schema
Logical level
internal to logical mapping
disk
Internal Schema
Internal level
7
Database Languages
DDL, data definition language, conceptual schema
– describe conceptual schemas
SDL, storage definition language, internal schema
– describe file structures, indexes
VDL, view definition language, external schema
DML, data manipulation language
– High-level or non-procedural (e.g. SQL)
• Select Last Name from Roster where Section = 2
– Low-level or procedural
• For r in Roster loop
if r.section = 2 then
result.Add ( r.lastname );
8
Principals of ER Modeling
Entities and classes
– Entity, a thing in the real world
– Entity Class, the structure of a collection of
similar entities
Attributes
– Attribute, a property of an entity
– Each entity has a value for each of its attributes
Types of attributes
– simple vs. composite, single-valued vs. multivalued, stored vs. derived
– domains of attributes
9
Relationships Between Entities
Relationship type defines a set of
associations among given types.
Relationsip Instances are particular
relationships among objects.
Examples of relationship types in company
database
– Manages: 1:1 between employee and department
– Works-for: 1:N between department and employee
– Controls: 1:N between department and project
10
Find the Entities, Attributes and
Relationships
BigHit Video
Rental Receipt
Account Id: 101
Videotape Id: 90987 date: January 9, 1999
cost: $2.99
Jane Block
Elizabeth
date due: January 11, 1999
1010 Main St.
Apopka, FL 30458
11
ER schema diagram for BigHit Video
lastName
firstName
otherUsers
0/1
Customer
numberRentals
street
customerId
address
city
Rents
dateDue
state
videotapeId
dateRented
0/M
title
genre
Videotape
cost
zipcode
12
Chapter 4 The Relational Data Model
A Relation is a two-dimensional table
– Fixed list of columns
– One object per row
An attribute represents a single column of a
table and has a name and a type
A relation schema is the name and the list of
attributes of a relation
– Grade (studentId, assignmentId, points,
dateSubmitted)
A tuple is a row of a table, one value for each
attribute
– (123, 14, 27, 5/28/98)
13
Characteristics of Relational Model
Relation is a set of tuples
– No ordering of tuples
– No duplicate tuples
• no two rows have all the same values
Each attribute value is atomic
– hence no multiple-valued or composite attributes
– called first normal form
Each relation is a set of assertions
– Each represents a fact
– Some facts are about relationships
That’s it!
– no other data structures
– no explicit representation of relationships
Representing E-R Model as Relations
Entity class Relation schema
Entity row of table
– set of all entities of class table
Attribute column definition (attribute)
– attribute value table element
Relationship type
– relation schema
– attribute(s) of relation schema
15
Rules for Relationship Types
One-to-many
– For each one-to-many relationship type R
between subject class S and target class T, add
the key attributes of class S to class T as foreign
keys. Name the attributes using the role that S
plays in relationship type R.
– Add the attributes of the relationship type R to
class T.
One-to-one
– choose one side and use above rule
Examples in class
16
Many-to-many relationship types
Create a relation schema for the relationship
type
– foreign key attributes for the key of the related
schema
– add attributes of the relationship type
Examples in class!
17
Representing relationships as attributes
One-to-many
– For each one-to-many relationship type R
• subject class S (one side)
• target class T (many side),
– add the key attributes of S to the schema of T
• as foreign keys.
– Name the foreign key attributes
• ues the role that S plays in relationship type R.
– Add the attributes of the relationship type R to
schema for T.
One-to-one
– choose one side and use above rule
18
Representing Weak Entity Classes
Create a relation schema
– Add foreign key for each defining relationship type
– Key is partial key plus defining foreign keys
Consider Fig. 2.5, weak class Rental
Schema: Rental (videoId,dateDue,
dateRented, cost)
– key
• videoId (foreign key)
Weak Entity
Class
Participation
Constraint
1
Customer
1
Has
Rental
1
Has
Videotape
M
dateDue
dateRented
cost
Identif ying
Relationship
Type
19
Representing specialization hierarchies
Three possibilities
– 1. Create a table for the superclass with its
attributes and a table for each subclass with its
attributes
– 2. Create a table for the superclass with all of the
subclass attributes
– 3. Create a table for each subclass that includes
both subclass and superclass attributes
20
Functional Dependencies and
Normalization
Begin by discussing good and bad relation
schemas
Informal measures of the quality of relation
schema design
–
–
–
–
Semantics of the attributes
Reducing the redundant values in tuples
Reducing the null values in tuples
Disallowing spurious tuples
Define Normal Forms as formal measures of
the quality of schemas
– restrictions on the form of relation schemas
21
Update Anomalies
Insertion Anomalies
– When inserting a new owner, we must correctly insert the
Manuf field, or will create inconsistencies
– Cannot create a car without an owner
– Cannot create a make without a car and an owner
Deletion Anomalies
– Deletion of owner of a car also deletes make and
manufacturer of car
– Deletion of owner of the last Plymouth deletes relationship
between Plymouth and Chrysler
Modification Anomalies
– Changing the make of a car requires consistency check
– Cannot change so that a Plymouth is made by Ford
Guideline 2: no insertion, deletion, or modification
anomalies allowed!
22
Some definitions
superkey: a set of attributes of a relation
whose values are unique within the relation.
key, a superkey in which removal of any
attribute makes it not a superkey. If there is
more than one key, they are called candidate
keys.
primary key, arbitrarily designated candidate
key, all other candidate keys are secondary
keys.
prime attribute, one which is a member of
any key.
nonprime attribute, one which is not prime.
23
Definition of Functional Dependency
A functional dependency is a constraint between 2
sets of attributes from the database
– For each value of the first set there is a unique value of the
second set
X-->Y restricts the tuples that can be instances of R
if t1 and t2 are instances of R
– t1(X) = t2(X) then t1(Y) = t2(Y)
For example,
– {DLNum} --> {Oname}
– {CarId} --> {Make, Manuf}
– {Make} --> {Manuf}
Candidate keys are left hand sides of functional
dependencies
24
Second Normal Form (2NF)
X-->Y is a full functional dependency if the
removal of any attribute A from X removes the
dependency
– not X-{A} --> Y
X-->Y is a partial dependency if some
attribute A may be removed without removing
the dependency
– X-{A} --> Y
A relation schema R is in 2NF if every
nonprime attribute is fully functionally
dependent on the primary key of R
25
Putting the CarReg Schema into 2NF
Consider the Owner relation schema
– {DLNum} is the primary key
– Hence Owner is in 2NF
Consider the Car relation schema
– {CarId, DLNum} is primary key (multiple owners)
– {CarId} --> {Make, Model,...}
– Hence Car is not 2NF
Create new relations
– CarOwner = {CarId, Owner, PurchDate, TagNum,
RegisDate}
– Car = {CarId, Make, Model, Manuf, Year, Color}
Is it 2NF?
26
Rules for Functional Dependencies
Given a particular set of functional
dependencies, we can find others using
inference rules
– Splitting/combining rules
• A -> B1 B2 <=> A-> B1 and A->B2
– Trivial rules
• A B -> B, for all A, B
– Transitive rule
• A -> B and B -> C => A B -> C
We are interested in the closure of the set of
functional dependencies under these (and
other) rules
27
Inference Rules for Functional
Dependency
There are semantically obvious functional
dependencies, usually specified by schema
designer
Other functional dependencies can be
inferred from those
Inference rules
–
–
–
–
–
–
Reflexive, X includes Y, X-->Y
Augmentation, X-->Y then XZ-->YZ
Transitive, X-->Y-->Z then X-->Z
Decomposition, X-->YZ then X-->Y
Union, X-->Y and X-->Z then X-->YZ
Pseudotransitive, X-->Y and WY-->Z then WX-->Z
28
Definition of Key
A set of one or more attributes {A1,...Ak} is a
key for a relation R
– Those attributes functionally determine all other
attributes of R
• no 2 distinct tuples can agree on the key
– no proper subset of {A1,... Ak} is a key of R
• a key must be minimal
There can be more than one key in a relation
– Department (DeptName, DeptNo,...)
• since both are unique, both are keys
A superkey (superset of a key) is a set of
attributes that functionally determine all other
attributes of the relation.
29
Third Normal Form (3NF)
Based on transitive dependency, or nonkey dependency
A functional dependency X-->Y is a transitive
dependency if there is a set Z which is not a
subset of any key, and for which X-->Z and Z->Y
A relation schema is in 3NF if there is no
nonprime attribute which is functionally
dependent on a non-key set of attributes.
Example of {make}-->{manuf} violates 3NF
since make is not a key.
30
Section 6.1 Relational Algebra
Look at the formal basis for operations on the
relational data model
An “algebra” is a collection of operations on
some domain
Relational Algebra is a collection of operators
– operands and results are relations
– operators
• projection and selection remove parts of a
relation
• set operators, union, intersection and difference
• joins and products combine the tuples of two
relations
– other operators follow
31
Join Operations
Natural join is based on the cartesian product
– With a restriction on the tuples and attributes
• each common attribute appears once in result
• tuples are included only where the common
attributes have the same values
– R join S on A has those tuples of RS
• where R.A = S.A
– Each tuple from R is joined to all tuples of S that
have the same value for attribute A
Example
– Every combination of Customer and Rental where
the accountId fields match
32
Combining Operations to Form Queries
Can put all operations together
– Names and grades of students who made took
quiz 1
We’ll see how this works in in Access
In class, time permitting
– Demonstration of Queries in Access
33
Relational Expressions
Select account 113, project videoId and
dateDue
– videoId, dateDue( accountId=113(Rental))
VideoId, title and date due for account 113
– videoId, title, dateDue(( accountId=113(Rental)) videoId
Videotape movieId Movie)
– videoId, title, dateDue( accountId=113(
Rental videoId Videotape movieId Movie))
What is the order of evaluation?
34
Chapter 7: SQL
Standard Query Language
– ANSI and ISO standard
– SQL2 or SQL-92 is current standard
SQL is a data manipulation language (DML)
and a data definition language (DDL) and a
programming language
We can use SQL for
– Logical database specification (database schema
definitions
– Physical database specifications (indexes, etc.)
– Querying database contents
– Modifying database contents
35
Relational Operations in SQL
Select statement
– select <attribute names> from <tables>
where <condition>
Projection in SQL using select clause
– Select title from Movies
Selection in SQL using where clause
– select * from Customer where lastName = 'Doe'
– select distinct lastName, firstName from
Customer
• no duplicates with distinct
36
Products and Joins in SQL
Cartesian product in SQL using from clause
– Select * from Employee, Timecard
Join using from and where clauses
– Select * from Employee, Timecard where
Employee.ssn = Timecard.ssn
Join using join and on (non-standard)
– Select * from Employee join TimeCard
on Employee.ssn = TimeCard.ssn
37
Nested Queries
Nested select query
– Select videoId, dateAcquired
from Videotape
where videoId = (
select videoId from Rental
where dateRented=‘1/1/99’)
compare with
– Select v.videoId, dateAcquired
from Videotape v, Rental r
where v.videoId = r.videoId and
dateRented=‘1/1/99’)
Same result?
38
Select Using Group by and Having
Group by forms groups of rows with the
same column values
What is the average hourly rate by store?
– select storeId, avg(hourlyRate)
from HourlyEmployee e, WorksAt w
where e.ssn = w.ssn
group by stroreId
How many employees work at each store?
– select storeId, name, count (*)
from Store s, WorksAt w
where s.storeId = w.storeId
group by storeId, name
Having filters the groups
– having count (*)>2
39
Substrings, arithmetic and order
Find a movie with ‘Lion’ in the title
– select title from Movie where title like ‘%Lion%’
List the monthly salaries of salaried
employees who work in in store 3
– select salary/12 from Employees e, WorksAt w
where e.ssn=w.ssn and storeId=3
Give the list of employees in store 3, ordered
by salary
– select firstName, lastName
from Employees e, WorksAt w
where e.ssn=w.ssn and storeId=3
40
Modifying Content with SQL
Insert queries
– insert into Customer values (555, 'Yu', 'Jia','540
Magnolia Hall','Tallahassee', 'FL', '32306')
– insert into Customer (firstName, lastName,
accountId) values ('Jia', 'Yu', 555)
Update queries
– update TimeCard set paid = true
where paid = false
– update HourlyEmployee set hourlyRate =
hourlyRate *1.1 where ssn = '145-09-0967'
Samples in Access
41
Creating Pay Statements with SQL
Find the number of hours worked for each
employee entry
– select TimeCard.ssn, sum((endTimestartTime)*24) as hoursWorked from TimeCard
where paid=false group by ssn
Create the Pay Statement entries for each
Employee
– select ssn, hourlyRate, hoursWorked,
hoursWorked * hourlyRate as amountPaid, today
from …
Insert into the PayStatement table
– Insert into PayStatement select …
Look at the Access example in BigHit.mdb
42
Create Table Statement
create table Customer (
accountId int,
lastName varchar(32),
firstName varchar(32),
street varchar(100),
city varchar(32),
state char(2),
zipcode varchar(9)
)
Note that SQL has specific types
43
Key Constraints in SQL
Key declarations are part of create table
– create table Store (
storeId int primary key,
– create table Movie (
movieId varchar(10) primary key,
– create table Rental (
accountId int,
videoId varchar(10),
primary key (accountId, videoId)
44
Java Objects and variables
Objects are dynamically allocated
– Figures A.1 and A.2 show String variables
• Assignment (=) and equality (==)
a.
firstName="Fred";
b.
String
firstName="Jane";
Fred
String
Fred
firstName
firstName
obj1
String
String
Jane
Jane
String
Jane
obj2
String
obj3
Jane
45
Java DB Connectivity (JDBC)
Java
Program
Local computer
java.sql
package
Figure 8.4
Strategies for
implementing
JDBC
packages
JDBC package
JDBC package
JDBC-ODBC
bridge
JDBC package
Oracle Database
client
Middleware
client
ODBC database
client
Access ODBC
client
Oracle ODBC
client
Middleware
server
Intermediary
computer
Oracle Database
client
Database
computer
Access
server
Oracle
server
Sybase Database
client
Sybase
server
46
Executing Insert and Update Statements
Create new customer, using String +
int rowcount = stmt.executeUpdate(
”insert into Customer ”
+”(accountId,lastName,firstName) ”
+”values (1239,’Brown’,’Mary’)”);
if (rowcount == 0) // insert failed
Update
– String updateSQL = “update TimeCard set “
+”TimeCard.paid = 'yes’ where “
+”paid<>'yes’”;
int count = stmt.execute(updateSQL);
// count is number of rows affected
47
Chapter 13 Query Processing
Strategies for processing queries
Query optimization
First: How to represent relational DB?
– Each table is a file
• Record structure to store tuples
• File is a random access collection of records
– Query is executed by reading records from files
• Read record, create object in memory
• Process object
• Write result as a file of records or keep in
memory
48
Processing a range query
Figure 13.3 Illustration of query processing
for query
– select * from Customer where
accountId >= 101 and accountId < 300
347 903
B+ tree index for
accountId
35 110 347
10 24 35
101 110
10 ...
123 200 246 347
24 ...
123 ...
Data file
901 903
35 ...
200 ...
876 ...
700 767 876 901
101 ...
246 ...
901 ...
347 ...
902 ...
902 903
110 ...
700 ...
767 ...
903 ...
49
Using hashing to eliminate duplicates
A hash function partitions values so that
– All values that are the same are in the same
partition
– Values that are different are often in different
partitions
We can find duplicates by hashing
– For each tuple in the table
• Mash all attribute values in the tuple into a
single value
• Apply hash function
– For each partition
• Compare all pairs of tuples
• Eliminate duplicates
– Why does this work?
50
Processing join queries with indexes
Indexed nested loop join
while (not customer.eof()) {
Customer c= customer.read();
rental.reset();
while (not rental.eof()) {
Rental r[] = rental.readByAcctId(c.accountId);
for (int i=0; i<r.length; i++) {
result.write(c,r[i]);
result.write(c,r[I]);
} }}
Cost is Bc + Rr instead of Bc + Rc × Br without
index
Reduce cost by processing a block at a time?
51
ACID Transactions
Atomicity: the property of a transaction that all of the
updates are successful, or there is no update at all.
Consistency: each transaction should leave the
database in a consistent state. Properties such as
referential integrity must be preserved.
Isolation: each transaction when executed
concurrently with other transactions should have the
same affect as if it had been executed by itself.
Durability: once a transaction has completed
successfully, its changes to the database should be
permanent. Even serious failures should not affect
the permanence of a transaction.
52
Example of transaction
open transaction
videoId video1 = select id of a copy of
"Star Wars"
if (video1 == null) rollback transaction
insert row into Reservation for video1
videoId video2 = select id of a copy of
"Return of the Jedi"
if (video2 == null) rollback transaction
insert row into Reservation for video2
videoId video3 = select id of a copy of
"The Empire Strikes Back"
if (video3 == null) rollback transaction
insert row into Reservation for video3
commit transaction
53
Transaction isolation
Consider these transactions
– Actions of T1
• A: balance1 = (select balance from Customer where
accountId = 101); balance1 += 5.00;
• B: update Customer set balance = ?balance1 where
accountId = 101;
– Actions of T2
• A: balance2 = (select balance from Customer where
accountId = 101); balance2 += 10.00;
• B: update Customer set balance = ?balance1 where
accountId = 101;
Problems
– Lost update:
• T1.a, T2.a, T1.b, T2.b
– Dirty read:
• T1.a, T1.b, T2.a, T1.rollback, T2.b, and T2 commit
– Incorrect Summary: example in class
54
Locking database objects
Allow transaction operations to lock objects
– Read (shared) locks
– Write (exclusive) locks
Lock granularity
– What size object to lock?
– Table, row, field, column
Effect on concurrency
– T1:Select sum(balance) from Customers
– T2: Update Customers set firstName=‘Joe’ where
accountId=101
Effect on size and cost
– Smaller objects = more locks
55
Two phase locking (2PL)
Locks granted and released in two phases
– Growing phase
• Request and upgrade locks
– Request read on X
– Request write on X
– Shrinking phase
• Release and downgrade locks
– Request read on X (downgrade from write)
– Release read on X
2PL guarantees serializability
– Any conflicting operation is blocked
56
Transaction problems
Lost update
– Two transactions update, last one persists
Dirty read
– One transaction reads a value written by a transaction that
subsequently rolls back
Incorrect summary
– One transaction calculates an aggregate while another is
updating
Unrepeatable read
– One transaction reads the same object twice and receives
two different values
Phantom read
– A transaction reads a value inserted by another transaction
that subsequently rolls back
Deadlock
– Two transactions hold and request
57
Transactions in SQL
Transaction management statements
–
–
–
–
–
set transaction read only;
set transaction read write;
set transaction isolation level serializable;
commit transaction;
rollback transaction;
Executing SQL statement without opening
transaction
– autocommit mode
58
Causes of Failure, Possibilities of Recovery
Database server
– computer crashes
– server program crashes
– disk drive corruption
Client failure
– computer crashes
– client program crashes
Network failure
– connection fails, often temporary
Transaction failure
–
–
–
–
executes rollback (voluntary)
executes illegal operation (server created)
deadlock
introduces errors into the database
59
Recovery from failure
Primary technique, restart from consistent
backup/checkpoint
Reprocessing
– ask all committed transactions to execute again
Roll Forward
– Back to consistent backup state
– Apply redo transaction log
Roll Back
– Remove the effect of each transaction with undo
log
– Can be used to cancel the effects of rogue
transactions
60
Security in Relational Database Systems
Account security for validation of users
– Database accounts
– Operating system accounts
SQL statements for security
–
–
–
–
–
create user
alter user
create profile
create role
grant privileges to users, roles
61
Stored Procedures
Define numberRented function
– create function numberRented (accId int) return int
as select sum(*) from Rental where
Customer.accountId = accId;
Define checkIn procedure
– create procedure checkIn (vidId int, cost double)
as begin insert into PreviousRental …
Grant privileges to procedures
– grant update on PreviousRental to checkIn
– grant checkIn to clerk
– revoke update on PreviousRental to public
User in the clerk role can update the table, no
one else can
62
Distributed Database Systems
AP 1
AP 2
AP 1
AP 2
AP 2
AP 3
DDBMS
DDBMS
DDBMS
OSnet
OSdm
OSnet
OSdm
OSnet
OSdm
Database
Database
Database
Osnet = Network Communications
portion of Operating System
Osdm = Data management portion of
Operating System
DDBMS = Distributed Database System
63
Distributed Databases
Single schema with multiple servers
– Not one application connecting to multiple servers
– An application connects to a single server
Fragmentation of tables
– Horizontal, rows in different servers
– Vertical, columns in different servers
– Replicated, some rows or columns in multiple
servers
Distributed Transactions
– Two phase commit
– Discussion in class
64