Database Management Systems
What is a DBMS?
Database management systems provide efficient (in both speed and space) and secure access to large amounts of data.
They address problems such as:
How to store the data efficiently
How to query the data efficiently
How to update the data reliably and securely (by multiple users)
Contrast this with using a file system for the same task.
Relational Databases
Based on the relational model. Example relation Enroll:
Student | Course   | Term
Charles | SYSC3001 | Fall 2011
Dan     | SYSC4602 | Summer 2010
…       | …        | …
Separates the logical view from the physical view of the data.
Querying a Database
Find all the students who have taken SYSC3001 in Fall 2011.
S(tructured) Q(uery) L(anguage):
SELECT E.student
FROM Enroll E
WHERE E.course = 'SYSC3001' AND E.term = 'Fall 2011'
The query processor figures out how to answer the query efficiently.
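To make the query concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table name Enroll and the columns student, course and term are assumptions matching the example relation. EXPLAIN QUERY PLAN is SQLite-specific and its output format varies between versions, so it is only printed, not checked.

```python
import sqlite3

# In-memory database; schema assumed to match the Enroll example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Enroll (student TEXT, course TEXT, term TEXT)")
conn.executemany("INSERT INTO Enroll VALUES (?, ?, ?)", [
    ("Charles", "SYSC3001", "Fall 2011"),
    ("Dan", "SYSC4602", "Summer 2010"),
])
# An index gives the query processor another access path to consider.
conn.execute("CREATE INDEX enroll_course ON Enroll(course)")

# Declarative: we state *what* we want; the DBMS decides *how* to get it.
query = ("SELECT E.student FROM Enroll E "
         "WHERE E.course = 'SYSC3001' AND E.term = 'Fall 2011'")
students = [row[0] for row in conn.execute(query)]
print(students)  # ['Charles']

# Ask SQLite which plan it chose (the format is version-dependent).
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step)
```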
Database Industry
Relational databases are a great success story of theoretical ideas put into practice.
The “Big 3” DBMS companies are among the largest software companies in the world.
IBM (with DB2) and Microsoft (SQL Server, Microsoft Access) are also important players.
A $20B industry (a figure that is several years old).
Challenged by object-oriented DBMSs.
Why Use a DBMS?
Data independence and efficient access
Reduced application development time
Data integrity and security
Uniform data administration
Concurrent access and recovery from crashes
Functionalities of a DBMS
Storage management
Abstract data model
High-level query and data manipulation language
Efficient query processing
Transaction (concurrency) processing
Resiliency: recovery from crashes
Interface with programming languages
The Study of DBMS
Some aspects:
Modeling and design of databases
Database programming: querying and update operations
Database implementation
DBMS study cuts across many fields of Computer Science and Engineering: operating systems, programming languages, software engineering, AI, logic, multimedia, theory, ...
Database Modeling and Design
Why do we need it?
To agree on the structure of the database before deciding on a particular implementation.
Consider issues such as:
What entities to model?
How are entities related?
What constraints exist in the domain?
How can we achieve a good design?
Performance, memory space, reliability, and security
Database Design Formalisms
Entity/Relationship model (E/R):
More relational in nature
Conceptually similar to OO analysis and design
Can be translated (semi-automatically) to relational schemas (with varying amounts of pain).
Newcomers: UML and XML
Entity / Relationship Diagrams
Objects correspond to entities; classes correspond to entity sets.
Attributes are the names of roles played by
some domain (a set of atomic values)
in a relation (a table of values or file of
records).
Relationships are associations among entities.
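[Figure: E/R diagram with entity sets Product (name, category, price), Company (name, stockprice) and Person (name, ssn, address); relationships makes (between Company and Product), buys (between Person and Product) and employs (between Company and Person).]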
Multi-way Relationships
How do we model a purchase relationship between buyers,
products and stores?
[Figure: the relationship Purchase connects Product, Person and Store.]
Roles in Relationships
What if we need an entity set twice in one relationship?
[Figure: the relationship Purchase connects Product and Store, and connects Person twice, in the roles buyer and salesperson.]
Attributes on Relationships
[Figure: the relationship Purchase, with attribute date, connects Product, Person and Store.]
The Relational Data Model
Database model: E/R or UML diagrams
⇒ Relational schema. Tables: column names are attributes, rows are tuples.
⇒ Physical storage: complex file organization and index structures.
Terminology
Relation Product (the header row holds the attribute names; each remaining row is a tuple):
Name        | Price   | Category    | Manufacturer
iPhone      | $459.99 | phone       | Apple
Vista       | $299.99 | OS          | MS
SingleTouch | $149.99 | photography | Canon
MultiTouch  | $203.99 | household   | Hitachi
More Terminology
Every attribute has an atomic type.
Relation schema: relation name + attribute names + attribute types
Relation instance: a set of tuples, so only one copy of any tuple can appear.
Database schema: a set of relation schemas.
Database instance: a relation instance for every relation in the schema.
More on Tuples
Formally, a tuple is a mapping from attribute names to values:
name → iPhone, price → $449.99, category → phone, manufacturer → Apple
Sometimes we write a tuple by itself (note that this relies on the order of the attributes):
(iPhone, $449.99, phone, Apple)
or
Product(iPhone, $449.99, phone, Apple).
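The difference between the mapping view and the positional view can be sketched in plain Python. This is only an illustration: dicts stand in for tuples-as-mappings, the attribute order in `schema` is an assumption, and the price is stored as a number rather than the string "$449.99".

```python
# A tuple as a mapping from attribute names to values: order does not matter.
t1 = {"name": "iPhone", "price": 449.99, "category": "phone", "manufacturer": "Apple"}
t2 = {"manufacturer": "Apple", "category": "phone", "price": 449.99, "name": "iPhone"}
print(t1 == t2)  # True: same mapping, listed in a different order

# The positional form only makes sense relative to a fixed attribute order.
schema = ("name", "price", "category", "manufacturer")


def as_positional(t):
    """Write a mapping-style tuple in the schema's attribute order."""
    return tuple(t[a] for a in schema)


print(as_positional(t1))  # ('iPhone', 449.99, 'phone', 'Apple')

# A relation instance is a *set* of tuples, so a duplicate collapses into one.
instance = {as_positional(t1), as_positional(t2)}
print(len(instance))  # 1
```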
Updates
The database maintains a current database state.
Updates to the data:
1) add a tuple
2) delete a tuple
3) modify an attribute in a tuple
Updates to the data happen very frequently.
Updates to the schema are relatively rare, and rather painful.
• Need good DB design
• Speed and space (security)
From E/R Diagrams to Relational Schema
- relationships are already independent entities
- only atomic types exist in the E/R model.
Entity sets → relations
Relationships → relations
Weak entity sets need special care: their existence depends on the existence of another entity. Example: a Dependent of an Employee.
[Figure: E/R diagram with Product (name, category, price), Company (name, stock price) and Person (name, ssn, address), related by makes, buys and employs.]
Entity Sets to Relations
[Figure: entity set Product with attributes name, category and price.]
Relation Product:
Name   | Category | Price
iPhone | phone    | $450
Relationships to Relations
[Figure: relationship makes, with attribute Start Year, between Company (name, stock price) and Product (name, category).]
Relation MAKES (watch out for attribute name conflicts):
Product-name | Product-Category | Company-name | Starting-year
iPhone       | phone            | Apple        | 2010
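Here is a minimal sketch of the two mapping rules (entity set → relation, relationship → relation) as SQLite DDL, run from Python. Two things are assumptions. First, the key of Product is taken to be (name, category), because MAKES carries both attributes. Second, the hyphenated names become underscores (product_name and so on), since a hyphen is not legal in an unquoted SQL identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Product (
    name     TEXT,
    category TEXT,
    price    REAL,
    PRIMARY KEY (name, category)          -- assumed key
);
CREATE TABLE Company (
    name       TEXT PRIMARY KEY,
    stockprice REAL
);
-- The relationship 'makes' becomes its own relation. Attributes are prefixed
-- with their entity set to avoid the clash between the two 'name' attributes.
CREATE TABLE Makes (
    product_name     TEXT,
    product_category TEXT,
    company_name     TEXT REFERENCES Company(name),
    starting_year    INTEGER,             -- attribute of the relationship itself
    FOREIGN KEY (product_name, product_category) REFERENCES Product(name, category)
);
""")
conn.execute("INSERT INTO Product VALUES ('iPhone', 'phone', 450)")
conn.execute("INSERT INTO Company (name) VALUES ('Apple')")
conn.execute("INSERT INTO Makes VALUES ('iPhone', 'phone', 'Apple', 2010)")
print(conn.execute("SELECT * FROM Makes").fetchall())
# [('iPhone', 'phone', 'Apple', 2010)]
```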
Mapping a UML Object Model to a Database
UML object models can be mapped to relational databases.
Some degradation occurs because all UML constructs must be mapped to a single relational database construct: the table.
Mapping of classes and attributes
Each class is mapped to a table
Each attribute is mapped onto a column in the table
An instance of a class represents a row in the table
Methods are not mapped.
Mapping a Class to a Table
UML class User: +firstName:String, +login:String, +email:String, +id:long
User table: id:long, firstName:text[25], login:text[8], email:text[32]
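One way to picture "class → table, attribute → column, instance → row, methods not mapped" is a Python dataclass stored through sqlite3. This is a sketch under some assumptions. The sample values, including the e-mail address, are made up. The method `display_name` is hypothetical and exists only to show that behaviour is not stored. text[n] is approximated with VARCHAR(n), which SQLite accepts but does not length-check, unlike most other DBMSs.

```python
import sqlite3
from dataclasses import dataclass, astuple


@dataclass
class User:
    id: int
    firstName: str
    login: str
    email: str

    def display_name(self) -> str:  # behaviour: not mapped to any column
        return f"{self.firstName} ({self.login})"


conn = sqlite3.connect(":memory:")
# Each attribute of the class becomes one column of the table.
conn.execute("""CREATE TABLE User (
    id        INTEGER PRIMARY KEY,
    firstName VARCHAR(25),
    login     VARCHAR(8),
    email     VARCHAR(32))""")

alice = User(1, "alice", "am384", "alice@example.com")
conn.execute("INSERT INTO User VALUES (?, ?, ?, ?)", astuple(alice))  # instance -> row

row = conn.execute("SELECT * FROM User WHERE login = 'am384'").fetchone()
restored = User(*row)  # row -> instance
print(restored == alice, restored.display_name())  # True alice (am384)
```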
Primary and Foreign Keys
Any minimal set of attributes that can be used to uniquely identify every data record in a relational table is called a candidate key.
The candidate key that is actually used in the application to identify the records is called the primary key.
The primary key of a table is thus a set of attributes whose values uniquely identify the data records in the table.
A foreign key is an attribute (or a set of attributes) that references the primary key of another table.
Example for Primary and Foreign Keys
User table (primary key: login; email is another candidate key):
firstName | login   | email
"alice"   | "am384" | …
"john"    | "js289" | …
"bob"     | "bd"    | …
League table (candidate key: name; login is a foreign key referencing the User table):
name              | login
"tictactoeNovice" | "am384"
"tictactoeExpert" | "bd"
"chessNovice"     | "js289"
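The following sketch (SQLite via Python) shows how a DBMS enforces these declarations. Login is the PRIMARY KEY, the other candidate key email is declared UNIQUE, and League.login is a foreign key. The e-mail addresses are made up, because the originals did not survive extraction. The helper `try_insert` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE User (
    firstName TEXT,
    login     TEXT PRIMARY KEY,          -- the chosen candidate key
    email     TEXT NOT NULL UNIQUE       -- another candidate key
);
CREATE TABLE League (
    name  TEXT PRIMARY KEY,
    login TEXT REFERENCES User(login)    -- foreign key
);
""")
conn.executemany("INSERT INTO User VALUES (?, ?, ?)", [
    ("alice", "am384", "alice@example.com"),
    ("john", "js289", "john@example.com"),
    ("bob", "bd", "bob@example.com"),
])
conn.executemany("INSERT INTO League VALUES (?, ?)", [
    ("tictactoeNovice", "am384"),
    ("tictactoeExpert", "bd"),
    ("chessNovice", "js289"),
])


def try_insert(sql, params):
    """Return True if the insert succeeds, False if a key constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


dup_email = try_insert("INSERT INTO User VALUES (?, ?, ?)", ("eve", "ev1", "bob@example.com"))
bad_fk = try_insert("INSERT INTO League VALUES (?, ?)", ("chessExpert", "nobody"))
print(dup_email, bad_fk)  # False False
```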
Buried Association
Associations with multiplicity “one” can be
implemented using a foreign key
For one-to-many associations we add the foreign key to the
table representing the class on the “many” end
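[Figure: association between LeagueOwner (multiplicity 1, role owner) and League (multiplicity *).]
LeagueOwner table: id:long, ...
League table: id:long, ..., owner:long (foreign key referencing LeagueOwner)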
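In a minimal SQLite sketch, the buried association is just the `owner` column on the "many" side, and navigating the association becomes a join on that column. The name columns and all rows are hypothetical fillers for the elided "...".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LeagueOwner (id INTEGER PRIMARY KEY, name TEXT);
-- The one-to-many association is buried in League as the foreign key 'owner'.
CREATE TABLE League (
    id    INTEGER PRIMARY KEY,
    name  TEXT,
    owner INTEGER REFERENCES LeagueOwner(id)
);
INSERT INTO LeagueOwner VALUES (1, 'alice'), (2, 'bob');
INSERT INTO League VALUES (10, 'tictactoe', 1), (11, 'chess', 1), (12, 'go', 2);
""")

# Following the association from the "one" end means joining on the foreign key.
leagues_of_alice = [r[0] for r in conn.execute("""
    SELECT League.name
    FROM League JOIN LeagueOwner ON League.owner = LeagueOwner.id
    WHERE LeagueOwner.name = 'alice'
    ORDER BY League.name""")]
print(leagues_of_alice)  # ['chess', 'tictactoe']
```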
Another Example for Buried Association
[Figure: one-to-many association between Portfolio and Transaction (* on the Transaction end).]
Portfolio table: portfolioID, ...
Transaction table: transactionID, portfolioID (foreign key), ...
Mapping Many-To-Many Associations
In this case we need a separate table for the association.
[Figure: association Serves between City (cityName) and Airport (airportCode, airportName), with multiplicity * at both ends.]
City table (primary key: cityName):
cityName
Houston
Albany
Munich
Hamburg
Airport table (primary key: airportCode):
airportCode | airportName
IAH         | Intercontinental
HOU         | Hobby
ALB         | Albany County
MUC         | Munich Airport
HAM         | Hamburg Airport
Serves table (separate table for the association "Serves"):
cityName | airportCode
Houston  | IAH
Houston  | HOU
Albany   | ALB
Munich   | MUC
Hamburg  | HAM
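Here is the same mapping as a runnable SQLite sketch. The Serves table's key is assumed to be the pair (cityName, airportCode). The helper `airports_serving` is hypothetical; it traverses the many-to-many association through the association table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE City    (cityName TEXT PRIMARY KEY);
CREATE TABLE Airport (airportCode TEXT PRIMARY KEY, airportName TEXT);
-- The many-to-many association gets its own table, keyed by the pair.
CREATE TABLE Serves (
    cityName    TEXT REFERENCES City(cityName),
    airportCode TEXT REFERENCES Airport(airportCode),
    PRIMARY KEY (cityName, airportCode)
);
INSERT INTO City VALUES ('Houston'), ('Albany'), ('Munich'), ('Hamburg');
INSERT INTO Airport VALUES ('IAH', 'Intercontinental'), ('HOU', 'Hobby'),
    ('ALB', 'Albany County'), ('MUC', 'Munich Airport'), ('HAM', 'Hamburg Airport');
INSERT INTO Serves VALUES ('Houston', 'IAH'), ('Houston', 'HOU'),
    ('Albany', 'ALB'), ('Munich', 'MUC'), ('Hamburg', 'HAM');
""")


def airports_serving(city):
    """Names of the airports serving a city, via the association table."""
    rows = conn.execute("""
        SELECT Airport.airportName
        FROM Serves JOIN Airport ON Serves.airportCode = Airport.airportCode
        WHERE Serves.cityName = ?
        ORDER BY Airport.airportName""", (city,))
    return [r[0] for r in rows]


print(airports_serving("Houston"))  # ['Hobby', 'Intercontinental']
```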
Another Many-to-Many Association Mapping
We need the Tournament/Player association as a separate table.
[Figure: association between Tournament (*) and Player (*).]
Tournament table:
id | name
23 | novice
24 | expert
...
Player table:
id | name
56 | alice
79 | john
...
TournamentPlayerAssociation table:
tournament | player
23         | 56
23         | 79
Problems in Designing a Schema
Title | ISBN         | Publisher | Phone       | Address
OS    | 1234-390-231 | Wiley     | 312-1234567 | 87 1st Ave, NY, …
DB    | 3234-390-241 | Wiley     | 312-1234567 | 87 1st Ave, NY, …
SE    | 5234-390-281 | Wiley     | 312-1234567 | 87 1st Ave, NY, …
….
Problems:
- redundancy
- update anomalies
- deletion anomalies
Relation Decomposition
Break the relation into two relations:
Book:
Title | ISBN         | Author | Publisher
OS    | 1234-390-231 | xxx    | Wiley
DB    | 3234-390-241 | yyy    | Wiley
SE    | 5234-390-281 | aaa    | Wiley
….
Publisher:
Name   | Phone Number   | Address
Wiley  | (201) 555-1234 | 87 1st Ave, NY, …
Wiley  | (201) 555-1234 | 87 1st Ave, NY, …
McGraw | (320) 234-9876 | 87 1st Ave, NY, …
McGraw | (320) 234-9876 | 87 1st Ave, NY, …
Anomalies
Programs that update such relations will not operate correctly.
Example: the EMP_DEPT relation
EMP_DEPT(EName, SIN, BDate, ADDR, Dnumber, Dname, DMgrSIN)
Insertion anomalies: It is difficult to insert a new department that has no employees yet into the EMP_DEPT relation.
Deletion anomalies: If we delete from EMP_DEPT an employee tuple that happens to represent the last employee working for a particular department, the information concerning that department is lost from the database.
Update anomalies: In the EMP_DEPT relation, if we want to change the value of one of the attributes of a particular department, say the manager of department 5, we must update the tuples of all employees who work in that department; otherwise, the database will become inconsistent.
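The update and deletion anomalies can be reproduced in a few lines of SQLite. This is a sketch: all the EMP_DEPT rows are made up, and SINs are shortened to three digits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE EMP_DEPT (
    EName TEXT, SIN TEXT PRIMARY KEY, BDate TEXT, ADDR TEXT,
    Dnumber INTEGER, Dname TEXT, DMgrSIN TEXT)""")
conn.executemany("INSERT INTO EMP_DEPT VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("Smith", "111", "1965-01-09", "Houston", 5, "Research", "333"),
    ("Wong", "333", "1955-12-08", "Houston", 5, "Research", "333"),
    ("Zelaya", "999", "1968-07-19", "Spring", 4, "Administration", "987"),
])

# Update anomaly: department 5's manager is changed in only one of its tuples.
conn.execute("UPDATE EMP_DEPT SET DMgrSIN = '111' WHERE SIN = '111'")
managers_of_5 = {r[0] for r in conn.execute(
    "SELECT DMgrSIN FROM EMP_DEPT WHERE Dnumber = 5")}
print(len(managers_of_5))  # 2: department 5 now has two "managers"

# Deletion anomaly: deleting the last employee of department 4 also deletes
# everything we knew about department 4 itself.
conn.execute("DELETE FROM EMP_DEPT WHERE SIN = '999'")
dept4 = conn.execute("SELECT Dname FROM EMP_DEPT WHERE Dnumber = 4").fetchall()
print(dept4)  # []
```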
Decompositions in General
Let $R$ be a relation with attributes $A_1, A_2, \ldots, A_n$.
Create two relations $R_1$ and $R_2$ with attributes $B_1, B_2, \ldots, B_m$ and $C_1, C_2, \ldots, C_l$, respectively, such that:
$$\{B_1, B_2, \ldots, B_m\} \cup \{C_1, C_2, \ldots, C_l\} = \{A_1, A_2, \ldots, A_n\}$$
And
-- $R_1$ is the projection of $R$ on $B_1, B_2, \ldots, B_m$
-- $R_2$ is the projection of $R$ on $C_1, C_2, \ldots, C_l$
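The definition only requires the two attribute sets to cover all of R. It does not guarantee that joining $R_1$ and $R_2$ gives back R. The plain-Python sketch below implements projection and natural join over relations stored as sets of tuples, and uses a made-up EMP_DEPT-style instance to compare a decomposition that rejoins exactly with one that produces spurious tuples. The helpers `project`, `natural_join` and `decompose_and_rejoin` are illustrative, not from any library.

```python
def project(attrs, rows, onto):
    """Projection of rows (over attrs) onto 'onto'; a set, so duplicates vanish."""
    idx = [attrs.index(a) for a in onto]
    return {tuple(r[i] for i in idx) for r in rows}


def natural_join(attrs1, rows1, attrs2, rows2):
    """Natural join: combine rows that agree on every shared attribute."""
    shared = [a for a in attrs1 if a in attrs2]
    extra = [a for a in attrs2 if a not in attrs1]
    out = set()
    for r1 in rows1:
        for r2 in rows2:
            if all(r1[attrs1.index(a)] == r2[attrs2.index(a)] for a in shared):
                out.add(r1 + tuple(r2[attrs2.index(a)] for a in extra))
    return tuple(attrs1) + tuple(extra), out


# Hypothetical instance of R over attributes A = (EName, SIN, Dnumber, Dname).
A = ("EName", "SIN", "Dnumber", "Dname")
R = {("Smith", "111", 5, "Research"),
     ("Wong", "333", 5, "Research"),
     ("Zelaya", "999", 4, "Administration")}


def decompose_and_rejoin(B, C):
    """Project R onto B and C, join the results, return rows in A's column order."""
    assert set(B) | set(C) == set(A)  # the decomposition condition above
    joined_attrs, joined = natural_join(B, project(A, R, B), C, project(A, R, C))
    order = [joined_attrs.index(a) for a in A]
    return {tuple(row[i] for i in order) for row in joined}


good = decompose_and_rejoin(("EName", "SIN", "Dnumber"), ("Dnumber", "Dname"))
bad = decompose_and_rejoin(("EName", "Dnumber"), ("Dnumber", "SIN", "Dname"))
print(good == R)         # True: the join gives back exactly R
print(len(bad), len(R))  # 5 3: spurious tuples appear
```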
Boyce-Codd Normal Form
A simple condition for removing anomalies from relations:
A relation $R$ is in BCNF if and only if:
Whenever there is a nontrivial dependency $A_1, A_2, \ldots, A_n \to B$ for $R$, it is the case that $\{A_1, A_2, \ldots, A_n\}$ is a super-key for $R$.
In English (though a bit vague):
Whenever a set of attributes of R determines another attribute, it should determine all the attributes of R.
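A standard way to test the BCNF condition uses the attribute closure: X is a superkey exactly when $X^+$ contains every attribute of R. The Python sketch below assumes the usual functional dependencies for EMP_DEPT, namely SIN → EName, BDate, ADDR, Dnumber and Dnumber → Dname, DMgrSIN. These dependencies are assumptions; the slides do not list them. Note also that checking a decomposed relation properly requires the dependencies projected onto it. Here, the one dependency that applies to DEPT is passed in by hand.

```python
def closure(attrs, fds):
    """Attribute closure X+: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Nontrivial FDs X -> Y whose left side X is not a superkey of schema."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != set(schema)]


def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


# Assumed FDs for EMP_DEPT (not stated on the slides).
schema = {"EName", "SIN", "BDate", "ADDR", "Dnumber", "Dname", "DMgrSIN"}
fds = [fd("SIN", "EName BDate ADDR Dnumber"), fd("Dnumber", "Dname DMgrSIN")]

print(closure({"SIN"}, fds) == schema)  # True: SIN is a key
for lhs, rhs in bcnf_violations(schema, fds):
    print(sorted(lhs), "->", sorted(rhs))  # ['Dnumber'] -> ['DMgrSIN', 'Dname']

# After decomposition, DEPT(Dnumber, Dname, DMgrSIN) has no violation.
print(bcnf_violations({"Dnumber", "Dname", "DMgrSIN"}, fds[1:]))  # []
```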
Example
Title | ISBN          | Publisher | Author | Phone          | Addr
OS    | 0-471-20284-3 | Wiley     | xxx    | (201) 555-1234 | 1234 1st
DB    | 0-471-20282-3 | Wiley     | yyy    | (206) 572-4312 | 1234 1st
SE    | 0-471-20267-8 | Wiley     | aaa    | (201) 555-1234 | 1234 1st
Netw. | 0-471-20267-8 | Wiley     | bbb    | (201) 555-1234 | 1234 1st
What are the dependencies?
What are the keys?
Is it in BCNF?
And Now?
Title | ISBN          | Publisher | Author
OS    | 0-471-20284-3 | Wiley     | xxx
DB    | 0-471-20282-3 | Wiley     | yyy
SE    | 0-471-20267-8 | Wiley     | aaa
Netw. | 0-471-20267-8 | Wiley     | bbb

Publisher | Phone    | Addr
Wiley     | 555-1234 | 1234 1st St. ……
McGraw    | 234-9876 | 9876 5th Ave. ….
More Examples
EMP_DEPT(ENAME, SIN, BDATE, ADDR, DNUM, DNAME, DMGRSIN)
What’s wrong?
How should we decompose it? Use the functional dependencies.
Decompose EMP_DEPT into:
EMP(ENAME, SIN, BDATE, ADDR, DNUM)
DEPT(DNUM, DNAME, DMGRSIN)
More Examples (cont’d)
Example:
EMP_PROJ(SIN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION)
Can be decomposed into:
EP1(SIN, PNUMBER, HOURS)
EP2(SIN, ENAME)
EP3(PNUMBER, PNAME, PLOCATION)
More Examples (cont’d)
EMP
ENAME | Proj_NAME | Dep_NAME
Smith | x         | john
Smith | y         | anna
Smith | x         | anna
Smith | y         | john
Brown | w         | jim
Brown | x         | jim
Brown | y         | jim
Brown | z         | jim
Brown | w         | joan
Brown | x         | joan
Brown | y         | joan
Brown | z         | joan
Brown | w         | bob
Brown | x         | bob
Brown | y         | bob
Brown | z         | bob
Decompose EMP into:
More Examples (cont’d)
EMP_PROJECTS
ENAME | Proj_NAME
Smith | x
Smith | y
Brown | w
Brown | x
Brown | y
Brown | z
EMP_DEPENDENTS
ENAME | Dep_NAME
Smith | anna
Smith | john
Brown | jim
Brown | joan
Brown | bob
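A quick plain-Python check shows that no information is lost when EMP is split into EMP_PROJECTS and EMP_DEPENDENTS: joining the two back on ENAME recreates all 16 tuples of EMP. Projects and dependents are independent of each other for a given employee, so each employee's tuples in EMP are exactly every (project, dependent) combination.

```python
EMP_PROJECTS = {("Smith", "x"), ("Smith", "y"),
                ("Brown", "w"), ("Brown", "x"), ("Brown", "y"), ("Brown", "z")}
EMP_DEPENDENTS = {("Smith", "anna"), ("Smith", "john"),
                  ("Brown", "jim"), ("Brown", "joan"), ("Brown", "bob")}

# Natural join on ENAME rebuilds EMP(ENAME, Proj_NAME, Dep_NAME).
EMP = {(e, p, d)
       for (e, p) in EMP_PROJECTS
       for (e2, d) in EMP_DEPENDENTS
       if e == e2}

print(len(EMP))  # 16 = 2 * 2 (Smith) + 4 * 3 (Brown)
```

The two smaller relations hold 11 tuples in total, against 16 in EMP. The saving grows with the number of projects and dependents per employee.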
SQL Introduction
Standard language for querying and manipulating data
Structured Query Language
Many standards out there: SQL-92 (also called SQL2) and SQL:1999 (also called SQL3).
Vendors support various subsets of these, but all of them support what we’ll be talking about.
Basic form (with many, many more bells and whistles in addition):
SELECT attributes
FROM relations (possibly multiple, joined)
WHERE conditions (selections)
SQL Examples
Employee (FNAME, LNAME, SSN, BDATE, ADDR, SALARY, SUPERSSN, DNO)
Department (DNAME, DNUMBER, MGRSSN, MGRSTARTDATE)
Research, 5, 333445555, 22-May-78
Administration, 4, 987654321, 1-Jan-85
Headquarters, 1, 888665555, 19-Jun-71
Q1: Find John Smith’s birthday and address:
Q2: Find the salary of all employees:
Q3: Find all the attributes of all employees who work for department 5
Q4: Find all employees who work for the Research department
Q5: For each employee, retrieve the employee’s first and last name, and the
first and last name of all employees who work in the same department.
Q6: For each employee, retrieve the employee’s first and last name, and the
first name and last name of his/her supervisor.
SQL Examples - 1
Employee (FNAME, LNAME, SSN, BDATE, ADDR, SALARY, SUPERSSN, DNO)
Q1: Find John Smith’s birthday and address:
SELECT BDATE, ADDR
FROM EMPLOYEE
WHERE FNAME = 'John' AND LNAME = 'Smith'
Q2: Find the salary of all employees:
SELECT SALARY
FROM EMPLOYEE
Q3: Find all the attributes of all employees who work for department 5
SELECT *
FROM EMPLOYEE
WHERE DNO = 5
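Q1 to Q3 can be run unchanged against SQLite from Python. In this sketch the DEPARTMENT rows come from the slide, but the four EMPLOYEE rows are made up for illustration. Q2 returns rows in no guaranteed order, so only its size is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (FNAME TEXT, LNAME TEXT, SSN TEXT PRIMARY KEY, BDATE TEXT,
                       ADDR TEXT, SALARY INTEGER, SUPERSSN TEXT, DNO INTEGER);
CREATE TABLE DEPARTMENT (DNAME TEXT, DNUMBER INTEGER PRIMARY KEY,
                         MGRSSN TEXT, MGRSTARTDATE TEXT);
INSERT INTO DEPARTMENT VALUES ('Research', 5, '333445555', '22-May-78'),
    ('Administration', 4, '987654321', '1-Jan-85'),
    ('Headquarters', 1, '888665555', '19-Jun-71');
-- Made-up employees for illustration.
INSERT INTO EMPLOYEE VALUES
    ('John', 'Smith', '123456789', '1965-01-09', '731 Fondren', 30000, '333445555', 5),
    ('Franklin', 'Wong', '333445555', '1955-12-08', '638 Voss', 40000, '888665555', 5),
    ('Jennifer', 'Wallace', '987654321', '1941-06-20', '291 Berry', 43000, '888665555', 4),
    ('James', 'Borg', '888665555', '1937-11-10', '450 Stone', 55000, NULL, 1);
""")

q1 = conn.execute("SELECT BDATE, ADDR FROM EMPLOYEE "
                  "WHERE FNAME = 'John' AND LNAME = 'Smith'").fetchall()
q2 = conn.execute("SELECT SALARY FROM EMPLOYEE").fetchall()
q3 = conn.execute("SELECT * FROM EMPLOYEE WHERE DNO = 5").fetchall()
print(q1)       # [('1965-01-09', '731 Fondren')]
print(len(q2))  # 4
print(len(q3))  # 2
```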
SQL Examples - 2
Employee (FNAME, LNAME, SSN, BDATE, ADDR, SALARY, SUPERSSN, DNO)
Department (DNAME, DNUMBER, MGRSSN, MGRSTARTDATE)
Research, 5, 333445555, 22-May-78
Administration, 4, 987654321, 1-Jan-85
Headquarters, 1, 888665555, 19-Jun-71
Q4: Find all employees who work for the Research department
SELECT FNAME, LNAME, ADDR
FROM EMPLOYEE, DEPARTMENT
WHERE DNAME = 'Research' AND DNUMBER = DNO
SQL Examples - 3
Employee (FNAME, LNAME, SSN, BDATE, ADDR, SALARY, SUPERSSN, DNO)
Department (DNAME, DNUMBER, MGRSSN, MGRSTARTDATE)
Q5: For each employee, retrieve the employee’s first and last name, and the
first and last name of all employees who work in the same department.
SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
FROM EMPLOYEE AS E, EMPLOYEE AS S
WHERE E.DNO = S.DNO
Q6: For each employee, retrieve the employee’s first and last name, and the
first name and last name of his/her supervisor.
SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
FROM EMPLOYEE AS E, EMPLOYEE AS S
WHERE E.SUPERSSN = S.SSN
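The join in Q4 and the self-joins in Q5 and Q6 can be run the same way. This sketch uses made-up employees and trims the tables to the columns these queries need. ORDER BY is added only so the printed output is deterministic. Note that Q5, as written, also pairs each employee with himself or herself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (FNAME TEXT, LNAME TEXT, SSN TEXT PRIMARY KEY,
                       SUPERSSN TEXT, DNO INTEGER);
CREATE TABLE DEPARTMENT (DNAME TEXT, DNUMBER INTEGER PRIMARY KEY);
INSERT INTO DEPARTMENT VALUES ('Research', 5), ('Administration', 4), ('Headquarters', 1);
-- Made-up employees, trimmed to the needed columns.
INSERT INTO EMPLOYEE VALUES
    ('John', 'Smith', '123456789', '333445555', 5),
    ('Franklin', 'Wong', '333445555', '888665555', 5),
    ('James', 'Borg', '888665555', NULL, 1);
""")

# Q4: join of EMPLOYEE and DEPARTMENT.
q4 = conn.execute("""SELECT FNAME, LNAME FROM EMPLOYEE, DEPARTMENT
                     WHERE DNAME = 'Research' AND DNUMBER = DNO
                     ORDER BY LNAME""").fetchall()
# Q5: EMPLOYEE joined with itself on the department number.
q5 = conn.execute("""SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
                     FROM EMPLOYEE AS E, EMPLOYEE AS S
                     WHERE E.DNO = S.DNO""").fetchall()
# Q6: tuple variable E is the employee, S the supervisor.
q6 = conn.execute("""SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
                     FROM EMPLOYEE AS E, EMPLOYEE AS S
                     WHERE E.SUPERSSN = S.SSN
                     ORDER BY E.LNAME""").fetchall()
print(q4)       # [('John', 'Smith'), ('Franklin', 'Wong')]
print(len(q5))  # 5: four pairs in department 5, one in department 1
print(q6)       # [('John', 'Smith', 'Franklin', 'Wong'), ('Franklin', 'Wong', 'James', 'Borg')]
```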
Selections
SELECT *
FROM Company
WHERE country = 'USA' AND stockPrice > 50
You can use:
attribute names of the relation(s) used in the FROM clause
comparison operators: =, <>, <, >, <=, >=
arithmetic operations, e.g., stockPrice*2
operations on strings (e.g., || for concatenation)
lexicographic order on strings
pattern matching: s LIKE p
special operations for comparing dates and times
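Several of these features fit into a single selection. In this SQLite sketch the company rows are hypothetical. Be aware that SQLite's LIKE is case-insensitive for ASCII letters by default, whereas the SQL standard's LIKE is case-sensitive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Company (name TEXT, stockPrice REAL, country TEXT)")
conn.executemany("INSERT INTO Company VALUES (?, ?, ?)", [  # hypothetical rows
    ("GizmoWorks", 25.0, "USA"),
    ("Canon", 65.0, "Japan"),
    ("Hitachi", 15.0, "Japan"),
    ("Apple", 120.0, "USA"),
])
rows = conn.execute("""
    SELECT name || ' (' || country || ')',    -- string concatenation
           stockPrice * 2                     -- arithmetic in the SELECT list
    FROM Company
    WHERE country = 'USA' AND name LIKE 'A%'  -- '%' matches any character sequence
""").fetchall()
print(rows)  # [('Apple (USA)', 240.0)]
```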
Projections
Select only a subset of the attributes
SELECT name, stockPrice
FROM Company
WHERE country = 'USA' AND stockPrice > 50
Rename the attributes in the resulting table:
SELECT name AS company, stockPrice AS price
FROM Company
WHERE country = 'USA' AND stockPrice > 50
Ordering the Results
SELECT name, stockPrice
FROM Company
WHERE country = 'USA' AND stockPrice > 50
ORDER BY country, name
Ordering is ascending unless you specify the DESC keyword.
Ties are broken by the second attribute on the ORDER BY list, etc.
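The following SQLite sketch shows DESC and tie-breaking together. The rows are hypothetical. Two companies share a price, so the second ORDER BY attribute, name (ascending by default), decides their order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Company (name TEXT, stockPrice REAL, country TEXT)")
conn.executemany("INSERT INTO Company VALUES (?, ?, ?)", [  # hypothetical rows
    ("Hitachi", 65.0, "Japan"), ("Apple", 120.0, "USA"),
    ("Canon", 65.0, "Japan"), ("GizmoWorks", 25.0, "USA"),
])
rows = conn.execute("""
    SELECT name, stockPrice FROM Company
    ORDER BY stockPrice DESC, name   -- price descending, ties by name ascending
""").fetchall()
print([name for name, _ in rows])  # ['Apple', 'Canon', 'Hitachi', 'GizmoWorks']
```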
Joins
SELECT name, store
FROM Person, Purchase
WHERE name = buyer AND city = 'Ottawa'
  AND product = 'iPhone'
Product (name, price, category, maker)
Purchase (buyer, seller, store, product)
Company (name, stockPrice, country)
Person (name, phoneNumber, city)
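Here is the join query run in SQLite. Only the two relations it needs are created, and all rows are made up. The WHERE clause links each Person tuple to that person's Purchase tuples through name = buyer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person   (name TEXT, phoneNumber TEXT, city TEXT);
CREATE TABLE Purchase (buyer TEXT, seller TEXT, store TEXT, product TEXT);
-- Made-up rows.
INSERT INTO Person VALUES ('Ann', '555-0101', 'Ottawa'), ('Bo', '555-0102', 'Toronto');
INSERT INTO Purchase VALUES ('Ann', 'Sam', 'Rideau', 'iPhone'),
                            ('Bo', 'Sam', 'Eaton', 'iPhone'),
                            ('Ann', 'Kim', 'Rideau', 'Vista');
""")
rows = conn.execute("""
    SELECT name, store
    FROM Person, Purchase
    WHERE name = buyer AND city = 'Ottawa' AND product = 'iPhone'
""").fetchall()
print(rows)  # [('Ann', 'Rideau')]
```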
Disambiguating Attributes
Find names of people buying telephony products:
SELECT Person.name
FROM Person, Purchase, Product
WHERE Person.name = buyer
  AND product = Product.name
  AND Product.category = 'telephony'
Product (name, price, category, maker)
Purchase (buyer, seller, store, product)
Person (name, phoneNumber, city)
Tuple Variables
Find pairs of companies making products in the same category
SELECT product1.maker, product2.maker
FROM Product AS product1, Product AS product2
WHERE product1.category = product2.category
  AND product1.maker <> product2.maker
Product ( name, price, category, maker)
Union, Intersection, Difference
(SELECT name
 FROM Person
 WHERE city = 'Seattle')
UNION
(SELECT name
 FROM Person, Purchase
 WHERE buyer = name AND store = 'The Bon')
Similarly, you can use INTERSECT and EXCEPT.
Both queries must return the same number of attributes, with compatible types; rename attributes if you want the names to match.
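The sketch below runs the same pair of queries under UNION, INTERSECT and EXCEPT in SQLite, using made-up rows. One SQLite-specific detail: it does not accept parentheses around the operands of a compound SELECT, so the queries are written without them. The helper `names` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person   (name TEXT, city TEXT);
CREATE TABLE Purchase (buyer TEXT, store TEXT);
-- Made-up rows.
INSERT INTO Person VALUES ('Ann', 'Seattle'), ('Bo', 'Seattle'), ('Cy', 'Portland');
INSERT INTO Purchase VALUES ('Bo', 'The Bon'), ('Cy', 'The Bon'), ('Cy', 'The Bon');
""")


def names(op):
    """Combine 'people in Seattle' and 'buyers at The Bon' with the given operator."""
    sql = f"""SELECT name FROM Person WHERE city = 'Seattle'
              {op}
              SELECT name FROM Person, Purchase
              WHERE buyer = name AND store = 'The Bon'"""
    return sorted(r[0] for r in conn.execute(sql))


print(names("UNION"))      # ['Ann', 'Bo', 'Cy']
print(names("INTERSECT"))  # ['Bo']
print(names("EXCEPT"))     # ['Ann']
```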
Subqueries
SELECT Purchase.product
FROM Purchase
WHERE buyer =
  (SELECT name
   FROM Person
   WHERE ssn = '123-45-6789');
In this case, the subquery returns one value.
If it returns more than one, it’s a run-time error.
Subqueries Returning Relations
Find companies that manufacture products bought by Joe Blow.
SELECT Company.name
FROM Company, Product
WHERE Company.name = maker
  AND Product.name IN
    (SELECT product
     FROM Purchase
     WHERE buyer = 'Joe Blow');
You can also use:
s > ALL R
s > ANY R
EXISTS R
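The IN query above can also be written with EXISTS. In this SQLite sketch on made-up rows, both forms return the same companies. SQLite does not support the ANY/ALL comparison forms, so they are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company  (name TEXT);
CREATE TABLE Product  (name TEXT, price REAL, category TEXT, maker TEXT);
CREATE TABLE Purchase (buyer TEXT, seller TEXT, store TEXT, product TEXT);
-- Made-up rows.
INSERT INTO Company VALUES ('Acme'), ('Gizmo'), ('Widgetco');
INSERT INTO Product VALUES ('clock', 20, 'household', 'Acme'),
                           ('phone', 99, 'telephony', 'Gizmo'),
                           ('lamp', 30, 'household', 'Widgetco');
INSERT INTO Purchase VALUES ('Joe Blow', 'Fred', 'Mall', 'clock'),
                            ('Joe Blow', 'Fred', 'Mall', 'phone'),
                            ('Ann', 'Kim', 'Mall', 'lamp');
""")

in_query = """SELECT Company.name FROM Company, Product
              WHERE Company.name = maker
                AND Product.name IN (SELECT product FROM Purchase
                                     WHERE buyer = 'Joe Blow')"""
# Same question: does some product of this company appear in Joe Blow's purchases?
exists_query = """SELECT Company.name FROM Company
                  WHERE EXISTS (SELECT * FROM Product, Purchase
                                WHERE Product.maker = Company.name
                                  AND Purchase.product = Product.name
                                  AND Purchase.buyer = 'Joe Blow')"""
a = sorted(r[0] for r in conn.execute(in_query))
b = sorted(r[0] for r in conn.execute(exists_query))
print(a, b)  # ['Acme', 'Gizmo'] ['Acme', 'Gizmo']
```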
Conditions on Tuples
SELECT Company.name
FROM Company, Product
WHERE Company.name = maker
  AND (Product.name, price) IN
    (SELECT product, price
     FROM Purchase
     WHERE buyer = 'Joe Blow');
Correlated Queries
Find movies whose title appears more than once.
SELECT title
FROM Movie AS Old
WHERE year < ANY
(SELECT year
FROM Movie
WHERE title = Old.title);
Movie (title, year, director, length)
Movie titles are not unique (titles may reappear in a later year).
Note the scope of variables: Old, defined in the outer query, is visible inside the subquery.
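SQLite has no ANY, so in this sketch `year < ANY (...)` is rewritten as an equivalent EXISTS with `year > Old.year`. The movie rows are made up. Because the subquery refers to Old, it is evaluated once per outer tuple. Inside it, the unqualified title and year refer to the inner Movie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movie (title TEXT, year INTEGER, director TEXT, length INTEGER)")
# Made-up rows: 'Alpha' was made three times, 'Beta' once.
conn.executemany("INSERT INTO Movie (title, year) VALUES (?, ?)",
                 [("Alpha", 1990), ("Alpha", 2005), ("Alpha", 2019), ("Beta", 2001)])

# 'year < ANY (...)' rewritten with EXISTS (no ANY/ALL in SQLite).
rows = conn.execute("""
    SELECT title, year FROM Movie AS Old
    WHERE EXISTS (SELECT * FROM Movie
                  WHERE title = Old.title AND year > Old.year)
    ORDER BY year""").fetchall()
print(rows)  # [('Alpha', 1990), ('Alpha', 2005)]
```

A title made n times is returned n - 1 times (every version except the latest), so SELECT DISTINCT title is needed to list each title once.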
Removing Duplicates
SELECT DISTINCT Company.name
FROM Company, Product
WHERE Company.name = maker
  AND (Product.name, price) IN
    (SELECT product, price
     FROM Purchase
     WHERE buyer = 'Joe Blow');
Conserving Duplicates
The UNION, INTERSECT and EXCEPT operators operate on sets, not bags. To keep duplicates, use UNION ALL:
(SELECT name
 FROM Person
 WHERE city = 'Seattle')
UNION ALL
(SELECT name
 FROM Person, Purchase
 WHERE buyer = name AND store = 'The Bon')
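The set-versus-bag difference is easy to see on made-up data where one person bought twice at The Bon. As before, the SQLite version omits the parentheses, which SQLite does not accept around compound-SELECT operands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person   (name TEXT, city TEXT);
CREATE TABLE Purchase (buyer TEXT, store TEXT);
-- Made-up rows: Bo bought twice at The Bon.
INSERT INTO Person VALUES ('Ann', 'Seattle'), ('Bo', 'Seattle');
INSERT INTO Purchase VALUES ('Bo', 'The Bon'), ('Bo', 'The Bon');
""")
q = """SELECT name FROM Person WHERE city = 'Seattle'
       {op}
       SELECT name FROM Person, Purchase WHERE buyer = name AND store = 'The Bon'"""
as_set = sorted(r[0] for r in conn.execute(q.format(op="UNION")))
as_bag = sorted(r[0] for r in conn.execute(q.format(op="UNION ALL")))
print(as_set)  # ['Ann', 'Bo']
print(as_bag)  # ['Ann', 'Bo', 'Bo', 'Bo']
```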
Aggregation
SELECT SUM(price)
FROM Product
WHERE manufacturer = 'Toyota'
SQL supports several aggregation operations:
SUM, MIN, MAX, AVG, COUNT
Except for COUNT(*), all aggregations apply to a single attribute:
SELECT COUNT(*)
FROM Purchase
Grouping and Aggregation
Usually, we want aggregations on certain parts of the relation.
Find how much we sold of every product:
SELECT Purchase.product, SUM(Product.price)
FROM Product, Purchase
WHERE Product.name = Purchase.product
GROUP BY Purchase.product
1. Compute the relation (i.e., the FROM and WHERE).
2. Group by the attributes in the GROUP BY.
3. Select one tuple for every group (and apply aggregation).
SELECT can have (1) grouped attributes or (2) aggregates.
HAVING Clause
Same query, except that we consider only products that had at least 100 buyers:
SELECT Purchase.product, SUM(Product.price)
FROM Product, Purchase
WHERE Product.name = Purchase.product
GROUP BY Purchase.product
HAVING COUNT(DISTINCT Purchase.buyer) >= 100
The HAVING clause contains conditions on aggregates.
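The three steps (FROM/WHERE, then GROUP BY, then HAVING on each group) can be traced on toy data in SQLite. The rows are made up. With so few rows, the slide's threshold of 100 buyers is replaced by a hypothetical parameter MIN_BUYERS = 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product  (name TEXT, price REAL);
CREATE TABLE Purchase (buyer TEXT, product TEXT);
-- Made-up rows: 'clock' bought 3 times by 2 distinct buyers, 'lamp' once.
INSERT INTO Product VALUES ('clock', 20), ('lamp', 30);
INSERT INTO Purchase VALUES ('Ann', 'clock'), ('Bo', 'clock'), ('Ann', 'clock'),
                            ('Cy', 'lamp');
""")
MIN_BUYERS = 2  # stands in for the 100 on the slide

rows = conn.execute("""
    SELECT Purchase.product, SUM(Product.price)
    FROM Product, Purchase
    WHERE Product.name = Purchase.product
    GROUP BY Purchase.product
    HAVING COUNT(DISTINCT Purchase.buyer) >= ?
""", (MIN_BUYERS,)).fetchall()
print(rows)  # [('clock', 60.0)]
```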
Modifying the Database
We have 3 kinds of modifications: insertion, deletion, update.
Insertion: general form
INSERT INTO R(A1, …, An) VALUES (v1, …, vn)
Insert a new purchase into the database:
INSERT INTO Purchase(buyer, seller, product, store)
VALUES ('Joe', 'Fred', 'wakeup-clock-espresso-machine',
        'The Sharper Image')
If we don’t provide all the attributes of R, the missing ones are filled with NULL (or with their declared default values).
We can drop the attribute names if we’re providing all of them in order.
More Interesting Insertions
INSERT INTO Product(name)
  SELECT DISTINCT product
  FROM Purchase
  WHERE product NOT IN
    (SELECT name
     FROM Product)
The query replaces the VALUES clause.
Note the order of querying and inserting: the query is evaluated completely before any tuple is inserted.
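Here is the INSERT ... SELECT form run in SQLite on made-up rows. DISTINCT ensures the product bought twice produces one new Product row. Because only name is supplied, that row's price is NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product  (name TEXT, price REAL);
CREATE TABLE Purchase (buyer TEXT, seller TEXT, product TEXT, store TEXT);
-- Made-up rows.
INSERT INTO Product VALUES ('clock', 20);
INSERT INTO Purchase VALUES ('Joe', 'Fred', 'clock', 'The Sharper Image'),
                            ('Joe', 'Fred', 'espresso-machine', 'The Sharper Image'),
                            ('Ann', 'Fred', 'espresso-machine', 'The Sharper Image');
""")
# The SELECT supplies the rows in place of a VALUES clause.
conn.execute("""
    INSERT INTO Product(name)
    SELECT DISTINCT product FROM Purchase
    WHERE product NOT IN (SELECT name FROM Product)""")
print(conn.execute("SELECT name, price FROM Product ORDER BY name").fetchall())
# [('clock', 20.0), ('espresso-machine', None)]
```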
Deletions
DELETE FROM PURCHASE
WHERE seller = 'Joe' AND
      product = 'Brooklyn Bridge'
Factoid about SQL: there is no way to delete only a single occurrence of a tuple that appears twice in a relation.
Updates
UPDATE PRODUCT
SET price = price/2
WHERE Product.name IN
  (SELECT product
   FROM Sales
   WHERE Date = CURRENT_DATE);
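The DELETE and UPDATE statements can be run together in one SQLite sketch on made-up rows. It also illustrates the factoid: both identical 'Brooklyn Bridge' tuples satisfy the condition, so both are deleted. The Sales column is named saleDate here (a hypothetical name, not the slide's Date), and today is written as CURRENT_DATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product  (name TEXT, price REAL);
CREATE TABLE Purchase (buyer TEXT, seller TEXT, product TEXT, store TEXT);
CREATE TABLE Sales    (product TEXT, saleDate TEXT);
-- Made-up rows; the 'Brooklyn Bridge' purchase appears twice.
INSERT INTO Product VALUES ('clock', 20), ('lamp', 30);
INSERT INTO Purchase VALUES ('Ann', 'Joe', 'Brooklyn Bridge', 'X'),
                            ('Ann', 'Joe', 'Brooklyn Bridge', 'X'),
                            ('Bo', 'Joe', 'lamp', 'X');
INSERT INTO Sales VALUES ('clock', CURRENT_DATE), ('lamp', '2000-01-01');
""")
# Both duplicate tuples match, so both are removed.
conn.execute("DELETE FROM Purchase WHERE seller = 'Joe' AND product = 'Brooklyn Bridge'")
# Halve the price of everything sold today.
conn.execute("""UPDATE Product SET price = price / 2
                WHERE name IN (SELECT product FROM Sales
                               WHERE saleDate = CURRENT_DATE)""")
remaining = conn.execute("SELECT COUNT(*) FROM Purchase").fetchone()[0]
prices = conn.execute("SELECT name, price FROM Product ORDER BY name").fetchall()
print(remaining)  # 1
print(prices)     # [('clock', 10.0), ('lamp', 30.0)]
```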
Defining Views
Views are relations, except that they are not physically stored.
They are used mostly to simplify complex queries and to define conceptually different views of the database for different classes of users.
View: purchases of telephony products:
CREATE VIEW telephony_purchases AS
  SELECT product, buyer, seller, store
  FROM Purchase, Product
  WHERE Purchase.product = Product.name
    AND Product.category = 'telephony'
A Different View
CREATE VIEW Seattle_view AS
  SELECT buyer, seller, product, store
  FROM Person, Purchase
  WHERE Person.city = 'Seattle' AND
        Person.name = Purchase.buyer
We can later use the views:
SELECT name, store
FROM Seattle_view, Product
WHERE Seattle_view.product = Product.name AND
      Product.category = 'shoes'
What’s really happening when we query a view?
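One way to answer that question: for a non-materialized view, the DBMS conceptually substitutes the view's definition into the query. This SQLite sketch uses made-up rows to run the query through the view and then the same query with the definition expanded by hand; both give the same answer. In the view query, name is unambiguous because only Product has a name column there. In the expanded form it must be written Product.name, since Person has one too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person   (name TEXT, city TEXT);
CREATE TABLE Product  (name TEXT, category TEXT);
CREATE TABLE Purchase (buyer TEXT, seller TEXT, product TEXT, store TEXT);
-- Made-up rows.
INSERT INTO Person VALUES ('Ann', 'Seattle'), ('Bo', 'Ottawa');
INSERT INTO Product VALUES ('boots', 'shoes'), ('phone', 'telephony');
INSERT INTO Purchase VALUES ('Ann', 'Sam', 'boots', 'The Bon'),
                            ('Bo', 'Sam', 'boots', 'Shoe Hut'),
                            ('Ann', 'Kim', 'phone', 'The Bon');
CREATE VIEW Seattle_view AS
    SELECT buyer, seller, product, store
    FROM Person, Purchase
    WHERE Person.city = 'Seattle' AND Person.name = Purchase.buyer;
""")
via_view = conn.execute("""
    SELECT name, store FROM Seattle_view, Product
    WHERE Seattle_view.product = Product.name AND Product.category = 'shoes'
""").fetchall()
# The same query with the view's definition substituted in by hand.
expanded = conn.execute("""
    SELECT Product.name, store FROM Person, Purchase, Product
    WHERE Person.city = 'Seattle' AND Person.name = Purchase.buyer
      AND Purchase.product = Product.name AND Product.category = 'shoes'
""").fetchall()
print(via_view, via_view == expanded)  # [('boots', 'The Bon')] True
```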
What is a Transaction?
Any action that reads from and/or writes to a database. It may consist of:
A simple SELECT statement to generate a list of table contents
A series of related UPDATE statements to change the values of attributes in various tables
A series of INSERT statements to add rows to one or more tables
A combination of SELECT, UPDATE, and INSERT statements
What is a Transaction? (cont’d)
A logical unit of work that must be either entirely completed or aborted
A successful transaction changes the database from one consistent state to another
A consistent state is one in which all data integrity constraints are satisfied
Most real-world database transactions are formed by two or more database requests
A database request is the equivalent of a single SQL statement in an application program or transaction
Evaluating Transaction Results
Not all transactions update the database
Even a query that only reads data represents a transaction, because the database was accessed
Improper or incomplete transactions can have a devastating effect on database integrity
Some DBMSs provide means by which users can define enforceable constraints based on business rules
Other integrity rules are enforced automatically by the DBMS when table structures are properly defined, thereby letting the DBMS validate some transactions
Transaction Properties
Atomicity
Requires that all operations (SQL requests) of a transaction be completed; otherwise, the transaction is aborted
Durability
Indicates permanence of the database’s consistent state
Transaction Properties (continued)
Serializability
Ensures that the concurrent execution of several transactions yields consistent results
Isolation
Data used during the execution of a transaction cannot be used by a second transaction until the first one is completed
Transaction Management with SQL
ANSI has defined standards that govern SQL database transactions
Transaction support is provided by two SQL statements:
COMMIT: makes the changes to the DB permanent
ROLLBACK: undoes the changes made to the DB since the last COMMIT point
ANSI standards require that, when a transaction sequence is initiated by a user or an application program, it must continue through all succeeding SQL statements until one of four events occurs
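Atomicity with COMMIT and ROLLBACK can be demonstrated in SQLite. This sketch assumes a hypothetical Account table whose CHECK constraint forbids negative balances. The connection is opened with isolation_level=None, so Python's sqlite3 module issues no implicit BEGIN or COMMIT and the transaction boundaries are exactly the statements sent. A transfer that would overdraw the source account fails after the credit step has already run, and ROLLBACK undoes that partial work.

```python
import sqlite3

# isolation_level=None: no implicit transaction control by the sqlite3 module.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Account (id TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO Account VALUES ('A', 100), ('B', 50)")


def transfer(src, dst, amount):
    """Move money as one atomic unit of work: both updates happen or neither does."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE Account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE Account SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("COMMIT")         # make both changes permanent
        return True
    except sqlite3.IntegrityError:     # the CHECK constraint rejected the debit
        conn.execute("ROLLBACK")       # undo the credit already applied to dst
        return False


def balances():
    return dict(conn.execute("SELECT id, balance FROM Account"))


print(transfer("A", "B", 30), balances())   # True {'A': 70, 'B': 80}
print(transfer("A", "B", 500), balances())  # False {'A': 70, 'B': 80}
```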
The Transaction Log
Stores:
A record for the beginning of the transaction
For each transaction component (SQL statement):
Type of operation being performed (update, delete, insert)
Names of objects affected by the transaction (the name of the table)
“Before” and “after” values for updated fields
Pointers to previous and next transaction log entries for the same transaction
The ending (COMMIT) of the transaction
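The following toy Python sketch shows how "before" values in the log make rollback possible. It is not how any particular DBMS lays out its log. Assumptions: the database is a dict, a Python list stands in for the previous/next pointers, and every change is logged before it is applied to the data. Undoing a transaction means walking its records newest-first and restoring the before values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    txn: int                      # transaction id
    op: str                       # 'BEGIN', 'INSERT', 'UPDATE', 'DELETE' or 'COMMIT'
    table: Optional[str] = None   # name of the object (table) affected
    key: Optional[str] = None     # which row was touched
    before: Optional[int] = None  # "before" value (None: row did not exist)
    after: Optional[int] = None   # "after" value (None: row was deleted)


db = {"Account": {"A": 100, "B": 50}}  # toy database: table -> {key: value}
log = []                               # list order stands in for prev/next pointers


def write(txn, table, key, value):
    """Record the change in the log first, then apply it to the data."""
    before = db[table].get(key)
    if value is None:
        op = "DELETE"
    elif before is None:
        op = "INSERT"
    else:
        op = "UPDATE"
    log.append(LogRecord(txn, op, table, key, before, value))
    if value is None:
        db[table].pop(key, None)
    else:
        db[table][key] = value


def undo(txn):
    """Roll back txn by restoring 'before' values, newest log record first."""
    for rec in reversed(log):
        if rec.txn == txn and rec.op in ("INSERT", "UPDATE", "DELETE"):
            if rec.before is None:
                db[rec.table].pop(rec.key, None)
            else:
                db[rec.table][rec.key] = rec.before


log.append(LogRecord(1, "BEGIN"))
write(1, "Account", "A", 70)
write(1, "Account", "C", 30)
undo(1)    # e.g. the transaction failed before its COMMIT record was written
print(db)  # {'Account': {'A': 100, 'B': 50}}
```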