SQL: Queries, Programming, Triggers

Download Report

Transcript SQL: Queries, Programming, Triggers

Database Systems II
Introduction
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester
1
Database Systems I Recap
A Database Management System (DBMS) is a
software package designed to store, manage
and retrieve databases.
A Database System (DBS) consists of two
components:
the DBMS
the database.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester
2
Database Systems I Recap
Why use a DBS?
- Logical data independence.
- Physical data independence.
- Efficient access.
- Reduced application development time.
- Data integrity and security.
- Concurrent access / concurrency control.
- Recovery from crashes.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester
3
Database Systems I Recap
A data model is a collection of concepts for
describing data (a formal language!).
A schema is a description of a particular collection
of data (database), using the given data model.
The relational data model is the most widely used
model today.
Main concept: relation, basically a table with rows
and columns.
Every relation has a schema, which describes the
columns, or fields.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester
4
Database Systems I Recap
The conceptual schema
defines the logical structure
of the whole database.
View 1 View 2 View 3
An external schema (view)
describes how some user
Conceptual Schema
sees the data (restricted
Physical Schema
access, derived data).
The physical schema
describes the storage and
index structures of the
database.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester
5
Database Systems I Recap
Relational database: a set of relations
Relation: made up of 2 parts:
Instance : a table, with rows and columns.
#Rows = cardinality,
#attributes = degree / arity.
Schema : specifies name of relation, plus name
and type of each attribute.
e.g. Students(sid: string, name: string, login:
string, age: integer, gpa: real).
Can think of a relation as a set of rows or tuples
(i.e., all rows are distinct).
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester
6
Database Systems I Recap
Relational algebra: mathematical query language
which forms the basis for “real” languages (e.g.
SQL), and for implementation.
Five basic operations:
union, set-difference, selection, projection,
cartesian product.
Shortcuts for common operations:
join, division.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester
7
Database Systems I Recap
SQL: the standard practical query language for
relational databases.
Schema modifications: create, alter, delete table.
Instance modifications: insert, delete, update
tuples of a table.
Queries to retrieve a specified set of tuples (what).
Queries are descriptive, which allows the DBS to
find the most efficient way how to process a
query.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester
8
Database Systems I Recap
SELECT
FROM
WHERE
[DISTINCT] target-list
relation-list
qualification
relation-list A list of relation names (possibly
with a range-variable after each name).
target-list A list of attributes of relations in
relation-list.
qualification Comparisons (“Attr op const” or
“Attr1 op Attr2”, where op is one of
) combined using AND, OR and NOT.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester
9
Database Systems I Recap
Semantics of an SQL query defined in terms of
the following conceptual evaluation strategy.
Compute the cross-product of relation-list.
Selection of the tuples satisfying qualifications.
Projection onto the attributes that are in targetlist.
If DISTINCT is specified, eliminate duplicate
rows.
A query optimizer will find more efficient
strategies to compute the same answers.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester
10
A Simple DBS Implementation
Relations
A
B
C
D E
SQL
Statements
A
Results
D
A
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester
D
11
A Simple DBS Implementation
Relations stored in files (ASCII)
e.g., relation R is in /usr/db/R.txt
Smith # 123 # CS
Jones # 522 # EE
.
.
Schema file (ASCII) in /usr/db/schema.txt
R1 # A # INT # B # STR …
R2 # C # STR # A # INT …
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester
12
A Simple DBS Implementation
Sample query
& select *
from R #
Relation R
A
B
SMITH 123
C
CS
&
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester
13
A Simple DBS Implementation
Sample session
& select *
from R | LPR #
&
Query result sent to printer
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester
14
A Simple DBS Implementation
Creating a new relation T
& select *
from R
where R.A < 100 | T #
&
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester
15
A Simple DBS Implementation
Processing single table queries
To process “select * from R where condition”:
(1) Read dictionary to get R attributes
(2) Read R file.
For each line:
(a) Check condition
(b) If OK, display
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester
16
A Simple DBS Implementation
Processing single table queries creating a new
table
To process
“select * from R where condition | T”:
(1) Process select as before
(2) Write results to new file T
(3) Append new line to dictionary
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester
17
A Simple DBS Implementation
Processing multi-table queries
To process
“select A,B from R,S where condition”:
(1) Read dictionary to get R,S attributes
(2) Read R file, for each line:
(a) Read S file, for each line:
(i) Create join tuple A,B from R,S
(ii) Check condition
(iii) Display if OK
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester
18
What’s wrong with this Implementation?
Tuple layout on disk
e.g.,
- Change string from ‘Cat’ to ‘Cats’ and we
have to rewrite the entire file
- ASCII storage is expensive
wastes a factor of ~256/10 of space for
integers
- Deletions are expensive
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester
19
What’s wrong with this Implementation?
Search very expensive
e.g.,
- Cannot find tuple with given key quickly
- Always have to read full relation
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester
20
What’s wrong with this Implementation?
Inefficient query processing
e.g.,
select *
from R,S
where R.A = S.A and S.B > 1000
Simple implementation has quadratic runtime
complexity
- Do selection first?
- More efficient join?
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester
21
What’s wrong with this Implementation?
No buffer manager
In particular, need caching
No concurrency control
No concept of transactions
Need to enforce ACID properties
No API
No interaction with other DBS
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester
22
DBS Architecture
Strategy Selector
User Transaction
Concurrency Control
Lock Table
Query Parser
Transaction Manager
Buffer Manager
File Manager
Statistical
Data
User
Recovery Manager
M.M. Buffer
Log
Indexes
User Data
System Data
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester
23
Outline Database Systems II
Secondary storage management
disks, records and files, . . .
Index structures
B-trees, hash tables, multi-dimensional
indexes
Query execution
one-pass algorithms, two-pass algorithms,
index-based algorithms
Query compiler
parsing and preprocessing, query
optimization, cost estimation
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester
24
Outline Database Systems II
Crash recovery
disk failures, stable storage, logging,…
Concurrency Control
correctness, locks, scheduling, …
Transaction Processing
logs, deadlocks, serializability,…
Data Mining
knowledge discovery in databases,
association rules
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester
25
Marking Scheme
Assignments 40%
paper and pencil,
no programming
Midterm exam 15%
covering all material up to and including
query optimization
Final exam 45%
covering all the material
No alternative marking scheme
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester
26
Tentative Schedule
October 21
other instructor or class canceled
October 28
midterm exam
December 2
last class
December 16
final exam
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester
27
References
Textbook
- Database Systems: The Complete Book, Garcia-Molina,
Ullman, and Widom, Prentice Hall, 2008: 2nd edition
- relevant sections listed in schedule on class
website, study these sections in advance!
Recommended book
Database Management Systems, Ramakrishnan
and Gehrke, McGraw Hill, 2003: 3rd edition
Lecture slides
- based on slides by Hector Garcia-Molina
and Martin Theobald,
- posted on the class website.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester
28