Course Overview/RDBMS Review
Download
Report
Transcript Course Overview/RDBMS Review
L1: Course Overview & Review
H.Lu/HKUST
The Teaching Staff
Instructor: Lu Hongjun
Office: 3543 (Lift 25-26), HKUST
E-Mail: [email protected]
URL:
http://www.cs.ust.hk/~luhj
Research Interests:
• Data/Knowledge base management with emphasis on query processing
and optimization
• Data warehousing and data mining
• Applied performance evaluation
• Database application development
• Parallel and distributed database systems
TA:
Name
Office:
E-Mail:
URL:
H.Lu/HKUST
Jiang Haifeng
Liu Guimei
4212 (DB Lab) HKUST
[email protected]
[email protected]
http://ihome.ust.hk/~jianghf
http://ihome.ust.hk/~cslgm
L01: RDBMS REVIEW -- 2
References
R.Ramakrishnan & J. Gehrke. Database Management
Systems, 3rd Ed. McGraw Hill, 2000
D. Shasha & P. Bonnet. Database Tuning: Principles,
Experiments, and Troubleshooting Techniques,
Revised edition , Morgan Kaufmann, 2002
Related papers
H.Lu/HKUST
L01: RDBMS REVIEW -- 3
Course Contents
Part I: Issues in database administration
Database design
Principles of database performance tuning
Database security
Part II: Emerging DB-related technology
OLAP and data warehouse
XML data management
Data stream processing
Course Web Page: http://course.cs.ust.hk/comp334/
H.Lu/HKUST
L01: RDBMS REVIEW -- 4
Grading
Written assignment (20%)
Exams (25%)
Course project (50 %)
Class participation (5%)
H.Lu/HKUST
L01: RDBMS REVIEW -- 5
Course Project Requirements
Carried in teams of two or four
Database related projects
You propose your own project, and get approve from the
instructor
Topic: database related
The amount of work : it accounts for 50% of your final
grade
Required documents (double-spaced)
Project proposal (1-2 pages)
• due date: 23-24/02
Status report (4-6 pages)
• due date: 28-29/03
Final report (8-10 pages)
• due date: 10-11/05
H.Lu/HKUST
L01: RDBMS REVIEW -- 6
Summary
It is a graduate level course
Not a DBA course
Not an introductory database course
Not a programming course, but you need to know
how to write programs
Hopefully, you will leave with
A good grade
A good understanding of studied topics
H.Lu/HKUST
L01: RDBMS REVIEW -- 7
Review -- RDBMS
Relational database systems
The basic concepts in database systems
Relational data model
Relational languages
Database design
Previous course: conceptual and logic design
This course: physical database design
Database management systems
The basic components of DBMS
Storage management
Transaction management
Query processing & optimization
H.Lu/HKUST
L01: RDBMS REVIEW -- 8
What Is Database & DBMS?
Database: a very large, integrated, persistent
collection of data.
Models real-world enterprise.
• Entities (e.g., students, courses)
• Relationships (e.g., James is taking CSIT530)
A Database Management System (DBMS) is a
software package designed to store and manage
databases.
H.Lu/HKUST
L01: RDBMS REVIEW -- 9
Data Models
A data model is a collection of concepts for describing
data and related operations,
semantics of data,
relationship among data, and
constraints on data
Two types of data models
Conceptual models: emphasize semantics of data
• Entity-Relationship model, Object-Oriented model
Logical models: ways how the data is organized in the
logical level
• Hierarchical model, Network model, Relational model
H.Lu/HKUST
L01: RDBMS REVIEW -- 10
Instances and Schemas
A schema is a description of a particular collection of
data, using a given data model - the logical structure
of the database (e.g., set of customers and accounts
and the relationship between them)
Schema Instance - the actual content of the database
at a particular point in time
Similar to types and variables in programming
languages
H.Lu/HKUST
L01: RDBMS REVIEW -- 11
Levels of Abstraction
ANSI-SPARC three-level
architecture
Many views, single
conceptual (logical) schema
and physical schema.
Views describe how users
see the data.
Conceptual schema
defines logical structure
Physical schema describes
the files and indexes used.
H.Lu/HKUST
View
View
View
Conceptual Schema
Physical Schema
L01: RDBMS REVIEW -- 12
Data Independence
Applications insulated from how data is structured and stored.
Ability to modify a schema definition in one level without
affecting a schema definition in the next higher level.
The interfaces between the various levels and components
should be well defined so that changes in some parts do
not seriously influence others.
Logical data independence: Protection from changes in
logical structure of data.
Physical data independence: Protection from changes in
physical structure of data.
H.Lu/HKUST
L01: RDBMS REVIEW -- 13
Database Environment
Procedures
And standards
Specifies &
enforces
Database
Administrator
Database
Designer
designs
End Users
use
manages
Hardware
Analysts &
Programmers
Application
Programs
DBMS
write
designs
H.Lu/HKUST
System
administrator
Data
L01: RDBMS REVIEW -- 14
DBMS Related Languages
Data Definition Language (DDL)
Specification notation for defining the database schema
Data storage and definition language - special type of
DDL in which the storage structure and access methods
used by the database system are specified
Data Manipulation Language (DML)
Language for accessing and manipulation the data
organized by the appropriate data model
Two classes of languages
• Procedural - user specifies what data is required and how to get
those data.
• Nonprocedural - user specifies what data is required without
specifying how to get those data
H.Lu/HKUST
L01: RDBMS REVIEW -- 15
DBMS Related Languages
Host Language
Programming
Language for
DBMS Applications
DDL
Data
Sublanguage
Procedural
DML
Non-Procedural
Query Language
H.Lu/HKUST
L01: RDBMS REVIEW -- 16
Evolution of Database Technology
1960s: Hierarchical (IMS) & network (CODASYL)
DBMS.
1970s: Relational data model, relational DBMS
implementation.
1980: RDBMS rules the earth
1985-: Advanced data models (extended-relational,
OO, deductive, etc.)
Application-oriented DBMS (spatial, scientific,
engineering, etc.).
1990s: ORDB, OLAP, Data mining, data warehousing,
multimedia databases, and network databases.
H.Lu/HKUST
L01: RDBMS REVIEW -- 17
What is an RDBMS
A piece of software that manages data based on the
relational model
Relational data, SQL queries
Commercial products
Oracle, IBM DB2, IBM Informix, Sybase,
Microsoft SQL Server
Each has ~10 million lines of C/C++ code
Smaller packages – MySQL, PostgresSQL
H.Lu/HKUST
L01: RDBMS REVIEW -- 18
Relational Data Model
Main concept: relation
A table with rows and columns
Every relation has a schema
Description of the columns, or fields
Relational data – rows in a table
No order among the rows in a table
The most widely used data model!
H.Lu/HKUST
L01: RDBMS REVIEW -- 19
University Database
Conceptual schema:
Students (sid: string, name: string, login: string, age:
integer, gpa:real)
sid
53666
53688
53650
name
login
Jones jones@cs
Smith smith@eecs
Smith smith@math
age
18
18
19
gpa
3.4
3.2
3.8
Cardinality = 3, degree = 5 , all rows distinct
Courses (cid: string, cname:string, credits:integer)
Enrolled (sid:string, cid:string, grade:string)
H.Lu/HKUST
L01: RDBMS REVIEW -- 20
Relational Languages
Formal languages
Relational algebra
Relational calculus
Commercial language: SQL
DDL (Data Definition Language)
• Create Table, Create Index, Create View …
DML (Data Manipulation Language)
• Queries
– Select
• Updates
– Insert, Delete, Update
H.Lu/HKUST
L01: RDBMS REVIEW -- 21
Creating Tables
CREATE TABLE
Students
(sid: CHAR(20),
name: CHAR(20),
login: CHAR(10),
age: INTEGER,
gpa: REAL)
H.Lu/HKUST
CREATE TABLE
Enrolled
(sid: CHAR(20),
cid: CHAR(20),
grade: CHAR(2))
L01: RDBMS REVIEW -- 22
Primary Key Constraints
A set of fields is a key for a relation if :
1. Any two distinct tuples differ in some fields of
the set, and
2. This is not true for any subset of the set.
A superkey: Condition 1 true and 2 false.
E.g., sid is a key for Students. {sid, gpa} is a
superkey.
One primary key can be set per relation.
H.Lu/HKUST
L01: RDBMS REVIEW -- 23
Primary and Candidate Keys
CREATE TABLE
CREATE TABLE
Students
Enrolled
(sid: CHAR(20),
(sid CHAR(20)
name: CHAR(20),
cid CHAR(20),
login: CHAR(10),
grade CHAR(2),
age: INTEGER,
PRIMARY KEY (sid,cid))
gpa: REAL,
PRIMARY KEY (sid),
UNIQUE (login))
H.Lu/HKUST
L01: RDBMS REVIEW -- 24
Foreign Key Constraints
Foreign key : a set of fields in a relation
Refers to the primary key of another relation
Referential integrity
CREATE TABLE Enrolled
(sid CHAR(20), cid CHAR(20), grade
No dangling references
Enrolled
sid
53666
53666
53650
53666
cid
grade
Carnatic101
C
Reggae203
B
Topology112
A
History105
B
H.Lu/HKUST
CHAR(2),
PRIMARY KEY (sid,cid),
FOREIGN KEY (sid) REFERENCES
Students )
Students
sid
53666
53688
53650
name
login
Jones jones@cs
Smith smith@eecs
Smith smith@math
age gpa
18 3.4
18 3.2
19 3.8
L01: RDBMS REVIEW -- 25
Integrity Constraints (ICs)
IC: condition that must be true for any db instance
Domain constraints
Primary constraints
Foreign key constraints
ICs are specified when a schema is defined.
ICs are checked when relations are modified.
A legal instance of a relation
Satisfies all specified ICs
H.Lu/HKUST
L01: RDBMS REVIEW -- 26
Adding and Deleting Tuples
INSERT INTO Students (sid, name, login, age, gpa)
VALUES (53688, ‘Smith’, ‘smith@ee’, 18, 3.2)
DELETE
FROM Students S
WHERE S.name = ‘Smith’
H.Lu/HKUST
L01: RDBMS REVIEW -- 27
Queries
SELECT *
FROM Students S
WHERE S.sid = 53688
sid
53666
53688
53650
H.Lu/HKUST
name
login
Jones jones@cs
Smith smith@eecs
Smith smith@math
age
18
18
19
gpa
3.4
3.2
3.8
L01: RDBMS REVIEW -- 28
Querying Multiple Tables
SELECT S.name, E.cid
FROM Students S, Enrolled E
WHERE S.sid=E.sid AND E.grade=“A”
Enrolled
sid
53666
53666
53650
53666
cid
grade
Carnatic101
C
Reggae203
B
Topology112
A
History105
B
H.Lu/HKUST
S.name E.cid
Smith
Topology112
Students
sid
53666
53688
53650
name
login
Jones jones@cs
Smith smith@eecs
Smith smith@math
age
18
18
19
gpa
3.4
3.2
3.8
L01: RDBMS REVIEW -- 29
Functional Components of DBMS
User/Application Database Administrator
Security Control
DML Stmt.
DDL Command
Query Processing
& Optimization
Transaction
Management
DDL
Compiler
Query Plan
Execution
Engine
Transaction Manager
Recovery
Concurrency
Control
Log
Lock
Table
Query Processing
Buffer
Index/file/record
Management
Storage Management
H.Lu/HKUST
Buffer
Management
Storage
Manager
Statistics Indexes
Metadata User data
L01: RDBMS REVIEW -- 30
Query Optimization
A major strength of RDBMS
SQL queries are declarative
Optimizer figures out how to answer them
Re-order operations
Pick among alternatives of one operation
Ensure that the answer is correct!
H.Lu/HKUST
L01: RDBMS REVIEW -- 31
Transaction
A key concept in databases
An atomic sequence of actions (read/write)
Brings DB from a consistent state to another
ACID
Atomicity
Consistency
Isolation
Durability
H.Lu/HKUST
L01: RDBMS REVIEW -- 32
Concurrency Control & Recovery
Concurrency Control
Essential for good DBMS performance
Run several user programs concurrently
Interleave actions of different users
Ensure the correctness
• Users may think it is a single-user system.
Recovery
Essential for durability of transactions
H.Lu/HKUST
L01: RDBMS REVIEW -- 33
RDBMS Features
Effective and efficient access
Easier application development
Data independence
Data integrity and security
Concurrent access
Recovery from crashes
Uniform data administration
H.Lu/HKUST
L01: RDBMS REVIEW -- 34
Summary
DBMS used to maintain, query large datasets.
Benefits include recovery from system crashes,
concurrent access, quick application development,
data integrity and security.
Levels of abstraction give data independence.
A DBMS typically has a layered architecture.
DBAs hold responsible jobs
and are well-paid!
DBMS R&D is one of the broadest,
most exciting areas in CS.
H.Lu/HKUST
L01: RDBMS REVIEW -- 35