Course Overview/RDBMS Review

Download Report

Transcript Course Overview/RDBMS Review

L1: Course Overview & Review
H.Lu/HKUST
The Teaching Staff

Instructor: Lu Hongjun
 Office: 3543 (Lift 25-26), HKUST
 E-Mail: [email protected]
 URL:
http://www.cs.ust.hk/~luhj
 Research Interests:
• Data/Knowledge base management with emphasis on query processing
and optimization
• Data warehousing and data mining
• Applied performance evaluation
• Database application development
• Parallel and distributed database systems

TA:
 Name
 Office:
 E-Mail:
 URL:
H.Lu/HKUST
Jiang Haifeng
Liu Guimei
4212 (DB Lab) HKUST
[email protected]
[email protected]
http://ihome.ust.hk/~jianghf
http://ihome.ust.hk/~cslgm
L01: RDBMS REVIEW -- 2
References
R.Ramakrishnan & J. Gehrke. Database Management
Systems, 3rd Ed. McGraw Hill, 2000
 D. Shasha & P. Bonnet. Database Tuning: Principles,
Experiments, and Troubleshooting Techniques,
Revised edition , Morgan Kaufmann, 2002
 Related papers

H.Lu/HKUST
L01: RDBMS REVIEW -- 3
Course Contents



Part I: Issues in database administration
 Database design
 Principles of database performance tuning
 Database security
Part II: Emerging DB-related technology
 OLAP and data warehouse
 XML data management
 Data stream processing
Course Web Page: http://course.cs.ust.hk/comp334/
H.Lu/HKUST
L01: RDBMS REVIEW -- 4
Grading
Written assignment (20%)
 Exams (25%)
 Course project (50 %)
 Class participation (5%)

H.Lu/HKUST
L01: RDBMS REVIEW -- 5
Course Project Requirements



Carried in teams of two or four
Database related projects
 You propose your own project, and get approve from the
instructor
 Topic: database related
 The amount of work : it accounts for 50% of your final
grade
Required documents (double-spaced)
 Project proposal (1-2 pages)
• due date: 23-24/02
 Status report (4-6 pages)
• due date: 28-29/03
 Final report (8-10 pages)
• due date: 10-11/05
H.Lu/HKUST
L01: RDBMS REVIEW -- 6
Summary
It is a graduate level course
 Not a DBA course
 Not an introductory database course
 Not a programming course, but you need to know
how to write programs
 Hopefully, you will leave with
 A good grade
 A good understanding of studied topics

H.Lu/HKUST
L01: RDBMS REVIEW -- 7
Review -- RDBMS



Relational database systems
 The basic concepts in database systems
 Relational data model
 Relational languages
Database design
 Previous course: conceptual and logic design
 This course: physical database design
Database management systems
 The basic components of DBMS
 Storage management
 Transaction management
 Query processing & optimization
H.Lu/HKUST
L01: RDBMS REVIEW -- 8
What Is Database & DBMS?

Database: a very large, integrated, persistent
collection of data.
 Models real-world enterprise.
• Entities (e.g., students, courses)
• Relationships (e.g., James is taking CSIT530)

A Database Management System (DBMS) is a
software package designed to store and manage
databases.
H.Lu/HKUST
L01: RDBMS REVIEW -- 9
Data Models


A data model is a collection of concepts for describing
 data and related operations,
 semantics of data,
 relationship among data, and
 constraints on data
Two types of data models
 Conceptual models: emphasize semantics of data
• Entity-Relationship model, Object-Oriented model
 Logical models: ways how the data is organized in the
logical level
• Hierarchical model, Network model, Relational model
H.Lu/HKUST
L01: RDBMS REVIEW -- 10
Instances and Schemas
A schema is a description of a particular collection of
data, using a given data model - the logical structure
of the database (e.g., set of customers and accounts
and the relationship between them)
 Schema Instance - the actual content of the database
at a particular point in time
 Similar to types and variables in programming
languages

H.Lu/HKUST
L01: RDBMS REVIEW -- 11
Levels of Abstraction


ANSI-SPARC three-level
architecture
Many views, single
conceptual (logical) schema
and physical schema.
 Views describe how users
see the data.
 Conceptual schema
defines logical structure
 Physical schema describes
the files and indexes used.
H.Lu/HKUST
View
View
View
Conceptual Schema
Physical Schema
L01: RDBMS REVIEW -- 12
Data Independence



Applications insulated from how data is structured and stored.
 Ability to modify a schema definition in one level without
affecting a schema definition in the next higher level.
 The interfaces between the various levels and components
should be well defined so that changes in some parts do
not seriously influence others.
Logical data independence: Protection from changes in
logical structure of data.
Physical data independence: Protection from changes in
physical structure of data.
H.Lu/HKUST
L01: RDBMS REVIEW -- 13
Database Environment
Procedures
And standards
Specifies &
enforces
Database
Administrator
Database
Designer
designs
End Users
use
manages
Hardware
Analysts &
Programmers
Application
Programs
DBMS
write
designs
H.Lu/HKUST
System
administrator
Data
L01: RDBMS REVIEW -- 14
DBMS Related Languages


Data Definition Language (DDL)
 Specification notation for defining the database schema
 Data storage and definition language - special type of
DDL in which the storage structure and access methods
used by the database system are specified
Data Manipulation Language (DML)
 Language for accessing and manipulation the data
organized by the appropriate data model
 Two classes of languages
• Procedural - user specifies what data is required and how to get
those data.
• Nonprocedural - user specifies what data is required without
specifying how to get those data
H.Lu/HKUST
L01: RDBMS REVIEW -- 15
DBMS Related Languages
Host Language
Programming
Language for
DBMS Applications
DDL
Data
Sublanguage
Procedural
DML
Non-Procedural
Query Language
H.Lu/HKUST
L01: RDBMS REVIEW -- 16
Evolution of Database Technology





1960s: Hierarchical (IMS) & network (CODASYL)
DBMS.
1970s: Relational data model, relational DBMS
implementation.
1980: RDBMS rules the earth
1985-: Advanced data models (extended-relational,
OO, deductive, etc.)
Application-oriented DBMS (spatial, scientific,
engineering, etc.).
1990s: ORDB, OLAP, Data mining, data warehousing,
multimedia databases, and network databases.
H.Lu/HKUST
L01: RDBMS REVIEW -- 17
What is an RDBMS
A piece of software that manages data based on the
relational model
 Relational data, SQL queries
 Commercial products
 Oracle, IBM DB2, IBM Informix, Sybase,
Microsoft SQL Server
 Each has ~10 million lines of C/C++ code
 Smaller packages – MySQL, PostgresSQL

H.Lu/HKUST
L01: RDBMS REVIEW -- 18
Relational Data Model
Main concept: relation
 A table with rows and columns
 Every relation has a schema
 Description of the columns, or fields
 Relational data – rows in a table
 No order among the rows in a table
 The most widely used data model!

H.Lu/HKUST
L01: RDBMS REVIEW -- 19
University Database

Conceptual schema:
 Students (sid: string, name: string, login: string, age:
integer, gpa:real)
sid
53666
53688
53650
name
login
Jones jones@cs
Smith smith@eecs
Smith smith@math
age
18
18
19
gpa
3.4
3.2
3.8
Cardinality = 3, degree = 5 , all rows distinct
 Courses (cid: string, cname:string, credits:integer)
 Enrolled (sid:string, cid:string, grade:string)
H.Lu/HKUST
L01: RDBMS REVIEW -- 20
Relational Languages
Formal languages
 Relational algebra
 Relational calculus
 Commercial language: SQL
 DDL (Data Definition Language)

• Create Table, Create Index, Create View …
 DML (Data Manipulation Language)
• Queries
– Select
• Updates
– Insert, Delete, Update
H.Lu/HKUST
L01: RDBMS REVIEW -- 21
Creating Tables
CREATE TABLE
Students
(sid: CHAR(20),
name: CHAR(20),
login: CHAR(10),
age: INTEGER,
gpa: REAL)
H.Lu/HKUST
CREATE TABLE
Enrolled
(sid: CHAR(20),
cid: CHAR(20),
grade: CHAR(2))
L01: RDBMS REVIEW -- 22
Primary Key Constraints
A set of fields is a key for a relation if :
 1. Any two distinct tuples differ in some fields of
the set, and
 2. This is not true for any subset of the set.
 A superkey: Condition 1 true and 2 false.
 E.g., sid is a key for Students. {sid, gpa} is a
superkey.
 One primary key can be set per relation.

H.Lu/HKUST
L01: RDBMS REVIEW -- 23
Primary and Candidate Keys
CREATE TABLE
CREATE TABLE
Students
Enrolled
(sid: CHAR(20),
(sid CHAR(20)
name: CHAR(20),
cid CHAR(20),
login: CHAR(10),
grade CHAR(2),
age: INTEGER,
PRIMARY KEY (sid,cid))
gpa: REAL,
PRIMARY KEY (sid),
UNIQUE (login))
H.Lu/HKUST
L01: RDBMS REVIEW -- 24
Foreign Key Constraints


Foreign key : a set of fields in a relation
 Refers to the primary key of another relation
Referential integrity
CREATE TABLE Enrolled
(sid CHAR(20), cid CHAR(20), grade
 No dangling references
Enrolled
sid
53666
53666
53650
53666
cid
grade
Carnatic101
C
Reggae203
B
Topology112
A
History105
B
H.Lu/HKUST
CHAR(2),
PRIMARY KEY (sid,cid),
FOREIGN KEY (sid) REFERENCES
Students )
Students
sid
53666
53688
53650
name
login
Jones jones@cs
Smith smith@eecs
Smith smith@math
age gpa
18 3.4
18 3.2
19 3.8
L01: RDBMS REVIEW -- 25
Integrity Constraints (ICs)




IC: condition that must be true for any db instance
 Domain constraints
 Primary constraints
 Foreign key constraints
ICs are specified when a schema is defined.
ICs are checked when relations are modified.
A legal instance of a relation
 Satisfies all specified ICs
H.Lu/HKUST
L01: RDBMS REVIEW -- 26
Adding and Deleting Tuples
INSERT INTO Students (sid, name, login, age, gpa)
VALUES (53688, ‘Smith’, ‘smith@ee’, 18, 3.2)
DELETE
FROM Students S
WHERE S.name = ‘Smith’
H.Lu/HKUST
L01: RDBMS REVIEW -- 27
Queries
SELECT *
FROM Students S
WHERE S.sid = 53688
sid
53666
53688
53650
H.Lu/HKUST
name
login
Jones jones@cs
Smith smith@eecs
Smith smith@math
age
18
18
19
gpa
3.4
3.2
3.8
L01: RDBMS REVIEW -- 28
Querying Multiple Tables
SELECT S.name, E.cid
FROM Students S, Enrolled E
WHERE S.sid=E.sid AND E.grade=“A”
Enrolled
sid
53666
53666
53650
53666
cid
grade
Carnatic101
C
Reggae203
B
Topology112
A
History105
B
H.Lu/HKUST
S.name E.cid
Smith
Topology112
Students
sid
53666
53688
53650
name
login
Jones jones@cs
Smith smith@eecs
Smith smith@math
age
18
18
19
gpa
3.4
3.2
3.8
L01: RDBMS REVIEW -- 29
Functional Components of DBMS
User/Application Database Administrator
Security Control
DML Stmt.
DDL Command
Query Processing
& Optimization
Transaction
Management
DDL
Compiler
Query Plan
Execution
Engine
Transaction Manager
Recovery
Concurrency
Control
Log
Lock
Table
Query Processing
Buffer
Index/file/record
Management
Storage Management
H.Lu/HKUST
Buffer
Management
Storage
Manager
Statistics Indexes
Metadata User data
L01: RDBMS REVIEW -- 30
Query Optimization
A major strength of RDBMS
 SQL queries are declarative
 Optimizer figures out how to answer them
 Re-order operations
 Pick among alternatives of one operation
 Ensure that the answer is correct!

H.Lu/HKUST
L01: RDBMS REVIEW -- 31
Transaction
A key concept in databases
 An atomic sequence of actions (read/write)
 Brings DB from a consistent state to another
 ACID
 Atomicity
 Consistency
 Isolation
 Durability

H.Lu/HKUST
L01: RDBMS REVIEW -- 32
Concurrency Control & Recovery

Concurrency Control
 Essential for good DBMS performance
 Run several user programs concurrently
 Interleave actions of different users
 Ensure the correctness
• Users may think it is a single-user system.

Recovery
 Essential for durability of transactions
H.Lu/HKUST
L01: RDBMS REVIEW -- 33
RDBMS Features
Effective and efficient access
 Easier application development
 Data independence
 Data integrity and security
 Concurrent access
 Recovery from crashes
 Uniform data administration

H.Lu/HKUST
L01: RDBMS REVIEW -- 34
Summary
DBMS used to maintain, query large datasets.
 Benefits include recovery from system crashes,
concurrent access, quick application development,
data integrity and security.
 Levels of abstraction give data independence.
 A DBMS typically has a layered architecture.
 DBAs hold responsible jobs
and are well-paid!
 DBMS R&D is one of the broadest,
most exciting areas in CS.

H.Lu/HKUST
L01: RDBMS REVIEW -- 35