Database System Implementation

Download Report

Transcript Database System Implementation

Instructor
彭智勇
武汉大学计算机学院珞珈学者特聘教授
软件工程国家重点实验室
电话:87653196
Email: [email protected]
Book
经典原版书库
《Database System Implementation》
(美)Hator Garcia-Molina, Jeffrey.D.Ullman, Jennifer Widom 著
(斯坦福大学)
机械工业出版社
Marking Scheme
•
•
•
•
Assignment (3) (练习,3次):
15%
Small Test (3) (小测验,3次): 15%
Final Examination (期末考试): 70%
Total
100%
Practice
• 安装PostgreSQL系统
• 分析PostgreSQL源代码
• 改进PostgreSQL系统
http://www.postgresql.org
Database System Implementation
Hector Garcia-Molina
Jeffrey D. Ullman
Jennifer Widom
Chapter 1
Introduction to DBMS Implementation
Database Management System
A database management system (DBMS) is a
powerful tool for creating and managing large
amounts of data efficiently and allowing it to persist
over long periods of time, safely.
Capabilities of a DBMS
• Persistent Storage
• Programming interface
allowing the user to access and modify data
through a powerful query language.
• Transaction Management
Supporting concurrent access to data and
resiliency ( i.e. recovering from failures or errors)
Terminology Review
• Data
• Database
A collection of data, well organized for access and
modification, preserved over a long period.
• Query
• Relation
An organization of data into a two-dimensional table.
• Schema (Metadata)
A description of the structure of the data
A Simple DBMS: Megatron 2000
Megatron 2000 is a relational database
management system which supports the
SQL query language.
Megatron 2000 Implementation
The Relation Students(name, id, dept)
Data: /usr/db/students
Smith#123#CS
Johnson#522#EE
……
Schema: /usr/db/schema
Students#name#STR#id#INT#dept#STR
Depts#name#STR#office#STR
……
Execution of Megatron 2000 DBMS
dbhost> megatron2000
WELCOME TO MEGATRON 2000 !
& SELECT *
FROM Students #
name
id
dept
Smith
Johnson
123
522
CS
EE
& SELECT *
FROM Students
WHERE id>= 500 | HighID #
/usr/db/HighID
Johnson#522#EE
How Megatron 2000 Executes Queries
SELECT * FROM R WHERE <Condition>
1.
2.
3.
4.
Read the file schema to determine the attributes of relation R and their types.
Check that the <Condition> is semantically valid for R.
Display each of the attribute names as the header of a column, and draw a line.
Read the file named R, and for each line:
(a) Check the condition, and
(b) Display the line as a tuple, if the condition is true.
SELECT * FROM R WHERE <Condition> | T
1. Read the file schema to determine the attributes of relation R and their types.
2. Check that the <Condition> is semantically valid for R.
3. Read the file named R, and for each line:
(a) Check the condition, and
(b) Write the result to a new file /usr/db/T, if the condition is true.
4. Add to the file /usr/db/schema an entry for T that looks just like the entry for R,
except that relation name T replaces R. That is, the schema for T is the
same as the schema for R.
Example 1.2
SELECT office
FROM Students, Depts
WHERE Students.name = ‘Smith’ AND
Students.dept = Depts.name #
for (each tuple s in Students)
for (each tuple d in Depts)
if(s and d satisfy the WHERE-condition)
display the office value from Depts;
Problem (1) of Megatron 2000
Tuple layout on disk
The data layout on disk is not flexible.
e.g., - Change string from ‘Cat’ to ‘Cats’ and
we have to rewrite file
- ASCII storage is expensive
- Deletions are expensive
Problem (2) of Megatron 2000
Search expensive; no indexes
e.g.,
- Cannot find tuple with given key quickly
- Always have to read full relation
Problem (3) of Megatron 2000
Brute force query processing
Query-processing is not clever.
e.g., select *
from R,S
where R.A = S.A and S.B > 1000
- Do select first?
- More efficient join?
Problem (4) of Megatron 2000
• No buffer manager
There is no buffer in main memory.
e.g.,
Need caching
Problem (5) of Megatron 2000
There is no concurrency control.
Problem (6) of Megatron 2000
•No reliability
e.g.,
- Can lose data
- Can leave operations half done
Problem (7) of Megatron 2000
No security
e.g.,
- File system insecure
- File system security is coarse
Problem (8) of Megatron 2000
• No application program interface (API)
e.g.,How can a payroll program get at the data?
Problem (9) of Megatron 2000
• Cannot interact with other DBMSs.
Problem (10) of Megatron 2000
• Poor dictionary facilities
Problem (11) of Megatron 2000
• No GUI
Overview of a Database Management System
Database
administrator
User/application
queries,
updates
transaction
commands
Query
Compiler
query plan
DDL Commands
Transaction Manager
DDL Compiler
metadata statistics
Execution
engine
metadata
Logging and Recovery
Concurrency Control
Index, file, and
record requests
Index/file/recOrd manager
log
pages
page commands
Buffer
manager
data,
metadata,
indexes
Lock
table
read/write pages
Storage
manager
Storage
Buffers
Storage Management
It is responsible for storing data, metadata,
indexes, and logs.
An important storage management component
is the buffer manager, which keeps portions of the
data contents in main memory.
Query Processing
A user or an application program
initiates query to extract data from the
database. The query is parsed and
optimized by a query compiler. The
resulting query plan is executed by the
execution engine.
Transaction Management
• Logging and Recovery
• Concurrency Control
• Deadlock Resolution
Course Overview
• Storage-Management Overview
C2
C3
C4
C5
Memory hierarchy
Storage of data elements
one-dimensional indexes
Multidimensional indexes
• Query Processing
C6 Query Execution
C7 Query compiler and optimizer
• Transaction-Processing
C8 System failures
C9 Concurrency control
C10 More about transaction management
• Information integration
The course lets students know
better ways of building a database
management system.