Course Design in Database Lecture 1
Download
Report
Transcript Course Design in Database Lecture 1
For 2004 ACM Honored Class
Feng Qian April 3, 2007
Introduction & Implementation in C++ (钱风)
Implementation in Java (周牧星)
Schedule and grading policy (马融)
Objective: Build an Database Management
System (DBMS) in C++/Java
A team project, hard but fun engineering
work
No official base code is given, you are
encouraged to work from zero
No strict technical restrictions, much more
flexible than previous projects (Compiler / OS)
TAs: 马融、周牧星、曲文涛、钱风
A Profound understanding of modern DBMS
A Proficient programming ability for large
engineering projects
A Perfect training for debugging skill for
thousands lines of codes
A Practical simulation of teamwork
environment in industrial zone
No pains, No gains
Storage
Buffer
Heap File
Index
SQL Parser
SQL Engine
Concurrency Control
Database Manager
Interface
Bottom Layer
Top Layer
Arrange data on disk, in the form of Page
Similar to Virtual Memory in OS
High level layers only knows a Page ID as an
entry point for access of the data
My Implementation
Virtual address space: 4GB
Self-grow page files: 4MB each
An advanced bitmap: segment tree
Main interfaces
(All deal with file operations on disk)
▪ CreateDB, OpenDB, CloseDB, DestroyDB
▪ AllocatePage, DeallocatePage
▪ WritePage, ReadPage
A traditional cache strategy, store frequently
accessed pages in memory (instead of in the
disk storage system)
My implementation
Store 2048 frequently used pages
2nd-chance replacement algorithm
Main interfaces:
▪ AllocatePage, DeallocatePage
▪ GetPage (Handle page missing)
▪ ReleasePage
▪ PinPage, UnpinPage (Reference count)
provides the ability of sequential access of
the records, as well as insertion, deletion,
updating and etc basic features.
Manage table structures
Manipulate tables in unit of tuple
Generate iterators for upper layers
The heap file defines the format of table
structures and tuples in pages.
Use single page to store a table structure
CREATE TABLE student (studentid INTEGER, name VARCHAR(10),
age INTEGER, schoolid INTEGER);
Table Info: Table ID: 2B, Num of columns: 2B, Num of tuples: 4B, First page ID: 4B
Num of pages: 4B, Table name: 10B
Each Field: Column Name: 2B, Column type: 2B, Column size: 2B, Not null: 1B
Key: 1B Offset: 2B, Index info: 4B
In the table structure page:
Store the table content
Assume a tuple (123,“Zhang”,20,45) is stored in Page 5
The unique identification (RID) of this tuple is defined by its page (5) and its
offset in the page (0x010h)
Use linked multiple pages to store all tuples in a table
Structure
Page
Global Bitmap
Content
Page 1
Content
Page 2
Content
Page 3
…
Content
Page n
Main Interface of my implementation
CreateTable
DropTable
InsertTuple
DeleteTuple
UpdateTuple
SearchTuple
To accelerate the access of tuples
The most simple type of index:
Always keep tuples in order
Screenshot of Microsoft Access 2007
Typical index technique: B+ Tree
BT Header
Page
some important information about the B+ tree
is recorded in the BT Header Page, such as the
root page id of the tree and the key type
BT Index
Page
BTIndexPage is the internal node
and only contains index to the
child B+ tree page.
BT Leaf
Page
BTLeafPage is the leaf node and
contain index entries with RIDs
Common B+ Tree operations
Search, Insert, Delete
BoundaryQuery (return an array of RIDs)
Maintain B+ Tree at the same time of manipulating
the heap file.
Keep the balance of the tree
To be simple, you can build a B+ Tree for only the
column with the “primary key” flag.
Please refer to algorithm books and papers for more
information of B+ Tree
Parse SQL statement into the grammar tree
Two ways
Write a naïve parser yourself
Use tools (YACC for C++ users, JCup for Java users)
Your DBMS should support st the following
SQL statements
SELECT fields FROM tables [WHERE Wclause] [GROUP BY fields] [HAVING
fexpression] [ORDER BY fields]
CREATE TABLE tablename (ctclauses)
DROP TABLE tablename
INSERT INTO tablename (fields) VALUE (values)
DELETE FROM tablename [WHERE Wclause]
UPDATE tablename SET uclauses [WHERE Wclause]
Group functions: AVG | COUNT | MIN | MAX | SUM
Relation Op.: AND | OR
Data Types: CHAR | VARCHAR | DATE | TIME | DOUBLE | INTEGER | LONG
Examples
CREATE TABLE student
(studentid INTEGER,
name VARCHAR(30) DEFAULT '(NONAME)',
age INTEGER DEFAULT -1,
dat DATE DEFAULT DATE '2000-01-01' NOT NULL);
SELECT star1.name, star2.name FROM MovieStar star1, MovieStar star2
WHERE Star1.address=star2.address AND Star1.name<Star2.name;
SELECT name FROM MovieExec WHERE (cert#) IN
(SELECT producerC# FROM Movie WHERE ( title , year ) IN
(SELECT movieTitle , movieYear FROM StarsIn
WHERE starName = 1));
SELECT COUNT(birthdate) FROM MovieStar
GROUP BY gender HAVING gender='F';
Execute the parsed SQL statement
A possible implementation
SQL Statement
SQL Grammar Tree (by SQL Parser)
Relational Algebra Query Tree
Optimized to Execution Tree
Relational Algebra Query Tree
Optimized Execution Tree
Refer http://www.ittc.ku.edu/~sgauch/647/s99/notes/11b/sld001.htm for details
Different users can open the same database in the
same time
They also can read from the same table
concurrently
They cannot access the same table when someone
do writing operations on it
Thus, a concurrency control is needed
A possible implementation
Set up a table lock
Two kinds of locks: Read lock and Write lock
When a table has a write lock, no operations are
allowed on it. When a table has a read lock, only
read requirements are allowed.
If a user want to do some operations on it, he
must wait until the previous lock is released.
Avoid dead locking!
The bridge between access interface and the core
system
Since we allow open more than one database
simultaneously, the DB manager should maintain all
opened databases in current system.
Maintain system tables for each database
package the result from the low-level system, and
send it to the right session
Logger
Main functions in my implementation
Hold the connected sessions (multi-threaded)
Manage every opened database in current system
Package the result and do response
Logger
Two types of user interfaces
Graphics User Interface
XDBC (JDBC, ODBC or you defined) programming
interface
Remote user interfaces should be
implemented
C++ users: recommend raw socket
Java users: recommend RMI
I “created” a simple API. Sample program:
#include “FatwormAPI.h”
CFatWormAPI::Connect(“192.168.1.12”,1202);
CFatWormAPI::ExecuteCommand("LOGIN AAA AAA;");
CFatWormAPI::ExecuteCommand("OPEN DATABASE STU;");
CFatWormAPI::ExecuteCommand(“SELECT * FROM student2 WHERE
studentid<1000;");
for (int i=0;i<CFatWormAPI::GetSelectCount();i++) {
printf("%d %s %d %d\n",
CFatWormAPI::GetInteger(i,0),
CFatWormAPI::GetChar(i,1),
CFatWormAPI::GetInteger(i,2),
CFatWormAPI::GetInteger(i,3)
);
}
CFatWormAPI::DisConnect();
Client
Local Access
Socket/RMI
Remote Server
Database Manager
SQL parsing & SQL Engine
Heap File
Concurrency
Control
B+ Tree Indexing
Storage & Buffer
X
D
B
C
Advantage
Very Fast (Most real DBMS are written in C++)
Direct memory access
inline static int ReadInt(const BYTE * pData,int ofs) {
return *((int *)(pData+ofs));
}
Powerful Marco
#define MAKE_RID(PID,OFS) ((((DWORD)(PID))<<20) | (DWORD)(OFS))
Disadvantage
“Access violation” program – Be careful when
programming!
Contact me via e-mail
[email protected]
[email protected]