CS511 Advanced Database Management Systems

Download Report

Transcript CS511 Advanced Database Management Systems

Lecture 4:
DBMS Architecture
Sept. 6 2006
ChengXiang Zhai
Most slides are adapted from Kevin Chang’s lecture slides
CS511 Advanced Database Management Systems
1
DBMS Mission Statement
• Simply: maintenance and computation of data
Data
Operations
Results
• But how to do it?
CS511 Advanced Database Management Systems
2
DBMS Architecture
User/Web Forms/Applications/DBA
query
transaction
Query Parser
Transaction Manager
Query Rewriter
Query Optimizer
Lock Manager
Logging &
Recovery
Query Executor
Files & Access Methods
Buffer Manager
Storage Manager
Buffers
Lock Tables
Main Memory
Storage
CS511 Advanced Database Management Systems
3
A Design Dilemma
• To what extent should we reuse OS services?
• Reuse as much as we can
– Performance problem (inefficient)
– Lack of control (incorrect crash recovery)
• Replicating some OS functions (“mini OS”)
– Have its own buffer pool
– Directly manage record structures with files
–…
CS511 Advanced Database Management Systems
4
OS vs. DBMS
• Conjecture: Perhaps pretty close
• Proof:
– There exists someone who can write popular
textbooks in both OS and DBMS!
[Operating | Database] System Concepts
– Jim Gray is from OS background!
CS511 Advanced Database Management Systems
5
OS vs. DBMS Similarities??
• What do they manage?
• What do they provide?
CS511 Advanced Database Management Systems
6
OS vs. DBMS: Similarities
• Purpose of an OS:
– managing hardware
– presenting interface abstraction to applications
• DBMS is in some sense an OS?
– DBMS manages data
– presenting interface abstraction to applications
• Both as API for application development!
CS511 Advanced Database Management Systems
7
Applications built upon DBMS
•
•
•
•
ERP: Enterprise Resource Planning
– SAP, Baan, PeopleSoft, Oracle, IBM,...
CRM: Customer Relationship Management
– E.phiphany, Siebel, Vantive, Oracle, IBM, ...
SCM: Supply Chain Management
– Trilogy, i2, Oracle, IBM, ...
A lot more in the Info Tech era:
–
–
–
–
e-business software
scientific data
multimedia
data analysis and decision support
CS511 Advanced Database Management Systems
8
OS vs. DBMS: Related Concepts
• Process Management
 What DB concepts?
– process synchronization
– deadlock handling
• Storage management
 What DB concepts?
– virtual memory
– file system
• Protection and security
 What DB concepts?
– authentication
– access control
CS511 Advanced Database Management Systems
9
OS vs. DBMS: Differences??
CS511 Advanced Database Management Systems
10
OS vs. DBMS: Differences
•
DBMS: Top-down to encapsulate high-level semantics!
– Data
• data with particular logical structures
– Queries
• query language with well defined operations
– Transactions
• transactions with ACID properties
•
OS: Bottom-up to present low-level hardware
CS511 Advanced Database Management Systems
11
DBMS on top of OS: Relations vs. File
system
•
•
Data object abstraction
– file: array of characters
– relation: set of tuples
Physical contiguity:
– large DB files want clustering of blocks
– extent: larger granularity allocation unit
• sol1: managing raw disks by DBMS
•
• sol2: simulate by managing free spaces in DBMS
Multiple trees (access methods)
– file access: directory hierarchy (user access method)
– block access: inodes
– tuple access: DBMS indexes
CS511 Advanced Database Management Systems
12
DBMS on top of OS: BM vs. VM
•
Query-aware replacement needed for performance
– not always LRU
•
Examples?
– how about sort-merge join??
– how about nested-loop join??
CS511 Advanced Database Management Systems
13
DBMS on top of OS: BM vs. VM
•
System-controlled replacement needed for correctness
– not always LRU
•
Examples?
CS511 Advanced Database Management Systems
14
•
Not Really OS Problems: Deferred Update
Semantics
Update emp.sal = 0.8*emp.sal if emp.sal > mgr.sal
empname
sal
manager
Smith
10k
Brown
Jones
9k
Brown
11k
Jones
– what are the possible semantics?
• INGRES solution: deferred updates
– buffer updates in intentions list for actual updates
(also serve as redo log)
– an example of “needing buffer knowledge in DBMS”,
so perhaps not sensible to do BM totally in OS
CS511 Advanced Database Management Systems
15
As the data model and application
context change,
so does the DBMS architecture…
CS511 Advanced Database Management Systems
16
Post-Relational DB Projects
•
•
•
Motivation:
– RDBMS not powerful enough for non-administrative dataintensive applications such as: CAD/CAM, GIS…
Buzz terms: object-oriented, extensible
Sample projects
– Postgres: U.C. Berkeley
– Starburst: IBM Almaden – “highly extensible”
• after System R (relational), R* (distributed)
• ultimately finding its way into IBM DB2 UDB
– Exodus: U. Wisconsin
• not a complete DB; an OO-style storage manager toolkit
• followed by Shore at Wisconsin, Predator at Cornell
CS511 Advanced Database Management Systems
17
POSTGRES: Post INGRES
Stonebraker, U.C. Berkeley
• 1977-1985: INGRES
– among the first relational DB implementation
–  Ingres Inc. --> ..  acquired by Computer
Associates
• 1986-1994: POSTGRES
– among the first object-relational DB implementation
–  Illustra  acauqired by Informix
– PostgreSQL (the SQL version)
CS511 Advanced Database Management Systems
18
RDBMS: the Relational Root
• Data model: (Codd, 1970’s)
– a database is a set of relations
– relation of n attributes: a set of n-tuples
– n-tuple: (v1, …, vn), where vi is in domain Si
CS511 Advanced Database Management Systems
19
Relational Model: Normal Forms
• Basic: 1NF (First Normal Form)
– implicitly required in the relation model
– definition:
• only simple domains of atomic elements (Codd)
• simple domains represent the base (built-in) types
– ? why?
• “Stronger” normal forms:
– 4NF, Boyce-Codd Normal Form, 3NF, 2NF, …
– ? why?
CS511 Advanced Database Management Systems
20
Normalizing Relations: Example
•
Unnormalized relation of book “objects”:
Books:
•
•
title
great future
career
authors
{smith, jones}
{jones}
date
4/01/01
7/12/00
Normalized relations: by decomposition
Books:
title
great future
career
day
4
7
month
1
12
Books:
title
great future
great future
career
authors
smith
jones
jones
year
01
00
?? Problems of the relational model?
CS511 Advanced Database Management Systems
21
Relational Model Problems
“A relational DB is like a garage that forces you to take your car apart and
store the pieces in little drawers.” (some researcher)
•
“Object” notion lost by decomposition
•
Base types are too restrictive
– non-intuitive: object is decomposed into several relations
– inefficient: a lot of online assembling by joins
– integers and strings are very primitive
•
•
– data “types” are typically application specific
Relational algebra is the only allowed operation
– simple, declarative, but also restrictive
– application = host language + embedded SQL
?? How to remedy these problems?
CS511 Advanced Database Management Systems
22
Quest for a Richer Model?
• Object-oriented data model
• Extensible ADTs
• Programming-language constructs
CS511 Advanced Database Management Systems
23
ORDBMS vs. OODBMS
• Question: How important is the relation?
• ORDBMS:
– RDBMS + OO features #
– query-based
• OODBMS:
– OO PL + database features (persistent objects)
– programming-based
• Meeting in the middle
CS511 Advanced Database Management Systems
24
Stonebraker’s Matrix
Simple Data
Complex Data
Query
RDBMS
ORDBMS
No Query
File System
OODBMS
• Prediction: ORDBMS will dominate
– evidence: big DB players are all on this side
CS511 Advanced Database Management Systems
25
Object Orientation Concepts
• Classes:
– classes as types
– encapsulation: interface + implementation
– inheritance: building class hierarchies
• Objects:
– complex objects:
• built from constructors, e.g., set-of, array, nested objs
– object identity (OID):
• system generated as unique object reference
• enables (efficient) object linking and navigation
CS511 Advanced Database Management Systems
26
POSTGRES Data Model
POSTGRES data model:
•
OO constructs
– classes as relations
• object (class instance) = tuple
• object-id = tuple-id
• method = attribute or function of attributes
– inheritance (multiple parents)
•
ADT constructs:
– types
– functions
CS511 Advanced Database Management Systems
27
POSTGRES Functions
•
•
Arbitrary C functions
– e.g.: overpaid(Employee)
– arbitrary semantics-- not optimized
– no fancy access methods-- typically sequential scan
Binary operators
– “hints” to provide semantics
– extensible access methods
•
• extensible B+tree or user-defined index
PostQuel procedures
– parameterized queries as functions
– e.g.: sal-lookup(name):
retrieve Emp.salary where Emp.name = name
CS511 Advanced Database Management Systems
28
POSTGRES Storage System
We were guided by a missionary zeal to do something different…
•
•
•
•
No-overwrite system
Logging:
– old values are not overwritten-- no value logging necessary
– log only needs to keep transaction state (commit/abort/going)
– ?? crash recovery-- how?
Vacuum-cleaner daemon to archive historical data
Advantages:
– recovery is cheap
– time travel is easy
CS511 Advanced Database Management Systems
29
Storage System: Problems
•
Problems
– flushing differential data (why?) by commit time can be costly
• unless “stable” main memory
• more costly than sequentially writing out logs – why ??
– reads have to stitch together current picture
•
And, yes, there are lots details unexplored or unexplained
CS511 Advanced Database Management Systems
30
Questing for the Right Models
Speaking about knowledge representation– The simple
relational model is by far the only successful KR
paradigm.
When the relational model came along, the network guys
resisted and their companies went under. …
When the OO model came along, the relational guys absorb
its best, and their companies prospered again!
-- Jeffery Ullman
CS511 Advanced Database Management Systems
31
What You Should Know
• What are some major limitations of services provided
by an OS in supporting a DBMS?
• In response to such limitations, what does a DBMS
do?
• As the data model and task environment change, the
architecture will also need to change
CS511 Advanced Database Management Systems
32
Carry Away Messages
•
One usually doesn’t fit all!
– An OS is designed to serve all kinds of applications, so it’s not optimal
for supporting a DBMS
•
– Other examples: a search engine is designed to serve all kinds of
people, so it’s not optimal for a particular person (personalized search)
When a problem is recognized, there are often opportunities for
breakthroughs in multiple areas
– DBMS could take over OS functions
•
– OS could provide more opportunities for customization
From “day 1”, high efficiency has been the primary challenge/concern in
designing and implementing a DBMS; reliability may be the second major
concern
– In contrast, “accuracy of answers” is at least as important as efficiency
for a Web search engine
– In the future, accuracy of answers will likely become more important
CS511 Advanced Database Management Systems
33