Slides from Yong
Download
Report
Transcript Slides from Yong
Object-relational database systems
Yong Yao
CS632
April 17, 2001
Content
Introduction
Michael Stonebraker: Inclusion of New Types in
Relational Data Base Systems.
Michael Stonebraker, Greg Kemnitz: The Postgres
Next Generation Database Management System.
PREDATOR System Design Document: A detailed
description of internal design decisions and
implementation details.
Introduction
The relational model
Dominant mainstream approach
Work well on business data processing
But things have changed drastically,
nontraditional problems.
Taxonomy of DBMS Application
Yes
No
Use SQL
data Simple
complexity
Complex
Simple data
Yes
Relational
DBMS
No
Use SQL
File
System
data Simple
complexity
Complex
OODBMS
Yes
Relational
DBMS
No
•A programming
language with a
type system
File System OODBMS
Use SQL
data Simple
complexity
Complex
•Persistent object
ORDBMS
Yes
No
Relational
DBMS
ORDBMS
File System
OODBMS
Use SQL
data Simple
complexity
Complex
Features
Base type extension
Inheritance
Complex object
Inclusion of New Types in Relational
Database Systems
Michael Stonebraker
EECS Dept.
University of California, Berkeley
Example
Represent a set of boxes in the DBMS and a simple query- find
all the boxes that overlap the unit square(0,1,0,1).
RDBMS:
Create box(id=i4, x1=f8, x2=f8,y1=f8,y2=f8)
select *
from box
where ! (box.x2<=0 or box.x1>=1 or box,y2<=0 or
box.y1>=1)
Too hard to understand
Too complex to optimize
Too many clauses to check
Example (cont’d)
Create box(id=i4,desc=box)
select *
from box
where box.desc!!”0,1,0,1”
Box: new data type
“!!”: overlap operator with two operands of data type box,
return a boolean
Operators for Boxes
Motivation
Needs of business data processing
applications
Geographic Information System
New Types: Point, Line, Polygon;
New operators: distance, intersection ;
New access methods: R-trees, KDB trees;
Conditions
1.
2.
3.
4.
The definition of user-defined data types
The definition of new operators for these
data types
The implementation of new access methods
for data types
Optimized query processing for commands
containing new data types and operators
Definition of New Types
Follow a registration process
define type-name length = value,
input = file-name,
output = file-name
Occupy a fixed amount of space
Conversion routines
Definition of New Operators
Safety loophole
Problem:
an ADT routine which has an error can overwrite
DBMS data structures
Unclear whether such errors are due to bugs in
the user routines or in the DBMS
Solutions:
Run in separate address space
Build a language processor
Provide two environments
New access methods
Users can add new access methods to
support user-defined data types
Goal: extensibility
Interface:
Conditions for the operators
Information on the data types of operators
Define set of operators
Example: B-tree
Conditions for operators:
Information on the data types of operators (B-tree)
•“<=“ is required
•Type: specific type, fixed, variable, fix-var, type1, type2
New set of operators (B-tree)
F1=(value - low-key) / (high-key – low-key)
F2=(high-key – value) / (high-key – low-key)
Implementing New Access Methods
A collection of procedure calls
open (relation-name)
close (descriptor)
get-next(descriptor, OPR, value, tuple-id)
insert (descriptor,tuple)
delete(descriptor, tuple-id)
replace(descriptor, tuple-id, new-tuple)
build(descriptor,keyname,OPR)
Hard problem- work with transaction management
log pages – simple, but may suffer from
performance penalty
log events
Event-oriented interface
REDO(T)
UNDO(T)
LOG(event-type, event-data)
Access path selection
Four pieces of information for optimization
Stups: estimate the expected number of records
…where rel-name.field-name OPR value
A second selectivity factor S:
…where relname-1.field-1 OPR relname-2.field-2
Feasibility of merge-sort
Feasibility of hash-join
Define operator token= AE,
……
Stups=1
S= min(N1.N2).
merge-sort with AL,
hash-join
Generate query processing plan
relname-1.field-1 OPR relname-2.field-2
Merge sort
Iterative substitution
Hash Join
Relname.field-name OPR value
any access method with field-name as a key
Secondary index
Sequential search
Summary
How to extend an abstract data type
How to define new access method
Integrate new access method with
transaction management
Generation of optimized query processing
plan
The POSTGRES Next-Generation
Database Management System
Michael Stonebraker
Greg Kemnitz
Three kinds of services for DBMS
Traditional data management
Object management
Simple data type
Complex data types, e.g. bitmaps, icons, text…
Knowledge management
Store and enforce a collection of rules
Example- newspaper layout
store and manipulate text and graphics, Bill
customers for advertisement.
Data: customer information
Object: text, pictures and icons
Rules: control newspaper layout
POSTGRES Data Model-Design criteria
Traditional relational DBMSs data model: a
collection of named relations
Orientation toward database access from a
query language
Interact with database by query languagePOSTQUEL
User defined functions
Design criteria
Orientation toward multilingual access
Two selections
One language tightly coupled to the system
Multilingual
POSTGRES is programming language
neutral
Design criteria
Small number of concepts
As few as possible
Four constructs:
Class
Inheritance
Type
Function
Data Model - Class
Class(constructed type or relation) : a named
collection of instances of objects.
Instances(record or tuple) : has the same
collection of attributes
create EMP(name=string, salary= float, age=int)
Data model - Inheritance
Inherit data elements from other classes
create EMP(name=string, salary= float, age=int)
create SALESMAN(quota=float) inherits EMP
Multiple inheritance – only create
objects without ambiguity
EMP
SALESMAN
Three kinds of classes
Real class: instances are stored in the database
Derived class :view
Version: store differential relative to its base class
Initially has all instances of the base class
Freely updated to diverge from the base class
Updates to the version do not affect the base class
Updates to the base class are reflected in the version
Supported by the rule system
Version Example
create version my-EMP from EMP
Two classes generated:
EMP-MINUS(deleted-OID)
EMP-PLUS(all-fields-in EMP, replaced-OID)
Retrieve: retrieve EMP-PLUS instead
Insert: All new instances for EMP or my-EMP will be
added into EMP-PLUS;
Delete: move records from EMP-PLUS to EMP-MINUS
Data model - Types
Base types:
Int, float, string
construct new base types.
Arrays of base types:
Composite types:
Construct complex objects: attributes contain
other instances as part or all of their value
Composite types
Contains zero or more instances of the same
class
create EMP (… , manager=EMP, coworkers=EMP)
Set: value is a collection of instances from all
classes
create EMP(… , hobbies=set)
{softball, skiing, skating…}
Data Model - Functions
C functions
Arbitrary C procedures
Can not be optimized by the POSTGRES
Argument: base types or composite types
Inherited down the class hierarchy
overpaid(EMP)
overpaid(SALESMAN)
Functions
Operators
Utilize index
Functions with one or two operands
{ALT, ALE, AE, AGT, AGE}
Allow new access methods
POSTQUEL functions
Any collection of commands in the POSTQUEL
Define function high-pay returns EMP as
Retrieve (EMP.all)
Where EMP.salary>50000
POSTQUEL
Set-oriented query language
Support nested queries
Transitive closure
Support for inheritance
Support for time travel
Allow a user to run historical query
retrieve (EMP.salary)
from EMP [T]
where EMP.name=“Sam”
Maintain two different physical collections of records
Vacuum cleaner: a daemon moves records
The Rules System
Requirements:
referential integrity
View management
Triggers
Integrity constraints
Protection
Version control
Design a general purpose rules system
Syntax of rules
ON event (TO) object WHERE
POSTQUEL-qualification
THEN DO [instead]
POSTQUEL-command(s)
on new EMP.salary where
EMP.name=“Fred”
then do replace
E (salary=new.salary)
from E in EMP
where E.name=“Joe”
Event: retrieve, replace, delete, new …
Object: name of a class or class.column
POSTQUEL-qualification: normal qualification
POSTQUEL-commands: a set of POSTQUEL commands
Forward and Backward chaining
Rules specify additional actions, and these actions
may activate other rules
on new EMP.salary where
EMP.name=“Fred”
then do replace
E(salary=new.salary)
from E in EMP
where E.name=“Joe”
on retrieve to EMP.salary where
EMP.name=“Joe”
then do instead retrieve
(EMP.salary)
where EMP.name=“Fred”
Implementation of Rules
Record level processing
Rules system is called when individual records are
accessed
Place a marker on the record
Query rewrite module
Perform poorly for a large number of small-scope
rules
Desirable when there are a small number of
larger-scope rules
When rules are activated
Immediate-same transaction
Immediate-different transaction
Deferred-same transaction
Deferred-different transaction
Currently only implements the first option
Storage System
‘no-overwrite’ storage manager
Old record remains in the database whenever an
update occurs
Instantaneous crash recovery
Support time travel
stable main memory required
Performance
Better than UCB-INGRES
But
Compare to the Cattell system, loses by about a
factor of two.
Comments
POSTGRES allows an application designer to
trade off performance for data independence
Imports only specific user functions into its
address space
PREDATOR
DESIGN AND INPLEMENTATION
PARVEEN SESHADRI
System overview
A client-server database system
Goal: Extensibility - adding the ability to
process new kinds of data
Basic components
Extensible table of Enhanced Abstract Data Type
(E-ADTs)
Extensible table of query processing engines(QPEs)
Server Architecture
Enhanced Data Types
Standard ADT
Specify the storage format for values
Specify methods that can be invoked on values
Specify how some methods can be matched using
indexes
Motivation: methods can be very expensive
Data.image.sharpen().clip()
Improvement
Data.image.sharpen().clip()
It is unnecessary for Sharpen to compress and write
its result to disk
-> Pass result directly in memory to Clip
Rewrite the expression
-> Data.image.clip().sharpen()
No need to retrieve the entire image
E-ADT
Expose the semantics of its methods
Query optimizations are performed using
these semantics
Find more at :
The Case for Enhanced Abstract Data Types.
VLDB 1997
Thank You