presentation source
Download
Report
Transcript presentation source
O-O, What Are They Doing
to Relational Databases?
(The Evolution of DB2 Universal Database)
Michael J. Carey
IBM Almaden
January 1999
Plan for Today's Presentation
The relational DBMS revolution
The object-relational DBMS evolution
O-R features in DB2 Universal Database V5.2
Some O-R implementation tradeoffs (V5.2)
What lies ahead for DB2 UDB & O-R databases?
Questions (and possibly answers)
Please ask questions throughout...!
The Relational DBMS Revolution
The pre-relational era (1970's)
Graph-based data models
Hierarchical (IMS), network (Codasyl)
Low-level, navigational interfaces
Labor-intensive and error-prone
The relational era (1980's)
Simple, abstract data model
Database = set of relations ("tables")
3 schema levels: views, base tables, physical schema
Algebra of set-oriented operations
High-level, declarative interfaces
SQL, QBE, et al
Embedded languages, 4GLs
The Relational Model (in one slide)
Employees and departments
Department
Employee
?
dno
10
20
eno
1
7
22
name
Toy
Shoe
name
Lou
Laura
Mike
salary
10000000
150000
80000
select E.name, E.salary, D.no
from Employee E, Department D
where E.salary < 100000
and D.name = 'Shoe'
and E.dept = D.dno;
dept
10
20
20
Relational Databases: A Success Story
The relational model has been a big success
Simplicity has made research tractable
Data independence yields productivity gains
Both academia and industry have benefitted
Relational DBMS "goodies" include
Efficient query optimization and execution
Well-defined transaction semantics and support
Excellent multi-user performance and robustness
Views for data independence, authorization
Constraints, triggers, and stored procedures for (shared)
business rule capture/enforcement
"The" success story for parallel computing
We've Achieved Nirvana ... Right?
The world is becoming increasingly complex
New data types are appearing (e.g., multimedia)
Real-world data doesn't fit neatly into tables
Entities and relationships (vs. tables)
Variance among entities (vs. homogeneity)
Set-valued attributes (vs. normalization)
Advanced applications bring complex data
E.g., CAD/CAM data management, web data management,
geographic information management, medical data
management, (your favorite application goes here)
So maybe objects are the answer...?
Yes, if we can keep all the relational "goodies"!
The Object-Relational DBMS Evolution
O-R extension #1: Abstract data types (ADTs)
New column types and functions/methods
E.g., text, image, audio, video, time series, point, line, OLE...
For modeling new kinds of facts about enterprise entities
Infrastructure for extenders/datablades/cartridges
O-R extension #2: Row types
Types and functions/methods for rows of tables
Desirable features include references, inheritance, methods, late
binding, and collection-valued attributes
For modeling enterprise entities with relationships & behavior
Infrastructure for DBMS-native object management
Recent SQL3 merger: Structured types
Can use for types of columns and/or tables
"Not Your Father's Employee Type"
Beyond name, rank, and serial number
New attribute types
Location (2-d point), job description (text), photo (image), ...
Associated functions
Distance(point, point), contains(text, string), ...
Beyond your basic employee record
Employees come in different flavors
Employees have many known relationships
Emp, RSM, Programmer, Manager, Temp, ...
Manager, department, projects, ...
Employees have behavior
Age(Emp), qualified(Emp, Job), hire(Emp), ...
An Employee is a simple "business
object"
The OSF Project at IBM Almaden
OSF stands for "Object Strike Force"
Semi-autonomous group "outside" UDB development
Focus: object-relational extensions for DB2 UDB
Both near-term and longer-term interests
Collaborate with our Toronto and Santa Teresa labs
Significant activities to date
Prototyped "row type" support for DB2 UDB
Delivered in DB2 UDB Version 5.2 (9/98)
Significantly revised SQL3 draft standard
Working on next step plus future technologies
DB2 Universal Database, Version 5
DB2 for Common Servers (Version 2)
User-defined column types (UDTs/distinct types)
User-defined functions (UDFs)
Binary/character large objects (BLOBs/CLOBs)
Distinct types: new data types for columns
Ex: create distinct type US_Dollar as Real with
comparisons;
US_Dollar is an available UDT with functions =, <>, <,
<=, >, >=, US_Dollar(Real), Real(US_Dollar)
User-defined functions: associated operations
create function CA_Tax (US_Dollar) returns
US_Dollar external name 'money!US_Dollar' language
C;
DB2 Universal Database, Version 5 (cont.)
Lots of other interesting features as well, e.g.:
Constraints and triggers
Recursive queries
OLAP support (cube and rollup)
Extenders (based on UDTs/UDFs)
Wide range of hardware/software platforms
PCs: Windows95, NT, OS/2, SCO
Unix workstations: AIX, Solaris, HP/UX
Parallel platforms: SMPs, MPPs (e.g., SP2)
Descended from Almaden's Starburst system
Extensible query compiler (with rule-based query
rewrite and query optimizer components)
New O-R Features in DB2 UDB V5.2
Structured types and references
Named types with attributes, O-O subtyping model
Ref(T) for directly modelling relationships
Typed tables and table hierarchies
Oid (user-provided) plus a column per attribute of T
Subtables for querying and managing subtype instances
Query language extensions
Substitutability for queries/updates (data independence ++)
Path expressions for querying relationships easily
Functions/predicates for runtime type inquiries
Object views (via a novel approach)
Virtual table hierarchies for flexible access control
Also facilitates O-O views of legacy tables
A Simple Example
Employee and department tables in the (late) 90's
person
dept
mgr
emp
dept
exec
student
Structured Types and References
Create structured types (and references)
create type Person_t as (
name Varchar(40), birthyear Integer
);
create type Emp_t under Person_t as (
salary Integer, dept Ref(Dept_t)
);
create type Exec_t under Emp_t as (
bonus Integer
);
create type Student_t under Person_t as (
major Varchar(20)
);
Structured Types and References (cont.)
Create structured types (cont).
create type Dept_t as (
name Varchar(20),
budget Integer,
headcount Integer,
mgr Ref(Emp_t)
);
Okay, so I lied (a little) on the last slide...
alter type Emp_t add attribute dept Ref(Dept_t);
Typed Tables and Table Hierarchies
Now create typed tables (and subtables)
create table person of Person_t
(ref is oid user generated);
create table emp of Emp_t under person
(dept with options scope dept);
create table exec of Exec_t under emp;
create table student of Student_t under person;
create table dept of Dept_t
(ref is oid user generated,
mgr with options scope emp);
SQL Query Extensions (by example)
Substitutability
select E.* from emp E
where E.birthyear > 1970 and E.salary > 50000;
Data modification (insert; update/delete)
insert into emp
values (Emp_t('e100'), 'John Smith', 1968, 65000,
(select oid from dept where name = 'Database'));
update person set birthyear = birthyear + 1 where name = 'John Smith';
Path expressions
select E.name, E.dept->name
from emp E
where E.dept->mgr->dept->mgr->name = 'Lou Gerstner';
Querying Table Hierarchies: An Example
Dept
Person
Emp
name
birthyear
P1
Harold
1970
P2
Carol
1958
oid
oid
name
birthyear
P3
Hamid
1956
P4
Lou
1940
select name, dept->name
from Emp
where birthyear < 1960
oid
D1
dept
_
name
.....
Databases
.....
select *
from Person
where name like 'H%'
SQL Query Extensions (cont.)
Support for type-dependent queries
select *
from only (emp) E
where dept->budget > 10000000;
select name
from person P
where deref(oid) is of type (only Emp_t, Student_t);
select type_name(deref(E.oid)), E.*
from outer (emp) E
where e.oid = Emp_t('e13');
Other Data Definition Features
ref is for object id column
Unique, user-generated (on insert)
scope clause for reference columns
Provides critical information to the query optimizer
not null constraints
Definable at any level of a table hierarchy
Enforced for indicated table and its subtables
unique constraints
Root level (and columns) only
create index for physical schema
Unique or non-unique index on root table
Non-unique index on subtable
Other Data Definition Features (cont.)
Authorization model for table hierarchies
grant and revoke on table or subtables
Substitutability: implicit subtable authorization on
columns inherited from an authorized supertable
Ex #1: select privilege on person table
Ex #2: update privilege on salary column of emp table
Some operations require authorization everywhere
deref function
is of type predicate and type_xxx functions
SQL3 also supports granting of table/subtable
privileges with hierarchy option
Object Views in DB2 UDB
Typed views and view hierarchies
vperson
vdept
mgr
vemp
vstudent
dept
Requirements: virtual table hierarchies
Typed rows with (derived) object ids
Views may be quite different from base data
Support for interconnected "view schemas"
Types For Object Views
Create types for use in views
create type VPerson_t as (
name Varchar(40)
);
create type VEmp_t under VPerson_t as (
dept Ref(VDept_t)
);
create type VStudent_t under VPerson_t as (
kind Varchar(8)
);
create type VDept_t as (
name Varchar(20), mgr Ref(VEmp_t)
);
Typed View Hierarchies
Now create typed views (and subviews)
create view vperson of VPerson_t (ref is oid user generated) as
select VPerson_t(Varchar(oid)), name from only (person);
create view vemp of VEmp_t under vperson
(dept with options scope vdept) as
select VEmp_t(Varchar(oid)), name, VDept_t(Varchar(dept))
from emp where salary > 0;
create view vstudent of VStudent_t under vperson as
select VStudent_t(Varchar(oid)), name,
case when major like '%Engineer%' then 'Geek'
else 'non-Geek' end
from student;
create view vdept of VDept_t ...;
O-R Implementation Issues/Tradeoffs
Some guiding principles for DB2 UDB V5.2
Performance must equal/exceed relational equivalents
Design amenable to future plans w.r.t. type evolution
Structured types must be supported in columns (someday)
Localize initial changes to query compiler where possible
Want "free" indexing, rewrites, optimization, parallelization, ...
Influenced by discussions with a CAD/CAM vendor
Information on existing approach and installations
Requirements for efficiency of new products
Let's look briefly at two areas
Table hierarchy representation
References and path query processing
Implementing Table Hierarchies
Implementation table approach
One physical table per table hierarchy with:
Type tag column (to distinguish subtable rows)
Object id column
Columns for all columns of the root table and its subtables
Vertical partitioning approach
One physical root table with:
Type tag column
Object id column
Columns for each root table column
N physical delta tables (one per subtable) with:
Object id column
Columns for each column introduced by this subtable
Implementing Table Hierarchies (cont.)
Horizontal partitioning approach
N separate physical tables with:
Object id column
Columns for every subtable column (inherited or not)
So what did we do for UDB V5.2...?
Vertical partitioning approach rejected quickly
Too many joins to materialize subtable rows
Multi-column constraints and indices problematic
Horizontal partitioning approach rejected eventually
Uniqueness issue for user-generated object ids
Query complexity for multi-hierarchy join queries
Ex: select p.name, q.name from Person p, Person q where ...
Implementation table approach taken for V5.2
Appeared to give us the most "free" functionality
Adopted despite row size (null columns) downside
References and Path Expressions
Reference values in tables should have a scope
"Other end" info for query rewrite and join optimization
Ditto for authorization checking (static vs. dynamic)
Schema makes overly wide references unnecessary
Uniqueness is hierarchy-relative, enforced with an index
V5.2 self-references (object ids) are user-generated
CAD/CAM vendor had "legacy references" in files
Different users have different id generation schemes
Loading cyclic data (e.g., emp<-> dept) is messy and slow
Ditto for creating objects from an object cache
References and Path Expressions (cont.)
Path expressions are logically equivalent to subqueries
select E.name, E.dept->name, E.dept->mgr->name
from emp E
where E.dept->headcount > 10;
Actual approach: shared subquery generation (QGM)
Compute common paths (prefixes) once to save work
Not every SQL context accepts an actual subquery
Also need to handle non-serializable locking levels
Efficiency obtained through query rewrite, e.g.:
Subquery to outer-join transformation
Outer-join to join transformation where possible
Where We Are Today in UDB
V5.2 of UDB contains new O-R features
Structured types with inheritance
Object tables and table hierarchies
References and path expressions
Object views and view hierarchies
Moreover, so does the SQL3 standard
Includes object views and user-defined references
IBM, Oracle, Informix heading in same general direction
Work continuing on O-R extensions
Let's have a brief look...
Additional Object Table Support
Business rule mechanisms for typed tables
Check constraints on tables/subtables (w/inheritance)
Referential integrity constraints to and from tables/subtables
Triggers on tables/subtables
Object modeling and management extensions
User-defined reference types (ref using)
More flexible object view definitions
Type and instance (i.e., row) evolution
Structured types for attributes/columns
Work in progress at IBM Santa Teresa Lab
Functions/methods just around the corner as well
Other Exploratory O-R Work
Efficient support for collection types
Multivalued attributes (e.g., Project.team)
Flavors: set, multiset, array, list, ...
Need to integrate into SQL, support querying well
Some experience from a first prototype
Other activities (and open problems)
Java mappings & bindings for O-R data
XML & data-centric web sites ("d-commerce")
Business object servers (caching/consistency)
Heterogeneous data & O-R database systems
User-defined and/or external indexing
Optimizer "hooks" for new data types
Etc.!
Partial List of UDB O-R Contributors
Almaden Research Center
Mike Carey, Don Chamberlin, Srinivasa Narayanan, Bennet
Vance; C.M. Park; Guido Moerkotte
Santa Teresa Lab
Nelson Mattos
Gene Fuh, Michelle Jou, Brian Tran
Toronto Lab
Doug Doole, Serge Rielau, Rick Swagerman
Leo Lao, Walid Rjaibi, Calisto Zuzarte
Cheryl Greene, various other consultants/hecklers
And as for future versions of UDB
Your name could appear here! (MS/PhD)
The End
What About Object-Oriented DBMSs?
OOPL + DBMS = OO-DBMS
Commonly based on C++ or Smalltalk
Persistence, collections, queries, versions, ...
Lots of interesting and useful research results
O-O data models and query languages
O-O query processing, system architecture, performance
Various products (O2, Objectstore, Versant, Objectivity, ...)
No widespread commercial acceptance
Many differences across systems (despite ODMG-93)
Never really caught up to RDBMS techology
Schema compilation, evolution painful
Missing many of the relational "goodies"
Single-language focus, lack of (relational) tools
Stonebraker Fellow Criteria (found on web)
Industrial database researcher
PhD from UC Berkeley
Must agree with the following motto:
Databases are the answer...!
What was the question again...?
At least 6' tall
Had a PhD thesis advisor with first name Mike
Produced a PhD student with first name Mike