presentation source

Transcript presentation source

O-O, What Are They Doing
to Relational Databases?
(The Evolution of DB2 Universal Database)
Michael J. Carey
IBM Almaden
January 1999
Plan for Today's Presentation
The relational DBMS revolution
The object-relational DBMS evolution
O-R features in DB2 Universal Database V5.2
Some O-R implementation tradeoffs (V5.2)
What lies ahead for DB2 UDB & O-R databases?
Questions (and possibly answers)
Please ask questions throughout...!
The Relational DBMS Revolution
The pre-relational era (1970's)

Graph-based data models


Hierarchical (IMS), network (Codasyl)
Low-level, navigational interfaces

Labor-intensive and error-prone
The relational era (1980's)

Simple, abstract data model
Database = set of relations ("tables")
 3 schema levels: views, base tables, physical schema
 Algebra of set-oriented operations


High-level, declarative interfaces
SQL, QBE, et al
 Embedded languages, 4GLs

The Relational Model (in one slide)
Employees and departments
Department
Employee
?
dno
10
20
eno
1
7
22
name
Toy
Shoe
name
Lou
Laura
Mike
salary
10000000
150000
80000
select E.name, E.salary, D.no
from Employee E, Department D
where E.salary < 100000
and D.name = 'Shoe'
and E.dept = D.dno;
dept
10
20
20
Relational Databases: A Success Story
The relational model has been a big success
Simplicity has made research tractable
 Data independence yields productivity gains
 Both academia and industry have benefitted

Relational DBMS "goodies" include
Efficient query optimization and execution
 Well-defined transaction semantics and support
 Excellent multi-user performance and robustness
 Views for data independence, authorization
 Constraints, triggers, and stored procedures for (shared)
business rule capture/enforcement
 "The" success story for parallel computing

We've Achieved Nirvana ... Right?
The world is becoming increasingly complex
New data types are appearing (e.g., multimedia)
 Real-world data doesn't fit neatly into tables

Entities and relationships (vs. tables)
 Variance among entities (vs. homogeneity)
 Set-valued attributes (vs. normalization)


Advanced applications bring complex data

E.g., CAD/CAM data management, web data management,
geographic information management, medical data
management, (your favorite application goes here)
So maybe objects are the answer...?

Yes, if we can keep all the relational "goodies"!
The Object-Relational DBMS Evolution
O-R extension #1: Abstract data types (ADTs)

New column types and functions/methods
E.g., text, image, audio, video, time series, point, line, OLE...
 For modeling new kinds of facts about enterprise entities


Infrastructure for extenders/datablades/cartridges
O-R extension #2: Row types

Types and functions/methods for rows of tables
Desirable features include references, inheritance, methods, late
binding, and collection-valued attributes
 For modeling enterprise entities with relationships & behavior


Infrastructure for DBMS-native object management
Recent SQL3 merger: Structured types

Can use for types of columns and/or tables
"Not Your Father's Employee Type"
Beyond name, rank, and serial number

New attribute types


Location (2-d point), job description (text), photo (image), ...
Associated functions

Distance(point, point), contains(text, string), ...
Beyond your basic employee record

Employees come in different flavors


Employees have many known relationships


Emp, RSM, Programmer, Manager, Temp, ...
Manager, department, projects, ...
Employees have behavior

Age(Emp), qualified(Emp, Job), hire(Emp), ...
An Employee is a simple "business
object"
The OSF Project at IBM Almaden
OSF stands for "Object Strike Force"
Semi-autonomous group "outside" UDB development
 Focus: object-relational extensions for DB2 UDB
 Both near-term and longer-term interests
 Collaborate with our Toronto and Santa Teresa labs

Significant activities to date
Prototyped "row type" support for DB2 UDB
 Delivered in DB2 UDB Version 5.2 (9/98)
 Significantly revised SQL3 draft standard
 Working on next step plus future technologies

DB2 Universal Database, Version 5
DB2 for Common Servers (Version 2)
User-defined column types (UDTs/distinct types)
 User-defined functions (UDFs)
 Binary/character large objects (BLOBs/CLOBs)

Distinct types: new data types for columns
Ex: create distinct type US_Dollar as Real with
comparisons;
 US_Dollar is an available UDT with functions =, <>, <,
<=, >, >=, US_Dollar(Real), Real(US_Dollar)

User-defined functions: associated operations

create function CA_Tax (US_Dollar) returns
US_Dollar external name 'money!US_Dollar' language
C;
DB2 Universal Database, Version 5 (cont.)
Lots of other interesting features as well, e.g.:
Constraints and triggers
 Recursive queries
 OLAP support (cube and rollup)
 Extenders (based on UDTs/UDFs)

Wide range of hardware/software platforms
PCs: Windows95, NT, OS/2, SCO
 Unix workstations: AIX, Solaris, HP/UX
 Parallel platforms: SMPs, MPPs (e.g., SP2)

Descended from Almaden's Starburst system

Extensible query compiler (with rule-based query
rewrite and query optimizer components)
New O-R Features in DB2 UDB V5.2
Structured types and references
Named types with attributes, O-O subtyping model
 Ref(T) for directly modelling relationships

Typed tables and table hierarchies
Oid (user-provided) plus a column per attribute of T
 Subtables for querying and managing subtype instances

Query language extensions
Substitutability for queries/updates (data independence ++)
 Path expressions for querying relationships easily
 Functions/predicates for runtime type inquiries

Object views (via a novel approach)
Virtual table hierarchies for flexible access control
 Also facilitates O-O views of legacy tables

A Simple Example
Employee and department tables in the (late) 90's
person
dept
mgr
emp
dept
exec
student
Structured Types and References
Create structured types (and references)
create type Person_t as (
name Varchar(40), birthyear Integer
);
create type Emp_t under Person_t as (
salary Integer, dept Ref(Dept_t)
);
create type Exec_t under Emp_t as (
bonus Integer
);
create type Student_t under Person_t as (
major Varchar(20)
);
Structured Types and References (cont.)
Create structured types (cont).
create type Dept_t as (
name Varchar(20),
budget Integer,
headcount Integer,
mgr Ref(Emp_t)
);
Okay, so I lied (a little) on the last slide...
alter type Emp_t add attribute dept Ref(Dept_t);
Typed Tables and Table Hierarchies
Now create typed tables (and subtables)
create table person of Person_t
(ref is oid user generated);
create table emp of Emp_t under person
(dept with options scope dept);
create table exec of Exec_t under emp;
create table student of Student_t under person;
create table dept of Dept_t
(ref is oid user generated,
mgr with options scope emp);
SQL Query Extensions (by example)
Substitutability
select E.* from emp E
where E.birthyear > 1970 and E.salary > 50000;
Data modification (insert; update/delete)
insert into emp
values (Emp_t('e100'), 'John Smith', 1968, 65000,
(select oid from dept where name = 'Database'));
update person set birthyear = birthyear + 1 where name = 'John Smith';
Path expressions
select E.name, E.dept->name
from emp E
where E.dept->mgr->dept->mgr->name = 'Lou Gerstner';
Querying Table Hierarchies: An Example
Dept
Person
Emp
name
birthyear
P1
Harold
1970
P2
Carol
1958
oid
oid
name
birthyear
P3
Hamid
1956
P4
Lou
1940
select name, dept->name
from Emp
where birthyear < 1960
oid
D1
dept
_
name
.....
Databases
.....
select *
from Person
where name like 'H%'
SQL Query Extensions (cont.)
Support for type-dependent queries
select *
from only (emp) E
where dept->budget > 10000000;
select name
from person P
where deref(oid) is of type (only Emp_t, Student_t);
select type_name(deref(E.oid)), E.*
from outer (emp) E
where e.oid = Emp_t('e13');
Other Data Definition Features
ref is for object id column

Unique, user-generated (on insert)
scope clause for reference columns

Provides critical information to the query optimizer
not null constraints
Definable at any level of a table hierarchy
 Enforced for indicated table and its subtables

unique constraints

Root level (and columns) only
create index for physical schema
Unique or non-unique index on root table
 Non-unique index on subtable

Other Data Definition Features (cont.)
Authorization model for table hierarchies
grant and revoke on table or subtables
 Substitutability: implicit subtable authorization on
columns inherited from an authorized supertable

Ex #1: select privilege on person table
 Ex #2: update privilege on salary column of emp table


Some operations require authorization everywhere
deref function
 is of type predicate and type_xxx functions


SQL3 also supports granting of table/subtable
privileges with hierarchy option
Object Views in DB2 UDB
Typed views and view hierarchies
vperson
vdept
mgr
vemp
vstudent
dept
Requirements: virtual table hierarchies
Typed rows with (derived) object ids
 Views may be quite different from base data
 Support for interconnected "view schemas"

Types For Object Views
Create types for use in views
create type VPerson_t as (
name Varchar(40)
);
create type VEmp_t under VPerson_t as (
dept Ref(VDept_t)
);
create type VStudent_t under VPerson_t as (
kind Varchar(8)
);
create type VDept_t as (
name Varchar(20), mgr Ref(VEmp_t)
);
Typed View Hierarchies
Now create typed views (and subviews)
create view vperson of VPerson_t (ref is oid user generated) as
select VPerson_t(Varchar(oid)), name from only (person);
create view vemp of VEmp_t under vperson
(dept with options scope vdept) as
select VEmp_t(Varchar(oid)), name, VDept_t(Varchar(dept))
from emp where salary > 0;
create view vstudent of VStudent_t under vperson as
select VStudent_t(Varchar(oid)), name,
case when major like '%Engineer%' then 'Geek'
else 'non-Geek' end
from student;
create view vdept of VDept_t ...;
O-R Implementation Issues/Tradeoffs
Some guiding principles for DB2 UDB V5.2
Performance must equal/exceed relational equivalents
 Design amenable to future plans w.r.t. type evolution
 Structured types must be supported in columns (someday)
 Localize initial changes to query compiler where possible


Want "free" indexing, rewrites, optimization, parallelization, ...
Influenced by discussions with a CAD/CAM vendor
Information on existing approach and installations
 Requirements for efficiency of new products

Let's look briefly at two areas
Table hierarchy representation
 References and path query processing

Implementing Table Hierarchies
Implementation table approach

One physical table per table hierarchy with:
Type tag column (to distinguish subtable rows)
 Object id column
 Columns for all columns of the root table and its subtables

Vertical partitioning approach

One physical root table with:
Type tag column
 Object id column
 Columns for each root table column


N physical delta tables (one per subtable) with:
Object id column
 Columns for each column introduced by this subtable

Implementing Table Hierarchies (cont.)
Horizontal partitioning approach

N separate physical tables with:
Object id column
 Columns for every subtable column (inherited or not)

So what did we do for UDB V5.2...?

Vertical partitioning approach rejected quickly
Too many joins to materialize subtable rows
 Multi-column constraints and indices problematic


Horizontal partitioning approach rejected eventually
Uniqueness issue for user-generated object ids
 Query complexity for multi-hierarchy join queries



Ex: select p.name, q.name from Person p, Person q where ...
Implementation table approach taken for V5.2
Appeared to give us the most "free" functionality
 Adopted despite row size (null columns) downside

References and Path Expressions
Reference values in tables should have a scope
"Other end" info for query rewrite and join optimization
 Ditto for authorization checking (static vs. dynamic)
 Schema makes overly wide references unnecessary
 Uniqueness is hierarchy-relative, enforced with an index

V5.2 self-references (object ids) are user-generated
CAD/CAM vendor had "legacy references" in files
 Different users have different id generation schemes
 Loading cyclic data (e.g., emp<-> dept) is messy and slow
 Ditto for creating objects from an object cache

References and Path Expressions (cont.)
Path expressions are logically equivalent to subqueries
select E.name, E.dept->name, E.dept->mgr->name
from emp E
where E.dept->headcount > 10;
Actual approach: shared subquery generation (QGM)
Compute common paths (prefixes) once to save work
 Not every SQL context accepts an actual subquery
 Also need to handle non-serializable locking levels

Efficiency obtained through query rewrite, e.g.:
Subquery to outer-join transformation
 Outer-join to join transformation where possible

Where We Are Today in UDB
V5.2 of UDB contains new O-R features
Structured types with inheritance
 Object tables and table hierarchies
 References and path expressions
 Object views and view hierarchies

Moreover, so does the SQL3 standard
Includes object views and user-defined references
 IBM, Oracle, Informix heading in same general direction

Work continuing on O-R extensions

Let's have a brief look...
Additional Object Table Support
Business rule mechanisms for typed tables
Check constraints on tables/subtables (w/inheritance)
 Referential integrity constraints to and from tables/subtables
 Triggers on tables/subtables

Object modeling and management extensions
User-defined reference types (ref using)
 More flexible object view definitions
 Type and instance (i.e., row) evolution

Structured types for attributes/columns
Work in progress at IBM Santa Teresa Lab
 Functions/methods just around the corner as well

Other Exploratory O-R Work
Efficient support for collection types
Multivalued attributes (e.g., Project.team)
 Flavors: set, multiset, array, list, ...
 Need to integrate into SQL, support querying well
 Some experience from a first prototype

Other activities (and open problems)
Java mappings & bindings for O-R data
 XML & data-centric web sites ("d-commerce")
 Business object servers (caching/consistency)

Heterogeneous data & O-R database systems
 User-defined and/or external indexing
 Optimizer "hooks" for new data types
 Etc.!

Partial List of UDB O-R Contributors
Almaden Research Center

Mike Carey, Don Chamberlin, Srinivasa Narayanan, Bennet
Vance; C.M. Park; Guido Moerkotte
Santa Teresa Lab
Nelson Mattos
 Gene Fuh, Michelle Jou, Brian Tran

Toronto Lab
Doug Doole, Serge Rielau, Rick Swagerman
 Leo Lao, Walid Rjaibi, Calisto Zuzarte
 Cheryl Greene, various other consultants/hecklers

And as for future versions of UDB

Your name could appear here! (MS/PhD)
The End
What About Object-Oriented DBMSs?
OOPL + DBMS = OO-DBMS
Commonly based on C++ or Smalltalk
 Persistence, collections, queries, versions, ...

Lots of interesting and useful research results
O-O data models and query languages
 O-O query processing, system architecture, performance
 Various products (O2, Objectstore, Versant, Objectivity, ...)

No widespread commercial acceptance
Many differences across systems (despite ODMG-93)
 Never really caught up to RDBMS techology

Schema compilation, evolution painful
 Missing many of the relational "goodies"
 Single-language focus, lack of (relational) tools

Stonebraker Fellow Criteria (found on web)
Industrial database researcher
PhD from UC Berkeley
Must agree with the following motto:
Databases are the answer...!
 What was the question again...?

At least 6' tall
Had a PhD thesis advisor with first name Mike
Produced a PhD student with first name Mike

presentation source

Transcript presentation source

Directory