Hybrid Object-Relational DataBases

Download Report

Transcript Hybrid Object-Relational DataBases

Database Design &Implementation




First priority of database management:
Never lose the content of an operational database.
This ability to automatically preserve data off-line
is called persistence.
Hybrid Object-Relational DataBases (ORDB’s)
are one way to solve the persistence problem
when implementing object-oriented applications
with persistent data content.
Understanding ORDBs requires more details
about database architecture (next slide).
Obj-RelDBv05f.ppt - RJL 050907 - <1>
Hybrid Object-Relational DataBases
(ORDB’s)


ORDB’s can preserve network-structured data content by
what is called ‘pointer swizzling’: Every address pointer in
a memory-resident structured data object is mapped to/from
an object key or unique identifier in external RDB tables.
This must identify the table/class type and the row/object
number within the table.
The COOL framework supports ORDB management
with separate components for data and method access:
–
–
GEN generates C/C++ code for persistent I/O and navigation
through data structures in the ORDB. This code depends only
on the database model or schema, not on the application code.
LCP captures event-mediated flow of control information in a
state model database. Application programmers write the State
Actions
Obj-RelDBv05f.ppt - RJL 050907 - <2>
LCP (Life Cycle Prototype) Interpreter

LCP controls application behavior by capturing eventmediated flow-of-control information in a state model
database. It dispatches events to object instances by
‘delegating’ execution to a prototype ‘Active Class’.
This is one way of implementing inheritance.

LCP supports method access inside prototype object
instances by (compile-time) mapping of named state
actions to function pointers for each Active Class.
Developers write the application-specific code for the
State Actions.
Obj-RelDBv05f.ppt - RJL 050907 - <3>
LCP Interpreter Performance
LCP performance depends on two kinds of overhead:

Control-flow interpretation from the State Model
database: this overhead increases as state actions
become more fine-grained.

Event-dispatching overhead: this is inherent in
distributed and multi-processor (non-sharedmemory) applications, but not in shared-memory
multi-threaded ones.
Obj-RelDBv05f.ppt - RJL 050907 - <4>
Relational Databases





An RBD ‘instance’ (a snapshot of its value at time t) is a set
of ‘tuples’; each tuple contains a sequence of scalar attribute
values for its coordinates or fields.
One tuple is stored externally as a record in a ‘flat’ file, and
internally as a sequence of data members for a single object.
Any homogeneous set of tuples can be viewed conceptually
as a table with rows and columns, with one column per field
or data member and one row per tuple or object instance.
‘Tuples generalize Vectors in the sense that coordinates of
the tuple space can be different types, while vectors have
homogeneous type for their coordinates.
A table of tuples has an abstract geometrical interpretation
as a set of points that ranges over a multi-typed coordinate
space.
Obj-RelDBv05f.ppt - RJL 050907 - <5>
Relational Databases





An RBD is a set of relations, each of which is a set of
‘tuples’.
Each tuple represents an object with scalar attributes.
Each relation may be stored externally as a set of similarly
structured records in a ‘flat’ file.
A single relation or file can be viewed conceptually as a
table of rows.
The tuples of one relation define a geometrical set of points
in a multi-typed coordinate space. Tuples generalize Vectors
in the sense that coordinates of the tuple space can be
different types, while vectors have one homogeneous type
for all of their coordinates.
Obj-RelDBv05f.ppt - RJL 050907 - <6>
Database Normalzation

Complex structured data types (and object instances) are
decomposed or ‘normalized’ into simple parts of ‘Second
Normal Form’ (2NF): (no composite=structured attributes or
repeating groups of values are allowed).

For maintenance and reliability reasons, the design is further
normalized (3NF): (There are no redundant or indirectly
computable field values and all properties are stored in only
one place.)

[Refs: Sanders Ch. 3 & Appndx A; Kroenke Chapter 2 ]
Obj-RelDBv05f.ppt - RJL 050907 - <7>
Composite pkeys in RDB’s



Every tuple must have a unique field (or set of fields) called
its primary key (pkey) which uniquely identifies it.
A composite pkey for a child or component tuple is often
built by concatenating multiple key fields from a chain of
ancestors. This complicates pkey-to-fkey matching.
Example: Dept--->Course--->Section (ERD on next slide)
–
–
–
CS has Dept# = 91 and OOAD has course# = 91.523.
Many courses have the same section # 201, so 201 is only a unique
identifier within sections of a particular course, just as 523 is only
a unique course# within a particular Department.
In my syllabus the course id 05f523 assumes only one Dept (91)
and one Section (291), but it combines a semester id (05f) for the
term=Fall 2005 with the course number 523 so I can partition
directories for this course over multiple terms.
Obj-RelDBv05f.ppt - RJL 050907 - <8>
Composite Pkeys (2 Examples)
Examples: SWEngI: (Schedule id vs. my directory id
A unique pkey for my section of
SWEngI is a composite of Dept,
Course and Section number:
My directory id for SWEngI
91.523 in fall 2005 associates
05f with 523:
RJL Course
pkey: 523
CS Dept.
pkey: 91
SWEngCourse
pkey: 91+523
Term
pkey: 05f
Directory (one)
Pkey: 01f523
(These are ’instance diagrams’, not ERD’s.
They show field values for a single table row; Section (one of many)
pkey: 91+523+291
an ERD only shows entity and field types.
Obj-RelDBv05f.ppt - RJL 050907 - <9>
Surrogate Keys in RDB’s







The composite key for my section of OOAD in the Student
Information System (SIS) Database is a composite of Dept,
Course and Section number: 91.523.291.
For IBM’s RDB, EFCodd advocated a hidden ‘surrogate’
pkey to replace the user-defined composite keys. This
improves code quality and performance (by expediting the
fundamental RDB operation ‘join’: match pkeys to fkeys).
[Ref: EFCodd: Extending the RDB Model…, ACM TODS Dec’79]
Example: Entity with old and new key name and value:
Entity:
alternate (old pkey): surrogate (name = value):
DEpt
deptNo = 91
DEid = DE0000xx
COurse
courseNo = 91+523 COid = CO000yyy
SEction
sectNo= 91+523+291 SEid = SE000zzz
Obj-RelDBv05f.ppt - RJL 050907 - <10>
RDB with Surrogate pfkeys:





A GEN Example: CS Dept View of SIS Database ERD:
Entity:
alternate (old pkey): surrogate (name = value):
Dept
deptNo = 91
DEid = DE000001
Course
courseNo = 91+523 COid = CO000220
Section
sectNo= 91+523+291 SEid = SE002601*
Department DE
DE000001
91
(Note that the fkey only
references the immediate
ancestor or container of
an object or tuple.)
Course CO
CO000220
DE000001
523
A Persistence Requirement (WHY?):
Each table has a mnemonic
abbreviation (DE,CO,SE) encoded
into the pkey value of its objects.
Section SE*
SE002601
CO000220
291
(*Sections of
ALL Courses
are in table
SE)
Obj-RelDBv05f.ppt - RJL 050907 - <11>
Surrogate Keys in COOL/GEN

GEN uses surrogate pkeys and matching
fkeys, but does not hide them. (OK for
CAD/CASE tools with hi-tech users.)*

Pkeys can never be re-used for new objects,
as long as fkeys exist that can reference their
former object (in old but still-in-use
database versions).
____________
* CJDate: The RDB Model (A-W 2001) faults EFCodd for
recommending that users should not see surrogate pfkey values.
Obj-RelDBv05f.ppt - RJL 050907 - <12>
Persistent Object Identifiers



C++ and Java runtime objects have an object-id (oid),
typically represented by its virtual memory address on the
heap. This oid corresponds at least conceptually to the
pkey of an RDB tuple. This type of oid is not visible and
not persistent, because it disappears when the program
terminates.
Large RDBMS’s can avoid loss of information and
achieve persistence by taking over or duplicating OS
virtual memory management functions: moving large
segments of virtual memory to/from mass storage in a
fail-safe manner, saving relative addresses on hard disk.
Another way to achieve persistence is to convert
pkey/fkey relationships to/from object references during
database import/export. This is done by COOL/GEN.
Obj-RelDBv05f.ppt - RJL 050907 - <13>
Persistent Databases



Persistence means that pkeys and fkeys are preserved
during export to mass storage or remote sites and reimport by the same or another DataBase Management
System (DBMS)
A relational database (RBD) supports inter-object
relationships by foreign key (fkey) fields. These are
both user-visible and persistent: they get saved in mass
storage if the program terminates.
The process of mapping RDB pkey-fkey associations to
and from C++ pointers is called ‘pointer swizzling’.
Database
in Main
Memory
import
export
Database
in Mass
Storage
Obj-RelDBv05f.ppt - RJL 050907 - <14>
Referential Integrity




The principle of ’Referential Integrity’:
– To maintain valid database content, all fkey values
must match the unique primary key or object
identifier of another tuple, or else have the reserved
‘null’ (unknown or undefined) value.
N-ary relations (N-way associations) can be implemented
by a new associative entity, whose tuples contain exactly N
fkeys (plus optional non-key attributes).
Most relations are binary (N = 2). Note that fkeys may refer
to the same or different types.
See Example 2, Composite Pkeys (slide 9) and next slide 16
Obj-RelDBv05f.ppt - RJL 050907 - <15>
N-ary Relation (ERD Styles)



N-ary relations are many-to-many associations among N
object instances (of the same or different types).
N-way associations can be implemented by introducing a new
associative entity, whose tuples contain exactly N fkeys (plus
optional non-key attributes). (Most relations are binary: N = 2).
The diamond indicates a ternary relation among types AA, BB
and CC. [It is superfluous if N=2, if the relation is one to many, or if
an associative entity replaces it.]
Example
BB
for N=3:
AA
AA
BB
CC
Optional
attributes
AABBCC
CC
(3 fkeys
inside)
New Entity AABBCC gives these attributes a decent
home, and replaces the diamond.
Obj-RelDBv05f.ppt - RJL 050907 - <16>
Extended ER Diagrams

When an RDB implements an Extended ERD (EERD), a
tuple’s fkeys or inter-object cross-references can identify
either a super-class object or an associated parent or
container object (instance of a class).
–
–

These two fkey types have different semantic interpretations.
The principle of ‘Referential Integrity’ constrains fkey values to
the pkey value of some existing parent (AA or BB) instance.
To improve readability, EERD’s use different link layouts
for inheritance and for instance-level associations:
AA
BB
1..1
0..*
In this example, CC both
inherits from AA and is a
component of the
composite entity BB.
It contains two fkeys,
(say) AAid and BBid.
CC
Obj-RelDBv05f.ppt - RJL 050907 - <17>
Multiple Inheritance on EERD’s




Multiple inheritance requires an fkey to each superclass
object whose properties (attributes or methods) are
inherited.
In a prototype or delegation implementation of multiple
inheritance, superclass object[s] actually exist apart from
their corresponding subclass object[s]. Each sub-object has
one fkey to each of its direct ancestor object types.
For a C or C++, an object with multiple inheritance paths to
the same superclass type must be constrained to inherit data
from the same instance as well. (Path divergence is risky!)
In an ORDB, fkeys also support dynamic method
delegation. The COOL/LCP interpreter implements such a
dynamic map (from a concrete object to its generic or prototype
Active Instance, from object class to generic Active Class).
Obj-RelDBv05f.ppt - RJL 050907 - <18>
ORDB via Prototype Delegation



An Extended ERD (EERD) can be implemented as either a
relational RDB, object-oriented OODB, or object-relational
ORDB. An OODB is supported by its own class-based data
representations.*
An ORDB can be class-based or prototype-based with
delegation. (GEN is prototype-based.)**
Prototype delegation does not rely on Class membership for
method inheritance - it creates object-level relationships to
support method delegation: ANY client object can ‘delegate’
any of its behavior to another server object via the oid
equivalent of an fkey.
* CJDate insists that OODBs are fully supported by the relational
model without any extensions. (I agree :-) [Ibid]
** Breugg:OOSE (10.4.4) calls this a vertical mapping.
Obj-RelDBv05f.ppt - RJL 050907 - <19>
GEN Database: Persistence
Our GEN tool imports an external RDB to a memory-resident
object-relational database (ORDB):
 Its external persistent RDB format is a union of records
representing tuples of different types.
 During import, fkeys are augmented or replaced by parent
and first-child and next-sibling object reference pointers,
which follow strict GEN naming conventions.
 During export, pointers are discarded; fkeys are preserved
or restored for persistent storage in external RDB tuples.
Obj-RelDBv05f.ppt - RJL 050907 - <20>
GEN Database: Schema Constraints

The external RDB schema (or EER Diagram) is first
converted to Third Normal Form.

Other attributes that would normally comprise a user-defined
(and typically composite) primary key can be removed
during schema or EERD conversion to Third Normal Form.

This eliminates redundant attributes that functionally depend
on some fkey instead of the pkey attribute.

The corresponding EERD network tends to have links which
are all 1 to many relations with mandatory parents and
optional children (1..1 to 0..* on UML class diagrams).

Inheritance links are 1..1 to 0..1 on class diagrams, because
at most one subclass instance can inherit from any given
ancestor class instance.
Obj-RelDBv05f.ppt - RJL 050907 - <21>
GEN Database: External Format






Our GEN tool imports an external RDB to a memoryresident object-relational database (ORDB):
Its external RDB format is a union of records representing
tuples of different types:
Every tuple record has an integral and immutable
‘surrogate’ primary key attribute (and object id).
Different tuple types have pairwise disjoint pkey ranges.
All foreign keys (fkeys) use this surrogate pkey value to
refer to their parent (container or superclass) record type.
Chgen and gencpp are external-database-compatible.
Obj-RelDBv05f.ppt - RJL 050907 - <22>
GEN Database: Internal Format





During import, fkeys are augmented or replaced by direct
parent object pointers plus first-child and next-sibling
object reference pointers. These are constructed from fkey
names following strict GEN naming conventions.
This results in an internal ORDB format which is a set of
multiply-threaded linked lists of parent-to-children and
super-to-subclass object (tuple instance) reference pointers.
Parent-pointers support direct access to parent table
attributes, replacing pair-wise join queries in an RDB.
For each 1-to-many parent-child relationship, chgen
provides a child_loop macro while gencpp provides a foreach iterator.
Gencpp has two optional modes: chgen or C++/STL.
Obj-RelDBv05f.ppt - RJL 050907 - <23>
GEN Database: Import/Export
GEN creates two schema-based import/export utilities:
 pr_load parses tuples and imports an external RDB into a
memory-resident object-relational database (ORDB);
 pr_dump exports the modified ORDB back to the persistent
external RDB.
 During import, fkeys are augmented or replaced by direct
parent pointers plus first-child and next-sibling object
reference pointers. These are constructed from fkey names
following strict GEN naming conventions, Super- and subclass objects are also connected in the same way.
 This results in an internal ORDB format which is a set of
multiply-threaded linked lists from each parent through each
of its child-sets, that supports parent-child JOINs.
[Ref: Jiri Soukup: Taming C++, A-W 1994; see also $PH/JiriSoukup]
Obj-RelDBv05f.ppt - RJL 050907 - <24>
Importing RDB’s to C++/Java




If the RDB is imported to an object-relational database
implemented in C++ or Java, then during import the fkey
fields of RDB tuple types should be converted to
corresponding C++/Java object reference types.
Caveat/pre-condition: All fkeys implied by links on the
RDB’s data model or EERD must conform to inheritance
and type constraints of the language (C++ or Java).
Fkeys in an RDB can also support non-exhaustive or overlapping subclasses (going beyond C++ constraints).
Fkeys and object references can also support dynamic
migration (of an object among the subclasses of its class).
– Example: An object may make transitions among OLC
states (states become subclasses of the object’s class).
Obj-RelDBv05f.ppt - RJL 050907 - <25>
Object-Relational Databases Prototypes and Delegation
The last few slides were inspired by Shlaer-Mellor-User
Group email related to Divergent Inheritance (parallel
hierarchies). This motivates the use of prototypes and
delegation to explain the static information architecture that is
supported by COOL’s chGEN/GENcpp code generator, and
illustrates concurrent sub-state machine models for dynamic
behavior.
–
–
–
–
–
To: [email protected]
Subject: Re: (SMU) Polymorphic events and other
paranormal activity
Message 10/734 From [email protected]
Sep 04, 01 08:45:33 AM
responding to Fontana: . . .
Obj-RelDBv05f.ppt - RJL 050907 - <26>
Divergent Hierarchies




responding to Fontana:
> I think Jay was driving at divergent hierarchies, not multiple
inheritance, eg:
> relationship S1 - supertype Dog, subtypes BigDog and SmallDog
> relationship S2 - supertype Dog, subtypes BlackDog and WhiteDog
DOG CLASS
Relationship S1:
(BLACK xor WHITE)
(Mutex and exhaustive):
Black Dog
White Dog
Relationship S2:
(BIG xor SMALL)
(Mutex and exhaustive):
Big Dog
Small Dog
Divergent Hierarchies Example:
> relationship S1 - supertype Dog, subtypes BigDog xor SmallDog
> relationship S2 - supertype Dog, subtypes BlackDog xor WhiteDog
Obj-RelDBv05f.ppt - RJL 050907 - <27>
OLC’s with Concurrent Sub-states
(Illustrates direct product composition of concurrent STDs)








> Assume each of the 4 subclasses has its own ‘object lifecycle’ (OLC):
> BigDog: Woofing <--> Sleeping
> SmallDog: Yipping <--> Skittering
> BlackDog: Panting <--> Drooling
> WhiteDog: Shedding <--> Scratching
> Now create one instance of Dog - let's say it is a big black dog, with a
> dogId = 13. It must be in one of the BigDog states (Woofing or Sleep> ing),
> AND in one of the BlackDog states (Panting or Drooling).
Dog #13 (Big and Black):
Big:
Woofing
Black:
Panting
Drooling
Sleeping
Obj-RelDBv05f.ppt - RJL 050907 - <28>
Merging OLC Behaviors of
Concurrent Subclasses:



Each of the 4 subclasses has its own ‘object lifecycle’ (OLC);
E.g. every Big&Black Dog must be in one of the BigDog states (Woofing
or Sleeping), AND in one of the BlackDog states (Panting or Drooling).
Dog #13 (Big and Black) has the behavior/activity of both BigDogs and
BlackDogs:
Black Dog OLC:
Panting
Drooling
Woofing
Woof&
Pant
Woof&
Drool
Sleeping
Sleep
&Pant
Sleep&
Drool
Big Dog OLC:
Obj-RelDBv05f.ppt - RJL 050907 - <29>
Divergent Hierarchies - revisited (1)
DOG CLASS
Partition S1:
(BLACK xor WHITE)
(Mutex and exhaustive):
Black Dog






White Dog
Partition S2:
(BIG xor SMALL)
(Mutex and exhaustive):
Big Dog
Small Dog
C++ does not support divergent class hierarchies.
One alternate is prototype objects with delegation.
RDB’s can support prototypes and delegation:
In our example, each dog object belongs to one subclass for color,
and simultaneously to another subclass for size.
That is, a ‘real’ dog object simultaneously belongs to, and inherits
from, exactly one of the subclasses in each inheritance tree above.
The next slide shows (by its messiness) that multiple inheritance is
best avoided.
Obj-RelDBv05f.ppt - RJL 050907 - <30>
Divergent Hierarchies - revisited (2)
DOG CLASS
Partition S1:
(BLACK xor WHITE)
(Mutex and exhaustive):
Black Dog
Big Black Dog



White Dog
Big White Dog
Partition S2:
(BIG xor SMALL)
(Mutex and exhaustive):
Big Dog
Small Black Dog
Small Dog
Small White Dog
Level 3 models concrete ‘leaf’ objects or ‘real’ dogs, which
simultaneously belong to a distinct pair of subclasses at level 2 of
the inheritance tree (compositional inheritance of properties).
So there are really 4 leaf classes at level 3, below level 2 above.
Each leaf class instance at level 3 has exactly two paths up to level
1; both paths must end up at the same root object (Dog instance).
Obj-RelDBv05f.ppt - RJL 050907 - <31>
Composition or Implementation
Inheritance




With compositional inheritance, dogs will inherit from two
‘component’ classes: Color and Size.
This is ‘impure’ multiple inheritance in C++ ( impure
because the two ancestor classes have nothing in common
with animals, which may not behave well as clients of
Color or Size ancestor methods).
Java does not have multiple inheritance - but any class may
‘implement’ the interfaces Color’ and ‘Size’ instead.
Dogs must then be eligible to inherit (C++) or implement
(Java) all the methods of the Color and Size classes - an
undesirable compromise. Over-riding only hides the mismatch between class Dog and Color or Size classes.
Obj-RelDBv05f.ppt - RJL 050907 - <32>
References











Frank & Ulrich: ”Delegation: An Important Concept for the
Appropriate Design of Object Models”, JOOP June 2000 (p.13-17,44)
Eliens: Principles of OO Software Dev. 2ed., AWL 2000 (Sect. 5.4:
Prototypes - delegation vs. inheritance)
Kilov/Ross: Information Models, PH 1994 (Not about delegation, but
covers multiple/concurrent/overlapping subclass membership.)
Lee &Tepfenhart: UML and C++: A Practical Guide to OO Dev, 2ed,
PH 2001(pp206-210) (Multiple Inheritance examples Fig. 12-4,12-5)
Sanders: Data Modeling, Boyd-Fraser/ITP 1995 (Ch. 3 and Appendix)
Jiri Soukup: Taming C++, A-W 1994; see also $PH/JiriSoukup
CJDate: The RDB Model, A-W 2001
CJDate: Database Systems, 7ed! A-W 2000
EFCodd: Extending the RDB Model…, ACM TODS Dec’79
Kroenke: Database Concepts, 2nd Ed, P-H 2005 Chapter 2
Breugge & Dutoit: OO Software Engineering, A-W 2004.
Obj-RelDBv05f.ppt - RJL 050907 - <33>