Object-Relational Databases


Database Design & Implementation


One thing is paramount in military, commercial or
industrial applications: Never lose the content of an
operational database. This requires persistence.
Hybrid object-relational databases (ORDB’s) are one way
to solve the problem of writing object-oriented applications
with persistent data content.

The COOL framework includes GEN which generates C/C++
code for a hybrid ORDB, and LCP which supports method
delegation between prototype object instances by interpreting
a database of function names and/or function pointers.
Understanding ORDBs requires more details about
database architecture (more slides).
Obj-RelDBv2.ppt - RJL 020117 - # 1
Relational Databases




An RDB is a set of ‘tuples’; each tuple represents a simple
object with scalar attributes. Tuples are stored externally as
records in a file and viewed conceptually as rows of a table, or
geometrically as points in a multi-typed coordinate space.
Complex structured data types (and object instances) are
decomposed or ‘normalized’ into simple parts, giving First
Normal Form (1NF): no structured attributes or repeating
groups are allowed.
For maintenance and reliability reasons, the design is further
normalized to Third Normal Form (3NF): there are no redundant
or indirectly computable field values, and all properties are
stored in only one place. [Ref: Sanders Ch. 3 and Appendix A.]
Other database types include object-oriented OODB’s, and
object-relational ORDB’s. (next slide)
Composite pkeys in RDB’s



Every tuple must have a unique field (or set of fields) called
its primary key (pkey) which uniquely identifies it.
A composite pkey for a child or component tuple is often
built by concatenating multiple key fields from a chain of
ancestors. This complicates pkey-to-fkey matching.
Example: Dept--->Course--->Section (ERD on next slide)
– CS has Dept# = 91 and OOAD has course# = 91.522.
– Almost every course has a section #201, so 201 is a unique
identifier only within the child set of sections of a particular course,
just as 522 is a unique course# only within a particular
Department.
– (In my syllabus I renamed this 01f522; Dept 91 is assumed.
01f adds a new ‘term = Fall 2001’ component to this identifier.
I teach only one section of CS Dept courses over multiple terms.)
Composite Pkeys (Example)


Example: CS Dept View of SIS Database ERD:
The unique pkey which selects my section of OOAD in the
Student Information System (SIS) Database is a composite
of Dept, Course and Section number: 91.522.201.
Department pkey: 91
Course pkey: 91+522
Section pkey: 91+522+201
(This is an ‘instance diagram’, not an ERD.
It shows field values in a single table row,
whereas an ERD shows only entity types.)
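As a rough illustration of the composite pkey above, the C++ sketch below models a Section key as three components that must all match. The names SectionKey and toDotted are hypothetical (they are not part of SIS or GEN), and positive component values are assumed.

```cpp
#include <cassert>
#include <string>
#include <tuple>

// Hypothetical composite key for a Section tuple: Dept + Course + Section.
struct SectionKey {
    int dept;     // e.g. 91
    int course;   // e.g. 522, unique only within its department
    int section;  // e.g. 201, unique only within its course
};

// Two composite keys match only if every component matches.
inline bool operator==(const SectionKey& a, const SectionKey& b) {
    return std::tie(a.dept, a.course, a.section) ==
           std::tie(b.dept, b.course, b.section);
}

// Render the key in dotted form, e.g. "91.522.201".
inline std::string toDotted(const SectionKey& k) {
    assert(k.dept > 0 && k.course > 0 && k.section > 0);  // assumed precondition
    return std::to_string(k.dept) + "." + std::to_string(k.course) + "." +
           std::to_string(k.section);
}
```

Every pkey-to-fkey match has to compare all three fields, which is the complication the previous slide points out.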
Surrogate Keys in RDB’s


The unique pkey which selects this section of OOAD in the
Student Information System (SIS) Database is a composite of
Dept, Course and Section number: 91.522.201.
For IBM’s RDB, E. F. Codd advocated a hidden ‘surrogate’
pkey to replace the user-defined composite keys. This
improves code quality and performance (by expediting the
fundamental RDB operation ‘join’: match pkeys to fkeys).





Example: Entity with old and new key name and value:
Entity    alternate (old pkey)    surrogate (name = value)
Dept      deptNo = 91             DEid = DE000001
Course    courseNo = 91+522       COid = CO000220
Section   sectNo = 91+522+201     SEid = SE002601
RDB with Surrogate pkeys:





A GEN Example: CS Dept View of SIS Database ERD:
Entity    alternate (old pkey)    surrogate (name = value)
Dept      deptNo = 91             DEid = DE000001
Course    courseNo = 91+522       COid = CO000220
Section   sectNo = 91+522+201     SEid = SE002601

Department (DE): pkey DE000001, deptNo 91
Course (CO): pkey CO000220, fkey DE000001, courseNo 522
Section (SE): pkey SE002601, fkey CO000220, sectNo 201
(Note that the fkey only references the immediate
ancestor or container of an object or tuple.)
A Persistence Requirement (WHY?):
Each table has a mnemonic
abbreviation (DE, CO, SE) encoded
into the pkey value of its objects.
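A minimal sketch of the surrogate-key layout, assuming string-valued keys as they appear in the example values. The record layouts, the Sis container and compositeId are hypothetical names for illustration, not GEN output. Each tuple stores an fkey to its immediate parent only, and the old composite identifier is recovered by following fkeys upward.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical record layouts: own surrogate pkey, fkey to the immediate
// parent only, and the locally unique number.
struct Dept    { std::string DEid; int deptNo; };
struct Course  { std::string COid; std::string DEid; int courseNo; };
struct Section { std::string SEid; std::string COid; int sectNo; };

struct Sis {
    std::map<std::string, Dept> depts;        // keyed by DEid
    std::map<std::string, Course> courses;    // keyed by COid
    std::map<std::string, Section> sections;  // keyed by SEid
};

// Rebuild the old composite identifier by following fkeys upward:
// Section -> Course -> Dept. Each step is a single-key lookup (a join).
// std::map::at throws std::out_of_range if a key is missing.
inline std::string compositeId(const Sis& db, const std::string& SEid) {
    assert(SEid.compare(0, 2, "SE") == 0);  // table mnemonic encoded in the key
    const Section& se = db.sections.at(SEid);
    const Course& co = db.courses.at(se.COid);
    const Dept& de = db.depts.at(co.DEid);
    return std::to_string(de.deptNo) + "." + std::to_string(co.courseNo) +
           "." + std::to_string(se.sectNo);
}
```

Each join step compares one key value, instead of the two or three fields a composite key would need.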
Surrogate Keys in COOL/GEN

GEN uses surrogate pkeys and matching
fkeys, but does not hide them. (OK for
CAD/CASE tools with hi-tech users.)

Pkeys can never be re-used for new objects,
as long as fkeys exist that can reference their
former object (in old but still-in-use
database versions).
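One simple way to guarantee that pkeys are never re-used is a counter that only moves forward. The sketch below is an assumption about how such an allocator could look, not GEN's actual mechanism; the class name KeyAllocator and the prefix-plus-six-digits format are modeled on the example values above. In a persistent setting the counter itself would also have to be saved with the database.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical allocator for one table's surrogate keys. The counter only
// moves forward and there is no 'release' operation, so the key of a
// deleted object is never handed out again.
class KeyAllocator {
public:
    explicit KeyAllocator(const std::string& prefix)
        : prefix_(prefix), next_(1) {
        assert(prefix.size() == 2);  // two-letter table mnemonic, e.g. "DE"
    }

    // Return the next unused key, formatted as prefix + 6 digits.
    std::string allocate() {
        char digits[32];
        std::snprintf(digits, sizeof digits, "%06lu", next_++);
        return prefix_ + digits;
    }

private:
    std::string prefix_;
    unsigned long next_;
};
```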
Persistent Object Identifiers



C++ and Java objects have an object-id (oid), typically
represented by its virtual memory address. This oid
corresponds at least conceptually to the pkey of an RDB
tuple. This type of oid is not visible and not persistent,
because it disappears when the program terminates.
One way to avoid loss of information and achieve
persistence is to have the RDBMS take over or duplicate
OS memory-mapping functions: moving large segments
of virtual memory to/from mass storage in a fail-safe
manner.
Another way to achieve persistence is to convert
pkey/fkey relationships to/from object references during
import/export data flows. (This is done by COOL/GEN.)
Persistent Databases



Persistence means that pkeys and fkeys are preserved
during export to mass storage or remote sites and re-import
by the same or another DataBase Management System (DBMS).
A relational database (RDB) supports inter-object
relationships by foreign key (fkey) fields. These are
both user-visible and persistent: they get saved in mass
storage if the program terminates.
The process of mapping RDB pkey-fkey associations to
and from C++ pointers is called ‘pointer swizzling’.
(Diagram: Database in Mass Storage --import--> Database in Main Memory;
Database in Main Memory --export--> Database in Mass Storage.)
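Pointer swizzling can be sketched as a pair of passes over the in-memory tuples. This is a minimal illustration under stated assumptions: a single tuple type, integer keys with 0 standing for null, and hypothetical names Node, swizzle and unswizzle. It is not the code GEN generates.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical in-memory tuple: the persistent fkey is kept, and a transient
// parent pointer is filled in on import.
struct Node {
    long id;        // surrogate pkey (persistent)
    long parentId;  // fkey to the parent, 0 meaning null (persistent)
    Node* parent;   // swizzled pointer (valid only in memory)
};

// Import step: convert each fkey into a direct pointer.
// Returns false if some non-null fkey matches no pkey.
// The pointers stay valid only while the vector is not resized.
inline bool swizzle(std::vector<Node>& nodes) {
    std::map<long, Node*> byId;
    for (Node& n : nodes) byId[n.id] = &n;
    for (Node& n : nodes) {
        n.parent = nullptr;
        if (n.parentId == 0) continue;
        auto it = byId.find(n.parentId);
        if (it == byId.end()) return false;
        n.parent = it->second;
    }
    return true;
}

// Export step: regenerate fkeys from pointers, then drop the pointers.
inline void unswizzle(std::vector<Node>& nodes) {
    for (Node& n : nodes) {
        n.parentId = n.parent ? n.parent->id : 0;
        n.parent = nullptr;
    }
}
```

After swizzle, following n.parent replaces a key lookup; after unswizzle, only the persistent key fields carry the relationship.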
Referential Integrity




The principle of ‘Referential Integrity’:
– To maintain valid database content, all fkey values
must match the unique primary key or object
identifier of another tuple, or else have the reserved
‘null’ (unknown or undefined) value.
N-ary relations (N-way associations) can be implemented
by a new associative entity, whose tuples contain exactly
N fkeys (plus optional non-key attributes).
Most relations are binary (N = 2). Note that fkeys may
refer to the same or different types.
Example: see next slide
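The referential integrity rule can be stated as a check over all tuples. The sketch below assumes integer keys and uses 0 as the reserved null value; the names Tuple, kNull and referentiallyIntact are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <vector>

const long kNull = 0;  // assumed encoding of the reserved 'null' fkey value

// Hypothetical tuple: one pkey and any number of fkeys.
struct Tuple {
    long pkey;
    std::vector<long> fkeys;
};

// True if every fkey is either null or equal to the pkey of some tuple.
inline bool referentiallyIntact(const std::vector<Tuple>& db) {
    std::set<long> pkeys;
    for (const Tuple& t : db) pkeys.insert(t.pkey);
    for (const Tuple& t : db)
        for (long fk : t.fkeys)
            if (fk != kNull && pkeys.count(fk) == 0) return false;
    return true;
}
```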
N-ary Relation (ERD Styles)



N-ary relations are many-to-many associations among N
object instances (of the same or different types).
N-way associations can be implemented by introducing a new
associative entity, whose tuples contain exactly N fkeys (plus
optional non-key attributes). (Most relations are binary: N = 2).
The diamond indicates a ternary relation among types AA, BB
and CC. [It is superfluous if N=2, if the relation is one to many, or if
an associative entity replaces it.]
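Example for N=3:
(Diagram: entity types AA, BB and CC joined by a diamond, shown beside
the equivalent design in which AA, BB and CC each link to a new entity
AABBCC with 3 fkeys inside and the optional attributes.)
New Entity AABBCC gives
these attributes a home, and replaces the diamond.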
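A sketch of the associative entity for N = 3, assuming integer keys. The weight attribute and the ccPartners helper are hypothetical and serve only to show where non-key attributes live and how the relation is queried.

```cpp
#include <cassert>
#include <vector>

// Hypothetical tuple of the associative entity AABBCC: exactly N = 3 fkeys
// plus one illustrative non-key attribute.
struct AABBCC {
    long AAid;
    long BBid;
    long CCid;
    int weight;  // assumed optional attribute, not from the slides
};

// Select the CC partners associated with a given (AA, BB) pair.
inline std::vector<long> ccPartners(const std::vector<AABBCC>& rel,
                                    long AAid, long BBid) {
    std::vector<long> out;
    for (const AABBCC& t : rel)
        if (t.AAid == AAid && t.BBid == BBid) out.push_back(t.CCid);
    return out;
}
```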
Extended ER Diagrams

When an RDB implements an Extended ERD (EERD), a
tuple’s fkeys or inter-object cross-references can identify
either a super-class object or an associated parent or
container object (instance of a class).
– Both types of fkeys share the same integer key value range,
although they have distinct semantic meaning.
To improve readability, EERD’s should use different styles
for inheritance than for instance-level associations.
(Diagram: entity types AA, BB and CC, with a 0..* multiplicity on one link.)
In this example, CC both
inherits from AA and is a
component of the
composite entity BB.
It contains two fkeys,
(say) AAid and BBid.
Multiple Inheritance on EERD’s




Multiple inheritance requires an fkey to each superclass
object whose properties (attributes or methods) are
inherited.
In a prototype implementation of multiple inheritance,
superclass object[s] actually exist apart from their
corresponding subclass object[s]. Each sub-object has fkeys
to each of its direct ancestor objects.
For a C or C++ implementation, only one of possibly
divergent inheritance hierarchies can be mapped into precompiled method inheritance. Avoid divergence if possible!
For an ORDB, fkeys also support dynamic mapping of
method inheritance. The COOL/LCP interpreter
implements such a dynamic map (from a concrete object to its
generic Active Instance, from object class to generic Active Class).
ORDB via Prototype Delegation




An Extended ERD (EERD) can be implemented as either a
relational RDB, object-oriented OODB, or object-relational
ORDB. An OODB is supported by its own class-based data
representations.
An ORDB can be class-based or prototype-based with
delegation. (GEN is prototype-based.)
Prototype delegation does not rely on Class membership for
method inheritance - it creates object-level relationships to
support method delegation: ANY client object can ‘delegate’
any of its behavior to another server object via the oid
equivalent of an fkey.
To make disciplined use of delegation requires some policy
other than anarchy.
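Prototype delegation can be sketched as a method table per object plus a link to a server object. This is a generic illustration, not the COOL/LCP interpreter: the names Proto and send are hypothetical, methods are keyed by name, and the delegation chain is assumed to be acyclic (one example of a policy that avoids anarchy).

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical prototype object: a table of named methods plus an optional
// delegate that is consulted when a method is missing locally.
struct Proto {
    std::map<std::string, std::function<std::string()>> methods;
    const Proto* delegate = nullptr;  // oid equivalent of an fkey
};

// Resolve a method by walking the delegation chain from the receiver.
// Returns "" if no object in the chain handles the name.
// Assumes the chain is acyclic; a cycle would loop forever.
inline std::string send(const Proto& receiver, const std::string& name) {
    for (const Proto* p = &receiver; p != nullptr; p = p->delegate) {
        auto it = p->methods.find(name);
        if (it != p->methods.end()) return it->second();
    }
    return "";
}
```

No class membership is involved: which behavior an object gets depends only on which object its delegate link points to, and that link can be changed at run time.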
GEN Database: Persistence
Our GEN tool imports an external RDB to a memory-resident
object-relational database (ORDB):
– Its external persistent RDB format is a union of records
representing tuples of different types.
– During import, fkeys are augmented or replaced by parent
and first-child and next-sibling object reference pointers,
which follow strict GEN naming conventions.
– During export, pointers are removed but fkeys are preserved
or restored for persistent storage in external RDB tuples.
GEN Database: Schema Constraints

The external RDB schema (or EER Diagram) is first
converted to Third Normal Form.

Other attributes that would normally comprise a user-defined
(and typically composite) primary key can be
removed during schema or EERD conversion to Third
Normal Form.

This eliminates redundant attributes that functionally
depend on some fkey instead of the pkey attribute.
GEN Database: External Format





Our GEN tool imports an external RDB to a memory-resident
object-relational database (ORDB):
Its external RDB format is a union of records representing
tuples of different types:
Every tuple record has an integral and immutable
‘surrogate’ primary key attribute (and object id).
Different tuple types have pairwise disjoint pkey ranges.
All foreign keys (fkeys) use this surrogate pkey value to
refer to their parent (container or superclass) record type.
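Because pkey ranges are pairwise disjoint, a key value alone identifies the tuple type it refers to. The sketch below illustrates that idea only; the concrete ranges and the names KeyRange and typeOfKey are assumptions, since the slides do not give GEN's actual range assignment.

```cpp
#include <cassert>
#include <string>

// Inclusive key range owned by one tuple type.
struct KeyRange {
    const char* type;
    long lo;
    long hi;
};

// Hypothetical range assignment; the actual GEN ranges are not given here.
const KeyRange kRanges[] = {
    {"DE", 1000000, 1999999},
    {"CO", 2000000, 2999999},
    {"SE", 3000000, 3999999},
};

// Return the type mnemonic owning this key, or "" if no range contains it.
inline std::string typeOfKey(long key) {
    for (const KeyRange& r : kRanges)
        if (key >= r.lo && key <= r.hi) return r.type;
    return "";
}
```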
GEN Database: Internal Format




During import, fkeys are augmented or replaced by direct
parent object pointers plus first-child and next-sibling
object reference pointers. These are constructed from fkey
names following strict GEN naming conventions.
This results in an internal ORDB format which is a set of
multiply-threaded linked lists of parent-to-children and
super-to-subclass object (tuple instance) reference pointers.
Parent-pointers support direct access to parent table
attributes, replacing pair-wise join queries in an RDB.
For each 1-to-many parent-child relationship, chgen
provides a child_loop macro while gencpp provides a foreach iterator.
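The parent, first-child and next-sibling threading can be sketched as follows. This is a generic illustration with hypothetical names (Obj, attach, childIds); it does not reproduce the child_loop macro or the foreach iterator that chgen and gencpp provide, and it handles a single parent-child relationship.

```cpp
#include <cassert>
#include <vector>

// Hypothetical in-memory object with the three reference pointers.
struct Obj {
    long id;
    Obj* parent = nullptr;
    Obj* firstChild = nullptr;
    Obj* nextSibling = nullptr;
    explicit Obj(long i) : id(i) {}
};

// Link a child under a parent by pushing it on the front of the child list.
inline void attach(Obj& parent, Obj& child) {
    assert(child.parent == nullptr);  // one parent per relationship
    child.parent = &parent;
    child.nextSibling = parent.firstChild;
    parent.firstChild = &child;
}

// Visit every child of a parent by following the sibling thread.
inline std::vector<long> childIds(const Obj& parent) {
    std::vector<long> ids;
    for (const Obj* c = parent.firstChild; c != nullptr; c = c->nextSibling)
        ids.push_back(c->id);
    return ids;
}
```

Reading a parent attribute from a child is then a single pointer dereference (child.parent->id) rather than a join.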
GEN Database: Import/Export
GEN creates two schema-based import/export utilities:
– pr_load parses tuples and imports an external RDB into a
memory-resident object-relational database (ORDB);
– pr_dump exports the modified ORDB back to the persistent
external RDB.
– During import, fkeys are augmented or replaced by direct
parent pointers plus first-child and next-sibling object
reference pointers. These are constructed from fkey names
following strict GEN naming conventions. Super- and subclass
objects are also connected in the same way.
– This results in an internal ORDB format which is a set of
multiply-threaded linked lists from each parent through each
of its child-sets, which supports parent-child JOINs.
Importing RDB’s to C++/Java




If the RDB is imported to an object-relational database
implemented in C++ or Java, then during import the fkey
fields of RDB tuple types should be converted to
corresponding C++/Java object reference types.
Caveat/pre-condition: All fkeys implied by links on the
RDB’s data model or EERD must conform to inheritance
and type constraints of the language (C++ or Java).
Fkeys in an RDB can also support non-exhaustive or overlapping subclasses (going beyond C++ constraints).
Fkeys and object references can also support dynamic
migration (of an object among the subclasses of its class).
– Example: An object may make transitions among OLC
states (states become subclasses of the object’s class).
Object-Relational Databases: Prototypes and Delegation
The last few slides were inspired by Shlaer-Mellor-User
Group email related to Divergent Inheritance (parallel
hierarchies). This motivates the use of prototypes and
delegation to explain the static information architecture that is
supported by COOL’s chGEN/GENcpp code generator, and
illustrates concurrent sub-state machine models for dynamic
behavior.
– To: [email protected]
– Subject: Re: (SMU) Polymorphic events and other
paranormal activity
– Message 10/734 From [email protected]
– Sep 04, 01 08:45:33 AM
– responding to Fontana: . . .
Divergent Hierarchies




responding to Fontana:
> I think Jay was driving at divergent hierarchies, not multiple
inheritance, eg:
> relationship S1 - supertype Dog, subtypes BigDog and SmallDog
> relationship S2 - supertype Dog, subtypes BlackDog and WhiteDog
DOG CLASS
Relationship S1 (BLACK xor WHITE; mutex and exhaustive): Black Dog, White Dog
Relationship S2 (BIG xor SMALL; mutex and exhaustive): Big Dog, Small Dog
Divergent Hierarchies Example:
> relationship S1 - supertype Dog, subtypes BigDog xor SmallDog
> relationship S2 - supertype Dog, subtypes BlackDog xor WhiteDog
OLC’s with Concurrent Sub-states








> Assume each of the 4 subclasses has its own ‘object lifecycle’ (OLC):
> BigDog: Woofing <--> Sleeping
> SmallDog: Yipping <--> Skittering
> BlackDog: Panting <--> Drooling
> WhiteDog: Shedding <--> Scratching
> Now create one instance of Dog - let's say it is a big black dog, with a
> dogId = 13. It must be in one of the BigDog states (Woofing or Sleeping),
> AND in one of the BlackDog states (Panting or Drooling).
Dog #13 (Big and Black):
Big: Woofing <--> Sleeping
Black: Panting <--> Drooling
Merging OLC Behaviors of
Concurrent Subclasses:



Each of the 4 subclasses has its own ‘object lifecycle’ (OLC);
E.g. every Big&Black Dog must be in one of the BigDog states (Woofing
or Sleeping), AND in one of the BlackDog states (Panting or Drooling).
Dog #13 (Big and Black) has the behavior/activity of both BigDogs and
BlackDogs:
                 Black Dog OLC:
Big Dog OLC:     Panting        Drooling
Woofing          Woof&Pant      Woof&Drool
Sleeping         Sleep&Pant     Sleep&Drool
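The merged behavior is the product of the two lifecycles: the dog holds one state per OLC, and each OLC changes state independently. The sketch below is a minimal illustration with hypothetical names (BigBlackDog, toggleBig, toggleBlack, combinedState); the state names come from the example.

```cpp
#include <cassert>
#include <string>

enum class BigState { Woofing, Sleeping };
enum class BlackState { Panting, Drooling };

// Hypothetical big black dog: one current state per concurrent OLC.
struct BigBlackDog {
    int dogId;
    BigState big;
    BlackState black;
};

// Each lifecycle has two states, so a transition is a toggle.
inline void toggleBig(BigBlackDog& d) {
    d.big = (d.big == BigState::Woofing) ? BigState::Sleeping
                                         : BigState::Woofing;
}

inline void toggleBlack(BigBlackDog& d) {
    d.black = (d.black == BlackState::Panting) ? BlackState::Drooling
                                               : BlackState::Panting;
}

// Name of the merged state, one of the four product states.
inline std::string combinedState(const BigBlackDog& d) {
    std::string s = (d.big == BigState::Woofing) ? "Woof" : "Sleep";
    s += "&";
    s += (d.black == BlackState::Panting) ? "Pant" : "Drool";
    return s;
}
```

Keeping two small state variables avoids writing out the four merged states by hand; the product grows multiplicatively as more concurrent subclasses are added.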
Divergent Hierarchies - revisited (1)
DOG CLASS
Partition S1 (BLACK xor WHITE; mutex and exhaustive): Black Dog, White Dog
Partition S2 (BIG xor SMALL; mutex and exhaustive): Big Dog, Small Dog
C++ does not support divergent class hierarchies.
One alternative is prototype objects with delegation.
RDB’s can support prototypes and delegation:
In our example, each dog object belongs to one subclass for color,
and simultaneously to another subclass for size.
That is, a ‘real’ dog object simultaneously belongs to, and inherits
from, exactly one of the subclasses in each inheritance tree above.
The next slide shows (by its messiness) that multiple inheritance is
best avoided.
Divergent Hierarchies - revisited (2)
DOG CLASS
Partition S1 (BLACK xor WHITE; mutex and exhaustive): Black Dog, White Dog
Partition S2 (BIG xor SMALL; mutex and exhaustive): Big Dog, Small Dog
Leaf classes: Big Black Dog, Big White Dog, Small Black Dog, Small White Dog
Level 3 includes concrete ‘leaf’ objects or ‘real’ dogs, which
simultaneously belong to a distinct pair of subclasses at level 2 of
the inheritance tree (compositional inheritance of properties).
So there are really 4 leaf classes at level 3, below level 2 above.
Each leaf class instance at level 3 has exactly two paths up to level
1; both paths must end up at the same root object (Dog instance).
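The three levels can be sketched with prototype-style objects linked by pointers (the in-memory equivalent of fkeys). The struct names and the sameRoot check are hypothetical; the check expresses the rule that both upward paths must end at the same root Dog instance.

```cpp
#include <cassert>

struct Dog { int dogId; };                           // level 1: root object
struct ColorDog { const char* color; Dog* super; };  // level 2, color partition
struct SizeDog  { const char* size;  Dog* super; };  // level 2, size partition

// Level 3: a 'real' dog holds one reference into each partition.
struct LeafDog {
    ColorDog* color;
    SizeDog* size;
};

// Both upward paths must reach the same root Dog instance.
inline bool sameRoot(const LeafDog& d) {
    return d.color != nullptr && d.size != nullptr &&
           d.color->super != nullptr && d.color->super == d.size->super;
}
```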
Composition or Implementation
Inheritance




With compositional inheritance, dogs will inherit from two
‘component’ classes: Color and Size.
This is ‘impure’ multiple inheritance in C++ (impure
because the two ancestor classes have nothing in common
with animals, which may not behave well as clients of
Color or Size ancestor methods).
Java does not have multiple inheritance, but any class may
‘implement’ the interfaces ‘Color’ and ‘Size’ instead.
Dogs must then be eligible to inherit (C++) or implement
(Java) all the methods of the Color and Size classes: an
undesirable compromise. Overriding only hides the mismatch
between class Dog and the Color or Size classes.
References





Frank & Ulrich: “Delegation: An Important Concept for the Appropriate Design of Object Models”, JOOP June 2000 (pp. 13-17, 44)
Eliens: Principles of OO Software Dev., 2ed., AWL 2000 (Sect. 5.4: Prototypes - delegation vs. inheritance)
Kilov/Ross: Information Models, PH 1994 (Not about delegation, but covers multiple/concurrent/overlapping subclass membership.)
Lee & Tepfenhart: UML and C++: A Practical Guide to OO Dev, 2ed., PH 2001 (pp. 206-210) (Multiple Inheritance examples Fig. 12-4, 12-5)
Sanders: Data Modeling, Boyd-Fraser/ITP 1995 (Ch. 3 and Appendix)