Object-Oriented Database

• New Database Applications
• Object-Oriented Data Models
• Object-Oriented Languages
• Persistent Programming Languages
• Persistent C++ Systems
CIS-552
Introduction
1
New Database Applications
• Data models designed for data-processing-style
applications are not adequate for new technologies
such as computer-aided design, computer-aided
software engineering, multimedia and image
databases, and document/hypertext databases.
• These new applications require the database
system to handle features such as:
– Complex data types
– Data encapsulation and abstract data structures
– Novel methods for indexing and querying
Object-Oriented Data Model
• Loosely speaking, an object corresponds to an
entity in the E-R model.
• The object-oriented paradigm is based on
encapsulating code and data related to an object
into a single unit.
• The object-oriented data model is a logical model
(like the E-R model).
• It adapts the object-oriented programming
paradigm (e.g., Smalltalk, C++) to database
systems.
Object Identity
• An object retains its identity even if some or all of the
values of the variables or definitions of methods change
over time.
• Object identity is a stronger notion of identity than in
programming languages or data models not based on
object orientation.
– Value – data value; used in relational systems.
– Name – supplied by user; used for variables in procedures.
– Built-in – identity built into the data model or programming
language; no user-supplied identifier is required. This is the
form of identity used in object-oriented systems.
Object Identifiers
Object identifiers are used to uniquely identify objects.
– Can be stored as a field of an object, to refer to another
object.
– E.g., the spouse field of a person object may be an
identifier of another person object
– Can be system generated (created by database) or
external (such as social-security number)
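The spouse example can be sketched in a few lines of C++. This is a minimal illustration only: `Oid`, `ObjectStore`, and the layout of `Person` are hypothetical names chosen for the sketch, and the store is a plain in-memory map rather than a real database.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical system-generated object identifier.
using Oid = std::uint64_t;

struct Person {
    std::string name;
    std::optional<Oid> spouse;  // identifier of another Person object, if any
};

// Toy object store (an assumption of this sketch): assigns OIDs on insert
// and resolves them back to objects.
class ObjectStore {
public:
    Oid insert(Person p) {
        Oid id = next_oid_++;
        objects_.emplace(id, std::move(p));
        return id;
    }

    Person& get(Oid id) {
        auto it = objects_.find(id);
        assert(it != objects_.end() && "unknown OID");
        return it->second;
    }

private:
    Oid next_oid_ = 1;
    std::unordered_map<Oid, Person> objects_;
};
```

Because the reference is held as an identifier and not as a copy of the other object's values, changing a person's name does not disturb the spouse link: the object keeps its identity while its values change.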
Object Containment
[Figure: containment hierarchy for a bicycle design. The bicycle contains
wheel (rim, spokes, tire), brake (lever, pad, cable), gear, and frame.]
• Each component in a design may contain other components
• Can be modeled as containment of objects. Objects containing other
objects are called complex or composite objects.
• Multiple levels of containment create a containment hierarchy: links
interpreted as is-part-of, not is-a.
• Allows data to be viewed at different granularities by different users.
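A containment hierarchy can be sketched as a tree of owned components. The `Component` type and the two helper functions below are hypothetical names for this illustration; a real system would store the is-part-of links as persistent references.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// A composite object: each component owns the components it contains.
struct Component {
    std::string name;
    std::vector<std::unique_ptr<Component>> parts;  // is-part-of links

    explicit Component(std::string n) : name(std::move(n)) {}

    // Add a contained component and return a reference to it.
    Component& add(std::string part_name) {
        parts.push_back(std::make_unique<Component>(std::move(part_name)));
        return *parts.back();
    }
};

// Number of components in the hierarchy rooted at c, including c itself.
std::size_t count_components(const Component& c) {
    std::size_t n = 1;
    for (const auto& p : c.parts) {
        n += count_components(*p);
    }
    return n;
}

// Collect component names down to max_depth levels below c. A small
// max_depth gives a coarse view; a large one exposes every part.
void names_to_depth(const Component& c, int max_depth,
                    std::vector<std::string>& out) {
    out.push_back(c.name);
    if (max_depth <= 0) return;
    for (const auto& p : c.parts) {
        names_to_depth(*p, max_depth - 1, out);
    }
}
```

Calling `names_to_depth` with different depths is one simple way to model users viewing the same design at different granularities.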
Object-Oriented Languages
• Object-oriented concepts can be used as a design
tool, and be encoded into, for example, a relational
database (analogous to modeling data with E/R
diagram and then converting to a set of relations).
• The concepts of object orientation can be
incorporated into a programming language that is
used to manipulate the database.
– Object-relational systems – add complex types and
object-orientation to relational languages.
– Persistent programming languages – extend an object-oriented
programming language to deal with databases
by adding concepts such as persistence and collections.
OO-DBMS
• Save objects created by an OOP language to
disk (make objects persistent).
• Ensure that if an object is saved, all of the
objects it references are saved.
• Allow saved objects (and the objects they
reference) to be retrieved from disk.
• Provide transaction management and
concurrency control to maintain data
integrity.
Persistent Programming Language
• Persistent programming languages:
– Allow objects to be created and stored in a database without any
explicit format changes (format changes are carried out
transparently).
– Allow objects to be manipulated in-memory – do not need to
explicitly load from or store to the database.
– Allow data to be manipulated directly from the programming
language without having to go through a data manipulation
language like SQL.
• Drawbacks of persistent programming languages:
– Because most programming languages are so powerful, it is easy
to make programming errors that damage the database.
– The complexity of the languages makes automatic high-level
optimization more difficult.
– They do not support declarative querying very well.
Persistence of Objects
Approaches to make transient objects persistent include
establishing persistence by:
– Class – declare all objects of a class to be persistent;
simple but inflexible.
– Creation – extend the syntax for creating transient
objects to create persistent objects.
– Marking – an object that is to persist beyond program
execution is marked as persistent before program
termination.
– Reference – declare (root) persistent objects; objects are
persistent if they are referred to (directly or indirectly)
from a root object.
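Persistence by reference amounts to a reachability computation over the object graph. The sketch below assumes objects are identified by small integers and that references are given as an adjacency list; `ObjId`, `ObjectGraph`, and `persistent_set` are hypothetical names for the illustration.

```cpp
#include <queue>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using ObjId = int;
// Object id -> ids of the objects it references.
using ObjectGraph = std::unordered_map<ObjId, std::vector<ObjId>>;

// Returns every object reachable, directly or indirectly, from the roots.
// Under persistence by reference these are exactly the persistent objects.
std::unordered_set<ObjId> persistent_set(const ObjectGraph& graph,
                                         const std::vector<ObjId>& roots) {
    std::unordered_set<ObjId> reached;
    std::queue<ObjId> pending;
    for (ObjId r : roots) {
        if (reached.insert(r).second) pending.push(r);
    }
    while (!pending.empty()) {
        ObjId cur = pending.front();
        pending.pop();
        auto it = graph.find(cur);
        if (it == graph.end()) continue;  // object holds no references
        for (ObjId next : it->second) {
            // insert() reports whether the id is new, which also
            // keeps cycles from looping forever.
            if (reached.insert(next).second) pending.push(next);
        }
    }
    return reached;
}
```

An object that refers to a persistent object but is not itself reachable from a root stays transient.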
Object Identity and Pointers
• A persistent object is assigned a persistent object identifier.
• Degrees of permanence of identity:
– Intraprocedure – identity persists only during the
execution of a single procedure.
– Intraprogram – identity persists only during execution
of a single program or query.
– Interprogram – identity persists from one program
execution to another.
– Persistent – identity persists through program
executions and structural reorganizations of data;
required for object-oriented systems.
Object Identity and Pointers (Cont.)
• In O-O languages such as C++, an object identifier
is actually an in-memory pointer.
• Persistent pointer – persists beyond program
execution; can be thought of as a pointer into the
database.
Storage and Access of Persistent Objects
How to find objects in the database:
• Name objects (as you would name files) – does not scale to
a large number of objects.
– Names are typically given only to class extents and other
collections of objects, not to individual objects.
• Expose object identifiers or persistent pointers to the
objects – can be stored externally.
– All objects have object identifiers.
Storage and Access of Persistent Objects (Cont.)
How to find objects in the database (Cont.):
• Store collections of objects and allow programs to iterate
over the collections to find required objects.
– Model collections of objects as collection types
– Class extent – the collection of all objects belonging to
the class; usually maintained for all classes that can
have persistent objects.
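A class extent can be imitated in plain C++ by having every object register itself in a per-class collection. This is an in-memory sketch only; the `Customer` class, its `extent()` function, and `find_customer` are hypothetical and do not correspond to any particular OODBMS API.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

class Customer {
public:
    Customer(std::string name, int id) : name_(std::move(name)), id_(id) {
        extent().push_back(this);  // join the class extent on creation
    }

    ~Customer() {
        // Leave the class extent on destruction.
        auto& all = extent();
        all.erase(std::remove(all.begin(), all.end(), this), all.end());
    }

    // Copying would create objects that bypass registration, so forbid it.
    Customer(const Customer&) = delete;
    Customer& operator=(const Customer&) = delete;

    const std::string& name() const { return name_; }
    int id() const { return id_; }

    // Class extent: the collection of all live Customer objects.
    static std::vector<Customer*>& extent() {
        static std::vector<Customer*> all;
        return all;
    }

private:
    std::string name_;
    int id_;
};

// Find a required object by iterating over the class extent.
// Returns nullptr when no customer has the given id.
Customer* find_customer(int id) {
    for (Customer* c : Customer::extent()) {
        if (c->id() == id) return c;
    }
    return nullptr;
}
```

A linear scan like this is the simplest way to use an extent; a real system would normally add indexes over it.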
Persistent C++ System
• The C++ language allows support for persistence to be
added without changing the language:
– Declare a class called Persistent_Object with
attributes and methods to support persistence.
– Overloading – the ability to redefine standard function names
and operators (e.g., +, -, and the pointer dereference
operator ->) when applied to new types.
• Providing persistence without extending the C++
language is
– relatively easy to implement
– but more difficult to use
ODMG C++ Object Definition Language
• Standardized language extensions to C++ to support persistence
• ODMG standard attempts to extend C++ as little as possible, providing
most functionality via template classes and class libraries
• Template class Ref<class> is used to specify references (persistent
pointers).
• Template class Set<class> is used to define sets of objects. It provides
methods such as insert_element and delete_element.
• The C++ object definition language (ODL) extends the C++ type
definition syntax in minor ways.
Example: use the notation inverse to specify referential integrity
constraints.
ODMG C++ ODL: Example
class Person : public Persistent_Object {
public:
    String name;
    String address;
};
class Customer : public Person {
public:
    Date member_from;
    int customer_id;
    Ref<Branch> home_branch;  // persistent pointer to a Branch
    // ODL extension: accounts is kept consistent with Account::owners
    Set<Ref<Account>> accounts inverse Account::owners;
};
ODMG C++: Example (Cont.)
class Account : public Persistent_Object {
private:
    int balance;
public:
    int number;
    // ODL extension: owners is kept consistent with Customer::accounts
    Set<Ref<Customer>> owners inverse Customer::accounts;
    int find_balance();
    int update_balance(int delta);
};
ODMG C++ Object Manipulation Language
• Uses persistent versions of C++ operators such as
new(db).
    Ref<Account> account = new(bank_db) Account;
new allocates the object in the specified database, rather than
in memory.
• The dereference operator ->, when applied to a
Ref<Customer> object, loads the referenced object in memory (if not
already present) and returns an in-memory pointer to the object.
• Constructor for a class – a special method to initialize
objects when they are created; called automatically when
new is executed
• Destructor for a class – a special method that is called
when objects in the class are deleted.
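The load-on-dereference behaviour can be imitated in standard C++ by overloading `operator->`. The sketch below is not the ODMG `Ref<T>`: `ToyDatabase`, `ToyRef`, and the integer object ids are hypothetical stand-ins, and the "database" is an in-memory map that counts how often an object is loaded.

```cpp
#include <memory>
#include <unordered_map>

// Example persistent type for the sketch.
struct Account {
    int number;
    int balance;
};

// Stand-in for a database: maps a hypothetical object id to stored state
// and counts how many times an object has been loaded.
template <typename T>
class ToyDatabase {
public:
    void store(int oid, const T& value) { disk_[oid] = value; }

    T load(int oid) {
        ++loads_;
        return disk_.at(oid);  // throws std::out_of_range for unknown ids
    }

    int loads() const { return loads_; }

private:
    std::unordered_map<int, T> disk_;
    int loads_ = 0;
};

// Illustrative persistent reference. The first dereference loads the
// object into memory; later dereferences reuse the in-memory copy.
template <typename T>
class ToyRef {
public:
    ToyRef(ToyDatabase<T>& db, int oid) : db_(&db), oid_(oid) {}

    T* operator->() {
        if (!cached_) {
            cached_ = std::make_unique<T>(db_->load(oid_));
        }
        return cached_.get();
    }

private:
    ToyDatabase<T>* db_;
    int oid_;
    std::unique_ptr<T> cached_;
};
```

The sketch leaves out writing modified objects back to the database, which a real system would do at transaction commit.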
ODMG C++ OML: Example
int create_account_owner(String name, String address) {
    Database * bank_db;
    bank_db = Database::open("Bank-DB");
    Transaction Trans;
    Trans.begin();
    // Allocate both objects in the database, not in transient memory.
    Ref<Account> account = new(bank_db) Account;
    Ref<Customer> cust = new(bank_db) Customer;
    cust->name = name;
    cust->address = address;
    // Link the customer and the account in both directions.
    cust->accounts.insert_element(account);
    account->owners.insert_element(cust);
    // … Code to initialize customer_id, account number, etc.
    Trans.commit();
}
ODMG C++ OML: Example of Iterators
int print_customers() {
    Database * bank_db;
    bank_db = Database::open("Bank-DB");
    Transaction Trans;
    Trans.begin();
    // Iterate over the class extent of Customer.
    Iterator<Ref<Customer>> iter =
        Customer::all_customer.create_iterator();
    Ref<Customer> p;
    while (iter.next(p)) {
        print_cust(p);
    }
    Trans.commit();
}
• The iterator construct helps step through the objects in a collection.
Mapping of Objects to Files
• Mapping objects to files is similar to mapping tuples to
files in a relational system; object data can be stored using
file structures.
• Objects in O-O databases may lack uniformity and may be
very large; such objects have to be managed differently
from records in a relational system.
– Set fields with a small number of elements may be implemented
using data structures such as linked lists.
– Set fields with a larger number of elements may be implemented as
B-trees, or as separate relations in the database.
– Set fields can also be eliminated at the storage level by
normalization.
Mapping of Objects to Files (Cont.)
• Objects are identified by an object identifier
(OID); the storage system needs a mechanism to
locate an object given its OID.
– logical identifiers do not directly specify an object’s
physical location; must maintain an index that maps an
OID to the object’s actual location.
– physical identifiers encode the location of the object
so the object can be found directly. Physical OIDs
typically have the following parts:
1. a volume or file identifier
2. a page identifier within the volume or file
3. an offset within the page
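The difference between the two kinds of identifier can be sketched as follows. `PhysicalOid` mirrors the three parts listed above; `LogicalOid` and `OidIndex` are hypothetical names, and the index is a plain hash map standing in for whatever index structure a real storage system would use.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>

// Physical OID: encodes where the object lives.
struct PhysicalOid {
    std::uint32_t volume;  // volume or file identifier
    std::uint32_t page;    // page identifier within the volume or file
    std::uint32_t offset;  // offset within the page
};

// Logical OID: an opaque value that says nothing about location.
using LogicalOid = std::uint64_t;

// Index that maps a logical OID to the object's current location.
class OidIndex {
public:
    // Record (or update) where the object currently lives.
    void place(LogicalOid id, PhysicalOid where) { index_[id] = where; }

    // Look up the current location; empty if the OID is unknown.
    std::optional<PhysicalOid> locate(LogicalOid id) const {
        auto it = index_.find(id);
        if (it == index_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::unordered_map<LogicalOid, PhysicalOid> index_;
};
```

The extra lookup is the price of logical identifiers; in return, an object can be moved by updating one index entry while every stored reference to it stays valid.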
Management of Persistent Pointers
• Physical OIDs may have a unique identifier. This identifier
is stored in the object also and is used to detect references
via dangling pointers.
(a) General structure
    Physical object identifier:  Vol. | Page | Offset | Unique-Id
    Object:                      Unique-Id | Data ……
(b) Example of use
    Object at location 6.32.45608:  Unique-Id 51, Data … data …
    Good OID:  location 6.32.45608, Unique-Id 51
    Bad OID:   location 6.32.45608, Unique-Id 50
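The unique-id check in the example can be sketched as below. `Location`, `StoredObject`, and `Storage` are hypothetical names for the illustration; the storage is an in-memory map keyed by location.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Where an object lives: volume, page, and offset.
struct Location {
    std::uint32_t volume;
    std::uint32_t page;
    std::uint32_t offset;
};

// Ordering so that Location can be used as a std::map key.
inline bool operator<(const Location& a, const Location& b) {
    return std::tie(a.volume, a.page, a.offset) <
           std::tie(b.volume, b.page, b.offset);
}

// Physical OID: location plus the unique id expected at that location.
struct PhysicalOid {
    Location loc;
    std::uint32_t unique_id;
};

// The unique id is stored in the object as well.
struct StoredObject {
    std::uint32_t unique_id;
    std::string data;
};

class Storage {
public:
    void put(const Location& loc, StoredObject obj) {
        slots_[loc] = std::move(obj);
    }

    // Returns the object only when the unique id in the OID matches the
    // one stored in the object; nullptr signals a dangling pointer.
    const StoredObject* dereference(const PhysicalOid& oid) const {
        auto it = slots_.find(oid.loc);
        if (it == slots_.end()) return nullptr;
        if (it->second.unique_id != oid.unique_id) return nullptr;
        return &it->second;
    }

private:
    std::map<Location, StoredObject> slots_;
};
```

If the space at a location is reused for a new object, the new object gets a different unique id, so an old OID that still points at that location is rejected.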
Management of Persistent Pointers (Cont.)
• Implement persistent pointers using OIDs; persistent pointers are
substantially longer than are in-memory pointers
• Pointer swizzling cuts down on cost of locating persistent objects
already in memory.
• Software swizzling (swizzling on pointer dereference)
– When a persistent pointer is first dereferenced, it is swizzled
(replaced by an in-memory pointer) after the object is located in
memory.
– Subsequent dereferences of the same pointer become cheap
– The physical location of an object in memory must not change if
swizzled pointers point to it; the solution is to pin pages in
memory
– When an object is written back to disk, any swizzled pointers it
contains need to be unswizzled.
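Swizzling on dereference can be sketched as follows. The names `SwizzlablePtr`, `Node`, and `BufferPool` are hypothetical. As a simplification, the pointer field keeps the OID next to the in-memory address, whereas a real system would typically overwrite the OID in place and recover it when unswizzling.

```cpp
#include <cstdint>
#include <unordered_map>

using Oid = std::uint64_t;

struct Node;

// A pointer field that is either unswizzled (only the OID is valid) or
// swizzled (mem holds the in-memory address of the target).
struct SwizzlablePtr {
    Oid oid = 0;
    Node* mem = nullptr;

    bool swizzled() const { return mem != nullptr; }
};

struct Node {
    int value;
    SwizzlablePtr next;
};

// Tracks which persistent objects are resident in memory.
class BufferPool {
public:
    void add(Oid oid, Node* n) { resident_[oid] = n; }

    // The first dereference pays for the OID lookup and swizzles the
    // pointer; later dereferences return the cached address directly.
    Node* deref(SwizzlablePtr& p) {
        if (!p.swizzled()) {
            ++lookups_;
            p.mem = resident_.at(p.oid);  // throws if not resident
        }
        return p.mem;
    }

    // Before an object is written back to disk, its pointer fields must
    // return to OID form.
    static void unswizzle(SwizzlablePtr& p) { p.mem = nullptr; }

    int lookups() const { return lookups_; }

private:
    std::unordered_map<Oid, Node*> resident_;
    int lookups_ = 0;
};
```

The sketch assumes resident objects never move, which is what pinning pages in memory guarantees.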
Hardware Swizzling
• Persistent pointers in objects need the same amount of
space as in-memory pointers – extra storage external to the
object is used to store rest of pointer information.
• Uses virtual memory translation mechanism to efficiently
and transparently convert between persistent pointers and
in-memory pointers.
• All persistent pointers in a page are swizzled when the
page is first read in.
– Thus programmers have to work with just one type of
pointer, i.e. in-memory pointer.
– Some of the swizzled pointers may point to virtual memory
addresses that are currently not allocated any real memory.
Hardware Swizzling (Cont.)
• Persistent pointer is conceptually split into two parts: a
page identifier, and an offset within the page.
– The page identifier in a pointer is a short indirect
pointer: each page has a translation table that provides a
mapping from the short page identifiers to full database
page identifiers.
– Translation table for a page is small (at most 1024
pointers in a 4096 byte page with 4 byte pointers)
– Multiple pointers in a page to the same page share same
entry in the translation table.
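The size bound and the sharing of entries can be checked with a short sketch. `ShortPointer` and `translation_entries` are hypothetical names, and the two 32-bit fields are chosen only for readability; the text does not say how the bits of a 4-byte pointer are divided between page identifier and offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

constexpr std::size_t kPageBytes = 4096;
constexpr std::size_t kPointerBytes = 4;

// A page filled entirely with pointers holds 4096 / 4 = 1024 of them,
// so its translation table never needs more than 1024 entries.
constexpr std::size_t kMaxPointersPerPage = kPageBytes / kPointerBytes;
static_assert(kMaxPointersPerPage == 1024, "4096-byte page, 4-byte pointers");

// Illustrative view of a persistent pointer inside a page.
struct ShortPointer {
    std::uint32_t page_id;  // short page identifier
    std::uint32_t offset;   // offset within the target page
};

// Number of translation-table entries needed for the pointers in one
// page: pointers to the same page share a single entry.
std::size_t translation_entries(const std::vector<ShortPointer>& ptrs) {
    assert(ptrs.size() <= kMaxPointersPerPage);
    std::set<std::uint32_t> distinct;
    for (const auto& p : ptrs) {
        distinct.insert(p.page_id);
    }
    return distinct.size();
}
```

For three pointers of which two target the same page, the function returns 2.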
Hardware Swizzling (Cont.)
• Page image when on disk (before swizzling)

    Pointer in   Page ID   Off.
    Object 1     2395      255
    Object 2     4867      020
    Object 3     4867      170

    Translation Table
    PageID   FullPageID
    2395     679.34.28000
    4867     519.56.84000
Hardware Swizzling (Cont.)
• When an in-memory pointer is dereferenced, if the operating system
detects the page it points to has not yet been allocated storage, a
segmentation violation occurs.
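• The system registers a function to be called on a segmentation violation
(on Unix, a signal handler for SIGSEGV; mmap is used to reserve virtual
address ranges and control access to them).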
• The function allocates storage for the page and reads in the page from
disk.
• Swizzling is then done for all persistent pointers in the page (located
using object type information).
– If pointer points to a page not already allocated a virtual memory
address, a virtual memory address is allocated (preferably the
address in the short page identifier if it is unused). Storage is not
yet allocated for the page.
– The page identifier in pointer (and translation table entry) are
changed to the virtual memory address of the page.
Hardware Swizzling (Cont.)
• Page image after swizzling

    Pointer in   Page ID   Off.
    Object 1     5001      255
    Object 2     4867      020
    Object 3     4867      170

    Translation Table
    PageID   FullPageID
    5001     679.34.28000
    4867     519.56.84000
• Page with short page identifier 2395 was allocated address 5001.
Observe change in pointers and translation table.
• Page with short page identifier 4867 has been allocated address 4867.
No change in pointer and translation table.
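The page-level swizzling step in this example can be simulated without any operating-system support. The sketch below only models the bookkeeping: `PageImage`, `AddressSpace`, and `swizzle_page` are hypothetical names, addresses are plain integers, and the fallback address 5001 is an arbitrary choice made to match the example.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using PageAddr = std::uint32_t;  // short page id before swizzling,
                                 // virtual page address afterwards
using FullPageId = std::string;  // e.g. "679.34.28000"

struct PagePointer {
    PageAddr page;
    std::uint32_t offset;
};

struct PageImage {
    std::vector<PagePointer> pointers;
    std::map<PageAddr, FullPageId> translation;  // short id -> full id
};

// Toy model of the process's virtual address space.
class AddressSpace {
public:
    // Mark an address as already taken by some other page.
    void reserve(PageAddr a) { in_use_.insert(a); }

    // Address assigned to a database page. On first sight, allocate one,
    // preferring the short id stored in the page if it is still unused.
    PageAddr address_for(const FullPageId& id, PageAddr preferred) {
        auto it = assigned_.find(id);
        if (it != assigned_.end()) return it->second;
        PageAddr a = preferred;
        while (in_use_.count(a) != 0) a = next_free_++;
        in_use_.insert(a);
        assigned_[id] = a;
        return a;
    }

private:
    std::map<FullPageId, PageAddr> assigned_;
    std::set<PageAddr> in_use_;
    PageAddr next_free_ = 5001;  // arbitrary start for fallback addresses
};

// Swizzle every pointer in the page and rewrite its translation table.
void swizzle_page(PageImage& page, AddressSpace& vm) {
    std::map<PageAddr, PageAddr> remap;  // old short id -> address
    std::map<PageAddr, FullPageId> new_table;
    for (const auto& entry : page.translation) {
        PageAddr addr = vm.address_for(entry.second, entry.first);
        remap[entry.first] = addr;
        new_table[addr] = entry.second;
    }
    for (auto& p : page.pointers) {
        p.page = remap.at(p.page);
    }
    page.translation = std::move(new_table);
}
```

With address 2395 already reserved, the page identified by 2395 is moved to 5001 while 4867 keeps its value, reproducing the before and after page images.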
Hardware Swizzling (Cont.)
• After swizzling, all short page identifiers point to virtual
memory address allocated for the page
– Functions accessing the objects need not know it has persistent
pointers!
– Can reuse existing code and libraries that use in-memory pointers.
• If all pages are allocated the same address as in the short
page identifier, no changes required in the page!
• No need for deswizzling – page after swizzling can be
saved back directly to disk
• A process should not access more pages than size of virtual
memory – reuse of virtual memory addresses for other
pages is expensive.
Disk versus Memory Structure of Objects
• The format in which objects are stored in memory may be
different from the format in which they are stored on disk
in the database. Reasons include:
– software swizzling – structure of persistent and in-memory
pointers are different
– database accessible from different machines, with different data
representations
• Make the physical representation of objects in the database
independent of the machine and the compiler.
• Can transparently convert from disk representation to form
required on the specific machine, language, and compiler,
when the object (or page) is brought into memory.
Large Objects
• Very large objects are called binary large objects (blobs)
because they typically contain binary data. Examples
include:
– text documents
– graphical data such as images and computer-aided designs
– audio and video data
• Large objects may need to be stored in a contiguous
sequence of bytes when brought into memory.
– If an object is bigger than a page, contiguous pages of the buffer
pool must be allocated to store it.
– May be preferable to disallow direct access to data, and only allow
access through a file-system-like API, to remove need for
contiguous storage.
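A file-system-like interface can serve byte ranges from a large object without ever holding it in one contiguous buffer. The `ChunkedBlob` class below is a hypothetical sketch: pages are modelled as fixed-capacity strings in a vector, and only append and read are shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class ChunkedBlob {
public:
    explicit ChunkedBlob(std::size_t page_size) : page_size_(page_size) {
        assert(page_size > 0);
    }

    // Append bytes, starting a new page whenever the last one is full.
    void append(const std::string& bytes) {
        for (char c : bytes) {
            if (pages_.empty() || pages_.back().size() == page_size_) {
                pages_.emplace_back();
            }
            pages_.back().push_back(c);
        }
        size_ += bytes.size();
    }

    // Read up to len bytes starting at offset, copying across page
    // boundaries as needed. Reads past the end are truncated.
    std::string read(std::size_t offset, std::size_t len) const {
        std::string out;
        while (len > 0 && offset < size_) {
            const std::string& page = pages_[offset / page_size_];
            std::size_t in_page = offset % page_size_;
            std::size_t n = std::min(len, page.size() - in_page);
            out.append(page, in_page, n);
            offset += n;
            len -= n;
        }
        return out;
    }

    std::size_t size() const { return size_; }
    std::size_t page_count() const { return pages_.size(); }

private:
    std::size_t page_size_;
    std::size_t size_ = 0;
    std::vector<std::string> pages_;
};
```

Callers see a flat byte sequence, while the pages themselves may sit anywhere in the buffer pool.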
Modifying Large Objects
• Use B-tree structures to represent object: permits reading
the entire object as well as updating, inserting, and deleting
bytes from specified regions of the object.
• Special-purpose application programs outside the database
are used to manipulate large objects:
– Text data treated as a byte string manipulated by editors and
formatters.
– Graphical data is represented as a bit map or as a set of geometric
objects; can be managed within the database system or by special
software (e.g. VLSI design).
– Audio/video data is typically created and displayed by separate
application software and modified using special purpose editing
software.
– checkout/checkin method for concurrency and version control