No Slide Title

Download Report

Transcript No Slide Title

Developing ObjectRelational Database
Applications
Paul Brown
Chief Plumber
INFORMIX Software
What We Will Talk About
• Plenty of Focus on Features & Functionality
• Question is, “How to use it all?”
Object-Relational technology makes a fresh look at
the process of database analysis and design
useful.
• What is different, what is the same?
• Many things are the same . . .
• . . . some things are new.
Developers using an ORDBMS benefit by taking a
more holistic approach to development than was
common with SQL-92 DBMS products.
Informix
user.conference
2
Overview of Tutorial
• Design Objectives of Databases
• How are OR Database Similar/Different?
• What about the theory?
• Methodology Description
•
•
•
•
Steps of a development procedure
Analysis and design
Implementation advice
Examples as we go through
• Architecture Advice
• Back-end and middle-ware
• Component-centric development
3
Informix
user.conference
Design Objectives
Meet the Needs of End Users
• Complete
• The Database needs to model everything that is relevant
to the problem domain (PD).
• Correct
• The Database needs to be an accurate model of the PD.
• Consistent
• The database model should not contain redundant data.
• Flexible
• The database should adapt to changing requirements.
• Efficient
• Fast, scaleable and simple to administer.
Informix
user.conference
4
Object-Relational Data Model
• Based on Relational Model of Codd [1971]
• Relation as storage abstraction and organizing
principle
• Declarative programming as interface model
• An Object-Relational DBMS Provides
• Physical abstraction (no byte level details for ORSQL)
• Formal modeling techniques (normalization)
20 years of R&D have addressed most performance,
reliability and scalability issues.
Informix
user.conference
5
Theoretic Overview ( 1/2 )
• Domains ( kinds or classes of data objects)
• { a, b, c . . . }
• Attributes (named columns of a domain)
• { A, B, C . . . }
• Dom ( A )  a
• Relation Schema (set of attributes)
• { . . . X, Y, Z } where X  {A, B, C . . }
• Relation (set of n-ary tuples over a relation schema)
• { . . P, Q, R . . } For P over X, P 
 Dom ( Xi )
• We also say t: X  Dom ( X )
Informix
user.conference
6
Theoretic Overview ( 2/ 2 )
Operations to Manipulate Relations
• Projection (select sub-set of columns from relation)
• Given P over X, and Y  X,  < P, Y >  Q over Y s.t.  t  P,  t’  Q where t (
A ) = t’ ( A )  A  Y.
• Restriction (select sub-set of tuples from relation )
• Given P over X, and  ( X ),  < P,  >  Q over X s.t.  t  Q,  (t)  true.
• Join ( cross-product of two relations )
• Given P over X, and Q over Y, < P, Q >  R over ( X  Y ) s.t.  t  R,  t’  P
and t’’  Q where t [ X ] = t’, and t [ Y ] = t’’.
• Plus (Aggregate, Intersection, Division, Union
. .)
Informix
user.conference
7
Enough Greek!
What does it look like in practice?
• Declarative language programming
• Dynamic, run-time interpretation
CREATE TABLE Employees (
Id
Employee_Number
PRIMARY KEY,
Name
PersonName
NOT NULL,
Date_of_Birth
DATE
NOT NULL,
Address
MailAddress
NOT NULL,
LivesAt
st_point
NOT NULL,
Resume
Document
NOT NULL,
Voice_Key
Audio_Recording
NOT NULL,
Holidays
SET ( Period NOT NULL )
);
How do you get to here, starting with a bunch of
users, a small herd of programmers, and a budget.
Informix
user.conference
8
Development Road-map
1. Analyze the Problem Domain
• Document the high level structure (E-ER)
• Describe each kind of data (UML Class Diagrams)
2. Design and Implementation
• Use User-Defined Types (UDT) and User-Defined Functions
(UDF) for Objects
• Combine Objects into Schema Tables/Views
• Queries to map to user views
3. Verify Each Step
• Prototype UDT/UDFs
• Normalize
• Test performance against production scale loads
Informix
user.conference
9
Outline of Procedure: 3 Tier Model
.
CREATE OPAQUE TYPE PersonName
( internallength = variable,
maxlen = 64 );
CREATE FUNCTION getFamilyName (
.
.
CREATE TABLE Employees (
Id
Employee_Id PRIMARY KEY,
Dept Dept_Id
REFERENCES Dept ( Id ),
Name PersonName NOT NULL,
DOB DATE
NOT NULL
.
);
.
.
1. Conceptual
2. Logical
3. Physical
Informix
user.conference
10
Phase 1: Conceptual Modeling
• Describe User Views of Problem Domain
• Pictures instead of words
• Multiple Semantic Model Possibilities
• Extended Entity-Relationship (E-ER) Modeling
• Object-Role Modeling (ORM or NIAM)
• Universal Modeling Language (UML)
• Use E-ER in this Tutorial
• Familiar to most developers and analysts
• Concepts introduced here apply everywhere
Informix
user.conference
11
E-ER Diagrams for Dummies
Entity: Principle data objects in the
problem domain. Typically identified with
a noun term; Branch, Employee, etc
Branch
Branch
Product
Employees
Branch
has_a Relationship: Association between
Entities. Typically identified with a verb
term; Manufacturers, WorksFor, etc. has_a
relationships can have additional attributes.
Full_Time
Production
12
Sales
is_a Relationship: Inheritance relationship
between Entities.
Informix
user.conference
1.1 Describe High-Level E-ER
worked for
Branch
Employees
When
Contractors
manufactures
Full_Time
Customer
Production
Product
Sales
sold
Qty
Informix
user.conference
13
1.2 Describe Each Entity
Employee
Products
Customer
Name
Kind
Name
Kind
Name
Kind
Id
Employ_Num
Id
Cust_Num
Id
UPC_Code
Name
Person_Name
Contact
Person_Name
Name
LabelString
Address
Mail_Addr
Address
Deliver_Addr
Dimensions
Physical_Size
Resume
Document
Bill_Address
Mail_Addr
Capacity
Mass
Lives_At
Geo_Point
Available_In
SET{ Color }
Date_of_Birth
DATE
Price
Currency
For each entity, name its attributes, and decide
what kind of data is in each of them.
Informix
user.conference
14
E-ER Modeling Review
• This is Really Nothing New
• Better modeling of is_a relationships (inheritance)
• What Other Tasks?
• Note Logical Keys in Entities
• Record arity of has_a relationships (1:N, N:M, etc)
• Record mandatory and optional relationships
• What is New?
Resist the temptation to analyze down to SQL-92
types. Within E-ER modeling, maintain a degree
Informix
of abstraction.
user.conference
15
Handling Multiple Conceptual Views
• Multiple Semantic Model Possibilities
• Extended Entity-Relationship Modeling
• Object-Role Modeling
• Universal Modeling Language
• What about their Object Model?
User Object Models often contain elements of
interest to OR database developers. A UDT defined
in the database need not be used in a table.
Informix
user.conference
16
The ‘Object’ Concept
“The most brutalized term in computer science”
• What is an ‘Object’?
• Atomic unit of meaning encapsulating both state and
behavior.
• Objects mostly map to domains in Relational Theory
• Why is the Concept Useful?
• Intuitive way of describing ‘things’ in the data model
• These intuitions can drive user-defined type and
user-defined function design.
Informix
user.conference
17
1.3 Catalog the Kinds of Data
Product_Number
Color
Geo_Point
Currency
Product
String
Employee_Num
Document
Bonus_Plan
Employees
Physical_Size
Person_Name
Customer_Num
Period
Full_Time_Emp
Mass
Mail_Address
Delivery_Address
Quantity
Sales
Create a list of all of the different kinds of data
identified in the schema and in the object
models. Some of these will be in more than one
entity.
Informix
user.conference
18
1.4 Use UML Class Diagrams
Mail_Address
Delivery_Address
+ Address_Line_One
String
+ Address_Line_One
String
+ Address_Line_Two
String
+ Address_Line_Two
String
+ City
String
+ City
String
+ State
String
+ State
String
+ ZipCode
+ Country
Zip_Code
String
+ Mail_Address(String, String,
String, Zip_Code)
+ approximateMatches ()
+ Label ()
+ Equal ()
- Compare()
+ ZipCode
+ Country
Zip_Code
+ Delivery_Notes
Document
String
+ Delivery_Address(String, String,
String, Zip_Code, Document)
+ approximateMatches ()
+ Label ()
+ Equal ()
- Compare()
Informix
user.conference
19
Class Diagram Details
State Elements
(Data Structure)
Constructors
Extract Data
Modifiers
Operators
Kind of Data
in Element
GeoPoint
- Longitude
FLOAT
- Latitude
FLOAT
- Ellipsoid
Ellispoid_Enum
+ GeoPoint ( String ) -> GeoPoint,
+ GeoPoint( FLOAT, FLOAT) -> GeoPoint,
+ Latitude ( GeoPoint ) -> FLOAT,
+ Longitude ( GeoPoint ) -> FLOAT
+ Distance ( GeoPoint, GeoPoint ) -> FLOAT
+ Quadrant ( GeoPoint ) -> CHAR(2)
+ SetLatitude( FLOAT ) -> GeoPoint
+ SetLongitude ( FLOAT ) -> GeoPoint
[ Spatial Operators, R-Tree Support ]
[ Compare(), Equal, NotEqual ]
Interface
Methods
Why These?
Informix
user.conference
20
Implementation Tip # 1
• Object Classes are used in OR-SQL queries
• SELECT DISTINCT, UNION are SQL operations
• Merge-Join and Hash-Join for internal efficiencies
Always create a Compare() for any object that has an
Equal(). The resulting order might be meaningless, but
within the context of the query processor it can make
sense.
Informix
user.conference
21
1.5 Minimize Set of Object Classes
• Use Detailed Specifications to Identify
• synonyms and
• antonyms
• Identify ‘like’ Object Classes
• Inheritance allows re-use
• { Mail_Address, Delivery_Address }
• Describe Algorithms for all Behaviors
• What does it mean to have ‘Equal’ UPC?
• Space efficient internal representation
Informix
user.conference
22
Workload and Common Queries
• Document Common Queries and Processes
• Helps to flesh out Object Class specifications
• Useful for identifying additional kinds of data
• Useful for later functionality/scalability testing
SELECT E.Name
FROM Employees E
WHERE Birthday(E.Date_of_Birth) =
Birthday(TODAY);
Birthday
+ Month
INTEGER
+ Day
INTEGER
- From_Leap_Year
Note: Types don’t have to be
used to define tables. They
can be used to extend SQL.
23
boolean
+ Birthday( DATE ) -> Birthday ,
+ Print( Birthday ) -> String,
+ Equal( Birthday,Birthday)->
Informix boolean
. etc .
user.conference
By this Point you Out to Have:
• An E-ER Model Representing High Level View
• List of entities, and their structure
• Understanding of relationships among entities
• List of rules-- keys, constraints etc
• Class Diagrams of each Distinct Kind of Data
• Described in terms of Structure, and Behavior
• Analysis of algorithms in methods
Next Step: Transform this conceptual model into a
sound OR database design and implementation.
Informix
user.conference
24
Phase 2: Database Implementation
• UDTs and UDFs for each Domain
• Use DataBlade extension libraries
• Develop business objects from scratch
• Build Relational Schema
• Naïve translation of E-ER conceptual model
• Normalize the initial model
• Some special considerations
• Performance and Scalability Testing
• Create volume data quickly
• Workload test harness
Informix
user.conference
25
2.1 Implement Domains/Kinds of Data
• Several Mechanisms to Choose From
• Do I buy a DataBlade™ product?
• Do I build objects from scratch?
• What about the client side?
• If Build, Which Technique?
• Built-in type, ROW TYPE, DISTINCT TYPE,
OPAQUE TYPE
• Each mechanism has different properties
• How to Implement the UDFs?
• SPL, Java or ‘C’?
Informix
user.conference
26
Implementation Options
Mechanism Example
Built-In
Types
Strengths and Weaknesses
INTEGER, VARCHAR, DATE etc. These Mature and high performance because they
are standardized in the SQL-92 language are compiled into the ORDBMS. But they
specification.
are very simple. Good building-blocks for
other types.
DISTINCT
CREATE DISTINCT TYPE
Simple to create, and useful when what you
String AS VARCHAR(32);
want is something very close to another type.
ROW TYPES CREATE ROW TYPE Address (
Relatively easy to use means of combining
Address_Line_One String NOT NULL, pre-existing types into a more complex
Address_Line_Two String NOT
objects, and enforcing rules about contents.
NULL,
ROW TYPEs have several drawbacks that
City
String NOT NULL, makes them a poor choice for types to define
State
String,
columns.
ZipCode
PostCode,
Country
String NOT NULL
);
Java Classes Combination of Java UDRs with opaque
More complex to develop, but an excellent
data storage.
choice when you want code that runs in both
the outside, and inside, the DBMS.
OPAQUE
CREATE OPAQUE TYPE GeoPoint (
Most complex to develop, but these are the
TYPES
internallength = 16
most powerful in terms of performance,
Informix
);
scalability and the range ofuser.conference
object sizes that
can be supported.
27
Implementation Tip # 2
• DISTINCT TYPES Re-Use Pre-Existing Types
• Create EXPLICIT Cast between Parent and DISTINCT
• Overload that CAST to Enforce Data Integrity
CREATE DISTINCT TYPE Quantity
AS INTEGER;
-CREATE FUNCTION INTEGER2Quantity ( Arg1 INTEGER )
RETURNS Quantity
IF ( Arg1 < 0 ) THEN
RAISE EXCEPTION –746,0,
“Error: Quantity must be non-negative”;
END IF;
RETURN Arg1::Quantity;
END FUNCTION;
-DROP CAST ( INTEGER AS Quantity );
Informix
CREATE IMPLICIT CAST
user.conference
( INTEGER AS Quantity WITH INTEGER2Quantity );
28
UDF Implementation Performance
• Choice of Three Languages
• Stored Procedure Language, Java and ‘C’
• Experiment: Pop( INTEGER ) and IsPrime ( INTEGER )
Seconds to Compute
10,000
Performance of UDR Languages
1000
'C' Prime Performance
800
'C' Pop Performance
600
Java Prime Performance
400
Java Pop Performance
SPL Prime Performance
200
SPL Pop Performance
0
4
5
6
7
8
9
10
Log ( Size ( Value s ) )
Informix
user.conference
29
Implementation Tip # 3
• Avoid SQL Callbacks Inside User-Defined Functions
• UDFs with SQL are variant and can’t be parallelized
• SQL inside a UDF kind of violates the philosophy
CREATE OPAQUE TYPE Mass (
internallength = 16, alignment = 8
);
--- The code implementing this external function includes all
-- of the conversion rates compiled into it. This permits it
-- to perform conversion operations hyper-fast.
-CREATE FUNCTION Compare (Mass ,Mass)
RETURNS INTEGER
WITH ( NOT VARIANT, PARALLELIZABLE )
EXTERNAL NAME
”$INFORMIXDIR/extend/Massbin/Mass.bld(MassCompare)"Informix
user.conference
LANGUAGE C;
30
2.2 Create the Schema Tables
• Combine the Types into Schema Tables
• Simply CREATE TABLE if it is not in hierarchy
• Create Hierarchy of ROW TYPE instances if it is
• For simple has_a relationships, extend the dependent table
with additional column for the first table’s primary key
column.
• For multi-variant has_a relationships, create a table with one
column per key, and extra columns
Similar procedure to the one used transforming ER to
RDBMS, except in the handling of Inheritance.
Informix
user.conference
31
A Note on Keys
• The Relational Key Concept is Important
• Identifies correctness constraints within schema
• Drives normalization, which helps ensure consistency
• Formerly, a Key is defined as:
A column where no two values are equal.
Products
Id
B-SH-3001
B-P-312-A
B-P-312-L
B-PACK-411
Name
Shoe Box
Small Pizza
Large Pizza
Computer
Dimensions
’18 IN x 6 IN x 6 IN’
’21 IN x 21 IN x 2 IN’
’26 IN x 26 IN x 2 IN’
’40 IN x 40 IN x 40 IN’
Capacity
’11 LB’
‘3 LB’
‘4 LB’
’50 LB’
Available_In
{white, black}
{brown}
{brown}
{brown,white}
Price
$US 0.12
$US 0.25
$US 0.25
$US 5.00
Informix
user.conference
32
Example Schema (1/3)
-CREATE TABLE Products (
Id
Product_Num
NOT NULL PRIMARY KEY,
Name
LabelString
NOT NULL,
Dimensions
Physical_Size NOT NULL,
Capacity
Mass
NOT NULL,
Available_In
SET{Color NOT NULL},
Price
Currency
NOT NULL
);
-CREATE TABLE Customers (
Id
Customer_Num
NOT NULL PRIMARY KEY,
Contact
Person_Name
NOT NULL,
Address
Mail_Address
NOT NULL,
Delivery
Delivery_Address
);
Informix
user.conference
33
Example Schema (2/3)
-CREATE ROW TYPE Sales_Employee_Type (
Base_Salary Currency
NOT NULL,
Sales_YTD
Currency
NOT NULL
) UNDER Full_Time_Employee_Type;
-CREATE TABLE Sales_Reps
USING Sales_Employee_Type
UNDER Full_Timers;
-CREATE TABLE Product_Sales_to_Customers (
Customer
Customer_Num
NOT NULL REFERENCES Customers ( Id ),
Product
Product_Num
NOT NULL REFERENCES Products ( Id ),
Sales_Rep Employee_Num
NOT NULL REFERENCES Sales_Reps ( Id ),
Qty_Sold
Quantity
NOT NULL
);
Informix
user.conference
34
Example Schema (3/3)
--- Query against the Product.Capacity column showing
-- behavior of the Mass object class invoked.
-SELECT P.Id,
P.Capacity
FROM Products P
Mass
WHERE P.Capacity > ’3 KG’
ORDER BY P.Capacity ASC;
{ Ounce, LB, Gram, KG . .}
- Unit
- Quantity
Id::Product_Num Capacity::Mass
B-SH-3001
B-PACK-411
35
’11 LB’
’50 LB’
float
+ Mass( String ) -> Mass ,+ Print( Mass ) -> String,
+ Equal( Mass, Mass ) -> boolean,
+ LessThan( Mass, Mass ) -> boolean,
+ LessThanOrEqual( Mass, Mass ) ->boolean,
+ GreaterThan( Mass, Mass ) ->boolean,
+ GreaterThanOrEqual( Mass, Mass ) -> boolean,
+ NotEqual( Mass, Mass ) -> boolean,
- Compare( Mass, Mass ) -> INTEGER,
+ Plus( Mass, Mass ) ->Mass,
+ Minus( Mass, Mass ) ->Mass,
Informix
+ Divide ( Mass, INTEGER ) ->Mass,
user.conference
+ Times ( Mass, INTEGER ) -> Mass
2.3 Normalization
• Normalization Concept & Procedure Same
• Key, Whole Key, Nothing but the Key
• Keys can be User-defined Types (of course)
• New “corner cases”: spatial key (topography), temporal
key
• What about COLLECTION Attributes?
• Good for domains of limited cardinality
• { Work Days, Sizes_Available, Other_Phone_Nums }
• Bigger sets, bigger domains, less useful
• No Indexing!
• Nested Tables are Right Out.
COLLECTION works well for UDF varargs.
Informix
user.conference
36
2.4 Load and Test (and test)
• Can Use ORDBMS to Generate Data
INSERT INTO Products
( Name, Dimensions, Capacity, Available_In,
SELECT MakeString( Random(1000), 24 ),
Random_Dimensions(),
Random_Mass(),
Random_Color_Set(),
Random_Currency()
FROM TABLE(SET{0,1,2,3,4,5,6,7,8,9}) N1 (
TABLE(SET{0,1,2,3,4,5,6,7,8,9}) N2 (
TABLE(SET{0,1,2,3,4,5,6,7,8,9}) N3 (
Price )
Num ),
Num ),
Num );
• Physical Tuning
• IDS.2000 Little Different from 7.X
• Remember to partition SBLOB data, as well as table
data (text indexing particularly!)
Informix
user.conference
37
Organizing Development Teams
• Developers Writing Extensions Get own Instance
• Writing UDFs means debugging, which will halt
the IDS instance.
• Testing with the DBMS
• Development Staging Process
• Individual Instances, to Test Environment
• only released to Production when happy.
• Looks like Mainframe development.
• Prototyping an Annealing
• SPL pass 1, Java or ‘C’ for what’s slow
Informix
user.conference
38
Architectural View
JDBC, ODBC,
ESQL/C, RMI,
Foo-ML, etc
Java, C, etc
Application Process
JDBC, ODBC,
ESQL/C, RMI
Java or SPL
Application Process
UDF
‘C’, SPL, Java
SQL
JDBC (if Java)
SQL
Types in Schema
Built-in, DISTINCT,
or OPAQUE
UDF
‘C’, SPL, Java
Informix
user.conference
39
Err..What was that?
The Following Principles Seem to Work:
• No SQL Inside a UDT Method
• Objects are orthogonal to schema structure
• Do not put SQL inside ‘C’ User-defined Routines
• No performance advantage
• Makes development hard
• UI calls App Logic calls SQL Still Applies
• Impedance mis-match between SQL DBMS and
languages like Java are minimized.
Informix
user.conference
40
Implementation Tip # 4
• User-defined Functions Work in SPL
• Any ‘C’ or Java (or SPL) UDF can be called as subroutine
• Most useful with iterator functions
CREATE FUNCTION GetFirstName ( Arg1 LVARCHAR )
RETURNING lvarchar;
DEFINE lIntVals LIST(lvarchar NOT NULL);
DEFINE lvRetVal lvarchar;
LET lIntVals = Split(Arg1, ' ' );
FOREACH cursor1 FOR SELECT * INTO lvRetVal
FROM TABLE(lIntVals)
EXIT FOREACH;
END FOREACH;
RETURN lvRetVal;
END FUNCTION;
-EXECUTE FUNCTION GetFirstName (“John Fitzgerald Kennedy”)
John
Informix
user.conference
41
On the Client Side
• 4GL is Perfectly Reasonable for ORDBMS
• OR-SQL queries can take & return strings &
numbers, and can invoke stored procedures.
• ‘C’, C++
• ESQL/C, ODBC, CLI are workable
• DataBlade™ products ship with client libraries and
header files
• Java
• JDBC 2.0 is majestic mojo
• Type <-> Class mapping can make the interaction
seamless
Informix
user.conference
42
‘Organic’ Applications
Another way to look at Database
Development
• What is an ORDBMS?
• An ORDBMS is a Software Back-Plane . . .
• supporting a declarative/query-centric interface.
• How do you use it?
• Combine objects to solve problems.
• No more system-wide upgrades.
• Component-centric development.
Informix
user.conference
43
Summary and Conclusions
• ORDBMS Development Is
• Best approached in a ‘holistic’ fashion.
• Consider the design of at high-level (relational)
• and low-level (objects and components)
• More Implementation Tips
http://www.informix.com/idn
http://www.iiug.org/software
http://examples.informix.com/
Informix
user.conference
44
References and Further Reading
INFORMIX Tech Notes: Q1, Q2 of 2000 contains two long
papers describing this in more detail.
Roy, Jacques Server-Side Programming in C:
INFORMIX Dynamic Server 2000 INFORMIX Press.
1999
And lastly!
Brown, Paul G. Developing Object-Relational Database
Applications INFORMIX Press. Dec, 2000.
Informix
user.conference
45