Jacques Roy - Washington Area Informix User Group

Download Report

Transcript Jacques Roy - Washington Area Informix User Group

Informix User Forum 2005
Moving Forward With Informix
IDS Extensibility
for Business Advantage
Jacques Roy
Technical Lead/IBM
Atlanta, Georgia
December 8-9, 2005
Agenda
• What is Database Extensibility?
• Database and Extensibility background
• IDS features
• Extensibility Examples
• DataBlade Modules
• Bladelets
• Considerations for building your own
What is Database Extensibility?
• Ability to add business components in the database
– Tailor the database to the business environment
• Reduce application complexity
– Put the processing where it makes the most sense
– Set processing provided by the database
• Higher performance
– Less data movement, better data representation, better indexing
• Faster development, lower maintenance cost
Databases are not commodities!
Database Background
• Hierarchical/Network models
– Tied to the physical representation of the data
– Solves one problem very well
• Relational Model
– Logical organization of the data
– Multiple views of the data
• Object-Relational Model (extensibility)
– Business representation and processing
– Adapts to the business environment
Extensibility Background
• Historic Approach
– user exits
– device drivers
• Relational Databases
– triggers, constraints
– Stored Procedures
• New development platforms
– Web Server: CGI, NSAPI/ISAPI
– App Servers: J2EE
• Object-Relational Capabilities
– Relational Framework
Databases are not commodities!
Objects Instantiation Problem
Corporation
DB
Regions
Branches
Accounts
...
...
Ex:
10 regions
100 branches
10,000 accnts
...
...
...
...
At 1KB per object you use 10GB of memory!
Over 10M objs
And 20M msgs
IDS Extensibility Features
• User-defined types
• Complex types
– Distinct, opaque
•
•
•
•
•
Table/Type inheritance
Functional index
R-Tree index
Smart Large Objects
More. . .
Extensions can be
written in:
SPL, C, Java
– Row, set, multiset, list
•
•
•
•
Polymorphism
User-Defined Function
User-Defined Aggregate
Primary/Secondary
Access Methods
Performance Relativity
•
•
•
•
•
336MHz SPARC  5200MB/sec
Memory  1300MB/sec
System Bus  100MB/sec
SCSI Bus  20MB/sec
SCSI Disk  0.24 to 80MB/sec
– Track-to-track seek time, rotation latency, etc.
• Network  1 to 10MB/sec
• Users  0.00001 MB/sec
(600 char/min, ~120 words/min)
CPU
cache
System bus
Memory
Disk
Controller
When to Use UDFs/UDRs
• Eliminate large data transfer
– Transfer time is much larger than processing time
• Simplify or improve processing
–
–
–
–
Eliminate table scan
Eliminate application set processing
Define new sorting order
Replace store procedures with aggregates (IDS only)
• Provide business processing to applications
– Consistency of result
• Eliminate the need for custom interfaces
Databases are not commodities!
Better Grouping
• Quarter() function:
SELECT Quarter(date), SUM(income)
FROM orders
WHERE Quarter(date) LIKE '2003Q%‘
GROUP BY 1
ORDER BY 1;
• AgeGroup() function:
SELECT AgeGroup(birthdate, 20, 10)
AgeGrouping,
SUM(total_price) Total
FROM customer c, orders o, items I
WHERE c.customer_num = o.customer_num
AND
o.order_num = i.order_num
GROUP BY 1
AgeGrouping
ORDER BY 1 DESC;
40-49
50-59
Total
$12646.97
$ 5507.80
Quarter Function
#include <mi.h>
mi_lvarchar *quarter(mi_date date, MI_FPARAM *fparam)
{
mi_lvarchar *RetVal;
/* The return value. */
short
mdy[3];
mi_integer ret, qt;
char *pdate, buffer[10];
ret = rjulmdy(date, mdy); /* Extract month, day, and year from the date */
qt = (mdy[0] - 1) / 3; /* calculate the quarter */
qt++;
sprintf(buffer, "%4dQ%d", mdy[2], qt);
RetVal = mi_new_var(strlen(buffer));
mi_set_vardata(RetVal, buffer);
/* Return the function's return value. */
return RetVal;
}
Compiling and Linking
COMPILE=-I$(INFORMIXDIR)/incl/public -O -c
cc -DMI_SERVBUILD $(COMPILE) quarter.c
ld -G -o mylib.bld quarter.o
chmod a+x mylib.bld (775 or read only to other)
Creating the function
CREATE FUNCTION quarter(date)
RETURNS varchar(10)
WITH (not variant, parallelizable)
external name "$INFORMIXDIR/extend/class/qlib.bld(quarter)"
LANGUAGE C;
GRANT EXECUTE ON FUNCTION quarter (date) TO public;
Removing the function
DROP FUNCTION quarter(date);
UDR Processing
Stored Procedures
SQL
group
UDR
order
scan
scan
order
scan
scan
Using SPL for Extensions
• Better date manipulation:
–
–
–
–
Day of the year
Week of the year
Week of the month
Quarter
CREATE FUNCTION quarter(dt date)
RETURNS integer
WITH (NOT VARIANT)
RETURN (YEAR(dt) * 100) + 1 +
(MONTH(dt) - 1) / 3;
END FUNCTION;
• Unit conversion
– Feet  Meters
– Gallons  Liters
– Fahrenheit  Celsius
• Functional indexes
Example:
EXECUTE FUNCTION
quarter('9/2/2005');
(expression)
200503
See Developer Works Informix zone:
www-128.ibm.com/developerworks/db2/zones/informix/
The Node Type
"Hierarchically" aware type: Node
Pre-processed the hierarchical
relationships
ƒ ex: Chapter 11, section 7, paragraph 3:
11.7.3
Add new way to relate objects to one
another
ƒ IsAncestor(), IsChild(), IsDescendant(),
IsParent()
Can change processing from
exponential to linear
examples:
policies, product classification,
bill-of-material, LDAP, XML, etc.
1.0
1.1
1.2
1.3
1.2.1 1.2.2 1.2.3
1.2.3.2 1.2.3.3 1.2.3.4
1.2.3.4.5
Node Application Example
Geo Hierarchy:
country > state > metro > city
Policy
GeoNode
Resource
Q: What policy apply to the Hyatt in Denver?
A: A Colorado Policy
Bill-of-Material Example
A component can be made up of
multiple components
A component is made up of
multiple parts
component
CREATE TABLE Components (
ComponentId Node,
Sequence
integer,
Quantity
integer,
Name
varchar(30),
PartNumber
integer
);
CREATE TABLE Parts (
PartNumber
integer,
Name
varchar(30),
ProviderId
integer,
Price
Money(10,2)
);
Parts
Node performance vs standard
relational
Levels
2
Count
6
Timing Ratios
Traditional/Node
1.0
3
42
1.8
4
258
8.4
5
1554
41.0
6
9330
37.75
Databases are not commodities!
For more information on the
Node Type
“Downloadable bladelets and demos”
http://www106.ibm.com/developerworks/db2/zones/informix/library/samples/db_
downloads.html
Or
“Open-Source Components for Informix Dynamic
Server 9.x”
Jacques Roy, William W. White, Jean T. Anderson, Paul G. Brown
ISBN 0-13-042827-2
What is XML?
• XML is a Structured Document Definition
• It Represents a Hierarchy of Elements
• Two Main Types of XML Documents
– Document and Data
• Two Ways to Process XML Documents
– Text Retrieval System
– Explode the XML Document into Columns
Generating XML
• Converting rows to XML format
SELECT genxml("customer", customer) FROM customer;
or
SELECT genxml("stats",ROW(customer_num, COUNT(*))) FROM
cust_calls GROUP BY customer_num;
• Creating a complete XML document
EXECUTE FUNCTION genxmlhdr("customer_set",
"SELECT * FROM customer");
• User-Defined Aggregate:
SELECT aggrxml(customer, "customer_set")
FROM customer;
See: Generating XML from IDS 9.x
//www106.ibm.com/developerworks/db2/zones/informix/library/techarticle/0
302roy/0302roy2.html
Other XML Possibilities
• Decompose (Shred) an XML document into a table row
• Generate rows from an XML document stored in the database
• Use a generic table to keep track of the element type and
position in the hierarchy
• Use the XML Path language to extract values from an XML
document
–
–
–
–
–
/table/row/lname
/table/row[1]/lname
/table/row[lname='Pauli']/customer_num
/table/row[@operation='I']/lname
/table/*/lname
Stay Tuned to DeveloperWorks!
Fine-Grained Auditing
• Use triggers and user-define routine
• Register event processing
– Commit or rollback
• Send events to file or outside process
• Use a generic function for any table
• Available in IDS 9.40.xC4 or later
See the developerworks article:
Event-driven fined-grained auditing with Informix Dynamic Server
//www-106.ibm.com/developerworks/db2/library/techarticle/dm-0410roy/
New Trigger Use
CREATE TRIGGER tab1instrig
INSERT ON tab1
FOR EACH ROW (
EXECUTE PROCEDURE
do_auditing2()
)
;
Event-Driven Architecture
Commit/Rollback
6
7
5
MonitorProgram
Callback
Statement
8
1
Register
3
2
4
Trigger
Table
EventTable
Inheritance and Polymorphism
loans
Manufacturing
Telco
Retail
Healthcare
Clothing
Food
Novelties
Financial
Services
Inheritance and Polymorphism (cont.)
 SELECT branch_id, AVG(risk(loans))
FROM loans
GROUP BY 1
HAVING AVG(risk(loans)) > 1
ORDER BY 2 DESC;
 SELECT branch_id, AVGRISK(loans)
FROM loans
GROUP BY 1
HAVING AVGRISK(loans) > 1
ORDER BY 2 DESC;
Replacing Store Procedures
with UDA
• Business problem: Merger multi-polygon types (ESRI)
• Original Solution:
– SPL stored procedure (82 lines, 3 SELECT statements, 1 insert
statement, 2 embedded FOREACH)
• New Solution: User-Defined Aggregate
– 23 lines of SPL, no SQL, no FOREACH
SELECT a.user_id, a.case_id, a.case_type,
a.serial_nr_full,a.spt_disp_theme_cd,
do_union(a.shp) shp
FROM case_shp a
WHERE user_id = "user0001“
GROUP BY 1, 2, 3, 4, 5
INTO TEMP my_temp_table WITH NO LOG;
User-Defined Aggregate Code
CREATE FUNCTION do_union_iter(state lvarchar, arg lvarchar)
RETURNING lvarchar
DEFINE retval
lvarchar;
IF (state IS NULL ) THEN
RETURN arg ;
END IF
IF (arg IS NULL) THEN
RETURN state ;
END IF
LET retval = state || arg ;
RETURN retval ;
END FUNCTION ;
CREATE AGGREGATE do_union
WITH (ITER=do_union_iter, COMBINE=do_union_iter);
User-Defined Aggregates
• A lot more flexible than standard aggregate functions
• Can take any type as input
– ex.: row type
• Can take an initialization parameter of any type
– ex: row type
• Can return complex types
– ex: list
Fabric Classification
• Business Problem:
Provide an efficient way to select fabrics
• Indexing Colors:
Cielab coding (3 dimensions) and other attributes
(Use the R-tree multi-dimensional indexing method)
• Fabric Type Hierarchy
Requires a hierarchy-aware type
ex: We want a “natural” fabric
• Fabric Style and Patterns
ex: What does “Victorian” mean?
Other examples
• Other multi-dimensional problems:
– 3D CAD drawings
• Support for Globally Unique Identifier (GUID)
http://www128.ibm.com/developerworks/db2/library/techarticle/dm0401roy/index.html
• Implementation of unsigned integer
• Soundex/Phonex
Key to Successful Extensibility
Project
• Start small, develop your expertise
– remember the first OO projects
• Use pre-built extensions
• Study examples
• Approaches:
– Use DataBlade Modules
– Use Bladelets
– Build your own
What are DataBlade Modules?
• Business Components
– A set of functionality that solves a specific business
problem
• Building Blocks
• Can include:
– User-defined types
– User-defined Functions
– Tables, views
– Client component
• An DataBlade can come from IBM, a third party
or be built in-house
IDS DataBlade Modules
• Spatial
– Free of charge
– with IDS 9.30 and higher
• Geodetic
• TimeSeries
• NAG
• Real-Time Loader
• Large Object Locator
(LLD)
– Comes with IDS 9.x
• MQSeries
– Included in IDS 10.0 xC3
• C-ISAM
• Image Foundation
• Video Foundation
• Web
• Excalibur Text
• Excalibur Image
• XSLT
Spatial is Everywhere
• Where are my stores located related to my distributors?
• How can I efficiently route my delivery trucks?
• How can I micro-market to customers fitting a particular
profile near my worst performing store?
• How can I set insurance rates near to flood plain?
• Where are the parcels in the city that are impacted by a
zoning change?
• Which bank branches do I keep after the merger based
on my customers locations (among other things)?
Complex Spatial Example
•Raise the level of abstraction at the
database level.
•Buy components to solve common
problems, and build components to
achieve a competitive edge
CREATE TABLE e_Yellow_Pages (
Name
VARCHAR(128)
NOT NULL,
Business Business_Type
NOT NULL,
Description Document
NOT NULL,
Location
GeoPoint
NOT NULL,
Occupied SET( Period NOT NULL )
);
-- “Show me available service stations specializing
-- in Porsche brakes within 30 miles of where
-- I am?”
-SELECT Y.Name
FROM e_Yellow_Pages Y
WHERE Contains ( Y.Location,
Circle( :GPS_Loc, ‘30 Miles’ ) )
AND Y.Business MATCH “Automotive Service”
AND DocContains( Y.Description,
“Porsche AND brakes” )
AND NOT Booked ( Y.Occupied,
Period (TODAY,TODAY+5));
TimeSeries DataBlade Module
• A time series is a set of data as it varies over time
• TimeSeries DataBlade optimizes storage usage
– 50% Savings not uncommon
• Optimized Access Time
– 10 times performance improvement typical
• Calendars
– Defines period of time that data may or may not be collected
• SQL, Java, and C interfaces
• VTI Interface
– Makes a time series look like a regular table
• Office Connect Interface
– Web enables Microsoft Excel plug-in
Who’s Interested in TimeSeries
• Capital Markets
– Arbitrage opportunities, breakout signals, risk/return
optimization, portfolio management, VaR calculations,
simulations, backtesting...
• Telecommunications:
– Network monitoring, load prediction, blocked calls (lost
revenue) from load, phone usage, fraud detection and
analysis...
• Manufacturing:
– Machinery going out of spec; process sampling and analysis
• Logistics:
– Location of a fleet (e.g. GPS); route analysis
• Scientific research:
– Temperature over time...
The NAG DataBlade Module
• NAG: Numerical Analysis Group
– over 30 years experience
– Experts in numerical and statistical computation
– Reputation for accuracy and performance
• Business functions: convert timeseries data to vectors and
vectors of returns; present value calculations; ErlangB
(network) calculations; data export functions
• NAG Fortran functions: correlation and regression analysis;
variance-covariance matrix generation; optimization;
eigenvectors, eigenvalues; ...
• Examples:
– Oil industry: temperature variability down an oil well
– Finance: calculate volume weighted average price and volatility
– Spatial routing: Find the shortest path between two points
IDS Bladelets
• Located at:
http://www106.ibm.com/developerworks/db2/zones/informix/library/samples/db_downlo
ads.html
and
http://www.iiug.org/ver1/software/index_ORDBMS.html
• Bladelets:
– mrLvarchar, Node, regexp, shape, exec, period, etc.
• For detailed information, see:
"Open-Source Components for Informix Dynamic Server 9.x "
Jacques Roy, William W. White, Jean T. Anderson, Paul G. Brown
ISBN 0-13-042827-2
includes: Node, Period, ffvti, exec, shape, sqllib/IUtil, regexp,
mrLvarchar, JPGImage
mrLvarchar and regexp
• Store data efficiently based on its length
– Document data vary in length
• Many business data set include such documents
– Web pages, XML documents, product description, articles,
etc.
• mrLvarchar stays in row when shorter than 2KB, in
BLOB otherwise
• Includes useful functions
– Snip, clip, concat, concatAll, Upper, Lower, Length, Instr,
regexp functions
• RegExp
– regexp_match, regexp_replace, regexp_extract, regexp_split
– Search on mrLvarchar
– more complete search capabilities than LIKE and MATCHES
Period
• Time periods comparisons based on dates or datetimes
– Manages information about fixed intervals in a timeline
• Rich comparison functions
– Equal(), NotEqual(), WithinNotTouches(), Within(),
ContainsNotTouch(), Contains(), AfterTouches(), BeforeTouches(),
Overlap(), Before() After()
• R-Tree Indexing
• Can be used in any type of scheduling
– Hotel room reservations
– Consultant scheduling
– Equipment scheduling
• Advantages:
• simpler SQL, Range indexing (R-Tree), proper behavior of a
range
• Example:
"Are there any situations where two different trains are scheduled on
the same track over the next week?"
New Applications
• Tracking Systems (TimeSeries, Spatial)
– RTD buses, U-Haul, Cell phones
• Customer Services: closest location
• Security Systems: face recognition, fingerprints,
etc.
• Handling hierarchical problems:
– material types, policies, XML, bill-of-material, etc.
WHAT ABOUT YOUR BUSINESS PROBLEMS?
Building Your Own Extensions
• Use the DataBlade Development Kit (DBDK)
• Learn the DataBlade API
– Many ESQL/C functions are also part of the DAPI
• Review the latest documentation in the release directory
• For More Information on the DataBlade API:
"Informix Dynamic Server.2000:Server-Side Programming
in C"
Jacques Roy ISBN 0-13-013709-X
Development Environment
• SPL
– Standard SPL environment
•C
– Include directory: $INFORMIXDIR/incl/public
– Makefile generated by DBDK
– DBDK Include file for Makefile:
$INFORMIX/incl/dbdk/makeinc.<platform>
• Java
– Server configuration: onconfig parameters
(see release notice and machine notes)
– $(INFORMIXDIR)/extend/krakatoa/krakatoa.jar
DBDK Components
• GUI tool, runs only on Windows NT
• Bladesmith
– Defines/build DataBlade modules
– Generates functional tests
– Generates script info for install
• Bladepack
– Organize distribution
– Generates installshield script
• Blade manager
– Stores info in the database
– install/uninstall DataBlade Modules in the application database
Programming Environment
• Proprietary Thread implementation
• Non-Preemptible Threads
• Multiple Processes on UNIX
• Multiple Threads on NT
Threading Restrictions
• Function Libraries
– Re-entrant
• Signals
• Memory Allocation
• Blocking Calls
– File Access
Dynamic Libraries
• Used for Server-Side Functions
• UNIX: dlopen(), dlsym(), dlclose()
• NT: loadLibrary(), GetProcAddress(),
FreeLibrary()
• Symbol Visibility
Conclusion
• IDS 9.x is a framework for business solutions
• You can use building blocks to speed up the
creation of solutions
• The database can adapt to the business
environment
• IDS is a full fledge partner in your business
processing
• Faster Time-to-Market, Higher Performance,
Lower Maintenance
Business Advantage!
Databases are not commodities!
Informix User Forum 2005
Moving Forward With Informix
IDS Extensibility
for Business Advantage
Jacques Roy
[email protected]
Atlanta, Georgia
December 8-9, 2005