ORMs and Hibernate - Data Systems Group
Download
Report
Transcript ORMs and Hibernate - Data Systems Group
ORMs and Hibernate
Glenn Paulley, Director, Engineering
http://iablog.sybase.com/paulley
What is object-relational mapping?
• Object-oriented programming technologies are
typically used to implement business logic
• Relational databases are used for persistent data
storage
• Impedance mismatch between the two
paradigms: objects vs. relations
Estimated that 30-40% of a JDBC application involves
coercing data from tuples to object instances and back again
• ORM toolkits are designed to address this
impedance mismatch
61 different ORM toolkits are listed in Wikipedia for Java, C++,
Delphi, Ruby, .NET, PHP, Python, Perl
2
What is object-relational mapping?
• To exploit object behaviour fully, data-access
from within an object-oriented programming
language should offer:
Separation of concerns
Information hiding
Inheritance
Change detection
Uniqueness capability
Database independence
3
Change detection
• ORM toolkits require mechanisms to track
changes to objects made by the application
When a transaction is complete, write the changes to the
database within an atomic transaction
Need appropriate guarantees to prevent lost updates
4
Uniqueness
• Mappings are usually a correspondence between a
row in a normalized table and a class instance
Specified using metadata
For example, a row of the Employee table will correspond to an
instance of the Employee object within the application
Mappings are often not isomorphic
Sophisticated ORMs such as Hibernate and LINQ permit object
models that differ substantially from the underlying relational store
Object-oriented language features offer greater semantic flexibility in
application design than 1NF values from a relational database
• Need to establish a correspondence between an inmemory object and a database row
Must be independent of how the object was acquired: a database
query, or navigating a reference to another object
Predicated on the existence of primary keys in the database
5
Database independence
• Many ORM toolkits attempt to offer database
independence, so that applications can be ported
from one DBMS to another
Create common APIs and models to interact with a variety of
DBMS platforms
• Useful with mobilized applications where the
consolidated database is one DBMS, and local
databases are different
6
Why are ORMs useful?
• Eliminates tedious, repetitive code that
instantiates object instances from tuples using a
SELECT statement and a CURSOR
• Insulates, to some extent, the application
developer from vendor-specific SQL extensions
• Permits the application developer to exploit
object-orientation and model and manipulate the
application view differently from the relational
model
• Data manipulation can be done at the object
level, rather than (only) at a SQL statement level
7
Challenges of ORMs
• ORM toolkits introduce an additional level of
complexity to the application
Example: Java Hibernate 3.2.6 is
266 packages, 1938 classes, 18,680 functions, over 118K LOC
Can be difficult to debug, perform performance analysis
Most frameworks suffer from a lack of appropriate tools
• Performance analysis is problematic because the
application’s behaviour is not tied directly to
specific interactions with the database
Complex mappings may cause very complex SQL queries to
be generated
Can be difficult for the application developer to understand
what caused their construction
8
Complex SQL (LINQ generated)
SELECT
[Project9].[ContactID] AS [ContactID],[Project9].[C1] AS [C1],[Project9].[C2] AS [C2],[Project9].[ContactID1] AS [ContactID1],[Project9].[SalesOrderID] AS
[SalesOrderID],
[Project9].[TotalDue] AS [TotalDue]
FROM ( SELECT
[Distinct1].[ContactID] AS [ContactID],
1 AS [C1],
[Project8].[ContactID] AS [ContactID1],
[Project8].[SalesOrderID] AS [SalesOrderID],
[Project8].[TotalDue] AS [TotalDue],
[Project8].[C1] AS [C2]
FROM
(SELECT DISTINCT [Extent1].[ContactID] AS [ContactID]
FROM [DBA].[Contact] AS [Extent1]
INNER JOIN [DBA].[SalesOrderHeader] AS [Extent2]
ON EXISTS (SELECT cast(1 as bit) AS [C1]
FROM ( SELECT cast(1 as bit) AS X ) AS [SingleRowTable1]
LEFT OUTER JOIN (SELECT [Extent3].[ContactID] AS [ContactID]
FROM [DBA].[Contact] AS [Extent3] WHERE [Extent2].[ContactID] = [Extent3].[ContactID] )AS [Project1] ON cast(1 as bit) = cast(1 as bit)
LEFT OUTER JOIN (SELECT [Extent4].[ContactID] AS [ContactID]
FROM [DBA].[Contact] AS [Extent4] WHERE [Extent2].[ContactID] = [Extent4].[ContactID] ) AS [Project2] ON cast(1 as bit) = cast(1 as bit)
WHERE ([Extent1].[ContactID] = [Project1].[ContactID]) OR (([Extent1].[ContactID] IS NULL) AND ([Project2].[ContactID] IS NULL)) )
) AS [Distinct1]
LEFT OUTER JOIN
(SELECT [Extent5].[ContactID] AS [ContactID], [Extent6].[SalesOrderID] AS [SalesOrderID], [Extent6].[TotalDue] AS [TotalDue], 1 AS [C1]
FROM [DBA].[Contact] AS [Extent5]
INNER JOIN [DBA].[SalesOrderHeader] AS [Extent6]
ON EXISTS (SELECT cast(1 as bit) AS [C1]
FROM ( SELECT cast(1 as bit) AS X ) AS [SingleRowTable2]
LEFT OUTER JOIN (SELECT [Extent7].[ContactID] AS [ContactID]
FROM [DBA].[Contact] AS [Extent7] WHERE [Extent6].[ContactID] = [Extent7].[ContactID] )AS [Project5] ON cast(1 as bit) = cast(1 as bit)
LEFT OUTER JOIN (SELECT [Extent8].[ContactID] AS [ContactID]
FROM [DBA].[Contact] AS [Extent8] WHERE [Extent6].[ContactID] = [Extent8].[ContactID] )AS [Project6] ON cast(1 as bit) = cast(1 as bit)
WHERE ([Extent5].[ContactID] = [Project5].[ContactID]) OR (([Extent5].[ContactID] IS NULL) AND ([Project6].[ContactID] IS NULL))
)
) AS [Project8]
ON ([Project8].[ContactID] = [Distinct1].[ContactID]) OR (([Project8].[ContactID] IS NULL) AND ([Distinct1].[ContactID] IS NULL))
) AS [Project9]
ORDER BY [Project9].[ContactID] ASC, [Project9].[C2] ASC
9
Equivalent SQL query
select Extent6.ContactID,
1 as C1,
1 as C2,
Extent6.ContactID as ContactID1,
Extent6.SalesOrderID as SalesOrderID,
Extent6.TotalDue as TotalDue
from DBA.SalesOrderHeader as Extent6
order by Extent6.ContactID asc
Can your query optimizer get there?
10
Challenges of ORMs
• Object-instance-at-a-time navigation through the
object model can result in multiple, separate
interactions to the database server to retrieve the
rows to create the objects
Termed the N+1 Selects problem
Application developer must tradeoff prefetching applicability in
various scenarios
Objects are not partially instantiated; all object attributes are required
for the constructor
• Transactional semantics are complex once caching
is introduced
Transactional semantics often differ across DBMSs, even with the
identical isolation level
Developers must (still) be aware of the potential for lost updates
11
Challenges of ORMs
• The SQL dialect supported by ORM toolkits is
typically a very restricted subset of ANSI SQL
• ORM toolkits often support only limited,
straightforward DDL for schema
creation/modification
Not recommended for production applications
12
Important aspects of ORM toolkits
•
•
•
•
•
•
Mapping specification
Query language
Persistence
Class inheritance
Fetch strategies
Caching
• We look and see how these are implemented in
Hibernate, a popular open-source ORM toolkit for
Java applications
13
Hibernate as an example of an ORM
Brief introduction to Hibernate
• Open-source, LGPL Java ORM toolkit
• Originally developed by Christian Bauer, Gavin
King, and a worldwide team of developers
• Now maintained by a team at JBoss (Redhat) led
by Steve Ebersole
• Ported to the .NET environment (C#), called
Nhibernate
• http://hibernate.org
15
Hibernate mapping specification
CREATE TABLE "GROUPO"."SalesOrders" (
"ID"
integer NOT NULL DEFAULT autoincrement
,"CustomerID"
integer NOT NULL
,"OrderDate"
date NOT NULL
,"FinancialCode"
char(2) NULL
,"Region"
char(7) NULL
,"SalesRepresentative"
integer NOT NULL
,CONSTRAINT "SalesOrdersKey" PRIMARY KEY ("ID")
)
ALTER TABLE "GROUPO"."SalesOrders“
ADD FOREIGN KEY "FK_SalesRepresentative_EmployeeID"
("SalesRepresentative“) REFERENCES "GROUPO"."Employees" ("EmployeeID")
ALTER TABLE "GROUPO"."SalesOrders“
ADD FOREIGN KEY "FK_FinancialCode_Code" ("FinancialCode")
REFERENCES "GROUPO"."FinancialCodes" ("Code“)
ON DELETE SET NULL
ALTER TABLE "GROUPO"."SalesOrders“
ADD FOREIGN KEY "FK_CustomerID_ID" ("CustomerID")
REFERENCES "GROUPO"."Customers" ("ID“)
16
Hibernate mapping specification
Hibernate mapping file SalesOrders.hbm.xml:
<?xml version="1.0"?>
<!DOCTYPE hibernate-mapping PUBLIC "-//Hibernate/Hibernate Mapping DTD 3.0//EN"
"http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">
<!-- Generated Mar 3, 2009 11:59:55 AM by Hibernate Tools 3.2.2.GA -->
<hibernate-mapping>
<class name="SalesOrders" table="SalesOrders">
<comment>sales orders that customers have submitted to the sporting goods company</comment>
<id name="id" type="int">
<column name="ID" />
<generator class="assigned" />
</id>
<many-to-one name="employees" class="Employees" fetch="select">
<column name="SalesRepresentative" not-null="true" />
</many-to-one>
<many-to-one name="financialCodes" class="FinancialCodes" fetch="select">
<column name="FinancialCode" length="2" />
</many-to-one>
<many-to-one name="customers" class="Customers" fetch="select">
<column name="CustomerID" not-null="true" />
</many-to-one>
<property name="orderDate" type="date">
<column name="OrderDate" length="10" not-null="true" />
</property> .............[snip]
</class>
</hibernate-mapping>
17
Java class implementation
// default package
// Generated Mar 3, 2009 1:39:06 PM by Hibernate Tools 3.2.2.GA
import java.util.Date;
import java.util.HashSet;
import java.util.Set;
/**
* SalesOrders generated by hbm2java
*/
public class SalesOrders implements java.io.Serializable {
private
private
private
private
private
private
private
int id;
Employees employees;
FinancialCodes financialCodes;
Customers customers;
Date orderDate;
String region;
Set salesOrderItemses = new HashSet(0);
public SalesOrders() {
}
public SalesOrders(int id, Employees employees, Customers customers,
Date orderDate) {
this.id = id;
this.employees = employees;
this.customers = customers;
this.orderDate = orderDate;
}
18
HQL: Hibernate Query Language
• Subset of ANSI query specification with support
for
DISTINCT
GROUP BY, simple aggregation
INNER, LEFT- and RIGHT-OUTER JOIN
Quantified subqueries
• Supports dot-notation for many-to-one, one-toone associations, for example:
select s.id, s.orderDate, s.region, s.customers.givenName, s.customers.surname
from SalesOrders s
where s.orderDate between '2001-03-16' and '2001-03-26' and s.region <> 'Central'
order by s.orderDate
19
HQL: Hibernate Query Language
• Hibernate’s HQL supports SELECT, inner and
outer JOIN, WHERE, HAVING, simple GROUP BY,
UNION, ORDER BY, self-joins with different
correlation names
• HQL does not support recursion, common table
expressions, window functions, derived tables,
other set operators, table functions, array or
structured types, APPLY/OUTER APPLY, CROSS
JOIN, GROUP BY CUBE/ROLLUP/GROUPING
SETS, FULL OUTER JOIN
select distinct c.givenName
from Customers c left join c.salesOrderses
where c.city in (select c2.city from Customers c2 where
c2.companyName like '%Power%')
20
HQL: Hibernate Query Language
• HQL is augmented by “dialects” that implement
specific methods to modify the SQL generated
by Hibernate before execution on the server
• HQL has optional support for (implemented by a
specific dialect):
UNION ALL (for entity-type hierarchies)
LIMIT (SELECT TOP N), OFFSET
IDENTITY, GUID data types
Syntax to declare an updateable cursor and locking mode
Case-insensitive string comparisons
21
Object persistence in Hibernate
• Saving objects
Once an object is created or modified, it must be saved
explicitly and then the transaction must be committed:
session.save(<object name>);
tx.commit();
• Loading objects
The Hibernate session interface offers several load() methods
for loading objects into memory from database tuples:
public Object load(Class theClass, Serializable id)
public Object load(String entityname, Serializable id)
public Object load(Object object, Serializable id)
Other load() methods permit specification of a lock mode, ie
Select for Update
22
Object persistence in Hibernate
• A refresh() method is implemented to reload
objects from the database
Useful for when attributes are modified by database triggers
upon INSERT or UPDATE
Highly error-prone
• Updating objects
Hibernate manages changes to persistent objects
transparently and automatically
If an attribute is altered, the appropriate Hibernate session will
queue the change for writing to the database using SQL
One can force changes to be written at a certain point using the
flush() method, controlled by isDirty() and setFlushMode()
23
Class inheritance
• Hibernate offers a variety of built-in techniques
to handle different normalizations of entity-type
hierarchies:
Single table with discriminator value
Multiple tables fully normalized into BCNF
A hybrid model consisting of a mixture of the two
• If represented as different objects in a mapping,
an ETH requires careful construction and
different equals() and hashcode()
implementations
24
FETCH strategies
• A major pain point is the N+1 SELECTs problem
Navigation through the object model iteratively causes
additional rows to be retrieved from the database, using
independent SQL requests
“client-side” join; performance tends to be extremely poor due to
the additional latency
Adaptive, client-side prefetching and SQL rewriting may be of
benefit in these scenarios: see Ivan Bowman’s PhD thesis on
Scalpel
Alternatively, in the mapping one may specify the selection
method: either “lazy” or “eager”
One must tradeoff this method will global application behaviour, or
override it on a case-by-case basis using HQL-specific syntax
25
Concurrency control
• Hibernate relies on the database’s locking scheme
for concurrency control
ANSI isolation levels 0-3 are supported directly; 1 or 2 is
recommended (READ COMMITTED and REPEATABLE READ)
DBMS that support snapshot isolation require tweaks to their
Hibernate dialect implementation
Lock mode for individual HQL statements or instantiations of objects
can be specified directly, ie
Customer c = (Customer) session.get(Customer.class, 101, LockMode.Upgrade);
• Hibernate has builtin support for optimistic
concurrency control
Can use either version numbers (added to the schema of the table),
or base change control on all of the values in the row
26
Caching and concurrency control
• Like many runtime environments, Hibernate
supports built-in caching controls to speed-up
database interaction
• Two levels of caching are supported:
Level 1: Persistence context cache
Lifespan is a transaction or a conversation without sharing.
Guarantees scope of the object and object identity. Mandatory.
Level 2: Pluggable, scope is process or cluster (shared)
Configurable on a class-by-class basis
Selectable concurrency control strategies: Transactional, Readwrite, non-strict read-write, read-only
EHCache, OpenSymphony, SwarmCache, JBoss Cache
27
Research opportunities
• Robust, industrial-strength debugging frameworks
Identify how a particular database interaction was caused by what
specific application program behaviour
Analysis of concurrency control behaviour to determine application
correctness, for example with respect to lost updates
• Identifying common classes of SQL queries for
specific sets of optimizations
Place in the dialect layer, in the DBMS itself, or in a separate tool
that can support the optimization of multiple inter-related statements
• Mechanisms to support additional types of mappings
and more complex schemas
• Support for recursion
28
Questions?
• Watch my blog for continuing articles on
supporting ORM toolkits with SQL Anywhere
http://iablog.sybase.com/paulley
References:
•
•
•
Christian Bauer and Gavin King (November 2007). Java Persistence with
Hibernate (revised edition of Hibernate in Action). Manning Publications, New
York, New York. ISBN 1-932394-88-5. Seventh printing.
Dave Minter and Jeff Linwood (2005) Pro Hibernate 3. Apress Books, Berkeley,
CA.
ACM Queue 6(3), May/June 2008, “Bridging the Object-Relational Divide”.
29