ORMs and Hibernate - Data Systems Group

Download Report

Transcript ORMs and Hibernate - Data Systems Group

ORMs and Hibernate
Glenn Paulley, Director, Engineering
http://iablog.sybase.com/paulley
What is object-relational mapping?
• Object-oriented programming technologies are
typically used to implement business logic
• Relational databases are used for persistent data
storage
• Impedance mismatch between the two
paradigms: objects vs. relations
 Estimated that 30-40% of a JDBC application involves
coercing data from tuples to object instances and back again
• ORM toolkits are designed to address this
impedance mismatch
 61 different ORM toolkits are listed in Wikipedia for Java, C++,
Delphi, Ruby, .NET, PHP, Python, Perl
2
What is object-relational mapping?
• To exploit object behaviour fully, data-access
from within an object-oriented programming
language should offer:






Separation of concerns
Information hiding
Inheritance
Change detection
Uniqueness capability
Database independence
3
Change detection
• ORM toolkits require mechanisms to track
changes to objects made by the application
 When a transaction is complete, write the changes to the
database within an atomic transaction
 Need appropriate guarantees to prevent lost updates
4
Uniqueness
• Mappings are usually a correspondence between a
row in a normalized table and a class instance
 Specified using metadata
 For example, a row of the Employee table will correspond to an
instance of the Employee object within the application
 Mappings are often not isomorphic
 Sophisticated ORMs such as Hibernate and LINQ permit object
models that differ substantially from the underlying relational store
 Object-oriented language features offer greater semantic flexibility in
application design than 1NF values from a relational database
• Need to establish a correspondence between an inmemory object and a database row
 Must be independent of how the object was acquired: a database
query, or navigating a reference to another object
 Predicated on the existence of primary keys in the database
5
Database independence
• Many ORM toolkits attempt to offer database
independence, so that applications can be ported
from one DBMS to another
 Create common APIs and models to interact with a variety of
DBMS platforms
• Useful with mobilized applications where the
consolidated database is one DBMS, and local
databases are different
6
Why are ORMs useful?
• Eliminates tedious, repetitive code that
instantiates object instances from tuples using a
SELECT statement and a CURSOR
• Insulates, to some extent, the application
developer from vendor-specific SQL extensions
• Permits the application developer to exploit
object-orientation and model and manipulate the
application view differently from the relational
model
• Data manipulation can be done at the object
level, rather than (only) at a SQL statement level
7
Challenges of ORMs
• ORM toolkits introduce an additional level of
complexity to the application
 Example: Java Hibernate 3.2.6 is
 266 packages, 1938 classes, 18,680 functions, over 118K LOC
 Can be difficult to debug, perform performance analysis
 Most frameworks suffer from a lack of appropriate tools
• Performance analysis is problematic because the
application’s behaviour is not tied directly to
specific interactions with the database
 Complex mappings may cause very complex SQL queries to
be generated
 Can be difficult for the application developer to understand
what caused their construction
8
Complex SQL (LINQ generated)
SELECT
[Project9].[ContactID] AS [ContactID],[Project9].[C1] AS [C1],[Project9].[C2] AS [C2],[Project9].[ContactID1] AS [ContactID1],[Project9].[SalesOrderID] AS
[SalesOrderID],
[Project9].[TotalDue] AS [TotalDue]
FROM ( SELECT
[Distinct1].[ContactID] AS [ContactID],
1 AS [C1],
[Project8].[ContactID] AS [ContactID1],
[Project8].[SalesOrderID] AS [SalesOrderID],
[Project8].[TotalDue] AS [TotalDue],
[Project8].[C1] AS [C2]
FROM
(SELECT DISTINCT [Extent1].[ContactID] AS [ContactID]
FROM [DBA].[Contact] AS [Extent1]
INNER JOIN [DBA].[SalesOrderHeader] AS [Extent2]
ON EXISTS (SELECT cast(1 as bit) AS [C1]
FROM ( SELECT cast(1 as bit) AS X ) AS [SingleRowTable1]
LEFT OUTER JOIN (SELECT [Extent3].[ContactID] AS [ContactID]
FROM [DBA].[Contact] AS [Extent3] WHERE [Extent2].[ContactID] = [Extent3].[ContactID] )AS [Project1] ON cast(1 as bit) = cast(1 as bit)
LEFT OUTER JOIN (SELECT [Extent4].[ContactID] AS [ContactID]
FROM [DBA].[Contact] AS [Extent4] WHERE [Extent2].[ContactID] = [Extent4].[ContactID] ) AS [Project2] ON cast(1 as bit) = cast(1 as bit)
WHERE ([Extent1].[ContactID] = [Project1].[ContactID]) OR (([Extent1].[ContactID] IS NULL) AND ([Project2].[ContactID] IS NULL)) )
) AS [Distinct1]
LEFT OUTER JOIN
(SELECT [Extent5].[ContactID] AS [ContactID], [Extent6].[SalesOrderID] AS [SalesOrderID], [Extent6].[TotalDue] AS [TotalDue], 1 AS [C1]
FROM [DBA].[Contact] AS [Extent5]
INNER JOIN [DBA].[SalesOrderHeader] AS [Extent6]
ON EXISTS (SELECT cast(1 as bit) AS [C1]
FROM ( SELECT cast(1 as bit) AS X ) AS [SingleRowTable2]
LEFT OUTER JOIN (SELECT [Extent7].[ContactID] AS [ContactID]
FROM [DBA].[Contact] AS [Extent7] WHERE [Extent6].[ContactID] = [Extent7].[ContactID] )AS [Project5] ON cast(1 as bit) = cast(1 as bit)
LEFT OUTER JOIN (SELECT [Extent8].[ContactID] AS [ContactID]
FROM [DBA].[Contact] AS [Extent8] WHERE [Extent6].[ContactID] = [Extent8].[ContactID] )AS [Project6] ON cast(1 as bit) = cast(1 as bit)
WHERE ([Extent5].[ContactID] = [Project5].[ContactID]) OR (([Extent5].[ContactID] IS NULL) AND ([Project6].[ContactID] IS NULL))
)
) AS [Project8]
ON ([Project8].[ContactID] = [Distinct1].[ContactID]) OR (([Project8].[ContactID] IS NULL) AND ([Distinct1].[ContactID] IS NULL))
) AS [Project9]
ORDER BY [Project9].[ContactID] ASC, [Project9].[C2] ASC
9
Equivalent SQL query
select Extent6.ContactID,
1 as C1,
1 as C2,
Extent6.ContactID as ContactID1,
Extent6.SalesOrderID as SalesOrderID,
Extent6.TotalDue as TotalDue
from DBA.SalesOrderHeader as Extent6
order by Extent6.ContactID asc
Can your query optimizer get there?
10
Challenges of ORMs
• Object-instance-at-a-time navigation through the
object model can result in multiple, separate
interactions to the database server to retrieve the
rows to create the objects
 Termed the N+1 Selects problem
 Application developer must tradeoff prefetching applicability in
various scenarios
 Objects are not partially instantiated; all object attributes are required
for the constructor
• Transactional semantics are complex once caching
is introduced
 Transactional semantics often differ across DBMSs, even with the
identical isolation level
 Developers must (still) be aware of the potential for lost updates
11
Challenges of ORMs
• The SQL dialect supported by ORM toolkits is
typically a very restricted subset of ANSI SQL
• ORM toolkits often support only limited,
straightforward DDL for schema
creation/modification
 Not recommended for production applications
12
Important aspects of ORM toolkits
•
•
•
•
•
•
Mapping specification
Query language
Persistence
Class inheritance
Fetch strategies
Caching
• We look and see how these are implemented in
Hibernate, a popular open-source ORM toolkit for
Java applications
13
Hibernate as an example of an ORM
Brief introduction to Hibernate
• Open-source, LGPL Java ORM toolkit
• Originally developed by Christian Bauer, Gavin
King, and a worldwide team of developers
• Now maintained by a team at JBoss (Redhat) led
by Steve Ebersole
• Ported to the .NET environment (C#), called
Nhibernate
• http://hibernate.org
15
Hibernate mapping specification
CREATE TABLE "GROUPO"."SalesOrders" (
"ID"
integer NOT NULL DEFAULT autoincrement
,"CustomerID"
integer NOT NULL
,"OrderDate"
date NOT NULL
,"FinancialCode"
char(2) NULL
,"Region"
char(7) NULL
,"SalesRepresentative"
integer NOT NULL
,CONSTRAINT "SalesOrdersKey" PRIMARY KEY ("ID")
)
ALTER TABLE "GROUPO"."SalesOrders“
ADD FOREIGN KEY "FK_SalesRepresentative_EmployeeID"
("SalesRepresentative“) REFERENCES "GROUPO"."Employees" ("EmployeeID")
ALTER TABLE "GROUPO"."SalesOrders“
ADD FOREIGN KEY "FK_FinancialCode_Code" ("FinancialCode")
REFERENCES "GROUPO"."FinancialCodes" ("Code“)
ON DELETE SET NULL
ALTER TABLE "GROUPO"."SalesOrders“
ADD FOREIGN KEY "FK_CustomerID_ID" ("CustomerID")
REFERENCES "GROUPO"."Customers" ("ID“)
16
Hibernate mapping specification
Hibernate mapping file SalesOrders.hbm.xml:
<?xml version="1.0"?>
<!DOCTYPE hibernate-mapping PUBLIC "-//Hibernate/Hibernate Mapping DTD 3.0//EN"
"http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">
<!-- Generated Mar 3, 2009 11:59:55 AM by Hibernate Tools 3.2.2.GA -->
<hibernate-mapping>
<class name="SalesOrders" table="SalesOrders">
<comment>sales orders that customers have submitted to the sporting goods company</comment>
<id name="id" type="int">
<column name="ID" />
<generator class="assigned" />
</id>
<many-to-one name="employees" class="Employees" fetch="select">
<column name="SalesRepresentative" not-null="true" />
</many-to-one>
<many-to-one name="financialCodes" class="FinancialCodes" fetch="select">
<column name="FinancialCode" length="2" />
</many-to-one>
<many-to-one name="customers" class="Customers" fetch="select">
<column name="CustomerID" not-null="true" />
</many-to-one>
<property name="orderDate" type="date">
<column name="OrderDate" length="10" not-null="true" />
</property> .............[snip]
</class>
</hibernate-mapping>
17
Java class implementation
// default package
// Generated Mar 3, 2009 1:39:06 PM by Hibernate Tools 3.2.2.GA
import java.util.Date;
import java.util.HashSet;
import java.util.Set;
/**
* SalesOrders generated by hbm2java
*/
public class SalesOrders implements java.io.Serializable {
private
private
private
private
private
private
private
int id;
Employees employees;
FinancialCodes financialCodes;
Customers customers;
Date orderDate;
String region;
Set salesOrderItemses = new HashSet(0);
public SalesOrders() {
}
public SalesOrders(int id, Employees employees, Customers customers,
Date orderDate) {
this.id = id;
this.employees = employees;
this.customers = customers;
this.orderDate = orderDate;
}
18
HQL: Hibernate Query Language
• Subset of ANSI query specification with support
for




DISTINCT
GROUP BY, simple aggregation
INNER, LEFT- and RIGHT-OUTER JOIN
Quantified subqueries
• Supports dot-notation for many-to-one, one-toone associations, for example:
select s.id, s.orderDate, s.region, s.customers.givenName, s.customers.surname
from SalesOrders s
where s.orderDate between '2001-03-16' and '2001-03-26' and s.region <> 'Central'
order by s.orderDate
19
HQL: Hibernate Query Language
• Hibernate’s HQL supports SELECT, inner and
outer JOIN, WHERE, HAVING, simple GROUP BY,
UNION, ORDER BY, self-joins with different
correlation names
• HQL does not support recursion, common table
expressions, window functions, derived tables,
other set operators, table functions, array or
structured types, APPLY/OUTER APPLY, CROSS
JOIN, GROUP BY CUBE/ROLLUP/GROUPING
SETS, FULL OUTER JOIN
select distinct c.givenName
from Customers c left join c.salesOrderses
where c.city in (select c2.city from Customers c2 where
c2.companyName like '%Power%')
20
HQL: Hibernate Query Language
• HQL is augmented by “dialects” that implement
specific methods to modify the SQL generated
by Hibernate before execution on the server
• HQL has optional support for (implemented by a
specific dialect):





UNION ALL (for entity-type hierarchies)
LIMIT (SELECT TOP N), OFFSET
IDENTITY, GUID data types
Syntax to declare an updateable cursor and locking mode
Case-insensitive string comparisons
21
Object persistence in Hibernate
• Saving objects
 Once an object is created or modified, it must be saved
explicitly and then the transaction must be committed:
session.save(<object name>);
tx.commit();
• Loading objects
 The Hibernate session interface offers several load() methods
for loading objects into memory from database tuples:
public Object load(Class theClass, Serializable id)
public Object load(String entityname, Serializable id)
public Object load(Object object, Serializable id)
 Other load() methods permit specification of a lock mode, ie
Select for Update
22
Object persistence in Hibernate
• A refresh() method is implemented to reload
objects from the database
 Useful for when attributes are modified by database triggers
upon INSERT or UPDATE
 Highly error-prone
• Updating objects
 Hibernate manages changes to persistent objects
transparently and automatically
 If an attribute is altered, the appropriate Hibernate session will
queue the change for writing to the database using SQL
 One can force changes to be written at a certain point using the
flush() method, controlled by isDirty() and setFlushMode()
23
Class inheritance
• Hibernate offers a variety of built-in techniques
to handle different normalizations of entity-type
hierarchies:
 Single table with discriminator value
 Multiple tables fully normalized into BCNF
 A hybrid model consisting of a mixture of the two
• If represented as different objects in a mapping,
an ETH requires careful construction and
different equals() and hashcode()
implementations
24
FETCH strategies
• A major pain point is the N+1 SELECTs problem
 Navigation through the object model iteratively causes
additional rows to be retrieved from the database, using
independent SQL requests
 “client-side” join; performance tends to be extremely poor due to
the additional latency
 Adaptive, client-side prefetching and SQL rewriting may be of
benefit in these scenarios: see Ivan Bowman’s PhD thesis on
Scalpel
 Alternatively, in the mapping one may specify the selection
method: either “lazy” or “eager”
 One must tradeoff this method will global application behaviour, or
override it on a case-by-case basis using HQL-specific syntax
25
Concurrency control
• Hibernate relies on the database’s locking scheme
for concurrency control
 ANSI isolation levels 0-3 are supported directly; 1 or 2 is
recommended (READ COMMITTED and REPEATABLE READ)
 DBMS that support snapshot isolation require tweaks to their
Hibernate dialect implementation
 Lock mode for individual HQL statements or instantiations of objects
can be specified directly, ie
Customer c = (Customer) session.get(Customer.class, 101, LockMode.Upgrade);
• Hibernate has builtin support for optimistic
concurrency control
 Can use either version numbers (added to the schema of the table),
or base change control on all of the values in the row
26
Caching and concurrency control
• Like many runtime environments, Hibernate
supports built-in caching controls to speed-up
database interaction
• Two levels of caching are supported:
 Level 1: Persistence context cache
 Lifespan is a transaction or a conversation without sharing.
 Guarantees scope of the object and object identity. Mandatory.
 Level 2: Pluggable, scope is process or cluster (shared)
 Configurable on a class-by-class basis
 Selectable concurrency control strategies: Transactional, Readwrite, non-strict read-write, read-only
 EHCache, OpenSymphony, SwarmCache, JBoss Cache
27
Research opportunities
• Robust, industrial-strength debugging frameworks
 Identify how a particular database interaction was caused by what
specific application program behaviour
 Analysis of concurrency control behaviour to determine application
correctness, for example with respect to lost updates
• Identifying common classes of SQL queries for
specific sets of optimizations
 Place in the dialect layer, in the DBMS itself, or in a separate tool
that can support the optimization of multiple inter-related statements
• Mechanisms to support additional types of mappings
and more complex schemas
• Support for recursion
28
Questions?
• Watch my blog for continuing articles on
supporting ORM toolkits with SQL Anywhere
http://iablog.sybase.com/paulley
References:
•
•
•
Christian Bauer and Gavin King (November 2007). Java Persistence with
Hibernate (revised edition of Hibernate in Action). Manning Publications, New
York, New York. ISBN 1-932394-88-5. Seventh printing.
Dave Minter and Jeff Linwood (2005) Pro Hibernate 3. Apress Books, Berkeley,
CA.
ACM Queue 6(3), May/June 2008, “Bridging the Object-Relational Divide”.
29