NoSQL / Spring Data

Polyglot Persistence – An introduction to Spring Data
Pronam Chatterjee
© 2011 VMware Inc. All rights reserved
Presentation goal
How Spring Data simplifies the development of NoSQL applications
Agenda
• Why NoSQL?
• Overview of NoSQL databases
• Introduction to Spring Data
• Database APIs
- MongoDB
- HyperSQL
- Neo4J
Relational databases are great
• SQL = Rich, declarative query language
• Database enforces referential integrity
• ACID semantics
• Well understood by developers
• Well supported by frameworks and tools, e.g. Spring JDBC, Hibernate, JPA
• Well understood by operations
- Configuration
- Care and feeding
- Backups
- Tuning
- Failure and recovery
- Performance characteristics
• But…
The trouble with relational databases
• Object/relational impedance mismatch
- Complicated to map rich domain model to relational schema
• Relational schema is rigid
- Difficult to handle semi-structured data, e.g. varying attributes
- Schema changes = downtime or $$
• Extremely difficult/impossible to scale writes:
- Vertical scaling is limited/requires $$
- Horizontal scaling is limited or requires $$
• Performance can be suboptimal for some use cases
NoSQL databases have emerged…
Each one offers some combination of:
• High performance
• High scalability
• Rich data model
• Schema-less
In return for:
• Limited transactions
• Relaxed consistency
• …
… but there are few commonalities
• Everyone and their dog has written one
• Different data models
- Key-value
- Column
- Document
- Graph
• Different APIs – No JDBC, Hibernate, JPA (generally)
• “Same sorry state as the database market in the 1970s before SQL was invented” http://queue.acm.org/detail.cfm?id=1961297
NoSQL databases have emerged…
• NoSQL usage small by comparison…
• But growing…
Agenda
• Why NoSQL?
• Overview of NoSQL databases
• Introduction to Spring Data
• Database APIs
- MongoDB
- HyperSQL
- Neo4J
Redis
• Advanced key-value store
- Think memcached on steroids (the good kind)
- Values can be binary strings, Lists, Sets, Ordered Sets, Hash maps, ..
- Operations for each data type, e.g. appending to a list, adding to a set, retrieving a slice of a list, …
- Provides pub/sub-based messaging
• Very fast:
- In-memory operations
- ~100K operations/second on entry-level hardware
• Persistent
- Periodic snapshots of memory OR append commands to log file
- Limits are size of keys retained in memory.
• Has “transactions”
- Commands can be batched and executed atomically
[Figure: key-value store mapping K1 → V1, K2 → V2, K3 → V2]
Redis use cases
• Use in conjunction with another database as the system of record (SOR)
• Drop-in replacement for Memcached
- Session state
- Cache of data retrieved from SOR
- Denormalized datastore for high-performance queries
• Hit counts using INCR command
• Randomly selecting an item – SRANDMEMBER
• Queuing – Lists with LPOP, RPUSH, ….
• High score tables – Sorted sets
Notable users: GitHub, guardian.co.uk, …
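A rough illustration of what these commands do, using plain Java collections rather than a Redis client. The class RedisSemanticsSketch and its method names are hypothetical; they only mirror the semantics of INCR, RPUSH/LPOP and a sorted-set high score table. ZADD and ZREVRANGE are assumed here as the sorted-set commands, since the slide names only the data type.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory analogue of three Redis use cases; NOT a Redis client.
class RedisSemanticsSketch {
    private final Map<String, Long> counters = new HashMap<>();
    private final Deque<String> queue = new ArrayDeque<>();
    private final Map<String, Double> scores = new HashMap<>();

    // Like INCR: a missing key counts as 0; returns the value after incrementing.
    long incr(String key) {
        return counters.merge(key, 1L, Long::sum);
    }

    // Like RPUSH: append to the tail of the list.
    void rpush(String value) {
        queue.addLast(value);
    }

    // Like LPOP: remove from the head; null when the list is empty.
    String lpop() {
        return queue.pollFirst();
    }

    // Like ZADD: set (or overwrite) the score of a member.
    void zadd(String member, double score) {
        scores.put(member, score);
    }

    // Like ZREVRANGE 0 n-1: up to n members, highest score first.
    List<String> top(int n) {
        List<String> members = new ArrayList<>(scores.keySet());
        members.sort(Comparator.comparingDouble((String m) -> scores.get(m)).reversed());
        return members.subList(0, Math.min(n, members.size()));
    }
}
```

With a real server, each method would be a single round trip to Redis, which is what makes these patterns cheap enough to use on every request.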
vFabric Gemfire - Elastic data fabric
• High performance data grid
• Enhanced parallel disk persistence
• Non-disruptive up/down scalability
• Session state
- Cache of data retrieved from SOR
- Denormalized datastore for high-performance queries
• Heterogeneous data sharing
- Java
- .NET
- C++
• Co-located transactions
Gemfire - Use Cases
• Ultra-low-latency, high-throughput applications
• As an L2 cache in Hibernate
• Distributed batch processing
• Session state
- Tomcat
- tcServer
• Wide-area replication
Neo4j
• Graph data model
- Collection of graph nodes
- Typed relationships between nodes
- Nodes and relationships have properties
• High performance traversal API from roots
- Breadth first/depth first
• Query to find root nodes
- Indexes on node/relationship properties
- Pluggable - Lucene is the default
• Graph algorithms: shortest path, …
• Transactional (ACID) including 2PC
• Deployment modes
- Embedded – written in Java
- Server with REST API
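To make the data model concrete, here is a minimal plain-Java sketch of a property graph with typed relationships and a breadth-first traversal from a root node. It does not use the Neo4j API; PropertyGraphSketch, Node, Relationship and the "KNOWS" relationship type are hypothetical names chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Plain-Java model of a property graph; NOT the Neo4j API.
class PropertyGraphSketch {

    static class Node {
        final Map<String, Object> properties = new HashMap<>();
        final List<Relationship> outgoing = new ArrayList<>();

        Node(String name) {
            properties.put("name", name);
        }

        // Creates a typed, directed relationship from this node to 'end'.
        Relationship relateTo(Node end, String type) {
            Relationship r = new Relationship(type, end);
            outgoing.add(r);
            return r;
        }
    }

    static class Relationship {
        final String type;
        final Node end;
        final Map<String, Object> properties = new HashMap<>();

        Relationship(String type, Node end) {
            this.type = type;
            this.end = end;
        }
    }

    // Breadth-first traversal from 'root', following only relationships
    // of the given type; returns node names in visiting order.
    static List<String> breadthFirst(Node root, String type) {
        List<String> names = new ArrayList<>();
        Set<Node> visited = new HashSet<>();
        Queue<Node> frontier = new ArrayDeque<>();
        visited.add(root);
        frontier.add(root);
        while (!frontier.isEmpty()) {
            Node current = frontier.remove();
            names.add((String) current.properties.get("name"));
            for (Relationship r : current.outgoing) {
                // Set.add returns false for already-visited nodes, so cycles terminate.
                if (r.type.equals(type) && visited.add(r.end)) {
                    frontier.add(r.end);
                }
            }
        }
        return names;
    }
}
```

A graph database stores relationships as first-class records, so a traversal like this follows direct references instead of performing a join per hop.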
Neo4j Data Model
Neo4j Use Cases
• Use Cases
- Anything social
- Cloud/Network management, i.e. tracking/managing physical/virtual resources
- Any kind of geospatial data
- Master data management
- Bioinformatics
- Fraud detection
- Metadata management
• Who is using it?
- StudiVZ (the largest social network in Europe)
- Fanbox
- The Swedish military
- And big organizations in datacom, intelligence, and finance that wish to remain anonymous
MongoDB
• Document-oriented database
- JSON-style documents: Lists, Maps, primitives
- Documents organized into collections (~table)
• Full or partial document updates
- Transactional update in place on one document
- Atomic Modifiers
• Rich query language for dynamic queries
• Index support – secondary and compound
• GridFS for efficiently storing large files
• Map/Reduce
Data Model = Binary JSON documents
One document = one DDD aggregate
{
  "name" : "Ajanta",
  "type" : "Indian",
  "serviceArea" : [
    "94619",
    "94618"
  ],
  "openingHours" : [
    {
      "dayOfWeek" : "Monday",
      "open" : 1730,
      "close" : 2130
    }
  ],
  "_id" : ObjectId("4bddc2f49d1505567c6220a0")
}
• Sequence of bytes on disk = fast I/O
- No joins/seeks
- In-place updates when possible => no index updates
• Transaction = update of single document
MongoDB query by example
• Find a restaurant that serves the 94619 zip code and is open at 6pm on a Monday
{
  serviceArea: "94619",
  openingHours: {
    $elemMatch: {
      "dayOfWeek": "Monday",
      "open": {$lte: 1800},
      "close": {$gte: 1800}
    }
  }
}
DBCursor cursor = collection.find(qbeObject);
while (cursor.hasNext()) {
    DBObject o = cursor.next();
    …
}
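The key point of the query above is that $elemMatch requires all three conditions to hold for the same element of openingHours. A plain-Java sketch of that matching rule, evaluated against a Map-based copy of the sample restaurant document, may make this clearer. It is not the MongoDB driver; RestaurantQuerySketch, matches and ajanta are hypothetical names.

```java
import java.util.List;
import java.util.Map;

// Evaluates the query-by-example semantics in memory; NOT the MongoDB driver.
class RestaurantQuerySketch {

    // True when the document lists 'zip' in serviceArea AND has at least one
    // openingHours element that satisfies all three conditions at once,
    // which is what $elemMatch expresses.
    @SuppressWarnings("unchecked")
    static boolean matches(Map<String, Object> doc, String zip, String day, int time) {
        List<String> serviceArea = (List<String>) doc.get("serviceArea");
        List<Map<String, Object>> hours = (List<Map<String, Object>>) doc.get("openingHours");
        if (serviceArea == null || hours == null || !serviceArea.contains(zip)) {
            return false;
        }
        for (Map<String, Object> slot : hours) {
            boolean sameDay = day.equals(slot.get("dayOfWeek"));
            int open = (Integer) slot.get("open");
            int close = (Integer) slot.get("close");
            if (sameDay && open <= time && close >= time) {
                return true;
            }
        }
        return false;
    }

    // The sample restaurant document from the data model slide (without _id).
    static Map<String, Object> ajanta() {
        return Map.of(
            "name", "Ajanta",
            "type", "Indian",
            "serviceArea", List.of("94619", "94618"),
            "openingHours", List.of(
                Map.of("dayOfWeek", "Monday", "open", 1730, "close", 2130)));
    }
}
```

Calling matches(ajanta(), "94619", "Monday", 1800) corresponds to the 6pm Monday query; in MongoDB the same test runs inside the server and can use indexes.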
MongoDB use cases
• Use cases
- Real-time analytics
- Content management systems
- Single document partial update
- Caching
- High volume writes
• Who is using it?
- Shutterfly, Foursquare
- Bit.ly, Intuit
- SourceForge, NY Times
- GILT Groupe, Evite
- SugarCRM
Copyright (c) 2011 Chris Richardson. All rights reserved.
Other NoSQL databases
• SimpleDB – “key-value”
• Cassandra – column oriented database
• CouchDB – document-oriented
• Membase – key-value
• Riak – key-value + links
• HBase – column-oriented…
http://nosql-database.org/ has a list of 122 NoSQL databases
Agenda
• Why NoSQL?
• Overview of NoSQL databases
• Introduction to Spring Data
• Database APIs
- MongoDB
- HyperSQL
- Neo4J
NoSQL Java APIs
Database    Libraries
Redis       Jedis, JRedis, JDBC-Redis, RJC
Neo4j       Vendor-provided
MongoDB     Vendor-provided Java driver
Gemfire     Pure Java map API, Spring-Gemfire templates
But
• Usage patterns
• Tedious configuration
• Repetitive code
• Error prone code
•…
Spring Data Project Goals
• Bring classic Spring value propositions to a wide range of NoSQL databases:
- Productivity
- Programming model consistency: E.g. <NoSQL>Template classes
- “Portability”
Spring Data sub-projects
• Commons: Polyglot persistence
• Key-Value: Redis, Riak
• Document: MongoDB, CouchDB
• Graph: Neo4j
• GORM for NoSQL
http://www.springsource.org/spring-data
Many entry points to use
• Auto-generated repository implementations
• Opinionated APIs (Think JdbcTemplate)
• Object Mapping (Java and GORM)
• Cross Store Persistence Programming model
• Productivity support in Roo and Grails
Cloud Foundry supports NoSQL
MongoDB and Redis are provided as services
→ Deploy your MongoDB and Redis applications in seconds
Agenda
• Why NoSQL?
• Overview of NoSQL databases
• Introduction to Spring Data
• Database APIs
- MongoDB
- HyperSQL
- Neo4J
Three databases for today’s talk
Document database
Relational database
Graph database
Three persistence strategies for today’s talk
• Lower level template approach
• Conventions based persistence (Hades)
• Cross-Store persistence using JPA and a NoSQL datastore
Spring Template Patterns
• Resource Management
• Callback methods
• Exception Translation
• Simple Query API
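The first three patterns can be sketched in a few lines of plain Java. This is an illustration of the idea only, not Spring code: StoreTemplate, ConnectionCallback, Connection and DataAccessFailure are hypothetical stand-ins for a Spring template class, its callback interface, a driver connection and Spring's unchecked data access exceptions.

```java
import java.util.ArrayList;
import java.util.List;

// Toy version of the template pattern; NOT a Spring class.
class TemplateSketch {

    // Unchecked exception callers see instead of store-specific checked ones.
    static class DataAccessFailure extends RuntimeException {
        DataAccessFailure(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Stand-in for a driver connection; records open/close events in a log.
    static class Connection implements AutoCloseable {
        private final List<String> log;

        Connection(List<String> log) {
            this.log = log;
            log.add("open");
        }

        @Override
        public void close() {
            log.add("close");
        }
    }

    // Callback method: the only code the application has to write.
    interface ConnectionCallback<T> {
        T doWith(Connection connection) throws Exception;
    }

    static class StoreTemplate {
        final List<String> log = new ArrayList<>();

        <T> T execute(ConnectionCallback<T> action) {
            // Resource management: the connection is always closed.
            try (Connection connection = new Connection(log)) {
                return action.doWith(connection);
            } catch (Exception e) {
                // Exception translation: checked -> unchecked, cause preserved.
                throw new DataAccessFailure("store operation failed", e);
            }
        }
    }
}
```

Application code then shrinks to a lambda passed to execute, and never opens, closes or catches anything store-specific itself.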
Repository Implementation
HyperSQL
• Also known as HSQLDB or Hypersonic SQL
• Relational Database
• Table oriented data model
• SQL used for queries
• … you know the rest…
Spring Data Repository Support
• Eliminate boilerplate code – only finder methods, e.g. findByLastName
• Specifications for type-safe queries
• JPA CriteriaBuilder integration
• QueryDSL
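The idea behind finder methods such as findByLastName is that the method name itself carries the query, so no implementation has to be written by hand. The toy below illustrates this with a JDK dynamic proxy over an in-memory list. It is an assumption-laden sketch, not how Spring Data is implemented: it handles only single-property findBy… methods, reads fields by reflection, and all names (DerivedFinderSketch, Person, PersonRepository, repository) are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Toy derived-query repository; NOT Spring Data's implementation.
class DerivedFinderSketch {

    static class Person {
        final String firstName;
        final String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // Only the interface is written by hand; there is no implementing class.
    interface PersonRepository {
        List<Person> findByLastName(String lastName);
    }

    // Builds a proxy that answers findBy<Property>(value) from 'store'.
    @SuppressWarnings("unchecked")
    static <R> R repository(Class<R> type, List<?> store) {
        return (R) Proxy.newProxyInstance(
            type.getClassLoader(),
            new Class<?>[] {type},
            (proxy, method, args) -> {
                String name = method.getName();
                if (!name.startsWith("findBy") || name.length() <= 6
                        || args == null || args.length != 1) {
                    throw new UnsupportedOperationException(name);
                }
                // "findByLastName" -> "lastName"
                String property = Character.toLowerCase(name.charAt(6)) + name.substring(7);
                List<Object> result = new ArrayList<>();
                for (Object entity : store) {
                    Field field = entity.getClass().getDeclaredField(property);
                    field.setAccessible(true);
                    if (args[0].equals(field.get(entity))) {
                        result.add(entity);
                    }
                }
                return result;
            });
    }
}
```

A call such as repository(PersonRepository.class, people).findByLastName("Turing") then filters the list by the lastName field; a real repository would translate the same method name into a store query instead.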
• Type safe queries for multiple backends including JPA, SQL and MongoDB in Java
• Generate Query classes using Java APT
• Code completion in IDE
• Domain types and properties can be referenced safely
• Adapts better to refactoring changes in domain types
http://www.querydsl.com
QueryDSL
• Repository Support
- Spring Data JPA
- Spring Data MongoDB
• Spring Data JDBC extensions
- QueryDslJdbcTemplate
Spring Data Neo4J
• Using AspectJ support providing a new programming model
• Use annotations to define POJO entities
• Constructor advice automatically handles entity creation
• Entity field state persisted to graph using aspects
• Leverage graph database APIs from POJO model
• Annotation-driven indexing of entities for search
Spring Data Graph Neo4J cross-store
• JPA data and “NOSQL” data can share a data model
• Separate the persistence provider by using annotations
– could be the entire Entity
– or, some of the fields of an Entity
• We call this cross-store persistence
– One transaction manager to coordinate the “NOSQL” store with the JPA relational database
– AspectJ support to manage the “NOSQL” entities and fields
• holds on to changed values in “change sets” until the transaction commits for nontransactional data stores
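The change-set idea can be sketched independently of AspectJ and JPA: writes are buffered and only reach the non-transactional store when the surrounding transaction commits. The class below is a hypothetical illustration of that buffering (ChangeSetSketch and its methods are not Spring Data types), with a HashMap standing in for the NoSQL store.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Buffers field changes until commit; NOT a Spring Data class.
class ChangeSetSketch {
    // Stand-in for a non-transactional NoSQL store.
    final Map<String, Object> store = new HashMap<>();

    // Pending changes, kept in the order they were made.
    private final Map<String, Object> pending = new LinkedHashMap<>();

    // Records a change without touching the store.
    void set(String field, Object value) {
        pending.put(field, value);
    }

    // Reads see pending changes first, then the stored value.
    Object get(String field) {
        return pending.containsKey(field) ? pending.get(field) : store.get(field);
    }

    // Called when the surrounding transaction commits: flush the change set.
    void commit() {
        store.putAll(pending);
        pending.clear();
    }

    // Called on rollback: discard the change set; the store is untouched.
    void rollback() {
        pending.clear();
    }
}
```

Because nothing is written before commit, a rollback of the JPA transaction leaves the NoSQL side unchanged, which is the property the single transaction manager relies on.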
A cross-store scenario ...
You have a traditional web app using JPA to persist data to a relational database ...
JPA Data Model
Cross-Store Data Model