
NoSQL Databases CouchDB
By Tom Sausner
Agenda
• Introduction
• Review of NoSQL storage options
  – CAP Theorem
  – Review categories of storage options
• CouchDB
  – Overview
  – Interacting with data
  – Examples
• Technologies applying CouchDB
What does it mean?
• Not Only SQL or NO! SQL
• A more general definition… a datastore that
does not follow the relational model including
using SQL to interact with the data.
• Why?
  – One size does not fit all
  – The relational model has scaling issues
  – Freedom from the tyranny of the DBA?
CAP Theorem
• Conjectured by Eric Brewer of UC Berkeley; proved by
Seth Gilbert and Nancy Lynch of MIT
• Relates to distributed systems
• Consistency, Availability, Partition
Tolerance… pick 2
• A distributed system is built of “nodes”
(computers), which can (attempt to) send
messages to each other over a network….
Consistency
• “is equivalent to requiring requests of the
distributed shared memory to act as if they
were executing on a single node,
responding to operations one at a time.”
  – Not the same as the “C” in “ACID”
• Linearizability ~ operations behave as if
there were no concurrency.
• Does not mention transactions
Available
• “every request received by a non-failing
node in the system must result in a
response.”
• says nothing about the content of the
response. It could be anything; it need not
be “successful” or “correct”.
Partition Tolerant
• Any guarantee of consistency or
availability still holds even if the network
is partitioned.
• if a system is not partition-tolerant, that
means that if the network can lose
messages or any nodes can fail, then any
guarantee of atomicity or consistency is
voided.
Implications of CAP
• How to best scale your application? The world
falls broadly into two ideological camps: the
database crowd and the non-database crowd.
• The database crowd, unsurprisingly, likes
database technology and will tend to address
scale by talking of things like optimistic locking
and sharding.
• The non-database crowd will tend to address
scale by managing data outside of the database
environment (avoiding the relational world) for
as long as possible.
Types of NoSQL datastores
• Key-value stores
• Column stores
• Document stores
• Object stores
Key-Value Stores
• Memcached (Membase, a Memcached-compatible store, just merged with CouchDB)
• Redis
• Riak
Column Stores
• BigTable (Google)
• Dynamo (Amazon; strictly a key-value store)
• Cassandra
• Hadoop/HBase
Document Stores
• CouchDB
• MongoDB
Graph, Object Stores
• Neo4j
• db4o
CouchDB - relax (taken from the
website)
• An Apache project created by Damien Katz
• A document database server, accessible via a RESTful
JSON API.
• Ad-hoc and schema-free with a flat address space.
• Distributed, featuring robust, incremental replication with
bi-directional conflict detection and management.
• CouchOne, the company behind CouchDB, recently merged with Membase
More on CouchDB
• The CouchDB file layout and commitment
system features all of the Atomicity, Consistency,
Isolation, Durability (ACID) properties.
• Document updates (add, edit, delete) are
serialized, except for binary blobs which are
written concurrently.
• CouchDB read operations use a Multi-Version
Concurrency Control (MVCC) model where each
client sees a consistent snapshot of the
database from the beginning to the end of the
read operation.
• Eventually consistent: replicas converge over time through replication
CouchDB Access via curl
• curl http://127.0.0.1:5984/
• curl -X GET http://127.0.0.1:5984/_all_dbs
• curl -X PUT http://127.0.0.1:5984/baseball
• curl -X PUT http://127.0.0.1:5984/baseball
// error: the database already exists
• curl -X DELETE http://127.0.0.1:5984/baseball
Adding Docs via curl
• curl -X PUT http://127.0.0.1:5984/albums
• curl -X PUT http://127.0.0.1:5984/albums/1000 -d '{"title":"Abbey Road","artist":"The Beatles"}'
• UUIDs: curl -X GET http://127.0.0.1:5984/_uuids
• curl -X GET http://127.0.0.1:5984/albums/1000
• _rev - If you want to update or delete a
document, CouchDB expects you to include the
_rev field of the revision you wish to change
• curl -X PUT http://127.0.0.1:5984/albums/1000 -d '{"_rev":"1-42c7396a84eaf1728cdbf08415a09a41","title":"Abbey Road", "artist":"The
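The _rev rule is optimistic concurrency control: an update names the revision it was based on, and a write based on anything but the current revision is refused. The sketch below imitates that rule with a hypothetical in-memory `RevStore` class; it is not CouchDB code, and its revision ids are simplified "N-counter" placeholders rather than CouchDB's "N-hash" form. The 201/409 statuses mirror the HTTP codes CouchDB returns for a successful write and an update conflict.

```javascript
// Hypothetical in-memory store that mimics CouchDB's _rev check (not CouchDB's code).
// Revision ids are "N-<counter>" placeholders; real CouchDB uses "N-<32 hex digits>".
class RevStore {
  constructor() {
    this.docs = new Map();
    this.counter = 0;
  }

  put(id, body) {
    const current = this.docs.get(id);
    // Updating an existing document requires the _rev that was last read.
    if (current && body._rev !== current._rev) {
      return { status: 409, error: "conflict" };
    }
    // The leading number of the revision id counts how many times the doc was written.
    const generation = current ? parseInt(current._rev.split("-")[0], 10) + 1 : 1;
    this.counter += 1;
    const rev = `${generation}-${this.counter}`;
    this.docs.set(id, { ...body, _id: id, _rev: rev });
    return { status: 201, ok: true, id, rev };
  }

  get(id) {
    return this.docs.get(id);
  }
}

const store = new RevStore();
const created = store.put("1000", { title: "Abbey Road", artist: "The Beatles" });
// No _rev: rejected, like CouchDB's 409 "Document update conflict".
const stale = store.put("1000", { title: "Abbey Road", artist: "The Beatles", year: 1969 });
// Current _rev: accepted, and the revision number goes up.
const updated = store.put("1000", { _rev: created.rev, title: "Abbey Road", artist: "The Beatles", year: 1969 });
// Re-using the old _rev after someone else's update is rejected again.
const again = store.put("1000", { _rev: created.rev, title: "Abbey Road" });
```

Nothing is locked while a client holds a document; a conflicting writer simply has to re-read and retry with the newer _rev.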
Futon: CouchDB Maintenance
• http://127.0.0.1:5984/_utils/index.html
• Albums database review
  – Add another document
• Tools
• Database, document, and view creation
• Security, compaction & cleanup
• Create and delete
Demo Setup
• Examples implemented in Groovy
• Use HTTPBuilder to interact with the
database
  – Groovy RESTClient
• Use Google Gson to move objects
between JSON and Java/Groovy
• Use Federal Election Commission (FEC) campaign contribution data
as our dataset.
• Eclipse
Data Loading Review
• Limited input to NY candidates, and only
year 2010
• contributions.fec.2010.csv
• Groovy bean for input data
• Readfile.groovy
• contribDB.put(path: "fed_contrib_test/${contrib.transactionId}", contentType: JSON, requestContentType: JSON, body: json)
Couch DB Design Documents
• CouchDB is designed to work best when
there is a one-to-one correspondence
between applications and design
documents.
• _design/<design_doc_name>
• Design documents are applications
  – i.e., a CouchDB database can be an application.
Design Documents contents
• Update handlers
  – updates: {"hello": function(doc, req) {…}}
• Views (more on this later)
• Validation
• Shows
• Lists
• Filters
• libs
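To make the list of design-document sections concrete, here is a minimal sketch of one design document as a JavaScript object. Every name in it ("_design/albums", "by_artist", "hello", "album", "errors_only") is hypothetical; the section keys (`updates`, `views`, `validate_doc_update`, `shows`, `filters`) are the ones CouchDB reads, and each function is stored as a string of JavaScript source.

```javascript
// Hypothetical design document for the albums database; all names are made up.
// CouchDB stores each function as a string of JavaScript source.
const designDoc = {
  _id: "_design/albums",
  language: "javascript",
  // Update handler: returns [docToSave, response]; a null doc means nothing is saved.
  updates: {
    hello: "function(doc, req) { return [null, 'hello']; }",
  },
  // View: map function (and optionally reduce) that builds an index.
  views: {
    by_artist: {
      map: "function(doc) { if (doc.artist) { emit(doc.artist, doc.title); } }",
    },
  },
  // Validation: runs on every write to this database.
  validate_doc_update:
    "function(newDoc, oldDoc, userCtx) { if (!newDoc._deleted && !newDoc.title) { throw({forbidden: 'title is required'}); } }",
  // Show: renders a single document.
  shows: {
    album: "function(doc, req) { return '<h1>' + (doc ? doc.title : 'not found') + '</h1>'; }",
  },
  // Filter: used with the _changes feed and replication.
  filters: {
    errors_only: "function(doc, req) { return doc.level === 'error'; }",
  },
};

// The JSON body one would PUT to http://127.0.0.1:5984/albums/_design/albums
const body = JSON.stringify(designDoc, null, 2);
```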
Updates
• If you have multiple design documents,
each with a validate_doc_update function,
all of those functions are called upon each
incoming write request
• If any of the validate functions fail then the
document is not added to the database
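The rule on this slide (every validate_doc_update from every design document runs on each write, and a single throw rejects the document) can be simulated in a few lines. `applyWrite` and both validators are hypothetical helpers written for this sketch; the mapping of `{forbidden: ...}` to HTTP 403 and `{unauthorized: ...}` to HTTP 401 follows CouchDB's behaviour.

```javascript
// Simulation of the write path: all validators run, and any throw rejects the doc.
// applyWrite and the validators are hypothetical; this is not CouchDB's implementation.
function applyWrite(db, validators, newDoc, userCtx) {
  const oldDoc = db.get(newDoc._id) || null;
  for (const validate of validators) {
    try {
      validate(newDoc, oldDoc, userCtx);
    } catch (err) {
      // CouchDB answers {forbidden: ...} with 403 and {unauthorized: ...} with 401.
      if (err && err.forbidden) return { status: 403, reason: err.forbidden };
      if (err && err.unauthorized) return { status: 401, reason: err.unauthorized };
      throw err;
    }
  }
  db.set(newDoc._id, newDoc);
  return { status: 201 };
}

// Two validators, standing in for validate_doc_update functions from two design docs.
const requireTitle = (newDoc) => {
  if (!newDoc.title) throw { forbidden: "title is required" };
};
const requireLogin = (newDoc, oldDoc, userCtx) => {
  if (!userCtx.name) throw { unauthorized: "please log in" };
};

const db = new Map();
const validators = [requireTitle, requireLogin];
const ok = applyWrite(db, validators, { _id: "a", title: "Let It Be" }, { name: "rob", roles: [] });
const noTitle = applyWrite(db, validators, { _id: "b" }, { name: "rob", roles: [] });
const anonymous = applyWrite(db, validators, { _id: "c", title: "Help!" }, { name: null, roles: [] });
```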
Validation
• Validation functions are a powerful tool to ensure
that only documents you expect end up in your
databases.
• validate_doc_update section of the design
document
• function(newDoc, oldDoc, userCtx) {}
  – throw({forbidden : message});
  – throw({unauthorized : message});
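A sketch of a complete validation function for the albums database, using the (newDoc, oldDoc, userCtx) signature above. The rules themselves (login required, admin-only deletes, required title and artist, artist frozen once set) are invented for illustration. It is written as a plain function so it can be exercised here; in a design document it would be stored as a string. `userCtx.roles` containing "_admin" is how CouchDB marks a server admin.

```javascript
// Hypothetical validate_doc_update for albums; the rules are illustrative only.
function validateAlbum(newDoc, oldDoc, userCtx) {
  // Anonymous users may not write at all.
  if (!userCtx.name) {
    throw ({ unauthorized: "Please log in to edit albums." });
  }
  // Deletions arrive as {_deleted: true}; only admins may delete.
  if (newDoc._deleted) {
    if (userCtx.roles.indexOf("_admin") === -1) {
      throw ({ forbidden: "Only admins may delete albums." });
    }
    return;
  }
  if (typeof newDoc.title !== "string" || newDoc.title.length === 0) {
    throw ({ forbidden: "Albums must have a title." });
  }
  if (typeof newDoc.artist !== "string" || newDoc.artist.length === 0) {
    throw ({ forbidden: "Albums must have an artist." });
  }
  // oldDoc is null for new documents; existing albums keep their artist.
  if (oldDoc && oldDoc.artist !== newDoc.artist) {
    throw ({ forbidden: "The artist of an existing album cannot be changed." });
  }
}

// Helper for trying the function: returns the thrown object, or null if the write passes.
function check(newDoc, oldDoc, userCtx) {
  try {
    validateAlbum(newDoc, oldDoc, userCtx);
    return null;
  } catch (e) {
    return e;
  }
}
```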
Ok, how can I see my data?
• CouchDB design documents can contain a
“views” section
• Views contain Map/Reduce functions
• Map/Reduce functions are implemented in
JavaScript
  – However, query servers for other languages
are also available
Views
• Filtering the documents in your database to find
those relevant to a particular process.
• Building efficient indexes to find documents by
any value or structure that resides in them
• Extracting data from your documents and
presenting it in a specific order.
• Use these indexes to represent relationships
among documents.
Map/Reduce dialog
• Bob: So, how do I query the database?
• IT guy: It’s not a database. It’s a key-value store.
• Bob: OK, it’s not a database. How do I query it?
• IT guy: You write a distributed map-reduce function in Erlang.
• Bob: Did you just tell me to go screw myself?
• IT guy: I believe I did, Bob.
Map/Reduce in CouchDB
• Map functions take a single parameter, a
document, and emit a list of key/value pairs of
JSON values
  – CouchDB allows arbitrary JSON structures to be used
as keys
• Map is called for every document in the
database
  – Efficiency?
• emit() can be called multiple times in the
map function
• View results are stored in B-trees
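The points above describe how a view is built: the map function runs once per document, may call emit() several times, and the resulting rows are kept sorted by key. The sketch below imitates that with a plain sorted array instead of a B-tree. Its `collate` is a simplified stand-in for CouchDB's collation (null, booleans, numbers, strings, arrays, objects in that order); real CouchDB compares strings with ICU rules, whereas this sketch uses plain code-point order. The album documents are sample data.

```javascript
// Simplified view builder: run map over every doc, collect emit() output, sort by key.
let currentRows = null;

// emit() is a free function inside CouchDB map functions; here it appends to currentRows.
function emit(key, value) {
  currentRows.push({ key, value });
}

function typeRank(v) {
  if (v === null) return 0;
  if (typeof v === "boolean") return 1;
  if (typeof v === "number") return 2;
  if (typeof v === "string") return 3;
  if (Array.isArray(v)) return 4;
  return 5; // objects (not ordered further in this sketch)
}

// Simplified collation: by type first, then by value; strings use code-point order, not ICU.
function collate(a, b) {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 1 || ra === 2 || ra === 3) return a === b ? 0 : a < b ? -1 : 1;
  if (ra === 4) {
    for (let i = 0; i < Math.min(a.length, b.length); i++) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length;
  }
  return 0;
}

function buildView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    currentRows = [];
    mapFn(doc);
    for (const r of currentRows) rows.push({ id: doc._id, key: r.key, value: r.value });
  }
  currentRows = null;
  // Equal keys are ordered by document id, as in CouchDB.
  return rows.sort((x, y) => collate(x.key, y.key) || (x.id < y.id ? -1 : x.id > y.id ? 1 : 0));
}

const albums = [
  { _id: "1000", title: "Abbey Road", artist: "The Beatles", year: 1969 },
  { _id: "1010", title: "Let It Be", artist: "The Beatles", year: 1970 },
  { _id: "1050", title: "RJUG Roundup", artist: "Rob", year: 2010 },
];

// One document can emit several rows: here, one by artist and one by year.
const rows = buildView(albums, function (doc) {
  if (doc.artist) emit(doc.artist, doc.title);
  if (doc.year) emit(doc.year, doc.title);
});
```

Because numbers collate before strings, the year rows come first, followed by the artist rows in key order.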
Reduce/Rereduce
• The reduce function is optional
• used to produce aggregate results for that view
• Reduce functions must accept, as input, both the results
emitted by the corresponding map function and
the results returned by the reduce function
itself (rereduce).
• On rereduce, keys is null and the rereduce flag is true
• On a large database, objects to be reduced will
be sent to your reduce function in batches.
These batches will be broken up on B-tree
boundaries, which may occur in arbitrary places.
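A small simulation of the batching described above: rows are reduced in batches, and the partial results are then rereduced. CouchDB chooses batch boundaries from its B-tree; this sketch uses a fixed batch size as a stand-in. The reduce signature (keys, values, rereduce), with keys given as [key, docid] pairs on the first pass and null on rereduce, matches CouchDB; the donation amounts are sample values.

```javascript
// Sum reduce that works on both passes: on rereduce, values are earlier partial sums.
function sumReduce(keys, values, rereduce) {
  let total = 0;
  for (const v of values) total += Number(v);
  return total;
}

// Reduce fixed-size batches (a stand-in for B-tree boundaries), then rereduce the partials.
function reduceInBatches(rows, reduceFn, batchSize) {
  const partials = [];
  for (let i = 0; i < rows.length; i += batchSize) {
    const batch = rows.slice(i, i + batchSize);
    const keys = batch.map(r => [r.key, r.id]); // CouchDB passes [key, docid] pairs
    partials.push(reduceFn(keys, batch.map(r => r.value), false));
  }
  return reduceFn(null, partials, true);
}

const donations = [
  { id: "t1", key: "Candidate A", value: 250 },
  { id: "t2", key: "Candidate A", value: 100 },
  { id: "t3", key: "Candidate B", value: 1000 },
  { id: "t4", key: "Candidate B", value: 50 },
  { id: "t5", key: "Candidate C", value: 500 },
];
```

Whatever the batch size, the final answer is the same, which is exactly what the rereduce requirement guarantees.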
More on Map/Reduce
• Linked Documents - If you emit an object value
which has {'_id': XXX} then include_docs=true
will fetch the document with id XXX rather than
the document which was processed to emit the
key/value pair.
• Complex keys
  – emit([lastName, firstName, zipcode], doc)
• Grouping
• Grouping Levels
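The linked-documents rule can be sketched as a tiny query function: when an emitted value contains an `_id`, include_docs=true attaches that document to the row; otherwise it attaches the document that emitted the row. The album/artist data and the `artistId` field are hypothetical, and unlike in CouchDB, `emit` is passed to the map function as a parameter to keep the sketch self-contained.

```javascript
// Hypothetical data: an album that refers to a separate artist document.
const docs = new Map([
  ["artist-beatles", { _id: "artist-beatles", type: "artist", name: "The Beatles" }],
  ["1000", { _id: "1000", type: "album", title: "Abbey Road", artistId: "artist-beatles" }],
]);

// Map function: one row for the album itself, one row linking to its artist.
function mapAlbum(doc, emit) {
  if (doc.type === "album") {
    emit([doc._id, 0], null);
    emit([doc._id, 1], { _id: doc.artistId });
  }
}

// Simulates a view query with include_docs=true.
function queryIncludeDocs(allDocs, mapFn) {
  const rows = [];
  for (const doc of allDocs.values()) {
    mapFn(doc, (key, value) => {
      // A value with an _id points at the linked document; otherwise use the emitting doc.
      const linked = value && typeof value === "object" && value._id !== undefined;
      const docId = linked ? value._id : doc._id;
      rows.push({ id: doc._id, key, value, doc: allDocs.get(docId) || null });
    });
  }
  return rows;
}

const rows = queryIncludeDocs(docs, mapAlbum);
```

Both rows report the album as their `id`, but the second row's `doc` is the artist, so one request returns an album together with its artist.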
Restrictions on Map/Reduce
• Map functions must be referentially transparent:
given the same doc, they always emit the same
key/value pairs
  – This allows for incremental updates
• Reduce functions must be able to reduce their
own output
  – This requirement allows CouchDB
to store intermediate reductions directly in the inner
nodes of B-tree indexes, so view index updates
and retrievals have logarithmic cost
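Why a reduce must be able to reduce its own output: a naive "count" that returns `values.length` is right on the first pass but wrong on rereduce, where the values are partial counts that need summing. The two functions below are illustrative; CouchDB ships built-in `_count` and `_sum` reduces that already handle this. Batch size again stands in for B-tree boundaries, and since neither function reads its keys, the sketch passes null for them on both passes.

```javascript
// Wrong on rereduce: counts the partial results instead of adding them up.
function naiveCount(keys, values, rereduce) {
  return values.length;
}

// Correct: count rows on the first pass, sum partial counts on rereduce.
function safeCount(keys, values, rereduce) {
  if (rereduce) return values.reduce((a, b) => a + b, 0);
  return values.length;
}

// Keys are unused by these functions, so the sketch passes null on both passes.
function runWithRereduce(values, reduceFn, batchSize) {
  const partials = [];
  for (let i = 0; i < values.length; i += batchSize) {
    partials.push(reduceFn(null, values.slice(i, i + batchSize), false));
  }
  return reduceFn(null, partials, true);
}

const amounts = [10, 20, 30, 40, 50, 60, 70];
const naive = runWithRereduce(amounts, naiveCount, 3); // partials [3, 3, 1] are counted, not summed
const safe = runWithRereduce(amounts, safeCount, 3);
```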
List Donors
• Map:
function(doc) {
  if (doc.recipientName) {
    emit(doc.recipientName, doc);
  } else if (doc.recipientType) {
    emit(doc.recipientType, doc);
  }
}
No reduce function
List of Query Parameters
• key
• startkey, endkey
• startkey_docid, endkey_docid
• limit, skip, stale, descending
• group, group_level
• reduce
• include_docs, inclusive_end
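One detail that trips people up with these parameters: key, startkey and endkey must be JSON-encoded, so a string key travels with its quotes and a complex key as a JSON array. The sketch builds a view URL that way. The design document and view names ("contrib", "by_zip") and the key values are hypothetical; "fed_contrib_test" is the database used in the data-loading step.

```javascript
// Parameters whose values CouchDB expects as JSON.
const JSON_PARAMS = new Set(["key", "startkey", "endkey"]);

// Builds /db/_design/ddoc/_view/view?... with JSON-encoded key parameters.
function viewUrl(base, db, ddoc, view, params) {
  const url = new URL(`${base}/${db}/_design/${ddoc}/_view/${view}`);
  for (const [name, value] of Object.entries(params)) {
    url.searchParams.set(name, JSON_PARAMS.has(name) ? JSON.stringify(value) : String(value));
  }
  return url.toString();
}

// Hypothetical query: one candidate's donations for a range of zip codes, grouped by both keys.
const url = viewUrl("http://127.0.0.1:5984", "fed_contrib_test", "contrib", "by_zip", {
  startkey: ["SMITH", "10001"],
  endkey: ["SMITH", "10099"],
  group_level: 2,
  limit: 10,
});
```

Non-key parameters such as limit, group_level or include_docs are plain strings; only the key family needs JSON encoding.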
List all NY candidates
• Want a list of all of the unique candidates
in the database
• Map:
  – emit(doc.recipientType, null);
• Reduce:
  – return true;
• Must query with group=true
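What group=true does for this view, in miniature: rows with equal keys collapse into one, and the reduce runs once per key. Because the reduce just returns true, the result is effectively the list of distinct candidates. The candidate names are placeholders, and since this reduce ignores its keys, the sketch passes null for them.

```javascript
// The slide's reduce: the value carries no information, only the grouping matters.
function returnTrue(keys, values, rereduce) {
  return true;
}

// Simulates group=true: sort by key, collapse equal keys, reduce each group once.
function groupExact(rows, reduceFn) {
  const sorted = [...rows].sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  const groups = [];
  for (const row of sorted) {
    const last = groups[groups.length - 1];
    if (last && last.key === row.key) {
      last.values.push(row.value);
    } else {
      groups.push({ key: row.key, values: [row.value] });
    }
  }
  // Keys are unused by this reduce, so null is passed.
  return groups.map(g => ({ key: g.key, value: reduceFn(null, g.values, false) }));
}

const emitted = [
  { key: "Candidate B", value: null },
  { key: "Candidate A", value: null },
  { key: "Candidate B", value: null },
  { key: "Candidate C", value: null },
];
const unique = groupExact(emitted, returnTrue);
```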
Total Candidate Donations
• List the total campaign contributions for each
candidate
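• Map:
  – emit(doc.recipientType, doc.amount);
• Reduce:
  – function(keys, values, rereduce) {
      // On rereduce, keys is null and values holds partial sums,
      // so iterate over values rather than keys
      var sum = 0;
      for (var i = 0; i < values.length; i++) {
        sum = sum + parseFloat(values[i]);
      }
      return sum;
    }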
Donation Totals by Zip
• Complex Keys
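• In the map function:
  – emit([doc.recipientType, doc.contributorZipCode],
doc.amount);
• Reduce:
  – function(keys, values, rereduce) {
      // On rereduce, keys is null and values holds partial sums,
      // so iterate over values rather than keys
      var sum = 0;
      for (var i = 0; i < values.length; i++) {
        sum = sum + parseFloat(values[i]);
      }
      return sum;
    }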
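With [candidate, zip] keys, group_level chooses how much of the key to group on: level 1 gives a total per candidate, level 2 a total per candidate and zip code. The sketch simulates that for string-only keys; the candidates, zip codes and amounts are made up, and amounts are strings to match the parseFloat in the reduce above.

```javascript
// Same sum as the slide's reduce; safe on rereduce because partial sums are numbers.
function sumValues(keys, values, rereduce) {
  let sum = 0;
  for (let i = 0; i < values.length; i++) sum += parseFloat(values[i]);
  return sum;
}

// Element-wise comparison for arrays of strings (simplified collation).
function compareKeys(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

// Simulates group_level=N: group rows by the first N key elements, reduce each group.
function groupLevel(rows, level, reduceFn) {
  const groups = new Map();
  for (const row of rows) {
    const groupKey = row.key.slice(0, level);
    const id = JSON.stringify(groupKey);
    if (!groups.has(id)) groups.set(id, { key: groupKey, values: [] });
    groups.get(id).values.push(row.value);
  }
  return [...groups.values()]
    .sort((a, b) => compareKeys(a.key, b.key))
    .map(g => ({ key: g.key, value: reduceFn(null, g.values, false) }));
}

const zipRows = [
  { key: ["Candidate A", "10001"], value: "100" },
  { key: ["Candidate A", "10001"], value: "50" },
  { key: ["Candidate A", "12203"], value: "25" },
  { key: ["Candidate B", "10001"], value: "200" },
];
const perCandidate = groupLevel(zipRows, 1, sumValues);
const perCandidateAndZip = groupLevel(zipRows, 2, sumValues);
```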
Referencing other documents
Conflict Management
• Multi-Version Concurrency Control (MVCC)
• CouchDB does not attempt to merge the
conflicting revisions; merging is the application's job
• If there is a conflict in revisions between nodes:
  – The app is ultimately responsible for resolving the conflict
  – All revisions are saved
  – One revision is deterministically selected as the winner
  – The _conflicts property lists the others (GET with ?conflicts=true)
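The winner is chosen deterministically, so every node picks the same one without coordinating. The sketch approximates CouchDB's documented rule: prefer non-deleted leaf revisions, then the one with the longer history (the N in "N-hash"), then the higher hash string. The shortened hashes are made up for readability, and the sketch does not model revision trees.

```javascript
// Approximate deterministic winner selection among conflicting leaf revisions.
function pickWinner(leafRevs) {
  const parsed = leafRevs.map(r => {
    const [n, hash] = r.rev.split("-");
    return { ...r, n: parseInt(n, 10), hash };
  });
  parsed.sort((a, b) => {
    // Non-deleted leaves beat deleted ones.
    if (!!a.deleted !== !!b.deleted) return a.deleted ? 1 : -1;
    // A longer revision history wins.
    if (a.n !== b.n) return b.n - a.n;
    // Tie-break: the higher hash string wins.
    return a.hash < b.hash ? 1 : a.hash > b.hash ? -1 : 0;
  });
  return {
    winner: parsed[0].rev,
    conflicts: parsed.slice(1).filter(r => !r.deleted).map(r => r.rev),
  };
}

// albums/1050 edited independently on both nodes, then replicated (hypothetical revs).
const result = pickWinner([{ rev: "2-7b1c" }, { rev: "2-e40a" }]);
```

The losing revision is not discarded: it stays available, and the application can merge it and delete the loser to resolve the conflict.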
Database Replication
• “CouchDB has built-in conflict detection and
management and the replication process is
incremental and fast, copying only documents
and individual fields changed since the previous
replication.”
• Replication is a unidirectional process; two-way sync takes one replication in each direction.
• Databases in CouchDB have a sequence
number that gets incremented every time the
database is changed.
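How the sequence number makes replication incremental, as a minimal in-memory sketch: each write bumps the source's sequence, and the replicator copies only changes newer than its last checkpoint, then stores the new checkpoint. `SeqDb` and `replicate` are hypothetical names; real replication also compares revision trees, which this sketch skips.

```javascript
// In-memory database that records a sequence number for each document's latest change.
class SeqDb {
  constructor() {
    this.seq = 0;
    this.docs = new Map();
    this.changes = new Map(); // doc id -> seq of its most recent change
  }
  put(id, doc) {
    this.seq += 1;
    this.docs.set(id, doc);
    this.changes.set(id, this.seq);
  }
  // Like _changes?since=N: one entry per changed doc, in sequence order.
  changesSince(since) {
    return [...this.changes.entries()]
      .filter(([, seq]) => seq > since)
      .sort((a, b) => a[1] - b[1])
      .map(([id, seq]) => ({ id, seq }));
  }
}

// One-way replication from source to target, starting after the given checkpoint.
function replicate(source, target, checkpoint) {
  let copied = 0;
  let last = checkpoint;
  for (const change of source.changesSince(checkpoint)) {
    target.put(change.id, source.docs.get(change.id));
    copied += 1;
    last = change.seq;
  }
  return { copied, checkpoint: last };
}

const albums = new SeqDb();
const backup = new SeqDb();
albums.put("1000", { title: "Abbey Road", artist: "The Beatles" });
albums.put("1010", { title: "Let It Be", artist: "The Beatles" });
const first = replicate(albums, backup, 0);                   // copies both docs
albums.put("1050", { title: "RJUG Roundup", artist: "Rob" });
const second = replicate(albums, backup, first.checkpoint);   // copies only 1050
albums.put("1000", { title: "Abbey Road", artist: "The Beatles", year: 1969 });
const third = replicate(albums, backup, second.checkpoint);   // copies only the edit
```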
Replication Continued
• "continuous": true
  – automatically replicates new docs from the source to the target as they
arrive… there’s a complex algorithm determining the ideal moment to replicate
for maximum performance.
• Create albums_backup using the Futon replicator
• curl -X PUT http://127.0.0.1:5984/albums/1010 -d '{"title":"Let It Be","artist":"The Beatles"}'
Replication & Conflict
• Replicate albums db via Futon
• curl -X PUT http://127.0.0.1:5984/albums/1050 -d '{"title":"RJUG Roundup","artist":"Rob","year":"2010"}'
• Replicate again
• curl -X PUT http://127.0.0.1:5984/albums_backup/1050 -d '{"title":"RJUG Roundup","artist":"Rob","year":"2011"}'
• Replicate, review
Notifications
• Polling, long polling
  – _changes
• Clients other than browsers can
request a continuous changes feed
• Filters can be applied to changes
  – e.g., only notify when level = error
• filterName: function(doc, req)
  – req contains the query parameters
  – It also contains userCtx
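A sketch of the "only notify when level = error" filter with the (doc, req) signature above. In CouchDB such a filter lives under `filters` in a design document and is requested as GET /db/_changes?filter=ddoc/name, with any extra query parameters (here a hypothetical `level`) arriving in req.query. `filteredChanges` is a hypothetical helper that applies the filter to a list of change entries; the log documents are sample data.

```javascript
// Filter function: keep changes whose doc has the requested level (default "error").
function byLevel(doc, req) {
  return doc.level === (req.query.level || "error");
}

// Hypothetical helper: apply a filter to change entries and return the matching seqs.
function filteredChanges(changes, filterFn, req) {
  return changes.filter(change => filterFn(change.doc, req)).map(change => change.seq);
}

const changes = [
  { seq: 1, doc: { _id: "log1", level: "info" } },
  { seq: 2, doc: { _id: "log2", level: "error" } },
  { seq: 3, doc: { _id: "log3", level: "warn" } },
  { seq: 4, doc: { _id: "log4", level: "error" } },
];

// req carries the query parameters and the user context, as on the slide.
const errorsOnly = filteredChanges(changes, byLevel, { query: {}, userCtx: { name: null, roles: [] } });
const warningsOnly = filteredChanges(changes, byLevel, { query: { level: "warn" }, userCtx: { name: null, roles: [] } });
```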
Security
• Ships with OAuth and cookie authentication handlers;
the default is standard HTTP authentication
• Authorizations
  – Reader: read/write documents
  – Database admin: compact, add/edit views
  – Server admin: create and remove databases
CouchDB Applied
• CouchOne
  – Hosting services
  – CouchDB on Android
• CouchApp
  – HTML5 applications
• jCouchDB
  – Java layer for CouchDB access
• CouchDB Lounge
  – Clustering support
Links
• http://couchdb.apache.org/
• http://wiki.apache.org/couchdb/FrontPage
• http://guide.couchdb.org/editions/1/en/index.html
Questions?
• Thanks!