NoSQL / Scalability

:: Conference ::
NoSQL / Scalability
State of the art
Samuel BERTHE
Epitech Nantes
10 March 2014
What is it?
Not Only SQL
What is it?
WHATSAPP:
* 5 years
* 450 million users
Ouch!
Scalability
Scale UP
Scale OUT
Scalability
Scale UP
- Hard to maintain
- Single Point Of Failure (SPOF)
- A fat application
- 1 TB of RAM on a server doesn’t exist…
Scale OUT
- A server is broken? Doesn’t matter!
- Easy to grow
- Easy to maintain
- More flexible (cloud)
Capacity
Scalability
Cost
Scalability – File system
Don’t try to scale your FS: you can’t!
- Hard to maintain
- SPOF
Riak CS or AWS S3 is a good choice
Scalability - Stateless
Your memory isn’t a database.
- Don’t use global variables
  -> Use a datastore
- Consistent request
  -> 1 variable = 1 request
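A minimal sketch of the stateless rule, under stated assumptions: `Datastore` is a hypothetical in-memory stand-in for an external store, and `makeStatefulServer` / `makeStatelessServer` are invented helpers that each represent one application instance behind a load balancer.

```javascript
// Hypothetical stand-in for an external datastore (e.g. a key/value DB).
// In a real deployment it would live outside the application processes.
class Datastore {
  constructor() {
    this.data = new Map();
  }
  get(key) {
    return this.data.get(key);
  }
  set(key, value) {
    this.data.set(key, value);
  }
}

// Stateful instance: the counter lives in process memory,
// so every instance has its own private value.
function makeStatefulServer() {
  let visits = 0; // local state: lost on restart, never shared
  return { handle: () => ++visits };
}

// Stateless instance: the counter lives in the shared datastore.
// (A real store would offer an atomic increment instead of get + set.)
function makeStatelessServer(store) {
  return {
    handle: () => {
      const visits = (store.get("visits") || 0) + 1;
      store.set("visits", visits);
      return visits;
    },
  };
}

const a = makeStatefulServer();
const b = makeStatefulServer();
a.handle();
const statefulSecond = b.handle(); // instance b never saw the first visit

const store = new Datastore();
const c = makeStatelessServer(store);
const d = makeStatelessServer(store);
c.handle();
const statelessSecond = d.handle(); // both instances share one counter

console.log(statefulSecond, statelessSecond); // 1 2
```

With state kept in the datastore, any instance can serve any request, which is what makes Scale OUT possible.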
Current technologies
Use case – Web Agency
One server with :
- httpd (sometimes with load balancing)
- RDBMS (sometimes separated)
SQL
…only
Use case – Web Agency
How to scale up?
How to protect your data?
What is your fault tolerance?
Use case – Web Agency
WORDPRESS / SQL ==
Single Point Of Failure
+
Fu****g backup management
+
I’m poor, so I can’t use cloud to scale up
Use case – Worldwide chain store
So many DATA
Use case – Worldwide chain store
Operational database
Use case – Worldwide chain store
Data Warehouse
SQL – Transactions
Example: stock management system at Amazon.com
-> two customers buy a Samsung Galaxy S5 at the same time
Customer 1: GET nbr of Samsung Galaxy S5 -> answer = 42
Customer 2: GET nbr of Samsung Galaxy S5 -> answer = 42
Customer 1 buys 2 phones: nbr -= 2 (== 40)
Customer 2 buys 1 phone: nbr -= 1 (== 41)
Customer 1: UPDATE value in DB: 40
Customer 2: UPDATE value in DB: 41
-> final value in DB: 41, although 3 phones were sold (expected: 39)
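The interleaving in this example can be replayed step by step. This is only a sketch: the `db` object is an in-memory stand-in for the stock row, not a real database, and the variable names are illustrative.

```javascript
// In-memory stand-in for the stock row (illustrative, not a real DB).
const db = { galaxyS5: 42 };

// Step 1: both customers read the stock before either one writes.
const seenByCustomer1 = db.galaxyS5; // 42
const seenByCustomer2 = db.galaxyS5; // 42

// Step 2: each computes the new stock from its own stale read.
const newStock1 = seenByCustomer1 - 2; // 40
const newStock2 = seenByCustomer2 - 1; // 41

// Step 3: both write back; the second write overwrites the first.
db.galaxyS5 = newStock1;
db.galaxyS5 = newStock2;

const sold = 2 + 1;
const expected = 42 - sold; // 39
console.log(`stock in DB: ${db.galaxyS5}, expected: ${expected}`);
// stock in DB: 41, expected: 39
```

Two phones have vanished from the books: this is the classic "lost update" anomaly.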
SQL – ACID Transaction
Atomicity
Consistency
Isolation
Durability
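One way to picture what isolation buys for the stock example is a versioned compare-and-set with retry (optimistic concurrency control). This is a sketch of one technique among several (locks, MVCC, ...), not a description of how any particular SQL engine works; `row`, `commit`, `buy` and the `beforeCommit` hook are all hypothetical names.

```javascript
// Hypothetical versioned row: every committed write bumps `version`.
const row = { stock: 42, version: 0 };

// The commit succeeds only if nobody else committed since our read
// (compare-and-set on the version number).
function commit(readVersion, newStock) {
  if (row.version !== readVersion) return false; // conflict: stale read
  row.stock = newStock;
  row.version += 1;
  return true;
}

// Read, compute, try to commit; on conflict, start over from a fresh read.
function buy(quantity, beforeCommit = () => {}) {
  for (;;) {
    const { stock, version } = row;
    if (stock < quantity) throw new Error("out of stock");
    beforeCommit(); // test hook: lets another purchase slip in here
    if (commit(version, stock - quantity)) return;
  }
}

// Customer 2 buys 1 phone right between customer 1's read and write.
let interleaved = false;
buy(2, () => {
  if (!interleaved) {
    interleaved = true;
    buy(1);
  }
});

console.log(row.stock); // 39
```

Customer 1's first attempt is rejected because the row changed under it; the retry reads 41 and writes 39, so no sale is lost.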
SQL
…but that was before !
NoSQL – CAP theorem
Pick two:
- Consistency
- Availability
- Partition Tolerance
NoSQL – Key/Value-oriented DB
Use case :
- Session storage
- Cache
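Both use cases boil down to "store a value under a key and let it expire". A minimal sketch, assuming an in-memory store: `TtlStore` is an illustrative name, and the clock is passed in explicitly so the expiry is easy to follow.

```javascript
// Minimal key/value store with per-key expiry (TTL), in the spirit of a
// session store or cache. Not a real client for any database.
class TtlStore {
  constructor() {
    this.entries = new Map(); // key -> { value, expiresAt }
  }
  set(key, value, ttlMs, now = Date.now()) {
    this.entries.set(key, { value, expiresAt: now + ttlMs });
  }
  get(key, now = Date.now()) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (now >= entry.expiresAt) {
      this.entries.delete(key); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }
}

const sessions = new TtlStore();
sessions.set("session:abc", { userId: 7 }, 1000, 0); // stored at t=0, TTL 1 s
const fresh = sessions.get("session:abc", 500); // still valid at t=500
const stale = sessions.get("session:abc", 1500); // expired at t=1500
console.log(fresh, stale); // { userId: 7 } undefined
```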
NoSQL – Document-oriented DB
Use case :
- Natural data modeling
- Fast to develop
- Versatile
NoSQL – Column-oriented DB
Use case :
- Large datasets
- Logs
- Write flooding
- BigData
NoSQL – Graph-oriented DB
Use case :
- Social relations
- Graph architecture
NoSQL – CAP theorem
Pick two:
- Consistency + Availability: MySQL, PostgreSQL
- Availability + Partition Tolerance: CouchDB, Cassandra, Riak
- Consistency + Partition Tolerance: Couchbase, MongoDB, HBase
NoSQL – Replication
Partitioning: each node holds one key range
- A-H
- I-P
- Q-Z
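Routing a key to its partition can be sketched as a lookup on the first letter. The ranges come from the slide; the node names and the `nodeFor` helper are assumptions made for the illustration.

```javascript
// Key ranges taken from the slide; node names are illustrative.
const partitions = [
  { node: "node1", from: "A", to: "H" },
  { node: "node2", from: "I", to: "P" },
  { node: "node3", from: "Q", to: "Z" },
];

// Route a key to the single node that owns its first letter.
function nodeFor(key) {
  const letter = key[0].toUpperCase();
  const owner = partitions.find((p) => letter >= p.from && letter <= p.to);
  if (!owner) throw new Error(`no partition for key "${key}"`);
  return owner.node;
}

console.log(nodeFor("alice"), nodeFor("mallory"), nodeFor("zoe"));
// node1 node2 node3
```

Each key lives on exactly one node, so losing that node makes its whole range unavailable.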
NoSQL – Replication
Sharding: every node holds a full copy of the key range
- A-Z
- A-Z
- A-Z
NoSQL – Replication
Partitioning + Sharding: each node holds its own range plus a copy of another node's range
- A-H + I-P
- I-P + Q-Z
- Q-Z + A-H
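A small sketch of this combined layout shows why it survives a node failure. The range placement is the one on the slide; the node names, the `nodesFor` helper and the `down` parameter are hypothetical.

```javascript
// Layout from the slide: each node stores its own range plus a copy of
// another node's range. Node names are illustrative.
const nodes = [
  { name: "node1", ranges: [["A", "H"], ["I", "P"]] },
  { name: "node2", ranges: [["I", "P"], ["Q", "Z"]] },
  { name: "node3", ranges: [["Q", "Z"], ["A", "H"]] },
];

// All nodes holding a copy of the key, skipping nodes known to be down.
function nodesFor(key, down = []) {
  const letter = key[0].toUpperCase();
  return nodes
    .filter((n) => !down.includes(n.name))
    .filter((n) => n.ranges.some(([from, to]) => letter >= from && letter <= to))
    .map((n) => n.name);
}

console.log(nodesFor("alice")); // [ 'node1', 'node3' ]
console.log(nodesFor("alice", ["node1"])); // [ 'node3' ]
```

Every range exists on two nodes, so any single node can fail without data becoming unreachable.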
NoSQL – Replication
Cross Datacenter Replication (XDCR)
NoSQL – Replication
Tunable consistency
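Assuming "tunable consistency" refers here to the per-request quorum settings found in Dynamo-style stores such as Cassandra or Riak, the usual rule of thumb is: with N replicas, W write acknowledgements and R replicas read, a read is guaranteed to see the latest acknowledged write when R + W > N. A sketch (the `readsSeeLatestWrite` helper and the setting labels are illustrative):

```javascript
// n = replicas per key, w = acknowledgements required for a write,
// r = replicas consulted for a read.
// If r + w > n, every read set overlaps every write set on at least one replica.
function readsSeeLatestWrite(n, w, r) {
  return r + w > n;
}

const settings = [
  { label: "fast and loose", n: 3, w: 1, r: 1 },
  { label: "quorum", n: 3, w: 2, r: 2 },
  { label: "write all, read one", n: 3, w: 3, r: 1 },
];

for (const s of settings) {
  const verdict = readsSeeLatestWrite(s.n, s.w, s.r) ? "consistent reads" : "may read stale data";
  console.log(`${s.label} (N=${s.n}, W=${s.w}, R=${s.r}): ${verdict}`);
}
```

Lower values of W and R trade consistency for latency and availability, which is the "tunable" part.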
More about MongoDB
Document oriented database
Collections
Big community
Big documentation
Shell client
Supported in several languages
“Transactional” operators
Aggregation
More about MongoDB
Easy to index
Easy to query
Fast to learn
Replica set
Master-Slave replication
Fucking shard key
Hard to maintain
More about Couchbase
Document oriented database
Buckets
TTL
Shell client
Browser Interface
Statistics
Asynchronous write
Master-Master replication
Auto-rebalancing
More about Couchbase
Memcached integration
Index replication
Map/Reduce - Views - Stale
Supported in fewer languages than MongoDB
Harder to query
Small community
A lot of memory (at least 4 GB)
ElasticSearch
Scalable indexing engine
ElasticSearch
Rivers
JSON Request
Real time GET
Segments + shards
With Leader Election
CP but can be AP
NoSQL is used for Big Data
Big new challenges :
* capture data
* storage
* data exploration
Usages :
* Marketing
* Customer relation
* Research
* Merchandising
* Spying (NSA) ;-)
3V: Variety, Volume, Velocity
BigData – Map/Reduce
BigData – Map/Reduce
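Let's try to make a YouTube video view counter

// map: called once per document, emits (video_id, 1) for every view
function map(doc) {
  if (doc.video_id)
    emit(doc.video_id, 1);
}
OUT :
- "a", 1
- "b", 1
- "a", 1
- "a", 1
- "b", 1

// reduce: called once per key with all the values emitted for that key
function reduce(id, docs) {
  var res = 0;
  for (var i = 0; i < docs.length; ++i) {
    res++;
  }
  emit(id, res);
}
OUT :
- "a", 3
- "b", 2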
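To try the view counter without a database, the engine's job (call map, group the emitted pairs by key, call reduce once per key) can be simulated in a few lines. This is a toy in-memory driver, not the API of any real map/reduce engine; `mapReduce` and the sample `views` are assumptions. The reduce here sums the emitted values instead of counting them, which gives the same result because every value is 1.

```javascript
// Toy in-memory driver: plays the role of the engine that calls map,
// groups the emitted pairs by key, then calls reduce once per key.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => map(doc, emit));

  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduce(key, values);
  }
  return result;
}

// One document per view; documents without a video_id are ignored.
const views = [
  { video_id: "a" }, { video_id: "b" }, { video_id: "a" },
  { video_id: "a" }, { video_id: "b" }, { comment: "no video" },
];

const counts = mapReduce(
  views,
  (doc, emit) => { if (doc.video_id) emit(doc.video_id, 1); },
  (id, values) => values.reduce((sum, v) => sum + v, 0)
);

console.log(counts); // { a: 3, b: 2 }
```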
BigData – Hadoop
Framework
HBase
HDFS
Map/Reduce
JobTracker
Hive
Learn more
Advice:
- Use several different databases, one per usage, in the same project…
- …but start with a single database
Learn more - Training
MOOCs :
- 10gen Education (MongoDB learning)
- DataStax Academy (Cassandra learning)
Online testing db
MongoDB and CouchDB: pretty easy
Node.JS / Python
You mean BigData? Then I say Java!
Download datasets, consume APIs or write a crawler
Enjoy!
Samuel BERTHE
[email protected]
@SamuelBerthe
www.samuel-berthe.fr