NoSQL with MySQL

Transcript NoSQL with MySQL

NoSQL with MySQL
Yekesa Kosuru
Distinguished Architect, Nokia
Peter Zaitsev, Percona
Agenda
•
•
•
•
Part 1 : Problem statement
Part 2 : Background (What is, Why, ACID, CAP)
Part 3 : Building a Key-Value Store
Part 4 : NoSQL with MySQL
PART 1: PROBLEM STATEMENT
NoSQL : No Formal Definition
• No Formal Definition
– Not Only SQL ?
– Don’t use SQL ?
– Limit expressive power ?
• Underlying issues ?
– ACID (Strong Consistency ?, Isolation ?)
– SQL
– Something else ?
• Facts
– Hardware will fail
– Software will have bugs
– Growing traffic, data volumes, unpredictable
– No downtime or expose errors to users
Problem to solve
• Key Value Store Problem Statement
– A consumer facing system with foll. characteristics :
•
•
•
•
•
•
•
•
•
Host large volumes of data & queries
Responsive with predictable performance
Always available (survives failures, software/hardware upgrades)
Low cost of ownership
Minimal administration
Availability : How to service customers (24 X 7) despite failures, upgrades
Scalability : How to scale (capacity, transactions, costs…)
Predictability : Predictable responsiveness
Not suited for all applications
PART 2: BACKGROUND
ACID
• Transactions simplify job of developers
• Client/developer point of view
• Atomicity: Bundled operations
– All of the operations (multiple updates, deletes, inserts) in the
transaction will complete, or none will
• Consistency: The database is in a consistent state when the transaction
begins and ends
– Referential integrity, and other constraints
– All reads must see latest data
• Isolation: The transaction will behave as if it is the only operation being
performed upon the database. Serialized updates i.e locks
• Durability: Upon completion of the transaction, the operation will not be
reversed.
• ACID in single database is efficient
Issues Scaling DBMS
• Databases extend ACID to span multiple nodes using 2PC
• Shared disk & shared nothing architectures
• Cost of 2PC (N nodes involved in transaction)
– Consistency & Isolation over 2PC makes it expensive
– 3N+1 Message Complexity
– N+1 Stable Writes
– Locks – lock resources
RequestCommit
Prepare
Prepare
Prepare
Prepare
Prepare
Prepare
Prepare
Prepared
Prepare
Prepare
Prepare
Commit
• Availability is a pain point
– 2PC is blocking protocol , uses single coordinator and not fault tolerant
– Coordinator failure can block transaction indefinitely
– Blocked transaction causes increased contention
• Rebalancing
• Upgrades
• Total Cost of Ownership
CAP
• Introduced by Prof. Brewer in year 2000
• Consistency, Availability, (Network) Partition Tolerance
– Industry proved that we can guarantee two out of three
• System point of view
• Consistency : Defines rules for the apparent order and visibility of updates
– Variations on consistency
– Eventual Consistency - eventually replicas will be updated – order of
updates may vary - all reads will eventually see it
• Availability : All clients can read & write data to some replica, even in the
presence of failures
• Partition-tolerance : operations will complete, even if network is
unavailable i.e disconnected network
PART 3: BUILDING KEY-VALUE STORE
Important Considerations
• Functional
– Simple API ( Get, Put, Delete - like Hashtable)
– Primary key lookups only (limit expressive power)
• No joins, No non-key lookups, No grouping functions, No constraints …
– Data Model (Key-Value, Value = JSON)
– Read-Your-Write Consistency
• SLA
– Highly Available & Horizontally Scalable
– Performance (low latencies & predictable)
• Uniform cost of queries
• Operational
– Symmetric, Rolling Upgrades, Backups, Alerts, Monitoring
• Total Cost of Ownership (TCO)
– Hardware, Software, Licenses …
Making it Available
•
•
•
•
Make N replicas of data
N = Number of replicas
Use Quorum (R + W > N)
R = Number of Reads
Overlap R & W to make it strongly consistent
Handle Faults (partition tolerance and self heal) W = Number of Writes
– Real faults
• Machine down, NIC down …
• Eventual Consistency (data)
• Prefer A & P in CAP
• If a node is down during write, read repair later – let write proceed
– Intermittent faults
• X and Y are clients of Z; X can talk to Z but Y can’t
• Server does not respond within timeout, stress scenarios …
• Inconsistent cluster view can cause inconsistencies in data
• If two concurrently writing clients have different views of cluster,
resolve inconsistency later - let write proceed
R+W>N
2+2>3
Making it Scalable
• Sharding (or partition data) works well for Capacity
• Large number of logical shards distributed over few physical machines
• How to shard
- Consistent Hashing
A,B,C = nodes
1,2,3,4 = keys
- Hash(key) say “2”, move clockwise to find nearest node “B”
- Hash to create a uniform distribution
- Allows for adding of machines
- Without redesign of hashing
- Without massive movement of data
- Load balancing
- If a machine can’t handle load, move shards to another machine
Vector Clocks
•
•
•
•
•
•
•
•
•
Determine history of a key-value, think data versioning
Need to interpret causality of events, to identify missed updates & detect
inconsistencies
Timestamps are one solution, but rely on synchronized clocks and don't
capture causality
Vector clocks are an alternative method of capturing order in a distributed
system
Node (i) coordinating the write generates a clock at the time of write
Construct: ( nodeid : logical sequence)
– Node A receives first insert : A:1
– Node B receives an update : A:2
Clock: (A:1) is predecessor of (A:1, B:1)
Clock: (A:2) and (A:1, B:1) are concurrent
Reads use vector clocks to detect missed updates & inconsistencies
Eventual Consistency Example 3/2/2
3 replicas
Application
Server
Node A
Client
PUT
Server
Node B
V1 A:1
Write
Write
PUT
Server
Node C
V1 A:1
V1 A:1
Write
NODE 1 DOWN
PUT
Write
V2 (A:1,B:1)
V2 (A:1,B:1)
PUT
NODE 1 UP
Outage
Upgrade
GET
Read
Detect Missing
Update
Read
V1 A:1
V2 (A:1,B:1)
V2 (A:1,B:1)
Read
V2 (A:1,B:1)
GET
Read
Repair
V2 (A:1,B:1)
Missed update added
PART 4: NOSQL WITH MYSQL
NoSQL with MySQL
• Key-Value store implemented on top of MySQL/ InnoDB
• MySQL serves as persistence layer, Java code handles the rest
• Atomicity limited to one operation (auto commit), Consistency & Isolation
limited to one machine (recall issues with 2PC)
• Why MySQL ?
– Easy to install, tune and restore
– Fast, Reliable, Proven
– Performance :
• MVCC, row level locking
• Buffer pool, insert buffering, direct IO
– Data Protection & Integrity :
• Automatic crash recovery
• Backup, Recovery (full and incremental)
– Leverage expertise across ACID and BASE systems
Implementation
• InnoDB Engine (innodb_file_per_table)
• Simple schema
– Auto increment PK
– App Key, Value
– Metadata
– Hash of app key (index)
• Clustered Index and Secondary Index
• KV use simple queries – no joins or grouping functions
• Database is a shard (pick right number of shards)
• Backup & Recovery
Tuning
• Queries are so simple that tuning is focused on IOPS, Memory, TCP,
Sockets, Timeouts, Connection Pool, JDBC, JVM , Queuing etc
– Memory to disk ratio
– Connector/J
useServerPrepStmts=true
useLocalSessionState = true
cachePrepStmts = true
cacheResultSetMetadata = true
tcpKeepAlive=true
tcpTrafficClass=24
autoReconnect=true
maxReconnects=...
allowMultiQueries=true
prepStmtCacheSize=…
blobSendChunkSize=…
largeRowSizeThreshold=…
locatorFetchBufferSize=…
• CentOS 5.5, XFS File System
Scalability Test (TPS/IOPS) EC2’s
Client
TPS
Threads
50.00%
90.00%
99.00%
99.50%
99.90%
Single Server
LAT
LAT
LAT
LAT
LAT
IOPS
1
114.4
8
16
24
28
52
26
2
226.88
8
16
26
30
63
48
4
437.37
9
17
29
34
63
96
5
600.1
8
14
27
32
64
107
10
1061.45
9
18
35
44
77
199
20
1730.69
10
23
52
66
118
307
50
2775.85
12
40
98
123
202
482
100
3207.18
15
78
224
274
407
571
m1.xlarge image
User Dataset = 256GB
(240M 1K values)
Machines = 5
Spindles=4 per box
XFS File System
Buffer Pool=12GB/machine
Memory:Disk=1:4
Workload R/W=100/0
Random Access Pattern
N/R/W=1/1/1
Incremental load to 100
threads
3500
3000
2500
T
P 2000
S
TPS
50% LAT
90% LAT
99% LAT
Server IOPS
1500
1000
500
0
0
20
40
Client Threads
60
80
100
Characteristics
120
Know Your IOPS
Disk
7.2k rpm SATA
10k rpm SAS
15k rpm SAS
Value SSD
Intel X25-M
Intel X25-E
DDR Drive x1
IOPS (random)
90
140
180
400
1500
5000
300000
Performance Test
Characteristics
User Dataset = 900GB
Machines = 9
Spindles=8 (15K)/machine
XFS File System (RAID0)
Buffer Pool=16GB/machine
Memory:Disk=1:6
Cache=16%
Workload R/W=100/0
Recency Skew
i.e Recent Access=80/20
N/R/W=3/2/2
Incremental load to 10K
(TPS)
0.6
0.5
0.4
S
E
C 0.3
S
0.2
0.1
0
1458
2393
TPS
3328
4263
5198
50%
6132
90%
7066
99%
8000
8937
9863
Tips By Peter Zaitsev
Percona
Why XFS
File System
Threads(1)
Threads(16)
RNDRD
243
582
RNDWR
219
218
Total
462
800
RNDRD
239
552
RNDWR
205
408
Total
444
960
EXT3
XFS
MySQL for NoSQL




Tune for Simple Queries
Memory is the most important

innodb_buffer_pool_size
Are we looking at intensive writes ?

innodb_log_file_size

RAID with BBU

XFS + O_DIRECT (avoid per inode locking)
Restrict Concurrency on database

Connection pool size or innodb_thread_concurrency
More on Scalability



Is contention problem

Use MySQL 5.5 or Percona Server
 Tune innodb_buffer_pool_instances
Index update contention

Use Multiple Tables
 Or MySQL Partitions if you can't
Eliminate Query Parsing Overhead

Consider HandlerSocket
 NoSQL Interface to InnoDB Storage Engine
Work on Result Stability

Ensure you're getting stable results

InnoDB Flushing can cause non uniform performance
THANK YOU
[email protected]

NoSQL with MySQL

Transcript NoSQL with MySQL

Directory