Scaling MySQL in the Cloud Presentation 2


Scaling MySQL in the Cloud
Moshe Shadmon
ScaleDB
1
Shared Disk vs. Shared Nothing
Shared Nothing
Shared Disk
Masters
Slaves
2
Shared Disk Advantages
• Start small, grow incrementally
• Scalable AND highly available
• Add capacity on demand with zero downtime
• Simplicity
– No need to partition data
– No need for master-slave
3
The Virtualized Cloud Database
[Diagram: Shared Nothing vs. Shared Disk. Shared Nothing side: MySQL Server with its storage engine on Server 1 and Server 2, using local disk. Shared Disk side: several VMs, each running an OSS DBMS with ScaleDB, using shared storage.]
4
ScaleDB As the Storage Engine
[Diagram: the MySQL Server has two levels, the MySQL Database Management Level and the Storage Engine Level; the ScaleDB Storage Engine sits at the Storage Engine Level.]
5
ScaleDB’s Internal Architecture
[Diagram: three groups of components]
• ScaleDB Cluster Manager: Global Lock Manager, Global Sync Manager, Global Recovery Manager
• ScaleDB Node: ScaleDB API, Transaction Manager, Lock Manager, Threads Manager, Local Lock Manager, Log Manager, Buffer Manager, Index Manager, Data Manager, Local Sync Coordinator, Recovery Manager, Storage Manager
• ScaleDB Storage System: Cache & Storage Devices
6
Deploying ScaleDB
[Diagram: three layers. Application Layer: the Application. Database Layer (physical or VM nodes): Node 1, Node 2, … Node N, each running a DBMS with ScaleDB, plus the ScaleDB Cluster Manager. Storage Layer: Shared Storage.]
7
The Storage Engine
• Pluggable Storage Engine
– Transactional storage engine
– Supports MySQL Storage Engine API
– Reads/writes done via network to shared storage
– Maintains a local cache
– Local Lock Manager – manages locking at the node level
– Connector to Cluster Manager – synchronizes operations at the cluster level
8
The Cluster Manager
• Distributed Lock Manager – manages cluster-level locks
– Locks can be held over any type of resource:
• DBMS, Table, Partition, File, Block, Row, etc.
– Supports multiple lock modes:
• Read, Read/Write, Exclusive, etc.
– Synchronizes state using messaging
• Local Lock Manager – manages locks at the node level
– Maintains locks at the node level
– Synchronizes state using shared memory
• Identifies node failures and manages recovery
9
The Cluster Manager
• Distributed Lock Manager
– Synchronizes conflicting processes between nodes in the cluster
• Example: 2 nodes need to update the same resource at the same time.
– The challenge:
• Requests are done via the network – can be expensive:
– Internal operations may take nanoseconds; network operations take milliseconds
– The solution:
• Requests are sent only when conflicts occur
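The slide does not say how the network traffic is avoided, so the following is only a minimal sketch under one assumption: each node caches the grants it already holds, so repeated access to the same resource is served locally and the cluster-level manager is contacted only on first access or when another node wants the resource. The class names `ClusterLockManager` and `LocalLockManager` and the `messages` counter are hypothetical and are not ScaleDB's API.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterLockManager:
    """Hypothetical cluster-level lock table: resource -> owning node."""
    owners: dict = field(default_factory=dict)
    messages: int = 0  # counts simulated network requests

    def acquire(self, node: str, resource: str) -> bool:
        self.messages += 1  # every call here stands for a network round trip
        owner = self.owners.setdefault(resource, node)
        return owner == node


@dataclass
class LocalLockManager:
    """Hypothetical node-level manager that remembers grants it already holds."""
    node: str
    cluster: ClusterLockManager
    granted: set = field(default_factory=set)

    def lock(self, resource: str) -> bool:
        if resource in self.granted:
            return True  # served from node-local state, no network traffic
        if self.cluster.acquire(self.node, resource):
            self.granted.add(resource)
            return True
        return False  # another node owns the resource: a real conflict
```

With this arrangement, a node that locks the same block a thousand times generates a single cluster request; a second node asking for that block is the only event that causes further messaging.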
10
The Storage
• Independent storage nodes
– Accessible via network
– Each node has a Cache Layer and a Persistent Layer
– Database nodes can force a write to disk based on transactional requirements
– Data can be distributed over multiple storage nodes
– Each Storage Node can be mirrored
– Each Storage Node may have a Hot Backup Node
11
The Storage Node
[Diagram: a Storage Node made up of an Interface to Storage, a Cache (based on LRU), and Disks]
– Manages the data in cache and flushes to disk when required.
– Supports the storage engine calls for Read, Write, etc.
– Supports pushed calls from the storage engine, such as Count Rows, Search, etc.
– Each node is a Linux machine. No need for Network File System (NFS).
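As an illustration of an LRU cache that flushes to disk when required, here is a minimal sketch. It assumes a write-back policy in which dirty blocks are written out on eviction or when the caller forces it; the class `StorageNodeCache`, its methods, and the use of a plain dict in place of a disk are all hypothetical.

```python
from collections import OrderedDict


class StorageNodeCache:
    """Hypothetical LRU block cache with write-back to a dict standing in for disk."""

    def __init__(self, capacity: int, disk: dict):
        self.capacity = capacity
        self.disk = disk
        self.blocks = OrderedDict()  # block_id -> (data, dirty), oldest first

    def read(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            return self.blocks[block_id][0]
        data = self.disk[block_id]  # cache miss: load from disk
        self._put(block_id, data, dirty=False)
        return data

    def write(self, block_id, data, force=False):
        self._put(block_id, data, dirty=True)
        if force:  # e.g. a transaction that needs the block on disk now
            self.flush(block_id)

    def flush(self, block_id):
        data, dirty = self.blocks[block_id]
        if dirty:
            self.disk[block_id] = data
            self.blocks[block_id] = (data, False)

    def _put(self, block_id, data, dirty):
        self.blocks[block_id] = (data, dirty)
        self.blocks.move_to_end(block_id)
        while len(self.blocks) > self.capacity:
            victim, (victim_data, victim_dirty) = self.blocks.popitem(last=False)
            if victim_dirty:
                self.disk[victim] = victim_data  # flush dirty block on eviction
```

The `force` flag mirrors the idea that a database node can force a write to disk when a transaction requires it; otherwise blocks stay in cache until they are evicted.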
12
Scaling the Storage Tier
[Diagram: Database Layer (physical or VM nodes) with Node 1, Node 2, … Node N, each running a DBMS with ScaleDB and a Local Cache, plus the ScaleDB Cluster Manager. The nodes connect over TCP/UDP to the Storage Layer, where a Global Cache is formed by the caches in front of the Shared Storage nodes.]
13
Global Cache
• Guarantees cache coherency
• Manages caching of shared data
• Minimizes access time to data which is not in local cache and would otherwise be read from disk
• Implements fast direct memory access over high-speed interconnects for all data blocks and types
• Uses an efficient and scalable messaging protocol
14
HA of the Storage Tier
[Diagram: Database Layer (physical or VM nodes) with Node 1, Node 2, … Node N, each running a DBMS with ScaleDB, plus the ScaleDB Cluster Manager. Storage Layer: Shared Storage, Mirrored Storage, and Hot Backup.]
15
Scaling the Storage Tier
[Diagram: Database Layer (physical or VM nodes) with Node 1, Node 2, … Node N, each running a DBMS with ScaleDB, plus the ScaleDB Cluster Manager. The storage is split into Partition 1, Partition 2, … Partition Q; each partition has Partitioned Storage, a Partitioned Mirror, and a Partitioned Hot Backup.]
16
Scaling the Storage Tier
[Diagram: Node N in the Database Layer (physical or VM nodes) runs MySQL with ScaleDB and a Local Cache, alongside the ScaleDB Cluster Manager. The Storage Layer holds Main and Mirror storage nodes, each with its own Cache.]
• Read
– From Local Cache
– From Main or Mirror
• Get From Cache
• Get From Storage
• Write
– To local cache
– At end of transaction
• multicast to main and mirror
• optional acknowledgement:
– after receive
– after write
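The write path on this slide can be sketched roughly as follows. This is an illustration only: it assumes that "after receive" means the storage node acknowledges once the block is in its cache, and "after write" means it acknowledges once the block is on disk. The names `Ack`, `StorageNode`, and `commit` are hypothetical, and a plain loop stands in for the network multicast.

```python
from enum import Enum


class Ack(Enum):
    NONE = "no acknowledgement"
    AFTER_RECEIVE = "after receive"
    AFTER_WRITE = "after write"


class StorageNode:
    """Hypothetical storage node with a cache layer and a persistent layer."""

    def __init__(self, name):
        self.name = name
        self.cache = {}
        self.disk = {}

    def receive(self, blocks, ack):
        self.cache.update(blocks)  # blocks are now in this node's cache
        if ack is Ack.AFTER_WRITE:
            self.disk.update(blocks)  # persist before acknowledging
        return ack is not Ack.NONE  # True means an acknowledgement is sent


def commit(local_cache, dirty_ids, main, mirror, ack=Ack.AFTER_RECEIVE):
    """At end of transaction, send the dirty blocks to both main and mirror."""
    blocks = {block_id: local_cache[block_id] for block_id in dirty_ids}
    # The loop below stands in for a single multicast to both targets.
    acks = [node.name for node in (main, mirror) if node.receive(blocks, ack)]
    dirty_ids.clear()
    return acks
```

The trade-off between the modes is the usual one: waiting for "after write" gives the strongest durability guarantee at the cost of commit latency, while skipping the acknowledgement is fastest but tells the database node nothing about delivery.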
17
Traditional Query Processing
Query: What Were Yesterday's Sales?
[Diagram: the DBMS Server asks the Storage Array to get the Sales table, retrieves the entire Sales table, and then processes the table data itself.]
18
ScaleDB Query Processing
Query: What Were Yesterday's Sales?
[Diagram: the DBMS Server sends the request "Get October 15 Sales" to each of four Storage Nodes.]
19
Scaling the Storage Tier
• Advantages
– Parallel processing:
• I/O calls are executed simultaneously on multiple Storage Nodes.
• Logic pushed to storage layer:
“SELECT customer_name FROM calls WHERE amount > 200”
• Traditional approach – return all rows to the database
• ScaleDB storage – return selected rows to the database
– Leverage cache on multiple storage nodes
– Storage layer can be expanded without downtime
– Data is mirrored
– Support for hot backup
– Low cost
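To make the difference between the two approaches concrete, here is a small sketch of the query from the slide run both ways. It is not ScaleDB code: `StorageNode`, `scan`, `search`, and the two query functions are hypothetical, rows are plain dicts, and a thread pool stands in for parallel I/O on several storage nodes.

```python
from concurrent.futures import ThreadPoolExecutor


class StorageNode:
    """Hypothetical storage node holding one slice of the `calls` table."""

    def __init__(self, rows):
        self.rows = rows

    def scan(self):
        return list(self.rows)  # traditional: hand back every row

    def search(self, predicate, columns):
        # pushed-down call: filter and project on the storage node itself
        return [{c: row[c] for c in columns} for row in self.rows if predicate(row)]


def traditional_query(nodes, predicate, columns):
    rows = [row for node in nodes for row in node.scan()]
    shipped = len(rows)  # rows that crossed the network
    result = [{c: row[c] for c in columns} for row in rows if predicate(row)]
    return result, shipped


def pushed_down_query(nodes, predicate, columns):
    # every storage node evaluates the predicate at the same time
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        parts = list(pool.map(lambda node: node.search(predicate, columns), nodes))
    result = [row for part in parts for row in part]
    return result, len(result)
```

Both functions return the same rows for `amount > 200`; the second number in the returned pair shows how many rows had to travel to the database node in each case.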
20
High Availability
• Failure of a node
– Detected by the Cluster Manager
• A surviving node is requested to undo uncommitted transactions
• Failure of the Cluster Manager
– Detected by the Standby Cluster Manager
• Requests all nodes to undo uncommitted transactions
• Failure of a Storage Node
– Continue with a mirrored storage – or –
– Use the Storage Node Log to recover
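The slides do not describe the log format, so the following is only a sketch of what "undo uncommitted transactions" can look like. It assumes a log of update records that carry the value before the change, plus commit records; the record layout and the function name `undo_uncommitted` are hypothetical.

```python
def undo_uncommitted(log, storage):
    """Roll back every update whose transaction has no commit record.

    `log` is a list of dicts in write order, either
    {"op": "update", "txn": ..., "key": ..., "before": ...} or
    {"op": "commit", "txn": ...}. `storage` is a dict modified in place.
    """
    committed = {rec["txn"] for rec in log if rec["op"] == "commit"}
    # Walk the log backwards so the oldest "before" value is applied last.
    for rec in reversed(log):
        if rec["op"] == "update" and rec["txn"] not in committed:
            if rec["before"] is None:
                storage.pop(rec["key"], None)  # the key did not exist before
            else:
                storage[rec["key"]] = rec["before"]
    return storage
```

A surviving node could run a routine of this kind against the failed node's log, which is possible in a shared-disk design because every node can reach the same storage.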
21
Performance / Tuning
• Contention occurs when 2 or more nodes want the same resource at the same time
• Types of contention:
– Read/Read contention – is never a problem because of the shared disk system
– Read/Write contention – the reader is requested to release the block and the grant is provided to the writer
– Write/Read or Write/Write –
• Writer sends the block to the global cache layer
• Buffer invalidate message is sent to the other nodes
• Requestor receives the grant
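The three steps of the Write/Read and Write/Write case can be traced in a few lines. This is a hedged sketch, not the actual protocol: `DbNode` and `hand_over_block` are hypothetical names, a dict stands in for the global cache layer, and the assumption that the requestor picks up the current block from the global cache together with the grant is mine.

```python
class DbNode:
    """Hypothetical database node holding blocks in a local buffer."""

    def __init__(self, name):
        self.name = name
        self.buffer = {}  # block_id -> data


def hand_over_block(block_id, writer, requestor, nodes, global_cache):
    """Move ownership of a block from `writer` to `requestor`."""
    # 1. The current writer ships its latest version to the global cache layer.
    global_cache[block_id] = writer.buffer.pop(block_id)
    # 2. Other nodes drop their stale copies (buffer invalidate message).
    invalidated = []
    for node in nodes:
        if node is not requestor and node.buffer.pop(block_id, None) is not None:
            invalidated.append(node.name)
    # 3. The requestor receives the grant and, as assumed here, the current block.
    requestor.buffer[block_id] = global_cache[block_id]
    return invalidated
```

The function returns the names of the nodes whose copies were invalidated, which makes it easy to see how the cost of a conflict grows with the number of nodes caching the same block.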
22
Performance / Tuning
• Fast Network between the nodes
– 2 logical networks:
• Between the database nodes and the Cluster Manager
• Between the database nodes and the storage
– Optimize Socket Receive Buffers (256 KB – 1 MB)
• Partition requests to maintain locality of data
– Send requests that update/query the same data to the same node
• By Database
• By Table
• By Table with PK
– Logic can change dynamically to adapt to changes
• Changes in data distribution
• Changes in user behaviors
• Additional DBMS nodes
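One simple way to send requests for the same data to the same node is to hash a routing key. The sketch below is an assumption about how such a router might look, not a description of ScaleDB: the function `route` and its parameters are hypothetical, and the three levels of key correspond to routing by database, by table, and by table with primary key.

```python
import zlib


def route(nodes, database, table=None, pk=None):
    """Pick a DBMS node so that requests touching the same data land together.

    Pass only `database` to route by database, add `table` to route by table,
    and add `pk` to route by table with primary key.
    """
    key = "/".join(str(part) for part in (database, table, pk) if part is not None)
    # crc32 is stable across processes, unlike Python's built-in hash() for str
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]
```

A call such as `route(["node1", "node2", "node3"], "shop", "orders", 42)` always returns the same node for the same arguments. Note that plain modulo hashing remaps many keys when a DBMS node is added, so a scheme that must adapt dynamically would likely need something gentler, for example consistent hashing or an explicit routing table.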
23
ScaleDB: Elastic/Enterprise Database
Function | SimpleDB | RDS | ScaleDB
Transactions | No | Yes | Yes
Joins | No | Yes | Yes
Data Consistency | No (Eventual) | Yes | Yes
SQL Support | No | Yes | Yes
ACID Compliant | No | Yes | Yes
Supports MySQL applications without modification | No | Yes | Yes
Dynamic Elasticity (w/o interruption) | Yes | No | Yes
High-Availability | Yes | No | Yes
Eliminates Partitioning | Yes | No | Yes
Eliminates possible 5-minute data loss upon failure | Yes | No | Yes
24
Value Proposition
• Runs on low-cost cloud infrastructures (e.g. Amazon)
• High-availability, no single point of failure
• Dramatically easier set-up & maintenance
– No partitioning/repartitioning
– No slave and replication headaches
– Simplified tuning
• Scales up/down without interrupting your application
• Lower TCO
25