Zephyr: Live Migration in Shared Nothing Databases for Elastic Cloud Platforms
Aaron J. Elmore, Sudipto Das,
Divyakant Agrawal, Amr El Abbadi
Distributed Systems Lab
University of California Santa Barbara
Serve thousands of applications (tenants)
◦ AppEngine, Azure, Force.com
Tenants are (typically)
◦ Small
◦ SLA sensitive
◦ Erratic load patterns
◦ Subject to flash crowds
  i.e., the fark, digg, slashdot, reddit effect (for now)
Support for Multitenancy is critical
Our focus: DBMSs serving these platforms
What the tenant wants… vs. what the service provider wants…
[Figure: resources (capacity vs. demand) over time, for traditional infrastructures and for deployment in the cloud. Static provisioning for peak is inelastic and leaves unused resources. Slide credits: Berkeley RAD Lab]
[Figure: multitier cloud architecture: load balancer, application/web/caching tier, database tier]
Migrate a tenant's database in a live system
◦ A critical operation to support elasticity
Different from
◦ Migration between software versions
◦ Migration in case of schema evolution
VM migration [Clark et al., NSDI 2005]
One tenant per VM
◦ Pros: allows fine-grained load balancing
◦ Cons:
  Performance overhead
  Poor consolidation ratio [Curino et al., CIDR 2011]
Multiple tenants in a VM
◦ Pros: good performance
◦ Cons: must migrate all tenants together ⇒ coarse-grained load balancing
Multiple tenants share the same database process
◦ Shared process multitenancy
◦ Example systems: SQL Azure, ElasTraS, RelationalCloud, and many more
Migrate individual tenants
◦ VM migration cannot be used for fine-grained migration
Target architecture: Shared Nothing
◦ Shared storage architectures: see our VLDB 2011 paper
How to ensure no downtime?
◦ Need to migrate the persistent database image (tens of MBs to GBs)
How to guarantee correctness during failures?
◦ Nodes can fail during migration
◦ How to ensure transaction atomicity and durability?
◦ How to recover migration state after failure?
  Nodes recover after a failure
How to guarantee serializability?
◦ Transaction correctness equivalent to normal operation
How to minimize migration cost? …
Downtime
◦ Time the tenant is unavailable
Service interruption
◦ Number of operations failing/transactions aborting
Migration overhead/performance impact
◦ During normal operation, migration, and after migration
Additional data transferred
◦ Data transferred in addition to the DB's persistent image
Migration executed in phases
◦ Starts with transfer of minimal information to the destination (the "wireframe")
Source and destination concurrently execute transactions in one migration phase
Database pages used as the granule of migration
◦ Pages "pulled" by the destination on demand (see the sketch below)
Minimal transaction synchronization
◦ A page is uniquely owned by either source or destination
◦ Leverages page-level locking
Logging and handshaking protocols to tolerate failures
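To make the page-ownership idea concrete, here is a minimal Java sketch with hypothetical names (PageOwnershipTable, fetchFromSource, and so on are illustrative, not the prototype's actual classes): every page has exactly one owner, and the destination pulls un-owned pages on demand.

```java
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of Zephyr's core mechanism: each database page is uniquely
// owned by source or destination, and the destination pulls a page from the
// source the first time it needs it. Hypothetical names throughout.
enum Owner { SOURCE, DESTINATION }

class PageOwnershipTable {
    private final ConcurrentHashMap<Integer, Owner> owners = new ConcurrentHashMap<>();

    PageOwnershipTable(int pageCount) {
        // At migration start the source owns every page.
        for (int p = 0; p < pageCount; p++) owners.put(p, Owner.SOURCE);
    }

    // Destination-side read: pull the page on demand. Concurrent requests for
    // un-owned pages block on the synchronized pull, which stands in for the
    // exclusive page-level locking between nodes.
    synchronized byte[] readPage(int pageId) {
        if (owners.get(pageId) == Owner.SOURCE) {
            byte[] data = fetchFromSource(pageId); // network pull; transfers ownership
            owners.put(pageId, Owner.DESTINATION); // ownership moves exactly once
            storeLocalPage(pageId, data);
        }
        return loadLocalPage(pageId);
    }

    // Stand-ins for the RPC and buffer-pool plumbing a real engine would have.
    private byte[] fetchFromSource(int pageId) { return new byte[0]; }
    private void storeLocalPage(int pageId, byte[] data) { }
    private byte[] loadLocalPage(int pageId) { return new byte[0]; }
}
```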
For this talk
◦ Small tenants, i.e., not sharded across nodes
◦ No replication
◦ No structural changes to indices
Extensions in the paper
◦ Relax these assumptions
[Figure: Before migration, the source owns all pages P1…Pn and executes the active transactions TS1,…,TSk; the destination owns nothing. Legend: page owned by node vs. page not owned by node]
Init mode: freeze index wireframe and migrate
[Figure: The wireframe is copied to the destination, which now holds un-owned slots for pages P1…Pn; the source still owns all pages and continues executing transactions TS1,…,TSk. Legend: page owned by node vs. page not owned by node]
Dual mode: requests for un-owned pages can block
[Figure: Old, still-active transactions TSk+1,…,TSl execute at the source; new transactions TD1,…,TDm execute at the destination. When transaction TDi accesses P3, the destination pulls P3 from the source. Index wireframes remain frozen. Legend: page owned by node vs. page not owned by node]
Finish mode: pages can be pulled by the destination, if needed
[Figure: The remaining pages P1, P2, … are pushed from the source, which has completed its transactions; new transactions TDm+1,…,TDn execute at the destination. Legend: page owned by node vs. page not owned by node]
Migration complete: index wireframe un-frozen
[Figure: The destination owns all pages P1…Pn and executes transactions TDn+1,…,TDp; the source is no longer involved]
Once migrated, pages are never pulled back by the source
◦ Transactions at the source accessing migrated pages are aborted (see the sketch below)
No structural changes to indices during migration
◦ Transactions (at both nodes) that make structural changes to indices abort
Destination "pulls" pages on demand
◦ Transactions at the destination experience higher latency compared to normal operation
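A hedged sketch of the source-side rule, building on the PageOwnershipTable sketch above (names are again hypothetical): old transactions keep running at the source, but a page the destination has pulled is gone for good, so any source transaction touching it aborts.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the source's behavior in dual mode: pages are never
// pulled back, so a source transaction touching a migrated page must abort.
class SourceDualMode {
    // Pages whose ownership has already moved to the destination.
    private final Set<Integer> migrated = ConcurrentHashMap.newKeySet();

    // Called when the destination pulls a page (the handshake marks it migrated).
    void onPagePulled(int pageId) { migrated.add(pageId); }

    // Source-side access check for a still-active transaction.
    byte[] readPage(int pageId) {
        if (migrated.contains(pageId)) {
            // Aborting here avoids any destination -> source conflict dependency.
            throw new IllegalStateException("abort: page " + pageId + " now owned by destination");
        }
        return loadLocalPage(pageId);
    }

    private byte[] loadLocalPage(int pageId) { return new byte[0]; }
}
```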
Only concern is the "dual mode"
◦ In Init and Finish, only one node is executing transactions
Local predicate locking of internal index nodes and exclusive page-level locking between nodes ⇒ no phantoms
Strict 2PL ⇒ transactions are locally serializable
Pages transferred only once
◦ No Tdest → Tsource conflict dependency (illustrated in the sketch below)
Guaranteed serializability
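The argument can be stated compactly. The following illustrative check (hypothetical, not from the paper or prototype) captures why the conflict graph stays acyclic: because a page moves from source to destination exactly once and is never pulled back, every cross-node conflict edge is ordered source-transaction before destination-transaction.

```java
import java.util.List;

// Illustrative statement of the serializability argument: cross-node conflict
// edges can only point from a source transaction (TS*) to a destination
// transaction (TD*), never the reverse, because the source aborts any
// transaction that touches an already-migrated page.
record ConflictEdge(String from, String to) {}

class SerializabilityCheck {
    static boolean hasBackEdge(List<ConflictEdge> crossNodeConflicts) {
        // A TD -> TS edge would be the only way to close a cycle across nodes;
        // Zephyr's one-way page transfer makes it impossible.
        return crossNodeConflicts.stream()
                .anyMatch(e -> e.from().startsWith("TD") && e.to().startsWith("TS"));
    }

    public static void main(String[] args) {
        // Example: TS1 wrote page P3 before it migrated; TD1 read P3 afterwards.
        List<ConflictEdge> edges = List.of(new ConflictEdge("TS1", "TD1"));
        assert !hasBackEdge(edges) : "one-way page transfer rules this out";
    }
}
```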
Transaction recovery
◦ For every database page, transactions at the source are ordered before transactions at the destination
◦ After a failure, conflicting transactions are replayed in the same order
Migration recovery
◦ Atomic transitions between migration modes
  Logging and handshake protocols (see the sketch below)
◦ Every page has exactly one owner
  Bookkeeping at the index level
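A minimal sketch of the logging-and-handshake idea, with an assumed record format and hypothetical names (the paper specifies the actual protocol): ownership transfers and mode transitions are made durable before they are acknowledged, so replay after a crash reconstructs a state in which every page still has exactly one owner and both nodes agree on the migration mode.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical write-ahead log for migration state. A real engine would also
// fsync; flush() stands in for forcing the record to stable storage.
class MigrationLog {
    private final DataOutputStream wal;

    MigrationLog(OutputStream out) { this.wal = new DataOutputStream(out); }

    // Log that ownership of pageId moved to the destination; only after this
    // returns is the transfer acknowledged to the peer (the handshake).
    synchronized void logOwnershipTransfer(int pageId) throws IOException {
        wal.writeUTF("XFER");
        wal.writeInt(pageId);
        wal.flush();
    }

    // Log an atomic transition between migration modes (Init, Dual, Finish).
    synchronized void logModeChange(String newMode) throws IOException {
        wal.writeUTF("MODE");
        wal.writeUTF(newMode);
        wal.flush();
    }
}
```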
In the presence of arbitrary repeated failures, Zephyr ensures:
◦ Updates made to database pages are consistent
◦ A failure does not leave a page without an owner
◦ Both source and destination are in the same migration mode
Guaranteed termination and starvation freedom
Replicated Tenants
Sharded Tenants
Allow structural changes to the indices
◦ Using shared lock managers in the dual mode
Prototyped using H2, an open source OLTP database
◦ Supports standard SQL/JDBC API
◦ Serializable isolation level
◦ Tree indices
◦ Relational data model
Modified the database engine
◦ Added support for freezing indices
◦ Page migration status maintained using the index (see the sketch below)
◦ Details in the paper…
Tungsten SQL Router migrates JDBC connections during migration
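One engine change is keeping page migration status in the index itself. A hypothetical sketch of that bookkeeping, building on the PageOwnershipTable sketch above (these are not H2's actual classes): each leaf entry carries an owned-locally flag that a traversal checks, triggering a pull when it reaches an un-owned page.

```java
// Hypothetical sketch of migration status piggybacked on the index; the
// prototype modified H2's tree indices, but these are illustrative classes.
class LeafEntry {
    final int pageId;
    volatile boolean ownedLocally; // migration status checked during traversal

    LeafEntry(int pageId, boolean ownedLocally) {
        this.pageId = pageId;
        this.ownedLocally = ownedLocally;
    }
}

class FrozenIndex {
    // While frozen, the wireframe (structure) is immutable; only the
    // ownedLocally bits and page contents change as pages migrate.
    byte[] lookup(LeafEntry entry, PageOwnershipTable pages) {
        if (!entry.ownedLocally) {
            // Traversal found an un-owned page: pull it on demand, then
            // record the new status directly in the index entry.
            byte[] data = pages.readPage(entry.pageId);
            entry.ownedLocally = true;
            return data;
        }
        return pages.readPage(entry.pageId); // already local
    }
}
```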
Two database nodes, each with a DB instance running
Synthetic benchmark as load generator
◦ Modified YCSB to add transactions
◦ Small read/write transactions
Compared against Stop and Copy (S&C)
[Figure: experimental setup; a system controller with metadata coordinates migration between the two database nodes]
Default transaction parameters: 10 operations per transaction; 80% reads, 15% updates, 5% inserts (a sketch of such a transaction follows below)
Workload: 60 sessions, 100 transactions per session
Hardware: 2.4 GHz Intel Core 2 Quads, 8 GB RAM, 7200 RPM SATA HDs with 32 MB cache, Gigabit Ethernet
Default DB size: 100k rows (~250 MB)
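To make the workload concrete, here is a hedged sketch of the kind of small read/write transaction the modified YCSB driver issues under the default parameters above; the schema (a usertable with an auto-increment id and a payload column) is an assumption, not the benchmark's actual layout.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Random;

// Sketch of one workload transaction: 10 operations, 80% reads, 15% updates,
// 5% inserts, run at serializable isolation as in the prototype.
class TxnGenerator {
    static void runTxn(Connection conn, int numRows) throws SQLException {
        Random rnd = new Random();
        conn.setAutoCommit(false);
        conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
        for (int op = 0; op < 10; op++) {
            double r = rnd.nextDouble();
            int key = rnd.nextInt(numRows);
            if (r < 0.80) {        // 80% reads
                try (PreparedStatement s =
                         conn.prepareStatement("SELECT payload FROM usertable WHERE id = ?")) {
                    s.setInt(1, key);
                    s.executeQuery();
                }
            } else if (r < 0.95) { // 15% updates
                try (PreparedStatement s =
                         conn.prepareStatement("UPDATE usertable SET payload = ? WHERE id = ?")) {
                    s.setString(1, "x");
                    s.setInt(2, key);
                    s.executeUpdate();
                }
            } else {               // 5% inserts (assumes an auto-increment key)
                try (PreparedStatement s =
                         conn.prepareStatement("INSERT INTO usertable (payload) VALUES (?)")) {
                    s.setString(1, "x");
                    s.executeUpdate();
                }
            }
        }
        conn.commit();
    }
}
```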
Downtime (tenant unavailability)
◦ S&C: 3–8 seconds (tenant unavailable for updates while it is migrated)
◦ Zephyr: no downtime; either source or destination is always available
Service interruption (failed operations)
◦ S&C: ~100s to 1,000s of operations; all transactions with updates are aborted
◦ Zephyr: ~10s to 100s of operations; orders of magnitude less interruption
Average increase in transaction latency (compared to the 6,000-transaction workload without migration)
◦ S&C: 10–15%; cold cache at the destination
◦ Zephyr: 10–20%; pages fetched on demand
Data transfer
◦ S&C: persistent database image
◦ Zephyr: 2–3% additional data transfer (messaging overhead)
Total time taken to migrate
◦ S&C: 3–8 seconds, unavailable for any writes
◦ Zephyr: 10–18 seconds, no unavailability
[Figure: failed operations during migration; Zephyr causes orders of magnitude fewer failed operations than S&C]
Proposed Zephyr, a live database migration technique with no downtime for shared nothing architectures
◦ The first end-to-end solution with safety, correctness, and liveness guarantees
Prototype implementation on a relational OLTP database
Low cost on a variety of workloads
Either source or destination is serving the tenant
◦ No downtime
Serializable transaction execution
◦ Unique page ownership
◦ Local multi-granularity locking
Safety in the presence of failures
◦ Transactions are atomic and durable
◦ Migration state is recovered from the log
◦ Ensures consistency of the database state
Wireframe copy
◦ Typically orders of magnitude smaller than the data
Operational overhead during migration
◦ Extra data (in addition to database pages) transferred
◦ Transactions aborted during migration
[Figure annotation: failures are due to attempted modifications of the index structure]
[Figure annotations: only committed transactions reported; loss of cache for both migration types; in Zephyr a cache miss results in a remote fetch]