Zephyr: Live Migration in Shared Nothing Databases for Elastic

Aaron J. Elmore, Sudipto Das,
Divyakant Agrawal, Amr El Abbadi
Distributed Systems Lab
University of California Santa Barbara

Serve thousands of applications (tenants)
◦ AppEngine, Azure, Force.com

Tenants are (typically)
◦ Small
◦ SLA sensitive
◦ Erratic load patterns
◦ Subject to flash crowds
▪ i.e. the fark, digg, slashdot, reddit effect (for now)


Support for Multitenancy is critical
Our focus: DBMSs serving these platforms
Sudipto Das {[email protected]}
What the tenant wants…
What the service provider wants…
Static provisioning for peak is inelastic
[Figure: resources vs. time, showing capacity and demand for Traditional Infrastructures and for Deployment in the Cloud; unused resources marked]
Slide Credits: Berkeley RAD Lab
[Figure: Load Balancer, Application/Web/Caching tier, Database tier]

Migrate a tenant’s database in a live system
◦ A critical operation to support elasticity

Different from
◦ Migration between software versions
◦ Migration in case of schema evolution


VM migration [Clark et al., NSDI 2005]

One tenant-per-VM
◦ Pros: allows fine-grained load balancing
◦ Cons
▪ Performance overhead
▪ Poor consolidation ratio [Curino et al., CIDR 2011]

Multiple tenants in a VM
◦ Pros: good performance
◦ Cons: Migrate all tenants → Coarse-grained load balancing

Multiple tenants share the same database process
◦ Shared process multitenancy
◦ Example systems: SQL Azure, ElasTraS, RelationalCloud, and many more

Migrate individual tenants

VM migration cannot be used for fine-grained migration

Target architecture: Shared Nothing
◦ Shared storage architectures: see our VLDB 2011 Paper

How to ensure no downtime?
▪ Need to migrate the persistent database image (tens of MBs to GBs)

How to guarantee correctness during failures?
▪ Nodes can fail during migration
▪ How to ensure transaction atomicity and durability?
▪ How to recover migration state after failure?
▪ Nodes recover after a failure

How to guarantee serializability?
▪ Transaction correctness equivalent to normal operation

How to minimize migration cost? …

Downtime
◦ Time tenant is unavailable

Service Interruption
◦ Number of operations failing/transactions aborting

Migration Overhead/Performance impact
◦ During normal operation, migration, and after migration

Additional Data Transferred
◦ Data transferred in addition to DB’s persistent image

Migration executed in phases
▪ Starts with transfer of minimal information to destination (“wireframe”)

Source and destination concurrently execute transactions in one migration phase

Database pages used as granule of migration
▪ Pages “pulled” by destination on-demand

Minimal transaction synchronization
▪ A page is uniquely owned by either source or destination
▪ Leverage page level locking

Logging and handshaking protocols to tolerate failures
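
The ownership rule above (a page belongs to exactly one node, and the destination pulls pages on demand) can be illustrated with a minimal Python sketch. This is not the prototype's code: `Node`, `PageNotOwned`, `pull_page`, `read_at_destination`, and `read_at_source` are hypothetical names, pages are plain dictionary entries, and locking, logging, and the network are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One database node; `owned` maps page id -> page contents (hypothetical model)."""
    name: str
    owned: dict = field(default_factory=dict)


class PageNotOwned(Exception):
    """Raised when a node touches a page it has already handed over."""


def pull_page(source: Node, destination: Node, page_id: str) -> None:
    """Move one page to the destination; ownership moves with the page."""
    destination.owned[page_id] = source.owned.pop(page_id)


def read_at_destination(source: Node, destination: Node, page_id: str):
    """The destination pulls a page on first access, then serves it locally."""
    if page_id not in destination.owned:
        pull_page(source, destination, page_id)
    return destination.owned[page_id]


def read_at_source(source: Node, page_id: str):
    """The source may only use pages it still owns."""
    if page_id not in source.owned:
        raise PageNotOwned(page_id)
    return source.owned[page_id]


source = Node("source", {"P1": "rows-1", "P2": "rows-2", "P3": "rows-3"})
destination = Node("destination")

# A destination transaction touches P3, which triggers the pull.
value = read_at_destination(source, destination, "P3")
```

After the pull, `P3` appears only in `destination.owned`, so the two ownership sets stay disjoint, and a later `read_at_source(source, "P3")` raises `PageNotOwned`.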
For this talk
◦ Small tenants
▪ i.e. not sharded across nodes.
◦ No replication
◦ No structural changes to indices

Extensions in the paper
◦ Relax these assumptions
[Figure: Source and Destination nodes; owned pages P1, P2, P3, …, Pn; active transactions TS1, …, TSk. Legend: page owned by node / page not owned by node]
Freeze index wireframe and migrate
[Figure: Source with owned pages P1, P2, P3, …, Pn and active transactions TS1, …, TSk; Destination with un-owned pages P1, P2, P3, …, Pn. Legend: page owned by node / page not owned by node]
Requests for un-owned pages can block
[Figure: Source with pages P1, P2, P3, …, Pn and old, still active transactions TSk+1, …, TSl; Destination with pages P1, P2, P3, …, Pn and new transactions TD1, …, TDm. P3 is accessed by TDi and pulled from the source. Legend: page owned by node / page not owned by node]
Index wireframes remain frozen
Pages can be pulled by the destination, if needed
[Figure: Source and Destination, each showing pages P1, P2, P3, …, Pn; P1, P2, … are pushed from the source; transactions TDm+1, …, TDn; label: Completed. Legend: page owned by node / page not owned by node]
Index wireframe un-frozen
[Figure: Source and Destination; pages P1, P2, P3, …, Pn; transactions TDn+1, …, TDp. Legend: page owned by node / page not owned by node]

Once migrated, pages are never pulled back by source
◦ Transactions at source accessing migrated pages are aborted

No structural changes to indices during migration
◦ Transactions (at both nodes) that make structural changes to indices abort

Destination “pulls” pages on-demand
◦ Transactions at the destination experience higher latency compared to normal operation
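
The abort rules on this slide can be written as a small decision function. This is only an illustrative sketch: `transaction_outcome` and its parameters are hypothetical, the node labels are plain strings, and the rules are the two stated above, with nothing from the paper's extensions.

```python
def transaction_outcome(node, pages_accessed, migrated_pages, changes_index_structure):
    """Return "commit" or "abort" for a transaction running while migration is in progress.

    node: "source" or "destination" (hypothetical labels)
    pages_accessed: ids of the pages the transaction touches
    migrated_pages: ids of pages already transferred to the destination
    changes_index_structure: True if the transaction would alter the index structure
    """
    # Structural index changes abort at either node during migration.
    if changes_index_structure:
        return "abort"
    # Pages are never pulled back, so the source cannot serve a migrated page.
    if node == "source" and any(p in migrated_pages for p in pages_accessed):
        return "abort"
    # Otherwise the transaction proceeds; at the destination it may first wait for a pull.
    return "commit"


migrated = {"P3"}
results = [
    transaction_outcome("source", ["P1", "P3"], migrated, False),
    transaction_outcome("source", ["P1"], migrated, False),
    transaction_outcome("destination", ["P2"], migrated, False),
    transaction_outcome("destination", ["P3"], migrated, True),
]
```

The third case commits even though `P2` has not been migrated yet: the destination pulls the page on demand, which costs latency but does not cause an abort.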

Only concern is “dual mode”
◦ Init and Finish: only one node is executing transactions

Local predicate locking of internal index and exclusive page level locking between nodes → no phantoms

Strict 2PL → Transactions are locally serializable

Pages transferred only once
◦ No Tdest → Tsource conflict dependency

Guaranteed serializability
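
Because a page is transferred only once, every access to that page at the source precedes every access at the destination. The sketch below checks that property on a recorded access history. It is an illustration under assumptions: `violates_source_before_destination` is a hypothetical helper, and the history is a simplified list of `(page_id, node)` pairs rather than a real transaction schedule.

```python
def violates_source_before_destination(history):
    """history: list of (page_id, node) accesses in execution order.

    Returns True if some page is accessed at the source after it was accessed
    at the destination, which would create a Tdest -> Tsource dependency.
    """
    seen_at_destination = set()
    for page_id, node in history:
        if node == "destination":
            seen_at_destination.add(page_id)
        elif page_id in seen_at_destination:
            return True
    return False


# P3 moves once; P1 is still used at the source afterwards, which is fine.
ok_history = [
    ("P1", "source"),
    ("P3", "source"),
    ("P3", "destination"),
    ("P1", "source"),
    ("P3", "destination"),
]
# The source touching P3 after the destination did would break the ordering.
bad_history = [("P3", "destination"), ("P3", "source")]

ok_result = violates_source_before_destination(ok_history)
bad_result = violates_source_before_destination(bad_history)
```

With one-way transfer, a history like `bad_history` cannot arise, so every cross-node conflict edge points from a source transaction to a destination transaction.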

Transaction recovery
◦ For every database page, transactions at source ordered before transactions at destination
◦ After failure, conflicting transactions replayed in the same order

Migration recovery
◦ Atomic transitions between migration modes
▪ Logging and handshake protocols
◦ Every page has exactly one owner
▪ Bookkeeping at the index level
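
One way to picture logged mode transitions is a small state machine whose current mode is always the last record in its log. This is a sketch under assumptions, not the protocol from the paper: `Mode`, `NEXT_MODE`, and `MigrationState` are hypothetical names, the Python list stands in for a durable log, and the handshake that keeps both nodes in the same mode is not modeled.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    INIT = "init"
    DUAL = "dual"
    FINISH = "finish"


# Assumed order of modes: normal operation, Init, Dual, Finish, then normal again.
NEXT_MODE = {
    Mode.NORMAL: Mode.INIT,
    Mode.INIT: Mode.DUAL,
    Mode.DUAL: Mode.FINISH,
    Mode.FINISH: Mode.NORMAL,
}


class MigrationState:
    """Tracks the migration mode of one node; the list stands in for a durable log."""

    def __init__(self, log=None):
        self.log = list(log) if log else []

    @property
    def mode(self):
        # The current mode is whatever the last logged transition says.
        return self.log[-1] if self.log else Mode.NORMAL

    def transition(self, new_mode):
        # Only the next mode in the sequence is allowed; the log append is the commit point.
        if NEXT_MODE[self.mode] is not new_mode:
            raise ValueError(f"illegal transition {self.mode} -> {new_mode}")
        self.log.append(new_mode)


state = MigrationState()
state.transition(Mode.INIT)
state.transition(Mode.DUAL)

# Simulated restart: the mode is rebuilt from the log alone.
recovered = MigrationState(log=state.log)
```

Since the mode is derived only from the log, a node that restarts mid-migration resumes in the mode of its last logged transition (`Mode.DUAL` here) and never in a half-applied state.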

In the presence of arbitrary repeated failures, Zephyr ensures:
◦ Updates made to database pages are consistent
◦ A failure does not leave a page without an owner
◦ Both source and destination are in the same migration mode

Guaranteed termination and starvation freedom

Replicated Tenants

Sharded Tenants

Allow structural changes to the indices
◦ Using shared lock managers in the dual mode

Prototyped using an open source OLTP database H2
◦ Supports standard SQL/JDBC API
◦ Serializable isolation level
◦ Tree Indices
◦ Relational data model

Modified the database engine
◦ Added support for freezing indices
◦ Page migration status maintained using index
◦ Details in the paper…

Tungsten SQL Router migrates JDBC connections during migration


Two database nodes, each with a DB instance running

Synthetic benchmark as load generator
◦ Modified YCSB to add transactions
▪ Small read/write transactions

Compared against Stop and Copy (S&C)
[Figure: experimental setup; labels: System Controller, Metadata, Migrate]
Default transaction parameters: 10 operations per transaction; 80% Read, 15% Update, 5% Inserts
Workload: 60 sessions, 100 transactions per session
Hardware: 2.4 GHz Intel Core 2 Quads, 8 GB RAM, 7200 RPM SATA HDs with 32 MB cache, Gigabit Ethernet
Default DB size: 100k rows (~250 MB)
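
The default parameters fix the size of the workload, and the arithmetic is easy to check. The short calculation below uses only the numbers from this slide; the variable names are illustrative, and the per-type counts are expected values under the stated mix, not measured results.

```python
sessions = 60
transactions_per_session = 100
operations_per_transaction = 10
mix_percent = {"read": 80, "update": 15, "insert": 5}

total_transactions = sessions * transactions_per_session
total_operations = total_transactions * operations_per_transaction

# Expected number of operations of each type under the default mix.
operations_by_type = {
    kind: total_operations * percent // 100 for kind, percent in mix_percent.items()
}

print(total_transactions)   # 6000
print(total_operations)     # 60000
print(operations_by_type)   # {'read': 48000, 'update': 9000, 'insert': 3000}
```

The 6,000 transactions match the workload size that the latency comparison refers to.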

Downtime (tenant unavailability)
◦ S&C: 3 – 8 seconds (needed to migrate, unavailable for updates)
◦ Zephyr: No downtime. Either source or destination is available

Service interruption (failed operations)
◦ S&C: ~100s – 1,000s of failed operations. All transactions with updates are aborted
◦ Zephyr: ~10s – 100s of failed operations. Orders of magnitude less interruption

Average increase in transaction latency (compared to the 6,000 transaction workload without migration)
◦ S&C: 10 – 15%. Cold cache at destination
◦ Zephyr: 10 – 20%. Pages fetched on-demand

Data transfer
◦ S&C: Persistent database image
◦ Zephyr: 2 – 3% additional data transfer (messaging overhead)

Total time taken to migrate
◦ S&C: 3 – 8 seconds. Unavailable for any writes
◦ Zephyr: 10 – 18 seconds. No unavailability
Orders of magnitude fewer failed operations

Proposed Zephyr, a live database migration technique with no downtime for shared nothing architectures
◦ The first end to end solution with safety, correctness and liveness guarantees

Prototype implementation on a relational OLTP database

Low cost on a variety of workloads

Either source or destination is serving the tenant
◦ No downtime

Serializable transaction execution
◦ Unique page ownership
◦ Local multi-granularity locking

Safety in the presence of failures
◦ Transactions are atomic and durable
◦ Migration state is recovered from log
▪ Ensure consistency of the database state

Wireframe copy
▪ Typically orders of magnitude smaller than data

Operational overhead during migration
▪ Extra data (in addition to database pages) transferred

Transactions aborted during migration
Failures due to attempted modification of index structure



Only committed transactions reported
Loss of cache for both migration types
Zephyr results in a remote fetch