PROGRAMMING SUPPORT AND ADAPTIVE CHECKPOINTING FOR HIGH-THROUGHPUT DATA SERVICES WITH LOG-BASED RECOVERY
Jingyu Zhou (Shanghai Jiao Tong University)
Jiesheng Wu (Microsoft)
Caijie Zhang (Google Inc.)
Hong Tang (Yahoo!)
Tao Yang (UC Santa Barbara)
DSN 2010
Background
• Many large-scale data-mining and offline applications at Google, Yahoo, Microsoft, Ask.com, etc. require:
  - High data parallelism/throughput
  - Data persistence, but less stringent availability
• E.g., the URL property service (UPS) on Ask.com's search offline mining platform
  - Hundreds of application modules access UPS
Examples of high-throughput data services for web mining/search
[Diagram: crawlers fetch web documents from the Internet into document DBs holding 10-50 billion URLs; data/info services (e.g., the URL property service at 100K-500K requests/s) sit between the document DBs and many data-mining jobs.]
Existing Approaches for High Performance and Persistence
• Database systems suffer from high overhead: supporting general features limits their performance
  - Need more machine resources
• Related work and well-known techniques for high availability:
  - Data replication
  - Log-based recovery
  - Checkpointing
Challenges and Focus of This Work
• System design with careful selection and integration of fault-tolerant techniques for high-throughput computing
  - Trade off availability: allow some downtime
  - Low cost: logging/checkpointing
  - Fine-grained, for minimum service disruption
  - Local data recovery; periodic remote backup
• Programming support
  - Lightweight, simplifying construction of robust data services
SLACH: Selective Logging & Adaptive CHeckpointing
• Targeted data services:
  - Request-driven thread model
  - In-memory objects
  - Data independence
  - Similar to key-value stores in BigTable/Dynamo, but higher throughput
Architecture of SLACH
Main Techniques
• Selective operation logging
  - Only log write operations: (oid, op_type, parameters, timestamp) (a record sketch follows this slide)
  - Write-ahead log, i.e., log the operation, then apply it
• Object-level checkpointing to avoid service disruptions, with adaptive load control
  - Checkpoint objects one by one, still allowing concurrent access to other objects
  - Perform checkpointing when load is low to amortize its cost
• Lightweight API while supporting legacy code
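To make the record layout concrete, below is a minimal sketch of a write-ahead log entry and an append routine following the (oid, op_type, parameters, timestamp) format above. The names (LogRecord, append_record) and the binary encoding are illustrative assumptions, not SLACH's actual implementation.

#include <cstdint>
#include <fstream>
#include <vector>

/* Illustrative log record for one write operation (assumed layout). */
struct LogRecord {
  int64_t oid;                   /* id of the modified object */
  int32_t op_type;               /* application-defined operation code */
  int64_t timestamp;             /* time of the write */
  std::vector<char> parameters;  /* serialized operation arguments */
};

/* Write-ahead discipline: append (and flush) the record before the
 * in-memory object is modified. */
void append_record(std::ofstream& log, const LogRecord& r) {
  uint32_t len = static_cast<uint32_t>(r.parameters.size());
  log.write(reinterpret_cast<const char*>(&r.oid), sizeof(r.oid));
  log.write(reinterpret_cast<const char*>(&r.op_type), sizeof(r.op_type));
  log.write(reinterpret_cast<const char*>(&r.timestamp), sizeof(r.timestamp));
  log.write(reinterpret_cast<const char*>(&len), sizeof(len));
  log.write(r.parameters.data(), len);
  log.flush();  /* persist before applying the write to the object */
}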
Object-level Checkpoints
Adaptive Checkpointing Control
• Goal: balance checkpointing cost and recovery speed
  - Checkpoint less frequently -> larger logs -> lengthy recovery
  - Checkpoint too often -> higher overhead
• Ideally:
  - High server load -> checkpoint less frequently
  - Low server load -> checkpoint more frequently
• Adjust between a low watermark (LW) and a high watermark (HW) of service load
• Load is smoothed with a moving average (sketched below): load_curr = α × load_prev + (1 - α) × sample
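A minimal sketch of this moving-average load estimator; the class name is illustrative, and it assumes one load sample per sampling window w:

/* Exponentially weighted moving average of server load:
 * load_curr = alpha * load_prev + (1 - alpha) * sample */
class LoadEstimator {
 public:
  explicit LoadEstimator(double alpha) : alpha_(alpha), load_(0.0) {}
  /* Called once per sampling window with the latest load sample. */
  double update(double sample) {
    load_ = alpha_ * load_ + (1.0 - alpha_) * sample;
    return load_;
  }
  double current() const { return load_; }
 private:
  double alpha_;  /* smoothing factor, e.g., 0.8 for UPS/HIS */
  double load_;   /* smoothed load estimate */
};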
Adaptive Checkpointing Frequency
• Checkpoint threshold varies between LB and UB
  - LB and UB are log-size parameters determined by the application
• Threshold = LB + F(load) × (UB - LB), where F(load) ∈ [0, 1] grows with the server load between LW and HW (see the sketch below)
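The exact form of F(load) is not preserved in this transcript; the sketch below assumes one plausible choice, namely the load normalized between the LW/HW watermarks and raised to the scaling exponent β from the parameter table later in the talk. Treat it as an illustration of the control rule, not SLACH's actual function.

#include <algorithm>
#include <cmath>

/* Adaptive checkpoint threshold: threshold = LB + F(load) * (UB - LB).
 * F() below is an assumed form (normalized load to the power beta). */
double ckpt_threshold(double load, double LW, double HW,
                      double LB, double UB, double beta) {
  double f = (load - LW) / (HW - LW);    /* position between the watermarks */
  f = std::min(1.0, std::max(0.0, f));   /* clamp to [0, 1] */
  f = std::pow(f, beta);                 /* assumed scaling */
  return LB + f * (UB - LB);             /* log-size threshold (in entries) */
}

A checkpoint round is started once the accumulated log size exceeds this threshold; higher load yields a larger threshold and thus less frequent checkpointing, matching the policy above.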
SLACH Programming Support
• Application developers:
  - Call the SLACH function log() to log an object operation
  - Define 3 callback functions:
    1) what to checkpoint (call SLACH's ckpt() for each selected object),
    2) how to recover one object from a checkpoint,
    3) how to replay a log operation.
• SLACH:
  - Provides the functions log() and ckpt()
  - Calls the user's checkpoint callback during checkpointing
  - Calls the user's recover callback during checkpoint recovery
  - Calls the user's replay callback when recovering from a log (the overall recovery order is sketched below)
SLACH API for Applications
class SLACH::API {
public:
  /* register ckpt. policy and parameters */
  void register_policy(const Policy& p);
  /* log one write operation */
  void log(int64_t obj_id, int op, ...);
  /* checkpoint one object */
  void ckpt(int64_t obj_id, const void* addr, uint32_t size);
};
SLACH Interface
class SLACH::Application {
  …
protected:
  /* application checkpoint callback function */
  virtual void ckpt_callback() = 0;
  /* callback for loading one object checkpoint */
  virtual void load_one_callback(int64_t obj_id, const void* addr, uint32_t size) = 0;
  /* callback for replaying one operation log */
  virtual void replay_one_callback(int64_t obj_id, int op, const para_vec& args) = 0;
};
An Example: Application-level Code
struct Item {
  double price;
  int quantity;
};

class MyService : public SLACH::Application {
private:
  Item obj[1000];                 /* application objects being accessed */
  SLACH::API slach_;              /* SLACH API */
  static const int OP_PRICE = 0;  /* an op type */
public:
  void update_price(int id, double p) {
    slach_.log(id, OP_PRICE, &p, sizeof(p)); /* log the selected object update operation */
    obj[id].price = p;
  }
An Example: Callback Functions
  /* SLACH calls this user function during checkpointing. */
  void ckpt_callback() {
    for (int i = 0; i < 1000; i++)
      slach_.ckpt(i, &obj[i], sizeof(obj[i]));
  }

  /* SLACH calls this when recovering an object from a checkpoint. */
  void load_one_callback(int64_t id, const void* p, uint32_t size) {
    memcpy(&obj[id], p, size);
  }

  /* SLACH calls this when recovering an object by log replay. */
  void replay_one_callback(int64_t id, int op, const para_vec& args) {
    switch (op) {
    case OP_PRICE:
      obj[id].price = *(double*)args[0].second;
      break;
    // ...
    }
  }
};
SLACH Implementation and Applications
• Part of the Ask.com middleware infrastructure, in C++, for the data mining and search offline platform
• Application samples:
  - UPS (URL property service): records properties of all URLs crawled/collected
  - HIS (Host information service): records properties of all hosts crawled on the web
• 20-80% of the traffic is writes; running on a cluster of hundreds of machines; in production for the last 3 years
• Significantly reduced development time (from 1-2 months to a few days)
Characteristics of UPS/HIS
Performance characteristics of UPS/HIS per partition:

        Data     Max. Read     Max. Write
  UPS   1.9 GB   110K req/s    56K req/s
  HIS   2.1 GB   58K req/s     16K req/s
Parameters for Adaptive Ckpt. Control

  Param   Meaning              UPS             HIS
  α       Moving average       0.8             0.8
  LB/UB   Lower/upper bound    1M-8M entries   0.3M-1.8M entries
  LW/HW   Low/high watermark   20%-85%         35%-85%
  β       Scaling              3               6
  w       Sampling window      5s              5s
Evaluation
• Impact of logging overhead
• System behavior during checkpointing
• Effectiveness of adaptive checkpoint control
• Performance comparison of a hash table implementation using SLACH and BerkeleyDB
Evaluation Setting
• Benchmarks:
  - UPS (URL property service)
  - HIS (Host-level property service)
  - Persistent Hash Table (PHT)
• Metric: throughput loss percentage
  LossPercent = (1 - SuccessfulRequests / TotalRequests) × 100
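  For example (hypothetical numbers), 950K successful requests out of 1M total gives LossPercent = (1 - 0.95) × 100 = 5%.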
• Hardware: a 15-node cluster with gigabit links
Selective Logging Overhead of UPS
• Base: logging is disabled
• Log: selective logging is enabled
• Negligible impact when server load < 40%
System Performance During Checkpointing (100% server load)
• During checkpointing, 8.9% throughput drop
• During checkpointing, 57.6% increase in response time
Effectiveness of Adaptive Threshold Controller – Performance Comparison in UPS
• Fixed threshold policy: 8M has lower runtime overhead (less frequent checkpointing)
• The adaptive approach has performance comparable to the fixed 8M policy
Effectiveness of Threshold Controller – Recovery Speed
• Fixed threshold -> fixed log size -> same recovery time
• Adaptive approach: small logs under light load (faster recovery), larger logs under higher load (slower recovery)
PHT vs. Berkeley DB
• For 30-byte values, SLACH throughput is 5.3 times higher
• SLACH is better for all value sizes, because:
  1. BDB incurs more per-operation overhead
  2. BDB involves more disk I/Os
• SLACH checkpointing has less overhead:
  1. BDB checkpointing is not asynchronous
  2. SLACH's fuzzy checkpointing still allows access
Conclusions
• SLACH contributions:
  - A lightweight programming framework for very high-throughput, persistent data services
  - Simplifies application construction while meeting reliability demands
  - Selective logging to enhance performance
  - System design with careful integration of multiple techniques
    - Dynamically adjusts checkpoint frequency to meet throughput demands
    - Fine-grained checkpointing without service disruption
  - Evaluation of the integrated scheme in production applications
Data and Failure Models
• Data independence and object-oriented access model
  - Key-value store as in Dynamo/BigTable, but with much higher throughput demand per machine
  - Each object is a contiguous memory block
    - The middleware infrastructure can handle noncontiguous ones
• Fail-stop
  - Focus on local recovery from application failures
  - OS/hardware failures can be dealt with via remote checkpoints
    - Implemented, but outside the scope of this paper