
Trust-Sensitive Scheduling
on the Open Grid
Jon B. Weissman
with help from Jason Sonnek and Abhishek Chandra
Department of Computer Science
University of Minnesota
Trends in HPDC Workshop
Amsterdam 2006
Background
• Public donation-based infrastructures are attractive
– positives: cheap, scalable, fault-tolerant (UW Condor, *@home, ...)
– negatives: "hostile" - uncertain resource availability/connectivity, node behavior, and end-user demand => best-effort service
Background
• Such infrastructures have been used for
throughput-based applications
– just make progress, all tasks equal
• Service applications are more challenging
– all tasks not equal
– explicit boundaries between user requests
– may even have SLAs, QoS, etc.
Service Model
• Distributed Service
– request -> set of independent tasks
– each task mapped to a donated node
– makespan = time until all tasks of a request complete
– e.g., BLAST service
• user request (input sequence) + chunk of DB forms a task
BOINC + BLAST
workunit = input_sequence + chunk of DB
generated when a request arrives
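A minimal sketch of this request fan-out, assuming a hypothetical Workunit record and a database already split into chunks; the names are illustrative, not BOINC's actual workunit API.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Workunit:
    """One independent task: the user's input sequence paired with one chunk of the DB."""
    request_id: int
    input_sequence: str
    db_chunk_id: int

def make_workunits(request_id: int, input_sequence: str, num_db_chunks: int) -> List[Workunit]:
    """Fan one service request out into one workunit per DB chunk; the request's
    makespan is set by the slowest workunit to come back."""
    return [Workunit(request_id, input_sequence, chunk) for chunk in range(num_db_chunks)]

# Example: a BLAST-style request against a database split into 4 chunks.
tasks = make_workunits(request_id=1, input_sequence="MKTAYIAKQR", num_db_chunks=4)
```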
The Challenge
• Nodes are unreliable
– timeliness: heterogeneity, bottlenecks, …
– cheating: hacked, malicious (> 1% of SETI@home nodes), misconfigured
– failure
– churn
• For a service, this matters
Some data – timeliness
Computation Heterogeneity
- both across and within nodes
PlanetLab – lower bound
Communication Heterogeneity
- both across and within nodes
The Problem for Today
• Deal with node misbehavior
• Result verification
– application-specific verifiers – not general
– redundancy + voting
• Most approaches assume ad-hoc replication
– under-replicate: task re-execution (↑ latency)
– over-replicate: wasted resources (↓ throughput)
• Using information about the past behavior of a
node, we can intelligently size the amount of
redundancy
System Model
Problems with ad-hoc replication
[Diagram: task x sent to worker group A and task y sent to worker group B, each group containing a mix of reliable and unreliable nodes]
Smart Replication
• Reputation
– ratings based on past interactions with clients
– simple sample-based prob. (ri) over window t
– extend to worker group (assuming no collusion) =>
likelihood of correctness (LOC)
• Smarter Redundancy
– variable-sized worker groups
– intuition: higher reliability clients => smaller
groups
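A minimal sketch of such a sample-based rating, assuming each verified result is logged as correct or incorrect and r_i is the fraction of correct results in the last t outcomes; the class name and defaults are illustrative.

```python
from collections import deque

class ReputationRating:
    """Sample-based reliability estimate r_i over a sliding window of the last t verified results."""

    def __init__(self, window_t: int = 20, prior: float = 0.5):
        self.outcomes = deque(maxlen=window_t)  # True = node agreed with the accepted (voted) answer
        self.prior = prior                      # rating assumed before any history exists

    def record(self, correct: bool) -> None:
        self.outcomes.append(correct)

    def rating(self) -> float:
        if not self.outcomes:
            return self.prior
        return sum(self.outcomes) / len(self.outcomes)

# Example: a node that returned 8 good results and 2 bad ones gets r_i = 0.8.
r = ReputationRating()
for ok in [True] * 8 + [False] * 2:
    r.record(ok)
print(r.rating())
```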
Terms
• LOC (Likelihood of Correctness), lg
– computes the ‘actual’ probability of getting a correct answer from a
group of clients (group g)

– for a group g of 2k+1 clients with reliabilities r_i, the majority vote is correct when at least k+1 clients return the correct result:

l_g = \sum_{m=k+1}^{2k+1} \; \sum_{\substack{S \subseteq \{1,\dots,2k+1\} \\ |S| = m}} \; \prod_{i \in S} r_i \prod_{i \notin S} (1 - r_i)
• Target LOC (ltarget)
– the task success-rate that the system tries to ensure while forming
client groups
– related to the statistics of the underlying distribution
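A minimal sketch of the LOC computation above: for a group of 2k+1 clients with ratings r_i, sum the probability of every outcome in which at least k+1 clients (a majority) return the correct answer. Brute-force enumeration is fine at the small group sizes used here; the function name is an assumption.

```python
from itertools import combinations
from math import prod

def likelihood_of_correctness(ratings):
    """LOC for a group of size 2k+1 with per-client reliabilities r_i:
    probability that at least k+1 clients (a majority) return the correct answer."""
    n = len(ratings)                                   # n = 2k + 1
    k = n // 2
    loc = 0.0
    for m in range(k + 1, n + 1):                      # m = k+1 ... 2k+1 correct answers
        for correct in combinations(range(n), m):      # which clients answered correctly
            in_set = set(correct)
            loc += prod(ratings[i] if i in in_set else 1 - ratings[i] for i in range(n))
    return loc

# Example: three clients rated 0.9, 0.8 and 0.6 give LOC ≈ 0.876.
print(likelihood_of_correctness([0.9, 0.8, 0.6]))
```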
Trust-Sensitive Scheduling
• Guiding metrics
– throughput r: the number of successfully completed tasks in an interval
– success rate s: ratio of throughput to number of
tasks attempted
Scheduling Algorithms
• First-Fit
– attempt to form the first group that satisfies ltarget (see the sketch below)
• Best-Fit
– attempt to form the group that best satisfies ltarget
• Random-Fit
– attempt to form a random group that satisfies ltarget
• Fixed-size
– randomly form fixed-size groups; ignore client ratings
• Random and Fixed are our baselines
• Min group size = 3
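A minimal sketch of First-Fit group formation, assuming "first" means taking the highest-rated available clients and growing the group (odd-sized, minimum 3) until ltarget is met; it reuses the likelihood_of_correctness sketch above and is illustrative rather than the exact algorithm used in the experiments.

```python
def first_fit_group(ratings, l_target, min_size=3):
    """Take the highest-rated available clients and grow the group two at a time
    (keeping the size odd for majority voting) until it satisfies l_target."""
    ranked = sorted(ratings, reverse=True)
    group, rest = ranked[:min_size], ranked[min_size:]
    while True:
        if likelihood_of_correctness(group) >= l_target:
            return group
        if len(rest) < 2:
            return None                     # l_target cannot be met with the clients left
        group, rest = group + rest[:2], rest[2:]

# Example: with l_target = 0.95 the three most reliable clients already suffice.
pool = [0.95, 0.9, 0.85, 0.7, 0.6, 0.5, 0.4]
print(first_fit_group(pool, l_target=0.95))   # -> [0.95, 0.9, 0.85]
```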
Scheduling Algorithms
Scheduling Algorithms (cont’d)
Different Groupings
ltarget = 0.5
Evaluation
• Simulated a wide variety of node reliability distributions
• Set ltarget to be the success rate of Fixed
– goal: match the success rate of Fixed (which over-replicates) yet achieve higher throughput
– if desired, can drive throughput even higher (but success rate would suffer)
Comparison
gain: 25-250%
open question: how much better could we have done?
Non-stationarity
• Nodes may suddenly shift gears
– deliberately malicious, virus, detach/rejoin
– underlying reliability distribution changes
• Solution
– window-based rating (reduce window t from infinite to 20)
• Experiment: "blackout" at round 300 (30% of nodes affected)
Role of ltarget
• Key parameter
• Too large
– groups will be too large (low throughput)
• Too small
– groups will be too small (low success rate)
• Adaptively learn it (parameterless)
– maximizing r * s : “goodput”
– or could bias toward r or s
Adaptive algorithm
• Multi-objective optimization
– choose target LOC to simultaneously maximize
throughput r and success rate s
– combined objective: maximize a1·r + a2·s
– use weighted combination to reduce multiple
objectives to a single objective
– employ hill-climbing and feedback techniques to
control dynamic parameter adjustment
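A minimal sketch of one hill-climbing step on ltarget, assuming throughput and success rate are measured each round and combined as a1·r + a2·s; the step size, bounds, and interface are assumptions, not the exact controller used here.

```python
def adapt_l_target(l_target, direction, prev_objective, cur_objective,
                   step=0.05, lo=0.05, hi=0.99):
    """One hill-climbing update: keep moving l_target in the same direction while the
    weighted objective a1*r + a2*s improves; reverse direction when it gets worse."""
    if cur_objective < prev_objective:
        direction = -direction
    new_target = min(hi, max(lo, l_target + direction * step))
    return new_target, direction

# Example round: objective = a1*r + a2*s with a1 = a2 = 1.
a1, a2 = 1.0, 1.0
prev_obj = a1 * 18 + a2 * 0.90   # previous round: throughput 18, success rate 0.90
cur_obj  = a1 * 22 + a2 * 0.88   # current round: objective improved, so keep the same direction
print(adapt_l_target(0.5, +1, prev_obj, cur_obj))   # -> (0.55, 1)
```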
Adapting ltarget
• Blackout example
Throughput (a1=1, a2=0)
[Bar chart: "Xput comparison - BF" – throughput of Best-Fit with Min, Adapt, and Max settings of ltarget, across the Bimodal, HeavyHigh, HeavyLow, NormHigh, NormLow, and Uniform reliability distributions]
Current/Future Work
• Implementation of reputation-based scheduling framework (BOINC and PL)
• Mechanisms to retain node identities (hence ri) under node churn
– "node signatures" that capture the characteristics of the node
Current/Future Work (cont’d)
• Timeliness
– extending reliability to encompass time
– a node whose performance is highly variable is less
reliable
• Client collusion
– detection: group signatures
– prevention:
• combine quiz-based tasks with reputation systems
• form random groupings
Thank you.