Transcript Slide 1
SAN FRANCISCO, CA, USA
Jonathan Eastep
David Wingate
Anant Agarwal
06/3/2012
Multicores are Complex!
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• The Problem
System complexity is skyrocketing!
Multicore architecture is a moving target
The best algorithm and algorithm settings depend
Application inputs and workloads can be dynamic
Online tuning is necessary but typically absent
2
The Big Picture
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• Developed a dynamic optimization framework to
auto-tune software and minimize burden
• Framework is based on online machine learning
technologies
• Demonstrated the framework by designing “Smart
Data Structures” for parallel programs
• The framework is general; could apply to systems
such as Clouds, OS, Runtimes
3
Smart Data Structures
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• Smart Data Structures are parallel data
structures that self-optimize to minimize
programmer burden
• They use online machine learning to adapt
to changing app or system needs and
achieve the best performance
• A library of Smart Data Structures open
sourced on github (GPL)
– github.com/mit-carbon/Smart-Data-Structures
• Publications: [1], [2], [3], [4]
4
A Sketch of The Benefits of SDS
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• Use a Smart Lock to optimize a master-worker program
/ 1e6
per second)
(Items(beats
Heartrate
per second
/ 1e6)
Measure rate of completed work items
Emulate dynamic frequency scaling due to Intel Turbo Boost®
Workload 1: Worker 0 @ 3GHz, others @ 2GHz
Workload 2: Worker 3 @ 3GHz, others @ 2GHz
Workload #1
6
1.6
x 10
Workload #2
Workload #1
Ideal
Smartlock
gap
1.4
1.2
Optimal
Smartlock
1
Baseline
Priority lock: policy 1
Priority lock: policy 2
0.8
Spin!lock: Reactive Lock
Spin!Lock: Test and Set
0.6
0.0
0.3
0.6
1.0
1.3
1.6
2.0
2.3
2.6
3.0
3.3
3.6
4.0
Time (seconds)
5
Outline
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• Smart Data Structures
Anatomy of a Smart Data Structure
Implementation Example
Research Challenges and Solutions
Online Machine Learning Algorithm
Empirical Benchmark Results
Empirical Scalability Studies
Future Directions
Conclusions
6
What are Smart Data Structures?
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• Self-aware computing applied to data structures
• Data Structures that self-optimize using online
learning
Data
Structure
Storage
E.g.
Queue
Interface
• add
• remove
• peek
t1
t2 …
Algorithm
tn
knobs
• hand-tuned
• per system
• per app
• static
Smart Data
Structure
E.g.
Smart Queue
Interface
• add
• remove
• peek
t1
t2 …
tn
Online
Learning
Storage
Algorithm
knobs
• self-tuned
• automatically
• at runtime
• We can optimize knobs in other systems too
7
Smart Data Structure Library
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• C++/C Library of Popular Parallel Data
Structures
• Supported:
• ML Optimization Type:
–
–
–
–
–
Smart Lock
Smart Queue
Smart SkipList
Smart PairHeap
Smart Stack
Lock Acquisition Scheduling
Tuning Flat Combining
• Future Work:
– Smart DHT
Dynamic Load-Balancing
8
Smart Queue, SkipList, PairHeap, Stack
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• Implementation should leverage bestperforming prior work
• What are the best? Determine with experiments.
• Result: Flat Combining Data Structures from
Hendler et al.
• This is contrary to conventional wisdom
• Reason: FC Algorithm minimizes synchronization
overheads by combining data structure ops and
applying multiple ops at once
9
Flat Combining Primer
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
Lock
Serial Data Structure
enq a
enq b
enq d
enq c
enq b
enq
enq
enq
cd a
Scancount
1
0
3!!!
2
Working
Working
Working
Combining
Working
10
Smart Queue, SkipList, PairHeap, Stack
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• Here the application of learning is to auto-tune
a performance-critical knob called the
scancount
E.g.:
Smart
Queue
Thread Request
Records
Lock
Interface
• enqueue
• dequeue
• peek
t1
Serial Queue
Reinforcement
Learning
(of a discrete variable)
Scancount
knobs
• number of scans over request records
• dynamically tune the time spent combining
t2 … tn
11
Why Does the Scancount Matter?
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• Scancount controls how long threads spend as
the combiner
• Increasing scancount allows combiner to do more
data structure ops within the same lock
• But, increasing scancount increases latency of the
combiner’s op
• It’s good to increase scancount up to a point, but
after that latency can hurt performance
• Smart Data Structures use online learning to find
the ideal scancount at any given time
12
SDS Implementation
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
throughput (ops/s)
Smart Data
Structure
E.g.
Smart Queue
Interface
• add
• remove
• peek
t1
t2
Storage
s
t
a
t
Online
Learning
Reward
External
Perf. Monitor
Algorithm
Learning
Thread
…
Application
Threads
tn
E.g. Heartbeats
• Goal: minimize application disruption
• Internal lightweight statistics or external
application-specific reward signal
• Number of learning threads is one by
default; it runs learning engines for all
SDS
13
SDS Implementation
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
– Machine learning co-optimization
framework
– Supports joint optimization: multiple knobs
– Supports discrete, gaussian, boolean,
permutation knobs
– Designed explicitly to support other
systems than SDS
14
Major SDS Research Challenges
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
1. How do you find knob settings
with best long-term effects?
2. How do you measure if a knob
setting is helping?
3. How do you optimize quickly
enough to not miss opportunities?
4. How do you manage a potentially
intractable search space?
Quality
Challenges
Timeliness
Challenges
15
Addressing Other Quality Challenges
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
1. How do you find settings with best long-term
effects?
Leverage one of the machine learning technologies for planning
Use online RL to adapt to workload or phase changes
2. How do you measure if a knob setting is helping?
Extensible reward signal interface for performance monitoring
Heartbeats Framework for application-specific perf. evaluations
16
Addressing Timeliness Challenges
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
“Sorry I’m late dear…
have you been waiting long?”
3. How to optimize fast enough not to miss opportunities?
Choose a fast gradient-based machine learning algorithm
Use learning helper thread to decouple learning from app threads
4. How to manage potentially intractable search space?
Relax potentially exponential discrete action space into continuous one
Use a stochastic soft-max policy which enables gradient-based learning
17
Reinforcement Learning Algorithm
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• Goal: optimize rate of reward (e.g. heart rate)
• Method: Policy Gradients Algorithm
Online, model-free, handles exponential knob spaces
Learn a stochastic policy which will give a probability
distribution over knob settings for each knob
Sample settings for each knob from the policy, try them
empirically, and listen to performance feedback signal
Improve the policy using a method analogous to
gradient ascent
• I.e. estimate gradient of the reward wrt policy and step
policy in the gradient direction to get maximum reward
• Balance exploration vs. exploitation + make policy
differentiable via stochastic soft-max policy
18
How Does SDS Perform?
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• Full sweep over SDS, load: compare against Static
Oracle
• Result: near-ideal performance in many cases
Smart Queue
Static
Oracle
Ideal
Static
SDS
Dynamic
SDS
Dynamic
Avg Static
Smart Skip List
Smart Pair Heap
Throughput (ops/ms)
Static Avg
1600
1400
1200
1000
800
600
400
200
0
1000
1000
800
600
400
200
0
800
600
400
200
0
Post Computation (ns)
†14 threads
• Result:
Post Computation (ns)
Post Computation (ns)
Quality Challenge is met
19
What if Workload Changes Rapidly?
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
Throughput (ops/ms)
• Inject changes in the data structure “load” (i.e. post computation
between ops)
• Sweep over SDS, random load schedules, frequencies
• Result: Good benefit even when load changes every 10μs
Smart Queue: Sched. 1
1000
800
600
400
200
0
Smart Skip List: Sched. 1
800
800
600
600
400
400
200
200
0
1/10000 1/1000
1/100
1/10
Interval Frequency (1/µs)
Dynamic Oracle
Smart Pairing Heap: Sched. 1
SDS Dynamic
0
1/10000 1/1000
1/100
1/10
Interval Frequency (1/µs)
1/10000 1/1000 1/100
1/10
Interval Frequency (1/µs)
Dynamic Average
†14
threads
• Result: Quality and Timeliness Challenges are met
20
Future Directions
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• Extend this work to a common framework to
coordinate tuning across all system layers
E.g.: application -> runtime -> OS -> HW
Scalable, decentralized optimization methods
21
Conclusions
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• Developed a framework to dynamically tune systems and
minimize programmer burden via online machine learning
• Demonstrated the framework through a case study of selftuning “Smart Data Structures”
• Now looking at uses in systems beyond data structures
jonathan dot eastep at gmail
• Reinforcement Learning will play an increasingly important
role in the development of future software and hardware
22
Presentation References
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
[1] J. Eastep, D. Wingate, M. D. Santambrogio, A. Agarwal, “Smartlocks:
Lock Acquisition Scheduling for Self-Aware Synchronization,” 7th IEEE
International Conference on Autonomic Computing (ICAC’10), 2010.
Best Student Paper Award (pdf)
[2] J. Eastep, D. Wingate, M.D. Santambrogio, A. Agarwal, “Smartlocks:
Self-Aware Synchronization through Lock Acquisition Scheduling,” MIT
CSAIL Technical Report, MIT-CSAIL-TR-2009-055, November 2009.
(pdf)
[3] J. Eastep, D. Wingate, A. Agarwal, “Smart Data Structures: A
Reinforcement Learning Approach to Multicore Data Structures,” 8th
IEEE International Conference on Autonomic Computing (ICAC’11),
2011. (pdf)
[4] J. Eastep, “Smart Data Structures: An Online Machine Learning
Approach to Multicore Data Structures,” Doctoral Dissertation, MIT,
May 2011 (pdf)
23