Transcript Slide 1

SAN FRANCISCO, CA, USA
Jonathan Eastep
David Wingate
Anant Agarwal
06/3/2012
Multicores are Complex!
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• The Problem





System complexity is skyrocketing!
Multicore architecture is a moving target
The best algorithm and algorithm settings depend
Application inputs and workloads can be dynamic
Online tuning is necessary but typically absent
2
The Big Picture
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• Developed a dynamic optimization framework to
auto-tune software and minimize burden
• Framework is based on online machine learning
technologies
• Demonstrated the framework by designing “Smart
Data Structures” for parallel programs
• The framework is general; could apply to systems
such as Clouds, OS, Runtimes
3
Smart Data Structures
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• Smart Data Structures are parallel data
structures that self-optimize to minimize
programmer burden
• They use online machine learning to adapt
to changing app or system needs and
achieve the best performance
• A library of Smart Data Structures open
sourced on github (GPL)
– github.com/mit-carbon/Smart-Data-Structures
• Publications: [1], [2], [3], [4]
4
A Sketch of The Benefits of SDS
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• Use a Smart Lock to optimize a master-worker program
/ 1e6
per second)
(Items(beats
Heartrate
per second
/ 1e6)




Measure rate of completed work items
Emulate dynamic frequency scaling due to Intel Turbo Boost®
Workload 1: Worker 0 @ 3GHz, others @ 2GHz
Workload 2: Worker 3 @ 3GHz, others @ 2GHz
Workload #1
6
1.6
x 10
Workload #2
Workload #1
Ideal
Smartlock
gap
1.4
1.2
Optimal
Smartlock
1
Baseline
Priority lock: policy 1
Priority lock: policy 2
0.8
Spin!lock: Reactive Lock
Spin!Lock: Test and Set
0.6
0.0
0.3
0.6
1.0
1.3
1.6
2.0
2.3
2.6
3.0
3.3
3.6
4.0
Time (seconds)
5
Outline
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• Smart Data Structures








Anatomy of a Smart Data Structure
Implementation Example
Research Challenges and Solutions
Online Machine Learning Algorithm
Empirical Benchmark Results
Empirical Scalability Studies
Future Directions
Conclusions
6
What are Smart Data Structures?
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• Self-aware computing applied to data structures
• Data Structures that self-optimize using online
learning
Data
Structure
Storage
E.g.
Queue
Interface
• add
• remove
• peek
t1
t2 …
Algorithm
tn
knobs
• hand-tuned
• per system
• per app
• static
Smart Data
Structure
E.g.
Smart Queue
Interface
• add
• remove
• peek
t1
t2 …
tn
Online
Learning
Storage
Algorithm
knobs
• self-tuned
• automatically
• at runtime
• We can optimize knobs in other systems too
7
Smart Data Structure Library
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• C++/C Library of Popular Parallel Data
Structures
• Supported:
• ML Optimization Type:
–
–
–
–
–
Smart Lock
Smart Queue
Smart SkipList
Smart PairHeap
Smart Stack
Lock Acquisition Scheduling
Tuning Flat Combining
• Future Work:
– Smart DHT
Dynamic Load-Balancing
8
Smart Queue, SkipList, PairHeap, Stack
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• Implementation should leverage bestperforming prior work
• What are the best? Determine with experiments.
• Result: Flat Combining Data Structures from
Hendler et al.
• This is contrary to conventional wisdom
• Reason: FC Algorithm minimizes synchronization
overheads by combining data structure ops and
applying multiple ops at once
9
Flat Combining Primer
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
Lock
Serial Data Structure
enq a
enq b
enq d
enq c
enq b
enq
enq
enq
cd a
Scancount
1
0
3!!!
2
Working
Working
Working
Combining
Working
10
Smart Queue, SkipList, PairHeap, Stack
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• Here the application of learning is to auto-tune
a performance-critical knob called the
scancount
E.g.:
Smart
Queue
Thread Request
Records
Lock
Interface
• enqueue
• dequeue
• peek
t1
Serial Queue
Reinforcement
Learning
(of a discrete variable)
Scancount
knobs
• number of scans over request records
• dynamically tune the time spent combining
t2 … tn
11
Why Does the Scancount Matter?
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• Scancount controls how long threads spend as
the combiner
• Increasing scancount allows combiner to do more
data structure ops within the same lock
• But, increasing scancount increases latency of the
combiner’s op
• It’s good to increase scancount up to a point, but
after that latency can hurt performance
• Smart Data Structures use online learning to find
the ideal scancount at any given time
12
SDS Implementation
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
throughput (ops/s)
Smart Data
Structure
E.g.
Smart Queue
Interface
• add
• remove
• peek
t1
t2
Storage
s
t
a
t
Online
Learning
Reward
External
Perf. Monitor
Algorithm
Learning
Thread
…
Application
Threads
tn
E.g. Heartbeats
• Goal: minimize application disruption
• Internal lightweight statistics or external
application-specific reward signal
• Number of learning threads is one by
default; it runs learning engines for all
SDS
13
SDS Implementation
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
– Machine learning co-optimization
framework
– Supports joint optimization: multiple knobs
– Supports discrete, gaussian, boolean,
permutation knobs
– Designed explicitly to support other
systems than SDS
14
Major SDS Research Challenges
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
1. How do you find knob settings
with best long-term effects?
2. How do you measure if a knob
setting is helping?
3. How do you optimize quickly
enough to not miss opportunities?
4. How do you manage a potentially
intractable search space?
Quality
Challenges
Timeliness
Challenges
15
Addressing Other Quality Challenges
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
1. How do you find settings with best long-term
effects?


Leverage one of the machine learning technologies for planning
Use online RL to adapt to workload or phase changes
2. How do you measure if a knob setting is helping?


Extensible reward signal interface for performance monitoring
Heartbeats Framework for application-specific perf. evaluations
16
Addressing Timeliness Challenges
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
“Sorry I’m late dear…
have you been waiting long?”
3. How to optimize fast enough not to miss opportunities?


Choose a fast gradient-based machine learning algorithm
Use learning helper thread to decouple learning from app threads
4. How to manage potentially intractable search space?


Relax potentially exponential discrete action space into continuous one
Use a stochastic soft-max policy which enables gradient-based learning
17
Reinforcement Learning Algorithm
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• Goal: optimize rate of reward (e.g. heart rate)
• Method: Policy Gradients Algorithm
 Online, model-free, handles exponential knob spaces
 Learn a stochastic policy which will give a probability
distribution over knob settings for each knob
 Sample settings for each knob from the policy, try them
empirically, and listen to performance feedback signal
 Improve the policy using a method analogous to
gradient ascent
• I.e. estimate gradient of the reward wrt policy and step
policy in the gradient direction to get maximum reward
• Balance exploration vs. exploitation + make policy
differentiable via stochastic soft-max policy
18
How Does SDS Perform?
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• Full sweep over SDS, load: compare against Static
Oracle
• Result: near-ideal performance in many cases
Smart Queue
Static
Oracle
Ideal
Static
SDS
Dynamic
SDS
Dynamic
Avg Static
Smart Skip List
Smart Pair Heap
Throughput (ops/ms)
Static Avg
1600
1400
1200
1000
800
600
400
200
0
1000
1000
800
600
400
200
0
800
600
400
200
0
Post Computation (ns)
†14 threads
• Result:
Post Computation (ns)
Post Computation (ns)
Quality Challenge is met
19
What if Workload Changes Rapidly?
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
Throughput (ops/ms)
• Inject changes in the data structure “load” (i.e. post computation
between ops)
• Sweep over SDS, random load schedules, frequencies
• Result: Good benefit even when load changes every 10μs
Smart Queue: Sched. 1
1000
800
600
400
200
0
Smart Skip List: Sched. 1
800
800
600
600
400
400
200
200
0
1/10000 1/1000
1/100
1/10
Interval Frequency (1/µs)
Dynamic Oracle
Smart Pairing Heap: Sched. 1
SDS Dynamic
0
1/10000 1/1000
1/100
1/10
Interval Frequency (1/µs)
1/10000 1/1000 1/100
1/10
Interval Frequency (1/µs)
Dynamic Average
†14
threads
• Result: Quality and Timeliness Challenges are met
20
Future Directions
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• Extend this work to a common framework to
coordinate tuning across all system layers
 E.g.: application -> runtime -> OS -> HW
 Scalable, decentralized optimization methods
21
Conclusions
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• Developed a framework to dynamically tune systems and
minimize programmer burden via online machine learning
• Demonstrated the framework through a case study of selftuning “Smart Data Structures”
• Now looking at uses in systems beyond data structures
 jonathan dot eastep at gmail
• Reinforcement Learning will play an increasingly important
role in the development of future software and hardware
22
Presentation References
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
[1] J. Eastep, D. Wingate, M. D. Santambrogio, A. Agarwal, “Smartlocks:
Lock Acquisition Scheduling for Self-Aware Synchronization,” 7th IEEE
International Conference on Autonomic Computing (ICAC’10), 2010.
Best Student Paper Award (pdf)
[2] J. Eastep, D. Wingate, M.D. Santambrogio, A. Agarwal, “Smartlocks:
Self-Aware Synchronization through Lock Acquisition Scheduling,” MIT
CSAIL Technical Report, MIT-CSAIL-TR-2009-055, November 2009.
(pdf)
[3] J. Eastep, D. Wingate, A. Agarwal, “Smart Data Structures: A
Reinforcement Learning Approach to Multicore Data Structures,” 8th
IEEE International Conference on Autonomic Computing (ICAC’11),
2011. (pdf)
[4] J. Eastep, “Smart Data Structures: An Online Machine Learning
Approach to Multicore Data Structures,” Doctoral Dissertation, MIT,
May 2011 (pdf)
23