SAN FRANCISCO, CA, USA
Adaptive Energy-efficient Resource Sharing for Multi-threaded Workloads in Virtualized Systems
Can Hankendi
Ayse K. Coskun
Boston University
Electrical and Computer Engineering Department
This project has been partially funded by:
Energy Efficiency in Computing Clusters
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• Energy-related costs are among the biggest contributors to
the total cost of ownership.
• Consolidating multiple workloads on the same physical node
improves energy efficiency.
(Source: International Data Corporation (IDC), 2009)
Multi-threaded Applications in the Cloud
• HPC applications are expected to shift towards cloud resources.
• Resource allocation decisions significantly affect the energy efficiency
of server nodes.
• Energy efficiency is a function of application characteristics.
[Figure: Energy Savings on Virtualized Server (%), showing the maximum and minimum energy saving]
Outline
• Background
• Methodology
• Adaptive Resource Sharing
• Results
• Conclusions
Background
Cluster-level VM Management
- Consolidation policies across server nodes
- VM migration techniques
[Srikantaiah, HotPower’08] [Bonvin, CCGrid’11]
Node-level Management
- Co-scheduling based on thread communication
- Identifying best thread mixes to co-schedule
[Frachtenberg, TPDS’05] [McGregor, IPDPS’05]
Recent Co-scheduling Policies
- Co-scheduling contrasting workloads
- Balancing performance events across nodes
  - Cache misses
  - IPC
  - Bus accesses
[Dhiman, ISLPED’09] [Bhadauria, ICS’10]
Virtualized System Setup
• 12-core AMD Magny-Cours server
  - Two 6-core dies attached side by side in the same package
  - Private L1 and L2 caches for each core
  - 6 MB shared L3 cache for each 6-core die
• Virtualized through the VMware vSphere 5 ESXi hypervisor
  - 2 virtual machines (VMs) running an Ubuntu Server guest OS
Methodology: Measurement Setup
• System-level power measurements at a 1 s sampling interval
• Performance counter collection through vmkperf at a 1 s sampling interval
  - Counters: CPU cycles, retired instructions, L3-cache misses
• VM-level CPU and memory utilization data collection through esxtop at a 2 s sampling interval
[Diagram components: esxtop, vmkperf, system-level power measurement, logger]
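Because the counters are sampled every 1 s and utilization every 2 s, the two logs have to be aligned before they can be combined. The following is a minimal sketch of one way to do that, holding the most recent utilization sample for each counter sample. The function name `merge_logs`, the tuple layouts, and the sample values are illustrative assumptions, not the format produced by vmkperf or esxtop.

```python
from bisect import bisect_right


def merge_logs(counters, utilization):
    """Align 1 s counter samples with 2 s utilization samples.

    counters:    list of (t, cycles, instructions), sorted by time (hypothetical layout)
    utilization: list of (t, cpu_util), sorted by time (hypothetical layout)
    Returns rows of (t, ipc, cpu_util), reusing the latest utilization sample.
    """
    util_times = [t for t, _ in utilization]
    rows = []
    for t, cycles, instructions in counters:
        k = bisect_right(util_times, t) - 1  # latest utilization sample at or before t
        if k < 0 or cycles == 0:
            continue  # no utilization sample yet, or no cycles counted
        rows.append((t, instructions / cycles, utilization[k][1]))
    return rows


# Made-up samples, for illustration only.
counters = [(0, 1000, 500), (1, 1000, 800), (2, 2000, 3000), (3, 1000, 1000)]
utilization = [(0, 0.50), (2, 0.90)]
rows = merge_logs(counters, utilization)
for row in rows:
    print(row)
```

Holding the last value is only one possible choice; interpolating or averaging the 1 s samples down to 2 s would also work.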
Parallel Workloads
• PARSEC 2.1 benchmark suite [Bienia et al., 2008]
Benchmark       Application          IPC      Memory Acc.
blackscholes    Financial Analysis   Low      Low
bodytrack       Computer Vision      High     Medium
canneal         VLSI Design          Low      High
dedup           Enterprise Storage   Medium   Low
ferret          Similarity Search    Medium   Low
freqmine        Data Mining          High     Low
swaptions       Financial Analysis   High     Low
streamcluster   Data Mining          Low      High
vips            Media Processing     High     Low
x264            Media Processing     Medium   Medium
Tracking Parallel Phases
• consolmgmt: consolidation management interface
  - Synchronizes the ROI (region of interest) of multiple workloads
[Diagram labels: parsecmgmt, hooks.c, consolmgmt; Benchmark A and Benchmark B, each with Input (Serial) and Output (Serial) phases; events sleep(), roi-Trigger(), start-Logging(), end-Logging()]
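The idea of synchronizing the ROIs of several workloads can be illustrated with a barrier: each benchmark finishes its serial input phase at its own pace, then waits until all co-scheduled benchmarks are ready before entering the parallel region. This is only a sketch of the concept using Python threads. It is not the consolmgmt implementation, and `run_benchmark`, `run_consolidated`, and the input durations are hypothetical.

```python
import threading
import time


def run_benchmark(name, input_seconds, roi_barrier, roi_start_times):
    """Simulate one benchmark: serial input, then a synchronized ROI start."""
    time.sleep(input_seconds)  # stands in for the serial input phase
    roi_barrier.wait()         # stands in for the ROI trigger: wait for all workloads
    roi_start_times[name] = time.monotonic()
    # The parallel region of interest would run here, with logging enabled.


def run_consolidated(input_phases):
    """Run all benchmarks and return each ROI start time relative to launch."""
    barrier = threading.Barrier(len(input_phases))
    roi_start_times = {}
    threads = [
        threading.Thread(target=run_benchmark,
                         args=(name, seconds, barrier, roi_start_times))
        for name, seconds in input_phases.items()
    ]
    launch = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return {name: ts - launch for name, ts in roi_start_times.items()}


# Benchmark B has the longer input phase, so A waits for it at the barrier.
starts = run_consolidated({"A": 0.05, "B": 0.20})
print(starts)
```

With the barrier in place, both ROIs begin at nearly the same moment, so the logged interval covers only the overlapping parallel phases.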
Performance Impact of Consolidation
• Consolidating multiple workloads can degrade performance due to
resource contention.
• Virtualization provides performance isolation by managing memory
and NUMA node affinities.
• With a native OS, performance variation is 2.5x higher.
[Figure: Average throughput of streamcluster when co-scheduled with another PARSEC benchmark]
Impact of Application Selection
• Previous co-scheduling policies focus on
application selection to improve energy efficiency.
• Application selection is based on balancing
memory operations and CPU usage.
[Figure: four panels labeled A, B, C, and D]
Predicting Power Efficiency
• To improve energy efficiency, we need to allocate more CPU resources to power-efficient workloads.
• The IPC*CPU Utilization metric shows a strong correlation with power efficiency.
[Figure axis label: IPC*CPU Utilization]
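As a rough illustration of the metric, the sketch below computes IPC*CPU Utilization from raw counter values and measures its linear correlation with a power-efficiency series. The function names and all numbers are made up for the example. They are not measurements from the talk, and the slides do not say which correlation measure the authors used, so Pearson correlation is an assumption here.

```python
import math


def ipc_util_metric(instructions, cycles, cpu_util):
    """IPC (retired instructions per cycle) multiplied by CPU utilization (0..1)."""
    return (instructions / cycles) * cpu_util


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-application samples: (instructions, cycles, cpu_util).
samples = [(500, 1000, 0.4), (900, 1000, 0.7), (1500, 1000, 0.9), (2000, 1000, 1.0)]
metric = [ipc_util_metric(i, c, u) for i, c, u in samples]

# Hypothetical throughput-per-watt values for the same applications.
efficiency = [0.8, 1.9, 3.6, 5.1]
print(metric, pearson(metric, efficiency))
```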
Application Classification
• The IPC*CPU Utilization metric is used to classify applications according to their power efficiency levels.
• We use a density-based clustering algorithm (DBSCAN) to group applications into power efficiency classes.
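To make the clustering step concrete, here is a minimal one-dimensional DBSCAN sketch over IPC*CPU Utilization values. The metric values, `eps`, and `min_pts` are illustrative assumptions, since the slides do not give the parameters that were actually used. Label -1 marks noise, meaning a point that belongs to no dense group.

```python
def dbscan_1d(values, eps, min_pts):
    """Cluster scalar values with DBSCAN; returns one label per value (-1 = noise)."""
    n = len(values)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(n) if abs(values[i] - values[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(reachable)
    return labels


# Hypothetical IPC*CPU Utilization values for seven applications.
metric = [0.20, 0.25, 0.30, 0.90, 1.00, 1.02, 2.40]
print(dbscan_1d(metric, eps=0.15, min_pts=2))
```

A density-based method does not need the number of classes in advance, which fits a setting where the set of running applications is not known beforehand.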
[Figure: VM configurations for the benchmarks. Case 1: VM0 and VM1 on ESXi. Case 2: VM0, VM0, VM1, VM1 on ESXi]
Reconfiguring Resource Allocations
• CPU hot-plugging:
  - Adding/removing vCPUs at runtime
  - Cons: Removing vCPUs is not supported by some OSes
• Resource allocation adjustment:
  - Allocating/limiting CPU resources for VMs
  - Pros: Fine granularity (the resource allocation unit is MHz)
• Both techniques have low overhead (less than 1%).
[Figure: Resource Configuration Comparison]
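Since CPU limits are expressed in MHz, one simple way to picture resource allocation adjustment is to split the node's total CPU capacity among VMs in proportion to a per-VM weight derived from its application class. The sketch below shows only that arithmetic. The proportional rule, the weights, and the 24000 MHz capacity are assumptions made for illustration, and the code does not call any hypervisor API.

```python
def cpu_limits_mhz(total_mhz, class_weights):
    """Split total CPU capacity (MHz) among VMs in proportion to their weights.

    class_weights maps a VM name to a positive integer weight (hypothetical:
    a higher weight stands for a more power-efficient application class).
    Limits are rounded down to whole MHz.
    """
    total_weight = sum(class_weights.values())
    if total_weight <= 0:
        raise ValueError("at least one VM needs a positive weight")
    return {vm: total_mhz * weight // total_weight
            for vm, weight in class_weights.items()}


# Hypothetical node capacity and weights for two VMs.
limits = cpu_limits_mhz(24000, {"VM0": 3, "VM1": 1})
print(limits)
```

Because the unit is MHz rather than whole vCPUs, the split can follow the class weights much more closely than hot-plugging could.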
Reconfiguration Runtime Behavior
• Resource allocation limits can be dynamically adjusted according to
application classes.
• CPU allocation limits can be effectively reconfigured within a
second.
Results
• The proposed approach improves throughput-per-watt by up to 25% and by 9% on average.
Results
• We generate 50 workload sets, each consisting of 10 randomly selected PARSEC applications.
Set 1: 4x blackscholes, 2x vips, 1x bodytrack, 1x freqmine, 1x streamcluster, 1x swaptions
Set 2: 3x canneal, 3x ferret, 2x bodytrack, 1x dedup, 1x vips
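The workload-set construction can be sketched as random sampling with replacement from the ten PARSEC benchmarks, which matches the repeated entries in the example sets (for instance, 4x blackscholes). The seed, the function names, and the use of Python's `random` module are assumptions for illustration. The slides do not describe how the sets were actually generated.

```python
import random
from collections import Counter

PARSEC_APPS = [
    "blackscholes", "bodytrack", "canneal", "dedup", "ferret",
    "freqmine", "swaptions", "streamcluster", "vips", "x264",
]


def generate_workload_sets(num_sets=50, apps_per_set=10, seed=0):
    """Return num_sets lists, each with apps_per_set randomly chosen benchmarks."""
    rng = random.Random(seed)  # fixed seed so the sets are reproducible
    return [rng.choices(PARSEC_APPS, k=apps_per_set) for _ in range(num_sets)]


def summarize(workload_set):
    """Format a set in the slide's style, e.g. '4x blackscholes, 2x vips'."""
    counts = Counter(workload_set)
    return ", ".join(f"{n}x {app}" for app, n in counts.most_common())


workload_sets = generate_workload_sets()
print(summarize(workload_sets[0]))
```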
Results
• The proposed resource sharing technique improves throughput-per-watt by 12% on average compared with application-selection-based co-scheduling techniques.
Conclusions & Future Work
• Consolidation is a powerful technique for improving energy efficiency in data centers.
• The energy efficiency of parallel workloads varies significantly depending on application characteristics.
• Adaptive VM configuration for parallel workloads improves energy efficiency by 12% on average over existing co-scheduling algorithms.
• Future research directions include:
  - Investigating the effect of memory allocation decisions on energy efficiency;
  - Utilizing application-level instrumentation to explore power/energy optimization opportunities;
  - Expanding the application space.