Improving Energy Efficiency by Making DRAM Less Randomly Accessed
Hai Huang, Kang G. Shin, Charles Lefurgy, Tom Keller
University of Michigan
IBM Austin Research Lab
Overview

- Continual increase in the power budget allocated to main memory (i.e., DRAM)
  - E.g., in a mid-range IBM eServer system, 40% of the total system energy is consumed by its main memory subsystem
- Existing power management techniques only monitor memory traffic passively and manage power accordingly, so they do not fully exploit the deeper power-saving states
- => Actively shape memory traffic to enable existing techniques to save more energy
Passively Monitoring Memory Traffic

- Why is passively monitoring memory traffic inefficient?
  - Memory accesses are random – good for performance, bad for energy consumption!
  - Idle time between consecutive memory accesses is often too short to use the deeper power-saving states
  - Randomness is mostly due to the OS’s arbitrary virtual-to-physical mapping
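
The idle-time argument can be illustrated with a small sketch. It assumes that each power state has a break-even idle time below which entering it is not worthwhile; the state names follow the slides, but the threshold values, the access trace, and all function names are hypothetical and chosen only for illustration.

```python
# Hypothetical break-even idle times in ns, deepest state first.
# These numbers are made up; real values depend on the DRAM device.
BREAK_EVEN_NS = [("ultra-low-power", 10_000), ("low-power", 1_000), ("high-power", 0)]

def deepest_state(idle_ns):
    """Return the deepest state whose break-even time fits in the idle gap."""
    for state, threshold in BREAK_EVEN_NS:
        if idle_ns >= threshold:
            return state
    return "high-power"

def idle_gaps(access_times_ns, window_ns):
    """Idle gaps of one rank within [0, window_ns], including both ends."""
    points = [0] + sorted(access_times_ns) + [window_ns]
    return [b - a for a, b in zip(points, points[1:]) if b > a]

def best_state_reached(access_times_ns, window_ns):
    """Deepest state the rank could enter during its longest idle gap."""
    return deepest_state(max(idle_gaps(access_times_ns, window_ns)))

WINDOW_NS = 12_000
accesses = list(range(0, WINDOW_NS, 400))  # one access every 400 ns

# Passive: an arbitrary mapping spreads accesses evenly over both ranks.
passive = {0: accesses[0::2], 1: accesses[1::2]}
# Active: the same accesses are all steered to rank 0; rank 1 stays idle.
active = {0: accesses, 1: []}

for name, trace in (("passive", passive), ("active", active)):
    for rank, times in trace.items():
        print(name, "rank", rank, "->", best_state_reached(times, WINDOW_NS))
```

In this toy trace neither rank ever idles long enough to leave the high-power state under the passive layout, while the active layout keeps rank 0 busy and lets rank 1 idle for the whole window.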
Example: Active vs. Passive
[Figure: power state (high-power, low-power, ultra low-power) of Rank 0 and Rank 1 over time, under passive memory traffic management and under active memory traffic management]
How to Shape Memory Traffic
- Essentially, we need to artificially create disparity in access frequency among different memory ranks
Hot Ranks and Cold Ranks
- Disparity in access frequency can be created by finding and migrating frequently-accessed pages to a subset of memory ranks
  - Hot ranks: contain frequently-accessed pages
  - Cold ranks: contain infrequently-accessed and unmapped pages
- Page migration can be done by system software
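
A minimal sketch of how system software might plan such migrations, assuming per-page access counts are available and the hot ranks can hold a fixed number of pages. The function name, its parameters, and the sample data are hypothetical; the slides do not specify the selection policy.

```python
def plan_migrations(access_counts, page_rank, hot_ranks, hot_capacity):
    """Plan page moves that gather the most-accessed pages in the hot ranks.

    access_counts: page -> number of accesses observed
    page_rank:     page -> rank currently holding the page
    hot_ranks:     set of ranks designated as hot
    hot_capacity:  number of pages the hot ranks can hold (assumed)

    Returns a list of (hot_page, cold_page) pairs. Each hot page moves into a
    hot rank; cold_page is the page it swaps with, or None if no cold page
    occupies a hot rank and a free frame is assumed to exist.
    """
    # Most-accessed pages first; ties broken by page name for determinism.
    ranked = sorted(access_counts, key=lambda p: (-access_counts[p], p))
    hot_pages = set(ranked[:hot_capacity])

    # Hot pages that currently live in a cold rank need to be promoted.
    promote = [p for p in ranked
               if p in hot_pages and page_rank[p] not in hot_ranks]
    # Cold pages that occupy a hot rank are swap candidates, coldest first.
    demote = [p for p in reversed(ranked)
              if p not in hot_pages and page_rank[p] in hot_ranks]

    return [(p, demote[i] if i < len(demote) else None)
            for i, p in enumerate(promote)]

# Made-up example: rank 0 is the only hot rank and holds two pages.
counts = {"A": 900, "B": 40, "C": 700, "D": 5, "E": 300, "F": 2}
location = {"A": 0, "B": 0, "C": 2, "D": 1, "E": 3, "F": 2}
plan = plan_migrations(counts, location, hot_ranks={0}, hot_capacity=2)
print(plan)
```

Page A is already in the hot rank, so only page C needs to move, swapping places with the rarely accessed page B.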
Implementation
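[Figure: implementation diagram. Labels: a process with first-level and second-level page tables; time triggers; a migration thread in the operating system that migrates (old_page, new_page) and modifies the page table (PT); the memory controller (MC) with a page counter; Ranks 0 to 3, divided into hot ranks and cold ranks]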
Issues with Page Migration
- There is a cost associated with each page migration
- Memory access frequency is often highly skewed
  - 6% of pages account for 75% of accesses
  - 14% of pages account for 90% of accesses
- Not all pages need to be migrated
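
The skew can be quantified by asking how few pages are needed to cover a given share of all accesses. The sketch below does this for a list of per-page access counts; the function name and the sample counts are hypothetical and are not the measurements reported on this slide.

```python
def pages_needed(access_counts, target_percent):
    """Smallest number of pages that covers target_percent of all accesses."""
    counts = sorted(access_counts, reverse=True)  # most-accessed pages first
    total = sum(counts)
    if total == 0:
        return 0
    running = 0
    for n, count in enumerate(counts, start=1):
        running += count
        # Integer comparison avoids floating-point rounding at the boundary.
        if running * 100 >= target_percent * total:
            return n
    return len(counts)

# Made-up access counts for ten pages (1000 accesses in total).
sample_counts = [400, 350, 100, 50, 40, 30, 10, 10, 5, 5]
for percent in (75, 90):
    n = pages_needed(sample_counts, percent)
    print(f"{n} of {len(sample_counts)} pages cover {percent}% of accesses")
```

Only the pages returned by such a calculation are worth migrating; moving the remaining, rarely accessed pages would add migration cost for little benefit.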
Evaluation
- Simulators
  - Mambo [IBM] – A full-machine simulator, cycle-accurate, supports the PowerPC architecture
  - Memsim [IBM] – Detailed trace-driven main memory simulator, written in CSIM
- Workloads
  - Low memory-intensive workload: SPECjbb + bzip + crafty
  - High memory-intensive workload: SPECjbb + art + mcf
  - SPECjbb: simulating 8 warehouses
  - SPEC2000 benchmarks: using the Reference input set
Low Memory-Intensive Workload
[Figure: normalized runtime and normalized power for HW, HW1, HW5, and HW10; y-axis from 0 to 1.2]
High Memory-Intensive Workload
[Figure: normalized runtime and normalized power for HW, HW1, HW5, and HW10; y-axis from 0 to 1.2]
Summary of Results
- Energy:
  - Actively shaping memory traffic saves 35% more energy than passively monitoring
- Performance:
  - Low memory-intensive workload: small impact on performance
  - High memory-intensive workload: significantly degrades performance due to more contention on hot ranks
- Cost:
  - Use hardware counters, or
  - Software page faults
Conclusion
- Actively shaping memory traffic allows existing power management techniques to more effectively save power
- Highly-skewed page accesses are observed
- Alternative main memory design:
  - Use high-performance/highly-parallel ranks as hot ranks
  - Use low-performance/low-power ranks as cold ranks
  - Allows frequently-accessed pages to be accessed faster
  - Allows memory ranks that hold infrequently-accessed and unmapped pages to consume less energy