5-Cady_Fu_Ren

Techniques for Multicore Thermal
Management
Field Cady, Bin Fu and Kai Ren
Techniques for Multicore Thermal
Management
•Overview and comparison of techniques
•Plus determining the critical thread
•DVFS details
•Thread movement
Taxonomy
• Stop & Go vs DVFS
– Stop & Go : suspend core operation for 30
milliseconds when temperature is above a threshold
– DVFS : dynamic voltage and frequency
scaling, set by a control-theoretic feedback loop
• Distributed vs Global
– Apply above to all cores or individually
– Performance asymmetry : different demands
on different cores
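
The two axes of the taxonomy (Stop & Go policy, distributed vs. global scope) can be sketched in a few lines of Python. The 30 ms stall is from the slide; the threshold value, the function name and the per-core temperature list are assumptions made for illustration only.

```python
STALL_MS = 30  # stall length quoted on the slide


def stop_go_stalls(temps_c, threshold_c, distributed=True):
    """Return the stall time in ms to apply to each core for one control step.

    temps_c: hypothetical per-core temperature readings in degrees C.
    threshold_c: hypothetical thermal limit; the slides do not give a value.
    """
    hot = [t > threshold_c for t in temps_c]
    if distributed:
        # Distributed: only the cores above the threshold are suspended.
        return [STALL_MS if h else 0 for h in hot]
    # Global: a single hot core suspends every core.
    stall = STALL_MS if any(hot) else 0
    return [stall] * len(temps_c)
```

With four cores and one of them over the limit, the distributed policy stalls only that core, while the global policy stalls all four. That is the performance asymmetry the slide points at.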
Taxonomy (cont.)
• Migration
– Moving threads between cores
– Timescale on order of a millisecond, much
slower than DVFS
– Migration is “outer loop” of control, riding on
top of DVFS or Stop-Go
• Migrate “critical” thread
– Measure criticality with heat sensor
– Or with cache misses as a proxy
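
A minimal sketch of one migration step, assuming the policy is "swap the threads on the hottest and coolest cores". The swap rule and the function names are assumptions. The per-core score can come from a thermal sensor or from a counter-based proxy; the code does not care which.

```python
def pick_migration(core_metric):
    """Pick a (source, destination) core pair for one migration step.

    core_metric: hypothetical per-core "hotness" score, from a sensor
    reading or a counter-based proxy.
    Returns None when all cores score the same and a move gains nothing.
    """
    cores = range(len(core_metric))
    src = max(cores, key=lambda c: core_metric[c])
    dst = min(cores, key=lambda c: core_metric[c])
    if core_metric[src] == core_metric[dst]:
        return None
    return src, dst


def migrate(assignment, core_metric):
    """Swap the threads on the hottest and coolest cores (assumed policy).

    assignment: list where assignment[c] is the thread running on core c.
    Returns a new list; the input is left unchanged.
    """
    pair = pick_migration(core_metric)
    swapped = list(assignment)
    if pair is not None:
        src, dst = pair
        swapped[src], swapped[dst] = swapped[dst], swapped[src]
    return swapped
```

Because this runs as the outer loop, it would be called far less often than the DVFS or Stop-Go step it rides on.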
Aside : Criticality
• In a separate paper, Abhishek et al. define
“critical” as the slowest thread
• If we know which is critical:
– Task stealing from critical thread
– Guide DVFS to prefer critical thread
• Explored proxies
• 13-32% performance boost in task stealing
on 32-core machine
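
As a rough illustration of criticality-guided task stealing: the thread with the most cache misses is predicted to be the slowest, and an idle thread takes work from it. The data structures, function names and the "steal from the tail" choice are assumptions, not details from the paper.

```python
from collections import deque


def predict_critical(miss_counts):
    """Return the id of the thread predicted to be slowest.

    miss_counts: hypothetical dict of thread id -> cache misses observed
    in the last interval, used as the criticality proxy.
    """
    return max(miss_counts, key=miss_counts.get)


def steal_task(queues, idle_thread, miss_counts):
    """Move one pending task from the predicted critical thread to an idle one.

    queues: hypothetical dict of thread id -> deque of pending tasks.
    Returns (victim, task), or None if no other thread has work queued.
    """
    candidates = {t: m for t, m in miss_counts.items()
                  if t != idle_thread and queues[t]}
    if not candidates:
        return None
    victim = predict_critical(candidates)
    task = queues[victim].pop()        # take from the tail of the victim's queue
    queues[idle_thread].append(task)
    return victim, task
```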
Criticality (cont.)
Cache misses are an excellent proxy
Donald and Martonosi :
comparison of techniques
• Goal : maximize performance subject to
temperature constraint
• Measure performance in BIPS and “duty
cycle”, i.e. % useful time, scaled for DVFS
frequency
• Run on SPEC benchmarks
• Simulated 4-core processor
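
The two metrics can be written down as follows. BIPS is simply billions of instructions per second. The duty-cycle formula is one plausible reading of "% useful time, scaled for DVFS frequency" and is an assumption: each interval counts in proportion to its clock frequency, and a stalled interval counts as zero.

```python
def bips(instructions, seconds):
    """Billions of instructions per second."""
    return instructions / seconds / 1e9


def duty_cycle(intervals, f_max_hz):
    """Frequency-scaled fraction of useful time (assumed definition).

    intervals: list of (duration_s, freq_hz) pairs; freq_hz is 0 while
    the core is stalled and below f_max_hz while it is throttled.
    """
    total = sum(duration for duration, _ in intervals)
    useful = sum(duration * freq / f_max_hz for duration, freq in intervals)
    return useful / total
```

Under this reading, a core that stalls for 30 ms out of every 100 ms has a duty cycle of 0.7, and so does a core that runs the whole time at 70% of its maximum frequency.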
Results
All normalized to distributed Stop-Go
Stop-Go was terrible!
– Why didn’t they try with lower frequency?
– Was 30 milliseconds the right time to stop?
They subsequently focus solely on DVFS,
even though the hardware is trickier
Migration Policies
Summary & Conclusion
• DVFS far superior to Stop-Go
• Distributed control helps, esp. for Stop-Go
• Migration helps for Stop-Go
• Counter and Sensor-based migration
comparable
DVFS
• Dynamic voltage and frequency scaling (per
core).
• Dynamic voltage scaling is a power
management technique in computer
architecture, where the voltage used in a
component is increased or decreased
• Dynamic frequency scaling (also known as CPU
throttling) is a technique in computer
architecture where a processor is run at a
less-than-maximum frequency in order to conserve
power.
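
Why scaling voltage and frequency together pays off can be seen from the standard CMOS switching-power model, P = C · V² · f. The sketch below uses that textbook model; it is not taken from the slides, and the capacitance value in the usage note is made up.

```python
def dynamic_power(c_eff, voltage, freq_hz):
    """Switching power of CMOS logic: P = C * V^2 * f.

    c_eff: effective switched capacitance in farads (hypothetical value).
    voltage: supply voltage in volts.
    freq_hz: clock frequency in hertz.
    """
    return c_eff * voltage ** 2 * freq_hz
```

For example, dropping both voltage and frequency to 80% of nominal gives `dynamic_power(1e-9, 0.8, 1.6e9) / dynamic_power(1e-9, 1.0, 2e9)`, which is 0.8³, or roughly half the power for a 20% loss in clock rate.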
Challenge
• Multiple cores may need to be manipulated
simultaneously to control both power and
temperature for a CMP chip. This requires
Multi-Input-Multi-Output (MIMO) control
• Application software is always designed for
single-core processors. Power shifting needed.
• Heterogeneous cores
• Workload of a CMP processor is unpredictable
at design time and may vary significantly at
runtime
DVFS
Open-Loop Control
P(k+1) = P(k) + A Δf(k)
Using Feedback (Closed-loop)
• Dynamically change matrix A.
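
A minimal sketch of the power model and of one way feedback could adjust A. `predict_power` applies P(k+1) = P(k) + A Δf(k) directly. `update_gain_diagonal` is an assumed, deliberately simple estimator that corrects only the diagonal of A from the measured power change; the slides do not say how A is actually updated, and the `rate` parameter is hypothetical.

```python
def predict_power(p, a, delta_f):
    """Apply P(k+1) = P(k) + A * delta_f(k).

    p: per-core power at step k.
    a: square gain matrix as a list of rows.
    delta_f: per-core frequency change applied at step k.
    """
    return [p_i + sum(a_ij * df_j for a_ij, df_j in zip(row, delta_f))
            for p_i, row in zip(p, a)]


def update_gain_diagonal(a, delta_f, p_before, p_after, rate=0.5):
    """Move each diagonal entry of A toward the observed gain (assumed rule).

    The observed gain for core i is (measured power change) / (frequency
    change). Off-diagonal entries are left untouched in this sketch.
    """
    new_a = [list(row) for row in a]
    for i, df in enumerate(delta_f):
        if df != 0:
            observed = (p_after[i] - p_before[i]) / df
            new_a[i][i] += rate * (observed - a[i][i])
    return new_a
```

In open-loop use, A stays fixed and only `predict_power` is needed. In closed-loop use, each control period would measure the real power, call the update, and plan the next Δf with the corrected matrix.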
Thread Motion: Fine-Grained
Power Management for Multi-Core Systems
Motivation
• Limitations of DVFS
– Coarse grained
• Initiated by OS in milliseconds
• Voltage transition delay ~ 10 microseconds
• Too slow to respond to fine variations in program
behavior (Cache miss ~ nanoseconds)
– Per-core DVFS with multiple VF settings
• High cost of off-chip regulators
• Bad scalability with a large number of cores
Thread Motion
• Idea of Thread Motion
– Moving threads between cores with two VF domains
– Threads experience virtually continuous Voltage
Thread Motion
• TM Manager
– A separate embedded microcontroller running TM
algorithm
• Effective IPC
– maintain a table of IPC for each application
– high IPC – compute-intensive
– low IPC – cache miss, memory access latency
Thread Motion: Algorithm
• Movement Policy
– Assign a thread in a compute-intensive phase
to a high VF core
– Intra-cluster movement considered first
• Trigger point:
– TM-interval : fixed intervals ~ 200 cycles
– Miss-driven : move a cache-missed thread
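
The movement policy and the miss-driven trigger can be sketched as below. This is an illustration under assumptions: the IPC table is a plain dict, VF levels are just "high" and "low", and the intra-cluster preference is ignored. None of the names come from the paper.

```python
def assign_threads(effective_ipc, n_high_vf):
    """Give the high-VF cores to the threads with the highest effective IPC.

    effective_ipc: hypothetical dict of thread id -> effective IPC, as kept
    in the TM manager's table.
    n_high_vf: number of cores in the high-VF domain.
    """
    ranked = sorted(effective_ipc, key=effective_ipc.get, reverse=True)
    return {thread: ("high" if rank < n_high_vf else "low")
            for rank, thread in enumerate(ranked)}


def on_cache_miss(placement, effective_ipc, missed):
    """Miss-driven trigger: a stalled thread gives up its high-VF core.

    The thread that missed swaps places with the highest-IPC thread that is
    currently on a low-VF core. Returns a new placement dict.
    """
    new = dict(placement)
    low = [t for t, vf in new.items() if vf == "low"]
    if new.get(missed) != "high" or not low:
        return new
    promoted = max(low, key=effective_ipc.get)
    new[missed], new[promoted] = "low", "high"
    return new
```

The interval-driven trigger would simply call `assign_threads` again at each TM-interval with the refreshed IPC table.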
Thread Motion