Temperature-Aware Design

Presented by Mehul Shah
4/29/04
The Problem

Power & thermal densities are increasing
  Currently at 50 W/cm², 100 W/cm² at 50 nm technology
  Power density doubles every 3 years
  Operating Vdd scaling much more slowly (ITRS)
Cost of cooling rising exponentially
  $1 - $3 per Watt of power dissipation
  Packages designed for worst-case power
Hot spots: heat dissipation non-uniform across chip
Low-power design techniques not sufficient
Big Hammer: global clock gating limits performance
Impact of Temperature on Design

Increased Delay, Lower Reliability

Slower Transistors
  Carrier mobility lower at higher temperature
  Inverter 35% slower at 110 °C vs. 60 °C
Higher Leakage Power
  By orders of magnitude at higher temperature
  Leakage becoming more significant than switching power
Higher Metal Resistivity
  Copper 39% more resistive at 120 °C vs. 20 °C
Lower Mean-Time-To-Failure (MTF)
  MTF = MTF_0 * exp(Ea / (kB * T))
  MTF decreases exponentially with temperature
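
To get a feel for the exponential dependence, the sketch below evaluates the ratio of MTF at two temperatures using the formula above. MTF_0 cancels in the ratio. The activation energy of 0.7 eV and the function name `mtf_ratio` are illustrative assumptions, not figures from the talk.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def mtf_ratio(t_hot_c, t_cool_c, ea_ev=0.7):
    """Return MTF(t_hot) / MTF(t_cool) for MTF = MTF_0 * exp(Ea / (kB * T)).

    ea_ev is an ASSUMED activation energy in eV; temperatures are in Celsius
    and converted to Kelvin before use.
    """
    t_hot_k = t_hot_c + 273.15
    t_cool_k = t_cool_c + 273.15
    return math.exp(ea_ev / K_B_EV * (1.0 / t_hot_k - 1.0 / t_cool_k))


ratio = mtf_ratio(110.0, 60.0)
print(f"MTF at 110 C is {ratio:.3f}x the MTF at 60 C (assumed Ea = 0.7 eV)")
```

A different activation energy changes the ratio substantially, so the printed number should be read only as an indication of the trend.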
Moral of the Story

Problem: Temperature adversely affects power, performance & reliability

Solution: “Temperature-Aware” Design
Temperature-Aware Design

Thermal Modeling
  Estimate operating temperature
  Simple: allow architects to easily reason about thermal effects
  Detailed: model runtime temperature at functional-unit granularity
  Computationally efficient
  Flexible: easily extend to novel architectures
Dynamic Thermal Management
  Use runtime behavior and thermal status to adjust/distribute workload among functional units
Talk Outline

Thermal Modeling
  Model Description
  Validation & Case Studies
Dynamic Thermal Management
  Results
Conclusions
References
  Kevin Skadron et al., “Temperature-Aware Microarchitecture”
  Wei Huang et al., “Compact Thermal Modeling for Temperature-Aware Design”
Thermal Modeling


Thermal model interacts with Power, Performance, Reliability models
Design convergence requires several iterations
Heat Flow vs. Electrical Phenomenon

Both can be described by the same differential equations
  Heat Flow = Electrical Current
  Temperature = Voltage
  Capacitance = Heat Absorption Capacity
Describe design as a Thermal RC circuit
  Node = Functional Block
  Solve RC equations to obtain Node Temperature
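
As a minimal sketch of the analogy, the code below treats a single functional block as one RC node: dissipated power is the current source, temperature is the node voltage, and one resistor leads to ambient. It integrates C * dT/dt = P - (T - T_amb) / R with forward Euler. The function name and all numeric values are assumptions chosen for illustration; a real model has many coupled nodes.

```python
def simulate_node(power_w, r_k_per_w, c_j_per_k, t_amb_c, dt_s, steps):
    """Forward-Euler integration of C * dT/dt = P - (T - T_amb) / R.

    Electrical analogy: P is a current source, T a node voltage, R the
    resistor to ambient, C the node capacitor.
    Returns the list of temperatures, starting at ambient.
    """
    temp = t_amb_c
    history = [temp]
    for _ in range(steps):
        heat_out_w = (temp - t_amb_c) / r_k_per_w  # "current" leaving through R
        temp += dt_s * (power_w - heat_out_w) / c_j_per_k
        history.append(temp)
    return history


# Illustrative (assumed) values: 40 W block, R = 0.8 K/W, C = 50 J/K, 45 C ambient.
trace = simulate_node(power_w=40.0, r_k_per_w=0.8, c_j_per_k=50.0,
                      t_amb_c=45.0, dt_s=0.01, steps=40_000)
steady_state_c = 45.0 + 40.0 * 0.8  # T_amb + P * R
print(f"after {0.01 * 40_000:.0f} s: {trace[-1]:.2f} C "
      f"(steady state {steady_state_c:.2f} C)")
```

The time constant here is R * C = 40 s, so the 400 s run covers about ten time constants and the node settles close to T_amb + P * R.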
HotSpot Package
Equivalent Model
Equivalent Model (Continued)


Die area divided into micro-architectural blocks
Spreader, sink divided into five blocks
  Rsp, Rhs: areas under the die
  Trapezoids: areas not covered by the die
Rconvective represents thermal resistance from package to air
RC Model


Vertical R's: heat flow between layers
Lateral R's: heat diffusion within a layer
  R1 = Block1 to Spreader (vertical), R2 = Block1 to rest of the chip (lateral)
R = t / (k * A)
  t: thickness
  k: thermal conductivity of the material
  A: cross-sectional area
C = c * t * A
  c: thermal capacitance per unit volume
  Requires empirical scaling factor due to lumped model
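
The two formulas can be evaluated directly for one block, reading the resistance as t / (k * A). The sketch below does this for a silicon block; the material constants (k = 100 W/(m·K), c = 1.75e6 J/(m³·K)), the geometry, and the `scale` parameter standing in for the empirical scaling factor are all assumptions for illustration.

```python
import math


def vertical_resistance(thickness_m, conductivity_w_per_mk, area_m2):
    """R = t / (k * A), in K/W."""
    return thickness_m / (conductivity_w_per_mk * area_m2)


def block_capacitance(vol_heat_capacity_j_per_m3k, thickness_m, area_m2, scale=1.0):
    """C = scale * c * t * A, in J/K.

    `scale` is a placeholder for the empirical lumped-model factor;
    its default of 1.0 is NOT a calibrated value.
    """
    return scale * vol_heat_capacity_j_per_m3k * thickness_m * area_m2


# Assumed example: 4 mm x 4 mm block on a 0.5 mm thick silicon die.
K_SILICON = 100.0   # W/(m*K), assumed
C_SILICON = 1.75e6  # J/(m^3*K), assumed
area = 4e-3 * 4e-3
r_block = vertical_resistance(0.5e-3, K_SILICON, area)
c_block = block_capacitance(C_SILICON, 0.5e-3, area)
print(f"R = {r_block:.4f} K/W, C = {c_block:.4f} J/K, RC = {r_block * c_block * 1e3:.3f} ms")
```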
HotSpot Validation
Fallacy of Using a Power Metric
Compact Thermal Model
Equivalent Model
Equivalent Model (Cont.)

Compact Model vs. HotSpot
  Arbitrary granularity grid
  Thermal interface material
  Spreader, interface under the die are divided into chip granularity
Primary Heat Flow Path


Rvertical = t / (k * A)
C = Alpha * cp * ρ * t * A
  Alpha: scaling factor to account for lumped capacitor model
  cp: specific heat
  ρ: material density
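
A rough way to see how the primary path behaves at steady state is to chain the vertical resistances of each layer in series and add the convective resistance to air. The sketch below is a 1-D simplification under stated assumptions: it ignores lateral spreading and the grid structure of the compact model, and every thickness, conductivity, area, and resistance value is illustrative rather than taken from the talk.

```python
def stack_temperatures(power_w, t_amb_c, layers, r_convective_k_per_w):
    """Return the hot-side temperature of each element in a 1-D vertical stack.

    layers: list of (name, thickness_m, conductivity_w_per_mk, area_m2),
    ordered from the die (heat source) down to the heat sink.
    """
    resistances = [(name, t / (k * a)) for name, t, k, a in layers]
    resistances.append(("convection", r_convective_k_per_w))
    temps = {}
    temp = t_amb_c
    # Walk from ambient back toward the die; each element adds a drop of P * R.
    for name, r in reversed(resistances):
        temp += power_w * r
        temps[name] = temp
    return temps


# All numbers below are illustrative assumptions.
LAYERS = [
    ("die",       0.5e-3, 100.0, 1.0e-4),
    ("interface", 50e-6,    4.0, 1.0e-4),
    ("spreader",  1.0e-3, 400.0, 9.0e-4),
    ("sink",      7.0e-3, 400.0, 3.6e-3),
]
temps = stack_temperatures(power_w=30.0, t_amb_c=45.0, layers=LAYERS,
                           r_convective_k_per_w=0.3)
for name, value in temps.items():
    print(f"{name:>10}: {value:.1f} C")
```

With these assumed values the convective resistance dominates the total, which is why the thin interface layer and the package-to-air path get explicit attention in the model.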
Equivalent Model (Secondary Path)

Interconnect Thermal Model

Self-heating power & wire length prediction
  Pself = I² * R
  R = ρm * L / Am
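
The two expressions combine into a short calculation. In the sketch below, ρm is read as the metal resistivity, L as the wire length, and Am as the wire cross-sectional area; that reading, the bulk copper resistivity at room temperature, and the wire dimensions and current are assumptions for illustration.

```python
import math

RHO_COPPER_OHM_M = 1.68e-8  # bulk copper near 20 C; thin on-chip wires are more resistive


def wire_resistance(resistivity_ohm_m, length_m, cross_section_m2):
    """R = rho_m * L / A_m, in ohms."""
    return resistivity_ohm_m * length_m / cross_section_m2


def self_heating_power(current_a, resistance_ohm):
    """P_self = I^2 * R, in watts (use the RMS current for time-varying signals)."""
    return current_a ** 2 * resistance_ohm


# Assumed example: 1 mm long wire, 0.2 um x 0.4 um cross-section, 1 mA RMS current.
r_wire = wire_resistance(RHO_COPPER_OHM_M, 1e-3, 0.2e-6 * 0.4e-6)
p_self = self_heating_power(1e-3, r_wire)
print(f"R = {r_wire:.1f} ohm, P_self = {p_self * 1e6:.1f} uW")
```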
Equivalent Model (Secondary Path, Cont.)

Equivalent Thermal Resistance
Model Validation & Evaluation (Primary)
Transient
Steady State
Model Validation (Secondary)
Case Study
Thermal Management

Dynamic Thermal Management

Emergency Threshold: temperature above which chip is in thermal violation

Trigger Threshold: temperature above which DTM is applied
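
The relationship between the two thresholds can be sketched as a simple classifier over a sensed temperature. The names `ThermalState` and `classify` and the threshold values (80 °C trigger, 85 °C emergency) are hypothetical; the sketch only decides which regime applies, not which DTM technique is engaged.

```python
from enum import Enum


class ThermalState(Enum):
    NORMAL = "normal"          # at or below the trigger threshold
    DTM_ACTIVE = "dtm_active"  # above trigger, not above emergency
    VIOLATION = "violation"    # above the emergency threshold


def classify(temp_c, trigger_c, emergency_c):
    """Map a sensed temperature to a thermal state.

    The trigger threshold must sit below the emergency threshold so that
    DTM has room to act before the chip is in violation.
    """
    if trigger_c >= emergency_c:
        raise ValueError("trigger threshold must be below emergency threshold")
    if temp_c > emergency_c:
        return ThermalState.VIOLATION
    if temp_c > trigger_c:
        return ThermalState.DTM_ACTIVE
    return ThermalState.NORMAL


# Assumed thresholds for illustration only.
TRIGGER_C, EMERGENCY_C = 80.0, 85.0
for sample_c in (72.0, 81.5, 86.0):
    print(sample_c, classify(sample_c, TRIGGER_C, EMERGENCY_C).value)
```

The gap between the two thresholds is the margin in which a DTM technique has to bring the temperature back down.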
DTM Techniques

Temperature-Tracking Frequency Scaling
Feedback-Controlled Fetch Toggling
Migrating Computation
Dynamic Voltage Scaling (DVS)
Global Clock Gating
DTM Results
Conclusions

Accurate thermal models are essential for early design estimation
  Models are similar to electrical RC networks
  Arbitrary granularity for localized temperature information
  Model all parts of the package
Architectural techniques can reduce demands on the IC package by
  Dynamically adjusting workload to avoid emergencies
  Reducing hot spots