Thermal modeling


© Mircea Stan, Kevin Skadron, David Brooks, 2002
Overview
1. Motivation (Kevin)
2. Thermal issues (Kevin)
3. Power modeling (David)
4. Thermal management (David)
5. Optimal DTM (Lev)
6. Clustering (Antonio)
7. Power distribution (David)
8. What current chips do (Lev)
9. HotSpot (Kevin)
PowerPC G3 Microprocessor
• On-chip temperature sensor (junction
temperature)
– Based on differential voltage change
across 2 diodes of different sizes
– Implemented in PowerPC G3/G4
processors
• OS required for control
• Instruction Cache Throttling used to
dynamically lower junction
temperature
Pentium III
• On-die thermal diode
– Coupled with board-level thermal diode
sensor
• Uses
– Monitor long-term temperature and
environmental trends
– Provide indication of catastrophic failure
Pentium 4
• Thermal ramp rates ~50ºC/second
(over whole package)
• Much too high for coarse-grained
solutions
• Thermal Monitor
– Highly-accurate on-die temperature
sensing circuit
– Fast acting temperature control circuit
(~50ns)
[Figure: Thermal Monitor block diagram with temperature sensing diode, reference current source, current comparator, and PROCHOT# output]
Pentium 4 -- Thermal Monitor
• Trip Point is calibrated at
manufacturing time
• Simple response
– Turn processor clocks on/off at 50% duty
cycle
– For 1.5GHz processor, ~2s on + ~2s off?
Pentium 4 -- Results
• For 200 traces (TPC-C, SPEC,
Microsoft)
– Thermal design point can be reduced to
75% of true “max power” with minimal
performance loss
Pentium 4
• Thermal monitors allow
– Tradeoff between cost and performance
– Cheaper package
• More triggers, Less Performance
– Expensive package
• No triggers, no performance loss
Architecture-level Thermal Management
• Dynamically adjust execution to control
temperature
• Avoid catastrophic failure (heat sink, fan)
• Permit use of less expensive package
– Design for less than the worst case
– Package costs ~$1/W above ~40 W
– Heat sinks, heat pipes, thinned wafers, fans
• Fans reduce battery life
– Peak power as high as 150 W now and > 200W in
1-2 generations
– Temperatures over 100°C
• More fundamentally -- there is a need for
architecture-level thermal modeling
– What’s actually going on in there?
HotSpot project
• Collaboration between HPLP and
LAVA Labs (ECE and CS depts. UVa)
• Deal with “hot spots”
– Localized heating occurs much
faster than chip-wide
• microsec. to millisec.
– Chip-wide treatment is too conservative
• seconds to minutes
• but there is significant lateral
thermal coupling through the package
• How do we model this?
• Prove temperature will be
safely bounded
Hot spots in Power4
Temperature “landscape”: space and time
How to estimate early in the design cycle?
Thermal modeling
• Want a fine-grained, dynamic model of
temperature
– At a granularity architects can reason
about
– That accounts for adjacency and package
– That does not require detailed designs
– That is fast enough for practical use
• HotSpot - a compact model based on
thermal R, C
– Parameterized to automatically derive a
model based on various
• Architectures
• Power models
• Floorplans
• Thermal packages
Dynamic compact thermal model
Electrical-thermal duality
V ↔ temperature (T)
I ↔ power (P)
R ↔ thermal resistance (Rth)
C ↔ thermal capacitance (Cth)
RC ↔ time constant (Rth Cth)
[Figure: RC circuit between nodes T_hot and T_amb]
Kirchhoff's Current Law gives the differential equation
electrical domain: I = C · dV/dt + V/R
thermal domain: P = Cth · dT/dt + T/Rth
where T = T_hot – T_amb
At higher granularities of P, Rth, Cth:
P, T are vectors and Rth, Cth are circuit matrices
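As a worked instance of the duality, a single thermal node driven by a constant power step behaves like an RC circuit charging toward P · Rth with time constant Rth · Cth. The sketch below evaluates that closed-form response. The function name and all numeric values are illustrative assumptions, not figures from the slides.

```python
import math

def step_response(p_watts, r_th, c_th, t):
    """Temperature rise above ambient of one RC node at time t (seconds)
    after a constant power step, from P = Cth*dT/dt + T/Rth with T(0) = 0."""
    tau = r_th * c_th                      # thermal time constant (s)
    return p_watts * r_th * (1.0 - math.exp(-t / tau))

# Illustrative (assumed) values: 10 W into Rth = 2 K/W, Cth = 0.5 J/K.
R_TH, C_TH, POWER = 2.0, 0.5, 10.0
rise_at_tau = step_response(POWER, R_TH, C_TH, R_TH * C_TH)
steady_rise = POWER * R_TH                 # limit as t -> infinity
```

After one time constant the node has covered about 63% of the way to its steady-state rise, which is why localized blocks with small Cth heat up so much faster than the whole package.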
Package we model
[Figure: package cross-section showing heat sink, heat spreader, interface material, die, IC package, pins, and PCB]
Modeling the package
• Thermal management allows for packaging
alternatives/shortcuts/interactions
• HotSpot needs a model of packaging
• Basic thermal model:
– Heat spreader
– Heatsink
– Interface materials (e.g. phase-change films)
– Fan/Active cooler (TEC)
• Thermal resistance due to convection
• Constriction and bulk resistance for fins
• Spreading constriction and bulk resistance
for heatsink base and heat spreader
• Thermal resistance for bonding material
• Thermal capacitance of heat spreader and
heatsink
“Optimal” package
• Default package is found using:
– Power dissipation
– Target temperature on chip
– Chip area
– Clock speed – high or low performance
• Power dissipation and target
temperature used to determine
resistance value needed
• Needs more work: modern packages
are incredibly complex, yet there is
still a need to model at higher levels
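The step from power dissipation and target temperature to a required resistance can be sketched with the steady-state relation T = P x R. This is a minimal illustration, assuming a single lumped junction-to-ambient resistance; the function name and the numbers are hypothetical and the slides do not state how HotSpot performs this calculation.

```python
def required_package_resistance(power_w, t_target_c, t_ambient_c):
    """Total junction-to-ambient thermal resistance (K/W) that keeps the
    chip at t_target_c while dissipating power_w, from steady state T = P*R."""
    if power_w <= 0:
        raise ValueError("power must be positive")
    return (t_target_c - t_ambient_c) / power_w

# Assumed example: 50 W chip, 85 C target, 45 C ambient inside the case.
r_needed = required_package_resistance(50.0, 85.0, 45.0)
```

With these assumed numbers the package must provide at most 0.8 K/W in total; a cheaper package with higher resistance would need thermal management to stay under the target.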
Now: what can we do with HotSpot?
Equivalent vertical network
• Diagram is simplified – peripheral
nodes
[Figure: equivalent vertical RC network with chip, interface, spreader and peripheral spreader nodes, interface + sink, and convection]
Vertical network parameters
• Resistances
– Determined by the corresponding areas
and their cross sectional thickness
– R = resistivity x thickness / Area
• Capacitances
– C = specific heat x thickness x Area
• Peripheral node areas
[Figure: spreader divided into North, West, East, and South peripheral regions around the chip]
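The two vertical-network formulas can be applied directly to one block. The sketch below assumes that "specific heat" means volumetric specific heat, in J/(m^3 K), since that is what makes the units come out in J/K. The material values are rough silicon-like numbers chosen for illustration, not parameters taken from HotSpot.

```python
def vertical_resistance(resistivity, thickness, area):
    """R = resistivity x thickness / area, in K/W
    (resistivity in m*K/W, thickness in m, area in m^2)."""
    return resistivity * thickness / area

def vertical_capacitance(vol_specific_heat, thickness, area):
    """C = specific heat x thickness x area, in J/K
    (volumetric specific heat in J/(m^3*K))."""
    return vol_specific_heat * thickness * area

# Assumed silicon-like values, for illustration only.
RESISTIVITY = 0.01        # m*K/W  (i.e. conductivity of 100 W/(m*K))
SPECIFIC_HEAT = 1.75e6    # J/(m^3*K)
THICKNESS = 0.5e-3        # m
AREA = 4e-6               # m^2 (a 2 mm x 2 mm block)

r_block = vertical_resistance(RESISTIVITY, THICKNESS, AREA)
c_block = vertical_capacitance(SPECIFIC_HEAT, THICKNESS, AREA)
tau_block = r_block * c_block             # vertical time constant (s)
```

Note that the area cancels in the product R x C, so the vertical time constant of a layer depends only on its material and thickness.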
Lateral resistances
• Determined by the floorplan and the
length of shared edges between
adjacent blocks
– "Heat Spreading and Conduction in Compressed
Heatsinks", Jaana Behm and Jari Huttunen, in
proceedings of the 10th International Flotherm
User Conference, May 2001.
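Since lateral resistances depend on the length of the edge two adjacent blocks share, a floorplan tool needs a way to compute that length. The helper below is a hypothetical sketch for axis-aligned rectangular blocks given as (left_x, bottom_y, width, height); it is not taken from the HotSpot source.

```python
def shared_edge_length(a, b, eps=1e-12):
    """Length of the edge shared by rectangles a and b, each given as
    (left_x, bottom_y, width, height). Returns 0.0 if they do not abut."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Overlap of the projections on each axis.
    x_overlap = min(ax + aw, bx + bw) - max(ax, bx)
    y_overlap = min(ay + ah, by + bh) - max(ay, by)
    # Side by side: a vertical edge is shared.
    if abs(ax + aw - bx) < eps or abs(bx + bw - ax) < eps:
        return max(y_overlap, 0.0)
    # Stacked: a horizontal edge is shared.
    if abs(ay + ah - by) < eps or abs(by + bh - ay) < eps:
        return max(x_overlap, 0.0)
    return 0.0

# Made-up blocks: B sits to the right of A, C sits on top of A.
BLOCK_A = (0.0, 0.0, 2.0, 1.0)
BLOCK_B = (2.0, 0.5, 1.0, 2.0)
BLOCK_C = (0.0, 1.0, 1.0, 1.0)
```

Blocks that touch only at a corner, or not at all, get a shared length of zero and therefore no direct lateral resistance between them.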
Lateral resistances – contd...
• Lengths used for silicon
• Lengths used in the spreader
Our model (lateral and vertical)
Interface material
(not shown)
Temperature equations
• Fundamental RC differential equation
– P = C dT/dt + T / R
• Steady state
– dT/dt = 0
– P=T/R
• When R and C are network matrices
– Steady state – T = R x P
– Modified transient equation
• dT/dt + (RC)^-1 x T = C^-1 x P
– HotSpot software mainly solves these two
equations
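The modified transient equation follows from the fundamental one in a single matrix step, spelled out here for the network case. The derivation assumes, as the steady-state form T = R x P implies, that R is the matrix mapping power to temperature, so that T / R generalizes to R^-1 T.

```latex
\begin{aligned}
P &= C\,\frac{dT}{dt} + R^{-1}T \\
C^{-1}P &= \frac{dT}{dt} + C^{-1}R^{-1}T \\
\frac{dT}{dt} + (RC)^{-1}T &= C^{-1}P,
\qquad\text{since } C^{-1}R^{-1} = (RC)^{-1}. \\
\text{Steady state } \left(\tfrac{dT}{dt}=0\right):\quad T &= RC\,C^{-1}P = R\,P .
\end{aligned}
```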
HotSpot
• Time evolution of temperature is driven
by unit activities and power dissipations
averaged over 10K cycles
– Power dissipations can come from any power
simulator, act as “current sources” in RC
circuit ('P' vector in the equations)
– Simulation overhead in Wattch/SimpleScalar:
< 1%
• Requires models of
– Floorplan: important for adjacency
– Package: important for spreading and time
constants
– R and C matrices are derived from the above
Implementation
• Primarily a circuit solver
• Steady state solution
– Mainly matrix inversion – done in two
steps
• Decomposition of the matrix into lower and
upper triangular matrices
• Successive backward substitution of solved
variables
– Implements the pseudocode from CLR
• Transient solution
– Inputs – current temperature and power
– Output – temperature for the next interval
– Computed using a fourth order Runge-Kutta (RK4) method
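The two-step steady-state solve (triangular decomposition, then substitution) can be sketched as follows. This is a minimal LUP routine with partial pivoting written in the spirit of the textbook pseudocode the slide cites; it is not HotSpot's source, and the 2x2 conductance matrix and power vector are assumed values for illustration.

```python
def lup_decompose(a):
    """LUP decomposition (partial pivoting) of a square matrix given as a
    list of rows; works on a copy. Returns (lu, perm) where lu holds L
    below the diagonal (unit diagonal implied) and U on and above it."""
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Pick the row with the largest pivot magnitude in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm

def lup_solve(lu, perm, b):
    """Solve A x = b using forward then backward substitution."""
    n = len(lu)
    y = [0.0] * n
    for i in range(n):                       # forward: L y = P b
        y[i] = b[perm[i]] - sum(lu[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):             # backward: U x = y
        s = sum(lu[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / lu[i][i]
    return x

# Assumed two-block example: conductance matrix G = R^-1 in W/K.
G = [[3.0, -1.0],
     [-1.0, 2.0]]
P = [10.0, 5.0]                              # block powers in W
T = lup_solve(*lup_decompose(G), P)          # temperature rise in K
```

Solving G T = P this way avoids forming the inverse explicitly, and the decomposition can be reused for every new power vector because the network itself does not change.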
Transient solution
• Solves differential equations of the form
dT/dt + AT = B where A and B are constants
– In HotSpot, A is constant but B depends on
the power dissipation
– Solution – assume constant average power
dissipation within an interval (10 K cycles)
and call RK4 at the end of each interval
• In RK4, current temperature (at t) is
advanced in very small steps (t+h, t+2h
...) till the next interval (10K cycles)
• RK – '4' because error term is 4th order
i.e., O(h^4)
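The interval-by-interval scheme described above can be sketched for a single node. Power is held constant over one interval, so B is fixed, and the temperature is advanced in small RK4 sub-steps. Function names, the number of sub-steps, and the numeric values in the usage line are assumptions for illustration; this is not the HotSpot implementation.

```python
import math

def rk4_step(f, temp, h):
    """One classical RK4 step of size h for dT/dt = f(T)."""
    k1 = f(temp)
    k2 = f(temp + 0.5 * h * k1)
    k3 = f(temp + 0.5 * h * k2)
    k4 = f(temp + h * k3)
    return temp + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def advance_interval(temp, power, r_th, c_th, interval, steps):
    """Advance one node's temperature rise over one sampling interval,
    holding power constant, using `steps` RK4 sub-steps."""
    a = 1.0 / (r_th * c_th)          # A in dT/dt + A*T = B
    b = power / c_th                 # B, fixed within the interval
    h = interval / steps
    for _ in range(steps):
        temp = rk4_step(lambda x: b - a * x, temp, h)
    return temp

# Assumed values: 10 W, Rth = 2 K/W, Cth = 0.5 J/K, one time constant long.
rise = advance_interval(0.0, 10.0, 2.0, 0.5, interval=1.0, steps=100)
exact = 10.0 * 2.0 * (1.0 - math.exp(-1.0))
```

For this linear equation the closed-form answer is known, so `exact` gives a convenient check on the step size before moving to a full network where no closed form is practical.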
Transient solution contd...
• 4th order error has to be within the
required precision
• The step size (h) has to be small
enough even for the maximum slope of
the temperature evolution curve
• Transient solution for the differential
equation is of the form A·e^(-Bt), where A and
B depend on the RC network
• Thus, the maximum value of the slope
(AxB) and the step size are computed
accordingly
Validation
• Validated and calibrated using MICRED
test chips
– 9x9 array of power dissipators and sensors
– Compared to HotSpot configured with
same grid, package
• Within 7% for both steady-state and
transient step-response
– Interface material (chip/spreader) matters
Current features
• Specification of arbitrary floorplans
• Format of floorplan file:
– One line per unit
– Line format – <unit-name> \t <width> \t
<height> \t <left-x> \t <bottom-y> \n
• Takes a power trace file as an input
and outputs corresponding
temperature trace
• Ability to modify package
specifications (type of interface
material, size and type of heat
spreader and heat sink, etc.)
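The floorplan line format above is simple enough to parse in a few lines. The sketch below is a hypothetical reader, not HotSpot's own parser: the `Unit` record, the blank-line handling, and the sample unit names and dimensions are all assumptions, and the slides do not state the units of the numeric fields.

```python
from typing import NamedTuple

class Unit(NamedTuple):
    name: str
    width: float
    height: float
    left_x: float
    bottom_y: float

def parse_floorplan(text):
    """Parse tab-separated floorplan lines into Unit records.
    Blank lines are skipped (an assumption; the slide does not say)."""
    units = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) != 5:
            raise ValueError(f"line {lineno}: expected 5 fields, got {len(fields)}")
        name, *numbers = fields
        units.append(Unit(name.strip(), *(float(v) for v in numbers)))
    return units

# Hypothetical two-unit floorplan; names and sizes are made up.
SAMPLE = "icache\t0.004\t0.002\t0.0\t0.0\ndcache\t0.004\t0.002\t0.004\t0.0\n"
units = parse_floorplan(SAMPLE)
total_area = sum(u.width * u.height for u in units)
```

Keeping each unit as a plain record makes it straightforward to derive the areas and adjacencies that the R and C matrices are built from.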
Current floorplan
• Modeled after an Alpha 21364
Current floorplan – CPU core
Soon to be features
• Grid model – RC network per grid cell
instead of a block
• Temperature models for wires, pads
and interface material between heat
sink and spreader
• Better (more user friendly) floorplan
specification
• Automatic floorplan generation using
classical floorplanning algorithms
Better floorplan specification
• Floorplans of current microprocessors
are structurally similar
• Floorplans similar to MIPS R10K,
Pentium and the Alpha 21264
• Pipeline order corresponds to floorplan
adjacency
Better floorplan specification
• Sample specification (with % areas)
that takes advantage of pipeline order
Automatic floorplan for architects
• Why develop an architectural
floorplanning tool?
– Thermal modeling requires adjacency
information.
– Wire delays make performance depend
on the floorplan.
• Goal
– Derive a realistic floorplan using only
microarchitectural information
– Trade off thermal efficiency against
latency
– Simulated annealing based floorplan
optimization for thermal, delay and
combined metrics
• Current work. Results will be
available soon
Sensors
Caveat emptor:
We are not well-versed in sensor
design; the following is a digest of
information we have been able to
collect from industry sources and the
research literature.
Desirable Sensor Characteristics
• Small area
• Low Power
• High Accuracy + Linearity
• Easy access and low access time
• Fast response time (slew rate)
• Easy calibration
• Low sensitivity to process and supply
noise
PowerPC G3
• (Sanchez et al, Symp. on VLSI
Circuits ‘97, COMPCON ‘97)
• 0.35 μ, 2.5V
• Area 0.2 mm^2
• Power: 10 mW
• Precision: ±4.5°
• Offset: 12° at process corners
• Linearity: < ±4°
• Based on thermal diodes and current
mirrors
Types of Sensors
(In approx. order of increasing ease to build)
• Thermocouples – voltage output
– Junction between wires of different materials; voltage
at terminals is proportional to (Tref – Tjunction)
– Often used for external measurements
• Thermal diodes – voltage output
– Biased p-n junction; voltage drop for a known current
is temperature-dependent
• Biased resistors (thermistors) – voltage output
– Voltage drop for a known current is temperature
dependent
• You can also think of this as varying R
– Example: 1 KΩ metal “snake”
• BiCMOS, CMOS – voltage or current output
– Rely on reference voltage or current generated from a
reference band-gap circuit; current-based designs
often depend on temp-dependence of threshold
Thermal Sensors in PowerPC
• On-chip temperature sensor (junction
temperature)
– Based on differential voltage change
across 2 diodes of different sizes
– Implemented in PowerPC G3/G4
processors
• Instruction Cache Throttling used to
dynamically lower junction
temperature
Typical Sensor Configuration
PTAT – Proportional to Absolute Temperature
Absolute Sensor 1
Syal, Lee, Ivanov, Altet, Online Testing Workshop, 2001
Schematics of Delta Vgs Current Reference Generator (left)
and Delay Cell (right)
Sensors: Problem Issues
• Poor control of CMOS transistor
parameters
• Noisy environment
– Cross talk
– Ground noise
– Power supply noise
• These can be reduced by making the
sensor larger
– This increases power dissipation
– But we may want many sensors
“Reasonable” Values
• Based on conversations with
engineers at Sun, Intel, and
HP (Alpha)
• Linearity: not a problem for range of
temperatures of interest
• Slew rate: < 1 μs
– This is the time it takes for the physical
sensing process (e.g., current) to reach
equilibrium
• Sensor bandwidth: << 1 MHz, probably
100-200 kHz
– This is the sampling rate; 100 kHz = 10 μs
– Limited by slew rate but also A/D
• Consider digitization using a counter
“Reasonable” Values: Precision
• Mid 1980s: < 0.1° was possible
• Precision
– ± 3° is very reasonable
• P: 10s of mW
– ± 2° is reasonable
– ± 1° is feasible but expensive
– < ± 1° is really hard
• The limited precision of the G3
sensor seems to have been a design
choice involving the digitization
Calibration
• Accuracy vs. Precision
– Analogous to mean vs. stdev
• Calibration deals with accuracy
– The main issue is to reduce inter-die
variations in offset
• Typically requires per-part testing
and configuration
• Basic idea: measure offset, store it,
then subtract this from dynamic
measurements
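The measure, store, subtract idea can be sketched in a few lines. This is a hypothetical illustration, assuming the offset is estimated by averaging several raw readings taken while the part is held at a known reference temperature; the function names and all readings are made up.

```python
def measure_offset(raw_readings, reference_temp):
    """Per-part offset: mean raw reading minus the known reference
    temperature applied during testing."""
    return sum(raw_readings) / len(raw_readings) - reference_temp

def calibrated(raw_reading, stored_offset):
    """Subtract the stored offset from a run-time measurement."""
    return raw_reading - stored_offset

# Assumed test-time data: part held at 85.0 C, sensor reads about 3 C high.
offset = measure_offset([88.1, 87.9, 88.0], 85.0)
reading = calibrated(93.0, offset)
```

Averaging addresses accuracy (the mean error) only; the spread of the individual readings around that mean is the precision, which calibration does not improve.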
Dynamic Offset Cancelation
• Rich area of research
• Build circuit to continuously,
dynamically detect offset and
cancel it
• Typically uses an op-amp
• Has the advantage that it adapts to
changing offsets
• Has the disadvantage of more
complex circuitry
Role of Precision
• Suppose:
– Junction temperature is J
– Max variation in sensor is S
– Thermal emergency is T
• T=J–S
• Spatial gradients
– If sensors cannot be located exactly at
hotspots, measured temperature may be
G° lower than true hotspot
• T=J–S–G
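The effect of sensor variation and spatial gradients on the trigger point is simple arithmetic, sketched below. The function name and the example numbers are assumptions chosen for illustration.

```python
def trigger_threshold(j_max, sensor_variation, spatial_gradient=0.0):
    """Sensor reading at which to declare a thermal emergency:
    T = J - S - G, all in degrees C."""
    return j_max - sensor_variation - spatial_gradient

# Assumed numbers: 100 C junction limit, 2 C sensor variation, 3 C gradient.
t_no_gradient = trigger_threshold(100.0, 2.0)
t_with_gradient = trigger_threshold(100.0, 2.0, 3.0)
```

Every degree of S or G lowers the temperature at which management must engage, so imprecise or poorly placed sensors cost performance even when the chip is not actually at its limit.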
Rate of change of temperature
• Our FEM simulations suggest
maximum 0.1° in about 25-100 μs
• This is for power density < 1 W/mm^2,
die thickness between 0.2 and 0.7 mm,
and contemporary packaging
• This means slew rate is not an issue
• But sampling rate is!
Sensors Summary
• Sensor precision cannot be ignored
– Reducing operating threshold by 1-2
degrees will affect performance
• Precision of 1° is conceivable but
expensive
– Maybe reasonable for a single sensor or
a few
• Precision of 2-3° is reasonable even
for a moderate number of sensors
• Power and area are probably
negligible from the architecture
standpoint
• Sampling period <= 10-20 μs
HotSpot Summary
• HotSpot is a simple, accurate and
fast architecture level thermal
model for microprocessors
• Over 90 downloads till now
• Ongoing active development –
architecture level floorplanning will
be available soon
• Download site
– http://lava.cs.virginia.edu/HotSpot
• Mailing list
– www.cs.virginia.edu/mailman/listinfo/hotspot
Temperature-aware computing:
Optimize performance subject to a
thermal constraint