Lower Power Synthesis - VADA

L35: Lower Power Voltage Scaling
August 1999
Sungkyunkwan University, Jun Dong Cho
http://vada.skku.ac.kr
SungKyunKwan Univ.
VADA Lab.
Voltage Scaling
• Merely changing a processor's clock frequency
is not an effective technique for reducing
energy consumption. Reducing the clock
frequency reduces the power consumed by
a processor, but it does not reduce the
energy required to perform a given task,
because the task takes proportionally longer.
• Lowering the voltage along with the clock
actually alters the energy-per-operation of the
microprocessor, reducing the energy required
to perform a fixed amount of work.
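A minimal sketch of the two bullets above, assuming the usual first-order CMOS model in which dynamic energy per cycle is C x Vdd^2 and power is that energy times the clock frequency. The function name, the effective capacitance, and the cycle count are illustrative assumptions, not values from the lecture.

```python
def task_energy_power_runtime(c_eff, vdd, f_clk, cycles):
    """Return (energy in J, power in W, runtime in s) for a fixed amount of work.

    Assumes a first-order model: dynamic energy per cycle = c_eff * vdd**2.
    Leakage and short-circuit power are ignored.
    """
    energy_per_cycle = c_eff * vdd ** 2
    power = energy_per_cycle * f_clk
    energy = energy_per_cycle * cycles
    runtime = cycles / f_clk
    return energy, power, runtime


C_EFF = 1.6e-9      # farads, hypothetical effective switched capacitance
CYCLES = 1_000_000  # hypothetical task length in clock cycles

# Same task at full speed, at half the clock only, and at half clock plus half voltage.
full = task_energy_power_runtime(C_EFF, 5.0, 50e6, CYCLES)
half_clock = task_energy_power_runtime(C_EFF, 5.0, 25e6, CYCLES)
scaled = task_energy_power_runtime(C_EFF, 2.5, 25e6, CYCLES)

for name, (energy, power, runtime) in [("full", full),
                                       ("half clock", half_clock),
                                       ("half clock, half Vdd", scaled)]:
    print(f"{name}: {energy:.4f} J, {power:.3f} W, {runtime:.3f} s")
```

Under this model, halving only the clock halves the power but leaves the task energy unchanged, while lowering the voltage as well reduces the energy itself.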
Dynamic Voltage Scaling (DVS)
Processor Usage Model
OS: Voltage Scaling
Scale Supply Voltage with fCLK
Adaptive Power Supply Voltages
Variable Supply Voltage Block Diagram
Typical MPEG IDCT Histogram
Voltage scheduling under timing constraints
– Energy consumption of a processor:
• 10nJ/cycle at 2.5V
• 25nJ/cycle at 4V
• 40nJ/cycle at 5V
– Maximum clock frequencies:
• 50MHz at 5V, 40MHz at 4V, 25MHz at 2.5V
– An application needs 1000M cycles
to finish, and the timing constraint is 25sec.
Energy consumption ( Vdd2)
Different Voltage Schedules
40J
1000Mcycles
50MHz
5.02
0
5.02
5
10
15
32.5J
750Mcycles
50MHz
Timing constraint
(A)
20
25
250Mcycles
25MHz
Time(sec)
(B)
2.52
0
5
5.02
4.02
10
15
20
25
25J
1000Mcycles
40MHz
0
5
10
SungKyunKwan Univ.
15
Time(sec)
(C)
20
25
Time(sec)
VADA Lab.
11
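The energy totals for schedules (A) to (C) can be checked with a short calculation. This is a sketch using only the operating points stated on the previous slide; the names `OPERATING_POINTS`, `SCHEDULES`, and `evaluate` are illustrative assumptions.

```python
# Operating points from the slide: Vdd -> (energy per cycle in nJ, max clock in MHz).
OPERATING_POINTS = {
    5.0: (40.0, 50.0),
    4.0: (25.0, 40.0),
    2.5: (10.0, 25.0),
}

# Each schedule is a list of (Vdd, millions of cycles run at that Vdd).
SCHEDULES = {
    "A": [(5.0, 1000)],
    "B": [(5.0, 750), (2.5, 250)],
    "C": [(4.0, 1000)],
}

DEADLINE_S = 25.0


def evaluate(schedule):
    """Return (total energy in J, total time in s) for one schedule."""
    energy_j = 0.0
    time_s = 0.0
    for vdd, mcycles in schedule:
        nj_per_cycle, mhz = OPERATING_POINTS[vdd]
        energy_j += mcycles * 1e6 * nj_per_cycle * 1e-9
        time_s += mcycles / mhz  # Mcycles divided by MHz gives seconds
    return energy_j, time_s


for name, schedule in SCHEDULES.items():
    energy_j, time_s = evaluate(schedule)
    ok = time_s <= DEADLINE_S
    print(f"({name}) {energy_j:.1f} J in {time_s:.1f} s, meets deadline: {ok}")
```

Schedule (C) uses the least energy because it runs every cycle at the lowest single voltage that still meets the 25 sec constraint, rather than mixing a high and a low voltage as (B) does.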
Example of Variable Supply
DVS Implementation
Variable Supply Voltage Block Diagram
• Computational work varies with time. One approach
to reducing the energy consumption of such systems
beyond shutdown is to adjust the supply voltage
dynamically based on the computational workload.
• The basic idea is to lower the power supply during
low workload periods instead of running from a fixed
supply and idling for some fraction of the time.
• The supply voltage and clock rate are increased
during high workload periods.
Data Driven Signal Processing
The basic idea of averaging:
two samples are buffered
and their workloads are
averaged.
The averaged workload
is then used as the
effective workload to
drive the power supply.
Using a ping-pong
buffering scheme, data
samples I(n+2), I(n+3)
are being buffered while
I(n), I(n+1)
are being processed.
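A rough sketch of why averaging buffered workloads helps. It assumes an idealized model that is not stated on the slide: each sample gets one time slot, the clock rate is proportional to the workload assigned to the slot, the supply voltage scales linearly with the clock rate, and energy is work times voltage squared. All function names are hypothetical.

```python
def slot_energy(work):
    """Normalized energy to finish `work` in one slot.

    Idealized assumption: speed = work per slot, Vdd proportional to speed,
    energy = work * Vdd**2, so energy grows as work cubed.
    """
    return work * work ** 2


def energy_unbuffered(workloads):
    """Each sample drives the supply with its own workload."""
    return sum(slot_energy(w) for w in workloads)


def energy_pair_averaged(workloads):
    """Buffer samples two at a time and drive the supply with their average."""
    total = 0.0
    for i in range(0, len(workloads), 2):
        pair = workloads[i:i + 2]
        average = sum(pair) / len(pair)
        total += len(pair) * slot_energy(average)
    return total


workloads = [0.2, 1.0, 0.4, 0.8]  # hypothetical normalized per-sample workloads
print("unbuffered:   ", round(energy_unbuffered(workloads), 3))
print("pair averaged:", round(energy_pair_averaged(workloads), 3))
```

Because the energy model is convex in the workload, running two slots at the averaged rate never costs more than running one slot fast and one slow. The price is the extra latency of the buffer.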
Example of Buffering
Graphical Interpretation
Buffering Example: MPEG Decoder
DVS
DVS Scheduling Framework
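Energy ~ Work • Speed
[Figure: µProc. speed against time for a task that runs
from Start to Deadline, shown two ways]
– Full speed: the work finishes early, and the idle
time before the deadline represents wasted energy
– Work stretched to the deadline: lower speed,
lower voltage, lower energy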
• Use real-time framework to
constrain task voltage scheduling
DVS Simulation
[Figure: theory versus reality in an implementation]
– Theory: speed against time for tasks with start
times S1 to S3 and deadlines D1 to D3
– Reality: interrupts, task variance, user input,
cache behavior, scheduling overhead, intercom,
weather
Simulate run-time scheduler to
fully understand voltage-scaling behavior
Simulation Infrastructure
[Figure: layers of the simulated software system]
– Applications: GUI, MPEG, Cryptography
– MPEG application code:
{
Frame_Start(deadline);
Decode_MPEG_Frame();
Frame_Finish();
}
– Application support libraries: Windowing, I/O Support
– Voltage Scheduler
– Run-time Scheduler
– Task list (Speed, Priority): MPEG Priority 80,
GUI Priority 23
– lpARM
Develop support environment to
model complete software system
Run-Time Voltage Scaling
[Figure: bar chart of total system energy, normalized
to a 3.3V fixed-voltage processor (100%), for the
Audio, GUI, MPEG, and Audio & MPEG (combination of
independent benchmarks) workloads. Series: DVS
Simulation and Post-Trace Optimal. Bar labels as
extracted: 73%, 65%, 58%, 46%, 16%, 15%, 25%, 20%.]
• Dynamic Voltage Scaling
significantly reduces energy dissipation!
Run-Time Performance Analysis
[Figure: Frame Computation Histogram for Audio, GUI,
and MPEG. Horizontal axis: fixed-V frame execution
time from 0 to 2x the deadline, normalized to the
deadline at max processor speed.]
[Figure: DVS System Energy. Total system energy
(0% to 100%) for Audio, MPEG, and GUI under the Basic
Algorithm, the Adjusted Algorithm, and Post-Trace
Optimal.]
Software can automatically
recognize and adjust for
bi-modal GUI distribution
• Application characteristics strongly affect
voltage scaling performance
Compute ASAP + System Shutdown
Another Approach: Reduce Clock Frequency
Voltage Scheduling II
Evaluation: Algorithms
AVG<weight>
• Computes an exponential moving average of the
previous intervals. At each interval the run-percent
from the previous interval is combined with the
previous running average, forming a long-term
prediction of system behavior. <weight> is the
weighting of past intervals relative to the
current interval (a larger value means a greater
weight on the past), using the equation
(weight × old + new) / (weight + 1).
A weight of 3 can be used.
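The update rule above can be sketched in a few lines. The function name, the `initial` starting average, and the sample run-percent trace are assumptions for illustration; only the update equation and the example weight of 3 come from the slide.

```python
def avg_predictor(run_percents, weight=3, initial=0.0):
    """Return the running prediction after each interval.

    Applies the slide's update: new_avg = (weight * old_avg + new) / (weight + 1).
    `initial` is an assumed starting value for the running average.
    """
    predictions = []
    average = initial
    for run_percent in run_percents:
        average = (weight * average + run_percent) / (weight + 1)
        predictions.append(average)
    return predictions


# Hypothetical trace: fraction of each interval the processor was busy.
trace = [1.0, 1.0, 0.2, 0.2, 0.2]
for interval, prediction in enumerate(avg_predictor(trace, weight=3)):
    print(f"interval {interval}: predicted run-percent {prediction:.3f}")
```

A larger weight makes the prediction smoother but slower to follow a change in load, which is the trade-off the <weight> parameter exposes.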
OS: Voltage Scheduling
Run-Time Scheduling Dynamics
[Figure: µProc. speed against time for one thread,
compared with the optimal schedule. Annotations:]
– Initial speed estimate
– E(work): workload calculated to be
average of previous frames
– Higher-priority task, then run faster
to make up lost time
– Thread accomplishing more than expected:
reduce speed
– Deadline exceeded: increase speed
• Periodically re-evaluate schedule to
adjust for unforeseen events
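One way to picture the periodic re-evaluation is the sketch below. It is a simplified illustration and not the scheduler from the lecture: it assumes work is measured in units where the maximum speed is 1.0 work unit per time unit, and the function names and numbers are hypothetical.

```python
def estimate_work(previous_frame_work):
    """Expected work for the next frame: the average of previous frames."""
    return sum(previous_frame_work) / len(previous_frame_work)


def reevaluate_speed(expected_work, work_done, now, deadline, max_speed=1.0):
    """Pick the lowest speed that still finishes the remaining work on time."""
    remaining = max(expected_work - work_done, 0.0)
    time_left = deadline - now
    if time_left <= 0:
        return max_speed  # deadline exceeded: run as fast as possible
    return min(remaining / time_left, max_speed)


expected = estimate_work([8, 10, 12])  # hypothetical work of earlier frames

# Initial estimate, then three possible situations at a later re-evaluation.
print("initial:        ", reevaluate_speed(expected, 0, now=0, deadline=20))
print("ahead of plan:  ", reevaluate_speed(expected, 6, now=10, deadline=20))
print("lost time:      ", reevaluate_speed(expected, 2, now=10, deadline=20))
print("past deadline:  ", reevaluate_speed(expected, 2, now=21, deadline=20))
```

Calling `reevaluate_speed` at regular intervals lowers the speed when the thread is ahead of its expected progress and raises it after a preemption or an overrun.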
Vertical Layering
Optimal Scheduling
• For a region spanned by a given task
specification, each point in time will either be
scheduled at the minimum speed spanned by
that task or else the task will not be
scheduled to run at that point.
Algorithm
• n tasks to schedule
• O(n) speed settings to consider for each task
• O(n) linked tasks requiring adjustment for
each setting
• Total complexity: O(n^3) time
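The slide does not spell out the algorithm's steps, so the sketch below swaps in a different, well-known formulation of optimal speed scheduling: the critical-interval method usually attributed to Yao, Demers, and Shenker. It assumes preemptive earliest-deadline-first execution and a continuously adjustable speed, and it is written for clarity rather than to match the O(n^3) bound quoted above. All names are hypothetical.

```python
def critical_interval_speeds(jobs):
    """jobs: dict name -> (release, deadline, work). Returns dict name -> speed.

    Repeatedly finds the interval with the highest work density, fixes the
    speed of the jobs that must run entirely inside it, then removes that
    interval from the timeline and continues with the remaining jobs.
    """
    jobs = dict(jobs)
    speeds = {}
    while jobs:
        best = None
        releases = {r for r, _, _ in jobs.values()}
        deadlines = {d for _, d, _ in jobs.values()}
        for t1 in releases:
            for t2 in deadlines:
                if t2 <= t1:
                    continue
                inside = [n for n, (r, d, _) in jobs.items()
                          if r >= t1 and d <= t2]
                if not inside:
                    continue
                intensity = sum(jobs[n][2] for n in inside) / (t2 - t1)
                if best is None or intensity > best[0]:
                    best = (intensity, t1, t2, inside)
        intensity, t1, t2, inside = best
        for name in inside:
            speeds[name] = intensity
            del jobs[name]
        length = t2 - t1

        def squeeze(t):
            # Collapse the critical interval [t1, t2] to a single point.
            if t <= t1:
                return t
            if t >= t2:
                return t - length
            return t1

        jobs = {n: (squeeze(r), squeeze(d), w) for n, (r, d, w) in jobs.items()}
    return speeds


# Hypothetical task set: B must fit in a short window inside A's long window.
print(critical_interval_speeds({"A": (0, 10, 2), "B": (2, 4, 2)}))
```

In the example, job B fills its two-unit window at full density, and job A is spread at a lower speed over the eight time units that remain in its window.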
Scheduling step0
Scheduling step1
Scheduling step2
Scheduling step3
Scheduling step4
Scheduling step5