slides - Computer Science
Zehan Cui, Yan Zhu, Yungang Bao, Mingyu Chen
Institute of Computing Technology, Chinese Academy of Sciences
July 28, 2011
Motivation
Design & Implementation
Experiments
Conclusion & Work in Progress
Motivation
[Figure: Watts/Server. Source: The Problem of Power Consumption in Servers, Intel, 2009]
The CPU no longer dominates system power.
[Source: Barroso et al., The Datacenter as a Computer, 2009]
Measurement is the basis.
[Diagram labels: Hardware, Model, Software, Low power, Measurement]
Component-Level: ATX-based method
accuracy
Directly powered through ATX wires.
Modern motherboards mostly have dedicated ATX wires for the processor.
VRM (Voltage Regulation Module) loss
Usually deduced from multiple ATX wires.
Platform dependent.
Design & Implementation
Disk & CPU
◦ Similar to other ATX-based methods
Memory & Add-in Card Devices
◦ Wrapper-based methods
Advantages
◦ Accurate: direct measurement
◦ Easy-to-use: no deduction needed
◦ Portable: multi-platform
[Diagram labels: Power Supply, Current Sensor]
Prototype
◦ Disk power
◦ CPU power
◦ Memory power
Component | Count | Description
Wrapper Card | 1 | Memory power measurement. Supports DDR2-400 DIMM.
Intermediate Card | 1 | 8 channels. A channel converts one current into a voltage.
DMM | 2 | Agilent 34411A. One channel each. Max speed: 50K samples per second. LAN interface.
Collector | 1 | PC. Collects data from the DMMs.
Experiments
Component | Detail
CPU | Intel Core2 Duo E4500. # of Cores: 2; Clock Speed: 2.2GHz; L2 Cache: 2MB; FSB Speed: 800MHz
Memory | DDR2-400 2GB UDIMM. Frequency: 200MHz; Max Bandwidth: 3.2GB/s
Disk | 640GB SATA
[Figure: Power of Components (unit: Watt) for CPU, Memory, and Disk vs. Time from Beginning (unit: Second), running 401.bzip2 from SPEC CPU2006]
The more frequently we measure power, the more detail we get.
Observation:
5,000 samples/s is an appropriate sampling frequency at the component level.
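To illustrate why the sampling rate matters, here is a minimal Python sketch. It is not the authors' tooling. It block-averages a power trace as a crude stand-in for sampling more slowly; the trace values and the `block_average` helper are hypothetical.

```python
def block_average(samples, factor):
    """Average consecutive groups of `factor` samples.

    This is only a rough stand-in for measuring at a lower rate;
    any trailing samples that do not fill a group are dropped.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    usable = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) / factor for i in range(0, usable, factor)]


# Hypothetical trace in watts: a flat 10 W baseline with a short 30 W burst.
trace = [10.0] * 20
trace[8:10] = [30.0, 30.0]

fine = block_average(trace, 1)     # every sample kept
coarse = block_average(trace, 10)  # ten times fewer points

print("peak at full rate:", max(fine))
print("peak after averaging:", max(coarse))
```

In this made-up trace the short burst is clearly visible at the full rate but is largely smoothed away once ten samples are merged into one.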
Higher BW, but lower power
Lower BW, but higher power
Malloc 512MB, access in different strides
Two causes
◦ Row conflict
◦ Lots of TLB misses
Time: 6.5 times longer
Power: slightly lower
Energy: 5.9 times higher
Increase the row buffer hit rate
Large pages may be more efficient
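Because energy is power multiplied by time, the time and energy ratios quoted above imply an average power ratio. The short Python sketch below works that out; the function names are illustrative, and the result is only approximate because the quoted ratios are rounded.

```python
def energy_ratio(time_ratio, power_ratio):
    """E = P * t, so ratios multiply: E2/E1 = (P2/P1) * (t2/t1)."""
    return time_ratio * power_ratio


def implied_power_ratio(time_ratio, energy_ratio_value):
    """Solve E2/E1 = (P2/P1) * (t2/t1) for the average power ratio P2/P1."""
    return energy_ratio_value / time_ratio


# Ratios quoted on the slide: 6.5x the time, 5.9x the energy.
ratio = implied_power_ratio(6.5, 5.9)
print(f"implied average power ratio: {ratio:.3f}")
```

A ratio a little below 1 is consistent with "Power: slightly lower".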
What is the relationship between performance and power?
64MB memory
◦ Random vs. Sequential
◦ Jump at least 64B: eliminate cache hits
◦ Large page (2MB): eliminate TLB misses
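The two access patterns can be sketched as offset sequences over a 64MB buffer in 64B steps. This Python snippet only generates the offsets. It is not the benchmark used in the talk, which the slides do not show, and Python would not be a suitable language for the timed loop itself. The names `sequential_offsets` and `random_offsets` are hypothetical.

```python
import random

BUFFER_BYTES = 64 * 1024 * 1024  # 64MB region, as on the slide
STEP_BYTES = 64                  # jump at least 64B between accesses


def sequential_offsets(buffer_bytes=BUFFER_BYTES, step=STEP_BYTES):
    """Offsets 0, 64, 128, ... covering the buffer once, in order."""
    return range(0, buffer_bytes, step)


def random_offsets(seed, buffer_bytes=BUFFER_BYTES, step=STEP_BYTES):
    """The same offsets visited once each, in an order fixed by `seed`."""
    offsets = list(range(0, buffer_bytes, step))
    random.Random(seed).shuffle(offsets)
    return offsets


print("accesses per pass:", len(sequential_offsets()))
```

Both patterns touch the same set of offsets exactly once, so only the order differs; changing `seed` gives a different random order.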
Load/Store_Unit % = LSU_stall_time / CPU_Cycle
Observation:
It seems that DRAM power is already proportional to bandwidth.
But the fact is that …
Use different seeds to generate different random access patterns;
power varies less than 1.1%.
Observation:
DRAM power is highly correlated with two factors
• Load/Store Unit Utilization
• Sequential / Random
We can build memory power models based on these two factors rather than on bandwidth.
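One possible shape for such a model is sketched below in Python: a separate straight line of power against Load/Store Unit utilization for each access pattern. The slides do not state the model form, so the linear form is an assumption, and the sample values are made-up placeholders rather than measurements.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_power_model(samples):
    """samples: iterable of (pattern, lsu_utilization, power_watts)."""
    grouped = {}
    for pattern, util, power in samples:
        xs, ys = grouped.setdefault(pattern, ([], []))
        xs.append(util)
        ys.append(power)
    return {pattern: fit_line(xs, ys) for pattern, (xs, ys) in grouped.items()}


def predict(model, pattern, lsu_utilization):
    intercept, slope = model[pattern]
    return intercept + slope * lsu_utilization


# Placeholder numbers for illustration only, NOT measured data.
PLACEHOLDER_SAMPLES = [
    ("sequential", 0.0, 2.0), ("sequential", 0.5, 3.0), ("sequential", 1.0, 4.0),
    ("random", 0.0, 2.0), ("random", 0.5, 3.5), ("random", 1.0, 5.0),
]

model = fit_power_model(PLACEHOLDER_SAMPLES)
print(predict(model, "random", 0.2))
```

Keeping one fit per access pattern reflects the second factor (sequential vs. random) without forcing both patterns onto a single line.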
Conclusion & Work in Progress
We use a hybrid approach
◦ ATX-based: CPU/Disk
◦ Wrapper card: DRAM/…
5 kHz is an appropriate sampling frequency to disclose fine-grained power behavior.
DRAM power is highly correlated with Load/Store Unit utilization rather than bandwidth.
Upgrade current system
◦ Support DDR3
◦ Support large memory capacity
◦ Support 40 simultaneous measuring channels
Use FPGA to collect measured data
Correlate the measured power data with high-level semantic information
Thanks! Questions?
Backup
The wrapper card already exists.
We made only a few small modifications.
[Diagram labels: Current Sensor, Power Supply, Signals]
Normal: DIMM (Dual In-line Memory Module), DIMM slot, Motherboard
With our initial wrapper card: Wrapper Card, DIMM, DIMM slot, Motherboard
[Figure: DRAM device organization. Source: H. David et al., Memory Power Management via Dynamic Voltage/Frequency Scaling, ICAC, 2011]
[Figure labels: Row Decoder, Column Decoder, Sense Amps, Bank 0, Drivers, Receivers, Write FIFO, Registers, ODT]
I/O Circuitry
• Runs at bus speed
• Clock sync/distribution
• Bus drivers and receivers
• Buffering/queueing
Banks
• Independent arrays
• Asynchronous: independent of memory bus speed
On-Die Termination
• Required by bus electrical characteristics for reliable operation
• Resistive element that dissipates power when bus is active
DRAM power can be approximately divided into
◦ Background power: considered to be stable
◦ Bank power: active/precharge; related to frequency of row operation
◦ I/O power: burst; proportional to bandwidth
◦ Termination power: termination resistors; proportional to bandwidth
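The four-way division above can be written as a simple additive model. The Python sketch below is illustrative only. The class and function names and every coefficient are hypothetical, and treating bank power as linear in the row-operation rate is an assumption, since the slide says only that it is "related to" that frequency.

```python
from dataclasses import dataclass


@dataclass
class DramPowerCoefficients:
    """Hypothetical coefficients; real values would come from measurement."""
    background_w: float            # treated as constant
    bank_w_per_row_op: float       # watts per (row operation per second)
    io_w_per_gbps: float           # watts per GB/s of bandwidth
    termination_w_per_gbps: float  # watts per GB/s of bandwidth


def dram_power(coeffs, row_ops_per_s, bandwidth_gbps):
    """Return each power component and their total, in watts."""
    parts = {
        "background": coeffs.background_w,
        "bank": coeffs.bank_w_per_row_op * row_ops_per_s,
        "io": coeffs.io_w_per_gbps * bandwidth_gbps,
        "termination": coeffs.termination_w_per_gbps * bandwidth_gbps,
    }
    parts["total"] = sum(parts.values())
    return parts


# Made-up coefficients and workload, for illustration only.
example = dram_power(
    DramPowerCoefficients(1.0, 2e-7, 0.25, 0.5),
    row_ops_per_s=5e6,
    bandwidth_gbps=2.0,
)
print(example)
```

In this form, two workloads with equal bandwidth can still differ in power through the bank term if they trigger different numbers of row operations.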
P = U * I
Doesn't fluctuate too much, less than 2% on our platform.
[Diagram: DC Voltage -> ADC or DMM -> Data Collector (PC); DC Current -> CSA (Current-Sense Amplifier) -> DC Voltage -> ADC or DMM -> Data Collector (PC)]
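A minimal Python sketch of how P = U * I might be applied to readings from such a chain follows. It assumes the current-sense amplifier output obeys V_out = gain * I * R_shunt and that the supply voltage is treated as fixed. Both are assumptions of this sketch, and the shunt resistance, gain, and voltage values are hypothetical, not taken from the slides.

```python
def current_from_csa(v_out, shunt_ohms, gain):
    """Invert the assumed amplifier relation V_out = gain * I * R_shunt."""
    if shunt_ohms <= 0 or gain <= 0:
        raise ValueError("shunt_ohms and gain must be positive")
    return v_out / (gain * shunt_ohms)


def power_trace(csa_volts, supply_volts, shunt_ohms, gain):
    """Apply P = U * I to each amplifier reading; returns watts."""
    return [supply_volts * current_from_csa(v, shunt_ohms, gain) for v in csa_volts]


def energy_joules(power_watts, sample_rate_hz):
    """Approximate energy as the sum of power samples times the sample period."""
    return sum(power_watts) / sample_rate_hz


# Hypothetical readings and component values, for illustration only.
readings = [0.5, 0.5, 0.75, 0.5]
watts = power_trace(readings, supply_volts=1.8, shunt_ohms=0.01, gain=50.0)
print(watts, energy_joules(watts, sample_rate_hz=5000))
```

Treating the supply voltage as a constant keeps the sketch short; measuring it on a second channel would remove that assumption.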
Possible reason for the non-proportional random-access power in slide 17:
◦ When bandwidth is low, auto-precharge (caused by refresh) causes every access to need an ACTIVE; the bank power is proportional to bandwidth.
◦ When bandwidth is high, some accesses may hit in the row buffer, which needs fewer ACTIVEs; the slope of the bank power increase is lower than before.