Performance / Watt: The New Server Focus
Performance / Watt:
The New Server Focus
Improving Performance / Watt
For Modern Processors
Tim Shattuck <[email protected]>
April 19, 2006
From the Paper by James Laudon <[email protected]>
Computer Architecture News, Volume 33, Number 4, September 2005
[Tim Shattuck, 2006]
[1]
At Issue:
Power Hungry Servers
Increasing Costs to Power Hardware
Wastes Limited Resources
[2]
Three Trends
High ratio of power consumption to performance gains
Hardware costs account for a smaller
percentage of Total Cost of Ownership (TCO)
Energy costs are rising
These trends are expected to make power the
dominant factor in calculating TCO within five
years.
[3]
Niagara Optimizations
Simple
Clock gating
Pipelines
More complex
Hardware support for multithreading
[4]
Simple Optimizations
Clock gating
Don't power idle parts of the chip
Shorter, medium-length pipelines
Fewer registers, transistors between stages
Less power wasted on (failed) speculation
Allow for more cores / chip
[5]
More Optimizations
Hardware Multithreading
Keep on-chip resources busy
Deals with high cache miss rates
Boosts performance / Watt
Increases throughput of threads
Increases power consumption only slightly
Increases size of the die 4 - 7% per thread
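A rough way to see why hardware multithreading keeps a core busy despite cache misses is the sketch below. It uses a simple textbook-style model, not anything taken from the paper: each thread alternates between a burst of useful work and a stall on a miss, and the core switches to another ready thread during the stall. The cycle counts are hypothetical, and the die-area function assumes the 4 - 7% figure applies linearly to each thread added beyond the first, which is only one reading of the slide.

```python
def core_utilization(threads: int, compute_cycles: float, stall_cycles: float) -> float:
    """Fraction of cycles spent on useful work in a simplified model.

    Each thread runs for compute_cycles, then stalls for stall_cycles on a
    cache miss. Other threads fill the stall, so utilization grows linearly
    with the thread count until the pipeline is saturated at 1.0.
    """
    if threads < 1 or compute_cycles <= 0 or stall_cycles < 0:
        raise ValueError("need threads >= 1, compute_cycles > 0, stall_cycles >= 0")
    return min(1.0, threads * compute_cycles / (compute_cycles + stall_cycles))


def relative_die_area(threads: int, growth_per_thread: float) -> float:
    """Die area relative to a single-threaded core.

    ASSUMPTION: each thread beyond the first adds growth_per_thread
    (e.g. 0.04 to 0.07) of the single-threaded area, linearly.
    """
    if threads < 1:
        raise ValueError("need threads >= 1")
    return 1.0 + growth_per_thread * (threads - 1)


if __name__ == "__main__":
    # Hypothetical workload: 25 cycles of work between misses, 75-cycle miss penalty.
    for n in (1, 2, 4):
        print(n, core_utilization(n, 25, 75), relative_die_area(n, 0.07))
```

Under these made-up numbers a single thread keeps the pipeline busy a quarter of the time, while four threads are enough to saturate it for a modest area increase. Real workloads have variable miss rates and latencies, so the linear ramp is an upper bound rather than a prediction.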
[6]
Cores / Die
Fewer complex cores: favor individual thread completion
More simple cores: favor aggregate thread throughput
Simpler cores tend to have better performance / Watt ratios
[7]
Sufficient Cache and Memory Bandwidth
Necessary to keep threads busy
Sun's Niagara:
Cores connected to L2 cache by a crossbar switch
Cache bandwidth of 76.8 GB/s
Four memory controllers connected directly to DDR2 SDRAM (200 MHz)
Raw memory bandwidth of 25.6 GB/s
Controllers can reorder accesses to favor reads over writes.
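The bandwidth figures on this slide can be checked with a little arithmetic. The sketch below takes the four controllers and the 200 MHz DDR2 clock from the slide; the two transfers per clock follow from DDR2 being double data rate. The 16-byte (128-bit) data path per controller is an assumption, chosen because it reproduces the quoted 25.6 GB/s. The 1.2 GHz core clock used for the cache figure comes from the case-study slide later in the deck.

```python
CONTROLLERS = 4            # from the slide
MEMORY_CLOCK_HZ = 200e6    # from the slide: 200 MHz DDR2
TRANSFERS_PER_CLOCK = 2    # DDR = two transfers per clock cycle
BYTES_PER_TRANSFER = 16    # ASSUMPTION: 128-bit data path per controller


def raw_memory_bandwidth_gb_s(controllers: int = CONTROLLERS,
                              clock_hz: float = MEMORY_CLOCK_HZ,
                              transfers_per_clock: int = TRANSFERS_PER_CLOCK,
                              bytes_per_transfer: int = BYTES_PER_TRANSFER) -> float:
    """Peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return controllers * clock_hz * transfers_per_clock * bytes_per_transfer / 1e9


def bytes_per_core_cycle(bandwidth_gb_s: float, core_clock_ghz: float) -> float:
    """How many bytes per core clock cycle a given bandwidth corresponds to."""
    return bandwidth_gb_s / core_clock_ghz


if __name__ == "__main__":
    print(raw_memory_bandwidth_gb_s())        # compare with the 25.6 GB/s on the slide
    print(bytes_per_core_cycle(76.8, 1.2))    # L2 bandwidth expressed per 1.2 GHz cycle
```

These are peak numbers. Sustained bandwidth depends on access patterns, which is part of why the controllers reorder accesses.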
[8]
Testing
SPECjbb2000
Java server side business logic
TPC-C, TPC-W
Transactional processing tests
XML Test
Sun's multithreaded processing test.
Result: Scalar processors with moderate pipelines and thread
support outperformed superscalar processors.
[9]
Case Studies
Sun's Niagara
8 cores, 4 threads each
Scalar cores
Tries to maximize performance / Watt
Intel's Pentium Extreme Edition
2 cores, 2 threads each
Superscalar cores
Tries to maximize performance
[10]
Case Studies (II) - Results
Feature             Niagara        Pentium Extreme Edition
Clock Speed         1.2 GHz        3.2 GHz
Pipeline Depth      6 stages       31 stages
Number of Cores     8              2
Number of Threads   32             4
L2 Bandwidth        76.8 GB/s      ~180 GB/s
Memory Bandwidth    25.6 GB/s      6.4 GB/s
Transistor Count    279 Million    230 Million
Power               72 W           130 W
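To put the figures above on a per-Watt footing, the sketch below divides two of the listed resources (hardware threads and memory bandwidth) by each chip's power. All inputs come from this slide. Note that these are resource-per-Watt ratios, not measured performance per Watt, since the slide gives no benchmark scores.

```python
# Values copied from the comparison on this slide.
CHIPS = {
    "Niagara": {"threads": 32, "mem_gb_s": 25.6, "watts": 72.0},
    "Pentium Extreme Edition": {"threads": 4, "mem_gb_s": 6.4, "watts": 130.0},
}


def per_watt(chip: str, metric: str) -> float:
    """Return the chosen metric divided by the chip's power in Watts."""
    spec = CHIPS[chip]
    return spec[metric] / spec["watts"]


def advantage(metric: str, a: str = "Niagara", b: str = "Pentium Extreme Edition") -> float:
    """How many times more of `metric` per Watt chip `a` offers than chip `b`."""
    return per_watt(a, metric) / per_watt(b, metric)


if __name__ == "__main__":
    for metric in ("threads", "mem_gb_s"):
        for chip in CHIPS:
            print(f"{chip:24s} {metric:9s} per Watt: {per_watt(chip, metric):.4f}")
        print(f"Niagara advantage in {metric} per Watt: {advantage(metric):.1f}x")
```

By this crude measure Niagara offers roughly an order of magnitude more hardware threads per Watt, which is consistent with the deck's point that it targets aggregate throughput rather than single-thread speed.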
[11]
Simple Core Limitations
Lower single thread performance
Amplified by lower instruction level parallelism
Keeping a large number of threads busy may
become difficult
Hot locks – threaded applications may not scale
very well
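One standard way to reason about the scaling concern above is Amdahl's law, which is not cited in the slides but fits the "hot lock" scenario: if some fraction of the work must run serially (for example, inside a contended lock), extra hardware threads only speed up the rest. The serial fractions below are hypothetical.

```python
def amdahl_speedup(serial_fraction: float, threads: int) -> float:
    """Ideal speedup over one thread when serial_fraction of the work
    cannot be parallelized (Amdahl's law)."""
    if not 0.0 <= serial_fraction <= 1.0 or threads < 1:
        raise ValueError("need 0 <= serial_fraction <= 1 and threads >= 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads)


if __name__ == "__main__":
    # Hypothetical serial fractions; 32 matches the Niagara thread count.
    for serial in (0.0, 0.01, 0.1):
        print(f"serial={serial:.2f}: speedup on 32 threads = "
              f"{amdahl_speedup(serial, 32):.2f}")
```

Even a 10% serialized portion caps the speedup below 10x no matter how many threads are available, so a design with many simple cores depends on software with very little lock contention.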
[12]
Future Directions
Use multithreading to enhance single threaded
applications
Run-ahead execution – allows out of order
execution with only a modest amount of hardware
Software control of power consumption
Dynamic adjustments to voltage and frequency to
tune power consumption
Control of non-processing devices' (disk, memory
systems) power consumption
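The appeal of dynamic voltage and frequency adjustment follows from the usual first-order model of dynamic CMOS power, P ≈ a · C · V² · f, where a is the activity factor, C the switched capacitance, V the supply voltage, and f the clock frequency. The sketch below applies that model in relative terms. The scaling factors are hypothetical, and the model ignores static (leakage) power.

```python
def relative_dynamic_power(voltage_scale: float, frequency_scale: float) -> float:
    """Dynamic power relative to the baseline operating point.

    Uses P ~ a * C * V^2 * f with the activity factor and capacitance held
    constant, so only the voltage and frequency ratios matter.
    ASSUMPTION: leakage power is ignored.
    """
    if voltage_scale <= 0 or frequency_scale <= 0:
        raise ValueError("scales must be positive")
    return voltage_scale ** 2 * frequency_scale


if __name__ == "__main__":
    # Hypothetical operating points: (voltage scale, frequency scale).
    for v, f in ((1.0, 1.0), (1.0, 0.8), (0.8, 0.8)):
        print(f"V x{v:.1f}, f x{f:.1f} -> power x{relative_dynamic_power(v, f):.3f}")
```

Lowering frequency alone reduces power in proportion, but lowering voltage along with it gives a roughly cubic reduction, which is why software control over both is attractive.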
[13]
Conclusion
Invest in a Niagara today!
[14]