MINIMISING DYNAMIC POWER CONSUMPTION IN ON-CHIP NETWORKS
Robert Mullins
Computer Architecture Group
Computer Laboratory
University of Cambridge, UK
Communication-Centric Architectures
• Future performance gains will primarily come from increasing the number of IP cores in a system, not their complexity or operating frequency
• Many reasons:
– Diminishing returns from simply scaling what we have
– Energy efficiency
– Complexity
– Fault tolerance
– Economics
2/19
On-Chip Networks
• An efficient, general-purpose, chip-wide communication infrastructure is becoming essential
• One flexible networking option is to use packet-switched networks with support for virtual channels
3/19
The Lochside Router
• Router Architecture
– Highly parameterised implementation
– Packet-switched network with virtual-channel flow control
– Best-case latency is one cycle per network hop
• Results presented here are from post-place-and-route (P&R) simulations targeting a 90nm technology
[Figure: Lochside chip (2004/05), 180nm technology. Each tile contains a router (R) and a traffic generator, debug and test block.]
4/19
Exploiting Speculation to Reduce Communication Latency
Peh/Dally (2001)
5/19
Exploiting Speculation to Reduce Communication Latency
6/19
Aims of this work
• Apply existing power-saving techniques to an on-chip network design
– e.g. clock and signal gating, gate-level optimisations, etc.
– It is important to apply such techniques before making comparisons
• Measure power consumption and provide an accurate breakdown of where the remaining power is dissipated
• Where is the best place to look for future power savings?
7/19
Measuring and Optimizing Dynamic Power
• Our Test Case
– 8mm x 8mm die
– 4x4 mesh network
– Low-latency routers: best-case latency is one cycle per hop (incl. interconnect)
– 1.2V, 90nm technology
– 4 input buffers per VC
– 4 VCs per input port
– 48 x 80-bit network links
– 800MHz at the worst-case (WC) PVT corner
• ~32 FO4 clock period
– Results reported at 250MHz
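The test-case parameters can be cross-checked with a little arithmetic. The sketch below is a back-of-envelope aid, not part of the original study: it assumes 5-port routers (N, S, E, W, local) and one flit per 80-bit link word, neither of which is stated on the slide. It confirms that a 4x4 mesh has 48 unidirectional links, and that ~32 FO4 per cycle at 800MHz implies an FO4 delay of roughly 39ps at the worst-case corner.

```python
# Back-of-envelope numbers for the 4x4 mesh test case.
# ASSUMPTIONS (not on the slide): 5 ports per router (N, S, E, W, local),
# and one flit = one 80-bit link word.

def mesh_unidirectional_links(k: int) -> int:
    """Unidirectional router-to-router links in a k x k mesh."""
    bidirectional = 2 * k * (k - 1)  # k rows and k columns, each with k-1 links
    return 2 * bidirectional


def router_buffer_bits(ports: int, vcs_per_port: int,
                       flits_per_vc: int, flit_bits: int) -> int:
    """Total input-buffer storage in one router."""
    return ports * vcs_per_port * flits_per_vc * flit_bits


def fo4_delay_ps(freq_hz: float, fo4_per_cycle: float) -> float:
    """Implied FO4 delay if one clock period spans fo4_per_cycle FO4s."""
    return 1e12 / freq_hz / fo4_per_cycle


links = mesh_unidirectional_links(4)      # 48, matching the slide
bits = router_buffer_bits(5, 4, 4, 80)    # per router, under the 5-port assumption
fo4 = fo4_delay_ps(800e6, 32)             # 1.25 ns period / 32
raw_bw = links * 80 * 250e6               # aggregate raw link bandwidth (bit/s) at 250MHz

print(links, bits, fo4, raw_bw / 1e9)     # 48 6400 39.0625 960.0
```

Edge routers in a real mesh have fewer than 5 ports, so the buffer figure is an upper bound per router.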
8/19
Interconnect Delay/Energy Trade-offs
• Power dissipated in network links depends on how the links are spaced and buffered
• There is at least a factor of 3 difference in energy consumption over the range of potential interconnect options
• Could move to low-swing differential signalling schemes for even greater energy savings
For the results presented, we assume minimum-spaced wires, buffered to optimise the energy x delay product
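Choosing a link implementation by energy x delay product amounts to a simple minimisation over candidate spacing/buffering options. The sketch below shows only that selection step; the option names and all numbers are hypothetical placeholders, not measured results from this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkOption:
    name: str
    energy_fj_per_bit_mm: float  # HYPOTHETICAL placeholder values below
    delay_ps_per_mm: float


def energy_delay_product(opt: LinkOption) -> float:
    return opt.energy_fj_per_bit_mm * opt.delay_ps_per_mm


def pick_min_edp(options: list[LinkOption]) -> LinkOption:
    """Choose the wiring/repeater option with the smallest energy x delay."""
    return min(options, key=energy_delay_product)


# Made-up numbers, for illustration only (not measured results).
options = [
    LinkOption("min-spaced, delay-optimal repeaters", 300.0, 60.0),
    LinkOption("min-spaced, fewer/smaller repeaters", 150.0, 80.0),
    LinkOption("wide-spaced, sparse repeaters", 100.0, 150.0),
]

best = pick_min_edp(options)
print(best.name)  # min-spaced, fewer/smaller repeaters
```

Minimising the product, rather than energy alone, stops the choice from drifting to very slow links that would no longer meet the one-cycle-per-hop target.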
9/19
Clock Gating
• Clock gating optimisations applied at two levels:
– Local Clock Gating
• Automated clock gating within the router
• Some tuning of the RTL was involved to maximise opportunities for the synthesis tool
– Router-Level Clock Gating
• Exploit opportunities to gate the clock as it enters the router
• Completely isolates the router’s clock, so only static power consumption remains
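Clock gating of this kind is usually built from a latch-based integrated clock-gating (ICG) cell. The slides do not say which cell the synthesis tool inserted, so the latch-based cell is an assumption here. The sketch models it with one sample per clock phase: the enable is captured only while the clock is low, so an enable that changes during a high phase cannot produce an extra (in silicon, truncated) pulse, unlike a plain AND gate.

```python
def latch_based_icg(clk, enable):
    """Model of a latch-based integrated clock-gating (ICG) cell, one sample per
    clock phase. The enable latch is transparent only while clk is low, so the
    value used during a high phase is frozen before the rising edge."""
    latched = 0
    gclk = []
    for c, en in zip(clk, enable):
        if c == 0:
            latched = en          # latch transparent during the low phase
        gclk.append(c & latched)  # AND gate: clock passes only when latched enable is 1
    return gclk


def rising_edges(wave):
    """Count 0 -> 1 transitions, i.e. clock edges seen by the gated flip-flops."""
    return sum(1 for a, b in zip(wave, wave[1:]) if a == 0 and b == 1)


clk    = [0, 1, 0, 1, 0, 1, 0, 1]
enable = [1, 1, 0, 0, 0, 1, 1, 1]   # enable rises during the 3rd high phase

gated = latch_based_icg(clk, enable)
naive = [c & e for c, e in zip(clk, enable)]   # plain AND, no latch

print(gated, rising_edges(gated))   # [0, 1, 0, 0, 0, 0, 0, 1] 2
print(naive, rising_edges(naive))   # [0, 1, 0, 0, 0, 1, 0, 1] 3
```

Every suppressed edge is a cycle in which the gated registers, and the clock buffers behind the gate, do not switch.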
10/19
Router-Level Clock Gating
• Clock gating exposes the clock tree insertion delay
• Need to know early whether the router will be required
• Generate ‘early valid’ signals in neighbouring routers
– Early-valid signals are slightly pessimistic
– Based on what is requested, not what is granted
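The pessimism of early-valid signals can be illustrated at cycle granularity. This sketch ignores the actual timing (in hardware the signal must arrive ahead of the clock-tree insertion delay) and uses hypothetical port names. It only counts cycles: because early valid is the OR of neighbours' requests, a request that is later refused, e.g. because no downstream buffer is free, still wakes the router.

```python
def early_valid_stats(requests, grants):
    """requests[t]: set of neighbour ports requesting to send a flit into this
    router in cycle t; grants[t]: the subset actually granted.
    Early valid is derived from requests, so the router clock is enabled
    whenever anything is requested, even if nothing ends up being granted."""
    clocked = needed = 0
    for req, gnt in zip(requests, grants):
        assert gnt <= req, "a grant implies a request"
        early_valid = bool(req)   # OR of neighbours' requests
        clocked += early_valid
        needed += bool(gnt)
    return clocked, needed


# Hypothetical traffic; port names "N", "W", "E" are illustrative only.
requests = [set(), {"N"}, {"N", "W"}, {"W"}, set(), {"E"}]
grants   = [set(), {"N"}, {"N"},      set(), set(), {"E"}]  # W blocked in cycle 3

clocked, needed = early_valid_stats(requests, grants)
print(clocked, needed, len(requests) - clocked)  # 4 3 2
```

Here the router is clocked in 4 of 6 cycles although only 3 actually transfer data; the remaining 2 cycles are gated.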
11/19
Gate-Level Optimizations and Signal Gating
• Automated signal gating and gate-level power optimisations had minimal impact
• Inserting signal gating logic manually did significantly reduce input FIFO power requirements
• The reported results could be improved by a further 12% by enabling logic optimisation across module boundaries
– This was restricted so that we could accurately determine where power is dissipated
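The benefit of signal gating can be seen by counting bit toggles at a FIFO input. The router's manually inserted gating logic is not described here, so this sketch assumes a "hold the last valid value" style of gating (forcing the inputs to a constant with AND gates is another common variant) and uses an 8-bit bus rather than the 80-bit one for readability.

```python
def bit_toggles(words):
    """Total number of bit transitions between consecutive bus values,
    a simple proxy for switching activity (and hence dynamic power)."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


def hold_when_invalid(words, valid):
    """Signal gating by holding: the FIFO input only changes when the incoming
    data is valid; otherwise it keeps its previous value."""
    gated, last = [], 0
    for w, v in zip(words, valid):
        if v:
            last = w
        gated.append(last)
    return gated


bus   = [0x00, 0xFF, 0x0F, 0xF0, 0x3C]   # 8-bit example; invalid cycles carry junk
valid = [1,    0,    1,    0,    1]

print(bit_toggles(bus))                           # 24
print(bit_toggles(hold_when_invalid(bus, valid))) # 8
```

Junk values on invalid cycles cost switching energy without carrying information; suppressing them cuts toggles by two thirds in this toy trace.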
12/19
Analysis of Power Consumption
[Figure: power consumption of a single router and its links]
• Simple power optimisations can reduce power requirements to a quarter, and many more opportunities to save power remain
• The network is ~5% of core area
• Perhaps 10% of system power at present
• Don’t make comparisons without optimising power!
13/19
Analysis of Power Consumption
• 22% static power, 11% inter-router links
• ~1% global clock tree
• 65% dynamic power
– Power breakdown:
• ~50% of dynamic power is consumed in the local clock tree and input FIFOs
• ~30% in the router datapath
• ~20% in scheduling and arbitration
– Scheduling is probably more complex than in typical implementations due to speculation
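The dynamic-power split above is given as fractions of the 65% dynamic component. Converting it to shares of the total (presumably the single router plus its links, as in the preceding figure) is simple arithmetic, sketched here with the slide's own percentages.

```python
# Percentages as given on the slide (top-level shares of router + link power).
top_level = {"static": 22, "inter-router links": 11,
             "global clock tree": 1, "dynamic": 65}
# Split of the 65% dynamic component (approximate, per the slide).
dynamic_split = {"local clock tree + input FIFOs": 50,
                 "router datapath": 30,
                 "scheduling and arbitration": 20}

share_of_total = {part: top_level["dynamic"] * pct / 100
                  for part, pct in dynamic_split.items()}

for part, pct in share_of_total.items():
    print(f"{part}: {pct:.1f}% of total")
print("top-level sum:", sum(top_level.values()))  # 99, the slide's figures are rounded
```

On this basis the local clock tree and input FIFOs account for roughly a third (32.5%) of the total, more than the static power.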
14/19
Low-Power On-Chip Networks
• Interconnect and static power are set to increase
– Many low-power link technologies
• Low-swing differential techniques
– Power gating and other leakage reduction techniques
• Realising further potential power savings requires many different techniques: no single silver bullet?
15/19
Low-Power On-Chip Networks
• Topology
– We don’t want to sacrifice the general-purpose, or at least multi-purpose, nature of our networked SoC
– Results suggest higher-radix routers and longer interconnects could reduce power
• Probably not a long-term solution
• Reduces path diversity, which is bad for fault tolerance
• Architecture
– Scope for minimising the memory required to store the precomputed router schedule (particular to our router)
– Simpler routers
– Single-cycle routers reduce power? Speculation for low power?
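One reason higher-radix routers with longer links could reduce power is that packets traverse fewer routers. The sketch below compares average router traversals for 16 tiles under assumed uniform random traffic (self-traffic included for simplicity): a 4x4 mesh versus a hypothetical 2x2 concentrated mesh in which each router serves 4 tiles. It counts router traversals only and ignores the extra energy of the longer links and larger crossbars, which is exactly the trade-off being weighed.

```python
from itertools import product


def avg_routers_traversed(k: int, c: int) -> float:
    """Average routers traversed per packet in a k x k mesh where each router
    serves a c x c block of tiles (concentration c*c). Uniform random traffic
    over all ordered (source, destination) tile pairs, self-pairs included.
    Routers traversed = Manhattan distance between routers + 1."""
    n = k * c                                    # tiles per side
    tiles = list(product(range(n), repeat=2))
    total = 0
    for (sx, sy), (dx, dy) in product(tiles, repeat=2):
        hops = abs(sx // c - dx // c) + abs(sy // c - dy // c)
        total += hops + 1
    return total / len(tiles) ** 2


print(avg_routers_traversed(4, 1))  # 3.5: 16 tiles, 4x4 mesh of 5-port routers
print(avg_routers_traversed(2, 2))  # 2.0: 16 tiles, 2x2 mesh of higher-radix routers
```

The concentrated network also has far fewer links, which is why it loses path diversity.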
16/19
Supporting Best-Effort (BE) and Guaranteed Services (GS) Efficiently
• Current timing of the datapath and link suggests additional GS data could be routed in the same clock cycle
– Allocate the datapath/link to GS traffic for the first half of the clock cycle
• Double the capacity of the network
– Exploit simpler GS circuit-switched routing when possible
– Reduce power
• Very little additional overhead
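The capacity argument can be sketched as a toy link schedule. This assumes, beyond what the slide states, that the first half of every cycle can carry one GS flit while one BE flit still completes in the same cycle, and that in the baseline GS simply takes priority over BE for a single per-cycle slot.

```python
def drain_cycles(gs_flits: int, be_flits: int, split_cycle: bool) -> int:
    """Clock cycles to send queued GS and BE flits over one link.
    split_cycle=False: one flit per cycle, GS given priority.
    split_cycle=True: first half of each cycle carries a GS flit, the second
    half a BE flit (ASSUMES the datapath/link timing has room for both)."""
    gs, be, cycles = gs_flits, be_flits, 0
    while gs or be:
        cycles += 1
        if split_cycle:
            gs = max(gs - 1, 0)
            be = max(be - 1, 0)
        elif gs:
            gs -= 1
        else:
            be -= 1
    return cycles


print(drain_cycles(10, 10, split_cycle=False))  # 20
print(drain_cycles(10, 10, split_cycle=True))   # 10
```

With equal GS and BE load the split schedule halves the drain time, i.e. doubles link capacity; with little GS traffic the gain shrinks accordingly.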
17/19
Clocking On-Chip Networks
• Network system timing issues are interesting
– Networks are naturally event-driven, not synchronous
• Work is investigating placing local data-driven clock generators in each network router
– Clock is stretched when there is no data to be routed
– Clock matches the rate of incoming data streams
– Robust synchronisation solution (true GALS)
– Also investigating incorporating power gating support
• See also Distributed Clock Generator – DCG (Fairbanks/Moore)
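The behaviour of a data-driven local clock can be captured by a small timing model. This is a behavioural sketch under assumed parameters, not the circuit under investigation and not the DCG: each waiting flit triggers one clock edge, edges never come closer than a minimum period, and the clock is simply stretched while no data arrives.

```python
import math


def data_driven_clock_edges(arrivals_ns, min_period_ns):
    """Edge times of a local clock that ticks only when a flit is waiting.
    Each arrival gets one edge, no earlier than the arrival itself and no
    sooner than min_period_ns after the previous edge."""
    edges = []
    last = -math.inf
    for t in sorted(arrivals_ns):
        edge = max(t, last + min_period_ns)   # stretch if idle, else run at full rate
        edges.append(edge)
        last = edge
    return edges


arrivals = [0, 1, 2, 10, 30]   # hypothetical flit arrival times (ns)
edges = data_driven_clock_edges(arrivals, min_period_ns=4)
print(edges)  # [0, 4, 8, 12, 30]
```

A free-running clock with a 4ns period would produce 8 edges over the same 0 to 28ns window; the data-driven clock produces 5, running at full rate during the burst and stretching across the idle gap.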
18/19
Challenges and Future Work
• These are early results from a much more rigorous study of the power requirements of networked on-chip communication
– Much more soon!
• Exploiting a general-purpose on-chip network
– Exploiting execution diversity to improve energy efficiency
– Multi-use platforms and Virtual-IP
– Fault tolerance
– Networks of processing elements or networks that process?
• Scope for removing unnecessary interfaces and boundaries
• Impact of networking on IP and processor core design
19/19
Thank You