SMART: A Single-Cycle Reconfigurable NoC for SoC Applications

Chia-Hsin Owen Chen, Sunghyun Park, Tushar Krishna, Suvinay Subramanian, Anantha P. Chandrakasan, Li-Shiuan Peh
Department of Electrical Engineering and Computer Science, MIT, Cambridge
-Jyoti Wadhwani
Evolution of on-chip systems
ECE 284 Spring 2013
Challenges with this evolution
Scaling “compute” possible: Moore’s Law
What about the communication network?
More “hops” are bad
At each hop: router
• Latency
• Power
At system level
• Delayed responses
→ delayed injection of fresh requests
→ overall shutdown
→ increased power budget
Motivation
• NoCs should deliver low latency and high bandwidth with low power and area overhead
• Signaling at low-voltage swing can lower energy consumption and propagation delay
• Wire delay is much shorter than a typical router cycle time
• Multiple hops can be traversed in a single cycle by bypassing buffering and arbitration at the routers
[Figure: repeated wire drawn in 1 mm segments]
Wires can be driven multiple mm within a cycle using repeaters
• Router cycle time = 500 ps for a 2 GHz clock
• Full-swing repeated wire delay ≈ 100 ps/mm
• By bypassing the buffers, we can traverse 5 mm in 1 clock cycle
• The number of hops in a cycle depends on the repeater circuit and wire parasitics
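The arithmetic behind the 5 mm figure can be sketched as follows. This is a minimal back-of-the-envelope model, not the paper's methodology: it assumes the whole clock period is available for wire traversal (no setup time or crossbar delay), and the function name and the `hop_mm` parameter are illustrative.

```python
import math

def max_hops_per_cycle(clock_ghz, wire_delay_ps_per_mm, hop_mm=1.0):
    """Upper bound on hops traversed in one clock cycle.

    Assumption: the full cycle is spent on the wire; margins for
    setup time and crossbar traversal are ignored.
    """
    cycle_ps = 1000.0 / clock_ghz                 # 2 GHz -> 500 ps
    hop_delay_ps = wire_delay_ps_per_mm * hop_mm  # delay of one hop
    return math.floor(cycle_ps / hop_delay_ps)

# Full-swing repeaters at 100 ps/mm with 1 mm hops and a 2 GHz clock
print(max_hops_per_cycle(2.0, 100.0))  # 5
```

A faster repeater lowers `wire_delay_ps_per_mm` and therefore raises the bound, which is the lever the low-swing link design pulls.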
Approaches to reduce on-chip latency
• Application-specific topology reconfiguration is needed to bypass the buffering and arbitration at routers
• Topology can be reconfigured to match application-specific communication patterns at
• Design time: requires knowledge of all applications and their communication graphs; the overhead is the wiring density needed to support dedicated links
• Runtime: computes contention-free routes that allow flits to bypass the queues
This paper performs online reconfiguration of network routers at runtime, enabling different applications to run on tailored topologies
SMART LINK
• Voltage lock repeater (VLR): asynchronous low-swing repeater circuit for single-cycle multi-hop link traversal
• A low-swing link stretches the maximum distance spanned by a repeated link in a single clock cycle
• Power consumption for transmitting 5.5 Gb/s data with BER less than 10^-9: 4.21 mW with a full-swing repeater, 3.78 mW with a VLR
• Link delay: 100 ps/mm with full-swing repeaters, 60 ps/mm with VLRs
• Node X is voltage-locked to swing near the threshold voltage of INV1x without a decrease in drive current
• The low-swing voltage level is determined by transistor sizes and link wire impedance; simulations were performed across process corners
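To put the power and delay numbers on a common footing, the sketch below converts them into energy per bit and relative savings. These derived values are simple arithmetic on the figures quoted above and are not stated in the slides; the function name is illustrative.

```python
def energy_per_bit_pj(power_mw, data_rate_gbps):
    """Energy per transmitted bit in pJ.

    mW / (Gb/s) = 1e-3 J/s / 1e9 bit/s = 1e-12 J/bit = 1 pJ/bit,
    so the ratio of the two numbers is already in pJ/bit.
    """
    return power_mw / data_rate_gbps

DATA_RATE_GBPS = 5.5
full_swing_pj = energy_per_bit_pj(4.21, DATA_RATE_GBPS)
vlr_pj = energy_per_bit_pj(3.78, DATA_RATE_GBPS)

power_saving = 1.0 - 3.78 / 4.21    # fraction of link power saved by the VLR
delay_saving = 1.0 - 60.0 / 100.0   # fraction of per-mm delay saved by the VLR

print(f"full swing: {full_swing_pj:.2f} pJ/bit, VLR: {vlr_pj:.2f} pJ/bit")
print(f"power saving: {power_saving:.1%}, delay saving: {delay_saving:.0%}")
```

Under these numbers the VLR spends roughly 0.69 pJ/bit against 0.77 pJ/bit for the full-swing repeater, about a 10% power reduction, while the per-mm delay drops by 40%.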
SMART Router Microarchitecture
SMART Crossbar
• If the MUX is preset to connect the incoming link to the crossbar, the bypass path is enabled
• If the MUX is set to connect the input port buffer to the crossbar, the bypass path is disabled
• The bypass path is disabled when the same output port is shared by multiple input ports
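The preset rule for one router can be sketched as below. This is an illustrative model only: the port names, the `(input_port, output_port)` flow representation and the function name are assumptions, and the real preset is computed by the reconfiguration flow described in the paper.

```python
def preset_bypass(flows):
    """Decide, per input port of one router, whether bypass can be preset.

    flows: list of (input_port, output_port) pairs routed through the router.
    Returns {input_port: True if the bypass path is enabled}.
    Rule from the slide: bypass is disabled when the same output port
    is shared by multiple input ports.
    """
    users = {}  # output port -> set of input ports that target it
    for in_port, out_port in flows:
        users.setdefault(out_port, set()).add(in_port)

    bypass = {}
    for in_port, out_port in flows:
        unshared = len(users[out_port]) == 1
        # Assumption: an input port feeding several outputs keeps bypass
        # only if none of those outputs is shared.
        bypass[in_port] = bypass.get(in_port, True) and unshared
    return bypass

# West and North both target East, so both must be buffered and arbitrate;
# South has the Local output to itself and can bypass.
print(preset_bypass([("W", "E"), ("N", "E"), ("S", "L")]))
```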
SMART Flow
• The green and purple flows do not overlap with each other → they traverse from the source to the destination router in a single clock cycle
• The red and blue flows overlap → they need to be stopped at routers 9 and 10 to arbitrate for the shared crossbar ports
• Reverse credit mesh network: keeps track of the free VCs at the endpoint of an arbitrary SMART route
• For the blue flow, routers 3, 7 and 11 forward credits from NIC3 to router 10's East output port
• The VC queue of a router keeps track of the VCs at the input port of a router multiple hops away, and not just the neighbor
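One way to see where flows must stop is to look for links used by more than one flow. The sketch below applies a simplified rule, assumed here to match the red/blue example: a flow stops at both endpoints of any link it shares with another flow. The router paths are hypothetical and are not taken from the figure; only the shared 9 → 10 link and the blue flow's 10 → 11 → 7 → 3 tail come from the slide.

```python
from collections import defaultdict

def stop_routers(flows):
    """Return, per flow, the routers where it must be buffered.

    flows: {flow_name: [router ids along the path]}.
    Assumed rule: a flow stops at both endpoints of every directed
    link that it shares with at least one other flow.
    """
    link_users = defaultdict(set)  # (from, to) -> names of flows using it
    for name, path in flows.items():
        for link in zip(path, path[1:]):
            link_users[link].add(name)

    stops = {name: [] for name in flows}
    for name, path in flows.items():
        for a, b in zip(path, path[1:]):
            if len(link_users[(a, b)]) > 1:
                for router in (a, b):
                    if router not in stops[name]:
                        stops[name].append(router)
    return stops

# Hypothetical paths on a 4x4 mesh numbered 0..15
flows = {
    "green": [0, 1, 2],
    "purple": [4, 5, 6],
    "red": [13, 9, 10, 14],
    "blue": [8, 9, 10, 11, 7, 3],
}
print(stop_routers(flows))
```

Flows with an empty stop list can be preset end to end and cross the network in one cycle, subject to the per-cycle distance limit; the others are split into segments at their stop routers.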
Results
• SMART is compared against two baselines:
• Mesh: no reconfiguration; each hop takes 3 cycles in the router and 1 cycle in the link
• Dedicated: 1-cycle dedicated links tailored to each application
• At 2 GHz, the SMART NoC can traverse 8 mm within a single clock cycle, i.e. 8 hops with 1 mm cores
• SMART is 1.5 cycles off in performance from the Dedicated baseline when one core acts as a source and another acts as a sink for most of the flows
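The per-hop costs above suggest a rough zero-load comparison. The sketch below is an assumed, contention-free model that counts only router and link cycles; it is not the evaluation used in the paper and does not reproduce the reported averages. Function names are illustrative.

```python
def mesh_latency_cycles(hops, router_cycles=3, link_cycles=1):
    """Baseline Mesh: every hop pays the router pipeline plus one link cycle."""
    return hops * (router_cycles + link_cycles)

def smart_latency_cycles(hops, max_hops_per_cycle=8):
    """SMART on a contention-free preset route within the one-cycle reach.

    Assumption: routes longer than the reach, or routes that share links,
    need intermediate stops and are not modeled here.
    """
    if hops > max_hops_per_cycle:
        raise ValueError("route exceeds single-cycle reach; not modeled")
    return 1

for hops in (1, 4, 8):
    print(hops, mesh_latency_cycles(hops), smart_latency_cycles(hops))
```

The gap grows linearly with hop count in this model, which is consistent with the observation that longer paths benefit most.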
Results
• The benefits of SMART are greater when certain tasks are tied to specific cores, resulting in longer paths
• The SMART NoC gives 60% latency savings and 2.2x power savings compared to the Mesh
• Power savings are due to bypassing of buffers, low-voltage signaling and clock gating at the routers
Conclusion
• The paper proposes
• an NoC architecture that reconfigures and tailors a generic mesh topology for SoC applications at runtime
• a low-swing clockless repeated link circuit embedded within router crossbars that allows packets to bypass all the way from source to destination core within a single clock cycle
Critiques/Comments
• Wire delay, unlike gate delay, does not scale with the shrinking of transistors
• With multi-mode designs (operating at different voltage levels) and wire resistance that increases with temperature, the repeater circuit requires careful transistor sizing, checked by simulating across all PVT corners (not just process corners)
THANK YOU