Dark Blue with Orange
Download
Report
Transcript Dark Blue with Orange
Presentation of
Designing Efficient Irregular Networks for
Heterogeneous Systems-on-Chip
by
Christian Neeb and Norbert Wehn
and
Workload Driven Synthesis of Irregular
Application Specific NoC's
by
Ben Meakin
Introduction
Irregular vs Regular Networks
N1
N2
N3
N1
N2
N3
N4
N5
N6
N4
N5
N6
N7
N8
N9
N7
N8
N9
Advantages:
High performance/low power
for specific application
Lower hardware cost of
network
Disadvantages:
Difficult routing
Difficult floor planning/P&R
Advantages:
Efficient for general purpose
applications
Easy routing
Easy floor planning/P&R
Disadvantages:
Expensive
Under utilized channels
Avg. latency not as scalable
Background and Motivation
NoC power a function of network hops:
NoC latency a function of network hops:
L = Rpipeline * Hops (clock cycles)
NoC hardware cost a function of router radix:
Pperflit = (buff + xbar + arb)*Hops + wire*D
C = sum (k1*radix_n + k2)
n = 1....nodes
Minimize hops and router complexity!
Design of Efficient Irregular Networks
for Heterogeneous Systems-on-Chip
Christian Neeb and Norbert Wehn
University of Kaiserslautern
9th EUROMICRO Conference on Digital System Design 2006
Application Communication Model
Resource Communication Graph (RCG)
Vertex: hardware component
Edge: abstract communication channel
Edge Weight: average communication rate (bandwidth)
Normalized to physical bandwidth of one channel link
Efficient mapping of task communication to RCG
is assumed and is not discussed in this paper
INOC Architecture
Nodes are tiles consisting of a chip resource and a
router
Routers: Wormhole with virtual channels
Round-Robin switch arbitration
Routing:
Deterministic: one output channel per destination
address
Adaptive: > one output channel per destination address
Routing in Irregular Topologies
Channel Dependency Graph (CDG)
Vertex: channel
Edge: pair of channels with a dependency due to
routing function
Deadlock free if there are no cycles
Node Numbering
Arbitrary in this paper
Channels Increasing / Channels Decreasing
Acyclic CDGs
Determined by source/destination node numbers
Deadlock Free Routing
Route packets on increasing channels, then decreasing
General form of dimension order routing
Routing in Irregular Topologies
Unsafe paths allowed provided there is a safe path
(acyclic) that a packet can escape to
Routing table filled via Dijkstra's algorithm
Virtual channels are used to expand routing
function using similar rules as dimension order
schemes
Network Generation
Start with a bidirectional connection of all nodes in
ascending order
Ensures a correct routing path exists
Add shortcuts based on network traffic estimation
and RCG
Provides an optimal routing path that will be used in
the common case
Results
Network traffic patterns randomly generated with
constraints to model heterogeneous SoC traffic
Simulated with 100 different patterns
Variable flit injections rates
Limitations
Naive implementation of “safe path” increases
cost/complexity of network
RCG is the most efficient topology
More efficient deadlock free solutions may exist
Efficient mapping of TCG to RCG is assumed
Can be a difficult problem
Network generation could be linked to a specific
programming model or API for simple TCG to RCG
mapping
Workload Driven Synthesis of Irregular
Application-Specific NoC's
Ben Meakin
[email protected]
Project Objectives
Synthesis tool for INoC's
Generation of optimal network topology
MCAPI workload specification
Synthesis of deadlock free routing
Comparison of irregular to regular NoC
performance, power, and cost
Motivation
CAD for ASICS always needs improvement
Reduce design cost, time-to-market, etc.
Synthesis of INoCs could be useful in
implementing parallel algorithms in programmable
logic
Average router complexity needs to be minimized
Asynch NoCs can realize improvement in average case
performance
GALS INoCs could be future direction in heterogeneous
SoC
Minimize hops to reduce power and latency
Specification of Workload
Communication channels:
ScalarChan N1
Comm Type
N2
256e9
10
Sender Receiver BW (bits/sec) Priority
Optimization Effort:
E = 100 * ((BW / Norm)/2 + (PR / Norm)/2)
Network Synthesis Algorithm
Sort channels by priority/effort
Add all nodes to network
For each channel:
if sender has links:
while link has not been added
if receiver has links, add link; else increase search depth
else recursive call from next sender neighbor node
while not done
if network is not correct, add link; else done
Network Synthesis Example
Max Radix is 3
N1 – N2 added
Highest priority link
N1
N2
Network Synthesis Example
N1 – N4 added
N1 – N3 added
N3 – N4 added
N4
N1
N3
N2
Network Synthesis Example
N1 – N5 added
Max radix of N1 reached, so
N2 used as via
N6 – N7 added
N5 – N8 added
Are we done?
N4
N1
N2
N6
N8
N3
N5
N7
Network Synthesis Example
N2 – N6 added for correctness
Every node must have a path to every
other node
N4
N1
N2
N6
N8
N3
N5
N7
Synthesis of Routing Function
Rename nodes with naming algorithm
Depth first, breadth first, root node selection
Most efficient synthesis for application
For each node pair in workload
if not, add a virtual channel to make it correct
Most correct synthesis for general purpose
For ALL node pairs in network
check if shortest path is correct
check if shortest path is correct
if not, add a virtual channel to make it correct
Synthesize ROM for each node
Uses of this Tool
Incorporate with previous MCAPI hardware
implementation
Allow user to synthesize a working HDL model based
on MCAPI workloads and a set of IP blocks
Modern FPGA's could make this a feasible and cost
effective alternative to fabricating ASIC's
Aid for studying optimal topology for known
network traffic
Questions?