Dark Blue with Orange

Download Report

Transcript Dark Blue with Orange

Presentation of
Designing Efficient Irregular Networks for
Heterogeneous Systems-on-Chip
by
Christian Neeb and Norbert Wehn
and
Workload Driven Synthesis of Irregular
Application Specific NoC's
by
Ben Meakin
Introduction



Irregular vs Regular Networks
N1
N2
N3
N1
N2
N3
N4
N5
N6
N4
N5
N6
N7
N8
N9
N7
N8
N9
Advantages:
 High performance/low power
for specific application
 Lower hardware cost of
network
Disadvantages:
 Difficult routing
 Difficult floor planning/P&R


Advantages:
 Efficient for general purpose
applications
 Easy routing
 Easy floor planning/P&R
Disadvantages:
 Expensive
 Under utilized channels
 Avg. latency not as scalable
Background and Motivation

NoC power a function of network hops:


NoC latency a function of network hops:


L = Rpipeline * Hops (clock cycles)
NoC hardware cost a function of router radix:


Pperflit = (buff + xbar + arb)*Hops + wire*D
C = sum (k1*radix_n + k2)
n = 1....nodes
Minimize hops and router complexity!
Design of Efficient Irregular Networks
for Heterogeneous Systems-on-Chip
Christian Neeb and Norbert Wehn
University of Kaiserslautern
9th EUROMICRO Conference on Digital System Design 2006
Application Communication Model

Resource Communication Graph (RCG)



Vertex: hardware component
Edge: abstract communication channel
Edge Weight: average communication rate (bandwidth)


Normalized to physical bandwidth of one channel link
Efficient mapping of task communication to RCG
is assumed and is not discussed in this paper
INOC Architecture

Nodes are tiles consisting of a chip resource and a
router

Routers: Wormhole with virtual channels

Round-Robin switch arbitration

Routing:


Deterministic: one output channel per destination
address
Adaptive: > one output channel per destination address
Routing in Irregular Topologies

Channel Dependency Graph (CDG)






Vertex: channel
Edge: pair of channels with a dependency due to
routing function
Deadlock free if there are no cycles
Node Numbering

Arbitrary in this paper
Channels Increasing / Channels Decreasing


Acyclic CDGs
Determined by source/destination node numbers
Deadlock Free Routing

Route packets on increasing channels, then decreasing

General form of dimension order routing
Routing in Irregular Topologies

Unsafe paths allowed provided there is a safe path
(acyclic) that a packet can escape to


Routing table filled via Dijkstra's algorithm
Virtual channels are used to expand routing
function using similar rules as dimension order
schemes
Network Generation

Start with a bidirectional connection of all nodes in
ascending order


Ensures a correct routing path exists
Add shortcuts based on network traffic estimation
and RCG

Provides an optimal routing path that will be used in
the common case
Results

Network traffic patterns randomly generated with
constraints to model heterogeneous SoC traffic


Simulated with 100 different patterns
Variable flit injections rates
Limitations

Naive implementation of “safe path” increases
cost/complexity of network



RCG is the most efficient topology
More efficient deadlock free solutions may exist
Efficient mapping of TCG to RCG is assumed


Can be a difficult problem
Network generation could be linked to a specific
programming model or API for simple TCG to RCG
mapping
Workload Driven Synthesis of Irregular
Application-Specific NoC's
Ben Meakin
[email protected]
Project Objectives

Synthesis tool for INoC's




Generation of optimal network topology
MCAPI workload specification
Synthesis of deadlock free routing
Comparison of irregular to regular NoC
performance, power, and cost
Motivation



CAD for ASICS always needs improvement

Reduce design cost, time-to-market, etc.
Synthesis of INoCs could be useful in
implementing parallel algorithms in programmable
logic
Average router complexity needs to be minimized

Asynch NoCs can realize improvement in average case
performance


GALS INoCs could be future direction in heterogeneous
SoC
Minimize hops to reduce power and latency
Specification of Workload

Communication channels:
ScalarChan N1
Comm Type

N2
256e9
10
Sender Receiver BW (bits/sec) Priority
Optimization Effort:
E = 100 * ((BW / Norm)/2 + (PR / Norm)/2)
Network Synthesis Algorithm



Sort channels by priority/effort
Add all nodes to network
For each channel:

if sender has links:

while link has not been added



if receiver has links, add link; else increase search depth
else recursive call from next sender neighbor node
while not done

if network is not correct, add link; else done
Network Synthesis Example
Max Radix is 3
N1 – N2 added


Highest priority link
N1
N2
Network Synthesis Example
N1 – N4 added
 N1 – N3 added
 N3 – N4 added

N4
N1
N3
N2
Network Synthesis Example

N1 – N5 added
 Max radix of N1 reached, so
N2 used as via
N6 – N7 added
 N5 – N8 added
 Are we done?

N4
N1
N2
N6
N8
N3
N5
N7
Network Synthesis Example
N2 – N6 added for correctness
 Every node must have a path to every
other node

N4
N1
N2
N6
N8
N3
N5
N7
Synthesis of Routing Function


Rename nodes with naming algorithm

Depth first, breadth first, root node selection
Most efficient synthesis for application

For each node pair in workload



if not, add a virtual channel to make it correct
Most correct synthesis for general purpose

For ALL node pairs in network


check if shortest path is correct
check if shortest path is correct

if not, add a virtual channel to make it correct
Synthesize ROM for each node
Uses of this Tool

Incorporate with previous MCAPI hardware
implementation



Allow user to synthesize a working HDL model based
on MCAPI workloads and a set of IP blocks
Modern FPGA's could make this a feasible and cost
effective alternative to fabricating ASIC's
Aid for studying optimal topology for known
network traffic
Questions?