ppt presentation

Download Report

Transcript ppt presentation

King Fahd University of Petroleum and Minerals
A High Throughput
Network-on-Chip Architecture
for System-on-Chip Interconnect
Abdelhafid Bouhraoua and M.E.S El-Rabaa
Computer Engineering Department (COE)
College of Computer Science and Engineering (CCSE)
King Fahd University of Petroleum and Minerals (KFUPM)
Dhahran, Eastern Province, Saudi Arabia
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
Outline







Networks-on-Chips
State of The Art
Fat Tree Network Properties
Modified Fat Tree
Router Architecture
Performance Evaluation
Conclusion
‹#›
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
Outline







Networks-on-Chips
State of The Art
Fat Tree Network Properties
Modified Fat Tree
Router Architecture
Performance Evaluation
Conclusion
‹#›
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
Semiconductor Industry Future
35
130
31
Technology (nm)
120
100
30
25
90
80
20
65
60
15
15
45
40
7.8
20
10
32
22
3.9
0
1
1.9
2000
2002
2004
5
Transistors/Chip (100 Millions)
140
0
2006
2008
2010
2012
Year
‹#›
Technology Evolution is Faster than
Design Evolution
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
Chip Design Methodology
 Chip Complexity: 100M – 3B.
 ASIC Methodology Not Suitable
• RTL  Synthesis  Back End
• Impossible to handle Full RTL Design for the whole
project
 IP-Reuse Based Methodology
• Opens a Wide Range of Possibilities
• IP Blocks Together On a Chip  “System-on-Chip”
From ASIC  System-on-Chip Era
‹#›
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
SoC Constraints
 Very Short Time-To-Market
• Compressed Schedule
 Very Short Lifecycle
• Low Development Cost
• Small Team
 High Complexity
• Available Silicon Resources to
Produce Cost-Effective Highly
Integrated SoCs.
 Broad Range of IP Blocks
‹#›
• Impossibility to know them all
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
SoC Methodology
Main Task: Integration of The IP Blocks
 System Level Integration
• Data Formatting and Conversion
• Protocol Interfacing
• Control Interfacing
 Interconnection Level Integration
• Signal Interfacing
• Data Transfer Interfacing
• Wire Interconnect and Back-end Integration
‹#›
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
Interconnecting The IPs
Brute Force Method:
 Design The Interface Block for Every Pair of IPs in
the SoC
 Point-to-Point Communication between IPs
Problems:
 Design Effort Similar to that of a new IP Block
• If 20 Different Blocks  Around 400 New Designs !!!!
 Point-to-Point Communication  Wiring Mess
• 50 Blocks; 8bits bidirectional  More than 20,000 Globally
Routed Wires
‹#›
Should Look For a Better Way
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
Networks-on-Chips
“Route Packets NOT Wires”, William J. Dally
Idea: Build a Complete on-Chip Network
 Unified Communication Model (Similar to OSI
Stack)
• No Ad-hoc Effort
• Standardized Interfacing (May be provided by IP Vendors)
 Unified Network Elements (Routers, Link
Interfaces)
• No Design required by the SoC Teams
 Flexible Interconnect and Reduced Global
Wiring
‹#›
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
NoC Requirements
 Performance
• How fast packets are moved across the network?
• How much traffic is carried at the same time and
for how long?
 Overhead
• How Big is its required Size (in Gates) ?
 Adaptivity
• Does it Adapt Easily to new Designs ?
 Complexity
‹#›
• How Easy is Interfacing to it ?
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
Outline







Networks-on-Chips
State of The Art
Fat Tree Network Properties
Modified Fat Tree
Router Architecture
Performance Evaluation
Conclusion
‹#›
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
Previous Work
 Majority directly derived from other research (Interconnection
Networks for Parallel Architectures)
 Reproduce what has been learned in the area of inter-chip
networks,
 Focus on the router architecture alone to achieve certain goals
in latency
 Asynchronous design of NoCs, mainly GALS
 Circuit switching techniques introduced to provide a certain
guarantee for the latency.
 Did not fully take advantage of the fact that the network is onchip where the main gain is no-pin limitation.
 Router architectures directly derived from inter-chip
architectures where the routers were implemented on a single
chip  substantial overhead.
 Added complexity to achieve guaranteed latency is an overkill in
the on-chip context.
‹#›
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
Which Network?
Most Straightforward  Crossbar
‹#›
 Good Throughput (maxes at 66%)
 Non Scalable (Quadratic)
 Complexity Of Implementation for Higher Number
of I/Os.
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
2-D Mesh
 Very Popular Topology in NoCs.
• Very Suitable for the 2D nature of Chip
Floorplanning (Tiling)
 Very High Constraints
• Inefficient routing algorithms (deadlock-free by
construction)
• Efficient routing algorithms (Complex
implementation)
• Poor performance: Saturation reached at 30 %.
‹#›
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
Analysis
 Low throughput. Means: latency cannot be guaranteed
above the maximum throughput levels
 Low throughput cause by contention over the output
ports of routers among several incoming packets
 Cannot prevent contention from happening. Contention
makes router architectures more complex because they
need to integrate buffering and prioritization logic.
 Routers that implement both packet and circuit switching
makes the architecture even more complex.
‹#›
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
Methodology
 Take advantage of the On-Chip Context:
• Design frozen before tape out
• No internal IO limitations
 Aim for a High Throughput Architecture
• Circuitry used at 30% of its maximum is NOT an
optimal Solution (Clock frequency, power).
 Reduced router size
• Integrate a large number of routers
 Wormhole routing vs. Store and Forward
• Reduce required buffers in routers
‹#›
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
Fat Tree
What topology resembles a crossbar?
Banyans or Multistage Interconnection Networks.
 Bidirectional multistage
or folded multistage
networks
 Bidirectional multistage
are two entities:
• The Fat Tree (FT)
• The butterfly.
 Fat Tree better than
butterfly (previous
work)
‹#›
Abdelhafid Bouhraoua
R
R
R
R
R
R
R
R
R
R
R
R
C



C
C
C
C
C
C
n+1 Stages (or rows)
Size is
• Routers = n x 2n
• Clients = 2n+1
Diameter = 2logk + 1; n = log k
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
C
King Fahd University of Petroleum and Minerals
Outline







Networks-on-Chips
State of The Art
Fat Tree Network Properties
Modified Fat Tree
Router Architecture
Performance Evaluation
Conclusion
‹#›
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
Routing in Fat Tree
Routing reduced to routing in a binary tree.
Binary Trees
UP
‹#›
Three Routing Directions
 UP
 RIGHT
 LEFT
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
Router
LEFT
RIGHT
CCSE – COE
King Fahd University of Petroleum and Minerals
Routing in Fat Tree


Matrix n rows x 2(n-1) columns.
Router (r,c)
• r : row index (rows are indexed from 0 to n-1)
• c: column index (columns are indexed from 0 to 2(n+1) -1)





[0, l[ U [u, 2n+1 -1]
Router
(r,c)
Size of the clients’ address space reachable using the
downside ports is equal to 2r
It is always a continuous interval of addresses of the [l, l+2r-1[
[l+2r-1, u[
form [l, u[.
Lower bound l : smallest address reached from the router (r,c).
Smallest address within the range obtained by clearing the lowest r bits
of the column c.
• l = (c/2r) x 2r.
Upper bound u: largest address reached from the router (r,c).
• Largest address obtained by adding 2r to the lower bound l.
• u = l+ 2r.
‹#›
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
Routing In Fat Tree
“Summit” Routers
C
R
R
R
R
R
R
R
R
R
R
R
R
C
C
C
C
C
C
Alternate Paths
C
Routing UP: Adaptive
Routing Down: Deterministic
‹#›
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
Outline







Networks-on-Chips
State of The Art
Fat Tree Network Properties
Modified Fat Tree
Router Architecture
Performance Evaluation
Conclusion
‹#›
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
Contention in Fat Tree
UP
Contention
on the way
down
Many Choices
for Going UP
LEFT




‹#›
RIGHT
Packets coming from the UP links are never routed up
Only packets coming from the bottom links are routed up.
Since the number of UP links is equal to the number of bottom links,
there cannot be any contention when routing up.
Contention occurs only when going down.
• Bottom links are split in RIGHT and LEFT links, deterministic routing of
packets will lead to contention.
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
Modified Fat Tree
Doubling of downward links eliminates contention
C
R
R
R
R
R
R
R
R
R
R
R
R
C
C
C
C
C
C
C
‹#›
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
Outline







Networks-on-Chips
State of The Art
Fat Tree Network Properties
Modified Fat Tree
Router Architecture
Performance Evaluation
Conclusion
‹#›
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
Router Architecture
k
k
 No Crossbar
Right
Left
Inputs Inputs
 No Buffers (Pushed to the Clients)
 Every downstream input
simultaneously connected to two
outputs.
 Contention eliminated between
the inputs going downstream.
 Number of outputs is 2k+2 for k
inputs (case of when the router is
2k + 2
2k + 2
a summit)
 Router models differ from each other only by two items:
• Number of input and output ports on the down link
• Routing function constants (r,c)
‹#›
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
Routing Circuitry
 All network elements are constants
and frozen at design time.
• All lower bound and upper bound
values, used to generate the routing
functions, are constants for each router.
• These constants are entered as inputs
into the routing function
 Routing Function implemented
using comparators.
 Constants needed by the routing
function are:
‹#›
Address
A≥
l
<
B
A≥
L
<
SOC 2006 – Tampere, 14-16 Nov. 2006
RIGH
T
B
A≥
u
<
B
• l
• L = l + 2r-1
• u
Abdelhafid Bouhraoua
LEF
T
CCSE – COE
UP
King Fahd University of Petroleum and Minerals
Client Interface
Up Link
FIFO
FIFO
FIFO
FIFO
Down Links (from router)
FIFO
 Buffers pushed to the Client
Interfaces
 Each incoming link is
terminated with a FIFO
memory.
 The different FIFO memories
connected to the client
through a single shared bus.
Client/IP Block
• Bus can be wider to perform data transfers faster than what is
received in the FIFOs.
 The size of FIFOs customizable by design team according to
the specifications
‹#›
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
Outline







Networks-on-Chips
State of The Art
Fat Tree Network Properties
Modified Fat Tree
Router Architecture
Performance Evaluation
Conclusion
‹#›
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
Simulation Conditions
 Uniform Traffic Generation
 Uniform Distribution of Destinations
 Traffic Rate constant fraction of Maximum
Link Bandwidth
 Variable Packet Size (within a predetermined
range; eg. 64 bytes +/- 10%)
 Simulation Platform: Cycle-based C-based.
Developed for this purpose
‹#›
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
Throughput
100
FT2
90
FT
Throughput (%)
80
70
60
50
40
30
20
10
0
10
20
30
40
50
60
70
80
90
Input Load (%)
‹#›
 More than 90% Throughput achieved
 Compare with Regular Fat Tree
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
Latency (cycles)
Latency
130
FT2-32C-64B
120
FT2-32C-128B
110
FT2-64C-64B
100
FT2-64C-128B
90
FT-32C-64B
80
FT-32C-128B
70
FT-64C-64B
60
FT-64C-128B
50
40
30
10
20
30
40
50
60
70
80
90
Input Load (%)
‹#›
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
Area and Speed
Buffer-less architecture less costly
‹#›
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
 Buffers pushed to the
client interfaces.
• Considerable
number of buffer
lanes is necessary for
every client
interface.
 Simulations shows a
linear progression of
Max. Number of Active FIFOs
Client Buffer Utilization
10
9
8
7
6
5
4
3
2
1
0
ML = 64 bytes
ML = 128 bytes
ML = 256 bytes
10
20
30
40
50
60
70
80
90
Input Load (%)
the maximum number of lanes used during operation.
 Obtained figures are an order of magnitude lower than the number
imposed by the architecture.
 Number of buffer lanes in the client interface can be tailored to suit
the class of applications at hand while reducing buffering area.
‹#›
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
Outline







Networks-on-Chips
State of The Art
Fat Tree Network Properties
Modified Fat Tree
Router Architecture
Performance Evaluation
Conclusion
‹#›
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE
King Fahd University of Petroleum and Minerals
Conclusion
 A contention-free modified FT architecture is
proposed.
 Proposed architecture achieves maximum
theoretical throughput and has smaller latency than
conventional FTs.
 Latency increases linearly with input load.
 Achieved performance is actual performance using a
contention-free network.
 The area of the network is kept small because of the
absence of buffers in the router architecture.
 Number of buffer lanes in the client interfaces can
be tailored for a specific platform to suit the class of
applications at hand while reducing buffering area.
‹#›
Abdelhafid Bouhraoua
SOC 2006 – Tampere, 14-16 Nov. 2006
CCSE – COE