36x48 Horizontal Poster - School of Computing
Hardware Design, Synthesis, and Verification of a
Multicore Communication API
Ben Meakin, Ganesh Gopalakrishnan
University of Utah School of Computing
Introduction
Modern trends in computer architecture and
semiconductor scaling are leading towards the design of
chips with more and more processor cores. Highly
concurrent hardware and software architectures are
inevitable in future systems. One of the greatest problems
in these systems is communication. It is essential that
scalable, flexible, and efficient hardware/software
mechanisms be researched and developed to ease the
technical community into developing concurrent systems.
This research effort aims to create such mechanisms by
designing a scalable hardware implementation of a
multicore communication API as a case study of a
concurrent design, synthesis, and verification flow.
Communication
Architecture
RISC Communication Instructions Designed as
Extension to MIPS ISA
Targets Embedded Systems Where Clock Rate Is Not as
Big an Issue
Routing Function Taken Off Critical Path To Improve
Single Cycle Clock Rate
Flexible: Data can be passed as pointers to shared
memory or as 16, 32, or 64-bit scalars
Simple: MIPS is a well known ISA and good
compiler tools exist
Efficient implementation of MCAPI as a C library
utilizing instructions as inline assembly code
Multicore Association Communications API (MCAPI)
What is it?
Lightweight Message Passing Interface
Provides Communication Primitives
Targeted Towards Embedded SoCs
Advantages/Disadvantages of Single Cycle Design
Lowest Possible Latency in Cycles
ISA Benefits?
MCAPI
Communication
Performance
NoC Router Design and
Synthesis
Best-Case Latency (in Clock Cycles)
Router Arbitration
Round-Robin Scheme
Starvation Free
Single Cycle Request/Grant Handshake Protocol
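The round-robin arbitration listed above can be illustrated with a small software model. This is only a behavioral sketch, not the poster's VHDL: the class name, the five-requester default (chosen to match the five physical channels mentioned elsewhere on the poster), and the initial priority order are assumptions made for illustration.

```python
class RoundRobinArbiter:
    """Behavioral model of a round-robin arbiter (hypothetical, for illustration)."""

    def __init__(self, num_ports=5):
        self.num_ports = num_ports
        # Start so that port 0 has the highest priority on the first grant.
        self.last_granted = num_ports - 1

    def grant(self, requests):
        """Return the index of the granted port, or None if nobody requests.

        `requests` is a sequence of booleans, one per port. The search starts
        just after the most recently granted port, so every requester is
        served within num_ports grants (starvation free).
        """
        for offset in range(1, self.num_ports + 1):
            port = (self.last_granted + offset) % self.num_ports
            if requests[port]:
                self.last_granted = port
                return port
        return None


arbiter = RoundRobinArbiter()
requests = [True, False, True, False, False]
grants = [arbiter.grant(requests) for _ in range(3)]
print(grants)  # ports 0 and 2 alternate
```

Because the search always resumes after the last winner, a port that keeps requesting cannot be passed over indefinitely, which is the starvation-free property the poster names.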
Routing Function
Dimension Order Routing
Deadlock Free
Saturating Counters Choose Best VC to Use
Reduces Worst-Case Latency
Physical Communication Medium is a 2-D Mesh
Network with 9 Nodes Consisting of a Modified MIPS
Core, Network Interface Unit, and an On-Chip Router
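Dimension order routing on the 2-D mesh can be sketched as follows. The sketch assumes the nine nodes form a 3x3 grid addressed by (x, y) coordinates and that packets travel along X before Y; the poster states neither detail, and the function name is hypothetical.

```python
def xy_route(src, dst):
    """Return the list of (x, y) nodes visited from src to dst.

    Assumed X-then-Y dimension order: the packet first corrects its X
    coordinate, then its Y coordinate. Fixing the order of dimensions is
    what rules out cyclic channel dependencies in a mesh.
    """
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append((x, y))
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


# Corner-to-corner route on the assumed 3x3 mesh.
route = xy_route((0, 0), (2, 2))
print(route)
print("hops:", len(route) - 1)
```

The hop count returned here is the N that appears in the latency formulas on this poster.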
Worst-Case Latency (in Clock Cycles)
F = 5 * N * Y + L (where N = number of hops, L = length of
packet, Y = maximum packet length)
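As a worked example, the worst-case bound can be evaluated next to the best-case formula given on this poster (F = N + L). The function names and the sample values (4 hops, a packet of length 8, a maximum packet length of 16) are illustrative assumptions, not measurements from the design.

```python
def best_case_latency(hops, packet_len):
    """Best case in clock cycles: F = N + L."""
    return hops + packet_len


def worst_case_latency(hops, packet_len, max_packet_len, contenders=5):
    """Worst case in clock cycles: F = 5 * N * Y + L.

    `contenders` is the factor of 5 in the poster's formula: five
    maximum-length packets competing for the same virtual channel
    at every router along the path.
    """
    return contenders * hops * max_packet_len + packet_len


hops, packet_len, max_packet_len = 4, 8, 16  # hypothetical values
print(best_case_latency(hops, packet_len))                    # 4 + 8 = 12
print(worst_case_latency(hops, packet_len, max_packet_len))   # 5*4*16 + 8 = 328
```

The gap between the two numbers comes entirely from the contention term 5 * N * Y, which vanishes when no other packet competes for the channel.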
Conclusions
For the worst-case conditions to occur there must be five
maximum length packets trying to use the same virtual
channel at each router along the path. This is a very rare
case. It is expected that the average case is much closer to
the best case latency.
References
1) “Multicore Communications API Specification V1.063,”
www.Multicore-Association.org
On-Chip Router Module
Critical Unit for Minimizing Latency
Hardware Design Flow
Wormhole Flow Control
F = N + L (where N = number of hops and L = length of
packet)
Synthesis of VHDL source with Xilinx Compiler
2) “Low-Latency Virtual-Channel Routers for On-Chip
Networks,” Mullins, West, and Moore. ISCA 2004.
Five Physical Channels with Two Virtual Channels
Each
Target Platform: Xilinx Virtex5 FPGA
3) “Communication Performance of Mesh and Ring Based
NoCs,” Vaclav Dvorak. 7th International Conference on
Networking.
Single Cycle Datapath Design
Design Objective 1: Nine Core MIPS Processor
Running MCAPI Programs on Programmable Logic
4) FPGA Development Board Picture Courtesy of
www.digilentinc.com
Design Objective 2: Platform for Testing/Implementing
Research Ideas in Multicore Architectures
Supported by SRC 2008-TJ-1847 and NSF CCF 0811429