36x48 Horizontal Poster - School of Computing


Hardware Design, Synthesis, and Verification of a
Multicore Communication API
Ben Meakin, Ganesh Gopalakrishnan
University of Utah School of Computing
Introduction
Modern trends in computer architecture and
semiconductor scaling are leading towards the design of
chips with more and more processor cores. Highly
concurrent hardware and software architectures are
inevitable in future systems. One of the greatest problems
in these systems is communication. It is essential that
scalable, flexible, and efficient hardware/software
mechanisms be researched and developed to ease the
technical community into developing concurrent systems.
This research aims to create such mechanisms by
designing a scalable hardware implementation of a
multicore communication API, which also serves as a case
study of a concurrent design, synthesis, and verification flow.
Communication Architecture
RISC Communication Instructions Designed as an
Extension to the MIPS ISA
 Targets Embedded Systems Where Clock Rate Is Not as
Big an Issue
 Routing Function Taken Off Critical Path To Improve
Single Cycle Clock Rate
 Flexible: Data can be passed as pointers to shared
memory or as 16, 32, or 64-bit scalars
 Simple: MIPS is a well-known ISA, and good
compiler tools exist
 Efficient implementation of MCAPI as a C library
utilizing instructions as inline assembly code
Multicore Association Communications API (MCAPI)
What is it?
 Lightweight Message Passing Interface
 Provides Communication Primitives
 Targeted Towards Embedded SoCs
Advantages/Disadvantages of Single Cycle Design
 Lowest Possible Latency in Cycles
ISA Benefits?
MCAPI
Communication Performance
NoC Router Design and Synthesis
Best-Case Latency (in Clock Cycles)
Router Arbitration
 Round-Robin Scheme
 Starvation Free
 Single Cycle Request/Grant Handshake Protocol
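
The round-robin scheme above can be illustrated with a small behavioral model. This is a minimal sketch, not the poster's VHDL: the class name `RoundRobinArbiter`, the five-requester setup (chosen to match the router's five physical channels), and the rotation rule are assumptions made for illustration. It shows why rotating priority is starvation free: a requester that was just granted drops to the lowest priority, so every active requester is served within one rotation.

```python
class RoundRobinArbiter:
    """Behavioral model of a round-robin arbiter (hypothetical, for illustration).

    One requester is granted per call, which stands in for one clock cycle.
    """

    def __init__(self, n):
        self.n = n
        # Start so that requester 0 has the highest priority on the first cycle.
        self.last = n - 1

    def grant(self, requests):
        """requests: sequence of n booleans. Returns the granted index or None."""
        # Search starts just after the most recently granted requester.
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if requests[idx]:
                self.last = idx
                return idx
        return None


# Assumption: five requesters, one per physical channel of the router.
arbiter = RoundRobinArbiter(5)
requests = [True, False, True, True, False]
grants = [arbiter.grant(requests) for _ in range(6)]
print(grants)
```

With requesters 0, 2, and 3 asserting continuously, the grants cycle through all three in turn, so none is starved.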
Routing Function
 Dimension Order Routing
 Deadlock Free
 Saturating Counters Choose Best VC to Use
 Reduces Worst-Case Latency
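
The poster does not say how the saturating counters are updated or compared, so the following is only one plausible reading, sketched under stated assumptions: each virtual channel (VC) has a small counter that is incremented when the VC is found busy and decremented when it is found free, and the VC with the lowest count is preferred. The names `SaturatingCounter` and `choose_vc`, the 2-bit width, and the update policy are all hypothetical.

```python
class SaturatingCounter:
    """Counter that clamps at 0 and at 2**bits - 1 instead of wrapping."""

    def __init__(self, bits=2):  # 2-bit width is an assumption
        self.max = (1 << bits) - 1
        self.value = 0

    def increment(self):
        self.value = min(self.value + 1, self.max)

    def decrement(self):
        self.value = max(self.value - 1, 0)


def choose_vc(counters):
    """Return the index of the VC with the lowest count (ties go to the lowest index)."""
    return min(range(len(counters)), key=lambda i: counters[i].value)


# Two virtual channels per physical channel, as stated on the poster.
counters = [SaturatingCounter(), SaturatingCounter()]
for _ in range(5):
    counters[0].increment()  # VC 0 repeatedly observed busy; clamps at 3
counters[1].increment()      # VC 1 observed busy once

chosen = choose_vc(counters)
print(counters[0].value, counters[1].value, chosen)
```

Clamping keeps the hardware cost to a few bits per VC while still biasing new packets away from a channel that has been congested recently.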
Physical Communication Medium is a 2-D Mesh
Network with 9 Nodes, Each Consisting of a Modified MIPS
Core, a Network Interface Unit, and an On-Chip Router
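
Dimension-order routing on such a mesh can be sketched as follows. The 3x3 arrangement of the nine nodes, the (x, y) coordinate scheme, and the function name `xy_route` are assumptions; the poster only states that the network is a 2-D mesh with nine nodes. A packet travels fully along X before it moves along Y, and because no packet ever turns from Y back to X, the cyclic channel dependencies that cause deadlock cannot form.

```python
def xy_route(src, dst):
    """Return the nodes visited from src to dst, routing X first and then Y.

    src and dst are (x, y) tuples on a 2-D mesh (coordinates are hypothetical).
    """
    x, y = src
    path = [(x, y)]
    # Correct the X coordinate first.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Only then correct the Y coordinate.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


# Assumed 3x3 mesh: route from node (0, 0) to node (2, 1).
route = xy_route((0, 0), (2, 1))
hops = len(route) - 1
print(route, hops)
```

The hop count produced here is the N used in the latency formulas.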
Worst-Case Latency (in Clock Cycles)
 F = 5 * N * Y + L (where N = number of hops, L = length of the
packet, Y = maximum packet length)
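
To get a feel for the gap between the two bounds, the worst-case formula can be evaluated next to the poster's best-case formula, F = N + L. The numbers below are hypothetical and chosen only for illustration: 4 hops, a packet of length 8, and a maximum packet length of 16, in whatever length unit the formulas assume.

```python
def best_case_latency(hops, packet_len):
    """Best case from the poster: F = N + L."""
    return hops + packet_len


def worst_case_latency(hops, packet_len, max_packet_len):
    """Worst case from the poster: F = 5 * N * Y + L."""
    return 5 * hops * max_packet_len + packet_len


# Hypothetical values, not measurements from the poster.
hops, packet_len, max_packet_len = 4, 8, 16

best = best_case_latency(hops, packet_len)
worst = worst_case_latency(hops, packet_len, max_packet_len)
print(best, worst)
```

For these values the best case is 12 cycles and the worst case is 328 cycles. The worst-case term grows with the product of hop count and maximum packet length.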
Conclusions
For the worst case to occur, five maximum-length packets
must be trying to use the same virtual channel at each
router along the path. This is very rare, so the average
case is expected to be much closer to the best-case latency.
On-Chip Router Module
 Critical Unit for Minimizing Latency
 Wormhole Flow Control
 F = N + L (where N is the number of hops and L is the length of the packet)
 Five Physical Channels with Two Virtual Channels Each
 Single-Cycle Datapath Design
Hardware Design Flow
 Synthesis of VHDL Source with Xilinx Compiler
 Target Platform: Xilinx Virtex5 FPGA
 Design Objective 1: Nine-Core MIPS Processor Running MCAPI Programs on Programmable Logic
 Design Objective 2: Platform for Testing/Implementing Research Ideas in Multicore Architectures
References
1) “Multicore Communications API Specification V1.063,” www.Multicore-Association.org
2) Mullins, West, and Moore, “Low-Latency Virtual-Channel Routers for On-Chip Networks,” ISCA 2004.
3) Vaclav Dvorak, “Communication Performance of Mesh and Ring Based NoCs,” 7th International Conference on Networking.
4) FPGA development board picture courtesy of www.digilentinc.com
Supported by SRC 2008-TJ-1847 and NSF CCF 0811429