Network on a Chip: An Architecture for the Billion Transistor Era

A. Hemani, A. Jantsch, S. Kumar, A. Postula, J. Öberg, M. Millberg, D. Lindqvist
Royal Institute of Technology, Stockholm
Jönköping University, Jönköping
University of Queensland, Brisbane
Ericsson Radio Systems, Stockholm

The Problem
• Design Productivity Gap

And it is not just more gates . . .
1970: Functionality, Testability
2000: Functionality, Testability, Wire Delay, Power Management, Embedded Software, More design choices (HW, µP, DSP, FPGA, …), Signal Integrity, RF, Hybrid Chips, Packaging

Methodologies & Platforms
Behavioural synthesis
• Solves an insignificant problem today.
• Will eventually replace and/or subsume RTL synthesis.
IP/VC based design method
• 200-400 IP/VC blocks of 100k gates each are required in the 0.1 micron era.
• Interface design is too big a problem.
Platform based design
• A step in the right direction.

Platforms
• Bus based interconnect schemes will not scale.
• FPGAs point in the right direction. Low granularity.

The Emerging Platforms & Architectures
Algorithm on a chip
• Hardwired computation
• Hardwired interconnectivity
• Centralised storage
System on a chip
• Programmable computation
• Hardwired interconnectivity
• Partially distributed storage
Network on a chip
• Programmable computation
• Programmable interconnectivity
• Fully distributed storage

Network on a chip
Generic
• Computational resources: processor cores, FPGA blocks
• Storage: distributed
• I/O
Programmable Interconnect
• All resources have an address.
• Resources are interconnected by a network of switches.
• Resources communicate by sending addressed packets of data.

Honeycomb Structure: a Possible NOC Topology
• Nodes of a honeycomb cell are populated with resources.
• A switch at the centre of each cell interconnects the resources at its nodes.
• Switches are connected to their immediate neighbours.
• Each resource is directly connected to three switches and can reach 12 resources with a single hop.
• Connectivity is further improved by directly connecting switches to their next nearest neighbours.
The Performance Overhead

Pipelined Interconnect
• Wire delays will soon require pipelined wires (Berkeley).
• In 0.1 micron technology, long-wire delay will be 100x the gate delay.
• Signals will need tens of clock cycles to cross the chip.
• Switching in a NOC provides natural pipelining.

Latency is attenuated
• Globally asynchronous, locally synchronous design style
• Switching, with low logic depth, can be the high speed clock domain.
• Computation, with high logic depth, can be the slow clock domain.
• Latency is attenuated by the ratio of communication clock to computation clock.

The Area Overhead
Area overhead
• A study by Guerrier and Greiner shows that the area overhead will not be an issue.
• A more accurate and detailed answer will have to await further research.

Design Methodology
• Design entry: set of communicating tasks
• Task graph analysis
• Scheduling policy
• Binding of tasks to resources
• Code generation for tasks

NOC Compiler
• How to map an application onto the NOC platform?

Summary
• Future systems on chip will be networks.
• Fixed platforms will facilitate design.
• Main open questions: Network topology? Network nodes? NOC Compiler?
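
As a worked illustration of the latency attenuation claim made under "The Performance Overhead", the sketch below converts a packet's network latency from communication-clock cycles into computation-clock cycles. All numbers are hypothetical and do not come from the slides; the function name and the assumption of one communication-clock cycle per switch hop are likewise illustrative.

```python
def latency_in_compute_cycles(hops, comm_clock_mhz, comp_clock_mhz,
                              comm_cycles_per_hop=1):
    """Express the network latency of one packet in computation-clock cycles.

    Assumes each switch hop costs `comm_cycles_per_hop` cycles of the fast
    communication clock (an assumption of this sketch).
    """
    comm_cycles = hops * comm_cycles_per_hop
    # Dividing by the clock ratio (communication / computation) attenuates
    # the latency as seen from the slower computation domain.
    return comm_cycles * comp_clock_mhz / comm_clock_mhz


# Hypothetical example: 20 hops, 2000 MHz switch clock, 500 MHz resource clock.
print(latency_in_compute_cycles(20, comm_clock_mhz=2000, comp_clock_mhz=500))
```

With these made-up figures the clock ratio is 4, so 20 communication cycles appear to a computational resource as 5 of its own cycles.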