Networking for Embedded Systems


Network-on-chip
Prof. Jun-Dong Cho, Sungkyunkwan University
Contents
Introduction to NoC
On-chip network architecture
On-chip network design examples
NoC design examples
A Dynamic Routing Mechanism
for Network on Chip
Technology Evolution
NoC definition
• A flexible and scalable packet-based on-chip micro-network designed according to a layered methodology
Los Angeles: reducing commute time by 15 min -> $15B economic impact
On-chip communication will dominate performance and power efficiency.
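Why NoC is needed
Wireless processing systems demand high throughput and heavy computation, yet operate under strict power constraints.
Reconfigurable SoC implementations seek performance gains through parallelism and rely on IP reuse.
Algorithm partitioning guided by performance prediction of hot-spot bottlenecks (or traffic).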
Network-on-chip Architecture
Design challenges for On-Chip
Communication Architectures
Three System-on-chip design issues
Technology issues
Performance issues
Design productivity issues
Design challenges for On-Chip Communication Architectures
• Technology issues:
– Limit the on-chip distance travelled by critical signals, because of global wire delay.
– Use self-synchronous cores that communicate with one another through a network-centric architecture, to avoid deep sub-micron effects (clock skew, power associated with clock distribution trees).
– Signal integrity issues can be addressed by designing regular structures, which allow the electrical parameters of wires to be optimised and well controlled.
Design challenges for On-Chip Communication Architectures
• Performance issues:
Network congestion can cause large fluctuations in packet delivery latency. There are two ways to address this problem:
– Network overdimensioning (for NoCs that support best-effort traffic).
– Dedicated mechanisms that provide guarantees for timing-constrained traffic (e.g., lossless data transport, minimal bandwidth, bounded latency, throughput).
Design challenges for On-Chip Communication Architectures
• Design productivity issues:
– Reusing complex, pre-verified design blocks is an efficient means of increasing productivity.
– Using processing elements across different platforms in a plug-and-play style requires a scalable and modular on-chip network.
– A standard network makes processing elements easier to reuse, which is what makes the modularity of a NoC effective.
– Some standards for networks-on-chip have been proposed, such as those of the Virtual Socket Interface Alliance (VSIA) and the Open Core Protocol (OCP).
Network-on-chip Architecture
• Network Interface (NI):
– Hides the details of the network communication protocol from the cores, so that cores can be developed independently of the communication infrastructure.
– Communication protocol conversion (from the end-to-end protocol to the network protocol).
– Data packetization (packet assembly, delivery and disassembly), a critical task. Messages to be transmitted across the network are partitioned into fixed-length packets. Packets are in turn broken into flits (flow-control units), which represent logical units of information. A phit (physical unit) is the unit of information that can be transferred across a physical channel.
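The message, packet, flit and phit hierarchy described above can be sketched in a few lines of Python. This is a minimal illustration, not the NI of any particular NoC: the function names, the zero-padding of the last packet, and the 8/4/2-byte sizes are all assumptions made for the example, and packet headers are omitted.

```python
from typing import List

def packetize(message: bytes, packet_size: int) -> List[bytes]:
    """Split a message into fixed-length packets, zero-padding the last one."""
    packets = []
    for start in range(0, len(message), packet_size):
        chunk = message[start:start + packet_size]
        packets.append(chunk.ljust(packet_size, b"\x00"))
    return packets

def split_units(data: bytes, unit_size: int) -> List[bytes]:
    """Split data into equal-size units (packet -> flits, or flit -> phits)."""
    if len(data) % unit_size != 0:
        raise ValueError("data length must be a multiple of the unit size")
    return [data[i:i + unit_size] for i in range(0, len(data), unit_size)]

def depacketize(packets: List[bytes], original_length: int) -> bytes:
    """Reassemble the message at the receiving NI and drop the padding."""
    return b"".join(packets)[:original_length]

# Hypothetical sizes: 8-byte packets, 4-byte flits, 2-byte phits.
message = b"hello, network-on-chip"
packets = packetize(message, packet_size=8)
flits = [split_units(p, 4) for p in packets]                # flits per packet
phits = [split_units(f, 2) for pkt in flits for f in pkt]   # phits per flit
```

In a real NI the first flit of each packet would also carry routing information, which is why the phit width (set by the physical channel) and the flit size (set by flow control) are chosen separately.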
Network-on-chip Architecture
• Network Switch:
– Carries packets injected into the network to their final destination, following a statically defined or dynamically determined routing path.
– A switch may have both input and output buffers, or only one of the two.
– Network flow control (routing mode) addresses the limited amount of buffering resources.
– Three network flow-control policies are:
Store-and-forward routing: an entire packet is received and stored before being forwarded to the next switch.
Virtual cut-through routing: also requires buffer space for an entire packet, but allows lower-latency communication.
Network-on-chip Architecture
• Network Switch:
– Three network flow-control policies are (cont.):
Wormhole routing: reduces switch memory requirements and permits low-latency communication. The first flit is decoded and the switch creates a path for the following flits. A flit is passed to the next switch as soon as that switch has enough space to store it, even if there is not enough space to store the whole packet.
– Quality-of-service (QoS) guarantees must be built into switch operation when time-constrained traffic is to be supported.
– Contention-related delays are responsible for large fluctuations in performance metrics.
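To make the difference between the three flow-control policies concrete, the sketch below uses a simplified textbook zero-load model (no contention, one flit per cycle per link). The parameter names `hops`, `packet_flits` and `router_delay`, and the one-flit minimum buffer for wormhole routing, are assumptions for illustration and do not come from the slides.

```python
def store_and_forward_latency(hops: int, packet_flits: int, router_delay: int = 1) -> int:
    """Each switch waits for the whole packet before forwarding it."""
    return hops * (router_delay + packet_flits)

def cut_through_latency(hops: int, packet_flits: int, router_delay: int = 1) -> int:
    """Virtual cut-through and wormhole: the header is forwarded at once
    and the remaining flits follow in a pipeline."""
    return hops * router_delay + packet_flits

def min_buffer_flits(policy: str, packet_flits: int) -> int:
    """Smallest per-switch buffer (in flits) each policy can work with."""
    if policy in ("store_and_forward", "virtual_cut_through"):
        return packet_flits   # must be able to hold a blocked packet
    if policy == "wormhole":
        return 1              # assumption: a single flit slot suffices
    raise ValueError(f"unknown policy: {policy}")

# Hypothetical case: a 16-flit packet crossing 4 switches.
saf = store_and_forward_latency(hops=4, packet_flits=16)   # 4 * (1 + 16) = 68 cycles
ct = cut_through_latency(hops=4, packet_flits=16)          # 4 * 1 + 16 = 20 cycles
```

Under this model virtual cut-through and wormhole have the same zero-load latency; they differ in buffer cost and in how a blocked packet behaves, since a blocked wormhole packet stays spread over several switches.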
From Spaghetti Wires to NoC
Marcello Coppola, MPSOC05
On-chip communication Infrastructure
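On-chip network architecture
● Development of router/scheduler algorithms
● Network model design and verification using SystemC
● Design of core IP for star and mesh on-chip networks
● Design of master/slave network interfaces and a high-performance memory management interface
On-chip-network-based SoC design platform
● Development of a tool for distributed crossbar switch topology generation and IP mapping
● Development of an IP-to-mesh-tile mapping tool
● Development of a network topology generation tool based on analysis of data flow between IPs; construction of the SoC platform
Application areas
- Supports protocols that guarantee QoS, making it suitable for real-time applications and for applications that require high data bandwidth
- Multimedia SoCs and the system-level chips needed to build products such as portable and communication terminals, Internet set-top boxes, game consoles and network terminals
- SoC design for data-intensive multimedia applications such as high-frame-rate video and 3D graphics
- A platform-based design environment that combines the core on-chip network IP and the design support tools into a single platform, to be used across a variety of SoC designs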
On-chip communication
Putting the blocks together posed tough questions:
• Do the hardware interfaces work with one another?
• Does the chip have enough bus and memory bandwidth under worst-case loads?
• Do software tasks communicate without deadlock?
• Do all applications and features of the full system meet functional goals?
• Does the system meet performance goals?
• Are the cost and power acceptable?
IBM’s CoreConnect
Bandwidth expanded from the initial 32 bits up to 128 bits
Sonics Smart Interconnect IP
SMART (Sonics Methodology and Architecture for Rapid Time-to-Market)
Plug-and-play on-chip communications network
Packet-based
50 employees in a year
Provides IP and a design environment; supports SoC design
Partnered with Cadence
SiliconBackplane III targets communications + media
Arteris NoC layered
architecture
OCN Configuration
• A regular interconnect structure and static scheduling eliminate unnecessary interconnect switching
• Balancing the computational load across all cores improves performance
• Overhead of the configuration streams
– Configuration streams must be scheduled periodically along with the data
– Configuration streams use 4% of the bandwidth
• Depending on data content variation and the system operating environment, the core interfaces and the cores themselves are dynamically reconfigured into low-power modes
Scheduled Communication
• A tile is a computational core
• The core interface makes heterogeneous processing possible
• Statically scheduled mesh of interconnect
• Data moves between neighbouring tiles through a communication pipeline, which allows a fast clock rate and time-sharing of interconnect resources
• The ability to reconfigure the cores and the interconnect at run time enables dynamic power management
Adaptive System on Chip
Communication Interface
- Stream data that passes through a communication interface is scheduled for a specific communication clock cycle based on data link availability.
- The result of scheduling for each interface is a set of instructions for its associated interconnect memory.
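The idea of assigning each stream to specific clock cycles according to link availability, and emitting a per-interface instruction table, can be sketched with a toy greedy scheduler. This is only an illustration under stated assumptions, not the scheduler used by aSOC: the function name, the tile names, the fixed schedule `period`, and the rule that hop k of a stream uses its k-th link one cycle after the previous hop are all hypothetical.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]  # (sending tile, receiving tile)

def schedule_streams(streams: Dict[str, List[Link]], period: int):
    """Greedily place each stream in the earliest slot where every link on
    its path is free, then record the slot in each sending tile's table."""
    busy = set()  # reserved (link, slot) pairs
    instructions: Dict[str, Dict[int, List[Tuple[str, str]]]] = {}
    for name, path in streams.items():
        for start in range(period):
            # Pipelined transfer: hop k occupies link k in slot start + k.
            slots = [(link, (start + hop) % period) for hop, link in enumerate(path)]
            if all(s not in busy for s in slots):
                break
        else:
            raise RuntimeError(f"no free slot for stream {name!r}")
        for (src, dst), slot in slots:
            busy.add(((src, dst), slot))
            instructions.setdefault(src, {}).setdefault(slot, []).append((name, dst))
    return instructions

# Hypothetical streams on three tiles in a row, with a 4-cycle schedule.
table = schedule_streams(
    {"video": [("T0", "T1"), ("T1", "T2")], "audio": [("T0", "T1")]},
    period=4,
)
```

Here `table["T0"]` plays the role of the interconnect memory contents for tile T0: in slot 0 it forwards the `video` stream to T1, and in slot 1 it forwards `audio` over the same link.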
9-core and 16-core Mode
Evaluation Methodology
Performance of the
Benchmarks
iSOC Compiler
- Divides applications into parts, each of which fits into a specific core.
- Determines data communications between the cores in a space-time fashion.
- Generates interconnect memory contents for each individual interface.
References
• Jian Liang, Sriram Swaminathan, and Russell Tessier, "aSOC: A Scalable, Single-Chip Communications Architecture", University of Massachusetts, Amherst, MA 01003. {jliang, tessier}@ecs.umass.edu
• Krishna Sekar, Kanishka Lahiri, and Sujit Dey, "Configurable Platforms With Dynamic Platform Management: An Efficient Alternative to Application-Specific System-on-Chips", Dept. of ECE, UC San Diego, La Jolla, CA; NEC Laboratories America, Princeton, NJ
Benchmarks, EE Times, 7/2005
• Xpipes (Bologna and Stanford): compared with an AMBA AHB multilayer bus, 21% faster but with worse latency
• Wehn (Univ. of Kaiserslautern), LDPC decoder: 500 MHz vs. 64 MHz (fixed bus), but 30 W vs. 700 mW and twice the die size
• Arteris: better die size, comparable power consumption, 740 MHz (250 MHz)
• SonicsMX: power-efficient mobile handsets with power management
• STNoC, Spidergon: topology with node degree 2-3
NoC Applications
http://www.eit.uni-kl.de/wehn
• Turbo decoder, UMTS compliant, 100 Mbit/s: large flexibility with 14 parallel units, area = 16.84 mm² (14 mm² PUs, 2.8 mm² NoC)
• LDPC decoding, T. Theocharides, G. Link, N. Vijaykrishnan, M. J. Irwin, Int. Conference on VLSI Design 2005
– 1024-bit block size, 1.2 Gb/s, R = 0.75
– NoC: 5x5 2D mesh, dimension-order routing, large flexibility
– 160 nm CMOS technology, 1.8 V, 500 MHz, 110 mm², ~30 W
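Dimension-order routing on a 2D mesh, as used in the 5x5 LDPC decoder NoC above, can be illustrated with the common XY variant: a packet first travels along the X dimension until its column matches the destination, then along Y. The sketch below is a generic illustration, not the router from the cited design; the function name and the (x, y) coordinate convention are assumptions.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (x, y) coordinates of a mesh tile

def xy_route(src: Node, dst: Node, size: int = 5) -> List[Node]:
    """Return the tiles visited by XY dimension-order routing on a size x size mesh."""
    for x, y in (src, dst):
        if not (0 <= x < size and 0 <= y < size):
            raise ValueError("node outside the mesh")
    x, y = src
    path = [(x, y)]
    while x != dst[0]:                 # resolve the X dimension first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                 # then resolve the Y dimension
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

route = xy_route((0, 0), (3, 2))
hops = len(route) - 1   # 3 hops in X plus 2 hops in Y
```

Because every packet finishes its X hops before turning into Y, the routing is deterministic and free of routing-induced deadlock on a mesh, at the cost of not being able to route around congested links.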