cpt3 - NDSU Computer Science


Computer Networks (CS 778)
Chapter 3: Packet Switching data over long distances (not just 1 link).

2 Packet Switching approaches: connection-oriented and connectionless

Forwarding or Switching: Routing packets from an input to the right output.

Key problem a packet switch must deal with is finite bandwidth of its outputs.

Contention: Packets arrive for an output faster than its capacity (buffered)

Congestion: Switch runs out of buffer space.

This chapter deals with forwarding and contention in packet switches

LAN Switching & ATM Switching: the 2 main packet switching technologies.
Switching/Forwarding
(layer? OSI network; Internet IP; ATM).
Switch: multi-input, multi-output device which transfers packets from an input to 1 output (called switching or forwarding).
Assume bi-direction links (in a wire world, a link is an inPort-outPort pair).
Switches add the star topology to the topologies we’ve seen so far (pt-pt, bus, ring).
Star allows a hierarchy and virtually unlimited size.
Stars are scalable (adding host to a switch w/o decreasing performance for others,
assuming switch “backplane” bandwidth is sum of link bandwidths)
How does a switch decide which output to place packet on? Several approaches:
Datagrams (connectionless approach)
Virtual Circuits (connection-oriented approach)
Source Routing (simple approach – less common than the other two)
(Switch below has two T3 links and one STS-1 SONET link)
Event Timing
Datagrams (and datagram networks)







No setup phase (connectionless model)
Each packet contains full destination address
Hosts never know if the network can deliver or even if the destination can receive.
Forwarding tables: eg, at SW2: A,C,D->3 B,G,H->0 E->2
Table creation (Cpt4) is hard: topology may change or there may be multiple paths

E.g., successive packets from A to B can follow different paths
A switch or link failure may not preclude communication.

tables may be updated to route around the failure.
This capability goes back to the ARPANET (forerunner of the internet)

Since it was a military network, this capability was essential.
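The SW2 forwarding table above can be read as a plain lookup from destination address to output port. The following is a minimal sketch, assuming single-letter host names as in the example; `SW2_TABLE` and `forward` are illustrative names, not part of any real switch API. A destination with no entry returns `None` here; what a real switch does in that case is outside this sketch.

```python
# Forwarding table for SW2, copied from the example: destination -> output port.
SW2_TABLE = {"A": 3, "C": 3, "D": 3, "B": 0, "G": 0, "H": 0, "E": 2}


def forward(table, dest):
    """Return the output port for dest, or None if the table has no entry."""
    return table.get(dest)


if __name__ == "__main__":
    for dest in ["A", "B", "E"]:
        print(dest, "->", forward(SW2_TABLE, dest))
```

Each packet is looked up independently, which is why successive packets can take different paths once the table changes.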
Virtual Circuit (VC) Switching (connection-oriented)

Setup phase: establishes connection state in each switch along the connection, either by the
System Admin: for a long-lived permanent virtual circuit (PVC), or by the
Host: sends a setup request into the net for a switched virtual circuit (SVC); this is called signaling.
Transfer Phase
Teardown Phase.

All packets follow the same circuit (Analogous to phone calls)

Each switch keeps a VC table with VC state-entries:
| inPort | inVCI | outPort | outVCI |


(VCI=VC-identifier)
Combination of (inPort, inVCI) uniquely identifies VC thru a particular link.
VCIs are not globally unique (in fact, inVCI & outVCI usually differ)
- VCIs have link-local scope. WHY??
Virtual Circuit (Continued)


For a PVC, Network Administrator picks unused VCI for each link (e.g., 5,11,7,4):
VC-table entries at each switch are:
| Switch | inPort | inVCI | outPort | outVCI |
| SW1    | 2      | 5     | 1       | 11     |
| SW2    | 3      | 11    | 0       | 7      |
| SW3    | 0      | 7     | 3       | 4      |
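The per-switch VC tables can be exercised by following one packet hop by hop. This is a sketch under stated assumptions: the table entries are the ones from the PVC example (SW1: 2,5,1,11; SW2: 3,11,0,7; SW3: 0,7,3,4), while the `LINKS` wiring (SW1 port 1 to SW2 port 3, SW2 port 0 to SW3 port 0) is inferred from those entries and is not stated explicitly in the notes. All names are hypothetical.

```python
# (inPort, inVCI) -> (outPort, outVCI), one table per switch.
VC_TABLES = {
    "SW1": {(2, 5): (1, 11)},
    "SW2": {(3, 11): (0, 7)},
    "SW3": {(0, 7): (3, 4)},
}

# Assumed wiring: (switch, outPort) -> (next switch, its inPort).
LINKS = {("SW1", 1): ("SW2", 3), ("SW2", 0): ("SW3", 0)}


def trace(switch, in_port, vci):
    """Follow a packet along the circuit, rewriting its VCI at every hop."""
    hops = []
    while True:
        out_port, out_vci = VC_TABLES[switch][(in_port, vci)]
        hops.append((switch, in_port, vci, out_port, out_vci))
        nxt = LINKS.get((switch, out_port))
        if nxt is None:  # out_port leads to a host, not another switch
            return hops
        switch, in_port = nxt
        vci = out_vci


if __name__ == "__main__":
    for hop in trace("SW1", 2, 5):
        print(hop)
```

The packet only ever carries the VCI for the link it is currently on, which is why the identifier can be small and link-local.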
Virtual Circuit (Continued)




How is signaling done for SVCs? (setup communication)
hostA sends SetupMessage (SM) to SW1 (with at least hostA, hostB addrs)
SM flows on SW2 -> SW3 -> hostB (How? Routing details later.)
Each SW sets table entry: (inPort, inVCI, outPort,__) (chooses an unused InVCI)


Note that the switch (or host, for the last link) chooses the InVCI for the link coming into it.
VC-table entries would be (outVCI not yet known):
| Switch | inPort | inVCI | outPort | outVCI |
| SW1    | 2      | 5     | 1       |        |
| SW2    | 3      | 11    | 0       |        |
| SW3    | 0      | 7     | 3       |        |
Virtual Circuit (Continued)






hostB gets SetupMessage (SM). If willing to accept the connection, it attaches OutVCI=4 to the ack.
Sends ack back upstream: hostB -> SW3 -> SW2 -> SW1 -> hostA.
Each SW completes its VC-table entry and sends the ack on with the appropriate link-VCI.
VC-table entries would be
| Switch | inPort | inVCI | outPort | outVCI |
| SW1    | 2      | 5     | 1       | 11     |
| SW2    | 3      | 11    | 0       | 7      |
| SW3    | 0      | 7     | 3       | 4      |
SW1 sends ack to hostA specifying VCI=5. The setup phase is complete.
Second stage is data transfer. Third stage is connection teardown (when done sending)


HostA sends teardown message (TD) to SW1 (SW1 removes table entry)
TD is sent SW1 - > SW2 - > SW3 - > hostB, each SW does similarly.
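The two passes of SVC setup (setup message forward, ack backward) can be sketched as follows. This is an illustration only: `setup_vc` and its parameters are hypothetical names, routing of the setup message is taken as given, and the VCI choices (5, 11, 7 by the switches, 4 by hostB) are the ones from the example.

```python
def setup_vc(path, in_vcis, dest_vci):
    """Build VC-table entries for one circuit.

    path:     list of (switch, in_port, out_port), source side first
    in_vcis:  unused VCI each switch picks for its incoming link
    dest_vci: VCI the destination host picks for its incoming link
    """
    # Forward pass (setup message): everything but outVCI is known.
    entries = [
        {"switch": sw, "inPort": ip, "inVCI": vci, "outPort": op, "outVCI": None}
        for (sw, ip, op), vci in zip(path, in_vcis)
    ]
    # Backward pass (ack): each node tells its upstream neighbour the VCI
    # it chose for the link between them.
    downstream_vci = dest_vci
    for entry in reversed(entries):
        entry["outVCI"] = downstream_vci
        downstream_vci = entry["inVCI"]
    # The last value handed upstream is the VCI the source host must use.
    return entries, downstream_vci


if __name__ == "__main__":
    path = [("SW1", 2, 1), ("SW2", 3, 0), ("SW3", 0, 3)]
    entries, host_vci = setup_vc(path, [5, 11, 7], 4)
    for e in entries:
        print(e)
    print("hostA sends with VCI", host_vci)
```

Teardown would simply remove the matching entry at each switch along the same path.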
Virtual Circuit (versus Datagram)
Virtual Circuit Model:







Typically 1 RTT (setup) before 1st data packet is sent.
Data packets have only a small identifier (setup mess has full destination address)
(per-packet header overhead is small)
If switch/link fails, connection is broken and a new one needs to be set up.
Host reserves resources at setup, gets much info (net is able to transmit, dest is able to receive)
VCI service is local (no global server, which would involve constant communication overhead)
The most popular VC technologies are:
OSI X.25 uses VC model in a 3 part strategy:



Buffers are allocated along the VC when circuit is initialized.
Sliding window is run between pairs of VC nodes for error correction and flow control (called hop-by-hop flow control)
Circuit setup is rejected by any node with insufficient buffer availability
Thus, there is contention, but never congestion.
Frame Relay is a straight-forward implementation of VC technology.
Extremely popular due to its simplicity (Frame Relay PVCs provide almost leased-line-like service)
Some basic QoS and Congestion-avoidance is provided, but it’s minimal.
ATM (coming up in 3.3)
Datagram (versus Virtual Circuit)
Datagram Model:




There is no round trip time delay waiting for setup.
(Host can send data when ready.)
Source doesn’t know if network can deliver packet or
even if the intended destination is up and accepting packets
Since packets are treated independently,
it is possible to route around link/node failures.
Since every packet must carry the full destination address,
per packet overhead is higher than for the
connection-oriented model.
Source Routing





Uses neither Virtual Circuits nor conventional datagrams.
Address contains entire sequence of out-Ports on source-to-destination path.
List is rotated so the next out-Port is always in front.
Problems? May be difficult for source to know route. Header must be variable size.
Alternatives to rotating OutPort Addresses:

Stripping: Each SW strips off its outPort (eg, 3,0,1 to 3,0 at SW1)

Out-Port Pointer in fixed position in header
(eg, |ptr| 3 | 0 | 1 | with ptr pointing at 1 --> |ptr| 3 | 0 | 1 | with ptr pointing at 0, at SW1).
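The three header-handling options (rotation, stripping, pointer) differ only in how a switch marks its own out-port as consumed. The sketch below follows the notes' example header 3,0,1, where the first switch uses the rightmost entry (1). Function names are hypothetical, and the header is modelled as a Python list rather than real packet bytes.

```python
def rotate(header):
    """Use the last entry as out-port, then rotate it to the other end."""
    port = header[-1]
    return port, [port] + header[:-1]


def strip(header):
    """Use the last entry as out-port and remove it from the header."""
    return header[-1], header[:-1]


def advance_pointer(header, ptr):
    """Use the entry at ptr as out-port; header is unchanged, ptr moves on."""
    return header[ptr], header, ptr - 1


if __name__ == "__main__":
    print(rotate([3, 0, 1]))
    print(strip([3, 0, 1]))
    print(advance_pointer([3, 0, 1], 2))
```

Rotation and the pointer scheme keep the full route in the packet (useful for building a reverse route), while stripping shrinks the header at every hop.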
Source Routing (continued)


Source routing can be used in both datagram networks and VC networks.
Internet Protocol includes source routing option.




Selected packets can be source routed.
However, the majority are switched datagrams.
Some VC nets use source routing to get VC setup request along path.
Source routing suffers from poor scalability

(Hard for a host to know the complete route in large net).
Performance
A switch can be built from a general-purpose workstation.
In fact, Unix provides this capability in the kernel
(We will consider special-purpose switch hardware later.)





Install multiple NICs (Network Interface Cards).
Use DMA for transferring packets between MM and the NICs.
Build and manage your own buffers.
CPU needs to inspect only the header information to determine out-Port.
Usually the bottleneck is I/O bus bandwidth (all packets must go thru I/O bus)

Such a switch will have severe limitation on aggregate back-plane bandwidth.
[Figure: workstation used as a switch: CPU, main memory, and Interfaces 1-3 attached to a shared I/O bus]
Forwarding vs Routing
Forwarding: select outPort based on dest-address and forwarding table
Routing: process by which forwarding table is built.

Bridge: A forwarding-switch (between LANs, e.g., Ethernets)

AKA: LAN switch or LAN Bridge

For Ethernets, one could use a repeater (to forward the signal), but repeaters impose size limitations.
Could implement using a node in promiscuous mode between 2 Ethernets (forwarding all packets).
Intelligent bridges (learning bridges) don’t forward all packets; they use a forwarding table (Host -> Port).
Table starts empty; for each packet received, record the sender’s port.
If the destination host is not in the table, forward to all other ports (table is just a filter).




All entries time out after a fixed time to protect against inaccuracies due to host removal.
Loops can form (causing frames to loop forever). Thus, bridges run a distributed spanning tree algorithm.
Think of the bridge-extended LAN as a graph (vertexes=bridges, edges=connections).
Spanning tree is acyclic sub-graph which covers (spans) all vertexes.
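The learning-bridge behaviour described above (learn the sender's port, filter or forward by table, flood on a miss, age entries out) can be sketched as below. `LearningBridge`, its method names, and the 300-second default timeout are assumptions made for illustration; the notes only say entries expire after a fixed time. The spanning tree algorithm is not modelled.

```python
class LearningBridge:
    def __init__(self, ports, timeout=300.0):
        self.ports = set(ports)
        self.timeout = timeout  # assumed value, in seconds
        self.table = {}         # host -> (port, time the entry was learned)

    def receive(self, src, dst, in_port, now):
        """Return the list of ports the frame is forwarded to."""
        self.table[src] = (in_port, now)  # learn / refresh the sender
        entry = self.table.get(dst)
        if entry is not None and now - entry[1] <= self.timeout:
            out_port = entry[0]
            # Destination is on the arrival segment: filter the frame.
            return [] if out_port == in_port else [out_port]
        # Unknown or expired destination: flood to all other ports.
        return sorted(self.ports - {in_port})


if __name__ == "__main__":
    bridge = LearningBridge([1, 2, 3])
    print(bridge.receive("A", "B", 1, now=0.0))  # B unknown: flood
    print(bridge.receive("B", "A", 2, now=1.0))  # A known: single port
```

Flooding on a miss is what makes the table "just a filter": correctness never depends on the table being complete.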
[Figure: Network as a Graph: nodes A-F connected by weighted edges]
Asynchronous Transfer Mode (ATM)


Connection-oriented, packet-switched network (Virtual Circuit)
Used for both WANs & LANs (but predominantly in long haul WANs today)

Specified by ATM Forum (www.atmforum.org)

Commonly transmits over SONET at the physical level (but not a requirement)

QoS capabilities are one of the strong selling points.

Fixed length Packets = 53 byte cells: 5-byte header + 48-byte payload.
48 byte payload was a compromise (US bid for 64B and Europe bid for 32B).
When any VC is set up, dest address must appear in signaling message.
ATM uses 1 of several dest addr formats (different from MAC addr in LANs)
Two examples (detail later):
- NSAP (Network Service Access Point)
- E.164
A little history on ATM
Part of the B-ISDN standard of ITU in 1984.
B-ISDN was motivated by PCs demanding higher bandwidths and lower error rates
- was to replace separate Telephone network infrastructure & data networks
- was to allow integration on one digital network fabric.
- was to scale to gigabit speeds
- was to provide a flexible way to divide bandwidth into chunks for different traffic
1988 ITU chose ATM as underlying switching/multiplexing technology for B-ISDN.
1991 ATM Forum was founded to replace ITU as the standards body for ATM.
Planned Benefits of ATM:
- Efficient use of network bandwidth (bandwidth on demand)
- Scalability (LAN-WAN, # of users, speed)
- Low latency and low latency variation (virtual circuit and pre-negotiated QoS)
- Transparency to existing Applications
- Integrated Service
- Internetwork-able with existing WANs
- Support both constant and variable bit rates
Cells
(Variable versus Fixed-Length? Size?)
Fixed-length easier to switch in hardware, simpler, but no optimal length



if small: header-to-data overhead is high
if large: low utilization for small messages
Small size provides a finer-grained pre-emption point for scheduling a link, e.g.,






maximum packet = 4KB = 4096 bytes
link speed = 100Mbps
transmission time = 4096 x 8 bits/packet / 100 = 327.68μs / packet
Thus, a high priority packet may sit in the queue for 327.68μs
in contrast, 53 x 8 / 100 = 4.24μs / packet for ATM
Near cut-through behavior, e.g.,
- two 4KB packets arrive at same time
- link idle for 327.68μs while both arrive
- at end of 327.68μs, still have 8KB to transmit
- in contrast to 53-byte cells, where host can transmit first cell after 4.24μs and at the end of 327.68μs there would be just over 4KB left in queue
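The numbers in this example can be reproduced with a few lines of arithmetic. The sketch below uses the notes' figures (4096-byte packets, 53-byte cells, 100 Mbps link) and, like the notes, ignores cell header overhead when estimating how much is left in the queue; the function name is an illustrative choice.

```python
def transmission_time_us(size_bytes, link_mbps):
    """Time to clock size_bytes onto the link, in microseconds.

    bits / (Mbit/s) gives microseconds directly.
    """
    return size_bytes * 8 / link_mbps


packet_us = transmission_time_us(4096, 100)  # about 327.68 us
cell_us = transmission_time_us(53, 100)      # about 4.24 us

# Two 4 KB packets arrive in parallel over packet_us. With cells, the output
# can start sending once the first cell has arrived, i.e. after cell_us.
bytes_sent = (packet_us - cell_us) * 100 / 8
bytes_left = 2 * 4096 - bytes_sent           # about 4149 bytes, just over 4 KB

if __name__ == "__main__":
    print(f"packet: {packet_us:.2f} us, cell: {cell_us:.2f} us")
    print(f"left in queue with cells: {bytes_left:.0f} bytes")
```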
Cell Format

User-Network Interface (UNI) (cell format shown above) (host-to-switch format)
- GFC: Generic Flow Control (intended for traffic control across the user-network interface; not used)
- VCI: Virtual Circuit Identifier
- VPI: Virtual Path Identifier (size goes to 12 bits for NNIs, when GFC goes away)
- Type:
  1st bit: specifies management versus data cells
  2nd bit: (for data cells) EFCI (Explicit Forward Congestion Indication), set by switches about to become congested
  3rd bit: user signalling (used in conjunction with AAL-5 to delineate frames)
- CLP: Cell Loss Priority
  Set by source host if cell can be dropped without serious damage to message
- HEC: Header Error Check (CRC-8)
Network-Network Interface (NNI)
- switch-to-switch format
- GFC becomes part of larger VPI field
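To make the field layout concrete, here is a sketch that packs and unpacks the 5-byte UNI header using the standard field widths: GFC 4 bits, VPI 8, VCI 16, Type 3, CLP 1, HEC 8. The function names are hypothetical, and the HEC is carried as an opaque byte supplied by the caller; computing the CRC-8 is deliberately left out.

```python
def pack_uni_header(gfc, vpi, vci, pt, clp, hec=0):
    """Pack UNI header fields into 5 bytes (HEC is not computed here)."""
    word = (
        (gfc & 0xF) << 28
        | (vpi & 0xFF) << 20
        | (vci & 0xFFFF) << 4
        | (pt & 0x7) << 1
        | (clp & 0x1)
    )
    return word.to_bytes(4, "big") + bytes([hec & 0xFF])


def unpack_uni_header(header):
    """Split a 5-byte UNI header back into its fields."""
    word = int.from_bytes(header[:4], "big")
    return {
        "gfc": (word >> 28) & 0xF,
        "vpi": (word >> 20) & 0xFF,
        "vci": (word >> 4) & 0xFFFF,
        "pt": (word >> 1) & 0x7,
        "clp": word & 0x1,
        "hec": header[4],
    }


if __name__ == "__main__":
    hdr = pack_uni_header(gfc=0, vpi=1, vci=5, pt=0, clp=1)
    print(hdr.hex(), unpack_uni_header(hdr))
```

For the NNI format the same 32 bits would be split as a 12-bit VPI with no GFC.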
ATM Model

| VOICE | VIDEO | DATA       |
| ATM Adaptation Layer (AAL) |
| ATM Layer                  |
| Physical Layer             |
Physical Layer


physical interfaces and framing protocols
Several ATM Forum specs for physical connectivity between devices:
- DS-1 or T1 at 1.544 Mbps
- DS-3 or T3 at 45 Mbps
- 100 Mbps access using FDDI standard
- 155 Mbps access using Fibre Channel standard on multimode fiber
- SONET (nonUS=SDH, Synchronous Digital Hierarchy; single/multimode fiber at N*51.84 Mbps)
SONET is predominant physical layer
| LEVEL | LINE-RATE    |
| OC-1  | 51.84 Mbps   |
| OC-3  | 155.52 Mbps  |
| OC-12 | 622.08 Mbps  |
| OC-48 | 2488.32 Mbps |


ATM Adaptation Layer (AAL)
AAL is the Interface between user applications and the ATM Layer

Performs SAR: segmentation of packets into ATM cells and reassembly of ATM cells into packets.

Also detects and handles out of order or lost cells.

Supports ATM Application Level Service Classes




CBR (Constant Bit rate) Reserves a set bandwidth end-to-end.
VBR (Variable Bit Rate) bursty traffic (realtime, non-rt; Reserves amt of variable bdwd)
ABR (Available Bit Rate) Min bandwidth and burst above it w/o cell loss.
UBR (Unspecified Bit Rate) best effort service similar to Internet
| Serv Class | Traffic descriptors (at Call Setup)                  | QoS Parameters                                                               |
| CBR        | PCR (Peak Cell Rate)                                 | CTD (Cell Transfer Delay), CDV (Cell Delay Variation), CLR (Cell Loss Ratio) |
| rt-VBR     | PCR, SCR (Sustained Cell Rate), MBS (Max Burst Size) | Maximum CTD, peak-to-peak CDV, CLR                                           |
| ABR        | PCR, MCR (Min Cell Rate)                             | CLR                                                                          |
| UBR        | PCR                                                  |                                                                              |
Intended Uses (in the order listed): realtime Video, Voice, compressed Voice, compressed Video, rt-OLTP, RPC, NFS/DDBMS; FTP (file trans) for UBR
Four AAL protocols were originally defined (AAL-1, AAL-2, AAL-3, AAL-4),
then AAL-3 and AAL-4 were merged into AAL-3/4, then AAL-5 was added.
Segmentation and Reassembly
[Figure: User Packets segmented into ATM Cells]
ATM Adaptation Layer (AAL)



AAL 1,2 designed for apps needing guaranteed rate (voice, video; CBR, rt-VBR)
AAL 3/4 designed for packet data (nrt-VBR)
AAL 5 alt standard for packet data (LAN traffic; connection/connectionless VBR)
Segmentation and Reassembly (details)
(Convergence sublayer of the AAL layer provides an interface to the application)
(SAR sublayer converts messages to cells)

AAL-1
AAL-1 is the protocol used for real-time, constant-bit-rate, connection-oriented traffic

E.g., Uncompressed audio and video
Bits are fed in by the application at a constant rate and must be delivered at the same rate
with minimum delay, jitter(variation in rate) and overhead

One byte (or two) of ATM payload is used for control information




P-cells are used when message boundaries must be preserved (Pointer gives the offset to the start of the next message in number of bytes)
SN is the cell sequence number
SNP is the sequence number protection field: a checksum over the sequence number (CRC-3) plus an even parity bit that further reduces the likelihood of a bad SN

AAL-2
AAL-2 is the protocol used for compressed, variable-bit-rate, connection-oriented traffic



E.g., Compressed audio and video
Bit rate can vary strongly over time
One byte (or two) of ATM payload is used for control information




SN is the cell sequence number
IT stands for Information Type and is used to indicate that the cell is the start/middle/end of message
LI is the length indicator (tells how big the payload is in bytes; could be less than 45)
CRC is a checksum for the entire cell
AAL-2 Cell Format
AAL 3/4
Convergence Sublayer Protocol Data Unit (CS-PDU = AAL3/4 packet), field widths in bits:
| CPI | Btag | BASize | USER DATA   | Pad  | 0 | Etag | Length |
| 8   | 8    | 16     | < 64 Kbytes | 0-24 | 8 | 8    | 16     |
CPI: common part indicator (CS-PDU version); Btag/Etag: begin/end tag; BASize: buffer size hint;
User data: AAL variable-length payload; Length: PDU size
Originally ITU had different protocols for connection-oriented and connectionless service for data transport,
i.e., traffic that is sensitive to loss and errors but not time dependent. Then they discovered there was no need for 2 protocols,
so they were combined into AAL-3/4, which can operate in stream mode (no message boundary maintained) or message mode
and provides both reliable and unreliable transport as well as multiplexing (not available in any of the others), which
allows a host the option of multiplexing multiple sessions onto one VC (saves money, since charging is done by the VC):
ATM cell format for AAL3/4 (field widths in bits):
| ATM header | Type | SEQ | MID | Cell Payload   | Length | CRC-10 |
| 40         | 2    | 4   | 10  | 352 (44 bytes) | 6      | 10     |
Type: BOM/EOM (begin/end of message), COM (continuation of message); SEQ: sequence number; MID: message id;
AAL3/4 payload = 44B (4B of the standard 48B ATM payload go to the 5 special AAL3/4 fields: Type, SEQ, MID, Length, CRC-10);
Length: number of PDU bytes in this cell
Segmentation: | CS-PDU-header | USER DATA ... USER DATA | CS-PDU-trailer | is cut into 44-byte pieces, each carried in a cell as
|ATM-header|AAL-header| Cell-Payload |AAL-trailer|, with padding filling out the payload of the last cell.
AAL5
Convergence Sublayer Protocol Data Unit (CS-PDU) format (AAL5 packet format), trailer field widths in bits:
| USER DATA | pad   | Reserved | Length | CRC-32 |
| < 64 KB   | 0-47B | 16       | 16     | 32     |
pad: so trailer falls at end of ATM cell
Reserved: for higher layer sequencing / multiplexing
Length: size of PDU (data only; the PDU is padded to be a multiple of 48 bytes)
CRC-32 (detects missing or misordered cells)
Cell format: no per-cell AAL header or trailer (unlike AAL3/4); an end-of-PDU bit in the Type field of the ATM header marks the last cell
Segmentation: | USER DATA ... USER DATA |pad| CS-PDU-trailer | is cut into 48-byte pieces, each carried in a cell as |ATM-header|Cell-Payload|.
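The AAL5 padding rule (pad so that the 8-byte trailer ends exactly at a cell boundary) is easy to check numerically. The sketch below only computes the pad length and the resulting cell count; it does not build the trailer or compute the CRC-32, and the names are illustrative.

```python
CELL_PAYLOAD = 48   # bytes of payload per ATM cell
TRAILER_BYTES = 8   # 16 reserved + 16 length + 32 CRC bits


def aal5_layout(data_len):
    """Return (pad_bytes, cell_count) for a user payload of data_len bytes."""
    pad = -(data_len + TRAILER_BYTES) % CELL_PAYLOAD
    cells = (data_len + pad + TRAILER_BYTES) // CELL_PAYLOAD
    return pad, cells


if __name__ == "__main__":
    for n in (40, 41, 100):
        print(n, aal5_layout(n))
```

A 40-byte payload fits in one cell with no padding, while one extra byte forces a second cell that is almost entirely padding.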
AAL-1 through AAL-3/4 were designed by the telecom industry without much input from the computer industry.
When the computer industry woke up and realized the implications (the complexity and inefficiency of two
headers (2 layers) and the short 10-bit checksum), they invented their own AAL protocol, AAL-5.
It was originally called SEAL for Simple Efficient Adaptation Layer. It offers several service options:
1. Reliable service (guaranteed delivery and flow control)
2. Unreliable service (no guaranteed delivery; best effort)
VPI/VCI


Host: treat VPI/VCI together as a 24-bit circuit identifier
A Switch that routes many VCs between company sites can use
one VPI instead of many VCIs


Makes the Virtual Circuit Tables smaller and makes addressing faster.
Network: A VP aggregates multiple circuits into 1 path
ATM in the LAN

Problem: In common shared-media LANs multicast/broadcast is
easy since every node is connected to the same link. (e.g.,
Ethernet, Token-Ring)


Protocols were built to take advantage of easy broadcast (eg, Addr Resolution
Protocol=ARP)
Two Solutions:
- Redesign protocols that make LAN assumptions which are not true of ATM
  E.g., ATMARP doesn’t depend on broadcast
- Make ATM behave more like a shared-media LAN (e.g., support broadcast/multicast without losing the performance advantages of a switched network), i.e., add functionality to ATM LANs so anything that runs over a shared-media LAN runs on an ATM LAN
  Called LAN Emulation or LANE
LANE terms & addresses are confusing
(host/bridge/router = LAN Emulation Client = LEC)
LANE must provide, e.g., 48-bit MAC addresses to emulate Ethernet.
VCI is very different from an address (need addr for setup, then VCI used for transit)
For LANE, ATM switches don’t change, LANE has additional servers (at hosts?)
LECS: LAN Emulation Config Serv (New LEC finds LECS: gets LANE info, frame size, LES adr)
LES: LAN Em Serv (New LEC sends MAC & ATM addrs to LES. LES gives ATM addr of BUS)
BUS: Broadcast & Unknown Server (maintains pt-multipt VC to all clients for broadcasting)
Switching Hardware Overview

Terminology: n x m switch
has n inputs and m outputs



(usually n=m, but not always)
Design Goals
- High throughput
- Scalability (with respect to n)
Ports and Fabrics
[Figure: input ports and output ports connected through a fabric]
Port
Contains Electric or Optic receivers and transmitters, Provides buffers for
packets (cells) waiting to be switched or transmitted, contains circuitry.
InPort determines and attaches outPort# (in predominant case of self-routing fabric)
InPort is the first place to look for performance bottlenecks.
InPort deals with complexities of the outside world so fabric has simple job:
Fabric


Deliver presented packet to the right output. (as simply as possible)
May do buffering also (internal buffering fabric).
Buffering (and Head-of-line blocking)

Head-of-line blocking: e.g., when the head-of-line cells in several InPort FIFO queues are destined for the same OutPort,
only one of them can be switched at a time; cells queued behind the others wait unnecessarily,
even though they are destined for other OutPorts.




Can reduce throughput down to 59% (assuming uniformly distributed arrivals).
Majority of switches use pure outPort or mixed internal/outPort buffering.
Buffering is also important wrt QoS (can’t always use simple FIFO, Chpt 6)
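Buffering is needed wherever contention is possible
- input ports (contending for fabric)
- internal fabric buffers (contending for output port)
- output ports (contending for links)
[Figure: head-of-line blocking example: a switch with Port 1 and Port 2 whose queued cells are labeled with destination ports 2, 1, 2]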
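Head-of-line blocking can be seen in a tiny slot-by-slot model. This is a sketch under simplifying assumptions that are not in the notes: one cell per output per slot, ties won by the lower-numbered input, and a hypothetical workload of two inputs holding cells for outputs [2] and [2, 1]. The second function lets an input send any queued cell whose output is free, which is what FIFO input queues cannot do.

```python
from collections import deque


def slots_fifo(queues):
    """Slots needed when only the head cell of each input FIFO may compete."""
    queues = [deque(q) for q in queues]
    slots = 0
    while any(queues):
        taken = set()  # outputs already claimed in this slot
        for q in queues:  # lower-numbered inputs win ties (an assumption)
            if q and q[0] not in taken:
                taken.add(q.popleft())
        slots += 1
    return slots


def slots_no_hol(queues):
    """Slots needed when an input may send any cell whose output is free."""
    queues = [list(q) for q in queues]
    slots = 0
    while any(queues):
        taken = set()
        for q in queues:
            for i, out in enumerate(q):
                if out not in taken:
                    taken.add(out)
                    del q[i]
                    break  # at most one cell per input per slot
        slots += 1
    return slots


if __name__ == "__main__":
    workload = [[2], [2, 1]]
    print(slots_fifo(workload), slots_no_hol(workload))
```

In the FIFO case the cell for output 1 idles behind a blocked cell for output 2, costing an extra slot.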
Crossbar Switch

4X4 crossbar:
Conceptually simple (Every
input connected to every output)

Only possible contention problem
is OutPort contention.



Complexity of an OutPort grows faster than the number of InPorts.
Complexity of switch is proportional to n^2.
Designing a switch with low OutPort complexity is difficult.
Knockout Switch is one such (next slide).
Knockout Switch (not-quite-perfect crossbar)

Perfect crossbar can route packets from all n InPorts to 1 OutPort concurrently.
n-by-l Knockout Concentrator:
- OutPort can accept l packets
- Pick l small enough to keep costs low
- Pick l large enough for hotspots
  InPort where arrivals concentrate
  E.g., popular website
Each OutPort has 3 parts:
- Filters (recognize packets for this port)
- Concentrator (picks up to l packets, discards the rest)
  Hard job: needs to be fair
  Winner beats all others in a section; losers go to the next section.
- Queue of length l at each OutPort for accepted packets that are as yet untransmitted
[Figure: 8-to-4 knockout concentrator: inputs pass through sections 1-4, with elements labeled D, to outputs 1-4]
Knockout Switch Output Port Buffer


Each OutPort has l separate buffers
Buffers are filled round-robin (by a shifter)
- Occupancy levels always within 1 of each other
Buffers are emptied in round-robin fashion
- Preserving arrival order
[Figure: shifter and buffers in three states: (a) 3 packets arrive; (b) 3 packets arrive, 1 leaves; (c) 1 packet arrives, 1 leaves]
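The round-robin fill and drain of the l output buffers can be sketched as below. `KnockoutOutputBuffer` is a hypothetical name, the shifter is modelled as a write index that advances by one per accepted packet, and buffer capacity limits are ignored. The example follows the three steps of the figure (3 arrive; 3 arrive and 1 leaves; 1 arrives and 1 leaves).

```python
from collections import deque


class KnockoutOutputBuffer:
    def __init__(self, l):
        self.buffers = [deque() for _ in range(l)]
        self.next_write = 0  # shifter position: next buffer to fill
        self.next_read = 0   # next buffer to drain

    def accept(self, packets):
        """Spread arriving packets over the buffers in round-robin order."""
        for p in packets:
            self.buffers[self.next_write].append(p)
            self.next_write = (self.next_write + 1) % len(self.buffers)

    def transmit(self):
        """Send one packet, draining the buffers in the same order."""
        buf = self.buffers[self.next_read]
        if not buf:  # if this buffer is empty, all of them are
            return None
        self.next_read = (self.next_read + 1) % len(self.buffers)
        return buf.popleft()

    def occupancy(self):
        return [len(b) for b in self.buffers]


if __name__ == "__main__":
    out = KnockoutOutputBuffer(4)
    out.accept(["a", "b", "c"])
    out.accept(["d", "e", "f"])
    print(out.transmit(), out.occupancy())
```

Because writes and reads visit the buffers in the same cyclic order, the l buffers together behave like one FIFO queue.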
Knockout Switch (All components)
Shared Media Switches
Examples include switches built from PCs (sharing PC bus and memory)
Tend to scale poorly (shared resources get overloaded as switching task grows)
Nice aspect is large shared buffer space
built using COTS parts
better utilization possible.
Writes only 1 packet to memory at a time.
Mux-to-memory bus must be
n times faster than link speed.
Arriving packet:
header is stripped and goes to Write-ctrl logic
which gets a memory address from a freelist,
writes the packet to that address,
adds the address to the appropriate outPort list
Read-ctrl takes packets from outPort lists
sends to outPort thru demux
returns memory address to the freelist.
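The write-control and read-control steps above amount to a free list plus one address list per output port. The following is a sketch only: class and method names are hypothetical, memory is a Python list of packet slots, and a full memory is modelled by returning `False` (the notes do not say what happens in that case).

```python
from collections import deque


class SharedMemorySwitch:
    def __init__(self, num_slots, num_out_ports):
        self.memory = [None] * num_slots
        self.free_list = deque(range(num_slots))
        self.out_lists = [deque() for _ in range(num_out_ports)]

    def write(self, out_port, packet):
        """Write-ctrl: take a free address, store packet, queue the address."""
        if not self.free_list:
            return False  # no free memory (assumed behaviour: reject)
        addr = self.free_list.popleft()
        self.memory[addr] = packet
        self.out_lists[out_port].append(addr)
        return True

    def read(self, out_port):
        """Read-ctrl: send the next packet for out_port, free its address."""
        if not self.out_lists[out_port]:
            return None
        addr = self.out_lists[out_port].popleft()
        packet, self.memory[addr] = self.memory[addr], None
        self.free_list.append(addr)
        return packet


if __name__ == "__main__":
    sw = SharedMemorySwitch(num_slots=2, num_out_ports=2)
    sw.write(1, "p1")
    sw.write(0, "p2")
    print(sw.read(1), sw.read(0))
```

All ports draw from the same pool of slots, which is the "large shared buffer space" advantage noted above.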
Self-Routing Fabrics
BANYAN
Route: 0 = up, 1 = down, applied to the left bit, middle bit, right bit at successive stages
Banyan Network
- Constructed from simple 2 x 2 switching elements as above
- InPort attaches self-routing header = Binary_OutPort#
- OutPort removes it
- Only one path exists from a given input to a given output.
- No collisions if inputs are pre-sorted into ascending order
Complexity: n log2 n (n/2 switching elements per stage and log2 n stages)
Banyan Switch example
The route two cells take through the switch.
6 = 110 (down, down, up)
1 = 001 (up, up, down)
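The self-routing rule (0 = up, 1 = down, one header bit per stage, most significant bit first) can be written out directly. This sketch assumes the number of ports is a power of two; the function names are illustrative, and the element count uses the n/2 elements per stage times log2 n stages given above.

```python
def banyan_route(out_port, n_ports=8):
    """Per-stage decisions for a cell addressed to out_port."""
    stages = n_ports.bit_length() - 1  # log2(n_ports) for a power of two
    bits = format(out_port, f"0{stages}b")
    return ["down" if b == "1" else "up" for b in bits]


def banyan_elements(n_ports):
    """Number of 2 x 2 switching elements in the network."""
    return (n_ports // 2) * (n_ports.bit_length() - 1)


if __name__ == "__main__":
    print(6, banyan_route(6))
    print(1, banyan_route(1))
    print("elements for n=8:", banyan_elements(8))
```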
Banyan Switch examples
Cell collisions on the left, e.g., 5&7; 0&3; 6&4; 2&1. And 2 in middle due
to the fact that the inputs are not ordered (assume lesser is taken).
Collision-free routing on the right (inputs are ordered)
If the cells are sorted by destination and
presented on input lines, 0,2,4,6, 1,3,5,7; then there will be no collisions.

Batcher Network

- switching element that sorts inputs (1 path from each In to each Out)
- some elements sort into ascending order, others into descending order (marked by arrows in the figure; if only 1 cell arrives, it goes opposite the arrow)
- elements arranged to implement merge sort
- complexity: n (log2 n)^2
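One of Batcher's merge-based sorting networks is the bitonic sorter, built from compare-exchange elements that sort either ascending or descending. The sketch below is the textbook iterative form of that network, offered as an illustration rather than as the exact layout in the figure; it assumes the number of inputs is a power of two and also counts the compare-exchange elements used.

```python
def bitonic_sort(values):
    """Sort a power-of-two-length list with a bitonic network.

    Returns (sorted_list, number_of_compare_exchange_elements).
    """
    a = list(values)
    n = len(a)
    elements = 0
    k = 2
    while k <= n:          # size of the sequences being merged
        j = k // 2
        while j > 0:       # distance between compared lines
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    elements += 1
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending and a[i] != a[partner]:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a, elements


if __name__ == "__main__":
    print(bitonic_sort([5, 7, 0, 3, 6, 4, 2, 1]))
```

Sorting the cells by destination first is what lets a following banyan stage route them without internal collisions.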


Common Design: Batcher-Banyan Switching Fabric
Batcher-Banyan Switch (example with 4 cells)
Batcher-Banyan Switches
Batcher-Banyan would have to drop packets whenever 2 or more are headed
for the same OutPort. There are switches that deal with this problem.
First came Starlite in 1984, Moonshine Switch in 1987, Sunshine Switch in 1991.
They differ only in the way their trap component works.
The l banyans allow accepting up to l packets destined for any one port at a time (selector makes
sure they go each to a different banyan and sends any extras to Delay for recycling). The
Trap identifies the extras for Selector to recycle.
High-Speed IP Routers





Switch (possibly ATM)
Line Cards + Forwarding Engines
- link interface
- router lookup (input)
- common IP path (input)
- packet queue (output)
Network Processor
- routing protocol(s)
- exceptional cases
[Figure: line cards (forwarding, buffering) attached to a switch, with a routing CPU running routing software w/ router OS, and buffer memory]
Alternative Design
[Figure: PCs (each with CPU and MEM) interconnected by a crossbar switch; each PC hosts several NIs with uP]