
Interconnection Networks
ECE 6101: Yalamanchili
Spring 2004
Overview

Physical Layer and Message Switching

Network Topologies

Metrics

Deadlock & Livelock

Routing Layer

The Messaging Layer
ECE 6101: Yalamanchili
Spring 2004
2
Interconnection Networks
 Fabric for scalable, multiprocessor architectures
 Distinct from traditional networking architectures such as Internet Protocol (IP) based systems
ECE 6101: Yalamanchili
Spring 2004
3
Resource View of Parallel Architectures
[Figure: a large pool of processors (P) and memory modules (M) to be interconnected]
How do we present these resources?
What are the costs of different interconnection networks?
What are the design considerations?
What are the applications?
ECE 6101: Yalamanchili
Spring 2004
4
Example: Clusters & Google Hardware Infrastructure
 VME rack 19 in. wide, 6 feet tall, 30 inches deep
 Per side: 40 1 Rack Unit (RU) PCs + 1 HP Ethernet switch (4 RU): Each blade can contain 8 100-Mbit/s EN or a single 1-Gbit Ethernet interface
 Front+back => 80 PCs + 2 EN switches/rack
 Each rack connects to 2 128 1-Gbit/s EN switches
 Dec 2000: 40 racks at most recent site
 6000 PCs, 12000 disks: almost 1 petabyte!
 PC operates at about 55 Watts
 Rack => 4500 Watts, 60 amps
From Patterson, CS252, UCB
ECE 6101: Yalamanchili
Spring 2004
5
Reliability
 For 6000 PCs, 12,000 disks, 200 EN switches
 ~20 PCs will need to be rebooted/day
 ~2 PCs/day hardware failure, or 2%-3% / year
– 5% due to problems with motherboard, power supply, and connectors
– 30% DRAM: bits change + errors in transmission (100 MHz)
– 30% Disks fail
– 30% Disks go very slow (10%-3% expected BW)
 200 EN switches, 2-3 fail in 2 years
 6 Foundry switches: none failed, but 2-3 of 96 blades of switches have failed (16 blades/switch)
 Collocation site reliability:
– 1 power failure, 1 network outage per year per site
From Patterson, CS252, UCB
ECE 6101: Yalamanchili
Spring 2004
6
The Practical Problem
From: Ambuj Goyal, “Computer Science Grand Challenge – Simplicity of Design,” Computing Research Association Conference on “Grand Research Challenges” in Computer Science and Engineering, June 2002
ECE 6101: Yalamanchili
Spring 2004
7
Example: Embedded Devices
picoChip: http://www.picochip.com/
PACT XPP Technologies: http://www.pactcorp.com/
 Issues
– Execution performance
– Power dissipation
– Number of chip types
– Size and form factor
ECE 6101: Yalamanchili
Spring 2004
8
Physical Layer and Message Switching
ECE 6101: Yalamanchili
Spring 2004
Messaging Hierarchy
Routing Layer – Where?: Destination decisions, i.e., which output port
Switching Layer – When?: When is data forwarded
Physical Layer – How?: Synchronization of data transfer
 This organization is distinct from traditional networking implementations
 Emphasis is on low latency communication
– Only recently have standards been evolving
– Infiniband: http://www.infinibandta.org/home
ECE 6101: Yalamanchili
Spring 2004
10
The Physical Layer
[Figure: a message is broken into packets, each carrying a header, data, and a checksum]
Flit: flow control digit
Phit: physical flow control digit
 Data is transmitted based on a hierarchical data structuring mechanism
– Messages → packets → flits → phits
– While flits and phits are fixed size, packets and data may be variable sized
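
To make the hierarchy concrete, the sketch below counts how a single message breaks down into packets, flits, and phits. All sizes (PACKET_PAYLOAD_BYTES, FLIT_BYTES, PHIT_BYTES) and the assumption of full packets with headers ignored are illustrative, not values from these slides.

/* Illustrative sizes only; real networks pick their own. */
#include <stdio.h>

#define PACKET_PAYLOAD_BYTES 256  /* packets may be variable sized; fixed here for simplicity */
#define FLIT_BYTES           8    /* flit: unit of flow control */
#define PHIT_BYTES           2    /* phit: unit moved across the physical channel per cycle */

/* Fixed-size units needed to carry 'bytes' of data (ceiling division). */
static unsigned units(unsigned bytes, unsigned unit) { return (bytes + unit - 1) / unit; }

int main(void) {
    unsigned msg_bytes = 1500;  /* example message size */
    unsigned packets = units(msg_bytes, PACKET_PAYLOAD_BYTES);
    unsigned flits   = packets * units(PACKET_PAYLOAD_BYTES, FLIT_BYTES);  /* headers ignored */
    unsigned phits   = flits * (FLIT_BYTES / PHIT_BYTES);
    printf("%u-byte message -> %u packets -> %u flits -> %u phits\n",
           msg_bytes, packets, flits, phits);
    return 0;
}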
ECE 6101: Yalamanchili
Spring 2004
11
Flow Control
 Flow control digit: synchronized transfer of a unit of information
– Based on buffer management
 Asynchronous vs. synchronous flow control
 Flow control occurs at multiple levels
– message flow control
– physical flow control
 Mechanisms
– Credit based flow control
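
A minimal sketch of credit based flow control, the mechanism named above: the sender keeps a count of free flit buffers downstream and stalls when the count reaches zero. The CREDITS value and the function names are illustrative assumptions.

/* Sketch of credit-based link-level flow control, assuming the receiver
 * advertises CREDITS flit buffers. */
#define CREDITS 4

typedef struct {
    int credits;   /* free flit buffers the sender believes the receiver still has */
} link_t;

/* Sender side: a flit may be forwarded only if a credit is available. */
int try_send_flit(link_t *l) {
    if (l->credits == 0)
        return 0;       /* no buffer space downstream: stall the flit */
    l->credits--;       /* one credit is consumed per flit transmitted */
    return 1;
}

/* Receiver side: when a flit leaves the receive buffer (forwarded or consumed),
 * a credit is returned to the sender. */
void credit_return(link_t *l) {
    l->credits++;
}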
ECE 6101: Yalamanchili
Spring 2004
12
Switching Layer
 Comprised of three sets of techniques
– switching techniques
– flow control
– buffer management
 Organization and operation of routers are largely determined by the switching layer
 Connection oriented vs. connectionless communication
ECE 6101: Yalamanchili
Spring 2004
13
Generic Router Architecture
[Figure: generic router architecture; delay components include wire delay, switching delay, and routing delay]
ECE 6101: Yalamanchili
Spring 2004
14
Virtual Channels
 Each virtual channel is a pair of unidirectional channels
 Independently managed buffers multiplexed over the physical channel
 De-couples buffers from physical channels
 Originally introduced to break cyclic dependencies
 Improves performance through reduction of blocking delay
 Virtual lanes vs. virtual channels
 As the number of virtual channels increases, the increased channel multiplexing has two effects
– decrease in header delay
– increase in average data flit delay
 Impact on router performance
– switch complexity
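
A sketch of how independently buffered virtual channels share one physical channel: a simple round-robin arbiter skips any VC that is empty or out of downstream credits, which is how a blocked message stops tying up the link. The structures and the arbitration policy are illustrative assumptions.

#define NUM_VC 4

typedef struct {
    int flits_waiting;   /* occupancy of this VC's independently managed buffer */
    int credits;         /* downstream buffer space available to this VC */
} vc_t;

/* Returns the index of the VC that wins the physical channel this cycle,
 * or -1 if no VC can advance (all empty or blocked). */
int vc_arbitrate(vc_t vc[NUM_VC], int last_winner) {
    for (int i = 1; i <= NUM_VC; i++) {
        int cand = (last_winner + i) % NUM_VC;
        if (vc[cand].flits_waiting > 0 && vc[cand].credits > 0)
            return cand;   /* a blocked VC does not idle the physical channel */
    }
    return -1;
}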
ECE 6101: Yalamanchili
Spring 2004
15
Circuit Switching
 Hardware path setup by a routing header or probe
 End-to-end acknowledgment initiates transfer at full hardware bandwidth
 Source routing vs. distributed routing
 System is limited by signaling rate along the circuits --> wave pipelining
ECE 6101: Yalamanchili
Spring 2004
16
Packet Switching
 Blocking delays in circuit switching avoided in packet switched networks --> full link utilization in the presence of data
 Increased storage requirements at the nodes
 Packetization and in-order delivery requirements
 Buffering
– use of local processor memory
– central queues
ECE 6101: Yalamanchili
Spring 2004
17
Virtual Cut-Through
 Messages cut through to the next router when feasible
 In the absence of blocking, messages are pipelined
– pipeline cycle time is the larger of intra-router and inter-router flow control delays
 When the header is blocked, the complete message is buffered
 High load behavior approaches that of packet switching
ECE 6101: Yalamanchili
Spring 2004
18
Wormhole Switching
 Messages are pipelined, but buffer space is on the order of a few flits
 Small buffers + message pipelining --> small compact buffers
 Supports variable sized messages
 Messages cannot be interleaved over a channel: routing information is only associated with the header
 Base latency is equivalent to that of virtual cut-through
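
A first-order, no-load latency model that captures the contrast drawn here and on the next slide: store-and-forward packet switching pays the full message serialization at every hop, while cut-through and wormhole pipelining pay only the header per hop. This is a simplified model for intuition, not the exact expressions used in the course notes.

/* D = hops, L = message length (bits), Lh = header length (bits),
 * B = channel bandwidth (bits/s), tr = per-hop routing/switching delay (s). */
double latency_store_and_forward(double D, double L, double B, double tr) {
    return D * (tr + L / B);            /* each router receives the whole packet before forwarding */
}

double latency_pipelined(double D, double L, double Lh, double B, double tr) {
    return D * (tr + Lh / B) + L / B;   /* only the header pays the per-hop cost; data follows in a pipeline */
}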
ECE 6101: Yalamanchili
Spring 2004
19
Comparison of Switching Techniques
 Packet switching and virtual cut-through
– consume network bandwidth proportional to network load
– predictable demands
– VCT behaves like wormhole at low loads and like packet switching at high loads
– link level error control for packet switching
 Wormhole switching
– provides low latency
– lower saturation point
– higher variance of message latency than packet or VCT switching
 Virtual channels
– blocking delay vs. data delay
– router flow control latency
 Optimistic vs. conservative flow control
ECE 6101: Yalamanchili
Spring 2004
20
Saturation
ECE 6101: Yalamanchili
Spring 2004
21
Network Topologies
ECE 6101: Yalamanchili
Spring 2004
Direct Networks
 Generally fixed degree
 Modular
 Topologies
– Meshes
– Multidimensional tori
– Special case of tori – the binary hypercube
ECE 6101: Yalamanchili
Spring 2004
23
Indirect Networks
– uniform base latency
– centralized or distributed control
– Engineering approximations to direct networks
[Figure: a multistage network and a fat tree network; in the fat tree, bandwidth increases as you go up the tree]
ECE 6101: Yalamanchili
Spring 2004
24
Generalized MINs
 Columns of k x k switches and connections between switches
 All switches are identical
 Directionality and control
 May concentrate or expand or just connect
ECE 6101: Yalamanchili
Spring 2004
25
Specific MINs
 Switch sizes and interstage interconnect establish distinct MINs
 Majority of interesting MINs have been shown to be topologically equivalent
ECE 6101: Yalamanchili
Spring 2004
26
Metrics
ECE 6101: Yalamanchili
Spring 2004
Evaluation Metrics
 Bisection bandwidth
– This is the minimum bandwidth across any bisection of the network
– Bisection bandwidth is a limiting attribute of performance
 Latency
– Message transit time
 Node degree
– These are related to pin/wiring constraints
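
As a worked example of the bisection metric (anticipating the k-ary n-cube comparison a few slides ahead), the sketch below counts the channels crossing the bisection of a mesh or torus and multiplies by the channel bandwidth. The formulas assume even k and uniform channels; the parameter names are illustrative.

#include <math.h>

/* Channels cut by the bisection of a k-ary n-cube (even k), times per-channel bandwidth. */
double bisection_bw_mesh(int k, int n, double channel_bw)  { return pow(k, n - 1) * channel_bw; }
double bisection_bw_torus(int k, int n, double channel_bw) { return 2.0 * pow(k, n - 1) * channel_bw; }

/* e.g., with unit-bandwidth channels: a 32-ary 2-cube torus cuts 2*32 = 64 channels,
 * while a 10-ary 3-cube torus cuts 2*10*10 = 200. */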
ECE 6101: Yalamanchili
Spring 2004
28
Constant Resource Analysis: Bisection Width
ECE 6101: Yalamanchili
Spring 2004
29
Constant Resource Analysis: Pin out
ECE 6101: Yalamanchili
Spring 2004
30
Latency Under Contention
32-ary 2-cube vs. 10-ary 3-cube
ECE 6101: Yalamanchili
Spring 2004
31
Deadlock and Livelock
ECE 6101: Yalamanchili
Spring 2004
Deadlock and Livelock
 Deadlock freedom can be ensured by enforcing constraints
– For example, following dimension order routing in 2D meshes
 Similar
ECE 6101: Yalamanchili
Spring 2004
33
Occurrence of Deadlock
[Figure: buffer dependency cycles for VCT/SAF switching and for wormhole switching]
 Deadlock is caused by dependencies between buffers
ECE 6101: Yalamanchili
Spring 2004
34
Deadlock in a Ring Network
ECE 6101: Yalamanchili
Spring 2004
35
Deadlock Avoidance: Principle
 Deadlock is caused by dependencies between buffers
ECE 6101: Yalamanchili
Spring 2004
36
Routing Constraints on Virtual Channels
 Add multiple virtual channels to each physical channel
 Place routing restrictions between virtual channels
ECE 6101: Yalamanchili
Spring 2004
37
Break Cycles
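
One classic way to break the cycle in a unidirectional ring uses the virtual-channel restriction from the previous slide: packets are injected on the high VC and forced onto the low VC once they cross a designated "dateline" node, so the channel dependence graph becomes acyclic. The sketch below is illustrative; the choice of dateline node and the names are assumptions.

#define RING_SIZE 8
#define DATELINE  0   /* node whose outgoing link acts as the dateline (arbitrary choice) */

/* Pick the virtual channel for the next hop out of node 'cur'.
 * Packets are assumed to be injected on VC1. */
int select_vc(int cur, int current_vc) {
    if (cur == DATELINE)
        return 0;          /* crossing the dateline: switch to the lower VC */
    return current_vc;     /* otherwise stay on the VC already in use */
}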
ECE 6101: Yalamanchili
Spring 2004
38
Channel Dependence Graph
ECE 6101: Yalamanchili
Spring 2004
39
Routing Layer
ECE 6101: Yalamanchili
Spring 2004
Routing Protocols
ECE 6101: Yalamanchili
Spring 2004
41
Key Routing Categories
 Deterministic
– The path is fixed by the source-destination pair
 Source routing
– Path is looked up prior to message injection
– May differ each time the network and NIs are initialized
 Adaptive routing
– Path is determined by run-time network conditions
 Unicast
– Single source to single destination
 Multicast
– Single source to multiple destinations
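
Dimension-order (XY) routing is the standard example of a deterministic algorithm, and it is the same constraint cited earlier for deadlock freedom in 2D meshes: correct the X offset first, then Y, so Y-to-X turns never occur. A minimal sketch with illustrative port names:

typedef enum { PORT_XPLUS, PORT_XMINUS, PORT_YPLUS, PORT_YMINUS, PORT_LOCAL } port_t;

/* Choose the output port at router (cur_x, cur_y) for a packet headed to (dst_x, dst_y). */
port_t xy_route(int cur_x, int cur_y, int dst_x, int dst_y) {
    if (cur_x < dst_x) return PORT_XPLUS;    /* finish the X dimension first ... */
    if (cur_x > dst_x) return PORT_XMINUS;
    if (cur_y < dst_y) return PORT_YPLUS;    /* ... then route in Y */
    if (cur_y > dst_y) return PORT_YMINUS;
    return PORT_LOCAL;                       /* arrived: deliver to the local node */
}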
ECE 6101: Yalamanchili
Spring 2004
42
Software Layer
ECE 6101: Yalamanchili
Spring 2004
The Message Layer
 Message layer background
– Cluster computers
– Myrinet SAN
– Design properties
 End-to-end communication path
– Injection
– Network transmission
– Ejection
 Overall performance
ECE 6101: Yalamanchili
Spring 2004
44
Cluster Computers
 Cost-effective alternative to supercomputers
– Number of commodity workstations
– Specialized network hardware and software
 Result: Large pool of host processors
[Figure: hosts (CPU + memory) attached through their I/O buses to network interfaces, which connect to the network]
Courtesy of C. Ulmer
ECE 6101: Yalamanchili
Spring 2004
45
For Example..
Courtesy of C. Ulmer
ECE 6101: Yalamanchili
Spring 2004
46
For Example..
Courtesy of C. Ulmer
ECE 6101: Yalamanchili
Spring 2004
47
Clusters & Networks
 Beowulf clusters
– Use Ethernet & TCP/IP
– Cheap, but poor Host-to-Host performance
– Latencies: ~70-100 μs
– Bandwidths: ~80-800 Mbps
 System Area Network (SAN) clusters
– Custom hardware/software
– Examples: Myrinet, SCI, InfiniBand, QsNet
– Expensive, but good Host-to-Host performance
– Latencies: as low as 3 μs
– Bandwidths: up to 3 Gbps
Courtesy of C. Ulmer
ECE 6101: Yalamanchili
Spring 2004
48
Myrinet
 Descendant of Caltech Mosaic project
– Wormhole network
– Source routing
– High-speed, ultra-reliable network
– Configurable topology: switches, NICs, and cables
[Figure: hosts, each with a CPU and NI, cabled to a fabric of switches (X)]
Courtesy of C. Ulmer
ECE 6101: Yalamanchili
Spring 2004
49
Myrinet Switches & Links
 16 Port crossbar chip
– 2.0+2.0 Gbps per port
– ~300 ns Latency
 Line card
– 8 Network ports
– 8 Backplane ports
 Backplane cabinet
– 17 line card slots
– 128 Hosts
[Figure: fiber links from hosts enter 16-port crossbars on line cards (8 hosts per line card, 8 ports to the backplane); the backplane interconnects the line cards through additional 16-port crossbars]
Courtesy of C. Ulmer
ECE 6101: Yalamanchili
Spring 2004
50
Myrinet NI Architecture
 Custom RISC CPU
– 33-200 MHz
– Big endian
– gcc is available
 SRAM
– 1-9 MB
– No CPU cache
 DMA Engines
– PCI / SRAM
– SRAM / Tx
– Rx / SRAM
[Figure: the LANai processor on the network interface card: a host DMA engine on the PCI bus, the RISC CPU, SRAM, and Tx/Rx DMA engines to the SAN]
Courtesy of C. Ulmer
ECE 6101: Yalamanchili
Spring 2004
51
Message Layers
Courtesy of C. Ulmer
ECE 6101: Yalamanchili
Spring 2004
“Message Layer” Communication Software
 Message layers are enabling technology for clusters
– Enable cluster to function as single image multiprocessor system
– Responsible for transferring messages between resources
– Hide hardware details from end users
[Figure: a cluster message layer spanning many CPUs and presenting them as one system]
Courtesy of C. Ulmer
ECE 6101: Yalamanchili
Spring 2004
53
Message Layer Design Issues
 Performance is critical
– Competing with SMPs, where overhead is <1 μs
 Use every trick to get performance
– Single cluster user -- remove device sharing overhead
– Little protection -- co-operative environment
– Reliable hardware -- optimize for common case of few errors
– Smart hardware -- offload host communication
– Arch hacks -- x86 is a turkey, use MMX, SSE, WC..
Courtesy of C. Ulmer
ECE 6101: Yalamanchili
Spring 2004
54
Message Layer Organization
 User-space: application + message layer library (communication library)
– Maintains cluster info
– Message passing API
– Device interface
 Kernel: NI device driver
– Physical access
– DMA transfers
– ISR
 NI: firmware
– Monitor network wire
– Send/Receive messages
Courtesy of C. Ulmer
ECE 6101: Yalamanchili
Spring 2004
55
End User’s Perspective
[Figure: Processor A calls send(dest, data, size); the message layer delivers the message; Processor B retrieves it with Msg = extract()]
Courtesy of C. Ulmer
ECE 6101: Yalamanchili
Spring 2004
56
End-to-End Communication Path
 Three phases of data transfer
– Injection
– Network
– Ejection
[Figure: data moves from source host memory through the NI (injection), across the SAN (network), and through the destination NI into destination memory (ejection); the path serves both message passing and remote memory operations]
Courtesy of C. Ulmer
ECE 6101: Yalamanchili
Spring 2004
57
Injecting Data
Courtesy of C. Ulmer
ECE 6101: Yalamanchili
Spring 2004
Injecting Data into the NI
send( dest, data, size )
[Figure: the host fragments the send() data into an outgoing message queue of entries msg[0] .. msg[n-1], each with a header and data; the NI pulls entries across the PCI bus to its Tx path on the network interface card]
Fragmentation
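
A minimal sketch of the fragmentation step: the payload of send() is split into fixed-size entries of the outgoing message queue that the NI later drains over the PCI bus. FRAG_BYTES, the structure layout, and the queue variables are illustrative assumptions; wrap-around and flow control checks are omitted.

#include <string.h>

#define FRAG_BYTES 1024

struct frag { int dest; int seq; int len; char data[FRAG_BYTES]; };
extern struct frag out_queue[];   /* pinned outgoing message queue */
extern int out_tail;              /* next free queue slot (wrap-around omitted) */

void send_msg(int dest, const char *data, int size) {
    for (int off = 0, seq = 0; off < size; off += FRAG_BYTES, seq++) {
        struct frag *f = &out_queue[out_tail++];
        f->dest = dest;
        f->seq  = seq;
        f->len  = (size - off < FRAG_BYTES) ? (size - off) : FRAG_BYTES;
        memcpy(f->data, data + off, f->len);   /* header fields + data: one queue entry per fragment */
    }
}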
Courtesy of C. Ulmer
ECE 6101: Yalamanchili
Spring 2004
59
Host-NI: Data Injections
 Host-NI transfers challenging
– Host lacks DMA engine
 Multiple transfer methods
– Programmed I/O
– DMA
 What about virtual/physical addresses?
[Figure: host CPU, cache, memory controller, and main memory connected over the PCI bus to the network interface memory and its DMA engine]
Courtesy of C. Ulmer
ECE 6101: Yalamanchili
Spring 2004
60
Virtual and Physical Addresses
 Virtual address space
– Application’s view
– Contiguous
 Physical address space
– Manage physical memory
– Paged, non-contiguous
– PCI devices part of PA
– PCI devices only use PAs
 Viewing PCI device memory
– Memory map (mmap)
[Figure: a user-space application’s virtual address space is mapped, via mmap, onto host memory and PCI device memory in the physical address space]
Courtesy of C. Ulmer
ECE 6101: Yalamanchili
Spring 2004
61
Addresses and Injections
 Programmed I/O (user-space)
– Translation automatic by host CPU
– Example: memcpy( ni_mem, source, size )
– Can be enhanced by use of MMX, SSE registers
 DMA (kernel space)
– One-copy: copy data into pinned, contiguous block, then DMA out of the block
– Zero-copy: transfer data right out of VA pages; translate address and pin each page
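
A sketch of the user-space PIO path described above, assuming a hypothetical /dev/ni0 device node and omitting error checks; the point is only the mmap-then-memcpy pattern that lets the host CPU write NI memory directly.

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

void pio_inject(const void *src, size_t size) {
    int fd = open("/dev/ni0", O_RDWR);   /* hypothetical NI device node */
    void *ni_mem = mmap(NULL, size, PROT_WRITE, MAP_SHARED, fd, 0);
    memcpy(ni_mem, src, size);           /* programmed I/O: CPU stores go straight to NI SRAM through the mapping */
    munmap(ni_mem, size);
    close(fd);
}

/* The DMA one-copy path (kernel space) would instead copy into a pinned,
 * physically contiguous bounce buffer and hand its physical address to the
 * NI's DMA engine. */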
Courtesy of C. Ulmer
ECE 6101: Yalamanchili
Spring 2004
62
TPIL Performance:
LANai 9 NI with Pentium III-550 MHz Host
[Figure: bandwidth (MBytes/s, 0-140) vs. injection size (10 to 1,000,000 bytes) for DMA 0-Copy, DMA 1-Copy DB, DMA 1-Copy, PIO SSE, PIO MMX, and PIO Memcpy]
Courtesy of C. Ulmer
ECE 6101: Yalamanchili
Spring 2004
63
Network Delivery (NI-NI)
Courtesy of C. Ulmer
ECE 6101: Yalamanchili
Spring 2004
Network Delivery (NI-NI)
 Reliably transfer messages between pairs of NIs
– Each NI basically has two threads: Send and Receive
 Reliability
– SANs are usually error free
– Worried about buffer overflows in NI cards
– Two approaches to flow control: host-level, NI-level
[Figure: sending network interface → SAN → receiving network interface]
Courtesy of C. Ulmer
ECE 6101: Yalamanchili
Spring 2004
65
Host-managed Flow Control
 Reliability managed by the host
– Host-level credit system
– NI just transfers messages between host and wire
 Good points
– Easier to implement
– Host CPU faster than NI
 Bad points
– Poor NI buffer utilization
– Retransmission overhead
[Figure: Send and Reply messages flow between the sending and receiving endpoints across PCI, the network interfaces, and the SAN]
Courtesy of C. Ulmer
ECE 6101: Yalamanchili
Spring 2004
66
NI-Managed Flow Control
 NI manages reliable transmission of messages
– NIs use control messages (ACK/NACK)
 Good points
– Better dynamic buffer use
– Offloads host CPU
 Bad points
– Harder to implement
– Added overhead for NI
[Figure: DATA messages flow from the sending endpoint’s network interface across the SAN; the receiving network interface returns an ACK]
Courtesy of C. Ulmer
ECE 6101: Yalamanchili
Spring 2004
67
Ejection (NI-Host)
Courtesy of C. Ulmer
ECE 6101: Yalamanchili
Spring 2004
Message Ejection (NI-Host)
 Move message to host
– Store close to host CPU
 Incoming message queue
– Pinned, contiguous memory
– NI can write directly
– Host extracts messages
– Reassemble fragments
 How does host see new messages?
[Figure: the network interface writes incoming messages into a queue in host memory, close to the CPU]
Courtesy of C. Ulmer
ECE 6101: Yalamanchili
Spring 2004
69
Notification: Polling
 Applications explicitly call extract()
– Call examines queue front & back pointers
– Processes message if available
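
A sketch of what a polling extract() can look like against the pinned incoming queue from the previous slide; the queue layout and variable names are illustrative assumptions.

#define QUEUE_SLOTS 256

struct msg { int len; char payload[1024]; };

extern volatile int front;                 /* next slot the host will read  */
extern volatile int back;                  /* next slot the NI will write   */
extern struct msg in_queue[QUEUE_SLOTS];   /* pinned, contiguous host memory */

/* Returns the next message, or 0 if the queue is empty. The caller should
 * copy the payload out before the NI wraps around and reuses the slot. */
struct msg *extract(void) {
    if (front == back)
        return 0;                          /* nothing has arrived */
    struct msg *m = &in_queue[front];
    front = (front + 1) % QUEUE_SLOTS;     /* consume one message */
    return m;
}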

Good points
– Good performance
– Can tuck away in a thread
– User has more control
 Bad points
– Waste time if no messages
– Queue can back up
– Code can be messy
Courtesy of C. Ulmer
ECE 6101: Yalamanchili
Spring 2004
70
Notification: Interrupts
 NI invokes interrupt after putting message in queue
– Host stops whatever it was doing
– Device driver’s interrupt service routine (ISR) catches it
– ISR uses UNIX signal infrastructure to pass to application
– Application catches signal, executes extract()
 Good points
– No wasted polling time
 Bad points
– High overhead
– Interrupts: 10 μs
– Constantly.. interrupted
[Figure: NI → device driver ISR → application signal handler → extract()]
Courtesy of C. Ulmer
ECE 6101: Yalamanchili
Spring 2004
71
Other APIs: Remote Memory Ops
 Often just passing data
– Don’t disturb receiving application
 Remote memory operations
– Fetch, store remote memory
– NI executes transfer directly (no need for notification)
– Virtual addresses translated by the NI (and cached)
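
A sketch of the kind of descriptor a remote store ("put") might hand to the NI; the field names are illustrative assumptions, not an actual API.

#include <stdint.h>
#include <stddef.h>

struct remote_put {
    int      dest_node;      /* which host's memory to write */
    uint64_t remote_vaddr;   /* destination virtual address, translated by the remote NI */
    size_t   len;            /* bytes to transfer */
    /* payload follows the descriptor in the outgoing queue; the remote NI
     * writes memory directly, so the receiving application is not disturbed */
};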
[Figure: the NIs move data directly between the two hosts’ memories across the SAN, without involving the remote CPU]
Courtesy of C. Ulmer
ECE 6101: Yalamanchili
Spring 2004
72
The Message Path
[Figure: the message path runs from the sending CPU through the OS, memory, and PCI bus to the NI, across the network, and back through the receiving NI, PCI bus, memory, and OS to the receiving CPU]
 Wire bandwidth is not the bottleneck!
 Operating system and/or user level software limits performance
ECE 6101: Yalamanchili
Spring 2004
73
Universal Performance Metrics
[Figure: timeline of a message transfer: sender overhead (processor busy), transmission time (size ÷ bandwidth), time of flight, receiver-side transmission time, and receiver overhead (processor busy); transport latency and total latency are marked]
Total Latency = Sender Overhead + Time of Flight + Message Size ÷ BW + Receiver Overhead
Includes header/trailer in BW calculation?
From Patterson, CS252, UCB
ECE 6101: Yalamanchili
Spring 2004
74
Simplified Latency Model
 Total Latency = Overhead + Message Size / BW
 Overhead = Sender Overhead + Time of Flight + Receiver Overhead
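
The model above as a small C helper, together with the effective bandwidth an application actually observes. The numbers in the comment reuse the SAN figures quoted earlier (a few microseconds of overhead, roughly 3 Gbit/s links) purely for illustration.

double total_latency_us(double overhead_us, double msg_bytes, double bw_bytes_per_us) {
    return overhead_us + msg_bytes / bw_bytes_per_us;
}

double effective_bw(double msg_bytes, double latency_us) {
    return msg_bytes / latency_us;   /* bytes per microsecond delivered to the application */
}

/* e.g., with 3 us overhead and 375 bytes/us (~3 Gbit/s):
 *   256-byte message: latency ~3.7 us, effective bandwidth ~70 bytes/us
 *   64-KB message:    latency ~178 us, effective bandwidth ~368 bytes/us
 * Small messages are dominated by overhead, large ones by the wire rate. */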
 Can relate overhead to network bandwidth utilization
From Patterson, CS252, UCB
ECE 6101: Yalamanchili
Spring 2004
75
Commercial Example
ECE 6101: Yalamanchili
Spring 2004
Scalable Switching Fabrics for Internet Routers
 Internet bandwidth growth → routers with
– large numbers of ports
– high bisection bandwidth
 Historically these solutions have used
– Backplanes
– Crossbar switches
 White paper: Scalable Switching Fabrics for Internet Routers, by W. J. Dally, http://www.avici.com/technology/whitepapers/
ECE 6101: Yalamanchili
Spring 2004
77
Requirements
 Scalable
– Incremental
– Economical → cost linear in the number of nodes
 Robust
– Fault tolerant → path diversity + reconfiguration
– Non-blocking features
 Performance
– High bisection bandwidth
– Quality of Service (QoS)
– Bounded delay
ECE 6101: Yalamanchili
Spring 2004
78
Switching Fabric
 Three components
– Topology → 3D torus
– Routing → source routing with randomization
– Flow control → virtual channels and virtual networks
 Maximum configuration: 14 x 8 x 5 = 560
 Channel speed is 10 Gbps
ECE 6101: Yalamanchili
Spring 2004
79
Packaging
 Uniformly short wires between adjacent nodes
– Can be built in passive backplanes
– Run at high speed
– Bandwidth inversely proportional to square of wire length
– Cabling costs
– Power costs
Figures are from Scalable Switching Fabrics for Internet Routers, by W. J. Dally (can be found at www.avici.com)
ECE 6101: Yalamanchili
Spring 2004
80
Available Bandwidth
 Distinguish between capacity and I/O bandwidth
– Capacity: Traffic that will load a link to 100%
– I/O bandwidth: bit rate in or out
 Discontinuities
Figures are from Scalable Switching Fabrics for Internet Routers, by W. J. Dally (can be found at www.avici.com)
ECE 6101: Yalamanchili
Spring 2004
81
Properties
 Path diversity
– Avoids tree saturation
– Edge disjoint paths for fault tolerance
– Heart beat checks (100 microsecs) + deflecting while tables are updated
Figures are from Scalable Switching Fabrics for Internet Routers, by W. J. Dally (can be found at www.avici.com)
ECE 6101: Yalamanchili
Spring 2004
82
Properties
Figures are from Scalable Switching Fabrics for Internet Routers, by W. J. Dally (can be found at www.avici.com)
ECE 6101: Yalamanchili
Spring 2004
83
Use of Virtual Channels
 Virtual channels aggregated into virtual networks
– Two networks for each output port
 Distinct networks prevent undesirable coupling
– Only bandwidth on a link is shared
– Fair arbitration mechanisms
 Distinct networks enable QoS constraints to be met
– Separate best effort and constant bit rate traffic
ECE 6101: Yalamanchili
Spring 2004
84
Summary
 Distinguish between traditional networking and high performance multiprocessor communication
 Hierarchy of implementations
– Physical, switching and routing
– Protocol families and protocol layers (the protocol stack)
 Datapath and architecture of the switches
 Metrics
– Bisection bandwidth
– Reliability
– Traditional latency and bandwidth
ECE 6101: Yalamanchili
Spring 2004
85
Study Guide
 Given a topology and relevant characteristics such as channel widths and link bandwidths, compute the bisection bandwidth
 Distinguish between switching mechanisms based on how channel buffers are reserved/used during message transmission
 Latency expressions for different switching mechanisms
 Compute the network bisection bandwidth when the software overheads of message transmission are included
 Identify the major delay elements in the message transmission path starting at the send() call and ending with the receive() call
 How do costs scale in different topologies
– Latency scaling
– Unit of upgrade → cost of upgrade
ECE 6101: Yalamanchili
Spring 2004
86