Interconnection Networks
ECE 6101: Yalamanchili
Spring 2004
Overview
Physical Layer and Message Switching
Network Topologies
Metrics
Deadlock & Livelock
Routing Layer
The Messaging Layer
Interconnection Networks
Fabric for scalable multiprocessor architectures
Distinct from traditional networking architectures such as Internet Protocol (IP) based systems
Resource View of Parallel Architectures
[Figure: a large pool of processor (P) and memory (M) resources to be interconnected]
How do we present these resources?
What are the costs of different interconnection networks?
What are the design considerations?
What are the applications?
Example: Clusters & Google Hardware Infrastructure
VME rack: 19 in. wide, 6 feet tall, 30 inches deep
Per side: 40 1-Rack-Unit (RU) PCs + 1 HP Ethernet switch (4 RU); each blade can contain 8 100-Mbit/s EN or a single 1-Gbit Ethernet interface
Front + back => 80 PCs + 2 EN switches/rack
Each rack connects to 2 128 x 1-Gbit/s EN switches
Dec 2000: 40 racks at most recent site
6000 PCs, 12000 disks: almost 1 petabyte!
Each PC operates at about 55 Watts
Rack => 4500 Watts, 60 amps
From Patterson, CS252, UCB
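As a quick sanity check on the rack power figure (the arithmetic below is mine, not from the slide), the 80 PCs alone account for nearly all of the quoted 4500 Watts, with the two Ethernet switches and some headroom making up the rest:

```latex
P_{\text{rack}} \approx 80 \times 55\,\mathrm{W} = 4400\,\mathrm{W} \approx 4.5\,\mathrm{kW}
```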
Reliability
For 6000 PCs, 12000 disks, 200 EN switches
~ 20 PCs will need to be rebooted/day
~ 2 PCs/day hardware failure, or 2%-3% / year
– 5% due to problems with motherboard, power supply, and connectors
– 30% DRAM: bits change + errors in transmission (100 MHz)
– 30% disks fail
– 30% disks go very slow (10%-3% of expected BW)
200 EN switches: 2-3 fail in 2 years
6 Foundry switches: none failed, but 2-3 of 96 blades of switches have failed (16 blades/switch)
Collocation site reliability:
– 1 power failure, 1 network outage per year per site
From Patterson, CS252, UCB
The Practical Problem
From: Ambuj Goyal, “Computer Science Grand Challenge – Simplicity of Design,” Computing Research Association
Conference on "Grand Research Challenges" in Computer Science and Engineering, June 2002
Example: Embedded Devices
picoChip: http://www.picochip.com/
PACT XPP Technologies: http://www.pactcorp.com/
Issues
– Execution performance
– Power dissipation
– Number of chip types
– Size and form factor
Physical Layer and Message Switching
Messaging Hierarchy
Routing Layer
Where?: Destination decisions, i.e., which output port
Switching Layer
When?: When is data forwarded
Physical Layer
How?: synchronization of data transfer
This organization is distinct from traditional networking
implementations
Emphasis is on low latency communication
– Only recently have standards been evolving
– InfiniBand: http://www.infinibandta.org/home
The Physical Layer
[Figure: a message is divided into packets (header, data, checksum), packets into flits, and flits into phits]
Flit: flow control digit
Phit: physical flow control digit
Data is transmitted based on a hierarchical data structuring mechanism
– Messages --> packets --> flits --> phits
– While flits and phits are fixed size, packets and data may be variable sized
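The sketch below illustrates this hierarchy in C; all sizes (64-byte payloads, 8-byte flits, 2-byte phits) are made-up assumptions rather than values from the slides.

```c
/* Illustrative sketch of the message -> packet -> flit -> phit hierarchy.
 * The sizes used here are assumptions for the example, not real values. */
#include <stdio.h>
#include <string.h>

#define MAX_PAYLOAD     64   /* assumed fixed packet payload size (bytes) */
#define FLIT_BYTES       8   /* assumed flit size                         */
#define PHITS_PER_FLIT   4   /* assumed phit size = 2 bytes               */

struct packet {
    unsigned      header;              /* routing / sequencing information */
    unsigned char payload[MAX_PAYLOAD];
    unsigned      checksum;
};

/* A variable-sized message is carved into fixed-format packets, and each
 * packet is transmitted as a sequence of fixed-size flits (and phits). */
static void send_message(const unsigned char *msg, size_t len)
{
    for (size_t off = 0; off < len; off += MAX_PAYLOAD) {
        struct packet p = { .header = (unsigned)(off / MAX_PAYLOAD) };
        size_t n = (len - off < MAX_PAYLOAD) ? len - off : MAX_PAYLOAD;

        memcpy(p.payload, msg + off, n);
        p.checksum = 0;                /* checksum computation omitted */

        size_t flits = (sizeof p + FLIT_BYTES - 1) / FLIT_BYTES;
        printf("packet %u: %zu flits of %d phits each\n",
               p.header, flits, PHITS_PER_FLIT);
    }
}

int main(void)
{
    unsigned char msg[200] = { 0 };    /* a 200-byte message */
    send_message(msg, sizeof msg);
    return 0;
}
```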
Flow Control
Flow control digit: synchronized transfer of a unit of information
– Based on buffer management
Asynchronous vs. synchronous flow control
Flow control occurs at multiple levels
– message flow control
– physical flow control
Mechanisms
– Credit-based flow control (sketched below)
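A minimal sketch of the credit-based scheme, assuming a 4-slot receiver buffer and a receiver that drains one flit every other cycle (both numbers are arbitrary): the sender consumes a credit per flit forwarded and regains one whenever the receiver frees a buffer slot.

```c
/* Minimal sketch of credit-based flow control (assumed parameters):
 * the sender holds one credit per free buffer slot at the receiver and
 * may forward a flit only while it has credits. */
#include <stdio.h>

#define RX_BUF_SLOTS 4

int main(void)
{
    int credits     = RX_BUF_SLOTS;  /* sender's view of free receiver slots */
    int to_send     = 10;            /* flits waiting at the sender          */
    int rx_occupied = 0;             /* flits buffered at the receiver       */

    for (int cycle = 0; to_send > 0 || rx_occupied > 0; cycle++) {
        /* Receiver drains one flit every other cycle and returns a credit. */
        if (rx_occupied > 0 && cycle % 2 == 0) {
            rx_occupied--;
            credits++;
        }
        /* Sender forwards a flit only if it holds a credit. */
        if (to_send > 0 && credits > 0) {
            credits--;
            rx_occupied++;
            to_send--;
            printf("cycle %2d: flit forwarded, %d credits left\n",
                   cycle, credits);
        }
    }
    return 0;
}
```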
Switching Layer
Comprised of three sets of techniques
– switching techniques
– flow control
– buffer management
Organization and operation of routers are largely determined
by the switching layer
Connection Oriented vs. Connectionless communication
Generic Router Architecture
[Figure: generic router architecture, annotated with wire delay, switching delay, and routing delay]
Virtual Channels
Each virtual channel is a pair of unidirectional channels
Independently managed buffers multiplexed over the physical channel
De-couples buffers from physical channels
Originally introduced to break cyclic dependencies
Improves performance through reduction of blocking delay
Virtual lanes vs. virtual channels
As the number of virtual channels increases, the increased channel multiplexing has two effects
– decrease in header delay
– increase in average data flit delay
Impact on router performance
– switch complexity
Circuit Switching
Hardware path is set up by a routing header or probe
End-to-end acknowledgment initiates transfer at full hardware bandwidth
Source routing vs. distributed routing
System is limited by the signaling rate along the circuits --> wave pipelining
Packet Switching
Blocking delays of circuit switching are avoided in packet-switched networks --> full link utilization in the presence of data
Increased storage requirements at the nodes
Packetization and in-order delivery requirements
Buffering
– use of local processor memory
– central queues
Virtual Cut-Through
Messages cut through to the next router when feasible
In the absence of blocking, messages are pipelined
– pipeline cycle time is the larger of the intra-router and inter-router flow control delays
When the header is blocked, the complete message is buffered
High-load behavior approaches that of packet switching
Wormhole Switching
Messages are pipelined, but buffer space is on the order of a few flits
Small buffers + message pipelining --> small, compact buffers
Supports variable-sized messages
Messages cannot be interleaved over a channel: routing information is only associated with the header
Base latency is equivalent to that of virtual cut-through (first-order expressions below)
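For reference, the usual first-order, contention-free latency expressions (in the spirit of the textbook treatment this course follows; D = hops, t_r = per-router routing delay, t_s = switch delay, t_w = wire delay, L = message length, B = channel bandwidth) can be sketched as:

```latex
t_{\text{circuit}} \;\approx\; D\bigl(t_r + 2(t_s + t_w)\bigr) + \frac{L}{B}
\qquad
t_{\text{SAF}} \;\approx\; D\Bigl(t_r + \frac{L}{B}\Bigr)
\qquad
t_{\text{VCT}} \;\approx\; t_{\text{WH}} \;\approx\; D\,(t_r + t_s + t_w) + \frac{L}{B}
```

The last identity is the sense in which wormhole base latency matches virtual cut-through; the two differ only under blocking.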
Comparison of Switching Techniques
Packet switching and virtual cut-through
– consume network bandwidth proportional to network load
– predictable demands
– VCT behaves like wormhole at low loads and like packet switching at high loads
– link level error control for packet switching
Wormhole switching
– provides low latency
– lower saturation point
– higher variance of message latency than packet or VCT switching
Virtual channels
– blocking delay vs. data delay
– router flow control latency
Optimistic vs. conservative flow control
Saturation
Network Topologies
Direct Networks
Generally fixed degree
Modular
Topologies
– Meshes
– Multidimensional tori
– Special case of tori – the binary hypercube
Indirect Networks
– uniform base latency
– centralized or distributed control
– Engineering approximations to direct networks
[Figure: a multistage network and a fat tree network; in the fat tree, bandwidth increases as you go up the tree]
Generalized MINs
Columns of k x k switches and connections between switches
All switches are identical
Directionality and control
May concentrate or expand or just connect
Specific MINs
Switch sizes and interstage interconnect establish distinct MINs
The majority of interesting MINs have been shown to be topologically equivalent
Metrics
Evaluation Metrics
[Figure: bisection of a network]
Bisection bandwidth
– This is the minimum bandwidth across any bisection of the network
– Bisection bandwidth is a limiting attribute of performance
Latency
– Message transit time
Node degree
– Related to pin/wiring constraints
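As a sketch of the kind of calculation the study guide asks for: the bisection of a k-ary n-cube is usually counted as k^(n-1) channels for a mesh and 2·k^(n-1) for a torus (even k); the 1 Gbit/s channel bandwidth below is just an assumed number.

```c
/* Sketch: bisection bandwidth of a k-ary n-cube, assuming k^(n-1)
 * bisection channels for a mesh and 2*k^(n-1) for a torus (even k),
 * each channel of bandwidth chan_bw. */
#include <stdio.h>

static double pow_int(double k, int e)
{
    double r = 1.0;
    while (e-- > 0)
        r *= k;
    return r;
}

static double bisection_bw(int k, int n, double chan_bw, int torus)
{
    double channels = pow_int(k, n - 1) * (torus ? 2.0 : 1.0);
    return channels * chan_bw;
}

int main(void)
{
    /* 32-ary 2-cube vs. 10-ary 3-cube tori, 1 Gbit/s channels (assumed). */
    printf("32-ary 2-cube: %.0f Gbit/s\n", bisection_bw(32, 2, 1.0, 1));
    printf("10-ary 3-cube: %.0f Gbit/s\n", bisection_bw(10, 3, 1.0, 1));
    return 0;
}
```

With nearly equal node counts (1024 vs. 1000), the higher-dimensional torus has the larger bisection, which is the trade-off behind the 32-ary 2-cube vs. 10-ary 3-cube comparison later in this section.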
Constant Resource Analysis: Bisection Width
Constant Resource Analysis: Pin out
Latency Under Contention
32-ary 2-cube vs. 10-ary 3-cube
Deadlock and Livelock
Deadlock and Livelock
Deadlock freedom can be ensured by enforcing routing constraints
– For example, following dimension-order routing in 2D meshes (sketched below)
Similar
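A minimal sketch of the dimension-order (XY) routing constraint referenced above: route fully in X, then in Y, so the channel dependence graph stays acyclic. Coordinates and port names here are illustrative.

```c
/* Sketch of dimension-order (X then Y) routing on a 2D mesh: the packet
 * corrects its X offset completely before it turns into the Y dimension. */
#include <stdio.h>

enum port { EAST, WEST, NORTH, SOUTH, LOCAL };

static enum port xy_route(int cur_x, int cur_y, int dst_x, int dst_y)
{
    if (dst_x > cur_x) return EAST;
    if (dst_x < cur_x) return WEST;
    if (dst_y > cur_y) return NORTH;
    if (dst_y < cur_y) return SOUTH;
    return LOCAL;                      /* arrived at the destination */
}

int main(void)
{
    int x = 0, y = 0;                  /* current router      */
    const int dx = 3, dy = 2;          /* destination (3, 2)  */
    const char *name[] = { "EAST", "WEST", "NORTH", "SOUTH", "LOCAL" };

    enum port p;
    while ((p = xy_route(x, y, dx, dy)) != LOCAL) {
        printf("(%d,%d) -> %s\n", x, y, name[p]);
        if (p == EAST)       x++;
        else if (p == WEST)  x--;
        else if (p == NORTH) y++;
        else                 y--;
    }
    return 0;
}
```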
Occurrence of Deadlock
[Figures: buffer dependency cycles for VCT/SAF switching and for wormhole switching]
Deadlock is caused by dependencies between buffers
Deadlock in a Ring Network
Deadlock Avoidance: Principle
Deadlock is caused by dependencies between buffers
Routing Constraints on Virtual Channels
Add multiple virtual channels to each physical channel
Place routing restrictions between virtual channels
Break Cycles
Channel Dependence Graph
Routing Layer
Routing Protocols
Key Routing Categories
Deterministic
– The path is fixed by the source-destination pair
Source Routing
– Path is looked up prior to message injection
– May differ each time the network and NIs are initialized
Adaptive routing
– Path is determined by run-time network conditions
Unicast
– Single source to single destination
Multicast
– Single source to multiple destinations
Software Layer
The Message Layer
Message layer background
– Cluster computers
– Myrinet SAN
– Design properties
End-to-End communication path
– Injection
– Network transmission
– Ejection
Overall performance
Cluster Computers
Cost-effective alternative to supercomputers
– Number of commodity workstations
– Specialized network hardware and software
[Figure: hosts (CPU + memory) attached through their I/O buses to network interfaces on a common network]
Result: Large pool of host processors
Courtesy of C. Ulmer
For Example..
Courtesy of C. Ulmer
Clusters & Networks
Beowulf clusters
– Use Ethernet & TCP/IP
– Cheap, but poor host-to-host performance
– Latencies: ~70-100 μs
– Bandwidths: ~80-800 Mbps
System Area Network (SAN) clusters
– Custom hardware/software
– Examples: Myrinet, SCI, InfiniBand, QsNet
– Expensive, but good host-to-host performance
– Latencies: as low as 3 μs
– Bandwidths: up to 3 Gbps
Courtesy of C. Ulmer
Myrinet
Descendant of the Caltech Mosaic project
– Wormhole network
– Source routing
– High-speed, ultra-reliable network
– Configurable topology: switches, NICs, and cables
[Figure: hosts (CPU + NI) connected through a fabric of crossbar switches]
Courtesy of C. Ulmer
Myrinet Switches & Links
16-port crossbar chip
– 2.0+2.0 Gbps per port
– ~300 ns latency
Line card
– 8 network (fiber) ports
– 8 backplane ports
Backplane cabinet
– 17 line card slots
– 128 hosts (8 hosts per line card)
[Figure: line cards with 16-port crossbars connect fiber links to the backplane]
Courtesy of C. Ulmer
Myrinet NI Architecture
Custom RISC CPU
– 33-200 MHz
– Big endian
– gcc is available
SRAM
– 1-9 MB
– No CPU cache
DMA engines
– PCI / SRAM
– SRAM / Tx
– Rx / SRAM
[Figure: the LANai processor on the network interface card, with a host DMA engine on the PCI side, the RISC CPU, SRAM, and Tx/Rx DMA engines on the SAN side]
Courtesy of C. Ulmer
Message Layers
Courtesy of C. Ulmer
“Message Layer” Communication Software
Message layers are enabling technology for clusters
– Enable the cluster to function as a single-image multiprocessor system
– Responsible for transferring messages between resources
– Hide hardware details from end users
[Figure: many host CPUs tied together by the cluster message layer]
Courtesy of C. Ulmer
Message Layer Design Issues
Performance is critical
– Competing with SMPs, where overhead is <1 μs
Use every trick to get performance
– Single cluster user: remove device sharing overhead
– Little protection: co-operative environment
– Reliable hardware: optimize for common case of few errors
– Smart hardware: offload host communication
– Arch hacks: x86 is a turkey, use MMX, SSE, WC..
Courtesy of C. Ulmer
Message Layer Organization
User space: application + message layer library (communication library)
– Maintains cluster info
– Message passing API
– Device interface
Kernel: NI device driver
– Physical access
– DMA transfers
– ISR
NI: firmware
– Monitor network wire
– Send/Receive messages
Courtesy of C. Ulmer
End User’s Perspective
[Figure: Processor A calls send(dest, data, size); the message layer carries the message; Processor B calls Msg = extract()]
Courtesy of C. Ulmer
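The slide's send()/extract() pattern is sketched below; the message layer here is faked with an in-process queue purely so the example runs stand-alone, and nothing about the types or names reflects a real message-layer API.

```c
/* Toy illustration of the send(dest, data, size) / Msg = extract() usage
 * pattern.  The "message layer" is an in-process queue; a real layer
 * would carry the data between hosts over the SAN. */
#include <stdio.h>
#include <string.h>

typedef struct { int dest; int size; char data[256]; } msg_t;

static msg_t queue[8];
static int   q_head, q_tail;

static void send_msg(int dest, const void *data, int size)
{
    msg_t *m = &queue[q_tail++ % 8];
    m->dest = dest;
    m->size = size;
    memcpy(m->data, data, (size_t)size);
}

static msg_t *extract(void)
{
    return (q_head == q_tail) ? NULL : &queue[q_head++ % 8];
}

int main(void)
{
    send_msg(1, "hello", 6);           /* Processor A: send(dest, data, size) */

    msg_t *m = extract();              /* Processor B: Msg = extract()        */
    if (m != NULL)
        printf("delivered %d bytes to node %d: %s\n", m->size, m->dest, m->data);
    return 0;
}
```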
End-to-End Communication Path
Three phases of data transfer
– Injection
– Network
– Ejection
[Figure: source and destination hosts (CPU, memory, NI) connected by the SAN; both message passing and remote memory operations go through (1) injection, (2) network transmission, and (3) ejection]
Courtesy of C. Ulmer
Injecting Data
Courtesy of C. Ulmer
Injecting Data into the NI
send( dest, data, size )
[Figure: the message is fragmented into the outgoing message queue (msg[0] .. msg[n-1], each with a header and data) and transferred over PCI to the NI's Tx unit]
Courtesy of C. Ulmer
Host-NI: Data Injections
Host-NI transfers are challenging
– Host lacks a DMA engine
Multiple transfer methods
– Programmed I/O
– DMA
What about virtual/physical addresses?
[Figure: host CPU, cache, memory controller, and main memory attached via the PCI bus to the NI's PCI DMA engine and memory]
Courtesy of C. Ulmer
Virtual and Physical Addresses
Virtual address space
– Application's view
– Contiguous
Physical address space
– Manage physical memory
– Paged, non-contiguous
– PCI devices are part of the PA space
– PCI devices only use PAs
Viewing PCI device memory
– Memory map (mmap)
[Figure: a user-space application's virtual address range is mapped via mmap onto a PCI device's region of the host physical address space]
Courtesy of C. Ulmer
Addresses and Injections
Programmed I/O (user-space)
– Translation automatic by host CPU
– Example: memcpy( ni_mem, source, size )
– Can be enhanced by use of MMX, SSE registers
DMA (kernel space)
– One-copy:
– Copy data into pinned, contiguous block
– DMA out of block
– Zero-copy:
– Transfer data right out of VA pages
– Translate address and pin each page
Courtesy of C. Ulmer
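A hedged sketch of the programmed-I/O path described above: the host CPU memcpy()s the message straight into NI memory. Here the NI SRAM is a local array so the example runs stand-alone; on real hardware the pointer would come from mmap()ing the NI's memory into the application's address space.

```c
/* Sketch of programmed-I/O injection: the host copies a message into
 * (memory-mapped) NI SRAM with memcpy().  fake_ni_sram stands in for
 * the mapped device memory so this example compiles and runs anywhere. */
#include <stdio.h>
#include <string.h>

struct ni_msg {
    unsigned      dest;
    unsigned      length;
    unsigned char payload[64];
};

static struct ni_msg fake_ni_sram;     /* stand-in for mapped NI memory */

static void pio_inject(volatile struct ni_msg *slot,
                       unsigned dest, const void *data, size_t size)
{
    slot->dest   = dest;
    slot->length = (unsigned)size;
    memcpy((void *)slot->payload, data, size);   /* the PIO copy itself */
}

int main(void)
{
    const char msg[] = "payload";
    pio_inject(&fake_ni_sram, 3, msg, sizeof msg);
    printf("queued %u bytes for node %u\n",
           fake_ni_sram.length, fake_ni_sram.dest);
    return 0;
}
```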
TPIL Performance:
LANai 9 NI with Pentium III-550 MHz Host
[Chart: injection bandwidth (MBytes/s, up to ~140) vs. injection size (10 to 1,000,000 bytes) for DMA 0-copy, DMA 1-copy DB, DMA 1-copy, PIO SSE, PIO MMX, and PIO memcpy]
Courtesy of C. Ulmer
Network Delivery (NI-NI)
Courtesy of C. Ulmer
Network Delivery (NI-NI)
Reliably transfer message between pairs of NIs
– Each NI basically has two threads: Send and Receive
Reliability
– SANs are usually error free
– Worried about buffer overflows in NI cards
– Two approaches to flow control: host-level, NI-level
[Figure: sending and receiving network interfaces connected by the SAN]
Courtesy of C. Ulmer
Host-managed Flow Control
Reliability managed by the host
– Host-level credit system
– NI just transfers messages between host and wire
Good points
– Easier to implement
– Host CPU faster than NI
Bad points
– Poor NI buffer utilization
– Retransmission overhead
[Figure: a send/reply exchange between the sending and receiving endpoints across PCI, the NIs, and the SAN]
Courtesy of C. Ulmer
NI-Managed Flow Control
NI manages reliable transmission of messages
– NIs use control messages (ACK/NACK)
Good points
– Better dynamic buffer use
– Offloads host CPU
Bad points
– Harder to implement
– Added overhead for NI
[Figure: DATA messages flow from the sending endpoint across PCI, the NI, and the SAN to the receiving endpoint, which returns an ACK]
Courtesy of C. Ulmer
Ejection (NI-Host)
Courtesy of C. Ulmer
Message Ejection (NI-Host)
Move message to host
– Store close to host CPU
Incoming message queue
– Pinned, contiguous memory
– NI can write directly
– Host extracts messages
– Reassemble fragments
How does the host see new messages?
[Figure: the NI writes incoming messages into the incoming message queue in host memory, close to the CPU]
Courtesy of C. Ulmer
Notification: Polling
Applications explicitly call extract()
– Call examines queue front & back pointers
– Processes message if available
Good points
– Good performance
– Can tuck away in a thread
– User has more control
Bad points
– Waste time if no messages
– Queue can back up
– Code can be messy
Courtesy of C. Ulmer
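A sketch of the polling path: extract() compares the queue's front and back indices and pulls a message when one is waiting. The queue layout is an assumption for illustration, not the actual message-layer data structure.

```c
/* Sketch of polling-based notification: the host repeatedly checks the
 * incoming message queue's front/back indices (written by the NI) and
 * extracts a message when the two differ. */
#include <stdio.h>

#define QSLOTS 8

struct in_queue {
    volatile int front;        /* next slot the host will read  */
    volatile int back;         /* next slot the NI will write   */
    int          msgs[QSLOTS];
};

/* Returns 1 and fills *msg if a message was waiting, 0 otherwise. */
static int extract(struct in_queue *q, int *msg)
{
    if (q->front == q->back)
        return 0;                          /* queue empty: nothing to do */
    *msg = q->msgs[q->front];
    q->front = (q->front + 1) % QSLOTS;
    return 1;
}

int main(void)
{
    /* Pretend the NI has already deposited one message in slot 0. */
    struct in_queue q = { .front = 0, .back = 1, .msgs = { 42 } };
    int m;

    while (extract(&q, &m))                /* poll until the queue drains */
        printf("received message %d\n", m);
    return 0;
}
```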
Notification: Interrupts
NI invokes interrupt after putting message in queue
– Host stops whatever it was doing
– Device driver's interrupt service routine (ISR) catches it
– ISR uses the UNIX signal infrastructure to pass it to the application
– Application catches the signal, executes extract()
Good points
– No wasted polling time
Bad points
– High overhead (interrupts: ~10 μs)
– Constantly.. interrupted
[Figure: NI --> device driver ISR --> application signal handler --> extract()]
Courtesy of C. Ulmer
Other APIs: Remote Memory Ops
Often just passing data
– Don’t disturb receiving application
Remote memory operations
– Fetch, store remote memory
– NI executes transfer directly (no need for notification)
– Virtual addresses translated by the NI (and cached)
[Figure: the NIs transfer data directly between the two hosts' memories across the SAN]
Courtesy of C. Ulmer
The Message Path
[Figure: the end-to-end message path: CPU, memory, OS, and PCI bus on each host, with the NIs connected by the network]
Wire bandwidth is not the bottleneck!
Operating system and/or user-level software limits performance
Universal Performance Metrics
[Figure: timeline of a message transfer showing sender overhead (processor busy), time of flight, transmission time (size ÷ bandwidth), transport latency, receiver overhead (processor busy), and total latency]
Total Latency = Sender Overhead + Time of Flight + Message Size ÷ BW + Receiver Overhead
Includes header/trailer in BW calculation?
From Patterson, CS252, UCB
Simplified Latency Model
Total Latency ≈ Overhead + Message Size / BW
Overhead = Sender Overhead + Time of Flight + Receiver Overhead
Can relate overhead to network bandwidth utilization
From Patterson, CS252, UCB
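As a purely illustrative plug-in of numbers quoted earlier for a SAN cluster (about 3 μs of combined overhead plus time of flight, a 3 Gbit/s link, and an assumed 1 KB message):

```latex
\text{Total latency} \;\approx\; 3\,\mu\mathrm{s} + \frac{8192\ \mathrm{bits}}{3\times 10^{9}\ \mathrm{bits/s}} \;\approx\; 3\,\mu\mathrm{s} + 2.7\,\mu\mathrm{s} \;\approx\; 5.7\,\mu\mathrm{s}
```

At this scale the fixed overheads and the serialization term are comparable, which is why the software path, not the wire, dominates the engineering effort.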
Commercial Example
Scalable Switching Fabrics for Internet Routers
Internet bandwidth growth --> routers with
– large numbers of ports
– high bisection bandwidth
Historically these solutions have used
– Backplanes
– Crossbar switches
White paper: Scalable Switching Fabrics for Internet Routers, by W. J. Dally, http://www.avici.com/technology/whitepapers/
Requirements
Scalable
– Incremental
– Economical: cost linear in the number of nodes
Robust
– Fault tolerant: path diversity + reconfiguration
– Non-blocking features
Performance
– High bisection bandwidth
– Quality of Service (QoS)
– Bounded delay
Switching Fabric
Three components
– Topology: 3D torus
– Routing: source routing with randomization
– Flow control: virtual channels and virtual networks
Maximum configuration: 14 x 8 x 5 = 560 nodes
Channel speed is 10 Gbps
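A rough bisection estimate for the maximum configuration (my arithmetic, not from the white paper): treating the 14 x 8 x 5 machine as a full torus and cutting it across the longest (14-node) dimension, each of the 8 x 5 rows contributes two wrap-around links:

```latex
B_{\text{bisection}} \;\approx\; 2 \times (8 \times 5) \times 10\ \mathrm{Gbit/s} \;=\; 800\ \mathrm{Gbit/s}\ \text{per direction}
```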
Packaging
Uniformly short wires between adjacent nodes
– Can be built in passive backplanes
– Run at high speed
– Bandwidth inversely proportional to the square of the wire length
– Cabling costs
– Power costs
Figures are from Scalable Switching Fabrics for Internet Routers, by W. J. Dally (can be found at www.avici.com)
Available Bandwidth
Distinguish between capacity and I/O bandwidth
– Capacity: Traffic that will load a link to 100%
– I/O bandwidth: bit rate in or out
Discontinuities
Figures are from Scalable Switching Fabrics for Internet Routers, by W. J. Dally (can be found at www.avici.com)
Properties
Path diversity
– Avoids tree saturation
– Edge-disjoint paths for fault tolerance
– Heartbeat checks (100 microseconds) + deflecting while tables are updated
Figures are from Scalable Switching Fabrics for Internet Routers, by W. J. Dally (can be found at www.avici.com)
Properties
Figures are from Scalable Switching Fabrics for Internet Routers, by W. J. Dally (can be found at www.avici.com)
Use of Virtual Channels
Virtual channels aggregated into virtual networks
– Two networks for each output port
Distinct networks prevent undesirable coupling
– Only bandwidth on a link is shared
– Fair arbitration mechanisms
Distinct networks enable QoS constraints to be met
– Separate best effort and constant bit rate traffic
Summary
Distinguish between traditional networking and high-performance multiprocessor communication
Hierarchy of implementations
– Physical, switching and routing
– Protocol families and protocol layers (the protocol stack)
Datapath and architecture of the switches
Metrics
– Bisection bandwidth
– Reliability
– Traditional latency and bandwidth
Study Guide
Given a topology and relevant characteristics such as channel widths and link bandwidths, compute the bisection bandwidth
Distinguish between switching mechanisms based on how channel buffers are reserved/used during message transmission
Latency expressions for different switching mechanisms
Compute the network bisection bandwidth when the software overheads of message transmission are included
Identify the major delay elements in the message transmission path, starting at the send() call and ending with the receive() call
How do costs scale in different topologies?
– Latency scaling
– Unit of upgrade, cost of upgrade