Transcript PPT

CS 552 Computer Networks
IP forwarding
Fall 2004
Rich Martin
(Slides from D. Culler and N. McKeown)
Position Paper
• Goals:
– Practice writing to convince others
– Research an interesting topic related to networking.
– Generate reactions amongst fellow classmates/professors
• Size of the paper:
– 4000-5000 words, 5-7 pages, 10 pt. Font
•
•
•
•
•
•
Must have title and abstract
Name on the paper is optional
You will evaluate 2 papers
You will revise your paper
Hand in in PDF (preferred) or PS only
Due Oct 8th
Evaluation Criteria
• Is the position well defined?
• Is it narrow enough to be manageable?
• Are the communities of people involved with the position (and
their positions) identified?
• Are the opposing positions articulated?
• Are rebuttals given to the opposing positions?
• What evidence is used to support the position?
• Quantitative evidence based on experimentation?
• General facts about the systems in question?
• Anecdotes only?
• Is the paper logically organized?
• Most importantly, does you paper influence someone
on the position?
Position Topics I
• Peer to Peer technologies equals pirating.
– (suggested by Thu Nguyen)
• SANs vs. LANs.
• Distributed hash tables (DHTs): What are
they good for?
• Ipv4 is sufficient for the next 30 years.
• IP over direct links.
Position Topics 2
• Over-provisioning vs. QoS.
– (Suggested by Badri Nath).
•
•
•
•
•
Multicast vs. P2P.
Mobile IP is dead.
Wireless Ad-hoc networks.
Information will be free.
Others Possible (e.g. on GPL, on standards, security)
– Must convince the instructor position is good.
Academic Integrity
• DO: think about the position
– Helps if you pick a position you care about (at
least a little bit)
• DO: write your own text
• DO: cite sources at points embedded in the
next.
• DON’T rip/off copy text
– Longer quotes (100-150 words) ok, IF properly
done.
• Use papers and samples as models.
Outline
•
•
•
•
Where IP routers sit in the network
What IP routers look like
What do IP routers do?
Some details:
– The internals of a “best-effort” router
• Lookup, buffering and switching
– The internals of a “QoS” router
Outline (next time)
•
•
•
•
The way routers are really built.
Evolution of their internal workings.
What limits their performance.
The way the network is built today
Outline
•
•
•
•
Where IP routers sit in the network
What IP routers look like
What do IP routers do?
Some details:
– The internals of a “best-effort” router
• Lookup, buffering and switching
– The internals of a “QoS” router
• Can optics help?
The Internet is a mesh of routers
(in theory)
IP Core router
The Internet Core
IP Edge
Router
What do they look like?
Access routers
e.g. ISDN, ADSL
Core router
e.g. OC48c POS
Core ATM switch
Basic Architectural Components
of an IP Router
Routing
Protocols
Routing
Table
Control Plane
Datapath”
Forwarding
Switching
Table
per-packet
processing
Per-packet processing in an IP Router
1. Accept packet arriving on an incoming link.
2. Lookup packet destination address in the forwarding
table, to identify outgoing port(s).
3. Manipulate packet header: e.g., decrement TTL,
update header checksum.
4. Send packet to the outgoing port(s).
5. Buffer packet in the queue.
6. Transmit packet onto outgoing link.
General Switch Model
Input
Ports
Receiver
Input
Buffer
Output
Buffer Transmiter
Interconnect
Cross-bar
Control
Routing, Scheduling
Output
Ports
IP Switch Model
1. Ingress
Forwarding
Table
Forwarding
Decision
Forwarding
Table
Forwarding
Decision
Forwarding
Table
Forwarding
Decision
2. Interconnect
3. Egress
Forwarding Engine
Packet
payload
header
Router
Destination
Address
Routing Lookup
Data Structure
Forwarding Table
Dest-network
Port
65.0.0.0/8
3
128.9.0.0/16
1
149.12.0.0/19
7
Outgoing
Port
The Search Operation is not a Direct Lookup
(Incoming port, label)
Memory
(Outgoing port, label)
IP addresses: 32 bits long  4G entries
The Search Operation is also not an
Exact Match Search
Exact match search: search for a key in
a collection of keys of the same length.
Relatively well studied data structures:
• Hashing
• Balanced binary search trees
Example Forwarding Table
Destination IP Prefix
Prefix length
65.0.0.0/8
Outgoing Port
3
128.9.0.0/16
1
142.12.0.0/19
7
IP prefix: 0-32 bits
65.0.0.0/8
0
65.0.0.0
128.9.0.0/16
128.9.16.14
224
65.255.255.255
142.12.0.0/19
232-1
Prefixes can Overlap
Longest matching
prefix
128.9.176.0/24
128.9.16.0/21 128.9.172.0/21
65.0.0.0/8
0
128.9.0.0/16
128.9.16.14
142.12.0.0/19
232-1
Routing lookup: Find the longest matching
prefix (the most specific route) among all
prefixes that match the destination address.
Prefix Length
Difficulty of Longest Prefix Match
32
24
128.9.176.0/24
128.9.16.0/21
128.9.172.0/21
142.12.0.0/19
128.9.0.0/16
8
65.0.0.0/8
128.9.16.14
Prefixes
Lookup Rate Required
Year
Line
Line-rate
(Gbps)
40B
packets
(Mpps)
1998-99
OC12c
0.622
1.94
1999-00
OC48c
2.5
7.81
2000-01
OC192c
10.0
31.25
2002-03
OC768c
40.0
125
31.25 Mpps  33 ns
DRAM: 50-80 ns, SRAM: 5-10 ns
Number of Prefixes
Size of the Forwarding Table
100000
90000
80000
70000
60000
50000
10,000/year
40000
30000
20000
10000
0
95
96
97
Year
Source: http://www.telstra.net/ops/bgptable.html
98
99
00
Internal Interconnects
1. Multiplexers
Io
2. Tri-State Devices
Io
I1
I2
I3
O0
I1
Oi
O2
I2
I3
O3
3. Shared Memory
phase
RAM
addr
Din Dout
Io
I1
I2
I3
O0
Oi
O2
O3
Interconnects
Two basic techniques
Input Queueing
Usually a non-blocking
switch fabric (e.g. crossbar)
Output Queueing
Usually a fast bus
Shared Memory Bandwidth
5ns SRAM
Shared
Memory
•
•
•
•
1
2
N
200 byte bus
5ns per memory operation
Two memory operations per packet
Therefore, up to 160Gb/s
In practice, closer to 80Gb/s
Input buffered swtich
Input
Ports
Output
Ports
R0
R1
R2
Internconnect
Cross-bar
R3
Scheduling
• Independent routing logic per input
– FSM
• Scheduler logic arbitrates each output
– priority, FIFO, random
• Head-of-line blocking problem
Input Queueing
Delay
Head of Line Blocking
Load
58.6%
100%
Head of Line Blocking
(Virtual) Output Buffered Switch
Input
Ports
N buffers per input
Output
Ports
R0
Output
Ports
R1
Output
Ports
R2
Output
Ports
R3
Control
• How would you build a shared pool?
Solving HOL with Input Queueing
Virtual output queues
Input Queueing
Delay
Virtual Output Queues
Load
100%
Output scheduling
R0
Input
Buffers
O0
R1
R2
Cross-bar
R3
O1
O2
• n independent arbitration problems?
– static priority, random, round-robin
• simplifications due to routing algorithm?
• general case is max bipartite matching
Output
Ports
Finding a maximum size match
Requests
Inputs
A
1
B
2
C
3
D
4
E
5
F
6
Outputs
• How do we find the maximum size (weight) match?
Network flows and bipartite matching
Source
s
A
1
B
2
C
3
D
4
E
5
F
6
Sink
t
Finding a maximum size bipartite matching is equivalent to
solving a network flow problem with capacities and flows
of size 1.
Network flows and bipartite matching
Maximum Size Matching:
A
1
B
2
C
3
D
4
E
5
F
6
Complexity of Maximum Matchings
• Maximum Size Matchings:
– Algorithm by Dinic O(N5/2)
• Maximum Weight Matchings
– Algorithm by Kuhn O(N3)
• In general:
– Hard to implement in hardware
– Too Slow in Practice
• But gives nice theory and upper bound
Aribitration
–
–
–
–
Maximal Matches
Wavefront Arbiter (WFA)
Parallel Iterative Matching (PIM)
iSLIP
Maximal Matching
• A maximal matching is one in which each
edge is added one at a time, and is not
later removed from the matching.
• i.e. no augmenting paths allowed (they
remove edges added earlier).
• No input and output are left unnecessarily
idle.
Example of Maximal Size Matching
A
1
A
1
A
1
B
2
B
2
B
2
C
3
C
3
3
D
4
C
D
4
4
E
5
D
5
F
F
6
E
5
6
E
F
6
Maximal
Size Matching
Maximum
Size Matching
Maximal Matchings
• In general, maximal matching is simpler to
implement, and has a faster running time.
• A maximal size matching is at least half the
size of a maximum size matching.
• A maximal weight matching is defined in the
obvious way.
• A maximal weight matching is at least half the
weight of a maximum weight matching.
Wave Front Arbiter
(Tamir)
Requests
Match
1
1
1
1
2
2
2
2
3
3
3
3
4
4
4
4
Wave Front Arbiter
Requests
Match
Wavefront Arbiters
Properties
• Feed-forward (i.e. non-iterative) design lends
itself to pipelining.
• Always finds maximal match.
• Usually requires mechanism to prevent inputs
from getting preferential service.
– What the 50Gbs router does:
• Scramble (permute) inputs each cycle
Parallel Iterative Matching
selection
selection
#1
1
2
3
4
1
2
3
4
f1: Requests
#2
1
2
3
4
1
2
3
4
1
2
3
4
1
2
3
4
f2: Grant
1
2
3
4
1
2
3
4
1
2
3
4
1
2
3
4
1
2
3
4
1
2
3
4
f3: Accept/Match
PIM Properties
• Guaranteed to find a maximal match in at
most N iterations.
• In each phase, each input and output arbiter
can make decisions independently.
• In general, will converge to a maximal match
in < N iterations.
Parallel Iterative Matching
PIM with a
single iteration
Parallel Iterative Matching
PIM with 4
iterations
Output Queuing
iSLIP
1
4
#1
1
2
3
1
2
3
1
2
3
13
2
3
1
2
3
1
2
3
4
4
4
4
4
4
F2: Grant
F1: Requests
#2
2
F3: Accept/Match
1
2
3
1
2
3
1
2
3
1
2
3
1
2
3
1
2
3
4
4
4
4
4
4
iSLIP Operation
• Grant phase: Each output selects the
requesting input at the pointer, or the next
input in round-robin order. It only updates its
pointer if the grant is accepted.
• Accept phase: Each input selects the
granting output at the pointer, or the next
output in round-robin order.
• Consequence: Under high load, grant
pointers tend to move to unique values.
iSLIP
Properties
•
•
•
•
•
Random under low load
TDM under high load
Lowest priority to MRU
1 iteration: fair to outputs
Converges in at most N iterations. (On average, simulations
suggest < log2N)
• Implementation: N priority encoders
• 100% throughput for uniform i.i.d. traffic.
• But…some pathological patterns can lead to low throughput.
iSLIP
FIFO
iSLIP with 4
iterations
Output
Buffering
iSLIP
Implementation
N
N
1
1
Grant
Accept
2
2
Grant
Accept
log2N
log2N
State
Decision
N
N
Grant
N
Accept
log2N
Alternative Switching
• Crossbars are expensive
• Alternative networks can match inputs to
outputs:
–
–
–
–
Ring
Tree
K-ary N-cubes
Multi-stage logarithmic networks
• Each cell has constant number of inputs and outputs
Example: Butterfly
But t er f l y Net w or k
000
000
001
001
010
011
010
011
100
100
101
101
110
111
110
111
spl i t on
MSB
spl i t on
LSB
Butterfly
Outputs
Inputs
Benes Network
Butterfly #1
Butterfly #2
Benes networks
• Any permutation has a conflict free route
– Useful property
– Offline computation is difficult
• Can route to random node in middle, then to
destination
– Conflicts are unlikely under uniform traffic
– What about conflicts?
Sorting Networks
12
5
5
12
5
15
5
6
5
9
3
5
0
3
0
14
0
6
0
2
0
1
6
15
15
6
6
12
12
15
4
6
0
4
4
5
3
13
3
7
1
3
2
3
0
9
0
9
4
0
9
4
3
12
9
12
6
9
4
11
2
4
4
6
4
5
3
4
4
3
9
3
3
0
0
15
6
15
12
15
5
10
1
5
5
7
6
7
13
8
8
13
8
11
1
8
14
1
14
11
14
13
6
8
8
14
8
11
8
9
1
11
11
1
1
13
11
13
10
8
13
10
11
10
7
9
9
13
9
10
10
11
7
2
2
7
14
2
14
10
11
7
7
1
8
7
2
12
11
12
12
14
12
13
14
10
14
10
10
7
7
2
13
2
8
2
2
1
1
15
10
15
13
15
14
15
• Bitonic sorter recursively merges sorted sublists.
• Can switch by sorting on destination.
– additional components needed for conflict resolution
min
max
max
min
Flow Control
• What do you do when push comes to shove?
–
–
–
–
ethernet: collision detection and retry after delay
FDDI, token ring: arbitration token
TCP/WAN: buffer, drop, adjust rate
any solution must adjust to output rate
• Link-level flow control
Ready
Data
Link Flow Control Examples
Ready/Ack
F/E
Destination
Req
Source
• Short Links
F/E
Data
• long links
– several flits on the wire
Smoothing the flow
Incoming Phits
Flow-control Symbols
Full
Stop
High
Mark
Go
Low
Mark
Empty
Outgoing Phits
• How much slack do you need to maximize
bandwidth?