gige - Washington University in St. Louis

GigE for the MSR
Fred Kuhns
Washington
WASHINGTON UNIVERSITY IN ST LOUIS
Ethernet Forwarding Scenario 1
Packet (IP hdr + data) arrives at the MSR with Destination Addr: 192.163.204.2
• Destination host is on a local network. The output port must map the destination IP address to a MAC address.
• Use the Address Resolution Protocol to map 192.163.204.2 to 08:00:20:7C:E3:25.
• Encapsulate the datagram in an Ethernet frame and send.
Diagram labels:
• Hosts
– IP: 192.163.204.2, MAC: 08:00:20:7C:E3:25
– IP: 192.163.204.3, MAC: 08:00:20:7C:F2:45
– IP: 192.163.150.3, MAC: 08:00:20:54:6C:4A
– IP: 192.163.150.2, MAC: 00:40:33:A3:4C:04
• MSR Port 1: IP: 192.163.204.2, MAC: 00:00:5E:04:00:01
• Router
– Port 0: IP: 192.163.204.4, MAC: 00:01:03:7C:23:03
– Port 1: IP: 192.163.150.1, MAC: 00:01:03:7C:56:34
• The MSR, the hosts and the Router are connected through two Ethernet Switches (port labels P0, P1, P3).
Fred Kuhns - 1/9/01
Ethernet Forwarding Scenario 2
Packet (IP hdr + data) arrives at the MSR with Destination Addr: 192.163.150.2
• Destination host is NOT on a locally attached network. The output port must send to the next hop router.
• The next hop router's IP address must be used in the ARP request: map 192.163.204.4 to 00:01:03:7C:23:03.
• Encapsulate the datagram in an Ethernet frame and send.
• The Router forwards to the final destination host.
Diagram labels:
• Hosts
– IP: 192.163.204.2, MAC: 08:00:20:7C:E3:25
– IP: 192.163.204.3, MAC: 08:00:20:7C:F2:45
– IP: 192.163.150.3, MAC: 08:00:20:54:6C:4A
– IP: 192.163.150.2, MAC: 00:40:33:A3:4C:04
• MSR Port 1: IP: 192.163.204.2, MAC: 00:00:5E:04:00:01
• Router
– Port 0: IP: 192.163.204.4, MAC: 00:01:03:7C:23:03
– Port 1: IP: 192.163.150.1, MAC: 00:01:03:7C:56:34
• The MSR, the hosts and the Router are connected through two Ethernet Switches (port labels P0, P1, P3).
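The choice between the two scenarios can be sketched as a small function: if the destination is on the attached network, ARP for the destination itself, otherwise ARP for the next hop router. This is a minimal sketch, not the MSR implementation; the function name arp_target and the /24 prefix length are assumptions made for illustration.

```python
import ipaddress

def arp_target(dst_ip, local_net, next_hop_ip):
    """Return the IP address that must be resolved with ARP for dst_ip."""
    dst = ipaddress.ip_address(dst_ip)
    if dst in ipaddress.ip_network(local_net):
        return str(dst)       # Scenario 1: destination is on the local network
    return next_hop_ip        # Scenario 2: resolve the next hop router instead

# Addresses from the two scenarios; the /24 prefix is an assumption.
LOCAL_NET = "192.163.204.0/24"
ROUTER = "192.163.204.4"
print(arp_target("192.163.204.2", LOCAL_NET, ROUTER))
print(arp_target("192.163.150.2", LOCAL_NET, ROUTER))
```

In both cases the IP destination address inside the datagram is unchanged; only the Ethernet destination address differs.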
Ethernet Frame Format
• Ethernet Header
– Destination Address (6 B)
– Source Address (6 B)
– Ether Type (2 B)
• IP Datagram
– IP Header: Version, H-length, TOS, Total length | Identification, Flags, Fragment offset | TTL, Protocol, IP Header checksum | IP Source Address | IP Destination Address
– Transport Header
IP Encapsulation in Ethernet Frames
• Ethernet frame size: 64 - 1518 Bytes
• if type <= 1500, then IEEE frame, otherwise Ethernet V2.
Ethernet Encapsulation, RFC 894
  dst address (6) | src address (6) | type 0800 | Data (46 - 1500) | Pad (0 - 46) | FCS (4)
IEEE 802.3/802.2 encapsulation, RFC 1042
  dst address (6) | src address (6) | len (2) | 802.2 LLC/SNAP | Data (38 - 1492) | Pad (0 - 46) | FCS (4)
• 0 <= len <= 1500
• 802.2 LLC: DSAP = AA, SSAP = AA, ctl = 03
• 802.2 SNAP: Org Code = 00, type = 0800
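The RFC 894 layout can be sketched in a few lines: a 14 byte header, the datagram, and zero padding up to the 46 byte minimum. This is an illustrative sketch only; the function names are made up, the FCS is assumed to be appended by hardware, and the 0x0600 threshold is the usual 802.3 boundary between length values and EtherType values.

```python
import struct

ETHERTYPE_IPV4 = 0x0800

def is_ethernet_v2(type_or_len):
    # Values up to 1500 are an IEEE 802.3 length; EtherTypes start at 0x0600.
    return type_or_len >= 0x0600

def encapsulate_rfc894(dst_mac, src_mac, datagram):
    """Build an Ethernet V2 frame (FCS excluded) carrying an IP datagram."""
    if len(datagram) > 1500:
        raise ValueError("datagram exceeds the 1500 byte Ethernet payload")
    header = dst_mac + src_mac + struct.pack("!H", ETHERTYPE_IPV4)
    pad = b"\x00" * max(0, 46 - len(datagram))   # pad short payloads to 46 bytes
    return header + datagram + pad

# A 20 byte datagram is padded so that frame + 4 byte FCS reaches 64 bytes.
frame = encapsulate_rfc894(bytes.fromhex("0800207ce325"),
                           bytes.fromhex("00005e040001"),
                           bytes(20))
print(len(frame))  # 60
```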
ARP Frame
Destination Address (6B)
Source Address (6B)
Ether Type (2B)
Hardware Address Space (2B)
Protocol Address Space (2B)
Byte length of Hardware address = 6 (1B)
Byte length of Protocol address = 4 (1B)
Operation Code: 1 = Request, 2 = Reply (2B)
Hardware Address of Sender (6 B)
Protocol Address of Sender (4 B)
Hardware Address of Destination (6 B)
Protocol Address of Destination (4 B)
ARP Message Formats
Ethernet Frame with ARP Request/Reply - 64 Bytes:
  Ethernet Header (14 B) | ARP Message (28 Bytes for Request or Reply) | 18 Byte Pad | FCS (4B)
  Ethernet Data - Pad with zeros to 46 Bytes
Host A: Eth <eth-A>, IP <ip-A>. Host B: Eth <eth-B>, IP <ip-B>.

Field         ARP Request (01)     ARP Reply (02)
dst address   ff:ff:ff:ff:ff:ff    <eth-A>
src address   <eth-A>              <eth-B>
type          0806                 0806
has           0001                 0001
pas           0800                 0800
hl            6                    6
pl            4                    4
op            01                   02
sha           <eth-A>              <eth-B>
spa           <ip-A>               <ip-B>
tha           <??>                 <eth-A>
tpa           <ip-B>               <ip-A>
pad           18 Bytes             18 Bytes
FCS           xx                   xx
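As a worked example of the request layout, the sketch below packs the fields in order and pads the result to the 46 byte minimum Ethernet payload. It is an illustration only: the function name is hypothetical, the unknown target hardware address is filled with zeros, and the FCS is left to the hardware.

```python
import socket
import struct

def build_arp_request(sender_mac, sender_ip, target_ip):
    """Return a 60 byte Ethernet frame (FCS excluded) holding an ARP request."""
    eth = b"\xff" * 6 + sender_mac + struct.pack("!H", 0x0806)   # dst, src, type
    arp = struct.pack("!HHBBH", 0x0001, 0x0800, 6, 4, 1)         # has, pas, hl, pl, op
    arp += sender_mac + socket.inet_aton(sender_ip)              # sha, spa
    arp += b"\x00" * 6 + socket.inet_aton(target_ip)             # tha unknown, tpa
    return eth + arp + b"\x00" * 18                              # 28 + 18 = 46 bytes

frame = build_arp_request(bytes.fromhex("00005e040001"),
                          "192.163.204.1", "192.163.204.2")
print(len(frame))  # 60, i.e. 64 bytes on the wire once the FCS is added
```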
IP over ATM (rfc 791 and 2684)
• IP Datagram
– IP Header: Version, H-length, TOS, Total length | Identification, flags, Fragment offset | TTL, protocol, Header checksum | Source Address | Destination Address | Options ??
– IP data (transport header and transport data)
• AAL5 padding (0 - 47 bytes)
• AAL5 Trailer: CPCS-UU (0) | CPI (0) | Length (IP packet + LLC/SNAP) | CRC
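The padding arithmetic can be worked through as follows: the payload plus padding plus trailer must fill a whole number of 48 byte cell payloads. This sketch assumes an 8 byte LLC/SNAP header (RFC 2684) and the standard 8 byte AAL5 trailer; the function name is made up for illustration.

```python
def aal5_layout(ip_len, llc_snap_len=8, trailer_len=8, cell_payload=48):
    """Return (length field, pad bytes, number of cells) for one IP packet."""
    payload = ip_len + llc_snap_len               # value carried in the Length field
    pad = -(payload + trailer_len) % cell_payload # bytes needed to reach a cell boundary
    cells = (payload + pad + trailer_len) // cell_payload
    return payload, pad, cells

print(aal5_layout(20))   # (28, 12, 1)
print(aal5_layout(40))   # (48, 40, 2)
```

A 40 byte packet just misses fitting in one cell once the trailer is added, so it costs a second cell that is mostly padding.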
IP Header Fields (rfc 791)
• Version - support IPv4 (4)
• Header Length - Length in 32 bit words (>= 5)
• TOS - Prec. | D T R 0 0
– TOS Precedence Field:
• 111 - Network Control
• 110 - Internetwork Control
• 101 - CRITIC/ECP
• 100 - Flash Override
• 011 - Flash
• 010 - Immediate
• 001 - Priority
• 000 - Routine
– Remaining TOS Fields:
• D - 1 = Low delay
• T - 1 = High Throughput
• R - 1 = High Reliability
• Total Length - Length of datagram in octets
• Id - Assists in reassembling fragments
• Flags - 0 DF MF
– DF - 1 = Don’t Fragment
– MF - 1 = More Fragments
• Fragment Offset - Where fragment belongs, offset is in units of 8 octets
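To make the bit layouts concrete, the sketch below decodes a TOS byte and the 16 bit flags/fragment-offset word. It is illustrative only; the function names are assumptions, and the offset is converted to octets using the RFC 791 rule that the field counts units of 8 octets.

```python
PRECEDENCE = ["Routine", "Priority", "Immediate", "Flash",
              "Flash Override", "CRITIC/ECP",
              "Internetwork Control", "Network Control"]

def decode_tos(tos):
    """Split the TOS byte into precedence (top 3 bits) and the D, T, R bits."""
    return {"precedence": PRECEDENCE[tos >> 5],
            "D": bool(tos & 0x10),
            "T": bool(tos & 0x08),
            "R": bool(tos & 0x04)}

def decode_frag(word):
    """Split the 16 bit word into DF, MF and the fragment offset in octets."""
    return {"DF": bool(word & 0x4000),
            "MF": bool(word & 0x2000),
            "offset_octets": (word & 0x1FFF) * 8}

print(decode_tos(0xB0))     # precedence CRITIC/ECP with low delay requested
print(decode_frag(0x2005))  # more fragments follow, offset 40 octets
```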
IP Header Fields
• TTL - router must decrement, if 0 then discard packet
• Protocol - UDP/TCP/ICMP/RSVP to name a few
• Header Checksum - 16 bit one’s complement of the one’s complement sum of all 16 bit words in header
• Source Address - Sending host’s IP address
• Destination Address - Destination host’s IP address
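The header checksum rule can be checked with a short sketch: sum the header as 16 bit words, fold any carries back in, and complement the result. The function name and the sample header bytes are illustrative and do not come from the slides.

```python
def ip_checksum(header):
    """One's complement of the one's complement sum of all 16 bit words."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(int.from_bytes(header[i:i + 2], "big")
                for i in range(0, len(header), 2))
    while total >> 16:                        # fold carries (end-around carry)
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Sample 20 byte header with the checksum field (bytes 10-11) set to zero.
hdr = bytes.fromhex("45000073000040004011" "0000" "c0a80001c0a800c7")
print(hex(ip_checksum(hdr)))  # 0xb861
```

A router that decrements the TTL has to update this checksum as well; recomputing it over a header that already contains a valid checksum yields 0.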
Packet Routing Within MSR
[Diagram: Ingress path Link Interface → FPX → SPC → WUGS; Egress path WUGS → SPC → FPX → Link Interface. SPC blocks: plugins, FIPL, IP proc, shim demux, shim update. FPX blocks: shim proc., add shim, rem shim.]
• Ingress (from previous hop router or endstation)
– Inbound VC = SPI + ExtBase, 0 <= SPI <= 15
– Currently support at most 4 Inbound VCs: One for Ethernet or Four for ATM
• Egress
– Outbound VC = SPI + ExtBase, 0 <= SPI <= 15
– Currently support at most 4
• Internal VCs (InVC, OutVC)
– out port + IntBase (64 ... 127)
– in port + IntBase (64 ... 127)
• Current VCI Support
– 1) 64 Ports (PN)
– 2) 16 sub-ports (SP)
• Ethernet: Base VC used for directly attached hosts, subports are for next hop routers
• ATM uses VCs as link layer address.
• IP processing for FPX
– 1. Broadcast and Multicast destination address
– 2. IP options
– 3. ICMP messages
– 4. Packet not recognized
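The VC numbering rules can be written down as two small helpers. This is only a sketch under stated assumptions: EXT_BASE = 50 is taken from the Base VC quoted for the GigE interface, and INT_BASE = 64 is inferred from the internal range 64 ... 127 with 64 ports; neither constant is given explicitly for this slide.

```python
EXT_BASE = 50   # assumption: the Base VC used on the external link
INT_BASE = 64   # assumption: yields the internal range 64 ... 127

def external_vc(spi):
    """Inbound/Outbound VC on the link for sub-port index SPI (0..15)."""
    if not 0 <= spi <= 15:
        raise ValueError("SPI must be in 0..15")
    return spi + EXT_BASE

def internal_vc(port):
    """VC used across the switch for port number 0..63."""
    if not 0 <= port <= 63:
        raise ValueError("port must be in 0..63")
    return port + INT_BASE

print(external_vc(3))   # 53
print(internal_vc(63))  # 127
```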
GigE Link Interface
From FPX/SPC: [IP Header | data | AAL5 trailer] → To Next Hop or Endstation: [Ethernet | IP Header | data]
• Endsystem, broadcast or multicast address: Pkt VC = 50
– Send to pkt->dst
– if bcast or mcast, map to eaddr (map multicast or broadcast to ethernet address)
– else resolve w/ARP
• To a next hop router
– NH #1 = Base + 1 = 51
– NH #2 = Base + 2 = 52
– NH #3 = Base + 3 = 53
– if VC != 50, lookup VC in VIN table; returns IP used for ARP lookup (support N = 4)
• VIN Table - 4 entries
  VC    MyIP     NhIP
  50    MyIP0    0
  51    MyIP0    NhIP0
  52    MyIP1    NhIP1
  53    MyIP2    NhIP2
– Software creates VIN table at boot time by writing to interface.
• ARP Table (M Entries)
  IP      MAC
  IP1     MAC1
  ...     ...
  IPM     MACM
– No ARP entry aging!
– If ARP table lookup fails, send ARP request to broadcast address, drop packet. No retries are made.
• Add Ethernet header using the derived destination address and our source address. Protocol is IP.
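The outbound lookup can be sketched as follows: the Base VC resolves the packet's own destination, any other VC resolves the next hop stored in the VIN table, and an ARP miss triggers a request while the packet is dropped. The table contents and the function name are hypothetical placeholders, and the multicast/broadcast mapping is left out.

```python
BASE_VC = 50

# Hypothetical VIN table contents: VC -> next hop IP address.
VIN_NEXT_HOP = {51: "192.163.204.4", 52: "192.163.150.1", 53: "10.0.0.1"}

# Hypothetical ARP table contents: IP -> MAC. Entries never age.
ARP_TABLE = {"192.163.204.2": "08:00:20:7C:E3:25",
             "192.163.204.4": "00:01:03:7C:23:03"}

def resolve(vc, ip_dst):
    """Return ('send', mac), or ('arp_request', ip) when the packet is dropped."""
    target = ip_dst if vc == BASE_VC else VIN_NEXT_HOP[vc]
    mac = ARP_TABLE.get(target)
    if mac is None:
        return ("arp_request", target)   # one request, packet dropped, no retry
    return ("send", mac)

print(resolve(50, "192.163.204.2"))   # directly attached host
print(resolve(51, "192.163.150.2"))   # forwarded via next hop 192.163.204.4
print(resolve(52, "192.163.150.2"))   # next hop not yet in the ARP table
```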
Ethernet Assigned Numbers
• RFC1700 obsoleted by online database at IANA:
– http://www.iana.org/assignments/ethernet-numbers
• Ethernet Address - 6 octets:
– 3 high-order octets = Organizationally Unique Identifier (OUI)
– 3 low-order octets = the interface number
• Multicast bit = lsb of the most significant byte (xxxx xxx1)
– first byte odd => multicast or broadcast
– first byte even => unicast address
– multicast address = ((OUI | 0x010000) << 24) | Group_ID
• Ethernet Broadcast: FF:FF:FF:FF:FF:FF
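The address tests above reduce to simple operations on the first octet. The helper names below are made up for illustration.

```python
def parse_mac(text):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 bytes."""
    return bytes(int(part, 16) for part in text.split(":"))

def is_group_address(mac):
    # Multicast bit is the least significant bit of the first octet.
    return bool(mac[0] & 0x01)

def is_broadcast(mac):
    return mac == b"\xff" * 6

def oui(mac):
    # The three high-order octets identify the organization.
    return mac[:3]

print(is_group_address(parse_mac("01:00:5E:00:00:01")))  # True
print(is_group_address(parse_mac("08:00:20:7C:E3:25")))  # False
print(is_broadcast(parse_mac("FF:FF:FF:FF:FF:FF")))      # True
```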
IP and Ethernet Multicast
• IANA has allocated address block with OUI = 00:00:5E
– Used for unicast addresses for "IETF standard track protocols"
– Half of Multicast addresses reserved for IP, remaining for "special use". Leaves 23 bits for multicast addresses:
• 01:00:5E:00:00:00 to 01:00:5E:7F:FF:FF
– Could use this block for our interface, see ethernet numbers
• IP Multicast
– Class D address, 0xE0000000 + 28 Bit Group ID
– 224.0.0.0 to 239.255.255.255 (0xE0000000 - 0xEFFFFFFF)
• IP to Ethernet Mapping
– RFC1112 - Host Extensions for IP Multicasting
– Non-unique mapping: 28 bit IP group to 23 bit Ethernet group
• 32 IP multicast groups per mapped ethernet multicast address.
Multicast: IP to Ethernet Mappings
• Network Byte Ordering, Internet Standard Bit order (Big-Endian): bit 0 is the MSB
• Ethernet multicast address (bit 0 = MSB ... bit 47 = LSB):
  0000 0001 0000 0000 0101 1110 0xxx xxxx xxxx xxxx xxxx xxxx
– Multicast Bit: lsb of the first byte
– Bits 0 - 24: Block of Ethernet Multicast Address
– Bits 25 - 47: 23 bits taken from the IP group address
• IP multicast address (bit 0 = msb ... bit 31 = lsb):
  1110 xxxx xxxx xxxx xxxx xxxx xxxx xxxx
– Bits 0 - 3: Class D (Multicast)
– Bits 4 - 8: Not Used in IP to Ethernet Mapping
– Bits 9 - 31: 23 bits copied into the Ethernet address
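The mapping can be sketched directly from the bit layout: keep the low 23 bits of the Class D address and place them under the 01:00:5E prefix. The function name is an assumption made for illustration.

```python
import socket
import struct

def ip_multicast_to_mac(group):
    """Map a Class D IP address to its Ethernet multicast address (RFC 1112)."""
    addr = struct.unpack("!I", socket.inet_aton(group))[0]
    if addr >> 28 != 0xE:
        raise ValueError("not a Class D (multicast) address")
    mac = 0x01005E000000 | (addr & 0x7FFFFF)   # low 23 bits of the group
    return ":".join(f"{b:02X}" for b in mac.to_bytes(6, "big"))

print(ip_multicast_to_mac("224.0.0.1"))        # 01:00:5E:00:00:01
print(ip_multicast_to_mac("224.128.0.1"))      # 01:00:5E:00:00:01 (same address)
print(ip_multicast_to_mac("239.255.255.255"))  # 01:00:5E:7F:FF:FF
```

The second call shows the non-unique mapping: the 5 unused bits mean 32 IP groups share each Ethernet multicast address, so the receiver still has to filter on the IP destination.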
IP Broadcast
• No Direct Impact on GigE Interface
• IP Broadcast: by default, we will not forward directed broadcasts.
– Limited broadcast:
• {-1, -1}. Must not be forwarded, Destination address only
– Directed broadcast:
• {Network-Number, -1}, destination address only.
– Subnet Directed Broadcast:
• {Network-Number, Subnet-Number, -1}
– Directed Broadcast to all subnets:
• {Network-Number, -1, -1}
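A small classifier illustrates the difference between the limited and directed forms and the default forwarding rule. It is a sketch only: the function names are hypothetical, the /24 network is an assumed example, and the all-subnets form is not handled.

```python
import ipaddress

def broadcast_kind(dst, network):
    """Classify dst as 'limited', 'directed' or None for the given network."""
    addr = ipaddress.ip_address(dst)
    if addr == ipaddress.ip_address("255.255.255.255"):
        return "limited"                       # {-1, -1}
    if addr == ipaddress.ip_network(network).broadcast_address:
        return "directed"                      # {Network-Number, -1}
    return None

def should_forward(dst, network):
    # Default policy: neither limited nor directed broadcasts are forwarded.
    return broadcast_kind(dst, network) is None

print(broadcast_kind("255.255.255.255", "192.163.204.0/24"))  # limited
print(broadcast_kind("192.163.204.255", "192.163.204.0/24"))  # directed
print(should_forward("192.163.204.2", "192.163.204.0/24"))    # True
```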
Unicast - If we use the IANA Block
• Ethernet address (bit 0 = MSB ... bit 47 = LSB), Multicast Bit set to 0:
  0000 0000 0000 0000 0101 1110 0000 0100 xxxx xxxx xxxx xxxx
– Bits 0 - 23: IANA Block of Ethernet Addresses (00:00:5E)
– Bits 24 - 31: ARL (04)
– Bits 32 - 47: Interface Number (16 bits)
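Under this layout an interface address is the fixed prefix 00:00:5E:04 followed by a 16 bit interface number. The helper below is an illustrative sketch with a made-up name.

```python
def msr_unicast_mac(interface_number):
    """Build 00:00:5E:04:xx:xx from a 16 bit interface number."""
    if not 0 <= interface_number <= 0xFFFF:
        raise ValueError("interface number must fit in 16 bits")
    value = 0x00005E040000 | interface_number
    return ":".join(f"{b:02X}" for b in value.to_bytes(6, "big"))

print(msr_unicast_mac(1))  # 00:00:5E:04:00:01
```

Interface number 1 gives the MAC address shown for MSR Port 1 in the forwarding scenarios.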
GigE Link Interface
From Next Hop or Endstation: [Ethernet | IP Header | data] → To FPX/SPC on Base VC: [IP Header | data | AAL5 trailer]
*Unicast MAC address filtering
ARP Table (M Entries): IP1 → MAC1, ..., IPM → MACM

receive ethernet frame: eth
if (eth->type == ARP)
    if (eth->arp->has != Ethernet/0001) Drop Frame
    if (eth->arp->pas != IP/0800) Drop Frame
    update {eth->arp->spa, eth->arp->sha} in ARP table
    if (eth->arp->tpa NOT in {MyIP0, MyIP1, MyIP2})
        Drop Frame // target IP not ours
    if (eth->arp->op == Request/01) {
        swap source and target ARP info
        set operation to Reply
        set ether header src and dst address
        send reply
    }
    // Already handled eth->arp->op == Reply/02
    // when updated cache above
else if (eth->type == IPv4)
    remove ethernet header, padding and CRC
    add AAL5 trailer and required padding
    break into cells and send on default Base VC
else
    Error, drop packet
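The ARP branch of the receive logic can be sketched in executable form. This is a simplified illustration rather than the interface logic itself: the ARP message is modelled as a dict with the field names from the ARP slides, and handle_arp, my_ips and my_mac are hypothetical names.

```python
def handle_arp(arp, arp_table, my_ips, my_mac):
    """Update the ARP table and return a reply dict, or None if nothing is sent."""
    if arp["has"] != 0x0001 or arp["pas"] != 0x0800:
        return None                          # not Ethernet/IP: drop frame
    arp_table[arp["spa"]] = arp["sha"]       # learn the sender (covers replies too)
    if arp["tpa"] not in my_ips:
        return None                          # target IP not ours: drop frame
    if arp["op"] == 1:                       # request: swap sender and target
        return {"has": 0x0001, "pas": 0x0800, "op": 2,
                "sha": my_mac, "spa": arp["tpa"],
                "tha": arp["sha"], "tpa": arp["spa"]}
    return None                              # reply already handled above

table = {}
request = {"has": 1, "pas": 0x0800, "op": 1,
           "sha": "08:00:20:7C:E3:25", "spa": "192.163.204.2",
           "tha": "00:00:00:00:00:00", "tpa": "192.163.204.1"}
reply = handle_arp(request, table, {"192.163.204.1"}, "00:00:5E:04:00:01")
print(reply["op"], reply["tha"], table)
```

Note that the sender is cached before the target check, so the table also learns from requests that are addressed to other hosts.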
Notes
• Packet Received on ATM interface:
– If received on Base_VC (i.e. 50) then
• map IP destination (ip->dst_addr) to ethernet representation.
• Unicast uses ARP table, multicast and broadcast use appropriate mapping.
– Otherwise,
• lookup VC in VIN table: Table entry index = RX_VC - Base_VC.
• ARP the resulting Next Hop IP address.
– This permits a simple mechanism for “tunneling” traffic to a
gateway. This allows us to support directed broadcast and provides
a convenient mechanism for testing.
• Packet received on Ethernet interface:
– if IPv4 then send all (unicast, multicast and broadcast) to input port
processor on the Base_VC (i.e. 50)
ARP Cache
• IP Address = Network_Prefix.Host or simply Net.Host
– Assume a prefix length of at least 24 bits, leaves 8 bits for the host
– An interface can have at most 3 unique IP addresses
• Interface may communicate with at most 256 hosts per network
• Implement ARP cache as a table with 768 entries (3 * 256)
• See next slide
VIN Table
  Entry Number   Prefix Mask   Local IP Address   Next Hop IP Address
  0              Mask0         MyIP0              NH0
  1              Mask1         MyIP1              NH1
  2              Mask2         MyIP2              NH2
• Net 0 = Mask0 & MyIP0
• Net 1 = Mask1 & MyIP1
• Net 2 = Mask2 & MyIP2
ARP Table
  Network   IP         Ethernet
  Net 0     IP0,0      Ether0,0
            ...        ...
            IP0,255    Ether0,255
  Net 1     IP1,0      Ether1,0
            ...        ...
            IP1,255    Ether1,255
  Net 2     IP2,0      Ether2,0
            ...        ...
            IP2,255    Ether2,255
Implementing the ARP Table
‘get next packet’:
    // received frame from ATM interface
    if (RX_VC == Base_VC)
        ipdst = ip->dst_addr;
    else
        ipdst = VIN_Table[RX_VC - Base_VC].NextHop
    // ipdst == IP Address of host we must send packet to
    // determine network
    for (i = 0; i < 3; i++) {
        if ((ipdst & Maski) == (MyIPi & Maski)) {
            // host bits come from ipdst, which may be the next hop
            index = (i << 8) | (ipdst & ~Maski)
            break;
        }
    }
    if (i == 3) drop packet, goto ‘get next packet’
    // i corresponds to the Network Number (0 - 2)
    if (ArpTable[index].EtherAddress != 00:00:00:00:00:00) {
        construct ethernet frame
        send packet
        goto ‘get next packet’
    } else {
        send ARP Request for ipdst
        drop packet, goto ‘get next packet’
    }
• VIN Table and ARP Table are laid out as on the ARP Cache slide.
• The ARP Table is addressed by index, so we don’t need to store the IP address.
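The index computation can be tried out with a short sketch: the network number selects one of three 256 entry blocks and the host bits select the entry inside it. The masks and local addresses below are made-up examples, the function names are hypothetical, and a prefix length of at least 24 bits is assumed as stated for the ARP cache.

```python
import ipaddress

def ip(text):
    return int(ipaddress.ip_address(text))

# Hypothetical VIN table: (prefix mask, local IP address) per network.
VIN = [(ip("255.255.255.0"), ip("192.163.204.1")),
       (ip("255.255.255.0"), ip("192.163.150.1")),
       (ip("255.255.255.0"), ip("10.1.2.1"))]

def arp_index(ipdst, vin):
    """Return the ARP table index (0..767) for ipdst, or None to drop."""
    for i, (mask, my_ip) in enumerate(vin):
        if (ipdst & mask) == (my_ip & mask):
            host = ipdst & ~mask & 0xFF     # at most 8 host bits
            return (i << 8) | host
    return None                             # not on any attached network

print(arp_index(ip("192.163.204.2"), VIN))  # 2
print(arp_index(ip("192.163.150.3"), VIN))  # 259
print(arp_index(ip("172.16.0.1"), VIN))     # None
```

Because the index already encodes the network and host number, a non-zero Ethernet address at that index is enough to mark a valid entry.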
Notes and Issues
• GigE Control Interface for Software configuration.
1. Reset interface to defaults
2. Clear ARP cache
3. Read ARP table
4. Read VIN table
5. Read ethernet address
6. Set VIN table entries and other registers
• Set BASE VC (currently 50)
• Set Entries in the VIN table
• Add static ARP entries??
Notes and Issues
• Comprehensive testing scenarios need defining
• verify multicast and broadcast
• VC to control line card
References
• RFC 1122 - Requirements for Internet Hosts
– Must send and receive using RFC-894 - compliant
– Should receive RFC-1042 mixed with RFC-894 - we do not
– May send using RFC-1042 - we do not
– Must use ARP
– Must flush out-of-date ARP cache entries - not compliant
– Must prevent ARP floods - we only try once
– Should have configurable ARP cache timeout - no
– Should save at least one (latest) unresolved (by ARP) packet - no
– Must report broadcasts to IP layer - compliant
– IP layer Must pass TOS to link layer - via the header
– Must Not report no ARP entry as “destination unreachable” - compliant
References
• RFC-826 : Address Resolution Protocol
– Maps <protocol, address> to 48 bit Ethernet address
– our processing differs in minor ways
• RFC 1700 : Assigned Numbers
– Ethertype values defined by RFC 1700
– IP to ethernet multicast address mapping defined
• RFC-1812 : Requirements for IPv4 Routers
– Must not believe ARP reply if it contains a multicast or broadcast address - not compliant
– Must be compliant with RFC 1122 - Partial
• Support Ethernet V2 only
– RFC 894: IP encapsulation in Ethernet V2 - Supported
– RFC 1042: IP encapsulation in 802.3 frames - Not Supported