IP - Washington University in St. Louis

Gigabit Ethernet Interface
for the MSR
Fred Kuhns
Applied Research Laboratory
Washington University
St. Louis Mo.
Washington
WASHINGTON UNIVERSITY IN ST LOUIS
Overview
• Two Example Ethernet Scenarios
• Relevant Ethernet and ARP specifications
– General standards
– ATM AAL5 and Ethernet Frame Formats
– Ethernet Addressing
– The Address Resolution Protocol
• Routing in the MSR
• GigE Interface protocol processing requirements
– packet processing
– ARP
Fred Kuhns - 4/11/2016
Ethernet Forwarding Scenario 1
[Figure: hosts, two Ethernet switches, the MSR (ports P1 and P3) and a router (ports P0 and P1)]
• Packet (IP hdr, data) arrives with Destination Addr: 192.168.204.2
• Destination host is on the local network. Output port must map destination IP address to MAC address.
• Use the Address Resolution Protocol to map 192.168.204.2 to 08:00:20:7C:E3:25.
• Encapsulate datagram in Ethernet frame and send.
Addresses shown in the figure:
– MSR Port 1: IP: 192.163.204.2, MAC: 00:00:5E:04:00:01
– Host: IP: 192.163.204.2, MAC: 08:00:20:7C:E3:25
– Host: IP: 192.163.204.3, MAC: 08:00:20:7C:F2:45
– Host: IP: 192.163.150.3, MAC: 08:00:20:54:6C:4A
– Host: IP: 192.163.150.2, MAC: 00:40:33:A3:4C:04
– Router Port 0: IP: 192.163.204.4, MAC: 00:01:03:7C:23:03
– Router Port 1: IP: 192.163.150.1, MAC: 00:01:03:7C:56:34
Ethernet Forwarding Scenario 2
[Figure: hosts, two Ethernet switches, the MSR (ports P1 and P3) and a router (ports P0 and P1)]
• Packet (IP hdr, data) arrives with Destination Addr: 192.168.150.2
• Destination host is NOT on a locally attached network. Output port must send to the next hop router.
• Next hop router IP address must be used in the ARP request: map 192.168.204.4 to 00:01:03:7C:23:03.
• Encapsulate datagram in Ethernet frame and send.
• Router forwards to final destination host.
Addresses shown in the figure:
– MSR Port 1: IP: 192.163.204.2, MAC: 00:00:5E:04:00:01
– Host: IP: 192.163.204.2, MAC: 08:00:20:7C:E3:25
– Host: IP: 192.163.204.3, MAC: 08:00:20:7C:F2:45
– Host: IP: 192.163.150.3, MAC: 08:00:20:54:6C:4A
– Host: IP: 192.163.150.2, MAC: 00:40:33:A3:4C:04
– Router Port 0: IP: 192.163.204.4, MAC: 00:01:03:7C:23:03
– Router Port 1: IP: 192.163.150.1, MAC: 00:01:03:7C:56:34
What is required?
• To simplify the overall system design, Ethernet-specific
processing is confined to the Link Interface
• Interface must:
– Bridge between ATM and Ethernet networks
– Map IP addresses to corresponding Ethernet Addresses:
• send ARP requests and maintain an ARP cache
– Respond to ARP requests from other hosts
• send ARP replies in response to requests
• We are not supporting
– IEEE 802.1P/Q: VLANs, priorities, etc.
Related Specifications
• RFC 1122 - Requirements for Internet Hosts
– Must send and receive using RFC-894 - compliant
– Should receive RFC-1042 mixed with RFC-894 - we do not
– May send using RFC-1042 - we do not
– Must use ARP - compliant
– Must flush out-of-date ARP cache entries - not compliant
– Must prevent ARP floods - we only try once
– Should have configurable ARP cache timeout - no
– Should save at least one (latest) unresolved (by ARP) packet - no
– Must report broadcasts to IP layer - compliant
– IP layer Must pass TOS to link layer - via the header
– Must Not report no ARP entry as “destination unreachable” - compliant
Related Specifications - continued
• RFC-826 : Address Resolution Protocol
– Maps <protocol, address> to Ethernet address
– Minor differences in suggested algorithm
• RFC 1700 : Assigned Numbers - Now an online database
– Managed by the Internet Assigned Numbers Authority (IANA)
– Ethertype values and IP to ethernet multicast address mapping
• RFC-1812 : Requirements for IPv4 Routers
– Must not believe an ARP reply if it contains a multicast or broadcast
address - not compliant
– Must be compliant with RFC 1122 - Partial
• Support Ethernet V2 only
– RFC 894: IP encapsulation in Ethernet V2 - Supported
– RFC 1042: IP encapsulation in 802.3 frames - Not Supported
IP over ATM (RFC 791 and RFC 2684)
[Figure: IP datagram carried in an AAL5 frame]
• IP Datagram
– IP Header: Version, H-length, TOS, Total length, Identification, Flags, Fragment offset, TTL, Protocol, Header checksum, Source Address, Destination Address, Options (if any)
– IP data (transport header and transport data)
• AAL5 padding (0 - 47 bytes)
• AAL5 Trailer
– CPCS-UU (0)
– CPI (0)
– Length (IP packet + LLC/SNAP)
– CRC
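The padding length follows from the AAL5 rule that payload, padding and the 8-byte trailer together fill a whole number of 48-byte cells. A minimal sketch of that arithmetic; the function name `aal5_pad_len` is hypothetical, and the payload is assumed to be the LLC/SNAP header plus the IP packet, as in the Length field above:

```python
AAL5_TRAILER_LEN = 8   # CPCS-UU (1) + CPI (1) + Length (2) + CRC (4)
ATM_CELL_PAYLOAD = 48  # payload bytes carried by one ATM cell


def aal5_pad_len(payload_len: int) -> int:
    """Padding bytes needed so payload + pad + trailer is a multiple of 48."""
    if payload_len < 0:
        raise ValueError("payload length must be non-negative")
    return -(payload_len + AAL5_TRAILER_LEN) % ATM_CELL_PAYLOAD


# A 20-byte IP header plus an 8-byte LLC/SNAP header fits in one cell:
# 28 + 12 bytes of padding + 8 bytes of trailer = 48.
print(aal5_pad_len(28))
```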
Ethernet Frame Format
[Figure: Ethernet frame carrying an IP datagram]
• Ethernet Header
– Destination Address (6 B)
– Source Address (6 B)
– Ether Type (2 B)
• IP Datagram
– IP Header: Version, H-length, TOS, Total length, Identification, Flags, Fragment offset, TTL, Protocol, IP Header checksum, IP Source Address, IP Destination Address
– Transport Header
IP Encapsulation in Ethernet Frames
• Ethernet frame size: 64 - 1518 Bytes
• if type ≤ 1500, then IEEE frame (the field is a length), otherwise Ethernet V2.
Ethernet Encapsulation, RFC 894 - Supported
– dst address (6)
– src address (6)
– type (2) = 0800
– Data (46-1500), including Pad (0-46)
– FCS (4)
IEEE 802.3/802.2 encapsulation, RFC 1042 - Not Supported
– dst address (6)
– src address (6)
– len (2), 0 ≤ len ≤ 1500
– 802.2 LLC: DSAP = AA, SSAP = AA, ctl = 03
– 802.2 SNAP: Org Code = 00, type = 0800
– Data (38 - 1492), including Pad (0-46)
– FCS (4)
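The type/length test above can be sketched in a few lines. This is an illustration, not the interface's implementation: `classify_frame` and its return labels are hypothetical names, and the 1500 threshold is the one stated on this slide.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806
MAX_8023_LENGTH = 1500  # at or below this the field is an IEEE 802.3 length


def classify_frame(frame: bytes) -> str:
    """Classify a frame by the 2-byte field that follows the two MAC addresses."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    (type_or_len,) = struct.unpack("!H", frame[12:14])
    if type_or_len <= MAX_8023_LENGTH:
        return "ieee-802.3"  # RFC 1042 style, not supported by the interface
    if type_or_len == ETHERTYPE_IPV4:
        return "ipv4"
    if type_or_len == ETHERTYPE_ARP:
        return "arp"
    return "other"
```

Only the "ipv4" and "arp" cases need further processing; the other two are dropped.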
Ethernet Assigned Numbers
• RFC1700 obsoleted by online database at IANA:
– http://www.iana.org/assignments/ethernet-numbers
• Ethernet Address - 6 octets:
– 3 high-order octets = Organizationally Unique
Identifier (OUI)
– 3 low-order octets = the interface number
• Multicast bit = least-significant bit of the most-significant (first) byte (xxxx xxx1)
– first byte odd => multicast or broadcast
– first byte even => unicast address
– multicast address = ((OUI | 0x010000) << 24) | Group_ID
• Ethernet Broadcast: FF:FF:FF:FF:FF:FF
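The addressing rules above can be checked with a short sketch. The helper names are hypothetical. The composition assumes the multicast bit is OR-ed into the first of the three OUI octets and the 24-bit group ID is OR-ed into the low-order octets.

```python
def is_group_address(mac: bytes) -> bool:
    """True for multicast or broadcast: the lsb of the first octet is 1."""
    return bool(mac[0] & 0x01)


def multicast_address(oui: int, group_id: int) -> bytes:
    """Set the multicast bit in a 24-bit OUI and append a 24-bit group ID."""
    value = ((oui | 0x010000) << 24) | (group_id & 0xFFFFFF)
    return value.to_bytes(6, "big")


def format_mac(mac: bytes) -> str:
    """Render six octets in the usual colon-separated hex notation."""
    return ":".join(f"{octet:02X}" for octet in mac)


print(format_mac(multicast_address(0x00005E, 0x000001)))
```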
IP and Ethernet Multicast
• IANA has allocated address block with OUI = 00:00:5E
– Used for unicast addresses for “IETF standard track protocols”
– Half of the multicast addresses are reserved for IP, the remainder for
“special use”. This leaves 23 bits for IP multicast addresses:
• 01:00:5E:00:00:00 to 01:00:5E:7F:FF:FF
– Could use this block for our interface; see the IANA ethernet numbers
• IP Multicast
– Class D address, 0xE0000000 + 28 Bit Group ID
– 224.0.0.0 to 239.255.255.255 (0xE0000000 - 0xEFFFFFFF)
• IP to Ethernet Mapping
– RFC1112 - Host Extensions for IP Multicasting
– Non-unique mapping: 28 bit IP group to 23 bit Ethernet group
• 32 IP multicast groups per mapped ethernet multicast address.
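A minimal sketch of the RFC 1112 mapping described above: the low-order 23 bits of the group address are placed into the 01:00:5E:00:00:00 block. The function name `ip_multicast_to_mac` is hypothetical.

```python
import ipaddress

IANA_MULTICAST_BASE = 0x01005E000000  # 01:00:5E:00:00:00
LOW_23_BITS = 0x7FFFFF


def ip_multicast_to_mac(group: str) -> str:
    """Map a Class D IPv4 address to its Ethernet multicast address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a Class D address")
    mac = (IANA_MULTICAST_BASE | (int(addr) & LOW_23_BITS)).to_bytes(6, "big")
    return ":".join(f"{octet:02X}" for octet in mac)


# The five group bits above the low 23 are discarded, so these two
# groups share one Ethernet address.
print(ip_multicast_to_mac("224.0.0.1"), ip_multicast_to_mac("224.128.0.1"))
```

Because the mapping is not unique, a receiver still has to check the IP destination address after the Ethernet filter has passed the frame.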
Multicast: IP to Ethernet Mappings
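• Network Byte Ordering, Internet Standard Bit order (Big-Endian): bit 0 is the MSB, bit 47 the LSB
• Block of Ethernet Multicast Addresses (48 bits):
0000 0001 0000 0000 0101 1110 0xxx xxxx xxxx xxxx xxxx xxxx
– Multicast Bit: lsb of the first byte, set to 1
– bits 0-24: fixed prefix 01:00:5E followed by a 0 bit
– remaining 23 bits (x): copied from the IP group address
• Class D (Multicast) IP address (32 bits, msb to lsb):
1110 xxxx xxxx xxxx xxxx xxxx xxxx xxxx
– bits 0-3: 1110 (Class D)
– bits 4-8: Not Used in IP-to-Ethernet Mapping
– bits 9-31: low-order 23 bits, copied into the Ethernet address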
IP Broadcast
• No Direct Impact on GigE Interface
• IP Broadcast: by default, we will not forward directed
broadcasts.
– Limited broadcast:
• {-1, -1}. Must not be forwarded; destination address only
– Directed broadcast:
• {Network-Number, -1}, destination address only.
– Subnet Directed Broadcast:
• {Network-Number, Subnet-Number, -1}
– Directed Broadcast to all subnets:
• {Network-Number, -1, -1}
Unicast - We can use the IANA Block
Bit layout (bit 0 = MSB, bit 47 = LSB), Multicast Bit set to 0:
0000 0000 0000 0000 0101 1110 0000 0100 xxxx xxxx xxxx xxxx
– bits 0-23: IANA Block of Ethernet Addresses (00:00:5E)
– bits 24-31: ARL (04)
– bits 32-47: Interface Number (16 bits)
WUARL MAC: 00:00:5E:04:XX:XX
ARP Frame
Destination Address (6B)
Source Address (6B)
Ether Type (2B)
Hardware Address Space (2B)
Protocol Address Space (2B)
Byte length of Hardware address = 6 (1B)
Byte length of Protocol address = 4 (1B)
Operation Code: 1 = Request, 2 = Reply (2B)
Hardware Address of Sender (6 B)
Protocol Address of Sender (4 B)
Hardware Address of Destination (6 B)
Protocol Address of Destination (4 B)
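The field list above maps directly onto a packed binary layout. The sketch below builds a broadcast ARP request. It assumes Ethernet (hardware address space 1), IPv4 (protocol address space 0x0800), ether type 0x0806, and zero padding to the 60-byte minimum before the FCS is appended. The function name and the example addresses are hypothetical.

```python
import socket
import struct

ETHERTYPE_ARP = 0x0806
ARP_REQUEST = 1
MIN_FRAME_NO_FCS = 60  # 64-byte minimum frame less the 4-byte FCS


def build_arp_request(src_mac: bytes, src_ip: str, target_ip: str) -> bytes:
    """Return an Ethernet frame (without FCS) carrying an ARP request."""
    ether = struct.pack("!6s6sH", b"\xff" * 6, src_mac, ETHERTYPE_ARP)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware address space: Ethernet
        0x0800,                       # protocol address space: IPv4
        6,                            # hardware address length
        4,                            # protocol address length
        ARP_REQUEST,                  # operation code
        src_mac,                      # hardware address of sender
        socket.inet_aton(src_ip),     # protocol address of sender
        bytes(6),                     # hardware address of destination: unknown
        socket.inet_aton(target_ip),  # protocol address of destination
    )
    frame = ether + arp
    return frame + bytes(MIN_FRAME_NO_FCS - len(frame))


frame = build_arp_request(bytes.fromhex("00005e040001"), "192.168.204.1", "192.168.204.2")
print(len(frame))
```

The header is 14 bytes and the ARP message 28 bytes, so 18 bytes of padding bring the frame to 60 bytes; the 4-byte FCS completes the 64-byte minimum.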
ARP Message Formats
ARP Message (28 Bytes for Request or Reply), carried in an Ethernet Frame of 64 Bytes:
– Ethernet Header (14 B)
– Ethernet Data: ARP message, padded with zeros to 46 Bytes (18 Byte Pad)
– FCS (4B)

Field        ARP Request (01)       ARP Reply (02)
dst address  ff:ff:ff:ff:ff:ff      <eth-A>
src address  <eth-A>                <eth-B>
type         0806                   0806
has          1                      1
pas          0800                   0800
hl           6                      6
pl           4                      4
op           01                     02
sha          <eth-A> (Host A Eth)   <eth-B> (Host B Eth)
spa          <ip-A> (Host A IP)     <ip-B> (Host B IP)
tha          <??>                   <eth-A>
tpa          <ip-B>                 <ip-A>
pad          18 Bytes               18 Bytes
FCS          xx                     xx
Packet Routing, SPC and FPX
[Figure: ingress and egress data paths through the Link Interface, FPX, SPC and the WUGS switch]
• Components shown:
– Link Interface: add shim (ingress), rem shim (egress)
– FPX: shim proc., FIPL
– SPC: shim demux, IP proc, FIPL, plugins, shim update
– WUGS
• Packets arrive from the previous hop router or endsystem.
• GigE Interface will only send on one VCI value (currently = 50).
• GigE interface will use all four Sub-Port identifiers (i.e. four VCI values).
• FPX_VCI: 40 ... 47 (out port + 40) and 40 ... 47 (in port + 40)
• Outbound VC = SPI + 50, 0 <= SPI <= 3
• Current VCI Support:
1) 8 Ports (PN)
2) 4 sub-ports (SP)
• IP eval: IP processing for FPX.
1. Broadcast and Multicast destination address
2. IP options
3. Packet not recognized
Routing in the MSR
• Route tables must map a given destination address to output
port and sub-port identifier (i.e. Virtual Interface Number or
VIN).
– Route table entry: {prefix/length, Output_VIN}
• 192.168.204.0/24, 0x41 (Port 1, Subport 1)
• Output_VIN = {Port # (10 bits), Sub-port # (6 bits)}.
• At input port, packet is sent to the indicated output port:
– VCI = 40 + Port number
• At output port, the sub-port is mapped to an output VCI
value:
– VCI = 50 + Sub-port Identifier
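The VIN and VCI arithmetic above can be sketched as follows. The bit layout is an assumption: the 10-bit port number is taken to occupy the high-order bits and the 6-bit sub-port the low-order bits, which makes (Port 1, Subport 1) equal to 0x41. The function names are hypothetical.

```python
PORT_VCI_BASE = 40     # VCI used towards the output port: 40 + port number
SUBPORT_VCI_BASE = 50  # VCI used on the output link: 50 + sub-port identifier
SUBPORT_BITS = 6
PORT_BITS = 10


def make_vin(port: int, subport: int) -> int:
    """Pack a port and sub-port into a 16-bit Virtual Interface Number."""
    if not (0 <= port < 1 << PORT_BITS and 0 <= subport < 1 << SUBPORT_BITS):
        raise ValueError("port or sub-port out of range")
    return (port << SUBPORT_BITS) | subport


def split_vin(vin: int) -> tuple:
    """Return (port, subport) for a VIN."""
    return vin >> SUBPORT_BITS, vin & ((1 << SUBPORT_BITS) - 1)


def switch_vci(vin: int) -> int:
    """VCI on which the input port sends the packet to the output port."""
    return PORT_VCI_BASE + split_vin(vin)[0]


def link_vci(vin: int) -> int:
    """VCI the output port uses towards the link interface."""
    return SUBPORT_VCI_BASE + split_vin(vin)[1]


vin = make_vin(4, 0)
print(switch_vci(vin), link_vci(vin))
```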
MSR Routing Example
[Figure: Control Processor, Switch Fabric, and per-port processors (Dist. Q. Ctl., Flow/Route Lookup, SPC, Input and Output Port Proc.)]
1. Packet (IP hdr, data) received at input port; route lookup returns (4, 0), i.e. <port = 4, subport = 0>.
2. Input Port Proc.: insert shim with OutVIN, send on VCI = 40 + 4 = 44.
3. Output Port Proc.: remove shim, calculate VCI for subport 0: VCI = 50 + 0 = 50.
4. Send to next hop/endsystem.
Supporting Ethernet
• We can leverage the sub-port identifier to facilitate
IP to ethernet address resolution.
– if packet received on VCI = 50 (subport 0), then use the
IP destination address in the header
– otherwise (subports 1-3), look up the VCI value in a table to
obtain the next hop IP address.
• Once we have the IP address we must map it to the
corresponding Ethernet address.
– We can then implement a simple version of ARP in the
GigE interface card.
GigE Link Interface
From FPX/SPC, to Next Hop or Endstation
[Figure: AAL5 frame (IP Header, data, AAL5 trailer) in; Ethernet frame (Ethernet, IP Header, data) out]
• Pkt VC = 50: Endsystem, Broadcast or Multicast address
– Send to pkt->dst
– if bcast or mcast, map to eaddr (map multicast or broadcast to ethernet address)
– else unicast, resolve w/ARP
• if VC != 50: to a next hop router
– Lookup VC in VIN table; returns IP used for ARP lookup
– NH #0 = Base + 1 = 51
– NH #1 = Base + 2 = 52
– NH #2 = Base + 3 = 53
• VIN Table (Simplified): Entry → NhIP
– 0 → NhIP0
– 1 → NhIP1
– 2 → NhIP2
– Software creates VIN table at boot time by writing to interface.
• ARP Table (Simplified): IP → MAC
– IP1 → MAC1, ..., IPM → MACM
– If ARP table lookup fails, send ARP request to broadcast address, drop packet. No retries are made.
– No ARP entry aging!
• Add Ethernet header using the derived destination address and our corresponding source address.
GigE Link Interface
From Next Hop or Endstation, to FPX/SPC
[Figure: Ethernet frames (Ethernet + ARP, or Ethernet + IP Header + data) arrive from the Ethernet side]
• ARP Table (Simplified): IP → MAC
– IP1 → MAC1, ..., IPM → MACM
if (An ARP packet)
    update Mapping in ARP table
    if not for us then drop
    if (ARP Request)
        swap source and target info
        set operation to Reply
        set ether header
        send reply
else if (An IPv4 packet)
    remove ethernet “stuff”
    add AAL5 trailer/padding
    send on default Base VC
else drop packet
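The receive-side logic above can be sketched as a single function. This is a simplified illustration under stated assumptions, not the hardware design. `handle_frame`, its return tuples, and the dict-based ARP table are hypothetical. AAL5 framing is left to the caller. The IP total-length field is used to strip Ethernet padding.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806
ARP_FORMAT = "!HHBBH6s4s6s4s"  # has, pas, hl, pl, op, sha, spa, tha, tpa


def handle_frame(frame: bytes, my_mac: bytes, my_ips: set, arp_table: dict):
    """Return ("arp-reply", frame), ("to-atm", ip_datagram) or ("drop", None)."""
    _dst, _src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype == ETHERTYPE_ARP:
        has, pas, hl, pl, op, sha, spa, _tha, tpa = struct.unpack(
            ARP_FORMAT, frame[14:42])
        arp_table[spa] = sha              # update mapping in ARP table
        if tpa not in my_ips:             # not for us
            return ("drop", None)
        if op == 1:                       # ARP request: swap and reply
            reply = struct.pack("!6s6sH", sha, my_mac, ETHERTYPE_ARP)
            reply += struct.pack(ARP_FORMAT, has, pas, hl, pl, 2,
                                 my_mac, tpa, sha, spa)
            return ("arp-reply", reply + bytes(60 - len(reply)))
        return ("drop", None)             # ARP reply: mapping already recorded
    if ethertype == ETHERTYPE_IPV4:
        (total_len,) = struct.unpack("!H", frame[16:18])
        return ("to-atm", frame[14:14 + total_len])  # sent on the Base VC
    return ("drop", None)
```

Returning an action tuple instead of sending directly keeps the sketch free of I/O, so the decision logic can be exercised on its own.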
Output to FPX/SPC on Base VC: IP Header, data, AAL5 trailer
Some Details
• Packet Received on ATM interface:
– If received on VCI 50 (i.e. the base VCI) then
• Map IP destination in header (ip->dst_addr) to ethernet MAC address.
• Unicast uses ARP table, multicast and broadcast use appropriate mapping.
– Otherwise (VCI = {51, 52 or 53}),
• lookup VC in VIN table: Table entry index = RX_VC - Base_VC - 1.
For example, packet received on VCI = 53, Index = 53 - 50 - 1 = 2
• ARP the resulting Next Hop IP address.
– This permits a simple mechanism for “directing” traffic to a
gateway. This allows us to support directed broadcast and provides
a convenient mechanism for testing.
• Packet Received on Ethernet interface:
– if IPv4 then send all (unicast, multicast and broadcast) to input port
processor on VCI 50 (i.e. the Base VCI)
ARP Cache
• IP Address = Network_Prefix.Host or simply Net.Host
– Assume a prefix length of at least 24 bits, leaves 8 bits for the host
– An interface can have at most 3 unique IP addresses
• Interface may communicate with at most 256 hosts per network
• Implement ARP cache as a table with 768 entries (3 * 256)
• See next slide
VIN Table
Entry Number | Prefix Mask | Local IP Address | Next Hop IP Address
0            | Mask0       | MyIP0            | NH0
1            | Mask1       | MyIP1            | NH1
2            | Mask2       | MyIP2            | NH2
Net 0 = Mask0 & MyIP0
Net 1 = Mask1 & MyIP1
Net 2 = Mask2 & MyIP2

ARP Table
Network | IP                | Ethernet
Net 0   | IP0,0 ... IP0,255 | Ether0,0 ... Ether0,255
Net 1   | IP1,0 ... IP1,255 | Ether1,0 ... Ether1,255
Net 2   | IP2,0 ... IP2,255 | Ether2,0 ... Ether2,255
Implementing the ARP Table
VIN Table: entries 0-2, each {Prefix Mask (Maski), Local IP Address (MyIPi), Next Hop IP Address (NHi)}
ARP Table: 768 rows, {IPn,0 ... IPn,255} → {Ethern,0 ... Ethern,255} for n = 0 - 2, selected by index; don’t need to store IP address

‘get next packet’:
    // received frame from ATM interface
    if (RX_VC == Base_VC)
        ipdst = ip->dst_addr;
    else
        ipdst = VIN_Table[RX_VC - Base_VC - 1].NextHop;
    // ipdst == IP Address of host we must send packet to
    // determine network, using the VIN table
    for (i = 0; i < 3; i++) {
        // i corresponds to the Network Number (0 - 2)
        if ((ipdst & Maski) == (MyIPi & Maski)) {
            // low 8 host bits select the row within network i
            index = (i << 8) | ((ipdst & ~Maski) & 0x000000ff);
            break;
        }
    }
    if (i == 3) { drop packet; goto ‘get next packet’; }
    // lookup in ARP table
    if (ArpTable[index].EtherAddress != 00:00:00:00:00:00) {
        construct ethernet frame
        send packet
        goto ‘get next packet’
    } else {
        send ARP Request for ipdst
        drop packet, goto ‘get next packet’
    }
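A runnable sketch of the lookup, assuming a prefix length of at least 24 bits so that the low 8 bits of the address select the row within a network. The class and function names, the list-based tables, and the example addresses are hypothetical. An empty row is represented by None rather than an all-zero Ethernet address.

```python
import ipaddress
from dataclasses import dataclass

BASE_VC = 50


@dataclass
class VinEntry:
    mask: int      # prefix mask
    my_ip: int     # local IP address on this network
    next_hop: int  # next hop IP address for this sub-port


def ip(text: str) -> int:
    """Dotted-quad string to a 32-bit integer."""
    return int(ipaddress.IPv4Address(text))


def arp_index(ipdst: int, vin_table: list):
    """Row number in the 768-entry ARP table, or None if no network matches."""
    for i, entry in enumerate(vin_table):
        if (ipdst & entry.mask) == (entry.my_ip & entry.mask):
            return (i << 8) | (ipdst & ~entry.mask & 0xFF)
    return None


def resolve(rx_vc: int, ip_dst: int, vin_table: list, arp_table: list):
    """Return ("send", mac), ("arp-request", ip) or ("drop", None)."""
    if rx_vc == BASE_VC:
        ipdst = ip_dst
    else:
        ipdst = vin_table[rx_vc - BASE_VC - 1].next_hop
    index = arp_index(ipdst, vin_table)
    if index is None:
        return ("drop", None)
    mac = arp_table[index]
    if mac is None:
        return ("arp-request", ipdst)  # the packet itself is dropped
    return ("send", mac)
```

Because the row number encodes both the network and the host bits, a row only needs to hold the Ethernet address.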
Notes and Issues
• GigE Control Interface for Software configuration.
1. Reset interface to defaults
2. Clear ARP cache
3. Read ARP table
4. Read VIN table
5. Read ethernet address
6. Set VIN table entries and other registers
• Set BASE VC (currently 50)
• Set Entries in the VIN table
• Add static ARP entries
Hardware and Status
• Software Simulation completed
• Hardware implementation and status: Dave ...