Simple Protocol for Robust Tunnel Endpoint
MTU Determination (sprite-mtu)
IETF 70 Routing Research Group (RRG)
Fred L. Templin
[email protected]
MTU Determination Problem
[Figure: end-to-end path from the Original Source (MTU=64KB) through an edge network to the Tunnel Near-End, through the tunnel across the Internet/Enterprise Network/MANET/etc., to the Tunnel Far-End (EMTU_R=8KB), then through an edge network to the Final Destination (EMTU_R=64KB). Link MTU labels in the figure: 64KB, 9KB, 4KB, 2KB, and one unknown (MTU=??).]
Tunnel MTU Issues (1)
• IPv4 path MTU discovery has limitations for tunnels:
• ICMPv4 “packet too big” (PTB) messages dropped by
middleboxes – result is undiagnosable black hole
• PTB messages returned to the tunnel near-end (TNE) can’t be
translated into PTBs to send back to the original source
• PTB messages easily forged by off-path attackers
• Does not work in the presence of multi-MTU subnets, i.e., the last-hop router cannot know the MTU of the tunnel far-end (TFE)
CHALLENGE: TNE CANNOT BLINDLY ADMIT BIG
PACKETS INTO THE TUNNEL WITH DF=1
Tunnel MTU Issues (2)
• Unmitigated IPv4 fragmentation is harmful:
• Existing TNEs have no way of knowing the Effective MTU to
Receive (EMTU_R) of the TFE
• Existing TNEs have no way of knowing the reassembly
timeout value used by the TFE
• Slow-path processing in fragmenting middleboxes
• TNE has no way of controlling NATs that rewrite ip_id
• IP fragment misassociations at TFE can cause undetected
data corruption
CHALLENGE: TNE CANNOT BLINDLY SEND BIG
PACKETS INTO THE TUNNEL WITH DF=0
Goals
• Robust support for packets of various sizes
• Maximize Packet Delivery Ratio
• Manage fragmentation if necessary
• Avoid in-the-network fragmentation
• Avoid reassembly misassociations at TFE
• Coexist with end-to-end MTU determination
• Support larger MTUs
Solution: SPRITE-MTU
• UDP Echo service for tunnel MTU discovery
• Soft state management to track tunnel parameters (per
RFC2003)
• Explicit Congestion Notification for robust operation
over tunnels with small MTUs
• Improves operating conditions for end-to-end path
MTU determination (RFC4821)
RESULT: DISCOVERS TUNNEL MTU AND MINIMIZES
NUMBER OF FRAGMENTS PER PACKET
(PREFERABLY DOWN TO 1)
Relevant Elements of Normative Specifications
• RFC2003 (IPv4-in-IPv4 Encapsulation)
• Basic encapsulation/decapsulation specifications
• Inner packet fragmentation when DF=0 and packet larger than
the TFE’s EMTU_R
• Setting of DF
• Tunnel Soft State
• Sending packet while also returning PTB
• RFC4213 (IPv6-in-IPv4 Encapsulation)
• Basic encapsulation/decapsulation specifications
• Conceptual sending algorithm
• “Configuration knob” threshold for determining when an
outer packet is fragmentable
Configuration Knob for Fragmentable Outer Packets
• Two purposes: 1) avoid TFE receive buffer overrun, 2)
avoid/minimize fragmentation on the TNE->TFE path
• Below threshold, admit packets into tunnel without
returning PTBs (TFE may need to reassemble)
• Above threshold, admit packet into tunnel and return
PTB if packet is larger than cached MTU
• Minimums are 1280 bytes for IPv6 (MUST) and 576 bytes
for IPv4 (SHOULD)
• May be set to larger values based on knowledge of:
1) TFE’s EMTU_R, 2) other encapsulations that may
occur on the TNE->TFE path
• Ideally, push configuration knob up to 1480 (or better
yet 1500) – but not always possible
Setting the Configuration Knob (Assuming ENCAPS=20)
[Figure: scale of configuration threshold values from 1280 to 1500]
• 1280: safest option
• 1280 – ~1380: probably
safe for most paths
• 1380 – 1480: safe only if
little/no additional encaps
• 1480 – 1500: only safe if
path has larger-than-1500
MTU and TFE has larger-than-minimum EMTU_R
• optimizing down to the byte
level not always possible
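The arithmetic behind these ranges can be sketched as follows. This is an illustration only, not part of the sprite-mtu specification: the function name max_safe_threshold and the parameters path_mtu, extra_encaps, and floor are hypothetical, and the sketch assumes the only constraint is that the encapsulated packet must fit within the smallest link MTU on the TNE->TFE path.

```python
IPV6_MIN_THRESHOLD = 1280  # minimum for IPv6 (MUST, per the slide above)
IPV4_MIN_THRESHOLD = 576   # minimum for IPv4 (SHOULD, per the slide above)


def max_safe_threshold(path_mtu, encaps=20, extra_encaps=0,
                       floor=IPV6_MIN_THRESHOLD):
    """Largest inner packet size that still fits in path_mtu once the
    tunnel header (encaps) and any further encapsulations on the path
    (extra_encaps) are added. The result never drops below floor; when the
    path cannot carry that much, the TFE may have to reassemble."""
    candidate = path_mtu - encaps - extra_encaps
    return max(floor, candidate)
```

With ENCAPS=20, a 1500-byte path gives 1480, and 100 bytes of additional encapsulation brings that down to 1380, which matches the boundaries listed above. A path smaller than 1300 bytes leaves the knob at the 1280 floor.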
Setting DF
• Set DF=1 in all packets larger than threshold
• Set DF=1 even if TNE fragments packet before sending
into tunnel
• MAY set DF=0 to increase PDR and avoid spurious
PTBs, but if so must use pacing and/or soft state
feedback to manage fragmentation
Sending Big Packets into Tunnel
• If packet is no larger than the tunnel’s probed MTU
(initially set to the configuration threshold) send
packet into tunnel with DF=1
• If packet is larger, send packet into tunnel with DF=1
but also send PTB back to source
• Sending packet increases PDR and also allows end-to-end
MTU determination (RFC4821) to determine actual MTU
• Sending PTB alerts RFC4821 nodes that there *may* be an
MTU restriction
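The sending rule above can be sketched as a small decision function. The names Decision and admit are hypothetical, and reporting the probed MTU in the PTB is an assumption; the slides say only that a PTB is returned.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    df: int                 # DF bit for the outer packet
    send_ptb: bool          # whether a PTB also goes back to the source
    ptb_mtu: Optional[int]  # MTU reported in the PTB (assumption: probed MTU)


def admit(packet_len: int, probed_mtu: int) -> Decision:
    """Decide how the TNE admits a big packet into the tunnel.

    The packet is sent with DF=1 in both cases; the only difference is
    whether a PTB is also returned to the original source."""
    if packet_len <= probed_mtu:
        return Decision(df=1, send_ptb=False, ptb_mtu=None)
    return Decision(df=1, send_ptb=True, ptb_mtu=probed_mtu)
```

The point of the design is that the packet is never dropped at the TNE: a too-large packet is still forwarded, and the PTB acts only as a hint to the source.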
What if it Might be Fragmenting?
• Institute pacing until path MTU to TFE is probed
• If probed size is no smaller than configuration
threshold, relax pacing
• If probed size is smaller than configuration threshold,
or no probes returned, synchronize soft state with TFE
• Worst case: fast links with small MTUs on TNE->TFE
path (need to carefully monitor TFE’s reassembly)
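A minimal sketch of the decision taken once probing finishes, assuming a missing probe reply is represented as None. The names NextStep and after_probe are hypothetical.

```python
from enum import Enum
from typing import Optional


class NextStep(Enum):
    RELAX_PACING = "relax pacing"
    SYNC_SOFT_STATE = "synchronize soft state with TFE"


def after_probe(probed_size: Optional[int], threshold: int) -> NextStep:
    """Pick the TNE's next step from the probe result.

    probed_size is None when no probes were returned (an assumption about
    how that case is represented)."""
    if probed_size is not None and probed_size >= threshold:
        return NextStep.RELAX_PACING
    return NextStep.SYNC_SOFT_STATE
```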
Soft State Management Protocol
• TNE creates soft state and sends initial sprite to TFE
using TFE’s on-link link local address as destination
• TNE is asking TFE to synchronize state
• TFE sends reply using its current sprite address as
source
• no soft state created yet – avoid buffer attacks
• TNE sends sprite using TFE’s current sprite address
as destination
• TFE creates soft state; begins monitoring received packets
• TNE and TFE continuously exchange sprites while
packets are actively using the tunnel
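The exchange above can be modeled as a toy in-memory handshake. This is a sketch under stated assumptions, not the wire protocol: the classes Tfe and Tne, their fields, and the string addresses are hypothetical, and real sprites travel over the UDP echo service rather than method calls.

```python
class Tfe:
    """Tunnel far-end: replies statelessly at first, creates soft state later."""

    def __init__(self, link_local, sprite_addr):
        self.link_local = link_local
        self.sprite_addr = sprite_addr
        self.soft_state = {}  # keyed by TNE address

    def receive_sprite(self, src, dst):
        """Return the source address used in the reply, or None if ignored."""
        if dst == self.sprite_addr:
            # Sprite sent to the current sprite address: create soft state
            # and begin monitoring packets received from this TNE.
            self.soft_state[src] = {"monitoring": True}
            return self.sprite_addr
        if dst == self.link_local:
            # Initial sprite: reply from the sprite address, but create no
            # soft state yet (avoids buffer attacks).
            return self.sprite_addr
        return None


class Tne:
    """Tunnel near-end: starts with the TFE's link-local address."""

    def __init__(self, addr, tfe_link_local):
        self.addr = addr
        self.dest = tfe_link_local  # where the next sprite is sent
        self.synchronized = False

    def exchange(self, tfe):
        reply_src = tfe.receive_sprite(self.addr, self.dest)
        if reply_src is None:
            return
        if reply_src == self.dest:
            # Reply came from the address the sprite was sent to.
            self.synchronized = True
        else:
            # First reply reveals the TFE's current sprite address.
            self.dest = reply_src
```

Calling exchange twice walks through the sequence: the first call learns the sprite address and leaves the TFE stateless, and the second call causes the TFE to create soft state.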
Sprite-mtu Checksum
• “sprite-mtu checksum” sums every 10th byte of the
packet using the Fletcher-16 algorithm
• While synchronized, TNE includes trailing sprite-mtu
checksum
• TFE checks checksum and discards packet if
checksum disagrees
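A sketch of the checksum, using the standard Fletcher-16 algorithm over a sample of the packet. Several details are assumptions not stated on the slide: sampling starts at offset 0, the checksum is appended as two big-endian bytes, and the function names are hypothetical.

```python
def fletcher16(data: bytes) -> int:
    """Standard Fletcher-16: two running sums, each taken modulo 255."""
    sum1 = 0
    sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1


def sprite_mtu_checksum(packet: bytes) -> int:
    """Fletcher-16 over every 10th byte (assumption: offsets 0, 10, 20, ...)."""
    return fletcher16(packet[::10])


def append_checksum(packet: bytes) -> bytes:
    """TNE side: add the trailing checksum (assumption: 2 bytes, big-endian)."""
    return packet + sprite_mtu_checksum(packet).to_bytes(2, "big")


def verify(framed: bytes) -> bool:
    """TFE side: True if the trailing checksum agrees with the packet body."""
    if len(framed) < 2:
        return False
    body, trailer = framed[:-2], framed[-2:]
    return sprite_mtu_checksum(body) == int.from_bytes(trailer, "big")
```

Because only one byte in ten is summed, the check is cheap but partial: corruption confined to unsampled bytes goes undetected by this checksum alone.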
Explicit Congestion Notification
• TNE sets ECT(0) or ECT(1) codepoint in its sprites
• When TFE detects incorrect sprite-mtu checksums, it
begins setting CE codepoint in its sprite replies
• TNE institutes pacing while receiving sprite replies with
CE codepoint
• TNE relaxes pacing when CE codepoint no longer set
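The feedback loop above can be sketched with the RFC 3168 ECN codepoint values. The classes TfeMonitor and TnePacer are hypothetical, and the rule for when the TFE stops setting CE (no checksum failures since its previous reply) is an assumption; the slides do not specify it.

```python
# ECN codepoints in the two low-order bits of the IP TOS byte (RFC 3168).
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11


class TfeMonitor:
    """TFE side: counts checksum failures and marks sprite replies with CE."""

    def __init__(self):
        self.bad_since_reply = 0

    def on_data_packet(self, checksum_ok: bool) -> None:
        if not checksum_ok:
            self.bad_since_reply += 1

    def make_reply(self, sprite_ecn: int) -> int:
        """Return the ECN codepoint for the reply to a received sprite."""
        mark = self.bad_since_reply > 0 and sprite_ecn in (ECT_0, ECT_1)
        self.bad_since_reply = 0  # assumption: the counter resets per reply
        return CE if mark else sprite_ecn


class TnePacer:
    """TNE side: paces while replies carry CE, relaxes otherwise."""

    def __init__(self):
        self.pacing = False

    def on_reply(self, reply_ecn: int) -> None:
        self.pacing = (reply_ecn == CE)
```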
Futures
• IEEE 802.3as Frame Expansion
• larger-than-1500-byte MTUs for 802.3 links
• may allow setting configuration threshold to > 1500
• Larger EMTU_Rs for tunnel endpoints (up to 2KB)
• Gigabit Ethernet 9KB jumboframes
• Widespread use of sprite-mtu
• Widespread use of RFC4821
TODO
• Some encapsulations dangerous with any level of
outer fragmentation – e.g., Teredo (IPv6/UDP/IPv4)
• NATs re-write ‘ip_id’
• ‘ip_id’ collisions when multiple nodes behind NAT talk to the
same TFE
• solution: “UDP Fragmentation for Teredo” (draft to be written)
• Use ICMP echo request/reply as fallback if TFE does
not implement sprite-mtu (is it worth it?)