Introduction

Download Report

Transcript Introduction

CS 640
Introduction to Computer Networks
Fall 1999
Professor Landweber
Copyright 1999, Lawrence H. Landweber
Copyright 1996, Larry Peterson (slides marked with
@ in lower left corner and those with white background
with the exception of the OSPF and TCP dialog slides
which are from IETF RFCs)
Computer Communications Network
(network, computer network)
Distributed system with:
• Transmission media (coaxial cable, twisted
pair (cat3, cat5), glass fibre, wireless-ether)
• Switching elements (switch, router, node,
hub, bridge, gateway …)
• Hosts (computer, server, workstation endsystem, node)
• Protocols (physical, link, network,
transport, session, presentation, application)
Communications Medium
• Connects switching elements to each other
and to hosts
• Transports digital data stream
• Link
– point-to-point - connects single host to
switching elt or switching elt to another
switching elt
– multipoint or broadcast - connects one or more
hosts and switching elts to each other
Units - Orders of Magnitude
• kilo (thousand), mega (million), giga
(billion), tera (trillion… thousand billion),
peta (10**15), exa (10**18)
• Till early 1980s high bandwidth meant
56Kbps - today Gbps is high speed
• A commercial fibre can transmit 800Gbps
(80 data streams at 10Gbps - DWDM Dense Wave Division Multiplexing)
• Research state of the art is 3.2Tbps (80 data
streams at 40Gbps) for a single fibre
Switching Elements
• Responsible for the transport of data across
a network - routing
• May support congestion handling or quality
of service mechanisms
• Examples
– routers - IP
– switches - ATM or local area networks (e.g.,
Ethernet-802.3, token ring)
– bridges, hubs - local area networks
Network Types - Direct Connectivity
• point-to-point
• mulitpoint /broadcast
Ethernet
Token Ring
Also satellite/radio
Network Types - Indirect Connectivity
link
• switched networks
host
• internetworks
backbone
switching
element
Protocols
A protocol is a set of conventions that
specifies the rules or parameters for
communication between two entities (hosts,
processes, switching elts, devices, etc).
Protocols specify conventions for:
• The type, semantics, and format of
information to be conveyed between entities
• Establishing/closing connections or sessions
• Routing across a network or internetwork
• Flow control: speed matching between
entities
• Reliability: error detection, handling,
correction
Protocols also specify
• Data and object types and their
representation
• Session characteristics (e.g., checkpointing,
encryption-security, compression)
• Congestion detection and control
• Quality of service parameters and handling
The Basic Idea: Packet Switching
Data networks use packet switching instead of
circuit switching, the method traditionally
used in the telephone system. Packet
switching was first invented in the 1960s as
a method for building a survivable
communications system in a battlefield
environment.
Circuit Switching - Telephone System
Model
• Dedicated bandwidth (capacity) reserved
along fixed path from caller to called for
duration of conversation (signalling -setup)
• Several conversations can share a link…but
each is allocated fixed fraction of link bw
• Bandwidth not shared so during periods of
silence, it is wasted
• If dedicated bandwidth not available for a
call, then the call not completed (blocking)
Circuit Switching
Circuit Switching
Circuit Switching
Circuit Switching
Path is fixed - all data follows this path
Bandwidth / capacity reserved on all links
Data flows continuously - NO PACKETS
BW unavailable --> BLOCKING
Packet Switching - Data Network Model
• Each data stream divided into chunks,
called packets; each packet sent separately
into the network; packets may follow diff
paths from source to dest
• Each packet has a header and possibly a
trailer that contain information as to its
source and destination
5
4
Packet
Header
3
Payload
2
1
Packet Switching - Data Network Model
• Bandwidth not reserved but is shared
between data streams
• Complete packet arrives at sw elt before it
is forwarded to next sw elt; store / forward
• Store and forward systems must deal with
allocation of buffers in sw elts and with
congestion that may result from too many
packets arriving in too short a time
• If a buffer not available when a packet
arrives at sw elt, a packet is discarded (may
not be one that just arrived)
Packet Switching - Datagrams and
Virtual Circuits
• Examples of packet switching ( i.e., S&F,
buffers, congestion, no link bw reserved)
• Virtual Circuit
– must set up path - vc (conn estab - signalling)
– path fixed for all packets of a data stream
– after setup, header need only specify vc/path
• Datagram
– packets for data stream may travel on diff paths
– headers must include full address of dest
Multiplexing
• Share network resources (sw elt and links)
among multiple users
• TDM - time div multiplexing (Alohanet,
satellite)
• FDM - frequency div multiplexing (CATV)
@
TDM
• Slots can be statically or dynamically
assigned; assignment decision can be
centralized or distributed
• Circuit switching can use TDM - e.g., T1
• Statistical TDM - packet switching
– packets from different sources interleaved
– buffer contending packets
– order of service may be FIFO or other
A B C A A C B
TDM - Circuit Sw. - T1
8 bits
7 data
1 parity
125 microseconds - each with 8 bits (digital) for
24 calls/conversations in parallel
Each call gets 8 * 8000 bits per second or 64Kbps
Total BW is 64Kbps * 24 + some admin bits
or about 1.53Mbps
T3 or DS3 = 28 T1 or ~45Mbps
E1= 2Mbps
What Can Go Wrong?
• Signal errors - noise - leads to garbled bit(s)
– CRC, checksum
• Packet loss due to congestion
– sequence numbers, acks, timers, retransmission
(voice vs. file transfer)
• Packets delayed, garbled, duplicated
• Link or sw elt failure
• Depending on application, protocols may
have to compensate for or correct the above
@
Performance - Bandwidth
• Amount of data that can be transmitted per
unit time (Kbps, Gbps, etc)
• Related to bit width / coding
• BW of link vs. end-end throughput
• Throughput determined by
@
–
–
–
–
bw of links on path
sw elts proc speed
congestion-queuing time
transmission rate of sender
Performance - Latency Delay
• Time to send message or single bit from A to B
• Components
– propagation - signal speed in medium - length of links
– transmission / processing time at sender and sw elts
– queuing time at sw elts and at entrance to network
• Speed of light
– 3.0 * (10**8) meters / second in a vacuum or ~1 foot
per nanosecond (billionth of a second)
– 2.0 to 2.7 * (10**8) meters per second in various media
– latency in LAN vs. terrestrial WAN vs. satellite
@
Latency vs. Bandwidth
• Independent
• Bandwidth - Delay Product
– amount of data that the network can store
between source and destination
– if RTT used for delay, then amount that can be
sent before sender can receive an ack
– example: 100ms RTT and 45Mbps: bw-delay
product is 560Kbytes
• Jitter - difference in time between arrival of
successive packets in a data stream
Protocol Architecture - Layering
• Use layers to localize protocol functionality
• Protocol that implements a layer has
interfaces to:
– higher level protocol and lower layer protocol
• services to / from (service interface)
• encapsulation / decapsulation
• *** connection oriented or connectionless
– peer protocol
• messages exchanged with peer protocol entity
• control information in header / trailer
Interfaces
service
interface
Protocol
Entity
Protocol
Entity
peer-to-peer interface
service
interface
Connection-oriented Service Interface (CO)
• Users of peer protocol entities can assume
there a direct physical connection or wire
between them (connection setup required)
• Data sent by one user is received by the
peer user in same order as it was sent by
first user
• Definition of CO Service Interface is indep
of its implementation in network (vc/dg)
• May or may not be reliable (TCP vs. ATM)
Connectionless Service Interface (CL)
• No notion of a connection between users of
peer protocol entities
• CL service interface promises best effort
delivery of data from one user to the other,
data may or may not reach the peer user and
may or may not be delivered in order sent
• AGAIN NOTE that this says nothing about
the implementation method within the
network
More on Service Interfaces
• CO, CL refer to service interfaces NOT to
the protocols used to implement the service
• The spec of a service interface does not
imply anything about how the interface is
implemented by a protocol
• It also does not imply anything about how
data is transported across the network
• THIS DISTINCTION IS OFTEN
BLURRED
application
presentation
OSI
Open Systems
Interconnection Model
7 Layers
session
sw elts in network
transport
network
link
physical
end system
end system
OSI Model
• Model of protocol functionality (CCITT and
ISO - Int. Org. for Standardization)
• Just a model … does not specify details of
protocols and interfaces or their
implementation
• Different protocols may conform to the
model but not interoperate
• Useful as a way of thinking about protocols
- a taxonomy of protocol functionality
Layers
• Physical - what happens on wire - coding of
bits, electrical or optical characteristics,
config of pins, connectors and other
hardware components
• (Data) Link - handles transfer of data
between two or more entities connected by
a single link or hop. Service interface may
be CO or CL. May or may not be reliable.
(X.25 Level 2, IEEE 802.x)
Layers
• Network - Transfer of data across a
network. Includes switching (dg, vc),
routing, congestion avoidance and control,
network wide addressing. (IP addresses,
IP/ICMP, X.25 Level 3, ATM, BGP, OSPF)
• Transport - Process to process transfer of
data. Connectionless (UDP) or connectionoriented service (TCP) interface.
Layers
• Session - conventions for sessions. E.g.,
checkpointing, half or full duplex, security.
• Presentation - syntactic conventions for
communication between apps, encoding
rules and file formats.
• Application - includes session and
presentation in the Internet protocol
architecture
Internet Protocol Architecture (IETF)
FTP
SMTP
Telnet
TCP
HTTP
UDP
RTP
IP
802.2
X.25
802.3
802.5
OSPF
Link Layer
• Two nodes (hosts, sw elts) connected by a
single link
• Framing and packet format
• CO or CL service interface
• Reliability
• Flow control
Link Layer - Framing
• Link layer packet = frame
• Problem: how to recognize beginning and
end of frame?
• Three methods
– byte counting (DDCMP)
– byte stuffing/oriented (BISYNCH, IMP-IMP)
– bit stuffing/oriented (X.25 Level 2, 802.x)
Byte Counting
• Include count field giving number of bytes
in frame
• Problem: what if byte field is corrupted?
• Might be able to catch with CRC… but only
if max length assumed
SYN…Count Header Body
@
CRC
Byte Stuffing/Oriented
• Special characters used for control
• Must distinguish special characters when
used for control from when in data
• Use special data link escape character, DLE
• E.g., STX = start of text… use DLE STX
when STX is in data (stuff DLE before STX
if it appears in data (also for SOH, ETX,...)
• If DLE in data, send DLE DLE
Byte Stuffing Problems
• Dependence on fixed character set
• Must examine every byte of data on sending
and receiving (insert / remove DLE)
• Was used widely in IBM bisynch (at
9600bps)
Bit Stuffing/Oriented
• Frame beginning and end marked by special
bit string …. usually 01111110
• If 5 1’s in data to be sent, sender inserts 0
• If receiver sees 5 1’s check next bit(s)
– if 0, remove it (stuffed bit)
– if 10, end of frame marker (01111110)
– if 11, error (7 1’s cannot be in data)
• Can be done fast in hardware
FLAG Header Body …
CRC FLAG
Error Detection
Cyclic Redundancy Check (CRC)
• Bitwise binary arithmetic
0+0=1+1= 0
0+1=1+0= 1
• Addition and subtraction are same operation
• Example
11011000
+/- 10001001
01010001
CRC
• Represent n bit message as an n-1 degree
polynomial
• Message M = 101101101
M(x) = x**8 + x**6 + x**5 + x**3 + x**2 + 1
• Special generator polynomial, C(x), degree k
• C(x) chosen to have special properties - this is
key to the method working well
CRC - Method
•
•
•
•
•
•
•
Pad M with k 0s yielding M’ (k = deg C(x))
Represent M’ as M’(x)
R(x) = rem ( M’(x) / C(x))
P = M’ + R (P(x) = M’(x) + R(x))
Transmit P (P(x))
At dest, receive P(x) + E(x)
Compute rem (P(x) + E(x)) / C(x)
– If 0, either message rec ok or C(x) divides E(x)
– If not 0, then error in transmission
Shift Register
• Consider C(x) = x**3 + x**2 + 1
• M(x) = X**7 + x**4 + x**3 + x
• M’ = 10011010000
1
X**3
+
0
0
X**2
X**1
+
1
1010000
1
The shift register performs long division
using bitwise operations - M’ / C
Shift Register
• Consider C(x) = x**6 + x**5 + x**3 + x +1
• M’ = 10110110101000000
0101000000
1
X**6
+
0
1
X**5
X**4
+
1
0
X**3 X**2
+
1 + 1
x
The shift register performs long division
using bitwise operations - M’ / C
1
Properties of C(x): Should not divide E(x)
for common errors
• Single bit errors
– non-zero x**k, x**0 terms
• Double (2 consecutive) bit errors
– C(x) has factor with at least 3 terms
• Odd number errors: contains factor (x+1)
– consider E(x) with odd no. terms - so E(1) = 1
– but if (x+1) is factor of E, then E(1) = 0
– so if odd number of errors, E cannot have
(x+1) factor
More on C(x)
• Burst error - at least first and last bits of
burst are in error
– if length of burst is <= k, the degree of C(x),
then burst error is detected
– C(x) with degree k cannot divide E(x) with
degree less than k
• High probability of detecting burst errors of
length greater than k… but not 100%
Reliable Transmission
• When CRC detects error, frame discarded
• Corrected by link layer retransmission ARQ (Automatic Repeat reQuest)
– mechanisms include: frame sequence numbers,
acknowledgements (acks), negative acks (naks),
selective acks (sacks), timers / timeout
– often implemented by sliding window
• Also used by transport layer - but link impl
simpler - wire connects protocol entities
Sliding Window - Functions
• Put data in order for delivery to user of
service interface
• Recovery from losses
– retransmission
• Flow control - prevent sender from
overrunning receiver’s buffers
– link - SWS usually fixed, RWS often = 1
– credits - signal from receiver to sender - used
with transport protocols
Timer / Retransmission Strategies
• Timers for link layer
– 1 timer for each frame
– 1 timer for link
• Retransmission on timeout
– retransmit frame whose timer goes off
– retransmit earliest unacked frame
– retransmit all frames from earliest unacked
frame - go back n
• Retransmit on nak or sack
Sliding Window Examples
• Principle
SWS + RWS <= MaxSeqNum (size seq # space)
• MaxSeqNum = 8, SWS = 4, RWS = 4
Sender
0,…,3
Receiver
0,…,3 accepted
lost
resend old 0,…,3
ack 0,…,3, NFE = 4
all rejected-no overlap
with expected frames
Problem occurs if resent frame overlaps with expected
new frame. Not possible if above principle observed
Sliding Window Examples
Sender
0,…,4
Receiver
0,…,3 accepted, disc 4
lost
resend old 0,…,4
acc 4, disc 0,1,2,3
lost
resend old 0,…,4
ack 0,…,3, NFE = 4
ack 4, will acc 5,6,7,0,
NFE = 4
acc 0, as new but is retrans
Problem occurs because resent frame overlaps with
expected new frame.
Bit Oriented Data Link Protocols
• ANSI Standard - ADCCP, Advanced Data
Communications Control Procedures
• Choices of functionality and operation
• Designed to operate
–
–
–
–
over point-point or multi-point links
in 2-way alternate or 2-way simultaneous mode
over terrestrial and satellite links
when communication is between logical equals
or non-equals
Station Types
• Primary
– flow control
– error recovery (other than retransmission)
– initialize and terminate link
• Secondary
– simpler than primary
commands
secondary
primary
responses
Data Transfer Modes
• NRM - Normal Response Mode
– one primary, multiple secondary stations
– polled multipoint, secondary often a device
• ARM - Asynchronous Response Mode
– one primary, one secondary station
– secondary can send at will as responses
– point-point - not good for multipoint
• ABM - Asynchronous Balanced Mode
– two combined stations (both issue commands
and responses)
Frame Format
F
A
C
Info
FCS
F
F = 01111110 (flag sequence)
A = address of
receiver if command
sender if response
all 1s = broadcast
C = 8 or 16 control bits - type of frame + functions
FCS = CRC
Control Fields
• All can be commands or responses
• Information transfer
– 0 N(S) P/F N(R)
• Supervisory (4)
– 10 SS P/F N(R)
• Unnumbered (32)
– 11 MM P/F MMM
• N(S) and N(R) are 3 or 7 bits; P/F is 1 bit
P/F Bit
• NRM - P=1, poll secondary, secondary sets
F=1 in last response frame
• ARM/ABM - P=1 is sent by primary asking
secondary to set F=1 in next response
• P/F bit used for synchronization
The Sliding Window
N(R), N(S)
• N(S) is the sequence number of the
information frame that contains it
• N(R) acks all info frames with seq numbers
< N(R); i.e., N(R) = NFE
• Sequence number space is 7 (3 bits) or 127
(7 bits)
• With X.25 L2, receive window is always 1
so frames only accepted in order
Miscellaneous
• To abort a frame (so it will be recognized as
not correct)
– append incorrect FCS or
– transmit 7 1s
• 15 ones transmitted means that station is
giving up control of the link
Supervisory Frames (4)
• Contains N(R), ack to N(R)-1
• RR - receive ready - ready to rec I frame
– clear busy condition (RNR)
• RNR - receive not ready - busy condition
– received frames may be discarded
• REJ - reject - ask retrans from N(R)
– clear busy condition
– cleared when frame with N(S) = N(R) of
rejected frame is received
• SREJ - selective reject - ask retrans of N(R)
Selected Unnumbered Frames
• SABM - set asynchronous balanced mode link establishment (also SARM, SNRM)
• DISC - disconnect
• UA - unnumbered ack, ack unnumbered
command
• DM - disconnect mode - request set mode
command, reject set mode
• FRMR - frame reject - bad control field or
bad I frame length or invalid N(R)
Link Initialization
SABM P=!
DTE
DCE
UA (yes) Or DM (no)
F=1
DM F=0
SABM
Timers / Constants
• T1 = frame retransmission timer
• T2 = may delay timer
• T3 = idle channel timer - inform protocol
user
• N1 = Max I frame size with headers
• N2 = Max # retransmission attempts
• K = max number of outstanding frames (<=
7 or <= 127) - pipelining
Unbalanced Normal Class (NRM)
UNC 3,4
Commands
Primary
I
RR
RNR
SREJ *
SNRM
DISC
UI
*Receive window can be > 1
Responses
Secondary
I
RR
RNR
SREJ *
UA
DM
FRMR
UI
Primary
B, SNRM,P
B,RR0,P
B,I00
B,I10
B,I20,P
B,RNR3
B,I33,P
Secondary
B,UA,F
B,RR0,F
B,I03
B,I13
B,I23,F
B,I34
B,I44,F
B,RNR5,P
B,RR4,F
B,DISC,P
B,UA,F
Balanced Asynchronous Class (ABM)
BAC 2,8
Commands
Primary
I
RR
RNR
REJ *
SABME
DISC
Responses
Secondary
RR
RNR
REJ *
UA
DM
FRMR
*Receive window = 1, only receive in order,
REJ requests resend of ALL sent, beginning with lost one
Local Area Networks
• Standardized by IEEE as 802.x
• A variety of Media Access Control (MAC)
protocols and associated topologies
–
–
–
–
–
802.3, Ethernet, CSMA/CD (3u, 3z)
802.4, Token Bus
802.5, Token Ring
802.11, Wireless
DQDB, etc
• With 802.2, comprises link layer
802.x Model
IP or other network protocol
Link
Layer
LLC Logical Link Control - 802.2
MAC Media Access Control - 802.3, 802.4, ...
PHY Physical Layer - 802.3,802.4,...
medium
Hub
Ethernet
Token
Token Ring
FDDI
Collision
Domain
Computer
802.2 - Logical Link Control
• PDU - Protocol Data Unit
DSAP SSAP Control Information
Addresses are 8 bits - all 1s = broadcast
DSAP = Destination Service Access Point, address
leftmost bit for Group or Individual
SSAP = Source Address
leftmost bit for Command or Response (a la X.25 L2)
Control = 16 bits if seq no included, else 8 bits
LLC Types / Classes
• There are two types of operation
– Type 1 - provides connectionless service
• no flow control or error control
• no connection setup
– Type 2 - provides a connection-oriented service
• similar to X.25 L2 (BAC 2,8)
• I frames can be responses
• There are also 2 classes of procedure
– Class 1 - includes Type 1 only
– Class 2 - includes Type 1 and Type 2
Commands/Responses
Type 1
Commands
UI
XID
TEST
Type 2
I
RR
RNR
REJ
SABME
DISC
Responses
XID
TEST
I
RR
RNR
REJ
UA
DM
FRMR
Details - Type 1
•
•
•
•
Every LLC on a LAN supports Type 1
UI frames used for data transfer
Does not use XID
TEST used by one entity to see if other still
is “alive”
Details - Type 2
• I frames used for data transfer
• Seq number space is 7 bits
• Receive window is1, frames only accepted
in order
• Exchange Information (XID) is used by
receiver to tell sender the max amount, k,
that the sender can send beyond the most
most recently received ack (N(R)). This
bounds pipelining.
Ethernet - 802.3
• Ethernet developed by Xerox in mid 1970s
• Basic ideas from AlohaNet packet radio
project
• Ethernet standardized by Xerox, DEC, Intel
in 1978
• IEEE later standardized as 802.3 - at MAC
layer differs in one header field from
Ethernet
• 10, 100, 1000 Mbps
CSMA/CD
• Carrier sense - stations sense whether there
is traffic on the wire
• Multiple access - more than one station can
simultaneously be connected and contend
for the right to send
• Collision detection - individual stations
detect when there has been a collision
(transmissions from two stations interfere
with each other) and react appropriately
Classical Ethernet (Thick-Net)
• Standardized as 802.3, 10Base5
• Coaxial cable for medium
• Parameters - max
– segment length = 500m, 2 repeaters on path
– length = 1500m, 1024 nodes
• Also
– 10Base2 (thin-net, 200m)
– 10BaseT (twisted-pair, star config, 100m)
– 10BaseF (fiber, 1-2KM, FP,FB, FL)
CD 4
Repeater
Multi-Port
Bridge
CD 1
Repeater=Hub
Bridge=Switch
CD=Collision Domain
Repeater
CD 2
CD 3
Repeater
Each CD includes
a MP Bridge Port
802.3 Frame Format
Preamble-7 StartDel-1 DstAddr-6 SrcAddr-6 Length-2 Payload FCS-4
StartDelimiter = 10101011
Dst and Src Addr = 6 bytes from flat address space
first bit is individual/group
Length = length of LLC packet in payload
max is 1500 bytes
Ethernet uses this field to specify upper layer protocol,
values are over 1500
*802.3 uses 802.2 to select upper level protocol
Payload is LLC packet plus PAD (min payload is 46 bytes)
so min packet length is 64 bytes (without preamble,
and start delimiter
LLC
TransDataEncaps
RecDataDecaps
Framing
Addressing (SRC, DST)
Error detection (FCS)
M
RecMediaAccess
A TransMediaAccess
Management
C Management
Medium allocation (collision avoidance)
Contention Resolution (collision handling)
TransDataEncoding
RecDataDecoding
Bit transmission/reception
P
Collision detection
L
Carrier sense
S
Preamble generation/detection
Transmission Without Contention
• Client layer (LLC) sends transmission
request to MAC
• Construct frame, compute FCS
• Monitor carrier sense signal
– if channel clear, initiate frane transmission after
interframe delay (9.6 us) - send bits to PLS
• Transmit encoded preamble, encode and
transmit bits
• Listen for collision
Reception Without Contention
• Synchronize with preamble, convert signal
to binary
• Set carrier sense while receiving bits
• Check address - if for this station or station
is in promiscuous mode, unpack frame, pass
data to client
• Compute FCS and send status code to client
Collision Handling on Sending
• If collision detect signal turned on while
transmitting
• Stop sending frame and transmit jam signal
(32 bits, special pattern) to insure collision
is detected by all stations
• Schedule retransmission after back-off time
Collision Handling on Reciving
• Smaller than minimum size packet is
detected and discarded (jam detected)
• Terminated transmitted frame plus jam must
be smaller than minimum size packet
Retransmission
• When station detects collision on
transmitting, backoff/retransmission is
scheduled as follows
• For n th retransmission, randomly choose r
in the range
– 0 < = r < 2**k, where k=min(10,n)
– wait r slot times before retransmission where a
slot time is 512 bit times (64 * 8) or 51.2us for
10mbps
• Truncated exponential backoff
Collision Detection
• To insure collision detection, enough bits must be
sent so that if other sender is maximally distant, its
first byte arrives while first sender is still sending
• Consider 10Base5
– max length = 1500m
– sender must continue sending for number of bit times
necessary for data to travel 1500 * 2 = 3000m
– at 10**7 bits/sec, assuming 2/3 speed of light in
medium, each bit occupies 20m so 150 bits must be
sent to deal with propagation time, plus additional bits
to deal with other delays, but min packet size =512 bits
– the jam is a special 32 bit pattern that allows all stations
to learn that that there has been a collision
Higher Bandwidth 802.3
• 802.3u, 802.3z standardize 100, 1000Mbps
• Bridges (switches) can be connected to
collision domains with different bandwidths
• To insure collision detection must maintain
relationship between
–
–
–
–
802.3 bandwidth
minimum frame size
length of combined media
e.g., higher bandwidth, same min frame size,
must have smaller length
Parameters
Parameters
10Mbps
100Mbps
1000Mbps
slotTime
interFrameGap
jamSize
maxFrameSize
minFrameSize
extendSize
512 bT
9.6us
32
1518 B
64 B
0B
512 bT
9.6us
32
1518 B
64 B
0B
4096 bT
.096us
32
1518 B
64 B
448 B
CD 4 (shared 10Mbps)
CD 3
Repeater
Multi-Port
Bridge
Repeater
Repeater
CD2
(dedicated 100Mbps)
CD 1(shared 100Mbps)
1Gbps (dedicated)
CD1
Multi-Port
Bridge
Multi-Port
Bridge
Repeater
100Mbps (shared)
CD4
Repeater
100Mbps (shared)
CD3
1Gbps (dedicated)
CD2
Multi-Port
Bridge
DSL - Digital Subscriber Line
• Digital copper loop transmission
technology and associated equipment
• Connection between subscriber and telco
central office or concentrator - local loop
access network
• Achieves broadband speed (up to Mbps)
over twisted pair
Terminology
• ATU - ADSL (Asynchronous Digital
Subscriber Line) Transceiver Unit
• CSU - Channel Service Unit
• DSU - Digital Service Unit
• CPE - Customer Premise Equipment
Current Telco Infrastructure
• Terminology
–
–
–
–
–
–
ILEC - Incumbent Local Exchange Carrier
CLEC - Competitive Local Exchange Carrier
PTO - Public Telephone Operator
CO - Central Office
DLC/RT - Digital Loop Carrier/Remote Term
MDF - Main Distribution Frame - where local
loops terminate
– DSLAM - DSL Access Multiplexer
The Access Network
• Typically, cable bundles with up to
thousands of twisted wire pairs each of
which terminates at a customer premise
• Long loops
– signals dissipate/weaken - lower signal to noise
– solutions for loops > 18K feet
• loading coils - every 6K feet, amplify signal, not
compatible with DSL
• DLC - concentrator, terminate twisted pair and
backhaul to CO with T1/E1, fiber or copper,
terminate DSL which needs end-end twisted pair
Transmission Frequency
• Higher frequency enables greater BW
• POTS - 0 to 3600 Hertz (cycles per sec)
– can also support modem up to 56Kbps (analog)
• DSL uses higher frequencies
• Problems
– attenuation - dissipation of signal as it traverses
copper wire
– crosstalk - interference between signals on
different copper wires
Attenuation
• Higher frequencies attenuate energy faster so shorter loop distance possible
• Solution
– use lower resistance wire - thicker gauge
– but telcos have used thinnest possible wire
because thicker wire more expensive to
buy/install
Crosstalk
• Electrical energy transmitted on wire
radiates energy to adjacent wires in bundle
• Interferes more with similar frequencies
• Less of a problem for lower frequencies,
e.g., those used with POTS
• Solutions for DSL
– two pair instead of one pair so lower freq can
be used to get desired BW
– use different frequencies upstream/downstream,
lower upstream because more crosstalk at CO
end - many lines converge
Some DSL Types
• HDSL - High Bit Rate DSL, 2 pair, T1/E1,
symmetric
• SDSL - Symmetric DSL, 1 pair, multiple
data rates to T1/E1
• ADSL - Asymmetric DSL, 1 pair, up to
6Mbps
• RADSL - Rate Adaptive DSL, 1 pair, up to
12Mbps, asymmetric or symmetric
Issues
• Frequencies to use for up/down stream
• Desired max distance - subscriber/DLC,
about 18K feet is usual upper bound
• Coding of bits
• NON ISSUE - compatibility with other DSL
systems - DSL is point-point and single
organization controls both ends
Network Layer
A variety of different types of protocols
• Access / service interface
– connection-oriented (X.25 Level 3, ATM) or
connectionless (IP)
• Within network
– switching method - DG (IP), VC (ATM)
• Routing - OSPF / BGP (IP)
• Congestion detection / handling - RED (IP)
• QoS - TM4.0 (ATM), RSVP-DiffServ (IP)
Routing
• Problem: Find path from a source to a
destination through a network (internet)
• Possible path criteria
–
–
–
–
–
–
–
correctness - packets sent to correct destination
simplicity
stability - converge to equilibrium)
robustness - operates in presence of failures
fairness - serves all users - can prioritize
optimality - minimize cost of path
QoS - paths depend on app requirements
Cost of Link
• Capacity / bandwidth
• Type - satellite / land
• Error rate
• Delay
• Security
Cost of path = sum of costs of links on path
Some Issues for Routing Protocols
• Who decides route
– centralized - Routing or Network Control
Center (RCC or NCC)
– distributed - individual sw elts
• When are routes changed
– fixed or static or non-adaptive - at system
startup time by administrator
– adaptive or dynamic - with each DG, with each
VC, weekly, ??
More Issues for Routing Protocols
• Changes based on
– measurements or estimate of traffic over some
period
– network topology - links, sw elts up/down,
added/removed
• Algorithm used
– distance vector (centralized or distributed)
– link state (distributed)
Process for Changing Routes
• Determine relevant parameters - topology,
delay, queue lengths, traffic intensity
• Send information to place(s) making
decisions (sw elts or RCC/NCC)
• Place receiving info computes routing tables
• Routing tables distributed to sw elts (if
NCC)
Distance Vector Routing
• Each sw elt maintains set of triples
– (Desination, Cost, NextHop)
• Each sw elt sends updates to (receives
updates from) directly connected neighbors
– periodically (on the order of several seconds)
– when its table changes (triggered update)
• Update is list of pairs - (Destination, Cost)
• Update local table if receive “better” route
• Refresh existing routes; delete if time out
1
D
A
3
B
1
2
F
8
E
1
Database at
C
A
B
C
D
E
F
-3B
-1D
---
3A
-8C
1D
---
-8B
--4E
--
1A
1B
---2F
--4C
--1F
---2D
1E
--
4
Destination
A
B
C
D
E
F
Distributed algorithm, each column at different sw elt
Centralized algorithm, entire table at RCC
1
D
2
F
A
3
B
1
8
E
1
Database at
C
A
B
C
D
E
F
-3B
-1D
---
2D
-8C
1D
12C
3D
-8B
--4E
--
1A
1B
---2F
-12C
4C
3F
-1F
---2D
1E
--
4
Destination
A
B
C
D
E
F
Changes in B, E tables after one exchange
of routing infomation
Example
B
C
A
D
E
F
@
G
Routing table at node B
Dest Cost NextHop
A
1
A
C
1
C
D
2
C
E
2
A
F
2
A
G
3
A
Example - Recovery
from Link Failure
B
C
A
D
E
F
@
G
F detects that link to G has failed
F sets distance to G to inf, sends update to A
A sets distance to G to inf since it uses F to reach G
A receives update from C with 2 hop path to G
A sets distance to G to 3 and sends update to F
F decides it can reach G in 4 hops via A
Looping Problem
B
C
A
D
E
F
@
G
Link from A to E fails
A advertises distance of infinity to E
B and C advertise distance of 2 to E
B decides it can reach E in3 hops, advertises this to A
A decides it can reach E in 4 hops, advertises this toC
C decides it can reach E in 5 hops
A, B, C form a routing loop
1
A
1
B
1
C
D
1. Destination D
2. Cost from
A
3B (cost to D from A is 3, first hop is B)
B
2C
C
1D
3. C-D fails (second column of 4)
4. Successive table entries for destination D from node
A entries 3B 3B 3B 5B 5B 7B 7B…
B entries
2C 2C 4A 4A 6A 6A 8A…
C entries
1D inf 3B 5B 5B 7B 7B…
Eventually, algorithm converges but only after “count to infinity”
Problem: path to D from B uses B but B is not aware of this
Solution - Split Horizon
• If sw elt W learned route to X from Y, W
does not tell Y about its route to X
• Modification: split horizon with poisoned
reverse - use metric infinity in update for
routes that have ceased to exist
– advantage - convergence can be faster
• Can also speed up convergence by
triggering update to neighbors when a sw
elt’s routing database changes
With Split Horizon
A
B
C
D
Cost From
A
B
C
3B
2C
1D
3B
2C
inf
3B
inf
inf
inf
inf
inf
Split Horizon: do not report route to destination to the
neighbor from which the route was learned
C
Consider
A
1
B
1
D
E
1
1. Destination E
2. Cost from
A
3B (cost to D is 3, first hop is B)
B
2D
C
3B
3. B-D fails
4. Successive table entries for destination E from node
A entries
3B 3B 4C 5C 6C…
B entries
2D inf 4C 5C 6C…
C entries
3B 3B 4A 5A 6A…
Eventually, algorithm converges - after “count to infinity”
Split horizon does not help!
C
A
B
D
E
Split horizon does not help
Successive table entries for destination E from nodes
A
B
C
3B
2D
3B
3B
inf
3B
4C
inf
4A
inf
5A
inf
6B
inf
inf
Solution - Hold Down
• When a route to a destination is lost by a sw
elt, it does not accept a new route to that
destination for a certain amount of time time based on max possible path loop
• Difficulty is determining hold down period.
Too short may lead to (incorrect) acceptance
of new routes. Too long may lead to loss of
data enroute to desination
C
A
B
D
E
With sufficient hold down
Successive table entries for destination E from nodes
A
B
C
3B
2D
3B
3B
inf
3B
inf
inf
inf
Link State
• Each sw elt sends information on directly
connected links to ALL sw elts in the AD
• Contrast with distance vector - entire routing
table sent just to neighbors
• Link state packet - packet used to carry info
• Reliable flooding - hop-hop reliable (optional)
• Shortest Path First Algorithm (Dijkstra) used
to compute routes
Link State Packet - LSP
• Routing protocol packet - on Internet
encapsulated in UDP DG or TCP segment
• Includes
–
–
–
–
@
ID of sw elt that created LSP
cost of link to each directly connected neighbor
sequence number (SEQNO)
time-to-live (TTL) for LSP - max hop count or
max number of time units, as defined by
protocol
LSP Handling
@
• Each sw elt periodically generates new LSP
with increased SEQNO (set to 0 on reboot)
• LSP forwarded on all links except one on
which it arrived
• Each sw elt stores newest LSP from each
source sw elt
• TTL decremented before being forwarded
• TTL of stored LSP may also be periodically
decremented
Dijkstra’s Shortest Path First Algorithm
Goal - find shortest path from source node s to
each node in a graph
• N = set of nodes in the graph
• l(i, j)= cost (weight) for edge (i, j) (>= 0)
• seN = the source node
• M = set of nodes incorporated so far
• C(n) = cost of path from s to n
@
Shortest Path First Algorithm
M = {s}
for each n in N - {s}
C(n) = l(s,n) (inf if s not dir conn to n)
while (N =/ M)
-M = M union {w}: where C(w) is the
minimum for all w in (N - M)
-for each n directly conn to a w
C(n) = MIN (C(n), C(w) + l(w,n))
@
M
s
w added
Min dist from S
X
Assume for all nodes in M, the min distance and correct first hop
from S have been found - consider node X in N-M having min dist
from S among all nodes in N-M. Is there shorter path to X? NO, WHY?
Consider shorter path to X. If node before X is not in M, then it would
have been chosen instead of X. So X is first node outside of M on this
shorter path. But then dist to X would already be this shorter value ...
Route Table Calculation
• Each sw elt contains two lists: Tentative and
Confirmed
• Each list contains a set of triples
(Destination, Cost, NextHop)
• Note that “NextHop” is the first sw elt on
the path from the source S to Destination
@
Confirmed Tentative
S
15
3
C
10 11 2
A
1 D
3
B
1. (S,0,-)
2. (S,0.-)
3. (S,0,-)
(C,3,C)
4. (S,0,-)
(C,3,C)
5. (S,0,-)
(C,3,C)
(D,5,C)
6. (S,0,-)
(C,3,C)
(D,,5,C)
(A,15,A)
(C,3,C)
(D,11,D)
(A.15,A)
(D,11,D)
(A.13,C)
(D,5,C)
(A,13,C)
(A,6,C)
7. (S,0,-)
(C,3,C)
(D,,5,C)
(A,6,C)
8. (S,0,-)
(C,3,C)
(D,,5,C)
(A,6,C)
(B,9,C)
Distance Vector vs. Link State
• Distance vector
– does not reveal net topology to sw elt
– looping behavior with ad hoc solutions
– slow to converge
• Link state
– scales well
– can secure by authenticating origin of LSPs
– each sw elt knows topology of entire network
The Transport Layer
• Implemented in end-systems (hosts) not sw elts
• Uses network layer services
• Issues
–
–
–
–
–
addressing
CO vs. CL, reliable vs. unreliable
connection establishment
flow control
connection termination
CO vs. CL
• Transport layer services may be CO or CL ,
reliable or unreliable
• Consider CO transport service over CL
network service
• Consider reliable transport service over
unreliable network service
Connection Establishment
2-way Handshake
• Initiator sends connection request (w. id)
• Responder sends accept or reject (w. id)
• Responder assumes connection exists when
it sends the ack
• Initiator assumes connection exists when it
receives the ack
• OK if network service reliable and CO
• If not? (consider lost, old conn reqs/ack)
Deal with Old Connection Requests
• Keep track of obsolete connection ids
– problem: lose state info on crash
• Limit packet lifetime with max hop count
– problem: queuing times
• If can bound time in network for request
and ack, then can use initial seq number
based on clock to distinguish requests
– done in TCP, problem: queuing times
• 3-way handshake
3-way Handshake
• Initiator sends request with ID (ISN in TCP)
• Responder sends ack of initiator’s ID (ISN)
plus its own ID (ISN)
• Initiator sends ack of responder’s ID (ISN)
and includes its ID
• Solves old connection request problem
because only establishes connection when
both sides ack the other’s ID
Flow Control
• Similar mechanism (e.g., sliding window)
to link layer flow control
• Link layer receive window is usually fixed
• Transport protocols may include a “credit”
parameter so receiver can inform sender of
its current receive window (variable receive
window), i.e., how much it will accept
beyond what is being acked
• Consider credit of zero!!
Connection Termination
Abort
• User indicates it will not send any more data
nor will it accept any more data
• Transport protocol immediately closes
connection
• Application must insure that all data sent has
been received before issuing abort
Connection Termination
Graceful Close
• User indicate it will send no more data but
will continue to accept data
• Transport protocol does not close
connection until both users have issued
graceful close
• Application can count on transport protocol
to insure all data sent is delivered
When is a segment accepted by a
receiver?
Segment Receive Test
Length Window
-------- -------- --------------------------------------------------0
0
0
>0
>0
0
>0
>0
SEG.SEQ = = RCV.NXT
RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND
not acceptable
RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND
or RCV.NXT <= SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND
Basic 3-way Handshake for Connection
Synchronization
TCP A
TCP B
SEQ ACK CTL
1. CLOSED
2. SYN-SENT
LISTEN
 <100>
<SYN>
 SYN-RECEIVED
3. ESTABLISHED  <300> <101> <SYN,ACK>
 SYN-RECEIVED
4. ESTABLISHED  <101> <301> <ACK>
 ESTABLISHED
5. ESTABLISHED  <101> <301> <ACK> DATA  ESTABLISHED
Simultaneous Connection
Synchronization
TCP A
TCP B
SEQ ACK CTL
1. CLOSED
CLOSED
 <100>
<SYN>
...
3. SYN-RECEIVED  <300>
<SYN>
 SYN-SENT
4.
<SYN>
 SYN-RECEIVED
2. SYN-SENT
... <100>
5. SYN-RECEIVED  <100> <301> <SYN,ACK> ...
6. ESTABLISHED  <300> <101> <SYN,ACK>  SYN-RECEIVED
7. ESTABLISHED  <101> <301> <ACK>
 ESTABLISHED
Recovery from Old Duplicate SYN
TCP A
TCP B
SEQ ACK
CTL
1. CLOSED
LISTEN
2. SYN-SENT
 <100>
<SYN>
...
3. (duplicate)
... <90>
<SYN>
 SYN-RECEIVED
4. SYN-SENT
 <300> <91> <SYN,ACK>  SYN-RECEIVED
5. SYN-SENT
 <91>
<RST>
6.
... <100>
<SYN>
7. SYN-SENT
 <400> <101> <SYN,ACK>  SYN-RECEIVED
8. ESTABLISHED  <101> <401> <ACK>
 LISTEN
 SYN-RECEIVED
 ESTABLISHED
Half-Open Connection Discovery
TCP A
TCP B
SEQ ACK
CTL
1. (Crash)
(Send 300, receive 100)
2. CLOSED
ESTABLISHED
3. SYN-SENT
 <400>
4. (!!)
 <300> <100> <ACK>
 ESTABLISHED
5. SYN-SENT
 <100>
 (Abort!!)
<SYN>
<RST>
 (??)
6. SYN-SENT
7. SYN-SENT
CLOSED
 <400>
<SYN>

Old Duplicate SYN Initiates a Reset on Two
Passive Sockets
TCP A
TCP B
SEQ ACK
CTL
1. LISTEN
LISTEN
... < Z >
3. (??)
 < X > <Z+1> <SYN,ACK>
 SYN-RECEIVED
4.
 <Z+1>
 (return to LISTEN)
5. LISTEN
<SYN>
 SYN-RECEIVED
2.
<RST>
LISTEN
Normal Close Sequence
TCP A
TCP B
SEQ ACK
CTL
1. ESTABLISHED
ESTABLISHED
2. (Close)
FIN-WAIT-1
 <100> <300> <FIN,ACK>
 CLOSE-WAIT
3. FIN-WAIT-2
 <300> <101> <ACK>
 CLOSE-WAIT
 <300> <101> <FIN,ACK>
(Close)
 LAST-ACK
 <101> <301> <ACK>
 CLOSED
4.
TIME-WAIT
5. TIME-WAIT
6. (2 MSL)
CLOSED
Simultaneous Close Sequence
TCP A
TCP B
SEQ ACK
CTL
1. ESTABLISHED
2. (Close)
FIN-WAIT-1
3. CLOSING
4. TIME-WAIT
(2 MSL)
CLOSED
ESTABLISHED
 <100> <300> <FIN,ACK>
 <300> <100> <FIN,ACK>
... <100> <300> <FIN,ACK>
(Close)
... FIN-WAIT-1


 <101> <301> <ACK>
 <301> <101> <ACK>
... <101> <301> <ACK>
... CLOSING


TIME-WAIT
(2 MSL)
CLOSED
Service Interface (to IP)
• Net Send
–
–
–
–
–
identifier
source, destination IP address
protocol (TCP)
data, data length
type of service (precision, reliability, delay,
throughput)
– don’t fragment
– options (IP source routing, time stamp, etc)
Service Interface (from IP)
• Net Deliver
–
–
–
–
–
source, destination IP address
protocol (TCP)
data, data length
type of service
options
Service Interface (from application)
• Open
– unspecified passive - source port
– fully specified passive - +dest port, address
– active, active w. data - +data length, data, psh, urg+
•
•
•
•
•
Send - lcn, data length, data, psh, urg+
Allocate (accept) - lcn, data length, data, urg+
Close (graceful) - lcn
Abort - lcn
Status - lcn
Service Interface (to application)
• Open ID - local connection name, source
port, dest port and address (if known)
• Open failure - lcn
• Open success - lcn
• Deliver - lcn, data, data length, urg+
• Closing - lcn
• Terminate - lcn
• Status response and error
When Bad Segments Arrive
• Send RST (stay in same state)
– closed state and anything arrives except RST
– listen, syn-sent, syn-received states and
incoming acks something not sent
• Send ACK - established and other
synchronized states
• Use seq, ack numbers that other side will
believe (e.g., from incoming ack or data)
TCP Throughput - Error Free
Environment
• Max TCP window/credit = 64KB
• Max TCP throughput
– cross US - 100ms RTT - 5.2Mbps
– metropolitan - 1ms RTT - 520Mbps
– local - .1ms RTT - 5.2Gbps
• Problem is the max credit of 64KB
TP4 Throughput
• Max TP4 window/credit
– (8KB per TPDU) * (32K TPDUs)
– 256MB = 2Gb
• Max TP4 Throughput
– cross US - 100ms RTT - 20Gbps
– metropolitan - 1ms RTT - 2Tbps
– local - .1ms RTT - 20Tbps
TCP Solution - Window Scaling
• Add window scale option
• Both ends of TCP connection must agree at
connection setup time to use scale factors
• Each end tells the other what scale factor it
will accept (0 if willing to use scaling for
other end but not accept scaled window)
• ActualReceiveWindow =
SegmentReceiveWindow * 2**shift-cnt
Window Scaling
• Max shift-cnt = 14
• Max window/credit = 64KB * 2**14 =
~8Gb
• Max TCP througput across US = 80Gbps
TCP vs. TP4 - Common Features
• Connection-oriented reliable service
interface for client processes
• Sliding windows with cumulative acks and
credits
• Timers to trigger retransmission
• Checksum to insure data integrity
• Three-way handshake for connection
establishment
TCP vs. TP4 - Differences
• Sequence number space
– TCP - 32 bits indexing BYTES
– TP4 - 31 bits indexing TPDUs
• Maximum packet size
– TCP - 64KB
– TP4 - 8KB
• Credit - maximum receive window size
– 64KB
– 32K TPDUs, each with max.size 8KB
TCP vs. TP4 - Differences
• Packet types
– TCP - one type with flags (segment)
• SYN, FIN, ACK, RST, PSH, URG
– TP4 - multiple packet types (TPDU)
• CR, CC, DT, AK, DR, DC, ED, EA
• Out of band data
– TCP - URG, PSH
– TP4 expedited data
• Termination
– TCP - graceful close or abort
– TP4 - abort
TCP vs. TP4 - Differences
• Checksum
– TCP - one addition for 16 bytes of data
– TP4 - two additions for 1 byte of data
• Opening connections
– TCP - passive or active open, listen
– TP4 - active open, no listen
• Connections
– TCP - one between address/port pairs
– TP4 - multiple between TSAPs with diff ref #s
The Internet Protocol - IP
Version 4
Internet Model
R
R
R
R
R
R
@
R
TCP
R
IP
IP
TCP
IP
Hub
Packet Delivery Model
• Connectionless service interface
– datagrams may not be delivered to destination
user in the same order as sent by source user
• Potentially unreliable
– datagrams sent may never be delivered
• Datagrams are used within the network
– unbounded delays, reordering, loss, duplication
within network are possible
IP Header
•
•
•
•
•
•
•
Version (4) - currently version 4
HLEN (4) - number of 32 bit words in header
TOS (8) - Type of Service (not widely used)
Length (16) - number of bytes in this DG (max 64K)
Ident (16) - used by fragmentation/reassemlby
Flags/Offset (16) - used by frag/reass (3 +13)
TTL (8) - Time to Live - number of hops before discard this
DG
• Protocol (8) - TCP = 6, UDP = 17
• Checksum (16) - of header only
@ • Destination / Source IP address (32 bits each)
Fragmentation and Reassembly
• Each network has some MTU
• Strategy
–
–
–
–
–
–
@
fragment when necessary (MTU < DG size)
avoid fragmentation at source host
regragmentation is possible
fragments are self contained datagrams
delay reassembly until destination host
do not recover from lost fragments
Indent = x
0 Offset=0
1400 bytes
Indent = x
Indent = x
1 Offset=0
512 bytes
Indent = x
1
Flag = more fragments
512
512 bytes
Indent = x
@
0
376 bytes
1024
Global Addresses
• Properties
– globally unique
– hierarchical - network + host
– each network interface has an IP address
• Four classes - A, B, C, D (multicast-111…)
7
0 net
10
110
@
24
host
14
net
21
network
12.5.120.9
16
host
138.108.12.240
8
host
205.23.198.12
Datagram Forwarding
• Strategy
@
– every DG contains destination’s address
– if directly connected destination network, then
forward to host
– otherwise, forward to some router
– forwarding table maps network number into
next hop
– each host has a default router
– each router maintains a forwarding table
Address Translation
• Map IP addresses into physical addresses
(e.g., 48 bit IEEE)
• Address Resolution Protocol (ARP)
– table of IP to physical address bindings
– sender broadcasts request if destination IP address
not in its table
– target machine responds with its physical address
@
How do Hosts/Routers Build/Maintain
ARP Caches
• Table entries discarded if not refreshed (~10
minutes)
• Update your table with source when you are
the target
• Update your table if you already have an
entry and see info pass by on net
• Do not add new table entry based on info
that passes on net
ICMP
Internet Control Message Protocol
• Uses IP but is a separate protocol in the
network layer
ICMP HEADER
IP HEADER
IP HEADER
PROTOCOL = 1
TYPE CODE CHECKSUM
IP DATA
REMAINDER OF ICMP
MESSAGE (FORMAT IS TYPE
SPECIFIC)
ICMP Types
•
•
•
•
•
•
•
•
@
Echo - Echo Reply (ping)
Redirect (use other router)
Destination Unreachable (prot, port, host)
Time Exceeded
Source Quench
Checksum Failed
Reassembly Failed
Cannot Fragment
Echo and Echo Reply
TYPE CODE CHECKSUM
IDENTIFIER SEQUENCE #
DATA ….
TYPE = 8 = ECHO; 0 = ECHO REPLY
CODE = 0
IDENTIFIER
An identifier to aid in matching echoes and replies
SEQUENCE #
Same use as for IDENTIFIER
UNIX “ping” uses echo/echo reply
Redirect
TYPE CODE CHECKSUM
NEW ROUTER ADDRESS
IP HEADER + 64 bits data
from original DG
TYPE = 5
CODE =
0 = Network redirect
1 = Host redirect
2 = Network redirect for specific TOS
3 = Host redirect for specific TOS
Destination Unreachable
TYPE CODE CHECKSUM
UNUSED
IP HEADER + 64 bits data from original DG
TYPE = 3
CODE
0 = Net unreachable
1 = Host unreachable
2= Protocol unreachable
3 = Port unreachable
4 = Fragmentation needed but DF set
5 = Source route failed
Time Exceeded
TYPE CODE CHECKSUM
UNUSED
IP HEADER + 64 bits data from original DG
TYPE = 11
CODE
0 = Time to live exceeded in transit
1 = Fragment reassembly time exceeded
Source Quench
TYPE CODE CHECKSUM
UNUSED
IP HEADER + 64 bits data from original DG
TYPE = 4; CODE = 0
Indicates that a router has dropped the original DG or may
indicate that a router is approaching its capacity limit.
Correct behavior for source host is not defined.
Traceroute
• UNIX utility - displays router used to get to a
specified Internet Host
• Operation
– router sends ICMP Time Exceeded message to source if
TTL is decremented to 0
– if TTL starts at 5, source host will receive Time
Exceeded message from router that is 5 hopes away
• Traceroute sends a series of probes with different
TTL values… and records the source address of
the ICMP Time Exceeded message for each
• Probes are formatted to that the destination host
will send an ICMP Port Unreachable message
Scaling Issues
• Problems
– Internet addresses have only 32 bits and only 2
levels of hierarchy
– original addresses only allowed 256 networks
– routing protocols do not scale
• IPv6 solves both of these problems
• Classes (A, B, C) provided for more networks
• Subnetting adds a third level of hierarchy
• CIDR makes routing more efficient
Subnetting
• Add another level to address/routing
hierarchy: subnet
• Subnet masks define variable partition of
host part of Class A and B addresses
• Subnets are visible only within the local site
Class B Address
00000000
111111111111111111111111
Subnet Mask (255.255.255.0)
Network Number
SubnetID
Subnetted Address
HostID
Subnet Mask: 255.255.255.128
Subnet number: 128.96.34.0
129.96.34.15
128.96.34.1
H1
R1
Subnet Mask: 255.255.255.128
Subnet number: 128.96.34.128
1228.96.34.130
128.96.34.139
128.96.34.129
H2
R2
H3
128.96.33.14
128,96.33.1
Subnet Mask: 255.255.255.0
Subnet number: 128.96.33.0
@
Subnet
Subnet
Number
Mask
128.96.34.0
255.255.255.128
128.96.34.128 255.255.255.128
128.96.33.0
255.255.255.0
Next
Hop
int0
int1
R2
Forwarding Algorithm
D = destination IP address
for each entry (SubnetNum, SubnetMask, NextHop)
D1 = SubnetMask & D
if D1 = SubnetNum
if NextHop is an interface
deliver datagram directly to destination
else
deliver datagram to NextHop (a router)
@
Notes
• Would use a default router if nothing
matches
• Not necessary for all ones in subnet mask to
be contiguous
• Can put multiple subnets on one physical
network
• Subnets not visible from rest of the Internet
@
@
Supernetting
• Assign block of contiguous network numbers to
near-by networks
• Called CIDR: Classless Inter-Domain Routing
• Represent blocks with a single pair
– (first _network_address, count)
• Restrict block sizes to powers of 2
• Use bit mask (CIDR mask) to identify block size
• All routers must understand CIDR addressing
Route Propagation
Idea: Impose a second hierarchy on the
network that limits what routers talk to each
other. (The first hierarchy is the address
hierarchy that governs packet forwarding.)
• Autonomous Systems (AS)
– corresponds to administrative domain
– example: company backbone, ISP backbone
– assign each AS a 16 bit number
Routing Protocol Types
• Two-level route propagation hierarchy
• Interior gateway protocol
– inside an AS
– each AS selects its own
– routing decisions based on metric
• Exterior gateway protocol
– between ASs
– Internet-wide standard)
– routing decisions based on administrative issues
Popular Interior Gateway Protocols
• RIP: Route Information Protocol
– developed for XNS, distributed with UNIX
– distance vector algorithm, metric is hop count
• OSPF: Open Shortest Path First
–
–
–
–
–
recent Internet standard
link state algorithm
routes changed when topology changes
supports load balancing
2-level internal hierarchy
OSPF
OSPF - Open Shortest Path First
Intra-AS Routing Protocol
• Used within a single AS
• Calculates routes to destination IP network
address
• Link state protocol - SPF/flooding
• Route changes based on topology not traffic
OSPF
• Supports different routes depending on TOS
required
• Supports multiple equal cost paths
• Exchange of routing updates is reliable
between adjacent routers
• Link state advertisement (LSA) contains
info on state of router’s interfaces and
adjacencies, LSAs are flooded
Areas
• Two level routing hierarchy
• Divide AS into areas
• SPF is used to determine routes within each
area and on the backbone that routes
between the areas
• The backbone is also an area
Area 1
Area 2
Area 3
Area 4
Packet Types
•
•
•
•
Hello - acquire neighbors
Database description
Link state request
Link state update, Link state ack
– reliable hop-hop transfer of link state
advertisements (LSA)
– may aggregate LSAs for several sources
Link State Advertisements
• Router link - originated by all routers,
flooded in area
• Network link - originated by designated
router, flooded in area
• Summary link - originated by area border
routers, flooded into area, describes route to
destination outside the area but in the AS
• AS external link - originated by an AS
border router, flooded throughout AS,
describes route to dest in another AS
Types of OSPF Routers
• Internal router - within an area
• Area border router
– attached to multiple areas
– runs a copy of SPF for each attached area
– relays topological info on attached areas to
backbone
• Backbone router - includes area border
routers
• AS boundary routers
Popular Exterior Gateway Protocols
• EGP - Exterior Gateway Protocol
– original protocol in this class
– based on reachability tree
– no metric
• BGP4 - Border Gateway Protocol\
– recent internet standard
Routing Hierarchy Summary
• AS to destination AS (BGP4)
• Within AS (OSPF)
– 2 levels
– to destination network
• Within network
– to subnet, host
Border Routers
• Each AS has
– one or more border routers
– one or more border routers that serve as the
speakers for the AS
– border routers forward datagrams between ASs
• Speakers communicate with border routers
in adjacent ASs, using TCP
• Speakers advertise and receive routing info
and determine which routes to use
• Message types: OPEN, KEEPALIVE,
UPDATE, NOTIFICATION (errors)
UPDATE MESSAGES
• Routes are advertised and withdrawn in UPDATE
messages
• A route includes a sequence or set of ASs on the
path to the destination IP networks
unfeasible routes length (2B)
withdrawn routes (destination IP addresses)
total path attribute length (2B)
path attributes (routes)
network layer reachability info (destination IP addresses)
Routes
• When a BGP speaker advertises a route, it
adds itself to the route
• A BGP speaker only advertises routes that it
uses itself
• Policy considerations are used to determine
route; routes are not optimal
• An advertised route (sequence or set of
ASs) may change -- downstream ASs may
not discover in a timely fashion
Update Handling
• Receive an update with routes
• Decision process
– D (attributes of route) ------> non neg integer
– D specifies the degree of preference for a route
– route to dest with highest preference used
• Note that D does not depend on the
existence or non-existence of other routes
• D is based on policy considerations, not
necessarily on the cost of routes