Transcript network

Reference Notes on TCP/IP
Internetworking

Interconnection of 2 or more networks
forming an internetwork, or internet.
– LANs, MANs, and WANs.

Different networks mean different protocols.
– TCP/IP, IBM’s SNA, DEC’s DECnet, ATM,
Novell and AppleTalk (for LANs).
– Also, satellite and cellular networks.
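Example Internet
[Figure: example internet of 802.3, 802.4, and 802.5 LANs, an X.25 WAN, and an SNA WAN, interconnected by routers (R) and a bridge (B). Labeled connection types: LAN-LAN, LAN-WAN, and LAN-WAN-LAN.]
Gateway: device connecting 2 or
more different networks.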
Gateways





Repeaters: operate at physical layer (bits);
amplify/regenerate signal.
Bridges: store-and-forward frames; data link layer
devices.
Routers: operate at network layer.
Transport gateways: connect networks at the
transport layer.
Application gateways: connect 2 parts of an
application at application layer.
How do networks differ?










Service offered: connection-oriented versus connection-less.
Protocols: IP, IPX, AppleTalk, DECnet.
Addressing: flat (802) versus hierarchical (IP).
Maximum packet size.
Quality of service.
Error control: reliable, ordered, unordered delivery.
Flow control: sliding window versus rate-based.
Congestion control: leaky bucket, choke packets.
Security: privacy rules, encryption.
Parameters: different timeouts.
Types of Internetworks

Connection-oriented concatenation of VC
subnets.
– VC between source and router closest to destination
network.
– Router builds VC to gateway to other subnet.
– Gateway keeps state about that VC.
– Builds VC to router in the next subnet, etc.

Every packet traverses same path.
– Ordered delivery.
– Routers convert between packet formats.
Connectionless Internetworking

Datagram model.
– Different packets may take different routes.
– Separate routing decision for each packet.
– No ordered delivery guarantees.
Datagram versus VC Internets

VC:
– Pluses: resources reserved in advance, ordered
delivery, short headers.
– Minuses: vulnerability to failures, less adaptive,
hard if involving datagram subnet.

Datagram:
– Pluses: more robust and adaptive, can be used over
datagram subnets (many LANs, mobile networks).
– Minuses: longer headers, unordered delivery.
Tunneling

Interconnecting through a “foreign” subnet.
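[Figure: Ethernet 1 and Ethernet 2 joined by a tunnel across a WAN between two gateways (G). The IP packet travels in an Ethernet frame on each Ethernet; across the WAN, the IP packet sits inside the payload field of the WAN packet.]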
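The encapsulation step in tunneling can be sketched as follows. This is a minimal illustration, not a real protocol implementation: the `IPPacket` and `WANPacket` types, the gateway names `G1` and `G2`, and the host names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IPPacket:
    src: str
    dst: str
    data: bytes


@dataclass(frozen=True)
class WANPacket:
    # Hypothetical WAN packet: it is addressed to the gateways, not the hosts.
    entry_gateway: str
    exit_gateway: str
    payload: IPPacket  # the whole IP packet rides in the payload field


def tunnel_in(packet: IPPacket, entry_gateway: str, exit_gateway: str) -> WANPacket:
    """Entry gateway: wrap the unmodified IP packet in a WAN packet."""
    return WANPacket(entry_gateway, exit_gateway, packet)


def tunnel_out(wan_packet: WANPacket) -> IPPacket:
    """Exit gateway: unwrap the IP packet and forward it on the far Ethernet."""
    return wan_packet.payload


original = IPPacket("host-on-ethernet-1", "host-on-ethernet-2", b"hello")
carried = tunnel_in(original, "G1", "G2")
delivered = tunnel_out(carried)
```

The WAN only ever looks at the outer packet, so the two Ethernets need not know which protocol the WAN speaks.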
Internetwork Routing 1

2-level hierarchy:
– Routing within each network: interior gateway protocol.
– Routing between networks: exterior gateway protocol.


Within each network, different routing algorithms
can be used.
Each network is autonomously managed and
independent of others: autonomous system (AS).
Internetwork Routing 2
Typically, packet starts in its LAN. Gateway
receives it (broadcast on LAN to
“unknown” destination).
 Gateway sends packet to gateway on the
destination network using its routing table.
If it can use the packet’s native protocol,
sends packet directly. Otherwise, tunnels it.

Fragmentation 1

Network-specific maximum packet size.
– Width of TDM slot.
– OS buffer limitations.
– Protocol (number of bits in packet length field).

Maximum payloads range from 48 bytes
(ATM cells) to 64Kbytes (IP packets).
Fragmentation 2




What happens when large packet wants to travel
through network with smaller maximum packet size?
Fragmentation.
Gateways break packets into fragments; each sent as
separate packet.
Gateway on the other side has to reassemble
fragments into original packet.
2 kinds of fragmentation: transparent and nontransparent.
Transparent Fragmentation


Small-packet network transparent to other subsequent
networks.
Fragments of a packet addressed to the same exit
gateway, where packet is reassembled.
– OK for concatenated VC internetworking.


Subsequent networks are not aware fragmentation
occurred.
ATM networks (through special hardware) provide
transparent fragmentation: segmentation.
Problems with Transparent Fragmentation

Exit gateway must know when it received all
the pieces.
– Fragment counter or “end of packet” bit.
Some performance penalty from requiring all
fragments to go through the same gateway.
 May have to repeatedly fragment and
reassemble through series of small-packet
networks.

Non-Transparent Fragmentation

Only reassemble at destination host.
– Each fragment becomes a separate packet.
– Thus routed independently.

Problems:
– Hosts must reassemble.
– Every fragment must carry header until it
reaches destination host.
Keeping Track of Fragments 1
Fragments must be numbered so that original
data stream can be reconstructed.
 Tree-structured numbering scheme:

– Packet 0 generates fragments 0.0, 0.1, 0.2, …
– If these fragments need to be fragmented later on, then
0.0.0, 0.0.1, …, 0.1.0, 0.1.1, …
– But, too much overhead in terms of number of fields
needed.
– Also, if fragments are lost, retransmissions can take
alternate routes and get fragmented differently.
Keeping Track of Fragments 2
Another way is to define elementary fragment
size that can pass through every network.
 When packet fragmented, all pieces equal to
elementary fragment size, except last one
(may be smaller).
 Packet may contain several fragments.

Keeping Track of Fragments 3

Header contains packet number, number of first
fragment in the packet, and last-fragment bit.
[Figure: 1 byte per elementary fragment. Header fields, in order: packet number, number of first fragment, last-fragment bit.]
(a) Original packet with 10 data bytes:
27 | 0 | 1 | A B C D E F G H I J
(b) Fragments after passing through network with maximum packet size = 8 bytes:
27 | 0 | 0 | A B C D E F G H
27 | 8 | 1 | I J
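The elementary-fragment scheme can be sketched in a few lines. This is an illustration under stated assumptions: the `Fragment` type and function names are hypothetical, the elementary fragment is 1 byte as in the figure, and the maximum packet size is taken to count data bytes only.

```python
from dataclasses import dataclass

ELEMENTARY_SIZE = 1  # bytes per elementary fragment (assumption, as in the figure)


@dataclass(frozen=True)
class Fragment:
    packet_number: int
    first_fragment: int  # number of the first elementary fragment carried
    last: bool           # last-fragment bit
    data: bytes


def fragment(piece: Fragment, max_payload: int) -> list:
    """Split a packet (or an earlier fragment) for a network with a smaller limit."""
    per_packet = (max_payload // ELEMENTARY_SIZE) * ELEMENTARY_SIZE
    if per_packet <= 0:
        raise ValueError("network cannot carry even one elementary fragment")
    out = []
    for start in range(0, len(piece.data), per_packet):
        chunk = piece.data[start:start + per_packet]
        is_final = start + per_packet >= len(piece.data)
        out.append(Fragment(
            piece.packet_number,
            piece.first_fragment + start // ELEMENTARY_SIZE,
            piece.last and is_final,  # only the tail of the original keeps the bit
            chunk,
        ))
    return out


def reassemble(pieces: list) -> bytes:
    """Destination: order by first-fragment number and concatenate."""
    ordered = sorted(pieces, key=lambda p: p.first_fragment)
    return b"".join(p.data for p in ordered)


packet = Fragment(27, 0, True, b"ABCDEFGHIJ")
pieces = fragment(packet, max_payload=8)
```

Because fragments are numbered by absolute position, a fragment can be split again by a later network without renumbering anything.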
The Internet Network Layer
The Internet as a collection of networks or
autonomous systems (ASs).
 Hierarchical structure.

[Figure: US backbone and European backbone joined by transcontinental links, with regional and national networks attached.]
IP (Internet Protocol)
Glues Internet together.
 Common network-layer protocol spoken by all
Internet participating networks.
 Best effort datagram service:

– No reliability guarantees.
– No ordering guarantees.
IP
Transport layer breaks data streams into
datagrams; datagrams transmitted over
Internet, possibly being fragmented along the way.
 When all packet fragments arrive at
destination, reassembled by network layer
and delivered to transport layer at
destination host.

IP Versions

IPv4: IP version 4.
– Current, predominant version.
– 32-bit long addresses.

IPv6: IP version 6 (aka, IPng).
– Evolution of IPv4.
– Longer addresses (16-byte long).
IP Datagram Format
IP datagram consists of header and data (or
payload).
 Header:

– 20-byte fixed (mandatory) part.
– Variable length optional part.
IP Header
Each row is 32 bits wide:
Version | Header length | Type of service | Total length
Identification | U D M | Fragment offset
TTL | Protocol | Header checksum
Source address
Destination address
Options
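As a rough illustration of the layout above, the 20-byte fixed part can be unpacked with Python's standard `struct` module. The function name, the dictionary keys, and the sample field values are assumptions made for this sketch; the sample checksum is a placeholder 0, not a computed value.

```python
import socket
import struct

FIXED_HEADER = "!BBHHHBBH4s4s"  # network byte order, 20 bytes in total


def parse_ipv4_header(raw: bytes) -> dict:
    """Unpack the fixed 20-byte part of an IPv4 header."""
    (ver_ihl, tos, total_length, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack(FIXED_HEADER, raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_length_words": ver_ihl & 0x0F,  # in 32-bit words, minimum 5
        "type_of_service": tos,
        "total_length": total_length,
        "identification": ident,
        "dont_fragment": bool(flags_frag & 0x4000),   # D bit
        "more_fragments": bool(flags_frag & 0x2000),  # M bit
        "fragment_offset": flags_frag & 0x1FFF,       # in units of 8 bytes
        "ttl": ttl,
        "protocol": proto,
        "header_checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }


# Hand-built sample header (values are illustrative only).
sample = struct.pack(
    FIXED_HEADER, 0x45, 0, 40, 27, 0x4000, 64, 6, 0,
    socket.inet_aton("128.6.240.3"), socket.inet_aton("130.50.15.6"),
)
fields = parse_ipv4_header(sample)
```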
IP Header Fields 1




Version: which IP version datagram uses.
Header length: how long (in 32-bit words) is header;
minimum=5; maximum=15 (options=40 bytes).
Type of service: precedence (priority), 3 flags (delay,
throughput, reliability). In practice, routers ignore
type of service.
Total length: length of total datagram, i.e., header +
data (max = 64Kbytes).
IP Header Fields 2
Identification: which datagram fragment
belongs to.
 U: unused bit.
 D: don’t fragment.
 M: more fragments.
 Fragment offset: position of fragment in
datagram.
 TTL: datagram lifetime.

IP Header Fields 3
Protocol: number of the transport protocol
that generated the datagram.
 Header checksum: verifies header integrity;
computed at each hop.
 Source and destination address: IP
addresses of source and destination.
 Options: way of extending the protocol.

Addressing

Required for packet delivery.
– Each network may use different addressing
scheme.
– Addresses must be unique.
Flat addresses: physical addresses (e.g.,
Ethernet address).
 Hierarchical addresses: use hierarchy
scheme like postal addresses (e.g., IP).

Address Types
Unicast: uniquely distinguishes a single
node.
 Multicast: shared by a group of nodes.
 Broadcast: shared by all nodes.

IP Addresses
Every host and router on the Internet must
have an IP address.
 2-level hierarchy:

– Network number.
– Host number.

Notations:
– Binary: 10000000 00000110 11110000 00000011
– Dotted decimal: 128.6.240.3
IP Address Formats 1

4 different classes (network bits shown; the remaining bits are the host number):
Class A: 0XXXXXXX; 128 nets; 16M hosts/net.
Class B: 10XXXXXX XXXXXXXX; 16K nets; 64K hosts/net.
Class C: 110XXXXX XXXXXXXX XXXXXXXX; 2M nets; 256 hosts/net.
Class D: 1110XXXX XXXXXXXX XXXXXXXX XXXXXXXX; multicast.
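The class of an address follows from the leading bits of its first byte. A minimal sketch, with a hypothetical function name; addresses outside the four classes listed are reported as "other":

```python
def address_class(dotted: str) -> str:
    """Classify a dotted-decimal IPv4 address by the leading bits of byte 1."""
    first = int(dotted.split(".")[0])
    if first & 0b10000000 == 0b00000000:
        return "A"  # 0XXXXXXX
    if first & 0b11000000 == 0b10000000:
        return "B"  # 10XXXXXX
    if first & 0b11100000 == 0b11000000:
        return "C"  # 110XXXXX
    if first & 0b11110000 == 0b11100000:
        return "D"  # 1110XXXX, multicast
    return "other"  # not one of the four classes listed


example_class = address_class("128.6.240.3")
```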
IP Address Formats 2
Class A: first byte 1~127.
 Class B: 128~191.
 Class C: 192~223.
 Class D: 224~239.

Multi-addresses

A router usually has more than one IP
address.
Multi-homed host: host with multiple
network interfaces each of which has
different IP address.
[Figure: networks 236.240.128.0, 129.98.0.0, and 80.0.0.0, with interface addresses 236.240.128.3, 129.98.95.1, and 80.0.0.8.]
Management and Scalability 1
Network numbers assigned by single
authority: NIC (network information
center).
 All hosts in a network must have same
network number.
 What if networks grow?

Management and Scalability 2

Example: company starts with 1 class C
LAN, thus can connect up to 256 hosts.
– It might grow to more than 256 hosts.
– It might get more LANs.
– For every new LAN, need new network number
from NIC.
– Moving machines between LANs needs address
change.
Subnetting 1

Split address space into several “internal”
subnets.
– Still act like single network to outside world.

Example: Class B address.
Class B: 16K nets, 64K hosts/net.
10XXXXXX XXXXXXXX HHHHHHHH HHHHHHHH
Class B with subnetting: 62 LANs, 1022 hosts each.
10XXXXXX XXXXXXXX SSSSSSHH HHHHHHHH
1st. subnet: 130.50.4.1
2nd. subnet: 130.50.8.1
Subnetting 2

Routing: hierarchical.
– (network, -) entries: distant networks hosts.
– (this network, host) entries: local hosts.
– Routers only need to keep track of other networks and
local hosts.

With subnetting:
– (network, -) entries: distant networks hosts.
– (this network, subnet, -).
– (this network, this subnet, host).
– Adds extra hierarchical level => smaller RTs.
Subnet Mask

Used to compute the subnet number; i.e., gets
rid of the host number.
– Facilitates routing table look-up.
– IP address AND subnet mask = subnet #

Example:
Address layout: 10XXXXXX XXXXXXXX SSSSSSHH HHHHHHHH
Subnet mask:    11111111 11111111 11111100 00000000
Ex: 130.50.15.6 AND subnet mask = 130.50.12.0,
which is subnet 3.
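The example can be checked with a short calculation. This sketch uses Python's standard `ipaddress` module only to convert between dotted decimal and integers; the variable names are illustrative.

```python
import ipaddress

address = int(ipaddress.IPv4Address("130.50.15.6"))
mask = int(ipaddress.IPv4Address("255.255.252.0"))

# IP address AND subnet mask = network number plus subnet number, host bits cleared.
subnet = ipaddress.IPv4Address(address & mask)

# The 6 subnet bits sit just above the 10 host bits.
subnet_number = ((address & mask) >> 10) & 0b111111

# The host number is whatever the mask removed.
host_number = address & ~mask & 0xFFFFFFFF
```

Here `subnet` is 130.50.12.0 and `subnet_number` is 3, matching the slide.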
Internet Control Protocols
IP carries data.
 There are other network layer protocols that
carry control information.
 Example: ICMP, ARP, RARP, BOOTP.

ICMP
Internet Control Message Protocol.
 Report specific events.

– Generated by routers.
– Encapsulated in IP packets.
ICMP Messages
Destination unreachable: Packet couldn’t be delivered
Time exceeded: TTL field hit 0
Parameter problem: Invalid header field
Source quench: Choke packets
Redirect: Route problem
Echo request: Check if destination is up
Echo reply: Destination responds
Timestamp request: Same as echo request + TS
Timestamp reply: Same as echo reply + TS
Mapping IP to DLL Address
Internet applications refer to hosts by their IP
addresses; once packet gets to destination
LAN, node needs to figure out the destination
DLL address.
 One solution is to have configuration file.

– Hard to maintain/update.

Address Resolution Protocol (ARP):
– Run by every node to map IP to DLL address
(RFC 826).
ARP

Advantage:
– Easy to administer, less human intervention.
– Example: 2 hosts on the same Ethernet want to
communicate.
» Host 1 must figure out host 2’s Ethernet address.
» Host 1 broadcasts ARP packet on Ethernet asking for
the Ethernet address of host 2.
» Host 2 receives the ARP request, and replies with its
Ethernet address.
ARP Optimizations

Caching of ARP replies.
– Entries may have large TTLs.
When sending ARP request, piggyback its
own IP-DLL address mapping.
 Every machine broadcasts its mapping at
boot time.

– No response is expected.
– Other machines cache that information.
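Caching of ARP replies with a lifetime per entry might look like the sketch below. The `ArpCache` class, its method names, the 10-second TTL, and the Ethernet address are all hypothetical; a fake clock is injected so the expiry can be shown without waiting.

```python
import time


class ArpCache:
    """Maps IP addresses to DLL addresses; entries expire after a TTL."""

    def __init__(self, ttl_seconds: float = 1200.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # ip -> (dll_address, expiry_time)

    def learn(self, ip: str, dll_address: str) -> None:
        """Record a mapping seen in a reply, a request, or a boot-time broadcast."""
        self._entries[ip] = (dll_address, self._clock() + self._ttl)

    def lookup(self, ip: str):
        """Return the cached DLL address, or None if an ARP request is needed."""
        entry = self._entries.get(ip)
        if entry is None:
            return None
        dll_address, expires = entry
        if self._clock() >= expires:
            del self._entries[ip]  # stale entry: drop it and ask again
            return None
        return dll_address


now = [0.0]  # fake clock for the demonstration
cache = ArpCache(ttl_seconds=10.0, clock=lambda: now[0])
cache.learn("130.50.4.1", "08:00:2b:00:00:01")
fresh = cache.lookup("130.50.4.1")
now[0] = 11.0
stale = cache.lookup("130.50.4.1")
```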
Proxy ARP

What if host 1 wants to send data to host 3
on a different LAN?
– Router connecting the 2 LANs can be
configured to respond to ARP requests for the
networks it interconnects: proxy arp.
– Another solution is for host 1 to recognize host
3 is on remote network and use default LAN
address that handles all remote traffic; that
could be the router’s Ethernet address.
RARP
Reverse Address Resolution Protocol.
 Given LAN address, what’s the IP address?
 Usually for booting diskless workstation.

– Gets the OS image from remote file server.
– Same image for all machines.
– Machine broadcasts its LAN address.
– Remote RARP server responds with machine’s IP address.
BOOTP
RARP broadcasts are not forwarded by
routers.
 Need RARP server on every network.
 BOOTP uses UDP messages that are
forwarded by routers.

– Also provides additional information such as IP
address of file server holding OS image, subnet
mask, etc.
Internet Routing

IGPs and EGPs
– IGPs: routing within ASs.
– EGPs: routing between ASs.
IGPs

Original Internet IGP was RIP.
– Distance vector.
– OK for small ASs but not efficient as ASs got larger.

New IGP: OSPF.
– Open Shortest Path First.
– Became standard in 1990.
– Link state algorithm.
– RIP is still running but OSPF is taking over.
OSPF 1

Design requirements:
– Open implementation.
– Support for various distance metrics: delay, hops, etc.
– Dynamic: automatically adapt to topology changes.
– QoS Routing: real-time versus other traffic using IP’s type of service field.
– Load balancing across multiple lines.
– Security and tunneling.
OSPF 2
Abstracts collection of networks, routers and
lines into a directed graph where edges are
assigned a cost proportional to the routing
metric.
 It then computes shortest path.
 Hierarchical routing within ASs.

– Areas: collection of contiguous networks.
– Area 0: AS backbone; all areas connected to it.
OSPF 3

Type of service routing:
– Uses different graphs labeled with different
metrics.

Routing updates:
– Adjacent routers exchange routing information.
– Adjacent routers are on different LANs.
– Reliable link state updates with sequence #’s.
EGPs
Routing protocol between ASs.
 Take policy into account.

– An AS may not be willing to carry traffic
originating and destined to foreign ASs.
– Example: phone companies are willing to carry
traffic for their customers but not for others.
Routing Policy Examples
No transit traffic through certain ASs.
 Traffic source restricts ASs through which
its traffic crosses.
 Same for destination.

BGP 1
Border Gateway Protocol.
Policies are manually configured into BGP
routers.
BGP abstracts networks as a collection of BGP
routers and their links.
2 BGP routers are connected if they share a
common network.
BGP routers communicate reliably using TCP.

BGP 2

3 types of networks:
– Stub networks: have a single connection in the
BGP graph; cannot carry transit traffic.
– Multi-connected networks: have multiple
connections but refuse to carry transit traffic.
– Transit networks: agree to carry transit (3rd.
party) traffic possibly with some restriction;
e.g., backbones.
BGP 3
BGP is a distance vector protocol.
Routing table entries keep whole path to
destination + distance.
A BGP router can discard any path that contains
itself: avoids loops and counting to infinity.
Routers compute distance associated to a route
taking policy into account.

– If policy is violated, distance = infinity.
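The route selection described above can be sketched as follows. This is a simplified illustration, not BGP itself: the function names, the AS labels, the distances, and the policy (a set of ASs that must not be crossed) are all hypothetical.

```python
import math


def route_distance(my_as: str, path: list, distance: float, banned: frozenset) -> float:
    """Score one advertised route; infinity means 'do not use'."""
    if my_as in path:
        return math.inf  # path already contains this router's AS: a loop
    if any(hop in banned for hop in path):
        return math.inf  # policy violated
    return distance


def best_route(my_as: str, candidates: list, banned: frozenset = frozenset()):
    """Pick the usable path with the smallest distance, or None."""
    scored = [(route_distance(my_as, path, dist, banned), path)
              for path, dist in candidates]
    usable = [item for item in scored if item[0] != math.inf]
    if not usable:
        return None
    return min(usable, key=lambda item: item[0])[1]


# Routes to destination D as advertised by three neighbors (illustrative).
candidates = [
    (["B", "A", "D"], 2),  # passes through A itself
    (["C", "F", "D"], 3),
    (["E", "X", "D"], 1),  # crosses X, which policy forbids
]
chosen = best_route("A", candidates, banned=frozenset({"X"}))
```

Keeping the whole path in each entry is what lets the router spot the loop in the first candidate without waiting for distances to count upward.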
Internet Multicasting

IP supports multicasting using class D
addresses.
– Each class D address identifies a group of
hosts.
– 28 bits define over 250 million groups.

Best-effort delivery.
Group Membership
Hosts (single or multiple processes) may join
and leave group.
 Special, multicast routers perform multicast
routing and packet forwarding.

– Hosts belonging to multicast groups periodically
send messages to the closest multicast router.
– Multicast routers and hosts use IGMP (Internet
Group Management Protocol) to exchange
membership information.
IP Multicast Routing
Use spanning trees.
 Modified distance vector protocol using
unicast routing information.

– Build one spanning tree per source, per group.
– Or, one shared spanning tree per group.
– Use pruning to remove parts of the tree that don’t
have any multicast group members.
– Use tunneling to cross regions that are not
multicast capable.
Mobile IP 1

Support for mobile users.
– “Last hop” mobility.

Problem: IP addressing scheme.
– Class+network number+host number.
– If host moves and attaches itself to foreign
network, packets destined to it will still go to its
home network.
– Assigning hosts new IP address?
» Too much hassle.
Mobile IP 2

Solution:
– Home agent: runs at the home network.
– Foreign agent: runs at foreign network.
– When mobile host connects itself to foreign
network, registers with foreign network’s
foreign agent.
– Foreign agent assigns host care-of address, and
informs home agent.
Mobile IP 3
Sending packets: mobile host uses its care-of
address.
 Receiving packets:

– When packet arrives at home network, router that gets it
sends ARP request for that IP address.
– Home agent replies with its own Ethernet address. It gets
the packet, and tunnels it to foreign agent. Foreign agent
delivers packet to mobile host.
– Home agent sends care-of address to sender, so future
packets are sent directly to foreign network.
Mobile IP 4

Locating foreign agents:
– Foreign agents periodically broadcast their address and
service provided (e.g., home, foreign, or both).
– Mobile host can announce its presence and wait for
response from foreign agent.

Unregistration:
– If host leaves without unregistering, its registration expires
after some time.

Security:
– Authentication issues.
Scaling IP Addresses 1

Exponential growth of the Internet!
– 32-bit address fields are getting too small.
– Early predictions: it’d take decades to achieve
100,000 network mark.
– 100,000th. network was connected in 1996!
– Internet is rapidly running out of IP addresses!
– Waste due to hierarchical address.
Scaling IP Addresses 2
Class A addresses: 16M hosts is usually too
many.
 Class C addresses: 254 hosts is usually too
small.
 Class B addresses provide room for 64K hosts.

– Organizations usually request class B addresses
but more than 50% of them only have up to 50
hosts!
Scaling IP Addresses 3


Class C addresses should have 10-bit host
numbers instead of only 8-bit numbers.
– Would allow for 1022 hosts instead of just 254.
– More Class C networks: network number can
grow up to 0.5M.
But, could result in routing table explosion.
– Routers will have to know about many more
networks.
CIDR 1
Classless Interdomain Routing: RFC 1519.
 No longer uses classes A, B, and C addresses.
 Allocate remaining Class C addresses in
variable-sized blocks.

– Example: if an organization needs 2000 addresses,
it’s given a block of 2048 addresses, or 8
contiguous class C networks and not a full class B
address.
CIDR 2


New allocation rules for class C addresses.
World partitioned into 4 zones and each one was
given portion of class C address space (192~223).
– 194.0.0.0~195.255.255.255: Europe.
– 198.0.0.0~199.255.255.255: North America.
– 200.0.0.0~201.255.255.255: Central and South America.
– 202.0.0.0~203.255.255.255: Asia and Pacific.
CIDR 3
Each region is allocated ~ 32M class C
addresses.
 Addresses 204.0.0.0~223.255.255.255
reserved for future use.
 Advantages:

– Less waste.
– Routers can keep only one RT entry per region,
i.e., 32M addresses compressed into one.
CIDR 4
Once packet gets to its destination region,
need more detailed routing information.
One possibility is to keep 131,072 (32M/2^8)
entries for all “local” networks.

– Explosion problem.

Instead, use of 32-bit masks: only need to
keep start address of block.
CIDR - Example 1



Cambridge University has 2048 addresses from
194.24.0.0~194.24.7.255 and mask 255.255.248.0.
Oxford University: 4096 addresses
194.24.16.0~194.24.31.255 with mask
255.255.240.0.
U of Edinburgh: 1024 addresses
194.24.8.0~194.24.11.255 and mask 255.255.252.0.
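The three blocks above can be turned into a small routing table and matched with the mask-and-compare step that CIDR relies on. This is a sketch: the `ROUTES` table, the `lookup` function, and the test addresses are illustrative, and Python's standard `ipaddress` module does the parsing.

```python
import ipaddress

# Block start address and mask, as given in the example.
ROUTES = {
    ipaddress.ip_network("194.24.0.0/255.255.248.0"): "Cambridge",
    ipaddress.ip_network("194.24.16.0/255.255.240.0"): "Oxford",
    ipaddress.ip_network("194.24.8.0/255.255.252.0"): "Edinburgh",
}


def lookup(dotted: str):
    """AND the address with each mask; the longest matching mask wins."""
    address = int(ipaddress.IPv4Address(dotted))
    matches = [
        net for net in ROUTES
        if address & int(net.netmask) == int(net.network_address)
    ]
    if not matches:
        return None
    return ROUTES[max(matches, key=lambda net: net.prefixlen)]


sizes = {name: net.num_addresses for net, name in ROUTES.items()}
owner = lookup("194.24.17.4")
```

Only the start address and mask of each block are stored, so three entries cover 7168 addresses.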
IP Evolution
CIDR bought IPv4 a few more years.
 Because of its addressing limitations and to
accommodate next-generation Internet
applications, IP must evolve.
 In 1990, IETF started work on IP next
generation, or IPng.

– Several proposals were considered.
– SIPP (Simple Internet Protocol Plus) was selected
and became IPv6.
IPv6 1
RFCs 1883~1887.
 Features:

– Longer addresses (16 bytes versus only 4 in IPv4).
– Header simplification (only 7 fields versus 13
fields in IPv4): faster processing by routers.
– Better option support since fields that were
previously required are now optional.
– Improved security and QoS support.
IPv6 Header
Each row is 32 bits wide:
Version | Priority | Flow label
Payload length | Next header | Hop limit
Source address (16 bytes)
Destination address (16 bytes)
IPv6 Header Fields 1

Version = 6.
– During transition period, routers will examine this field to
decide what kind of packet it is.

Priority: handling different kinds of traffic.
– 0~7: data that can be flow controlled, e.g., data distribution
services.
– 8~15: real-time traffic (e.g., audio, video)
– Within each group, lower values have lower priority than
higher values (e.g., 1 for news, 4 for ftp and 6 for telnet)
IPv6 Header Fields 2

Flow label (experimental): allows source and
destination to set up pseudo-connection.
– Try to have some kind of service guarantees.
– Example: assign flow number to a stream of
packets that need reserved bandwidth.
– Flow number: src+dst+flow #.

Payload length: length of data.
– Different from IPv4 which specified total length of
datagram.
IPv6 Header Fields 3
Next header: specifies what is present in the
options field (extension headers).
 Hop limit: equivalent to IPv4’s TTL.
 Source and destination addresses:

– 16-byte addresses (fixed length).
– Address space is divided by using prefixes.
IPv6 versus IPv4



No more IHL (header length); why?
No more protocol field: next header field.
No more fragmentation-related fields.
– All IPv6 hosts and routers must support 576-byte packets.
– Fragmentation is less likely to occur.
– Router sends error messages back to source when packet is
too big so source breaks it down.

No more checksum: rely on more reliable networks
and DLL and transport checksums.
IPv6 Addressing 1

Separate prefixes for provider-based and geographic-based addresses.
– Ability to accommodate 2 ways of address assignment.

Provider-based addresses (allocated to ISP companies):
– Prefix 010.
– Each ISP assigned portion of address space.
– First 5 bits following prefix define registry where provider is
registered.
– Remaining 15 bytes are allocated by each provider.
– Example: 3-byte provider number.
IPv6 Addressing 2

Geographic-based addresses:
– Prefix 100.
– Same model as current Internet.

Multicast addresses:
– Prefix 11111111.
– 4-bit flag + 4-bit scope fields + 112-bit group id.
– Flags: 1 bit defines whether group is permanent or
not.
– Scope: limit reach of multicast packet.
IPv6 Address Notation

8 groups of 4 hexadecimal digits separated
by colons.
– Example:
8000:0000:0000:0000:0123:4567:89AB:CDEF
– Optimizations:
» Leading zeros within group can be omitted.
» Groups of zeros can be replaced by pair of colons.

8000::123:4567:89AB:CDEF.
» IPv4 addresses: ::192.31.20.46.
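The notation rules can be tried out with Python's standard `ipaddress` module, which applies the same two optimizations. Note that it prints hexadecimal digits in lowercase; the variable names are illustrative.

```python
import ipaddress

full = ipaddress.IPv6Address("8000:0000:0000:0000:0123:4567:89AB:CDEF")

# Leading zeros dropped and the run of zero groups replaced by "::".
short = full.compressed

# The short form parses back to the same 128-bit address.
same = ipaddress.IPv6Address("8000::123:4567:89AB:CDEF") == full

# An IPv4 address written in the last 32 bits of an IPv6 address.
embedded = ipaddress.IPv6Address("::192.31.20.46")
low_32_bits = ipaddress.IPv4Address(int(embedded) & 0xFFFFFFFF)
```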
Extension Headers 1
Equivalent to IPv4 options.
 6 types of extension headers:

Hop-by-hop options: Misc. info for routers
Routing: Full or partial route included
Fragmentation: Management of fragments
Authentication: Verification of source’s id
Encrypted payload: Information about encryption
Destination options: Information for destination
Extension Headers 2


Fixed format and variable-sized headers.
Variable-sized headers:
– (type, length, value).
– Type: 1 byte specifying which option this is.
» First 2 bits tell option-uncapable routers what to do: skip option,
discard packet, discard packet with ICMP message, discard packet
without ICMP packet for multicast addresses.
– Length: how long value field (0~255 bytes).
– Value: information.
Hop-by-Hop Header

Convey information all routers along path
must examine.
– Jumbograms: datagrams > 64KBytes.
[Header layout: Next header | 0 | 194 | 0, followed by the jumbogram payload length.]
– Next header: what option this is.
– Length of hop-by-hop header excluding the first 8
(mandatory) bytes.
– Defines option, in this case datagram size.
Routing Header

Lists one or more routers that must be
visited on the way to the destination.
– Strict source routing: full path is supplied.
– Loose source routing: only selected routers are
listed.
Fragment Header

Allows source to fragment datagram.
– In IPv6, routers are not allowed to fragment.
– If a router receives packet that is too big, it
discards it and sends back an ICMP message to
source.
– Source uses this option to fragment packet, and
resend it.
– Contains datagram id, fragment number, and
“last fragment” bit.
Authentication Header
Supports verification of sender’s identity.
Contains key number and
cryptographic checksum of the whole
datagram.
Receiver uses key number to find secret
key. Computes checksum using secret key
and checks whether it matches the one in the
received datagram.

Destination Options

Supports options that need only be
interpreted by destination host.
Quality of Service




Service offered by the network (carrier) to customer
(end user): service agreement.
Service agreement: offered traffic, offered service,
compliance requirements.
If customer and carrier don’t agree: VC will not be
set up.
Different requirements for each direction.
– E.g., VOD application: required bandwidth user->server
<> server->user.
Quality of Service Parameters 1
Peak cell rate (PCR): Max. cell transmission rate
Sustained cell rate (SCR): Average cell rate
Minimum cell rate (MCR): Min. acceptable cell rate
Cell delay variation tolerance (CDVT): Max. acceptable cell jitter
Cell loss ratio (CLR): Fraction of lost cells
Cell transfer delay (CTD): Time to deliver
Cell delay variation (CDV): Delivery delay variation
Cell error rate (CER): Fraction of correct cells
QoS Parameters 2
PCR, SCR, MCR, and CDVT: specified by
sender.
 CLR, CTD, and CDV describe network
conditions and are measured at receiver.

The Transport Layer

End-to-end.
– Communication from source to destination
host.
– Only hosts run transport-level protocols.
– Under user’s control as opposed to network
layer which is controlled/owned by carrier.
The Transport Service
Service provided to application layer.
 Transport entity: process that implements
the transport protocol running on a host.

– At OS kernel, user-level process, or network
card.
The Transport Layer
[Figure: source host and destination host, each with an application layer, a transport entity, and a network layer. The application/transport interface is reached through a transport address and the transport/network interface through a network address. The two transport entities exchange TPDUs.]
Types of Transport Services
Connection-less versus connection-oriented.
 Connection-less service: no logical
connections, no flow or error control.
 Connection-oriented:

– Based on logical connections: connection setup,
data transfer, connection teardown.
– Flow and error control.
Transport versus Network Layer

Transport layer is “controlled” by user.
– Ability to enhance network layer quality of
service.
– Example: transport service can be more reliable
than underlying network service.
– Transport layer makes standard set of
primitives available to users which are
independent from the network service
primitives, which may vary considerably.
Quality of Service

User may specify QoS parameters at the
transport layer.
– At connection setup time, user may define
preferred, acceptable, and minimum values for
various service parameters.
– Transport layer determines whether it’s possible
to provide required service based on available
network service(s).
Transport-Layer QoS Parameters 1
Connection establishment delay: time to
establish connection.
 Connection establishment failure
probability: probability connection is not
established within maximum establishment
time.
 Throughput: bytes transferred per second
measured over a time interval.

Transport-Layer QoS Parameters 2
Transit delay: time between sending a message
and receiving it on the other side (measured by
the transport entities).
 Residual error ratio: ratio of messages in error
to total messages sent.
 Priority: way for user to indicate that some
connections are more important.
 Resilience: probability connection is
terminated due to congestion, etc.

Transport Layer QoS
Only few transport protocols provide QoS
parameters.
 Most just try to minimize residual error rate.
 QoS parameters specified by transport user
when connection is setup.

– Desired and minimum acceptable values can be
specified.
– Service negotiation.
Transport Service Primitives
Allow transport users (e.g., application
programs) to access transport service.
 Example: connection-oriented transport
service primitives.

PRIMITIVE | TPDU Sent | Meaning
LISTEN | (none) | listen for connection
CONNECT | Connection Req. | try to establish connection
SEND | DATA | send data
RECEIVE | (none) | waits for data
DISCONNECT | Disc. Req. | try to release connection
TPDU
Transport protocol data unit.
 Messages sent between transport entities.
 TPDUs contained in network-layer packets,
which in turn are contained in DLL frames.

Frame
header
Packet
header
TPDU
header
TPDU payload
Connection Management State Machine
[Figure: state diagram for client and server. States: Idle, Passive establishment pending, Active establishment pending, Established, Passive disconnect pending, Active disconnect pending. Transition labels: Connect executed, Connection req. received, Connection accept received, Disconnect executed, Disc. req. received, Disc. accept. received.]
Berkeley Sockets 1


Set of transport-level primitives made available by
Berkeley UNIX.
Server side:
» SOCKET: create new communication end point.
» BIND: attach local address to socket (once server binds address,
clients can connect to it).
» LISTEN: listen for connection.
» ACCEPT: accept new connection.
» SEND, RECEIVE: send and receive data.
» CLOSE: release connection.
Berkeley Sockets 2

Client side:
» SOCKET: create socket.
» CONNECT: try to establish connection.
» SEND, RECEIVE: send and receive data.
» CLOSE: release connection.
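The primitives map directly onto the socket API that Python's standard library exposes. The sketch below runs a one-shot server and a client in the same process over the loopback interface; the uppercase-echo behavior, the function names, and the message are invented for illustration.

```python
import socket
import threading


def serve_once(server: socket.socket) -> None:
    """Server side: ACCEPT one connection, RECEIVE, SEND, CLOSE."""
    conn, _peer = server.accept()        # ACCEPT
    with conn:                           # CLOSE on exit
        request = conn.recv(1024)        # RECEIVE
        conn.sendall(request.upper())    # SEND


def run_demo() -> bytes:
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # SOCKET
    server.bind(("127.0.0.1", 0))        # BIND (port 0: let the OS pick one)
    server.listen(1)                     # LISTEN
    port = server.getsockname()[1]

    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # SOCKET
    client.connect(("127.0.0.1", port))  # CONNECT
    client.sendall(b"time?")             # SEND
    reply = client.recv(1024)            # RECEIVE
    client.close()                       # CLOSE

    worker.join()
    server.close()
    return reply


if __name__ == "__main__":
    print(run_demo())
```

The server calls BIND and LISTEN before the client calls CONNECT, which is why the client can find it.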
Transport Protocol Issues: Addressing
Address of the transport-level entity.
 TSAP: transport service access point
(analogous to NSAP).

– Internet TSAP: (IP address, local port).
– Internet NSAP: IP address.
– There may be multiple TSAPs on one host.
– Typically, only one NSAP.
Example 1

Finding the time of day from a time-of-day
server.
– Time-of-day server process on host 2 attaches
itself to TSAP 122 and waits for requests (e.g.,
through LISTEN).
– Application process (TSAP 6) on host 1 wants
to find out the time-of-day; issues CONNECT
specifying TSAP 6 as source and TSAP 122 as
destination.
Finding Services 1

Well-known TSAP.
– Time-of-day server has been using TSAP 122 forever so
every user knows it.

Initial connection protocol: special process
server that proxies for less well-known
services.
– Process server listens to set of ports at the same time.
– Users CONNECT to a TSAP, and if there are no servers,
process server is likely to be listening. It then spawns
requested server.
Finding Services 2

Name or directory service.
– Name server listens to well-known TSAP.
– User sends service name and name server
responds with service’s TSAP.
– New services need to register with name server.

Finding the server’s network address.
– Hierarchical addresses solve this problem, i.e., the
NSAP is part of the TSAP.
Connection Establishment


CONNECTION REQUEST and CONNECTION
ACCEPTED TPDUs.
Problem: delayed duplicates.
– Duplicates can re-appear and be taken as the real
messages.

Solution: messages age and are discarded after some
time; ACKs need to be discarded too.
– Maximum hop count.
– Timestamp.
Avoiding Duplicates 1
2 identically numbered TPDUs are never
outstanding at the same time.
 Bounded packet lifetime.
 Each host has its clock.

– Clock as a counter that increments itself.
– #bits(counter)>= #bits(sequence number).
– Clocks don’t “crash”.
Avoiding Duplicates 2
At connection setup, low-order k bits of
clock used as initial sequence number.
 Each connection starts numbering its
TPDUs with different sequence number.
 Sequence number space needs to be large enough that,
by the time sequence numbers wrap around,
old TPDUs with same sequence numbers
have aged.
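A minimal sketch of the clock-based scheme above, assuming the clock is an integer tick counter. The function names and the example tick rates are illustrative assumptions, not values from the notes.

```python
def initial_sequence_number(clock_ticks, k):
    """Low-order k bits of the clock counter, used as initial sequence number."""
    return clock_ticks & ((1 << k) - 1)

def wraparound_seconds(k, ticks_per_second):
    """Time for a k-bit sequence space to wrap at one number per clock tick."""
    return (1 << k) / ticks_per_second

def is_safe(k, ticks_per_second, max_lifetime_seconds):
    """True if old TPDUs have aged out before their numbers can be reused."""
    return wraparound_seconds(k, ticks_per_second) > max_lifetime_seconds
```

For example, is_safe(32, 250000, 120) holds because a 32-bit space at 250,000 ticks per second takes far longer than 120 seconds to wrap, while an 8-bit space at one tick per second does not outlast a 300-second lifetime.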

Sequence Numbers versus Time
1
Seq.
#’s
. Linear relation between time
and initial sequence number.
Time
Sequence Numbers versus Time
2
Seq.
#’s
T
Forbidden
region
Time
. Host crash: when it comes
up, it doesn’t know where it
was in the sequence # space.
. Example: T=60 sec and
clock ticks once per second.
. At t=30s, TPDU on connection
5 gets seq.# 80.
. Host crashes and comes up.
. At t=60s, reopens connections 0~4.
. At t=70s, reopens connection 5 and at t=80s, sends TPDU 80.
. Old TPDU 80 is still valid, so the new one would look like a duplicate.
. To prevent this, check if it’s in the “forbidden region” and delay
sequence number.
Three-Way Handshake

Solves the problem of getting 2 sides to
agree on initial sequence number.
1
2
CR (seq=x)
ACK(seq=y,ACK=x)
DATA(seq=x, ACK=y)
CR: connection
request.
3-Way Handshake: Duplicates 1
2
1
*
CR(seq=x)
ACK(seq=y, ACK=x)
REJECT(ACK=y)
. Old duplicate CR.
. The ACK from host 2 tries
to verify if host 1 was trying to
open a new connection with
seq=x.
. Host 1 rejects host 2’s attempt
to establish.
Host 2 realizes it was a duplicate
CR and aborts connection.
3-Way Handshake: Duplicates 2
2
1
*
CR(seq=x)
ACK(seq=y, ACK=x)
DATA(seq=x,
ACK=z)
REJECT(ACK=y)
. Old duplicate CR and ACK
to connection accepted.
Connection Release

Asymmetric release: telephone system.
– When one party hangs up, connection breaks.
– May cause data loss.

Symmetric release:
– Treats connection as 2 separate unidirectional
connections.
– Requires each to be released separately.
Symmetric Release
How to determine when all data has been
sent and connection could be released?
 2-army problem:

Blue army 1
Blue army 2
. White army is larger
than either blue army.
White army
. Blue armies together are
larger.
. If either blue army attacks alone, it’ll be defeated. They win if they attack together.
2-Army Problem 1


To synchronize attack, they must use messengers that
need to cross valley: unreliable.
Is there a protocol that allows blue army to win? No.
– Blue army 1 sends message to blue army 2.
– Blue army 2 sends ACK back.
– Blue army 2 is not sure whether ACK was received.
2-Army Problem 2

Use 3-way handshake.
– Blue army 1 ACKs back but it’ll never know if
the ACK was received.

Applying to connection release:
– Neither side is prepared to disconnect until
convinced the other side is prepared to disconnect too.
– In practice, hosts are willing to take risks.
Connection Release Protocol
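[Figure: normal release between host 1 and host 2.]
– Host 1: send DR + start timer.
– Host 2: on DR, send DR + start timer.
– Host 1: on DR, release connection and send ACK.
– Host 2: on ACK, release connection.
DR: disconnection request.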
Connection Release Scenarios 1
[Figure: final ACK lost.]
– Host 1: send DR + start timer.
– Host 2: on DR, send DR + start timer.
– Host 1: on DR, release connection and send ACK (lost).
– Host 2: timeout: release connection.
DR: disconnection request.
Connection Release Scenarios 2
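[Figure: response DR lost.]
– Host 1: send DR + start timer.
– Host 2: on DR, send DR + start timer (this DR is lost).
– Host 1: timeout: send DR + start timer.
– Host 2: on DR, send DR + start timer.
– Host 1: on DR, send ACK and release connection; host 2 releases connection on ACK.
DR: disconnection request.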
The Internet Transport Protocols:
TCP and UDP

UDP: user datagram protocol (RFC 768).
– Connection-less protocol.

TCP: transmission control protocol (RFCs
793, 1122, 1323).
– Connection-oriented protocol.
UDP

Provides connection-less, unreliable service.
– No delivery guarantees.
– No ordering guarantees.
– No duplicate detection.

Low overhead.
– No connection establishment/teardown.

Suitable for short-lived exchanges.
– Example: client-server request-reply applications.
UDP Segment Format
0            15 16           31
Source port   | Destination port
Length        | Checksum
Data
Source and destination ports: identify the end points.
Length: 8-byte header+ data.
Checksum: optional; if not used, set to zero.
UDP Checksum
Computed over a pseudo-header+ UDP
header+data+padding (to even number of
bytes if needed).
 Pseudo-header:

0                             31
Source IP address
Destination IP address
00000000 | Protocol | Segment length
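A minimal sketch of the checksum computation described above: 16-bit one's complement sum over pseudo-header, UDP header, and data, padded to an even number of bytes. The function names are illustrative; protocol number 17 is UDP's IP protocol value.

```python
import socket
import struct

def ones_complement_sum(data: bytes) -> int:
    if len(data) % 2:
        data += b"\x00"                    # pad to an even number of bytes
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:                     # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total

def pseudo_header(src_ip: str, dst_ip: str, length: int) -> bytes:
    # Source address, destination address, zero byte, protocol (17), length.
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, length))

def build_udp_segment(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> bytes:
    length = 8 + len(payload)              # 8-byte header + data
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, length) + header + payload)
    checksum = (~total & 0xFFFF) or 0xFFFF  # zero is reserved for "not used"
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload

def verify_udp_segment(src_ip, dst_ip, segment: bytes) -> bool:
    # With a correct checksum in place, the sum over everything is all ones.
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    return total == 0xFFFF
```

The addresses and ports passed in are whatever the caller supplies; a receiver that gets a corrupted payload sees verify_udp_segment return False.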
TCP
Reliable end-to-end communication.
 TCP transport entity:

– Runs on machine that supports TCP.
– Interfaces to the IP layer.
– Manages TCP streams.
» Accepts user data, breaks it down and sends it as
separate IP datagrams.
» At receiver, reconstructs original byte stream from
IP datagrams.
TCP Reliability

Reliable delivery.
– ACKs.
– Timeouts and retransmissions.

Ordered delivery.
TCP Service Model 1

Obtained by creating TCP end points.
– Example: UNIX sockets.
– TSAP address: IP address + 16-bit port
number.
– Multiple connections can share the same end point; a connection is identified by its pair of end points.
– Port numbers below 1024: well-known ports
reserved for standard services.
» List of well-known ports in RFC 1700.
TCP Service Model 2
TCP connections are full-duplex and point-to-point.
 Byte stream (not message stream).

– Message boundaries are not preserved e2e.
[Figure: 4 512-byte segments (A, B, C, D) sent as separate IP datagrams; 2048 bytes of data (ABCD) delivered to application in a single READ.]
TCP Byte Stream
When application passes data to TCP, it
may send it immediately or buffer it.
 Sometimes application wants to send data
immediately.

– Example: interactive applications.
– Use PUSH flag to force transmission.

URGENT flag.
– Also forces TCP to transmit at once.
TCP Protocol Overview 1

TCP’s TPDU: segment.
– 20-byte header + options.
– Data.
– TCP entity decides the size of segment.
» 2 limits: 64KByte IP payload and MTU.
» Segments that are too large are fragmented.

More overhead by addition of IP header.
TCP Protocol Overview 2

Sequence numbers.
– Reliability, ordering, and flow control.
– Assigned to every byte.
– 32-bit sequence numbers.
TCP Segment Header
Source port | Destination port
Sequence number
Acknowledgment number
Header length | (unused) | U A P R S F | Window size
Checksum | Urgent pointer
Options (0 or more 32-bit words)
Data
TCP Header Fields 1
Source and destination ports identify
connection end points.
 Sequence number.
 Acknowledgment number specifies next byte
expected.
 TCP header length: how many 32-bit words
are contained in header.
 6-bit unused field.

TCP Header Fields 2

6 1-bit flags:
– URG: indicate urgent data present; urgent
pointer gives byte offset from current sequence
number where urgent data is.
– ACK: indicates whether segment contains
acknowledgment; if 0, acknowledgement
number field ignored.
– PUSH: indicates PUSHed data so receiver
delivers it to application immediately.
TCP Header Fields 3

Flags (cont’d):
– RST: used to reset connection, reject invalid
segment, or refuse to open connection.
– SYN: used to establish connection; connection
request, SYN=1, ACK=0.
– FIN: used to release connection.

Window size: how many bytes can be sent
starting at acknowledgment number.
TCP Header Fields 4
Checksum: checksums the
header+data+pseudo-header.
 Options: provide way to add extra
information.

– Examples:
» Maximum payload host is willing to accept; can be
advertised during connection setup.
» Window scale factor that allows sender and receiver
to negotiate larger window sizes.
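A minimal sketch of decoding the fixed 20-byte header described above with Python's struct module. The TcpHeader tuple and parse_tcp_header name are illustrative; options and data are ignored here.

```python
import struct
from collections import namedtuple

TcpHeader = namedtuple(
    "TcpHeader",
    "src_port dst_port seq ack header_len flags window checksum urgent",
)

# Flag bits from least significant upward.
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]

def parse_tcp_header(segment: bytes) -> TcpHeader:
    (src, dst, seq, ack, off_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (off_flags >> 12) * 4     # length field counts 32-bit words
    flags = {name for bit, name in enumerate(FLAG_NAMES) if off_flags & (1 << bit)}
    return TcpHeader(src, dst, seq, ack, header_len, flags, window, checksum, urgent)
```

A connection request shows up as flags == {"SYN"}; the reply carries both SYN and ACK.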
TCP Connection Setup

3-way handshake.
Host 1
SYN (SEQ=x)
SYN(SEQ=y,ACK=x+1)
(SEQ=x+1, ACK=y+1)
Host 2
TCP Connection Release 1

Abrupt release:
– Send RESET.
– May cause data loss.
TCP Connection Release 2

Graceful release:
– Each side of the connection released
independently.
» Either side sends TCP segment with FIN=1.
» When FIN acknowledged, that direction is shut down for data.
» Connection released when both sides shut down.
– Normally 4 segments: 1 FIN and 1 ACK for each direction;
the 1st ACK and the 2nd FIN may be combined, giving 3 segments.

TCP Connection Release 3

Timers to avoid 2-army problem.
– If response to FIN not received within 2*MSL,
FIN sender releases connection.

After connection released, TCP waits for
2*MSL (e.g., 120 sec) to ensure all old
segments have aged.
TCP Transmission 1
Sender process initiates connection.
 Once connection established, TCP can start
sending data.
 Sender writes bytes to TCP stream.
 TCP sender breaks byte stream into
segments.

– Each byte assigned sequence number.
– Segment sent and timer started.
TCP Transmission 2

If timer expires, retransmit segment.
– After retransmitting segment for maximum
number of times, assumes connection is dead and
closes it.
If user aborts connection, sending TCP flushes
its buffers and sends RESET segment.
 Receiving TCP decides when to pass received
data to upper layer.

TCP Flow Control

Sliding window.
– Receiver’s advertised window.
» Size of advertised window related to receiver’s
buffer space.
» Sender can send data up to receiver’s advertised
window.
TCP Flow Control: Example
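[Figure: sender/receiver timeline; receiver has a 4K buffer, initially empty.]
– App. writes 2K of data; sender sends 2K; SEQ=0. Receiver buffer: 2K free.
– Receiver sends ACK=2048; WIN=2048.
– App. does 3K write; sender sends 2K; SEQ=2048. Receiver buffer: 0 free (full).
– Receiver sends ACK=4096; WIN=0. Sender blocked.
– App. reads 2K of data; receiver sends ACK=4096; WIN=2048. Sender may send up to 2K.
– Sender sends 1K; SEQ=4096. Receiver buffer: 1K free.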
TCP Flow Control: Observations

TCP sender not required to transmit data as
soon as it comes in from the application.
– Example: when first 2KB of data comes in,
could wait for more data since window is 4KB.

Receiver not required to send ACKs as
soon as possible.
– Wait for data so ACK is piggybacked.
Delayed ACKs



Tries to optimize ACK transmission.
Delay ACKs and window update (500msec)
hoping to piggyback on data segment.
Example: telnet to interactive editor:
– Send 1 character at a time: 20-byte TCP header + 1-byte data + 20-byte IP header = 41-byte datagram.
– Receiver ACKs immediately: 40-byte ACK.
– When editor reads character, window update: 40-byte
datagram.
– Then echoes character back: 41-byte datagram.
Nagle’s Algorithm
Tries to optimize sending of small data
chunks.
 Example: telnet to interactive editor.

– Send first byte and buffer the rest until
outstanding byte is ACKed; then send all buffered
data in one segment; buffer until next ACK.

Disabled in some cases (e.g., windowing
applications, where mouse movements must be sent promptly).
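A minimal sketch of the buffering rule above: send the first data right away, hold the rest until the outstanding data is ACKed, then send everything buffered in one segment. The NagleSender class is illustrative and simplified; it ignores segment-size and window limits.

```python
class NagleSender:
    def __init__(self):
        self.buffer = b""        # data written by the application, not yet sent
        self.unacked = False     # True while a segment is outstanding
        self.sent = []           # segments put on the wire, in order

    def write(self, data: bytes):
        self.buffer += data
        self._try_send()

    def ack(self):
        self.unacked = False
        self._try_send()

    def _try_send(self):
        # Only send when nothing is outstanding; otherwise keep buffering.
        if self.buffer and not self.unacked:
            self.sent.append(self.buffer)
            self.buffer = b""
            self.unacked = True
```

Typing "hello" one byte at a time before the first ACK arrives yields two segments, b"h" and b"ello", instead of five.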
Silly Window Syndrome

Caused by receiver sending window updates of very
small values.
– Example:
» Receiver application reads 1 byte at a time and receiver TCP sends
1-byte window update.
» Sender TCP has large blocks to send but can only send 1 byte at a
time.

Solution: [Clark] prevent receiver from generating
small window advertisements; also, sender can wait.
Congestion Control

Why do it at the transport layer?
– Real fix to congestion is to slow down sender.

Use law of “conservation of packets”.
– Keep number of packets in the network
constant.
– Don’t inject new packet until old one leaves.

Congestion indicator: packet loss.
TCP Congestion Control 1

Like flow control, also window-based.
– Sender keeps congestion window (cwin).
– Each sender keeps 2 windows: receiver’s
advertised window and congestion window.
– Number of bytes that may be sent is
min(advertised window, cwin).
TCP Congestion Control 2

Slow start [Jacobson 1988]:
– Connection’s congestion window starts at 1
segment.
– If segment ACKed before time out,
cwin=cwin+1.
– As ACKs come in, cwin is increased by 1 segment
per ACK.
– Exponential increase: cwin doubles every round-trip time.
TCP Congestion Control 3

Congestion Avoidance:
– Third parameter: threshold.
– Initially set to 64KB.
– If timeout, threshold=cwin/2 and cwin=1.
– Re-enters slow-start until cwin=threshold.
– Then, cwin grows linearly until it reaches
receiver’s advertised window.
TCP Congestion Control:
Example
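A minimal sketch of the window behavior described on the preceding slides, counting in whole segments and one step per round trip. The function names, the per-round-trip granularity, and the example numbers are assumptions made for illustration.

```python
def send_limit(advertised_window, cwin):
    """Amount that may be sent: min(advertised window, cwin)."""
    return min(advertised_window, cwin)

def simulate_cwin(rounds, threshold, advertised_window, timeout_rounds=()):
    """Return cwin (in segments) at the start of each round trip."""
    cwin = 1                                   # slow start begins at 1 segment
    history = []
    for r in range(rounds):
        history.append(cwin)
        if r in timeout_rounds:
            threshold = max(cwin // 2, 1)      # threshold = cwin/2
            cwin = 1                           # re-enter slow start
        elif cwin < threshold:
            # Slow start: one increment per ACK doubles cwin each round trip.
            cwin = min(cwin * 2, threshold, advertised_window)
        else:
            # Congestion avoidance: linear growth up to the advertised window.
            cwin = min(cwin + 1, advertised_window)
    return history
```

With threshold 16, advertised window 24, and a timeout in round 7, simulate_cwin(12, 16, 24, timeout_rounds={7}) gives 1, 2, 4, 8, 16, 17, 18, 19, then drops back to 1 and climbs 2, 4, 8 toward the new threshold of 9.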
TCP Retransmission Timer

When segment sent, retransmission timer
starts.
– If segment ACKed, timer stops.
– If time out, segment retransmitted and timer
starts again.
How to set timer?
Based on round-trip time: time between sending a
segment and receiving its ACK.
 If timer is too short, unnecessary
retransmissions.
 If timer is too long, long retransmission
delay.

Jacobson’s Algorithm 1

Determining the round-trip time:
– TCP keeps RTT variable.
– When segment sent, TCP measures how long it
takes to get ACK back (M).
– RTT = alpha*RTT + (1-alpha)M.
– alpha: smoothing factor; determines weight
given to previous estimate.
– Typically, alpha=7/8.
Jacobson’s Algorithm 2

Determining timeout value:
– Measure RTT variation, or |RTT-M|.
– Keeps smoothed value of cumulative variation
D=alpha*D+(1-alpha)|RTT-M|.
– Alpha may or may not be the same as value
used to smooth RTT.
– Timeout = RTT+4*D.
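A minimal sketch of the two smoothing formulas above. Assumptions not stated in the notes: the estimate starts from the first measurement with D = 0, the same alpha is used for both, and |RTT-M| uses the RTT estimate from before the new sample is folded in. The class name is illustrative.

```python
class RttEstimator:
    def __init__(self, first_sample, alpha=7 / 8):
        self.alpha = alpha
        self.rtt = first_sample      # smoothed round-trip time
        self.dev = 0.0               # smoothed deviation D

    def update(self, m):
        """Fold in a new measurement M and return the new timeout."""
        # D = alpha*D + (1-alpha)*|RTT-M|, using the previous RTT estimate.
        self.dev = self.alpha * self.dev + (1 - self.alpha) * abs(self.rtt - m)
        # RTT = alpha*RTT + (1-alpha)*M
        self.rtt = self.alpha * self.rtt + (1 - self.alpha) * m
        return self.timeout()

    def timeout(self):
        return self.rtt + 4 * self.dev   # Timeout = RTT + 4*D
```

Starting at 100 ms, a single 180 ms sample moves RTT to 110 ms and D to 10 ms, so the timeout becomes 150 ms.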
Karn’s Algorithm

How to measure RTT when a segment has been
retransmitted?
– Does the ACK count for the first or the second transmission?
– Karn proposed not to update RTT on any
retransmitted segment.
– Instead, the timeout is doubled on each failure until
segments get through.
Persistence Timer
Prevents deadlock if a window update
packet is lost and advertised window = 0.
 When persistence timer goes off, sender
probes receiver; receiver replies with its
current advertised window.
 If 0, persistence timer is set again.

Keepalive Timer
Goes off when a connection is idle for a
long time.
 Causes one side to check whether the other
side is still alive.
 If no answer, connection terminated.

TIME_WAIT
2*MSL.
 Makes sure all segments die after
connection is closed.

Wireless TCP 1
According to layered system design
principles, transport protocol should be
independent of underlying technology.
 However, wireless networks invalidate this
principle.

– Ignoring properties of wireless medium can
lead to poor TCP performance.
– Problem: TCP’s congestion control.
Wireless TCP 2

Problem: packet loss as congestion
indicator.
– When retransmission timer times out, sender
slows down.

Wireless links are lossy!
– The right response to losses in this case is to resend lost segments as soon as possible, not to slow down.
Indirect TCP (I-TCP)
 [Bakre and Badrinath, 1995].

Split TCP connection in 2: one from sender to base
station and the other from base station to receiver.
– Base station serves as “repeater”: copies segments
between connections in both directions.
– Each connection is homogeneous; timeouts on the 1st
connection slow down sender.
– Problem: violates TCP’s e2e’ness.

Example: ACKs to sender mean base station received segments, not
necessarily receiver.
Snoop TCP
 [Balakrishnan et al., 1995].


Does not break connection.
Modifications to base station’s network layer code.
– Snooping agent on base station observes and caches TCP
segments sent to mobile host and ACKs coming back.
– If it doesn’t see an ACK for a segment or sees duplicate
ACKs, it times out and retransmits.
– But source may time out anyway.
End-To-End Argument
Design principle to help guide placement of
functionality in distributed systems.
 Rationale for moving functions upward
closer to application.

Where to place distributed
systems functions?

Layered system design:
– Different levels of abstraction for simplicity.
– Lower layer provides service to upper layer.
– Very well defined interfaces.

Some functions can be implemented at
different layers or even at multiple layers.
E2E Argument Statement
“The function in question can completely and
correctly be implemented only with the
knowledge and help of the application at the
endpoints. Therefore providing that function
in the communication system itself is not
possible. Sometimes an incomplete version
of the function provided by the
communication system may be useful as
performance enhancement.”
Functions Closer to Application


E2E argument paper argues that functions should be
moved closer to the application that uses them.
Rationale:
– Some functions can only be completely and correctly
implemented with app’s knowledge.
» Example: file transfer.
» If error occurs in the network, network reliability can fix it.
» Otherwise, only application can.
Another perspective: Cost

Why pay for something you don’t need?
» Example 1: the Internet.
» Example 2: trend in kernel design - take away from
kernel as much functionality as possible.

Applications that don’t need certain
functions should not have to pay for them.
E2E Counter Argument

Performance!
– Example: File transfer
» Reliability checks at lower layers detect problems
earlier.
» Abort transfer and re-try without having to wait till
whole file is transmitted.

“Spread out” functionality across layers.
Domain Name System (DNS)
Basic function: translation of names (ASCII
strings) to network (IP) addresses and vice versa.
 Example:

– zephyr.isi.edu <-> 128.9.160.160
History

Original approach (ARPANET, 1970’s):
– File hosts.txt listed all hosts and their IP addresses.
– Every night every host fetches file from central
repository.
– OK for a few hundred hosts.
– Scalability?
» File size.
» Centrally managed.
DNS
Hierarchical name space.
 Distributed database.
 RFCs 1034 and 1035.

How is it used?

Client-server model.
– Client DNS (running on client hosts), or
resolver.
– Application calls resolver with name.
– Resolver contacts local DNS server (using
UDP) passing the name.
– Server returns corresponding IP address.
DNS Name Space

Tree-based hierarchy.
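[Figure: DNS tree. Top-level domains: int, com, edu, gov, mil, org, net, us, ca, … Second-level examples: ibm (under com), usc (under edu). Lower-level labels shown: eng, sales, cs, ee.]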
Name Space Structure

Top-level domains:
– Generic.
– Countries.
Leaf domains: no sub-domains.
 In practice, nearly all US organizations are under a
generic domain, while nearly everything outside
the US is under the corresponding country
domain.

DNS Names

Domain names:
– Concatenation of all domain names starting from
its own all the way to the root separated by “.”.
– Refers to a tree node and all names under it.
– Case insensitive.
– Components up to 63 characters.
– Full name up to 255 characters.
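A minimal sketch of the naming rules above: labels separated by ".", each at most 63 characters, case-insensitive comparison, and a bounded total length. The length-prefixed encoding follows the message format of RFC 1035; the function names are illustrative, and the root name alone is not handled.

```python
def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:
            raise ValueError(f"label must be 1 to 63 characters: {label!r}")
        out += bytes([len(raw)]) + raw
    out += b"\x00"                      # zero-length label marks the root
    if len(out) > 255:
        raise ValueError("encoded name exceeds 255 bytes")
    return out

def same_name(a: str, b: str) -> bool:
    """Domain names compare case-insensitively; a trailing dot is ignored."""
    return a.rstrip(".").lower() == b.rstrip(".").lower()
```

For instance, zephyr.isi.edu becomes the bytes 6 "zephyr" 3 "isi" 3 "edu" 0.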
Name Space Management

Domains are autonomous.
– Organizational boundaries.
– Each domain manages its own name space
independently of other domains.

Delegation:
– When creating new domain: register with parent
domain.
» For name uniqueness.
» For name resolution.
Resource Records





Entry in the DNS database.
Several types of entries or RRs.
Example: RR “A” contains IP address.
Name <-> several resource records.
RR format: five-tuple.
– Name.
– TTL (in seconds).
– Class (usually “IN” for Internet info).
– Type: type of RR.
– Value.
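A minimal sketch of the five-tuple as a data structure. It assumes a whitespace-separated text line with the fields in the order listed above; real zone files may order or omit fields differently. The names ResourceRecord and parse_rr are illustrative.

```python
from collections import namedtuple

ResourceRecord = namedtuple("ResourceRecord", "name ttl rr_class rr_type value")

def parse_rr(line: str) -> ResourceRecord:
    # Split into at most five fields so the value may itself contain spaces.
    name, ttl, rr_class, rr_type, value = line.split(None, 4)
    return ResourceRecord(name, int(ttl), rr_class, rr_type, value)
```

Applied to "zephyr.isi.edu 86400 IN A 128.9.160.160", this yields an A record with a TTL of 86400 seconds.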
RR Types 1

SOA: start of authority.
– Marks beginning of zone’s database.
– Provides general info about the zone: e-mail
address of admin, default TTL, etc.

A: address.
– Contains 32-bit IP address.
– Single name <-> several A RRs.

MX: mail exchange.
– Name of mail server for this domain.
RR Types 2

NS: name server.
– Name of name server for this domain.

CNAME: canonical name.
– Alias.

HINFO: host description.
– Provides information about host, e.g., CPU type, OS,
etc.

TXT: arbitrary string of characters.
– Generic description of the domain, where it is located,
etc.
Name Servers

Entire database in a single name server.
– Practical?
– Why?
DNS database is partitioned into zones.
 Each zone contains part of the DNS tree.
 Zone <-> name server.

– Each zone may be served by more than 1 server.
– A server may serve multiple zones.

Primary and secondary name servers.
Name Resolution 1


Application wants to resolve name.
Resolver sends query to local name server.
– Resolver configured with list of local name servers.
– Select servers in round-robin fashion.

If name is local, local name server returns matching
authoritative RRs.
– Authoritative RR comes from authority managing the RR
and is always correct.
– Cached RRs may be out of date.
Name Resolution 2

If information not available locally (not
even cached), local NS will have to ask
someone else.
– It asks the server of the top-level domain of the
name requested.
Recursive Resolution

Recursive query:
– Each server that doesn’t have info forwards it to
someone else.
– Response finds its way back.

Alternative (iterative query):
– Name server not able to resolve query, sends back
the name of the next server to try.
– Some servers use this method.
– More control for clients.
Example

Suppose resolver on flits.cs.vu.nl wants to resolve
linda.cs.yale.edu.
– Local NS, cs.vu.nl, gets queried but cannot resolve it.
– It then contacts .edu server.
– .edu server forwards query to yale.edu server.
– yale.edu contacts cs.yale.edu, which has the authoritative
RR.
– Response finds its way back to originator.
– cs.vu.nl caches this info.
» Not authoritative (since may be out-of-date).
» RR TTL determines how long RR should be cached.
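A minimal sketch of the recursive lookup in this example, as a toy in-memory model with no real network traffic. The NameServer class, the address 192.0.2.10, and the 3600-second TTL are made up for illustration; in this sketch every server on the path caches the answer, which is a simplification.

```python
class NameServer:
    def __init__(self, zone, records=None, delegations=None):
        self.zone = zone
        self.records = records or {}          # name -> (address, ttl): authoritative
        self.delegations = delegations or {}  # zone suffix -> NameServer to ask
        self.cache = {}                       # name -> (address, expiry time)

    def resolve(self, name, now=0, trace=None):
        """Return (address, remaining ttl, authoritative?) for name."""
        if trace is not None:
            trace.append(self.zone)           # record which servers were asked
        if name in self.records:
            address, ttl = self.records[name]
            return address, ttl, True
        cached = self.cache.get(name)
        if cached and cached[1] > now:        # cached RR still within its TTL
            return cached[0], cached[1] - now, False
        for suffix, server in self.delegations.items():
            if name == suffix or name.endswith("." + suffix):
                address, ttl, authoritative = server.resolve(name, now, trace)
                self.cache[name] = (address, now + ttl)
                return address, ttl, authoritative
        raise KeyError(name)

def build_example():
    cs_yale = NameServer("cs.yale.edu", records={"linda.cs.yale.edu": ("192.0.2.10", 3600)})
    yale = NameServer("yale.edu", delegations={"cs.yale.edu": cs_yale})
    edu = NameServer("edu", delegations={"yale.edu": yale})
    return NameServer("cs.vu.nl", delegations={"edu": edu})
```

The first lookup of linda.cs.yale.edu walks cs.vu.nl, edu, yale.edu, cs.yale.edu; a repeat within the TTL is answered from the cs.vu.nl cache and flagged as not authoritative.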