Introduction to Distributed Systems and Networking

Download Report

Transcript Introduction to Distributed Systems and Networking

Introduction to Networking
Internet: Example
• Click -> get page
• Specifies
- protocol (http)
- location
(www.cnn.com)
Internet: Locating Resource
• www.cnn.com
– name of a computer
– Implicitly also a file
• Map name to IP
address
– DNS
cnn.com?
cnn.com?
host
com
local
a.b.c.d
a.b.c.d
Internet: Connection
• Http sets up a connection (tcp)
– between the host and cnn.com to transfer the page
• The connection transfers page as a byte stream
– without errors: flow control + error control
Host
www.cnn.com
Page; close
Internet: End-to-end
• Byte stream flows end to end across many links/switches:
– routing (+ addressing)
• That stream is regulated and controlled by both ends:
– retransmission of erroneous or missing bytes; flow control
end-to-end pacing and
error control
routing
HOST
CNN.COM
Internet: Packets
• The network transports bytes grouped into packets
• Packets are “self-contained”; routers handle them 1 by 1
• The end hosts worry about errors and pacing
– Destination sends ACKs; Source checks losses
A | B | # , CRC | bytes
CNN.COM: A
HOST: B
C
B: to
C
Internet: Bits
• Equipment in each node sends packets as string of bits
• That equipment is not aware of the meaning of the bits
• Frames (packetizing) vs. streams
01011...011...110
01011...011...110
Transmitter
Physical Medium
Optical
Copper
Wireless
Receiver
Internet: Points to remember
• Separation of tasks
– send bits on a link: transmitter/receiver [clock, modulation,…]
– send packet on each hop [framing, error detection,…]
– send packet end to end [addressing, routing]
– pace transmissions [detect congestion]
– retransmit erroneous or missing packets [acks, timeout]
– find destination address from name [DNS]
• Scalability
– routers don’t know full path
– names and addresses are hierarchical
Internet : Challenges
•
•
•
•
•
•
Addressing ?
Routing ?
Reliable transmission ?
Interoperability ?
Resource management ?
Quality of service ?
Concepts at heart of the
Internet
•
•
•
•
•
Protocol
Layered Architecture
Packet Switching
Distributed Control
Open System
Protocol
• Two communicating entities must agree on:
– Expected order and meaning of messages they exchange
– The action to perform on sending/receiving a message
• Asking the time
Layered Architectures
• Human beings can handle lots of complexity in their
protocol processing.
– Ambiguously defined protocols
– Many protocols all at once
• How computers manage complex protocol
processing?
– Specify well defined protocols to enact.
– Decompose complicated jobs into layers;
• each has a well defined task
Layered Architectures
• Break-up design problem into smaller problems
– More manageable
• Modular design: easy to extend/modify.
• Difficult to implement
– careful with interaction of layers for efficiency
Layered Architecture
network
users
Applications
Web, e-mail, file transfer, ...
Middleware
Reliable/ordered transmission, QOS,
security, compression, ...
Routing
Physical Links
End-to-end transmission,
resource allocation, routing, ...
Point-to-point links,
LANs, radios, ...
The OSI Model
• Open Systems Interconnect model is a standard way of
understanding conceptual layers of network comm.
• This is a model, nobody builds systems like this.
• Each level provides certain functions and guarantees, and
communicates with the same level on remote notes.
• A message is generated at the highest level, and is passed
down the levels, encapsulated by lower levels, until it is sent
over the wire.
• On the destination, it makes its way up the layers,until the highlevel msg reaches its high-level destination.
OSI Levels
Node A Application
Application
Presentation
Presentation
Transport
Transport
Network
Network
Data Link
Data Link
Physical
Physical
Network
Node B
OSI Levels
•
•
•
•
Physical Layer: electrical details of bits on the wire
Data Link: sending “frames” of bits and error detection
Network Layer:” routing packets to the destination
Transport Layer: reliable transmission of messages,
disassembly/assembly, ordering, retransmission of lost packets
• Session Layer; really part of transport, typ. Not impl.
• Presentation Layer: data representation in the message
• Application: high-level protocols (mail, ftp, etc.)
Internet protocol stack
network
users
Application
HTTP, SMTP, FTP, TELNET, DNS, …
Transport
TCP, UDP.
Network
IP
Physical
Point-to-point links,
LANs, radios, ...
Air travel
Passenger Origin
Passenger Destination
Ticket (purchase)
Ticket (complain)
Baggage (check)
Baggage (claim)
Gates (load)
Gates (unload)
Runway (take off)
Runway (landing)
Airplane routing
Protocol stack
user X
English
user Y
e-mail client
SMTP
e-mail server
TCP server
TCP
TCP server
IP server
ethernet
driver/card
IP
IEEE 802.3 standard
electric signals
IP server
ethernet
driver/card
Protocol interfaces
user X
user Y
e-mail client
TCP server
e-mail server
s = open_socket();
socket_write(s, buffer);
…
TCP server
IP server
IP server
ethernet
driver/card
ethernet
driver/card
Addressing
• Each network interface has a hardware address
– Multiple interfaces  multiple addresses
• Each application communicates via a port
– Port is a logical connection endpoint
– Allows multiple local applications to use network resources
– Up to 65535
• < 1024 : used by privileged applications
• 1024 ≤ available for use ≤ 49151
• 49152 ≤ Dynamic ports/private ports ≤ 65535
– http ports 80 and 8080
– telnet 23, ftp 21, etc
• Think of a telephone network …
Addressing and Packet Format
• The ``Data'' segment contains higher level
protocol information.
Start (7 bytes)
– Which protocol is this packet destined for?
– Which process is the packet destined for?
– Which packet is this in a sequence of
packets?
Destination (6)
Source (6)
Length (2)
– What kind of packet is this?
• This is the stuff of the OSI reference model.
Msg Data (1500)
Checksum (4)
Ethernet packet dispatching
•
•
•
•
•
•
•
•
•
An incoming packet comes into the Ethernet controller.
The Ethernet controller reads it off the network into a buffer.
It interrupts the CPU.
A network interrupt handler reads the packet out of the controller into
memory.
A dispatch routine looks at the Data part and hands it to a higher level
protocol
The higher level protocol copies it out into user space.
A program manipulates the data.
The output path is similar.
Consider what happens when you send mail.
Hi Dad.
Example: Mail
To: Dad
Hi Dad.
SrcAddr: 128.95.1.2
DestAddr: 128.95.1.3
SrcPort: 110,
DestPort: 110Bytes: 1-20
Hi Dad.
To: Dad
Mail Composition And Display
User
Mail Transport Layer
Kernel
To: Dad
Hi Dad.
SrcAddr: 128.95.1.2
DestAddr: 128.95.1.3
SrcPort: 110,
DestPort: 110Bytes: 1-20
To: Dad
Network Transport Layer
Hi Dad.
SrcEther: 0xdeadbeef
DestEther: 0xfeedface
SrcAddr: 128.95.1.2
DestAddr: 128.95.1.3
SrcPort: 100
DestPort: 200Bytes: 1-20
Hi Dad.
Link Layer
To: Dad
Hi Dad.
SrcEther: 0xdeadbeef
DestEther: 0xfeedface
SrcAddr: 128.95.1.2
DestAddr: 128.95.1.3
SrcPort: 100
DestPort: 200Bytes: 1-20
To: Dad
Network
Hi Dad.
Protocol encapsulation
user X
“Hello”
user Y
e-mail client
“Hello”
e-mail server
TCP server
“Hello”
TCP server
IP server
“Hello”
IP server
“Hello”
ethernet
driver/card
ethernet
driver/card
End-to-End Argument
• What function to implement in each layer?
• Saltzer, Reed, Clarke 1984
– A function can be correctly and completely implemented only with
the knowledge and help of applications standing at the
communication endpoints
– Argues for moving function upward in a layered architecture
• Should the network guarantee packet delivery ?
– Think about a file transfer program
– Read file from disk, send it, the receiver reads packets and writes
them to the disk
End-to-End Argument
• If the network guaranteed packet delivery
– one might think that the applications would be simpler
• No need to worry about retransmits
– But need to check that file was written to the remote disk
intact
• A check is necessary if nodes can fail
– Consequently, applications need to perform their retransmits
• No need to burden the internals of the network with
properties that can, and must, be implemented at the
periphery
End-to-End Argument
• An Occam’s razor for Internet design
– If there is a problem, the simplest explanation is probably the
correct one
• Application-specific properties are best provided by
the applications, not the network
– Guaranteed, or ordered, packet delivery, duplicate
suppression, security, etc.
• The internet performs the simplest packet routing and
delivery service it can
– Packets are sent on a best-effort basis
– Higher-level applications do the rest
Two ways to handle
networking
• Circuit Switching
– What you get when you make a phone call
– Dedicated circuit per call
• Packet Switching
– What you get when you send a bunch of letters
– Network bandwidth consumed only when sending
– Packets are routed independently
Circuit Switching
• End-to-end resources reserved for “call”
–
–
–
–
Link bandwidth, switch capacity
Dedicated resources: no sharing
Circuit-like (guaranteed) performance
Call setup required
Packet Switching
• Each end-to-end data stream divided into packets
– User’s packets share network resources
• Compared to dedicated allocation
– Each packet uses full link bandwidth
• Compared to dividing bandwidth into pieces
– Resources are used as needed
• Compared to resource reservation
• Resource contention:
– Aggregate demand can exceed amount available
– Congestion: packets queue, wait for link use
– Store and forward: packets move one hop at a time
• Transmit over link
• Wait turn at next link
Routing
• Goal: move data among routers from source to dest.
• Datagram packet network:
–
–
–
–
Destination address determines next hop
Routes may change during session
Analogy: driving, asking directions
No notion of call state
• Circuit-switched network:
– Call allocated time slots of bandwidth at each link
– Fixed path (for call) determined at call setup
– Switches maintain lots of per call state: resource allocation
Packet vs. Circuit Switching
• Reliability: no congestion, in-order data in circuit-switch
• Packet switching: better bandwidth use
• State, resources: packet switching has less state
– Good: less control plane processing resources along the way
– More data plane (address lookup) processing
• Failure modes (routers/links down)
– Packet switch reconfigures sub-second timescale
– Circuit switching: more complicated
• Involves all switches in the path
A small Internet
W
w,e5
V
Scenario:
A wants to send data to B.
R
r3
r1,e1
a,e3
A
r2,e2
b,e4
B
Packet forwarding
Host A
Host B
Router R
Router W
HTTP
HTTP
TCP
TCP
IP
IP
ethernet
eth
IP
link
link
IP
eth
ethernet
The Link Layer
What is purpose of this layer?
• Physically encode bits on the wire
• Link = pipe to send information
– E.g. point to point or broadcast
• Can be built out of:
– Twisted pair, coaxial cable, optical fiber, radio waves, etc
• Links should only be able to send data
– Could corrupt, lose, reorder, duplicate, (fail in other ways)
How to connect
routers/machines?
• WAN/Router Connections
– Commercial:
•
•
•
•
•
T1 (1.5 Mbps), T3 (44 Mbps)
OC1 (51 Mbps), OC3 (155 Mbps)
ISDN (64 Kbps)
Frame Relay (1-100 Mbps, usually 1.5 Mbps)
ATM (some Gbps)
– To your home:
• DSL
• Cable
• Local Area:
– Ethernet: IEEE 802.3 (10 Mbps, 100 Mbps, 1 Gbps, 10Gbps)
– Wireless: IEEE 802.11 b/g/a (11 Mbps, 22 Mbps, 54 Mbps)
Link level Issues
•
•
•
•
Encoding: map bits to analog signals
Framing: Group bits into frames (packets)
Arbitration: multiple senders, one resource
Addressing: multiple receivers, one wire
Encoding
• Map 1s and 0s to electric signals
• Simple scheme: Non-Return to Zero (NRZ)
– 0 = low voltage, 1 = high voltage
1
0
1
1
0
• Problems:
– How to tell an error? When jammed? When is bus idle?
– When to sample? Clock recovery is difficult.
• Idea: Recover clock using encoding transitions
Manchester Encoding
• Used by Ethernet
• Idea: Map 0 to low-to-high transition, 1 to high-to-low
0
1
1
0
• Plusses: can detect dead-link, can recover clock
• Bad: reduce bandwidth, i.e. bit rate = ½ baud rate
– If wire can do X transition per second?
Framing
• Why send packets?
– Error control
• How do you know when to stop reading?
– Sentinel approach: send start and end sequence
– For example, if sentinel is 11111
– 11111 00101001111100 11111 10101001 11111 010011 11111
– What if sentinel appears in the data?
• map sentinel to something else, receiver maps it back
– Bit stuffing
Example: HDLC
• Same sentinel for begin and end: 0111 1110
• packet format:
0111 1110
header
data
CRC
• Bit stuffing
– Sender: If 5 1s then insert a 0
0111 1110
0111 1101 0
– Receiver: if 5 1s followed by a 0, remove 0
0111 1101 0
• Else read next bit
0111 1110
• Packet size now depends on the contents
0111 1110
Arbitration
• One medium, multiple senders
– What did we do for CPU, memory, readers/writers?
– New Problem: No centralized control
• Approaches
– TDMA: Time Division Multiple Access
• Divide time into slots, round robin among senders
• If you exceed the capacity  do not admit more (busy signal)
– FDMA: Frequency Division Multiple Access (AMPS)
• Divide spectrum into channels, give each sender a channel
• If no more channels available, give a busy signal
– Good for continuous streams: fixed delay, constant data rate
– Bad for bursty Internet traffic: idle slots
Ethernet
• Developed in 1976, Metcalfe and Boggs at Xerox
• Uses CSMA/CD:
– Carrier Sense Multiple Access with Collision Detection
• Easy way to connect LANs
Metcalfe’s Ethernet sketch
CSMA/CD
• Carrier Sense:
– Listen before you speak
• Multiple Access:
– Multiple hosts can access the network
• Collision Detection:
– Can make out if someone else started speaking
Older Ethernet Frame
CSMA
Wait
until
carrier
free
CSMA/CA
Garbled signals
If the sender detects a collision, it will stop and then retry!
What is the problem?
CSMA/CD
Packet?
No
Sense
Carrier
Send
Detect
Collision
Yes
Discard
Packet
attempts < 16
attempts == 16
Jam channel
b=CalcBackoff();
wait(b);
attempts++;
Ethernet’s CSMA/CD (more)
Jam Signal: make sure all other transmitters are aware of collision;
48 bits;
Exponential Backoff:
• Goal: adapt retransmission attempts to estimated current load
– heavy load: random wait will be longer
• first collision: choose K from {0,1}; delay is K x 512 bit
transmission times
• after second collision: choose K from {0,1,2,3}…
• after ten or more collisions, choose K from {0,1,2,3,4,…,1023}
Packet Size
If packets are too small, the collision goes unnoticed
 Limit packet size
 Limit network diameter
 Use CRC to check frame integrity
 truncated packets are filtered out
Ethernet Problems
• What if there is a malicious user?
– Might not use exponential backoff
– Might listen promiscuously to packets
• Integrating Fast and Gigabit Ethernet
Addressing & ARP
128.84.96.89
128.84.96.90
128.84.96.91
“What is the physical
address of the host
named 128.84.96.89”
“I’m at 1a:34:2c:9a:de:cc”
• ARP is used to discover physical addresses
• ARP = Address Resolution Protocol
Addressing & RARP
???
128.84.96.90
RARP Server
128.84.96.91
“I just got here. My
physical address is
1a:34:2c:9a:de:cc.
What’s my name ?”
“Your name is
128.84.96.89”
• RARP is used to discover virtual addresses
• RARP = Reverse Address Resolution Protocol
Repeaters and Bridges
• Both connect LAN segments
• Usually do not originate data
• Repeaters (Hubs): physical layer devices
– forward packets on all LAN segments
– Useful for increasing range
– Increases contention
• Bridges: link layer devices
– Forward packets only if meant on that segment
– Isolates congestion
– More expensive
Backbone Bridge
The Network Layer
Purpose of Network layer
• Given a packet, send it across the network to destination
• 2 key issues:
– Portability:
• connect different technologies
– Scalability
• To the Internet scale
application
transport
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
application
transport
network
data link
physical
What does it involve?
Two important functions:
• routing: determine path from source to dest.
• forwarding: move packets from router’s input to
output
T1
T3
Sts-1
T3
T1
Network service model
Q: What service model for
“channel” transporting packets
from sender to receiver?
•
•
•
•
•
guaranteed bandwidth?
preservation of inter-packet timing
(no jitter)?
loss-free delivery?
in-order delivery?
congestion feedback to sender?
The most important
abstraction provided
by network layer:
Which things can be “faked” at the transport layer?
? ?
?
virtual circuit
or
datagram?
Two connection models
• Connectionless (or “datagram”):
– each packet contains enough information that routers can decide how
to get it to its final destination
b
A
b
• Connection-oriented (or “virtual circuit”)
B
C
– first set up a connection between two nodes
– label it (called a virtual circuit identifier (VCI))
– all packets carry label
A
1
1
1
B
C
Virtual circuits: signaling
protocols
•
•
•
•
used to setup, maintain teardown VC
setup gives opportunity to reserve resources
used in ATM, frame-relay, X.25
not used in today’s Internet
application
transport 5. Data flow begins
network 4. Call connected
data link 1. Initiate call
physical
6. Receive data application
3. Accept call transport
2. incoming call network
data link
physical
Virtual circuit switching
• Forming a circuit:
– send a connection request from A to B. Contains VCI + address of B
– rule: VCI must be unique on the link its used on
– switch creates an entry mapping input messages with VCI to output
port
– switch picks a new VCI unique between it and next switch
2
1
2
a
c
1
5
2
1
b
Virtual circuit forwarding
• For each VCI switch has a table which maps input link to
output link and gives the new VCI to use
– if a’s messages come into switch 1 on link 2 and go out on link 3 then
the table will be:
(Input link,VCI) (output link, new VCI)
(1, 2)
(?, ?)
(1, 5)
(?, ?)
Switch 1
2
1
2
a
Switch 2
5
2
c
1
Switch 3
1
b
Virtual Circuits: Discussion
• Plusses: easy to associate resources with VC
– Easy to provide QoS guarantees (bandwidth, delay)
– Very little state in packet
• Minuses:
– Not good in case of crashes
• Requires explicit connect and teardown phases
– What if teardown does not get to all routers?
– What if one switch crashes?
• Will have to teardown and rebuild route
Datagram
networks
no call setup at network layer
•
• routers: no state about end-to-end connections
– no network-level concept of “connection”
• packets typically routed using destination host ID
– packets between same source-dest pair may take different paths
• Best effort: data corruption, packet drops, route loops
application
transport
network
data link 1. Send data
physical
application
transport
2. Receive data network
data link
physical
Datagrams: Forwarding
How does packet get to the destination?
• switch creates a “forwarding table”, mapping destinations to
output port (ignores input ports)
• when a packet with a destination address in the table arrives, it
pushes it out on the appropriate output port
• when a packet with a destination address not in the table
arrives, it must find out more routing information (next problem)
d
0 S1
2
1
0
S2
c
1
2
a
0
S3
1
b
Datagrams
• Plusses:
–
–
–
–
No round trip connection setup time
No explicit route teardown
No resource reservation  each flow could get max bandwidth
Easily handles switch failures; routes around it
• Minuses
– Difficult to provide resource guarantees
– Higher per packet overhead
• Internet uses datagrams: IP (Internet Protocol)
Datagrams Forwarding
• How to build forwarding tables?
– Manually enter it
• What if nodes crashed
• What about scale?
• The graph-theoretic routing problem
– Given a graph, with vertices (switches), edges (links), and
edge costs (cost of sending on that link)
– Find the least cost path between any two nodes
• Path cost =  (cost of edges in path)
Simple Routing Algorithm
• Choose a central node
– All nodes send their (nbr, cost) information to this node
– Central node uses info to learn entire topology of the network
– It then computes shortest paths between all pairs of nodes
• Using All Pair Shortest Path Algorithm
– Sends the new matrix to every node
• Nice, simple, elegant!
• What is the problem?
– Scalability: centralization hurts scalability
– Central node is “crushed” with traffic
Link State Routing
• Basic idea:
– Every node propagates its (nbr, cost) information
– This information at all nodes is enough to construct topology
– Can use a graph algorithm to find the shortest routes
• Mechanisms required:
– Reliable flooding of link information
– Method to calculate shortest route (Dijkstra’s algorithm)
• Example link state update packet:
– [node id, (nbr, cost) list, seq. no., ttl]
• Seq. no. to identify latest updates, ttl specifies when to stop msg.
Reliable flooding
receive(pkt)
If already have a copy of LSP from pkt.ID
if pkt’s sequence number <= copy’s
discard pkt
else
decrement pkt.TTL
replace copy with pkt
forward pkt to all links besides the
one that we received it on
# done every 10 minutes or so
gen_LSP()
increment node’s sequence # by one
recompute cost vector
send created LSP to all neighbors
Discussion: Link-State
Routing
Plusses:
•
– Simple, determines the optimal route most of the time
– Used by OSPF
• Minuses:
D
1
– Might have oscillations
A
1 A
1+e
0
0 0
C
e
B
1
2+e
0
D 1+e 1 B
0
0
C
0
D
1
A
0 0
C
2+e
B
1+e
e
Initially start with … everyone goes with … recompute
Least loaded =>
almost equal routes
least loaded
2+e
A
0
D 1+e 1 B
e
0
C
… recompute
Most loaded
– Avoid using load as cost metric, reduce herding effect
Is our routing algorithm
scalable?
• Route table size grows with size of network
– Because our address structure is flat!
• Solution: have a hierarchical structure
– Used by OSPF
– Divide the network into areas, each area has unique number
• Nodes carry their area number in the address 1.A, 2.B, etc.
– Nodes know complete topology in their area
– Area border routers (ABR) know how to get to any other
area
Hierarchical Addressing
Zone 2
2.a
1.b
0 S1
2
1
0
1.a
Forwarding table for switch 1
Destination switch port
2.
?
3.
?
1.b
?
1.a
?
1
2
S2
2.b
3
0
S3
1
3.b
2
Zone 3
3.a
IP has 2-layer addressing
• Each IP address is 32 bits
– Network part: which network the host is on?
– Host part: identifies the host.
• All hosts on same network have the same network part
18.26.0.1
network
32-bits
host
• 3 classes of addresses: A, B and C
0 net
1 7
host
24 bits
1 0 net
host
110
net
host
2
16 bits
3
21
8 bits
14
IP addressing
• The different classes:
class
Unicast
A
0 network
B
10
C
110
Multicast D
1110
Reserved E
1111
1.0.0.0 to
127.255.255.255
host
network
128.0.0.0 to
191.255.255.255
host
network
host
multicast address
reserved
32 bits
• Problems: inefficient, address space exhaustion
192.0.0.0 to
223.255.255.255
224.0.0.0 to
239.255.255.255
240.0.0.0 to
255.255.255.255
IP addressing: CIDR
• Classless InterDomain Routing
– network portion of address of arbitrary length
– address format: a.b.c.d/x, where x is # bits in network portion
network
part
host
part
11001000 00010111 00010000 00000000
– Examples:
• Class A: /8
• Class B: /16
• Class C: /24
200.23.16.0/23
Internet Protocol Datagram
IP protocol version
Number
header length
“type” of data
max number
remaining hops
(decremented at
each router)
upper layer protocol
to deliver payload to
32 bits
type of
ver head.
len service
length
fragment
16-bit identifier flgs
offset
time to upper
Internet
layer
live
checksum
total datagram
length (bytes)
for
fragmentation/
reassembly
32 bit source IP address
32 bit destination IP address
Options (if any)
data
(variable length,
typically a TCP
or UDP segment)
E.g. timestamp,
record route
taken, pecify
list of routers
to visit.
Datagram Portability
• IP Goal: To create one logical network from multiple physical
networks
– All intermediate routers should understand IP
– IP header information sufficient to carry the packet to destination
– Goal: Run over anything!
• Problem:
– Physical networks have different MTUs
– “max. transmission unit”: 1500 for Ethernet, 48 for ATM
• Solution 1:
– Fit everything in the MTU (!)
IP Fragmentation & Reassembly
• Solution 2: (the one used)
– If packet size > MTU of network, then fragment into pieces
• Each fragment is less than MTU size
• Each has IP headers + frag bit set + frag id + offset
– Packets may get refragmented on the way to destination
– Reassembly only done at the destination
reassembly
– What is a good initial packet size?
fragmentation:
in: one large datagram
out: 3 smaller datagrams
Internet: Names and
Addresses
Naming in the Internet
• What are named? All Internet Resources.
– Objects: www.cs.cornell.edu/~einar
– Services: weather.yahoo.com/forecast
– Hosts: planetlab1.cs.cornell.edu
• Characteristics of Internet Names
– human recognizable
– unique
– Persistent?
• Universal Resource Names (URNs)
Locating the resources
• Internet services and resources are provided by end-hosts
– ex. www1.cs.cornell.edu and www2.cs.cornell.edu host Einar’s home
page.
• Names are mapped to Locations
– Universal Resource Locators (URL)
– Embedded in the name itself: ex. weather.yahoo.com/forecast
• Semantics of Internet naming
 human recognizable
 uniqueness
x persistent
Locating the Hosts?
• Internet Protocol Addresses (IP Addresses)
– ex. planetlab1.cs.cornell.edu  128.84.154.49
• Characteristics of IP Addresses
– 32 bit fixed-length
– enables network routers to efficiently handle packets in the Internet
• Locating services on hosts
– port numbers (16 bit unsigned integer) 65536 ports
– standard ports: HTTP 80, FTP 20, SSH 22, Telnet 20
Mapping Not 1 to 1
• One host may map to more than one name
– One server machine may be the web server (www.foo.com),
mail server (mail.foo.com)etc.
• One host may have more than one IP address
– IP addresses are per network interface
• But IP addresses are generally unique!
– two globally visible machines should not have the same IP
address
– Anycast is an Exception:
• routers send packets dynamically to the closest host matching
an anycast address
How to get a name?
• Naming in Internet is Hierarchical
– decreases centralization
– improves name space management
• First, get a domain name then you are free to
assign sub names in that domain
– How to get a domain name coming up
• Example: weather.yahoo.com belongs to
yahoo.com which belongs to .com
– regulated by global non-profit bodies
Domain name structure
root (unnamed)
com edu gov
mil net org
gTLDs
lucent
cornell
ustreas
...
fr
gr
us uk
...
ccTLDs
second level (sub-)domains
gTLDs= Generic Top Level Domains
ccTLDs = Country Code Top Level Domains
Top-level Domains
(TLDs)
• Generic Top Level Domains (gTLDs)
–
–
–
–
–
–
–
.com - commercial organizations
.org - not-for-profit organizations
.edu - educational organizations
.mil - military organizations
.gov - governmental organizations
.net - network service providers
New: .biz, .info, .name, .xxx (nearly..)
• Country code Top Level Domains (ccTLDs)
– One for each country
How to get a domain name?
• In 1998, non-profit corporation, Internet
Corporation for Assigned Names and
Numbers (ICANN), was formed to assume
responsibility from the US Government
• ICANN authorizes other companies to
register domains in com, org and net and
new gTLDs
– Network Solutions is largest and in transitional
period between US Govt and ICANN had sole
authority to register domains in com, org and net
ICANN and politics..
• Why should a US company control
Internet naming?
• Should companies (from whatever
country) be able to profit from internet
names?
– 28th Aug 2006: “ICANN to allow domain
registries to charge ‘what the market will
bear’ for domain names & renewals”
How to get an IP Address?
• Answer 1: Normally, answer is get an IP address from
your upstream provider
– This is essential to maintain efficient routing!
• Answer 2: If you need lots of IP addresses then you
can acquire your own block of them.
– IP address space is a scarce resource - must prove you
have fully utilized a small block before can ask for a larger
one and pay $$ (Jan 2002 - $2250/year for /20 and
$18000/year for a /14)
How to get lots of IP
Addresses? Internet
Registries
RIPE NCC (Riseaux IP Europiens Network
Coordination Centre) for Europe, Middle-East, Africa
APNIC (Asia Pacific Network Information Centre )for
Asia and Pacific
ARIN (American Registry for Internet Numbers) for the
Americas, the Caribbean, sub-saharan Africa
Note: Once again regional distribution is important for
efficient routing!
Can also get Autonomous System Numbers (ASNs from
these registries
Are there enough addresses?
• Unfortunately No!
– 32 bits  4 billion unique addresses
– but addresses are assigned in chunks
– ex. cornell has four chunks of /16 addressed
• ex. 128.84.0.0 to 128.84.255.255
• 128.253.0.0, 128.84.0.0, 132.236.0.0, and 140.251.0.0
• Expanding the address space!
– IPv6 128 bit addresses
– difficult to deploy (requires cooperation and
changes to the core of the Internet)
DHCP and NATs
• Dynamic Host Control Protocol
– lease IP addresses for short time intervals
– hosts may refresh addresses periodically
 only live hosts need valid IP addresses
• Network Address Translators
– Hide local IP addresses from rest of the world
– only a small number of IP addresses are visible outside
 solves address shortage for all practical purposes
 access is highly restricted
• ex. peer-to-peer communication is difficult
NATs in operation
• Translate addresses when packets traverse
through NATs
• Use port numbers to increase number of
supportable flows
DNS: Domain Name System
Domain Name System:
• distributed database implemented in
hierarchy of many name servers
• application-layer protocol host, routers, name
servers to communicate to resolve names
(address/name translation)
– Note how a core Internet function is implemented
as application-layer protocol
– complexity at network’s “edge”
DNS name servers
How could we provide this
service? Why not
centralize DNS?
•
•
•
•
single point of failure
traffic volume
distant centralized database
maintenance
doesn’t scale!
• no server has all name-to-IP
address mappings
Name server: process
running on a host that
processes DNS requests
local name servers:
– each ISP, company has local
(default) name server
– host DNS query first goes to
local name server
authoritative name server:
– can perform name/address
translation for a specific
domain or zone
Name Server Zone Structure
root
com gov edu
lucent
mil net org
fr
gr
us uk
Structure based on
administrative issues.
ustreas
irs
www
Zone: subtree with common
administration authority.
Name Servers (NS)
root
com gov edu
lucent
cornell
ustreas
customs
...
Root NS
Lucent NS
Ustreas NS
irs
IRS NS
www
Name Servers (NS)
• NSs are duplicated for reliability.
• Each domain must have a primary and secondary.
• Each host knows the IP address of the local NS.
• Each NS knows the IP addresses of all root NSs.
DNS: Root name servers
• contacted by local name
server that can not
resolve name
• root name server:
– Knows the
authoritative name
server for main
domain
• ~ 60 root name servers
worldwide
– real-world
application of
anycast
Simple DNS example
root name server
host surf.eurecom.fr wants
IP address of
2
www.cs.cornell.edu
5
1. Contacts its local DNS
server, dns.eurecom.fr
2. dns.eurecom.fr contacts
root name server, if
necessary
local name server
dns.eurecom.fr
3. root name server contacts
authoritative name server,
1
6
dns.cornell.edu, if
necessary (what might
be wrong with this?)
requesting host
surf.eurecom.fr
4
3
authorititive name server
dns.cornell.edu
www.cs.cornell.edu
root name server
.edu name server
DNS example
Root name server:
2
• may not know
authoritative name
server
local name server
dns.eurecom.fr
• may know intermediate
name server: who to
contact to find
authoritative name
1
server
4
3
5
6
7
8
9
intermediate name
server
dns.cornell.edu
10
requesting host
authoritative name server
dns.cs.cornell.edu
surf.eurecom.fr
www.cs.cornell.edu
DNS Architecture
• Hierarchical Namespace Management
– domains and sub-domains
– distributed and localized authority
• Authoritative Nameservers
– server mappings for specific sub-domains
– more than one (at least two for failure resilience)
• Caching to mitigate load on root servers
– time-to-live (ttl) used to delete expired cached
mappings
DNS: query resolution
root name server
.edu name server
iterated query:
• contacted server
replies with name of iterated query 2
4
server to contact
3
recursive
• “I don’t know this
5
query
name, but ask this
6
server”
9
• Takes burden off root
local name server intermediate name server
servers
recursive query:
• puts burden of name
resolution on contacted
name server
• reduces latency
dns.eurecom.fr
1
10
requesting host
dns.cornell.edu
8
7
authoritative name server
dns.cs.cornell.edu
surf.eurecom.fr
www.cs.cornell.edu
DNS records: More than Name to
IP Address
DNS: distributed db storing resource records (RR)
RR format: (name,
• Type=A
– name is hostname
– value is IP address
– One we’ve been
discussing; most
common
• Type=NS
– name is domain (e.g. foo.com)
– value is IP address of
authoritative name server for
this domain
value, type,ttl)
• Type=CNAME
– name is an alias
name for some
“cannonical” (the
real) name
– value is cannonical
name
• Type=MX
– value is hostname of
mailserver associated
with name
nslookup
• Use to query DNS servers (not telnet
like with http – why?)
• Examples:
– nslookup www.yahoo.com
– nslookup www.yahoo.com
dns.cs.cornell.edu
• specify which local nameserver to use
– nslookup –type=mx cs.cornell.edu
• specify record type
PTR Records
• Do reverse mapping from IP address to
name
• Why is that hard? Which name server is
responsible for that mapping? How do
you find them?
• Answer: special root domain, arpa, for
reverse lookups
Arpa top level domain
Want to know machine name for 128.30.33.1?
Issue a PTR request for 1.33.30.128.in-addr.arpa
root
arpa com gov edu
mil net org
In-addr
ietf
gr
33
1
us uk
www.ietf.org.
www
128
30
fr
1.33.30.128.in-addr.arpa.
Why is it backwards?
• Notice that 1.30.33.128.in-addr.arpa is written
in order of increasing scope of authority just
like www.cs.foo.edu
• Edu largest scope of authority; foo.edu less,
down to single machine www.cs.foo.edu
• Arpa largest scope of authority; in-addr.arpa
less, down to single machine 1.30.33.128.inaddr.arpa (or 128.33.30.1)
In-addr.arpa domain
• When an organization acquires a domain
name, they receive authority over the
corresponding part of the domain name
space.
• When an organization acquires a block of IP
address space, they receive authority over
the corresponding part of the in-addr.arpa
space.
• Example: Acquire domain berkeley.edu and
acquire a class B IP Network ID 128.143
DNS
protocol,
messages
DNS protocol : query and reply messages, both with same
message format
msg header
• identification: 16 bit #
for query, reply to query
uses same #
• flags:
– query or reply
– recursion desired
– recursion available
– reply is authoritative
– reply was truncated
DNS protocol, messages
Name, type fields
for a query
RRs in reponse
to query
records for
authoritative servers
additional “helpful”
info that may be used
The Transport Layer
Purpose of this layer
• Interface end-to-end applications and protocols
– Turn best-effort IP into a usable interface
• Data transfer b/w processes:
– Compared to end-to-end IP
• We will look at 2:
– TCP
– UDP
application
transport
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
application
transport
network
data link
physical
UDP
• Unreliable Datagram Protocol
• Best effort data delivery between processes
– No frills, bare bones transport protocol
– Packet may be lost, out of order
• Connectionless protocol:
– No handshaking between sender and receiver
– Each UDP datagram handled independently
UDP Functionality
• Multiplexing/Demultiplexing
– Using ports
• Checksums (optional)
– Check for corruption
application-layer
data
segment
header
segment
Ht M
Hn segment
P1
M
application
transport
network
P3
M
M
P4
application
transport
network
receiver
M
P2
application
transport
network
Multiplexing/Demultiplexing
• Multiplexing:
– Gather data from multiple processes, envelope data with header
– Header has src port, dest port for multiplexing
• Why not process id?
• Demultiplexing:
– Separate incoming data in machine to different applications
– Demux based on sender addr, src and dest port
32 bits
Length, in
bytes of UDP
segment,
including
header
source port #
dest port #
length
checksum
Application
data
(message)
UDP segment format
Implementing Ports
• As a message queue
– Append incoming message to the end
– Much like a mailbox file
• If queue full, message can be discarded
• When application reads from socket
– OS removes some bytes from the head of the queue
• If queue empty, application blocks waiting
UDP Checksum
• Over the headers and data
– Ensures integrity end-to-end
– 1’s complement sum of segment contents
• Is optional in UDP
• If checksum is non-zero, and receiver computes
another value:
– Silently drop the packet, no error message detected
UDP Discussion
• Why UDP?
– No delay in connection establishment
– Simple: no connection state
– Small header size
– No congestion control: can blast packets
• Uses:
– Streaming media, DNS, SNMP
– Could add application specific error recovery
TCP
• Transmission Control Protocol
– Reliable, in-order, process-to-process, two-way byte stream
• Different from UDP
– Connection-oriented
– Error recovery: Packet loss, duplication, corruption,
reordering
• A number of applications require this guarantee
– Web browsers use TCP
Handling Packet Loss
sender
message
receiver
time
There are a number of reasons why the packet may get lost:
- router congestion, lossy medium, etc.
How does sender know of a successful packet send?
Lost Acks
sender
message
timeout
time
ack
What if packet/ack is lost?
receiver
Delayed ACKs
sender
message
timeout
time
receiver
ack
message
What will happen here? Due to congestion, small timeout, …
Delayed ACKs  duplicate packets
Delayed ACKs
sender
m1
receiver
timeout
ack
time
m1
m2
timeout
ack
How to solve this scenario?
Insertion of Packets
sender
m1
receiver
ack1
time
m2
m2’
ack2
m2’ could be from an old expired session!
Message Identifiers
• Each message has <message id, session id>
– Message id: uniquely identifies message in sender
– Session id: unique across sessions
• Message ids detect duplication, reordering
• Session ids detect packet from old sessions
• TCP’s sequence number has similar functionality:
– Initial number chosen randomly
– Unique across packets
– Incremented by length of data bytes
TCP Packets
32 bits
URG: urgent data
(generally not used)
ACK: ACK #
valid
PSH: push data now
(generally not used)
RST, SYN, FIN:
connection estab
(setup, teardown
commands)
Internet
checksum
(as in UDP)
source port #
dest port #
sequence number
acknowledgement number
head not
UA P R S F
len used
checksum
rcvr window size
ptr urgent data
Options (variable length)
application
data
(variable length)
counting
by bytes
of data
(not segments!)
# bytes
rcvr willing
to accept
sender
TCP Connection
Establishment
receiver
TCP is connection-oriented. Starts with a 3-way handshake.
Protects against duplicate SYN packets.
TCP Usage
sender
receiver
TCP timeouts
• What is a good timeout period ?
– Want to improve throughput without unnecessary transmissions
NewAverageRTT = (1 - ) OldAverageRTT +  LatestRTT
NewAverageDev = (1 - ) OldAverageDev +  LatestDev
where LatestRTT = (ack_receive_time – send_time),
LatestDev = |LatestRTT – AverageRTT|,
 = 1/8, typically.
Timeout = AverageRTT + 4*AverageDev
• Timeout is thus a function of RTT and deviation
TCP Windows
• Multiple outstanding packets can increase throughput
TCP Windows
DATA, id=17
DATA, id=18
DATA, id=19
DATA, id=20
ACK 17
ACK 18
ACK 19
ACK 20
• Can have more than one
packet in transit
• Especially over fat pipes, e.g.
satellite connection
• Need to keep track of all
packets within the window
• Need to adjust window size
TCP Congestion Control
• TCP increases its window size when no packets dropped
• It halves the window size when a packet drop occurs
– A packet drop is evident from the acknowledgements
• Therefore, it slowly builds to the max bandwidth, and hover
around the max
– It doesn’t achieve the max possible though
– Instead, it shares the bandwidth well with other TCP connections
• This linear-increase, exponential backoff in the face of
congestion is termed TCP-friendliness
TCP Window Size
Max Bandwidth
• Linear increase
• Exponential
backoff
Bandwidth
• Assuming no
other losses in
the network
except those
due to
bandwidth
Time
TCP Fairness
A
D
Bottleneck
Link
Bandwidth for Host A
B
Bandwidth for Host B
• Want to share
the bottleneck
link fairly
between two
flows
TCP Slow Start
• Linear increase takes a long time to build up a
window size that matches the link bandwidth*delay
• Most file transactions are not long enough
• Consequently, TCP can spend a lot of time with small
windows, never getting the chance to reach a
sufficiently large window size
• Fix: Allow TCP to build up to a large window size
initially by doubling the window size until first loss
TCP Slow Start
Max Bandwidth
• Initial phase of
exponential
increase
Bandwidth
• Assuming no
other losses in
the network
except those
due to
bandwidth
Time
TCP Summary
• Reliable ordered message delivery
• Connection oriented, 3-way handshake
• Transmission window for better throughput
• Timeouts based on link parameters
• Congestion control
• Linear increase, exponential backoff
• Fast adaptation
• Exponential increase in the initial phase