The Internet: A Distributed System

http://people.freebsd.org/~nik/dist-sys.ppt
Copyright © 2002 Nik Clayton
All rights reserved.
Redistribution and use, with or without modification, are permitted provided that the following
condition is met:
• Redistributions of this presentation must retain the above copyright notice, this list of
conditions and the following disclaimer.
THIS PRESENTATION IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS''
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS
BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Obligatory biographical bit
• Used to be [email protected]
• Now [email protected], [email protected]
• One of five running mail for Citigroup
– 11m msgs/week, 850MB/day
• Editor, “FreeBSD Handbook”
Looking at…
• How the Internet works
• How the Domain Name System (DNS)
works on top of this
• How the Simple Mail Transfer Protocol (SMTP)
uses both of these to shuffle e-mail around
the place
So, how does the Internet work?
• Three key protocols involved:
– IP: Internet Protocol
– UDP: User Datagram Protocol
– TCP: Transmission Control Protocol, often written
TCP/IP
• IP is lowest layer, UDP and TCP sit on top of it.
• Not going to look at the physical layer (Ethernet,
etc.)
• Not going to look at IPv6
Internet and the OSI 7 layer model
• 7 Application, 6 Presentation, 5 Session
– TELNET (RFC 854)
– FTP (RFC 959)
– SMTP (RFC 821)
– SNMP (RFC 1098)
– DNS (RFC 1034)
• 4 Transport
– TCP (RFC 793)
– UDP (RFC 768)
• 3 Network
– ARP (RFC 826)
– RARP (RFC 903)
– ICMP (RFC 792)
– BOOTP (RFC 951)
– IP (RFC 791)
• 2 Link, 1 Physical
– 802.2
– 802.3
– 802.5
– Other Medium Access Protocols
The 7 Layer Burrito
7. Sour cream
6. Cheese
5. Guacamole
4. Tomato
3. Lettuce
2. Seasoned rice
1. Refried beans
A Networking Analogy
• Two office blocks, each contains a number of
different companies
• Each company has one or more phone numbers
(so there are several phone numbers for the office
block)
• Each phone number has a few hundred
extensions
• To call anyone, you need their company phone
number, and their extension
• 4 numbers identify any call -- source phone
number, source extension, destination phone
number, destination extension
A Networking Analogy (cont.)
• Imagine if everybody agreed on certain
standard phone extensions.
– #25 gets you to the mail room
– #80 is the marketing department
– #123 calls the speaking clock
• That’s almost how the Internet works
In an IP network…
• You have a host (an office building)
• Each host has one or more network interfaces (companies
within the building)
• Each interface has one or more IP addresses attached to it
(phone numbers)
• Each interface has 65535 ports (extensions)
• Connections are made from a port on an IP address to
another port on an IP address
• 4 numbers identify a connection on the Internet -- source
IP address, source port, destination IP address, destination
port
Packet switching
• Internet is a packet switched network
• Data is split into packets
• Each packet has a source IP/port, and a
destination IP/port, as well as other metainformation
• Packets may not arrive in the same order as
sent
• Packets may not even arrive at all
IP Address: A definition
• 32 bit number
– So there are 2^32 = 4,294,967,296 of them
• Normally written as 4 octet values (8 bits each), e.g.,
10.10.1.1 (dotted quad notation)
• Are assigned by the network people, who
arranged a block of addresses for the company,
who were given them by your ISP, who was
allocated them by their regional IP authority, who
were assigned a regional block by the Internic.
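To make the "32 bit number" definition concrete, here is a minimal C sketch that packs a dotted quad into the number it stands for. The function name parse_dotted_quad is an illustrative assumption and is not part of the original slides.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: convert "a.b.c.d" into a 32 bit host-order number.
 * Returns 0 on success, -1 if the text is not a valid dotted quad. */
int parse_dotted_quad(const char *text, uint32_t *out)
{
    unsigned int a, b, c, d;
    char trailing;

    /* The extra %c catches junk after the fourth octet. */
    if (sscanf(text, "%u.%u.%u.%u%c", &a, &b, &c, &d, &trailing) != 4)
        return -1;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return -1;

    /* Each octet occupies 8 of the 32 bits, most significant first. */
    *out = ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
    return 0;
}
```

With this sketch, 10.10.1.1 becomes 0x0A0A0101, which is the same bit pattern the netmask slides write out in binary.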
So, tell me what ports are
• Like a telephone extension
• Each IP address has 2^16 - 1 = 65535 ports
• A server listens on an IP address:port pair for
incoming connections
• A client is typically allocated a port at random for
outgoing connections, and specifies the
destination port it wants to connect to
• Some services (mail, web, etc) have “well known
ports” assigned that servers are expected to listen
on (25, 80, etc)
Networks are groups of IP
addresses
• IP addresses are grouped into collections, called
networks
• Network membership is determined by the
netmask
• Netmask splits the IP address into two portions;
the host portion, and the network portion
• Two hosts are in the same network if the network
portions of their IP addresses are identical
How netmasks work
• 10.10.1.1 is really
00001010 00001010 00000001 00000001
      10       10        1        1
• and 10.10.2.1 is really
00001010 00001010 00000010 00000001
      10       10        2        1
How netmasks work (cont.)
• Netmask is another 32 bit binary number
• It is binary-ANDed with the IP address
• The bits covered by the netmask’s 1s form the network
portion of the IP address
• The bits covered by its 0s are the host portion
How netmasks work (cont.)
• IP: 10.10.1.1
• Netmask: 255.255.255.0
•     00001010 00001010 00000001 00000001   (10.10.1.1)
  AND 11111111 11111111 11111111 00000000   (255.255.255.0)
  =   00001010 00001010 00000001 00000000   (10.10.1.0)
• So this is the .1 host in the 10.10.1.0 network
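The AND on this slide can be sketched in a few lines of C. This is only an illustration: the function names are assumptions, and addresses are held as 32 bit host-order numbers (so 10.10.1.1 is 0x0A0A0101).

```c
#include <stdint.h>

/* Network portion: the address bits that survive the AND with the netmask. */
uint32_t network_portion(uint32_t ip, uint32_t netmask)
{
    return ip & netmask;
}

/* Host portion: the address bits that the netmask switches off,
 * recovered by ANDing with the inverted netmask. */
uint32_t host_portion(uint32_t ip, uint32_t netmask)
{
    return ip & ~netmask;
}
```

For 10.10.1.1 with 255.255.255.0 (0xFFFFFF00) the network portion is 0x0A0A0100, i.e. 10.10.1.0, and the host portion is 1.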
How netmasks work (cont.)
• Netmask doesn’t have to be a continuous
string of 1s, then continuous 0s
– 170.170.170.0
– 10101010 10101010 10101010 00000000
• That would be bloody stupid though
• In practice, netmasks are all 1s, then all 0s
How netmasks work (cont.)
• Leads to another common notation for
netmasks, /n
• /24 means 24 x 1, then all 0
– 11111111 11111111 11111111 00000000
– Same as 255.255.255.0
• /16 would be 16 x 1, then all 0
– 11111111 11111111 00000000 00000000
– Same as 255.255.0.0
How netmasks work (cont.)
• Are these two hosts on the same network?
– 10.10.1.1/24
– 10.10.2.1/24
• No. The first is on the 10.10.1.0 net, the second is
on the 10.10.2.0 net
• What about these?
– 10.10.1.1/16
– 10.10.2.1/16
• Yes, they’re both on the 10.10.0.0 net
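The /n notation and the "same network?" question combine naturally in code. The following C sketch uses assumed names (prefix_to_netmask, same_network) and expects addresses as 32 bit host-order numbers with n between 0 and 32.

```c
#include <stdint.h>

/* Build the netmask for /n: n one bits followed by (32 - n) zero bits.
 * /0 is special-cased because shifting a 32 bit value by 32 places
 * is undefined behaviour in C. */
uint32_t prefix_to_netmask(int n)
{
    return n == 0 ? 0 : (uint32_t)(UINT32_MAX << (32 - n));
}

/* Two hosts are on the same network if their network portions match. */
int same_network(uint32_t a, uint32_t b, int n)
{
    uint32_t mask = prefix_to_netmask(n);
    return (a & mask) == (b & mask);
}
```

Under these assumptions, 10.10.1.1 and 10.10.2.1 compare as different networks with n = 24 and as the same network with n = 16, matching the answers on the slide.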
How netmasks work (cont.)
• Netmasks do not need to be on an octet boundary
– 11111111 11111111 11111111 11000000
– 255.255.255.192
– /26
– 10.10.1.33 = 00001010 00001010 00000001 00100001
– 10.10.1.67 = 00001010 00001010 00000001 01000011
The Network Addresses
• Network address is used to indicate the whole network
• No host can be given the network address
• Consists of the network portion as normal, with the host
portion set to all zero
• 10.10.1/24, the network address is 10.10.1.0
• 10.10.1/26 defines four networks
– 10.10.1.0   = 00001010 00001010 00000001 00000000
– 10.10.1.64  = 00001010 00001010 00000001 01000000
– 10.10.1.128 = 00001010 00001010 00000001 10000000
– 10.10.1.192 = 00001010 00001010 00000001 11000000
The Broadcast Addresses
• Broadcast address is used to send to all hosts on the
network
• No host can be given the broadcast address
• Consists of the network portion as normal, with the host
portion set to all ones
• 10.10.1/24, the broadcast address is 10.10.1.255
• 10.10.1/26 defines four networks, each with its own broadcast address
– 10.10.1.63  = 00001010 00001010 00000001 00111111
– 10.10.1.127 = 00001010 00001010 00000001 01111111
– 10.10.1.191 = 00001010 00001010 00000001 10111111
– 10.10.1.255 = 00001010 00001010 00000001 11111111
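Setting the host portion to all ones is a single OR with the inverted netmask. The C sketch below also checks whether an address may be given to a host; both function names are assumptions made for illustration, and addresses are 32 bit host-order numbers.

```c
#include <stdint.h>

/* Broadcast address: network portion kept, host portion set to all ones. */
uint32_t broadcast_address(uint32_t ip, uint32_t netmask)
{
    return ip | ~netmask;
}

/* An address can be given to a host only if its host portion is
 * neither all zeros (network address) nor all ones (broadcast address). */
int is_assignable(uint32_t ip, uint32_t netmask)
{
    uint32_t host_bits = (uint32_t)~netmask;
    uint32_t host = ip & host_bits;
    return host != 0 && host != host_bits;
}
```

For example, 10.10.1.33 with a /26 netmask (0xFFFFFFC0) has the broadcast address 10.10.1.63, and neither 10.10.1.63 nor 10.10.1.64 passes is_assignable.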
Shrinking address space
• /24 has 256 host addresses available
– .0 through to .255
• Lose .0, reserved for network
• Lose .255, reserved for broadcast
• Leaves you with (256 - 2) = 254 available
addresses for hosts
Shrinking address space (cont.)
• /25 creates two networks
• .0 network
– Network address is .0
– Broadcast address is .127
– Host addresses are .1 through to .126 (126 addresses)
• .128 network
– Network address is .128
– Broadcast address is .255
– Host addresses are .129 through to .254 (126
addresses)
Smaller subnets, fewer hosts
• Splitting a /24 into /26 networks gives four networks
• Each network reserves 2 addresses
• So there are 4 * 2 = 8 addresses reserved
• 256 - 8 = 248 host addresses available
• And so on
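The "and so on" can be written down as arithmetic. This C sketch, with assumed function names, counts usable host addresses for a single /n network and for a /24 split into /n networks.

```c
#include <stdint.h>

/* Usable host addresses in one /n network: 2^(32 - n) addresses,
 * minus the network address and the broadcast address.
 * Assumes 1 <= n <= 30. */
uint32_t hosts_per_network(int n)
{
    return (UINT32_C(1) << (32 - n)) - 2;
}

/* Total usable host addresses when a /24 is split into /n networks.
 * Assumes 24 <= n <= 30. */
uint32_t hosts_in_split_24(int n)
{
    uint32_t networks = UINT32_C(1) << (n - 24);
    return networks * hosts_per_network(n);
}
```

The figures agree with the slides: 254 for /24, 2 * 126 = 252 for /25, and 4 * 62 = 248 for /26.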
Routing
• Hosts on the same network can contact each
other directly
• E.g., 10.10.1.1/24 wants to talk to 10.10.1.2/24.
• It puts a packet on the wire with a destination
address of 10.10.1.2, and 10.10.1.2 receives it
• It’s like magic, you don’t need to know how this bit
works, it just does
• If you become a network administrator, you will
learn, in long, tedious detail, how this magic works
Routing (cont.)
• Hosts on two different networks can’t talk directly,
they need a router to route the packets between
them
• A router is a device with at least 2 network
interfaces present on 2 or more different networks
• Hosts send packets for other networks to the
router
• Router looks at the destination address
information in the packet, and works out where to
send it
Routing (cont.)
• Each Internet host has to maintain a routing
table
• The routing table details how packets get
from a to b
• The routing table only contains information
about the networks the host is directly
connected to
Routing (cont.)
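[Diagram: a router with three interfaces. 10.10.1.1/24 faces the 10.10.1/24 network (host 10.10.1.2/24), 10.10.2.1/24 faces the 10.10.2/24 network (host 10.10.2.2/24), and 80.194.99.103/24 faces the Internet]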
Routing (cont.)
• Here’s the routing table for the workstations on the
10.10.1/24 network
Destination     Gateway
10.10.1/24      Local interface
Default         10.10.1.1
• If it’s on the local network then we know we can
reach it directly
• Otherwise send it on to the router, and hope that it
knows how to deal with it
Routing (cont.)
• Here’s the routing table for the workstations on the
10.10.2/24 network
Destination     Gateway
10.10.2/24      Local interface
Default         10.10.2.1
• If it’s on the local network then we know we can
reach it directly
• Otherwise send it on to the router, and hope that it
knows how to deal with it
Routing (cont.)
• Here’s the routing table for the router
Destination     Gateway
10.10.1.0/24    Interface 1
10.10.2.0/24    Interface 2
Default         Interface 3
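A routing table lookup can be sketched in C as follows. The slides only say "local network, otherwise default"; the sketch generalises that to picking the most specific matching entry (longest prefix), which is the usual rule but is an addition here. The struct and function names are assumptions, and the default route is modelled as a /0 entry that matches everything.

```c
#include <stddef.h>
#include <stdint.h>

struct route {
    uint32_t network;      /* destination network, host byte order */
    int prefix;            /* /n prefix length; 0 stands for "Default" */
    const char *gateway;   /* where matching packets are sent */
};

static uint32_t prefix_to_netmask(int n)
{
    return n == 0 ? 0 : (uint32_t)(UINT32_MAX << (32 - n));
}

/* Return the gateway of the most specific matching route, or NULL. */
const char *route_lookup(const struct route *table, size_t count, uint32_t dest)
{
    const struct route *best = NULL;

    for (size_t i = 0; i < count; i++) {
        uint32_t mask = prefix_to_netmask(table[i].prefix);
        if ((dest & mask) != table[i].network)
            continue;                      /* not in this network */
        if (best == NULL || table[i].prefix > best->prefix)
            best = &table[i];              /* more specific match */
    }
    return best ? best->gateway : NULL;
}

/* The router's table from the slide above. */
const struct route router_table[] = {
    { 0x0A0A0100u, 24, "Interface 1" },    /* 10.10.1.0/24 */
    { 0x0A0A0200u, 24, "Interface 2" },    /* 10.10.2.0/24 */
    { 0x00000000u,  0, "Interface 3" },    /* Default */
};
```

Looking up 10.10.2.2 in router_table selects Interface 2, while an address on neither local network, such as 80.194.99.1, falls through to the default on Interface 3.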
Routing (cont.)
• This is very scalable
– No host needs to know the complete route to
the destination, or the Internet’s topology
– They just need to know the IP address of the
nearest router
– The nearest router hands it off to the next
nearest router, and so on
User Datagram Protocol (UDP)
• Runs on top of IP
• Connectionless, just send data
– No guarantee packets will be delivered in order,
the applications must deal with this
– No guarantee packets will even arrive,
applications must resend data as necessary
– A bit like the Post Office
• But very low overhead
Transmission Control Protocol
(TCP)
• Runs on top of IP
• Connection oriented (open/send/close)
• Network stack ensures
– Packets are delivered to the application in the correct
order
– Missing packets are automatically resent
• Has more overhead than UDP, particularly on the
initial connection (three way handshake)
• Handles network congestion well
Internet summary
• Hosts have interfaces
• Interfaces have IP addresses
• IP addresses subdivided into the network portion
and the host portion by the netmask
• Subdividing networks consumes available IP
addresses (for network and broadcast address)
• Hosts on the same network can talk to one
another directly
• Hosts on different networks need to know the
address of the correct router to use
Internet Summary (cont.)
• Data sent using either UDP or TCP
• UDP is faster, but the application has to do
more bookkeeping
• TCP starts slower, but the application has to
do less work
IP Design Good Points
• Very scalable
• Easy to understand, simple rules
• Does not enforce specific policy
– Networks can be any size
– Does not require particular cabling standard
– Hardware and OS agnostic
• Open
IP Design Bad Points
• Large networks send a lot of meta data
around
– Hosts announcing themselves
• Basic IP design is not secure
– Easy to spoof the source address on a packet
– Leads to denial of service attacks
– Malicious router can sniff traffic, or replace data
– Security in layers 5, 6, and 7 (SSL, SSH, etc)
Domain Name System
(DNS)
The Definitive Reference
• DNS and BIND,
Paul Albitz & Cricket Liu
• Everything you ever wanted
to know about the DNS
• Can’t recommend this book
highly enough
IP Addresses are a pain
• Working with IP addresses is
– Cumbersome
– Error prone
– Hard to remember
• We prefer to name things where possible
• Which is why we have domain names
Fully Qualified Domain Names
• FQDN is two or more names, separated by
dots
• Reading left to right, the first part is the host name
• The rest is the domain name
• IP addresses are mapped to FQDNs
• FQDNs are mapped back to IP addresses
• How?
One way: The hosts file
10.10.1.1 gateway.example.com
10.10.1.2 me.example.com
10.10.1.3 another.example.com
...
This does not scale (!)
So the DNS was invented
• A hierarchical name space, read from right to left
• me.example.com (FQDN) is
.                  <- The root
.com               <- Top level domain
.com.example       <- Sub-domain
.com.example.me    <- FQDN
• Converting a hostname to an IP address is called
“resolving” the address
• “zone” and “domain” are almost interchangeable
terms
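Reading a name from right to left can be sketched in C as a walk over the dots, starting at the right-hand end. The helper name suffix_at_depth is an assumption, and the sketch expects a non-empty name with no trailing dot.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the part of `fqdn` that is `depth` labels
 * below the root, reading right to left. Depth 0 is the root itself
 * (the empty suffix). NULL means the name has fewer labels than that. */
const char *suffix_at_depth(const char *fqdn, int depth)
{
    const char *p = fqdn + strlen(fqdn);   /* start at the right-hand end */

    if (depth == 0)
        return p;
    while (p > fqdn) {
        p--;
        if (*p == '.' && --depth == 0)
            return p + 1;                  /* skip the dot itself */
    }
    return depth == 1 ? fqdn : NULL;       /* ran out of dots: whole name */
}
```

For me.example.com, depths 1, 2 and 3 give com, example.com and me.example.com, which is the order in which the hierarchy is walked from the root.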
How the DNS is used
• 3 types of host
– DNS servers know how addresses and names map to
one another for one or more domains
– DNS caches, given a domain, know how to find out
which DNS server knows about that domain, and query
it for info
– DNS clients (resolvers) know how to talk to caches
• DNS clients contact their nearest cache when they
need to resolve an address. The cache works out
which DNS server will have this information, and
makes the queries
The root nameservers
• 13 named servers (a.root-servers.net to m.root-servers.net),
scattered around the world, that know the nameservers
immediately below them
• Every DNS server in the world needs to know the
IP addresses of the root nameservers
• That’s the only bit of static configuration required
• Everything else is looked up as necessary
• Which is pretty cool
DNS Hierarchy
Root Nameservers
  .uk
    .co.uk
    .ac.uk
      brunel.ac.uk
        www.brunel.ac.uk
      ic.ac.uk
        doc.ic.ac.uk
          src.doc.ic.ac.uk
  GTLD Nameservers
    .net
    .org
      freebsd.org
        www.freebsd.org
        freefall.freebsd.org
      slashdot.org
    .com
      citigroup.com
  ...
Primary and Secondary DNS
• Each domain has exactly one primary (master)
DNS server, and 0 to ‘n’ secondary (slave) servers
• To a client, there is no distinction between the two
• DNS information is updated on the primary DNS
server
• Secondary servers periodically check for updates,
and copy changes over as necessary
DNS in action
• dns.example.com is the local DNS cache
• me.example.com is a host that uses the
DNS server
• You are a user running applications on
me.example.com
• You type ‘www.freebsd.org’ in your web
browser
• What happens?
DNS in action (cont.)
• First, me.example.com checks to see if it
knows the IP address of www.freebsd.org
• It doesn’t
• So it sends a DNS query to
dns.example.com
• This query says “Please give me the A
record for the FQDN www.freebsd.org”
DNS in action (cont.)
• dns.example.com knows nothing about
www.freebsd.org
• So it asks one of the root name servers
• They don’t know either, but they say “Go talk
to the .org name servers, here’s their IP
addresses”
• So dns.example.com goes and asks the .org
name servers
DNS in action (cont.)
• They say “We don’t know, but we do know
that ns.freebsd.org is the nameserver that’s
authoritative for *.freebsd.org, here’s its
address, go ask it”
• So dns.example.com says to ns.freebsd.org
“Please give me the A record for
www.freebsd.org”
• ns.freebsd.org says “Sure, it’s
216.136.204.117”
DNS in action (cont.)
• dns.example.com caches this information
(so if it’s asked again it doesn’t need to redo
all the above), and sends the info back to
me.example.com
• All this happens in a few seconds
• This is what your browser is doing when it
says something like “Resolving hostname”
Other types of DNS record
• That example used “A” records
– They map FQDNs back to IP addresses
• Called a “Forward” lookup
• Not the only type of records in the DNS
– PTR records map IP addresses to FQDNs
• Called a “Reverse” lookup
– NS records list the domain’s name servers
– MX records are used for mail routing
– SOA record is the ‘Start of Authority’
SOA Record
• Every zone has one SOA record
• Describes characteristics for the zone
– Serial number, which is incremented every time
the data changes
– Time-to-live, which says how long data should
be cached for
– E-mail address of DNS info maintainer
Example of a DNS Zone File
$ORIGIN brunel.ac.uk.
brunel.ac.uk. IN SOA sirius.brunel.ac.uk. hostmaster.brunel.ac.uk. (
                2002103001  ; Serial number
                8000        ; Refresh after 2hrs 13min
                7200        ; Retry after 2hrs
                604800      ; Expire after 1wk
                21600 )     ; Minimum TTL of 6hrs
                IN NS sirius.brunel.ac.uk.
                IN NS ns3.ja.net.
                IN MX 5 nemesis.brunel.ac.uk.
                IN MX 4 eros.brunel.ac.uk.
s70n133         IN A 134.83.70.133
s249n88         IN A 134.83.249.88
s249n90         IN A 134.83.249.90
………
IP Characteristics of DNS
• DNS servers listen on port 53
• Generally uses UDP
– Very short communication lifespan
– TCP overhead is too high
– Protocol is simple and robust
• Didn’t get an answer? Just send the query again
• May use TCP where appropriate
– Zone transfers between primary and secondary servers
Smart things about DNS
• Simple mechanism for synchronising
primary and secondary servers
• Distributes data throughout the network, no
real single point of failure for the Internet
– With the exception of the root nameservers
– DDoS Attacks
Bad things about DNS
• Not secure, you have to trust your DNS
server
– Always do a forward lookup after a reverse
lookup
• DNS server is a single point of failure for a
network’s presence on the Internet
– So make sure that multiple secondary servers
exist
– On different, geographically disparate networks
Bad things about DNS (cont.)
• Difficult to do updates ‘on demand’
– There are enhancements that try to address this
– But they’re not widely deployed
– Commercial interests
Simple Mail Transfer
Protocol
(SMTP)
SpaM Transport Protocol
What it sometimes feels like
A word from our sponsor…
• Wed 13th to 16th
November 2003
• Compass Theatre,
Ickenham
• £5.00, £6.50 or £7.50
• 07050 605081
• I’m in it as myself.
• “Nail it to the counter Lord
Fergason and damn the
cheesmongers!”
An e-mail message consists of…
• Envelope
– Contains addressing information
– Discarded once the message is successfully delivered
• Header
– Contains 1-n “name: value” fields
– From:, To:, CC:, BCC:, Subject:, Date:, Received:, X-Foo:, X-Bar:, etc…
• Body
– Unstructured text of the actual message
Sample SMTP conversation
# telnet eros.brunel.ac.uk 25
220 ************
HELO ngo.dnsalias.org
250 eros.brunel.ac.uk OK
MAIL FROM: [email protected]
250 2.1.0 OK
RCPT TO: [email protected]
250 2.1.5 Recipient OK
DATA
354 Enter Mail, end by a line with only ‘.’
From: [email protected] (Nik Clayton)
To: [email protected] (Simon Taylor)
Subject: Slides for lecture

Sorry mate, no chance I’ll have the slides ready in
time, we’ll need to fake something. But keep it to
yourself, I don’t think they’ll notice.
Nik
.
250 2.1.5 Submitted & queued (msg.22684-0)
QUIT
221 2.0.0 eros.brunel.ac.uk says goodbye to ngo.dnsalias.org
SMTP Highlights
• Protocol is entirely plain text
– Easy to debug
– Easy to test by hand
– Easy to script
• Protocol is relatively simple
– Easy to write code for (Microsoft excepted)
• Protocol is unambiguous
– All information is contained in the status codes. The
explanatory text is useful but ignored by
implementations
SMTP Highlights (cont.)
• Protocol is consistent
– 2xx codes indicate success
– 3xx codes indicate ‘send more data’
– 4xx codes indicate temporary failures
– 5xx codes indicate permanent failures
• The ‘xx’s provide further delineation
• SMTP implementations are supposed to be
paranoid
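Because all the information is in the status code, a client can decide what to do from the first digit alone. The C sketch below shows one way to do that; the enum and function names are assumptions. In the paranoid spirit of the slide, anything that does not start with three digits, or whose first digit is not 2 to 5, is reported as invalid rather than guessed at.

```c
#include <ctype.h>

enum smtp_outcome {
    SMTP_SUCCESS,      /* 2xx */
    SMTP_SEND_MORE,    /* 3xx */
    SMTP_TEMP_FAIL,    /* 4xx */
    SMTP_PERM_FAIL,    /* 5xx */
    SMTP_INVALID       /* not a reply this sketch recognises */
};

/* Classify one reply line by its status code; the text after the
 * code is ignored, as the protocol intends. */
enum smtp_outcome classify_reply(const char *line)
{
    /* A reply must start with three digits. The checks short-circuit,
     * so a short line is never read past its terminating NUL. */
    if (!isdigit((unsigned char)line[0]) ||
        !isdigit((unsigned char)line[1]) ||
        !isdigit((unsigned char)line[2]))
        return SMTP_INVALID;

    switch (line[0]) {
    case '2': return SMTP_SUCCESS;
    case '3': return SMTP_SEND_MORE;
    case '4': return SMTP_TEMP_FAIL;
    case '5': return SMTP_PERM_FAIL;
    default:  return SMTP_INVALID;
    }
}
```

A client built this way has a defined answer even when the server sends something that is not a valid SMTP response.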
A real SMTP failure
• We had an application that was a buggy SMTP
server
• Sometimes it failed to send back a valid SMTP
response after generating a bounce message
• The client didn’t know whether or not the message
was delivered, temp. failed, or perm. failed
• So it tried, tens of times a second, to resend the
message
• This generated thousands of bounce messages
very quickly
The Envelope and Bcc:
• From: [email protected]
To: [email protected]
Bcc: [email protected]
...
• 220 . . .
MAIL FROM: [email protected]
250 . . .
RCPT TO: [email protected]
250 2.1.5 Recipient OK
RCPT TO: [email protected]
250 2.1.5 Recipient OK
DATA
354 . . .
From: [email protected] (Nik Clayton)
To: [email protected] (Simon Taylor)
...
Sample Received: Lines
Received: from localhost ([email protected] [127.0.0.1])
by crf-consulting.co.uk (8.12.3/8.12.3) with ESMTP id g9GFo4Tk093919
for <nik@localhost>; Wed, 16 Oct 2002 16:50:04 +0100 (BST)
(envelope-from [email protected])
Received: from ngo.org.uk [212.219.216.39]
by localhost with POP3 (fetchmail-5.9.11)
for nik@localhost (single-drop); Wed, 16 Oct 2002 16:50:04 +0100 (BST)
Received: from nemesis.brunel.ac.uk (nemesis.brunel.ac.uk [134.83.108.17])
by ngo.org.uk (8.9.3/8.9.3) with ESMTP id RAA07600
for <[email protected]>; Wed, 16 Oct 2002 17:01:18 +0100 (BST)
Received: from csstsjt (actually s76n96.brunel.ac.uk) by nemesis.brunel.ac.uk
with SMTP-BRUNEL (PP) with ESMTP; Wed, 16 Oct 2002 16:47:25 +0100
Re-ordered Received: lines
Received: from csstsjt (actually s76n96.brunel.ac.uk) by nemesis.brunel.ac.uk
with SMTP-BRUNEL (PP) with ESMTP; Wed, 16 Oct 2002 16:47:25 +0100
Received: from nemesis.brunel.ac.uk (nemesis.brunel.ac.uk [134.83.108.17])
by ngo.org.uk (8.9.3/8.9.3) with ESMTP id RAA07600
for <[email protected]>; Wed, 16 Oct 2002 17:01:18 +0100 (BST)
Received: from ngo.org.uk [212.219.216.39]
by localhost with POP3 (fetchmail-5.9.11)
for nik@localhost (single-drop);
Wed, 16 Oct 2002 16:50:04 +0100 (BST)
Received: from localhost ([email protected] [127.0.0.1])
by crf-consulting.co.uk (8.12.3/8.12.3) with ESMTP id g9GFo4Tk093919
for <nik@localhost>; Wed, 16 Oct 2002 16:50:04 +0100 (BST)
(envelope-from [email protected])
Acronyms
• MTA = Mail Transfer Agent
– The software that routes message from host to host
(Sendmail, Postfix, Qmail, Exchange (cough))
• MUA = Mail User Agent
– The software that lets users send and receive e-mail
(Outlook, Eudora, etc)
• PBCK = Problem Between Chair and Keyboard
– A user. See also “DFU”
Mail Routing
• I type [email protected] into my
MUA. What happens?
• MUA hands message off to local MTA
• Local MTA uses the DNS to look up MX
records for brunel.ac.uk
• MX record?
MX Records
• Are entries in the DNS
• Unlike most other DNS entries (A records, etc),
they contain two pieces of information
– A FQDN
– A weight / preference
• A domain (brunel.ac.uk) may have multiple MX
records, listing different FQDNs and weights,
providing redundancy
• Hosts acting as MXs for a domain do not need to
be in the same domain as the domain they are
acting as MXs for (!)
Brunel and Citigroup MX records
Brunel
Weight  Host
4       eros.brunel.ac.uk
5       nemesis.brunel.ac.uk
Citigroup
Weight  Host
50      mail1.citigroup.com
50      mail2.citigroup.com
50      mail3.citigroup.com
50      mail4.citigroup.com
50      mail5.ssmb.com
Mail Routing (cont.)
• The local MTA sorts the MX results in order of their
weight (lowest first)
• It does a DNS lookup for the IP address(es) of the
first FQDN in the list
• It tries to connect to that IP address on port 25
• If the connection succeeds it tries to deliver the
message
• If the connection fails, or the delivery attempt
failed with a temporary error, it tries again, with the
next MX record in the list
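The steps above can be sketched in C as a sort followed by a loop. Everything named here is an assumption for illustration: the struct, the enum, the deliver function, and in particular demo_try_host, which stands in for the real work of looking up the host, connecting to port 25 and attempting delivery.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct mx_record {
    int weight;          /* lower weight is tried first */
    const char *host;    /* FQDN of the mail exchanger */
};

enum attempt { DELIVERED, TEMP_FAILURE, PERM_FAILURE };

/* qsort comparator: ascending weight. */
static int compare_mx(const void *a, const void *b)
{
    const struct mx_record *x = a;
    const struct mx_record *y = b;
    return (x->weight > y->weight) - (x->weight < y->weight);
}

/* Returns the index (after sorting) of the MX that accepted the message,
 * -1 if every MX failed temporarily (queue it and retry later),
 * -2 if an MX reported a permanent failure (bounce it). */
int deliver(struct mx_record *records, size_t count,
            enum attempt (*try_host)(const char *host))
{
    qsort(records, count, sizeof(records[0]), compare_mx);
    for (size_t i = 0; i < count; i++) {
        enum attempt result = try_host(records[i].host);
        if (result == DELIVERED)
            return (int)i;
        if (result == PERM_FAILURE)
            return -2;
        /* Connection or temporary failure: move on to the next MX. */
    }
    return -1;
}

/* Stand-in for real network code, for demonstration only: pretends that
 * eros.brunel.ac.uk is temporarily unavailable and every other host accepts. */
enum attempt demo_try_host(const char *host)
{
    return strcmp(host, "eros.brunel.ac.uk") == 0 ? TEMP_FAILURE : DELIVERED;
}
```

With the two Brunel records and demo_try_host, eros (weight 4) is tried first, fails temporarily, and nemesis (weight 5) accepts the message. Records of equal weight, as in the Citigroup list, come out of qsort in no particular order.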
Mail Routing (cont.)
• The MTA will queue messages for a period of time
(5 days is typical)
• It will make regular attempts to re-deliver
messages that generated temporary failures
– Failure after a certain period (normally 4 hours) may
generate a “We are still trying to deliver your message”
note to the envelope sender address
• Messages that generate a permanent failure from
any of the MX hosts are not retried, and are
bounced
• Bounces go to the envelope sender address, not
the From: address
Citigroup Mail Backbone Structure
Internet
Anti-spam
Anti-virus
Archiving
Address re-writing
Exchange Servers
IP Characteristics of SMTP
• SMTP servers listen on port 25
• Always uses TCP
– Relatively long communication lifespan
– TCP overhead is acceptable
– TCP ensures packets are resent as necessary
Extending SMTP
• Turns out that, as originally specified, SMTP
doesn’t do some useful things
• So ESMTP was invented
• But how do you do this without breaking all
the existing implementations?
• Hmm…
Extending SMTP (cont.)
• Get-out clause in the original SMTP spec
• If an SMTP server receives a command it
doesn’t understand, it:
– Does not drop the connection
– Returns an error code (5xx)
– Pretends it never received the command
• Robustness in action, and a stroke of genius
Extending SMTP (cont.)
• EHLO - Extended HELO
• Replaces ‘HELO’ at the
beginning of the SMTP conversation
• If a server responds to EHLO
with a 2xx code you know it
speaks ESMTP
• If it responds with a 5xx code then you fall
back to regular SMTP, and immediately
send a HELO.
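The fallback rule can be sketched in C as a decision on the reply to EHLO. The names are assumptions; so is the third outcome, since the slide only covers 2xx and 5xx and this sketch chooses to treat anything else as a reason to try again later.

```c
#include <stdlib.h>

enum dialect {
    USE_ESMTP,             /* server accepted EHLO */
    USE_SMTP_SEND_HELO,    /* server rejected EHLO: fall back, send HELO */
    RETRY_LATER            /* assumption: anything else, back off */
};

/* Decide how to continue from the server's reply line to EHLO. */
enum dialect after_ehlo(const char *reply_line)
{
    /* atoi reads the leading digits, e.g. "502 Error..." gives 502
     * and "250-PIPELINING" gives 250. */
    int code = atoi(reply_line);

    if (code >= 200 && code <= 299)
        return USE_ESMTP;
    if (code >= 500 && code <= 599)
        return USE_SMTP_SEND_HELO;
    return RETRY_LATER;
}
```

A "250-..." reply selects ESMTP, and a "502 Error: command not implemented" reply selects the HELO fallback.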
EHLO in action
220 issaspam-ny01.ssmb.com ESMTP Go ahead
EHLO ngo.dnsalias.org
250-issaspam-ny01.ssmb.com Hello
250-ENHANCEDSTATUSCODES
250-PIPELINING
250-8BITMIME
250-SIZE 26214400
250-DSN
250-DELIVERBY
250 HELP
MAIL FROM: [email protected]
250 . . .
EHLO failing
220 smtp.example.com
EHLO ngo.dnsalias.org
502 Error: command not implemented
HELO ngo.dnsalias.org
250 OK
MAIL FROM: [email protected]
250 . . .
A better way of solving the
problem
• Always embed version information into your
protocols
• The version should be the first piece of
information in any transaction
• Defines the format of the rest of the
transaction
• But, still allow unimplemented commands to
fail gracefully
Nice things about SMTP
• It’s distributed from the get-go, and it scales
– Need more servers? Add them, and update
your MX records
• It’s open and royalty free
– SMTP is fully documented in RFC2821
– Message format is in RFC2822
• Heterogeneous
– Nothing in SMTP ties it to a particular platform
More nice things about SMTP
• It’s resilient, and failures are handled
– MX server not responding? Go try another one
– Are they all down? Wait a bit, and try again
– It distinguishes between temporary errors
• Disk’s full, I can’t accept any mail at the moment, so try again
later
– And permanent errors
• The e-mail address you’ve provided is invalid, I’m never going
to be able to deliver it.
• Hides implementation details from the user
– User doesn’t need to know the route the message takes
Nice things about SMTP..?
• Secure?
– Not really
– Relatively simple to forge mail
– Harder to forge it perfectly
– Does not address encryption or authentication
of message contents
• Nobody’s perfect
Thanks
Questions?
Bonus Slides
Things I wish I knew 10 years ago
• Work for a small company
– You learn a lot very quickly
– The hours can be insane
– You can accomplish a lot very fast
• Work for a large company
– You tend to specialise
– Regular hours
– Bureaucracy is ever-present
More things to know
• Attend conferences
– You learn a lot
– The networking (people kind) is invaluable
– Speaking at them is great for the CV
• It also forces you to think clearly about a subject
– Never neglect the social side
• Travel whenever possible
– San Francisco is great in the summer
Still more things to know
• Always be aware of the Peter Principle
• Read “The Mythical Man-Month”, Brooks
• Learn the Perl programming language
• Stay up to date with the technical journals
• Find time to have a life
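Pseudo-code for a server
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int s;                      // The listening socket handle
    int client;                 // Socket handle for each accepted client
    struct sockaddr_in addr;    // The socket address

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);   // Any local IP address
    addr.sin_port = htons(80);  // We'll listen on port 80 (usually needs root)

    s = socket(AF_INET, SOCK_STREAM, 0);        // Create a TCP socket
    // Assign the address info we specify to the socket
    if (s < 0 || bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return 1;
    listen(s, 5);               // Queue up to 5 pending connections

    while ((client = accept(s, NULL, NULL)) >= 0) {
        // If we're here then something's connected to us.
        // Do whatever we're supposed to do when this happens
        close(client);
    }
    close(s);
    return 0;
}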
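Pseudo-code for a client
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int s;                      // The socket handle
    struct sockaddr_in addr;    // The socket address
    struct hostent *he;         // Info about the remote host

    s = socket(AF_INET, SOCK_STREAM, 0);        // Create a TCP socket
    if (s < 0)
        return 1;

    // Get the IP address of the host we want to connect to
    he = gethostbyname("www.freebsd.org");
    if (he == NULL)
        return 1;

    // Store the IP address, and the port we connect to
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    memcpy(&addr.sin_addr, he->h_addr_list[0], sizeof(addr.sin_addr));
    addr.sin_port = htons(80);  // Port in network byte order

    if (connect(s, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
        // Connected to the remote host.
        // ...
    }
    close(s);
    // All done
    return 0;
}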
[Diagram: DNS lookup path. User, me.example.com, dns.example.com, then the root nameserver, the .org nameserver, and the ns.freebsd.org nameserver]