Transcript Server
1. Networks and the Internet
2. Network programming
3. Web services
A Client-Server Transaction
1. Client process sends request
2. Server process handles request (using its resource)
3. Server process sends response
4. Client process handles response
Note: clients and servers are processes running on hosts
(can be the same or different hosts)
Most network applications are based on the client-server
model:
A server process and one or more client processes
Server manages some resource
Server provides service by manipulating resource for clients
Server activated by request from client (vending machine analogy)
Hardware Organization of a Network Host
[Figure: the CPU chip (register file, ALU) connects over the system bus to the I/O bridge. The I/O bridge connects over the memory bus to main memory and over the I/O bus to the USB controller (mouse, keyboard), graphics adapter (monitor), disk controller (disk), network adapter (network), and expansion slots.]
Computer Networks
A network is a hierarchical system of boxes and wires
organized by geographical proximity
SAN (System Area Network) spans cluster or machine room
Switched Ethernet, Quadrics QSW, …
LAN (Local Area Network) spans a building or campus
Ethernet is most prominent example
WAN (Wide Area Network) spans country or world
Typically high-speed point-to-point phone lines
An internetwork (internet) is an interconnected set of
networks
The Global IP Internet (uppercase “I”) is the most famous example
of an internet (lowercase “i”)
Let’s see how an internet is built from the ground up
Lowest Level: Ethernet Segment
[Figure: hosts connected by 100 Mb/s links to the ports of a hub]
Ethernet segment consists of a collection of hosts connected
by wires (twisted pairs) to a hub
Spans room or floor in a building
Operation
Each Ethernet adapter has a unique 48-bit address (MAC address)
E.g., 00:16:ea:e3:54:e6
Hosts send bits to any other host in chunks called frames
Hub slavishly copies each bit from each port to every other port
Every host sees every bit
Note: Hubs are on their way out. Bridges (switches, routers) became cheap enough
to replace them, which means frames are no longer broadcast to every host
Next Level: Bridged Ethernet Segment
[Figure: several Ethernet segments, each a hub with hosts attached by 100 Mb/s links, joined by bridges X and Y. The two bridges are connected by a 1 Gb/s link. Hosts A, B, and C sit on different segments.]
Spans building or campus
Bridges cleverly learn which hosts are reachable from which
ports and then selectively copy frames from port to port
Conceptual View of LANs
For simplicity, hubs, bridges, and wires are often shown as a
collection of hosts attached to a single wire:
host
host ... host
Next Level: internets
Multiple incompatible LANs can be physically connected by
specialized computers called routers
The connected networks are called an internet
[Figure: LAN 1 and LAN 2, each with several hosts, joined by a chain of routers and WAN links: LAN 1, router, WAN, router, WAN, router, LAN 2]
LAN 1 and LAN 2 might be completely different, totally incompatible
(e.g., Ethernet and Wifi, 802.11*, T1-links, DSL, …)
Logical Structure of an internet
[Figure: two hosts connected through an irregular mesh of six routers]
Ad hoc interconnection of networks
No particular topology
Vastly different router & link capacities
Send packets from source to destination by hopping through
networks
Router forms bridge from one network to another
Different packets may take different routes
The Notion of an internet Protocol
How is it possible to send bits across incompatible LANs
and WANs?
Solution:
protocol software running on each host and router
smooths out the differences between the different networks
Implements an internet protocol (i.e., set of rules)
governs how hosts and routers should cooperate when they
transfer data from network to network
TCP/IP is the protocol for the global IP Internet
What Does an internet Protocol Do?
Provides a naming scheme
An internet protocol defines a uniform format for host addresses
Each host (and router) is assigned at least one of these internet
addresses that uniquely identifies it
Provides a delivery mechanism
An internet protocol defines a standard transfer unit (packet)
Packet consists of header and payload
Header: contains info such as packet size, source and destination
addresses
Payload: contains data bits sent from source host
Transferring Data Over an internet
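[Figure: a client on Host A (attached to LAN1) sends data to a server on Host B (attached to LAN2) through a router that has both a LAN1 adapter and a LAN2 adapter, in eight numbered steps]
(1)-(3) Protocol software on Host A prepends an internet packet header (PH) and a LAN1 frame header (FH1) to the data, and the LAN1 adapter sends the resulting LAN1 frame
(4)-(6) The router's LAN1 adapter receives the frame, the router's protocol software replaces FH1 with a LAN2 frame header (FH2), and the LAN2 adapter sends the resulting LAN2 frame
(7)-(8) Host B's LAN2 adapter receives the frame, and its protocol software strips the headers and delivers the data to the server
PH: internet packet header
FH: LAN frame header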
Other Issues
We are glossing over a number of important questions:
What if different networks have different maximum frame sizes?
(segmentation)
How do routers know where to forward frames?
How are routers informed when the network topology changes?
What if packets get lost?
These (and other) questions are addressed by the area of
systems known as computer networking
Global IP Internet
Most famous example of an internet
Based on the TCP/IP protocol family
IP (Internet Protocol):
Provides basic naming scheme and unreliable delivery capability
of packets (datagrams) from host-to-host
UDP (User Datagram Protocol)
Uses IP to provide unreliable datagram delivery from
process-to-process
TCP (Transmission Control Protocol)
Uses IP to provide reliable byte streams from process-to-process
over connections
Accessed via a mix of Unix file I/O and functions from the
sockets interface
Hardware and Software Organization
of an Internet Application
Internet client host
Internet server host
Client
User code
Server
TCP/IP
Kernel code
TCP/IP
Sockets interface
(system calls)
Hardware interface
(interrupts)
Network
adapter
Hardware
and firmware
Global IP Internet
Network
adapter
A Programmer’s View of the Internet
Hosts are mapped to a set of 32-bit IP addresses
140.192.36.43
The set of IP addresses is mapped to a set of identifiers
called Internet domain names
140.192.36.43 is mapped to cdmlinux.cdm.depaul.edu
IP Addresses
32-bit IP addresses are stored in an IP address struct
IP addresses are always stored in memory in network byte order
(big-endian byte order)
True in general for any integer transferred in a packet header from one
machine to another.
E.g., the port number used to identify an Internet connection.
/* Internet address structure */
struct in_addr {
unsigned int s_addr; /* network byte order (big-endian) */
};
Useful network byte-order conversion functions (“l” = 32 bits, “s” = 16 bits)
htonl: convert uint32_t from host to network byte order
htons: convert uint16_t from host to network byte order
ntohl: convert uint32_t from network to host byte order
ntohs: convert uint16_t from network to host byte order
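To make the byte-order rule concrete, here is a minimal sketch that looks at how a 32-bit value is laid out in memory. The helper names bytes_in_memory and host_is_little_endian are hypothetical and are not part of the sockets interface; only htonl, htons, ntohl, and ntohs come from the system headers.

```c
#include <arpa/inet.h>   /* htonl, htons, ntohl, ntohs */
#include <stdint.h>
#include <string.h>

/* Copy the four bytes of value, in the order they sit in memory, into out[]. */
void bytes_in_memory(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, sizeof(value));
}

/* Return 1 if this host stores the least significant byte first
   (little-endian), 0 if it stores the most significant byte first. */
int host_is_little_endian(void)
{
    unsigned char b[4];
    bytes_in_memory(0x01020304u, b);
    return b[0] == 0x04;
}
```

After htonl, the first byte in memory is always the most significant byte of the address, whatever the host byte order is. On a big-endian host the conversion functions simply return their argument unchanged.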
Dotted Decimal Notation
By convention, each byte in a 32-bit IP address is represented
by its decimal value and separated by a period
IP address: 0x8002C2F2 = 128.2.194.242
Functions for converting between binary IP addresses and
dotted decimal strings:
inet_aton: dotted decimal string → IP address in network byte order
inet_ntoa: IP address in network byte order → dotted decimal string
“n” denotes network representation
“a” denotes application representation
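A short sketch of the two conversion functions in use. The wrapper names dotted_to_host and host_to_dotted are hypothetical, and the _DEFAULT_SOURCE define is an assumption about glibc, where inet_aton may be hidden under strict -std compiler modes.

```c
#define _DEFAULT_SOURCE   /* assumption: exposes inet_aton on glibc in strict modes */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>

/* Parse a dotted decimal string and store the address in HOST byte order.
   Returns 0 on success, -1 if the string is not a valid address. */
int dotted_to_host(const char *dotted, uint32_t *out)
{
    struct in_addr addr;
    if (inet_aton(dotted, &addr) == 0)   /* inet_aton returns 0 on failure */
        return -1;
    *out = ntohl(addr.s_addr);           /* s_addr is in network byte order */
    return 0;
}

/* Format a host-byte-order address as a dotted decimal string in buf. */
void host_to_dotted(uint32_t host_order, char *buf, size_t buflen)
{
    struct in_addr addr;
    addr.s_addr = htonl(host_order);
    /* inet_ntoa returns a pointer to a static buffer, so copy the result. */
    snprintf(buf, buflen, "%s", inet_ntoa(addr));
}
```

With these helpers, "128.2.194.242" converts to 0x8002C2F2 and back, matching the dotted decimal example above.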
Internet Domain Names
unnamed root
.net
.edu
smith depaul
cti
.gov
berkeley
ece
.com
amazon
www
207.171.166.252
cstsis
reed
140.192.32.110
ctilinux3
140.192.36.43
First-level domain names
Second-level domain names
Third-level domain names
Domain Naming System (DNS)
The Internet maintains a mapping between IP addresses and
domain names in a huge worldwide distributed database called
DNS
Conceptually, programmers can view the DNS database as a collection of
millions of host entry structures:
/* DNS host entry structure */
struct hostent {
    char   *h_name;       /* official domain name of host */
    char  **h_aliases;    /* null-terminated array of domain names */
    int     h_addrtype;   /* host address type (AF_INET) */
    int     h_length;     /* length of an address, in bytes */
    char  **h_addr_list;  /* null-terminated array of in_addr structs */
};
Functions for retrieving host entries from DNS:
gethostbyname: query key is a DNS domain name.
gethostbyaddr: query key is an IP address.
Properties of DNS Host Entries
Each host entry is an equivalence class of domain names and
IP addresses
Each host has a locally defined domain name localhost
which always maps to the loopback address 127.0.0.1
Different kinds of mappings are possible:
Simple case: one-to-one mapping between domain name and IP address:
reed.cs.depaul.edu maps to 140.192.32.110
Multiple domain names mapped to the same IP address:
eecs.mit.edu and cs.mit.edu both map to 18.62.1.6
Multiple domain names mapped to multiple IP addresses:
google.com maps to multiple IP addresses
A Program That Queries DNS
#include "csapp.h"

/* argv[1] is a domain name or dotted decimal IP addr */
int main(int argc, char **argv)
{
    char **pp;
    struct in_addr addr;
    struct hostent *hostp;

    /* A string that parses as dotted decimal is looked up by address */
    if (inet_aton(argv[1], &addr) != 0)
        hostp = Gethostbyaddr((const char *)&addr, sizeof(addr),
                              AF_INET);
    else
        hostp = Gethostbyname(argv[1]);

    printf("official hostname: %s\n", hostp->h_name);
    for (pp = hostp->h_aliases; *pp != NULL; pp++)
        printf("alias: %s\n", *pp);
    for (pp = hostp->h_addr_list; *pp != NULL; pp++) {
        addr.s_addr = ((struct in_addr *)*pp)->s_addr;
        printf("address: %s\n", inet_ntoa(addr));
    }
    return 0;
}
Using DNS Program
$ ./hostinfo reed.cs.depaul.edu
official hostname: reed.cti.depaul.edu
alias: reed.cs.depaul.edu
address: 140.192.39.29
$ ./hostinfo 140.192.39.29
official hostname: reed.cti.depaul.edu
address: 140.192.39.29
$ ./hostinfo www.google.com
official hostname: www.google.com
address: 74.125.225.20
address: 74.125.225.17
address: 74.125.225.18
address: 74.125.225.19
address: 74.125.225.16
Querying DNS with dig
Domain Information Groper (dig) provides a scriptable
command line interface to DNS
$ dig +short reed.cs.depaul.edu
reed.cti.depaul.edu.
140.192.39.29
$ dig +short -x 140.192.39.29
ipdstdwkr.cstcis.cti.depaul.edu.
reed.cti.depaul.edu.
$ dig +short www.google.com
74.125.225.19
74.125.225.16
74.125.225.20
74.125.225.17
74.125.225.18
Internet Connections
Clients and servers communicate by sending streams of bytes
over connections:
Point-to-point, full-duplex (2-way communication), and reliable.
A socket is an endpoint of a connection
Socket address is an IPaddress:port pair
A port is a 16-bit integer that identifies a process:
Ephemeral port: Assigned automatically on client when client makes a
connection request
Well-known port: Associated with some service provided by a server
(e.g., port 80 is associated with Web servers)
A connection is uniquely identified by the socket addresses
of its endpoints (socket pair)
(cliaddr:cliport, servaddr:servport)
Putting it all Together:
Anatomy of an Internet Connection
Client socket address
128.2.194.242:51213
Client
Client host address
128.2.194.242
Server socket address
208.216.181.15:80
Connection socket pair
(128.2.194.242:51213, 208.216.181.15:80)
Server
(port 80)
Server host address
208.216.181.15
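As an illustration of the socket pair notation, the sketch below builds two IPv4 socket addresses and prints them in the (cliaddr:cliport, servaddr:servport) form. The function names make_sockaddr and format_socket_pair are hypothetical. inet_pton and inet_ntop are the standard POSIX conversion calls, used here instead of inet_aton and inet_ntoa.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/* Build an IPv4 socket address from a dotted decimal string and a port.
   Returns 0 on success, -1 if the address string is invalid. */
int make_sockaddr(const char *dotted, unsigned short port,
                  struct sockaddr_in *out)
{
    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_port = htons(port);              /* network byte order */
    return inet_pton(AF_INET, dotted, &out->sin_addr) == 1 ? 0 : -1;
}

/* Write "(cliaddr:cliport, servaddr:servport)" into buf. */
void format_socket_pair(const struct sockaddr_in *cli,
                        const struct sockaddr_in *srv,
                        char *buf, size_t buflen)
{
    char c[INET_ADDRSTRLEN], s[INET_ADDRSTRLEN];
    inet_ntop(AF_INET, &cli->sin_addr, c, sizeof(c));
    inet_ntop(AF_INET, &srv->sin_addr, s, sizeof(s));
    snprintf(buf, buflen, "(%s:%u, %s:%u)",
             c, (unsigned)ntohs(cli->sin_port),
             s, (unsigned)ntohs(srv->sin_port));
}
```

Given the client address 128.2.194.242 with port 51213 and the server address 208.216.181.15 with port 80, format_socket_pair produces the same socket pair string as the connection example above.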
Clients
Examples of client programs
Web browsers, ftp, telnet, ssh
How does a client find the server?
The IP address in the server socket address identifies the host
(more precisely, an adapter on the host)
The (well-known) port in the server socket address identifies the
service, and thus implicitly identifies the server process that performs
that service.
Examples of well-known ports
Port 7: Echo server
Port 23: Telnet server
Port 25: Mail server
Port 80: Web server
Using Ports to Identify Services
[Figure: a client sends a service request for 128.2.194.242:80, and the kernel on server host 128.2.194.242 delivers it to the Web server (port 80), not to the echo server (port 7)]
[Figure: a client sends a service request for 128.2.194.242:7, and the kernel on the same server host delivers it to the echo server (port 7), not to the Web server (port 80)]
Servers
Servers are long-running processes (daemons)
Created at boot-time (typically) by the init process (process 1)
Run continuously until the machine is turned off
Each server waits for requests to arrive on a well-known port
associated with a particular service
Port 7: echo server
Port 23: telnet server
Port 25: mail server
Port 80: HTTP server
A machine that runs a server process is also often referred to
as a “server”
Server Examples
Web server (port 80)
Resource: files/compute cycles (CGI programs)
Service: retrieves files and runs CGI programs on behalf of the client
FTP server (20, 21)
Resource: files
Service: stores and retrieves files
Telnet server (23)
Resource: terminal
Service: proxies a terminal on the server machine
Mail server (25)
Resource: email “spool” file
Service: stores mail messages in spool file
See /etc/services for a comprehensive list of the port mappings on a Linux machine
Sockets Interface
Created in the early 80’s as part of the original Berkeley
distribution of Unix that contained an early version of the
Internet protocols
Provides a user-level interface to the network
Underlying basis for all Internet applications
Based on client/server programming model
Sockets
What is a socket?
To the kernel, a socket is an endpoint of communication
To an application, a socket is a file descriptor that lets the
application read/write from/to the network
Remember: All Unix I/O devices, including networks, are
modeled as files
Clients and servers communicate with each other by
reading from and writing to socket descriptors
Client
clientfd
Server
serverfd
The main distinction between regular file I/O and socket
I/O is how the application “opens” the socket descriptors
Example: Echo Client and Server
On Client
On Server
$ ./echoserveri 28888
$ ./echoclient localhost 28888
server connected to
localhost.localdomain (127.0.0.1)
type: hello there
server received 12 bytes
echo: hello there
type: ^D
Connection closed
Overview of the Sockets Interface
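[Figure: sequence of calls in a client/server session]
Client: open_clientfd (socket, connect), then rio_writen, rio_readlineb, close
Server: open_listenfd (socket, bind, listen), then accept, rio_readlineb, rio_writen, rio_readlineb (returns EOF after the client closes), close, then await connection request from next client
The client's connect sends the connection request that the server's accept receives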
Socket Address Structures
Generic socket address:
For address arguments to connect, bind, and accept
Necessary only because C did not have generic (void *) pointers
when the sockets interface was designed
struct sockaddr {
    unsigned short sa_family;    /* protocol family */
    char           sa_data[14];  /* address data. */
};
[Figure: 16-byte layout: sa_family followed by 14 family-specific bytes]
Socket Address Structures
Internet-specific socket address:
Must cast (struct sockaddr_in *) to (struct sockaddr *) for connect,
bind, and accept
struct sockaddr_in {
    unsigned short sin_family;   /* address family (always AF_INET) */
    unsigned short sin_port;     /* port num in network byte order */
    struct in_addr sin_addr;     /* IP addr in network byte order */
    unsigned char  sin_zero[8];  /* pad to sizeof(struct sockaddr) */
};
[Figure: 16-byte layout: sin_family (AF_INET), sin_port, sin_addr, then 8 zero bytes of padding. sin_family overlays sa_family, and the remaining fields overlay the family-specific bytes.]
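The cast between the two address structures can be sketched as follows. The function names init_inet_addr and family_of are hypothetical. The point of the sketch is that code holding only the generic pointer can still read the family field, because both structures start with it.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

typedef struct sockaddr SA;   /* the same shorthand the slides use */

/* Fill an Internet socket address from a host-byte-order IP and port. */
void init_inet_addr(struct sockaddr_in *addr, uint32_t ip_host_order,
                    unsigned short port_host_order)
{
    memset(addr, 0, sizeof(*addr));              /* also zeroes the padding */
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port_host_order);     /* network byte order */
    addr->sin_addr.s_addr = htonl(ip_host_order);
}

/* Read the address family through the generic pointer type, which is how
   connect, bind, and accept receive their address arguments. */
int family_of(const SA *generic)
{
    return generic->sa_family;
}
```

A caller would write family_of((SA *)&addr) for a struct sockaddr_in named addr, mirroring the cast used in the connect and bind calls.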
Echo Client Main Routine
#include "csapp.h"

/* usage: ./echoclient host port */
int main(int argc, char **argv)
{
    int clientfd, port;
    char *host, buf[MAXLINE];
    rio_t rio;

    host = argv[1]; port = atoi(argv[2]);
    clientfd = Open_clientfd(host, port);
    Rio_readinitb(&rio, clientfd);
    printf("type:"); fflush(stdout);
    while (Fgets(buf, MAXLINE, stdin) != NULL) {   /* Read input line */
        Rio_writen(clientfd, buf, strlen(buf));    /* Send line to server */
        Rio_readlineb(&rio, buf, MAXLINE);         /* Receive line from server */
        printf("echo:");
        Fputs(buf, stdout);                        /* Print server response */
        printf("type:"); fflush(stdout);
    }
    Close(clientfd);
    exit(0);
}
Overview of the Sockets Interface
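[Figure repeated from the earlier overview, highlighting connection setup: the client calls open_clientfd (socket, connect), the server calls open_listenfd (socket, bind, listen), and the server's accept receives the client's connection request]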
Echo Client: open_clientfd
This function opens a connection from the client to the server at
hostname:port
int open_clientfd(char *hostname, int port) {
    int clientfd;
    struct hostent *hp;
    struct sockaddr_in serveraddr;

    /* Create socket */
    if ((clientfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
        return -1; /* check errno for cause of error */

    /* Create address: fill in the server's IP address and port */
    if ((hp = gethostbyname(hostname)) == NULL)
        return -2; /* check h_errno for cause of error */
    bzero((char *) &serveraddr, sizeof(serveraddr));
    serveraddr.sin_family = AF_INET;
    bcopy((char *)hp->h_addr_list[0],
          (char *)&serveraddr.sin_addr.s_addr, hp->h_length);
    serveraddr.sin_port = htons(port);

    /* Establish a connection with the server */
    if (connect(clientfd, (SA *) &serveraddr,
                sizeof(serveraddr)) < 0)
        return -1;
    return clientfd;
}
Echo Client: open_clientfd
(socket)
socket creates a socket descriptor on the client
Just allocates & initializes some internal data structures
AF_INET: indicates that the socket is associated with Internet protocols
SOCK_STREAM: selects a reliable byte stream connection
provided by TCP
int clientfd; /* socket descriptor */
if ((clientfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
return -1; /* check errno for cause of error */
... <more>
Echo Client: open_clientfd
(gethostbyname)
The client then builds the server’s Internet address
int clientfd;                  /* socket descriptor */
struct hostent *hp;            /* DNS host entry */
struct sockaddr_in serveraddr; /* server's IP address */
...
/* fill in the server's IP address and port */
if ((hp = gethostbyname(hostname)) == NULL)
return -2; /* check h_errno for cause of error */
bzero((char *) &serveraddr, sizeof(serveraddr));
serveraddr.sin_family = AF_INET;
serveraddr.sin_port = htons(port);
bcopy((char *)hp->h_addr_list[0],
(char *)&serveraddr.sin_addr.s_addr, hp->h_length);
Echo Client: open_clientfd
(connect)
Finally the client creates a connection with the server
Client process suspends (blocks) until the connection is created
After resuming, the client is ready to begin exchanging messages with the
server via Unix I/O calls on descriptor clientfd
int clientfd;                  /* socket descriptor */
struct sockaddr_in serveraddr; /* server address */
typedef struct sockaddr SA;    /* generic sockaddr */
...
/* Establish a connection with the server */
if (connect(clientfd, (SA *)&serveraddr, sizeof(serveraddr)) < 0)
return -1;
return clientfd;
}
Echo Server: Main Routine
int main(int argc, char **argv) {
    int listenfd, connfd, port;
    socklen_t clientlen;   /* accept takes a socklen_t length */
    struct sockaddr_in clientaddr;
    struct hostent *hp;
    char *haddrp;
    unsigned short client_port;

    port = atoi(argv[1]); /* the server listens on a port passed
                             on the command line */
    listenfd = open_listenfd(port);
    while (1) {
        clientlen = sizeof(clientaddr);
        connfd = Accept(listenfd, (SA *)&clientaddr, &clientlen);
        hp = Gethostbyaddr((const char *)&clientaddr.sin_addr.s_addr,
                           sizeof(clientaddr.sin_addr.s_addr), AF_INET);
        haddrp = inet_ntoa(clientaddr.sin_addr);
        client_port = ntohs(clientaddr.sin_port);
        printf("server connected to %s (%s), port %u\n",
               hp->h_name, haddrp, client_port);
        echo(connfd);
        Close(connfd);
    }
}
Office Telephone Analogy for Server
Socket: Buy a phone
Bind: Tell the local administrator what number you want to use
Listen: Plug the phone in
Accept: Answer the phone when it rings
Echo Server: open_listenfd
int open_listenfd(int port)
{
int listenfd, optval=1;
struct sockaddr_in serveraddr;
/* Create a socket descriptor */
if ((listenfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
return -1;
/* Eliminates "Address already in use" error from bind. */
if (setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR,
(const void *)&optval , sizeof(int)) < 0)
return -1;
... <more>
Echo Server: open_listenfd (cont.)
...
/* Listenfd will be an endpoint for all requests to port
on any IP address for this host */
bzero((char *) &serveraddr, sizeof(serveraddr));
serveraddr.sin_family = AF_INET;
serveraddr.sin_addr.s_addr = htonl(INADDR_ANY);
serveraddr.sin_port = htons((unsigned short)port);
if (bind(listenfd, (SA *)&serveraddr, sizeof(serveraddr)) < 0)
return -1;
/* Make it a listening socket ready to accept
connection requests */
if (listen(listenfd, LISTENQ) < 0)
return -1;
return listenfd;
}
Echo Server: open_listenfd
(socket)
socket creates a socket descriptor on the server
AF_INET: indicates that the socket is associated with Internet protocols
SOCK_STREAM: selects a reliable byte stream connection (TCP)
int listenfd; /* listening socket descriptor */
/* Create a socket descriptor */
if ((listenfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
return -1;
Echo Server: open_listenfd
(setsockopt)
The socket can be given some attributes
...
/* Eliminates "Address already in use" error from bind(). */
if (setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR,
(const void *)&optval , sizeof(int)) < 0)
return -1;
Handy trick that allows us to rerun the server immediately
after we kill it
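Otherwise we would have to wait for the old connections to leave the TCP TIME_WAIT state (from about 15 seconds to a few minutes, depending on the system)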
Eliminates “Address already in use” error from bind()
Strongly suggest you do this for all your servers to simplify
debugging
Echo Server: open_listenfd
(initialize socket address)
Initialize socket with server port number
Accept connection from any IP address
struct sockaddr_in serveraddr; /* server's socket addr */
...
/* listenfd will be an endpoint for all requests to port
on any IP address for this host */
bzero((char *) &serveraddr, sizeof(serveraddr));
serveraddr.sin_family = AF_INET;
serveraddr.sin_port = htons((unsigned short)port);
serveraddr.sin_addr.s_addr = htonl(INADDR_ANY);
IP addr and port stored in network (big-endian) byte order
[Figure: sockaddr_in layout with sin_family = AF_INET, sin_port, sin_addr = INADDR_ANY, and 8 zero bytes of padding]
Echo Server: open_listenfd
(bind)
bind associates the socket with the socket address we just
created
int listenfd;                  /* listening socket */
struct sockaddr_in serveraddr; /* server's socket addr */
...
/* listenfd will be an endpoint for all requests to port
on any IP address for this host */
if (bind(listenfd, (SA *)&serveraddr, sizeof(serveraddr)) < 0)
return -1;
Echo Server: open_listenfd
(listen)
listen indicates that this socket will accept connection
(connect) requests from clients
LISTENQ is a constant indicating how many pending connection
requests are allowed to queue
int listenfd; /* listening socket */
...
/* Make it a listening socket ready to accept connection requests */
if (listen(listenfd, LISTENQ) < 0)
return -1;
return listenfd;
}
We’re finally ready to enter the main server loop that
accepts and processes client connection requests.
Echo Server: Main Loop
The server loops endlessly, waiting for connection
requests, then reading input from the client, and echoing
the input back to the client.
main() {
/* create and configure the listening socket */
while(1) {
/* Accept(): wait for a connection request */
/* echo(): read and echo input lines from client til EOF */
/* Close(): close the connection */
}
}
Echo Server: accept
accept() blocks waiting for a connection request
int listenfd;   /* listening descriptor */
int connfd;     /* connected descriptor */
struct sockaddr_in clientaddr;
socklen_t clientlen;   /* accept takes a socklen_t length */

clientlen = sizeof(clientaddr);
connfd = Accept(listenfd, (SA *)&clientaddr, &clientlen);
accept returns a connected descriptor (connfd) with
the same properties as the listening descriptor
(listenfd)
Returns when the connection between client and server is created
and ready for I/O transfers
All I/O with the client will be done via the connected socket
accept also fills in the client's socket address (IP address and port) in clientaddr
Echo Server: accept Illustrated
1. Server blocks in accept, waiting for connection request on listening descriptor listenfd (descriptor 3)
2. Client makes connection request by calling and blocking in connect
3. Server returns connfd (descriptor 4) from accept. Client returns from connect. Connection is now established between clientfd and connfd
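The three steps can be tried inside a single process over the loopback interface. This is a minimal sketch under stated assumptions: the function name loopback_echo_once is hypothetical, port 0 asks the kernel for any free port, and one process plays both roles. That works only because the kernel completes the connection on the listening socket before accept is called, so connect can return first.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set up a listening socket, connect to it, accept the connection, send
   msg from the client side, echo it on the server side, and store what
   the client reads back in reply. Returns 0 on success, -1 on error. */
int loopback_echo_once(const char *msg, char *reply, size_t cap)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    char tmp[256];
    size_t n = strlen(msg), got;
    ssize_t r;
    int listenfd = -1, clientfd = -1, connfd = -1, rc = -1;

    if (n == 0 || n >= sizeof(tmp) || n >= cap)
        return -1;

    /* Server side: socket, bind, listen */
    if ((listenfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) goto done;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);             /* let the kernel pick a port */
    if (bind(listenfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto done;
    if (listen(listenfd, 1) < 0) goto done;
    /* Find out which port the kernel assigned */
    if (getsockname(listenfd, (struct sockaddr *)&addr, &len) < 0) goto done;

    /* Client side: socket, connect */
    if ((clientfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) goto done;
    if (connect(clientfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto done;

    /* Server side: accept returns a new connected descriptor */
    if ((connfd = accept(listenfd, NULL, NULL)) < 0) goto done;

    /* Client writes, server reads and echoes, client reads the echo */
    if (write(clientfd, msg, n) != (ssize_t)n) goto done;
    for (got = 0; got < n; got += (size_t)r)
        if ((r = read(connfd, tmp + got, n - got)) <= 0) goto done;
    if (write(connfd, tmp, n) != (ssize_t)n) goto done;
    for (got = 0; got < n; got += (size_t)r)
        if ((r = read(clientfd, reply + got, n - got)) <= 0) goto done;
    reply[n] = '\0';
    rc = 0;

done:
    if (connfd >= 0) close(connfd);
    if (clientfd >= 0) close(clientfd);
    if (listenfd >= 0) close(listenfd);
    return rc;
}
```

Note that listenfd and connfd are distinct descriptors throughout: the listening descriptor stays open for further requests while the connected descriptor carries the data.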
Connected vs. Listening Descriptors
Listening descriptor
End point for client connection requests
Created once and exists for lifetime of the server
Connected descriptor
End point of the connection between client and server
A new descriptor is created each time the server accepts a
connection request from a client
Exists only as long as it takes to service client
Why the distinction?
Allows for concurrent servers that can communicate over many
client connections simultaneously
E.g., Each time we receive a new request, we fork a child to
handle the request
Echo Server: Identifying the Client
The server can determine the domain name, IP address,
and port of the client
struct hostent *hp; /* pointer to DNS host entry */
char *haddrp;       /* pointer to dotted decimal string */
unsigned short client_port;
hp = Gethostbyaddr((const char *)&clientaddr.sin_addr.s_addr,
sizeof(clientaddr.sin_addr.s_addr), AF_INET);
haddrp = inet_ntoa(clientaddr.sin_addr);
client_port = ntohs(clientaddr.sin_port);
printf("server connected to %s (%s), port %u\n",
hp->h_name, haddrp, client_port);
Echo Server: echo
The server uses RIO to read and echo text lines until EOF
(end-of-file) is encountered.
EOF notification caused by client calling close(clientfd)
void echo(int connfd)
{
    size_t n;
    char buf[MAXLINE];
    rio_t rio;

    Rio_readinitb(&rio, connfd);
    while ((n = Rio_readlineb(&rio, buf, MAXLINE)) != 0) {
        upper_case(buf);   /* helper defined elsewhere (not shown) */
        Rio_writen(connfd, buf, n);
        printf("server received %d bytes\n", (int)n);   /* cast: n is a size_t */
    }
}
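Rio_writen and Rio_readlineb come from csapp.h, which is not shown here. As a rough idea of the problem such a wrapper addresses, the sketch below writes a buffer in full even when write returns a short count or is interrupted by a signal. The name write_all is hypothetical, and this is not the RIO implementation.

```c
#include <errno.h>
#include <unistd.h>

/* Write exactly n bytes from buf to fd, retrying after short counts and
   after interruptions by signal handlers.
   Returns n on success, -1 on error (errno is set by write). */
ssize_t write_all(int fd, const void *buf, size_t n)
{
    const char *p = buf;
    size_t left = n;

    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0) {
            if (errno == EINTR)     /* interrupted: try the same write again */
                continue;
            return -1;              /* real error */
        }
        p += w;                     /* advance past the bytes already written */
        left -= (size_t)w;
    }
    return (ssize_t)n;
}
```

Short counts are common on sockets, which is why network code tends to wrap read and write in loops like this one.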
Testing Servers Using telnet
The telnet program is invaluable for testing servers
that transmit ASCII strings over Internet connections
Our simple echo server
Web servers
Mail servers
Usage:
unix> telnet <host> <portnumber>
Creates a connection with a server running on <host> and
listening on port <portnumber>
Testing the Echo Server With telnet
Use separate SSH sessions for the server and for telnet
$ ./echoserveri 28888
$ telnet localhost 28888
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Hello
hello
For More Information
W. Richard Stevens, “Unix Network Programming:
Networking APIs: Sockets and XTI”, Volume 1, Second
Edition, Prentice Hall, 1998
THE network programming bible
Unix Man Pages
Good for detailed information about specific functions
Web Services
Web History
1989:
Tim Berners-Lee (CERN) writes internal proposal to develop a
distributed hypertext system.
Connects “a web of notes with links.”
Intended to help CERN physicists in large projects share and
manage information
1990:
Tim Berners-Lee writes a graphical browser for NeXT machines.
Web History (cont)
1992
NCSA server released
26 WWW servers worldwide
1993
Marc Andreessen releases first version of NCSA Mosaic browser
Mosaic versions released for Windows, Mac, and Unix.
Web (port 80) traffic at 1% of NSFNET backbone traffic.
Over 200 WWW servers worldwide.
1994
Andreessen and colleagues leave NCSA to form “Mosaic
Communications Corp” (predecessor to Netscape).
Internet Hosts
Web Servers
Clients and servers communicate using the HyperText Transfer Protocol
(HTTP)
Client and server establish TCP connection
Client requests content
Server responds with requested content
Client and server close connection (eventually)
Current version is HTTP/1.1
RFC 2616, June, 1999.
http://www.w3.org/Protocols/rfc2616/rfc2616.html
[Figure: the Web client (browser) sends an HTTP request to the Web server, which returns an HTTP response (content)]
[Figure: protocol layers: HTTP carries Web content, TCP carries streams, IP carries datagrams]
Web Content
Web servers return content to clients
content: a sequence of bytes with an associated MIME (Multipurpose
Internet Mail Extensions) type
Example MIME types
text/html               HTML document
text/plain              Unformatted text
application/postscript  PostScript document
image/gif               Binary image encoded in GIF format
image/jpeg              Binary image encoded in JPEG format
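A server has to choose a MIME type for each file it returns. One simple approach, sketched here, is to map the filename extension to a type. The function name mime_type_for and the extension table are assumptions made for illustration, not taken from any particular server.

```c
#include <string.h>

/* Return a MIME type for filename based on its extension.
   Unknown or missing extensions fall back to text/plain. */
const char *mime_type_for(const char *filename)
{
    const char *dot = strrchr(filename, '.');   /* last '.' in the name */

    if (dot == NULL)
        return "text/plain";
    if (strcmp(dot, ".html") == 0)
        return "text/html";
    if (strcmp(dot, ".ps") == 0)
        return "application/postscript";
    if (strcmp(dot, ".gif") == 0)
        return "image/gif";
    if (strcmp(dot, ".jpg") == 0 || strcmp(dot, ".jpeg") == 0)
        return "image/jpeg";
    return "text/plain";
}
```

The returned string is what the server would place in the Content-Type response header.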
Static and Dynamic Content
The content returned in HTTP responses can be either
static or dynamic.
Static content: content stored in files and retrieved in response to
an HTTP request
Examples: HTML files, images, audio clips.
Request identifies content file
Dynamic content: content produced on-the-fly in response to an
HTTP request
Example: content produced by a program executed by the
server on behalf of the client.
Request identifies file containing executable code
Bottom line: All Web content is associated with a file that
is managed by the server.
URLs
Each file managed by a server has a unique name called a URL
(Uniform Resource Locator)
URLs for static content:
http://reed.cs.depaul.edu:80/index.html
http://reed.cs.depaul.edu/index.html
http://reed.cs.depaul.edu
Identifies a file called index.html, managed by a Web server at
reed.cs.depaul.edu that is listening on port 80.
URLs for dynamic content:
http://riely373.cdm.depaul.edu:8000/cgi-bin/adder?15000&213
Identifies an executable file called adder, managed by a Web server
at riely373.cdm.depaul.edu that is listening on port 8000,
that should be called with two argument strings: 15000 and 213.
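To illustrate how the suffix of a dynamic-content URL breaks down, the sketch below splits a URI at the "?" into a path and an argument string, then reads two integer arguments separated by "&". The function names split_uri and parse_two_ints are hypothetical, and the caller is assumed to pass non-empty buffers.

```c
#include <stdio.h>
#include <string.h>

/* Split "path?args" into path and args. If there is no '?', the whole
   URI is the path and args becomes the empty string.
   Returns 1 if a '?' was present, 0 otherwise. */
int split_uri(const char *uri, char *path, size_t pathcap,
              char *args, size_t argscap)
{
    const char *q = strchr(uri, '?');

    if (q == NULL) {
        snprintf(path, pathcap, "%s", uri);
        if (argscap > 0)
            args[0] = '\0';
        return 0;
    }
    snprintf(path, pathcap, "%.*s", (int)(q - uri), uri);  /* text before '?' */
    snprintf(args, argscap, "%s", q + 1);                  /* text after '?' */
    return 1;
}

/* Parse an argument string of the form "a&b" into two ints.
   Returns 0 on success, -1 if the string does not have that form. */
int parse_two_ints(const char *args, int *a, int *b)
{
    return sscanf(args, "%d&%d", a, b) == 2 ? 0 : -1;
}
```

For the suffix /cgi-bin/adder?15000&213, split_uri yields the path /cgi-bin/adder and the argument string 15000&213, which parse_two_ints turns into 15000 and 213.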
How Clients and Servers Use URLs
Example URL: http://www.depaul.edu:80/index.html
Clients use prefix (http://www.depaul.edu:80) to infer:
What kind of server to contact (Web server)
Where the server is (www.depaul.edu)
What port it is listening on (80)
Servers use suffix (/index.html) to:
Determine if request is for static or dynamic content.
No hard and fast rules for this.
Convention: executables reside in cgi-bin directory
Find file on file system.
Initial “/” in suffix denotes home directory for requested content.
Minimal suffix is “/”, which all servers expand to some default
home page (e.g., index.html).
Anatomy of an HTTP Transaction
$ telnet reed.cs.depaul.edu 80
Trying 140.192.39.42...
Connected to reed.cti.depaul.edu.
Escape character is '^]'.
GET / HTTP/1.1
host: reed.cs.depaul.edu
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Accept-Ranges: bytes
ETag: W/"2285-1357855910000"
Last-Modified: Thu, 10 Jan 2013 22:11:50 GMT
Content-Type: text/html
Content-Length: 2285
Date: Mon, 04 Mar 2013 04:01:00 GMT
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8"
...
HTTP Requests
HTTP request is a request line, followed by zero or more
request headers
Request line: <method> <uri> <version>
<version> is HTTP version of request (HTTP/1.0 or
HTTP/1.1)
<uri> is typically URL for proxies, URL suffix for servers.
A URL is a type of URI (Uniform Resource Identifier)
See http://www.ietf.org/rfc/rfc2396.txt
<method> is either GET, POST, OPTIONS, HEAD, PUT,
DELETE, or TRACE.
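A request line can be split into its three fields with a single sscanf call. This is a minimal sketch: the struct request_line type, the function names, and the field sizes are assumptions chosen for illustration.

```c
#include <stdio.h>
#include <string.h>

/* The three fields of an HTTP request line. */
struct request_line {
    char method[16];
    char uri[1024];
    char version[16];
};

/* Parse "<method> <uri> <version>" from line. The field widths in the
   format string keep sscanf from overflowing the buffers, and %s stops
   at whitespace, so a trailing "\r\n" is not copied.
   Returns 0 on success, -1 if fewer than three fields were found. */
int parse_request_line(const char *line, struct request_line *out)
{
    if (sscanf(line, "%15s %1023s %15s",
               out->method, out->uri, out->version) != 3)
        return -1;
    return 0;
}

/* Return 1 if method is one of the methods listed above, else 0. */
int is_known_method(const char *method)
{
    static const char *known[] = {
        "GET", "POST", "OPTIONS", "HEAD", "PUT", "DELETE", "TRACE"
    };
    size_t i;

    for (i = 0; i < sizeof(known) / sizeof(known[0]); i++)
        if (strcmp(method, known[i]) == 0)
            return 1;
    return 0;
}
```

For the line "GET /index.html HTTP/1.1\r\n" the fields come out as GET, /index.html, and HTTP/1.1.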
HTTP Requests (cont)
HTTP methods:
GET: Retrieve static or dynamic content
Arguments for dynamic content are in URI
Workhorse method (99% of requests)
POST: Retrieve dynamic content
Arguments for dynamic content are in the request body
OPTIONS: Get server or file attributes
HEAD: Like GET but no data in response body
PUT: Write a file to the server!
DELETE: Delete a file on the server!
TRACE: Echo request in response body
Useful for debugging.
Request headers: <header name>: <header data>
Provide additional information to the server.
HTTP Versions
Major differences between HTTP/1.1 and HTTP/1.0
HTTP/1.0 uses a new connection for each transaction.
HTTP/1.1 also supports persistent connections
multiple transactions over the same connection
Connection: Keep-Alive
HTTP/1.1 requires HOST header
Host: www.depaul.edu
Makes it possible to host multiple websites at single Internet
host
HTTP/1.1 supports chunked encoding (described later)
Transfer-Encoding: chunked
HTTP/1.1 adds additional support for caching
HTTP Responses
HTTP response is a response line followed by zero or more
response headers.
Response line:
<version> <status code> <status msg>
<version> is HTTP version of the response.
<status code> is numeric status.
<status msg> is corresponding English text.
200  OK         Request was handled without error
301  Moved      Provide alternate URL
403  Forbidden  Server lacks permission to access file
404  Not found  Server couldn’t find the file.
Response headers: <header name>: <header data>
Provide additional information about response
Content-Type: MIME type of content in response body.
Content-Length: Length of content in response body.
GET Request From Chrome Browser
URI is just the suffix, not the entire URL
GET / HTTP/1.1\r\n
Host: reed.cs.depaul.edu\r\n
Connection: keep-alive\r\n
Accept:
text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.22
(KHTML, like Gecko) Chrome/25.0.1364.97 Safari/537.22\r\n
Accept-Encoding: gzip,deflate,sdch\r\n
Accept-Language: en-US,en;q=0.8\r\n
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3\r\n
Cookie:__utma=114012434.756988690.1360702406.1360702406.1360874291.2;
__utmz=114012434.1360874291.2.2.utmcsr=cdm.depaul.edu|utmccn=(referral
)|utmcmd=referral|utmcct=/academics/Pages/bs%20computerscience%20stand
ard.aspx\r\n
\r\n
GET Response From Apache Server
HTTP/1.1 200 OK\r\n
Server: Apache-Coyote/1.1\r\n
Accept-Ranges: bytes\r\n
ETag: W/"2285-1357855910000"\r\n
Last-Modified: Thu, 10 Jan 2013 22:11:50 GMT\r\n
Content-Type: text/html\r\n
Content-Length: 2285\r\n
Date: Mon, 04 Mar 2013 04:58:40 GMT\r\n
\r\n
<html>\n
<head>\n
...
Tiny Web Server
Tiny Web server described in text
Tiny is a sequential Web server.
Serves static and dynamic content to real browsers.
text files, HTML files, GIF and JPEG images.
226 lines of commented C code.
Not as complete or robust as a real web server
Tiny Operation
Read request from client
Split into method / uri / version
If not GET, then return error
If URI contains “cgi-bin” then serve dynamic content
Fork process to execute program
Otherwise serve static content
Copy file to output
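The decision steps above can be sketched as a single classification routine. The names classify_request and enum action are hypothetical and chosen for this example; the real tiny.c organizes this logic differently.

```c
#include <assert.h>
#include <string.h>

/* Possible outcomes for one request (names assumed for this sketch) */
enum action { SERVE_STATIC, SERVE_DYNAMIC, REJECT };

/* Hypothetical helper: decide how to handle a request.
 * For dynamic content, *args is set to the text after '?' (or "" if none)
 * and uri is cut in place so that it ends before the '?'. */
enum action classify_request(const char *method, char *uri, const char **args)
{
    assert(method && uri && args);
    *args = "";
    if (strcmp(method, "GET") != 0)
        return REJECT;                 /* only GET is supported */
    if (strstr(uri, "cgi-bin") == NULL)
        return SERVE_STATIC;           /* plain file: copy it to the output */
    char *q = strchr(uri, '?');
    if (q != NULL) {
        *q = '\0';                     /* uri now names the program */
        *args = q + 1;                 /* arguments follow the '?' */
    }
    return SERVE_DYNAMIC;              /* run the program in a child process */
}
```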
Tiny Serving Static Content
From tiny.c:
/* Send response headers to client */
get_filetype(filename, filetype);
sprintf(buf, "HTTP/1.0 200 OK\r\n");
sprintf(buf, "%sServer: Tiny Web Server\r\n", buf);
sprintf(buf, "%sContent-length: %d\r\n", buf, filesize);
sprintf(buf, "%sContent-type: %s\r\n\r\n", buf, filetype);
Rio_writen(fd, buf, strlen(buf));

/* Send response body to client */
srcfd = Open(filename, O_RDONLY, 0);
srcp = Mmap(0, filesize, PROT_READ, MAP_PRIVATE, srcfd, 0);
Close(srcfd);
Rio_writen(fd, srcp, filesize);
Munmap(srcp, filesize);
Serve file specified by filename
Use file metadata to compose header
“Read” file via mmap
Write to output
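The code above relies on a helper that turns a file name into a MIME type for the Content-type header. The body of that helper is not shown on the slide, so the version below is only a plausible sketch covering the file kinds Tiny is said to serve; the suffix list and the buffer size are assumptions.

```c
#include <assert.h>
#include <string.h>

/* Sketch of a get_filetype-style helper (not the actual tiny.c code):
 * derive a MIME type from the file name suffix.
 * filetype must have room for at least 16 bytes. */
void get_filetype(const char *filename, char *filetype)
{
    assert(filename && filetype);
    const char *dot = strrchr(filename, '.');   /* last '.' starts the suffix */
    if (dot != NULL && strcmp(dot, ".html") == 0)
        strcpy(filetype, "text/html");
    else if (dot != NULL && strcmp(dot, ".gif") == 0)
        strcpy(filetype, "image/gif");
    else if (dot != NULL && strcmp(dot, ".jpg") == 0)
        strcpy(filetype, "image/jpeg");
    else
        strcpy(filetype, "text/plain");         /* fallback for everything else */
}
```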
Serving Dynamic Content
Client sends request to server.
If request URI contains the string “/cgi-bin”, then the server assumes that the request is for dynamic content.
[Figure: Client sends “GET /cgi-bin/env.pl HTTP/1.1” to Server]
Serving Dynamic Content (cont)
The server creates a child process and runs the program identified by the URI in that process.
[Figure: Server uses fork/exec to start env.pl as a child process]
Serving Dynamic Content (cont)
The child runs and generates the dynamic content.
The server captures the content of the child and forwards it without modification to the client.
[Figure: env.pl passes Content to Server; Server forwards Content to Client]
Issues in Serving Dynamic Content
How does the client pass program arguments to the server?
How does the server pass these arguments to the child?
How does the server pass other info relevant to the request to the child?
How does the server capture the content produced by the child?
These issues are addressed by the Common Gateway Interface (CGI) specification.
[Figure: Client sends Request to Server; Server creates env.pl; Content flows from env.pl through Server to Client]
CGI
Because the children are written according to the CGI
spec, they are often called CGI programs.
Because many CGI programs are written in Perl, they are
often called CGI scripts.
However, CGI really defines a simple standard for
transferring information between the client (browser),
the server, and the child process.
The cdmlinux addition portal
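[Figure: browser screenshot of the addition portal; the input URL is annotated with its host, port, CGI program, and args, and the output page is shown below it]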
Serving Dynamic Content With GET
Question: How does the client pass arguments to the server?
Answer: The arguments are appended to the URI
Can be encoded directly in a URL typed to a browser or a URL
in an HTML link
http://cdmlinux.cdm.depaul.edu/cgi-bin/adder?n1=4&n2=7
adder is the CGI program on the server that will do the addition.
argument list starts with “?”
arguments separated by “&”
spaces represented by “+” or “%20”
URI often generated by an HTML form
<FORM METHOD=GET ACTION="cgi-bin/adder">
<p>X <INPUT NAME="n1">
<p>Y <INPUT NAME="n2">
<p><INPUT TYPE=submit>
</FORM>
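Because spaces arrive as “+” or “%20”, a CGI program usually has to decode its arguments before using them. The sketch below shows one way to do that; url_decode is a hypothetical name, and the routine handles only “+” and “%XX” sequences.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: decode a URL-encoded argument into dst.
 * dst must be at least as large as src, including the terminator.
 * '+' becomes a space and "%XX" becomes the byte with hex value XX. */
void url_decode(const char *src, char *dst)
{
    assert(src && dst);
    while (*src) {
        if (*src == '+') {
            *dst++ = ' ';
            src++;
        } else if (src[0] == '%' && isxdigit((unsigned char)src[1])
                                 && isxdigit((unsigned char)src[2])) {
            char hex[3] = { src[1], src[2], '\0' };
            *dst++ = (char)strtol(hex, NULL, 16);   /* e.g. "20" -> ' ' */
            src += 3;
        } else {
            *dst++ = *src++;                        /* ordinary character */
        }
    }
    *dst = '\0';
}
```

A malformed escape such as a lone “%” at the end of the string is copied through unchanged rather than rejected.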
Serving Dynamic Content With GET
URL:
cgi-bin/adder?4&7
Result displayed on browser:
Welcome to THE Internet addition portal.
The answer is: 4+7=11
Thanks for visiting!
Serving Dynamic Content With GET
Question: How does the server pass these arguments to
the child?
Answer: In environment variable QUERY_STRING
A single string containing everything after the “?”
For adder: QUERY_STRING = “4&7”
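On the child side, reading the arguments comes down to a getenv call followed by some parsing. The sketch below assumes the “4&7” format used by the addition portal; parse_adder_args and read_adder_args are names invented for this example and are not taken from adder.c.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse a query string of the form "4&7".
 * Returns 0 on success, -1 if the string is missing or malformed. */
int parse_adder_args(const char *query, int *n1, int *n2)
{
    assert(n1 && n2);
    if (query == NULL || sscanf(query, "%d&%d", n1, n2) != 2)
        return -1;
    return 0;
}

/* What a CGI child would do: read the arguments that the server
 * placed in its environment before running it. */
int read_adder_args(int *n1, int *n2)
{
    return parse_adder_args(getenv("QUERY_STRING"), n1, n2);
}
```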
Additional CGI Environment Variables
General
SERVER_SOFTWARE
SERVER_NAME
GATEWAY_INTERFACE (CGI version)
Request-specific
SERVER_PORT
REQUEST_METHOD (GET, POST, etc)
QUERY_STRING (contains GET args)
REMOTE_HOST (domain name of client)
REMOTE_ADDR (IP address of client)
CONTENT_TYPE (for POST, type of data in message body, e.g.,
text/html)
CONTENT_LENGTH (length in bytes)
Even More CGI Environment Variables
In addition, the value of each request header received from the
client is placed in an environment variable named HTTP_<name>,
where <name> is the header name
Examples (the name is uppercased and any “-” is changed to “_”):
HTTP_ACCEPT
HTTP_HOST
HTTP_USER_AGENT
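The naming rule is mechanical enough to express in a few lines. This is a sketch under the rule stated above; cgi_env_name is a hypothetical helper, not a library function.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: build the CGI variable name for a request header,
 * e.g. "User-Agent" -> "HTTP_USER_AGENT".
 * Returns 0 on success, -1 if the result would not fit in out. */
int cgi_env_name(const char *header, char *out, size_t outlen)
{
    static const char prefix[] = "HTTP_";
    assert(header && out);
    if (strlen(prefix) + strlen(header) + 1 > outlen)
        return -1;                       /* not enough room, including '\0' */
    strcpy(out, prefix);
    char *p = out + strlen(prefix);
    for (; *header; header++)            /* uppercase and map '-' to '_' */
        *p++ = (*header == '-') ? '_' : (char)toupper((unsigned char)*header);
    *p = '\0';
    return 0;
}
```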
Serving Dynamic Content With GET
Question: How does the server capture the content produced by the child?
Answer: The child generates its output on stdout. Server uses dup2 to
redirect stdout to its connected socket.
Notice that only the child knows the type and size of the content. Thus the child
(not the server) must generate the corresponding headers.
From adder.c:
/* Make the response body */
sprintf(content, "Welcome to add.com: ");
sprintf(content, "%sTHE Internet addition portal.\r\n<p>", content);
sprintf(content, "%sThe answer is: %s\r\n<p>", content, msg);
sprintf(content, "%sThanks for visiting!\r\n", content);

/* Generate the HTTP response */
printf("Content-length: %u\r\n", (unsigned) strlen(content));
printf("Content-type: text/html\r\n\r\n");
printf("%s", content);
Serving Dynamic Content With GET
$ telnet perko406.cdm.depaul.edu 8000
Trying 140.192.39.11...
Connected to perko406.cdm.depaul.edu.
Escape character is '^]'.
[HTTP request sent by client]
GET /cgi-bin/adder?4&7 HTTP/1.0

[HTTP response generated by the server]
HTTP/1.0 200 OK
Server: Tiny Web Server
[HTTP response generated by the CGI program]
Content-length: 97
Content-type: text/html

Welcome to THE Internet addition portal.
<p>The answer is: 4 + 7 = 11
<p>Thanks for visiting!
Connection closed by foreign host.
$
Tiny Serving Dynamic Content
From tiny.c:
/* Return first part of HTTP response */
sprintf(buf, "HTTP/1.0 200 OK\r\n");
Rio_writen(fd, buf, strlen(buf));
sprintf(buf, "Server: Tiny Web Server\r\n");
Rio_writen(fd, buf, strlen(buf));

if (Fork() == 0) { /* child */
    /* Real server would set all CGI vars here */
    setenv("QUERY_STRING", cgiargs, 1);
    Dup2(fd, STDOUT_FILENO);              /* Redirect stdout to client */
    Execve(filename, emptylist, environ); /* Run CGI prog */
}
Wait(NULL); /* Parent waits for and reaps child */
Fork child to execute CGI program
Change stdout to be connection to client
Execute CGI program with execve
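The same three steps can be written with plain POSIX calls instead of the capitalized wrapper functions used in tiny.c. This is a sketch under assumptions: run_cgi is a hypothetical name, out_fd stands in for the connected socket, and error handling is reduced to exit codes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run the program argv[0] with QUERY_STRING set to
 * query and its stdout redirected to out_fd.
 * Returns the child's exit status, or -1 on error. */
int run_cgi(char *const argv[], const char *query, int out_fd)
{
    assert(argv && argv[0] && query);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                              /* child */
        if (setenv("QUERY_STRING", query, 1) < 0)
            _exit(127);
        if (dup2(out_fd, STDOUT_FILENO) < 0)     /* stdout now goes to out_fd */
            _exit(127);
        execve(argv[0], argv, environ);          /* returns only on failure */
        _exit(127);
    }
    int status;                                  /* parent waits for and reaps child */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the child writes directly to out_fd, the parent never copies the content itself; it only waits for the child to finish.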
Proxies
A proxy is an intermediary between a client and an origin
server.
To the client, the proxy acts like a server.
To the server, the proxy acts like a client.
1. Client request (Client to Proxy)
2. Proxy request (Proxy to Origin Server)
3. Server response (Origin Server to Proxy)
4. Proxy response (Proxy to Client)
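Since a proxy typically receives a full URL in the request line, it must first work out which origin server to contact before it can act as a client. The sketch below shows one way to split such a URL; split_url and URL_FIELD_MAX are assumptions for this example, and only the http scheme is handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define URL_FIELD_MAX 256  /* assumed size of the host and path buffers */

/* Hypothetical helper: split "http://host[:port]/path" into its parts.
 * Port defaults to 80 and path defaults to "/".
 * Returns 0 on success, -1 if the URL is not in the expected form. */
int split_url(const char *url, char *host, int *port, char *path)
{
    static const char scheme[] = "http://";
    assert(url && host && port && path);
    if (strncmp(url, scheme, strlen(scheme)) != 0)
        return -1;
    const char *h = url + strlen(scheme);
    size_t hlen = strcspn(h, ":/");          /* host ends at ':' or '/' */
    if (hlen == 0 || hlen >= URL_FIELD_MAX)
        return -1;
    memcpy(host, h, hlen);
    host[hlen] = '\0';
    const char *rest = h + hlen;
    *port = 80;
    if (*rest == ':') {                      /* explicit port */
        char *end;
        long p = strtol(rest + 1, &end, 10);
        if (end == rest + 1 || p <= 0 || p > 65535)
            return -1;
        *port = (int)p;
        rest = end;
    }
    if (*rest == '\0') {                     /* no path given */
        strcpy(path, "/");
        return 0;
    }
    if (*rest != '/' || strlen(rest) >= URL_FIELD_MAX)
        return -1;
    strcpy(path, rest);                      /* path includes any "?args" */
    return 0;
}
```

The proxy would then connect to host on port and send a request whose URI is just path, which is the URL suffix form that servers expect.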
Why Proxies?
Can perform useful functions as requests and responses pass by
Examples: Caching, logging, anonymization, filtering, transcoding
[Figure: Client A and Client B each request foo.html from a proxy cache over a fast, inexpensive local network; the proxy requests foo.html from the origin server over the slower, more expensive global network]
For More Information
Study the Tiny Web server described in your text
Tiny is a sequential Web server.
Serves static and dynamic content to real browsers.
text files, HTML files, GIF and JPEG images.
220 lines of commented C code.
Also comes with an implementation of the CGI script for the add.com
addition portal.
See the HTTP/1.1 standard:
http://www.w3.org/Protocols/rfc2616/rfc2616.html