Transcript Protocol
Web Basics
Hypertext Transfer Protocol (HTTP)
Instructor: Sergey Goldman
Client-Server
HTTP
• HTTP is the language of the Web (WWW)
– the protocol used for communication between web browsers and web/application servers
• Protocol
– An agreement about how to do something
• A protocol enables computers and software to communicate.
• HTTP is an asymmetric request-response client-server protocol.
• An HTTP client sends a request message to an HTTP server.
• The server returns a response message.
• HTTP is a pull protocol: the client pulls information from the server
– (instead of the server pushing information down to the client).
Other Protocols
Hyper Text Transfer Protocol (HTTP) – Web Browser
File Transfer Protocol (FTP) – File transfer
Simple Mail Transfer Protocol (SMTP) – Email
Internet Protocol (IP) – Packets across the Internet
HTTP servers typically listen on TCP port 80 (alternatives such as 8080, 8081, 8090… are also common)
Ports to know
Port  TCP        UDP  Description
21    TCP, SCTP  UDP  File Transfer Protocol (FTP) control (command)
22    TCP, SCTP  UDP  Secure Shell (SSH): secure logins, file transfers (scp, sftp) and port forwarding
23    TCP        UDP  Telnet protocol: unencrypted text communications
25    TCP        UDP  Simple Mail Transfer Protocol (SMTP), used for email routing between mail servers
26    TCP        UDP  Unassigned
38    TCP        UDP  Route Access Protocol (RAP)
53    TCP        UDP  Domain Name System (DNS)
70    TCP        UDP  Gopher protocol
80    TCP, SCTP  UDP  Hypertext Transfer Protocol (HTTP)
110   TCP        UDP  Post Office Protocol, version 3 (POP3)
Web Client and Servers
• Web content lives on web servers.
• Web servers speak the HTTP protocol, so they are often called HTTP servers.
• The HTTP servers store the Internet’s data and provide the data when it is requested by HTTP clients.
• Web servers host web resources. A web resource is the source of web content. The simplest kind of web resource is a static file on the web server’s filesystem.
– text files
– HTML files
– Microsoft Word files
– Adobe Acrobat files
– MP3 files
– JPEG image files
– AVI movie files
…
Resources
Protocol Layering
• Internet Protocol (IP) provides a way to deliver packets to a
destination
http, ssh, ftp, smtp, gopher, pop (TCP connections)
DNS, VoIP (UDP communication data)
TCP (packets over IP)
UDP (raw packets over IP)
Internet Protocol (IP)
• DNS – Domain Name System
• SMTP – send email
• POP – read email
• Telnet – old standard for remote terminal sessions; connects directly to accounts on different systems
• Archie – early search tool for FTP archives (1990)
• Gopher – early menu-driven system for browsing documents
Internet/Networking
• The Internet
– WWW + HTML
– DNS
– Clients/Servers and UNIX
• SSH, Telnet clients
• FTP client
…
• Networking
– Networks, Routers, and Packets
– Connections
• Hostnames, IP addresses, ports
• Protocols
…
Protocol Layers
Application Layer: HTTP, FTP, IMAP, DNS… protocols
Transport Layer: TCP and UDP protocols
Network Layer: Internet Protocol
Data link and Physical Layers: Hardware
7 layers
Applications: HTTP, FTP, SMTP, POP
TCP/UDP
IP
Data link layer protocols
Physical layer protocols
RFC 1945 (Request for Comments)
• https://tools.ietf.org/html/rfc1945 HTTP 1.0
• http://www.w3.org/Protocols/rfc2616/rfc2616.html HTTP 1.1
• An application-level protocol with the lightness and speed necessary for distributed, collaborative, hypermedia information systems
• It is a generic, stateless, object-oriented protocol which can be used for many tasks, such as name servers and distributed object management systems, through extension of its request methods (commands)
• Stateless: the current request does not know what has been done in the previous requests.
• A feature of HTTP is the typing of data representation, allowing systems to be built independently of the data being transferred. HTTP has been in use by the World-Wide Web global information initiative since 1990. This specification reflects common usage of the protocol referred to as "HTTP/1.0".
Browser
When you enter a URL such as
http://developer.android.com/sdk/index.html, the browser
turns the URL into a request message and sends it to the HTTP
server.
The HTTP server interprets the request message and returns an
appropriate response message, which is either the resource you
requested or an error message.
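A minimal sketch of how a URL might be turned into a request message. The class name RequestSketch and the method buildGet are hypothetical, and the "Connection: close" header is an assumption added for illustration; a real browser sends many more headers.

```java
import java.net.URI;

// Hypothetical helper: turns a URL into the text of an HTTP/1.1 GET request.
class RequestSketch {
    static String buildGet(String url) {
        URI uri = URI.create(url);
        // The request line carries only the path (plus query), not the host.
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (uri.getRawQuery() != null) {
            path += "?" + uri.getRawQuery();
        }
        // The host goes into the Host header; include the port only if given.
        String host = uri.getHost();
        if (uri.getPort() != -1) {
            host += ":" + uri.getPort();
        }
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"   // assumption: one request per connection
             + "\r\n";                   // blank line ends the header section
    }

    public static void main(String[] args) {
        System.out.print(buildGet("http://developer.android.com/sdk/index.html"));
    }
}
```

The blank line at the end is what tells the server that the header section is complete.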
Media Types
• HTTP tags each object being transported through the Web with a data format label called a MIME type (Multipurpose Internet Mail Extensions)
– originally designed to solve problems encountered in moving messages between different electronic mail systems
• Web servers attach a MIME type to all HTTP object data.
• When a web browser gets an object back from a server, it looks at the associated MIME type to see if it knows how to handle the object.
• Most browsers can handle hundreds of popular object types:
– displaying image files, parsing and formatting HTML files, playing audio files through the computer’s speakers, or launching external plug-in software to handle special formats.
Media Types Examples
• An HTML-formatted text document would be labeled with type
text/html.
• A plain ASCII text document would be labeled with type text/plain.
• A JPEG version of an image would be image/jpeg.
• A GIF-format image would be image/gif.
• An Apple QuickTime movie would be video/quicktime.
• A Microsoft PowerPoint presentation would be
application/vnd.ms-powerpoint.
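As a rough illustration, a server could pick the MIME type from the file extension. The class MimeTypeSketch, the extension-to-type table, and the fallback to application/octet-stream are assumptions for this sketch; real servers use much larger, configurable tables.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup table: file extension -> MIME type from the examples above.
class MimeTypeSketch {
    static final Map<String, String> TYPES = Map.of(
        "html", "text/html",
        "txt",  "text/plain",
        "jpg",  "image/jpeg",
        "jpeg", "image/jpeg",
        "gif",  "image/gif",
        "mov",  "video/quicktime",
        "ppt",  "application/vnd.ms-powerpoint");

    static String typeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        // Assumption: unknown extensions are served as generic binary data.
        return TYPES.getOrDefault(ext, "application/octet-stream");
    }

    public static void main(String[] args) {
        System.out.println(typeFor("index.html"));
        System.out.println(typeFor("photo.JPG"));
    }
}
```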
WEB .NET
HTTP Client-Server conversation
Client (initiates connection) / Server
• Client: Open connection
• Server: OK
• Client: GET <file location>
• Server: Send page or error message
• Client: Display response
• Client: Close connection
• Server: OK
URI, URN, URL
• Uniform Resource Identifier (URI)
– a string of characters used to identify a name or a resource on the Internet (information about a resource)
– Classification:
• Uniform Resource Name (URN)
– The name of the resource within a namespace
– Ex. ISBN 0132575663 (urn:isbn:0132575663)
• Uniform Resource Locator (URL)
– A URI that says how to find the resource
– Ex. file:///home/username/httpprotocal.pptx
URL Structure
<scheme>://<host>:<port>/<path>
• Scheme (protocol)
– HTTP, FTP, GOPHER, MAILTO, ...
• Host (name, domain.name)
– An IP address or DNS name
• Port
– TCP port number
– Optional (defaults to 80 for http)
• Path
– directory path to the resource
– resource name
• Ex.
– http://xxx.someplace.com/www/index.html
– http://xxx.somedomain.somename.com:80/cgi-bin/file.exe
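The URL structure above can be explored with the standard java.net.URI class. This is only a sketch: the class UrlPartsSketch and its describe method are hypothetical names, and the port default is applied only for the http scheme.

```java
import java.net.URI;

// Hypothetical helper: splits a URL into scheme, host, port and path.
class UrlPartsSketch {
    static String describe(String url) {
        URI uri = URI.create(url);
        int port = uri.getPort();            // -1 when the URL has no explicit port
        if (port == -1 && "http".equalsIgnoreCase(uri.getScheme())) {
            port = 80;                        // default port for http
        }
        return "scheme=" + uri.getScheme()
             + " host=" + uri.getHost()
             + " port=" + port
             + " path=" + uri.getPath();
    }

    public static void main(String[] args) {
        System.out.println(describe("http://xxx.someplace.com/www/index.html"));
        System.out.println(describe("http://xxx.somedomain.somename.com:80/cgi-bin/file.exe"));
    }
}
```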
HTTP example
• Browser request
– http://www.oit.edu/~instructorname/
– http:// says to use HTTP protocol
– Resolve www.oit.edu in DNS
• 140.211.128.21
– Make TCP connection
• 140.211.128.21, port 80
– Send the following text string
• GET /~instructorname/ HTTP/1.1
• Host: www.oit.edu
• Server response
HTTP/1.1 200 OK
Date: Tue, 11 Dec 2012 20:11:12 GMT
Server: Apache/1.3.1
Last-Modified: Mon, 10 Dec 2012 20:10:01 GMT
ETag: "66e12-3aba-1901aca1"
Content-Length: 15472
Accept-Ranges: bytes
Connection: close
Content-Type: text/html

<HTML>
…
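A response like the one above is plain text: a status line, header lines, a blank line, then the body. The following is a minimal parsing sketch; the class name ResponseSketch is hypothetical, and it assumes CRLF line endings and a body that fits in one string.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for the text of an HTTP response.
class ResponseSketch {
    final String version;
    final int status;
    final String reason;
    final Map<String, String> headers = new LinkedHashMap<>();
    final String body;

    ResponseSketch(String raw) {
        // A blank line (CRLF CRLF) separates the header section from the body.
        int split = raw.indexOf("\r\n\r\n");
        String head = split < 0 ? raw : raw.substring(0, split);
        body = split < 0 ? "" : raw.substring(split + 4);

        String[] lines = head.split("\r\n");
        // Status line: version, status code, reason phrase.
        String[] statusLine = lines[0].split(" ", 3);
        version = statusLine[0];
        status = Integer.parseInt(statusLine[1]);
        reason = statusLine.length > 2 ? statusLine[2] : "";

        // Remaining lines are "Name: Value" headers (names stored as received).
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                headers.put(lines[i].substring(0, colon).trim(),
                            lines[i].substring(colon + 1).trim());
            }
        }
    }

    public static void main(String[] args) {
        ResponseSketch r = new ResponseSketch(
            "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 15472\r\n\r\n<HTML>");
        System.out.println(r.status + " " + r.reason + " " + r.headers);
    }
}
```

Header names are case-insensitive in HTTP; this sketch keeps them exactly as received, so a real client would normalize them before lookup.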
Examples of URL
• ftp://www.ftp.org/docs/test.txt
• mailto:[email protected]
• news:soc.culture.singapore
• telnet://www.test101.com/
see http://www.w3.org/Protocols/rfc977/rfc977
HTTP example cont
Header
Name:Value
Request Headers
• Host: domain-name - HTTP/1.1 supports virtual hosts. Multiple DNS names (e.g., www.test101.com and www.test102.com) can reside on the same physical server, each with its own document root directory. The Host header is mandatory in HTTP/1.1 to select one of the hosts.
• Content negotiation: the following headers let the client ask for a preferred media type (e.g. JPEG vs. GIF) or language (e.g. English vs. French) if the server maintains multiple versions of the same document.
• Accept: mime-type-1, mime-type-2, ... - The client can use the Accept header to tell the server the MIME types it can handle and which it prefers. If the server has multiple versions of the document requested (e.g., an image in GIF and PNG, or a document in TXT and PDF), it can check this header to decide which version to deliver to the client. (E.g., PNG is more advanced than GIF, but not all browsers support PNG.) This process is called content-type negotiation.
• Accept-Language: language-1, language-2, ... - The client can use the Accept-Language header to tell the server what languages it can handle or prefers. If the server has multiple versions of the requested document (e.g., in English, Chinese, French), it can check this header to decide which version to return. This process is called language negotiation.
• Accept-Charset: Charset-1, Charset-2, ... - For character set negotiation, the client can use this header to tell the server which character sets it can handle or prefers. Examples of character sets are ISO-8859-1, ISO-8859-2, ISO-8859-5, BIG5, UCS2, UCS4, UTF8.
• Accept-Encoding: encoding-method-1, encoding-method-2, ... - The client can use this header to tell the server the types of encoding it supports. If the server has an encoded (or compressed) version of the document requested, it can return an encoded version supported by the client. The server can also choose to encode the document before returning it to the client to reduce the transmission time. The server must set the response header "Content-Encoding" to inform the client that the returned document is encoded. The common encoding methods are "x-gzip (.gz, .tgz)" and "x-compress (.Z)".
• Connection: Close|Keep-Alive - The client can use this header to tell the server whether to close the connection after this request, or to keep the connection alive for another request. HTTP/1.1 uses persistent (keep-alive) connections by default. HTTP/1.0 closes the connection by default.
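The content-type negotiation described for the Accept header can be sketched as a simple "first acceptable match" search. This is a simplification under stated assumptions: the class AcceptSketch is hypothetical, the client's list is treated as ordered by preference, and any parameters after a semicolon are ignored rather than interpreted.

```java
import java.util.List;

// Hypothetical sketch of server-side content-type negotiation.
class AcceptSketch {
    // Returns the first type named in the Accept header that the server
    // can supply, or null if none of the available versions is acceptable.
    static String choose(String acceptHeader, List<String> available) {
        for (String item : acceptHeader.split(",")) {
            // Keep only the media type; drop any parameters after ';'.
            String type = item.split(";")[0].trim();
            if (available.contains(type)) {
                return type;
            }
            // "*/*" means the client accepts anything the server has.
            if (type.equals("*/*") && !available.isEmpty()) {
                return available.get(0);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Client prefers PNG but also takes GIF; server only has a GIF version.
        System.out.println(choose("image/png, image/gif", List.of("image/gif")));
    }
}
```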
Request Headers
• Referer: referer-URL - The client can use this header to indicate the referrer of this request. If you click a link from web page 1 to visit web page 2, web page 1 is the referrer for the request to web page 2. All major browsers set this header, which can be used to track where the request comes from (for web advertising, or content customization). Nonetheless, this header is not reliable and can be easily spoofed. Note that Referrer is misspelled as "Referer" in the header name (unfortunately, you have to follow the misspelling too).
• User-Agent: browser-type - Identifies the type of browser used to make the request. The server can use this information to return a different document depending on the type of browser.
• Content-Length: number-of-bytes - Used by POST requests to inform the server of the length of the request body.
• Content-Type: mime-type - Used by POST requests to inform the server of the media type of the request body.
• Cache-Control: no-cache|... - The client can use this header to specify how pages are to be cached by a proxy server. "no-cache" requires the proxy to obtain a fresh copy from the original server, even though a local cached copy is available. (An HTTP/1.0 server does not recognize "Cache-Control: no-cache". Instead, it uses "Pragma: no-cache". Include both request headers if you are not sure about the server’s version.)
• Authorization: Used by the client to supply its credentials (username/password) to access protected resources. (This header will be described in a later chapter on authentication.)
• Cookie: cookie-name-1=cookie-value-1; cookie-name-2=cookie-value-2; ... - The client uses this header to return the cookie(s) that were set by this server earlier for state management. (This header will be discussed in a later chapter on state management.)
• If-Modified-Since: date - Tells the server to send the page only if it has been modified after the specified date.
Browser Response
• Client IP address
• The browser type
• The referring link
– The URL last looked at
• HTTP is stateless
– Does not provide storing of information between requests
– No indication of any relationship between two different requests
• Cookies (persistent client state for a URL)
– small data structures that a web server requests the HTTP client to store on the local machine
• used to maintain state information, e.g. cookies store recently viewed items in a web shop
– Server response can include a Set-Cookie header
• Browser saves the cookie
• Browser resends cookies to server next time
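The cookie round trip above might look like this in code. It is a sketch only: the class CookieSketch, the cookie names lastViewed and cart, and their values are made up for illustration, and attributes such as Path or Expires are dropped rather than honored.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the browser and server sides of cookie handling.
class CookieSketch {
    // Browser side: keep only the name=value part of a Set-Cookie value,
    // which is what gets sent back in the Cookie request header.
    static String cookieToReturn(String setCookieValue) {
        return setCookieValue.split(";", 2)[0].trim();
    }

    // Server side: split a Cookie header value ("a=1; b=2") into pairs.
    static Map<String, String> parseCookieHeader(String value) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (String pair : value.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                cookies.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return cookies;
    }

    public static void main(String[] args) {
        // Server response carried: Set-Cookie: lastViewed=item42; Path=/
        String stored = cookieToReturn("lastViewed=item42; Path=/");
        // Next request carries: Cookie: lastViewed=item42
        System.out.println("Cookie: " + stored);
        System.out.println(parseCookieHeader("lastViewed=item42; cart=3"));
    }
}
```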
DNS
a. The browser extracts the server’s hostname from the URL
b. The browser converts the server’s hostname into the server’s IP address (DNS)
c. The browser extracts the port number (if any) from the URL
d. The browser establishes a TCP connection with the web server
e. The browser sends an HTTP request message to the server
f. The server sends an HTTP response back to the browser
g. The connection is closed, and the browser displays the document
TELNET http
GET / HTTP/1.1
To enable the Telnet command-line utilities on Windows 8/10:
1. Click Start > Control Panel.
2. Click Programs and Features.
3. Click Turn Windows features on or off.
4. In the Windows Features dialog box, check the Telnet Client check box.
5. Click OK. The system installs the appropriate files. This will take a few seconds to a minute.
Architectural Components of the Web
• Proxies
HTTP intermediaries that sit between clients and servers
• Caches
HTTP storehouses that keep copies of popular web pages
close to clients
• Gateways
Special web servers that connect to other applications
• Tunnels
Special proxies that blindly forward HTTP communications
• Agents
Semi-intelligent web clients that make automated HTTP
requests
HTML
<h1> Your name</h1>
<img src='img.jpg' alt='address' align='left'>
<br />salutation: Mr.
<br />Address:
<br />City: Portland
<br />State: Oregon
HTTP - methods
– GET
• retrieve a URL from the server
– simple page request
– run a CGI program
– run a CGI with arguments attached to the URL
– POST
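• preferred method for forms processing
• run a CGI program
• parameterized data passed to the program on standard input
• keeps form data out of the URL, so it is more private than GET (though not encrypted unless HTTPS is used)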
– PUT
• Used to transfer a file from the client to the server
– HEAD
• requests only the status line and headers for a URL
• used for conditional URL handling in performance enhancement schemes
– retrieve URL only if not in local cache or date is more recent than cached copy
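To make the GET/POST difference concrete, here is a sketch that sends the same form field both ways. The class MethodSketch, the host example.com, the path /cgi-bin/search and the field city=Portland are assumptions for illustration. It needs Java 10 or later for URLEncoder.encode(String, Charset).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: the same form field sent with GET and with POST.
class MethodSketch {
    static String encode(String name, String value) {
        return URLEncoder.encode(name, StandardCharsets.UTF_8) + "="
             + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // GET: arguments are attached to the URL after '?'.
    static String get(String host, String path, String name, String value) {
        return "GET " + path + "?" + encode(name, value) + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "\r\n";
    }

    // POST: arguments travel in the request body, described by
    // Content-Type and Content-Length headers.
    static String post(String host, String path, String name, String value) {
        String body = encode(name, value);
        int length = body.getBytes(StandardCharsets.UTF_8).length;
        return "POST " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Content-Type: application/x-www-form-urlencoded\r\n"
             + "Content-Length: " + length + "\r\n"
             + "\r\n"
             + body;
    }

    public static void main(String[] args) {
        System.out.println(get("example.com", "/cgi-bin/search", "city", "Portland"));
        System.out.println(post("example.com", "/cgi-bin/search", "city", "Portland"));
    }
}
```

With GET the data is visible in the URL (and so in logs and history); with POST it is only in the body.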
CGI (Common Gateway Interface)
• Web servers must be able to serve up content from dynamic sources
– A server can respond to a request by invoking an application that automatically generates the document to be returned
• One of the first approaches was CGI, a standard mechanism that enables HTTP servers to interface with external applications, which can serve as “gateways” to the local information system
• How does CGI work
– assigns programs to URLs, so that when the URL is invoked, the program is executed
– CGI programs serve as an interface between a database and a Web server, allowing users to submit queries to the database through predefined URLs
– When the Web server receives a request for the URL, it runs a program that acts as a client of the database, submits the query, and packs the result into an HTML document returned to the browser
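A minimal sketch of what a CGI program does, assuming the usual CGI convention that the server passes the URL's query string in the QUERY_STRING environment variable and reads the program's standard output. The class CgiSketch is hypothetical, and the database lookup is left out; the program simply echoes the query.

```java
// Hypothetical CGI program: the web server runs it for a configured URL
// and returns whatever it prints to the browser.
class CgiSketch {
    static String respond(String queryString) {
        String query = (queryString == null) ? "" : queryString;
        // Escape HTML special characters before echoing user input.
        String safe = query.replace("&", "&amp;")
                           .replace("<", "&lt;")
                           .replace(">", "&gt;");
        // CGI output: header line(s), a blank line, then the document.
        return "Content-Type: text/html\r\n"
             + "\r\n"
             + "<html><body>Query: " + safe + "</body></html>\n";
    }

    public static void main(String[] args) {
        // Assumption: the server sets QUERY_STRING for GET requests.
        System.out.print(respond(System.getenv("QUERY_STRING")));
    }
}
```

A real gateway program would run the database query at the point where this sketch echoes the input.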
HTTP Request Packets
• Sent from client to server
• Consists of HTTP header
– header is hidden in browser environment
– contains:
• content type / MIME type
• content length
• user agent - browser issuing request
• content types user agent can handle
• and a URL
Referer (misspelled)
• For the server’s benefit, the client lists the URL of the document (or the document type) from which the URL in the request was obtained
• Allows server to generate back-links, logging, tracing of bad links…
• Ex.
– Referer: http://www.w3.com/xxx.html
Authorization:
• For Password and authentication schemes
• Ex.
– Authorization: user fred:mypassword
– Authorization: kerberos kerberosparameters
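In practice the header has the form "Authorization: scheme credentials". One widely used scheme, Basic, sends the Base64 encoding of "username:password". The sketch below builds such a header for the example user above; the class name BasicAuthSketch is hypothetical. Base64 is an encoding, not encryption, so Basic credentials are only protected when the connection itself is encrypted.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds an Authorization header for the Basic scheme.
class BasicAuthSketch {
    static String header(String user, String password) {
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                               .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Authorization: Basic " + encoded;
    }

    public static void main(String[] args) {
        System.out.println(header("fred", "mypassword"));
    }
}
```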
Response Status Codes
200 OK:
The request is fulfilled.
301 Moved Permanently:
The requested resource has been permanently moved to a new location. The URL of the new location is given in the response header called Location. The client should issue a new request to the new location. Applications should update all references to this new location.
302 Found (or Moved Temporarily):
Same as 301, but the new location is temporary in nature. The client should issue a new request, but applications need not update the references.
304 Not Modified:
In response to an If-Modified-Since conditional GET request, the server notifies the client that the requested resource has not been modified.
400 Bad Request:
Server could not interpret or understand the request, probably because of a syntax error in the request message.
401 Unauthorized (Authentication Required):
The requested resource is protected and requires the client’s credentials (username/password). The client should re-submit the request with its credentials.
403 Forbidden:
Server refuses to supply the resource, regardless of the identity of the client.
404 Not Found:
The requested resource cannot be found on the server.
405 Method Not Allowed:
The request method used, e.g., POST, PUT, DELETE, is a valid method. However, the server does not allow that method for the resource requested.
408 Request Timeout:
414 Request-URI Too Long:
500 Internal Server Error:
Server is confused, often caused by an error in the server-side program responding to the request.
501 Not Implemented:
The request method used is invalid (could be caused by a typing error, e.g., "GET" misspelled as "Get").
502 Bad Gateway:
Proxy or Gateway indicates that it received a bad response from the upstream server.
503 Service Unavailable:
Server cannot respond due to overloading or maintenance. The client can try again later.
504 Gateway Timeout:
Proxy or Gateway indicates that it timed out waiting for an upstream server.
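The first digit of a status code gives its class, which is often all a client needs to decide what to do next. A small sketch (the class StatusSketch and its category labels are illustrative names, not part of any library):

```java
// Hypothetical helper: maps a status code to its class by the first digit.
class StatusSketch {
    static String category(int code) {
        switch (code / 100) {
            case 1:  return "Informational";
            case 2:  return "Success";
            case 3:  return "Redirection";
            case 4:  return "Client Error";
            case 5:  return "Server Error";
            default: return "Unknown";
        }
    }

    public static void main(String[] args) {
        for (int code : new int[] {200, 301, 404, 503}) {
            System.out.println(code + " " + category(code));
        }
    }
}
```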
State
• Hidden variables <input type=hidden>
• Sessions
– Special header tags interpreted by the server
• Used by ASP, PHP, JSP
– Implemented at the language API level
Servlets
• Performance: CGI programs involve a certain overhead
• A separate process for each instance takes time and requires a context switch in the operating system
• Multiple requests result in multiple processes
• To avoid this overhead, Java servlets can be used instead
• The idea is exactly the same as in CGI programs, but the implementation differs
• How do they work?
– Execution and result are the same, but servlets are invoked directly by embedding servlet-specific information within an HTTP request
– They run as threads of the Java server process; moreover, they run as a part of the Web server
– This eliminates the per-request process overhead
Servlets Java and ASP.NET
import javax.servlet.*;
import javax.servlet.http.*;
import java.io.*;

// Java servlet: answers GET requests with a small HTML page
public class SimpleServlet extends HttpServlet {
    public void doGet(HttpServletRequest request,
                      HttpServletResponse response)
            throws ServletException, java.io.IOException {
        // Tell the browser the MIME type of the body
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<html><body>");
        out.println("Simple Servlet Body");
        out.println("</body></html>");
        out.close();
    }
}
//-----------------//
// ASP.NET equivalent: writes the same page when the page loads
using System;
using System.Web;
using System.Web.UI;

public class SimpleServlet : System.Web.UI.Page {
    private void Page_Load(object sender, EventArgs args) {
        Response.ContentType = "text/html";
        Response.Write("<html><body>");
        Response.Write("Simple Servlet Body");
        Response.Write("</body></html>");
    }
}