WEB - Gadjah Mada University

HTTP
WEB
Risanuri Hidayat, Ir., M.Sc.
World Wide Web
T. Berners-Lee, R. Fielding, H. Frystyk: “Hypertext Transfer Protocol - HTTP/1.0”, RFC 1945, 1996.
• Naming scheme for resources
  • URL, URN, URI
• Multimedia documents
  • MIME encoding (RFC)
• Transfer protocol
  • HTTP/1.0, HTTP/1.1
  • Implemented over TCP/IP
  • Integrated with Internet infrastructure: DNS, SMTP
History
• Hypertext systems: no network access protocol
• Gopher, WAIS: no hyperlinks
• WWW @ CERN (Tim Berners-Lee, 1990)
• HTTP/0.9 (1992)
Internet Applications
Application               Application layer protocol          Underlying transport protocol
e-mail                    smtp [RFC 821]                      TCP
remote terminal access    telnet [RFC 854]                    TCP
Web                       http [RFC 2068]                     TCP
file transfer             ftp [RFC 959]                       TCP
streaming multimedia      proprietary (e.g. RealNetworks)     TCP or UDP
remote file server        NFS                                 TCP or UDP
Internet telephony        proprietary (e.g., Vocaltec)        typically UDP
What is HTTP
HTTP stands for Hypertext Transfer Protocol. It's the
network protocol used to deliver virtually all files and
other data (collectively called resources) on the World
Wide Web, whether they're HTML files, image files,
query results, or anything else. Usually, HTTP takes
place through TCP/IP sockets (and this tutorial ignores
other possibilities).
A browser is an HTTP client because it sends requests
to an HTTP server (Web server), which then sends
responses back to the client. The standard (and default)
port for HTTP servers to listen on is 80, though they can
use any port.
HTTP is used to transmit resources, not just files. A
resource is some chunk of information that can be
identified by a URL.
HTTP
Request message fields: method, URL or pathname, HTTP version, headers, message body
  Example: GET //www.dcs.qmw.ac.uk/index.html HTTP/1.1
Reply message fields: HTTP version, status code, reason, headers, message body
  Example: HTTP/1.1 200 OK, followed by the resource data
• Resource := MIME-encoded data
• Content negotiation
• Authentication
Methods:
• GET, HEAD, POST
• PUT, DELETE, TRACE, OPTIONS, CONNECT
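URL
http://www.cdk3.net:8888/WebExamples/earth.html
[Figure: a DNS lookup turns the URL into a resource ID (IP number, port number, pathname), here 55.55.55.55, 8888, WebExamples/earth.html. On the Web server, the IP number corresponds to the network address 2:60:8c:2:b0:5a, the port number to a socket, and the pathname to a file.]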
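The split of a URL into host, port number, and pathname can be tried out with Python's standard library. This is a minimal sketch: the function name `resource_id` is made up for illustration, and the DNS lookup that turns the host name into an IP number is deliberately left out.

```python
from urllib.parse import urlsplit

def resource_id(url, default_port=80):
    """Split a URL into (host, port, pathname); hypothetical helper name."""
    parts = urlsplit(url)
    # The host name would still need a DNS lookup to become an IP number;
    # that step is not performed here.
    port = parts.port or default_port  # fall back to 80 when no port is given
    return parts.hostname, port, parts.path

print(resource_id("http://www.cdk3.net:8888/WebExamples/earth.html"))
```

Note that `urlsplit` keeps the leading slash of the pathname, whereas the figure shows the pathname without it.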
HTTP Transactions
HTTP uses the client-server model:


An HTTP client opens a connection and sends a
request message to an HTTP server;
the server then returns a response message, usually
containing the resource that was requested.
After delivering the response, the server closes
the connection (making HTTP a stateless
protocol, i.e. not maintaining any connection
information between transactions).
HTTP Protocol
http: hypertext transfer protocol
• WWW’s application layer protocol
• client/server model
  • client: browser that requests, receives, “displays” WWW objects
  • server: WWW server sends objects in response to requests
• http1.0: RFC 1945
• http1.1: RFC 2068
[Figure: a PC running Explorer and a SUN running Netscape Navigator as clients of a server running Apache Web server.]
HTTP Protocol
http: TCP transport service:
• client initiates TCP connection (creates socket) to server, port 80
• server accepts TCP connection from client
• http messages (application-layer protocol messages) exchanged between browser (http client) and WWW server (http server)
• TCP connection closed
http is “stateless”
• server maintains no information about past client requests
Protocols that maintain “state” are complex!
• past history (state) must be maintained
• if server/client crashes, their views of “state” may be inconsistent, must be reconciled
HTTP Protocol
The formats of the request and response messages are similar and English-oriented. Both kinds of messages consist of:
• an initial line,
• zero or more header lines,
• a blank line (i.e. a CRLF by itself), and
• an optional message body (e.g. a file, or query data, or query output).
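The four-part layout can be illustrated with a small parser. This is a sketch under simplifying assumptions: the name `split_message` is hypothetical, header names are lower-cased for lookup, and header lines that continue over several lines are not handled.

```python
def split_message(raw):
    """Split a raw HTTP message (bytes) into (initial_line, headers, body)."""
    # The blank line (a CRLF by itself) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    initial_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return initial_line, headers, body

message = (b"HTTP/1.0 200 OK\r\n"
           b"Content-Type: text/html\r\n"
           b"Content-Length: 13\r\n"
           b"\r\n"
           b"<html></html>")
initial_line, headers, body = split_message(message)
print(initial_line, headers, body)
```

The same function works for requests, since both kinds of messages share this layout.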
Request
Initial Request Line


A request line has three parts, separated by spaces: a
method name, the local path of the requested
resource, and the version of HTTP being used.
A typical request line is:
GET /path/to/file/index.html HTTP/1.0



GET is the most common HTTP method; it says "give me this
resource". Other methods include POST and HEAD-- more on
those later. Method names are always uppercase.
The path is the part of the URL after the host name, also called
the request URI (a URI is like a URL, but more general).
The HTTP version always takes the form "HTTP/x.x",
in uppercase.
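As an illustration of the three-part request line, the following sketch splits a line into method, path, and version and applies the two uppercase rules. The function name `parse_request_line` and the choice to raise `ValueError` are assumptions made for this example.

```python
def parse_request_line(line):
    """Return (method, path, (major, minor)) for a request line."""
    method, path, version = line.split(" ")  # exactly three parts
    if not method.isupper():
        raise ValueError("method names are always uppercase")
    if not version.startswith("HTTP/"):
        raise ValueError('version must take the form "HTTP/x.x"')
    major, minor = version[len("HTTP/"):].split(".")
    return method, path, (int(major), int(minor))

print(parse_request_line("GET /path/to/file/index.html HTTP/1.0"))
```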
HTTP Request Header Format
Two types of messages: request, response
http request message: ASCII (human-readable format)

  GET /somedir/page.html HTTP/1.1             request line (GET, POST, HEAD commands)
  Connection: close                           header lines
  User-agent: Mozilla/4.0
  Accept: text/html, image/gif,image/jpeg
  Accept-language:en
  (extra carriage return, line feed)          carriage return, line feed indicates end of message
Response/Reply
Initial Response Line (Status Line). The initial response line, called the status line, also has three parts separated by spaces:
• the HTTP version,
• a response status code that gives the result of the request, and
• an English reason phrase describing the status code.
Typical status lines are:
• HTTP/1.0 200 OK
• HTTP/1.0 404 Not Found
HTTP Reply Header Format

  HTTP/1.1 200 OK                              status line (protocol, status code, status phrase)
  Connection: close                            header lines
  Date: Thu, 06 Aug 1998 12:00:15 GMT
  Server: Apache/1.3.0 (Unix)
  Last-Modified: Mon, 22 Jun 1998 …...
  Content-Length: 6821
  Content-Type: text/html

  data data data data data ...                 data, e.g., requested html file
HTTP Reply Status Code
• 200 OK
  request succeeded, requested object later in this message
• 301 Moved Permanently
  requested object moved, new location specified later in this message (Location:)
• 400 Bad Request
  request message not understood by server
• 404 Not Found
  requested document not found on this server
• 505 HTTP Version Not Supported
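Python's standard `http.HTTPStatus` enumeration carries the same codes and reason phrases, which makes it easy to look them up. The sketch below also groups each code by its first digit; that grouping comes from the HTTP specification rather than from the list above, and the function name `describe` is only illustrative.

```python
from http import HTTPStatus

# Status classes by first digit, as defined in the HTTP specification.
CLASSES = {1: "informational", 2: "success", 3: "redirection",
           4: "client error", 5: "server error"}

def describe(code):
    """Return 'code phrase (class)' for a numeric status code."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase} ({CLASSES[code // 100]})"

for code in (200, 301, 400, 404, 505):
    print(describe(code))
```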
Sample HTTP Exchange
To retrieve the file at the URL
http://www.somehost.com/path/file.html first open a
socket to the host www.somehost.com, port 80 (use
the default port of 80 because none is specified in the
URL). Then, send something like the following through
the socket:
GET /path/file.html HTTP/1.0
From: [email protected]
User-Agent: HTTPTool/1.0
[blank line here]
The server should respond with something like the following, sent
back through the same socket:
HTTP/1.0 200 OK
Date: Fri, 31 Dec 1999 23:59:59 GMT
Content-Type: text/html
Content-Length: 1354
<html>
<body>
<h1>Happy New Millennium!</h1>
(more file contents) . . .
</body>
</html>
After sending the response, the server closes the socket.
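The exchange can be reproduced end to end without touching the network. The sketch below is an assumption-laden illustration: instead of www.somehost.com it starts a throwaway server on 127.0.0.1 using Python's `http.server`, with a made-up `Handler` class and a shortened body, and then sends the request through a plain socket. The From header is omitted.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"<html><body><h1>Happy New Millennium!</h1></body></html>"

class Handler(BaseHTTPRequestHandler):
    """Stand-in origin server that answers every GET with BODY."""
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):
        pass  # keep the example quiet

def fetch(host, port, path):
    """Open a socket, send an HTTP/1.0 GET, read until the server closes."""
    request = (f"GET {path} HTTP/1.0\r\n"
               "User-Agent: HTTPTool/1.0\r\n"
               "\r\n")  # blank line ends the request
    chunks = []
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request.encode("ascii"))
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the socket
                break
            chunks.append(data)
    return b"".join(chunks)

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
response = fetch("127.0.0.1", server.server_port, "/path/file.html")
server.shutdown()
server.server_close()
print(response.decode("iso-8859-1"))
```

Reading until `recv` returns no data relies on the server closing the socket after the response, which is the HTTP/1.0 behaviour described here.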
User-server interaction: authentication
Authentication goal: control access to server documents
• stateless: client must present authorization in each request
• authorization: typically name, password
  • authorization: header line in request
  • if no authorization presented, server refuses access, sends a WWW authenticate: header line in response
Message sequence over time:
  client -> server: usual http request msg
  server -> client: 401: authorization req., WWW authenticate:
  client -> server: usual http request msg + Authorization: line
  server -> client: usual http response msg
  client -> server: usual http request msg + Authorization: line
  server -> client: usual http response msg
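The request/refuse/retry pattern can be sketched in a few lines. The slide only says the authorization is "typically name, password"; the example below assumes the Basic scheme (name and password joined by a colon and base64 encoded), and the function names, the `accounts` dictionary, and the realm string are all hypothetical.

```python
import base64

def basic_authorization(name, password):
    """Build the Authorization header line a client would add (Basic scheme assumed)."""
    token = base64.b64encode(f"{name}:{password}".encode("utf-8")).decode("ascii")
    return f"Authorization: Basic {token}"

def check(request_headers, accounts):
    """Return (status_code, extra_header_line) for a request's headers."""
    scheme, _, token = request_headers.get("Authorization", "").partition(" ")
    if scheme == "Basic":
        try:
            decoded = base64.b64decode(token).decode("utf-8")
        except ValueError:  # malformed base64 or bytes that are not UTF-8
            decoded = ""
        name, _, password = decoded.partition(":")
        if name in accounts and accounts[name] == password:
            return 200, None
    # No (or wrong) authorization presented: refuse and ask for it.
    return 401, 'WWW-Authenticate: Basic realm="documents"'

accounts = {"alice": "secret"}  # made-up account for illustration
print(check({}, accounts))
line = basic_authorization("alice", "secret")
print(check({"Authorization": line.split(": ", 1)[1]}, accounts))
```

Because the server keeps no state, the client has to attach the same Authorization line to every later request.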
User-server interaction: cookies
• Server sends “cookie” to client in response: Set-cookie: #
• Client presents cookie in later requests: cookie: #
• Server matches presented cookie with server-stored cookies
  • authentication
  • remembering user preferences, previous choices
Message sequence:
  client -> server: usual http request msg
  server -> client: usual http response + Set-cookie: #
  client -> server: usual http request msg + cookie: # (server takes cookie-specific action)
  server -> client: usual http response msg
  client -> server: usual http request msg + cookie: # (server takes cookie-specific action)
  server -> client: usual http response msg
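A minimal sketch of the server side of this exchange is shown below. It uses Python's standard `http.cookies.SimpleCookie` to parse the Cookie header; the cookie name `id`, the `sessions` dictionary, and the visit counter standing in for a "cookie-specific action" are assumptions for illustration.

```python
import secrets
from http.cookies import SimpleCookie

sessions = {}  # server-stored cookies: cookie value -> remembered data

def respond(request_headers):
    """Return the extra response headers for one request."""
    cookie = SimpleCookie(request_headers.get("Cookie", ""))
    morsel = cookie.get("id")
    if morsel is not None and morsel.value in sessions:
        # Cookie matches a stored one: take the cookie-specific action.
        sessions[morsel.value]["visits"] += 1
        return {}
    # First contact: issue a new cookie and remember it on the server.
    new_id = secrets.token_hex(8)
    sessions[new_id] = {"visits": 1}
    return {"Set-Cookie": f"id={new_id}"}

first = respond({})                                # no cookie yet
second = respond({"Cookie": first["Set-Cookie"]})  # client presents it again
print(first, second, sessions)
```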
User-server interaction: conditional GET
Goal: don’t send object if client has up-to-date stored (cached) version
• client: specify date of cached copy in http request
  If-modified-since: <date>
• server: response contains no object if cached copy up-to-date:
  HTTP/1.0 304 Not Modified
Message sequence, object not modified:
  client -> server: http request msg, If-modified-since: <date>
  server -> client: http response, HTTP/1.0 304 Not Modified
Message sequence, object modified:
  client -> server: http request msg, If-modified-since: <date>
  server -> client: http response, HTTP/1.1 200 OK … <data>
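The server's decision reduces to one date comparison. The sketch below assumes the server knows the object's last-modified time as an HTTP date string; the function name `conditional_get` and the `"<data>"` placeholder are illustrative, and the dates are reused from other examples in these slides.

```python
from email.utils import parsedate_to_datetime

def conditional_get(last_modified, if_modified_since=None):
    """Return (status, body) for a GET with an optional If-modified-since date."""
    if if_modified_since is not None:
        cached = parsedate_to_datetime(if_modified_since)
        changed = parsedate_to_datetime(last_modified)
        if changed <= cached:
            return "304 Not Modified", None  # no object in the response
    return "200 OK", "<data>"

last_modified = "Thu, 03 Jun 1999 20:16:34 GMT"
print(conditional_get(last_modified, "Fri, 31 Dec 1999 23:59:59 GMT"))  # cached copy is newer
print(conditional_get(last_modified, "Thu, 06 Aug 1998 12:00:15 GMT"))  # cached copy is older
```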
Message format: multimedia extensions
MIME: Multipurpose Internet Mail Extensions, RFC 2045, 2046
additional lines in msg header declare MIME content type

  From: [email protected]
  To: [email protected]
  Subject: Picture of yummy crepe.
  MIME-Version: 1.0                         MIME version
  Content-Transfer-Encoding: base64         method used to encode data
  Content-Type: image/jpeg                  multimedia data type, subtype, parameter declaration

  base64 encoded data .....                 encoded data
  .........................
  ......base64 encoded data
  .
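Python's standard `email` package generates these header lines automatically, which gives a quick way to inspect them. In this sketch the picture bytes are a placeholder rather than a real JPEG, and the From and To lines are left out because the addresses are obscured in the example above.

```python
from email.mime.image import MIMEImage

picture = bytes(range(256))  # placeholder bytes standing in for a JPEG file

# Passing the subtype explicitly avoids any guessing from the data.
msg = MIMEImage(picture, _subtype="jpeg")
msg["Subject"] = "Picture of yummy crepe."

print(msg.as_string())                          # headers, blank line, base64 text
print(msg.get_payload(decode=True) == picture)  # decoding restores the bytes
```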
MIME types
• Text
  example subtypes: plain, html
• Image
  example subtypes: jpeg, gif
• Audio
  example subtypes: basic (8-bit mu-law encoded), 32kadpcm (32 kbps coding)
• Video
  example subtypes: mpeg, quicktime
• Application
  other data that must be processed by reader before “viewable”
  example subtypes: msword, octet-stream
HTTP Headers (samples)
• User-Agent
  Mozilla/4.0
• Accept: (client-side)
  text/html, image/*
• Content-type: (server-side)
  text/html
• Expires, Last-Modified, If-Modified-Since
  • absolute time stamps (1-sec resolution)
  • E.g.: Thu, 03 Jun 1999 20:16:34 GMT
• Accept-Language, Accept-Charset
• Content-encoding
Mean #bytes per header: 300 (requests), 160 (responses)
* Require parsing!
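The absolute time stamp format can be produced and parsed with Python's standard `email.utils` helpers. This is a small sketch; the function name `http_date` is an assumption, and sub-second precision is simply dropped, in line with the 1-second resolution.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def http_date(moment):
    """Format an aware datetime as an HTTP time stamp in GMT."""
    # usegmt=True requires a UTC datetime, so convert first.
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)

stamp = http_date(datetime(1999, 6, 3, 20, 16, 34, tzinfo=timezone.utc))
print(stamp)
print(parsedate_to_datetime(stamp))  # parsing gives back the same instant
```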
HTTP/1.1 Improvements
• Bandwidth optimization
  • persistent connections
  • pipelining: does not block waiting for previous responses
  • end-of-message mechanism
• Content-range
  access only specified “range” of a resource
• Explicit cache control (Cache-control)
• Digest authentication (Content-MD5)
Web Caches (proxy server)
Goal: satisfy client request without involving origin server
• User sets browser: WWW accesses via web cache
• client sends all http requests to web cache
  • if object at web cache, web cache immediately returns object in http response
  • else requests object from origin server, then returns http response to client
[Figure: two clients connected to a proxy server, which connects to two origin servers.]
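The hit-or-fetch logic of a web cache fits in a few lines. The sketch below is deliberately simplified: the class name `WebCache` and the `fetch_from_origin` callback are hypothetical, objects are kept forever, and there is no expiry or revalidation.

```python
class WebCache:
    """Toy proxy cache: serve stored objects, otherwise ask the origin server."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # callable: url -> object
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self._store:
            self.hits += 1               # object at web cache: return it at once
            return self._store[url]
        self.misses += 1
        obj = self._fetch(url)           # else request object from origin server
        self._store[url] = obj
        return obj

origin_requests = []

def origin(url):
    """Stand-in for the origin server; records each request it receives."""
    origin_requests.append(url)
    return f"<contents of {url}>"

cache = WebCache(origin)
cache.get("http://www.somehost.com/path/file.html")
cache.get("http://www.somehost.com/path/file.html")
print(cache.hits, cache.misses, len(origin_requests))
```

The hit ratio is `hits / (hits + misses)`; only misses generate traffic to the origin server.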
Why WWW Caching?
Assume: cache is “close” to client (e.g., in same network)
• smaller response time: cache “closer” to client
• decrease traffic to distant servers
  • link out of institutional/local ISP network often bottleneck
[Figure: an institutional network (10 Mbps LAN) with an institutional cache, connected by a 1.5 Mbps access link to origin servers on the public Internet.]
Web caching (in)effectiveness
• Observed hit ratios below 50%
  • even lower byte-weighted ratios!
• Possible remedies?
  • Prefetching
  • Delta-encoding
  • HTML macros
  • Duplicate suppression (digest-based)
HTTP status & perspective
J. C. Mogul, “What’s wrong with HTTP (and why it doesn’t matter)”, Proc. USENIX Technical Conference, 1999
• Definitely not optimal
• Probably adequate
• It works well enough
• It’s not the only game in town
  • Two-way initiation of operations
  • Real-time
  • Deferred delivery
• Revising it again would be too hard
  • HTTP/1.0 -> HTTP/1.1 evolution took 4+ years!