Lecture note 16

HTTP and Web Server
HTTP
• HTTP is the protocol used
by Web applications to
exchange information.
• HTTP is a simple stateless
request-response protocol.
• HTTP uses TCP.
• Do not confuse HTTP with
HTML, which is a markup
language for describing a
web page’s format.
Web Page
• Consists of objects.
• An object is simply a file – HTML file, JPEG,
Audio clip, etc.
• A web page has a base HTML file that may
reference several objects via each object’s URL.
• A web browser first fetches the web page’s base
HTML file from a web server and then fetches the
objects referenced in the base file.
Uniform Resource Locator (URL)
• It is a pointer to an object or
a service on the Internet.
• It has five components:
– Protocol or application
– Hostname
– TCP/IP port
– Path name
– File name
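The five components can be picked out of a URL with Python's standard library. This is only a sketch: the URL below is a made-up example, and splitting the path into "path name" and "file name" at the last slash is an assumption about how the slide divides it.

```python
from urllib.parse import urlsplit
import posixpath

# Hypothetical URL, used only for illustration.
url = "http://www.example.com:8080/docs/notes/index.html"

parts = urlsplit(url)
protocol = parts.scheme      # protocol or application
hostname = parts.hostname    # hostname
port = parts.port            # TCP/IP port (None if the URL omits it)
# Assumption: "path name" is the directory part, "file name" the last segment.
path_name, file_name = posixpath.split(parts.path)

print(protocol, hostname, port, path_name, file_name)
```

When the port is omitted, the protocol's default is used (80 for HTTP).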
URL Protocols
TCP Connections
• HTTP 1.0 uses nonpersistent connections.
– To fetch each object, the web browser needs to open
and establish a TCP connection to the web server.
• Serial open: Mosaic
• Parallel open: Netscape
• Persistent connections are the default mode for
HTTP 1.1.
– The web browser opens and establishes only one TCP
connection to the web server.
– All objects are then transferred over this TCP connection.
Why Use Persistent Connections?
• If we use nonpersistent connections:
– Parallel open
• “The easiest way to kill the Internet,” in the words of a
famous network researcher, V.B.
• As shown earlier, the packet drop rate at the
bottleneck router grows as the number of competing
TCP connections grows.
– Serial open
• Better for congestion control, but suffers a longer
download delay.
• Why? Each object needs a new TCP connection,
whose 3-way handshake takes 1.5 RTT to complete. The
client can send its request after 1 RTT, together with the
final ACK, so the requested object returns 2 RTT after
the connection was opened.
Persistent Connections
• If we use persistent connections:
– Internet congestion is better controlled.
– The download delay per object is reduced from 2 RTT to only 1 RTT.
• No need to go through the 3-way handshake again.
• Only the request/response delay remains.
• To further reduce download time, a pipelining
technique can be used (the default mode).
– Requests can be sent before the response to the previous
request returns.
– All objects can be requested and returned in one RTT (if
they are small).
• The server will close the connection after a certain
period of idle time.
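The delay figures from the last two slides can be combined into a rough page-download estimate. The sketch below assumes a page made of one base file plus a number of small referenced objects, ignores transmission time, and counts delay in whole RTTs. The function name and mode labels are illustrative, not part of HTTP.

```python
def download_delay_rtts(num_objects: int, mode: str) -> int:
    """Delay, in RTTs, to fetch a base file plus `num_objects` small objects."""
    if mode == "nonpersistent-serial":
        # Every fetch pays handshake + request/response: 2 RTT each.
        return 2 * (1 + num_objects)
    if mode == "persistent":
        # One handshake (1 RTT), then 1 RTT per request/response.
        return 1 + (1 + num_objects)
    if mode == "persistent-pipelined":
        # Handshake, base file, then all objects requested back to back.
        return 1 + 1 + (1 if num_objects > 0 else 0)
    raise ValueError(f"unknown mode: {mode}")


for mode in ("nonpersistent-serial", "persistent", "persistent-pipelined"):
    print(mode, download_delay_rtts(10, mode))
```

For a page with 10 objects this gives 22, 12, and 3 RTTs respectively, which is why persistent connections with pipelining are attractive for pages with many small objects.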
HTTP Request Message
[Figure: an example HTTP request message. Surviving annotations:
“Don’t want to use persistent connections”; “Used by POST request”.]
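Since the figure itself is lost, here is a minimal sketch of what a request message looks like on the wire. It assumes that the first annotation referred to the `Connection: close` header and the second to the entity body. The helper `build_request`, the host name, and the form data are hypothetical.

```python
def build_request(method, path, host, body=b"", persistent=True):
    """Assemble a raw HTTP/1.1 request message as bytes."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    if not persistent:
        # Tells the server to close the TCP connection after the response.
        lines.append("Connection: close")
    if body:
        # The entity body (used by POST) needs its length announced.
        lines.append(f"Content-Length: {len(body)}")
    # A blank line separates the header lines from the entity body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


get_msg = build_request("GET", "/index.html", "www.example.com", persistent=False)
post_msg = build_request("POST", "/cgi-bin/login.sh", "www.example.com",
                         body=b"user=alice&password=secret")
print(get_msg.decode("ascii"))
print(post_msg.decode("ascii"))
```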
HTTP Response Message
[Figure: an example HTTP response message. Surviving annotation:
“Used for cache control”.]
Methods (Requests)
• GET
– Retrieve whatever information identified by the URL.
• HEAD
– Identical to GET but does not return the entity body
• POST
– Transfer some information to the server. The function performed
depends on the requested URL.
• PUT
– Store a file at the requested URL, overwriting any existing file
• DELETE
– Delete a file
• OPTIONS
– Ask a server to return its features and capabilities
• TRACE
– Echo-back service for debugging purposes
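The difference between the methods can be seen in a toy, in-memory sketch. The function `handle` and the dictionary standing in for the server's files are assumptions made for illustration. A real server reports its supported methods in an `Allow` header rather than in the body, and POST and TRACE are left out here.

```python
def handle(method, url, store, body=b""):
    """Return (status_code, entity_body) for a request against `store` (URL -> bytes)."""
    if method in ("GET", "HEAD"):
        if url not in store:
            return 404, b""
        # HEAD is identical to GET but does not return the entity body.
        entity = store[url] if method == "GET" else b""
        return 200, entity
    if method == "PUT":
        created = url not in store
        store[url] = body                  # create or overwrite the file
        return (201 if created else 204), b""
    if method == "DELETE":
        if url not in store:
            return 404, b""
        del store[url]
        return 204, b""
    if method == "OPTIONS":
        # Simplification: list the supported methods in the body.
        return 200, b"GET, HEAD, PUT, DELETE, OPTIONS"
    return 501, b""                        # method not implemented in this sketch


files = {}
print(handle("PUT", "/a.txt", files, b"hello"))
print(handle("GET", "/a.txt", files))
print(handle("HEAD", "/a.txt", files))
print(handle("DELETE", "/a.txt", files))
```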
Content Negotiation
• Because the capabilities of users, browsers, and agents (e.g.,
network bandwidth, screen resolution and size, CPU power)
may differ widely, HTTP provides a content negotiation
mechanism that can be used to choose the best representation
of an object.
• Server-driven negotiation
– The server decides which representation is best for a client
and sends back the chosen representation.
• Agent-driven negotiation
– The server just returns the available representation options.
The client makes the decision.
• Transparent negotiation (transparent from the client’s viewpoint)
– A proxy server can do the agent-driven negotiation on
behalf of the client after getting a list of representation
options from the origin server.
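Server-driven negotiation can be sketched as follows, assuming the client states its preferences in an `Accept` header with optional `q` weights. The functions `parse_accept` and `choose` are illustrative helpers, and wildcard entries such as `*/*` are not handled.

```python
def parse_accept(header):
    """Parse an Accept-style header into a list of (value, q) pairs."""
    prefs = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        if not fields[0]:
            continue
        q = 1.0                            # default weight when no q is given
        for param in fields[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        prefs.append((fields[0], q))
    return prefs


def choose(header, available):
    """Pick the available representation that the client rates highest."""
    best, best_q = None, 0.0
    for value, q in parse_accept(header):
        if value in available and q > best_q:
            best, best_q = value, q
    return best


accept = "image/webp;q=0.9, image/png, image/gif;q=0.5"
print(choose(accept, ["image/gif", "image/webp"]))
```

In agent-driven negotiation the server would instead return the list of available representations and leave the choice to the client.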
Statelessness Causes Problems
• HTTP is a stateless protocol, which makes it
robust in the presence of crashes.
• However, in some applications, the web server
may want to keep state for a client.
– Example 1: Only allow an already authenticated user to
access some web pages.
– Example 2: Implement shopping carts, which can
accumulate (remember) the goods that a user has
ordered.
– Example 3: Personalized commercial product
advertisements (knowing your preferences, your
personal home page, etc.).
• Authentication headers and cookies are two solutions.
Authentication
• The client first sends an ordinary request to the
server.
• The server then responds with an empty body and a
401 status code (“Unauthorized”; some servers display
it as “Authorization Required”).
– A WWW-Authenticate: header is included, which
specifies how to do the authentication.
• The client resends the request, this time
including an Authorization: header.
– This header typically includes the username and
password information.
• After obtaining the first object, the client
continues to send the cached username and
password in subsequent requests.
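As a sketch of what the Authorization header can carry, the code below assumes the Basic scheme, in which the username and password are joined with a colon and Base64-encoded. The slides do not say which scheme is meant, and the credentials are made up.

```python
import base64


def basic_authorization(username, password):
    """Build the value of an Authorization header for the Basic scheme."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def parse_basic_authorization(value):
    """Server side: recover (username, password) from the header value."""
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


header = basic_authorization("alice", "secret")
print("Authorization:", header)
print(parse_basic_authorization(header))
```

Base64 is an encoding, not encryption, so this scheme protects the password only when the connection itself is encrypted.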
Cookies
• A client contacts a web site for the first time.
• The server’s response includes a Set-Cookie: header.
– This line includes an identification number.
– E.g., Set-Cookie: 1678453
• When the client receives the response message and
sees Set-Cookie:, it appends the line to its cookie file.
• In subsequent requests to the same server, the client
includes Cookie: 1678453 in its request headers.
• In this manner, the server does not know the user’s
name, but it does know that this user is
the same user that made the earlier requests.
– A shopping cart can thus be implemented.
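The exchange above can be simulated with a toy server that keeps one shopping cart per cookie value. The class `CartServer` and its method are hypothetical. Headers are modelled as plain dictionaries, and the bare number follows the slide's simplified example, whereas real cookies are name=value pairs.

```python
import itertools


class CartServer:
    """Toy server that tracks one shopping cart per cookie value."""

    def __init__(self, first_id=1678453):
        self._next_id = itertools.count(first_id)
        self._carts = {}

    def handle(self, request_headers, add_item=None):
        """Return (response_headers, cart_contents) for one request."""
        response_headers = {}
        cookie = request_headers.get("Cookie")
        if cookie not in self._carts:
            # First contact (or unknown cookie): issue a new identification number.
            cookie = str(next(self._next_id))
            self._carts[cookie] = []
            response_headers["Set-Cookie"] = cookie
        if add_item is not None:
            self._carts[cookie].append(add_item)
        return response_headers, list(self._carts[cookie])


server = CartServer()
first_headers, _ = server.handle({}, add_item="book")
# The client stores the cookie and sends it back on later requests.
_, cart = server.handle({"Cookie": first_headers["Set-Cookie"]}, add_item="pen")
print(first_headers, cart)
```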
Cookie Applications
• The server requires authentication but does not
want to hassle the user with a password prompt every
time.
• The server wants to remember a user’s preferences so
that it can provide targeted advertisements.
• The server wants to implement a shopping cart, or to
log a user in to his or her home page automatically when
the user contacts a web site such as Yahoo.
• Problem: Cookies pose problems for nomadic users
who access the same site from different machines.
• Problem: There are also many privacy issues.
Cache Control Mechanism
• To reduce network bandwidth usage and download delay,
a cache is often used.
• A cache can be placed at the client, the server, or
a proxy.
– Client side: very good for the “back” button
– Server side: multiple cache servers sit in front of the server to
do load balancing
– Proxy side: multiple clients share the cache in the proxy server
• A proxy server acts as both a client and a server.
– It receives Web requests from clients.
– If its cache has the requested objects, these objects are returned
to the clients.
– Otherwise, it makes requests to the origin server to download the
requested objects into its cache, and then returns the objects to
the clients.
Expiration Model
• If a web page specifies its lifetime (i.e.,
when its validity expires), the client can
check a cached copy to determine whether
it needs to download the web page from the
origin server again.
• There are two headers designed for this
purpose:
– Expires: Thu, 01 Dec 1994 16:00:00 GMT
– Cache-Control: max-age=3600
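A cache could apply the two headers roughly as sketched below. The function `is_fresh` and the dictionary representation of the headers are assumptions. The sketch prefers `max-age` when both headers are present and treats a response with neither header as stale.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def is_fresh(headers, fetched_at, now):
    """Return True if a cached response is still within its stated lifetime."""
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.strip().lower() == "max-age":
            # Lifetime is relative to the time the response was fetched.
            return now - fetched_at < timedelta(seconds=int(value.strip()))
    if "Expires" in headers:
        # Lifetime is an absolute point in time.
        return now < parsedate_to_datetime(headers["Expires"])
    return False  # no lifetime given: treated as stale in this sketch


fetched = datetime(1994, 12, 1, 15, 0, 0, tzinfo=timezone.utc)
later = fetched + timedelta(minutes=30)
print(is_fresh({"Cache-Control": "max-age=3600"}, fetched, later))
print(is_fresh({"Expires": "Thu, 01 Dec 1994 16:00:00 GMT"}, fetched, later))
```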
Conditional GET
• To save network bandwidth, we do not want
a client to retrieve an object again if it is
still the same as what is cached on the client.
• By using the Last-Modified and If-Modified-Since headers, the GET method
becomes a conditional GET.
• The client will retrieve the object only if it
has been modified since the last download.
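On the server side, the decision can be sketched as follows. The function `conditional_get` is a hypothetical helper that returns a 304 (Not Modified) status with an empty body when the client's copy is still current, and the full object otherwise.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(request_headers, body, last_modified):
    """Return (status, headers, body) for an object last changed at `last_modified`."""
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    since = request_headers.get("If-Modified-Since")
    if since is not None and last_modified <= parsedate_to_datetime(since):
        return 304, headers, b""       # Not Modified: the client keeps its copy
    return 200, headers, body          # send the (new) object


modified = datetime(1994, 12, 1, 16, 0, 0, tzinfo=timezone.utc)
status, headers, _ = conditional_get({}, b"<html>...</html>", modified)
print(status, headers)

# The client caches the object and revalidates with the date it received.
revalidate = {"If-Modified-Since": headers["Last-Modified"]}
print(conditional_get(revalidate, b"<html>...</html>", modified)[0])
```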
Conditional GET Example
E-Tag Method
• Besides comparing modification times, there is another way
to check whether two objects are still the same.
• This method compares an entity tag, which can be any
number or text string (e.g., ETag: "abc").
• The E-tag functions as a generation number. Whenever the
object is modified at the origin server, the object’s E-tag is
changed.
• When an object is downloaded, its E-tag is downloaded
with it to the client.
• Later, if the client wants to access the same object, its request
includes the downloaded E-tag (in an If-None-Match header).
• By comparing the current and submitted E-tags, the server
knows whether it should return a more recent object to
the client.
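The generation-number idea can be sketched with a small class. `TaggedObject` is a hypothetical name, the tag is simply a counter in quotes, and the client is assumed to send the tag back in an `If-None-Match` header.

```python
class TaggedObject:
    """An object whose entity tag is a generation number bumped on each change."""

    def __init__(self, body):
        self.body = body
        self.generation = 1

    @property
    def etag(self):
        return f'"{self.generation}"'

    def modify(self, new_body):
        self.body = new_body
        self.generation += 1           # a new generation means a new E-tag

    def get(self, request_headers):
        """Return (status, headers, body), honouring If-None-Match."""
        if request_headers.get("If-None-Match") == self.etag:
            return 304, {"ETag": self.etag}, b""
        return 200, {"ETag": self.etag}, self.body


page = TaggedObject(b"version one")
status, headers, _ = page.get({})
cached_tag = headers["ETag"]
print(page.get({"If-None-Match": cached_tag})[0])   # unchanged object
page.modify(b"version two")
print(page.get({"If-None-Match": cached_tag})[0])   # object was modified
```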
No Cache
• In some situations, we may not want our web pages
to be cached. For example:
– A commercial web site (e.g., Yahoo) wants to know its
real hit count to price its advertisements. Using a cache
would hide its users.
– You are afraid of losing control over your content.
People just keep spreading your content without coming
back to your origin site to download new content.
• Methods that you can use to disable caching:
– Cache-Control: max-age=0
– Cache-Control: no-cache
– Expires: (set to a time that has already passed)
– Pragma: no-cache
Request Redirection
• A server can redirect a received request to another server
or proxy if it does not have the requested object and it
knows that some other server/proxy may have it.
• This feature can be used as a cache mechanism to reduce
network bandwidth usage or download delay.
• This feature can also be used as a load balancing
mechanism for a very busy web site such as Yahoo
(yahoo.com -> yahoo3.com):
– LAN: in a closet (e.g., a web cluster)
– WAN: spread over the globe
• The header designed for this purpose:
– Location: http://www.yahoo3.com/ (a response header)
• Note: This mechanism is a layer-7 mechanism. It is easy
to use but its performance is not very good.
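Redirection for load balancing can be sketched as below. The round-robin policy, the 302 status code, and the mirror host names are assumptions chosen for illustration. The slides only specify that the response carries a Location header.

```python
import itertools


def make_redirector(mirrors):
    """Return a handler that redirects each request to the next mirror in turn."""
    rotation = itertools.cycle(mirrors)

    def handle(path):
        # 302 tells the client to repeat the request at the Location URL.
        return 302, {"Location": next(rotation) + path}

    return handle


# Hypothetical mirror servers.
redirect = make_redirector(["http://www1.example.com", "http://www2.example.com"])
for _ in range(3):
    print(redirect("/index.html"))
```

Each redirect costs the client an extra request/response round trip, which is one reason the performance of this layer-7 approach is limited.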
Referer
• If you have a web server hosting several web
pages, you may be interested in knowing who, or
which web sites, have links pointing to your web
pages. Knowing this information may be useful:
– Know which web sites are taking advantage of your
web pages (may be a good or bad thing)
– Find a better cache location/hierarchy to serve your
web pages, etc.
• This information must of course be provided by
the client (it is optional). The header used for this
purpose is:
– Referer: http://www.csie.nctu.edu.tw/office.
Preferred Language
• When we connect to the Google search engine,
how is it smart enough to return a Chinese web page
instead of an English one?
• Of course, the client needs to state its language
preference explicitly. Otherwise, Google has
no way to know which language you prefer to see.
• Header designed for this purpose:
– Accept-Language: zh, en
Dynamic Pages
• Some web pages are generated dynamically. The
content of these pages may depend on the user’s
identity (e.g., cookies) or on user input, not only on
the URL. These pages are thus usually not
cacheable.
• There are many ways to generate dynamic web
pages. One of them is through CGI (Common
Gateway Interface).
• CGI is not a programming language, but rather a
specification that allows an HTTP server to
exchange information with other programs (PHP,
Perl scripts, C programs, ...).
Form and CGI Example
<html><head><title>Login</title></head>
<body>
<br><br><br>
<form method="post" action="cgi-bin/login.sh">
<div align="center">
<p><img src="image/logot.gif" width="244" height="124"></p>
<p><b>User Name</b>:
<input name="user" type="text" size="10">
<br>
<b>Password: </b>
<input name="password" type="password" size="16">
<br>
<input name="enter" type="submit" value="login">
</p>
</div>
</form>
</body>
</html>
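On the receiving end, a CGI program gets the request metadata in environment variables and the POSTed form data on standard input. The sketch below shows how the fields of the form above could be decoded. It is written in Python for illustration (the form itself posts to a shell script), the helper `read_form` is hypothetical, and the environment and input stream are passed in explicitly so the example runs without a web server.

```python
import io
from urllib.parse import parse_qs


def read_form(environ, stdin):
    """Decode the fields of a POSTed form the way a CGI program receives them."""
    # The server puts the body length in the CONTENT_LENGTH variable.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    raw = stdin.read(length).decode("ascii")
    # Fields arrive URL-encoded, e.g. "user=alice&password=secret".
    return {name: values[0] for name, values in parse_qs(raw).items()}


# Simulate what the server would hand over for the login form (values made up).
body = b"user=alice&password=secret&enter=login"
environ = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(body))}
fields = read_form(environ, io.BytesIO(body))
print(fields["user"], fields["enter"])
```

In a real CGI script the same function would be called with `os.environ` and `sys.stdin.buffer`.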
Virtual Hosts
• It is economical for a web hosting company to use just
one web server to serve many web sites, rather than
one server for each web site.
• For example: www.abc.com, www.xyz.com, and
www.nctu.edu.tw are all served by the same server.
• To do this, we can configure the web hosting
machine with many IP addresses (although it has
only one physical network interface card), then tell
the web server which IP address maps to which web
site.
• Of course, you need to configure the DNS server so
that it returns one of these IP addresses for
www.abc.com.
Virtual Hosts Example
Access Audio/Video Servers
• Nowadays most web browsers support
downloading audio and video files.
• To play audio/video files, however, a web browser
usually needs a helper program that understands
the complicated formats of these audio/video files.
• The problem with implementing this function is that
before the browser contacts the web server, it does
not know the format of the A/V file to be downloaded.
Thus it cannot fork an appropriate helper program to
receive the downloaded file.
A Naive Implementation
• Instead, the browser has to receive the whole A/V file, then
fork the helper program, then transfer the file to the helper
program.
• This design is inefficient, especially when the file is very large.
A Better Implementation for Whole Download
• To solve the problem, the browser first downloads a short
metafile describing the format of the A/V file.
• The browser, upon learning the format of the A/V file,
forks the helper program and asks the helper program
to receive and play back the A/V file.
• In this case, the A/V file is stored on the same web
server and is transferred back by that web server
like a normal HTML file (over TCP).
A Better Implementation for Streaming
• If the media is a stream such as a movie or music
clip and the transfer is a streaming transfer (i.e.,
the helper can start playing the media without
receiving the whole file), a normal web server
does not know how to deal with streaming.
• To solve this problem, we need to set up a
streaming server and let the helper program
contact the streaming server directly.
• The streaming server usually uses UDP to send
the A/V packets to the helper program.
