Agenda (1) - University of Wollongong
BUSS 909
Office Automation & Intranets
Lecture 6
Web Architecture and
Standards
Clarke, R. J (2001) L909-06:
1
Notices (1)
Assignment 2 is available from the
BUSS909 Intranet- includes a Marking
Criteria sheet
there are files on the intranet that provide
information needed for the assignment:
Organising Structures and Schemes
Media & Content Classification
Navigation, Labeling and Searching
Notices (2)
Additional files have been placed on the
BUSS909 Intranet
a file on the fundamentals of ‘Information
Theory and Systems Theory’, called
sl909-00.ppt
an introduction to different types of services
on the internet is available in a file called
sl909-03.ppt
Agenda (1)
WWW Basics
Web Server Overview
Web Documents & Trees
Hypertext Transfer Protocol (HTTP)
Serving a Web Document- Example
WWW Basics
WWW Basics
WWW and the Internet
Web Client and Web Server Software
Uniform Resource Locators (URLs)
Hypertext Transfer Protocol (HTTP)
Hypertext Markup Language (HTML)
Uniform Resource Locators
Uniform Resource Locators (1)
Definition
a Uniform Resource Locator (URL) is
the address of a network resource.
URLs for the WWW actually contain
several components
the first component identifies the
URL scheme or protocol being used
to transfer information
Uniform Resource Locators (2)
Some Popular URL Schemes
http      Hypertext Transfer Protocol
https     HTTP using Secure Sockets Layer (SSL)
mailto    E-mail Address
ftp       File Transfer Protocol
finger    Finger protocol
gopher    Gopher protocol
wais      Wide Area Information Server
news      Usenet news
nntp      Usenet news via Network News Transfer Protocol (NNTP)
snews     Usenet news via SSL-encrypted NNTP
file      Host-specific filenames
irc       Internet Relay Chat session
telnet    Telnet interactive session
Uniform Resource Locators (3)
Server Name & Resource
the second component identifies the
name of a server sitting on the
Internet from which a resource is
being requested
the third component identifies part of
the server’s subdirectory and the file
name for a resource- most likely a
HTML document
Uniform Resource Locators (4)
‘Complete URL’ to UOW Home Page
http://www.uow.edu.au/index.html
URL scheme: http
server name: www.uow.edu.au
server’s subdirectory and resource file name: /index.html
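As a rough illustration of the three components, Python's standard urllib.parse module can split the UOW URL. The variable names are chosen here for illustration and are not part of the lecture.

```python
from urllib.parse import urlsplit

# Split the complete URL from the slide into its components.
parts = urlsplit("http://www.uow.edu.au/index.html")

scheme = parts.scheme    # first component: URL scheme (protocol)
server = parts.netloc    # second component: server name
resource = parts.path    # third component: subdirectory and file name

print(scheme, server, resource)
```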
Uniform Resource Locators (5)
Incomplete URL to UOW Home Page
However, the shorter URL
http://www.uow.edu.au/
points to the ‘home page’ of that server
Web servers have a default filename
often default.html or index.html
Note: either this URL or the previous
one enables the user to view the home
page for UOW web site
Uniform Resource Locators (6)
Omitting the Scheme in Web URLs
Because of the popularity of WWW,
the scheme is occasionally omitted
web browsers are able to supply
this part of web URLs
the URL terra.uow.edu.au is
interpreted by Netscape as
http://terra.uow.edu.au/
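A minimal sketch of the substitution a browser might perform, assuming http is the default scheme. The helper add_default_scheme is hypothetical and is not how Netscape is actually implemented.

```python
from urllib.parse import urlsplit

def add_default_scheme(address: str) -> str:
    """Hypothetical helper: prepend http:// when no scheme is given."""
    if "://" in address:
        return address
    url = "http://" + address
    # Add the root path when the address names only a server.
    if urlsplit(url).path == "":
        url += "/"
    return url

completed = add_default_scheme("terra.uow.edu.au")
```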
Uniform Resource Locators (7)
Partial or Relative Web URLs
a partial or relative URL is one which
omits one or more of the leading parts
(protocol, host, port, or path) and is
resolved against the URL of the
document that refers to it
eg. rsch-ss.htm when referenced by
http://www.uow.edu.au/commerce/buss/research.htm
is a relative form of
http://www.uow.edu.au/commerce/buss/rsch-ss.htm
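The resolution of a relative URL against the referring document can be tried with urljoin from Python's standard library. This is only a sketch of the idea, using the two URLs from the slide.

```python
from urllib.parse import urljoin

# URL of the document that contains the relative reference.
base = "http://www.uow.edu.au/commerce/buss/research.htm"

# The relative URL replaces the last path segment of the base.
absolute = urljoin(base, "rsch-ss.htm")
print(absolute)
```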
Uniform Resource Locators (8)
Anchors in Web URLs
Web URLs support the use of a # sign after
the HTML filename to indicate an anchor
for example,
http://www.uow.edu.au/residences/inter_house.htm#Facilities
refers to the “Facilities” section of the
document inter_house.htm
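A small sketch of separating the anchor from the document address, assuming the document is addressed as inter_house.htm. The anchor part is used by the browser to position the page; it is not sent to the server as part of the request.

```python
from urllib.parse import urldefrag

# urldefrag returns the URL without the fragment, and the fragment itself.
document_url, anchor = urldefrag(
    "http://www.uow.edu.au/residences/inter_house.htm#Facilities"
)
print(document_url, anchor)
```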
Uniform Resource Locators (9)
Preserving State Information in URLs ...
WWW is inherently stateless
once a request from a client is
answered by a HTTP server, the
transaction is effectively concluded
the transaction’s current status is
lost; that is, it is not normally recorded
for future transactions
Uniform Resource Locators (10)
… Preserving State Information in URLs ...
state information must be available
for many uses like:
electronic commerce across internet
(shopping carts), extranet (EDI), etc
researching on the web with search
engines which generally involves
multiple attempts at converging on a
small set of useful sources
Uniform Resource Locators (11)
… Preserving State Information in URLs ...
however, state can be preserved for the
duration of a user’s session by placing
additional information into the URL
this information is typically sent to the
CGI-BIN area on the server- the CGI-BIN
area is where user provided executable
routines are placed for execution
during a user’s session
Uniform Resource Locators (12)
… Preserving State Information in URLs ...
conventions exist for passing state
information to CGI routines
search parameters can form state
information- for example, the search
term “intranets” can be sent as a
parameter to the query routine
located in the CGI-BIN area of the
AltaVista search engine
Uniform Resource Locators (13)
… Preserving State Information in URLs
Everything after the ? is the
parameter string that is passed to the
query routine located on the AltaVista
site
http://www.altavista.com/cgi-bin/query?pg=q&kl=XX&q=intranets&search=Search
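To see how the parameter string breaks down into name=value pairs, the URL can be taken apart with Python's standard library. This is a sketch of the convention only; it says nothing about how the AltaVista query routine itself works.

```python
from urllib.parse import urlsplit, parse_qs

url = ("http://www.altavista.com/cgi-bin/"
       "query?pg=q&kl=XX&q=intranets&search=Search")

parts = urlsplit(url)
routine = parts.path            # the routine in the CGI-BIN area
params = parse_qs(parts.query)  # each name maps to a list of values

print(routine, params)
```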
Web Server Overview
Web Server Overview
Web Server Components
Relationship to HTTP
Limits of Web Servers
Web Documents & Trees
Web Documents & Trees
MIME file extensions and types
Documents, Links and Anchors
Document Tree Organisation
Hypertext Transfer Protocol
Hypertext Transfer Protocol
browser and server communicate
using HTTP
simple set of rules designed to be
suitable for hypermedia systems
distributed across networks
must understand this protocol in order
to understand the WWW
HTTP defines a simple request-response ‘conversation’
Hypertext Transfer Protocol
HTTP does define how to correctly
format the request and the response
the client- often but not necessarily a
browser- is the requesting program and
establishes a connection to the
receiving program or server
the server replies with a response
including the requested information if
possible
Hypertext Transfer Protocol
HTTP does not define:
how the network connection is made or
managed, or
how the information is actually transmitted
(this is done by lower-level protocols such
as TCP/IP)
HTTP requests consist of a method, a
Uniform Resource Identifier (URI), a
protocol version, and other information
Hypertext Transfer Protocol
HTTP Requests: Methods ...
HTTP Methods- commonly supported
methods include:
GET- returns the object; that is, it
retrieves the information
HEAD- returns only information about
the object, but not the object itself
POST- sends information to the server
for processing or storage (eg. input to scripts)
Hypertext Transfer Protocol
... HTTP Requests: Methods
some HTTP methods are not supported
by many browsers because they may
put the integrity of the server at risk:
PUT- send a new copy of an existing
object
DELETE- permanently remove an object
other methods may be added to the
standard in the future- HTTP is
extensible and has evolved slowly
Hypertext Transfer Protocol
HTTP Requests: Information Client -> Server
User-Agent: kind of browser making request
If-Modified-Since: the object is returned only
if it is newer than a specified date (can save
the cost of a retrieval)
Accept: the MIME types and formats the
browser has been configured to accept (can
save the cost of downloading an unreadable
document)
Authorization: user password etc. as required
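A minimal sketch of how a request might be put together from a method, a URI, a protocol version and client-to-server header lines. The function build_request, the path /sample.htm and the browser name are illustrative assumptions.

```python
def build_request(method, uri, headers, version="HTTP/1.0"):
    """Assemble the text of an HTTP request (illustrative helper)."""
    lines = [f"{method} {uri} {version}"]
    lines += [f"{name}: {value}" for name, value in headers]
    # A blank line marks the end of the header section.
    return "\r\n".join(lines) + "\r\n\r\n"

request = build_request("GET", "/sample.htm", [
    ("User-Agent", "ExampleBrowser/1.0"),   # hypothetical browser name
    ("Accept", "text/html"),
])
print(request)
```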
Serving Documents- Example
Serving Documents- Example
1: Server waits for a new request
the httpd program waits for a client’s request
to arrive from somewhere on the Internet
server listens to a port until someone
calls it and until that occurs it is dormant
Serving Documents- Example
2: Request arrives from client ...
ultimately a request is sent by a
client to the server either by typing a
URL or selecting a HTML anchor
the network software (client) locates
the server computer and sets up a
two-way network connection from the
client to the server
Serving Documents- Example
... 2: Request arrives from client
client can locate servers by the use of
Internet protocols and the name service
(DNS) to locate and initiate a connection
with the server
once the connection is established the
client sends the HTTP request:
GET /sample.htm HTTP/1.0
sent over the network in ASCII, server
receives it and saves it
Serving Documents- Example
3: server parses the request ...
server decodes the request using
HTTP protocol to determine what to do
there are three important pieces of
information:
the method instructs the server as to
what action should be taken. The GET
method is used to locate and read the file
and return it to the client ...
Serving Documents- Example
... 3: server parses the request
the document (/sample.htm) can be
fetched by the server because it knows
where it is in the document tree, and the
browser protocol being used (HTTP/1.0)
so that the contents can eventually be
returned to the client sent back over the
same connection as the request. (Note
that the server need not find the client on
the Internet or make a new connection)
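The three pieces of information can be pulled out of the request line with a few lines of Python. This is a sketch only; parse_request_line is an illustrative name, and a real httpd does considerably more checking.

```python
def parse_request_line(line: str):
    """Split a request line into method, document, and protocol."""
    fields = line.split()
    if len(fields) != 3:
        raise ValueError(f"malformed request line: {line!r}")
    method, document, protocol = fields
    return method, document, protocol

method, document, protocol = parse_request_line("GET /sample.htm HTTP/1.0")
print(method, document, protocol)
```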
Serving Documents- Example
4: Read other information (if necessary) ...
the httpd program reads the rest of the
requests needed
using HTTP/1.0 the browser is expected
to send additional information about
itself to the server
this meta-information describes the
browser and its capabilities which may
be needed by the server to reply to the
request
Serving Documents- Example
... 4: Read other information (if necessary)
for example:
User-agent: Mosaic for X Windows/2.4
Accept: text/plain
Accept: text/html
Accept: image/*
indicates the browser is Mosaic,
configured to display plain text, HTML and any
kind of image
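A server could collect this meta-information along the following lines. This is a sketch, assuming each header arrives as one "Name: value" line; parse_headers is an illustrative name.

```python
def parse_headers(lines):
    """Group header values by lower-cased header name."""
    headers = {}
    for line in lines:
        name, _, value = line.partition(":")
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return headers

headers = parse_headers([
    "User-agent: Mosaic for X Windows/2.4",
    "Accept: text/plain",
    "Accept: text/html",
    "Accept: image/*",
])
print(headers["accept"])
```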
Serving Documents- Example
5: Do the requested method ...
Assuming no errors, the httpd
program executes the request
to GET a document requires looking
up the file /sample.htm in its document
tree using the operating system’s
standard file services
there are two alternative courses of
action depending on success or failure
Serving Documents- Example
... 5: Do the requested method (Success) ...
the httpd daemon sends a result code and
the information that describes the type of
information expected by the client
as the document is found a code 200
(everything is OK) is sent and the document will
follow
the information is a HTML document so the
Content-type is text/html; the document is 1066
bytes long so the Content-length is 1066
the server software and the file date are also
included
Serving Documents- Example
... 5: Do the requested method (Success)
the header sent to the client might
look something like this:
HTTP/1.0 200 Document follows
Server: NCSA/1.4
Date: Thu, 20 Jul 1996 22:00:00 GMT
Content-type: text/html
Content-length: 1066
Last-modified: Thu, 20 Jul 1996 20:38:40 GMT
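On the client side, a header like this can be read back into a result code and a set of named fields. The sketch below uses the header text from the slide; parse_response_header is an illustrative name.

```python
def parse_response_header(text: str):
    """Return (version, code, reason, headers) from a response header."""
    lines = text.strip().splitlines()
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split at the first colon only; dates contain further colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers

header_text = """HTTP/1.0 200 Document follows
Server: NCSA/1.4
Date: Thu, 20 Jul 1996 22:00:00 GMT
Content-type: text/html
Content-length: 1066
Last-modified: Thu, 20 Jul 1996 20:38:40 GMT"""

version, code, reason, fields = parse_response_header(header_text)
```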
Serving Documents- Example
5: Do the requested method (Failure)...
if the requested file could not be found or
read then the status code will not be 200
the most common problem is that the
name of the requested file is misspelt so
the server cannot find it
if the requested file was called smple.htm
it would not be found- the server would
send a status code 404
Serving Documents- Example
... 5: Do the requested method (Failure)...
the response might look like this:
HTTP/1.0 404 Not Found
Server: NCSA/1.4
Date: Thu, 20 Jul 1996 22:00:00 GMT
Content-type: text/html
Content-length: 0
Serving Documents- Example
6: Finish Up
when the file is completely sent or an
error message is sent,
the httpd server has finished its work- it
closes the file if it was open, and closes the
network port which terminates the network
connection
the client receives and formats the data- the
server knows nothing about how it is displayed
the httpd server listens for another request
(go back to step 1)
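Steps 2 to 6 can be pulled together in one short sketch. It assumes the document tree is a simple in-memory dictionary, that the whole request arrives in a single read, and that the method is GET; handle_connection is an illustrative name, and a connected socket pair stands in for the real network. Here 404 is the HTTP status code for a document that is not found.

```python
import socket

def handle_connection(conn: socket.socket, document_tree: dict) -> None:
    """Serve one request on an open connection, then close it."""
    request = conn.recv(4096).decode("ascii")   # read the request
    # Parse the request line; this sketch assumes the method is GET.
    _method, path, _protocol = request.split("\r\n")[0].split()
    body = document_tree.get(path)
    if body is None:                            # failure
        conn.sendall(b"HTTP/1.0 404 Not Found\r\n\r\n")
    else:                                       # success
        head = ("HTTP/1.0 200 Document follows\r\n"
                f"Content-length: {len(body)}\r\n\r\n")
        conn.sendall(head.encode("ascii") + body)
    conn.close()                                # finish up

# A connected pair of sockets plays the roles of client and server.
client, server_side = socket.socketpair()
client.sendall(b"GET /sample.htm HTTP/1.0\r\n\r\n")
handle_connection(server_side, {"/sample.htm": b"<html>hello</html>"})

reply = b""
while chunk := client.recv(4096):   # read until the server closes
    reply += chunk
client.close()
```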
Web Server Operations
Web Server Operations
a web server has a collection of
information in a document tree and it
serves it according to the HTTP protocol
web servers are reactive programs: they wait
until a request is made, attempt to satisfy
it, and then wait for the next request
the previous example is only slightly
simplified
Web Server Operations
Handling Multiple Requests (1)
if a server processes one request at a
time, but can receive many simultaneous
requests then delays will occur- an image
may take several seconds to serve
without a priority scheme, small jobs that could
be serviced quickly can take an inordinate amount of
time to serve
with a large number of hits servers can go
down- backlog can be too great
Web Server Operations
Handling Multiple Requests (2)
web servers are therefore designed to
handle as many requests as possible
simultaneously
several strategies are available to do this
(the last two are more difficult unless
special software is used):
clone a copy of the httpd program for each
request- very easy under UNIX
multithreading the httpd program
spreading the work amongst several helper
programs
Web Server Operations
Cloning Servers (1)
each request is processed by a new
copy of the httpd program
the original server called the parent
immediately returns to listening for
another request
the new copy called the child
performs the processing
Web Server Operations
Cloning Servers (2)
the parent passes the network
connection to the child at the time
that it is first spawned
when the child has serviced the request, it
terminates for good
the web server hardware may have
many copies of the httpd program
running simultaneously
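On UNIX the cloning approach rests on the fork system call. The following is a minimal sketch, assuming a UNIX-like system; serve_in_child is an illustrative name, and a connected socket pair stands in for a real client connection.

```python
import os
import socket

def serve_in_child(conn: socket.socket, handler) -> int:
    """Fork a child to run handler(conn); return the child's process id."""
    pid = os.fork()
    if pid == 0:
        # Child: service the request, then terminate for good.
        try:
            handler(conn)
        finally:
            conn.close()
            os._exit(0)
    # Parent: drop its copy of the connection and go back to listening.
    conn.close()
    return pid

client, server_side = socket.socketpair()
child_pid = serve_in_child(server_side, lambda c: c.sendall(b"served by child"))
reply = client.recv(1024)
os.waitpid(child_pid, 0)    # collect the finished child
client.close()
```

The child inherits the open connection when it is forked, which is how the parent "passes" it without any extra work.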
Web Server Operations
Multithreaded Execution
many mechanisms can be used for
implementing this approach
server may monitor the progress of several
connections, switching between them as
necessary
when a lengthy process is in operation the
server may switch to another pending task
when the pending task is complete it can
return to the previous lengthy process
server closes the network connections of any
finished processes
this can be an extremely efficient method
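One of the many possible mechanisms is a pool of threads, so that a lengthy request does not hold up short ones. The sketch below simulates service time with sleep; the request names and durations are invented for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def serve(request):
    name, seconds = request
    time.sleep(seconds)       # stand-in for reading and sending a file
    return f"{name} done"

# Hypothetical workload: one lengthy request and two short ones.
requests = [("large-image", 0.2), ("small-page", 0.01), ("tiny-icon", 0.01)]

# Each request runs in its own thread; while one thread sleeps,
# the others are free to make progress.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(serve, requests))

print(results)   # map returns results in the order of the requests
```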
Web Server Operations
Servers as Cooperating Sets of Programs
the httpd server itself can be made a set of
cooperating programs specialised to
perform particular tasks
one program reads the requests from the
network, another allocates them to
specialised helper programs
the scheme is very efficient: the number of
helpers can be adjusted to match the
number of requests, the type of requests
(generally less common) or the size of the
system
Web Server Operations
Multiple Web Services on the same Servers
more than one web service can run on the
same computer
any number of httpd programs can run on a
UNIX machine as long as each has a unique
port number
the following web services are on the same
computer but different ports (the superuser sets
up port 80 servers, but users can own and operate
unrestricted ports above 1024):
http://www.rods.org/index.htm (port 80)
http://www.rods.org:8080/index.htm (port 8080)
http://www.rods.org:8081/index.htm (port 8081)
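The port a client would connect to can be read from each URL. This is a small sketch, assuming port 80 when none is written; port_of is an illustrative name.

```python
from urllib.parse import urlsplit

def port_of(url: str) -> int:
    """Return the port named in an http URL, or 80 if none is given."""
    port = urlsplit(url).port
    return port if port is not None else 80

ports = [port_of(u) for u in (
    "http://www.rods.org/index.htm",
    "http://www.rods.org:8080/index.htm",
    "http://www.rods.org:8081/index.htm",
)]
print(ports)
```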
Web Server Operations
Establishing a Two-Way Network Connection
client must look up the network address of
the server using its name
the client’s system software sends a packet
to the server, requesting a connection
the server’s system software sends a packet
back to the client, agreeing to set up a
connection
the client program is connected to the new
network connection
the server program is connected to the new
network connection
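The connection set-up can be observed with the socket interface, where the operating system exchanges the packets on the programs' behalf. The sketch below runs both ends on the local machine and uses a numeric address, so the name lookup step is skipped; the messages are invented.

```python
import socket

# Server side: listen on a free port on the local machine.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))     # port 0: let the system pick a port
listener.listen(1)
address = listener.getsockname()

# Client side: request a connection; the system software on both
# sides exchanges the set-up packets.
client = socket.create_connection(address)

# Server program is connected to the new network connection.
server_side, client_address = listener.accept()

# The connection is two-way: either end can send and receive.
client.sendall(b"ping")
received = server_side.recv(4)
server_side.sendall(b"pong")
answered = client.recv(4)

for s in (client, server_side, listener):
    s.close()
```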