Agenda (1) - University of Wollongong

BUSS 909
Office Automation & Intranets
Lecture 6
Web Architecture and
Standards
Clarke, R. J (2001) L909-06:
1
Notices (1)
Assignment 2 is available from the
BUSS909 Intranet- includes a Marking
Criteria sheet
there are files on the intranet that provide
information needed for the assignment:
Organising Structures and Schemes
Media & Content Classification
Navigation, Labeling and Searching
Notices (2)
Additional files have been placed on the
BUSS909 Intranet
a file on the fundamentals of ‘Information Theory and
Systems Theory’ called
sl909-00.ppt
an introduction to different types of services
on the internet is available in a file called
sl909-03.ppt
Agenda (1)
WWW Basics
Web Server Overview
Web Documents & Trees
Hypertext Transfer Protocol (HTTP)
Serving a Web Document- Example
WWW Basics
WWW and the Internet
Web Client and Web Server Software
Uniform Resource Locators (URLs)
Hypertext Transfer Protocol (HTTP)
Hypertext Markup Language (HTML)
Uniform Resource Locators
Uniform Resource Locators (1)
Definition
a Uniform Resource Locator (URL) is
the address of a network resource.
URLs for the WWW actually contain
several components
the first component identifies the
URL scheme or protocol being used
to transfer information
Uniform Resource Locators (2)
Some Popular URL Schemes
Scheme    Description
http      Hypertext Transfer Protocol
https     HTTP using Secure Sockets Layer (SSL)
mailto    E-mail Address
ftp       File Transfer Protocol
finger    Finger protocol
gopher    Gopher protocol
wais      Wide Area Information Server
news      Usenet news
nntp      Usenet news via Network News Transfer Protocol (NNTP)
snews     Usenet news via SSL-encrypted NNTP
file      Host-specific filenames
irc       Internet Relay Chat session
telnet    Telnet interactive session
Uniform Resource Locators (3)
Server Name & Resource
the second component identifies the
name of a server sitting on the
Internet from which a resource is
being requested
the third component identifies part of
the server’s subdirectory and the file
name for a resource- most likely a
HTML document
Uniform Resource Locators (4)
‘Complete URL’ to UOW Home Page
http://www.uow.edu.au/index.html
URL scheme: http
server name: www.uow.edu.au
server’s subdirectory and resource file name: /index.html
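The three components of the complete URL can be pulled apart programmatically. The following is a minimal sketch using Python's standard urllib.parse module; the variable names are illustrative only.

```python
from urllib.parse import urlsplit

# Split the complete URL for the UOW home page into its components.
parts = urlsplit("http://www.uow.edu.au/index.html")

scheme = parts.scheme    # first component: the URL scheme
server = parts.netloc    # second component: the server name
resource = parts.path    # third component: subdirectory and file name

print(scheme, server, resource)
```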
Uniform Resource Locators (5)
Incomplete URL to UOW Home Page
However, the shorter URL
http://www.uow.edu.au/
points to the ‘home page’ of that server
Web servers have a default filename
often default.html or index.html
Note: either this URL or the previous
one enables the user to view the home
page for UOW web site
Uniform Resource Locators (6)
Omitting the Scheme in Web URLs
Because of the popularity of WWW,
the scheme is occasionally omitted
web browsers are able to substitute
this part of web URLs
the URL terra.uow.edu.au is
interpreted by Netscape as
http://terra.uow.edu.au/
Uniform Resource Locators (7)
Partial or Relative Web URLs
a partial or relative URL is one which
omits the protocol, host, port, or part
of the path
eg. rsch-ss.htm when referenced by
http://www.uow.edu.au/commerce/buss/
research.htm
is a relative form of
http://www.uow.edu.au/commerce/buss/
rsch-ss.htm
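A browser resolves a relative URL against the URL of the document that refers to it. As a small sketch, Python's urllib.parse.urljoin performs the same resolution for the example above; the variable names are illustrative.

```python
from urllib.parse import urljoin

# The document that contains the relative link.
base = "http://www.uow.edu.au/commerce/buss/research.htm"

# Resolve the relative URL against the base document's URL.
absolute = urljoin(base, "rsch-ss.htm")

print(absolute)
```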
Uniform Resource Locators (8)
Anchors in Web URLs
Web URLs support the use of a # sign after
the HTML filename to indicate an anchor
for example,
http://www.uow.edu.au/residences/
inter_house.htm#Facilities
refers to the “Facilities” section of the
document inter_house.htm
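The anchor is handled by the client, which separates it from the rest of the URL before asking the server for the document. A minimal sketch using Python's urllib.parse.urldefrag follows; it assumes the anchor is written directly after the filename inter_house.htm.

```python
from urllib.parse import urldefrag

# Assumed form of the URL: the anchor follows the HTML filename.
url = "http://www.uow.edu.au/residences/inter_house.htm#Facilities"

# Separate the document URL from the anchor name.
document_url, anchor = urldefrag(url)

print(document_url)
print(anchor)
```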
Uniform Resource Locators (9)
Preserving State Information in URLs ...
WWW is inherently stateless
once a request from a client is
answered by a HTTP server, the
transaction is effectively concluded
the transaction’s current status is
lost; that is, it is not normally recorded
for future transactions
Uniform Resource Locators (10)
… Preserving State Information in URLs ...
state information must be available
for many uses like:
electronic commerce across internet
(shopping carts), extranet (EDI), etc
researching on the web with search
engines which generally involves
multiple attempts at converging on a
small set of useful sources
Uniform Resource Locators (11)
… Preserving State Information in URLs ...
however, state can be preserved for the
duration of a user’s session by placing
additional information into the URL
this information is typically sent to the
CGI-BIN area on the server- the CGI-BIN
area is where user provided executable
routines are placed for execution
during a user’s session
Uniform Resource Locators (12)
… Preserving State Information in URLs ...
conventions exist for passing state
information to CGI routines
search parameters can form state
information- for example, search
term “intranets” can be sent as a
parameter to the query routine
located in the CGI bin of the Altavista
search engine
Uniform Resource Locators (13)
… Preserving State Information in URLs
Everything after the ? is the
parameter string that is passed to the
query routine located on the Altavista
site
http://www.altavista.com/cgi-bin/
query?pg=q&kl=XX&q=intranets&search=Search
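On the server side, the routine has to unpack the parameter string into named values. The following sketch shows one way this might be done with Python's urllib.parse; it illustrates the convention only and is not the code Altavista used.

```python
from urllib.parse import urlsplit, parse_qs

url = ("http://www.altavista.com/cgi-bin/"
       "query?pg=q&kl=XX&q=intranets&search=Search")

parts = urlsplit(url)

routine = parts.path            # the routine being invoked
params = parse_qs(parts.query)  # everything after the ?, as name -> list of values

print(routine)
print(params["q"])
```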
Web Server Overview
Web Server Components
Relationship to HTTP
Limits of Web Servers
Web Documents & Trees
MIME file extensions and types
Documents, Links and Anchors
Document Tree Organisation
Hypertext Transfer Protocol
browser and server communicate
using HTTP
simple set of rules designed to be
suitable for hypermedia systems
distributed across networks
must understand this protocol in order
to understand the WWW
HTTP defines a simple request-response ‘conversation’
Hypertext Transfer Protocol
HTTP does define how to correctly
format the request and the response
the client- often but not necessarily a
browser- is the requesting program and
establishes a connection to the
receiving program or server
the server replies with a response
including the requested information if
possible
Hypertext Transfer Protocol
HTTP does not define:
how the network connection is made or
managed, or
how the information is actually transmitted
(this is done by lower-level protocols such
as TCP/IP)
HTTP requests consist of a method, a
Uniform Resource Identifier (URI), a
protocol version, and other information
Hypertext Transfer Protocol
HTTP Requests: Methods ...
HTTP Methods- commonly supported
methods include:
GET- returns the object; that is, it
retrieves the information
HEAD- returns only information about
the object, but not the object itself
POST- send information to be stored on
the server (eg. input to scripts)
Hypertext Transfer Protocol
... HTTP Requests: Methods
some HTTP methods are not supported
by many browsers because they may
put the integrity of the server at risk:
PUT- send a new copy of an existing
object
DELETE- permanently remove an object
other methods may be added to the
standard in the future- HTTP is
extensible and has evolved, if slowly
Hypertext Transfer Protocol
HTTP Requests: Information Client -> Server
User-Agent: kind of browser making request
If-Modified-Since: the object is returned only
if it is newer than a specified date (can save
the cost of a retrieval)
Accept: the MIME types and formats the
browser has been configured to accept (can
save the cost of downloading an unreadable
document)
Authorization: user password etc. as required
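To make the request format concrete, here is a small sketch that assembles a request from a method, a URI, a protocol version and some header fields. The function build_request and the header values are hypothetical, chosen only for illustration.

```python
def build_request(method, uri, headers, version="HTTP/1.0"):
    """Assemble the text of an HTTP request (illustrative helper)."""
    # Request line: method, URI and protocol version.
    lines = [f"{method} {uri} {version}"]
    # Other information: one "Name: value" header per line.
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Lines end with CRLF; a blank line ends the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_request(
    "GET",
    "/sample.htm",
    {"User-Agent": "ExampleBrowser/1.0", "Accept": "text/html"},
)
print(request)
```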
Serving Documents- Example
1: Server waits for a new request
httpd program waits for a client’s request
to arrive from somewhere on the Internet
server listens to a port until someone
calls it and until that occurs it is dormant
Serving Documents- Example
2: Request arrives from client ...
ultimately a request is sent by a
client to the server either by typing a
URL or selecting a HTML anchor
the network software (client) locates
the server computer and sets up a
two-way network connection from the
client to the server
Serving Documents- Example
... 2: Request arrives from client
client can locate servers by the use of
Internet protocols and the name service
(DNS) to locate and initiate a connection
with the server
once the connection is established the
client sends the HTTP request:
GET /sample.htm HTTP/1.0
sent over the network in ASCII, server
receives it and saves it
Serving Documents- Example
3: server parses the request ...
server decodes the request using
HTTP protocol to determine what to do
there are three important pieces of
information:
the method instructs the server as to
what action should be taken. The GET
method is used to locate and read the file
and return it to the client ...
Serving Documents- Example
... 3: server parses the request
the document (/sample.htm) can be
fetched by the server because it knows
where the file is in the document tree
the protocol version being used by the
browser (HTTP/1.0) tells the server how
to format its reply
the contents are eventually sent back to
the client over the same connection as the
request (note that the server need not find
the client on the Internet or make a new
connection)
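The parsing step can be sketched in a few lines. The function parse_request_line below is a hypothetical helper that extracts the three pieces of information from the request line; a real server would also validate each piece.

```python
def parse_request_line(line):
    """Split a request line into method, document and protocol version."""
    method, document, version = line.split()
    return method, document, version


method, document, version = parse_request_line("GET /sample.htm HTTP/1.0")
print(method, document, version)
```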
Serving Documents- Example
4: Read other information (if necessary) ...
the httpd program reads the rest of the
request, if needed
using HTTP/1.0 the browser is expected
to send additional information about
itself to the server
this meta-information describes the
browser and its capabilities which may
be needed by the server to reply to the
request
Serving Documents- Example
... 4: Read other information (if necessary)
for example:
User-agent: Mosaic for X Windows/2.4
Accept: text/plain
Accept: text/html
Accept: image/*
indicates the browser is Mosaic
configured to display plain text, HTML, and any
kind of image
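A server can use the Accept information to check whether the browser can display a document before sending it. The sketch below treats the * in image/* as a simple wildcard; the function is_acceptable is hypothetical, and real content negotiation has more rules than are shown here.

```python
from fnmatch import fnmatchcase

# MIME types taken from the Accept lines in the example above.
accepted = ["text/plain", "text/html", "image/*"]


def is_acceptable(mime_type, accepted_patterns):
    """Return True if mime_type matches any accepted pattern."""
    return any(fnmatchcase(mime_type, pattern) for pattern in accepted_patterns)


print(is_acceptable("image/gif", accepted))
print(is_acceptable("application/pdf", accepted))
```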
Serving Documents- Example
5: Do the requested method ...
Assuming no errors, the httpd
program executes the request
to GET a document requires looking
up the file /sample.htm in its document
tree using the standard file services of
the operating system
there are two alternative courses of
action depending on success or failure
Serving Documents- Example
... 5: Do the requested method (Success) ...
the httpd daemon sends a result code and
the information that describes the type of
information expected by the client
as the document is found a code 200
(everything is OK) is sent and the document will
follow
the information is an HTML document so
Content-type is text/html; the document is 1066
bytes long so Content-length is 1066
the server software and the file date are also
included
Serving Documents- Example
... 5: Do the requested method (Success)
the header sent to the client might
look something like this:
HTTP/1.0 200 Document follows
Server: NCSA/1.4
Date: Thu, 20 Jul 1996 22:00:00 GMT
Content-type: text/html
Content-length: 1066
Last-modified: Thu, 20 Jul 1996 20:38:40 GMT
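The header lines above could be generated along the following lines. This is a sketch only: build_response_header is a hypothetical function, the Date and Last-modified fields are left out for brevity, and the 1066-byte body is a stand-in for the real document.

```python
def build_response_header(body, content_type="text/html", server="NCSA/1.4"):
    """Build the header for a successful response (illustrative helper)."""
    lines = [
        "HTTP/1.0 200 Document follows",
        f"Server: {server}",
        f"Content-type: {content_type}",
        # The length is measured from the document actually being sent.
        f"Content-length: {len(body)}",
    ]
    # A blank line separates the header from the document.
    return "\r\n".join(lines) + "\r\n\r\n"


body = b"x" * 1066  # stand-in for a 1066-byte HTML document
header = build_response_header(body)
print(header)
```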
Serving Documents- Example
5: Do the requested method (Failure)...
if the requested file could not be found or
read then the status code will not be 200
the most common problem is that the
name of the requested file is misspelt so
the server cannot find it
if the requested file was called smple.htm
it would not be found- the server would
send a status code 404 (Not Found)
Serving Documents- Example
... 5: Do the requested method (Failure)...
the response might look like this:
HTTP/1.0 404 Not Found
Server: NCSA/1.4
Date: Thu, 20 Jul 1996 22:00:00 GMT
Content-type: text/html
Content-length: 0
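The two courses of action in step 5 can be sketched as a lookup in the document tree. Here the tree is modelled as a plain dictionary and serve is a hypothetical function; it uses 404, the standard HTTP status code for a resource that cannot be found.

```python
# A toy document tree: path -> file contents (assumed for illustration).
DOCUMENT_TREE = {"/sample.htm": b"<html><body>Sample</body></html>"}


def serve(path, tree):
    """Return (status code, reason, body) for a GET request."""
    body = tree.get(path)
    if body is None:
        # Failure: the file is not in the document tree.
        return 404, "Not Found", b""
    # Success: the document follows the header.
    return 200, "Document follows", body


print(serve("/sample.htm", DOCUMENT_TREE)[0])
print(serve("/smple.htm", DOCUMENT_TREE)[0])
```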
Serving Documents- Example
6: Finish Up
when the file is completely sent or an
error message is sent,
the httpd server has finished its work- it
closes the file if it was open, and closes the
network port which terminates the network
connection
the client receives and formats the data- the
server knows nothing about how it is displayed
the httpd server listens for another request
(go back to step 1)
Web Server Operations
a web server has a collection of
information in a document tree and
serves it according to the HTTP protocol
a web server is a reactive program: it waits
until a request is made, attempts to
satisfy it, and then repeats the cycle
the previous example is only slightly
simplified
Web Server Operations
Handling Multiple Requests (1)
if a server processes one request at a
time but can receive many simultaneous
requests then delays will occur- an image
may take several seconds to serve
without a priority scheme, small jobs that can
be serviced quickly take an inordinate amount of
time to serve
with a large number of hits servers can go
down- the backlog can be too great
Web Server Operations
Handling Multiple Requests (2)
web servers are therefore designed to
handle as many requests as possible
simultaneously
several strategies are available to do this
(the last two are more difficult unless
special software is used):
clone a copy of the httpd program for each
request- very easy under UNIX
multithread the httpd program
spread the work amongst several helper
programs
Web Server Operations
Cloning Servers (1)
each request is processed by a new
copy of the httpd program
the original server called the parent
immediately returns to listening for
another request
the new copy called the child
performs the processing
Web Server Operations
Cloning Servers (2)
the parent passes the network
connection to the child at the time
that it is first spawned
when the child has serviced the request, it
terminates
the web server hardware may have
many copies of the httpd program
running simultaneously
Web Server Operations
Multithreaded Execution
many mechanisms can be used for
implementing this approach
server may monitor the progress of several
connections, switching between them as
necessary
when a lengthy process is in operation the
server may switch to another pending task
when the pending process is complete it can
return to the previous lengthy process
server closes the network connections of any
finished processes
this can be an extremely efficient method
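One of the many possible mechanisms is a pool of worker threads, so that a lengthy request does not hold up short ones. The sketch below simulates this with Python's concurrent.futures; the request list and the sleep-based handle function are assumptions made purely for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle(request):
    """Pretend to serve a request that takes a given number of seconds."""
    path, seconds = request
    time.sleep(seconds)  # stands in for reading and sending the file
    return f"served {path}"


# One lengthy request (an image) and two quick ones.
requests = [("/big-image.gif", 0.2), ("/small.htm", 0.01), ("/tiny.htm", 0.01)]

# Each request gets its own thread, so the short jobs need not wait
# for the image to finish. Results come back in request order.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(handle, requests))

print(results)
```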
Web Server Operations
Servers as Cooperating Sets of Programs
the httpd server itself can be made a set of
cooperating programs specialised to
perform particular tasks
one program reads the requests from the
network, another allocates them to
specialised helper programs
the scheme is very efficient: the number of
helpers can be adjusted to meet the
number of requests, the type of requests
(generally less common) or the size of the
system
Web Server Operations
Multiple Web Services on the same Servers
more than one web service can run on the
same computer
any number of httpd programs can run on a
UNIX machine as long as each has a unique
port number
the following web services are on the same
computer but different ports (the superuser sets
up port 80 servers, but users can own and operate
unrestricted ports numbered 1024 and above):
http://www.rods.org/index.htm (port 80)
http://www.rods.org:8080/index.htm (port 8080)
http://www.rods.org:8081/index.htm (port 8081)
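The port a client connects to can be read straight from the URL, falling back to port 80 when none is given. A minimal sketch with Python's urllib.parse follows; port_of is a hypothetical helper that handles only http URLs.

```python
from urllib.parse import urlsplit


def port_of(url):
    """Return the port named in an http URL, or 80 if none is given."""
    explicit_port = urlsplit(url).port  # None when the URL has no :port part
    return explicit_port if explicit_port is not None else 80


for url in (
    "http://www.rods.org/index.htm",
    "http://www.rods.org:8080/index.htm",
    "http://www.rods.org:8081/index.htm",
):
    print(url, port_of(url))
```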
Web Server Operations
Establishing a Two-Way Network Connection
the client must look up the network address of
the server using its name
the client’s system software sends a packet
to the server, requesting a connection
the server’s system software sends a packet
back to the client, agreeing to set up a
connection
the client program is connected to the new
network connection
the server program is connected to the new
network connection
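The connection steps above can be tried out on a single machine. The sketch below runs a tiny server and a client in one Python program over the loopback address 127.0.0.1, so the name lookup step is skipped; the exchange of connection packets is done by the operating system inside connect and accept. The function serve_once and the reply text are assumptions for illustration.

```python
import socket
import threading


def serve_once(listener):
    """Accept one connection, read the request and send a reply."""
    conn, _addr = listener.accept()  # server program joins the connection
    with conn:
        request = conn.recv(1024)
        conn.sendall(b"reply to: " + request)


# Server side: listen on a port chosen by the operating system.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 means "any free port"
listener.listen(1)
port = listener.getsockname()[1]

worker = threading.Thread(target=serve_once, args=(listener,))
worker.start()

# Client side: request a connection, then use it in both directions.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"GET /sample.htm HTTP/1.0\r\n\r\n")
    chunks = []
    while True:
        data = client.recv(1024)
        if not data:  # the server has closed its end
            break
        chunks.append(data)
reply = b"".join(chunks)

worker.join()
listener.close()
print(reply)
```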