2-WWW1 - Faculty of Computer Science

World Wide Web
Basics
Original version by Carolyn Watters
(Dalhousie U. Computer Science)
The Web…
• …is a distributed document delivery
system that uses Internet protocols
• …links documents stored in computers
communicating by the Internet
• Main authority is the W3 Consortium
www.w3.org
Basic Definitions
• Web server – machine that services
Internet requests
• Web client – machine that initiates
Internet requests
• Browser – software to interact with
Internet data at the web client
• TCP/IP – internet data protocol
• FTP – internet file transfer protocol
• HTTP – hypertext transfer protocol
• HTML – hypertext markup language
Servers and Clients
• Servers – computer systems at the end
of a network that store files and provide
other services
• Clients – computer systems that are
end points for users of the data
Client-Server Model & WWW
• Cloud model
• TCP/IP
• HTTP and MIME types
• FTP
• Protocol stacks
Client-Server Model
Internet Model Layers
• Application layer – communication services (FTP, telnet, e-mail)
• Transport layer – end-to-end transmission of messages
• Network services layer – transmission of messages across a sequence of links
• Data Link layer – transmission of a packet across one link
• Physical layer – where the signals move
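The layering above can be sketched as nested wrapping: on the way down the stack each layer prepends its own header, and the receiver strips them in the opposite order. This is a toy illustration only. The bracketed text headers and the function names are invented for readability; real headers are binary structures.

```python
# Toy illustration of layered encapsulation. The "[TCP]" style headers are
# invented placeholders, not real protocol headers.
LAYERS = ["TCP", "IP", "LINK"]  # transport, network, data link (top to bottom)


def encapsulate(app_data: bytes) -> bytes:
    """Wrap application data on its way down the sender's stack."""
    frame = app_data
    for name in LAYERS:
        frame = f"[{name}]".encode("ascii") + frame
    return frame


def decapsulate(frame: bytes) -> bytes:
    """Strip the headers in reverse order on the way up the receiver's stack."""
    for name in reversed(LAYERS):
        header = f"[{name}]".encode("ascii")
        if not frame.startswith(header):
            raise ValueError(f"missing {name} header")
        frame = frame[len(header):]
    return frame


wire = encapsulate(b"GET / HTTP/1.0\r\n\r\n")
print(wire)
print(decapsulate(wire))
```

The outermost header belongs to the lowest layer, which is why the data link header comes first on the wire.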
Internet Layer Model
Application layer: http, ftp, smtp, telnet, rlogin
Transport layer: TCP, UDP
Network Services: IP
Data Link layer: LAN link
Physical layer: physical connection
Application Layer
• FTP
• HTTP
• SMTP
• Telnet
• Etc.
TCP/IP
• Suite of protocols adopted as the standard for the Internet
• facilitates communication between heterogeneous and
similar networks that are connected together
• TCP itself is a reliable, connection-oriented, byte-stream protocol
Transport layer: TCP & UDP
TCP
– transmission control protocol
– full duplex byte stream
– virtual path (connected)
– error free
– uses acknowledgements
– 16 bit port addresses
UDP
– user datagram protocol
– connectionless
– no acknowledgements
– no flow control
– no resending of erroneous packets
– some error detection
– 16 bit port addresses
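To make the 16-bit port fields concrete, here is a minimal sketch that packs and unpacks a UDP header with Python's `struct` module. The header layout (source port, destination port, length, checksum, each 16 bits) is standard; the helper name `build_udp_header` and the port numbers are illustrative assumptions, and the checksum is simply left at zero.

```python
import struct

# A UDP header is four 16-bit big-endian fields:
# source port, destination port, length, checksum.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_header(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Pack a UDP header; the checksum is left at 0 in this sketch."""
    length = UDP_HEADER.size + len(payload)  # header plus data, in bytes
    return UDP_HEADER.pack(src_port, dst_port, length, 0)


header = build_udp_header(5000, 53, b"hello")
src, dst, length, checksum = UDP_HEADER.unpack(header)
print(len(header), src, dst, length, checksum)
```

Because each port is packed as an unsigned 16-bit value, `struct` rejects any port number above 65535.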
Data Flow and Headers
TCP and IP
Network Layer: IP
• Delivers packets of up to 64 KB, one at a time
• Each packet has a header
– network addresses of the sending host and the intended
destination host
– 32-bit addresses (IPv4)
• IP layer (like UDP)
– unreliable
– connectionless
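As a small illustration of the 32-bit address format, the sketch below uses Python's standard `ipaddress` module to show one IPv4 address as dotted text, as a single integer, and as four raw bytes. The address `192.0.2.1` is an arbitrary example from a range reserved for documentation.

```python
import ipaddress

addr = ipaddress.IPv4Address("192.0.2.1")  # documentation-only example address
as_int = int(addr)                         # the same address as one 32-bit number
print(as_int, addr.packed.hex())
print(ipaddress.IPv4Address(as_int))       # back to dotted notation
```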
Data Encapsulation
TCP/IP apps
TCP/IP software usually includes:
– remote terminal client using TELNET
protocol for remote login
– electronic mail client using SMTP protocol
to transfer e-mail to remote system
– file transfer client using FTP protocol to
transfer files between 2 machines
HTTP
HyperText Transfer Protocol
• Native protocol for WWW
• Sits on top of internet’s TCP/IP protocol
• HTTP is a 4 step process per
transaction
• Uses a predefined set of document
formats from MIME
MIME
Multipurpose Internet Mail Extensions
– defines file formats (images, video, text, etc)
– e.g. Content-type: text/html
– Data type/subtype
» text/html
» text/plain
» image/gif
» video/mpeg
» application/msword
» etc!
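One way to see the type/subtype scheme in practice is to ask Python's standard `mimetypes` module which type it associates with a file extension. This is only a sketch: the helper name `content_type_for` and the file names are made up, and a real server may use its own mapping table.

```python
import mimetypes


def content_type_for(filename: str) -> str:
    """Guess a Content-type header value from a file name's extension."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when the extension is unknown.
    return mime_type or "application/octet-stream"


for name in ["index.html", "logo.gif", "notes.txt"]:
    value = content_type_for(name)
    main_type, subtype = value.split("/")
    print(f"{name}: Content-type: {value} (type={main_type}, subtype={subtype})")
```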
HTTP
Connection
• 1. Client
– Makes a TCP/IP connection
– Makes an HTTP request for a web page
• 2. Server accepts request
– Sends page as an HTTP response
• 3. Client downloads page
• 4. Server breaks the connection
HTTP is Stateless!
• Each operation or transaction makes a
new connection
• each operation is unaware of any other
connection
• each click is a new connection
• So how do they do those shopping
carts?
What does it look like?
• Header + object file
• Header
– plain text
– info about the object (MIME, etc.)
– methods allowed
– etc.
– browser sends a header to server each time you
ask for information
– server sends a header and possibly content
HTTP Transaction Example
GET /catalog/ip/ip.htm HTTP/1.0
Accept: text/plain
Accept: text/html
Referer: http://www.june.com/catalog.html
User-Agent: Mozilla/2.0
CRLF
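The example request can be assembled programmatically. The sketch below is one possible way to do it, not a prescribed API: the function name `build_request` is an assumption, and the header values are the ones from the slide. Each line ends in CRLF, and an extra CRLF (an empty line) closes the header.

```python
CRLF = "\r\n"


def build_request(method: str, uri: str, headers: list[tuple[str, str]],
                  version: str = "HTTP/1.0") -> bytes:
    """Assemble a full request: request line, header lines, then an empty line."""
    lines = [f"{method} {uri} {version}"]
    lines += [f"{name}: {value}" for name, value in headers]
    # Every line ends in CRLF; one extra CRLF marks the end of the header.
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii")


request = build_request("GET", "/catalog/ip/ip.htm", [
    ("Accept", "text/plain"),
    ("Accept", "text/html"),
    ("Referer", "http://www.june.com/catalog.html"),
    ("User-Agent", "Mozilla/2.0"),
])
print(request.decode("ascii"))
```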
HTTP REQUEST PROTOCOL
Request = Simple | Full
Simple = GET <URI> CRLF
Full = Method URI ProtVersion CRLF
[<HTRQ Header>*] [CRLF <data>]
Method = GET | POST | HEAD | ….
<HTRQ Header> = <Fieldname>:<Value>CRLF
<data> = MIME conforming message
w.w3.org/Protocols/HTTP/
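The grammar above can be turned into a small parser. The following is a minimal sketch under simplifying assumptions: it treats any two-token `GET <URI>` line as a simple request, does no validation of method names, and the function name `parse_request` and the returned dictionary shape are choices made for this example.

```python
def parse_request(raw: bytes) -> dict:
    """Split a request into method, URI, version, headers, and data.

    A simple request has only "GET <URI>"; a full request adds the protocol
    version, optional header lines, and optional data after an empty line.
    """
    head, _, data = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    parts = lines[0].split()
    if len(parts) == 2 and parts[0] == "GET":
        return {"kind": "simple", "method": "GET", "uri": parts[1]}
    if len(parts) != 3:
        raise ValueError(f"malformed request line: {lines[0]!r}")
    method, uri, version = parts
    headers = []
    for line in lines[1:]:
        if not line:
            continue
        # Split at the first colon only, so URLs in values stay intact.
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return {"kind": "full", "method": method, "uri": uri,
            "version": version, "headers": headers, "data": data}


print(parse_request(b"GET /index.html\r\n"))
print(parse_request(b"GET /catalog/ip.html HTTP/1.0\r\n"
                    b"Referer: http://www.june.com/catalog.html\r\n\r\n"))
```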
HTTP Header fields
• General-header fields
– used for both requests and responses
• Request-header fields
– used for requests
– extra client information for use by server
– optional
General-header fields
• Date: Mon, 11 Jan 1999 08:14:32 GMT
• MIME-version: 1.0
• Pragma: no-cache
– directives
Request-header fields
• acceptable MIME types for response
– Accept:text/html
– Accept:*/*
• client's answer to a 401 response from the server
– Authorization: Basic abcdef (base64-encoded
username and password)
• From:client-email-addr
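For the Basic scheme, the credentials are the string `username:password` encoded with base64. The sketch below builds and decodes such a value; the function names are illustrative, and the credentials are the well-known sample pair from the HTTP specification rather than anything from these slides.

```python
import base64


def basic_authorization(username: str, password: str) -> str:
    """Build the value of an Authorization header for the Basic scheme."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def decode_basic(header_value: str) -> tuple[str, str]:
    """Recover (username, password); base64 is an encoding, not encryption."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic authorization value")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


value = basic_authorization("Aladdin", "open sesame")
print("Authorization:", value)
print(decode_basic(value))
```

Since anyone who sees the header can decode it, Basic credentials are only as private as the connection carrying them.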
More Request-header fields
• If-Modified-Since:date
– conditional get
• source of current requested URL
– Referer:URL
• robot/browser identification
– User-Agent:Mozilla/2.0
Examining HTTP Header Values
• In perl
– $ENV{"HTTP_FROM"} (CGI passes request headers as HTTP_* environment variables)
• In Netscape
– www.cs.dal.ca/~jamie/cgibin/4173/about/env.cgi
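Under the CGI convention, a request header reaches the script as an environment variable whose name is the header name upper-cased, with hyphens turned into underscores and `HTTP_` prepended. A rough Python equivalent of the Perl lookup is sketched below. The helper names and the simulated environment are assumptions for illustration; in a real CGI script the web server populates `os.environ`.

```python
import os


def cgi_variable_name(header_name: str) -> str:
    """Map a request header name to its CGI environment variable name."""
    return "HTTP_" + header_name.upper().replace("-", "_")


def header_from_environment(header_name: str, environ=os.environ):
    """Return the header value a CGI script would see, or None if absent."""
    return environ.get(cgi_variable_name(header_name))


# Simulated environment, standing in for what a web server would set up.
fake_environ = {"HTTP_USER_AGENT": "Mozilla/2.0", "HTTP_FROM": "user@example.com"}
print(header_from_environment("User-Agent", fake_environ))
print(header_from_environment("Referer", fake_environ))
```

Content-Type and Content-Length are exceptions to this mapping: CGI exposes them as `CONTENT_TYPE` and `CONTENT_LENGTH` without the `HTTP_` prefix.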
HTTP Methods
• Client requests either
– simple request
– full request
Request-line = method Request-URI HTTP-version CRLF
GET /catalog/ip.html HTTP/1.0
Simple requests
• Only for HTTP 0.9
• uses only the GET method
• causes the server to locate and transfer
the object specified
• client responsible for handling the object
GET <uri> CRLF
Full Request
• Uses HTTP version and more methods
• method tells server what to do to the
resource requested
• Methods
– GET
– POST
– HEAD
GET Method
• Request server to retrieve object
specified
• conditional GET
– request message includes If-Modified-Since in header
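The server-side decision behind a conditional GET might look like the sketch below: compare the date the client sent with the object's last modification date, and answer 304 (Not Modified) instead of resending an unchanged object. The function name and the sample dates are assumptions; the date parsing relies on `email.utils.parsedate_to_datetime` from the standard library.

```python
from email.utils import parsedate_to_datetime


def conditional_get_status(if_modified_since: str, last_modified: str) -> int:
    """Return 304 if the object is unchanged since the client's copy, else 200."""
    client_copy = parsedate_to_datetime(if_modified_since)
    server_copy = parsedate_to_datetime(last_modified)
    return 200 if server_copy > client_copy else 304


stamp = "Mon, 11 Jan 1999 08:14:32 GMT"
print(conditional_get_status(stamp, "Mon, 11 Jan 1999 08:14:32 GMT"))  # unchanged
print(conditional_get_status(stamp, "Tue, 12 Jan 1999 10:00:00 GMT"))  # changed
```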
HEAD Method
• Like GET but does not return the object
• returns a header about the resource
requested (meta information)
• good way to test link validity
POST Method
• Include an object in the request
• server should use that object in
processing the request
• must include a Content-Length in
header
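A POST message can be sketched as a header followed by the object, with Content-Length computed from the object's size in bytes. The function name `build_post`, the URI, and the form fields are hypothetical values chosen for the example.

```python
def build_post(uri: str, body: bytes, content_type: str) -> bytes:
    """Assemble an HTTP/1.0 POST whose Content-Length matches the body."""
    header = (
        f"POST {uri} HTTP/1.0\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return header.encode("ascii") + body


# The URI and form fields below are hypothetical.
message = build_post("/cgi-bin/order", b"item=book&qty=2",
                     "application/x-www-form-urlencoded")
print(message.decode("ascii"))
```

The length lets the server know where the object ends without waiting for the connection to close.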
HTTP Response Message
• HTTP protocol version
• 3 digit status code
• reason phrase
• CRLF
• optional header fields
• CRLF
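Reading a response follows the same outline in reverse: split off the status line, then the header fields, then whatever follows the empty line. The parser below is a minimal sketch that assumes a well-formed message with a reason phrase; `parse_response` and the sample message are made up for illustration.

```python
def parse_response(raw: bytes) -> dict:
    """Split a response into version, status code, reason, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = lines[0].split(" ", 2)  # reason may contain spaces
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"version": version, "status": int(code), "reason": reason,
            "headers": headers, "body": body}


sample = (b"HTTP/1.0 404 Not Found\r\n"
          b"Server: CERN/3.0\r\n"
          b"Content-Type: text/html\r\n"
          b"\r\n"
          b"<h1>Not Found</h1>")
print(parse_response(sample))
```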
HTTP Response Header Fields
• Additional information about the server
• such as:
– LOCATION: exact URI address
– SERVER: server software (CERN/3.0)
– WWW-AUTHENTICATE:
• status 401 responses (unauthorized request)
• server challenges client
• client may use to send authorization info to
server
Understanding STATUS Codes
• 1xx – for information only
• 2xx – action successful
• 3xx – further action needed (redirect)
• 4xx – client request error
• 5xx – server error
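Because the class is carried by the first digit, a client can react to a status code it has never seen by looking at the hundreds digit alone. A short sketch, with an assumed helper name `status_class`:

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection (further action needed)",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Classify a 3-digit status code by its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid status code: {code}")
    return STATUS_CLASSES[code // 100]


for code in (200, 304, 401, 404, 500):
    print(code, status_class(code))
```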
HTTP Transaction
1. Client and server establish a connection
2. Client makes a request
3. Server makes a response
4. Server terminates connection
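The four steps can be walked through on a single machine. The sketch below plays both roles in one thread over a loopback TCP connection, which works because the operating system completes the connection handshake for a listening socket before `accept` is called. It is a teaching sketch, not a usable server: the page content and the function name `one_transaction` are invented, and it assumes loopback networking is available.

```python
import socket


def one_transaction() -> bytes:
    """Run a single HTTP/1.0 style exchange over loopback TCP, in one thread."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    # Step 1: client and server establish a connection.
    client = socket.create_connection(("127.0.0.1", port), timeout=5)
    server_side, _addr = listener.accept()
    server_side.settimeout(5)

    # Step 2: client makes a request; the server reads up to the empty line.
    client.sendall(b"GET /index.html HTTP/1.0\r\n\r\n")
    request = b""
    while b"\r\n\r\n" not in request:
        request += server_side.recv(4096)

    # Step 3: server makes a response.
    body = b"<html>hello</html>"
    server_side.sendall(
        b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n" + body)

    # Step 4: server terminates the connection; the client reads until EOF.
    server_side.close()
    listener.close()
    chunks = []
    while True:
        chunk = client.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    client.close()
    return b"".join(chunks)


response = one_transaction()
print(response.decode("ascii"))
```

Note that the client learns the response is complete only because the server closed the connection, which is the behaviour step 4 describes.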
• Step 1 establish connection
– TCP/IP connection set up
– uses a port number as application reference
– usually port 80
– ports < 1024 are privileged (≥ 1024 are open)
• Step 2 client request
– HTTP message sent with a request line
– request-line = method URL HTTP version
• Step 3 Server response
– server sends HTTP message and
optionally requested data
– resp-message = HTTP version status code
reason-phrase [optional stuff]
• Step 4 connection terminated
– usually by the server
– sometimes the client “stops” it
– otherwise, whichever side notices terminates
Some Port Assignments
• 21 FTP
• 23 Telnet
• 25 smtp (mail)
• 70 gopher
• 79 finger
• 80 HTTP
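The assignments above fit in a small lookup table. The sketch below also checks two properties mentioned earlier: a port must fit in 16 bits, and on Unix-like systems ports below 1024 are conventionally reserved for privileged processes. The helper names are assumptions for this example, and port 8080 is just a sample of an unlisted, unprivileged port.

```python
WELL_KNOWN_PORTS = {
    21: "FTP",
    23: "Telnet",
    25: "SMTP",
    70: "gopher",
    79: "finger",
    80: "HTTP",
}


def is_valid_port(port: int) -> bool:
    """A port number must fit in 16 bits."""
    return 0 <= port <= 0xFFFF


def is_privileged(port: int) -> bool:
    """Ports below 1024 are conventionally reserved for privileged processes."""
    return 0 <= port < 1024


for port in (80, 8080):
    name = WELL_KNOWN_PORTS.get(port, "unassigned in this table")
    print(port, name, "privileged" if is_privileged(port) else "open")
```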