Lecture 7
HTTP and Web Programming in Java
(Based on Møller and Schwartzbach, 2006, Chapter 8)
David Meredith
[email protected]
www.titanmusic.com/teaching/cis336-2006-7.html
CIS336
Website design, implementation and
management
(also Semester 2 of CIS219, CIS221 and
IT226)
The Internet and HTTP
• HTTP: Hypertext Transfer Protocol
– a cornerstone of the infrastructure of the Web
– prescribes how machines on the web exchange
• HTML and XML documents
• form field values
• ...
– uses a client-server model
• communication follows a simple request-response
pattern
– client always initiates the interaction
– client (e.g., browser) requests a resource by sending the
URL of the resource (e.g., HTML file) to a server
– if server accepts request then it returns the resource
Network layers
OUR APPLICATIONS
THE APPLICATION LAYER
HTTP, FTP, SMTP, DNS
THE TRANSPORT LAYER
TCP, UDP
THE INTERNET LAYER
IP
THE NETWORK INTERFACE LAYER
Ethernet
• Internet network protocols organised into a
number of layers
• Network Interface Layer is hardware used
to communicate bits from one physical
location to another (e.g., ethernet)
Internet Layer
• Internet Layer is that of the Internet Protocol (IP)
– IP addresses
• used to identify machines on the network
• e.g., 158.223.1.118 is the IP address of the Department of Computing Web server (www.doc.gold.ac.uk)
• Internet Assigned Numbers Authority (IANA) manages allocation of IP addresses to organizations
• 127.0.0.1 always refers to the current machine (also called localhost)
– Datagram
• packet of data of limited size
– up to 65535 bytes, but only 1500 bytes on Ethernet network
– IP defines how datagrams sent across the network
• involves routing through intermediate machines
– IP is an unreliable protocol
• datagrams may be lost, arrive out of order or duplicated
Transport Layer
• Transport layer contains Transmission Control Protocol (TCP)
– transmits data in a stream of unbounded size
– segments stream into IP datagrams and reassembles them at
destination
– Reliable protocol
• retransmits lost datagrams
• sorts datagrams into correct order when received
• discards duplicate datagrams
– Connection-oriented
• connection set up between two machines
• data can be sent in both directions across connection (full-duplex)
Sockets and ports
• End points of a TCP connection are called sockets
• Each socket is associated with a particular port on a particular machine
• Port is identified by an integer between 0 and 65535
– allows single machine to have many simultaneous connections, each to a different port
– Ports 0-1023: well-known ports
• assigned to server applications executed by privileged processes (e.g., UNIX root user), e.g.,
– port 80 reserved for HTTP communication
– ports 20 and 21 reserved for FTP servers
– port 25 reserved for SMTP servers
– port 443 reserved for HTTPS
– Ports 1024-49151: registered ports
• allocated by IANA to avoid vendor conflicts
• e.g., port 8080 reserved as alternative to 80 for running a web server using ordinary user privileges
– Ports 49152-65535: dynamic or private ports
• can be freely used by any client or server program
• Browsers obtain ports for their TCP sockets arbitrarily among unused non-well-known ports
User Datagram Protocol (UDP)
• User datagram protocol (UDP) is an
alternative to TCP in the transport
layer
– UDP is unreliable and datagram-oriented
– faster than TCP
– can be used for voice and video where
speed is important and occasional losses
are acceptable
• UDP provides foundation for the
domain name system (DNS)
IP is getting old
• Specifications for TCP/IP are from
1981
– original ideas from 1960s developed by
DARPA
• Most internet traffic uses IPv4
– more than 20 years old
– shortage of IP addresses
• even though IPv4 allows for about 4 billion addresses
• IPv6 solves IP address shortage
Application Layer
• Application layer contains applications of the transport layer,
e.g.,
– HTTP, FTP, SMTP, DNS
• HTTP requests and responses transmitted using TCP
• Two versions of HTTP:
– HTTP/1.0
– HTTP/1.1
• becoming more prevalent
• provides better support for caching, bandwidth optimization, error
notification, security and content negotiation
Domain Name System (DNS)
• Defines structure of domain names
• Defines services governing association of IP
addresses with domain names
– e.g., association of 82.165.120.54 with
www.titanmusic.com
• Benefits of DNS
– can move services from one machine to another without
changing domain name
– single domain name can be associated with many IP
addresses
• allows replication of servers
– decreases workload
– improves fault tolerance
– many domain names can be associated with a single IP
address
• virtual hosting
– domain names are easier to remember than IP addresses
URIs
• URI identifies network resource and has the
general form
http://<host>:<port>/<path>?<query>
– e.g.
http://www.google.com/search?q=An+Introduction+to+XML+and+Web+Technologies
• scheme is http
• host is www.google.com which is a domain name that has been registered
using DNS as being associated with one or more IP addresses
• no port specified (port 80 is the default for http)
• host and port identify web server program to be used to process request
• path is search
– path typically identifies file in server's file system or program that can generate
appropriate response
• query here is q=An+Introduction+to+XML+and+Web+Technologies
– contains arguments to program that processes request
• URI may also contain fragment identifier that accesses a particular part
(fragment) of a resource
– prefixed by # symbol
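The parts of a URI listed above can be picked apart with the standard java.net.URI class. The following is a minimal sketch; the class name UriParts and its describe method are made up for illustration and are not part of the lecture's code.

```java
import java.net.URI;

class UriParts {
    // Returns the components of a URI, one per line.
    static String describe(String text) {
        URI uri = URI.create(text);
        return "scheme=" + uri.getScheme()
            + "\nhost=" + uri.getHost()
            + "\nport=" + uri.getPort()          // -1 when no port is given
            + "\npath=" + uri.getPath()
            + "\nquery=" + uri.getQuery()
            + "\nfragment=" + uri.getFragment(); // null when there is no fragment
    }

    public static void main(String[] args) {
        System.out.println(describe(
            "http://www.google.com/search?q=An+Introduction+to+XML+and+Web+Technologies"));
    }
}
```

Note that getPort returns -1 rather than 80 when the port is omitted; it is up to the client to apply the default for the scheme.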
Requests
• HTTP request sent from client to server using TCP
• Entering the address
http://www.google.com/search?q=An+Introduction+to+XML+and+Web+Technologies
in a web browser causes
– TCP connection to be established with
• the IP address associated by DNS with www.google.com
• port 80 (default value)
– message such as one above to be sent from browser to server
• Line 1 is a request line
– here, uses GET method to ask the server to send the resource
/search?q=An+Introduction+to+XML+and+Web+Technologies
using HTTP/1.1
• Remaining lines are header lines, each with the form,
field: value
• HTTP/1.1 supports larger set of header fields than HTTP/1.0
Request header fields
• Host
– contains domain name and port number (if not
omitted) of server that receives request
– optional in HTTP/1.0, mandatory in HTTP/1.1
• User-Agent
– contains information about the user agent (e.g.,
browser) that sends the request
• allows response to be tailored for use in the client software
• Referer
– allows client to specify URI of resource from which
URI in request was obtained
• e.g., if HTML page contains an img link, then request for
image will contain Referer field set to URI of HTML page
Accept header field
• Specifies media types that are acceptable as a response to the request
– also called MIME types (Multipurpose Internet Mail Extensions)
• now used for much more than e-mail
• Common media types are
– text/plain - plain, unformatted text
– text/html - HTML documents (not XHTML)
– text/xml - XML documents
– application/xml - for XML documents intended for application use, not human-readable XML (not clearly demarcated from text/xml)
– application/xhtml+xml - recommended for use with XHTML
– multipart/form-data - HTML-like form field values
– application/octet-stream - arbitrary binary data and data that doesn't fit into other categories
– image/jpeg - JPEG image
• Long list of media types maintained by Internet Assigned Numbers Authority (IANA)
• */* means all media types
• Quality parameter: mime-type;q=value
– value between 0 and 1 (1 is the default)
– indicates that mime-type is only acceptable if the quality of other mime types with higher q values is less than value times the quality of the mime-type format resource
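To make the q parameter concrete, here is a small sketch of how a server might read an Accept header value and rank the media types by their q values. The class AcceptHeader and its methods are hypothetical helpers. The parsing is deliberately simplified: it only looks at the q parameter and ignores the rule that more specific media types take precedence.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AcceptHeader {
    // Maps each media type in the header value to its q value (1.0 if absent).
    static Map<String, Double> parse(String header) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (String item : header.split(",")) {
            String[] parts = item.trim().split(";");
            double q = 1.0;
            for (int i = 1; i < parts.length; i++) {
                String p = parts[i].trim();
                if (p.startsWith("q=")) {
                    q = Double.parseDouble(p.substring(2));
                }
            }
            result.put(parts[0].trim(), q);
        }
        return result;
    }

    // Lists the media types from highest to lowest q value.
    static List<String> byPreference(String header) {
        Map<String, Double> q = parse(header);
        List<String> types = new ArrayList<>(q.keySet());
        types.sort((a, b) -> Double.compare(q.get(b), q.get(a)));
        return types;
    }

    public static void main(String[] args) {
        System.out.println(byPreference(
            "text/html;q=0.8, application/xhtml+xml, */*;q=0.1"));
    }
}
```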
Other request header fields
• Accept-Language
– defines acceptability of natural languages
• Accept-Encoding
– specifies accepted content codings
• usually compression techniques
• Accept-Charset
– specifies accepted character sets
• All can use q parameters
Responses
• Response from server
sent using same TCP
connection as request
• Response consists of
– header (lines 1-10 at left)
• begins with status line
indicating overall result
of attempt to satisfy
request
• followed by header lines
– body (lines 12-24 at left)
• contains requested
resource if request was
successful
• Response at left
returned when request
URI is
http://www.brics.dk/index.html
Response status line
• Status line (line 1 at left)
tells us that
– response uses HTTP/1.1
– status code for request is
200 OK
• means request succeeded
and resource follows header
• Five classes of status codes:
– 1xx indicates provisional,
informational response
– 2xx indicates success
• e.g., 200 OK
– 3xx indicates redirection
• e.g., 301 Moved
Permanently
– 4xx indicates client error
• e.g., 404 Not found
– 5xx indicates server error
• e.g., 500 Internal Server
Error
HTTP Response header lines
• Date shows date and time when response sent
• Server contains information about the server software
• ETag used for cache management
– usually digest of file size and last modification time
• Content-Length gives size of body in bytes
• Content-Type gives mime type of resource in body
• Content-Encoding indicates whether resource has been compressed (e.g., with gzip)
• Transfer-Encoding, if present, usually has value chunked, indicating that resource is being delivered in chunks
• Location used with status codes 301 and 307 to give new location of resource
HTML Forms
• When GO! button pressed, form field values sent to server as list of name-value pairs, encoded into a query string according to media type chosen using enctype attribute in form element
• Default media type is application/x-www-form-urlencoded (URL encoding) which would produce following:
bet=someone+else&email=toot%40pop.com&send=GO%21
– Fields listed in order of appearance in source
– & separates fields
– = separates name from value
– + replaces each space
– non-alphanumeric characters escaped
– line breaks encoded as %0d%0a
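The same encoding is available in Java through java.net.URLEncoder. The sketch below rebuilds the query string shown above from its three fields. FormEncoder is a made-up helper class, and UTF-8 is an assumed character encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class FormEncoder {
    // Encodes name-value pairs as application/x-www-form-urlencoded,
    // keeping the fields in the order in which they were added.
    static String encode(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> f : fields.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');                       // & separates fields
                }
                sb.append(URLEncoder.encode(f.getKey(), "UTF-8"))
                  .append('=')                            // = separates name from value
                  .append(URLEncoder.encode(f.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);           // UTF-8 is always supported
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("bet", "someone else");
        fields.put("email", "toot@pop.com");
        fields.put("send", "GO!");
        // prints bet=someone+else&email=toot%40pop.com&send=GO%21
        System.out.println(encode(fields));
    }
}
```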
get and post methods in an HTML form
• If form method is get, then query string is
appended to action URL:
– http://www.brics.dk/ixwt/echo?bet=someone+else&email=toot%40pop.com&send=GO%21
– Request line in HTTP request will therefore be
GET /ixwt/echo?bet=someone+else&email=toot%40pop.com&send=GO%21 HTTP/1.1
• If form method is post, then query string is
placed in body of HTTP request which might
then be as above
– as in response, body of request separated by
empty line from header
The difference between get and post
• GET requests
– mainly for retrieving data
– safe to the client
• client not responsible for any side-effects on server
– idempotent - i.e., side effects of two or more identical requests are
same as for one
– generated by clicking on an HTML link
– limited by maximum URL length imposed by browsers
– only possible media type is application/x-www-form-urlencoded
• POST requests
– is for operations that have side-effects on the server
– user usually responsible for any side effects on server
– not necessarily idempotent
• clicking "reload" on a page that results from a POST request causes browser to
warn that this might repeat the action the form has carried out
– not limited by maximum URL length imposed by browsers
– used for sensitive information (e.g. passwords) because servers usually
log request URIs but not request bodies
Web programming with Java
• Java highly suitable for web (and XML)
programming because
– it is platform independent
– it has a safe runtime model
• array bound checks, automatic garbage collection,
bytecode verification, etc.
– supports multi-threading and concurrency
• useful for servers and clients
– supports Unicode
– comes with a suite of powerful libraries for
network programming
• Only other language that competes with it
for web programming is C#
TCP/IP in Java
• Accessing TCP/IP in Java usually
requires
– java.net.InetAddress
• represents an IP address
• can do DNS look-ups
– java.net.Socket
• represents a TCP socket
– java.net.ServerSocket
• represents a server socket which is capable
of waiting for requests from clients
Performing DNS look-up
• Above program takes a single argument which should be a domain
name
• In line 7, getAllByName method used to produce an array of
InetAddresses which contains the IP addresses associated with the
domain name
• In line 9, getHostAddress method used to get IP address from
each InetAddress object in array a and print it out
• getAllByName method may throw an UnknownHostException
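The program the slide refers to is not reproduced in this transcript. A minimal sketch of such a look-up might look as follows; the class name DnsLookup and the helper lookup are assumptions, and the line numbers quoted above refer to the original listing, not to this sketch.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class DnsLookup {
    // Returns the textual IP addresses associated with a domain name.
    static String[] lookup(String name) throws UnknownHostException {
        InetAddress[] a = InetAddress.getAllByName(name);
        String[] result = new String[a.length];
        for (int i = 0; i < a.length; i++) {
            result[i] = a[i].getHostAddress();
        }
        return result;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java DnsLookup <domain-name>");
            return;
        }
        try {
            for (String ip : lookup(args[0])) {
                System.out.println(ip);
            }
        } catch (UnknownHostException e) {
            System.err.println("Unknown host: " + args[0]);
        }
    }
}
```

Run it as, for example, java DnsLookup www.titanmusic.com. The addresses printed depend on the DNS records at the time.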
Finding the domain name and IP
address of current machine
• Uses getLocalHost method in line 6 to construct an InetAddress
object containing information about the name and IP address of
the current machine on which the program is being executed
• Use getHostName and getHostAddress in lines 7 and 8 to get the
name and IP address of the current machine and print them out
• getLocalHost method may throw an UnknownHostException
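A comparable sketch for the local machine is shown below; the original listing is not part of this transcript, so the class name LocalHostInfo and the describe helper are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class LocalHostInfo {
    // Formats the host name and IP address held by an InetAddress.
    static String describe(InetAddress address) {
        return address.getHostName() + " has address " + address.getHostAddress();
    }

    public static void main(String[] args) {
        try {
            System.out.println(describe(InetAddress.getLocalHost()));
        } catch (UnknownHostException e) {
            System.err.println("Could not determine the local host: " + e.getMessage());
        }
    }
}
```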
Making a TCP connection between a
server and a client: The server
• New ServerSocket created on line 7
• Starts infinite loop in line 8, on each iteration of which,
– uses accept method in line 9 to get ss to listen for a connection to be made on the port given on the command line, then accepts it and creates a new socket, con, to represent the connection
– constructs an InputStreamReader, in, to read bytes from the input stream of con (line 10) and convert them to characters
– reads input using in, terminated with a 0 byte (lines 11-14) and stores in msg
– attaches PrintWriter object, out, to the output stream of con and prints "Simon says: " plus the message in msg on this stream (lines 15-17)
– closes the connection con (line 18)
– accept method may throw an IOException
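The server listing itself is missing from this transcript. The following sketch follows the steps described above, using the names ss, con, in, msg and out from the slide. The readMessage helper and the choice of UTF-8 are assumptions, and the line numbers quoted above do not correspond to this sketch.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class SimpleServer {
    // Reads characters until a 0 character or the end of the stream.
    static String readMessage(Reader in) throws IOException {
        StringBuilder msg = new StringBuilder();
        int c;
        while ((c = in.read()) > 0) {   // 0 ends the message, -1 is end of stream
            msg.append((char) c);
        }
        return msg.toString();
    }

    public static void main(String[] args) throws IOException {
        int port = Integer.parseInt(args[0]);
        try (ServerSocket ss = new ServerSocket(port)) {
            while (true) {
                // accept blocks until a client connects; con is closed after each reply
                try (Socket con = ss.accept()) {
                    Reader in = new InputStreamReader(
                        con.getInputStream(), StandardCharsets.UTF_8);
                    String msg = readMessage(in);
                    PrintWriter out = new PrintWriter(new OutputStreamWriter(
                        con.getOutputStream(), StandardCharsets.UTF_8));
                    out.print("Simon says: " + msg);
                    out.flush();
                }
            }
        }
    }
}
```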
Making a TCP connection between a
server and a client: The client
• Establishes a connection with the SimpleServer by giving its IP address and port as command line arguments
• The third command line argument is a message to send to the server
• Attaches a PrintWriter to the output stream associated with the connection (line 8)
• Prints the message given as an argument to the program to this output stream and terminates the message with a zero byte
• The read method (line 14) returns -1 when end of stream is reached
• Then associates an InputStreamReader with the input stream associated with the connection and receives the message sent by the server
• Finally closes the connection (line 17)
• getOutputStream method may throw an IOException
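A matching client can be sketched as follows. The original listing is not reproduced here; the class name SimpleClient, the helpers sendMessage and readAll, and the use of UTF-8 are assumptions.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class SimpleClient {
    // Writes the message followed by a zero character that marks its end.
    static void sendMessage(Writer out, String msg) throws IOException {
        out.write(msg);
        out.write(0);
        out.flush();
    }

    // Reads until read() returns -1, i.e. until the server closes the connection.
    static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // args: server address, port, message
        try (Socket con = new Socket(args[0], Integer.parseInt(args[1]))) {
            sendMessage(new OutputStreamWriter(
                con.getOutputStream(), StandardCharsets.UTF_8), args[2]);
            System.out.println(readAll(new InputStreamReader(
                con.getInputStream(), StandardCharsets.UTF_8)));
        }
    }
}
```

With a server such as the one described on the previous slide listening on port 1234 of the same machine, java SimpleClient 127.0.0.1 1234 hello should print the server's reply.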
HTTP in Java
(The hard way)
• Manually implements HTTP support on top of TCP/IP
• Sends request to Google and extracts the result
• Manually constructs an HTTP request (lines 8-11, 15-17) using fact that Google's "I'm Feeling Lucky" feature accepts GET requests of a particular format
• Parses response using fact that response always contains a Location header line
• First constructs a Socket and establishes a connection with Google server on port 80 (line 7)
• Constructs a query string in the right format for the "I'm Feeling Lucky" feature (lines 8-11)
• Writes the request header to an output stream attached to the socket (lines 12-18)
• Reads response a line at a time until finds a header line starting with "Location:" (while loop starting in line 24)
• Prints the URL value of this header line to standard output (line 26)
• Closes connection (line 35)
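The general technique can be sketched without the Google-specific query format, which is not reproduced in this transcript. In the sketch below the class RawHttp, its helper methods and the Connection: close header are assumptions; the host and path are taken from the command line rather than hard-coded.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class RawHttp {
    // Builds a minimal HTTP/1.1 GET request: request line, header lines, empty line.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    // Returns the value of the first Location header line,
    // or null if the header ends without one.
    static String findLocation(BufferedReader in) throws IOException {
        String line;
        while ((line = in.readLine()) != null && !line.isEmpty()) {
            if (line.regionMatches(true, 0, "Location:", 0, 9)) {
                return line.substring(9).trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // args: host, path (e.g. www.example.com /some/redirecting/path)
        try (Socket socket = new Socket(args[0], 80)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(args[0], args[1])
                .getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(new InputStreamReader(
                socket.getInputStream(), StandardCharsets.ISO_8859_1));
            System.out.println(findLocation(in));
        }
    }
}
```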
HTTP in Java
(The easier way)
• HttpURLConnection class makes it easier to create HTTP requests and parse responses
• Above program does same as previous one but uses HttpURLConnection object to create a connection
• First construct a URL object (line 13) then use its openConnection method to create a URLConnection
• URLConnection is an abstract class but when URL's scheme is http, openConnection creates an HttpURLConnection
– return value of openConnection should therefore be cast to the correct class
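A minimal sketch of this approach is given below. It is not the lecture's listing; the class EasyHttp and its open method are made up for illustration, and the address is taken from the command line.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;

class EasyHttp {
    // Creates (but does not yet connect) an HTTP connection for a GET request.
    static HttpURLConnection open(String address) throws IOException {
        URLConnection connection = URI.create(address).toURL().openConnection();
        // For an http URL the object returned is an HttpURLConnection.
        HttpURLConnection http = (HttpURLConnection) connection;
        http.setRequestMethod("GET");
        http.setInstanceFollowRedirects(false);   // keep the Location header visible
        return http;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection http = open(args[0]);
        // The request is sent when the response is first inspected.
        System.out.println(http.getResponseCode());
        System.out.println(http.getHeaderField("Location"));
        http.disconnect();
    }
}
```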
Read http://www.google.com/terms_of_service.html before running this program!
Methods in HttpURLConnection
• Note that request header lines are called properties in HttpURLConnection
• In HttpURLConnection, http redirects are followed by default
– can be disabled using setInstanceFollowRedirects(false)
• see line 15 above
• setRequestMethod
– sets request method (usually GET or POST)
• setRequestProperty
– sets a field:value pair in a header line in the request
• setDoInput
– should be set to true (default) if intend to read input from connection
• setDoOutput
– set to true (false by default) if intend to write output to connection
• connect
– establishes TCP connection
– usually not necessary since connection attempted at first write
• getOutputStream
– gives output stream for request body of POST requests
• getResponseCode
– returns response code (e.g., 200 for OK)
• getHeaderField
– returns field from response header
• getInputStream
– gives input stream for reading response body
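As an illustration of how these methods combine for a POST request, consider the sketch below. The class PostSetup and its prepare method are hypothetical, and the target address and request body are supplied on the command line rather than taken from the lecture.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class PostSetup {
    // Configures a connection for sending a URL-encoded form body.
    static HttpURLConnection prepare(String address) throws IOException {
        HttpURLConnection http =
            (HttpURLConnection) URI.create(address).toURL().openConnection();
        http.setRequestMethod("POST");
        http.setDoOutput(true);                   // we intend to write a request body
        http.setRequestProperty("Content-Type",
            "application/x-www-form-urlencoded"); // a request header line
        return http;
    }

    public static void main(String[] args) throws IOException {
        // args: address, already-encoded body (e.g. bet=someone+else)
        HttpURLConnection http = prepare(args[0]);
        try (OutputStream out = http.getOutputStream()) {
            out.write(args[1].getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(http.getResponseCode());
        http.disconnect();
    }
}
```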
A simple Web server in Java
• Takes two command line
arguments
– a port
– the root directory for files to be
served
• Then instantiates the class
FileServer and starts it (lines
26-27)
A simple Web server in Java
• run method creates a
ServerSocket
• Starts infinite loop of
processing requests
• Only reads first line of each request (lines 45-6)
A simple Web server in Java
• processRequest parses request line
• First makes sure request is well-formed (lines 63-9)
• Then ensures that URL does not contain "/." or end with a "~" (lines 72-5)
• Then checks that if the file is a directory then it ends with a '/'; if not, sends a "Moved Permanently" message back to the browser (which typically resends the request with the new URL) (lines 77-84)
• If requested file is a directory, then path of returned file set to the file index.html in the directory (lines 86-8)
• Attaches input stream to requested file (line 91)
• Guesses content type of file (lines 92-3)
• Prints out the response on the output print stream (lines 94-99)
• Logs interaction (line 100)
A simple Web server in Java
• log method prints out record of each interaction
• errorReport returns an HTML Error page to the client browser
• sendFile sends the file as the body of an HTTP response as a sequence of bytes
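The FileServer listing is not included in this transcript. The sketch below illustrates only a few of the checks that the slides attribute to processRequest. The class FileServerChecks and its method names are assumptions, and falling back to application/octet-stream for unknown file types is a choice made here, not something the slides state.

```java
import java.net.URLConnection;

class FileServerChecks {
    // Rejects request paths that contain "/." or end with "~".
    static boolean isAllowed(String path) {
        return !path.contains("/.") && !path.endsWith("~");
    }

    // A directory request (path ending in '/') is served from its index.html.
    static String resolve(String path) {
        return path.endsWith("/") ? path + "index.html" : path;
    }

    // Guesses the media type from the file name.
    static String contentType(String path) {
        String type = URLConnection.guessContentTypeFromName(path);
        return type != null ? type : "application/octet-stream";
    }

    public static void main(String[] args) {
        String path = args.length > 0 ? args[0] : "/docs/";
        if (!isAllowed(path)) {
            System.out.println("403 Forbidden");
            return;
        }
        String file = resolve(path);
        System.out.println(file + " would be served as " + contentType(file));
    }
}
```

Rejecting "/." blocks both hidden files and ".." path segments, which would otherwise let a request escape the root directory.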