The World Wide Web - Monash University

Topic 9: The World Wide Web
CSE2395/CSE3395
Perl Programming
Camel3 page 878
LWP, lwpcook, CGI manpages
In this topic
 The World Wide Web
 Writing a Perl web client
► LWP module
 Dynamic web pages
► Common Gateway Interface (CGI)
Original Slides by Debbie Pickett, Modified by David Abramson, 2006, Copyright Monash University
The World Wide Web
 Developed in 1991 as a mechanism for linking hypertext across the Internet
► documents contain links to other documents
 Documents were considered static and stateless
► requesting the same document twice always returned identical copies
 Documents were primarily text
► focus was on content, not presentation
► HTML contained some rudimentary markup for formatting
 Much of this has now changed
Terminology
 Documents are identified with a Uniform Resource Locator/Identifier (URL/URI)
► unique string identifying a document’s location
► http://www.google.com/
 Documents are requested and sent using Hypertext Transfer Protocol (HTTP)
► simple text-based file-transfer protocol understood by both ends of a transfer
– web browser (user agent) (client)
– web site (server)
► form of responses strongly resembles email messages
 Documents are often written in Hypertext Markup Language (HTML)
► text-based markup, like Rich Text Format (RTF); since reformulated as an application of Extensible Markup Language (XML), known as XHTML
Fetching a document by HTTP
[Diagram, with a time axis: the user agent (browser) running on the client sends a request across the Internet to the web server program running on the server, which returns a response.]
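The request and response in this exchange are plain text. The sketch below builds a minimal GET request and picks apart a hand-written sample response, so no network is needed. The sub names, the agent string, and the sample response are assumptions made for illustration; a real user agent handles many more cases.

```perl
use strict;
use warnings;

# Build the text of a minimal HTTP/1.0 GET request for a host and path.
sub build_get_request {
    my ($host, $path) = @_;
    return join("\r\n",
        "GET $path HTTP/1.0",
        "Host: $host",
        "User-Agent: example-agent/0.1",   # hypothetical agent name
        "", "");                            # blank line ends the header
}

# Split a raw response into status code, header hash and body.
sub parse_response {
    my ($raw) = @_;
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;
    my ($status_line, @header_lines) = split /\r?\n/, $head;
    my ($code) = $status_line =~ m{^HTTP/\d\.\d\s+(\d{3})};
    my %header;
    for my $line (@header_lines) {
        my ($name, $value) = split /:\s*/, $line, 2;
        $header{lc $name} = $value;         # header names are case-insensitive
    }
    return ($code, \%header, $body);
}

# A hand-written sample response, not captured from a real server.
my $sample = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html>...</html>\n";
my ($code, $header, $body) = parse_response($sample);

print build_get_request("www.google.com", "/");
print "Status $code, type $header->{'content-type'}\n";
```

The header-then-blank-line layout is the same one a CGI program has to produce itself.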
User agent
 Web browser is a kind of user agent
► initiates HTTP connection to server
► requests document using GET request
► receives response (header and document) from server
► disconnects from server
► decodes headers
► renders document on screen
 Any program can be a user agent
► Library for WWW in Perl (LWP) provides helper functions
► use LWP::UserAgent;
► use LWP::Simple;
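A program that acts as a user agent follows the same steps as a browser, apart from rendering. The following is a minimal sketch using LWP::UserAgent; the sub name fetch_document, the agent string, and the 10-second timeout are choices made for this illustration.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Fetch a URL and return (status line, content type, decoded body).
# The body is undef if the request did not succeed.
sub fetch_document {
    my ($url) = @_;
    my $ua = LWP::UserAgent->new(timeout => 10);
    $ua->agent("example-agent/0.1");          # hypothetical agent name
    my $response = $ua->get($url);            # connect, send GET, read response
    my $type = $response->header("Content-Type");
    my $body = $response->is_success ? $response->decoded_content : undef;
    return ($response->status_line, $type, $body);
}

# Usage (needs network access, so it only runs when a URL is given).
if (@ARGV) {
    my ($status, $type, $body) = fetch_document($ARGV[0]);
    print "Status: $status\n";
    print "Type:   ", (defined $type ? $type : "unknown"), "\n";
    print "Length: ", length($body), " characters\n" if defined $body;
}
```

Unlike get from LWP::Simple, the response object exposes the status line and headers, so a failure can be reported with its cause.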
Timeout
# Fetch a web page with LWP::Simple.
use LWP::Simple;
# get() returns undef if the document could not be fetched.
$doc = get("http://www.google.com/");
die "Couldn't access document" unless defined $doc;
# Process the document (/s lets the title span several lines).
if ($doc =~ /<title>(.*?)<\/title>/is)
{
    print "Title is $1\n";
}
else
{
    print "Document has no <title> tag\n";
}
Common Gateway Interface (CGI)
 Document served by server is usually a file on disk
 Server may instead run a program (“CGI program”) that produces the document
► part of the URL designates the program’s name
 Program produces the entire response
► including HTTP header and blank line
► response is sent as-is by server to user agent
 Server needs to distinguish between serving a static file or running a program
► two common approaches
– run anything whose name ends in .cgi
– run anything in the /cgi-bin directory
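The two conventions can be expressed as a small test on the request path. This is only a sketch of the idea, not how any particular web server is configured; the sub name is_cgi_request and the sample paths are made up for illustration.

```perl
use strict;
use warnings;

# Decide how a server might treat a request path, using the two
# conventions above: a .cgi suffix or a cgi-bin directory.
sub is_cgi_request {
    my ($path) = @_;
    $path =~ s/\?.*//s;                  # ignore any query string
    return 1 if $path =~ m{\.cgi$};      # name ends in .cgi
    return 1 if $path =~ m{/cgi-bin/};   # lives in a cgi-bin directory
    return 0;
}

for my $path ("/index.html", "/cgi-bin/myprogram", "/tools/search.cgi?q=perl") {
    printf "%-28s %s\n", $path, is_cgi_request($path) ? "run program" : "serve file";
}
```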
Fetching a document by HTTP
[Diagram, with a time axis: the user agent (browser) running on the client sends a request to the web server program running on the server. The server invokes the program (an instance of the application) and passes form data to it; the server then verifies the format of the response and passes it to the client.]
Writing a CGI program
 Read form data
► contents of all form elements on originating web page, if any
► form data found either at end of URL or on standard input
– depending on whether GET or POST method used
► Perl CGI module facilitates this
 Process data
 Produce response
► send to standard output
► produce HTTP header
– Content-Type header mandatory
► produce blank line
► produce body of response
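The steps above can be sketched without the CGI module to show what it does on the program's behalf. The server passes the request method, query string, and body length in the environment variables REQUEST_METHOD, QUERY_STRING, and CONTENT_LENGTH. The sub names below are made up for this illustration, and the sketch leaves out the decoding and error handling that the CGI module provides.

```perl
use strict;
use warnings;

# Return the raw, still-encoded form data for a request.
# $env is a hash like %ENV; $in is a filehandle like STDIN.
sub read_raw_form_data {
    my ($env, $in) = @_;
    my $method = $env->{REQUEST_METHOD} || "GET";
    if ($method eq "POST") {
        # POST: form data arrives on standard input.
        my $length = $env->{CONTENT_LENGTH} || 0;
        my $data = "";
        read($in, $data, $length) if $length > 0;
        return $data;
    }
    # GET: form data is the part of the URL after the "?".
    return defined $env->{QUERY_STRING} ? $env->{QUERY_STRING} : "";
}

# Assemble a complete CGI response: header, blank line, body.
sub cgi_response {
    my ($content_type, $body) = @_;
    return "Content-Type: $content_type\n\n$body";
}

my $raw = read_raw_form_data(\%ENV, \*STDIN);
print cgi_response("text/plain", "Raw form data: $raw\n");
```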
Installing a CGI program at Monash
 Install program in
► $HOME/WWW/cgi-bin/myprogram
 Permissions must be set correctly
► cgi-bin and parent directories must be searchable by all
– home.page.setup
– chmod a+x ~ ~/WWW ~/WWW/cgi-bin
► program must be readable and executable by you
– chmod u+rx myprogram
 Program is accessible at URL
► http://users.monash.edu.au/~you/cgi-bin/myprogram
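If a freshly installed program is not served, the permissions are a likely cause. The sketch below checks the same mode bits that the chmod commands above set, using Perl's stat. The sub names are made up for this illustration, and the paths assume the layout described on this slide.

```perl
use strict;
use warnings;

# True if everyone may search (enter) the directory: the x bit is set
# for user, group and other, which is what chmod a+x grants.
sub searchable_by_all {
    my ($dir) = @_;
    return 0 unless -d $dir;
    my $mode = (stat($dir))[2];
    return (($mode & 0111) == 0111) ? 1 : 0;
}

# True if the owner may read and execute the file (chmod u+rx).
sub owner_can_run {
    my ($file) = @_;
    return 0 unless -f $file;
    my $mode = (stat($file))[2];
    return (($mode & 0500) == 0500) ? 1 : 0;
}

# Paths from the slide; "myprogram" is the slide's placeholder name.
my $home = defined $ENV{HOME} ? $ENV{HOME} : ".";
for my $dir ($home, "$home/WWW", "$home/WWW/cgi-bin") {
    print "$dir: ", (searchable_by_all($dir) ? "ok" : "not searchable by all"), "\n";
}
my $program = "$home/WWW/cgi-bin/myprogram";
print "$program: ", (owner_can_run($program) ? "ok" : "not readable and executable by owner"), "\n";
```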
Timeout
#!/usr/bin/perl -w
# Generate a static CGI page.
# << notation is a fancy kind of string quoting
# reminiscent of shell here-documents. All text
# between the FLAGs is in the string. The blank line after
# the Content-Type header is required: it ends the HTTP header.
print <<"FLAG";
Content-Type: text/html

<html>
<head><title>Hello</title></head>
<body><p>Hello, world!</p></body>
</html>
FLAG
Timeout
#!/usr/bin/perl -w
# Generate a CGI page with varying text.
# The blank line after the header is required.
print <<"EOT";
Content-Type: text/html

<html>
<head><title>Date</title></head><body>
EOT
# Get date.
chomp($date = `/bin/date`);
print "<p>The date is <b>$date</b></p>\n";
print "</body></html>\n";
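The example above runs /bin/date in a subshell. A variation that stays inside Perl is sketched below, using strftime from the core POSIX module. The format string is an assumption that approximates the usual output of date, and the sub name date_page is made up for this illustration.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Build the whole response as one string: header, blank line, body.
sub date_page {
    my $date = strftime("%a %b %e %H:%M:%S %Y", localtime);
    return "Content-Type: text/html\n\n"
         . "<html>\n<head><title>Date</title></head><body>\n"
         . "<p>The date is <b>$date</b></p>\n"
         . "</body></html>\n";
}

print date_page();
```

Avoiding the external command removes the dependency on where date is installed and saves starting a second process for each request.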
Forms
[Form: a text field labelled “What is your species?”, a text field labelled “What is your preferred language?” pre-filled with “Thai”, and a “Go” button.]
Form data
 Form data is text entered into web page in HTML <INPUT>, <SELECT> and <TEXTAREA> tags
► <FORM><INPUT type="text" name="species">
<INPUT type="text" name="language" value="Thai">
<INPUT type="submit" name="x" value="Go"></FORM>
 Form data is submitted by browser in HTTP request
► each parameter and its value
► species=human&language=English&x=Go
 Perl CGI module includes param function which extracts parameters’ values
► use CGI ("param");
► param("species") # "human"
► param("language") # "English"
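To show roughly what param does behind the scenes, here is a hand-written decoder for the submitted string. It is a simplified sketch: the sub names are made up, and a repeated parameter name keeps only its last value, whereas the CGI module handles multi-valued parameters.

```perl
use strict;
use warnings;

# Decode one URL-encoded component: "+" means space, %XX is a byte in hex.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Turn "name=value&name=value" into a hash of decoded values.
# Simplification: a repeated name keeps only its last value.
sub parse_form_data {
    my ($raw) = @_;
    my %param;
    for my $pair (split /[&;]/, $raw) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $param{ url_decode($name) } = url_decode(defined $value ? $value : "");
    }
    return \%param;
}

my $form = parse_form_data("species=human&language=English&x=Go");
print "$_ = $form->{$_}\n" for sort keys %$form;
```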
Timeout
# Process form data and produce a response.
use CGI qw(param);
# Get parameters.
$kind = param("species");
$tongue = param("language");
print <<"EOT";
Content-Type: text/html

<html><head><title>Greetings</title></head>
<body><h1>Greetings, $kind!</h1>
<p>Do you speak $tongue?</p>
</body></html>
EOT
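The example above copies form values straight into the page, so a value containing HTML markup would be interpreted by the browser. A minimal sketch of escaping values before printing them follows; the sub name escape_html and the sample input are made up for this illustration (the CGI module also provides an escapeHTML function).

```perl
use strict;
use warnings;

# Replace the characters that are special in HTML with entities.
sub escape_html {
    my ($text) = @_;
    $text = "" unless defined $text;
    $text =~ s/&/&amp;/g;      # must come first
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

my $kind = escape_html('<script>alert("hi")</script>');   # hostile sample input
print "<h1>Greetings, $kind!</h1>\n";
```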
HTML shortcuts
 Printing raw HTML can make source code difficult to read
 CGI module provides helper functions for generating HTML tags
► markup and form generation
► without shortcut: print "<h1>Heading</h1>";
► with shortcut: print h1("Heading");
 Need to import helper functions
► use CGI qw(h1 h2 p b em table ...);
► use CGI qw(:standard);
Timeout
# Using HTML shortcuts.
use CGI qw(:standard);
# Get parameters.
$kind = param("species");
$tongue = param("language");
print header(), start_html("Greetings"),
    h1("Greetings, $kind!"),
    p("Do you speak $tongue?"),
    end_html();
Keeping state
 HTTP is a stateless protocol
► each connection is independent
 Often want to present several pages to user in sequence
► e.g., shopping cart
 Several solutions
► use a hidden parameter
– <INPUT type="hidden">
► use cookies
– CGI module’s cookie function
► put state information in URL
– requires support from web server
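As an illustration of the cookie approach, the sketch below handles cookies by hand: the browser sends them back in the Cookie request header, which a CGI program sees as the HTTP_COOKIE environment variable, and the program sets them with a Set-Cookie response header. The sub names and the cookie name visits are made up for this example; the CGI module's cookie function wraps the same mechanism.

```perl
use strict;
use warnings;

# Parse the Cookie request header, which a CGI program receives in the
# HTTP_COOKIE environment variable, into a hash.
sub parse_cookies {
    my ($raw) = @_;
    my %cookie;
    for my $pair (split /;\s*/, defined $raw ? $raw : "") {
        my ($name, $value) = split /=/, $pair, 2;
        $cookie{$name} = $value if defined $value;
    }
    return \%cookie;
}

# Build a response header that asks the browser to remember a value.
sub header_with_cookie {
    my ($name, $value) = @_;
    return "Content-Type: text/html\n"
         . "Set-Cookie: $name=$value\n"
         . "\n";
}

my $cookies = parse_cookies($ENV{HTTP_COOKIE});
my $old     = $cookies->{visits};                 # "visits" is a made-up cookie name
# Cookies come from the client, so accept the value only if it is a number.
my $visits  = (defined $old && $old =~ /^\d+$/) ? $old + 1 : 1;

print header_with_cookie("visits", $visits);
print "<html><body><p>Visit number $visits</p></body></html>\n";
```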
Timeout
use CGI qw(:standard);
# The hidden "state" parameter records which page to show.
$page = param("state");
print header();
if (!defined $page)
{
    # First visit: show the form.
    print start_html("Question"), start_form(),
        p("What is your species?", textfield("species")),
        p("Use what language?", textfield("language", "Thai")),
        p(submit("x", "Go")), hidden("state", "result"),
        end_form(), end_html();
}
elsif ($page eq "result")
{
    # Form submitted: fetch the values and greet the user.
    $kind = param("species");
    $tongue = param("language");
    print start_html("Greetings"),
        h1("Greetings, $kind!"),
        p("Do you speak $tongue?"),
        end_html();
}
CGI security
 CGI security is very important
► CGI programs are run on local host
– as your user ID
– in your directories
► connections initiated from user agents worldwide
– strangers can’t be trusted!
► HTTP requests can be hand-crafted to exploit security holes
 Always check form data for correctness
► correct values
► correct combination of parameters
 Never let error conditions provide hints about implementation
► error messages that are helpful during debugging are also helpful to crackers
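One way to apply both rules is sketched below: form data is checked against explicit patterns, the detailed reason for a rejection goes to standard error (which a web server typically writes to its error log), and the user sees only a generic message. The sub name validate_form, the list of known languages, and the length limit are assumptions made for this illustration.

```perl
use strict;
use warnings;

my %KNOWN_LANGUAGE = map { $_ => 1 } qw(English Thai French);  # hypothetical list

# Return a list of problems; an empty list means the data is acceptable.
sub validate_form {
    my ($form) = @_;
    my @problems;
    # Correct combination of parameters: both must be present.
    for my $name (qw(species language)) {
        push @problems, "missing $name" unless defined $form->{$name};
    }
    return @problems if @problems;
    # Correct values: letters, spaces and hyphens only, at most 40 characters.
    push @problems, "bad species"
        unless $form->{species} =~ /\A[A-Za-z][A-Za-z -]{0,39}\z/;
    push @problems, "bad language"
        unless $KNOWN_LANGUAGE{ $form->{language} };
    return @problems;
}

my @problems = validate_form({ species => "human", language => "Thai" });
if (@problems) {
    warn "rejected form: @problems\n";   # details stay on the server side
    print "Content-Type: text/plain\n\nSorry, that request could not be processed.\n";
}
else {
    print "Content-Type: text/plain\n\nForm data accepted.\n";
}
```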
Further reading
 LWP, lwpcook manpages
 CGI manpage
 Learning Perl 2nd edition, chapter 19
► not in 3rd edition
 CGI Programming with Perl
► Scott Guelich, Shishir Gundavaram, Gunther Birznieks, O’Reilly 2000
 Perl Cookbook
► Tom Christiansen & Nathan Torkington, O’Reilly 1st edition 1998, 2nd edition 2003
Covered in this topic
 Writing a Perl web client
► LWP::Simple module
 Dynamic web pages
► Common Gateway Interface (CGI)
► forms
► keeping state
Going further
 LWP::UserAgent
► full object-oriented interface to Perl web user agent
 HTML::Parser and XML::Parser
► tools for processing HTML and XML
 GD
► module to create images on the fly
 Tainting
► dealing with insecure data
► Camel3 pages 558-568
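As a taste of tainting: when Perl runs with the -T switch, data from outside the program (form data, environment variables) is marked tainted and cannot be used in commands or file names until it has been checked. Capturing the acceptable part with a regular expression is the standard way to untaint a value. The sketch below shows only that check; the sub name, the pattern, and the sample inputs are made up for this illustration, and the code also runs without -T.

```perl
# Run with: perl -T script.pl   (the -T switch turns taint mode on)
use strict;
use warnings;

# Return a cleaned copy of a file name, or undef if it looks unsafe.
# The value captured in $1 is what Perl treats as untainted.
sub untaint_filename {
    my ($name) = @_;
    return unless defined $name;
    return $1 if $name =~ /\A([A-Za-z0-9_][A-Za-z0-9_.-]{0,63})\z/;
    return;
}

for my $input ("report.txt", "../../etc/passwd", "x; rm -rf /") {
    my $clean = untaint_filename($input);
    print(defined $clean ? "accepted: $clean\n" : "rejected: $input\n");
}
```

The pattern lists what is allowed rather than what is forbidden, so unexpected characters are rejected by default.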
Next topic
 References
► Perl’s answer to pointers
 Nested data structures
► multi-dimensional arrays
► emulating C structs
perlref, perlreftut, perllol, perldsc manpages
Copyright
Perl Programming lecture notes Copyright ©
2000-2004 Deborah Pickett. Reproduction of this
presentation for nonprofit study use is permitted.
All other reproduction must be authorized in
writing by the author.