HTML and HTTP
Based on Computer Networks and
Internets, Comer
CSIT 220 (Blum)
Hypertext
• HTML stands for HyperText Markup Language
and HTTP stands for HyperText Transfer
Protocol, so that raises the question: what is
hypertext?
• Hypertext is “a method of storing data through a
computer program that allows a user to create and
link fields of information at will and to retrieve
the data non-sequentially.” (Webster’s)
• A hyperlink is a region of one document (page)
that, when clicked, brings up another document
for the user.
• Ted Nelson coined the term “hypertext” in the 1960s.
URL
• The “resources” (data or program files) are
located on many computers across an
internet or the Internet; hence this is a
“distributed” system.
• The location of a resource is given by its
URL (Uniform Resource Locator)
– http://www.lasalle.edu:1234/it/fake.htm#attach
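As a sketch of how the parts of that example URL are read, Python's standard `urllib.parse.urlsplit` function separates the protocol (scheme), host, port, path and fragment. The URL is the made-up one from the slide, so it is only parsed, never fetched.

```python
from urllib.parse import urlsplit

# The example URL from the slide; nothing is downloaded.
url = "http://www.lasalle.edu:1234/it/fake.htm#attach"
parts = urlsplit(url)

print(parts.scheme)    # http: the protocol to use
print(parts.hostname)  # www.lasalle.edu: the machine holding the resource
print(parts.port)      # 1234: the TCP port (HTTP uses 80 when none is given)
print(parts.path)      # /it/fake.htm: where the file is on that machine
print(parts.fragment)  # attach: a named anchor within the page
```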
Browser
• Hypertext is generally viewed in a web browser,
an application used to locate (linked or otherwise)
web pages and display them.
• Some browsers, such as Lynx, handle only text
documents.
• But when most people think of browsers they
think of Netscape Navigator and/or Microsoft
Internet Explorer, which support more than just
text.
Hypermedia
• Modern browsers link information in non-textual
format (graphics, sound, video, etc.) and so are
“multimedia” or “hypermedia” programs.
• The browser may need a plug-in to support some
formats. A plug-in adds a particular feature or
service to a larger system.
• Browser plug-ins are matched to documents by MIME file type.
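A browser learns a document's MIME type (for example text/html or application/pdf) and uses it to choose a built-in viewer or a plug-in. The sketch below rests on two assumptions. First, it guesses the type from the file extension with Python's `mimetypes` module, whereas a real browser reads the type from the server's Content-Type header. Second, the `PLUGINS` table is hypothetical.

```python
import mimetypes

# Hypothetical registry: MIME type -> component that can display it.
PLUGINS = {
    "text/html": "built-in HTML renderer",
    "image/jpeg": "built-in image viewer",
    "application/pdf": "PDF plug-in",
}

def handler_for(filename: str) -> str:
    """Pick a display component for a file, keyed on its MIME type."""
    # Guessing from the extension stands in for the Content-Type header here.
    mime_type, _encoding = mimetypes.guess_type(filename)
    return PLUGINS.get(mime_type, "no plug-in installed: offer to save the file")

for name in ("fake.htm", "photo.jpg", "report.pdf", "notes.unknownext"):
    print(name, "->", handler_for(name))
```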
Mosaic
• The first widely used multimedia browser was
Mosaic.
• Marc Andreessen is credited with initiating the
development of Mosaic.
• Mosaic moved the Internet out of the realm of
academics and computer hobbyists by making it
accessible to a much more general audience.
• It helped the Internet maintain its exponential
growth in number of users.
Fig. 2.1: Computers connected to the Internet vs. Year (chart labeled “mosaic”)
Mosaic (Cont.)
• Andreessen started Mosaic while working for the
National Center for Supercomputing Applications
(NCSA) at the University of Illinois.
• Andreessen helped found Netscape
Communications, which was originally called
Mosaic Communications.
• Mosaic is distinct from Netscape. In fact, Mosaic
is also licensed for commercial use and is
provided to users by some Internet access
providers.
HTML
• Browsers interpret web documents,
especially HTML documents
• HyperText Markup Language is an
“authoring” scheme for creating documents
for the World Wide Web.
• The World Wide Web (WWW) is the
collection of resources available through
HTTP to users on the Internet.
Markup
• The M in HTML stands for “Markup”
• Markup refers to the sequence of characters
(or symbols) inserted in a document to
indicate how the file should look when it is
printed or displayed and/or to describe the
document's logical structure.
• The markup indicators are often called
"tags."
Tags
• These formatting instructions must be
distinguishable from the text they are in.
• In HTML, angle brackets < and > are used as
delimiters to indicate the beginning and end of a
tag
– This gives <b>bold</b> type.
• As with the byte stuffing we saw in data-link
frames (where soh and eot were special characters),
angle brackets that are meant to appear as text must
be written in an HTML document as &lt; and &gt;.
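A minimal sketch of this replacement in Python. The helper `escape_markup` is illustrative, and it is checked against the standard library's `html.escape`. Because the entities themselves begin with an ampersand, a literal & must also be escaped (as &amp;). It must be replaced first, or the & inside a freshly written &lt; would be escaped a second time.

```python
import html

def escape_markup(text: str) -> str:
    """Make text safe to place between HTML tags."""
    # Replace & first; doing it later would turn "&lt;" into "&amp;lt;".
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

sample = "if a < b && b > c"
print(escape_markup(sample))  # if a &lt; b &amp;&amp; b &gt; c

# The standard library does the same job (quote=False leaves quote marks alone).
assert escape_markup(sample) == html.escape(sample, quote=False)
# A browser (or html.unescape) turns the entities back into characters.
assert html.unescape(escape_markup(sample)) == sample
```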
Tags (Cont.)
• The formatting or structure the tag indicates often
refers to an entire region, so many HTML tags
occur in pairs (a starting tag and a trailing tag). The
trailing tag includes a slash.
• An HTML document begins with an <HTML> tag and
ends with an </HTML> tag.
• An HTML document is broken into two pieces:
the head and the body
– The head is the part between the head tags <head> and
</head>
– The body is the part between the body tags <body> and
</body>
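The pairing and nesting rules can be checked mechanically. The sketch below walks a small page with Python's `html.parser.HTMLParser` and requires every starting tag to be closed by its matching trailing tag, in nested order. `TagPairChecker` and the sample page are illustrative. The check is stricter than real HTML, which lets some tags (such as <P>) go unclosed and has tags (such as <BR>) with no trailing tag at all.

```python
from html.parser import HTMLParser

PAGE = """<html>
<head><title>Sample</title></head>
<body><p>This gives <b>bold</b> type.</p></body>
</html>"""

class TagPairChecker(HTMLParser):
    """Records tags in order and insists each starting tag gets its trailing tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of tags still waiting for their trailing tag
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)  # HTMLParser reports tag names in lower case
        self.events.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if not self.open_tags or self.open_tags[-1] != tag:
            raise ValueError(f"</{tag}> does not close the most recent open tag")
        self.open_tags.pop()
        self.events.append(f"</{tag}>")

checker = TagPairChecker()
checker.feed(PAGE)
checker.close()
print(" ".join(checker.events))
print("all tags closed:", not checker.open_tags)
```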
Example from W3
HTML 4.01 versions
HTML tag
Page from my site (screenshot; callout: “A space”)
Cascading Style Sheet
• An HTML document can be written to work in
conjunction with a CSS file, a cascading style
sheet.
• A cascading style sheet separates out instructions
about look and layout so that they can be reused,
that is, referred to many times either within the
same document or by different documents.
• XSL (Extensible Stylesheet Language) is a newer
style language designed for XML documents, but
CSS is still popular.
HTML (Cont.)
• There are dozens of other tags used to format
and lay out the information in a Web page.
• For instance, <P> is used to make paragraphs and
<I> … </I> is used to italicize text.
• Tags are also used to specify hypertext links.
– <a href="http://www.lasalle.edu">La Salle</a>
• HTML is not the only Markup Language.
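Because a hyperlink is just an <a> tag with an href attribute, a program can find the links in a page by looking for those tags. Here is a sketch using Python's `html.parser`. `LinkCollector` is an illustrative name, and the sample markup reuses the La Salle link from the slide.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects (link text, target URL) pairs from <a href=...> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # target of the <a> tag we are inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text), self._href))
            self._href = None

collector = LinkCollector()
collector.feed('<P>Visit <I>our</I> school: <a href="http://www.lasalle.edu">La Salle</a>')
collector.close()
print(collector.links)  # [('La Salle', 'http://www.lasalle.edu')]
```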
SGML
• HTML has similarities to SGML, Standard
Generalized Markup Language, a generic system
for organizing and tagging elements of a
document.
• GML was started by IBM and became SGML
when it was taken over by the International
Organization for Standardization (ISO).
• SGML is not about formatting; it is more general.
SGML provides rules for tagging elements.
• Those tags might be interpreted as formatting as is
done in HTML but can be interpreted in other
ways as well.
XML
• Extensible Markup Language
• “Extensible” means capable of being extended,
and markup language involves tags, so XML is a
scheme in which the user can define his or her
own tags.
• For example, a company may elect to designate a
social security number by placing it in tags
defined for that purpose
– <ssn>123456789</ssn>
• This data can be transported from application to
application and system to system, carrying its
self-identifying tag with it.
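A sketch of that round trip with Python's `xml.etree.ElementTree`. The sender wraps each value in a descriptive tag, and the receiver finds the value by its tag rather than by its position. The <ssn> tag and number come from the slide. The <employee> and <name> tags and the name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sender: label each value with a tag that says what it is.
record = ET.Element("employee")
ET.SubElement(record, "name").text = "Pat Doe"
ET.SubElement(record, "ssn").text = "123456789"
wire = ET.tostring(record, encoding="unicode")
print(wire)  # <employee><name>Pat Doe</name><ssn>123456789</ssn></employee>

# Receiver: another program locates the value by its tag.
received = ET.fromstring(wire)
print(received.findtext("ssn"))  # 123456789
```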
XML (Cont.)
• Unlike HTML tags, XML tags are not necessarily
about formatting and presentation.
• However, a presentation application can be
instructed to represent a certain type of data (as
identified by its XML tags) in a particular way.
• On the other hand, a database interface program
can be instructed to place the information into the
appropriate field.
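A sketch of two programs treating the same tagged data differently. A presentation function chooses how to display the <ssn> value (masked here). A database interface, using Python's built-in `sqlite3` with an in-memory table, puts each tagged value into the matching field. The table layout, the masking rule and the sample record are illustrative assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

# One XML record (tag names are illustrative), read by two different programs.
doc = ET.fromstring("<employee><name>Pat Doe</name><ssn>123456789</ssn></employee>")

def present(employee: ET.Element) -> str:
    """Presentation program: decides how an <ssn> is shown (masked here)."""
    ssn = employee.findtext("ssn")
    return f"{employee.findtext('name')} (SSN ***-**-{ssn[-4:]})"

print(present(doc))  # Pat Doe (SSN ***-**-6789)

# Database interface: puts each tagged value into the field of the same name.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE employee (name TEXT, ssn TEXT)")
db.execute(
    "INSERT INTO employee (name, ssn) VALUES (?, ?)",
    (doc.findtext("name"), doc.findtext("ssn")),
)
rows = db.execute("SELECT name, ssn FROM employee").fetchall()
print(rows)  # [('Pat Doe', '123456789')]
db.close()
```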
XHTML
• Extensible Hypertext Markup Language is a
mixture of HTML and XML designed for
network display devices.
• XHTML is written in XML; therefore, it is
an XML application.
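Because XHTML is an XML application, an XHTML page must be well-formed XML. Every tag must be closed, tags must nest properly, and tag names must match in case (XHTML uses lower case). Loose HTML habits that a browser tolerates are rejected by an XML parser. This sketch uses Python's `xml.etree.ElementTree` as that parser. `is_well_formed` is an illustrative helper. It checks well-formedness only, not validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """True if an XML parser accepts the markup, as XHTML requires."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<p>Line one<br/>line two</p>"))  # True: empty tag closed with />
print(is_well_formed("<p>Line one<br>line two</p>"))   # False: <br> is never closed
print(is_well_formed("<P>Mixed case</p>"))             # False: XML names are case-sensitive
```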
CFML
• ColdFusion Markup Language is a markup
language developed by Allaire (which has merged
with Macromedia) for use with ColdFusion, a
product that helps web pages work with databases.
– CFML is proprietary.
• ColdFusion tags are placed in HTML files.
– The HTML tags determine the page's look (layout).
– The CFML tags bring in information (content) that
results from user input and/or database queries.
• Files created with CFML have the file extension
.cfm
DOM
• The Document Object Model is a set of rules about how
the objects on a web page (for instance, text, images,
textboxes, buttons) look and function.
• The DOM specifies the properties of the object as well as
the events associated with each object.
– A button’s properties are its height, width, position,
color, etc.
– A button’s events are click, right click, mouse down,
etc.
• Dynamic HTML (DHTML) uses the DOM to determine the
appearance of Web pages after they have been downloaded
(that is, on the client side).
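Python's `xml.dom.minidom` module implements the W3C DOM Core interfaces, so it can illustrate the central idea: a page is a tree of objects whose properties can be read and changed after the page has arrived. It is only a stand-in for a browser. It has no layout engine and no events (click, mouse down, and so on), which browsers add on top of the core model. The button markup is made up for illustration.

```python
from xml.dom.minidom import parseString

# A tiny page fragment; parseString builds a DOM tree of node objects from it.
doc = parseString('<form><button id="go" style="width: 80px">Go</button></form>')
button = doc.getElementsByTagName("button")[0]

print(button.tagName)                # button
print(button.getAttribute("style"))  # width: 80px

# DHTML-style change: script edits the tree after the page has been downloaded.
button.setAttribute("style", "width: 120px")
button.firstChild.data = "Stop"      # the button's text is a child text node
print(button.toxml())
```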
DOM (Cont.)
• Alas, Netscape Navigator and Microsoft
Internet Explorer use different DOMs.
• This is why their implementations of
DHTML are so different.
• Both companies have submitted their
DOMs to the World Wide Web
Consortium (W3C) for standardization.
HTTP
• HTML and other web documents are sent across
the network using HTTP (Hypertext Transfer
Protocol), which was originally developed by Dr.
Tim Berners-Lee.
– It was developed while he worked at CERN, a center
for particle physics, so that scientists from all over the
world could share information.
• HTTP defines rules for how messages are
formatted and transmitted, what actions are
allowed by Web servers, what actions are allowed
by clients, etc.
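As an example of those formatting rules, HTTP messages are plain text. Each message has a start line, then header lines, then a blank line, then an optional body, and every line ends in CR LF. The sketch below uses two illustrative helpers, `build_get` and `parse_response`. It builds a request for the example page used earlier and takes apart a sample response. It is deliberately simplified (no chunked bodies, no repeated headers), and nothing is sent on the network.

```python
def build_get(host: str, path: str) -> bytes:
    """Build a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, resource, protocol version
        f"Host: {host}",         # required in HTTP/1.1 requests
        "Connection: close",     # ask the server to close the connection after replying
        "",                      # blank line: end of headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def parse_response(raw: bytes) -> tuple[int, dict[str, str], bytes]:
    """Split a raw response into status code, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status_code = int(status_line.split(" ", 2)[1])  # e.g. "HTTP/1.1 200 OK"
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return status_code, headers, body

print(build_get("www.lasalle.edu", "/it/fake.htm"))
sample = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 13\r\n\r\n<p>Hello</p>\n"
print(parse_response(sample))
```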
HTTP message
HTTP
• A Web server has an HTTP daemon that waits for
HTTP requests and handles them when they
arrive.
• A Web browser is an HTTP client, sending
requests to server machines.
• For example, entering a URL in the location field
of a browser (client) sends an HTTP request to the
appropriate Web server, which responds with the
page.
– Of course, if the URL contains a domain name, the
browser may first have to ask a DNS server for the
corresponding IP address.
HTTP
• HTTP is a stateless protocol. Each command is
executed without knowing anything about any
preceding commands.
– This is good for keeping transmission lines available,
since there are no ongoing sessions tying up resources.
– This is bad for having a web site respond in an
intelligent way to a user.
• This problem of HTTP is addressed in a number of
ways, including ActiveX, Java, JavaScript and
cookies.
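Cookies work around statelessness by having the client carry the state. The server sends a Set-Cookie header, and the browser returns that value in a Cookie header on later requests to the site. Below is a sketch of a hypothetical visit counter using Python's `http.cookies.SimpleCookie`. `handle_request` stands in for server code and is illustrative. Each call knows only what arrives with that one request.

```python
from http.cookies import SimpleCookie

def handle_request(cookie_header: str) -> tuple[str, str]:
    """Hypothetical server logic: count visits using only what the request carries."""
    jar = SimpleCookie(cookie_header)  # parse the Cookie header sent by the browser
    visits = int(jar["visits"].value) + 1 if "visits" in jar else 1
    reply = SimpleCookie()
    reply["visits"] = str(visits)
    reply["visits"]["path"] = "/"
    return f"Visit number {visits}", reply.output()  # page text, Set-Cookie header

page, set_cookie = handle_request("")   # first visit: no cookie yet
print(page, "|", set_cookie)            # Visit number 1 | Set-Cookie: visits=1; Path=/
page, set_cookie = handle_request("visits=1")  # browser sends the cookie back
print(page, "|", set_cookie)            # Visit number 2 | Set-Cookie: visits=2; Path=/
```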
HTTP 1.1
• Most modern browsers support HTTP 1.1
• Instead of opening and closing a connection for
each application request, HTTP 1.1 provides a
persistent connection that allows multiple
requests to be batched or pipelined to an output
buffer.
• The underlying TCP layer can put multiple
requests (and responses to requests) into one TCP
segment.
• Fewer segments, less overhead.
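Here is a sketch of a persistent connection on the local machine. Python's standard `http.server` plays the HTTP daemon and `http.client` plays the browser, and the page names are made up. The handler declares HTTP/1.1 and sends a Content-Length header so the client knows where each response ends. The server records the client's TCP port for every request, so three requests arriving from a single port shows they shared one TCP connection. This demonstrates persistence only. `http.client` waits for each response before it sends the next request, so it does not pipeline.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

client_ports = []  # the client's TCP source port, recorded for every request

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep the connection open after each response

    def do_GET(self):
        client_ports.append(self.client_address[1])
        body = f"<p>You asked for {self.path}</p>\n".encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))  # marks where this response ends
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
for path in ("/a.htm", "/b.htm", "/c.htm"):
    conn.request("GET", path)
    response = conn.getresponse()
    response.read()  # the response must be read before the connection is reused
conn.close()
server.shutdown()
server.server_close()

print(len(client_ports), "requests over", len(set(client_ports)), "TCP connection")
```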
HTTP 1.1 (Cont.)
• Compression: If a browser (client) indicates
that it can decompress HTML files, then a
server can compress them for transport across
the Internet.
• Standard image files are already in a
compressed format, so this improvement
applies only to HTML and other non-image
data types.
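In HTTP 1.1 the client advertises this ability with a header such as Accept-Encoding: gzip, and the server labels a compressed reply with Content-Encoding: gzip. The sketch below shows why the gain is limited to text. It compresses a repetitive HTML page and a block of random bytes with Python's `gzip` module. The random bytes are an assumed stand-in for an already-compressed image such as a JPEG.

```python
import gzip
import os

# A repetitive HTML page: the kind of text that compresses well.
page = ("<html><body>\n"
        + "<p>Hypertext links documents together.</p>\n" * 200
        + "</body></html>\n").encode("ascii")
# Random bytes stand in for an image file that is already compressed.
image = os.urandom(len(page))

for label, data in (("HTML page", page), ("image stand-in", image)):
    packed = gzip.compress(data)
    print(f"{label}: {len(data)} bytes -> {len(packed)} bytes gzipped")

assert gzip.decompress(gzip.compress(page)) == page  # lossless round trip
```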
S-HTTP
• Secure HTTP is an extension to the HTTP
protocol for sending data securely over the Web.
• Not all browsers and servers support S-HTTP.
• Another technology for secure communications
over the Web is Secure Sockets Layer (SSL).
• SSL and S-HTTP have different designs and goals.
SSL is designed to establish a secure connection
between two computers; S-HTTP is designed to
send individual messages securely.
Cache
• To increase speed, browsers cache web page
documents locally.
• There are also cache servers, machines on
the local network that cache web page
documents.
• First, the page is looked for on the local
machine, then on the local network (cache
server) and then at the remote location.
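That lookup order can be sketched as a function that checks the nearest copy first. All names here are hypothetical: `fetch`, the two dictionaries standing in for the local cache and the cache server, and `remote_server`. The `refresh` flag models the Refresh button by skipping both caches. Real caches also deal with expiry and revalidation through HTTP headers, which this sketch ignores.

```python
from typing import Callable

def fetch(url: str,
          browser_cache: dict[str, str],
          cache_server: dict[str, str],
          origin: Callable[[str], str],
          refresh: bool = False) -> tuple[str, str]:
    """Return (page, where it came from), checking the nearest copy first."""
    if not refresh:
        if url in browser_cache:
            return browser_cache[url], "local machine"
        if url in cache_server:
            browser_cache[url] = cache_server[url]  # keep a local copy for next time
            return cache_server[url], "cache server"
    page = origin(url)  # go all the way to the remote web server
    cache_server[url] = page
    browser_cache[url] = page
    return page, "remote server"

def remote_server(url: str) -> str:
    """Stand-in for downloading the page from the remote web server."""
    return f"<html>contents of {url}</html>"

browser_cache: dict[str, str] = {}
cache_server: dict[str, str] = {}
url = "http://www.lasalle.edu/"
print(fetch(url, browser_cache, cache_server, remote_server)[1])                # remote server
print(fetch(url, browser_cache, cache_server, remote_server)[1])                # local machine
print(fetch(url, {}, cache_server, remote_server)[1])                           # cache server (another machine on the network)
print(fetch(url, browser_cache, cache_server, remote_server, refresh=True)[1])  # remote server
```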
Refresh if you don’t want the
cached version
FTP
Based on Computer Networks and
Internets, Comer
FTP
• File Transfer Protocol is a set of rules for moving
files around on an internet or the Internet.
• One common use of FTP is to move web-page
files from the computer on which they were
created to the web server where they are
accessible to people on the web.
• Another common use is to download programs
and other files to one’s computer.
• One can also download files using HTTP but FTP
is faster.
Versions
• There is a command-line version of FTP.
– This is a fairly standard utility but the user must know a
set of commands to use it.
– A user can put a file into a directory at a remote
location or get a file from there.
• There is also a GUI version.
– This version is easier to operate (with its listboxes,
scrollbars and buttons).
– But it must be downloaded.
• One can also use a browser to get files using FTP
from sites.
Access and Capability
• Access to the FTP services typically requires
authenticating the user (username and password).
• In such cases, the user can typically delete,
rename, move files and so on, in addition to
copying them.
• Anonymous FTP does not authenticate a user but
allows the user to do less; typically one can only get
(download) files.
– It is used as a means to distribute files.
Anonymous FTP
• In anonymous FTP, one enters "anonymous"
for the username.
• The password may not matter; some servers
request an email address, and in old versions
the password may be “guest”.
• This is a way of giving the public access to
a server so that files can be downloaded.
Data and Control
• Local machine must have an FTP client.
• Remote machine must have an FTP server.
• Transferring a file using FTP actually
consists of two connections.
• An FTP daemon listens at TCP port 21.
– (UDP has its own set of ports.)
• Port 21 is for initiating a control connection.
Figure: the Data and Control connections
Data and Control
• The client’s initial control message includes the
port number at which the client expects to
receive data.
• The server, from its port 20, initiates a data
connection to that port on the client.
• The control connection indicates what files will be
transferred in which direction; the actual
transferring takes place on the data connection.
• There is one control connection during an FTP
session, but each data connection closes when its
transfer is complete; thus an FTP session may
have several data connections.
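In the FTP specification (RFC 959), the control message that carries this port number is the PORT command. It writes the client's IPv4 address and port as six decimal numbers, h1,h2,h3,h4,p1,p2, and the port equals p1 × 256 + p2. Here is a sketch of the encoding and of the server-side decoding. The function names are illustrative, and the address and port are made-up example values.

```python
def port_command(client_ip: str, data_port: int) -> str:
    """Encode the client's address and data port as an FTP PORT command."""
    high, low = divmod(data_port, 256)  # port = high * 256 + low
    return "PORT " + ",".join(client_ip.split(".") + [str(high), str(low)])

def parse_port_command(command: str) -> tuple[str, int]:
    """Recover (IP address, port) from a PORT command, as the server would."""
    numbers = [int(n) for n in command.split(" ", 1)[1].split(",")]
    ip = ".".join(str(n) for n in numbers[:4])
    return ip, numbers[4] * 256 + numbers[5]

cmd = port_command("192.168.1.5", 50000)
print(cmd)                      # PORT 192,168,1,5,195,80
print(parse_port_command(cmd))  # ('192.168.1.5', 50000)
```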
FTP Client and Server
Command-line FTP:
Start/Run/cmd
For older operating systems, use command instead of cmd.
Command-line FTP:
ftp <domain name>
Enter username and password;
the password is not echoed to the screen
Command-line FTP: ls
Shows contents
of current
directory (folder)
Command-line FTP:
cd <directory name>
Moves one into the
specified folder on the
remote machine
Command-line FTP:
wildcard
* is the wildcard; it stands
in for anything that might
follow. In this case we are
listing any files that begin
with f.
Command-line FTP:
wildcard
* is the wildcard; it stands
in for anything that might
precede. In this case we are
listing any files that end
with .jpg.
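The matching rule for * can be imitated locally with Python's `fnmatch` module. This is only a model. In a real session the pattern (for example ls f*) is sent to the FTP server, and how it is matched depends on that server. The file names in the listing are made up.

```python
from fnmatch import fnmatchcase

# A made-up remote directory listing.
listing = ["fake.htm", "figure1.jpg", "notes.txt", "photo.jpg", "ftp_notes.doc"]

begins_with_f = [name for name in listing if fnmatchcase(name, "f*")]    # * after the f
ends_with_jpg = [name for name in listing if fnmatchcase(name, "*.jpg")]  # * before .jpg
print(begins_with_f)  # ['fake.htm', 'figure1.jpg', 'ftp_notes.doc']
print(ends_with_jpg)  # ['figure1.jpg', 'photo.jpg']
```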
Command-line FTP:
get <filename>
Transfers a copy of a
remote file to the local
machine
Overwriting
• Most versions of FTP simply overwrite a
file of the same name when one uses the get
or put commands.
• Unlike many applications, the user will not
be given a warning that he or she is about to
overwrite a file.
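A script can add the warning that command-line FTP lacks. Here is a sketch using Python's standard `ftplib`, where `FTP.retrbinary` issues the RETR command that lies behind get. `safe_get` is an illustrative helper. Opening the local file with mode "xb" makes Python raise FileExistsError instead of silently replacing an existing file. The host in the usage comment is hypothetical.

```python
from ftplib import FTP

def safe_get(ftp: FTP, remote_name: str, local_path: str, overwrite: bool = False) -> None:
    """Like a binary-mode get, but refuses to replace an existing local file."""
    # "xb" creates the file and raises FileExistsError if it already exists.
    mode = "wb" if overwrite else "xb"
    with open(local_path, mode) as local_file:
        # RETR is the FTP command behind "get"; each received block is written out.
        ftp.retrbinary(f"RETR {remote_name}", local_file.write)

# Usage against a hypothetical server (not run here):
# with FTP("ftp.example.com") as ftp:
#     ftp.login()  # no arguments: log in as "anonymous"
#     safe_get(ftp, "readme.txt", "readme.txt")
```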
Command-line FTP:
put <filename>
Places a copy of a
local file onto a
remote computer
Command-line FTP: binary
Get and put assume files are in
ASCII; the binary command
switches the mode to binary for
transferring other types of files.
The first get appeared to work, but the PowerPoint file could
not be opened; the second get provided a usable ppt file.
Command-line FTP: ascii
Puts FTP back into
ASCII mode
Htm file transferred in ASCII mode
Htm file transferred in Binary mode
“Returns” in original document can be lost,
replaced with unprintable characters
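Both of these problems come from line endings. Unix text files end each line with LF, Windows files use CR LF, and ASCII mode translates between the two. (Under RFC 959, lines travel as CR LF on the wire and each side converts them to its own convention.) The sketch below is a simplified model of an ASCII-mode transfer from a Unix machine to a Windows machine, and the function name is illustrative. Binary mode is the identity: bytes arrive unchanged.

```python
def ascii_mode_unix_to_windows(data: bytes) -> bytes:
    """Model an ASCII-mode transfer from a Unix sender to a Windows receiver:
    every LF line ending becomes CR LF. Binary mode would copy bytes unchanged."""
    return data.replace(b"\n", b"\r\n")

text_file = b"<html>\n<body>Hello</body>\n</html>\n"
png_signature = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of every PNG image

print(ascii_mode_unix_to_windows(text_file))      # line endings now suit Windows
print(ascii_mode_unix_to_windows(png_signature))  # extra CR bytes: no longer a valid PNG
```

The model covers both slides. ASCII mode inserts a CR into a binary file wherever an LF byte happens to occur, which corrupts the file. Binary mode delivers a Unix text file with bare LF endings, which some Windows editors display without line breaks.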
FTP commands
A GUI version: ws_ftp le
ws_ftp le
LE: Limited Edition
Establishing a session
Startup
Local folder to start in
Remote folder you want to start in (you must have permission).
This doesn’t always work; the FTP server may not allow you to
specify a folder.
The well known port for FTP
control is 21
ws_ftp le
Local file directory
ws_ftp le
Remote file directory
ws_ftp le
Modes: ASCII or Binary
ws_ftp le
get
put
Can also rename files, delete files
and refresh the directory
An FTP site: FTP service using a browser
ftp (not http) as the protocol
Passive FTP
• Passive FTP is a more secure form of data
transfer in which the data connection is set up and
initiated by the File Transfer Protocol (FTP)
client rather than by the FTP server program.
• FTP client programs sometimes allow the user to
select passive FTP.
• Most Web browsers (which act as FTP clients) use
passive FTP by default.
Passive FTP
Passive FTP
• Recall that FTP consists of two connections. In
normal FTP the client initiates the control
connection, but the server establishes the
data connection.
• Some networks have firewalls that only
allow connections that were initiated from
within; this would rule out the data
connection of a normal FTP session.
“Normal” vs Passive FTP
• Normal: Client initiates control and gives a
port number to server which then initiates
data connection.
• Passive: Client initiates control and asks
server to return over the control connection
which port it intends to use (for data), then
the client initiates a data connection using
the port number supplied by the server.
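In RFC 959 terms, the client sends the PASV command. The server answers over the control connection with reply code 227 and six numbers, h1,h2,h3,h4,p1,p2, giving its address and the port p1 × 256 + p2. The client then opens the data connection to that address and port. The sketch below covers the client's parsing step. `parse_pasv_reply` is an illustrative name, and the address and port are made-up example values. (Python's own `ftplib` client uses passive mode by default.)

```python
import re

def parse_pasv_reply(reply: str) -> tuple[str, int]:
    """Extract (server IP, data port) from a 227 reply to the PASV command."""
    match = re.search(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)", reply)
    if not reply.startswith("227") or match is None:
        raise ValueError(f"not a PASV reply: {reply!r}")
    numbers = [int(n) for n in match.groups()]
    ip = ".".join(str(n) for n in numbers[:4])
    return ip, numbers[4] * 256 + numbers[5]  # port = p1 * 256 + p2

reply = "227 Entering Passive Mode (192,168,1,5,195,80)."
print(parse_pasv_reply(reply))  # ('192.168.1.5', 50000)
```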
TFTP
• Trivial File Transfer Protocol is a simple version
of FTP that uses the User Datagram Protocol
(UDP) instead of TCP.
– It is simpler, faster, requires less code.
– But is less capable and less secure.
• It is used where user authentication and directory
visibility are not required.
• It is often used by servers to boot diskless
workstations, X-terminals, and routers.
– Diskless workstations need operating systems too.
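TFTP's simplicity shows in its packet formats (RFC 1350). A client requests a file by sending a read request to UDP port 69. The server sends the file in numbered DATA blocks of up to 512 bytes, and the client acknowledges each block. A block shorter than 512 bytes ends the transfer. There is no login step anywhere in the exchange. The sketch below only builds and parses packets (nothing is sent), and `boot.img` is a made-up file name.

```python
import struct

# TFTP opcodes from RFC 1350.
RRQ, DATA, ACK = 1, 3, 4

def read_request(filename: str, mode: str = "octet") -> bytes:
    """RRQ packet: 2-byte opcode, filename, zero byte, transfer mode, zero byte."""
    return struct.pack("!H", RRQ) + filename.encode("ascii") + b"\0" + mode.encode("ascii") + b"\0"

def parse_data(packet: bytes) -> tuple[int, bytes]:
    """DATA packet: opcode 3, 2-byte block number, then up to 512 bytes of file data."""
    opcode, block = struct.unpack("!HH", packet[:4])
    if opcode != DATA:
        raise ValueError(f"expected DATA, got opcode {opcode}")
    return block, packet[4:]

def ack(block: int) -> bytes:
    """ACK packet: opcode 4 and the block number being acknowledged."""
    return struct.pack("!HH", ACK, block)

print(read_request("boot.img"))  # b'\x00\x01boot.img\x00octet\x00'
block, data = parse_data(b"\x00\x03\x00\x01" + b"first bytes of the file")
print(block, ack(block))         # 1 b'\x00\x04\x00\x01'
```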
Other References
• http://www.webopedia.com
• http://www.whatis.com
• http://www.uic.edu/depts/accc/network/ftp/vftp.html
• http://www.w3.org/TR/REC-html40/struct/global.html