
Web pages
Programming Language Design and
Implementation (4th Edition)
by T. Pratt and M. Zelkowitz
Prentice Hall, 2001
Section 12.2.1
ARPANET
The initial idea came from a Defense Advanced Research
Projects Agency (DARPA; then called ARPA) project in the
late 1960s for a national defense network.
The agency began a project to see whether several computers,
widely separated geographically, could be linked
together so that users at a terminal on one system
could access the resources on another computer.
Initial military concept: keep computers accessible even
if some communications lines were destroyed, by
building a network in which data communications traffic
could dynamically adapt to changing conditions.
Data communications (sending messages reliably from
one computer to another) was the major obstacle.
The initial ARPANET began in 1970 as a three-node
network linking BBN in Cambridge, Massachusetts, with
UCLA and SRI in California using 56 kilobit lines.
Sites were added until there were several hundred by the mid-1970s.
ARPANET communications
Communication between two computers was handled via
messages. A message was broken down into fixed-length
strings called packets, and the packets were sent
from computer to computer until the original message
was reassembled at the receiving node.
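The packet idea can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical 8-byte packet size and a bare (sequence number, payload) tuple as the packet format; real ARPANET packets also carried addressing and control headers that are not modeled here. The packets are shuffled to mimic out-of-order arrival and then reassembled by sequence number.

```python
import random

PACKET_SIZE = 8  # hypothetical fixed packet length in bytes (an assumption)


def split_into_packets(message: bytes, size: int = PACKET_SIZE) -> list[tuple[int, bytes]]:
    """Break a message into (sequence_number, payload) packets.

    Every payload is `size` bytes long except possibly the last one.
    """
    return [(seq, message[start:start + size])
            for seq, start in enumerate(range(0, len(message), size))]


def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the original message by ordering packets on their sequence number."""
    return b"".join(payload for _, payload in sorted(packets))


if __name__ == "__main__":
    original = b"LOGIN request from UCLA to SRI"
    packets = split_into_packets(original)
    random.shuffle(packets)  # packets may travel different routes and arrive in any order
    assert reassemble(packets) == original
    print(len(packets), "packets reassembled correctly")
```

The sequence number is what lets the receiving node restore the original order no matter which route each packet took.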
To ensure that messages destined for another computer
arrived reliably, a formal communication model, called a
protocol, was developed. For the ARPANET, this became the
Transmission Control Protocol/Internet Protocol (TCP/IP).
TCP/IP was a low-level communication mechanism that
simply ensured that a sequence of bytes destined
for a specific computer arrived there uncorrupted. It
was generally too complex for users to use directly
to access a computer.
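One way TCP/IP detects corrupted bytes is a 16-bit one's-complement checksum, the "Internet checksum" described in RFC 1071. The sketch below computes it over a byte string and lets a receiver check it. It illustrates the idea only: real TCP also covers its header and a pseudo-header, which are omitted here, and `arrived_intact` is a hypothetical helper name.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:                           # odd length: pad with a zero byte
        data = data + b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]   # add 16-bit big-endian words
    while total >> 16:                          # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF                      # one's complement of the folded sum


def arrived_intact(data: bytes, checksum: int) -> bool:
    """Receiver side: recompute the checksum and compare it with the transmitted one."""
    return internet_checksum(data) == checksum
```

For example, the sender transmits `internet_checksum(b"hello")` along with the data; if one byte is altered in transit (say `b"hellp"`), the receiver's recomputed value no longer matches.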
User protocols
Telnet is a protocol that makes the sending computer (the
computer the user is actually working on) behave like a
terminal connected to the distant computer. The user is
connected to a client computer, which acts like a terminal,
and the terminal program communicates using the telnet
protocol with a distant host computer, which provides the
server program.
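On the wire, Telnet (RFC 854) mixes the user's text with commands introduced by the byte 255, IAC ("interpret as command"). Client and server use DO, DONT, WILL and WONT to negotiate options such as echoing. The sketch below assumes the simplest possible client policy, declining every option, and separates terminal text from the replies the client must send back. Subnegotiation (SB ... SE) is not handled, and `process_telnet` is a hypothetical name.

```python
# Telnet command bytes from RFC 854
IAC, DONT, DO, WONT, WILL = 255, 254, 253, 252, 251


def process_telnet(stream: bytes) -> tuple[bytes, bytes]:
    """Split raw bytes from a Telnet server into (terminal text, replies to send).

    Policy assumed for this sketch: refuse every option, answering
    DO with WONT and WILL with DONT. DONT/WONT need no reply here.
    """
    text = bytearray()
    replies = bytearray()
    i = 0
    while i < len(stream):
        byte = stream[i]
        if byte != IAC:                  # ordinary character for the terminal
            text.append(byte)
            i += 1
            continue
        if i + 1 >= len(stream):
            break                        # truncated command; a real client would buffer it
        cmd = stream[i + 1]
        if cmd == IAC:                   # IAC IAC encodes a literal 255 data byte
            text.append(IAC)
            i += 2
        elif cmd in (DO, DONT, WILL, WONT):
            if i + 2 >= len(stream):
                break
            option = stream[i + 2]
            if cmd == DO:
                replies += bytes([IAC, WONT, option])
            elif cmd == WILL:
                replies += bytes([IAC, DONT, option])
            i += 3
        else:
            i += 2                       # other two-byte commands (e.g. NOP) are ignored
    return bytes(text), bytes(replies)
```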
SMTP is the Simple Mail Transfer Protocol. It provides
the basic e-mail (electronic mail) that has become so
ubiquitous today.
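An SMTP client sends a short sequence of text commands. The sketch below produces the client side of a basic exchange (HELO, MAIL FROM, RCPT TO, DATA, QUIT) and applies SMTP's "dot-stuffing" rule, since a line holding only "." marks the end of the message. It is a sketch under assumptions: the function names and e-mail addresses are hypothetical, and a real client would read the server's numeric replies (such as 250 and 354) between commands, which is not shown.

```python
def smtp_commands(client: str, sender: str, recipients: list[str], body: str) -> list[str]:
    """Return the client side of a basic SMTP exchange, one line per entry."""
    lines = [f"HELO {client}", f"MAIL FROM:<{sender}>"]
    lines += [f"RCPT TO:<{rcpt}>" for rcpt in recipients]
    lines.append("DATA")
    for line in body.splitlines():
        # Dot-stuffing: double a leading "." so it is not mistaken for end-of-data
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")     # a line containing only "." ends the message
    lines.append("QUIT")
    return lines


def to_wire(lines: list[str]) -> bytes:
    """SMTP lines are terminated by CRLF on the wire (ASCII text assumed)."""
    return "".join(line + "\r\n" for line in lines).encode("ascii")
```

For example, `smtp_commands("client.example.org", "alice@example.org", ["bob@example.org"], "Hi Bob")` yields the HELO, MAIL FROM, RCPT TO and DATA lines, then `Hi Bob`, `.`, and `QUIT`.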
FTP is File Transfer Protocol. One would invoke the FTP
client on a local machine, log on to the distant
server machine using the FTP protocol, and then
retrieve the desired documents from the distant
machine or send documents from the user's machine to
the distant machine.
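The FTP session described above maps directly onto Python's standard `ftplib` module. The sketch below assumes an anonymous-login server; `fetch_file` is a hypothetical helper, and the host, directory and file name are whatever the user supplies. Note that the caller must already know all three, which is exactly the burden of the FTP model.

```python
from ftplib import FTP


def fetch_file(host: str, directory: str, filename: str) -> bytes:
    """Download one file from `host` over FTP using an anonymous login."""
    chunks: list[bytes] = []
    with FTP(host) as ftp:            # connect to the distant server (control port 21)
        ftp.login()                   # no arguments: log in as "anonymous"
        ftp.cwd(directory)            # move to the directory holding the file
        ftp.retrbinary(f"RETR {filename}", chunks.append)  # binary transfer, chunk by chunk
    return b"".join(chunks)
```

A call such as `fetch_file("ftp.example.org", "/pub", "README")` (a hypothetical server) would return the file's bytes. Sending a file in the other direction uses `ftp.storbinary(f"STOR {name}", file_object)`.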
Weaknesses in FTP
One had to know explicitly which machine to access to
retrieve the desired data.
One also had to have access to the files of that
machine to retrieve the information. Anonymous
login partially solved that problem.
One had to know exactly where on the file system the
desired information was.
Despite these weaknesses, FTP was the standard file
transmission mechanism for many years until the web
changed all that.
Birth of the Internet
In the mid-1980s, ARPA decided to stop supporting the ARPANET.
As a research activity, the concept had been proved, and
ARPA was not in the business of providing what was becoming
a commercial service.
The U.S. National Science Foundation (NSF) took over the
backbone network in the United States (the set of high-speed
telephone lines that provided the basic TCP/IP
communications traffic between host computers) as a way to
link universities together.
The name of the network gradually evolved into the Internet.
Eventually, NSF support stopped.
Attached to this backbone, local networks (a state, a
university, a large company) were added until the Internet
became this amorphous collection of computers all
continually chattering to one another.
Commercial providers, now called Internet Service Providers
(ISPs), established links to the Internet so that individuals
on their home computers could use a modem to dial into
their local ISP to be on the Internet.
The World Wide Web
By the late 1980s, there was widespread interest in easier
ways to transfer files. FTP was a cumbersome process.
Systems such as Gopher, Archie, and Veronica were developed.
Physicists, principally Tim Berners-Lee at CERN in Geneva,
desired a mechanism to access and transfer documents by
computer that was simpler than the standard FTP server.
They developed the concept of a semantic description language.
A server program would provide a document, and a client
program, called a browser, would read, interpret, and display
that document. The power of their system was that the
displayed document contained pointers to other documents,
called hypertext.
An earlier version of hypertext was Apple Computer's HyperCard
product for the Macintosh, but the real power of the CERN
development was to allow hypertext links to documents that
existed on other computers connected to the Internet.
HTTP
The protocol developed was the HyperText Transfer
Protocol (HTTP). HTTP was an addition to the Telnet, FTP, and
SMTP protocols discussed earlier.
The release of the first Mosaic browser in 1993 led to
rapid growth of the Web.
Each pointer became known as a Uniform Resource Locator
(URL). Document location was reduced to:
- invoking a Web browser on your local machine,
- typing in a URL for the document you wanted to access,
- connecting to the Web server on the distant machine
  that held the document named by the typed-in URL,
- receiving and displaying the document according to the HTTP protocol.
The HTML language is based on SGML (to be explained later).
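The browser's part of the steps listed above can be sketched as follows: split the URL into host, port and path, then form the plain-text request that HTTP sends to the server. This is a minimal HTTP/1.0 GET under the assumption of a plain `http://` URL; `build_http_get` is a hypothetical name, and opening the network connection and parsing the server's response are omitted.

```python
from urllib.parse import urlsplit


def build_http_get(url: str) -> tuple[str, int, bytes]:
    """Turn a URL into (host, port, request bytes) for a simple HTTP/1.0 GET."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles http:// URLs")
    host = parts.hostname
    port = parts.port or 80                 # 80 is HTTP's well-known port
    path = parts.path or "/"                # an empty path means the server's root
    if parts.query:
        path += "?" + parts.query
    host_header = host if port == 80 else f"{host}:{port}"
    request = (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host_header}\r\n"
        "\r\n"                              # a blank line ends the request headers
    )
    return host, port, request.encode("ascii")
```

For `http://www.cs.umd.edu/users/mvz/pzbook`, the browser would connect to `www.cs.umd.edu` on port 80 and send `GET /users/mvz/pzbook HTTP/1.0` followed by a `Host` header and a blank line.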
Web navigation
Prentice Hall example of navigation
1. The user types in the URL for the home page. This URL consists
of a domain name (www.cs.umd.edu) and a file on that
machine (users/mvz/pzbook).
2. The Web browser sends the domain name to one of several
special Internet machines called Domain Name Servers (DNS).
The DNS returns the Internet Protocol (IP) address of the
machine holding the desired Web page.
3. The Web browser sends the file name to the Web server at IP
address 128.8.128.80. An HTTP daemon (HTTPD) program on this
machine is the main interface between the Web server and the
Internet.
4. The Web server appends the name index.html because the given
name referred to a directory, not a file.
5. The contents of the file are sent back to the Web browser and
displayed to the user.
6. If the user now clicks on the URL for Prentice-Hall that
appears on the Web page (www.prenticehall.com), the process is
repeated and the Prentice-Hall server at IP address
63.69.110.94 is accessed and the appropriate Web page is
displayed.
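Steps 1 through 5 can be simulated in memory. In this sketch the DNS and the two Web servers are replaced by hypothetical Python dictionaries: only the domain names, the file path and the IP addresses come from the example above, while the page contents are invented placeholders. The helper names `is_directory`, `serve` and `navigate` are likewise hypothetical.

```python
# Hypothetical in-memory stand-ins for DNS and two Web servers.
DNS_TABLE = {
    "www.cs.umd.edu": "128.8.128.80",
    "www.prenticehall.com": "63.69.110.94",
}
SERVER_FILES = {  # placeholder page contents (invented for illustration)
    "128.8.128.80": {"users/mvz/pzbook/index.html": "<html>pzbook home page</html>"},
    "63.69.110.94": {"index.html": "<html>Prentice Hall home page</html>"},
}


def is_directory(files: dict[str, str], path: str) -> bool:
    """Treat a path as a directory if some stored file lives underneath it."""
    prefix = path + "/"
    return any(name.startswith(prefix) for name in files)


def serve(ip: str, path: str) -> str:
    """Steps 3-5: the server at `ip` returns the contents of the requested file."""
    files = SERVER_FILES[ip]
    path = path.strip("/")
    if path not in files and (path == "" or is_directory(files, path)):
        # Step 4: a directory was named, so append index.html
        path = f"{path}/index.html" if path else "index.html"
    return files[path]


def navigate(url: str) -> str:
    domain, _, path = url.partition("/")   # Step 1: split domain name and file
    ip = DNS_TABLE[domain]                 # Step 2: DNS lookup (KeyError if unknown)
    return serve(ip, path)
```

Here `navigate("www.cs.umd.edu/users/mvz/pzbook")` returns the placeholder pzbook page, and following the Prentice Hall link corresponds to `navigate("www.prenticehall.com")`, which repeats the same lookup-and-fetch cycle against the second server.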
Portals
To make navigation easier, certain Web sites
are now known as portals (entrance sites to the WWW).
These sites have programs known as search engines. A
search engine is a query processor in which you enter
a question. The result of that query is a list of WWW
locations that answer the question.
Search engines often rely on Web crawlers. Beginning
at one location, a Web crawler follows all links on
that Web page to find other Web pages, and then repeats
the process on each page it finds.
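Crawling is essentially a graph traversal. The sketch below performs a breadth-first crawl over a small hypothetical link table instead of fetching real pages; the page names are invented, and a real crawler would download each page and extract its links before following them.

```python
from collections import deque

# Hypothetical link graph standing in for real pages: page -> links found on it.
LINKS = {
    "portal.example/": ["a.example/", "b.example/"],
    "a.example/": ["b.example/", "c.example/"],
    "b.example/": ["portal.example/"],
    "c.example/": [],
}


def crawl(start: str, links: dict[str, list[str]]) -> list[str]:
    """Breadth-first crawl: visit each reachable page once, in discovery order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        page = queue.popleft()
        order.append(page)                    # "index" the page
        for target in links.get(page, []):
            if target not in seen:            # the Web has cycles; never revisit a page
                seen.add(target)
                queue.append(target)
    return order
```

Starting from `"portal.example/"`, the crawl visits the portal, then `a.example/` and `b.example/`, then `c.example/`. The `seen` set matters because `b.example/` links back to the portal.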
SGML
SGML (Standard Generalized Markup Language) is the basis of HTML.
A document is an unstructured sequence of characters;
parts of the text can be SGML elements. The semantics
of elements are unspecified, but their syntax is
given.
Elements are bracketed by start-tag and end-tag
notation:
<zork> I am a zork </zork>
identifies “I am a zork” as the contents of the zork
element.
A report in SGML:
<report>
<title> text </title>
<author> text </author>
<abstract> text </abstract>
<body> text </body>
</report>
SGML handles semantic content, not presentation
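XML is a simplified subset of SGML, so Python's standard `xml.etree.ElementTree` parser can stand in for an SGML parser on the report above. That substitution is an assumption: it only works for documents that also follow XML's stricter rules, such as every element being explicitly closed. The parser recovers the element structure and rejects malformed tags, but it assigns no meaning to `<title>` or `<abstract>`. The sample contents and the name `element_contents` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample report with placeholder contents (hypothetical values).
REPORT = """<report>
<title>Sample title</title>
<author>A. Author</author>
<abstract>Short summary</abstract>
<body>Main text</body>
</report>"""


def element_contents(document: str) -> dict[str, str]:
    """Return {tag: text} for each element directly inside the root element.

    The parser enforces only syntax (well-formed, matching start- and end-tags);
    what each element means is left to the program that uses the result.
    """
    root = ET.fromstring(document)
    return {child.tag: (child.text or "").strip() for child in root}
```

Applied to `REPORT`, this yields a mapping from `title`, `author`, `abstract` and `body` to their text, and `ET.fromstring("<zork> I am a zork </zork>").text` is `" I am a zork "`, the contents of the zork element. A malformed tag such as `<title}` raises `ET.ParseError`.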
HTML
An instance of SGML with a defined syntax for Web
pages
<html>
<title> title of document </title>
<body>
text of document
</body>
</html>
Problem: SGML describes semantic content, not layout
(presentation).
How to handle things like:
<h1>Major heading</h1>
- What font and font size to use?
- Where on page to place heading?
Elements like <font size=...> move away from pure
semantic content
Links in HTML
HTML contains:
- Embedded text
- URLs: links to other Web pages, <a href="http://web address">
- Images: <img src=...>
- mailto: protocol (send e-mail)
- Executable pages (CGI scripts, to be discussed soon)
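Because link targets are ordinary tag attributes, a program can pull them out of a page with Python's standard `html.parser` module. This sketch assumes the common forms `<a href=...>` (including `mailto:` targets) and `<img src=...>`; other link-bearing tags are ignored, and the class and function names are hypothetical. A Web crawler like the one described earlier needs a step of this kind to find the links to follow.

```python
from html.parser import HTMLParser
from typing import Optional


class LinkCollector(HTMLParser):
    """Collect (kind, target) pairs from <a href=...> and <img src=...> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        # html.parser lowercases tag and attribute names, so <A HREF=...> also matches
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            href = attributes["href"]
            kind = "mailto" if href.lower().startswith("mailto:") else "link"
            self.links.append((kind, href))
        elif tag == "img" and attributes.get("src"):
            self.links.append(("image", attributes["src"]))


def collect_links(html: str) -> list[tuple[str, str]]:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

For a page containing a Prentice Hall link, an image and a mailto link, `collect_links` returns the three targets in document order, each tagged as `"link"`, `"image"` or `"mailto"`.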