PZ14B - Web pages

Programming Language Design and Implementation (4th Edition)
by T. Pratt and M. Zelkowitz
Prentice Hall, 2001
Section 12.2.1
PZ14B
Programming Language design and Implementation -4th Edition
Copyright©Prentice Hall, 2000
ARPANET
• The initial idea came from a Defense Advanced Research
Projects Agency (DARPA) project in the late 1960s for a
national defense network.
• DARPA began a project to see whether several computers,
widely separated geographically, could be linked
together so that users at a terminal on one system
could access the resources of another computer.
• Initial military concept: provide access to computers
even if some communications lines were destroyed, by
building a network in which data communications traffic
could dynamically adapt to changing conditions.
• Data communications (sending messages reliably from
one computer to another) was the major obstacle.
• The initial ARPANET began in 1970 as a three-node
network linking BBN in Cambridge, Massachusetts, with
UCLA and SRI in California using 56 kilobit lines.
• Sites were added until there were several hundred by the mid-1970s.
ARPANET communications
Communication between two computers was handled via
messages. A message was broken down into fixed-length
strings called packets, and the packets were sent
from computer to computer until the original message
was reassembled at the receiving node.
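The packet scheme just described can be sketched in a few lines of Python. This is a minimal illustration, not ARPANET code: the 8-byte packet size, the (sequence number, data) packet layout, and the function names `packetize` and `reassemble` are all assumptions made for the example.

```python
import random

# Assumed packet length, chosen only for illustration.
PACKET_SIZE = 8

def packetize(message: bytes, size: int = PACKET_SIZE) -> list[tuple[int, bytes]]:
    """Split a message into fixed-length packets tagged with sequence numbers.

    The final packet may be shorter than `size` when the message length
    is not an exact multiple of it.
    """
    return [(seq, message[start:start + size])
            for seq, start in enumerate(range(0, len(message), size))]

def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the original message at the receiving node.

    Packets may arrive in any order, so they are sorted by sequence number first.
    """
    return b"".join(data for _, data in sorted(packets, key=lambda p: p[0]))

if __name__ == "__main__":
    message = b"Packets may travel by different routes."
    packets = packetize(message)
    random.shuffle(packets)  # simulate out-of-order arrival over different paths
    assert reassemble(packets) == message
    print(f"{len(packets)} packets reassembled correctly")
```

The sequence numbers are what let the receiver restore the original order; a real network also has to cope with lost and duplicated packets, which this sketch ignores.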
To ensure that messages destined for another computer
arrived reliably, a formal communication model, called a
protocol, was developed. For the ARPANET, this
became the Transmission Control Protocol/
Internet Protocol (TCP/IP).
TCP/IP was a low-level communication mechanism that
simply determined that a sequence of bytes destined
for a specific computer arrived there uncorrupted. It
was generally too complex for users to use directly
for accessing a computer.
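One concrete way TCP/IP detects that bytes arrived corrupted is a checksum: IP headers and TCP segments carry a 16-bit ones'-complement checksum of the kind described in RFC 1071, sketched below. The real TCP checksum also covers a pseudo-header containing the IP addresses, which this sketch omits, and the function names are illustrative only.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a trailing zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # add the next 16-bit big-endian word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry keeps 16 bits
    return ~total & 0xFFFF  # ones' complement of the sum

def arrived_intact(data: bytes, received_checksum: int) -> bool:
    """Receiver side: recompute the checksum and compare it with the one sent."""
    return internet_checksum(data) == received_checksum

if __name__ == "__main__":
    # The numerical example from RFC 1071.
    sample = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
    checksum = internet_checksum(sample)
    print(hex(checksum))  # 0x220d
    corrupted = b"\x01" + sample[1:]  # one damaged byte in transit
    print(arrived_intact(sample, checksum), arrived_intact(corrupted, checksum))
```

A checksum like this catches many, but not all, transmission errors; TCP combines it with sequence numbers and retransmission to deliver bytes reliably.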
User protocols
Telnet is a protocol that makes the sending computer (the
computer the user is actually working on) behave like a
terminal connected to the distant computer. The user is
connected to a client computer, which acts like a
terminal, and the terminal program communicates using
the Telnet protocol with a distant host computer, which
runs the server program.
SMTP is the Simple Mail Transfer Protocol. This provides
the basic e-mail (electronic mail) that has become so
ubiquitous today.
FTP is File Transfer Protocol. One would invoke the FTP
client on a local machine, log onto the distant
server machine using the FTP protocol, and then
retrieve the desired documents from the distant
machine or send documents from the user's machine to
the distant machine.
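The FTP session described above can be sketched as the sequence of commands an FTP client sends to the server. The command verbs (USER, PASS, CWD, TYPE, PASV, RETR, QUIT) are standard FTP commands from RFC 959; the helper names, the example directory and file, and the anonymous password are hypothetical, and the server's numeric replies and the separate data connection are not shown.

```python
def ftp_retrieve_commands(directory: str, filename: str,
                          user: str = "anonymous",
                          password: str = "guest@example.com") -> list[str]:
    """Commands, in order, for logging onto an FTP server and fetching one file.

    Hypothetical helper: the user must already know the host, the directory,
    and the exact file name.
    """
    return [
        f"USER {user}",       # log onto the distant machine
        f"PASS {password}",   # anonymous login conventionally uses an e-mail address
        f"CWD {directory}",   # move to where the document is stored
        "TYPE I",             # binary ("image") transfer mode
        "PASV",               # ask the server to open a data connection
        f"RETR {filename}",   # retrieve the document over that data connection
        "QUIT",
    ]

def to_wire(commands: list[str]) -> bytes:
    """FTP control commands are lines of text terminated by CRLF."""
    return "".join(cmd + "\r\n" for cmd in commands).encode("ascii")

if __name__ == "__main__":
    for line in ftp_retrieve_commands("pub/docs", "report.txt"):
        print(line)
```

Sending a file instead would replace RETR with STOR. Note that every argument has to be known in advance, which is exactly the weakness discussed next.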
Weaknesses in FTP
One had to know explicitly which machine to access to
retrieve the desired data.
One also had to have access to the files of that
machine to retrieve the information. The anonymous
login partially solved that.
One had to know exactly where on the file system the
desired information was.
Despite these weaknesses, FTP was the standard file
transmission mechanism for many years until the web
changed all that.
Birth of the Internet
In the mid-1980s, ARPA decided to stop supporting the
ARPANET. As a research activity, the concept had
been proved, and ARPA was not in the business of
providing what was becoming a commercial service.
The U.S. National Science Foundation (NSF) took over
the backbone network in the United States (the set of
high-speed telephone lines that carried the basic
TCP/IP communications traffic between host computers)
as a way to link universities together.
The name of the network gradually evolved into the
Internet. NSF support eventually stopped.
Attached to this backbone, local networks (a state, a
university, a large company) were added until the
Internet became this amorphous collection of
computers all continually chattering to one another.
Commercial providers, now called Internet Service
Providers (ISPs), established links to the Internet so
that individuals on their home computers could use a
modem to dial into their local ISP to be on the
Internet.
The World Wide Web
By the late 1980s, there was widespread interest in
transferring files easily. FTP was a cumbersome process, and
systems such as Gopher, Archie, and Veronica were developed.
Physicists (principally Tim Berners-Lee at CERN in
Geneva) desired a mechanism to access and transfer
documents by computer that was simpler than the
standard FTP server.
They developed the concept of a semantic description
language. One server program would display a
document, and a client program, called a browser,
would read and understand the displayed document. The
power of their system was that the displayed document
contained pointers to other documents called
hypertext.
An earlier version of hypertext was Apple Computer's
HyperCard product for the Macintosh, but the real
power of the CERN development was to allow hypertext
links to documents that existed on other computers
connected to the Internet.
HTTP
The protocol developed was the HyperText Transfer
Protocol (HTTP). HTTP was an addition to the Telnet, FTP,
and SMTP protocols discussed earlier.
The release of the Mosaic browser in 1993 led to
rapid growth of the Web.
Each pointer became known as a Uniform Resource Locator
(URL). Document location was reduced to:
• invoking a Web browser on your local machine,
• typing in a URL for the document you wanted to
access,
• connecting to a Web server on the distant machine
that contained the document named by the typed-in URL,
• displaying the document, which was transferred using
the HTTP protocol.
The HTML language is based upon SGML (to be explained later).
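The browser steps listed above can be sketched by turning a typed-in URL into the request a browser sends to the Web server. This sketch uses Python's standard `urllib.parse.urlsplit`; the function name is hypothetical, the URL is assumed to include the `http://` prefix, and it only builds a minimal HTTP/1.0 GET request without opening a network connection.

```python
from urllib.parse import urlsplit

def build_get_request(url: str) -> tuple[str, int, bytes]:
    """Return (host, port, raw request bytes) for fetching `url` with HTTP GET."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    port = parts.port or 80          # HTTP's well-known default port
    path = parts.path or "/"         # an empty path means the server's root
    if parts.query:
        path += "?" + parts.query
    request = (
        f"GET {path} HTTP/1.0\r\n"   # request line: method, document, version
        f"Host: {host}\r\n"          # tells the server which site is wanted
        "\r\n"                       # blank line ends the request headers
    )
    return host, port, request.encode("ascii")

if __name__ == "__main__":
    host, port, request = build_get_request("http://www.cs.umd.edu/users/mvz/pzbook")
    print(host, port)
    print(request.decode("ascii"))
```

A browser would then open a TCP connection to `host` on `port`, send `request`, and display the document contained in the server's response.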
Web navigation
Prentice Hall example of navigation
1. The user types in the URL for the home page. This URL
consists of a domain name (www.cs.umd.edu) and a file
on that machine (users/mvz/pzbook).
2. The Web browser sends the domain name to one of several
special Internet machines called Domain Name Servers
(DNS). The DNS returns the Internet Protocol (IP) address
of the machine holding the desired web page.
3. The Web browser sends the file name to the Web server
at IP address 128.8.128.80. An HTTP daemon (HTTPD)
program on this machine is the main interface between a
Web server and the Internet.
4. The Web server appends the name index.html because the
given name was a directory and not a file.
5. The contents of the file are sent back to the Web
browser and displayed to the user.
6. If the user now clicks on the URL for Prentice-Hall
that appears on the Web page (www.prenticehall.com), the
process is repeated and the Prentice-Hall server at IP
address 63.69.110.94 is accessed and the appropriate Web
page is displayed.
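Steps 1 to 5 can be simulated in a few lines. Everything here is a hypothetical stand-in: one dictionary plays the Domain Name Server, others play the Web server's file system, and the page contents are an invented placeholder. Only the domain name, the path users/mvz/pzbook, and the IP address 128.8.128.80 come from the example above.

```python
# Hypothetical DNS: maps domain names to IP addresses (values from the example above).
DNS = {"www.cs.umd.edu": "128.8.128.80"}

# Hypothetical Web server contents, keyed by the server's IP address.
DIRECTORIES = {"128.8.128.80": {"users/mvz/pzbook"}}
FILES = {"128.8.128.80": {"users/mvz/pzbook/index.html": "<html>PZ book home page</html>"}}

def resolve(domain: str) -> str:
    """Step 2: ask the DNS for the IP address of the machine holding the page."""
    return DNS[domain]

def serve(ip: str, path: str) -> str:
    """Steps 3-4: the server looks up the file, adding index.html for a directory."""
    path = path.strip("/")
    if path in DIRECTORIES[ip]:
        path += "/index.html"
    return FILES[ip][path]

def navigate(url: str) -> str:
    """Steps 1-5: split the URL, resolve the domain, then fetch the contents."""
    domain, _, path = url.partition("/")  # step 1: domain name plus file
    ip = resolve(domain)
    return serve(ip, path)                # step 5: contents go back to the browser

if __name__ == "__main__":
    print(navigate("www.cs.umd.edu/users/mvz/pzbook"))
```

Clicking the Prentice Hall link in step 6 would repeat the same calls with a different domain. A real browser asks the system resolver (in Python, `socket.gethostbyname`) instead of consulting a table.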
Portals
To make navigation easier, certain Web sites
are now known as portals: entrance sites to the WWW.
These sites have programs known as search engines. A
search engine is a query processor in which you enter
a question. The result of that query is a list of WWW
locations that answer the question.
Search engines often operate as Web crawlers. Beginning
at one location, the Web crawler follows all links on
that Web page to find other Web pages.
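A Web crawler's traversal can be sketched as a breadth-first search over pages. To keep the example self-contained, the Web is replaced by a hypothetical dictionary mapping each page to the links found on it; a real crawler would fetch each page over HTTP and extract its links instead. Remembering which pages have already been seen matters because links frequently form cycles.

```python
from collections import deque

def crawl(start: str, links: dict[str, list[str]]) -> list[str]:
    """Breadth-first crawl: return every page reachable from `start`, each visited once."""
    seen = {start}
    order: list[str] = []
    queue = deque([start])
    while queue:
        page = queue.popleft()
        order.append(page)                  # "visit" the page (a real crawler would index it)
        for target in links.get(page, []):  # follow all links on this page
            if target not in seen:          # skip pages already found, so cycles terminate
                seen.add(target)
                queue.append(target)
    return order

if __name__ == "__main__":
    # Hypothetical link structure standing in for real Web pages.
    web = {
        "www.cs.umd.edu/users/mvz/pzbook": ["www.prenticehall.com", "www.cs.umd.edu"],
        "www.prenticehall.com": ["www.cs.umd.edu/users/mvz/pzbook"],
        "www.cs.umd.edu": [],
    }
    print(crawl("www.cs.umd.edu/users/mvz/pzbook", web))
```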
SGML
• SGML (Standard Generalized Markup Language) is the basis of HTML
• A document is an unstructured sequence of characters
• Within the text there can be SGML elements. The semantics
of elements are unspecified, but their syntax is
given.
• Elements are bracketed by start-tag and end-tag
notation:
<zork> I am a zork </zork>
identifies “I am a zork” as the contents of the zork
element.
• A report in SGML:
<report>
<title> text </title>
<author> text </author>
<abstract> text </abstract>
<body> text </body>
</report>
• SGML handles semantic content, not presentation
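XML is a restricted subset of SGML, and the report above happens to be well-formed XML, so Python's standard `xml.etree.ElementTree` can parse it. The sketch illustrates the point that syntax is given while semantics are not: the parser recovers the element structure from the tags alone and attaches no meaning to `title` or `author`. It is not a general SGML parser (features such as omitted end-tags would be rejected), and the function name is illustrative.

```python
import xml.etree.ElementTree as ET

# The report from the slide, with "text" standing in for real contents.
REPORT = """<report>
<title> text </title>
<author> text </author>
<abstract> text </abstract>
<body> text </body>
</report>"""

def elements(document: str) -> list[tuple[str, str]]:
    """Return (element name, trimmed contents) for every element, in document order.

    The parser checks start-tag/end-tag syntax but knows nothing about what
    the element names mean.
    """
    root = ET.fromstring(document)
    return [(el.tag, (el.text or "").strip()) for el in root.iter()]

if __name__ == "__main__":
    print(elements("<zork> I am a zork </zork>"))
    for name, contents in elements(REPORT):
        print(name, repr(contents))
```

A document with mismatched tags, such as `<zork> ... </zonk>`, makes `ET.fromstring` raise `ET.ParseError`, because the syntax rules are violated even though no meaning is involved.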
HTML
• An instance of SGML with a defined syntax for Web
pages
<html>
<title> title of document </title>
<body>
text of document
</body>
</html>
• Problem: SGML describes semantic content, not layout
(presentation).
• How to handle things like:
<h1>Major heading</h1>
- What font and font size to use?
- Where on page to place heading?
• Elements like <font size=...> move away from pure
semantic content
Links in HTML
HTML contains:
• Embedded text
• URLs: links to other Web pages, <A HREF="http://web address">
• Images: <IMG SRC=...>
• MAILTO: protocol (send e-mail)
• Executable pages (CGI scripts, to be discussed soon)
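Extracting these kinds of links is the first job of any program that processes Web pages, such as the Web crawlers described earlier. This sketch uses Python's standard `html.parser.HTMLParser`, which reports tag and attribute names in lower case; the class name, the function name, and the sample page are hypothetical.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect hyperlink targets (<A HREF=...>) and image sources (<IMG SRC=...>)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []
        self.images: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; attrs is a list of (name, value).
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])   # includes mailto: links
        elif tag == "img" and attributes.get("src"):
            self.images.append(attributes["src"])

def collect(page: str) -> tuple[list[str], list[str]]:
    """Parse an HTML page and return (link targets, image sources)."""
    parser = LinkCollector()
    parser.feed(page)
    parser.close()
    return parser.links, parser.images

if __name__ == "__main__":
    page = ('<HTML><BODY><A HREF="http://www.prenticehall.com">Prentice Hall</A>'
            '<IMG SRC="cover.gif"><A HREF="mailto:someone@example.com">Mail</A>'
            '</BODY></HTML>')
    links, images = collect(page)
    print(links)
    print(images)
```

A MAILTO link is simply an `A HREF` whose URL begins with `mailto:`, so it is collected alongside ordinary Web links; a program could tell them apart by the URL's scheme.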