Transcript: Web System

The World Wide Web
Web Growth
Web Server Statistics
• Apache is the most popular web server today (freely available)
• Microsoft IIS is gaining ground
The World Wide Web: A Brief History
Vannevar Bush 1945
MEMory EXtender system
• Problem:
– Bush was concerned about “new knowledge not reaching the people who would benefit from it”
MEMory EXtender system
• Store publications, correspondence, and personal work on microfilm
• Items retrieved rapidly using index codes
• Can annotate text with margin notes, comments
• Can construct a trail through the material and save it
• Acts as an external memory
MEMory EXtender system: limitations
• Basic unit of content is an image page
– No links to/from sub-text
• No digital content
– No keyword search, only TOC/index codes
• No networking
– No rapid info sharing
Results
• The MEMory EXtender system was the inspiration for the creators of hypertext and the web
Marshall McLuhan 1964 – Hypertext Coined
McLuhan is known for coining the expressions “the medium is the message” and “the global village”, and for predicting the World Wide Web almost thirty years before it was invented. (The term “hypertext” itself was coined by Ted Nelson in the same period.)
Sir Timothy John "Tim" Berners-Lee is an English
computer scientist known as the inventor of the
World Wide Web. He made a proposal for an
information management system in March 1989,
and on 25 December 1990, with the help of
Robert Cailliau and a young student at CERN, he
implemented the first successful communication
between a Hypertext Transfer Protocol (HTTP)
client and server via the Internet. By the
mid-1990s, the World Wide Web had replaced Mark
P. McCahill's Gopher protocol as the dominant
Internet protocol.
Berners-Lee is the director of the World Wide
Web Consortium (W3C), which oversees the
Web's continued development. He is also the
founder of the World Wide Web Foundation, and
is a senior researcher and holder of the Founders
Chair at the MIT Computer Science and Artificial
Intelligence Laboratory (CSAIL). He is a director
of the Web Science Research Initiative
(WSRI), and a member of the advisory board of
the MIT Center for Collective Intelligence.
Tim Berners-Lee invents the global hypertext WWW (1991)
The world's first browser and Web page.
Then…
What has happened through the years?
• Appearance of JavaScript (JS) in late 1995.
• W3C standardization of HTML (HTML 3.2) in January 1997.
• Introduction of Cascading Style Sheets (CSS): CSS level 1 in December 1996, CSS level 2 in 1998.
• The Document Object Model (DOM) was standardized at the end of 1998 and began to work reliably in browsers around 2001.
• AJAX (widely used by Google in 2004).
Web Architecture
The World Wide Web: HTTP Protocol
HTTP vs HTML
• HTML: hypertext markup language
– Definitions of tags that are added to Web documents
to control their appearance
• HTTP: hypertext transfer protocol
– The rules governing the conversation between a Web
client and a Web server
Both were invented by the same person, Tim Berners-Lee, at about the same time
What is a protocol?
• In diplomatic circles, a protocol is the set of rules
governing a conversation between people
• We have seen that the client and server carry on
a machine-to-machine conversation
• A network protocol is the set of rules governing a
conversation between a client and a server
• There are many protocols; HTTP is just one
An HTTP conversation
• Client: I would like to open a connection
• Server: OK
• Client: GET <file location>
• Server: Send page or error message
• Client: Display response
• Client: Close connection
• Server: OK
HTTP is the set of rules governing the format and content
of the conversation between a Web client and server
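The conversation above can be sketched in Python with a raw TCP socket. This is a minimal illustration, not how browsers are written: the toy server (`HelloHandler`), the helper names `converse` and `demo`, and the throwaway server on 127.0.0.1 are all assumptions made so the example runs on one machine. The client sends `GET <file location> HTTP/1.0` followed by the blank line that ends an HTTP/1.0 request.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical toy server side: answers every GET with a tiny HTML page."""

    def do_GET(self):
        body = f"<html><body>Hello from {self.path}</body></html>".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def converse(host: str, port: int, path: str) -> str:
    """Client side of the conversation, one step per line of the slide."""
    # "I would like to open a connection" / "OK": the TCP connect succeeds.
    with socket.create_connection((host, port)) as sock:
        # "GET <file location>"
        sock.sendall(f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii"))
        # "Send page or error message": an HTTP/1.0 server closes the
        # connection after replying, so read until the end of the stream.
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    # "Close connection": leaving the with-block closes our socket.
    return b"".join(chunks).decode("latin-1")


def demo(path: str = "/fac/lpress/shortbio.htm") -> str:
    """Start a throwaway local server, hold one conversation, shut down."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return converse("127.0.0.1", server.server_address[1], path)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())  # "Display response"
```

Running it prints the server's status line, headers, and page, which is the text a browser would go on to display.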
An HTTP example
The message requesting a Web page must begin with the
word “GET”, followed by a space and the location of
a file on the server, like this:
GET /fac/lpress/shortbio.htm
The protocol spells out the exact message format, so
any Web client can retrieve pages from any Web server.
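Because the format is fixed, the client and server sides can be written independently and still agree. A minimal sketch of the rule as stated here, assuming the version-less request form shown above (the original HTTP/0.9 style; later versions append a version such as `HTTP/1.0`). `build_request_line` and `parse_request_line` are hypothetical helper names, and only GET is accepted because that is the only method described so far.

```python
from typing import Optional, Tuple


def build_request_line(path: str) -> str:
    """Client: the word GET, one space, then the file's location on the server."""
    if not path.startswith("/") or " " in path:
        raise ValueError(f"bad file location: {path!r}")
    return f"GET {path}"


def parse_request_line(line: str) -> Tuple[str, str, Optional[str]]:
    """Server: return (method, path, version); version is None for the bare form."""
    parts = line.split(" ")
    if parts[0] != "GET" or len(parts) not in (2, 3):
        raise ValueError(f"malformed request line: {line!r}")
    version = parts[2] if len(parts) == 3 else None
    return parts[0], parts[1], version


if __name__ == "__main__":
    line = build_request_line("/fac/lpress/shortbio.htm")
    print(line, parse_request_line(line))
```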
Network protocols
• The details are only important to developers.
• The rules are defined by the inventor of the
protocol – may be a group or a single person.
• The rules must be precise and complete so
programmers can write programs that work with
other programs.
• The rules are often published as an RFC along
with running client and server programs.
• The HTTP protocol used for Web applications
was invented by Tim Berners-Lee.
RFC = request for comments
Tim Berners-Lee
Tim Berners-Lee was knighted by Queen Elizabeth for his invention of the
World Wide Web. He is shown here, along with the first picture posted on the
Web and a screen shot from an early version of his Web browser.
HTTP is an application layer protocol
• The Web client and the Web server are application
programs
• Application layer programs do useful work like retrieving
Web pages, sending and receiving email or transferring files
• Lower layers take care of the communication details
• The client and server send messages and data without
knowing anything about the communication network
The application layer is boss – the top layer
Layer | Function
Application | Do useful work like Web browsing, email, and file transfer
Lower layers | Handle communication between the client and server
• Your boss says: Send this package to Miami -- I don't care if you
use Federal Express, UPS, or any other means. Also, let me know
when it arrives or if it cannot be delivered for some reason.
• The application program says: Send this request to the server -- I
don't care how you do it or whether it goes over phone lines, radio,
or anything else about the details. Just send the message, and let
me know when it arrives or if it cannot be delivered for some reason.
There are five TCP/IP layers: the application layer and four lower layers.
Many application layer protocols are used on the Internet; HTTP is only one
Protocol | Application
HTTP: Hypertext Transfer | Retrieve and view Web pages
FTP: File Transfer | Copy files from client to server or from server to client
SMTP: Simple Mail Transfer | Send email
POP: Post Office | Read email
The TCP/IP protocol layers
The application program is king – it gets work done using the lower
level layers for communication between the client and server.
Layer | Function
Application | Get useful work done: retrieve Web pages, copy files, send and receive email, etc.
Transport | Make client-server connections and optionally control transmission speed, check for errors, etc.
Internet | Route packets between networks
Data link | Route data packets within the local area network
Physical | Specify what medium connects two nodes, how binary ones and zeros are differentiated, etc.
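A toy sketch of how the layers cooperate, assuming made-up text headers in place of the real binary TCP, IP and link-layer headers: on the way down, each lower layer wraps whatever it receives from the layer above in its own header; on the receiving side, each layer strips its header before passing the rest up, so the application never sees the communication details. `send_down`, `pass_up` and the header strings are hypothetical.

```python
# Toy model only: the header strings are made up and human-readable;
# real TCP, IP and link-layer headers are binary structures.
LOWER_LAYERS = ["Transport", "Internet", "Data link"]  # top to bottom


def send_down(message: str) -> str:
    """Sender: each lower layer wraps what it receives from the layer above."""
    unit = message
    for layer in LOWER_LAYERS:
        unit = f"[{layer} header]" + unit
    return unit  # the physical layer would now transmit this as bits


def pass_up(unit: str) -> str:
    """Receiver: each layer strips its own header, bottom to top."""
    for layer in reversed(LOWER_LAYERS):
        header = f"[{layer} header]"
        if not unit.startswith(header):
            raise ValueError(f"missing {layer} header")
        unit = unit[len(header):]
    return unit  # what the receiving application sees


if __name__ == "__main__":
    frame = send_down("GET /index.html")
    print(frame)
    print(pass_up(frame))
```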
HTTP Operations
• GET: retrieves the resource named by a URL (most widely used)
• HEAD: retrieves only the response headers
• POST: posts data to the server
• PUT: puts a page on the server
• DELETE: deletes a page from the server
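The difference between GET and HEAD can be observed with Python's standard `http.client` and `http.server` modules. This sketch assumes a throwaway local server whose hypothetical handler `PageHandler` serves one fixed page; `fetch` is also a hypothetical name. Because the handler defines no `do_PUT` or `do_DELETE`, `http.server` answers those methods with 501 Not Implemented.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>Hello</body></html>"


class PageHandler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(PAGE)  # GET: headers followed by the page

    def do_HEAD(self):
        self._send_headers()    # HEAD: the same headers, no body

    def log_message(self, format, *args):
        pass


def fetch(method: str, path: str = "/"):
    """Send one request; return (status, Content-Length header, body bytes)."""
    server = HTTPServer(("127.0.0.1", 0), PageHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        conn.request(method, path)
        resp = conn.getresponse()
        result = (resp.status, resp.getheader("Content-Length"), resp.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for method in ("GET", "HEAD", "DELETE"):
        print(method, fetch(method))
```

Both GET and HEAD report the same `Content-Length`, but only GET returns the page itself.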
Simple HTTP Request and Reply
Request:
GET http://www.server.com/page.html HTTP/1.0
Response:
HTTP/1.0 200 OK
Content-Length: 3012
Content-Type: text/html
<body>
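A minimal parser for a reply shaped like the one above, as a sketch: it assumes CRLF line endings and a status line that includes a reason phrase, and it does not check that the body length matches `Content-Length` (the slide's body is truncated to `<body>`). `parse_response` is a hypothetical name.

```python
def parse_response(raw: str):
    """Split a raw HTTP response into (version, status, reason, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")          # a blank line ends the headers
    status_line, *header_lines = head.split("\r\n")
    version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return version, int(status), reason, headers, body


raw = ("HTTP/1.0 200 OK\r\n"
       "Content-Length: 3012\r\n"
       "Content-Type: text/html\r\n"
       "\r\n"
       "<body>")

if __name__ == "__main__":
    print(parse_response(raw))
```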
HTTP 1.0
• Client opens a separate TCP connection for
each requested object
• Object is served and connection is closed
• Advantages
– maximum concurrency
• Limitations
– TCP connection setup/tear-down overhead
– TCP slow start overhead
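The connection overhead can be put into numbers with a deliberately simplified latency model counted in round-trip times (RTTs). Assumptions: a TCP handshake costs 1 RTT, each request/response costs 1 RTT, and transmission time, slow start, and the time needed to receive the page before its images can be requested are all ignored. The `parallel` argument models the "maximum concurrency" advantage of opening several connections at once; the persistent case corresponds to reusing one connection, as HTTP 1.1 does. Both function names are hypothetical.

```python
import math


def nonpersistent_rtts(objects: int, parallel: int = 1) -> int:
    """HTTP 1.0 style: each object costs a TCP handshake (1 RTT) plus its
    request/response (1 RTT); `parallel` connections run side by side."""
    rounds = math.ceil(objects / parallel)
    return 2 * rounds


def persistent_rtts(objects: int) -> int:
    """One handshake, then one request/response RTT per object, in sequence."""
    return 1 + objects


if __name__ == "__main__":
    n = 11  # one HTML page plus 10 embedded images
    print(nonpersistent_rtts(n), nonpersistent_rtts(n, parallel=4), persistent_rtts(n))
```

For a page with 10 embedded images (11 objects), the model gives 22 RTTs when connections are opened one at a time, 6 RTTs with 4 parallel connections, and 12 RTTs over a single persistent connection.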
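HTTP 1.0
[Figure: Client/Server timeline. Each object uses its own connection: connect(), write(), the server retrieves the data from disk, close(); then a new connect(), write(), the server retrieves the image from disk, close().]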
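HTTP 1.1
[Figure: Client/Server timeline with one connect(), then a write() for the page (the server retrieves the data from disk) and a write() for the image (the server retrieves the image from disk) over the same connection, followed by close().]
Server Side Close()
[Figure: after connect() the server sets a timeout and resets it on each write(); the page and the image are served over the same connection, and when the timeout expires the server calls close().]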
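Connection reuse can be observed by counting how many TCP connections a local server accepts. A sketch assuming Python's `http.server`, which creates one handler object per connection and keeps connections open when `protocol_version = "HTTP/1.1"`; the handler's `timeout` attribute plays the role of the server-side timeout in the second figure. `CountingHandler` and `fetch_all` are hypothetical names, and the non-persistent case imitates an HTTP 1.0 client by opening a new connection for every object.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep the connection open between requests
    timeout = 5                    # close a connection left idle for 5 seconds
    connections = 0                # http.server makes one handler per connection

    def setup(self):
        super().setup()
        CountingHandler.connections += 1

    def do_GET(self):
        body = f"object {self.path}".encode()
        self.send_response(200)
        # Content-Length tells the client where this reply ends, so the
        # same connection can carry the next request.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass


def fetch_all(paths, persistent: bool) -> int:
    """Fetch every path and return how many TCP connections the server saw."""
    CountingHandler.connections = 0
    server = HTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        conn = http.client.HTTPConnection(host, port)  # connects lazily on first request
        for path in paths:
            if not persistent:  # HTTP 1.0 style: fresh connection for each object
                conn.close()
                conn = http.client.HTTPConnection(host, port)
            conn.request("GET", path)
            conn.getresponse().read()  # drain the reply before reusing the connection
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections


if __name__ == "__main__":
    objects = ["/page.html", "/img1.gif", "/img2.gif"]
    print(fetch_all(objects, persistent=False), fetch_all(objects, persistent=True))
```

Fetching a page and two images should report 3 connections without reuse and 1 with a persistent connection.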
Dynamic Content
• Web pages can be created as requests arrive
• Advantages
– Personalization (e.g., my.yahoo.com)
– interaction with client input
– interaction with back-end applications
• Disadvantages
– Performance penalty
• Generating dynamic content (CGI, ASP, PHP,
ColdFusion, JavaScript, Flash, …)
CGI Scripts
• A CGI script is named by a URL, typically one with a .cgi extension
• The script is a program (e.g., C, Java, …)
• When the URL is requested, the server invokes the named script, passing it information about the client's request
• The script writes an HTML page to standard output (redirected to the server)
• The server sends the page to the client
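A minimal CGI-style program, sketched in Python rather than C or Java (any language that can read environment variables and write to standard output works). It assumes a hypothetical script such as `hello.cgi`, requested as `/cgi-bin/hello.cgi?name=Ada`: the server passes the part of the URL after `?` in the `QUERY_STRING` environment variable, and the script prints a header block, a blank line, and then the page. `render_page` and the `name` parameter are hypothetical.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def render_page(query_string: str) -> str:
    """Build the CGI output: a header block, a blank line, then the HTML page."""
    params = parse_qs(query_string)
    name = params.get("name", ["world"])[0]
    return ("Content-Type: text/html\n"
            "\n"
            f"<html><body>Hello, {escape(name)}!</body></html>\n")


def main() -> None:
    # The server passes request details in environment variables;
    # QUERY_STRING holds everything after the "?" in the URL.
    sys.stdout.write(render_page(os.environ.get("QUERY_STRING", "")))


if __name__ == "__main__":
    main()
```

Escaping the parameter with `html.escape` keeps user input from being interpreted as markup in the generated page.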
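CGI Execution
[Figure: the Server receives a Request, calls fork() to start the CGI Script, the script sends the page back to the server, and the server returns it to the client as the Response.]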
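The server's side of this diagram, as a sketch: start the script as a child process, hand it the request details in environment variables, and capture its standard output. Python's `subprocess` takes care of the fork()/exec() step (or an equivalent spawn call). The inline script is a hypothetical stand-in for a real `.cgi` file, and `run_cgi` is a hypothetical name.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for a script such as hello.cgi: it prints a
# header, a blank line, then HTML built from the request details.
CGI_SCRIPT = """
import html, os
print("Content-Type: text/html")
print()
print("<p>You asked for: " + html.escape(os.environ["QUERY_STRING"]) + "</p>")
"""


def run_cgi(query_string: str, method: str = "GET"):
    """Server side: start the script as a child process with the request
    details in its environment and capture its standard output."""
    env = dict(os.environ,
               GATEWAY_INTERFACE="CGI/1.1",
               REQUEST_METHOD=method,
               QUERY_STRING=query_string)
    # subprocess performs the fork()/exec() (or an equivalent spawn) for us.
    result = subprocess.run([sys.executable, "-c", CGI_SCRIPT],
                            env=env, capture_output=True, text=True, check=True)
    head, _, body = result.stdout.partition("\n\n")
    headers = dict(line.split(": ", 1) for line in head.splitlines())
    return headers, body  # the server turns these into the HTTP response


if __name__ == "__main__":
    print(run_cgi("id=42"))
```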
Active Server Pages (ASPs)
• Active server pages are HTML documents
with extensions for embedded program
execution
• When request arrives, server fetches and
parses the HTML document
• Server executes embedded executable
code and plugs output into page
• Expanded page is sent to client
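A toy illustration of the fetch, parse, execute, and plug-in cycle, borrowing the `<%= ... %>` delimiters that classic ASP uses for output expressions. This is not how IIS implements ASP (real ASP pages typically embed VBScript or JScript); the Python `expand` function is hypothetical, and because it uses `eval()` it must never be given untrusted templates.

```python
import re

# Toy syntax borrowed from classic ASP: <%= expression %> marks embedded code
# whose value is written into the page at that spot.
EMBEDDED = re.compile(r"<%=(.*?)%>", re.DOTALL)


def expand(page: str, context: dict) -> str:
    """Run each embedded expression and plug its output into the page."""
    def run(match):
        # Toy only: eval() executes arbitrary code; never use on untrusted input.
        value = eval(match.group(1).strip(), {"__builtins__": {}}, context)
        return str(value)
    return EMBEDDED.sub(run, page)


template = "<p>Hello, <%= user %>! You have <%= count * 2 %> new items.</p>"

if __name__ == "__main__":
    print(expand(template, {"user": "Ada", "count": 3}))
```

The plain HTML around the markers passes through unchanged; only the embedded expressions are replaced by their output before the page is sent.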