Transcript Slide 1

Web
Components
(Introduction)
Chapter 1
Web Protocols and Practice
INTRODUCTION
Topics
Web History
Web Definition
Semantic Components of the Web
Content on the Web
Software Components
Underlying Network
Standardization
Web Traffic and Performance
Web Applications
Web History
 1945: Vannevar Bush proposed the Memex, a device
to extend human memory by providing large-scale
indexing of text.
 1965: Hypertext: Non-sequential writing that presents
information as a collection of linked nodes.
 1960-1970: U.S. Department of Defense extended the
use of its communication infrastructure (ARPANET) for
the connected computers. In 1980 they deployed TCP/IP
that caused rapid growth in size and scope of ARPANET.
 1989: Tim Berners-Lee proposed using hypertext for
accessing the information on the computers at CERN.
Web History
 During 1980-1990 these systems were widely used
on the Internet to access information:
 FTP: For file transfer. The user has to know which FTP server holds the file.
 Gopher: Provided a way for users to search the
servers in the network.
 WAIS (Wide Area Information Servers): Allowed users to
send queries to the databases around the network.
 Archie: Global index of FTP servers that allowed users to
search by file name.
 1992: The first official release of the web browser.
 1993: First graphical web browser (Mosaic).
Web Definition
 The World Wide Web, or simply the Web, is the
universe of information accessible via networked
computers.
 The Internet is different from the Web. It is a network of
computers, in which a given computer does not
necessarily act as a Web client or Web server.
Semantic Components of the Web
 Three main semantic components of the Web
are:
 A naming infrastructure (URI)
 A document language (HTML)
 A message exchange protocol (HTTP)
URI (Uniform Resource Identifier)
 Accessing and manipulating resources distributed
throughout the Web requires a way to identify them. A URI
is a universal naming mechanism for identifying a resource
on the Web independent of its current location or value.
 A URI can be thought of as a pointer to a black box to
which request methods can be applied to generate
different responses at different times. A request method is
a simple operation such as fetching, changing, or
deleting a resource.
 For example, at a high level, a string such as
http://www.foo.com/coolpic.gif is a URI.
 Later we will see how a URI differs from a URL.
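As a rough illustration of the example URI above, the following sketch splits it into its parts. It assumes Python's standard urllib.parse module, which is not part of the slides; the component names (scheme, netloc, path) are that module's terms.

```python
from urllib.parse import urlsplit

# The example URI from the slide, split into its components.
uri = "http://www.foo.com/coolpic.gif"
parts = urlsplit(uri)

print(parts.scheme)  # naming scheme, here "http"
print(parts.netloc)  # host that serves the resource
print(parts.path)    # name of the resource on that host
```

The string only names the resource; which operation is applied to it (fetching, changing, deleting) is decided by the request method, not by the URI.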
HTML (Hypertext Markup Language)
 HTML provides a standard representation for
hypertext documents in ASCII format.
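To make the idea of a hypertext document in ASCII more concrete, here is a small sketch. The sample page and the LinkCollector class are hypothetical, and the parser is Python's standard html.parser module rather than anything named in the slides.

```python
from html.parser import HTMLParser

# Hypothetical sample document: plain ASCII text with markup tags.
PAGE = """<html>
<head><title>Cool pictures</title></head>
<body>
<p>See <a href="http://www.foo.com/coolpic.gif">this picture</a>.</p>
</body>
</html>"""


class LinkCollector(HTMLParser):
    """Collects the target URI of every <a href="..."> hyperlink."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


collector = LinkCollector()
collector.feed(PAGE)
print(collector.links)
```

The hyperlink is what turns an ordinary text document into a node in a collection of linked documents.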
HTTP (Hypertext Transfer Protocol)
 HTTP is the most common way of transferring
resources on the Web.
 HTTP defines the format and meaning of
messages exchanged between web
components, such as clients and servers.
 HTTP is simply a language that has specific
syntax and semantics associated with the use of
the language elements.
HTTP (Hypertext Transfer Protocol)
 HTTP is a request-response protocol
 The client sends a request message and then the
server replies with the response message.
 HTTP is a stateless protocol
 Clients and servers treat each message
exchange independently and are not required to
maintain any state across requests and
responses.
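The request-response and stateless properties can be sketched as follows. This is a simplified illustration, not a real server: the RESOURCES table, build_request, and handle are hypothetical names, and the messages contain only a minimal subset of HTTP/1.1 syntax.

```python
# Hypothetical in-memory resources standing in for a server's content.
RESOURCES = {"/index.html": "<html>hi</html>", "/coolpic.gif": "GIF-DATA"}


def build_request(method, path, host):
    """Format a minimal HTTP/1.1 request message."""
    return f"{method} {path} HTTP/1.1\r\nHost: {host}\r\n\r\n"


def handle(request):
    """Answer one request using only the request itself (no stored state)."""
    request_line = request.split("\r\n", 1)[0]
    method, path, _version = request_line.split(" ")
    if method != "GET":
        return "HTTP/1.1 405 Method Not Allowed\r\n\r\n"
    if path not in RESOURCES:
        return "HTTP/1.1 404 Not Found\r\n\r\n"
    body = RESOURCES[path]
    return f"HTTP/1.1 200 OK\r\nContent-Length: {len(body)}\r\n\r\n{body}"


first = handle(build_request("GET", "/index.html", "www.foo.com"))
second = handle(build_request("GET", "/missing.html", "www.foo.com"))
print(first.split("\r\n")[0])
print(second.split("\r\n")[0])
```

Because handle keeps nothing between calls, sending the same request twice yields the same response regardless of what was requested in between.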
Table 1.1. Common Web terms
Term       Definition
WWW/Web    World Wide Web, the universe of information accessible via networked computers
Hypertext  Nonlinear writing or linking related documents for navigation
Internet   Worldwide collection of interconnected networks using the Internet Protocol (IP)
Web page   Document accessible on the Web via a URI
Web site   Collection of related Web pages
Browser    Application for requesting and displaying Web resources
Content on the Web
 Each resource may be available in different
formats, for example:
 HTML
 PostScript
 A resource may be:
 A static file on a machine
 Generated dynamically at the time of the request
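The difference between a static file and a dynamically generated resource can be sketched like this. All names here (STATIC_FILES, DYNAMIC_RESOURCES, get_resource, the paths) are hypothetical and chosen only for illustration.

```python
import itertools

# Static resource: the same stored content is returned on every request.
STATIC_FILES = {"/about.html": "<html>About</html>"}

_request_counter = itertools.count(1)


def visit_count_page():
    """Dynamic resource: the content is built at the time of the request."""
    return f"<html>Request number {next(_request_counter)}</html>"


DYNAMIC_RESOURCES = {"/count": visit_count_page}


def get_resource(path):
    """Return the content for a path, or None if nothing is identified by it."""
    if path in STATIC_FILES:
        return STATIC_FILES[path]
    if path in DYNAMIC_RESOURCES:
        return DYNAMIC_RESOURCES[path]()
    return None


print(get_resource("/about.html"))
print(get_resource("/count"))
```

From the client's point of view both kinds of resource look the same: a URI goes in and a response comes back.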
Content on the Web
 Each HTTP transfer consists of two messages:
 The request message
» Sent by the client
 The response message
» Sent by the server
Table 1.2. Terminology related to Web resources and HTTP messages
Term             Definition
Resource         Network data object or service identified by a URI
Message          Basic unit of communication in HTTP
Sender/receiver  Component responsible for sending/receiving a message
Header           Control portion of a message
Entity           Information transferred in the body of a message
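The terms header and entity can be illustrated by splitting a message into its parts. This is a minimal sketch that assumes a well-formed message with CRLF line endings; parse_message and the sample response are hypothetical.

```python
def parse_message(raw):
    """Split an HTTP message into start line, header fields, and entity body."""
    head, _, body = raw.partition("\r\n\r\n")  # a blank line ends the header
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 13\r\n"
    "\r\n"
    "<html></html>"
)

start_line, headers, body = parse_message(response)
print(start_line)
print(headers)
print(body)
```

The header carries control information (here the type and length of the content), while the entity is the content itself.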
Software Components
 User agent
 A user agent can be a Web browser that
generates requests on behalf of a user and
performs a variety of other tasks, such as
displaying Web pages and storing the user's
bookmarks.
 Proxy
 A proxy is an intermediary between clients and
servers that performs a variety of functions:
» Filtering requests to undesirable Web sites
» Providing a degree of anonymity to clients
» Caching popular resources
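The three proxy functions listed above can be sketched in a few lines. The Proxy class, the BLOCKED_HOSTS list, and the origin function are hypothetical stand-ins; a real proxy would forward actual HTTP messages.

```python
BLOCKED_HOSTS = {"undesirable.example"}  # hypothetical filter list


class Proxy:
    """Intermediary: acts as a server to clients and as a client to servers."""

    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin
        self.cache = {}

    def request(self, host, path):
        if host in BLOCKED_HOSTS:              # filtering
            return "403 Forbidden"
        key = (host, path)
        if key not in self.cache:              # caching
            # Anonymity: the origin is contacted by the proxy and is
            # passed nothing that identifies the client.
            self.cache[key] = self.fetch_from_origin(host, path)
        return self.cache[key]


origin_hits = []


def origin(host, path):
    """Stand-in for an origin server; records each request it receives."""
    origin_hits.append((host, path))
    return f"content of {host}{path}"


proxy = Proxy(origin)
proxy.request("www.foo.com", "/coolpic.gif")
proxy.request("www.foo.com", "/coolpic.gif")
print(len(origin_hits))
```

The second request is answered from the proxy's cache, so the origin server sees only one request.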
Software Components
 Server
 The server may instruct the user agent to retain
state across a series of requests and responses
by storing a cookie. We will discuss cookies later.
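As a preview of how a cookie carries state, the sketch below uses Python's standard http.cookies module. The cookie name session and its value are made up for illustration, and the exchange is simulated with strings rather than real messages.

```python
from http.cookies import SimpleCookie

# Server side: ask the user agent to store a piece of state.
server_cookie = SimpleCookie()
server_cookie["session"] = "abc123"  # hypothetical name and value
set_cookie_header = server_cookie.output(header="Set-Cookie:")
print(set_cookie_header)

# User agent side: store the cookie from the response header...
jar = SimpleCookie()
jar.load(set_cookie_header.split(":", 1)[1].strip())

# ...and send it back in the header of a later request.
cookie_header = "Cookie: " + "; ".join(
    f"{name}={morsel.value}" for name, morsel in jar.items()
)
print(cookie_header)
```

HTTP itself stays stateless; the state travels inside the messages, and the user agent does the remembering.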
Table 1.3. Terminology related to the software components of the Web
Term           Definition
User agent     Client program that initiates a request (e.g., a browser)
Web client     Program that sends an HTTP request to a Web server
Web server     Program that receives an HTTP request from a Web client and transmits a response
Origin server  Server where the requested resource resides or is created
Intermediary   Web component in the path between the user agent and an origin server (e.g., a proxy, gateway)
Proxy          Intermediary program that functions as a server to a client and as a client to a server
Cookie         State information passed between the user agent and the origin server
Underlying Network
 A Web client identifies the Web server by its
hostname (e.g., www.att.com) rather than by an IP
address; the Domain Name System (DNS) translates
the hostname into an IP address.
 The two applications exchange HTTP messages
over the Transmission Control Protocol (TCP):
the client and the server establish a TCP
connection.
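The sequence above (look up the hostname, open a TCP connection, exchange HTTP messages) might look roughly like this with Python's standard socket module. The function names resolve and fetch are hypothetical, the request is a bare HTTP/1.0 GET, and error handling is omitted.

```python
import socket


def resolve(hostname, port=80):
    """Ask the system resolver (DNS for real hostnames) for IPv4 addresses."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return [sockaddr[0] for _family, _type, _proto, _canon, sockaddr in infos]


def fetch(hostname, path="/", port=80, timeout=5.0):
    """Resolve the host, open a TCP connection, and exchange one message pair."""
    address = resolve(hostname, port)[0]
    request = f"GET {path} HTTP/1.0\r\nHost: {hostname}\r\n\r\n"
    with socket.create_connection((address, port), timeout=timeout) as conn:
        conn.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:          # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch("www.att.com") would need network access and a server listening on port 80, so no output is shown here.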
Table 1.4. Terminology related to the Internet and its protocols
Term        Definition
Host        Computer or machine connected to the network
Packet      Basic unit of communication in the Internet
IP          Internet Protocol, a protocol that coordinates the delivery of individual packets between hosts
IP address  32-bit numerical address identifying an Internet host (in IPv4)
Hostname    Case-insensitive string identifying an Internet host
DNS         Domain Name System, a distributed infrastructure for translating between hostnames and IP addresses
TCP         Transmission Control Protocol, a protocol that provides the abstraction of a reliable, bidirectional connection
Connection  Logical communication channel between two hosts
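Two of the definitions above, the 32-bit IP address and the case-insensitive hostname, can be checked with a short sketch. It uses Python's standard ipaddress module; 192.0.2.1 is an address reserved for documentation, chosen here only as an example.

```python
import ipaddress

# An IPv4 address in dotted notation is just a 32-bit number.
address = ipaddress.IPv4Address("192.0.2.1")
as_int = int(address)
print(as_int)
print(as_int.bit_length() <= 32)
print(ipaddress.IPv4Address(as_int))  # back to dotted notation

# Hostnames are compared without regard to case.
print("WWW.ATT.COM".lower() == "www.att.com")
```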
Standardization
A protocol standard is needed for interoperation of
the components.
 The Internet Engineering Task Force (IETF) is
an open community that deals with Internet
standardization through a series of official
publications called Requests for Comments
(RFCs).
 Not all Internet Drafts become RFCs. RFCs are
divided into different tracks: standards, historic,
informational, and experimental.
Standardization
Standards documents have compliance
requirements at the following levels:
 Any compliant implementation has to meet all
the MUST-level requirements.
 An implementation that meets all the MUST-level
requirements but not all the SHOULD-level
requirements is conditionally compliant; one that
also meets all the SHOULD-level requirements is
unconditionally compliant.
 The MAY-level requirements are optional for an
implementation to meet.
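The compliance levels can be written as a small decision rule. The sketch follows the wording of the HTTP/1.1 specification (RFC 2616, section 1.2), where meeting every MUST and every SHOULD is "unconditionally compliant" and meeting every MUST but not every SHOULD is "conditionally compliant"; the function name and its inputs are hypothetical.

```python
def compliance_level(must_met, should_met):
    """Classify an implementation from lists of pass/fail results.

    must_met:   one boolean per MUST-level requirement
    should_met: one boolean per SHOULD-level requirement
    MAY-level requirements are optional and do not affect the result.
    """
    if not all(must_met):
        return "not compliant"
    if all(should_met):
        return "unconditionally compliant"
    return "conditionally compliant"


print(compliance_level([True, True], [True, True]))
print(compliance_level([True, True], [True, False]))
print(compliance_level([True, False], [True, True]))
```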
Standardization
 A standards document proceeds through three
stages:
 Proposed Standard
 Draft Standard
 Internet Standard
 Some RFCs reflect the Best Current Practices
(BCP)
 Standards do not last forever; they can be retired
and replaced by a superior specification.
Standardization
 The World Wide Web Consortium (W3C) was founded in
1994 to encourage the growth of the Web.
 The W3C works on
» The representation of Web content, such as the
HTML language, rather than the networking aspects
» Architectural issues
» User-interface issues
 Formats
 Languages
» Social issues
» Legal and public policy matters
» Accessibility issues to ensure that people with
disabilities are able to have access to the technology
Table 1.5. Terminology related to Internet protocol standards
Term            Definition
IETF            Internet Engineering Task Force, an open community contributing to the evolution of the Internet
Working Group   IETF group chartered to work on a particular standards specification
Internet Draft  Informal version of a standards document reflecting work in progress
RFC             Request for Comments, an official document related to Internet standards
Web Traffic and Performance
 User expectations for quick responses have
focused attention on performance issues.
 High user-perceived latency can result from a
variety of factors, such as:
 DNS overhead
 Network congestion
 Load on server
 Analysis of logs is useful for learning
workload characteristics, such as the time between
requests, the size of requests, and
resource popularity, which have important
implications for Web performance.
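A simple log analysis of the kind described above might look like this. The log records are invented for illustration, and the three fields (timestamp, URI, transfer size) are an assumed format rather than any particular server's log layout.

```python
from collections import Counter

# Hypothetical log records: (timestamp in seconds, requested URI, size in bytes)
LOG = [
    (0.0, "/index.html", 2048),
    (1.5, "/coolpic.gif", 51200),
    (2.0, "/index.html", 2048),
    (6.0, "/index.html", 2048),
]

# Time between consecutive requests.
timestamps = [t for t, _uri, _size in LOG]
gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
mean_gap = sum(gaps) / len(gaps)

# Average transfer size.
mean_size = sum(size for _t, _uri, size in LOG) / len(LOG)

# Resource popularity: how often each URI was requested.
popularity = Counter(uri for _t, uri, _size in LOG)

print(mean_gap)
print(mean_size)
print(popularity.most_common(1))
```

A resource that dominates the popularity count is a natural candidate for caching.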
Table 1.6. Terminology related to Web traffic and performance
Term                    Definition
Latency                 Time between the initiation of an action and the first indication of a response
User-perceived latency  Time between a user action and the initial display of the content
Bandwidth               Amount of traffic that can be carried per unit time
Workload                Inputs received by a Web component over time
Log                     Record of transactions performed by a Web component
Web Applications
 Important applications are:
 Web caching
» Caching moves content closer to the user.
» A cache can be located at
 A user's browser
 An origin server
 A machine in the path between the user and the
origin server
 Multimedia streaming
» The client plays the samples and frames as they
arrive from the server, rather than downloading the
content in its entirety before beginning playout.
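The caching idea above can be sketched as a store that reuses a response while it is still considered fresh. The Cache class, the max_age freshness rule, and the origin function are hypothetical simplifications; time is passed in explicitly so the example is deterministic.

```python
class Cache:
    """Stores responses and reuses them while they are considered fresh."""

    def __init__(self, fetch_from_origin, max_age):
        self.fetch_from_origin = fetch_from_origin
        self.max_age = max_age   # seconds a stored copy stays fresh
        self.store = {}          # uri -> (time stored, body)

    def get(self, uri, now):
        entry = self.store.get(uri)
        if entry is not None and now - entry[0] < self.max_age:
            return entry[1]      # hit: no request reaches the origin
        body = self.fetch_from_origin(uri)  # miss or stale copy
        self.store[uri] = (now, body)
        return body


origin_requests = []


def origin(uri):
    """Stand-in for the origin server; records each request it receives."""
    origin_requests.append(uri)
    return f"body of {uri}"


cache = Cache(origin, max_age=60)
cache.get("/coolpic.gif", now=0)    # miss: fetched from the origin
cache.get("/coolpic.gif", now=30)   # hit: served from the cache
cache.get("/coolpic.gif", now=90)   # stale: fetched again
print(len(origin_requests))
```

Limiting how long a copy may be reused is one simple way to lower the chance of returning out-of-date content.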
Table 1.7. Terminology related to Web caching and multimedia streaming
Term                  Definition
Cache                 Store of messages used to reduce user-perceived latency and load on the network and server
Cache coherency       Mechanism to lower the possibility of returning out-of-date messages from the cache
Replication           Duplication of resources on multiple origin servers
Content distribution  Delivery of resources on behalf of an origin server
Audio/video stream    Sequence of audio samples or video frames
Streaming             Overlap of the server transmission and client playback of audio/video data
Media player          Helper application for playing multimedia streams
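The overlap of transmission and playback that defines streaming can be imitated with a generator. This is only an analogy in a single process: server_stream and play are hypothetical names, and the frames are plain strings.

```python
events = []


def server_stream(frames):
    """Hand over frames one at a time instead of as one complete download."""
    for frame in frames:
        events.append(f"sent {frame}")
        yield frame


def play(stream):
    """Play each frame as soon as it arrives."""
    for frame in stream:
        events.append(f"played {frame}")


play(server_stream(["frame1", "frame2", "frame3"]))
print(events)
```

The recorded events alternate between sending and playing, which is the overlap the table describes; a download-first client would record all the sends before the first play.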