Web - School of Engineering and Computer Science
Basic Internet and
Networking Concepts
Representation and Management of Data
on the Internet
The Internet and the
World-Wide Web
The Internet
A worldwide network connecting millions
of hosts
Interconnecting many Local Area Networks
(LANs) (inter-network or just Internet)
The LANs connected to the Internet can be
of various types
A host is a computer that is connected to the
Internet
History of the Internet
It started as a United States government project,
sponsored by the Advanced Research Projects
Agency (ARPA), and was originally called the
ARPANET
The Internet grew quickly throughout the 1980s
and 90s
Fewer than 600 computers were connected to the
Internet in 1983; now there are over 10 million
Internet Applications
Email
World-Wide Web
FTP
Telnet
Newsgroups
Chat
...
The Web
The Web
The term World-Wide Web (Web or WWW) refers to pieces
of information found on the Internet
These pieces of information can be reached by hosts
connected to the Internet
The Web allows many different types of information to be
accessed using a common interface (Web browser)
A Web document usually contains links to other Web
documents, creating a hypermedia environment
The term Web comes from the fact that information is not
organized in a linear fashion
Web Servers
These pieces of information are stored as
files on particular hosts of the Internet
These hosts are called Web servers
Information Types on the Web
The information pieces of the Web can be of
textual nature, images, video, audio,
programs or any other type of information
Every type of information can have
different formats for storing it as a file
For example, some formats for storing
images are jpeg, bmp, gif, ps, pdf
HTML
Much of the information that is found on
the Web is stored as HTML files
HTML is a markup language for formatting
text. In addition, HTML facilitates inclusion
of other types of information (such as
images) in our text documents
Here is an example of an HTML document
This is how it looks when displayed
inside a browser
Browsers
We use a browser to display HTML
documents
The browser is responsible for fetching the
documents and displaying their contents
according to the HTML rules
Browsing
HTML documents can also contain links to
other HTML documents (or files of other
types, such as images, etc.). The user can
follow these links (by clicking them) to
view other related documents and files
Browsing/surfing refers to the activity of
viewing documents on the Web and
following their links
URLs
Each information piece on the Web has a
unique identifying address which is called a
URL (Uniform Resource Locator)
A URL takes the following form:
http://www.huji.ac.il/index.html
• protocol: http
• hostname: www.huji.ac.il
• file: /index.html
It has 3 parts: a protocol field, a hostname
field and a file field
URL Fields
The protocol field (“http” in the previous
example) specifies the way in which the
information should be accessed
The hostname field specifies the host on which
the information is found
The file field specifies the particular
location in the host's file system where the
file is found
There could be more complex forms of
URLs, but we do not discuss them
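As a minimal sketch, the three fields can be pulled apart with Python's standard urllib.parse module. The helper name url_fields is hypothetical and only serves this illustration.

```python
from urllib.parse import urlsplit

def url_fields(url):
    """Return the (protocol, hostname, file) fields of a simple URL."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.path

print(url_fields("http://www.huji.ac.il/index.html"))
# prints ('http', 'www.huji.ac.il', '/index.html')
```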
Search Engines
What are search engines?
How do they work?
Shortcomings of search engines
Some popular search engines: Infoseek,
HotBot, Altavista, Excite, Lycos, Yahoo!,
Jeeves,...
HTTP Daemons
The information pieces of the Web are
stored as files on Web servers
In order to make these information pieces
available to other hosts, each server runs an
HTTP-daemon
HTTP Daemons (continued)
An HTTP-daemon is an application that is
constantly running on the server and waits
for requests from remote hosts
A host can ask the daemon for a
document (a file) that is located on the
server
Technically, any host connected to the
Internet can act as a Web server by running
an HTTP-daemon application
Browser - HTTPD Interaction
[Figure: the user requests http://www.cs.huji.ac.il/index.html; the browser sends GET /index.html to host www.cs.huji.ac.il; the HTTPD application reads index.html from the disk and sends its content to the browser]
Browser - HTTPD Interaction
The user requests http://www.cs.huji.ac.il/index.html
The browser contacts the HTTP-daemon running on
the host www.cs.huji.ac.il and requests the document
/index.html
The HTTP-daemon translates the requested name to
a specific file in its local file system
The HTTP-daemon reads the file index.html from
the disk and sends the contents of the file to the
browser
The browser receives the document, parses it
according to the HTML rules and displays it
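The step in which the daemon translates the requested name to a local file might look roughly like the sketch below. The function name translate_path and the directory /var/www are assumptions made for illustration; real daemons take the document root from their configuration and perform more checks.

```python
from pathlib import PurePosixPath

def translate_path(document_root, request_path):
    """Map a requested name such as /index.html to a file under document_root."""
    root = PurePosixPath(document_root)
    # Drop the leading "/" so the request is interpreted relative to the root.
    relative = PurePosixPath(request_path.lstrip("/"))
    # Refuse names that try to climb out of the document root.
    if ".." in relative.parts:
        raise ValueError("request escapes the document root")
    return str(root / relative)

print(translate_path("/var/www", "/index.html"))
```

Rejecting ".." components is one simple way to keep requests inside the directory the server intends to publish.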
IP (Internet-Protocol) Addresses
Hostnames are used by people. The network
mechanism uses IP-addresses instead
Every host connected to the Internet has a
unique IP address that identifies it
IP addresses are 32-bit numbers that are
usually written as four decimal numbers
separated by dots, e.g. 135.17.98.240,
where the numbers refer to the four bytes
composing this address
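The relation between the dotted notation and the underlying 32-bit number can be checked with a few lines of Python. This is only a sketch; dotted_to_int is a hypothetical helper, and the standard ipaddress module is used to cross-check it.

```python
import ipaddress

def dotted_to_int(address):
    """Convert a dotted-decimal IPv4 address to its 32-bit integer value."""
    value = 0
    for byte in address.split("."):
        # Shift the bytes seen so far and append the next one.
        value = (value << 8) | int(byte)
    return value

n = dotted_to_int("135.17.98.240")
print(hex(n))
# The standard library agrees with the manual computation.
assert n == int(ipaddress.IPv4Address("135.17.98.240"))
```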
IP Packets
Information that is sent over a network is often
broken down in parts, called packets, which are
sent to the receiving machine and then
reassembled
In the Internet, data that is transferred from one host to
another is divided into IP-packets
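The idea of breaking data into packets and reassembling it can be illustrated with a toy sketch. It does not reproduce the real IP packet format; the sequence numbers and both function names are assumptions of this example.

```python
def split_into_packets(data, size):
    """Break data into (sequence_number, chunk) pairs of at most size bytes."""
    return [(i // size, data[i:i + size]) for i in range(0, len(data), size)]

def reassemble(packets):
    """Rebuild the original data, even if packets arrived out of order."""
    return b"".join(chunk for _, chunk in sorted(packets))

packets = split_into_packets(b"GET /index.html", 4)
packets.reverse()  # simulate out-of-order arrival
print(reassemble(packets))
```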
Routing IP Packets
The essential role of the Internet is to enable every
host to send IP-packets to any other host
Each IP-packet contains source and target IP addresses
There is a routing protocol that handles the
transfer of packets to their target hosts, according
to the target IP addresses
The sending host only needs to know the IP
address of the target host it wishes to
communicate with
Using IP Addresses
How does the browser know the IP address
of the Web server?
One possibility is that the user explicitly
specifies the IP address of the server in the
host field of the URL, for example:
http://135.17.98.240/index.html
However, it is inconvenient for people to
remember such addresses
Internet Addresses
Many hosts have, in addition to an IP address, a
human-readable Internet Address (or hostname)
Here are some examples of Internet Addresses:
www.cs.huji.ac.il
www.cocacola.com
www.yellowpages.co.il
www.isdn.net.il
The first part is the name of a particular host (i.e.,
computer)
The rest is the domain name
Internet Addresses (continued)
Hostnames have a hierarchical structure
www.cs.huji.ac.il
www is a computer in the Dept. of
Computer Science (cs) at the Hebrew
University of Jerusalem (huji), which
is an academic institution (ac) in Israel (il)
The rightmost name describes the main
domain of the host (il - Israel). To its left
there is a sub-domain, and further to
the left there are more specific sub-domains
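The hierarchy can be made explicit by listing the domains that contain a host, starting from the rightmost name. A small sketch, where domain_chain is a hypothetical helper:

```python
def domain_chain(hostname):
    """List the domains containing a host, from the main domain downwards."""
    labels = hostname.split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]

print(domain_chain("www.cs.huji.ac.il"))
# prints ['il', 'ac.il', 'huji.ac.il', 'cs.huji.ac.il', 'www.cs.huji.ac.il']
```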
Generic Domains
There are 7 special domains that are called generic
domains
• com - commercial organizations
(www.cocacola.com)
• edu - educational institutions (www.berkeley.edu)
• gov - U.S. governmental organizations
(www.cia.gov)
• int - international organizations
• mil - U.S. military
• net - networks (InterNIC)
• org - other organizations (www.w3.org)
Country Domains
Generic domains usually refer to hosts inside the
U.S. Other countries use two-letter country
domains:
• il - Israel
• uk - United Kingdom
• jp - Japan
• se - Sweden
These domains usually have sub-domains that
correspond to the generic domains. For example,
co.il is the domain of all the commercial
organizations in Israel, and ac.il is the domain of
all the academic institutions inside Israel
Back to the Browser
When we address a host in the Internet, we usually
use its hostname (e.g., using a hostname in a URL)
The browser needs to map this hostname into the
corresponding IP address of the given host
There is no one-to-one correspondence between
the sections of an IP address and the sections of a
hostname
Translating Hostnames to
IP Addresses
The translation of hostnames to IP addresses
requires a lookup table
Since there are millions of hosts on the Internet, it
is not feasible for the browser to hold a table
which maps all hostnames to their IP-addresses
Moreover, new hosts are added to the Internet
every day and hosts change their names
DNS
The browser (and other Internet
applications) use a DNS-Server to map
hostnames to IP addresses
DNS (Domain Name System) is a
hierarchical scheme for naming hosts
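A toy model of a hierarchical lookup is sketched below. The nested table and the address 192.0.2.10 are made up for illustration; a real application would hand the hostname to a DNS server, for example through Python's socket.gethostbyname.

```python
# A toy name hierarchy; the address is a documentation-only placeholder.
ZONES = {"il": {"ac": {"huji": {"cs": {"www": "192.0.2.10"}}}}}

def resolve(hostname, zones=ZONES):
    """Walk the hierarchy from the rightmost label to the leftmost."""
    node = zones
    for label in reversed(hostname.split(".")):
        if not isinstance(node, dict) or label not in node:
            raise KeyError(f"unknown host: {hostname}")
        node = node[label]
    if isinstance(node, dict):
        raise KeyError(f"{hostname} names a domain, not a host")
    return node

print(resolve("www.cs.huji.ac.il"))
```

Each level of the nested table plays the role of one domain, so no single table has to list every host on the Internet.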
Proxy Servers
A proxy server acts as a delegate of browsers for
accessing the Web
The browser transfers the requests for a document
to the Proxy
The Proxy contacts the suitable Web-server and
fetches the document on behalf of the browser
Proxy Server
[Figure: the user requests a document; the browser requests the document from the proxy; the proxy application on the proxy server, which holds a cache, asks the HTTPD for the document; the content of index.html is sent back to the browser]
Advantages of Proxy Servers
Proxy servers have several advantages over
direct access:
• They can be combined with a firewall to
enable restricted access to the Internet
• They enable caching of popular documents
• They can extend the functionality of the
browser by translating from one protocol to
another (for example, from FTP to HTTP
and vice-versa)
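The caching advantage can be sketched as follows. The class CachingProxy and the stand-in function fake_fetch are hypothetical; a real proxy would also decide when a cached copy has become stale.

```python
class CachingProxy:
    """Fetch documents on behalf of browsers, keeping copies in a cache."""

    def __init__(self, fetch):
        self.fetch = fetch          # function that contacts the real Web server
        self.cache = {}             # URL -> document content
        self.server_requests = 0    # how often the Web server was contacted

    def get(self, url):
        if url not in self.cache:
            self.server_requests += 1
            self.cache[url] = self.fetch(url)
        return self.cache[url]

def fake_fetch(url):
    # Stand-in for a real network request, so the sketch runs offline.
    return f"<html>content of {url}</html>"

proxy = CachingProxy(fake_fetch)
proxy.get("http://www.cs.huji.ac.il/index.html")
proxy.get("http://www.cs.huji.ac.il/index.html")
print(proxy.server_requests)  # prints 1: the second request came from the cache
```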
Firewalls
A firewall poses restrictions on the traffic in
or out of a local-area network
Examples:
Hides sensitive data from the outside world
Prevents access of local users to specific sites
outside the local-area network
How a Firewall Works
All the traffic (of IP-packets) in or out of
the local-area network is forced to go
through a single host
A firewall application is installed on this
host
The firewall examines all the in and out
traffic of IP-packets and discards illegal
packets
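A very small packet filter in this spirit is sketched below. The local network 192.168.1.0/24, the blocked address, and the two rules are assumptions chosen for the example, not a recommended policy.

```python
import ipaddress

LOCAL_NETWORK = ipaddress.ip_network("192.168.1.0/24")   # hypothetical LAN
BLOCKED_SITES = {ipaddress.ip_address("203.0.113.7")}     # hypothetical outside site

def is_legal(source, target):
    """Decide whether a packet may pass, given its source and target addresses."""
    src = ipaddress.ip_address(source)
    dst = ipaddress.ip_address(target)
    outgoing = src in LOCAL_NETWORK
    if outgoing and dst in BLOCKED_SITES:
        return False    # local users may not reach blocked sites
    if not outgoing and dst not in LOCAL_NETWORK:
        return False    # traffic that does not concern this LAN
    return True

print(is_legal("192.168.1.5", "203.0.113.7"))  # prints False
```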
Dynamically Generated Documents
[Figure: the user requests http://www.excite.com/search?what=something; the browser sends GET /search?what=something to host www.excite.com; the HTTPD application executes the search program and sends the resulting page to the browser]
Basic Networking Concepts
Local-Area Networks
A Local-Area Network (LAN) covers a small
distance and a small number of computers
A LAN often connects the machines
in a single room or building
LANs (Local-Area Networks)
• Limited size
• Privately owned
• Centrally managed
• Usually hosts physically connected via a cable
• Homogeneous devices & protocols
• Known features (latency, bandwidth,..)
WANs (Wide Area Networks)
A Wide-Area Network (WAN)
connects two or more LANs,
often over long distances
A LAN is usually owned
by one organization, but
a WAN often connects
different groups in
different countries
What is a protocol?
Caller dials 06 7647834
System: Welcome to Mount Hermon ski site. For ski conditions
press 1, for reservation of ski packages press 5, ...
Caller presses 5
System: Please select the type of your credit card.
For Visa press 1, ...
Layering
models protocol
sketches protocol
CAD protocol
modem protocol
TCP/IP
A protocol is a set of rules that determine how things
communicate with each other
The software which manages Internet communication
follows a suite of protocols called TCP/IP
The Internet Protocol (IP) determines the format of the
information as it is transferred
The Transmission Control Protocol (TCP) dictates how
messages are reassembled and handles lost information
TCP/IP protocol suite
Application: HTTP, FTP, TELNET,...
Transport: TCP, UDP
Internet: IP
Link: Ethernet, Token-Ring,...
TCP/IP protocol suite
Taken from "TCP/IP Illustrated Vol. 1" / Richard Stevens
Packet headers
Taken from "TCP/IP Illustrated Vol. 1" / Richard Stevens
IP Layer
Transmission of packets between two hosts
IP addresses
Routing protocol
IP Addresses
A 32-bit address consists of: Class, Network ID, Host ID
Network IDs are assigned by InterNIC
Class   From        Till              Net ID   Host ID
A       0.0.0.0     127.255.255.255   7 bit    24 bit
B       128.0.0.0   191.255.255.255   14 bit   16 bit
C       192.0.0.0   223.255.255.255   21 bit   8 bit
D       224.0.0.0   239.255.255.255   28 bit   -
E       240.0.0.0   247.255.255.255   27 bit   -
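Since the class is determined by the leading bits, it can be read off the first byte of the address. The sketch below assumes the ranges of the class table, with class C ending at 223.255.255.255, just below the start of class D; address_class is a hypothetical helper.

```python
# Upper bound of the first byte for each class.
CLASS_LIMITS = [(127, "A"), (191, "B"), (223, "C"), (239, "D"), (247, "E")]

def address_class(address):
    """Return the class (A-E) of a dotted-decimal IPv4 address, or None."""
    first_byte = int(address.split(".")[0])
    for limit, name in CLASS_LIMITS:
        if first_byte <= limit:
            return name
    return None  # 248-255 lies outside the listed ranges

print(address_class("135.17.98.240"))  # prints B
```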
Routing
Transport Layer
TCP
• Connection oriented
• Reliable, keeps order
UDP
• Connectionless
• Unreliable
• Fast
Client-Server Model
[Figure: a server application on the server machine 144.12.34.99, listening on port 5746; a client application on the client machine 190.30.42.155]
Well-Known Ports
FTP 21
Telnet 23
HTTPD 80
...
End of Lecture 1
HTTP Protocol
Hypertext Transfer Protocol
Used between Web-clients (e.g., browsers)
and Web-servers (and proxies)
Text based
Built on top of TCP
Stateless protocol
HTTP Transaction -- Client
Client request:
• Sends a request
GET /index.html HTTP/1.0
• Sends optional header information
User-Agent: browser name
Accept: formats the browser understands
...
• Sends a blank line (\n)
• Can send post data
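The client side of the transaction can be sketched as plain text assembly. build_request is a hypothetical helper; note that the HTTP specification ends each line with a carriage return plus line feed (\r\n), although many servers also accept a bare \n.

```python
def build_request(path, headers):
    """Compose an HTTP/1.0 GET request as text."""
    lines = [f"GET {path} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # An empty line marks the end of the header information.
    return "\r\n".join(lines) + "\r\n\r\n"

print(build_request("/index.html", {"User-Agent": "demo-browser", "Accept": "text/html"}))
```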
HTTP Transaction -- Server
Server response:
• sends status line
HTTP/1.0 200 OK
• sends header information
Content-type: text/html
Content-length: 3022
...
• sends a blank line (\n)
• sends document data
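A client can take such a response apart along the same lines: status line, header information, blank line, document data. A minimal sketch, assuming \r\n line endings and a hypothetical helper name parse_response:

```python
def parse_response(raw):
    """Split a raw HTTP response into status code, headers, and document data."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("ascii").split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body

raw = b"HTTP/1.0 200 OK\r\nContent-type: text/html\r\nContent-length: 5\r\n\r\nhello"
print(parse_response(raw))
```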
Reacting to Responses of Clients
HTML pages are static documents
To achieve interaction with the user, there is
a need for Internet tools and techniques that
get input from the user and react according
to this input
Sometimes there is a need to produce output
as a result of querying a database. The
output in this case is not known in advance
Server Technologies
Some Web applications use online input to create
pages on the fly (for example, search engines)
A request will include, in addition to the URL of
the service provider, a list of parameters
For example,
http://www.google.com/search?q=search-word
The creation of the pages may also require
interaction with some applications (for example,
database queries)
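On the server, the list of parameters has to be separated from the rest of the URL. A sketch using Python's standard urllib.parse module, with request_parameters as a hypothetical helper name:

```python
from urllib.parse import urlsplit, parse_qs

def request_parameters(url):
    """Return the service path and a dictionary of the request parameters."""
    parts = urlsplit(url)
    return parts.path, parse_qs(parts.query)

print(request_parameters("http://www.google.com/search?q=search-word"))
# prints ('/search', {'q': ['search-word']})
```

parse_qs maps every parameter name to a list, because the same name may appear more than once in a request.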
Creating Pages on the Fly
in the Server
There are four common ways to serve page
requests that include input parameters:
• CGI (Common Gateway Interface) programming
• Java Servlets
• JSP -- Java Server Pages, or
• Microsoft ASP -- Active Server Pages (similar to
JSP)
CGI Programming
CGI is not a programming language; it is a standard
interface between a Web server and external programs
A CGI script is a program, written in any language, that
the server runs to create HTML code
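A CGI program receives the request parameters through environment variables such as QUERY_STRING and writes header lines, a blank line, and the page to its standard output. The sketch below shows the idea in Python; the parameter name "what" and the page text are assumptions made for illustration.

```python
import html
import os
from urllib.parse import parse_qs

def cgi_response(environ):
    """Produce the text a CGI program writes to standard output."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    word = params.get("what", ["nothing"])[0]
    # Escape user input before placing it inside HTML.
    body = f"<html><body>You searched for {html.escape(word)}</body></html>"
    # Header lines, a blank line, then the document itself.
    return "Content-type: text/html\r\n\r\n" + body

if __name__ == "__main__":
    print(cgi_response(os.environ), end="")
```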
An early technology
Java Servlets
Servlets are Java programs that some Web
servers can run
A Servlet creates pages on the fly and these pages
are returned to the requesting browser
JSP and ASP
JSP (Java Server Pages)
• Create an HTML page that has Java code embedded in
special tags
• This page is actually a template
• The code, for example, could issue a database query
and create an HTML table for the result
• The Web server executes the code in the template
and produces a pure HTML page that is returned to
the client
Microsoft ASP (Active Server Pages)
• The code is typically written in VBScript (Visual Basic scripts)
• The Web server must be a Microsoft IIS server
Client Technologies
Some technologies interact with the user on the
client level (Web browser)
JavaScript is a scripting language that can be
added to HTML pages
Web browsers can run the script and change the
output accordingly
Scripts have only limited access to the client's
file system, mainly through cookies
Cookies are small pieces of data that the browser stores
on the client, often holding some personal information
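Cookies also travel between browser and server as HTTP header lines. The sketch below uses Python's standard http.cookies module to show what such a line looks like and how a value is read back; the cookie names and both helper functions are hypothetical.

```python
from http.cookies import SimpleCookie

def make_set_cookie(name, value):
    """Build the header line that asks a browser to store a cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    return cookie.output()

def read_cookie(cookie_header, name):
    """Extract one value from the cookie text a browser sends back."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return cookie[name].value

print(make_set_cookie("visitor", "alice"))
print(read_cookie("visitor=alice; theme=dark", "theme"))
```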
Separating Contents from Style
In HTML, the contents and the style of
pages are inseparable
• HTML tags actually refer only to the style
XML (eXtensible Markup Language) is a
new markup language for marking the
semantics (meaning) of the data
XML tags describe the meaning of each
portion of text in an XML document
XML Tags
XML tags are similar to attributes in a
relation
However, the attributes are the same for all
the records of the relation
In XML documents, each portion of text has
its own tag
• <course> databases </course>
• <course> operating systems </course>
XML tags can be nested
Parsing XML Documents
XML facilitates easy parsing of documents
according to their semantics
For example, the CS Department has many
Web pages of courses
Can we write a program that reads all these
pages and prints a list of the names of
courses?
If XML tags are used, it is easy to do that
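A program of this kind could be sketched with Python's standard xml.etree.ElementTree module. The enclosing <department> element is an assumption made so that the example is a complete XML document.

```python
import xml.etree.ElementTree as ET

def course_names(xml_text):
    """Return the text of every <course> element in the document."""
    root = ET.fromstring(xml_text)
    return [course.text.strip() for course in root.iter("course")]

page = """<department>
  <course> databases </course>
  <course> operating systems </course>
</department>"""
print(course_names(page))
# prints ['databases', 'operating systems']
```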
Using XML
XML is important in the context of data
exchange between applications
It is possible to define a common set of tags
that are suited for specific applications
For example, MathML is used for
exchanging mathematical information
Showing XML Document in
Browsers
XML documents contain data with semantic
tags
For a graphical representation, information
about the style must be added
• For example, HTML tags provide information
about the style
Style Sheets
Style is added to XML documents by
means of style sheets
There are two style-sheet languages
• CSS -- Cascading Style Sheets
  Describes how to graphically show the data
• XSL -- Extensible Stylesheet Language
  Can also transform the data
Putting it All Together
A common architecture for Web applications has
several tiers
• DBMS (database management system) for storing
and processing information
• A Web server for producing pages as a result of
client requests
• A browser that supports dynamic pages using
JavaScript (for creating dynamic pages) and CSS (for
creating the desired visual output)
How Should XML be Used?
How can we query XML documents easily and effectively?
How can we store XML documents efficiently?
What is the proper way to include other resources
in XML documents (figures, sounds, etc.)?
How can we use a general style, and
information that is semantically well defined,
without making the process of creating documents
too cumbersome?
Course topics
Server-side programming
• JDBC for connecting to the DBMS
• Servlets
• JSP
Client-side programming
• JavaScript
• CSS
Data storage and processing on the Web
• XML
• XSL