Web - School of Engineering and Computer Science

Basic Internet and
Networking Concepts
Representation and Management of Data
on the Internet
The Internet and the
World-Wide Web
The Internet
A worldwide network connecting millions
of hosts
 Interconnecting many Local Area Networks
(LANs) (inter-network or just Internet)
 The LANs connected to the Internet can be
of various types
 A host is a computer that is connected to the
Internet

History of the Internet
 It started as a United States government project,
sponsored by the Advanced Research Projects
Agency (ARPA), and was originally called the
ARPANET
 The Internet grew quickly throughout the 1980s
and 90s
Fewer than 600 computers were connected to the
Internet in 1983; now there are over 10 million
Internet Applications
Email
 World-Wide Web
 FTP
 Telnet
 Newsgroups
 Chat
 ...

The Web
The term World-Wide Web (Web or WWW) refers to pieces
of information found on the Internet
These pieces of information can be reached by hosts
connected to the Internet
The Web allows many different types of information to be
accessed using a common interface (Web browser)
A Web document usually contains links to other Web
documents, creating a hypermedia environment
The term Web comes from the fact that information is not
organized in a linear fashion
Web Servers
These pieces of information are stored as
files on particular hosts of the Internet
 These hosts are called Web servers

Information Types on the Web
The information pieces of the Web can be of
textual nature, images, video, audio,
programs or any other type of information
 Every type of information can have
different formats for storing it as a file
 For example, some formats for storing
images are jpeg, bmp, gif, ps, pdf

HTML
Much of the information that is found on
the Web is stored as HTML files
 HTML is a markup language for formatting
text. In addition, HTML facilitates inclusion
of other types of information (such as
images) in our text documents
 Here is an example of an HTML document
This is how it looks when displayed
inside a browser

Browsers
We use a browser to display HTML
documents
 The browser is responsible for fetching the
documents and displaying their contents
according to the HTML rules

Browsing
HTML documents can also contain links to
other HTML documents (or files of other
types, such as images, etc.). The user can
follow these links (by clicking them) to
view other related documents and files
Browsing/surfing refers to the activity of
viewing documents on the Internet and
following their links

URLs
Each information piece on the Web has a
unique identifying address which is called a
URL (Uniform Resource Locator)
A URL takes the following form:
http://www.huji.ac.il/index.html
protocol: http
hostname: www.huji.ac.il
file: /index.html
It has 3 parts: a protocol field, a hostname
field and a file field
URL Fields
The protocol field (“http” in the previous
example) specifies the way in which the
information should be accessed
 The host field specifies the host on which
the information is found
 The file field specifies the particular
location in the host's file system where the
file is found
 There could be more complex forms of
URLs, but we do not discuss them
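The three fields can be pulled out of a URL programmatically. The following is a minimal sketch using Python's standard urllib.parse module; the function name url_fields is only an illustrative choice.

```python
from urllib.parse import urlsplit

def url_fields(url):
    """Return the (protocol, hostname, file) fields of a simple URL."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.path

# The example URL from the slide above.
fields = url_fields("http://www.huji.ac.il/index.html")
print(fields)
```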

Search Engines
What are search engines?
 How do they work?
 Shortcomings of search engines
 Some popular search engines: Infoseek,
HotBot, Altavista, Excite, Lycos, Yahoo!,
Jeeves,...

HTTP Daemons
The information pieces of the Web are
stored as files on Web servers
 In order to make these information pieces
available to other hosts, each server runs an
HTTP-daemon

HTTP Daemons (continued)
An HTTP-daemon is an application that is
constantly running on the server and waits
for requests from remote hosts
A host can ask the daemon for a
document (a file) that is located on the
server
 Technically, any host connected to the
Internet can act as a Web server by running
an HTTP-daemon application

Browser - HTTPD Interaction
Figure: the user requests http://www.cs.huji.ac.il/index.html; the browser sends "GET /index.html" to the HTTPD application on host www.cs.huji.ac.il, which reads index.html from the disk and sends its content back to the browser
The user requests http://www.cs.huji.ac.il/index.html
The browser contacts the HTTP-daemon running on
the host www.cs.huji.ac.il and requests the document
/index.html
The HTTP-daemon translates the requested name to
a specific file in its local file system
The HTTP-daemon reads the file index.html from
the disk and sends the contents of the file to the
browser
The browser receives the document, parses it
according to the HTML rules and displays it
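The step in which the daemon translates the requested name to a local file can be sketched as follows. This is only an illustration: the document root /var/www and the default document index.html are assumptions, and real daemons apply many more rules.

```python
from pathlib import PurePosixPath

DOCUMENT_ROOT = PurePosixPath("/var/www")  # hypothetical directory holding the site's files

def translate(request_path, root=DOCUMENT_ROOT):
    """Map the name in 'GET <name>' to a file path under the document root."""
    # Drop empty, "." and ".." components so the result cannot leave the root.
    safe_parts = [p for p in request_path.split("/") if p not in ("", ".", "..")]
    if not safe_parts:
        safe_parts = ["index.html"]  # assumed default document for "/"
    return root.joinpath(*safe_parts)

local_file = translate("/index.html")
print(local_file)
```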
IP (Internet-Protocol) Addresses
Hostnames are used by people. The network
mechanism uses IP-addresses instead
 Every host connected to the Internet has a
unique IP address that identifies it
 IP addresses are 32-bit numbers that are
usually written as four decimal numbers
separated by dots, e.g. 135.17.98.240,
where the numbers refer to the four bytes
composing this address
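The relation between the dotted notation and the underlying 32-bit number can be sketched in a few lines. The function names below are illustrative only.

```python
def ip_to_int(dotted):
    """Pack the four decimal bytes of a dotted address into one 32-bit number."""
    a, b, c, d = (int(x) for x in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def int_to_ip(value):
    """Unpack a 32-bit number into the dotted notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

number = ip_to_int("135.17.98.240")
round_trip = int_to_ip(number)
print(number, round_trip)
```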

IP Packets


Information that is sent over a network is often
broken down in parts, called packets, which are
sent to the receiving machine and then
reassembled
In the Internet, data that is transferred from one host to
another is divided into IP-packets
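Splitting data into packets and reassembling it can be sketched as below. This is a simplified illustration: the sequence numbers are an assumption made for the example, and real IP-packets carry headers that are not modelled here.

```python
def split_into_packets(data, size):
    """Cut data into numbered packets of at most `size` bytes."""
    return [(seq, data[i:i + size])
            for seq, i in enumerate(range(0, len(data), size))]

def reassemble(packets):
    """Rebuild the data, even if the packets arrived out of order."""
    return b"".join(chunk for _, chunk in sorted(packets))

packets = split_into_packets(b"hello world", 4)
# Simulate out-of-order arrival by reversing the list before reassembly.
restored = reassemble(list(reversed(packets)))
print(packets, restored)
```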
Routing IP Packets




The essential role of the Internet is to enable every
host to send IP-packets to any other host
Each IP-packet contains source and target IP-addresses
There is a routing protocol that handles the
transfer of packets to their target hosts, according
to the target IP addresses
The sending host only needs to know the IP
address of the target host it wishes to
communicate with
Using IP Addresses
How does the browser know the IP address
of the Web server?
 One possibility is that the user explicitly
specifies the IP address of the server in the
host field of the URL, for example:
http://135.17.98.240/index.html
 However, it is inconvenient for people to
remember such addresses

Internet Addresses
Many hosts have, in addition to an IP address, a
human-readable Internet Address (or hostname)
 Here are some examples of Internet Addresses:
www.cs.huji.ac.il
www.cocacola.com
www.yellowpages.co.il
www.isdn.net.il


The first part is the name of a particular host (i.e.,
computer)
The rest is the domain name
Internet Addresses (continued)

Hostnames have a hierarchical structure
www.cs.huji.ac.il
www is a computer in the Dept. of
Computer Science (cs) at the Hebrew
University of Jerusalem, Israel (huji), which
is an Academic Campus (ac) of Israel (il)

The rightmost name describes the main
domain of the host (il - Israel). To its left
there is a sub-domain, and further to
the left there are more specific sub-domains
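The hierarchy can be made visible by listing the domains a hostname belongs to, starting from the rightmost name. A small sketch; the function name domain_chain is illustrative only.

```python
def domain_chain(hostname):
    """List the domains a hostname belongs to, from the most general one."""
    labels = hostname.split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]

chain = domain_chain("www.cs.huji.ac.il")
for level in chain:
    print(level)
```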
Generic Domains

There are 7 special domains that are called generic
domains
• com - commercial organizations
(www.cocacola.com)
• edu - educational institutions (www.berkeley.edu)
• gov - U.S. governmental organizations
(www.cia.gov)
• int - international organizations
• mil - U.S. military
• net - networks (InterNIC)
• org - other organizations (www.w3.org)
Country Domains

Generic domains usually refer to hosts inside the
U.S. Other countries use two-letter country
domains:
• il - Israel
• uk - United Kingdom
• jp - Japan
• se - Sweden
These domains usually have sub-domains that
correspond to the generic domains. For example,
co.il is the domain of all the commercial
organizations in Israel, and ac.il is the domain of
all the academic institutions inside Israel
Back to the Browser



When we address a host in the Internet, we usually
use its hostname (e.g., using a hostname in a URL)
The browser needs to map this hostname into the
corresponding IP address of the given host
There is no one-to-one correspondence between
the sections of an IP address and the sections of a
hostname
Translating Hostnames to
IP Addresses

The translation of hostnames to IP addresses
requires a lookup table
Since there are millions of hosts on the Internet, it
is not feasible for the browser to hold a table
which maps all hostnames to their IP-addresses
Moreover, new hosts are added to the Internet
every day and hosts change their names
DNS
The browser (and other Internet
applications) use a DNS-Server to map
hostnames to IP addresses
DNS (Domain Name System) is a
hierarchical scheme for naming hosts

Proxy Servers



A proxy server acts as a delegate of browsers for
accessing the Web
The browser transfers the requests for a document
to the Proxy
The Proxy contacts the suitable Web-server and
fetches the document on behalf of the browser
Proxy Server
Figure: the user requests a document; the browser requests the document from the proxy; the proxy application (which keeps a cache) requests the document from the HTTPD, which sends the content of index.html
Advantages of Proxy Servers

Proxy servers have several advantages over
direct access:
• They can be combined with a firewall to
enable restricted access to the Internet
• They enable caching of popular documents
• They can enlarge the functionality of the
browser by translating from one protocol to
another (for example, from FTP to HTTP
and vice-versa)
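The caching advantage can be sketched as follows. The class CachingProxy and the stand-in fetch function are hypothetical; a real proxy would contact the Web server over the network and would also have to decide when a cached copy is too old.

```python
class CachingProxy:
    """Fetch documents on behalf of browsers, remembering earlier answers."""

    def __init__(self, fetch):
        self.fetch = fetch   # function that really contacts the Web server
        self.cache = {}      # URL -> document

    def get(self, url):
        if url not in self.cache:
            self.cache[url] = self.fetch(url)
        return self.cache[url]

contacted = []  # records every time the "Web server" is actually contacted

def fake_fetch(url):
    # Stand-in for a real network request, used only for this illustration.
    contacted.append(url)
    return "contents of " + url

proxy = CachingProxy(fake_fetch)
first = proxy.get("http://www.cs.huji.ac.il/index.html")
second = proxy.get("http://www.cs.huji.ac.il/index.html")
```

The second request is answered from the cache, so the server is contacted only once.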
Firewalls
A firewall imposes restrictions on the traffic into
or out of a local-area network
Examples:
• Hides sensitive data from the outside world
• Prevents access of local users to specific sites
outside the local-area network
How a Firewall Works
All the traffic (of IP-packets) in or out of
the local-area network is forced to go
through a single host
A firewall application is installed on this
host
The firewall examines all incoming and outgoing
IP-packets and discards illegal
packets
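A firewall's decision can be sketched as a filter over packets. Everything concrete below is an assumption made for illustration: the LAN range, the blocked site, the hidden host and the two rules. Real firewalls inspect far more than the two addresses.

```python
from ipaddress import ip_address, ip_network

LOCAL_NET = ip_network("192.168.1.0/24")        # hypothetical local-area network
BLOCKED_SITES = {ip_address("203.0.113.7")}     # hypothetical forbidden outside site
HIDDEN_HOSTS = {ip_address("192.168.1.10")}     # hypothetical host with sensitive data

def is_legal(source, target):
    src, dst = ip_address(source), ip_address(target)
    # Rule 1: local users may not reach blocked sites outside the LAN.
    if src in LOCAL_NET and dst in BLOCKED_SITES:
        return False
    # Rule 2: outside hosts may not reach the hidden local host.
    if src not in LOCAL_NET and dst in HIDDEN_HOSTS:
        return False
    return True

def firewall(packets):
    """Keep only the packets that pass the rules; the rest are discarded."""
    return [p for p in packets if is_legal(p["source"], p["target"])]

traffic = [
    {"source": "192.168.1.5", "target": "198.51.100.20"},
    {"source": "192.168.1.5", "target": "203.0.113.7"},
    {"source": "198.51.100.20", "target": "192.168.1.10"},
]
passed = firewall(traffic)
```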
Dynamically Generated Documents
Figure: the user requests http://www.excite.com/search?what=something; the browser sends "GET /search?what=something" to the HTTPD application on host www.excite.com, which executes the search program and sends the resulting contents to the browser
Basic Networking Concepts
Local-Area Networks
A Local-Area Network (LAN) covers a small distance and a small number of computers
A LAN often connects the machines in a single room or building
LANs (Local-Area Networks)


Limited size
Privately owned
• Centrally managed
• Usually hosts physically connected via a cable
• Homogeneous devices & protocols
• Known features (latency, bandwidth,..)
WANs (Wide Area Networks)
A Wide-Area Network (WAN) connects two or more LANs, often over long distances
A LAN is usually owned by one organization, but a WAN often connects different groups in different countries
What is a protocol?
Example: an automated telephone service
Caller dials: 06 7647834
Service: Welcome to Mount Hermon ski site. For ski conditions press 1, for reservation of ski packages press 5, ...
Caller presses: 5
Service: Please select the type of your credit card. For Visa press 1, ...
Layering
models protocol
sketches protocol
CAD protocol
modem protocol
TCP/IP
 A protocol is a set of rules that determine how things
communicate with each other
 The software which manages Internet communication
follows a suite of protocols called TCP/IP
 The Internet Protocol (IP) determines the format of the
information as it is transferred
 The Transmission Control Protocol (TCP) dictates how
messages are reassembled and handles lost information
TCP/IP protocol suite
Application: HTTP, FTP, TELNET,...
Transport: TCP, UDP
Internet: IP
Link: Ethernet, Token-Ring,...
TCP/IP protocol suite
Taken from "TCP/IP Illustrated Vol. 1" / Richard Stevens
Packet headers
Taken from "TCP/IP Illustrated Vol. 1" / Richard Stevens
IP Layer



Transmission of packets between two hosts
IP addresses
Routing protocol
IP Addresses
An IP address is 32 bits long: a class prefix, followed by a network ID and a host ID
Class  From       Till             Net ID  Host ID
A      0.0.0.0    127.255.255.255  7 bit   24 bit
B      128.0.0.0  191.255.255.255  14 bit  16 bit
C      192.0.0.0  223.255.255.255  21 bit  8 bit
D      224.0.0.0  239.255.255.255  28 bit  -
E      240.0.0.0  247.255.255.255  27 bit  -
Network IDs are assigned by InterNIC
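The class of an address follows from its first byte, as the ranges in the table show. A minimal sketch that follows the table's ranges (the function name is illustrative; classful addressing has since been superseded by CIDR, so this is of historical interest):

```python
def address_class(dotted):
    """Return the class letter of a dotted IP address, or None if outside the table."""
    first = int(dotted.split(".")[0])
    if first < 128:
        return "A"
    if first < 192:
        return "B"
    if first < 224:
        return "C"
    if first < 240:
        return "D"
    if first < 248:
        return "E"
    return None  # 248-255 is not covered by the table above

example_class = address_class("135.17.98.240")
print(example_class)
```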
Routing
Transport Layer

TCP
• Connection oriented
• Reliable, keeps order

UDP
• Connectionless
• Unreliable
• Fast
Client-Server Model
Server application on server machine 144.12.34.99, port 5746
Client application on client machine 190.30.42.155
Well-Known Ports
FTP 21
 Telnet 23
 HTTPD 80
 ...

End of Lecture 1
HTTP Protocol
Hypertext Transfer Protocol
 Used between Web-clients (e.g., browsers)
and Web-servers (and proxies)
 Text based
 Built on top of TCP
 Stateless protocol

HTTP Transaction -- Client
Client request:
• Sends a request
GET /index.html HTTP/1.0
• Sends optional header information
User-Agent: browser name
Accept: formats the browser understands
...
• Sends a blank line (\r\n)
• Can send post data
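The text of such a request can be assembled as in the sketch below. The function name build_request and the User-Agent value are illustrative assumptions; the sketch only builds the text and does not send it.

```python
def build_request(path, headers=None, version="HTTP/1.0"):
    """Build the text of a GET request: request line, headers, blank line."""
    lines = [f"GET {path} {version}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Each line ends with CRLF; an empty line marks the end of the headers.
    return "\r\n".join(lines) + "\r\n\r\n"

request = build_request("/index.html", {"User-Agent": "ExampleBrowser/1.0"})
print(request)
```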
HTTP Transaction -- Server

Server response:
• sends status line
HTTP/1.0 200 OK
• sends header information
Content-type: text/html
Content-length: 3022
...
• sends a blank line (\r\n)
• sends document data
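On the receiving side, a response of this shape can be taken apart into status line, headers and document data. A minimal sketch; the function name parse_response and the sample response are assumptions made for illustration.

```python
def parse_response(raw):
    """Split a response into status code, reason, headers and document data."""
    head, _, body = raw.partition("\r\n\r\n")   # the blank line ends the headers
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body

raw = ("HTTP/1.0 200 OK\r\n"
       "Content-type: text/html\r\n"
       "Content-length: 13\r\n"
       "\r\n"
       "<html></html>")
status, reason, headers, body = parse_response(raw)
```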

Reacting to Responses of Clients
HTML pages are static documents
 To achieve interaction with the user, there is
a need for Internet tools and techniques that
get input from the user and react according
to this input
 Sometimes there is a need to produce output
as a result of querying a database. The
output in this case is not known in advance

Server Technologies




Some Web applications use online input to create
pages on the fly (for example, search engines)
A request will include, in addition to the URL of
the service provider, a list of parameters
For example,
http://www.google.com/search?q=search-word
The creation of the pages may also require
interaction with some applications (for example,
database queries)
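The list of parameters that follows the "?" in such a request can be read with Python's standard urllib.parse module. A minimal sketch; the function name request_parameters is illustrative.

```python
from urllib.parse import urlsplit, parse_qs

def request_parameters(url):
    """Return the parameters of a request as a dict of name -> list of values."""
    return parse_qs(urlsplit(url).query)

params = request_parameters("http://www.google.com/search?q=search-word")
print(params)
```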
Creating Pages on the Fly
in the Server

There are four common ways to serve page
requests that include input parameters:
• CGI (Common Gateway Interface) programming
• Java Servlets
• JSP -- Java Server Pages, or
• Microsoft ASP -- Active Server Pages (similar to
JSP)
CGI Programming
CGI is a standard interface between a Web server and external programs (it is not itself a language)
A CGI script is a program that the Web server runs
on the server to create HTML code
An early technology
Java Servlets
Servlets are Java applications that some Web
servers can run
A Servlet creates pages on the fly and these pages
are returned to the requesting browser
JSP and ASP

JSP (Java Server Pages)
• Create an HTML page that has Java code inside
HTML tags


This page is actually a template
The code, for example, could issue a database query
and create an HTML table for the result
• The Web server executes the code in the template
and produces a pure HTML page that is returned to
the client

Microsoft ASP (Active Server Pages)
• The code is VB (Visual Basic) scripts
• The Web server must be Microsoft IIS server
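JSP and ASP templates embed Java or VB code. The template idea itself is sketched below in Python, purely as an analogy: a page skeleton with placeholders is filled with data (here a fixed list standing in for the result of a database query) to produce a pure HTML page. The names PAGE and render are hypothetical.

```python
from html import escape
from string import Template

# The template: fixed HTML with placeholders for the generated parts.
PAGE = Template("<html><body><h1>$title</h1><table>\n$rows</table></body></html>")

def render(title, records):
    """Fill the template and return a pure HTML page."""
    rows = "".join(f"<tr><td>{escape(name)}</td></tr>\n" for name in records)
    return PAGE.substitute(title=escape(title), rows=rows)

# The list stands in for the result of a database query.
page = render("Courses", ["databases", "operating systems"])
print(page)
```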
Client Technologies





Some technologies interact with the user on the
client level (Web browser)
JavaScript is a scripting language that can be
added to HTML pages
Web browsers can run the script and change the
output accordingly
The script has limited interaction with the
file system, through cookies
Cookies are small files that store some personal
information in the file system of the client
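A cookie is essentially a set of name=value pairs. The sketch below uses Python's standard http.cookies module to show how such pairs are written and read back; the cookie names and values are invented for illustration, and the sketch shows the textual format only, not how a browser stores the file.

```python
from http.cookies import SimpleCookie

# Writing: a cookie that remembers the user's preferred language.
cookie = SimpleCookie()
cookie["language"] = "en"
header = cookie.output()

# Reading: parse the name=value pairs of a stored cookie string.
received = SimpleCookie()
received.load("language=en; visits=3")
values = {name: morsel.value for name, morsel in received.items()}
print(header, values)
```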
Separating Contents from Style

In HTML, the contents and the style of
pages are inseparable
• HTML tags actually refer only to the style
XML (eXtensible Markup Language) is a
new markup language for marking the
semantics (meaning) of the data
 XML tags describe the meaning of each
portion of text in an XML document

XML Tags
XML tags are similar to attributes in a
relation
 However, the attributes are the same for all
the records of the relation
 In XML documents, each portion of text has
its own tag

• <course> databases </course>
• <course> operating systems </course>

XML tags can be nested
Parsing XML Documents
XML facilitates easy parsing of documents
according to their semantics
 For example, the CS Department has many
Web pages of courses
 Can we write a program that reads all these
pages and prints a list of the names of
courses?
 If XML tags are used, it is easy to do that
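Such a program can be sketched with Python's standard xml.etree.ElementTree module. The sample page and its enclosing department element are assumptions made for the example; only the course tag comes from the slides.

```python
import xml.etree.ElementTree as ET

# A hypothetical page; the <department> wrapper is invented for this sketch.
PAGE = """<department>
  <course> databases </course>
  <course> operating systems </course>
</department>"""

def course_names(xml_text):
    """Return the text of every <course> element in the document."""
    root = ET.fromstring(xml_text)
    return [course.text.strip() for course in root.iter("course")]

names = course_names(PAGE)
print(names)
```

Because the tag states that the text is a course name, the program needs no knowledge of how the page is laid out.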

Using XML
XML is important in the context of data
exchange between applications
 It is possible to define a common set of tags
that are suited for specific applications
 For example, MathML is used for
exchanging mathematical information

Showing XML Documents in
Browsers
XML documents contain data with semantic
tags
For a graphical representation, information
about the style must be added
• For example, HTML tags provide information
about the style
Style Sheets
Style is added to XML documents by
means of style sheets
There are two style-sheet languages
• CSS -- Cascading Style Sheets
Describe how to graphically show the data
• XSL -- eXtensible Stylesheet Language
Can also transform the data
Putting it All Together

A common architecture for Web applications has
several tiers
• DBMS (database management system) for storing
and processing information
• A Web server for producing pages as a result of
client requests
• A browser that supports dynamic pages using JavaScript
(for creating dynamic pages) and CSS (for
creating the desired visual output)
How Should XML be Used?




How can we query XML documents easily and
effectively?
How can we store XML documents efficiently?
What is the proper way to include other resources
in XML documents (e.g., figures, sounds, etc.)?
How can we use a general style, and
information that is semantically well defined,
without making the process of creating documents
too cumbersome?
Course topics

Server-side programming
• JDBC for connecting to the DBMS
• Servlets
• JSP

Client-side programming
• JavaScript
• CSS

Data storage and processing on the Web
• XML
• XSL