ppt - School of Engineering and Computer Science


Basic Internet and
Networking Concepts
Representation and Management
of Data on the Internet
The Internet and the
World-Wide Web
TCP/IP and Web Browsers
The Internet and the Web

Internet means Inter-Network
• A world-wide network of many LANs (local-area networks)
• The LANs are of various types

Web means World-Wide Web
• A large collection of information arranged
as hypertext and stored in many computers
that are part of the Internet

The two are related but not identical: the Web is
one of several applications that run over the Internet
A Bit of History
The Internet grew very rapidly
throughout the 1980s and 90s
Fewer than 600 computers were connected
to the Internet in 1983
Now there are tens (if not hundreds) of
millions of computers
The Web started in 1989 and grew very
rapidly during the 1990s
The current Web has billions of pages
Internet Applications
• Email
• Telnet
• FTP
• Newsgroups
• World-Wide Web
• Chat
• ...
The Web
Web Browsers
Web browsers provide a very
convenient interface for viewing the
information stored on the Web
• Mosaic, the first widely used graphical
browser, was introduced in 1993 and sharply
increased the popularity of the Web

TCP/IP

TCP/IP is the common language of the
Internet
• IP – Internet Protocol
• TCP – Transmission Control Protocol


The IP protocol transmits packets of
data from one host (computer) to
another
The TCP protocol uses many packets to
transmit a long stream of data
TCP vs. IP

IP routes each packet from the source
host to the destination host
• IP is oblivious to the fact that usually each
packet is part of a data stream

TCP correctly handles a long data
stream
• Divides a long data stream into many
packets, at the source
• Reassembles the packets, in the right
order, at the destination
• Handles errors and lost data
Sockets



Sockets are a common interface that
makes TCP streams look like file streams
Modern programming languages
support sockets
A read from or a write to a
socket may block
• Until data arrives, or
• Until data can be sent

Use multiple threads so that blocking
will not cause the whole GUI to freeze
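
The advice above can be sketched in Python. This is only an illustration under stated assumptions: the echo server, the port choice and the function names (start_echo_server, fetch_in_background, demo) are hypothetical and not part of the course material. The blocking read happens in a worker thread, so the main thread (where a GUI would run) is never stuck waiting for data.

```python
import queue
import socket
import threading


def start_echo_server():
    """Start a tiny local TCP server (hypothetical helper); return its port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def serve_one():
        conn, _ = listener.accept()      # blocks until a client connects
        with conn:
            data = conn.recv(1024)       # blocks until data arrives
            conn.sendall(data.upper())   # may block until data can be sent
        listener.close()

    threading.Thread(target=serve_one, daemon=True).start()
    return port


def fetch_in_background(port, message, results):
    """Worker thread: all blocking socket I/O happens here."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
        stream = sock.makefile("rwb")    # file-like view of the TCP stream
        stream.write(message)
        stream.flush()
        sock.shutdown(socket.SHUT_WR)    # signal that we are done sending
        results.put(stream.read())       # blocks until the server closes


def demo():
    results = queue.Queue()
    port = start_echo_server()
    worker = threading.Thread(
        target=fetch_in_background, args=(port, b"hello", results)
    )
    worker.start()
    # The main thread stays free here while the worker blocks on the socket.
    worker.join(timeout=5)
    return results.get(timeout=5)
```

Calling demo() returns the bytes the server sent back; a real GUI program would poll the queue from its event loop instead of joining the worker.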
A Short Overview of How the
Web Works
Web Servers
Pieces of information are stored on the
Web as HTML pages
 These HTML pages are stored as files
on particular hosts of the Internet
 These hosts are called Web servers
 Each server runs an HTTP-daemon in
order to make its HTML pages available
to other hosts

Browsers
• We use a browser to display HTML
pages
• The browser is responsible for
fetching the HTML pages and
displaying their contents according
to the HTML rules
HTTP Daemons
An HTTP-daemon is an application that
runs constantly on the server and
waits for requests from remote hosts
• A host can ask the daemon for an
HTML page (a file) that is located on the
server
 Technically, any host connected to the
Internet can act as a Web server by
running an HTTP-daemon application

Browser - HTTPD Interaction
[Figure: the user requests http://www.cs.huji.ac.il/index.html; the browser sends GET /index.html to the host www.cs.huji.ac.il; the HTTPD application reads the file from disk and sends the content of index.html to the browser]
Browser - HTTPD Interaction





The user requests
http://www.cs.huji.ac.il/index.html
The browser contacts the HTTP-daemon
running on the host www.cs.huji.ac.il and
requests the HTML page /index.html
The HTTP-daemon translates the requested
name to a specific file in its local file system
The HTTP-daemon reads the file index.html
from the disk and sends the content of the file
to the browser
The browser receives the HTML page, parses it
according to the HTML rules and displays it
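
The interaction described above can be tried locally with a minimal sketch. It assumes Python's standard http.server module stands in for the HTTP-daemon and a temporary directory stands in for the server's file system; the function name serve_and_fetch is hypothetical.

```python
import functools
import http.server
import pathlib
import tempfile
import threading
import urllib.request


def serve_and_fetch(html_text):
    """Serve one HTML file from a temporary directory and fetch it back."""
    with tempfile.TemporaryDirectory() as doc_root:
        # The "disk" of the server: a single file named index.html.
        pathlib.Path(doc_root, "index.html").write_text(html_text, encoding="utf-8")

        # The daemon maps the requested name /index.html to a file in doc_root.
        handler = functools.partial(
            http.server.SimpleHTTPRequestHandler, directory=doc_root
        )
        daemon = http.server.HTTPServer(("127.0.0.1", 0), handler)
        threading.Thread(target=daemon.serve_forever, daemon=True).start()
        try:
            port = daemon.server_address[1]
            # Play the browser's role; an empty ProxyHandler forces direct access.
            opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
            url = f"http://127.0.0.1:{port}/index.html"
            with opener.open(url, timeout=5) as reply:
                return reply.status, reply.read().decode("utf-8")
        finally:
            daemon.shutdown()
            daemon.server_close()
```

A real browser would go on to parse and display the returned HTML; here the page text is simply returned to the caller.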
Proxy Servers
A proxy server acts on behalf of
browsers when accessing the Web
• The browser passes its request for a
document to the proxy
• The proxy contacts the appropriate Web
server and fetches the document on
behalf of the browser

Proxy Server
[Figure: the user requests a document; the browser requests the document from the proxy server; the proxy application, which has a cache, asks the HTTPD for the document and sends the content of index.html to the browser]
Advantages of Proxy Servers

Proxy servers have several advantages
over direct access:
• They can be combined with a firewall to
enable restricted access to the Internet
• They enable caching of popular documents
• They can extend the functionality of the
browser by translating from one protocol
to another (for example, from FTP to HTTP
and vice versa)
Dynamically Generated
Documents
[Figure: the user requests http://www.excite.com/search?what=something; the browser sends GET /search?what=something to the host www.excite.com; the HTTPD application executes a search program and sends the resulting content to the browser]
IP Addresses, Host Names
and URLs
IP Addresses
Every host connected to the Internet
has a unique IP address that identifies it
 IP addresses are 32-bit numbers that
are usually written as four decimal
numbers separated by dots, e.g.
135.17.98.240, where the numbers refer
to the four bytes composing this
address
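
As a small illustration of the dotted notation, the sketch below combines the four bytes into one 32-bit number and cross-checks the result with Python's standard ipaddress module. The function name dotted_to_int is an assumption made for this example.

```python
import ipaddress


def dotted_to_int(dotted):
    """Combine the four bytes of a dotted-decimal address into a 32-bit integer."""
    b1, b2, b3, b4 = (int(part) for part in dotted.split("."))
    return (b1 << 24) | (b2 << 16) | (b3 << 8) | b4


address = "135.17.98.240"
as_number = dotted_to_int(address)

# Cross-check both directions against the standard library.
assert as_number == int(ipaddress.IPv4Address(address))
assert str(ipaddress.IPv4Address(as_number)) == address
```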

Internet Addresses
Many hosts have, in addition to an IP
address, a human-readable Internet
address (or hostname)
Here are some examples of Internet
addresses:
www.cs.huji.ac.il
www.cocacola.com
shum.cc.huji.ac.il
The first part is the name of a particular
host (i.e., computer)
 The rest is the domain name

Internet Addresses (cont’d)

Hostnames have a hierarchical
structure
www.cs.huji.ac.il
www is a computer in the Dept. of
Computer Science (cs) at the Hebrew
University of Jerusalem (huji), which
belongs to the academic domain (ac) of Israel (il)

The rightmost name is the top-level
domain of the host (il - Israel); to its left
is a sub-domain, and further
to the left are more specific sub-domains
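
The right-to-left hierarchy can be made explicit with a few lines of Python. This is a sketch only; the function name domain_chain is hypothetical.

```python
def domain_chain(hostname):
    """List the domains a hostname belongs to, from most general to most specific."""
    labels = hostname.split(".")
    # Start with the rightmost label and add one label to the left each step.
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


chain = domain_chain("www.cs.huji.ac.il")
```

For www.cs.huji.ac.il the chain starts at il, continues through ac.il, huji.ac.il and cs.huji.ac.il, and ends with the full hostname.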
Generic Domains

There are 7 special domains that are
called generic domains
• com - commercial organizations
(www.cocacola.com)
• edu - educational institutions
(www.berkeley.edu)
• gov - U.S. governmental organizations
(www.cia.gov)
• int - international organizations
• mil - U.S. military
• net - networks (InterNIC)
• org - other organizations (www.w3.org)
Country Domains

Generic domains usually refer to hosts inside
the U.S. Other countries use two-letter
country domains:
• il - Israel
• uk - United Kingdom
• jp - Japan
• se - Sweden

These domains usually have sub-domains
that correspond to the generic domains; for
example, co.il is the domain of all the
commercial organizations in Israel, and ac.il
is the domain of all the academic institutions
inside Israel
URLs
Each piece of information on the Web has
a unique identifying address, which is
called a URL (Uniform Resource
Locator)
• A URL takes the following form:
  http://www.huji.ac.il/index.html
  protocol: http
  hostname: www.huji.ac.il
  file: /index.html
• It has 3 parts: a protocol field, a
hostname field and a file field
URL Fields
The protocol field (“http” in the previous
example) specifies the way in which the
information should be accessed
 The host field specifies the host on
which the information is found
 The file field specifies the particular
location in the host's file system where
the file is found
 More complex forms of URLs are
possible
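
The three fields can be extracted with Python's standard urllib.parse module, as in this short sketch. The function name url_fields is an assumption; note that the library calls the protocol field the "scheme" and the file field the "path".

```python
from urllib.parse import urlsplit


def url_fields(url):
    """Return the (protocol, hostname, file) fields of a simple URL."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.path


protocol, hostname, file_field = url_fields("http://www.huji.ac.il/index.html")
```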

Using IP Addresses
How does the browser know the IP
address of the Web server?
 One possibility is that the user explicitly
specifies the IP address of the server in
the host field of the URL, for example:
http://135.17.98.240/index.html
 However, it is inconvenient for people to
remember such addresses

Back to the Browser
When we address a host on the Internet,
we usually use its hostname (e.g., using
a hostname in a URL)
 The browser needs to map this
hostname into the corresponding IP
address of the given host
 There is no one-to-one correspondence
between the sections of an IP address
and the sections of a hostname

Translating Hostnames to
IP Addresses
The translation of hostnames to
IP addresses requires a lookup table
 Since there are millions of hosts on the
Internet, it is not feasible for the browser
to hold a table which maps all
hostnames to their IP-addresses
 Moreover, new hosts are added to the
Internet every day and hosts change
their names

DNS
• The browser (and other Internet
applications) uses a DNS server to
map hostnames to IP addresses
• DNS (Domain Name System) is a
hierarchical scheme for naming
hosts
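
From a program's point of view, the lookup is usually a single call to the operating system's resolver, which consults DNS when needed. A minimal sketch using Python's standard socket module follows; the function name lookup is hypothetical.

```python
import socket


def lookup(hostname):
    """Ask the system resolver (which may query a DNS server) for an IPv4 address."""
    return socket.gethostbyname(hostname)


# "localhost" is normally resolved on the machine itself, without a DNS query.
local_address = lookup("localhost")
```

Calling lookup("www.cs.huji.ac.il") would perform a real DNS query and therefore needs network access; an address that is already in dotted form is returned unchanged.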
Basic Networking
Concepts
Local-Area Networks
A Local-Area Network
(LAN) covers a small
distance and a small
number of computers
A LAN often connects the machines
in a single room, floor or building
LANs (Local-Area Networks)
• Limited size
• Privately owned
• Centrally managed
• Hosts usually physically connected via
cables
• Homogeneous devices & protocols
• Known features (latency, bandwidth, ...)
WANs (Wide-Area Networks)
Wide-Area Networks
A Wide-Area Network (WAN)
connects two or more LANs,
often over long distances
A LAN is usually owned
by one organization, but
a WAN often connects
different groups in
different countries
Measures

Bandwidth
• Kbps, Mbps, Gbps – Kilo, Mega, Giga bits
per second
• To convert to KBps, MBps, GBps (Bytes
per second) divide by 8; as a rule of thumb,
divide by 10 to allow for overhead

Latency
• Initial delay for the first useful bit to go from
the source to the destination
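
The two measures combine into a simple estimate of transfer time. The sketch below is a rough model under stated assumptions: a byte is 8 bits, dividing by 10 instead is the rule of thumb that allows for overhead, and the function names are hypothetical.

```python
def bytes_per_second(bits_per_second, allow_for_overhead=False):
    """Convert a bit rate to a byte rate (exactly /8, or /10 as a rule of thumb)."""
    divisor = 10 if allow_for_overhead else 8
    return bits_per_second / divisor


def transfer_time(size_bytes, bandwidth_bps, latency_s):
    """Estimate seconds until the last bit arrives: initial delay plus sending time."""
    return latency_s + (size_bytes * 8) / bandwidth_bps


# Example with made-up numbers: 1 MB over an 8 Mbps link with 0.5 s latency.
seconds = transfer_time(1_000_000, 8_000_000, 0.5)
```

For small transfers the latency term dominates, which is one reason latency can matter more than bandwidth.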
Bandwidth vs. Latency

Which technology provides the
largest bandwidth between Tel Aviv
and NY?
• A jumbo jet loaded with DVDs
• But the latency is terrible (20 hours)

Latency is at times more important
and is generally harder to improve
than bandwidth
What is a protocol?
06 7647834
Welcome to Mount Hermon
ski site. For ski conditions
press 1, for reservation of ski
packages press 5, ...
5
Please select the type
of your credit card.
For Visa press 1, ...
Layering
[Figure: a stack of layered protocols: models protocol, sketches protocol, CAD protocol, modem protocol]
TCP/IP
 A protocol is a set of rules that determine how
things communicate with each other
 The software which manages Internet
communication follows a suite of protocols
called TCP/IP
 The Internet Protocol (IP) determines the
format of the information as it is transferred
 The Transmission Control Protocol (TCP)
dictates how messages are reassembled and
handles lost information
TCP/IP protocol suite
Layer         Protocols
Application   HTTP, FTP, TELNET, ...
Transport     TCP, UDP
Internet      IP
Link          Ethernet, Token-Ring, ...
[Figure: TCP/IP protocol suite. Taken from "TCP/IP Illustrated Vol. 1" / Richard Stevens]
[Figure: Packet headers. Taken from "TCP/IP Illustrated Vol. 1" / Richard Stevens]
IP Layer
• Transmission of packets between two
hosts
• IP addresses
• Routing protocol


IP Addresses
[Figure: a 32-bit IP address divided into Class, Network ID and Host ID fields; the figure also carries the label "InterNIC"]
Class  From       Till             Net ID  Host ID
A      0.0.0.0    127.255.255.255  7 bit   24 bit
B      128.0.0.0  191.255.255.255  14 bit  16 bit
C      192.0.0.0  223.255.255.255  21 bit  8 bit
D      224.0.0.0  239.255.255.255  28 bit  -
E      240.0.0.0  247.255.255.255  27 bit  -
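
Because the class is encoded in the leading bits, it can be read off the first byte of the address. The sketch below follows the ranges in the table, with class C taken to end at 223.255.255.255 (the address just before class D begins). The function name address_class is hypothetical, and treating everything from 240 upward as class E is a simplifying assumption.

```python
def address_class(dotted):
    """Return the classful address class (A-E) based on the first byte."""
    first = int(dotted.split(".")[0])
    if first < 128:
        return "A"   # 0.0.0.0   - 127.255.255.255
    if first < 192:
        return "B"   # 128.0.0.0 - 191.255.255.255
    if first < 224:
        return "C"   # 192.0.0.0 - 223.255.255.255
    if first < 240:
        return "D"   # 224.0.0.0 - 239.255.255.255
    return "E"       # 240.0.0.0 and above (simplifying assumption)
```

For example, 135.17.98.240 falls in the class B range.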
Routing
Routing Principles

A router sits on two or more LANs
• It routes packets between LANs



A router does not have a global,
end-to-end picture of the route a
packet should take
Routing is done hop by hop
“Best Effort” Delivery
• No guarantee of delivery
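
Hop-by-hop routing can be illustrated with a toy model in which each router knows only the next hop for each destination, never the whole route. The topology, the router names (R1, R2, R3) and the LAN names are invented for this sketch.

```python
# Hypothetical topology: per-router tables mapping a destination LAN to the next hop.
NEXT_HOP = {
    "R1": {"lan-a": "deliver", "lan-b": "R2", "lan-c": "R2"},
    "R2": {"lan-a": "R1", "lan-b": "deliver", "lan-c": "R3"},
    "R3": {"lan-a": "R2", "lan-b": "R2", "lan-c": "deliver"},
}


def route(first_router, destination_lan, max_hops=16):
    """Forward a packet hop by hop; return (routers visited, delivered?)."""
    path = [first_router]
    router = first_router
    for _ in range(max_hops):
        next_hop = NEXT_HOP[router].get(destination_lan)
        if next_hop is None:
            return path, False        # no route known: the packet is dropped
        if next_hop == "deliver":
            return path, True         # destination LAN is directly attached
        router = next_hop             # hand the packet to the next router
        path.append(router)
    return path, False                # hop limit reached: the packet is dropped
```

A dropped packet is not reported to anyone in this model, which mirrors "best effort" delivery; recovering lost data is left to a higher layer such as TCP.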
Router Protocol


Routers constantly talk to each
other to collectively decide which
routes are best
Routers can dynamically adjust
things as congestion appears or if a
link or router goes down
Transport Layer
• TCP
  • Connection oriented
  • Reliable, keeps order
• UDP
  • Connectionless
  • Unreliable
  • Fast
Client-Server Model
Server application
Port
Server machine
144.12.34.99
Client application
Client machine
190.30.42.155
Well-Known Ports
• FTP 21
• Telnet 23
• HTTPD 80
• ...
Firewalls
A firewall imposes restrictions on the
traffic into or out of a local-area
network
Examples:
• Hides sensitive data from the outside
world
• Prevents local users from accessing
specific sites outside the local-area
network
How a Firewall Works
All the traffic (of IP-packets) in or
out of the local-area network is
forced to go through a single host
A firewall application is installed on
this host
The firewall examines all incoming and
outgoing IP packets and
discards disallowed packets
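
The idea of examining each packet can be sketched as a simple filter function. Everything specific here is an assumption made for illustration: the local network range, the blocked outside address and the single allowed inbound port are invented, and real firewalls inspect many more fields.

```python
import ipaddress

# Hypothetical policy values, chosen only for this example.
LOCAL_NET = ipaddress.ip_network("144.12.34.0/24")
BLOCKED_OUTSIDE = {ipaddress.ip_address("190.30.42.155")}
ALLOWED_INBOUND_PORTS = {80}


def allow(src, dst, dst_port):
    """Decide whether a packet may cross the firewall (True) or is discarded (False)."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    src_local = src_ip in LOCAL_NET
    dst_local = dst_ip in LOCAL_NET

    if src_local and not dst_local:
        # Outgoing traffic: prevent local users from reaching specific sites.
        return dst_ip not in BLOCKED_OUTSIDE
    if dst_local and not src_local:
        # Incoming traffic: expose only selected services to the outside world.
        return dst_port in ALLOWED_INBOUND_PORTS
    # Purely local traffic is allowed; anything else should not appear here.
    return src_local and dst_local
```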
HTTP Protocol, Server-Side
and Client-Side Technologies
CGI, Servlets, JSP, JavaScript
HTTP Protocol
• Hypertext Transfer Protocol
• Used between Web clients (e.g.,
browsers) and Web servers (and
proxies)
• Text based
• Built on top of TCP
• Stateless protocol
HTTP Transaction -- Client
Client request:
• Sends a request line
GET /index.html HTTP/1.0
• Sends optional header information
User-Agent: browser name
Accept: formats the browser understands
...
• Sends a blank line (\r\n)
• Can send POST data
HTTP Transaction -- Server
Server response:
• Sends a status line
HTTP/1.0 200 OK
• Sends header information
Content-type: text/html
Content-length: 3022
...
• Sends a blank line (\r\n)
• Sends the document data
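
Since HTTP is text based, both sides of the transaction can be sketched with plain string handling. The functions below are illustrative only (the names build_request and parse_response are assumptions); they build the bytes a client would send and split a server response into its status line, headers and document data. Lines are terminated with CRLF, as HTTP specifies.

```python
def build_request(path, headers):
    """Build an HTTP/1.0 GET request: request line, headers, then a blank line."""
    lines = [f"GET {path} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_response(raw):
    """Split a raw response into (status code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")   # the blank line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body
```

In practice these bytes would be written to and read from a TCP socket connected to port 80 of the server.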
Reacting to Responses of
Clients
HTML pages are static documents
 To achieve interaction with the user,
there is a need for Internet tools and
techniques that get input from the user
and react according to this input
 Sometimes there is a need to produce
output as a result of querying a
database. The output in this case is not
known in advance

Server Technologies
Some Web applications use online input to
create pages on the fly (for example, search
engines)
 A request will include, in addition to the URL
of the service provider, a list of parameters
 For example,
http://www.google.com/search?q=search-word
 The creation of the pages may also require
interaction with some applications (for
example, database queries)
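
On the server side, the parameter list is recovered from the part of the URL after the question mark. A minimal sketch with Python's standard urllib.parse module follows; the function name request_parameters is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def request_parameters(url):
    """Return the request parameters of a URL as a dict of name -> list of values."""
    return parse_qs(urlsplit(url).query)


params = request_parameters("http://www.google.com/search?q=search-word")
```

Each name maps to a list because the same parameter may appear more than once in a request.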

Creating Pages on the Fly
in the Server

There are four common ways to serve
page requests that include input
parameters:
• CGI (Common Gateway Interface)
programming
• Java Servlets
• JSP -- Java Server Pages, or
• Microsoft ASP -- Active Server Pages
(similar to JSP)
CGI Programming
CGI is an interface between the Web server and
external programs, not a language in itself
A CGI script is a program that
runs on the server and creates HTML code
An early technology
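
A CGI program receives the request parameters through environment variables and writes its reply (headers, a blank line, then the page) to standard output. The sketch below shows the core of such a program in Python. The function name cgi_response and the parameter name "what" are assumptions made for this example; QUERY_STRING is the environment variable defined by the CGI convention.

```python
import html
from urllib.parse import parse_qs


def cgi_response(environ):
    """Produce the full CGI output for a request described by `environ`."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    what = params.get("what", ["nothing"])[0]
    # Escape user input before placing it inside the generated HTML.
    body = (
        "<html><body><p>You searched for: "
        f"{html.escape(what)}</p></body></html>"
    )
    return "Content-Type: text/html\r\n\r\n" + body
```

Run as an actual CGI script, the program would end with print(cgi_response(os.environ)) so that the Web server can relay the output to the browser.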
Java Servlets
Servlets are Java programs that some
Web servers can run
A Servlet creates pages on the fly and
these pages are returned to the requesting
browser
JSP and ASP

JSP (Java Server Pages)
• Create an HTML page that has Java code inside
HTML tags
 This page is actually a template
 The code, for example, could issue a database
query and create an HTML table for the result
• The Web server executes the code in the template
and produces a pure HTML page that is returned
to the client

Microsoft ASP (Active Server Pages)
• The code is written in a scripting language, typically VBScript (Visual Basic script)
• The Web server must be Microsoft's IIS server
Client Technologies





Some technologies interact with the user at
the client level (in the Web browser)
JavaScript is a scripting language that can
be added to HTML pages
Web browsers can run the script and change
the output accordingly
The script has limited interaction with
the client's file system, through cookies
Cookies are small files that store some
personal information in the file system of the
client
Separating Content from Style
XML and Style Sheets
Separating Content from Style

In HTML, the contents and the style of
pages are intertwined
• Many HTML tags describe presentation rather than meaning
• XML (eXtensible Markup Language) is a
new markup language for marking the
semantics (meaning) of the data
• XML tags describe the meaning of each
portion of text in an XML document

XML Tags
XML tags are similar to attributes in a
relation
 However, the attributes are the same for
all the records of the relation
 In XML documents, each portion of text
has its own tag

• <course> databases </course>
• <course> operating systems </course>

XML tags can be nested
Parsing XML Documents
XML facilitates easy parsing of
documents according to their semantics
 For example, the CS Department has
many Web pages of courses
 Can we write a program that reads all
these pages and prints a list of the
names of courses?
 If XML tags are used, it is easy to do
that
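
With semantic tags, the course-listing program is only a few lines long. The sketch below uses Python's standard xml.etree.ElementTree module; the enclosing <courses> element and the function name course_names are assumptions added for the example, since an XML document needs a single root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical page content; only the <course> tags come from the slides.
PAGE = """<courses>
  <course> databases </course>
  <course> operating systems </course>
</courses>"""


def course_names(xml_text):
    """Return the text of every <course> element, at any nesting depth."""
    root = ET.fromstring(xml_text)
    return [element.text.strip() for element in root.iter("course")]


names = course_names(PAGE)
```

Doing the same with presentation-oriented HTML would require guessing which headings or table cells happen to contain course names.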

Using XML
• XML is important in the context of
data exchange between
applications
• It is possible to define a common
set of tags that are suited for
specific applications
• For example, MathML is used for
exchanging mathematical
information
Showing XML Documents in
Browsers
XML documents contain data with
semantic tags
For a graphical representation,
information about the style must be
added
• For example, HTML tags provide
information about the style
Style Sheets
Style is added to XML documents
by means of style sheets
There are two style-sheet
languages
• CSS -- Cascading Style Sheets
  • Describes how to show the data graphically
• XSL -- Extensible Stylesheet Language
  • Can also transform the data
Putting it All Together

A common architecture for Web
applications has several tiers
• DBMS (database management system) for
storing and processing information
• A Web server for producing pages as a
result of client requests
• A browser that supports JavaScript
(for creating dynamic
pages) and CSS (for creating the desired
visual output)
How Should XML be Used?




How can we query XML documents easily
and effectively?
How can we store XML documents
efficiently?
What is the proper way to include other
resources in XML documents (e.g., figures,
sounds, etc.)?
How can we use
• a general style, and
• information that is semantically well defined
without making the process of creating
documents too cumbersome?
Course topics

Server-side programming
• JDBC for connecting to the DBMS
• Servlets
• JSP

Client-side programming
• JavaScript
• CSS

Data storage and processing on the Web
• XML
• XSL
Search Engines
• What are search engines?
• How do they work?
• Shortcomings of search engines
• Some popular search engines:
Infoseek, HotBot, AltaVista, Excite,
Lycos, Yahoo!, Ask Jeeves, ...