ppt - School of Engineering and Computer Science
Basic Internet and
Networking Concepts
Representation and Management
of Data on the Internet
The Internet and the
World-Wide Web
TCP/IP and Web Browsers
The Internet and the Web
Internet means Inter-Network
• A world-wide network of many LANs (local-area networks)
• The LANs are of various types
Web means World-Wide Web
• A large collection of information arranged
as hypertext and stored in many computers
that are part of the Internet
The two are related but not identical: the Web is one of many applications that run on the Internet
A Bit of History
The Internet grew very rapidly
throughout the 1980s and 90s
Fewer than 600 computers were connected
to the Internet in 1983
Now there are tens (if not hundreds) of
millions of computers
The Web started in 1989 and grew very
rapidly during the 1990s
The current Web has billions of pages
Internet Applications
Email
Telnet
FTP
Newsgroups
Chat
World-Wide Web
...
The Web
Web Browsers
Web browsers provide a very
convenient interface for viewing the
information stored on the Web
Mosaic, the first widely used graphical
browser, was introduced in 1993 and sharply
increased the popularity of the Web
TCP/IP
TCP/IP is the common language of the
Internet
• IP – Internet Protocol
• TCP – Transmission Control Protocol
The IP protocol transmits packets of
data from one host (computer) to
another
The TCP protocol uses many packets to
transmit a long stream of data
TCP vs. IP
IP routes each packet from the source
host to the destination host
• IP is oblivious to the fact that usually each
packet is part of a data stream
TCP correctly handles a long data
stream
• Divides a long data stream into many
packets, at the source
• Reassembles the packets, in the right
order, at the destination
• Handles errors and lost data
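The division of labour above can be sketched in a few lines. This is a simplified illustration, not how a real TCP stack is implemented: the function names and the packet size are assumptions, and real TCP numbers bytes rather than packets.

```python
import random

def split_into_packets(data: bytes, size: int) -> list[tuple[int, bytes]]:
    """Cut a byte stream into (sequence_number, payload) packets."""
    return [(seq, data[i:i + size])
            for seq, i in enumerate(range(0, len(data), size))]

def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Sort packets by sequence number and join their payloads."""
    return b"".join(payload for _, payload in sorted(packets))

stream = b"A long stream of data sent over the Internet"
packets = split_into_packets(stream, size=8)   # done at the source
random.shuffle(packets)                        # IP may deliver packets out of order
restored = reassemble(packets)                 # done at the destination
```

The sequence numbers are what let the destination restore the original order, whatever order the packets arrive in.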
Sockets
Sockets are a common interface that
makes TCP streams look like file streams
Modern programming languages
support sockets
A read from or a write to a
socket may block
• Until data arrives, or
• Until data can be sent
Use multiple threads so that blocking
will not cause the whole GUI to freeze
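A minimal sketch of blocking socket calls moved off the main thread, using Python's standard library. The helper name `echo_upper` is an assumption for illustration, and `socket.socketpair()` stands in for a real client/server connection so the example runs without a network.

```python
import socket
import threading

def echo_upper(conn: socket.socket) -> None:
    """Worker: block until data arrives, then send it back upper-cased."""
    with conn:
        data = conn.recv(1024)       # blocks until the peer writes
        conn.sendall(data.upper())   # blocks until the data can be sent

# Two already-connected sockets, standing in for a client and a server.
client, server = socket.socketpair()
worker = threading.Thread(target=echo_upper, args=(server,))
worker.start()                       # the blocking calls run off the main thread

with client:
    client.sendall(b"hello")
    reply = client.recv(1024)
worker.join()
```

In a GUI program the main thread would keep handling events while a worker like this waits on the socket.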
A Short Overview of How the
Web Works
Web Servers
Pieces of information are stored on the
Web as HTML pages
These HTML pages are stored as files
on particular hosts of the Internet
These hosts are called Web servers
Each server runs an HTTP-daemon in
order to make its HTML pages available
to other hosts
Browsers
We
use a browser to display HTML
pages
The browser is responsible for
fetching the HTML pages and
displaying their contents according
to the HTML rules
HTTP Daemons
An HTTP-daemon is an application that
is constantly running on the server and
waits for requests from remote hosts
A host can ask the daemon for an
HTML page (a file) that is located on the
server
Technically, any host connected to the
Internet can act as a Web server by
running an HTTP-daemon application
Browser - HTTPD Interaction
[Figure: the user requests http://www.cs.huji.ac.il/index.html; the browser sends GET /index.html to the HTTPD application on host www.cs.huji.ac.il; the HTTPD reads index.html from disk and sends its content to the browser]
Browser - HTTPD Interaction
The user requests
http://www.cs.huji.ac.il/index.html
The browser contacts the HTTP-daemon
running on the host www.cs.huji.ac.il and
requests the HTML page /index.html
The HTTP-daemon translates the requested
name to a specific file in its local file system
The HTTP-daemon reads the file index.html
from the disk and sends the content of the file
to the browser
The browser receives the HTML page, parses it
according to the HTML rules and displays it
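The exchange described above can be tried locally with Python's standard library. This is only a sketch under stated assumptions: `PAGES` and `PageHandler` are hypothetical names, the dictionary stands in for files on the server's disk, and the daemon listens on the local machine rather than on www.cs.huji.ac.il.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stands in for the HTML files stored on the server's disk.
PAGES = {"/index.html": b"<html><body>Hello</body></html>"}

class PageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGES.get(self.path)          # translate the requested name to a "file"
        if body is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)               # send the content to the browser

    def log_message(self, format, *args):
        pass                                 # keep the example quiet

httpd = HTTPServer(("127.0.0.1", 0), PageHandler)   # port 0: the OS picks a free port
threading.Thread(target=httpd.serve_forever, daemon=True).start()

# The "browser" side: connect to the daemon and request /index.html.
connection = http.client.HTTPConnection("127.0.0.1", httpd.server_address[1])
connection.request("GET", "/index.html")
response = connection.getresponse()
status = response.status
page = response.read()
connection.close()
httpd.shutdown()
httpd.server_close()
```

A real browser would go on to parse `page` as HTML and render it.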
Proxy Servers
A proxy server acts as a delegate of
browsers for accessing the Web
The browser transfers the request for a
document to the Proxy
The Proxy contacts the suitable Web server and fetches the document on
behalf of the browser
Proxy Server
[Figure: the user requests a document; the browser requests the document from the proxy; the proxy application, which keeps a cache, asks the HTTPD for the document and sends the content of index.html to the browser]
Advantages of Proxy Servers
Proxy servers have several advantages
over direct access:
• They can be combined with a firewall to
enable restricted access to the Internet
• They enable caching of popular documents
• They can enlarge the functionality of the
browser by translating from one protocol
to another (for example, from FTP to HTTP
and vice-versa)
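The caching advantage can be illustrated with a small sketch. The class `CachingProxy` and the stand-in `fake_web_server` are hypothetical names; a real proxy would also honour expiry rules, which this example ignores.

```python
class CachingProxy:
    """Fetches documents on behalf of browsers and remembers them."""

    def __init__(self, fetch):
        self._fetch = fetch          # function that contacts the Web server
        self._cache = {}             # URL -> document
        self.origin_requests = 0     # how often the Web server was contacted

    def get(self, url: str) -> str:
        if url not in self._cache:
            self.origin_requests += 1
            self._cache[url] = self._fetch(url)
        return self._cache[url]

def fake_web_server(url: str) -> str:
    """Hypothetical stand-in for a remote HTTP-daemon."""
    return f"<html>content of {url}</html>"

proxy = CachingProxy(fake_web_server)
first = proxy.get("http://www.cs.huji.ac.il/index.html")
second = proxy.get("http://www.cs.huji.ac.il/index.html")   # served from the cache
```

After both calls the Web server has been contacted only once, which is the point of caching popular documents.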
Dynamically Generated
Documents
[Figure: the user requests http://www.excite.com/search?what=something; the browser sends GET /search?what=something to host www.excite.com; the HTTPD application executes a search program and sends the resulting page to the browser]
IP Addresses, Host Names
and URLs
IP Addresses
Every host connected to the Internet
has a unique IP address that identifies it
IP addresses are 32-bit numbers that
are usually written as four decimal
numbers separated by dots, e.g.
135.17.98.240, where the numbers refer
to the four bytes composing this
address
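As a small worked example of the dotted notation, the following sketch converts between the four-byte form and the underlying 32-bit number. The function names are illustrative only.

```python
def ip_to_int(address: str) -> int:
    """Pack a dotted-quad address into one 32-bit number."""
    parts = [int(p) for p in address.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a dotted-quad IPv4 address: {address!r}")
    value = 0
    for byte in parts:
        value = (value << 8) | byte      # each number is one byte of the address
    return value

def int_to_ip(value: int) -> str:
    """Unpack a 32-bit number into dotted-quad notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

number = ip_to_int("135.17.98.240")
text = int_to_ip(number)
```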
Internet Addresses
Many hosts have, in addition to an IP
address, a human-readable Internet
address (or hostname)
Here are some examples of Internet
Addresses:
www.cs.huji.ac.il
www.cocacola.com
shum.cc.huji.ac.il
The first part is the name of a particular
host (i.e., computer)
The rest is the domain name
Internet Addresses (cont’d)
Hostnames have a hierarchical
structure
www.cs.huji.ac.il
www is a computer in the Dept. of
Computer Science (cs) at the Hebrew
University of Jerusalem (huji), which
is an academic institution (ac) in Israel (il)
The rightmost name describes the main
domain of the host (il - Israel); to its left
there is a sub-domain, and further
to the left there are more specific sub-domains
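Reading a hostname from right to left can be sketched as follows. The function name `domain_chain` is an assumption made for this illustration.

```python
def domain_chain(hostname: str) -> list[str]:
    """List the domains a hostname belongs to, from most general to most specific."""
    labels = hostname.split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]

chain = domain_chain("www.cs.huji.ac.il")
# The first entry is the main domain, the last is the full hostname.
```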
Generic Domains
There are 7 special domains that are
called generic domains
• com - commercial organizations
(www.cocacola.com)
• edu - educational institutions
(www.berkeley.edu)
• gov - U.S. governmental organizations
(www.cia.gov)
• int - international organizations
• mil - U.S. military
• net - networks (InterNIC)
• org - other organizations (www.w3.org)
Country Domains
Generic domains usually refer to hosts inside
the U.S. Other countries use two-letter
country domains:
• il - Israel
• uk - United Kingdom
• jp - Japan
• se - Sweden
These domains usually have sub-domains
that correspond to the generic domains; for
example, co.il is the domain of all the
commercial organizations in Israel, and ac.il
is the domain of all the academic institutions
inside Israel
URLs
Each information piece on the Web has
a unique identifying address which is
called a URL (Uniform Resource
Locator)
A URL takes the following form:
http://www.huji.ac.il/index.html
• protocol: http
• hostname: www.huji.ac.il
• file: /index.html
It has 3 parts: a protocol field, a
hostname field and a file field
URL Fields
The protocol field (“http” in the previous
example) specifies the way in which the
information should be accessed
The host field specifies the host on
which the information is found
The file field specifies the particular
location in the host's file system where
the file is found
More complex forms of URLs are
possible
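The three fields can be extracted with Python's standard `urllib.parse` module, shown here as a brief sketch. Note that the library calls the protocol field the "scheme" and the file field the "path".

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.huji.ac.il/index.html")
protocol = parts.scheme      # how the information should be accessed
hostname = parts.hostname    # the host on which the information is found
file_path = parts.path       # the location within that host
```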
Using IP Addresses
How does the browser know the IP
address of the Web server?
One possibility is that the user explicitly
specifies the IP address of the server in
the host field of the URL, for example:
http://135.17.98.240/index.html
However, it is inconvenient for people to
remember such addresses
Back to the Browser
When we address a host in the Internet,
we usually use its hostname (e.g., using
a hostname in a URL)
The browser needs to map this
hostname into the corresponding IP
address of the given host
There is no one-to-one correspondence
between the sections of an IP address
and the sections of a hostname
Translating Hostnames to
IP Addresses
The translation of hostnames to
IP addresses requires a lookup table
Since there are millions of hosts on the
Internet, it is not feasible for the browser
to hold a table which maps all
hostnames to their IP-addresses
Moreover, new hosts are added to the
Internet every day and hosts change
their names
DNS
The browser (and other Internet
applications) uses a DNS server to
map hostnames to IP addresses
DNS (Domain Name System) is a
hierarchical scheme for naming
hosts
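From a program's point of view, the lookup is a single call to the system resolver, which in turn queries DNS. The wrapper name `resolve` below is an assumption; `socket.gethostbyname` is the standard-library call.

```python
import socket

def resolve(hostname: str) -> str:
    """Ask the system resolver for an IPv4 address of the given hostname."""
    return socket.gethostbyname(hostname)
```

Calling `resolve("www.cs.huji.ac.il")` needs network access and returns whatever address DNS currently holds for that name, so no fixed result is shown here. A string that is already an IPv4 address is returned unchanged.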
Basic Networking
Concepts
Local-Area Networks
A Local-Area Network
(LAN) covers a small
distance and a small
number of computers
A LAN often connects the machines
in a single room, floor or building
LANs (Local-Area Networks)
Limited size
Privately owned
• Centrally managed
• Hosts are usually physically connected via
cables
• Homogeneous devices & protocols
• Known features (latency, bandwidth,..)
WANs (Wide-Area Networks)
A Wide-Area Network (WAN)
connects two or more LANs,
often over long distances
A LAN is usually owned
by one organization, but
a WAN often connects
different groups in
different countries
Measures
Bandwidth
• Kbps, Mbps, Gbps – Kilo, Mega, Giga bits
per second
• To estimate KBps, MBps, GBps (Bytes
per second), divide by 10: a byte is 8 bits, and
the extra factor allows for overhead
Latency
• Initial delay for the first useful bit to go from
the source to the destination
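The two measures can be combined in a rough estimate of how long a transfer takes. The divide-by-10 rule comes from the slide above; the formula "latency plus size over bandwidth" and the function names are simplifying assumptions for this sketch.

```python
def bytes_per_second(bits_per_second: float, divisor: int = 10) -> float:
    """Rule-of-thumb conversion: 8 bits per byte plus an allowance for overhead."""
    return bits_per_second / divisor

def transfer_time(size_bytes: float, bits_per_second: float,
                  latency_seconds: float) -> float:
    """Estimated seconds until the last byte arrives."""
    return latency_seconds + size_bytes / bytes_per_second(bits_per_second)

rate = bytes_per_second(100_000_000)                     # a 100 Mbps link
seconds = transfer_time(50_000_000, 100_000_000, 0.5)    # 50 MB, 0.5 s latency
```

For small transfers the latency term dominates, and for large ones the bandwidth term does.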
Bandwidth vs. Latency
Which technology provides the
largest bandwidth between Tel Aviv
and NY?
• A jumbo jet loaded with DVDs
• But the latency is terrible (20 hours)
Latency is at times more important
and is generally harder to improve
than bandwidth
What is a protocol?
06 7647834
Welcome to Mount Hermon
ski site. For ski conditions
press 1, for reservation of ski
packages press 5, ...
5
Please select the type
of your credit card.
For Visa press 1, ...
Layering
models protocol
sketches protocol
CAD protocol
modem protocol
TCP/IP
A protocol is a set of rules that determine how
things communicate with each other
The software which manages Internet
communication follows a suite of protocols
called TCP/IP
The Internet Protocol (IP) determines the
format of the information as it is transferred
The Transmission Control Protocol (TCP)
dictates how messages are reassembled and
handles lost information
TCP/IP protocol suite
• Application: HTTP, FTP, TELNET, ...
• Transport: TCP, UDP
• Internet: IP
• Link: Ethernet, Token-Ring, ...
TCP/IP protocol suite
[Figure taken from "TCP/IP Illustrated, Vol. 1" by Richard Stevens]
Packet headers
[Figure taken from "TCP/IP Illustrated, Vol. 1" by Richard Stevens]
IP Layer
Transmission of packets between two
hosts
IP addresses
Routing protocol
IP Addresses
An IP address has 32 bits: a class prefix, a network ID and a host ID
Network IDs are assigned by InterNIC

Class   From        Till              Net ID   Host ID
A       0.0.0.0     127.255.255.255   7 bit    24 bit
B       128.0.0.0   191.255.255.255   14 bit   16 bit
C       192.0.0.0   223.255.255.255   21 bit   8 bit
D       224.0.0.0   239.255.255.255   28 bit   -
E       240.0.0.0   247.255.255.255   27 bit   -
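Because the class is determined by the leading bits, it can be read off the first byte of an address. The following sketch uses the standard boundaries of classful addressing (128, 192, 224, 240); the function name is illustrative, and everything from 240 upward is treated as class E.

```python
def address_class(address: str) -> str:
    """Return the class (A to E) of a dotted-quad IPv4 address."""
    first = int(address.split(".")[0])
    if not 0 <= first <= 255:
        raise ValueError(f"invalid first byte in {address!r}")
    if first < 128:
        return "A"    # leading bit 0
    if first < 192:
        return "B"    # leading bits 10
    if first < 224:
        return "C"    # leading bits 110
    if first < 240:
        return "D"    # leading bits 1110
    return "E"        # leading bits 1111

example = address_class("135.17.98.240")
```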
Routing
Routing Principles
A router sits on two or more LANs
• It routes packets between LANs
A router does not have a global,
end-to-end picture of the route a
packet should take
Routing is done hop by hop
“Best Effort” Delivery
• No guarantee of delivery
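Hop-by-hop routing can be sketched with per-router tables that only name the next hop. The router names, network names and table contents below are invented for the illustration; they do not describe any real routing protocol.

```python
# Each router knows only the next hop towards a destination network.
NEXT_HOP = {
    "R1": {"net-B": "R2", "net-C": "R2"},
    "R2": {"net-B": "net-B", "net-C": "R3"},
    "R3": {"net-C": "net-C"},
}

def route(start_router: str, destination: str, max_hops: int = 16):
    """Follow next-hop entries; return the path taken, or None if the packet is dropped."""
    path = [start_router]
    current = start_router
    for _ in range(max_hops):
        next_hop = NEXT_HOP[current].get(destination)
        if next_hop is None:
            return None               # best effort: no entry, the packet is lost
        path.append(next_hop)
        if next_hop == destination:
            return path
        current = next_hop
    return None                       # too many hops

delivered = route("R1", "net-C")
dropped = route("R3", "net-B")
```

No single router holds the whole path; it emerges from the sequence of local decisions.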
Router Protocol
Routers constantly talk to each
other to collectively decide which
routes are best
Routers can dynamically adjust
things as congestion appears or if a
link or router goes down
Transport Layer
TCP
• Connection oriented
• Reliable, keeps order
UDP
• Connectionless
• Unreliable
• Fast
Client-Server Model
[Figure: a client application on the client machine (190.30.42.155) connects to a port of the server application on the server machine (144.12.34.99)]
Well-Known Ports
FTP 21
Telnet 23
HTTPD 80
...
Firewalls
A firewall poses restrictions on the
traffic in or out of a local-area
network
Examples:
Hides sensitive data from the outside
world
Prevents access of local users to
specific sites outside the local-area
network
How a Firewall Works
All the traffic (of IP-packets) in or
out of the local-area network is
forced to go through a single host
A firewall application is installed on
this host
The firewall examines all the in and
out traffic of IP-packets and
discards illegal packets
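A packet filter of this kind can be sketched as a function that inspects each packet and decides whether to pass it. All names, addresses and rules below are hypothetical and chosen only to mirror the examples above: local users are kept away from specific outside sites, and outsiders may reach only the Web port.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    source: str
    destination: str
    port: int

LOCAL_PREFIX = "10.0."                     # hypothetical local-area network
BLOCKED_DESTINATIONS = {"203.0.113.7"}     # hypothetical forbidden outside site
ALLOWED_INBOUND_PORTS = {80}               # only HTTP may enter

def is_allowed(packet: Packet) -> bool:
    """Return True if the firewall should pass the packet, False to discard it."""
    outbound = packet.source.startswith(LOCAL_PREFIX)
    if outbound:
        return packet.destination not in BLOCKED_DESTINATIONS
    return packet.port in ALLOWED_INBOUND_PORTS

blocked_site = is_allowed(Packet("10.0.0.5", "203.0.113.7", 80))
web_request = is_allowed(Packet("198.51.100.2", "10.0.0.5", 80))
```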
HTTP Protocol, Server-Side
and Client-Side Technologies
CGI, Servlets, JSP, JavaScript
HTTP Protocol
Hypertext Transfer Protocol
Used between Web-clients (e.g.,
browsers) and Web-servers (and
proxies)
Text based
Built on top of TCP
Stateless protocol
HTTP Transaction -- Client
Client request:
• Sends a request
GET /index.html HTTP/1.0
• Sends optional header information
User-Agent: browser name
Accept: formats the browser understands
...
• Sends a blank line (\r\n)
• Can send post data
HTTP Transaction -- Server
Server response:
• Sends a status line
HTTP/1.0 200 OK
• Sends header information
Content-type: text/html
Content-length: 3022
...
• Sends a blank line (\r\n)
• Sends the document data
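Because HTTP is text based, both halves of a transaction can be built and taken apart with ordinary string operations. The helper names below are assumptions for this sketch, the sample response is made up, and lines end with CRLF (`\r\n`) as the HTTP specification requires.

```python
def build_request(path: str, headers: dict[str, str]) -> str:
    """Format a GET request: request line, headers, then a blank line."""
    lines = [f"GET {path} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n"

def parse_response(raw: str) -> tuple[int, dict[str, str], str]:
    """Split a response into status code, headers and document data."""
    head, _, body = raw.partition("\r\n\r\n")       # the blank line ends the headers
    status_line, *header_lines = head.split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body

request = build_request("/index.html", {"User-Agent": "example-browser"})
status, headers, body = parse_response(
    "HTTP/1.0 200 OK\r\n"
    "Content-type: text/html\r\n"
    "Content-length: 13\r\n"
    "\r\n"
    "<p>Hello</p>\n"
)
```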
Reacting to Responses of
Clients
HTML pages are static documents
To achieve interaction with the user,
there is a need for Internet tools and
techniques that get input from the user
and react according to this input
Sometimes there is a need to produce
output as a result of querying a
database. The output in this case is not
known in advance
Server Technologies
Some Web applications use online input to
create pages on the fly (for example, search
engines)
A request will include, in addition to the URL
of the service provider, a list of parameters
For example,
http://www.google.com/search?q=search-word
The creation of the pages may also require
interaction with some applications (for
example, database queries)
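On the server side, the parameter list has to be separated from the rest of the URL before a page can be generated. A brief sketch with Python's standard `urllib.parse` module:

```python
from urllib.parse import urlsplit, parse_qs

parts = urlsplit("http://www.google.com/search?q=search-word")
service = parts.path              # which service was requested
params = parse_qs(parts.query)    # parameter names mapped to lists of values
```

Each parameter maps to a list because the same name may appear more than once in a request.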
Creating Pages on the Fly
in the Server
There are four common ways to serve
page requests that include input
parameters:
• CGI (Common Gateway Interface)
programming
• Java Servlets
• JSP -- Java Server Pages, or
• Microsoft ASP -- Active Server Pages
(similar to JSP)
CGI Programming
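CGI is a standard interface between a Web server and external programs; it is not a language, and CGI scripts can be written in any language
A CGI script is a program that the Web server
runs on the server to create HTML code
An early technology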
Java Servlets
Servlets are Java applications that some
Web servers can run
A Servlet creates pages on the fly and
these pages are returned to the requesting
browser
JSP and ASP
JSP (Java Server Pages)
• Create an HTML page that has Java code embedded among the
HTML tags
  • This page is actually a template
  • The code, for example, could issue a database
  query and create an HTML table for the result
• The Web server executes the code in the template
and produces a pure HTML page that is returned
to the client
Microsoft ASP (Active Server Pages)
• The code is typically written in VBScript (a Visual Basic scripting language)
• The Web server must be a Microsoft IIS server
Client Technologies
Some technologies interact with the user on
the client level (Web browser)
JavaScript is a scripting language that can
be added to HTML pages
Web browsers can run the script and change
the output accordingly
Scripts have limited access to persistent
storage on the client through cookies
Cookies are small pieces of data that the browser
stores on the client, for example to remember
personal information
Separating Content from Style
XML and Style Sheets
Separating Content from Style
In HTML, the contents and the style of
pages are inseparable
• HTML tags mostly describe how to present the text, not what it means
XML (eXtensible Markup Language) is a
new markup language for marking the
semantics (meaning) of the data
XML tags describe the meaning of each
portion of text in an XML document
XML Tags
XML tags are similar to attributes in a
relation
However, the attributes are the same for
all the records of the relation
In XML documents, each portion of text
has its own tag
• <course> databases </course>
• <course> operating systems </course>
XML tags can be nested
Parsing XML Documents
XML facilitates easy parsing of
documents according to their semantics
For example, the CS Department has
many Web pages of courses
Can we write a program that reads all
these pages and prints a list of the
names of courses?
If XML tags are used, it is easy to do
that
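A minimal sketch of such a program, using Python's standard `xml.etree.ElementTree` parser. The enclosing `<courses>` element and the inline document are assumptions for the example; real course pages would be read from files or fetched from the Web.

```python
import xml.etree.ElementTree as ET

document = """
<courses>
  <course> databases </course>
  <course> operating systems </course>
</courses>
"""

root = ET.fromstring(document)
# Collect the text of every <course> element, wherever it is nested.
course_names = [course.text.strip() for course in root.iter("course")]
```

The program relies only on the tag name, not on where the course appears in the page or how it is displayed.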
Using XML
XML is important in the context of
data exchange between
applications
It is possible to define a common
set of tags that are suited for
specific applications
For example, MathML is used for
exchanging mathematical
information
Showing XML Document in
Browsers
XML documents contain data with
semantic tags
For a graphical representation,
information about the style must be
added
• For example, HTML tags provide
information about the style
Style Sheets
Style is added to XML documents
by means of style sheets
There are two style-sheet
languages
• CSS -- Cascading Style Sheets
  • Describes how to graphically show the data
• XSL -- Extensible Stylesheet Language
  • Can also transform the data
Putting it All Together
A common architecture for Web
applications has several tiers
• DBMS (database management system) for
storing and processing information
• A Web server for producing pages as a
result of client requests
• A browser that supports dynamic pages
using JavaScript (for creating dynamic
pages) and CSS (for creating the desired
visual output)
How Should XML be Used?
How can we query XML documents easily and
effectively?
How can we store XML documents
efficiently?
What is the proper way to include other
resources in XML documents (e.g., figures,
sounds, etc.)?
How can we use a general style and
information that is semantically well defined
without making the process of creating
documents too cumbersome?
Course topics
Server-side programming
• JDBC for connecting to the DBMS
• Servlets
• JSP
Client-side programming
• JavaScript
• CSS
Data storage and processing on the Web
• XML
• XSL
Search Engines
What are search engines?
How do they work?
Shortcomings of search engines
Some popular search engines:
Infoseek, HotBot, Altavista, Excite,
Lycos, Yahoo!, Ask Jeeves,...