Java Software Solutions Foundations of Program Design - CS
Basic Internet and
Networking Concepts
Representation and Management
of Data on the Internet
The Internet and the
World-Wide Web
TCP/IP and Web Browsers
The Internet and the Web
Internet means Inter-Network
• A world-wide network of many LANs (local-area networks)
• The LANs are of various types
Web means World-Wide Web
• A large collection of information arranged
as hypertext and stored in many computers
that are part of the Internet
The two are related but not the same
A Bit of History
The Internet grew very rapidly
throughout the 1980s and 90s
• Less than 600 computers were connected
to the Internet in 1983
• Now there are tens (if not hundreds) of
millions of computers
The Web started in 1989 and grew very
rapidly during the 1990s
The current Web has billions of pages
Internet Applications
Email
Telnet
FTP
Newsgroups
Chat
World-Wide Web
...
Web Browsers
Web browsers provide a very
convenient interface for viewing the
information stored on the Web
Mosaic, the first widely used graphical
browser, was introduced in 1993 and sharply
increased the popularity of the Web
TCP/IP
TCP/IP is the common language of the
Internet
• IP – Internet Protocol
• TCP – Transmission Control Protocol
The IP protocol transmits packets of
data from one host (i.e., computer) to
another
The TCP protocol uses many packets to
transmit a long stream of data
TCP vs. IP
IP routes each packet from the source
host to the destination host
• IP is oblivious to the fact that usually each
packet is part of a data stream
TCP correctly handles a long data
stream
• Divides a long data stream into many
packets, at the source
• Reassembles the packets, in the right
order, at the destination
• Handles errors and lost data
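The splitting and reassembly described above can be illustrated with a toy sketch. This is not real TCP: the packet size, the (sequence number, payload) tuple format and the names split_stream and reassemble are assumptions made only for illustration, and error handling and retransmission are left out.

```python
import random

def split_stream(data: bytes, size: int = 4) -> list[tuple[int, bytes]]:
    """Cut a byte stream into (sequence_number, payload) packets."""
    return [(seq, data[i:i + size])
            for seq, i in enumerate(range(0, len(data), size))]

def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Sort packets by sequence number and join their payloads."""
    return b"".join(payload for _, payload in sorted(packets))

packets = split_stream(b"a long stream of data")
random.shuffle(packets)          # packets may arrive out of order
restored = reassemble(packets)   # the original stream, in the right order
```

The sequence number is what lets the destination restore the order, since each packet travels independently.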
Sockets
Sockets are a common interface that
makes TCP streams look like file streams
Modern programming languages
support sockets
A read from or a write to a
socket may block
• Until data arrives, or
• Until data can be sent
Use multiple threads so that blocking will
not cause the whole GUI to freeze
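A minimal sketch of the threading advice, in Python. It uses socket.socketpair() to obtain two connected sockets locally, so no real network is involved; the reader function and the received list are hypothetical names chosen for this example.

```python
import socket
import threading

def reader(sock: socket.socket, received: list) -> None:
    """Block in recv() until data arrives, then store it."""
    received.append(sock.recv(1024))

# socketpair() returns two connected sockets without touching the network.
left, right = socket.socketpair()
received: list = []
worker = threading.Thread(target=reader, args=(right, received))
worker.start()                 # the worker blocks; the main thread does not
left.sendall(b"hello")         # data arrives, so recv() in the worker returns
worker.join(timeout=5)
left.close()
right.close()
```

In a GUI program the main thread would keep handling events while the worker waits on the socket.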
IP Addresses, Host Names
and URLs
IP Addresses
A computer connected to the Internet is
called a host
Every host has a unique IP address
An IP address consists of 32 bits that
are written as four decimal numbers,
separated by dots
• Example: 135.17.98.240
The numbers denote the four bytes
composing this address
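The relation between the dotted notation, the four bytes and the 32-bit value can be checked with Python's standard ipaddress module. The helper names to_bytes and to_int are assumptions of this sketch.

```python
import ipaddress

def to_bytes(dotted: str) -> list[int]:
    """Return the four bytes of a dotted-decimal IPv4 address."""
    return list(ipaddress.IPv4Address(dotted).packed)

def to_int(dotted: str) -> int:
    """Return the address as a single 32-bit integer."""
    return int(ipaddress.IPv4Address(dotted))

address = "135.17.98.240"
octets = to_bytes(address)      # [135, 17, 98, 240]
as_int = to_int(address)        # the same 32 bits as one number
```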
Internet Addresses
In addition to an IP address, a host may
also have a human-readable Internet
address (or hostname)
Some examples of hostnames:
www.cs.huji.ac.il
www.cocacola.com
shum.cc.huji.ac.il
The first part is the name of a particular
host (i.e., computer)
The rest is the domain name
The Hierarchical Structure
of Hostnames
Example: www.cs.huji.ac.il
www is a name of a computer
That computer is in the CS Department
That dept. is at The Hebrew University of
Jerusalem (huji)
That university is an Academic Campus
(ac) in Israel (il)
The rightmost name, il, is the main
domain
As we move left, the sub-domains are
more specific
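The right-to-left reading of a hostname can be sketched as follows. The function name domain_chain is hypothetical; it simply lists the domains from the most general to the most specific.

```python
def domain_chain(hostname: str) -> list[str]:
    """List the domains of a hostname from most general to most specific."""
    labels = hostname.split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]

chain = domain_chain("www.cs.huji.ac.il")
# chain == ["il", "ac.il", "huji.ac.il", "cs.huji.ac.il", "www.cs.huji.ac.il"]
```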
The First 7 Generic Domains
com - commercial organizations
(www.cocacola.com)
edu - educational institutions
(www.berkeley.edu)
gov - U.S. governmental organizations
(www.cia.gov)
int - international organizations
mil - U.S. military
net - networks (InterNIC)
org - other organizations (www.w3.org)
More domains have been added in recent years
Country Domains
Generic domains usually refer to hosts inside the
U.S.
Other countries use two-letter country domains:
• il - Israel
• uk - United Kingdom
• jp - Japan
• se - Sweden
These domains have sub-domains that
correspond to the generic domains, for example:
• co.il is the domain of all commercial organizations in
Israel
• ac.il is the domain of all academic institutions in Israel
URLs
Each information piece on the Web has
a unique identifying address, called a
URL (Uniform Resource Locator)
A URL takes the following form:
http://www.huji.ac.il/index.html
protocol
hostname
file
It has 3 parts: a protocol field, a
hostname field and a file field
URL Fields
The protocol field (“http” in the previous
example) specifies the way in which the
information should be accessed
The hostname field specifies the host
on which the information is found
The file field specifies the particular
location in the host's file system where
the file is found
More complex forms of URLs are
possible
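As an illustration, Python's standard urllib.parse module can split the example URL into the three fields named above. Note that the library calls the protocol field "scheme" and the file field "path".

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.huji.ac.il/index.html")
protocol = parts.scheme     # "http"
hostname = parts.hostname   # "www.huji.ac.il"
file = parts.path           # "/index.html"
```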
Using IP Addresses in URLs
How does the browser know the IP
address of the Web server?
One possibility is that the user explicitly
specifies the IP address of the server in
the hostname field of the URL, for
example:
http://135.17.98.240/index.html
However, it is inconvenient for people to
remember such addresses
From Hostnames to IP Addresses
When we address a host in the Internet,
we usually use its hostname (e.g., using
a hostname in a URL)
The browser needs to map that
hostname to the corresponding IP
address of the given host
There is no algorithm for computing the
IP address from the hostname
A lookup table provides the IP address
of each hostname
Where is the Translation Done?
The translation of hostnames to IP
addresses requires a lookup table
Since there are millions of hosts on the
Internet, it is not feasible for the browser
to hold a table that maps all hostnames
to their IP-addresses
Moreover, new hosts are added to the
Internet every day and hosts change
their names
DNS (Domain Name System)
The browser (and other Internet
applications) uses a DNS server to map
hostnames to IP addresses
DNS is a hierarchical scheme for
naming hosts
The command nslookup gets an IP
address and returns a hostname or
vice-versa
It runs on clients and contacts a DNS
server
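Programs can ask for the same translation through the system resolver. The sketch below uses Python's socket.getaddrinfo; the function name lookup_ipv4 is an assumption of this example.

```python
import socket

def lookup_ipv4(hostname: str) -> list[str]:
    """Ask the system resolver for the IPv4 addresses of a hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each entry ends with an (address, port) pair; keep the unique addresses.
    return sorted({info[4][0] for info in infos})
```

Calling lookup_ipv4("www.cs.huji.ac.il") would contact a DNS server, so its result depends on the network and on the current DNS records. A string that is already an IP address is returned unchanged, without any DNS query.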
LANs, IP Addresses
and Routing
How IP Packets are transmitted
Across the Internet
IP Addresses
An IP address consists of 4 bytes
• Each byte is a number in the range 0 – 255
• For example, 132.64.1.10
The first 1 to 3 bytes identify the network and
the remaining 1 to 3 bytes identify hosts on
the network
• There are several classes of network addresses
Subnet masks effectively increase the
number of networks
Network Information Center (NIC) assigns IP
addresses to organizations and companies
Classes of IP Addresses
Class A: The first byte is 1 – 127
• 1 byte for network and 3 for host
Class B: The first byte is 128 – 191
• 2 bytes for network and 2 for host
Class C: The first byte is 192 – 223
• 3 bytes for network and 1 for host
Classes D and E: 224 – 255
• These classes have special functions, e.g.,
a multicast packet uses a class D address
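The class ranges listed above translate directly into a small function. This sketch follows the ranges exactly as given on this slide; the function name address_class is hypothetical.

```python
def address_class(dotted: str) -> str:
    """Classify an IPv4 address by its first byte, using the slide's ranges."""
    first = int(dotted.split(".")[0])
    if 1 <= first <= 127:
        return "A"          # 1 byte for network, 3 for host
    if 128 <= first <= 191:
        return "B"          # 2 bytes for network, 2 for host
    if 192 <= first <= 223:
        return "C"          # 3 bytes for network, 1 for host
    if 224 <= first <= 255:
        return "D or E"     # special functions, e.g., multicast
    raise ValueError(f"first byte out of range: {first}")
```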
Subnet Masking
The network part of an IP address
identifies a LAN (Local-Area Network)
Hosts in a given LAN can be up to 100
meters from the LAN switch
HU has one class B network address,
namely, 132.64 (CS is 132.65)
• But HU needs many LANs!
Subnet masking solves this problem
Defining a Subnet Mask
The subnet mask is a four-byte sequence of
1’s followed by 0’s, e.g., 255.255.255.0
IP addresses are interpreted as follows:
• Any bit that is 1 in the mask identifies the network
and any bit that is 0 identifies the host
When the subnet mask 255.255.255.0 is
applied to an IP address of Class B, e.g.,
132.64.112.52, it means that
• The first 2 bytes identify a network (HU)
• The third byte identifies a subnet, i.e., a specific
LAN
• The fourth byte identifies a host
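Applying a mask is a bitwise AND. The sketch below splits the example address with the example mask; the function name split_address is an assumption of this sketch.

```python
import ipaddress

def split_address(address: str, mask: str) -> tuple[str, str]:
    """Split an IPv4 address into its network and host parts using a mask."""
    addr = int(ipaddress.IPv4Address(address))
    bits = int(ipaddress.IPv4Address(mask))
    network = addr & bits                    # bits that are 1 in the mask
    host = addr & ~bits & 0xFFFFFFFF         # bits that are 0 in the mask
    return str(ipaddress.IPv4Address(network)), str(ipaddress.IPv4Address(host))

network, host = split_address("132.64.112.52", "255.255.255.0")
# network == "132.64.112.0" (HU network 132.64, subnet 112), host == "0.0.0.52"
```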
Local-Area Networks (LANs)
LANs are typically built by connecting
hosts to a 100Mbit Ethernet LAN switch,
using Category 5 cables
Maximal distance between switch and
host is 100 meters
LAN switches transmit IP packets
between hosts on the same LAN
IP addresses are translated to physical
addresses (MAC addresses), which LAN
switches use to deliver packets
Routers
LAN switches are connected using fiber
optics to routers
Routers route IP packets across LANs
A router is connected directly to two or
more LANs and it can transmit IP
packets between these LANs (local
routing)
Some routers are connected to each
other via WANs (Wide-Area Networks)
and do backbone routing
Hop-by-Hop Routing
Suppose that an IP packet is sent from
a LAN to another far-away LAN
The message gets to the router that is
directly connected to the source LAN
The router sends it to the next hop, i.e.,
• A router on the same LAN that is also
connected to some other LANs, or
• A router on the same WAN
Routing Tables
Each router has a routing table with prefixes of
IP addresses
• Each prefix has a router address for the router that
handles that prefix
Given an IP packet with some IP address, the
next-hop router is determined by matching
the longest prefix (of an IP address) from the
routing table with the given IP address
There is a default entry for the largest routers
in the backbone of the Internet
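Longest-prefix matching can be sketched with the standard ipaddress module. The routing table below is entirely hypothetical (both the prefixes chosen and the next-hop addresses), and a real router would use a specialized data structure rather than a linear scan.

```python
import ipaddress

# Hypothetical routing table: prefix -> address of the next-hop router.
ROUTES = {
    "0.0.0.0/0": "10.0.0.1",        # default entry, matches every address
    "132.64.0.0/16": "10.0.0.2",
    "132.64.112.0/24": "10.0.0.3",
}

def next_hop(destination: str, routes: dict[str, str] = ROUTES) -> str:
    """Return the next hop of the longest prefix that contains destination."""
    dest = ipaddress.ip_address(destination)
    matches = [ipaddress.ip_network(prefix) for prefix in routes
               if dest in ipaddress.ip_network(prefix)]
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[str(best)]
```

With this table, 132.64.112.52 matches all three prefixes and the /24 entry wins, while an unrelated address such as 8.8.8.8 falls through to the default entry.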
Updating the Routing Tables
The routing table includes local
information provided by the local
network administrator
Routers periodically update their routing
tables by exchanging information with
their neighboring routers
Routing protocols: Distance Vector
(Bellman-Ford), Open Shortest Path
First (OSPF)
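The exchange with neighbors in a distance-vector protocol comes down to one Bellman-Ford relaxation step per received vector. The sketch below shows that step only; the router names, the costs and the function name update_vector are assumptions, and real protocols also track the next hop and handle failures.

```python
def update_vector(own: dict[str, float],
                  link_cost: float,
                  neighbor_vector: dict[str, float]) -> dict[str, float]:
    """Merge a neighbor's distance vector into our own (one Bellman-Ford step)."""
    updated = dict(own)
    for dest, cost in neighbor_vector.items():
        candidate = link_cost + cost          # cost of going via the neighbor
        if candidate < updated.get(dest, float("inf")):
            updated[dest] = candidate
    return updated

# Router A knows itself and its neighbor B (link cost 1).
vector_a = {"A": 0, "B": 1}
# B reports its own vector; A learns a route to C through B.
vector_a = update_vector(vector_a, 1, {"A": 1, "B": 0, "C": 2})
```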
A Short Overview of
How the Web Works
The HTTP Protocol, Web Proxies,
Dynamic HTML Pages
The HTTP Protocol
Hypertext Transfer Protocol
Used between Web clients (e.g.,
browsers) and Web servers (and
proxies)
Text based
Built on top of TCP
Stateless protocol (it doesn’t
remember your previous requests)
Browsers Are Clients
We use a browser to display HTML
pages
The browser is responsible for
fetching the HTML pages and
displaying their contents according
to the HTML rules
Web Servers
HTML pages are stored in file systems
Some hosts, called Web servers, can access
these HTML pages
Each Web server runs an HTTP-daemon in
order to make its HTML pages available to
other hosts
The term “Web server” refers to the software
that implements the HTTP daemon, but
sometimes it also refers to the host that runs
that software
HTTP Daemons
An HTTP-daemon is an application that
is constantly running on a Web server,
waiting for requests from remote hosts
Technically, any host connected to the
Internet can act as a Web server by
running an HTTP-daemon application
A Web client (e.g., browser) connects to
a Web server through the HTTP
protocol and requests an HTML page
Browser-HTTPD Interaction
[Figure: the user requests http://www.cs.huji.ac.il/index.html; the browser sends "GET /index.html" to the host www.cs.huji.ac.il; the HTTP daemon on the Web server reads index.html from disk and sends its content to the browser]
Browser-HTTPD Interaction
The user requests
http://www.cs.huji.ac.il/index.html
The browser contacts the HTTP-daemon
running on the host www.cs.huji.ac.il and
requests the HTML page /index.html
The HTTP-daemon translates the requested
name to a specific file in its local file system
The HTTP-daemon reads the file index.html
from the disk and sends the content of the file
to the browser
The browser receives the HTML page, parses it
according to the HTML rules and displays it
HTTP Transaction – Client
Client request:
• The request
GET /index.html HTTP/1.0
• Optional header information
User-Agent: browser name
Accept: formats the browser understands
...
• A blank line (\n)
• The client can also send data (e.g., the data
that the user entered into an HTML form)
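The request format above can be assembled as a byte string. This is a sketch: the function name build_get_request and the User-Agent value are made up for illustration. On the wire HTTP ends each line with a carriage return and line feed ("\r\n"), and the Host header, although not required by HTTP/1.0, is commonly sent.

```python
def build_get_request(path: str, host: str,
                      user_agent: str = "example-browser") -> bytes:
    """Build a minimal HTTP/1.0 GET request."""
    lines = [
        f"GET {path} HTTP/1.0",          # the request line
        f"Host: {host}",                 # optional header information
        f"User-Agent: {user_agent}",
        "Accept: text/html",
        "",                              # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

request = build_get_request("/index.html", "www.cs.huji.ac.il")
```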
HTTP Transaction – Server
Server response:
• Status line
HTTP/1.0 200 OK
• Header information
Content-type: text/html
Content-length: 3022
...
• A blank line (\n)
• Document data
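A client has to split such a response back into its three parts. The sketch below is a simplified parser; the function name parse_response and the sample response are assumptions of this example, and it assumes "\r\n" line endings and at most one header per name.

```python
def parse_response(raw: bytes) -> tuple[int, dict[str, str], bytes]:
    """Split a raw HTTP response into status code, headers and document data."""
    head, _, body = raw.partition(b"\r\n\r\n")       # blank line separates them
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status = int(status_line.split(" ", 2)[1])       # "HTTP/1.0 200 OK" -> 200
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body

raw = (b"HTTP/1.0 200 OK\r\n"
       b"Content-type: text/html\r\n"
       b"Content-length: 13\r\n"
       b"\r\n"
       b"<html></html>")
status, headers, body = parse_response(raw)
```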
Proxy Servers
A proxy server acts as a delegate of
browsers for accessing the Web
The browser transfers the request for a
document to the Proxy
The Proxy contacts the Web server and
fetches the document on behalf of the
browser
Proxy Server
[Figure: the user requests a document; the browser requests the document from the proxy; the proxy application on the proxy server, which keeps a cache, asks the HTTPD for the document and sends the content of index.html to the browser]
Advantages of Proxy Servers
Proxy servers have several advantages
over direct access:
• They can be combined with a firewall to
enable restricted access to the Internet
• They enable caching of popular documents
• They can enlarge the functionality of the
browser by translating from one protocol
to another (for example, from FTP to HTTP
and vice-versa)
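The caching advantage can be sketched in a few lines. Everything here is hypothetical: the CachingProxy class, the fake_origin function that stands in for a real Web server, and the absence of cache expiry, which a real proxy would need.

```python
from typing import Callable

class CachingProxy:
    """Fetch documents on behalf of browsers and remember the results."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                  # how to reach the real Web server
        self._cache: dict[str, bytes] = {}
        self.origin_requests = 0             # how often the server was contacted

    def get(self, url: str) -> bytes:
        if url not in self._cache:
            self.origin_requests += 1
            self._cache[url] = self._fetch(url)
        return self._cache[url]

def fake_origin(url: str) -> bytes:
    """Stand-in for a Web server, used only in this example."""
    return f"<html>content of {url}</html>".encode()

proxy = CachingProxy(fake_origin)
first = proxy.get("http://www.cs.huji.ac.il/index.html")
second = proxy.get("http://www.cs.huji.ac.il/index.html")   # served from the cache
```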
Responding to Clients’ Inputs
HTML pages are static documents
Sometimes users supply input, for
example, keywords submitted to a
search engine
The Web server has to react to this
input
• The output is an HTML page that is not
known in advance
In order to react to the input, the Web
server may have to use some
applications (e.g., database queries)
Server-Side Programming
Writing applications that react to clients’
inputs by creating HTML pages on the fly is
known as server-side programming
A client request will include, in addition to the
URL of the service provider, a list of
parameters, for example:
http://www.google.com/search?q=search-word
The response to the above request is a
dynamic HTML page and generating it may
involve interaction with other applications
(e.g., database queries)
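A server-side program of this kind reads the parameters from the URL and builds the page. The sketch below is only an illustration: the function name search_page is made up, and the two fake results stand in for a real database query.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit

def search_page(url: str) -> str:
    """Generate an HTML page on the fly from the q parameter of the URL."""
    params = parse_qs(urlsplit(url).query)
    word = params.get("q", [""])[0]
    # Stand-in for a database query or another application.
    results = [f"result {n} for {word}" for n in (1, 2)]
    items = "".join(f"<li>{escape(r)}</li>" for r in results)
    return f"<html><body><h1>{escape(word)}</h1><ul>{items}</ul></body></html>"

page = search_page("http://www.google.com/search?q=search-word")
```

User input is passed through html.escape before it is placed in the page, so that characters such as "<" in the search word are shown as text rather than interpreted as HTML.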
Generating Dynamic HTML Pages
[Figure: the user requests http://www.excite.com/search?what=something; the browser sends "GET /search?what=something" to the host www.excite.com; the HTTPD executes a search program (an application) and the resulting content is sent to the browser]
Server-Side and Client-Side
Technologies
Servlets, JSP and JavaScript
Server-Side Technologies
There are five common tools for server-side
programming; each one works with
some of the available Web servers
• CGI (Common Gateway Interface)
programming
• Java Servlets
• JSP – Java Server Pages
• Microsoft ASP – Active Server Pages
(similar to JSP)
• PHP
CGI Programming
CGI is an interface between a Web server and
external programs, not a scripting language itself
A CGI script works with an application that
runs on the server and creates HTML
pages
An early technology
Java Servlets
Servlets are Java applications that some
Web servers can run
A Servlet creates pages on the fly and
these pages are returned to the requesting
browser
JSP, ASP and PHP
JSP (Java Server Pages)
• Create an HTML page that has Java-Servlet code
inside HTML tags
This page is actually a template
The code, for example, could issue a database
query and create an HTML table for the result
• The Web server executes the code in the template
and produces a pure HTML page that is returned
to the client
Microsoft ASP (Active Server Pages)
• The code consists of VB (Visual Basic) scripts
• The Web server must be a Microsoft IIS server
PHP is an HTML-embedded scripting
language
Client-Side Technologies
Certain parts of a Web application can
be executed locally, in the client
For example, some validity checks can
be applied to the user’s input locally
The user request is sent to the server
only if the input is valid
JavaScript (not part of Java!) is an
HTML-embedded scripting language for
client-side programming
JavaScript
JavaScript is a scripting language for
generating dynamic HTML pages in the
browser
The script is written inside an HTML page
and the browser runs the script and displays
an ordinary HTML page
There is some interaction of the script with
the file system using cookies
Cookies are small files that store personal
information in the file system of the client
• For example, a cookie may store your user name
and password for accessing a particular site
Separating Content from Style
XML and Style Sheets
Separating Content from Style
In HTML, the contents and the style of
pages are inseparable
• HTML tags actually refer only to the style
XML (eXtensible Markup Language) is a
markup language for marking the
semantics (meaning) of the data
XML tags describe the meaning of each
portion of text in an XML document
XML Tags
XML tags are similar to attributes in a
relation
However, the attributes are the same for
all the records of the relation
In XML documents, each portion of text
has its own tag
• <course> databases </course>
• <course> operating systems </course>
XML tags can be nested
Parsing XML Documents
XML facilitates easy parsing of
documents according to their semantics
For example, the CS Department has
many Web pages of courses
Can we write a program that reads all
these pages and prints a list of the
names of courses?
If XML tags are used, it is easy to do
that
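A minimal sketch of such a program, using Python's standard xml.etree.ElementTree parser. The <department> wrapper element and the function name course_names are assumptions; only the <course> tag comes from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; a real one would be read from the department's pages.
document = """
<department>
  <course> databases </course>
  <course> operating systems </course>
</department>
"""

def course_names(xml_text: str) -> list[str]:
    """Return the text of every <course> element, at any nesting depth."""
    root = ET.fromstring(xml_text)
    return [course.text.strip() for course in root.iter("course")]

names = course_names(document)
```

Because the tag states what the text means, the program does not have to guess which part of a page is a course name.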
Data Exchange Using XML
XML is important in the context of
data exchange between
applications
It is possible to define a common
set of tags that are suited for
specific applications
For example, MathML is used for
exchanging mathematical
information
Showing XML Documents
in Browsers
XML documents contain data with
semantic tags
For a graphical representation,
information about the style must be
added
• For example, HTML tags provide
information about the style
Style Sheets
Style is added to XML documents
by means of style sheets
There are two style-sheet
languages
• CSS – Cascading Style Sheets
• Describe how to graphically show the data
• Can be used to give all HTML pages a
common style
• XSL – XML Style-sheet Language
• Can also transform the data (e.g., XML to
HTML transformations)
Putting it All Together
A common architecture for Web applications
has three tiers
• DBMS (database management system) for storing
and processing information
• A Web server + additional tools for interacting with
the DBMS (and possibly some other applications)
and producing dynamic HTML pages
• A browser that supports
  • JavaScript for locally generating dynamic
    HTML pages
  • CSS (and possibly XSL) for creating the
    desired visual output
How Should XML be Used?
How can we query XML documents
easily and effectively?
How can we store XML documents
efficiently?
What is the proper way to include other
resources in XML documents (e.g.,
figures and sounds)?
The Challenge
We want to
• represent information (internally) so that
the semantics is well defined
• use the style we like for displaying the
information
Can we do that without making the
process of creating HTML pages too
cumbersome?
Topics Covered in the Course
Server-side programming
• JDBC for connecting to the DBMS
• Servlets
• JSP
Client-side programming
• JavaScript
• CSS
Data storage and processing on the Web
• XML
• XSL
Search Engines
What are search engines?
How do they work?
Shortcomings of search engines
Some popular search engines:
Infoseek, HotBot, Altavista, Excite,
Lycos, Yahoo!, Jeeves,...