Java Software Solutions Foundations of Program Design - CS


Basic Internet and
Networking Concepts
Representation and Management
of Data on the Internet
The Internet and the
World-Wide Web
TCP/IP and Web Browsers
The Internet and the Web
Internet means Inter-Network
• A world-wide network of many LANs (local-area networks)
• The LANs are of various types
Web means World-Wide Web
• A large collection of information arranged
as hypertext and stored in many computers
that are part of the Internet
The two are related but not the same
A Bit of History
The Internet grew very rapidly
throughout the 1980s and 90s
• Fewer than 600 computers were connected
to the Internet in 1983
• Now there are tens (if not hundreds) of
millions of computers
The Web started in 1989 and grew very
rapidly during the 1990s
The current Web has billions of pages
Internet Applications
• Email
• Telnet
• FTP
• Newsgroups
• World-Wide Web
• Chat
• ...
The Web
Web Browsers
Web browsers provide a very
convenient interface for viewing the
information stored on the Web
Mosaic, the first widely used graphical
browser, was introduced in 1993 and sharply
increased the popularity of the Web

TCP/IP

TCP/IP is the common language of the
Internet
• IP – Internet Protocol
• TCP – Transmission Control Protocol
IP transmits packets of
data from one host (i.e., computer) to
another
TCP uses many packets to
transmit a long stream of data

TCP vs. IP

IP routes each packet from the source
host to the destination host
• IP is oblivious to the fact that usually each
packet is part of a data stream

TCP correctly handles a long data
stream
• Divides a long data stream into many
packets, at the source
• Reassembles the packets, in the right
order, at the destination
• Handles errors and lost data
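The division into packets and the reassembly in the right order can be sketched as follows. This is a simplified illustration, not real TCP: the function names and the packet size are assumptions, sequence numbers here count packets (real TCP numbers bytes), and errors and lost data are not handled.

```python
import random

def split_into_packets(data: bytes, size: int) -> list[tuple[int, bytes]]:
    """Cut a byte stream into (sequence_number, payload) packets at the source."""
    return [(seq, data[start:start + size])
            for seq, start in enumerate(range(0, len(data), size))]

def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Sort packets by sequence number and join their payloads at the destination."""
    return b"".join(payload for _, payload in sorted(packets))

stream = b"a long stream of data sent over TCP"
packets = split_into_packets(stream, 8)
random.shuffle(packets)   # IP may deliver the packets in any order
restored = reassemble(packets)
```

The sequence number is what lets the destination restore the order even though IP treats every packet independently.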
Sockets
Sockets are a common interface that
makes TCP streams look like file streams
Modern programming languages
support sockets
A read from or a write to a
socket may block

• Until data arrives, or
• Until data can be sent

Use multiple threads so that blocking will
not cause the whole GUI to freeze
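A minimal sketch of both points, using Python's standard socket and threading modules. As an assumption made so the example runs without a network, socket.socketpair() stands in for a real TCP connection; the reader function and the message text are illustrative only.

```python
import socket
import threading

def reader(sock: socket.socket, lines: list[str]) -> None:
    """Read lines from the socket as if it were a text file."""
    with sock, sock.makefile("r", encoding="utf-8") as stream:
        for line in stream:          # blocks here until data arrives
            lines.append(line.rstrip("\n"))

# Two already-connected sockets; a stand-in for a TCP connection.
left, right = socket.socketpair()
received: list[str] = []
worker = threading.Thread(target=reader, args=(right, received))
worker.start()                       # the blocking read runs off the main thread

with left, left.makefile("w", encoding="utf-8") as out:
    out.write("hello\nworld\n")      # written like a file, sent through the socket
worker.join()                        # the reader stops when the writer closes
```

Because the blocking read happens in the worker thread, the main thread (for example, a GUI event loop) stays free while the reader waits.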
IP Addresses, Host Names
and URLs
IP Addresses
A computer connected to the Internet is
called a host
Every host has a unique IP address
An IP address consists of 32 bits that
are written as four decimal numbers,
separated by dots

• Example: 135.17.98.240
The numbers denote the four bytes
composing this address
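The relation between the dotted notation and the underlying 32 bits can be sketched as follows. The function names are illustrative assumptions; the address is the example used above.

```python
def ip_to_int(address: str) -> int:
    """Pack the four decimal numbers of a dotted address into one 32-bit integer."""
    value = 0
    for part in address.split("."):
        value = (value << 8) | int(part)   # each number fills one byte
    return value

def int_to_ip(value: int) -> str:
    """Unpack a 32-bit integer into four decimal numbers separated by dots."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

number = ip_to_int("135.17.98.240")
bits = format(number, "032b")              # the same address as 32 binary digits
```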
Internet Addresses
In addition to an IP address, a host may
also have a human-readable Internet
address (or hostname)
Some examples of hostnames:
www.cs.huji.ac.il
www.cocacola.com
shum.cc.huji.ac.il
The first part is the name of a particular
host (i.e., computer)
The rest is the domain name

The Hierarchical Structure
of Hostnames

Example: www.cs.huji.ac.il
www is a name of a computer
That computer is in the CS Department
That dept. is at The Hebrew University of
Jerusalem (huji)
That university is an academic institution
(ac) in Israel (il)
The rightmost name, il, is the top-level
domain
As we move left, the sub-domains are
more specific

The First 7 Generic Domains

com - commercial organizations
(www.cocacola.com)
edu - educational institutions
(www.berkeley.edu)
gov - U.S. governmental organizations
(www.cia.gov)
int - international organizations
mil - U.S. military
net - networks (InterNIC)
org - other organizations (www.w3.org)

More domains have been added in recent years






Country Domains

Generic domains usually refer to hosts inside the
U.S.
Other countries use two-letter country domains:
• il - Israel
• uk - United Kingdom
• jp - Japan
• se - Sweden
These domains have sub-domains that
correspond to the generic domains, for example:
• co.il is the domain of all commercial organizations in
Israel
• ac.il is the domain of all academic institutions in Israel
URLs
Each piece of information on the Web has
a unique identifying address, called a
URL (Uniform Resource Locator)
A URL takes the following form:
http://www.huji.ac.il/index.html
• protocol: http
• hostname: www.huji.ac.il
• file: /index.html
It has 3 parts: a protocol field, a
hostname field and a file field
URL Fields
The protocol field (“http” in the previous
example) specifies the way in which the
information should be accessed
The hostname field specifies the host
on which the information is found
The file field specifies the particular
location in the host's file system where
the file is found
More complex forms of URLs are
possible
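The three fields can be extracted with Python's standard urllib.parse module, as in this small sketch. The variable names are illustrative; note that the library calls the protocol field the "scheme" and the file field the "path".

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.huji.ac.il/index.html")
protocol = parts.scheme     # how the information should be accessed
hostname = parts.hostname   # the host on which the information is found
file_path = parts.path      # the location of the file on that host
```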

Using IP Addresses in URLs
How does the browser know the IP
address of the Web server?
One possibility is that the user explicitly
specifies the IP address of the server in
the hostname field of the URL, for
example:
http://135.17.98.240/index.html
However, it is inconvenient for people to
remember such addresses

From Hostnames to IP Addresses
When we address a host on the Internet,
we usually use its hostname (e.g., using
a hostname in a URL)
The browser needs to map that
hostname to the corresponding IP
address of the given host
There is no algorithm for computing the
IP address from the hostname
A lookup table provides the IP address
of each hostname

Where is the Translation Done?
The translation of hostnames to
IP addresses requires a lookup table
Since there are millions of hosts on the
Internet, it is not feasible for the browser
to hold a table that maps all hostnames
to their IP addresses
Moreover, new hosts are added to the
Internet every day and hosts change
their names

DNS (Domain Name System)
Browsers (and other Internet
applications) use a DNS server to map
hostnames to IP addresses
DNS is a hierarchical scheme for
naming hosts
The command nslookup takes a hostname
and returns its IP address, or
vice-versa
It runs on clients and contacts a DNS
server

LANS, IP Addresses
and Routing
How IP Packets are transmitted
Across the Internet
IP Addresses

An IP address consists of 4 bytes
• Each byte is a number in the range 0 – 255
• For example, 132.64.1.10

The first 1 to 3 bytes identify the network and
the remaining 1 to 3 bytes identify hosts on
the network
• There are several classes of network addresses


Subnet masks effectively increase the
number of networks
Network Information Center (NIC) assigns IP
addresses to organizations and companies
Classes of IP Addresses

Class A: The first byte is 1 – 127
• 1 byte for network and 3 for host

Class B: The first byte is 128 – 191
• 2 bytes for network and 2 for host

Class C: The first byte is 192 – 223
• 3 bytes for network and 1 for host

Classes D and E: 224 – 255
• These classes have special functions, e.g.,
a multicast packet uses a class D address
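The class rules above can be sketched as a small function. It follows the ranges exactly as listed on this slide; the function names are assumptions, and classes D and E are lumped together because the slide gives them a single range.

```python
def address_class(address: str) -> str:
    """Return the class of a dotted IP address, using the ranges listed above."""
    first = int(address.split(".")[0])
    if 1 <= first <= 127:
        return "A"
    if 128 <= first <= 191:
        return "B"
    if 192 <= first <= 223:
        return "C"
    if 224 <= first <= 255:
        return "D or E"
    raise ValueError(f"first byte out of range: {first}")

# Number of leading bytes that identify the network in each class.
NETWORK_BYTES = {"A": 1, "B": 2, "C": 3}

def split_address(address: str) -> tuple[str, str]:
    """Split a class A, B or C address into (network part, host part)."""
    count = NETWORK_BYTES[address_class(address)]
    parts = address.split(".")
    return ".".join(parts[:count]), ".".join(parts[count:])
```

For example, split_address("132.64.1.10") treats 132.64 as the network and 1.10 as the host, because the first byte falls in the class B range.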
Subnet Masking
The network part of an IP address
identifies a LAN (Local-Area Network)
Hosts in a given LAN can be up to 100
meters from the LAN switch
HU has one class B network address,
namely, 132.64 (CS is 132.65)

• But HU needs many LANs !

Subnet masking solves this problem
Defining a Subnet Mask


The subnet mask is a four-byte value whose bits are
1's followed by 0's, e.g., 255.255.255.0
IP addresses are interpreted as follows:
• Any bit that is 1 in the mask identifies the network
and any bit that is 0 identifies the host

When the subnet mask 255.255.255.0 is
applied to an IP address of Class B, e.g.,
132.64.112.52, it means that
• The first 2 bytes identify a network (HU)
• The third byte identifies a subnet, i.e., a specific
LAN
• The fourth byte identifies a host
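Applying the mask is a bitwise AND, as in the following sketch. The helper names are assumptions; the address and mask are the ones from the example above.

```python
def to_int(dotted: str) -> int:
    """Convert a dotted four-byte value to a 32-bit integer."""
    value = 0
    for part in dotted.split("."):
        value = (value << 8) | int(part)
    return value

def to_dotted(value: int) -> str:
    """Convert a 32-bit integer back to dotted notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

def apply_mask(address: str, mask: str) -> tuple[str, int]:
    """Return (network and subnet part, host number) of an address under a mask."""
    addr, bits = to_int(address), to_int(mask)
    network = addr & bits               # bits that are 1 in the mask
    host = addr & ~bits & 0xFFFFFFFF    # bits that are 0 in the mask
    return to_dotted(network), host

network, host = apply_mask("132.64.112.52", "255.255.255.0")
subnet_byte = int(network.split(".")[2])   # third byte: the specific LAN
```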
Local-Area Networks (LANs)
LANs are typically built by connecting
hosts to a 100Mbit Ethernet LAN switch,
using Category 5 cables
Maximal distance between switch and
host is 100 meters
LAN switches forward Ethernet frames, which
carry IP packets, between hosts on the same LAN
Hosts translate IP addresses to
physical addresses (MAC addresses) using ARP,
and switches forward frames by MAC address

Routers
LAN switches are connected using fiber
optics to routers
Routers route IP packets across LANs
A router is connected directly to two or
more LANs and it can transmit IP
packets between these LANs (local
routing)
Some routers are connected to each
other via WANs (Wide-Area Networks)
and do backbone routing

Hop-by-Hop Routing
Suppose that an IP packet is sent from
a LAN to another far-away LAN
The packet gets to the router that is
directly connected to the source LAN
The router sends it to the next hop, i.e.,

• A router on the same LAN that is also
connected to some other LANs, or
• A router on the same WAN
Routing Tables

Each router has a routing table with prefixes of
IP addresses
• Each prefix has a router address for the router that
handles that prefix


Given an IP packet with some IP address, the
next-hop router is determined by matching
the longest prefix (of an IP address) from the
routing table with the given IP address
There is a default entry for the largest routers
in the backbone of the Internet
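Longest-prefix matching can be sketched with Python's standard ipaddress module. The routing table below is entirely hypothetical (prefixes written in slash notation, with made-up next-hop router addresses), and the linear scan is for clarity only; real routers use specialized data structures.

```python
import ipaddress

# Hypothetical routing table: prefix -> address of the next-hop router.
ROUTING_TABLE = {
    "0.0.0.0/0": "10.0.0.1",        # default entry, matches every address
    "132.64.0.0/16": "10.0.0.2",
    "132.64.112.0/24": "10.0.0.3",
}

def next_hop(destination: str, table: dict[str, str]) -> str:
    """Return the next-hop router of the longest prefix that matches destination."""
    dest = ipaddress.ip_address(destination)
    best_length, best_hop = -1, None
    for prefix, hop in table.items():
        network = ipaddress.ip_network(prefix)
        if dest in network and network.prefixlen > best_length:
            best_length, best_hop = network.prefixlen, hop
    if best_hop is None:
        raise LookupError(f"no route to {destination}")
    return best_hop
```

With this table, 132.64.112.52 matches all three prefixes and is sent to the router of the most specific one (the 24-bit prefix), while an address that matches nothing else falls back to the default entry.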
Updating the Routing Tables
The routing table includes local
information provided by the local
network administrator
Routers periodically update their routing
tables by exchanging information with
their neighboring routers
Routing protocols: Distance Vector
(Bellman-Ford), Open Shortest Path
First (OSPF)
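A single distance-vector update, i.e., the Bellman-Ford relaxation a router applies when a neighbor advertises its distances, might look like the sketch below. The router names, costs and table layout are assumptions made for illustration; real protocols add timers, split horizon and other details omitted here.

```python
INF = float("inf")

def merge_vector(table: dict[str, tuple[float, str]], neighbor: str,
                 link_cost: float, neighbor_vector: dict[str, float]) -> bool:
    """Apply one neighbor's advertised distances; return True if the table changed.

    table maps destination -> (distance, next hop).
    """
    changed = False
    for destination, distance in neighbor_vector.items():
        candidate = link_cost + distance          # cost of going via this neighbor
        if candidate < table.get(destination, (INF, ""))[0]:
            table[destination] = (candidate, neighbor)
            changed = True
    return changed

# Hypothetical router A: reaches B at cost 1 and C directly at cost 10.
table = {"A": (0, "A"), "B": (1, "B"), "C": (10, "C")}
# B advertises that it reaches C at cost 2.
first_update = merge_vector(table, "B", 1, {"A": 1, "B": 0, "C": 2})
second_update = merge_vector(table, "B", 1, {"A": 1, "B": 0, "C": 2})
```

After the first update, router A reaches C through B at cost 3; repeating the same advertisement changes nothing, which is how the exchange eventually settles.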

A Short Overview of
How the Web Works
The HTTP Protocol, Web Proxies,
Dynamic HTML Pages
The HTTP Protocol
Hypertext Transfer Protocol
Used between Web clients (e.g.,
browsers) and Web servers (and
proxies)
Text based
Built on top of TCP
Stateless protocol (it doesn’t
remember your previous requests)
Browsers Are Clients
We use a browser to display HTML
pages
The browser is responsible for
fetching the HTML pages and
displaying their contents according
to the HTML rules
Web Servers




HTML pages are stored in file systems
Some hosts, called Web servers, can access
these HTML pages
Each Web server runs an HTTP-daemon in
order to make its HTML pages available to
other hosts
The term “Web server” refers to the software
that implements the HTTP daemon, but
sometimes it also refers to the host that runs
that software
HTTP Daemons
An HTTP-daemon is an application that
is constantly running on a Web server,
waiting for requests from remote hosts
Technically, any host connected to the
Internet can act as a Web server by
running an HTTP-daemon application
A Web client (e.g., browser) connects to
a Web server through the HTTP
protocol and requests an HTML page

Browser-HTTPD Interaction
[Figure: the user requests http://www.cs.huji.ac.il/index.html; the browser sends "GET /index.html" to host www.cs.huji.ac.il; the HTTP daemon on the Web server reads index.html from the disk and sends its content to the browser]





The user requests
http://www.cs.huji.ac.il/index.html
The browser contacts the HTTP-daemon
running on the host www.cs.huji.ac.il and
requests the HTML page /index.html
The HTTP-daemon translates the requested
name to a specific file in its local file system
The HTTP-daemon reads the file index.html
from the disk and sends the content of the file
to the browser
The browser receives the HTML page, parses it
according to the HTML rules and displays it
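The daemon's side of these steps (translating the requested name to a local file and reading it) can be sketched as follows. This is an illustration only: the function names are assumptions, a temporary directory stands in for the server's document directory, and no networking is involved.

```python
import tempfile
from pathlib import Path, PurePosixPath

def translate(document_root: Path, requested: str) -> Path:
    """Map a requested name such as /index.html to a file under document_root."""
    # Drop the leading "/" and any ".." so the result stays inside document_root.
    parts = [p for p in PurePosixPath(requested).parts if p not in ("/", "..")]
    return document_root.joinpath(*parts)

def handle(document_root: Path, request_line: str) -> bytes:
    """Return the content of the file named in a request line like 'GET /x HTTP/1.0'."""
    method, name, _version = request_line.split()
    if method != "GET":
        raise ValueError("this sketch only handles GET")
    return translate(document_root, name).read_bytes()

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "index.html").write_bytes(b"<html>hello</html>")
    body = handle(root, "GET /index.html HTTP/1.0")
```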
HTTP Transaction – Client
Client request:
• The request
GET /index.html HTTP/1.0
• Optional header information
User-Agent: browser name
Accept: formats the browser understands
...
• A blank line (\r\n)
• The client can also send data (e.g., the data
that the user entered into an HTML form)
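A request of this shape can be assembled as plain text, as in the sketch below. The function name and the header values are assumptions; HTTP ends each line with a carriage return and a line feed (\r\n), and the empty line that terminates the headers is a bare \r\n.

```python
def build_request(path: str, headers: dict[str, str],
                  version: str = "HTTP/1.0") -> bytes:
    """Build a GET request: request line, header lines, then a blank line."""
    lines = [f"GET {path} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

request = build_request("/index.html", {
    "User-Agent": "ExampleBrowser/1.0",   # hypothetical browser name
    "Accept": "text/html",
})
```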
HTTP Transaction – Server
Server response:
• Status line
HTTP/1.0 200 OK
• Header information
Content-type: text/html
Content-length: 3022
...
• A blank line (\r\n)
• Document data
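On the client side, a response of this shape can be taken apart as in the following sketch. The function name and the sample response are assumptions, and the parser is deliberately minimal (no chunked encoding, no repeated headers).

```python
def parse_response(raw: bytes) -> tuple[int, str, dict[str, str], bytes]:
    """Split a response into (status code, reason, headers, document data)."""
    head, _, body = raw.partition(b"\r\n\r\n")      # the blank line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body

raw = (b"HTTP/1.0 200 OK\r\n"
       b"Content-type: text/html\r\n"
       b"Content-length: 13\r\n"
       b"\r\n"
       b"<html></html>")
code, reason, headers, body = parse_response(raw)
```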
Proxy Servers
A proxy server acts as a delegate of
browsers for accessing the Web
The browser transfers the request for a
document to the proxy
The proxy contacts the Web server and
fetches the document on behalf of the
browser

Proxy Server
[Figure: the user requests a document; the browser requests the document from the proxy; the proxy application on the proxy server, which keeps a cache, asks the HTTPD for the document and sends the content of index.html to the browser]
Advantages of Proxy Servers

Proxy servers have several advantages
over direct access:
• They can be combined with a firewall to
enable restricted access to the Internet
• They enable caching of popular documents
• They can enlarge the functionality of the
browser by translating from one protocol
to another (for example, from FTP to HTTP
and vice-versa)
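The caching advantage can be sketched with a tiny in-memory proxy. Everything here is hypothetical: the class name, the fetch function that stands in for contacting the Web server, and the absence of cache expiry, which a real proxy would need.

```python
from typing import Callable

class CachingProxy:
    """Fetch documents on behalf of browsers and remember the popular ones."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                  # how to reach the real Web server
        self._cache: dict[str, bytes] = {}
        self.hits = 0

    def get(self, url: str) -> bytes:
        if url in self._cache:               # served without contacting the server
            self.hits += 1
            return self._cache[url]
        document = self._fetch(url)
        self._cache[url] = document
        return document

origin_requests: list[str] = []

def fake_origin(url: str) -> bytes:
    """Stand-in for a Web server; records every request it receives."""
    origin_requests.append(url)
    return f"<html>{url}</html>".encode()

proxy = CachingProxy(fake_origin)
first = proxy.get("http://www.cs.huji.ac.il/index.html")
second = proxy.get("http://www.cs.huji.ac.il/index.html")
```

The second request is answered from the cache, so the Web server sees only one request.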
Responding to Clients’ Inputs
HTML pages are static documents
Sometimes users supply input, for
example, keywords submitted to a
search engine
The Web server has to react to this
input

• The output is an HTML page that is not
known in advance

In order to react to the input, the Web
server may have to use some
applications (e.g., database queries)
Server-Side Programming
Writing applications that react to clients’
inputs by creating HTML pages on the fly is
known as server-side programming
A client request will include, in addition to the
URL of the service provider, a list of
parameters, for example:
http://www.google.com/search?q=search-word
The response to the above request is a
dynamic HTML page and generating it may
involve interaction with other applications
(e.g., database queries)
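A minimal sketch of the idea: read the parameter from the request URL and build an HTML page on the fly. The function name is an assumption, and the list of documents is a stand-in for whatever application or database a real server would query.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit

def search_page(url: str, documents: list[str]) -> str:
    """Generate an HTML result page for the q parameter of the request URL."""
    query = parse_qs(urlsplit(url).query)        # e.g. {"q": ["routing"]}
    word = query.get("q", [""])[0]
    matches = [d for d in documents if word and word.lower() in d.lower()]
    items = "".join(f"<li>{escape(d)}</li>" for d in matches)
    return (f"<html><body><h1>Results for {escape(word)}</h1>"
            f"<ul>{items}</ul></body></html>")

page = search_page("http://www.google.com/search?q=routing",
                   ["IP routing tables", "XML tags"])
```

The page depends on the input, so it cannot be stored in advance as a static file; escape() keeps user input from being interpreted as HTML.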

Generating Dynamic HTML Pages
[Figure: the user requests http://www.excite.com/search?what=something; the browser sends "GET /search?what=something" to host www.excite.com; the HTTPD executes a search program (an application) and the resulting content is sent to the browser]
Server-Side and Client-Side
Technologies
Servlets, JSP and JavaScript
Server-Side Technologies

There are five common tools for server-side
programming; each one works with
some of the available Web servers
• CGI (Common Gateway Interface)
programming
• Java Servlets
• JSP – Java Server Pages
• Microsoft ASP – Active Server Pages
(similar to JSP)
• PHP
CGI Programming
CGI is an interface through which a Web server
runs external programs; it is not a language
A CGI script is a program that
runs on the server and creates HTML
pages
An early technology
Java Servlets
Servlets are Java applications that some
Web servers can run
A Servlet creates pages on the fly and
these pages are returned to the requesting
browser
JSP, ASP and PHP

JSP (Java Server Pages)
• Create an HTML page that has Java-Servlet code
inside HTML tags
• This page is actually a template
• The code, for example, could issue a database
query and create an HTML table for the result
• The Web server executes the code in the template
and produces a pure HTML page that is returned
to the client

Microsoft ASP (Active Server Pages)
• The code is VB (Visual Basic) scripts
• The Web server must be a Microsoft IIS server

PHP is an HTML-embedded scripting
language
Client-Side Technologies
Certain parts of a Web application can
be executed locally, in the client
For example, some validity checks can
be applied to the user’s input locally
The user request is sent to the server
only if the input is valid
JavaScript (not part of Java!) is an
HTML-embedded scripting language for
client-side programming

JavaScript

JavaScript is a scripting language for
generating dynamic HTML pages in the
browser
The script is written inside an HTML page
and the browser runs the script and displays
an ordinary HTML page
The script has limited interaction with
the client's file system through cookies
Cookies are small files that store personal
information in the file system of the client
• For example, a cookie may store your user name
and password for accessing a particular site
Separating Content from Style
XML and Style Sheets
Separating Content from Style

In HTML, the contents and the style of
pages are inseparable
• HTML tags actually refer only to the style
XML (eXtensible Markup Language) is a
markup language for marking the
semantics (meaning) of the data
XML tags describe the meaning of each
portion of text in an XML document

XML Tags
XML tags are similar to attributes in a
relation
However, the attributes are the same for
all the records of the relation
In XML documents, each portion of text
has its own tag

• <course> databases </course>
• <course> operating systems </course>

XML tags can be nested
Parsing XML Documents
XML facilitates easy parsing of
documents according to their semantics
For example, the CS Department has
many Web pages of courses
Can we write a program that reads all
these pages and prints a list of the
names of courses?
If XML tags are used, it is easy to do
that
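Such a program can be sketched with Python's standard xml.etree.ElementTree module. The sample document is an assumption: only the course tag comes from the slides, while the department and lecturer tags are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page; only the <course> tag appears in the slides.
PAGE = """<department>
  <course> databases </course>
  <lecturer> someone </lecturer>
  <course> operating systems </course>
</department>"""

def course_names(xml_text: str) -> list[str]:
    """Return the text of every <course> element, in document order."""
    root = ET.fromstring(xml_text)
    return [(course.text or "").strip() for course in root.iter("course")]

names = course_names(PAGE)
```

The program selects text by its meaning (the course tag) rather than by how it is displayed, which is what makes the task easy.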

Data Exchange Using XML
XML is important in the context of
data exchange between
applications
It is possible to define a common
set of tags that are suited for
specific applications
For example, MathML is used for
exchanging mathematical
information
Showing XML Documents
in Browsers
XML documents contain data with
semantic tags
For a graphical representation,
information about the style must be
added
• For example, HTML tags provide
information about the style
Style Sheets
Style is added to XML documents
by means of style sheets
There are two style-sheet
languages
• CSS – Cascading Style Sheets
• Describe how to graphically show the data
• Can be used to give all HTML pages a
common style
• XSL – eXtensible Stylesheet Language
• Can also transform the data (e.g., XML to
HTML transformations)
Putting it All Together

A common architecture for Web applications
has three tiers
• DBMS (database management system) for storing
and processing information
• A Web server + additional tools for interacting with
the DBMS (and possibly some other applications)
and producing dynamic HTML pages
• A browser that supports
JavaScript for locally generating dynamic
HTML pages, and
CSS (and possibly XSL) for creating the
desired visual output
How Should XML be Used?
How can we query XML documents
easily and effectively?
How can we store XML documents
efficiently?
What is the proper way to include other
resources in XML documents (e.g.,
figures and sounds)?

The Challenge

We want to
• represent information (internally) so that
the semantics is well defined
• use the style we like for displaying the
information

Can we do that without making the
process of creating HTML pages too
cumbersome?
Topics Covered in the Course

Server-side programming
• JDBC for connecting to the DBMS
• Servlets
• JSP

Client-side programming
• JavaScript
• CSS

Data storage and processing on the Web
• XML
• XSL
Search Engines
What are search engines?
How do they work?
Shortcomings of search engines
Some popular search engines:
Infoseek, HotBot, Altavista, Excite,
Lycos, Yahoo!, Jeeves,...