The WWW
Willem Visser
RW334
Internet
• http://en.wikipedia.org/wiki/Internet
• A global network of networks connecting
computers using the Internet protocol suite
(TCP/IP)
• The WWW is built on top of the Internet, using
hyperlinked web pages to display information
OSI vs. TCP/IP Stack
Layering: FTP Example
• The 7-layer Open Systems Interconnection Model
– Application, Presentation, Session, Transport, Network, Link, Physical
• The 4-layer Internet model
– Application: FTP, ASCII/Binary
– Transport: TCP
– Network: IP
– Link: Ethernet
TCP/IP
• Transmission Control Protocol/Internet Protocol
• IP is at the Network Layer
• TCP is at the Transport Layer
• UDP
– Forgotten stepchild in all of this
– User Datagram Protocol
– Essentially exposes IP functionality at the Transport Layer
• Layer diagram: Transport (TCP, UDP) on top of Network (IP)
Internet Protocol
• Routing packets
• Connectionless
• No guarantees of delivery
– Best Effort delivery
• Out of order delivery
• Only the packet header is covered by a checksum, so only header errors are detected; the payload is not checked
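To make the header checksum a little more concrete, here is a minimal sketch of the 16-bit ones'-complement sum that IPv4 uses for its header. The function name `ipv4HeaderChecksum` and the sample header bytes are assumptions chosen for illustration; they do not come from the slides.

```javascript
// Sketch: 16-bit ones'-complement checksum over an IPv4 header.
// `header` is an array of bytes with the checksum field (bytes 10-11) set to zero.
function ipv4HeaderChecksum(header) {
  let sum = 0;
  for (let i = 0; i < header.length; i += 2) {
    // Combine two bytes into one 16-bit big-endian word.
    sum += (header[i] << 8) | header[i + 1];
  }
  // Fold any carry bits back into the low 16 bits.
  while (sum > 0xffff) {
    sum = (sum & 0xffff) + (sum >>> 16);
  }
  // The checksum is the bitwise complement of the folded sum.
  return ~sum & 0xffff;
}

// Illustrative 20-byte header with the checksum field zeroed (assumed sample data).
const sampleHeader = [
  0x45, 0x00, 0x00, 0x73, 0x00, 0x00, 0x40, 0x00, 0x40, 0x11,
  0x00, 0x00, 0xc0, 0xa8, 0x00, 0x01, 0xc0, 0xa8, 0x00, 0xc7,
];
console.log(ipv4HeaderChecksum(sampleHeader).toString(16)); // prints "b861"
```

A receiver can run the same sum over the header with the checksum filled in; a result of zero means no error was detected in the header.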
Transmission Control Protocol
• TCP is built on top of IP
• Two-way connections
– Write here, see there and vice versa
• Reliable
– Destination assembles packets in order
– Resend if packet doesn’t make it
• Continuous byte-stream
– The order is preserved from sender to receiver
• Flow Control
– If the network gets congested, the ACK packets are spaced out, which slows the sender down
• Uses a sliding window to improve performance
TCP
• multiplexing: multiple programs using the same
IP address
– port: a number given to each program or service
– port 80: web (HTTP)
– port 25: email (SMTP)
– port 22: ssh
– port 5190: AOL Instant Messenger
– http://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers
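The idea of multiplexing by port can be sketched as a simple lookup: the destination port of an incoming segment decides which program gets the data. The `services` table reuses the ports listed above; the `deliver` helper and the shape of the segment object are hypothetical and only serve the illustration.

```javascript
// Sketch: the destination port selects which program receives the data.
const services = new Map([
  [80, "web (HTTP)"],
  [25, "email (SMTP)"],
  [22, "ssh"],
  [5190, "AOL Instant Messenger"],
]);

// Hypothetical helper: decide where an incoming segment should go.
function deliver(segment) {
  const service = services.get(segment.destPort);
  return service === undefined
    ? `port ${segment.destPort}: no listener`
    : `port ${segment.destPort}: handed to ${service}`;
}

console.log(deliver({ destPort: 22, payload: "hello" })); // prints "port 22: handed to ssh"
```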
TCP vs IP
http://www.w3schools.com/tcpip/tcpip_intro.asp
• IP is like sending a long letter as a lot of short
postcards, each one taking a potentially different route
to its destination.
• IP is communication between computers
• TCP is communication between applications
• TCP/IP
– TCP takes care of the communication between your
application software (e.g. your browser) and your network
software.
– IP takes care of the communication with other computers.
– TCP is responsible for breaking data down into IP packets
before they are sent, and for assembling the packets when
they arrive.
– IP is responsible for sending the packets to the correct
destination.
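The postcard analogy can be sketched in a few lines: the message is cut into numbered pieces, the pieces arrive in some other order, and the receiver puts them back together by sequence number. This is a simplified illustration only (no retransmission, no ACKs); the names `segment` and `reassemble` are assumptions and not part of any real TCP API.

```javascript
// Sketch: split a message into numbered segments and reassemble them
// by sequence number, regardless of the order in which they arrive.
function segment(message, size) {
  const segments = [];
  for (let offset = 0; offset < message.length; offset += size) {
    // The sequence number is the offset of the first character in the segment.
    segments.push({ seq: offset, data: message.slice(offset, offset + size) });
  }
  return segments;
}

function reassemble(segments) {
  // Sort a copy by sequence number so arrival order does not matter.
  return [...segments]
    .sort((a, b) => a.seq - b.seq)
    .map((s) => s.data)
    .join("");
}

const sent = segment("a long letter sent as short postcards", 8);
const arrived = [...sent].reverse(); // simulate out-of-order arrival
console.log(reassemble(arrived)); // prints "a long letter sent as short postcards"
```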
IP Addresses
128.34.56.100 (Network part, then Host part)
• Encodes both the Network and the Host
• IPv4
– 32 bits, i.e. 4 bytes, each written as a decimal value 0…255
– Running out of addresses
• IPv6 (will soon take over)
– 128 bits
• IP addresses are assigned statically or dynamically
(using Dynamic Host Configuration Protocol)
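To see that a dotted IPv4 address really is just a 32-bit number, the sketch below converts the example address to its integer value and back. The function names `ipv4ToInt` and `intToIpv4` are assumptions made for this illustration.

```javascript
// Sketch: convert a dotted-quad IPv4 address to its 32-bit value and back.
function ipv4ToInt(address) {
  const parts = address.split(".");
  const valid =
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) {
    throw new Error(`not a valid IPv4 address: ${address}`);
  }
  const [a, b, c, d] = parts.map(Number);
  // ">>> 0" keeps the result as an unsigned 32-bit number.
  return ((a << 24) | (b << 16) | (c << 8) | d) >>> 0;
}

function intToIpv4(value) {
  // Extract each byte, most significant first.
  return [24, 16, 8, 0].map((shift) => (value >>> shift) & 0xff).join(".");
}

const asNumber = ipv4ToInt("128.34.56.100");
console.log(asNumber, intToIpv4(asNumber)); // prints 2149726308 '128.34.56.100'
```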
Domain Names
• www.cs.sun.ac.za is a domain name
• Much easier to remember than 146.232.50.1
– Which is exactly the same thing!
• Domain Name System (DNS) servers convert
domain names into IP addresses
– Often just called Domain Name Servers or DNS
Standards Organizations
• World Wide Web Consortium (W3C)
– web standards
• Internet Engineering Task Force (IETF):
– internet protocol standards
• Internet Corporation for Assigned Names and
Numbers (ICANN):
– domain names
The Players
• Web Browsers
– Software Application
– Displays web pages
– IE, Firefox, Chrome, Safari,
Opera, etc.
• Web Servers
– Machine running web server
software that listens on
TCP port 80 for web page
requests (from browsers)
– Apache, Microsoft Internet
Information Server, etc.
Hypertext Transfer Protocol
• HTTP is the language of the WWW
– Commands sent from a web browser, the request
– Understood by a web server, which sends a response
– You hardly ever see these, since the browser handles them internally
• Application layer
– Above TCP (Transport) which is above IP (Network)
• Tim Berners-Lee first proposed the "WorldWideWeb" project in 1989
– Together with Robert Cailliau, he wrote a more formal proposal in 1990
– Credited with inventing the original HTTP protocol along with HTML and the
associated technology for a web server and a text-based web browser
– First version of HTTP standard only had GET command
• Commands
– GET filename : download
– POST filename : send a web form response
– PUT filename : upload
• You don’t actually need a browser: one can just
telnet to port 80 and type the commands
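As a rough sketch of what gets typed into such a telnet session, the function below builds the text of a simple request. It assumes HTTP/1.1, which needs a Host header in addition to the command line; the function name `buildRequest` is hypothetical, and the host and path are borrowed from the URL example in these slides.

```javascript
// Sketch: the text a client sends for a simple HTTP/1.1 request.
function buildRequest(method, path, host) {
  // Each line ends with CRLF; an empty line ends the header section.
  return (
    `${method} ${path} HTTP/1.1\r\n` +
    `Host: ${host}\r\n` +
    `Connection: close\r\n` +
    `\r\n`
  );
}

console.log(buildRequest("GET", "/rw334/Marks.xls", "www.cs.sun.ac.za"));
```

Sending exactly this text over a TCP connection to port 80 is all a browser does for a basic page request; the server answers with a status line, headers, and the content.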
HTTP Return Codes
• HTTP returns a code in its response, possibly followed by HTML
content
• Some of the common codes:
– 200 OK
– 404 Not Found
– 500 Internal Server Error
• Code classes:
– 1XX Informational
– 2XX Success
– 3XX Redirection
– 4XX Client Error
– 5XX Server Error
• Complete list is here:
http://en.wikipedia.org/wiki/Http_error_codes
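Since the class of a code is determined by its first digit, the mapping can be sketched as a small lookup. The function name `statusClass` and the "Unknown" fallback are assumptions for this illustration.

```javascript
// Sketch: map an HTTP status code to its class using the first digit.
function statusClass(code) {
  const classes = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
  };
  if (!Number.isInteger(code) || code < 100 || code > 599) {
    return "Unknown";
  }
  return classes[Math.floor(code / 100)];
}

console.log(statusClass(200)); // prints "Success"
console.log(statusClass(404)); // prints "Client Error"
console.log(statusClass(500)); // prints "Server Error"
```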
Uniform Resource Locator
• Subclass of URI (Uniform Resource Identifier)
– But few care about this and just stick with URL
– A URI can also be a Uniform Resource Name (URN)
• URN is analogous to a person’s name
• URL is analogous to a person’s street address
• URL: http://www.cs.sun.ac.za/rw334/Marks.xls
– Protocol: http
– Host: www.cs.sun.ac.za
– Path: /rw334/Marks.xls
• upon entering this URL into the browser, it would:
– ask the DNS server for the IP address of www.cs.sun.ac.za
– connect to that IP address at port 80
– ask the server to GET /rw334/Marks.xls
– display the resulting page on the screen
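The split into protocol, host, and path can be reproduced with the standard `URL` class available in browsers and Node.js. The helper name `describeRequest` is an assumption, and the fallback to port 80 assumes an http URL without an explicit port.

```javascript
// Sketch: break a URL into the parts the browser needs for its request.
function describeRequest(urlString) {
  const url = new URL(urlString);
  return {
    protocol: url.protocol.replace(/:$/, ""), // "http:" becomes "http"
    host: url.hostname, // the name handed to DNS
    // An empty port string means the default port (assumed to be 80 for http).
    port: url.port === "" ? 80 : Number(url.port),
    path: url.pathname,
    requestLine: `GET ${url.pathname}`,
  };
}

console.log(describeRequest("http://www.cs.sun.ac.za/rw334/Marks.xls"));
```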
More URLs
• anchor: jumps to a given section of a web page
– http://www.textpad.com/download/index.html#downloads
– the above URL fetches index.html and then jumps downward to
a part of the page labeled downloads
• port: for web servers on ports other than the default 80
– http://www.cs.washington.edu:8080/secret/money.txt
• query string: a set of parameters passed to a web program
– http://www.google.com/search?q=miserable+failure&start=10
– the above URL asks the server at www.google.com to run the
program named search and pass it two parameters:
• q (set to "miserable+failure", where + encodes a space)
• start (set to 10)
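The anchor, port, and query string can all be pulled out of a URL with the standard `URL` class. This is only a sketch; the helper name `urlExtras` is an assumption. Note that the query parameters come back decoded, so the "+" in the search term shows up as a space.

```javascript
// Sketch: extract the optional parts of a URL.
function urlExtras(urlString) {
  const url = new URL(urlString);
  return {
    // An empty string means the default port for the protocol.
    port: url.port,
    // The anchor without the leading "#"; the browser uses it locally
    // and does not send it to the server.
    anchor: url.hash.replace(/^#/, ""),
    // Query parameters as a plain object, already decoded.
    params: Object.fromEntries(url.searchParams),
  };
}

console.log(urlExtras("http://www.textpad.com/download/index.html#downloads").anchor);
console.log(urlExtras("http://www.cs.washington.edu:8080/secret/money.txt").port);
console.log(urlExtras("http://www.google.com/search?q=miserable+failure&start=10").params);
```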
Web Technologies
• Hypertext Markup Language (HTML)
– used for writing web pages
• Cascading Style Sheets (CSS)
– supplies stylistic info to web pages
• JavaScript
– allows interactive and programmable web pages
• Asynchronous JavaScript and XML (AJAX)
– allows fetching of web documents in the background for
enhanced web interaction
• PHP: Hypertext Preprocessor (PHP)
– allows the web server to create pages dynamically
HTML
• It is not WYSIWYG
– What you write is not what you see in the browser
• Next generation HTML:
– XHTML (started 2000)
• XML + HTML
– HTML5 is the new future (combining HTML and
XHTML)
• New features to handle multimedia
• Very far from a recommended standard from W3C, but
the reality is that many browsers support it already
Cascading Style Sheets (CSS)
• describe the appearance, layout, and
presentation of information on a web page
– as opposed to HTML, which describes the content of
the page
• describe how information is to be displayed
– not what is being displayed
• can be embedded in HTML document or placed
into separate .css file
– advantage of .css file
• one style sheet can be shared across many HTML
documents
Javascript
• No relation to Java
– Although you’d be surprised at the lengths Sun and Netscape
seem to have gone to in making this non-relationship
even more confusing!
• Client-side scripting language
– Functional, borrowing syntax from C
• Browsers offer support for its execution (with a VM)
• Nowadays there is also server-side Javascript
• Example:
– <script type="text/javascript" src="hello.js"></script>
– Best practice is to keep the JS in a separate file, as above
Javascript Example
http://www.w3schools.com/js/tryit.asp?filename=tryjs_events
<html>
  <head>
    <script type="text/javascript">
      function displayDate() {
        document.getElementById("demo").innerHTML = Date();
      }
    </script>
  </head>
  <body>
    <h1>My First Web Page</h1>
    <p id="demo">This is a paragraph.</p>
    <button type="button" onclick="displayDate()">Display Date</button>
  </body>
</html>
Asynchronous Javascript and XML
• AJAX
• Client-side interactive webpages
– In the old days, pages were loaded all at once
• Retrieve data from the server asynchronously
in the background without interfering with the
display and behaviour of the existing page
• Avoids full browser reloads
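A minimal sketch of the idea, assuming the modern `fetch` API rather than XMLHttpRequest or jQuery: data is requested in the background and only one element of the page is updated. The function name `refreshElement`, the injectable `fetchImpl` parameter, and the `/time` URL in the usage comment are all hypothetical.

```javascript
// Sketch: fetch data in the background and update one part of the page.
// `fetchImpl` can be swapped out so the sketch also runs outside a browser.
async function refreshElement(element, url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`request failed with status ${response.status}`);
  }
  // Only this element changes; the rest of the page is left alone.
  element.innerHTML = await response.text();
  return element;
}

// Possible use in a browser (the "/time" URL is a made-up example):
//   refreshElement(document.getElementById("demo"), "/time");
```

Because the function is asynchronous, the page stays responsive while the request is in flight, which is the point of the technique.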
PHP: Hypertext Preprocessor
• General purpose scripting language
• Widely used for generating dynamic webpages
• Embedded in the HTML and executed on the
server-side to generate the page to be
displayed
– PHP processor runs on the web-server
Roadmap
• Server-side architectures
• Google App Engine
• Google Web Toolkit
• Security
• Web APIs
• AJAX using jQuery
• Scalability
• Testing