Data Modeling - Hiram College
Applications
CPSC 363 Computer Networks
Ellen Walker
Hiram College
(Includes figures from Computer Networking by Kurose & Ross, © Addison Wesley
2002)
Review of Layers
• Layer 5: Application (messages)
– Implemented on hosts
• Layer 4: Transport (segments)
– Connection-oriented service (connects hosts)
• Layer 3: Network (datagrams / packets)
– Routing service (packets between hosts)
• Layer 2: Link (frames)
• Layer 1: Physical (bits - electric or light)
– Layers 2 & 1 in network card. Accurate transmission of bit
sequences across physical links (wires, cable, radio, etc.).
Application Layer point of view
• Client requests/establishes a connection with server
(via the transport layer)
• Client sends and receives messages across the
connection
• Client (or server) closes the connection (again via the
transport layer)
• A host can maintain multiple connections at the same
time!
– Multiple ports
– Multiple processes
Application vs. Protocol
• Network application can consist of many
components
– Data storage & retrieval
– Data formatting & presentation
– Encoding / decoding encrypted data
– Messages between client & server (or peers)
• Only the last of these uses network
application protocol
Not all standards are protocols
• Web services
– Browser-server communication (HTTP)
– Document formatting standard (HTML or XML or
PDF or JPG or …)
• Email services
– Client-server communication (SMTP, IMAP, POP)
– Message attachment formatting (MIME)
Protocols define…
• What types of messages are exchanged?
• How is the message formatted into fields?
(Syntax)
• What do the fields mean? (Semantics)
• What are legal messages and responses to
messages? (Rules)
Finding Internet Protocols
• Internet protocols are defined in RFC
(Request for Comments) documents
• A searchable database of these documents is
available at http://www.rfc-editor.org
• Not all RFC documents are standards; some
are informational, experimental, etc.
Client and Server
• Client
– Where the “user” sits, usually
– Initiates the conversation (in nearly all cases)
• Server
– Responds to client’s requests
– Provides “services” such as file storage, database
query, etc.
• Peer to Peer
– Both hosts play both roles
Process Communication
• A process is a running program with its own
program counter, registers and memory
• Most modern operating systems run many
processes simultaneously
• Processes communicate with each other
through sockets (the API between the application and the transport layer)
– Application has limited control over the transport layer
– It chooses the transport protocol and sets a few
parameters (e.g. segment size) only
Message Addressing
• Which host is this message going to?
– IP address (e.g. 143.206.149.21)
– We’ll discuss in detail later (network layer)
• Which process on that host will receive the
message?
– Port number (e.g. 80 for HTTP)
– Standard port numbers have been assigned (see
http://www.iana.org/assignments/port-numbers)
– We’ll discuss ports in detail later (transport layer)
Applications’ Demands on Transport
Layer
• Reliable Data Transfer
– How much loss is acceptable?
– None (e.g. financial applications) vs. some (e.g. multimedia)
• Bandwidth
– What transmission rate is necessary?
– Minimum requirements (e.g. streaming video) vs. use
whatever is available (e.g. web, file transfer)
• Timing
– What end-to-end delay is acceptable?
– Some applications (e.g. telephony) have strict constraints
Transport Layer Services
• TCP
– Connection-oriented (handshake)
– Reliable transmission
– Congestion control (throttling)
– No guaranteed minimum transmission rate
• UDP
– Connectionless (no handshake)
– Unreliable (no guarantee of receipt or ordering)
– No guaranteed minimum transmission rate
Examples (fig. 2.5)
Application          App-Layer Protocol   Transport Protocol
E-mail               SMTP                 TCP
Remote terminal      Telnet               TCP
Web                  HTTP                 TCP
File Transfer        FTP                  TCP
Streaming MM         Proprietary          UDP or TCP
Internet Telephony   Proprietary          UDP
Telnet
• Perhaps the simplest protocol
• Opens a TCP connection between two hosts through
a specified port
• Whatever you type is sent through the connection
• Typically used for terminal connection (now
superseded by SSH for secure connections)
• Can telnet to any port
– telnet cs.hiram.edu 23 (terminal connection; default)
– telnet cs.hiram.edu 80 (web server connection)
The World Wide Web
• Basis in hyperlinks and hypertext (documents)
– Proposed by Vannevar Bush (Memex) 1945
– “Hypertext” coined by Ted Nelson 1965
– Hypertext in education at Brown (FRESS) 1966 - 198x?
– Hypercard (Apple) 1987
– See http://ei.cs.vt.edu/book/chap1/htx_hist.html
• Historical path…
– FTP (file transfer protocol - files aren’t displayed)
– Gopher (displays directories & text files)
– Web (embedded links can link to any document)
   • Search engines
   • Multimedia indexes
   • Front ends to databases
   • ETC!
Web Application Vocabulary
• Web page (document) - collection of objects
– Usually base HTML file + several referenced
objects
• Object - any file addressable by a single URL
• URL (Uniform Resource Locator) - how to
reach an object
– Host address + object’s path name
• Browser - user agent (client) for the web
• Web server - houses objects
HyperText Transfer Protocol (HTTP)
• Request / response protocol
– Client requests an object (URL)
– Server provides the object(s) corresponding to that URL
– Non-persistent (1 object only) vs persistent (explicit close)
• Stateless protocol
– The server doesn’t store any knowledge (state) of the client
– E.g. server doesn’t remember what pages client looked at
HTTP Uses TCP
• Client initiates TCP connection to server, port
80
• Server accepts connection
• Client sends HTTP message & Server
responds (one or more times)
• TCP connection closed
Non-persistent HTTP conversation
• Client to server: “[address] I’d like to talk to you”
• Server to client: “OK, I will talk with you”
• Client to server: “Thanks. Please send me [path]”
• Server to client: “Here’s the object you asked for.
[Object] Goodbye.”
• Client to server: “Got it. Goodbye”
• If the object contains embedded links, an additional
complete conversation is needed for each!
How long does it take?
• Define RTT (Round Trip Time) as time for
one message to travel from client to server &
server to client (includes all delays)
• Total time is 2*RTT+ file transmission
– Beginning of handshake (1 RTT)
– End of handshake + transmit request + first packet
of response (1 RTT)
– Transmit the rest of the file (depends on file size)
• If file has 10 images, 22*RTT + file times
Persistent Transmission
• TCP connection remains open until explicitly
closed.
• Previous example now takes 1 RTT for setup,
plus 1 RTT per request (11 requests), plus file time
(12 RTT + file time)
• With pipelining, requests for the embedded objects are sent
back to back, so server is never idle. 1 RTT for setup,
1 RTT for the base file, plus about 1 RTT for all 10 images
(they cannot be requested until the base file has arrived).
Example now takes about 3 RTT + file time.
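The round-trip counts for the 10-image example can be checked with a small sketch. The function names are illustrative only, and file transmission time is left out. The pipelined case rests on an assumption: the browser cannot request the images until the base page has arrived, so this sketch counts 3 round trips for it.

```python
def nonpersistent_rtts(num_objects: int) -> int:
    """Every object pays for its own handshake (1 RTT) and request/response (1 RTT)."""
    return 2 * num_objects


def persistent_rtts(num_objects: int) -> int:
    """One handshake (1 RTT), then one request/response round trip per object."""
    return 1 + num_objects


def pipelined_rtts(num_embedded: int) -> int:
    """Handshake + base page (2 RTT), then all embedded objects in about 1 RTT.

    Assumption: embedded objects are only known after the base page arrives.
    """
    return 2 + (1 if num_embedded > 0 else 0)


embedded = 10          # images referenced by the base HTML file
total = embedded + 1   # base file plus images

print(nonpersistent_rtts(total))  # 22
print(persistent_rtts(total))     # 12
print(pipelined_rtts(embedded))   # 3
```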
Message Format (Request)
• Request line (command, addr, protocol)
GET /~walkerel/cs363/index.html HTTP/1.1
• Header lines (fields & values)
Host: cs.hiram.edu
Connection: close           (nonpersistent)
User-agent: Mozilla/4.0     (browser id)
Accept-language: en         (preferred lang.)
• Entity body (for POST) contains contents of
forms filled out
• End with 2 CR/LFs
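A minimal sketch of how a client might assemble such a request as bytes. The helper name build_get_request is made up for illustration; the header set is the one from the slide. Each line ends in CR/LF, and a blank line closes the header section.

```python
def build_get_request(host: str, path: str, persistent: bool = False) -> bytes:
    """Return a GET request: request line, header lines, then a blank line."""
    lines = [
        f"GET {path} HTTP/1.1",                                    # request line
        f"Host: {host}",                                           # header lines
        f"Connection: {'keep-alive' if persistent else 'close'}",
        "User-agent: Mozilla/4.0",
        "Accept-language: en",
    ]
    # Join with CR/LF; the extra CR/LF pair produces the terminating blank line.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_get_request("cs.hiram.edu", "/~walkerel/cs363/index.html")
print(request.decode("ascii"))
```

These bytes are what would be written to a TCP connection on port 80.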
HTTP Commands
• GET path
– Get a file
• GET path?var=val…
– Get a file, specify values (from form)
• POST path
– Run a program that resides at the specified path
– Program will generate a web page, which is the
server’s response
Additional Commands
• HEAD
– Requests header lines but not the actual file
• PUT (1.1 only)
– Uploads file to path specified in URL field
• DELETE (1.1 only)
– Deletes file specified by URL field
Message Format (Response)
• Status line (protocol, status code, message)
HTTP/1.1 200 OK
• Header lines
Connection: close
Date: Sat, 25 Jan 2003 12:15:00 GMT
Server: Apache/1.3.0 (Unix)
Last-Modified: …
Content-Length: …
Content-Type: text/html
• Data
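The response layout can be taken apart the same way. This is a sketch, not a full HTTP parser: parse_response_head is a hypothetical helper, and the sample bytes are invented to match the slide's format.

```python
def parse_response_head(raw: bytes):
    """Split a response into status line fields, a header dict, and the data."""
    head, _, body = raw.partition(b"\r\n\r\n")       # blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    protocol, code, message = lines[0].split(" ", 2)  # status line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return protocol, int(code), message, headers, body


sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Connection: close\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<p>hello</p>\n"
)
protocol, code, message, headers, body = parse_response_head(sample)
print(protocol, code, message, headers["content-type"], len(body))
```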
Status Codes
• 200 OK
• 301 Moved Permanently (new URL provided
in Location: header)
• 400 Bad Request (generic error message)
• 404 Not Found
• 505 HTTP Version Not Supported (on this
server)
Practice HTTP
• Telnet to your favorite server, e.g.
cs.hiram.edu, using port 80
telnet cs.hiram.edu 80
• Enter HTTP message, followed by a blank
line
GET /~walkerel/cs363/index.html HTTP/1.1
Host: cs.hiram.edu
Cookies
• Information stored by the client to identify the client to
the server
• The “cookie” is a unique identification number for the
user. (e.g. to index purchase history)
• It is stored in a “cookie file” on the client machine,
and provided to the server as part of a request
message
• The server can then create “personalized” responses
• Cookies can also authenticate, so you can “save your
password”
Cookies in HTTP
• In the HTTP Response message (from the
server)
– Set-cookie: 1253261
• In future HTTP Request messages (from the
client)
– Cookie: 1253261
Web Cache (Proxy Server)
• Stores copies of recently requested items
• Browser first requests item from Proxy Server
– If item is stored, it is sent
– Otherwise, item is retrieved from external server, stored,
then sent
• Proxy Server acts as both client and server
• Proxy server can also refuse requests
– Prevent browsing to “inappropriate” sites
• Risk: page has changed since saved (stale page)
Advantage of Web Cache
• Reduces average response time
– Response time of “hit” is very fast (item in cache)
– Response time across network much slower than
LAN response time
– Hit rates 0.2 - 0.7 in practice (20%-70% of
accesses are repeats)
• Average response time =
– Hit rate * LAN delay + (1-Hit rate) * net delay
Example: Cache Advantage
• Assumptions (Section 2.2.6)
– LAN delay = 0.01 sec
– Net delay = 2.01 sec
– Hit rate = 40% (0.4)
• No cache
– 2.01 seconds delay
• With cache
– 0.4 * 0.01 + 0.6 * 2.01 = 0.004 + 1.206 = 1.21 seconds delay
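The same arithmetic as a short sketch, using the hit-rate formula and the three assumed values from this example (the function name is illustrative):

```python
def average_response_time(hit_rate: float, lan_delay: float, net_delay: float) -> float:
    """Hit rate * LAN delay + (1 - hit rate) * net delay."""
    return hit_rate * lan_delay + (1 - hit_rate) * net_delay


no_cache = average_response_time(0.0, 0.01, 2.01)    # every request goes out
with_cache = average_response_time(0.4, 0.01, 2.01)  # 40% served from the cache

print(round(no_cache, 2))    # 2.01
print(round(with_cache, 2))  # 1.21
```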
Protocol for Avoiding Stale Pages
• Cache requests page from server only if changed (Conditional
GET)
GET file HTTP/1.1
Host: Ipaddress
If-modified-since: date
• Response if not changed:
HTTP/1.1 304 Not Modified
Date: date
Server: server
• Response if changed is the same as before
HTTP/1.1 200 OK
Additional headers + data
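On the server side, the decision between 304 and 200 comes down to a date comparison. A minimal sketch, assuming both dates arrive as HTTP date strings in GMT; the function name and the sample dates are illustrative only:

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_get_status(last_modified: str, if_modified_since: Optional[str]) -> int:
    """Return 304 if the object is unchanged since the given date, else 200."""
    if if_modified_since is None:
        return 200  # ordinary GET: always send the object
    changed = parsedate_to_datetime(last_modified) > parsedate_to_datetime(if_modified_since)
    return 200 if changed else 304


last_modified = "Sat, 25 Jan 2003 12:15:00 GMT"
print(conditional_get_status(last_modified, "Sun, 26 Jan 2003 08:00:00 GMT"))  # 304
print(conditional_get_status(last_modified, "Fri, 24 Jan 2003 08:00:00 GMT"))  # 200
```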
File Transfer (FTP)
• Send a file from one host to another
– User can sit on “donor” or “recipient” host
• User provides authentication information
once for all transfers
– Username & password, or ‘anonymous’ & email
address
• Connection is persistent until an explicit close
• Example: ftp pub/reid.txt from rtfm.mit.edu
FTP Uses 2 Connections
• Control connection
– Sends user id, password, commands
– “Out of band” because not interspersed with data
– Port 21 (TCP)
• Data connection (TCP)
– Sends actual files
– A new data connection is created for each file
Unlike HTTP, State is maintained
• Server remembers which user is connected
– vs. HTTP Authorization header in every message
• Server remembers current directory
– vs. HTTP full path in every message
• Because state is maintained, the number of
simultaneous connections is limited, relative to
HTTP
FTP Commands
• USER username
• PASS password
• LIST (list the files in the current directory)
• RETR filename (retrieve from remote host)
• STOR filename (store onto remote host)
• Client commands aren’t quite identical (e.g.
GET, PUT) and may allow additional
arguments
Electronic Mail
• User Agent
– Allows user to send and receive email
– Generally allows access to stored email
– e.g. MS Outlook, Eudora
• Mail Server
– Delivers email, stores it in user’s mailbox (at least)
until read
– Sends off-site email; queues and retries if external
host isn’t available
SMTP (Simple Mail Transfer
Protocol)
• All messages (not just headers) restricted to
7-bit ASCII (must be encoded/decoded by
user agent)
• Transfers mail from the sender’s mail server to the
recipient’s mail server (normally no intermediate servers)
• Commands include HELO, MAIL FROM,
RCPT TO, and DATA
• To try it: telnet serverName 25
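The client side of a simple SMTP exchange can be sketched as a list of lines to type after connecting. The host name and addresses below are placeholders, and the helper name is made up. The lone "." that ends the message and the closing QUIT are standard SMTP but are not listed on the slide. Server replies and dot-stuffing are left out.

```python
def smtp_client_lines(client_host: str, sender: str, recipient: str, body: str) -> list:
    """Return the lines a client sends for one message, in order."""
    body.encode("ascii")  # raises UnicodeEncodeError if the body is not 7-bit ASCII
    lines = [
        f"HELO {client_host}",
        f"MAIL FROM:<{sender}>",
        f"RCPT TO:<{recipient}>",
        "DATA",
    ]
    lines += body.splitlines()
    lines += [".", "QUIT"]  # "." on its own line ends the message
    return lines


lines = smtp_client_lines(
    "client.example.org", "alice@example.org", "bob@example.com",
    "Subject: test\n\nHello Bob",
)
print("\n".join(lines))
```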
SMTP vs. HTTP
• HTTP is “pull protocol”, SMTP is “push
protocol”
• SMTP requires 7-bit ASCII, even for data;
HTTP allows any format
• SMTP puts all data into one message
– MIME encoding (Multipurpose Internet Mail
Extensions)
• Content-Type: and Content-Transfer-Encoding: headers
MIME Types
• Multipart/mixed
– Look for part boundaries, content headers
• Text/plain
• Text/html
• Image/gif or image/jpeg
• Application/msword, application/pdf, etc.
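A short sketch of a multipart/mixed message built with Python's standard email package. The addresses, subject, and attachment bytes are placeholders, not real data.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.org"   # placeholder address
msg["To"] = "bob@example.com"       # placeholder address
msg["Subject"] = "Photo"
msg.set_content("See the attached image.")  # becomes the text/plain part

# Adding an attachment turns the message into multipart/mixed.
# The bytes are a stand-in, not a valid image.
msg.add_attachment(b"GIF89a", maintype="image", subtype="gif", filename="photo.gif")

print(msg.get_content_type())
for part in msg.iter_parts():
    print(part.get_content_type(), part["Content-Transfer-Encoding"])
```

Printing str(msg) shows the part boundaries and the per-part content headers.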
Mail Headers
• Some from user (e.g. To, cc)
• Some from user agent (e.g. Date, From,
MIME headings)
• Some from servers (e.g. Received)
• Most user agents allow “full headers” to be
viewed.
Mail Header Example
• From: Thomas Bagley [email protected]
• Subject: ACM Member Technical Interest Service January 2010
• Date: January 26, 2010 5:46:47 PM EST
• To: Ellen Walker [email protected]
• Received: from mail.hiram.edu ([206.57.41.42]) by hiramr.hiram.edu with
Microsoft SMTPSVC(6.0.3790.3959); Tue, 26 Jan 2010 17:46:54 -0500
• Received: from smtp161.redcondor.net ([206.57.41.40]) by mail.hiram.edu with
Microsoft SMTPSVC(6.0.3790.3959); Tue, 26 Jan 2010 17:46:54 -0500
• Received: from acm26-2.acm.org ([199.222.69.107]) by smtp161.redcondor.net
({6c7b74fb-260e-4729-9476-f743470f315e}) via TCP (inbound) with ESMTP id
20100126224651153 for <[email protected]>; Tue, 26 Jan 2010 22:46:51
+0000
• Received: from acm28-8 ([192.168.1.104]) by acm26-2.acm.org (IceWarp 9.4.2)
with SMTP id IXU98141 for <[email protected]>;
Tue, 26 Jan 2010 17:46:41 -0500
Mail Header Example (Cont’d)
• X-Rc-From: [email protected]
• X-Rc-Rcpt: [email protected]
• Message-Id:
10171514.34121264546007418.JavaMail.Administrator@acm28-8
• Mime-Version: 1.0
• Content-Type: text/plain; charset=UTF-8
• Content-Transfer-Encoding: 7bit
• X-Mailer: ColdFusion 8 Application Server
• Return-Path: [email protected]
• X-Originalarrivaltime: 26 Jan 2010 22:46:54.0837 (UTC)
FILETIME=[7568BA50:01CA9ED9]
Mail Access Protocols
• Protocols for conversation between user agent and
mail server (SMTP is for mail server to mail server
communication)
• POP3
– Authorization, transaction, update (after client quit)
• IMAP
– Allows users to store mail in folders on server
– Clients can access message components (e.g. headers
only)
• WebMail (HTTP)
– Mail accessed through a web page using a browser; no
application-specific client or protocol
Domain Name System (DNS)
• Translates between mnemonic hostnames and
numeric IP addresses
– www.hiram.edu = 206.57.41.47
– Command “host” can look up an address
• Distributed database implemented in hierarchy of
DNS servers
• Application-layer protocol that allows hosts to query
this database
– Used by other application-layer protocols such as HTTP for
name-address translation
– This adds a delay to each HTTP request
Database is Distributed
• Root servers (13) point to…
• Top-level domain servers (com, org, edu, uk,…) point to…
• Authoritative servers (per organization)
• Local servers (per ISP)
Root server locations (from figure):
a Verisign, Dulles, VA
b USC-ISI Marina del Rey, CA
c Cogent, Herndon, VA (also LA)
d U Maryland College Park, MD
g US DoD Vienna, VA
h ARL Aberdeen, MD
i Autonomica, Stockholm (plus 28 other locations)
j Verisign (21 locations)
k RIPE London (also 16 other locations)
l ICANN Los Angeles, CA
m WIDE Tokyo (also Seoul, Paris, SF)
Translating a Domain (iterative)
• Ask local DNS
• Local DNS asks root DNS
• Root DNS responds with appropriate top-level DNS
• Local DNS asks top-level DNS
• TL DNS responds with appropriate
organization’s authoritative DNS
• Local DNS asks Authoritative DNS, and
receives address (which it probably caches)
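The iterative steps above can be sketched with plain dictionaries standing in for the servers. The server names are invented for illustration; the address is the www.hiram.edu example from the DNS slide. No real DNS traffic is involved.

```python
# Toy "databases": each level only knows where to send the local DNS next.
ROOT = {"edu": "edu-tld-server"}
TLD = {"edu-tld-server": {"hiram.edu": "hiram-authoritative-server"}}
AUTHORITATIVE = {"hiram-authoritative-server": {"www.hiram.edu": "206.57.41.47"}}


def resolve_iteratively(hostname: str, cache: dict):
    """Local DNS does all the asking; returns (address, list of steps taken)."""
    if hostname in cache:
        return cache[hostname], []            # answered from the local cache
    labels = hostname.split(".")
    steps = []
    tld_server = ROOT[labels[-1]]             # local DNS asks root DNS
    steps.append(("root", tld_server))
    auth_server = TLD[tld_server][".".join(labels[-2:])]  # asks top-level DNS
    steps.append(("top-level", auth_server))
    address = AUTHORITATIVE[auth_server][hostname]        # asks authoritative DNS
    steps.append(("authoritative", address))
    cache[hostname] = address                 # cache the answer for next time
    return address, steps


cache = {}
first_address, first_steps = resolve_iteratively("www.hiram.edu", cache)
second_address, second_steps = resolve_iteratively("www.hiram.edu", cache)
print(first_address, len(first_steps), len(second_steps))
```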
Translating a Domain (recursive)
• Ask local DNS
• Local DNS asks root DNS
• Root DNS asks top-level DNS
• TL DNS asks organization’s authoritative
DNS
• Organization’s DNS responds to TL DNS,
which forwards to Root DNS, which forwards
to local DNS, which responds to original client
• (Caching as appropriate throughout)
DNS Record Types
• A
– Name = hostname, value = IP address
• NS
– Name = domain (hiram.edu), value = hostname of
authoritative DNS
• CNAME
– Name = alias hostname (foo.com), value = canonical (real)
hostname (relay1.bar.foo.com)
• MX
– Name = mail domain alias (gmail.com), value = canonical (real)
hostname of its mail server (gsmtp183.google.com)
Creating a new DNS Record
• Entity “registers” the site, by paying a
registrar and providing authoritative DNS
server IP addresses
• Registrar verifies uniqueness of name and
enters NS and A type records into database
• You provide A and MX records for your own
servers at your authoritative DNS servers
Why Not Central Name Server?
• Single point of failure
• Too many requests
• Does not scale !!!!!
P2P (Peer to Peer) Applications
• All content transferred directly between peers without
passing through third-party servers
– Does not rely on always-on (24/7) servers
• When a client requests an object
– Find a currently connected peer (acting as server) that has that object
– That peer transmits object to client
• A given host can be both client and (transient) server
• Applications include:
– File distribution
– VOIP (e.g. Skype)
File Sharing
• Goal: Distribute a file from a single server to
many hosts
• Client-server
– Huge burden on source host (must connect to all
recipients)
• P2P
– Any host that has received some portion of the file
can redistribute to others (sharing the burden)
– Most popular (2009): BitTorrent
How long does it take? (C/S)
• F bits in file, u = server upload rate, dmin = min
download rate
• 1 server -> N clients
• Upload time = N*F/u
• Max download time = F / dmin
• Overall: max (N*F/u, F/dmin)
• Assume N is big enough, result is N*F/u
How Long Does It Take? (P2P)
• To get file out, server must send every bit
once (F/u)
• Slowest recipient will take F/dmin time to get
its complete file
• Total upload capacity is sum of upload
capacity of all uploaders:
– uTotal = u + u1 + u2…
• Overall time = max (F/u, F/dmin, N*F/uTotal)
• When N gets bigger, so does uTotal!
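A sketch comparing the two distribution-time formulas, with F as the file size in bits and N as the number of receiving hosts. The numbers are made up for illustration and all rates are in the same units.

```python
def client_server_time(F: float, N: int, u: float, d_min: float) -> float:
    """max(N*F/u, F/dmin): the server uploads N copies by itself."""
    return max(N * F / u, F / d_min)


def p2p_time(F: float, N: int, u: float, d_min: float, peer_uploads: list) -> float:
    """max(F/u, F/dmin, N*F/uTotal): peers add their upload capacity."""
    u_total = u + sum(peer_uploads)
    return max(F / u, F / d_min, N * F / u_total)


F = 1000.0     # file size (Mbits), hypothetical
u = 10.0       # server upload rate (Mbps), hypothetical
d_min = 2.0    # slowest download rate (Mbps), hypothetical
N = 100        # number of receiving hosts
peer_uploads = [1.0] * N  # each peer uploads at 1 Mbps, hypothetical

cs = client_server_time(F, N, u, d_min)
p2p = p2p_time(F, N, u, d_min, peer_uploads)
print(cs, round(p2p, 1))  # 10000.0 909.1
```

With these values the client-server time grows linearly in N, while the P2P time is held down because uTotal grows along with N.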
BitTorrent
• P2P Protocol for file distribution
• Torrent = collection of all peers involved in
distribution of a single file
• Chunks = equal-sized pieces of file (e.g.
256K)
• Tracker = special infrastructure node (one
tracker per torrent)
When a peer joins a torrent…
• Tracker sends peer N (say 50) random
addresses of hosts in the torrent
• Try to connect to all of them. (Successful
connections called ‘neighbors’)
• Ask each neighbor for lists of chunks they
have
• Request (from appropriate neighbor) each
chunk you don’t have
Rarest First
• Ask for the chunk which is held by the fewest
of my neighbors
• Result: more copies of that chunk, roughly
equalizing the availability of chunks
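Rarest-first selection can be sketched as counting, for each chunk I am missing, how many neighbors hold it. The tie-break on lowest chunk number is an assumption of this sketch, not part of the slide.

```python
from collections import Counter


def rarest_first(my_chunks: set, neighbor_chunks: dict):
    """Return the missing chunk held by the fewest neighbors, or None."""
    counts = Counter()
    for chunks in neighbor_chunks.values():
        counts.update(chunks - my_chunks)   # only count chunks I still need
    if not counts:
        return None                         # nothing left to request
    # Fewest holders first; ties broken by lowest chunk number (assumption).
    return min(counts, key=lambda chunk: (counts[chunk], chunk))


neighbors = {"A": {0, 1, 2}, "B": {1, 2, 3}, "C": {1}}
choice = rarest_first({0}, neighbors)
print(choice)  # 3 (held only by B)
```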
Responding to Requests (Tit for Tat)
• Host receives many requests
• Respond to requests from 4 neighbors
sending bits at highest rate (unchoked)
– These are fastest
– These are ‘most generous’
• Respond to requests from a fifth neighbor at
random (optimistically unchoked)
– Might become ‘top-4’ of this neighbor!
– Allows more neighbors to get in on the action.
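The unchoking rule can be sketched as "top 4 by measured rate, plus one random other". The neighbor names and rates are invented, and how a real client measures the rates or how often it re-evaluates is outside this sketch.

```python
import random


def choose_unchoked(rates: dict, top_n: int = 4, rng=random):
    """Return (top_n fastest neighbors, one random optimistic unchoke or None)."""
    ranked = sorted(rates, key=rates.get, reverse=True)  # fastest senders first
    regular = ranked[:top_n]
    others = ranked[top_n:]
    optimistic = rng.choice(others) if others else None
    return regular, optimistic


# Rates at which each neighbor is currently sending bits to me (hypothetical).
rates = {"a": 90, "b": 70, "c": 60, "d": 50, "e": 20, "f": 10, "g": 5}
regular, optimistic = choose_unchoked(rates)
print(regular, optimistic)
```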
Distributed Hash Tables
• Need to maintain searchable index of (key,
value) pairs
• Cannot contain it on a single host (point of
failure)
– Napster did this in the early days of P2P
• Distribute pairs among hosts
– How to avoid having all hosts contain all pairs?
– How to avoid having all hosts contact all hosts?
Answer: Use the Hash Value!
• Assign an integer to each host (same range
as hash values)
• Assign each (key, value) pair to the host
whose integer is closest to hash(key)
– Equal is closest, then successor (wraps around)
• Each host knows its successor (last -> first)
– circular overlay network
To Add or Find
• Peer receives message with hash(key)
• If peer is responsible for hash(key) (its ID is the
closest successor of hash(key)), then peer responds
directly to message’s sender
– ID ≥ hash(key)
– Predecessor ID < hash(key)
– (with wrap-around at the end of the circle)
• Else peer passes on the message to its
successor
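A sketch of lookup on the circular overlay, passing the message from successor to successor until it reaches the responsible peer. It assumes each peer also knows its predecessor's ID, and the peer IDs and key values are illustrative. The sorted list only simulates the ring; no real peer would hold it.

```python
def is_responsible(peer_id: int, predecessor_id: int, key_hash: int) -> bool:
    """True if peer_id is the closest successor of key_hash on the circle."""
    if predecessor_id < peer_id:
        return predecessor_id < key_hash <= peer_id
    # Wrap-around: this peer has the smallest ID, its predecessor the largest.
    return key_hash > predecessor_id or key_hash <= peer_id


def lookup(peer_ids: list, start_id: int, key_hash: int):
    """Return (responsible peer, number of hops) starting from start_id."""
    ring = sorted(peer_ids)
    index = ring.index(start_id)
    hops = 0
    while True:
        peer = ring[index]
        predecessor = ring[index - 1]      # index -1 wraps to the last peer
        if is_responsible(peer, predecessor, key_hash):
            return peer, hops
        index = (index + 1) % len(ring)    # pass the message to the successor
        hops += 1


peers = [1, 3, 4, 5, 8, 10, 12, 15]
print(lookup(peers, 3, 11))  # (12, 5)
print(lookup(peers, 3, 0))   # (1, 7): wraps around the circle
```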
Evaluating Circular Network
• Advantage:
– Every peer needs to keep track of only 2
neighbors (predecessor and successor)
• Disadvantage:
– When the circle gets big, messages take a long
time to go around!
• Solution:
– Add a few “shortcut” links across the circle
– Trades off more neighbors vs. shorter travel time
Peer Churn
• Remember, hosts come & go, not 24/7
• By the original plan, if my successor is lost, I
am disconnected!
• Instead
– Each node tracks 2 successors
– Periodically check both your successors are there
– If one is gone, find the other’s successor so you
still have 2 successors
• To join, pass a message around the circle
P2P Case study: Skype
• inherently P2P: pairs of
users communicate.
• proprietary application-layer
protocol (inferred via reverse
engineering)
• hierarchical overlay
with SNs
• Index maps usernames
to IP addresses;
distributed over SNs
(Figure labels: Skype clients (SC), Supernode (SN), Skype login server)
From Kurose & Ross Slides v. 5