Server Architectures and Web Servers
Zachary G. Ives
University of Pennsylvania
CIS 455 / 555 – Internet and Web Systems
January 22, 2008
Today
Brief discussion of the Butler Lampson paper handed
out on Monday
Server architecture (internal)
If time: Web (HTTP) servers
Upcoming:
Data on the Web
Some Context
To this point, you’ve probably had significant
experience designing programs to solve specific,
relatively small tasks
It’s often a very difficult job to build a system
(What is a computing system?)
(Why is it harder?)
We will consider in this course:
Architectural aspects [Butler Lampson article]
Algorithmic aspects [e.g., two-phase commit]
Engineering aspects [e.g., build management]
Butler Lampson
(Abbreviated Biography from His Page)
Butler Lampson is an Architect at Microsoft Corporation and
an Adjunct Professor of Computer Science and Electrical
Engineering at MIT.
He was one of the designers of the SDS 940 time-sharing
system, the Alto personal distributed computing
system, the Xerox 9700 laser printer, two-phase commit
protocols, ...
He received the ACM’s Software Systems Award in 1984 for
his work on the Alto, the IEEE Computer Pioneer award in
1996, and the Turing Award in 1992.
Historical Note: Xerox Alto
1972-78
Personal computer for
research
The first GUI-based
computer (note the mouse!)
128KB RAM, 2.5MB hard disk
Ethernet
In many ways, the forerunner
to the Xerox Star
… Which begat the Apple
Lisa, and the rest is history!
Lampson’s Advice
Designing Servers
Major issues:
Concurrency
How do we handle multiple simultaneous requests?
Statefulness and sessions
Are requests self-contained, or do they require the server to
keep around state?
Communication and consistency
What state is shared across requests?
Do all requests need the same view?
… And, of course, security!!!
Toy Example
Suppose we want to build an “arithmetic” server
Takes a request for a computation
Parses the computation request
Performs the computation
Generates an HTML document with the result
Returns the result to the requestor
Suppose we can build on TCP…
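A minimal sketch of the parse / compute / generate-HTML steps, leaving out the TCP layer. The names evaluate and handle_request, and the restriction to + - * / on numbers, are assumptions made for illustration:

```python
import ast
import html
import operator

# Supported binary operators; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate(expression: str):
    """Parse and evaluate an expression built from numbers and + - * /."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval"))

def handle_request(expression: str) -> str:
    """Turn one computation request into an HTML document with the result."""
    try:
        body = f"{expression} = {evaluate(expression)}"
    except (ValueError, SyntaxError, ZeroDivisionError) as err:
        body = f"Error: {err}"
    return f"<html><body><p>{html.escape(body)}</p></body></html>"
```

Walking the parsed tree, rather than calling eval, keeps a request from running arbitrary code on the server.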
Concurrency
One approach: a separate server for each request
Obviously this doesn’t work
Alternative: context-switching
Threads and processes
Events
Cooperative scheduling
Thread pools
Staged events
Review: Threads and Processes
Threads/processes are each written as if they are
sequential programs
But threads may also yield or wait on condition variables
Preemptive switching, based on time slicing
according to quanta (usually 10-100 ms)
States of threads: ready, running, and blocked
Different levels of sharing and overhead between the
two
Event Handlers
Basically, a programmer-specified way of breaking up
tasks
You’ve probably seen it if you’ve done any sort of GUI
programming
But it’s also used to multitask
Based on an event queue and a notion of an event
handler loop
Each task is broken into a series of events
Each event has a handler that does some work and
potentially enqueues another event
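A small sketch of an event queue and handler loop, assuming a request that splits into three events (parse, compute, respond). The event names and handler functions are hypothetical:

```python
from collections import deque

def run_event_loop(initial_events, handlers):
    """Dispatch events in FIFO order until the queue is empty."""
    queue = deque(initial_events)
    log = []
    while queue:
        name, payload = queue.popleft()
        # Each handler does a little work and may return a follow-up event.
        follow_up = handlers[name](payload, log)
        if follow_up is not None:
            queue.append(follow_up)
    return log

# One request is broken into three events: parse -> compute -> respond.
def on_parse(text, log):
    left, right = text.split("+")
    return ("compute", (int(left), int(right)))

def on_compute(operands, log):
    return ("respond", sum(operands))

def on_respond(result, log):
    log.append(f"result={result}")
    return None

HANDLERS = {"parse": on_parse, "compute": on_compute, "respond": on_respond}
```

Calling run_event_loop([("parse", "1+2"), ("parse", "10+20")], HANDLERS) interleaves the two requests on a single thread: both are parsed before either is computed.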
Thread Pools
Very commonly used (e.g., in many Apache products
including some versions of the Web server)
Fixed number of threads – say 100 or 200
As requests come in, they’re put onto a queue
Handler threads dequeue items and process them
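A minimal thread pool sketch along these lines. The class name ThreadPool and its methods are illustrative, not taken from any particular server:

```python
import queue
import threading

class ThreadPool:
    """A fixed set of worker threads that pull tasks from a shared queue."""

    def __init__(self, num_workers: int):
        self._tasks = queue.Queue()
        self._workers = [
            threading.Thread(target=self._work, daemon=True)
            for _ in range(num_workers)
        ]
        for worker in self._workers:
            worker.start()

    def _work(self):
        while True:
            task = self._tasks.get()
            if task is None:          # sentinel: shut this worker down
                break
            func, args = task
            try:
                func(*args)
            except Exception as err:  # keep the worker alive on handler errors
                print(f"handler failed: {err}")

    def submit(self, func, *args):
        """Enqueue a request; some idle worker will pick it up."""
        self._tasks.put((func, args))

    def shutdown(self):
        """Let queued tasks finish, then stop all workers."""
        for _ in self._workers:
            self._tasks.put(None)
        for worker in self._workers:
            worker.join()
```

Because the queue is FIFO, the shutdown sentinels are only seen after every task submitted before them has been dequeued.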
Other Ideas
Cooperative scheduling
“Non-preemptive multitasking”: threads execute for a
while, save state, and explicitly yield
Examples of where used: old Mac OS, Windows 2.x
Why is it bad?
Staged events (SEDA – Welsh, UCB)
Tasks are broken into explicit sub-components with
different triggering events
Better for cache behavior, etc.
Scales to thousands of tasks
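Cooperative scheduling can be sketched with Python generators, where each yield is the explicit point at which a task gives up the CPU. The names task and run_round_robin are hypothetical:

```python
from collections import deque

def task(name, steps, trace):
    """A cooperative task: do one step of work, then explicitly yield."""
    for i in range(steps):
        trace.append(f"{name}:{i}")
        yield  # hand control back to the scheduler

def run_round_robin(tasks):
    """Run tasks in turn; a task keeps the CPU until it chooses to yield."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)
            ready.append(current)   # still has work: back of the queue
        except StopIteration:
            pass                    # task finished
```

The sketch also hints at the weakness: a task that loops without ever reaching a yield would starve every other task, since nothing can preempt it.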
Concurrency and Debugging
A critical issue: how do we debug concurrent apps?
Consider:
Threads – pros and cons
Events – pros and cons
There’s no free lunch!
Statefulness and Sessions
Very early HTTP
Essentially stateless
Make a request; the response is a page that is named by the URL
More recent HTTP, and other protocols:
Some amount of state is maintained
In HTTP, this requires cookies (more later)
In many other protocols, the connection is kept open and all state is
preserved on both ends
Pros and cons of statefulness?
(Does this look at all like the threads vs. events discussion?)
Communication and Consistency
A key question: how much interaction is there
among server processes / requests?
Let’s consider:
Amazon.com
eBay
Blogger.com
iTunes
Google
Shared, Persistent State
Generally a database back-end
Recovery and reliability features
Transaction support
Simple query interface
Often the database is on a different server from the
executing code
This is what Enterprise JavaBeans are
designed to support: distributed
transactions
“Model view controller” pattern
is the most common
[Figure: AJAX game example. Controller: client-side JScript; View: XML view; Model: database]
Web (HTTP) Servers
Processes HTTP requests,
generally over TCP port 80
[Figure: an HTTP request arrives at port 80 and is processed; the response goes back over the same TCP connection, to the client's own port]
May involve:
Returning a document, with
its (MIME) type info
e.g., HTML document, TXT
document
Invoking a program or
module, returning its output
Submitting form data to a
program or module,
returning its output
Resources are described
using URLs
The URL
URL: Uniform Resource Locator
A way of encoding protocol, login, DNS (or IP) address,
path info in one string
Special case of Uniform Resource Identifier (URI)
URL is a URI for a location from which something can be retrieved
URN is a URI for a name
General syntax:
{partition/protocol}://{userid}:{password}@{domain:port}/{path}
http://me:[email protected]/index.html
news://nntp.upenn.edu
imap://email:[email protected]/folder1
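The general syntax can be checked against Python's standard urllib.parse module. A short sketch; describe_url is a hypothetical helper, and its dictionary keys follow the field names in the syntax line above:

```python
from urllib.parse import urlsplit

def describe_url(url: str) -> dict:
    """Split a URL into the pieces named in the general syntax."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "userid": parts.username,
        "password": parts.password,
        "domain": parts.hostname,
        "port": parts.port,
        "path": parts.path,
    }
```

For a made-up URL such as http://alice:secret@www.example.com:8080/docs/index.html, every field is filled in; for news://nntp.upenn.edu the user ID, password, and port come back as None and the path is empty.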
Handling a Web (HTTP) Request
1. Read and parse the request message
Most commonly, GET the contents of a URL
2. Translate the URL
Extract the “path” that is being requested
Determine if this is:
A “virtual directory” that’s an alias for something else
A reference to a file (HTML or SSI)
A reference to a script or servlet
3. Verify authorization / access rights
4. Generate the response (may be an error code)
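The four steps above can be sketched as follows. ALIASES, DOC_ROOT, PROTECTED_PREFIXES, the /cgi-bin/ convention, and the choice of status codes are all assumptions for illustration; a real server would also guard against ".." in paths:

```python
ALIASES = {"/docs/": "/var/www/manual/"}          # hypothetical virtual directory
DOC_ROOT = "/var/www/html"                        # hypothetical document root
PROTECTED_PREFIXES = ("/var/www/html/private/",)  # hypothetical access rule

def parse_request_line(message: str):
    """Step 1: read the first line, e.g. 'GET /index.html HTTP/1.1'."""
    method, target, version = message.split("\r\n", 1)[0].split(" ")
    return method, target, version

def translate(target: str):
    """Step 2: map the requested path to a kind of resource and a location."""
    path = target.split("?", 1)[0]
    for alias, real_dir in ALIASES.items():
        if path.startswith(alias):
            return "file", real_dir + path[len(alias):]
    if path.startswith("/cgi-bin/"):
        return "script", path
    return "file", DOC_ROOT + path

def handle(message: str, user=None) -> int:
    """Steps 1-4, returning only the status code of the response."""
    try:
        method, target, _ = parse_request_line(message)
    except ValueError:
        return 400                      # malformed request line
    if method != "GET":
        return 501                      # this sketch only implements GET
    kind, location = translate(target)
    # Step 3: verify access rights.
    if location.startswith(PROTECTED_PREFIXES) and user is None:
        return 401
    # Step 4: a real server would now read the file or run the script.
    return 200
```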
HTTP: HyperText Transfer Protocol
A very simple, stateless protocol for sessionless
exchanges
Browser creates a new connection each time it wants to
make a new request (for a page, image, etc.)
What are the benefits of this model? Drawbacks?
Exceptions:
HTTP 1.1 added support for persistent connections and
pipelining
Clients + servers might keep state information
Cookies provide a way of recording state
HTTP Overview
Requests:
A small number of request types (GET, POST, PUT,
DELETE)
Request may contain additional information, e.g. client
info, parameters for forms, etc.
Responses:
Response codes: 200 (OK), 404 (not found), etc.
Metadata: content’s MIME type, length, etc.
The “payload” or data
A Simple HTTP Request
GET /~cis455/index.html HTTP/1.1
If-Modified-Since: Sun, 07 Jan 2007 11:12:23 GMT
Referer: http://www.cis.upenn.edu/index.html
Requests the data at a path using the HTTP 1.1 protocol
(a complete HTTP 1.1 request must also carry a Host header)
Example response:
HTTP/1.1 200 OK
Date: Sun, 07 Jan 2007 11:12:26 GMT
Last-Modified: Wed, 14 Jan 2004 08:30:00 GMT
Content-Type: text/html
Content-Length: 3931
…
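A sketch of how a server might assemble a response like the one above: status line, headers, a blank line, then the payload. build_response is a hypothetical helper; formatdate from the standard library produces the date in the format HTTP expects:

```python
from email.utils import formatdate

def build_response(body: bytes, content_type: str = "text/html",
                   status: str = "200 OK") -> bytes:
    """Assemble status line, headers, a blank line, and the payload."""
    headers = [
        f"HTTP/1.1 {status}",
        f"Date: {formatdate(usegmt=True)}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(body)}",
    ]
    return "\r\n".join(headers).encode("ascii") + b"\r\n\r\n" + body
```

Content-Length counts the bytes of the payload only, which is why the body is passed in as bytes rather than as text.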
Request Types
GET
Retrieve the resource at a URL
PUT
Publish the specified data at a URL
DELETE
(Self-explanatory; not always supported)
POST
Submit form content
Forms: Returning Data to the Server
HTML forms allow assignments of values to variables
Two means of submitting forms to apps:
GET-style – within the URL:
GET /home/my.cgi?param=val&param2=val2
POST-style – as the data:
POST /home/second.cgi
Content-Length: 34
searchKey Penn
where www.google.com
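On the wire, browsers normally send form variables URL-encoded as name=value pairs joined by "&". A sketch using the two variables above; the Content-Type header and the request strings are illustrative:

```python
from urllib.parse import urlencode, parse_qs

fields = {"searchKey": "Penn", "where": "www.google.com"}
encoded = urlencode(fields)

# GET-style: the encoded fields ride in the URL after '?'.
get_request = f"GET /home/second.cgi?{encoded} HTTP/1.1\r\n\r\n"

# POST-style: the same string is the message body, with its length declared.
post_request = (
    "POST /home/second.cgi HTTP/1.1\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(encoded)}\r\n"
    "\r\n"
    f"{encoded}"
)

# Server side: decode the string back into variables.
decoded = parse_qs(encoded)
```

parse_qs returns a list per variable because a form may submit the same name more than once.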
Authentication and Authorization
Authentication
At minimum, user ID and password – authenticates requestor
Client may wish to authenticate the server, too!
SSL (we’ll discuss this more later)
Part of SSL: certificate issued by a trusted authority, validating the server machine
Also: public key for encrypting client’s transmissions
Authorization
Determine what user can access
For files, applications: typically, access control list
If data from database, may also have view-based security
We’ll talk about these in more detail later in the semester
Programming Support in Web Servers
Several means of supporting custom code:
CGI – Common Gateway Interface – the oldest:
A CGI is a separate program, often in Perl, invoked by the server
Certain info is passed from server to CGI via Unix-style
environment variables
QUERY_STRING, REMOTE_HOST, CONTENT_TYPE, …
HTTP post data is read from stdin
Interface to persistent process:
In essence, how communication with a database is done – Oracle
or MySQL is running “on the side”
Communicate via pipes, APIs like ODBC/JDBC, etc.
Server module running in the same process
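A sketch of the CGI side of this interface: parameters from an environment variable, POST data from standard input, and the response written to standard output. Here the environment and streams are passed in as arguments so the sketch can run without a web server; run_cgi is a hypothetical name, and CONTENT_LENGTH is the standard CGI variable giving the size of the POST data:

```python
import html
import io
from urllib.parse import parse_qs

def run_cgi(environ, stdin, stdout):
    """Handle one request the way a CGI program would."""
    # GET parameters arrive in an environment variable set by the server.
    params = parse_qs(environ.get("QUERY_STRING", ""))
    # POST data arrives on standard input; CONTENT_LENGTH says how much.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    if length:
        params.update(parse_qs(stdin.read(length)))
    # CGI output: headers, a blank line, then the document.
    stdout.write("Content-Type: text/html\r\n\r\n")
    for name, values in sorted(params.items()):
        stdout.write(f"<p>{html.escape(name)} = {html.escape(values[0])}</p>\n")

# Simulate the server invoking the program for a POST request.
out = io.StringIO()
run_cgi({"QUERY_STRING": "a=1", "CONTENT_LENGTH": "3"}, io.StringIO("b=2"), out)
```

A real CGI program would pass os.environ, sys.stdin, and sys.stdout instead; since the server starts a new process per request, nothing survives between calls.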
Two Main Types of Server Modules
Interpreters:
JavaScript/JScript, PHP, ASP, …
Often a full-fledged programming language
Code is generally embedded within HTML, not stand-alone
Custom runtimes/virtual machines:
Most modern Perl runtimes; Java servlets; ASP.NET
A virtual machine runs within the web server process
Functions are invoked within that virtual machine (e.g., the JVM) to handle each request
Code is generally written as usual, but may need to use HTML to
create UI rather than standard GUI APIs
Most of these provide (at least limited) protection mechanisms
Interfacing with a Database
A very common operation:
Read some data from a database, output in a web form
e.g., postings on Slashdot, items for a product catalog, etc.
Three problems, abstracted away by
ODBC/ADO/JDBC:
Impedance mismatch from relational DBs to objects in Java
(etc.)
Standard API for different databases
Physical implementation for each DB
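A sketch of the read-from-database, output-to-web pattern. Python's DB-API stands in here for ODBC/JDBC, with the built-in sqlite3 driver as the physical implementation; the postings table and its rows are made up for illustration:

```python
import html
import sqlite3

def postings_as_html(conn) -> str:
    """Query rows through the generic DB API and render them as a list."""
    rows = conn.execute(
        "SELECT author, title FROM postings ORDER BY id"
    ).fetchall()
    items = "".join(
        f"<li>{html.escape(title)} ({html.escape(author)})</li>"
        for author, title in rows
    )
    return f"<ul>{items}</ul>"

# Hypothetical table and data, in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE postings (id INTEGER PRIMARY KEY, author TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO postings (author, title) VALUES (?, ?)",
    [("joe", "Threads vs. events"), ("ann", "HTTP <1.1>")],
)
```

Only the connect call names a specific database; postings_as_html would work unchanged against another DB-API driver, which is the point of a standard API.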
(Cross-)Session State: Cookies
Major problem with sessionless nature of HTTP:
how do we keep info between connections?
Cookie: an opaque string associated with a web site,
stored at the browser
Create in HTTP response with “Set-Cookie: xxx”
Passed in HTTP header as “Cookie: xxx”
Interpretation is up to the application
Usually, name-value pairs separated by semicolons; passed in HTTP header:
Cookie: user=“Joe”; pwd=“blob” …
Often have an expiration
Very common: “session cookies”
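A sketch of both directions using Python's standard http.cookies module. The helper names and the cookie name "session" are hypothetical:

```python
from http.cookies import SimpleCookie

def make_set_cookie(name: str, value: str, max_age: int) -> str:
    """Server side: build the Set-Cookie header line for a response."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["max-age"] = max_age      # expiration, in seconds
    return cookie.output()

def read_cookies(header_value: str) -> dict:
    """Server side: decode the Cookie header the browser sent back."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}
```

A session cookie typically carries only an opaque identifier; the server keeps the actual session state and looks it up by that identifier on each request.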
Common Web Server Architectures
How do we handle many concurrent requests?
Approach 1 – use what the OS provides:
Fork a separate process for each request
Or spawn a separate thread
Approach 2 – write your own task switcher
Break every response into small steps
Schedule with custom event-driven dispatcher
Approach 3 – pool of handlers:
Create a thread pool that switches among requests or steps
Content Management Systems
Generally, a “middleware” that runs under the web
server (or provides its own)
Provides content integration from multiple sources
Perhaps SQL or XML databases
Perhaps text files, RSS feeds, etc.
Often provides content authoring & assembly tools
Typically, provides templates or other similar features for
describing how to assemble the site
Common examples:
MS Content Management Server; Slash; Apache Cocoon
Ways of Handling Many Requests
Web server “listens” on port 80 – “daemon” task
Upon a request, it needs to invoke a response
How should that response task get executed?
Readings
Please read for further depth:
Rexford chapter on HTTP servers
http://www.novocode.com/doc/servlet-essentials/ for an
overview of servlets
You will need to learn:
Enough about HTTP to handle GET, POST, cookies, etc.
Enough about Java threads to write your own thread
pools for a Web server
Enough about servlets to run them (including sessions)