Transcript Week 2

INFO 320
Server Technology I
Week 2
Server architectures
INFO 320 week 2
1
www.ischool.drexel.edu
OS and Server Architecture
• Last week we outlined the basic functions
of an operating system
• Since an OS exists to serve as a
connection between apps and the
hardware, what kind of hardware is
available and how it’s used are critical
things to consider
• …So what is a server architecture?
Server Architecture
• Key issues in server architecture include
– What is the extent of centralization or
distribution of functions?
• There are many possible answers, not just A or B
– The main functions are managing data,
performing processing (e.g. running apps),
and determining how to display the results
to the user
• In other words, who does what, and where, in your system?
Server Architecture
• Other issues to keep in mind include
– Reliability
– Availability
– Security
– Performance
Server Architecture
• So defining server architecture is a key
step in the larger process of designing
a network
• Once the architecture is set, we can
work on details such as
– How many of each server are needed?
– How big are they (CPUs, RAM, storage)?
– What kind of links are needed among them?
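One common way to get a first answer to the "how many servers" question is a back-of-the-envelope capacity estimate: divide the expected peak load by what one server can comfortably handle, then add spare servers for failures. The Python sketch below is only illustrative; the function name, the 70% target utilization, the single spare server, and the workload figures are assumptions, not values from these notes. Real sizing should be based on measured load.

```python
import math


def servers_needed(peak_requests_per_sec: float,
                   requests_per_server_per_sec: float,
                   target_utilization: float = 0.7,
                   spare_servers: int = 1) -> int:
    """Rough server count: enough servers to carry the peak load while each runs
    at no more than target_utilization, plus spares so a failure is tolerated."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    usable_per_server = requests_per_server_per_sec * target_utilization
    return math.ceil(peak_requests_per_sec / usable_per_server) + spare_servers


# Hypothetical workload: 1,200 requests/s at peak; one server handles 250/s flat out.
# 1200 / (250 * 0.7) = 6.86 -> 7 servers, plus 1 spare = 8
print(servers_needed(1200, 250))  # 8
```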
Centralization
• Centralizing all or some aspects of a
system can be good
– Take advantage of economies of scale
– Easier to staff support people
– Easier to control procurement
– Easier to enforce programming and data
structure standards
– Easier to manage security
Centralization
• We can centralize computers, as was
done with mainframes
• Centralization doesn’t necessarily apply
to the entire system though
• We could centralize processing
– Data processing, payroll, apps unique to a
given department (CAD) might be centralized
• We could centralize data
– Big database server(s)
Distributed data processing
• Distributed data processing (DDP) is a
possible step away from centralization
– Servers are distributed throughout the
organization in order to meet operational,
economic, and/or geographic needs
– Could still have a larger central facility with
satellite facilities, or all peer facilities
Distributed data processing
• DDP advantages include
– Responsiveness to local needs
– Higher availability, more redundancy to
minimize impact of a single system failure
– Resource sharing can still be done with
expensive hardware
– Incremental growth is easier
• Avoids all or nothing upgrades
– More user involvement, control, productivity
Distributed data processing
• DDP operating systems need
– Good networking capability to exchange data
– Ability to cluster machines for high availability
and high performance
– The ability to manage processes across the
distributed environment
Distributed processing overview
• We’ll look at critical technologies in
distributed processing
– Client/server computing
– Distributed message passing
– Remote procedure calls
– Clusters
Client/server computing
• In a client/server environment, a client
requests information from the servers
– An API (Application Programming Interface),
drivers, or other forms of middleware allows
communication between them
• Clients present the information in a user-friendly GUI format
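A minimal sketch of the request/response pattern, using Python's standard socket and socketserver modules, is shown below. For demonstration both ends run on one machine (127.0.0.1); the upper-casing "service" and the one-line-per-message protocol are hypothetical. The client only sends a request and displays the reply, while the server does the work, and TCP/IP carries the messages between them.

```python
import socket
import socketserver
import threading


class UpperCaseHandler(socketserver.StreamRequestHandler):
    """Server side: read one request line and send back one response line."""

    def handle(self):
        request = self.rfile.readline().strip()
        self.wfile.write(request.upper() + b"\n")


def ask_server(host: str, port: int, text: str) -> str:
    """Client side: connect, send a request, and wait for the reply."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall(text.encode() + b"\n")
        with conn.makefile("rb") as reader:
            return reader.readline().decode().strip()


def demo(text: str) -> str:
    # Port 0 lets the OS pick any free port on this machine.
    with socketserver.ThreadingTCPServer(("127.0.0.1", 0), UpperCaseHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        host, port = server.server_address
        try:
            return ask_server(host, port, text)
        finally:
            server.shutdown()


print(demo("hello from the client"))  # HELLO FROM THE CLIENT
```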
Client/server computing
• Servers exist to provide shared services
to clients
– What kind of servers could we see?
• Also keep in mind the network connecting
the clients and servers
– Is it a LAN, WAN, the Internet, or ???
– We need to be aware of the amount of traffic
we expect the network to bear
Client/server characteristics
• A client/server architecture differs from
other distributed processing in many ways
– Strong emphasis on user-friendly apps running
on the user's own system
– Often centralize database, network
management, and utility functions to control
overhead and support costs
– Open and modular systems are increasingly
common – mix products from various vendors
Client/server characteristics
– Networking is critical, so a lot of attention
goes to network management and security
issues
• Client/server apps communicate directly,
relying on network protocols (typically
TCP/IP) to make that possible
– Even though the client and server often have
different platforms and OS’s
– Client/server apps look like Internet apps!
Client/server characteristics
Images from (Stallings, 2009)
Client/server database
• A common client/server app is to use a
database server
• The DBMS resides on the server, and is
called by the application logic
• Part of the app design challenge is to
make sure the network isn’t overwhelmed
by the data transfer expectations
Client/server database
Client/server database
• The first example is a good use of
client/server, since
– The server has the job of sorting through one
million records, at which a desktop system
might cringe
– The network doesn’t have to support moving
the entire database across itself
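The point about where the filtering happens can be illustrated with Python's built-in sqlite3 module. Here an in-memory SQLite database stands in for the database server, and the number of rows returned is used as a rough proxy for network traffic. The table, its columns, and the 100,000-row size (smaller than the one million records in the example, to keep it quick) are assumptions for illustration.

```python
import sqlite3


def build_database(n_rows: int) -> sqlite3.Connection:
    """Stand-in for the database server: a table of n_rows customer records."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, state TEXT, balance REAL)")
    db.executemany(
        "INSERT INTO customers (id, state, balance) VALUES (?, ?, ?)",
        ((i, "PA" if i % 100 == 0 else "NJ", float(i % 1000)) for i in range(n_rows)),
    )
    return db


def query_on_server(db: sqlite3.Connection) -> list:
    """Good split: the DBMS scans every record; only the matches are returned."""
    return db.execute(
        "SELECT id, balance FROM customers WHERE state = 'PA' AND balance > 500"
    ).fetchall()


def filter_on_client(db: sqlite3.Connection) -> tuple:
    """Poor split: every record is shipped to the client, which filters locally."""
    shipped = db.execute("SELECT id, state, balance FROM customers").fetchall()
    matches = [(i, bal) for (i, state, bal) in shipped if state == "PA" and bal > 500]
    return shipped, matches


db = build_database(100_000)
server_rows = query_on_server(db)
shipped, client_rows = filter_on_client(db)
print(len(server_rows), "rows returned by the server-side query")  # 400
print(len(shipped), "rows shipped for client-side filtering")       # 100000
print(sorted(server_rows) == sorted(client_rows))                   # True
```

Both approaches give the same answer, but the first moves 400 rows and the second moves all 100,000, which is the difference the network has to absorb.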
Client/server classes
• Four classes of client/server (C/S) apps
– Host-based processing, much like a
mainframe & dumb terminal, is not really C/S
– Server-based processing, the most server-heavy class of C/S processing
– Cooperative processing, where application
processing is split in an optimized way to use
the strengths of both client and server
– Client-based processing, where nearly all
application logic runs on the client, while data
validation and other database logic stay on the
server
Client/server classes
(b) is a “thin” client app
(c) and (d) are “fat” client apps
Three-tier client/server architecture
• In three-tier C/S, we now have a client, a
middle tier server, and a backend server
– The client is typically a thin client
– The middle tier is often an application server
• It acts as a server to the client, and as a client to
the backend server
– The backend server is often one or more
database servers
• The app server chooses which one is needed
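The division of labor in a three-tier design can be sketched in a few lines of Python. In this hypothetical example, ordinary objects and method calls stand in for separate machines and network requests, and the class names, database names, and records are invented for illustration.

```python
from typing import Dict


class DatabaseServer:
    """Back-end tier: holds the records for one subject area."""

    def __init__(self, name: str, rows: Dict[str, str]):
        self.name = name
        self.rows = rows

    def lookup(self, key: str) -> str:
        return self.rows.get(key, "not found")


class AppServer:
    """Middle tier: a server to its clients and a client of the back-end databases."""

    def __init__(self, backends: Dict[str, DatabaseServer]):
        self.backends = backends

    def handle_request(self, kind: str, key: str) -> str:
        backend = self.backends[kind]   # the app server chooses which database is needed
        value = backend.lookup(key)     # ...and queries it, acting as a client
        return f"{kind}:{key} -> {value} (from {backend.name})"


def thin_client(app: AppServer, kind: str, key: str) -> str:
    """Front tier: sends the request and displays whatever comes back."""
    return app.handle_request(kind, key)


app = AppServer({
    "hr": DatabaseServer("hr-db", {"e42": "Ada Lovelace"}),
    "orders": DatabaseServer("orders-db", {"o7": "shipped"}),
})
print(thin_client(app, "orders", "o7"))  # orders:o7 -> shipped (from orders-db)
print(thin_client(app, "hr", "e42"))     # hr:e42 -> Ada Lovelace (from hr-db)
```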
File consistency
• Clients and servers often cache frequently
used files
• When a file or database record is being
changed, a cached copy can become
inconsistent with the current version
• This is often addressed by locking files or
records, so the granularity at which data is
locked (whole file vs. single record) can be a
key performance issue
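The effect of lock granularity can be sketched with a toy lock manager. This is a hypothetical class built on Python's threading locks, not the interface of any particular file server or DBMS, and the file name is made up. Two writers want different records in the same file: with file-level locking the second writer is refused, while with record-level locking both can proceed.

```python
import threading


class LockManager:
    """Grants write locks at a chosen granularity: per file or per record."""

    def __init__(self, granularity: str):
        if granularity not in ("file", "record"):
            raise ValueError("granularity must be 'file' or 'record'")
        self.granularity = granularity
        self._locks = {}
        self._table_guard = threading.Lock()  # protects the _locks dictionary itself

    def _lock_for(self, filename: str, record_id: int):
        key = filename if self.granularity == "file" else (filename, record_id)
        with self._table_guard:
            if key not in self._locks:
                self._locks[key] = threading.Lock()
            return self._locks[key]

    def try_lock(self, filename: str, record_id: int) -> bool:
        """Non-blocking request; False means a conflicting lock is already held."""
        return self._lock_for(filename, record_id).acquire(blocking=False)

    def unlock(self, filename: str, record_id: int) -> None:
        self._lock_for(filename, record_id).release()


def second_writer_can_proceed(granularity: str) -> bool:
    """Writer A is updating record 1; may writer B update record 2 of the same file?"""
    mgr = LockManager(granularity)
    mgr.try_lock("accounts.db", 1)            # writer A: nothing else is held, so this succeeds
    granted = mgr.try_lock("accounts.db", 2)  # writer B asks for a different record
    if granted:
        mgr.unlock("accounts.db", 2)
    mgr.unlock("accounts.db", 1)
    return granted


print(second_writer_can_proceed("file"))    # False: the whole file is locked
print(second_writer_can_proceed("record"))  # True: only record 1 is locked
```

Finer-grained locks let more clients work at once, but the lock manager then has to track many more locks, which is where the performance trade-off comes from.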
What is middleware?
Middleware
• Development of C/S apps has outpaced
efforts to standardize application support
tools
• APIs and other programming interfaces
help address this, and are generically
known as middleware
– ‘Common definitions are that middleware is
the "glue" between software components or
between software and the network or it is the
slash in Client/Server.’ (What is Middleware?, middleware.org)
Middleware
Middleware
• Middleware describes software that
connects two or more software
applications so they can exchange data
• There are many types of middleware,
hence the confusion
– Message Oriented Middleware, Object
Middleware, RPC Middleware, Database
Middleware, Transaction Middleware, Portals
Distributed message passing
• Within one computer, processes can
communicate through shared memory, using
mechanisms such as semaphores
• In distributed systems, processes are on
different systems that don't share memory,
so that isn't possible; they must pass
messages instead
– One issue is message reliability (did it get
there?)
– Can processing continue before getting a
response? (if so, called nonblocking or
asynchronous)
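The difference between blocking and nonblocking sends, and the use of acknowledgments for reliability, can be sketched with Python threads and queue.Queue objects. This is a simplification under stated assumptions: the threads stand in for processes on different machines, each queue stands in for a one-way network link, and the message format and sequence numbers are hypothetical. Because a Queue never loses data, the failure case below models a receiver that is not listening rather than a lost packet.

```python
import queue
import threading


def receiver(inbox: queue.Queue, acks: queue.Queue) -> None:
    """Remote process: a blocking receive, then an acknowledgment back to the sender."""
    msg = inbox.get()                  # blocks until a message arrives
    acks.put(("ack", msg["seq"]))      # reliability: confirm which message arrived


def send_blocking(inbox: queue.Queue, acks: queue.Queue, payload: str, seq: int,
                  timeout: float = 1.0) -> bool:
    """Synchronous send: do not continue until the ack arrives (or we give up)."""
    inbox.put({"seq": seq, "payload": payload})
    try:
        kind, acked_seq = acks.get(timeout=timeout)
    except queue.Empty:
        return False                   # no confirmation: the caller might retransmit
    return kind == "ack" and acked_seq == seq


def send_nonblocking(inbox: queue.Queue, payload: str, seq: int) -> None:
    """Asynchronous send: hand the message off and return immediately."""
    inbox.put({"seq": seq, "payload": payload})


inbox, acks = queue.Queue(), queue.Queue()
threading.Thread(target=receiver, args=(inbox, acks), daemon=True).start()
print(send_blocking(inbox, acks, "hello", seq=1))  # True: the receiver acknowledged it

# Nobody is listening on this link, so the blocking sender times out unconfirmed.
print(send_blocking(queue.Queue(), queue.Queue(), "anyone?", seq=2, timeout=0.1))  # False

send_nonblocking(inbox, "no reply expected", seq=3)  # returns at once; delivery unconfirmed
```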
Distributed message passing
Distributed message passing
Remote procedure calls
• Remote procedure calls (RPCs) allow
distributed systems to communicate as
though they were on the same machine
– A remote interface can have named
operations with specific types
• Allows clearly defined documentation and static
error checking
– Helps generate code automatically, and port
code to different platforms and OS’s
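A minimal RPC example using Python's standard xmlrpc modules is shown below; the add procedure is hypothetical. The client calls proxy.add(x, y) as if it were a local function, while the proxy (the client stub) sends the request to the server and waits for the result. Note that XML-RPC is loosely typed, so it does not provide static error checking; RPC systems built around an interface definition language, such as ONC RPC with rpcgen or gRPC with .proto files, generate typed stubs from the interface instead.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(x: int, y: int) -> int:
    """The remote procedure itself; it runs inside the server process."""
    return x + y


def call_remote_add(x: int, y: int) -> int:
    # Server side: publish "add" as a named operation (port 0 = any free port).
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add, "add")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        # Client side: the proxy marshals the arguments, sends them over HTTP,
        # waits for the reply, and unmarshals the result.
        with ServerProxy(f"http://{host}:{port}/") as proxy:
            return proxy.add(x, y)
    finally:
        server.shutdown()
        server.server_close()


print(call_remote_add(2, 3))  # 5
```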
Remote procedure calls
This expands on image (b) in the first distributed message passing figure.
Remote procedure calls
• Issues with using RPC include
– Passing parameters by value or by reference (pointer)
– Representation of parameters (int, float, $, …)
– Client/server binding
• Nonpersistent (always make new connection)
• Persistent (keep the same binding until it expires)
– Asynchronous (the caller keeps working while
the call is in progress) or synchronous (the
caller blocks until the result comes back)
– Object-oriented RPC (see OLE or CORBA)
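Parameter representation is an issue because the client and server may store the same value differently (byte order, integer size, floating-point format). A common remedy is for the stubs to marshal parameters into an agreed wire format, similar in spirit to Sun's XDR. The sketch below uses Python's struct module with a hypothetical message layout. Parameters are copied into the message by value; a pointer into the client's memory would mean nothing on the server.

```python
import struct

# Hypothetical wire format: procedure number (unsigned 32-bit), one signed
# 32-bit int, one 64-bit IEEE double. "!" = network (big-endian) byte order
# with standard sizes and no padding, so both ends agree on every byte.
CALL_FORMAT = "!Iid"


def marshal_call(proc_id: int, a: int, b: float) -> bytes:
    """Client stub: flatten the parameters into bytes."""
    return struct.pack(CALL_FORMAT, proc_id, a, b)


def unmarshal_call(message: bytes) -> tuple:
    """Server stub: rebuild the parameters in its own native representation."""
    return struct.unpack(CALL_FORMAT, message)


msg = marshal_call(7, -42, 3.5)
print(len(msg))             # 16 (4 + 4 + 8 bytes)
print(unmarshal_call(msg))  # (7, -42, 3.5)

# The same integer has different byte layouts in little- and big-endian order,
# which is why an agreed wire representation is needed.
print(struct.pack("<i", -42) == struct.pack(">i", -42))  # False
```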
SMP
• In order to get lots of computational power,
symmetric multiprocessing (SMP) was the
first option
– SMP has multiple processors
– They share main memory (RAM) and I/O
– They are connected by a bus
– All processors are the same type and can
perform the same functions (hence the
‘symmetric’ part)
Clustering
• As the need for more computational power
grew, clustering was developed
– What kind of problems need massive CPU
power?
• A cluster is a group of interconnected
standalone computers working together
as one
– Each computer in a cluster is a node
Clustering
• Clustering has several benefits
– Absolute scalability – can keep adding more
systems to get as much power as you can
afford
– Incremental scalability – you can add a little
more power as well, avoiding complex
upgrade paths
– High availability – each node is a standalone
computer, so if one fails the rest of the cluster
keeps running
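The availability benefit can be quantified with a simple model. Assuming each node is up with probability a, that nodes fail independently, and that any one surviving node can carry the service, the cluster is available with probability 1 - (1 - a)^n. The 0.99 figure below is hypothetical, and since real nodes share failure causes (power, network, software bugs), this is an optimistic upper bound.

```python
def cluster_availability(node_availability: float, nodes: int) -> float:
    """Probability that at least one node is up, assuming independent failures
    and that any single surviving node can carry the whole service."""
    return 1 - (1 - node_availability) ** nodes


for n in (1, 2, 3):
    print(n, round(cluster_availability(0.99, n), 6))
# 1 0.99
# 2 0.9999
# 3 0.999999
```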
Clustering
– Superior price/performance since cheap
computers can be clustered
• Clusters can be classified based on
whether they share hard disks (among
other ways)
– In the first approach, each server has its own
separate disks, and the servers communicate
via a high speed link
– In the second approach, the servers share a
disk subsystem, typically a RAID array
Clustering
Clustering
• A better approach for cluster classification
is by functionality
– Passive Standby
– Active secondary
– Separate servers
– Servers connected to disks
– Servers share disks
Clustering
• Passive Standby
– A second server takes over if the primary fails
– Easy to implement
– Wastes second server since it’s mostly
unused
– Doesn’t improve performance over a single
server
– Often not considered a true cluster
Clustering
• Active secondary
– The second server is also used for processing
tasks
– Lower cost, since the second server now does useful work
– Increased complexity
Clustering
• Separate servers (configuration (a) in the clustering figure above)
– Servers have separate disks
– Data is copied from primary to second server
– Gives high availability and high performance
– High network and server overhead due to
copying
Clustering
• Servers connected to disks
– Also called the shared nothing approach
– Servers are connected to a set of disks, but
each server has its own disks in that set
– Reduces need for copying among servers
– Often needs mirroring or RAID in case of disk
failure
– Windows Cluster Server is an example
Clustering
• Servers share disks (configuration (b) in the clustering figure above)
– Multiple servers share a set of disks
– Low network and server overhead
– Reduced risk of downtime caused by disk failure
– Requires lock manager software, plus
mirroring and/or RAID
Clustering and the OS
• Clustering produces interesting OS
problems
– Failure management
• Either a high availability approach or a fault
tolerant approach can be used
• The latter is better at handling partial transactions
if a system fails
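• Failover is the function of switching an app and
its data resources from a failed system to another
system in the cluster; the opposite, failback,
restores them to the original system once it has
been repaired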
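A failover decision is often driven by heartbeats: each node periodically reports that it is alive, and if the primary goes quiet for too long, its application moves to a standby. The Python sketch below is a hypothetical, simplified monitor using a simulated clock. It only decides which node should be active; it does not move data or clean up partially completed transactions (the fault-tolerant approach), and real cluster software must also prevent both nodes from believing they are active at once.

```python
from typing import Dict


class FailoverMonitor:
    """Decides which node runs an application, based on heartbeat timestamps."""

    def __init__(self, primary: str, standby: str, timeout_s: float):
        self.primary, self.standby = primary, standby
        self.timeout_s = timeout_s
        self.last_beat: Dict[str, float] = {primary: 0.0, standby: 0.0}
        self.active = primary

    def heartbeat(self, node: str, now: float) -> None:
        """Record an "I'm alive" message from a node."""
        self.last_beat[node] = now

    def is_alive(self, node: str, now: float) -> bool:
        return now - self.last_beat[node] <= self.timeout_s

    def check(self, now: float) -> str:
        if self.active == self.primary and not self.is_alive(self.primary, now):
            self.active = self.standby   # failover: hand the app to the standby node
        elif self.active == self.standby and self.is_alive(self.primary, now):
            self.active = self.primary   # failback: the repaired primary takes it back
        return self.active


# Simulated clock (seconds) so the example is deterministic.
mon = FailoverMonitor("node-a", "node-b", timeout_s=3.0)
print(mon.check(now=2.0))  # node-a: last heartbeat (t=0) is within the timeout
print(mon.check(now=5.0))  # node-b: no heartbeat from node-a for 5 s, so fail over
mon.heartbeat("node-a", now=6.0)
print(mon.check(now=6.5))  # node-a: the primary is back, so fail back
```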
Clustering and the OS
– Load balancing
• How do you balance how much work each system
is performing?
• A load-balancing facility must handle this and
schedule tasks accordingly
– Parallelized computation
• How is the application run on multiple systems?
– Could have a parallelizing compiler
– A parallelized application is written to run on a cluster
– Parametric computing tools can be used for simulations
that require a lot of similar runs with different conditions
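One simple load-balancing policy is to send each new task to whichever node currently has the least work queued. The sketch below assumes each job's cost is known in advance, which is a simplification; real load-balancing facilities usually measure load as they go. Node names and job costs are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def assign_jobs(job_costs: List[float], nodes: List[str]) -> Dict[str, List[int]]:
    """Greedy load balancing: give each job to the node with the least work so far."""
    heap: List[Tuple[float, str]] = [(0.0, node) for node in nodes]
    heapq.heapify(heap)
    placement: Dict[str, List[int]] = {node: [] for node in nodes}
    for job_id, cost in enumerate(job_costs):
        load, node = heapq.heappop(heap)          # currently least-loaded node
        placement[node].append(job_id)
        heapq.heappush(heap, (load + cost, node))
    return placement


print(assign_jobs([5, 3, 3, 2, 1], ["n1", "n2"]))  # {'n1': [0, 3], 'n2': [1, 2, 4]}
```

With these costs, both nodes end up with 7 units of work.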
Clustering architecture
• A cluster presents itself to the user as a
single system, the single-system image
– This is possible thanks to the clustering
middleware
– The middleware also may perform load
balancing and respond to system failures
Clustering architecture
Clustering architecture
• The single-system image provides
– Single entry point
• The user logs into the cluster, not a machine
– Single file hierarchy
• The user sees files in a single file structure
– Single control point
• There is a default node used to manage the cluster
– Single virtual networking
• Any node can access the rest of the cluster
Clustering architecture
– Single memory space
• Distributed shared memory allows programs to
share variables
– Single job management system
• A user can submit a job to run without specifying
where it runs (which node)
– Single user interface
• The same GUI supports users regardless of where
they log into the cluster
Clustering architecture
• To improve availability, the cluster also provides
– Single I/O space
• Any node can access any I/O peripheral or disk
device no matter where it is
– Single process space
• A uniform process identification scheme is used
– Checkpointing
• Saves process state and data in case of failure
– Process migration
• Enables load balancing
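Checkpointing is normally an OS or middleware service, but the idea can be shown at the application level. In this sketch the job saves its state to a file after every step, and a restarted run resumes from the last saved state rather than from scratch. On a real cluster the checkpoint would need to live on storage another node can reach; the job, the file name, and the JSON format here are assumptions for illustration.

```python
import json
import os
import tempfile
from typing import Optional


def run_with_checkpoints(total_steps: int, path: str, crash_at: Optional[int] = None) -> int:
    """Add up 1..total_steps, saving progress after every step so that a restart
    resumes from the last checkpoint instead of starting over."""
    state = {"step": 0, "total": 0}
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)             # restore the saved process state
    while state["step"] < total_steps:
        if crash_at is not None and state["step"] == crash_at:
            raise RuntimeError(f"simulated node failure at step {crash_at}")
        state["step"] += 1
        state["total"] += state["step"]
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, path)           # swap in the new checkpoint in one step


    return state["total"]


checkpoint = os.path.join(tempfile.mkdtemp(), "job.ckpt")
try:
    run_with_checkpoints(10, checkpoint, crash_at=6)
except RuntimeError as err:
    print(err)                               # simulated node failure at step 6
print(run_with_checkpoints(10, checkpoint))  # 55 (resumed from step 6, not step 0)
```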
SMP versus clustering
• SMP is a more mature technology and is easier
to manage and configure than a cluster
– SMP takes less space and power
• Clusters win when scalability, either
absolute or incremental, is critical
– Availability for clusters is also higher
Clustering examples
• Windows Cluster Server is a shared-nothing approach
• Sun Cluster is an object-oriented approach
using CORBA
– The object framework handles calls to
other nodes
– A virtual node (vnode) file system is used
Beowulf
• Beowulf (no, not the epic poem) is one of the
oldest clustering approaches, started in
1994 using clustered PCs
– Most Beowulf clusters use Linux systems,
connected by an Ethernet LAN and communicating via TCP/IP
• Each node runs an autonomous Linux
kernel, yet participates in global
namespaces
Beowulf
• Key pieces of Beowulf software are
– BPROC, the distributed process space
package, which allows a process to span
multiple nodes and can allow a new process
to be created on other nodes
– Ethernet Channel Bonding, which joins
multiple local networks into one high speed
network and does load balancing
Beowulf
– Pvmsync is a programming environment
which helps perform synchronization and
shares data objects among processes
– EnFuzion is a set of tools for parametric
computing: creating many jobs with different
input parameters or initial conditions
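EnFuzion's own interface is not shown here. As a generic illustration of parametric computing, the sketch below runs one hypothetical toy simulation for every combination of input parameters using Python's concurrent.futures. A thread pool stands in for the cluster nodes, since each run is independent and could be sent to any node.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Dict, Tuple


def simulate(growth_rate: float, initial_population: int, years: int = 10) -> float:
    """Toy model standing in for one simulation run: compound growth."""
    return initial_population * (1 + growth_rate) ** years


def parameter_sweep() -> Dict[Tuple[float, int], float]:
    rates = [0.01, 0.02, 0.05]
    populations = [100, 1000]
    cases = list(product(rates, populations))  # every combination of inputs
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Each case is an independent job; on a cluster each could run on its own node.
        results = list(pool.map(lambda case: simulate(*case), cases))
    return dict(zip(cases, results))


for (rate, population), final in parameter_sweep().items():
    print(f"rate={rate:.2f} start={population:5d} -> {final:10.1f}")
```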
References
• Stallings, W. (2009). Operating Systems: Internals and
Design Principles (6th ed.). Pearson/Prentice Hall.
ISBN 0136006329
– His web site
• What is Middleware?
http://www.middleware.org/whatis.html