Single System Image and Cluster Middleware:
Approaches, Infrastructure and Technologies

Dr. Rajkumar Buyya
Cloud Computing and Distributed Systems (CLOUDS) Lab
The University of Melbourne, Australia
www.cloudbus.org
Recap: Cluster Computer Architecture

[Figure: layered cluster architecture. Sequential and parallel applications run on a parallel programming environment, which sits on the cluster middleware (single system image and availability infrastructure). The middleware spans multiple PCs/workstations, each with its own communications software and network interface hardware, all joined by a cluster interconnection network/switch.]
Recap: Major Issues in Cluster Design

• Enhanced Performance (performance at low cost)
• Enhanced Availability (failure management)
• Single System Image (look-and-feel of one system)
• Size Scalability (physical & application)
• Fast Communication (networks & protocols)
• Load Balancing (CPU, Net, Memory, Disk)
• Security and Encryption (clusters of clusters)
• Distributed Environment (social issues)
• Manageability (administration and control)
• Programmability (simple API if required)
• Applicability (cluster-aware and non-aware applications)
A Typical Cluster Computing Environment

[Figure: software stack of a typical cluster. Applications at the top, then PVM / MPI / RSH, then an unnamed gap ("???"), then Hardware/OS at the bottom.]
The Missing Link is Provided by Cluster Middleware/Underware

[Figure: the same stack with the gap filled: Applications, PVM / MPI / RSH, Middleware, Hardware/OS.]
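To make the PVM/MPI layer of this stack concrete, here is a minimal MPI program of the kind that runs on top of such middleware. This is my illustration, not code from the slides; it assumes a working MPI installation (mpicc/mpirun).

/* Minimal MPI example: each process reports its rank and node.
 * Compile: mpicc -o hello hello.c    Run: mpirun -np 4 ./hello */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id   */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total processes     */
    MPI_Get_processor_name(name, &len);     /* which node we run on */

    printf("Process %d of %d running on node %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}

The program itself is oblivious to where its processes land; placing, starting, and connecting them across nodes is exactly the job of the middleware layer below.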
Middleware Design Goals

• Complete Transparency (Manageability):
  • Offer a single system view of the cluster: single entry point, ftp, telnet, software loading...
• Scalable Performance:
  • Easy growth of the cluster: no change of API, and automatic load distribution.
• Enhanced Availability:
  • Automatic recovery from failures
  • Employ checkpointing & fault-tolerance technologies
  • Handle consistency of data when replicated
What is Single System Image
(SSI)?

SSI is the illusion, created by software or
hardware, that presents a collection of
computing resources as one, more whole
resource.


In other words, it the property of a system that
hides the heterogeneous and distributed nature
of the available resources and presents them to
users and applications as a single unified
computing resource.
SSI makes the cluster appear like a single
machine to the user, to applications, and to
the network.
7
Cluster Middleware & SSI

• SSI is supported by a middleware layer that resides between the OS and the user-level environment.
• Middleware consists of essentially two sub-layers of software infrastructure:
  • SSI infrastructure: glues together the OSes on all nodes to offer unified access to system resources.
  • System availability infrastructure: enables cluster services such as checkpointing, automatic failover, recovery from failure, and fault-tolerance support among all nodes of the cluster.
Functional Relationship Among Middleware SSI Modules

[Figure: diagram of the functional relationship among the middleware SSI modules; not reproduced in this transcript.]
Benefits of SSI

• Use of system resources is transparent.
• Transparent process migration and load balancing across nodes.
• Improved reliability and higher availability.
• Improved system response time and performance.
• Simplified system management.
• Reduced risk of operator errors.
• No need to be aware of the underlying system architecture to use the cluster effectively.
Desired SSI Services/Functions

• Single Entry Point (see the sketch after this list):
  • telnet cluster.my_institute.edu
  • telnet node1.cluster.institute.edu
• Single User Interface: using the cluster through a single GUI window, which should provide the look and feel of managing a single resource (e.g., PARMON).
• Single File Hierarchy: /proc, NFS, xFS, AFS, etc.
• Single Control Point: management GUI.
• Single Virtual Networking.
• Single Memory Space: Network RAM / DSM.
• Single Job Management: GLUnix, SGE, LSF.
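A single entry point is commonly realized by giving the whole cluster one DNS name that maps to several node addresses. The sketch below is my illustration, not from the slides; it reuses the slide's example alias (a hypothetical hostname) and shows how a client-side lookup can return multiple candidate nodes, any of which may accept the session.

/* Resolve a (hypothetical) cluster alias and list the node
 * addresses behind it. Compile: cc -o lookup lookup.c */
#include <stdio.h>
#include <string.h>
#include <netdb.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    struct addrinfo hints, *res, *p;
    char ip[INET6_ADDRSTRLEN];

    memset(&hints, 0, sizeof hints);
    hints.ai_family   = AF_UNSPEC;     /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;

    /* "cluster.my_institute.edu" is the slide's example alias */
    if (getaddrinfo("cluster.my_institute.edu", "telnet", &hints, &res) != 0) {
        fprintf(stderr, "lookup failed\n");
        return 1;
    }
    for (p = res; p != NULL; p = p->ai_next) {
        void *addr = (p->ai_family == AF_INET)
            ? (void *)&((struct sockaddr_in  *)p->ai_addr)->sin_addr
            : (void *)&((struct sockaddr_in6 *)p->ai_addr)->sin6_addr;
        inet_ntop(p->ai_family, addr, ip, sizeof ip);
        printf("candidate node: %s\n", ip);  /* any node can take the login */
    }
    freeaddrinfo(res);
    return 0;
}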
Availability Support Functions

• Single I/O Space: any node can access any peripheral or disk device without knowledge of its physical location.
• Single Process Space: any process on any node can create processes with cluster-wide process IDs, and processes communicate through signals, pipes, etc., as if they were on a single node.
• Single Global Job Management System.
• Checkpointing and Process Migration: save process state and intermediate results, in memory or on disk, to support rollback recovery when a node fails and to support load balancing by the RMS (a minimal checkpointing sketch follows).
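To make the checkpointing idea concrete, here is a minimal application-level sketch of my own (not cluster middleware code): the program's state is periodically written to disk so that a restarted process can roll back to the last saved state instead of starting over.

/* Application-level checkpoint/rollback sketch.
 * State is saved every CKPT_EVERY iterations; on restart the
 * program resumes from the last checkpoint if one exists. */
#include <stdio.h>

#define CKPT_FILE  "state.ckpt"
#define CKPT_EVERY 1000
#define TOTAL      1000000L

struct state { long i; double sum; };

static void save(const struct state *s)
{
    FILE *f = fopen(CKPT_FILE, "wb");
    if (f) { fwrite(s, sizeof *s, 1, f); fclose(f); }
}

static int restore(struct state *s)
{
    FILE *f = fopen(CKPT_FILE, "rb");
    if (!f) return 0;
    int ok = fread(s, sizeof *s, 1, f) == 1;
    fclose(f);
    return ok;
}

int main(void)
{
    struct state s = {0, 0.0};
    if (restore(&s))                       /* rollback recovery */
        printf("resuming at iteration %ld\n", s.i);

    for (; s.i < TOTAL; s.i++) {
        if (s.i % CKPT_EVERY == 0)
            save(&s);                      /* consistent point: before this iteration */
        s.sum += (double)s.i;              /* the "real" work */
    }
    printf("done: sum = %f\n", s.sum);
    return 0;
}

Checkpointing before each block of work keeps the saved state consistent: on restart, the interrupted iteration is simply redone rather than double-counted.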
SSI Levels

SSI can be provided at different levels of abstraction:

• Application and Subsystem Level
• Operating System Kernel Level
• Hardware Level
SSI Characteristics

• Every SSI has a boundary.
• Single-system support can exist at different levels within a system, one able to be built on another.
SSI Boundaries

[Figure: an SSI boundary drawn around a batch system. Source: Pfister, In Search of Clusters.]
SSI Middleware Implementation: Layered Approach

[Figure: layered view of an SSI middleware implementation; diagram not reproduced in this transcript.]
SSI at Application and Sub-system Levels

Level       | Examples                                                           | Boundary                                               | Importance
Application | batch systems and system management; Google Search Engine         | an application                                         | what a user wants
Sub-system  | distributed DB (e.g., Oracle 10g), OSF DME, Lotus Notes, MPI, PVM | a sub-system                                           | SSI for all applications of the sub-system
File system | Sun NFS, OSF DFS, NetWare, and so on                               | shared portion of the file system                      | implicitly supports many applications and subsystems
Toolkit     | OSF DCE, Sun ONC+, Apollo Domain                                   | explicit toolkit facilities: user, service name, time  | best level of support for heterogeneous systems

(Source: © Pfister, In Search of Clusters)
SSI at OS Kernel Level

Level             | Examples                                           | Boundary                                                 | Importance
Kernel/OS layer   | Solaris MC, UnixWare, MOSIX, Sprite, Amoeba/GLUnix | each name space: files, processes, pipes, devices, etc.  | kernel support for applications and administrative subsystems
Kernel interfaces | UNIX (Sun) vnode, Locus (IBM) vproc                | type of kernel objects: files, processes, etc.           | modularizes SSI code within the kernel
Virtual memory    | none supporting the OS kernel                      | each distributed virtual memory space                    | may simplify implementation of kernel objects
Microkernel       | Mach, PARAS, Chorus, OSF/1 AD, Amoeba              | each service outside the microkernel                     | implicit SSI for all system services

(Source: © Pfister, In Search of Clusters)
SSI at Hardware Level

Level          | Examples                                          | Boundary                    | Importance
Memory         | SCI (Scalable Coherent Interface), Stanford DASH  | memory space                | better communication and synchronization
Memory and I/O | SCI, SMP techniques                               | memory and I/O device space | lower-overhead cluster I/O

(Source: © Pfister, In Search of Clusters)
SSI via the OS Path!

1. Build SSI as a layer on top of the existing OS:
   • Benefits: makes the system quickly portable, tracks vendor software upgrades, and reduces development time.
   • That is, new systems can be built quickly by mapping new services onto the functionality provided by the layer beneath (e.g., GLUnix).
2. Build SSI at the kernel level (a true cluster OS):
   • Good, but cannot leverage OS improvements from the vendor (e.g., UnixWare, Solaris MC, and MOSIX).
SSI Systems & Tools

• OS level:
  • SCO NSC UnixWare
  • Solaris MC
  • MOSIX, ...
• Subsystem level:
  • PVM/MPI, TreadMarks (DSM), GLUnix, Condor, SGE, Nimrod, PBS, ..., Aneka
• Application level:
  • PARMON, Parallel Oracle, Google, ...
UnixWare: NonStop Cluster (NSC) OS
http://www.sco.com/products/clustering/

[Figure: two UP or SMP nodes, each running standard SCO UnixWare with clustering hooks plus modular kernel extensions beneath users, applications, and systems management; standard OS kernel calls flow through the extensions, the nodes access their devices locally, and ServerNet connects them to each other and to other nodes.]
How Does NonStop Clusters Work?

Modular extensions and hooks provide:
• Single cluster-wide filesystem view;
• Transparent cluster-wide device access;
• Transparent swap-space sharing;
• Transparent cluster-wide IPC;
• High-performance internode communications;
• Transparent cluster-wide processes, migration, etc.;
• Node-down cleanup and resource failover;
• Transparent cluster-wide parallel TCP/IP networking;
• Application availability;
• Cluster-wide membership and cluster time synchronization;
• Cluster system administration;
• Load leveling.
Sun Solaris MC (Multicomputer)

• Solaris MC: a high-performance operating system for clusters
  • A distributed OS for a multicomputer: a cluster of computing nodes connected by a high-speed interconnect.
  • Provides a single system image, making the cluster appear like a single machine to the user, to applications, and to the network.
  • Built as a globalization layer on top of the existing Solaris kernel.
• Interesting features:
  • Extends the existing Solaris OS
  • Preserves existing Solaris ABI/API compliance
  • Provides support for high availability
  • Uses C++, IDL, and CORBA in the kernel
  • Leverages Spring OS technology
Solaris MC: Solaris for Multicomputers

[Figure: Solaris MC architecture. Applications sit on the system call interface; the Solaris MC layer (a C++ object framework whose object invocations reach other nodes) provides a global file system, globalized process management, and globalized networking and I/O on top of the existing Solaris 2.5 kernel.]

http://research.sun.com/techrep/1995/abstract-48.html
Solaris MC Components

[Figure: the same Solaris MC architecture, highlighting its main components:]
• Object and communication support
• High availability support
• PXFS global distributed file system
• Process management
• Networking
MOSIX: Multicomputer OS for UNIX
http://www.mosix.cs.huji.ac.il/ || mosix.org

• An OS module (layer) that provides applications with the illusion of working on a single system.
• Remote operations are performed like local operations.
• Transparent to the application; the user interface is unchanged.

[Figure: the software stack with MOSIX as the middleware layer between PVM / MPI / RSH and the Hardware/OS.]
Key Features of MOSIX

• Preemptive process migration that can migrate any process, anywhere, anytime.
  • Supervised by distributed algorithms that respond online to global resource availability, transparently.
• Load balancing: migrate processes from over-loaded to under-loaded nodes (see the sketch after this slide).
• Memory ushering: migrate processes from a node that has exhausted its memory, to prevent paging/swapping.

Download MOSIX: http://www.mosix.cs.huji.ac.il/
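The load-balancing behaviour can be sketched as a simple decision rule. This is a much-simplified illustration of my own, not MOSIX's actual algorithm (which is distributed, online, and far more sophisticated); the load figures are made up.

/* Simplified load-balancing decision: migrate work from an
 * over-loaded node to an under-loaded one. Illustration only. */
#include <stdio.h>

#define NODES 4

/* Load each node last advertised (hypothetical numbers). */
static double load[NODES] = {3.2, 0.4, 1.1, 0.2};

/* Pick a migration target for 'self': the least-loaded node,
 * but only if the imbalance exceeds the migration cost threshold. */
static int pick_target(int self, double threshold)
{
    int best = self;
    for (int n = 0; n < NODES; n++)
        if (load[n] < load[best])
            best = n;
    return (load[self] - load[best] > threshold) ? best : self;
}

int main(void)
{
    int self = 0;                          /* we are node 0 (over-loaded) */
    int target = pick_target(self, 1.0);
    if (target != self)
        printf("node %d: migrate a process to node %d\n", self, target);
    else
        printf("node %d: stay put\n", self);
    return 0;
}

The threshold matters: migration itself costs CPU and network time, so a real system only moves a process when the expected gain outweighs that cost.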
SSI at Subsystem Level:
Resource Management and Scheduling
Resource Management and Scheduling (RMS)

• The RMS system is responsible for distributing applications among cluster nodes.
• It enables effective and efficient utilization of the available resources.
• Software components:
  • Resource manager: locating and allocating computational resources, authentication, process creation and migration.
  • Resource scheduler: queuing applications, resource location and assignment; it instructs the resource manager what to do when (policy).
• Reasons for using an RMS:
  • Provide increased and reliable throughput of user applications on the system
  • Load balancing
  • Utilizing spare CPU cycles
  • Providing fault-tolerant systems
  • Managing access to powerful systems, etc.
• Basic architecture of an RMS: a client-server system.
Cluster RMS Architecture

[Figure: users (User 1 ... User u) submit jobs to the manager node, where the Job Manager queues them; the Job Scheduler, informed by the Node Status Monitor, dispatches jobs through the Resource Manager to the computation nodes (Node 1 ... Node c), and execution results are returned to the users.]
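A minimal sketch of the dispatch loop in this architecture, with made-up job and node data of my own: the job manager holds a FIFO queue, the node status monitor supplies load figures, and the scheduler sends each job to the least-loaded computation node.

/* Toy RMS dispatch loop: FIFO job queue + least-loaded-node policy.
 * All data is invented; a real RMS tracks many more resources. */
#include <stdio.h>

#define NODES 3
#define JOBS  5

struct job { int id; int cpu_need; };

static double node_load[NODES] = {0.0, 0.0, 0.0};  /* from status monitor */

static int least_loaded(void)
{
    int best = 0;
    for (int n = 1; n < NODES; n++)
        if (node_load[n] < node_load[best])
            best = n;
    return best;
}

int main(void)
{
    struct job queue[JOBS] = {{1,2},{2,1},{3,4},{4,1},{5,3}};

    for (int j = 0; j < JOBS; j++) {          /* FIFO order */
        int n = least_loaded();                /* scheduling policy */
        node_load[n] += queue[j].cpu_need;     /* resource manager "starts" job */
        printf("job %d -> node %d (load now %.1f)\n",
               queue[j].id, n, node_load[n]);
    }
    return 0;
}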
Services Provided by an RMS

• Process migration (when):
  • A computational resource has become too heavily loaded
  • For fault-tolerance reasons
• Checkpointing
• Scavenging idle cycles: most workstations are idle 70% to 90% of the time (illustrated after this list)
• Fault tolerance
• Minimization of impact on users
• Load balancing
• Multiple application queues
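Idle-cycle scavenging can be illustrated with a check like the one below. This is a Linux-specific sketch of my own, not code from any RMS: a workstation only accepts guest work when its recent load average says it is idle.

/* Idle-cycle check (Linux): read the 1-minute load average and
 * decide whether this workstation could host a scavenged job. */
#include <stdio.h>

int main(void)
{
    double load1;
    FILE *f = fopen("/proc/loadavg", "r");
    if (!f || fscanf(f, "%lf", &load1) != 1) {
        fprintf(stderr, "cannot read load average\n");
        return 1;
    }
    fclose(f);

    if (load1 < 0.1)                       /* effectively idle */
        printf("idle (load %.2f): accept guest job\n", load1);
    else
        printf("busy (load %.2f): refuse guest job\n", load1);
    return 0;
}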
Some Popular Resource Management Systems

Commercial systems:
Project | URL
LSF     | http://www.platform.com/
SGE     | http://en.wikipedia.org/wiki/Oracle_Grid_Engine
NQE     | http://www.cray.com/
LL      | http://www.ibm.com/systems/clusters/software/loadleveler/
PBS     | http://www.pbsworks.com/

Public domain systems:
Project | URL
Alchemi | http://www.alchemi.net (desktop grids)
Condor  | http://www.cs.wisc.edu/condor/
GNQS    | http://www.gnqs.org/
Pros and Cons of SSI Approaches

• Hardware level:
  • Offers the highest level of transparency, but has a rigid architecture: not flexible when extending or enhancing the system.
• Operating system level:
  • Offers full SSI, but is expensive to develop and maintain due to limited market share.
  • It cannot be developed partially; to be of benefit, full functionality needs to be developed, so it can be risky. E.g., MOSIX and Solaris MC.
• Subsystem level:
  • Easy to implement, and benefits the class of applications for which it is designed. E.g., job management systems such as PBS and SGE.
• Application level:
  • Easy to realize, but requires that each application be developed SSI-aware separately. E.g., Google.
Additional References

• R. Buyya, T. Cortes, and H. Jin, "Single System Image", International Journal of High-Performance Computing Applications (IJHPCA), Volume 15, No. 2, Summer 2001.
• G. Pfister, In Search of Clusters, Prentice Hall, USA.
• B. Walker, Open SSI Linux Cluster Project: http://openssi.org/ssi-intro.pdf