Operating System Support for MP & DSM Based Communication


Making Parallel Processing on Clusters Efficient, Transparent and Easy for Programmers
Andrzej M. Goscinski
School of Computing and Mathematics
Deakin University
Joint work with Michael Hobbs, Jackie Silcock and Justin Rough
1
Overview and Aims

Basic issues and solutions
– Parallel processing: user expectations, clusters, phases
– Parallelism management
– Transparency
– Communication paradigms
– What to do?
– Related systems
Cluster Execution environments
– Middleware
– Cluster operating systems
GENESIS
– Architecture
– Services for parallelism management and transparency
GENESIS programming interface
– Message passing
– DSM
– Primitives
Easy to Use and Program Environment
Performance Study
Summary and Future Work
2
Parallel Processing:
User Expectations

Affordable
– Supercomputers for a “poor man”
Performance
– Good performance
Transparency
– Unaware of location of processes
Ease of Use
– Free from creation and placement concerns
Ease of Programming
– Choice and easy use of communication paradigm
3
Parallel Processing:
Clusters


Clusters are an ideal
platform for the
execution of parallel
applications
Many institutions
(universities, banks,
industries) move
toward homogeneous
non-dedicated
clusters

Advantages:
– Cheap to build: commodity
PCs, networks
– Widely available
– Idle during weekends
– Low utilization during
working hours

Disadvantages:
– Poor and difficult to use
software (operating systems
and runtime systems)
– User unfriendly
– Distribution of resources
(CPUs and peripherals)
4
Parallel Processing
Phases

Three distinct phases:
– Initialization
– Execution
– Termination



Researchers and manufacturers mainly concentrate
on execution to achieve the best performance
Ease of use of parallel systems and programmer’s
time are neglected
Application developers are discouraged as they
have to program many activities, which are of an
operating system nature
5
Parallelism Management



Present operating systems that manage clusters
are not built to support parallel processing
Reason: these operating systems do not provide
services to manage parallelism
Parallelism management is the management of
parallel processes and computational resources
– Achieve high performance
– Use computational resources efficiently
– Make programming and use of parallel systems easy
6
Parallelism Management

Parallelism management in parallel programming
tools, Distributed Shared Memory and enhanced
operating system environments
– has been neglected
– left to the application developers

Application developers must deal
– not only with parallel application development
– but also with the problems of initiation and control for
the execution on the cluster

Transparency and reliability (SSI) have been
neglected – users do not see a cluster as a single
powerful computer
7
Services for Parallelism
Management on Clusters







Services for parallelism management and
transparency
Establishment of a virtual machine
Mapping of processes to computers
Parallel processes instantiation
Data (including shared) distribution
Initialisation of synchronization variables
Coordination of parallel processes
Dynamic load balancing
8
Transparency


Users should see a cluster as a single powerful
computer
Dimensions of parallel processing transparency
– Location transparency
– Process relation transparency
– Execution transparency
– Device transparency
9
Communication Paradigms
Two communication paradigms:
Message Passing (MP)
Explicit communication between processes of a parallel
application
– Fast
– Difficult to use for programmers

Distributed Shared Memory (DSM)
Implicit communication between processes of a parallel
application through shared memory objects
– Easy to use
– Demonstrates reduced performance

Claim: Operating environments that offer MP and DSM
should be provided as a part of a cluster operating system
as they manage system resources
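The contrast between the two paradigms can be sketched in a few lines. This is only an illustration under stated assumptions: Python threads stand in for the processes of a parallel application, a `queue.Queue` stands in for message passing, and a lock-protected dictionary stands in for a shared memory object. The function names are hypothetical and are not GENESIS primitives.

```python
import queue
import threading


def sum_message_passing(chunks):
    """Workers explicitly send partial sums; the parent explicitly receives them."""
    inbox = queue.Queue()

    def worker(chunk):
        inbox.put(sum(chunk))  # explicit "send" to the parent

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # One explicit "receive" per worker.
    return sum(inbox.get() for _ in chunks)


def sum_shared_memory(chunks):
    """Workers communicate implicitly by updating one shared object."""
    shared = {"total": 0}
    lock = threading.Lock()  # coordination is still needed for correctness

    def worker(chunk):
        partial = sum(chunk)
        with lock:
            shared["total"] += partial  # plain memory access, no send or receive

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["total"]


chunks = [[1, 2, 3], [4, 5], [6]]
print(sum_message_passing(chunks), sum_shared_memory(chunks))
```

In the message passing version the programmer writes every transfer; in the shared memory version the transfers disappear from the program text, which is the ease-of-use argument made for DSM.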
10
What to do?

Requirements
– Affordable
– Performance
– Transparency
– Ease of Use
– Ease of Programming
Means
– Clusters
– Operating systems
– Parallelism management
– Introduce special services
– Message passing and DSM

Development of cluster operating systems supporting parallel processing
Services of cluster operating systems:
– Distributed services for transparent communication and management of basic system resources
– Services for parallelism management and transparency
11
Related Systems
Message Passing Systems

PVM
– A set of cooperating server processes and specialized libraries that
support process communication, execution and synchronization
– A virtual machine must be set up by the user
– Provides transparent process creation and termination

MPI
– Objective is to standardize and coordinate the direction of various
message passing applications, tools and environments
– Provides limited process management functions to support parallel
processing

HARNESS
– Does not provide transparency
– Programmers are forced to specify computers, map processes to
these computers
– Load imbalance is neglected
12
Related Systems
DSM Systems



Research concentrates mainly on improving performance
Ease of use has been neglected
Munin
– Programmers must label different variables according to the
consistency protocol they require
– The initialisation stage requires the application developer to define
the number of computers to be used
– Programmers must create a thread on each computer, initialise
shared data and create synchronization variables

TreadMarks
– The application developer has a substantial input into initialisation of
DSM processes
– Full transparency is not provided
13
Related Systems
Execution Environments


PVM, MPI and DSM run on top of an operating system; the
improvement is to enhance the operating system itself to
support parallel processing
Beowulf
– Exploits distributed process space to manage parallel processes
– Processes can be started on a remote computer only after a logon
operation into that computer has completed successfully
– It addresses neither resource allocation nor load balancing
– Transparent process migration is not provided
14
Related Systems
Execution Environments

NOW
– Combines specialized libraries and server processes with
enhancement to the kernel
– Enhancement: scheduling and communication kernel modules
– GLUnix to provide network wide process, file and VM management
– Parallelism management service: process initialisation on any cluster
computer, support semi-transparent start of parallel processes on
multiple nodes (how to select nodes?), barriers, MPI

MOSIX
– Provides enhanced and transparent communication and scheduling
within the kernel
– Employs PVM to provide parallelism support (initial placement)
– Process migration transparently migrates processes
– Provides dynamic load balancing and data collection
– Remote communication is handled through the originating computer
15
Related Systems
Summary



All systems but MOSIX are based on middleware; no attempt
has been made to develop a comprehensive operating system to
support parallel processing on clusters
The solutions are performance driven – little work has been
done on making them programmer friendly
Problems from parallel processing point of view:
– Processes are created one at a time although primitives provided
enable the user to create multiple processes
– These systems (with the exception of MOSIX) do not provide
complete transparency
– Virtual machine is not set up automatically
– These systems do not provide load balancing
16
Cluster Execution Environments


Execution environments that support parallel
processing on clusters can be developed using
Middleware approach – at the application level
Underware – at the kernel level
17
Middleware
[Figure: middleware. Two alternatives (joined by OR): a user process with M over PVM software, or a user process with M over DSM software, each on top of the operating system (Unix). Layer labels: application processes; library functions or separate software; operating system.]
18
Middleware - summary

Middleware allows programmers
– to develop parallel applications (PVM, MPI)
– to execute parallel applications on clusters (Beowulf)
– to employ shared memory based programming (Munin)
– to achieve good execution performance
– to take advantage of portability
Middleware
– does not offer complete transparency
– reduces potential execution performance (services are duplicated)
– forces programmers to be involved in many time consuming and error
prone activities that are of the operating system nature

Conclusion: to provide parallelism management, offer
transparency, and make programming and use of a system easy,
develop the needed services at the operating system level
19
Cluster operating systems


A cluster is a special kind of distributed system
A cluster operating system supporting parallel processing
should
– possess the features of a distributed operating system to deal
with distributed resources and their management and hide
distribution
– exploit additional services to manage parallelism for applications
and offer complete transparency
– provide an enhanced programming environment

Three logical levels of a cluster operating system
– Basic distributed operating system
– Parallelism management and transparency system
– Programming environment
20
Logical architecture of
a cluster operating system
[Figure: logical architecture of a cluster operating system. Top: PROGRAMMING ENVIRONMENT (Message Passing/PVM, Shared Memory). Middle: Parallelism Management System, with Communication Services and DSM Services. Bottom: Enhanced Subset of a Distributed Operating System (Microkernel, Communication/File Management).]
21
GENESIS
Cluster Operating System




Proof of concept
Client-server model, microkernel approach and object based
approach (all entities have names)
All basic resources: processor, main memory, network,
interprocess communication, files are managed by relevant
servers
IPC - Message passing services
– basic communication paradigm
– cornerstone of the architecture
– provided by IPC Manager and local IPC component of microkernel


IPC placement and relationship with other services designed
to achieve high performance and transparency
DSM provided by Space (memory) and IPC Managers
22
The GENESIS Architecture
[Figure: the GENESIS architecture. Layers, top to bottom: Parallel Processes (PVM, MP, DSM); Parallelism Management System; Kernel Servers; RHODOS Microkernel. Components shown: Resource Discovery, Global Scheduler, Execution Manager, Migration Manager, DSM System, IPC Manager, Process Manager, Space Manager, Network Manager, File/Cache Manager.]
23
GENESIS Services for Parallelism
Management and Transparency





Basic services that provide parallelism management
and offer transparency:
Establishment of a virtual machine
Process creation
Process duplication
Process migration
Global scheduling
24
Establishment of
a Virtual Machine


Resource Discovery Server supports adaptive establishment
of a virtual machine
Resource Discovery Server
– Identifies
    – Idle and lightly loaded computers
    – Computer resources: e.g., processor model, memory size
    – Computational load and available memory
    – Communication patterns for each process
– Passes information to the Global Scheduling Server per
    – Process
    – Server
    – Averaged over an entire cluster
Virtual machine changes dynamically
– Some computers become overloaded or out of order
– Some computers become idle
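A minimal sketch of adaptive virtual machine establishment, assuming each computer reports a normalised load, its free memory and whether it is reachable. The `ComputerReport` record, the thresholds and the function name are hypothetical illustrations, not the Resource Discovery Server's actual interface.

```python
from dataclasses import dataclass


@dataclass
class ComputerReport:
    name: str
    load: float            # assumed scale: 0.0 = idle, 1.0 = fully loaded
    free_memory_mb: int
    reachable: bool = True  # False models a computer that is out of order


def establish_virtual_machine(reports, max_load=0.3, min_memory_mb=4):
    """Return idle or lightly loaded, reachable computers, least loaded first."""
    candidates = [
        r for r in reports
        if r.reachable and r.load <= max_load and r.free_memory_mb >= min_memory_mb
    ]
    return [r.name for r in sorted(candidates, key=lambda r: (r.load, r.name))]


reports = [
    ComputerReport("c1", 0.05, 8),
    ComputerReport("c2", 0.90, 8),                    # overloaded
    ComputerReport("c3", 0.20, 2),                    # too little free memory
    ComputerReport("c4", 0.00, 8, reachable=False),   # out of order
    ComputerReport("c5", 0.25, 16),
]
vm = establish_virtual_machine(reports)
print(vm)  # ['c1', 'c5']

# The virtual machine changes dynamically: re-run on fresh reports.
reports[1].load = 0.10  # c2 has become lightly loaded
print(establish_virtual_machine(reports))  # ['c1', 'c2', 'c5']
```

Re-running the selection on every new set of reports is the simplest way to model a virtual machine that grows and shrinks as computers become idle, overloaded or unavailable.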
25
Process Creation

Requirements
– Multiple process creation – to create many instances of a process on
a single or over many computers
– Scalability – must be scalable to many computers
– Complete transparency – must hide the location of all resources and
processes

Three forms of process creation:
– Single
– Multiple
– Group
Creation is invoked when the Execution Manager receives a
process create request from a parent process
– Execution Manager notifies Global Scheduler
– Global Scheduler sends location on which process should be created
– Execution Manager on selected computer manages process creation
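The request flow above can be sketched as follows. This is an illustration only: the class and method names are hypothetical, the "least loaded computer" policy is an assumption (the slides do not state the Global Scheduler's policy), and ordinary method calls stand in for the messages exchanged between servers.

```python
class GlobalScheduler:
    """Chooses the computer on which a new process should be created."""

    def __init__(self, loads):
        self.loads = dict(loads)  # computer name -> number of processes placed

    def select_computer(self):
        # Assumed policy: least loaded computer, ties broken by name.
        target = min(self.loads, key=lambda name: (self.loads[name], name))
        self.loads[target] += 1
        return target


class ExecutionManager:
    """One per computer; receives create requests from parent processes."""

    def __init__(self, computer, scheduler, peers):
        self.computer = computer
        self.scheduler = scheduler
        self.peers = peers      # shared registry: computer name -> ExecutionManager
        self.processes = []
        peers[computer] = self

    def create_request(self, program):
        # Steps 1 and 2: notify the Global Scheduler and receive a location.
        target = self.scheduler.select_computer()
        # Step 3: the Execution Manager on the selected computer creates the process.
        return self.peers[target].create_local(program)

    def create_local(self, program):
        pid = f"{self.computer}:{len(self.processes)}"
        self.processes.append((pid, program))
        return pid


scheduler = GlobalScheduler({"c1": 2, "c2": 0})
peers = {}
em1 = ExecutionManager("c1", scheduler, peers)
em2 = ExecutionManager("c2", scheduler, peers)
print(em1.create_request("child_prog"))  # c2:0
```

The parent only ever talks to its local Execution Manager, so the location of the child stays hidden from it.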
26
Process Creation
Single and Multiple Services

Single process creation service
– Similar to the services found in traditional systems
supporting parallel processing
– Requires executable image to be downloaded from disk for
each parallel process to be created

Multiple process creation service
– Supports the concurrent instantiation of a number of
processes on a given computer through one creation call
– When many computers are involved in multiple process
creation, each computer is addressed in a sequential manner
– Executable image of a parallel child process must be
downloaded separately for each computer involved –
scalability problem
27
Process Creation
Group


Group process creation combines multiple process
creation and group communication
Group process creation service
– Allows multiple processes to be created concurrently on many
computers
– Single executable is downloaded from a file server using
group communication
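The scalability argument for the three creation services can be made concrete by counting executable image downloads. The sketch below is a simple counting model derived from the statements on these slides (one download per process for single creation, one per computer for multiple creation, one in total for group creation); it ignores network and disk timing, and the function name is hypothetical.

```python
def image_downloads(form, computers, processes_per_computer=1):
    """Count executable image downloads needed to start all child processes."""
    if form == "single":
        # One download for each parallel process created.
        return computers * processes_per_computer
    if form == "multiple":
        # One download per computer, however many processes it starts.
        return computers
    if form == "group":
        # A single download distributed by group communication.
        return 1
    raise ValueError(f"unknown creation form: {form}")


for form in ("single", "multiple", "group"):
    print(form, image_downloads(form, computers=12, processes_per_computer=2))
```

With 12 computers and 2 processes each, the model gives 24, 12 and 1 downloads respectively, which is why only the group service is independent of cluster size.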
28
Group Process Creation
Behavior
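[Figure: group process creation behavior. Numbered steps 1 to 9 between the Parent and Exec Manager on the originating computer, the Global Scheduler, the File Server, and the Exec Managers on Computer 1, Computer 2, ... Computer n, where Child 1, Child 2, ... Child n are created.]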
29
Process Duplication
Single Local and Remote


Parallel processes are instantiated on selected computers by
employing process duplication supported by process
migration
Three forms of process duplication
– Single local and remote
– Multiple local and remote
– Group remote
Single local and remote process duplication
– Duplication is invoked when the Execution Manager receives a twin
request from a parent process
    – Execution Manager notifies Global Scheduler
    – Global Scheduler sends a location on which twin should be placed
    – If this computer is remote, process migration is employed
30
Process Duplication
Multiple Local and Remote


Multiple local and remote process duplication is an
enhancement of single process duplication
Duplication is invoked when the Execution Manager receives
a multiple duplication request from a parent process
– Execution Manager notifies Global Scheduler
– Global Scheduler sends a location on which twin should be placed
– If computer is local
    – Process Manager and Space Manager are requested to duplicate
    multiple copies of process entries and memory spaces
– If computer is remote
    – the parent process is migrated to this destination
    – multiple copies of the parent process are duplicated
    – the parent process on the remote computer is killed
Child processes should be duplicated on many computers
– Remote process duplication is performed for each selected computer
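The local and remote cases of multiple duplication can be sketched as below. This is a toy model under stated assumptions: a process is represented by a plain dictionary, `copy.deepcopy` stands in both for migrating the parent and for duplicating process entries and memory spaces, and the function name is hypothetical.

```python
import copy


def multiple_duplicate(parent_state, copies, is_remote):
    """Return (children, steps) for duplicating a parent on one computer."""
    steps = []
    working = parent_state
    if is_remote:
        # Stand-in for migrating the parent to the destination computer.
        working = copy.deepcopy(parent_state)
        steps.append("migrate parent")
    # Duplicate process entries and memory spaces once per child.
    children = [copy.deepcopy(working) for _ in range(copies)]
    steps.append(f"duplicate x{copies}")
    if is_remote:
        # The migrated copy of the parent is no longer needed.
        working = None
        steps.append("kill remote parent")
    return children, steps


parent = {"pc": 10, "data": [1, 2]}
children, steps = multiple_duplicate(parent, copies=3, is_remote=True)
print(steps)  # ['migrate parent', 'duplicate x3', 'kill remote parent']
```

For several remote computers this function would be called once per computer, which is the sequential cost that group remote duplication is meant to remove.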
31
Process Duplication
Group Remote



When more than one remote computer is involved in
process duplication the overall performance decreases
Decrease is caused by migrating a parent process to each
remote computer sequentially
Performance is improved by employing group process
migration
– Process Managers and Execution Managers each join a relevant
group and use group communication
– The parent process is concurrently migrated to all selected remote
computers involved in process duplication
32
Group Remote Process Duplication
Behavior
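[Figure: group remote process duplication behavior. Numbered steps 1 to 10 (including 5M) between the Parent, Exec Manager and Migration Manager on the originating computer, the Global Scheduler, and the Exec Managers and Migration Managers on Computer 1, Computer 2, ... Computer n, where the Parent is migrated and Child 1, Child 2, ... Child n are duplicated.]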
33
Process Migration

Designed to separate policy from mechanism
– Process Migration Manager acts as the coordinator for migration of
various resources that combine to form a process
– Migration of resources: memory, process entries, buffers is carried
out by the Space, Process and IPC Managers, respectively


Two forms of process migration: single and group
Single process migration
– Global Scheduler provides “which” process to “where” computer
– Local Manager requests its remote peer to prepare for a process
– Local Migration Manager requests Space, Process and IPC Managers
to migrate respective resources
– Remote Manager informs its local peer of successful migration
– Local Manager requests Space, Process and IPC Managers to delete
the respective resources of the migrated process
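The single process migration steps can be sketched as follows. This is a simplified model, not the GENESIS implementation: each computer keeps three dictionaries that stand in for the tables of its Space, Process and IPC Managers, method calls stand in for messages, and all names are hypothetical.

```python
RESOURCE_TABLES = ("space", "process", "ipc")


class Computer:
    def __init__(self, name):
        self.name = name
        self.space = {}    # memory spaces, held by the Space Manager
        self.process = {}  # process entries, held by the Process Manager
        self.ipc = {}      # message buffers, held by the IPC Manager


def migrate_process(pid, source, destination):
    """Copy every resource of pid to the destination, then delete the originals."""
    steps = ["prepare destination"]
    for table in RESOURCE_TABLES:
        getattr(destination, table)[pid] = getattr(source, table)[pid]
        steps.append(f"migrate {table}")
    # The remote side reports success only if every resource arrived.
    if not all(pid in getattr(destination, table) for table in RESOURCE_TABLES):
        raise RuntimeError("migration failed; source copy is kept")
    steps.append("confirm success")
    # Only after confirmation are the source resources deleted.
    for table in RESOURCE_TABLES:
        del getattr(source, table)[pid]
    steps.append("delete source copy")
    return steps


src, dst = Computer("c1"), Computer("c2")
src.space["p7"] = b"memory image"
src.process["p7"] = {"state": "ready"}
src.ipc["p7"] = ["pending message"]
print(migrate_process("p7", src, dst))
```

Deleting the source copy last means a failed transfer leaves the process intact on its original computer.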
34
Process Migration
Behavior
[Figure: process migration behavior. An Event triggers the Global Scheduler; numbered steps 1 to 7 link the Migration Managers on the Source Computer and the Destination Computer. In step 4 the Process, Space and IPC Managers on the source transfer the Process State, Spaces and IPC Buffers to their peers on the destination.]
35
Group Process Migration



Enhancement of single process migration
Achieved by changing the one-to-one communication between the
peer Migration Managers, Process Managers, Space
Managers and IPC Managers to group
communication
Global Scheduler provides “which” process to “where”
computers
– Each server migrates its respective resources to multiple
destination computers in a single message using group
communication
– Parent process is duplicated on each remote computer
– At the end of successful migration the parent process on
each remote computer is killed
36
Global Scheduling



Makes policy decisions of which processes should be
mapped to which computers
Input provided by the Resource Discovery Manager
Relies on mechanisms of
– Single, multiple and group process creation and duplication
services
– Single and group process migration

The server combines services of
– Static allocation – at the initial stage of parallel processing
– Dynamic load balancing – to react to load fluctuations

Currently, the Global Scheduler is implemented as a
centralized server
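A minimal sketch of the two services a centralized scheduler combines. The policies shown (place each new process on the least loaded computer; move one process when the load gap reaches a threshold) are assumptions chosen for illustration; the slides do not specify the Global Scheduler's actual algorithms, and the function names are hypothetical.

```python
def static_allocation(n_processes, computers):
    """Initial placement: spread processes over the least loaded computers."""
    placement = {c: 0 for c in computers}
    for _ in range(n_processes):
        target = min(placement, key=lambda c: (placement[c], c))
        placement[target] += 1
    return placement


def balance_step(load, threshold=2):
    """Dynamic load balancing: move one process if the imbalance is large enough.

    Returns (source, destination) for the process to migrate, or None.
    """
    busiest = max(load, key=lambda c: (load[c], c))
    idlest = min(load, key=lambda c: (load[c], c))
    if load[busiest] - load[idlest] < threshold:
        return None
    load[busiest] -= 1
    load[idlest] += 1
    return busiest, idlest


print(static_allocation(5, ["c1", "c2", "c3"]))  # {'c1': 2, 'c2': 2, 'c3': 1}

load = {"c1": 5, "c2": 1, "c3": 1}
print(balance_step(load), load)  # ('c1', 'c2') {'c1': 4, 'c2': 2, 'c3': 1}
```

The returned pair is the policy decision; carrying it out would be the job of the process migration mechanism.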
37
GENESIS Programming Interface

Designed and developed to provide both communication paradigms:
– Message passing
– Shared memory
[Figure: logical architecture of a cluster operating system. Top: PROGRAMMING ENVIRONMENT (Message Passing/PVM, Shared Memory). Middle: Parallelism Management System, with Communication Services and DSM Services. Bottom: Enhanced Subset of a Distributed Operating System (Microkernel, Communication/File Management).]
38
Message Passing

Basic Message Passing
– Exploits basic interprocess communication concepts
– Transparent and reliable local and remote IPC
– Integral component of GENESIS
– Offers standard message passing and RPC primitives

GENESIS PVM
– PVM added to provide a well known parallelism programming tool
– Ported from the UNIX based PVM
– Implemented within a ‘library’ in GENESIS
– Mapping of the standard PVM services onto the GENESIS services
– Performance improvement of PVM on GENESIS
    – No additional “classic” PVM server processes required
    – Direct interprocess communication model instead of the default model
    – Load balancing provided
39
Architecture of PVM on Unix
[Figure: architecture of PVM on Unix. Computer 1 and Computer 2 each run a user task linked with libpvm (User Task 1, User Task 2) and a PVM Server on top of the Kernel. Labelled links: TCP Connections and UDP Datagrams.]
40
Architecture of PVM on GENESIS
[Figure: architecture of PVM on GENESIS. Computer 1 and Computer 2 each run User PVM Parallel Processes linked with libpvm (PVM Comms) above an Execution Manager, Migration Manager, IPC Manager and Network Manager on a Microkernel. A Global Scheduler is shown, and the two computers are connected by the Network.]
41
Distributed Shared Memory


DSM is an integral component of the operating system
Since DSM is a memory management function the DSM
system is integrated into the Space Manager
– Shared memory used as though it were physically shared
– Easy to use shared memory
– Low overhead, improved performance

Two consistency models supported:
– Sequential – implemented using invalidation model
– Release – implemented using write-update model

Synchronization and coordination of processes
– Semaphores - owned by Space Manager on particular computer
– Gaining ownership is distributed and mutually exclusive
– Barriers used for coordination – their management is centralized
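The two propagation strategies named above can be contrasted with a toy model of one shared page. This is a deliberately simplified sketch: it applies every write immediately and counts one message per invalidation, update or page fetch, so it does not model when release consistency actually propagates updates. The class and method names are hypothetical, not the Space Manager's interface.

```python
class SharedPage:
    """One shared page with a local copy on every node (None = invalid copy)."""

    def __init__(self, nodes, value=0):
        self.copies = {node: value for node in nodes}
        self.owner = nodes[0]  # node holding the most recent value
        self.messages = 0

    def write_invalidate(self, writer, value):
        # Invalidation model: other valid copies are discarded before the write.
        for node in self.copies:
            if node != writer and self.copies[node] is not None:
                self.copies[node] = None
                self.messages += 1
        self.copies[writer] = value
        self.owner = writer

    def write_update(self, writer, value):
        # Write-update model: the new value is pushed to every other copy.
        for node in self.copies:
            if node != writer:
                self.messages += 1
            self.copies[node] = value
        self.owner = writer

    def read(self, reader):
        # An invalid copy must be fetched from the owner first.
        if self.copies[reader] is None:
            self.copies[reader] = self.copies[self.owner]
            self.messages += 1
        return self.copies[reader]


inv = SharedPage(["a", "b", "c"])
inv.write_invalidate("a", 7)
print(inv.read("b"), inv.messages)  # 7 3

upd = SharedPage(["a", "b", "c"])
upd.write_update("a", 7)
print(upd.read("b"), upd.messages)  # 7 2
```

Invalidation defers the cost to the next reader, while update pays it at write time; which is cheaper depends on how often the other nodes actually read the page.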
42
Distributed Shared Memory
[Figure: Distributed Shared Memory in GENESIS. Computer 1 and Computer 2 each run User DSM Parallel Processes above a Space Manager with DSM, a Process Manager and an IPC Manager on a Microkernel. Shared Memory spans the two computers, which are connected by the Network.]
43
GENESIS Primitives
Execution

Two groups of primitives
– to support execution services
– for the provision of communication and coordination services

MP               PVM            DSM
proc-ncreate()   pvm_spawn()    proc-ncreate()
proc-exit()      pvm_exit()     proc-exit()
44
GENESIS Primitives
Communication and Coordination
MP          PVM             DSM
send()      pvm_send()      read access
recv()      pvm_recv()      write access
barrier()   pvm_pkbuf()     wait()
            pvm_unpkbuf()   signal()
            pvm_barrier()   barrier()
45
Easy to Use and Program
Environment
GENESIS system



Provides an efficient and transparent environment for
execution of parallel applications
Offers transparency
Relieves programmers from activities such as:
– Selection of computers for a virtual machine for the given
application
– Setting up a virtual machine
– Mapping processes to virtual machine
– Process instantiation using process creation and duplication
supported by process migration
– Load balancing
46
Easy to Use and Program
Environment
In the GENESIS system



Location of the remote computer(s) of the cluster is
selected automatically by Global Scheduler
Users do not know process location
Programming of parallel applications has been made easy
by providing
– Message passing: standard and PVM
– Distributed Shared Memory
– Powerful primitives: implement sequences of operations and
provide transparency, e.g. process_ncreate(GROUP_CREATE, n, “child_prog”)
– Process instantiation using process creation and duplication
supported by process migration
– Load balancing
47
Performance
of Standard Parallel Applications

GENESIS System
– 13 Sun3/50 Workstations

12 Computation + 1 File Server
– 10 Mbit/sec shared Ethernet



Influence of process instantiation on execution
performance
GENESIS PVM vs. Unix PVM
Standard parallel applications
– Successive Over Relaxation
– Quicksort
– Traveling Salesman Problem
48
Influence of Process Instantiation on
Execution Performance
Simulation - amount of work relates to the overall exec time
Two parameters:
– Work load (5, 25, 50 Seconds)
– Number of workstations (1 .. 12)
Global scheduler & migration
Speedups for #comp = #proc
[Figure: Parallel Simulation (5, 25, 50 Second Workload). Three panels plotting Speed-up (0 to 13) against Number of Workstations (0 to 13), each with curves Ideal, Group, Multi and Single.]
49
GENESIS PVM vs. Unix PVM
IPC Latency
Support for IPC provided by the PVM server in Unix was substituted with
GENESIS operating system mechanisms
To measure the time saved by removing the server, a simple PVM
application that exchanges messages (1 KByte to 100 KBytes) was used
Round-trip time (including data packing and unpacking) was measured
[Figure: Round Trip Time (ms), 0 to 1600, against Message Size (KBytes); curves Genesis PVM and Unix PVM (Default Route, No Encoding).]
50
GENESIS PVM vs. Unix PVM
Speedup

The application used to study the influence of process instantiation
(amount of work relates to the overall exec time) was studied
Parameters:
– Number of workstations
– GENESIS with and without load balancing
[Figure: Maximum Speedup Achieved (1 to 11) against Number of Workstations (1 to 12); curves Optimal, Genesis PVM with Load Balancing, Genesis PVM without Load Balancing and Unix PVM.]
51
Successive Over Relaxation


Parallel applications developed based on algorithms of Rice University
Rice used superior cluster hardware: DECstation-5000/240 + fast ATM network
For 8 computers – array size: Rice - 512 x 2048 elements with 101
iterations; GENESIS 128 x 128 elements with 10 iterations
– DSM: TreadMarks – 6.3; GENESIS – 4.4
– PVM: Rice – 6.91; GENESIS – 5.1
[Figure: Speed-up (0 to 13) against Number of Workstations (0 to 13); curves Ideal, MP, PVM and DSM.]
52
Quicksort


Parallel applications developed based on algorithms of Rice University
Rice used superior cluster hardware: DECstation-5000/240 + fast ATM network
For 8 computers – array size: Rice - 256 x 1024 integers; GENESIS
256 x 256 integers
– DSM: TreadMarks – 5.3; GENESIS – 2.5
– PVM: Rice – 6.79; GENESIS – 6.07
[Figure: Speed-up (0 to 13) against Number of Workstations (0 to 13); curves Ideal, MP, PVM and DSM.]
53
Traveling Salesman Problem


Parallel applications developed based on algorithms of Rice University
Rice used superior cluster hardware: DECstation-5000/240 + fast ATM network
For 8 computers – 18 city tour; with the minimum threshold set to 13
cities
– DSM: TreadMarks – 4.74; GENESIS – 6.33
– PVM: Rice – 5.63; GENESIS – 5.94
[Figure: Speed-up (0 to 13) against Number of Workstations (0 to 13); curves Ideal, MP, PVM and DSM.]
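One way to compare the 8-computer figures quoted on the three application slides is parallel efficiency, speed-up divided by the number of computers. The sketch below assumes the quoted numbers are speed-ups (the slides do not label them, though the accompanying plots show speed-up) and uses the standard definition of efficiency. Problem sizes and hardware differ between the Rice and GENESIS runs, so the comparison is indicative only.

```python
COMPUTERS = 8

# Figures as quoted on the slides for 8 computers (assumed to be speed-ups).
reported = {
    "Successive Over Relaxation": {
        "DSM": {"TreadMarks": 6.3, "GENESIS": 4.4},
        "PVM": {"Rice": 6.91, "GENESIS": 5.1},
    },
    "Quicksort": {
        "DSM": {"TreadMarks": 5.3, "GENESIS": 2.5},
        "PVM": {"Rice": 6.79, "GENESIS": 6.07},
    },
    "Traveling Salesman Problem": {
        "DSM": {"TreadMarks": 4.74, "GENESIS": 6.33},
        "PVM": {"Rice": 5.63, "GENESIS": 5.94},
    },
}


def efficiency(speedup, computers=COMPUTERS):
    """Parallel efficiency: the fraction of ideal speed-up actually achieved."""
    return speedup / computers


for application, paradigms in reported.items():
    for paradigm, systems in paradigms.items():
        for system, speedup in systems.items():
            print(f"{application:28s} {paradigm} {system:10s} "
                  f"speed-up {speedup:5.2f}  efficiency {efficiency(speedup):.2f}")
```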
54
Summary

Nondedicated clusters are commonly available
– Force application developers to program operating system operations
– Do not offer transparency

Application developers need a computer system that
– Processes applications efficiently
– Uses cluster resources well
– Allows the cluster to be seen as a single powerful computer rather than as a
set of connected computers


Proposal: employ a cluster operating system
Design: cluster operating system with three logical levels
– Distributed operating system
– Parallelism management and transparency system
– Programming environment
55
Summary

GENESIS – designed and developed as a “proof of concept”

GENESIS is a system that satisfies user requirements
GENESIS approach is unique

– Offers both message passing (MP and PVM) and DSM environment
– Services providing parallelism management are integral components
of an operating system
– Provides a comprehensive environment to transparently manage
system resources




Programmers do not have to be involved in parallelism
management
Use of the cluster has been made easy
Complete transparency is offered
Good performance results have been achieved
56
Future Work



Port GENESIS to an Intel-like platform
Use virtual memory to support DSM
Offer reliable parallel computing services on clusters
by employing
– Reliable group communication
– Checkpointing to offer fault tolerance
57