ECI, July 2005

Download Report

Transcript ECI, July 2005

Resource Management
and Balancing
ECI, July 2005
RMS – Overview





Resource management
Job management
Monitoring
Resource balancing
Information dissemination
ECI – July 2005
2
Job Management

The need



Operating system offers job and resource
management service for a single computer
The batch job control on multi-user mainframes was
performed outside the operating system
Main advantages are:



Structured resource utilization planning and control
Abstraction, easy-to-{understand,use} for user
Provide a vendor independent user interface
ECI – July 2005
3
Manager vs. Scheduler

Resource manager




Locating and allocate resources
Authentication
Process creation and migration
Resource scheduler


Queuing applications
Drive manager (enforce policy)
ECI – July 2005
4
Job Management - Requirements

A typical job management system offers








Heterogeneous support
Batch support
Parallel support
Interactive support
Checkpointing and process migration
Load balancing
Job run-time limits
GUI
ECI – July 2005
5
RMS Architecture

Prerequisites



Multi-user & multitasking capabilities
Homogeneous OS are not a restriction
In practice


“Similar” operating systems run on all machines
UNIX (in all variants) is very customary in the
context of using RMS
ECI – July 2005
6
Resource Description

Requirements





Easy to generate simple description
Powerful to generate complex description
Portable representation
Attributed components
RDL: Language to specify resources



Administrator: describe what’s available
User: describe what’s required
Hierarchical
ECI – July 2005
7
RDL Example

A 1024 nodes transputer with unix front-end
DECLARATION
BEGIN PROC Transputer
DYNAMIC;
EXCLUSIVE;
DECLARATION
BEGIN PROC Backend
DECLARATION
{ PROC; CPU=T8; MEMORY=4;
SPEED=30; REPEAT=1024; }
{ PORT; REPEAT=4; }
END PROC
ECI – July 2005
BEGIN PROC Frontend
DECLARATION
{ PROC; OS=Unix; Repeat = 4; }
END PROC
CONNECTION
FOR I = 0 to 3 DO
Backend LINK i  Frontend LINK i
OD
END PROC
8
RMS Components

User interface



At the minimum - command line user interface
GUI becoming indispensable
Typical commands



Job submission to register for execution
status display to monitor progress or failure of a job
Job deletion to cancel jobs no longer needed
ECI – July 2005
9
RMS Components (contd)

Administrative environment







Specify nodes characteristics
Define feasible job classes and map to hosts
Define user access permissions
Specify resource limitations for users and jobs
Specify policies for the assignment of jobs
Control and ensure proper operation of the RMS
Analyze accounting data to tune the system
ECI – July 2005
10
RMS Entities

Queues


Hosts




Compute hosts, control hosts
Users


Queues bound to hosts, jobs assigned to queues
Capabilities, permissions, priorities
Jobs
Resources
Policies
ECI – July 2005
11
RMS Entities – Jobs

Job: collection of computational tasks


A single program, or several interacting programs
In the context of RMS




Batch Jobs: require no manual interaction as soon
as started
Interactive Jobs: require input during runtime
Parallel Jobs: subtasks spread across several
hosts in a cluster
Check-pointing Jobs: periodically save status to
the file system and can be aborted anytime
ECI – July 2005
12
RMS Entities – Jobs

Batch jobs



Interactive jobs



Dispatch jobs according to policy and availability
Suspend/Resume & checkpoint/restart
Need to maintain a terminal connection
“Watchdog” monitor withdraw from pool
Parallel jobs


Need to integrate with parallel environment
Scheduling policy is more complex
ECI – July 2005
13
RMS Entities - Resources



Available memory, CPU time, network
bandwidth, and peripheral devices,
licenses
Jobs declare resource requirements
RMS enforces resource consumption



ensures quality of service
prevents over-subscription
detects over-usage
ECI – July 2005
14
RMS Entities - Policies

Abstract mechanisms to automate control





Resource Utilization Policies



imbalanced load is common in clusters
important/urgent work starved
unauthorized users may take advantage
users may exceed desired resource usage over time
Monitor resource consumption
Dispatch of new jobs
Scheduling Policies


Dispatch of new jobs
Relocation of jobs
ECI – July 2005
15
Resource Utilization Policies

Share based





Functional

Assignment by functional importance (priority)
Past usage is not taken into account

Time-critical applications

Administrators like power…



Resource “credit” is assigned to users, depts…
Hierarchical share tree defines sharing
Establish entitlements within time frame
Fair distribution of resources
Deadline
Manual override
ECI – July 2005
16
Scheduling Policies

Dispatch time – who, where





First-Come-First-Served
Select-Least-Loaded
Select-Fixed-Sequence
Combinations above
Relocation – who, when, where

Dynamic resource balancing
ECI – July 2005
17
Scheduling of Parallel Processes

Gang scheduling


Requires tight-coupling (MPP’s)
Co-scheduling

Demand-based



False priority
Concurrent applications
Implicit

Busy wait to not relinquish cpu
ECI – July 2005
18
RMS Challenges

Open Interfaces






Export load balancing/distributed capabilities
Export status info (load, job status, queues)
Control/assistance from application
Integration with other environments (MPI)
Extend functionality for special cases
API must be: simple, usable, abstract, robust
ECI – July 2005
19
RMS Example: CODINE

CODINE/GRD




cod_qmaster: master daemon
cod_schedd: scheduler daemon
cod_execd: execution daemon
Continuously match utilization with policies


GRD monitors and adjusts resource usage
correlated to all processes of a job
Feedback to adjust shares towards changing
requirements
ECI – July 2005
20
Static Scheduling Scheme
ECI – July 2005
21
Dynamic Scheduling Scheme
ECI – July 2005
22
RMS Example: PBS

Portable Batch Sysetm






Scheduler – job to node mapping, queues
Server – communications, logs
Control daemon (per node) – executive agent
Scope – single node
Job arrays
Task Management interface
ECI – July 2005
23
RMS Example: Condor




Condor: a distributed job scheduler
Harvest idle workstations
Job scheduling and migration
Advertising mechanism



Both job and W/S advertise presence
Jobs advertise requirements (job description file)
W/S advertise their capabilities
ECI – July 2005
24
Condor: Example JDF
universe = vanilla
# select runtime environment
executable = some_job
requirements = (Arch=="INTEL" && OpSys=="LINUX")
rank = (Memory * 10000) + KFlops
#target
arguments = -verbose
input = in.dat
# redirect to stdin
output = out.dat
# redirect to stdout
log = log.txt
Queue
# add job to queue
ECI – July 2005
25
RMS: Condor (contd)

Universe





Vanilla: sequential apps (shared FS)
MPI, PVM: integrated with parallel environment
Globus: grid computing environment
Standard: enables process migration
Process migration



Reschedule higher priority job
User reclaims her W/S
Must be linked with a special library
ECI – July 2005
26
RMS: Condor (contd)

Access to data


Shared file system
Condor file transfer mechanism



Automatically prefetch, postfetch
Remote I/O calls (in standard universe)
Architecture


Central manager
Server on each node
ECI – July 2005
27
Known Condor Pools
ECI – July 2005
28
Monitoring

Design choices





Centralized  decentralized
Periodic  request driven
Flat  hierarchical
Resolution of information
Focused view
ECI – July 2005
29
Monitoring Example: Parmon

Features:





Online creation of Node and Group database
Component, Node, Group, or entire Cluster level
Monitoring of CPU, memory, disk and network,
processes, log files etc
Facility to define events & automatic notification
Misc: message broadcast, remote admin, GUI
ECI – July 2005
30
Load Balancing






Application  system
Static  dynamic  adaptive
Centralized  decentralized
Receiver initiated  sender initiated
Parallel applications
On-line nature
ECI – July 2005
31
LB: Application Level

Application level





Round robin
Randomized
Recursive bisection
Other optimization
Hard to estimate execution times



Indeterminate no. of steps
Unpredictable load
Communication delays
ECI – July 2005
32
LB: System Level

System level



Round robin
Randomized
Estimate run time


Specified by job description
Estimate from past experience
ECI – July 2005
33
LB Example: MOSIX







Decentralized
Symmetric
Deterministic
Responsive
Stable
Competitive
Resources: CPU, memory, I/O
ECI – July 2005
34
Load Balancing Over Network

Distribute workload or network traffic load
across the cluster



Nodes may be interconnected among themselves
Must be connected to the balancing device
Processing nodes provide status information





current processor load
the application system load
number of active users
the availability of network protocol buffers
other specific resources
ECI – July 2005
35
Load Balancing Over Network

Balancing device






monitors the status of all nodes
dictates where to direct the next job
a single unit or a group in tree hierarchy
use one or more algorithms or methods
static or dynamic setting
Decide which node gets the next
incoming connection request
ECI – July 2005
36
Factors in Network Balancing


Wire-speed processing
Node operating system limitation


Balancing device limitations



Packet processing, no. of connections, interrupts
Tables, memory
Session based traffic, non-session UDP
Application dependencies (affinity)
ECI – July 2005
37
Simple Balancing Methods

Weighting


Randomization


Works good in identical node environment
Round-Robin



Assign weights to the nodes of different capacities
Commonly used by itself in DNS (address caching)
Effective where all the nodes in the cluster are
identical in capacity and performance
Hashing

Packets from the same source address will always
get assigned to the same server
ECI – July 2005
38
Simple Balancing Methods

Least Connections


Minimum Misses


Assigns to the node which currently has the least
connections ( ≠ least load )
Assign to the nodes which has processed the least
number of incoming request in its history
Fastest Response


Assigns to the node with the fastest response
Requires active monitoring of the individual nodes


Sending ICMP packets with the ‘ping’ command
Proprietary mechanism based upon UDP packets
ECI – July 2005
39
Advanced Balancing Methods

Primary optimization vectors







Node traffic – predict volume of traffic
Network traffic – monitor node state
Node-load based balancing – (which load ?)
DNS load balancing - simple
Topology-based – reduce latency
Application-specific performance
Policy based optimization

Application ,bandwidth, admin, security
ECI – July 2005
40
Common Errors

There are four common errors





Overflow
Underflow
Routing errors
Induced network errors
May destabilize efficient network clustering
ECI – July 2005
41
Information Dissemination


Central  Decentralized
Load incurred on system



Processing load
Network load
Partial knowledge – gossip algorithms

Example: finding average load
ECI – July 2005
42