ECI, July 2005
Download
Report
Transcript ECI, July 2005
Resource Management
and Balancing
ECI, July 2005
RMS – Overview
Resource management
Job management
Monitoring
Resource balancing
Information dissemination
ECI – July 2005
2
Job Management
The need
Operating system offers job and resource
management service for a single computer
The batch job control on multi-user mainframes was
performed outside the operating system
Main advantages are:
Structured resource utilization planning and control
Abstraction, easy-to-{understand,use} for user
Provide a vendor independent user interface
ECI – July 2005
3
Manager vs. Scheduler
Resource manager
Locating and allocate resources
Authentication
Process creation and migration
Resource scheduler
Queuing applications
Drive manager (enforce policy)
ECI – July 2005
4
Job Management - Requirements
A typical job management system offers
Heterogeneous support
Batch support
Parallel support
Interactive support
Checkpointing and process migration
Load balancing
Job run-time limits
GUI
ECI – July 2005
5
RMS Architecture
Prerequisites
Multi-user & multitasking capabilities
Homogeneous OS are not a restriction
In practice
“Similar” operating systems run on all machines
UNIX (in all variants) is very customary in the
context of using RMS
ECI – July 2005
6
Resource Description
Requirements
Easy to generate simple description
Powerful to generate complex description
Portable representation
Attributed components
RDL: Language to specify resources
Administrator: describe what’s available
User: describe what’s required
Hierarchical
ECI – July 2005
7
RDL Example
A 1024 nodes transputer with unix front-end
DECLARATION
BEGIN PROC Transputer
DYNAMIC;
EXCLUSIVE;
DECLARATION
BEGIN PROC Backend
DECLARATION
{ PROC; CPU=T8; MEMORY=4;
SPEED=30; REPEAT=1024; }
{ PORT; REPEAT=4; }
END PROC
ECI – July 2005
BEGIN PROC Frontend
DECLARATION
{ PROC; OS=Unix; Repeat = 4; }
END PROC
CONNECTION
FOR I = 0 to 3 DO
Backend LINK i Frontend LINK i
OD
END PROC
8
RMS Components
User interface
At the minimum - command line user interface
GUI becoming indispensable
Typical commands
Job submission to register for execution
status display to monitor progress or failure of a job
Job deletion to cancel jobs no longer needed
ECI – July 2005
9
RMS Components (contd)
Administrative environment
Specify nodes characteristics
Define feasible job classes and map to hosts
Define user access permissions
Specify resource limitations for users and jobs
Specify policies for the assignment of jobs
Control and ensure proper operation of the RMS
Analyze accounting data to tune the system
ECI – July 2005
10
RMS Entities
Queues
Hosts
Compute hosts, control hosts
Users
Queues bound to hosts, jobs assigned to queues
Capabilities, permissions, priorities
Jobs
Resources
Policies
ECI – July 2005
11
RMS Entities – Jobs
Job: collection of computational tasks
A single program, or several interacting programs
In the context of RMS
Batch Jobs: require no manual interaction as soon
as started
Interactive Jobs: require input during runtime
Parallel Jobs: subtasks spread across several
hosts in a cluster
Check-pointing Jobs: periodically save status to
the file system and can be aborted anytime
ECI – July 2005
12
RMS Entities – Jobs
Batch jobs
Interactive jobs
Dispatch jobs according to policy and availability
Suspend/Resume & checkpoint/restart
Need to maintain a terminal connection
“Watchdog” monitor withdraw from pool
Parallel jobs
Need to integrate with parallel environment
Scheduling policy is more complex
ECI – July 2005
13
RMS Entities - Resources
Available memory, CPU time, network
bandwidth, and peripheral devices,
licenses
Jobs declare resource requirements
RMS enforces resource consumption
ensures quality of service
prevents over-subscription
detects over-usage
ECI – July 2005
14
RMS Entities - Policies
Abstract mechanisms to automate control
Resource Utilization Policies
imbalanced load is common in clusters
important/urgent work starved
unauthorized users may take advantage
users may exceed desired resource usage over time
Monitor resource consumption
Dispatch of new jobs
Scheduling Policies
Dispatch of new jobs
Relocation of jobs
ECI – July 2005
15
Resource Utilization Policies
Share based
Functional
Assignment by functional importance (priority)
Past usage is not taken into account
Time-critical applications
Administrators like power…
Resource “credit” is assigned to users, depts…
Hierarchical share tree defines sharing
Establish entitlements within time frame
Fair distribution of resources
Deadline
Manual override
ECI – July 2005
16
Scheduling Policies
Dispatch time – who, where
First-Come-First-Served
Select-Least-Loaded
Select-Fixed-Sequence
Combinations above
Relocation – who, when, where
Dynamic resource balancing
ECI – July 2005
17
Scheduling of Parallel Processes
Gang scheduling
Requires tight-coupling (MPP’s)
Co-scheduling
Demand-based
False priority
Concurrent applications
Implicit
Busy wait to not relinquish cpu
ECI – July 2005
18
RMS Challenges
Open Interfaces
Export load balancing/distributed capabilities
Export status info (load, job status, queues)
Control/assistance from application
Integration with other environments (MPI)
Extend functionality for special cases
API must be: simple, usable, abstract, robust
ECI – July 2005
19
RMS Example: CODINE
CODINE/GRD
cod_qmaster: master daemon
cod_schedd: scheduler daemon
cod_execd: execution daemon
Continuously match utilization with policies
GRD monitors and adjusts resource usage
correlated to all processes of a job
Feedback to adjust shares towards changing
requirements
ECI – July 2005
20
Static Scheduling Scheme
ECI – July 2005
21
Dynamic Scheduling Scheme
ECI – July 2005
22
RMS Example: PBS
Portable Batch Sysetm
Scheduler – job to node mapping, queues
Server – communications, logs
Control daemon (per node) – executive agent
Scope – single node
Job arrays
Task Management interface
ECI – July 2005
23
RMS Example: Condor
Condor: a distributed job scheduler
Harvest idle workstations
Job scheduling and migration
Advertising mechanism
Both job and W/S advertise presence
Jobs advertise requirements (job description file)
W/S advertise their capabilities
ECI – July 2005
24
Condor: Example JDF
universe = vanilla
# select runtime environment
executable = some_job
requirements = (Arch=="INTEL" && OpSys=="LINUX")
rank = (Memory * 10000) + KFlops
#target
arguments = -verbose
input = in.dat
# redirect to stdin
output = out.dat
# redirect to stdout
log = log.txt
Queue
# add job to queue
ECI – July 2005
25
RMS: Condor (contd)
Universe
Vanilla: sequential apps (shared FS)
MPI, PVM: integrated with parallel environment
Globus: grid computing environment
Standard: enables process migration
Process migration
Reschedule higher priority job
User reclaims her W/S
Must be linked with a special library
ECI – July 2005
26
RMS: Condor (contd)
Access to data
Shared file system
Condor file transfer mechanism
Automatically prefetch, postfetch
Remote I/O calls (in standard universe)
Architecture
Central manager
Server on each node
ECI – July 2005
27
Known Condor Pools
ECI – July 2005
28
Monitoring
Design choices
Centralized decentralized
Periodic request driven
Flat hierarchical
Resolution of information
Focused view
ECI – July 2005
29
Monitoring Example: Parmon
Features:
Online creation of Node and Group database
Component, Node, Group, or entire Cluster level
Monitoring of CPU, memory, disk and network,
processes, log files etc
Facility to define events & automatic notification
Misc: message broadcast, remote admin, GUI
ECI – July 2005
30
Load Balancing
Application system
Static dynamic adaptive
Centralized decentralized
Receiver initiated sender initiated
Parallel applications
On-line nature
ECI – July 2005
31
LB: Application Level
Application level
Round robin
Randomized
Recursive bisection
Other optimization
Hard to estimate execution times
Indeterminate no. of steps
Unpredictable load
Communication delays
ECI – July 2005
32
LB: System Level
System level
Round robin
Randomized
Estimate run time
Specified by job description
Estimate from past experience
ECI – July 2005
33
LB Example: MOSIX
Decentralized
Symmetric
Deterministic
Responsive
Stable
Competitive
Resources: CPU, memory, I/O
ECI – July 2005
34
Load Balancing Over Network
Distribute workload or network traffic load
across the cluster
Nodes may be interconnected among themselves
Must be connected to the balancing device
Processing nodes provide status information
current processor load
the application system load
number of active users
the availability of network protocol buffers
other specific resources
ECI – July 2005
35
Load Balancing Over Network
Balancing device
monitors the status of all nodes
dictates where to direct the next job
a single unit or a group in tree hierarchy
use one or more algorithms or methods
static or dynamic setting
Decide which node gets the next
incoming connection request
ECI – July 2005
36
Factors in Network Balancing
Wire-speed processing
Node operating system limitation
Balancing device limitations
Packet processing, no. of connections, interrupts
Tables, memory
Session based traffic, non-session UDP
Application dependencies (affinity)
ECI – July 2005
37
Simple Balancing Methods
Weighting
Randomization
Works good in identical node environment
Round-Robin
Assign weights to the nodes of different capacities
Commonly used by itself in DNS (address caching)
Effective where all the nodes in the cluster are
identical in capacity and performance
Hashing
Packets from the same source address will always
get assigned to the same server
ECI – July 2005
38
Simple Balancing Methods
Least Connections
Minimum Misses
Assigns to the node which currently has the least
connections ( ≠ least load )
Assign to the nodes which has processed the least
number of incoming request in its history
Fastest Response
Assigns to the node with the fastest response
Requires active monitoring of the individual nodes
Sending ICMP packets with the ‘ping’ command
Proprietary mechanism based upon UDP packets
ECI – July 2005
39
Advanced Balancing Methods
Primary optimization vectors
Node traffic – predict volume of traffic
Network traffic – monitor node state
Node-load based balancing – (which load ?)
DNS load balancing - simple
Topology-based – reduce latency
Application-specific performance
Policy based optimization
Application ,bandwidth, admin, security
ECI – July 2005
40
Common Errors
There are four common errors
Overflow
Underflow
Routing errors
Induced network errors
May destabilize efficient network clustering
ECI – July 2005
41
Information Dissemination
Central Decentralized
Load incurred on system
Processing load
Network load
Partial knowledge – gossip algorithms
Example: finding average load
ECI – July 2005
42