
Process Scheduling in
Multiprocessor and
Multithreaded Systems
Matt Davis
CS535
4/7/2003
Outline
 Multiprocessor Systems
– Issues in MP Scheduling
– How to Allocate Processors
– Cache Affinity
– Linux MP Scheduling
 Simultaneous Multithreaded Systems
– Issues in SMT Scheduling
– Symbiotic Jobscheduling
– SMT and Priorities
– Linux SMT Scheduling
 Conclusions
Multiprocessor Systems
 Symmetric Multiprocessing (SMP):
– One copy of OS in memory, any CPU can use it
– OS must ensure that multiple processors
cannot access shared data structures at the
same time
[Figure: Shared Memory Multiprocessors – multiple CPUs connected to a shared memory]
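To make the mutual-exclusion point concrete, here is a minimal user-space sketch in C (an illustration, not kernel code): a shared ready queue that any CPU may touch is guarded by a lock so that two processors cannot modify it at the same time. A real SMP kernel would use its own spinlocks rather than pthreads, and the queue layout is purely hypothetical.

    #include <pthread.h>

    /* Minimal sketch: a shared ready queue that any CPU may touch,
     * guarded so two processors cannot modify it at the same time.
     * A real SMP kernel would use its own spinlocks, not pthreads. */
    static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
    static int ready_queue[64];
    static int ready_count;

    void enqueue_task(int task_id)
    {
        pthread_mutex_lock(&queue_lock);   /* one CPU in here at a time */
        if (ready_count < 64)
            ready_queue[ready_count++] = task_id;
        pthread_mutex_unlock(&queue_lock);
    }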
Issues in MP Scheduling
 Starvation
– Number of active parallel threads < number of
allocated processors
 Overhead
– CPU time used to transfer and start various
portions of the application
 Contention
– Multiple threads attempt to use same shared
resource
 Latency
– Delay in communication between processors
and I/O devices
How to Allocate Processors
 Allocate proportional to average
parallelism
 Other factors:
–System load
–Variable parallelism
–Min/Max parallelism
 Acquire/relinquish processors based
on current program needs
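A rough sketch of the proportional allocation described above, assuming each job's average and maximum parallelism are known. The struct and helper below are hypothetical illustrations, not code from Sevcik's paper.

    /* Hypothetical sketch: split total_cpus among n jobs in proportion to
     * each job's average parallelism, capped at its maximum parallelism.
     * Handing out processors left over from rounding is omitted. */
    struct job {
        double avg_parallelism;   /* measured or estimated */
        int    max_parallelism;   /* job cannot use more than this */
        int    allocated;         /* result of the allocation */
    };

    static void allocate_proportional(struct job *jobs, int n, int total_cpus)
    {
        double sum = 0.0;
        int i;

        for (i = 0; i < n; i++)
            sum += jobs[i].avg_parallelism;
        if (sum <= 0.0)
            return;

        for (i = 0; i < n; i++) {
            int share = (int)(total_cpus * jobs[i].avg_parallelism / sum);
            if (share < 1)
                share = 1;                        /* avoid starving a job */
            if (share > jobs[i].max_parallelism)
                share = jobs[i].max_parallelism;  /* respect max parallelism */
            jobs[i].allocated = share;
        }
    }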
Cache Affinity
 While a program runs, data needed is
placed in local cache
 When job is rescheduled, it will likely
access some of the same data
 Scheduling jobs where they have
“affinity” improves performance by
reducing cache penalties
Cache Affinity (cont)
 Tradeoff between the benefit of
reallocating processors and the cost of
doing so
–Utilization versus cache behavior
 Scheduling policies:
–Equipartition: constant number of
processors allocated evenly to all jobs.
Low overhead.
–Dynamic: constantly reallocates jobs to
maximize utilization. High utilization.
Cache Affinity (cont)
 Vaswani and Zahorjan, 1991
–When a processor becomes available,
allocate it to runnable process that was last
run on processor, or higher priority job
–If a job requests additional processors,
allocate critical tasks on processor with
highest affinity
–If an allocated processor becomes idle,
hold it for a small amount of time in case
task with affinity comes along
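A sketch of the idle-processor decision on this slide, using hypothetical data structures rather than Vaswani and Zahorjan's code: prefer the runnable task that last ran on the freed processor unless a strictly higher-priority task is waiting.

    /* Hypothetical sketch of the choice made when processor `cpu` frees up:
     * prefer the runnable task that last ran there, unless a strictly
     * higher-priority runnable task is waiting. */
    struct task {
        int last_cpu;    /* processor this task last ran on */
        int priority;    /* larger value = higher priority */
        int runnable;
    };

    static struct task *pick_next(struct task *tasks, int n, int cpu)
    {
        struct task *affine = NULL, *best = NULL;
        int i;

        for (i = 0; i < n; i++) {
            if (!tasks[i].runnable)
                continue;
            if (tasks[i].last_cpu == cpu &&
                (!affine || tasks[i].priority > affine->priority))
                affine = &tasks[i];          /* has cache affinity for cpu */
            if (!best || tasks[i].priority > best->priority)
                best = &tasks[i];            /* highest priority overall */
        }

        if (affine && best && best->priority > affine->priority)
            return best;                     /* higher-priority job wins */
        return affine ? affine : best;
    }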
Vaswani and Zahorjan, 1991
 Results showed that utilization was
dominant effect on performance, not cache
affinity
– But their algorithm did not degrade
performance
 Predicted that as processor speeds
increase, significance of cache affinity will
also increase
 Later studies validated their predictions
Linux 2.5 MP Scheduling
 Each processor responsible for scheduling
own tasks
– schedule()
 After process switch, check if new process
should be transferred to another CPU running a
lower-priority task
– reschedule_idle()
 Cache affinity
– Affinity mask stored in /proc/pid/affinity
– sched_setaffinity(), sched_getaffinity()
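For reference, a small user-space example of the affinity calls. It assumes a Linux system with the glibc wrappers for sched_setaffinity() and sched_getaffinity(); the 2.5-era affinity patch exposed a slightly different interface, so treat this as the modern form of the same idea.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t mask;

        CPU_ZERO(&mask);
        CPU_SET(0, &mask);                     /* allow CPU 0 only */
        if (sched_setaffinity(0, sizeof(mask), &mask) != 0)
            perror("sched_setaffinity");       /* pid 0 = calling process */

        CPU_ZERO(&mask);
        if (sched_getaffinity(0, sizeof(mask), &mask) == 0)
            printf("bound to CPU 0: %s\n",
                   CPU_ISSET(0, &mask) ? "yes" : "no");
        return 0;
    }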
What is SMT?
 Simultaneous Multithreading
– aka Hyper-Threading®
 Issue instructions from multiple threads
simultaneously on a superscalar
processor
[Figure: instruction issue over time – two threads simultaneously occupying the ALU, FPU, branch, and memory units]
Why SMT?
 Technique to exploit parallelism within
and between programs with minimal
additional chip resources
 Operating system treats SMT processor
as two separate processors*
[Figure: the operating system sees the SMT processor's two hardware threads, Thread 1 and Thread 2, as Processor 1 and Processor 2]
Issues With SMT Scheduling
 *Not really separate processors:
–Share same caches
 MP scheduling attempts to avoid idle
processors
–SMT-aware scheduler must differentiate
between physical and logical processors
Symbiotic Jobscheduling
 Recent studies from U of Washington
–Origin of early research into SMT
 OS coschedules jobs to run on
hardware threads
 # of coscheduled jobs <= SMT level
 Occasionally swap out running set to
ensure fairness
Symbiotic Jobscheduling (cont)
 Shared system resources:
–Functional units, caches, TLBs, etc.
 Coscheduled jobs may interact well…
–Few resource conflicts, high utilization
 Or they may interact poorly
–Many resource conflicts, lower utilization
 Choice of coscheduled jobs can have
large impact on system performance
Symbiotic Jobscheduling (cont)
 Improve symbiosis by coscheduling
jobs that get along well
 Two phases of SOS (Sample,
Optimize, Symbios) jobscheduler:
–Sample – Gather data on current
performance
–Symbios – Use computed scheduling
configuration
Symbiotic Jobscheduling (cont)
 Sample phase:
–Periodically alter coscheduled job mix
–Record system utilization from hardware
performance counter registers
 Symbios phase:
–Pick job mix that had the highest
utilization
 Trade-off between sampling often (adapts
quickly, but spends more time in poor mixes)
and infrequently (less sampling overhead)
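A hypothetical sketch of the two phases as a loop. measure_utilization() and run_mix() are stand-ins for coscheduling a job mix and reading the hardware performance counters; they are not functions from the SOS paper.

    /* Hypothetical sketch of the SOS loop.  The helpers below are
     * stand-ins: a real implementation would coschedule the chosen job
     * mix and read hardware performance counters. */
    #define NMIX 8

    static double measure_utilization(int mix)
    {
        /* Stand-in: run `mix` for one sampling interval and return the
         * utilization observed via performance counters. */
        return (double)mix;   /* placeholder value */
    }

    static void run_mix(int mix, int ticks)
    {
        /* Stand-in: coschedule the jobs in `mix` for `ticks` timer ticks. */
        (void)mix;
        (void)ticks;
    }

    static void sos_scheduler_step(void)
    {
        int mix, best = 0;
        double best_util = -1.0;

        /* Sample phase: try each candidate coschedule briefly and record
         * how well the machine was utilized. */
        for (mix = 0; mix < NMIX; mix++) {
            double util = measure_utilization(mix);
            if (util > best_util) {
                best_util = util;
                best = mix;
            }
        }

        /* Symbios phase: run the best-performing mix for a longer period
         * before sampling again. */
        run_mix(best, 1000);
    }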
How to Measure Utilization?
 IPC not necessarily the best predictor:
– IPC can vary widely over the life of a
process
– High-IPC threads may unfairly take system
resources from low-IPC threads
 Other predictors: low # conflicts, high
cache hit rate, diverse instruction mix
 Balance: the schedule with the lowest
deviation in IPC across coscheduled jobs is
considered best
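As an illustration of the balance predictor, a small helper that scores a candidate coschedule by the standard deviation of its per-thread IPC values, where lower is better. The function is a hypothetical sketch, not taken from the cited work.

    #include <math.h>

    /* Hypothetical sketch: score a candidate coschedule by the standard
     * deviation of its per-thread IPC values; under the "balance"
     * predictor, a lower score is a better schedule. */
    static double ipc_deviation(const double *ipc, int nthreads)
    {
        double mean = 0.0, var = 0.0;
        int i;

        for (i = 0; i < nthreads; i++)
            mean += ipc[i];
        mean /= nthreads;

        for (i = 0; i < nthreads; i++)
            var += (ipc[i] - mean) * (ipc[i] - mean);
        var /= nthreads;

        return sqrt(var);
    }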
What About Priorities?
 Scheduler estimates the “natural” IPC of a
job
 If a high-priority job is not meeting its
desired IPC, it is scheduled exclusively on
the CPU
 Provides a truer implementation of
priority:
–Normal schedulers only guarantee
proportional resource sharing, which
assumes no interaction between jobs
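A minimal sketch of that check, with hypothetical fields and threshold: compare the IPC a job achieves while coscheduled against the fraction of its estimated natural IPC that its priority demands.

    /* Hypothetical sketch of the priority check: if a high-priority job
     * achieves less than the fraction of its estimated "natural" IPC that
     * its priority demands, run it exclusively for a while. */
    struct smt_job {
        double natural_ipc;    /* estimated IPC when running alone */
        double observed_ipc;   /* IPC measured while coscheduled */
        double priority;       /* desired share, e.g. 0.9 of natural IPC */
    };

    static int needs_exclusive_run(const struct smt_job *job)
    {
        return job->observed_ipc < job->priority * job->natural_ipc;
    }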
Another Priority Algorithm
 SMT hardware fetches instructions to
issue from a queue
 Scheduler can bias fetching algorithm
to give preference to high-priority
threads
 Hardware already exists, minimal
modifications
Symbiosis Performance Results
 Without priorities:
–Up to 17% improvement
 Software-enforced priorities:
–Up to 20%, average 8%
 Hardware-based priorities:
–Up to 30%, average 15%
Linux 2.5 SMT Scheduling
 Immediate reschedule forced when an
HT CPU is executing two idle
processes
 HT-aware affinity: processes prefer
same physical CPU
 HT-aware load-balancing: distinguish
logical and physical CPU in resource
allocation
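A toy sketch of HT-aware placement under stated assumptions (a fixed sibling map and per-logical-CPU run-queue lengths; the real 2.5 patch works inside the O(1) scheduler): count load per physical package so the balancer does not mistake a busy sibling pair for two free processors.

    /* Toy sketch: 4 logical CPUs, siblings 0<->1 and 2<->3.  Load is
     * counted per physical package so a busy sibling pair is not
     * mistaken for two free processors.  The sibling map and run-queue
     * lengths are assumptions for illustration only. */
    #define NR_LOGICAL 4
    static const int sibling[NR_LOGICAL] = { 1, 0, 3, 2 };

    static int pick_cpu(const int *runqueue_len)
    {
        int cpu, best = 0;
        int best_load = runqueue_len[0] + runqueue_len[sibling[0]];

        for (cpu = 1; cpu < NR_LOGICAL; cpu++) {
            int load = runqueue_len[cpu] + runqueue_len[sibling[cpu]];
            if (load < best_load) {
                best_load = load;
                best = cpu;
            }
        }
        return best;   /* logical CPU on the least-loaded physical CPU */
    }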
Conclusions
 Intelligent allocation of resources can
improve performance in parallel systems
 Dynamic scheduling of processors in MP
systems produces better utilization as
processor speeds increase
– Cache affinity can help improve throughput
 Symbiotic coscheduling of tasks in SMT
systems can improve average response
time
Resources
 Kenneth Sevcik, “Characterizations of Parallelism
in Applications and Their Use in Scheduling”
 Raj Vaswani and John Zahorjan, “The Implications
of Cache Affinity on Processor Scheduling for
Multiprogrammed, Shared Memory
Multiprocessors”
 Allan Snavely et al., “Symbiotic Jobscheduling
with Priorities for a Simultaneous Multithreading
Processor”
 Linux MP cache affinity,
http://www.tech9.net/rml/linux
 Linux Hyperthreading Scheduler,
http://www.kernel.org/pub/linux/kernel/people/rusty/Hyperthread_Scheduler_Modifications.html
 Daniel Bovet and Marco Cesati, Understanding the
Linux Kernel