Process Scheduling in
Multiprocessor and
Multithreaded Systems
Matt Davis
CS535
4/7/2003
Outline
Multiprocessor Systems
– Issues in MP Scheduling
– How to Allocate Processors
– Cache Affinity
– Linux MP Scheduling
Simultaneous Multithreaded Systems
– Issues in SMT Scheduling
– Symbiotic Jobscheduling
– SMT and Priorities
– Linux SMT Scheduling
Conclusions
Multiprocessor Systems
Symmetric Multiprocessing (SMP):
– One copy of OS in memory, any CPU can use it
– OS must ensure that multiple processors
cannot access shared data structures at the
same time
[Figure: Shared Memory Multiprocessors: multiple CPUs connected to a shared memory]
Issues in MP Scheduling
Starvation
– Number of active parallel threads < number of
allocated processors
Overhead
– CPU time used to transfer and start various
portions of the application
Contention
– Multiple threads attempt to use same shared
resource
Latency
– Delay in communication between processors
and I/O devices
How to Allocate Processors
Allocate proportional to average
parallelism
Other factors:
–System load
–Variable parallelism
–Min/Max parallelism
Acquire/relinquish processors based
on current program needs
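The proportional rule above can be sketched as a small helper; the function name, argument layout, and rounding are illustrative assumptions, not taken from a specific scheduler.

```c
/* Illustrative sketch: give a job a share of the machine proportional to its
 * average parallelism, clamped to its min/max parallelism. All names here are
 * hypothetical, not from the cited scheduling literature. */
static int alloc_procs(double avg_par, double total_avg_par,
                       int nprocs, int min_par, int max_par)
{
    /* proportional share, rounded to the nearest whole processor */
    int share = (int)((nprocs * avg_par) / total_avg_par + 0.5);
    if (share < min_par) share = min_par;  /* never starve the job */
    if (share > max_par) share = max_par;  /* never exceed its parallelism */
    return share;
}
```

A job contributing half the system's total average parallelism would get half the processors, subject to its own min/max bounds.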
Cache Affinity
While a program runs, the data it needs is
placed in the processor's local cache
When the job is rescheduled, it will likely
access some of the same data
Scheduling jobs where they have
“affinity” improves performance by
reducing cache penalties
Cache Affinity (cont)
Tradeoff between processor
reallocation and cost of reallocation
–Utilization versus cache behavior
Scheduling policies:
–Equipartition: constant number of
processors allocated evenly to all jobs.
Low overhead.
–Dynamic: constantly reallocates jobs to
maximize utilization. High utilization.
Cache Affinity (cont)
Vaswani and Zahorjan, 1991
–When a processor becomes available,
allocate it to the runnable process that last
ran on that processor, or to a higher-priority job
–If a job requests additional processors,
allocate critical tasks on processor with
highest affinity
–If an allocated processor becomes idle,
hold it for a small amount of time in case
task with affinity comes along
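One reading of the affinity rule above can be sketched as a selection function; the task layout and tie-breaking are assumptions for illustration, not Vaswani and Zahorjan's actual implementation.

```c
struct task {
    int runnable;   /* 1 if the task is ready to run */
    int last_cpu;   /* processor the task last ran on */
    int prio;       /* larger value = higher priority */
};

/* Hypothetical sketch of the affinity rule: prefer a runnable task that last
 * ran on this CPU; otherwise fall back to the highest-priority runnable task.
 * Returns an index into ts[], or -1 if nothing is runnable. */
static int pick_task(const struct task *ts, int n, int cpu)
{
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (!ts[i].runnable)
            continue;
        if (ts[i].last_cpu == cpu)
            return i;                      /* affinity match wins */
        if (best < 0 || ts[i].prio > ts[best].prio)
            best = i;                      /* remember best priority so far */
    }
    return best;
}
```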
Vaswani and Zahorjan, 1991
Results showed that utilization was
dominant effect on performance, not cache
affinity
– But their algorithm did not degrade
performance
Predicted that as processor speeds
increase, significance of cache affinity will
also increase
Later studies validated their predictions
Linux 2.5 MP Scheduling
Each processor responsible for scheduling
own tasks
– schedule()
After a process switch, check whether the new
process should be transferred to another CPU
running a lower-priority task
– reschedule_idle()
Cache affinity
– Affinity mask stored in /proc/pid/affinity
– sched_setaffinity(), sched_getaffinity()
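On Linux, the affinity calls named above look roughly like this; `sched_setaffinity()` and the `cpu_set_t` macros are the real glibc interface, while the helper name is made up for illustration.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling thread to a single CPU (pid 0 = the calling thread).
 * Returns 0 on success, -1 on error. The helper name is hypothetical. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);        /* start with an empty CPU mask */
    CPU_SET(cpu, &set);    /* allow only the requested CPU */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

The matching `sched_getaffinity()` call reads the current mask back, which is how a program can verify where the scheduler may run it.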
What is SMT?
Simultaneous Multithreading
– aka Intel Hyper-Threading®
Issue instructions from multiple threads
simultaneously on a superscalar
processor
[Figure: issue slots over time for Thread 1 and Thread 2 sharing the ALU, FPU, branch, and memory units]
Why SMT?
Technique to exploit parallelism in and
between programs with minimal
additions in chip resources
Operating system treats SMT processor
as two separate processors*
[Figure: the operating system sees Threads 1 and 2 of a single SMT processor as Processors 1 and 2]
Issues With SMT Scheduling
*Not really separate processors:
–Share same caches
MP scheduling attempts to avoid idle
processors
–SMT-aware scheduler must differentiate
between physical and logical processors
Symbiotic Jobscheduling
Recent studies from U of Washington
–Where much of the early research into SMT originated
OS coschedules jobs to run on
hardware threads
# of coscheduled jobs <= SMT level
Occasionally swap out running set to
ensure fairness
Symbiotic Jobscheduling (cont)
Shared system resources:
–Functional units, caches, TLBs, etc.
Coscheduled jobs may interact well…
–Few resource conflicts, high utilization
Or they may interact poorly
–Many resource conflicts, lower utilization
Choice of coscheduled jobs can have
large impact on system performance
Symbiotic Jobscheduling (cont)
Improve symbiosis by coscheduling
jobs that get along well
Two phases of SOS (Sample,
Optimize, Symbios) jobscheduler:
–Sample – Gather data on current
performance
–Symbios – Use computed scheduling
configuration
Symbiotic Jobscheduling (cont)
Sample phase:
–Periodically alter coscheduled job mix
–Record system utilization from hardware
performance counter registers
Symbios phase:
–Pick job mix that had the highest
utilization
Trade-off between frequent and
infrequent sampling
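A minimal sketch of the Symbios step, assuming the sampled utilization of each candidate job mix has already been recorded into an array (the function name and data layout are illustrative):

```c
/* Sketch of the Symbios step: given sampled utilization for each candidate
 * coschedule (job mix), run with the mix that performed best during the
 * Sample phase. The array-based representation is an assumption. */
static int best_mix(const double *util, int nmixes)
{
    int best = 0;
    for (int i = 1; i < nmixes; i++)
        if (util[i] > util[best])
            best = i;
    return best;
}
```

The Sample phase would fill `util[]` from hardware performance counters before this selection runs.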
How to Measure Utilization?
IPC not necessarily best predictor:
– IPC can have high variations throughout
process
– High-IPC threads may unfairly take system
resources from low-IPC threads
Other predictors: low # conflicts, high
cache hit rate, diverse instruction mix
Balance: schedule with lowest deviation in
IPC between coschedules is considered
best
What About Priorities?
Scheduler estimates the “natural” IPC of
job
If a high-priority job is not meeting the
desired IPC, it will be exclusively
scheduled on the CPU
Provides a truer implementation of
priority:
–Normal schedulers only guarantee
proportional resource sharing, assuming no
interaction between jobs
Another Priority Algorithm
SMT hardware fetches instructions to
issue from queue
Scheduler can bias fetching algorithm
to give preference to high-priority
threads
Hardware already exists, minimal
modifications
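An ICOUNT-style bias illustrates the idea: weight each thread's in-flight instruction count by a priority weight and fetch from the most favored thread. This is a hypothetical sketch of such a bias, not the exact hardware mechanism from the paper.

```c
/* Hypothetical priority-biased fetch policy: prefer the thread with the
 * fewest in-flight instructions relative to its priority weight (an
 * ICOUNT-like heuristic). Comparing inflight[i]/weight[i] by
 * cross-multiplication avoids division and keeps integer arithmetic. */
static int pick_fetch_thread(const int *inflight, const int *weight, int n)
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (inflight[i] * weight[best] < inflight[best] * weight[i])
            best = i;
    return best;
}
```

With equal weights this degenerates to plain ICOUNT; raising a thread's weight lets it stay favored even with more instructions in flight.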
Symbiosis Performance Results
Without priorities:
–Up to 17% improvement
Software-enforced priorities:
–Up to 20%, average 8%
Hardware-based priorities:
–Up to 30%, average 15%
Linux 2.5 SMT Scheduling
Immediate reschedule forced when
HT CPU is executing two idle
processes
HT-aware affinity: processes prefer
same physical CPU
HT-aware load-balancing: distinguish
logical and physical CPU in resource
allocation
Conclusions
Intelligent allocation of resources can
improve performance in parallel systems
Dynamic scheduling of processors in MP
systems produces better utilization as
processor speeds increase
– Cache affinity can help improve throughput
Symbiotic coscheduling of tasks in SMT
systems can improve average response
time
Resources
Kenneth Sevcik, “Characterizations of Parallelism
in Applications and Their Use in Scheduling”
Raj Vaswani and John Zahorjan, “The Implications
of Cache Affinity on Processor Scheduling for
Multiprogrammed, Shared Memory
Multiprocessors”
Allan Snavely et al., “Symbiotic Jobscheduling
with Priorities for a Simultaneous Multithreading
Processor”
Linux MP cache affinity,
http://www.tech9.net/rml/linux
Linux Hyperthreading Scheduler,
http://www.kernel.org/pub/linux/kernel/people/rusty/Hyperthread_Scheduler_Modifications.html
Daniel Bovet and Marco Cesati, Understanding the
Linux Kernel