A1-Presentation-2 - Lyle School of Engineering

Process Management & IPC
In Multiprocessor Operating
Systems
Presented by Group A1
Garrick Williamson
Brad Crabtree
Alex MacFarlane
SMU
Process Management & IPC Intro
(Focus on Solaris)
Garrick Williamson
Introduction
• SunOS is the operating system component of
the Solaris environment.
• It supports Symmetric Multiprocessing (SMP).
See diagram on next page for an example of an SMP
system.
• The kernel runs equally on all processors
within a tightly coupled shared memory
multiprocessor system.
• All flows of control in the kernel are threads, including
interrupts.
SMP System Example
SunOS 5.0 Architecture
• In addition to kernel-level threads, SunOS also
supports multiple threads of control within a process,
called lightweight processes (LWPs).
• There is one Kernel thread for each LWP. The Kernel
threads are used when the LWPs perform system
functions/calls.
SunOS Architecture Diagram
Synchronization
• Threads and processes synchronize in a
variety of ways:
– Mutual exclusion locks
– Condition variables
– Counting semaphores
– Multiple-readers, single-writer locks
• The mutual exclusion and writer locks use a
priority inheritance protocol in order to
prevent priority inversion.
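As a hedged illustration of two of these primitives, the sketch below pairs a mutual exclusion lock with a condition variable. It assumes the POSIX threads API rather than the Solaris-native thread library, and the names `ready`, `waiter`, and `run_handoff` are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Shared state protected by the mutex (illustrative names). */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int ready = 0;
static int observed = 0;

/* Waiter: sleeps on the condition variable until `ready` is nonzero. */
static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!ready)                        /* loop guards against spurious wakeups */
        pthread_cond_wait(&cond, &lock);  /* releases the mutex while sleeping */
    observed = ready;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Starts the waiter, publishes a nonzero `value` under the mutex, and
 * returns what the waiter observed, or -1 if the thread could not start. */
int run_handoff(int value)
{
    pthread_t tid;

    ready = 0;
    observed = 0;
    if (pthread_create(&tid, NULL, waiter, NULL) != 0)
        return -1;

    pthread_mutex_lock(&lock);
    ready = value;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);

    pthread_join(tid, NULL);
    return observed;
}
```

Calling `run_handoff(7)` should return 7 regardless of which thread runs first, because the waiter re-checks `ready` while holding the mutex.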
Solaris IPC
• Solaris provides the following mechanisms for
IPC:
– Simple, but limited mechanisms include
• Signals
• Pipes and named pipes (FIFO)
• Sockets
– More versatile mechanisms include
• Message Queues
• Shared memory (with memory-mapped file and IPC
shared memory options)
• Semaphores
Simple IPC
• Pipes do not allow unrelated processes to
communicate.
• Named pipes allow unrelated processes to
communicate, but are not private
channels.
• Using the kill function, processes may
communicate with signals, but only
through signal numbers.
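The restriction of pipes to related processes follows from how the descriptors are shared: the child inherits them across fork(). A minimal sketch, assuming a POSIX system; the function name `pipe_roundtrip` is hypothetical.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sends `msg` from a forked child to the parent through a pipe.
 * Returns the number of bytes the parent read into `out`, or -1 on error. */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t outsz)
{
    int fds[2];                       /* fds[0] = read end, fds[1] = write end */
    if (outsz == 0 || pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                   /* child: inherited both ends; write, then exit */
        close(fds[0]);
        ssize_t w = write(fds[1], msg, strlen(msg));
        close(fds[1]);
        _exit(w < 0 ? 1 : 0);
    }

    close(fds[1]);                    /* parent keeps only the read end */
    ssize_t total = 0, n;
    while ((size_t)total < outsz - 1 &&
           (n = read(fds[0], out + total, outsz - 1 - (size_t)total)) > 0)
        total += n;
    out[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);            /* reap the child so it does not linger as a zombie */
    return total;
}
```

An unrelated process has no way to obtain `fds`, which is why a named pipe (a path in the filesystem) is needed in that case.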
Complex IPC
• Messaging allows formatted data streams
to be sent to arbitrary processes.
• Semaphores allow processes to
synchronize.
• And shared memory allows processes to
share part of their virtual address space.
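A small sketch of the messaging mechanism, assuming the System V message queue calls (msgget, msgsnd, msgrcv, msgctl). For brevity it sends and receives within one process; the struct `text_msg` and function `msgq_roundtrip` are hypothetical names.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Layout expected by msgsnd/msgrcv: a long type tag followed by the payload. */
struct text_msg {
    long mtype;
    char mtext[64];
};

/* Creates a private queue, sends `text` tagged with `type` (must be > 0),
 * receives it back into `out`, and removes the queue.
 * Returns the payload length received (including the NUL), or -1 on error. */
int msgq_roundtrip(long type, const char *text, char *out, size_t outsz)
{
    struct text_msg m;
    int result = -1;
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (qid == -1)
        return -1;

    m.mtype = type;
    strncpy(m.mtext, text, sizeof m.mtext - 1);
    m.mtext[sizeof m.mtext - 1] = '\0';

    if (msgsnd(qid, &m, strlen(m.mtext) + 1, 0) == 0) {
        struct text_msg r;
        /* Passing `type` selects the first queued message with that tag. */
        ssize_t n = msgrcv(qid, &r, sizeof r.mtext, type, 0);
        if (n > 0 && outsz > 0) {
            strncpy(out, r.mtext, outsz - 1);
            out[outsz - 1] = '\0';
            result = (int)n;
        }
    }
    msgctl(qid, IPC_RMID, NULL);      /* the queue outlives processes unless removed */
    return result;
}
```

In real use the sender and receiver would be separate processes that agree on a key (for example through ftok) instead of IPC_PRIVATE.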
IRIX Process Management And
IPC
Brad Crabtree
Outline
• Hardware Background
• Process Management Facilities
• Interprocess Communication Facilities
Large Scale Computing
Machines a Reality
• The Avalon A12.
• The Cambridge Parallel Processing Gamma II Plus.
• The Compaq AlphaServer SC.
• The Fujitsu AP3000.
• The Fujitsu VPP5000 series.
• The Hitachi SR8000 system.
• The HP Exemplar V2600.
• The IBM RS/6000 SP.
• The NEC Cenju-4.
• The NEC SX-5.
• The Quadrics Apemille.
• The SGI Origin 2000 series.
• The Sun E10000 Starfire.
• The Tera/Cray SV1.
• The Tera/Cray T3E.
• The Tera MTA.
• June 2001
– Raytheon installs 1152-processor Origin
3000 series at NOAA
– $67M
– 900 GFLOPS
– 2 PB tape library
SGI Origin Architecture
• ccNUMA (NUMALink)
– non-blocking
crossbar switches as
an interconnect fabric
– 1.6GB-per-second
crossbar switch
Switch versus Bus
“Cellular IRIX” Scheduler
• Facilities for Improving Scalability and
Locality
• Job Priorities
– Real-Time Jobs
– Batch Critical
– Time Share
– Batch
– Weightless
• User-level Scheduler Concept
Real Time Jobs
• Global Run Queue replaced with Implicit
Binding Scheme
– improve cache affinity and scalability
– binds top N jobs, by priority, to N CPUs
• CPU is always available when real-time job
comes in because currently running job is of
lower priority
• Real-Time jobs always go to same CPU
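The "top N jobs to N CPUs" idea can be pictured with a toy model. This is only an illustration of the bullet above, not the IRIX implementation; the struct `job`, the function `bind_top_jobs`, and the convention that a larger number means higher priority are all assumptions.

```c
#include <stdlib.h>

struct job {
    int id;
    int priority;   /* larger value = more urgent (assumption of this sketch) */
    int cpu;        /* CPU the job is bound to, or -1 if unbound */
};

/* qsort comparator: highest priority first. */
static int by_priority_desc(const void *a, const void *b)
{
    const struct job *ja = a, *jb = b;
    return (jb->priority > ja->priority) - (jb->priority < ja->priority);
}

/* Sorts jobs by descending priority and binds the first `ncpus` of them
 * to CPUs 0..ncpus-1; the rest are left unbound. Returns the number bound. */
int bind_top_jobs(struct job *jobs, int njobs, int ncpus)
{
    int bound = 0;

    qsort(jobs, (size_t)njobs, sizeof *jobs, by_priority_desc);
    for (int i = 0; i < njobs; i++) {
        if (i < ncpus) {
            jobs[i].cpu = i;
            bound++;
        } else {
            jobs[i].cpu = -1;
        }
    }
    return bound;
}
```

With three jobs of priority 10, 50, and 30 and two CPUs, the jobs with priority 50 and 30 are bound and the third waits. A real scheduler would also keep a job on the CPU it already occupies to preserve cache affinity, which this sketch does not model.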
Hard Real-Time in IRIX
• REACT/PRO Extensions
– Lock processes, memory to CPUs
– Disable IRIX scheduler and replace with
Frame Scheduler, Deadline Scheduler or
None (yours)
– Direct interrupts away from CPUs
– Deterministic interrupt latency
Time Sharing Scheduler
• Degrading Priority replaced with
Earnings Model
– Distribution controlled by Virtual
Multiprocessors (VMPs)
– at 1 Hz, VMPs balance run queues with
nearest neighbors and push out extra work
Parallel Job Scheduling
• Gang Scheduling replaced with
Nanothreads
– Space sharing over Time Sharing
– Job requests CPUs, is told the number available,
and then re-blocks its algorithm to fit
– When a thread is preempted, its context is saved to
shared memory and the User Level Scheduler
re-blocks again
Replicated Kernel Text
• Mapped into kernel virtual memory space
by a wired 16 MB TLB pair
– One read-only, one read-write
– TLB miss exception overhead is avoided
Memory Migration
• Trying to avoid memory hot spots
• Reference counters in hub (local/remote)
• Fast Block Transfer Engine
– Marks Source Page as Poisoned
• Lazy TLB Shootdown
• Hysteresis to manage frequent migration
Types of IPC & Compatibility
• Signals
– Purpose: A means of receiving notice of a software or hardware event, asynchronously.
– Compatibility: POSIX, SVR4, BSD
• Shared memory
– Purpose: A way to create a segment of memory that is mapped into the address space of two or more processes, each of which can access and alter the memory contents.
– Compatibility: POSIX, IRIX, SVR4
• Semaphores
– Purpose: Software objects used to coordinate access to countable resources.
– Compatibility: POSIX, IRIX, SVR4
• Locks, Mutexes, and Condition Variables
– Purpose: Software objects used to ensure exclusive use of single resources or code sequences.
– Compatibility: POSIX, IRIX
• Barriers
– Purpose: Software objects used to ensure that all processes in a group are ready before any of them proceed.
– Compatibility: IRIX
• Message Queues
– Purpose: Software objects used to exchange an ordered sequence of messages.
– Compatibility: POSIX, SVR4
• File Locks
– Purpose: A means of gaining exclusive use of all or part of a file.
– Compatibility: SVR4, BSD
• Sockets
– Purpose: Virtual data connections between processes that may be in different systems.
– Compatibility: BSD
POSIX vs. IRIX Shared Memory
• POSIX functions
– mmap(2): Map a file or shared memory object into the address space.
– shm_open(2): Create, or gain access to, a shared memory object.
– shm_unlink(2): Destroy a shared memory object when no references to it remain open.
• IRIX functions
– usconfig(3): Establish the default size of an arena, the number of concurrent processes that can use it, and the features of IPC objects in it.
– usinit(3): Create an arena or join an existing arena.
– usadd(3): Join an existing arena.
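The POSIX calls listed above are typically combined as in the following sketch. It maps the same object twice within one process to stand in for two cooperating processes; the function name `shm_demo` is hypothetical, and older systems may need to link with -lrt.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates the shared memory object `name`, writes `text` through one
 * mapping, reads it back through a second mapping, then unlinks the object.
 * Returns 0 if the second mapping sees the same bytes, -1 otherwise. */
int shm_demo(const char *name, const char *text)
{
    size_t len = strlen(text) + 1;
    int result = -1;

    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd == -1)
        return -1;

    if (ftruncate(fd, (off_t)len) == 0) {       /* a new object has size 0 */
        char *a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        char *b = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
        if (a != MAP_FAILED && b != MAP_FAILED) {
            memcpy(a, text, len);               /* write via the first mapping */
            result = (memcmp(b, text, len) == 0) ? 0 : -1;
        }
        if (a != MAP_FAILED) munmap(a, len);
        if (b != MAP_FAILED) munmap(b, len);
    }
    close(fd);
    shm_unlink(name);                           /* remove the name once done */
    return result;
}
```

A second process would call shm_open with the same name (for example "/demo", an arbitrary choice) and mmap the descriptor to see the same bytes.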
usconfig options
• CONF_INITSIZE: The initial size of the arena segment. The default is 64 KB. Often you know that more is needed.
• CONF_AUTOGROW: Whether or not the arena can grow automatically as more IPC objects or data objects are allocated (default: yes).
• CONF_INITUSERS: The largest number of concurrent processes that can use the arena. The default is 8; if more processes than this will use IPC, the limit must be set higher.
• CONF_CHMOD: The effective file permissions on arena access. The default is 600, allowing only processes with the effective UID of the creating process to attach the arena.
• CONF_ARENATYPE: Establish whether the arena can be attached by general processes or only by members of one program (a share group).
• CONF_LOCKTYPE: Whether or not lock objects allocated in the arena collect metering statistics as they are used.
• CONF_ATTACHADDR: An explicit memory base address for the next arena to be created.
• CONF_HISTON/OFF: Start and stop collecting usage history (more bulky than metering information) for semaphores in a specified arena.
• CONF_HISTSIZE: Set the maximum size of semaphore history records.
IRIX IPC
• Tuned for Multiprocessor Environment
• Utilizes “shared arena” memory
– memory that can be mapped into the address
spaces of multiple processes
– A shared arena is identified with a file that
acts as the backing store for the arena
memory
– shared memory is pinned into physical
memory, accessible by programs and kernel
First Touch Rule
• Pages in an arena are allocated via first
touch
– places virtual page in the node that first
accesses it
• To ensure that processes spread across nodes have
local access to their most-used pages, touch each
page in the arena from the process that uses it most
– dynamic reallocation will handle it otherwise, but more slowly
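Touching pages deliberately can be as simple as writing one byte per page from the process that will use that region. The sketch below assumes a freshly allocated, page-aligned buffer (the write overwrites data) and uses sysconf to find the page size; `touch_pages` is a hypothetical helper, not an IRIX call.

```c
#include <stddef.h>
#include <unistd.h>

/* Writes one byte into every page of [buf, buf + len) so that the calling
 * process is the first to touch each page. Returns the number of pages touched. */
size_t touch_pages(volatile char *buf, size_t len)
{
    long ps = sysconf(_SC_PAGESIZE);
    size_t page = ps > 0 ? (size_t)ps : 4096;   /* fall back to 4 KB if unknown */
    size_t touched = 0;

    for (size_t off = 0; off < len; off += page) {
        buf[off] = 0;          /* the write forces the page to be allocated */
        touched++;
    }
    return touched;
}
```

Each worker would call this on its own slice of the arena before the main computation starts, so that the slice is placed on that worker's node.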
Linux Process Management
Alex MacFarlane
Threads
• The number of threads is limited only by the size of
physical memory. By default, the limit is set to half of that maximum:
max_threads = mempages / (THREAD_SIZE/PAGE_SIZE) / 2;
• Modifiable at runtime using sysctl() or
the proc filesystem interface.
• Was limited to 4k in Linux 2.2
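A worked instance of the max_threads formula: assuming 256 MB of RAM, 4 KB pages, and an 8 KB THREAD_SIZE (values typical of 32-bit x86 at the time, assumed here rather than taken from the slides), mempages is 65536 and the default limit comes out to 65536 / 2 / 2 = 16384. The helper name `default_max_threads` is hypothetical.

```c
/* Evaluates mempages / (THREAD_SIZE / PAGE_SIZE) / 2 with the sizes passed
 * in as parameters. Returns 0 for nonsensical inputs instead of dividing by zero. */
unsigned long default_max_threads(unsigned long mempages,
                                  unsigned long thread_size,
                                  unsigned long page_size)
{
    if (page_size == 0 || thread_size < page_size)
        return 0;
    /* thread_size / page_size = pages consumed by each thread's kernel stack */
    return mempages / (thread_size / page_size) / 2;
}
```

The final division by 2 is what leaves half of memory free even when the thread limit is reached.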
Thread Types
• Idle Thread(s)
– One per CPU in SMP system
– Created at boot time
• Kernel Threads
• User-space Threads
• Threads created by clone(), an extension
to fork()
clone() flags
• CLONE_VM
– Share the memory address space (data and stack)
• CLONE_FS
– Share filesystem info
• CLONE_FILES
– Share open files
• CLONE_SIGHAND
– Share signal handlers
• CLONE_PID
– Share PID with parent
Linux Scheduling Policies
• SCHED_OTHER
– Traditional UNIX scheduling
• SCHED_FIFO
– Runs until blocking on I/O, explicitly yielding CPU
or being pre-empted by higher priority realtime
task.
• SCHED_RR
– Same as SCHED_FIFO but limited to a timeslice
• Unprivileged user-space tasks must use SCHED_OTHER
• Static priorities may be assigned
using nice()
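The three policies are visible from user space through the POSIX scheduling interface. A minimal sketch that queries the static priority range of a policy; `policy_priority_range` is a hypothetical wrapper around the standard sched_get_priority_min and sched_get_priority_max calls.

```c
#include <sched.h>

/* Stores the static priority range of `policy` in *lo and *hi.
 * Returns 0 on success, -1 if the policy is not recognized. */
int policy_priority_range(int policy, int *lo, int *hi)
{
    int min = sched_get_priority_min(policy);
    int max = sched_get_priority_max(policy);

    if (min == -1 || max == -1)      /* both calls return -1 on error */
        return -1;
    *lo = min;
    *hi = max;
    return 0;
}
```

On Linux the range for SCHED_FIFO and SCHED_RR is usually 1 to 99, while SCHED_OTHER has the single static priority 0. Switching a task to a real-time policy is done with sched_setscheduler and normally requires privilege.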
Process Representation
• A collection of struct task_struct structures
• Linked in two ways:
– A hashtable hashed on pid
– A circular doubly-linked list
• Find specific task using find_task_by_pid()
• Walk tasks using for_each_task()
• Modifications protected by a read-write
spinlock.
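The double linkage can be sketched in user space as follows. This is a simplified model, not kernel code: the struct `task`, the table size, and the helpers `task_list_init`, `task_add`, `find_by_pid`, and `count_tasks` are hypothetical stand-ins for task_struct, find_task_by_pid(), and for_each_task(), and locking is omitted.

```c
#include <stddef.h>

#define PIDHASH_SZ 16   /* illustrative size only */

struct task {
    int pid;
    struct task *next_task, *prev_task;  /* circular doubly-linked list */
    struct task *hash_next;              /* chain within one hash bucket */
};

static struct task *pidhash[PIDHASH_SZ];

static unsigned hash_pid(int pid) { return (unsigned)pid % PIDHASH_SZ; }

/* Makes `init` the only element of the circular list and hashes it. */
void task_list_init(struct task *init)
{
    for (size_t i = 0; i < PIDHASH_SZ; i++)
        pidhash[i] = NULL;
    init->next_task = init->prev_task = init;
    init->hash_next = NULL;
    pidhash[hash_pid(init->pid)] = init;
}

/* Links `t` at the tail of the list (just before `head`) and into the hash. */
void task_add(struct task *head, struct task *t)
{
    t->prev_task = head->prev_task;
    t->next_task = head;
    head->prev_task->next_task = t;
    head->prev_task = t;

    unsigned h = hash_pid(t->pid);
    t->hash_next = pidhash[h];
    pidhash[h] = t;
}

/* Hash lookup: walks only the bucket chain for `pid`. */
struct task *find_by_pid(int pid)
{
    for (struct task *p = pidhash[hash_pid(pid)]; p != NULL; p = p->hash_next)
        if (p->pid == pid)
            return p;
    return NULL;
}

/* List walk: visits every task once, counting `head` itself. */
int count_tasks(struct task *head)
{
    int n = 1;
    for (struct task *p = head->next_task; p != head; p = p->next_task)
        n++;
    return n;
}
```

The hash gives fast lookup of one task by pid, while the circular list supports visiting all tasks starting from any element.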
Process States
• TASK_RUNNING: the task is in the run queue.
• TASK_INTERRUPTIBLE: the task is sleeping but can
be woken up by a signal or by expiry of a timer.
• TASK_UNINTERRUPTIBLE: same as TASK_INTERRUPTIBLE, except that it
cannot be woken up by a signal.
• TASK_ZOMBIE: the task has terminated but has not had its status
collected (wait()-ed for) by the parent (natural or by adoption).
• TASK_STOPPED: the task was stopped, either due to job control
signals or due to ptrace().
• TASK_EXCLUSIVE: not a separate state; it can be ORed with either TASK_INTERRUPTIBLE or
TASK_UNINTERRUPTIBLE. Prevents the “thundering herd” problem.
• A process’ state may be modified asynchronously.
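How an ORed TASK_EXCLUSIVE flag avoids a thundering herd can be shown with a toy wake-up loop. The numeric values, the `EX_` prefixed names, and the function `wake_up_sketch` are assumptions made for illustration; this is not the kernel's wake-up code.

```c
#include <stddef.h>

/* Illustrative values only; the real constants are defined by the kernel. */
#define EX_TASK_RUNNING        0u
#define EX_TASK_INTERRUPTIBLE  1u
#define EX_TASK_EXCLUSIVE      32u   /* flag bit ORed into a sleeping state */

/* Wakes every non-exclusive sleeper, but at most one exclusive sleeper.
 * `states` holds one state word per waiter. Returns the number woken. */
size_t wake_up_sketch(unsigned *states, size_t n)
{
    size_t woken = 0;
    int exclusive_done = 0;

    for (size_t i = 0; i < n; i++) {
        if (states[i] == EX_TASK_RUNNING)
            continue;                     /* already runnable */
        if (states[i] & EX_TASK_EXCLUSIVE) {
            if (exclusive_done)
                continue;                 /* leave the rest of the herd asleep */
            exclusive_done = 1;
        }
        states[i] = EX_TASK_RUNNING;
        woken++;
    }
    return woken;
}
```

With one ordinary sleeper and three exclusive sleepers, only two tasks are woken, so the remaining exclusive waiters do not all race for a resource that only one of them can take.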
Atomic Operations
• Two types
– Bitmap
– atomic_t
• Wrapped by bus locking on SMP
• Bitmap operations – for free/allocated bitmaps
– set_bit(), clear_bit(), change_bit(),
test_and_set_bit() etc.
• atomic_t operations – for numeric counts
– atomic_read(), atomic_set(), atomic_add(),
atomic_inc() etc.
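The kernel interfaces above cannot be called from user space, but C11 `<stdatomic.h>` offers analogous operations. The sketch below is such a user-space analogue, not the kernel API; `counter_inc` and `test_and_set_bit_sketch` are hypothetical names.

```c
#include <stdatomic.h>

static atomic_int  refcount;   /* plays the role of an atomic_t counter */
static atomic_uint bitmap;     /* plays the role of a free/allocated bitmap */

/* Atomically increments the counter and returns the new value. */
int counter_inc(void)
{
    return atomic_fetch_add(&refcount, 1) + 1;
}

/* Atomically sets bit `nr` (0..31 assumed) and returns its previous value. */
int test_and_set_bit_sketch(unsigned nr)
{
    unsigned mask = 1u << nr;
    unsigned old = atomic_fetch_or(&bitmap, mask);   /* read-modify-write in one step */
    return (old & mask) != 0;
}
```

A return value of 0 from `test_and_set_bit_sketch` means the caller claimed the bit; 1 means another thread already held it.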
References
• The SGI Origin software environment and application performance, Whitney, S.;
McCalpin, J.; Bitar, N.; Richardson, J.L.; Stevens, L., Compcon '97. Proceedings, IEEE,
1997, Page(s): 165-170
• An Integrated Kernel- and User-Level Paradigm for Efficient Multiprogramming,
Master’s Thesis, D. Craig, CSRD Technical Report No. 1533, University of Illinois at
Urbana-Champaign, 1999.
• Integrated scheduling of multimedia and hard real-time tasks, Kaneko, H.;
Stankovic, J.A.; Sen, S.; Ramamritham, K., Real-Time Systems Symposium, 1996., 17th
IEEE, 1996, Page(s): 206-217
• An Efficient Kernel-level Scheduling Methodology for Multiprogrammed Shared
Memory Multiprocessors, Proc. of the First Merged IPPS/SPDP Conference, pp. 392-397, Orlando, FL, 1998.
• Topics in IRIX Programming, Chapter 2, Interprocess Communication, Silicon
Graphics, Inc., 2001
• Topics in IRIX Programming, Chapter 3, Sharing Memory Between Processes,
Silicon Graphics, Inc., 2001
References
• Phyllis E. Crandall, Eranti V. Sumithasri, and Mark A. Clement.
Performance comparison of desktop multiprocessing and
workstation cluster computing. In Proceedings of the Fifth
International Symposium on High Performance Distributed
Computing, August 1996.
• www.sun.com
• Kotz, David and Nils Nieuwejaar, Flexibility and Performance of
Parallel File Systems, ACM Operating Systems Review 30(2),
ACM Press, April 1996, pp. 63-73.