P 1 - Georgia State Student Chapter of the ACM

Download Report

Transcript P 1 - Georgia State Student Chapter of the ACM

CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1. Goal
2. Transparency
3. Service
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1. Efficiency
2. Flexibility
3. Consistency
4. Robustness
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1. Access transparency
2. Location transparency
3. Migration transparency
4. Concurrency transparency
5. Replication transparency
6. Parallelism transparency
7. Failure transparency
8. Performance transparency
9. Size transparency
10. Revision transparency
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1. Primitive Services
2. Services by System Servers
3. Value added services.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1. Workstation-server model
Servers
Communication
Network
Workstations
2. Processor-pool model
Processor Pool
Server
Communication
Network
Servers
Intelligent Terminals
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1. LAN – Local Area Network
2. MAN – Metropolitan Area Network
3. WAN – Wide Area Network
4. On board interconnection networks
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1. Physical Layer
2. Data Link Control (DLC) Layer
3. Network Layer
4. Transport Layer
5. Session Layer
6. Presentation Layer
7. Application Layer
Note: TCP/IP Protocol Suite Correspondence. 1, 2,
as it is, 3 is IP, 4 and parts of 5 is TCP/UDP.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1. Object models and naming schemes.
2. Distributed Coordination.
3. Interprocess Communication.
4. Distributed Resources.
5. Fault tolerance and security.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
• Everything in a computer system are objects.
Objects are characterized by operations.
• Object identification schemes
• By name.
• By address.
• By service.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1. Barrier Synchronization
2. Condition Coordination
3. Mutual Exclusion
Related Problems:
1. Deadlock handling
2. Agreement protocols
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1. Point-to-point communication (unicast)
2. Group communication (multicast or broadcast)
1. Client/server model (socket)
2. Remote Procedure Call (RPC)
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1. Processing Power (processors) – load sharing,
multiprocessor scheduling
2. Distributed Data (distributed shared memory and
distributed file system) - synchronization, replica
management
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1. Transparency
2. Recovery
3. Confidentiality, authentication and authorization
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
•Processes are programs in execution
--- Each process has its own PCB, page table
and address space.
•Threads are lightweight processes
--- Multiple threads run on a process sharing the
address space of the process.
--- Each thread has its own TCB consisting of
program counter, stack pointers and register set.
--- Further each thread has its own stack.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
 Ease of creation – no need to copy address space
 Context switching is cheaper, particularly for
threads implemented in user space
 Allow concurrency as well as blocking system
calls making the programming easier
 Simpler and more efficient resource sharing
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
 Dispatcher-workers model.
 Team model.
 Pipeline model.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Requests
CSC 8420 Advanced Operating Systems
Worker thread
Worker thread
Worker thread
Dispatcher thread
Georgia State University
Yi Pan
Requests
Thread
Thread
Thread
o Identical threads
o Static
o Not identical
threads
CSC 8420 Advanced Operating Systems
o Dynamic
Georgia State University
Yi Pan
Requests
Thread
CSC 8420 Advanced Operating Systems
Thread
Thread
Georgia State University
Yi Pan
 Thread creation
-- Static
vs. Dynamic.
 Thread termination
 Thread synchronization
 Thread scheduling
-- Priority assignment, quantum size variation,
handoff and affinity scheduling.
 Signal handling
-- Signal handler, signals must not be lost.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
 Implementing in user space
 Implementing in kernel space
 Hybrid approach
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
 As an add-on library
 OS schedules the process, not threads, limited
scheduling choice – non-preemptive
priority.
 Low context switching overhead.
 One thread may block all other.
 Possible unfair process scheduling.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
 All scheduling policies may be implemented.
 OS is aware of threads and schedule them.
 Context switching overhead is higher.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
User space
threads
LWP
LWP
LWP
LWP
Kernel space
threads
Multiprocessor system
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
To analyze the performance requirements, the processes
should be represented in abstract ways, hiding
unnecessary detail.
Models for process representation:
 Synchronous process graph.
 Asynchronous process graph.
 Timing diagram.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Represents the precedence relation between
communicating processes using Directed Acyclic Graph
(DAG).
Can be used for total completion time analysis
etc.
Fork/join or cobegin/coend constructs may be
used to implement the precedence relation.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Represents communicating processes without any
reference to the precedence relation.
Used for processor allocation etc.
 One-way communication
 Client/server communication
 Peer to peer communication
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
The events alongwith their time of occurrence is shown in this
method. Provides greater detail and both the synchronous and
asynchronous process graphs may be derived from the timing
diagram.
P1
P2
P3
Time
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Fact: Many applications need the timing information.
Ex: Two processes updating a file. The file server received two simultaneous
write requests. Without a time stamp it is not possible for the file server to
determine which request is older.
Fact: It is not possible to perfectly synchronize clocks in a
distributed system.
Fact: Even if the clocks may be synchronized, physical
clocks will not run at same pace. Therefore, clock skew
will develop over time.
Hence, we need some protocol to ensure that the clocks
run in reasonable agreement. Such protocols are called
clock synchronization protocols.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Two types of clocks:
• Physical Clock (a close approximation to the real life clock).
• Logical Clock (used to provide ordering of the events).
Different algorithms are used to synchronize physical
and logical clocks.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Many Universal Coordinated Time (UTC) sources are
available for use by computers.
 NIST short wave radio
 Automated Computer Time Service (via modem)
 GPS satellites
Using these time services are costly and/or complicated.
Therefore, a few time servers (TS) access the UTC
sources directly. Other computers obtain the timing
information from one or more TS.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Fact: Accessing a TS via network requires a non-deterministic
message exchange time. Therefore, the following methods are
used to compensate the delay.
1.
Cristian’s Algorithm
2.
The Berkeley Algorithm
3.
Averaging Algorithm
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Sender
Request
Time Server
T0
I, time to process
interrupt
T1
Time, T
If d is the estimate of delay from the TS to
the sender, the sender should set its clock
to T+d.
If nothing is known about I, then estimated
d=(T1-T0)/2.
If I is known, then estimated d=(T1-T0-I)/2.
Or d may be estimated using average or
minimum of multiple requests.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
The time server periodically gathers time data
from all other machines and determines the
average time. Then it communicates the average
time (in fact the difference) to all machines.
 Suitable for systems which can not access any UTC
server.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
The Berkeley algorithm is centralized and therefore has a
single point of failure.
Every computer broadcasts its time information
at a regular interval known systemwide. Every
machine averages the broadcast information to
compute its own time.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Note that the clock of the system can not be set backwards.
Doing so may result in an inconsistent system state.
To solve the problem the clock is slowed down for a
certain period of time. Once the system clock reaches the
desired state, the rate of change of the clock may be
restored to the old state.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
It is possible to put an upper bound (D) on the rate at which the
clock skew develops. That means, the clock in a system may
deviate at most Dt from the real clock over time t.
Therefore, if two clocks are synchronized at time 0, then
they will differ by at most 2 Dt at the end of time period t.
If the system specification allows the clocks to differ by at
most T, then every computer will have to synchronize its
clock at least once every T/2D time.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
 Lamport’s algorithm
 Vector logical clocks
 Matrix logical clocks
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Lamport’s algorithm is based on two facts.
1. The events within a processor may be consistently
ordered using the system clock provided that the system
clock is monotonically increasing.
2. The ordering of the events occurring in different
processors do not matter as long as it is consistent with
the communication pattern. In other words, If processor
Pj sends a messsage to processor Pk at time T, then Pk can
not receive it on or before T.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Let T be the universal physical time and Cj(T) be the
value of the logical clock in processor j at time T.
Further, Cj(T) is monotonically non-decreasing.
1. Stamp each internal event by the logical clock
value of that processor.
2. Whenever a message is sent by processor k,
stamp it by Ck(T).
3. When a message is received by processor k
with a time stamp C’ set the logical clock of k to
max(C’+1,Ck(T)).
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Note that the version of the Lamport’s algorithm
provides only a partial ordering of the events across the
system.
If a total ordering of the events is necessary, then
the time stamp may be appended with the
processor id and the process id.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
One problem with Lamport’s algorithm is it does not help to
recognize the concurrent events in the sense that two concurrent
events may have different times stamps. Therefore, if T(a)<T(b)
(where T(a) is the time stamp associated with the event a), it is
not possible to say whether a causally happened before b, or they
are concurrent.
Definition: a causally happened before b if either
1.
a occurred before b in the same processor
2.
A message sent by the processor of a after a occurred is received by the processor
of b before b occurred.
3.
a causally happened before c and c causally happened before b.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
The problem may be solved using vector logical clock.
Each processor (k) maintains an array of logical clocks
Ck(T)=(Ck1 (T),…,Ckn(T)) where the ith entry corresponds to the
logical clock of processor I.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Definition: Ck(T)<Cj(T) if all coordinates of
Ck(T) are less than or equal to the
corresponding coordinates of Cj(T) and
at least one coordinate of Ck(T) is
strictly less than that of Cj(T).
If neither Ck(T)<Cj(T) nor Cj(T)< Ck(T) then
they are called incomparable.
The vector logical clock algorithm uses this vector clock to time
stamp the events. The vector logical clock is maintained in such a
way that the time stamp for two concurrent events will be
incomparable. If the time stamps are comparable then the event
with smaller time stamp occurred before.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
The processor k maintains its vector clock as follows.
•
The clock Ck(T) is initiated at (0,…,0).
•
The kth coordinate (Ckk(T)) changes as the local
clock changes.
•
Each message sent out is stamped by the vector
clock.
•
Whenever a message with time stamp Ci(T) is
received, coordinate p of the logical clock (Ckp(T))
is replaced by max(Ckp(T), Cip(T)) for p=1,…,n.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Instead of maintaining an array of logical clocks, an array of
vector clock is maintained. The matrix thus obtained is used to
stamp messages. Upon receipt of a message, the matrix clock is
updated by taking the pair-wise maximum.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1. Message Passing (lowest level of communication, unstructured)
2. Request/Reply (Client/server, RPC)
3. Transactions (Satisfies ACID property)
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
• Send(destination,message)
• Receive(source, message)
Source/destination may be described in four different ways
1.
Process name
2.
Link
3.
Mailbox
4.
port
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1. Unique global process id. required. May be obtained by
adding machine address to process id.
2. Allows only one logical communication link between
two processes.
3. Process identifiers need to be known at coding time.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1. Allows multiple logical communication path between
communicating processes.
2. The communicating processes need to know about each
other.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Mailboxes are global data structures shared by some sender
and some receiver processes.
1. Allows indirect communication between sender and
receiver.
2. Allows multipoint and multipath communication.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Ports are finite mailboxes with first-in-first-out (FIFO)
message queues.
1. Created by user processes using system calls.
2. May have restrictions in terms of access rights.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Send/receive primitives may be blocking or non-blocking.
Blocking receive implies the process cannot continue till
the message is received.
Blocking send may be of different types.
•
Ordinary blocking send
•
Reliable blocking send
•
Explicit blocking send
•
Request and reply
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Used in both Windows and UNIX.
• Pipes: Byte streams shared by parent process and children.
• Named pipes: FIFO files shared by unrelated processes
• Sockets: two-way communication links shared by processes in
different domains.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Provides Privacy, Integrity and Authenticity.
Authentication is done by third-party certification authority.
Privacy and integrity are maintained by handshake protocol
and cryptography.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
•
Reliability (Best effort vs. Reliable)
•
Delivery Order (FIFO, Causal Order, Total Order)
•
Failure of recipient(s) vs. Failure of originator
•
Overlapping groups
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
The algorithm is very similar to the vector logical clock.
1.
Each message is time stamped by a vector where each entry in the vector is
the number of messages received by the sender from that group member.
2.
Accept a message from process i if a) you have received all previous
messages from i and b) you have received all messages seen by i.
Otherwise,delay accepting the message.
3.
Reject any duplicated message.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Atomic multicast alongwith total order delivery provided
by two-phase, total-order multicast.
Originator:
•
Send the message, collect acknowledgements with time stamp.
•
Send the commit message with the highest logical ack time stamp (commit
stamp).
Recipient:
•
Send acknowledgement with the logical clock value as time stamp (local
ack stamp).
•
Do not deliver a message with commit stamp t until the commit message for
all messages with local ack stamp < t has been committed.
•
Deliver messages in the order of the commit stamp.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
In request/reply communication the sender is blocked (or the
message is considered not delivered) until it receives a reply.
RPC (Remote Procedure Call) is a language-level abstraction to
support request/reply communication.
RPC is implemented by stub procedures at both the client end
and the server end.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Implementation issues:
• Parameter passing and data conversion
- Call-by-value and call-by-copy/restore are favorite choices.
- Data typing, data representation, data transfer syntax problems. Use a
canonical data representation.
• Binding: directory server and port mapper.
• RPC compilation
• RPC exception and failure in-band or out-band signaling, idempotent
services, request id., reliable transport layer, handling server
crash, handling client crash
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Transactions are communications with ACID property:
• Atomicity: all or nothing
• Consistency: interleaving results in serial execution in some order
• Isolation: Partial results are not visible outside
• Durability: After committing, the results are permanent
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
There is one coordinator and multiple participants. Each of them
have access to some stable storage. Activity log is maintained in
the stable storage.
Coordinator:
1.
Prepare to commit the transaction by writing every update in activity
log.
2.
Write a precommit message in the activity log. Send a voting message
to all participants asking whether they are ready to commit.
3.
If all participants vote yes within the time-out period, multicast a
commit message. Otherwise, multicast an abort message.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Participant:
1.
Prepare to commit the transaction by writing every update in activity
log.
2.
Write a precommit message in the activity log. Wait for request to
vote from coordinator.
3.
Wait for commit message from the coordinator. If received, commit
the transaction. If abort message is received, abort the transaction.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Coordinator:
1.
If the processor crashes, it will check the activity log for the
transaction.
2.
If the precommit message is not in the log, abort the transaction.
3.
If the commit message is not in the log, retake the vote.
4.
If the commit message is there in the log, finish the transaction.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
•
Contention based
Lamport’s algorithm, Ricart & Agrawala’s algorithm, Voting algorithm
•
Token based
Ring structure, tree structure, broadcast structure
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Every process maintains a queue of pending requests for
entering critical section ordered according to the logical timestamp.
Requestor:
1. Send a request to every process.
2. After all replies are collected, enter its own request in
its own queue
3. If own request is at the head of the queue, enter critical
section.
4. Upon exiting the critical section, send a release
message to every process.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Other processes:
1. After receiving a request, send a reply and enter the
request in the queue
2. After receiving release message, remove the
corresponding request from the queue.
3. If own request is at the head of the queue, enter critical
section.
Problems:
• 3(N-1) messages per requests
• Multiple points of failure
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Requestor:
1. Send a request to all other process.
2. Enter critical section upon receipt of reply from all
processes.
Other Processes:
1. Upon receipt of a request check whether it has any
older (by logical clock) pending request of its own or
whether it is executing in critical section.
2. If neither of the above conditions hold, send reply.
Otherwise delay the reply message till both the
conditions are false.
Problems:
1. 2(N-1) message exchange per request.
2. Multiple points of failure.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Requestor:
1. Send a request to all other process.
2. Enter critical section once REPLY from a majority is
received
3.
OtherBroadcast
processes: RELEASE upon exit from the critical
section.
1. REPLY
to a request if no REPLY has been sent.
Otherwise, hold the request in a queue.
2. If a REPLY has been sent, do not send another REPLY
till the RELEASE is received.
Observations:
1. No single point of failure, but possibility of deadlock.
2. O(N) messages per request.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Deadlock prevention method is used by ensuring no hold-and-wait
Requestor:
1. Send a request with time stamp to all other process.
2. Enter critical section once REPLY from a majority is
received
3. Return the REPLY message upon receipt of an
INQUIRY if not currently executing in the critical
section.
4. Broadcast RELEASE upon exit from the critical
section.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Other processes:
1. If no REPLY has been sent, REPLY to any REQUEST.
2. If a REPLY has been sent but the RELEASE has not
been received, compare the time stamp of the new
REQUEST with the time stamp of the replied REQUEST.
If the new REQUEST has an earlier time stamp, try to
retract the old REPLY sending INQUIRY message. If the
older REPLY is returned, send REPLY to the new
REQUEST. Otherwise, wait for the RELEASE.
Observation:
1. High (O(N)) messages per critical section entry.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
To reduce message overhead, the set of (N) processes is divided into N
sets S1 to Sn such that Si I Sk is non-null for all i and k. Si is called the
quorum of process i.
Requestor:
1. Need to secure the REPLY messages from all members
of its quorum only.
Observations:
1. It is possible to reduce number of messages
significantly.
2. Depending on the structure of the quorum and the
exact algorithm, single point of failure may exist.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Control-token circulates in the system in some fixed order. Possession of
the token grants permission to enter critical section.
• Ring structure – simple, deadlock-free, fair. The token circulates even in the absence
of any request (unnecessary traffic). Long path (O(N)) – the wait for token may
be high.
• Tree structure
• Broadcast structure
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
The processes are organized in a logical tree structure, each node pointing
to its parent. Further, each node maintains a FIFO list of token requesting
neighbors. Each node has a variable Tokenholder initialized to false for
everybody except for the first token holder (token generator).
Entry section:
If not Tokenholder
If the request queue empty
request token from parent;
put itself in request queue;
block self until Tokenholder is true;
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Exit section:
If the request queue is not empty
parent = dequeue(request queue);
send token to parent;
set Tokenholder to false;
if the request queue is still
not empty, request token from parent;
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Upon receipt of a request:
If Tokenholder
If in critical section put the requestor in the queue
else
parent = requestor; Tokenholder = false;
send token to parent;
endif
else
if the queue is empty
send a request
to the parent;
put the
requestor in queue;
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Upon receipt of a token:
Parent = Dequeue(request queue);
if self is the parent
Tokenholder = true
else
send token to the parent;
if the queue is not
empty
request token
from parent;
Note that all requests/token receipts must be processed in
FIFO order and the routines must execute atomically.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Data Structure:
The token contains
Token vector T(.) – number of completion of the critical section
for every process.
Request queue Q(.) – queue of requesting processes.
Every process (i) maintains the following
seq_no – how many times i requested critical section.
Si(.) – the highest sequence number from every process i heard of.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Entry Section (process i):
Broadcast a REQUEST message stamped with seq_no.
Enter critical section after receiving token.
Exit Section (process i):
Update the token vector T by setting T(i) to Si(i).
If process k is not in request queue Q and there are pending
requests from k (Si(k)>T(k)), append process k to Q.
If Q is non-empty, remove the first entry from Q and send the
token to the process indicated by the top entry.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Processing a REQUEST (process j):
Set Sj(k) to max(Sj(k), seq_no) after receiving a REQUEST from
process k.
If holds an idle token, send it to k.
Requires broadcast. Therefore message overhead is high.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Used to elect a centralized controller if the older one fails. The
election of a new controller helps to alleviate some of the
problems arising from having a single point of failure.
Two types of election algorithm
• Extrema- finding – based on global priority
• Preference-based – based on local preferences.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Election algorithms and mutual-exclusion algorithms are similar in the sense that
they try to find one process that will be leader or enter the critical section. However,
there are significant differences between two.
1.
Leader election is one time. A process may yield to another process. In
mutual exclusion algorithm, a process competes until it succeeds.
2.
In leader election starvation is not an issue.
3.
Once the leader is elected, all other processes must know who is the
leader.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Election algorithms are designed on the basis of the logical topology of the group of
the processes.
1.
Complete topology
2.
Ring
3.
Tree
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Assumptions:
1.
A process may reach all other processes in one logical hop.
2.
Communication is reliable, only processes may fail.
3.
Time to handle a message is bounded.
4.
Process faults are benign, i.e., a faulty process stops to generate
messages. In other words, a process never generates confusing wrong
messages.
5.
Upon recovery, a process knows that it experienced a failure.
6.
A failed process may rejoin the group upon recovery.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Our algorithm is an extrema-finding algorithm. It is called the bully
algorithm.
Upon detecting that the leader has failed, a process sends an inquiry
message to all higher-priority processes. If no reply is received within
the time-out period, the initiator sends an enter-election message to all
lower priority processes. After the initiator receives a reply from all
lower-priority processes or after the time-out, it declares itself as new
leader and broadcasts that information to all processes.
If a process receives an inquiry message from a lower priority process,
it send inquiry message to all higher priority process and enters election.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Our algorithm is an extrema-finding algorithm. The processes are organized
in a logical ring.
The initiator sends a message along ring with its own id.
Requesting everybody to elect it the leader.
A process compares its own priority with that of the id. on an
incoming message. If its own priority is higher, and it never
forwarded any other messages, replace the message with its own
id. and forward it. If its own priority is lower, forward the message.
If the message carries its own id. then elect itself the leader and
forward the election message.
Message complexity O(N2) in the worst-case, O(N) in the best-case.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
There are two types of election algorithms those use the
logical tree topology.
1. The algorithm runs after the spanning tree is
constructed.
2. The algorithm runs before the spanning tree is
constructed and constructs the spanning tree in the
process.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Note that the tree is an undirected tree without any parent-child
relation. When the election begins there are possible more than
one
contender.
Contender:
1. If receives an elect-me message from some other process
before step 2 is completed, stop contending.
2. Send an elect-me message to all neighbors with a timestamp.
3. If receives reply from all neighbors, declare itself as leader
by broadcasting I-am-leader message.
4. If receives I-am-leader message from some other process,
stop contending.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Leaf Nodes:
1. Among possible multiple elect-me messages pick the one
with lowest time-stamp and reply to it. Do not send multiple
replies.
Internal nodes:
1. Upon receiving an elect-me message from neighbor u,
compare its time stamp with other such messages received
earlier (if any). If this one is the oldest elect-me message,
forward it to all neighbors except u. Designate u as parent.
2. Upon receiving reply from all neighbors except parent,
forward it to the parent.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1. The effect of communication overhead
2. The effect of underlying architecture
3. Dynamic behavior of the system
4. Resource utilization
5. Turnaround time
6. Location and performance transparency
7. Task and data migration
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1. Static or off-line
2. Dynamic or on-line
3. Real-time
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1. Precedence graph
2. Communication graph
3. Disjoint process model
4. Statistical load modeling
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1. Communication system model
2. Processor pool model
3. Isolated workstation model
4. Migration workstation model
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1. Speedup
2. Resource utilization
3. Makespan or completion time
4. Load sharing and load balancing
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
m
l
Turnaround time=1/(m-l)
l
m
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
m
2l
m
Turnaround time=m/((m-l)(m+l))
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
The general problem is NP-complete.
1. Stone’s algorithm (2 heterogeneous processors, arbitrary
communication process graph).
2. Bokhari’s algorithm (n homogeneous processors, linear
communication process graph, linear communication
system)
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Assumptions:
1. Two heterogeneous processors. Therefore, the execution
cost of each process depends on the processor. Further, the
execution cost for each process is known for both
processors.
2. Communication cost between each pair of processes is
known.
3. Interprocess communication incurs negligible cost if both
the processes are in the same processor.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Objective Function:
Stone’s Algorithm minimizes the total execution and communication
cost by properly allocating the processors among the processes.
Let, G=(V,E) be the communication process model. Let A and B be two
processors. Let wA(u) and wB(u) be the cost of executing process u on processors
A and B respectively. Let, c(u,v) be the cost of interprocess communication
between processes u and v if they are allocated different processors. Further, S be
the set of processes to be executed on processor A and V-S be the set of process to
be executed on processor B. Stone’s algorithm computes S such that the objective
function Su e S wA(u)+Sv e V-S wB(v)+Su e S, v e V-S c(u,v) is minimized.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Example:
Processes
1
2
3
4
Cost at
Processor A
Cost at
Processor B
3
1
2
3
2
5
4
3
Computation Costs
1
1
2
3
4
4
5
0
1
2
2
3
2
4
Communication Costs
If process 1 and 2 are allocated to processor A, and the rest in processor B, then the
total cost is 3+1+4+3+5+0+1+2=19.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Commodity-flow problem:
Stone’s algorithm reduces the scheduling problem to the commodity-flow problem described below.
Let G=(V,E) be a graph with two special nodes u (source) and v (sink). Each edge has a
maximum capacity to carry some commodity. What is the maximum amount of the
commodity that can be carried from the source to the sink.
1.
There is a known polynomial time algorithm to solve the commodity-flow problem
2.
Let S be a subset of V such that the source is in S and the sink is in V-S. A set of edges (say
C) with one end in S and the other end in V-S is called a cut and the sum of the capacities of
the edges in C is called the weight of the cut. It can be shown that the maximum-flow is equal
to the minimum possible cut-weight. This is called max-flow, min-cut theorem.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Reduction to the commodity-flow problem:
Stone’s algorithm reduces the scheduling problem to the commodity-flow problem as follows.
Let G=(V,E) be the communication process graph. Construct a new graph G’=(V’,E’) by
adding two new nodes a (source, corresponding to processor A) and b (sink, corresponding
to processor B). For every node u in G, add the edges (a,u) and (b,u) in G’. The weights
(capacities) of the new edges will be wB(u) and wA(u) respectively.
1.
Note that a cut in G’ gives a processor allocation for the job.
2.
Further, the weight of the cut is same as the cost (objective function) of execution + cost of
communication for the processor allocation given by the cut.
3.
Therefore, if we compute the max-flow on G’, the corresponding min-cut gives the best
processor allocation.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
5
Example:
3
2
4
1
1
2
2
1
A
B
5
4
2
4
3
CSC 8420 Advanced Operating Systems
3
2
3
Georgia State University
Yi Pan
Assumptions:
1.
n homogeneous processors, connected in a linear topology.
2.
k>n processes, linear communication graph.
3.
Communication links are homogeneous.
4.
Computation and communication costs are known.
5.
Two processes not communicating directly will not be allocated the same
processor.
6.
Two communicating processes, if not allocated same processor, must be in
the adajacent processors.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Objective function:
Bokhari’s algorithm minimizes the sum of the total communication cost and the
maximum execution cost in a processor. Note the difference with the objective
function in Stone’s algorithm.
Let P1,…,Pn be processors in linear order and p1,…,pk be processes in linear
order. Let the execution cost of pi be wi and the communication cost between pi
and pi+1 be ci. If processes pj(j) to pi(j+1)-1 be allocated to processor Pi, then the
objective function for Bokhari’s algorithm will be maxiWj+Si=1ncj(i) where Wj is
the execution cost for processor j. Wj=wi(j)+wj(i+1)-1.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Example:
4
2
3
2
P1
1
4
P2
3
1
2
2
1
P3
The value of the objective function max(3+2,4,1+2+1)+(2+1)=8.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
To solve the problem Bokhari constructed the a layered graph. The graph has n+2
layers, where n is the number of processors. The top and bottom layer has 1 node each.
All other layers have k nodes where k is the number of processes. We number the layers
from 0 thru k+1 and denote the ith node of jth layer by v(i,j). The node in the top (0th)
layer is adjacent to all nodes in layer 1. The node in the bottom layer is adjacent to all
nodes in the kth layer. Other v(i,j)s are adjacent to v(i,j+1),…,v(k,j+1).
Example of a layered graph constructed
for n=2 and k=3. Note that any path
from the top layer to the bottom layer
gives a processor allocation subject to
the restrictions described earlier and
vice versa.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Each edge e in the graph has two weights, w1(e) and w2(e). If e is an edge connecting
v(i,j) and v(r,j+1) then w1(e) is the total execution time of the processes pi to pr-1 and
w2(e) is the communication cost between pr-1 and pr. Note that, a shortest path from the
top to bottom on the basis of w2 will minimize the total communication cost. Any
known shortest path algorithm may be used to do that.
Further, w1 may take only certain values. To be precise, there are k+1C2 possible values
of w1. We can compute those values and sort them. Then we can restrict the w1 by
deleting all edges with w1(e) greater than the restricted value. A shortest path in the new
layered graph will minimize the total communication cost subject to the restriction that
the maximum of the execution cost in any processor will not exceed the threshold value.
Using this method we can compute the objective function for each of the possible k+1C2
threshold values. The best processor allocation can thus be computed.
Note that a clever binary search over the possible values of w1 may reduce the
computational complexity of the algorithm significantly.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
The Goal of dynamic scheduling is to load share and balance dynamically. This goal is
achieved by migrating tasks from heavily loaded processors to lightly loaded
processors.
Two types of task migration algorithms:
• Sender-initiated
- Transfer policy (when to transfer?)
- Selection policy (which task to transfer?)
- Location policy (where to transfer?)
Potentially unstable.
• Receiver-initiated
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
A real-time task consists of Si, the earliest start time, Di, the deadline to finish the task
and Ci, the worst-case completion time.
A set of real-time tasks is called a real-time task set or a real-time system.
A system of real-time tasks is called hard real-time system if all of the tasks in the
system must complete execution within their respective deadlines. Even if one task
misses its deadline, the whole system is considered incorrectly executed.
A system of real-time tasks is called soft real-time system if the performance is judged
by how many tasks missed deadline and by how much.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
A schedule for a real-time system is a CPU assignment to the tasks of the system such
that a CPU is assigned to no more than one task at one time.
A schedule for a real-time system is called valid if no task is assigned CPU before its
earliest-start time and all tasks complete execution within their respective deadlines.
A real-time system is called feasible if there is a valid schedule for the system.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1. Dispersed files and clients
- login transparency, access transparency
- location transparency, location independence
2. Multiple users and files
- concurrency transparency
- replication transparency
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Basic Concepts
1. Files and file systems
2. Services and servers
3. File mounting and server registration
4. Stateful and stateless file servers
5. File access and semantics od sharing
6. Version control
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1. Files have names, attributes and data.
2. File organization – flat or hierarchical
3. File access – sequential, direct, indexed/indexed sequential
4. Key components of file service – directory, authorization, file
service (basic and transaction) and system services.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1.
A service may be provided by one/many servers. Similarly one process
may provide multiple services.
2.
Client/server relationship is relative. A schematic is given below.
Directory
services
Auth.
services
Clients
File
services
System
services
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1.
Mounting is method to build users’ directory structure by using files and
directories dispersed over multiple systems.
2.
Different types of mounting – Explicit, Boot and Auto-mounting.
3.
Mounting may or may not provide uniform global view.
4.
Mounting is not a location transparent protocol, hence server registration
is useful
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1.
A stateful server maintains state information for each open file. Staeless
server does not maintain state information.
Examples of state information – list of open files and clients, file
descriptors and handles, file position pointers, mounting
information, lock status, session keys, cache or buffer etc.
2.
For stateless servers there are certain issues to be taken care of
- Idempotency requirement
- File locking mechanism
- Session key management
- Cache consistency
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
space
time
Remote
Access
Cache
Access
Down/upload
acc.
Simple
R/W
No true
Sharing
Coherency
control
Coherency
Control
Transaction
Concurrency
Control
Coherency and
Concurrency
Control
Coherency and
Concurrency
Control
Session
Not
applicable
Not
applicable
Ignore
sharing
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1.
UNIX Semantics
- Can be implemented using only server-side cache.
- Closely approximated using client-side cache with write-through and writeinvalidate/update
- As a compromise write-back (file updated and cache invalidated only at the end
of the batch) cache coherence policy may be used.
2.
Transaction Semantics
- File at server is updated only at the end of a successful transaction
3.
Session Semantics
- File at server is updated only at the end of the session.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
One solution of the file sharing/replication problem is to create a new version of
the file upon every write or upon every close.
Problem: What if two applications open an older version of a file, then
application one closes the file (and thus creates a newer version) before
application 2. Then application 2 closes the file.
Three solutions:
1.
Ignore Conflict
2.
Resolve version conflict
3.
Resolve serializability conflict.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Transaction requirements:
1.
The execution of transactions are all or none.
2.
The interleaving of multiple transactions is serializable.
3.
Update is atomic.
Clients
Transaction
manager
Scheduler
Object
manager
Objects
Clients
Transaction
manager
Scheduler
Object
manager
Objects
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
A serializable schedule is a legal schedule such that its execution will produce
result equivalent to some serial execution of all transactions.
A simple, inefficient method to achieve serializability: Complete transactions in
private space, then update distributed objects using total order multicast.
Three concurrency control protocol for transaction management
1.
Two-phase locking
2.
Timestamp ordering
3.
Optimistic concurrency control
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
An object must be locked before accessing it.
No new lock can be acquired after releasing a lock.
Problems:
1.
Potential deadlock. May be solved by rollback or abort.
2.
Commit dependence resulting in probable rolling aborts if locks are released
as early as possible. May be solved using strict two-phase locking – locks
released only at commit or abort point.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Each transaction has a time-stamp. Transactions are serialized according to this
time-stamp. In other words, older transactions abort and restart if they have a
conflict with an younger transaction.
Each object has two time stamps RD and WR – times of the transactions which
read (wrote) that object last.
Further each object will have a list of tentative transaction times for pending
writes. Let Tmin be the minimum of these tentative times.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
The following actions are taken with each event:
•READ: Does not conflict with other reads. If the time-stamp of this read is
smaller than WR, this transaction is aborted. If it is allowed to proceed if it’s time
stamp is in between WR and Tmin. It is kept in a queue otherwise for some other
writes to commit.
•WRITE: Allowed to proceed only if it’s time stamp is greater than both RD and
WR. Then the write is marked tentative and put in the list.
•ABORT: Aborted read has no effect. Aborted write is removed from the tentative
list and if a pending read reaches the head of the queue, it is performed.
•COMMIT: A commit may not involve any pending read. If it has any pending
write then all preceding tentative writes aborted and this transaction is committed.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Transactions are allowed to proceed freely in the private work space. Before
committing it is validated. Once validated, the objects may be updated in updation
phase.
Each transaction ti has two time stamps, TSi and TVi for the start of the execution
and the validation phase respectively. Each object Oj records the times for its last
read and write commits, RDj and WRj, respectively. Ri and Wi are read set and
write set for ti. The transactions are serialized in the order of TVi.
VALIDATION: If tk is already in validation phase, and TVi < TVk, then ti can not
be validated and must be aborted.
If ti has no overlap with any other transaction, it is validated.
If ti execution phase overlaps with update phase of tk, then Ri and Wk needs to be
disjoint to validate ti.
If ti execution phase overlaps with validate and update of tk, then Ri and Wk needs
to be disjoint as well as Wi and Wk need to be disjoint.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Reading operation:
1.
Read-one-primary
2.
Read-one
3.
Read-quorum
Writing operation:
1.
Write-one-primary
2.
Write-all
3.
Write-all-available
4.
Write-quorum
5.
Write-Gossip
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1.
2*write-quorum > number of replicas
2.
Read-quorum+write_quorum > number of replicas
3.
Usually read-quorum is chosen to be smaller than write-quorum
4.
Voting by witnesses
5.
Weighted voting schemes.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Each File service agent (FSA) f maintains a time stamp of last updation – TSf
Each Replica manager (RM) i maintains a time stamp of last updation – TSi
Basic Gossip Protocol:
Read: If TSf > TSi the replica manager do not have current data. Either wait or
contact a different replica manager. Otherwise proceed with reading.
Write: Increase TSf. If TSf > TSi update and ask the replica manager to
propagate the update by gossip. Otherwise, depending on the application,
either overwrite or update and read. In both cases increase TSf.
Gossip message: If the gossip message carries a newer time-stamp, update
data, TSi and depending on application, repeat the gossip with Tsi.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1.
Uniform Memory Access (UMA)
Cache
MM
PE
Network
Cache
2.
PE
MM
Non-Uniform Memory Access (NUMA)
MM
PE
Network
MM
PE
CSC 8420 Advanced Operating Systems
Cache
Georgia State University
Cache
Yi Pan
Virtual Memory Space
MM
MM
MM
MM
MM
Frame
MM
Offset
Page Table Entry
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1.
Temporal and Spatial locality of references may be exploited using
cache
2.
Scalability may be improved by using improved network and/or by
using cache.
3.
DSM helps the application programmer by providing communication
transparency.
4.
Programmers are more familiar with shared memory. Further many
software are available for single processor and multiprocessor shared
memory systems.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
The performance of a DSM may be improved by 1. Placing a data block in
appropriate memory module, 2. By migrating data block to an appropriate
memory module and 3. by replicating block.
The performance of migration and replication strategies are measured by hit
ratio.
Block-migration strategies suffer from block-bouncing problem.
Both block-migration and block-replication strategies suffer from the problem
of false-sharing.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
In a distributed system, it is not possible to enforce uniprocessor-like coherency
for shared data items.
1.
There is no global clock. Therefore, it is not possible to determine latest
write.
2.
There will always be delays (non-deterministic).
Hence we use some weaker consistency model for shared data in distributed
environment. The programmer is supposed to know what kind of consistency
is guaranteed and act accordingly.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
If the user can not give any synchronization information, then general access
consistency models are used. The general access consistency models are
1.
Atomic consistency (usual uniprocessor consistency, a read never gets stale
data)
2.
Sequential consistency
3.
Causal consistency
4.
Processor consistency
5.
Slow memory consistency.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
The operation of all processors are executed in some sequential order and the
operations of each individual processor are executed in the order specified
by its program.
P1: W(X)1
P2:
P3:
W(Y)2
R(Y)2 R(X)0 R(X)1
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Only causally related writes must appear in same order to all processors.
P1: W(X)1
W(X)3
P2: RX(1) W(X)2
P3: R(X)1
R(X)3
R(X)2
P4: R(X)1
R(X)2
R(X)3
Causally consistent but not sequentially consistent.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Writes from same processor must appear in the order they were issued.
P1: W(X)1
P2:
RX(1) W(X)2
P3:
R(X)1
R(X)2
P4:
R(X)2
R(X)1
Processor consistent, but neither causally consistent nor sequentially consistent.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Writes from the same processor only in the same location (memory address)
must appear in the order they were issued.
P1: W(X)1 W(Y)0 W(Y)2
P2:
R(X)1
W(X)3
R(X)3 R(Y)0
Slow memory consistent, but not processor consistent.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
The user controls access to the ordinary shared variable by using semaphore or
other explicit synchronizing variables. The system provides some
consistency only for synchronizing variables.
Synchronization access consistency models are:
1.
Weak consistency
2.
Release consistency
3.
Entry consistency.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Access to a synchronization variable is synch(S).
1.
Access to all synchronizing variables are sequentially consistent
2.
A processor must not access a synchronizing variable with pending access
to any data
3.
A processor must not access any variable with pending access to a
synchronizing variable.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Access to a synchronization variable are acquire(S) and release(S).
1.
Access to all synchronizing variables are processor consistent
2.
Access to all shared variables between acquire and release are exclusive.
3.
Changes made to shared variables are made visible to the outside world
after release.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Access to a synchronization variable are acquire(X) and release(X).
1.
Access to all synchronizing variables are processor consistent
2.
Access to shared variable X between acquire and release are exclusive.
3.
Changes made to X are made visible to the outside world after release.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Master copy
E
<
Replicated block
V E
Replicated block
V E
Replicated block
V E
CSC 8420 Advanced Operating Systems
Processor- bits
Georgia State University
>
V – valid/invalid
E - exclusive
Yi Pan
Any cache manager faces the following events, and must take appropriate
actions to maintain functionality
1.
Read-hit
2.
Read-miss
3.
Write-hit
4.
Write-miss
The performance of a cache management algorithm is judged by both hit-ratios
and the overhead that occurs at each of the four events.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Event
Action by processor
Action by directory
Read hit
Use data from local cache
Do nothing
Read miss
Get the block from master
copy. Set V bit. May set
E bit also.
Send the block to the
requestor. Set P bit, may
set E bit also.
Write hit
Change own cache, send
data and invalidate
message to the directory.
Update master copy, send
invalidate message to
other copies.
Write miss
Same as write hit
Same as write hit.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Event
Action by processor
Action by directory
Read hit
Use data from local cache
Do nothing
Read miss
Get the block from master
copy. Set V bit. May set
E bit also.
Send the block to the
requestor. Set P bit, may
set E bit also.
Write hit
Change own cache, send
data and invalidate
message to the directory.
Update master copy, send
update message to other
copies.
Write miss
Same as write hit
Same as write hit.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
If the communication medium is a broadcast medium like bus, there is no
need of the directory, as all processors can monitor all cache related
messages and take appropriate actions. Clearly, such systems can
implement sequential consistency.
However, bus is a potential bottleneck and a single point of failure. Several
approaches are taken to alleviate the problem.
1. Cache traffic is carried on a separate bus.
2. Multiple bus in hierarchical configuration.
3. Multistage interconnection networks (MIN) made of bus.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
In a DSM system, memory modules are strictly local and communication is
only via high-latency links. Therefore, such systems simulates/emulates
shared memory by message passing.
Obviously, a process in such a system finds its address-space distributed in
the memories of different processors. Therefore, it has three options while
accessing a non-local memory page.
1.
Remote access
2.
Migration
3.
Replication
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
With three access mechanisms and two access types (read and write), there are
nine possible DSM management algorithms. Out of nine, the following
four are more interesting.
1.
Read-remote-write-remote
2.
Read-migrate-write-migrate
3.
Read-replicate-write-migrate
4.
Read-replicate-write-replicate
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
One or multiple server serves the memory pages.
1.
Simple to implement
2.
If implemented with a single server, the system is always sequentially
consistent provided that the server serializes request and response services.
3.
High latency
4.
Servers are potential bottlenecks.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
A memory page migrates to the new processors memory upon access
1.
Coherence problem need not be handled separately
2.
Exploits program localities well
3.
Upon migration, all processors sharing the page need to update their virtual
page to physical block mapping
4.
Ping-pong effect is a problem that becomes more severe by false sharing.
May be handled by smaller page size (associated with high overhead), or a
combination of migrate-replicate-remote access strategy.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Same as the write-invalidate strategy for cache management.
1.
Popular because many software use the multiple-read/exclusive-write
semantic.
2.
Performs well when reads are dominant operation.
3.
In the absence of broadcast support in hardware, write becomes a costly
operation.
4.
Maintains strong consistency.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Same as the write-update strategy for cache management.
1.
Performs well when reads are not dominant operation.
2.
In the absence of broadcast support in hardware, write becomes a very
costly operation.
3.
Maintains strong consistency only when used with a suitable protocol like
two-phase commit.
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Two problems need to be solved:
1.
Finding the current owner of a page (if there is migration).
2.
Finding the set of processors with a copy of the page (if there is
replication.)
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
1.
If there is a master owner, that may keep track of the current owner
2.
Otherwise, the following data structure may be used.
Block probable
owner
Block
Block probable
owner
Block
probable
owner
Block
probable
owner
probable
owner
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
From To’s
From To’s
From To’s
Spanning Tree
From NIL
From NIL
Forwarded
Head master
next
Head master
next
Head master
next
Update Invalidate Request
Head master
next
Acknowledgement
Requestor
Linked list
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Distributed Shared Memory systems are classified on the basis of the following criteria
1.
Implementation – Hardware, Software or Hybrid
2.
Architecture Configuration or Interconnection Topology – Bus, Ring, Cube, MIN etc.
3.
Shared Data Organization – Structured (Objects, higher level types etc.) or Non-structured.
4.
Granularity/Coherence unit – Word, Cache, Block, Page, Object etc.
5.
DSM Algorithms – SRSW, MRSW, MRMW.
6.
Management Responsibility – Distributed or centralized
7.
Consistency Model
8.
Coherence Protocol
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan
Processor
Processor
Processor
Processor
Cache
Cache
Cache
Cache
Local Bus
Read/Write
LB to VB
Interface
Node Local
Memory
Lock Space
Control Spc
Data Spc
Write
Network
Interconnect
RM BUS
Write
VME Bus
I/O
CSC 8420 Advanced Operating Systems
I/O
Georgia State University
Yi Pan
1.
Implementation – Hybrid, Reflective memory + library routines
2.
Architecture Configuration or Interconnection Topology – Hierarchical bus
3.
Shared Data Organization – Structured (Data structure)
4.
Granularity/Coherence unit – Data structure
5.
DSM Algorithms – MRMW.
6.
Management Responsibility - Distributed
7.
Consistency Model – Entry consistency
8.
Coherence Protocol – Write-update
CSC 8420 Advanced Operating Systems
Georgia State University
Yi Pan