Distributed Computing

Distributed Computation Using Files
Part 1
f1 = open(toPart2, …);
while(…){
  write(f1, …);
}
close(f1);
f2 = open(toPart1, …);
while(…){
  read(f2, …);
}
close(f2);
Part 2
f1 = open(toPart2, …);
while(…){
  read(f1, …);
}
close(f1);
…
f2 = open(toPart1, …);
while(…){
  write(f2, …);
}
close(f2);
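In C, Part 1 might look like the sketch below: it streams data to Part 2 through the file toPart2, then reads the replies back from toPart1. Only the two file names come from the slide; the line-oriented data and the use of stdin are illustrative assumptions, and error checks are elided.

/* A minimal C sketch of Part 1. Plain files are used to match the
 * slide; FIFOs would avoid the need to coordinate open/close order. */
#include <stdio.h>

int main(void) {
    char buf[256];

    /* Send phase: stream requests to Part 2 through a file. */
    FILE *f1 = fopen("toPart2", "w");
    if (!f1) return 1;
    while (fgets(buf, sizeof buf, stdin) != NULL)
        fputs(buf, f1);
    fclose(f1);

    /* Receive phase: read Part 2's replies back. */
    FILE *f2 = fopen("toPart1", "r");
    if (!f2) return 1;
    while (fgets(buf, sizeof buf, f2) != NULL)
        fputs(buf, stdout);
    fclose(f2);
    return 0;
}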
Finer Grained Sharing (1)
• Distributed computing can be for:
– Speed up (simultaneous execution)
– Simplify data management (consistency)
• Last Generation: programming languages (and
programmers) do serial computation
– Want to share global data
– Speed up is a specialist’s domain
• Procedure is basic unit of abstraction
– Abstract data type behavior
Finer Grained Sharing (2)
• Newer computing model
  – Partition into processes/threads
  – Message-passing communication
• New OS & language support
  – Remote procedures
  – Remote objects with remote method invocation
  – Distributed process management
  – Shared memory
  – Distributed virtual memory
• … but first, how to partition the computations?
Data Partition
[Figure: one serial form, while(…){…}, is replicated; the data are
distributed among the copies, and all data streams execute
simultaneously]
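A minimal sketch of a data partition using threads on one machine (the same idea extends to processes on separate machines): each worker runs the identical serial loop body over its own slice of the data. The work() function and the slice sizes are illustrative assumptions.

/* Each thread executes the same serial form on a distinct data slice. */
#include <pthread.h>
#include <stdio.h>

#define N 1000
#define PARTS 4

static double data[N];

static double work(double x) { return x * x; }  /* placeholder computation */

static void *run_slice(void *arg) {
    long p = (long)arg;
    for (long i = p * (N / PARTS); i < (p + 1) * (N / PARTS); i++)
        data[i] = work(data[i]);                /* the serial form */
    return NULL;
}

int main(void) {
    pthread_t t[PARTS];
    for (long p = 0; p < PARTS; p++)            /* all slices at once */
        pthread_create(&t[p], NULL, run_slice, (void *)p);
    for (int p = 0; p < PARTS; p++)
        pthread_join(t[p], NULL);
    printf("done\n");
    return 0;
}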
Functional Partition (1)
[Figure: a serial form is decomposed into its constituent parts, which
are then grouped into a partition of communicating functions]
Functional Partition (2)
• Software is composed from procedures
• All programmers are familiar with procedural
abstraction – exploit procedure model
• Allow each function to be a blob
• Implement each blob as a process
• OS provide network IPC mechanism for serial use
of distributed functions
– TCP/IP
– Messages
– Serial “procedure call” protocol between client and
server
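A minimal sketch of that serial client-side protocol over TCP/IP, assuming the server listens at an illustrative host and port and speaks a one-request/one-reply message format; error checks are elided.

/* The client writes one request message and blocks until the reply
 * arrives: a serial "procedure call" over a socket. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void) {
    int s = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in srv = {0};
    srv.sin_family = AF_INET;
    srv.sin_port = htons(5000);                     /* assumed port */
    inet_pton(AF_INET, "127.0.0.1", &srv.sin_addr); /* assumed host */
    connect(s, (struct sockaddr *)&srv, sizeof srv);

    const char *request = "func 1 2 3";   /* the call, encoded as a message */
    write(s, request, strlen(request));   /* "call" the remote function */

    char reply[256];
    ssize_t n = read(s, reply, sizeof reply - 1);   /* block for the result */
    if (n > 0) { reply[n] = '\0'; printf("result: %s\n", reply); }
    close(s);
    return 0;
}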
Record Sharing
Part 1
…
while(…){
writeSharedRecord(…);
readSharedRecord(…);
}
…
Part 2
…
while(…){
readSharedRecord(…);
writeSharedRecord(…);
}
…
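One way writeSharedRecord()/readSharedRecord() could be realized on a shared file: fixed-size records serialized with fcntl() record locks. The record size, layout, and helper names are assumptions, not part of the slide.

/* Fixed-size records in a shared file, one fcntl() lock per record. */
#include <fcntl.h>
#include <unistd.h>

#define RECSZ 128

static void lock_rec(int fd, int rec, short type) {
    struct flock fl = { .l_type = type, .l_whence = SEEK_SET,
                        .l_start = (off_t)rec * RECSZ, .l_len = RECSZ };
    fcntl(fd, F_SETLKW, &fl);            /* wait for the record lock */
}

void writeSharedRecord(int fd, int rec, const char buf[RECSZ]) {
    lock_rec(fd, rec, F_WRLCK);
    pwrite(fd, buf, RECSZ, (off_t)rec * RECSZ);
    lock_rec(fd, rec, F_UNLCK);
}

void readSharedRecord(int fd, int rec, char buf[RECSZ]) {
    lock_rec(fd, rec, F_RDLCK);
    pread(fd, buf, RECSZ, (off_t)rec * RECSZ);
    lock_rec(fd, rec, F_UNLCK);
}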
Message Passing
[Figure: a message is copied from the application buffer into a kernel
buffer, then transmitted as a network packet inside a data-link (DL)
frame]
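The same copy path made concrete with a UDP datagram: sendto() copies the application buffer into a kernel buffer, and the kernel transmits it as a network packet inside a data-link frame. The address and port are illustrative; error checks are elided.

#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void) {
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in dst = {0};
    dst.sin_family = AF_INET;
    dst.sin_port = htons(6000);                 /* assumed port */
    inet_pton(AF_INET, "10.0.0.2", &dst.sin_addr);

    char appbuf[64] = "a message";              /* application buffer */
    sendto(s, appbuf, strlen(appbuf), 0,        /* copy into kernel buffer */
           (struct sockaddr *)&dst, sizeof dst);
    close(s);
    return 0;
}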
Remote Procedure Call
int main(…) {
  …
  func(a1, a2, …, an);
  …
}
void func(p1, p2, …, pn) {
  …
}
[Figure: the same call site shown twice: once as a conventional
procedure call within a single address space, and once as a remote
procedure call where func executes in a server's address space]
Conceptual RPC Implementation
int main(…) {
…
func(a1, a2, …, an);
…
}
void func(p1, p2, …, pn) {
…
}
…
pack(a1, msg);
pack(a2, msg);
…
pack(an, msg);
send(rpcServer, msg);
// waiting ...
result = receive(rpcServer);
...
// Initialize the server
while(TRUE) {
msg = receive(anyClient);
unpack(msg, t1);
unpack(msg, t2);
…
unpack(msg, tn);
func(t1, t2, …, tn);
pack(t1, rtnMsg);        // marshal the results back (copy/restore)
pack(t2, rtnMsg);
…
pack(tn, rtnMsg);
send(theClient, rtnMsg); // reply goes to the caller
}
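A plausible shape for the pack()/unpack() marshalling used above, assuming a flat message buffer and integers converted to network byte order; the msg_t type and helper names are invented for illustration.

#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

typedef struct { unsigned char data[1024]; size_t len, pos; } msg_t;

/* Append an int in canonical (network) byte order; no overflow check
 * in this sketch. */
void pack_int(msg_t *m, int v) {
    uint32_t net = htonl((uint32_t)v);
    memcpy(m->data + m->len, &net, sizeof net);
    m->len += sizeof net;
}

/* Consume the next int from the message. */
void unpack_int(msg_t *m, int *v) {
    uint32_t net;
    memcpy(&net, m->data + m->pos, sizeof net);
    m->pos += sizeof net;
    *v = (int)ntohl(net);
}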
Implementing RPC
• Syntax of an RPC should look as much like a local
  procedure call as possible
• Semantics are impossible to duplicate, but they
  should also be as close as possible
• The remote procedure's execution environment
  will not be the same as a local procedure's
  environment:
  – Global variables
  – Call-by-reference (illustrated below)
  – Side effects
  – Environment variables
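The call-by-reference problem is worth making concrete: a client pointer is meaningless in the server's address space, so stubs typically substitute copy-in/copy-out (call-by-copy/restore). In this self-contained sketch the transport is stubbed out and all names are hypothetical.

#include <stdio.h>

static int fake_server_inc(int v) { return v + 1; }  /* stands in for the wire */

void stub_inc(int *p) {
    int in = *p;                   /* copy-in: marshal the value, not the pointer */
    int out = fake_server_inc(in); /* send(…); receive(…) in a real stub */
    *p = out;                      /* copy-out: restore the result */
}

int main(void) {
    int x = 41;
    stub_inc(&x);
    printf("%d\n", x);             /* prints 42 */
    return 0;
}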
Implementing RPC
theClient:
int main(…) {
  …
  localF(…);         // ordinary local call
  …
  remoteF(…);        // intercepted by the client stub
  …
}
void localF(…) {
  …
  return;
}
clientStub:
lookup(remote);      // ask the name server where remoteF lives
pack(…);
send(rpcServer, msg);
receive(rpcServer);  // block until the reply arrives
unpack(…);
return;
rpcServer:
register(remoteF);   // advertise remoteF with the name server
while(1) {
  receive(msg);
  unpack(msg);
  remoteF(…);
  pack(rtnMsg);
  send(theClient, rtnMsg);
}
void remoteF(…) {
  …
  return;
}
Name Server:
void register(…) {
  …
}
void lookup(…) {
  …
}
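One plausible realization of the name server's pair of operations: a table mapping procedure names to (host, port) bindings. C reserves the keyword register, so the sketch names that operation reg; every detail here is an assumption.

#include <string.h>

#define MAXREG 64

struct binding { char name[32]; char host[64]; int port; };
static struct binding table[MAXREG];
static int nreg;

/* register(): record where a remote procedure can be reached. */
void reg(const char *name, const char *host, int port) {
    strncpy(table[nreg].name, name, 31);
    strncpy(table[nreg].host, host, 63);
    table[nreg].port = port;
    nreg++;
}

/* lookup(): resolve a procedure name to its binding, if any. */
struct binding *lookup(const char *name) {
    for (int i = 0; i < nreg; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;                   /* not registered */
}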
Compiling an RPC
• Explicit call: callRemote(remoteF, …);
• Transparent call: remoteF(…);
• When is the stub bound to the remote procedure?
  – Compile time
  – Link time
  – Dynamic binding (e.g. via the name server)
A Partitioned Computation
[Figure: the serial form is decomposed into its parts, which are then
grouped into a partition of cooperating processes]
Supporting the Computation
• Each blob might be a process, thread, or object
• Blobs should be able to run on distinct,
  interconnected machines
• OS must provide mechanisms for:
  – Process management
    • Control
    • Scheduling
    • Synchronization
    • IPC
  – Memory management
    • Shared memory
  – File management – remote files
• Distributed OS or cooperating network OSes?
Control
• Remote process/thread create/destroy
• Managing descriptors
• Deadlock
Scheduling
• Threads and processes
• Explicit scheduling
• Transparent scheduling
• Migration & load balancing
• Objects
  – Active vs passive
  – Address spaces
Synchronization
• Distributed synchronization
  – No shared memory ⇒ no semaphores
  – New approaches use logical clocks & event
    ordering (sketched below)
• Transactions
  – Became a mature technology in DBMSs
  – Multiple operations ending with a commit or abort
• Concurrency control
  – Two-phase locking
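A sketch of Lamport's logical-clock rules, the classic replacement for shared-memory ordering: tick the local clock on every event, and on receipt advance it past the timestamp carried by the message.

#include <stdio.h>

static long clock_val;             /* this process's logical clock */

long local_event(void) { return ++clock_val; }
long send_event(void)  { return ++clock_val; }  /* timestamp goes in the msg */
long recv_event(long msg_ts) {                  /* take max, then tick */
    clock_val = (msg_ts > clock_val ? msg_ts : clock_val) + 1;
    return clock_val;
}

int main(void) {
    local_event();                      /* C = 1 */
    long ts = send_event();             /* C = 2; message carries 2 */
    printf("sent at %ld\n", ts);
    printf("recv at %ld\n", recv_event(7)); /* peer was ahead: C = 8 */
    return 0;
}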
Traditional Memory Interfaces
[Figure: a process uses a primary memory interface to virtual memory
(backed by physical memory) and a secondary memory interface to file
management (backed, via the device interface, by storage devices)]
Remote File Services
[Figure: the traditional stack, plus two remote variants of the
secondary memory interface: a remote disk client talking to a remote
disk server, and a remote file client talking to a remote file server]
Distributed Shared Memory
[Figure: the process uses a remote memory interface; a remote memory
client forwards references to a remote memory server, alongside local
file management]
• Static memory ⇒ new language
• Dynamic memory ⇒ new OS interface
  – Low-level interface
    • Binding across address spaces
    • Shared-memory malloc (see the sketch below)
  – High-level interface
    • Tuples
    • Objects
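A single-machine analogue of the dynamic-memory, low-level interface: POSIX shm_open()/mmap() binds a named region into several address spaces, which is the behavior a shared-memory malloc would extend across machines. Error checks are elided in this sketch.

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create or attach a named shared region holding one int. */
int *attach_counter(void) {
    int fd = shm_open("/counter", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, sizeof(int));
    return mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                MAP_SHARED, fd, 0);
}

int main(void) {
    int *counter = attach_counter();
    (*counter)++;          /* visible to every process that attaches */
    return 0;
}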
Distributed Virtual Memory
[Figure: the process keeps the primary memory interface; virtual
memory is backed by physical memory and by a remote paging client,
which fetches pages from a remote paging server and its storage
devices]
Distributed Objects
[Figure: two processes with object interfaces. Local objects are
invoked directly for performance; remote objects are reached through a
remote object client/server pair behind the same interface, e.g.
CORBA, DCOM, SOAP, …]