02DistributedSystemBuildingBlocksx - Tsinghua
Distributed System Building Blocks
Outline
Distributed Programming Paradigm
◦ Shared Memory Programming
◦ Message Passing Interface
Networking
Remote Procedure Call
Distributed Programming Paradigm
Based on the memory architecture, programming paradigms can be roughly
categorized into two classes:
Shared Memory Programming
◦ The processing units share the same memory
space
Message Passing Interface Programming
◦ There is no shared memory among the processing units, so they can
communicate only by sending and receiving messages
Parallel Processing Idea
Serialized processing with context switch
Task 1
Task 2
Parallel processing
Task 1
Task 2
Shared Memory Programming Model
Multiple processing units connect to the shared memory and have the
same memory address space. All the processing units see virtually the
same memory.
[Figure: several processing units connected through memory buses to several memory units, which together form a single shared memory address space]
Multi-thread Programming
Shared memory multi-thread programming is the standard for single
machine programming. It can harness the full power of multicore
architecture (with careful programming). The programming model is
quite simple but it is hard to program correctly and efficiently.
◦ Windows: WinThread
◦ Linux: pthread
◦ Scientific Computing: OpenMP
We will see some examples of pthread and OpenMP programming and
use the examples to show some important concepts while building our
distributed systems.
Process and Thread
Thread creation and termination
Create a thread by providing the entry point of the thread (a function)
◦ pthread_create(thread, attr, start_routine, arg)
Wait for a thread to finish. This is a special kind of thread synchronization.
◦ pthread_join
Quit the execution of a thread
◦ pthread_exit
Once threads are created, they are peers and run independently.
pthreadcreate.c
Thread synchronization
As threads share the same memory address space, it is dangerous for
shared resources to be accessed simultaneously. If this happens, the
behavior of the program is undefined. Thus, we need mechanisms to
synchronize access to the shared resources.
Commonly, this can be achieved through locks.
Another kind of synchronization is needed when you want to define the
order of the instruction flow across different threads. As any two threads
are independent, some synchronization must be used to implement this.
Mutual Exclusion
Providing mutual exclusion access to shared resources.
[Figure: several threads contending for the same shared resources]
pthread Mutual Exclusion
Mutex is an abbreviation for "mutual exclusion". Mutex variables are
one of the primary means of implementing thread synchronization
and for protecting shared data when multiple writes occur. Only one
thread can lock (or own) a mutex variable at any given time.
pthread_mutex_init
pthread_mutex_destroy
pthread_mutex_lock
pthread_mutex_unlock
pthreadmutex.c
Define execution order among different threads
Events are commonly used as a mechanism for defining the execution
order among portions of code located in multiple threads.
An event means that the execution of some threads will not continue until
something has happened. Another thread will make that thing happen.
[Figure: thread1 waits; thread2 works and then notifies thread1]
pthread Condition Variables
Condition variables allow threads to synchronize based upon the
actual value of data.
pthread_cond_init(condition, attr)
pthread_cond_destroy(condition)
pthread_cond_wait(condition, mutex)
pthread_cond_signal(condition)
pthread_cond_broadcast(condition)
pthreadcondition.c
Race condition
Two threads access shared resources without synchronization. The
behavior under a race condition is undefined and might bring
undesirable results.
int global_counter = 0;

// thread 1
for (int i = 0; i < 50; i++)
    global_counter += i;

// thread 2
for (int i = 50; i <= 100; i++)
    global_counter += i;

What will be the final value of global_counter after these two code blocks
have finished?
Deadlock
// thread 1
lock(A)
lock(B)
do_something()
unlock(B)
unlock(A)

// thread 2
lock(B)
lock(A)
do_someotherthings()
unlock(A)
unlock(B)
Deadlock on the road
Livelock
The threads only keep requesting locks but do nothing useful.
// thread 1
while (true){
    Lock L1
    if (!Lock L2)
        Release(L1)
    else
        break;
}
do something useful here

// thread 2
while (true){
    Lock L2
    if (!Lock L1)
        Release(L2)
    else
        break;
}
do something useful here
Message Passing Interface
With a large number of computing nodes, it is very difficult to build
a single shared memory space for the processing units. Instead, processes
exchange information by sending and receiving messages.
MPI is the de facto standard for programming in the cluster
environment for scientific computing.
MPI Programs
Each process has its own stack and code segment. Processes exchange
information by passing messages. MPI supports both SPMD and MPMD
computing.
[Figure: SPMD program. Processes 0, 1 and 2 each run the same Load, Process, Gather, Store sequence.]
MPI supports MPMD
(a) MPMD: Master/Worker
(b) MPMD: Coupled Analysis
(c) MPMD: Streamline
[Figure: in each panel, Nodes 1 to 3 run a mix of the programs prog_a, prog_b and prog_c.]
Create the MPI world
#include <stdio.h>
#include "mpi.h"

int main( int argc, char *argv[] ) {
    int rank;
    int size;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf( "Hello world from process %d of %d\n", rank, size );
    MPI_Finalize();
    return 0;
}

Output with 4 processes (the order of the lines may vary):
Hello world from process 0 of 4
Hello world from process 1 of 4
Hello world from process 2 of 4
Hello world from process 3 of 4
MPI Basic: Point-to-Point Communication
int MPI_Send(buf, count, datatype, dest, tag, comm)
int MPI_Recv(buf, count, datatype, source, tag, comm, status)
Which parameters make the communication happen?
◦ the buffer of the sender or receiver
◦ quantity of data, count
◦ data type
◦ source and destination
◦ tag
◦ the communicators and groups
Group Synchronization
MPI_Barrier(comm)
◦ Creates a barrier synchronization in a group. Each task, when reaching the
MPI_Barrier call, blocks until all tasks in the group reach the same
MPI_Barrier call.
Broadcast
[Figure (a): broadcast. Before, one process holds the data item A0; after the broadcast, all six processes hold a copy of A0.]
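Scatter and Gather
Scatter: one to many
Gather: many to one
[Figure (b): one process holds A0 A1 A2 A3 A4 A5. Scatter hands one item to each of the six processes; gather is the reverse and collects the six items back on one process.]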
Allgather
[Figure: allgather. Before, the six processes hold A0, B0, C0, D0, E0 and F0 respectively; after the allgather, every process holds A0 B0 C0 D0 E0 F0.]
Alltoall
[Figure (d): alltoall. Before, the first process holds A0 to A5, the second holds B0 to B5, and so on through F0 to F5. After the alltoall, the first process holds A0 B0 C0 D0 E0 F0, the second holds A1 B1 C1 D1 E1 F1, and so on.]
Network basics
◦ Protocols: IP, TCP, DNS
◦ Socket
◦ Programming structures
TCP/IP, DNS
IP: The Internet Protocol (IP) is the principal communications protocol
of the Internet Protocol Suite. It relays datagrams (also known as network
packets) across an internetwork and is responsible for routing packets
across network boundaries. (Routing means finding a specified destination
on the Internet.)
TCP: TCP provides reliable, ordered delivery of a stream of octets from
a program on one computer to another program on another computer.
DNS: You cannot easily remember something like 74.125.128.100 (an IP
address), but you can easily remember www.google.com. DNS is
the system that translates the name of a machine to its IP address.
What makes two processes talk?
The addresses of the two machines and the identification of the two
processes, one on each machine.
Source IP, Destination IP: the addresses of the two machines
Source Port, Destination Port: the identification of the two processes on
the two machines
So, a connection is identified by the following four parameters:
◦ source IP
◦ destination IP
◦ source Port
◦ destination Port
Socket
A socket is the fundamental programming abstraction for communication
between two processes on two different machines. (It is OK to use
sockets for communication within the same machine; however, this is not
the typical way to do inter-process communication between two
processes on the same machine.)
Client and server use two different types of sockets:
◦ The client creates a client-side socket and makes a connect call to the
server socket. Once connected, the client can start sending data to the
server.
◦ The server creates a server-side socket and listens on it, waiting
for clients. When a connection request packet is received, the server
makes a new socket to accept the connection and starts communicating.
The original socket keeps waiting for other connections.
Ports
As mentioned before, ports are used to identify a specific process
within a machine (which is identified by its IP address).
Using different source ports allows multiple clients to connect to a
server at once.
Example: Web Server (1/3)
1) Server creates a socket attached to port 80
The server creates a listener socket attached to a specific port.
80 is the agreed-upon port number for web traffic.
Example: Web Server (2/3)
2) Client creates a socket and connects to host
[Figure: an anonymous client socket connects to 66.102.7.99 : 80]
The client-side socket has to use a source port, but the OS
chooses a random unused port number.
When the client requests a URL (e.g., “www.google.com”), its
OS uses the DNS system to find the IP address.
Example: Web Server (3/3)
3) Server accepts connection, gets new socket for client
4) Data flows across connected socket as a “stream”, just like a file
The listener is ready for more incoming connections, while the current
connection can be processed in parallel.
Example: Web Server
The network packet
Data is transferred over the Internet in data packets.
Packets wrap various pieces of information that serve different purposes.
For example, addresses are used for routing; sequence numbers and sizes
are used for stream control.
Your data can be considered the payload of the packet, like
a letter inside an envelope.
[IP header | TCP header | Your data]
You should know that there are also lower-level protocols for
interoperating with physical devices, such as the MAC layer for Ethernet or
802.11 wireless.
IP: the Internet Protocol
IP mainly focuses on how to find a machine on the Internet. Thus, IP
defines the addressing scheme of machines.
An IP packet encapsulates the upper-layer protocol information as well as
the data provided by the applications.
IP does not provide reliability. An IP packet just includes
enough information to tell the routers the destination of
the data carried in the packet.
TCP: Transmission Control Protocol
TCP is built on top of IP.
TCP provides a virtual line between two ends. The data is stream
oriented instead of message (packet) oriented.
TCP provides reliability and ordering of messages.
TCP is a very important basic building block for upper-layer protocols. For
example, HTTP is built on top of TCP.
You and the web you want to access
Not actually tube-like “underneath the hood”
Unlike the phone system (circuit switched), the packet switched Internet
uses many routes at once
[Figure: many routes between you and www.google.com]
It is difficult to handle network problems
If you cannot receive a message from a specific machine, it is quite
difficult, even impossible, to tell whether the node crashed or the
network failed.
If you send some data to a machine and one party to the socket
disconnects, how can you tell how much data the other side received?
Security problems: during the data transfer, can someone in the
middle intercept or modify our data?
Performance problems: traffic congestion makes switch/router
topology important for efficient throughput.
Programming structures for processing network information
fork() based server data processing
multiple threads based
select() based
poll() based
See the bible on the subject, UNIX Network Programming.
Before you do the data transfer
CLIENT
fd=socket()
setsockopt(fd)
r=connect(fd,destination)
read(fd)/write(fd)
send(fd)/recv(fd)
sendto/recvfrom(fd)
close(fd)

SERVER
listenfd=socket();
setsockopt(listenfd)
bind(listenfd)
listen(listenfd)
acceptedfd=accept(listenfd)
do_various_work_with_acceptedfd(); //see following slides
fork based server data processing
acceptedfd=accept(listenfd);
pid=fork();
assert(pid>=0);
if(pid==0) { //child process
    close(listenfd);
    do_some_thing_with_acceptedfd;
} else { //parent process
    close(acceptedfd);
    go_back_to_accept();
}
threads code
acceptedfd=accept(listenfd);
thread=get_free_thread_from_pool();
set_thread_data(thread,acceptedfd);
activate_thread(thread)
go_back_to_accept()
//in the thread
do_something(acceptedfd);
close(acceptedfd);
select code for multiple sockets
Why? You want to use a single thread to process multiple sockets.
Staying in the same thread lets you keep the information more
conveniently (you do not need any synchronization among threads).
FD_ZERO, FD_SET, fd_set (readfds, writefds), maxfd // set the fds you
want to monitor
switch (select(maxfd + 1, &readfds, &writefds, NULL, timeout)) {
case -1: something is wrong; break;
case 0: the timeout expired with no fd ready (rarely happens); do the select again; break;
default: for each fd you want to monitor, if FD_ISSET(fd, &readfds), do data
transfer with the fd.
}
poll code for multiple sockets
Another group proposed poll(), which offers a similar but
different programming interface.
The programming structure can be the same as with select.
int poll(struct pollfd *ufds, unsigned int nfds, int timeout);
POLLIN, POLLOUT, POLLPRI
If you are using the poll or select version, how can you notify the
working thread?
Use a FIFO (pipe) and put the read end of the fd pair in the poll or select list.
Otherwise, you can use eventfd(), which needs only one fd instead of two.
RPC
◦ What is remote procedure call?
◦ Why RPC?
◦ The types of RPC
◦ How can we implement RPC (RPC internals)
RPC
A remote procedure call (RPC) is an inter-process communication that
allows a computer program to cause a subroutine or procedure to
execute in another address space (commonly on another computer on
a shared network) without the programmer explicitly coding the
details for this remote interaction.
That is, the programmer writes essentially the same code whether the
subroutine is local to the executing program, or remote.
In object-oriented terms, RPC is called remote
invocation or remote method invocation.
Request/Response over Internet
Regular client-server protocols involve sending data back and forth
according to a shared state.
Client: HTTP/1.0 index.html GET
Server: 200 OK
        Length: 2400
        (file data)
Client: HTTP/1.0 hello.gif GET
Server: 200 OK
        Length: 81494
        …
This is the straightforward way to use the network facilities.
Call a function in another process on another machine
An RPC server calls arbitrary functions in its address space with
arguments passed over the network, and sends the return values back over
the network.
Client: foo.dll,bar(4, 10, “hello”)
Server: “returned_string”
Client: foo.dll,baz(42)
Server: err: no such function
…
Possible modes of RPC
Synchronous RPC: The client calls an RPC function and then waits until the
return value is sent back from the server.
Asynchronous RPC: The client calls an RPC function and then continues
with some other work. After a while, the client can check a handle to
find out whether the return value of the RPC call is ready.
Callback-supported RPC: The client calls an RPC function and then
continues with some other work. When the execution of the function has
finished on the server, the server notifies the client, which calls a
registered callback function.
Similar concepts exist in many areas of computer science, including
networking (different types of sockets) and operating systems (think about
the system calls provided).
Synchronous RPC
client:
s = RPC(server_name, “foo.dll”, get_hello, arg, arg, arg…)
print(s);
...

server (RPC dispatcher):
foo.dll:
String get_hello(a, b, c)
{
…
return “some hello str!”;
}
Asynchronous RPC
client:
h = Spawn(server_name, “foo.dll”, long_runner, x, y…)
...
(More code keeps running…)
GiantObject myObj = Sync(h);

server (RPC dispatcher):
foo.dll:
String long_runner(x, y)
{
…
return new GiantObject();
}
Callbacks
client:
h = Spawn(server_name, “foo.dll”, callback, long_runner, x, y…)
...
(More code runs…)
Thread spawns:
void callback(o)
{
Uses Result
}

server (RPC dispatcher):
foo.dll:
String long_runner(x, y)
{
…
return new Result();
}
So, how can we implement RPC?
From the client side:
1 wrap the arguments
2 wrap the function id
3 wrap the server:port
So, translate the function call bar(arg0, arg1) into some underlying
mechanism like rpc_call(foo.dll, bar, arg0, arg1).
Programmers want bar(arg0, arg1), but the RPC designer has to
implement rpc_call.
Design Considerations
Protocol choices: UDP? TCP?
Fault tolerance: what if the network is broken? The call might be sent 0, 1,
2, …… times. A client sends just one call, but the server might receive
multiple invocations. What should the server do? (at-most-once semantics
vs. multiple-invocation semantics.)
Security: can anyone call RPC functions? Calls arrive through the network,
so a malicious user might send a lot of invocations.
Compatibility: how do you handle multiple versions of a function?
Error conditions: the function call itself might return an error. The RPC
framework might raise errors. So, how to handle the various error conditions?
Object-oriented support: we need to marshal/unmarshal objects.
There are a lot of RPC protocols: DCOM, CORBA, Java RMI……
Go to Lab1
Continue the code guide for lab1 and help students understand
various aspects of the source code in yfs.
RPC
FUSE
Thank you! Any Questions?