Transcript dist-prog2

Client/Server Distributed Systems
240-322, Semester 1, 2005-2006
2. Distributed Programming
Concepts
 Objectives
– explain the general meaning of distributed
programming beyond client/server
– look at the history of distributed
programming
240-322 Cli/Serv.: Dist. Prog./2
1
Overview
1. Definition
2. From Parallel to Distributed
3. Forms of Communication
4. Data Distribution
5. Algorithmic Distribution
6. Granularity
7. Load Balancing
8. Brief History of Distributed
Programming
240-322 Cli/Serv.: Dist. Prog./2
2
1. Definition
 Distributed
programming is the spreading
of a computational task across several
programs, processes, or processors.
 Includes
parallel (concurrent) and
networked programming.
 Definition
240-322 Cli/Serv.: Dist. Prog./2
is a bit vague.
3
2. From Parallel to Distributed

Most parallel languages talk about processes:
– these can be on different processors or on different
computers

The implementor may choose to add language
features to explicitly say where a process should run.

May also choose to address network issues
(bandwidth, failure, etc.) at the language level.
240-322 Cli/Serv.: Dist. Prog./2
continued
4
 Often
resources required by programs
are distributed, which means that the
programs must be distributed.
240-322 Cli/Serv.: Dist. Prog./2
continued
5
240-322 Cli/Serv.: Dist. Prog./2
continued
6
Network Transparency
 Most
users want networks to be as
transparent (invisible) as possible:
– users do not want to care which machine is
used to store their files
– they do not want to know where a process is
running
240-322 Cli/Serv.: Dist. Prog./2
7
3. Forms of Communication
 1-to-1
communication
processes
 1-to-many
240-322 Cli/Serv.: Dist. Prog./2
communication
These can be
supported on
top of shared
memory or
distributed
memory
platforms.
continued
8
 many-to-1
communication
 many-to-many
240-322 Cli/Serv.: Dist. Prog./2
communication
9
4. Data Distribution
 Divide
input data between identical separate
processes.
 Examples:
– database search
– edge detection in an image
– builders making a room with bricks
240-322 Cli/Serv.: Dist. Prog./2
10
Boss-Workers
send part of database
send answer
workers
(all database
search engines)
boss
240-322 Cli/Serv.: Dist. Prog./2
11
 workers
often need to talk to one another
workers
(all builders)
send bricks
done
boss
240-322 Cli/Serv.: Dist. Prog./2
talking
12
Boss - Eager Workers
workers
(all builders)
ask for bricks
send bricks
boss
240-322 Cli/Serv.: Dist. Prog./2
talking
13
Things to Note

The code is duplicated in every process.

The maximum no. of processes depends on the
size of the task and difficulty of dividing data.

Talking can be very hard to code.

Talking is usually called communication,
synchronisation or cooperation
240-322 Cli/Serv.: Dist. Prog./2
continued
14
 Communication
is almost always
implemented using message passing.
 How
are processes assigned to processors?
240-322 Cli/Serv.: Dist. Prog./2
15
5. Algorithmic Distribution
 Divide
algorithm into parallel parts /
processes
– e.g. UNIX pipes
dirty plates
on table
collector
plates in
cupboard
240-322 Cli/Serv.: Dist. Prog./2
dirty
plates
washer
Stacker
wipe dry
plates
clean wet
plates
Drier
16
Things to Note
 Talking
is simple: pass data to next
process which ‘wakes up’ that process.
 Talking
becomes harder to code if there
are loops.
 How
240-322 Cli/Serv.: Dist. Prog./2
to assign processes to processors?
17
Several Workers per Sub-task
 Use
both algorithmic and data distribution.
 Problems: how to divide data?
how to combine data?
collector
Drier
washer
collector
Drier
Stacker
240-322 Cli/Serv.: Dist. Prog./2
Drier
18
Parallelise Separate Sub-tasks
Build a house:
plumbing
brick
laying
paint
electrical
wiring
b | (pl & e) | pt
240-322 Cli/Serv.: Dist. Prog./2
19
6. Granularity
Amount of data handled by a process:

Course grained: lots of data per process
– e,g, UNIX processes

Fine grained: small amounts of data per
process
– e.g. UNIX threads, Java threads
240-322 Cli/Serv.: Dist. Prog./2
20
7. Load Balancing
 How
to assign processes to processors?
 Want
to ‘even out’ work so that each processor
does about the same amount of work.
 But:
– different processors have different capabilities
– must consider cost of moving a process to a processor
(e.g. network speed, load)
240-322 Cli/Serv.: Dist. Prog./2
21
8. Brief History of (UNIX)
Distributed Programming

1970’s: UNIX was a multi-user, time-sharing OS
– &, pipes
– interprocess communication (IPC) on a single processor

mid 1980’s: System V UNIX
– added extra IPC mechanisms: shared memory,
messages, queues, etc.
240-322 Cli/Serv.: Dist. Prog./2
continued
22
 late
1970's to mid 1980’s: ARPA
– US Advanced Research Projects Agency
– funded research that produced TCP/IP, sockets
– added to BSD Unix 4.2
 mid-late
1980’s: utilities developed
– telnet, ftp
– r* utilities: rlogin, rcp, rsh
– client-server model based on sockets
240-322 Cli/Serv.: Dist. Prog./2
continued
23
 1986:
System V UNIX
– released TL1, a set of socket-based libraries
that support OSI
– not widely used
 late
1980’s: Sun Microsystems
– NFS (Network File System)
– RPC (Remote Procedure Call)
– NIS (Network Information Services)
240-322 Cli/Serv.: Dist. Prog./2
continued
24
 early
1990’s
– POSIX threads (light-weight processes)
– Web client-server model based on TCP/IP
 mid
1990's: Java
– Java threads
– Java Remote Method Invocation (RMI)
– CORBA
240-322 Cli/Serv.: Dist. Prog./2
continued
25
 late
1990's / early 2000's
– J2EE, .NET
– peer-to-peer (P2P)
 Napster,
Gnutella, etc.
– JXTA
240-322 Cli/Serv.: Dist. Prog./2
26