Transcript dist-prog2
Client/Server Distributed Systems
240-322, Semester 1, 2005-2006
2. Distributed Programming Concepts
Objectives
– explain the general meaning of distributed programming beyond client/server
– look at the history of distributed programming
240-322 Cli/Serv.: Dist. Prog./2
Overview
1. Definition
2. From Parallel to Distributed
3. Forms of Communication
4. Data Distribution
5. Algorithmic Distribution
6. Granularity
7. Load Balancing
8. Brief History of Distributed Programming
1. Definition
Distributed programming is the spreading of a computational task across several programs, processes, or processors.
Includes parallel (concurrent) and networked programming.
This definition is a bit vague.
2. From Parallel to Distributed
Most parallel languages talk about processes:
– these can be on different processors or on different computers
The implementor may choose to add language features to explicitly say where a process should run.
They may also choose to address network issues (bandwidth, failure, etc.) at the language level.
Often the resources required by programs are distributed, which means that the programs must be distributed.
Network Transparency
Most users want networks to be as transparent (invisible) as possible:
– users do not want to care which machine is used to store their files
– they do not want to know where a process is running
3. Forms of Communication
Four patterns of communication between processes:
– 1-to-1 communication
– 1-to-many communication
– many-to-1 communication
– many-to-many communication
[diagrams: groups of processes linked by arrows for each pattern]
These can be supported on top of shared memory or distributed memory platforms.
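As a concrete sketch of one of these patterns, the snippet below shows many-to-1 communication using Python threads and a shared queue; the thread count and message text are illustrative, and threads stand in for the processes on the slide.

```python
import threading
import queue

# Many-to-1 communication: several producer threads send
# messages to a single consumer over one shared queue.
inbox = queue.Queue()

def producer(pid):
    # each producer sends one message to the shared inbox
    inbox.put(f"hello from process {pid}")

workers = [threading.Thread(target=producer, args=(i,)) for i in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()

# the single consumer (main thread) receives all the messages
messages = [inbox.get() for _ in range(3)]
print(sorted(messages))
```

The same queue object could support many-to-many communication simply by adding more consumer threads.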
4. Data Distribution
Divide input data between identical separate processes.
Examples:
– database search
– edge detection in an image
– builders making a room with bricks
Boss-Workers
[diagram: the boss sends part of the database to each worker (all database search engines); each worker sends back its answer]
Workers often need to talk to one another.
[diagram: the boss sends bricks to the workers (all builders), who talk among themselves and report done]
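The boss-workers pattern above can be sketched in Python; threads stand in for the worker processes, the word list and search condition are invented for illustration, and a shared queue carries the answers back to the boss.

```python
import threading
import queue

# Boss-workers data distribution: the boss splits the 'database'
# into parts and starts one identical search worker per part.
database = ["ant", "bee", "cat", "dog", "eel", "fox"]
target_letter = "o"
answers = queue.Queue()

def worker(part):
    # every worker runs the same search code on its own share of the data
    answers.put([w for w in part if target_letter in w])

parts = [database[0:2], database[2:4], database[4:6]]
workers = [threading.Thread(target=worker, args=(p,)) for p in parts]
for w in workers:
    w.start()
for w in workers:
    w.join()

# the boss combines the partial answers
found = sorted(sum((answers.get() for _ in workers), []))
print(found)  # ['dog', 'fox']
```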
Boss - Eager Workers
[diagram: the workers (all builders) ask the boss for bricks; the boss sends bricks; the workers talk among themselves]
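A minimal sketch of the eager-worker variant: instead of the boss pushing fixed shares, each worker pulls the next brick from a shared queue whenever it is free. The brick count and `builder` function are illustrative, and threads again stand in for processes.

```python
import threading
import queue

# The boss puts all the bricks into one shared queue up front.
bricks = queue.Queue()
for i in range(10):
    bricks.put(i)

laid = []
lock = threading.Lock()

def builder():
    while True:
        try:
            brick = bricks.get_nowait()  # ask the boss for the next brick
        except queue.Empty:
            return                       # no bricks left: this worker is done
        with lock:
            laid.append(brick)           # 'lay' the brick

workers = [threading.Thread(target=builder) for _ in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(laid))  # 10
```

Because faster workers simply come back for bricks sooner, this pull model balances the load without the boss having to plan the division in advance.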
Things to Note
The code is duplicated in every process.
The maximum no. of processes depends on the size of the task and the difficulty of dividing the data.
Talking can be very hard to code.
Talking is usually called communication, synchronisation or cooperation.
Communication is almost always implemented using message passing.
How are processes assigned to processors?
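A small sketch of message passing, assuming a connected socket pair between two threads (standing in for two processes): no variables are shared; all cooperation happens through explicit send and receive calls. The message text is invented for illustration.

```python
import socket
import threading

def worker(sock):
    # Receive a request and send back a reply: pure message
    # passing, with no shared state between the two sides.
    request = sock.recv(1024).decode()
    sock.sendall(("processed " + request).encode())
    sock.close()

parent, child = socket.socketpair()
t = threading.Thread(target=worker, args=(child,))
t.start()

parent.sendall(b"task-1")            # send a message to the worker
reply = parent.recv(1024).decode()   # block until the reply arrives
t.join()
parent.close()
print(reply)  # processed task-1
```

Between machines the same structure would use a TCP socket instead of a local socket pair.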
5. Algorithmic Distribution
Divide the algorithm into parallel parts / processes
– e.g. UNIX pipes
[diagram: a plate-washing pipeline; the collector takes dirty plates from the table, the washer passes clean wet plates to the Drier, the Drier passes wiped-dry plates to the Stacker, and the Stacker puts the plates in the cupboard]
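The plate pipeline can be sketched with threads connected by queues, in the spirit of a UNIX pipe: each stage sleeps on its input queue and 'wakes up' when the previous stage passes it a plate. The stage names and sentinel convention are illustrative.

```python
import threading
import queue

SENTINEL = None  # marks the end of the stream of plates

def stage(name, inbox, outbox):
    # A pipeline stage: block on the input queue, transform the
    # plate, and pass it on to the next stage.
    while True:
        plate = inbox.get()
        if plate is SENTINEL:
            outbox.put(SENTINEL)  # tell the next stage we are done
            return
        outbox.put(plate + " -> " + name)

table, washed, dried = queue.Queue(), queue.Queue(), queue.Queue()
stages = [
    threading.Thread(target=stage, args=("washed", table, washed)),
    threading.Thread(target=stage, args=("dried", washed, dried)),
]
for t in stages:
    t.start()

for plate in ["plate-1", "plate-2"]:
    table.put(plate)  # the collector feeds dirty plates in
table.put(SENTINEL)

cupboard = []  # the stacker runs in the main thread
while True:
    plate = dried.get()
    if plate is SENTINEL:
        break
    cupboard.append(plate)
for t in stages:
    t.join()
print(cupboard)  # ['plate-1 -> washed -> dried', 'plate-2 -> washed -> dried']
```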
Things to Note
Talking is simple: pass data to the next process, which 'wakes up' that process.
Talking becomes harder to code if there are loops.
How to assign processes to processors?
Several Workers per Sub-task
Use both algorithmic and data distribution.
Problems: how to divide data? how to combine data?
[diagram: the plate pipeline with several Drier workers in parallel; one collector divides the plates among the Driers and another collector combines their output for the Stacker]
Parallelise Separate Sub-tasks
Build a house:
[diagram: brick laying first, then plumbing and electrical wiring in parallel, then painting]
b | (pl & e) | pt
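The ordering expression b | (pl & e) | pt can be sketched with threads and joins: brick laying runs first, then plumbing and electrical wiring run concurrently, and painting starts only after both have finished. The task names and `task` helper are illustrative.

```python
import threading

log = []
lock = threading.Lock()

def task(name):
    # record which sub-task ran (the lock keeps the list consistent
    # when the parallel tasks append concurrently)
    with lock:
        log.append(name)

# b: brick laying must come first
task("brick laying")

# (pl & e): plumbing and electrical wiring run in parallel
parallel = [threading.Thread(target=task, args=(n,))
            for n in ("plumbing", "electrical wiring")]
for t in parallel:
    t.start()
for t in parallel:
    t.join()   # wait for both before painting

# pt: painting only after the walls are wired and plumbed
task("paint")
print(log)
```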
6. Granularity
Amount of data handled by a process:
Coarse-grained: lots of data per process
– e.g. UNIX processes
Fine-grained: small amounts of data per process
– e.g. UNIX threads, Java threads
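A small sketch of the same computation at two granularities, assuming a thread pool: the coarse-grained version gives each worker one large chunk, the fine-grained version gives each worker a small one. The chunk counts and data are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(100))

def total(chunk):
    # the per-process work: sum one chunk of the data
    return sum(chunk)

def distribute(data, n_chunks):
    # split the data into n_chunks pieces and sum each in a worker
    size = (len(data) + n_chunks - 1) // n_chunks
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(total, chunks))

coarse = distribute(data, 2)   # coarse-grained: 2 big chunks
fine = distribute(data, 25)    # fine-grained: 25 small chunks
print(coarse, fine)  # 4950 4950
```

The result is the same either way; the trade-off is between per-process overhead (worse for fine grain) and flexibility in balancing the load (better for fine grain).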
7. Load Balancing
How to assign processes to processors?
Want to 'even out' work so that each processor does about the same amount of work.
But:
– different processors have different capabilities
– must consider cost of moving a process to a processor (e.g. network speed, load)
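One simple load-balancing policy that accounts for different processor capabilities is a greedy sketch like the one below: assign each task to the processor that would finish it earliest, given a relative speed per processor. The processor names, speeds, and task sizes are invented for illustration, and moving costs are ignored.

```python
# Greedy load balancing: send each task to the processor that
# would finish it earliest, given per-processor speeds.
speeds = {"fast": 2.0, "slow": 1.0}    # relative capability
load = {name: 0.0 for name in speeds}  # accumulated time per processor

tasks = [4, 4, 2, 2, 2, 2]  # units of work per task

assignment = []
for work in tasks:
    # completion time on each processor if this task went there
    best = min(speeds, key=lambda p: load[p] + work / speeds[p])
    load[best] += work / speeds[best]
    assignment.append((work, best))

print(load)  # {'fast': 6.0, 'slow': 4.0}
```

The fast processor ends up with more work, but both finish at roughly the same time, which is the point of balancing.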
8. Brief History of (UNIX) Distributed Programming
1970's: UNIX was a multi-user, time-sharing OS
– &, pipes
– interprocess communication (IPC) on a single processor
mid 1980's: System V UNIX
– added extra IPC mechanisms: shared memory, message queues, semaphores, etc.
late 1970's to mid 1980's: ARPA
– US Advanced Research Projects Agency
– funded research that produced TCP/IP, sockets
– added to BSD Unix 4.2
mid-late 1980's: utilities developed
– telnet, ftp
– r* utilities: rlogin, rcp, rsh
– client-server model based on sockets
1986: System V UNIX
– released TLI (Transport Layer Interface), a socket-like library that supports OSI protocols
– not widely used
late 1980's: Sun Microsystems
– NFS (Network File System)
– RPC (Remote Procedure Call)
– NIS (Network Information Service)
early 1990's
– POSIX threads (light-weight processes)
– Web client-server model based on TCP/IP
mid 1990's: Java
– Java threads
– Java Remote Method Invocation (RMI)
– CORBA
late 1990's / early 2000's
– J2EE, .NET
– peer-to-peer (P2P): Napster, Gnutella, etc.
– JXTA