High Performance Computing - Center for Computation & Technology
Prof. Thomas Sterling
Center for Computation and Technology &
Department of Computer Science
Louisiana State University
February 3, 2011
HIGH PERFORMANCE COMPUTING: MODELS, METHODS, & MEANS
COMMUNICATING SEQUENTIAL PROCESSES
CSC 7600 Lecture 6 : CSP
Spring 2011
Topics
• Introduction
• Towards a Scalable Execution Model
• Communicating Sequential Processes
• CSP – Heat Distribution Example
• Performance Issues
• Distributed Programming with Unix
• Summary – Material for the Test
CSC 7600 Lecture 6 : CSP
Spring 2011
2
Topics
• Introduction
• Towards a Scalable Execution Model
• Communicating Sequential Processes
• CSP – Heat Distribution Example
• Performance Issues
• Distributed Programming with Unix
• Summary – Material for the Test
CSC 7600 Lecture 6 : CSP
Spring 2011
3
Opening Remarks
• This week is about scalable application execution
– Shared memory systems not scalable
– Job stream parallelism does not accelerate single application
• A path to harnessing distributed memory computers
– Dominant form of HPC systems
– Commodity clusters & DM MPPs
• Discuss the 2nd paradigm for parallel programming:
cooperative computing
– Throughput computing (Segment 1, capacity computing)
– Multithreaded shared memory (Segment 3, capability computing)
• Dominant strategy
– Arena of technical computing
• Embodiment of Cooperative Computing
– Single application
– Weak scaling
CSC 7600 Lecture 6 : CSP
Spring 2011
4
Topics
•
•
•
•
•
•
•
Introduction
Towards a Scalable Execution Model
Communicating Sequential Processes
CSP – Heat Distribution Example
Performance Issues
Distributed Programming with Unix
Summary – Material for the Test
CSC 7600 Lecture 6 : CSP
Spring 2011
5
Driving forces
• Technology
– VLSI
• “Killer micros”
• High density DRAM
– Emerging network
• Architecture
– DM MPP
– Beowulf systems (commodity clusters)
• Weak Scaling
– Need for larger problems
– Data parallelism
CSC 7600 Lecture 6 : CSP
Spring 2011
6
Scalability
• Strong scaling limits sustained performance
– Fixed size problem to achieve reduced execution time with increased
computing resources
– Amdahl’s law
• Sequential component limits speedup
– Overhead imposes limits to granularity
• and therefore limits parallelism and speedup
• Weak scaling allows computation size to grow with data set size
– Larger data sets create more concurrent processes
– Concurrent processes have approximately the same granularity
– Performance increases with problem set size
• Big systems are big memories for big applications
– Aggregates memories of many processing nodes
– Allows problems far larger than a single processor could manage
CSC 7600 Lecture 6 : CSP
Spring 2011
7
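The slide invokes Amdahl's law; for reference, its standard statement (not printed on the slide) is

$$ S(n) = \frac{1}{(1 - f) + \frac{f}{n}} $$

where f is the fraction of the work that can be parallelized and n is the number of processors. Even as n grows without bound, speedup is capped at 1/(1 - f), which is why the sequential component limits speedup under strong scaling.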
Strong Scaling, Weak Scaling
[Figure: two plots against Machine Scale (# of nodes) – under strong scaling the Total Problem Size is fixed while Granularity (size / node) shrinks; under weak scaling the Total Problem Size grows with the machine]
CSC 7600 Lecture 6 : CSP
Spring 2011
8
Strong Scaling, Weak Scaling
• Capacity
– Primary scaling is increase in throughput proportional to increase in resources applied
– Decoupled concurrent tasks, increasing in number of instances – scaling proportional to machine
• Cooperative
– Single job (different nodes working on different partitions of the same job)
– Job size scales proportional to machine
– Granularity per node is fixed
• Capability
– Primary scaling is decrease in response time proportional to increase in resources applied
– Single job, constant size – scaling proportional to machine size
[Table: Cooperative and Capability both run a Single Job; Cooperative exhibits Problem Size Scaling (Weak Scaling), Capability exhibits Strong Scaling]
CSC 7600 Lecture 6 : CSP
Spring 2011
9
Strong Scaling Vs. Weak Scaling
[Figure: Work per task vs. Machine Scale (# of nodes: 1, 2, 4, 8) – work per task remains constant under weak scaling and decreases under strong scaling]
CSC 7600 Lecture 6 : CSP
Spring 2011
10
Impact of VLSI
• Mass-produced microprocessors enabled low-cost computing
– PCs and workstations
• Economy of scale
– Ensembles of multiple processors
• Microprocessor becomes building block of parallel computers
• Favors sequential process oriented computing
– Natural hardware supported execution model
– Requires locality management
• Data
• Control
– I/O channels (south bridge) provide the external interface
• Coarse grained communication packets
• Suggests concurrent execution at the process boundary level
– Processes statically assigned to processors (one-to-one)
• Operate on local data
– Coordination by large value-oriented I/O messages
• Inter process/processor synchronization and remote data exchange
CSC 7600 Lecture 6 : CSP
Spring 2011
11
“Cooperative” computing
• Between Capacity and Capability computing
– Not a widely used term
– But an important distinction with respect to these others
• Synonymous with “Coordinated” computing
• Single application
– Partitioning of data into quasi-independent blocks
– Semi-independent processes operate on separate data blocks
– Limited communication of messages
• Coordinate through remote synchronization
• Cooperate through the exchange of some data
• Scaling
– Primarily weak scaling
– Limited strong scaling
• Programming
– Favors SPMD (Single Program stream Multiple Data stream) style
– Static scheduling mostly by hand
– Load balancing by hand
– Coarse grain
• Process
• Data
• Communication
CSC 7600 Lecture 6 : CSP
Spring 2011
12
Data Decomposition
• Partitioning the global data into major contiguous blocks
• Exploits spatial locality that assumes the use of a data
element heightens the likelihood of nearby data being
used as well (reducing latencies associated with cache
misses followed by accesses to main memory)
• Exploits temporal locality that assumes the use of a
data element heightens the likelihood that the same data
will be used again in the near future
• Varies in form
– Dimensionality
– Granularity (size)
– Shape of partitions
• Static mapping of partitions onto processor nodes
CSC 7600 Lecture 6 : CSP
Spring 2011
13
Distributed Concurrent Processes
• Each data block can be processed at the same time
– Parallelism is determined by number of processes
– More blocks with smaller partitions permit more processes
– But …
• Processes run on separate processors on local data
– Usually one application process per processor
– Usually SPMD i.e., processes are equivalent but separate
(same code, different environments)
• Execution of inner data elements of the partition block
is done independently for each of the processes
– Provides coarse grain parallelism
– Outer loop iterates over successive application steps over the
same local data
CSC 7600 Lecture 6 : CSP
Spring 2011
14
Data Exchange
• In shared memory there is no problem: all the data is directly accessible
• For distributed memory systems, data needs to be
exchanged between separate nodes and processes
• Ghost cells hold local copies of the edge data of
partitions owned by remote processors
• Communication packets are medium to coarse grain and
point to point for most data transfers
– e.g., all edge cells of one data partition may be sent to
corresponding ghost cells of the neighboring processor in a
single message
• Multi-cast or broadcast may be required for some
application algorithms and data partitions
– e.g., matrix-vector multiply
CSC 7600 Lecture 6 : CSP
Spring 2011
15
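As a concrete sketch of this ghost-cell pattern (MPI is only previewed here; the deck names it as the dominant API a few slides later), the exchange of edge rows into a neighbor's ghost rows might look like the following. The 1-D row-block layout, the array name u, and the neighbor ranks are illustrative assumptions, not code from the slides.

#include <mpi.h>

/* Hedged sketch: 1-D (row-block) halo exchange. The local block is
   stored as an (nrows + 2) x (ncols + 2) array with ghost cells around
   the edges; "up"/"down" are neighbor ranks, or MPI_PROC_NULL on the
   physical boundary. */
void exchange_ghosts(double *u, int nrows, int ncols,
                     int up, int down, MPI_Comm comm)
{
    int stride = ncols + 2;
    /* send first owned row up; receive neighbor's edge into bottom ghost row */
    MPI_Sendrecv(&u[1 * stride + 1], ncols, MPI_DOUBLE, up, 0,
                 &u[(nrows + 1) * stride + 1], ncols, MPI_DOUBLE, down, 0,
                 comm, MPI_STATUS_IGNORE);
    /* send last owned row down; receive neighbor's edge into top ghost row */
    MPI_Sendrecv(&u[nrows * stride + 1], ncols, MPI_DOUBLE, down, 1,
                 &u[0 * stride + 1], ncols, MPI_DOUBLE, up, 1,
                 comm, MPI_STATUS_IGNORE);
}

Each MPI_Sendrecv moves one whole edge in a single message, matching the slide's point that all edge cells of a partition can travel to the neighbor's ghost cells in one medium-grain transfer.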
Synchronize
• Global barriers
– Coarse grain (in time) control of outer-loop steps
– Usually used to coordinate transition from computation phase to
communication phase
• Send/receive
– Medium grain (in time) control of inner-loop data exchanges
– Blocks on a send and receive
– Computation at sender proceeds when data has been received
– Computation at receiver proceeds when incoming data is available
– Non-blocking versions of each exist but can lead to race conditions
CSC 7600 Lecture 6 : CSP
Spring 2011
16
Topics
• Introduction
• Towards a Scalable Execution Model
• Communicating Sequential Processes
• CSP – Heat Distribution Example
• Performance Issues
• Distributed Programming with Unix
• Summary – Material for the Test
CSC 7600 Lecture 6 : CSP
Spring 2011
17
Making Parallelism Fit
• Different kinds of parallelism work best on certain kinds of
architectures
• Need to satisfy two contending requirements:
– Spread work out among as many parallel elements as possible
– Minimize inefficiencies due to:
• Overhead
• Latency
CSC 7600 Lecture 6 : CSP
Spring 2011
18
Communicating Sequential Processes
• A model of parallel computing
– Developed in the 1970s
– Often attributed to Tony Hoare
– Satisfies criteria for cooperative computing
• Many would claim it as a means of capability computing
• Process oriented
• Emphasizes data locality
• Message passing semantics
• Synchronization using barriers among others
• Distributed reduction operators added for purposes of optimization
CSC 7600 Lecture 6 : CSP
Spring 2011
19
Communicating Sequential Processes Model
• Another form of parallelism
• Coarse grained parallelism
– Large pieces of sequential code
– They run at the same time
• Good for clusters and distributed memory MPPs
• Share data by message passing
– Often referred to as “message-passing model”
• Synchronize by “global barriers”
• Most widely used method for programming
• MPI is dominant API
• Supports “SPMD” strategy (Single Program Multiple Data)
CSC 7600 Lecture 6 : CSP
Spring 2011
20
CSP Processes
• Process is the body of state and work
• Process is the module of work distribution
• Processes are static
– In space: assigned to a single processor
– In time: exist for the lifetime of the job
• All data is either local to the process or acquired through
incident messages
• Possible to extend process beyond sequential to
encompass multithreaded processes
– Hybrid model integrates the two models together in a clumsy
programming methodology
CSC 7600 Lecture 6 : CSP
Spring 2011
21
Locality of state
• Processes operate on memory within the processor node
• Granularity of process iteration dependent on the amount
of process data stored on processor node
• New data from beyond local processor node acquired
through message passing, primarily by send/receive
semantic constructs
CSC 7600 Lecture 6 : CSP
Spring 2011
22
Other key functionalities
• Synchronization
– Barriers
– Messaging
• Reduction
– Mix of local and global
• Load balancing
– Static, user defined
CSC 7600 Lecture 6 : CSP
Spring 2011
23
Message Passing Model (BSP)
[Diagram: BSP superstep loop – Initialize → barrier → Local sequential process → barrier → Exchange Data → (repeat)]
CSC 7600 Lecture 6 : CSP
Spring 2011
24
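A minimal skeleton of this loop, sketched here with MPI calls (an assumption; the diagram names no API, and the placeholder work and step count are illustrative):

#include <mpi.h>

/* Hedged sketch of the BSP-style outer loop from the diagram. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    double local = 0.0;                  /* stand-in for the local partition */
    for (int step = 0; step < 10; step++) {
        MPI_Barrier(MPI_COMM_WORLD);     /* all processes enter compute phase */
        local += 1.0;                    /* local sequential process */
        MPI_Barrier(MPI_COMM_WORLD);     /* all processes enter exchange phase */
        /* data exchange (send/receive) would go here */
    }
    MPI_Finalize();
    return 0;
}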
Global Barrier Synchronization
• Different nodes finish a process at different
times
• Cannot exchange data until all processes have
completed
• Barriers synchronize all concurrent processes
running on separate “nodes”
• How it works
– Every process “tells” barrier when it is done
– When all processes are done, barrier “tells”
processes that they can continue
• “tells” is done by message passing over the
network
CSC 7600 Lecture 6 : CSP
Spring 2011
25
Barrier Synchronization
[Figure: processes reach the barrier at different times and all proceed together once the last one arrives]
CSC 7600 Lecture 6 : CSP
Spring 2011
26
Message: Send & Receive
• Nodes communicate with each other by packets through
the system area network
– Reminder: network comprises (hardware)
• NICs (Network Interface Controller)
• Links (metal wires or fiber optics)
• Switch (N x N)
– Operating systems and network drivers (software)
• Processes communicate with each other by application-level messages
– send
– receive
• Message content
– Process port
– Data
CSC 7600 Lecture 6 : CSP
Spring 2011
27
Send & Receive
[Diagram: Process A on Node 1 and Process B on Node 2 exchanging messages over the network via matched send/receive pairs]
CSC 7600 Lecture 6 : CSP
Spring 2011
28
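To make the matched send/receive pairing concrete, here is a hedged two-process sketch using MPI point-to-point calls (a preview; the value sent and the tag are arbitrary choices, not taken from the slides):

#include <mpi.h>
#include <stdio.h>

/* Hedged sketch: rank 0 sends one value, rank 1 receives it. A
   blocking MPI_Recv returns only once the data has arrived, matching
   the blocking semantics described on the Synchronize slide. */
int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        double edge = 42.0;
        MPI_Send(&edge, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        double edge;
        MPI_Recv(&edge, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("received %f\n", edge);
    }
    MPI_Finalize();
    return 0;
}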
Topics
• Introduction
• Towards a Scalable Execution Model
• Communicating Sequential Processes
• CSP – Heat Distribution Example
• Performance Issues
• Distributed Programming with Unix
• Summary – Material for the Test
CSC 7600 Lecture 6 : CSP
Spring 2011
29
An Example Problem
• Partial Differential Equation (PDE)
– Heat equation
– 2-dimensional discrete point distribution mesh to approximate a unit square
• The temperature field is approximated by a finite set of discrete points distributed over the computational domain, and the temperature values need to be calculated at these points
– Static boundary conditions (temperature on the boundaries is predefined and does not change over time)
• Stages of Code Development
– Data decomposition
– Concurrent sequential processes
– Coordination through synchronization
– Data exchange
CSC 7600 Lecture 6 : CSP
Spring 2011
30
An Example Problem
Heat equation:

$$ \frac{\partial u}{\partial t} = k \nabla^2 u $$

In 2-D:

$$ u_t = k \, (u_{xx} + u_{yy}) $$

Implementation:
• Jacobi method on a unit square
• Dirichlet boundary condition
• Equal number of intervals along x and y axis
[Figure: uniprocessor domain vs. CSP with 4 processes – boundary cells, ghost cells, the 5-point stencil, and boundary updates]
CSC 7600 Lecture 6 : CSP
Spring 2011
31
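The five-point update on the next slide follows from a standard finite-difference discretization, a step the deck does not spell out: approximating the 2-D Laplacian on a uniform grid with spacing h and setting it to zero at steady state gives

$$ \frac{u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} - 4\,u_{i,j}}{h^2} = 0
\quad\Longrightarrow\quad
u_{i,j}^{t+1} = \frac{u_{i+1,j}^{t} + u_{i-1,j}^{t} + u_{i,j+1}^{t} + u_{i,j-1}^{t}}{4} $$

which is exactly the averaging rule acted out in the interactive session below.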
Stencil Calculation

$$ x_C^{t+1} = \frac{x_N^t + x_E^t + x_S^t + x_W^t}{4.0} $$

[Figure: 5-point stencil – center cell xC with neighbors xN (north), xE (east), xS (south), xW (west)]
CSC 7600 Lecture 6 : CSP
Spring 2011
32
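A hedged C sketch of one Jacobi sweep over a local block; the (nrows + 2) x (ncols + 2) layout with ghost cells on all four sides, and all names, are illustrative assumptions rather than code from the deck:

/* One Jacobi sweep over the interior of a local block stored as an
   (nrows + 2) x (ncols + 2) array with ghost cells around the edges.
   Returns the largest per-cell change, which feeds the convergence
   test shown later in the deck. */
double jacobi_sweep(double *u, double *unew, int nrows, int ncols)
{
    int stride = ncols + 2;
    double maxdiff = 0.0;
    for (int i = 1; i <= nrows; i++) {
        for (int j = 1; j <= ncols; j++) {
            int c = i * stride + j;                   /* center xC */
            unew[c] = (u[c - stride] + u[c + stride]  /* xN + xS   */
                     + u[c - 1] + u[c + 1]) / 4.0;    /* + xW + xE */
            double d = unew[c] > u[c] ? unew[c] - u[c] : u[c] - unew[c];
            if (d > maxdiff) maxdiff = d;
        }
    }
    return maxdiff;
}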
Heat Distribution : Interactive Session
• We are going to act out a heat distribution problem.
• The take-home message we want to convey is :
– How cooperative computing works
– Notion of mesh/grid partitioning
– Use of messaging
– Explicit synchronization
CSC 7600 Lecture 6 : CSP
Spring 2011
33
CSC 7600 Lecture 6 : CSP
Spring 2011
34
[Figure (slide 35): the initial temperature grid – fixed boundary values (100, 80, 60, 40, 20, ...) along the edges and all interior cells initialized to 0]
CSC 7600 Lecture 6 : CSP
Spring 2011
35
Calculate the value of each cell by averaging
its 4 neighboring cells
For example, a cell whose four neighbors hold 60, 0, 0, and 0 becomes (60 + 0 + 0 + 0) / 4 = 15, while a cell surrounded by zeros stays at 0.
[Figure: the grid after one averaging pass – interior cells adjacent to the hot boundary pick up nonzero values]
CSC 7600 Lecture 6 : CSP
Spring 2011
36
Calculate the difference between the
previous cell values and new cell values
[Figure: previous and new grids side by side – cells that changed show differences such as 50, 20, and 15; unchanged cells show 0]
CSC 7600 Lecture 6 : CSP
Spring 2011
37
After computing the difference for each cell, determine
the maximum change across your problem chunk
[Figure: each process scans its chunk for the largest per-cell change – here 50 – and sends this value to the coordinator]
CSC 7600 Lecture 6 : CSP
Spring 2011
38
Coordinator waits for all processing elements to
send their values and determines the maximum
of all the values it receives:

max(50, 10, 10, 25) = 50

[Diagram: Proc 1 – Proc 4 send their local maxima (50, 10, 10, 25) to the Coordinator; if MAX < 7.0 then STOP, otherwise all processes continue with the next iteration]
CSC 7600 Lecture 6 : CSP
Spring 2011
39
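The coordinator pattern above is what a max-reduction provides; a hedged MPI sketch of the convergence test, reusing the jacobi_sweep() sketch from the stencil slide (the 7.0 threshold follows the slide; everything else is illustrative):

/* Hedged sketch: one outer iteration's convergence test via a
   max-reduction. Every process contributes its local maximum change
   and receives the global maximum, so all agree on when to STOP. */
int converged(double *u, double *unew, int nrows, int ncols)
{
    double local_max = jacobi_sweep(u, unew, nrows, ncols);
    double global_max;
    MPI_Allreduce(&local_max, &global_max, 1, MPI_DOUBLE, MPI_MAX,
                  MPI_COMM_WORLD);
    return global_max < 7.0;   /* threshold from the slide */
}

An MPI_Reduce to a single coordinator rank would mirror the slide's picture even more literally; MPI_Allreduce merely merges the reduction with the broadcast of the decision back to the workers.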
[Figure (slide 40): the grid after the first update – boundary values unchanged; interior values and per-cell changes (2.5, 5.0, 7.5, 10.0, 25.0, ...) shown for each partition]
CSC 7600 Lecture 6 : CSP
Spring 2011
40
ITERATIONS 1 – 20, 50, 100
[Figures (slides 41 – 62): successive snapshots of the temperature field as the iteration converges toward the steady-state heat distribution]
CSC 7600 Lecture 6 : CSP
Spring 2011
41 – 62
Topics
• Introduction
• Towards a Scalable Execution Model
• Communicating Sequential Processes
• CSP – Heat Distribution Example
• Performance Issues
• Distributed Programming with Unix
• Summary – Material for the Test
CSC 7600 Lecture 6 : CSP
Spring 2011
63
Performance issues for CSP
• Parallelism speeds things up
• Data exchange slows things down
• Finer grain partitioning
– provides more parallelism
• Can use more processors
– requires more fine grain messages
• Overhead becomes more significant per datum
– fewer operations per message
• Overhead of communication becomes more significant per
operation
• Synchronization is another source of overhead
• Computation and communication not overlapped
CSC 7600 Lecture 6 : CSP
Spring 2011
64
Performance Issues for CSP
• Communication (Gather / Scatter, Data exchange)
– Latency
• Network Distance, Message size
– Contention
• Network Bandwidth
– Overhead
• Network interfaces & protocols used
• Synchronization (blocking reads/writes, barriers)
– Overhead
• Load Balancing
– Non-uniform work tasks
– Starvation
– Overhead
CSC 7600 Lecture 6 : CSP
Spring 2011
65
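A common first-order model for the cost of one message, standard in the literature though not printed on the slide, ties these factors together:

$$ T_{msg}(n) = \alpha + \frac{n}{B} $$

where α is the per-message latency plus software overhead, n is the message size, and B is the network bandwidth. Finer-grain partitioning shrinks n per message while paying α more often, which is precisely the trade-off the previous slide describes.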
Topics
• Introduction
• Towards a Scalable Execution Model
• Communicating Sequential Processes
• CSP – Heat Distribution Example
• Performance Issues
• Distributed Programming with Unix
• Summary – Material for the Test
CSC 7600 Lecture 6 : CSP
Spring 2011
66
Parallelism : Operating System level
[Diagram: a program's instructions stored in memory, fetched and executed by the CPU]
• Program (review) : A program is a set of instructions
usually stored in the memory. During execution a
computer fetches the instruction stored in the memory
address indicated by the program counter and
executes the instruction.
• Process (review) : Can be defined as a combination of
program, memory address space associated with the
program and a program counter.
• A program associated with one process cannot access
the memory address space of another program.
• A multi-threaded process is one where a single memory
address space is associated with multiple program
counters.
• In this lecture we limit the discussion to single-threaded
processes for the sake of simplicity.
Adapted from Ch. 7 Beowulf and Cluster Computing . Gropp, Lusk, Sterling
CSC 7600 Lecture 6 : CSP
Spring 2011
67
Unix processes : Overview
• New processes can be created using the fork()/exec() combination.
• fork()
– A medium-weight mechanism that copies the
address space and creates a process with the
same program. The process that invoked the fork()
call is known as the parent process and the newly
created process is called the child process.
– For the child, the fork() call returns 0, whereas for
the parent it returns the child's process ID (PID).
• The child process then invokes the exec() system
call.
• exec()
– changes the program associated with the process
– sets the program counter to the beginning of the
program
– reinitializes the address space
Image cropped from : http://sc.tamu.edu/
Adapted from Ch. 7 Beowulf and Cluster Computing . Gropp, Lusk, Sterling
CSC 7600 Lecture 6 : CSP
Spring 2011
68
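A minimal, hedged C illustration of this sequence (the program being exec'ed, /bin/date, is just an example, echoing the rsh/ssh examples later in the deck):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hedged sketch: fork a child, replace its program with /bin/date,
   and wait for it, mirroring the fork()/exec() description above. */
int main(void)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        exit(1);
    } else if (pid == 0) {             /* child: fork() returned 0 */
        execl("/bin/date", "date", (char *)NULL);
        perror("execl");               /* reached only if exec fails */
        exit(1);
    } else {                           /* parent: pid is the child's PID */
        waitpid(pid, NULL, 0);
        printf("child %d finished\n", (int)pid);
    }
    return 0;
}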
Parallelism using Unix utilities
• Usually a shell process waits for the child process to finish execution
before prompting you to execute another command.
• By appending a new process invocation with the “&” character, the shell
starts the new process but then immediately prompts for another
command; this is called running a process in the “background”.
• This is the simplest form of the master-worker model executed using basic
Unix utilities.

#!/bin/bash
export search_string=$1
echo searching for $search_string
for i in 20*
do
( cd $i; grep $search_string * >> $search_string.out ) &
done
wait
cat 20*/$search_string.out > $1.all
Adapted from Ch. 7 Beowulf and Cluster Computing . Gropp, Lusk, Sterling
CSC 7600 Lecture 6 : CSP
Spring 2011
69
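Assuming the script above is saved as (say) search.sh in a directory whose subdirectories are named 20*, running ./search.sh keyword launches one background grep per subdirectory, blocks at the wait until every worker has finished, and then gathers the per-directory results into keyword.all – a miniature fork/join pattern built entirely from shell primitives.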
Remote Processes
• To create a new process on another machine, the initiator must contact
an existing process and cause it to fork a new process. The contact is
usually made over a TCP socket.
• rsh (remote shell):
– The rsh command contacts the rshd process running on the remote machine and
prompts it to execute a script/program.
– The standard I/O for the remote machine is routed through rsh to the local
machine's standard I/O.
– Due to severe security problems (plain-text password handling), utilities like
rsh and telnet are strongly discouraged and deprecated on many systems.
– e.g.: rsh celeritas.cct.lsu.edu /bin/date
• ssh (secure shell):
– behaves much like rsh, but the authentication mechanism is based on public
key encryption, and all traffic between the local and remote machines is encrypted.
– since rsh does not have the encryption stage, rsh is substantially faster than
ssh.
– e.g.: ssh celeritas.cct.lsu.edu /bin/date
CSC 7600 Lecture 6 : CSP
Spring 2011
70
Sockets : Overview
• socket : a bidirectional communication channel between two
processes that is accessed by the processes using the same
read and write functions that are used for file I/O
• Connection Process :
– The initial connection process that two remote processes perform in order
to establish a bidirectional communication channel is asymmetric.
• The remote machine listens for a connection and accepts it
• One process initiates a request for connection with the remote machine.
• A bidirectional channel between the two machines is established.
– Once a channel is established the communication between the two
processes is symmetric.
– In a client-server model, the process that waits for a connection is known as
the server and the process that connects to it is known as the client.
CSC 7600 Lecture 6 : CSP
Spring 2011
71
TCP/IP : Overview
• Common Terms:
– IP: Internet Protocol, for communication between computers; responsible
for routing IP packets to their destination
– TCP: Transmission Control Protocol, for communication between applications
– UDP: User Datagram Protocol, for communication between applications
– ICMP: Internet Control Message Protocol, for detecting errors and network
statistics
• TCP:
– An application that wants to communicate with another application sends a
request for connection.
– The request is sent to a fully qualified address (more on this soon) and port.
– After the “handshake” (SYN, SYN-ACK, ACK) between the two, a bidirectional
communication channel is established between them.
– The communication channel remains alive until it is terminated by one of the
applications involved.
• TCP/IP:
– TCP breaks down the data to be communicated between applications into
packets and reassembles the data from the packets when they reach the destination.
– IP ensures routing of the data packets to their intended receiver.
CSC 7600 Lecture 6 : CSP
Spring 2011
72
TCP/IP : Overview
• Each computer on a network is associated with an IP address containing 4
numbers, each holding a value between 0 and 255, e.g.: 130.184.6.128
• Using Domain Name System (DNS) servers, the numeric IP address is
mapped to a domain name that is easier to remember; e.g., the domain
name corresponding to 130.184.6.128 is prospero.uark.edu
• Analogy: making a phone call
– Caller – client
– Receiver – server
– Phone number – IP address
– Extension – port number
Client:
• Picking up the receiver – socket()
• Locating the call recipient (from phone book / memory) – bind()
• Dialing the phone number – connect()
• Talking – read() / write()
• Hanging up – close()
Server:
• Connecting phone to the phone line – socket()
• Selecting an incoming line – bind()
• Ringer ON – listen()
• Receiving the call – accept()
• Talking – read() / write()
• Hanging up – close()
CSC 7600 Lecture 6 : CSP
Spring 2011
73
Server : Create Socket
/* Create data structures to store connection specific info */
struct sockaddr_in sin, from;
/* The main call that creates a socket */
listen_socket = socket(AF_INET, SOCK_STREAM, 0);
Server:
– Create a TCP socket
– Bind socket-port
– Listen for connections
– Loop:
• accept connection
• communicate
• close connection
Client:
– Create a TCP socket
– Connect to server
– Communicate
– Close connection
CSC 7600 Lecture 6 : CSP
Spring 2011
74
Server : Bind Socket-Port
/* Initializing data structures */
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = INADDR_ANY;
/* 0 - Allowing the system to do the selection of port to bind. This
is user-configurable, we use specific port (PORT_NUM) */
sin.sin_port = htons(PORT_NUM);
bind(listen_socket, (struct sockaddr *) &sin, sizeof(sin));
CSC 7600 Lecture 6 : CSP
Spring 2011
75
Server : Listen for Connections
listen(listen_socket, 5);
/* 5 refers to the number of pending connection requests that the kernel
should queue for the application */
getsockname(listen_socket, (struct sockaddr *) &sin, &len);
printf("listening on port = %d\n", ntohs(sin.sin_port));
CSC 7600 Lecture 6 : CSP
Spring 2011
76
Server : accept()
talk_socket = accept(listen_socket, (struct sockaddr *) &from, &len);
/* accept() is a blocking system call that waits until a connection
arrives from a client and then returns a new socket (talk_socket)
through which the server talks to that client, so that it can
continue listening for more connections on the original server
socket (listen_socket) */
CSC 7600 Lecture 6 : CSP
Spring 2011
77
Client : Create Socket
/* Create data structures to store connection specific info */
struct sockaddr_in sin;
struct hostent *hp;
/* The main call that creates a socket */
talk_socket = socket(AF_INET, SOCK_STREAM, 0);
CSC 7600 Lecture 6 : CSP
Spring 2011
78
Client : Bind and Connect to Server
/* initialize data structures */
hp = gethostbyname(HOST_NAME);
bzero((void *)&sin, sizeof(sin));
bcopy((void *) hp->h_addr, (void *) &sin.sin_addr, hp->h_length);
sin.sin_family = hp->h_addrtype;
sin.sin_port = htons(atoi(PORT_NUM));
/* connect to the server */
connect(talk_socket, (struct sockaddr *) &sin, sizeof(sin));
CSC 7600 Lecture 6 : CSP
Spring 2011
79
Client : send msg. write()
n = write(talk_socket, buf, strlen(buf)+1);
if (n < 0)
error("ERROR writing to socket");
bzero(buf,256);
/*Client initiates communication with server using a write() call*/
CSC 7600 Lecture 6 : CSP
Spring 2011
80
Server recv/send read()/write()
n = read (talk_socket, buf, 1024);
if (n < 0)
error("ERROR reading from socket");
else
write(talk_socket, buf, n);
/* simple echo; content stored in buf*/
CSC 7600 Lecture 6 : CSP
Spring 2011
81
Client : recv : read()
n = read(talk_socket, buf, 1024);
if (n < 0)
error("ERROR reading from socket");
else
printf("received from server: %s \n", buf);
/*receives messages sent by the server stored in buf*/
CSC 7600 Lecture 6 : CSP
Spring 2011
82
Close Socket
/* Server side */
close(talk_socket);
/* Ends the socket connection corresponding to one particular
client. Control goes back to the loop and the server
continues to wait for more client connections at listen_socket */

/* Client side */
close(talk_socket);
/* Ends the client socket connection */
CSC 7600 Lecture 6 : CSP
Spring 2011
83
Demo: Socket Example
CSC 7600 Lecture 6 : CSP
Spring 2011
84
Socket Programming: Problems
• Limited portability (not all interconnect interfaces support
sockets)
• Limited scalability (number of ports available on a node)
• Tedious and error-prone hand-coding (unless somebody did it
before)
• Tricky startup process (assumed port availability is not
guaranteed)
• Only point-to-point communication supported explicitly; no
implementation of collective communication patterns
• Frequently used communication topologies not available (e.g.,
Cartesian mesh), have to be coded from scratch
• Direct support only for data organized in contiguous buffers,
forcing the programmer to write their own buffer packing/unpacking routines
CSC 7600 Lecture 6 : CSP
Spring 2011
85
Socket Programming : Problems
• Suffer from the overhead of the protocol stack (TCP), or require
designing algorithms to manage reliable, in-order, duplicate-free
arrival of complete messages (e.g., with datagram-oriented protocols)
• Basic data transfer calls (read/write) do not guarantee returning or
sending the full requested number of bytes, requiring the use of
wrappers (and possibly resulting in multiple kernel calls per
message)
• Complicated writing of applications with changing/unpredictable
communications (it’s only easy when reads are matched to writes
and you know when both of them occur)
• On some OS’s sockets may linger long after application exits,
preventing new startups using the same configuration
• If used, asynchronous management of socket calls adds another
layer of complexity (either through select() or multiple threads)
CSC 7600 Lecture 6 : CSP
Spring 2011
86
Topics
• Introduction
• Towards a Scalable Execution Model
• Communicating Sequential Processes
• CSP – Heat Distribution Example
• Performance Issues
• Distributed Programming with Unix
• Summary – Material for the Test
CSC 7600 Lecture 6 : CSP
Spring 2011
87
Summary : Material for the Test
• Scalability (strong, weak scaling): 7 – 10
• Cooperative computing: 12 – 16
• Communicating Sequential Processes: 18 – 22
• Message Passing: 24 – 28
• Performance issues of CSP: 64 – 65
• Sockets, TCP/IP: 71, 72, 85, 86
CSC 7600 Lecture 6 : CSP
Spring 2011
88
CSC 7600 Lecture 6 : CSP
Spring 2011
89