High Performance Computing - Center for Computation & Technology


Prof. Thomas Sterling
Center for Computation and Technology &
Department of Computer Science
Louisiana State University
February 3, 2011
HIGH PERFORMANCE COMPUTING: MODELS, METHODS, & MEANS
COMMUNICATING SEQUENTIAL PROCESSES
CSC 7600 Lecture 6 : CSP
Spring 2011
Topics
• Introduction
• Towards a Scalable Execution Model
• Communicating Sequential Processes
• CSP – Heat Distribution Example
• Performance Issues
• Distributed Programming with Unix
• Summary – Material for the Test
Opening Remarks
• This week is about scalable application execution
– Shared memory systems not scalable
– Job stream parallelism does not accelerate single application
• A path to harnessing distributed memory computers
– Dominant form of HPC systems
– Commodity clusters & DM MPPs
• Discuss the 2nd paradigm for parallel programming:
cooperative computing
– Throughput computing (Segment 1, capacity computing)
– Multithreaded shared memory (Segment 3, capability computing)
• Dominant strategy
– Arena of technical computing
• Embodiment of Cooperative Computing
– Single application
– Weak scaling
Driving forces
• Technology
– VLSI
• “Killer micros”
• High density DRAM
– Emerging network
• Architecture
– DM MPP
– Beowulf systems (commodity clusters)
• Weak Scaling
– Need for larger problems
– Data parallelism
Scalability
• Strong scaling limits sustained performance
– Fixed size problem to achieve reduced execution time with increased
computing resources
– Amdahl’s law
• Sequential component limits speedup
– Overhead imposes limits on granularity
• and therefore on parallelism and speedup
• Weak scaling allows computation size to grow with data set size
– Larger data sets create more concurrent processes
– Concurrent processes approximately same size granularity
– Performance increases with problem set size
• Big systems are big memories for big applications
– Aggregates memories of many processing nodes
– Allows problems far larger than a single processor could manage
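For reference, Amdahl's law quantifies the strong-scaling limit above: if a fraction f of the work parallelizes perfectly across p processors while (1 − f) remains sequential, the speedup is bounded by

S(p) = 1 / ((1 − f) + f/p) ≤ 1 / (1 − f)

so, for example, a code that is 95% parallel can never exceed a speedup of 20, no matter how many nodes are applied.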
Strong Scaling, Weak Scaling

[Figure: two plots against machine scale (# of nodes) — under strong scaling the total problem size is fixed while granularity (size / node) shrinks; under weak scaling granularity is fixed while total problem size grows]
Strong Scaling, Weak Scaling
• Capacity
– Primary scaling is increase in throughput proportional to increase in resources applied
– Decoupled concurrent tasks, increasing in number of instances – scaling proportional to machine
• Cooperative
– Single job (different nodes working on different partitions of the same job)
– Job size scales proportional to machine
– Granularity per node is fixed
• Capability
– Primary scaling is decrease in response time proportional to increase in resources applied
– Single job, constant size – scaling proportional to machine size

[Diagram: Capacity, Cooperative, and Capability compared; Cooperative and Capability handle a single job, with problem size scaling running from weak scaling (Cooperative) to strong scaling (Capability)]
Strong Scaling vs. Weak Scaling

[Figure: work per task versus machine scale (1, 2, 4, 8 nodes) — constant under weak scaling, decreasing under strong scaling]
Impact of VLSI
• Mass-produced microprocessors enabled low-cost computing
– PCs and workstations
• Economy of scale
– Ensembles of multiple processors
• Microprocessor becomes building block of parallel computers
• Favors sequential process oriented computing
– Natural hardware supported execution model
– Requires locality management
• Data
• Control
– I/O channels (south bridge) provide the external interface
• Coarse grained communication packets
• Suggests concurrent execution at the process boundary level
– Processes statically assigned to processors (one-to-one)
• Operate on local data
– Coordination by large value-oriented I/O messages
• Inter process/processor synchronization and remote data exchange
“Cooperative” computing
• Between Capacity and Capability computing
– Not a widely used term
– But an important distinction with respect to these others
• Synonymous with "Coordinated" computing
• Single application
– Partitioning of data into quasi-independent blocks
– Semi-independent processes operate on separate data blocks
– Limited communication of messages
• Coordinate through remote synchronization
• Cooperate through the exchange of some data
• Scaling
– Primarily weak scaling
– Limited strong scaling
• Programming
– Favors SPMD (Single Program stream, Multiple Data stream) style
– Static scheduling, mostly by hand
– Load balancing by hand
– Coarse grain
• Process
• Data
• Communication
Data Decomposition
• Partitioning the global data into major contiguous blocks
• Exploits spatial locality, which assumes that the use of a data element heightens the likelihood of nearby data being used as well (reducing latencies associated with cache misses followed by accesses to main memory)
• Exploits temporal locality, which assumes that the use of a data element heightens the likelihood that the same data will be reused in the near future
• Varies in form
– Dimensionality
– Granularity (size)
– Shape of partitions
• Static mapping of partitions onto processor nodes, as sketched below
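As an illustration of such static mapping, a minimal C sketch (assuming a 1-D global array of n elements distributed over nprocs processes; the function and parameter names are hypothetical):

/* Hypothetical 1-D block decomposition: process `rank` of `nprocs`
   owns elements [lo, hi) of a global array of n elements.
   Remainder elements go to the first (n % nprocs) ranks. */
void block_range(long n, int nprocs, int rank, long *lo, long *hi)
{
    long base = n / nprocs;      /* minimum elements per process    */
    long rem  = n % nprocs;      /* leftover elements to distribute */
    *lo = rank * base + (rank < rem ? rank : rem);
    *hi = *lo + base + (rank < rem ? 1 : 0);
}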
Distributed Concurrent Processes
• Each data block can be processed at the same time
– Parallelism is determined by number of processes
– More blocks with smaller partitions permit more processes
– But …
• Processes run on separate processors on local data
– Usually one application process per processor
– Usually SPMD i.e., processes are equivalent but separate
(same code, different environments)
• Inner data elements of each partition block are processed independently by each of the processes
– Provides coarse grain parallelism
– Outer loop iterates over successive application steps over the
same local data
Data Exchange
• In shared memory, no problem, all the data is there
• For distributed memory systems, data needs to be
exchanged between separate nodes and processes
• Ghost cells hold local copies of the edges of partitions owned by remote processors
• Communication packets are medium to coarse grain and
point to point for most data transfers
– e.g., all edge cells of one data partition may be sent to
corresponding ghost cells of the neighboring processor in a
single message
• Multi-cast or broadcast may be required for some
application algorithms and data partitions
– e.g., matrix-vector multiply
Synchronize
• Global barriers
– Coarse grain (in time) control of outer-loop steps
– Usually used to coordinate transition from computation phase to
communication phase
• Send/receive
– Medium grain (in time) control of inner-loop data exchanges
– Blocks on a send and receive
– Computation at sender proceeds when data has been received
– Computation at receiver proceeds when incoming data is available
– Non-blocking versions of each exist but can lead to race conditions
Making Parallelism Fit
• Different kinds of parallelism work best on certain kinds of architectures
• Need to satisfy two contending requirements:
– Spread work out among as many parallel elements as possible
– Minimize inefficiencies due to:
• Overhead
• Latency
Communicating Sequential Processes
• A model of parallel computing
– Developed in the 1970s
– Often attributed to Tony Hoare
– Satisfies criteria for cooperative computing
• Many would claim it as a means of capability computing
• Process oriented
• Emphasizes data locality
• Message passing semantics
• Synchronization using barriers, among others
• Distributed reduction operators added for purposes of optimization
Communicating Sequential Processes Model
• Another form of parallelism
• Coarse grained parallelism
– Large pieces of sequential code
– They run at the same time
• Good for clusters and distributed memory MPPs
• Share data by message passing
– Often referred to as “message-passing model”
• Synchronize by "global barriers"
• Most widely used method for programming
• MPI is dominant API
• Supports "SPMD" strategy (Single Program Multiple Data)
CSP Processes
• Process is the body of state and work
• Process is the module of work distribution
• Processes are static
– In space: assigned to a single processor
– In time: exist for the lifetime of the job
• All data is either local to the process or acquired through
incident messages
• Possible to extend the process model beyond sequential to encompass multithreaded processes
– Hybrid model integrates the two models together in a clumsy
programming methodology
Locality of state
• Processes operate on memory within the processor node
• Granularity of process iteration depends on the amount of process data stored on the processor node
• New data from beyond the local processor node is acquired through message passing, primarily via send/receive semantic constructs
Other key functionalities
• Synchronization
– Barriers
– Messaging
• Reduction
– Mix of local and global
• Load balancing
– Static, user defined
Message Passing Model (BSP)
[Diagram: Initialize → barrier → local sequential process → barrier → exchange data, repeated each outer-loop step]
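A minimal C sketch of this pattern, assuming MPI (identified on the previous slide as the dominant API) for the barriers, with hypothetical helper routines standing in for the compute and exchange phases:

#include <mpi.h>

void initialize_local_data(void);   /* hypothetical helpers, not shown */
void compute_local_step(void);
void exchange_ghost_cells(void);

/* BSP-style execution: local compute phases separated from
   communication phases by global barriers, as in the diagram above. */
void bsp_loop(int nsteps)
{
    initialize_local_data();
    for (int step = 0; step < nsteps; step++) {
        MPI_Barrier(MPI_COMM_WORLD);   /* everyone ready to compute    */
        compute_local_step();          /* local sequential process     */
        MPI_Barrier(MPI_COMM_WORLD);   /* all compute phases complete  */
        exchange_ghost_cells();        /* point-to-point data exchange */
    }
}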
Global Barrier Synchronization
• Different nodes finish a process at different
times
• Cannot exchange data until all processes have
completed
• Barriers synchronize all concurrent processes
running on separate “nodes”
• How it works
– Every process “tells” barrier when it is done
– When all processes are done, barrier “tells”
processes that they can continue
• “tells” is done by message passing over the
network
Barrier Synchronization

[Diagram: processes reach the barrier at different times; none proceeds until all have arrived]
Message: Send & Receive
• Nodes communicate with each other by packets through
the system area network
– Reminder: network comprises (hardware)
• NICs (Network Interface Controller)
• Links (metal wires or fiber optics)
• Switch (N x N)
– Operating systems and network drivers (software)
• Processes communicate with each other by application-level messages
– send
– receive
• Message content
– Process port
– Data
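A minimal sketch of a matched send/receive pair in MPI (named on an earlier slide as the dominant API); the message size, tag value, and two-rank layout are illustrative assumptions:

#include <mpi.h>

/* Rank 0 sends a packet of doubles to rank 1; the receive blocks
   until the data is available. The tag and communicator play the
   role of the "process port" in the message content above. */
int main(int argc, char **argv)
{
    double buf[256] = {0};
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        MPI_Send(buf, 256, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(buf, 256, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    MPI_Finalize();
    return 0;
}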
Send & Receive
[Diagram: Process A on Node 1 and Process B on Node 2 exchanging matched send/receive pairs across the network]
An Example Problem
• Partial Differential Equation (PDE)
– Heat equation
– 2-dimensional discrete point mesh approximating a unit square
• The temperature field is approximated by a finite set of discrete points distributed over the computational domain, and the temperature values must be calculated at these points
– Static boundary conditions (temperature on the boundaries is predefined)
• Stages of Code Development
– Data decomposition
– Concurrent sequential processes
– Coordination through synchronization
– Data exchange
An Example Problem
Heat equation:

∂u/∂t = k ∇²u

In 2-D:

u_t = k (u_xx + u_yy)

Implementation:
• Jacobi method on a unit square
• Dirichlet boundary condition
• Equal number of intervals along the x and y axes
• Five-point stencil

[Figure: left, the uniprocessor mesh ringed by boundary cells; right, the same mesh as a CSP computation with 4 processes, each partition ringed by ghost cells that receive boundary updates from its neighbors]
Stencil Calculation
x_C(t+1) = ( x_N(t) + x_E(t) + x_S(t) + x_W(t) ) / 4.0

where x_C is the center cell and x_N, x_E, x_S, x_W are its north, east, south, and west neighbors at time t.
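A minimal C sketch of one such Jacobi sweep over a local partition (the (nx+2) × (ny+2) layout with a one-cell ghost ring, the function name, and the returned maximum change are illustrative assumptions):

/* One Jacobi sweep over a local (nx+2) x (ny+2) partition whose outer
   ring holds fixed boundary values or ghost cells copied from
   neighboring processes. Returns the maximum per-cell change, which
   feeds the convergence test shown later in this example. */
double jacobi_sweep(int nx, int ny,
                    double uold[nx + 2][ny + 2],
                    double unew[nx + 2][ny + 2])
{
    double maxdiff = 0.0;
    for (int i = 1; i <= nx; i++) {
        for (int j = 1; j <= ny; j++) {
            unew[i][j] = (uold[i - 1][j] + uold[i + 1][j] +
                          uold[i][j - 1] + uold[i][j + 1]) / 4.0;
            double d = unew[i][j] - uold[i][j];
            if (d < 0.0) d = -d;
            if (d > maxdiff) maxdiff = d;
        }
    }
    return maxdiff;
}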
Heat Distribution : Interactive Session
• We are going to act out a heat distribution problem.
• The take-home message we want to convey is:
– How cooperative computing works
– Notion of mesh/grid partitioning
– Use of messaging
– Explicit synchronization
[Figure: initial mesh for the interactive session — fixed boundary temperatures (100, 80, 60, 40, 20, 10, …) along the edges, with all interior cells initialized to 0]
Calculate the value of each cell by averaging its 4 neighboring cells:

(60 + 0 + 0 + 0) / 4 = 60/4 = 15
(0 + 0 + 0 + 0) / 4 = 0/4 = 0

[Figure: mesh after one averaging step — cells adjacent to the hot boundaries take on nonzero values (e.g., 15), while deep interior cells remain 0]
Calculate the difference between the previous cell values and the new cell values, e.g.:

50 − 0 = 50

[Figure: new cell values (50, 20, 15, … near the hot corner) compared against the previous all-zero interior]
After computing the difference for each cell, determine the maximum (temperature change) ACROSS your problem chunk and send this value to the coordinator.

[Figure: a processing element whose maximum change is 50 sends that value to the coordinator]
The coordinator waits for all processing elements to send their values and determines the maximum of all the values it receives:

max(50, 10, 10, 25) = 50

If MAX < 7.0, the computation has converged and all processes STOP (YES branch); otherwise (NO branch), Proc 1 through Proc 4 continue with the next iteration.

[Diagram: Proc 1–4 send their local maxima to the coordinator, which applies the MAX < 7.0 test]
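A minimal sketch of this convergence test (the 7.0 threshold comes from the example above; the function name, and the use of MPI's built-in max-reduction in place of an explicit coordinator process, are illustrative assumptions):

#include <mpi.h>

/* Each process passes the maximum change over its own chunk; every
   process learns the global maximum and the stop/continue decision. */
int converged(double local_max, double tol)   /* e.g., tol = 7.0 */
{
    double global_max;
    MPI_Allreduce(&local_max, &global_max, 1, MPI_DOUBLE,
                  MPI_MAX, MPI_COMM_WORLD);
    return global_max < tol;   /* nonzero => STOP */
}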
[Figure: full mesh after the first sweep — updated cell values alongside per-cell changes (2.5, 5.0, 7.5, …, 25.0), with each partition's maximum forwarded to the coordinator]
[Figures: snapshots of the mesh after iterations 1–20, 50, and 100, showing heat diffusing inward from the hot boundaries toward a steady-state distribution]
Performance issues for CSP
• Parallelism speeds things up
• Data exchange slows things down
• Finer grain partitioning
– Provides more parallelism (can use more processors)
– Requires more fine-grain messages (overhead becomes more significant per datum)
– Fewer operations per message (overhead of communication becomes more significant per operation)
• Synchronization is another source of overhead
• Computation and communication not overlapped
Performance Issues for CSP
• Communication (Gather / Scatter, Data exchange)
– Latency
• Network Distance, Message size
– Contention
• Network Bandwidth
– Overhead
• Network interfaces & protocols used
• Synchronization (Blocking read – writes, barriers)
– Overhead
• Load Balancing
– Non-uniform work tasks
– Starvation
– Overhead
Parallelism : Operating System level
• Program (review) : A program is a set of instructions
usually stored in the memory. During execution a
computer fetches the instruction stored in the memory
address indicated by the program counter and
executes the instruction.
• Process (review) : Can be defined as a combination of
program, memory address space associated with the
program and a program counter.
• A program associated with one process cannot access the memory address space of another process.
• A multi-threaded process is one where a single memory
address space is associated with multiple program
counters.
• In this lecture we limit the discussion to single-threaded
processes for the sake of simplicity.
Adapted from Ch. 7 Beowulf and Cluster Computing . Gropp, Lusk, Sterling
Unix processes : Overview
• New processes can be created using the fork()/exec() combination.
• fork()
– A medium-weight mechanism that copies the address space and creates a process running the same program. The process that invoked the fork() call is known as the parent process and the newly created process is called the child process.
– For the child, the fork() call returns 0, whereas for the parent it returns the child's process ID (PID).
• The child process then invokes the exec() system call.
• exec()
– changes the program associated with the process
– sets the program counter to the beginning of the program
– reinitializes the address space
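A minimal sketch of the fork()/exec() pattern (the choice of /bin/date here is illustrative, echoing the rsh/ssh examples later in this lecture):

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();                 /* copy the address space */
    if (pid < 0) {
        perror("fork");
        exit(1);
    }
    if (pid == 0) {                     /* child: fork() returned 0 */
        execl("/bin/date", "date", (char *)NULL);  /* replace program image */
        perror("execl");                /* reached only if exec fails */
        exit(1);
    }
    waitpid(pid, NULL, 0);              /* parent: got the child's PID */
    printf("child %d finished\n", (int)pid);
    return 0;
}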
Image cropped from : http://sc.tamu.edu/
Adapted from Ch. 7 Beowulf and Cluster Computing . Gropp, Lusk, Sterling
Parallelism using Unix utilities
• Usually a shell process waits for the child process to finish execution before prompting you to execute another command.
• By appending the "&" character to a command, the shell starts the new process but then immediately prompts for another command; this is called running a process in the "background".
• This is the simplest form of the master-worker model, executed using basic Unix utilities, as the script below illustrates.
#!/bin/bash
# Search all subdirectories matching 20* in parallel, then merge results.
export search_string=$1
echo searching for $search_string
for i in 20*
do
    ( cd $i; grep $search_string * >> $search_string.out ) &  # backgrounded subshell
done
wait   # block until every background grep has completed
cat 20*/$search_string.out > $1.all
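If this script is saved as, say, search.sh (the file name is an assumption) and invoked as ./search.sh error in a directory containing subdirectories named 20* (e.g., per-year logs), it greps all of them concurrently and gathers the matches into error.all.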
Adapted from Ch. 7 Beowulf and Cluster Computing . Gropp, Lusk, Sterling
Remote Processes
• To create a new process on another machine, the initiator must contact an existing process and cause it to fork a new process. The contact is usually made over a TCP socket.
• rsh (remote shell):
– The rsh command contacts the rshd process running on the remote machine and prompts it to execute a script/program.
– The standard I/O of the remote process is routed through rsh to the local machine's standard I/O.
– Due to severe security problems (plain-text password handling), utilities like rsh and telnet are strongly discouraged and deprecated on many systems.
– e.g.: rsh celeritas.cct.lsu.edu /bin/date
• ssh (secure shell):
– Behaves much like rsh, but the authentication mechanism is based on public key encryption, and all traffic between the local and remote machines is encrypted.
– Since rsh does not have the encryption stage, rsh is substantially faster than ssh.
– e.g.: ssh celeritas.cct.lsu.edu /bin/date
Sockets : Overview
• socket : a bidirectional communication channel between two
processes that is accessed by the processes using the same
read and write functions that are used for file I/O
• Connection Process :
– The initial connection process that two remote processes perform in order to establish a bidirectional communication channel is asymmetric:
• One process (on the remote machine) listens for a connection and accepts it.
• The other process initiates the request for connection with the remote machine.
• A bidirectional channel between the two machines is established.
– Once a channel is established the communication between the two
processes is symmetric.
– In a client-server model, the process that waits for a connection is known as
the server and the process that connects to it is known as the client.
TCP/IP : Overview
• Common terms:
– IP: Internet Protocol for communication between computers; responsible for routing IP packets to their destination.
– TCP: Transmission Control Protocol for communication between applications
– UDP: User Datagram Protocol for communication between applications
– ICMP: Internet Control Message Protocol for detecting errors and gathering network statistics
• TCP:
– An application that wants to communicate with another application sends a request for connection.
– The request is sent to a fully qualified address (more on this soon) and port.
– After the "handshake" (SYN, SYN-ACK, ACK) between the two applications, a bidirectional communication channel is established.
– The communication channel remains alive until it is terminated by one of the applications involved.
• TCP/IP:
– TCP breaks down the data to be communicated between applications into packets and reassembles the data from packets when they reach the destination.
– IP ensures routing of the data packets to their intended receiver.
TCP/IP : Overview
• Each computer on a network is associated with an IP address containing 4 numbers, each holding a value between 0-255, e.g.: 130.184.6.128
• Using Domain Name System (DNS) servers, the numeric IP address is mapped to a domain name that is easier to remember; e.g., the domain name corresponding to 130.184.6.128 is prospero.uark.edu
• Analogy: making a phone call
– Caller – client
– Receiver – server
– Phone number – IP address
– Extension – port number

Client:
• Picking up the receiver – socket()
• Locating the call recipient (from phone book / memory) – bind()
• Dialing the phone number – connect()
• Talking – read() / write()
• Hanging up – close()

Server:
• Connecting phone to the phone line – socket()
• Selecting an incoming line – bind()
• Ringer ON – listen()
• Receiving the call – accept()
• Talking – read() / write()
• Hanging up – close()
Server : Create Socket
/* Create data structures to store connection specific info */
struct sockaddr_in sin, from;
/* The main call that creates a socket */
listen_socket = socket(AF_INET, SOCK_STREAM, 0);
Server:
– Create a TCP socket
– Bind socket-port
– Listen for connections
– Loop:
• accept connection
• communicate
• close connection

Client:
– Create a TCP socket
– Connect to server
– Communicate
– Close connection
Server : Bind Socket-Port
/* Initialize data structures */
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = INADDR_ANY;
/* A port of 0 would let the system select the port to bind; since
   this is user-configurable, we use a specific port (PORT_NUM) */
sin.sin_port = htons(PORT_NUM);
bind(listen_socket, (struct sockaddr *) &sin, sizeof(sin));
Server : Listen for Connections
listen(listen_socket, 5);
/* 5 refers to the number of pending connection requests that the
   kernel should queue for the application */
getsockname(listen_socket, (struct sockaddr *) &sin, &len);
printf("listening on port = %d\n", ntohs(sin.sin_port));
Server : accept()
talk_socket = accept(listen_socket, (struct sockaddr *) &from, &len);
/* accept() is a blocking system call that waits for a connection from
   a client and then returns a new socket (talk_socket) through which
   the server is connected to the client, so that it can continue
   listening for more connections on the original server socket
   (listen_socket) */
Client : Create Socket
/* Create data structures to store connection specific info */
struct sockaddr_in sin;
struct hostent *hp;
/* The main call that creates a socket */
talk_socket = socket(AF_INET, SOCK_STREAM, 0);
Client : Bind and Connect to Server
/* initialize data structures */
hp = gethostbyname(HOST_NAME);
bzero((void *)&sin, sizeof(sin));
bcopy((void *) hp->h_addr, (void *) &sin.sin_addr, hp->h_length);
sin.sin_family = hp->h_addrtype;
sin.sin_port = htons(atoi(PORT_NUM));
/* connect to the server */
connect(talk_socket, (struct sockaddr *) &sin, sizeof(sin));
Client : send msg. write()
n = write(talk_socket, buf, strlen(buf)+1);
if (n < 0)
error("ERROR writing to socket");
bzero(buf,256);
/*Client initiates communication with server using a write() call*/
Server recv/send read()/write()
n = read (talk_socket, buf, 1024);
if (n < 0)
error("ERROR reading from socket");
else
write(talk_socket, buf, n);
/* simple echo; content stored in buf*/
Client : recv : read()
n = read(talk_socket, buf, 1024);
if (n < 0)
error("ERROR reading from socket");
else
printf("received from server: %s\n", buf);
/*receives messages sent by the server stored in buf*/
Close Socket
Server:
close(talk_socket);
/* Ends the socket connection corresponding to one particular client.
   Control goes back to the loop and the server continues to wait for
   more client connections at listen_socket */

Client:
close(talk_socket);
/* Ends the client socket connection */
Demo: Socket Example
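As a minimal sketch, the fragments above can be assembled into a runnable echo pair; the file name, port number, host, message, and command-line interface are illustrative assumptions (error checking omitted for brevity):

/* echo.c - build: cc echo.c -o echo
   Run:  ./echo server 5555                (one terminal)
         ./echo client localhost 5555 hi   (another terminal) */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(int argc, char **argv)
{
    char buf[1024];
    struct sockaddr_in sin;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;

    if (argc == 3 && strcmp(argv[1], "server") == 0) {
        int listen_socket = socket(AF_INET, SOCK_STREAM, 0);
        sin.sin_addr.s_addr = INADDR_ANY;
        sin.sin_port = htons(atoi(argv[2]));
        bind(listen_socket, (struct sockaddr *)&sin, sizeof(sin));
        listen(listen_socket, 5);
        for (;;) {                                   /* serve clients forever */
            int talk_socket = accept(listen_socket, NULL, NULL);
            ssize_t n = read(talk_socket, buf, sizeof(buf));
            if (n > 0) write(talk_socket, buf, n);   /* simple echo */
            close(talk_socket);
        }
    } else if (argc == 5 && strcmp(argv[1], "client") == 0) {
        int talk_socket = socket(AF_INET, SOCK_STREAM, 0);
        struct hostent *hp = gethostbyname(argv[2]);
        memcpy(&sin.sin_addr, hp->h_addr_list[0], hp->h_length);
        sin.sin_port = htons(atoi(argv[3]));
        connect(talk_socket, (struct sockaddr *)&sin, sizeof(sin));
        write(talk_socket, argv[4], strlen(argv[4]) + 1);
        if (read(talk_socket, buf, sizeof(buf)) > 0)
            printf("received from server: %s\n", buf);
        close(talk_socket);
    } else {
        fprintf(stderr, "usage: %s server PORT | client HOST PORT MSG\n",
                argv[0]);
        return 1;
    }
    return 0;
}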
Socket Programming: Problems
• Limited portability (not all interconnect interfaces support
sockets)
• Limited scalability (number of ports available on a node)
• Tedious and error-prone hand-coding (unless somebody did it
before)
• Tricky startup process (assumed port availability is not
guaranteed)
• Only point-to-point communication supported explicitly; no
implementation of collective communication patterns
• Frequently used communication topologies not available (e.g.,
Cartesian mesh), have to be coded from scratch
• Direct support only for data organized in contiguous buffers, forcing users to write their own buffer packing/unpacking routines
Socket Programming : Problems
• Suffer from the overhead of the protocol stack (TCP), or require designing algorithms to manage reliable, in-order, duplicate-free arrival of complete messages (e.g., with datagram-oriented protocols)
• Basic data transfer calls (read/write) do not guarantee returning or
sending the full requested number of bytes, requiring the use of
wrappers (and possibly resulting in multiple kernel calls per
message)
• Complicated writing of applications with changing/unpredictable
communications (it’s only easy when reads are matched to writes
and you know when both of them occur)
• On some OS’s sockets may linger long after application exits,
preventing new startups using the same configuration
• If used, asynchronous management of socket calls adds another
layer of complexity (either through select() or multiple threads)
Summary : Material for the Test
• Scalability (strong, weak scaling): slides 7 – 10
• Cooperative computing: slides 12 – 16
• Communicating Sequential Processes: slides 18 – 22
• Message Passing: slides 24 – 28
• Performance issues of CSP: slides 64 – 65
• Sockets, TCP/IP: slides 71, 72, 85, 86