Bulk Synchronous Processing Model
Download
Report
Transcript Bulk Synchronous Processing Model
Bulk Synchronous Parallel Processing Model
Jamie Perkins
Overview
Four W’s – Who, What, When and Why
Goals for BSP
BSP Design and Program
Cost Functions
Languages and Machines
A Bridge for Parallel
Computation
Von Neumann model
Designed to insulate hardware and software
BSP model (Bulk Synchronous Parallel)
Proposed by Leslie Valiant of Harvard University
in 1990
Developed by W.F. McColl of Oxford
Designed to be a “bridge” for parallel
computation
Goals for BSP
Scalability – performance of HW & SW must
be scalable from a single processor to
thousands of processors
Portability – SW must run unchanged, with
high performance, on any general purpose
parallel architecture
Predictability – performance of SW on
different architecture must be predictable in
a straight forward way
BSP Design
Three
Components
Node
Processor
Router
and Local Memory
or Communication Network
Message
Passing or Point-to-Point
communication
Barrier
or Synchronization Mechanism
Implemented
in hardware
BSP Design
Fixed memory architecture
Hashing to allocate memory in “random” fashion
Fast access to local memory
Uniformly slow access to remote memory
Illustration of BSP Computer
Node
P
M
Node
P
M
Node
P
M
Barrier
Communication Network
http://peace.snu.ac.kr/courses/parallelprocessing/
BSP Program
Composed of S supersteps
Superstep consists of:
A computation where each processor uses
only locally held values
A global message transmission from each
processor to any subset of the others
A barrier synchronization
Strategies for programming
on BSP
Balance the computation between
processes
Balance the communication between
processes
Minimize the number of supersteps
BSP Program
Superstep 1
P1
P2
Computation
Communication
Superstep 2
Barrier
http://peace.snu.ac.kr/courses/parallelprocessing/
P3
P4
Advantages of BSP
Eliminates need for programmers to manage
memory, assign communication and perform
low-level synchronization (w/ sufficient parallel
slackness)
Synchronization allows automatic optimization
of the communication pattern
BSP model provides a simple cost function for
analyzing the complexity of algorithms
Cost Function
g – “gap” or bandwidth inefficiency
L – “latency”, minimum time needed for one
superstep
w – largest amount of work performed (per
processor)
h – largest number of packets sent or received
wi + ghi + L = execution time for the superstep i
Languages & Machines
BSP
++
C
C++
Fortran
JBSP
Opal
IBM
SP1
SGI Power Challenge
(Shared Memory)
Cray T3D
Hitachi SR2001
TCP/IP
Thank You
Any Questions
References
http://peace.snu.ac.kr/courses/parallelprocessing/
http://wwwcs.uni-paderborn.de/fachbereich/AG/agmad
http://www.cs.mu.oz.au/677/notes/node41.html
McColl, W.F. The BSP Approach to Architecture
Independent Parallel Programming. Technical report,
Oxford University Computing Laboratory, Dec. 1994
United States Patent 5083265
Valiant, L.G. A Bridging Model for Parallel
Computation. Communications of the ACM 33,8
(1990), 103-111.