Bulk Synchronous Processing Model

Download Report

Transcript Bulk Synchronous Processing Model

Bulk Synchronous Parallel Processing Model
Jamie Perkins
Overview

Four W’s – Who, What, When and Why

Goals for BSP

BSP Design and Program

Cost Functions

Languages and Machines
A Bridge for Parallel
Computation

Von Neumann model


Designed to insulate hardware and software
BSP model (Bulk Synchronous Parallel)
Proposed by Leslie Valiant of Harvard University
in 1990
 Developed by W.F. McColl of Oxford
 Designed to be a “bridge” for parallel
computation

Goals for BSP
Scalability – performance of HW & SW must
be scalable from a single processor to
thousands of processors
 Portability – SW must run unchanged, with
high performance, on any general purpose
parallel architecture
 Predictability – performance of SW on
different architecture must be predictable in
a straight forward way

BSP Design
 Three
Components
 Node
 Processor
 Router
and Local Memory
or Communication Network
 Message
Passing or Point-to-Point
communication
 Barrier
or Synchronization Mechanism
 Implemented
in hardware
BSP Design

Fixed memory architecture

Hashing to allocate memory in “random” fashion

Fast access to local memory

Uniformly slow access to remote memory
Illustration of BSP Computer
Node
P
M
Node
P
M
Node
P
M
Barrier
Communication Network
http://peace.snu.ac.kr/courses/parallelprocessing/
BSP Program

Composed of S supersteps

Superstep consists of:
A computation where each processor uses
only locally held values
 A global message transmission from each
processor to any subset of the others
 A barrier synchronization

Strategies for programming
on BSP

Balance the computation between
processes

Balance the communication between
processes

Minimize the number of supersteps
BSP Program
Superstep 1
P1
P2
Computation
Communication
Superstep 2
Barrier
http://peace.snu.ac.kr/courses/parallelprocessing/
P3
P4
Advantages of BSP
Eliminates need for programmers to manage
memory, assign communication and perform
low-level synchronization (w/ sufficient parallel
slackness)
 Synchronization allows automatic optimization
of the communication pattern
 BSP model provides a simple cost function for
analyzing the complexity of algorithms

Cost Function




g – “gap” or bandwidth inefficiency
L – “latency”, minimum time needed for one
superstep
w – largest amount of work performed (per
processor)
h – largest number of packets sent or received
wi + ghi + L = execution time for the superstep i
Languages & Machines
 BSP
++
C
 C++
 Fortran
 JBSP
 Opal
 IBM
SP1
 SGI Power Challenge
(Shared Memory)
 Cray T3D
 Hitachi SR2001
 TCP/IP
Thank You
Any Questions
References






http://peace.snu.ac.kr/courses/parallelprocessing/
http://wwwcs.uni-paderborn.de/fachbereich/AG/agmad
http://www.cs.mu.oz.au/677/notes/node41.html
McColl, W.F. The BSP Approach to Architecture
Independent Parallel Programming. Technical report,
Oxford University Computing Laboratory, Dec. 1994
United States Patent 5083265
Valiant, L.G. A Bridging Model for Parallel
Computation. Communications of the ACM 33,8
(1990), 103-111.