Transcript slides
Parallel Computing
DCS 860A Topics in Emerging Computer Technologies
DPS 2016, Fall 2014
Dr. Ron Frank & Dr. Tappert
By: Team 1 – DPS 2016
(Leigh Anne Clevenger, Kevin Khan, Mantie Reid, Javid Maghsoudi, Hugh Eng)
Presentation Summary:
• Introduction
• Concepts, Software, Memory Architecture, Programming Models
• Parallel Computing: Operating Systems
• Parallel Computing: GPUs
• Closing
Introduction
• Single thread: processing of one command at a time. A thread is the smallest sequence of programmed instructions that can be managed independently by an operating system’s scheduler.
• Multithreading: threads are a subset of a process, so a process can have multiple threads that share its resources. On a multiprocessor or multicore system, the threads run concurrently, with each processor/core executing a separate thread (see the sketch below).
• Serial computing: execution of one instruction at a time. This is the type of computing that we are all familiar with.
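As a rough illustration of multithreading, here is a minimal C sketch using POSIX threads (the thread count and the worker routine are made up for illustration): one process creates several threads that share its address space, and the OS scheduler can run them on separate cores.

    #include <pthread.h>
    #include <stdio.h>

    #define NUM_THREADS 4                /* hypothetical thread count */

    /* Each thread independently runs this routine on its own argument. */
    static void *worker(void *arg) {
        long id = (long)arg;
        printf("thread %ld doing its share of the work\n", id);
        return NULL;
    }

    int main(void) {
        pthread_t threads[NUM_THREADS];

        /* One process, multiple threads sharing the same resources. */
        for (long i = 0; i < NUM_THREADS; i++)
            pthread_create(&threads[i], NULL, worker, (void *)i);

        /* Wait for all threads to finish before the process exits. */
        for (long i = 0; i < NUM_THREADS; i++)
            pthread_join(threads[i], NULL);
        return 0;
    }

Compile with, for example: gcc -pthread example.c (the file name is arbitrary).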
Introduction – cont.
• Parallel computing:
• Is the simultaneous use of multiple processors/cores to solve a problem.
• Problems are broken down into parts that can be solved concurrently.
• Each part is broken into a series of instructions.
• Each instruction can be executed on a different processor/core.
• There is a need for a control mechanism.
• Almost all computers made today are capable of parallel processing from a hardware point of view.
• Most supercomputers today are really clusters of many machines.
Introduction – cont.
Why Parallel Computing?
We are at the limits of single-CPU computing in terms of performance.
Parallel computing allows us to solve problems that don’t fit onto one CPU.
(For example, today’s game consoles could not handle both instruction execution and the graphics display processing they need with a single processor.)
Our ability to model real situations requires the problem to look at complex, interrelated events that occur at the same time.
Where are we using Parallel Computing?
- Science and engineering: circuit design, molecular sciences, design of fighter planes, submarines, and other defense systems
- Industrial and commercial: oil exploration, medical imaging, pharmaceutical design
- Weather forecasting
- Search for Extraterrestrial Intelligence (SETI)
- Web search engines
Introduction – cont.
• Single Instruction, Single Data (SISD): The oldest type of computer, executing only one instruction stream on one data stream in any one clock cycle.
• Single Instruction, Multiple Data (SIMD): All processing units execute the same instruction, but each works on a different data element (processor arrays, vector pipelines, and most graphics processing units); see the sketch below.
• Multiple Instruction, Single Data (MISD): Each processing unit operates on the same data stream independently, using separate instruction streams (e.g., multiple cryptography algorithms attacking a single coded message).
• Multiple Instruction, Multiple Data (MIMD): Every processor may be executing a different instruction and working on a different data stream (most supercomputers and networked parallel computer clusters).
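To make the SISD/SIMD distinction concrete, here is a minimal C sketch (the array type, size, and function names are only illustrative): the same loop written once as ordinary scalar code and once with an OpenMP SIMD directive that asks the compiler to apply one instruction to several data elements at a time.

    #include <stddef.h>

    /* SISD style: one instruction operates on one data element per step. */
    void scale_scalar(float *a, float s, size_t n) {
        for (size_t i = 0; i < n; i++)
            a[i] = a[i] * s;
    }

    /* SIMD style: the compiler vectorizes the loop so one instruction
       operates on several adjacent elements at once
       (compile with -fopenmp or -fopenmp-simd). */
    void scale_simd(float *a, float s, size_t n) {
        #pragma omp simd
        for (size_t i = 0; i < n; i++)
            a[i] = a[i] * s;
    }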
Parallel Computing – Concepts & Software
Differences: Parallel Computing & Serial Computing:
Serial Computing: Software has been written for serial computation:
A problem is broken into a discrete series of instructions
Instructions are executed sequentially one after another
Executed on a single processor; only one instruction may execute at any moment in time
Parallel Computing – Concepts & Software – Cont.
Differences: Parallel Computing & Serial Computing:
Parallel Computing:
In the simplest sense, parallel computing is the simultaneous use of multiple
compute resources to solve a computational problem:
A problem is broken into discrete parts that can be solved concurrently
Each part is further broken down to a series of instructions
Instructions from each part execute simultaneously on different processors
An overall control/coordination mechanism is employed (see the sketch below)
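A minimal C/OpenMP sketch of this decomposition (the problem, summing an array, and all names are chosen purely for illustration): the loop is broken into parts, each part runs on a different core, and the OpenMP runtime together with the reduction clause provides the control/coordination mechanism.

    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    int main(void) {
        static double data[N];
        for (int i = 0; i < N; i++)
            data[i] = 1.0;                       /* sample values */

        double sum = 0.0;

        /* The iterations are divided among the available cores;
           the reduction clause coordinates the partial sums. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += data[i];

        printf("sum = %f (threads available: %d)\n", sum, omp_get_max_threads());
        return 0;
    }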
Parallel Computing – Computers
Parallel Computers:
Virtually all stand-alone computers today are parallel from a hardware perspective:
• Multiple functional units (L1 cache, L2 cache, branch, prefetch, decode, floating-point, graphics processing (GPU), integer, etc.)
• Multiple execution units/cores
• Multiple hardware threads
Parallel Computing – Concepts & Terminology
von Neumann Architecture:
• Named after the Hungarian mathematician John von Neumann who first
authored the general requirements for an electronic computer in his 1945
papers.
• Also known as "stored-program computer" - both program instructions and
data are kept in electronic memory. Differs from earlier computers which were
programmed through "hard wiring".
• Since then, virtually all computers have followed this basic design:
Comprised of four main components:
Memory
Control Unit
Arithmetic Logic Unit
Input/Output
Parallel Computing – Concepts & Terminology
Flynn's Classical Taxonomy
• There are different ways to classify parallel computers.
• Flynn's taxonomy distinguishes multi-processor computer architectures along the two independent dimensions of Instruction Stream and Data Stream. Each of these dimensions can have only one of two possible states: Single or Multiple.
• The matrix of these two dimensions defines the four possible classifications according to Flynn: SISD, SIMD, MISD, and MIMD.
An example, MISD: a type of parallel computer in which a single data stream is fed into multiple processing units, and each processing unit operates on that data independently via separate instruction streams.
Parallel Computing – Memory Architectures
There are multiple ways of organizing memory in a parallel computer:
Uniform Memory Access (UMA): all processors share memory and have equal access times (typical of SMP machines)
Non-Uniform Memory Access (NUMA): memory is shared, but access time depends on which processor the memory is local to
Distributed Memory: each processor has its own local memory, and processors exchange data over a network
Parallel Computing – Programming Models
Shared Memory Model (without threads)
• In this programming model, tasks share a common address space, which
they read and write to asynchronously.
Threads Model
• This programming model is a type of shared memory programming.
Distributed Memory / Message Passing Model
• Tasks use their own local memory and exchange data by explicitly sending and receiving messages, commonly via the Message Passing Interface (MPI); see the sketch below.
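A minimal C/MPI sketch of the message passing model (the ranks, tag, and payload are arbitrary illustration): each task has its own memory, and data moves between tasks only through explicit send and receive calls.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this task's id */
        MPI_Comm_size(MPI_COMM_WORLD, &size);    /* number of tasks */

        if (rank == 0 && size > 1) {
            int value = 42;                      /* arbitrary payload */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            printf("task 0 sent %d to task 1\n", value);
        } else if (rank == 1) {
            int value;
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("task 1 received %d from task 0\n", value);
        }

        MPI_Finalize();
        return 0;
    }

Run with, for example: mpirun -np 4 ./a.out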
Parallel Computing – Programming Models
Data Parallel Model
The data parallel model demonstrates the following characteristics:
• Address space is treated globally
• Most of the parallel work focuses on performing operations on a
data set.
• The data set is typically organized into a common structure, such
as an array or cube.
Parallel Computing – An Example
Array Processing: This example demonstrates calculations on 2-dimensional array
elements, with the computation on each array element being independent from other
array elements.
• The serial program calculates one element at a time in sequential order.
Serial code could be of the form:
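(The slide's original listing is not preserved; the following C sketch shows the general shape, with the array size N and the function fcn() as placeholder names.)

    #define N 1000

    /* Stand-in for the real, element-independent computation. */
    static double fcn(int i, int j) { return (double)(i + j); }

    /* Serial version: compute each array element one at a time, in sequential order. */
    void compute_serial(double a[N][N]) {
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                a[i][j] = fcn(i, j);
    }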
Parallel Solution (see the sketch below)
• Array elements are distributed so that each processor owns a portion of the array (a subarray).
• Independent calculation of array elements ensures there is no need for communication between tasks.
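A corresponding parallel sketch using C/OpenMP, reusing N and fcn() from the serial version above (a distributed-memory version would hand each MPI task its own subarray in the same way): each thread fills its own portion of the array independently, with no communication between tasks.

    /* Parallel version: the outer loop is split among the threads, so each
       thread owns and fills its own block of columns independently. */
    void compute_parallel(double a[N][N]) {
        #pragma omp parallel for
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                a[i][j] = fcn(i, j);
    }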
Parallel Computing Operating Systems
Cluster
Each computer has a complete OS; computers can be combined using load-balancing servers for task parallelism, or cooperate on the computation for a single program
Beowulf
Cluster built of standard computers with a standard OS, controlled by server using
Parallel Virtual Machine (PVM) and Message Passing Interface (MPI)
Client nodes do only what they are directed to do
Symmetric Multi-Processing (SMP)
All processors are peers, sharing memory and I/O bus
Asymmetric Multi-Processing (AMP)
Operating system reserves processors for parallel use, cores may be specialized.
Embedded
Compilers and debuggers for parallel system-on-a-chip (SoC) software designs (e.g., Intel System Studio)
Cluster Operating Systems
High Performance Computing (HPC)
Synchronization of clusters, task scheduler
Example – Blue Gene from IBM
Single-system Image (SSI)
Multiple computers look like one
Kerrighed global process management
Beowulf Clusters
Low-cost solution for parallel computing platform
Linux on desktops
Scalable
Construct with:
Knoppix bootable CDs
OpenMosix
Open Source cluster application resources (OSCAR)
Examples:
Linux-Windows Hybrid HPC Cluster
Scientific simulations
High Density Computing: Green Destiny from Los Alamos National Labs
Introduction to GPUs
What is a GPU?
• It is a processor optimized for 2D/3D graphics, video,
visual computing, and display.
• It is highly parallel, highly multithreaded multiprocessor
optimized for visual computing.
• It provides real-time visual interaction with computed objects via graphics, images, and video.
• It serves as both a programmable graphics processor
and a scalable parallel computing platform.
• Heterogeneous Systems: combine a GPU with a CPU
GPU Graphic Trends
• OpenGL – an open standard for 3D programming
• DirectX – a series of Microsoft multimedia programming interfaces
• New GPUs are being developed every 12 to 18 months
• New idea of visual computing: combines graphics processing and parallel computing
• Heterogeneous System – CPU + GPU
• GPU evolves into a scalable parallel processor
• vGPU renders graphics on a server
• GPU Computing: GPGPU and CUDA
• GPU unifies graphics and computing
GPU vs CPU
• GPUs contain a much larger number of dedicated ALUs than CPUs.
• GPUs also contain extensive support for the stream processing paradigm, which is related to SIMD (Single Instruction, Multiple Data) processing.
• Each processing unit on a GPU contains local memory that improves data manipulation and reduces fetch time.
GPU Chip Layouts
NVIDIA GeForce 8800
Future Apps in a Concurrent World
Exciting applications in the mass computing market:
- Molecular dynamics simulation
- Video and audio coding and manipulation
- 3D imaging and visualization
- Consumer game physics
- Virtual reality products
Various granularities of parallelism exist, but…
- the programming model must not hinder parallel implementation
- data delivery needs careful management
Introducing domain-specific architecture: CUDA for GPGPU
GPU and CPU: The Differences
(Diagram: the CPU die devotes significant area to control logic and cache alongside a few ALUs and DRAM, while the GPU die devotes most of its area to many ALUs with its own DRAM.)
More transistors are devoted to computation instead of caching or flow control
Suitable for data-intensive computation with a high arithmetic/memory operation ratio
Three Basic Forms of Network Storage
Direct access storage (DAS)
Network attached storage (NAS)
Storage area network (SAN)
Virtualization & Software Defined Infrastructure
Network Attached Storage (NAS)
NAS is a dedicated storage device, and it operates
in a client/server mode.
NAS is connected to the file server via LAN.
Protocol: NFS (or CIFS) over an IP network
Network File System (NFS) – UNIX/Linux
Common Internet File System (CIFS) – Windows; evolved from Microsoft NetBIOS, NetBIOS over TCP/IP (NBT), and Server Message Block (SMB)
Remote file systems (drives) are mounted on the local system as if they were local drives
Advantage: no distance limitation
Disadvantage: Speed and Latency
Weakness: Security
Storage Area Network (SAN)
A Storage Area Network (SAN) is a specialized,
dedicated high speed network joining servers and
storage, including disks, disk arrays, tapes, etc.
Storage (data store) is separated from the
processors (and separated processing).
High capacity, high availability, high scalability,
ease of configuration, ease of reconfiguration.
Fibre Channel is the de facto SAN networking
architecture, although other network standards
could be used.
(Diagram comparing DAS, NAS, and FC-SAN: clients and servers reach storage directly in DAS, over the LAN in NAS, and through a Fibre Channel (FC) switch in FC-SAN.)
References:
http://en.wikipedia.org/wiki/Computer_cluster#Parallel_programming
http://electronicdesign.com/digital-ics/symmetric-multiprocessing-vs-asymmetric-processing
http://goparallel.sourceforge.net/embedded-goes-parallel/
E. Betti, M. Cesati, R. Gioiosa, and F. Piermaria, “A global operating system for HPC clusters,” in IEEE International Conference on Cluster
Computing and Workshops, 2009. CLUSTER ’09, 2009, pp. 1–10.
M. K. Gobbert, “Configuration and performance of a Beowulf cluster for large-scale scientific simulations,” Computing in Science Engineering,
vol. 7, no. 2, pp. 14–26, Mar. 2005.
I. Castaos, I. Garrido, A. Garrido, and G. Sevillano, “Design and implementation of an easy-to-use automated system to build Beowulf parallel
computing clusters,” in XXII International Symposium on Information, Communication and Automation Technologies, 2009. ICAT
2009, 2009, pp. 1–6.
M. S. Warren, E. H. Weigle, and W. Feng, “High-Density Computing: A 240-Processor Beowulf in One Cubic Meter,” in Supercomputing,
ACM/IEEE 2002 Conference, 2002, pp. 61–61.
S. Liang, V. Holmes, and I. Kureshi, “Hybrid Computer Cluster with High Flexibility,” in 2012 IEEE International Conference on Cluster Computing
Workshops (CLUSTER WORKSHOPS), 2012, pp. 128–135.
K. V. Sandhya and G. Raju, “Single System Image clustering using Kerrighed,” in 2011 Third International Conference on Advanced Computing
(ICoAC), 2011, pp. 260–264.
W. Luo, A. Xie, and W. Ruan, “The Construction and Test for a Small Beowulf Parallel Computing System,” in 2010 Third International
Symposium on Intelligent Information Technology and Security Informatics (IITSI), 2010, pp. 767–770.