
MSCS6060 Parallel and
Distributed Systems
Lecture 1 Introduction
1
Goal of the Course
We study architectures, algorithms, and programming
paradigms for parallel and distributed systems.
“A distributed system consists of a collection of
autonomous computers linked to a computer network
and equipped with distributed system software.”
“Distributed systems is a term used to define a wide
range of computer systems from a weakly-coupled
system such as wide area networks, to very strongly
coupled systems such as multiprocessor systems.”
2
Parallel vs. Distributed System
Within the context of concurrently computing a single
job that takes a long time or is infeasible on one
processor:
• strongly-coupled example: multiprocessor systems
– Multiple processors, each with its own cache
– Data exchange through shared memory among processors
– Single operating system
• loosely-coupled example: Google data center
– Multiple, independent workstations
– Connected by networks
3
Parallel Computing
A single job is concurrently executed by multiple
processors for shorter time-to-solution. The
system might be
• A single workstation with a multi-core processor
– Each core executes parts of the job
– Data exchange through shared memory
• A cluster of computers connected by a high-speed
network
– All cores in the cluster participate in execution
– Data exchange through message passing across
network
4
Distributed Computing
Within the context where concurrency is not usually
required for a single job.
• Peer-to-peer (P2P) computing systems
– Participants that make a portion of their resources
directly available to other network participants
– Example: Napster
• Mobile computing network
• Distributed transactions
5
Administrative Issues
• Timing:
– Class: TTH 5:00-6:15PM
– Office Hours: TTH 10:30AM-12:00PM at CU320 or by
appointment
• Course website
– http://www.mscs.mu.edu/~rge/mscs6060
• Textbook:
– An Introduction to Parallel Computing (Ed. 2) by Ananth
Grama et al
– Designing and Building Parallel Programs (Online) by Ian
Foster.
• Exams
– Midterm: Oct 18
– Take home final: 5/13/2010, Tuesday
6
Administrative Issues II
• Grading
– Reading -- 10%
– Programming assignment -- 25%
– Term projects -- 25%
– Midterm exam I -- 20%
– Final exam -- 20%
• Late assignment
– Each student is allowed to submit one of his/her
assignments/projects within 48 hours of the due time. All
other assignments and projects must be submitted on
time.
• Course policies
7
About Myself
• Name: Rong Ge
• PhD from Virginia Tech
• Research areas:
– Parallel and distributed systems
– High performance computing
– Energy-efficient computing
– Performance analysis and evaluation
8
Today’s Lecture
• Big picture
– Why do we build parallel and distributed systems?
9
Non-Parallel Distributed Systems
[Diagram: several terminals connected to a single server; a sequential system, not parallel]
The server machine is a high-performance host running
one or more server programs which share its resources
with clients.
The clients do not share any of their resources, but
request the server's content or service functions.
10
Free Lunch is Over
11
Parallel Systems Are Ubiquitous Today
• Processors ranging from high end in servers to
low power in mobile systems
– CMP technology with multiple cores
• High end server processors Xeon and Opteron
– Quad-core or six-core processors
• Atom processors in netbooks
– Dual-core
• Processors in cell phones
– Dual-core
12
Multicore Processors
13
A Multiprocessor, Tightly-Coupled Parallel System
14
A Loosely-Coupled Computer Cluster
15
A Distributed System with Multiple Servers
[Diagram: a distributed system composed of multiple interconnected servers]
Traditional Uses
• Numerical simulations of complex systems and
"Grand Challenge Problems" such as:
– weather and climate
– chemical and nuclear reactions
– biological, human genome
– geological, seismic activity
– mechanical devices - from prosthetics to spacecraft
– electronic circuits
– manufacturing processes
18
Emerging Uses
• Amazon
• Google
• Airline reservation systems
• Finance: e-transactions, e-banking, stock exchange
• Military
• Advanced graphics and virtual reality
• Networked video and multimedia technologies
19
Why Parallel and Distributed Systems
• Computational speedup
– Some tasks too large for even the fastest single
computer
• Real-time weather/climate modeling, human
genome project, fluid turbulence modeling, etc.
• Resource sharing
– Distributed systems offer access to specialized
resources of many systems
• Some nodes may have special databases/info
• Some nodes may have access to special hardware
• Reliability and availability
– Redundancy, fail over
20
Why not distributed systems?
What are the disadvantages?
Parallel vs. sequential? Distributed vs. centralized? Multi-server vs. client-server?
• Expensive (to have high computation power and
redundancy)
• Complexity, maintenance cost
• Concurrency => Interleaving => Bugs
• Failures lead to incorrectness.
21
The Scope of This Course
• System architectures
• Parallel computing paradigms
– POSIX multithreading
– OpenMP
– Message passing
– UPC if time allows
– Heterogeneous programming
• Parallel algorithm design and analysis
• Distributed system design technology
– Google File System
– Hadoop MapReduce
– Amazon storage systems
• Peer to peer computing
• Cloud computing
22
Summary
• What are parallel and distributed systems?
• What are they good for?
23
Multiprogramming and
Multithreading
24
5 Components of Any Computer
• Processor (active)
– Control (the "brain")
– Datapath (the "brawn")
• Memory (passive): where programs and data live when running
• Devices
– Input: keyboard, mouse
– Output: display, printer
– Disk (both input and output): where programs and data live when not running
Levels of Abstraction for a Computer
[Diagram: levels of abstraction, top to bottom. Software: Applications, Compiler, Assembler, Operating System. The Instruction Set Architecture is the interface between software and hardware. Hardware: Processor, Memory, I/O system.]
The OS manages resources and schedules processes.
26
History of Computing Systems
• Phase 1: hardware is expensive, humans are
cheap
– User at console: single-user systems
– Batching systems (1955-1965)
– Multiprogramming systems (1965-1980)
• Phase 2: hardware is cheap, humans are
expensive
– Time sharing: users use cheap terminal and share
servers
• Phase 3: hardware is very cheap, humans are
very expensive
– Personal computing: one system per user
– Distributed computing: lots of systems per user
27
Multiprogramming
• Multiple sequential processes in a run state can be
swapped in and out by the OS
– Run state: the executable file and data are held in memory
– Better utilization of CPU
– High throughput
– Short timespan for a single program
• IO interrupt
– While I/O is in progress for program P1, the computer will
execute several thousand instructions of program P2 and return
to process the data obtained for P1 when it is ready or when P2
is waiting on I/O
• Time sharing
– The execution of program P1 will also be interrupted when its time slice expires
Process
• A process is an abstraction that supports running
programs
• A process is the basic unit of execution in an operating
system
• Different processes may run several instances of the same
program
• At a minimum, process execution requires
– Memory to contain the program file and data
– A set of CPU registers to support execution
29
Anatomy of a Process
• Executable file
– Header
– Code
– Initialized data
• Process’s address space
• Process control block
30
CPU Switch From Process to Process
31
Costs of Concurrency Transparency
• Independent processes cannot maliciously or
inadvertently affect the correctness of each other’s
behaviors.
• Cost of concurrency transparency
– Creation: OS must create a complete independent address
space
• Initialize memory segments
• Copy the associated program into a text segment
• Set up a stack for temporary data
– Switching:
• Save the CPU context (register values, program counter,
stack pointer, …)
• Modify registers of the MMU
• Invalidate address translation caches
• Swap processes between main memory and disk
32
Single and Multithreaded Processes
33
Threads vs. Processes
Thread:
• A thread has no data segment or heap
• A thread cannot live on its own; it must live within a
process
• There can be more than one thread in a process; the
first thread calls main and has the process's stack
• Inexpensive creation
• Inexpensive context switching
• If a thread dies, its stack is reclaimed by the process
Processes:
• A process has code/data/heap and other segments
• There must be at least one thread in a process
• Threads within a process share code/data/heap and
I/O, but each has its own stack and registers
• Expensive creation
• Expensive context switching
• If a process dies, its resources are reclaimed and all its
threads die
34
Thread Implementation
• Process defines the address space
• Threads share the address space
• Process Control Block (PCB)
contains process-specific info
– PID, owner, heap pointer, active
threads, and pointers to thread info
• Thread Control Block (TCB)
contains thread-specific info
– Stack pointer, PC, thread state,
registers, ...
[Diagram: a TCB for each thread ($pc, $sp, state, registers) alongside the process's address space: reserved region, DLLs, a stack per thread, heap, initialized data, code]
35
Benefits of Threading
• Responsiveness
– When one thread is blocked, your browser still responds
– E.g. download images while allowing your interaction
• Resource Sharing
– Share the same address space
– Reduce overhead (e.g. memory)
• Economy
– Creating a new process costs memory and resources
– E.g. in Solaris, creating a process is about 30 times slower
than creating a thread
• Utilization of MP Architectures
– Threads can be executed in parallel on multiple processors
– Increase concurrency and throughput
36
Examples of Threads
• A web browser
– One thread displays images
– One thread retrieves data from network
• A word processor
– One thread displays graphics
– One thread reads keystrokes
– One thread performs spell checking in the background
• A web server
– One thread accepts requests
– When a request comes in, a separate thread is created to
service it
– Many threads to support thousands of client requests
37
Once Again....
A PROCESS has:
• Process ID
• Program Counter
• Signal Dispatch Table
• Registers
• Process Priority
• Stack Pointer & Stack
• Heap
• Memory Map
• File Descriptor Table
A THREAD has:
• Thread ID
• Program Counter
• Signal Dispatch Table
• Registers
• Thread Priority
• Stack Pointer & Stack
All threads share the same memory, heap, and file handles (and offsets).
What does the developer have
to do?
• Decide how to decompose the computation into
parallel parts.
• Create and destroy threads to support the
decomposition
• Add synchronization to make sure dependences
are covered.
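The following is a hedged sketch of this workflow, not taken from the slides: the computation (summing an array) is decomposed into slices, one thread is created per slice, and a join per thread provides the synchronization before the partial sums are combined. Names such as sum_slice, partial, and NTHREADS are invented for illustration; the Pthreads calls used here are introduced in the slides that follow.

/* Hypothetical sketch: decompose an array sum across NTHREADS threads.
 * Each thread sums its own slice into partial[id]; main joins all
 * threads and combines the partial results, so no locks are needed.
 * Compile (e.g.): gcc sum.c -lpthread */
#include <pthread.h>
#include <stdio.h>

#define N        1000
#define NTHREADS 4

static double array[N];          /* shared input, read-only during the sum */
static double partial[NTHREADS]; /* one result slot per thread             */

static void *sum_slice(void *arg) {
    long id = (long)arg;                       /* which slice this thread owns */
    int begin = id * (N / NTHREADS);
    int end = (id == NTHREADS - 1) ? N : begin + N / NTHREADS;
    double s = 0.0;
    for (int i = begin; i < end; i++)
        s += array[i];
    partial[id] = s;
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    for (int i = 0; i < N; i++)
        array[i] = 1.0;                        /* sample data: sum should be N */

    for (long t = 0; t < NTHREADS; t++)        /* create one thread per slice  */
        pthread_create(&tid[t], NULL, sum_slice, (void *)t);

    double sum = 0.0;
    for (long t = 0; t < NTHREADS; t++) {      /* synchronize, then combine    */
        pthread_join(tid[t], NULL);
        sum += partial[t];
    }
    printf("sum = %f\n", sum);
    return 0;
}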
What’s POSIX Got To Do With It?
• Each OS had its own thread library and style
• That made writing multithreaded programs difficult
because:
– you had to learn a new API with each new OS
– you had to modify your code with each port to a
new OS
• POSIX (IEEE 1003.1c-1995) provided a standard
known as Pthreads
Pthreads: A shared memory
programming model
• POSIX standard shared memory multithreading
interface.
• Not just for parallel programming, but for general
multithreaded programming
• Provides primitives for thread management and
synchronization.
• Threads are commonly associated with shared
memory architectures and operating systems.
– Necessary for unleashing the computing power of SMT and
CMP processors.
– Making it easy and efficient is very important at this time.
Pthreads: execution model
• A single process can have multiple, concurrent
execution paths.
– a.out creates a number of threads that can be
scheduled and run concurrently.
– Each thread has local data, but also, shares the entire
resources (global data) of a.out.
– Any thread can execute any subroutine at the same
time as other threads.
– Threads communicate through global memory.
Summary
• Process vs. thread
• Multiprogramming vs. multithreading
43
Pthreads: POSIX Threads
• Pthreads is a standard set of C library functions for
multithreaded programming
– IEEE Portable Operating System Interface, POSIX, section
1003.1 standard, 1995
• Pthread Library (60+ functions)
– Thread management: create, exit, detach, join, . . .
– Thread cancellation
– Mutex locks: init, destroy, lock, unlock, . . .
– Condition variables: init, destroy, wait, timed wait, . . .
– ...
• Programs must include the file pthread.h
• Programs must be linked with the pthread library (-lpthread)
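As a hedged sketch not taken from the slides, the program below exercises a few of these families together: thread management (create, join), a mutex lock, and a condition variable on which the main thread waits until a second thread signals. The names producer, ready, m, and c are invented for illustration; compiling it needs the pthread.h header and the -lpthread link flag noted above.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
static int ready = 0;                    /* shared flag, protected by m */

static void *producer(void *arg) {
    pthread_mutex_lock(&m);
    ready = 1;                           /* publish the condition */
    pthread_cond_signal(&c);             /* wake one waiting thread */
    pthread_mutex_unlock(&m);
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, producer, NULL);

    pthread_mutex_lock(&m);
    while (!ready)                       /* loop guards against spurious wakeups */
        pthread_cond_wait(&c, &m);       /* atomically releases m while waiting  */
    pthread_mutex_unlock(&m);

    printf("condition observed\n");
    pthread_join(t, NULL);
    return 0;
}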
Pthreads Naming Convention
• Types: pthread[_object]_t
• Functions: pthread[_object]_action
• Constants/Macros: PTHREAD_PURPOSE
• Examples:
– pthread_t: the type of a thread
– pthread_create(): creates a thread
– pthread_mutex_t: the type of a mutex lock
– pthread_mutex_lock(): lock a mutex
pthread_create()
• Creates a new thread
int pthread_create(
    pthread_t *thread,
    pthread_attr_t *attr,
    void * (*start_routine) (void *),
    void *arg);
– Returns 0 to indicate success, otherwise returns error code
– thread: output argument for the id of the new thread
– attr: input argument that specifies the attributes of the thread to
be created (NULL = default attributes)
– start_routine: function to use as the start of the new thread
• must have prototype: void * foo(void*)
– arg: argument to pass to the new thread routine
• If the thread routine requires multiple arguments, they must be passed
bundled up in an array or a structure
pthread_create() example
• Want to create a thread to compute the sum of the elements
of an array
void *do_work(void *arg);
• Needs three arguments
– the array, its size, where to store the sum
– we need to bundle them in a structure
struct arguments {
    double *array;
    int size;
    double *sum;
};
pthread_create() example
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    double array[100];
    double sum;
    pthread_t worker_thread;
    struct arguments *arg;
    arg = (struct arguments *)calloc(1,
                       sizeof(struct arguments));
    arg->array = array;
    arg->size = 100;
    arg->sum = &sum;
    if (pthread_create(&worker_thread, NULL,
                       do_work, (void *)arg)) {
        fprintf(stderr, "Error while creating thread\n");
        exit(1);
    }
    ...
}
pthread_create() example
void *do_work(void *arg) {
    struct arguments *argument;
    int i, size;
    double *array;
    double *sum;
    argument = (struct arguments *)arg;
    size = argument->size;
    array = argument->array;
    sum = argument->sum;
    *sum = 0;
    for (i = 0; i < size; i++)
        *sum += array[i];
    return NULL;
}
Comments about the example
• The “main thread” continues its normal execution
after creating the “child thread”
• IMPORTANT: If the main thread terminates, then all
threads are killed!
– We will see that there is a join() function
• Of course, memory is shared by the parent and the
child (the array, the location of the sum)
– nothing prevents the parent from doing something to it
while the child is still executing
– which may lead to a wrong computation
– we will see that Pthreads provide locking mechanisms
• The bundling and unbundling of arguments is a bit
tedious
pthread_exit()
• Terminates the calling thread
void pthread_exit(void *retval);
– The return value is made available to another thread
calling a pthread_join() (introduced later)
– My previous example had the thread just return from
function do_work()
• In this case the call to pthread_exit() is implicit
• The return value of the function serves as the argument to the
(implicitly called) pthread_exit().
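A small hedged sketch, not from the slides, of the round trip described above: the thread passes a heap-allocated result to pthread_exit(), and the main thread retrieves it with pthread_join() (covered next). The function compute and its values are invented for illustration.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static void *compute(void *arg) {
    int *result = malloc(sizeof(int));    /* heap storage outlives the thread     */
    *result = 2 * *(int *)arg;
    pthread_exit(result);                 /* same effect as "return result;" here */
}

int main(void) {
    pthread_t tid;
    int input = 21;
    void *retval;

    if (pthread_create(&tid, NULL, compute, &input)) {
        fprintf(stderr, "Error while creating thread\n");
        exit(1);
    }
    pthread_join(tid, &retval);           /* blocks until compute() has exited    */
    printf("result = %d\n", *(int *)retval);
    free(retval);
    return 0;
}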