Parallel programming based on master
Download
Report
Transcript Parallel programming based on master
On Parallel Computation of
exp(x)
based on
Master-Worker Paradigm
Keiichi Shiraishi (Kagawa N.C.T.)
Yoshiro Imai (Kagawa University)
Welcome to Japan
Welcome to
Takuma Campus
I’m looking forward to
discussing on Computer
Science.
BACKGROUNDS
Processors : Clock up --> Multi-core
Multi-core processors are not expensive.
Core i7, Cell, GPU, etc.
PC clusters, Super Computers and Cloud
Computing are a kind of parallel computing.
Using multi-core processors effectively is
important.
Teaching materials for parallel computing
And…
My research area : Computer Algebra System,
e-Learning and Instructional Design
CONTENTS
Parallel computation of exp(x)
Embarassingly parallel computation and Masterworker paradigm
How to compute/parallelize exp(x)
Algorithms
Experiments
Discussions
Conclusion
EMBARASSINGLY PARALLEL
COMPUTATIONS
Embarassingly parallel computations - no
dependency or communication exists between
parallel tasks
Master-worker paradigm is suitable.
Tasks
Results
Computing/Processing
NUMERICAL APPROXIMATION OF
exp(x)
Worker 1
Worker 2
Worker 3
Worker M
Equation (3) will be divided to M groups of terms and allocated them to
each worker process.
N 1
L
M
NUMERICAL APPROXIMATION OF
exp(x)
In these group, the last terms of the former groups are appeared in the
coefficient of the following other groups, e.g. xL-1/(L-1)! is appeared in all
groups.
PARALLEL COMPUTATION OF
exp(x) (MASTER)
PARALLEL COMPUTATION OF
exp(x) (WORKER)
TEST-BED OF EXPERIMENTS
PC
Processor
Intel(R) Core(TM) i7 860 2.80GHz
4cores (8HT)
Memory
3GB
OS
FreeBSD/i386 8.0-RELEASE
Computer Algebra
System
Risa/Asir 20070806
Number of Workers
1~8
Compute with multiple precision
integer/rational numbers (very slow)
ELAPSED TIME (exp(1), N=1000)
Communicaton
time is needed.
Sequential
Parallel
Number of
workers
increases.
Number of
digits of
return value
decreses.
MAXIMUM NUMBER OF DIGITS OF
NUMERATOR/DENOMINATOR OF
RETURN VALUE FROM WORKERS
Number of worker
processes
Maximum number of digits of
numerator/denominator of return value
1
2565
2
1437
3
969
4
735
5
594
6
492
7
425
8
375
If the number of digits is twice, it would needs 4
times multiplication.
DISCUSSIONS
From the viewpoint of numerical computation, a
computation with N=1000 isn’t needed because
one with about N=10 makes the results sufficient.
There are some numerical approach, e.g. C
standard library’s exp(x), Stirling's
approximation for n!.
For teaching material, this approach would be
good because the tasks allocated each worker
processes have some dependencies. It is more
difficult than to compute the circle ratio.
CONCLUSIONS
Parallel computation of exp(x) is illustrated.
As number of worker processes increase, the
completion time decreases.
As number of digits of numerator/denominator of
return value from worker processes decrease, the
completion time decreases.
Future works
Evaluation the problem as teaching material
exp(x) by C STANDARD LIBRARY
SEQUENTIAL COMPUTATION OF
exp(x)
NUMERICAL APPROXIMATION OF
THE CIRCLE RATIO
1
4
dx
2
1 x
0
4 N 1
1
N i 0 1 ((i 0.5) / N ) 2
4 99
1
100 i 0 1 ((i 0.5) / 100) 2
99
4 49
1
1
2
2
100 i 0 1 ((i 0.5) / 100) i 50 1 ((i 0.5) / 100)
N=100, # of workers is 2.
Tasks are independent each other.
They can be allocated to 2 workers.
TEST-BED OF EXPERIMENTS
PC
PLAYSTATION3
Xeon X3210 2.13GHz
Cell B.E. 3.2GHz
4cores
PPE, 7SPEs
SPE Local Store 256KB
EIB 307.2GB/s
Memory
3GB
256MB
OS
FreeBSD/amd64 7.1RELEASE
Yellow Dog Linux 6.2
CAS,
Library
Risa/Asir 20070806
libspe-2.2.80-132
Processor
ELAPSED TIME (N=50,000,000)
Elapsed Time(Risa/Asir)
# of workers
1
2
3
4
Elapsed time[s]
37.902
19.627
13.787
11.056
Speedup ratio*
1
1.931
2.749
3.428
Elapsed Time(SPE Library)
# of workers
1
2
3
4
5
6
Elapsed time[s]
6.037
3.023
2.019
1.518
1.217
1.017
Speedup ratio*
1
1.997
2.990
3.978
4.959
5.934
* Speedup ratio is the ratio of elapsed time with 1
worker to one with n workers.