IBM`s X10 - Parallel Programming Laboratory

Download Report

Transcript IBM`s X10 - Parallel Programming Laboratory

IBM’s X10
Presentation by Isaac Dooley
CS498LVK
Spring 2006
Sources:
• Report on the Experimental Language
X10, Vijay Saraswat
•
http://domino.research.ibm.com/comm/research_projects.nsf/pages/x10.index.html
• X10 Tutorial v10.pdf
Goal of X10
• “The fundamental goal of X10 is to
enable scalable, high-performance,
high-productivity transformational
programming for high-end computers -for traditional numerical computation
workloads… as well as commercial
server workloads.”
Description
• “X10 is a type-safe, modern, parallel,
distributed object-oriented language intended
to be very easily accessible to Java(TM)
programmers. It is targeted to future low-end
and high-end systems with nodes that are
built out of multi-core SMP chips with nonuniform memory hierarchies, and
interconnected in scalable cluster
configurations. …
Description
• … A member of the Partitioned Global
Address Space (PGAS) family of languages,
X10 highlights the explicit reification of locality
in the form of places; lightweight activities
embodied in async, future, foreach, and
ateach constructs; constructs for termination
detection (finish) and phased computation
(clocks); the use of lock-free synchronization
(atomic blocks); and the manipulation of
global arrays and data structures.”
Memory Model
• Not Distributed Memory, Not Shared
Memory
• “Fragmented Memory Model”
• Partitioned Global Address Space(GAS)
– Like UPC, Co-Array Fortran, Titanium
• Globally Asynchronous, Locally
Synchronous (GALS)
Language
• Extended Subset of Java
– Without Java Concurrency(threads,
monitors)
– With Distributed Arrays
– With new built-in types
– With Places, Activites, Clocks
Language
• Places
– Host some data
– Runs some activities(lightweight threads)
– Must spawn activities to access data at other
places
– Correspond to an Address Space (SMP node)
• Immutable data
• Freely copied between places on access
Language
• Clocks
– Used to order activities
• Distributed Arrays
– Support Collectives
Language
• Atomic Sections
– atomic S
– Excecute locally, only accessing local data
• Asynchronous Activities
– async (P) S
– future (P) E
Regions
• Distributions are maps from a region to
a subset of places
• A region [0:200,1:100] specifies a
collection of 2-D points
• Points are used as array indices
• Operations on regions are provided
– Union, disjunction, set difference
Deadlock, Data Races
• X10 guarantee −Any program written with
async, finish, atomic, foreach, ateach, and
clock parallel constructs will never deadlock
• Unrestricted use of future and force may lead
to deadlock, but Restricted use of future and
force in X10 can preserve guaranteed
freedom from deadlocks
• To eliminate Data Races, atomic methods
and blocks should be used
Example of “future”
public class TutFuture1 {
static int fib(final int n) {
if ( n <= 0 ) return 0;
else if ( n == 1 ) return 1;
else {
future<int> fn_1 = future { fib(n-1) };
fn_2 = future { fib(n-2) };
return fn_1.force() + fn_2.force();
}
} // fib()
public static void main(String[] args) {
System.out.println("fib(10) = " + fib(10));
}
}
future<int>
Example of recursive divide-and- conquer
parallelism --- calls to fib(n-1) and fib(n-2)
execute in parallel
Example of “async”
final int n=100;
…
finish {
async for (int i=1 ; i<=n ; i+=2 ) oddSum.val += i;
for (int j=2 ; j<=n ; j+=2 ) evenSum.val += j;
} // Both oddSum and evenSum have been computed now
System.out.println("oddSum = " + oddSum.val + " ; evenSum = " + evenSum.val);
• Parent activity creates new child to execute “for (int i=1 ;
i<=n ; i+=2 ) oddSum.val += i”
• An async statement returns immediately
• Parent execution proceeds immediately to next
statement
• Any access to parent’s local data must be through
final variables
Example of “atomic”
finish {
async for (int i=1 ; i<=n ; i+=2 ) {
double r = 1.0d / i ; atomic rSum += r; }
for (int j=2 ; j<=n ; j+=2 ) {
double r = 1.0d / j ; atomic rSum += r; }
}
An atomic statement/method is conceptually
executed in a single step, while other
activities are suspended − Note: programmer
does not manage any locks explicitly
An atomic section may not include − Blocking
operations − Creation of activities
Example of “atomic”
public class TutAtomic2 {
const int a = new boxedInt(100);
const int b = new boxedInt(100);
public static atomic void incr_a() { a.val++ ; b.val-- ; }
a.val-- ; b.val++ ; }
public static atomic void decr_a() {
public static void main(String args[]) {
int sum;
finish {
async for (int i=1 ; i<=10 ; i++ ) incr_a();
for (int i=1 ; i<=10 ; i++ ) decr_a();
}
atomic sum = a.val + b.val;
System.out.println("a+b = " + sum);
}
}
Console output: a+b = 200
Code for Jacobi 2-D
public class Jacobi {
const region R = [0:N+1, 0:N+1];
const region RInner= [1:N, 1:N];
const distribution D = distribution.factory.block(R);
const distribution DInner = D | RInner;
const distribution DBoundary = D - RInner;
double[D] B = new double[D] (point p[i,j])
{return DBoundary.contains(p) ? (N-1)/2 : N*(i-1)+(j-1); };
// Exploded Variable Declaration, implicitly defined for i,j
public boolean run() {
int iters = 0;
double err;
while(true) {
double[.] Temp = new double[DInner] (point [i,j])
{return (read(i+1,j)+read(i-1,j)+read(i,j+1)+read(i,j-1))/4.0; };
if((err=((B | DInner) - Temp).abs().sum()) < epsilon) break;
B.update(Temp);
iters++;
}}
public double read(final int i, final int j) {
return future(D[i,j]) B[i,j].force();
}
public static void main(String args[]) {
boolean b= (new Jacobi()).run();
System.out.println("++++++ " + (b? "Test succeeded." :"Test failed."));
System.exit(b?0:1);
}