Programming Grid Applications
with GRID Superscalar
[ Journal of Grid Computing, Volume 1, Issue 2, 2003. ]
Presenter : Juan Carlos Martinez
Agnostic : Allen Lee
Authors : Rosa M. Badia, Jesús Labarta, Raül Sirvent,
Josep M. Pérez, José M. Cela and Rogeli Grima
Overview
What’s Grid Superscalar and what is its behavior?
Overview
Globus Toolkit
Grid Superscalar
• Basic idea: it promotes the ease of programming GRID applications
[Figure: superscalar processor floorplan (FPU, FXU, ISU, IDU, LSU, IFU, BXU, L2, L3 Directory/Control) shown next to a grid; time scale: ns → seconds/minutes/hours]
http://www.bsc.es/grid/grid_superscalar/documents/FIU_seminar.pdf
Overview
Grid Superscalar
Objective
• Reduce the development complexity of Grid applications to a minimum
– Writing a computational Grid application should be as easy as writing a sequential one
• Target applications: composed of tasks
– Task granularity is at the level of simulations or programs
– Data objects are files
Overview
Let’s see how it works…
/* The file-name arguments below are the input/output files */
for (int i = 0; i < MAXITER; i++) {
    newBWd = GenerateRandom();
    subst (referenceCFG, newBWd, newCFG);
    dimemas (newCFG, traceFile, DimemasOUT);
    post (newBWd, DimemasOUT, FinalOUT);
    if (i % 3 == 0) Display(FinalOUT);
}
fd = GS_Open(FinalOUT, R);
printf("Results file:\n");
present (fd);
GS_Close(fd);
http://www.bsc.es/grid/grid_superscalar/documents/FIU_seminar.pdf
How It Works
For this, let’s look at a specific example: a Java program named
Matmul that multiplies two matrices.
Matmul
A sequential Java code that creates two hypermatrices (4 matrices inside
each one) and multiplies the 4 blocks of one by the 4 blocks of the other,
all at runtime.
With Grid Superscalar, this code runs in parallel.
Let’s understand this better…
Looking at matmul
Getting Started!!!
File Structure
C applications:
1. <myapplication>.idl
2. <myapplication>.c (main program)
3. <myapplication>-functions.c (functions to be executed on the grid)
Java applications:
1. <myapplication>.idl
2. <arbitraryname>.java (main program; its name can differ from the application name)
3. <myapplication>Impl.java (functions/methods to be executed on the grid)
http://www.bsc.es/grid/grid_superscalar/documents/ssh_gridsuperscalar_quick_tutorial.pdf
How It Works
For Matmul we basically have two folders and two XML files.
matmul_java_master: App.java, Matmul.idl, project.gsdeploy@
Matmul_java_worker: Matmul.idl, MatmulImpl.java, Block.java, MatmulAppException.java
Matmul_java (the XML file itself)
project.gsdeploy@ → Matmul_java
The IDL file
Matmul.idl
interface CHOLESKY {
void multiply_accumulative ( inout File f3, in File f1, in
File f2 );
};
The XML file for Matmul_java
<?xml version="1.0" encoding="UTF-8"?>
<project isSimple="yes" masterBandwidth="100000" masterBuildScript=""
masterInstallDir="/home/lion-e/globus2/matmul_java_master"
masterName="la-blade-01.cs.fiu.edu"
masterSourceDir="/a/lion.cs.fiu.edu./disk/216/e/globus2/matmul_java_master"
name="Matmul"
workerBuildScript=""
workerSourceDir="/a/lion.cs.fiu.edu./disk/216/e/globus2/matmul_java_worker">
<disks>
<disk name="_MasterDisk_"/>
<disk name="_WorkingDisk_la-blade-02_cs_fiu_edu_"/>
<disk name="_WorkingDisk_la-blade-03_cs_fiu_edu_"/>
</disks>
<directories>
<directory disk="_MasterDisk_" isWorkingPath="yes" path="/home/lione/globus2/matmul_java_master"/>
</directories>
<workers>
<worker Arch="" GFlops="1.0" LimitOfJobs="1" Mem="16" NCPUs="1" NetKbps="100000"
OpSys="" Queue="none" Quota="0" deploymentStatus="deployed"
installDir="/home/lion-e/globus2/matmul_java_worker" name="la-blade-02.cs.fiu.edu">
<directories>
<directory disk="_WorkingDisk_la-blade-02_cs_fiu_edu_" isWorkingPath="yes"
path="/home/lion-e/globus2/matmul_java_worker"/>
</directories>
</worker>
….
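The worker attributes in this file (GFlops, LimitOfJobs, NCPUs and so on) are what the runtime knows about each host. As a rough illustration of how such a file could be read, here is a minimal Java sketch using the standard DOM parser. The class name `ProjectFileSketch`, the host names and the trimmed XML string are made up for this example; the real file above is truncated and is not reproduced here, and this is not how GRID superscalar itself loads the file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ProjectFileSketch {
    // Trimmed, well-formed stand-in for a project file (illustrative values only).
    static final String SAMPLE =
        "<project name=\"Matmul\" masterName=\"master.example.org\">"
      + "<workers>"
      + "<worker name=\"worker-a.example.org\" GFlops=\"1.0\" LimitOfJobs=\"1\" NCPUs=\"1\"/>"
      + "<worker name=\"worker-b.example.org\" GFlops=\"2.0\" LimitOfJobs=\"2\" NCPUs=\"2\"/>"
      + "</workers></project>";

    // Returns one summary line per <worker> element found in the XML text.
    static List<String> describeWorkers(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList workers = doc.getElementsByTagName("worker");
            List<String> out = new ArrayList<>();
            for (int i = 0; i < workers.getLength(); i++) {
                Element w = (Element) workers.item(i);
                out.add(w.getAttribute("name")
                    + " GFlops=" + w.getAttribute("GFlops")
                    + " LimitOfJobs=" + w.getAttribute("LimitOfJobs"));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse project file", e);
        }
    }

    public static void main(String[] args) {
        describeWorkers(SAMPLE).forEach(System.out::println);
    }
}
```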
The Deployment Center
Adding Hosts
Selecting hosts for a specific project
Deploying our application in the workers…
Building the master
Inside the master folder, execute:
gsjavabuild master Matmul
Building the worker
Inside the worker folder, execute:
gsjavabuild worker Matmul
After that, the application is ready to run (deployed).
Want the generated source files?
gsstubgen -j Matmul.idl
Files created when deploying with gsjavabuild…
• matmul_java_master
Original files:
• App.java
• Matmul.idl
• project.gsdeploy
Created files:
• App.class
• ConstraintsWrapper.class
• Matmul.class
• MatmulConstraints.class
• MatmulConstraintsInterface.class
• MatmulOps.class
Files created when deploying with gsjavabuild…
• Matmul_java_worker
Original files:
• Block.java
• MatmulAppException.java
• Matmul.idl
• MatmulImpl.java
Created files:
• workerGS.sh.in
• Block.class
• MatmulAppException.class
• MatmulImpl.class
• MatmulOps.class
• Worker.class
• workerGS.sh
Interaction
App.java
import java.io.FileNotFoundException;
import java.io.IOException;

public class App
{
    private final int MSIZE = 2;
    private final int BSIZE = 64;
    private String [ ][ ]_A;
    private String [ ][ ]_B;
    private String [ ][ ]_C;

    public void Run ()
    {
        initialize_variables(); // initialize arrays holding the actual array names
        try
        {
            fill_matrices();
        }
        catch ( IOException ioe )
        {
            ioe.printStackTrace();
            return;
        }
        GSMaster.On();
        for (int i = 0; i < MSIZE; i++)
            for (int j = 0; j < MSIZE; j++)
                for (int k = 0; k < MSIZE; k++)
                    Matmul.multiply_accumulative( _C[i][j], _A[i][k], _B[k][j] );
        GSMaster.Off(0);
    }

    private void initialize_variables ()
    {
        …
    }

    private void fill_matrices () throws FileNotFoundException, IOException
    { ….
    }

    public static void main(String args[ ])
    {
        (new App()).Run();
    }
}
What’s GSMaster.java?
The GSMaster class calls native C functions,
which are implemented in the file GS.cc:
GSMaster.On() → GS_ON()
GSMaster.Off() → GS_OFF()
GS_ON()
Checks the environment variables.
Activates Globus modules such as:
globus_l_module_activate(GLOBUS_COMMON_MODULE);
globus_l_module_activate(GLOBUS_XIO_MODULE);
globus_l_module_activate(GLOBUS_FTP_CLIENT_MODULE);
….
Creates folders for the debugging files that are written if the GS_DEBUG
environment variable is set. This is done with:
res = globus_gram_client_job_request(….);
pre_ws_gram (GT2)***
In other words, it leaves everything prepared in the Grid so that, when the execution comes,
Globus will allow it.
GS_OFF()
Basically does the opposite of GS_ON(): it frees the resources that were created by
GS_ON(), for example:
resul = globus_module_deactivate(GLOBUS_COMMON_MODULE);
resul = globus_module_deactivate(GLOBUS_XIO_MODULE);
resul = globus_module_deactivate(GLOBUS_FTP_CLIENT_MODULE);
…..
To delete files it uses:
res = globus_gram_client_job_request(….);
pre_ws_gram (GT2)***
Again on App.java
public class App
{
private final int MSIZE = 2;
private String [][]_A;
private final int BSIZE = 64;
private String [][]_B;
private String [][]_C;
public void Run ()
{
initialize_variables();
// initialize arrays holding the actual array names
try
{ fill_matrices();
}
catch ( IOException ioe )
{ ioe.printStackTrace();
return;
}
GSMaster.On();
for (int i = 0; i < MSIZE; i++)
for (int j = 0; j < MSIZE; j++)
for (int k = 0; k < MSIZE; k++)
Matmul.multiply_accumulative( _C[i][j], _A[i][k], _B[k][j] );
GSMaster.Off(0);
}
private void initialize_variables ()
{
…
}
private void fill_matrices () throws FileNotFoundException, IOException
{ ….
}
public static void main(String args[])
{
(new App()).Run();
}
}
Matmul.java
/* This file has been autogenerated from 'Matmul.idl'. */
/* CHANGES TO THIS FILE WILL BE LOST */
public class Matmul implements MatmulOps
{
public static void multiply_accumulative(String f3, String f1, String f2)
{
/* Marshalling/Demarshalling buffers */
/* Parameter marshalling */
String pars[] = new String[4];
pars[0] = f3;
pars[1] = f1;
pars[2] = f2;
pars[3] = f3;
GSMaster.Execute(multiply_accumulativeOp, 3, 0, 1, 0, pars);
}
}
ws_gram GT4***
Execution Itself…
Again GS.cc
GSMaster.Execute → Execute (from GS.cc)
Execute → SubmitShortcuts → DoSubmit
“Execute function: interface GS – Globus”
DoSubmit
• Checks data dependencies (queue)
• Adds the task to the list of running tasks
• Instruction used to submit the task:
res = globus_wsgram_job_submit(namehost[Task->Machine], rsl,
    &Task->input, &Task->monitor, &engine, globus_l_notify_cb);
***GT4***
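To make the "data dependencies (queue)" step more concrete, here is a minimal Java sketch of one way a runtime could hold tasks back until the files they need have been produced. It rebuilds the eight Matmul calls from App.java, links each task to the last writer of its files, and releases tasks in waves. The names `ReadyQueueSketch`, `Task`, `matmulTasks` and `waves` are hypothetical, the file names are placeholders, and the logic is an assumption for illustration only: it is not taken from GS.cc, and it ignores write-after-read dependencies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReadyQueueSketch {
    // A task reads its `in` files and both reads and writes its `inout` file.
    static final class Task {
        final String name;
        final String inout;
        final String[] in;
        final List<Task> successors = new ArrayList<>();
        int pending = 0; // number of unfinished predecessors

        Task(String name, String inout, String... in) {
            this.name = name;
            this.inout = inout;
            this.in = in;
        }
    }

    // Records that `succ` must wait for `pred` (ignoring duplicates).
    static void link(Task pred, Task succ) {
        if (pred != null && pred != succ && !pred.successors.contains(succ)) {
            pred.successors.add(succ);
            succ.pending++;
        }
    }

    // Builds the tasks issued by the triple loop in App.java, in program order.
    static List<Task> matmulTasks(int msize) {
        List<Task> tasks = new ArrayList<>();
        Map<String, Task> lastWriter = new HashMap<>();
        for (int i = 0; i < msize; i++)
            for (int j = 0; j < msize; j++)
                for (int k = 0; k < msize; k++) {
                    Task t = new Task("t" + i + j + k,
                        "C" + i + j, "A" + i + k, "B" + k + j);
                    link(lastWriter.get(t.inout), t);
                    for (String f : t.in) link(lastWriter.get(f), t);
                    lastWriter.put(t.inout, t);
                    tasks.add(t);
                }
        return tasks;
    }

    // Groups task names into waves; every task in a wave has no unfinished predecessor.
    static List<List<String>> waves(List<Task> tasks) {
        List<List<String>> result = new ArrayList<>();
        List<Task> ready = new ArrayList<>();
        for (Task t : tasks) if (t.pending == 0) ready.add(t);
        while (!ready.isEmpty()) {
            List<String> names = new ArrayList<>();
            List<Task> next = new ArrayList<>();
            for (Task t : ready) {
                names.add(t.name);
                for (Task s : t.successors) if (--s.pending == 0) next.add(s);
            }
            result.add(names);
            ready = next;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(waves(matmulTasks(2)));
    }
}
```

With MSIZE = 2 this yields two waves of four tasks each: the four k = 0 tasks touch different C blocks and can run at once, while each k = 1 task waits for the k = 0 task that updates the same C block.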
Interaction
MatmulOps.java
/* This file has been autogenerated from 'Matmul.idl'. */
/* CHANGES TO THIS FILE WILL BE LOST */
public interface MatmulOps
{
int multiply_accumulativeOp = 0;
}
Interaction
Worker.java
/* This file has been autogenerated from 'Matmul.idl'. */
/* CHANGES TO THIS FILE WILL BE LOST */
public class Worker implements MatmulOps
{
    public static void main(String args[])
    {
        int opCod;
        if (args.length < 6)
        {
            System.out.println("ERROR: Wrong arguments list passed to the worker\n");
            System.exit(1);
        }
        opCod = Integer.parseInt(args[1]);
        GSWorker.IniWorker(args);
        switch (opCod)
        {
            case multiply_accumulativeOp:
                // Local call
                MatmulImpl.multiply_accumulative(args[5], args[3], args[4]);
                break;
        }
        GSWorker.EndWorker(args);
    }
}
As we remember, MatmulImpl.java was in the worker folder from the start, so what
we are doing now is a local call.
Remember the folder contents:
Matmul_java_worker: Matmul.idl, MatmulImpl.java, Block.java, MatmulAppException.java
public class MatmulImpl
{
public static void multiply_accumulative( String f3, String f1, String f2 )
{ Block a = new Block( f1 );
Block b = new Block( f2 );
Block c = new Block( f3 );
c.multiplyAccum( a, b );
try
{ c.blockToDisk( f3 );
}
catch ( MatmulAppException ce )
{
System.err.println( ce.getMessage() );
GSWorker.SetResult(-1);
return;
}
}
}
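Block.java is not shown in the slides. As a rough idea of what the multiply-accumulate step computes, here is a minimal in-memory Java sketch of C += A × B for square blocks. The class `BlockSketch` and its `data` field are hypothetical stand-ins; the real Block class also reads blocks from files and writes them back with blockToDisk, which is left out here.

```java
class BlockSketch {
    final double[][] data;

    BlockSketch(double[][] data) {
        this.data = data;
    }

    // Adds the product a * b into this block (all blocks square, same size).
    void multiplyAccum(BlockSketch a, BlockSketch b) {
        int n = data.length;
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                double aik = a.data[i][k];
                for (int j = 0; j < n; j++) {
                    data[i][j] += aik * b.data[k][j];
                }
            }
        }
    }

    public static void main(String[] args) {
        BlockSketch a = new BlockSketch(new double[][] {{1, 2}, {3, 4}});
        BlockSketch b = new BlockSketch(new double[][] {{5, 6}, {7, 8}});
        BlockSketch c = new BlockSketch(new double[2][2]);
        c.multiplyAccum(a, b);
        System.out.println(java.util.Arrays.deepToString(c.data));
    }
}
```

Because the result is accumulated into the block, calling the operation once per k in the loop of App.java sums the partial products for each C block.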
So basically we have…
Grid Superscalar
→ Execute (inside GS.cc): interface between GS and Globus
→ Globus (GRAM running locally in the worker)
→ Local execution
However, we have both GT2 and GT4 in GS
Remember…
GS.GS_ON() & GS.GS_OFF() → GT2
GS.Execute → GT4
GRAM in GT2 & GT4
GRAM Implementations
Pre-WS GRAM (GT2)
– First implementation of GRAM
– Globus-specific protocol
– Gatekeeper/jobmanager services
WS GRAM (GT4)
– Web Service based implementations of GRAM
– GT3: OGSI-based implementation
– GT4: WSRF-based implementation
GT2
Remember “res = globus_gram_client_job_request(….);”? That is pre_ws_gram
(GT2)***, used in GS_ON and GS_OFF.
http://www-cse.ucsd.edu/classes/sp00/cse225/notes/shava/globus.html
GT4
GSMaster.Execute(multiply_accumulativeOp, 3, 0, 1, 0, pars);
ws_gram GT4***
http://www-unix.globus.org/toolkit/docs/development/3.9.5/execution/key/WS_GRAM_components.png
Agnostic Questions
1.- Do you believe that GRID Superscalar would interfere with or
benefit the economic model of the GRID mentioned in a previous
presentation (A Case for Economy Grid Architecture for Service
Oriented Grid Computing)?
One of the problems that paper presented was the cost incurred
by deploying a job in a Grid without knowing exactly which hosts
are best suited to execute each task. Grid Superscalar, in this
sense, takes advantage of knowing the resources of each of its
available workers. For example, it knows whether a worker can
receive and process two tasks at the same time (a 2-processor
host, say), since Grid Superscalar keeps this kind of information
in a configuration file.
Agnostic Questions
2.- Would the addition of web services on a GRID utilizing
GRID Superscalar cause issues with the way GRID
Superscalar tries to make sequential programs parallel?
First of all, GS is currently used as a dynamic library,
and that library is responsible for the parallelization
process. If we add Web Services to a Grid, for example
one on each host, and a program needs to call two of
those web services, Grid Superscalar can make those
two calls in parallel as long as they do not depend on
each other.
Agnostic Questions
3.- Some of the applications that GRID Superscalar is geared
towards require large data files. Do you believe that the overhead
of sending the same large files around to support parallel
processing could be more harmful or wasteful than running the
process sequentially?
GS tries to exploit the data locality of the files. If a large file is
sent to a machine, or generated there as a result, GS takes that
into account when deciding where to run a job (to avoid transfers
in future tasks and minimize total execution time). There is also a
shared disk mechanism (described in the manual) where you can
specify the location of replicas of your files so that GS does not
transfer them every time.
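As an illustration of the locality idea, here is a minimal Java sketch that picks the host needing the fewest bytes transferred for a task, given which files each host already holds. The class `LocalitySketch`, the method names, the host names and the file sizes are all made up for this example; the actual GS placement policy is not described in these slides and may weigh other factors.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class LocalitySketch {
    // Bytes that would have to be copied to a host that already holds `present`.
    static long bytesToTransfer(Set<String> present, Map<String, Long> inputSizes) {
        long cost = 0;
        for (Map.Entry<String, Long> f : inputSizes.entrySet()) {
            if (!present.contains(f.getKey())) cost += f.getValue();
        }
        return cost;
    }

    // Returns the host with the smallest transfer cost (first one wins on ties).
    static String pickHost(Map<String, Set<String>> filesOnHost, Map<String, Long> inputSizes) {
        String best = null;
        long bestCost = Long.MAX_VALUE;
        for (Map.Entry<String, Set<String>> e : filesOnHost.entrySet()) {
            long cost = bytesToTransfer(e.getValue(), inputSizes);
            if (cost < bestCost) {
                bestCost = cost;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> hosts = new TreeMap<>();
        hosts.put("hostA", new HashSet<>(Arrays.asList("A00")));
        hosts.put("hostB", new HashSet<>(Arrays.asList("A00", "B00")));
        Map<String, Long> sizes = new TreeMap<>();
        sizes.put("A00", 100L);
        sizes.put("B00", 500L);
        sizes.put("C00", 50L);
        System.out.println(pickHost(hosts, sizes));
    }
}
```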
Agnostic Questions
4.- Could GRID Superscalar be optimized if it was
discovered that there are costs for using various
resources? For example, what if it was found that
the connection between two systems on the grid is
slower than the connections between the other
systems due to weather or network congestion?
For now, the only network parameter you can specify
is the theoretical bandwidth of a machine. We do not
work with any dynamic information (NWS or similar).
Agnostic Questions
5.- How would GRID Superscalar adjust if one of the
computers that was assigned a task on the GRID
suddenly becomes unavailable due to weather, for
example?
If there is a failure during the execution, the current
version of GS stops the master (and so the whole
process). You can then re-run the program without
the machine that caused the problem, and the
previous computations that have been checkpointed
won't be repeated. We currently have a development
version which detects failures in machines and
removes failing machines from the computation at
runtime, so the overall process keeps going.
Agnostic Questions
6.- Would there be a reason to use GRID Superscalar
on a GRID that has few systems, where each system
has a unique resource that will likely be used by tasks
given to the GRID?
It depends on the form of that Grid. Imagine that each
system belongs to a different institution, works with a
different queuing system, etc. It would be easier to
gridify the application using GS than using any other
parallel programming model (MPI, the Message Passing
Interface, etc.). The file locality policy can also
reduce transfers compared to MPI, for instance (where
you always have to send the data you need to
compute).
Agnostic Questions
7.- The conversion of applications from
sequential to parallel is done without the
programmer’s knowledge. How would this
affect programmers’ ability to deal with
exception handling?
The parallelization is basically functional
parallelization, so an error inside a function
can be detected in the worker code the same
way as in sequential code. When an error is
detected, you can return a value to the master
meaning that things went wrong in that function.
Agnostic Questions
8.- GRIDs have a very fragmented nature: different parts of the
GRID are administered by different organizations, and the usage
agreements between organizations are not necessarily the same.
How could Superscalar make sure that performance isn’t being
hindered by sending tasks to a system that, by agreement, gives
much less CPU utilization than another system?
When you add a machine to the configuration file you can specify
the computing power of that machine. Then, in the estimation
function, you can use that value to try to predict the execution time
of an operation on the given machine. As you can see, it is specified
statically (GS does not gather any information about the real
status of the different systems).
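To show how a statically configured computing power could feed a prediction, here is a minimal Java sketch of an estimation function. The formula (compute time plus transfer time), the class `EstimateSketch`, and the parameter names are assumptions for illustration; the slides do not show the signature GS expects for estimation functions. The sketch treats 1 Kbps as 1000 bits per second, matching the units suggested by the GFlops and NetKbps attributes of the project file.

```java
class EstimateSketch {
    // Predicted seconds = compute time + time to move the missing input data.
    // taskGflop: work in billions of floating-point operations (assumed known by the user).
    // hostGFlops, netKbps: static values such as those declared per worker.
    static double estimateSeconds(double taskGflop, double hostGFlops,
                                  long bytesToTransfer, double netKbps) {
        double computeSeconds = taskGflop / hostGFlops;
        double transferSeconds = (bytesToTransfer * 8.0) / (netKbps * 1000.0);
        return computeSeconds + transferSeconds;
    }

    public static void main(String[] args) {
        // Slow host that already holds the data.
        System.out.println(estimateSeconds(2.0, 1.0, 0L, 100000.0));
        // Host twice as fast that first needs 12.5 MB copied over a 100000 Kbps link.
        System.out.println(estimateSeconds(2.0, 2.0, 12_500_000L, 100000.0));
    }
}
```

In this made-up case both hosts come out at the same predicted time, which shows why a faster machine is not automatically the better choice once transfers are counted.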
Agnostic Questions
9.- Do you feel that it would be possible to use flat
files as a synchronization component, so that
GRID Superscalar can let processes use a
database while maintaining the WaW, RaW, and
WaR constraints?
Grid Superscalar does not need it because it can
do this by itself. File dependencies are always
checked by Grid Superscalar in order to
know which job can be executed and which
one has to wait until another one finishes
because of data dependencies.
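For readers unfamiliar with the three dependency kinds, here is a minimal Java sketch that classifies them for two tasks from the sets of files each one reads and writes. The class `HazardSketch` and its helper names are hypothetical, and the file names are placeholders; this only illustrates the definitions of RaW, WaR and WaW, not how GS implements its check.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class HazardSketch {
    // The files a task reads and the files it writes.
    static final class Access {
        final Set<String> reads;
        final Set<String> writes;

        Access(String[] reads, String[] writes) {
            this.reads = new HashSet<>(Arrays.asList(reads));
            this.writes = new HashSet<>(Arrays.asList(writes));
        }
    }

    static boolean overlap(Set<String> a, Set<String> b) {
        return !Collections.disjoint(a, b);
    }

    // Dependencies of `later` on `earlier`, where `earlier` comes first in program order.
    static Set<String> hazards(Access earlier, Access later) {
        Set<String> out = new TreeSet<>();
        if (overlap(earlier.writes, later.reads)) out.add("RaW");  // read after write
        if (overlap(earlier.reads, later.writes)) out.add("WaR");  // write after read
        if (overlap(earlier.writes, later.writes)) out.add("WaW"); // write after write
        return out;
    }

    public static void main(String[] args) {
        // An inout file parameter counts as both read and written.
        Access t1 = new Access(new String[] {"C00", "A00", "B00"}, new String[] {"C00"});
        Access t2 = new Access(new String[] {"C00", "A01", "B10"}, new String[] {"C00"});
        Access t3 = new Access(new String[] {"C01", "A00", "B01"}, new String[] {"C01"});
        System.out.println(hazards(t1, t2));
        System.out.println(hazards(t1, t3));
    }
}
```

Two multiply_accumulative calls on the same C block conflict in all three ways because the block is an inout parameter, while calls on different C blocks that only share input files have no dependency and can run in parallel.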
Agnostic Questions
10.- Does the system provide any sort of protection against
renaming files? Would the Double Hashtable system
be compromised if a submitted task renames files or
makes duplicate files as part of its operations?
You cannot rename source files in a worker (as
specified in the manual), but you can copy them and
do whatever you want with the copy. With temporary
files (files which exist only in the "local domain" of
that task) you can also do virtually anything: they
are removed after the computation, because a
temporary directory is created in order to execute the
task.
Questions? Comments?...
No comment!!! :p