CS591x Cluster Computing and Programming Parallel Computers


CS591x
Cluster Computing and
Programming Parallel Computers
Grid Computing
Global Grid Exchange and
Parabon’s Frontier
GGE - Frontier
Global Grid Exchange is based on
Frontier
Frontier is a parallel computing platform
from Parabon, Inc.
Frontier is a massively parallel platform
Frontier provides a framework to define,
launch, execute and manage parallel
applications across the Internet
GGE - Frontier
Frontier derives its computational
processing power from “relatively low-power, unreliable, high latency nodes”
… but potentially on a massive scale
These nodes are largely desktop
computers
Frontier uses idle time on these nodes
GGE - Frontier
processing on a specific node only occurs if
the processor is in an unused state
“unreliable” means that when a node
returns to an active state (e.g., the user
moves a mouse) the task is “killed”
Frontier has facilities to help mitigate
the unexpected loss of nodes.
GGE - Frontier
Program in Java
or any language that produces bytecode
must run under JVM
client application must make Java calls
to the Client API
The work task must be coded in Java
GGE - Frontier
Frontier platform



Client application – the front end application that
defines the overall application and communicates
with Frontier API. This runs on the GGE/Frontier
user’s computer
Frontier compute engine – compute nodes running
an application that provides spare (idle) machine
cycles to the Frontier job
Frontier server – handles the assignment and
allocation of resources, and schedules and manages
jobs
GGE - Frontier
[Diagram: Client Application, Frontier Server, and several Compute Engines]
GGE - Frontier
Main components of a Frontier
application

jobs
  - relatively isolated applications

tasks
  - set of one or more work applications
  - arbitrary
  - independent
  - a task runs on a single compute node
GGE - Frontier
jobs =


a set of elements
a set of tasks
elements =

data elements
  - blocks of data represented as a parameter list,
    sent to compute engine for execution of task

executable elements
  - Java bytecode that defines the executable task
    to be performed by the compute engine
GGE - Frontier
Tasks – defined by task specification

tasks are made up of
  - the executable elements they require
  - a list of parameters
  - an entry point class name

executable element defines the computational
work of the compute node
  - must be bytecode in a jar file
  - defined by the
    .addJarFileExecutableElement and
    .addRequiredExecutableElement methods
GGE - Frontier
about tasks




tasks cannot communicate with other
running tasks
all communication to a task takes place
from the server at the time that the task is
instantiated
all communication from a task takes place
via status reports, some of which may be passed
on to the client application
a task can see/use global and job-level
elements
GGE - Frontier
Parameter lists





defined as a set of name-value pairs
name – some text string
value – an assigned value of a primitive
type
parameter lists – grouped together into
parameter maps
parameter maps are attached to the task
specification
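The name-value structure described above can be pictured with an ordinary Java map. This is only a minimal sketch and not the Frontier API: `ParamSketch`, `taskParams`, `getInt` and the parameter names are hypothetical, and a `LinkedHashMap` stands in for Frontier's own parameter-map type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a Frontier parameter map; not the real API.
class ParamSketch {
    // Build one parameter list: text names mapped to primitive (boxed) values.
    static Map<String, Object> taskParams(int input) {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("input", input);    // int value
        params.put("scale", 1.5);      // double value
        params.put("verbose", false);  // boolean value
        return params;
    }

    // Read a value back with a type check, as a task might do on a compute node.
    static int getInt(Map<String, Object> params, String name) {
        Object v = params.get(name);
        if (!(v instanceof Integer)) {
            throw new IllegalArgumentException(name + " is not an int parameter");
        }
        return (Integer) v;
    }

    public static void main(String[] args) {
        Map<String, Object> params = taskParams(7);
        System.out.println("input = " + getInt(params, "input"));
    }
}
```

Restricting values to primitive types keeps a parameter list cheap to ship to a remote node, which fits the limited data I/O that the platform offers.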
GGE - Frontier
Task Status



all information is reported back from
compute engine as status reports
some status reports are passed back from
server to client application
recent status reports replace previous
status reports
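The rule that a recent status report replaces the previous one can be modeled as one slot per task. The sketch below is an assumption-laden illustration, not Frontier code: `StatusBoard` and its methods are hypothetical, and status reports are reduced to plain strings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of "recent status reports replace previous status reports".
class StatusBoard {
    // One slot per task ID: a new report overwrites the old one.
    private final Map<Integer, String> latest = new HashMap<>();

    void report(int taskId, String status) {
        latest.put(taskId, status);
    }

    String current(int taskId) {
        // A task that has never reported is treated as unstarted.
        return latest.getOrDefault(taskId, "unstarted");
    }

    public static void main(String[] args) {
        StatusBoard board = new StatusBoard();
        board.report(1, "running, 25%");
        board.report(1, "running, 80%");
        System.out.println(board.current(1)); // only the latest report is kept
        System.out.println(board.current(2)); // no report yet
    }
}
```

A consequence of this design is that a client cannot rely on seeing every intermediate report, only the most recent one.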
GGE - Frontier
Types of status reports

run mode
  - unstarted, running, complete, …

results
  - returned as name-value pairs in parameter
    maps

exceptions
  - codes and text descriptions of exceptions

progress
  - single scalar metric of task progress
Checkpointing
Compute nodes are “unreliable”


nodes can frequently leave the compute node
“pool”
owner/user moves a mouse, etc.
The in-progress task is stopped and exits
Potential to lose a lot of work
Checkpointing
Checkpointing allows the Frontier server to
restart a task at or near the place where it
exited – where it was last checkpointed
Checkpointing involves recording/saving
important state variables in the task and
sending these state variables to the server
The server replaces the task’s task spec with
the checkpoint data
task is restarted with checkpoint data as task
spec rather than the original task spec.
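The checkpoint-and-restart cycle described above can be sketched in plain Java. Everything here is hypothetical and none of it is Frontier API: `CheckpointSketch`, `runTask`, the `"next"` and `"sum"` keys, and the simulated kill are assumptions chosen only to show the saved state taking the place of the original task spec.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of checkpoint and restart; none of these names are Frontier API.
class CheckpointSketch {
    // Sum the integers 1..n. 'spec' plays the role of the task spec held by the
    // server: it starts empty and is overwritten with checkpoint data.
    // A killAt value > 0 simulates the node leaving the pool at that step.
    static boolean runTask(Map<String, Long> spec, long n, long interval, long killAt) {
        long i = spec.getOrDefault("next", 1L);   // resume point
        long sum = spec.getOrDefault("sum", 0L);  // partial result so far
        for (; i <= n; i++) {
            if (i == killAt) {
                return false; // killed: work since the last checkpoint is lost
            }
            sum += i;
            if (i % interval == 0) {
                spec.put("next", i + 1); // checkpoint: save the state variables
                spec.put("sum", sum);
            }
        }
        spec.put("next", n + 1);
        spec.put("sum", sum);
        return true;
    }

    public static void main(String[] args) {
        Map<String, Long> spec = new HashMap<>();
        boolean done = runTask(spec, 1000, 100, 250); // first attempt is killed at step 250
        System.out.println("finished=" + done + " resumeAt=" + spec.get("next"));
        done = runTask(spec, 1000, 100, 0);           // restart from the checkpoint
        System.out.println("finished=" + done + " sum=" + spec.get("sum"));
    }
}
```

In this run the restarted task resumes at step 201 rather than step 1, so only the 49 steps after the last checkpoint are repeated. The checkpoint interval trades lost work against the cost of sending state to the server.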
Design Issues
Java for task code


code must run under compute node’s JVM
JVM has tight security model
“Work” done in tasks


Except for Frontier job management, work
is done in tasks
Tasks are pushed out to compute nodes
(maybe local)
Design Issues
Best for jobs that have a high compute-to-I/O
ratio (compute bound)

Frontier has limited data I/O capabilities and no
inter-task communication capabilities
Launch and Listen



most serial programs have program-controlled I/O
In Frontier you push out the task and listen for
results
You don’t know when those results will come back
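The launch-and-listen style can be imitated on a single machine with the standard `CompletableFuture` class. This is a sketch of the programming pattern only, under the assumption that local threads stand in for compute nodes; `LaunchAndListen` and `squareAll` are hypothetical names and nothing here calls Frontier.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical local imitation of "push out the task and listen for results".
class LaunchAndListen {
    // Launch one task per input and register a listener for each result.
    static List<Integer> squareAll(int[] inputs) {
        List<Integer> results = new CopyOnWriteArrayList<>();
        CompletableFuture<?>[] pending = new CompletableFuture<?>[inputs.length];
        for (int k = 0; k < inputs.length; k++) {
            final int input = inputs[k];
            pending[k] = CompletableFuture
                .supplyAsync(() -> input * input)  // "push out" the task
                .thenAccept(results::add);         // "listen" for its result
        }
        CompletableFuture.allOf(pending).join();   // wait until every listener has fired
        return results;                            // arrival order is not guaranteed
    }

    public static void main(String[] args) {
        System.out.println(squareAll(new int[] {1, 2, 3, 4}));
    }
}
```

Because results arrive in whatever order the tasks finish, the caller should identify each result by content or by an attached ID rather than by position.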
Design Issues
Unreliable task execution



Tasks can be stopped, re-executed,
reassigned
Can’t reasonably predict when tasks will
complete
Can redundantly allocate tasks to
make the job more resilient
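Redundant allocation can be illustrated locally with `ExecutorService.invokeAny`, which returns the result of one task that completes successfully and ignores the ones that fail. The sketch assumes threads in place of compute nodes and simulates lost nodes with exceptions; `RedundantTasks` and `squareRedundantly` are hypothetical names, not Frontier facilities.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical local imitation of redundantly allocated tasks.
class RedundantTasks {
    // Run 'copies' replicas of the same squaring task. Replicas whose index is
    // below 'failing' simulate nodes that drop out before finishing.
    static int squareRedundantly(int input, int copies, int failing) {
        ExecutorService pool = Executors.newFixedThreadPool(copies);
        try {
            List<Callable<Integer>> replicas = new ArrayList<>();
            for (int k = 0; k < copies; k++) {
                final boolean nodeLost = k < failing;
                replicas.add(() -> {
                    if (nodeLost) {
                        throw new IllegalStateException("node left the pool");
                    }
                    return input * input;
                });
            }
            // invokeAny returns the result of a replica that completed successfully.
            return pool.invokeAny(replicas);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("no replica finished", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Two of the three replicas are lost; the surviving one supplies the answer.
        System.out.println(squareRedundantly(6, 3, 2));
    }
}
```

The cost of this resilience is wasted compute: every replica that finishes after the first contributes nothing to the job.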
GGE/Frontier Programs
In the most general sense you must have
two programs

A client application
  - sets up, starts and potentially manages the job
  - written in Java, or makes Java calls to the client API

A task
  - this contains the scientific or engineering work
    that you are trying to accomplish
  - must be in Java, run under the JVM
Frontier tasks
We’ll start by looking at tasks
Code samples shown here from the
Global Grid Exchange website
www.globalgridexchange/developers
See (in your installation)


local/src/LocalTask.java
local/src/LocalApp.java
Frontier Tasks
public class LocalTask implements Task {
    private TaskContext context; // task context
    private boolean runtimeWantsMeToStop = false;

    public LocalTask(TaskContext context) {
        this.context = context;
        runtimeWantsMeToStop = false;
    }

    // Task interface method to start the task
    public void run() throws TaskStoppedException {
        try { DoLocalTask(); }
        catch(TaskStoppedException e) { throw(e); }
    }

    // Task interface method to stop the task
    public void stop() {
        runtimeWantsMeToStop = true;
    }
    …..
}
Frontier Tasks
private void DoLocalTask() throws TaskStoppedException {
    double progress = 0.0;
    NamedParameterMap map = new NamedParameterMap();
    if(runtimeWantsMeToStop) {
        throw new TaskStoppedException();
    }
    square = input * input;
    // Return the final status (with the results)
    map.put("square", square);
    context.postResults(1.0, map); // 1.0 denotes 100% completion
}
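The sample task checks its stop flag once because its work is a single multiplication. For longer-running work, one plausible approach is to poll the flag inside the main loop. The following stand-alone sketch shows that idea without using Frontier: `StopSketch` and `StoppedException` are hypothetical, and the exception is unchecked here purely to keep the example short, unlike the `TaskStoppedException` that the sample declares in its `throws` clauses.

```java
// Hypothetical stand-alone sketch; StopSketch and StoppedException are not Frontier classes.
class StopSketch {
    static class StoppedException extends RuntimeException {}

    // volatile so that a stop() call from another thread is seen by the loop
    private volatile boolean stopRequested = false;

    void stop() {
        stopRequested = true;
    }

    // Sum of the squares 1..n, checking the stop flag once per iteration.
    long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            if (stopRequested) {
                throw new StoppedException();
            }
            total += (long) i * i;
        }
        return total;
    }

    public static void main(String[] args) {
        StopSketch task = new StopSketch();
        System.out.println(task.sumOfSquares(10)); // prints 385
        task.stop();
        try {
            task.sumOfSquares(10);
        } catch (StoppedException e) {
            System.out.println("stopped");
        }
    }
}
```

Checking the flag every iteration keeps the delay between a stop request and the task's exit short, at the cost of one extra read per step.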
Frontier Tasks
•Must have in your task code…
private int input; // task parameters
private int square;
public void setInput(int val) { input = val; }
public void setSquare(int val) { square = val; }
Frontier Tasks
Compile the task file to a Java class file
Put the class file in a jar file
Frontier Client Application
Client application defines jobs
Jobs define tasks
Client works with Frontier server (for
remote tasks) or locally to launch tasks
Client Application
Basic steps

Create a session manager
  - manager = new LocalSessionManager();



define job attributes (up to you)
assign job attributes to parameter map
attach parameter map to job attribute map
Client applications

Define executable elements
  - the class file that you will execute as a task






Set that element as the executable element
define task attributes
assign task attributes to map
define parameters for task
attach task attribute map to task spec
define task proxy
Client Application
Create a listener
Events (results) do not return
automatically
Listener listens for events in the GGE
system
Events include results, exceptions,
progress reports
Listener
Listener class
class LocalAppJobListener implements
        TaskProgressListener, TaskResultListener,
        TaskExceptionListener {
    SessionManager manager;
    Job job;

    // Constructor
    public LocalAppJobListener(SessionManager manager_, Job job_) {
        manager = manager_;
        job = job_;
    }
Progress Listener
public void progressReported(TaskProgressEvent event) {
    // Get the task's ID from the task attributes
    int taskId = event.getAttributes().getIntValue("TaskID");
    System.out.println("Task " + taskId + " is "
        + event.getProgress()*100 + "% complete");
}
Results Listener
public void resultsPosted(TaskResultEvent event) {
    NamedParameterMap resultsMap = event.getResults();
    if(resultsMap == null) { return; }
    if(event.isComplete()) {
        // Get the Task proxy from the event
        TaskProxy proxy = event.getTaskProxy();
        // Get the task's ID from the task attributes
        int taskId = proxy.getAttributes().getIntValue("TaskID");
        // Get the result from the event
        int square = resultsMap.getIntValue("square");
        System.out.println("Task " + taskId + " reports the square is: " +
            square);
Listener – shut things down
        // Remove the task from the job
        proxy.remove();
        System.out.println("Tasks have been completed. Removing job");
        // Remove the job
        job.remove();
        // Destroy the manager
        manager.destroy();
        // Exit all running threads
        System.exit(0);
    }
}