Master/Worker Pattern implemented in Aviation Grid

Bilateral Research and Industrial Development
Enhancing and Integrating GRID Enabled Technologies
Aviation Grid Application and Drug
Discovery Application
Yongjian Wang
Beihang University, Beijing, China
EchoGrid Conference
Beijing, China, 30.10.2007
Agenda
- Background
- Brief introduction to Aviation Grid Application and Drug Discovery Grid
- Planned Deployment of Aviation Grid Application
- Current Deployment of Drug Discovery Grid
- Master/Worker pattern based Grid Application
  - Brief Introduction to Master/Worker Pattern
  - Comparison of Meta-Scheduler and Master/Worker
  - Some famous Master/Worker pattern implementations
  - Master/Worker pattern used in our project
- Lessons learned from the project
Background
- Aviation Grid Application and Drug Discovery Application are two key applications supported by CNGrid;
- The purpose of these two applications:
  - Aviation Grid Application plans to use computing and storage resources across the Internet to accelerate airplane design;
  - Aviation Grid Application will set up and demonstrate in operation a distributed workflow that automatically executes all the simulation services provided by partners from both China and the EU;
  - The distributed workflow will be demonstrated using a modified airplane wing as the input;
  - Drug Discovery Grid plans to use computing and storage resources across the Internet to accelerate the drug discovery process;
  - Drug discovery usually takes a long time because millions of data items have to be searched;
Planned Deployment of Aviation Grid Application
[Deployment diagram, labels only: Genetic Algorithm Optimization, SIMDAT, Acoustic Simulation Service, HAJIF-II, Workflow Management, CNGrid.]
DOCK6 and GAsDOCK workflow
[Workflow diagram mixing manual and automatic steps: protein preparation in Sybyl or a text editor (select ~1-5 PDB files, preprocess by removing water, ligands, etc., repair the structure, define the active site, convert the PDB file to mol2); surface and sphere generation (ms creates the protein surface receptor.ms, sphgen with INSPH generates receptor.sph, showbox with box.in generates the active-site box box.pdb); grid with grid.in generates the scoring grids (receptor.cnt, receptor.nrg); database preparation draws mol2 files from the compound database (ZINC, SPECS, CPND, MMPD); docking runs DOCK6 or GAsDOCK for each ligand against the prepared receptor; the results (energy list file, conformation file) go to result analysis/visualization.]
Current Deployment of Drug Discovery Grid
[Deployment map: sites at Beihang University, DLUT, SSC (Shanghai), CAS SIMM (Shanghai) and SJTU (Shanghai) connected through CERNET, plus a Hong Kong site connected through HARNET.]
Common pattern in these two Grid Applications
- The pattern of these two applications is quite similar:
  - A central server is needed to split a big job into smaller sub-tasks, dispatch the sub-tasks to suitable computing resources, monitor job and sub-task status, collect processing results, etc.;
  - Computing and data resources are scattered across the Internet;
  - Reliability requirements:
    - Fault detection for all resources, all running jobs and all running sub-tasks;
    - Automatic retry of failed jobs and sub-tasks according to different predefined policies (see the sketch after this slide), etc.;
- During the design and development of the project, we adopted the Master/Worker pattern:
  - The Master/Worker pattern fits the scenarios we encountered in the Aviation and Drug Discovery Applications;
  - The Master/Worker pattern is also a widely used computing pattern;
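A hedged sketch of what such a predefined retry policy could look like in Java (the names RetryPolicy, maxAttempts and delaySeconds are illustrative, not taken from the project's code): a failed sub-task is re-run up to a configured number of times with a fixed delay before it is finally reported as failed.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

// Illustrative retry policy for failed sub-tasks (hypothetical names,
// not taken from the actual project code).
public class RetryPolicy {
    private final int maxAttempts;
    private final long delaySeconds;

    public RetryPolicy(int maxAttempts, long delaySeconds) {
        this.maxAttempts = maxAttempts;
        this.delaySeconds = delaySeconds;
    }

    /** Run the sub-task, retrying on failure until the policy is exhausted. */
    public <T> T execute(Callable<T> subTask) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return subTask.call();
            } catch (Exception e) {
                last = e;                              // record the failure
                TimeUnit.SECONDS.sleep(delaySeconds);  // wait before the next attempt
            }
        }
        throw last; // all attempts used up: the sub-task is finally marked as failed
    }
}
```

Different policies (more attempts, longer delays, no retry at all) can then be attached to different job types.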
Agenda
- Background
- Brief introduction to Aviation Grid Application and Drug Discovery Grid
- Planned Deployment of Aviation Grid Application
- Current Deployment of Drug Discovery Grid
- Master/Worker pattern based Grid Application
  - Brief Introduction to Master/Worker Pattern
  - Comparison of Meta-Scheduler and Master/Worker
  - Some famous Master/Worker pattern implementations
  - Master/Worker pattern used in our project
- Lessons learned from the project
Master/Worker Pattern based Grid Application
- The Master/Worker pattern involves two different roles:
  - The master node is in charge of splitting a large job into smaller sub-tasks, dispatching tasks, monitoring status, collecting results, etc.;
  - The worker is in charge of processing sub-tasks locally, and two communication models are widely used:
    - Pull model: the worker asks the master for a new task according to its local workload and resource availability; a resource scheduler is not necessary (see the sketch after this slide);
    - Push model: the master sends tasks to all available workers; a resource scheduler is necessary;
- Many grid middleware systems support the Master/Worker pattern, but it is usually used within a cluster:
  - Condor MW;
  - Google MapReduce;
  - CommonJ specification, etc.
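A minimal pull-model sketch (the Master and Task interfaces below are hypothetical, not part of any of the middleware listed above): the worker asks for work only when it has spare capacity, which is why no global resource scheduler is needed.

```java
// Hypothetical pull-model worker loop: the worker requests work only when idle.
public class PullWorker implements Runnable {

    /** Hypothetical master-side interface the worker talks to (e.g. over a web service). */
    public interface Master {
        Task fetchTask(String workerId) throws InterruptedException; // blocks until a task is available
        void reportResult(String taskId, byte[] result);
    }

    public interface Task {
        String id();
        byte[] payload();
    }

    private final Master master;
    private final String workerId;

    public PullWorker(Master master, String workerId) {
        this.master = master;
        this.workerId = workerId;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                Task task = master.fetchTask(workerId);   // pull: the worker decides when to ask
                byte[] result = process(task.payload());  // run the sub-task locally
                master.reportResult(task.id(), result);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();       // shut down cleanly
            }
        }
    }

    private byte[] process(byte[] payload) {
        // Application-specific batch work would go here.
        return payload;
    }
}
```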
Comparison of Master/Worker and Meta-Scheduler
- The Master/Worker pattern and a Meta-Scheduler have some overlapping functions; here is a comparison:
  - A Meta-Scheduler is a very basic service that is in charge of dispatching tasks to suitable resources for execution:
    - A Meta-Scheduler consists of a resource scheduler and computing elements and is usually provided by the grid middleware, e.g.
      - Resource scheduler: gLite RB/WMS, Condor Scheduler;
      - Computing element: gLite/LCG CE, Condor Pool, etc.;
  - Master/Worker is a common computing pattern that can be implemented in many different ways. In a grid environment, we can implement the Master/Worker pattern on top of the Meta-Scheduler and other components provided by the grid middleware, such as the information service.
Condor MW Pattern
- Condor MW is a Condor-based Master/Worker solution;
- Used within a Condor Pool;
- Adopts MPI- and PVM-based communication;
- Condor DAG for batch job workflows;
Condor MW Pattern (cont.)
Google MapReduce Pattern
- Google MapReduce also provides a flexible and powerful Master/Worker pattern implementation;
- Divides the user's task into two kinds of sub-tasks: Map and Reduce (see the sketch after this slide);
- Tasks are dispatched to where the data is located, instead of moving data to where the task executes:
  - Different from a batch job system's execution model;
  - Different from the standard operations defined in OGSA-BES;
- Implemented at the application level, with the following requirements:
  - Master node and worker nodes should be in the same cluster;
  - High network bandwidth among all the worker nodes;
  - Google MapReduce is tightly coupled with the Google File System;
  - Not a general-purpose Master/Worker implementation;
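An illustrative word-count sketch of the Map/Reduce split in plain Java (not Google's actual API): map emits (word, 1) pairs and reduce sums them per key; the grouping step in the middle is what the MapReduce framework does for the programmer.

```java
import java.util.*;
import java.util.stream.*;

// Illustrative Map/Reduce word count in plain Java (not Google's actual API).
public class WordCountSketch {

    // Map phase: emit one (word, 1) pair per word in the document.
    static List<Map.Entry<String, Integer>> map(String document) {
        return Arrays.stream(document.toLowerCase().split("\\W+"))
                     .filter(w -> !w.isEmpty())
                     .map(w -> Map.entry(w, 1))
                     .collect(Collectors.toList());
    }

    // Reduce phase: sum all counts emitted for one word.
    static int reduce(String word, List<Integer> counts) {
        return counts.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<String> documents = List.of("the quick brown fox", "the lazy dog");

        // Shuffle: group intermediate pairs by key (done by the framework in real MapReduce).
        Map<String, List<Integer>> grouped = documents.stream()
                .flatMap(d -> map(d).stream())
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                         Collectors.mapping(Map.Entry::getValue, Collectors.toList())));

        grouped.forEach((word, counts) ->
                System.out.println(word + " -> " + reduce(word, counts)));
    }
}
```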
Google MapReduce Pattern (cont.)
CommonJ Specification
- The CommonJ specification, a joint BEA-IBM specification, consists of the Timer and Work Manager APIs;
  - The Timer and Work Manager specification provides application programmers with often-needed utilities.
  - The Work Manager API provides a simple API for application-supported concurrent execution of work items.
    - Enables applications to schedule work items for concurrent execution;
    - Provides greater throughput and improved response time.
  - The Timer API provides a simple API for setting timers in an application-supported fashion.
    - Enables applications to schedule future timer notifications and receive timer notifications;
- CommonJ provides a Master/Worker pattern implementation that runs inside a single local process (see the analogy sketched after this slide);
  - It does not support a distributed computing model;
  - It is therefore not suitable for our scenarios;
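For comparison, a minimal local-process master/worker sketch using java.util.concurrent, as an analogy to what the CommonJ Work Manager offers (this is not the CommonJ API itself): work items run concurrently inside one JVM, which is exactly why this model cannot span machines scattered across the Internet.

```java
import java.util.List;
import java.util.concurrent.*;

// Local-process master/worker analogy using java.util.concurrent
// (comparable in spirit to the CommonJ Work Manager, not the CommonJ API itself).
public class LocalWorkManagerSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService workers = Executors.newFixedThreadPool(4); // "workers" are just threads

        // The "master" schedules work items for concurrent execution...
        List<Callable<Integer>> workItems = List.of(
                () -> expensiveComputation(1),
                () -> expensiveComputation(2),
                () -> expensiveComputation(3));

        // ...and collects the results; everything stays inside this single JVM,
        // which is why this model does not fit Internet-scale resources.
        for (Future<Integer> f : workers.invokeAll(workItems)) {
            System.out.println("result = " + f.get());
        }
        workers.shutdown();
    }

    static int expensiveComputation(int input) {
        return input * input; // placeholder for real work
    }
}
```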
Why did we build another Master/Worker Pattern implementation?
- Condor MW, Google MapReduce and CommonJ are mature Master/Worker pattern implementations, so why build another one in our project?
- They work well, but each has strong requirements that are difficult to satisfy in our scenarios:
  - Condor MW and Google MapReduce require the master node and the worker nodes to be within the same cluster; CommonJ requires the master and the workers to be within the same process;
  - Condor MW needs PVM or MPI for communication, which is unnecessary for sequential applications such as the Aviation Grid Application;
  - Google MapReduce divides the processing into two sub-processes, map and reduce, which suits document search but not our scenarios;
- Both the Aviation Grid Application and the Drug Discovery Grid want to utilize resources distributed across the Internet;
  - The master node and the workers are usually owned by different organizations;
  - We need to support different kinds of batch job systems, including PBS, LSF, BOINC, etc.
Our Master/Worker Pattern implementation
- To fulfill the project requirements, our implementation consists mainly of five components (a hypothetical worker interface is sketched after this slide):
  - Workflow engine: composes different services into a unified service;
  - Master node: the central controller;
    - Job Scheduler
    - Meta-Scheduler
    - Status Monitor
  - Message Server: communication between master and workers;
    - JMS-based message exchange;
    - Additionally supports reliable message transfer;
  - JobExecute: plays the role of the worker and provides an abstraction of the underlying computing resources;
    - Provides a unified interface for the master node;
    - Provides support for different underlying grid infrastructures;
    - SEDA-based architecture to support massive throughput;
    - Easy to extend to other kinds of computing resources;
  - Underlying grid middleware: grid middleware is used to organize the underlying computing resources;
    - Currently supports GOS and gLite;
    - GRIA and BOINC support is in progress;
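To make the JobExecute abstraction concrete, here is a hypothetical Java interface (names such as GridJobExecute and submit are illustrative, not the project's actual code): one implementation per middleware hides GOS, gLite, GRIA or BOINC details behind a uniform contract that the master node can call.

```java
// Hypothetical worker-side abstraction over different grid middleware
// (illustrative only; names do not come from the actual project code).
public interface GridJobExecute {

    /** Submit a sub-task description (e.g. JSDL) and return a middleware-specific job id. */
    String submit(String jobDescription) throws JobExecuteException;

    /** Query the current status of a previously submitted sub-task. */
    JobStatus status(String jobId) throws JobExecuteException;

    /** Cancel a running sub-task. */
    void cancel(String jobId) throws JobExecuteException;

    enum JobStatus { PENDING, RUNNING, FINISHED, FAILED }

    class JobExecuteException extends Exception {
        public JobExecuteException(String message, Throwable cause) { super(message, cause); }
    }
}

// One implementation per middleware would adapt this contract, e.g.
// class JobExecute4GOS implements GridJobExecute { ... }
// class JobExecute4gLite implements GridJobExecute { ... }
```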
Master/Worker Pattern implemented in Aviation Grid Application & DDGrid
[Architecture diagram: the user drives the Workflow Engine (Optimus from LMS Corp.), which talks to the Master Node through a Web Service interface for job submission and notification. Inside the Master Node, the Job Scheduler (with a Job Status DB) and the Meta-Scheduler (with a Task Status DB) maintain the job/task list. A Message Server and communication module connect the Master Node to different kinds of workers: JobExecute4GOS (GOS), JobExecute4gLite (gLite WMS/RB), JobExecute4GRIA (GRIA) and JobExecute4BOINC (BOINC Server).]
Master/Worker Pattern implemented in Aviation Grid Application & DDGrid
[Architecture diagram repeated; the Workflow Engine is the focus of the following example.]
Example: Optimus Workflow Engine used in Aviation Grid Application
Master/Worker Pattern implemented in Aviation Grid Application & DDGrid
[Architecture diagram repeated, annotated with the task request coming from the workflow into the Master Node; the Job Scheduler is the focus of the following example.]
Example: DDGrid Job Scheduler in Master Node
[Scheduler diagram with annotations:
- A task instance is used only inside the Web Service interface and is initialized for the task-scheduling process; it carries (1) the user id, (2) the process id, (3) the task id and (4) the task processing state.
- Tasks move between the states Pending, Waiting, Running, Failed and Finished, with the transitions Pending->Waiting, Waiting->Running, Running->Waiting, Running->Finished, Running->Failed and Failed->Running.
- The Waiting, Failed and Finished queues are implemented on top of Ehcache and HSQLDB; a scheduled scanner drives the queues.
- The TaskInfo DB stores the task processing state, such as the total number of jobs and the current state of job processing; the JobInfo DB stores the job processing state, such as which node is used and whether the job has finished.
- A thread pool with a fixed number of threads (between 15 and 25) runs the job-scheduling logic; busy threads take a task as parameter, while the remaining threads stay idle.]
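A hypothetical sketch of the task state machine described above (illustrative names, not taken from the project source): the states and the allowed transitions, which a scheduler thread could consult before moving a task from one queue to another.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the DDGrid task state machine
// (illustrative names; not taken from the actual project source).
public enum TaskState {
    PENDING, WAITING, RUNNING, FAILED, FINISHED;

    // Allowed transitions, matching the diagram: Pending->Waiting, Waiting->Running,
    // Running->Waiting/Finished/Failed, Failed->Running.
    private static final Map<TaskState, Set<TaskState>> TRANSITIONS = Map.of(
            PENDING,  EnumSet.of(WAITING),
            WAITING,  EnumSet.of(RUNNING),
            RUNNING,  EnumSet.of(WAITING, FINISHED, FAILED),
            FAILED,   EnumSet.of(RUNNING),
            FINISHED, EnumSet.noneOf(TaskState.class));

    /** True if the scheduler may move a task from this state to the given next state. */
    public boolean canMoveTo(TaskState next) {
        return TRANSITIONS.get(this).contains(next);
    }
}
```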
Master/Worker Pattern implemented in Aviation Grid Application & DDGrid
[Architecture diagram repeated; the Message Server and communication module are the focus of the following slides.]
Communication Model
- Two main communication models are used in this scenario:
  - Pull model: the worker asks the master for a new task according to its local workload and other factors;
    - Easy to implement;
    - The resource scheduling algorithm is simple;
    - The master has less control over the workers;
  - Push model: the master sends tasks to the workers according to globally monitored state and a scheduling algorithm;
    - Needs a global status-monitoring mechanism;
    - The resource scheduling algorithm is more complex;
    - The master has more control over the workers;
- At present, we choose the push model as our communication model;
  - In the pull model it is difficult to control the task-dispatching process and the task finish time; in the push model the master has more control over the whole process;
  - We use a message-based communication mechanism to provide a push-style communication model;
    - JMS is used;
Communication Model (cont.)
- The communication between the master node and the workers is message-based;
- A JMS 1.1 (Java Message Service) compatible message server is chosen;
  - ActiveMQ 4.2 with Java NIO support;
- The master node publishes tasks to a JMS queue; workers subscribe to the corresponding JMS queue to obtain task information (a minimal sketch follows the table below);
- The message server must handle massive numbers of small messages concurrently;
  - Table 1 shows our test results; for our case the throughput is sufficient;
  - With Java NIO support, performance is excellent when the message size is small;
  - Small messages are common in our scenarios;
Table 1. Message Server test results

Test Case Seq.    Message Size    Concurrent Clients
Case-1            10 KB           2200
Case-2            50 KB           1200
Case-3            100 KB          400~500
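A minimal sketch of this push-style exchange over JMS, assuming an ActiveMQ broker at tcp://localhost:61616 and a queue name such as "ddgrid.tasks" (both illustrative): the master publishes a small task message to the queue, and a worker consumes it from the same queue.

```java
import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

// Minimal JMS push-style exchange (broker URL and queue name are illustrative).
public class JmsTaskExchangeSketch {
    public static void main(String[] args) throws JMSException {
        ConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue taskQueue = session.createQueue("ddgrid.tasks");

        // Master side: publish a small task description to the queue.
        MessageProducer producer = session.createProducer(taskQueue);
        producer.send(session.createTextMessage("{\"taskId\":42,\"input\":\"receptor.mol2\"}"));

        // Worker side: consume the task from the queue (blocking up to 5 seconds).
        MessageConsumer consumer = session.createConsumer(taskQueue);
        TextMessage task = (TextMessage) consumer.receive(5000);
        System.out.println("worker got task: " + task.getText());

        connection.close();
    }
}
```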
Master/Worker Pattern implemented in Aviation Grid Application & DDGrid
[Architecture diagram repeated; JobExecute, the worker side, is the focus of the following slides.]
JobExecute
- JobExecute plays the role of the worker in our implementation;
- Because we want to support different underlying grid infrastructures, JobExecute needs to fulfill the following requirements:
  - Clear separation of concerns between the grid infrastructure, the batch execution environment and the batch application;
  - Executing batch processes periodically;
  - Concurrent batch processing: parallel processing of a job;
  - Massively parallel batch processing;
  - Staged, message-driven processing;
  - Manual or scheduled restart after failure;
  - Sequential processing of dependent steps;
Example: JobExecute for GOS
[Pipeline diagram of the JobExecute process for GOS, with stages including Notification, Message Parsing, Verify Task, ToJSDL, Submit and Status; GOS and OpenPBS form the underlying execution environment.]
[SEDA stage diagram: EventHandler, Thread Pool, Dynamic Resource Control]
- Decompose the service into stages separated by queues (a minimal stage sketch follows this slide);
  - Each stage performs a subset of request processing;
  - Stages use lightweight event-driven concurrency internally;
  - Queues make load management explicit;
- Stages contain a thread pool to drive execution;
  - Small number of threads per stage;
  - Dynamic resource control grows/shrinks thread pools with demand;
- Applications implement a simple event handler interface;
  - Applications don't allocate, schedule, or manage threads;
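A minimal sketch of one SEDA-style stage following these ideas (illustrative names, not the project's classes): an incoming event queue, a small thread pool that drains it, and an application-supplied event handler that never touches threads itself.

```java
import java.util.concurrent.*;
import java.util.function.Consumer;

// Minimal SEDA-style stage sketch (illustrative; not the project's actual classes):
// events queue up, a small thread pool drains the queue, and the application
// only supplies the event handler.
public class StageSketch<E> {
    private final BlockingQueue<E> queue = new LinkedBlockingQueue<>();
    private final ExecutorService threads;

    public StageSketch(int threadCount, Consumer<E> handler) {
        this.threads = Executors.newFixedThreadPool(threadCount);
        for (int i = 0; i < threadCount; i++) {
            threads.submit(() -> {
                try {
                    while (true) {
                        handler.accept(queue.take()); // the handler never manages threads itself
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
    }

    /** Called by the previous stage; the explicit queue makes load visible. */
    public void enqueue(E event) {
        queue.add(event);
    }

    public void shutdown() {
        threads.shutdownNow();
    }
}

// Example wiring of two stages (hypothetical helpers): parse the message, then submit the job.
// StageSketch<String> submitStage = new StageSketch<>(2, jsdl -> submitToGos(jsdl));
// StageSketch<String> parseStage  = new StageSketch<>(2, msg -> submitStage.enqueue(toJsdl(msg)));
```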
Agenda
- Background
- Brief introduction to Aviation Grid Application and Drug Discovery Grid
- Planned Deployment of Aviation Grid Application
- Current Deployment of Drug Discovery Grid
- Master/Worker pattern based Grid Application
  - Brief Introduction to Master/Worker Pattern
  - Comparison of Meta-Scheduler and Master/Worker
  - Some famous Master/Worker pattern implementations
  - Master/Worker pattern used in our project
- Lessons learned from the project
Lessons learned from the project
- The Master/Worker pattern is a mature and flexible computing pattern that is widely used in the HPC area.
- Master/Worker is just a computing pattern; in different environments we need to decide how to implement it;
- In our experience from the project, the Master/Worker pattern is especially suitable for data-centric application scenarios;
- Communication between the master node and the worker nodes is flexible, with many different ways to implement it:
  - Condor MW adopts MPI/PVM;
  - Google MapReduce uses a customized RPC mechanism;
  - In our implementation, JMS is adopted, which supports both point-to-point and publish-subscribe communication semantics;
- In our implementation, the worker design is based on SEDA:
  - Configurable to support different kinds of underlying grid middleware;
  - Supports massive concurrent access;
  - Supports dynamic resource control;
Thanks for your attention
Q&A