Using Scientific Workflows in GEON
Efrat Jaeger, Ilkay Altintas
1
Mission of scientific workflow systems
• Promote “scientific discovery” by providing tools and methods to generate scientific workflows
• Create a generic, customizable graphical user interface for scientists from different scientific domains
• Support computational experiment creation, execution, sharing, reuse and provenance
• Design frameworks that define efficient ways to connect to existing data and integrate heterogeneous data from multiple resources
• Large-scale resource sharing and management
• Collaborative and distributed applications
• Gluing it all together on the user’s monitor!

2
Utilizing Kepler in GEON

An extensible, easy-to-use workflow design and prototyping tool

• Integrating heterogeneous local and remote tools in a single interface:
– Web and Grid services
– GIS services
– Legacy application integration via Shell-Command actor
– Remote tools via SSH, SCP and GridFTP
– Relational and spatial databases access
– Reusable generic and domain specific actors
• Support for High Performance Computations:
– Job submission and monitoring
– Logging of execution trace and registering intermediate products
– Data provenance and failure recovery
• Portal accessibility:
– Deployment of workflows to the GEON portal
• Harvesting data and tools from repositories:
– Direct access to data and tools registered to the GEON portal
– A web service harvester
– Storage Resource Broker (SRB)
• Reverse engineering of existing approaches
3
Actor-Oriented Design

• Actor
– Encapsulation of parameterized actions
– Interface defined by ports and parameters
• Port
– Communication between input and output data
– Without call-return semantics
• Composite Actors
– Abstract information
– Sub-workflows
• Model of computation (Director)
– Communication semantics among ports
– Flow of control
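
The ideas on this slide can be illustrated with a minimal Python sketch. It is not Kepler code (Kepler is built in Java on Ptolemy II); the class names `Actor`, `Constant`, `Upper`, `Collect` and `SequentialDirector` are hypothetical and exist only for illustration. Actors expose ports and parameters, tokens are queued on ports rather than returned to a caller, and a separate director decides the order of firing.

```python
from collections import deque


class Actor:
    """Encapsulates a parameterized action; its interface is ports plus parameters."""

    def __init__(self, name, **parameters):
        self.name = name
        self.parameters = parameters
        self.inbox = deque()   # input port: tokens waiting to be consumed
        self.downstream = []   # output port: actors that receive our tokens

    def connect(self, other):
        self.downstream.append(other)

    def send(self, token):
        # No call-return semantics: the token is queued for each receiver
        # and nothing comes back to the sender.
        for receiver in self.downstream:
            receiver.inbox.append(token)

    def fire(self):
        raise NotImplementedError


class Constant(Actor):
    def fire(self):
        self.send(self.parameters["value"])


class Upper(Actor):
    def fire(self):
        self.send(self.inbox.popleft().upper())


class Collect(Actor):
    def __init__(self, name):
        super().__init__(name)
        self.seen = []

    def fire(self):
        self.seen.append(self.inbox.popleft())


class SequentialDirector:
    """Hypothetical director: owns the flow of control and fires each actor once."""

    def run(self, schedule):
        for actor in schedule:
            actor.fire()


source = Constant("source", value="hello world")
shout = Upper("shout")
sink = Collect("sink")
source.connect(shout)
shout.connect(sink)
SequentialDirector().run([source, shout, sink])
print(sink.seen)
```

Swapping in a different director would change how and when actors fire without touching the actors themselves, which is the separation the slide describes.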
4
Workflow Design and Prototyping
Actor Search

Data Search
Vergil is the graphical user interface for Kepler
• Actor ontology and semantic search for actors
• Search -> Drag and drop -> Link via ports
• Metadata-based search for datasets
5
Actor Search
• Kepler Actor Ontology
• Used in searching actors and creating
conceptual views (= folders)
Currently more than 200 Kepler actors added!
6
Data Search and Usage of Results
• Kepler DataGrid
– Discovery of data resources through local and remote services: SRB, Grid and Web Services, DB connections
– Registry of datasets on the fly using workflows
7
Integrating heterogeneous local and remote tools in a single interface

• Generic Web Service Client and Web Service Harvester
• GIS Services
• Legacy Application Integration via Command Line wrapper tools, e.g. GMT
• RDBMS and Spatial Databases Access
• Remote Tools Access via SSH, SCP and GridFTP
• Some Grid actors: Globus Job Runner, GridFTP-based file access, Proxy Certificate Generator
• Generic and domain-oriented actors:
– Classification and interpolation algorithms
– Native R support
– Imaging, Gridding, Vis Support
– Textual and Graphical Output
– …more…
8
Some Features

• Support for High Performance Computations
– Job submission and monitoring
– Logging of execution trace and registering intermediate products
– Data provenance and failure recovery
• Portal accessibility
– Deployment of workflows to the GEON portal
• Harvesting data and tools from repositories
– Direct access to data and tools registered to the GEON portal
– A web service harvester
– Storage Resource Broker (SRB)
9
Some actors in place
10
GEON Workflows Examples
11
GEON Mineral Classification Workflow
An “early” example: classification for naming igneous rocks.
12
GEON Mineral Classifier Workflow
13
PointInPolygon
algorithm
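
The slide only names the algorithm, so the following is a sketch of one common way to do a point-in-polygon test (ray casting), not the classifier's actual implementation. The function name `point_in_polygon` and the sample square are illustrative assumptions. The idea: cast a horizontal ray from the point and count how many polygon edges it crosses; an odd count means the point is inside.

```python
def point_in_polygon(x, y, polygon):
    """Return True if (x, y) lies inside the polygon given as a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


# Illustrative region: a square standing in for one field of a classification diagram.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon(2, 2, square))
print(point_in_polygon(5, 2, square))
```

In a diagram-based classification, each labeled region would be one polygon and a sample's coordinates would be tested against each region in turn.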
14
Enter initial inputs, run and display results
15
Output Visualizers
Browser display
of results
16
Integration Scenario: A-type query

Classifying A-types from an igneous rock database
• Integrating relational and spatial (shapefile) databases to query and interactively display GIS results
• Reusing existing and generic Kepler components (Classifier, JDBC)
Ghulam Memon, Ashraf Memon
17
Reusing The Mineral Classifier
Classification sub-workflow runs for …
… each body, each sample and each diagram
18
Output
19
Extraction of Datasets on the Fly
SQL database access (JDBC)
Translating the query XML response to the XML input format of the worldImage web service.
XML SOAP response
Query the UTEP gravity database for Bouguer anomalies
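
The translation step mentioned on this slide can be pictured as a small XML-to-XML conversion. The sketch below is an assumption-heavy illustration: the element names (`result`, `row`, `LONGITUDE`, `LATITUDE`, `BOUGUER`, `points`, `point`) and the sample values are made up, since the slides do not show the actual query response or the worldImage input schema.

```python
import xml.etree.ElementTree as ET

# Made-up query response; the real response format is not shown in the slides.
query_response = """<result>
  <row><LONGITUDE>-106.5</LONGITUDE><LATITUDE>31.8</LATITUDE><BOUGUER>-150.2</BOUGUER></row>
  <row><LONGITUDE>-106.4</LONGITUDE><LATITUDE>31.9</LATITUDE><BOUGUER>-148.7</BOUGUER></row>
</result>"""


def to_service_input(xml_text):
    """Convert row-oriented query results into a hypothetical point-list input document."""
    rows = ET.fromstring(xml_text)
    points = ET.Element("points")
    for row in rows.findall("row"):
        ET.SubElement(
            points,
            "point",
            x=row.findtext("LONGITUDE"),
            y=row.findtext("LATITUDE"),
            value=row.findtext("BOUGUER"),
        )
    return ET.tostring(points, encoding="unicode")


service_input = to_service_input(query_response)
print(service_input)
```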
20
Extraction of Datasets on the Fly
Translating the query XML response to the XML input format of the worldImage web service.
Creating shapefiles on the fly using ESRI mapping services
XML SOAP response
Displaying an image of the shapefile in a browser interface
21
Image of the
resulting dataset
Sample
22
GEON Dataset Registration
(as in geonSearch)
Annotation form
23
GEON Dataset Registration
ADN metadata
Metadata display
Registering
validation
24
Putting it all together
25
Beach Balls Workflow
GOAL: Integrate seismic focal mechanisms with image services
26
Beach Balls Workflow Output
28
Gravity Modeling Workflow
[Workflow diagram labels: Observed Gravity, Topography, Pluton map, Sediments, Moho, Densities, Interactive 3D model, Difference calculator, Residual Map, Output]
Defining possible depth distribution of plutons
Source: (GEON) Dogan Seber, Randy Keller
29
Kepler as a Modeling Tool: Gravity Modeling Workflow
• Compares synthetic and observed gravity models derived from heterogeneous data sources, creates a residual map of the difference using ESRI services, and displays it in a web browser
• Portrays Kepler as a prototyping tool (“ToDo”)
• Adjustable through parameters
ToDo
Joint work between
SDSC and UTEP.
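
The comparison at the heart of this workflow can be pictured as a cell-by-cell subtraction of two grids. The sketch below assumes both models are already sampled on the same grid and that the residual is observed minus synthetic; the slides do not state the sign convention, and the function name `residual_map` and the sample values are illustrative only.

```python
def residual_map(observed, synthetic):
    """Cell-by-cell difference of two equally shaped grids (lists of rows)."""
    same_shape = len(observed) == len(synthetic) and all(
        len(o_row) == len(s_row) for o_row, s_row in zip(observed, synthetic)
    )
    if not same_shape:
        raise ValueError("grids must have the same shape")
    return [
        [o - s for o, s in zip(o_row, s_row)]
        for o_row, s_row in zip(observed, synthetic)
    ]


# Made-up values, only to show the shape of the computation.
observed = [[10, 12], [8, 9]]
synthetic = [[9, 12], [10, 5]]
print(residual_map(observed, synthetic))
```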
30
Gravity Modeling Workflow
ToDo
31
LiDAR Introduction
Survey -> Point Cloud (x, y, zn, …) -> Process & Classify -> Interpolate / Grid -> Analyze / “Do Science”
R. Haugerud, U.S.G.S.; D. Harding, NASA
32
The Computational Challenge:

• LiDAR generates massive data volumes; billions of returns are common.
• Distributing these volumes of point cloud data to users via the internet is a significant challenge.
• Processing and analyzing these data require significant computing resources not available to most geoscientists.
• Interpolating these data challenges typical GIS / interpolation software.
– Our tests indicate that ArcGIS, Matlab and similar software packages struggle to interpolate even a small portion of these data.
• Traditionally: Popularity > Resources
33
A Three-Tier Architecture

GOAL: Efficient LiDAR interpolation and analysis using GEON infrastructure and tools
• GEON Portal
• Kepler Scientific Workflow System
• GEON Grid
Use scientific workflows to glue/combine different tools and the infrastructure
[Diagram labels: Portal, Grid]
35
Kepler can be used as a batch execution engine


[Diagram labels: Portal, Configuration phase, Monitoring/Translation, Scheduling/Output Processing, Grid. Stages: Subset (DB2 query on DataStar), Analyze, Visualize. Steps: move, process, move, render, display.]
• Interpolate: Grass RST, Grass IDW, GMT…
• Visualize: Global Mapper, Fledermaus, ArcIMS
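
IDW (inverse distance weighting) is one of the interpolation options named on this slide. The sketch below shows the textbook form of the technique in plain Python; it is not the GRASS implementation, and the function name `idw`, the `power` default and the sample points are assumptions for illustration. Each known point contributes to the estimate with a weight of 1 / distance^power.

```python
import math


def idw(points, qx, qy, power=2.0):
    """Estimate z at (qx, qy) from (x, y, z) samples by inverse distance weighting."""
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, z in points:
        distance = math.hypot(qx - x, qy - y)
        if distance == 0.0:
            return z  # the query coincides with a sample point
        weight = 1.0 / distance ** power
        weighted_sum += weight * z
        weight_total += weight
    return weighted_sum / weight_total


# Made-up sample returns (x, y, elevation).
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
print(idw(samples, 1.0, 0.0))
```

With billions of returns, looping over every point for every grid cell is impractical, which is part of why the slides move this step onto cluster resources.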
36
Lidar Processing Workflow (using Fledermaus)
[Diagram. Stages: Subset, Analyze, Visualize. Steps: move, process, move, render, display. Components: IBM DB2, Datastar, NFS mounted disks, Arizona Cluster, Fledermaus (create scene file), iView3D/Browser. Data items: d1, d2 (grid file), sd.]
37
Lidar Processing Workflow (using Global Mapper)
[Diagram. Stages: Subset, Analyze, Visualize. Steps: move, process, move, render, display. Components: IBM DB2, Datastar, NFS mounted disks, Arizona Cluster, Global Mapper (get image for grid file), Browser. Data items: d1, d2 (grid file).]
38
Lidar Processing Workflow (using ArcIMS)
[Diagram. Stages: Subset, Analyze, Visualize. Steps: move, process, move, render, display. Components: IBM DB2, Datastar, NFS mounted disks, Arizona Cluster, ArcInfo, ArcSDE, ArcIMS. Data items: d1, d2 (grid file).]
39
Lidar Workflow Portlet

• User selections from GUI
– Translated into a query and a parameter file
– Uploaded to remote machine
• Workflow description created on the fly
• Workflow response redirected back to portlet
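
A rough sketch of the idea: user selections become a query plus parameters, which are then substituted into a workflow template. Everything concrete here is hypothetical, including the template layout, the table name `LIDAR_POINTS`, the plain SQL range query (the real system issues a DB2 spatial query) and the selection keys; the slides do not show the real formats.

```python
from string import Template

# Hypothetical template; the real workflow description format is not shown here.
WORKFLOW_TEMPLATE = Template(
    "<workflow>\n"
    "  <query>$query</query>\n"
    '  <algorithm name="$algorithm" resolution="$resolution"/>\n'
    "</workflow>"
)


def build_query(bbox, table="LIDAR_POINTS"):
    """Turn a bounding box selection into a simple range query (illustrative only)."""
    min_x, min_y, max_x, max_y = bbox
    return (
        f"SELECT x, y, z FROM {table} "
        f"WHERE x BETWEEN {min_x} AND {max_x} AND y BETWEEN {min_y} AND {max_y}"
    )


def fill_template(selections):
    """Create a workflow description on the fly from the user's GUI selections."""
    return WORKFLOW_TEMPLATE.substitute(
        query=build_query(selections["bbox"]),
        algorithm=selections["algorithm"],
        resolution=selections["resolution"],
    )


description = fill_template(
    {"bbox": (0, 0, 100, 50), "algorithm": "idw", "resolution": 6}
)
print(description)
```

Keeping the template separate from the selections means new algorithms or parameters can be offered in the portlet without rewriting the workflow itself.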
40
LIDAR POST-PROCESSING WORKFLOW PORTLET
[Architecture diagram labels: GEON Portal, KEPLER WORKFLOW, Compute Cluster, DB2, Client/NFS Mounted Disk, ArcSDE, ArcInfo, ArcIMS, Grass Functions, Map Parameters, Parameter xml, Render Map, Create Workflow Description, DB2 Spatial query, raw data (x,y,z and attribute), submit, process output, Map onto the grid (Pegasus), Download data]
Grass surfacing algorithms: Spline, IDW, block mean, …
Formats shown: Binary grid, ASCII grid, Text file, Tiff/Jpeg/Gif
41
Portlet User Interface - Main Page
42
43
Behind the Scenes: Workflow Template
47
Filled Template
48
Example Outputs
49
With Additional Algorithms
50
GLW Monitoring

• Job management
– A unified interface to follow up on the status of submitted jobs
The system
– View job metadata
– Zoom to a specific bounding box location
– Track errors
– Modify a job and resubmit
– View the processing results
– In the future, register desired workflow products (useful for publication)

• GLW is exposed to a high risk of component failures
– Long-running process
– Distributed computational resources under diverse controlling authorities
– Provides transparent/background error handling using provenance data and ‘smart’ reruns
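
One way to picture a ‘smart’ rerun: record each step's result as it completes, and on a rerun skip every step whose result is already recorded. The sketch below is an illustration under that assumption, not the GLW implementation; the function `run_with_provenance`, the step names and the simulated failure are all hypothetical.

```python
def run_with_provenance(steps, provenance):
    """Run (name, action) steps in order, reusing results already recorded in provenance."""
    executed = []
    data = None
    for name, action in steps:
        if name in provenance:
            data = provenance[name]  # reuse the recorded intermediate product
            continue
        data = action(data)
        provenance[name] = data      # record the product before moving on
        executed.append(name)
    return data, executed


attempts = {"render": 0}


def subset(_):
    return [1.0, 2.0, 3.0]


def interpolate(points):
    return sum(points) / len(points)


def render(value):
    attempts["render"] += 1
    if attempts["render"] == 1:
        raise RuntimeError("simulated remote failure")
    return f"map({value})"


steps = [("subset", subset), ("interpolate", interpolate), ("render", render)]
provenance = {}
try:
    run_with_provenance(steps, provenance)  # first run fails at the render step
except RuntimeError:
    pass
result, executed = run_with_provenance(steps, provenance)  # rerun resumes at render
print(result, executed)
```

For long-running jobs spread over machines under different administrators, resuming from the last recorded product avoids repeating expensive subset and interpolation steps.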
51
Examples

• Searching for actors and datasets
– Actor search for ‘gis’
– Data search for ‘volcanic’
• Create a “Hello World!” workflow
– <KEPLER_DIR>/demos/getting-started/04-HelloWorld.xml
• Use of GEON data source and portal search
– Search for ‘Igneous’
• Relational Database Access and Query
– Connect to VT Igneous rocks database:
  Database format: DB2
  URL: jdbc:db2://data.sdsc.geongrid.org:60000/IGNEOUS
  User: readonly
  Passwd: read0n1y
• Web service based workflows
– <KEPLER_DIR>/demos/getting-started/06-WebServicesAndDataTransformation.xml
– Composite actors
• Invoke a remote application – SSH
– ls to a remote directory
• Using various interpolation algorithms
– interpolation actor
– invoking a perl script through ssh
– through a web service
52
Demo
GEON Mineral Classifier Workflow
53
Demo
Atype Workflow
54
Demo
Datasets Extraction and Registration
55
Demo
Beach Balls Workflow
56
Demo
GEON LiDAR Workflow (GLW)
57
BREAK
Before that…
Please download the Kepler getting started guide:
http://www.geongrid.org/CSIG06/notes/Efrat_Jaeger/getting-started-guide.doc
58
Hands-on Exercises
59
Installing and Running Kepler


• Kepler is already installed on your computer in the lab
• To start:
– Double click the ‘kepler-1.0.0beta2’ icon on the desktop
– Or, C:\Tools\Kepler\kepler-1.0.0beta2
• http://www.kepler-project.org
– There’s a link to the latest release on the Kepler website.
– To install it at home!
• Compatible with all platforms
60
Opening and Running a Workflow



• Start Kepler
• Open the “HelloWorld.xml” under the “demos/getting-started” directory in your local Kepler folder
• Two options to run a workflow:
– PLAY BUTTON in the toolbar
– RUNTIME WINDOW from the run menu
61
Modifying an Existing Workflow and Saving It

GOAL: Modify the HelloWorld workflow to display a parameter-based message

Step by step instructions:
• Open the HelloWorld workflow as before
• From the actors search tab, search for Parameter
• Drag and drop the parameter onto the workflow canvas on the right
• Double click the parameter and type your name
• Right click the parameter, select ‘Customize Name’ and type in ‘name’.
• Double click the Constant actor and type the following:
– "Hello " + name
• Save
• Run the workflow
62
Creating a HelloWorld! Workflow (p. 24)

• Open a new blank workflow canvas
– From toolbar: File -> New Workflow -> Blank
• In the Components tab, search for “Constant” and select the Constant actor.
• Drag the Constant actor onto the Workflow canvas
• Configure the Constant actor
– Right-click the actor and select Configure Actor from the menu
– Or, double click the actor
– Type “Hello World” in the value field and click Commit
• In the Components and Data Access area, search for “Display” and select the Display actor found under “Textual Output.”
• Drag the Display actor to the Workflow canvas.
• Connect the output port of the Constant actor to the input port of the Display actor.
• In the Components and Data Access area, select the Components tab, then navigate to the “/Components/Director/” directory.
• Drag the SDF Director to the top of the Workflow canvas.
• Run the model
63
Use of GEON Data Source Search

GOAL: Accessing data registered to the GEON portal

Step by step instructions:
• In the Components and Data Access area, select the Data tab
• Click on ‘Sources’ and mark only the ‘GEON Search QueryInterface’
• Type in the desired search string (e.g., PACES). Make sure that the search string is spelled correctly.
• Click the Search button.
• Drag the ‘PACES Online Gravity Database’ data source onto the workflow canvas
• Right click and select ‘Open Actor’ (shows the database schema)
– Under table select ‘USSTATES_TABLE’
– Under field select ‘STATE’
– Mark ‘include in selection’
– Click ok
• To view the query, double click the actor
• Add a Display actor (from the components tab), connect the ports and add the SDF director (as in the previous example)
• Run the workflow
64
Using Relational Databases in a Workflow

GOAL: Accessing PACES database using a generic database actor

Step by step instructions:
• In the Components and Data Access area, select the components tab
• Search for ‘database’
• Drag ‘Open Database Connection’ and ‘Database Query’ onto the canvas
• Configure ‘Open Database Connection’ with the following parameters:
– Database format: Oracle
– Database URL: jdbc:oracle:thin:@129.108.20.225:1521:PDB1
– Username: geon
– Password: geon
• Connect the output of ‘Open Database Connection’ with the dbcon input port of ‘Database Query’
• Right click and select ‘Open Actor’ (shows the database schema)
– Under table select ‘GEON.USSTATES_TABLE’
– Under field select ‘STATE’
– Mark ‘include in selection’
– Click ok
• To view the query, double click the actor
• Add a Display actor (from the components tab), connect the ports and add the SDF director (as in the previous example)
• Run the workflow
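
The table and field selections in these steps amount to a simple SELECT statement. The sketch below shows that idea against an in-memory SQLite table standing in for the Oracle database; the helper `build_query`, the exact query text and the two sample rows are assumptions, not what Kepler's query builder necessarily generates.

```python
import sqlite3


def build_query(table, fields):
    """Compose the SELECT implied by choosing a table and marking fields for selection."""
    return f"SELECT {', '.join(fields)} FROM {table}"


# In-memory stand-in for the remote database, with placeholder rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE USSTATES_TABLE (STATE TEXT)")
conn.executemany(
    "INSERT INTO USSTATES_TABLE VALUES (?)", [("Texas",), ("New Mexico",)]
)

query = build_query("USSTATES_TABLE", ["STATE"])
states = sorted(row[0] for row in conn.execute(query))
print(query)
print(states)
conn.close()
```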
65
Creating Web Service Workflows


GOAL: Accessing CA Traffic Conditions

Step by step instructions:
• In the Components and Data Access area, select the components tab
• Search for ‘web service’
• Drag ‘Web Service Actor’ onto the canvas
• Double click the actor, enter http://www.xmethods.net/sd/2001/CATrafficService.wsdl, commit
• Double click the actor again, select ‘getTraffic’ as method name, commit
• Search for ‘String Constant’ in the components tab. Drag and drop ‘String Constant’ onto the workflow canvas
• Connect the ‘String Constant’ output with the ‘Web Service Actor’ input
• Add a ‘Display’ and connect its input with the ‘Web Service Actor’ output
• Add the SDF director
• Double click the ‘String Constant’, type a CA hwy number and commit
• Run the workflow
66
SSH Actor and Including Existing Scripts in a Workflow

Step by step instructions:
• In the Components and Data Access area, select the components tab
• Search for ‘ssh’
• Drag ‘SSH To Execute’ onto the canvas
• Double click the actor,
– Type in a remote host you have access to.
– Type in your username
• Search for ‘String Constant’ in the components tab. Drag and drop ‘String Constant’ onto the workflow canvas
• Double click the ‘String Constant’, type ‘ls’ and commit
• Connect the ‘String Constant’ output with the ‘SSH To Execute’ command input (lowest)
• Add a ‘Display’ and connect its input with the ‘SSH To Execute’ stdout output (top)
• Add the SDF director
• Run the workflow
• If you have a script deployed on the server, you can replace the ‘ls’ command to invoke the script.
– e.g., perl tmp.pl …
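
Outside Kepler, the same pattern (a command string in, standard output out) can be sketched with the system `ssh` client. This is only an analogy to the actor, assuming `ssh` is installed and key-based login is configured; the user `alice` and host `example.org` are placeholders, and `run_remote` is defined but not called here because it needs a reachable host.

```python
import shlex
import subprocess


def ssh_command(user, host, remote_command):
    """Build the argument list for running one command on a remote host."""
    return ["ssh", f"{user}@{host}", remote_command]


def run_remote(user, host, remote_command):
    """Run the command remotely and return its standard output as text."""
    completed = subprocess.run(
        ssh_command(user, host, remote_command),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


# Placeholder user and host; swap 'ls' for a script invocation to run your own code.
argv = ssh_command("alice", "example.org", "ls")
print(shlex.join(argv))
```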
67
Using Various Displays
• Open the “03-ImageDisplay.xml” under the “demos/getting-started” directory in your local Kepler folder
• Run the workflow
• Search for ‘browser’ in the components tab
• Drag and drop ‘Browser Display’ onto the canvas
• Replace ‘ImageJ’ with ‘Browser Display’ (connect the Image Converter output to the ‘Browser Display’ inputURL)
• Run the workflow again
• Replace ‘Browser Display’ with a textual ‘Display’
• Run the workflow

68
Questions
Thanks!
Efrat Jaeger-Frank
Ilkay Altintas
http://www.sdsc.edu
http://kepler-project.org
69