Using Scientific Workflows in GEON
Efrat Jaeger, Ilkay Altintas
1
Mission of scientific workflow systems
Promote “scientific discovery” by providing tools and
methods to generate scientific workflows
Create a generic customizable graphical user interface
for scientists from different scientific domains
Support computational experiment creation, execution,
sharing, reuse and provenance
Design frameworks which define efficient ways to
connect to the existing data and integrate
heterogeneous data from multiple resources
Large scale resource sharing and management
Collaborative and distributed applications
Gluing it all together to user’s monitor!!!
2
Utilizing Kepler in GEON
An extensible, easy to use, workflow design and prototyping tool
Integrating heterogeneous local and remote tools in a single interface:
• Web and Grid services
• GIS services
• Legacy application integration via Shell-Command actor
• Remote tools via SSH, SCP and GridFTP
• Relational and spatial databases access
• Reusable generic and domain specific actors
Support for High Performance Computations:
• Job submission and monitoring
• Logging of execution trace and registering intermediate products
• Data provenance and failure recovery
• Portal accessibility
Deployment of workflows to the GEON portal
Harvesting data and tools from repositories:
• Direct access to data and tools registered to the GEON portal
• A web service harvester
• Storage Resource Broker (SRB)
Reverse engineering of existing approaches
3
Actor-Oriented Design
Actor
• Encapsulation of parameterized actions
• Interface defined by ports and parameters
Port
• Communication between input and output data
• Without call-return semantics
Composite Actors
• Sub-workflows
• Abstract information
Model of computation (Director)
• Communication semantics among ports
• Flow of control
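The actor, port and director roles on this slide can be illustrated with a toy sketch. This is not Kepler or Ptolemy code: the class names, the single input and output port per actor, and the `run_in_order` director are simplifying assumptions made only for illustration.

```python
from collections import deque


class Port:
    """One-way channel: tokens are queued and nothing is returned to the sender."""

    def __init__(self):
        self.queue = deque()

    def put(self, token):
        self.queue.append(token)

    def get(self):
        return self.queue.popleft()


class Actor:
    """Encapsulates a parameterized action behind one input and one output port."""

    def __init__(self, **parameters):
        self.parameters = parameters
        self.input = Port()
        self.output = Port()

    def fire(self):
        raise NotImplementedError


class Constant(Actor):
    def fire(self):
        # Emit the configured value as a token.
        self.output.put(self.parameters["value"])


class Scale(Actor):
    def fire(self):
        # Consume one token and emit it multiplied by the 'factor' parameter.
        self.output.put(self.input.get() * self.parameters["factor"])


class Collect(Actor):
    def __init__(self):
        super().__init__()
        self.seen = []

    def fire(self):
        self.seen.append(self.input.get())


def run_in_order(actors):
    """Toy director: fire each actor once, then move its tokens downstream."""
    for upstream, downstream in zip(actors, actors[1:] + [None]):
        upstream.fire()
        while downstream is not None and upstream.output.queue:
            downstream.input.put(upstream.output.get())


sink = Collect()
run_in_order([Constant(value=21), Scale(factor=2), sink])
print(sink.seen)  # [42]
```

The actors never call each other; only the director decides when tokens move and when an actor fires, which is the separation the slide describes.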
4
Workflow Design and Prototyping
Actor Search
Data Search
Vergil is the graphical user interface for Kepler
Actor ontology and semantic search for actors
Search -> Drag and drop -> Link via ports
Metadata-based search for datasets
5
Actor Search
• Kepler Actor Ontology
• Used in searching actors and creating
conceptual views (= folders)
Currently more than 200 Kepler actors added!
6
Data Search and Usage of Results
• Kepler DataGrid
– Discovery of data resources through local and remote services: SRB, Grid and Web Services, DB connections
– Registry of datasets on the fly using workflows
7
Integrating heterogeneous local and remote
tools in a single interface
Generic Web Service Client and Web Service Harvester
GIS Services
Legacy Application Integration via Command Line
wrapper tools, e.g. GMT
RDBMS and Spatial Databases Access
Remote Tools Access via SSH, SCP and GridFTP
Some Grid actors: Globus Job Runner, GridFTP-based
file access, Proxy Certificate Generator
Generic and domain-oriented actors:
Classification and interpolation algorithms
Native R support
Imaging, Gridding, Vis Support
Textual and Graphical Output
…more …
8
Some Features
Support for High Performance Computations
Job submission and monitoring
Logging of execution trace and registering intermediate
products
Data provenance and failure recovery
Portal accessibility
Deployment of workflows to the GEON portal
Harvesting data and tools from repositories
Direct access to data and tools registered to the GEON portal
A web service harvester
Storage Resource Broker (SRB)
9
Some actors in place
10
GEON Workflows Examples
11
GEON Mineral Classification Workflow
An “early” example:
Classification for
naming Igneous Rocks.
12
GEON Mineral Classifier Workflow
13
PointInPolygon
algorithm
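The slide names the PointInPolygon step without showing it. A minimal sketch of the idea, assuming a classification diagram is a set of polygonal regions and a sample's composition is a point, is the standard ray-casting test below. The function name and the example region are hypothetical, and the GEON actor may be implemented differently.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical square region of a classification diagram.
region = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon(2, 2, region))  # True
print(point_in_polygon(5, 2, region))  # False
```

An odd number of crossings means the point lies inside the region, so the sample would take that region's name.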
14
Enter initial inputs, run, and display results
15
Output Visualizers
Browser display
of results
16
Integration Scenario: A-type query
Classifying A-types from an Igneous rock database
Integrating relational and spatial (shapefile) databases to query
and interactively display GIS results
Reusing existing and generic Kepler components (Classifier, JDBC)
Ghulam Memon, Ashraf Memon
17
Reusing The Mineral Classifier
Classification sub-workflow runs for …
… each body, each sample and each diagram
18
Output
19
Extraction of Datasets on the Fly
SQL database access (JDBC)
Translating query xml
response to web service worldImage
xml input format.
XML SOAP response
Query the UTEP
gravity database for
bouguer anomalies
20
Extraction of Datasets on the Fly
Translating query xml
response to web service worldImage
xml input format.
Creating shapefiles
on the fly using ESRI
mapping services
XML SOAP response
Displaying an image
of the shapefile on
a browser interface
21
Image of the
resulting dataset
Sample
22
GEON Dataset Registration
(as in geonSearch)
Annotation form
23
GEON Dataset Registration
ADN metadata
Metadata display
Registering
validation
24
Putting it all together
25
Beach Balls Workflow
GOAL: Integrate seismic focal mechanisms with image services
26
Beach Balls Workflow Output
28
Gravity Modeling Workflow
[Figure: gravity modeling workflow. Labels: Observed Gravity, Topography, Pluton map, Sediments, Moho, Densities, Difference calculator, Residual Map, Interactive 3D model, Output]
Defining possible depth distribution of plutons
Source: (GEON) Dogan Seber, Randy Keller
29
Kepler as a Modeling Tool: Gravity Modeling Workflow
• Compares synthetic and observed gravity models from heterogeneous data sources, creates a residual map of the difference using ESRI services, and displays it in a web browser
• Portrays Kepler as a prototyping tool (“ToDo”)
• Parameters are adjustable
ToDo
Joint work between
SDSC and UTEP.
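The difference step in this workflow can be sketched as a cell-by-cell subtraction of two grids. This is only an illustration: the sign convention (observed minus synthetic), the function name and the sample values are assumptions, not taken from the GEON workflow.

```python
def residual_grid(observed, synthetic):
    """Return observed - synthetic for two grids of identical shape."""
    if len(observed) != len(synthetic) or any(
        len(o_row) != len(s_row) for o_row, s_row in zip(observed, synthetic)
    ):
        raise ValueError("grids must have the same shape")
    return [
        [o - s for o, s in zip(o_row, s_row)]
        for o_row, s_row in zip(observed, synthetic)
    ]


# Made-up gravity values on a 2 x 2 grid, for illustration only.
observed = [[10.0, 12.5], [9.0, 11.0]]
synthetic = [[9.5, 12.0], [9.5, 10.0]]
print(residual_grid(observed, synthetic))  # [[0.5, 0.5], [-0.5, 1.0]]
```

Changing a model parameter such as a density and recomputing the synthetic grid would change the residual, which is what makes the workflow useful as an adjustable prototype.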
30
Gravity Modeling Workflow
ToDo
31
LiDAR Introduction
Survey -> Point Cloud (x, y, zn, …) -> Process & Classify -> Interpolate / Grid -> Analyze / “Do Science”
Image credits: R. Haugerud, U.S.G.S; D. Harding, NASA
32
The Computational Challenge:
LiDAR generates massive data volumes - billions of returns are
common.
Distribution of these volumes of point cloud data to users via
the internet represents a significant challenge.
Processing and analysis of these data require significant
computing resources not available to most geoscientists.
Interpolation of these data challenges typical GIS / interpolation
software.
Our tests indicate that ArcGIS, Matlab and similar software
packages struggle to interpolate even a small portion of
these data.
Traditionally: Popularity > Resources
33
A Three-Tier Architecture
GOAL: Efficient LiDAR interpolation
and analysis using GEON
infrastructure and tools
Portal
GEON Portal
Kepler Scientific Workflow System
GEON Grid
Use scientific workflows to
glue/combine different tools and the
infrastructure
Grid
35
Kepler can be used as a batch execution engine
Portal
Configuration phase
Subset: DB2 query on DataStar
Monitoring/
Translation
Analyze
Visualize
Subset
move
process
move
render
display
• Interpolate: Grass RST, Grass IDW, GMT…
• Visualize: Global Mapper, FlederMaus, ArcIMS
Scheduling/
Output
Processing
Grid
36
Lidar Processing Workflow (using Fledermaus)
[Figure: Subset -> Analyze -> Visualize (move, process, move, render, display). Labels: Datastar, IBM DB2, NFS Mounted Disk, d1, d2 (grid file), Arizona Cluster, Fledermaus, Create Scene file (sd), iView3D/Browser]
37
Lidar Processing Workflow (using Global Mapper)
[Figure: Subset -> Analyze -> Visualize (move, process, move, render, display). Labels: Datastar, IBM DB2, NFS Mounted Disk, d1, d2 (grid file), Arizona Cluster, Global Mapper, Get image for grid file, Browser]
38
Lidar Processing Workflow (using ArcIMS)
[Figure: Subset -> Analyze -> Visualize (move, process, move, render, display). Labels: Datastar, IBM DB2, NFS Mounted Disk, d1, d2 (grid file), Arizona Cluster, ArcInfo, ArcSDE, ArcIMS]
39
Lidar Workflow Portlet
User selections from GUI
Translated into a query and a parameter file
Uploaded to remote machine
Workflow description created on the fly
Workflow response redirected back to portlet
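The portlet steps above (selections, then a query and parameters, then a workflow description created on the fly) can be sketched as simple template filling. Everything specific here is hypothetical: the XML layout is a placeholder rather than Kepler's actual workflow format, and the table name `lidar_points`, the parameter names and the sample selections are invented for illustration.

```python
from string import Template
from xml.sax.saxutils import escape

# Placeholder template; the real workflow template is not shown in the slides.
WORKFLOW_TEMPLATE = Template(
    "<workflow>\n"
    "  <query>$query</query>\n"
    "  <algorithm>$algorithm</algorithm>\n"
    "  <resolution>$resolution</resolution>\n"
    "</workflow>"
)


def build_query(xmin, ymin, xmax, ymax):
    """Translate a bounding box selection into a spatial subset query."""
    return (
        "SELECT x, y, z FROM lidar_points "
        f"WHERE x BETWEEN {float(xmin)} AND {float(xmax)} "
        f"AND y BETWEEN {float(ymin)} AND {float(ymax)}"
    )


def fill_template(selections):
    """Fill the workflow template from the user's GUI selections."""
    query = build_query(*selections["bbox"])
    return WORKFLOW_TEMPLATE.substitute(
        query=escape(query),
        algorithm=escape(selections["algorithm"]),
        resolution=escape(str(selections["resolution"])),
    )


selections = {"bbox": (0, 0, 100, 50), "algorithm": "idw", "resolution": 6}
description = fill_template(selections)
print(description)
```

`Template.substitute` raises `KeyError` when a placeholder is left unfilled, so an incomplete selection fails before anything is uploaded to the remote machine.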
40
LIDAR POST-PROCESSING WORKFLOW PORTLET
[Figure: LiDAR post-processing workflow portlet. Labels: GEON Portal; Client / NFS Mounted Disk; DB2 spatial query (x, y, z and attribute; raw data); Parameter xml; Create Workflow Description; submit; KEPLER WORKFLOW; Map onto the grid (Pegasus); Compute Cluster; Grass surfacing algorithms: Spline, IDW, block mean, …; process output: Binary grid, ASCII grid, Text file, Tiff/Jpeg/Gif; ArcSDE, ArcInfo, ArcIMS; Render Map (Map Parameters, Grass Functions); Download data]
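Of the surfacing algorithms named on this slide, IDW (inverse distance weighting) is the simplest to sketch. The version below is a plain textbook form that weights every input point. The function name and sample points are hypothetical, and the Grass implementation used by the workflow may differ, for example by limiting the search to nearby points.

```python
import math


def idw(points, x, y, power=2.0):
    """Estimate z at (x, y) as a distance-weighted mean of (px, py, pz) samples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for px, py, pz in points:
        distance = math.hypot(x - px, y - py)
        if distance == 0.0:
            # The query location coincides with a sample: return it directly.
            return pz
        weight = 1.0 / distance ** power
        weighted_sum += weight * pz
        weight_total += weight
    return weighted_sum / weight_total


# Two made-up LiDAR returns; the midpoint is weighted equally by both.
returns = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
print(idw(returns, 1.0, 0.0))  # 15.0
```

Each grid cell needs a pass over the candidate points, which hints at why interpolating billions of returns is pushed to a compute cluster.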
41
Portlet User Interface - Main Page
42
43
Behind the Scenes: Workflow Template
47
Filled Template
48
Example Outputs
49
With Additional Algorithms
50
GLW Monitoring
Job management
A unified interface to follow up on the status of submitted jobs
The system
View job metadata
Zoom to a specific bounding box location
Track errors
Modify a job and re-submit
View the processing results
In the future, register desired workflow products
Useful for publication
GLW is exposed to a high risk of component failures
Long running process
Distributed computational resources under diverse controlling
authorities
Provides transparent/background error handling using
provenance data and ‘smart’ reruns
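One way to read a ‘smart’ rerun is that steps already recorded as complete in the provenance data are skipped when a failed job is resubmitted. The sketch below illustrates only that idea. The step names, the dictionary used as a provenance store and the simulated failure are assumptions, not the GLW implementation.

```python
def run_with_provenance(steps, provenance):
    """Run (name, action) steps in order, skipping any already recorded."""
    executed = []
    for name, action in steps:
        if name in provenance:
            continue  # Result is already known from an earlier run.
        provenance[name] = action()
        executed.append(name)
    return executed


attempts = {"interpolate": 0}


def flaky_interpolate():
    """Simulated step that fails on its first attempt only."""
    attempts["interpolate"] += 1
    if attempts["interpolate"] == 1:
        raise RuntimeError("compute node unavailable")
    return "grid"


steps = [
    ("subset", lambda: "points"),
    ("interpolate", flaky_interpolate),
    ("render", lambda: "map"),
]

provenance = {}
try:
    run_with_provenance(steps, provenance)
except RuntimeError:
    pass  # The first run stops at the failing step.

rerun = run_with_provenance(steps, provenance)
print(rerun)  # ['interpolate', 'render']
```

The rerun repeats only the failed step and what follows it, which matters when the skipped step is a long-running subset query.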
51
Examples
Searching for actors and datasets
Create a “Hello World!” workflow
<KEPLER_DIR>/demos/getting-started/04-HelloWorld.xml
Use of GEON data source and portal search
Actor search for ‘gis’
Data search for ‘volcanic’
Search for ‘Igneous’
Relational Database Access and Query
Connect to VT Igneous rocks database:
• Database format: DB2
• URL: jdbc:db2://data.sdsc.geongrid.org:60000/IGNEOUS
• User: readonly
• Passwd: read0n1y
Web service based workflows
<KEPLER_DIR>/demos/getting-started/06-WebServicesAndDataTransformation.xml
Invoke a remote application – SSH
ls to a remote directory
Composite actors
Using various interpolation algorithms
interpolation actor
invoking a perl script through ssh
through a web service
52
Demo
GEON Mineral Classifier Workflow
53
Demo
Atype Workflow
54
Demo
Datasets Extraction and Registration
55
Demo
Beach Balls Workflow
56
Demo
GEON LiDAR Workflow (GLW)
57
BREAK
Before that…
Please download the Kepler getting started guide:
http://www.geongrid.org/CSIG06/notes/Efrat_Jaeger/getting-started-guide.doc
58
Hands-on Exercises
59
Installing and Running Kepler
Kepler is already installed on your computer in the lab
To start:
Double click the ‘kepler-1.0.0beta2’ icon on the desktop
Or, C:\Tools\Kepler\kepler-1.0.0beta2
To install it at home:
http://www.kepler-project.org
There’s a link to the latest release on the Kepler website.
Compatible with all platforms
60
Opening and Running a Workflow
Start Kepler
Open the “HelloWorld.xml” under the “demos/getting-started” directory in your local Kepler folder
Two options to run a workflow:
PLAY BUTTON in the toolbar
RUNTIME WINDOW from the run menu
61
Modifying an Existing Workflow and Saving It
GOAL: Modify the HelloWorld workflow to display a
parameter-based message
Step by step instructions:
Open the HelloWorld workflow as before
From actors search tab, search for Parameter
Drag and drop the parameter to the workflow canvas on the
right
Double click the parameter and type your name
Right click the parameter and select ‘Customize Name’, type in
‘name’.
Double click the Constant actor and type the following:
• "Hello " + name
Save
Run the workflow
62
Creating a HelloWorld! Workflow (p. 24)
Open a new blank workflow canvas
• From toolbar: File -> New Workflow -> Blank
In the Components tab, search for “Constant” and select the
Constant actor.
Drag the Constant actor onto the Workflow canvas
Configure the Constant actor
• Right-click the actor and select Configure Actor from the menu
• Or, double click the actor
Type “Hello World” in the value field and click Commit
In the Components and Data Access area, search for “Display”
and select the Display actor found under “Textual Output.”
Drag the Display actor to the Workflow canvas.
Connect the output port of the Constant actor to the input port of
the Display actor.
In the Components and Data Access area, select the Components
tab, then navigate to the “/Components/Director/” directory.
Drag the SDF Director to the top of the Workflow canvas.
Run the model
63
Use of GEON Data Source Search
GOAL: Accessing data registered to the GEON portal
Step by step instructions:
In the Components and Data Access area, select the Data tab
Click on ‘Sources’ and mark only the ‘GEON Search QueryInterface’
Type in the desired search string (e.g., PACES). Make sure that the
search string is spelled correctly.
Click the Search button.
Drag the ‘PACES Online Gravity Database’ data source onto the
workflow canvas
Right click and select ‘Open Actor’ (shows the database schema)
• Under table select ‘USSTATES_TABLE’
• Under field select ‘STATE’
• Mark ‘include in selection’
• Click ok
To view query, double click the actor
Add a Display actor (from the Components tab), connect the ports, and add the SDF
director (as in the previous example)
Run the workflow
64
Using Relational Databases in a Workflow
GOAL: Accessing PACES database using a generic database actor
Step by step instructions:
In the Components and Data Access area, select the components tab
Search for ‘database’
Drag ‘Open Database Connection’ and ‘Database Query’ onto the canvas
Configure ‘Open Database Connection’ with the following parameters:
• Database format: Oracle
• Database URL: jdbc:oracle:thin:@129.108.20.225:1521:PDB1
• Username: geon
• Password: geon
Connect the output of ‘Open Database Connection’ with the dbcon input port of
‘Database Query’
Right click and select ‘Open Actor’ (shows the database schema)
• Under table select ‘GEON.USSTATES_TABLE’
• Under field select ‘STATE’
• Mark ‘include in selection’
• Click ok
To view query, double click the actor
Add a Display actor (from the Components tab), connect the ports, and add the SDF director (as in the
previous example)
Run the workflow
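Selecting the STATE field of USSTATES_TABLE in the query builder amounts to a one-column SELECT. The sketch below shows that query against an in-memory SQLite table standing in for the remote database. The stand-in table, its two sample rows and the ORDER BY clause are assumptions; the real PACES schema and contents are not shown in the slides.

```python
import sqlite3

# In-memory stand-in for the remote database (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE USSTATES_TABLE (STATE TEXT)")
conn.executemany(
    "INSERT INTO USSTATES_TABLE (STATE) VALUES (?)",
    [("Texas",), ("New Mexico",)],  # Made-up sample rows.
)

# The kind of query the exercise builds: one table, one selected field.
rows = [row[0] for row in conn.execute(
    "SELECT STATE FROM USSTATES_TABLE ORDER BY STATE"
)]
conn.close()

print(rows)  # ['New Mexico', 'Texas']
```

In the workflow, the Display actor plays the role of the final `print`, showing whatever the query actor emits on its output port.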
65
Creating Web Service Workflows
GOAL: Accessing CA Traffic Conditions
Step by step instructions:
In the Components and Data Access area, select the components tab
Search for ‘web service’
Drag ‘Web Service Actor’ onto the canvas
Double click the actor, enter
http://www.xmethods.net/sd/2001/CATrafficService.wsdl, commit
Double click the actor again, select ‘getTraffic’ as method name, commit
Search for ‘String Constant’ in the components tab. Drag and drop
‘String Constant’ onto workflow canvas
Connect ‘String Constant’ output with the ‘Web Service Actor’ input
Add a ‘Display’ and connect its input with the ‘Web Service Actor’
output
Add the SDF director
Double click the ‘String Constant’, type a CA hwy number and commit
Run the workflow
66
SSH Actor and Including Existing Scripts in a Workflow
Step by step instructions:
In the Components and Data Access area, select the components tab
Search for ‘ssh’
Drag ‘SSH To Execute’ onto the canvas
Double click the actor,
• Type in a remote host you have access to.
• Type in your username
Search for ‘String Constant’ in the components tab. Drag and drop
‘String Constant’ onto workflow canvas
Double click the ‘String Constant’, type ‘ls’ and commit
Connect ‘String Constant’ output with the ‘SSH To Execute’ command
input (lowest)
Add a ‘Display’ and connect its input with the ‘SSH To Execute’ stdout
output (top)
Add the SDF director
Run the workflow
If you have a script deployed on the server, you can replace the ‘ls’
command with one that invokes the script.
• e.g., perl tmp.pl …
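Outside Kepler, the same pattern (a constant command string sent to a remote host, with stdout captured for display) can be sketched with the system `ssh` client. This assumes an `ssh` executable on the PATH and non-interactive authentication. The user and host below are hypothetical placeholders, and `run_remote` is defined but not called here because it needs a reachable server.

```python
import subprocess


def build_ssh_command(user, host, remote_command):
    """Build the argument list for running one command on a remote host."""
    return ["ssh", f"{user}@{host}", remote_command]


def run_remote(user, host, remote_command):
    """Run the command remotely and return its stdout as text."""
    result = subprocess.run(
        build_ssh_command(user, host, remote_command),
        capture_output=True,
        text=True,
        check=True,  # Raise CalledProcessError on a non-zero exit status.
    )
    return result.stdout


# Hypothetical user and host; swap 'ls' for a script invocation as needed.
command = build_ssh_command("alice", "example.org", "ls")
print(command)  # ['ssh', 'alice@example.org', 'ls']
```

Passing the arguments as a list avoids local shell quoting issues; the remote command string is still interpreted by the shell on the server.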
67
Using Various Displays
Open the “03-ImageDisplay.xml” under the
“demos/getting-started” directory in your local Kepler
folder
Run the workflow
Search for ‘browser’ in the components tab
Drag and drop ‘Browser Display’ onto the canvas
Replace ‘ImageJ’ with ‘Browser Display’ (connect Image
Converter output to ‘Browser Display’ inputURL)
Run workflow again
Replace ‘Browser Display’ with a textual ‘Display’
Run workflow
68
Questions
Thanks!
Efrat Jaeger-Frank
Ilkay Altintas
http://www.sdsc.edu
http://kepler-project.org
69