An Introduction to Designing,
Executing and Sharing Workflows
with Taverna
Nowgen, Next Gen Workshop
17/01/2012
Exercise 1: Exploring the Workbench


Taverna can be downloaded from
http://www.taverna.org.uk/
Go to the page and find the latest version (2.3)
Today, Taverna has been installed for you. You can find
it in the program menu
The following page shows a screenshot of Taverna and
the different panels that make up the workbench
Screenshot: Taverna Workbench (Services Panel, Workflow Explorer, Workflow Diagram)
1. Workflow Diagram



The workflow diagram is the visual representation of the
workflow. It:
Shows inputs, outputs, services and data flows
Allows editing of the workflow by dragging and dropping
and connecting services together
Enables saving of workflow diagrams for publishing and
sharing
1. Workflow Explorer


The Workflow Explorer shows the detailed view of your
workflow. It shows default values and descriptions for
service inputs and outputs and it shows where remote
services are located. It also shows configuration details,
such as iteration and looping
Workflow validation details can also be found here.
Before a workflow is run, Taverna checks to see if it is
connected correctly and if its services are available.
1. Available Services Panel
Lists services available by default in Taverna







Local Java services
WSDL Web Services – secure and public
RESTful Services
R Processor services (for statistical analyses)
Beanshell scripts
XPath scripts
Spreadsheet import service
The services panel also allows you to add new services
or workflows from the web or from file systems – there
are loads more available!
1. Finding and Using Additional
Resources
Galaxy executes a collection of ‘built-in’ tools. There are
lots available, but eventually, you will need to access
tools and resources outside of Galaxy.
Taverna can make use of any arbitrary web services
(WSDL or REST), so you can use Taverna workflows to
extend your analyses into new areas
In the next few exercises, we will use Taverna to explore
which pathways our genes affect, the functions and
locations of those genes in the cell and literature
searches of the displayed phenotype
To do this, we will use tools and data from different
resources, such as KEGG, Gene Ontology and PubMed
Exercise 2: Building a Simple Workflow
As in the Galaxy exercise, we will begin by running just one service.


Go to the Services Panel and type ‘pathway’ into the
search box at the top
You will see several services in the search results
Select ‘get_pathways_by_genes’. This service returns the
KEGG pathways that contain the genes you supply
Drag this service
across to the workflow
explorer panel
2: Building a Simple Workflow



In a blank space in the workflow
diagram, right-click and select “Add
Workflow Input Port”
Type in a name for this input (e.g. ID)
and click “ok”
Do the same to create a new
workflow output. Call this output
“pathways”
2: Building a Simple Workflow


You now have 3 boxes in the diagram and we need to
connect them up
Click on the input box and drag towards
“get_pathways_by_genes” and let go. An arrow will
connect the two boxes
2: Building a Simple Workflow



Click on the output box,
drag towards
“get_pathways_by_genes”, and let go. An
arrow will connect the two
boxes
You have now built your
first workflow!
It should look something
like this
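The three connected boxes can also be described as a small data structure. The following is a minimal Python sketch, not Taverna's own workflow format or validation code: the `Workflow` class and its `unconnected` method are hypothetical names introduced here, and the check is a simplified stand-in for the validation the workbench performs before a run.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Workflow:
    """Hypothetical model of a workflow: named boxes plus data links."""
    inputs: Set[str] = field(default_factory=set)
    outputs: Set[str] = field(default_factory=set)
    services: Set[str] = field(default_factory=set)
    links: List[Tuple[str, str]] = field(default_factory=list)  # (source, sink)

    def unconnected(self) -> List[str]:
        """Return the boxes that are missing a data link."""
        sources = {src for src, _ in self.links}
        sinks = {dst for _, dst in self.links}
        problems = []
        # A workflow input should feed something.
        problems += [n for n in sorted(self.inputs) if n not in sources]
        # A workflow output should receive something.
        problems += [n for n in sorted(self.outputs) if n not in sinks]
        # A service should take part in at least one link.
        problems += [n for n in sorted(self.services)
                     if n not in sources and n not in sinks]
        return problems


# The workflow built in this exercise: ID -> get_pathways_by_genes -> pathways
first_workflow = Workflow(
    inputs={"ID"},
    outputs={"pathways"},
    services={"get_pathways_by_genes"},
    links=[("ID", "get_pathways_by_genes"),
           ("get_pathways_by_genes", "pathways")],
)
print(first_workflow.unconnected())  # an empty list means every box is linked
```

If the second arrow were missing, the same check would report the `pathways` output as unconnected.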
2: Building a Simple Workflow

Run the workflow by selecting “File -> Run workflow”, or by
clicking on the play button at the top of the workbench
2: Building a Simple Workflow
An input window will appear. As you can see, we have not yet
added a description of the workflow or of the input
Click on ‘New Value’ in the input window and add a KEGG Gene identifier
(e.g. mmu:13163) where it says “some input data goes here”
2: Building a Simple Workflow






Click “run workflow”
You will automatically be switched to the ‘Results’ window
In the bottom left of the results window, click on the results.
You will see some pathway identifiers. These are good for
computers, but not for humans. We need pathway
descriptions to properly examine the results
Switch back to the ‘Design’ window using the tab at the top
of the workbench
In the service panel, search for another KEGG service,
called ‘btit’.
Drag and drop it into the same workflow
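For comparison, the same gene-to-pathway lookup can be sketched outside Taverna. The example below is a minimal Python sketch that assumes the public KEGG REST interface at `https://rest.kegg.jp` and its `link` operation, which is a different interface from the service used in this exercise. The function names are hypothetical, and the response layout (one tab-separated `gene<TAB>pathway` pair per line) is an assumption to check against the KEGG documentation.

```python
from typing import List
from urllib.parse import quote
from urllib.request import urlopen

KEGG_REST = "https://rest.kegg.jp"  # assumption: public KEGG REST endpoint


def link_url(gene_id: str) -> str:
    """Build the URL that asks KEGG for the pathways linked to one gene."""
    return f"{KEGG_REST}/link/pathway/{quote(gene_id, safe=':')}"


def parse_link_response(text: str) -> List[str]:
    """Pull the pathway identifier out of each 'gene<TAB>pathway' line."""
    pathways = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) == 2 and fields[1]:
            pathways.append(fields[1])
    return pathways


def pathways_for_gene(gene_id: str) -> List[str]:
    """Fetch and parse the pathway list (requires network access)."""
    with urlopen(link_url(gene_id)) as response:
        return parse_link_response(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Same example identifier as in the exercise.
    print(link_url("mmu:13163"))
```

As in the workflow, the result is a list of identifiers only, so a second lookup is still needed to turn them into readable pathway descriptions.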
2: Building a Simple Workflow





Connect it to the input ‘ID’ and create a new output called
‘pathway_description’ and connect it to that
Re-run the workflow and look at the pathway descriptions
A list of pathways and their descriptions is useful, but it
would be easier to visualise diagrams of the whole
pathways
We also need to extract and analyse each gene from the
gene list generated in the Galaxy exercise
For both these tasks we will find and use workflows from
myExperiment
Exercise 3: Re-using workflows from
myExperiment



Go to http://www.myexperiment.org and click on ‘find
workflows’
You will see a list of the most viewed and downloaded
workflows. See what the most popular workflow does by
reading the description
Change the rank to ‘Latest’ and see what has been
uploaded in the last few weeks
3: Re-using workflows from myExperiment





Find the workflow called “geneIDs to Kegg Pathway
Images” and look at the workflow entry page
Download the workflow by clicking on the link: “Download
Workflow File/Package (T2FLOW)”
Open the workflow in Taverna by going to ‘File -> Open
Workflow’
Run the workflow using the example values supplied by
the workflow creator (Hint: when you run the workflow,
the example values will be added by default in the input
window)
Look at the workflow output – now you will see pathway
diagrams
Exercise 4: Combining workflows from
myExperiment
To analyse all the genes from our study, we must export
and extract the relevant data from the Galaxy history
Go to your Galaxy / Cistrome history and download the file:
“List of Genes near peak summits”
Open the file in Excel
For this part, we only need the list of genes in column D
(ignoring the header lines)
Save the file with a .csv extension
If you can’t find the file in your history, download a
version from myExperiment:
http://www.myexperiment.org/files/661.html
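The column extraction described above can also be sketched in a few lines of Python, which may help when checking the saved .csv file before loading it into Taverna. This is a minimal sketch: the function name is hypothetical, the number of header lines is an assumption (adjust `header_lines` to match your file), and the sample rows are invented placeholders rather than real workshop data.

```python
import csv
import io
from typing import List, TextIO


def read_gene_column(handle: TextIO, column: str = "D",
                     header_lines: int = 1) -> List[str]:
    """Return the non-empty values of one spreadsheet column, skipping headers."""
    index = ord(column.upper()) - ord("A")  # "D" -> 3 (zero-based)
    genes = []
    for row_number, row in enumerate(csv.reader(handle)):
        if row_number < header_lines:
            continue  # ignore the header lines
        if len(row) > index and row[index].strip():
            genes.append(row[index].strip())
    return genes


# Invented placeholder content standing in for the exported Galaxy file.
sample = io.StringIO(
    "chrom,start,end,refseq\n"
    "chr1,100,200,NM_000001\n"
    "chr2,300,400,NM_000002\n"
)
print(read_gene_column(sample))
```

For a real file, pass an open file handle instead, for example `open("genes.csv", newline="")`, where `genes.csv` is whatever name you saved the export under.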

4: Combining workflows from myExperiment




In myExperiment, find and download the workflow called
“Import and convert gene list”
This workflow will extract the list of genes in column D
using the built-in spreadsheet import tool (you can find this
in the services panel)
The next step in the workflow converts the RefSeq IDs into
UniGene IDs (required for the pathways workflow)
Run the workflow. This time, in the input window, select
“set file location” and navigate to your saved results file
4: Combining workflows from myExperiment




We will now combine the two
workflows
While you are still in the “import
and convert” workflow, go to
the top of the workbench and
select “Insert -> Nested workflow”
In the pop-up window, select
“import from file” and find the
pathways workflow from earlier
Click on “import workflow” and
the pathways workflow will
appear in the main workflow
diagram.
4: Combining workflows from myExperiment

Connect the workflows up by linking the output of the
‘Merge_Gene_List’ with the nested workflow input
4: Combining workflows from myExperiment

Create new output ports for the Nested workflow and
connect the Nested workflow outputs to the new outputs

Save the workflow
Run the workflow

4: Combining workflows from myExperiment




The workflow may take a few minutes to run. Spend the
time looking at myExperiment to find other pathway-related
workflows
What other pathway workflows are there?
Do they all use KEGG?
What other resources could you use instead?
Exercise 5: Adding New Services to Taverna



In Galaxy, if you want to add a new tool, you have to add it
to the server. In Taverna, new tools can be ‘added’ more
easily because we are often actually calling external tools
Go to http://www.biocatalogue.org and search for the
‘ontology lookup service’
Look at the entry for that service and copy the WSDL
location URL
5. Adding New Services


Go to the services panel in
Taverna and click “import
new services”. For each
type of service, you are
given the option to add a
new service
Select ‘WSDL service…’ A
window will pop-up asking
for a web address
5. Adding New Services

Enter the Ontology Lookup
service address you just
copied

Scroll down to the bottom
of the Services list and you
will see the new service
you added

It is now ready to be used
in your workflows
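A WSDL document is an XML description of the operations a web service offers, which is why a single URL is enough for the new operations to appear in the Services list. The following minimal Python sketch lists the operation names declared in a WSDL 1.1 document. The function name and the inline sample document are hypothetical placeholders, not the real Ontology Lookup Service description.

```python
import xml.etree.ElementTree as ET
from typing import List

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace


def list_operations(wsdl_text: str) -> List[str]:
    """Return the operation names declared in the portType sections."""
    root = ET.fromstring(wsdl_text)
    names = []
    for port_type in root.iter(f"{{{WSDL_NS}}}portType"):
        for operation in port_type.findall(f"{{{WSDL_NS}}}operation"):
            name = operation.get("name")
            if name and name not in names:
                names.append(name)
    return names


# Hypothetical, heavily shortened WSDL used only to illustrate the structure.
SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="ExampleLookup">
  <portType name="ExamplePort">
    <operation name="getExampleNames"/>
    <operation name="getExampleTermById"/>
  </portType>
</definitions>"""

print(list_operations(SAMPLE_WSDL))
```

Each name returned corresponds to one entry you would expect to see under the imported service in the panel.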
5: Adding New Services to Taverna





From the service set you have just imported, add the
service ‘getontologyname’ to a new workflow
This service does not require any inputs, so just create an
output port called ‘ontologyNames’ and connect it to the
service
Run the workflow
You will see a list of all ontologies you can search using
these services
Sometimes, documentation about services is embedded in
the service set like this
Exercise 6: GO Associations
There are many different tools we could use to find GO
associations for the gene list
We could use the service we have just added, or we could
modify the ‘Import and convert’ workflow
Reload the ‘Import and Convert’ workflow
Right-click on the ‘mmusculus_gene_ensembl’ service and
select ‘Copy’
Paste a copy into the same workflow diagram
6: GO Associations
This is a BioMart service. It allows you to retrieve omics
data from Ensembl and other genomics resources. If you
are familiar with BioMart, you will see the interface in
Taverna is the same as the web interface
We will modify the BioMart query to find all GO
associations for each gene associated with a ChIP-Seq
peak
Right-click on the new service copy and select ‘Configure
BioMart Query’
6: GO Associations
The inputs (or filters) already accept RefSeq IDs from our
input file, but we need to modify the outputs (or attributes)
Select ‘Attributes’ and expand the ‘External’ section.
Deselect ‘UniGeneID’ and select ‘RefSeq mRNA’
Additionally, select ‘GO Term Accession’, ‘GO Name’ and
‘GO Domain’
At the top of the page, change the output format from
multiple to single (TSV format)

(See screenshot on the next slide for an example)
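Behind the configuration dialog, a BioMart query is an XML document that names a dataset, its filters and its attributes. The sketch below builds such a query in Python for the settings chosen above. It is an illustration under stated assumptions, not the query Taverna generates: the function name is hypothetical, and the internal names `refseq_mrna`, `go_id`, `name_1006` and `namespace_1003` are assumed to correspond to the ‘RefSeq mRNA’ and GO attributes and should be checked against the current BioMart configuration.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def build_go_query(refseq_ids: Iterable[str]) -> str:
    """Build a BioMart-style XML query asking for GO terms per RefSeq ID."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",          # single tab-separated output
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    dataset = ET.SubElement(query, "Dataset", {
        "name": "mmusculus_gene_ensembl",  # dataset named in the exercise
        "interface": "default",
    })
    # Filter (input): the RefSeq IDs taken from the gene list.
    ET.SubElement(dataset, "Filter", {
        "name": "refseq_mrna",             # assumed internal filter name
        "value": ",".join(refseq_ids),
    })
    # Attributes (outputs): assumed internal names for the selected columns.
    for attribute in ("refseq_mrna", "go_id", "name_1006", "namespace_1003"):
        ET.SubElement(dataset, "Attribute", {"name": attribute})
    return ET.tostring(query, encoding="unicode")


# Placeholder identifiers, not real results from the workshop data.
print(build_go_query(["NM_000001", "NM_000002"]))
```

The filter carries the inputs and the attribute list carries the outputs, which mirrors the two sections edited in the dialog.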
6: GO Associations
Click ‘apply’ to save your changes, and ‘close’, to go back
to Taverna
At the top of the workflow diagram, change the workflow
view to show all ports by clicking on the table icon
6: GO Associations
Connect your new service
to the workflow by linking
the ‘D’ output port of the
spreadsheet service to
the input of your new
service
Make a new output port
called ‘GO_Report’ and
connect it to your new
service
6: GO Associations
Save the workflow by going to ‘File -> Save Workflow’
Run the workflow
Download and view the GO report
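Once downloaded, the GO report is a tab-separated table that can be summarised per gene. The following minimal Python sketch groups GO terms by RefSeq ID. It assumes four columns in the order RefSeq mRNA, GO accession, GO name, GO domain, which depends on how the attributes were selected; the function name is hypothetical and the RefSeq IDs in the sample are placeholders.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple

GoTerm = Tuple[str, str, str]  # (accession, name, domain)


def group_go_terms(tsv_text: str) -> Dict[str, List[GoTerm]]:
    """Group the rows of a TSV GO report by the RefSeq ID in column one."""
    grouped: Dict[str, List[GoTerm]] = defaultdict(list)
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 4 or not row[1]:
            continue  # skip short rows and genes without a GO annotation
        refseq, accession, name, domain = row[:4]
        grouped[refseq].append((accession, name, domain))
    return dict(grouped)


# Illustrative rows: placeholder gene IDs paired with example GO terms.
report = (
    "NM_000001\tGO:0005634\tnucleus\tcellular_component\n"
    "NM_000001\tGO:0006915\tapoptotic process\tbiological_process\n"
    "NM_000002\t\t\t\n"
)
for gene, terms in group_go_terms(report).items():
    print(gene, [accession for accession, _, _ in terms])
```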
Exercise 7: Text Mining
So far we have looked at enriching the genomic
information, but we could also use workflows for running
data analyses or performing literature searches
Think about the ways you could extend this analysis with
literature searches (e.g. correlations between pathways,
genes, GO terms, phenotypes etc.)
Search myExperiment for workflows involving text mining,
using the search terms “text mining” and “Pubmed”
7: Text Mining
Find and open the workflow “Phenotype to pubmed”
As you can see, one of the services is no longer available
in the nested workflow. Taverna checks the availability of
each service when you load the workflow and when you
run it
In this case, the workflow will still run without the final
nested workflow (clean text)
Delete the nested workflow and reconnect the workflow
output
Run the workflow with the search term ‘erythropoiesis’
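A comparable PubMed search can be sketched directly against the NCBI E-utilities `esearch` endpoint. This is a minimal Python sketch and an assumption about one way to run such a search; the services inside the “Phenotype to pubmed” workflow may differ. The function names are hypothetical, and the parser assumes the XML response lists PubMed IDs under `IdList/Id`.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, max_results: int = 20) -> str:
    """Build an esearch URL for a PubMed query."""
    params = {"db": "pubmed", "term": term, "retmax": max_results}
    return ESEARCH + "?" + urlencode(params)


def parse_pmids(xml_text: str) -> List[str]:
    """Extract the PubMed IDs from an esearch XML response."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall("./IdList/Id")
            if element.text]


def search_pubmed(term: str, max_results: int = 20) -> List[str]:
    """Run the search and return the matching PubMed IDs (needs network)."""
    with urlopen(esearch_url(term, max_results)) as response:
        return parse_pmids(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Same search term as in the exercise.
    print(esearch_url("erythropoiesis"))
```

The IDs returned by such a search would still need a second request to retrieve abstracts, which is the kind of chaining a workflow handles for you.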
Advanced Exercises
These exercises are an introduction to using Taverna, but
there are many other things you could do with the
workbench
A series of advanced exercises is available to download
from myExperiment here:
http://www.myexperiment.org/files/670.html
All the workflows and materials from this session are
available in the myExperiment group ‘Next Generation
Sequencing Tutorial’. You can join the group if you sign up
to myExperiment.