Scenario 5 - people.vcu.edu

Scenario 5
Analysis: Discovery of possible regulatory motifs
What follows is a simulation of the proposed graphical
interface. As you go through the simulation please consider
what capabilities you would want to serve your research and
annotation interests.
A narrative to help you go through the simulation appears in a
red-bordered box, such as the one below.
To begin:
1. Click on Slide Show (on the upper toolbar)
2. Click View Show
3. Click Continue button
Continue
You’ve decided you want to know what regulates the expression
of nif genes, encoding the machinery for nitrogen fixation. Here’s
your strategy:
• Collect nif genes from Anabaena PCC 7120 into a set
• Add orthologs of the Anabaena genes to the set
• Extract 5’ sequences from all genes in the set
• Analyze the set of 5’ sequences for motifs
• (Search for other genes with the same motifs)
Continue
Build set
Display set
Click on Build Set to
begin finding orfs with
the desired
specifications
Modify set
Set operation
Build set
Display set
Modify set
Choose set type
All items in
All open reading frames of
All amino acid sequences of
All intergenic regions of
Human-annotated orfs of
Private set
Public set
The first goal is to find all open reading
frames within Anabaena PCC 7120
annotated as nif genes, so click on All
open reading frames of.
Set operation
Cancel
Build set
All items in
Display set
Modify set
Set operation
Choose set type
Choose database
All open reading frames of
Arthrospira platensis
Gloeobacter violaceus
Microcystis aeruginosa
Nostoc punctiforme
Anabaena PCC 7120
Prochlorococcus MED4
Prochlorococcus MIT9313
Prochlorococcus SS120
Synechococcus PCC6301
Synechococcus PCC7942
Synechococcus WH8102
Synechocystis PCC 6803
Thermosynechococcus
Trichodesmium
Unicellular
Filamentous
All
Click on
Anabaena PCC 7120
Cancel
Build set
Display set
Modify set
Set operation
Cancel
Variable
Data
Operation
Function
Done
All items in
Choose set type
Choose database
All open reading frames of
Anabaena PCC 7120
You want to compare the description of each
orf with “nif”. To get a tool to extract the
description, click on Function.
such that:
Build set
Display set
Modify set
Set operation
Cancel
Variable
Data
Operation
Function
Done
All items in
Choose set type
Choose database
All open reading frames of
Anabaena PCC 7120
Choose function
Closest ortholog of
Protein product of
Upstream region of
Downstream region of
Description of
Category of
Annotation level of
(item)
Click on Description of.
such that:
Build set
Display set
Modify set
Set operation
Cancel
Variable
Data
Operation
Function
Done
All items in
Choose set type
Choose database
All open reading frames of
Anabaena PCC 7120
Choose function
Description of
Op
(item)
=

includes
excludes
You want to find orfs whose description
includes the word “nif”. Click on includes.
such that:
Build set
All items in
Display set
Modify set
Set operation
Cancel
Data
Operation
Function
Done
Choose set type
Choose database
All open reading frames of
Anabaena PCC 7120
Choose function
Description of
Op
(item)
includes
Type description term(s)
nif
You can type in any characters to search
for. For this simulation, the term “nif” is
provided. Press the Enter key.
such that:
Build set
Display set
Modify set
Set operation
Cancel
Variable
Data
Operation
Function
Done
All items in
Choose set type
Choose database
All open reading frames of
Anabaena PCC 7120
Choose function
Description of
Op
(item)
includes
No more specifications.
Press the Done button.
Type description term(s)
nif
such that:
Build set
Display set
Modify set
Set operation
Cancel
Variable
Data
Operation
Function
Done
All items in
Choose set type
Choose database
All open reading frames of
Anabaena PCC 7120
Choose function
Description of
Op
(item)
includes
Type description term(s)
nif
If this were a complicated search, you might
want to save the specifications as a script. In
this case, just save the results by clicking on
Save only results.
Save results and script
Save only results
such that:
Build set
Display set
Modify set
Set operation
Cancel
Variable
Data
Operation
Function
Done
All items in
Choose set type
Choose database
All open reading frames of
Anabaena PCC 7120
Choose function
Description of
Type description term(s)
Op
(item)
includes
nif
Type name of set
7120 nif genes
All orfs of Anabaena whose descriptions include
“nif” will be collected into a set. You can name
the set anything you want. For this simulation, a
name is provided. Press the Enter key.
such that:
Build set
Display set
Modify set
Set operation
Done
Set: 7120 nif genes
Anab7120:all0687 hupL [NiFe] uptake hydrogenase large subunit, C terminus
Anab7120:all0687 hupL [NiFe] uptake hydrogenase large subunit, N terminus
Anab7120:all0688 hupS [NiFe] uptake hydrogenase small subunit
Anab7120:alr0692 similar to nifU
Anab7120:alr0874 nifH2 dinitrogenase reductase
Anab7120:asr1309 similar to nifU
Anab7120:alr1407 nifV1 homocitrate synthase
Anab7120:asr1408 nifZ iron-sulfur cofactor synthesis
Anab7120:asr1409 nifT
<< more items >>
This is the result of the search. The set is displayed
both as a list of orfs and a graphical representation of
the genetic neighborhood of each orf. You can find out
more about an orf by clicking its name or its arrow.
For now, just press Continue.
Continue
Build set
Display set
Modify set
Set operation
Set: 7120 nif genes
Anab7120:all0687 hupL [NiFe] uptake hydrogenase large subunit, C terminus
Anab7120:all0687 hupL [NiFe] uptake hydrogenase large subunit, N terminus
Anab7120:all0688 hupS [NiFe] uptake hydrogenase small subunit
Anab7120:alr0692 similar to nifU
Anab7120:alr0874 nifH2 dinitrogenase reductase
Anab7120:asr1309 similar to nifU
Anab7120:alr1407 nifV1 homocitrate synthase
Anab7120:asr1408 nifZ iron-sulfur cofactor synthesis
Anab7120:asr1409 nifT
<< more items >>
This search, like most, is only a beginning. It brought up some
unintended hits (“nif” found “NiFe”). More seriously, it brought up
many genes probably in the middle of operons and unlikely to be
preceded by regulatory motifs. The genetic neighborhood gives clues
as to operon structure. Select the two most likely orfs to begin
operons by clicking on the circles next to alr0874 and alr1407.
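The unintended “NiFe” hits can be reproduced with a small sketch. It is only an illustration, not part of the proposed interface: it assumes that “includes” is a case-insensitive substring test over the gene name plus description, and the helper names (includes, includes_gene_name) are made up here. A stricter, case-sensitive test for “nif” followed by a capital letter drops the hydrogenase hits.

```python
import re

# A few orfs copied from the set shown above (ID, gene name + description).
ORFS = [
    ("Anab7120:all0687", "hupL [NiFe] uptake hydrogenase large subunit, C terminus"),
    ("Anab7120:all0688", "hupS [NiFe] uptake hydrogenase small subunit"),
    ("Anab7120:alr0692", "similar to nifU"),
    ("Anab7120:alr0874", "nifH2 dinitrogenase reductase"),
    ("Anab7120:alr1407", "nifV1 homocitrate synthase"),
]

def includes(orfs, term):
    """Assumed behavior of 'includes': case-insensitive substring match."""
    term = term.lower()
    return [orf_id for orf_id, text in orfs if term in text.lower()]

def includes_gene_name(orfs, prefix):
    """Hypothetical stricter test: the prefix must start a word and be
    followed by a capital letter (nifU, nifH2, ...), matched case-sensitively."""
    pattern = re.compile(r"\b" + re.escape(prefix) + r"[A-Z]")
    return [orf_id for orf_id, text in orfs if pattern.search(text)]

loose = includes(ORFS, "nif")              # also picks up "[NiFe]"
strict = includes_gene_name(ORFS, "nif")   # skips the hydrogenase genes

print(loose)
print(strict)
```

Even the stricter test leaves genes that sit in the middle of operons, so the manual selection step is still needed.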
Done
Build set
Display set
Modify set
Set operation
Set: 7120 nif genes
Anab7120:all0687 hupL [NiFe] uptake hydrogenase large subunit, C terminus
Anab7120:all0687 hupL [NiFe] uptake hydrogenase large subunit, N terminus
Anab7120:all0688 hupS [NiFe] uptake hydrogenase small subunit
Anab7120:alr0692 similar to nifU
Anab7120:alr0874 nifH2 dinitrogenase reductase
Anab7120:asr1309 similar to nifU
Anab7120:alr1407 nifV1 homocitrate synthase
Anab7120:asr1408 nifZ iron-sulfur cofactor synthesis
Anab7120:asr1409 nifT
<< more items >>
Let’s suppose you proceed in a like
fashion through the rest of the list.
Press Done.
Done
Build set
Display set
Show orf ID
Show gene name
Show description
Show coordinates
Show graphic
Show neighbors: +/- 1
Show map
Modify set
Set operation
Set: 7120 nif genes
Anab7120:alr0874 nifH2 dinitrogenase reductase
Anab7120:alr1407 nifV1 homocitrate synthase
Anab7120:all1438 nifE nitrogenase Fe/Mo cofactor
Anab7120:all1455 nifH dinitrogenase reductase
Anab7120:all1517 nifB nitrogen fixation protein
Anab7120:alr2968 nifV2 homocitrate synthase
The set now consists of the six Anabaena nif genes
that you judged most likely to be preceded by
transcriptional signals. It might be interesting to
see where this set is located on the genome. To do
this, click Display set, then make some room by
clicking on Show graphic.
Done
Build set
Display set
Show orf ID
Show gene name
Show description
Show coordinates
Show graphic
Show neighbors: +/- 1
Show map
Modify set
Set operation
Set: 7120 nif genes
Anab7120:alr0874 nifH2 dinitrogenase reductase
Anab7120:alr1407 nifV1 homocitrate synthase
Anab7120:all1438 nifE nitrogenase Fe/Mo cofactor
Anab7120:all1455 nifH dinitrogenase reductase
Anab7120:all1517 nifB nitrogen fixation protein
Anab7120:alr2968 nifV2 homocitrate synthase
Replace the space-consuming description
with coordinates by clicking on Show
description, and then click Show
coordinates and finally Show map.
Done
Build set
Display set
Show orf ID
Show gene name
Show description
Show coordinates
Show graphic
Show neighbors: +/- 1
Show map
Modify set
Set operation
Set: 7120 nif genes
Anab7120:alr0874 nifH2
Anab7120:alr1407 nifV1
Anab7120:all1438 nifE
Anab7120:all1455 nifH
Anab7120:all1517 nifB
Anab7120:alr2968 nifV2
Replace the space-consuming description
with coordinates by clicking on Show
description, and then click Show
coordinates and finally Show map.
Done
Build set
Display set
Show orf ID
Show gene name
Show description
Show coordinates
Show graphic
Show neighbors: +/- 1
Show map
Modify set
Set operation
Set: 7120 nif genes
Anab7120:alr0874 nifH2 1008496 -> 1009389
Anab7120:alr1407 nifV1 1671878 -> 1673011
Anab7120:all1438 nifE 1696389 <- 1697831
Anab7120:all1455 nifH 1713396 <- 1714283
Anab7120:all1517 nifB 1776670 <- 1778097
Anab7120:alr2968 nifV2 3609625 -> 3611012
Replace the space-consuming description
with coordinates by clicking on Show
description and then Show coordinates, and
finally, click on Show map.
Done
Build set
Display set
Modify set
Set operation
Maintenance
Set operations
Analysis tools
Discovery tools
Transformations
Ortholog of
Protein product of
Upstream region of
Downstream region of
Set: 7120 nif genes
Anab7120:alr0874 nifH2 1008496 -> 1009389
Anab7120:alr1407 nifV1 1671878 -> 1673011
Anab7120:all1438 nifE 1696389 <- 1697831
Anab7120:all1455 nifH 1713396 <- 1714283
Anab7120:all1517 nifB 1776670 <- 1778097
Anab7120:alr2968 nifV2 3609625 -> 3611012
Anabaena chromosome (6413771 bp)
Four of the six putative nif operons
are clustered near 1.7 Mb... but back
to business. Our idea was to extend
the set to include orthologs in other
nitrogen-fixing cyanobacteria. To do
this, click Set operation, then
Transformations, then Ortholog of.
Done
Build set
Display set
Modify set
Choose set type
Orthologs of (
All open reading frames of
All amino acid sequences of
All intergenic regions of
Human-annotated orfs of
Public set
Private set
You want the orthologs of the orfs in the set
you just made. This set is yours – a private
set – as opposed to certain sets that are
available to all users. Click Private set.
Set operation
Cancel
Build set
Orthologs of (
Display set
Modify set
Set operation
Choose set type
Choose set
Private set
7120 IS895 seqs
7120 nif genes
7120 STTR7 regions
Light-specific genes
Npun STTR7 regions
The list of choices will consist of whatever
sets you may have created. Choose the one
you just made: 7120 nif genes.
Cancel
Build set
Orthologs of (
Display set
Modify set
Choose set type
Choose set
Private set
7120 nif genes
At present, the set of filamentous
cyanobacteria includes just the nitrogen-fixing
strains Nostoc punctiforme, Trichodesmium
erythraeum, and Anabaena.
Click on Filamentous.
Set operation
Cancel
Choose database
in
Arthrospira platensis
Gloeobacter violaceus
Microcystis aeruginosa
Nostoc punctiforme
Anabaena PCC 7120
Prochlorococcus MED4
Prochlorococcus MIT9313
Prochlorococcus SS120
Synechococcus PCC6301
Synechococcus PCC7942
Synechococcus WH8102
Synechocystis PCC 6803
Thermosynechococcus
Trichodesmium erythraeum
Unicellular
Filamentous
All
)
Build set
Orthologs of (
Display set
Modify set
Choose set type
Choose set
Private set
7120 nif genes
Set operation
Choose database
in
Type name of set
all nif genes
All orthologs of the selected nif genes
will be combined and saved in a set of
your choice. For this simulation, a name
is provided. Press the Enter key.
Cancel
Filamentous
)
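The expression assembled here, Orthologs of (Private set: 7120 nif genes) in Filamentous, can be read as a transformation applied to every member of a set. The sketch below shows one way that might work. It is an illustration only: the ortholog table is hypothetical and simply pairs orfs that carry the same gene name in the simulated result, only two of the filamentous databases are included, and the assumption that a gene counts as its own ortholog in its home database is ours.

```python
# Hypothetical ortholog table: Anabaena orf -> {database: ortholog}.
# Pairs are inferred from matching gene names in the simulated display.
ORTHOLOGS = {
    "Anab7120:alr0874": {"NostPunc": "NostPunc:637.025"},  # nifH2
    "Anab7120:alr1407": {"NostPunc": "NostPunc:510.011"},  # nifV1
    "Anab7120:all1438": {"NostPunc": "NostPunc:651.072"},  # nifE
    "Anab7120:all1517": {"NostPunc": "NostPunc:510.021"},  # nifB
}

NIF_7120 = [
    "Anab7120:alr0874", "Anab7120:alr1407", "Anab7120:all1438",
    "Anab7120:all1455", "Anab7120:all1517", "Anab7120:alr2968",
]

def orthologs_of(gene_set, databases):
    """Collect, database by database, the ortholog of each gene in the set.
    A gene is treated as its own ortholog in its home database (assumption)."""
    result = []
    for db in databases:
        for gene in gene_set:
            if gene.split(":")[0] == db:
                hit = gene
            else:
                hit = ORTHOLOGS.get(gene, {}).get(db)
            if hit is not None and hit not in result:
                result.append(hit)
    return result

all_nif = orthologs_of(NIF_7120, ["Anab7120", "NostPunc"])
for orf in all_nif:
    print(orf)
```

Genes with no entry in the table for a given database are skipped rather than reported as errors.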
Build set
Display set
Modify set
Set: all nif genes
Anab7120:alr0874 nifH2 dinitrogenase reductase
Anab7120:alr1407 nifV1 homocitrate synthase
Anab7120:all1438 nifE nitrogenase Fe/Mo cofactor
Anab7120:all1455 nifH dinitrogenase reductase
Anab7120:all1517 nifB nitrogen fixation protein
Anab7120:alr2968 nifV2 homocitrate synthase
NostPunc:637.025 nifH2 dinitrogenase reductase
NostPunc:510.011 nifV1 homocitrate synthase
NostPunc:651.072 nifE nitrogenase Fe/Mo cofactor
NostPunc:510.021 nifB nitrogen fixation protein
<< more items >>
The set now consists of nif genes from all
filamentous cyanobacteria. From this set
we want to extract the upstream
sequences. Click on Set operation,
then click on Transformations and
Upstream region of.
Set operation
Maintenance
Set operations
Analysis tools
Discovery tools
Transformations
Done
Ortholog of
Protein product of
Upstream region of
Downstream region of
Build set
Display set
Modify set
Choose set type
Upstream region of (
All open reading frames of
Human-annotated orfs of
Public set
Private set
Again you want the orfs from a set you
made yourself, so click on Private set.
Set operation
Cancel
Build set
Display set
Upstream region of (
Modify set
Set operation
Choose set type
Choose set
Private set
7120 IS895 seqs
7120 nif genes
7120 STTR7 regions
all nif genes
Light-specific genes
Npun STTR7 regions
The set you just defined magically
appears on the list (no chance for
misspelling). Click on it.
Cancel
)
Build set
Display set
Upstream region of (
Modify set
Set operation
Choose set type
Choose set
Private set
all nif genes
Type name of set
all nif genes – 5’
Give this new set of 5’ regions a
descriptive name (done here for
you). Press the Enter key.
)
Cancel
Build set
Display set
Modify set
Set operation
Done
Set: all nif genes – 5’
Anab7120.C:1006982-1008496d
Anab7120.C:1671462-1671878d
Anab7120.C:1697832-1698138c
Anab7120.C:1713264-1713395c
Anab7120.C:1778098-1779034c
Anab7120.C:3609273-3609624d
NostPunc.637:37288-37376d
NostPunc.510:15955-16325d
NostPunc.651:60311-60584c
NostPunc.510:5239-6338c
<< more items >>
The resulting set consists of sequences, not orfs,
and so the elements are defined by coordinates.
Clicking on a coordinate brings up the
sequence display (see Scenario 6). Clicking on
a graph of an orf brings up the orf’s annotation
page. Click Continue.
Continue
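To see how a coordinate label in this set relates to the orf it came from, here is a minimal sketch. It assumes that “->” marks an orf on the direct strand and “<-” one on the complementary strand, and that the trailing d and c in the labels mean direct and complement. The simulation does not say how far upstream a region should extend, so the length argument is hypothetical, and the two lengths used below were chosen only because they reproduce the coordinates shown for nifV2 and nifE.

```python
def upstream_region(start, end, strand, length):
    """Return (low, high) coordinates of the region 5' of an orf.

    start and end are the orf's low and high coordinates (inclusive).
    On the direct strand the 5' region lies below start; on the
    complementary strand it lies above end.
    """
    if strand == "+":
        return (start - length, start - 1)
    if strand == "-":
        return (end + 1, end + length)
    raise ValueError("strand must be '+' or '-'")

def label(replicon, low, high, strand):
    """Format a region the way the set display does (assumed convention)."""
    suffix = "d" if strand == "+" else "c"
    return f"{replicon}:{low}-{high}{suffix}"

# nifV2: 3609625 -> 3611012 (direct strand), hypothetical length 352
low, high = upstream_region(3609625, 3611012, "+", 352)
nifv2_label = label("Anab7120.C", low, high, "+")

# nifE: 1696389 <- 1697831 (complementary strand), hypothetical length 307
low, high = upstream_region(1696389, 1697831, "-", 307)
nife_label = label("Anab7120.C", low, high, "-")

print(nifv2_label)  # Anab7120.C:3609273-3609624d
print(nife_label)   # Anab7120.C:1697832-1698138c
```

The varying lengths in the displayed set suggest that the real tool might extend each region to the neighboring gene rather than use a fixed length, but that is a guess.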
Build set
Display set
Modify set
Set operation
Maintenance
Set operations
Analysis tools
Discovery tools
Transformations
Set: all nif genes – 5’
Anab7120.C:1006982-1008496d
Anab7120.C:1671462-1671878d
Anab7120.C:1697832-1698138c
Anab7120.C:1713264-1713395c
Anab7120.C:1778098-1779034c
Anab7120.C:3609273-3609624d
NostPunc.637:37288-37376d
NostPunc.510:15955-16325d
NostPunc.651:60311-60584c
NostPunc.510:5239-6338c
<< more items >>
The final step in this procedure is to analyze the set
of upstream sequences of nif genes, hoping to find a
common motif. Click on Set operation, then
Analysis tools. Tools based on Position-Specific
Scoring Matrices (PSSMs) are most often used for
the task. Click on one of these: Meme.
Done
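For readers unfamiliar with PSSMs, the sketch below shows what such a matrix is and how it scores a sequence. It is not Meme, which has to discover the motif instances itself. Here the instances are simply given, and they are made-up six-base sites, not real nif upstream sequences. The uniform background frequency and the pseudocount are also assumptions.

```python
import math

def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds matrix from equal-length, aligned motif instances."""
    width = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pssm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        scores = {}
        for base in "ACGT":
            freq = (column.count(base) + pseudocount) / total
            scores[base] = math.log2(freq / background)
        pssm.append(scores)
    return pssm

def best_hit(pssm, sequence):
    """Slide the matrix along the sequence; return (score, offset, window)."""
    width = len(pssm)
    best = None
    for offset in range(len(sequence) - width + 1):
        window = sequence[offset:offset + width]
        score = sum(col[base] for col, base in zip(pssm, window))
        if best is None or score > best[0]:
            best = (score, offset, window)
    return best

# Made-up motif instances, for illustration only.
SITES = ["TGTAAC", "TGTAAT", "TGTGAC", "AGTAAC"]

pssm = build_pssm(SITES)
score, offset, window = best_hit(pssm, "CCGTGTAACGGA")
print(offset, window, round(score, 2))
```

Scanning other upstream regions with a matrix like this is the kind of step the optional last item of the strategy (searching for other genes with the same motifs) would call for.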
Align
PSSM: Gibbs sampler
PSSM: Meme
Make HMM
Build set
Display set
Modify set
Choose set type
PSSM: Meme of (
Public set
Private set
Click Private set and then all nif genes – 5’
to give Meme the set of 5’ sequences.
Set operation
Cancel
Build set
Display set
PSSM: Meme of (
Modify set
Set operation
Choose set type
Choose set
Private set
7120 IS895 seqs
7120 nif genes
7120 STTR7 regions
all nif genes
all nif genes – 5’
Npun STTR7 regions
Click Private set and then all nif genes – 5’
to give Meme the set of 5’ sequences.
)
Cancel
Build set
Display set
PSSM: Meme of (
Modify set
Set operation
Choose set type
Choose set
Private set
all nif genes – 5’
Type name of results
PSSM:all nif – 5’
Give the results a name, press Enter,
and the task is accomplished.
)
Cancel
Summary
• The interface facilitates operations on sets of genes and
sequences
• The interface puts at your disposal powerful tools (that
already exist), without the need to figure out a different
computer environment
• Taken together, these capabilities let those not particularly
adept at computer programming focus on the function of
noncoding sequences
But don’t be fooled – the interface does not yet exist.
That’s the point of the proposal!