EGAN Tutorial:
Loading Network Data
October, 2009
Jesse Paquette
UCSF Helen Diller Family
Comprehensive Cancer Center
Preamble
• This document has many slides with multi-step
animations
– Best viewed in Slide Show mode
• The EGAN graphical user interface is evolving
– Icons may change
– Menus may change
– Button/widget placement may change
– This document probably won’t change as quickly
– Please contact the developers if you notice major
discrepancies between this document and EGAN
Loading network data: An overview
• The EGAN pre-collated network represents only
a fraction of available data
• Additional data can be loaded as
– Gene sets/association nodes
• Pathways, annotation terms, articles, transcription
factor targets, miRNA targets, conserved domains,
significant gene sets/clusters from experiments, etc.
– Gene-gene edges
• Protein-protein interactions, literature co-occurrence,
expression correlation, sequence homology,
transcription factor targets, kinase targets, etc.
• This document will outline the steps for loading
additional gene sets and gene-gene edges into
EGAN
Loading gene sets into EGAN
Loading gene sets into EGAN:
Gene set file formats
• Two possible tab-delimited text formats
– GMT
• All default pre-collated gene sets in EGAN are specified via GMT
files
• Each row represents a different gene set
– GMX
• Transposed GMT
• Each column represents a different gene set
• First two columns of GMT (or rows for GMX) specify
– Gene set ID (first column)
• Can potentially be used to link out to the gene set’s web page via URL
– Gene set name (second column)
• Can be empty or same as the ID
• Subsequent columns (or rows for GMX) list the genes in each set
– Gene identifiers must be mappable to Entrez Gene IDs
• EGAN provides a wide variety of mapping file options
– Entrez Gene ID, HUGO Gene Symbol, assay-specific IDs, Ensembl,
GenBank, UniProt, etc.
• EGAN expects all gene identifiers within a single file to be of the same type
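The two layouts above can be checked with a short parser before you hand a file to EGAN. The Python sketch below is not part of EGAN; it assumes strictly tab-delimited files, and `GeneSet`, `read_gmt` and `read_gmx` are hypothetical helper names used only for illustration.

```python
import io
from typing import List, NamedTuple, TextIO


class GeneSet(NamedTuple):
    set_id: str       # gene set ID (first GMT column / first GMX row)
    name: str         # gene set name; may be empty or the same as the ID
    genes: List[str]  # gene identifiers, all of one ID type


def _split(line: str) -> List[str]:
    # Tab-delimited only: drop the line ending but keep empty cells in place.
    return [cell.strip() for cell in line.rstrip("\r\n").split("\t")]


def read_gmt(handle: TextIO) -> List[GeneSet]:
    """Parse GMT: each row is one gene set."""
    gene_sets = []
    for line in handle:
        cells = _split(line)
        if len(cells) < 2 or not cells[0]:
            continue  # skip blank or malformed rows
        genes = [g for g in cells[2:] if g]  # ignore empty trailing cells
        gene_sets.append(GeneSet(cells[0], cells[1], genes))
    return gene_sets


def read_gmx(handle: TextIO) -> List[GeneSet]:
    """Parse GMX (transposed GMT): each column is one gene set."""
    rows = [_split(line) for line in handle]
    if len(rows) < 2:
        return []
    width = max(len(row) for row in rows)
    # Columns usually differ in length, so pad short rows before transposing.
    padded = [row + [""] * (width - len(row)) for row in rows]
    gene_sets = []
    for column in zip(*padded):
        if not column[0]:
            continue  # no ID in the first row, so not a usable gene set
        gene_sets.append(GeneSet(column[0], column[1], [g for g in column[2:] if g]))
    return gene_sets
```

For instance, opening c2.cgp.v2.5.symbols.gmt (used in the example that follows) and passing the handle to `read_gmt` should yield one `GeneSet` per row. Blank rows are kept in `read_gmx` on purpose, so that an empty name row does not shift the gene rows upward.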
Loading gene sets into EGAN:
An example
Each row is a gene set
First column: gene set IDs
Second column: gene set names
Later columns: gene identifiers
Loading gene sets into EGAN:
An example
Save as tab-delimited text
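If your gene sets come out of a script rather than a spreadsheet, the same tab-delimited GMT layout can be written directly. A minimal sketch, assuming the gene sets are held in a Python dict keyed by gene set ID; `write_gmt` is a hypothetical helper, not an EGAN function. It fills the name column with the ID unless a name is given, which the GMT description above allows.

```python
import io
from typing import Dict, List, Optional, TextIO


def write_gmt(gene_sets: Dict[str, List[str]], handle: TextIO,
              names: Optional[Dict[str, str]] = None) -> None:
    """Write gene sets as GMT rows: ID, name, then one gene identifier per column."""
    names = names or {}
    for set_id, genes in gene_sets.items():
        # The name column may be empty or equal to the ID; default to the ID.
        fields = [set_id, names.get(set_id, set_id), *genes]
        for field in fields:
            # A tab or line break inside a cell would corrupt the row layout.
            if any(ch in field for ch in "\t\r\n"):
                raise ValueError(f"field {field!r} would break the tab-delimited layout")
        handle.write("\t".join(fields) + "\n")
```

When writing to disk, opening the output with `open(path, "w", newline="")` keeps the `\n` line endings unchanged on every platform.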
Loading gene sets into EGAN:
An example
• Download or construct a gene set file
– This example will use c2.cgp.v2.5.symbols.gmt from
MSigDB (download this file to follow along)
• You’ll have to log in with your email address to
download MSigDB gene sets
• Launch EGAN H. sapiens
Loading gene sets into EGAN:
An example
• Click on “7) Association Data”.
• Shown are the default pre-collated gene sets. We want to load a new one.
• Click “Browse…”.
• Select your GMT file and click “Specify gene association set”.
• This GMT file uses Gene Symbols for gene identifiers. Select “HUGO Gene Symbol” from the drop-down menu.
• Now specify that these gene sets are of type “MSigDB C2: chemical and genetic perturbations” by selecting that option from the drop-down menu.
• This MSigDB type has been pre-defined for EGAN, which is why it exists in this menu.
• Finally, click “Add Set”.
• When you are finished loading data, click “Finish – Launch EGAN”.
Loading gene sets into EGAN:
An example
Whenever you change the network configuration by
adding or removing files, you will be given the option to
save the new configuration to a tab-delimited text file.
If you choose to save a .config file, next time you will
only need to specify that file (item 3 in the Launch
EGAN Wizard).
Loading gene sets into EGAN:
An example
When EGAN finishes loading, your
new set(s) will be available for
exploration
Loading gene-gene edges
into EGAN
Loading gene-gene edges into EGAN:
File formats
• Two possible tab-delimited text formats
– SIF (Simple Interaction File) format commonly used in Cytoscape
• .sif extension (required in EGAN)
• Each line represents a gene-gene relationship
• Three columns
– First column is first gene
– Middle column is ignored in EGAN
– Third column is second gene
– EGAN interaction file format
• .txt file extension
• Three columns, like SIF
– Middle column is a PubMed ID
• Gene identifiers must be mappable to Entrez Gene IDs
– EGAN provides a wide variety of mapping file options
• Entrez Gene ID, HUGO Gene Symbol, assay-specific IDs, Ensembl,
GenBank, UniProt, etc.
– EGAN expects all gene identifiers within a single file to be of the same type
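Both edge formats share the same three-column shape, so one reader can handle them. The sketch below assumes the strict three-column, tab-delimited layout described above; `Edge` and `read_edges` are hypothetical names. Cytoscape itself also accepts SIF lines that list several targets after one relationship type, and this sketch rejects such lines rather than guessing how EGAN treats them.

```python
import io
from typing import List, NamedTuple, Optional, TextIO


class Edge(NamedTuple):
    gene_a: str               # first column
    gene_b: str               # third column
    pubmed_id: Optional[str]  # middle column of an EGAN .txt file; None for SIF


def read_edges(handle: TextIO, filename: str) -> List[Edge]:
    """Parse a three-column, tab-delimited gene-gene edge file (.sif or EGAN .txt)."""
    suffix = filename.lower().rsplit(".", 1)[-1]
    if suffix not in ("sif", "txt"):
        raise ValueError(f"{filename}: expected a .sif or .txt extension")
    edges = []
    for lineno, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        cols = [c.strip() for c in line.split("\t")]
        if len(cols) != 3:
            raise ValueError(f"{filename}:{lineno}: expected 3 columns, found {len(cols)}")
        first, middle, second = cols
        # For SIF the middle column (relationship type) is dropped, mirroring EGAN.
        edges.append(Edge(first, second, middle if suffix == "txt" else None))
    return edges
```

Passing the file name lets the sketch apply the extension rule from the list above (.sif versus .txt) when deciding what the middle column means.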
Loading gene-gene edges into EGAN:
An example
Each row is a gene-gene relationship
First column: first gene
Third column: second gene
Loading gene-gene edges into EGAN:
An example
Save as tab-delimited text
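Edges generated by a script can be saved in the same tab-delimited shape. A minimal sketch; `write_edge_file` is a hypothetical helper. For a .sif file the middle value is a relationship label that EGAN ignores (the "pp" label used below is just an illustrative choice), while for an EGAN .txt file it should be a PubMed ID.

```python
import io
from typing import Iterable, TextIO, Tuple


def write_edge_file(rows: Iterable[Tuple[str, str, str]], handle: TextIO) -> int:
    """Write (first gene, middle value, second gene) rows as three tab-delimited columns.

    Returns the number of rows written.
    """
    count = 0
    for first, middle, second in rows:
        if not first or not second:
            raise ValueError(f"row {count + 1}: both gene columns must be non-empty")
        if any(ch in field for field in (first, middle, second) for ch in "\t\r\n"):
            raise ValueError(f"row {count + 1}: fields must not contain tabs or line breaks")
        handle.write(f"{first}\t{middle}\t{second}\n")
        count += 1
    return count
```

Remember to give the output file the matching extension (.sif or .txt), since EGAN uses it to tell the two formats apart.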
Loading gene-gene edges into EGAN:
An example
• Download or construct a gene-gene edge file
– This example will use HPN.sif, a set of kinase-target
relationships available in the “.sif Gzip-ed files” link at
NetworKIN (download this file to follow along)
• You’ll have to accept the NetworKIN license in order to
download data
• Launch EGAN H. sapiens
Loading gene-gene edges into EGAN:
An example
• Click on “8) Gene Relationship Edges”.
• Shown are the default pre-collated gene-gene edge files. We want to load a new one.
• Click “Browse…”.
• Select your SIF (or EGAN .txt) file and click “Specify gene-gene edge set”.
• This SIF file uses Gene Symbols for gene identifiers. Select “HUGO Gene Symbol” from the drop-down menu.
• Now specify that these gene-gene edges are of type “NetworKIN” by selecting that option from the drop-down menu.
• The NetworKIN type has been pre-defined for EGAN, which is why it exists in this menu.
• Finally, click “Add Set”.
• When you are finished loading data, click “Finish – Launch EGAN”.
Loading gene-gene edges into EGAN:
An example
Whenever you change the network configuration by
adding or removing files, you will be given the option to
save the new configuration to a tab-delimited text file.
If you choose to save a .config file, next time you will
only need to specify that file (item 3 in the Launch
EGAN Wizard).
Loading gene-gene edges into EGAN:
An example
When EGAN finishes loading, your
new gene-gene edges will be
available for exploration
Loading network data: Tips and hints
• Both the MSigDB and NetworKIN types were pre-defined in EGAN
– This may not be the case for your new data
– You can use the “Custom Node/Custom Edge” types as a default
• You can specify your own type definitions in a Type Definition file
– Give your added nodes and edges distinct colors and links
– See item 4 in the Launch EGAN Wizard
– Use this type definition file as a template – just add the appropriate lines for
your new types
• You can specify gene set, gene-gene edge and mapping files via
URL (or .jar file, but that’s tricky)
– Just type or paste the URL into the appropriate text field instead of clicking
“Browse…”
• Potential issues to consider
– Identifiers used in your gene set/gene-gene edge file might not be found in
the mapping file
– Genes in your mapping file might not be present in the network
– These issues are written (rather crudely) to the Log
• Inspect the log file if you notice unexpected behavior
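Both issues can also be checked before launching EGAN. The sketch below assumes, hypothetically, a mapping file with two tab-delimited columns (your identifier, then Entrez Gene ID) and a set of the Entrez Gene IDs present in the network; EGAN’s own mapping and network files may be laid out differently, so treat `read_mapping` and `coverage_report` as illustrations to adapt, not as EGAN features.

```python
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Set, TextIO, Tuple


def read_mapping(handle: TextIO) -> Dict[str, Set[str]]:
    """Read an ASSUMED two-column, tab-delimited mapping: identifier -> Entrez Gene ID."""
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for line in handle:
        cols = [c.strip() for c in line.rstrip("\r\n").split("\t")]
        if len(cols) >= 2 and cols[0] and cols[1]:
            mapping[cols[0]].add(cols[1])  # one identifier may map to several Entrez IDs
    return dict(mapping)


def coverage_report(identifiers: Iterable[str], mapping: Dict[str, Set[str]],
                    network_entrez_ids: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (identifiers missing from the mapping,
    mapped identifiers none of whose Entrez IDs are in the network)."""
    ids = set(identifiers)
    unmapped = sorted(i for i in ids if i not in mapping)
    not_in_network = sorted(
        i for i in ids if i in mapping and not (mapping[i] & network_entrez_ids)
    )
    return unmapped, not_in_network
```

The identifiers can come from every gene column of a gene set file or from the first and third columns of an edge file; a long `unmapped` list often means the wrong identifier type was selected from the drop-down menu.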
Questions/comments?
• Visit http://groups.google.com/group/ucsf-egan
for downloads, documentation and discussion
– Requires an account with Google Groups