Transcript Slide 1

SeqExpress:
Introduction
Features

Visualisation Tools
•Data: gene expression, gene function and gene location.
•Analysis: probability models, hierarchies and clusters.
Analysis Tools
•Cluster analysis, refinement and validation.
•Using mixture modelling.
•Graphs and Hierarchies.
Data Tools
•Data Import/Export tools (remote access of GEO, local access of tab separated and MAGE format).
•Data Integration: optional underlying data and annotation database.
•Data Manipulation.
SeqExpress:
Visualisation Tools
Visualisations

Data Visualisation:
•Gene Expression;
•Gene Variance;
•Gene Function/Ontology; and
•Chromosome Features.
Analysis Visualisations:
•Hierarchies/Graphs;
•Probabilistic Methods; and
•Cluster Comparison.
Gene Expression
Scatter Plots
Parallel Plots
Also: Histograms, Annotation lists and Gene Tables
Gene Variance
Gene Spectrums
Gene Clouds
Gene Ontology Visualisations
Graphs
TreeMaps
Tables
Chromosome Feature Visualisations
Data Analysis
Probability Models
Cluster Comparison
Dendrograms
Example: Viewing Clusters
A cluster has been selected in the gene tab. The genes are then selected in a scatter plot, a parallel plot and the histogram.
Example: Gene Function Selection
The binding term has been selected from the results of an ontology term search. The binding term is then automatically selected in the Function tab, as well as in the open Tree Map visualisation. All genes that have been annotated with the binding term are also selected in the parallel plot.
Example: Genome Location
A combined expression profile and location-based cluster analysis has been performed and the results viewed. The parallel plot shows the similar expression profiles, whilst the two genome views show the locale of the genes. The genome view in the middle is set to auto-zoom, and so shows the locale in detail.
Example: Data Analysis
A series of models has been generated, and the genes with a high probability of belonging to one of the models have been selected in the model viewer. The corresponding locations of the genes and their expression profiles are then shown.
Summary

A number of visualisations are available to support a variety of tasks:
•Expression
•Ontology (plus pathway and protein-protein interaction)
•Location
•Hierarchies
•Cluster comparison
•Variance
•Probability theory
The visualisations are inter-linked.
SeqExpress:
Analysis Tools
Analysis Tools 1: Clusters, Hierarchies and Concepts
Clustering:
•Distance based
•Refinement (ontology or model based)
•Validation (C-Index)
Hierarchies: SDD*, Hierarchical
Projection:
•Covariance*: eigen(covar(A)) or $A = U S V^T$
•Co-occurrence*: $P(g,e) = P(g) \sum_z P(e|z) P(z|g)$
*Used for global/enterprise-wide information retrieval
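The two covariance routes named above, eigen(covar(A)) and the SVD of A, give the same projection once the data are centred. The sketch below illustrates this with NumPy; the function name, the choice of two components and the random genes-by-experiments matrix are assumptions made for illustration, not part of SeqExpress.

```python
import numpy as np

def project_two_ways(a, k=2):
    """Project the rows of `a` onto the top-k principal directions in two ways."""
    a = np.asarray(a, dtype=float)
    centred = a - a.mean(axis=0)              # centre each experiment (column)

    # Route 1: eigen-decomposition of the covariance matrix.
    evals, evecs = np.linalg.eigh(np.cov(centred, rowvar=False))
    order = np.argsort(evals)[::-1]           # eigh returns ascending eigenvalues
    evals, evecs = evals[order], evecs[:, order]
    scores_eig = centred @ evecs[:, :k]

    # Route 2: singular value decomposition, A = U S V^T.
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores_svd = u[:, :k] * s[:k]

    return scores_eig, scores_svd, evals, s

rng = np.random.default_rng(0)
expression = rng.normal(size=(50, 6))         # hypothetical 50 genes x 6 experiments
scores_eig, scores_svd, evals, s = project_two_ways(expression)
```

The two sets of scores agree up to the sign of each column, and the covariance eigenvalues equal the squared singular values divided by (number of genes - 1).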
Cluster Distances
Expression: Pearson, Cosine, Euclidean, Manhattan.
Function: Information theory: 2*N3/(N1+N2+2*N3)
Location: Intra gene distance, distance to feature
[Figure: example ontology graph with nodes TERM:1 to TERM:6]
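A minimal sketch of the measures named on this slide, using only the Python standard library. The slide does not define N1, N2 and N3; the code assumes N1 and N2 are the path lengths from two ontology terms to their closest common ancestor and N3 is the depth of that ancestor, as in the Wu and Palmer similarity, which has the same form. All function names are illustrative.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute differences between two expression profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two profiles (1 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pearson(a, b):
    """Pearson correlation, i.e. cosine similarity of the mean-centred profiles."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])

def ontology_similarity(n1, n2, n3):
    """2*N3 / (N1 + N2 + 2*N3), with N1, N2, N3 interpreted as assumed above."""
    return 2 * n3 / (n1 + n2 + 2 * n3)
```

Pearson and cosine are similarities, so a distance for clustering would typically be taken as one minus the value.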
SAGE: Semi Discrete Decomposition
•Immunity to outliers
•Uses local density
•Describes both experiments and genes
•Hierarchical description
•Stencils mean that fold-in is possible
•Highly scalable
5

5
0

5
5
0
0  1
 
 3  3   1
3  3  0
0
A  X  D Y T
1
0
0  5 0 0 
 
 1
 1  1  0 3 0  
0




1  1 0 0 3
0
0 0
0 0

1 0
0 1
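A semi-discrete decomposition approximates a matrix A by X D Y^T, where D is diagonal and the entries of X and Y are restricted to -1, 0 and 1. The sketch below is a simple greedy, alternating heuristic of the kind usually described for SDD; it is an illustration under that assumption and not necessarily the implementation used in SeqExpress. The function names and the small example matrix are hypothetical.

```python
def _best_ternary(s):
    """Return x with entries in {-1, 0, 1} maximising (x . s)^2 / (x . x)."""
    order = sorted(range(len(s)), key=lambda i: -abs(s[i]))
    best_j, best_val, running = 1, -1.0, 0.0
    for j, i in enumerate(order, start=1):
        running += abs(s[i])
        value = running * running / j
        if value > best_val:
            best_j, best_val = j, value
    x = [0] * len(s)
    for i in order[:best_j]:
        x[i] = 1 if s[i] >= 0 else -1
    return x

def sdd(a, k, inner_iters=10):
    """Greedy k-term approximation; returns [(d, x, y), ...] and the residual."""
    r = [[float(v) for v in row] for row in a]
    m, n = len(r), len(r[0])
    terms = []
    for _term in range(k):
        # Start from the column of the residual with the largest norm.
        col_norms = [sum(r[i][j] ** 2 for i in range(m)) for j in range(n)]
        start = max(range(n), key=col_norms.__getitem__)
        y = [1 if j == start else 0 for j in range(n)]
        for _step in range(inner_iters):
            x = _best_ternary([sum(r[i][j] * y[j] for j in range(n)) for i in range(m)])
            y = _best_ternary([sum(r[i][j] * x[i] for i in range(m)) for j in range(n)])
        xry = sum(x[i] * r[i][j] * y[j] for i in range(m) for j in range(n))
        d = xry / (sum(v * v for v in x) * sum(v * v for v in y))
        for i in range(m):                     # subtract the rank-one term d * x * y^T
            for j in range(n):
                r[i][j] -= d * x[i] * y[j]
        terms.append((d, x, y))
    return terms, r

example = [[5, 5, 0], [5, 5, 0], [0, 0, 3]]    # hypothetical block-structured matrix
terms, residual = sdd(example, 2)
```

Each term picks out a block of rows and columns with a shared weight, which is why the decomposition describes both genes and experiments at once.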
Analysis Tools 2: Models and Graphs
Multi-factor analysis to identify complex features within the data (e.g. genes which both have a similar expression profile and are located on the same part of a chromosome).
•Graphs: Two-factor analysis using (1) Graph Connectivity and (2) Edge Length.
•Models: N-factor analysis using the product rule: P(A,B|C) = P(A|B,C) * P(B|C).
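The product rule can be checked numerically on a small joint distribution. In the sketch below, A stands for an expression pattern, B for a chromosome region and C for a condition; the labels and probabilities are invented for illustration only.

```python
# Hypothetical joint distribution P(A, B, C); the eight values sum to 1.
joint = {
    ("up", "chr1", "stress"): 0.20, ("up", "chr1", "rest"): 0.05,
    ("up", "chr2", "stress"): 0.10, ("up", "chr2", "rest"): 0.15,
    ("down", "chr1", "stress"): 0.05, ("down", "chr1", "rest"): 0.20,
    ("down", "chr2", "stress"): 0.10, ("down", "chr2", "rest"): 0.15,
}

def marginal(joint, a=None, b=None, c=None):
    """Sum the joint over every variable that is left as None."""
    return sum(p for (ka, kb, kc), p in joint.items()
               if a in (None, ka) and b in (None, kb) and c in (None, kc))

def p_ab_given_c(joint, a, b, c):
    return marginal(joint, a=a, b=b, c=c) / marginal(joint, c=c)

def p_a_given_bc(joint, a, b, c):
    return marginal(joint, a=a, b=b, c=c) / marginal(joint, b=b, c=c)

def p_b_given_c(joint, b, c):
    return marginal(joint, b=b, c=c) / marginal(joint, c=c)

# Largest gap between the two sides of P(A,B|C) = P(A|B,C) * P(B|C).
max_gap = max(
    abs(p_ab_given_c(joint, a, b, c) - p_a_given_bc(joint, a, b, c) * p_b_given_c(joint, b, c))
    for (a, b, c) in joint
)
```

Applying the rule repeatedly is what lets further factors be chained onto a model one at a time.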
Models: Discovery
Different models can be found, and altered using energy parameters and tempering.
[Figure: four panels, one per model type: Linear (beta 0.6), Spline (beta 0.1), Normal (beta 0.1) and Cosine (beta 1.1). Each panel links Unsupervised Clusters to Regulatory Modules. Every node is labelled with its size and, for named modules, a description, e.g. "Size: 63 ( Energy, osmolarity and cAMP signaling )" or "Size: 34 ( Missing values )".]
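The slide does not define beta. One common reading, assumed here, is that beta acts as an inverse temperature on the model posteriors, as in tempered (annealed) mixture fitting: a small beta flattens the assignment of a gene across models, a large beta sharpens it. The function name and the scores below are illustrative only.

```python
import math

def tempered_responsibilities(log_scores, beta):
    """Posterior over models for one gene, with log scores scaled by beta.

    log_scores[k] is assumed to be log prior + log likelihood under model k.
    """
    scaled = [beta * s for s in log_scores]
    top = max(scaled)                          # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

log_scores = [-1.0, -2.0, -4.0]                # hypothetical scores for three models
soft = tempered_responsibilities(log_scores, beta=0.1)
sharp = tempered_responsibilities(log_scores, beta=1.1)
```

Under this reading, the different beta values shown for each model type would trade off how decisively genes are assigned to models.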
Models: Usage



•Cluster generation: High probabilities equate to cluster membership.
•Fitting data: Use normal tissues to fit models to genes, use disease tissues to fit genes to models. Changed behaviour equates to likelihood of model transition.
•Combining models: complex feature identification (given feature X on condition Y).
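The first two usages can be sketched in a few lines: genes join a cluster when their probability under a model is high, and a gene whose most probable model differs between normal and disease tissue is flagged as a possible transition. The threshold of 0.9, the gene names and the probabilities are assumptions for illustration.

```python
def most_probable(probs):
    """Index of the model with the highest probability."""
    return max(range(len(probs)), key=probs.__getitem__)

def cluster_members(probabilities, threshold=0.9):
    """Map model index -> genes whose best model probability reaches the threshold."""
    clusters = {}
    for gene, probs in probabilities.items():
        best = most_probable(probs)
        if probs[best] >= threshold:
            clusters.setdefault(best, []).append(gene)
    return clusters

def model_transitions(normal, disease):
    """Genes whose most probable model changes between the two conditions."""
    changed = {}
    for gene, probs in normal.items():
        if gene in disease:
            before, after = most_probable(probs), most_probable(disease[gene])
            if before != after:
                changed[gene] = (before, after)
    return changed

# Hypothetical P(model | gene) values for two models.
normal = {"geneA": [0.95, 0.05], "geneB": [0.60, 0.40], "geneC": [0.08, 0.92]}
disease = {"geneA": [0.10, 0.90], "geneB": [0.55, 0.45], "geneC": [0.05, 0.95]}
clusters = cluster_members(normal)
transitions = model_transitions(normal, disease)
```

Here geneB stays unassigned because neither model is probable enough, and geneA is the only gene that changes model.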
Graph: Discovery

Graph connectivity equates to:
•MST of expression values
•Sub-graphs of the gene ontology
•Chromosome relationship
Edge Distance equates to:
•Expression distance
•Network (ontology) distance
•Linear chromosomal distance
Graph partitioned:
•regular (using Metis)
•irregular (Min/Max)
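As an illustration of the first connectivity option, the sketch below builds a minimum spanning tree (MST) over expression profiles with Prim's algorithm, using Euclidean expression distance as the edge length. The choice of distance and the toy profiles are assumptions; partitioning the resulting graph is not shown.

```python
import math

def minimum_spanning_tree(profiles):
    """Return MST edges (i, j, length) over the profiles, grown from gene 0."""
    n = len(profiles)
    # best[v] = (shortest known edge length from the tree to v, tree node it comes from)
    best = {v: (math.dist(profiles[0], profiles[v]), 0) for v in range(1, n)}
    edges = []
    while best:
        nearest = min(best, key=lambda v: best[v][0])
        length, parent = best.pop(nearest)
        edges.append((parent, nearest, length))
        for v in best:                          # relax distances via the new tree node
            d = math.dist(profiles[nearest], profiles[v])
            if d < best[v][0]:
                best[v] = (d, nearest)
    return edges

profiles = [[0.0], [1.0], [3.0], [7.0]]        # hypothetical one-experiment profiles
edges = minimum_spanning_tree(profiles)
```

Long edges in the tree mark natural places to split the genes into groups.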
Analysis: Summary





•Desktop analysis.
•A number of techniques are available.
•Techniques can be customised for different data sets (e.g. organism, array type).
•Borrows heavily from Information Retrieval.
•Probabilistic techniques show most promise.
SeqExpress:
Data Tools
Data Analysis

Data Import/Export tools:
•Remote access of GEO (one click access).
•Import tab separated and MAGE format.
•Export tab separated and Bioconductor format.
Data Integration: data and annotation database.
•Automatic and configurable annotation mapping (e.g. SAGE tag to LocusLink (Entrez Gene?) to UniGene).
Data Manipulation: transformation, filtering and constraining.
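A configurable annotation mapping can be thought of as a chain of lookup tables applied in order. The sketch below follows an identifier through such a chain; the table names and identifiers are placeholders, not real database entries, and this is not the SeqExpress implementation.

```python
def map_annotation(identifier, *tables):
    """Follow an identifier through lookup tables; return None if any step is missing."""
    for table in tables:
        identifier = table.get(identifier)
        if identifier is None:
            return None
    return identifier

# Placeholder tables standing in for SAGE tag -> LocusLink -> UniGene mappings.
tag_to_locuslink = {"TAG_1": "LL_1", "TAG_2": "LL_2"}
locuslink_to_unigene = {"LL_1": "UG_1"}

mapped = map_annotation("TAG_1", tag_to_locuslink, locuslink_to_unigene)
unmapped = map_annotation("TAG_2", tag_to_locuslink, locuslink_to_unigene)
```

Passing the tables as arguments keeps the chain configurable: a different route only needs a different list of tables.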
Data Integration: GEO
Data Integration: Annotation Builder
SeqExpress:
Summary
Summary





•Written in C#; it is free and runs under Windows.
•Not associated with any academic institution, funding body or commercial organisation.
•Development is still ongoing.
•Plan to develop to the Expression Application Class Specification.
•Looking for employment in Seattle…