Microarray Data Analysis Normalization

Babelomics
Functional interpretation of
genome-scale experiments
Barcelona, 28 November 2007
Ignacio Medina
David Montaner
http://bioinfo.cipf.es
Bioinformatics Department
CENTRO DE INVESTIGACION PRINCIPE FELIPE
(VALENCIA)
Babelomics: A systems biology web resource for the functional interpretation of genome-scale experiments.
http://babelomics.bioinfo.cipf.es
Genome-scale experiment output
List of probe identifiers:
1007_s_at, 1053_at, 117_at, 121_at, 1255_g_at, 1294_at, 1316_at, 1320_at, 1405_i_at, 1431_at, 1438_at, 1487_at, 1494_f_at, 1598_g_at, 160020_at, 1729_at, 1773_at, 177_at, ...
List of probe identifiers ordered by an associated value:
1007_s_at    12.4
1053_at      11.5
117_at       10.3
121_at       10.2
1255_g_at    9.9
1294_at      9.3
1316_at      8.2
1320_at      8.1
1405_i_at    7.7
1431_at      7.4
1438_at      6.5
1487_at      6.2
1494_f_at    5.9
1598_g_at    5.8
160020_at    4.8
1729_at      4.7
...
Functional
Interpretation
Babelomics imported databases
Organisms: Homo sapiens, Mus musculus, Rattus norvegicus, Gallus gallus, Drosophila melanogaster, Saccharomyces cerevisiae, Arabidopsis thaliana, Caenorhabditis elegans
Functional information: GO, KEGG, InterPro, transcription factors, CisRed, gene expression, bioentities from literature
Gene reference: Ensembl ID (ENSEMBL, www.ensembl.org)
Identifiers: EMBL acc, UniProt/Swiss-Prot, UniProtKB/TrEMBL, Ensembl IDs, RefSeq, EntrezGene, Affymetrix, Agilent, PDB, HGNC symbol, Protein Id, IPI….
Babelomics tools
FatiGO: Finds differential distributions of Gene Ontology terms between two groups of genes.
FatiGOplus: An extension of FatiGO for InterPro motifs, pathways and SwissProt keywords, transcription factors (TF), gene expression in tissues, bioentities from scientific literature, and cis-regulatory elements (CisRed).
Tissues Mining Tool: Compares reference values of gene expression in tissues to your results.
MARMITE: Finds differential distributions of bioentities extracted from PubMed between two groups of genes.
FatiScan: Detects significant functions with Gene Ontology, InterPro motifs, SwissProt keywords and KEGG pathways in lists of genes ordered according to different characteristics.
MarmiteScan: Uses chemical and disease-related information to detect related blocks of genes in a gene list with associated values.
GSEA: Detects blocks of functionally related genes with significant coordinate over- or under-expression using Gene Set Enrichment Analysis.
FatiGO
Input form:
• Organism
• Gene List1 and Gene List2: text files with a column of identifiers
• E-mail address
• Your project name
Databases to test:
• Biological process
• Molecular function
• Cellular component
• KEGG pathways
• Biocarta pathways (new)
• InterPro motifs
• SwissProt keywords
• Bioentities from literature (Marmite)
• Gene expression (TMT)
• Transcription factor binding sites
• Cis-regulatory elements (CisRed)
• miRNAs (new)
Testing the distribution of functional terms between two groups of genes
(remember, we have to test hundreds of GO terms)
Are these two groups of genes carrying out different biological roles?

              Group A   Group B
Biosynthesis  60%       20%
Sporulation   20%       20%

Genes in group A are significantly associated with biosynthesis, but not with sporulation.

Contingency table for biosynthesis (number of genes):

     Biosynthesis   No biosynthesis
A    6              4
B    2              8
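The 2x2 table above can be tested directly. The slides do not name the test, so the sketch below assumes a two-sided Fisher's exact test, a common choice for small contingency tables; the function name `fisher_exact_two_sided` is hypothetical and the counts are the ones shown on the slide.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x annotated genes in the first group
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )

# Counts from the slide (assumed layout): group A has 6 biosynthesis genes
# and 4 others, group B has 2 biosynthesis genes and 8 others.
p_value = fisher_exact_two_sided(6, 4, 2, 8)
print(f"p = {p_value:.4f}")
```

With only 10 genes per group, these illustrative counts give a two-sided p-value of roughly 0.17; real gene lists are usually much larger, and the same test is repeated for every functional term.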
FatiGO Results
Gene group1 is enriched in this functional block
Gene group2 is enriched in this functional block
Reported for each term: percentages, p-values and corrected p-values
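Because hundreds of terms are tested at once, the raw p-values need a multiple-testing correction before they are interpreted. The slides do not say which correction produces the corrected p-values, so the sketch below assumes the Benjamini-Hochberg false discovery rate procedure; the term names and p-values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values, one per functional term
raw = {"term_1": 0.01, "term_2": 0.04, "term_3": 0.03, "term_4": 0.005}
corrected = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
for term, p_adj in corrected.items():
    print(term, round(p_adj, 4))
```

A term would then be reported as enriched only if its corrected p-value falls below the chosen threshold.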
FatiScan
Input form:
• Organism
• Gene list ordered according to the experimental value:
Gene1     12.4
Gene2     11.5
Gene3     10.3
Gene4     10.2
Gene5     9.9
Gene6     9.3
Gene7     8.2
Gene8     8.1
Gene10    7.7
Gene11    7.4
...
Databases to test:
• Biological process
• Molecular function
• Cellular component
• KEGG pathways
• InterPro motifs
• SwissProt keywords
• Transcription factor
• Cis-regulatory elements
Testing along the ordered list
•Index ranking genes according to some biological aspect under study.
•Database that stores gene class membership information.
•FatiScan searches over the whole ordered list, trying to find runs of functionally related genes.
Figure: list of genes ordered from + to -, marked with annotation labels A, B and C.
Near the + end: block of genes enriched in annotation A.
Near the - end: block of genes enriched in annotation B.
Annotation C is homogeneously distributed along the list.
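The search along the ordered list can be sketched as a series of cuts, comparing the share of annotated genes above and below each cut. This is only an illustration of the idea: the number of partitions, the gene names and the annotation set are assumptions, and `partition_percentages` is a hypothetical helper, not part of FatiScan.

```python
def partition_percentages(ranked_genes, annotated, n_partitions=4):
    """For each cut, return (cut index, % annotated above, % annotated below)."""
    n = len(ranked_genes)
    flags = [gene in annotated for gene in ranked_genes]
    results = []
    for k in range(1, n_partitions):
        cut = round(k * n / n_partitions)
        upper, lower = flags[:cut], flags[cut:]
        results.append(
            (cut, 100 * sum(upper) / len(upper), 100 * sum(lower) / len(lower))
        )
    return results

# Hypothetical ranked list (from + to -) and hypothetical annotation label A
ranked = [f"Gene{i}" for i in range(1, 13)]
label_a = {"Gene1", "Gene2", "Gene3", "Gene5"}
for cut, pct_up, pct_down in partition_percentages(ranked, label_a):
    print(cut, round(pct_up, 1), round(pct_down, 1))
```

A label concentrated at one end of the list, like annotation A in the figure, shows a large difference between the two percentages at some cut, while a homogeneously distributed label such as annotation C shows similar percentages at every cut. Each cut defines a 2x2 table that could then be tested and corrected for multiple testing.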
Fatiscan results
Figure: list of genes ordered from + to -, with annotation labels A, B and C.
Plotted value: % of genes with the specific GO annotation for each partition.
Functional interpretation
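Figure: genes ordered by expression level between conditions A and B.
At the A end: GO terms overrepresented among genes over-expressed in A.
At the B end: GO terms overrepresented among genes over-expressed in B.
Plotted value: % of genes with the specific GO annotation for each partition.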
FatiScan Example
Samples: Tumor vs. Control
t ~ Tumor mean expression - Control mean expression
All genes in the array are ordered from +t to -t.
Proliferation is more associated with the genes at the top of the list, that is, with the genes that show higher expression in Tumors.
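The ordering used in this example can be sketched as follows. The slide only states that t follows the difference between tumor and control mean expression, so the Welch form of the t statistic is an assumption here, and the gene names and expression values are made up.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(tumor, control):
    """Welch t statistic: positive when mean tumor expression is higher."""
    se = sqrt(variance(tumor) / len(tumor) + variance(control) / len(control))
    return (mean(tumor) - mean(control)) / se

# Hypothetical expression values: gene -> (tumor samples, control samples)
expression = {
    "GeneA": ([9.0, 10.0, 11.0], [5.0, 6.0, 7.0]),
    "GeneB": ([6.0, 7.0, 8.0], [6.0, 7.0, 8.0]),
    "GeneC": ([3.0, 4.0, 5.0], [8.0, 9.0, 10.0]),
}
t_stats = {gene: welch_t(t, c) for gene, (t, c) in expression.items()}

# Order all genes from +t (higher in tumors) to -t (higher in controls)
ranked = sorted(t_stats, key=t_stats.get, reverse=True)
print(ranked)
```

The resulting list, with its t values, has the same shape as the ordered gene list that FatiScan takes as input.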