Transcript Slide 1
Course on Functional Analysis
:::
Daniel Rico, PhD.
Introduction to Functional Analysis
Bioinformatics Unit
CNIO
::: Schedule.
1. Biological (Functional) Databases
2. Threshold-based and threshold-free methods
3. Threshold-based example: FatiGO.
4. Threshold-free example 1: FatiScan.
ACKNOWLEDGEMENTS
Many of these slides have been taken and adapted from original slides by
Fatima Al-Shahrour from Joaquin Dopazo’s group (Babelomics team).
We are grateful for the material and for the great tools they have
developed!!!!
Species: Homo sapiens, Mus musculus, Rattus norvegicus, Gallus gallus, Danio rerio, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Arabidopsis thaliana
Gene IDs: UniProt/Swiss-Prot, EntrezGene, UniProtKB/TrEMBL, Affymetrix, Ensembl IDs, Agilent, HGNC symbol, PDB, EMBL acc, Protein Id, RefSeq, IPI….
Biological databases
KEGG pathways
Gene Ontology: Biological Process, Molecular Function, Cellular Component
Biocarta pathways
Regulatory elements: miRNA, CisRed, Transcription Factor Binding Sites
Swissprot Keywords
InterPro Motifs
Gene Expression in tissues
Bioentities from literature: Disease terms, Chemical terms
Gene Ontology
CONSORTIUM
http://www.geneontology.org
• The objective of GO is to provide controlled vocabularies for
the description of the molecular function, biological process and
cellular component of gene products.
• These terms are to be used as attributes of gene products by
collaborating databases, facilitating uniform queries across
them.
• The controlled vocabularies of terms are structured
GO structure
The three categories of GO
Molecular Function: the tasks performed by individual gene products; examples are transcription factor and DNA helicase
Biological Process: broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions
Cellular Component: subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and origin recognition complex
GO tree structure
IS_A relation
PART_OF relation
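Because terms are linked by IS_A and PART_OF relations, a gene product annotated to a specific term is implicitly annotated to every ancestor of that term. A minimal sketch of that propagation, assuming a hand-written toy fragment (the edges in `TOY_GO` and the helper name `ancestors` are illustrative assumptions, not loaded from the real ontology files):

```python
# Toy GO fragment: child term -> list of (relation, parent term).
# These edges are illustrative assumptions, not read from the real ontology.
TOY_GO = {
    "origin recognition complex": [("part_of", "nucleus")],
    "telomere": [("part_of", "chromosome")],
    "chromosome": [("is_a", "organelle")],
    "nucleus": [("is_a", "organelle")],
    "organelle": [("is_a", "cellular component")],
    "cellular component": [],
}


def ancestors(term, graph):
    """Return every term reachable from `term` via IS_A or PART_OF edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in graph.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# A gene annotated to "telomere" also counts for all of these terms.
print(sorted(ancestors("telomere", TOY_GO)))
```

This is why enrichment tools usually test terms at several levels of the tree: an annotation made at a deep level also counts for the more general terms above it.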
http://www.genome.ad.jp/kegg/pathway.html
http://www.biocarta.com/genes/index.asp
http://www.reactome.org/
http://www.pathwaycommons.org
http://www.whichgenes.org/
http://www.cisred.org/
::: Schedule.
1. Biological (Functional) Databases
2. Threshold-based and threshold-free methods
3. Threshold-based example: FatiGO.
4. Threshold-free example 1: FatiScan.
Threshold-based functional analysis
Study the enrichment in functional terms in groups of genes defined by the experimental value.
The two-step approach
• Genes of interest are selected using the experimental value.
• Selected genes are compared to the background.
Tools: FatiGO, GOminer, DAVID, Marmite
Threshold-free functional analysis
Select genes taking into account their functional properties.
• Under a systems biology perspective.
• Detect blocks of functionally related genes.
Tools: FatiScan, GSEA, MarmiteScan
Threshold-based functional analysis
[Figure: Class1 vs Class2 comparison with a t-test cut-off; genes passing FDR<0.05 at either end are selected]
Biological meaning?
Threshold-free functional analysis
[Figure: all genes ranked by the Class1 vs Class2 t-test statistic (axis marked - to +), with Gene Sets 1, 2 and 3 marked along the ranking; ES/NES statistic: Gene set 2 enriched in Class 1, Gene set 3 enriched in Class 2]
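The idea behind a threshold-free statistic can be sketched as a running sum over the full ranked list: step up when a gene belongs to the gene set, step down otherwise, and keep the largest deviation from zero. The code below is a simplified, unweighted illustration in the spirit of an enrichment score; it is not the weighted GSEA statistic, it does no NES normalization, and the function name and gene labels are assumptions made for this example.

```python
def running_sum_score(ranked_genes, gene_set):
    """Unweighted running-sum score of `gene_set` along `ranked_genes`.

    Positive values mean the set is concentrated near the top of the
    ranking, negative values mean it is concentrated near the bottom.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up on a set member, step down on any other gene.
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking, sorted from one end of the t-test statistic to the other.
ranking = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8", "g9", "g10"]
print(running_sum_score(ranking, {"g1", "g2", "g3"}))    # set at the top
print(running_sum_score(ranking, {"g8", "g9", "g10"}))   # set at the bottom
```

No cut-off is applied to the ranking itself; significance would normally be assessed afterwards, for example by permutation, which this sketch leaves out.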
::: Schedule.
1. Biological (Functional) Databases
2. Threshold-based and threshold-free methods
3. Threshold-based example: FatiGO.
4. Threshold-free example 1: FatiScan.
http://babelomics.bioinfo.cipf.es/
::: How functional profiling should never be done
It is not uncommon to find the following assertion in papers and talks:
“then we examined our set of genes selected in this way (whatever)
and discovered that 65% of them were related to metabolism, so we
can conclude that our experiment activates metabolism genes”.
Annotation is not a functional result!!!
::: Exercise 1: FatiGO SEARCH
1. Select “FatiGO Search” and “H. sapiens”.
2. Upload the FatiGO_example.txt file.
3. Select “KEGG pathways” and click “Run”.
FatiGO-Search annotations
Testing the distribution of GO terms among two groups of genes
(remember, we have to test hundreds of GOs)
Are these two groups of genes carrying out different biological roles?
Group A: Biosynthesis 60%, Sporulation 20%
Group B: Biosynthesis 20%, Sporulation 20%
Genes in group A appear to have more to do with biosynthesis, but not with sporulation.
     Biosynthesis   No biosynthesis
A    6              4
B    2              8
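The 2x2 table above is the input to a Fisher's exact test. A minimal sketch using only the standard library follows (the function name is an assumption made here, and FatiGO's own implementation may differ in details such as sidedness):

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over every table that is at most as probable as the observed one.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Group A: 6 biosynthesis / 4 not; group B: 2 biosynthesis / 8 not.
p_value = fisher_exact_two_sided(6, 4, 2, 8)
print(round(p_value, 4))
```

For this toy table the two-sided p-value comes out at about 0.17. With only ten genes per group, a 60% versus 20% difference is not enough on its own, which is why the percentages have to be tested rather than read directly.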
Using FatiGO
Comparing groups of genes
List1: genes of interest (e.g. significantly over- or underexpressed when two classes of experiments are compared, colocated in the chromosomes, etc.)
List2: the background (typically the rest of genes).
Select a suitable database and run.
List1: remove genes repeated in list1 to obtain the “clean” List1
List2: remove genes repeated in list2 to obtain the “clean” List2
Remove genes repeated between both lists
Extract functional terms
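The list-cleaning steps can be sketched as follows. This assumes that genes shared by both lists are dropped from both of them (the slide does not say which list, if any, keeps them); the function name and gene symbols are illustrative only.

```python
def clean_lists(list1, list2):
    """Remove within-list duplicates, then genes present in both lists."""
    # dict.fromkeys removes duplicates while keeping the original order.
    clean1 = list(dict.fromkeys(list1))
    clean2 = list(dict.fromkeys(list2))

    # Assumption: a gene found in both lists is removed from both.
    shared = set(clean1) & set(clean2)
    clean1 = [gene for gene in clean1 if gene not in shared]
    clean2 = [gene for gene in clean2 if gene not in shared]
    return clean1, clean2


genes_of_interest = ["TP53", "BAX", "TP53", "CASP3"]
background = ["BAX", "MYC", "MYC", "EGFR"]
print(clean_lists(genes_of_interest, background))
```

Cleaning first matters because a duplicated identifier would otherwise be counted twice in the contingency table built for each term.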
BABELOMICS databases: GO, KEGG, Interpro, KW, Bioentities, Gene Expression, TF, Cisred
Matrix of functional terms (binary, 0/1 entries)
Fisher's test
Adjust p-value by FDR
Significant functional terms
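Because one Fisher's test is run per functional term, the resulting p-values have to be adjusted for multiple testing. The slide only says "FDR"; the sketch below assumes the Benjamini-Hochberg procedure, and the p-values in the example are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values, one per functional term tested.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [i for i, p in enumerate(adjusted) if p < 0.05]
print(adjusted, significant)
```

Terms whose adjusted p-value falls below the chosen threshold (0.05 on these slides) are the ones reported as significant.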
List 1b / List 2b
[Figure: Class1 vs Class2 comparison with t-test cut-offs at FDR<0.05 on both ends; labels: List 1, List 2 (background)]
::: Exercise 2: FatiGO COMPARE
1. Select “FatiGO Compare” and “H. sapiens”.
2. Upload the FatiGO_example.txt file.
3. Select “Rest of Genome” as background.
4. Select “KEGG pathways” and click “Run”.
Only “Apoptosis” is significant