Open Day 2006

From Expression, Through Annotation, to Function
Ohad Manor & Tali Goren
Have you ever wondered…
Types of Data
What characterizes these data sets?
Systematic view on a large genomic scale
• Gene expression (microarray)
• GO annotations
• ChIP-on-chip
• Protein sub-cellular localization
• Protein-protein interactions
What is ENRICH?
• A computational tool to check enrichment of data sets
• Implemented in Perl
• Interactive command line
• May be scripted…
• Concatenate tests and matrix operations
• Data manipulation functions and queries
Using ENRICH
• Load biological data
• Check enrichment of crossed data sets
• Extract statistically significant results
• Multiple hypothesis correction
• Cluster gene sets
• Save results
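The multiple hypothesis correction step can be illustrated with a short sketch. The slides do not say which correction ENRICH applies, so the Benjamini-Hochberg procedure used here is an assumption, as are the function name and the example p-values. The sketch is in Python and is not ENRICH's Perl code.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k (1-based) with p_(k) <= k/m * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis whose rank is at most max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from five enrichment tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
significant = benjamini_hochberg(p_values, alpha=0.05)
print(significant)
```

Only the tests flagged True would be kept as statistically significant results in the following steps.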
What is statistically significant?
• How to choose the right test to compare measurements?
• Non-parametric:
– no assumption about sample size or distribution
– no parameters such as expectation or variance
• Paired or Unpaired?
Paired – binary version
[Figure: two binary vectors over Gene1–Gene10, one for the GO annotation "Ribosome Assembly" and one for RAP1, crossed into a 2×2 contingency table with rows and columns labeled 0 and 1 (counts as extracted: 3, 2, 0, 5).]
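A paired binary comparison of this kind can be sketched as a one-sided enrichment test on the overlap of two binary vectors. The sketch assumes that the "HG test" named in the later flow charts is the hypergeometric test; the vectors, variable names, and function names are hypothetical and were chosen only so that the resulting 2×2 counts are 5, 2, 0, and 3.

```python
from math import comb


def hypergeom_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked items, n draws)."""
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


def enrichment_p_value(a, b):
    """One-sided p-value for the overlap of two binary vectors over the same genes."""
    N = len(a)                                   # number of genes
    K = sum(a)                                   # genes with property a
    n = sum(b)                                   # genes with property b
    k = sum(1 for x, y in zip(a, b) if x and y)  # genes with both
    return hypergeom_tail(k, N, K, n)


# Hypothetical binary annotations for Gene1..Gene10.
ribosome_assembly = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
rap1_bound        = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

p = enrichment_p_value(ribosome_assembly, rap1_bound)
print(p)
```

A small p-value means the two properties co-occur across genes more often than expected by chance.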
Paired – continuous version
[Figure: expression values of Gene1–Gene10 under heat shock and under YPD, shown on a color scale from -1 to 1 and compared gene by gene.]
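For paired continuous measurements, the non-parametric option listed in the statistics table is the Wilcoxon test. Below is a minimal sketch of an exact two-sided Wilcoxon signed-rank test, feasible only for small gene sets because it enumerates every sign assignment. The expression values and function names are hypothetical, and nothing here is taken from ENRICH itself.

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Return (W+, exact two-sided p-value) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    center = sum(ranks) / 2
    observed = abs(w_plus - center)
    # Under the null hypothesis each difference is positive or negative with
    # equal probability, so enumerate all 2^n sign assignments.
    extreme = total = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        total += 1
        if abs(w - center) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / total


# Hypothetical expression values for Gene1..Gene10 in two conditions.
heat_shock = [0.9, 0.7, 0.8, 0.2, 0.6, 0.5, 0.4, 0.95, 0.3, 0.1]
ypd        = [0.1, 0.0, 0.2, -0.3, 0.0, 0.1, -0.1, 0.05, 0.0, -0.1]

w_plus, p_value = wilcoxon_signed_rank(heat_shock, ypd)
print(w_plus, p_value)
```

The test looks only at the ranks of the per-gene differences, so it makes no assumption about how the expression values are distributed.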
Unpaired test
[Figure: heat shock expression values of Gene1–Gene10 on a color scale from -1 to 1; the genes are split by RAP1 into two groups (Gene1, Gene2, Gene4, Gene5, Gene6 and Gene3, Gene7, Gene8, Gene10) whose heat shock values are compared.]
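An unpaired comparison of two gene groups can be sketched with the two-sample Kolmogorov-Smirnov statistic, the non-parametric unpaired test named in the statistics table. The p-value below comes from an exact permutation of group labels rather than the usual asymptotic formula; that substitution is an assumption made to keep the sketch dependency-free, and the expression values and function names are hypothetical.

```python
from itertools import combinations


def ks_statistic(sample_a, sample_b):
    """Maximum distance between the empirical CDFs of two samples."""
    points = sorted(set(sample_a) | set(sample_b))
    d = 0.0
    for v in points:
        cdf_a = sum(1 for x in sample_a if x <= v) / len(sample_a)
        cdf_b = sum(1 for x in sample_b if x <= v) / len(sample_b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_permutation_test(sample_a, sample_b):
    """Return (KS statistic, exact permutation p-value) for two unpaired samples."""
    pooled = list(sample_a) + list(sample_b)
    observed = ks_statistic(sample_a, sample_b)
    extreme = total = 0
    # Try every way of relabeling the pooled values into groups of the same sizes.
    for idx in combinations(range(len(pooled)), len(sample_a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if ks_statistic(group_a, group_b) >= observed - 1e-12:
            extreme += 1
    return observed, extreme / total


# Hypothetical heat shock expression values for two gene groups.
rap1_targets = [-0.9, -0.7, -0.8, -0.6, -0.5]
other_genes  = [0.4, 0.6, 0.2, 0.7]

d_stat, p_value = ks_permutation_test(rap1_targets, other_genes)
print(d_stat, p_value)
```

Because the two groups contain different genes, the test compares the two distributions as a whole instead of matching values gene by gene.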
Statistics, Statistics…
Goal                                         Parametric tests      Non-parametric tests
Compare two unpaired groups                  Unpaired t-test       Kolmogorov-Smirnov
Compare two paired groups                    Paired t-test         Wilcoxon test
Quantify association between two variables   Pearson correlation   Spearman correlation
Binary measurements: Chi-square test
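The difference between the parametric and non-parametric rows of the table can be seen on the association measures. The sketch below computes Pearson correlation on raw values and Spearman correlation as Pearson correlation on ranks; the data and function names are hypothetical, and the simple ranking assumes there are no tied values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical measurements with a monotone but non-linear relationship.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(r_pearson, r_spearman)
```

For a monotone but non-linear relationship, the rank-based measure reaches 1 while the Pearson coefficient stays below it.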
How About Some Biology?
S. cerevisiae Regulation
• Let’s presume we know nothing about yeast
• Use ENRICH to construct a basic regulatory network of yeast
• How can we do that?
Flow chart
[Figure: flow chart. ChIP data (Gene1–Gene10 against the transcription factors STE12, RAP1, YAP5, MSN2, SFP1, FHL1, GAT1) and GO annotations (Gene1–Gene10 against the categories Ribosomal, Stress, Cell cycle, Metabolism) are crossed with the HG test, giving P-values for each transcription factor and category; a significance threshold turns the P-values into binary values.]
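The flow from ChIP and GO data to a thresholded matrix can be sketched as follows. This assumes "HG test" means the hypergeometric test; the gene sets, the threshold of 0.05, and all function names are hypothetical illustrations, not ENRICH commands or real yeast data.

```python
from math import comb


def hypergeom_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked items, n draws)."""
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


def enrichment_matrix(tf_targets, go_sets, n_genes):
    """P-value of every (transcription factor, GO category) overlap."""
    return {
        tf: {
            go: hypergeom_tail(len(targets & members), n_genes, len(members), len(targets))
            for go, members in go_sets.items()
        }
        for tf, targets in tf_targets.items()
    }


def apply_threshold(matrix, alpha):
    """Turn the P-value matrix into binary values: 1 if significant, else 0."""
    return {
        tf: {go: int(p <= alpha) for go, p in row.items()}
        for tf, row in matrix.items()
    }


# Hypothetical ChIP targets and GO annotations over Gene1..Gene10.
tf_targets = {
    "RAP1": {"Gene1", "Gene2", "Gene3", "Gene4"},
    "MSN2": {"Gene6", "Gene7", "Gene8"},
}
go_sets = {
    "Ribosomal": {"Gene1", "Gene2", "Gene3", "Gene4"},
    "Stress": {"Gene6", "Gene7", "Gene8", "Gene9"},
}

p_matrix = enrichment_matrix(tf_targets, go_sets, n_genes=10)
binary = apply_threshold(p_matrix, alpha=0.05)
print(binary)
```

Each 1 in the binary matrix would correspond to an edge between a transcription factor and a functional category in the resulting network. In practice the P-values would also need a multiple hypothesis correction before thresholding.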
Yeast regulation network
[Figure: regulatory network with regions labeled Metabolism, Stress, and Cell cycle.]
FHL1 protein
Case study
FHL1 – what is known
• Putative transcriptional regulator
• Predicted to be involved in stress response
• Required for rRNA processing
• Null mutant shows reduced growth rate
• Could we have found all of that on our own?
Various experimental conditions
[Figure: flow chart. Expression data for Gene1–Gene10 across five experiments (Exp.1–Exp.5) under the conditions heat shock, AA starvation, osmotic stress, oxidative stress, and invasive growth are crossed with FHL1 binary values; the labeled steps are an unpaired t-test and an HG test, each yielding P-values per condition.]
Tell me who your friends are…
[Figure: flow chart. ChIP data for FHL1 over Gene1–Gene10 are crossed with ChIP data for the transcription factors RAP1, FKH2, MBP1, GAT3, and SOK2 using the HG test, yielding P-values.]
Enriched conditions: Growth
Enriched GO annotations: Ribosome assembly, Stress response
Enriched TFs: RAP1, SFP1, GAT3
Remember this question?
• What is the connection between the expression level of a gene and its sub-cellular localization?
• Which transcription factors regulate amino acid biosynthesis?
• Does a heat shock affect peripheral proteins more than it affects mitochondrial proteins?
[Figure: cell periphery and mitochondrion.]
Flow chart
[Figure: flow chart. Expression data for Gene1–Gene10 across five heat shock experiments (Exp.1–Exp.5: Short HS, Medium HS, Long HS, Severe HS, Moderate HS) are crossed with binary localization values (Mitochondria, Bud Neck, Vacuole, Cell periphery, Nucleus); the labeled steps are an unpaired t-test and an HG test, each yielding P-values per experiment, with Cell periphery and Mitochondria highlighted.]
Future plans
• Continue to develop ENRICH
• More data available out there
• Build regulatory networks for yeast and other species
Questions
Thanks
• Prof. Nir Friedman
• Tommy Kaplan
• And to you for listening!!!