Distribution of p values for short and long experiments
Preliminary Exploratory Data
Analysis for Ruth’s G2P Workflow
Bernice Rogowitz
4/30/2010
Exploratory Data Analysis Exercise
• A look at Bjorn’s experimental data with
Lecong’s Lemma analysis
• Main focus: to demonstrate potential for
exploratory data analysis techniques for G2P
visualization
– and to demonstrate ViVA capabilities
• Preliminary identification of “interesting” clusters
of genes, which can feed additional analyses of
pathway and metabolic activity
• This type of analysis can pave the way to
creating an exploratory analysis component
based on ViVA for integration into Ruth’s
workflow
Experimental Data
• 15,085 genes
• 2 experiments
– Long exposure
– Short exposure
• 5 temperatures for each experiment: 8, 10, 12, 14, and 17 degrees Celsius
• Control condition for each experiment: 20 degrees Celsius
• Data
– Gene expression values
– Statistical significance, relative to the control condition
A quick look at expression value distributions
Distribution of p values for short and long
experiments
Notice: lots of
highly significant
values, p<=.05
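A minimal sketch of how the p-value distribution and the share of significant values could be tabulated outside ViVA. The function names and the toy p-values are hypothetical and are not taken from the experimental data; only the 0.05 cutoff comes from the slide.

```python
def p_value_histogram(p_values, n_bins=20):
    """Count p-values in equal-width bins over [0, 1]."""
    counts = [0] * n_bins
    for p in p_values:
        # A p-value of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts

def fraction_significant(p_values, alpha=0.05):
    """Share of p-values at or below the significance cutoff."""
    return sum(1 for p in p_values if p <= alpha) / len(p_values)

# Hypothetical p-values for illustration only.
toy = [0.001, 0.01, 0.04, 0.05, 0.2, 0.5, 0.9, 1.0]
hist = p_value_histogram(toy, n_bins=10)
frac = fraction_significant(toy)
print(hist, frac)
```

A tall first bin relative to the others is what the slide describes as "lots of highly significant values".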
Exploratory data analysis
For each condition,
mark values where
p<=0.05 in green
For each condition,
mark values where
p>0.05 in black
• Result: Green = all those genes that are
significant in all the short conditions.
• Note: no genes are significantly different from the control
in the warmest condition, 17 degrees Celsius.
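The marking step can be sketched in code as a per-condition threshold followed by an "all conditions" test. This is an illustration only: the gene IDs, condition labels such as "S-8", and p-values below are hypothetical, and the interactive brushing done in ViVA is replaced here by a plain filter.

```python
ALPHA = 0.05  # significance cutoff used on the slides

# Hypothetical p-values per gene and short-exposure condition.
p_table = {
    "gene_A": {"S-8": 0.01, "S-10": 0.02, "S-12": 0.03, "S-14": 0.04, "S-17": 0.60},
    "gene_B": {"S-8": 0.01, "S-10": 0.20, "S-12": 0.03, "S-14": 0.04, "S-17": 0.70},
    "gene_C": {"S-8": 0.30, "S-10": 0.40, "S-12": 0.50, "S-14": 0.60, "S-17": 0.80},
}

def mark(p, alpha=ALPHA):
    """Color a single value: green if significant, black otherwise."""
    return "green" if p <= alpha else "black"

def significant_in_all(table, conditions, alpha=ALPHA):
    """Gene IDs whose p-value is <= alpha in every listed condition."""
    return sorted(
        gene for gene, pvals in table.items()
        if all(pvals[c] <= alpha for c in conditions)
    )

cold = ["S-8", "S-10", "S-12", "S-14"]
green_genes = significant_in_all(p_table, cold)
warm_hits = significant_in_all(p_table, ["S-17"])
print(green_genes, warm_hits)
```

In this toy table only gene_A stays green across the four colder conditions, and no gene is significant at 17 degrees.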
Identify genes by ID in “Category
Table”
Genes that are significantly different from controls
across 14C, 12C, 10C and 8C
Short Experiment
Genes that are significantly different from controls
across 14C, 12C, 10C and 8C
Long Experiment
The “yellow” genes are significantly different from
controls in both the short and long experiments
Genes that are significantly different from controls
across 14C, 12C, 10C and 8C
Short Experiment
Two genes
• At 17C, there were no genes that behaved
differently from the control.
• Two genes were significantly different from their
controls in all other experimental conditions
– In the long- and short-duration experiments
– For temperature = 14C, 12C, 10C and 8C
These are:
– 246114_at
– 257252_at
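Finding the genes shared by the short and long experiments amounts to a set intersection. A small sketch, assuming each experiment's significant genes have already been collected into a set; the gene IDs here are hypothetical placeholders, not the experimental results.

```python
# Hypothetical sets of genes significant across 14C, 12C, 10C and 8C.
short_hits = {"gene_A", "gene_D", "gene_F"}
long_hits = {"gene_A", "gene_B", "gene_F", "gene_G"}

# "Yellow" genes: significant in both experiments.
yellow = short_hits & long_hits

# Genes significant in only one of the two experiments.
short_only = short_hits - long_hits
long_only = long_hits - short_hits

print(sorted(yellow), sorted(short_only), sorted(long_only))
```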
Another Analysis
• Identify Genes that are differentially
expressed in the different experiments and
conditions
• First, identify genes that are visually
different from controls.
• Second, filter out identified genes that are
not statistically different from the controls.
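The two-step procedure can be approximated in code. On the slides the first step is done by eye; the sketch below substitutes an absolute-difference threshold for that visual judgment, which is an assumption and not the method used in the presentation. The function names, the threshold, and all values are hypothetical.

```python
def visually_different(expr, control, min_abs_diff=1.0):
    """Stand-in for visual selection: genes far from their control value."""
    return {g for g in expr if abs(expr[g] - control[g]) >= min_abs_diff}

def statistically_different(candidates, p_values, alpha=0.05):
    """Keep only candidates whose p-value meets the cutoff."""
    return {g for g in candidates if p_values[g] <= alpha}

# Hypothetical expression values for one condition and its control.
expr = {"gene_A": 9.5, "gene_B": 7.1, "gene_C": 4.0}
control = {"gene_A": 7.0, "gene_B": 7.0, "gene_C": 6.2}
p_values = {"gene_A": 0.003, "gene_B": 0.01, "gene_C": 0.30}

candidates = visually_different(expr, control)            # step 1
confirmed = statistically_different(candidates, p_values)  # step 2
print(sorted(candidates), sorted(confirmed))
```

Note that gene_B has a small p-value but is never selected, because it does not pass the first step; the order of the two filters matters for which genes are examined.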
Expression vs. Control
Short Experiment
Long Experiment
Visual Exploration: Short Condition
1. Visually identify genes in the short-14 condition that are different from
controls. Color them red.
2. Examine the short-8 condition. Color the additional genes that are
visually different from controls in short-8 green.
Visual Exploration: Short Condition
1. Genes that are active in the short-14 condition tend to be more highly
differentiated at the lower temperature, short-8 (red)
2. Additional genes are differentiated at the lower temperature (green)
Visual Exploration: Long Condition
1. Some red and green genes are also visually differentiated in the
long condition
2. Additional genes are visually differentiated, which are not green
or red. Color them blue.
Visual Exploration: Return to the
Short Condition
1. Genes that are differentiated in the long condition (blue) are basically not
differentiated in the short condition (no blue genes are present)
Panels: S-14, S-8, L-8
“Blue” Genes
28 genes were identified that are
very different from the
control in the long-8
condition, but not in the
short conditions.
Next: which of these are
significantly different?
ViVA “Category Table”
Table sorted by p-value. Not all “blue” genes are significantly different from the
control. To see which ones are, look at the blue genes only.
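Sorting the selected genes by p-value and grouping them by significance level can be sketched as follows. The gene IDs and p-values are hypothetical; the 0.01 and 0.05 levels are the ones named on the slides.

```python
# Hypothetical p-values for a few "blue" genes in the long-8 condition.
blue_p = {"gene_H": 0.20, "gene_I": 0.004, "gene_J": 0.03, "gene_K": 0.05}

def tier(p):
    """Assign a p-value to the strictest level it satisfies."""
    if p <= 0.01:
        return "p<=0.01"
    if p <= 0.05:
        return "p<=0.05"
    return "not significant"

# Equivalent of sorting the category table by p-value, smallest first.
ranked = sorted(blue_p.items(), key=lambda item: item[1])
tiers = {gene: tier(p) for gene, p in ranked}

for gene, p in ranked:
    print(gene, p, tiers[gene])
```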
Identifying p<=0.05 deviations
Legend: p<=0.01, p<=0.05
Genes that are significantly expressed
for long, but not short, duration
Legend: p<=0.01, p<=0.05
Goal: Venn Diagram of CoExpression (a la Ruth)
Sets: Short 14, Long 14, Short 8, Long 8 (the diagram shows a count of 2)
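The region counts for such a Venn diagram can be computed by recording, for each gene, which of the four sets it belongs to. A minimal sketch; the function name venn_regions and the gene IDs g1 to g6 are hypothetical, and the memberships are invented for illustration.

```python
def venn_regions(sets):
    """Count genes per exact combination of set memberships."""
    names = list(sets)
    universe = set().union(*sets.values())
    regions = {}
    for gene in universe:
        # The key lists every set that contains this gene, in a fixed order.
        key = tuple(n for n in names if gene in sets[n])
        regions[key] = regions.get(key, 0) + 1
    return regions

# Hypothetical sets of significant genes per condition.
sets = {
    "Short 14": {"g1", "g2", "g3"},
    "Long 14": {"g1", "g2", "g4"},
    "Short 8": {"g1", "g2", "g5"},
    "Long 8": {"g1", "g2", "g4", "g6"},
}

regions = venn_regions(sets)
center = regions[("Short 14", "Long 14", "Short 8", "Long 8")]
print(center, regions)
```

Each key corresponds to one region of the diagram, so the counts can be passed to whichever plotting tool draws the final figure.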
Next step
• Translate gene ids to searchable format
• Use PlantMetGen and MapMan to identify
involved pathways and metabolic functions