Activist Data Mining (as
Applied to Carbon:Nitrogen
Sensing in Plants)
Dennis Shasha
New York University
Department of Biology: Gloria Coruzzi, Mike Chou, Andrew Kouranov, Laurence Lejay
Courant Institute of Math & Computer Sciences: Dennis Shasha, Bud Mishra, Marco Antoinotti, Marc Rejali
[Figure: Photosynthesis. Labels: Light, Sugar, NH4+, Amino acids (Glu, Gln, Asp, Asn)]
Light, Carbon and Amino acids
differentially regulate N-assimilation genes
[Figure: regulatory inputs for two N-assimilation genes. GS2 (product Gln, C5:N2): Light, Carbon, Amino acids, C:N. AS1 (product Asn, C4:N2): Light, Carbon, Amino acids, C:N.]
Goal: Figure out the Circuit for many genes
A Multi-factor Approach to C:N sensing in plants.
Identify how a combination of interactions of “inputs”
(Light, Carbon, & Nitrogen) affects gene regulation
using Combinatorial Design and Genome Chip analysis.
Identify Arabidopsis mutants defective in C:N sensing
Forward genetics: Selections for C:N sensing mutants
Reverse genetics: Mutants in candidate C:N signaling genes
Ultimate Goal: Virtual plant… (frankenfoods)
A Combinatorial Approach to discovering interactions
Inputs:
*Light
*Starvation to Various Nutrients
*Carbon
*Inorganic N (NO3/NH4)
*Organic N (Glu)
*Organic N (Gln)
If inputs take binary values (a first approximation):
6 binary (+/-) inputs = 2^6 = 64 input combinations (or treatments)
Use combinatorial design to reduce the number of treatment
combinations required to effectively cover the experimental space
ACTIVIST DATA MINING
Don’t study the experiments (only). Change them.
Combinatorial design generates a subset of the 64 treatments
that gives a “good” approximation of the entire experimental space.
For every pair of “inputs”, all four combinations of binary
values are tested:
Example: NO3 and Carbon have four possible combinations
+NO3 +Carbon; +NO3 -Carbon; -NO3 +Carbon; -NO3 -Carbon
Each pairwise combination of input values is present in at least one
of the treatments chosen by combinatorial design
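The pairwise-coverage idea above can be sketched in a few lines of Python. This is a minimal illustration, not the method used for the slides: the greedy selection strategy, the function names, and the short input labels are assumptions made for the example.

```python
from itertools import combinations, product

# Hypothetical short labels for the six inputs listed on the slide.
INPUTS = ["Light", "Starve", "Carbon", "InorganicN", "Glu", "Gln"]
LEVELS = ("-", "+")  # binary first approximation


def pairs_covered(treatment):
    """Every (input_i, level_i, input_j, level_j) pair present in one treatment."""
    return {
        (i, treatment[i], j, treatment[j])
        for i, j in combinations(range(len(treatment)), 2)
    }


def all_pairs(n_inputs, levels=LEVELS):
    """Every pair of inputs combined with every pair of levels."""
    return {
        (i, a, j, b)
        for i, j in combinations(range(n_inputs), 2)
        for a, b in product(levels, repeat=2)
    }


def greedy_pairwise_design(n_inputs, levels=LEVELS):
    """Greedy sketch: repeatedly add the treatment covering the most uncovered pairs."""
    candidates = list(product(levels, repeat=n_inputs))
    uncovered = all_pairs(n_inputs, levels)
    design = []
    while uncovered:
        best = max(candidates, key=lambda t: len(pairs_covered(t) & uncovered))
        design.append(best)
        uncovered -= pairs_covered(best)
    return design


def is_pairwise_covering(design, n_inputs, levels=LEVELS):
    """True if every pair of inputs shows every combination of levels."""
    covered = set()
    for treatment in design:
        covered |= pairs_covered(treatment)
    return all_pairs(n_inputs, levels) <= covered


design = greedy_pairwise_design(len(INPUTS))
print(len(list(product(LEVELS, repeat=len(INPUTS)))), "treatments in the full space")
print(len(design), "treatments in the greedy pairwise design")
for treatment in design:
    print(" ".join(level + name for level, name in zip(treatment, INPUTS)))
```

A greedy search is easy to follow but is not guaranteed to find the smallest possible design; dedicated covering-array constructions can do better.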
“Combinatorial design” yields 12 conditions to test the effect of
Light across pairwise combinations of the Starvation, Carbon, and Nitrogen inputs
EXPT 1 (PIVOT: LIGHT)
LANE  LIGHT  STARVE  CARBON  NO3NH4  GLU  GLN
1     LIGHT  N       0       0       0    0
2     LIGHT  Y       0       L       0    H
3     LIGHT  Y       L       0       H    0
4     LIGHT  N       L       L       H    H
5     LIGHT  N       0       0       H    H
6     LIGHT  N       L       L       0    0
7     DARK   N       0       0       0    0
8     DARK   Y       0       L       0    H
9     DARK   Y       L       0       H    0
10    DARK   N       L       L       H    H
11    DARK   N       0       0       H    H
12    DARK   N       L       L       0    0
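As a sanity check, the 12-lane design above can be tested for the pairwise property directly. The sketch below copies the lane values from the table; the helper names (`contexts_for`, `missing_pairs`) are assumptions introduced for illustration.

```python
from itertools import combinations, product

COLUMNS = ("STARVE", "CARBON", "NO3NH4", "GLU", "GLN")

# (lane, light, STARVE, CARBON, NO3NH4, GLU, GLN), copied from the EXPT 1 table.
LANES = [
    (1, "LIGHT", "N", "0", "0", "0", "0"),
    (2, "LIGHT", "Y", "0", "L", "0", "H"),
    (3, "LIGHT", "Y", "L", "0", "H", "0"),
    (4, "LIGHT", "N", "L", "L", "H", "H"),
    (5, "LIGHT", "N", "0", "0", "H", "H"),
    (6, "LIGHT", "N", "L", "L", "0", "0"),
    (7, "DARK", "N", "0", "0", "0", "0"),
    (8, "DARK", "Y", "0", "L", "0", "H"),
    (9, "DARK", "Y", "L", "0", "H", "0"),
    (10, "DARK", "N", "L", "L", "H", "H"),
    (11, "DARK", "N", "0", "0", "H", "H"),
    (12, "DARK", "N", "L", "L", "0", "0"),
]


def contexts_for(light):
    """Non-pivot input settings for all lanes with the given LIGHT value."""
    return [row[2:] for row in LANES if row[1] == light]


def missing_pairs(contexts):
    """List (col_i, col_j, value_i, value_j) combinations that no context contains."""
    missing = []
    for i, j in combinations(range(len(COLUMNS)), 2):
        values_i = sorted({c[i] for c in contexts})
        values_j = sorted({c[j] for c in contexts})
        seen = {(c[i], c[j]) for c in contexts}
        for vi, vj in product(values_i, values_j):
            if (vi, vj) not in seen:
                missing.append((COLUMNS[i], COLUMNS[j], vi, vj))
    return missing


light_contexts = contexts_for("LIGHT")
dark_contexts = contexts_for("DARK")
print("same contexts in light and dark:", light_contexts == dark_contexts)
print("uncovered pairs:", missing_pairs(light_contexts))
```

For this table the six contexts are identical under LIGHT and DARK, and no pair of value settings among the five non-pivot inputs is left uncovered.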
“Pivot” analysis of gene expression data from C:N treatments
Find “minimal pairs” of treatments that are the same except
in one input (e.g. Light) to measure its effect on a dependent
variable (gene) (e.g. AS1)
PIVOT  DEPENDENT VARIABLE (GENE)  EFFECT   EVIDENCE = MINIMAL PAIR TREATMENTS  LITE  STARVE  CARBON  NO3  GLU
LIGHT  AS1                        repress  4_8                                 L_D   N       L       0    H
Analyze a series of minimal pair treatments using one input
(e.g. Light) as a “pivot”, to determine the effect of light
on a dependent variable (e.g. AS1) under a variety of carbon
and nitrogen combinations. If the effect is consistent across
these contexts, it likely holds in general.
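Finding minimal pairs is mechanical once the treatments are tabulated. The sketch below uses six lanes from the 12-lane LIGHT/DARK design; the function name `find_minimal_pairs` is an assumption for illustration, and the lane numbers are those of that design, not the evidence labels shown in the pivot tables.

```python
from itertools import combinations

COLUMNS = ("LIGHT", "STARVE", "CARBON", "NO3NH4", "GLU", "GLN")

# lane -> input settings (a subset of the 12-lane design)
TREATMENTS = {
    1: ("LIGHT", "N", "0", "0", "0", "0"),
    2: ("LIGHT", "Y", "0", "L", "0", "H"),
    3: ("LIGHT", "Y", "L", "0", "H", "0"),
    7: ("DARK", "N", "0", "0", "0", "0"),
    8: ("DARK", "Y", "0", "L", "0", "H"),
    9: ("DARK", "Y", "L", "0", "H", "0"),
}


def find_minimal_pairs(treatments, pivot):
    """Return (lane_a, lane_b) pairs that differ in the pivot column and nowhere else."""
    k = COLUMNS.index(pivot)
    pairs = []
    for (a, ta), (b, tb) in combinations(sorted(treatments.items()), 2):
        differing = [i for i in range(len(COLUMNS)) if ta[i] != tb[i]]
        if differing == [k]:
            pairs.append((a, b))
    return pairs


light_pairs = find_minimal_pairs(TREATMENTS, "LIGHT")
glu_pairs = find_minimal_pairs(TREATMENTS, "GLU")
print("minimal pairs for LIGHT:", light_pairs)
print("minimal pairs for GLU:", glu_pairs)
```

With these lanes, LIGHT has three minimal pairs (1_7, 2_8, 3_9) while GLU has none, which is why a separate set of treatments is needed when GLU is the pivot.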
LITE represses AS1 & induces GS2 under a variety of C:N conditions
PIVOT  DEPENDENT  EFFECT   EVIDENCE = MINIMAL PAIR TREATMENTS  LITE  STARVE  CARBON  NO3/NH4  GLU
LIGHT  AS1        repress  1_5                                 L_D   Y       0       0        0
LIGHT  AS1        repress  2_6                                 L_D   Y       L       0        0
LIGHT  AS1        repress  3_7                                 L_D   Y       L       L        0
LIGHT  AS1        repress  4_8                                 L_D   N       L       0        H
LIGHT  AS1        repress  10_14                               L_D   N       0       0        0
LIGHT  AS1        repress  11_15                               L_D   Y       L       0        0
LIGHT  AS1        repress  12_16                               L_D   Y       L       L        0
LIGHT  AS1        repress  13_17                               L_D   Y       L       0        H
LIGHT  GS2        induce   1_5                                 L_D   Y       0       0        0
LIGHT  GS2        induce   2_6                                 L_D   Y       L       0        0
LIGHT  GS2        induce   3_7                                 L_D   Y       L       L        0
LIGHT  GS2        induce   4_8                                 L_D   Y       L       0        H
LIGHT  GS2        induce   10_14                               L_D   N       0       0        0
LIGHT  GS2        induce   11_15                               L_D   N       L       0        0
LIGHT  GS2        induce   12_16                               L_D   N       L       L        0
LIGHT  GS2        induce   13_17                               L_D   Y       L       0        H
GLU induces AS1 & represses GS2 under a variety of conditions
PIVOT  GENE  EFFECT   EVIDENCE = MINIMAL PAIR TREATMENTS  LIGHT  STARVE  CARBON  NO3/NH4  GLU
GLU    AS1   induce   2_4                                 L      Y       L       0        0_H
GLU    AS1   induce   6_8                                 D      Y       L       0        0_H
GLU    AS1   induce   15_17                               D      Y       L       0        0_H
GLU    AS1   induce   19_21                               D      N       L       0        0_H
GLU    AS1   induce   23_25                               L      N       L       0        0_L
GLU    AS1   induce   26_28                               L      Y       L       0        0_L
GLU    AS1   induce   30_32                               L      Y       L       0        0_L
GLU    GS2   repress  2_4                                 L      Y       L       0        0_H
GLU    GS2   repress  6_8                                 D      Y       L       0        0_H
GLU    GS2   repress  11_13                               L      Y       L       0        0_H
GLU    GS2   repress  15_17                               D      Y       L       0        0_H
GLU    GS2   repress  19_21                               D      N       L       0        0_H
GLU    GS2   repress  20_22                               D      N       L       L        0_H
GLU    GS2   repress  23_25                               L      N       L       0        0_L
GLU    GS2   repress  30_32                               L      Y       L       0        0_L
Underlying Method: combinatorial design
Combinatorial design: Inspired by work in software testing by
David Cohen, Siddhartha Dalal, Michael Fredman and
Gardner Patton at Bellcore/Telcordia.
Their problem: how to test a good set of inputs to a
program to discover whether there are any bugs.
Not program coverage, but input coverage.
Not all input combinations, but all combinations of
every pair of input variables.
Hypothesis: every input combination should give the
same output: no error.
If true for the designed subset, then the program is ok.
Underlying Method: combinatorial design 2
Scientific question: does input X induce
(resp. repress) the output?
If so, then, regardless of the other inputs,
X should induce.
So, choose X = low and then a combinatorial design of
the other inputs.
Then choose X = high and then the same combinatorial
design of the other inputs.
If, for each context c in the design, (high, c) has more
output than (low, c) -- a minimal pair -- then X is inductive.
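The decision rule above can be written as a small function. This is a sketch under stated assumptions: the function name and the expression numbers are hypothetical, and a real analysis would also need replicates and a noise threshold before calling a difference induction or repression.

```python
def classify_pivot_effect(low, high):
    """Compare output at X = high vs. X = low in every shared context c.

    `low` and `high` map a context (tuple of the other inputs) to an output level.
    Returns 'induce', 'repress', or 'none' if all minimal pairs agree, else 'mixed'.
    """
    contexts = set(low) & set(high)
    if not contexts:
        raise ValueError("no shared contexts, so no minimal pairs to compare")
    directions = set()
    for c in contexts:
        diff = high[c] - low[c]
        if diff > 0:
            directions.add("induce")
        elif diff < 0:
            directions.add("repress")
        else:
            directions.add("none")
    return directions.pop() if len(directions) == 1 else "mixed"


# Hypothetical output levels; contexts are (STARVE, CARBON, NO3NH4) settings.
x_low = {("N", "0", "0"): 1.0, ("Y", "0", "L"): 1.2, ("Y", "L", "0"): 0.9}
x_high = {("N", "0", "0"): 2.5, ("Y", "0", "L"): 2.0, ("Y", "L", "0"): 1.8}

uniform = classify_pivot_effect(x_low, x_high)
print(uniform)

# Flip one context to show a non-uniform outcome.
x_high_odd = dict(x_high)
x_high_odd[("Y", "L", "0")] = 0.4
non_uniform = classify_pivot_effect(x_low, x_high_odd)
print(non_uniform)
```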
Underlying Methods: adaptive design
What happens when X isn’t uniformly inductive or repressive?
Suppose X shows induction normally, but repression
occasionally. That is, for most c values
(low, c) vs. (high, c) shows induction, but for one c’
(low, c’) vs. (high, c’) shows repression.
Then study the differences between c’ and the c values
showing induction that are closest to it, and
design experiments to narrow down those differences.
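One way to make "closest" concrete is Hamming distance between contexts, i.e. the number of inputs on which they differ. That choice, the function names, and the effect labels below are assumptions for illustration; the slide does not specify a distance measure.

```python
def hamming(c1, c2):
    """Number of inputs on which two contexts differ."""
    return sum(a != b for a, b in zip(c1, c2))


def nearest_normal_contexts(effects, odd_context, normal_effect):
    """Contexts showing the normal effect that are closest to the odd context."""
    normal = [c for c, e in effects.items() if e == normal_effect and c != odd_context]
    best = min(hamming(c, odd_context) for c in normal)
    return [c for c in normal if hamming(c, odd_context) == best]


def differing_inputs(names, c1, c2):
    """Names of the inputs whose values differ between two contexts."""
    return [n for n, a, b in zip(names, c1, c2) if a != b]


def one_step_contexts(normal, odd):
    """Follow-up contexts: change one differing input of `normal` to its value in `odd`."""
    steps = []
    for i, (a, b) in enumerate(zip(normal, odd)):
        if a != b:
            steps.append(normal[:i] + (b,) + normal[i + 1:])
    return steps


NAMES = ("STARVE", "CARBON", "NO3NH4", "GLU")

# Hypothetical outcomes of minimal pairs for some pivot X, one per context.
EFFECTS = {
    ("N", "0", "0", "0"): "induce",
    ("Y", "0", "L", "0"): "induce",
    ("Y", "L", "0", "H"): "induce",
    ("N", "L", "L", "H"): "repress",  # the exceptional context c'
}

odd = ("N", "L", "L", "H")
nearest = nearest_normal_contexts(EFFECTS, odd, "induce")
print("nearest normal contexts:", nearest)
for c in nearest:
    print("differs in:", differing_inputs(NAMES, c, odd))
    print("follow-up contexts:", one_step_contexts(c, odd))
```

Each follow-up context changes a single input, so running the pivot comparison there indicates which of the differing inputs is responsible for the switch from induction to repression.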
Conclusions About Methodology
Design/don’t wait: Use the data you are given, sure, but
don’t be shy to ask for more.
Combinatorial Design can help test a hypothesis:
e.g. 10 three-valued variables require 3^10 =
59,049 experiments to cover the whole space. Combinatorial
design can reduce this to 27.
Adaptation is easy: Study differences between normal cases
and abnormal ones to discover fine structure.
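The arithmetic behind the 59,049 figure, and the reason a small design can still cover every pair, can be checked quickly. The variable names are illustrative; the 27-experiment figure is taken from the slide and is not derived here.

```python
from math import comb

n_vars, n_levels = 10, 3

full_space = n_levels ** n_vars                     # every combination of all inputs
pair_settings = comb(n_vars, 2) * n_levels ** 2     # (pair of inputs) x (pair of values)
pairs_per_experiment = comb(n_vars, 2)              # one experiment fixes a value for every pair

print("full factorial experiments:", full_space)
print("pairwise settings to cover:", pair_settings)
print("pairwise settings covered by one experiment:", pairs_per_experiment)
print("fraction of full space used by 27 experiments:", 27 / full_space)
```

Because a single experiment covers 45 of the 405 pairwise settings at once, the number of experiments needed for pairwise coverage grows far more slowly than the full space.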