Human Genome Program, U.S. Department of Energy, Genomics and Its Impact on Medicine and Society: A 2001 Primer, 2001
• Two examples
• A challenge
Liquid Association (LA)
• LA is a generalized notion of association for describing a certain kind of ternary relationship between variables in a system (Li 2002, PNAS).
Liquid Association
[Figure: scatter plot of the expression profiles of genes X and Y, both axes running from low (-) to high (+). Legend: state 1, state 2, transit, Linear (state 1), Linear (state 2).]
• Green points represent four conditions for cellular state 1.
• Red points represent four conditions for cellular state 2.
• Blue points represent the transit state between cellular states 1 and 2.
• (X,Y) forms a LA.
Profiles of genes X and Y are displayed in the above scatter plot.
Important! Correlation between X and Y is 0
Mathematical Statistics on LA
• EX=0, EY=0, SD(X)=SD(Y)=1
• LA is defined by the following equations. g(Z) is the conditional expectation of XY given Z; it summarizes the association between X and Y at that value of Z. LA(X,Y|Z) is the expected change (derivative) of this association as Z varies.
$$ g(Z) = E_{X,Y}(XY \mid Z) $$
$$ LA(X,Y \mid Z) = E_Z\big(g'(Z)\big) $$
Stein Lemma
• Computing E(g’(Z)) directly is not easy. With help from mathematical statistics theory, LA(X,Y|Z) simplifies to E(XYZ) when Z follows a standard normal distribution.
Stein lemma
$$ LA(X,Y \mid Z) = E(g'(Z)) = E(Z\,g(Z)) = E(Z\,E(XY \mid Z)) = E(E(XYZ \mid Z)) = E(XYZ) $$
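The identity LA(X,Y|Z) = E(g'(Z)) = E(XYZ) can be checked numerically. The sketch below is a toy simulation, not yeast data: it assumes g(z) = tanh(z) as the conditional correlation (a choice made only for illustration) and builds X and Y so that EX = EY = 0, SD(X) = SD(Y) = 1 and Z is standard normal. The function name `simulate_la` is hypothetical.

```python
import math
import random

def simulate_la(n=200_000, seed=0):
    """Monte Carlo estimates of Corr(X,Y), E(XYZ) and E(g'(Z)) for a toy model."""
    rng = random.Random(seed)
    sum_xy = sum_xyz = sum_gprime = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)      # mediator Z ~ N(0, 1)
        rho = math.tanh(z)           # assumed g(z) = E(XY | Z = z)
        x = rng.gauss(0.0, 1.0)      # X ~ N(0, 1), independent of Z
        e = rng.gauss(0.0, 1.0)      # independent noise
        # Y has mean 0 and variance 1, with E(XY | Z) = rho
        y = rho * x + math.sqrt(1.0 - rho * rho) * e
        sum_xy += x * y
        sum_xyz += x * y * z
        sum_gprime += 1.0 - rho * rho   # g'(z) = 1 - tanh(z)^2
    return sum_xy / n, sum_xyz / n, sum_gprime / n

corr_xy, e_xyz, e_gprime = simulate_la()
print(f"Corr(X,Y) ~ {corr_xy:.3f}")
print(f"E(XYZ)    ~ {e_xyz:.3f}")
print(f"E(g'(Z))  ~ {e_gprime:.3f}")
```

In this toy model the overall correlation of X and Y is close to 0, while E(XYZ) and E(g'(Z)) agree up to simulation error, which is the situation LA is meant to detect.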
gene-expression data
          cond1   cond2   ……..   condp
gene1     x11     x12     ……..   x1p
gene2     x21     x22     ……..   x2p
…         …       …              …
gene n    xn1     xn2     ……..   xnp
Correlation Coefficient
• Has been used by Gauss, Bravais, Edgeworth …
• Its sweeping impact in data analysis is due to Galton (1822-1911), “Typical laws of heredity in man”
• Karl Pearson modified and popularized its use.
• A building block in multivariate analysis, of which clustering, classification, and dimension reduction are recurrent themes
An application
Two-class problem:
• ALL (acute lymphoblastic leukemia)
• AML (acute myeloid leukemia)
Why does clustering make sense biologically?
The rationale is: genes with a high degree of expression similarity are likely to be functionally related. They
• may form a structural complex,
• may participate in common pathways,
• may be co-regulated by common upstream regulatory elements.
Simply put,
Profile similarity implies functional association
However, the converse is not true
The expression profiles of the majority of functionally associated genes are indeed uncorrelated
• Microarray is too noisy
• Biology is complex
Why no correlation?
• Protein rarely works alone
• Protein has multiple functions
• Different biological processes or pathways have to be
synchronized
• Competing use of finite resources: metabolites, hormones
• Protein modification: phosphorylation, proteolysis, shuttle, …
• Transcription factors serving both as activators and repressors
Transcription factors: proteins that bind to DNA
Activators; repressors
Going subtle: Protein modification
• Histone inhibits transcription
• To activate transcription, the lysine side chain must be acetylated.
• Corepressor: histone deacetylase
• Thyroid hormone
• Coactivator: histone acetyltransferase
Weaver (2001)
Math. Modeling: a nightmare
[Diagram labels: Current; mRNA (observed mRNA, hidden mRNA); protein kinase; ATP, GTP, cAMP, etc.; localization (cytoplasm, nucleus, mitochondria, vacuolar); DNA methylation, chromatin structure; nutrients (carbon, nitrogen sources); temperature; water; Next; FITNESS FUNCTION]
Statistical methods become useful
What is LA? PLA?
Concept of “mediator”
Schematic illustration of LA
[Fig2-Top: scatter plot of gene X and gene Y, both axes from low (-) to high (+). Legend: state1, transit, state2, Linear (state1), Linear (state2).]
[Fig2-Bottom: gene Z, from low (-) to high (+), plotted against condition.]
Example 1. Positive-to-negative
• X=ARP4, Y=LAS17, Z=MCM1
• Corr=0 in each plot
• (A). For low Z (marked points in A), X and Y are co-expressed
• (B). For high Z (marked points in B), X and Y are contra-expressed
Arp4: Protein that interacts with core histones, member of the NuA4 histone acetyltransferase complex; actin-related protein
Las17: Component of the cortical actin cytoskeleton
Figure 2. (A) (B)
Example 2. Negative-to-positive
• X=QCR9, Y=ROX1, Z=MCM1
• Corr=0 in each plot
• (A). For low Z (marked points in A), X and Y are contra-expressed
• (B). For high Z (marked points in B), X and Y are co-expressed
Rox1: Heme-dependent transcriptional repressor of hypoxic genes including CYC7 (iso-2-cytochrome c) and ANB1 (translation initiation, ribosome)
Qcr9: Ubiquinol cytochrome c reductase subunit 9
Figure 3. (A) (B)
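The pattern in these two examples, zero overall correlation but opposite correlations at low and high Z, can be illustrated with a small sketch. The profiles below are made up for illustration (they are not the ARP4/LAS17/MCM1 or QCR9/ROX1/MCM1 data), and the helper names `pearson` and `corr_by_z_level` are hypothetical.

```python
import math

def pearson(a, b):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def corr_by_z_level(x, y, z, n_groups=3):
    """Correlation of x and y among the conditions with the lowest and highest z."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    k = len(z) // n_groups
    low, high = order[:k], order[-k:]
    r_low = pearson([x[i] for i in low], [y[i] for i in low])
    r_high = pearson([x[i] for i in high], [y[i] for i in high])
    return r_low, r_high

# Toy data: 4 low-Z conditions, 4 transit conditions, 4 high-Z conditions
z = [-1, -1, -1, -1, 0, 0, 0, 0, 1, 1, 1, 1]
x = [-1.5, -0.5, 0.5, 1.5, 0, 0, 0, 0, -1.5, -0.5, 0.5, 1.5]
y = [-1.5, -0.5, 0.5, 1.5, 0, 0, 0, 0, 1.5, 0.5, -0.5, -1.5]

r_all = pearson(x, y)
r_low, r_high = corr_by_z_level(x, y, z)
print(f"overall: {r_all:.2f}, low Z: {r_low:.2f}, high Z: {r_high:.2f}")
```

Here X and Y are co-expressed at low Z and contra-expressed at high Z, as in Example 1; swapping the sign of y in the high-Z and low-Z groups would give the negative-to-positive pattern of Example 2.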
A Challenge
• What genes behave like that?
• Can we identify all of them?
• N=5878 ORFs
• N choose 3 = 33.8 billion triplets to inspect
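The triplet count can be reproduced directly; this is only a quick arithmetic check using Python's standard library.

```python
import math

n_orfs = 5878                          # number of ORFs quoted on the slide
n_triplets = math.comb(n_orfs, 3)      # unordered triplets {X, Y, Z}
print(f"{n_triplets:,} triplets (~{n_triplets / 1e9:.1f} billion)")
```

Because the triplet product x·y·z does not depend on the order of the three genes, one score per unordered triplet is enough.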
Statistical theory for LA
• X, Y, Z random variables with mean 0 and
variance 1
• Corr(X,Y)=E(XY)=E(E(XY|Z))=Eg(Z)
• g(z)=E(XY|Z=z) is an ideal summary of the association pattern between X and Y when Z=z
• g’(z)=derivative of g(z)
• Definition. The LA of X and Y with respect to Z is LA(X,Y|Z)=Eg’(Z)
Statistical theory-LA
• Theorem. If Z is standard normal, then LA(X,Y|Z)=E(XYZ)
• Proof. By Stein’s Lemma: Eg’(Z)=E(g(Z)Z)=E(E(XY|Z)Z)=E(XYZ)
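For reference, the Stein's Lemma step used in the proof follows from integration by parts. The sketch below assumes that g is differentiable and that g(z)φ(z) vanishes as z → ±∞, where φ denotes the standard normal density (these regularity conditions are assumptions, not stated on the slide).

```latex
\begin{align*}
E\,g'(Z) &= \int_{-\infty}^{\infty} g'(z)\,\varphi(z)\,dz,
  \qquad \varphi(z) = \tfrac{1}{\sqrt{2\pi}}\,e^{-z^{2}/2} \\
&= \Big[\,g(z)\,\varphi(z)\,\Big]_{-\infty}^{\infty}
   - \int_{-\infty}^{\infty} g(z)\,\varphi'(z)\,dz \\
&= \int_{-\infty}^{\infty} g(z)\,z\,\varphi(z)\,dz
   \qquad \text{since } \varphi'(z) = -z\,\varphi(z) \\
&= E\big(g(Z)\,Z\big)
\end{align*}
```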
Additional math. properties:
• bounded by third moment
• =0, if jointly normal
transformation
Normality?
• Convert each gene expression profile by taking the normal score transformation
• LA(X,Y|Z) = average of the triplet product of the three gene profiles:
(x1y1z1 + x2y2z2 + …. ) / n
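A minimal sketch of these two steps follows. The rank-to-score convention Φ⁻¹(rank / (n + 1)) and the handling of ties (broken by position) are assumptions made here for illustration; the slides do not specify them. The function names and the toy profiles are hypothetical.

```python
from statistics import NormalDist

def normal_scores(profile):
    """Rank-based normal scores: rank r in 1..n maps to Phi^{-1}(r / (n + 1))."""
    n = len(profile)
    order = sorted(range(n), key=lambda i: profile[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = NormalDist().inv_cdf(rank / (n + 1))
    return scores

def la_score(x, y, z):
    """Average triplet product of the three normal-score-transformed profiles."""
    xs, ys, zs = normal_scores(x), normal_scores(y), normal_scores(z)
    return sum(a * b * c for a, b, c in zip(xs, ys, zs)) / len(xs)

# Toy profiles (made up): y follows x when z is low and opposes x when z is high
z = [-4, -3, -2, -1, 1, 2, 3, 4]
x = [-3, -1, 1, 3, -4, -2, 2, 4]
y = [-3, -1, 1, 3, 4, 2, -2, -4]

score = la_score(x, y, z)
print(f"LA(X,Y|Z) ~ {score:.3f}")
```

For these toy profiles the score is negative, matching a positive-to-negative change in the X-Y association as Z increases; negating y gives a positive score.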
How does LA work in yeast?
Urea cycle/arginine biosynthesis
Yeast Cell Cycle
(adapted from Molecular Cell Biology, Darnell et al)
Most visible
event
[Figure: urea cycle and arginine biosynthesis pathway, adapted from KEGG. Metabolites: glutamate, glutamine, N-acetylglutamate, ornithine, carbamoyl phosphate, citrulline, aspartate, L-argininosuccinate, fumarate, arginine, urea, L-glutamate-5-semialdehyde, proline. Genes: ARG1, ARG2, ARG3, ARG4, CAR1, CAR2, CPA1, CPA2. Labels: X, Y. Annotation: 8th place negative.]
Figure 2. The four enzymes of the urea cycle are encoded by ARG3, ARG1, ARG4, and CAR1 in S. cerevisiae. ARG2 encodes acetyl-glutamate synthase, which catalyzes the first step of ornithine biosynthesis. CPA1 and CPA2 encode the small and large subunits of carbamoylphosphate synthetase. CAR2 encodes ornithine aminotransferase. This chart is adapted from KEGG.
• Compute LA(X,Y|Z) for all Z
• Rank and find leading genes
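The scan described here can be sketched as follows, assuming the rank-based normal score convention Φ⁻¹(rank / (n + 1)) (an assumption, not taken from the slides). The gene names and profiles are made up, and `rank_mediators` is a hypothetical helper, not the author's software.

```python
from statistics import NormalDist

def normal_scores(profile):
    """Rank-based normal scores: rank r in 1..n maps to Phi^{-1}(r / (n + 1))."""
    n = len(profile)
    order = sorted(range(n), key=lambda i: profile[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = NormalDist().inv_cdf(rank / (n + 1))
    return scores

def rank_mediators(profiles, x_name, y_name, top=1):
    """Compute LA(X,Y|Z) for every other gene Z; return both ends of the ranking."""
    scored = {name: normal_scores(p) for name, p in profiles.items()}
    xs, ys = scored[x_name], scored[y_name]
    n = len(xs)
    results = []
    for name, zs in scored.items():
        if name in (x_name, y_name):
            continue  # X and Y themselves are not candidate mediators
        la = sum(a * b * c for a, b, c in zip(xs, ys, zs)) / n
        results.append((name, la))
    results.sort(key=lambda item: item[1])
    negative_end = results[:top]          # most negative LA first
    positive_end = results[-top:][::-1]   # most positive LA first
    return negative_end, positive_end

# Toy expression profiles over 8 conditions (hypothetical genes)
profiles = {
    "geneX":   [-3, -1, 1, 3, -4, -2, 2, 4],
    "geneY":   [-3, -1, 1, 3, 4, 2, -2, -4],
    "Z_up":    [-4, -3, -2, -1, 1, 2, 3, 4],
    "Z_down":  [4, 3, 2, 1, -1, -2, -3, -4],
    "Z_other": [1, -2, 3, -4, 4, -3, 2, -1],
}

negative_end, positive_end = rank_mediators(profiles, "geneX", "geneY")
print("negative end:", negative_end)
print("positive end:", positive_end)
```

Both ends of the ranking are returned because a strongly negative LA is as informative as a strongly positive one.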
Why negative LA?
• High CPA2: signal for arginine demand.
• Up-regulation of ARG2 concomitant with down-regulation of CAR2 prevents ornithine from leaving the urea cycle.
• When the demand is relieved, CPA2 is lowered and CAR2 is up-regulated, opening up the channel for ornithine to leave the urea cycle.
[Figure: scatter plot of ARG2 and CAR2 expression, each axis running from low to high (-2 to 2). Points are grouped as low CPA2, median CPA2, and high CPA2, with linear fits for the low CPA2 and high CPA2 groups.]
Other examples (see Li 2002)
• X=GLN3 (transcription factor), Y=CAR1, Z=ARG4 (8th place, negative end)
• Electron transport: X=CYT1 (cytochrome c1), gives ATP1 (11 times), ATP5 (subunits of ATPase)
• Calmodulin CMD1, NUF1 (binding target of CMD1), CMK1 (calmodulin-regulated kinase), YGL149W
• Glycolysis genes PFK1, PFK2 (6-phosphofructokinase)
• CYR1 (adenylate cyclase), GSY1 (glycogen synthase), GLC2 (glucan branching), SCH9 (serine/threonine protein kinase; longevity)
SCH9
Protein kinase that regulates signal transduction activity and G1
progression, controls cAPK activity, required for nitrogen activation of
the FGM pathway, involved in life span regulation, homologous to
mammalian Akt/PKB (SGD summary)
• Science. 2001 Apr 13;292(5515):288-90. Regulation of longevity and stress resistance by Sch9 in yeast. Fabrizio P, Pozza F, Pletcher SD, Gendron CM, Longo VD.
•
The protein kinase Akt/protein kinase B (PKB) is implicated in insulin signaling in
mammals and functions in a pathway that regulates longevity and stress
resistance in Caenorhabditis elegans. We screened for long-lived
mutants in nondividing yeast Saccharomyces cerevisiae and
identified mutations in adenylate cyclase and SCH9, which is
homologous to Akt/PKB, that increase resistance to oxidants and extend lifespan by up to threefold. Stress-resistance transcription factors Msn2/Msn4 and
protein kinase Rim15 were required for this life-span extension. These results
indicate that longevity is associated with increased investment in maintenance
and show that highly conserved genes play similar roles in life-span regulation in
S. cerevisiae and higher eukaryotes.
[Figure: scatter plots pairing ARG2 with ARG1, ARG3, ARG4, and CAR1.]
• Blue: low SCH9
• Red: high SCH9