Personalized Medicine
Nathaniel Dang
403221488
CSM124, Spring 2008
Motivation
There are many factors that go into calculating disease risk: environmental, genetic, and sometimes chance.
If we can effectively approximate how much an individual’s disease risk is increased by genetic variation, we can take several actions (where possible), including:
More frequent or earlier screening
Pre-symptomatic medication aimed at reducing one or more risk factors
Focusing on reducing the environmental risk factors, including lifestyle changes
Studying these variations in depth (finding correlated SNPs, studying the proteins encoded by the genes), which leads to a greater understanding of the disease
Goals
Given the probabilities of mutations, the relative risks of those mutations, and a baseline risk level, what are the prevalence and the genetic contribution to risk?
Which has a greater genetic effect on overall risk: more SNPs with lower relative risk or probability, or fewer SNPs with higher relative risk or probability?
Background
We learned in class:
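$$p_a^+ = \frac{\gamma p_a}{(\gamma - 1) p_a + 1}$$
where γ is the relative risk, p_a is the frequency of the risk allele in the population, and p_a^+ is its frequency among cases. We then used p_a and p_a^+ to find the power of the association for a given value of γ.
This approximation assumes a very low prevalence F.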
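As a quick numerical illustration of the class formula p_a^+ = γp_a / ((γ - 1)p_a + 1), here is a minimal Python sketch. The function name `case_allele_freq` is hypothetical, and the sketch assumes the same multiplicative model and very low prevalence as the formula itself; it does not compute power.

```python
def case_allele_freq(p_a: float, gamma: float) -> float:
    """Approximate risk-allele frequency among cases, p_a^+.

    Hypothetical helper: assumes a multiplicative model and a very low
    prevalence F, so that controls look like the general population.
    """
    if not 0.0 <= p_a <= 1.0:
        raise ValueError("p_a must be a probability")
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return gamma * p_a / ((gamma - 1.0) * p_a + 1.0)


if __name__ == "__main__":
    # gamma = 1: the allele is not enriched among cases
    print(case_allele_freq(0.1, 1.0))  # 0.1
    # gamma = 2: p_a^+ = 0.2 / 1.1
    print(round(case_allele_freq(0.1, 2.0), 4))  # 0.1818
```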
Background
Now we wish to find how one or more SNPs affect disease risk: how much more likely does carrying a SNP make it that an individual will develop the disease?
To simplify the problem, assume that the SNPs are not correlated, so their effects on disease risk are completely independent.
Single-SNP Case
The probability of an individual having the disease without the mutation is given by
R = P(+|g1=0), where R is our baseline level of risk.
Then, the probability of having the disease given the mutation is:
P(+|g1=1) = γ1R
Thus, if P1 is the probability that an individual carries the mutation at SNP1, the prevalence in the entire population is:
F=P1γ1R + (1-P1)R
Then, the genetic contribution can be found by dividing
by the baseline risk:
F/R=P1 γ1+(1-P1)
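The single-SNP formulas can be checked numerically with a short Python sketch. The function names and the baseline risk R = 0.01 are hypothetical choices for illustration; the point is that F depends on R, while F/R does not.

```python
def prevalence_one_snp(p1: float, gamma1: float, r: float) -> float:
    """F = P1*gamma1*R + (1 - P1)*R for a single SNP."""
    return p1 * gamma1 * r + (1.0 - p1) * r


def genetic_contribution_one_snp(p1: float, gamma1: float) -> float:
    """F/R = P1*gamma1 + (1 - P1); the baseline risk R cancels out."""
    return p1 * gamma1 + (1.0 - p1)


if __name__ == "__main__":
    r = 0.01  # hypothetical baseline risk
    f = prevalence_one_snp(0.1, 10.0, r)
    print(round(f, 4))  # 0.019
    print(round(f / r, 2), round(genetic_contribution_one_snp(0.1, 10.0), 2))  # 1.9 1.9
```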
Two SNP Case
The probability of an individual having disease without mutation is
given by
P( + | g1=0, g2=0) = R, where R is our baseline level of risk.
Then, the probability of having disease given a mutation at SNP1 but
not SNP2 is:
P(+ | g1=1,g2=0) = γ1R
And the probability of having disease given a mutation at SNP2 but
not SNP1 is:
P(+ | g1=0,g2=1) = γ2R
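Because the SNPs’ effects are independent, the probability of having disease given mutations at both SNPs is:
P(+ | g1=1,g2=1) = γ1γ2R
Thus, with P1 and P2 the probabilities of carrying the mutation at SNP1 and SNP2, the prevalence in the entire population is: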
F =P1P2 γ1 γ2R + P1(1-P2) γ1 R + P2(1-P1) γ2R + (1-P1)(1-P2)R
Then, the genetic contribution can be found by dividing by the
baseline risk:
F/R = P1P2 γ1 γ2 + P1(1-P2) γ1 + P2(1-P1) γ2 + (1-P1)(1-P2)
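A minimal Python sketch of the two-SNP F/R, with one term per genotype class, lets the two SNPs have different P and γ values. The function name is hypothetical. The second example uses made-up inputs (P1 = 0.1, γ1 = 2, P2 = 0.3, γ2 = 5) to show that, under the independence assumption, the four-term sum equals the product of the two single-SNP contributions, (1.1)(2.2) = 2.42.

```python
def genetic_contribution_two_snps(p1: float, gamma1: float,
                                  p2: float, gamma2: float) -> float:
    """F/R for two independent SNPs, summing the four genotype classes."""
    both = p1 * p2 * gamma1 * gamma2   # g1=1, g2=1
    only1 = p1 * (1 - p2) * gamma1     # g1=1, g2=0
    only2 = p2 * (1 - p1) * gamma2     # g1=0, g2=1
    neither = (1 - p1) * (1 - p2)      # g1=0, g2=0
    return both + only1 + only2 + neither


if __name__ == "__main__":
    print(round(genetic_contribution_two_snps(0.1, 10.0, 0.1, 10.0), 2))  # 3.61
    print(round(genetic_contribution_two_snps(0.1, 2.0, 0.3, 5.0), 2))    # 2.42
```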
What about n-SNP cases?
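Generalizing to n SNPs, with probability p_i and relative risk γ_i at SNP i, it can be shown that:
$$F/R = \sum_{g \in \{0,1\}^n} \prod_{i=1}^{n} p_i^{g_i} (1 - p_i)^{1 - g_i} \gamma_i^{g_i}$$
where g_i is the genotype for SNP i (1 for mutation, 0 for normal) and the sum runs over all 2^n genotype vectors g.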
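Because each g_i ranges over {0, 1} independently of the others, the sum of products can be regrouped by the distributive law into a product of per-SNP sums. This rearrangement is our own addition rather than something stated on the slides, but it gives a convenient check on the tables: for example, with p = 0.1 and γ = 10, n = 2 gives 1.9^2 = 3.61.

```latex
\frac{F}{R}
  = \sum_{g \in \{0,1\}^n} \prod_{i=1}^{n} p_i^{g_i} (1 - p_i)^{1 - g_i} \gamma_i^{g_i}
  = \prod_{i=1}^{n} \sum_{g_i \in \{0,1\}} (p_i \gamma_i)^{g_i} (1 - p_i)^{1 - g_i}
  = \prod_{i=1}^{n} \bigl( p_i \gamma_i + 1 - p_i \bigr)
  = \prod_{i=1}^{n} \bigl( 1 + p_i (\gamma_i - 1) \bigr)

% With p_i = p and \gamma_i = \gamma for every SNP:
\frac{F}{R} = \bigl( 1 + p (\gamma - 1) \bigr)^{n}
```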
Implementation
Treat each SNP’s genotype as a binary value: 1 for
mutation, 0 for normal
For example, applying the formula from the previous slide to the 2-SNP case gives, as seen before:
F/R = (1-P1)(1-P2) + P1γ1(1-P2) + P2γ2(1-P1) + P1γ1P2γ2
with one term for each binary genotype row: 00, 01, 10, and 11.
Treating all pi’s and γi’s as the same values P and γ, this simplifies to:
F/R = (1-P)^2 + Pγ(1-P) + (1-P)Pγ + (Pγ)^2
Implementation
So, to find F/R for n SNPs, we generate an n-digit binary matrix containing every value from 0 to 2^n - 1 (decimal), one value per row. Each of the 2^n rows represents one of the terms in the F/R equation.
We then iterate over the rows, counting the number of ‘1’ and ‘0’ values: for each ‘1’ in the row, the term is multiplied by Pγ, and for each ‘0’, it is multiplied by (1-P).
Finally, we sum the terms from all rows to get F/R.
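The row-by-row procedure can be sketched in Python as follows. This is not the original R program; the function name is hypothetical, and instead of storing the full 2^n × n matrix the sketch generates the binary rows one at a time with `itertools.product`. That keeps memory use small, although the running time still doubles with every added SNP.

```python
from itertools import product


def genetic_contribution_enumerated(n: int, p: float, gamma: float) -> float:
    """F/R for n SNPs sharing p and gamma, summed over all 2**n binary rows."""
    total = 0.0
    for row in product((0, 1), repeat=n):  # rows 00...0 through 11...1
        ones = sum(row)
        zeros = n - ones
        # each '1' multiplies the term by p*gamma, each '0' by (1 - p)
        total += (p * gamma) ** ones * (1.0 - p) ** zeros
    return total


if __name__ == "__main__":
    print(round(genetic_contribution_enumerated(1, 0.1, 10.0), 2))  # 1.9
    print(round(genetic_contribution_enumerated(2, 0.1, 10.0), 2))  # 3.61
    print(round(genetic_contribution_enumerated(10, 0.1, 2.0), 2))  # 2.59
```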
Methods
We calculated F/R (the genetic contribution) for several cases:
Holding the number of SNPs constant, how do varying minor allele frequencies affect F/R?
Holding the number of SNPs constant, how do varying relative risks for each SNP affect F/R?
Finally, holding the minor allele frequencies and relative risks constant for all SNPs, how does the number of SNPs affect F/R?
We then compared and contrasted the three scenarios.
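The three sweeps could be scripted along the following lines. This is a sketch, not the original R code: the grids mirror the p and γ values in the results tables, and it uses the closed form (1 + p(γ - 1))^n, which equals the enumerated 2^n-term sum when all SNPs share p and γ (by the distributive law), so the output can be compared against the enumeration directly.

```python
# Grids matching the p and gamma values used in the results tables
P_VALUES = [0.1, 0.2, 0.3, 0.4, 0.5]
GAMMA_VALUES = [1.0, 2.0, 3.0, 5.0, 10.0, 25.0]


def genetic_contribution(n: int, p: float, gamma: float) -> float:
    """F/R for n independent SNPs sharing p and gamma.

    (1 + p*(gamma - 1))**n equals the 2**n-term enumerated sum.
    """
    return (1.0 + p * (gamma - 1.0)) ** n


def sweep(n: int) -> dict:
    """Map each gamma to its list of F/R values across P_VALUES, for n SNPs."""
    return {g: [genetic_contribution(n, p, g) for p in P_VALUES] for g in GAMMA_VALUES}


if __name__ == "__main__":
    for n in (1, 2, 10):
        print(f"{n} SNP(s): rows are gamma, columns are p")
        print("gamma   " + "".join(f"{p:>12}" for p in P_VALUES))
        for g, row in sweep(n).items():
            print(f"{g:<8g}" + "".join(f"{v:>12.4g}" for v in row))
```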
Results: Varying γ and p (same values for all SNPs)
One SNP (F/R; rows: relative risk γ, columns: minor allele frequency p):
γ \ p    0.1    0.2    0.3    0.4    0.5
1.0      1.0    1.0    1.0    1.0    1.0
2.0      1.1    1.2    1.3    1.4    1.5
3.0      1.2    1.4    1.6    1.8    2.0
5.0      1.4    1.8    2.2    2.6    3.0
10.0     1.9    2.8    3.7    4.6    5.5
25.0     3.4    5.8    8.2    10.6   13.0
Two SNPs (F/R; rows: relative risk γ, columns: minor allele frequency p):
γ \ p    0.1      0.2      0.3      0.4      0.5
1.0      1.0      1.0      1.0      1.0      1.0
2.0      1.21     1.44     1.69     1.96     2.25
3.0      1.44     1.96     2.56     3.24     4.0
5.0      1.96     3.24     4.84     6.76     9.0
10.0     3.61     7.84     13.69    21.16    30.25
25.0     11.56    33.64    67.24    112.36   169.00
Results
Ten SNPs (F/R; rows: relative risk γ, columns: minor allele frequency p):
γ \ p    0.1         0.2         0.3         0.4         0.5
1.0      1.0         1.0         1.0         1.0         1.0
2.0      2.59        6.19        13.78       28.92       57.66
3.0      6.19        28.9        109.9       357.04      1024.00
5.0      28.92       357.04      2655.99     14116.70    59049.00
10.0     6.13e+02    2.96e+04    4.81e+05    4.24e+06    2.53e+07
25.0     2.06e+05    4.31e+07    1.37e+09    1.79e+10    1.38e+11
Results
[Figure: F/R vs. relative risk (γ) for 1 SNP and 2 SNPs, p = 0.1]
[Figure: F/R vs. number of SNPs for relative risk γ = 2, 3, and 5, p held constant at 0.1]
Conclusions
Here we can see that, as expected, when holding all else constant, larger minor allele frequencies lead to an increased genetic contribution to disease risk (whenever γ > 1; at γ = 1, F/R stays at 1).
Similarly, all else constant, larger relative risks also lead to an increased genetic contribution to disease risk.
Likewise, larger numbers of SNPs lead to an increased genetic contribution to disease risk.
Conclusions
However, they do not scale equivalently!
For example, holding p=0.1, increasing γ from 1 to 10 increased F/R by factors of 1.9, 3.61, and 613 for one, two, and ten SNPs, respectively.
Yet holding p=0.1, increasing the number of SNPs from 1 to 10 increased F/R by factors of 2.36, 5.16, 20.66, and 322.7 for γ=2, 3, 5, and 10, respectively.
We can see that, in most cases, increasing the number of SNPs by a factor of 10 has a greater effect on F/R than increasing γ by the same factor.
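These growth factors are ratios of entries in the results tables and can be recomputed directly. A minimal Python sketch follows; the helper names are hypothetical, and it assumes the closed form F/R = (1 + p(γ - 1))^n for SNPs that share p and γ.

```python
def genetic_contribution(n: int, p: float, gamma: float) -> float:
    """F/R for n independent SNPs sharing p and gamma: (1 + p*(gamma - 1))**n."""
    return (1.0 + p * (gamma - 1.0)) ** n


def gamma_growth(p, n_values, gamma_lo=1.0, gamma_hi=10.0):
    """Factor by which F/R grows when gamma goes from gamma_lo to gamma_hi."""
    return {n: genetic_contribution(n, p, gamma_hi) / genetic_contribution(n, p, gamma_lo)
            for n in n_values}


def snp_growth(p, gamma_values, n_lo=1, n_hi=10):
    """Factor by which F/R grows when the SNP count goes from n_lo to n_hi."""
    return {g: genetic_contribution(n_hi, p, g) / genetic_contribution(n_lo, p, g)
            for g in gamma_values}


if __name__ == "__main__":
    for n, factor in gamma_growth(0.1, (1, 2, 10)).items():
        print(f"{n} SNP(s), gamma 1 -> 10: x{factor:.4g}")
    for g, factor in snp_growth(0.1, (2.0, 3.0, 5.0, 10.0)).items():
        print(f"gamma={g:g}, SNPs 1 -> 10: x{factor:.4g}")
```

Since going from 1 to 10 SNPs multiplies F/R by (1 + p(γ - 1))^9, the SNP-count factors work out to 1.1^9 ≈ 2.36, 1.2^9 ≈ 5.16, 1.4^9 ≈ 20.66, and 1.9^9 ≈ 322.7.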
Conclusions
As another example, say we had two SNPs, each with p=0.1 and a relative risk of 10, which gives an F/R of 3.61. A single SNP with the same p would need a relative risk of about 27 to reach the same F/R value (even γ=25 gives only 3.4)!
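The relative risk a single SNP needs to reach a target F/R follows from inverting F/R = pγ + (1 - p), which gives γ = 1 + (F/R - 1)/p. A small Python check, with the hypothetical helper name `required_gamma`, for the two-SNP example above (p = 0.1, γ = 10, so F/R = 1.9^2 = 3.61):

```python
def required_gamma(target_fr: float, p: float) -> float:
    """Relative risk a single SNP with frequency p needs to reach F/R = target_fr.

    Solves p*gamma + (1 - p) = target_fr for gamma.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return 1.0 + (target_fr - 1.0) / p


if __name__ == "__main__":
    two_snp_fr = (0.1 * 10.0 + 0.9) ** 2  # two SNPs, p = 0.1, gamma = 10
    print(round(required_gamma(two_snp_fr, 0.1), 1))  # 27.1
```

This agrees with the one-SNP table, where γ = 25 at p = 0.1 reaches only 3.4.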
It appears that, most of the time, going from p=0.1 to p=0.5 with all else equal results in greater F/R gains than going from γ=1 to γ=5 with all else equal. However, for low γ values, increasing p does not result in a large increase in F/R.
Future Work
Scalability: currently, the R program cannot create the large binary matrices needed for more than 20 SNPs because of memory limits (the matrix has 2^n rows).
Allow different values of p and γ for each SNP (for example, by reading in vectors of p and γ values).
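Both future-work items could be approached with the factorized form of the n-SNP sum, F/R = ∏(1 + p_i(γ_i - 1)), which holds under the same independence assumption. The Python sketch below is our own suggestion, not the original program: it reads per-SNP vectors of p and γ (function name hypothetical) and runs in O(n) time and memory. `math.prod` requires Python 3.8 or later.

```python
import math
from typing import Sequence


def genetic_contribution_vec(p: Sequence[float], gamma: Sequence[float]) -> float:
    """F/R for independent SNPs, each with its own p[i] and gamma[i].

    Uses prod_i (1 + p[i]*(gamma[i] - 1)), which equals the 2**n-term sum
    over binary genotype rows but needs only O(n) time and memory.
    """
    if len(p) != len(gamma):
        raise ValueError("p and gamma must have the same length")
    return math.prod(1.0 + pi * (gi - 1.0) for pi, gi in zip(p, gamma))


if __name__ == "__main__":
    # Hypothetical per-SNP inputs: two SNPs with different p and gamma
    print(genetic_contribution_vec([0.1, 0.3], [2.0, 5.0]))
    # 100 SNPs sharing p = 0.1 and gamma = 2, far beyond the enumeration limit
    print(genetic_contribution_vec([0.1] * 100, [2.0] * 100))
```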