Linkage Analysis - The Blavatnik School of Computer Science

Genetic linkage analysis
Dotan Schreiber
According to a series of presentations by M. Fishelson
Outline
• Introduction.
• Basic concepts and some background.
• Motivation for linkage analysis.
• Linkage analysis: main approaches.
• Latest developments.
“Genetic linkage analysis is a statistical
method that is used to associate
functionality of genes to their location on
chromosomes.”
http://bioinfo.cs.technion.ac.il/superlink/
The Main Idea/Usage:
Neighboring genes on the chromosome tend to stick together when they are passed on to offspring.
Therefore, if some disease is often passed to offspring along with specific marker genes, it can be concluded that the gene(s) responsible for the disease are located on the chromosome close to these markers.
Basic Concepts
• Locus
• Allele
• Genotype
• Phenotype
Dominant vs. Recessive Allele
Classic example: eye color.
heterozygote
homozygote
X-Linked (Sex-Linked) Allele
Most human cells contain 46 chromosomes:
• 2 sex chromosomes (X,Y):
XY – in males.
XX – in females.
• 22 pairs of chromosomes named autosomes.
Around 1000 human genes are found only on the
X chromosome.
“…the Y chromosome essentially is reproduced via
cloning from one generation to the next. This
prevents mutant Y chromosome genes from being
eliminated from male genetic lines. Subsequently,
most of the human Y chromosome now contains
genetic junk rather than genes.”
http://anthro.palomar.edu/biobasis/bio_3b.htm
Medical Perspective
When studying rare disorders, 4 general patterns
of inheritance are observed:
• Autosomal recessive (e.g., cystic fibrosis).
– Appears in both male and female children of unaffected parents.
• Autosomal dominant (e.g., Huntington disease).
– Affected males and females appear in each generation of the
pedigree.
– Affected parent transmits the phenotype to both male and female
children.
..Continued
• X-linked recessive (e.g., hemophilia).
– Many more males than females show the disorder.
– All daughters of an affected male are “carriers”.
– None of the sons of an affected male show the disorder or are
carriers.
• X-linked dominant.
– Affected males pass the disorder to all daughters but to none of
their sons.
– Affected heterozygous females married to unaffected males pass
the condition to half their sons and daughters.
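The X-linked rules above can be checked by enumerating gametes. This is a minimal sketch, assuming a single X-linked locus and four equally likely gamete combinations per cross; the chromosome labels "Xh" (recessive disease allele) and "XD" (dominant disease allele) and the helper names are hypothetical.

```python
from itertools import product

def children(father, mother):
    """Enumerate the four equally likely (sex, genotype) outcomes of one cross.

    Each parent is a pair of sex chromosomes; the child receives one from each.
    A child who receives the father's Y chromosome is a son.
    """
    outcomes = []
    for from_father, from_mother in product(father, mother):
        sex = "son" if from_father == "Y" else "daughter"
        outcomes.append((sex, (from_father, from_mother)))
    return outcomes

def share_with(outcomes, sex, allele):
    """Fraction of the children of the given sex who inherit `allele`."""
    kids = [g for s, g in outcomes if s == sex]
    return sum(allele in g for g in kids) / len(kids)

# X-linked recessive: affected father (Xh Y) x non-carrier mother (X X).
recessive = children(father=("Xh", "Y"), mother=("X", "X"))
print(share_with(recessive, "daughter", "Xh"))  # every daughter is a carrier
print(share_with(recessive, "son", "Xh"))       # no son inherits Xh

# X-linked dominant: heterozygous affected mother (XD X) x unaffected father (X Y).
dominant = children(father=("X", "Y"), mother=("XD", "X"))
print(share_with(dominant, "son", "XD"), share_with(dominant, "daughter", "XD"))
```

Calling `children(("XD", "Y"), ("X", "X"))` in the same way reproduces the first X-linked dominant rule: every daughter and no son receives XD.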
Example
[Pedigree figure: individuals 1 to 10]
– After the disease is introduced into the family in generation #2, it appears in every generation ⇒ dominant!
– Fathers do not transmit the phenotype to their sons ⇒ X-linked!
Crossing Over
Sometimes in meiosis, homologous chromosomes exchange parts in
a process called crossing-over, or recombination.
Recombination Fraction
The probability θ for a recombination between two genes is a monotone, nonlinear function of the physical distance between their loci on the chromosome.
$$(\text{linkage})\quad 0 \le \theta = P(\text{recombination}) \le 0.5 \quad (\text{no linkage})$$
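The slide does not name the function. One standard choice, used here only as an illustration, is Haldane's map function θ = ½(1 − e^(−2d)). It assumes no interference and measures d in Morgans of genetic (map) distance rather than base pairs. The function names below are hypothetical.

```python
import math

def haldane_theta(d_morgans):
    """Recombination fraction for a map distance d (in Morgans) under
    Haldane's map function, which assumes no crossover interference."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))

def haldane_distance(theta):
    """Inverse map: distance in Morgans for 0 <= theta < 0.5."""
    return -0.5 * math.log(1.0 - 2.0 * theta)

# theta grows almost linearly for small distances (1 cM gives about 1%)
# and levels off toward 0.5 (no linkage) for distant loci.
for cm in (1, 10, 50, 100, 300):
    print(f"{cm:>4} cM -> theta = {haldane_theta(cm / 100):.4f}")
```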
Linkage
The further apart two genes on the same chromosome are, the more likely it is that a recombination between them will occur.
Two genes are called linked if the recombination fraction between them is small (well below 50%).
Linkage related Concepts
• Interference - A crossover in one region usually decreases
the probability of a crossover in an adjacent region.
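• Centimorgan (cM) - 1 cM is the distance between genes for which the recombination frequency is 1%.
• LOD score - the base-10 logarithm of the odds that two loci are linked at recombination fraction θ rather than unlinked (θ = 0.5); it is used to test for linkage and to estimate the distance between genes.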
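For the simplest case, the LOD score can be sketched in a few lines. This assumes phase-known meioses in which R of N informative meioses are recombinant, so that LOD(θ) = log10[θ^R (1 − θ)^(N−R) / 0.5^N]. It is maximized at θ = R/N, and a LOD of 3 or more is the conventional threshold for declaring linkage. The data and names below are hypothetical.

```python
import math

def lod(theta, recombinants, meioses):
    """LOD score for phase-known meioses.

    log10 of L(theta) / L(0.5), where L(theta) = theta^R * (1 - theta)^(N - R)
    for R recombinants among N informative meioses. Requires 0 < theta <= 0.5
    whenever R > 0.
    """
    nonrecombinants = meioses - recombinants
    log_l = 0.0
    if recombinants:
        log_l += recombinants * math.log10(theta)
    if nonrecombinants:
        log_l += nonrecombinants * math.log10(1.0 - theta)
    return log_l - meioses * math.log10(0.5)

# Hypothetical data: 2 recombinants among 20 informative meioses.
R, N = 2, 20
grid = [k / 1000 for k in range(1, 501)]           # theta in (0, 0.5]
theta_hat = max(grid, key=lambda t: lod(t, R, N))  # maximum-likelihood estimate
print(theta_hat, lod(theta_hat, R, N))
```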
Ultimate Goal: Linkage Mapping
With the following few minor problems:
– It’s impossible to make controlled crosses in
humans.
– Human families have rather few offspring.
– The human genome is immense. The
distances between genes are large on
average.
Possible Solutions
• Make general assumptions:
Hardy-Weinberg Equilibrium – assumes that an individual's genotype probabilities are determined by the population allele frequencies (for two alleles with frequencies p and q: p², 2pq and q²).
Linkage Equilibrium – assumes that alleles at different loci are independent of each other.
• Incorporate those assumptions into
possible solutions:
Elston-Stewart method.
Lander-Green method.
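Both assumptions turn into simple products of allele frequencies. The sketch below shows this. The allele names, frequencies and function names are hypothetical.

```python
from itertools import combinations_with_replacement

def hw_genotype_probs(allele_freqs):
    """Hardy-Weinberg: unordered genotype probabilities from allele frequencies,
    p_i^2 for a homozygote and 2 * p_i * p_j for a heterozygote."""
    probs = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        pa, pb = allele_freqs[a], allele_freqs[b]
        probs[(a, b)] = pa * pb if a == b else 2 * pa * pb
    return probs

def haplotype_prob(alleles, freqs_per_locus):
    """Linkage equilibrium: alleles at different loci are independent, so a
    haplotype's probability is the product of its per-locus allele frequencies."""
    prob = 1.0
    for allele, freqs in zip(alleles, freqs_per_locus):
        prob *= freqs[allele]
    return prob

# Hypothetical allele frequencies at two loci.
LOCUS_1 = {"A": 0.7, "a": 0.3}
LOCUS_2 = {"B": 0.4, "b": 0.6}
print(hw_genotype_probs(LOCUS_1))
print(haplotype_prob(("A", "B"), [LOCUS_1, LOCUS_2]))
```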
Elston-Stewart method
• Input: A simple pedigree + phenotype
information about some of the people. These
people are called typed.
[Figure: example of a simple pedigree, marking a founder and a leaf]
• Simple pedigree – no cycles, single pair of
founders.
..Continued
• Output: the probability of the observed data, given some probability model for the transmission of alleles. The model is composed of:
– founder probabilities: Hardy-Weinberg equilibrium;
– penetrance probabilities: the probability of the phenotype, given the genotype;
– transmission probabilities: the probability of a child having a certain genotype, given the parents' genotypes.
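Put together, these three ingredients give the usual pedigree likelihood. The notation here is not taken from the slides and is an assumption: n people, with genotype $g_i$ and observed phenotype $x_i$ for person i, and $P(x_i \mid g_i) = 1$ for an untyped person.

$$P(x_1,\ldots,x_n) = \sum_{g_1}\cdots\sum_{g_n}\ \prod_{\text{founders } f} P(g_f)\ \prod_{\text{non-founders } c} P\big(g_c \mid g_{\mathrm{mother}(c)}, g_{\mathrm{father}(c)}\big)\ \prod_{i=1}^{n} P(x_i \mid g_i)$$

Summed naively, this has exponentially many terms in n. The Elston-Stewart ordering rearranges the sums so that fewer terms are needed.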
..Continued
• Bottom-up: first sum the conditional probabilities over all possible genotypes of the children, and only then over the possible genotypes of the parents.
• Linear in the number of people.
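This is a minimal sketch of the bottom-up order for the smallest case: one nuclear family at a single biallelic locus, checked against brute-force enumeration. The allele names D/d, the allele frequency, the penetrance table and all function names are hypothetical. A full Elston-Stewart implementation peels a larger simple pedigree one nuclear family at a time, which this sketch does not attempt.

```python
from itertools import product

GENOTYPES = [("D", "D"), ("D", "d"), ("d", "d")]   # unordered genotypes, alleles sorted

def founder_prob(g, p):
    """Hardy-Weinberg founder probability; p is the frequency of allele D."""
    freq = {"D": p, "d": 1.0 - p}
    a, b = g
    return freq[a] * freq[b] * (1 if a == b else 2)

def transmission(child, mother, father):
    """Mendelian P(child genotype | parents): each parent passes either allele w.p. 1/2."""
    return sum(0.25 for m in mother for f in father
               if tuple(sorted((m, f))) == child)

def penetrance(phenotype, g, pen):
    """P(phenotype | genotype); an untyped person (None) contributes 1."""
    if phenotype is None:
        return 1.0
    return pen[g] if phenotype == "affected" else 1.0 - pen[g]

def peel_nuclear(mother_ph, father_ph, children_ph, p, pen):
    """Bottom-up: for each pair of parental genotypes, sum over every child's
    genotype first, then sum over the parents' genotypes."""
    total = 0.0
    for gm, gf in product(GENOTYPES, repeat=2):
        term = (founder_prob(gm, p) * penetrance(mother_ph, gm, pen)
                * founder_prob(gf, p) * penetrance(father_ph, gf, pen))
        for ph in children_ph:   # one small sum per child: linear in family size
            term *= sum(transmission(gc, gm, gf) * penetrance(ph, gc, pen)
                        for gc in GENOTYPES)
        total += term
    return total

def brute_force(mother_ph, father_ph, children_ph, p, pen):
    """Same likelihood by enumerating all joint genotypes (exponential in family size)."""
    total = 0.0
    for gm, gf, *gcs in product(GENOTYPES, repeat=2 + len(children_ph)):
        term = (founder_prob(gm, p) * penetrance(mother_ph, gm, pen)
                * founder_prob(gf, p) * penetrance(father_ph, gf, pen))
        for gc, ph in zip(gcs, children_ph):
            term *= transmission(gc, gm, gf) * penetrance(ph, gc, pen)
        total += term
    return total

# Hypothetical dominant-disease model: penetrances and allele frequency are made up.
PEN = {("D", "D"): 0.9, ("D", "d"): 0.9, ("d", "d"): 0.05}
KIDS = ["affected", "unaffected", None, "affected"]
print(peel_nuclear("affected", "unaffected", KIDS, 0.01, PEN))
print(brute_force("affected", "unaffected", KIDS, 0.01, PEN))
```

Both functions compute the same number. The peeled version does a constant amount of work per child for each of the 9 parental genotype pairs, while the brute-force loop grows as 3 to the power of the family size.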
Lander-Green method
• Computes the probability of the marker genotypes, given an inheritance vector:
$P(M_i \mid V_i)$ at locus i, where $M_i$ is the marker data at this locus (the evidence) and $V_i$ is a certain inheritance vector.
Main Idea
• Let $a = (a_1, \ldots, a_{2f})$ be a vector of alleles assigned to the founders of the pedigree ($f$ is the number of founders).
• We want a graph representation of the restrictions
imposed by the observed marker genotypes on the
vector a that can be assigned to the founder genes.
• The algorithm extracts only vectors a compatible with
the marker data.
• Pr[m|v] is obtained via a sum over all compatible vectors
a.
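Taken literally, the last bullet gives a brute-force algorithm. Enumerate every allele vector for the 2f founder genes, keep those compatible with the marker data, and add up their probabilities, treating founder genes as independent (the Hardy-Weinberg and linkage-equilibrium assumptions). This is a sketch only: its cost grows as (number of alleles)^(2f). The gene pairs in `CARRIED` are our reconstruction of the example in the following slides (an assumption). The allele frequencies and function name are hypothetical.

```python
from itertools import product

def prob_m_given_v_bruteforce(n_founder_genes, carried, freqs):
    """Pr[m | v] by summing over every allele vector a = (a_1, ..., a_2f).

    carried: (gene_i, gene_j, genotype) per typed person: the two founder genes
             the person carries under v, and the observed unordered genotype.
    freqs:   allele frequencies; founder genes are treated as independent.
    """
    alleles = sorted(freqs)
    total = 0.0
    for a in product(alleles, repeat=n_founder_genes):
        # a is compatible if every typed person's two genes match the genotype
        if all(sorted((a[i - 1], a[j - 1])) == sorted(geno) for i, j, geno in carried):
            prob = 1.0
            for allele in a:
                prob *= freqs[allele]
            total += prob
    return total

FREQS = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}   # hypothetical frequencies
CARRIED = [(1, 3, ("a", "b")), (3, 5, ("a", "b")), (4, 6, ("a", "b")),
           (4, 7, ("a", "c")), (6, 8, ("b", "d"))]
print(prob_m_given_v_bruteforce(8, CARRIED, FREQS))
```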
Example – marker data on a
pedigree
[Figure: pedigree with individuals 1, 2, 11 to 14 and 21 to 24; the typed individuals have marker genotypes a/b (four of them), a/c and b/d]
Example – Descent Graph
[Figure: descent graph drawn on the example pedigree (individuals 1, 2, 11 to 14 and 21 to 24)]
Descent Graph
[Figure: descent graph for the example, with observed genotypes (a,b), (a,c) and (b,d)]
1. Assume that paternally inherited genes are on the left.
2. Assume that non-founders are placed in increasing order.
3. A ‘1’ (‘0’) is used to denote a paternally (maternally) originated gene.
⇒ The gene flow above corresponds to the inheritance vector: v = (1,1; 0,0; 1,1; 1,1; 1,1; 0,0)
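Following these conventions, an inheritance vector can be decoded into a descent graph by recording, for each person, which founder genes they carry. This is a sketch under assumptions of our own. The numbering scheme, in which the k-th founder gets genes 2k+1 and 2k+2, the small pedigree, and the function name are hypothetical. The pedigree is not the slide's example. Processing non-founders in increasing order works only if every parent is numbered before its children.

```python
def descent_graph(founders, nonfounders, v):
    """Return {person: (paternal founder gene, maternal founder gene)}.

    founders:    founder IDs; the k-th founder (0-based) gets genes 2k+1 and 2k+2,
                 with the paternally inherited gene on the left
    nonfounders: (child, father, mother) triples, sorted so parents come first
    v:           inheritance vector, one (bit from father, bit from mother) pair per
                 non-founder; 1 = the parent's paternal gene, 0 = its maternal gene
    """
    genes = {}
    for k, person in enumerate(founders):
        genes[person] = (2 * k + 1, 2 * k + 2)
    for (child, father, mother), (bit_f, bit_m) in zip(nonfounders, v):
        from_father = genes[father][0] if bit_f == 1 else genes[father][1]
        from_mother = genes[mother][0] if bit_m == 1 else genes[mother][1]
        genes[child] = (from_father, from_mother)
    return genes

# Hypothetical pedigree: founders 1 (father) and 2 (mother) have children 4 and 5;
# founder 3 (father) and 4 (mother) have child 6.
pedigree = [(4, 1, 2), (5, 1, 2), (6, 3, 4)]
print(descent_graph([1, 2, 3], pedigree, [(1, 1), (0, 0), (1, 0)]))
```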
Example – Founder Graph
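[Figure: the descent graph and the founder graph derived from it; the vertices 1 to 8 of the founder graph are the founder genes, and each edge, labeled (a,b), (a,c) or (b,d), joins the two founder genes carried by a typed person]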
Find compatible allelic assignments for non-singleton components
1. Identify the set of compatible alleles for each vertex: the intersection of the genotypes on the edges that touch it. For example:
{a,b} ∩ {a,b} = {a,b}
{a,b} ∩ {b,d} = {b}
[Figure: founder graph annotated with these intersections]
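Step 1 is a per-vertex set intersection. The edge list below is our reconstruction of the example's founder graph from the figure and should be treated as an assumption. The function name is hypothetical.

```python
from collections import defaultdict

def compatible_alleles(carried):
    """For each founder gene (vertex) on at least one edge, intersect the
    genotypes of all edges touching it. Genes on no edge are singletons and
    are left out, since any allele is compatible with them."""
    touching = defaultdict(list)
    for i, j, genotype in carried:
        touching[i].append(set(genotype))
        touching[j].append(set(genotype))
    return {gene: set.intersection(*sets) for gene, sets in touching.items()}

# (founder gene, founder gene, genotype of the typed person carrying both)
CARRIED = [(1, 3, ("a", "b")), (3, 5, ("a", "b")), (4, 6, ("a", "b")),
           (4, 7, ("a", "c")), (6, 8, ("b", "d"))]
for gene, alleles in sorted(compatible_alleles(CARRIED).items()):
    print(gene, sorted(alleles))
```

These sets are necessary but not sufficient. For example, 7 still allows a here, and only propagating along the edges rules it out.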
Possible Allelic Assignments
[Figure: founder graph with the compatible allele set of each vertex: {a,b} for 1, 3 and 5; {a,b,c,d} for the singleton 2; {a} for 4; {b} for 6; {a,c} for 7; {b,d} for 8]
Graph Component    Allelic Assignments
(2)                (a), (b), (c), (d)
(1,3,5)            (a,b,a), (b,a,b)
(4,6,7,8)          (a,b,c,d)
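The table can be reproduced by propagation. Fixing one vertex of a component to one allele of an adjacent genotype forces the allele at the other end of every edge, so each non-singleton component has at most two candidate assignments. This is a sketch, with the same reconstructed edge list (an assumption) and hypothetical function names. Singletons such as gene 2 never appear on an edge, so they are skipped.

```python
from collections import defaultdict, deque

def build_founder_graph(carried):
    """Adjacency list: founder gene -> [(other founder gene, genotype), ...]."""
    adj = defaultdict(list)
    for i, j, genotype in carried:
        adj[i].append((j, tuple(genotype)))
        adj[j].append((i, tuple(genotype)))
    return adj

def propagate(adj, start, allele):
    """Fix `start` to `allele`; each edge then forces the allele at its other end.
    Returns the forced assignment, or None if some edge is violated."""
    assignment = {start: allele}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w, (x, y) in adj[u]:
            if assignment[u] not in (x, y):
                return None
            forced = y if assignment[u] == x else x   # the other allele of the genotype
            if w not in assignment:
                assignment[w] = forced
                queue.append(w)
            elif assignment[w] != forced:
                return None
    return assignment

def component_assignments(carried):
    """List each non-singleton component with its compatible allelic assignments."""
    adj = build_founder_graph(carried)
    seen, result = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        # The start gene must carry one of the alleles on its first edge,
        # so there are at most two candidate assignments per component.
        candidates = [propagate(adj, start, a) for a in sorted(set(adj[start][0][1]))]
        # Collect the component's vertices with a plain graph search.
        component, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u not in component:
                component.add(u)
                stack.extend(w for w, _ in adj[u])
        seen |= component
        order = tuple(sorted(component))
        result.append((order, [tuple(c[v] for v in order) for c in candidates if c is not None]))
    return result

CARRIED = [(1, 3, ("a", "b")), (3, 5, ("a", "b")), (4, 6, ("a", "b")),
           (4, 7, ("a", "c")), (6, 8, ("b", "d"))]
for component, assignments in component_assignments(CARRIED):
    print(component, assignments)
```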
Computing P(m|v)
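• If for some component there are no possible allelic assignments, then P(m|v) = 0.
• The probability of singleton components is 1 ⇒ we can ignore them.
• Let $A_i$ be the set of compatible allelic assignments of component $C_i$, and let $a_{h_i} \in A_i$ be one such vector of alleles assigned to the vertices of $C_i$. Then:
$$\Pr[a_{h_i}] = \prod_{\{j:\ j \in C_i\}} \Pr[a_j]$$
$$\Pr[C_i] = \sum_{\{h:\ a_{h_i} \in A_i\}} \Pr[a_{h_i}] \qquad \text{(2 terms at most)}$$
$$\Pr[m \mid v] = \prod_{i} \Pr[C_i] \qquad \text{(the components together cover all } 2f \text{ founder genes)}$$
Linear in the number of founders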
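Applied to the table above, the three formulas take only a few lines. In this sketch, each component is given directly by its list of compatible assignments, copied from the table. The allele frequencies and function names are hypothetical. `math.prod` requires Python 3.8 or later.

```python
import math

def prob_component(assignments, freqs):
    """Pr[C_i]: sum over the component's compatible assignments (at most two for
    a non-singleton component) of the product of the founder allele frequencies."""
    return sum(math.prod(freqs[x] for x in assignment) for assignment in assignments)

def prob_markers_given_v(components, freqs):
    """Pr[m | v] as the product of Pr[C_i]. A component with no compatible
    assignment contributes 0; singletons (probability 1) can be left out."""
    result = 1.0
    for assignments in components:
        result *= prob_component(assignments, freqs)
    return result

FREQS = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}   # hypothetical allele frequencies
COMPONENTS = [
    [("a", "b", "a"), ("b", "a", "b")],   # component (1,3,5)
    [("a", "b", "c", "d")],               # component (4,6,7,8)
]
print(prob_markers_given_v(COMPONENTS, FREQS))
```

The singleton (2) is omitted because its four one-allele assignments sum to the total allele frequency, 1.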
Latest News: SuperLink
• Combines the covered approaches in one
unified program.
• Has other built-in abilities that increase its computational efficiency.
• Claimed by its own makers to be more capable and faster than other related programs.
• http://bioinfo.cs.technion.ac.il/superlink/
The End