Fast Elimination

Download Report

Transcript Fast Elimination

,
Linear-Time Reconstruction of Zero-Recombinant
Mendelian Inheritance on Pedigrees without
Mating Loops
Authors:
Lan Liu, Tao Jiang
Univ. California, Riverside USA
Outline




Introduction and problem definition
The linear system for ZRHC
A linear-time algorithm for Loop-free ZRHC
Conclusion
Pedigree

An example: British
Royal Family
Elizabeth II of
the United Kingdom
Diana,
Prince Charles,
Camilla,
Princess of Wales Prince of Wales Duchess of Cornwall
Prince William
of Wales
Prince Henry of
Wales
Prince Philip,
Duke of Edinburgh
Captain
Commander
Princess Anne,
Mark Phillips Princess Royal Timothy Laurence
Peter Phillips
Zara Phillips
Sarah
Prince Andrew,
Duke of York Margaret Ferguson
Princess
Beatrice of York
Princess
Eugenie of York
Prince Edward,
Earl of Wessex
Sophie
Rhys-Jones
Lady Louise
Windsor
Biological Background

Basic concepts
Genotype
Haplotype
2
2 Locus
2
1
1
2
1
1
1
2
paternal

Mendelian Law: one haplotype
comes from the father and the other comes
from the mother.
maternal
11 22: homozygous
12:
heterozgyous
1|2: ps-value 0
2|1 : ps-value 1
Example: Mendelian experiment
Notations and Recombinant
1
1
2
2
2
2
2
2
Genotype
1
2
2
2
2
1
2
2
Haplotype
Configuration
1
1
1
1
2
2
2
2
2
2
2
2
Father
2
2
2
2
Mother
1
1
1
1
2
2
2
2
Child
0 recombinant
1
1
1
1
2
2
2
2
2
2
2
2
Father
2
2
2
2
Mother
:
recombinant
1
1
2
2
2
2
2
2
Child
1 recombinant
Haplotype Configuration Reconstruction

Haplotypes: useful, but expensive to obtain
Genotypes: not so informative, but cheaper to obtain


In biological application, genotypes instead of haplotypes
are collected.
How to reconstruct haplotype from genotype?
recombination-free assumption
1
1
2
2
1
1
1
1
2
2
2
2
1
1
2
2
1
1
1
1
2
2
(a)
2
2
1
2
2
1
1
1
1
1
(b)
2
2
2
2
The Loop-free ZRHC problem

Problem definition
Given a loop-free pedigree and the genotype
information for each member, find a recombinationfree haplotype configuration for each member that
obeys the Mendelian law of inheritance.
Solutions to the ZRHC problem


A particular solution: any numerical assignment
A general solution: the span of a basis in the
solution space to its associated homogeneous system,
offset from the origin by a vector, namely by any
particular solution.
An Example
1
1
2
2
1
1
1
1

0 1 2
0 1 2
2
2
1
1
2
2
Input genotype

x+z+w
x
y
y+z+w
x+z
y+z

1
1
A general solution
0
1
2 0
2 0
2 0
2 0
0: 1 | 2
1: 2 | 1
A general solution
1
2
2
1
0
1
2
1
1
2
2
1
1
2
1
0
x=0
y=1
z=0
w=1
Previous Work and Our Progress
ZRHC
Loop-fee
ZRHC
Li and Jiang introduced a system of linear equations
over F[2] and presented an O(m3n3) time algorithm for
ZRHC [LJ03]
Xiao et al. present a much faster algorithm for ZRHC
with running time O(mn2+n3 log2n log log n) to generate
a general solution and O(mn+n3log2n log log n) to
produce a particular solution. [XLX+07]
Xiao et al. ’ s algorithm has running time O(mn2+n3) to
produce a general solution and O(mn+n3) to generate a
particular solution. [XLX+07]
Chan et al. proposed a linear-time (i.e. O(mn) time)
algorithm to find a particular solution. [CCC+06]
We present a novel algorithm with running time O(mn2)
to produce a general solution and O(mn) to generate a
particular solution.
In pedigree
 m : #loci
 n: #members
Related work



Methods based on fast matrix multiplication algorithms
could achieve an asymptotic speed of O(k2.376) on k
equations with k unknowns
The Lanczos and conjugate gradient algorithms are
only heuristics [GV96].
The Wiedeman algorithm has expected quadratic
running time [W86]
Outline




Introduction and problem definition
The linear system for ZRHC
A linear-time algorithm for Loop-free ZRHC
Conclusion
The New Linear System

n, m


m : #loci
n: #members in pedigree
Unknowns

: the paternal haplotype vector of a member j.
: the scalar demonstrating inheritance info between a
parent j1 and a child j.

The New Linear System
j2
j1
0
1
0
0
1
1
0
1
0
0
0
0
j
0
0
0
1
1
1
0
1
pj1,2=1
pj1,3=0
0
1
1
1
j2
j1
Pj1,1
pj1,2
pj1,3
pj1,4
Pj1
hj1,j
Pj1,1 +1
pj1,2 +0
pj1,3 +0
pj1,4 +1
Pj1 +wj1
Pj2,1
pj2,2
pj2,3
pj2,4
Pj2
hj2,j
j
Pj,1
pj,2
pj,3
pj,4
Pj
Pj2,1 +0
pj2,2 +1
pj2,3 +1
pj2,4 +1
Pj2 +wj2
Pj,1 +1
pj,2 +1
pj,3 +0
pj,4 +0
Pj +wj
The Linear System
O  mn
 O(mn) equations on O(mn) unknowns.
 Given a homozygous locus i on a
member j (with a child j1), pj[i] and pj1[i]
are pre-determined.
O  mn
Ax=b
Pedigree Graph

A pedigree with genotype
12
22
11
12
12
1
12
2
12
Pedigree graph G
2
1
12
11

12
12
4
12
11
7
6
22
12
4
7
6
12
8
22
9
12
22
8
22
9
12
#edges · 2n
Locus Graph
 Locus graph Gi
Gi = (V, Ei), where Ei= {(k,j)| k is a parent of j, wk[i]=1}
12
22
1
2
1
?
2
1
h1,4
12
4
12
6
11
7
1
1
4
0
6
h6,8
12
Zero-weight
22
8
h8,9
1
9
(a) Genotype info
0
h4,9
9
(b) Locus graph
Example: Locus graph for the 3rd locus
8
7
:
An Observation
 For any path in a locus graph connecting two pre-determined
vertices, the summation of h-variables along the path is a
constant.
We can use paths to denote constraints!
 (proof sketch)
Assume the path
connecting two pre-determined vertices j0 and jk .
Pj0[i]
…
dj1, j2
hj1, j2
dj0, j1
hj0, j1
Pj1[i]
Pj2[i]
in locus graph Gi
djk-1, jk
hjk-1, jk
Pjk-1[i]
Pjk[i]
Pj0[i]+ hj0, j1 = Pj1[i] + dj0, j1
Pj1[i]+ hj1, j2 = Pj2[i] + dj1, j2
Pj2[i]+ hj2, j2 = Pj3[i] + dj2, j3
…
Pjk-1[i]+ hjk-1, jk= Pjk[i] + djk-1, jk
a constant
Examples of Linear Constraints
?
1
2
1
1
1
4
0
6
h6,8
h8,9
1
9
(a) 1st locus graph
h6,8 + h8,9= 1
0
8
7
Linear Constraints
Obviously, the linear constraints are necessary. We
can also show that these constraints are sufficient.
 Moreover, we can upper bound #constraints in each
locus graph as O(n), while the trivial analysis gives an
upper bound O(n2).
 Total #constraints = O(mn).

O  mn
O(n)
transformation
O  mn
Ax=b
O(mn)
Ax=b
The linear constraints only
contain h-variables
Outline




Introduction and problem definition
The linear equations for ZRHC
A linear-time algorithm for ZRHC
Conclusion
The Loop-free ZRHC-PHASE algorithm
Algorithm Loop-free ZRHC_PHASE
Traditional method
input: a pedigree G=(V,E) and genotype {gj}
 Solve h-variables and p-
output: a general solution of {pj}
begin
Step 1. Preprocessing
Step 2. Linear constraint generation on h-variables
Step 3. Solve h-variables by redundant equation
elimination and a novel mapping method
Step 4. Solve the p-variables by propagation from predetermined p-variables to others.
end
variables together
 O(mn) equations on O(mn)
unknowns: O(mn) p-variables
and O(n) h-variables.
Our method
 Solve h-variables and pvariables separately
 O(mn) linear equations on O(n)
h-variables.
Redundant Equation Elimination

An observation
j0
Given a path P = j0,…,jk, assume that there are
constraints among each pair of vertices.
j1

j2
jk
…
jk-2
jk-1
j0 ~ j2
j2 ~ jk-1
j0 ~ jk-1

Key lemma
Originally, there are O(k2) constraints. Notice that
they are not independent.

However, we can replace the original constraints
by an equivalent set of constraints with size O(k).

Remove the redundant equations
without solving them!
Given a set S of constraints on a tree pedigree T, we can
reduce S to an equivalent constraint set of size at most n in time
O(mn).
O  mn
O(n)
transformation
O  mn
Ax=b
O  mn
Ax=b
redundancy
elimination
O(n )
O(n)
Ax=b
Solving h-variables
In order to obtain a linear-time algorithm, we want to
avoid the Gaussian elimination method.
 An observation
Given a constraint along a path j0 , j1,…, jk-1 , jk

j0
j1
…
jk-1
jk
h j0 , j1 +hj1 , j2 + …+ h jk-1, j =
b
k
We can solve the constraint in the following way:

Assign the h-variables on edges (j0 , j1), (j1, j2), …, (jk-2, jk-1) arbitrarily.
Assign the h-variables on the last edge (jk-1, jk) as a fixed value to
satisfy the constraint: hj , j = hj0 , j1 + …+ h jk-2, j k-1+ b.
k-1 k

Solving h-variables Based on the Mapping f


We have constructed the infective mapping f : S -> E ,
where S is the constraint set and E is the edge set.
We solve h-variables as follows:


For each h-variable corresponding to an edge e not in f (S),
assign an arbitrary value.
For each h-variable corresponding to an edge e in f (S),
assign a fixed value based on the constraint f –1(e), such
that the constraint is satisfied.
1
2
3
: not in f(E)
: in f(E)
constraints
1
0
1
0
4
5
(1,2)
(2,4)
(2,5)
Mapping
sum of
h-variables
1
0
0
edge
(2,3)
(3,4)
(4,5)
h-variables can be
solved by a single
BFS Traversal.
Conclusion

We present an efficient algorithm for Loop-fee ZRHC
with running time O(mn) to generate a particular
solution and O(mn2) to generate a general solution .
Thanks for your time
and attention!