Nonparametric estimation of phylogenetic tree
distributions
Ruriko Yoshida
Finding outlier gene trees via kernel
density estimation
• Here an outlier gene tree is a gene tree affected by events in genome
evolution such as gene duplication, lateral gene transfer between species,
retention of ancestral polymorphisms by balancing selection, or
accelerated evolution by neofunctionalization.
• Using the estimated density over tree space, we flag trees with small
probability as outliers.
• Choice of distances: path difference dP, quartet distance dQ, Robinson-Foulds distance (or splits distance) dS, and matching splits distance dM.
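As an illustration of one of the distances listed above, here is a minimal Python sketch of the Robinson-Foulds (splits) distance dS. It assumes each unrooted tree is given as a list of clades, one per internal edge; the helper names `splits` and `rf_distance` are hypothetical and not taken from the slides.

```python
def splits(clades, taxa):
    """Return the nontrivial splits (bipartitions) of an unrooted tree.

    `clades` lists, for each internal edge, the taxa on one side of it.
    Each split is stored as the side that does not contain a fixed
    reference taxon, so both sides of an edge map to the same key.
    """
    taxa = frozenset(taxa)
    ref = min(taxa)
    out = set()
    for clade in clades:
        side = frozenset(clade)
        if ref in side:
            side = taxa - side
        # Keep only splits with at least two taxa on each side.
        if 1 < len(side) < len(taxa) - 1:
            out.add(side)
    return out


def rf_distance(splits_a, splits_b):
    """Robinson-Foulds distance: number of splits found in exactly one tree."""
    return len(splits_a ^ splits_b)


taxa = "ABCDE"
t1 = splits([{"A", "B"}, {"D", "E"}], taxa)  # ((A,B),C,(D,E))
t2 = splits([{"A", "C"}, {"D", "E"}], taxa)  # ((A,C),B,(D,E))
print(rf_distance(t1, t2))
```

The two toy trees share the split DE | ABC and differ in the other one, so only the symmetric difference of the split sets contributes to the distance.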
Goals
• τ denotes the whole tree space on n taxa (either with or without branch
lengths)
• Given: tree estimates T = {t1, ..., tn} for n genes across the genome
• Problem: estimate the distribution f from which “most” trees in T were
sampled
• Identify outliers in the distribution, i.e., estimate the distribution f and a
subset Tout ⊆ T, assuming T \ Tout was sampled from f
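The goal above can be sketched in Python as follows. This is a simplified illustration, not the method from the talk: it assumes a precomputed matrix of tree distances, a Gaussian kernel with a single bandwidth `h`, a normalizing constant that is the same for every tree, and a hypothetical cutoff rule (`frac` times the median score).

```python
import math


def kde_scores(dist, h):
    """Leave-one-out Gaussian kernel density score for each tree.

    dist[i][j] is a tree distance (for example dP, dQ, dS or dM).
    The normalizing constant is dropped, i.e. assumed identical for
    every tree, which is a simplifying assumption of this sketch.
    """
    n = len(dist)
    scores = []
    for i in range(n):
        total = sum(math.exp(-(dist[i][j] / h) ** 2 / 2)
                    for j in range(n) if j != i)
        scores.append(total / (n - 1))
    return scores


def find_outliers(dist, h, frac=0.5):
    """Indices whose score is below `frac` times the median score
    (a hypothetical threshold chosen only for illustration)."""
    scores = kde_scores(dist, h)
    median = sorted(scores)[len(scores) // 2]
    return [i for i, s in enumerate(scores) if s < frac * median]


# Toy distance matrix: trees 0-3 are close together, tree 4 is far away.
dist = [
    [0, 1, 1, 2, 8],
    [1, 0, 2, 1, 8],
    [1, 2, 0, 1, 8],
    [2, 1, 1, 0, 8],
    [8, 8, 8, 8, 0],
]
print(find_outliers(dist, h=2.0))
```

Each score leaves out the tree itself, so a tree cannot raise its own density; the returned indices play the role of Tout.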
Kernel methods
• Regard trees as points in space, t ↦ φ(t) in R^D for some D
(possibly infinite)
• The kernel is denoted K(t1, t2), which is the inner product
<φ(t1), φ(t2)>
• Sometimes for statistics applications we assume that the
integral of K(t1, t2) over t2 equals 1. We won’t assume this
here
• In kernel methods we work with K and T, which implicitly
means linear computations with φ(T) in R^D
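To make the last point concrete, the Python sketch below computes a squared distance in feature space from kernel values alone, using ||φ(t1) - φ(t2)||^2 = K(t1, t1) - 2 K(t1, t2) + K(t2, t2). The feature map used here (leaf-to-leaf path lengths counted in edges) is an assumption for illustration and is not necessarily the vectorization used in the talk.

```python
def linear_kernel(u, v):
    """K(t1, t2) = <phi(t1), phi(t2)> for explicit feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def gram_matrix(features):
    """Matrix of kernel values K(t_i, t_j) for all pairs of trees."""
    return [[linear_kernel(u, v) for v in features] for u in features]


def feature_distance_sq(K, i, j):
    """||phi(t_i) - phi(t_j)||^2 computed from kernel values only."""
    return K[i][i] - 2 * K[i][j] + K[j][j]


# Assumed feature vectors: number of edges on the path between each
# pair of leaves (AB, AC, AD, BC, BD, CD) for the three 4-taxon trees.
phi = [
    [2, 3, 3, 3, 3, 2],  # ((A,B),(C,D))
    [3, 2, 3, 3, 2, 3],  # ((A,C),(B,D))
    [3, 3, 2, 2, 3, 3],  # ((A,D),(B,C))
]
K = gram_matrix(phi)
print(feature_distance_sq(K, 0, 1))
```

Once K is available, `feature_distance_sq` never touches `phi` again, which is what allows D to be very large or infinite.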
Vectorize a tree
Kernels
Greedy algorithm
Bandwidth
Partition function
Example
Variations
• Kernel
  • Uniform
  • Gaussian
  • Epanechnikov
  • …
• Bandwidth
  • Fixed: the same for every data point
  • Variable: adapted to the local pattern of the data
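The variations above can be sketched in Python as follows. The kernel profiles are normalized as on the real line, which need not carry over to tree space, and the variable bandwidth rule (distance to the k-th nearest neighbor) is one common choice assumed here for illustration; `knn_bandwidths` and `density` are hypothetical helper names.

```python
import math


def uniform(u):
    return 0.5 if abs(u) <= 1 else 0.0


def gaussian(u):
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)


def epanechnikov(u):
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0


def knn_bandwidths(dist, k):
    """Variable bandwidth: distance from each point to its k-th nearest neighbor."""
    return [sorted(row[:i] + row[i + 1:])[k - 1] for i, row in enumerate(dist)]


def density(dist, i, kernel, bandwidths):
    """Leave-one-out estimate at point i; point j contributes
    kernel(d_ij / h_j) / h_j with its own bandwidth h_j (must be > 0)."""
    n = len(dist)
    return sum(kernel(dist[i][j] / bandwidths[j]) / bandwidths[j]
               for j in range(n) if j != i) / (n - 1)


# Toy distance matrix: points 0-3 are close together, point 4 is far away.
dist = [
    [0, 1, 1, 2, 8],
    [1, 0, 2, 1, 8],
    [1, 2, 0, 1, 8],
    [2, 1, 1, 0, 8],
    [8, 8, 8, 8, 0],
]
fixed = [2.0] * len(dist)           # same bandwidth for every point
variable = knn_bandwidths(dist, 2)  # bandwidth adapted to local spacing
print(density(dist, 0, gaussian, fixed), density(dist, 0, gaussian, variable))
```

With a fixed bandwidth every point is smoothed equally, while the nearest-neighbor rule gives isolated points a wider kernel and tightly clustered points a narrower one.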
Fairy wren data set
• There are four species: Red-backed fairy wren (RBFW); White-winged fairy wren (WWFW); Splendid fairy wren (SFW); and
Variegated fairy wren (VFW).
• Each species has up to four alleles (1a, 1b, 2a, 2b; the number
indicates the individual, with alleles a and b). The complete
genes have 16 sequences: 4 species, 4 alleles per species.
• A total of 39 genes.
Results
Questions?
Thank you
for your attention!
Joint work with
P. Huggins, D. Haws and
G. Weyenberg