Genome variation informatics: SNP discovery, demographic


Computational Tools for Finding and
Interpreting Genetic Variations
Gabor T. Marth
Department of Biology, Boston College
[email protected]
http://clavius.bc.edu/~marthlab/MarthLab
Sequence variations (polymorphisms)
A reference sequence of the human
genome is available…
… but every individual is
unique, and is different
from others at millions of
nucleotide locations
genetic polymorphisms
Our research interests
1. How to find genetic polymorphisms?
2. How to use variation data to track our
pre-historic past?
3. How to utilize polymorphism data for
medical research?
Tools for polymorphism discovery
SNP discovery in clonal sequences
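$$
P(\mathrm{SNP}) = \sum_{\text{all variable } (S_1,\ldots,S_N)}
\frac{\dfrac{P(S_1 \mid R_1)}{P_{\mathrm{Prior}}(S_1)} \cdots \dfrac{P(S_N \mid R_N)}{P_{\mathrm{Prior}}(S_N)} \cdot P_{\mathrm{Prior}}(S_1,\ldots,S_N)}
{\displaystyle\sum_{S_{i_1} \in \{A,C,G,T\}} \cdots \sum_{S_{i_N} \in \{A,C,G,T\}} \dfrac{P(S_{i_1} \mid R_1)}{P_{\mathrm{Prior}}(S_{i_1})} \cdots \dfrac{P(S_{i_N} \mid R_N)}{P_{\mathrm{Prior}}(S_{i_N})} \cdot P_{\mathrm{Prior}}(S_{i_1},\ldots,S_{i_N})}
$$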
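The SNP probability formula on this slide can be illustrated with a brute-force sketch that enumerates every base configuration across N aligned reads. This is not the lab's implementation. The function names, the error model (a miscalled base is equally likely to be any of the other three), the uniform per-base prior of 0.25, and the way `joint_prior` spreads the polymorphism rate over variable configurations are all assumptions made for illustration.

```python
from itertools import product

BASES = "ACGT"

def base_posteriors(called_base, error_prob):
    """P(S | R) for one read: the called base gets 1 - e, the other three share e."""
    return {b: (1.0 - error_prob) if b == called_base else error_prob / 3.0
            for b in BASES}

def joint_prior(config, snp_rate):
    """Hypothetical joint prior P_Prior(S_1..S_N): mass (1 - snp_rate) is split
    over the 4 monomorphic configurations, snp_rate over all variable ones."""
    if len(set(config)) == 1:
        return (1.0 - snp_rate) / 4
    return snp_rate / (4 ** len(config) - 4)

def snp_probability(reads, snp_rate=0.001, base_prior=0.25):
    """reads: list of (called_base, error_prob) at one alignment column.
    Returns the normalized weight of all configurations that are not monomorphic."""
    posteriors = [base_posteriors(base, err) for base, err in reads]
    total = 0.0
    variable = 0.0
    for config in product(BASES, repeat=len(reads)):
        weight = joint_prior(config, snp_rate)
        for base, post in zip(config, posteriors):
            weight *= post[base] / base_prior
        total += weight          # denominator: sum over every configuration
        if len(set(config)) > 1:
            variable += weight   # numerator: variable configurations only
    return variable / total

agreeing = [("C", 0.001)] * 4
mixed = [("C", 0.001), ("C", 0.001), ("T", 0.001), ("T", 0.001)]
print(snp_probability(agreeing), snp_probability(mixed))
```

The enumeration grows as 4^N, so the sketch is only practical for a handful of reads; it is meant to make the numerator and denominator of the formula concrete, not to scale.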
Redevelopment and expansion
Automated detection of heterozygous positions in diploid
individual samples
[Figure panels: Homozygous C, Heterozygous C/T, Homozygous T]
(visit Aaron Quinlan’s poster)
Redevelopment and expansion
Discovery of short deletions/insertions (both bi-allelic
and micro-satellite repeats)
Redevelopment and expansion
• Improve the detection of very rare alleles by taking into account
recent results in population genetics (i.e. a priori, rare alleles are
more frequent than common alleles)
• Develop a rigorous statistical framework for both heterozygous
polymorphisms and INDELs
• Calculate the probability that a SNP found in one set of
samples will also be present in another
• Complete software rewrite
• Graphical User Interface (GUI)
• Ease of use for small laboratories without UNIX expertise
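The first bullet's point that rare alleles are a priori more frequent than common ones can be made concrete with a small sketch. It assumes the standard neutral, constant-size model, under which the expected number of polymorphic sites with derived-allele count i in a sample of n chromosomes is proportional to 1/i. The function name is hypothetical, and the prior actually planned for the rewritten software may differ.

```python
def neutral_allele_count_prior(n):
    """Return {i: P(derived allele count = i | site is polymorphic)} for
    i = 1..n-1, assuming the neutral expectation proportional to 1/i."""
    if n < 2:
        raise ValueError("need at least two chromosomes")
    weights = {i: 1.0 / i for i in range(1, n)}
    norm = sum(weights.values())
    return {i: w / norm for i, w in weights.items()}

prior = neutral_allele_count_prior(10)
for count in sorted(prior):
    print(count, round(prior[count], 3))
```

Under this assumption a singleton (count 1) is twice as likely as a doubleton (count 2), which is the kind of information a detection algorithm could use when weighing weak evidence for a rare allele.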
Genetic and epigenetic changes in cancer
We want to develop tools for detecting
inherited polymorphisms and somatic
mutations in a variety of new data types,
representing both genetic and epigenetic
changes
• nucleotide changes, short insertions / deletions
• copy number changes, chromosomal rearrangements
• changes in DNA methylation, histone modification
Human pre-history
Demographic history
European data: bottleneck
African data: modest but uninterrupted expansion
Tools for Medical Genetics
The polymorphism structure of individuals
follows strong patterns
http://pga.gs.washington.edu/
The international HapMap project
However, the variation
structure observed in the
reference DNA samples…
… often does not match the
structure in another set of
samples such as those used in a
clinical case-control association
study aimed at finding disease
genes and disease-causing
genetic variants
Tools to test sample-to-sample variability
Instead of genotyping additional
sets of (clinical) samples with
costly experimentation, and
comparing the variation structure
of these consecutive sets directly…
… we generate additional samples
with computational means, based
on our Population Genetic models
of demographic history. We then
use these samples to test the
efficacy of gene-mapping
approaches for clinical research.
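Generating samples by computational means can be sketched with a minimal coalescent simulation. The sketch below assumes the simplest possible setting: a constant-size population, no recombination, and infinite-sites mutations. The demographic models described in this talk (bottleneck, expansion) would change the distribution of coalescence times, and that is not modelled here. The function names and the `theta` parameter (the scaled mutation rate 4Nμ) are assumptions for illustration.

```python
import random

def poisson(rng, lam):
    """Draw a Poisson(lam) count by counting rate-1 arrivals before time lam."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed < lam:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count

def simulate_haplotypes(n, theta, rng):
    """Simulate n haplotypes (lists of 0/1 alleles) under the constant-size
    coalescent. Time is measured in units of 2N generations."""
    # Each active lineage is the set of sample indices that descend from it.
    lineages = [frozenset([i]) for i in range(n)]
    sites = []  # one entry per mutation: the samples carrying the derived allele
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time until the next coalescence among k lineages.
        t = rng.expovariate(k * (k - 1) / 2.0)
        # Mutations fall on each lineage at rate theta / 2 during this interval.
        for lineage in lineages:
            for _ in range(poisson(rng, theta / 2.0 * t)):
                sites.append(lineage)
        # Merge two lineages chosen uniformly at random.
        a, b = rng.sample(range(k), 2)
        merged = lineages[a] | lineages[b]
        lineages = [l for idx, l in enumerate(lineages) if idx not in (a, b)]
        lineages.append(merged)
    return [[1 if i in site else 0 for site in sites] for i in range(n)]

rng = random.Random(42)
for hap in simulate_haplotypes(6, 4.0, rng):
    print("".join(map(str, hap)))
```

Each printed row is one simulated chromosome and each column one segregating site. Replicate data sets produced this way could stand in for additional genotyped samples when evaluating a gene-mapping method.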
Tools to test sample-to-sample variability
[Figure: r2 (4-site composite #2) plotted against r2 (data) for the experimental sample and the computational sample; both axes run from 0 to 1]
(visit Dr. Eric Tsung's poster)
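The r2 statistic on the plot axes is a measure of linkage disequilibrium. As a point of reference, the standard pairwise r2 between two bi-allelic sites can be computed as below. This is a sketch of the textbook pairwise definition only; the "4-site composite" measure named on the slide is not specified in this transcript and is not reproduced here. The function name and the 0/1 haplotype encoding are assumptions.

```python
def r_squared(haplotypes, site_a, site_b):
    """Pairwise r^2 between two sites, given haplotypes as lists of 0/1 alleles.
    r^2 = D^2 / (p_a (1 - p_a) p_b (1 - p_b)), with D = p_ab - p_a p_b."""
    n = len(haplotypes)
    p_a = sum(h[site_a] for h in haplotypes) / n
    p_b = sum(h[site_b] for h in haplotypes) / n
    p_ab = sum(h[site_a] * h[site_b] for h in haplotypes) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic in the sample")
    d = p_ab - p_a * p_b
    return d * d / denom

linked = [[0, 0], [0, 0], [1, 1], [1, 1]]
independent = [[0, 0], [0, 1], [1, 0], [1, 1]]
print(r_squared(linked, 0, 1), r_squared(independent, 0, 1))
```

Computing the same statistic on an experimental sample and on a computationally generated one gives the pairs of values that a scatter plot like the one on this slide compares.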
Tools to connect genotype and clinical outcome
• genetic marker (haplotype) in genome regions of drug-metabolizing enzyme (DME) genes
• computational prediction based on haplotype structure
• functional allele (known metabolic polymorphism)
• clinical endpoint (adverse drug reaction)
• molecular phenotype (drug concentration measured in blood plasma)
The Computational Genetics Lab
http://clavius.bc.edu/~marthlab/MarthLab