here - BC Bioinformatics
The medical relevance of genome variability
Gabor T. Marth, D.Sc.
Department of Biology, Boston College
Lecture overview
1. Phenotypic effects caused by known genetic variants
2. Genetic mapping to find genetic variants that cause diseases – linkage analysis and association studies
3. Genome-wide association mapping resources – the HapMap
4. Structural and epigenetic variations in disease
1. Phenotypic effects caused by known genetic variants
Many SNPs do have phenotypic effects
some notable genetic diseases:
cystic fibrosis
sickle-cell anemia
Badano and Katsanis, NRG 2002
Genetic variants in Pharmacogenetics
Evans and Relling, Science 1999
Using genotype information in the drug development pipeline
Roses, NRG 2004
Are all genetic variants functional?
~10 million known SNPs
SNPs, on the scale of the genome, are described well by the “neutral theory” of sequence variation: the vast majority of SNPs are likely to have no functional effect
[Figure: plot panels labeled 4 kb, 8 kb, 12 kb and 16 kb; plotted data not recoverable]
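Under the standard neutral model, the neutral theory mentioned above makes a concrete prediction: in a sample of n chromosomes, the expected number of segregating sites whose derived allele is carried by i chromosomes is θ/i, so most neutral SNPs are rare. The sketch below (function names are hypothetical, not from the lecture) computes these expected proportions and Watterson's estimator θ_W = S / a_n, where a_n = Σ_{i=1}^{n-1} 1/i and S is the number of segregating sites.

```python
def harmonic_a(n: int) -> float:
    """a_n = sum_{i=1}^{n-1} 1/i for a sample of n chromosomes."""
    return sum(1.0 / i for i in range(1, n))


def neutral_sfs(n: int) -> dict[int, float]:
    """Expected fraction of segregating sites whose derived allele is carried by
    i of n chromosomes (i = 1..n-1), under the standard neutral model E[xi_i] = theta / i."""
    a_n = harmonic_a(n)
    return {i: (1.0 / i) / a_n for i in range(1, n)}


def watterson_theta(segregating_sites: int, n: int) -> float:
    """Watterson's estimate of theta (4 * Ne * mu for the region) from S segregating sites."""
    return segregating_sites / harmonic_a(n)


if __name__ == "__main__":
    sfs = neutral_sfs(10)
    print(f"expected share of singletons (n=10): {sfs[1]:.3f}")
    print(f"theta_W for S=50, n=10: {watterson_theta(50, 10):.2f}")
```

Singletons alone are expected to make up about a third of the segregating sites in a sample of 10 chromosomes, which is why a random SNP is far more likely to be a rare, neutral variant than a functional one.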
How do we find the few functional variants against the background of millions of non-functional SNPs?
2. Genetic mapping to find genetic variants that cause diseases – linkage analysis and association studies
Genetic mapping
Allelic association (linkage disequilibrium)
• allelic association is the nonrandom assortment of alleles at different sites, i.e. it measures how well knowledge of the allele state at one site permits prediction of the allele state at another
[Figure: marker site and functional site]
• significant allelic association between a marker and a functional site permits localization (mapping) of the functional site even without having that site in our collection
• allelic association, and the use of genetic markers, is the basis for mapping functional alleles
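The "prediction at another site" idea can be made quantitative with the usual pairwise LD statistics computed from two-site haplotype counts: D = p_AB - p_A p_B, its normalized form D' = D / D_max, and r² = D² / (p_A(1 - p_A) p_B(1 - p_B)). The sketch below assumes phased haplotype counts are already available; the function name is hypothetical. r² is the measure usually used to judge whether one SNP can stand in for another.

```python
def ld_stats(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic sites from phased haplotype counts.

    A/a are the alleles at the marker site, B/b the alleles at the second site.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n  # frequency of allele A at site 1
    p_B = (n_AB + n_aB) / n  # frequency of allele B at site 2
    d = p_AB - p_A * p_B  # deviation from independent assortment

    # D_max depends on the sign of D, so D' = D / D_max lies in [-1, 1]
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max if d_max > 0 else 0.0

    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0  # squared allelic correlation
    return d, d_prime, r2
```

For example, `ld_stats(40, 10, 0, 50)` gives D' = 1 (allele B never occurs with a) but r² ≈ 0.67, because allele A also sits on b haplotypes; a marker in that situation predicts the second site only imperfectly.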
Mendelian diseases have simple inheritance
[Figure panels: genotype inheritance; genotype + phenotype inheritance]
Linkage analysis compares the transmission of marker genotype and phenotype in families
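In its simplest two-point form, linkage analysis summarizes this comparison as a LOD score: the log10 likelihood ratio of the observed recombinant (R) and non-recombinant (NR) meioses under a recombination fraction θ < 0.5 versus free recombination (θ = 0.5). The sketch below assumes phase-known, fully informative meioses, which real pedigrees rarely provide; the function names and the grid search are illustrative. A maximum LOD of 3 or more is the conventional threshold for declaring linkage.

```python
import math


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """Two-point LOD score for phase-known meioses at recombination fraction theta."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    n = recombinants + nonrecombinants
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # any recombinant is impossible under complete linkage
    log_l = nonrecombinants * math.log10(1.0 - theta)
    if recombinants > 0:
        log_l += recombinants * math.log10(theta)
    log_l_null = n * math.log10(0.5)  # likelihood under no linkage (theta = 0.5)
    return log_l - log_l_null


def max_lod(recombinants: int, nonrecombinants: int, steps: int = 50) -> tuple[float, float]:
    """Scan theta over an even grid on [0, 0.5]; return (best_theta, max_LOD)."""
    grid = [0.5 * k / steps for k in range(steps + 1)]
    best_score, best_theta = max((lod_score(recombinants, nonrecombinants, t), t) for t in grid)
    return best_theta, best_score
```

For instance, 10 phase-known meioses with no recombinants give a LOD of 10·log10(2) ≈ 3.01 at θ = 0, just reaching the conventional threshold.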
Complex disease – complex inheritance
Badano and Katsanis, NRG 2002
Allele frequency and relative risk
Brinkman et al., Nature Reviews Genetics, advance online publication; published online 14 March 2006 | doi:10.1038/nrg1828
Association study strategies
• region(s) interrogated: single gene, list of candidate genes (“candidate gene study”), or entire genome (“genome scan”)
• direct (genotyping the causative variant itself) or indirect (genotyping a marker that is co-inherited with the causative variant)
• single-SNP marker or multi-SNP haplotype marker
• single-stage or multi-stage
Association study strategies
for economy, one cannot genotype every SNP in thousands of clinical samples: marker selection is the process by which a subset of all available SNPs is chosen
1. hypothesis-driven (i.e. based on gene function)
2. LD-driven – based entirely on the reduction of redundancy afforded by the linkage disequilibrium (LD) between SNPs; tag SNPs represent the other SNPs they are correlated with
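LD-driven selection is often done greedily: repeatedly pick the SNP whose r² with still-untagged SNPs reaches a threshold for the largest number of them, until every SNP is tagged. The sketch below is one such simple procedure, shown for illustration only; it is not necessarily the selection method behind the results in this lecture, and the function name and the 0.8 default threshold are assumptions. It takes a precomputed, symmetric pairwise r² matrix, e.g. estimated from reference haplotypes.

```python
def greedy_tag_snps(r2: list[list[float]], threshold: float = 0.8) -> tuple[list[int], dict[int, int]]:
    """Greedy LD-based tag SNP selection from a symmetric pairwise r^2 matrix.

    Returns (tags, assignment), where assignment[snp] is the tag chosen to represent it.
    Assumes r2[i][i] == 1.0, so every SNP can at least tag itself.
    """
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    n = len(r2)
    untagged = set(range(n))
    tags: list[int] = []
    assignment: dict[int, int] = {}
    while untagged:
        # Candidate that captures the most still-untagged SNPs; ties go to the lowest index
        best = max(
            range(n),
            key=lambda i: (sum(1 for j in untagged if r2[i][j] >= threshold), -i),
        )
        captured = {j for j in untagged if r2[best][j] >= threshold}
        tags.append(best)
        for j in captured:
            assignment[j] = best
        untagged -= captured
    return tags, assignment
```

Raising the threshold trades genotyping cost for coverage: stricter r² requirements generally need more tags to represent the same set of SNPs.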
Marker selection depends on genome LD
Daly et al. NG 2001
Case-control association testing
• genotyping cases and controls at various polymorphisms
• searching for markers with “significant” marker allele frequency differences between cases and controls; these markers signify regions of possible causative alleles
[Figure: clinical cases vs. clinical controls, comparing marker allele frequencies AF(cases) and AF(controls)]
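A common way to decide whether an allele frequency difference is “significant” is an allelic 2×2 chi-square test on allele counts in cases and controls, reported together with the allelic odds ratio. The sketch below is that textbook test with a 1-degree-of-freedom p-value (function name hypothetical). It treats the two alleles of each person as independent, which assumes Hardy-Weinberg proportions; genotype-based or trend tests are common alternatives.

```python
import math


def allelic_association_test(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int) -> dict[str, float]:
    """Allelic case-control test on a 2x2 table of allele counts.

    Rows: cases, controls. Columns: tested allele, other allele.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("each group and each allele needs a nonzero count")
    total = sum(row_totals)

    # Pearson chi-square statistic: sum of (observed - expected)^2 / expected
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))

    denom = case_ref * ctrl_alt
    odds_ratio = (case_alt * ctrl_ref) / denom if denom > 0 else math.inf
    return {
        "af_cases": case_alt / row_totals[0],
        "af_controls": ctrl_alt / row_totals[1],
        "chi2": chi2,
        "p_value": p_value,
        "odds_ratio": odds_ratio,
    }
```

For instance, 300 of 1,000 case alleles versus 200 of 1,000 control alleles (AF 0.30 vs. 0.20) gives χ² ≈ 26.7, p < 10⁻⁶ and an odds ratio of about 1.71.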
3. Genome-wide association mapping resources – the HapMap
The HapMap resource
• goal: to map out human allele and association structure at the kilobase scale
• deliverables: a set of physical and informational reagents
LD structure in four human populations
International HapMap Consortium, Nature 2005
LD varies across samples
there are large differences in LD between different human populations…
[Figure panels: European reference (CEU); African reference (YRI)]
… and even between samples from the same population.
[Figure panel: Other European samples]
Sample-to-sample LD differences make tag SNP selection problematic
groups of SNPs that are in LD in the HapMap reference samples may not be in LD in a future set of clinical samples…
… and tags that were selected based on LD in the HapMap may no longer work (i.e. represent the SNPs they were supposed to) in the clinical samples…
… possibly resulting in missed disease associations.
Marker selection with additional samples
test whether markers selected from the HapMap continue to “tag” other SNPs in their original LD group
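One simple way to run this test is to take each SNP's assigned HapMap tag and ask whether the pair still meets the r² threshold in the new samples. The sketch below reports the fraction of non-tag SNPs that remain tagged; the function name, the input layout (a SNP-to-tag assignment plus an r² matrix estimated in the new samples) and the 0.8 threshold are assumptions for illustration, not the lecture's exact procedure.

```python
def tag_transfer_rate(assignment: dict[int, int], r2_new: list[list[float]], threshold: float = 0.8) -> float:
    """Fraction of non-tag SNPs whose reference-selected tag still has r^2 >= threshold
    with them in a new sample.

    assignment maps each SNP index to the index of the tag chosen for it in the
    reference samples; r2_new is the pairwise r^2 matrix estimated in the new samples.
    """
    pairs = [(snp, tag) for snp, tag in assignment.items() if snp != tag]
    if not pairs:
        return 1.0  # only tags themselves: nothing can be lost
    still_tagged = sum(1 for snp, tag in pairs if r2_new[snp][tag] >= threshold)
    return still_tagged / len(pairs)
```

A rate below 1 flags SNPs whose association signal could be missed if only the tags are genotyped in the clinical samples.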
Representative computational samples
Two methods of computational sample generation
Method 1. “Data-relevant Coalescent”. This algorithm uses a population genetic model to connect mutations in the HapMap reference to mutations in future clinical samples. Full model but computationally slow.
[Figure: “HapMap” reference samples and computationally generated “cases” and “controls”]
Method 2. The PAC method (product of approximate conditionals, Li & Stephens). This method constructs “new” samples as mosaics of existing haplotypes, mimicking the effects of recombination. An approximation but fast.
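The mosaic idea behind Method 2 can be illustrated with a simplified version of the Li & Stephens copying process: walk along the sites copying alleles from one reference haplotype, occasionally jumping to another template (a stand-in for recombination) and occasionally flipping an allele (a stand-in for mutation). This sketch assumes 0/1-coded haplotypes and a constant per-site switch probability, whereas the actual model ties the switch rate to the recombination rate and inter-marker distance and scales both rates with the number of reference haplotypes. The function names and default parameter values are hypothetical.

```python
import random


def mosaic_haplotype(
    reference: list[list[int]],
    switch_prob: float,
    mutation_prob: float,
    rng: random.Random,
) -> list[int]:
    """Generate one new 0/1 haplotype as a mosaic of the reference haplotypes."""
    n_sites = len(reference[0])
    template = rng.randrange(len(reference))  # haplotype currently being copied
    new_haplotype = []
    for site in range(n_sites):
        # "Recombination": with probability switch_prob, copy from a randomly chosen template
        if site > 0 and rng.random() < switch_prob:
            template = rng.randrange(len(reference))
        allele = reference[template][site]
        # "Mutation": occasionally copy the allele imperfectly
        if rng.random() < mutation_prob:
            allele = 1 - allele
        new_haplotype.append(allele)
    return new_haplotype


def simulate_sample(reference: list[list[int]], size: int, switch_prob: float = 0.05,
                    mutation_prob: float = 0.001, seed: int = 0) -> list[list[int]]:
    """Build a computational sample of `size` haplotypes (e.g. pseudo 'cases' or 'controls')."""
    rng = random.Random(seed)
    return [mosaic_haplotype(reference, switch_prob, mutation_prob, rng) for _ in range(size)]
```

Because every new haplotype is stitched together from observed ones, the simulated samples inherit the local LD of the reference while recombination-like switches let LD between distant sites differ from sample to sample.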
LD difference: comparison to extra experimental genotypes
• we have analyzed two extra genotype sets collected at the HapMap SNPs in three genome regions, from our clinical collaborators (Prof. Thomas Hudson, McGill; Prof. Stanley Nelson, UCLA)
[Figure values: 0.949 ± 0.013; 0.963 ± 0.014; 0.978 ± 0.010]
Genome-wide scans for human diseases
SNPs in the Complement Factor H (CFH) gene are associated with Age-related Macular Degeneration (AMD)
Klein et al., Science 2005
4. Somatic, structural and epigenetic variants in disease
Somatic mutations
the detection of somatic mutations, and their distinction from inherited polymorphisms, is important for separating predisposing variants from mutations that occur during disease progression, e.g. in cancer
© Brian Stavely, Memorial University of Newfoundland
1. detect the mutations
2. classify each mutation as somatic or inherited
Detecting somatic mutations with comparative data
• based on comparison of cancer and normal tissue from the same individual
• cancer tissue is often highly heterogeneous, so the somatic mutant allele may be present at low allele frequency
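A minimal way to express the tumor/normal comparison is a pair of binomial tests against a per-base sequencing error rate: a site is called somatic if the tumor shows more variant reads than errors would plausibly produce while the matched normal does not. The sketch below assumes per-site read counts, a 1% error rate and a 10⁻³ cutoff, all illustrative; because the test only asks whether variant reads exceed the error expectation, it can flag mutant alleles present at low frequency in heterogeneous tumor tissue, given enough read depth. Function names are hypothetical, and the sketch ignores base quality, mapping quality and tumor purity.

```python
from math import comb


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def classify_site(tumor_alt: int, tumor_depth: int, normal_alt: int, normal_depth: int,
                  error_rate: float = 0.01, alpha: float = 1e-3) -> str:
    """Label a site 'somatic', 'inherited' or 'no call' from tumor/normal read counts."""
    # Are there more variant reads than sequencing error alone would explain?
    tumor_has_variant = binom_sf(tumor_alt, tumor_depth, error_rate) < alpha
    normal_has_variant = binom_sf(normal_alt, normal_depth, error_rate) < alpha
    if normal_has_variant:
        return "inherited"  # present in the normal (germline) tissue
    if tumor_has_variant:
        return "somatic"  # present in tumor only
    return "no call"
```

For example, 8 variant reads out of 100 in the tumor (an 8% allele fraction) with 0 of 80 in the normal is called somatic, whereas 40 of 80 variant reads in the normal marks the variant as inherited.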
Detecting somatic mutations with subtraction
• if normal tissue samples are not available, we detect SNPs in cancer tissue against, e.g., the human genome reference sequence
• search for evidence that these mutations are genuine
• subtract apparent mutations that are present in sequence variation databases
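The subtraction step amounts to a set difference between candidate variants and a catalogue of known polymorphisms. The sketch below assumes variants are keyed by (chromosome, position, reference allele, alternative allele) and that the database has already been loaded as such tuples; the function name, type alias and example records are hypothetical. Matching on the full allele change rather than position alone keeps a novel allele at an otherwise known polymorphic site.

```python
Variant = tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)


def subtract_known_polymorphisms(candidates: list[Variant], known: list[Variant]) -> list[Variant]:
    """Keep candidate mutations not already catalogued as inherited polymorphisms."""
    known_set = set(known)  # constant-time membership tests
    return [v for v in candidates if v not in known_set]


if __name__ == "__main__":
    candidates = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")]
    database = [("chr1", 100, "A", "G"), ("chr2", 50, "G", "C")]
    print(subtract_known_polymorphisms(candidates, database))
```

Variants that survive the subtraction are candidates for somatic mutation; they may still be rare inherited variants absent from the database.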
Detecting somatic mutations in murine mtDNA
• we have applied our methods for somatic mutation detection to murine mitochondrial sequences
[Figure panels: heteroplasmy; homoplasmy]
• we will be applying our methods to human nuclear DNA from our collaborators
Structural variants in disease
Feuk et al. Nature Reviews Genetics 7, 85–97 (February 2006) | doi:10.1038/nrg1767
Structural variations and phenotype
Feuk et al. Nature Reviews Genetics 7, 85–97 (February 2006) | doi:10.1038/nrg1767
Epigenetics and cancer
Baylin et al., NRC 2006.
Informatics of detection / integration of varied genetic and epigenetic data
[Figure: somatic mutations, chromosome rearrangements, methylation profiles, chromatin structure, copy number changes, gene expression profiles, repeat expansions]