HapMap PROJECT - Faculty of Science at Bilkent University
HapMap PROJECT
Basics
HapMap
• The International HapMap Project is
analyzing DNA from populations with
African, Asian, and European ancestry
Multiple Populations
• The DNA samples for the HapMap have come
from a total of 270 people.
– The Yoruba people of Ibadan, Nigeria, provided 30
sets of samples from two parents and an adult child
(each such set is called a trio).
– In Japan, 45 unrelated individuals from the Tokyo
area provided samples.
– In China, 45 unrelated individuals from Beijing
provided samples.
– Thirty U.S. trios provided samples, which were
collected in 1980 from U.S. residents with northern
and western European ancestry by the Centre
d'Etude du Polymorphisme Humain (CEPH).
Methods
• The blood samples are being converted into cell
lines, and DNA is being extracted from them.
• The samples and cell lines are not linked to any
individual in the populations studied. However,
the samples and cell lines are identified as
coming from one of the four populations
participating in the study, which raises ethical
issues associated with conducting genetic
research in named populations.
SNP Nomenclature
• http://snp500cancer.nci.nih.gov/terms_snp_region.cfm
Hardy Weinberg Test
• http://innateimmunity.net/IIPGA2/Bioinformatics/exacthweform
IIPGA
Exact HWE
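The exact Hardy-Weinberg test named above can be sketched in a few lines. This is a minimal illustration, not the IIPGA tool's implementation: it assumes a biallelic SNP, conditions on the observed allele counts, and sums the probabilities of every heterozygote count that is no more likely than the observed one. The function name `hwe_exact_p` and the genotype counts in the usage note are hypothetical.

```python
from math import exp, lgamma, log


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Two-sided exact HWE p-value for one biallelic SNP (illustrative sketch)."""
    n = n_aa + n_ab + n_bb        # number of genotyped individuals
    n_a = 2 * n_aa + n_ab         # copies of allele A
    n_b = 2 * n_bb + n_ab         # copies of allele B

    def log_prob(het):
        # Probability of `het` heterozygotes given n and the allele counts:
        # n! / (hom_a! het! hom_b!) * 2^het * n_a! n_b! / (2n)!
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (lgamma(n + 1) - lgamma(hom_a + 1) - lgamma(het + 1)
                - lgamma(hom_b + 1) + het * log(2)
                + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1))

    # Heterozygote counts must have the same parity as the allele counts.
    het_counts = range(n_a % 2, min(n_a, n_b) + 1, 2)
    probs = {het: exp(log_prob(het)) for het in het_counts}
    p_obs = probs[n_ab]
    # Small tolerance so that ties with the observed table are included.
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9)))
```

As a usage example, `hwe_exact_p(25, 50, 25)` describes a sample with the genotype proportions expected under HWE and gives a p-value close to 1, while `hwe_exact_p(50, 0, 50)` (no heterozygotes at all) gives a very small p-value.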
Fisher's Exact Test
Homework
http://www.hsph.harvard.edu/bioinfocore/Documents/Talk%20slides/Bioinfo_training_August_10_05_tutorial_Niu_T.pdf
SNPcutter
http://bioinfo.bsd.uchicago.edu/SNP_cutter.htm
SNP and Cancer
• A SNP is defined as a genomic locus
where two or more alternative bases occur
with appreciable frequency (>1%).
• Occurs every several hundred bases.
• Whole genome SNP analysis is possible.
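The frequency threshold in the definition above can be checked directly from genotype counts. The sketch below assumes a biallelic locus; the function names and the 1% default simply restate the slide's definition and are not taken from any particular tool.

```python
def allele_frequencies(n_aa, n_ab, n_bb):
    """Return (freq_a, freq_b) from genotype counts at a biallelic locus."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)   # each person carries two alleles
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return freq_a, 1.0 - freq_a


def is_snp(n_aa, n_ab, n_bb, threshold=0.01):
    """True if the rarer allele is more frequent than the threshold (>1%)."""
    return min(allele_frequencies(n_aa, n_ab, n_bb)) > threshold
```

For example, genotype counts of 81, 18 and 1 give a minor allele frequency of 0.1, so the locus counts as a SNP; counts of 199, 1 and 0 give 0.0025, which falls below the 1% cutoff.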
Applications
• Direct Association Analysis:
– Test association between putative functional
variants and disease risk.
• Evaluation of nonsynonymous SNPs or regulatory
polymorphisms = functional SNPs.
• Problem: there are not that many functional SNPs.
• Uncharacterized de novo mutations???
Examples
• Two nonsynonymous MMP9 SNPs are
associated with risk of lung cancer with
metastasis (Hu et al. 2005b).
• Coding polymorphisms within UGT1A7
predict the response of colorectal cancer
patients to capecitabine (Carlini et al. 2005).
• Functional MTHFR mutations linked to
several different cancers.
Direct Association
• Candidate gene or genomic region.
– Linkage analysis
– Expression array analysis
– Knowledge of development and physiology
– Comparative genomics
Tools
• PANTHER database: evolutionary
analysis of coding SNPs.
• SNPEffect: estimates the likelihood that a
particular SNP causes a functional
effect.
• SNPSeek: >90,000 coding SNPs in the
exons of known genes.
• SNP500Cancer: identification, validation,
and characterization of polymorphisms.
PANTHER
http://www.pantherdb.org/tools/csnpScoreForm.jsp
PANTHER
ABCA1
PolyPhen
• http://genetics.bwh.harvard.edu/pph/
SNPEffect
http://snpeffect.vib.be/search.php
SNPSeek
Search for BRCA1
SNP500 Cancer Database
SNP500 vs HDP
Test if SNP500 and HDP differ
Do subpopulations differ?
Compare Caucasian vs African
Compare Caucasian vs Hispanic
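One way to compare two subpopulations at a single SNP is Fisher's exact test on a 2x2 table of allele counts. The sketch below is an assumption about how such a comparison could be run, not a description of the SNP500Cancer site's own procedure; the function name and the table layout are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: population 1 and population 2; columns: allele A and allele B counts.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x copies of allele A in population 1,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is no more likely than the observed one.
    tail = sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, tail)
```

A small p-value suggests that the allele frequencies differ between the two groups; with many SNPs tested, a correction for multiple testing would also be needed.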
Test whether in HWE
HWE
TDT
HapMap
• Polymorphisms identified by HapMap are
likely to be neutral in phenotypic effect but
can inform on nearby alleles that might
play a role in disease.
Haplotype
• SNP alleles tend to be correlated in a
predictable way; a set of correlated alleles is
known as a haplotype.
– A haplotype is the linear, ordered arrangement
of alleles on a chromosome.
• The correlation between SNPs is mediated
by linkage disequilibrium (LD).
– LD exists when alleles at distinct loci occur
together more frequently than expected given
the known allele frequencies and the
recombination fraction between the loci.
Disease allele and haplotypes
• In the presence of LD, polymorphisms that
are in physical proximity to a causal
polymorphism will show a difference
between cases and controls.
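The case-control difference described above is often summarized with an allelic odds ratio and a 1-degree-of-freedom chi-square test. The following is a minimal sketch under that assumption; the function name and the allele counts in the usage note are made up for illustration and do not come from any study cited here.

```python
from math import erfc, sqrt


def allelic_association(case_a, case_b, ctrl_a, ctrl_b):
    """Return (odds_ratio, chi_square, p_value) for a 2x2 allele-count table.

    case_a / case_b: counts of alleles A and B among cases.
    ctrl_a / ctrl_b: counts of alleles A and B among controls.
    Assumes all four counts are positive.
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    # Pearson chi-square for a 2x2 table, without continuity correction.
    chi_square = (n * (case_a * ctrl_b - case_b * ctrl_a) ** 2
                  / ((case_a + case_b) * (ctrl_a + ctrl_b)
                     * (case_a + ctrl_a) * (case_b + ctrl_b)))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi_square / 2))
    return odds_ratio, chi_square, p_value
```

With hypothetical counts of 60 A and 40 B alleles in cases versus 40 A and 60 B in controls, the odds ratio is 2.25 and the chi-square statistic is 8.0. The marker tested need not be the causal variant; under LD it can act as a proxy for one nearby.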
HapMap
• Three phases: I, II, III
• Phase I: completed in October 2005; genotyping
of 1M SNPs at an average spacing of 5 kb.
Additional SNP discovery was carried out in 48
samples from the original populations across
10 specific 500 kb ENCODE regions (chosen to
represent a genome-wide range of evolutionary
conservation and gene density). Later this
was extended to 269 samples.
HapMap
• Phase II: in 269 samples, 2.9 M SNPs were
genotyped, for a total of 3.9 M.
• Phase III: other populations will be added.
Results of Phase I and Phase II
• SNP density across the ENCODE
regions: 1 SNP per 279 bp.
• SNP density of the Phase II HapMap: 1 SNP per kb.
Robust measures of LD
• D’ and r² are the two major measures of
LD.
• D’: if two SNPs have not been separated
by recombination during the history of the
sample, D’ is 1.
• r² is the squared correlation between two SNPs;
when the alleles of two SNPs are always observed
together, r² is 1. It is generally the better measure.
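Both measures can be computed from the four haplotype counts at a pair of biallelic SNPs. The sketch below uses the standard definitions (D as the difference between the observed and expected haplotype frequency, D’ as |D| scaled by its maximum possible value, r² as D² divided by the product of the four allele frequencies). The function name and argument layout are hypothetical, and the haplotypes are assumed to be already phased.

```python
def ld_measures(n11, n12, n21, n22):
    """Return (d_prime, r_squared) from phased haplotype counts.

    n11: allele 1 at both SNPs;        n12: allele 1 at SNP 1, allele 2 at SNP 2
    n21: allele 2 at SNP 1, allele 1 at SNP 2;  n22: allele 2 at both SNPs
    """
    total = n11 + n12 + n21 + n22
    p11 = n11 / total                 # frequency of the 1-1 haplotype
    p1 = (n11 + n12) / total          # allele 1 frequency at SNP 1
    q1 = (n11 + n21) / total          # allele 1 frequency at SNP 2
    d = p11 - p1 * q1                 # raw disequilibrium coefficient D

    # Largest |D| possible given the allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p1 * (1 - q1), (1 - p1) * q1)
    else:
        d_max = min(p1 * q1, (1 - p1) * (1 - q1))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    denom = p1 * (1 - p1) * q1 * (1 - q1)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d_prime, r_squared
```

With counts of 50, 0, 0 and 50, only two haplotypes exist and both D’ and r² equal 1. With counts of 50, 0, 25 and 25, one of the four haplotypes is still missing, so D’ remains 1 while r² drops to about 0.33; this is why the two measures can disagree for the same pair of SNPs.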
Linkage Studies
• Family-based approaches to identify a disease
gene.
• When a disease gene segregates in a family, genomic
markers in close proximity to the disease gene will
segregate in the same manner because recombination
between them is rare.
– Identify families with disease; genotype each
individual.
– Compare the marker allele and disease distributions
within the family. Assign a LOD score.
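The LOD score mentioned above compares the likelihood of the family data at a given recombination fraction θ with the likelihood under no linkage (θ = 0.5), on a log10 scale. The sketch below assumes the simplest case, phase-known meioses that can each be scored as recombinant or non-recombinant; real pedigrees need dedicated linkage software. The function name and the counts in the usage note are hypothetical.

```python
from math import inf, log10


def lod_score(recombinants, meioses, theta):
    """LOD score for phase-known meioses at recombination fraction theta."""
    non_recombinants = meioses - recombinants

    def log_term(count, prob):
        # count * log10(prob), with the conventions 0 * log(0) = 0
        # and log(0) = -infinity when the event was actually observed.
        if count == 0:
            return 0.0
        if prob == 0:
            return -inf
        return count * log10(prob)

    log_l_theta = (log_term(recombinants, theta)
                   + log_term(non_recombinants, 1 - theta))
    log_l_null = meioses * log10(0.5)   # no linkage: theta = 0.5
    return log_l_theta - log_l_null


def max_lod(recombinants, meioses):
    """Scan theta from 0 to 0.5 in steps of 0.01 and return the best LOD."""
    return max(lod_score(recombinants, meioses, t / 100) for t in range(51))
```

For example, 10 informative meioses with no recombinants give a LOD of 10 × log10(2), about 3.01, at θ = 0. A LOD of 3 or more is the conventional threshold for declaring linkage.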
Linkage studies
• Genome-wide scans for linkage analysis are
performed using several hundred
microsatellites at a 10 cM density
throughout the genome.
• SNP-based linkage studies use a panel of
10,000 SNPs.
Examples
• Multiple sclerosis
• Neonatal diabetes
• Familial glucocorticoid deficiency.