Applications of Homozygosity Haplotype in the Study of Human Genetic Diseases
with High Density SNP Genotype
Haiyan Jiang (1), Mark Samuels (1,2), Duane Guernsey (1), Andrew Orr (1,3)
(1) Department of Pathology and (3) Department of Ophthalmology and Visual Sciences, Dalhousie University, Halifax, NS, Canada
(2) Department of Medicine, University of Montreal, Montreal, QC, Canada
3. Application to the screening of known causative genes
Background
Assuming that patients who have inherited the disease susceptibility gene from a common ancestor also share
a haplotype in the surrounding genomic interval, the HH approach can be applied to screen known causative genes
or loci by searching for a shared homozygosity haplotype around each gene. If the patients do not share a
significant RCHH around a known gene, that gene can be excluded.
[Figure: SimWalk2 multipoint linkage, chromosome 1. LOD score plotted against map position (cM); peak LOD = 8.7.]
Schnyder crystalline corneal dystrophy (SCCD) is a rare genetic disorder presenting with
opacification of the cornea caused by aberrant intracellular cholesterol storage and possible
systemic dyslipidemia. AMGGI ascertained a large family segregating SCCD from Nova Scotia.
Mapping analysis confirmed linkage of the Nova Scotia SCCD family to a previously published
locus at chromosome 1p34.
Impact of genotyping errors
Genotyping errors are difficult to detect when only a few affected individuals in a family are available for
genotyping, so an approach was developed to estimate the probability that mismatches are due to error:
1. Replace the mismatched compSNPs with concordant SNPs to create a consistent homozygosity haplotype.
2. Run Monte Carlo (MC) simulations of genotyping errors on the modified genotypes, using the selected error
model and error rate.
3. Fit a Poisson distribution to the number of mismatched compSNPs created by the simulated errors.
4. Calculate the probability of obtaining N mismatched compSNPs from genotyping error alone.
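The simulation steps above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the genotype encoding ("AA", "AB", "BB"), the uniform error model (an erroneous call becomes one of the other two genotypes with equal probability), and all function names are hypothetical and are not taken from the HH program.

```python
import random

GENOTYPES = ("AA", "AB", "BB")  # assumed biallelic encoding

def add_errors(genotypes, error_rate, rng):
    """Return a copy in which each call is replaced, with probability
    error_rate, by one of the other two genotypes (assumed uniform model)."""
    noisy = []
    for g in genotypes:
        if rng.random() < error_rate:
            g = rng.choice([x for x in GENOTYPES if x != g])
        noisy.append(g)
    return noisy

def count_mismatched_compsnps(a, b):
    """Count SNPs that are homozygous in both subjects but for different alleles."""
    return sum(1 for x, y in zip(a, b)
               if x != "AB" and y != "AB" and x != y)

def simulate_mean_mismatches(subj_a, subj_b, error_rate, n_rep=1000, seed=0):
    """Monte Carlo estimate of lambda: the mean number of mismatched
    compSNPs created purely by simulated genotyping errors."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_rep):
        total += count_mismatched_compsnps(
            add_errors(subj_a, error_rate, rng),
            add_errors(subj_b, error_rate, rng))
    return total / n_rep

# Hypothetical input: 1000 SNPs made fully concordant, as in the first step.
consistent = ["AA"] * 1000
lam = simulate_mean_mismatches(consistent, consistent, error_rate=0.01)
print(lam)
```

The estimated mean can then serve as the Poisson parameter λ when judging whether the observed number of mismatched compSNPs is plausible under genotyping error alone.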
Lincoln and Lander error model
DNA resequencing identified a heterozygous
segregating missense variant in the gene UBIAD1 at
chr1:11,255,866-11,268,929 in the NS family, and four
other missense variants in four additional small
families ascertained nationally and internationally
with SCCD.
The presumptive pathogenic variants, at positions
102, 112, 119, 175 and 232 are all in highly conserved
residues, and lie near each other in a 3-D model of
protein folding. UBIAD1 encodes a potential
prenyltransferase, and may participate in biosynthesis
or regulation of intracellular cholesterol trafficking.
Thus UBIAD1 is a potential novel therapeutic target
for treatment of hyperlipidemia, a significant risk
factor for cardiovascular disease.
1. Homozygosity Haplotype Method
An HH is a haplotype described by
only homozygous SNPs and is obtained
by the deletion of heterozygous SNPs.
Abstract
In a large family with a specific disease, patients usually share the identity-by-descent (IBD) haplotype
linked to the disease susceptibility gene. Although many haplotype analysis methods have been
developed to detect the shared interval, it is still very difficult to reconstruct haplotypes on a
genome-wide basis. A non-parametric method, Homozygosity Haplotype (HH), was recently proposed for
the genome-wide search for shared autosomal segments using high density SNP genotypes. Rather than
phasing the haplotype, HH uses a form of haplotype described by the homozygous SNPs only, which
allows it to perform genome-wide searches with high efficiency. We studied the applicability and
effectiveness of HH in identifying the candidate region of a causative gene using Illumina 550k genotype
data from the affected members of a large family with Schnyder crystalline corneal dystrophy (SCCD, MIM
121800), a rare autosomal dominant disease. HH successfully detected the ~1 Mb shared segment with a
minimum set of three samples. We propose that HH can be applied to screen known causative genes
or loci by searching for the shared homozygosity haplotype in patients who have inherited a susceptibility
gene from a common ancestor. We developed a new strategy for the genome-wide screening of known
causative genes or loci with high density SNP genotype data, which has the potential to serve as an
efficient alternative to sequencing or microsatellite-based fine mapping in genetic disease research
and clinical diagnosis.
2. Use HH to identify the candidate loci for Schnyder crystalline corneal dystrophy
Here O is the observed genotype of a SNP, G is
the underlying genotype, and E is the genotyping
error rate. The error rate is the same for all possible
underlying genotypes.
Lincoln SE, Lander ES. Genomics 1992; 14:604–610.
The P value, the probability of obtaining N mismatched
compSNPs from genotyping error alone, is
calculated with a Poisson distribution.
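As a rough illustration of this calculation, the sketch below evaluates Poisson probabilities in Python. The poster does not state whether P is the point probability P(X = N) or the upper tail P(X >= N), so both are shown; the function names and the observed count used in the example are hypothetical.

```python
import math

def poisson_pmf(n, lam):
    """P(X = n) for X ~ Poisson(lam), computed in log space for stability."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))

def poisson_upper_tail(n, lam):
    """P(X >= n) for X ~ Poisson(lam); clamped at 0 against rounding error."""
    return max(0.0, 1.0 - sum(poisson_pmf(k, lam) for k in range(n)))

lam = 8.98        # mean reported for the DRD2 error simulation
n_observed = 60   # hypothetical count, NOT a value from the poster
print(poisson_pmf(n_observed, lam))
print(poisson_upper_tail(n_observed, lam))
```

When the observed count lies far above λ, both quantities are vanishingly small, which is the situation in which a reported P of 0 would be expected.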
Results
The whole-genome screening approach was validated using a family with myoclonus dystonia (MIM 159900).
The known causative genes are SGCE, DRD2, and DYT1. A published causative mutation, c.304C>T (R102X) in
the SGCE gene, had been detected in the affected family members by sequencing. We tested whether the
proposed screening approach correctly excludes the non-causative genes. Four patients from the family were
genotyped with Illumina HumanHap550 beadchips. HH was run to identify RCHHs shared by the four patients
with a cutoff of 3.0 cM.
Genome-wide mapping of RCHHs shared by four patients from a Canadian family with Myoclonus dystonia
Taken from Orr et al. PLoS One 2: e685 (2007)
A compSNP is a SNP that is
homozygous in two subjects
Results
I. An RCHH at chr1:10,679,786-11,639,887 was identified by HH method with the genotype data of
10 patients.
DRD2: chr11:112,785,527-112,851,211
DYT1: chr9:131,616,072-131,626,199
SGCE: chr7:94,052,472-94,123,457
An RCHH is a run of compSNPs matched for
allelic type, the genetic length of which is
longer than the cutoff value. An RCHH is
bounded by either a mismatched compSNP(s)
or by the end(s) of an autosome.
HH program
1. Read in genotypes of all affected subjects
2. Delete heterozygous SNPs and generate HH
for each subject
3. Generate list of RCHHs for each pair of
subjects with a cutoff value
4. Determine the RCHHs shared by multiple subjects
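The four steps above can be sketched in Python as follows. This is a simplified illustration, not the HH program itself. It assumes genotypes are two-letter strings for SNPs on one chromosome, sorted by map position; that an RCHH's length is measured between its first and last matching compSNP; and that the shared RCHH of several subjects is the intersection of all pairwise RCHHs. All function names are hypothetical.

```python
from itertools import combinations

def homozygosity_haplotype(genotypes):
    """Step 2: drop heterozygous (and non-letter) calls; keep {snp_index: allele}."""
    return {i: g[0] for i, g in enumerate(genotypes)
            if len(g) == 2 and g[0] == g[1] and g[0].isalpha()}

def pairwise_rchh(hh_a, hh_b, cm_pos, cutoff_cm):
    """Step 3: runs of allele-matched compSNPs longer than cutoff_cm.
    Returns (start_cm, end_cm) tuples (length convention is an assumption)."""
    runs, start, last = [], None, None
    for i in sorted(hh_a.keys() & hh_b.keys()):    # compSNPs in map order
        if hh_a[i] == hh_b[i]:
            if start is None:
                start = cm_pos[i]
            last = cm_pos[i]
        else:                                       # mismatched compSNP ends the run
            if start is not None and last - start > cutoff_cm:
                runs.append((start, last))
            start = last = None
    if start is not None and last - start > cutoff_cm:
        runs.append((start, last))                  # run reaching the chromosome end
    return runs

def intersect(runs_a, runs_b):
    """Overlapping parts of two interval lists."""
    out = []
    for s1, e1 in runs_a:
        for s2, e2 in runs_b:
            s, e = max(s1, s2), min(e1, e2)
            if s < e:
                out.append((s, e))
    return out

def shared_rchh(subjects, cm_pos, cutoff_cm):
    """Step 4 (assumed): intersect the RCHHs of every pair of subjects."""
    hhs = [homozygosity_haplotype(g) for g in subjects]   # step 1 is the input list
    shared = None
    for a, b in combinations(hhs, 2):
        runs = pairwise_rchh(a, b, cm_pos, cutoff_cm)
        shared = runs if shared is None else intersect(shared, runs)
    return shared or []

# Toy data: six SNPs at 0..5 cM, three subjects, cutoff 2.0 cM.
cm_pos = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
s1 = ["AA", "AB", "BB", "AA", "BB", "AA"]
s2 = ["AA", "BB", "BB", "AA", "AA", "AA"]
print(shared_rchh([s1, s2, s1], cm_pos, cutoff_cm=2.0))
```

Because every pair of subjects is compared once, the number of pairwise comparisons grows as n(n-1)/2 in the number of subjects n.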
10 patients were genotyped with Illumina
550K bead chips. HH analyses were run
with a cutoff of 3.0 cM.
[Figure: RCHHs identified by the HH method with the 550K genotypes of 10 patients. Panels: short arm, long arm.]
II. Minimal subset required to identify the interval
Results of genotyping error simulation
Region from a common ancestor (RCA)
Cutoff value selection
Sample selection: select distantly related individuals because they share fewer RCAs
Ratio of RCA to the total genetic length shared
by two descendants from a common ancestor.
Here m and n are the numbers of generations
by which the two subjects are removed from
their common ancestor.
Gene   Region                          Error simulation (E=0.01)
DRD2   Chr11:111,851,211-113,785,527   λ=8.98, P=0
DYT1   Chr9:130,626,199-132,616,072    λ=8.76, P=0
The two genes DRD2 and DYT1 can be excluded because no RCHH was detected around them. The
genotyping error simulations, with P=0, suggest that the genotype data are reliable.
An RCA is an autosomal region where subjects
share a chromosomal segment derived from a
common ancestor (i.e. IBD). The presence of
RCA is predicted through the RCHH.
The largest RCHH, at chr7:93,168,493-130,965,632 (about 37.8 Mb), includes the SGCE gene
(chr7:94,052,472-94,123,457).
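The screening decision itself reduces to an interval check, which might be sketched in Python as below. The gene coordinates and the chr7 RCHH are those listed on the poster; the function names, the optional flank parameter, and the use of a simple overlap test are assumptions made for illustration.

```python
def overlaps(region, gene, flank=0):
    """True if gene (chrom, start, end), padded by flank bp, overlaps region."""
    return (region[0] == gene[0]
            and region[1] <= gene[2] + flank
            and gene[1] - flank <= region[2])

def screen_genes(genes, rchhs, flank=0):
    """Map each gene to True (cannot be excluded) or False (excluded)."""
    return {name: any(overlaps(r, g, flank) for r in rchhs)
            for name, g in genes.items()}

# Gene coordinates as listed on the poster.
genes = {
    "SGCE": ("chr7", 94_052_472, 94_123_457),
    "DRD2": ("chr11", 112_785_527, 112_851_211),
    "DYT1": ("chr9", 131_616_072, 131_626_199),
}
# Only the largest reported RCHH is used here; the full RCHH list is not given.
rchhs = [("chr7", 93_168_493, 130_965_632)]

result = screen_genes(genes, rchhs)
print(result)
```

With these inputs only SGCE is retained, while DRD2 and DYT1 are excluded, in line with the results reported above.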
Miyazawa H, et al. Homozygosity haplotype allows a genomewide search for the autosomal
segments shared among patients. Am J Hum Genet 2007; 80(6):1090–1102.
 Features of HH method
Non-parametric
High efficiency
Complexity O(n^2), where n is the number of subjects
For Marfan syndrome (Affymetrix 500k SNP genotypes, 9 subjects), the computational time is 6 s on a
laptop.
Both dominant and recessive disease loci can be detected
HH analysis may provide an advantage when 6 ≤ m+n ≤ 50 (m and n are the numbers of generations
by which two subjects are removed from a common ancestor), where haplotype analysis or linkage
analysis is difficult to perform.
HH is well suited to the local population in the Atlantic region, where m+n < 20.
The myoclonus dystonia study demonstrated that the proposed screening approach successfully excluded all
non-causative genes. It also identified the potential linkage to SGCE.
Conclusions
[Figure: RCHHs identified by the HH method with the 550K genotypes of patients 1351 and 1425. Panels: short arm, long arm.]
HH successfully detected the ~1Mb shared
segment on Chr1 with genotypes of three
patients
Our study of the HH approach with Illumina 550k SNP genotype data from a series of monogenic disease
projects demonstrates that the HH method is efficient and effective in identifying disease-linked regions.
Based on the idea of the homozygosity haplotype, we developed a new approach for the genome-wide
screening of known causative genes or loci using high density SNP genotype data. Its successful
application to a family with a known causative mutation suggests that the method has the potential to be
used as an efficient alternative to sequencing or microsatellite-based fine mapping in
the research and clinical diagnosis of genetic diseases.
July 2008, ISMB 2008