Exome-seq Analysis
Exome Sequencing as Molecular Diagnostic Tool of Mendelian Diseases
BIOS 6660
Hung-Chun (James) Yu
Shaikh Lab
04/28/2014
Human Genetic Diseases
Penetrance vs Frequency
Kaiser J. Science (2012) 338:1016-1017.
Human Genetic Diseases
Complex Disorder
• Polygenic, many genes.
• Low penetrance/effect size.
• Multifactorial, environmental, dietary.
• Examples: heart disease, diabetes, obesity, autism, etc.
Mendelian Disorder
• Monogenic or polygenic.
• Full or high penetrance/effect size.
• Examples: sickle cell anemia and cystic fibrosis.
Complex Diseases
Multiple causes, and polygenic.
Multiple genetic factors, each with low penetrance individually.
Coronary artery disease
Coriell Institute for Medical Research.
https://cpmc1.coriell.org/genetic-education/diagnosis-versus-increased-risk
Mendelian Diseases
Veltman J.A. et al. Nat. Rev. Genet. (2012) 13:565-575.
Mendelian Diseases
Dominant Inheritance
U.S. National Library of Medicine. http://ghr.nlm.nih.gov/
Mendelian Diseases
Recessive Inheritance
U.S. National Library of Medicine. http://ghr.nlm.nih.gov/
Exome Sequencing
Bamshad, MJ., et al. Nat. Rev. Genet. (2011) 12:745-755.
Exome Sequencing
~40Mb (coding) or 60Mb (coding + UTRs)
Mendelian Diseases Identified by Exome Sequencing
Timeline
Gilissen C. et al., Genome Biol. (2011) 12:228.
Mendelian Diseases Identified by Exome Sequencing
By mid-2012, ~100 genes identified.
By mid-2013, >150 genes identified.
Rabbani, B., et al. (2012) J. Hum. Genet. 57:621-632.
Types of Variation
What kind of variation/mutation can be detected by Exome Sequencing?
• SNV (single nucleotide variation)
• Small InDel (insertion/deletion of <25bp)
• Large InDel, CNV (copy number variation): Possible, but not reliable.
• Aneuploidy: Same as CNV.
• Translocation: Possible, but not reliable. Limited.
• Complex rearrangement: Not likely.
Exome Variants
SNV (single nucleotide variation)
• Synonymous: (1) Silent.
• Nonsynonymous: (1) Missense. (2) Nonsense. (3) Stop-loss. (4) Start-gain. (5) Start-loss. (6) Splice-site.
http://upload.wikimedia.org/wikipedia/commons/6/69/Point_mutations-en.png
http://www.webbooks.com/MoBio/Free/Ch5A4.htm
Exome Variants
Small InDel (insertion/deletion <25bp)
• Frameshift
• In-frame
NHGRI Digital Media Database (DMD), http://www.genome.gov/dmd/
Variant and Population Frequency
Novel/Private variant
• Never been reported before.
Rare variant
• Minor allele freq. (MAF) < 1%.
Polymorphic variant
• MAF > 1% (0.01) or 5% (0.05).
Databases
• dbSNP (NCBI): http://www.ncbi.nlm.nih.gov/SNP/
• 1000 Genomes: http://www.1000genomes.org/
• ESP (NHLBI): http://evs.gs.washington.edu/EVS/
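The three frequency classes above can be sketched as a small function. This is a minimal illustration, not a database lookup: the function name, the use of `None` for "not found in any database", and the treatment of a MAF exactly at the cutoff as polymorphic are all assumptions made here.

```python
from typing import Optional


def classify_by_maf(maf: Optional[float], polymorphic_cutoff: float = 0.01) -> str:
    """Label a variant as novel, rare, or polymorphic from its minor allele frequency.

    maf is None when the variant is absent from the reference databases
    (an assumed convention for this sketch).
    """
    if maf is None:
        return "novel"
    if maf < polymorphic_cutoff:
        return "rare"
    return "polymorphic"


print(classify_by_maf(None))         # variant never reported
print(classify_by_maf(0.002))        # MAF of 0.2%
print(classify_by_maf(0.03, 0.05))   # same idea with the stricter 5% cutoff
```

Passing `polymorphic_cutoff=0.05` switches to the 5% definition of a polymorphism mentioned on the slide.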
Exome Variants
How to analyze the enormous number of variants in any given exome?
All: ~20,000 - 200,000
Coding + splice-site: ~10,000 - 30,000
Protein altering: ~4,000 - 15,000
Private/Novel: ~100 - 300
Gilissen C. et al. Eur. J. Hum. Genet. (2012) 20:490-497.
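The stepwise narrowing above can be sketched as a chain of filters that reports how many variants survive each step. The variant representation (a dict with `effect` and `maf` keys) and the effect labels are hypothetical, chosen only for this illustration; real annotation tools use their own vocabularies.

```python
# Hypothetical effect labels, chosen for this sketch only.
PROTEIN_ALTERING = {
    "missense", "nonsense", "stop-loss", "start-gain",
    "start-loss", "splice-site", "frameshift", "in-frame",
}
CODING_OR_SPLICE = PROTEIN_ALTERING | {"synonymous"}


def filter_funnel(variants):
    """Apply the filters in order and return (step name, variants remaining)."""
    steps = [
        ("All", lambda v: True),
        ("Coding + splice-site", lambda v: v["effect"] in CODING_OR_SPLICE),
        ("Protein altering", lambda v: v["effect"] in PROTEIN_ALTERING),
        # maf is None when the variant is absent from the databases (assumption).
        ("Private/Novel", lambda v: v["maf"] is None),
    ]
    remaining = list(variants)
    counts = []
    for name, keep in steps:
        remaining = [v for v in remaining if keep(v)]
        counts.append((name, len(remaining)))
    return counts
```

Each step filters the output of the previous one, so the counts can only shrink from top to bottom, mirroring the funnel on the slide.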
Exome Variants
Bamshad, MJ., et al. Nat. Rev. Genet. (2011) 12:745-755.
Exome Analysis Strategies
Figure: pedigree symbol legend (male, female, affected, heterozygous carrier, sex-linked heterozygous carrier, mating, consanguineous mating).
Gilissen C. et al., Eur. J. Hum. Genet. (2012) 20:490-497.
Exome Analysis Strategies
Linkage
• Large family with multiple affected individuals.
• Pathogenic variant co-segregates with disorder.
Homozygosity
• Affected patients from consanguineous parents.
• Homozygous mutation within a homozygous stretch in the genome.
• Ideal for recessive disorders.
Exome Analysis Strategies
Candidate genes
• Biased approach
• Require current biological knowledge
• Good for screening or clinical diagnosis of known disorders.
Overlap
• Require multiple unrelated individuals with identical disorders.
• Monogenic disorders
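The overlap strategy amounts to intersecting the candidate gene lists of unrelated patients. A minimal sketch follows, assuming each patient has already been reduced to a list of genes carrying a rare, protein-altering variant; the function name and the placeholder gene names are hypothetical.

```python
def shared_candidate_genes(per_patient_genes):
    """Return the genes that appear in every patient's candidate list."""
    gene_sets = [set(genes) for genes in per_patient_genes]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


# Placeholder gene names, for illustration only.
patients = [
    ["GENE_A", "GENE_B", "GENE_C"],
    ["GENE_B", "GENE_C", "GENE_D"],
    ["GENE_C", "GENE_E"],
]
print(shared_candidate_genes(patients))
```

A strict intersection only works for monogenic disorders, as the slide notes; with genetic heterogeneity one would instead rank genes by the number of patients sharing them.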
Exome Analysis Strategies
De novo
• Sporadic mutation
• Germline mutation during meiosis
• Dominant inheritance
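In a trio, a de novo candidate is a variant the patient carries that neither parent carries. The check can be sketched on VCF-style genotype strings as below; the function names are hypothetical, and treating a parent with a missing genotype (`./.`) as "cannot confirm de novo" is a conservative assumption of this sketch.

```python
def _alleles(gt: str):
    """Split a VCF genotype such as '0/1' or '0|1' into its allele codes."""
    return gt.replace("|", "/").split("/")


def is_de_novo(child_gt: str, father_gt: str, mother_gt: str) -> bool:
    """True when the child carries an ALT allele and both parents are 0/0."""
    child_has_alt = any(a not in ("0", ".") for a in _alleles(child_gt))
    father_hom_ref = all(a == "0" for a in _alleles(father_gt))
    mother_hom_ref = all(a == "0" for a in _alleles(mother_gt))
    return child_has_alt and father_hom_ref and mother_hom_ref


print(is_de_novo("0/1", "0/0", "0/0"))  # candidate de novo variant
print(is_de_novo("0/1", "0/1", "0/0"))  # inherited from the father
```

In practice, apparent de novo calls are often genotyping errors in one of the three samples, so candidates are normally confirmed by an independent method.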
Exome Analysis Strategies
Double-hit
• Unaffected parents are heterozygous carriers.
• Parental sequence info is very helpful.
• Recessive inheritance.
Figure: pedigrees illustrating the homozygous and compound heterozygous models.
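The two double-hit models can be distinguished per gene once parental genotypes are available. The sketch below assumes biallelic sites and a hypothetical input layout (one dict per variant in the gene, with `child`, `father`, and `mother` genotype strings); variants with any missing genotype are simply never matched.

```python
def alt_count(gt: str):
    """Number of ALT alleles in a genotype, or None if any allele is missing."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(a != "0" for a in alleles)


def recessive_model(gene_variants):
    """Return 'homozygous', 'compound heterozygous', or None for one gene."""
    paternal_hit = maternal_hit = False
    for v in gene_variants:
        child = alt_count(v["child"])
        father = alt_count(v["father"])
        mother = alt_count(v["mother"])
        # Homozygous: child 1/1, both parents heterozygous carriers.
        if child == 2 and father == 1 and mother == 1:
            return "homozygous"
        # Heterozygous hits inherited from exactly one parent.
        if child == 1 and father == 1 and mother == 0:
            paternal_hit = True
        if child == 1 and father == 0 and mother == 1:
            maternal_hit = True
    # Compound heterozygous: one hit from each parent in the same gene.
    if paternal_hit and maternal_hit:
        return "compound heterozygous"
    return None
```

This is why parental sequence is so helpful: without it, two heterozygous variants in one gene cannot be shown to sit on different parental chromosomes.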
Trio-based Exome sequencing
Family trio
• Unaffected parents and an affected patient.
Why do we use a trio? What can be tested using a trio?
Advantages?
• Economical, efficient, single case required.
Trio-based Exome sequencing
Autosomal dominant: De novo
X-linked dominant: De novo
Autosomal recessive: Compound heterozygous, Homozygous
X-linked recessive: Hemizygous in male
Figure: trio pedigrees for each model. Legend: male, female, affected, heterozygous carrier, sex-linked heterozygous carrier.
Trio-based Exome sequencing
Candidate Genes/Variants
• Protein altering variants
• Rare or novel variants
• Variants that fit each inheritance model

Inheritance model                   Rare Variant   Novel Variant
Dominant, de novo                   0~1            0~1
Recessive, compound heterozygous    0~20           0~3
Recessive, homozygous               0~20           0~3
X-linked                            0~10           0~5
Case 1
Clinical information
The patient was a 7-month-old boy when first evaluated. He
was diagnosed with BPES by a pediatric ophthalmologist. In
addition to blepharophimosis, ptosis, and epicanthus inversus
normally associated with BPES, he had cryptorchidism, right
hydrocele, wide-spaced nipples, and slight 2–3 syndactyly of
toes.
Clinical testing demonstrated a normal karyotype (46,XY)
and normal FISH studies for 22q11.2 deletion and Cri-du-Chat
(5p deletion) syndrome. Thyroid function was normal.
Further, a normal 7-dehydrocholesterol level was used to rule
out Smith–Lemli–Opitz syndrome. Sanger sequencing and
high-resolution CNV analysis with Affymetrix SNP 500K
arrays did not identify a FOXL2 mutation.
Case 1
A-D: 2 months old. Note blepharophimosis, ptosis, epicanthus inversus (A), posteriorly angulated ears with thickened superior helix and prominent antihelix (B), and slight 2–3 syndactyly of toes in addition to overlapping toes (C, D).
E-F: 3.5 years old, following oculoplastic surgery to correct ptosis; note right-sided preauricular ear pit (F, indicated by arrow).
G-I: 12 years old. Note the recurrence of ptosis (L>R), arched eyebrows, abnormal ears, thin upper lip vermilion, small pointed chin, downsloping shoulders, and wide-spaced and low-set nipples.
Case 2
Clinical information
The proband is a nine-year-old girl who presented with
microcephaly, unilateral retinal coloboma, bilateral optic
nerve hypoplasia, nystagmus, seizures, gastroesophageal reflux,
and developmental delay including not yet saying specific
words (at 29 months old).
On exam, she has microcephaly with a normal height, a
down-turned upper lip, and fingertip pads. A karyotype and
CGH analysis have been normal. Kabuki (KMT2D and
KDM6A) and Angelman (UBE3A and MECP2) syndromes
were suspected in this patient.
Case 2
Case 3
Clinical information
Case 3 was the result of a non-consanguineous union and he
presented to care at four months of age with a seizure
disorder, hypotonia and developmental delay. The patient
underwent a left parietal craniotomy and partial resection of
the frontal cortex without complete resolution of the
seizure disorder.
Initial laboratory studies included an elevated homocysteine
and methylmalonic acid and a normal vitamin B12 level.
Complementation analysis of the patient’s cell line placed the
patient into the cblC class. Sequencing and
deletion/duplication analysis (microarray) of the MMACHC gene
were negative in both skin fibroblasts and peripheral blood.
Case 3
Feature
Combined methylmalonic aciduria and homocystinuria.
Severe developmental delay, infantile spasms, gyral cortical
malformation, microcephaly, chorea, undescended testes,
megacolon
Case 3
Monster Max
http://www.maxwatson.org/
Patient's older
sister as a summer
student in Shaikh
Lab
Data for Case Study
3 trios
• A total of 3 families/cases.
• Each family/case includes unaffected parents and an affected patient.
VCF files
• Familial variant calls in VCF format, mapped to human GRCh37/hg19.
• 2x90bp paired-end reads, with ~50X coverage.
“Mini” Exome
• 100 genes with/without known disorder association.
• Validated causative genes, plus randomly selected genes.
Exome NGS Workflow
Figure: workflow diagram. File formats: FASTQ (2x90bp) → SAM → BAM → BCF → VCF.
Tools: BWA (Burrows-Wheeler Aligner), SAMtools.
Filters shown: unpaired and unmapped reads; PCR duplicate artifacts; Phred score, mapping quality, read depth, etc.
VCF Format
VCF (Variant Call Format)
http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
## Meta-information lines
FILTER, INFO, FORMAT
# Header line
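The three kinds of lines in a VCF file (`##` meta-information, the single `#` header line, and data records) can be separated with a few lines of code. This is a minimal sketch for reading small files, not a replacement for a VCF library; the function name, the sample name `proband`, and the example record are made up for illustration, while the fixed column names follow the VCF specification.

```python
def read_vcf_lines(lines):
    """Split VCF text into meta-information lines, column names, and records."""
    meta, columns, records = [], [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):
            meta.append(line[2:])               # e.g. fileformat=VCFv4.1
        elif line.startswith("#"):
            columns = line[1:].split("\t")      # CHROM, POS, ID, ...
        else:
            records.append(dict(zip(columns, line.split("\t"))))
    return meta, columns, records


# Illustrative one-record file; the values are invented for this example.
example = "\n".join([
    "##fileformat=VCFv4.1",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tproband",
    "1\t12345\t.\tA\tG\t50\tPASS\tDP=154;MQ=52\tGT:PL:GQ\t0/1:174,0,178:99",
])
meta, columns, records = read_vcf_lines(example.splitlines())
print(columns)
print(records[0]["INFO"])
```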
VCF Format
INFO
AA : ancestral allele
AC : allele count in genotypes, for each ALT allele, in the same order as listed
AF : allele frequency for each ALT allele in the same order as listed: use this when estimated from primary data, not called genotypes
AN : total number of alleles in called genotypes
BQ : RMS base quality at this position
CIGAR : cigar string describing how to align an alternate allele to the reference allele
DB : dbSNP membership
DP : combined depth across samples, e.g. DP=154
END : end position of the variant described in this record (for use with symbolic alleles)
H2 : membership in hapmap2
H3 : membership in hapmap3
MQ : RMS mapping quality, e.g. MQ=52
MQ0 : Number of MAPQ == 0 reads covering this record
NS : Number of samples with data
SB : strand bias at this position
SOMATIC : indicates that the record is a somatic mutation, for cancer genomics
VALIDATED : validated by follow-up experiment
1000G : membership in 1000 Genomes
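The INFO column packs these keys into one semicolon-separated string, with `KEY=value` pairs and bare flags such as `DB`. A minimal parsing sketch follows; the function name is hypothetical and values are left as strings rather than converted to the types declared in the meta-information lines.

```python
def parse_info(info: str) -> dict:
    """Turn an INFO string such as 'DP=154;MQ=52;DB' into a dictionary."""
    if info in ("", "."):
        return {}
    parsed = {}
    for entry in info.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        # Flags like DB have no '=' and are stored as True.
        parsed[key] = value if sep else True
    return parsed


info = parse_info("DP=154;MQ=52;DB")
print(info["DP"], info["MQ"], info["DB"])
```

Multi-valued keys such as AC or AF arrive as comma-separated strings (one value per ALT allele) and would need a further `split(",")`.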
VCF Format
FORMAT
GT: Genotype.
0/0: Homozygous normal
0/1: Heterozygous variant
1/1: Homozygous variant
PL: Phred-scaled genotype likelihoods (≥0), one value per genotype.
Example: 0/0 = 174, 0/1 = 0, 1/1 = 178 (written as 174,0,178)
GQ: Genotype quality (1-99)
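A sample column is read by pairing it with the FORMAT column, and the PL triple identifies the most likely genotype as the one with the smallest value. The sketch below assumes a biallelic site, where PL is ordered 0/0, 0/1, 1/1; the function names and the GQ value of 99 in the example are illustrative.

```python
def parse_sample(format_field: str, sample_field: str) -> dict:
    """Pair FORMAT keys (e.g. 'GT:PL:GQ') with one sample's values."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def most_likely_genotype(pl_field: str) -> str:
    """Return the genotype with the lowest Phred-scaled likelihood."""
    pls = [int(x) for x in pl_field.split(",")]
    if len(pls) != 3:
        raise ValueError("expected three PL values for a biallelic site")
    genotypes = ("0/0", "0/1", "1/1")
    return genotypes[pls.index(min(pls))]


sample = parse_sample("GT:PL:GQ", "0/1:174,0,178:99")
print(sample["GT"], most_likely_genotype(sample["PL"]))
```

With PL = 174,0,178 the heterozygous genotype has PL 0, which agrees with the called GT of 0/1.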
Questions?