Exome Sequencing as Molecular Diagnostic Tool of Mendelian Diseases
BIOS 6660
Hung-Chun (James) Yu
Shaikh Lab, Department of Pediatrics, University of Colorado
Denver Genetic Laboratories, Children’s Hospital of Colorado
11/17/2015
Human Genetic Diseases
Kaiser J. Science (2012) 338:1016-1017.
Human Genetic Diseases
Complex Disorder
• Polygenic, many genes.
• Low penetrance/effect size.
• Multifactorial, environmental, dietary.
• Examples: heart diseases, diabetes, obesity, autism, etc.
Mendelian Disorder
• Monogenic (mostly).
• Full or high penetrance/effect size.
• Examples: sickle cell anemia (HBB) and cystic fibrosis (CFTR).
Complex Diseases
Multiple causes; polygenic.
Multiple genetic factors, each with low
penetrance individually.
Coronary artery disease
Coriell Institute for Medical Research.
https://cpmc1.coriell.org/genetic-education/diagnosis-versus-increased-risk
Mendelian Diseases
Veltman J.A. et al. Nat. Rev. Genet. (2012) 13:565-575.
Mendelian Diseases
Dominant Inheritance
U.S. National Library of Medicine. http://ghr.nlm.nih.gov/
Mendelian Diseases
Recessive Inheritance
U.S. National Library of Medicine. http://ghr.nlm.nih.gov/
Exome Sequencing
Focusing on exons or coding regions of genes
Exons
Complementary
Baits
Bamshad, MJ., et al. Nat. Rev. Genet. (2011) 12:745-755.
Exome Sequencing
3,000,000,000bp (3Gb) human genome
• ~45% repetitive sequence
• ~25% genic region
• ~2% exonic, coding region
20,000 – 30,000 human genes
• 3,000 – 5,000 disease genes
• ~4,000 human genetic diseases (OMIM)
• 114 medically actionable (treatable) genes
Michael O. Dorschner., et al. Am J Hum Genet. 2013 93: 631–640.
Exome Sequencing
[Figure: a gene shown with its read coverage and individual reads]
Target size: ~40Mb (coding) or 60Mb (coding + UTRs)
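As a quick check on the sizes quoted above, the capture target can be expressed as a share of the ~3Gb genome. This is a minimal sketch: the function name is illustrative, and the sizes are the round numbers from the slides.

```python
GENOME_BP = 3_000_000_000  # ~3Gb human genome (round figure from the slides)

def target_fraction(target_bp, genome_bp=GENOME_BP):
    """Return the share of the genome covered by a capture target."""
    return target_bp / genome_bp

coding = target_fraction(40_000_000)           # ~40Mb, coding only
coding_plus_utr = target_fraction(60_000_000)  # ~60Mb, coding + UTRs
print(f"coding: {coding:.1%}, coding + UTRs: {coding_plus_utr:.1%}")
# prints: coding: 1.3%, coding + UTRs: 2.0%
```

The ~2% figure for the exonic region corresponds to the 60Mb design; the coding-only design is closer to 1.3%.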
Mendelian Diseases Identified by
Exome Sequencing
Timeline
Gilissen C. et al., Genome Biol. (2011) 12:228.
Mendelian Diseases Identified by
Exome Sequencing
Kym M. Boycott et al. Nature Reviews Genetics (2013) 14:681–691
Types of Variation
What kind of variation/mutation can be
detected by Exome Sequencing?
• SNV (single nucleotide variation)
• Small InDel (insertion/deletion <25bp)
• Large InDel, CNV (copy number variation)
  • Possible, but not reliable.
• Aneuploidy (loss/gain of entire chromosome)
  • Possible.
• Translocation
  • Difficult and not reliable.
• Complex rearrangement
  • Very difficult.
Exome Variants
SNV (single nucleotide variation)
• Synonymous: (1) Silent.
• Nonsynonymous: (1) Missense. (2) Nonsense. (3) Stop-loss. (4) Start-gain. (5) Start-loss. (6) Splice-site.
http://upload.wikimedia.org/wikipedia/commons/6/69/Point_mutations-en.png
http://www.webbooks.com/MoBio/Free/Ch5A4.htm
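The codon-level categories above can be illustrated with a small classifier. This is a sketch under stated assumptions, not the method of any annotation tool: it uses the standard genetic code, takes the reference and alternate codons directly, and does not handle splice-site or start-gain variants, which need transcript context. The function name and the `is_start_codon` flag are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...; "*" is a stop.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_coding_snv(ref_codon, alt_codon, is_start_codon=False):
    """Classify a single-base change within one codon (illustrative only)."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if is_start_codon and ref_codon == "ATG" and alt_codon != "ATG":
        return "start-loss"
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"   # gained a stop codon
    if ref_aa == "*":
        return "stop-loss"  # lost the original stop codon
    return "missense"

print(classify_coding_snv("GAG", "GTG"))  # Glu -> Val
print(classify_coding_snv("GAA", "GAG"))  # Glu -> Glu
print(classify_coding_snv("TGG", "TGA"))  # Trp -> stop
```

The first call models a Glu-to-Val change of the kind seen in sickle cell anemia (HBB) and is reported as missense; the second is synonymous and the third nonsense.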
Exome Variants
Small InDel (insertion/deletion <25bp)
• Frameshift
• In-frame
NHGRI Digital Media Database (DMD), http://www.genome.gov/dmd/
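Whether a coding InDel is a frameshift or in-frame depends only on whether its net length change is a multiple of three. A minimal sketch, assuming VCF-style REF/ALT allele strings; the function name is illustrative.

```python
def classify_indel(ref, alt):
    """Classify a coding indel by its net length change (REF/ALT allele strings)."""
    delta = len(alt) - len(ref)
    if delta == 0:
        raise ValueError("REF and ALT have equal length: not an indel")
    # A change that is a multiple of 3 bases keeps the downstream reading frame.
    return "in-frame" if delta % 3 == 0 else "frameshift"

print(classify_indel("ATCT", "A"))  # 3bp deletion -> in-frame
print(classify_indel("A", "AC"))    # 1bp insertion -> frameshift
```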
Variant and Population Frequency
Novel/Private variant
• Never been reported before.
Rare variant
• Minor allele freq. (MAF) < 1%.
Databases
• dbSNP (NCBI): http://www.ncbi.nlm.nih.gov/SNP/
• 1000 Genomes: http://www.1000genomes.org/
• ESP (NHLBI): http://evs.gs.washington.edu/EVS/
• ExAC: http://exac.broadinstitute.org/
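The novel/rare definitions above translate into a simple rule over database lookups. The sketch below is hedged: the dictionary of per-database MAFs is a hypothetical input (no real database API is called), the "common" label is an added name for everything at or above the cutoff, and taking the maximum MAF across databases is one conservative choice among several.

```python
def frequency_class(db_mafs, rare_cutoff=0.01):
    """Classify a variant from per-database MAFs.

    db_mafs maps a database name to its MAF, or None if the variant
    is absent from that database (hypothetical input format).
    """
    observed = [maf for maf in db_mafs.values() if maf is not None]
    if not observed:
        return "novel"  # never reported in any database consulted
    # Conservative assumption: use the highest frequency seen anywhere.
    return "rare" if max(observed) < rare_cutoff else "common"

print(frequency_class({"dbSNP": None, "1000G": None, "ESP": None, "ExAC": None}))
print(frequency_class({"dbSNP": None, "1000G": 0.002, "ESP": 0.004, "ExAC": 0.003}))
print(frequency_class({"dbSNP": 0.12, "1000G": 0.15, "ESP": 0.11, "ExAC": 0.13}))
```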
Exome Variants
How to analyze the enormous number of
variants in any given exome?
• Private/Novel: ~100 - 300
• Rare, MAF<1%: ~500 - 2,000
• Protein altering: ~4,000 - 15,000
• Coding/splice-site: ~10,000 - 30,000
• All: ~20,000 - 200,000
Gilissen C. et al. Eur. J. Hum. Genet. (2012) 20:490-497.
Exome Analysis Strategies
[Figure: pedigree-based analysis strategies. Legend: male; female; affected; heterozygous carrier; sex-linked heterozygous carrier; mating; consanguineous mating; sequenced individual]
Gilissen C. et al., Eur. J. Hum. Genet. (2012) 20:490-497.
Trio-based Exome sequencing
Family trio
• Both unaffected parents and an affected patient.
Why use a trio?
• Every inheritance model can be tested.
• Economical and efficient; only a single case is required.
• Access to samples.
Trio-based Exome sequencing
Inheritance models tested in a trio (pedigree figure; * marks a variant allele in the gene):
• Autosomal dominant: de novo
• Autosomal recessive: homozygous
• Autosomal recessive: compound heterozygous
• X-linked recessive: hemizygous in male
Legend: male; female; affected; heterozygous carrier; sex-linked heterozygous carrier
Trio-based Exome sequencing
Candidate Genes/Variants
• Rare (~500-2,000) or novel (~100-300) protein altering variants
• Plus, variants that fit inheritance model

Model                              Rare variant   Novel variant
Dominant: de novo                  0~2            0~2
Recessive: compound heterozygous   0~20           0~3
Recessive: homozygous              0~20           0~3
X-linked                           0~10           0~5
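The inheritance tests applied to a trio can be sketched from VCF-style genotypes alone. This is an illustrative sketch, not the class pipeline: the function names are hypothetical, genotypes are GT strings such as "0/1", missing calls ("./.") are treated as carrying no variant allele, and unaffected parents are assumed.

```python
def alt_count(gt):
    """Count alternate alleles in a GT string such as '0/1', '1|1' or '1'."""
    return sum(allele == "1" for allele in gt.replace("|", "/").split("/"))

def single_variant_model(child, father, mother, chrom="1", child_is_male=False):
    """Return the inheritance model one variant fits, or None."""
    c, f, m = alt_count(child), alt_count(father), alt_count(mother)
    if chrom == "X" and child_is_male:
        # Hemizygous in an affected male, inherited from a carrier mother.
        return "X-linked recessive" if c >= 1 and f == 0 and m == 1 else None
    if c == 1 and f == 0 and m == 0:
        return "de novo (dominant)"
    if c == 2 and f == 1 and m == 1:
        return "homozygous (recessive)"
    return None

def is_compound_het(gene_variants):
    """gene_variants: (child, father, mother) GT tuples for variants in one gene."""
    from_father = from_mother = False
    for child, father, mother in gene_variants:
        c, f, m = alt_count(child), alt_count(father), alt_count(mother)
        if c == 1 and f == 1 and m == 0:
            from_father = True
        if c == 1 and f == 0 and m == 1:
            from_mother = True
    # Needs at least one heterozygous variant inherited from each parent.
    return from_father and from_mother

print(single_variant_model("0/1", "0/0", "0/0"))
print(single_variant_model("1/1", "0/1", "0/1"))
print(is_compound_het([("0/1", "0/1", "0/0"), ("0/1", "0/0", "0/1")]))
```

Compound heterozygosity is tested per gene rather than per variant, which is why it takes a list of variants; the other models can be decided one variant at a time.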
Case 1
Clinical information
Case 1 was the result of a non-consanguineous union and he
presented to care at four months of age with a seizure
disorder, hypotonia and developmental delay. The patient
underwent a left parietal craniotomy and partial resection of
the frontal cortex without complete resolution of the
seizure disorder.
Initial laboratory studies showed elevated homocysteine
and methylmalonic acid with a normal vitamin B12 level.
Complementation analysis of the patient’s cell line placed the
patient into the cblC class. Additional findings: severe
developmental delay, infantile spasms, gyral cortical
malformation, microcephaly, chorea, undescended testes, and
megacolon.
Sequencing and deletion/duplication analysis (microarray) of the
MMACHC gene were negative in both skin fibroblasts and
peripheral blood.
Case 1
Case 1
9News Colorado: Student joins first-grade class via web (May 15, 2011)
http://archive.9news.com/news/local/article/198634/346/Student-joins-first-grade-class-via-web
Case 1
Monster Max
http://www.maxwatson.org/
Patient's older
sister as a summer
student in Shaikh
Lab
Case 2
Clinical information
The patient was a 7-month-old boy when first evaluated. He
was diagnosed with BPES by a pediatric ophthalmologist. In
addition to blepharophimosis, ptosis, and epicanthus inversus
normally associated with BPES, he had cryptorchidism, right
hydrocele, wide-spaced nipples, and slight 2–3 syndactyly of
toes.
Clinical testing demonstrated a normal karyotype (46,XY)
and normal FISH studies for 22q11.2 deletion and Cri-du-Chat
(5p deletion) syndromes. Thyroid function was normal.
Further, a normal 7-dehydrocholesterol level was used to rule
out Smith–Lemli–Opitz syndrome. Sanger sequencing and
high-resolution CNV analysis with Affymetrix SNP 500K
arrays did not identify a FOXL2 mutation.
Case 2
A-D: 2 months old. Note
blepharophimosis, ptosis, epicanthus
inversus (A), posteriorly angulated
ears with thickened superior helix and
prominent antihelix (B), and slight 2–3
syndactyly of toes in addition to
overlapping toes (C, D).
E-F: 3.5 years old, following
oculoplastic surgery to correct ptosis.
Note right-sided preauricular ear pit (F,
indicated by arrow).
G-I: 12 years old. Note the recurrence
of ptosis (L>R), arched eyebrows,
abnormal ears, thin upper lip
vermilion, small pointed chin,
downsloping shoulders, and wide-spaced and low-set nipples.
Case 3
Clinical information
The proband is a nine-year-old girl who presented with
microcephaly, unilateral retinal coloboma, bilateral optic
nerve hypoplasia, nystagmus, seizures, gastroesophageal reflux,
and developmental delay, including not yet saying specific
words (at 29 months old).
On exam, she has microcephaly with a normal height, a
down-turned upper lip, and fingertip pads. A karyotype and
CGH analysis have been normal. Kabuki (KMT2D and
KDM6A) and Angelman (UBE3A and MECP2) syndromes
were suspected in this patient.
Case 3
Exome NGS Workflow
[Workflow figure, three stages]
• Exome Sequencing: genomic DNA; library construction (QC); exome capture with exome enrichment (QC); sequencing on an Illumina sequencer (QC).
• Mapping and variant detection: sequence read processing (QC); mapping (SAM, BAM); filtering (QC); variant calling.
• Variant prioritization: annotation (general); annotation (in-house); inheritance test, candidate genes, etc.
Tools: Galaxy/FASTX Toolkit, Galaxy/BWA, Galaxy/SAMtools, Galaxy
Exome analysis Workflow (this class)
[Workflow figure]
• Input: FASTQ sequence, 2x90bp (paired-end), from a “Mini” Exome of 100 genes
• Mapping to genome with BWA (Burrows-Wheeler Aligner): SAM
• Conversion: BAM
• QC: filter duplicates, artifacts, and unpaired or unmapped reads
• Variant determination with SAMtools: BCF
• Conversion: VCF
• Filter based on Phred score, mapping quality, read depth, etc.
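Filtering on Phred score relies on the relation between a Phred quality Q and the error probability, P = 10^(-Q/10). The sketch below shows that conversion together with a simple combined filter. The threshold values and function names are hypothetical defaults chosen for illustration, not the settings used in class.

```python
def phred_to_error_prob(q):
    """Convert a Phred-scaled quality to an error probability."""
    return 10 ** (-q / 10)

def passes_filters(qual, mapq, depth, min_qual=20, min_mapq=20, min_depth=10):
    """Keep a call only if every metric meets its (hypothetical) threshold."""
    return qual >= min_qual and mapq >= min_mapq and depth >= min_depth

for q in (10, 20, 30):
    print(q, phred_to_error_prob(q))  # roughly 0.1, 0.01, 0.001

print(passes_filters(qual=50, mapq=60, depth=45))  # True
print(passes_filters(qual=50, mapq=60, depth=4))   # False: too few reads
```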
Data for Case Study
3 trios
• A total of 3 families/cases.
• Each family/case includes both unaffected parents and an affected patient.
VCF files
• Generated from 2x90bp paired-end Exome sequence reads, at ~50X coverage.
• Reads mapped to human GRCh37/hg19 and then familial variant calls made in VCF format.
“Mini” Exome
• 100 genes with/without known disease association.
• Validated causative genes and randomly selected disease genes or non-disease genes.
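To give a feel for what ~50X coverage with 2x90bp reads means, the number of read pairs can be estimated as coverage × target size / bases per pair. This is a rough sketch: the 60Mb target is taken from the earlier coding + UTRs figure rather than from this data set, and the `on_target` parameter (share of sequenced bases landing on the target) is a hypothetical knob that defaults to an idealized 1.0.

```python
import math

def read_pairs_needed(coverage, target_bp, read_len=90, on_target=1.0):
    """Estimate paired-end read pairs needed for a mean coverage on a target."""
    bases_needed = coverage * target_bp / on_target
    return math.ceil(bases_needed / (2 * read_len))

print(read_pairs_needed(50, 60_000_000))                 # 16666667
print(read_pairs_needed(50, 60_000_000, on_target=0.5))  # twice as many pairs
```

Real captures are neither perfectly on target nor uniform, so actual runs need more reads than the idealized figure.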
VCF Format
Variant Call Format
http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
## Meta-information lines
FILTER, INFO, FORMAT
# Header line
VCF Format
FORMAT
GT: Genotype.
0/0: Homozygous normal
0/1: Heterozygous variant
1/1: Homozygous variant
PL: Phred-scaled genotype likelihoods (≥0), one value per genotype.
Example: 174,0,178 for 0/0, 0/1, 1/1 (the genotype with PL 0 is the most likely)
GQ: Genotype quality (1-99)
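The FORMAT fields can be read by pairing the FORMAT column with a sample column. A minimal sketch in plain Python, with no VCF library assumed: the PL values are the ones from the slide, while the GQ value of 99 and the function names are made up for illustration. Normalizing the likelihoods into probabilities assumes equal prior weight for each genotype.

```python
def parse_sample(format_field, sample_field):
    """Pair FORMAT keys (e.g. 'GT:PL:GQ') with one sample column's values."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))

def pl_to_probabilities(pl_field):
    """Turn Phred-scaled likelihoods into normalized genotype probabilities."""
    pls = [int(x) for x in pl_field.split(",")]
    likelihoods = [10 ** (-pl / 10) for pl in pls]
    total = sum(likelihoods)
    return [lk / total for lk in likelihoods]

# GQ of 99 is a hypothetical value; 174,0,178 comes from the slide.
sample = parse_sample("GT:PL:GQ", "0/1:174,0,178:99")
probs = pl_to_probabilities(sample["PL"])
genotypes = ["0/0", "0/1", "1/1"]
print(sample["GT"], genotypes[probs.index(max(probs))])  # both are 0/1
```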
Annotation Tools
Annotate variants with useful information
• Mutation effect
• Population frequency
• Clinical association
• Genomic sequence and protein domain
• Pathogenicity prediction
• Gene expression, protein interaction
• ...and many more.
SeattleSeq: http://snp.gs.washington.edu/SeattleSeqAnnotation138/
VEP (Variant Effect Predictor): http://uswest.ensembl.org/info/docs/tools/vep/
ANNOVAR: http://wannovar.usc.edu/
Questions?