Development and Curation of a Universal Human Genomic Variant
A Unified Clinical Genomics Database
NHGRI - U41 Genomic Resource Grant
www.iccg.org
Variant Analysis for General Genome Report
3-5 million variants
Genes: ~20,000 Coding/Splice Variants
• Published as Disease-Causing: 20-40 “Pathogenic” Variants. Review evidence for variant pathogenicity.
• Rare (<1%) CDS/Splice Variants, LOF in Disease-Associated Genes: 30-50 Variants. Review evidence for gene-disease association and LOF role.
• Pharmacogenetics: 5-10 Variants
Classification of Reported Pathogenic Variants found in Human Genomes
• Pathogenic – 2%
• Likely Pathogenic – 1%
• Uncertain significance – 52%
• Likely Benign – 26%
• Benign – 18%
U41 Genomic Resource Grant:
A Unified Clinical Genomics Database
To raise the quality of patient care by:
• Standardizing the annotation and interpretation of genomic variants
• Sharing variant and case level data through a centralized database for clinical and research use
• Implementing an evidence-based expert consensus process for curating genes and variant interpretations
Supporting data collection, submission and curation
• Work with NCBI to design ClinVar to meet the needs of the community
• Develop data dictionary, ontologies, and work with standards bodies
• Define data submission and access policies for variant and case-level data including genotypes and phenotypes
• Work with labs to solicit and support data submission
• Evidence-based curation of structural variants (Riggs et al. 2012)
• Evidence-based curation of sequence variants (ACMG Committee work in progress)
• Develop a gene-centric resource to define the medical exome and provide tools to support use in genomic medicine
• Work with vendors to improve reagents for genomic analysis (CMA, WES, WGS)
NIH NCBI ClinVar
www.ncbi.nlm.nih.gov/clinvar
ClinVar Submitters                                          Variants   Genes
OMIM                                                           23524    3077
Harvard Medical School and Partners Healthcare                  6996     155
InVitae Inc.                                                    5526       4
International Standards For Cytogenomic Arrays                  4194      46
GeneReviews                                                     2913     287
ARUP Laboratories                                               1415       6
LabCorp                                                         1391     140
Sharing Clinical Reports Project                                 902       2
Finland Institute for Molecular Medicine                         840      39
Tuberous Sclerosis Database                                      431       1
ClinSeq Project                                                  425      35
Leiden Muscular Dystrophy Database                               220      10
GeneDx                                                           205       3
Emory Genetics Laboratory                                         48      13
American College of Medical Genetics and Genomics                 23       1
Osteogenesis Imperfecta Database; University of Leicester         15       3
Ambry Genetics                                                    10       1
Other laboratories (19)                                           52      25
Total                                                          49130    3848
Sequencing Laboratories Which Have Agreed to Share Data
Alfred I Dupont Hospital for Children
All Children's Hospital St. Petersburg
Ambry Laboratories
ARUP
Athena Diagnostics
Baylor Medical Genetic Laboratories
Boston Children's Hospital
Boston University
Children's Hospital of Philadelphia
Children's Mercy Hospital, Kansas City
Cincinnati Children's Hospital
City of Hope Molecular Diagnostic Lab
CureCMD
Denver Genetic Laboratories
Detroit Medical Center
Emory University
Fullerton Genetics Laboratory
GeneDx
Cleveland Clinic
Greenwood Genetics
Harvard-Partners Lab for Molec. Medicine
Henry Ford Hospital
Huntington Medical Research Institutes
Illumina Clinical Services Lab
Indiana University/Purdue University
InSiGHT
LabCorp / Integrated Genetics / Correlagen
Masonic Medical Research Laboratory
Mayo Clinic
Mt. Sinai School of Medicine
Nationwide Children's Hospital
Nemours Biomolecular Core, Jefferson Medical
Oregon Health Sciences University
Providence Sacred Heart Medical Center
Quest Diagnostics
SickKids Molecular Genetic Laboratory
Transgenomic
University of Chicago
University of Michigan
University of Nebraska Medical Center
University of Oklahoma
University of Penn
University of Sydney
University of Washington
Women and Children's Hospital
Wayne State University School of Medicine
Yale University
Documenting arguments will improve the
evidence-based assessment of variants
U41/ClinVar pilot project
Comparison of three laboratories' classifications for variants in 12 RASopathy genes:
BRAF, CBL, HRAS, KRAS, MAP2K1, MAP2K2, NRAS, PTPN11, RAF1, SHOC2, SOS1, SPRED1
Scope                         Number of alleles
Total submitted to ClinVar    997
Multiple assertions           269
20% of alleles with multiple assertions were discrepant (53 discrepancies):
• 60% differed on likelihood only (Benign vs LB, P vs LP)
• 34% differed VUS vs Likely Pathogenic/Likely Benign
• 6% differed VUS vs Pathogenic
Lab Classification Differences
• 84% of differences were Lab A reporting a more aggressive assertion (Pathogenic/Benign) than Lab B/C (LP, LB, VUS)
• 16% of differences were Labs B/C reporting a more aggressive assertion than Lab A
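The pilot's headline percentage can be re-derived from the counts on the slide. The following is a minimal Python sketch that uses only the reported figures (997 alleles submitted, 269 with multiple assertions, 53 discrepancies); the helper name `percent` and the variable names are illustrative and not part of the pilot project.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100.0 * part / whole, 1)


# Counts reported for the RASopathy pilot comparison.
TOTAL_SUBMITTED = 997
MULTIPLE_ASSERTIONS = 269
DISCREPANCIES = 53

# Share of submitted alleles that were classified by more than one lab.
multi_lab_share = percent(MULTIPLE_ASSERTIONS, TOTAL_SUBMITTED)

# Share of multiply-classified alleles on which the labs disagreed.
discrepancy_rate = percent(DISCREPANCIES, MULTIPLE_ASSERTIONS)

print(f"Alleles with multiple assertions: {multi_lab_share}%")
print(f"Discrepant among those: {discrepancy_rate}%")
```

The discrepancy rate works out to 19.7%, which the slide rounds to 20%. The denominator is the 269 alleles with multiple assertions, not all 997 submissions, because a disagreement can only be observed where more than one lab classified the same allele.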
ACMG Lab QA Committee on the Interpretation of Sequence Variants
ACMG: Sue Richards (chair), Heidi Rehm (co-chair), Sherri Bale, David Bick, Soma Das, Wayne Grody, Madhuri Hegde, Elaine Spector
AMP: Julie Gastier-Foster, Elaine Lyon
CAP: Nazneen Aziz, Karl Voelkerding
Evidence supporting pathogenicity (check all that apply):
I. Stand-alone
□ Truncating variant (e.g. nonsense, frameshift, canonical +/-1,2 splice sites, initiation codon) in a gene where loss of function is a known mechanism of disease [1]
□ Same amino acid change as a previously established pathogenic variant regardless of nucleotide change [2]
II. Strong
□ De novo (paternity confirmed) [3]
□ Well-established in vitro or in vivo functional studies supportive of a deleterious effect on the gene or gene product [4]
□ Case-control studies show a p value <0.01 for enrichment in cases [6]
III. Supporting
□ Located in a mutational hot spot and/or experimentally well-characterized functional domain [7]
□ Variant occurs in a gene with high clinical specificity and sensitivity for a particular phenotype and the proband has multiple, specific features of the disease [8]
□ Multiple lines of computational evidence support a deleterious effect on the gene or gene product (conservation, evolutionary, splicing impact, etc) [9]
□ Type of variant fits known pathogenic variant spectrum for the disease [10]
□ Variant frequency in control data: absent from controls in Exome Sequencing Project & 1000 Genomes, OR case-control studies show p value between 0.01-0.05 for enrichment in cases (only applies if well-phenotyped populations are available) and frequency is below highest general population minor allele frequency (MAF) expected for disease: [6]
   General guidance: Autosomal dominant MAF <0.4%
   General guidance: X-linked MAF <0.4% males
   General guidance: Autosomal recessive MAF <1%
□ For recessive disorders, detected in trans with a pathogenic variant [11]
□ Assumed de novo, but without confirmation of paternity [3]
□ In-frame deletions/insertions in a non-repeat region or stop-loss variants [12]
□ Co-segregation with disease [5]
□ Novel missense change at an amino acid residue where a different missense change determined to be pathogenic has been seen before [2]
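The "General guidance" MAF ceilings in the Supporting list can be read as a simple lookup. The sketch below is one possible Python encoding, assuming the ceilings are strict upper bounds and that the caller supplies the male allele frequency for X-linked disorders; the dictionary keys and the function name `below_expected_maf` are hypothetical, and the checklist itself notes that the appropriate value may depend on the disease.

```python
# General-guidance MAF ceilings from the checklist, expressed as fractions.
# Keys are illustrative labels, not terms defined by the committee.
MAF_CEILING = {
    "autosomal_dominant": 0.004,   # MAF < 0.4%
    "x_linked": 0.004,             # MAF < 0.4%, measured in males
    "autosomal_recessive": 0.01,   # MAF < 1%
}


def below_expected_maf(maf: float, inheritance: str) -> bool:
    """Return True if `maf` is below the general-guidance ceiling.

    `maf` is a fraction between 0 and 1 (0.004 means 0.4%).
    Raises KeyError for an inheritance pattern not listed above.
    """
    if not 0.0 <= maf <= 1.0:
        raise ValueError("maf must be a fraction between 0 and 1")
    return maf < MAF_CEILING[inheritance]


# A variant seen at 0.1% passes the dominant ceiling; one at 0.5% does not,
# although 0.5% would still pass the more permissive recessive ceiling.
print(below_expected_maf(0.001, "autosomal_dominant"))
print(below_expected_maf(0.005, "autosomal_dominant"))
print(below_expected_maf(0.005, "autosomal_recessive"))
```

Passing this check satisfies only the frequency half of the criterion; the item also requires either absence from control datasets or the stated case-control enrichment.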
5 Categories: Pathogenic, Likely Pathogenic, Uncertain significance, Likely benign, Benign
Pathogenic = 1 stand-alone OR 2 strong OR 1 strong + ≥3 supporting
Likely Pathogenic = 1 strong + 2 supporting OR ≥4 supporting
Benign = 1 stand-alone OR 2 strong OR 1 strong + ≥3 supporting
Likely benign = 1 strong + 2 supporting OR ≥4 supporting
Evidence supporting benign classification (check all that apply):
I. Stand-alone
□ For autosomal recessive: MAF ≥1% [6]
□ For autosomal dominant: MAF ≥0.4% or lower depending on disease frequency and penetrance [6]
□ For X-linked: MAF ≥0.4% or lower in males depending on disease frequency and penetrance [6]
□ Observed in a healthy adult individual for a recessive (homozygous), dominant (heterozygous), or X-linked (hemizygous) disorder with full penetrance at an early age [6]
II. Strong
□ Well-established in vitro or in vivo functional studies show no deleterious effect on protein function or splicing [4]
□ Observed in trans with a pathogenic variant for a fully penetrant dominant gene/disorder [11]
□ Variant present in multiple mammalian species despite adjacent conservation [9]
III. Supporting
□ Located in a highly variable region without a known function [7]
□ Multiple lines of computational evidence suggest no impact on gene or gene product (conservation, evolutionary, splicing impact, etc) [9]
□ Type of variant does not fit known pathogenic variant spectrum [10]
□ Case-control studies show comparable frequencies (e.g. p > 0.05) [6]
□ Variant in a dominant gene that does not segregate in a family [5] or is found in a case with an alternate cause of disease [13]
□ Observed in cis with a pathogenic variant [11]
*Variants should be classified as Uncertain Significance if other criteria are unmet
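The combining rules for the five categories can be written as a small decision function. The following Python sketch is one reading of those rules, not the committee's implementation. It assumes that each count is a minimum ("1 stand-alone" means at least one) and that a variant meeting both a pathogenic and a benign rule falls back to Uncertain significance; the slides do not say how conflicting evidence is resolved. The names `EvidenceCounts`, `_tier`, and `classify` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvidenceCounts:
    """Number of checked criteria at each evidence level."""
    stand_alone: int = 0
    strong: int = 0
    supporting: int = 0


def _tier(e: EvidenceCounts) -> Optional[str]:
    """Return 'definite', 'likely', or None for one side of the evidence."""
    # 1 stand-alone OR 2 strong OR 1 strong + >=3 supporting
    if e.stand_alone >= 1 or e.strong >= 2 or (e.strong >= 1 and e.supporting >= 3):
        return "definite"
    # 1 strong + 2 supporting OR >=4 supporting
    if (e.strong >= 1 and e.supporting >= 2) or e.supporting >= 4:
        return "likely"
    return None


def classify(pathogenic: EvidenceCounts, benign: EvidenceCounts) -> str:
    """Map evidence counts to one of the five categories."""
    p, b = _tier(pathogenic), _tier(benign)
    if p and b:
        # Assumption: conflicting evidence is treated as uncertain.
        return "Uncertain significance"
    if p:
        return "Pathogenic" if p == "definite" else "Likely Pathogenic"
    if b:
        return "Benign" if b == "definite" else "Likely benign"
    # Other criteria unmet.
    return "Uncertain significance"


# One strong and two supporting pathogenic criteria, no benign evidence.
print(classify(EvidenceCounts(strong=1, supporting=2), EvidenceCounts()))
```

Because the same thresholds apply on both sides, a single helper serves the pathogenic and benign checklists, and only the final labels differ.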
Curation - ClinVar
QC and Expert Consensus
Curation levels:
• Practice guidelines (Guideline)
• Evidence-based review (Expert Curation)
• Inter-laboratory (Multi-Source Curation)
• Intra-laboratory (Single-Source Curation)
• Large variant datasets (Uncurated)
Resources: ClinVar, dbSNP/dbVar
Analysis of LOF Variants - single genome
82 LOF variants below 5% MAF from one case:
• Common - 33
• Rare LOFs, Reported - 8
• Novel/Rare - 41
Outcome after review:
• Pathogenic - 2 (both AR; 1 novel, 1 known)
• VUS - 1 (novel)
• Excluded - 46 (false positive, weak gene-disease association, non-Mendelian, or LOF not a disease mechanism)
Exclusion counts shown: Weak gene-to-disease association - 14; Not Mendelian - 10; False Positives - 13; LOF not a disease mechanism - 2
Update database
Gene-centric resource
Initiated through collaboration amongst CHOP, Emory, and Harvard/Partners and the Structural Variant workgroup
1. Define genes with medical relevance
2. Technical challenges
   • High GC
   • Pseudogenes/homologies
   • Repeat expansions
   • Common sites of structural variation
3. Variant types (denote common vs rare types)
   • Sequence variants (substitutions, small indels)
      • Loss-of-function vs. Gain-of-function
   • CNV – haploinsufficient vs. triplosensitive
   • Other structural changes (translocations, inversions, etc)
   • Imprinted loci
   • Repeat expansions
4. Medically relevant transcripts
5. Gene regions of pathogenic relevance
6. Patterns of inheritance (dominant, recessive, X-linked, mitochondrial, de novo, etc)
7. Phenotypes and evidence base for phenotype associations
8. Available approaches to define variant pathogenicity (assays, tools, etc)
9. Clinical utility measures
10. Clinical decision support opportunities
U41 - Working with Existing Efforts
• NCBI (ClinVar, dbSNP, dbVar, dbGaP, GTR) and EBI
• NHGRI (CRVR, eMERGE, CSER, ROR), IRDiRC
• Regulatory and Standards: ACMG, CAP, CDC, FDA, ASHG, AMP, CMGS, Global Alliance
• Locus Specific Databases (LSDBs – LOVD and non-LOVD)
• InSiGHT, PharmGKB, MSeqDB, CFTR2, ENIGMA, etc
• Human Variome Project and HGVS
• PhenoDB (Ada Hamosh) and Human Phenotype Ontology (Peter Robinson)
• OMIM (Ada Hamosh) and GeneReviews (Bonnie Pagon)
• Patient Advocacy Groups (Genetic Alliance, Patient CrossRoads, UNIQUE, Disease Specific Groups)
• Industry partners (reagents, instruments, software, etc)
ClinGen: The Clinical Genome Resource Program
Collaboration between:
• NHGRI U41 Grant
   – PIs: Ledbetter (Geisinger), Martin (Geisinger), Nussbaum (UCSF), Mitchell (Utah), Rehm (Partners/Harvard)
• NHGRI U01 “Clinically Relevant Variant Resource” Grants
   – Grant 1 PIs: Bustamante (Stanford), Plon (Baylor)
   – Grant 2 PIs: Berg (UNC), Ledbetter (Geisinger), Watson (ACMG)
• NCBI
   – ClinVar
ClinGen Delegation of Responsibilities
Figure: responsibilities divided among the U41 grant, the UNC/Geisinger/ACMG U01 grant, and the Stanford/Baylor U01 grant.
Figure labels: Data Collection; Curation; Structural Variation; Sequence Variation; Variant Curation – Clinical Significance; Gene-Variant Pairs – Actionability; Other Genomic Data; Clinical Domain Curation; Phenotype; Machine Learning; IT/Biofx; Community; Data Extraction; Education; Data Analysis; ELSI/Actionability; Data Dissemination; Laboratory; Bioinformatics/IT; EHR Integration; Patient Registry
ClinGen System Interactions
Figure: Expert Curation of Genes and Variants by Clinical Domain and Disease Area Workgroups.
Figure labels: Private; Labs (Genotypes & Phenotypes); Patient Registries; Controlled Access; Public Access; LSDBs; dbGaP; OMIM; Medical Lit; Case-level Data; Crowdsourced Curation; PharmGKB; ClinVar; Variant-level Data; Gene Resource (Medical Exome, Actionability); CNV Curation Tool (JIRA); Population Datasets; Expert Curated Variants; CoreDB; External Informatics Activities Enabled; Application Interface; Machine Learning Algorithms; EHR Interface; Portal for the Public; Disease Area Curation Tool; Disease WGs; Clinical Domain WGs
International Collaboration for Clinical Genomics
– Over 190 institutional members
– Over 2800 individual members
Annual Conference June 10-12, 2014, Bethesda, MD
– Attendees include laboratory directors, physicians, genetic counselors, researchers, parents, government employees, regulatory agency representatives, and vendor partners
U41 Principal Investigators and Workgroups
NIH U41 PIs: David Ledbetter (Geisinger), Christa Martin (Geisinger), Joyce Mitchell (Utah), Robert Nussbaum (UCSF), Heidi Rehm (Harvard)
Sequence Variant Workgroup
Madhuri Hegde (co-chair, Emory)
Sherri Bale (co-chair, GeneDx)
Carlos Bustamante (Stanford)
Soma Das (U Chicago)
Matt Ferber (Mayo)
Birgit Funke (Harvard/MGH)
Marc Greenblatt (UVM)
Elaine Lyon (ARUP)
Donna Maglott (NCBI)
Sharon Plon (Baylor)
Heidi Rehm (Harvard/Partners)
Avni Santani (CHOP)
Patrick Willems (Gendia)
Structural Variant Workgroup
Erik Thorland (co-chair, Mayo)
Swaroop Aradhya (co-chair, InVitae)
Deanna Church (NCBI)
Hutton Kearney (Fullerton)
Charles Lee (Jackson Labs)
Christa Martin (Emory)
Sarah South (ARUP)
Chad Shaw (Baylor)
Karin Wain (Utah)
Phenotyping Workgroup
David Miller (chair, Harvard)
Ada Hamosh (Hopkins)
Karen Eilbeck (Utah)
Monica Giovanni (Geisinger)
Robert Green (Harvard/BWH)
Mike Murray (Geisinger)
Robert Nussbaum (UCSF)
Erin Riggs (Emory)
Peter Robinson (Berlin)
Steven Van Vooren (Cartagenia)
Patrick Willems (Gendia)
Engagement, Education and Access Workgroup
Andy Faucett (chair, Geisinger)
Erin Riggs (Emory)
Danielle Metterville (Partners)
Genetic Counselors from participating laboratories
Bioinformatics and IT Workgroup
Karen Eilbeck (co-chair) and Sandy Aronson (co-chair)
ARUP: Brendon O’Fallon; Cartagenia: Steven Van Vooren; Emory: Stuart Tinker; GeneDx: Rhonda Brandon, Lisa Vincent; Mayo: Eric Klee; NCBI: Deanna Church, Jennifer Lee, Donna Maglott, George Riley; Partners Healthcare: Eugene Clark, Larry Babb, Matt Varugheese; University of Chicago: Teja Nelakuditi; Utah: Karen Eilbeck, Shawn Rynearson
Consultants
Les Biesecker, Johan den Dunnen, Robert Green, Ada Hamosh, Laird Jackson, Stephen Kingsmore,
Jim Ostell, Sue Richards, Peter Robinson, Lisa Salberg, Joan Scott, Sharon Terry