BIOBASE Training
Human Gene Mutation Database (HGMD®)
The only comprehensive source of data on human inherited
disease-associated mutations
Sample to Insight
A comprehensive source of mutation data
• Focus on peer-reviewed scientific literature
• Experimental results are extracted by highly trained genetic experts
• Content is updated 4x per year
More than 170,000 curated mutations
HGMD® Professional, Spring 2015.2 Release

Mutation Type                      Number of Entries
Micro Lesions:
  Missense / Nonsense                          94860
  Splicing                                     15476
  Regulatory                                    3242
  Small Deletions                              25454
  Small Insertions                             10617
  Small Indels                                  2436
Gross Lesions:
  Repeat Variations                              476
  Gross Insertions / Duplication                3086
  Complex Rearrangements                        1638
  Gross Deletions                              12833
Total                                         170118
HGMD® advantages
HGMD® is the industry standard for:
• Identifying the known genetic causes of a given inherited disease
• Understanding the mutational spectrum of a particular gene
• Verifying novel mutations
• Assessing individual disease risk
• Reducing time for literature review relating to a given inherited disease
LRRK2
Mutation report for CM074929
Categorization of mutations & polymorphisms
DM = Disease causing (pathological) mutation
DM? = Likely disease causing (likely pathological) mutation
DP = Disease associated polymorphism
DFP = Disease associated polymorphism with additional supporting functional evidence
FP = Polymorphism affecting the structure, function or expression of a gene but with no disease association reported yet
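The category codes above lend themselves to a simple lookup when filtering variant records. The following is a minimal sketch only: the record layout (`accession`, `variant_class` keys) and the example accessions are hypothetical and do not reflect the actual HGMD export format.

```python
# Category codes and descriptions as listed on the slide.
HGMD_CLASSES = {
    "DM": "Disease causing (pathological) mutation",
    "DM?": "Likely disease causing (likely pathological) mutation",
    "DP": "Disease associated polymorphism",
    "DFP": "Disease associated polymorphism with additional supporting "
           "functional evidence",
    "FP": "Polymorphism affecting the structure, function or expression "
          "of a gene but with no disease association reported yet",
}


def filter_by_class(records, wanted=("DM", "DM?")):
    """Keep records whose 'variant_class' is one of the wanted codes."""
    unknown = {r["variant_class"] for r in records} - set(HGMD_CLASSES)
    if unknown:
        raise ValueError(f"unknown variant class codes: {sorted(unknown)}")
    return [r for r in records if r["variant_class"] in wanted]


# Hypothetical records, for illustration only.
records = [
    {"accession": "EXAMPLE1", "variant_class": "DM"},
    {"accession": "EXAMPLE2", "variant_class": "DP"},
    {"accession": "EXAMPLE3", "variant_class": "DM?"},
]
selected = filter_by_class(records)
print([r["accession"] for r in selected])
```

Restricting to DM and DM? is just one possible default; an analysis of common-disease risk might instead keep DP and DFP.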
PGMD™
• Comprehensive pharmacogenomic database
• PGx/ADME panels
• FDA and EMA approved drugs containing PGx labels
PGMD: PharmacoGenomic Mutation Database
Facilitates mapping of variants onto genome at position or genotype level
Associations from 6500+ publications from 500+ journals studying >1400 drugs
A/C
Types of evidence
Genotype/haplotype specific findings
• Median dose requirement of warfarin in patients with CYP2C9*1/CYP2C9*3 haplotype is 2.6 mg
Statistical significance
• p-value: 0.001
• Relative Risk, Hazard Ratio, 95% Confidence Interval when available
Study details (all studies are in vivo)
• 22 cases with A/C genotype, 159 subjects studied, Design: Clinical Trial
• Pop: European Continental Ancestry Group, Age: 24-95, Treatment: All patients are treated with 0.5 mg to 10 mg/day of warfarin
Linkage Disequilibrium
HapMap D', LOD, and r² scores
Computed for all PGMD sites
Includes between non-PGMD sites
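For readers unfamiliar with the scores named above, the sketch below shows how D' and r² are conventionally derived from haplotype and allele frequencies at two biallelic sites. It uses the standard textbook definitions and is not taken from the PGMD or HapMap pipelines; the LOD score is not computed here, and the function name and inputs are illustrative.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic sites.

    p_ab: frequency of the haplotype carrying allele A at site 1
          and allele B at site 2
    p_a:  frequency of allele A at site 1
    p_b:  frequency of allele B at site 2
    """
    d = p_ab - p_a * p_b
    # D is normalised by its largest attainable magnitude given the
    # allele frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Two sites whose alleles always travel together: complete LD.
d, d_prime, r2 = linkage_disequilibrium(p_ab=0.5, p_a=0.5, p_b=0.5)
print(d, d_prime, r2)
```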
Allele frequencies
Major sources including:
EVS
1000 Genomes
HapMap
Delivery models
Online
Download
PGMD Web Interface
Subject specific annotation via Genome Trax
MySQL database
Custom
TSV
Pipeline Integration
BED
GFF
Genome Trax™
NGS analysis pipeline
Genome Trax™
Track                                           Release 2015.1
HGMD® inherited disease mutations                      146,581
HGMD® imputed mutations                                 14,570
Pharmacogenomic Variants                               806,806
GWAS Catalogue                                          18,735
COSMIC somatic disease mutations                     2,626,811
ClinVar                                                127,638
TRANSFAC® experimentally verified TFBS                  15,330
ChIP-seq Transcription Factor Binding Sites          9,178,528
Predicted TF@DNase I hypersensitivity sites         10,732,462
miRNA gene sites                                         2,735
PTMs (Post-Translational Modifications)                 35,079
PROTEOME™ disease genes                                 14,905
PROTEOME™ Drug target genes                              2,976
PROTEOME™ Pathway genes                                  2,057
HGMD® disease genes                                     27,257
SIFT & PolyPhen predictions, conservation           88,986,833
EVS allele frequencies                               3,663,071
Allele frequency from 1000 Genomes                  12,330,177
dbSNP common SNPs                                   13,604,359
dbSNP                                               60,879,061

Track categories: Disease causing variants; Regulatory variants; Candidate genes; Function prediction & frequency
Over 190 million annotations total
Use it as you like it
Download
Flat files, MySQL dump
Use with genome browsers, Excel, tools, scripts, ANNOVAR, CLC bio Workbenches, Alamut, Cartagenia…
HGMD – inherited mutations
HGMD imputed
HGMD
CAC (Histidine) changing to CAA (Glutamine) is causative for disease X
CAC > CAG leads to the same Histidine to Glutamine change but would not be a match for the mutation
The HGMD equivalent track covers such cases
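The codon example above can be reproduced with the standard genetic code. The sketch below enumerates the other single-nucleotide changes of a reference codon that give the same amino acid as a known mutation. It only illustrates the idea; how the imputed track is actually built is not described in the slides, and the function name is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def equivalent_codons(ref_codon, alt_codon):
    """Single-nucleotide variants of ref_codon, other than alt_codon,
    that encode the same amino acid as alt_codon."""
    target = CODON_TABLE[alt_codon]
    hits = []
    for pos in range(3):
        for base in "ACGT":
            if base == ref_codon[pos]:
                continue
            candidate = ref_codon[:pos] + base + ref_codon[pos + 1:]
            if candidate != alt_codon and CODON_TABLE[candidate] == target:
                hits.append(candidate)
    return hits


# The slide's example: His (CAC) to Gln via CAA is the known mutation.
print(CODON_TABLE["CAC"], CODON_TABLE["CAA"], equivalent_codons("CAC", "CAA"))
```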
ClinVar Variants
Version:
ClinVar-2015-02
Track Description:
This track contains data from ClinVar, a public archive of reports that list relationships
between human variations and phenotypes, with supporting evidence. ClinVar thus
facilitates access to and communication about the relationships asserted between human
variation and observed health status, and about how the interpretation of variation may change
over time. ClinVar collects reports of variants found in patient samples, assertions made regarding
their clinical significance, information about the submitter, and other supporting data. The alleles
described in the submissions are mapped to reference sequences and reported according to the
HGVS standard.
Benefit:
This data set contains experimentally observed, clinically significant variants that are reviewed by
experts.
Filename: clinvar
Link-out base URL: http://preview.ncbi.nlm.nih.gov/clinvar/$$
Links to: An individual variant report in ClinVar site at NCBI.
Accession: ClinVar ID.
Feature:
HGVS description and the phenotype, e.g. "NT_011109.15:g.14128514A>G:Diaphyseal dysplasia;"
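The link-out base URL for this track ends in `$$`. Assuming that `$$` is a placeholder to be replaced by the record's accession (the slides do not say so explicitly), a link can be built as in the sketch below. The helper name and the accession value are made up for illustration.

```python
from urllib.parse import quote


def expand_linkout(base_url, accession):
    """Substitute an accession for the $$ placeholder in a link-out URL."""
    if "$$" not in base_url:
        raise ValueError("link-out base URL has no $$ placeholder")
    # Percent-encode the accession so it is safe inside a URL.
    return base_url.replace("$$", quote(str(accession), safe=""))


# Base URL as given for the ClinVar track; "12345" is a made-up ClinVar ID.
CLINVAR_BASE = "http://preview.ncbi.nlm.nih.gov/clinvar/$$"
url = expand_linkout(CLINVAR_BASE, "12345")
print(url)
```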
COSMIC somatic disease mutations
Version: v71
Track Description:
This track contains data from the Catalogue of Somatic Mutations in Cancer (COSMIC).
COSMIC contains somatic mutation information relating to human cancers. The mutation data and
associated information are extracted from the primary literature and entered into the COSMIC
database. To provide a consistent view of the data, a histology and tissue ontology has been
created and all mutations are mapped to a single version of each gene. A central aim of COSMIC is
to provide somatic mutation frequencies. This track contains SNPs, insertions and deletions from
COSMIC.
We include COSMIC mutations for which a chromosomal position can be determined.
Approximately 75% of mutations have a position.
Benefit:
These somatic mutations complement the set of germ-line mutations from HGMD to allow for a
more comprehensive assessment of prior knowledge about observed mutations.
Filename: cosmic
Link-out base URL: http://www.sanger.ac.uk/perl/genetics/CGP/cosmic?action=mut_summary&id=$$
Links to: An individual mutation report on the COSMIC site at the Wellcome Trust Sanger Institute.
Accession: COSMIC Mutation ID.
Feature:
The histology and mutational change, e.g. "carcinoma:c.775G>T".
EVS Exome Variations
Version:
ESP6500
Track Description:
The EVS annotation source contains exome sequencing variants retrieved from the Exome
Variant Server (EVS) for the NHLBI Exome Sequencing Project (ESP). The EVS data release
(ESP6500) comprises 2203 African-American and 4300 European-American unrelated
individuals, totaling 6503 samples (13,006 chromosomes). All data were
simultaneously analyzed for exome variants at the University of Michigan (Abecasis Laboratory).
The methods used for analysis are explained in detail at http://evs.gs.washington.edu/EVS/
Benefit:
EVS provides population-based genotypes, allele counts and MAF scores for the variations
observed in exome regions.
Filename:
evs
Accession:
A unique number identifying the EVS record, e.g. EVS2265387
Feature:
rsID and HGNC symbol of the gene, e.g. "rs138751118:C4orf21".
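To make the relationship between allele counts and MAF concrete, the sketch below computes a minor allele frequency for a biallelic site. The counts are invented; only the chromosome total (13,006) comes from the ESP6500 description above. The exact way EVS reports MAF (for example as a fraction or a percentage) should be checked against the EVS documentation.

```python
def minor_allele_frequency(allele_counts):
    """Minor allele frequency for a biallelic site.

    allele_counts maps each allele to the number of chromosomes carrying it.
    """
    if len(allele_counts) != 2:
        raise ValueError("this sketch handles biallelic sites only")
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no chromosomes observed")
    return min(allele_counts.values()) / total


# Hypothetical counts summing to the 13,006 chromosomes in ESP6500.
counts = {"C": 12996, "T": 10}
maf = minor_allele_frequency(counts)
print(f"{maf:.6f}")
```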
Orphanet (Beta)
Version:
02/18/2015
Track Description:
Orphanet is the reference portal for information on rare diseases and orphan drugs, for all
audiences. Orphanet's aim is to help improve the diagnosis, care and treatment of
patients with rare diseases.
Benefit:
Allows you to associate known patterns of inheritance (dominant, recessive) with rare
diseases and the genes implicated in them. Together with the observed zygosity and the
disease causing mutations in HGMD, this can help you to focus only on dominant disease
causing variants, or on recessive disease causing variants that are homozygous in the
patient sample.
Filename:
Orpha
Accession:
The numerical part of the 'Orpha number', for example 79314 for the 'Orpha
number' ORPHA79314
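The filtering rule described under Benefit can be sketched as a small predicate. The field names (`gene`, `inheritance`, `zygosity`) and the example variants are hypothetical and not the Genome Trax schema; the rule itself follows the slide: keep dominant disease variants, and keep recessive ones only when homozygous.

```python
def passes_inheritance_filter(inheritance, zygosity):
    """Keep dominant variants, and recessive variants only if homozygous."""
    if inheritance == "dominant":
        return True
    if inheritance == "recessive":
        return zygosity == "homozygous"
    return False  # unknown inheritance pattern: do not keep


# Hypothetical annotated variants from one patient sample.
variants = [
    {"gene": "GENE_A", "inheritance": "dominant", "zygosity": "heterozygous"},
    {"gene": "GENE_B", "inheritance": "recessive", "zygosity": "heterozygous"},
    {"gene": "GENE_C", "inheritance": "recessive", "zygosity": "homozygous"},
]
kept = [v["gene"] for v in variants
        if passes_inheritance_filter(v["inheritance"], v["zygosity"])]
print(kept)
```

This simple rule drops every heterozygous recessive variant, so a compound heterozygous patient (two different variants in the same recessive gene) would need additional per-gene logic.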
GWAS Catalogue
Version:
02/17/2015
Track Description:
This track contains data from the GWAS Catalogue. These are literature-derived disease
associations for polymorphisms from GWAS studies that assayed at least 100,000 single
nucleotide polymorphisms; the associations listed are limited to those with p-values < 1.0 x 10^-5.
The dataset provides Odds Ratios for common variants that can be used to calculate increased or
decreased risk for the disease. A detailed description of the methods to assemble the dataset can
be found in Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, and
Manolio TA. Potential etiologic and functional implications of genome-wide association loci for
human diseases and traits. Proc Natl Acad Sci USA. May 27, 2009, available at
http://www.genome.gov/pages/about/od/newsandfeatures/pnasgwasonlinecatalog.pdf, and at the
GWAS Catalogue at www.genome.gov/gwastudies.
Benefit: These disease association data are manually curated, experimentally determined
associations from the scientific literature, mapped to coordinates. They allow you to identify
common SNPs that influence the risk for common diseases.
Filename: gwas
Link-out base URL: http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=$$
Links to: dbSNP record. As the GWAS Catalogue does not provide reports for the individual SNPs,
we link to dbSNP instead.
Accession: dbSNP rsID
Feature: The disease, risk allele, and odds ratio or beta (denoted by OR or beta), e.g.
"Ovarian_cancer; rs2363956-T;1.1OR"
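A feature string of this shape can be split into its parts as sketched below. The layout (three semicolon-separated fields, the last one a number followed by OR or beta) is inferred from the single example on the slide, so treat the parser as an assumption rather than a format specification.

```python
import re


def parse_gwas_feature(feature):
    """Split 'disease; rsid-allele; <value>OR|beta' into a dict."""
    parts = [p.strip() for p in feature.split(";")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}: {feature!r}")
    disease, risk, effect = parts
    rsid, _, allele = risk.partition("-")
    match = re.fullmatch(r"(-?[0-9.]+)\s*(OR|beta)", effect)
    if match is None:
        raise ValueError(f"unrecognised effect field: {effect!r}")
    return {
        "disease": disease,
        "rsid": rsid,
        "risk_allele": allele,
        "effect": float(match.group(1)),
        "effect_type": match.group(2),
    }


parsed = parse_gwas_feature("Ovarian_cancer; rs2363956-T;1.1OR")
print(parsed)
```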
dbNSFP Nonsynonymous functional predictions
Version:
v2.9
Track Description:
This track contains data from dbNSFP (Database for Non-synonymous SNPs Functional
Predictions). dbNSFP is an integrated database of functional predictions from
multiple algorithms for the comprehensive collection of human non-synonymous SNPs (NSs). It
compiles prediction scores from four new and popular algorithms (SIFT, Polyphen2, LRT, and
MutationTaster), along with a conservation score (PhyloP) and other related information, for every
potential NS SNP in the human genome. More details about the prediction methods are available
at http://www.ncbi.nlm.nih.gov/pubmed/21520341
Benefit:
This track also provides a calculated consensus prediction based on the results from the different
prediction algorithms in dbNSFP. Each NS is assigned a category according to its
deleterious tendency ("Probably Deleterious", "Unknown", "Probably Harmless", "Harmless").
Filename:
dbnsfp
Accession:
Gene ID; eg: "85440"
Feature:
Amino acid reference > amino acid alternate: consensus prediction; e.g. > N:
Probably Deleterious 50%.
TRANSFAC – gene regulation
PROTEOME – candidate genes
PROTEOME – disease genes & drugs
Trio dataset from clinical practice

Bloom Syndrome                        Our Patient
Autosomal recessive                   Compound heterozygote
Short stature                         Short stature
Facial anomalies                      Facial anomalies
Skin hypo- and hyperpigmentation      Skin hypo- and hyperpigmentation
Feeding difficulties                  Feeding difficulties
Mild intellectual disability          Severe intellectual disability
Cancer predisposition                 Cancer predisposition
Frequent childhood infections         No frequent infections

After 20 years, following Genome Trax trio analysis, the patient was finally diagnosed with
BLOOM SYNDROME
ANNOVAR Introduction
Stand-alone Application
Database preparation
ANNOVAR requires the annotation databases to be saved on local disk for annotating
genetic variants.
A simple command can be issued to download a database directly from the
internet (from the UCSC browser, the 1000 Genomes Project or the ANNOVAR website).
annotate_variation.pl -downdb [optional arguments] <table-name> <output-directory-name>
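As an illustration of the template above, the sketch below assembles a download command from Python without running it. The `-buildver` and `-webfrom` options are, to the best of my knowledge, accepted by `annotate_variation.pl`, but the slides do not list the optional arguments, so check them against the ANNOVAR documentation. The table name, build and output directory are example values.

```python
import shlex


def downdb_command(table, out_dir, buildver="hg19", webfrom=None):
    """Build (but do not run) an ANNOVAR database download command."""
    cmd = ["annotate_variation.pl", "-downdb", "-buildver", buildver]
    if webfrom:
        # Assumed option: selects the download source, e.g. "annovar".
        cmd += ["-webfrom", webfrom]
    cmd += [table, out_dir]
    return cmd


cmd = downdb_command("refGene", "humandb/", webfrom="annovar")
# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine where ANNOVAR is installed and on the `PATH`.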
Database preparation
Gene anno databases
• gene / refgene / refGene
• knowngene / knownGene
• ensgene / ensGene
Filter databases
• 1000g2012apr
• snp137
• snp135
Region anno databases
• cytoBand
• tfbsConsSites
• genomicSuperDups
• omimGene
Database download
Input files
ANNOVAR takes text-based input files, where each line corresponds to one
variant.
On each line, the first five space- or tab-delimited columns represent:
chromosome
start position
end position
ref nucleotides
obs nucleotides
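The five-column layout just described can be read with a few lines of Python. This is a minimal sketch: the `Variant` type and the example line (including its coordinates) are made up for illustration, and any columns after the fifth are simply carried along.

```python
from typing import NamedTuple, Tuple


class Variant(NamedTuple):
    chrom: str
    start: int
    end: int
    ref: str
    obs: str
    extra: Tuple[str, ...]  # any columns after the first five


def parse_annovar_line(line):
    """Parse one space- or tab-delimited ANNOVAR input line."""
    fields = line.split()  # splits on any run of spaces or tabs
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}")
    chrom, start, end, ref, obs = fields[:5]
    return Variant(chrom, int(start), int(end), ref, obs, tuple(fields[5:]))


# Made-up example line mixing tabs and spaces.
variant = parse_annovar_line("17\t1000 1000\tG A sample1")
print(variant)
```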
Profiling Breast Cancer variants – Input file
Isolate tumor-specific variants by removing the germline variants
This file, containing the filtered results, is used as input for gene-based annotation,
which extracts variants in the exonic, intronic, intergenic and other regions
Profiling Breast Cancer variants
This result file can be searched for specific, high risk genes such as TP53,
BRCA1 and BRCA2
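Searching the result file for these genes can be sketched as follows. The sketch assumes a tab-delimited result in which the first column is the region type and the second column holds the gene name or names, possibly with details in parentheses, followed by the original input columns. That layout is an assumption about the gene-based annotation output and is not shown in the slides; the example lines and coordinates are invented.

```python
import re

HIGH_RISK_GENES = {"TP53", "BRCA1", "BRCA2"}


def genes_in_field(field):
    """Extract gene symbols from a gene column such as 'A(dist=1),B(dist=2)'."""
    cleaned = re.sub(r"\([^)]*\)", "", field)  # drop parenthesised details
    return {gene for gene in re.split(r"[,;]", cleaned) if gene}


def find_gene_hits(lines, genes=HIGH_RISK_GENES):
    """Return the split rows whose gene column mentions any gene of interest."""
    hits = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 2 and genes_in_field(cols[1]) & set(genes):
            hits.append(cols)
    return hits


# Invented result lines, for illustration only.
lines = [
    "exonic\tTP53\t17\t1000\t1000\tG\tA",
    "intergenic\tGENE1(dist=100),GENE2(dist=200)\t1\t500\t500\tA\tG",
    "intronic\tBRCA2\t13\t2000\t2000\tC\tT",
]
hits = find_gene_hits(lines)
print([(row[0], row[1]) for row in hits])
```

Note that an intergenic variant flanking one of these genes would also match; filter on the first column as well if only genic hits are wanted.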