Genetics in Epidemiology - University of Pittsburgh

Download Report

Transcript Genetics in Epidemiology - University of Pittsburgh

Genetics in Epidemiology
Nazarbayev University
July 2012
Jan Dorman, PhD
University of Pittsburgh
Pittsburgh, PA, USA
[email protected]
Genetics in Epidemiology
• Is important because
– It focuses on heritable & non-modifiable
determinants of disease
– It allows examination of gene-gene & geneenvironment interactions
– It can contribute to personalized medicine
• Is being transformed because
– Human Genome Project is complete
– Genetic variation can be now examined across
the entire genome at a very low cost
– Contribution of GWAS has been enormous in
terms of identifying disease-susceptibility genes
Human Genome Project
• February 2010 marked the
10th anniversary of the
completion of the human
genome project
• Initial sequence was
finished early because of
advancements in genome
sequence technology
• Resulted in drastically
reduced labor & delivery
costs
Human Genome Sequencing Costs
• 2000 – Human Genome Project
– $3 billion
• 2007 – James Watson
– $2 million
• 2009 – Illumina & Helicos
– $50,000
• 2010 – Illumina HiSeq
– $10,000
• 2014 – Multiple companies
– $1,000
Genetics in Epidemiology
• Is there evidence of familial aggregation of
the disorder (phenotype)?
– Is a positive family history an independent risk
factor for the disorder?
• For many chronic disorders, a positive family
history is associated with odds ratios between 2-6
• Is there evidence of heritability?
– A heritability of 50% indicates that ~ ½ of the
variation in disease risk in a population is due to
genetics
J Intern Med 2008;263:16
Candidate Gene Approach
• Are there potential candidate genes?
– Genes that are selected based on known
biological, physiological, or functional relevance to
the phenotype under investigation
– Approach is limited by its reliance on existing
knowledge about the biology of disease
– Associations may be population-specific
• E.g., type 2 diabetes
– Genes encoding molecules known to primarily
influence pancreatic β-cell or insulin action
• ABCC8 (sulphonylurea receptor), INS, INSR, etc.
PLOS Bio 2003;1:41
Alternative Approach
• Genome-wide association studies (GWAS)
– Hypothesis: common genetic variants (>5%) ;
common diseases (traits)
• Limited number of variants, each with a small effect
• No a priori hypotheses
• Power to identify rare variants (1-5%) is limited
– First publication was in 2005
• Complement factor H & age-related macular
degeneration
– Require
• Large, well-characterized populations
• Genotyping across the entire genome
• Sophisticated data analysis – collaborate on this!!
Monogenetic vs. Common
Disorders
GWAS
• 2 tiered approach
– 1st tier: genotyping identifies the ‘discovery set’
– 2nd tier: discovery set genotyped in another
population
• Replication is a requirement for publication
– 3rd tier: rule out false positives & false negatives
• Requires consortia
• Possible because
– High-density genotype platforms
• By 2007 – chips contained 500,000 – 1,000,000
markers
– DNA samples were available from wellcharacterized epidemiological cohorts
GWAS Example
NEJM 2010; 362:166
GWAS Example
NEJM 2010; 362:166
GWAS Example
NEJM 2010; 362:166
GWAS
• Have identified novel gene-disease (trait)
associations
– Most alleles are common (>5%)
– Most have small effect sizes (OR ~1.5)
• Are providing insights into pathways of
complex diseases
Published Genome-Wide Associations
published for 249 traits
NHGRI GWA Catalog
www.genome.gov/GWAStudies
Genetics Review
Anatomy of the Cell
Chromosomes, Genes & DNA
• Somatic cells are diploid - 46 chromosomes
– 22 pairs autosomes; 1 pair sex chromosomes
• Each pair of autosomes is homologous
– Contains the same genes in the same order
– 1 is maternal, the other is paternal
• Chromosome are composed of
deoxyribonucleic acid (DNA)
– Genome contains 3 billion base pairs (haploid)
– ~1% encode proteins
• Genes are located on chromosomes
Human Karyogram
Figure of a
Chromosome
DNA Double Helix
Base Pairs of a Double Helix
T
C
A
G
Structure of a Gene
A gene is a functional unit that includes
introns, exons enhancer & promoter
sequences & untranslated sequences at the
5’ & 3’ ends
Transcription Results in mRNA
Primary Transcript
mRNA Processing
From Genes to Proteins via mRNA
• Proteins consist of 1+ polypeptide chains
• Polypeptides chains are made of amino acids
• There are 20 amino acids
– Their order in is determined by the mRNA
sequence read in triplet
• Genetic code
– 64 combinations of 3 bases called codons
– 3 are stop codons (UAA, UGA, UAG)
• Genetic code is degenerate
• Genetic code is universal
Genetic Code
mRNA Determines AA Sequence
Translation is Protein Synthesis
Post-Translation Modifications
Advancements in
Biotechnology
Original
Method for
DNA
Sequencing
Polymerase Chain Reaction (PCR)
• Revolutionized molecular genetics
• Exploits the in vivo processes of DNA
replication to copy short DNA fragments in
vitro within a few hours
• Exponential increase of target DNA
sequences
• Highly sensitive – need small amount of
template DNA
• DNA ‘photocopier’
PCR - Cycle 1
5’ A C G T T A C C G T G A A C G T C T T A 3’
Denaturation, ~30 seconds
H bonds dissolve at 95oC
3’ T G C A A T G G C A C T T G C A G A A T 5’
PCR - Cycle 1
5’ A C G T T A C C G T G A A C G T C T T A 3’
3’ C A G A AT 5’
Anneal primers, ~30 seconds at 35-65oC
Temperature determined by sequence / length
5’ A C G T T A 3’
3’ T G C A A T G G C A C T T G C A G A A T 5’
PCR - Cycle 1
5’ A C G T T A C C G T G A A C G T C T T A 3’
T G C A A T G G C A C T T G C A G A A T 5’
Extension of Primers, ~30 seconds at 70-75oC
Taq polymerase - thermostable
5’ A C G T T A C C G T G A A C G T C T T A
3’ T G C A A T G G C A C T T G C A G A A T 5’
Post-Genome Era
Human Genetic Variation
• Single nucleotide polymorphisms (SNPs)
• Tandem repeat Sequences
– Microsatellites (<8 bp)
– Minisatellites (VNTRs; 8-100 bp)
• Copy number variants (CNVs; 1Kb – 1Mb)
• Insertions – deletions (indels; 100bp – 1Kb)
• Note: size limitations are arbitrary – no
biological basis & definitions are not
consistent across studies
SNPs
•
•
•
•
~10 million SNPs in human genome & counting
Most common type of genetic variation
2 alleles; e.g., A → T
Occurs across the entire genome & in stable
regions
• Many SNPs are in linkage disequilibrium
– SNPs close together are more likely to travel together
in a block than SNPs far apart
– Can use 1 ‘tagging’ SNP per block – cost effective
Linkage Disequilibrium
Haplotype Block
NEJM 2007; 356:1094
SNPs ‘Tag’ Haplotype Blocks
NEJM 2007; 356:1094
International HapMap
• Emerged as next logical step after
sequencing human genome
• Goal was to create a public genome-wide
database of common genetic variants
• Genotyped SNPs from 270 samples from:
– Nigeria, Utah, Han Chinese, Japanese
• Phase I
– Typed 1 million common SNPs (>5%) to
characterize LD patterns
• Phase II
– Typed 3 million rare SNPs (1-5%)
DNA Microarray
Used to genotype 500,000 – 1+ million SNPs
International HapMap
• Where are the SNPs?
–
–
–
–
–
12% occur in protein coding regions
8% occur in gene regulatory regions
40% occur in non-coding introns
40% occur in intergenic sequences
Regions of high linkage disequilibrium are similar
across populations
• HapMap was instrumental in facilitating
GWAS
Tandem Repeat Sequences
• 100,000+ TRSs in human genome
• Microsatellites (VNTRs)
– Repeat units (8 – 100 bp)
• Minisatellites
– Repeat units (2 – 8 bp)
– Eg., CAGCAGCAGCAGCAGCGACAG
• More than 200 diseases genes indentified
– E.g., Huntington’s disease, Fragile X syndrome
Copy Number Variants
• Size is 1 Kb to 1 Mb
– Duplications or deletions
• Less is known about CNV
– Term was introduced in 2004
• Are ubiquitous & reflect 12% of human genome
• May span multiple genes
• May change gene dosage or effect transcription
and translation
• Are creating a CNV map along with HapMap
• Associated with autism, schizophrenia, lupus,
Crohn’s disease, rheumatoid arthritis
Copy Number Variants
Indels
• Insertions & deletions
– Size 100 bp to 1 Kb
– Millions in genome
– Introduced in 2006
• Phenotype may depend on gene dosage
• May occur within genes or in promoter
• Also creating an indel map
Consequences of Genetic Variation
• No change
– In a non-coding region
– In a coding region - genetic code is degenerate
•
•
•
•
Change in 1 amino acid of a protein
Change in multiple amino acids of a protein
A truncated protein
Change in gene expression
– In a regulatory region or splice site
• Next generation GWAS will be based on
markers other than SNPs
– Tandem repeats, CNV, indels
Genetic Variation Databases
Database
Content
Address
dbSNP
SNPs covering the
human genome
http://www.ncbi.nlm.nih.
gov/projects/SNPs
HapMap
Catalog of variants
from HapMap Project
http://hapmap.org
1000 Genome Project
Extension of HapMap – www.1000genomes.org
aim to catalog 95% of
variants with 1% freq
UCSC Genome
Bioinformatics
Reference human
genome sequence with
annotation
http://genome.ucsc.edu
Ensembl
Genome browser,
annotation,
comparative genomics
http://www.ensembl.org
/index.html
Genetic Variation Databases
Database
Content
Address
GeneCards
Database of human
http://www.genecards.or
genes linked to relevant g
databases
PharmGKB
SNPs involved in drug
metabolism
DGV
Database of Genomic
http://projects.tcag.ca/var
Variants, including CNV iation
SCAN
SNP & CNV annotation http://www.scandb.org/n
based on gene function ewinterface
& expression
OMIM
Online Mendelian
Inheritance in Man –
over 12,000 genes
http://www.pharmgkb.org
http://www.ncbi.nlm.nih.g
ov/sites/entrez?db=omim
Genetic Variation Databases
Database
HuGE navigator
Content
Human genome
epidemiology
knowledge base
Address
http://hugenavigator.net/
HuGENavigator/home.do
Best Pract Res Clin Endo Metab, 2012. 26:119.
Collecting DNA
• Sources of DNA
–
–
–
–
Blood samples
Buccal brushes
Saliva samples
Dried blood spots
• Depends on
–
–
–
–
–
Conditions at time of collection
Resources available to process samples
What other biological samples will be collected
Long & short term storage
Quality control
Saliva vs. Blood Samples
• Considerations
–
–
–
–
–
–
Lower cost
More convenient & acceptable to patients
Increases compliance
Lower mean yield of DNA
But quality is comparable
No difference in success from high throughput
genotyping
Other Considerations
• Informed consent
What analysis can be performed now?
What analysis can be performed in the future?
Who has control of the specimen?
Do you need to ‘re-consent’ the participants due
to IRB changes?
- Will you inform participants of results?
-