Introduction to Genetics
Debashis Ghosh
Professor and Chair, Biostatistics and
Informatics, ColoradoSPH
Question we tackle today
• What do we mean by a gene?
• Steve Mount (ongenetics.blogspot.com):
“A gene is all of the DNA elements required in cis
for the properly regulated production of a set of
RNAs whose sequences overlap in the genome.”
• Mark Gerstein et al. (2007, Genome Research):
“The gene is a union of genomic sequences
encoding a coherent set of potentially overlapping
functional products”
What is a gene?
• No “one-size-fits-all” definition
• The previous definitions are useful to
contextualize data that are generated from
experiments
• Thinking carefully about evolution and the
constraints it has placed on functions is also
important
From Genotype to Phenotype
• Full genotypes (genomes) are coming… but
inheritance is complex
• Genetic markers are characters inherited in a
way that is simple enough to easily track
• Want to find genetic markers that explain or
predict phenotypes
– e.g., disease, susceptibility
– Ideally, the marker would be causative
• But that is rare
Alleles as Genes
• At each gene locus, we have two alleles, one
transmitted to us by our father, and one by
our mother.
• Usual assumption: Each parent randomly
transmits one of his/her alleles to the child
• In real datasets, alleles are usually observed
as DNA variants called single-nucleotide
polymorphisms (SNPs)
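The random-transmission assumption above can be sketched in a few lines of Python. This is only an illustration: the function name `offspring_distribution` and the two-letter genotype strings such as "Aa" are hypothetical conventions, not part of the lecture.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_distribution(mother, father):
    """Genotype probabilities for a child of two diploid parents.

    Each parent is a 2-character string such as "Aa". Each parent
    transmits one of its two alleles with probability 1/2.
    """
    counts = Counter()
    # Enumerate the four equally likely (maternal, paternal) allele pairs.
    for m_allele, f_allele in product(mother, father):
        # Sort so that "aA" and "Aa" count as the same unordered genotype.
        genotype = "".join(sorted((m_allele, f_allele)))
        counts[genotype] += 1
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}


# Cross of two heterozygotes: the classic 1:2:1 Mendelian ratio.
cross = offspring_distribution("Aa", "Aa")
for genotype, prob in sorted(cross.items()):
    print(genotype, prob)
```

For two heterozygous parents this gives AA, Aa and aa with probabilities 1/4, 1/2 and 1/4.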
Diploid Inheritance
[Figure: chromosome pairs, one copy from Mom and one from Dad, illustrating a heterozygote and a homozygote]
Phenotypic Dominance
[Figure: heterozygote with one allele from Mom and one from Dad. Panels: light blue dominant, dark blue recessive; mixed dominance; dark blue dominant, light blue recessive]
Diploid Inheritance
Heterozygote
Homozygote
Dark Blue Is Dominant
Recessive Phenotype Only Visible in Homozygote
Mendelian Ratios
Recombination
From Grandma
From Grandpa
Chromosomal Segment in Mom
(she’s a diploid, remember)
From Mom
From Dad
Chromosomal Segment in You
(You’re diploid too)
Crossing Over
From Grandma
From Grandpa
Non-Sister Chromatids of Homologous Chromosomes
Recombine (Cross Over) During Meiosis
Inherited by You
Lost (Except in Tetrad Analysis)
Products of Meiosis
Recombination: Basic Points
• Recombination switches which chromosome
in the parent (i.e., originating from which
grandparent) is passed along to the offspring
• Alleles physically adjacent on a chromosome
are more likely to be passed on together than
alleles far apart
• Alleles very far apart or on different
chromosomes are inherited randomly
Finding Disease Genes
• Assemble data set of probands
• Assemble data set of control population
• Might have pedigree if disease runs in families
• Might have trios to determine linkage
– Proband plus two parents
• Look for linkage between genetic markers and
disease
– In pedigree
– In dataset of less related individuals
Genetic Markers
• Polymorphic in population
– Different variants in different individuals
– Single Nucleotide Polymorphism (SNP)
– Variable Number of Tandem Repeats (VNTR)
• Minisatellites
– Short Tandem Repeats (STR)
• Microsatellites
• Very high mutation rate: strand slippage
• Haplotype
– A set of closely linked SNPs inherited as unit
Linkage Analysis
• Set of variable markers distributed throughout
genome
• Identify linkage regions (haplotypes) that
cosegregate (are inherited) with disease or
trait
Pedigree Analysis
• Tabulate the occurrence of a trait in an
extended family
– Pedigree is family’s mating history
Assumptions and Complications
• Single gene with Mendelian inheritance
– Best use of extended families
– Few extended families with trait
• Quantitative traits are multigenic
– Includes most widespread or “common” inherited
diseases
– Sib pairs are best for complex traits with
incomplete penetrance (see next slide)
Incomplete Penetrance
• Not everyone with the disease genotype will have
the disease
– Delayed or adult onset
– Mild or undetectable symptoms
– Environmental and developmental factors
– Unknown genetic factors
• Disease allele = increased probability of disease
(relative risk)
• We don’t always know who in a pedigree has the
disease genotype!
Evaluating Linkage
• Remember, an individual is a recombinant with
respect to two genes, A and B, if it inherits the
allele from one parental chromatid at A and
the allele from the other parental
chromatid at B
• The recombination fraction $\theta_{AB}$ is the
probability that a child is recombinant
• If A and B are tightly linked, then $\theta_{AB}$ is small
Simple LOD Scores
• Total number of offspring, P
• Number of recombinant offspring, R
• Likelihood of the data = $\theta_{AB}^{R} (1 - \theta_{AB})^{P-R}$
• Maximum likelihood estimate: $\hat{\theta}_{AB} = R/P$
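• LOD score for linkage in pedigree is
$$\mathrm{LOD} = \log_{10}\left[\frac{L(D \mid \theta_{AB} = \hat{\theta}_{AB})}{L(D \mid \theta_{AB} = 1/2)}\right] = \log_{10}\left[\frac{\hat{\theta}_{AB}^{R}\,(1-\hat{\theta}_{AB})^{P-R}}{(1/2)^{P}}\right]$$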
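As a rough illustration, the simple LOD score can be computed directly from the two counts. The sketch below assumes phase-known data in which recombinants can be counted outright; the function name `lod_score` is hypothetical, and the counts (27 recombinants out of 139) are borrowed from the example used later in these slides.

```python
import math


def _xlog10y(x, y):
    """Return x * log10(y), treating 0 * log10(0) as 0."""
    if x == 0:
        return 0.0
    if y == 0:
        return -math.inf
    return x * math.log10(y)


def lod_score(recombinants, offspring, theta=None):
    """Simple LOD score for R recombinants among P offspring.

    If theta is None, the maximum likelihood estimate R / P is used.
    """
    if theta is None:
        theta = recombinants / offspring
    non_recombinants = offspring - recombinants
    # log10 likelihood under linkage with recombination fraction theta.
    log_l_theta = _xlog10y(recombinants, theta) + _xlog10y(non_recombinants, 1 - theta)
    # log10 likelihood under no linkage (theta = 1/2).
    log_l_null = offspring * math.log10(0.5)
    return log_l_theta - log_l_null


lod = lod_score(27, 139)
print(f"theta_hat = {27 / 139:.3f}, LOD = {lod:.2f}")
```

With these counts the estimate of the recombination fraction is about 0.19 and the LOD score is about 12.1, far above the conventional threshold of 3.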
Complications
• Need to know phase, genotypes of parents, to
identify recombinants
– Can estimate informativeness of additional data
depending on heterozygosity of markers
• Many disease versus marker comparisons are
involved
– Multiple comparisons
– But, markers are not independent
• Population structure
• LOD scores > 3 (odds of 1000:1) are generally taken
as evidence of linkage; > 5 is very strong
Population structure
• Genetic markers have different patterns in
different populations; this can confound
associations between genetic markers and
disease phenotypes.
Realistic Complications
• Include Penetrance(X | G)
– Probability of observing trait X given the genotype G
• Prior(G)
– Probability of observing the genotype in an individual
• Transmit(Gm | Gk, Gl, $\theta$)
– Probability that offspring will have genotype Gm given
parental genotypes Gk and Gl, and the recombination
parameter $\theta$
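One common way to combine these three ingredients is the general pedigree likelihood, written out below as a sketch. The split into founders (individuals without parents in the pedigree) and non-founders, the index notation, and the symbol $\theta$ for the recombination parameter are assumptions added here for illustration, not taken from the slides.

```latex
L(\theta) \;=\; \sum_{G_1} \cdots \sum_{G_n}
  \;\prod_{i=1}^{n} \mathrm{Penetrance}(X_i \mid G_i)
  \;\prod_{k \,\in\, \text{founders}} \mathrm{Prior}(G_k)
  \;\prod_{m \,\in\, \text{non-founders}} \mathrm{Transmit}(G_m \mid G_{k(m)}, G_{l(m)}, \theta)
```

Here $k(m)$ and $l(m)$ denote the two parents of individual $m$, and the sums run over every possible genotype assignment for the $n$ pedigree members, which is why unknown genotypes and phase make the computation expensive.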
LOD Graph
• Can look at LOD score over a range of $\theta$ values,
not just the MLE.
• Usual assumption is LOD > 3 is evidence for
linkage, LOD < -2 is evidence for exclusion
• Example: 27 recombinants out of 139 gametes
(example from S. Purcell)
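A LOD curve for this example can be sketched by evaluating the score on a grid of recombination fractions. The grid spacing (0.01) and the helper name `lod_at` are arbitrary choices for illustration.

```python
import math


def lod_at(theta, recombinants, gametes):
    """LOD score at a fixed recombination fraction theta (0 < theta <= 0.5)."""
    non_recombinants = gametes - recombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1 - theta))
    log_l_null = gametes * math.log10(0.5)
    return log_l_theta - log_l_null


R, N = 27, 139
# Grid of theta values from 0.01 to 0.50 in steps of 0.01.
thetas = [i / 100 for i in range(1, 51)]
curve = [(theta, lod_at(theta, R, N)) for theta in thetas]

# Grid point with the highest LOD; it should sit near the MLE R / N.
best_theta, best_lod = max(curve, key=lambda point: point[1])

# Grid values where the conventional linkage threshold LOD > 3 is met.
linked = [theta for theta, lod in curve if lod > 3]

print(f"peak at theta = {best_theta:.2f}, LOD = {best_lod:.2f}")
print(f"LOD > 3 for theta between {min(linked):.2f} and {max(linked):.2f}")
```

The curve peaks close to 27/139 and falls to 0 at a recombination fraction of 0.5, where the linkage and no-linkage hypotheses coincide.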
Recombination Probability and
Distance along Chromosome
• Recombination fraction does not increase linearly
with distance
– Multiple recombination events possible over
greater distances, but also interference
• Can estimate genetic distance from
recombination rates
– Measure in Morgans, or cM
– $c_{AB}$, the expected number of crossovers, is
additive
Mapping Functions
• Haldane’s mapping function
– Crossovers are assumed random and independent
$c_{AB} = -\tfrac{1}{2} \ln(1 - 2\theta_{AB})$
• Kosambi’s mapping function
– Models interference: crossovers not too close
– Most popular
$c_{AB} = \tfrac{1}{4} \ln\left[\frac{1 + 2\theta_{AB}}{1 - 2\theta_{AB}}\right]$
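Both mapping functions are easy to evaluate numerically. The sketch below converts a recombination fraction into a map distance in Morgans; the function names and the example value 0.2 are hypothetical choices for illustration.

```python
import math


def _check(theta):
    # Both formulas are only defined for 0 <= theta < 1/2.
    if not 0 <= theta < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")


def haldane_distance(theta):
    """Map distance in Morgans, assuming independent crossovers."""
    _check(theta)
    return -0.5 * math.log(1 - 2 * theta)


def kosambi_distance(theta):
    """Map distance in Morgans, allowing for crossover interference."""
    _check(theta)
    return 0.25 * math.log((1 + 2 * theta) / (1 - 2 * theta))


theta = 0.2
print(f"Haldane: {100 * haldane_distance(theta):.1f} cM")
print(f"Kosambi: {100 * kosambi_distance(theta):.1f} cM")
```

For small recombination fractions both functions give a distance close to the fraction itself; as the fraction approaches 1/2 the distances grow without bound, with Haldane's always the larger of the two.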
Genetic versus Physical
• Mapping is not simple
– Recombination rate varies along chromosomes
• Male versus Female
– Men: 28.51 Morgans over whole genome
• 1.05 Mb/cM
– Women: 42.96 Morgans (excluding X)
• 0.88 Mb/cM
• In Drosophila, about 0.4 Mb/cM
Modeling Penetrance
• Single locus, three genotypes
$f_{DD} = P(\text{disease} \mid DD)$
$f_{Dd} = P(\text{disease} \mid Dd)$
$f_{dd} = P(\text{disease} \mid dd)$
• If $f_{DD} = f_{Dd} = 1$, $f_{dd} = 0$
– Disease is Mendelian dominant
• If $f_{DD} = 1$, $f_{Dd} = f_{dd} = 0$
– Disease is Mendelian recessive
• Spontaneous mutations: $f_{dd} > 0$
• Incomplete penetrance: $f_{DD} < 1$
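To see how the three penetrances translate into disease frequency, the sketch below computes population prevalence from the penetrance values. It assumes Hardy-Weinberg genotype proportions, which the slides do not state; the function name `prevalence` and the allele frequency 0.1 are hypothetical.

```python
def prevalence(p, f_DD, f_Dd, f_dd):
    """Population disease prevalence for a single-locus penetrance model.

    p is the frequency of the disease allele D. Genotype frequencies are
    ASSUMED to follow Hardy-Weinberg proportions p^2, 2pq, q^2.
    """
    q = 1 - p
    return p * p * f_DD + 2 * p * q * f_Dd + q * q * f_dd


p = 0.1  # hypothetical disease-allele frequency

dominant = prevalence(p, f_DD=1, f_Dd=1, f_dd=0)
recessive = prevalence(p, f_DD=1, f_Dd=0, f_dd=0)
# Incomplete penetrance (f_DD < 1) plus sporadic cases (f_dd > 0).
complex_model = prevalence(p, f_DD=0.8, f_Dd=0.4, f_dd=0.01)

print(f"dominant:  {dominant:.4f}")
print(f"recessive: {recessive:.4f}")
print(f"complex:   {complex_model:.4f}")
```

With the same allele frequency, the dominant model affects far more of the population than the recessive one, because heterozygotes are much more common than DD homozygotes.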
Extending Analysis
• SNPs scattered throughout genome
– LOD scores for regions, not individual markers
• Multipoint linkage analysis
– Establish order relationship among 3+ markers
• Non-parametric analysis can be better for
complex traits, incomplete penetrance
– Work with affected siblings
– Less statistical power than model-based methods
• Identical by descent (IBD) versus chance
Non-Parametric
• Concerning siblings or other relatives
– Need “both affected” and “only one affected”
pairs
• Correlate shared IBD alleles with affected
state, proportion in two classes
– High correlation means linkage to disease
• Example: type 1 diabetes (T1D)
(Genomewide) Association Studies
• Correlate markers with disease over a large
population
• Marker may itself be the disease variant (rare)
• Large regions of chromosome in linkage
disequilibrium with disease allele
– Marker is in disease gene haplotype
• Regions of chromosome tend to be inherited
as a unit
– Tapers off over time due to recombination
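The tapering of linkage disequilibrium can be sketched with the standard decay relation, in which the disequilibrium coefficient D shrinks by a factor of (1 - recombination fraction) each generation. The coefficient D, the random-mating assumption, and the function name `ld_after` are introduced here for illustration and are not from the slides.

```python
def ld_after(d0, theta, generations):
    """Expected LD coefficient D after a number of generations.

    ASSUMES random mating and a constant recombination fraction theta
    between the marker and the disease allele: D_t = D_0 * (1 - theta)^t.
    """
    return d0 * (1 - theta) ** generations


d0 = 0.2  # hypothetical starting disequilibrium

# Tightly linked marker (theta = 0.001) versus a loosely linked one (theta = 0.1).
for theta in (0.001, 0.1):
    for t in (10, 100, 1000):
        print(f"theta = {theta}, generations = {t}: D = {ld_after(d0, theta, t):.4f}")
```

Markers close to the disease allele keep most of their association for hundreds of generations, while loosely linked markers lose it within tens of generations, which is why older populations tend to show shorter haplotype blocks.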
Association Studies
• Linkage disequilibrium varies among
populations
– Depends on population structure, age
• coalescent
– European populations have a lot of linkage
disequilibrium, African populations only a little
– The population at the origin of humans is older and
more diverse
• Need dense, cheap markers over genome:
Genome Wide Association Studies (GWAS)
QTL and GWAS
• Quantitative Traits, polygenic traits that are
assumed to have additive effects
– Height, heart disease
– Quixotic Trait Loci?
• Each gene has a small effect
• Huge genotyping efforts now paying off
• BUT only a small fraction of genetic component is
accounted for even in huge studies
– Tradeoffs of including broader human population
Common Disease versus Rare Variants
• Common disease, common variants: The most
frequently occurring alleles/SNPs should
explain most of the etiology of a disease.
– Current studies do NOT show this to be
the case.
• Newer paradigm: rare variants
– Occur less frequently but have larger
associations with disease
(Sullivan, Daly and O’Donovan, Nature Reviews Genetics, 2012)
• Different results in different populations
• Heritability
– What makes a gene matter to a disease?
– Take advantage of human phenotyping
– What genes CAN contribute to disease or
modification of disease?
• A golden age of personal genomics?
Acknowledgments
• David Pollock, Biochemistry and Molecular
Genetics