lecture_15-16(LP)


Positional cloning of the Huntington’s
disease (HD) gene
Mapping and cloning of the HD gene
chromosome walking
cDNA libraries
Identifying the disease-causing mutations
Studies of the HD gene:
identifying orthologous proteins (BLAST)
mouse knockouts (KO’s)
transgenic mice
Summary of other repeat expansion diseases
Goals for the next three lectures…
-Try to fill in some gaps
-Strengthen the connections between topics
-Some new information:
protein similarity (probably today & Monday)
knockout mice (probably Monday)
population genetics (Monday?)
-Next Friday’s lecture:
no more than 30 minutes of new material
course evaluations (~15-20 minutes)
review/problem solving/QS10
If I do spend time reviewing topics on Friday it
would be good to know what you need help with:
-No more than 1-2 topics (1-2 sentences)
-Send to: [email protected]
-Need to hear from you before Monday
Solutions to Problem set 6 have been posted on the course
website
Lastly: If you feel that an error was made in the grading of your
2nd midterm exam, send an email message to Anne Paul
summarizing the error, BY THE END OF THE DAY TODAY.
Huntington’s Disease
(reminder from lecture 13)
Huntington’s disease
- A dominant genetic disease; affects ~8 people per 100,000 worldwide
- Symptoms include abnormal body movements (chorea), cognitive decline, death
- Symptoms result from neurodegeneration (nerve cell degeneration in the basal ganglia)
- Age of onset typically 40’s; ranges from infancy to elderly
- Genetic anticipation (increasing disease severity in subsequent generations) often observed
- No cure or treatment
[Figure: HD brain vs. normal brain]
Mapping of the Huntington’s disease gene
The informative pedigree:
• 5,000 related individuals from Venezuela segregating HD
•Included 100 members currently affected by HD
•Included >1,000 members with >25% risk
The markers used:
Few markers available so tested random, purified
fragments of human genome
Used these random fragments as probes to conduct
Southern blot analysis to identify RFLPs
On their 12th probe… the jackpot!
- linkage of the RFLP to HD!
1983
Marker ‘D4S10’ shows linkage to HD
θmax = 3cM
Does this result provide significant evidence of linkage?
[Figure: LOD score (Z) curve; results of linkage studies using the probe “G8”, which recognizes the RFLP marker D4S10]
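To make the "significant evidence of linkage?" question concrete, here is a minimal sketch of a two-point LOD score calculation. It assumes phase-known meioses that can each be scored as recombinant or non-recombinant; the function name and the counts in the usage lines are illustrative and are not the Venezuelan pedigree data. By the usual convention, Z >= 3 is taken as significant evidence for linkage and Z <= -2 as evidence against it.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score Z(theta) = log10[ L(theta) / L(theta = 0.5) ].

    Assumes every meiosis is phase-known (a simplification).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    non_recombinants = meioses - recombinants
    # Likelihood of the data if the marker and the gene recombine with frequency theta
    linked = (theta ** recombinants) * ((1.0 - theta) ** non_recombinants)
    # Likelihood of the same data if they are unlinked (theta = 0.5)
    unlinked = 0.5 ** meioses
    if linked == 0.0:
        return float("-inf")  # e.g. theta = 0 but a recombinant was observed
    return math.log10(linked / unlinked)


# Hypothetical counts: 10 informative meioses, no recombinants
print(round(lod_score(0, 10, 0.0), 2))
# One recombinant among 33 hypothetical meioses, evaluated at theta = 0.03
print(round(lod_score(1, 33, 0.03), 2))
```

In practice Z is evaluated over a range of theta values, and the theta that gives the highest Z is reported as the estimated map distance.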

Where is marker ‘D4S10’ located?
How to tell? FISH
[Figure: karyotype of chromosome 4, telomere to centromere, with D4S10 at 4p16]
HD gene ? ~3cM away from D4S10 (~3 x 10^6 bp)
1983
1992
Narrowing of the HD region
Looking for highly informative recombinants (haplotypes):
[Figure: derived genotypes of HD family members at markers ordered telomere to centromere: D4S141, D4S115, D4S111, Y1P18, R10, D4S98, D4S43, D4S10]
One derived genotype from the figure (two alleles per marker):
Marker   Alleles
D4S141   2 1
D4S115   A B
D4S111   (C/B)
Y1P18    2 1
R10      2 1
D4S98    3 3
D4S43    3 5
D4S10    B C
Where are the informative recombinants?
HD gene likely to reside here (the interval marked on the figure)
What Next?
Genetic and physical map of the HD region
[Figure: map from centromere to telomere: D4S10, D4S180, HD ?, D4S182, D4S98; the D4S180-D4S182 interval is ~500kb; "we know the sequence here" (D4S180) "and here" (D4S182)]
What is the DNA sequence in this interval?
If 2008: use the marker sequences in a BLAST search
D4S180: AACTGACTTAA
D4S182: CCTAGCTTAGAT
A portion of the UCSC Genome browser window:
CCAACTGACTTAAGC…………………….AGCCTAGCTTAGATGC
We could also find the genes in this interval using the UCSC browser
But it was 1992…
How was this done in 1992 (i.e., before the genome was sequenced)?
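As a rough illustration of the "if 2008" shortcut, the sketch below locates two flanking marker sequences in a known sequence and returns what lies between them. The marker sequences are the ones on the slide; the toy "genome" and its filler bases are made up for the example, and exact string matching stands in for a real BLAST search.

```python
def interval_between(genome: str, left_marker: str, right_marker: str) -> str:
    """Return the sequence lying between two flanking markers (exact matches only)."""
    start = genome.find(left_marker)
    if start == -1:
        raise ValueError("left marker not found")
    left_end = start + len(left_marker)
    # Look for the right marker only downstream of the left one
    end = genome.find(right_marker, left_end)
    if end == -1:
        raise ValueError("right marker not found downstream of left marker")
    return genome[left_end:end]


D4S180 = "AACTGACTTAA"
D4S182 = "CCTAGCTTAGAT"
# Hypothetical stand-in for the unknown ~500kb between the two markers
toy_genome = "CCAACTGACTTAAGC" + "GATTACAGATTACA" + "AGCCTAGCTTAGATGC"

print(interval_between(toy_genome, D4S180, D4S182))
```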
Chromosome walking (outline)
- Make radioactive probes from known sequence
- Identify D4S180- & D4S182-containing clones in genomic DNA library (made by partial digest)
- Use ends of those clones’ inserts to find other clones with overlapping inserts
- Repeat
Colony hybridization to find the first genomic DNA clone
- genomic DNA clones (colonies): make a replica on filter
- release the DNA, bind it to filter
- hyb with radioactive probe from D4S180 region
- expose to X-ray film
Which colonies match up with hyb spots?
Colony hybridization (cont’d)
The colonies you detect must have
insert sequences complementary to
your D4S180 probe!
What next?
» Pick one of these clones
» Characterize it (restriction digest, etc.)
» Make a probe from one end of its insert
» Repeat colony hybridization
Chromosome walking — finding the next clone
Pick one end of the
insert
ori end
Amp end
PCR amplify the region
Label the PCR fragment
with radioactive tag
colony hyb
The goal: find the colonies (clones) that contain this sequence; these overlap your first clone
Colony hybridization (cont’d)
The colonies you detect in the
hybridization could have…
- duplicates of your original plasmid
- new plasmids with different (but
overlapping) inserts
How could you tell if they were the same as the original?
Restriction digests or sequencing
Assembling a contig
Repeat the process until the clones obtained from the
flanking markers join:
[Figure: overlapping clones identified using the D4S180 probe and using the D4S182 probe; each step starts from the insert in the original clone and a probe made from its end; joining the fragments gives a contig that spans the HD gene]
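The walking logic can be sketched as a greedy walk over overlapping inserts. This is only an illustration: the clones are given made-up map coordinates (which the real experiment does not know in advance; overlap is detected by hybridization), and the walk runs in one direction only, whereas the slide shows walks from both flanking markers.

```python
from typing import List, NamedTuple


class Clone(NamedTuple):
    name: str
    start: int  # hypothetical map coordinate of the insert's left end (kb)
    end: int    # hypothetical map coordinate of the insert's right end (kb)


def walk(library: List[Clone], start_pos: int, target_pos: int) -> List[str]:
    """Walk from the clone containing start_pos until a clone reaches target_pos."""
    containing = [c for c in library if c.start <= start_pos <= c.end]
    if not containing:
        raise ValueError("no clone hybridizes to the starting probe")
    current = max(containing, key=lambda c: c.end)
    path = [current]
    while current.end < target_pos:
        # An end probe detects clones whose inserts overlap the current insert's end
        overlapping = [c for c in library if c.start <= current.end < c.end]
        if not overlapping:
            raise ValueError("gap in the library: walk cannot continue")
        current = max(overlapping, key=lambda c: c.end)  # take the longest step
        path.append(current)
    return [c.name for c in path]


library = [Clone("c1", 0, 40), Clone("c2", 30, 70), Clone("c3", 35, 60),
           Clone("c4", 65, 105), Clone("c5", 100, 140)]
print(walk(library, start_pos=10, target_pos=120))
```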
II. Map location on genome (from lecture 13)
For example: STS 24 62 17 54 20 9 19 36 4
STS = sequence tagged site: a short, unique genomic sequence (not present anywhere else in the genome) that can be detected by PCR; an ID tag for that portion of the genome
Which portion of the genome is represented in this
BAC’s insert?
Test the BAC by PCR:
Does it test positive* with PCR primers for STS 24?
Does it test positive with PCR primers for STS62? …etc.
*Test positive? What does that mean?
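One way to picture "testing positive": a PCR product appears only if both primer sites are present in the insert, correctly oriented and close enough together. The sketch below checks that on a sequence string. It is a simplification (exact matches, one orientation of the insert, first occurrence of each site only), and the primer sequences, insert, and size limit are invented for the example.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


def sts_positive(insert: str, fwd_primer: str, rev_primer: str,
                 max_product_bp: int = 1000) -> bool:
    """True if the primer pair would be expected to amplify a product from insert."""
    start = insert.find(fwd_primer)
    if start == -1:
        return False
    # The reverse primer anneals to the other strand, so on the top strand
    # we look for its reverse complement downstream of the forward primer.
    site = reverse_complement(rev_primer)
    site_start = insert.find(site, start + len(fwd_primer))
    if site_start == -1:
        return False
    product_bp = site_start + len(site) - start
    return product_bp <= max_product_bp


# Hypothetical BAC insert and STS primers
bac_insert = "TTTT" + "ACGTAC" + "GGGGGGGG" + "TTGACC" + "CCCC"
print(sts_positive(bac_insert, "ACGTAC", "GGTCAA"))  # both sites present
print(sts_positive(bac_insert, "ACGTAC", "AAAAAA"))  # reverse site absent
```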
Genetic and physical map of the HD region
[Figure: map from centromere to telomere: D4S10, D4S180, HD ?, D4S182, D4S98; the ~500kb interval is covered by a cosmid contig, ~40kb each (a cosmid is sort of like a plasmid)]
How do we identify the genes in a contig?
Which one is the HD gene?
Identifying genes in DNA sequence
Various approaches…
...TTGAAGACGAAAGGGCCTCGTGATACGCCTATTTTTATAGGTTAATGTCATGATAATAATGGTTTCTTAGACGTCAGGTGGCACTTTTCGGG...
...AACTTCTGCTTTCCCGGAGCACTATGCGGATAAAAATATCCAATTACAGTACTATTATTACCAAAGAATCTGCAGTCCACCGTGAAAAGCCC...
Look for signatures of genes, e.g., promoters
Look for open reading frames
Look for transcribed regions, e.g., make a cDNA library
(Scanning sequence for such features is something computers are great at, and it is part of what underlies the UCSC browser)
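As an example of the "look for open reading frames" approach, here is a minimal ORF scanner. It is a sketch under simplifying assumptions: it scans only the three forward frames of the strand it is given, requires an ATG start, and ignores introns, so it is closer to a bacterial gene finder than to what a genome browser does for human genes. The test sequence is made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 10):
    """Return (start, end, frame) for each ATG...stop ORF in the 3 forward frames.

    start/end are 0-based, end-exclusive; end includes the stop codon.
    min_codons counts the ATG but not the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


# Hypothetical sequence: 2 bases, then ATG + 5 codons + stop, then 2 bases
toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
print(find_orfs(toy, min_codons=3))
```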
Making a cDNA library
cDNA = complementary DNA
complementary to mRNA
Start with mRNA from a cell culture or tissue
Copy into DNA using reverse transcriptase, primed by oligo-dT annealed to the poly-A tail
One mRNA out of the pool shown here…
5’-…AAAAAAA-3’ (mRNA; poly-A tail added by cell during pre-mRNA maturation)
       TTTTTTT-5’ (oligo-dT primer)
Insert into plasmid, transform E. coli
Genomic vs. cDNA libraries
cDNA library
make cDNA, insert into plasmid, etc.
• only mRNA regions (exons)
represented
• frequency of clone proportional to
amount of transcription of the gene
Genetic and physical map of the HD region
[Figure: map from centromere to telomere: D4S10, D4S180, HD ?, D4S182, D4S98; ~500kb interval covered by the cosmid contig, ~40kb each (a cosmid is sort of like a plasmid)]
Cosmids used as probes to screen cDNA libraries
Transcripts found: IT-15, IT-11, IT-10C3, ADDA
Which (if any) of these transcripts correspond to the HD gene?
How was the HD gene identified?
Compared sequences from normal and HD individuals
Look for gene alterations specific to diseased population
Focused on genes that are expressed in the nervous system
Screened cDNA libraries prepared from normal brain mRNA
Some potential complications:
-non-disease causing (rare) polymorphisms
distinguishing the diseased and normal population
-incomplete penetrance
-variable expressivity
Why wouldn’t all individuals of a genotype show the same phenotype?
-Influence of other genes (many traits are multigenic)
-Influence of environment
-Observation errors!
How was the HD gene identified?
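[Figure: transcripts in the HD region (IT-15, IT-11, IT-10C3, ADDA); IT-15 contains a CAG repeat, shown as CAG18 and CAG21]
IT-15:
Gene: 67 exons; >200 kb
mRNA: 10,366 bases
Protein: 3,144 aa; ~350kDa
A simple PCR test to measure CAG repeat length in IT-15:
PCR primers anneal to unique sequences in IT-15 flanking the CAG repeat, (CAG)n / (GTC)n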
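The arithmetic behind the PCR test is simple: the product is the two flanking segments (primers included) plus 3 bp per CAG. The sketch below runs it in both directions. The flank lengths are placeholders, not the real IT-15 assay values, and the classification ranges are the ones quoted in this lecture (11-34 non-disease, 42 and up disease).

```python
def product_size_bp(n_repeats: int, left_flank_bp: int, right_flank_bp: int) -> int:
    """Expected PCR product size for a (CAG)n allele."""
    return left_flank_bp + 3 * n_repeats + right_flank_bp


def repeats_from_product(product_bp: int, left_flank_bp: int, right_flank_bp: int) -> int:
    """Infer the CAG repeat number from a measured product size."""
    repeat_bp = product_bp - left_flank_bp - right_flank_bp
    if repeat_bp < 0 or repeat_bp % 3 != 0:
        raise ValueError("product size is not consistent with a whole number of CAGs")
    return repeat_bp // 3


def classify(n_repeats: int) -> str:
    """Classify using the ranges reported in the lecture."""
    if 11 <= n_repeats <= 34:
        return "non-disease range"
    if n_repeats >= 42:
        return "disease range"
    return "outside the ranges given in the lecture"


LEFT, RIGHT = 40, 50  # hypothetical flank lengths in bp
for n in (18, 21, 45):
    size = product_size_bp(n, LEFT, RIGHT)
    print(n, size, classify(repeats_from_product(size, LEFT, RIGHT)))
```

On a gel, each allele gives its own band, so a heterozygote shows two products whose sizes differ by three times the difference in repeat number.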
How was the HD gene identified?
Correlation of HD age of onset and CAG repeat length
[Figure: triplet repeat number in normal vs. HD individuals; plot of onset age (years) against CAG repeat length]
11-34 CAG repeats in 173 normals (98% between 11-24)
42-86 CAG repeats in >150 HD individuals
Further evidence that CAG repeat expansion mutation is the cause of HD:
-Two HD patients with a new mutation (not seen in parents) also had a repeat expansion.
-Length of repeat correlated with onset and severity.
IT-15 is the HD gene (AKA Huntingtin)
(CAG)11-34: non-disease allele
(CAG)42-??: disease allele
Why did it take so long to clone the HD gene?
1979-work begins to clone HD
1983-First marker linked to HD (a lucky break)
1993-HD gene cloned
-There were very few markers for linkage studies in humans
-There were several inconsistencies in the linkage data
-The biology of HD was of limited help in selecting candidate
genes (~60% of mRNAs transcribed in the brain)
-It is not easy to identify disease causing mutations!
"We applaud their discovery," adds another contender,
Michael Hayden of the University of British Columbia, who
found himself in the painful position of having proposed a
different candidate HD gene in Nature the day before the
consortium published their proof-positive results in Cell.
-Virginia Morell (1993) Science 260, 28-30.
Repeat instability explains genetic anticipation in HD
CAG repeats tend to expand upon paternal transmission:
[Figure: triplet repeat number (scale 5 to 120) for members of a family; labels: onset in early 40’s, onset at 2yrs, too young to show trait]
Expanded CAG repeats are unstable in the paternal germline
Why are long CAG repeats unstable?
A molecular model:
[Figure: DNA polymerase copying the CAG/GTC repeat; the two strands misalign and one repeat unit loops out of one strand]
- Looping out on one strand increases CAG repeat length by 1 CAG
- OR, less frequently, looping out on the other strand decreases CAG repeat length by 1 CAG
Why are only long CAG repeats unstable?
Short repeats often also contain some CAA codons:
5’ CAGCAGCAACAGCAGCAGCAACAGCAA 3’
3’ GTCGTCGTTGTCGTCGTCGTTGTCGTT 5’
5’ CAGCAGCAGCAGCAGCAGCAACAGCAA 3’
3’ GTCGTCGTCGTCGTCGTCGTTGTCGTT 5’
5’ CAGCAGCAGCAGCAGCAGCAGCAGCAA 3’
3’ GTCGTCGTCGTCGTCGTCGTCGTCGTT 5’
Prone to expansion?
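A quick way to compare the three example alleles is to measure the longest uninterrupted run of CAG codons, since CAA codons break up the perfect repeat. The sketch below does that for the top strands shown above; it assumes the sequence is read in frame from its first base.

```python
def longest_pure_cag_run(seq: str) -> int:
    """Length (in codons) of the longest run of consecutive in-frame CAG codons."""
    best = run = 0
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] == "CAG":
            run += 1
            best = max(best, run)
        else:
            run = 0  # a CAA (or any other codon) interrupts the repeat
    return best


alleles = [
    "CAGCAGCAACAGCAGCAGCAACAGCAA",
    "CAGCAGCAGCAGCAGCAGCAACAGCAA",
    "CAGCAGCAGCAGCAGCAGCAGCAGCAA",
]
for allele in alleles:
    print(allele, longest_pure_cag_run(allele))
```

All three alleles encode the same nine glutamines, but the uninterrupted CAG run grows from 3 to 6 to 8 codons as the CAA interruptions are lost.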
How do mutations in Huntingtin cause disease?
HD is a dominant disorder: Given what you know about
dominant mutations, provide possible genetic explanations
for the HD phenotype.
-Haploinsufficiency-half the amount of HD gene
product insufficient (like W)?
-Dominant negative-poison subunit (like rab27b)?
-Expressed in wrong place (like Antennapedia) or
wrong time (like lactase)
-Protein with a new activity (like the ABO blood
antigens)?
Wolf-Hirschhorn Syndrome (4p-)
(The Human “Knockout” of the Huntington Locus )
• Microdeletion (contiguous gene deletion) syndrome
• Growth retardation, with abnormal facies.
• Cardiac, renal, and genital abnormalities.
• Significantly, the basal ganglia are intact; no movement disorder
Rules out haploinsufficiency as cause of Huntington’s disease
How do mutations in Huntingtin cause disease?
HD is a dominant disorder: Given what you know about
dominant mutations, provide possible genetic explanations
for the HD phenotype.
-Haploinsufficiency-half the amount of HD gene
product insufficient (like W)?
-Dominant negative-poison subunit (like rab27b)?
-Expressed in wrong place (like Antennapedia) or
wrong time (like lactase)
-Protein with a new activity (like the ABO blood
antigens)?
Does CAG expansion act in a dominant-negative
fashion?
If the repeat expansion in HD acts in a dominant-negative
fashion, a homozygous LoF mutation should be equivalent (i.e., should produce the same phenotype)
But no homozygous LoF alleles of the
HD gene have been seen in humans!
Perhaps we can create mutations in the mouse HD gene!
But, how do we find the mouse HD gene?
-Low-stringency hybridization: more mismatches are tolerated under
appropriate hybridization conditions (salt and temperature). This allows
non-identical but closely related sequences to hybridize.
Colony hybridization with a human HD probe
ultimately led to the identification of the mouse
HD gene
Human HD protein 3,144 aa
Mouse HD protein 3,120 aa
The two proteins match
at >90% of their aa’s!
If the sequences are conserved, the biological
function is also likely to be conserved
If the biological function is conserved, we can
test whether a mouse bearing a homozygous
HD lof mutation resembles the human disease
Before continuing, let’s diverge and consider
how this is done today-in some detail…(BLAST)
-But we will focus on using BLAST to find
similar proteins (unlike what you did in QS)
Finding the mouse HD gene computationally
We need three things:
1) A sequence database (doubles in size about every 2 years!).
2) Some way of saying how similar two sequences are.
3) A really fast way of carrying out the similarity test.
We have the genome sequences and gene structures already.
We’ll diverge from HD for a bit and talk about point 2 now.
Point 3 is more appropriate for a computer course. The
method is called BLAST (basic local alignment search tool).
You should be at least somewhat familiar with this from QS9.
Thinking about protein similarity
Suppose we have the following aligned protein
sequences:
PWAVTASCH (human)
  || || |   (| = amino acid identities)
VYAVQASPH (something else)
PWAVTASCH (human)
|| | | |    (| = amino acid identities)
PWGVHATCW (something else)
We can see that both of the “something else” sequences
appear to be related to the human.
But related to what extent? We need to be quantitative.
Amino acid structures
[Figure: amino acid structures grouped as hydrophobic, polar, and charged; e.g., phenylalanine (F)]
Amino acid frequency
Amino acid frequencies in the entire universe of known protein sequences (leucine is the most common, tryptophan the rarest):
amino acid      one-letter  frequency  percent
alanine         A           0.0768       7.68
cysteine        C           0.0162       1.62
aspartate       D           0.0526       5.26
glutamate       E           0.0648       6.48
phenylalanine   F           0.0409       4.09
glycine         G           0.0689       6.89
histidine       H           0.0225       2.25
isoleucine      I           0.0586       5.86
lysine          K           0.0596       5.96
leucine         L           0.0958       9.58
methionine      M           0.0236       2.36
asparagine      N           0.0435       4.35
proline         P           0.0490       4.90
glutamine       Q           0.0394       3.94
arginine        R           0.0521       5.21
serine          S           0.0700       7.00
threonine       T           0.0558       5.58
valine          V           0.0663       6.63
tryptophan      W           0.0121       1.21
tyrosine        Y           0.0315       3.15
total                       1.0000     100.00
log odds calculation
$$\text{score} = \log \frac{\text{likelihood of seeing amino acid pair in related protein}}{\text{likelihood of seeing amino acid pair at random}}$$
likelihood of seeing amino acid pair in related protein:
• Related proteins taken from BLOCKS database (validated
related proteins).
• Simply count up how often a particular amino acid pair is seen.
• Gives you the numerator likelihood above.
likelihood of seeing amino acid pair at random:
At random:
$f_{\text{A-B pair}} = 2 f_A f_B$
(the factor of two is because it can be an A-B pair or a B-A pair)
• Gives the denominator likelihood above.
Amino acid pair frequencies in related proteins
One block from BLOCKS database (one of 29,068 blocks):
CKS2_XENLA|Q91879   NIYYSDKYTDEHFEY
CKS1_HUMAN|P33551   QIYYSDKYDDEEFEY
CKS2_HUMAN|P33552   QIYYSDKYFDEHYEY
CKS2_MOUSE|P56390   QIYYSDKYFDEHYEY
CKS1_PATVU|P41384   QIYYSDKYFDEDFEY
CKS1_DROME|Q24152   DIYYSDKYYDEQFEY
CKS1_PHYPO|P55933   TIQYSEKYYDDKFEY
CKS1_LEIME|Q25330   KILYSDKYYDDMFEY
O23249              QIQYSEKYFDDTFEY
O60191              NIHYSTRYSDDTHEY
CKS1_SCHPO|P08463   QIHYSPRYADDEYEY
CKS1_YEAST|P20486   SIHYSPRYSDDNYEY
CKS1_CAEEL|Q17868   DFYYSNKYEDDEFEY
Pair counts:
D-D  21 pairs
D-E  14 pairs
D-P  14 pairs
D-T   7 pairs
D-N   7 pairs
E-E   1 pair
E-T   2 pairs
E-P   4 pairs
T-P   2 pairs
T-T   1 pair
T-N   1 pair
Pair frequencies are compiled from all blocks combined.
LOD calculation (e.g., D-D pair):
$$\text{score} = \log \frac{21 / 74}{0.05 \times 0.05}$$
where 21 = # D-D pairs, 74 = total # of pairs, and 0.05 x 0.05 = f of D-D at random (from aa frequency table)
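The D-D calculation can be worked through numerically. This sketch uses only the numbers on the slide (21 D-D pairs out of 74, and f of D rounded to 0.05) with log base 2; the function names are illustrative. Because it uses a single block, the result is not expected to equal the D-D entry of a real score matrix, which is compiled from all blocks combined.

```python
import math


def random_pair_frequency(f_a: float, f_b: float, same_residue: bool) -> float:
    """Chance frequency of an unordered amino acid pair.

    A-B and B-A count as the same pair, hence the factor of two;
    an identical pair such as D-D has no such factor.
    """
    return f_a * f_b if same_residue else 2.0 * f_a * f_b


def log_odds(observed_pairs: int, total_pairs: int, random_frequency: float):
    """Return (score in bits, score in rounded half-bits)."""
    observed_frequency = observed_pairs / total_pairs
    bits = math.log2(observed_frequency / random_frequency)
    return bits, round(2 * bits)


f_dd_random = random_pair_frequency(0.05, 0.05, same_residue=True)
bits, half_bits = log_odds(21, 74, f_dd_random)
print(round(bits, 2), half_bits)
```

A large positive score says that D-D pairs turn up in this block far more often than chance alone would predict.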
log odds scores (side note)
• Traditionally, we use log base 2 (pedigree LOD scores
are base 10).
• To make computing fast, scores are usually multiplied
by 2 and then rounded to nearest integer (this is a
detail).
• Called “half-bit” scores (jargon for taking twice log
base 2).
log odds scores (cont.)
$$\text{score} = \log \frac{\text{likelihood of seeing amino acid pair in related protein}}{\text{likelihood of seeing amino acid pair at random}}$$
If amino acid pair seen MORE often than expected at random?
odds > 1, score positive
If amino acid pair seen LESS often than expected at random?
odds < 1, score negative
Remember:
Log2 1 = 0
Log2 2 = 1
Log2 1/2 = -1
Values from a score matrix (half-bit scores)
[Figure: score matrix indexed by one-letter amino acid code; highlighted: the score for alanine (A) vs. tryptophan (W), and the self match scores on the diagonal]
[Figure: amino acid structures grouped as hydrophobic, polar, and charged; e.g., phenylalanine (F)]
Example - similar amino acids get positive scores
I-L, I-V, L-V
Qualitatively, what scores do you expect pairs of these to have?
Example - dissimilar amino acids get negative scores
hydrophobic vs. charged
Qualitatively, what scores do you expect pairs among these groups to have?
Thinking about protein similarity
Suppose we have the following aligned protein
sequences:
PWAVTASCH (human)
  || || |
VYAVQASPH (something else)
PWAVTASCH (human)
|| | | |
PWGVHATCW (something else)
Related to what extent? We want to be quantitative.
Top case: -2 + 2 + 4 + 4 + -1 + 4 + 4 + -3 + 8 = 20
Bottom case: 7 + 11 + 0 + 4 + -2 + 4 + 1 + 9 + -2 = 32
(Side note: this also indicates the odds of seeing a match of this quality by chance for the entire sequence, e.g., for the bottom match $1/2^{16} = 1/65536$. Remember they are half-bit scores.)
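The two sums can be reproduced by looking up each aligned pair in the score table and adding. The sketch below carries only the handful of half-bit scores these two alignments need (values as used in the sums above), not the full 20 x 20 matrix; the names are illustrative.

```python
# Half-bit scores for just the pairs used here; keys are alphabetically sorted pairs.
HALF_BIT_SCORES = {
    ("A", "A"): 4, ("A", "G"): 0, ("C", "C"): 9, ("C", "P"): -3,
    ("H", "H"): 8, ("H", "T"): -2, ("H", "W"): -2, ("P", "P"): 7,
    ("P", "V"): -2, ("Q", "T"): -1, ("S", "S"): 4, ("S", "T"): 1,
    ("V", "V"): 4, ("W", "W"): 11, ("W", "Y"): 2,
}


def alignment_score(seq1: str, seq2: str) -> int:
    """Sum of half-bit scores over an ungapped alignment of equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be the same length (no gaps handled)")
    total = 0
    for a, b in zip(seq1, seq2):
        key = (a, b) if a <= b else (b, a)  # the matrix is symmetric
        total += HALF_BIT_SCORES[key]
    return total


def chance_odds(half_bit_score: int) -> float:
    """Odds of a match this good by chance: 2 to the minus (score in bits)."""
    return 2.0 ** (-half_bit_score / 2)


human = "PWAVTASCH"
top = alignment_score(human, "VYAVQASPH")
bottom = alignment_score(human, "PWGVHATCW")
print(top, bottom, chance_odds(bottom))
```

The bottom alignment scores higher even though both have five identities, because its matches include rare residues (W, C, P) whose self-match scores are large.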
Getting back to HD…finding the mouse HD gene
A portion of human HD protein sequence (the “query” sequence):
MATLEKLMKAFESLKSFQQQQQQQQQQQQQQQQQQQQQQQPPPPPP
PPPPPQLPQPPPQAQPLLPQPQPPPPPPPPPPGPAVAEEPLHRPKK
ELSATKKDRVNHCLTICENIVAQSVRNSPEFQKLLGIAMELFLLCS
DDAESDVRMVADECLNKVIKALMDSNLPRLQLELYKEIKKNGAPRS
LRAALWRFAELAHLVRPQKCRPYLVNLLPCLTRTSKRPEESVQ…
BLAST against a database of all proteins from human, chimp, dog, mouse, etc.
Result: summary list of all related proteins (one per line): human, chimp, … mouse!
E value: # of expected (E) matches (with a score this good from a database of this size) from chance alone
Getting back to HD…finding the mouse HD gene
Looking further down on the summary list… (columns: bit score, E value)
zebra fish, sea anemone, fruit fly
Can keep going, but the validity attenuates as you approach E=1
Getting back to HD…finding the mouse HD gene
Portion of Mus musculus HD alignment:
[Figure: BLAST alignment of my query against this match (M. musculus); annotations mark an identical amino acid, a similar amino acid, a dissimilar amino acid, and a gap]
Score matrix (half-bit scores):
   A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
A  4  0 -2  1 -2  0 -2 -1 -1 -1 -1 -2 -1 -1 -1  1  0  0 -3 -2
C  0  9 -3 -4 -2 -3 -3 -1 -3 -1 -1 -3 -3 -3 -3 -1 -1 -1 -2 -2
D -2 -3  6  2 -3 -1 -1 -3 -1 -4 -3  1 -1  0 -2  0 -1 -3 -4 -3
E  1 -4  2  5 -3 -2  0 -3  1 -3 -2  0 -1  2  0  0 -1 -2 -3 -2
F -2 -2 -3 -3  6 -3 -1  0 -3  0  0 -3 -4 -3 -3 -2 -2 -1  1  3
G  0 -3 -1 -2 -3  6 -2 -4 -2 -4 -3  0 -2 -2 -2  0 -2 -3 -2 -3
H -2 -3 -1  0 -1 -2  8 -3 -1 -3 -2  1 -2  0  0 -1 -2 -3 -2  2
I -1 -1 -3 -3  0 -4 -3  4 -3  2  1 -3 -3 -3 -3 -2 -1  3 -3 -1
K -1 -3 -1  1 -3 -2 -1 -3  5 -2 -1  0 -1  1  2  0 -1  2 -3 -2
L -1 -1 -4 -3  0 -4 -3  2 -2  4  2 -3 -3 -2 -2 -2 -1  1 -2 -1
M -1 -1 -3 -2  0 -3 -2  1 -1  2  5 -2 -2  0 -1 -1 -1  1 -1 -1
N -2 -3  1  0 -3  0  1 -3  0 -3 -2  6 -2  0  0  1  0 -3 -4 -2
P -1 -3 -1 -1 -4 -2 -2 -3 -1 -3 -2 -2  7 -1 -2 -1 -1 -2 -4 -3
Q -1 -3  0  2 -3 -2  0 -3  1 -2  0  0 -1  5  1  0 -1 -2 -2 -1
R -1 -3 -2  0 -3 -2  0 -3  2 -2 -1  0 -2  1  5 -1 -1 -3 -3 -2
S  1 -1  0  0 -2  0 -1 -2  0 -2 -1  1 -1  0 -1  4  1 -2 -3 -2
T  0 -1 -1 -1 -2 -2 -2 -1 -1 -1 -1  0 -1 -1 -1  1  5  0 -2 -2
V  0 -1 -3 -2 -1 -3 -3  3  2  1  1 -3 -2 -2 -3 -2  0  4 -3 -1
W -3 -2 -4 -3  1 -2 -2 -3 -3 -2 -1 -4 -4 -2 -3 -3 -2 -3 11  2
Y -2 -2 -3 -2  3 -3  2 -1 -2 -1 -1 -2 -3 -1 -2 -2 -2 -1  2  7
notice that this amino acid pair is poorly conserved
Query LTAVGGIGQLT
      LT  GG+GQLT
Sbjct LTTPGGLGQLT
score (bits) is the sum over each aligned residue pair (x 0.5 because
the score table is in half-bits):
4 + 5 + 0 + -2 + 6 + 6 + 2 + 6 + 5 + 4 + 5 = 41 half bits = 20.5 bits
Does CAG expansion act in a dominant-negative
fashion?
If the repeat expansion in HD acts in a dominant-negative
fashion, a homozygous LoF mutation should be equivalent (i.e., should produce the same phenotype)
But no homozygous LoF alleles of the
HD gene have been seen in humans!
Perhaps we can create mutations in the mouse HD gene!
But, how do we find the mouse HD gene?
Can do this using an experimental approach (e.g.,
screen a library) or using a computational approach
(e.g., conduct a BLAST search)
Once the mouse HD gene is identified we must create a
recombinant plasmid containing the mouse HD gene and
appropriate markers for generating a mouse HD mutation
(AKA: a mouse HD “knockout”)
Studies of HD in animal models: a mouse HD KO
Engineering an HD knockout mouse:
[Figure: plasmid (ampr, ori) carrying a mouse genomic DNA clone bearing HD exons 3-6, etc., with restriction endonuclease sites H and X
Step 1: partial digest with H & X; neor is inserted between exon 3 and exon 6
Step 2: partial digest with H; gns is inserted next to exon 3
Result: targeting construct gns, exon 3, neor, exon 6]
Studies of HD in animal models: a mouse HD KO
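[Figure: the targeting construct (gns, exon 3, neor, exon 6) is introduced into embryonic stem (ES) cells from an albino (c/c) strain of mice, followed by selection on neomycin + gancyclovir
- Recombination with the HD gene in the ES genome (1:1,000): exons 4 and 5 are replaced by neor and gns is left behind; these cells survive
- Insertion elsewhere in the ES genome carries gns along: ES cells die on gancyclovir]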
Studies of HD in animal models: a mouse HD KO
Wild-type allele: exons 3, 4, 5, 6; mRNA 1 2 3 4…
KO allele: exon 3, neor, exon 6; mRNA 1 2 3 6…
Altered splicing results in frameshift and premature translation termination
[Figure: ES cell bearing heterozygous HD KO placed into a blastocyst-stage embryo from a C/C female]
Studies of HD in animal models: a mouse HD KO
Mosaic embryo: c/c; HD-/HD+ cells (from the ES cells) and C/C; HD+/HD+ cells (from the host embryo)
Place mosaic embryos into surrogate mother
Which of these mosaic offspring are most likely to have the targeted mutation in their germline?
Creating the homozygous KO
Cross 1: Mosaic mouse (c/c; HD-/HD+ and C/C; HD+/HD+ cells) X Albino mouse (c/c; HD+/HD+)
Offspring: C/c; HD+/HD+, or albino: c/c; HD-/HD+ OR c/c; HD+/HD+
Genotype the albino offspring (Southern blot of blood sample) to find c/c; HD-/HD+
Cross 2: c/c; HD-/HD+ X c/c; HD-/HD+
Offspring: c/c; HD+/HD+, c/c; HD-/HD+, c/c; HD-/HD- The homozygous KO!
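The expected genotype ratios from these crosses follow from a simple Punnett square, sketched below. It gives Mendelian expectations at conception only and says nothing about which genotypes survive; allele labels follow the slide.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Genotype names keyed by the number of knockout (HD-) alleles carried
GENOTYPE_NAMES = {0: "HD+/HD+", 1: "HD-/HD+", 2: "HD-/HD-"}


def cross(parent1, parent2):
    """Expected offspring genotype fractions; each parent is a pair of alleles."""
    counts = Counter(
        GENOTYPE_NAMES[(a == "HD-") + (b == "HD-")]
        for a, b in product(parent1, parent2)  # every gamete combination
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


het = ("HD-", "HD+")
wild_type = ("HD+", "HD+")
print(cross(het, wild_type))  # germline-transmitting parent x albino mate
print(cross(het, het))        # het x het to obtain the homozygous KO
```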
The phenotypes of the HD KO mice…
c/c; HD-/HD+: Phenotypically normal; no brain pathology
c/c; HD-/HD-: Early embryonic lethal; embryonic developmental abnormalities
The homozygous HD KO displays different symptoms than
the human disease
HD symptoms do not result from a lof of the HD gene
How do mutations in Huntingtin cause disease?
HD is a dominant disorder: Given what you know about
dominant mutations, provide possible genetic explanations
for the HD phenotype.
-Haploinsufficiency-half the amount of HD gene
product insufficient (like W)?
-Dominant negative-poison subunit (like rab27b)?
-Expressed in wrong place (like Antennapedia) or
wrong time (like lactase)
-Protein with a new activity (like the ABO blood
antigens)?
What could it be?
The HD CAG repeats encode polyglutamine
(polyQ) tracts
[Figure: HD gene (promoter; exons 1, 2, 3, 4, 5, 6, etc.) with the (CAG)n repeat in exon 1]
mRNA: AUG…(CAG)n…AAAAAA
Protein: M…(Q)n…
Are proteins bearing long polyglutamine tracts toxic?
Are long polyglutamine (polyQ) tracts toxic?
Evidence in favor:
-Spinal and Bulbar muscular atrophy caused by
polyQ expansion of androgen receptor.
-proteins with long polyQ repeats fold abnormally
…QQQQQQQQQQQQQQQQQQQQQQQ...
Protein product = misfolded conformation
When the length of the glutamine tract exceeds a certain threshold (~35), the polyglutamine tract adopts an abnormal conformation
Creating a mouse with a human HD gene
Creation of a ‘transgenic mouse’
[Figure: human HD gene (promoter; exons 1, 2, 3, 4, 5, 6, etc.) carrying (CAG)180, introduced into a single-celled mouse embryo]
Gene fragment inserts randomly into mouse genome
Place embryo into surrogate mother
Creating a transgenic mouse (contd)
Surrogate
mother
Transgenic offspring?
Can be easily tested using PCR
Phenotypes of HD transgenic mice:
-tremors, abnormal gait, learning deficits by 6mos.
-brain polyQ aggregates
-cell loss in basal ganglia in late stages
Confirms protein with new activity (GoF) mechanism
Suggests that polyglutamine expansion is toxic
Are polyQ expansions toxic in a novel context?
Insertion of a polyQ tract in the hypoxanthine
phosphoribosyltransferase (HPRT) gene
[Figure: HPRT gene (exons 1, 2, 3, 4) with (CAG)146 inserted]
Generate transgenic mouse
Do mice develop HD-like pathology?
Phenotypes of HPRT transgenic mice closely resemble the HD transgenic mice.
Suggests that polyQ itself is primarily responsible for toxicity
polyQ expression is also toxic in flies, yeast, cell lines, etc.
What have we learned from cloning HD?
1993:
- Symptoms result from neurodegeneration
- Age of onset typically 40’s; ranges from infancy to elderly
- Genetic anticipation (increasing disease severity in subsequent generations) often observed
- No cure or treatment
Present:
- Mechanism of neuron death involves intrinsic toxicity of large polyQ tracts
- Age of onset correlates with CAG repeat length; can now be predicted (not clear if this is good or bad)
- Genetic anticipation results from repeat length instability, primarily in paternal germline
- Still no cure or treatment, but there are several promising strategies on the horizon
Repeat Expansion Diseases
Fragile X syndrome of mental retardation
FRAXE mental retardation
X-linked spinal and bulbar muscular atrophy
Myotonic dystrophy 1 and 2
Huntington’s disease 1 and 2
Dentatorubral pallidoluysian atrophy
Friedreich’s ataxia
Oculopharyngeal muscular dystrophy
Myoclonic epilepsy of Unverricht-Lundborg
Spinocerebellar ataxia types 1, 2, 3, 6, 7, 8, 10, 12 & 17