
How many genes?
Mapping mouse traits
Lecture 1, Statistics 246
January 20, 2004
1
Aim of today’s and Thursday’s lecture
To review basic Mendelian genetics, the basics
of recombination, and go on to see how genes
contributing to qualitative and quantitative traits
are mapped using data from crosses of inbred
strains of mice.
2
2. Genetic background
2.1 Loci and markers
We need to know the following notions from
Mendelian genetics: autosomes, sex
chromosomes, genotypes, phenotypes, loci,
alleles, homozygous, heterozygous, dominant,
recessive, (fully) inbred, markers.
3
Our markers are microsatellites
Genotypes are read by PCR and electrophoresis:
A  ..AGTCCACACACACACACATGT..
   ..AGTCCACACACACACACATGT..
H  ..AGTCCACACACACACACATGT..
   ..AGTCCACACACACACACACACACATGT..
B  ..AGTCCACACACACACACACACACATGT..
   ..AGTCCACACACACACACACACACATGT..
Desirable: to call the genotypes (A, H, or B) automatically
Problems: stutters and noise, variability of the patterns, etc.
4
Similarity Sorting
(Figure panels: unsorted, sorted, correlation matrix)
This is a useful technique to enhance presentation of gel traces and assist manual examination.
5
Genotype Calling
This is a statistical pattern recognition problem:
• Fit mixture models
• Discriminant analysis
(Figure labels: A, H, B)
6
JnoTyper: software implementation in Java
7
2.2 Inbred strains and their crosses
Our main players are the C57BL/6 (BL for
black, abbreviated B6), a robust strain that has
been around about 90 years, and the NOD
(non-obese diabetic) mouse strain, a delicate
diabetes-prone strain first described in 1980.
Coat colours: agouti is standard, B6 is black,
NOD is albino (i.e. white).
8
Normal (wild-type) mouse coat: color = agouti
a grizzled color of fur resulting from the barring of
each hair in several alternate dark and light bands
9
Black mouse: C57BL/6 strain
10
Albino mouse: non-obese diabetic (NOD) strain
11
Coat color loci in mice
Four main loci: A, B, C and D
• Locus A – agouti
• Locus B – black
• Locus C (known as Tyr) – albinism
• Locus D – dilution gene
12
Alleles at the Agouti (A) locus
• Ay, Lethal dominant yellow
• Avy, Viable yellow
• Aw, White-bellied Agouti
• A, Agouti or Wild type
• At, Black and Tan
• Am, mottled agouti
• a, Non-agouti
• ae, Extreme non-agouti
A and a are a dominant/recessive allele pair
13
Alleles at the Albino (C) Locus
• C, full color gene
• cch, chinchilla
• ch, himalayan
• c, albino gene
C and c are a dominant/recessive pair of alleles
14
Alleles at A and C interact
(called epistasis in genetics)
• If the mouse is aaCx it is not agouti and
not albino (in our case a black mouse)
• If the mouse is AxCx it is agouti and not
albino
• If the mouse is xxcc it is albino, whatever
the alleles at the agouti locus are
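The three rules above can be written as a small lookup. This is a minimal Python sketch: the function name coat_color and the two-letter genotype strings are illustrative choices, not part of the lecture, and only the A/a and C/c alleles are considered.

```python
def coat_color(agouti: str, albino: str) -> str:
    """Coat color from genotypes at the A and C loci (A/a and C/c alleles only).

    `agouti` is e.g. "Aa" and `albino` is e.g. "Cc"; upper case is dominant.
    """
    if "C" not in albino:      # cc: albino, the agouti locus is masked (epistasis)
        return "albino"
    if "A" in agouti:          # at least one A together with at least one C
        return "agouti"
    return "black"             # aa with C: non-agouti, black in this cross


print(coat_color("aa", "CC"))  # B6 parent
print(coat_color("AA", "cc"))  # NOD parent
print(coat_color("aA", "Cc"))  # F1
```

The albino check comes first because cc hides whatever the agouti locus would otherwise produce.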
15
Crosses
We will denote the NOD mice by A, and the B6
mice by B. This same notation will denote the
two homozygotes at a polymorphic marker.
Two main crosses interest us, following the first
filial generation or F1, which we denote by
A×B → H. Here H denotes heterozygote, which
is the case for our F1s.
The backcross BC is arrived at via H×B → BC,
or a variant, while the F2 intercross is given by
H×H → F2.
16
2.3 Data
• An F2 intercross was performed starting with
C57BL/6 and NOD parental lines.
• We have 133 female mice at the F2
generation. Only females were used because
males fight, and fighting influences other
(quantitative blood) phenotypes of interest.
• They were genotyped at 153 microsatellite
markers spanning all 19 autosomes and the
X chromosome. We also have coat color and
a few white blood cell phenotypes.
17
A small portion of the data (beginning)
Slide labels: the line of three numbers gives #individuals, #loci, #traits.
Each marker line starts with * and the marker name; the next column is the
data from mouse 1, followed by one genotype code per mouse.
data type f2 intercross
.
133 153 7
*D10M106 BBABBBBBHBBABBBBAABBBB-BABABABBABBBBBBBBBBBBB-BBBBBBABBAAABBBBBBBBB-HBABABB-ABBBBAB-BBBABABBB-BBBBBCBCBCBHBBBHCBBHBHHBCBBBBBBBHBHBHCH
*D10M14 AHHBHHHAHHABAHBHHBABAA-BHHAHAAHAHHHHHBAHHHAHHBAHBHABBBHAAHHHHAHBHHH--HHHHAHAHAHBHHHAHHABAHHHAHHHAHBHBBHHHAAHAAHHBHHAHAH-HBABAHAHBHHAH
*D10M163 AHBBHHB-HHAB-HBH-BAHBA-BHHAHAAHAAHHAHBAHHHHHHHAHBHABBBHAAHBBHAHBBHHBBHBHHHH-HBHHHHHAHHAHABH-AHHHAHBABBBBAAAHAAHHBHHAHHHBHBAHAHABHHHAH
*D10M20 HCBHAHBAHHAHAHBABAHHBH-HHHABAAHAAABHHBH-HAHBHAAHBCABABHAAABBHAHBHHBBBHBHAHH-HBHHHABAHHHHAHHBAAHHABHABHBHAAHBHAAHBHAAHBHBHBHHHHABAHAAH
D10M106 = a marker on chr 10 defined by MIT
Incompleteness code: C = B or H, D = A or H, - = missing
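A marker line in this layout can be read with a few lines of Python. This is a sketch under stated assumptions: the names POSSIBLE and parse_marker_line are hypothetical, and the example string is the clipped excerpt of D10M106 shown on the slide, not the data for all 133 mice.

```python
from collections import Counter

# Genotypes compatible with each code, following the key on the slide.
POSSIBLE = {
    "A": {"A"}, "H": {"H"}, "B": {"B"},
    "C": {"B", "H"},            # C = B or H
    "D": {"A", "H"},            # D = A or H
    "-": {"A", "H", "B"},       # missing: any genotype is possible
}


def parse_marker_line(line: str):
    """Split '*NAME GENOTYPES' into the marker name and a list of codes."""
    name, codes = line.lstrip("*").split(maxsplit=1)
    return name, list(codes.strip())


# Clipped excerpt of the D10M106 line, one character per mouse.
name, codes = parse_marker_line("*D10M106 BBABBBBBHBBABBBBAABBBB-BABABA")
counts = Counter(codes)
print(name, len(codes), dict(counts))
print([sorted(POSSIBLE[c]) for c in "HCB-"])
```

Keeping the partially informative codes C and D as sets of possible genotypes, rather than discarding them, preserves what they do say about the mouse.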
18
A small portion of the raw data (end)
*DXM210 --HAAAAHHHAHAAAAAHAH-HAHHAHAHHH
*DXM222 HAAHHAA-HHAAHAAHHAAAHH-HAAHAAHHH
*DXM39 HAAAHAA-HHAH-AAA-HAAHH-HAAAAHHHHH
*trait1 1 1 2 3 1 1 2 3 1 2 2 2 1 1 1 1 3 1 1 1 1 1 1
*trait2 8.90472059883773 8.62455170973674 8.4546
*trait3 16.0508869012649 16.1080453151048 16.167
*trait4 16.0138456295845 16.0907244541622 16.125
*trait5 13.8887610197039 14.1288603771646 13.986
*trait6 7.1066061377273 6.52209279817015 6.63331
*trait7 8.65927129000923 8.41405243249672 198.1586
Slide labels: trait1 = coat color code; the numeric traits = WBC traits
Snapshot of the genotype data
20
Error Detection (see later)
Using the LOD_error statistic. Based on close recombination events, which
indicate possible presence of genotyping error.
calc.genoprob, calc.errorlod, plot.errorlod
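To see why close recombination events raise suspicion, consider a genotype that differs from both of its neighbours on the chromosome while those neighbours agree with each other: it would need two recombinations in a short interval. The Python sketch below flags such positions. It is a crude heuristic for illustration only. It is not the LOD_error statistic and not what calc.errorlod computes, and the function name and example string are hypothetical.

```python
def suspicious_positions(genotypes: str) -> list:
    """Indices where a genotype differs from both flanking markers while the
    flanking markers agree with each other (an apparent double recombinant).

    `genotypes` holds one mouse's codes at consecutive markers in map order.
    Any triple containing a code other than A, H or B is skipped.
    """
    known = set("AHB")
    flagged = []
    for i in range(1, len(genotypes) - 1):
        left, mid, right = genotypes[i - 1], genotypes[i], genotypes[i + 1]
        if {left, mid, right} <= known and left == right and mid != left:
            flagged.append(i)
    return flagged


# Hypothetical genotypes of one mouse at eight closely spaced markers.
print(suspicious_positions("AAHAAHHB"))
```

A flagged position is only a candidate for re-examination. How suspicious it is depends on the distance between the markers, which this sketch ignores.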
21
2.4 Mendel’s laws for one locus
We can (and should) check Mendel with data
from our 133 offspring at each of our 153 loci.
For example, at D7Mit126, we have 24 A, 29 B
and 67 H genotypes, adding to 120, indicating
13 incomplete or missing genotypes (133 − 120).
What do we expect according to Mendel? How
would we test whether the data agree with our
expectations?
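One standard way to answer both questions is a Pearson chi-square goodness-of-fit test against the 1:2:1 ratio expected in an F2. The sketch below applies it to the D7Mit126 counts using only the standard library. The choice of this particular test is an assumption, since the slide leaves the question open.

```python
import math

observed = {"A": 24, "H": 67, "B": 29}       # counts at D7Mit126
expected_ratio = {"A": 1, "H": 2, "B": 1}    # Mendel: 1:2:1 in an F2

n = sum(observed.values())
ratio_total = sum(expected_ratio.values())
expected = {g: n * r / ratio_total for g, r in expected_ratio.items()}

# Pearson chi-square statistic, summed over the three genotype classes.
chi2 = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)

# Three classes and no estimated parameters give 2 degrees of freedom;
# for 2 degrees of freedom the upper tail probability is exp(-x/2).
p_value = math.exp(-chi2 / 2)

print(expected)
print(round(chi2, 2), round(p_value, 2))
```

The expected counts are 30, 60 and 30, the statistic is about 2.05 and the p-value about 0.36, so these counts give no evidence against Mendelian segregation at this marker.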
22
2.5 Mendel’s law for 2 loci
Mendel inferred from his data on peas the
independent segregation of different factors.
Here we check that this holds for our two coat
color loci, and see that it does not hold in general.
We then go on to understand the more general situation.
23
Mating & Coat color outcomes in this cross
Parental lines: C57BL/6 males, black (aaBBCC) × NOD females, albino (AABBcc)
F1: aABBCc, all agouti
F2: Agouti 9 : Black 3 : Albino 4
We need to check these last proportions following Mendel’s reasoning.
24
Punnett square depicting F1 parental allele
combinations passed on to F2 offspring
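The 16 cells of such a Punnett square can be enumerated directly. This sketch assumes independent segregation of the agouti and albino loci and ignores locus B, where both parental lines are BB. The helper name phenotype is hypothetical.

```python
from collections import Counter
from itertools import product


def phenotype(agouti_alleles: str, albino_alleles: str) -> str:
    """Coat color from the two alleles at each locus (A/a and C/c only)."""
    if "C" not in albino_alleles:
        return "albino"
    return "agouti" if "A" in agouti_alleles else "black"


# Each F1 parent is Aa at the agouti locus and Cc at the albino locus.
# Under independent segregation its four gamete types are equally likely.
gametes = [a + c for a, c in product("Aa", "Cc")]   # AC, Ac, aC, ac

counts = Counter()
for g1, g2 in product(gametes, repeat=2):           # the 16 Punnett cells
    counts[phenotype(g1[0] + g2[0], g1[1] + g2[1])] += 1

print(dict(counts))
```

The counts come out as 9 agouti, 3 black and 4 albino. The usual 9:3:3:1 classes collapse to 9:3:4 because both cc classes are albino.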
25
It’s not always like that
Markers 132 and 51 (one indexes the rows, the other the columns):
          A     H     B   Total
A        26    10     0     36
H        10    46     5     61
B         0     9    23     32
Total    36    65    28    129
2-locus genotypes at D12Mit51 and D12Mit132.
If we pool A and H, we do not get 9:3:3:1.
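The pooling can be checked numerically with the two-locus counts from this slide. The sketch below pools A with H at each marker and compares the four pooled counts with 9:3:3:1 using a Pearson chi-square statistic. The choice of statistic and the names used are assumptions for illustration. Which marker indexes the rows does not affect the result.

```python
# Two-locus genotype counts from this slide (rows: one marker, columns: the other).
table = {
    "A": {"A": 26, "H": 10, "B": 0},
    "H": {"A": 10, "H": 46, "B": 5},
    "B": {"A": 0, "H": 9, "B": 23},
}


def pool(g: str) -> str:
    """Treat A and H as one class, as for a fully dominant allele."""
    return "A/H" if g in ("A", "H") else "B"


pooled = {}
for row, cols in table.items():
    for col, n in cols.items():
        key = (pool(row), pool(col))
        pooled[key] = pooled.get(key, 0) + n

total = sum(pooled.values())
ratio = {("A/H", "A/H"): 9, ("A/H", "B"): 3, ("B", "A/H"): 3, ("B", "B"): 1}
expected = {k: total * r / 16 for k, r in ratio.items()}
chi2 = sum((pooled[k] - expected[k]) ** 2 / expected[k] for k in ratio)

print(pooled)
print(expected)
print(round(chi2, 1))   # compare with 7.81, the 5% critical value for 3 d.f.
```

The pooled counts are 92, 5, 9 and 23 against expected values of roughly 72.6, 24.2, 24.2 and 8.1, and the statistic is far above the critical value. The two markers do not segregate independently.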
26