No Slide Title - Computer Science Department, Technion
Computational Genetics
Lecture 1
Background Readings: Chapters 2 and 3 of An introduction to Genetics,
Griffiths et al. 2000, Seventh Edition (CS/Fishbach/Other libraries).
This class has been edited from several sources. Primarily from Terry Speed’s homepage at
Stanford and the Technion course “Introduction to Genetics”. Changes made by Dan Geiger.
Course Information
Meetings:
Lecture, by Dan Geiger: Thursdays 14:30 –16:30, Taub.
Tutorial, by Anna Tzemach: Thursdays 12:30 –13:30, Taub 5.
Grade:
50% in five question sets. These question sets are obligatory.
Each contains 4-6 theoretical problems. Submit in pairs within
two weeks.
50% exam for undergrads. Seminar for Graduate students. A few
undergrad students may be allowed to replace the exam with a
seminar lecture.
Information and handouts:
http://www.cs.technion.ac.il/~anna_bi/cs236633/
2
Course Prerequisites
Computer Science and Probability Background
Algorithms 1 (cs234247)
Probability (any course)
Algorithms in computational biology (recommended, or take in
parallel).
Some Biology Background
Formally: None, to allow CS students to take this course.
Recommended: Introduction to Genetics (or in parallel).
3
Course Goals
Learning about computational and mathematical
methods for genetic analysis.
We will focus on Gene hunting – finding genes for
simple human diseases.
Methods covered in depth: linkage analysis (using
pedigree data), association analysis (using random
samples).
Another goal is to learn more about the use of Bayesian
networks in genetic linkage analysis.
4
Human Genome
Most human cells contain
46 chromosomes:
2 sex chromosomes
(X,Y):
XY – in males.
XX – in females.
22 pairs of
chromosomes, named
autosomes.
5
Genetic Information
Gene – the basic unit of genetic
information. Genes determine
the inherited characters.
Genome – the collection of
genetic information.
Chromosomes – storage
units of genes.
6
Sexual Reproduction
[Figure: meiosis produces the gametes (egg and sperm), which fuse to form the zygote.]
7
Source: Alberts et al
The Double Helix
8
Central Dogma
Gene --(transcription)--> mRNA --(translation)--> Protein
Cells express different subsets of their genes
in different tissues and under different conditions.
9
Chromosome Logical Structure
Marker – Genes, SNP, Tandem repeats.
Locus – location of markers.
Allele – one variant form of a marker.
Locus1
Possible Alleles: A1,A2
Locus2
Possible Alleles: B1,B2,B3
10
Alleles - the ABO locus example
Phenotype   Genotype
A           A/A, A/O
B           B/B, B/O
AB          A/B
O           O/O
O is recessive to A.
A is dominant over O.
A and B are codominant.
Multiple alleles: A,B,O.
Trait = Character = Phenotype
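The dominance rules above can be sketched as a small lookup. This is only an illustration: the function name `abo_phenotype` is made up for this example, and a genotype is assumed to be an unordered pair of the alleles A, B and O.

```python
def abo_phenotype(allele1, allele2):
    """Return the ABO blood-group phenotype for an unordered pair of alleles."""
    alleles = {allele1, allele2}
    if alleles == {"A", "B"}:
        return "AB"  # A and B are codominant
    if "A" in alleles:
        return "A"   # A/A or A/O: A is dominant over O
    if "B" in alleles:
        return "B"   # B/B or B/O: B is dominant over O
    return "O"       # only O/O shows the recessive phenotype


for genotype in [("A", "A"), ("A", "O"), ("B", "O"), ("A", "B"), ("O", "O")]:
    print("/".join(genotype), "->", abo_phenotype(*genotype))
```

The order of the two alleles does not matter, which matches the table: A/O and O/A are the same genotype.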
11
Concepts:
1. Recessive and dominant alleles. When both the recessive and the
dominant allele are present in a cell, the phenotype determined by the dominant allele prevails.
2. AA and aa are homozygotes for the dominant and the recessive allele,
respectively. Aa is a heterozygote.
3. Multiple alleles (A, B, O).
12
Sex linkage (X-linked)
genotype
phenotype
b - dominant allele. Namely, (b,b), (b,w) is Black.
w - recessive allele. Namely, only (w,w) is White.
This is an example of an X-linked (sex-linked)
trait/character.
For males, b alone is Black and w alone is White.
There is no homologous gene on the Y chromosome.
13
Mendel’s Work
Modern genetics began with Mendel’s experiments on garden
peas (although the ramifications of his work were not realized
during his lifetime). He studied seven contrasting pairs of
characters, including:
The form of ripe seeds: round, wrinkled
The color of the seed albumen: yellow, green
The length of the stem: long, short
Mendel Gregor. 1866. Experiments on
Plant Hybridization. Transactions of the
Brünn Natural History Society.
14
Mendel’s first law
Characters are controlled by pairs of genes which
separate during the formation of the reproductive
cells (meiosis)
Aa
A
a
15
P:   AA X aa
F1:  Aa

F1 X F1: Aa X Aa
Gametes: A, a (from each parent)
        A     a
  A     AA    Aa
  a     Aa    aa
F2: 1 AA : 2 Aa : 1 aa
Phenotype: 3 A : 1 a

Test cross: Aa X aa
Gametes: A, a (from Aa); a (from aa)
        a
  A     Aa
  a     aa
Offspring: 1 Aa : 1 aa
Phenotype: 1 A : 1 a
16
Concepts:
1. Crossing F1 with itself: in the F2 generation, the ratio of offspring showing
the dominant phenotype to those showing the recessive phenotype is 3:1.
2. Test cross: crossing F1 offspring with the parent that has the recessive phenotype.
The ratio of offspring showing the dominant phenotype to those showing the
recessive phenotype is 1:1.
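The 3:1 and 1:1 ratios can be checked by enumerating the gametes of each parent. The sketch below assumes a single gene with alleles A (dominant) and a (recessive); the helper names `cross` and `phenotype_ratio` are made up for this illustration.

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2):
    """Count offspring genotypes; each parent passes either allele with equal chance."""
    offspring = Counter()
    for g1, g2 in product(parent1, parent2):
        # Sort the two alleles so that 'aA' and 'Aa' count as the same genotype.
        offspring["".join(sorted(g1 + g2))] += 1
    return offspring


def phenotype_ratio(offspring):
    """Return (dominant, recessive) counts, assuming A is dominant over a."""
    dominant = sum(n for g, n in offspring.items() if "A" in g)
    recessive = sum(n for g, n in offspring.items() if "A" not in g)
    return dominant, recessive


f2 = cross("Aa", "Aa")          # F1 X F1
test_cross = cross("Aa", "aa")  # F1 X recessive parent

print(dict(f2), phenotype_ratio(f2))
print(dict(test_cross), phenotype_ratio(test_cross))
```

The F1 X F1 cross yields genotype counts 1 AA : 2 Aa : 1 aa, and the test cross yields equal numbers of Aa and aa.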
17
Mendel's First Law.
Results of crosses in which parents differed for one character
Parental phenotype             F1         F2                           F2 ratio
1. Round X wrinkled seeds      round      5474 round; 1850 wrinkled    2.96:1
2. Yellow X green seeds        yellow     6022 yellow; 2001 green      3.01:1
3. Purple X white petals       purple     705 purple; 224 white        3.15:1
4. Inflated X pinched pods     inflated   882 inflated; 299 pinched    2.95:1
5. Green X yellow pods         green      428 green; 152 yellow        2.82:1
6. Axial X terminal flowers    axial      651 axial; 207 terminal      3.14:1
7. Long X short stems          long       787 long; 277 short          2.84:1
Conclusion, first law: The two members of a gene pair
segregate from each other into the gametes.
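As a rough check of the table, the sketch below recomputes each F2 ratio and a chi-square statistic against the expected 3:1 split. The chi-square test is not part of the slides; it is added here as one common way to compare counts with a ratio, and the function name is made up.

```python
# F2 counts from the table: (dominant phenotype, recessive phenotype).
f2_counts = {
    "round : wrinkled seeds": (5474, 1850),
    "yellow : green seeds": (6022, 2001),
    "purple : white petals": (705, 224),
    "inflated : pinched pods": (882, 299),
    "green : yellow pods": (428, 152),
    "axial : terminal flowers": (651, 207),
    "long : short stems": (787, 277),
}


def chi_square_3_to_1(dominant, recessive):
    """Chi-square statistic (1 degree of freedom) against a 3:1 expectation."""
    total = dominant + recessive
    expected = (0.75 * total, 0.25 * total)
    return sum((o - e) ** 2 / e for o, e in zip((dominant, recessive), expected))


for trait, (dom, rec) in f2_counts.items():
    ratio = dom / rec
    chi2 = chi_square_3_to_1(dom, rec)
    # Values below 3.84 are not significant at the 5% level for 1 degree of freedom.
    print(f"{trait:26s} ratio {ratio:.2f}:1  chi-square {chi2:.3f}")
```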
18
Example of a pedigree with a recessive mutation
(a marriage between cousins).
19
Polydactyly – A dominant mutation
20
Brachydactyly – A dominant mutation
21
Mendel’s second law
When two or more pairs of genes segregate
simultaneously, they do so independently.
A a; B b
Gamete AB: P_AB = P_A P_B
Gamete Ab: P_Ab = P_A P_b
Gamete aB: P_aB = P_a P_B
Gamete ab: P_ab = P_a P_b
22
23
Mendel's Second Law.
A dihybrid cross for color and shape of pea seeds
P:   wrinkled and yellow (rrYY) X round and green (RRyy)
F1:  round yellow (Rr Yy)
F1 X F1: Rr Yy X Rr Yy
F2:
  round yellow       315
  round green        108
  wrinkled yellow    101
  wrinkled green      32
  Total              556
a. Check segregation pattern for each allele in F2:
416 yellow : 140 green (2.97:1)
423 round : 133 wrinkled (3.18:1)
Conclusion: both traits behave as single genes, each carrying
two different alleles.
24
Question: Is there independent assortment of alleles of the different genes?
If the two genes assort independently, then:
- Probability to get yellow is 3/4; probability to get round is 3/4;
  probability to get yellow round is 3/4 X 3/4, namely 9/16.
- Probability to get yellow is 3/4; probability to get wrinkled is 1/4;
  probability to get yellow wrinkled is 3/4 X 1/4, namely 3/16.
- Probability to get green is 1/4; probability to get round is 3/4;
  probability to get green round is 1/4 X 3/4, namely 3/16.
- Probability to get green is 1/4; probability to get wrinkled is 1/4;
  probability to get green wrinkled is 1/4 X 1/4, namely 1/16.
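The same four probabilities can be obtained by enumerating all gamete pairs of the Rr Yy X Rr Yy cross. This sketch assumes independent assortment (all four gametes of a parent are equally likely) and uses made-up helper names `gametes` and `phenotype`.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def gametes(genotype):
    """All equally likely gametes of a two-locus genotype such as ('Rr', 'Yy')."""
    return [shape + color for shape, color in product(*genotype)]


def phenotype(gamete1, gamete2):
    """Phenotype of an offspring, with R (round) and Y (yellow) dominant."""
    shape = "round" if "R" in (gamete1[0], gamete2[0]) else "wrinkled"
    color = "yellow" if "Y" in (gamete1[1], gamete2[1]) else "green"
    return f"{color} {shape}"


f1 = ("Rr", "Yy")
# 4 gametes from each parent give 16 equally likely combinations.
counts = Counter(phenotype(g1, g2) for g1, g2 in product(gametes(f1), repeat=2))
probabilities = {p: Fraction(n, 16) for p, n in counts.items()}

for p, prob in probabilities.items():
    print(p, prob)
```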
25
A standard presentation in terms of counts
                   Expected ratio   Expected count   Observed
yellow round       9                312.75           315
yellow wrinkled    3                104.25           101
green round        3                104.25           108
green wrinkled     1                34.75            32
Total              16               556              556
Conclusion, second law:
Different gene pairs assort independently in gamete formation
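The expected counts in the table, and a measure of how close the observed counts are to them, can be recomputed as follows. The chi-square statistic is an addition for illustration and does not appear on the slide.

```python
observed = {
    "yellow round": 315,
    "yellow wrinkled": 101,
    "green round": 108,
    "green wrinkled": 32,
}
ratio = {"yellow round": 9, "yellow wrinkled": 3, "green round": 3, "green wrinkled": 1}

total = sum(observed.values())
# Expected counts under the 9:3:3:1 ratio predicted by independent assortment.
expected = {k: total * ratio[k] / 16 for k in observed}

# Chi-square statistic with 3 degrees of freedom (four classes minus one).
chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

for k in observed:
    print(f"{k:16s} expected {expected[k]:7.2f}  observed {observed[k]}")
# Values below 7.81 are not significant at the 5% level for 3 degrees of freedom.
print(f"chi-square = {chi2:.2f}")
```

A statistic this small means the observed counts are compatible with independent assortment.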
26
“Exceptions” to Mendel’s Second Law
Morgan’s fruit fly data (1909): 2,839 flies
Eye color:   A: red,    a: purple
Wing length: B: normal, b: vestigial
Cross: AABB X aabb gives F1 AaBb; test cross: AaBb X aabb.

Offspring   Expected   Observed
AaBb        710        1,339
Aabb        710        151
aaBb        710        154
aabb        710        1,195

The pair AB sticks together more than expected
from Mendel’s second law.
27
Morgan’s explanation
[Figure: chromosome diagrams of the cross, showing the F1 and F2 generations and marking the F2 chromosomes on which a crossover has taken place.]
28
Parental types: AaBb, aabb
Recombinants: Aabb, aaBb
The proportion of recombinants between the two genes
(or characters) is called the recombination fraction between
these two genes.
It is usually denoted by r or θ. For Morgan’s traits:
r = (151 + 154)/2839 = 0.107
If r < 1/2: two genes are said to be linked.
If r = 1/2: independent segregation
(Mendel’s second law).
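The recombination fraction for Morgan's data can be recomputed directly from the observed counts. This is a minimal sketch; comparing the estimate with 1/2 is only a point comparison and ignores sampling error.

```python
# Observed offspring counts from Morgan's test cross AaBb X aabb.
offspring = {"AaBb": 1339, "Aabb": 151, "aaBb": 154, "aabb": 1195}
recombinant_types = {"Aabb", "aaBb"}

total = sum(offspring.values())
recombinants = sum(n for g, n in offspring.items() if g in recombinant_types)

r = recombinants / total  # estimated recombination fraction
print(f"r = {recombinants}/{total} = {r:.3f}")
print("linked" if r < 0.5 else "independent segregation")
```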
29
Recombination Phenomenon
(Happens during Meiosis)
[Figure: recombination between the two chromosomes of a parent (male or female), producing the haplotype carried by a gamete (egg or sperm).]
30
Paired chromosomes showing a chiasma.
The chiasma is the cytological manifestation of crossing over.
31
Example: ABO, AK1 on
Chromosome 9
[Figure: pedigree of five individuals typed at the ABO and AK1 loci (AK1 alleles A1, A2), with phase inferred and one offspring marked as recombinant.]
Recombination fraction is 12/100 in males and 20/100 in females.
One centimorgan (cM) means one recombination per 100
meioses.
One centimorgan corresponds to approximately 1M nucleotides (with
large variance), depending on location and sex.
32
Standard symbols used in pedigrees
33
Maximum Likelihood Principle
What is the probability of data
for this pedigree, assuming a
recessive mutation ?
What is the probability of data
for this pedigree, assuming a
dominant mutation ?
Maximum likelihood principle: Choose the model that
maximizes the probability of the data.
34
One locus: founder probabilities
Founders are individuals whose parents are not in the pedigree. They may or
may not be typed (namely, their genotype measured). Either way, we need to
assign probabilities to their actual or possible genotypes.
This is usually done by assuming Hardy-Weinberg equilibrium (H-W). If the
frequency of D is .01, then H-W says:
Founder 1 (Dd):
pr(Dd) = 2 x .01 x .99
Genotypes of founder couples are (usually) treated as independent.
Founders 1 (Dd) and 2 (dd):
pr(pop Dd, mom dd) = (2 x .01 x .99) x (.99)^2
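The founder probabilities above can be sketched in a few lines. The function name `hardy_weinberg` is made up for this example; it assumes a locus with two alleles D and d.

```python
def hardy_weinberg(freq_D):
    """Genotype probabilities under Hardy-Weinberg equilibrium for alleles D and d."""
    freq_d = 1 - freq_D
    return {"DD": freq_D ** 2, "Dd": 2 * freq_D * freq_d, "dd": freq_d ** 2}


founder = hardy_weinberg(0.01)
print("pr(Dd) =", founder["Dd"])

# Founder couples are treated as independent, so their probabilities multiply.
pr_couple = founder["Dd"] * founder["dd"]
print("pr(pop Dd, mom dd) =", pr_couple)
```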
35
One locus: transmission probabilities
Children get their genes from their parents’ genes,
independently, according to Mendel’s laws; also
independently for different children.
Parents: 1 (Dd) and 2 (Dd). Child: 3 (dd).
pr(kid 3 dd | pop 1 Dd & mom 2 Dd )
= 1/2 x 1/2
36
One locus: transmission probabilities - II
Parents: 1 (Dd) and 2 (Dd). Children: 3 (dd), 4 (Dd), 5 (DD).
pr(3 dd & 4 Dd & 5 DD | 1 Dd & 2 Dd )
= (1/2 x 1/2)x(2 x 1/2 x 1/2) x (1/2 x 1/2).
The factor 2 comes from summing over the two mutually
exclusive and equiprobable ways 4 can get a D and a d.
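The transmission probabilities, including the factor 2 for heterozygous children, follow from enumerating the four equally likely allele pairs a child can receive. The helper name `transmission` is made up for this sketch.

```python
from fractions import Fraction
from itertools import product


def transmission(child, father, mother):
    """pr(child genotype | parental genotypes) under Mendel's first law."""
    target = sorted(child)
    # Each parent passes one of its two alleles with probability 1/2,
    # so the four (paternal, maternal) allele pairs are equally likely.
    hits = sum(1 for a, b in product(father, mother) if sorted(a + b) == target)
    return Fraction(hits, 4)


# Children are independent given the parents, so their probabilities multiply.
pr = (transmission("dd", "Dd", "Dd")
      * transmission("Dd", "Dd", "Dd")
      * transmission("DD", "Dd", "Dd"))
print(pr)
```

Here `transmission("Dd", "Dd", "Dd")` is 1/2 because two of the four allele pairs (D from the father and d from the mother, or the reverse) give a Dd child.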
37
One locus: penetrance probabilities
Pedigree analyses usually suppose that, given the genotype at all loci,
and in some cases age and sex, the chance of having a particular
phenotype depends only on genotype at one locus, and is independent
of all other factors: genotypes at other loci, environment, genotypes and
phenotypes of relatives, etc.
Complete penetrance:
DD
pr(affected | DD ) = 1
Incomplete penetrance:
DD
pr(affected | DD ) = .8
38
One locus: penetrance - II
Age and sex-dependent penetrance (liability
classes)
D D (45)
pr( affected | DD , male, 45 y.o. ) = .6
39
Incomplete penetrance:
An example of a dominant mutation in which the mutant phenotype is not always expressed.
This healthy woman passes the dominant mutation
on to her daughter.
40
One locus: putting it all together
Parents: 1 (Dd) and 2 (Dd). Children: 3 (dd), 4 (Dd), 5 (DD).
Assume penetrances pr(affected | dd ) = .1, pr(affected | Dd ) = .3
pr(affected | DD ) = .8, and that allele D has frequency .01.
The probability of the data for this pedigree, given these penetrances,
is the product:
(2 x .01 x .99 x .7) x (2 x .01 x .99 x .3) x (1/2 x 1/2 x .9) x (2
x 1/2 x 1/2 x .7) x (1/2 x 1/2 x .8)
This is a function of the penetrances. By the maximum likelihood
principle, the penetrance values that maximize this
probability are the ML estimates.
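The product above can be reproduced by multiplying, for each individual, a genotype term (founder or transmission probability) by a phenotype term (penetrance or its complement). In this sketch the affection statuses are read off the factors .7, .3, .9, .7 and .8 in the product, so they are an inference rather than something the text states; all names are made up for the illustration.

```python
freq_D = 0.01
penetrance = {"dd": 0.1, "Dd": 0.3, "DD": 0.8}  # pr(affected | genotype)


def founder_prob(genotype):
    """Hardy-Weinberg genotype probability for a founder."""
    p, q = freq_D, 1 - freq_D
    return {"DD": p * p, "Dd": 2 * p * q, "dd": q * q}[genotype]


def transmission_prob(genotype):
    """pr(child genotype | both parents Dd)."""
    return {"DD": 0.25, "Dd": 0.5, "dd": 0.25}[genotype]


def phenotype_prob(genotype, affected):
    """pr(observed phenotype | genotype)."""
    return penetrance[genotype] if affected else 1 - penetrance[genotype]


# (genotype, affected) pairs; affection status inferred from the slide's product.
founders = [("Dd", False), ("Dd", True)]                  # individuals 1 and 2
children = [("dd", False), ("Dd", False), ("DD", True)]   # individuals 3, 4, 5

likelihood = 1.0
for genotype, affected in founders:
    likelihood *= founder_prob(genotype) * phenotype_prob(genotype, affected)
for genotype, affected in children:
    likelihood *= transmission_prob(genotype) * phenotype_prob(genotype, affected)

print(likelihood)
```

Changing the entries of `penetrance` and recomputing shows how the likelihood depends on the penetrances.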
41
Fully penetrant Recessive Disease
[Figure: pedigree with parents 1 and 2 and children 3, 4 and 5.]
Let q be the probability of the disease allele. The probability of the data for
this pedigree, assuming full penetrance, is the product:
L = (1-q) x q x (1-q) x q x (3/4) x (3/4) x (1/4)
Exercise: write the likelihood for a fully penetrant dominant disease.
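To illustrate the maximum likelihood principle on the recessive likelihood L above, the sketch below evaluates L on a grid of q values and picks the maximizer. The grid search is an assumption made for simplicity, not a method from the slides, and an estimate from a single small pedigree is very crude.

```python
def likelihood(q):
    """L(q) for the recessive pedigree: two founder terms times three child terms."""
    founders = ((1 - q) * q) ** 2
    children = (3 / 4) * (3 / 4) * (1 / 4)
    return founders * children


# Evaluate L on a grid of allele frequencies strictly between 0 and 1.
grid = [i / 1000 for i in range(1, 1000)]
q_hat = max(grid, key=likelihood)

print("q_hat =", q_hat)
print("L(q_hat) =", likelihood(q_hat))
```

Because the child terms do not depend on q, the maximizer is determined by the founder terms alone.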
42