mnw2yr_lec17_2004

Genomics
An introduction
Aims of genomics I

Establishing integrated databases
that are far more than mere storage

Linking genomic and expressed
gene sequences (cDNA)
Aims of genomics II

Describing every gene:
• function/expression
data/relationships/phenotype
• 3-d structure and features (introns/exons,
domains, repeats)
• similarities to other genes

Characterizing sequence diversity in
the population
Genomics can be:

Structural
– where is it?

Functional
– what does it do?
– DNA microarrays

Comparative
– finding important fragments
Mapping genomes

Past
– Genetic maps
Distance between simple markers
expressed in units of recombination
– Cytological maps
Stained chromosomes, observable
under microscope

Present
– Physical maps
Distance between nucleotides
expressed in bases
– Comparative map
Corresponding genes detection;
Regulatory sequence detection;
Genome sizes
Organism                   DNA length              Genes
Mycoplasma genitalium      0.5 Mb                  470
Deinococcus radiodurans    3 Mb in 4-10 copies!    3 200
Escherichia coli           4.5 Mb                  4 400
Saccharomyces cerevisiae   12 Mb                   6 200
Caenorhabditis elegans     97 Mb                   22 000
Drosophila melanogaster    120 Mb                  18 000
Homo sapiens               3200 Mb                 32 000
Genetic differences among humans

Goals
– Genetic diseases
– Identifying criminals

Methods
– Genetic markers (fingerprints) and DNA sequence.
Repeats:
• Microsatellites (repeats of 1-12 nucleotides)
• Minisatellites (> 12)
– Other types of variation
• Genome rearrangements
• Single nucleotide mutations
Microsatellites and disease

Huntington’s disease
– Huntingtin gene of unknown (!) function
– Number of repeats: 6-35: normal; 36-120: disease
Friedreich's ataxia
– GAA repeat in non-coding (intron) region
– Number of repeats: 7-34: normal; 35 and up: disease
– Repeat expansion reduces expression of frataxin gene
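The repeat-count thresholds above can be turned into a small check. The following is a minimal sketch, not a diagnostic tool: the function names `longest_repeat_run` and `classify_repeats` and the `RANGES` table are hypothetical, and the only numbers used are the ranges quoted on this slide.

```python
import re

# Ranges copied from the slide; None means "no upper bound given".
RANGES = {
    "huntington": {"normal": (6, 35), "disease": (36, 120)},
    "friedreich": {"normal": (7, 34), "disease": (35, None)},
}

def longest_repeat_run(sequence, unit):
    """Largest number of consecutive copies of `unit` found in `sequence`."""
    pattern = f"(?:{re.escape(unit.upper())})+"
    runs = re.findall(pattern, sequence.upper())
    return max((len(run) // len(unit) for run in runs), default=0)

def classify_repeats(disease, count):
    """Map a repeat count to the label whose range (from the slide) contains it."""
    for label, (low, high) in RANGES[disease].items():
        if count >= low and (high is None or count <= high):
            return label
    return "outside listed ranges"

# Toy intron fragment with 40 consecutive GAA units (made-up sequence).
fragment = "TTC" + "GAA" * 40 + "CCT"
count = longest_repeat_run(fragment, "GAA")
label = classify_repeats("friedreich", count)
```

Here `count` is 40, which falls in the "35 and up" range for the GAA repeat, so `label` is "disease".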
SNP - Single Nucleotide Polymorphism

Definition
– SNP and phenotype

Occurrence in genome
– Rarity of most SNPs (agrees with
neutral molecular evolutionary theory)
– SNPs in human population:
• Inter-genic regions: every 1400 bp
• Coding regions: every 1430 bp
• High variance in genome!

Detection of SNPs: Hybridization
Sickle cell anemia
SNP in the beta-globin gene; the sickle-cell
allele is recessive:
• 2 faulty copies: red blood cells
change shape under stress, causing anemia
• 1 faulty copy: red blood cells
change shape under heavy stress,
but the carrier gains resistance to the malaria
parasite
SNPs and haplotypes
Passengers and their evolutionary
vehicles
SNP - Phase inference

In genome sequencing data, the chromosome of origin of each SNP allele is
scrambled:
...CT[G/T]AC[G/A]GT...

Possibility 1
...CTGACGGT... (chromosome 1)
...CTTACAGT... (chromosome 2)

Possibility 2
...CTGACAGT... (chromosome 1)
...CTTACGGT... (chromosome 2)
Which SNPs are on the same chromosome (are in phase)?
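To see why the phase is ambiguous, one can enumerate every pair of chromosomes consistent with an unphased genotype. This is a minimal sketch; the function name `possible_phasings` and the input layout (one allele pair per SNP site) are assumptions made for illustration.

```python
from itertools import product

def possible_phasings(genotypes):
    """Enumerate unordered haplotype pairs consistent with unphased genotypes.

    `genotypes` is a non-empty list with one (allele, allele) pair per SNP site.
    """
    first, rest = genotypes[0], genotypes[1:]
    phasings = set()
    # Fix the first site, then try both orientations of every later site.
    for flips in product([False, True], repeat=len(rest)):
        hap1, hap2 = [first[0]], [first[1]]
        for (a, b), flip in zip(rest, flips):
            if flip:
                a, b = b, a
            hap1.append(a)
            hap2.append(b)
        # frozenset: the order of the two chromosomes does not matter.
        phasings.add(frozenset(["".join(hap1), "".join(hap2)]))
    return phasings

# The two SNP sites from the slide: G/T at the first, G/A at the second.
result = possible_phasings([("G", "T"), ("G", "A")])
```

For the two heterozygous sites above, `result` contains exactly two pairs, {GG, TA} and {GA, TG}, matching Possibility 1 and Possibility 2. With n heterozygous sites the count grows to 2^(n-1), which is why extra information is needed.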
SNP – phase inference
Determining the parent of origin for each SNP
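Parent 1: ...CT[G/A]AC[C/G]GT...
Parent 2: ...CT[C/T]AC[A/A]GT...
Child:    ...CT[G/T]AC[G/A]GT...
In this case the child's two chromosomes carry:
GG (inherited from parent 1)
TA (inherited from parent 2)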
Phase inference is the reason why SNP sequencing is often done for a child
and both parents.
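The trio reasoning can be sketched in a few lines: at each site, keep the only assignment of the child's two alleles that both parents could have transmitted. The function name `phase_child` and the data layout are hypothetical, and sites that remain ambiguous (or inconsistent with the parents) are simply marked `None` in this sketch.

```python
def phase_child(child, parent1, parent2):
    """Split a child's unphased genotypes into the haplotype from each parent.

    Every argument is a list with one (allele, allele) pair per SNP site.
    Returns (haplotype_from_parent1, haplotype_from_parent2) as lists.
    """
    from_p1, from_p2 = [], []
    for (c1, c2), p1, p2 in zip(child, parent1, parent2):
        options = set()
        # Try both ways of assigning the child's alleles to the parents.
        for a, b in ((c1, c2), (c2, c1)):
            if a in p1 and b in p2:
                options.add((a, b))
        if len(options) == 1:
            a, b = options.pop()
        else:
            a = b = None  # ambiguous or inconsistent site
        from_p1.append(a)
        from_p2.append(b)
    return from_p1, from_p2

# Genotypes from the slide: two SNP sites per individual.
child = [("G", "T"), ("G", "A")]
parent1 = [("G", "A"), ("C", "G")]
parent2 = [("C", "T"), ("A", "A")]
haps = phase_child(child, parent1, parent2)
```

With the slide's genotypes, `haps` is `(['G', 'G'], ['T', 'A'])`: the child's G alleles can only come from parent 1, and the T and A can only come from parent 2.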
Linkage Disequilibrium, intro
How hard is it to break a chromosome?

Alleles A and a (of a trait or SNP) occupy the same position in the genome
(locus), so on a single chromosome an individual can have
either of them, but not both
– fA: frequency of allele A in the population
– fa = 1 - fA
– fB and fb = 1 - fB: the frequencies of alleles B and b at a second locus

Probabilities of both alleles occurring on the same
chromosome:
A B : fAB
A b : fAb
a B : faB
a b : fab

LD and genomic recombination
Linkage Disequilibrium, calculation




When these alleles are not correlated we expect them to occur
together by chance alone:
fAB = fA fB
fAb = fA fb
faB = fa fB
fab = fa fb
But if A and B occur together more often than chance predicts (disequilibrium
state), we can write
fAB = fA fB + D
fAb = fA fb - D
faB = fa fB - D
fab = fa fb + D
where D is called the measure of disequilibrium
Of course from definitions above we have D = fAB - fA fB
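As a worked example, D can be estimated directly from a sample of phased chromosomes using D = fAB - fA fB. This is a minimal sketch; the function name `disequilibrium` and the toy sample are made up for illustration.

```python
def disequilibrium(haplotypes, a="A", b="B"):
    """Estimate D = fAB - fA * fB from a list of (locus1, locus2) allele pairs."""
    n = len(haplotypes)
    f_a = sum(1 for x, _ in haplotypes if x == a) / n
    f_b = sum(1 for _, y in haplotypes if y == b) / n
    f_ab = sum(1 for x, y in haplotypes if x == a and y == b) / n
    return f_ab - f_a * f_b

# Toy sample of 10 chromosomes: AB and ab are over-represented.
sample = (
    [("A", "B")] * 4
    + [("A", "b")] * 1
    + [("a", "B")] * 1
    + [("a", "b")] * 4
)
d = disequilibrium(sample)
```

In this sample fA = fB = 0.5 and fAB = 0.4, so D is about 0.15 (up to floating-point rounding). The other three equations hold as well; for instance, fAb = 0.1 = fA fb - D = 0.25 - 0.15.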
How can we use it?
Phase inference tells us how SNPs are
organized on the chromosome
Linkage disequilibrium measures the
correlation between SNPs

Back to SNPs
Daly et al (2001), Figure 1
Haplotypes - vehicles for SNPs


Daly et al (2001) were able to infer offspring haplotypes largely
from parents. They say that “it became evident that the region
could be largely decomposed into discrete haplotype blocks,
each with a striking lack of diversity”
The haplotype blocks:
– Up to 100kb
– 5 or more SNPs
For example, this block shows just two distinct haplotypes
accounting for 95% of the observed chromosomes
Haplotypes on the genome fragment
a) Observed haplotypes, with dotted lines wherever the probability of switching to another line is > 2%
b) Percentage of chromosomes explained by the haplotypes
c) Contribution of specific haplotypes
Another genetic test
Do haplotypes exist?
- Each row represents an SNP
- Blue dot = major allele, yellow dot = minor allele
- Each column represents a single chromosome
- The 147 SNPs are divided into 18 blocks, delimited by black lines
- The expanded box on the right is an SNP block of 26 SNPs over 19 kb of genomic DNA. The 4 most common of 7 different haplotypes include 80% of the chromosomes, and can be distinguished with 2 SNPs
How many SNPs can we ignore?
…and still predict haplotypes with high accuracy?
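One way to make the question concrete is to search for the smallest set of SNP positions that still tells all observed haplotypes apart. The sketch below uses a plain brute-force search; it is an illustration only, not the selection method of the cited papers, and the function name `smallest_tag_set` and the toy haplotypes are hypothetical.

```python
from itertools import combinations

def smallest_tag_set(haplotypes):
    """Return the smallest tuple of SNP positions that distinguishes all haplotypes.

    `haplotypes` is a list of equal-length strings, one character per SNP
    (for example "0" for the major allele and "1" for the minor allele).
    """
    distinct = set(haplotypes)
    n_sites = len(haplotypes[0])
    # Try subsets of increasing size; the full set of sites always works.
    for size in range(n_sites + 1):
        for sites in combinations(range(n_sites), size):
            projected = {tuple(h[i] for i in sites) for h in distinct}
            if len(projected) == len(distinct):
                return sites

# Toy block: 4 haplotypes over 5 SNPs, where several SNPs are redundant.
block = ["00000", "00111", "11000", "11111"]
tags = smallest_tag_set(block)
```

For this toy block `tags` is `(0, 2)`: two SNPs are enough to tell the four haplotypes apart, and the other three can be ignored. The search is exponential in the number of SNPs, so it only suits small blocks.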
Literature
Gibson, Muse. “A Primer of Genome Science”.
N. Patil et al. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294 (2001): 1719-1723.
M. J. Daly et al. High-resolution haplotype structure in the human genome. Nat. Genet. 29 (2001): 229-232.
