
Regulatory Genomics
Lecture 1 November 2012
Yitzhak (Tzachi) Pilpel
1
Course requirements
• Attendance and participation
• Five reading assignments
• A final take-home exam based on
reading papers
• website
In total 13 or 14 meetings (not 17…)
No meeting on Nov 15th
2
Genomics marked the
beginning of a new age in
biology and medicine
1900: Rediscovery of Mendel's laws helps establish the science of genetics
1953: Watson and Crick identify DNA (the double helix) as the chemical basis of heredity
1977: Sanger and Gilbert derive methods of sequencing DNA
1980: DNA markers used to map human disease genes to chromosomal regions
1983: Huntington disease gene mapped to chromosome 4
1990: Human Genome Project (HGP) begins, an international effort to map and sequence all the genes in the human genome
1994-98: Genetic and physical mapping
1998: DNA markers used to map human disease genes to chromosomal regions
2000: Working draft of the human genome sequencing complete
2005: Release of Human Genome Project
Source: Health Policy Research Bulletin, volume 1 issue 2, September 2001
3
The genome browser
4
Link
Number of protein coding genes
20,210
13,601
Mustard
(Arabidopsis)
Fruit fly
Mouse
19,735
Worm
(C. elegans)
20,568
5,616
Yeast
(S. cerevisiae)
482
Mycoplasma
genitalium
5
How come we have so few genes, given that
we are so complex?
• We have many non-protein-coding genes
• Our genes are longer and more complex
• Regulation of human gene activity is more
complex
• Repeats (formerly known as “junk DNA”,
yet not garbage) contribute to complexity
• Combinatorial interactions among genes
and their products
21,710
19,735
6
The hierarchical structure of the
genome
7
Lodish et al. Molecular Cell Biology (5th ed.). W.H. Freeman & Co., 2003.
Expressing the genome
8
The Central Dogma: a cellular context
9
The Central Dogma of Molecular Biology
Expressing the genome
RNA
Inactive
DNA
DNA
mRNA
Protein
10
Evolution
11
Corrected view of evolution
12
The tree of life
13
How do genomes evolve?
Consider two distinct possibilities:
• Genomes evolve by lots of de novo “inventions”
• Genomes evolve predominantly
by mixing and matching existing
parts
14
Classification of protein
structures
15
Very slow growth in number of protein folds
Very few structural “inventions”
16
Comparing a certain family (e.g. kinases)
in different species reveals few
“inventions”
17
Analogy:
•Technology
•Language
18
Some basic evolutionary
operations
• Mutating existing DNA
• Change gene expression profiles
• Duplications of existing material (genes,
chromosomes, genomes)
• Transfer of genes from one organism to
another
• Functionalization of “junk DNA”
• Reverse transcription??
19
Stress conditions induce a high DNA
replication error rate
Because most newly arising mutations are
neutral or deleterious, it has been argued
that the mutation rate has evolved to be as
low as possible, limited only by the cost of
error-avoidance and error-correction
mechanisms. But up to one per cent of
natural bacterial isolates are 'mutator' clones
that have high mutation rates. We consider
here whether high mutation rates might play
an important role in adaptive evolution.
Models of large, asexual, clonal populations
adapting to a new environment show that
strong mutator genes (such as those that
increase mutation rates by 1,000-fold) can
accelerate adaptation, even if the mutator
gene remains at a very low frequency (for
example, 10^-5). …
20
Some basic evolutionary
operations
• Mutating existing DNA
• Change gene expression profiles
• Duplications of existing material (genes,
chromosomes, genomes)
• Transfer of genes from one organism to
another
• Functionalization of “junk DNA”
• Reverse transcription??
21
A slight change in expression program can make a
big change: olfactory receptor can “smell the egg”
22
Science. 2003 Mar 28;299(5615):2054-8.
Identification of a testicular odorant receptor mediating human
sperm chemotaxis.
Spehr M, Gisselmann G, Poplawski A, Riffell JA, Wetzel CH, Zimmer RK,
Hatt H.
Source
Department of Cell Physiology, Ruhr University Bochum, 150 University
Street, D-44780 Bochum, Germany.
Abstract
Although it has been known for some time that olfactory receptors (ORs)
reside in spermatozoa, the function of these ORs is unknown. Here, we
identified, cloned, and functionally expressed a previously undescribed
human testicular OR, hOR17-4. With the use of ratiofluorometric imaging,
Ca2+ signals were induced by a small subset of applied chemical stimuli,
establishing the molecular receptive fields for the recombinantly
expressed receptor in human embryonic kidney (HEK) 293 cells and the
native receptor in human spermatozoa. Bourgeonal was a powerful
agonist for both recombinant and native receptor types, as well as a
strong chemoattractant in subsequent behavioral bioassays. In contrast,
undecanal was a potent OR antagonist to bourgeonal and related
compounds. Taken together, these results indicate that hOR17-4 functions
in human sperm chemotaxis and may be a critical component of the
fertilization process.
23
Some basic evolutionary
operations
• Mutating existing DNA
• Change gene expression profiles
• Duplications of existing material (genes,
chromosomes, genomes)
• Transfer of genes from one organism to
another
• Functionalization of “junk DNA”
• Reverse transcription??
24
Gene duplication might provide redundancy
duplication
nonfunctionalization
neofunctionalization
subfunctionalization
25
Chromosome III duplicates in heat
[Plot: log2(expression evo39 / evo30) against gene index, for all genes and for chromosome III genes; P value < 10e-100]
26
Heat shock tolerance correlates with chromosome III copy number
[Bar chart: relative survival (axis 0 to 3.5) for Evolved, 3 copies; WT, 3 copies; WT, two copies; WT, one copy]
27
Conclusions from the experiment
• Chromosomes are easily gained and lost
in yeast evolution
• A more fine-tuned solution may follow
chromosome duplication
• A striking similarity between repeated
experiments
• A chromosome-condition specificity?
28
29
Many gene-duplicate distances
correspond to 60-70 mya!!
Sequence similarity
between gene pairs
30
Some basic evolutionary
operations
• Mutating existing DNA
• Change gene expression profiles
• Duplications of existing material (genes,
chromosomes, genomes)
• Transfer of genes from one organism to
another
• Functionalization of “junk DNA”
• Reverse transcription??
31
Horizontal (“lateral”) gene transfer:
transfer of genes between organisms,
mostly under stress
32
Some basic evolutionary
operations
• Mutating existing DNA
• Change gene expression profiles
• Duplications of existing material (genes,
chromosomes, genomes)
• Transfer of genes from one organism to
another
• Functionalization of “junk DNA”
• Reverse transcription??
33
Evolution of transcriptional switches
Similar function (TF1, TF1): CACGCGTT / CACGCGTA; neutral selection
Disrupted function (TF1): CACGCGTT / CACGAGTT; low rate, purifying selection
Altered function (TF2, TF1): CACGCGTT / CACACGTT; low rate, purifying selection
Gained function: CACGCGTT / CACACGTT; low rate, purifying selection
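Each binding-site variant on this slide differs from CACGCGTT by a single base. As a minimal sketch (the helper name `mismatches` is hypothetical, not from the lecture), the following Python lists where each variant differs from the reference site:

```python
def mismatches(reference, variant):
    """Return (1-based position, reference base, variant base) for each difference."""
    if len(reference) != len(variant):
        raise ValueError("sites must have equal length")
    return [
        (i + 1, r, v)
        for i, (r, v) in enumerate(zip(reference, variant))
        if r != v
    ]

reference_site = "CACGCGTT"
# Site variants copied from the slide, keyed by the outcome the slide assigns them.
variants = {
    "similar function": "CACGCGTA",
    "disrupted function": "CACGAGTT",
    "altered function": "CACACGTT",
}

for outcome, site in variants.items():
    print(outcome, mismatches(reference_site, site))
```

The point of the comparison is that a one-base change can be neutral, destructive, or function-altering depending on where in the site it falls.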
34
Evolution of transcription
networks
35
36
Repetitive elements in the human
genome
• Alu elements are repetitive retrotransposon
elements in the human genome.
• Alu elements are about 300 base pairs long
and are therefore classified as short
interspersed elements (SINEs)
• There are over one million Alu elements
interspersed throughout the human genome
• About 10% of the human genome consists of
Alu sequences.
37
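The copy number and element length quoted above are roughly consistent with the 10% figure. A quick back-of-the-envelope check in Python, assuming a haploid genome size of about 3.1 billion base pairs (this value is an assumption and does not appear on the slide):

```python
ALU_COPIES = 1_000_000    # "over one million" (from the slide, used as a lower bound)
ALU_LENGTH_BP = 300       # "about 300 base pairs" (from the slide)
GENOME_SIZE_BP = 3.1e9    # assumption: approximate haploid human genome size

alu_fraction = ALU_COPIES * ALU_LENGTH_BP / GENOME_SIZE_BP
print(f"Alu fraction of genome: {alu_fraction:.1%}")
```

With these round numbers the estimate comes out just under 10%; since there are more than one million copies, the true fraction would be somewhat higher.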
Retro-transposition
38
Alus may contain binding sites for TFs, microRN
Alus
Alus
39
Can the phenotype shape the
genotype?
Classical Darwinian theory
Genotype
Phenotype
Lamarckian Theory
Genotype
Phenotype
40
The Central Dogma: a cellular context
41
42
Cell membrane
* * *
Nucleus
cell division
protein synthesis
cell death
Attack a virus
differentiate
43
From parts to networks…
* * *
44
Reporter genes reveal spatio-temporal
expression programs
45
In unicellular organisms, responses to environmental signals
affect gene expression dramatically
Genes
Gasch et al Mol Biol Cell. 2000 Dec;11(12):4241-57.
46
The transcriptome during the cell cycle
47
Spellman et al Mol Biol Cell. 1998 Dec;9(12):3273-97
Coding DNA strand
Non-coding strand
RNA
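The relationship between the two DNA strands and the RNA can be sketched in a few lines of Python. The function names and the example sequence are made up for illustration; the sketch assumes the RNA carries the coding-strand sequence (with U in place of T) and is complementary to the non-coding (template) strand:

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def template_strand(coding):
    """Non-coding (template) strand, written 5'->3': reverse complement of the coding strand."""
    return "".join(COMPLEMENT[base] for base in reversed(coding))

def transcribe(coding):
    """RNA sequence: same as the coding strand, with U replacing T."""
    return coding.replace("T", "U")

coding = "ATGGCC"  # hypothetical example sequence
print("coding    ", coding)
print("non-coding", template_strand(coding))
print("RNA       ", transcribe(coding))
```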
48
Transcription regulation
• The hardware
• The software
• The input
• The output
49
The initiation machinery
complex
50
Transcription factors bind the
DNA
51
Keys (regulators) can scan the genomes in
search for their locks (recognition sites)
ATACGAT
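The lock-and-key picture can be illustrated with a naive scan for exact matches of a recognition site on both strands. This is only a sketch: real transcription factors tolerate mismatches, and the `toy_genome` string and function names below are invented for the example.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def find_sites(genome, site):
    """Return (0-based start, strand) for every exact match of `site` on either strand."""
    hits = []
    rc_site = reverse_complement(site)
    for start in range(len(genome) - len(site) + 1):
        window = genome[start:start + len(site)]
        if window == site:
            hits.append((start, "+"))
        # Avoid double-reporting palindromic sites, which match both strands at once.
        if window == rc_site and rc_site != site:
            hits.append((start, "-"))
    return hits

toy_genome = "GGATACGATCCATCGTATTT"  # made-up sequence containing the site on each strand
print(find_sites(toy_genome, "ATACGAT"))
```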
52
Transcription regulation
• The hardware
• The software
• The input
• The output
53
In the absence of Lactose
http://esg-www.mit.edu:8001/esgbio/pge/lac.html
54
The Lac Operon (Jacob and Monod)
In the presence of Lactose
http://esg-www.mit.edu:8001/esgbio/pge/lac.html
55
In the absence of Glucose
http://esg-www.mit.edu:8001/esgbio/pge/lac.html
56
The logic of the Lac operon
regulation
Glucose  Lactose  Activity
+        -        OFF
-        +        ON
-        -        OFF
+        +        OFF
CAP site
Operator
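The table above amounts to a two-input Boolean function. A minimal Python sketch (the function name is hypothetical) that treats expression as strictly ON or OFF, as the slide does:

```python
def lac_operon_on(glucose, lactose):
    """ON only when lactose is present and glucose is absent, per the slide's table."""
    return lactose and not glucose

for glucose in (True, False):
    for lactose in (True, False):
        state = "ON" if lac_operon_on(glucose, lactose) else "OFF"
        print(f"glucose={'+' if glucose else '-'} lactose={'+' if lactose else '-'} -> {state}")
```

In this simplification the operator reads the lactose signal and the CAP site reads the glucose signal, and the operon behaves like "lactose AND NOT glucose".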
57
Genomic Regulatory Logic
58
DNA binding proteins for unique pathways
59
A global map of combinatorial expression
control
Conditions: Heat-shock, Cell cycle, Sporulation, Diauxic shift, MAPK signaling, DNA damage
• High connectivity
• Hubs
• Alternative partners in various conditions
Motifs: STRE, PHO4, CCA, ALPHA1, mRPE8, mRPE57, AFT1, PDR, SWI5, MIG1, mRPE69, RAP1, mRPE72, GCN4, CSRE, SFF', mRPE34, MCB, mRPE58, MCM1, mRPE6, RPN4, ECB, BAS1, SCB, LYS14, ABF1, SFF, STE12, ALPHA2, MCM1', ALPHA1', HAP234, mRRPE, PAC, mRRSE3
60
Transcription regulation
• The hardware
• The software
• The input
• The output
61
AlignACE Example
5’- TCTCTCTCCACGGCTAATTAGGTGATCATGAAAAAATGAAAAATTCATGAGAAAAGAGTCAGACATCGAAACATACAT …HIS7
5’- ATGGCAGAATCACTTTAAAACGTGGCCCCACCCGCTGCACCCTGTGCATTTTGTACGTTACTGCGAAATGACTCAACG …ARO4
5’- CACATCCAACGAATCACCTCACCGTTATCGTGACTCACTTTCTTTCGCATCGCCGAAGTGCCATAAAAAATATTTTTT …ILV6
5’- TGCGAACAAAAGAGTCATTACAACGAGGAAATAGAAGAAAATGAAAAATTTTCGACAAAATGTATAGTCATTTCTATC …THR4
5’- ACAAAGGTACCTTCCTGGCCAATCTCACAGATTTAATATAGTAAATTGTCATGCATATGACTCATCCCGAACATGAAA …ARO1
5’- ATTGATTGACTCATTTTCCTCTGACTACTACCAGTTCAAAATGTTAGAGAAAAATAGAAAAGCAGAAAAAATAAATAA …HOM2
5’- GGCGCCACAGTCCGCGTTTGGTTATCCGGCTGACTCATTCTGACTCTTTTTTGGAAAGTGTGGCATGTGCTTCACACA …PRO3
300-600 bp of upstream sequence
per gene are searched in
Saccharomyces cerevisiae.
62
AlignACE Example
The Best Motif
5’- TCTCTCTCCACGGCTAATTAGGTGATCATGAAAAAATGAAAAATTCATGAGAAAAGAGTCAGACATCGAAACATACAT …HIS7
5’- ATGGCAGAATCACTTTAAAACGTGGCCCCACCCGCTGCACCCTGTGCATTTTGTACGTTACTGCGAAATGACTCAACG …ARO4
5’- CACATCCAACGAATCACCTCACCGTTATCGTGACTCACTTTCTTTCGCATCGCCGAAGTGCCATAAAAAATATTTTTT …ILV6
5’- TGCGAACAAAAGAGTCATTACAACGAGGAAATAGAAGAAAATGAAAAATTTTCGACAAAATGTATAGTCATTTCTATC …THR4
5’- ACAAAGGTACCTTCCTGGCCAATCTCACAGATTTAATATAGTAAATTGTCATGCATATGACTCATCCCGAACATGAAA …ARO1
5’- ATTGATTGACTCATTTTCCTCTGACTACTACCAGTTCAAAATGTTAGAGAAAAATAGAAAAGCAGAAAAAATAAATAA …HOM2
5’- GGCGCCACAGTCCGCGTTTGGTTATCCGGCTGACTCATTCTGACTCTTTTTTGGAAAGTGTGGCATGTGCTTCACACA …PRO3
AAAAGAGTCA
AAATGACTCA
AAGTGAGTCA
AAAAGAGTCA
GGATGAGTCA
AAATGAGTCA
GAATGAGTCA
AAAAGAGTCA
**********
MAP score = 20.37
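The aligned sites above can be summarized as a per-position count matrix and a consensus sequence. The sketch below only summarizes the alignment shown on the slide; it is not the AlignACE search itself (which finds such alignments by Gibbs sampling) and it does not compute the MAP score. Function names are hypothetical.

```python
from collections import Counter

# The eight aligned sites, copied from the slide.
aligned_sites = [
    "AAAAGAGTCA",
    "AAATGACTCA",
    "AAGTGAGTCA",
    "AAAAGAGTCA",
    "GGATGAGTCA",
    "AAATGAGTCA",
    "GAATGAGTCA",
    "AAAAGAGTCA",
]

def count_matrix(sites):
    """One Counter of base counts per alignment column."""
    return [Counter(column) for column in zip(*sites)]

def consensus(sites):
    """Most frequent base at each column."""
    return "".join(col.most_common(1)[0][0] for col in count_matrix(sites))

for position, counts in enumerate(count_matrix(aligned_sites), start=1):
    print(position, dict(counts))
print("consensus:", consensus(aligned_sites))
```

The last six positions (GAGTCA, with one GACTCA variant) are nearly invariant across the sites, while the first four positions vary more.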
63
Transcription regulation
• The hardware
• The software
• The input
• The output
64
Expression regulation of genes determines
complex spatio-temporal patterns
65
Monitor
expression during
cell cycle
[Plot: mRNA expression level over time, spanning two cell cycles (G1 S G2 M G1 S G2 M)]
66
Genes can be clustered based on
time-dependent expression profiles
[Plots: normalized expression against time-point (1, 2, 3) for clustered genes; further axes labeled normalized expression at time-point 1 and at time-point 3]
67
The K-means algorithm
• Start with random
positions of centroids.
Iteration = 0
68
K-means
• Start with random
positions of centroids.
• Assign data points to
centroids
Iteration = 1
69
K-means
• Start with random
positions of centroids.
• Assign data points to
centroids.
• Move centroids to
center of assigned
points.
Iteration = 1
70
K-means
• Start with random
positions of centroids.
• Assign data points to
centroids.
• Move centroids to
center of assigned
points.
• Iterate till minimal
cost.
Iteration = 3
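The steps listed on these slides can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the lecture's implementation: distance is squared Euclidean, random starting centroids are drawn from the data points themselves, and iteration stops when assignments no longer change (which reaches a local minimum of the cost, not necessarily the global one). All names, including the optional `initial_centroids` argument, are hypothetical.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, k, initial_centroids=None, max_iter=100, seed=0):
    """Return (labels, centroids, cost) for k clusters of equal-length profiles."""
    rng = random.Random(seed)
    # Step 1: start with random centroid positions (here: k random data points).
    if initial_centroids is None:
        centroids = [tuple(p) for p in rng.sample(points, k)]
    else:
        centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each data point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # Step 4: assignments are stable, so the cost cannot drop further.
        labels = new_labels
        # Step 3: move each centroid to the center of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_point(members)
    cost = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, cost

# Toy expression profiles over three time-points (invented values).
profiles = [
    (1.0, 0.0, -1.0), (1.1, 0.1, -0.9), (0.9, -0.1, -1.1),   # falling over time
    (-1.0, 0.0, 1.0), (-1.1, 0.1, 0.9), (-0.9, -0.1, 1.1),   # rising over time
]
labels, centroids, cost = kmeans(profiles, k=2)
print(labels, round(cost, 3))
```

Because the outcome depends on the starting centroids, it is common to run the procedure from several random starts and keep the lowest-cost result.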
71
The diauxic shift
Time
72
Genetic reprogramming
of the yeast metabolism
upon glucose depletion
73
At the beginning – when
glucose is abundant
Glucose
2 ADP+Pi
NAD+
2 ATP
NADH
Lactate
Pyruvate
Ferment
Ferment
Ethanol
Respirate
NADH
NAD+
AcetylCoA
TCA
74
~20 hours later
when glucose is depleted
Glucose
2 ADP+Pi
NAD+
2 ATP
NADH
Lactate
O2
O2
Pyruvate
Ferment
Ferment
Ethanol
Respirate
NADH
NAD+
AcetylCoA
TCA
75
The promoter sequences of coexpressed genes
5’- TCTCTCTCCACGGCTAATTAGGTGATCATGAAAAAATGAAAAATTCATGAGAAAAGAGTCAGACATCGAAACATACAT …HIS7
5’- ATGGCAGAATCACTTTAAAACGTGGCCCCACCCGCTGCACCCTGTGCATTTTGTACGTTACTGCGAAATGACTCAACG …ARO4
5’- CACATCCAACGAATCACCTCACCGTTATCGTGACTCACTTTCTTTCGCATCGCCGAAGTGCCATAAAAAATATTTTTT …ILV6
5’- TGCGAACAAAAGAGTCATTACAACGAGGAAATAGAAGAAAATGAAAAATTTTCGACAAAATGTATAGTCATTTCTATC …THR4
5’- ACAAAGGTACCTTCCTGGCCAATCTCACAGATTTAATATAGTAAATTGTCATGCATATGACTCATCCCGAACATGAAA …ARO1
5’- ATTGATTGACTCATTTTCCTCTGACTACTACCAGTTCAAAATGTTAGAGAAAAATAGAAAAGCAGAAAAAATAAATAA …HOM2
5’- GGCGCCACAGTCCGCGTTTGGTTATCCGGCTGACTCATTCTGACTCTTTTTTGGAAAGTGTGGCATGTGCTTCACACA …PRO3
76
Promoter
Motifs and
expression
profiles
CGGCCCCGCGGA
CTCCTCCCCCCCTTC
TGGCCAATCA
ATGTACGGGTG
77