Protein-coding genes


Organisation of human genome
Nuclear genome (3.2 Gbp)
24 types of chromosomes
Chromosome sizes range from Y (51 Mb) to chr1 (279 Mb)
Mitochondrial genome
Exons: 1.5%
Introns ("junk")
Intergenic regions ("junk")
The genome is empty?
Estimated number of genes:
Saccharomyces cerevisiae (baker’s yeast): 6,034
Drosophila melanogaster (fruit fly): 13,061
Caenorhabditis elegans (roundworm): 19,099
Arabidopsis thaliana (mustard plant): 25,000
INCREASING BIOLOGICAL COMPLEXITY DEMANDS
GENOMIC CHANGES THAT INCREASE THE
INFORMATIONAL CAPACITY OF THE SYSTEM...
...BUT THE NUMBER OF GENES IN THE
VARIOUS SEQUENCED GENOMES
DOES NOT MATCH EXPECTATIONS
(APPARENTLY)
Amphimedon queenslandica 18693
Trichoplax adhaerens 11514
Bos taurus >22790
Nematostella vectensis 18000
Nasonia vitripennis 17279
Homo sapiens 21527
Mus musculus 22083
Danio rerio 21413
Drosophila melanogaster 13781
Ciona intestinalis 16000
Takifugu rubripes 18500
Caenorhabditis elegans 20224
Strongylocentrotus purpuratus 23300
Anolis carolinensis 17000
Xenopus tropicalis 18000
Gallus gallus <17000
Arabidopsis thaliana 26000
Gorilla gorilla 21000
Oryza sativa 50000
Pan troglodytes 21000
Populus trichocarpa 45550
Glycine max 75778
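The point of the list above can be checked with a short sketch: sorting the species by gene count shows that humans sit in the middle of the range, not at the top. The counts are copied from the list; entries given only as bounds (such as ">22790") are left out, and the function name `rank_by_gene_count` is just an illustrative choice.

```python
# Gene counts copied from the slide; entries given as bounds (">", "<") are omitted.
gene_counts = {
    "Amphimedon queenslandica": 18693,
    "Trichoplax adhaerens": 11514,
    "Nematostella vectensis": 18000,
    "Nasonia vitripennis": 17279,
    "Homo sapiens": 21527,
    "Mus musculus": 22083,
    "Danio rerio": 21413,
    "Drosophila melanogaster": 13781,
    "Caenorhabditis elegans": 20224,
    "Arabidopsis thaliana": 26000,
    "Oryza sativa": 50000,
    "Populus trichocarpa": 45550,
    "Glycine max": 75778,
}


def rank_by_gene_count(counts):
    """Return (species, count) pairs sorted from most to fewest genes."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_gene_count(gene_counts)
human_rank = [name for name, _ in ranking].index("Homo sapiens") + 1

for name, count in ranking:
    print(f"{name:28s} {count:>6d}")
print(f"Homo sapiens ranks {human_rank} of {len(ranking)}")
```

In this subset the plants and the mouse rank above humans, which is the apparent mismatch the slide refers to.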
Why doesn’t (coding) gene number matter?
• More sophisticated regulation of expression?
• Proteome vastly larger than genome?
– Alternative splicing
– RNA editing
• Post-translational modifications
• Cellular location
…but, remember there are other genes
Genes in the genome:
• Protein-coding genes (mRNA): around 20500
(as of 10/2012)
• Non-coding RNAs
Ribosomal RNA (rRNA)
Transfer RNA (tRNA)
Small nuclear RNA (snRNA)
Small nucleolar RNA (snoRNA)
microRNA (miRNA)
Other non-coding RNAs (Xist, 7SK, etc.)
• Pseudogenes
Non polypeptide–coding: RNA encoding
Statistics about the current Gencode freeze (version 13)
*The statistics derive from the gtf files, which include only the main chromosomes of
the human reference genome.
Version 13 (March 2012 freeze, GRCh37)
General stats
Total No of Genes 55123
Protein-coding genes 20670
Long non-coding RNA genes 12393
Small non-coding RNA genes 9173
Pseudogenes 13123
Total No of Transcripts 182967
Protein-coding transcripts 77901
Long non-coding RNA loci transcripts 19835
Total No of distinct translations 78119
Genes that have more than one distinct translation 14235
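A few ratios make these statistics easier to read. The sketch below uses only the figures quoted above for Gencode version 13; the dictionary keys and variable names are illustrative, and it assumes that the multi-translation count refers to protein-coding genes.

```python
# Figures copied from the Gencode v13 statistics above.
gencode_v13 = {
    "total_genes": 55123,
    "protein_coding_genes": 20670,
    "protein_coding_transcripts": 77901,
    "genes_with_multiple_translations": 14235,
}

# Share of all annotated genes that are protein-coding.
coding_fraction = gencode_v13["protein_coding_genes"] / gencode_v13["total_genes"]

# Mean number of protein-coding transcripts per protein-coding gene.
transcripts_per_gene = (
    gencode_v13["protein_coding_transcripts"] / gencode_v13["protein_coding_genes"]
)

# Share of protein-coding genes with more than one distinct translation
# (assumption: the 14235 figure counts protein-coding genes).
multi_translation_fraction = (
    gencode_v13["genes_with_multiple_translations"]
    / gencode_v13["protein_coding_genes"]
)

print(f"Protein-coding share of genes: {coding_fraction:.1%}")
print(f"Transcripts per protein-coding gene: {transcripts_per_gene:.2f}")
print(f"Genes with >1 distinct translation: {multi_translation_fraction:.1%}")
```

Under these figures, protein-coding genes are a minority of annotated genes, and each one yields several transcripts on average.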
Protein-coding genes (mRNA):
HUMAN genes
and their homology
to genes
from other organisms
CODING GENES
Noncoding regions in coding genes
• Regulatory regions
– RNA polymerase binding site
– Transcription factor binding sites
– Polyadenylation [poly(A)] sites
– Enhancers
• 5’- and 3’-UTRs
DNA as a series of ‘docking’ sites
It is the relative location of these docking sites to one another that
permits genes to be transcribed, spliced, and translated properly
and in specific spatial and temporal patterns.
…some more statistics
• Gene density: about 1 gene per 100 kb (varies widely)
• On average, 9 exons per gene
• 363 exons in the titin gene
• Many genes are intronless
• Largest intron: 800 kb (WWOX gene)
• Smallest introns: 10 bp
• Average 5’ UTR: 0.2-0.3 kb
• Average 3’ UTR: 0.77 kb (probably an underestimate)
• Largest protein: titin (38,138 aa)
• Largest gene: dystrophin
Human genes vary enormously in size and exon content
An example of a complex human gene locus
INK4a-ARF
From: Prof. Gordon Peters website
Genes within genes
Intron 26 of the neurofibromatosis gene (NF1) encodes:
OMGP (oligodendrocyte myelin glycoprotein)
EVI2A and EVI2B
(homologues of ecotropic viral integration sites in mouse)
Why doesn’t gene number matter?
• More sophisticated regulation of expression
• Proteome vastly larger than genome
– Alternative splicing
– RNA editing…
• Post-translational modifications
• Co-option
• Connectivity of gene regulatory networks (GRNs)
DYNAMIC NETWORKS
Table 1. Levels of regulation--loci of control constraints--above the genome.
Levels and transitions
Dynamic regulatory system
1. Genome to transcriptome
Epigenetic regulation of gene expression (5). Includes pathways that detect energy
levels (redox levels) and repress DNA transcription when cellular NADH levels are
increased.
2. Transcriptome to proteome
Regulatory constraints include posttranslational modification of proteins.
3. Proteome to dynamic system
Metabolic networks of glycolysis and mitochondrial oxidation-reduction are the
dynamic systems presently the best understood in terms of both mechanism of
formation and operating principles. They display control distributed over all
enzymes of a network, and their phenotype includes cellular redox potential.
4. Dynamic systems to phenotype
Control of global phenotype such as disease may be localized to a single regulatory
system (such as metabolic, hormone signaling, etc.) or be distributed over many
systems and levels
Gene Expression
• The products of genes may be RNA or protein
• RNA and protein synthesis occur in many steps
• These steps are regulated and controlled
UCSC
Location of CpG islands in the gene
CpG islands do NOT have a deficit
of CpG dinucleotides
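The "deficit" can be made concrete with the observed/expected CpG ratio. The sketch below is a minimal illustration, not a production island finder: the function names are hypothetical, and the default thresholds (length of at least 200 bp, GC content of at least 50%, ratio of at least 0.6) are the commonly cited criteria, applied here to a whole sequence instead of a sliding window.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    return s.count("CG") * len(s) / (c * g)


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the three commonly cited criteria to one whole sequence."""
    return (
        len(seq) >= min_len
        and gc_fraction(seq) >= min_gc
        and cpg_obs_exp(seq) >= min_ratio
    )


# Toy sequences: the first is CpG-rich, the second has the same base
# composition but only one CpG dinucleotide.
print(cpg_obs_exp("CGCGCGCG"))
print(cpg_obs_exp("CCCCGGGG"))
```

A ratio well below 1 indicates CpG depletion, as seen in bulk genomic DNA; CpG islands stay close to the expected frequency.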
How epigenetics works
[Figure: a promoter region containing a CpG island lies upstream of the gene. Legend: CpG; methylated CpG. Panel labels: Gene, RNA, Proteins.]
Unmethylated CpGs relax chromatin
Methylated CpGs constrain chromatin
Chromatin Modification
• Chromatin remodeling: SNF/SWI
• Transcription factor modification: acetylation, phosphorylation
• DNA methylation: CpG dinucleotides, MeCP2
• Histone substitution: H2A.Z, H2A.X, H3.3
• Histone modification: acetylation, ubiquitination, sumoylation, methylation, phosphorylation
Eukaryotic transcription regulation
Modular construction and combinatorial control
• The regulatory sequence (cis element) on DNA
consists of multiple motifs specific for
transcription factors.
• Multiple transcription factors can bind
simultaneously to the regulatory sequences
and act together on the transcription of the
gene.
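Combinatorial control can be illustrated with a toy Boolean model. Everything in this sketch is an assumption made for illustration: the factor names `TF_A`, `TF_B` and `TF_C` are hypothetical, and the rule "all required activators bound, no repressor bound" is a deliberate simplification of real promoter logic.

```python
from itertools import product


def regulatory_states(factors):
    """Enumerate every bound/unbound combination of the given factors."""
    return [
        dict(zip(factors, bound))
        for bound in product((False, True), repeat=len(factors))
    ]


def is_transcribed(state, required, repressors=()):
    """Toy rule: all required activators bound and no repressor bound."""
    return all(state[f] for f in required) and not any(state[f] for f in repressors)


factors = ["TF_A", "TF_B", "TF_C"]  # hypothetical factor names
states = regulatory_states(factors)

# The gene fires only when TF_A and TF_B are bound and TF_C is absent.
active = [
    s for s in states
    if is_transcribed(s, required=("TF_A", "TF_B"), repressors=("TF_C",))
]

print(f"{len(states)} possible states, {len(active)} of them active")
```

With n independent factors there are 2^n binding states, so a handful of factors can specify many distinct expression patterns.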
[Figure: Regulated transcription of Gene X. Labels: co-activator protein; general transcription factors; TBP; transcriptional activators binding to promoter region; TATA (-35).]
Activators stimulate the highly cooperative
assembly of initiation complexes
Binding sites for activators that control transcription of the mouse TTR gene
Figure 10-60
Model for cooperative assembly of an activated
transcription-initiation complex in the TTR promoter
Figure 10-61
(TTR= transthyretin)
Distant Cis-Acting Elements
Locus Control Region
Regulatory site required for optimal expression
of adjacent group of genes
Insulator Element
Prevents activation/repression extending to an adjacent
regulatory sequence
ALTERNATIVE PROMOTERS
SEX-SPECIFIC REGULATION IN THE DNMT1 GENE
(METHYLTRANSFERASE):
OOCYTE, SOMATIC, OR SPERMATOCYTE PROMOTERS
Posttranscriptional control
• Regulation of RNA processing
• Regulation of mRNA degradation
• Regulation of translation
mRNA: many places for variation, modification, regulation
• transcription
– initiation
– elongation
– termination
• 5’ capping
• 3’ polyA addition
• splicing
– alternative sites
– alternative exons
– self-splicing, spliceosome-mediated
• editing
– changing bases and codons
• nuclear export
– mature mRNA only
• stability
– nonsense-mediated decay
– degradation signals
– sequestration
• localization in cytoplasmic compartments
– access to translation machinery
• antisense/RNA interference
– inhibit translation
The PolyA Site (PAS)
[Figure: 3’ exon with the stop codon and the UTR; the polyA signal AATAAA (variant ATTAAA) lies ~17 nt from the PAS, which is followed by the poly(A) tail.]
Alternative polyadenylation sites
Alternative PAS &
Post-transcriptional (de)regulation
[Figure: coding sequence followed by a 3' UTR that contains several AUUAAA signals and possible regulatory elements (stability, translation, transport).]
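Candidate polyA signals in a 3' UTR can be located with a simple motif scan. This is a minimal sketch under stated assumptions: it looks only for the AAUAAA hexamer and the AUUAAA variant shown on the slides, the function name `find_polya_signals` and the toy sequence are hypothetical, and real site usage also depends on surrounding sequence elements.

```python
import re

# Canonical hexamer and the variant from the slide, written as DNA.
# The lookahead lets overlapping candidates be reported as well.
PAS_PATTERN = re.compile(r"(?=(AATAAA|ATTAAA))")


def find_polya_signals(utr):
    """Return (position, hexamer) for every candidate polyA signal."""
    dna = utr.upper().replace("U", "T")
    return [(m.start(), m.group(1)) for m in PAS_PATTERN.finditer(dna)]


toy_utr = "GCCAUUAAAGGCUUCCGGAAUAAAGCUAGC"  # hypothetical 3' UTR
for position, hexamer in find_polya_signals(toy_utr):
    print(position, hexamer)
```

Each hit marks a potential alternative site; using an upstream signal yields a shorter 3' UTR that may lack downstream regulatory elements.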
Use of an abnormal polyA site is associated with various diseases:
α/β-thalassemia (globin)
Mantle cell lymphoma (cyclin D1, CCND1)
Teratocarcinoma (PDGF)
Hypertension (Ca2+ ATPase)
Consensus nucleotides at
intron/exon junctions
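The best-known part of this consensus is that most introns begin with GU and end with AG. The sketch below checks that rule and joins the exons; the function names, the toy pre-mRNA and the intron coordinates are hypothetical, and the fuller consensus around the junctions is not modelled.

```python
def is_canonical_intron(intron):
    """True if the intron follows the GU...AG rule (GT...AG in DNA)."""
    rna = intron.upper().replace("T", "U")
    return len(rna) >= 4 and rna.startswith("GU") and rna.endswith("AG")


def splice(pre_mrna, intron_spans):
    """Remove introns given as half-open (start, end) spans and join the exons.

    Raises ValueError for any intron that breaks the GU...AG rule.
    """
    exons, cursor = [], 0
    for start, end in sorted(intron_spans):
        intron = pre_mrna[start:end]
        if not is_canonical_intron(intron):
            raise ValueError(f"non-canonical intron at {start}-{end}: {intron}")
        exons.append(pre_mrna[cursor:start])
        cursor = end
    exons.append(pre_mrna[cursor:])
    return "".join(exons)


# Toy pre-mRNA: exon 1, one intron, exon 2 (all hypothetical).
exon1, intron, exon2 = "AUGGCU", "GUAAGUCCUUUCAG", "GGUUAA"
pre_mrna = exon1 + intron + exon2
mature = splice(pre_mrna, [(len(exon1), len(exon1) + len(intron))])
print(mature)
```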
Alternative splicing is a mechanism for
Generating functional diversity
Alternative processing example
RNA editing
RNA editing is a rare form of post-transcriptional processing
whereby base-specific changes are enzymatically introduced at the
RNA level. Types of RNA editing in humans:
(i) C ---> U, catalyzed in humans by a specific cytidine deaminase,
e.g. the expression of the human apolipoprotein B gene in the
intestine involves tissue-specific RNA editing
(ii) A ---> I, the amino group on carbon 6 of adenine is replaced
by a carbonyl group. I then acts as a G. Occurs in some ligand-gated ion channels.
(iii) U ---> C, in mRNA of the WT1 Wilms’ tumor gene
(iv) U ---> A, in alpha-galactosidase mRNA
Apo B-100
Apo B-48
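The two apolipoprotein B forms arise because C ---> U editing in the intestine turns a CAA (Gln) codon into a UAA stop codon, so translation ends early. The sketch below shows the principle on a toy mRNA; the sequence and the function names are hypothetical, and only the standard genetic code is assumed.

```python
# Standard genetic code, built in the usual U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def translate(mrna):
    """Translate from the first base until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def edit_c_to_u(mrna, position):
    """Return the mRNA with the C at `position` deaminated to U."""
    if mrna[position] != "C":
        raise ValueError("C-to-U editing requires a C at the target position")
    return mrna[:position] + "U" + mrna[position + 1:]


unedited = "AUGGCUCAAGGUUAA"          # toy mRNA: Met-Ala-Gln-Gly-stop
edited = edit_c_to_u(unedited, 6)      # CAA (Gln) becomes UAA (stop)

print(translate(unedited))  # full-length product, analogous to Apo B-100
print(translate(edited))    # truncated product, analogous to Apo B-48
```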
Gene Expression
• The products of genes may be RNA or protein
• RNA and protein synthesis occur in many steps
• These steps are frequently regulated
3. Protein Phosphorylation
Post-translational modifications that alter activity of the p53 protein.
Enzymes that have been shown to modify specific amino acid residues of
p53 are shown. Enzymes that inhibit the covalent modifications are indicated
in red. P, phosphorylation; R, ribosylation; Ac, acetylation.
…increasing informational capability of the genome,
but there are other genes….