Introductory Biology Primer - A computational tour of the human

A Zero-Knowledge Based Introduction to Biology
Cory McLean
26 Sep 2008
Thanks to George Asimenos
Cells: Building Blocks of Life
cell, nucleus, cytoplasm, mitochondrion
[Figure: a cell. © 1997-2005 Coriell Institute for Medical Research]
DNA: “Blueprints” for a cell
• Genetic information encoded in long strings of
double-stranded DNA
• DNA (DeoxyriboNucleic Acid) comes in only four
“flavors” of nucleotide: Adenine (A), Cytosine (C), Guanine (G), Thymine (T)
Nucleotide
deoxyribose, nucleotide, base, A, C, G, T, purine, pyrimidine, 3’, 5’
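[Figure: structure of a DNA nucleotide: a phosphate group (linking to the previous nucleotide), a deoxyribose sugar with its 5’ and 3’ carbons labeled (the 3’ end links to the next nucleotide), and a base. Purines: Adenine (A), Guanine (G). Pyrimidines: Thymine (T), Cytosine (C).]
Let’s write “AGACC”!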
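“AGACC” (backbone)
[Figure: the sugar-phosphate backbone of “AGACC”]
“AGACC” (DNA)
deoxyribonucleic acid (DNA)
[Figure: “AGACC” as a complete DNA strand, with its 5’ and 3’ ends labeled]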
DNA is double stranded
strand, reverse complement
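[Figure: two antiparallel strands: one runs 5’→3’, its partner runs 3’→5’]
DNA is always written 5’ to 3’, so the same double-stranded molecule can be
written as either strand: AGACC or its reverse complement, GGTCT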
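The reverse-complement rule fits in a few lines of Python. This is a minimal sketch that assumes DNA is stored as an uppercase string over A, C, G, T. The name `reverse_complement` is our own and does not come from any library.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the partner strand of `dna`, written 5' to 3'.

    The partner strand runs antiparallel, so we complement every base
    and then reverse the result to read it in the 5'->3' direction.
    """
    return dna.translate(COMPLEMENT)[::-1]


print(reverse_complement("AGACC"))  # GGTCT
```

Complementing and reversing each undo themselves, so applying the function twice returns the original strand.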
DNA Packaging
histone, nucleosome, chromatin, chromosome, centromere, telomere
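[Figure: DNA wraps ~146bp around a core of histones H2A, H2B, H3, and H4 to form a nucleosome. Histone H1 binds the DNA linking adjacent nucleosomes. Strings of nucleosomes fold into chromatin, which condenses into a chromosome with a centromere and a telomere at each end.]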
The Genome
The genome is the full set of hereditary
information for an organism
• Humans bundle two copies of the genome into
46 chromosomes in nearly every cell
• 46 = 2 × (chromosomes 1-22 + X or Y)
Building an Organism
[Figure: DNA inside a cell]
Every cell has the same sequence of DNA
The subset of the DNA sequence that each cell uses
determines the identity and
function of that cell
From DNA To Organism
Proteins do most of the work in biology, and
are encoded by subsequences of DNA, known
as genes.
RNA
ribose, ribonucleotide, U
[Figure: structure of a ribonucleotide. It matches the DNA nucleotide except that the sugar is ribose, which carries an extra OH group, and Uracil (U) replaces Thymine (T). Purines: Adenine (A), Guanine (G). Pyrimidines: Uracil (U), Cytosine (C).]
Genes & Proteins
gene, transcription, translation, protein
Double-stranded DNA
5’ TAGGATCGACTATATGGGATTACAAAGCATTTAGGGA...TCACCCTCTCTAGACTAGCATCTATATAAAACAGAA 3’
3’ ATCCTAGCTGATATACCCTAATGTTTCGTAAATCCCT...AGTGGGAGAGATCTGATCGTAGATATATTTTGTCTT 5’
(transcription)
Single-stranded RNA
AUGGGAUUACAAAGCAUUUAGGGA...UCACCCUCUCUAGACUAGCAUCUAUAUAA
(translation)
protein
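In the example above, the RNA uses the same letters as the top (5’→3’) DNA strand from ATG onward, with U in place of T. The following is a minimal Python sketch of that correspondence. It assumes the coding strand is given 5’→3’ as an uppercase string, and `transcribe_coding_strand` is a hypothetical helper name. The cell actually builds the RNA by reading the opposite strand. This shortcut only reproduces the result.

```python
def transcribe_coding_strand(coding: str) -> str:
    """Return the RNA whose sequence matches `coding` (DNA, 5'->3').

    The RNA has the same sequence as the coding strand,
    except that uracil (U) takes the place of thymine (T).
    """
    return coding.replace("T", "U")


print(transcribe_coding_strand("ATGGGATTACAAAGCATTTAGGGA"))
# AUGGGAUUACAAAGCAUUUAGGGA
```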
Gene Transcription
promoter
5’ G A T T A C A . . . 3’
3’ C T A A T G T . . . 5’
Gene Transcription
transcription factor, binding site, RNA polymerase
5’ G A T T A C A . . . 3’
3’ C T A A T G T . . . 5’
Transcription factors: a type of protein that
binds to DNA and helps initiate gene
transcription.
Transcription factor binding sites: Short
sequences of DNA (6-20 bp) recognized and
bound by TFs.
RNA polymerase binds a complex of TFs in
the promoter.
Gene Transcription
[Figure: the 5’→3’ and 3’→5’ strands pulled apart]
The two strands are separated
Gene Transcription
[Figure: the separated 5’→3’ and 3’→5’ strands]
An RNA copy of the 5’→3’ strand is created by pairing
ribonucleotides with the 3’→5’ template strand
Gene Transcription
5’ G A T T A C A . . . 3’
3’ C T A A T G T . . . 5’   (template)
pre-mRNA: 5’ G A U U A C A . . . 3’
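The pairing step can be written as a minimal Python sketch. It assumes, as on the slide, that the template strand is written left to right in its 3’→5’ direction, and the names are hypothetical. Each template base becomes the ribonucleotide that pairs with it: A with U, C with G, G with C, T with A.

```python
# RNA base that pairs with each DNA template base (template A pairs with RNA U).
DNA_TO_RNA_PAIR = str.maketrans("ACGT", "UGCA")


def transcribe_template(template_3_to_5: str) -> str:
    """Build the pre-mRNA from a template strand written 3'->5'.

    Each new ribonucleotide pairs with the template base opposite it,
    so the RNA comes out 5'->3' in the same left-to-right order.
    """
    return template_3_to_5.translate(DNA_TO_RNA_PAIR)


print(transcribe_template("CTAATGT"))  # GAUUACA
```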
RNA Processing
5’ cap, polyadenylation, exon, intron, splicing, UTR, mRNA
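[Figure: RNA processing. The pre-mRNA gains a 5’ cap and a poly(A) tail, and its introns are spliced out. The result is a mature mRNA of joined exons, flanked by a 5’ UTR and a 3’ UTR.]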
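Splicing can be modeled as cutting out the introns and concatenating the exons. This is a minimal Python sketch that assumes the exon boundaries are already known and given as 0-based, half-open (start, end) coordinates. The function name, the sequence, and the coordinates are made up for illustration. The sketch does not model the 5’ cap or the poly(A) tail.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments of `pre_mrna`, dropping everything between them.

    `exons` holds (start, end) pairs using 0-based, half-open coordinates,
    listed in 5'->3' order; the regions between them are the introns.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Made-up example: exon "AUG", intron "GUAAGUUUAG", exon "GCC".
pre = "AUGGUAAGUUUAGGCC"
print(splice(pre, [(0, 3), (13, 16)]))  # AUGGCC
```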
Gene Structure
[Figure: gene layout from 5’ to 3’: promoter, 5’ UTR, exons alternating with introns, 3’ UTR, with coding and non-coding regions marked]
How many?
(Human Genome)
• Exons per gene: ~8 on average (max: 148)
• Nucleotides per exon: 170 on average (max: 12k)
• Nucleotides per intron: 5,500 on average (max: 500k)
• Nucleotides per gene: 45k on average (max: 2.2M)
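As a rough consistency check on these averages, take a "typical" gene with 8 exons, which leaves 7 introns between them. Averages of skewed distributions do not combine exactly, so this back-of-the-envelope figure is only expected to fall in the same range as the 45k average:

```latex
\underbrace{8 \times 170}_{\text{exons}}
  + \underbrace{7 \times 5{,}500}_{\text{introns}}
  = 1{,}360 + 38{,}500
  \approx 40{,}000 \text{ nt per gene}
```

Even as an estimate, it shows that introns make up the vast majority of a typical gene's length (roughly 97% here).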
From RNA to Protein
• Proteins are long strings of amino acids joined
by peptide bonds
• Translation from RNA sequence to amino acid
sequence performed by ribosomes
• 20 amino acids ⇒ 3 RNA letters are required to
specify a single amino acid (4² = 16 two-letter words are
too few; 4³ = 64 three-letter words are enough)
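The counting argument can be checked mechanically by finding the smallest word length k for which 4^k reaches the number of symbols to encode. A minimal Python sketch (the helper name `min_code_length` is hypothetical):

```python
def min_code_length(n_symbols: int, alphabet_size: int = 4) -> int:
    """Smallest word length k such that alphabet_size**k >= n_symbols."""
    k = 1
    while alphabet_size ** k < n_symbols:
        k += 1
    return k


print(min_code_length(20))  # 3
```

The answer stays 3 even if "stop" is counted as a 21st symbol, because 64 is well above 21.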
Amino acid
amino acid
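[Figure: general structure of an amino acid: an amino group (NH₂), a central carbon carrying a hydrogen (H) and a variable side chain (R), and a carboxyl group (COOH)]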
Alanine
Arginine
Asparagine
Aspartate
Cysteine
Glutamate
Glutamine
Glycine
Histidine
Isoleucine
Leucine
Lysine
Methionine
Phenylalanine
Proline
Serine
Threonine
Tryptophan
Tyrosine
Valine
There are 20 standard amino acids
Proteins
N-terminus, C-terminus
[Figure: amino acids (aa) joined by peptide bonds into a chain. The N-terminus (start) is encoded toward the mRNA’s 5’ end. The C-terminus (end) is encoded toward its 3’ end.]
Translation
ribosome, codon
[Figure: a ribosome on an mRNA, with its P site and A site labeled]
The ribosome (a complex of protein and
RNA) synthesizes a protein by reading the
mRNA in triplets (codons). Each codon, apart from
the three stop codons, is translated to an amino acid.
Translation
Genetic code (column = second base of the codon; each block shares the first base):
  U                              C                     A                        G
First base U
  UUU Phenylalanine (Phe)        UCU Serine (Ser)      UAU Tyrosine (Tyr)       UGU Cysteine (Cys)
  UUC Phe                        UCC Ser               UAC Tyr                  UGC Cys
  UUA Leucine (Leu)              UCA Ser               UAA STOP                 UGA STOP
  UUG Leu                        UCG Ser               UAG STOP                 UGG Tryptophan (Trp)
First base C
  CUU Leucine (Leu)              CCU Proline (Pro)     CAU Histidine (His)      CGU Arginine (Arg)
  CUC Leu                        CCC Pro               CAC His                  CGC Arg
  CUA Leu                        CCA Pro               CAA Glutamine (Gln)      CGA Arg
  CUG Leu                        CCG Pro               CAG Gln                  CGG Arg
First base A
  AUU Isoleucine (Ile)           ACU Threonine (Thr)   AAU Asparagine (Asn)     AGU Serine (Ser)
  AUC Ile                        ACC Thr               AAC Asn                  AGC Ser
  AUA Ile                        ACA Thr               AAA Lysine (Lys)         AGA Arginine (Arg)
  AUG Methionine (Met) or START  ACG Thr               AAG Lys                  AGG Arg
First base G
  GUU Valine (Val)               GCU Alanine (Ala)     GAU Aspartic acid (Asp)  GGU Glycine (Gly)
  GUC Val                        GCC Ala               GAC Asp                  GGC Gly
  GUA Val                        GCA Ala               GAA Glutamic acid (Glu)  GGA Gly
  GUG Val                        GCG Ala               GAG Glu                  GGG Gly
Translation
5’ ...AUUAUGGCCUGGACUUGA... 3’
  AUU  UTR
  AUG  Met (start codon)
  GCC  Ala
  UGG  Trp
  ACU  Thr
  UGA  stop codon
Translation
5’ ...AUUAUGGCCUGGACUUGA... 3’
Translation
5’ ...AUUAUGGCCUGGACUUGA... 3’
[Figure: the growing amino acid chain: Met, Ala, Trp]
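The codon table and the start/stop rules from this example can be combined into a short translator. This is a minimal Python sketch with our own names. It assumes translation begins at the first AUG in the string and ends at the first stop codon. That first-AUG rule is a simplification of how ribosomes actually choose a start codon.

```python
from itertools import product

BASES = "UCAG"
# Three-letter codes in standard table order: first base, then second,
# then third, each cycling through U, C, A, G (64 entries).
AMINO_ACIDS = (
    "Phe Phe Leu Leu Ser Ser Ser Ser Tyr Tyr Stop Stop Cys Cys Stop Trp "
    "Leu Leu Leu Leu Pro Pro Pro Pro His His Gln Gln Arg Arg Arg Arg "
    "Ile Ile Ile Met Thr Thr Thr Thr Asn Asn Lys Lys Ser Ser Arg Arg "
    "Val Val Val Val Ala Ala Ala Ala Asp Asp Glu Glu Gly Gly Gly Gly"
).split()
# product() varies the last base fastest, matching the order above.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> list[str]:
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    protein = []
    # Step one codon at a time; stop before an incomplete trailing codon.
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "Stop":
            break
        protein.append(aa)
    return protein


print(translate("AUUAUGGCCUGGACUUGA"))  # ['Met', 'Ala', 'Trp', 'Thr']
```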
Errors?
mutation
• What if the transcription / translation machinery makes
mistakes?
• What is the effect of mutations in coding regions?
Reading Frames
reading frame
G C U U G U U U A C G A A U U A G
Frame 1: GCU UGU UUA CGA AUU AG
Frame 2: G CUU GUU UAC GAA UUA G
Frame 3: GC UUG UUU ACG AAU UAG
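Each of the three forward reading frames comes from starting the codon split at offset 0, 1, or 2. A minimal Python sketch (the helper name `reading_frames` is hypothetical) that drops any incomplete codons at the ends:

```python
def reading_frames(seq: str) -> list[list[str]]:
    """Split `seq` into codons in each of its three forward reading frames.

    Frame f starts at offset f (0, 1 or 2); leftover bases that do not
    fill a complete codon at either end are dropped.
    """
    return [
        [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
        for offset in range(3)
    ]


for frame in reading_frames("GCUUGUUUACGAAUUAG"):
    print(" ".join(frame))
# GCU UGU UUA CGA AUU
# CUU GUU UAC GAA UUA
# UUG UUU ACG AAU UAG
```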
Synonymous Mutation
synonymous (silent) mutation, fourfold site
G C U U G U U U A C G A A U U A G
Ala   Cys   Leu   Arg   Ile
G C U U G U U U G C G A A U U A G   (A → G: UUA → UUG)
Ala   Cys   Leu   Arg   Ile
Missense Mutation
missense mutation
G C U U G U U U A C G A A U U A G
Ala   Cys   Leu   Arg   Ile
G C U U G G U U A C G A A U U A G   (U → G: UGU → UGG)
Ala   Trp   Leu   Arg   Ile
Nonsense Mutation
nonsense mutation
G C U U G U U U A C G A A U U A G
Ala   Cys   Leu   Arg   Ile
G C U U G A U U A C G A A U U A G   (U → A: UGU → UGA)
Ala   STOP
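The three substitution examples differ only in what happens to the affected codon. This is a minimal Python sketch that classifies a single-base substitution. It assumes the sequence is read in frame from its first base and that the original codon is not itself a stop codon. To stay short, it uses only the codons that appear on these slides, so other sequences would need the full codon table. The function name is hypothetical.

```python
# Excerpt of the standard codon table: only the codons used on these slides.
CODONS = {
    "GCU": "Ala", "UGU": "Cys", "UGG": "Trp", "UGA": "Stop",
    "UUA": "Leu", "UUG": "Leu", "CGA": "Arg", "AUU": "Ile",
}


def classify_substitution(seq: str, pos: int, new_base: str) -> str:
    """Classify replacing seq[pos] with new_base (reading frame starts at 0).

    Assumes the original codon codes for an amino acid, not a stop.
    """
    codon_start = pos - pos % 3  # first base of the codon containing pos
    old_codon = seq[codon_start:codon_start + 3]
    mutated = seq[:pos] + new_base + seq[pos + 1:]
    new_codon = mutated[codon_start:codon_start + 3]

    old_aa, new_aa = CODONS[old_codon], CODONS[new_codon]
    if old_aa == new_aa:
        return "synonymous"
    if new_aa == "Stop":
        return "nonsense"
    return "missense"


seq = "GCUUGUUUACGAAUUAG"
# UUA -> UUG: Leu stays Leu
print(classify_substitution(seq, 8, "G"))  # synonymous
# UGU -> UGG: Cys becomes Trp
print(classify_substitution(seq, 5, "G"))  # missense
# UGU -> UGA: Cys becomes a stop codon
print(classify_substitution(seq, 5, "A"))  # nonsense
```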
Frameshift
frameshift
G C U U G U U U A C G A A U U A G
Ala   Cys   Leu   Arg   Ile
G C U U G U U A C G A A U U A G   (one U deleted)
Ala   Cys   Tyr   Glu   Leu
Gene Expression Regulation
regulation, signal transduction
• When should each gene be expressed?
• Regulate gene expression
Examples:
– Make more of gene A’s product when substance X is present
– Stop making gene B’s product once there is enough
– Express genes C1, C2, C3 simultaneously
• Why? Every cell has the same DNA, but each cell expresses
different proteins.
• Signal transduction: one signal converted to another
– Cascades have “master regulators” that turn on many proteins,
which in turn each turn on many proteins, ...
Gene Regulation

Gene expression is controlled at many levels:
• DNA chromatin structure
• Transcription
• Post-transcriptional modification
• RNA transport
• Translation
• mRNA degradation
• Post-translational modification
Transcription Regulation
• Much gene regulation occurs at the
level of transcription.
• Primary players:
– Binding sites (BS) in cis-regulatory modules
(CRMs)
– Transcription factor (TF) proteins
– RNA polymerase II
• Primary mechanism:
– TFs bind to BSs
– Complex of TFs forms
– Complex assists or inhibits formation of the
RNA polymerase II machinery
Tx Factor Binding Sites
• Short, degenerate DNA sequences recognized
by particular TFs
• For complex organisms, cooperative binding of
multiple TFs required to initiate transcription
[Figure: binding sequence logo]
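Degenerate binding sites are often summarized as a consensus string written with IUPAC ambiguity codes (for example, W = A or T; R = A or G). This is a coarser summary than the per-position base frequencies a sequence logo depicts. The minimal Python sketch below scans one strand for such a consensus using a regular expression. The motif "TATAWR" and the sequence are made up for illustration and are not a real TF's binding site.

```python
import re

# IUPAC nucleotide codes for degenerate positions (a subset).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def find_sites(dna: str, motif: str) -> list[int]:
    """Return 0-based start positions where `motif` matches `dna`.

    The lookahead (?=...) lets matches overlap. Only the given strand is
    scanned; a fuller search would also check the reverse complement.
    """
    pattern = "".join(IUPAC[base] for base in motif)
    return [m.start() for m in re.finditer(f"(?=({pattern}))", dna)]


# Hypothetical degenerate motif and toy sequence, for illustration only.
print(find_sites("GGTATAAAGCTATATG", "TATAWR"))  # [2, 10]
```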
Transcription Regulation Mechanisms
enhancer, silencer, insulator
[Figure panels: transcription factor specificity; enhancer; silencer; insulator]
Unicellular vs. Multicellular
[Figure: unicellular vs. multicellular organisms]
Non-coding RNAs
• RNAs transcribed from DNA but not translated
into protein
• Structural ncRNAs: conserved secondary
structure formed by base pairs (A-U, C-G, G-U)
• Many are involved in gene regulation
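The pairing rule for structural ncRNAs can be expressed as a check on the two arms of a hairpin stem, which run antiparallel. This minimal Python sketch assumes both arms are given 5’→3’ and accepts the A-U, C-G, and G-U pairs listed above. The helper name is hypothetical.

```python
# Base pairs allowed in RNA secondary structure: Watson-Crick plus G-U.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def can_form_stem(left: str, right: str) -> bool:
    """True if `left` (5'->3') can pair base-by-base with `right` (5'->3').

    The two arms of a stem are antiparallel, so left[0] pairs with
    right[-1], left[1] with right[-2], and so on.
    """
    return len(left) == len(right) and all(
        (a, b) in PAIRS for a, b in zip(left, reversed(right))
    )


# G-C, C-G, A-U, U-A
print(can_form_stem("GCAU", "AUGC"))  # True
# G-C, then a G-U pair
print(can_form_stem("GG", "UC"))  # True
# A cannot pair with G
print(can_form_stem("AA", "GG"))  # False
```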
Summary
• All hereditary information is encoded in double-stranded DNA
• Each cell in an organism has the same DNA
• DNA → RNA → protein
• Proteins have many diverse roles in the cell
• Gene regulation diversifies protein products
within different cells
The end?