DNA and Gene Expression

Deoxyribonucleic Acid (DNA)
• Two sugar-phosphate strands joined by
pairs of four bases
– Adenine (A), thymine (T), guanine (G),
cytosine (C)
– A pairs with T, G pairs with C
• Self replicating molecule
• Directs protein synthesis
DNA Structure
<static.howstuffworks.com/gif/dna-2.jpg>
<static.howstuffworks.com/gif/dna-base-pairings.gif>
DNA Replication
• Results in two complete double helixes of
DNA
• How nucleotides are added in DNA
replication (animation)
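The pairing rule (A with T, G with C) is enough to sketch why replication yields two complete helices. This is a minimal illustration, not a model of the real enzymes: the function names are made up for this example, and strand direction (5' to 3') is ignored.

```python
# Pairing rules from the slides: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base partner strand (same left-to-right order)."""
    return "".join(PAIRS[base] for base in strand)


def replicate(helix: tuple[str, str]) -> list[tuple[str, str]]:
    """Separate the two strands and build a new partner for each old strand."""
    strand_a, strand_b = helix
    return [
        (strand_a, complement(strand_a)),  # old strand A + new partner
        (complement(strand_b), strand_b),  # new partner + old strand B
    ]


original = ("AAGCCTA", complement("AAGCCTA"))
copies = replicate(original)
print(copies)
```

Each copy keeps one old strand and gains one new strand, so both copies carry the same sequence as the original helix.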
Genome
• Perhaps 30,000 genes in the human genome
• Genes range from 1,000 to 2 million base
pairs
Protein Synthesis
• 20 amino acids, despite 64 possible triplet
combinations of the 4 bases; several codons
duplicate the same amino acid
• Codons
– Sequences of three bases
– Each codes for an amino acid (or “stop” signal)
• Amino acids assembled into proteins
• Only about 2% of genome involved in
protein synthesis
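The figure of 64 follows from three positions with four possible bases each (4 × 4 × 4). A short sketch that enumerates the triplets; the variable names are only illustrative.

```python
from itertools import product

BASES = "ATGC"

# Every ordered triplet of the four bases.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

AMINO_ACIDS = 20
STOP_SIGNAL = 1  # "stop" counts as one extra meaning

print(len(codons))                                # number of possible triplets
print(len(codons) / (AMINO_ACIDS + STOP_SIGNAL))  # average triplets per meaning
```

With more triplets than meanings, most amino acids must be coded by more than one codon.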
Genetic Code
Amino Acid       DNA Codons (template strand)
Alanine          CGA, CGG, CGT, CGC
Arginine         GCA, GCG, GCT, GCC, TCT, TCC
Asparagine       TTA, TTG
Aspartic acid    CTA, CTG
Cysteine         ACA, ACG
Glutamic acid    CTT, CTC
Glutamine        GTT, GTC
Glycine          CCA, CCG, CCT, CCC
Histidine        GTA, GTG
Isoleucine       TAA, TAG, TAT
Leucine          AAT, AAC, GAA, GAG, GAT, GAC
Lysine           TTT, TTC
Methionine       TAC
Phenylalanine    AAA, AAG
Proline          GGA, GGG, GGT, GGC
Serine           AGA, AGG, AGT, AGC, TCA, TCG
Threonine        TGA, TGG, TGT, TGC
Tryptophan       ACC
Tyrosine         ATA, ATG
Valine           CAA, CAG, CAT, CAC
(Stop signals)   ATT, ATC, ACT
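The genetic code can be treated as a lookup table. The sketch below assumes the triplets are read from the DNA template strand, following the standard genetic code (so leucine includes AAT and serine includes TCA). The function name `translate_template` is made up for this example.

```python
# DNA template-strand triplets grouped by amino acid (standard genetic code).
GENETIC_CODE = {
    "Alanine": "CGA CGG CGT CGC",
    "Arginine": "GCA GCG GCT GCC TCT TCC",
    "Asparagine": "TTA TTG",
    "Aspartic acid": "CTA CTG",
    "Cysteine": "ACA ACG",
    "Glutamic acid": "CTT CTC",
    "Glutamine": "GTT GTC",
    "Glycine": "CCA CCG CCT CCC",
    "Histidine": "GTA GTG",
    "Isoleucine": "TAA TAG TAT",
    "Leucine": "AAT AAC GAA GAG GAT GAC",
    "Lysine": "TTT TTC",
    "Methionine": "TAC",
    "Phenylalanine": "AAA AAG",
    "Proline": "GGA GGG GGT GGC",
    "Serine": "AGA AGG AGT AGC TCA TCG",
    "Threonine": "TGA TGG TGT TGC",
    "Tryptophan": "ACC",
    "Tyrosine": "ATA ATG",
    "Valine": "CAA CAG CAT CAC",
    "Stop": "ATT ATC ACT",
}

# Invert the table: one entry per codon.
CODON_TO_AA = {
    codon: amino_acid
    for amino_acid, codons in GENETIC_CODE.items()
    for codon in codons.split()
}


def translate_template(dna: str) -> list[str]:
    """Read complete triplets left to right, stopping at a stop signal."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TO_AA[dna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


print(translate_template("CGACTATGA"))
```

All 64 triplets appear exactly once in the lookup, which is a quick consistency check on the table.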
Mutations
• Mistakes made in copying DNA
• Produce different alleles (called
polymorphisms)
• Mutations in gametes are transmitted
faithfully unless natural selection intervenes
Single-Base Mutations
• Can either change or remove a base from a codon
• Changing one base for another
– Generally less likely to have an effect
• Removal of base
– More problematic; shifts the reading of the triplet code
– CGA-CTA-TGA --> CAC-TAT-GA…
– Alanine - aspartic acid - threonine --> valine - isoleucine…
• Changing amino acid
– No, small, or large effect on protein production
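The difference between changing and removing a base can be sketched with the slide's own sequence, CGA-CTA-TGA. The lookup holds only the template-strand triplets needed here, and the substitution (CGA to CGG) is an illustrative choice, not one taken from the slides.

```python
# Subset of the genetic code (DNA template-strand triplets) for this example.
CODON_TO_AA = {
    "CGA": "alanine", "CGG": "alanine", "CTA": "aspartic acid",
    "TGA": "threonine", "CAC": "valine", "TAT": "isoleucine",
}


def read_codons(dna: str) -> list[str]:
    """Split a sequence into consecutive triplets (the last may be incomplete)."""
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna: str) -> list[str]:
    # Incomplete or unlisted triplets are reported as "?".
    return [CODON_TO_AA.get(codon, "?") for codon in read_codons(dna)]


original = "CGACTATGA"
substituted = original[:2] + "G" + original[3:]  # third base changed: CGA -> CGG
deleted = original[:1] + original[2:]            # second base removed

for label, dna in [("original", original), ("substituted", substituted), ("deleted", deleted)]:
    print(label, "-".join(read_codons(dna)), translate(dna))
```

The substitution leaves the amino acids unchanged, while the deletion shifts every later triplet, giving CAC-TAT-GA as on the slide.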
Multi-base Mutations
• Some genes can have multiple mutations at
different locations
• Complicates matters enormously for
functionality and identification of effects by
behavioural geneticists
RNA
• Ribonucleic acid
• Differs from DNA
– Single-stranded molecule (generally); shorter
– Ribose, not deoxyribose; RNA is less stable
– Adenine’s complementary nucleotide is uracil
(U), not thymine
• Various forms: mRNA, tRNA, rRNA, noncoding RNA
RNA
• The original genetic code
– Still seen in most viruses
• Single strand vulnerable to predatory
enzymes; double stranded DNA gained
selective advantage
• RNA degrades quickly, is tissue-, age-, and
state-specific
Gene Expression
• Transcription
– Production of mRNA in nucleus from DNA
template
• Translation
– Assembly of amino acids into peptide chains on
basis of information encoded in mRNA
– Occurs in ribosomes
– mRNA and tRNA
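The two steps can be sketched as two small functions. This is a simplified illustration: the template sequence is invented, the mRNA codon table holds only the four codons needed, and splicing, tRNA and ribosomes are left out.

```python
# Transcription: each template DNA base is paired with an RNA base (U replaces T).
TEMPLATE_TO_MRNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Translation: tiny subset of the standard mRNA codon table.
MRNA_CODONS = {"AUG": "Met", "GCU": "Ala", "GAU": "Asp", "UAA": "Stop"}


def transcribe(template_dna: str) -> str:
    """Build the mRNA that is complementary to the DNA template."""
    return "".join(TEMPLATE_TO_MRNA[base] for base in template_dna)


def translate(mrna: str) -> list[str]:
    """Assemble amino acids codon by codon until a stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = MRNA_CODONS[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        peptide.append(amino_acid)
    return peptide


mrna = transcribe("TACCGACTAATT")  # hypothetical template sequence
print(mrna)
print(translate(mrna))
```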
mRNA
• mRNA exists only for a few minutes
– Amount of protein produced depends on amount of
mRNA available for translation
– Protein production regulation
• mRNA carries information about a protein
sequence to the ribosomes
– About 100 amino acids added to protein per second
– Proteins 100-1000 amino acids long
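Taking the slide's figures at face value, the time to assemble one protein is its length divided by the rate. A rough sketch, assuming a constant rate and ignoring start-up time:

```python
RATE_AA_PER_SECOND = 100  # figure quoted on the slide


def synthesis_seconds(length_aa: int, rate: float = RATE_AA_PER_SECOND) -> float:
    """Seconds needed to chain together `length_aa` amino acids."""
    return length_aa / rate


for length in (100, 1000):
    print(length, "amino acids:", synthesis_seconds(length), "seconds")
```

On these numbers a typical protein takes roughly 1 to 10 seconds to assemble.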
Transcription
• Transcription animation
Translation
• Translation video
Non-Coding RNA
• Most DNA transcribed into RNA that is not
mRNA: non-coding RNA
• At least 50% of human genome is
responsible for non-coding RNA
• Mostly involved in directly or indirectly
regulating protein-coding genes
Introns
• DNA sequences embedded in protein-coding genes
– Transcribed into RNA, but spliced out before
RNA leaves nucleus; non-coding
– From 50 to 20,000 base pairs long
• About 25% of human genome
Introns
• Used to be called “junk” DNA
• Not the case at all
• Introns can regulate transcription of genes
in which they reside
• In some cases can also regulate other genes
Exons
• What’s left (and spliced back together) after
introns are removed
• Usually only a few hundred base pairs long
MicroRNA
• Another class of non-coding RNA
• Usually only about 21 nucleotides long
– DNA coding for them is about 80 base pairs
• Especially important for regulation of genes
involved in primate nervous system
• Bind to (i.e., “silence”) mRNA
• About 500 microRNA identified; regulate
expression of over 30% of all coding
mRNA
Gene Regulation
• Short-term or long-term
• Responsive to both environmental factors
and expression of other genes
– i.e., genes can turn each other on and off
Polymorphisms
• Genome is about 3 billion base pairs
• Millions of base pairs differ among
individuals
• However, about 2 million base pairs differ
among at least 1 percent of the population
• These are the DNA polymorphisms useful
for behavioural geneticists
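The slide's two figures can be put in proportion with one division. The numbers are the slide's round estimates, not measured values.

```python
GENOME_BP = 3_000_000_000         # about 3 billion base pairs
COMMON_POLYMORPHIC_BP = 2_000_000  # differ in at least 1% of the population

fraction = COMMON_POLYMORPHIC_BP / GENOME_BP
print(f"{fraction:.4%} of the genome")
print(f"about 1 base pair in {round(1 / fraction):,}")
```

So the polymorphisms of interest make up well under a tenth of a percent of the genome.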
Detecting Polymorphisms
• Genetic markers
– Traditionally, single genes were identified by
their phenotypic protein outcome
• DNA markers
– Based on the actual polymorphisms in the DNA
– Millions of DNA base sequences are
polymorphic and can be used in genome-wide
DNA studies
– Identify single-gene disorders
DNA Microarrays
• Gene chips
• Surfaces the size of a postage
stamp
• Hundreds of thousands of DNA
sequences
• Serve as probes to detect gene
expression or single base
mutations
• Fodor's gene chip
<http://learn.genetics.utah.edu/units/biotech/microarray/>
<http://www.bio.davidson.edu/Courses/genomics/chip/chipreal.html>
Genetic Screens
• Expose non-human organisms to mutagens to cause
mutations, increasing the frequency of unusual alleles
• Basic screens look for a phenotype of interest in
the mutated population
• Enhancer/suppressor screens used when an allele
of a gene leads to a weak mutant phenotype
– E.g., weak effect: damaged or abnormal limb, organ,
behaviour trait
– E.g., strong effect: total absence of limb, organ,
behaviour
Classic Approach
• Map mutants by locating a gene on its
chromosome through crossbreeding studies
• Statistics on frequency of traits that co-occur are utilized
More Recently
• Produce disruption in DNA, then look for
effect on whole organism
• Random or directed deletions, insertions,
and point mutations produce a mutagenized
population
• Screen population for specific change at the
gene of interest
Directed Deletions and Point
Mutations
• Gene knockouts
– Individuals engineered to carry genes made inoperative
(“knocked out”)
• Gene silencing (“gene knockdown”)
– Uses double stranded RNA to temporarily disrupt gene
expression
– Produces specific effect without mutating the DNA of
interest
• Transgenic organisms
– E.g., overexpress a normal gene
Single Nucleotide Polymorphisms
• SNPs
– A variation in DNA sequence when a single nucleotide
(A, T, C, G) in the genome differs between individuals
or between paired chromosomes of an individual
• AAGCCTA to AAGCTTA
– Two alleles here: C and T
• Almost all common SNPs have only two alleles
• For a variation to be called a SNP it must occur in
at least 1% of the population
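Finding the differing position and applying the 1% rule can be sketched as follows. The function names and allele counts are hypothetical, and the 1% rule is read here as "the rarer allele makes up at least 1% of sampled chromosomes", which is an assumption about how the threshold is applied.

```python
def differing_sites(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (position, base_a, base_b) wherever two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def is_snp(minor_allele_count: int, total_alleles: int, threshold: float = 0.01) -> bool:
    """Apply the 1% rule to the rarer allele's frequency."""
    return minor_allele_count / total_alleles >= threshold


print(differing_sites("AAGCCTA", "AAGCTTA"))  # positions counted from 0
print(is_snp(3, 200))  # 1.5% of sampled chromosomes
print(is_snp(1, 200))  # 0.5% of sampled chromosomes
```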
Amino Acid Sequence
• SNPs won’t necessarily change the amino
acid sequence of a protein
– Duplication of codons
• Synonymous SNPs
– Both forms produce same polypeptide sequence
– “Silent mutation”
• Non-synonymous SNPs
– Different polypeptide sequences are produced
Coding Regions
• SNPs can exist in both protein coding and
non-coding regions of genome
• Even non-protein coding region SNPs can
have effects
– Gene splicing
– Transcription factor binding
– Sequence of non-coding RNA
Example
• SNP in coding region with subtle effect
• Change the GAU codon to GAG
– Changes amino acid from aspartic acid to
glutamic acid
– Similar chemical properties, but glutamic acid
is a bit bigger
• This change to a protein is unlikely to be
crucial to its function
Example
• SNP in coding region with large effect
• Sickle-cell anemia
• Changes one nucleotide base in coding
region of hemoglobin beta gene
– Glutamic acid replaced by valine
– Hemoglobin molecule no longer carrying
oxygen as efficiently due to drastic change in
protein shape
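Whether a coding-region SNP is synonymous can be checked by comparing the amino acid before and after the change. The table below is a small subset of the standard mRNA codon table. The sickle-cell codons (GAG to GUG) are not given on the slide; they are the commonly cited change for this mutation.

```python
# Subset of the standard mRNA codon table.
MRNA_CODONS = {
    "GAU": "aspartic acid", "GAC": "aspartic acid",
    "GAG": "glutamic acid", "GAA": "glutamic acid",
    "GUG": "valine",
}


def classify_snp(codon_before: str, codon_after: str) -> str:
    """Label a single-codon change by its effect on the amino acid."""
    same = MRNA_CODONS[codon_before] == MRNA_CODONS[codon_after]
    return "synonymous" if same else "non-synonymous"


print(classify_snp("GAU", "GAC"))  # silent change
print(classify_snp("GAU", "GAG"))  # aspartic acid -> glutamic acid
print(classify_snp("GAG", "GUG"))  # glutamic acid -> valine (sickle-cell)
```

The label says only whether the amino acid changes. How much the change matters depends on the chemistry of the two amino acids, as the two examples show.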
Latent Effects
• SNP in coding region that switches the gene
on only under certain conditions
• Under normal conditions, gene is switched
off (is latent)
• Can activate under specific environmental
conditions
– E.g., exposure to precarcinogens or carcinogens
SNPs and Cancer
• SNP changes to genes for proteins regulating rate
of absorbing, binding, metabolizing, excreting
precarcinogens or carcinogens
• Small changes can alter an individual’s risk for
cancer
• SNP does no harm itself under normal
circumstances, only having an effect when person
is exposed to a particular environmental agent
– E.g., Two people with different SNPs could both
smoke, but only one develops cancer, responds to
therapy, etc.
Smoking and Susceptibility
• Precarcinogens from tobacco enter lungs
– Lodge in fat-soluble areas of cells
– Bind to proteins converting precarcinogens to
carcinogens
• Reactive molecules quickly eliminated
– Detoxifying proteins make carcinogens water-soluble
– Excreted in urine before (hopefully) damaging
cell
SNP Variability
• Different SNPs may express hyperactive or lazy
activator (or something in between)
– The carcinogen-making protein
– E.g., Hyperactive: “grab” and convert more
precarcinogens than usual or do it more rapidly
– E.g., Influence effectiveness of detoxifying enzymes
– If more carcinogens build up in lungs, more damage to
cells’ DNA
• Different SNPs could alter individuals’ risk of
lung cancer
Bladder Cancer
• Workers in dye industry exposed to arylamines
– Have increased risk of bladder cancer
• SNPs may be involved
• In liver, an acetylator enzyme acts on arylamines,
deactivating them for excretion
• SNPs produce several different slow forms of
acetylator enzyme, keeping arylamines in liver for
longer
– More are converted to precarcinogens, increasing risk
for cancer
Polygenic Effect
• SNPs don’t entirely explain this
• Not all individuals with slow acetylators
exposed to arylamines are at increased risk
of bladder cancer
– About half of North American population has
slow acetylators
– Only 1 in 500 develop bladder cancer
• Other yet undiscovered genes and proteins
involved
Drug Therapies
• SNPs could also explain different patient reactions
to the same drug treatment
• Many proteins interact with a drug
– Transportation through body, absorption into tissues,
metabolism into more active or toxic by-products,
excretion
• Having SNPs in one or more of the proteins
involved may alter the time the body is exposed to
the active form of the drug
– E.g., individuals with behaviourally similar forms of
schizophrenia can react very differently to the same
drug therapy
SNPs and Gene Mapping
• SNPs are very common variations
throughout the genome
• Relatively easy to measure
• Very stable across generations
• Useful as gene markers
• Contribute to understanding of complex
gene interactions in behaviours and
behavioural disorders
By Association
• If SNP located close to gene of interest
• If gene passed from parent to child, SNP is
likely passed too
• Can infer that when same SNP found in a
group of individuals’ genomes that
associated gene is also present
Sequencing SNPs
• Sequence the genome of large numbers of
people
• Compare base sequences to discover SNPs
• Goal is to generate a single map of human
genome containing all possible SNPs
SNP Profile
• Each individual has his or her own pattern
of SNPs
– “SNP profile”
• By studying SNP profiles in populations
correlations will emerge between specific
SNP profiles and specific behaviour traits
– E.g., specific responses to cancer treatments
What is a Gene?
• “Gene” from “pangenesis” (Darwin’s
mechanism of heredity)
• Greek: genesis (“birth”) or genos (“origin”)
• First coined by Wilhelm Johannsen in 1909
Central Dogma
• One gene, one protein
• Information travels from DNA through
RNA to protein
• Gene = DNA region expressed as mRNA,
then translated into polypeptide
• View held through 1960s
Extended Dogma
• Transcribed mRNA produces single
polypeptide chain (folds into functional
protein)
• This molecule performs discrete, discernible
cellular function
• Gene regulated by promoter and
transcription-factor binding sites on nearby
DNA
Simplified Extended Dogma
From: Seringhaus & Gerstein, 2008
Implications
• Nomenclature
– Gene named and classified by basic function
• Traditional classification systems
– Vertically hierarchical
– Broad functional categories (e.g., genes whose products
catalyze a hydrolysis reaction) to specific functions
(e.g., “amylase” describing specific break-down of
starch)
• 1950s: International Commission on Enzymes
Classification, Munich Information Center for
Protein Sequences
• One gene, one protein, one function
• Straightforward view of subcellular life
• Allowed conception of single protein as
indivisible unit in larger cellular network
• When mapping genes across species, could
assume a protein is either fully preserved in
organisms or entirely absent
• Allowed easy grouping of related proteins
in different species
• Extended dogma includes regulation,
function, and conservation
Current View
• High-throughput experiments
– Probe activity of millions of bases in genome
simultaneously
• Much more complex than extended dogma
Creating RNA Transcript
• Genes only small fraction of human genome
• Genome pervasively transcribed (ENCODE
Project)
• Non-genic (i.e., genome outside known
gene boundaries) transcription very
widespread (even including “pseudogenes”)
• Function of non-gene transcribed material
as yet unclear
Pseudogenes
• DNA sequences
• Similar to functional genes, but contain genetic lesions
(e.g., truncations, premature stop codons); disrupts ability
to encode proteins or structural RNA
– Long considered “fossils” of past genes
• Recent estimates: 5-20% of human pseudogenes can be
transcriptionally active (Zheng & Gerstein, 2007)
• Might achieve functionality via: fusing with mRNAs from
nearby functional genes to form chimeric RNAs, having
RNA transcript that has regulatory role, combining with
new DNA to generate a new gene
Introns/Exons
• Long understood that eukaryote genes composed
of short exons separated by long introns
• Introns transcribed to RNA that is spliced out
before proteins produced
• Now know splicing for a gene-containing locus
can be done in multiple ways
– Individual exons left out of final product
– Only portions of the sequence in an exon are preserved
– Sequences from outside gene can be spliced in
• Result: many variants of a single gene
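To see how quickly variants multiply, the sketch below covers only the first case (whole exons left out). It assumes, as a simplification, that the first and last exons are always kept and that exon order never changes. The exon names are placeholders.

```python
from itertools import combinations


def exon_skipping_isoforms(exons: list[str]) -> list[tuple[str, ...]]:
    """List every transcript obtainable by dropping internal exons."""
    first, *middle, last = exons
    isoforms = []
    # Keep all internal exons first, then progressively fewer.
    for keep in range(len(middle), -1, -1):
        for kept in combinations(middle, keep):
            isoforms.append((first, *kept, last))
    return isoforms


for isoform in exon_skipping_isoforms(["E1", "E2", "E3", "E4"]):
    print("-".join(isoform))
```

With n exons this gives 2 to the power (n - 2) variants from exon skipping alone, before partial exons or outside sequences are considered.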
Example of Current View
From: Seringhaus & Gerstein, 2008
Gene Regulation
• Traditional view
– Protein-coding portion of gene and regulatory sequence
in close proximity on chromosome
• Doesn’t apply well to mammalian and other higher
eukaryote systems
• Gene activity influenced by epigenetic
modifications (changes to DNA itself or to support
structures of DNA)
• Genes can be regulated over 50,000 base pairs
away, beyond adjacent genes
• Looping and folding of DNA brings distant spans
into close proximity
DNA Folding
From: Seringhaus & Gerstein, 2008
Implications
• Defining gene functionality much more
difficult now
• Traditionally done by phenotypic effect
• Doesn’t capture function on molecular
level, though
• Also, pathways a gene product engages in
within a cell significant for understanding
functionality
Classification
• Non-trivial problem in deciding which
qualities of a gene and its products to use
• Earlier approaches assumed simple
hierarchical scheme
• No longer so simple
• Recent computer technologies offering
solutions
Directed Acyclic Graphs (DAGs)
[Figure: simple hierarchy vs. DAG hierarchy]
• In a simple hierarchy, a gene has only one
“parent” for each node
• In the DAG approach, each node can have
multiple “parents”; genes can be classified
within multiple groups
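A DAG classification can be stored as a mapping from each term to its list of parents. The category names below are invented for illustration (loosely based on the amylase example earlier in the slides) and are not taken from any real classification system.

```python
# Each term lists its direct parents; more than one parent is allowed in a DAG.
PARENTS = {
    "amylase": ["hydrolase", "starch metabolism"],
    "hydrolase": ["enzyme"],
    "starch metabolism": ["metabolic process"],
    "enzyme": [],
    "metabolic process": [],
}


def ancestors(node: str, parents: dict[str, list[str]] = PARENTS) -> set[str]:
    """Collect every group a term belongs to, directly or indirectly."""
    found = set()
    stack = list(parents[node])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents[current])
    return found


def is_simple_hierarchy(parents: dict[str, list[str]] = PARENTS) -> bool:
    """A simple hierarchy allows at most one parent per node."""
    return all(len(direct) <= 1 for direct in parents.values())


print(sorted(ancestors("amylase")))
print(is_simple_hierarchy())
```

Because "amylase" has two parents, it is classified under both a functional group and a process group, which a simple hierarchy cannot express.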
From: Seringhaus & Gerstein, 2008
Naming
• Cross-species gene identification difficult
• Naming inconsistent
• Often, traditionally, have different names for
functionally similar (or same) gene in different
species
• Recent increases in computing power and genome
sequencing making homology mapping of similar
genes across species feasible
Example: Notch Pathway
• Highly conserved among species
• Notch encodes a receptor protein; in fruit flies,
a defective copy produces a notched wing shape
• Traditional views of Notch pathway quite limited
• High throughput experiments in humans
identifying many more proteins involved in
pathway
• Hypertext software now makes identifying
connections easier
Notch Pathway
[Figure: traditional vs. current view of the Notch pathway]
From: Seringhaus & Gerstein, 2008