Statistics for Microarrays
Biological background: Gene Expression
and Molecular Laboratory Techniques
Class web site: http://statwww.epfl.ch/davison/teaching/Microarrays/
Basic principles in physics, chemistry and biology
                     Physics                Chemistry    Biology
                     Matter                 Compound     Organism
                     Elementary particles   Elements     Genes
Principles known?    Yes                    Yes          No
Central Paradigm
(RT)
Protein Synthesis
Transcription
• Transcription is a complex process involving
several steps and many proteins (enzymes)
• RNA polymerase synthesizes a single
strand of RNA against the DNA template
strand (anti-sense strand), adding
nucleotides to the 3’ end of the RNA chain
• Initiation is regulated by transcription
factors that bind promoter elements, usually an
initiator element and a TATA box, which usually
lie just upstream (at the 5’ end) of the
coding region
• 3’ end cleaved just downstream of the AAUAAA
signal, poly-A tail added
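The template-directed synthesis described above can be sketched in a few lines. This is a minimal illustration, not a model of the full process: it ignores initiation, cleavage and the poly-A tail, and the template sequence is a made-up example.

```python
# Base pairing used when RNA is built against a DNA template strand.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3to5: str) -> str:
    """Return the RNA (written 5'->3') synthesized against a DNA template
    strand that is given in 3'->5' orientation."""
    rna = []
    for base in template_3to5.upper():
        # Each new nucleotide is appended to the 3' end of the growing chain.
        rna.append(DNA_TO_RNA[base])
    return "".join(rna)

template = "TACCGAATT"  # hypothetical template strand, written 3'->5'
print(transcribe(template))  # AUGGCUUAA
```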
Exons and Introns
• Most of the genome consists of non-coding
regions
• Some non-coding regions (centromeres and
telomeres) may have specific chromosomal
functions
• Other non-coding regions have regulatory
purposes
• Non-coding, non-functional DNA often
called junk DNA, but may have some effect
on biological functions
• The terms exon and intron refer to coding
and non-coding DNA, respectively
Intron Splicing
Transcription Overview
Transcription Illustration
Translation
• The AUG start codon is recognized by
methionyl-tRNAiMet
• Once the start codon has been identified,
the ribosome incorporates amino acids into
a polypeptide chain
• mRNA is decoded by tRNA (transfer RNA)
molecules, which each transport specific
amino acids to the growing chain
• Translation ends when a stop codon (UAA,
UAG, UGA) is reached
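The steps above (find AUG, read codons, stop at UAA, UAG or UGA) can be sketched as follows. The codon table is the standard genetic code; the function name and the example mRNA are illustrative assumptions, and the sketch ignores ribosome scanning and every other biological detail.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G; "*" marks a stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # UAA, UAG or UGA
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("GGAUGGCUUAAGG"))  # MA (AUG = Met, GCU = Ala, UAA = stop)
```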
Translation Illustrated
From Primary Transcript
to Protein
Alternative Splicing (of Exons)
• How is it possible that there are over
1,000,000 human antibodies when there
are only about 30,000 genes?
• Alternative splicing refers to the different
ways the exons of a gene may be combined,
producing different forms of a protein
from the same gene-coding region
• Alternative pre-mRNA splicing is an
important mechanism for regulating gene
expression in higher eukaryotes
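To get a feel for how exon combinations multiply, the sketch below enumerates the transcripts of a hypothetical four-exon gene. It assumes a deliberately simple model in which the first and last exons are always kept, every internal exon may be skipped, and exon order is preserved; real splicing is far more constrained.

```python
from itertools import combinations

def isoforms(exons):
    """List exon combinations, keeping the first and last exon and the
    original exon order (simplified exon-skipping model)."""
    first, *middle, last = exons
    result = []
    for k in range(len(middle) + 1):
        # combinations() keeps the internal exons in their original order
        for kept in combinations(middle, k):
            result.append([first, *kept, last])
    return result

for isoform in isoforms(["E1", "E2", "E3", "E4"]):  # hypothetical exon names
    print("-".join(isoform))
```

Under this model a gene with m optional internal exons yields 2^m possible transcripts, so the count grows quickly with the number of exons.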
Molecular Laboratory Techniques
• Hybridizing DNA
• Copying DNA
• Cutting DNA
• Probing DNA
Hybridization
• Hybridization exploits a potent feature of
the DNA duplex – the sequence
complementarity of the two strands
• Remarkably, DNA can reassemble with
perfect fidelity from separated strands
• Strands can be separated (denatured) by
heating
Polymerase Chain Reaction (PCR)
• PCR is used to amplify (copy) specific DNA
sequences in a complex mixture when the ends of
the sequence are known
• Source DNA is denatured into single strands
• Two synthetic oligonucleotides complementary to
the 3’ ends of the segment of interest are added
in great excess to the denatured DNA, then the
temperature is lowered
• The genomic DNA remains denatured, because
the complementary strands are at too low a
concentration to encounter each other during the
period of incubation, but the specific
oligonucleotides hybridize with their
complementary sequences in the genomic DNA
PCR, ctd
• The hybridized oligos then serve as primers
for DNA synthesis, which begins upon
addition of a supply of nucleotides and a
temperature resistant polymerase such as
Taq polymerase, from Thermus aquaticus (a
bacterium that lives in hot springs)
• Taq polymerase extends the primers at
temperatures up to 72˚C
• When synthesis is complete, the whole
mixture is heated further (to 95˚C) to melt
the newly formed duplexes
• Repeated cycles (25 to 30) of synthesis
(cooling) and melting (heating) quickly provide
many DNA copies
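A rough calculation shows why 25 to 30 cycles are enough. The sketch assumes that every cycle multiplies the number of target copies by (1 + efficiency), with efficiency = 1 meaning perfect doubling; the function name and the efficiency parameter are illustrative, and real reactions fall short of this ideal.

```python
def copies_after(cycles: int, start_copies: int = 1, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: each cycle multiplies the copies by (1 + efficiency)."""
    return start_copies * (1.0 + efficiency) ** cycles

for n in (25, 30):
    print(n, "cycles:", f"{copies_after(n):,.0f}", "copies per starting molecule")
```

With perfect doubling, 25 cycles give about 3.4 x 10^7 copies and 30 cycles about 1.1 x 10^9 copies per starting molecule.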
(BREAK)
Types of Viruses
A virus is a nucleic acid in a protein coat.
Reverse transcriptase makes a complementary
DNA copy from RNA.
Reverse transcription
Clone cDNA strands, complementary to the mRNA
mRNA:                    G U A A U C C U C
(reverse transcriptase)
cDNA:                    C A T T A G G A G
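The base pairing in this figure can be checked with a short sketch. The function name is illustrative; the cDNA is returned aligned base by base under the mRNA, so it reads 3'->5' from left to right.

```python
# Base pairing used by reverse transcriptase when copying RNA into DNA.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

def reverse_transcribe(mrna_5to3: str) -> str:
    """Return the cDNA strand aligned under the mRNA (so it reads 3'->5')."""
    return "".join(RNA_TO_DNA[base] for base in mrna_5to3.upper())

print(reverse_transcribe("GUAAUCCUC"))  # CATTAGGAG, as in the figure
```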
RT-PCR
Restriction Enzymes Cut DNA
Restriction Enzymes
• When a bacterium is invaded by a DNA-containing
organism (e.g. virus), it can defend
itself with restriction enzymes (REs; also
called restriction endonucleases)
• REs recognize a specific short sequence of
DNA and cut both strands
• The recognition sequence is typically a
palindrome – i.e. the sequence in one strand is
the same as in the other, read in the other
direction (e.g. GAATTC)
• REs are named after the bacteria in which they
occur, plus a sequence number (e.g. Eco RI)
RE Example (Eco RI)
5’ – G|AATTC – 3’
3’ – CTTAA|G – 5’
(| marks the cut on each strand)
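A small sketch makes the palindrome property and the cutting concrete. It handles one strand only and assumes the known Eco RI cut position, between G and A of GAATTC; the function names, the cut_offset parameter and the example sequence are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def is_palindromic(site: str) -> bool:
    """True if both strands read the same in the 5'->3' direction."""
    return site == reverse_complement(site)

def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list:
    """Cut one strand at every occurrence of the site.
    cut_offset=1 places the cut after the first base (G|AATTC)."""
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut_offset])
        start = pos + cut_offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments

print(is_palindromic("GAATTC"))      # True
print(digest("AAGAATTCTTGAATTCCC"))  # ['AAG', 'AATTCTTG', 'AATTCCC']
```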
Probing DNA
• One way to study a specific DNA fragment
within a genome is to probe for the sequence of
the fragment
• A probe is a labeled (usually radioactive or
fluorescent) single-stranded oligonucleotide,
synthesized to be complementary to the
sequence of interest – probe sequence is known
• Attach single-stranded DNA to a membrane (or
other solid support) and incubate with the
probe so that it hybridizes
• Visualize the probe (e.g. by X-ray for
radioactive probes)
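In sequence terms, probing amounts to finding where the reverse complement of the probe occurs in the single-stranded target. The sketch below assumes perfect complementarity and ignores temperature, salt and partial matches; the function names and sequences are made up for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """Sequence of the complementary strand, read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def probe_sites(target: str, probe: str) -> list:
    """0-based positions on a single-stranded target (5'->3') where the
    probe (5'->3') can hybridize with perfect complementarity."""
    footprint = reverse_complement(probe)  # what the probe pairs with
    sites = []
    pos = target.find(footprint)
    while pos != -1:
        sites.append(pos)
        pos = target.find(footprint, pos + 1)
    return sites

target = "TTGACCGTAAGGACCGTA"  # hypothetical membrane-bound DNA
probe = "TACGGTC"              # hypothetical labeled oligonucleotide
print(probe_sites(target, probe))  # [2, 11]
```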
The Southern blotting technique
Sample Autoradiograph (Gel)
Types of Blots
• Southern Blot – use DNA to probe DNA
• Northern Blot – use DNA to probe RNA
• Western Blot – use antibodies to probe
Protein
Measuring Gene Expression
Idea: measure the amount of mRNA to see which
genes are being expressed in (used by) the cell.
Measuring protein would be more direct, but is
currently harder.
Microarrays provide a means
to measure gene expression
Areas Being Studied with Microarrays
• Differential gene expression between two (or
more) sample types
• Similar gene expression across treatments
• Tumor sub-class identification using gene
expression profiles
• Classification of malignancies into known classes
• Identification of “marker” genes that
characterize different tumor classes
• Identification of genes associated with clinical
outcomes (e.g. survival)
cDNA microarray experiments
mRNA levels compared in many different contexts
• Different tissues, same organism (brain v. liver)
• Same tissue, same organism (treatment v. control, tumor v.
non-tumor)
• Same tissue, different organisms (wild type v. knock-out, transgenic, or
mutant)
• Time course experiments (effect of treatment,
development)
Web animation of a cDNA microarray
experiment
http://www.bio.davidson.edu/courses/genomics/chip/chip.html
Yeast genome on a chip
Brief outline of steps for producing a
microarray
• cDNA probes attached to, or synthesized on, a
solid support
• Hybridize targets
• Scan array
cDNA microarrays
cDNA clones
cDNA microarrays
Compare the gene expression in two samples of cells
PRINT
cDNA from one
gene on each spot
SAMPLES
cDNA labelled red/green
e.g. treatment / control
normal / tumor tissue
HYBRIDIZE
Add equal amounts of
labelled cDNA samples to
microarray.
SCAN
Laser
Detector
Quantification of expression
For each spot on the slide we calculate
Red intensity = Rfg - Rbg
(fg = foreground, bg = background) and
Green intensity = Gfg - Gbg
and combine them in the log (base 2) ratio
Log2( Red intensity / Green intensity)
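The calculation for a single spot can be sketched as follows. The intensity values are invented for illustration, and returning None when a background-corrected intensity is not positive is one possible convention, not something the slides specify.

```python
import math

def log_ratio(r_fg: float, r_bg: float, g_fg: float, g_bg: float):
    """Log2(Red intensity / Green intensity) for one spot, where each
    intensity is foreground minus background."""
    red = r_fg - r_bg
    green = g_fg - g_bg
    if red <= 0 or green <= 0:
        return None  # assumption: the log ratio is undefined, flag the spot
    return math.log2(red / green)

print(log_ratio(1200, 200, 600, 100))  # 1.0  (red twice as intense as green)
print(log_ratio(300, 100, 900, 100))   # -2.0 (green four times as intense)
```

A value of 0 means equal red and green intensity; each unit up or down corresponds to a two-fold change.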
Gene Expression Data
On p genes for n slides: p is O(10,000), n is
O(10-100), but growing.
Genes \ Slides   slide 1   slide 2   slide 3   slide 4   slide 5   …
1                 0.46      0.30      0.80      1.51      0.90     ...
2                -0.10      0.49      0.24      0.06      0.46     ...
3                 0.15      0.74      0.04      0.10      0.20     ...
4                -0.45     -1.03     -0.79     -0.56     -0.32     ...
5                -0.06      1.06      1.35      1.09     -1.09     ...
Gene expression level of gene 5 in slide 4
=
Log2( Red intensity / Green intensity)
These values are conventionally displayed
on a red (>0) yellow (0) green (<0) scale.
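One way to realize such a color scale is sketched below. The linear fade and the saturation limit of 2 (a four-fold change) are assumptions made for illustration; plotting software may use other mappings.

```python
def log_ratio_to_rgb(m: float, limit: float = 2.0) -> tuple:
    """Map a log2 ratio to (R, G, B): yellow at 0, pure red at +limit,
    pure green at -limit. Values beyond +/-limit are clipped."""
    m = max(-limit, min(limit, m))
    fade = round(255 * (1 - abs(m) / limit))
    if m >= 0:
        return (255, fade, 0)  # green channel fades out: yellow -> red
    return (fade, 255, 0)      # red channel fades out: yellow -> green

for value in (-2.0, 0.0, 2.0):
    print(value, log_ratio_to_rgb(value))
```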
Biological question: differentially expressed genes, sample class prediction, etc.
-> Experimental design
-> Microarray experiment
-> 16-bit TIFF files
-> Image analysis: (Rfg, Rbg), (Gfg, Gbg)
-> Normalization: R, G
-> Estimation, Testing, Clustering, Discrimination
-> Biological verification and interpretation