Transcript Slide 1
Intro to Microarray Analysis
Courtesy of
Professor Dan Nettleton
Iowa State University
(with some edits)
Some Basic Biology
Genes are DNA sequences that code for proteins.
(e.g. gene lengths range from perhaps 1,000 base pairs to 2.5 million base pairs)
An organism is built and run by its proteins.
Genes -> Proteins -> Organism
Complementary Base Pairing
Each DNA nucleotide has one of four bases (A, T, C, G).
Each messenger RNA (mRNA) nucleotide has one of
four bases (A, U, C, or G).
Complementary Base Pairing:
DNA <-> mRNA
A <-> U
T <-> A
C <-> G
G <-> C
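The pairing rules above amount to a lookup table. Below is a minimal Python sketch, assuming an uppercase DNA string containing only A, T, C, and G. The names `DNA_TO_MRNA` and `pair_bases` are hypothetical, chosen for illustration. Like the slide, it ignores strand orientation (5′/3′ direction) and simply pairs bases position by position.

```python
# Complementary base pairing from DNA to mRNA: A->U, T->A, C->G, G->C.
DNA_TO_MRNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def pair_bases(dna: str) -> str:
    """Return the mRNA bases that pair with each DNA base, in the same order.

    Assumes `dna` contains only the uppercase letters A, T, C, G;
    any other character raises KeyError.
    """
    return "".join(DNA_TO_MRNA[base] for base in dna)


if __name__ == "__main__":
    # The DNA template from the transcription example.
    print(pair_bases("CGTAAGACA"))  # GCAUUCUGU
```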
Transcription and Translation
During transcription, an mRNA strand is built by “reading” the DNA template.
Each sequence of three mRNA nucleotides (a codon) codes for an amino acid.
DNA template sequence: CGTAAGACA...
(transcription)
mRNA sequence: GCAUUCUGU...
(translation)
protein sequence: alanine phenylalanine cysteine ...
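Translation reads the mRNA three nucleotides at a time. This is a minimal sketch using a deliberately partial codon table (`CODON_TABLE` is a hypothetical name) that holds only the three codons in the example, not the full 64-codon genetic code. It starts reading at the first nucleotide, as the slide's example does. A real ribosome begins at a start codon.

```python
# Translate an mRNA sequence codon by codon (three nucleotides per amino acid).
# CODON_TABLE is deliberately partial: it lists only the codons used in the
# slide's example, not the full genetic code.
CODON_TABLE = {
    "GCA": "alanine",
    "UUC": "phenylalanine",
    "UGU": "cysteine",
}


def translate(mrna: str) -> list[str]:
    """Split `mrna` into consecutive codons and look up each amino acid.

    Trailing nucleotides that do not form a full codon are ignored.
    A codon missing from CODON_TABLE raises KeyError.
    """
    codons = [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]
    return [CODON_TABLE[codon] for codon in codons]


if __name__ == "__main__":
    print(translate("GCAUUCUGU"))  # ['alanine', 'phenylalanine', 'cysteine']
```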
Microarrays Measure mRNA Abundance in Cells
At any point in time, cells of living organisms contain
mRNA sequences waiting to be converted into proteins.
In a microarray experiment, we extract mRNA from
biological samples and use microarrays to measure
how much of each mRNA sequence is present in each
sample.
Why Do Microarray Experiments?
We have the gene sequences. What are their functions?
If we can learn how a gene’s level of activity changes
across varying conditions, we gain clues about its function.
Some Examples from Iowa State
Jim Reecy from Animal Science: muscle undergoing
hypertrophy (muscle building) vs. stable muscle
Anne Bronikowski in Genetics: wheel-running mice vs.
non-runners
Roger Wise, Rico Caldo in Plant Pathology: interaction
between multiple isolates of powdery mildew fungus and
multiple genotypes of barley.
Affymetrix GeneChips
Affymetrix is a company that manufactures GeneChips.
Short sequences representing small pieces of genes of
interest are synthetically assembled and attached to a
GeneChip.
Probe sequences are chosen to have good and relatively
uniform hybridization (‘binding’) characteristics.
A probe is chosen to match a portion of its target mRNA
transcript that is unique to that sequence.
Simplified Example
[Figure: gene 1 and gene 2 share regions (shown in blue) of high sequence similarity throughout much of the transcript; each gene's oligo probe is taken from a region unique to that gene.
Oligo probe for gene 1: ATTACTAAGCATAGATTGCCGTATA
Oligo probe for gene 2: GCGTATGGCATGCCCGGTAAACTGG]
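The uniqueness requirement can be checked mechanically. This is a minimal sketch that assumes exact substring matching against each transcript, following the slide's wording that a probe "matches" its target. Real probe design also accounts for near-matches and hybridization behaviour. The shared 25-nt region and the function names are hypothetical, and only the two probe sequences come from the figure.

```python
# Check whether a candidate probe matches exactly one transcript.
# Simplification (assumption): exact substring matching; real probe
# design also weighs near-matches and hybridization behaviour.


def matching_transcripts(probe: str, transcripts: dict[str, str]) -> list[str]:
    """Return the names of transcripts that contain `probe` exactly."""
    return [name for name, seq in transcripts.items() if probe in seq]


def is_unique(probe: str, transcripts: dict[str, str]) -> bool:
    """True if `probe` occurs in exactly one transcript."""
    return len(matching_transcripts(probe, transcripts)) == 1


# Hypothetical toy transcripts: a shared 25-nt region (standing in for the
# blue regions) followed by each gene's probe sequence from the figure.
SHARED = "TTGACCGTAGGCTAACGTCAGTTCA"
TRANSCRIPTS = {
    "gene 1": SHARED + "ATTACTAAGCATAGATTGCCGTATA",
    "gene 2": SHARED + "GCGTATGGCATGCCCGGTAAACTGG",
}

if __name__ == "__main__":
    for probe in ("ATTACTAAGCATAGATTGCCGTATA", "GCGTATGGCATGCCCGGTAAACTGG", SHARED):
        print(probe, matching_transcripts(probe, TRANSCRIPTS), is_unique(probe, TRANSCRIPTS))
```

A probe taken from the shared region matches both genes, so its signal could not be attributed to either one.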
Affymetrix GeneChips
Each gene (more accurately, ‘sequence of interest’) is
represented by many short oligo probes.
Each short oligo probe is made up of 25 nucleotides.
Thousands of these probes are
placed together on a small chip
called a GeneChip.
How is this used to measure a
sample’s mRNA?
Affymetrix GeneChips
Only one sample is placed on each GeneChip.
The mRNA that has been extracted from a biological sample
can be labeled (dyed) and hybridized to a GeneChip.
During hybridization, the mRNA strands from the
sample bind to their respective complementary
oligo probes.
Expression Measures
Scanning of an Affymetrix GeneChip yields one intensity
value for each probe (cell).
A high intensity value for a probe (cell) implies that many
sequences from the biological sample were able to bind to the
sequences in the probe (cell).
There is concern that some of the mRNA that binds to a
particular probe is not that probe's intended target (a mistake,
or non-specific binding).
To try to measure this ‘background noise’, each perfect match
(PM) probe is paired with a mismatch (MM) probe.
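One simple way to use the MM probes is to subtract each MM intensity from its PM partner and average the differences over a probe set. The sketch below does this, assuming `pm[i]` and `mm[i]` belong to the same probe pair. The function names and intensities are hypothetical, and this is not necessarily what any particular Affymetrix software does.

```python
# Per-probe-pair background adjustment: subtract each mismatch (MM)
# intensity from its perfect match (PM) intensity, then average over the
# probe set. The intensities below are made-up numbers for illustration.
from statistics import mean


def pm_minus_mm(pm: list[float], mm: list[float]) -> list[float]:
    """Return PM - MM for each probe pair (pm[i] pairs with mm[i])."""
    if len(pm) != len(mm):
        raise ValueError("each PM probe needs exactly one MM partner")
    return [p - m for p, m in zip(pm, mm)]


def average_difference(pm: list[float], mm: list[float]) -> float:
    """Mean of PM - MM across all probe pairs in a probe set."""
    return mean(pm_minus_mm(pm, mm))


if __name__ == "__main__":
    pm = [1200.0, 950.0, 400.0, 1800.0]  # hypothetical PM intensities
    mm = [300.0, 250.0, 450.0, 600.0]    # hypothetical MM intensities
    print(pm_minus_mm(pm, mm))           # [900.0, 700.0, -50.0, 1200.0]
    print(average_difference(pm, mm))    # 687.5
```

In the third pair, MM exceeds PM and the difference is negative. This can happen with real data, and it is one reason some summaries ignore the MM probes.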
A Probe Set for Measuring Expression Level of a Particular Gene
gene sequence: ...TGCAATGGGTCAGAAGGACTCCTATGTGCCT...
perfect match sequence: AATGGGTCAGAAGGACTCCTATGTG
mismatch sequence: AATGGGTCAGAACGACTCCTATGTG
[Figure: a perfect match probe and its mismatch probe form a probe pair; each probe occupies one probe cell; a probe set (11 probe pairs) represents a gene.]
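In the PM/MM pair shown above, the two sequences differ only at the middle base (position 13 of 25), where the MM probe carries the complementary base (G becomes C). This minimal sketch rebuilds the MM sequence from the PM sequence using that rule. It assumes an odd-length probe, and `make_mismatch` is a hypothetical name.

```python
# Build a mismatch (MM) probe from a perfect match (PM) probe by replacing
# the middle base (position 13 of a 25-mer) with its complement, the
# pattern seen in the PM/MM pair on the slide.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def make_mismatch(pm: str) -> str:
    """Return `pm` with its middle base replaced by the complementary base.

    Assumes an odd-length probe (25 nt on a GeneChip) so there is a single
    middle position.
    """
    if len(pm) % 2 == 0:
        raise ValueError("expected an odd-length probe")
    mid = len(pm) // 2  # index 12 for a 25-mer, i.e. the 13th base
    return pm[:mid] + DNA_COMPLEMENT[pm[mid]] + pm[mid + 1:]


if __name__ == "__main__":
    gene = "TGCAATGGGTCAGAAGGACTCCTATGTGCCT"  # visible part of the gene sequence
    pm = "AATGGGTCAGAAGGACTCCTATGTG"
    print(pm in gene)         # True: the PM probe matches the gene exactly
    print(make_mismatch(pm))  # AATGGGTCAGAACGACTCCTATGTG
```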
Different Probe Pairs Represent Different Parts of the Same Gene
[Figure: probe pairs placed along the gene sequence]
Probes are selected to be specific to the target gene
and have good hybridization characteristics.
Affymetrix GeneChips
Fluorescence coming from the squares
(probes) tells researchers whether a
gene is highly expressed (white and red
features) or not (blue and black
features). (credit: Affymetrix)
Expression Measures
For each probe set (i.e., gene) on a GeneChip, it is often
desirable to summarize the probe cell intensities with one
number that serves as a measure of the expression of the
gene in the biological sample whose RNA was hybridized
to the GeneChip.
There are a number of approaches to this summarization
procedure (e.g. RMA, Robust Multi-array Average). Some take the ‘background
noise’ into account, while others do not.
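RMA is an example of a summary that does not use the MM probes. As one of its steps, it summarizes each probe set by fitting an additive chip + probe model to log2 PM intensities with Tukey's median polish. Below is a minimal stdlib sketch of that summarization step only. RMA's background correction and normalization steps are omitted. The function name and the intensity values are hypothetical. The defaults `max_iter=10` and `eps=0.01` follow R's `medpolish`.

```python
# Median polish summarization of one probe set (the robust summary step
# used by RMA). Input: log2 PM intensities, one row per GeneChip and one
# column per probe. Output: one expression value per chip.
from statistics import median


def median_polish(y: list[list[float]], max_iter: int = 10, eps: float = 0.01) -> list[float]:
    """Fit y[i][j] ~ overall + chip[i] + probe[j]; return overall + chip[i]."""
    n_rows, n_cols = len(y), len(y[0])
    z = [row[:] for row in y]  # residuals, updated in each sweep
    overall = 0.0
    chip = [0.0] * n_rows
    probe = [0.0] * n_cols
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep row (chip) medians out of the residuals.
        row_delta = [median(row) for row in z]
        z = [[v - d for v in row] for row, d in zip(z, row_delta)]
        chip = [c + d for c, d in zip(chip, row_delta)]
        delta = median(probe)
        probe = [p - delta for p in probe]
        overall += delta
        # Sweep column (probe) medians out of the residuals.
        col_delta = [median(col) for col in zip(*z)]
        z = [[v - d for v, d in zip(row, col_delta)] for row in z]
        probe = [p + d for p, d in zip(probe, col_delta)]
        delta = median(chip)
        chip = [c - delta for c in chip]
        overall += delta
        # Stop when the total absolute residual stops changing.
        new_sum = sum(abs(v) for row in z for v in row)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return [overall + c for c in chip]


if __name__ == "__main__":
    # Hypothetical log2 PM values: 3 chips x 3 probes, one outlying probe cell.
    y = [[6.0, 8.0, 7.0],
         [7.0, 30.0, 8.0],
         [9.0, 11.0, 10.0]]
    print(median_polish(y))  # [7.0, 8.0, 10.0]
```

Using medians makes the summary robust. The outlying value 30.0 does not move chip 2's estimate, whereas a plain mean of that chip's probes would be 15.0.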
Statistical Analysis
Use statistical methods to summarize the expression values on
each chip, i.e. get a single expression value for each gene.
Use statistical methods to normalize the expression values,
i.e. try to remove variation due to technological sources.
(Some procedures do summarization and normalization simultaneously).
Perform a classical statistical analysis (ANOVA, t-test, etc.)
on a gene-by-gene basis.
Account for multiple testing, and provide a list of ‘interesting’
genes with an estimated False Discovery Rate (FDR).
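The slide does not name a specific FDR procedure. One common choice is the Benjamini–Hochberg adjustment, sketched below in plain Python. It assumes the per-gene p-values have already come from the gene-by-gene tests. The p-values and gene names here are made up, and `bh_adjust` and `interesting_genes` are hypothetical names.

```python
# Benjamini-Hochberg adjustment: convert per-gene p-values into adjusted
# values and list genes whose adjusted value is at or below a chosen FDR.


def bh_adjust(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def interesting_genes(pvalues: dict[str, float], fdr: float = 0.05) -> list[str]:
    """Names of genes whose BH-adjusted p-value is <= `fdr`, in input order."""
    names = list(pvalues)
    adj = bh_adjust([pvalues[n] for n in names])
    return [n for n, a in zip(names, adj) if a <= fdr]


if __name__ == "__main__":
    # Hypothetical p-values from gene-by-gene t-tests.
    p = {"g1": 0.001, "g2": 0.008, "g3": 0.039, "g4": 0.041, "g5": 0.60}
    print(bh_adjust(list(p.values())))
    print(interesting_genes(p))  # ['g1', 'g2']
```

Gene g3 has a raw p-value below 0.05 but does not make the list, because its adjusted value (about 0.051) exceeds the 5% FDR level.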
Microarray facility at U-Iowa
University of Iowa Carver College of Medicine
Holden Comprehensive Cancer Center
http://dna-9.int-med.uiowa.edu/?q=node/12