Introduction to DNA Microarrays

Todd Lowe
BME 88a
March 11, 2003
Goal – study many genes at once
Major types of DNA microarray
How to roll your own
Designing the right experiment
Many pretty spots – Now what?
Interpreting the data
The Goal
“Big Picture” biology –
– What are all the components & processes taking place
in a cell?
– How do these components & processes interact to
sustain life?
One approach: What happens to the entire cell when
one particular gene/process is perturbed?
Genome Sequence Flood
• Typical results from initial analysis of a new
genome by the best computational methods:
For 1/3 of the genes we have a “good” idea what they are
doing (high similarity to experimentally studied genes)
For 1/3 of the genes, we have a guess at what they are
doing (some similarity to previously seen genes)
For 1/3 of genes, we have no idea what they are doing (no
similarity to studied genes)
Large Scale Approaches
• Geneticists used to study only one (or a
few) genes at a time
• Now, thousands of identified genes to
assign biological function to
• Microarrays allow massively parallel
measurements in one experiment (3 orders
of magnitude or greater)
Several types of arrays
• Spotted DNA arrays
– Developed by Pat Brown’s lab at Stanford
– PCR products of full-length genes (>100nt)
• Affymetrix gene chips
– Photolithography technology from computer industry
allows building many 25-mers
• Ink-jet microarrays from Agilent
– 25-60-mers “printed” directly on glass slides
– Flexible, rapid, but expensive
Basis: The Southern Blot
Basic DNA detection technique that has been used
for over 30 years, known as Southern blots:
1. A “known” strand of DNA is deposited on a solid
support (e.g. nitrocellulose paper)
2. An “unknown” mixed bag of DNA is labeled
(radioactive or fluorescent)
3. “Unknown” DNA solution allowed to mix with
known DNA (attached to nitro paper), then excess
solution washed off
4. If a copy of “known” DNA occurs in “unknown”
sample, it will stick (hybridize), and labeled DNA
will be detected on photographic film
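The four steps above can be sketched as a toy model. This is only an illustration: it assumes hybridization requires a perfect reverse-complement match (real hybridization tolerates some mismatches), and the probe and sample sequences are made up.

```python
# Toy model of hybridization: a labeled strand "sticks" to the immobilized
# probe when it contains the probe's exact reverse complement.
# Assumption: perfect matching only; sequences below are hypothetical.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def hybridizes(probe, labeled_strand):
    """True if the labeled strand contains the probe's reverse complement."""
    return reverse_complement(probe) in labeled_strand.upper()


def detect(probe, labeled_sample):
    """Count labeled strands left on the support after the wash step."""
    return sum(hybridizes(probe, strand) for strand in labeled_sample)


probe = "ATGGCGTACG"  # hypothetical "known" DNA on the solid support
sample = [
    "TTCGTACGCCATAA",    # contains the reverse complement of the probe
    "GGGGCCCCAAAATTTT",  # unrelated strand, washed off
]
signal = detect(probe, sample)
print(signal)
```

A nonzero `signal` corresponds to a visible band or spot; a microarray repeats this test for thousands of probes at once.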
Spotting Robot Demo
Massive Increase in Measurements
• Most commonly, 5-50 samples can be tested
in each traditional Southern experiment
• Affymetrix chips have >250,000 oligos per chip
(multiple oligos per gene)
• Microarray “spotters” are high-precision robots
with metal pins that dip into DNA solution & tap
down on glass slide (pins work like a fountain pen)
– Currently, ~48,000 different DNA spots can fit on one
glass microscope slide
Pros/Cons of Different Technologies
Spotted Arrays
• Relatively cheap to make (~$10 per slide)
• Flexible - spot anything you want
• Cheap, so can repeat experiments many times
• Highly variable spot
• Usually have to make your own
• Accuracy at extremes in range may be less
Affy Gene Chips
• Expensive ($500 or more)
• Limited types avail, no chance of specialized
• Fewer repeated experiments usually
• More uniform DNA
• Can buy off the shelf
• Dynamic range may be slightly better
Types of Array Exp
• mRNA transcription analysis
– Single experiment (control v. experimental)
– Time course (multiple samples in same exp)
• Genomic DNA -- similarity of genomes
– Genetic Footprinting
– Species cross hybridization (existence of a
specific pathway in a related species)
An Array Experiment
Yeast Genome Expression Array
Image Analysis & Data Visualization
Cy3     Cy5     Cy5/Cy3
200     10000   50.00
4800    4800    1.00
9000    300     0.03
[Plot axis label: log2 Cy3]
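Reading the three rows above as two channel intensities followed by their ratio (an assumption about the column order: Cy3, Cy5, Cy5/Cy3), a short sketch shows why ratios are usually put on a log2 scale: induction and repression of similar magnitude end up at similar distances from zero.

```python
import math

# (Cy3 intensity, Cy5 intensity) for three spots.
# Assumption: the first column is Cy3 and the second is Cy5.
spots = [(200, 10000), (4800, 4800), (9000, 300)]


def log2_ratio(cy3, cy5):
    """log2 of the Cy5/Cy3 ratio; 0 means no change between channels."""
    return math.log2(cy5 / cy3)


results = [(cy3, cy5, cy5 / cy3, log2_ratio(cy3, cy5)) for cy3, cy5 in spots]

for cy3, cy5, ratio, lr in results:
    print(f"{cy3:>6} {cy5:>6} {ratio:>7.2f} {lr:>7.2f}")
```

On the raw scale the three ratios are 50.00, 1.00 and 0.03, which hides the size of the drop in the third spot; on the log2 scale they are roughly 5.64, 0.00 and -4.91.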
What do we want to know?
• Genes involved in a specific biological
process (e.g. heat shock)
• “Guilt by association” - assumption that
genes with the same pattern of changes in
expression are involved in the same pathway
• Tumor classification - predict outcome /
prescribe appropriate treatment based on
clustering with “known outcome” tumors
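“Guilt by association” needs a way to say that two genes have the same pattern of changes. One common choice, used here as an assumption rather than as the method from the lecture, is the Pearson correlation between expression profiles. The gene names and log2 ratios below are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.

    Assumes neither profile is constant (nonzero variance).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical log2 ratios over a four-point time course.
profiles = {
    "geneA": [0.1, 1.0, 2.1, 3.0],
    "geneB": [0.0, 0.9, 2.0, 3.2],  # rises with geneA
    "geneC": [2.9, 2.0, 1.1, 0.1],  # falls while geneA rises
}


def most_similar(query, profiles):
    """Name of the gene whose profile correlates best with the query."""
    scored = [
        (pearson(profiles[query], profile), name)
        for name, profile in profiles.items()
        if name != query
    ]
    return max(scored)[1]


partner = most_similar("geneA", profiles)
print(partner)
```

The same similarity measure can drive tumor classification: an unknown sample is assigned to whichever “known outcome” group it correlates with best.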
Developing New Methods
• How do you know when your method
performs better than a previous method?
• A “gold standard” test set for benchmarking
array data doesn’t exist
• There is too much biology we don’t know:
if a new method classifies a gene in the
“wrong” gene group, is it recognizing new
biology, or just getting it wrong?
Limitations of Arrays
• Do not necessarily reflect true levels of
proteins - protein levels are regulated by
translation initiation & degradation as well
• Generally, do not “prove” new biology - simply suggest genes involved in a process,
a hypothesis that will require traditional
experimental verification
• Expensive! $20K-$100K to make your own
/ buy enough to get publishable data
Array + Sequence Analysis
Promoter motif extraction (Church/
1. Cluster / classify genes with common
response pattern
2. Align upstream promoter regions (Gibbs
sampler) or count over-represented X-mers
3. Develop profile / motif from set & search
genome for new candidates w/ motif
4. Return to array data, look for supporting
evidence for new members
5. Carry out experiment to support hypothesis
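The counting alternative in step 2 can be sketched as follows. This is a minimal illustration of counting over-represented X-mers, not the Gibbs sampler: it compares X-mer frequencies in the clustered genes' upstream regions against a background set, with a crude pseudocount so unseen X-mers do not divide by zero. All sequences, the function names, and the scoring formula are assumptions made for the example.

```python
from collections import Counter


def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def enrichment(cluster_seqs, background_seqs, k, pseudocount=1.0):
    """Rank k-mers by cluster frequency relative to background frequency."""
    fg = kmer_counts(cluster_seqs, k)
    bg = kmer_counts(background_seqs, k)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    scores = {}
    for kmer, n in fg.items():
        fg_freq = n / fg_total
        # Crude smoothing: a k-mer absent from the background still gets
        # a small nonzero frequency.
        bg_freq = (bg[kmer] + pseudocount) / (bg_total + pseudocount)
        scores[kmer] = fg_freq / bg_freq
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical upstream regions; TGACTC is planted in each clustered gene.
cluster = ["AAGTTGACTCAGGT", "CCTGACTCATTGCA", "GGATGACTCTTACC"]
background = ["AAGTCCGTAGGTCA", "CCTAGGCATTGCAT", "GGATCCGATTACCG"]

ranked = enrichment(cluster, background, k=6)
top_kmer, top_score = ranked[0]
print(top_kmer, round(top_score, 2))
```

The top-ranked X-mers are only candidates for step 3; a real analysis would use far more sequence and a proper statistical test before searching the genome with the motif.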