Bumgarner_Array-Tutorial-2001

DNA Arrays - Technology and Uses: A Tutorial
Roger Bumgarner
1/10/01
The University
of Washington
School of
Medicine
Department of Microbiology
Outline
• Types of arrays
– Choices to be made
• Applications of arrays
– Focus on Expression analysis
• Data Analysis
DNA Arrays
• Spots of DNA arranged in a particular spatial arrangement on a solid support
• Supports - filters (nylon, nitrocellulose), glass, silicon
• Types
– Spotted - cDNAs, genomic clones, oligos
– Synthesized - light-directed synthesis, spatially directed fluidics (ink-jet)
The Original DNA Array
1) Petri dish with bacterial colonies
2) Apply membrane and lift to make a filter containing DNA from each clone.
3) Probe and image to identify clones homologous to the probe.
Robotic Spotters for Filters
Major Suppliers of Filter Based Arrays
• Research Genetics (www.resgen.com)
– Human (35-40k genes, some specific sets)
– Rat (3 filters, 15k genes total)
– Mouse (5.5k genes)
– Yeast (6.2k genes)
• Incyte (Genome Systems - www.incyte.com)
– A variety of genomic filters and microarrays
• Clontech (www.clontech.com)
– Human, mouse, rat - filters and glass - many custom sets
Oligo Arrays
• Synthesized or spotted arrays of short
(typically <20 base pairs) oligos of chosen
sequence.
• Synthesis methods - light directed, ink jet.
• Spotting using reactive coupling.
• Used for re-sequencing, genotyping,
diagnostics and expression arrays.
Affymetrix Array Technology
(www.affymetrix.com)
The “Gene Chip” from Affymetrix
Ink Jet Synthesis
[Figure: bases added stepwise to surface OH groups]
1) Deposit phosphoramidite
2) After coupling
3) After deprotection
4) Repeat
Ink Jet Synthesis Companies
• Rosetta InPharmatics (www.rii.com)
– working system, “FlexJet™ Arrays”
• Protogene (www.protogene.com)
– working system, “FlexChip™ Arrays”
• InCyte (www.InCyte.com)
– licensed a patent from Caltech
Spotted Oligo Arrays
[Figure: a drop containing modified oligo (typically 5’ NH2) is deposited on a reactive surface; example oligo bases as drawn: C A G T T T G A]
Reactive Slides
• Surmodics (www.surmodics.com)
(availability???)
• Telechem (www.arrayit.com)
• Noab Diagnostics
(http://www.noabdiagnostics.com)
• homebrew
Relative merits of different
methods of making oligo arrays
• Affy:
– available now, small feature size possible
• Inkjet:
– much more flexible to design
• Spotted:
– less practical for large numbers (more than a few hundred) of oligos; can be made with standard spotting equipment.
Arrays of longer DNAs
• Typically PCR products
– ORFs with gene-specific primers
– cDNA inserts
• Spotted onto derivatized slides
– Vendors (Amersham, Corning, Telechem, Surmodics, etc.)
– Homebrew (polylysine; see cmgm.stanford.edu/pbrown/mguide/index.html)
Arrayers
• Amersham/Molecular Dynamics (www.amersham.com)
• Genomic Solutions (www.genomicsolutions.com)
• Genetix (www.genetix.co.uk)
• GeneMachines (www.genemachines.com)
• Genetic Microsystems (www.geneticmicro.com)
• Intelligent Automation Systems (www.ias.com/bio.html)
• Many, many others - see www.ncbi.nlm.nih.gov/ncigap “Expression Technologies”
MD GenIII Arrayer
• Plate hotel holds thirteen 384-well plates
• Gridding head, 12 pins
• Slide holder, 36 slides
Features:
• 36 slides in 5 hours
• 4608 genes spotted in duplicate
• Built-in humidity control
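The capacity figures above are consistent with simple plate arithmetic. A minimal sketch, assuming that twelve of the thirteen 384-well plates carry samples (the slide does not say how the thirteenth plate is used); the function name is hypothetical:

```python
WELLS_PER_PLATE = 384  # from the slide: 384-well plates

def spots_per_slide(sample_plates, replicates=2):
    """Return (genes, spots) printed on one slide.

    Assumption: one gene per well, each gene spotted `replicates` times.
    """
    genes = sample_plates * WELLS_PER_PLATE
    return genes, genes * replicates

genes, spots = spots_per_slide(sample_plates=12)
print(genes, spots)  # 4608 9216
```

Twelve plates give the 4608 genes quoted on the slide, and duplicate spotting gives 9216 spots per slide.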
Choices to be made
• Type of substrate:
– Filters, glass, silicon (Affymetrix) ?
• Type of target
– Oligo or longer (PCR product, clone) ?
• Where to obtain the arrays
– In-house production or purchase/collaborate
Decision Parameters
• Application
– Genotyping - requires oligo arrays
– Expression analysis can be done with oligo or cDNA arrays, but ...
– Is separation of homologous genes or splice variants important? If so, use oligos.
• Organism
– Human, mouse, rat, yeast, and E. coli arrays are commercially available.
– Other organisms - you must make your own.
Decision Parameters - cont.
• Amount of sample
– Glass or Affy arrays - typically 1-2 µg of mRNA or 10-50 µg of total RNA
– Filters - 10-20 ng of mRNA
• Number of genes
• Cost - commercial arrays average $1000 for 5000-gene arrays; $500 for Affy (single color)
Practical Advice
• For genotyping, oligos
– small number of loci (a few hundred): make your own
– large number of loci: purchase
• Replicate measurements are essential, so cost is a very important factor.
• For expression analysis, in-house cDNA arrays cost 2-5x less than commercial arrays - our cost is $260/array.
• A lot can be done with cDNA arrays.
The UW Center for Expression
Arrays
• Arrays
– Human: 15k sequence verified set from Research
Genetics
– Mouse: 15k sequence verified set from NIA
– Yeast: Full genome set from Fields lab
– Pseudomonas: Full genome set from Steve Lory
• Each array contains between 4600 and 7600 genes
spotted in duplicate (9200-15,200 spots) - $260/ea.
The UW Center for Expression Arrays - Services
• RNA QC - run on the Agilent 2100 bioanalyzer ($10/sample)
• Scanning (included in the cost of a slide)
• Analysis facilities
– Computers in RPRC, Rosen
– Homebrew software + Rosetta’s Resolver package (1Q 2001)
• Protocols
• Contact Kimberly Smith - kimeyeam@u…. 732-6049
How are we doing?
Typical Yeast Array Data
Typical Human Array from
Training Session
(2000101528 from 12/1/00: HeLa WT vs HepG2)
Where are arrays likely to go?
• Commercial arrays for common organisms will come down in price - must reach a few hundred dollars or less.
• Oligo arrays are superior for most applications.
• In the future we will focus on hybs and data analysis, and also on “odd-ball” organisms.
Applications for DNA Arrays
• Sequence checking/re-sequencing
• Genotyping
• Translation State Analysis
• Gene expression analysis
Sequencing By Hybridization (SBH)
TGTCATGCATATGCGGAATCACTTAGCATCGACTACGCATC...
ACAGTACGTATACGCCTTAGTGAATCGTAGCTGATGCGTAG...
ACAGTACGTA
CAGTACGTAT
AGTACGTATA
GTACGTATAC
TACGTATACG
ACGTATACGC
CGTATACGCC
GTATACGCCT
TATACGCCTT
ATACGCCTTA
ETC...
A sequence N bases long contains (N-9) overlapping 10-base sequences, each of which has 9 bases of overlap with a neighboring sequence.
Problem with SBH
Suppose I have the following 43 bp sequence:
---------1---------2---------3---------4---
TGTCATGCATATGCGGAATCCTTAGCTGTCATGCATATGCGGA
With a repetitive sequence, there are fewer unique oligos. In the above case, instead of 34 unique 10 bp oligos, there are only 26: eight 10 bp oligos occur twice. With repetitive sequence, it is not possible to construct a unique sequence of the proper length by SBH.
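The loss of unique oligos can be checked directly. The sketch below counts overlapping 10-mers in the 43-base example using a sliding window of N - k + 1 positions; the function name `kmers` is an illustrative choice, not something from the talk:

```python
from collections import Counter

def kmers(seq, k=10):
    """All overlapping k-mers; a sequence of N bases has N - k + 1 of them."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

seq = "TGTCATGCATATGCGGAATCCTTAGCTGTCATGCATATGCGGA"
counts = Counter(kmers(seq))
repeated = [kmer for kmer, n in counts.items() if n > 1]

# length, number of windows, distinct 10-mers, 10-mers seen more than once
print(len(seq), sum(counts.values()), len(counts), len(repeated))  # 43 34 26 8
```

The eight repeated 10-mers all come from the 17-base stretch that appears at both ends of the sequence, which is why the order of the segments between the repeats cannot be resolved from the oligo set alone.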
Re-sequencing format
....ACGTCGTATCGTAGTAGCAGCTGATCGTACGTACG.....
ACGTCGAATCGTAGT
ACGTCGCATCGTAGT
ACGTCGGATCGTAGT
ACGTCGTATCGTAGT
CGTCGTATCGTAGTA
CGTCGTCTCGTAGTA
CGTCGTGTCGTAGTA
CGTCGTTTCGTAGTA
GTCGTAACGTAGTAG
GTCGTACCGTAGTAG
GTCGTAGCGTAGTAG
GTCGTATCGTAGTAG
etc.....
Each group of four probes (A, C, G, or T at the varied base) interrogates one position.
Chip of oligos distributed along the known sequence with the middle base varying.
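The probe layout for the re-sequencing format can be generated mechanically from the known sequence. A minimal sketch, assuming 15-base probes with the varied base at the seventh position (0-based index 6), which is what the example probes on the slide show; the function and parameter names are hypothetical:

```python
BASES = "ACGT"

def resequencing_probes(ref, probe_len=15, var_pos=6):
    """For each window of the reference, return four probes that differ only at var_pos."""
    tiles = []
    for start in range(len(ref) - probe_len + 1):
        window = ref[start:start + probe_len]
        tiles.append([window[:var_pos] + b + window[var_pos + 1:] for b in BASES])
    return tiles

ref = "ACGTCGTATCGTAGTAGCAGCTGATCGTACGTACG"
tiles = resequencing_probes(ref)
for probe in tiles[0]:
    print(probe)
# ACGTCGAATCGTAGT
# ACGTCGCATCGTAGT
# ACGTCGGATCGTAGT
# ACGTCGTATCGTAGT
```

Within each group of four, the probe that matches the sample perfectly is expected to give the strongest signal, which identifies the base at that position.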
Re-sequencing format applied to genotyping
....ACGTCGTATCGTAGTAGCAGCTGATCGTACGTACG.....
Locus 1
...ACGTCGGATCGTAGT.
...ACGTCGTATCGTAGT.
Some other sequence, locus 2
etc., locus 3, 4, ...
Individual A: heterozygote G/T
Individual B: homozygote G
Individual C: homozygote T
etc.
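One way to turn the four probe signals at a locus into a genotype call is a simple relative-intensity rule. This is only an illustrative sketch: the rule, the `min_fraction` cutoff, and the intensity values are all invented for the example and are not taken from the talk.

```python
def call_genotype(intensities, min_fraction=0.5):
    """Call the bases whose probe signal is at least min_fraction of the strongest probe.

    intensities: dict mapping base ("A", "C", "G", "T") to probe signal.
    """
    top = max(intensities.values())
    called = sorted(b for b, v in intensities.items() if v >= min_fraction * top)
    if len(called) == 1:
        return f"homozygote {called[0]}"
    if len(called) == 2:
        return f"heterozygote {called[0]}/{called[1]}"
    return "no call"

# Made-up signals for three individuals at one locus
print(call_genotype({"A": 40, "C": 35, "G": 900, "T": 850}))  # heterozygote G/T
print(call_genotype({"A": 30, "C": 25, "G": 950, "T": 60}))   # homozygote G
print(call_genotype({"A": 20, "C": 30, "G": 45, "T": 1000}))  # homozygote T
```

A real caller would also need replicate probes and an error estimate before trusting a call; this sketch only shows the shape of the decision.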
Arrayed Primer Extension
[Figure: template strand base-paired to an arrayed primer, which is then extended]
Translation State Array
Analysis (TSAA)
[Figure: samples labeled with CY3 and CY5]
Analyze for changes in translation state
1) Extract mRNA from Cell Population #1 and from Cell Population #2
2) Make cDNA; label population #1 w/ green fluor and population #2 w/ red fluor
3) Co-hybridize to a slide with DNA from different genes
4) Scan
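After scanning, each spot yields one red and one green intensity, and the usual summary is a log ratio. A minimal sketch, assuming simple background subtraction and red/green as the ratio direction (both are conventions chosen for illustration; the function name is hypothetical):

```python
import math

def log2_ratio(red, green, red_bg=0.0, green_bg=0.0):
    """Background-subtracted log2(red/green).

    Returns None if either channel is not above its background,
    since the ratio is then undefined or meaningless.
    """
    r, g = red - red_bg, green - green_bg
    if r <= 0 or g <= 0:
        return None
    return math.log2(r / g)

print(log2_ratio(2100, 600, red_bg=100, green_bg=100))  # 2.0
```

Working in log space makes a 4x increase (+2) and a 4x decrease (-2) symmetric around zero.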
Towards Pathway Modeling
DNA ---> RNA ---> Protein
Quantities to measure: [mRNAs], [proteins], rates
Technologies: expression arrays, TSAA, 2-D gels and other proteomic technologies
Other Applications
• Genome - genome comparisons
– Species-to-species
– Individual-to-individual
• Environmental surveys for presence/absence of a given bacterium (or bacteria)
• Identification of protein-DNA binding sites.
• Measurement of DNA replication rates.
• Many others…...
Data Analysis (short)
• Normalization
• Statistics
What do we actually measure?
• Answer: We measure signal (radioactivity, Cy3 signal, or Cy5 signal) of cDNA target(s) which hybridize(s) to our probe (and backgrounds, ratios, standard deviations, dust, etc.)
• What do we wish to know (an abstraction)?
$[\mathrm{mRNA}]_{1,a}, [\mathrm{mRNA}]_{1,b}, \ldots, [\mathrm{mRNA}]_{N,a}, [\mathrm{mRNA}]_{N,b}$
where $N$ = number of genes, and $a$ and $b$ = different experimental conditions.
Some observations
• Ratios we measure by 2-color expression arrays often underestimate the ratio as measured by other technologies (e.g., Northerns or real-time PCR).
• The above effect is worse for more highly expressed genes, i.e., ratios are more “compressed” at high expression.
• Everything that can go wrong generally conspires to compress the ratio.
• The measured ratio is dependent on the concentration of the probe (e.g., the amount of DNA on the spot).
• Hence, I don’t refer to our measurements using “fold-change” terminology.
Types of normalization
• To total signal
• To “housekeeping genes”
• To genomic DNA spots (Research Genetics) or mixed cDNAs
• To internal spikes
• Other ways …..
Assume (makes an “ass” of “u” and “me”)
Often we assume:
$[\mathrm{mRNA}]_{n,a} \propto \mathrm{signal}_{n,a}$
$[\mathrm{mRNA}]_{n,a} = k \cdot \mathrm{signal}_{n,a}$
where $k$ is the “normalization constant”.
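Under this linear assumption, normalizing to total signal amounts to choosing one constant per channel. A minimal sketch, assuming the Cy5 channel is rescaled so that its summed signal matches the Cy3 channel; the function name and the intensity values are invented for illustration:

```python
def total_signal_scale(cy3, cy5):
    """Constant k that makes the summed Cy5 signal equal the summed Cy3 signal."""
    return sum(cy3) / sum(cy5)

# Made-up spot intensities for three genes
cy3 = [1200.0, 300.0, 4500.0]
cy5 = [600.0, 150.0, 2250.0]

k = total_signal_scale(cy3, cy5)
normalized_cy5 = [k * s for s in cy5]
print(k, normalized_cy5)  # 2.0 [1200.0, 300.0, 4500.0]
```

A single constant like this is only adequate if the relationship between the channels really is linear across the whole intensity range.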
Data normalization - It’s more
complicated than you might think...
• Experiment
– Take RNA from a single sample and make
aliquots
– label one in Cy3 and one in Cy5
– hyb to the same array
• Expect
– same ratio for all detectable signals (+/- error)
– can normalize to some controls to get a linear
scaling factor
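The same-sample experiment suggests a direct check on the linear-scaling expectation: if one factor were enough, the mean log ratio would be the same at every intensity. A minimal sketch that bins spots by intensity and reports the mean log2(Cy5/Cy3) per bin; the function name, the binning scheme, and the data are assumptions made for illustration, and all intensities must be positive:

```python
import math
from statistics import mean

def mean_log_ratio_by_intensity(cy3, cy5, n_bins=3):
    """Sort spots by average log intensity, split into up to n_bins groups,
    and return the mean log2(Cy5/Cy3) of each group (dimmest first)."""
    spots = sorted(
        (0.5 * (math.log2(r) + math.log2(g)), math.log2(r / g))
        for g, r in zip(cy3, cy5)
    )
    size = math.ceil(len(spots) / n_bins)
    return [mean(m for _, m in spots[i:i + size]) for i in range(0, len(spots), size)]

# Made-up self-vs-self data in which the dim spots show a dye bias
cy3 = [100, 200, 1600, 3200]
cy5 = [200, 400, 1600, 3200]
print(mean_log_ratio_by_intensity(cy3, cy5, n_bins=2))  # [1.0, 0.0]
```

Bins that disagree, as in this invented example, indicate that no single linear scaling factor can normalize the whole array.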
Ratio sorted from most to least expressed
How reproducible are array experiments?
Std. Logic for Array Experiments
• Arrays are very expensive …..
• I can’t afford replicates….
• I’ll just do one experiment …..
• I can still get this published….
• So it must be OK!
• Is it?
What does a typical error profile look like?
[Figure: error profile; a bracket marks 60-80% of the data (on semi-random mammalian clone arrays)]
What is a statistically significant level of differential expression?
[Figure: histogram of # of genes vs. log ratio, with one point marked “Is this point significant?”]
A few comments about
histograms of ratios
• They are narrower at high gene expression due to decreased scatter in the signal.
– The threshold for calling a gene differentially expressed should be a function of intensity, F(I).
• They are not necessarily log normal.
• They are almost always close to log normal if all data is included, since error is log normal.
Selection of differentially expressed genes
[Figure, two panels: (1) threshold approach: histogram of number of genes vs. log ratio with a fixed threshold; (2) threshold = F(I): log ratio vs. intensity]
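The difference between a fixed cutoff and an intensity-dependent cutoff can be sketched in a few lines. The particular F(I) below is hypothetical (a wide cutoff for dim spots that narrows to a floor for bright ones), as are the function names and the data; in practice F(I) would be fitted to the observed scatter.

```python
def select_genes(spots, threshold):
    """spots: list of (gene, log_intensity, log_ratio).
    threshold: function F(I) giving the cutoff on |log ratio| at log intensity I."""
    return [gene for gene, i, m in spots if abs(m) > threshold(i)]

def example_threshold(log_intensity):
    # Hypothetical F(I), not a fitted error model
    return max(0.5, 3.0 - 0.2 * log_intensity)

# Made-up spots: the same log ratio at low and at high intensity
spots = [
    ("dim_gene", 8.0, 1.0),
    ("bright_gene", 14.0, 1.0),
    ("bright_flat", 14.0, 0.2),
]
print(select_genes(spots, example_threshold))  # ['bright_gene']
```

With a fixed threshold, `dim_gene` and `bright_gene` would always be treated alike; with F(I), the same log ratio is accepted only where the scatter is small enough to support it.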
Error (Precision) Estimation Methods
• Large number of replicate measurements to calculate standard error (n >= 5).
• Small number of replicates to calculate standard error (n = 3-5).
• Duplicates with a common error model.
• Duplicates with Δ to estimate error.
• Single measurement with error model borrowed from similar experiments.
• Single measurement (current standard in many publications).
[Arrow alongside the list, labeled “Cost” and “Value”]
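For the replicate-based methods in the list above, the standard error follows from the usual formula, sample standard deviation divided by the square root of n. A minimal sketch with invented log ratios; the function name is hypothetical:

```python
from statistics import mean, stdev

def mean_and_standard_error(log_ratios):
    """Mean and standard error of the mean for n >= 2 replicate log ratios."""
    n = len(log_ratios)
    return mean(log_ratios), stdev(log_ratios) / n ** 0.5

# Five made-up replicate log ratios for one gene
m, se = mean_and_standard_error([1.0, 1.2, 0.8, 1.1, 0.9])
print(round(m, 3), round(se, 3))  # 1.0 0.071
```

With only duplicates, the same formula reduces to half the absolute difference between the two measurements, which is one way to read "duplicates with Δ": the estimate exists, but a single difference is a very noisy estimate of the error.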
Our “Typical” Experiment
• Replicates within the array
• Replicate arrays (Sample 1 and Sample 2 on each)
• Net result - 4 data points/“gene”
Reasons for doing a “flipped
color” experiment
• Aids in data normalization
– Fit a normalization function so that both color schemes
agree with each other
• Some data points do not invert ratio in a flipped
color experiment! (~ 0.1% in human)
– These will appear as differentially expressed in a single
experiment but are false positives.
– Sequence specific incorporation effects?
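The flipped-color logic can be written down compactly: a real expression change should invert its log ratio when the dyes are swapped, so a gene whose ratio keeps the same sign in both experiments is suspect. A minimal sketch; the function name and the `min_abs` cutoff are assumptions for illustration:

```python
def combine_flipped(log_ratio_fwd, log_ratio_flip, min_abs=0.5):
    """Combine a forward and a dye-flipped log ratio for one gene.

    A real change inverts sign when the colors are flipped, so the two
    measurements are averaged as (fwd - flip) / 2. A gene whose ratio
    keeps its sign and exceeds min_abs in both experiments is flagged.
    """
    combined = (log_ratio_fwd - log_ratio_flip) / 2
    same_sign = log_ratio_fwd * log_ratio_flip > 0
    suspect = same_sign and min(abs(log_ratio_fwd), abs(log_ratio_flip)) > min_abs
    return combined, suspect

print(combine_flipped(1.0, -1.0))  # (1.0, False)
print(combine_flipped(1.0, 1.0))   # (0.0, True)
```

The second case would look differentially expressed in either experiment alone; averaging the flipped pair cancels it and the flag marks it for inspection.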
“Spot-on” Data Processing
1) Raw data
2) “Spot-on Image”: spot locations, intensities, backgrounds, ratios, error estimates
3) “Spot-on Unite”: mean ratios, error estimates, links to external dB
4) “Spot-on Select”: select genes which are differentially expressed by a statistically significant amount
Recommendations/comments
• Normalization algorithms/methods should be looked at more carefully. Don’t assume linearity.
• All measured numbers from arrays should have
associated error estimates of some kind.
• Error estimates are best obtained by replicate
measurements on replicas which represent the
true variability of the biology (sample and/or
culture heterogeneity can be a major issue).
Recommendations/comments - cont.
• Subsequent biology done by yourself or others
on false positives/negatives is often more costly
than the array analysis.
• Biologists often worry too much about false
negatives and not enough about false positives.
• You can’t publish a false negative but you can
publish a false positive.
A few more comments
• The more experimental planning you do up
front, the more you can extract from an
array experiment.
– Can you control sample heterogeneity?
– What are the best controls?
– Do you have enough sample to do sufficient replicates to get meaningful results?
The Center for Expression Arrays
Current team:
• Kim Smith, manager (206)732-6049,
[email protected]
• Jada Quinn, Darren May, Suzanne Oakley (Research
technicians)
• Erick Hammersmark, Chuck Benson (software engineers).
• Aaron Valla, Alice Tanada, Min Hui (undergraduates).
Lots of help (past and present): Lee Hood lab, Stan Fields lab,
Michael Katze lab (Gary Geiss), Jim Mullins lab
(Angelique van’t Wout), Stephen Lory lab.
When would one do custom arrays?
• Not very often
– cost/array for large “standard” arrays is similar
to cost/array for small custom arrays.
– there is a lot we don’t know.
• If RNA is very limited, then it makes sense.
– procedure: 1) Using an analogous system,
identify differentially expressed genes. 2) make
small arrays of these genes to use with tissue of
interest.