Course details

cDNA Sequencing, SAGE
and Microarray Analysis
Outline
• Overview of transcription
• Construction of cDNA libraries
• cDNA sequencing
• Expression analysis via SAGE
• Microarray construction and use in expression analysis
The “Central Dogma” of Molecular Biology
DNA --> mRNA --> protein
We can isolate mRNA and convert it to a stable form (cDNA)
Isolate, reverse transcribe, label: cDNAs
Genome in numbers
Nucleic acid content of an average human cell
Abundance distribution of mRNA species in a typical mammalian cell
Isolation of mRNA
cDNA library construction
The big picture
cDNA synthesis
cDNA synthesis occurs in the 5’ to 3’ direction and requires:
– a template
– nucleotides (dNTPs)
– reverse transcriptase (retroviral polymerase)
– a primer to initiate synthesis
The resulting duplex cDNA is a DNA copy of the original mRNA sequence
It is now ready for manipulation (cloning, library construction)
Priming alternatives for cDNA construction
Oligo dT: priming at 3’ terminus
Random hexamers: priming throughout sequence
Cloning: Blunt end vs. sticky
cDNA library construction
Ligation of cDNA into vector
Directional cloning
cDNA library
Definition of a good cDNA library
Ideally containing at least one copy of every expressed gene
The probability of achieving this is a function of:
fragment size – the longer the fragments, the more likely a gene is represented
genome size – a smaller genome increases the chance that a gene is represented
expression – high expression makes it more likely that a gene is represented
For 99% probability, a mammalian cDNA library must contain ~800,000 clones
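The library-size figure can be related to a standard sampling estimate, N = ln(1 - P) / ln(1 - f), where f is the fraction of the mRNA pool contributed by the transcript of interest. The slides do not state this formula or the value of f behind the ~800,000 figure, so the sketch below is only an illustration: the function name and the example fraction (1 part in 175,000) are assumptions.

```python
import math

def clones_needed(probability, fraction):
    """Clones to screen so that a transcript making up `fraction` of the
    mRNA pool appears at least once with the given probability.
    Uses N = ln(1 - P) / ln(1 - f); assumes independent, unbiased sampling."""
    if not (0 < probability < 1 and 0 < fraction < 1):
        raise ValueError("probability and fraction must lie strictly between 0 and 1")
    # log1p(-f) is ln(1 - f), computed accurately for very small f
    return math.ceil(math.log(1 - probability) / math.log1p(-fraction))

# Hypothetical example: a rare transcript at 1 part in 175,000 of the mRNA pool
rare_fraction = 1 / 175_000
print(clones_needed(0.99, rare_fraction))
```

With these assumed inputs the estimate lands a little above 800,000 clones, the same order as the figure quoted above.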
Uses of a cDNA library
Representation of the population of genes defining a cell’s phenotype
Long-term stable storage of information
Retrieve full-length genes rather than fragments (screening)
Find gene “family” members (screening)
cDNA sequencing
• The advent of cDNA cloning, combined with the creation of
automated sequencers, led to efforts to sequence the entire
human transcriptome and to create arrays (on filters) of
cDNAs (see reading materials).
• cDNA sequencing was viewed as the fastest way to get at
the coding portion of the genome.
• Numerous companies sprang up to sequence and patent
cDNAs.
• cDNA sequencing was also used to measure gene
expression levels.
cDNA sequencing --> expression
analysis
• Expression level estimates:
– Count the number of occurrences of a given cDNA
sequence in a given library - highly expressed genes
will have been sequenced more often.
– Use the above (in combination with the total number of
sequences in the library) to estimate expression level.
Ex. PEDB (www.PEDB.org)
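The counting approach above can be sketched in a few lines. This is a minimal illustration, not the method used by PEDB: the function name is hypothetical and the gene labels and clone counts are made-up toy data.

```python
from collections import Counter

def expression_estimates(clone_gene_ids):
    """Map each gene to (clone count, fraction of all sequenced clones)."""
    counts = Counter(clone_gene_ids)
    total = sum(counts.values())
    return {gene: (n, n / total) for gene, n in counts.items()}

# Toy library: one entry per sequenced clone, labeled by the gene it matched
library = ["ACTB"] * 6 + ["GAPDH"] * 3 + ["TP53"]
estimates = expression_estimates(library)
for gene, (n, frac) in sorted(estimates.items(), key=lambda kv: -kv[1][0]):
    print(f"{gene}: {n} clones, {frac:.1%} of library")
```

A gene seen once in a small library gets a very noisy estimate, which is the low-abundance problem raised later in these slides.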
Web based expression analysis - www.pedb.org
counting cDNA frequency
Serial Analysis of Gene
Expression (SAGE)
• Concept
– cDNA sequencing is expensive
– Can uniquely identify most mRNA species by a
short sequence in a defined location in the gene
(9bp tags are unique 95% of the time)
– If we could produce a library of short
sequences and ligate them together, then we
could sequence the ligated DNA to measure the
abundance of each transcript more efficiently
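The idea of a short tag at a defined position can be sketched as follows. The slides do not name the enzymes, so the anchoring site used here (CATG, the NlaIII site of the published SAGE protocol) is an assumption, as are the function names and the toy sequences. The tag is taken as the 9 bp immediately 3' of the last anchoring site in each transcript.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # assumed anchoring-enzyme site; not stated in the slides
TAG_LENGTH = 9        # tag length quoted in the slide

def sage_tag(transcript, anchor=ANCHOR_SITE, tag_length=TAG_LENGTH):
    """Return the tag 3' of the last anchor site, or None if none fits."""
    pos = transcript.rfind(anchor)
    if pos == -1:
        return None
    start = pos + len(anchor)
    tag = transcript[start:start + tag_length]
    return tag if len(tag) == tag_length else None

def count_tags(transcripts):
    """Tally tags over a collection of transcript sequences."""
    tags = (sage_tag(t) for t in transcripts)
    return Counter(t for t in tags if t is not None)

# Upper bound on the number of distinct 9 bp tags
print(4 ** TAG_LENGTH)

# Toy transcripts (made-up sequences)
print(count_tags(["AAACATGGGGCATGACGTACGTAAAA"] * 3 + ["TTTCATGCCCCCCCCCAAA"]))
```

Counting identical tags then plays the same role as counting identical cDNA clones, but each sequencing read covers many tags.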
SAGE diagram
Linker: Primer A/B - TypeII site – Type I site
[Diagram: fragments carrying linker A (Primer A) and linker B (Primer B); label "Sequence these ->"]
Issues with SAGE (and cDNA
sequencing for expression analysis)
• Low abundance clones
– SAGE
• In 1995, the estimate was that quantifying genes expressed at
<100 mRNAs/cell would take a single investigator a few months
of work (maybe 10 times quicker today)
• Cost - assuming even a low estimate of $6/sequencing reaction, 96
lanes * 4 runs/day * 30 days * $6 = ~$69,000 to measure ~460,000 tags
(assuming 40 tags per lane).
– cDNA sequencing
• Same problem; costs/time maybe 20-40 times higher
• Hence expression information about low abundance clones
is not accurate in cDNA or SAGE data in most cases.
• Leading to the advent of arrays...
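The SAGE cost estimate above can be checked with a few lines of arithmetic. The numbers come from the slide; reading the 40 tags as tags per lane (one lane per sequencing reaction) is an interpretation, and the variable names are illustrative.

```python
lanes_per_run = 96
runs_per_day = 4
days = 30
cost_per_reaction = 6    # dollars, the slide's low estimate
tags_per_reaction = 40   # interpreted as tags per lane

reactions = lanes_per_run * runs_per_day * days
total_cost = reactions * cost_per_reaction
total_tags = reactions * tags_per_reaction

print(f"{reactions} reactions, ${total_cost:,}, {total_tags:,} tags")
print(f"${total_cost / total_tags:.2f} per tag")
```

The totals, $69,120 and 460,800 tags, match the rounded figures on the slide.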
DNA Hybridization
Taking advantage of DNA hybridization
[Diagram panels: "On the surface" (spots A and B); "In solution" (4 copies of gene A, 1 copy of gene B); "After Hybridization" (spots A and B)]
DNA Arrays
• Spots of DNA arranged in a particular
spatial arrangement on a solid support
• Supports - Filters (nylon, nitrocellulose),
glass, silicon
• Types
– Spotted or placed - pre-synthesized DNA put
onto a surface
– Synthesized - DNA synthesized directly on the
surface
The Original DNA Array
Petri dish with
bacterial colonies
Apply membrane and lift to make a filter
containing DNA from each clone.
Probe and image to identify clones homologous to the probe.
Vicki - A manual Gridding tool
Gridding tool modifications by : Michèl Schummer
Vicki and the gridding frame
Frame Design by: Michèl Schummer
Robotic Spotters for Filters
Types of filter based arrays
• PCR products - ORFs or cDNAs
• Oligos - sometimes, but generally not used: short
products such as oligos do not immobilize well on
membranes
• Living clones
– Place membrane on Whatman paper soaked in media,
can grow colonies directly on the arrays
– Lysis of the colonies followed by cross-linking
produces DNA arrays
– Good for screening large libraries
Uses for Filter Based Arrays
• In general, filter based arrays were in vogue about 8-13
years ago in the pre-genomic days.
• Typically cDNA libraries were spotted as clones and the
arrays were used to perform comparative expression
analysis.
• Detection was typically performed with radioactive
labeling/film or phosphorimaging.
• “Interesting clones” were identified (via differential
expression) and then sequenced.
• For genomes that have not yet been sequenced, this can
still be a cost effective approach, but rapid sequencing is
changing that.
Selected cDNA arrays
• With unselected cDNA libraries, clones for highly
expressed genes are over represented on the arrays.
• As time progressed a large number of cDNAs were
sequenced and hence it became possible to select
unique cDNAs and to make arrays on which each
spot represented a single gene.
• Around the same time, coatings for glass were
developed that retained spotted DNA well.
• This allowed for arrays to be produced on glass
microscope slides which in turn allowed for
fluorescence based detection technology.
Typical Path for cDNA clone acquisition
[Flow diagram; node labels in the order extracted]
– Image Consortium sequences + others
– Sequence cDNAs
– GenBank
– Reduce redundancy
– Unigene clones
– Livermore, ATCC
– Commercial distributors: Res. Genetics, InCyte, others
– Unigene sets
– Sequence checking
– Sequence verified sets
– Us
Spotted Arrays
[Diagram: a spotting “pen” deposits a drop containing DNA in solution onto a reactive or coated surface]
MD GenIII Arrayer
• Plate hotel holds twelve 384-well plates
• Gridding head, 12 pins
• Slide holder, 36 slides
Features:
• 36 slides in 8 hours
• 7680 genes spotted in duplicate
• Built-in humidity control
Cell Population #1: Extract mRNA --> Make cDNA --> Label w/ Green Fluor
Cell Population #2: Extract mRNA --> Make cDNA --> Label w/ Red Fluor
Co-hybridize to a slide with DNA from different genes
Scan
Glass slides enabled fluorescent detection in 2 (or more) colors
Spotted arrays
• Initially, most spotted arrays were produced by
spotting PCR products produced from selected cDNA
clones.
• Issues
– Must have the libraries in hand
– Must not mix clones up
– Must perform high throughput PCR to produce DNA to spot
(again without mixing things up).
– LOTS of freezer space to store everything
– cDNAs are long and cross hybridization is a problem
(although it is possible to spot oligos)
– Quality manufacturing is difficult to maintain.
Oligo Arrays
• Synthesized or spotted arrays of short oligos of
chosen sequence. (typically 20-60 base pairs)
• Synthesis methods - ink jet, light directed.
• Spotting using reactive coupling.
• Used for re-sequencing, genotyping, diagnostics
and expression arrays.
• MUCH better than cDNA arrays to distinguish
related sequences
• Only have to store the DNA’s OR (better yet) if
you synthesize DNA directly on the surface, you
only need to store the sequence information (and a
few reagents)
Basic Oligo Synthesis
[Diagram: bases built up on a glass support, with a protecting group on the incoming base. Steps shown: Coupling, Remove Protecting Group, Add Next Nucleotide]
Ink-jets Can be Used to Direct Small Volumes of Liquids to Specific Sites
Agilent InkJet Array Technology
[Diagram of the firing cycle (< 1 msec). Stages shown: Fill Reservoir, Liquid Vaporizes, Gas Expands, Drop Breaks Off, Reservoir Refills, with the resistor switching Off, On, Off]
~ 44,000 Features on 1”x3” Slide
If, instead of using ink, one fills the reservoirs
with different nucleotides, inkjets can be used
to make DNA on a surface
Glass Can be Treated to Produce
Hydrophilic “Wells”
Agilent Printing Facility
Light-directed oligo synthesis
Number of different DNA sequences as a function of photolithographic resolution

Resolution    Synthesis Site Density
500 um        400/cm2
200 um        2500/cm2
100 um        10,000/cm2
50 um         40,000/cm2
20 um         250,000/cm2
10 um         1,000,000/cm2
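The density figures follow from simple geometry if each synthesis site is taken to be a square whose side equals the photolithographic resolution, with no gaps between sites. That packing assumption is mine, not stated on the slide, and the function name is illustrative.

```python
def sites_per_cm2(resolution_um):
    """Synthesis sites per square centimeter for square sites of the given side."""
    sites_per_cm = 10_000 / resolution_um  # 1 cm = 10,000 um
    return sites_per_cm ** 2

for resolution in (500, 200, 100, 50, 20, 10):
    print(f"{resolution} um -> {sites_per_cm2(resolution):,.0f} sites/cm2")
```

Halving the feature size quadruples the number of sequences that fit on the same area.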
All possible oligos can be made in 4*N steps

Probe Length    Chemical Steps    Number of Possible Probes
4               16                256
8               32                65,536
10              40                1,048,576
15              60                1,073,741,824
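The table values follow from two formulas: each of the N positions needs one coupling step per base (4*N steps in total), while the number of distinct sequences of length N is 4^N. A small sketch with illustrative function names:

```python
def chemical_steps(probe_length):
    """One coupling step per base (A, C, G, T) at each position."""
    return 4 * probe_length

def possible_probes(probe_length):
    """Number of distinct sequences of the given length."""
    return 4 ** probe_length

for n in (4, 8, 10, 15):
    print(f"length {n}: {chemical_steps(n)} steps, {possible_probes(n):,} possible probes")
```

The step count grows linearly while the sequence space grows exponentially, which is what makes parallel synthesis on the surface attractive.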
Affymetrix Platform
• Each gene is represented by 11 probe pairs
of 25 bp oligos
• Each probe pair contains a perfect match
and a mismatch to the gene sequence
• Target sample is labeled with a biotinylated
nucleotide and detected via a streptavidin-phycoerythrin conjugate
• One sample per array, one-color data
Affymetrix Expression Data
Data from the 11 probe pairs are used to calculate an aggregate signal for each gene
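The slides do not say how the aggregate signal is computed. One simple possibility, shown here only as an assumption and not as the Affymetrix algorithm, is to average the perfect-match minus mismatch differences across the probe pairs. The function name and the intensity values are hypothetical.

```python
from statistics import mean

def average_difference(perfect_match, mismatch):
    """Mean of (PM - MM) over the probe pairs of one gene."""
    if len(perfect_match) != len(mismatch):
        raise ValueError("each perfect-match probe needs a mismatch partner")
    return mean(pm - mm for pm, mm in zip(perfect_match, mismatch))

# Made-up intensities for 11 probe pairs of a single gene
pm = [1200, 980, 1500, 1100, 1300, 900, 1250, 1400, 1000, 1150, 1050]
mm = [300, 250, 400, 280, 350, 220, 310, 390, 260, 290, 270]
print(average_difference(pm, mm))
```

Subtracting the mismatch signal is meant to remove non-specific hybridization; a robust average would be less sensitive to a single misbehaving probe pair than the plain mean used here.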
Strategies For Array Design
[Diagram: known exons and an unknown transcript]
Surrogate Strategy - most expression arrays to date
Annotation Strategy - exon arrays, splice variants
Tiling Strategy - unbiased look at the genome
Affymetrix Platform
• Expression arrays
– Human, Mouse, Rat, Yeast, E. coli, Drosophila, C. elegans, Dog,
Soybean, Plasmodium, Anopheles, Pseudomonas, Arabidopsis,
Zebrafish, Xenopus, etc.
• Exon arrays
– Alternative splicing patterns
• Mapping arrays
– SNP analysis, loss of heterozygosity
• Tiling array sets
– Transcript mapping
• Custom arrays
Issues with synthesized oligos
• Repetitive yield - i.e. for each reaction cycle, the
percentage of the oligos that react as intended: estimated at 95% for the light directed method, 98-99% for the ink jet method
• (0.95)^20 = 35.8%, (0.98)^20 = 67% - net result: Affy arrays are usually 25-mers, ink jet arrays are
usually 60-mers.
• For a single oligo, it can be shown that sensitivity
plateaus at 50-70 bp.
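The effect of per-cycle yield on full-length product can be sketched as a simple power law. Like the slide's own arithmetic, this assumes one reaction cycle per base and an identical yield at every cycle; the function name is illustrative.

```python
def full_length_yield(step_yield, length):
    """Fraction of oligos that are full length after `length` cycles."""
    return step_yield ** length

cases = (
    ("light directed, 20-mer", 0.95, 20),
    ("ink jet, 20-mer", 0.98, 20),
    ("light directed, 25-mer", 0.95, 25),
    ("ink jet, 60-mer", 0.98, 60),
    ("ink jet, 60-mer", 0.99, 60),
)
for label, step_yield, length in cases:
    print(f"{label} at {step_yield:.0%} per cycle: {full_length_yield(step_yield, length):.1%}")
```

Because the loss compounds with every cycle, a method with a lower per-cycle yield is pushed toward shorter probes.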
Relative merits of different
methods of making oligo arrays
• Affy:
– available first, large catalogue, small feature size
possible
• Inkjet:
– much more flexible to design
• Spotted:
– less practical for large numbers (>a few 100) of
oligo’s, can be made with std. spotting equipment.
Libraries of oligos exist for more common organisms,
so oligo deposition is feasible for some organisms.
Illumina’s Bead Arrays
[Figure: beads carrying example oligo sequences such as ACGTGTCTACAGT and TGCATCAGTGCA]
Step 1 - Synthesize beads in batches, each batch with a sequence on it. Generally, color
code the beads to keep track of which one has what molecule on it.
Step 2 - Etch the ends of optical fibers in a bundle
or circular spots on a glass slide to create bead
sized depressions.
Illumina’s Bead Arrays (cont)
Step 3 - Allow beads to self assemble into an array on the end
of the fibers or on the surface
•These self assembled arrays can be used for the same applications as other DNA
arrays.
•Since the assembly is random, one must over represent each desired oligo tens of
times to ensure that each oligo is represented at least n times on the array.
•Decoding can also be accomplished by hybridizing short labeled oligos to the oligos
on each bead. In practice, this is how it is usually done.
See www.illumina.com
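The need for over-representation in a randomly assembled array can be illustrated with a Poisson approximation. This is a sketch under assumptions the slides do not make explicit: wells are filled independently and uniformly from a large bead pool, so the copy number of any one bead type is roughly Poisson with mean (wells / bead types). The function name is hypothetical.

```python
import math

def prob_at_least(mean_copies, n):
    """P(a given bead type appears at least n times), Poisson approximation."""
    below = sum(
        math.exp(-mean_copies) * mean_copies ** k / math.factorial(k)
        for k in range(n)
    )
    return 1 - below

# Chance that one bead type is present at least 5 times, for several redundancies
for mean_copies in (5, 10, 20, 30):
    print(f"mean {mean_copies} copies: P(>= 5) = {prob_at_least(mean_copies, 5):.6f}")
```

An average equal to the required n leaves a sizeable chance of falling short for any single bead type, and with many bead types on one array the per-type failure probability has to be very small, hence the tens-fold redundancy.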
Detection technologies
• Radio labeled probes
– Film or phosphorimagers
• Biotin labeled
– Post-hybridization staining with streptavidin (SA) labeled with a fluor or an
enzyme
• Fluorescent probes
– confocal scanning
Scanning with a confocal microscope
Expression Array Analysis
2-color Microarray Overview
Control, Test --> Prepare Fluorescently Labeled Probes --> Hybridize, Wash --> Measure Fluorescence in 2 channels (red/green) --> Analyze the data to identify patterns of gene expression
Slide from John Quackenbush, Dana Farber
1-color Microarray Overview
Control, Test --> Prepare Fluorescently Labeled Probes --> Hybridize, Wash --> Measure Fluorescence in 1 channel --> Analyze the data to identify patterns of gene expression
Slide adapted from John Quackenbush, Dana Farber
2-color vs. single color
• 2-color was originally designed due to problems in
making reproducible arrays - e.g. the ratio on a
spot is more reproducible than the absolute
intensity if the spot size/concentration changes
from array-to-array.
• With 2-colors, you don’t necessarily get twice as
much data since it is typical to run an extra array
in the inverted color scheme.
• Experimental design and cross experiment
comparisons are much more complicated with 2-color arrays.
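The reproducibility argument for ratios can be made concrete with a toy calculation. The intensities below are made up, the function names are hypothetical, and averaging the two orientations of an inverted-color pair is shown as one common way to combine them, not as a method taken from the slides.

```python
import math

def log2_ratio(red, green):
    """Log2 of the red/green intensity ratio for one spot."""
    return math.log2(red / green)

# Same 2-fold difference measured on two arrays; the second spot holds 3x more DNA,
# so absolute intensities differ but the ratio does not.
array_1 = log2_ratio(2000, 1000)
array_2 = log2_ratio(6000, 3000)
print(array_1, array_2)

def dye_swap_average(forward, swapped):
    """Combine log2(red/green) from an array and its inverted-color partner.
    In the swapped array the test sample is green, so its sign is flipped."""
    return (forward - swapped) / 2

# Hypothetical pair with a dye-specific bias of +0.2 in both orientations
print(dye_swap_average(1.2, -0.8))
```

A bias that affects one dye the same way in both orientations cancels in the average, which is the reason for spending an extra array on the inverted scheme.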
Expression Arrays are a Natural Extension of
Genomic Analysis
• Genome studies provide the source material for
the arrays - e.g. clones or manufactured DNAs.
• For completely sequenced genomes, arrays allow a
comprehensive survey of gene expression.
• This level of analysis is a revolution in biology.
Expression Arrays Have a Broad Range
of Applicability
• Cancer Studies - tumor vs. normal.
• Infectious disease studies - host response
to infection, infectious agent gene expression, viral
diversity.
• Pharmaceutical studies - drug treated vs. non-treated.
• Environmental - microbial diversity, effects of
toxins, effect of growth conditions.
Expression Arrays Have a Broad Range
of Applicability
• Gene specific studies - deletion (“knockout”) vs. normal,
over expression vs. normal.
• Agricultural studies - effects of pesticides, growth
conditions, hormones.
• Developmental biology - cells from different
areas/stages of developing organisms
• Many others - any two samples of interest can be
compared.
Challenges for Planning Good
Array Experiments
• Experimental Design
– Replicates are necessary and expensive
– A simple experiment may not give a simple answer
– What comparisons should be made?
• Data Analysis
– How will differentially expressed genes be identified?
– How will errors be estimated?
– What software does this best?
– How will the data be mined?
Where are arrays going?
• As sequencing gets cheaper and cheaper, most
assays that are currently done by arrays can be
done more effectively by sequencing. Hence, the
analytical use of arrays will be replaced by
sequencing.
• However, arrays can also be used to enrich for
specific genomic regions upstream of sequencing
or can be used to create many sequences for the
artificial production of genomes or genomic
regions.