Microarray Analysis 1



DNA microarray and array data analysis
Some of the slides are adapted from the lecture notes of Dr. Patrick Leahy of
the Gene Expression Array Core Facility at CWRU
What is a DNA Microarray?




A DNA microarray is a new technology to measure the level of the mRNA gene products of a living cell.
A microarray chip is a rectangular chip on which a grid of DNA spots is imposed.
These spots form a two-dimensional array.
Each spot in the array contains millions of copies of some DNA strand, bonded to the chip.
Chips are made tiny so that only a small amount of RNA is needed from experimental cells.
DNA Microarray

Many applications in both basic and clinical research
determining the role a gene plays in a pathway, disease, diagnostics and pharmacology, …
There are three main platforms for performing microarray analyses.
cDNA arrays (generic, multiple manufacturers)
Oligonucleotide arrays (genechips) (Affymetrix)
cDNA membranes (radioactive detection)

cDNA Microarray

Spot cloned cDNAs onto a glass/nylon microscope slide


usually PCR amplified segments of plasmids
Complementary hybridization
-- CTAGCAGG actual gene
-- GATCGTCC cDNA (Reverse transcriptase)
-- CUAGCAGG mRNA
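The three aligned strands above can be reproduced with a minimal Python sketch. The function names are illustrative only; the cDNA is written base by base under the gene, in the same left-to-right order as the slide (it is not reversed).

```python
# Base-pairing rules for DNA (A pairs with T, C pairs with G).
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def transcribe(gene: str) -> str:
    """mRNA carries the gene's sequence with U in place of T."""
    return gene.replace("T", "U")

def complement(dna: str) -> str:
    """Base-by-base complement, kept in the slide's left-to-right order."""
    return "".join(DNA_COMPLEMENT[base] for base in dna)

gene = "CTAGCAGG"          # "actual gene" from the slide
mrna = transcribe(gene)    # CUAGCAGG
cdna = complement(gene)    # GATCGTCC
print(mrna)
print(cdna)
```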




Label 2 mRNA samples with 2 different colors of fluorescent dye -- control vs. experimental
Mix the two labeled mRNAs and hybridize to the chip
Make two scans - one for each color
Combine the images to calculate ratios of the amounts of each mRNA that binds to each spot
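The ratio step can be sketched as follows. This is a minimal illustration, assuming the two scans have already been reduced to one positive intensity per spot; reporting the ratio on a log2 scale is a common convention and an assumption here, and the intensities are made up.

```python
import math

def log2_ratios(test, control):
    """Per-spot log2(test / control) for two lists of positive intensities."""
    return [math.log2(t / c) for t, c in zip(test, control)]

# Hypothetical intensities for three spots (not taken from the slides).
control = [500.0, 1200.0, 800.0]
test = [1000.0, 1200.0, 200.0]

ratios = log2_ratios(test, control)
print(ratios)  # positive: higher in TEST, zero: unchanged, negative: lower in TEST
```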
Spotted Microarray Process (figure: CTRL and TEST samples)
cDNA Array Experiment Movie

http://www.bio.davidson.edu/courses/genomics/chip/chip.html
“Long Oligos”




Like cDNAs, but instead of using a cloned gene, design a 40-70 base probe to represent each gene
Relies on genome sequence database and bioinformatics
Reduces cross hybridization
Cheaper and possibly more sensitive than the Affymetrix system
Affymetrix


Uses 25 base oligos synthesized in place on a chip (20 pairs of oligos for each gene)
cRNA labeled and scanned in a single “color”
one sample per chip
Can measure as many as 47,000 transcripts on a chip (HG-U133 Plus 2.0 Array)
Arrays get smaller every year (more genes)
Chips are expensive (about $400/chip)
Proprietary system: “black box” software, can only use their chips
Affymetrix Genome Arrays
Affymetrix GeneChip® Probe Arrays
Figure labels: GeneChip Probe Array (1.28 cm); hybridized probe cell (24~50 µm); oligonucleotide probe; single-stranded, fluorescently labeled cRNA target.
Each probe cell or feature contains millions of copies of a specific oligonucleotide probe
Image of Hybridized Probe Array
Affymetrix GeneChip
Probe: 25 bases long single stranded DNA oligos
Probe Cell: Single square-shaped feature on an array containing one type of probe. Contains millions of probe molecules
Probe Pair: Perfect Match/Mismatch
Probe Set
Array Design
Twenty oligo probes are selected from the last 600 bases from the 3’ end of the gene
For each probe selected, a partner containing a central mutation is also made (Perfect Match: 25 mer DNA oligo; Mismatch)
For each gene a total of 20 probe pairs are arrayed on the chip
Figure labels: 5’ and 3’ ends of the gene; Probe Set; Probe Pair (PM, MM); Probe Cell (24 µm)
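A perfect match/mismatch probe pair can be sketched in a few lines. The slides only say the partner contains "a central mutation"; swapping the middle base for its complement is an assumption made here for illustration, and the 25-mer sequence is invented.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def mismatch_probe(perfect_match: str) -> str:
    """Return a partner probe that differs only at the central base.

    Assumption: the central mutation replaces the middle base with its
    complement (the slides do not specify the substitution).
    """
    middle = len(perfect_match) // 2  # index 12 of a 25-mer, i.e. the 13th base
    return (perfect_match[:middle]
            + COMPLEMENT[perfect_match[middle]]
            + perfect_match[middle + 1:])

# Hypothetical 25-mer perfect match probe (not a real probe sequence).
pm = "ACGT" * 6 + "A"
mm = mismatch_probe(pm)
print(pm)
print(mm)
```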
Probe Sub-types on chips

Known genes






Specific transcripts
Exemplars
Consensus
Housekeeping genes
Expressed sequence tags (ESTs)
Spiked control transcripts
cRNA preparation
Total RNA (5-8 µg), poly(A) tail: AAAAAAAAA
cDNA Strand 1 synthesis: SS II reverse transcriptase, TTTTTTTTT primer carrying the T7 RNA pol. promoter
cDNA Strand 2 synthesis: E. coli DNA pol. I
IVT cRNA synthesis (T7 RNA pol.) amplifies and labels transcripts with Biotin
Fragmented cRNA
cRNA is now ready for hybridization to test chip
Figure: cRNA labeled targets (B) hybridized to cDNA probes, showing specific and non-specific binding; post-hybridization washes; staining with streptavidin (S) carrying the FL label.
Microarray experiment
Cells
Poly (A)+ RNA
cDNA
IVT (B-UTP): Biotin-Labeled cRNA transcript
Fragment (heat, Mg2+): Biotin-Labeled cRNA fragments
Hybridize, wash, stain (1-18 hours), then scan
.dat file
The chip image data file (or “.dat” file) is the first part of data acquisition and appears on the computer screen upon completion of the laser scan.
Here, we zoom in to see an individual probe set that has been highlighted.
The first image is “sample1.dat.” Note the pixel-to-pixel variation within a probe cell.
.cel file
A “*.cel” file is automatically generated when the “*.dat” image first appears on the screen. Note that this derivative file has homogeneous signal intensity within its probe cells.
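The step from pixel-level “.dat” data to one intensity per probe cell can be illustrated with a small sketch. The slides do not say which summary statistic the scanner software uses, so the median below is an assumption, as are the square cell size and the tiny made-up image.

```python
from statistics import median

def summarize_cells(pixels, cell_size):
    """Collapse a 2-D pixel grid into one intensity per probe cell.

    Assumptions: cells are cell_size x cell_size blocks, the image
    dimensions are exact multiples of cell_size, and each cell is
    summarized by the median of its pixels.
    """
    rows, cols = len(pixels), len(pixels[0])
    cells = []
    for top in range(0, rows, cell_size):
        row_of_cells = []
        for left in range(0, cols, cell_size):
            block = [pixels[r][c]
                     for r in range(top, top + cell_size)
                     for c in range(left, left + cell_size)]
            row_of_cells.append(median(block))
        cells.append(row_of_cells)
    return cells

# Hypothetical 2 x 4 pixel image holding two 2 x 2 probe cells.
image = [[10, 12, 200, 210],
         [11, 13, 190, 220]]
cel = summarize_cells(image, 2)
print(cel)
```

Each probe cell ends up with a single value, which is why the “.cel” image looks homogeneous within a cell.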
Affymetrix Algorithms
Signal Calculation.
1. Signal
1.1 Adjusting MMs to purge negative values
All MMs < PMs: no adjustment necessary
Few MMs > PMs: change MMs based on weighted mean of other MMs
Most MMs > PMs: change MMs to be slightly less than PM
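The three MM-adjustment cases can be sketched roughly as below. This is a simplified illustration, not the vendor's algorithm: a plain mean of the log2(PM/MM) ratios stands in for the weighted mean, and the `shrink` factor used for "slightly less than PM" is a hypothetical parameter. All intensities must be positive.

```python
import math

def adjust_mismatches(pm, mm, shrink=0.98):
    """Return MM values adjusted so that every PM - MM is positive."""
    # Typical log2(PM / MM) across the probe set. A plain mean is used
    # here as a stand-in for the weighted mean mentioned on the slide.
    typical = sum(math.log2(p / m) for p, m in zip(pm, mm)) / len(pm)
    adjusted = []
    for p, m in zip(pm, mm):
        if m < p:
            adjusted.append(m)                  # MM < PM: no adjustment
        elif typical > 0:
            adjusted.append(p / 2 ** typical)   # few MMs > PMs: use typical ratio
        else:
            adjusted.append(p * shrink)         # most MMs > PMs: just below PM
    return adjusted

# Hypothetical probe set in which the third pair has MM > PM.
pm = [1000.0, 5000.0, 430.0]
mm = [900.0, 2000.0, 500.0]
adjusted = adjust_mismatches(pm, mm)
print(adjusted)
```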
Affymetrix Algorithms
Calculate the signal
Having adjusted the MM values, we now calculate the signal. The PM values, the MM values, and the calculated PM-MM values are:

PM      1000  5000   430   765   355    98  3005   413  20333   590
MM       900  2000   230    25   331    40  1200   203   6197   230
PM-MM    100  3000   200   740    24    58  1805   210  14136   360

Unweighted mean = 2063
The unweighted mean is vulnerable to outlier data. In order to
protect against this, we dampen the effect of outliers by using the
Tukey bi-weight mean. PM-MM values that are a number of
standard deviations away from the mean are given low weights in
accordance with the graph shown here. Individual PM-MM data
are multiplied by the weight factor before calculation of the mean.
The weighted mean is then called the “signal.”
Figure: weight factor (vertical axis, maximum 1) against standard deviations (horizontal axis, 1 to 6).
Using Tukey’s biweight mean = 1780
Signal (expression level) = 1780
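The effect of down-weighting outliers can be sketched with a one-step Tukey biweight. This is a minimal sketch under stated assumptions: distances are measured from the median in units of the median absolute deviation (the slides speak of standard deviations from the mean), and the tuning constants `c` and `eps` are assumed values. Because the slide's exact weighting curve is not given, the result is not expected to reproduce the 1780 quoted above; the point is only that it is far less affected by the outlier than the unweighted mean.

```python
from statistics import median

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight mean of a list of numbers.

    Assumptions: center = median, scale = median absolute deviation,
    c and eps are tuning constants chosen for illustration.
    """
    m = median(values)
    mad = median([abs(x - m) for x in values])
    weights = []
    for x in values:
        u = (x - m) / (c * mad + eps)          # scaled distance from the center
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

# PM-MM values from the probe set on the slide.
diffs = [100, 3000, 200, 740, 24, 58, 1805, 210, 14136, 360]
unweighted = sum(diffs) / len(diffs)
signal = tukey_biweight(diffs)
print(round(unweighted), signal)
```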
.xls file
ALL_vs_AML_train_set_38_sorted.res
ALL_vs_AML_train_set_38_sorted.cls
38 2 1
00000000000000000000000000011111111111
(27 samples labeled 0, 11 samples labeled 1)
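The header and label line of the “.cls” file can be checked against each other with a short sketch. It assumes the first two header numbers are the sample count and the class count, and it accepts labels written with or without separating spaces, since the slide shows them run together. The function name is illustrative.

```python
from collections import Counter

def parse_cls(header: str, labels: str) -> Counter:
    """Count samples per class and check them against the header.

    Assumption: the header reads "<samples> <classes> 1".
    """
    n_samples, n_classes, _ = (int(tok) for tok in header.split())
    stripped = labels.strip()
    # Labels may be space separated or, as on the slide, run together.
    tokens = stripped.split() if " " in stripped else list(stripped)
    counts = Counter(tokens)
    if len(tokens) != n_samples or len(counts) != n_classes:
        raise ValueError("label line does not match header")
    return counts

header = "38 2 1"
labels = "0" * 27 + "1" * 11   # the slide's label line: 27 zeros, then 11 ones
counts = parse_cls(header, labels)
print(counts)
```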