Transcript PPS - VCU
Introduction to DNA Microarrays
Michael F. Miles, M.D., Ph.D.
Depts. of Pharmacology/Toxicology and
Neurology and the Center for Study of
Biological Complexity
[email protected]
225-4054
Biological Regulation:
“You are what you express”
• Levels of regulation
• Methods of measurement
• Concept of genomics
Regulation of Gene Expression
• Transcriptional
– Altered DNA binding protein complex abundance or function
• Post-transcriptional
– mRNA stability
– mRNA processing (alternative splicing)
• Translational
– RNA trafficking
– RNA binding proteins
• Post-translational
– Many forms!
Regulation of Gene Expression
• Genes are expressed when they are transcribed into RNA
• Amount of mRNA indicates gene activity
• Some genes expressed in all tissues -- but are still
regulated!
• Some genes expressed selectively depending on tissue,
disease, environment
• Dynamic regulation of gene expression allows long term
responses to environment
[Diagram labels: Mesolimbic dopamine; ? Other; Acute Drug Use; Reinforcement; Intoxication; Altered Signaling; Gene Expression; Tolerance; Dependence; ?Synaptic Remodeling; Sensitization; Chronic Drug Use; ?Synaptic Remodeling; Persistent Gene Exp.; Compulsive Drug Use (“Addiction”)]
Progress in Studies on Gene Regulation
Timeline, 1960 to 2000, milestones in order:
• mRNA, tRNA discovered
• Nucleic acid hybridization, protein/RNA electrophoresis
• Molecular cloning; Southern, Northern & Western blots; 2-D gels
• Subtractive hybridization, PCR, differential display, MALDI/TOF MS
• Genome sequencing
• DNA/protein microarrays
Nucleic Acid Hybridization:
How It Works
Primer on Nucleic Acid
Hybridization
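• Hybridization rate depends on time, the
concentration of nucleic acids, and the
reassociation constant for the nucleic acid:
$\frac{C}{C_0} = \frac{1}{1 + k C_0 t}$
where C is the single-stranded concentration at time t, C_0 the
starting concentration, and k the reassociation rate constant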
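The reassociation equation on this slide can be evaluated numerically. The sketch below is only an illustration: the function names and the values of k, C0 and t are assumptions chosen for round numbers, not figures from the lecture. It also shows that half the strands have reassociated when C0·t = 1/k.

```python
def fraction_single_stranded(c0: float, k: float, t: float) -> float:
    """Fraction C/C0 of nucleic acid still single-stranded after time t."""
    return 1.0 / (1.0 + k * c0 * t)


def cot_half(k: float) -> float:
    """Value of C0*t at which C/C0 = 0.5 (half the strands reassociated)."""
    return 1.0 / k


# Illustrative values only (assumed, not taken from the slides).
k = 2.0    # reassociation rate constant, L mol^-1 s^-1
c0 = 0.01  # starting concentration, mol/L

for t in (0.0, 50.0, 500.0):  # seconds
    print(t, round(fraction_single_stranded(c0, k, t), 3))
```

Raising either the concentration or the time moves C0·t up by the same amount, which is why hybridization kinetics are usually discussed in terms of the product C0·t rather than time alone.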
High Density DNA Microarrays
A Bit of History
~1992-1996: Oligo arrays developed by Fodor, Stryer,
Lockhart, others at Stanford/Affymetrix and Southern in
Great Britain
~1994-1995: cDNA arrays usually attributed to Pat Brown
and Dari Shalon at Stanford who first used a robot to print
the arrays. In 1994, Shalon started Synteni which was
bought by Incyte in 1998.
However, in 1982 Augenlicht and Kobrin proposed a
DNA array (Cancer Research) and in 1984 they made a
4000 element array to interrogate human cancer cells.
(Rejected by Science, Nature and the NIH)
Biological Networks
Types of Biological Networks
Gene Regulation Network
Examining Biological Networks:
Experimental Design
Examining Biological Networks
Use of S-score in Hierarchical Clustering of Brain Regional Expression Patterns
[Figure: panels labeled AvgDiff and S-score; scale -2, 0, +2 (relative change)]
Expression Profiling: A Non-biased, Genomic Approach to
Resolving the Mechanisms of Addiction
[Diagram labels: Candidate Gene Studies; Cycles of Expression Profiling; Merge with Biological Databases]
Utility of Expression Profiling
• Non-biased, genome-wide
• Hypothesis generating
• Gene hunting
• Pattern identification:
– Insight into gene function
– Molecular classification
– Phenotypic mechanisms
[Workflow diagram labels: Comparisons (S-score, dchip); De-noise; GE Database (SQL Server); Statistical Filtering (e.g. SAM); Hybridization and Scanning; Clustering Techniques; Experimental Design; Behavioral Validation; Provisional Gene “Patterns”; Molecular Validation (RT-PCR, in situ, Western); Candidate Genes; Filtered Gene Lists; Overlay Biological Databases (PubGen, GenMAPP, QTL, etc.)]
Experimental Design with DNA
Microarrays
High Density DNA Microarrays
Synthesis and Analysis of 2-color
Spotted cDNA Arrays: “Brown Chips”
Comparative Hybridization with
Spotted cDNA Microarrays
Synthesis of High Density Oligonucleotide
Arrays by Photolithography/Photochemistry
GeneChip Features
• Parallel analysis of >30K human, rat or
mouse genes/EST clusters with 15-20
oligos (25 mer) per gene/EST
• entire genome analysis (human, yeast,
mouse)
• 3-4 orders of magnitude dynamic range
(1-10,000 copies/cell)
• quantitative for changes >25% ??
• SNP analysis
Oligonucleotide Array Analysis
[Diagram labels: Total RNA; 5’ AAAA; Rtase/Pol II; dsDNA; AAAA-T7; TTTT-T7; T7 pol; Biotin-cRNA; TTTT-5’; CTP-biotin; Oligo(dT)-T7; Hybridization; Scanning; PM; MM; Streptavidin-phycoerythrin]
Stepwise Analysis of
Microarray Data
• Low-level analysis -- image analysis,
expression quantitation
• Primary analysis -- is there a change in
expression?
• Secondary analysis -- what genes show
correlated patterns of expression?
(supervised vs. unsupervised)
• Tertiary analysis -- is there a phenotypic
“trace” for a given expression pattern?
Affymetrix Arrays: Image
Analysis
“.DAT” file
“.CEL” file
Affymetrix Arrays: PM-MM
Difference Calculation
Probe pairs control for non-specific hybridization of oligonucleotides
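The PM-MM difference calculation can be sketched as follows. This is a simplified illustration rather than the vendor's algorithm: it takes the plain mean of PM minus MM over the probe pairs of one probe set, and leaves out the outlier and background handling that production software applies. The function name and intensity values are hypothetical.

```python
from statistics import mean


def average_difference(pm, mm):
    """Mean of (PM - MM) over the probe pairs of a single probe set."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("need equal-length, non-empty PM and MM lists")
    return mean(p - m for p, m in zip(pm, mm))


# Hypothetical intensities for four probe pairs of one gene.
pm = [1200.0, 950.0, 1100.0, 800.0]  # perfect-match probes
mm = [300.0, 250.0, 400.0, 350.0]    # mismatch probes

print(average_difference(pm, mm))
```

Because each MM probe sits next to its PM partner and differs by a single base, subtracting it gives a per-pair estimate of signal above non-specific hybridization.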
Variability and Error in DNA
Microarray Hybridizations
[Figure (a): Variability in Ln(FC); axes Ln(FC1) and Ln(FC2)]
• Position Dependent Nearest Neighbor (PDNN) - 2003
Zhang, Miles and Aldape (2003) A model of molecular interactions on short
oligonucleotide microarrays: implications for probe design and data analysis.
Nature Biotech. In Press.
Chip Normalization Procedures
• Whole chip intensity
– Assumes relatively few changes, uniform
error/noise across chip and abundance classes
• Spiked standards
– Requires exquisite technical control, assumes
uniform behavior
• Internal Standards
– Assumes no significant regulation
• “Piece-wise” linear normalization
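As a minimal sketch of the first option, whole-chip intensity normalization, the code below rescales each chip so that its mean intensity hits a common target. The target of 1000, the function name, and the intensity values are assumptions for illustration. Real pipelines often use a trimmed mean or median instead of the plain mean.

```python
from statistics import mean


def scale_to_target(intensities, target=1000.0):
    """Rescale one chip so its mean intensity equals `target`.

    Returns the scaled values and the normalization factor used.
    """
    factor = target / mean(intensities)
    return [x * factor for x in intensities], factor


# Two hypothetical chips measuring the same sample at different brightness.
chip_a = [200.0, 400.0, 600.0]
chip_b = [100.0, 200.0, 300.0]

norm_a, factor_a = scale_to_target(chip_a)
norm_b, factor_b = scale_to_target(chip_b)
print(factor_a, norm_a)
print(factor_b, norm_b)
```

A single factor per chip only works under the assumptions listed on the slide: few genes change, and error is uniform across the chip and across abundance classes. The factor itself is worth recording, since an unusually large one flags a dim or failed hybridization.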
S-score
Normalization Confounds:
Non-uniform Chip Behavior
Normalization Confounds:
Non-linearity
Slide Normalization: Pieces and Pins
“Lowess” normalization,
Pin-specific Profiles
After Print-tip Normalization
http://www.ipam.ucla.edu/publications/fg2000/fgt_tspeed9.pdf
See also: Schuchhardt, J. et al., NAR 28: e47 (2000)
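The idea behind pin-specific normalization can be sketched with a deliberately simpler technique than lowess: subtract each print tip's median log-ratio from the spots that tip printed. This removes a constant per-pin offset only. It does not model the intensity dependence that a per-pin lowess fit captures. The data layout, function name, and values are hypothetical.

```python
from collections import defaultdict
from math import log2
from statistics import median


def print_tip_median_normalize(spots):
    """Center log2(red/green) at zero within each print-tip group.

    spots: list of (pin_id, red, green) tuples.
    Returns a list of (pin_id, normalized_log_ratio) in the same order.
    """
    by_pin = defaultdict(list)
    for pin, red, green in spots:
        by_pin[pin].append(log2(red / green))
    centers = {pin: median(ms) for pin, ms in by_pin.items()}
    return [(pin, log2(red / green) - centers[pin]) for pin, red, green in spots]


# Hypothetical spots: pin 1 reads red-heavy, pin 2 reads green-heavy.
spots = [
    (1, 400.0, 200.0), (1, 800.0, 400.0), (1, 1600.0, 400.0),
    (2, 100.0, 200.0), (2, 300.0, 300.0), (2, 200.0, 400.0),
]
normalized = print_tip_median_normalize(spots)
for pin, m in normalized:
    print(pin, m)
```

After centering, a spot's log-ratio is read relative to the other spots from the same pin, so a systematic dye bias from one pin no longer looks like differential expression.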
Quality Assessment
• Gene specific: R/G correlation, %BG,
%spot
• Array specific: normalization factor, %
genes present, linearity, control/spike
performance (e.g. 5’/3’ ratio, intensity)
• Across arrays: linearity, correlation,
background, normalization factors, noise
Statistical Analysis of Microarrays:
“Not Your Father’s Oldsmobile”
Normal vs. Normal
Normal vs. Tumor
Sources of Variability
• Target Preparation
– Group target preps
• Chip Run
– Minor, BUT…
– Be aware of processing order
• Chip Lot
– Stagger lots across experiment if necessary
• Chip Scanning Order
– Cross and block chip scanning order
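One way to put the blocking advice above into practice is to build a run order in which every processing block holds one sample from each experimental group, shuffled within the block. The sketch below assumes that layout. The group names, sample IDs, and function name are hypothetical.

```python
import random


def blocked_run_order(samples_by_group, seed=0):
    """Build processing blocks with one sample per group in each block.

    samples_by_group: dict mapping group name -> list of sample IDs.
    Samples are shuffled within each group, then dealt one per block;
    the order inside each block is shuffled as well. If groups differ in
    size, samples beyond the smallest group's size are left unassigned.
    """
    rng = random.Random(seed)
    shuffled = {g: rng.sample(s, len(s)) for g, s in samples_by_group.items()}
    n_blocks = min(len(s) for s in shuffled.values())
    blocks = []
    for i in range(n_blocks):
        block = [(group, ids[i]) for group, ids in shuffled.items()]
        rng.shuffle(block)
        blocks.append(block)
    return blocks


design = {"control": ["c1", "c2", "c3"], "treated": ["t1", "t2", "t3"]}
blocks = blocked_run_order(design, seed=1)
for number, block in enumerate(blocks, start=1):
    print(number, block)
```

Processing or scanning one block at a time keeps target prep day, chip lot, and scan order from lining up with the treatment groups, so those sources of variability do not masquerade as expression changes.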
Secondary Analysis: Expression
Patterns
• Supervised multivariate analyses
– Support vector machines
• Non-supervised clustering methods
– Hierarchical
– K-means
– SOM
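As an illustration of the unsupervised methods listed above, here is a small K-means sketch in plain Python. Each profile is a list of relative-change values for one gene across conditions. The values, the choice of two clusters, and the initial centers are hypothetical, and real analyses would use an established statistics package.

```python
from math import dist


def kmeans(profiles, centers, iterations=10):
    """Lloyd's algorithm: assign profiles to the nearest center, then update centers."""
    labels = []
    for _ in range(iterations):
        labels = [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
                  for p in profiles]
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:
                # New center is the per-condition mean of the cluster members.
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(list(old))  # keep the center of an empty cluster
        if new_centers == [list(c) for c in centers]:
            break  # assignments are stable
        centers = new_centers
    return labels, centers


# Hypothetical relative-change values for four genes across three brain regions.
profiles = [[2.0, 1.8, -0.1], [1.9, 2.1, 0.0], [-0.2, 0.1, 2.0], [0.0, -0.1, 1.9]]
labels, centers = kmeans(profiles, centers=[profiles[0], profiles[2]])
print(labels)
```

K-means needs the number of clusters chosen in advance and depends on the starting centers, whereas hierarchical clustering builds a full tree and lets the cut point be chosen afterward.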
Use of S-score in Hierarchical Clustering of Brain Regional Expression Patterns
[Figure: panels labeled AvgDiff and S-score; scale -2, 0, +2 (relative change)]
Expression Profiling
[Diagram labels: Prot-Prot Interactions; BioMed Lit Relations; Expression Networks; HomoloGene; Ontology; Pharmacology; Genetics; Behavior]
Array Analysis: Conclusions
• Be careful! Assess quality control
parameters rigorously
• Single arrays or experiments are of limited
value
• Normalization and weighting for noise are
critical procedures
• Across investigator/platform/species
comparisons will most easily be done with
relative data
Comparison of Primary Analysis Algorithms II
Spotted cDNA Microarrays