
Introduction to DNA
Microarrays
Michael F. Miles, M.D., Ph.D.
Depts. of Pharmacology/Toxicology and
Neurology and the Center for Study of
Biological Complexity
225-4054
Biological Regulation:
“You are what you express”
• Levels of regulation
• Methods of measurement
• Concept of genomics
Regulation of Gene
Expression
• Transcriptional
– Altered DNA binding protein complex abundance or function
• Post-transcriptional
– mRNA stability
– mRNA processing (alternative splicing)
• Translational
– RNA trafficking
– RNA binding proteins
• Post-translational
– Many forms!
Regulation of Gene Expression
• Genes are expressed when they are transcribed into
RNA
• Amount of mRNA indicates gene activity
• Some genes expressed in all tissues -- but are still
regulated!
• Some genes expressed selectively depending on
tissue, disease, environment
• Dynamic regulation of gene expression allows long
term responses to environment
[Figure: Acute drug use (mesolimbic dopamine, ? other): reinforcement, intoxication, altered signaling, gene expression; tolerance, dependence, ?synaptic remodeling, sensitization; chronic drug use: ?synaptic remodeling, persistent gene expression; compulsive drug use (“addiction”)]
Progress in Studies on Gene
Regulation
Timeline, 1960 to 2000:
• mRNA, tRNA discovered
• Nucleic acid hybridization, protein/RNA electrophoresis
• Molecular cloning; Southern, Northern & Western blots; 2-D gels
• Subtractive hybridization, PCR, differential display, MALDI/TOF MS
• Genome sequencing
• DNA/protein microarrays
Nucleic Acid Hybridization:
How It Works
Primer on Nucleic Acid
Hybridization
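• Hybridization rate depends on time, the
concentration of nucleic acids, and the
reassociation rate constant for the nucleic
acid (second-order kinetics):
$C/C_0 = 1/(1 + k C_0 t)$
where C is the concentration of nucleic acid still single-stranded at time t, C_0 is the starting concentration, and k is the reassociation rate constant; the product C_0 t is the “Cot” value.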
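The equation can be made concrete with a few lines of Python. This is a minimal sketch assuming ideal second-order kinetics; the rate constant and concentrations in the example are hypothetical numbers chosen for easy arithmetic, not measured values. Setting C/C_0 = 1/2 gives k C_0 t = 1, so half-reassociation occurs at C_0 t = 1/k, and doubling the starting concentration halves the time needed to get there.

```python
# Second-order reassociation kinetics: C/C0 = 1 / (1 + k*C0*t).
# Assumed units: k in L/(mol*s), C0 in mol nucleotide/L, t in s.


def fraction_single_stranded(k: float, c0: float, t: float) -> float:
    """Fraction C/C0 of nucleic acid still single-stranded at time t."""
    return 1.0 / (1.0 + k * c0 * t)


def cot_half(k: float) -> float:
    """C0*t value at which half the strands have reassociated (C/C0 = 0.5)."""
    return 1.0 / k


def time_to_half(k: float, c0: float) -> float:
    """Time needed to reach C/C0 = 0.5 from starting concentration c0."""
    return cot_half(k) / c0


if __name__ == "__main__":
    k = 2.0  # hypothetical rate constant
    for c0 in (0.5, 1.0):  # hypothetical starting concentrations
        t_half = time_to_half(k, c0)
        print(c0, t_half, fraction_single_stranded(k, c0, t_half))
    # prints:
    # 0.5 1.0 0.5
    # 1.0 0.5 0.5
```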
Biological Networks
Types of Biological
Networks
Gene Regulation Network
Examining Biological Networks:
Experimental Design
Examining Biological Networks
A Bit of History
~1992-1996: Oligo arrays developed by Fodor,
Stryer, Lockhart, others at Stanford/Affymetrix and
by Southern in Great Britain
~1994-1995: cDNA arrays usually attributed to Pat
Brown and Dari Shalon at Stanford, who first used a
robot to print the arrays. In 1994, Shalon started
Synteni, which was bought by Incyte in 1998.
However, in 1982 Augenlicht and Kobrin proposed a
DNA array (Cancer Research), and in 1984 they
made a 4000-element array to interrogate human
cancer cells.
High Density DNA Microarrays
Expression Profiling: A Non-biased, Genomic
Approach to Understanding Complex CNS Disease
[Figure: Candidate gene studies; Molecular triangulation: genomics, genetics and pharmacology; Bioinformatics: genetical genomics, functional grouping, literature networks, protein interactions, promoter motif grouping]
Utility of Expression
Profiling
• Non-biased, genome-wide
• Hypothesis generating
• Gene hunting
• Pattern identification:
– Insight into gene function
– Molecular classification
– Phenotypic mechanisms
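Use of S-score in Hierarchical Clustering of Brain Regional Expression Patterns
[Figure: brain regional expression patterns clustered using AvgDiff vs. S-score; color scale shows S-score from -2 to +2 (relative change)]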
Experimental Design with DNA
Microarrays
Sources of Variance in Microarray
Experiments
Type of variance: factors
• Biological: animal-animal differences (intra/inter-cage, supplier); genotype; circadian rhythms; stress
• Technical: sample treatment/harvesting (dissections, injections); target preparation (enzyme lots, mRNA quality); lot-to-lot chip variation; chip processing (scanning order)
• Environmental: temperature; handling; noise/odors
High Density DNA Microarrays
Synthesis and Analysis of 2-color
Spotted cDNA Arrays: “Brown
Chips”
Comparative Hybridization with
Spotted cDNA Microarrays
Synthesis of High Density Oligonucleotide
Arrays by Photolithography/Photochemistry
GeneChip Features
• Parallel analysis of >30K human,
rat or mouse genes/EST clusters
with 15-20 oligos (25-mers) per
gene/EST
• entire genome analysis (human,
yeast, mouse)
• 3-4 orders of magnitude dynamic
range (1-10,000 copies/cell)
• quantitative for changes >25% ??
• SNP analysis
Oligonucleotide Array Analysis
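[Figure: Target preparation. Total RNA (5'...AAAA) is primed with oligo(dT)-T7 and copied by RTase/DNA polymerase into dsDNA carrying the T7 promoter (AAAA-T7 / TTTT-T7); T7 polymerase with CTP-biotin produces biotin-cRNA; the cRNA is hybridized to PM and MM probes, stained with streptavidin-phycoerythrin, and scanned.]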
Stepwise Analysis of
Microarray Data
• Low-level analysis -- image analysis,
expression quantitation
• Primary analysis -- is there a change in
expression?
• Secondary analysis -- what genes show
correlated patterns of expression?
(supervised vs. unsupervised)
• Tertiary analysis -- is there a phenotypic
“trace” for a given expression pattern?
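A minimal sketch of one way to ask the primary-analysis question for a single gene, assuming replicate signals from two conditions. It uses the difference of mean log2 signals and an exact permutation test; the function names and this particular test are illustrative choices, not the slides' method, and dedicated approaches such as SAM or the S-score (listed later) model array data more carefully. With three replicates per group there are only 20 ways to relabel the samples, so the smallest attainable two-sided p-value is 2/20 = 0.1, a reminder that replicate number limits what an experiment can detect.

```python
import itertools
import math
import statistics


def log2_fold_change(treated, control):
    """Mean log2 signal of treated samples minus mean log2 signal of controls."""
    return statistics.mean(math.log2(x) for x in treated) - statistics.mean(
        math.log2(x) for x in control
    )


def permutation_p_value(treated, control):
    """Exact two-sided permutation p-value for the difference in mean log2 signal.

    Every relabelling of the pooled samples into groups of the original sizes is
    enumerated, so this is only practical for small numbers of replicates.
    """
    logs = [math.log2(x) for x in list(treated) + list(control)]
    n_total, n_treated = len(logs), len(treated)
    observed = abs(log2_fold_change(treated, control))
    extreme = total = 0
    for chosen in itertools.combinations(range(n_total), n_treated):
        group_a = [logs[i] for i in chosen]
        group_b = [logs[i] for i in range(n_total) if i not in chosen]
        diff = abs(statistics.mean(group_a) - statistics.mean(group_b))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
        total += 1
    return extreme / total


if __name__ == "__main__":
    treated = [400.0, 420.0, 380.0]  # hypothetical signals for one gene
    control = [100.0, 110.0, 95.0]
    print(round(log2_fold_change(treated, control), 2))  # 1.98
    print(permutation_p_value(treated, control))  # 0.1
```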
Affymetrix Arrays: Image
Analysis
Affymetrix Arrays: Image Analysis
“.DAT” file
“.CEL” file
Affymetrix Arrays: PM-MM
Difference Calculation
Probe pairs control for non-specific hybridization of oligonucleotides
[Figure: Variability in Ln(FC); panel (a), axes Ln(FC1) and Ln(FC2)]
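The PM-MM idea behind AvgDiff can be sketched as below. This assumes one PM and one MM intensity per probe pair and approximates the outlier rule described for AvgDiff (mean and SD estimated after dropping the largest and smallest difference, then pairs more than 3 SD from that mean excluded). The function name and exact trimming rule are assumptions, not Affymetrix's published code.

```python
import statistics


def avg_diff(pm, mm, n_sd=3.0):
    """AvgDiff-style probe-set summary (a sketch, not Affymetrix's exact code).

    pm, mm: perfect-match and mismatch intensities, one value per probe pair.
    """
    if len(pm) != len(mm):
        raise ValueError("PM and MM must have one value per probe pair")
    if len(pm) < 4:
        raise ValueError("need at least 4 probe pairs for a trimmed estimate")
    diffs = [p - m for p, m in zip(pm, mm)]  # MM subtracts non-specific signal
    trimmed = sorted(diffs)[1:-1]  # drop min and max for a robust estimate
    centre = statistics.mean(trimmed)
    spread = statistics.stdev(trimmed)
    kept = [d for d in diffs if abs(d - centre) <= n_sd * spread]
    return statistics.mean(kept)


if __name__ == "__main__":
    # Hypothetical probe set: the last pair is an outlier (e.g. a scratch).
    pm = [200, 220, 210, 190, 205, 2000]
    mm = [100, 110, 105, 95, 100, 100]
    print(avg_diff(pm, mm))  # the 1900 difference is excluded
```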
Probe Level Analysis
Methods
• AvgDiff -- Affymetrix 1996, trimmed mean with
exclusion of outliers, PM-MM
• MAS 5 -- Affymetrix 2001, modeled correction of MM,
Tukey’s bi-weight, PM-MM or PM-m
• MBEI -- Li and Wong 2001, modeled correction and
outlier detection, PM-MM or PM only
• RMA (Robust Multichip Analysis) -- Irizarry et al.
2002, PM only
• PDNN (Position Dependent Nearest Neighbor) -- Zhang et al. 2003, thermodynamic model for probe
interactions, PM only
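MAS 5 replaced the trimmed mean with Tukey's biweight. A minimal sketch of a one-step biweight follows. The tuning constants c = 5 and epsilon = 0.0001 are assumptions based on our reading of Affymetrix's MAS 5 algorithm description, and the helper `biweight_signal` is hypothetical. MAS 5 applies the estimator to log(PM - IM), using an "idealized mismatch" IM, a step not reproduced here.

```python
import math
import statistics


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight estimate of location.

    Each value is scaled by its distance from the median in units of c times the
    median absolute deviation (MAD). Values more than c MADs away get zero weight;
    nearer values are down-weighted smoothly by (1 - u^2)^2.
    """
    m = statistics.median(values)
    mad = statistics.median(abs(x - m) for x in values)
    weights = []
    for x in values:
        u = (x - m) / (c * mad + epsilon)
        weights.append((1 - u * u) ** 2 if abs(u) <= 1 else 0.0)
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


def biweight_signal(probe_values):
    """Hypothetical helper: antilog of the biweight mean of log2 values (> 0)."""
    return 2 ** tukey_biweight([math.log2(v) for v in probe_values])


if __name__ == "__main__":
    values = [1.0, 1.1, 0.9, 1.05, 10.0]  # hypothetical; 10.0 is an outlier
    print(statistics.mean(values), tukey_biweight(values))
```

The outlier receives zero weight, so the biweight estimate stays near 1.03 while the plain mean is pulled up to 2.81.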
Slide Normalization: Pieces and
Pins
“Lowess” normalization,
Pin-specific Profiles
After Print-tip Normalization
http://www.ipam.ucla.edu/publications/fg2000/fgt_tspeed9.pdf
See also: Schuchhardt, J. et al., NAR 28: e47 (2000)
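Print-tip lowess normalization can be sketched as follows, assuming background-corrected red (Cy5) and green (Cy3) intensities and a pin label for each spot. The log ratio M = log2(R/G) is smoothed against average intensity A = 0.5 log2(RG) separately within each pin group, and the fitted trend is subtracted. The smoother is a simplified LOWESS (tricube-weighted local linear fit without Cleveland's robustness iterations), `frac = 0.4` is an arbitrary default, and the function names are hypothetical; a tested implementation such as the lowess smoother in statsmodels would be preferable for real data.

```python
import numpy as np


def local_linear_smooth(x, y, frac=0.4):
    """Simplified LOWESS: tricube-weighted local linear fit evaluated at every x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    k = min(n, max(3, int(np.ceil(frac * n))))  # neighbours per local fit
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        idx = np.argsort(dist)[:k]
        h = dist[idx].max()
        if h == 0:  # all neighbours at the same x: fall back to their mean
            fitted[i] = y[idx].mean()
            continue
        w = (1.0 - (dist[idx] / h) ** 3) ** 3  # tricube weights
        design = np.column_stack([np.ones(k), x[idx]])
        wd = design * w[:, None]
        # Weighted least squares: solve (X^T W X) beta = X^T W y
        beta, *_ = np.linalg.lstsq(design.T @ wd, wd.T @ y[idx], rcond=None)
        fitted[i] = beta[0] + beta[1] * x[i]
    return fitted


def print_tip_lowess(red, green, pins, frac=0.4):
    """Return normalized M values, fitting the M-vs-A trend per print-tip group."""
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    pins = np.asarray(pins)
    m = np.log2(red / green)  # log ratio
    a = 0.5 * np.log2(red * green)  # average log intensity
    m_norm = np.empty_like(m)
    for pin in np.unique(pins):
        sel = pins == pin
        m_norm[sel] = m[sel] - local_linear_smooth(a[sel], m[sel], frac)
    return m_norm
```

Because the trend is fitted within each pin group, a dye bias that differs from one print tip to another is removed along with the intensity-dependent curvature.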
Normalization Confounds:
Non-linearity
Normal vs. Normal
Normal vs. Tumor
Statistical Analysis of Microarrays:
“Not Your Father’s Oldsmobile”
Secondary Analysis:
Expression Patterns
• Supervised multivariate analyses
– Support vector machines
• Non-supervised clustering methods
– Hierarchical
– K-means
– SOM
Clustering Methods
• Distance measurement -- Euclidean most frequently
used: $d^2 = \sum_i (x_i - y_i)^2$
• Clustering techniques
• Supervised multivariate analyses
– Support vector machines
• Non-supervised clustering methods
– Hierarchical -- single vs. complete vs. average linkage
– K-means -- have to estimate “k” initially
– SOM -- self-organizing maps
– Principal components analysis
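The Euclidean distance and the three linkage rules can be illustrated with a small, deliberately naive agglomerative clustering routine. It is a sketch assuming gene profiles stored as equal-length lists; the gene names in the example are hypothetical, and the brute-force search is only suitable for a handful of genes (real analyses use optimized libraries).

```python
import math


def euclidean(x, y):
    """Euclidean distance d = sqrt(sum_i (x_i - y_i)^2) between two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def agglomerate(profiles, linkage="average"):
    """Naive agglomerative clustering; returns the merge history.

    profiles: dict mapping gene name -> list of expression values.
    linkage: 'single' (closest pair of members), 'complete' (farthest pair)
    or 'average' (mean over all cross-cluster pairs).
    """
    if linkage not in ("single", "complete", "average"):
        raise ValueError("linkage must be 'single', 'complete' or 'average'")
    clusters = {name: [name] for name in profiles}
    merges = []

    def cluster_distance(a, b):
        ds = [euclidean(profiles[g], profiles[h]) for g in clusters[a] for h in clusters[b]]
        if linkage == "single":
            return min(ds)
        if linkage == "complete":
            return max(ds)
        return sum(ds) / len(ds)

    while len(clusters) > 1:
        keys = sorted(clusters)
        pairs = [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]]
        a, b = min(pairs, key=lambda p: cluster_distance(*p))
        distance = cluster_distance(a, b)
        # Replace the two closest clusters with their union.
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, distance))
    return merges


if __name__ == "__main__":
    genes = {  # hypothetical profiles across three conditions
        "geneA": [1.0, 2.0, 3.0],
        "geneB": [1.1, 2.1, 2.9],
        "geneC": [5.0, 5.0, 5.0],
        "geneD": [5.2, 4.9, 5.1],
    }
    for step in agglomerate(genes, linkage="average"):
        print(step)
```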
K-means vs. Hierarchical
Clustering
• K-means: select number of groups, divide
genes randomly into those groups, calculate
inter- and intra-group distances. Move genes between groups
until inter-group differences are maximized and intra-group differences are minimized.
• Hierarchical: calculate all pairwise distances
(correlations) and order genes accordingly.
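The k-means procedure described above corresponds closely to Lloyd's algorithm, sketched below under the assumption that each gene is a list of expression values. Lloyd's algorithm minimizes the within-group sum of squared Euclidean distances and can settle in a local optimum, so results depend on the random starting groups (controlled here by a hypothetical `seed` parameter) and, as noted above, on the choice of k.

```python
import math
import random


def kmeans(profiles, k, seed=0, max_iter=100):
    """Lloyd-style k-means; returns one group label per profile.

    Genes start in random groups; each iteration computes the group centroids
    and moves every gene to its nearest centroid, stopping when no gene moves.
    """
    rng = random.Random(seed)
    n = len(profiles)
    labels = [i % k for i in range(n)]
    rng.shuffle(labels)  # random initial groups, each non-empty when n >= k
    for _ in range(max_iter):
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if not members:  # re-seed an empty group with a random gene
                members = [profiles[rng.randrange(n)]]
            centroids.append([sum(col) / len(members) for col in zip(*members)])
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in profiles
        ]
        if new_labels == labels:  # converged: no gene changed group
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    genes = [[0, 0], [0.1, 0], [0, 0.1], [10, 10], [10.1, 10], [10, 10.1]]
    print(kmeans(genes, k=2))
```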
Expression
Profiling:
“It is possible that the expression
profile could serve as a universal
phenotype … Using a
comprehensive database of
reference profiles, the pathway(s)
perturbed by an uncharacterized
mutation would be ascertained by
simply asking which expression
patterns in the database its profile
most strongly resembles … it
should be equally effective at
determining consequences of
pharmaceutical treatments and
disease states”
Hughes et al. Cell 102:109-126 (2000)
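The "which expression patterns in the database its profile most strongly resembles" query can be sketched as a correlation ranking. This is a simplification: Hughes et al. used an established error model and a statistical treatment of the profiles, whereas the sketch below uses plain Pearson correlation of log ratios, and the profile names and values in the example are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_compendium(query, compendium):
    """Rank reference profiles by correlation with the query, most similar first."""
    scores = {name: pearson(query, prof) for name, prof in compendium.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    query = [1.2, -0.8, 0.1, 2.0, -1.5]  # log ratios for an uncharacterized mutant
    compendium = {
        "mutant_X": [1.0, -0.9, 0.0, 1.8, -1.2],
        "drug_Y": [-1.1, 0.7, 0.2, -1.9, 1.4],
        "mutant_Z": [0.1, 0.2, -0.1, 0.0, 0.1],
    }
    for name, r in rank_compendium(query, compendium):
        print(f"{name}: r = {r:.2f}")
```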
Use of Expression Profile “Compendium”
to Characterize Gene or Drug Function
Key features:
• established error model
• profiled large number of mutants/drugs under highly controlled conditions
• statistical treatment of expression patterns
• verified array results with biochemical/phenotypic assays
Hughes et al. Cell 102:109-126 (2000)
Correlation in Expression Profiles
of Drugs/Genes Affecting Same
Pathways
[Figure: Scatter plots comparing expression profiles: cup5 and vma8, components of the H+/ATPase complex; unrelated gene mutants; HMG-CoA reductase mutant vs. lovastatin, an inhibitor of HMG2. Red symbols = significant change (p<0.05) in both treatments.]
Hughes et al. Cell 102:109-126 (2000)
Assigning Function to Uncharacterized Genes
by Expression Profiles
Hughes et al. Cell 102:109-126 (2000)
Tertiary Analysis: Connecting
Function with Expression Patterns
• Annotation
– UniGene/Swiss-Prot, SOURCE, DAVID
• Biased functional assessment
– Manual, GenMAPP, GeneSpring
• Non-biased functional queries
– PubGene
– MAPPFinder, DAVID/EASE, GEPAS, GOTree
Machine, others
• Overlaying genomics and genetics
– WebQTL
Non-biased (semi)
Functional Group Analysis:
GenMAPP
Expression Analysis Systematic Explorer (EASE)
http://apps1.niaid.nih.gov/david/upload.jsp
Genome Biol. 2003;4(10):R70. Epub 2003 Sep 11.
EASE -- Options in Analysis
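The over-representation statistics behind tools such as EASE and FatiGO can be sketched with the hypergeometric distribution, whose upper tail equals a one-sided Fisher exact test. Hosack et al. describe the EASE score as a jackknifed Fisher exact probability that penalizes categories supported by very few genes; the `ease_score` function below approximates this by removing one category gene from the list before testing, and DAVID's own 2x2 table layout may differ. All function names and numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, list_size, category_size, population):
    """P(X >= k): chance that a random list of list_size genes, drawn from
    population genes of which category_size are in the category, contains at
    least k category genes (one-sided Fisher exact test)."""
    upper = min(list_size, category_size)
    hits = sum(
        comb(category_size, i) * comb(population - category_size, list_size - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, list_size)


def ease_score(k, list_size, category_size, population):
    """Approximate jackknife penalty: drop one category gene from the list,
    then recompute the upper-tail probability."""
    if k == 0:
        return 1.0
    return hypergeom_upper_tail(k - 1, list_size - 1, category_size, population)


if __name__ == "__main__":
    # Hypothetical numbers: 1000 genes on the array, 20 in a GO category,
    # 50 genes in the list of interest, 5 of them in the category.
    p = hypergeom_upper_tail(5, 50, 20, 1000)
    e = ease_score(5, 50, 20, 1000)
    print(f"Fisher one-sided p = {p:.4g}, EASE-style score = {e:.4g}")
```

The penalized score is always less significant than the raw Fisher p-value, which keeps a category supported by only one or two list genes from ranking at the top.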
Efforts to Integrate Diverse Biological Databases
with Expression Information: PubGene
www.pubgene.org
Quaternary Analysis: Profiles to Physio
[Figure: Data sources integrated for complex trait analysis: expression profiling, protein-protein interactions, biomedical literature relations, expression networks, HomoloGene, Gene Ontology, genetics, pharmacology]
Analysis Stages for Oligonucleotide Microarrays
• Normalization -- Equalizes overall signal across arrays to be compared; ensures linearity of response across abundance classes.
– Examples of methods: whole chip (26); quantile (27)
• Probe reduction -- Combines signals from multiple probes or probe pairs to define “expression level”; identifies genes with invalid or hypervariable expression levels.
– Examples of methods: weighted average (MAS 4) (29); Tukey bi-weight (MAS 5) (30); model-based (MBEI) (31); log scale linear additive (RMA) (32); position-dependent stacking energy modeling (PDNN) (33)
• Comparative -- Compares expression of a gene across two or more arrays to determine significant changes in expression.
– Examples of methods: t-test; rank order (MAS 5) (30); permutation (SAM) (46, 47); S-score (48)
• Multivariate studies -- Identifies significant correlations in expression data across experiments/conditions.
– Examples of methods: hierarchical clustering; k-means clustering; self-organizing maps; principal components analysis; & many more (34, 49)
• Biological overlay -- Identifies functions for given genes or clusters of genes; hypothesis generation.
– Examples of methods: multiple database access (SOURCE) (50); PubMed correlations (PubGene) (51); Gene Ontology rankings (GenMAPP, MAPPFinder, DAVID/EASE) (52, 53)
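Quantile normalization, listed under Normalization above, can be sketched in plain Python, assuming each array is a list of probe intensities in the same probe order. Every array's values are replaced by the across-array mean at the same rank, so all arrays end up with an identical intensity distribution. Ties are broken by position rather than averaged, a simplification relative to common implementations, and the function name is hypothetical.

```python
def quantile_normalize(arrays):
    """Quantile-normalize arrays (one list of probe intensities per array)."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean intensity at each rank, taken across all arrays.
    rank_means = [sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices by rank
        out = [0.0] * n
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        result.append(out)
    return result


if __name__ == "__main__":
    chip1 = [5.0, 2.0, 3.0, 4.0]  # hypothetical intensities
    chip2 = [4.0, 1.0, 6.0, 2.0]
    for normalized in quantile_normalize([chip1, chip2]):
        print(normalized)
    # prints:
    # [5.5, 1.5, 2.5, 4.0]
    # [4.0, 1.5, 5.5, 2.5]
```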
Bioinformatics Resources for Microarray Experiments
• SOURCE -- Human, rat, mouse gene compilation from multiple databases; allows batch submissions for annotation. http://source.stanford.edu/cgibin/sourceSearch
• Gene Lynx -- Human, mouse gene compilation; multiple database links regarding gene/protein structure and function. http://www.genelynx.org/
• DAVID/EASE -- Mines gene list for frequency of GO categories; annotation of gene list; statistical analysis of biological themes in gene list (EASE). http://apps1.niaid.nih.gov/David/upload.asp
• GenMAPP/MAPPFinder -- Superimposes array data on biological pathways; statistical ranking of functional groups. http://www.genmapp.org/
• FatiGO -- Mines gene list for occurrence of GO terms; statistical comparison of two lists for over-representation. http://fatigo.bioinfo.cnio.es/
• PubGene -- Finds associations between genes in biomedical literature; superimposes array data on literature links; commercial version available. http://www.pubgene.org/
• MEME -- Searches promoter regions of genes in a list/cluster for conserved motifs. http://meme.sdsc.edu/meme/website/intro.html