BBSIleccture7_04
Introduction to DNA
Microarrays
Michael F. Miles, M.D., Ph.D.
Depts. of Pharmacology/Toxicology and
Neurology and the Center for Study of
Biological Complexity
[email protected]
225-4054
Biological Regulation:
“You are what you express”
• Levels of regulation
• Methods of measurement
• Concept of genomics
Regulation of Gene
Expression
• Transcriptional
– Altered DNA binding protein complex abundance or function
• Post-transcriptional
– mRNA stability
– mRNA processing (alternative splicing)
• Translational
– RNA trafficking
– RNA binding proteins
• Post-translational
– Many forms!
Regulation of Gene Expression
• Genes are expressed when they are transcribed into
RNA
• Amount of mRNA indicates gene activity
• Some genes expressed in all tissues -- but are still regulated!
• Some genes expressed selectively depending on tissue, disease, environment
• Dynamic regulation of gene expression allows long
term responses to environment
[Diagram labels: Mesolimbic dopamine; ? Other; Acute Drug Use; Reinforcement; Intoxication; Altered Signaling; Gene Expression; Tolerance; Dependence; ?Synaptic Remodeling; Sensitization; Chronic Drug Use; ?Synaptic Remodeling; Persistent Gene Exp.; Compulsive Drug Use (“Addiction”)]
Progress in Studies on Gene
Regulation
Timeline (1960 to 2000), milestones in order:
• mRNA, tRNA discovered
• Nucleic acid hybridization, protein/RNA electrophoresis
• Molecular cloning; Southern, Northern & Western blots; 2-D gels
• Subtractive Hybridization, PCR, Differential Display, MALDI/TOF MS
• Genome Sequencing
• DNA/Protein Microarrays
Nucleic Acid Hybridization:
How It Works
Primer on Nucleic Acid
Hybridization
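• Hybridization rate depends on time, the concentration of nucleic acids, and the reassociation constant for the nucleic acid:
C/C0 = 1/(1 + k·C0·t)
where C is the concentration of nucleic acid still single-stranded at time t, C0 is the initial concentration, and k is the reassociation rate constant.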
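To make the reassociation equation concrete, here is a minimal Python sketch that evaluates C/C0 for a given C0, k and t. The function names and the numeric values in the usage note are illustrative assumptions, not part of the lecture.

```python
def fraction_single_stranded(c0, k, t):
    """Return C/C0, the fraction of nucleic acid still single-stranded.

    c0: initial concentration, k: reassociation rate constant, t: time.
    Units are assumed to be mutually consistent (e.g. mol/L, L/(mol*s), s).
    """
    if c0 < 0 or k < 0 or t < 0:
        raise ValueError("c0, k and t must be non-negative")
    return 1.0 / (1.0 + k * c0 * t)


def cot_half(k):
    """Return the C0*t value at which half of the strands have reassociated.

    Setting C/C0 = 1/2 in the equation gives k * C0 * t = 1.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return 1.0 / k
```

For example, with the made-up values c0 = 1.0, k = 2.0 and t = 0.5, the product k·C0·t equals 1, so half of the material remains single-stranded. A larger k (a less complex sequence population) shifts the curve toward lower C0·t.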
Biological Networks
Types of Biological
Networks
Gene Regulation Network
Examining Biological Networks:
Experimental Design
Examining Biological Networks
A Bit of History
~1992-1996: Oligo arrays developed by Fodor,
Stryer, Lockhart, others at Stanford/Affymetrix and
Southern in Great Britain
~1994-1995: cDNA arrays usually attributed to Pat
Brown and Dari Shalon at Stanford who first used a
robot to print the arrays. In 1994, Shalon started
Synteni which was bought by Incyte in 1998.
However, in 1982 Augenlicht and Kobrin proposed a
DNA array (Cancer Research) and in 1984 they
made a 4000 element array to interrogate human
cancer cells.
High Density DNA Microarrays
Expression Profiling: A Non-biased, Genomic
Approach to Understanding Complex CNS Disease
Candidate
Gene Studies
Molecular
Triangulation:
Genomics,
Genetics and
Pharmacology
Bioinformatics:
Genetical genomics
Functional Grouping
Literature Networks
Protein Interactions
Promoter Motif Grouping
Utility of Expression
Profiling
• Non-biased, genome-wide
• Hypothesis generating
• Gene hunting
• Pattern identification:
– Insight into gene function
– Molecular classification
– Phenotypic mechanisms
[Figure: Use of S-score in Hierarchical Clustering of Brain Regional Expression Patterns. Panels: AvgDiff, S-score. Color scale: relative change, -2 to +2]
Experimental Design with DNA
Microarrays
Sources of Variance in Microarray
Experiments
Type of Variance: Factors
• Biological
– Animal-animal differences (intra/inter cage, supplier)
– Genotype
– Circadian rhythms
– Stress
• Technical
– Sample treatment/harvesting (dissections, injections)
– Target preparation (enzyme lots, mRNA quality)
– Lot-to-lot chip variation
– Chip processing (scanning order)
• Environmental
– Temperature
– Handling
– Noise/odors
High Density DNA Microarrays
Synthesis and Analysis of 2-color
Spotted cDNA Arrays: “Brown
Chips”
Comparative Hybridization with
Spotted cDNA Microarrays
Synthesis of High Density Oligonucleotide
Arrays by Photolithography/Photochemistry
GeneChip Features
• Parallel analysis of >30K human,
rat or mouse genes/EST clusters
with 15-20 oligos (25 mer) per
gene/EST
• entire genome analysis (human,
yeast, mouse)
• 3-4 orders of magnitude dynamic
range (1-10,000 copies/cell)
• quantitative for changes >25% ??
• SNP analysis
Oligonucleotide Array Analysis
Workflow (from diagram):
• Total RNA (5’ ... AAAA), primed with Oligo(dT)-T7
• RTase/Pol II: dsDNA (AAAA-T7 / TTTT-T7)
• T7 pol with CTP-biotin: Biotin-cRNA (TTTT-5’)
• Hybridization
• Streptavidin-phycoerythrin
• Scanning (PM, MM)
Stepwise Analysis of
Microarray Data
• Low-level analysis -- image analysis,
expression quantitation
• Primary analysis -- is there a change in
expression?
• Secondary analysis -- what genes show
correlated patterns of expression?
(supervised vs. unsupervised)
• Tertiary analysis -- is there a phenotypic
“trace” for a given expression pattern?
Affymetrix Arrays: Image
Analysis
Affymetrix Arrays: Image Analysis
“.DAT” file
“.CEL” file
Affymetrix Arrays: PM-MM
Difference Calculation
Probe pairs control for non-specific hybridization of oligonucle
[Figure (a): Variability in Ln(FC). Axes: Ln(FC1), Ln(FC2)]
Probe Level Analysis
Methods
• AvgDiff -- Affymetrix 1996, trimmed mean with
exclusion of outliers, PM-MM
• MAS 5 -- Affymetrix 2001, modeled correction of MM,
Tukey’s bi-weight, PM-MM or PM-m
• MBEI -- Li and Wong 2001, modeled correction and
outlier detection, PM-MM or PM only
• RMA (Robust Multichip Analysis) -- Irizarry et al.
2002, PM only
• PDNN (Position Dependent Nearest Neighbor) -- Zhang et al. 2003, thermodynamic model for probe
interactions, PM only
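The simplest idea in this list, an AvgDiff-style trimmed mean of PM-MM differences, can be sketched in a few lines of Python. This is only an illustration of the concept: the symmetric trimming fraction used here is an assumption and is not Affymetrix's actual outlier rule, and the probe intensities in the usage note are invented.

```python
def avg_diff(pm, mm, trim=0.1):
    """Trimmed mean of PM - MM differences for one probe set.

    pm, mm: perfect-match and mismatch intensities, one value per probe pair.
    trim:   fraction of differences dropped from EACH end before averaging
            (an illustrative stand-in for outlier exclusion).
    """
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and of equal length")
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    diffs = sorted(p - m for p, m in zip(pm, mm))
    cut = int(len(diffs) * trim)          # number of values dropped per end
    kept = diffs[cut:len(diffs) - cut]    # non-empty because trim < 0.5
    return sum(kept) / len(kept)
```

With ten hypothetical probe pairs where eight differences equal 100 and two are outliers (800 and -40), `avg_diff(pm, mm, trim=0.1)` drops one value from each end and returns 100.0, whereas the untrimmed mean would be 156.0. The later methods in the list (MAS 5, MBEI, RMA, PDNN) replace this simple averaging with model-based or robust estimators.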
Slide Normalization: Pieces and
Pins
“Lowess” normalization,
Pin-specific Profiles
After Print-tip Normalization
http://www.ipam.ucla.edu/publications/fg2000/fgt_tspeed9.pdf
See also: Schuchhardt, J. et al., NAR 28: e47 (2000)
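As a rough illustration of pin-specific correction, the Python sketch below centers the log ratios of each print-tip group on zero by subtracting that group's median. This is a deliberately simpler stand-in for the lowess fit named on the slide: it removes a constant per-pin offset but not an intensity-dependent trend. The function name and the data layout are assumptions.

```python
from collections import defaultdict
from statistics import median


def center_by_pin(log_ratios, pins):
    """Subtract each print-tip group's median log ratio from its spots.

    log_ratios: one log2(red/green) value per spot.
    pins:       the print-tip (pin) identifier for each spot, same order.
    Returns a new list of centered log ratios in the original spot order.
    """
    if len(log_ratios) != len(pins):
        raise ValueError("log_ratios and pins must have equal length")
    groups = defaultdict(list)
    for m, pin in zip(log_ratios, pins):
        groups[pin].append(m)
    # One offset per pin: the median log ratio of the spots it printed.
    offsets = {pin: median(values) for pin, values in groups.items()}
    return [m - offsets[pin] for m, pin in zip(log_ratios, pins)]
```

After this step every pin group has a median log ratio of zero, which is the same goal a print-tip lowess pursues separately at each intensity level.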
Normalization Confounds:
Non-linearity
Normal vs. Normal
Normal vs. Tumor
Statistical Analysis of Microarrays:
“Not Your Father’s Oldsmobile”
Secondary Analysis:
Expression Patterns
• Supervised multivariate analyses
– Support vector machines
• Non-supervised clustering methods
– Hierarchical
– K-means
– SOM
Clustering Methods
• Distance measurement -- Euclidean most frequently
used (d^2 = Σ (x_i - y_i)^2)
• Clustering techniques
• Supervised multivariate analyses
– Support vector machines
• Non-supervised clustering methods
– Hierarchical -- single vs. complete vs. average linkage
– K-means -- have to estimate “k” initially
– SOM -- self-organizing maps
– Principal components analysis
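The distance measure and the three linkage rules mentioned above can be sketched in Python as follows. The function names are illustrative assumptions; a real analysis would normally rely on an established clustering package rather than this hand-written version.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two expression profiles of equal length."""
    if len(x) != len(y):
        raise ValueError("profiles must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def linkage_distance(cluster_a, cluster_b, method="average"):
    """Distance between two clusters of profiles under a linkage rule.

    single:   distance between the closest pair of members
    complete: distance between the farthest pair of members
    average:  mean distance over all cross-cluster pairs
    """
    pair_distances = [euclidean(a, b) for a in cluster_a for b in cluster_b]
    if not pair_distances:
        raise ValueError("clusters must be non-empty")
    if method == "single":
        return min(pair_distances)
    if method == "complete":
        return max(pair_distances)
    if method == "average":
        return sum(pair_distances) / len(pair_distances)
    raise ValueError("method must be 'single', 'complete' or 'average'")
```

Hierarchical clustering repeatedly merges the two clusters with the smallest linkage distance, so the choice of rule changes the shape of the resulting tree: single linkage tends to chain clusters together, while complete linkage favors compact groups.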
K-means vs. Hierarchical
Clustering
• K-means: select number of groups, divide
genes randomly into those groups, calculate
inter- and intra-group distances. Move genes between
groups until inter-group differences are maximized and intra-group differences are minimized.
• Hierarchical: calculate all pairwise distances
(correlations) and order genes accordingly.
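The K-means procedure described above can be sketched in Python roughly as follows. This uses the common centroid-based iteration (reassign each gene to the nearest group mean until nothing moves), which is one standard way to realize the description; the function names, the fixed random seed and the empty-group handling are assumptions made for the example.

```python
import random


def sq_dist(x, y):
    """Squared Euclidean distance between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans(profiles, k, seed=0, max_iter=100):
    """Cluster expression profiles into k groups.

    Returns (labels, centroids); labels[i] is the group of profiles[i].
    """
    n = len(profiles)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of profiles")
    rng = random.Random(seed)
    # Random initial assignment that leaves no group empty.
    labels = [i % k for i in range(n)]
    rng.shuffle(labels)
    centroids = []
    for _ in range(max_iter):
        centroids = []
        for g in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == g]
            if not members:
                # Assumption: re-seed an empty group with a random profile.
                members = [rng.choice(profiles)]
            centroids.append([sum(col) / len(members) for col in zip(*members)])
        # Move each gene to the group whose centroid is nearest.
        new_labels = [
            min(range(k), key=lambda g: sq_dist(p, centroids[g]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # no gene moved: converged
        labels = new_labels
    return labels, centroids
```

Because the starting assignment is random, different seeds can give different final groupings on real data, and the number of groups k has to be chosen before running, as the slide notes.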
[Figure: Use of S-score in Hierarchical Clustering of Brain Regional Expression Patterns. Panels: AvgDiff, S-score. Color scale: relative change, -2 to +2]
Expression
Profiling:
“It is possible that the expression profile could serve as a universal phenotype … Using a comprehensive database of reference profiles, the pathway(s) perturbed by an uncharacterized mutation would be ascertained by simply asking which expression patterns in the database its profile most strongly resembles … it should be equally effective at determining consequences of pharmaceutical treatments and disease states”
Hughes et al. Cell 102:109-126 (2000)
Use of Expression Profile “Compendium”
to Characterize Gene or Drug Function
Key features:
• established error model
• profiled large number of mutants/drugs under highly controlled conditions
• statistical treatment of expression patterns
• verified array results with biochemical/phenotypic assays
Hughes et al. Cell 102:109-126 (2000)
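The idea of asking which reference profile a new profile "most strongly resembles" can be sketched in Python as a correlation lookup. Using the Pearson correlation as the similarity measure is an assumption for this example, as are the function names and the compendium entries in the usage note; the published study also relies on an error model that is not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def best_match(query, compendium):
    """Return (name, correlation) of the reference profile most like query.

    compendium: dict mapping a reference name to its expression profile,
    measured over the same genes and in the same order as query.
    """
    scores = {name: pearson(query, profile)
              for name, profile in compendium.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]
```

For instance, a query of `[2.0, 4.1, 5.9, 8.0]` against hypothetical references `{"mutant_a": [1, 2, 3, 4], "mutant_b": [4, 3, 2, 1]}` returns `mutant_a` with a correlation close to 1, suggesting that the query perturbs the same pathway as that reference.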
Correlation in Expression Profiles
of Drugs/Genes Affecting Same
Pathways
Panels:
• Unrelated gene mutants
• cup5 and vma8, components of H+/ATPase complex
• HMG CoA reductase mutant vs. lovastatin, an inhibitor of HMG2
Red symbols = significant change (p<0.05) in both treatments
Hughes et al. Cell 102:109-126 (2000)
Assigning Function to Uncharacterized Genes
by Expression Profiles
Hughes et al. Cell 102:109-126 (2000)
Tertiary Analysis: Connecting
Function with Expression Patterns
• Annotation
– UniGene/Swiss-Prot, SOURCE, DAVID
• Biased functional assessment
– Manual, GenMAPP, GeneSpring
• Non-biased functional queries
– PubGene
– MAPPFinder, DAVID/Ease, GEPAS, GOTree
Machine, others
• Overlaying genomics and genetics
– WebQTL
Non-biased (semi)
Functional Group Analysis:
GenMAPP
Expression Analysis Systematic Explorer -- EASE
http://apps1.niaid.nih.gov/david/upload.jsp
Genome Biol. 2003;4(10):R70. Epub 2003 Sep 11.
EASE -- Options in Analysis
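Tools of this kind ask whether a functional category appears in a gene list more often than chance would predict. A minimal Python sketch of that question, using a plain one-sided hypergeometric test, is given below. It is not the EASE score itself, which is a more conservative variant; the function and parameter names are assumptions.

```python
from math import comb


def overrepresentation_p(list_hits, list_size, pop_hits, pop_size):
    """One-sided hypergeometric p-value for category over-representation.

    list_hits: genes in the gene list that belong to the category
    list_size: total genes in the gene list
    pop_hits:  genes on the whole array that belong to the category
    pop_size:  total genes on the array
    Returns P(X >= list_hits) when list_size genes are drawn at random.
    """
    if not (0 <= list_hits <= list_size <= pop_size and 0 <= pop_hits <= pop_size):
        raise ValueError("inconsistent counts")
    upper = min(list_size, pop_hits)
    total = comb(pop_size, list_size)
    tail = sum(
        comb(pop_hits, i) * comb(pop_size - pop_hits, list_size - i)
        for i in range(list_hits, upper + 1)
    )
    return tail / total
```

A small p-value means the category is enriched in the list. When many categories are tested at once, the p-values need correction for multiple testing before any single category is interpreted.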
Efforts to Integrate Diverse Biological Databases
with Expression Information: PubGene
www.PubGen.org
Quaternary Analysis: Profiles to Physio
[Diagram labels: Expression Profiling; Prot-Prot Interactions; BioMed Lit Relations; Expression Networks; Homolo-Gene; Ontology; Genetics; Pharmacology; Complex Trait]
Analysis Stages for Oligonucleotide Microarrays
Analysis Stage: Description; Examples of Methods
• Normalization
– Description: Equalizes overall signal across arrays to be compared, ensures linearity of response across abundance classes
– Examples of methods: Whole chip (26); Quantile (27)
• Probe reduction
– Description: Combines signals from multiple probes or probe pairs to define “expression level”. Identifies genes with invalid or hypervariable expression levels.
– Examples of methods: Weighted average (MAS 4) (29); Tukey bi-weight (MAS 5) (30); Model-based (MBEI) (31); Log scale linear additive (RMA) (32); Position-dependent stacking energy modeling (PDNN) (33)
• Comparative
– Description: Compares expression of a gene across two or more arrays to determine significant changes in expression
– Examples of methods: t-test; rank order (MAS 5) (30); permutation (SAM) (46, 47); S-score (48)
• Multivariate studies
– Description: Identifies significant correlations in expression data across experiments/conditions
– Examples of methods: hierarchical clustering; k-means clustering; self-organizing maps; principal components analysis; & many more (34, 49)
• Biological overlay
– Description: Identify functions for given genes, clusters of genes; hypothesis generation
– Examples of methods: Multiple database access (Source) (50); PubMed correlations (PubGene) (51); Gene Ontology rankings (GenMAPP, MAPPFinder, DAVID/EASE) (52, 53)
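Of the normalization methods named above, quantile normalization is easy to sketch in Python: every array is forced to share the same distribution of values, namely the mean of the sorted values across arrays. The function name and data layout are assumptions, and ties are broken by position here, which is a simplification of what production implementations do.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a set of arrays.

    arrays: list of equal-length lists, one list of intensities per array,
            with genes in the same order in every list.
    Returns new lists in which each value is replaced by the mean, across
    arrays, of the values holding the same rank.
    """
    if not arrays or not arrays[0]:
        raise ValueError("need at least one non-empty array")
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same length")
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the r-th smallest value of each array.
    rank_means = [sum(s[r] for s in sorted_arrays) / len(arrays) for r in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # gene indices by rank
        out = [0.0] * n
        for rank, gene_index in enumerate(order):
            out[gene_index] = rank_means[rank]
        normalized.append(out)
    return normalized
```

For two made-up arrays `[5.0, 2.0, 3.0]` and `[4.0, 1.0, 6.0]`, the rank means are 1.5, 3.5 and 5.5, so the result is `[[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]`: the rank order within each array is preserved while the distributions become identical.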
Bioinformatics Resources for Microarray Experiments
Name: Description; Link
• SOURCE
– Description: Human, rat, mouse gene compilation from multiple databases; allows batch submissions for annotation
– Link: http://source.stanford.edu/cgibin/sourceSearch
• GeneLynx
– Description: Human, mouse gene compilation; multiple database links regarding gene/protein structure and function
– Link: http://www.genelynx.org/
• DAVID/Ease
– Description: Mines gene list for frequency of GO categories; annotation of gene list; statistical analysis of biological themes in gene list (EASE)
– Link: http://apps1.niaid.nih.gov/David/upload.asp
• GenMAPP/MAPPFinder
– Description: Superimposes array data on biological pathways; statistical ranking of functional groups
– Link: http://www.genmapp.org/
• FatiGO
– Description: Mines gene list for occurrence of GO terms; statistical comparison of two lists for over-representation
– Link: http://fatigo.bioinfo.cnio.es/
• PubGene
– Description: Finds associations between genes in biomedical literature; superimposes array data on literature links; commercial version available
– Link: http://www.pubgene.org/
• MEME
– Description: Search promoter regions of genes in list/cluster for conserved motifs
– Link: http://meme.sdsc.edu/meme/website/intro.html