BBSIleccture7_04

Introduction to DNA Microarrays
Michael F. Miles, M.D., Ph.D.
Depts. of Pharmacology/Toxicology and
Neurology and the Center for Study of
Biological Complexity
225-4054
Biological Regulation:
“You are what you express”
• Levels of regulation
• Methods of measurement
• Concept of genomics
Regulation of Gene Expression
• Transcriptional
– Altered DNA binding protein complex abundance or function
• Post-transcriptional
– mRNA stability
– mRNA processing (alternative splicing)
• Translational
– RNA trafficking
– RNA binding proteins
• Post-translational
– Many forms!
Regulation of Gene Expression
• Genes are expressed when they are transcribed into RNA
• Amount of mRNA indicates gene activity
• Some genes are expressed in all tissues, but are still regulated!
• Some genes are expressed selectively depending on tissue, disease, environment
• Dynamic regulation of gene expression allows long-term responses to the environment
[Figure: from drug use to addiction. Acute Drug Use: Reinforcement (Mesolimbic dopamine; ? Other); Intoxication; Altered Signaling; Gene Expression; Tolerance; Dependence; ?Synaptic Remodeling; Sensitization. Chronic Drug Use: ?Synaptic Remodeling; Persistent Gene Exp.; Compulsive Drug Use (“Addiction”).]
Progress in Studies on Gene Regulation
[Timeline, 1960 to 2000, in chronological order]
• mRNA, tRNA discovered
• Nucleic acid hybridization, protein/RNA electrophoresis
• Molecular cloning; Southern, Northern & Western blots; 2-D gels
• Subtractive hybridization, PCR, Differential Display, MALDI/TOF MS
• Genome sequencing
• DNA/Protein microarrays
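Nucleic Acid Hybridization: How It Works
Primer on Nucleic Acid Hybridization
• Hybridization rate depends on time, the concentration of nucleic acids, and the reassociation rate constant for the nucleic acid:
$C/C_0 = \frac{1}{1 + k C_0 t}$
where C is the concentration of single-stranded nucleic acid remaining at time t, C_0 is the initial concentration, and k is the reassociation rate constant.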
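To make the second-order reassociation relation C/C0 = 1/(1 + k·C0·t) concrete, the short Python sketch below evaluates the fraction still single-stranded over a range of Cot values (C0·t). The rate constant and concentration are arbitrary illustrative assumptions, not values from the lecture. Setting C/C0 = 1/2 gives Cot½ = 1/k, so faster-reassociating sequences (larger k) have a smaller Cot½.

```python
def fraction_single_stranded(k: float, c0: float, t: float) -> float:
    """Fraction of nucleic acid still single-stranded: C/C0 = 1 / (1 + k*C0*t)."""
    return 1.0 / (1.0 + k * c0 * t)


def cot_half(k: float) -> float:
    """Cot value (C0*t) at which half of the strands have reassociated."""
    return 1.0 / k


if __name__ == "__main__":
    k = 0.5   # assumed reassociation rate constant (illustrative only)
    c0 = 1.0  # assumed initial nucleic acid concentration (illustrative only)
    for t in (0.1, 1.0, 2.0, 10.0, 100.0):
        print(f"Cot = {c0 * t:6.1f}   C/C0 = {fraction_single_stranded(k, c0, t):.3f}")
    print(f"Cot1/2 = {cot_half(k):.1f}")
```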
Biological Networks
Types of Biological Networks
Gene Regulation Network
Examining Biological Networks: Experimental Design
Examining Biological Networks
A Bit of History
~1992-1996: Oligo arrays developed by Fodor, Stryer, Lockhart, and others at Stanford/Affymetrix, and by Southern in Great Britain.
~1994-1995: cDNA arrays, usually attributed to Pat Brown and Dari Shalon at Stanford, who first used a robot to print the arrays. In 1994, Shalon started Synteni, which was bought by Incyte in 1998.
However, in 1982 Augenlicht and Kobrin proposed a DNA array (Cancer Research), and in 1984 they made a 4000-element array to interrogate human cancer cells.
High Density DNA Microarrays
Expression Profiling: A Non-biased, Genomic Approach to Understanding Complex CNS Disease
[Figure: Candidate Gene Studies; Molecular Triangulation: Genomics, Genetics and Pharmacology; Bioinformatics: Genetical genomics, Functional Grouping, Literature Networks, Protein Interactions, Promoter Motif Grouping]
Utility of Expression Profiling
• Non-biased, genome-wide
• Hypothesis generating
• Gene hunting
• Pattern identification:
– Insight into gene function
– Molecular classification
– Phenotypic mechanisms
Use of S-score in Hierarchical Clustering of Brain Regional Expression Patterns
[Figure panels: AvgDiff; S-score. Color scale: S-score from -2 to +2 (relative change)]
Experimental Design with DNA Microarrays
Sources of Variance in Microarray Experiments
Type of Variance | Factors
Biological | Animal-animal differences (intra/inter cage, supplier); Genotype; Circadian rhythms; Stress
Technical | Sample treatment/harvesting (dissections, injections); Target preparation (enzyme lots, mRNA quality); Lot-to-lot chip variation; Chip processing (scanning order)
Environmental | Temperature; Handling; Noise/odors
High Density DNA Microarrays
Synthesis and Analysis of 2-color Spotted cDNA Arrays: “Brown Chips”
Comparative Hybridization with Spotted cDNA Microarrays
Synthesis of High Density Oligonucleotide Arrays by Photolithography/Photochemistry
GeneChip Features
• Parallel analysis of >30K human, rat or mouse genes/EST clusters with 15-20 oligos (25-mer) per gene/EST
• entire genome analysis (human, yeast, mouse)
• 3-4 orders of magnitude dynamic range (1-10,000 copies/cell)
• quantitative for changes >25% ??
• SNP analysis
Oligonucleotide Array Analysis
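[Figure: target preparation. Total RNA (5′ … AAAA) is primed with oligo(dT)-T7 and converted to dsDNA (RTase/Pol II; AAAA-T7 / TTTT-T7); T7 polymerase with CTP-biotin produces biotin-labeled cRNA (TTTT-5′), which is hybridized to PM and MM probes, stained with streptavidin-phycoerythrin, and scanned.]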
Stepwise Analysis of Microarray Data
• Low-level analysis -- image analysis, expression quantitation
• Primary analysis -- is there a change in expression?
• Secondary analysis -- what genes show correlated patterns of expression? (supervised vs. unsupervised)
• Tertiary analysis -- is there a phenotypic “trace” for a given expression pattern?
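For the primary-analysis question (“is there a change in expression?”), a common first pass is a log2 fold change plus a two-sample t statistic per gene. The sketch below uses Welch's unequal-variance t on log2 signals; the choice of Welch's form, the log2 scale, and the function names are assumptions for illustration, and converting t to a p-value (and correcting for testing thousands of genes) is left out.

```python
import math
from statistics import mean, variance


def log2_fold_change(control, treated):
    """Mean log2 signal of treated minus mean log2 signal of control."""
    return mean(math.log2(x) for x in treated) - mean(math.log2(x) for x in control)


def welch_t(a, b):
    """Welch's t statistic (b minus a) for two samples with unequal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))  # sample variances (n - 1)
    return (mean(b) - mean(a)) / se


def primary_analysis(control, treated):
    """Return (log2 fold change, Welch t on log2 signals) for one gene."""
    log_c = [math.log2(x) for x in control]
    log_t = [math.log2(x) for x in treated]
    return log2_fold_change(control, treated), welch_t(log_c, log_t)
```

Working on the log scale makes a doubling and a halving symmetric (+1 and -1), which is why it is used here for both the fold change and the t statistic.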
Affymetrix Arrays: Image Analysis
[Figure: scanned image (“.DAT” file) and computed probe-cell intensities (“.CEL” file)]
Affymetrix Arrays: PM-MM Difference Calculation
Probe pairs control for non-specific hybridization of oligonucleotides
[Figure: variability in Ln(FC); panel (a): Ln(FC1), Ln(FC2)]
Probe Level Analysis Methods
• AvgDiff -- Affymetrix 1996, trimmed mean with exclusion of outliers, PM-MM
• MAS 5 -- Affymetrix 2001, modeled correction of MM, Tukey’s bi-weight, PM-MM or PM-m
• MBEI -- Li and Wong 2001, modeled correction and outlier detection, PM-MM or PM only
• RMA (Robust Multi-array Average) -- Irizarry et al. 2002, PM only
• PDNN (Position Dependent Nearest Neighbor) -- Zhang et al. 2003, thermodynamic model for probe interactions, PM only
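Two of the summarization ideas listed above can be sketched in a few lines of Python: a trimmed mean of PM-MM differences in the spirit of AvgDiff, and a one-step Tukey bi-weight of log2(PM-MM) in the spirit of MAS 5. This is a simplified sketch under stated assumptions: the trimming rule, the tuning constants (c = 5, epsilon = 0.0001, the values usually quoted for MAS 5), and flooring non-positive differences at a tiny positive value stand in for Affymetrix's actual outlier exclusion and ideal-mismatch correction.

```python
import math
from statistics import median


def avg_diff(pm, mm, trim=1):
    """Trimmed mean of PM-MM differences (simplified AvgDiff-style summary)."""
    diffs = sorted(p - m for p, m in zip(pm, mm))
    if len(diffs) > 2 * trim:
        diffs = diffs[trim:len(diffs) - trim]  # drop the most extreme probe pairs
    return sum(diffs) / len(diffs)


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey bi-weight estimate of location; outlying values get weight 0."""
    m = median(values)
    mad = median(abs(v - m) for v in values)  # median absolute deviation
    weights = []
    for v in values:
        u = (v - m) / (c * mad + epsilon)
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def biweight_signal(pm, mm, floor=2.0 ** -20):
    """MAS 5-like signal: antilog of the bi-weight of log2(PM - MM).

    Non-positive differences are floored at a tiny positive value, a crude
    stand-in (assumption) for the ideal-mismatch correction used by MAS 5.
    """
    logs = [math.log2(max(p - m, floor)) for p, m in zip(pm, mm)]
    return 2.0 ** tukey_biweight(logs)
```

The bi-weight gives full weight to probes near the median and zero weight to probes far from it, so a single cross-hybridizing probe pair cannot dominate the probe-set value the way it can in a plain mean.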
Slide Normalization: Pieces and Pins
[Figure panels: “Lowess” normalization, pin-specific profiles; after print-tip normalization]
http://www.ipam.ucla.edu/publications/fg2000/fgt_tspeed9.pdf
See also: Schuchhardt, J. et al., NAR 28: e47 (2000)
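The intensity-dependent (“lowess”) and print-tip normalizations named above can be sketched as follows for a two-color spotted array. Assumptions, not taken from the slides: M = log2(R/G) and A = 0.5·log2(R·G) are used as the log-ratio and average log-intensity, the span fraction of 0.4 is arbitrary, the local fit is a plain tricube-weighted linear regression without the robustness iterations of full lowess, and background subtraction is ignored.

```python
import math


def lowess_fit(x, y, frac=0.4):
    """Tricube-weighted local linear fit of y on x, evaluated at each x (no robustness steps)."""
    n = len(x)
    r = min(n, max(2, math.ceil(frac * n)))  # neighbours used in each local fit
    fitted = []
    for x0 in x:
        h = sorted(abs(xj - x0) for xj in x)[r - 1] or 1e-12  # local bandwidth
        w = [(1 - min(abs(xj - x0) / h, 1.0) ** 3) ** 3 for xj in x]
        sw = sum(w)
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def lowess_normalize(red, green, frac=0.4):
    """Return normalized log-ratios M - lowess(M ~ A) for one group of spots."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    trend = lowess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]


def print_tip_normalize(red, green, tips, frac=0.4):
    """Apply lowess normalization separately within each print-tip (pin) group."""
    out = [0.0] * len(red)
    for tip in set(tips):
        idx = [i for i, t in enumerate(tips) if t == tip]
        norm = lowess_normalize([red[i] for i in idx], [green[i] for i in idx], frac)
        for i, v in zip(idx, norm):
            out[i] = v
    return out
```

Fitting within each pin group lets the correction absorb pin-specific intensity trends, at the cost of fewer spots per fit; with few spots per pin a larger span is usually safer.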
Normalization Confounds: Non-linearity
[Figure panels: Normal vs. Normal; Normal vs. Tumor]
Statistical Analysis of Microarrays: “Not Your Father’s Oldsmobile”
Secondary Analysis: Expression Patterns
• Supervised multivariate analyses
– Support vector machines
• Non-supervised clustering methods
– Hierarchical
– K-means
– SOM
Clustering Methods
• Distance measurement -- Euclidean most frequently used ($d^2 = \sum_i (x_i - y_i)^2$)
• Clustering techniques
• Supervised multivariate analyses
– Support vector machines
• Non-supervised clustering methods
– Hierarchical -- single vs. complete vs. average linkage
– K-means -- have to estimate “k” initially
– SOM -- self-organizing maps
– Principal components analysis
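The distance measure and linkage choices listed above can be sketched in plain Python. Besides the Euclidean distance, the sketch includes 1 - Pearson correlation (an assumption here, added because hierarchical clustering of genes is often run on correlations) and a naive agglomerative clustering supporting single, complete, and average linkage. It is deliberately simple and far too slow for genome-scale data; the function names are hypothetical.

```python
import math


def euclidean(x, y):
    """d = sqrt(sum_i (x_i - y_i)^2)."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def correlation_distance(x, y):
    """1 - Pearson correlation: groups genes by shape of profile, not magnitude.

    Undefined (division by zero) for a profile with zero variance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / math.sqrt(sxx * syy)


def agglomerative(profiles, dist=euclidean, linkage="average"):
    """Naive agglomerative clustering.

    Returns the merge history as (cluster_a, cluster_b, distance); original
    genes are clusters 0..n-1 and each merge creates the next new cluster id.
    """
    n = len(profiles)
    clusters = {i: [i] for i in range(n)}
    d = {(i, j): dist(profiles[i], profiles[j]) for i in range(n) for j in range(i + 1, n)}
    combine = {"single": min, "complete": max,
               "average": lambda ds: sum(ds) / len(ds)}[linkage]
    merges = []
    next_id = n
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                # linkage distance computed from all member-to-member distances
                pair_d = combine([d[min(p, q), max(p, q)]
                                  for p in clusters[a] for q in clusters[b]])
                if best is None or pair_d < best[0]:
                    best = (pair_d, a, b)
        dab, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, dab))
        next_id += 1
    return merges
```

Single linkage tends to chain genes into long, loose clusters, complete linkage favors compact ones, and average linkage sits between them.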
K-means vs. Hierarchical Clustering
• K-means: select the number of groups, divide genes randomly into those groups, and calculate inter- and intra-group distances. Move genes between groups until inter-group differences are maximized and intra-group differences are minimized.
• Hierarchical: calculate all pairwise distances (correlations) and order genes accordingly.
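The K-means procedure described above corresponds closely to Lloyd's algorithm, sketched below: start from a random division of genes into k groups, compute each group's centroid, move every gene to its nearest centroid, and repeat until no gene moves. Minimizing the within-group sum of squares is equivalent to maximizing the between-group sum of squares, because their total is fixed for a given data set. The fixed random seed, the iteration cap, and re-seeding an empty group with a random gene are illustrative assumptions.

```python
import random


def squared_distance(x, y):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans(profiles, k, max_iter=100, seed=0):
    """Lloyd's k-means. Returns (labels, centroids)."""
    rng = random.Random(seed)
    dim = len(profiles[0])
    # Start by dividing genes randomly into k groups.
    labels = [rng.randrange(k) for _ in profiles]
    centroids = []
    for _ in range(max_iter):
        # Mean profile of each group; an empty group is re-seeded with a random gene.
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c] or [rng.choice(profiles)]
            centroids.append([sum(m[j] for m in members) / len(members) for j in range(dim)])
        # Move each gene to the group whose centroid is nearest.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in profiles]
        if new_labels == labels:  # no gene moved: converged
            break
        labels = new_labels
    return labels, centroids
```

Because the result depends on the starting partition, k-means is usually rerun from several random starts and the solution with the smallest within-group sum of squares kept.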
Expression Profiling:
“It is possible that the expression profile could serve as a universal phenotype … Using a comprehensive database of reference profiles, the pathway(s) perturbed by an uncharacterized mutation would be ascertained by simply asking which expression patterns in the database its profile most strongly resembles … it should be equally effective at determining consequences of pharmaceutical treatments and disease states”
Hughes et al. Cell 102:109-126 (2000)
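The compendium idea in the quotation reduces, at its simplest, to ranking stored reference profiles by similarity to a query profile. The sketch below uses Pearson correlation of log ratios over a shared, identically ordered gene set; it omits the error model and significance filtering that Hughes et al. emphasize, and the profile names and numbers in the example are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two profiles measured on the same genes, same order."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_compendium(query, compendium):
    """Return (name, r) pairs, most similar reference profile first."""
    scores = [(name, pearson(query, profile)) for name, profile in compendium.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy log10 expression ratios for five genes; names and values are hypothetical.
    compendium = {
        "mutant_A": [0.8, -0.2, 0.1, 0.9, -0.5],
        "drug_B": [-0.6, 0.4, 0.0, -0.7, 0.3],
        "mutant_C": [0.1, 0.1, -0.1, 0.0, 0.2],
    }
    query = [0.7, -0.3, 0.2, 0.8, -0.4]
    for name, r in rank_compendium(query, compendium):
        print(f"{name}: r = {r:+.2f}")
```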
Use of Expression Profile “Compendium” to Characterize Gene or Drug Function
Key features:
• established error model
• profiled large number of mutants/drugs under highly controlled conditions
• statistical treatment of expression patterns
• verified array results with biochemical/phenotypic assays
Hughes et al. Cell 102:109-126 (2000)
Correlation in Expression Profiles of Drugs/Genes Affecting Same Pathways
[Figure panels: cup5 and vma8, components of the H+/ATPase complex; unrelated gene mutants; HMG-CoA reductase mutant vs. lovastatin, an inhibitor of HMG2. Red symbols = significant change (p<0.05) in both treatments.]
Hughes et al. Cell 102:109-126 (2000)
Assigning Function to Uncharacterized Genes by Expression Profiles
Hughes et al. Cell 102:109-126 (2000)
Tertiary Analysis: Connecting Function with Expression Patterns
• Annotation
– UniGene/Swiss-Prot, SOURCE, DAVID
• Biased functional assessment
– Manual, GenMAPP, GeneSpring
• Non-biased functional queries
– PubGene
– MAPPFinder, DAVID/EASE, GEPAS, GOTree Machine, others
• Overlaying genomics and genetics
– WebQTL
Non-biased (semi) Functional Group Analysis: GenMAPP
Expression Analysis Systematic Explorer (EASE)
http://apps1.niaid.nih.gov/david/upload.jsp
Genome Biol. 2003;4(10):R70. Epub 2003 Sep 11.
EASE -- Options in Analysis
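The statistic behind EASE can be sketched with Python's standard library. Over-representation of a functional category in a gene list is commonly scored with a one-sided Fisher exact test on a 2x2 table (gene list vs. background, in category vs. not); the EASE score described by Hosack et al. is a more conservative variant that removes one gene from the list's in-category count before computing the same probability. The table layout, the toy counts, and the function names are assumptions for illustration, not DAVID's code.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher exact p-value for over-representation in the 2x2 table

                     in category   not in category
        gene list         a               b
        background        c               d

    i.e. the hypergeometric probability of seeing >= a category genes in the list.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    hits = sum(comb(col1, x) * comb(total - col1, row1 - x) for x in range(a, upper + 1))
    return hits / comb(total, row1)


def ease_score(a, b, c, d):
    """EASE score: the same test after removing one gene from the list's category count."""
    return fisher_greater(max(a - 1, 0), b, c, d)


if __name__ == "__main__":
    # Toy counts: 3 of 300 listed genes fall in a category that has 40 members
    # among 30,000 background genes.
    print("Fisher p =", fisher_greater(3, 297, 40, 29960))
    print("EASE     =", ease_score(3, 297, 40, 29960))
```

The penalty matters most for categories supported by only one or two genes, which the plain Fisher test can rank as highly significant.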
Efforts to Integrate Diverse Biological Databases with Expression Information: PubGene
www.PubGen.org
Quaternary Analysis: Profiles to Physio
[Figure: Expression Profiling; Prot-Prot Interactions; BioMed Lit Relations; Expression Networks; Homolo-Gene Ontology; Genetics; Pharmacology; Complex Trait]
Analysis Stages for Oligonucleotide Microarrays
Analysis Stage | Description | Examples of Methods
Normalization | Equalizes overall signal across arrays to be compared; ensures linearity of response across abundance classes | Whole chip (26); Quantile (27)
Probe reduction | Combines signals from multiple probes or probe pairs to define expression level; identifies genes with invalid or hypervariable expression levels | Weighted average (MAS 4) (29); Tukey bi-weight (MAS 5) (30); Model-based (MBEI) (31); Log scale linear additive (RMA) (32); Position-dependent stacking energy modeling (PDNN) (33)
Comparative studies | Compares expression of a gene across two or more arrays to determine significant changes in expression | t-test; rank order (MAS 5) (30); permutation (SAM) (46, 47); S-score (48)
Multivariate studies | Identifies significant correlations in expression data across experiments/conditions | hierarchical clustering; k-means clustering; self-organizing maps; principal components analysis; & many more (34, 49)
Biological overlay | Identifies functions for given genes and clusters of genes; hypothesis generation | Multiple database access (SOURCE) (50); PubMed correlations (PubGene) (51); Gene Ontology rankings (GenMAPP, MAPPFinder, DAVID/EASE) (52, 53)
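Quantile normalization, listed in the table as one normalization option, forces every array to share the same intensity distribution: each value is replaced by the mean, across arrays, of the values holding the same rank. A minimal pure-Python sketch follows; it assumes every array reports the same set of probes in the same order and breaks ties by original order, which a production implementation may handle differently.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length intensity lists (one list per array/chip)."""
    n = len(arrays[0])
    # For each array, probe indices ordered from lowest to highest intensity
    # (sorted() is stable, so ties keep their input order).
    orders = [sorted(range(n), key=lambda i, arr=arr: arr[i]) for arr in arrays]
    # Reference distribution: mean of the r-th smallest value across all arrays.
    reference = [sum(arr[order[r]] for arr, order in zip(arrays, orders)) / len(arrays)
                 for r in range(n)]
    normalized = []
    for order in orders:
        out = [0.0] * n
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]  # each probe gets the reference value for its rank
        normalized.append(out)
    return normalized
```

Each array keeps its own ranking of probes; only the values attached to those ranks are replaced.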
Bioinformatics Resources for Microarray Experiments
Name | Description | Link
SOURCE | Human, rat, mouse gene compilation from multiple databases; allows batch submissions for annotation | http://source.stanford.edu/cgibin/sourceSearch
GeneLynx | Human, mouse gene compilation; multiple database links regarding gene/protein structure and function | http://www.genelynx.org/
DAVID/EASE | Mines gene list for frequency of GO categories; annotation of gene list; statistical analysis of biological themes in gene list (EASE) | http://apps1.niaid.nih.gov/David/upload.asp
GenMAPP/MAPPFinder | Superimposes array data on biological pathways; statistical ranking of functional groups | http://www.genmapp.org/
FatiGO | Mines gene list for occurrence of GO terms; statistical comparison of two lists for over-representation | http://fatigo.bioinfo.cnio.es/
PubGene | Finds associations between genes in biomedical literature; superimposes array data on literature links; commercial version available | http://www.pubgene.org/
MEME | Searches promoter regions of genes in list/cluster for conserved motifs | http://meme.sdsc.edu/meme/website/intro.html