GST II: ---Title---
Gene Expression
Data Analyses (1)
Trupti Joshi
Computer Science Department
317 Engineering Building North
E-mail: [email protected]
573-884-3528(O)
Lecture Schedule for Gene
Expression Analyses
Concept of microarray and experimental
design for DNA microarray (9/6/05)
Data transformation and normalization for
DNA microarray (9/8/05)
Statistical analysis for DNA microarray
and Software comparison (9/13/05)
Clustering Techniques for DNA
microarray (Dr. Dong Xu 9/15/05)
Lecture Outline
Central Dogma of Molecular Biology
Introduction to Gene Expression and
Microarray
Experimental Design
Central Dogma of Molecular
Biology
Gene Expression
mRNA level
Protein level
Introduction: Gene Expression
All cells carry the same DNA, but only a few percent of genes are expressed in common across cell types (house-keeping genes).
A few examples:
(1) Specialized cell: over-represented hemoglobin in blood cells.
(2) Different stages of life cycle: hemoglobins before and after
birth, caterpillar and butterfly.
(3) Different environments: microbes in nutrient-poor or nutrient-rich
environments.
(4) Diversity of life.
Microarray is about gene
expression.
All information about a living being is encoded in DNA
as a set of genes.
Each gene contains structural information about the
protein sequence and regulatory information about
protein expression.
The intermediate step between gene and protein is
mRNA.
The concentration of mRNA is what a microarray
measures.
Problem
RNA levels and protein levels are not always directly
correlated.
Without mRNA there is no protein, but the relation between
the two is neither simple nor universal.
Functional genomics fills the gap between gene
expression and organism function.
The meaning of life is hidden in gene expression
values, but it is not easy to extract.
Eucaryote Gene Expression Control
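[Figure: control points in eukaryotic gene expression. In the nucleus, DNA is transcribed into a primary RNA transcript (transcriptional control), which is processed into mRNA (RNA processing control). The mRNA is transported across the nuclear membrane into the cytosol (RNA transport control), where it is either degraded to inactive mRNA (mRNA degradation control) or translated into protein (translation control). Protein activity control switches between active and inactive protein. Microarrays measure mRNA; mass spectrometry measures protein.]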
Principle of DNA Microarray
Complementary hybridization
is the basis of RNA measurement.
Base-pairing rules
DNA:
A-T and G-C
RNA:
A-U, G-C, G-U
A--T
G--C
T--A
C--G
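As a small illustration of the DNA base-pairing rules above, the sketch below computes the strand that would hybridize perfectly to a given probe. The probe sequence and the function names are hypothetical, chosen only for this example; real hybridization also tolerates partial mismatches, which this sketch ignores.

```python
# Watson-Crick pairing for DNA as listed on the slide: A-T and G-C.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the perfectly complementary strand, read in the opposite direction."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(strand.upper()))


def can_hybridize(probe: str, target: str) -> bool:
    """True only when target is the exact reverse complement of probe."""
    return reverse_complement(probe) == target.upper()


probe = "ATGC"  # hypothetical probe sequence
print(reverse_complement(probe))      # GCAT
print(can_hybridize(probe, "GCAT"))   # True
```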
Microarray Technology
Macroarray: sample spot sizes >= 300
microns
Microarray: typically < 200 microns
Also known as: biochip, DNA chip, DNA microarray, gene array,
genome array, gene chip
Initial Ideas of DNA Microarray
Immunoassay
Ekins, R. and F. W. Chu. Microarrays: their origins and applications.
Trends in Biotech. 17: 217-218
Application of DNA Microarray
Technology
Gene discovery
Biological mechanisms (gene regulatory network, etc.)
Disease diagnosis (cancer, infectious disease, etc.)
Drug discovery: Pharmacogenomics
Toxicological research: Toxicogenomics
Microbial diversity in the environment
…
Increasing Microarray
Applications
Advantages and
Disadvantages of Microarray
Advantages:
High-throughput
Analyze gene expression in different cells, or in
cells under different conditions,
simultaneously
Disadvantages:
High noise
Relatively high cost
Categories of DNA Microarray
Probe based
cDNA microarray: cDNA (500~5,000 bases) as probe. 10,000-20,000 spots/slide.
Oligo microarray (Affymetrix Microarray): oligonucleotide (20~80mer oligos) as probe. 200,000-500,000 spots/slide.
Dye based
Double label. For example, Cy3 and Cy5.
One sample is labeled with a “green” dye and the other with “red”.
Relative fluorescent intensity of red and green from the same spot.
Single label.
All samples are labeled with one color.
Absolute fluorescent intensity between different slides.
Does not control for the amount of DNA in each spot.
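For the double-label case, the relative intensity of the two dyes at a spot is usually summarized as a log2 ratio, the same quantity used later in the design slides. The sketch below is a minimal illustration; the spot names and intensity values are made up for the example.

```python
import math


def log2_ratio(red_intensity: float, green_intensity: float) -> float:
    """Relative expression at one spot as log2(red / green)."""
    return math.log2(red_intensity / green_intensity)


# Hypothetical (red, green) intensities for three spots on one slide.
spots = {
    "gene_a": (2000.0, 500.0),   # brighter in red
    "gene_b": (300.0, 1200.0),   # brighter in green
    "gene_c": (800.0, 800.0),    # equal in both channels
}

for name, (red, green) in spots.items():
    print(name, log2_ratio(red, green))
```

A positive value means the red-labeled sample is more abundant at that spot, a negative value means the green-labeled sample is, and zero means no difference.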
Chips
Typically a glass slide with cDNA or oligo probes.
Printed by robot or synthesized by photolithography.
Typical arrays are 25x75 mm and contain up to 500,000 probed gene fragments.
Probe Layout on Chips
Positive control
Genome DNA
House keeping genes
Negative control
Spots with cDNA from very different species
Blank spots
Spots with buffer
Samples
Technical replicates
Microarray Procedures
Experimental Design
RNA extraction
cDNA preparation
cDNA labeling
Sample mixing
Hybridization
Scanning
Image Analysis
Data transformation and Normalization
Statistical analysis
Data interpretation
Molecular Interaction on
microarray
1 molecule per square angstroms
Large molecules easily fold on
themselves
Short targets interact with tethered
oligos better than long targets
Ideally, target and probe should have
the same length
Molecular interactions are dynamic
Competitive hybridization
Experimental Protocol
A. Synthesis of cDNA
Synthesis of the second strand DNA
B. Labeling
C. Hybridization
D. Scanning
Rationale for Experimental
Design
Scientific constraints:
Scientific aims and their priorities
Physical constraints:
Number of slides
Amount of mRNA
Goal of an optimal design:
Minimize costs in money and time
Maximize the useful information
Issues for Experimental
Design
Scientific
Specific questions and their priorities.
Practical (logistic)
Types of mRNA samples: reference, control, treatment.
Amount of material available (mRNA, slides, dyes).
Other factors
The experimental process before hybridization: sample
isolation, mRNA extraction, amplification, and labeling.
Controls planned: positive, negative, ratio, and so on.
Verification method: northern blot, reverse transcriptase
(RT)-PCR, in situ hybridization, and so on.
Variability and Replicates
Gene expression level for one gene in different
slides may not be the same
Replicates:
Technical replicates: the target mRNA is from the same
pool (RNA extraction)
Reduce variability
Biological replicates: the target mRNA is from separate,
independent extractions.
Obtain averages of independent data
Validate generalizations of conclusions
Variation within technical replicates is smaller
than that within biological replicates
Importance of Replicates
Graphical Representation
of Design
Cy3: green
Cy5: red
Cy3+Cy5: blue
Use directed graphs
Node: sample
Edge: hybridization, drawn as an arrow from the Cy3-labeled sample to the Cy5-labeled sample
Weight: replicates
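One way to hold such a design graph in code is a counter keyed by directed edges. This is only a sketch: the sample names and slide list are hypothetical, and it assumes each edge runs from the Cy3-labeled sample to the Cy5-labeled sample.

```python
from collections import Counter


def build_design_graph(hybridizations):
    """Map each directed edge (cy3_sample, cy5_sample) to its replicate count."""
    return Counter(hybridizations)


# Hypothetical design: T versus C on three slides, one of them dye-swapped.
slides = [("C", "T"), ("C", "T"), ("T", "C")]

graph = build_design_graph(slides)
nodes = sorted({sample for edge in graph for sample in edge})

print(nodes)  # ['C', 'T']
for (cy3_sample, cy5_sample), weight in graph.items():
    print(f"{cy3_sample} -> {cy5_sample}: {weight} slide(s)")
```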
Direct & Indirect Comparison
Objects compared: T and C
Direct design: T and C are on the same slide
Indirect design: T and R are on one slide and C and R are on another,
so T and C are on different slides
Variance & Std Deviation
Variance
The most common statistical measure of variability of a
random quantity or random sample about its mean. Its scale is the
square of the scale of the random quantity or sample.
Standard Deviation
Standard deviation is the square root of the variance. It
measures the spread of a set of observations. The larger the
standard deviation is, the more spread out the observations are.
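Both definitions can be checked with Python's standard `statistics` module. The values below are made-up log ratios used only for illustration; the sample (n - 1) form of the variance is assumed here.

```python
import statistics

# Hypothetical log ratios for one gene measured on five slides.
log_ratios = [1.0, 2.0, 3.0, 4.0, 5.0]

variance = statistics.variance(log_ratios)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(log_ratios)      # square root of the sample variance

# Doubling every observation quadruples the variance but only doubles the
# standard deviation, because variance lives on the squared scale.
scaled = [2.0 * x for x in log_ratios]
scaled_variance = statistics.variance(scaled)
scaled_std_dev = statistics.stdev(scaled)

print(variance, std_dev)
print(scaled_variance, scaled_std_dev)
```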
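Variance for Indirect Design
For samples T and C, α and β are the means of the log intensities log2 T and log2 C across slides for a typical gene.
Differential Expression
Direct design
$$\hat{D} = \tfrac{1}{2}\left(\log_2(T/C) + \log_2(T'/C')\right), \qquad \operatorname{var}(\hat{D}) = \sigma^2/2$$
Indirect design
$$\hat{D} = \log_2(T/R) - \log_2(C/R'), \qquad \operatorname{var}(\hat{D}) = 2\sigma^2$$
Here σ² is the variance of a single log ratio measured on one slide.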
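The variance comparison between the direct design (σ²/2) and the indirect design (2σ²) can be checked with a small simulation using two slides per design. This is a sketch under stated assumptions: each slide's log ratio is its true value plus independent Gaussian noise with standard deviation `sigma`, and the values of `sigma`, `true_diff` and `ref_offset` are arbitrary choices for illustration.

```python
import random
import statistics


def simulate_design_variances(sigma=0.5, true_diff=1.0, ref_offset=0.3,
                              n_trials=20000, seed=0):
    """Return (var_direct, var_indirect) of the estimated log2(T/C)."""
    rng = random.Random(seed)
    direct, indirect = [], []
    for _ in range(n_trials):
        # Direct design: two slides, each measuring log2(T/C); average them.
        m1 = true_diff + rng.gauss(0.0, sigma)
        m2 = true_diff + rng.gauss(0.0, sigma)
        direct.append(0.5 * (m1 + m2))
        # Indirect design: one slide measures log2(T/R), another log2(C/R').
        t_vs_r = true_diff + ref_offset + rng.gauss(0.0, sigma)
        c_vs_r = ref_offset + rng.gauss(0.0, sigma)
        indirect.append(t_vs_r - c_vs_r)
    return statistics.pvariance(direct), statistics.pvariance(indirect)


var_direct, var_indirect = simulate_design_variances()
print(var_direct, var_indirect)  # close to sigma**2 / 2 and 2 * sigma**2
```

With the same number of slides, the indirect estimate comes out roughly four times as variable as the direct one, which is the cost of routing the comparison through a reference sample.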
Dye-swapped Replication
Two sets of replications
Dye-swapped replications
Two hybridizations of the same two mRNA samples are done on
two slides, with the dyes swapped. For example, Cy3 for A
and Cy5 for B in the first hybridization (slide 1), then Cy5 for
A and Cy3 for B in the second hybridization (slide 2).
Advantage: reduce systematic bias (e.g. dye bias)
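The sketch below shows why a dye swap reduces dye bias. It assumes, for illustration only, that the dye bias is a constant additive offset on the log2(Cy5/Cy3) scale and that measurements are noise-free; the numeric values are made up.

```python
def dye_swap_estimate(m_slide1: float, m_slide2: float) -> float:
    """Estimate log2(A/B) from a dye-swap pair.

    Both inputs are log2(Cy5/Cy3) readings.
    Slide 1: A labeled with Cy3, B with Cy5, so it reads log2(B/A) + bias.
    Slide 2: A labeled with Cy5, B with Cy3, so it reads log2(A/B) + bias.
    Half the difference cancels the shared additive bias.
    """
    return (m_slide2 - m_slide1) / 2.0


true_log_ratio = 1.0  # hypothetical true log2(A/B)
dye_bias = 0.3        # hypothetical constant bias favoring Cy5

m_slide1 = -true_log_ratio + dye_bias
m_slide2 = true_log_ratio + dye_bias

print(dye_swap_estimate(m_slide1, m_slide2))  # approximately 1.0, bias removed
```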
Reference Design
A direct design may not be feasible when there are
more than 3 experimental conditions.
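A rough count of slides illustrates the point. The sketch assumes one slide per comparison, a direct design that hybridizes every pair of conditions, and a reference design that hybridizes each condition once against a common reference; replicates and dye swaps are ignored.

```python
from math import comb


def slides_all_pairs_direct(n_conditions: int) -> int:
    """One slide for every pair of conditions."""
    return comb(n_conditions, 2)


def slides_reference(n_conditions: int) -> int:
    """One slide per condition, each against the common reference."""
    return n_conditions


for n in (3, 4, 6, 10):
    print(n, slides_all_pairs_direct(n), slides_reference(n))
```

The two counts are equal at 3 conditions; beyond that the all-pairs direct design grows quadratically while the reference design grows linearly.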
Factors in the design
Single factor
Two factors
Multiple factors
Single Factor Experiments
Time-course Experiments
2x2 factorial experiments
Lecture Outline
Central Dogma of Molecular Biology
Introduction to Microarray
Application
Advantage vs. Disadvantage
Chips
Microarray procedure
Experimental design
Rationale
Variability and Replication
Graphical representation
Direct comparison and Indirect comparison
Dye swap
Reference design
Single-factor design
Multifactorial design
Reading Assignments
Suggested reading:
Yang, YH and T. Speed. 2002. Design issues for
cDNA microarray experiments. Nature Reviews Genetics, 3:
579-588.
Statistical analysis of gene expression microarray
data. Chapter 2. pp. 35-92. Chapman&Hall/CRC
Press, 2003.