Review of Gene Expression Analysis


Analysis of High-throughput
Gene Expression Profiling
Why Measure Gene Expression
1. Determines which genes are induced/repressed in
response to a developmental phase or to an
environmental change.
2. Sets of genes whose expression rises and falls
under the same condition are likely to have a
related function.
3. Features such as a common regulatory motif can be
detected within co-expressed genes.
4. A pattern of gene expression may be used as an
indicator of abnormal cellular regulation.
• A useful tool for cancer diagnosis
Why Measure Gene Expression on a
Large Scale?
Traditional vs. High-throughput Approaches
Techniques Used to Detect Gene
Expression Level
• Microarray (single or dual channel)
High-throughput
• SAGE
• EST/cDNA library
• Northern Blots
• Subtractive hybridisation
• Differential hybridisation
• Representational difference analysis (RDA)
• DNA/RNA Fingerprinting (RAP-PCR)
• Differential Display (DD-PCR)
• aCGH: array CGH (DNA level)
Basic Information of Microarray,
SAGE and cDNA Library
(DNA) Microarray
1. Developed around 1987.
2. Employs methods previously exploited in the immunoassay
context – specific binding and marking techniques.
3. Two types of probes:
Format I: probe cDNA (500~5,000 bases long) is
immobilized on a solid surface such as glass; widely
considered to have been developed at Stanford University.
Traditionally called DNA microarrays.
Format II: an array of oligonucleotide (20~80-mer oligos)
probes is synthesized either in situ (on-chip) or by
conventional synthesis followed by on-chip immobilization;
developed at Affymetrix, Inc. Many companies are
manufacturing oligonucleotide-based chips using alternative
in situ synthesis or deposition technologies.
Historically called DNA chips.
Microarray
• Single Channel: sub-type classification
• Dual Channel: differential expression
gene screening
• Tissue microarray
• Protein microarray
• ……
Array CGH
• Detects DNA copy-number variation using a
microarray approach
• A hot topic in recent research,
especially in cancer research
Microarray Analysis
Which genes are upregulated, down-regulated,
co-regulated, not-regulated?
gene discovery
pattern discovery
inferences about biological processes
classification of biological processes
SAGE
• Experimental technique designed to
gain a quantitative measure of gene
expression.
• ~10-20 base “tags” are produced
(immediately adjacent to the 3’ end of
the 3’ most NlaIII restriction site).
• The SAGE technique measures not the
expression level of a gene, but
quantifies a "tag" which represents the
transcription product of a gene.
SAGE
Tags are isolated
and concatemerized.
Relative expression
levels can be
compared between
cells in different
states.
SAGEmap (http://cgap.nci.nih.gov)
SAGE:
comparing two relational libraries
EST library (UniGene)
Gene expression info from UniGene Library
An Example of In-house
EST Library Analysis
The Algorithms and Challenges of
High-throughput Gene Expression
Analysis
Seeing is believing?
No, need to correct errors.
SAGE:
• A typical experiment requires ~30,000 gene
expression comparisons in which a normal and a
diseased cell are compared.
• The results depend on the size and
reliability of the SAGE libraries.
• Statistical measures are used to filter out
candidate genes to reduce the dimensionality
of the data but it is tedious and time
consuming to play with these measures until
a good set is found.
SAGE
• TPM: a simple normalization method
TPM = Count × 1,000,000 / TotalCount
• Bayesian approach
http://cancerres.aacrjournals.org/cgi/content/full/59/21/5403
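The TPM formula above can be sketched in a few lines of Python. This is only an illustration: the function name tags_per_million and the tag counts are hypothetical and do not come from any SAGE library.

```python
def tags_per_million(tag_counts):
    """Scale raw SAGE tag counts to tags per million (TPM).

    tag_counts maps a tag sequence (or label) to its raw count in one library.
    """
    total = sum(tag_counts.values())
    if total == 0:
        raise ValueError("library contains no tags")
    # TPM = Count * 1,000,000 / TotalCount
    return {tag: count * 1_000_000 / total for tag, count in tag_counts.items()}


# Hypothetical library of 50,000 tags in total (made-up numbers).
library = {"TAG_A": 30, "TAG_B": 120, "TAG_C": 49_850}
tpm = tags_per_million(library)
```

Because every library is rescaled to the same total, TPM values from libraries of different sizes can be compared directly; the reliability caveat for small libraries still applies.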
Microarray: Sources of errors
• systematic
• random
[Figure: axes are log signal intensity and log RNA abundance]
Sources of Errors (Cont.)
• Printing and/or tip problems
• Labeling and dye effects (differing amounts of
RNA labeled between the 2 channels)
• Differences in the power of the two lasers (or
other scanner problems)
• Difference in DNA concentration on arrays
(plate effects)
• Spatial biases in ratios across the surface of
the microarray due to uneven hybridization
• cDNA arrays cannot distinguish alternatively
spliced forms
Errors that cannot be corrected by statistics
• Competitive hybridization of
different targets on the chip
• Failure to distinguish different
splicing forms
• Misinterpretation of time course
data when there are not sufficient
points
• Misinterpretation of relative
intensity
Does clustered time course really mean coexpression?
Picture taken from
http://genomics.stanford.edu/yeast/additional_figures_link.html
Yes, you can study
a known system (such
as the cell cycle) this way;
but what about
unknown systems?
Normalization by iterative linear regression
fit a line (y = mx + b) to the data set
set aside outliers (residuals > 2 × s.e.)
repeat until r² changes by < 0.001
then apply slope and intercept to the original dataset
D Finkelstein et al.
http://www.camda.duke.edu/CAMDA00/abstracts.asp
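The iterative regression steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that x and y are paired log intensities (for example the two channels of one array), that "s.e." means the residual standard error of the current fit, and that "apply slope and intercept" means rescaling y as (y - b) / m. The function names and the sample data are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + b; returns (m, b, r2)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return m, b, r2


def iterative_regression_normalize(x, y, cutoff=2.0, tol=0.001, max_iter=50):
    """Fit, set aside outliers, refit until r2 stabilises, then rescale all of y."""
    keep = list(range(len(x)))  # indices currently used for fitting
    prev_r2 = None
    for _ in range(max_iter):
        xs = [x[i] for i in keep]
        ys = [y[i] for i in keep]
        m, b, r2 = fit_line(xs, ys)
        if prev_r2 is not None and abs(r2 - prev_r2) < tol:
            break  # r2 changed by less than the tolerance
        prev_r2 = r2
        residuals = [yi - (m * xi + b) for xi, yi in zip(xs, ys)]
        # Residual standard error of the current fit (assumed meaning of "s.e.").
        se = math.sqrt(sum(r * r for r in residuals) / (len(keep) - 2))
        new_keep = [i for i, r in zip(keep, residuals) if abs(r) <= cutoff * se]
        if len(new_keep) < 3:
            break  # too few points left to refit
        keep = new_keep
    # Apply the final slope and intercept to the ORIGINAL data, outliers included.
    normalized = [(yi - b) / m for yi in y]
    return m, b, normalized


# Hypothetical data: y = 2x + 1 with one gross outlier at x = 5.
x = [float(v) for v in range(1, 11)]
y = [2.0 * v + 1.0 for v in x]
y[4] += 20.0
slope, intercept, y_norm = iterative_regression_normalize(x, y)
```

Outliers are excluded only from the fit; they are still rescaled in the final step, so genes that differ strongly between the two channels keep their signal.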
Normalization (Curvilinear)
Loess function fit line
[Figure: axes are ratio {log2 (Cy5 / Cy3)} and average signal {log2 (Cy3 + Cy5)/2}, with a loess fit line]
G Tseng et al., NAR 2001
After Normalization ……
• Differentially Expressed (DE) Gene screening
– T-test
– T-statistics
– SVM
• Clustering
– Hierarchical
– SOM
– K-means
• Network (Pathway) analysis
– BioCarta, KEGG, GO databases
– Bayesian network learning
– Topology
– …
Bioinformatics challenges
1. data management
2. utilizing data from multiple
experiments
3. utilizing data from multiple groups
* with different technologies
* with only processed data available
Integrated Bioinformatics Analysis of Gene
Expression Profiling
Large-scale meta-analysis of cancer
microarray data identifies common
transcriptional profiles of neoplastic
transformation and progression
Daniel R. Rhodes et al., PNAS 2004, 101, 9309-9314
T-test
Q values (estimated false discovery rates)
were calculated as
Q = (P × n) / i
where P is the P value, n is the total
number of genes, and i is the sorted rank
of the P value.
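A minimal Python sketch of this Q-value calculation, assuming the formula is Q = P × n / i with genes ranked from the smallest P value (rank 1) to the largest. The function name q_values, the cap at 1.0, and the example P values are assumptions added for illustration.

```python
def q_values(p_values):
    """Estimate a false discovery rate per gene as Q = P * n / i.

    n is the number of genes and i is the 1-based rank of the gene's
    P value when all P values are sorted in ascending order.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda k: p_values[k])
    q = [0.0] * n
    for rank, k in enumerate(order, start=1):
        # Cap at 1.0 because a rate above 100% is not meaningful (assumption).
        q[k] = min(1.0, p_values[k] * n / rank)
    return q


# Hypothetical t-test P values for four genes.
p = [0.01, 0.04, 0.03, 0.2]
q = q_values(p)
```

Q values computed this way are not guaranteed to increase with P; Benjamini-Hochberg style adjustments additionally enforce monotonicity, which this sketch leaves out.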
Cont. Meta-Profiling.
The purpose of meta-profiling is to address the
hypothesis that a selected set of differential
expression signatures shares a significant
intersection of genes (a meta-signature), thus
inferring a biological relatedness.
67 genes were screened by meta-analysis
Integrated Cancer Gene Expression Map
7 genes were discovered by the system
THANX!!