Literature Survey: Microarray Data Analysis
Ei-Ei Gaw
Arizona State University
CSE 591
April 24, 2003
cDNA Microarray Procedure
http://www.anst.uu.se/frgra677/projekt_eng.html
Microarray Data
• Expression patterns of thousands of genes
simultaneously.
– Usually the number of experiments is small compared to the number
of genes.
• Random and systematic variations.
– Systematic variations due to complexity of the method.
• Remove low-quality measurements.
Preprocessing
• Transformation
– Aim: Change data to reflect the assumptions (homogeneous
variance and normal distribution) of statistical
techniques.
– Log and variance-stabilizing transformation.
• Normalization
– Aim: Account for random and systematic variations.
– Global, lowess, location, and scale normalization
methods.
• Missing data
– K Nearest Neighbors (KNN) algorithm, a Singular
Value Decomposition based method (SVD), and simple
row (gene) average.
• Reduce dimensionality
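The preprocessing steps above can be sketched in a few lines. This is a minimal illustration, not the procedure of any cited paper: it assumes two-channel (red/green) intensities, uses a plain log2 ratio as the transformation, median centering as the global normalization, and the simple row (gene) average for missing values. The function names and the toy intensities are hypothetical.

```python
import math
from statistics import mean, median


def log2_ratios(red, green):
    """Log2 of red/green ratios; None where a value is missing or non-positive."""
    out = []
    for r, g in zip(red, green):
        if r is None or g is None or r <= 0 or g <= 0:
            out.append(None)
        else:
            out.append(math.log2(r / g))
    return out


def global_median_normalize(log_ratios):
    """Shift one array's log ratios so that their median becomes zero."""
    observed = [x for x in log_ratios if x is not None]
    centre = median(observed)
    return [None if x is None else x - centre for x in log_ratios]


def impute_row_average(matrix):
    """Fill missing entries of each gene (row) with that gene's observed mean."""
    filled = []
    for row in matrix:
        observed = [x for x in row if x is not None]
        fill = mean(observed) if observed else 0.0  # assumption: 0.0 for all-missing rows
        filled.append([fill if x is None else x for x in row])
    return filled


# Toy data (made up): four spots on one array, one missing red intensity.
red = [800.0, 400.0, None, 1600.0]
green = [200.0, 400.0, 300.0, 200.0]
normalized = global_median_normalize(log2_ratios(red, green))
print(normalized)

# Toy gene-by-array matrix with one missing measurement.
print(impute_row_average([[1.0, None, 3.0]]))
```

Lowess and scale normalization, and the KNN and SVD imputation methods, need more machinery than fits here; the row average is the simplest of the three imputation options listed.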
Classification
• Hierarchical clustering
– Classify tumors and find previously unrecognized tumor subtypes
– Identify differentially expressed genes
– Cluster co-expressed genes, but not suited to finding the multiple ways
in which expression patterns can be similar
• Self-organizing map
– Suited to find a small number of prominent classes
– Class discovery
• Support vector machine
– Operates in an extremely high-dimensional feature space
– Supervised learning – takes advantage of prior knowledge
• Genetic Algorithm/KNN
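As an illustration of the hierarchical clustering bullet, here is a small agglomerative clustering sketch. It assumes Pearson correlation distance (1 - r) and average linkage, which are common choices for expression profiles but are not stated on the slide; the gene names and profiles are invented, and profiles with zero variance are not handled.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_linkage(profiles, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain."""
    names = list(profiles)
    dist = {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [[name] for name in names]

    def cluster_distance(c1, c2):
        # Average of all pairwise gene distances between the two clusters.
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs,
                   key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Toy profiles (made up): g1/g2 rise together, g3/g4 fall together.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.0, 4.0, 1.0],
}
clusters = average_linkage(profiles, n_clusters=2)
print(clusters)
```

Recording the order of merges instead of stopping at a fixed number of clusters would give the full dendrogram.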
Regulatory Networks
• Two-stage approach
– Find co-regulated genes using a clustering algorithm and then look
for conserved motifs upstream
• Unified approach – Joint likelihoods for sequence and
expression
– Find co-regulated genes and conserved upstream motifs
simultaneously, rather than in separate steps
• Kolmogorov-Smirnov method
– Does not require clustering
– Sort red-green ratios
• Minreg
– Requires prior biological knowledge (candidate regulators)
– One advantage is speed
– Identifies and characterizes both regulators and regulatees
– Assigns biological function to regulators
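The Kolmogorov-Smirnov idea above can be illustrated with the two-sample KS statistic: compare the distribution of log red/green ratios for genes whose upstream region contains a candidate motif against the distribution for all other genes, with no clustering step. This is only a sketch of the statistic itself, not of the cited method's full pipeline; the split into two groups and the ratio values are invented for the example.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest gap between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest = 0.0
    for v in sorted(set(a) | set(b)):
        # Fraction of each sample that is <= v.
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        largest = max(largest, abs(cdf_a - cdf_b))
    return largest


# Toy log ratios (made up): genes with the motif vs. genes without it.
with_motif = [1.8, 2.1, 2.5, 3.0]
without_motif = [-0.5, -0.1, 0.0, 0.2, 0.4, 2.0]
d = ks_statistic(with_motif, without_motif)
print(d)
```

A statistic near 1 means the two groups of ratios barely overlap, which is the signal that the motif is associated with a change in expression; turning the statistic into a p-value requires the KS distribution, which is omitted here.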
Genetic Networks
• Association rules
– Global gene expression profiling
– Can reveal relationships between different genes and
between environment and expression
• Bayesian Networks
• Boolean Networks
– REVEAL (REVerse Engineering Algorithm)
– NetWork
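To make the Boolean network model concrete, the sketch below searches for the smallest set of input genes whose states at time t consistently determine a target gene's state at time t+1. This is a simplified consistency check; REVEAL itself uses mutual information, which is not implemented here. The three-gene network, its rules, and the function name are hypothetical.

```python
from itertools import combinations, product


def find_inputs(transitions, target, max_inputs=2):
    """Return (input genes, truth table) for the smallest consistent input set."""
    n_genes = len(transitions[0][0])
    for k in range(1, max_inputs + 1):
        for inputs in combinations(range(n_genes), k):
            table = {}
            consistent = True
            for before, after in transitions:
                key = tuple(before[i] for i in inputs)
                out = after[target]
                # The same input pattern must always give the same output.
                if table.setdefault(key, out) != out:
                    consistent = False
                    break
            if consistent:
                return inputs, table
    return None


def step(state):
    """Hypothetical rules: A' = B, B' = NOT A, C' = A AND B."""
    a, b, c = state
    return (b, 1 - a, a & b)


# State transitions observed from every possible starting state.
transitions = [(s, step(s)) for s in product((0, 1), repeat=3)]

inputs_for_c, table_for_c = find_inputs(transitions, target=2)
print(inputs_for_c, table_for_c)
```

With real data only a few of the possible states are observed, so several input sets may be consistent; the small indegree limit (`max_inputs`) is what keeps the search tractable.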
Bibliography
• Durbin, B. P., Hardin, J. S., Hawkins, D. M., and Rocke, D. M. (2002) A variance-stabilizing
transformation for gene-expression microarray data. Bioinformatics 18:S105-S110.
• Kerr, M. Kathleen, Martin, Mitchell, and Churchill, Gary A. (2000) Analysis of variance for gene
expression microarray data. Journal of Computational Biology 7:819-837.
• Yang, Yee Hwa, Dudoit, Sandrine, Luu, Percy et al. (2002) Normalization for cDNA microarray data:
a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids
Research 30:e15.
• Quackenbush, John (2002) Microarray data normalization and transformation. Nature Genetics
Supplement 32:496-501.
• Troyanskaya, Olga et al. (2001) Missing value estimation methods for DNA microarrays.
Bioinformatics 17:520-525.
• Antoniadis, A., Lambert-Lacroix, S., and Leblanc, F. (2003) Effective dimension reduction methods for
tumor classification using gene expression data. Bioinformatics 19:563-570.
• Golub, T. R. et al. (1999) Molecular classification of cancer: class discovery and class prediction
by gene expression monitoring. Science 286:531-537.
• Rickman, David S. et al. (2001) Distinctive molecular profile of high-grade and low-grade
gliomas based on oligonucleotide microarray analysis. Cancer Research 61:6885-6891.
• Eisen, Michael B. et al. (1998) Cluster analysis and display of genome-wide expression patterns.
Proc. Natl. Acad. Sci. USA 95:14863-14868.
• Brown, Michael P. S. et al. (2000) Knowledge-based analysis of microarray gene expression data by
using support vector machines. Proc. Natl. Acad. Sci. USA 97:262-267.
• Li, Leping et al. Gene assessment and sample classification for gene expression data using a
genetic algorithm/k-nearest neighbor method.
• Holmes, Ian and Bruno, William J. (2000) Finding regulatory elements using joint likelihoods for
sequence and expression profile data. American Association for Artificial Intelligence
(www.aaai.org).
• Van Helden, J., Andre, B., and Collado-Vides, J. (1998) Extracting regulatory sites from the
upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J. Mol.
Biol. 281:827-842.
• Pe’er, Dana, Regev, Aviv, and Tanay, Amos (2002) Minreg: Inferring an active regulator set.
Bioinformatics 18:S258-S267.
• Jensen, Lars and Knudsen, Steen (2000) Automatic discovery of regulatory patterns in promoter
regions based on whole cell expression data and functional annotation. Bioinformatics 16:326-333.
• Creighton, Chad and Hanash, Samir (2003) Mining gene expression databases for association rules.
Bioinformatics 19:79-86.
• Friedman, Nir et al. (2000) Using Bayesian networks to analyze expression data. J. Comp. Biol.
7:601-620.
• Liang, S., Fuhrman, S., and Somogyi, R. (1998) REVEAL, a general reverse engineering algorithm
for inference of genetic network architectures. Pacific Symposium on Biocomputing 3:18-29.
• Akutsu, T., Miyano, S., and Kuhara, S. (1999) Identification of genetic networks from a small
number of gene expression patterns under the Boolean network model. Pacific Symposium on
Biocomputing 4:17-28.
• Samsonova, M. G. and Serov, V. N. (1999) NetWork: an interactive interface to the tools for analysis
of genetic network structure and dynamics. Pacific Symposium on Biocomputing 4:102-111.