Final Project - Computer Science

Yeast Dataset Analysis
Hongli Li
91.580 Final Project
Computer Science Department
UMASS Lowell
Outline





Gene Ontology Annotation
Data Preprocessing
Cluster
Results
Conclusion
GO Annotations




Total number of genes: 799
Genes with GO at level 3 of Biological Process: 327
Genes with GO but not at level 3: 272
Genes without GO: 200
GO Annotation

Of the 327 genes with GO at level 3:

170 genes belong to GO:0008152, metabolism
90 genes belong to GO:0007049, cell cycle
81 genes belong to GO:0016043, cell organization and biogenesis
51 genes belong to GO:0006810, transport
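
The counts on the two annotation slides can be checked with a few lines of arithmetic. The sketch below only restates the numbers given above; the variable names are illustrative. The four category counts add up to 392, which is more than 327, so some genes must belong to more than one of these level 3 categories.

```python
# Counts copied from the slides; names are illustrative only.
total_genes = 799
annotation_status = {
    "GO at level 3": 327,
    "GO, not at level 3": 272,
    "no GO": 200,
}
level3_categories = {
    "GO:0008152 metabolism": 170,
    "GO:0007049 cell cycle": 90,
    "GO:0016043 cell organization and biogenesis": 81,
    "GO:0006810 transport": 51,
}

# The three annotation groups partition the dataset.
assert sum(annotation_status.values()) == total_genes

# Category memberships exceed the number of level 3 genes,
# so the categories overlap.
memberships = sum(level3_categories.values())
level3_genes = annotation_status["GO at level 3"]
print(f"{memberships} memberships for {level3_genes} genes")

for name, count in level3_categories.items():
    print(f"{name}: {count / level3_genes:.1%} of level 3 genes")
```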
Data Preprocessing




Dataset: 799 cell cycle-regulated genes
Filter: minimum percentage of existing (non-missing) values: 85%
Impute missing values using KNN
Standardize patterns (mean = 0 and standard deviation = 1)
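
The three preprocessing steps above can be sketched in plain Python. This is a minimal illustration, not the tool actually used for the project. It assumes missing values are stored as `None`, that distances between genes are computed over the time points both genes have, and that the sample standard deviation is used. The function names and the value of `k` are hypothetical.

```python
import math

def filter_genes(rows, min_present=0.85):
    """Keep rows whose fraction of non-missing values is at least min_present."""
    return [r for r in rows
            if sum(v is not None for v in r) / len(r) >= min_present]

def _distance(a, b):
    """Root-mean-square difference over columns observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

def knn_impute(rows, k=3):
    """Fill each missing value with the column mean over the k nearest rows."""
    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Only rows that have column j observed can act as neighbours.
            candidates = sorted(
                ((_distance(row, other), other[j])
                 for m, other in enumerate(rows)
                 if m != i and other[j] is not None),
                key=lambda pair: pair[0],
            )
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                new_row[j] = sum(nearest) / len(nearest)
            else:
                # Assumed fallback: mean of the row's own observed values.
                observed = [v for v in row if v is not None]
                new_row[j] = sum(observed) / len(observed)
        filled.append(new_row)
    return filled

def standardize(row):
    """Scale one complete row to mean 0 and sample standard deviation 1."""
    mean = sum(row) / len(row)
    sd = math.sqrt(sum((v - mean) ** 2 for v in row) / (len(row) - 1))
    if sd == 0:
        return [0.0] * len(row)
    return [(v - mean) / sd for v in row]

def preprocess(rows, min_present=0.85, k=3):
    """Filter, impute, then standardize, in the order listed on the slide."""
    kept = filter_genes(rows, min_present)
    return [standardize(r) for r in knn_impute(kept, k)]
```

With seven time points, a gene missing one value (6/7, about 86% present) passes the 85% filter, while a gene missing two values (5/7, about 71%) is dropped.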
Cluster

SOTA – Self-Organizing Tree Algorithm


Euclidean Distance
Variability Threshold: 80%
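
As a rough illustration of the two settings above, the sketch below computes the Euclidean distance between two expression profiles and a variability value for a group of profiles. It assumes that variability means the largest pairwise distance among the profiles assigned to one cell of the tree; how the 80% threshold is scaled against that value is not stated on the slide, so the sketch returns only the raw number. The function names are hypothetical.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def variability(profiles):
    """Largest pairwise distance in one cell (0.0 for fewer than two profiles).

    Assumed definition; the clustering tool may scale or define it differently.
    """
    return max((euclidean(a, b) for a, b in combinations(profiles, 2)),
               default=0.0)
```

Under this reading, a cell whose variability exceeds the threshold would keep splitting, and a cell below it would stop growing.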
Result
Cluster 61

67 of the 799 genes fall in Cluster 61

24 of the 67 genes have GO annotations
10 of the 24 genes belong to metabolism
14 belong to cell cycle

8 belong to S phase of mitotic cell cycle
8 belong to DNA replication
4 belong to G1/S transition of mitotic cell cycle
Only one gene that belongs to metabolism is not in the cell cycle
Cluster 60

33 genes in this cluster

11 of the 33 genes have GO annotations
4 of the 11 genes are in the M-phase specific microtubule process, which belongs to cell cycle
7 are in organelle organization and biogenesis, which belongs to cell growth and/or maintenance
8 in total are in cell cycle
Cluster 59

38 genes in this cluster

15 genes have annotations

7 in metabolism
5 in cell cycle

M phase of mitotic cell cycle has 3
Nuclear division has 3
No gene is shared between these two classes
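
One way to compare the three clusters is to ask how surprising each cell cycle count is against the background of 90 cell cycle genes among the 327 genes with GO at level 3. The sketch below uses a one-sided hypergeometric test for that. It is an illustration added here, not the analysis behind the slides, and it assumes the annotated genes counted in each cluster are all among the 327 level 3 genes, which the slides do not state. The function name is hypothetical.

```python
from math import comb

def hypergeom_tail(N, K, n, k):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which carry the annotation of interest."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return hits / total

# Background taken from the GO annotation slides.
BACKGROUND_ANNOTATED = 327   # genes with GO at level 3
BACKGROUND_CELL_CYCLE = 90   # of those, genes in GO:0007049

# cluster id -> (annotated genes in cluster, of those in cell cycle)
clusters = {61: (24, 14), 60: (11, 8), 59: (15, 5)}

for cluster_id, (annotated, in_cell_cycle) in clusters.items():
    p = hypergeom_tail(BACKGROUND_ANNOTATED, BACKGROUND_CELL_CYCLE,
                       annotated, in_cell_cycle)
    share = in_cell_cycle / annotated
    print(f"cluster {cluster_id}: {in_cell_cycle}/{annotated} "
          f"({share:.1%}) in cell cycle, p = {p:.2g}")
```

A small p-value means the cluster holds more cell cycle genes than a random draw of the same size would be expected to contain.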
Conclusion & Future Work





Cluster #61 has the strongest relation to the cell cycle, followed by clusters #60 and #59
Sub-cluster clusters #59, #60, and #61
Analyze the gene expression data of the genes that are known to belong to GO cell cycle annotations
Analyze the other clusters
Apply the same analysis to the 6000-gene dataset
References
1. http://gepas.bioinfo.cnio.es/index.html
2. P. T. Spellman et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell, vol. 9, pp. 3273-3297, 1998.
3. R. J. Cho et al. A genome-wide transcriptional analysis of the mitotic cell cycle. Mol. Cell, vol. 2, pp. 65-73, 1998.
4. J. Herrero, A. Valencia, and J. Dopazo. A hierarchical unsupervised growing neural network for clustering gene expression patterns. Bioinformatics, vol. 17, no. 2, pp. 126-136, 2001.
5. O. Alter et al. Singular value decomposition for genome-wide expression data processing and modeling. PNAS, vol. 97, pp. 10101-10106, 2000.
6. http://www.cellsalive.com/cell_cycle.htm
7. http://www.geneontology.org/
8. http://fatigo.bioinfo.cnio.es/htdocs/helpFatiGO.html