Final Project - Computer Science
Yeast Dataset Analysis
Hongli Li
91.580 Final Project
Computer Science Department
UMASS Lowell
Outline
Gene Ontology Annotation
Data Preprocessing
Cluster
Results
Conclusion
GO Annotations
Total number of genes: 799
Genes with GO at level 3 of Biological Process: 327
Genes with GO but not at level 3: 272
Genes without GO: 200
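The three counts above can be checked against the 799-gene total with a few lines of Python. This is only an illustrative tally; the dictionary keys and the `coverage` helper are names introduced here, not part of the original analysis.

```python
# Counts copied from the slide; the category labels are paraphrased.
go_counts = {
    "GO at level 3 of Biological Process": 327,
    "GO but not at level 3": 272,
    "no GO": 200,
}

def coverage(counts):
    """Return each category's share of all genes, as a percentage."""
    n = sum(counts.values())
    return {name: 100.0 * c / n for name, c in counts.items()}

total = sum(go_counts.values())
shares = coverage(go_counts)

print(f"total genes: {total}")
for name, pct in shares.items():
    print(f"{name}: {go_counts[name]} genes ({pct:.1f}%)")
```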
GO Annotation
Of the 327 genes with GO at level 3:
170 genes belong to GO:0008152, metabolism
90 genes belong to GO:0007049, cell cycle
81 genes belong to GO:0016043, cell organization and biogenesis
51 genes belong to GO:0006810, transport
Data Preprocessing
Dataset: 799 Cell Cycle Regulated Genes
Filter: genes must have at least 85% existing (non-missing) values
Impute missing values using KNN
Standardize patterns (mean = 0, standard deviation = 1)
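The preprocessing steps above can be sketched in plain Python. This is a minimal illustration, not the tool used in the project: the function names, the choice of `k`, the distance over shared columns, the population standard deviation, and the toy `profiles` data are all assumptions made for the example.

```python
import math

def filter_genes(rows, min_present=0.85):
    """Keep rows whose fraction of non-missing values is at least min_present."""
    return [r for r in rows
            if sum(v is not None for v in r) / len(r) >= min_present]

def _distance(a, b):
    """Root-mean-square difference over the columns both rows have."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

def knn_impute(rows, k=3):
    """Fill each missing value with the column mean of the k nearest rows."""
    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, v in enumerate(row):
            if v is not None:
                continue
            # Neighbours are only useful if they have a value in column j.
            cands = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )[:k]
            if cands:
                new_row[j] = sum(val for _, val in cands) / len(cands)
            else:
                # Fallback (assumption): use the mean of the row's own values.
                present = [x for x in row if x is not None]
                new_row[j] = sum(present) / len(present)
        filled.append(new_row)
    return filled

def standardize(row):
    """Scale a profile to mean 0 and (population) standard deviation 1."""
    mean = sum(row) / len(row)
    sd = math.sqrt(sum((v - mean) ** 2 for v in row) / len(row))
    return [(v - mean) / sd for v in row] if sd > 0 else [0.0] * len(row)

# Toy expression profiles (made up); None marks a missing measurement.
profiles = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    [1.1, 2.1, None, 4.1, 5.1, 6.1, 7.1],
    [0.9, 1.9, 2.9, 3.9, 4.9, 5.9, 6.9],
    [None, None, 3.0, None, 5.0, 6.0, 7.0],  # only 4 of 7 values present
]

kept = filter_genes(profiles)
filled = knn_impute(kept, k=2)
standardized = [standardize(r) for r in filled]
```

The order matters in this sketch: filtering first keeps sparse genes from acting as neighbours, and standardizing last ensures imputed values are scaled together with measured ones.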
Cluster
SOTA – Self-Organizing Tree Algorithm
Euclidean Distance
Variability Threshold: 80%
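To make the role of the distance and the variability threshold concrete, here is a simplified sketch of the quantities a SOTA-style tree inspects when deciding which cell to split. It is not the GEPAS implementation: the cell centroid stands in for the trained node vector, the function names are hypothetical, and the threshold is an absolute number because how the 80% setting maps to a distance is not stated here.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(profiles):
    """Column-wise mean of the profiles in a cell."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]

def resource(profiles):
    """Mean distance from each profile to the cell centroid."""
    c = centroid(profiles)
    return sum(euclidean(p, c) for p in profiles) / len(profiles)

def variability(profiles):
    """Largest pairwise distance inside the cell (0 for a single profile)."""
    return max((euclidean(a, b) for a, b in combinations(profiles, 2)),
               default=0.0)

def next_cell_to_split(cells, threshold):
    """Return the key of the most variable cell if it exceeds threshold, else None."""
    name, value = max(((k, variability(v)) for k, v in cells.items()),
                      key=lambda kv: kv[1])
    return name if value > threshold else None

# Toy cells with two-point profiles (made up for illustration).
cells = {
    "A": [[0.0, 0.0], [0.0, 1.0]],
    "B": [[0.0, 0.0], [3.0, 4.0]],
}
choice = next_cell_to_split(cells, threshold=2.0)
```

In this toy example cell "B" is the candidate for splitting, because its two profiles are 5 units apart while those in "A" are only 1 unit apart.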
Results
Cluster 61
67 of the 799 genes fall in Cluster 61
24 of the 67 genes have GO annotations
10 of the 24 genes belong to metabolism
14 belong to cell cycle
8 belong to S phase of mitotic cell cycle
8 belong to DNA replication
4 belong to G1/S transition of mitotic cell cycle
Only one gene that belongs to metabolism is not in cell cycle
Cluster 60
33 genes in this cluster
11 of the 33 have GO annotations
4 of the 11 genes are in the M-phase specific microtubule process, which belongs to cell cycle
7 are in organelle organization and biogenesis, which belongs to cell growth and/or maintenance
8 in total are in cell cycle
Cluster 59
38 genes in this cluster
15 genes have annotations
7 in metabolism
5 in cell cycle
M phase of mitotic cell cycle has 3
Nuclear division has 3
No gene is shared between these two classes
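The per-cluster counts reported above can be collected into one small table for side-by-side comparison. The sketch below only restates the numbers from the slides; the dictionary layout and the `summarize` helper are hypothetical names introduced for illustration.

```python
# Counts copied from the Cluster 61, 60, and 59 slides.
clusters = {
    61: {"genes": 67, "annotated": 24, "cell_cycle": 14},
    60: {"genes": 33, "annotated": 11, "cell_cycle": 8},
    59: {"genes": 38, "annotated": 15, "cell_cycle": 5},
}

def summarize(clusters):
    """Return rows sorted by the number of cell-cycle genes, largest first."""
    rows = []
    for cid, c in clusters.items():
        rows.append({
            "cluster": cid,
            "cell_cycle": c["cell_cycle"],
            "annotated": c["annotated"],
            # Share of the annotated genes that carry a cell cycle term.
            "fraction_of_annotated": c["cell_cycle"] / c["annotated"],
        })
    return sorted(rows, key=lambda r: r["cell_cycle"], reverse=True)

summary = summarize(clusters)
for row in summary:
    print(f"cluster {row['cluster']}: {row['cell_cycle']} of "
          f"{row['annotated']} annotated genes in cell cycle")
```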
Conclusion & Future Work
Cluster #61 has the strongest relation to the cell cycle, followed by clusters #60 and #59
Sub-cluster clusters #59, #60, and #61
Analyze the gene expression data of the genes known to carry GO cell cycle annotations
Analyze other clusters
Apply the same analysis to the 6000-gene dataset
References
1. http://gepas.bioinfo.cnio.es/index.html
2. P. T. Spellman et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell, vol. 9, pp. 3273-3297, 1998.
3. R. J. Cho et al. A genome-wide transcriptional analysis of the mitotic cell cycle. Mol. Cell, vol. 2, pp. 65-73, 1998.
4. J. Herrero, A. Valencia, et al. A hierarchical unsupervised growing neural network for clustering gene expression patterns. Bioinformatics, vol. 17, no. 2, pp. 126-136, 2001.
5. O. Alter et al. Singular value decomposition for genome-wide expression data processing and modeling. PNAS, vol. 97, pp. 10101-10106, 2000.
6. http://www.cellsalive.com/cell_cycle.htm
7. http://www.geneontology.org/
8. http://fatigo.bioinfo.cnio.es/htdocs/helpFatiGO.html