A cluster is considered to be stable depending on stability value


Project Seminar
on
STABLE CLUSTERING ALGORITHM
TO IDENTIFY CPU USAGE OF COMPUTERS BEHAVIOR IN
GRID ENVIRONMENT
Under the guidance of
Prof. Lakshmi Rajamani
(Head of the Department)
SUBMITTED BY
G. Naresh Kumar
(01-09-1824)
M.Tech(CSE)-III SEM
Contents:
 Introduction.
 Motivation.
 Problem statement.
 Work done so far.
 Work to be done.
 Conclusion.
 References.
Introduction.
 Grid computing, or simply grid, is a generic term given to technologies
designed to make pools of distributed computer resources available on demand.
 Grid computing has become a well-established method for Internet-based high-
performance computing.
 Grid provides wide-spread, dynamic, flexible and coordinated sharing of
geographically distributed networked resources, among dynamic user groups.
Data mining:
 Data mining, or knowledge discovery, refers to a variety of techniques that have been
developed in the fields of databases, machine learning and pattern recognition.
 The process of finding useful patterns and information in raw data is often known
as data mining.
Clustering:

Clustering is a division of data into groups of similar objects.
 A cluster is a collection of data objects that are similar to one another within the
same cluster and are dissimilar to the objects in other clusters.
 It is a process of unsupervised learning.
 Cluster analysis has been widely used in numerous applications, including market
research, pattern recognition, data analysis, and image processing.
Clustering techniques:
1) Partitioning Clustering
~ PAM
~ CLARA
~ CLARANS
~ K-Means
2) Supervised Clustering
~ K-nearest neighbors
3) Online mode clustering
~ ECM
~ Evoc
4) Fuzzy Clustering
~ Fuzzy c-means
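As an illustration of the partitioning family listed above, the following is a minimal sketch of K-Means applied to one-dimensional CPU usage readings. The function name `kmeans_1d`, the sample readings and the fixed starting centroids are assumptions made for this example; they are not taken from the seminar's own experiments.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Cluster 1-D values around the given starting centroids (Lloyd's algorithm)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each value joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = [
            sum(members) / len(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break  # No centroid moved, so the assignment has converged.
        centroids = updated
    return centroids, clusters


# Hypothetical CPU usage percentages sampled from nine computers.
cpu_usage = [3.0, 5.0, 4.0, 48.0, 52.0, 50.0, 91.0, 95.0, 93.0]
centroids, clusters = kmeans_1d(cpu_usage, centroids=[0.0, 50.0, 100.0])
print(centroids)
print(clusters)
```

With these sample readings the computers separate into idle, moderately loaded and heavily loaded groups. Fixed starting centroids are used only to keep the sketch deterministic; practical K-Means implementations usually pick them randomly or with a seeding heuristic.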
Motivation

In a grid environment, the number of participating computing nodes and users keeps
increasing and may reach thousands or even millions. The abundance of these
resources creates new problems, such as how to collect massive amounts of
evolving resource data in real time and extract useful information from it.
Moreover, this data is unordered, random and chaotic, so an ordinary user cannot
easily discover knowledge or meaningful information from it.
 To deal with these requirements, clustering is proposed as one of the best
ways of processing large sets of raw data and turning this data into
meaningful information.
The Flow of Clustering Process in Grid Environment
Problem statement:

Mining clusters in a single large database requires more
processing power. As a result, the conventional technology used for
centralized data mining is no longer suitable for new systems.
We apply different clustering methods to CPU usage data to
identify the behavior of computers. Finding a stable algorithm
requires evaluating dynamicity, accuracy and the ability to identify
stable cluster members. The best of these clustering
algorithms will be implemented for better processing and
cluster stability in the grid environment. However, the results
depend on the threshold value, stability value and stability hour.
Work done so far:

Survey on the existing clustering algorithms.

Survey on Grid technologies.

Installed the GridGain toolkit.
Work to be done:
 Testing different types of clustering algorithms and calculating their performance
and complexity in a system.
 Testing the clustering algorithms in a grid environment and measuring their
performance to find the stable clustering algorithm.
 Finally, implementing the stable clustering algorithm in the grid environment for
better processing and cluster stability.
Cluster Stability
 Stability Value: The value (as a percentage) that bounds the allowed change in cluster
radius. For instance, if the stability value is defined as 5%, any cluster whose radius
grows or shrinks by less than 5% of its original size is considered stable.
 Stability Hour: The value that defines how long, in hours, a cluster member must
stay in the same cluster in order to be considered stable. If
the stability hour is set to 3 hours, any cluster member that stays in the same cluster
for more than this amount of time is considered stable.
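The stability value check can be sketched in a few lines of Python. This is a minimal illustration, assuming the radius change is measured relative to the cluster's original radius; the function name `is_cluster_stable` and its parameters are hypothetical and not part of any particular toolkit.

```python
def is_cluster_stable(original_radius, current_radius, stability_value_pct):
    """Return True if the radius changed by less than the stability value.

    The change is expressed as a percentage of the original radius,
    whether the cluster grew or shrank.
    """
    if original_radius <= 0:
        raise ValueError("original_radius must be positive")
    change_pct = abs(current_radius - original_radius) / original_radius * 100.0
    return change_pct < stability_value_pct


# With a 5% stability value, a radius that grows from 10.0 to 10.4 (about 4%)
# is treated as stable, while growth to 11.0 (10%) is not.
print(is_cluster_stable(10.0, 10.4, 5.0))
print(is_cluster_stable(10.0, 11.0, 5.0))
```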
Assumptions:
 A cluster is considered stable depending on the stability value, which is pre-defined
by the user, for instance 20%.
 A cluster member is considered stable if it stays in the same stable cluster
continuously for at least two hours. The stability hour is determined by the user.
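The member-level assumption can be sketched as follows. The example assumes that membership is recorded once per hour and that each record counts as one full hour spent in that cluster; the function name `is_member_stable` and the record layout are hypothetical choices made for this illustration.

```python
def is_member_stable(history, stability_hours):
    """Check whether a member has stayed in one stable cluster long enough.

    history: hourly records, oldest first, each a tuple of
             (cluster_id, cluster_was_stable) for that hour.
    stability_hours: required number of continuous hours.
    """
    if not history:
        return False
    current_cluster = history[-1][0]
    hours = 0
    # Walk backwards from the latest record and count the unbroken run
    # of hours spent in the current cluster while that cluster was stable.
    for cluster_id, cluster_was_stable in reversed(history):
        if cluster_id != current_cluster or not cluster_was_stable:
            break
        hours += 1
    return hours >= stability_hours


# A member that moved from cluster "A" to "B" and then stayed in "B" for two hours.
history = [("A", True), ("B", True), ("B", True)]
print(is_member_stable(history, stability_hours=2))
```

Counting backwards from the latest record means that any move to another cluster, or any hour in which the cluster itself was not stable, resets the continuous run.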
Conclusion:
 Here the stable clustering algorithm has been evaluated using three main
criteria: dynamicity, accuracy and the ability to identify stable
cluster members. This stable clustering algorithm can handle and process
massive amounts of data without any significant error rate. From the
experiment, we can conclude that the stable clustering algorithm is more
dynamic than other existing clustering algorithms.
References:
 Ahmar Abbas, “Grid Computing: A Practical Guide to Technology and Applications”,
Charles River Media Inc., 2004.
 Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques”, University of
Illinois at Urbana-Champaign, © 2000 Morgan Kaufmann Publishers.
 Zhijie Xu, Laisheng Wang, Jiancheng Luo and Jianqin Zhang, “A Modified Clustering
Algorithm for Data Mining”.
 Kee Sim Ee, Chan Huah Yang and Fazilah Haran, “Mining of Resource Usage Using Evoc
Algorithm in Grid Environment”.
 Huimin Wang, Guihua Nie and Kui Fu, “Distributed data mining based on semantic web
and grid”, 2009 International Conference on Computational Intelligence and Natural
Computing.
 Ping Luo, Kevin Li, Zhongzhi Shi and Qing He, “Distributed data mining in grid computing
environments”.
 David A. Cieslak, Nitesh V. Chawla and Douglas L. Thain, “Troubleshooting
Thousands of Jobs on Production Grids Using Data Mining Techniques”, 9th Grid
Computing Conference, 2008.
Thank you