Data Mining
• Data mining is the process of extracting
previously unknown, valid and actionable
information from large data sets and then using
the information so derived to make crucial
business and strategic decisions.
• Its goal is to discover meaningful patterns and rules.
Data Warehouse to Data Mining
• A data warehouse (D.W.H) is a subject-oriented, integrated,
time-variant and non-volatile collection of data that aims to
support business decisions; the data mining (D.M) process is a
natural and logical continuation of the D.W.H.
• D.M needs accurate, consistent, good-quality data, because
the mined models must themselves be consistent and accurate.
The D.W.H is a collection covering all aspects of the data.
• D.M can yield more relations, uncovered patterns and
rules when it mines the different and more numerous data
sources gathered in the D.W.H.
• Directing queries is made easy in the D.W.H.
Steps of Data Mining
1. Identifying the data – Data could be anywhere in the world, not
just in the enterprise, and could be distributed across many forms
(paper, people's heads, etc.).
2. Getting the data ready – Put the data in the right format;
databases may have to be built in the system.
3. Mining the data – Once the right data is known, it is cleaned and
scrubbed, unnecessary items are removed, and only the data
essential for mining is kept.
4. Getting useful results – After mining, what outcomes do we
want? Do we want the tools to find interesting patterns? Are these
tools available, or do we build them?
5. Identifying actions – Having determined the data and tools,
we run the tools on the data. This can produce a lot of output,
so we need to know what to do with the patterns found.
6. Implementing the actions – After getting useful results,
examine them and identify actions that can be taken,
e.g. analysing which items go together and placing them
together.
7. Evaluating the benefits – After the actions are implemented,
we wait to see the results. These results may come immediately
or may take a long time. Once we are in a position to
determine the benefits and costs, re-evaluate the
procedure. By then the data may have changed and new
tools may be recommended, so plan the next mining
cycle and decide how to go about it.
8. Determining what to do next
9. Carrying out the next cycle
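The cleaning and scrubbing described in the steps above can be sketched in a few lines of Python. This is only a minimal illustration, assuming records arrive as dictionaries; the function name `prepare_records`, the field names and the sample records are all hypothetical and are not taken from the slides.

```python
def prepare_records(raw_records, essential_fields):
    """Clean raw records and keep only the fields needed for mining."""
    cleaned = []
    seen = set()
    for record in raw_records:
        # Scrub: trim stray whitespace from string values.
        scrubbed = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in record.items()
        }
        # Remove unnecessary items: keep only the essential fields.
        reduced = {field: scrubbed.get(field) for field in essential_fields}
        # Drop records with a missing essential value.
        if any(value in (None, "") for value in reduced.values()):
            continue
        # Drop exact duplicates of a record already kept.
        fingerprint = tuple(reduced[field] for field in essential_fields)
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(reduced)
    return cleaned


# Hypothetical raw data gathered from different sources.
raw = [
    {"customer": " Asha ", "item": "bread", "note": "paid cash"},
    {"customer": "Asha", "item": "bread", "note": ""},
    {"customer": "Ravi", "item": "", "note": "incomplete"},
    {"customer": "Ravi", "item": "milk", "note": None},
]
ready = prepare_records(raw, ["customer", "item"])
print(ready)
```

Only two records survive here: the duplicate and the incomplete record are dropped, and the `note` field is removed as not essential for mining.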
Outcomes of Data Mining
Data mining comprises six activities, which
are known as data mining tasks or types:
1. Classification
2. Estimation
3. Prediction
4. Affinity grouping or Association rules
5. Clustering
6. Description and Visualization
Classification
• Classification consists of examining the features
of a newly presented object and assigning it to a
predefined class, based on the common features
extracted.
• Classification is carried out by developing
training sets with pre-classified examples and
then building a model that fits the description of
the classes.
• In classification, a group of entities is partitioned
based on predefined values of some attributes.
• Classification deals with discrete outcomes: yes
or no, debit card or car loan.
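As an illustration of classifying a new object from pre-classified examples, here is a minimal sketch using a k-nearest-neighbour vote. The slides do not name a particular algorithm, so k-nearest-neighbour is an assumption chosen for brevity; the features (age, monthly income in thousands) and the training set are hypothetical.

```python
from collections import Counter
import math


def classify(training_set, features, k=3):
    """Assign `features` the majority class among its k nearest examples."""
    by_distance = sorted(
        training_set,
        key=lambda example: math.dist(example[0], features),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical pre-classified examples: (age, income) -> product taken.
training_set = [
    ((22, 15), "debit card"),
    ((25, 18), "debit card"),
    ((24, 20), "debit card"),
    ((40, 60), "car loan"),
    ((45, 70), "car loan"),
    ((38, 55), "car loan"),
]

print(classify(training_set, (23, 17)))
print(classify(training_set, (42, 65)))
```

The outcome is always one of the predefined classes, which is what makes this classification rather than estimation.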
Estimation
• Based on a person's spending patterns and
age, one can estimate his salary or the
number of children he has.
• Estimation deals with continuously valued
outcomes.
• Classification and estimation are often used
together.
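To make the contrast with classification concrete, the sketch below estimates a continuous value (salary) from spending with a least-squares line. The choice of a straight-line model and the figures are assumptions for illustration only; the slides do not prescribe a method.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly spending and salary, both in thousands.
spending = [10, 20, 30, 40]
salary = [25, 45, 65, 85]

slope, intercept = fit_line(spending, salary)
estimated_salary = slope * 25 + intercept  # estimate for a person spending 25
print(estimated_salary)
```

Unlike a class label, the result can be any value on a continuous scale.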
Prediction
• This task predicts the future behaviour of some values.
For example, based on a person's education, current job
and trends in the industry, one can predict that his
salary will be a certain amount by the year 2008.
• Predictive tasks feel different because the records are
classified according to some predicted future behaviour
or estimated future value.
• With prediction, the only way to check the accuracy of the
classification is to wait and see.
• Historical data is used to build a model that explains current
observed behaviour; when applied to current inputs, the model
can predict future behaviour.
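A minimal sketch of the idea, assuming a constant yearly growth rate learned from historical data and then extrapolated to a future year. The growth model, the function names and the salary history are hypothetical; a real model would also use the education, job and industry trends mentioned above.

```python
def average_growth(history):
    """Average yearly growth factor between the first and last known years."""
    years = sorted(history)
    first, last = years[0], years[-1]
    return (history[last] / history[first]) ** (1 / (last - first))


def predict(history, target_year):
    """Extrapolate the last known value to `target_year`."""
    last = max(history)
    return history[last] * average_growth(history) ** (target_year - last)


# Hypothetical salary history (in thousands) keyed by year.
history = {2004: 100.0, 2005: 110.0, 2006: 121.0}

predicted_2008 = predict(history, 2008)
print(round(predicted_2008, 2))
```

The prediction can only be scored once the actual 2008 figure is known, which is the "wait and see" check described above.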
Affinity grouping or Association rules
• The task of affinity grouping is to determine
which things go together, e.g. who are the
people that travel together? What are the
items that are purchased together?
• Affinity grouping can also be used to identify
cross-selling opportunities and to design attractive
packages or groupings of products and services.
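The question "which items are purchased together?" can be sketched by counting item pairs across baskets and reporting their support and confidence. This is a simplified illustration restricted to pairs, not a full association-rule algorithm; the function name, the threshold and the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair
        for basket in baskets
        for pair in combinations(sorted(set(basket)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        # Confidence: of the baskets with the antecedent, how many have both.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules


# Hypothetical shopping baskets.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["butter", "milk"],
]

for antecedent, consequent, support, confidence in pair_rules(baskets):
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

Pairs with high support and confidence are candidates for cross-selling or for bundling into a package.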
Clustering
• Clustering is a D.M task that is often confused with
classification; unlike classification, it has no predefined classes.
• Clusters are formed by analysing the data. For example, group X
prefers the Zen, group Y the Ford Ikon and group Z the Nano.
• Once clusters are obtained, each cluster can be
examined and mined further for other outcomes such as
estimation and classification.
• Clustering is often done as a prelude to some other
form of data mining or modeling.
• Clustering might be the first step in a market segmentation
effort.
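A minimal sketch of forming clusters from the data alone, using k-means on one-dimensional values. K-means is an assumption here, since the slides do not name an algorithm; the budgets (in arbitrary units) and the starting centroids are hypothetical.

```python
def k_means(points, centroids, iterations=10):
    """Plain k-means on 1-D values, starting from the given centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        # Assignment step: put each point in the cluster of its nearest centroid.
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: abs(point - centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical car budgets of customers; no class labels are given.
budgets = [1.0, 1.2, 1.1, 3.0, 3.2, 3.1, 6.0, 6.4, 6.2]

centroids, clusters = k_means(budgets, centroids=[1.0, 3.0, 6.0])
print(clusters)
```

Each resulting group can then be examined on its own, for example to see which car model its members prefer.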
Description and Visualization
• For example, John usually goes shopping after he
goes to the bank, but last week he went to
church after shopping; such a departure from the
usual pattern is a deviation.
• Anomaly detection is a form of deviation
detection and is used for applications such
as fraud detection and medical illness
detection.
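Deviation detection can be sketched with a simple z-score test that flags values lying far from the mean. The z-score rule, the threshold of 2.5 and the transaction amounts are assumptions for illustration; real fraud detection would use richer models.

```python
import statistics


def find_anomalies(values, threshold=2.5):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mean) / spread > threshold]


# Hypothetical card transaction amounts for one customer.
amounts = [20, 22, 19, 21, 23, 20, 18, 22, 21, 500]

print(find_anomalies(amounts))
```

With small samples a single extreme value inflates the standard deviation, so the threshold has to be chosen with the sample size in mind.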