Data Mining Bizatch
Data Mining
By: Thai Hoa Nguyen Pham
Data Mining
Define Data Mining
Classification
Association
Clustering
Define Data Mining
Also known as KDD (Knowledge Discovery
in Databases).
Data mining is the semiautomatic process of
analyzing data to find useful patterns.
Why semiautomatic?
The data must be preprocessed manually before
mining, and the discovered patterns must be
postprocessed manually afterward.
Examples of Data Mining
A simple example is a clothing retail store.
A data mining system could be used to list the
customers who often buy t-shirts during the summer
season.
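The t-shirt example can be sketched in a few lines of Python. This is only an illustration: the purchase records, the choice of June through August as "summer", and the threshold of two purchases for "often" are all assumptions, not part of the original example.

```python
from collections import Counter

# Hypothetical purchase records: (customer, item, month number).
purchases = [
    ("ana", "t-shirt", 6),
    ("ana", "t-shirt", 7),
    ("ana", "jeans", 7),
    ("ben", "t-shirt", 12),
    ("ben", "jacket", 1),
    ("cy", "t-shirt", 8),
    ("cy", "t-shirt", 6),
    ("cy", "t-shirt", 7),
    ("dee", "t-shirt", 7),
]

SUMMER_MONTHS = {6, 7, 8}  # assumption: northern-hemisphere summer
MIN_PURCHASES = 2          # assumption: "often" means at least two purchases


def frequent_summer_tshirt_buyers(records, min_purchases=MIN_PURCHASES):
    """Return customers with at least min_purchases summer t-shirt purchases."""
    counts = Counter(
        customer
        for customer, item, month in records
        if item == "t-shirt" and month in SUMMER_MONTHS
    )
    return sorted(c for c, n in counts.items() if n >= min_purchases)


print(frequent_summer_tshirt_buyers(purchases))  # ['ana', 'cy']
```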
Another example is the urban legend of
how Walmart used data mining to find a correlation
between customers buying beer and baby diapers, and
then put the two aisles close together to increase
profits.
Classification
Assume the items in a database are already
grouped into classes. A problem arises when a
new item is added to the database.
The class of the new item is unknown, so it
has to be predicted from the item's other
attributes. Classification rules solve this
problem.
Example of a rule
For every person P, P.degree = masters and P.income > 75,000 => P.credit = excellent
For every person P, P.degree = bachelors and P.income < 50,000 => P.credit = bad
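As a minimal sketch, the two rules above translate directly into a Python function. The function name classify_credit and the choice to return None when no rule applies are assumptions made for illustration.

```python
def classify_credit(degree, income):
    """Apply the two example rules; return None when neither rule fires."""
    if degree == "masters" and income > 75_000:
        return "excellent"
    if degree == "bachelors" and income < 50_000:
        return "bad"
    return None  # the two rules say nothing about any other case


print(classify_credit("masters", 80_000))    # excellent
print(classify_credit("bachelors", 40_000))  # bad
print(classify_credit("masters", 60_000))    # None
```

Two rules alone leave most people unclassified, which is why a full classifier needs a rule set (or a tree) that covers every case.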
Decision Tree Classifiers
Widely used technique for classification.
Each internal node has an associated function or
predicate.
Each leaf node has an associated class.
Example of Decision Tree Classifiers
[Figure: decision tree diagram with the root, the function nodes, and the class leaves labeled]
Example of Decision Tree Classifiers
Internal nodes (functions) are inside the
boxes: degree (the root) and income.
Leaf nodes (associated classes) are the four
different circles: bad, average, good, and
excellent.
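A tree like this one can be sketched in Python as a root test on degree followed by a test on income. The income cutoffs and the branch layout below are assumptions for illustration, since the figure's actual values are not given in the text.

```python
def income_node(cutoffs, default):
    """Build an internal node that tests income.

    cutoffs is a list of (upper_bound, label) pairs in increasing order;
    the node returns the label of the first bound that income does not exceed.
    """
    def decide(person):
        for upper_bound, label in cutoffs:
            if person["income"] <= upper_bound:
                return label
        return default
    return decide


# Root node: one branch per degree value (hypothetical cutoffs).
BRANCHES = {
    "bachelors": income_node([(50_000, "bad"), (100_000, "average")], "good"),
    "masters": income_node(
        [(25_000, "bad"), (50_000, "average"), (75_000, "good")], "excellent"
    ),
}


def classify(person):
    """Walk from the root (degree) through an income node to a leaf class."""
    decide = BRANCHES[person["degree"]]
    return decide(person)


print(classify({"degree": "masters", "income": 80_000}))    # excellent
print(classify({"degree": "bachelors", "income": 60_000}))  # average
```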
Association
An example of an association for beer and
diapers would be:
Beer => Diapers
As already mentioned, the above association
means that customers who buy beer often
buy diapers, too.
Association Rules
Support: a measure of what fraction of the
population satisfies both the antecedent and
the consequent. For example, in the
association below:
milk => screwdrivers
Very few purchases are likely to include both
milk and screwdrivers, so the rule has low support.
An association with a higher support percentage
is worth more attention than one with a lower
percentage.
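Support can be computed by counting the purchases that contain every item in the rule. The following is a minimal sketch; the five shopping baskets and the function name support are made up for illustration.

```python
# Hypothetical shopping baskets, one set of items per purchase.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "screwdrivers"},
    {"bread", "jam"},
]


def support(antecedent, consequent, baskets):
    """Fraction of all baskets that contain both antecedent and consequent."""
    both = antecedent | consequent
    hits = sum(1 for basket in baskets if both <= basket)
    return hits / len(baskets)


print(support({"bread"}, {"milk"}, transactions))         # 0.4
print(support({"milk"}, {"screwdrivers"}, transactions))  # 0.2
```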
Association Rule 2
Confidence: a measure of how often the
consequent is true when the antecedent is true.
bread => milk
For example, if the association above had a
confidence of 50 percent, it means that 50
percent of the purchases that include bread
also include milk. The other 50 percent of
bread purchases do not include milk.
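Confidence only looks at the purchases that contain the antecedent, then asks how many of those also contain the consequent. The sketch below uses made-up baskets, chosen so that bread => milk comes out at 50 percent; the function name confidence is an assumption.

```python
# Hypothetical shopping baskets, one set of items per purchase.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "screwdrivers"},
    {"bread", "jam"},
]


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that also
    contain the consequent. Returns 0.0 if the antecedent never occurs."""
    with_antecedent = [b for b in baskets if antecedent <= b]
    if not with_antecedent:
        return 0.0
    hits = sum(1 for b in with_antecedent if consequent <= b)
    return hits / len(with_antecedent)


# Bread appears in 4 baskets; 2 of those also contain milk.
print(confidence({"bread"}, {"milk"}, transactions))  # 0.5
```

Note that confidence is not symmetric: with the same baskets, milk => bread has a different value, because milk appears in three baskets rather than four.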
Clustering
Clustering refers to finding clusters of points
in the given data and grouping them into different
subsets.
Widely used clustering techniques:
hierarchical clustering, which can be either
agglomerative or divisive.
Types of Clustering
Hierarchical: clustering that builds a hierarchy
of clusters, with small clusters nested inside larger ones.
Agglomerative: starts by building small clusters,
then progressively merges them into larger clusters.
Divisive: begins with the whole set and successively
divides it into smaller clusters.
Example of agglomerative hierarchical
clustering
An example of agglomerative clustering: the separate elements of a
set are merged at each internal node until the last merge, “abcdef”,
is achieved.
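The merge sequence can be sketched in Python with a simple agglomerative procedure. This is an illustration under stated assumptions: the one-dimensional positions of the points a to f are invented, and single linkage (distance between the closest members of two clusters) is just one possible merge criterion.

```python
from itertools import combinations

# Hypothetical 1-D positions for the six elements.
points = {"a": 0.0, "b": 5.0, "c": 6.0, "d": 10.0, "e": 10.5, "f": 12.0}


def single_link(c1, c2, positions):
    """Distance between two clusters: the gap between their closest members."""
    return min(abs(positions[x] - positions[y]) for x in c1 for y in c2)


def agglomerate(positions):
    """Repeatedly merge the two closest clusters; return the merge order."""
    clusters = [(name,) for name in sorted(positions)]
    merges = []
    while len(clusters) > 1:
        left, right = min(
            combinations(clusters, 2),
            key=lambda pair: single_link(pair[0], pair[1], positions),
        )
        clusters.remove(left)
        clusters.remove(right)
        merged = tuple(sorted(left + right))
        clusters.append(merged)
        merges.append("".join(merged))
    return merges


print(agglomerate(points))  # ['de', 'bc', 'def', 'bcdef', 'abcdef']
```

Reading the list from left to right corresponds to moving up the tree, from the smallest clusters to the final cluster that contains every element.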
Other types of mining
Text Mining: applies data mining techniques to textual documents.
For example, a tool can form clusters from the pages that users
have visited. If a user asks for sites containing the keyword
“Japan”, the tool returns a list of the sites that use the keyword
“Japan” the most.
Data Visualization: helps users examine large volumes of
data and detect patterns visually. Instead of describing
problems through text, visual displays can use maps and
charts with a color-coding scheme to pinpoint where a
problem is.
Example of Text Mining
This example shows what happens when a user does a
search for “Japan”. The points closer to the center of the
circle have more information on Japan. We can think of the
points as websites or research articles.
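The idea of ranking documents by how much they say about a keyword can be sketched as a simple word count. This is a simplification for illustration only: the three documents are invented, and counting keyword occurrences is an assumption, not necessarily how the pictured tool scores its results.

```python
import re

# Hypothetical documents, keyed by a short name.
documents = {
    "travel-guide": "Japan travel tips: visiting Japan in spring, Japan rail pass",
    "food-blog": "Ramen from Japan and pasta from Italy",
    "car-news": "New electric cars announced this week",
}


def keyword_count(text, keyword):
    """Count case-insensitive whole-word occurrences of keyword in text."""
    words = re.findall(r"[a-z]+", text.lower())
    return words.count(keyword.lower())


def rank_by_keyword(docs, keyword):
    """Return (name, count) pairs for matching documents, most mentions first."""
    scored = [(name, keyword_count(text, keyword)) for name, text in docs.items()]
    matching = [(name, n) for name, n in scored if n > 0]
    return sorted(matching, key=lambda item: item[1], reverse=True)


print(rank_by_keyword(documents, "Japan"))
# [('travel-guide', 3), ('food-blog', 1)]
```

In the picture, a document with a higher count would sit closer to the center of the circle.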
Example of Data Visualization
A map like this could represent many things. For example, the color of
each state could show its poverty level or how many apples it grows.
References
Data mining. (2006, October 27). In Wikipedia, The Free Encyclopedia. Retrieved
05:59, October 30, 2006, from
http://en.wikipedia.org/w/index.php?title=Data_mining&oldid=84059363
Data clustering. (2006, October 29). In Wikipedia, The Free Encyclopedia.
Retrieved 06:03, October 30, 2006, from
http://en.wikipedia.org/w/index.php?title=Data_clustering&oldid=84478616
GISmatters. (2004-2006). Retrieved October 31, 2006, from
http://www.gismatters.com/over65.html
Martin, G., Spath, J. (2000) Kryptasthesie. Retrieved on October 31, 2006 from
http://www.projekttriangle.com/work/work_rwe.htm?research
Silberschatz, A., Korth, H., Sudarshan, S. (2002). Database System Concepts. New
York: McGraw-Hill.