Data Mining by Mandeep Jandir

Data Mining
Mandeep Jandir
CS157B
What is Data Mining?
Data mining, or knowledge discovery, is
the process of discovering hidden patterns
and relationships in data in order to make
better and more informed decisions.
Data mining tools predict behaviors and
future trends, allowing businesses to make
knowledge-driven decisions.
Why use Data Mining?
Data mining is a technique that helps
individuals or companies find useful
information in large amounts of data so
they can make better decisions.
Reduce risks
Find problems and issues
Save money
Make high-confidence predictions
Simplify information
Goals of Data Mining
Prediction


Data mining can show how certain attributes
within the data will behave in the future.
Ex. - certain seismic wave patterns may
predict an earthquake with high probability.
Identification

Data patterns can be used to identify the
existence of an item, an event, or an activity.
Goals of Data Mining (cont’d)
Classification


Data mining can partition the data so that
different classes or categories can be
identified based on combinations of
parameters.
Ex. - customers in a supermarket can be
categorized into discount-seeking shoppers,
shoppers in a rush, loyal regular shoppers,
shoppers attached to name brands, and
infrequent shoppers.
Goals of Data Mining (cont’d)
Optimization

Optimize the use of limited resources such as
time, space, money, or materials and
maximize output variables such as sales or
profits under a given set of constraints.
Types of Knowledge Discovered
during Data Mining
Knowledge is often classified as inductive
versus deductive.


Deductive knowledge deduces new
information based on applying pre-specified
logical rules of deduction on the given data.
Data mining addresses inductive knowledge,
which discovers new rules and patterns from
the supplied data.
Types of Knowledge Discovered
during Data Mining (cont’d)
It is common to describe knowledge
discovered during data mining as:
Association Rules
Classification hierarchies
Sequential patterns
Patterns within time series
Clustering
Types of Association Rules
Market-Basket Model, Support, and
Confidence
Apriori Algorithm
Sampling Algorithm
Frequent-Pattern Tree Algorithm
Partition Algorithm
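Support and confidence from the market-basket model can be illustrated with a short sketch. It assumes the usual definitions: the support of an itemset is the fraction of transactions that contain it, and the confidence of a rule LHS => RHS is support(LHS ∪ RHS) / support(LHS). The function names and the sample baskets are hypothetical and chosen only for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(lhs, rhs, transactions):
    """Confidence of the rule lhs => rhs: support(lhs | rhs) / support(lhs).

    Raises ZeroDivisionError if lhs never occurs in the transactions.
    """
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    return support(lhs | rhs, transactions) / support(lhs, transactions)


# Hypothetical market baskets, one frozenset of items per transaction.
baskets = [
    frozenset({"milk", "bread", "juice"}),
    frozenset({"milk", "juice"}),
    frozenset({"milk", "eggs"}),
    frozenset({"bread", "cookies", "coffee"}),
]

milk_juice_support = support({"milk", "juice"}, baskets)
milk_to_juice_conf = confidence({"milk"}, {"juice"}, baskets)
```

In this sample, {milk, juice} appears in 2 of 4 baskets (support 0.5), and milk appears in 3 of 4, so the rule milk => juice has confidence 2/3.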
Apriori Algorithm
Principle: Any subset of a frequent itemset
must be frequent.
Generate candidate k-itemsets by joining large (k-1)-itemsets,
then delete any candidate that is not large.
Notation:
Apriori Algorithm (cont’d)
Input: Database of m transactions, D, and
a minimum support, mins, represented as
a fraction of m.
Output: Frequent itemsets L1, L2, …, Lk
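The input and output described above can be sketched in Python as follows. This is a minimal, unoptimized illustration rather than a reference implementation: the function name `apriori`, the helper `is_large`, and the sample baskets are assumptions made for this example. It joins large (k-1)-itemsets into candidate k-itemsets, prunes candidates that have a (k-1)-subset that is not large, and keeps the candidates whose support is at least mins.

```python
from itertools import combinations


def apriori(transactions, mins):
    """Return [L1, L2, ..., Lk], the frequent itemsets grouped by size.

    transactions: the database D, an iterable of item collections.
    mins: minimum support, a fraction of the number of transactions m.
    """
    transactions = [frozenset(t) for t in transactions]
    m = len(transactions)

    def is_large(itemset):
        # An itemset is large (frequent) if its support is at least mins.
        count = sum(1 for t in transactions if itemset <= t)
        return count / m >= mins

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if is_large(frozenset([i]))}
    levels = []
    k = 1
    while current:
        levels.append(current)
        k += 1
        # Join step: unions of two large (k-1)-itemsets that have size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be large.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count step: keep only candidates that meet the minimum support.
        current = {c for c in candidates if is_large(c)}
    return levels


# Hypothetical database D with m = 4 transactions.
baskets = [
    {"milk", "bread", "juice"},
    {"milk", "juice"},
    {"milk", "eggs"},
    {"bread", "cookies", "coffee"},
]
levels = apriori(baskets, mins=0.5)
```

With mins = 0.5 on this sample, L1 holds {milk}, {bread}, and {juice}, and L2 holds only {milk, juice}; no 3-itemset candidate can be formed, so the loop stops there.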
References
http://en.wikipedia.org/wiki/Data_mining
http://www.megaputer.com/dm/dm101.php3#whyuse
www.icaen.uiowa.edu/~comp/Public/Aprior.pdf
Elmasri, R. and Navathe, S.:
Fundamentals of Database Systems, 5th
ed., Pearson Addison-Wesley