Data Mining by Timothy Vu

Data Mining
Database Systems
Timothy Vu
Mining
Mining is the extraction of valuable minerals or other geological materials from
the earth, such as bauxite, coal, diamonds, iron, precious metals, lead, limestone,
nickel, phosphate, rock salt, tin, uranium, petroleum, natural gas, and even
water.
What is mined is usually something valuable, rare, or useful.
Database
What is Data Mining?
Data Mining, also known as Knowledge Discovery in Databases (KDD), is the
process of automatically searching large volumes of data for patterns. To
achieve this, data mining uses computational techniques from statistics,
machine learning, and pattern recognition.
Machine learning - a method of creating computer programs by analyzing
data sets.
Pattern recognition - classifying data (patterns) based either on a priori
knowledge or on statistical information extracted from the patterns.
Why Data Mining?
Data mining helps individuals and companies extract useful information from
large amounts of data so that they can make better decisions:
- Reduce risks
- Find problems and issues
- Save money
- Make high-confidence predictions
- Simplify information
Discussion Topics
1) Classification
2) Regression
3) Association
4) Clustering
Classifiers
Decision-Tree Classifiers – each leaf node has an associated class and each
internal node has a predicate; a record is classified by following the
predicates from the root down to a leaf.
Bayesian Classifiers – estimate the distribution of attribute values for each
class from the training data, then predict the class with the maximum
probability given the record's attribute values.
Neural Net Classifiers – use the training data to train artificial neural nets.
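As a rough illustration of the first two classifiers, the Python sketch below hand-builds a tiny decision tree (a predicate at each internal node, a class at each leaf) and trains a naive Bayes classifier, one simple kind of Bayesian classifier that assumes attributes are independent given the class. The tree, its thresholds, the attribute names, the training rows, and the use of Laplace smoothing are all hypothetical choices made for illustration, not part of the original slides.

```python
from collections import Counter, defaultdict

# ---- Decision tree: internal nodes hold predicates, leaves hold a class ----
class Leaf:
    def __init__(self, label):
        self.label = label              # class associated with this leaf

class Node:
    def __init__(self, predicate, if_true, if_false):
        self.predicate = predicate      # function: record -> bool
        self.if_true = if_true          # subtree taken when the predicate holds
        self.if_false = if_false        # subtree taken otherwise

def classify_tree(tree, record):
    # Follow predicates from the root until a leaf is reached.
    while isinstance(tree, Node):
        tree = tree.if_true if tree.predicate(record) else tree.if_false
    return tree.label

# Hypothetical hand-built tree for credit ratings (thresholds are made up).
credit_tree = Node(lambda r: r["income"] > 50_000,
                   Leaf("good"),
                   Node(lambda r: r["debt"] < 10_000, Leaf("average"), Leaf("bad")))

# ---- Naive Bayes: per-class attribute-value distributions ----
def train_naive_bayes(rows, labels):
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attr) -> counts of each value
    domains = defaultdict(set)           # attr -> values seen in training
    for row, cls in zip(rows, labels):
        for attr, value in row.items():
            value_counts[(cls, attr)][value] += 1
            domains[attr].add(value)
    return class_counts, value_counts, domains

def classify_bayes(model, row):
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())

    def score(cls):
        n = class_counts[cls]
        p = n / total                    # prior P(class)
        for attr, value in row.items():
            k = len(domains[attr]) + 1   # +1 leaves room for unseen values
            # Laplace-smoothed estimate of P(attr = value | class)
            p *= (value_counts[(cls, attr)][value] + 1) / (n + k)
        return p

    # Predict the class with the maximum (unnormalized) posterior probability.
    return max(class_counts, key=score)

# Hypothetical training data.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "overcast", "windy": "no"},
]
labels = ["play", "play", "stay", "stay", "play"]
model = train_naive_bayes(rows, labels)
print(classify_bayes(model, {"outlook": "rainy", "windy": "yes"}))  # stay
```

Smoothing is included because, without it, a single attribute value never seen with a class would force that class's probability to zero.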
Regression
Regression – deals with predicting a value rather than a
class.
Linear Regression – predicts values with a formula that is linear in its
coefficients, such as a polynomial; curve fitting means finding the
coefficients that best fit the training data (commonly, those that minimize
the sum of squared errors).
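A minimal least-squares sketch of the simplest case, fitting a straight line y = a0 + a1*x to one input variable, is shown below. The function names fit_line and predict are hypothetical, and the closed-form slope and intercept formulas apply only to this single-variable case.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a0 + a1 * x; returns (a0, a1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a1 = sxy / sxx                  # slope minimizing the sum of squared errors
    a0 = mean_y - a1 * mean_x       # the fitted line passes through the means
    return a0, a1

def predict(coeffs, x):
    a0, a1 = coeffs
    return a0 + a1 * x

coeffs = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(coeffs)                # (1.0, 2.0)
print(predict(coeffs, 10))   # 21.0
```

For a higher-degree polynomial, NumPy's numpy.polyfit(x, y, deg) performs the same kind of least-squares fit and returns the coefficients highest power first.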
Associations
Finding associations or relationships between two or more items.
Support – the fraction of the population that satisfies both the
antecedent and the consequent of the rule.
e.g., MILK => Screwdrivers has low support, since few purchases include both.
Confidence – how often the consequent is true when the antecedent is true.
e.g., MILK => Bread has high confidence if most purchases that include milk
also include bread.
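Support and confidence can be computed directly from a list of transactions. In the sketch below each transaction is a set of purchased items; the basket data is made up to mirror the milk examples above, and the helper names count, support, and confidence are hypothetical.

```python
def count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t)

def support(transactions, antecedent, consequent):
    # Fraction of all transactions containing both sides of the rule.
    return count(transactions, set(antecedent) | set(consequent)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    # Among transactions containing the antecedent, the fraction that
    # also contain the consequent.
    base = count(transactions, antecedent)
    both = count(transactions, set(antecedent) | set(consequent))
    return both / base if base else 0.0

# Hypothetical market baskets, one set of items per purchase.
baskets = [
    {"milk", "bread"}, {"milk", "bread", "eggs"}, {"milk", "bread"},
    {"milk", "bread", "butter"}, {"milk", "eggs"}, {"bread", "eggs"},
    {"milk", "bread"}, {"milk", "screwdrivers"}, {"bread", "butter"},
    {"milk", "bread", "eggs"},
]
print(support(baskets, {"milk"}, {"screwdrivers"}))   # 0.1
print(support(baskets, {"milk"}, {"bread"}))          # 0.6
print(confidence(baskets, {"milk"}, {"bread"}))       # 0.75
```

Here 8 of the 10 baskets contain milk and 6 contain both milk and bread, so MILK => Bread has confidence 6/8, while MILK => Screwdrivers appears in only one basket.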
Clustering
Clustering is the classification of similar objects into different groups, or more
precisely, the partitioning of a data set into subsets (clusters), so that the data in
each subset (ideally) share some common trait - often proximity according to
some defined distance measure.
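The slide does not name a clustering algorithm; k-means is one widely used choice that partitions points by Euclidean distance, and it is used here only as an example. In this sketch the number of clusters k, the random seed, the iteration cap, and the 2-D sample points are assumptions; math.dist requires Python 3.8 or later.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Partition 2-D points into k clusters by Euclidean distance (k-means)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # initial centers: k input points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                xs, ys = zip(*members)
                new_centroids.append((sum(xs) / len(members), sum(ys) / len(members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's center
        if new_centroids == centroids:
            break                                   # converged: centers stopped moving
        centroids = new_centroids
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

k-means needs k chosen in advance and can settle on different clusters depending on the starting centers, which is why the seed is fixed here.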
Applications of Data Mining
1. Predictions
- Stock Market
- Earthquakes
- NBA Games
2. Association
- Store Inventory
- Fashion Trends
3. Descriptive Patterns
- Disease Analysis
- Image Recognition
- Fraud Detection
Gather Data
Electrocardiogram
Disease Analysis
References
Silberschatz, A., H. F. Korth, and S. Sudarshan. Database System Concepts, 5th ed.
McGraw-Hill, 2006.
Runge, Marschall, Magnus Ohman, and Frank Netter. Netter's Cardiology
(Netter Clinical Science). W.B. Saunders Company, 2004.
"Data mining." Wikipedia. 4/1/2006. <http://en.wikipedia.org/wiki/Data_Mining>