Overview of Data Mining Methods (MS PPT)


Data Mining Tools
Overview
Business Intelligence for Managers
Data Mining Definition Revisited
- Analysis of large quantities of data
- Knowledge discovery in databases
- Extracting implicit, previously unknown information from large volumes of raw data

Instances and Features
- Typically, the database will be a collection of instances
- Each instance will have values for a given set of features
- From database theory: instances are rows, features are columns
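
As a minimal sketch of the rows-and-columns view above, the snippet below stores a few instances as rows and reads one feature as a column. The feature names, the loan-style values, and the helper `feature_column` are all hypothetical, chosen only for illustration.

```python
# Hypothetical data: each inner list is one instance (a row),
# and each position in the list is one feature (a column).
features = ["income", "loan_amount", "years_employed"]
instances = [
    [52000, 10000, 4],
    [31000, 15000, 1],
    [78000, 20000, 9],
]


def feature_column(instances, features, name):
    """Return the values of one feature across all instances."""
    j = features.index(name)
    return [row[j] for row in instances]


# One column: the "income" value of every instance.
incomes = feature_column(instances, features, "income")
```

Every instance has exactly one value per feature, so each row has the same length as the feature list.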

Classification
- Supervised learning
- Suppose instances have been categorized into classes and the database includes this categorization
- Goal: using the “knowledge” in the database, classify a given instance

Classifiers
[Diagram: feature values X1, X2, X3, …, Xn go into the Classifier, which outputs the category Y; the Classifier draws on a DB, a collection of instances with known categories]

Classifier intelligence
- A classifier’s intelligence will be based on a dataset consisting of instances with known categories
- Typical goal of a classifier: predict a category for a new instance that is rationally consistent with the dataset

BI Examples
- A loan officer in a bank uses a system that automatically approves or disapproves a loan application based on previous loan applications and decisions
- An admissions officer in a university uses a system that automatically makes an admission decision (accept, reject, wait-list), based on previous applicants’ data and the decisions made on them

Data mining method example: k-nearest neighbors
- For a given instance T, get the top k database instances that are “nearest” to T
  - Select a reasonable distance measure
- Inspect the categories of these k instances and choose the category C that represents the most instances
- Conclude that T belongs to category C
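
The k-nearest neighbors steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Euclidean distance stands in for the "reasonable distance measure", the tiny two-feature database and its category labels are made up, and ties in the vote are broken arbitrarily by `Counter`.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance; an assumed choice of distance measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(database, t, k, distance=euclidean):
    """Classify instance t from a list of (feature_values, category) pairs."""
    # Get the k database instances nearest to t.
    nearest = sorted(database, key=lambda inst: distance(inst[0], t))[:k]
    # Choose the category that represents the most of those k instances.
    votes = Counter(category for _, category in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical database of instances with known categories.
database = [
    ((1.0, 1.0), "approve"),
    ((1.2, 0.8), "approve"),
    ((0.9, 1.1), "approve"),
    ((5.0, 5.0), "reject"),
    ((5.2, 4.9), "reject"),
]

result = knn_classify(database, (1.1, 1.0), k=3)
```

Here the three nearest instances to (1.1, 1.0) are all in the "approve" category, so that is the category assigned. In practice the features would usually be scaled first so that no single feature dominates the distance.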

Clustering
(Chapter 5 of text)
- Unsupervised learning
- Classes/categories are not known, but unexpected groupings (clusters) are discovered
- Clustering provides insight into the population segments
Clustering
[Figure: plot with axes Feature 1 and Feature 2]

Goal of Clustering
- Input: the database of instances, and possibly some predetermined number of clusters
- Output: the same database of instances partitioned into clusters

BI Examples
- After clustering the current university student population, it was discovered that there is a large group of female marketing majors coming from a particular exclusive school who tend to get high grades
  - Business response: focus recruitment on that school; push the university’s marketing program
- Customer segment characteristics and spending patterns can direct business strategies

Data mining method example: k-means
- Guess the number of clusters (k)
- Guess cluster centers from the samples (these will be called centroids)
- Determine cluster membership based on the distance from the centroids
- Repeatedly refine the centroids by taking the average (mean) of the members of each cluster, then redetermine membership
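
The k-means steps can be sketched as follows. This is a minimal illustration, not a production implementation: it assumes Euclidean distance, picks the initial centroids at random from the samples, stops when the centroids no longer change, and keeps the previous centroid if a cluster ends up empty. The function names and the sample points are hypothetical.

```python
import math
import random


def mean_point(points):
    """Component-wise average (mean) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def k_means(instances, k, iterations=100, seed=0):
    rng = random.Random(seed)
    # Guess cluster centers (centroids) from the samples.
    centroids = rng.sample(instances, k)
    for _ in range(iterations):
        # Determine membership: each instance joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in instances:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Refine each centroid as the mean of its members
        # (assumption: an empty cluster keeps its previous centroid).
        new_centroids = [
            mean_point(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # centroids stopped moving
        centroids = new_centroids
    return centroids, clusters


# Hypothetical two-feature instances forming two well-separated groups.
instances = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
             (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(instances, k=2)
```

With well-separated data like this, the output partitions the instances into the two visible groups. On less tidy data the result can depend on the initial guess, so k-means is often run several times with different starting centroids.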

Summary
- Two sub-areas of data mining: supervised (classification) and unsupervised (clustering) learning methods
- For both types of methods, intelligent systems can be created to support business decision making
