Transcript Chapter 26
Data Mining
Enterprise systems
infrastructure and architecture
DT211 4
• Note for next year: the data mining material in
Laudon and Laudon and the Kopec paper give a few
more good ideas.
Data Mining
• The process of extracting valid, previously
unknown, comprehensible, and actionable
information from large databases and
using it to make crucial business
decisions.
• Involves the analysis of data and the use
of software techniques for finding hidden
and unexpected patterns and relationships
in sets of data.
Data Mining
• Data mining tools use techniques, e.g. from AI,
to help:
– Predict future trends
– Segment datasets
– Find “product” associations
• This allows businesses to make proactive,
knowledge-driven decisions.
Data mining: A.I. techniques.
• The most commonly used A.I. techniques in
data mining are:
– Decision trees: Tree-shaped structures that represent sets of
decisions. These decisions generate rules for the classification
of a dataset.
– Nearest neighbour method: A technique that classifies each
record in a dataset based on a combination of the classes of the
k record(s) most similar to it in a historical dataset. Sometimes
called the k-nearest neighbour technique. It relies on a similarity
measure, as clustering does, but it is a classification technique
because the historical records carry known classes.
– Rule induction: The extraction of useful if-then rules from data
based on statistical significance.
– Artificial neural networks: Predictive models that learn through
training and resemble biological neural networks in structure.
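The nearest neighbour method from the list above can be sketched in a few lines. This is a minimal illustration only: the historical records (age, income in thousands) and their labels are hypothetical, the function name knn_classify is made up for this example, and Euclidean distance with a simple majority vote is one assumed choice among several.

```python
import math
from collections import Counter

def knn_classify(history, query, k=3):
    """Return the majority class among the k historical records closest to query."""
    # Sort the historical records by Euclidean distance to the query record.
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], query))[:k]
    # Combine the classes of the k nearest records by majority vote.
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical history: ((age, income in thousands), responded to promotion)
HISTORY = [
    ((25, 22), "No"),
    ((30, 28), "No"),
    ((41, 35), "Yes"),
    ((43, 38), "Yes"),
    ((39, 52), "Yes"),
    ((55, 45), "No"),
]

print(knn_classify(HISTORY, (40, 36)))
print(knn_classify(HISTORY, (27, 24)))
```

In practice the attributes would usually be scaled first so that no single attribute dominates the distance.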
How Data Mining Works
• For example, say that you are the director
of marketing for an insurance company and
you'd like to acquire some new customers.
– You could just randomly go out and mail
coupons to the general population. However,
you would be unlikely to achieve the required
result.
– Alternatively, as the marketing director you
have access to a lot of information about all of
your customers: their age, sex, income range
and credit card insurance.
How Data Mining Works
                                                      Customers   Prospects
General information (e.g. demographic data)           Known       Known
Proprietary information (e.g. customer transactions)  Known       Target
• The goal in prospecting is to make some
decisions about the information in the lower
right hand quadrant based on the model that we
build going from Customer General Information
to Customer Proprietary Information.
An Algorithm for Building Decision Trees
The following is a decision tree algorithm:
1. Let T be the set of training instances.
2. Choose an attribute that best differentiates the instances in T.
3. Create a tree node whose value is the chosen attribute.
   - Create child links from this node, where each link represents a
     unique value for the chosen attribute.
   - Use the child link values to further subdivide the instances into
     subclasses.
4. For each subclass created in step 3:
   - If the instances in the subclass satisfy predefined criteria, or if
     the set of remaining attribute choices for this path is null, specify
     the classification for new instances following this decision path.
   - If the subclass does not satisfy the criteria and there is at least
     one attribute to further subdivide the path of the tree, let T be the
     current set of subclass instances and return to step 2.
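The steps above can be sketched in Python using the records of Table 3.1. Several choices here are assumptions rather than part of the algorithm as stated: "best differentiates" is measured by how many training instances are classified correctly when each branch predicts its majority class, the "predefined criteria" are taken to mean that all instances in a subclass share one class, Age is left out so that every attribute is categorical, and the names build, classify and score are illustrative.

```python
from collections import Counter

# Table 3.1 without Age:
# (income range, credit card insurance, sex, life insurance promotion)
RECORDS = [
    ("40-50K", "No", "Male", "No"),
    ("30-40K", "No", "Female", "Yes"),
    ("40-50K", "No", "Male", "No"),
    ("30-40K", "Yes", "Male", "Yes"),
    ("50-60K", "No", "Female", "Yes"),
    ("20-30K", "No", "Female", "No"),
    ("30-40K", "Yes", "Male", "Yes"),
    ("20-30K", "No", "Male", "No"),
    ("30-40K", "No", "Male", "No"),
    ("30-40K", "No", "Female", "Yes"),
    ("40-50K", "No", "Female", "Yes"),
    ("20-30K", "No", "Male", "Yes"),
    ("50-60K", "No", "Female", "Yes"),
    ("40-50K", "No", "Male", "No"),
    ("20-30K", "Yes", "Female", "Yes"),
]
ATTRIBUTES = {"income": 0, "credit_card_insurance": 1, "sex": 2}
TARGET = 3  # index of the class to predict

def split(rows, idx):
    """Step 3: group the instances by their value for the chosen attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[idx], []).append(row)
    return groups

def majority(rows):
    return Counter(r[TARGET] for r in rows).most_common(1)[0][0]

def score(rows, idx):
    """Assumed measure for step 2: instances classified correctly when
    every branch predicts its own majority class."""
    return sum(
        Counter(r[TARGET] for r in group).most_common(1)[0][1]
        for group in split(rows, idx).values()
    )

def build(rows, attrs):
    # Step 4, first case: stop when the subclass is pure or no attributes remain.
    if len({r[TARGET] for r in rows}) == 1 or not attrs:
        return majority(rows)
    # Step 2: choose the attribute with the best score (ties go to the first listed).
    best = max(attrs, key=lambda a: score(rows, ATTRIBUTES[a]))
    remaining = [a for a in attrs if a != best]
    # Steps 3 and 4, second case: one child per value, built recursively.
    return {best: {value: build(group, remaining)
                   for value, group in split(rows, ATTRIBUTES[best]).items()}}

def classify(tree, record):
    """Follow the decision path for a record until a leaf (a class) is reached."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][record[ATTRIBUTES[attr]]]
    return tree

tree = build(RECORDS, list(ATTRIBUTES))
print(tree)
```

A different goodness measure (for example information gain) may pick a different root attribute, so the tree printed here should be read as one possible outcome.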
Table 3.1 • The Credit Card Promotion Database
Income Range   Life Insurance Promotion   Credit Card Insurance   Sex      Age
40–50K         No                         No                      Male     45
30–40K         Yes                        No                      Female   40
40–50K         No                         No                      Male     42
30–40K         Yes                        Yes                     Male     43
50–60K         Yes                        No                      Female   38
20–30K         No                         No                      Female   55
30–40K         Yes                        Yes                     Male     35
20–30K         No                         No                      Male     27
30–40K         No                         No                      Male     43
30–40K         Yes                        No                      Female   41
40–50K         Yes                        No                      Female   43
20–30K         Yes                        No                      Male     29
50–60K         Yes                        No                      Female   39
40–50K         No                         No                      Male     55
20–30K         Yes                        Yes                     Female   19
Income Range (Life Insurance Promotion counts per branch)
  20-30K: 2 Yes, 2 No
  30-40K: 4 Yes, 1 No
  40-50K: 1 Yes, 3 No
  50-60K: 2 Yes
How Data Mining Works
• For instance, a simple model for an
insurance company might be:
– Customers who earn between 50K and 60K
have a life insurance policy.
• This model could then be applied to the
general population to target those for the life
insurance promotion.
• The tree can be more complex, e.g. see the
tree below:
Age
  <= 43: Sex
    Female: Yes (6/0)
    Male: Credit Card Insurance
      No: No (4/1)
      Yes: Yes (2/0)
  > 43: No (3/0)
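As a check, the Age / Sex / Credit Card Insurance tree can be written as a small function and applied to the fifteen records of Table 3.1. This is a sketch: the function name predict is made up here, and the leaf annotation (n/m) is assumed to mean that n training instances reach the leaf and m of them are misclassified.

```python
# Table 3.1 rows: (age, sex, credit card insurance, life insurance promotion)
ROWS = [
    (45, "Male", "No", "No"),
    (40, "Female", "No", "Yes"),
    (42, "Male", "No", "No"),
    (43, "Male", "Yes", "Yes"),
    (38, "Female", "No", "Yes"),
    (55, "Female", "No", "No"),
    (35, "Male", "Yes", "Yes"),
    (27, "Male", "No", "No"),
    (43, "Male", "No", "No"),
    (41, "Female", "No", "Yes"),
    (43, "Female", "No", "Yes"),
    (29, "Male", "No", "Yes"),
    (39, "Female", "No", "Yes"),
    (55, "Male", "No", "No"),
    (19, "Female", "Yes", "Yes"),
]

def predict(age, sex, credit_card_insurance):
    """Predicted life insurance promotion response, following the tree."""
    if age > 43:
        return "No"       # leaf No (3/0)
    if sex == "Female":
        return "Yes"      # leaf Yes (6/0)
    if credit_card_insurance == "Yes":
        return "Yes"      # leaf Yes (2/0)
    return "No"           # leaf No (4/1)

# Rows where the tree disagrees with the recorded outcome.
errors = [row for row in ROWS if predict(*row[:3]) != row[3]]
accuracy = (len(ROWS) - len(errors)) / len(ROWS)
print(errors, accuracy)
```

On this training data the only disagreement is the 29-year-old male without credit card insurance, which matches the single error recorded at the No (4/1) leaf.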
Data Mining Operations
• Data mining operations include:
– Predictive modelling: decision trees,
regression analysis…
– Database segmentation: clustering
techniques
– Link analysis: decision trees, association
rules
Predictive Modelling
• Applications of predictive modelling
include direct marketing; typical
techniques include decision trees.
• Uses observations to form a model of the
important characteristics of some
phenomenon, e.g. the traits associated
with those who will buy property.
[Figure: Simple decision tree example]
Database Segmentation
• Aim is to partition a database into an
unknown number of segments, or
clusters, of similar records.
• Uses clustering techniques in order to
group data
• Applications of database segmentation
include credit card fraud detection….
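One common clustering technique is k-means, sketched below in plain Python. The slide does not name a specific algorithm, so k-means is an assumption here, as are the hypothetical (age, income in thousands) records and the function name kmeans. Note also that k-means needs the number of segments up front, whereas segmentation in general does not know it in advance.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical records: (age, income in thousands)
POINTS = [(22, 20), (25, 24), (27, 22), (48, 55), (52, 60), (50, 58)]

# Two starting centroids, picked by hand for this illustration.
centroids, clusters = kmeans(POINTS, [POINTS[0], POINTS[3]])
print(centroids)
print(clusters)
```

For an application such as fraud detection, records that sit far from every centroid or fall into very small segments would be the ones worth inspecting.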
Database Segmentation using a Scatterplot
[Figure: scatterplot]
Link Analysis
• Aims to establish links between
records, or sets of records, in a
database; one such example would
be association discovery….
• Applications include product affinity
analysis.
• Finds items that imply the presence
of other items in the same event.
Link Analysis - Association Discovery
• Association discovery finds affinities
between items and expresses them as rules.
– e.g. ‘When a customer rents property for
more than 2 years and is more than 25
years old, in 40% of cases the customer
will buy a property. This association
happens in 35% of all customers who rent
properties.’
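The two percentages attached to an association rule are usually called confidence and support. The sketch below computes both for the renting rule over a small set of hypothetical customers, so its figures will not match the 40% and 35% quoted above. The function name rule_stats and the choice to define support as the share of all records where both sides of the rule hold are assumptions.

```python
def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    matches = [r for r in records if antecedent(r)]
    both = [r for r in matches if consequent(r)]
    # Support: share of all records where the whole rule holds.
    support = len(both) / len(records)
    # Confidence: share of antecedent matches where the consequent also holds.
    confidence = len(both) / len(matches) if matches else 0.0
    return support, confidence

# Hypothetical renting customers: (years renting, age, bought a property)
CUSTOMERS = [
    (3, 30, True), (4, 41, True), (5, 28, False), (3, 35, False),
    (6, 52, False), (1, 33, False), (2, 45, False), (4, 22, False),
    (1, 24, True), (3, 27, False),
]

support, confidence = rule_stats(
    CUSTOMERS,
    antecedent=lambda c: c[0] > 2 and c[1] > 25,  # rents > 2 years and age > 25
    consequent=lambda c: c[2],                    # buys a property
)
print(f"support={support:.0%} confidence={confidence:.0%}")
```

A rule is normally reported only when both values clear thresholds chosen by the analyst.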
Examples of Applications of Data Mining
• Retail / Marketing
– Predicting response to mailing campaigns
– Market basket analysis
• Banking:
– Detecting patterns of fraudulent credit card use.
• Insurance
– Claims analysis
• Medicine
– Identifying successful medical therapies for
different illnesses
Data mining in conclusion
• Two critical factors for success with
data mining are:
– a large, well-integrated data warehouse
and
– a well-defined understanding of the
business process within which data mining
is to be applied (e.g. customer prospecting,
retention, campaign management etc.).
Sample question types
• Discuss, using suitable examples, how
data mining can contribute to companies
making proactive, knowledge-driven
decisions which could help with the
formulation of a company's strategy.