Transcript Chapter 1

What is Data Mining?
What is Market Basket Analysis?
Give an example
What is ARROWSMITH?
What is metadata?
Metadata – data about data
age {0_34,35_51,52_max}
gender {FEMALE,MALE}
region {INNER_CITY,TOWN,RURAL,SUBURBAN}
income {0_24386,24387_43758,43759_max}
married {NO,YES}
children {0,1,2,3}
car {NO,YES}
save_act {NO,YES}
current_act {NO,YES}
mortgage {NO,YES}
Market Basket Analysis identifies customers' purchasing habits. It provides insight
into the combination of products within a customer's 'basket'. The term 'basket'
normally applies to a single order. However, the analysis can be applied to other
variations; for example, we often compare all orders associated with a single customer.
Ultimately, the purchasing insights provide the potential to create cross-sell
propositions:
Which product combinations are bought;
When they are purchased; and
In what sequence.
Developing this understanding enables businesses to promote their most profitable
products. It can also encourage customers to buy items that might have otherwise
been overlooked or missed.
Market basket analysis delivers the "Amazon effect" to your business. When you
place an order on Amazon, you are shown a list of potentially interesting products
(based on a profile of what other "similar" customers have ordered). Amazon is
seeking to encourage the purchase of additional items and thereby increase the
average basket value.
Example: Beer and nappies
An observant Wal-Mart store manager discovered a strong association between a
brand of babies' nappies (diapers) and a brand of beer. Analysis of the purchases
revealed that they were made by men, on Friday evenings, mainly between 6pm and
7pm. The supermarket figured out the following rationale:
Because packs of diapers are very large, the wife, who in most cases made the
household purchases, left the diaper purchase to her husband.
Being the end of the working week, the husband and father also wanted to get
some beer in for the weekend.
What did the supermarket do with this knowledge?
They put the premium beer display next to the diapers.
The result was that the fathers who bought diapers and also usually bought beer
now bought the premium beer (the up-sell), as it was so conveniently placed next
to the diapers.
Significantly, the men who had not bought beer before began to purchase it
because it was so visible and handy, just next to the nappies (the cross-sell).
Beer sales skyrocketed.
Support & Confidence
If we have sales data from a store, we can do some analysis.
Imagine there are 1000 customers in one day (one sale each) and we are interested
in two products (A, B).
We can start with frequency: how many times were the products bought together
(A AND B)? Let’s say it’s 200 times.
Then we can calculate what proportion of total sales include both A and B:
200/1000 = 20%. This is called support.
Then we can look at a conditional probability: of the sales that include A, how
many also include B (the relationship A → B)? Let’s say there were 250 sales that
included A, and 200 of these also included B.
The confidence is 200/250 = 80%.
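The arithmetic above can be written out as a short Python sketch. The variable names are our own, chosen for illustration; the figures are the ones from the worked example.

```python
# Figures taken from the worked example above.
total_sales = 1000         # all sales in the day
sales_with_a = 250         # sales that included product A
sales_with_a_and_b = 200   # sales that included both A and B

# Support: share of all sales that contain both A and B.
support = sales_with_a_and_b / total_sales

# Confidence of A -> B: share of the sales containing A that also contain B.
confidence = sales_with_a_and_b / sales_with_a

print(f"support(A and B)   = {support:.0%}")
print(f"confidence(A -> B) = {confidence:.0%}")
```

Note that the confidence of B → A would need the number of sales that include B, which this example does not give.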
In the example above, could the number of sales that included A be less than 200?
Is A → B the same as B → A?
How many sales include B?
What does a confidence of 100% mean?
Minimum support (%)?
Minimum confidence (%)?
Exercise: Market Basket Analysis using Excel
Transaction ID   Items from the customers who bought more than 1 item   A   B   C   D
1                Apple, Banana, Cherry, Durian                          1   1   1   1
2                Apple, Durian                                          1   0   0   1
3                Banana, Durian                                         0   1   0   1
4                Durian, Banana, Cherry                                 0   1   1   1
5                Banana, Durian                                         0   1   0   1
6                Apple, Banana                                          1   1   0   0
7                Apple, Cherry, Durian                                  1   0   1   1
Sum                                                                     4   5   3   6
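For anyone who wants to check the spreadsheet results outside Excel, here is a minimal Python sketch that tallies the seven transactions of this exercise. It assumes A, B, C and D stand for Apple, Banana, Cherry and Durian; the helper name `count` is our own and is not part of MB.xls.

```python
from itertools import combinations

# The seven baskets from the exercise
# (assumed: A=Apple, B=Banana, C=Cherry, D=Durian).
transactions = [
    {"A", "B", "C", "D"},
    {"A", "D"},
    {"B", "D"},
    {"B", "C", "D"},
    {"B", "D"},
    {"A", "B"},
    {"A", "C", "D"},
]

def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for basket in transactions if set(itemset) <= basket)

items = sorted(set().union(*transactions))

# Per-item totals, i.e. the "Sum" row of the 0/1 table.
item_counts = {item: count({item}) for item in items}
print(item_counts)

# Support and both confidences for every pair of items.
n = len(transactions)
for x, y in combinations(items, 2):
    both = count({x, y})
    print(f"{x} and {y}: together {both} times, support {both / n:.0%}, "
          f"confidence {x}->{y} {both / item_counts[x]:.0%}, "
          f"confidence {y}->{x} {both / item_counts[y]:.0%}")
```

Storing each basket as a set keeps the "does this basket contain these items?" test to a single subset comparison.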
How many associations are there for 3 items?
Download MB.xls from the LMS. It contains 4 sheets and some questions.
Data Mining with WEKA
What is WEKA?
Waikato Environment for Knowledge Analysis (the weka is also a bird from NZ).
It’s one of the better open-source data mining toolkits around. It’s
comprehensive (there are many tools) and quite technical (data mining).
Launch WEKA
Run the Explorer
Open the file ‘bank-data-final.arff’
Week 9 Optimisation
To do:
Research ‘Optimisation’
Find some examples/uses from business & industry
The Travelling Salesman Problem (TSP)
A travelling salesman has to visit a number of cities (all of them, each exactly
once) and then return home, like a tour. We want to find the shortest (optimal)
route.
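To make the definition concrete, here is a minimal brute-force sketch in Python: it tries every ordering of the cities and keeps the shortest closed tour. The city coordinates are invented purely for illustration, and the function names are our own. This approach is only practical for a handful of cities.

```python
from itertools import permutations
from math import dist

# Hypothetical city coordinates, invented for this illustration only.
cities = {"A": (0, 0), "B": (0, 3), "C": (4, 3), "D": (4, 0)}

def tour_length(tour):
    """Length of a closed tour: visit the cities in order, then return to the start."""
    return sum(dist(cities[a], cities[b])
               for a, b in zip(tour, tour[1:] + tour[:1]))

def shortest_tour(home):
    """Try every ordering of the other cities and keep the shortest closed tour."""
    others = [c for c in cities if c != home]
    candidates = ((home,) + p for p in permutations(others))
    return min(candidates, key=tour_length)

best = shortest_tour("A")
print(best, tour_length(best))
```

With these four cities at the corners of a 4 by 3 rectangle, the shortest tour simply walks around the perimeter.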
For two cities it’s trivial: go from A to B and back to A. There’s
only one possible tour, A → B → A.
For 3 cities it’s fairly easy: A → B, B → C, C → A. You can go
ABCA or ACBA, but they’re the same length, so they’re the same
‘tour’, i.e. there’s only 1 tour.
How many tours are there for 4 cities?
How many for 5?
Calculate the number of possible tours for 10 cities.
What is the largest TSP instance you can find with a PROVEN optimal solution?
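One way to check your counts is to enumerate the tours by brute force. The Python sketch below follows the counting convention used for 3 cities: the tour always starts from the same home city, and a tour and its reverse count as one. The function name `count_tours` is our own. For comparison it also prints the standard closed form (n - 1)!/2, which holds for 3 or more cities.

```python
from itertools import permutations
from math import factorial

def count_tours(n):
    """Count distinct closed tours of n cities by brute force.

    City 0 is the fixed home city, and a tour and its reverse
    are counted once.
    """
    seen = set()
    for p in permutations(range(1, n)):
        # Keep whichever direction sorts first, so both directions share one key.
        seen.add(min(p, p[::-1]))
    return len(seen)

for n in range(3, 8):
    print(n, count_tours(n), factorial(n - 1) // 2)
```

The enumeration grows factorially, so it is only usable for small n; for larger n, use the closed form directly.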
The Travelling Salesman Problem is typical of a large class of “hard”
optimisation problems. It has applications in science and engineering. For
example, in the manufacture of a circuit board, it is important to determine
the best order in which a laser will drill thousands of holes. An efficient
solution to this problem reduces production costs for the manufacturer.
[Chart: horizontal axis 1 to 6, vertical axis 0 to 3000. Surviving caption fragment: "for n > 2,"]