Chapter 8 - Data Miners

Chapter 8
Nearest Neighbor Approaches:
Memory-Based Reasoning and
Collaborative Filtering
Data Mining Techniques So Far…
• Chapter 5 – Statistics
• Chapter 6 – Decision Trees
• Chapter 7 – Neural Networks
2
Nearest Neighbor Approaches
• Based on the concept of similarity
– Memory-Based Reasoning (MBR) – results
are based on analogous situations in the past
– Collaborative Filtering – results use
preferences in addition to analogous
situations from the past
3
Memory-Based Reasoning (MBR)
• Our ability to reason from experience depends
on our ability to recognize appropriate examples
from the past…
– Traffic patterns/routes
– Movies
– Food
• We identify similar example(s) and apply what
we learned from them to the current situation
• These similar examples in MBR are referred to
as neighbors
4
MBR Applications
• Fraud detection
• Customer response prediction
• Medical treatments
• Classifying responses – MBR can process
free-text responses and assign codes
5
MBR Strengths
+ Ability to use data “as is” – a distance
function determines how “neighborly” two
data records are, and a combination function
merges the neighbors’ results into an answer
+ Ability to adapt – adding new data makes
it possible for MBR to learn new things
+ Good results without lengthy training
6
MBR Example – Rents in Tuxedo, NY
• Classify nearest neighbors
based on descriptive variables
– population & median home
prices (not geography in this
example)
• Range midpoints in the 2 nearest
neighbors are $1,000 & $1,250, so
averaging them puts Tuxedo rent at
$1,125; a 2nd method yields $977
• Actual rent in Tuxedo turns out to be
$1,250 by the midpoint measure and
$907 by the other.
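
The first estimate above is just the average of the two neighbors' range midpoints. A minimal Python sketch of that arithmetic, using only the figures on the slide (the function name `combine_by_mean` is hypothetical, chosen for illustration):

```python
def combine_by_mean(neighbor_values):
    """Combination function: plain average of the neighbors' values."""
    return sum(neighbor_values) / len(neighbor_values)

# Range midpoints of Tuxedo's two nearest neighbors, as given on the slide.
estimate = combine_by_mean([1000, 1250])
print(estimate)  # 1125.0
```

Other combination functions (for example, weighting closer neighbors more heavily) would give a different estimate from the same neighbors.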
7
MBR Challenges
1. Choosing appropriate historical data for
use in training
2. Choosing the most efficient way to
represent the training data
3. Choosing the distance function,
combination function, and the number of
neighbors
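
The three choices in item 3 can be seen as three parameters of one small routine. The following is a rough Python sketch, not a definitive implementation: it assumes Euclidean distance, a plain average as the combination function, and k = 2, and the records in `history` are made-up values for illustration only.

```python
import math

def euclidean(a, b):
    """Distance function: straight-line distance between two numeric records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mbr_predict(history, query, k=2, distance=euclidean):
    """Find the k nearest historical records and average their targets."""
    nearest = sorted(history, key=lambda rec: distance(rec[0], query))[:k]
    # Combination function: plain average of the neighbors' target values.
    return sum(target for _, target in nearest) / len(nearest)

# Hypothetical records: ((scaled population, scaled home price), rent).
history = [
    ((0.10, 0.80), 1000),
    ((0.15, 0.90), 1250),
    ((0.90, 0.20), 600),
]
prediction = mbr_predict(history, query=(0.12, 0.85), k=2)
print(prediction)  # 1125.0
```

Because there is no training step, adding a new record to `history` is all it takes for the next prediction to reflect it.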
8
Memory-Based Reasoning Exercise
• Work in teams of 3 or 4
• Time Limit = 10 minutes
• Discuss a couple of ways in which
MBR could be useful to an
organization (enterprise,
government agency, etc.)
• Teams present ideas
9
Collaborative Filtering
• Lots of human examples of this:
– Best teachers
– Best courses
– Best restaurants (ambiance, service, food,
price)
– Recommend a dentist, mechanic, PC repair,
blank CDs/DVDs, wines, B&Bs, etc…
• CF is a variant of MBR particularly well
suited to personalized recommendations
10
Collaborative Filtering
• Starts with a history of people’s personal
preferences
• Uses a distance function – people who
like the same things are “close”
• Uses “votes” which are weighted by
distances, so close neighbor votes count
more
• Basically, judgments of a peer group are
important
11
Collaborative Filtering
• Knowing that lots of people liked
something is not sufficient…
• Who liked it is also important
– Friend whose past recommendations were
good (or bad)
– A high-profile person’s opinion seems to
carry influence
• Collaborative Filtering automates this
everyday word-of-mouth activity
12
Preparing Recommendations for
Collaborative Filtering
1. Building a customer profile – ask the new
customer to rate a selection of items
2. Comparing this new profile to other
customers using some measure of
similarity
3. Using some combination of the ratings
from similar customers to predict how
the new customer would rate items
he/she has NOT yet rated
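
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the similarity measure (mean absolute rating difference over shared items), the plain-average combination, and all customer names, item names, and ratings are hypothetical and not taken from the slides.

```python
def profile_distance(a, b):
    """Mean absolute rating difference over items both customers rated."""
    shared = set(a) & set(b)
    if not shared:
        return float("inf")  # nothing in common: treat as infinitely far
    return sum(abs(a[item] - b[item]) for item in shared) / len(shared)

def nearest_customers(new_profile, others, k=2):
    """Step 2: rank existing customers by distance to the new profile."""
    ranked = sorted(others, key=lambda name: profile_distance(new_profile, others[name]))
    return ranked[:k]

# Step 1: the new customer rates a few items (hypothetical ratings).
new_customer = {"Movie A": 4, "Movie B": -2}
others = {
    "customer_1": {"Movie A": 5, "Movie B": -1, "Movie C": 3},
    "customer_2": {"Movie A": -4, "Movie B": 3, "Movie C": -5},
    "customer_3": {"Movie A": 3, "Movie B": -3, "Movie C": 2},
}

# Step 3: combine the neighbors' ratings for an item not yet rated.
neighbors = nearest_customers(new_customer, others, k=2)
predicted = sum(others[n]["Movie C"] for n in neighbors) / len(neighbors)
print(neighbors, predicted)
```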
13
Collaborative Filtering Example
• What rating would Nathaniel
give to Planet of the Apes?
• Simon, distance 2, rated it -1
• Amelia, distance 4, rated it -4
• Using a weighted average with
weights inverse to distance
(1/2 and 1/4), it is predicted
that he would rate it -2
• (0.5*(-1) + 0.25*(-4)) / (0.5 +
0.25) = -2
• Nathaniel can still enter his
own rating after seeing the
movie; it could be close to or
far from the prediction
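
The weighted average in this example can be checked with a short Python sketch. The function name `predict_rating` is hypothetical; the distances and ratings are the ones on the slide, and the sketch assumes every distance is greater than zero.

```python
def predict_rating(neighbor_votes):
    """Weighted average of ratings, with weights inverse to distance.

    neighbor_votes: list of (distance, rating) pairs; distances must be > 0.
    """
    weights = [1 / distance for distance, _ in neighbor_votes]
    weighted_sum = sum(w * rating for w, (_, rating) in zip(weights, neighbor_votes))
    return weighted_sum / sum(weights)

# Simon: distance 2, rating -1. Amelia: distance 4, rating -4.
votes = [(2, -1), (4, -4)]
prediction = predict_rating(votes)
print(prediction)  # -2.0
```

Simon's vote counts twice as much as Amelia's because he is half as far from Nathaniel.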
14
Collaborative Filtering Exercise
• Work in teams of 3 or 4
• Time Limit = 10 minutes
• Discuss a couple of ways in which
Collaborative Filtering could be useful
to an organization (enterprise,
government agency, etc.)
• Teams present ideas
15
End of Chapter 8
16