CART Presentation - Villanova Department of Computing Sciences


Data Mining Application: CART
CART:
• Binary recursive decision tree program
from Salford Systems
• www.salford-systems.com
• 30-day evaluation copy from
– http://www.salford-systems.com/evals/cartreg.html
– Company: Villanova University
– Department: Computing Sciences
CART Binary Recursive Trees
• One target variable
• Splits the data into classes of the target variable
(the number of classes is a settable input parameter)
• Many predictor variables
• At each recursion, CART determines one yes-no
(binary) question based on one predictor variable
• Various splitting criteria are available. The default
(Gini) measures how well a rule separates the classes
in the parent node
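The Gini criterion and the one-question-per-node search described above can be sketched in a few lines of Python. This is a minimal textbook version, not Salford Systems' implementation: the function names `gini`, `best_split`, and `best_question` are made up for illustration, the questions are assumed to have the form "predictor <= t?", and the four toy records are invented (they are not taken from the gym data set).

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a node: 1 minus the sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, predictor):
    """Best question of the form 'predictor <= t?' for one numeric predictor.

    Returns (improvement, t), or None if the predictor takes a single value.
    """
    n = len(rows)
    parent = gini(labels)
    best = None
    values = sorted({row[predictor] for row in rows})
    for t in values[:-1]:  # the largest value would leave the "no" side empty
        yes = [y for row, y in zip(rows, labels) if row[predictor] <= t]
        no = [y for row, y in zip(rows, labels) if row[predictor] > t]
        # Impurity of the two child nodes, weighted by their share of the rows
        children = len(yes) / n * gini(yes) + len(no) / n * gini(no)
        improvement = parent - children
        if best is None or improvement > best[0]:
            best = (improvement, t)
    return best


def best_question(rows, labels, predictors):
    """Pick the single predictor whose best split reduces impurity the most."""
    candidates = []
    for name in predictors:
        split = best_split(rows, labels, name)
        if split is not None:
            candidates.append((split[0], name, split[1]))
    return max(candidates) if candidates else None


# Invented toy records (hypothetical values, for illustration only)
rows = [
    {"TANNING": 0, "HOME": 0},
    {"TANNING": 1, "HOME": 1},
    {"TANNING": 4, "HOME": 0},
    {"TANNING": 6, "HOME": 1},
]
labels = [1, 1, 2, 2]  # target classes
print(best_question(rows, labels, ["TANNING", "HOME"]))
```

On these toy records the question "TANNING <= 1?" separates the two classes completely, so it beats any question on HOME. A full tree would apply the same search again to the "yes" and "no" subsets, which is the recursion in the slide title.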
CART Tutorial
• We have defined three market segments,
numbered 1, 2, 3. They represent “profitability”,
broadly defined as “how much money did we
make from this person in the last year”.
• We are interested in questions which distinguish
these segments so we know how to better target
future marketing.
CART Gym Data Tutorial: Variables
• SEGMENT Member's market segment (coded 1, 2, or 3)
• ANYRAQT Racquetball usage (binary indicator coded 0, 1)
• TANNING Number of visits to tanning salon
• PERSTRN Personal trainer (binary indicator coded 0, 1)
• ONAER Number of on-peak aerobics classes attended
• OFFAER Number of off-peak aerobics classes attended
• ANYPOOL Pool usage (binary indicator coded 0, 1)
• CLASSES Number of classes taken
• NSUPPS Number of supplements/vitamins/frozen dinners purchased
• SMALLBUS Small business discount (binary indicator coded 0, 1)
• OFFER Terms of offer
• FIT Fitness score
• NFAMMEN Number of family members
• HOME Home ownership (binary indicator coded 0, 1)
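The codings listed above can be written down as a small schema and used to sanity-check a record before analysis. This is only a sketch: the function name `check_record` is hypothetical, treating the "Number of ..." variables as non-negative integers is an assumption, and OFFER and FIT are left unchecked because the slide does not say how they are coded.

```python
# Variable kinds, taken from the descriptions on the slide
TARGET = "SEGMENT"  # coded 1, 2, or 3
BINARY = {"ANYRAQT", "PERSTRN", "ANYPOOL", "SMALLBUS", "HOME"}  # coded 0, 1
# Assumption: "Number of ..." variables are non-negative whole numbers
COUNT = {"TANNING", "ONAER", "OFFAER", "CLASSES", "NSUPPS", "NFAMMEN"}
# OFFER and FIT are not checked: their coding is not given on the slide


def check_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    if record.get(TARGET) not in (1, 2, 3):
        problems.append(f"{TARGET} must be 1, 2, or 3")
    for name in sorted(BINARY):
        if name in record and record[name] not in (0, 1):
            problems.append(f"{name} must be 0 or 1")
    for name in sorted(COUNT):
        value = record.get(name)
        if name in record and (not isinstance(value, int) or value < 0):
            problems.append(f"{name} must be a non-negative integer")
    return problems


# Invented example records (not from the gym data set)
good = {"SEGMENT": 2, "HOME": 1, "TANNING": 3}
bad = {"SEGMENT": 4, "HOME": 2, "TANNING": -1}
print(check_record(good))
print(check_record(bad))
```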
Potential Data Sources
• CART reads data in the Systat for Windows format
(extension .syd). Systat is a very popular statistical
package: www.spssscience.com/systat
• The downloaded version includes a dynamic link to a
program called DBMS/COPY, which also lets you use
other data formats such as ASCII, Excel, etc.
www.conceptual.com/dbmscopy.htm
Summary: CART
• Good for generating decision trees; it offers many
options and reports a great deal of
information.
• The rules it creates and the resulting data can
also be used as input to additional tools.
• It produces far more output than you will want
to read unless you know what you are
looking for.
CART Assignment:
• Three pieces:
– Download and install it
– Work through the tutorial yourself and write a
brief report.
– Analyze a new set of data and answer some
questions about it.
• I am in the process of getting descriptions for the
sample data sets in the download and will prepare
questions based on one of them.
• Or, if you have data in an appropriate format, you
may use your own data and questions.