2007 Power Point
By Dan Stalloch
Association – what could be linked together in some way
with something else
Patterns – sequential and time series patterns show us how
often certain things occur
Classification – shows us how data is grouped
Prediction – the detection of a stable occurrence
within the data that may continue into the future
Identification – what can be found out by system usage
or what might be present in a thing
Classification – how the data could be grouped
Optimization – finding ways to make the best use of limited resources
Apriori – frequent large item sets
Sampling – finds frequent item sets from a small sample of the data
Frequent-Pattern (FP) Tree and FP-Growth – improves on
Apriori by avoiding candidate item set generation
Partition – efficient way to use the Apriori algorithm
Decision Tree Induction – constructing a decision tree
from a training data set
k-Means – groups the data into k clusters
And others
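As a rough illustration of the Apriori idea listed above, here is a minimal Python sketch. It grows frequent item sets one item at a time and throws away any candidate that has an infrequent subset. The function name apriori, the min_support parameter and the shopping baskets are assumptions made up for this example; they do not come from the slides or the textbook.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {item set: support count} for every frequent item set."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count the transactions that contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-item sets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all of its subsets
        # with one item fewer were frequent at the previous level.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "butter"},
]
result = apriori(baskets, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The sketch rescans every transaction at each level, which is simple but slow. Reducing that cost is what the Sampling, Partition and FP-Growth approaches in the list aim at.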
Marketing – analyzing customer behavior
Finance – keeping track of credit and fraud
Manufacturing – optimizing use of resources
Health Care – checking patterns for useful information
http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data
This is a car database from a repository of databases
made freely available to everyone through UCI
When mining a database, it is essential to ask what
you would like to be able to predict from it. In this
instance, we would like to know which cars have decent
mpg
We might also be able to predict which companies are
likely to stay in business
We must create or use programs that show us either a
2-D contingency table or a 3-D contingency table
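A 2-D contingency table just counts how often each pair of values occurs together. A minimal Python sketch is shown here. The records, the "good"/"bad" mpg labels and the helper name contingency_table are all hypothetical; they are not rows from auto-mpg.data.

```python
from collections import Counter

# Hypothetical records of (cylinders, mpg label), invented for illustration.
records = [
    (4, "good"), (4, "good"), (4, "bad"),
    (6, "bad"), (8, "bad"), (8, "bad"),
]

def contingency_table(rows):
    """Count how often each (attribute value, class label) pair occurs."""
    return Counter(rows)

table = contingency_table(records)

# Print one row per attribute value, one column per class label.
for value in sorted({v for v, _ in records}):
    row = {label: table[(value, label)] for label in ("good", "bad")}
    print(value, row)
```

A 3-D table could be built the same way by counting triples, for example (cylinders, origin, mpg label).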
http://www.autonlab.org/tutorials/dtree18.pdf
We use a formula to decide which attributes have the
highest information gain with respect to what we would
like to know. The formula is:
IG(Y|X) = H(Y) - H(Y | X)
Where H(X) = the entropy of X, and H(Y | X) = the entropy
of Y that remains once X is known
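The formula can be tried out on a small example. The Python sketch below computes H(Y) from label frequencies, then H(Y | X) as the weighted average entropy of Y within each value of X, and subtracts the two. The function names and the sample values are assumptions for illustration, not data from the car database.

```python
import math
from collections import Counter

def entropy(labels):
    """H(Y): entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(xs, ys):
    """IG(Y|X) = H(Y) - H(Y | X) for paired lists of values."""
    n = len(ys)
    h_cond = 0.0
    for value in set(xs):
        # Labels of the records where X takes this value.
        subset = [y for x, y in zip(xs, ys) if x == value]
        # Weight each subset's entropy by how often the value occurs.
        h_cond += len(subset) / n * entropy(subset)
    return entropy(ys) - h_cond

# Hypothetical data: X = cylinders, Y = whether mpg is "good" or "bad".
cylinders = [4, 4, 4, 6, 8, 8]
mpg_label = ["good", "good", "bad", "bad", "bad", "bad"]
gain = information_gain(cylinders, mpg_label)
print(round(gain, 3))
```

An attribute that splits the labels perfectly has a gain equal to H(Y), and an attribute that says nothing about the labels has a gain of zero.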
http://www.autonlab.org/tutorials/dtree18.pdf
http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data
http://www.autonlab.org/tutorials/infogain11.pdf
Chapter 28 from Fundamentals of Database Systems
6th Edition By Elmasri and Navathe
Pictures from Andrew W. Moore Slides