Induction and Decision Trees


Induction and Decision Trees
Artificial Intelligence
The design and development of computer systems that exhibit intelligent behavior.
What is intelligence?
Turing test:
- Developed in 1950 by Alan Turing (a pioneer of computer science)
- Computer and human in one room
- Human “interrogator” in another room
- The interrogator asks questions; either the human or the computer answers
- If the interrogator cannot tell whether the human or the computer is answering, then the computer is “intelligent”
Classification of AI Systems
Knowledge Representation Systems
- Capture existing expert knowledge and use it to provide consultation and decision support to end users
- Main types: rule-based expert systems, case-based reasoning systems, frame-based knowledge systems, semantic networks
Machine Learning
- Algorithms that use mathematical or logical techniques to find patterns in data and to discover or create new knowledge
- Main types: artificial neural networks, genetic algorithms, inductive decision trees, Naïve Bayesian algorithms, clustering and pattern-recognition algorithms
Data mining primarily involves the “machine learning” form of AI.
Data Mining
Textbook definition:
- Knowledge discovery in databases
- Using statistical, mathematical, AI, and machine learning techniques to extract useful information, and subsequent knowledge, from large databases
Key point: identifying patterns in large data sets
Microsoft SQL Server Data Mining Algorithms
Decision Trees
Naïve Bayesian
Clustering
Sequence Clustering
Association Rules
Neural Network
Time Series
Decision Trees for Machine Learning
Based on Inductive Logic
Three types of logical structures are commonly used in AI systems:
- Deduction
- Abduction
- Induction
Deduction
Premise (rule): if p then q
Fact (axiom, observation): p
Conclude: q
This is classical logic (modus ponens). If the rule is correct and the fact is correct, then you know that the conclusion will be correct.
We are given the rule.
Abduction
Premise (rule): if p then q
Fact (axiom, observation): q
Conclude: p
This form of reasoning is a logical fallacy called “affirming the consequent.” The conclusion may be wrong, but it is a plausible explanation of the fact, given the rule. This makes abduction useful for diagnostic tasks.
We are given the rule.
Induction
1. Observe p and q together
2. Observe p and q together
3. Observe p and q together
4. Observe p and q together
   ...
n. Observe p and q together
Conclude: if p then q
This is stereotypical thinking…highly error prone.
We create the rule.
Example – Baby in the kitchen
ID3 Decision Tree Algorithm
“Iterative Dichotomizer”
Developed by Ross Quinlan (1979)
This is the basis for many commercial induction products.
The goal of this algorithm is to find rules resulting in YES or NO values (therefore, the generated rules have two possible outcomes).
ID3 generates a tree, where each path of the tree represents a rule. The leaf node is the THEN part of the rule, and the nodes leading to it are the ANDed attribute-value combinations in the IF part of the rule.
ID3 Algorithm
Starting point:
- an empty tree (this tree will eventually represent the final rules created by ID3)
- a recordset of data elements (e.g., records from a database)
- a set of attributes (fields), each with some finite number of possible values
- NOTE: one of the attributes is the “decision” field, with a YES or NO value (or some other two-valued option: GOOD/BAD, HIGH/LOW, WIN/LOSE, etc.)
Output:
- a tree, where each path of the tree represents a rule
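One way to picture this starting point concretely is a small Python sketch (an assumed representation, not from the slides). The rows below are hypothetical, chosen only to be consistent with the probabilities quoted later for the baby-in-the-kitchen example; the slide's actual recordset is not reproduced in this transcript.

```python
# Hypothetical representation of the ID3 starting point (assumption, not
# taken from the slides): a recordset, the candidate input attributes, and
# the two-valued decision attribute.
recordset = [
    {"Touch_Oven": "Yes", "Mom_In_Kitchen": "Yes", "BOO_BOO": "Yes"},
    {"Touch_Oven": "Yes", "Mom_In_Kitchen": "Yes", "BOO_BOO": "Yes"},
    {"Touch_Oven": "Yes", "Mom_In_Kitchen": "No",  "BOO_BOO": "No"},
    {"Touch_Oven": "No",  "Mom_In_Kitchen": "Yes", "BOO_BOO": "No"},
    {"Touch_Oven": "No",  "Mom_In_Kitchen": "Yes", "BOO_BOO": "No"},
]
attributes = ["Touch_Oven", "Mom_In_Kitchen"]  # candidate split fields
decision_attribute = "BOO_BOO"                 # the YES/NO "decision" field
```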
ID3 algorithm
1. If all records in your recordset are positive (i.e., have YES values for their decision attribute), create a YES node and stop (end recursion).
2. If all records in your recordset are negative, create a NO node and stop (end recursion).
3. Select the attribute that best discriminates among the records (using an entropy function).
4. Create a tree node representing that attribute, with n branches, where n is the number of values of the selected attribute.
5. Divide the records of the recordset into subsets subrecordset 1, subrecordset 2, ..., subrecordset n, corresponding to each value of the selected attribute.
6. Recursively apply the algorithm to each subrecordset i, with a reduced attribute set (don't reuse attributes that already appear on the path).
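A minimal Python sketch of this recursion follows (my own illustration, not code from the slides). It assumes the list-of-dicts recordset pictured above and reads “best discriminates” as “lowest weighted entropy of the decision field after the split.”

```python
import math
from collections import Counter

def entropy(records, decision_attr):
    """Shannon entropy (log base 2) of the decision attribute over the records."""
    counts = Counter(r[decision_attr] for r in records)
    total = len(records)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def best_attribute(records, attributes, decision_attr):
    """Pick the attribute whose split leaves the lowest weighted entropy."""
    def split_entropy(attr):
        total = len(records)
        score = 0.0
        for value in set(r[attr] for r in records):
            subset = [r for r in records if r[attr] == value]
            score += (len(subset) / total) * entropy(subset, decision_attr)
        return score
    return min(attributes, key=split_entropy)

def id3(records, attributes, decision_attr):
    """Return a nested-dict decision tree; leaves are the YES/NO labels."""
    labels = set(r[decision_attr] for r in records)
    if len(labels) == 1:       # all positive or all negative: leaf node, end recursion
        return labels.pop()
    if not attributes:         # no attributes left: fall back to the majority label
        return Counter(r[decision_attr] for r in records).most_common(1)[0][0]
    attr = best_attribute(records, attributes, decision_attr)
    tree = {attr: {}}
    for value in set(r[attr] for r in records):
        subset = [r for r in records if r[attr] == value]
        # Recurse on each subrecordset, dropping the attribute already used on this path
        tree[attr][value] = id3(subset, [a for a in attributes if a != attr], decision_attr)
    return tree

# Example (using the hypothetical recordset sketched earlier):
#   id3(recordset, attributes, decision_attribute)
```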
Calculating Entropy
Entropy = mixture, chaos
We want to pick the attribute with the lowest entropy: ideally, a particular value of the input attribute leads to ALL yes or ALL no in the outcome attribute, or comes as close to this as possible.
An attribute’s entropy = −Σ p(xᵢ) · log₂ p(xᵢ), with the sum taken over i = 1 to n, where n is the total number of possible values for the attribute and xᵢ is the ith value.
Baby’s RecordSet of Oven-Touching Experiences
ID3 Applied to Baby-in-the-Kitchen
Which attribute to start with? Based on the entropy measure (assuming log base 2):
- Touch stove entropy = 0.918
- Mom in kitchen entropy = 1.0
To see this, note that:
- The probability of touching the stove leading to “ouch” is .67, and of not leading to “ouch” is .33 (.67 × .33 = .22)
- The probability of mom being in the kitchen leading to “ouch” is .5, and of mom being in the kitchen not leading to “ouch” is also .5 (.5 × .5 = .25)
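The entropy values themselves follow from the formula given earlier; as a quick check (my own verification in Python, not part of the slides):

```python
import math

def entropy(probs):
    """Shannon entropy (log base 2) of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs)

print(entropy([2/3, 1/3]))   # Touch stove: 2/3 ouch vs. 1/3 no ouch -> ~0.918
print(entropy([0.5, 0.5]))   # Mom in kitchen: 50/50 split -> 1.0
```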
Applying the Touch Stove Attribute
Recurse: apply the Mom in Kitchen attribute where needed.
Resulting decision rules
If Touch_Oven = No then BOO_BOO = No
If Touch_Oven = Yes and Mom_In_Kitchen = Yes then BOO_BOO = Yes
If Touch_Oven = Yes and Mom_In_Kitchen = No then BOO_BOO = No
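Written as code, the three rules are just the three paths through the tree (a sketch; the function name is my own, while the attribute names come from the rules above):

```python
def predict_boo_boo(touch_oven: str, mom_in_kitchen: str) -> str:
    """Apply the three induced rules, one per path through the ID3 tree."""
    if touch_oven == "No":
        return "No"        # Rule 1: didn't touch the oven -> no boo-boo
    if mom_in_kitchen == "Yes":
        return "Yes"       # Rule 2: touched the oven with mom in the kitchen
    return "No"            # Rule 3: touched the oven, mom not in the kitchen

# Example: predict_boo_boo("Yes", "Yes") returns "Yes"
```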
Now we’ll do this with Microsoft SQL Server