Lecture 3
Data Mining Basics
Monday, January 22, 2001
William H. Hsu
Department of Computing and Information Sciences, KSU
http://www.cis.ksu.edu/~bhsu
Readings:
Chapters 1-2, Witten and Frank
Sections 2.7-2.8, Mitchell
CIS 830: Advanced Topics in Artificial Intelligence
Kansas State University
Department of Computing and Information Sciences
Lecture Outline
• Read Chapters 1-2, Witten and Frank; Sections 2.7-2.8, Mitchell
• Homework 1: Due Friday, February 2, 2001 (before 12 AM CST)
• Paper Commentary 1: Due This Friday (in class)
  – U. Fayyad, “From Data Mining to Knowledge Discovery”
  – See guidelines in course notes
• Supervised Learning (continued)
  – Version spaces
  – Candidate elimination algorithm
    • Derivation
    • Examples
• The Need for Inductive Bias
  – Representations (hypothesis languages): a worst-case scenario
  – Change of representation
• Computational Learning Theory
Representing Version Spaces
• Hypothesis Space
  – A finite meet semilattice under the partial ordering Less-Specific-Than (≤P),
    with least element the all-? hypothesis <?, …, ?>
  – Every pair of hypotheses has a greatest lower bound (GLB)
  – VS_{H,D} ≡ the consistent poset (partially-ordered subset of H)
• Definition: General Boundary
  – General boundary G of version space VS_{H,D}: set of most general members
  – Most general ≡ minimal elements of VS_{H,D} ≡ “set of necessary conditions”
• Definition: Specific Boundary
  – Specific boundary S of version space VS_{H,D}: set of most specific members
  – Most specific ≡ maximal elements of VS_{H,D} ≡ “set of sufficient conditions”
• Version Space
  – Every member of the version space lies between S and G
  – VS_{H,D} ≡ { h ∈ H | ∃ s ∈ S . ∃ g ∈ G . g ≤P h ≤P s }, where ≤P ≡ Less-Specific-Than
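
The boundary-set definition above is directly executable. A minimal Python sketch (the names here are mine, not from the lecture; less_specific_than is assumed to implement the ≤P ordering for whatever hypothesis language is in use):

def in_version_space(h, S, G, less_specific_than):
    # h is in VS_{H,D} iff some g in G and some s in S
    # satisfy g <=P h <=P s, i.e., h lies between the boundaries.
    return any(less_specific_than(g, h) and less_specific_than(h, s)
               for g in G for s in S)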
Candidate Elimination Algorithm [1]
1. Initialization
     G ← (singleton) set containing the most general hypothesis in H, denoted {<?, …, ?>}
     S ← set of most specific hypotheses in H, denoted {<Ø, …, Ø>}
2. For each training example d
     If d is a positive example (Update-S)
       Remove from G any hypotheses inconsistent with d
       For each hypothesis s in S that is not consistent with d
         Remove s from S
         Add to S all minimal generalizations h of s such that
           1. h is consistent with d
           2. Some member of G is more general than h
         (These are the greatest lower bounds, or meets, s ∧ d, in VS_{H,D})
       Remove from S any hypothesis that is more general than another hypothesis in S
       (remove any dominated elements)
Candidate Elimination Algorithm [2]
(continued)
     If d is a negative example (Update-G)
       Remove from S any hypotheses inconsistent with d
       For each hypothesis g in G that is not consistent with d
         Remove g from G
         Add to G all minimal specializations h of g such that
           1. h is consistent with d
           2. Some member of S is more specific than h
         (These are the least upper bounds, or joins, g ∨ d, in VS_{H,D})
       Remove from G any hypothesis that is less general than another hypothesis in G
       (remove any dominating elements)
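
The update rules above translate almost line-for-line into code. Below is a minimal, illustrative Python sketch for the conjunctive hypothesis language (function and variable names are mine, not from the lecture); it assumes discrete attribute domains are supplied, and uses '0' to stand for the null value Ø:

def covers(h, x):
    # A conjunctive hypothesis covers instance x iff every attribute
    # is '?' or equals x's value; the all-'0' hypothesis covers nothing.
    return all(a == '?' or a == b for a, b in zip(h, x))

def more_general(h1, h2):
    # True iff h1 is at least as general as h2.
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def candidate_elimination(examples, domains):
    n = len(domains)
    S = [('0',) * n]                 # most specific boundary
    G = [('?',) * n]                 # most general boundary
    for x, positive in examples:
        if positive:                 # Update-S
            G = [g for g in G if covers(g, x)]
            for s in [s for s in S if not covers(s, x)]:
                S.remove(s)
                # minimal generalization of s that covers x
                h = tuple(b if a == '0' else (a if a == b else '?')
                          for a, b in zip(s, x))
                if h not in S and any(more_general(g, h) for g in G):
                    S.append(h)
            S = [s for s in S
                 if not any(s != t and more_general(s, t) for t in S)]
        else:                        # Update-G
            S = [s for s in S if not covers(s, x)]
            for g in [g for g in G if covers(g, x)]:
                G.remove(g)
                # minimal specializations of g that exclude x
                for i in range(n):
                    if g[i] == '?':
                        for v in domains[i]:
                            if v != x[i]:
                                h = g[:i] + (v,) + g[i + 1:]
                                if h not in G and any(more_general(h, s)
                                                      for s in S):
                                    G.append(h)
            G = [g for g in G
                 if not any(g != f and more_general(f, g) for f in G)]
    return S, G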
Example Trace
S0: {<Ø, Ø, Ø, Ø, Ø, Ø>}                        G0: {<?, ?, ?, ?, ?, ?>}

d1: <Sunny, Warm, Normal, Strong, Warm, Same, Yes>
    S1: {<Sunny, Warm, Normal, Strong, Warm, Same>}        G1 = G0

d2: <Sunny, Warm, High, Strong, Warm, Same, Yes>
    S2: {<Sunny, Warm, ?, Strong, Warm, Same>}             G2 = G1

d3: <Rainy, Cold, High, Strong, Warm, Change, No>
    S3 = S2
    G3: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}

d4: <Sunny, Warm, High, Strong, Cool, Change, Yes>
    S4: {<Sunny, Warm, ?, Strong, ?, ?>}
    G4: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
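
As a check on this trace, the candidate_elimination sketch from the previous slide can be run on the same four examples (the domain tuples below are assumptions based on the attribute values appearing in this lecture):

DOMAINS = [("Sunny", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Light"), ("Warm", "Cool"), ("Same", "Change")]

EXAMPLES = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),  True),   # d1: Yes
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),  True),   # d2: Yes
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False), # d3: No
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),  # d4: Yes
]

S, G = candidate_elimination(EXAMPLES, DOMAINS)
print(S)  # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]            -- S4
print(G)  # [('Sunny', '?', '?', '?', '?', '?'),
          #  ('?', 'Warm', '?', '?', '?', '?')]                     -- G4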
What Next Training Example?
The final version space, between S4 and G4:

  S: {<Sunny, Warm, ?, Strong, ?, ?>}
     <Sunny, ?, ?, Strong, ?, ?>   <Sunny, Warm, ?, ?, ?, ?>   <?, Warm, ?, Strong, ?, ?>
  G: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

• What Query Should The Learner Make Next?
• How Should These Be Classified? (see the voting sketch below)
  – <Sunny, Warm, Normal, Strong, Cool, Change>
  – <Rainy, Cold, Normal, Light, Warm, Same>
  – <Sunny, Warm, Normal, Light, Warm, Same>
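
One way to answer the classification question mechanically is unanimous voting by the boundary sets. A sketch (covers is assumed to test whether a hypothesis matches an instance, as in the earlier code):

def classify(x, S, G, covers):
    # '+' if every member of S covers x (then so does every member of VS);
    # '-' if no member of G covers x (then no member of VS does);
    # otherwise the version space members disagree.
    if all(covers(s, x) for s in S):
        return "+"
    if not any(covers(g, x) for g in G):
        return "-"
    return "don't know"

# With S4, G4 from the trace, the six version space members vote:
#   <Sunny, Warm, Normal, Strong, Cool, Change> -> "+"          (6 to 0)
#   <Rainy, Cold, Normal, Light, Warm, Same>    -> "-"          (0 to 6)
#   <Sunny, Warm, Normal, Light, Warm, Same>    -> "don't know" (3 to 3)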
What Justifies This Inductive Leap?
• Example: Inductive Generalization
  – Positive example: <Sunny, Warm, Normal, Strong, Cool, Change, Yes>
  – Positive example: <Sunny, Warm, Normal, Light, Warm, Same, Yes>
  – Induced S: <Sunny, Warm, Normal, ?, ?, ?>
• Why Believe We Can Classify The Unseen?
  – e.g., <Sunny, Warm, Normal, Strong, Warm, Same>
  – When is there enough information (in a new case) to make a prediction?
An Unbiased Learner
• Example of A Biased H
  – Conjunctive concepts with don’t cares
  – What concepts can H not express? (Hint: what are its syntactic limitations?)
• Idea
  – Choose H’ that expresses every teachable concept
  – i.e., H’ is the power set of X
  – Recall: | A → B | = | B | ^ | A | (A = X; B = {labels}; H’ = A → B)
  – ({Rainy, Sunny} × {Warm, Cold} × {Normal, High} × {None, Mild, Strong} ×
    {Cool, Warm} × {Same, Change}) → {0, 1}
• An Exhaustive Hypothesis Language
  – Consider: H’ = disjunctions (∨), conjunctions (∧), negations (¬) over previous H
  – | H’ | = 2^(2 · 2 · 2 · 3 · 2 · 2) = 2^96; | H | = 1 + (3 · 3 · 3 · 4 · 3 · 3) = 973
• What Are S, G For The Hypothesis Language H’?
  – S ← disjunction of all positive examples
  – G ← conjunction of all negated negative examples
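
To see why the unbiased H’ cannot generalize, here is a small demonstration sketch (the particular positive and negative examples are hypothetical picks): with H’ = the power set of X, S collapses to exactly the observed positives and G to the complement of the observed negatives, so every unseen instance falls strictly between them and gets “don’t know”.

from itertools import product

# The instance space from this slide: 2 * 2 * 2 * 3 * 2 * 2 = 96 instances.
DOMAINS = [("Sunny", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("None", "Mild", "Strong"), ("Cool", "Warm"), ("Same", "Change")]
X = set(product(*DOMAINS))

positives = {("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")}
negatives = {("Rainy", "Cold", "High", "Strong", "Warm", "Change")}

# With H' = the power set of X, hypotheses are just sets of instances:
S = positives        # S: disjunction of the positive examples
G = X - negatives    # G: conjunction of the negated negative examples

unseen = X - positives - negatives
# Every unseen instance is covered by G's member but not by S's member,
# so the version space members disagree on all of them.
assert all(x in G and x not in S for x in unseen)
print(len(unseen), "unseen instances, all classified 'don't know'")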
Inductive Bias
• Components of An Inductive Bias Definition
  – Concept learning algorithm L
  – Instances X, target concept c
  – Training examples Dc = {<x, c(x)>}
  – L(xi, Dc) = classification assigned to instance xi by L after training on Dc
• Definition
  – The inductive bias of L is any minimal set of assertions B such that, for any
    target concept c and corresponding training examples Dc,
    ∀ xi ∈ X . [(B ∧ Dc ∧ xi) ⊢ L(xi, Dc)]
    where A ⊢ B means A logically entails B
  – Informal idea: preference for (i.e., restriction to) certain hypotheses by
    structural (syntactic) means
• Rationale
  – Prior assumptions regarding target concept
  – Basis for inductive generalization
Inductive Systems and Equivalent Deductive Systems
• Inductive system
  – Inputs: training examples, new instance
  – Candidate elimination algorithm (using hypothesis space H)
  – Output: classification of new instance (or “don’t know”)
• Equivalent deductive system
  – Inputs: training examples, new instance, plus the assertion { c ∈ H }
  – Theorem prover
  – Output: classification of new instance (or “don’t know”)
  – The only difference: the inductive bias { c ∈ H } is made explicit
Three Learners with Different Biases
• Rote Learner
  – Weakest bias: none (admits only what it has seen before)
  – Store examples
  – Classify x if and only if it matches a previously observed example
• Version Space Candidate Elimination Algorithm
  – Stronger bias: concepts belonging to conjunctive H
  – Store extremal generalizations and specializations (the S and G boundary sets)
  – Classify x if and only if it “falls within” the S and G boundaries (all members agree)
• Find-S
  – Even stronger bias: most specific hypothesis
  – Prior assumption: any instance not observed to be positive is negative
  – Classify x based on the S set
Hypothesis Space: A Syntactic Restriction
• Recall: 4-Variable Concept Learning Problem
  – Unknown function: y = f(x1, x2, x3, x4)

  Example | x1 | x2 | x3 | x4 | y
  --------+----+----+----+----+---
     0    | 0  | 1  | 1  | 0  | 0
     1    | 0  | 0  | 0  | 0  | 0
     2    | 0  | 0  | 1  | 1  | 1
     3    | 1  | 0  | 0  | 1  | 1
     4    | 0  | 1  | 1  | 0  | 0
     5    | 1  | 1  | 0  | 0  | 0
     6    | 0  | 1  | 0  | 1  | 0

• Bias: Simple Conjunctive Rules
  – Only 16 simple conjunctive rules of the form y = xi ∧ xj ∧ xk
  – y = Ø, x1, …, x4, x1 ∧ x2, …, x3 ∧ x4, x1 ∧ x2 ∧ x3, …, x2 ∧ x3 ∧ x4, x1 ∧ x2 ∧ x3 ∧ x4
  – Example above: no simple conjunctive rule explains the data (counterexamples?);
    see the enumeration below
  – Similarly for simple clauses (conjunction and disjunction allowed)
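
The claim that no simple conjunctive rule fits is easy to check by brute force. A small sketch over the table above (the empty subset plays the role of the always-true rule Ø):

from itertools import combinations

# Training data from the table above: ((x1, x2, x3, x4), y).
DATA = [((0,1,1,0),0), ((0,0,0,0),0), ((0,0,1,1),1), ((1,0,0,1),1),
        ((0,1,1,0),0), ((1,1,0,0),0), ((0,1,0,1),0)]

# Enumerate all 16 simple conjunctive rules: one per subset of {x1..x4}.
consistent = [subset
              for k in range(5)
              for subset in combinations(range(4), k)
              if all(all(x[i] for i in subset) == bool(y) for x, y in DATA)]
print(consistent)   # [] -- no simple conjunctive rule fits the data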
Hypothesis Space: m-of-n Rules
• m-of-n Rules
  – 32 possible rules of the form: “y = 1 iff at least m of the following n variables are 1”
  – Training data: the same seven examples as on the previous slide
  – Each cell below lists a counterexample (by example number) to the corresponding rule

  Variables          | 1 of | 2 of | 3 of | 4 of
  -------------------+------+------+------+-----
  {x1}               |  2   |      |      |
  {x2}               |  0   |      |      |
  {x3}               |  0   |      |      |
  {x4}               |  6   |      |      |
  {x1, x2}           |  0   |  2   |      |
  {x1, x3}           |  0   |  2   |      |
  {x1, x4}           |  5   |  2   |      |
  {x2, x3}           |  0   |  2   |      |
  {x2, x4}           |  0   |  2   |      |
  {x3, x4}           |  0   |  3   |      |
  {x1, x2, x3}       |  0   |  2   |  2   |
  {x1, x2, x4}       |  0   |  2   |  2   |
  {x1, x3, x4}       |  0   | none |  2   |
  {x2, x3, x4}       |  0   |  4   |  2   |
  {x1, x2, x3, x4}   |  0   |  4   |  2   |  2

• Found A Consistent Hypothesis! “y = 2 of {x1, x3, x4}” has no counterexample
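
The table can be reproduced mechanically. A sketch that enumerates all 32 m-of-n rules and prints any that are consistent with the data:

from itertools import combinations

DATA = [((0,1,1,0),0), ((0,0,0,0),0), ((0,0,1,1),1), ((1,0,0,1),1),
        ((0,1,1,0),0), ((1,1,0,0),0), ((0,1,0,1),0)]

# For each nonempty subset of {x1..x4} and each threshold m, test
# "y = 1 iff at least m of the subset's variables are 1".
for n in range(1, 5):
    for subset in combinations(range(4), n):
        for m in range(1, n + 1):
            if all((sum(x[i] for i in subset) >= m) == bool(y)
                   for x, y in DATA):
                names = ", ".join("x%d" % (i + 1) for i in subset)
                print("consistent: %d of {%s}" % (m, names))
# -> consistent: 2 of {x1, x3, x4}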
Views of Learning
• Removal of (Remaining) Uncertainty
  – Suppose the unknown function was known to be an m-of-n Boolean function
  – Could use training data to infer the function
• Learning and Hypothesis Languages
  – Possible approach for guessing a good, small hypothesis language:
    • Start with a very small language
    • Enlarge until it contains a hypothesis that fits the data
  – Inductive bias
    • Preference for certain languages
    • Analogous to data compression (removal of redundancy)
    • Later: coding the “model” versus coding the “uncertainty” (error)
• We Could Be Wrong!
  – Prior knowledge could be wrong (e.g., y = x4 ∧ one-of (x1, x3) is also consistent;
    see the check below)
  – If the guessed language was wrong, errors will occur on new cases
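
A quick verification of that alternative hypothesis against the same seven examples (reading one-of as “exactly one of”):

DATA = [((0,1,1,0),0), ((0,0,0,0),0), ((0,0,1,1),1), ((1,0,0,1),1),
        ((0,1,1,0),0), ((1,1,0,0),0), ((0,1,0,1),0)]

# y = x4 AND exactly-one-of(x1, x3): a second hypothesis, in a richer
# language, that also fits every example.
rule = lambda x: bool(x[3]) and (x[0] + x[2] == 1)
assert all(rule(x) == bool(y) for x, y in DATA)
print("y = x4 AND one-of(x1, x3) is consistent with the data")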
Two Strategies for Machine Learning
• Develop Ways to Express Prior Knowledge
  – Role of prior knowledge: guides search for hypotheses / hypothesis languages
  – Expression languages for prior knowledge
    • Rule grammars; stochastic models; etc.
    • Restrictions on computational models; other (formal) specification methods
• Develop Flexible Hypothesis Spaces
  – Structured collections of hypotheses
    • Agglomeration: nested collections (hierarchies)
    • Partitioning: decision trees, lists, rules
    • Neural networks; cases; etc.
  – Hypothesis spaces of adaptive size
• In Either Case: Develop Algorithms for Finding A Hypothesis That Fits Well
  – Ideally, one that will also generalize well
• Later: Bias Optimization (Meta-Learning, Wrappers)
Terminology
• The Version Space Algorithm
  – Version space: constructive definition
    • S and G boundaries characterize the learner’s uncertainty
    • The version space can be used to make predictions over unseen cases
  – Algorithms: Find-S, List-Then-Eliminate, candidate elimination
  – Consistent hypothesis: one that correctly predicts the observed examples
  – Version space: the space of all currently consistent (or satisfiable) hypotheses
• Inductive Bias
  – Strength of inductive bias: how few hypotheses?
  – Specific biases: based on specific languages
• Hypothesis Language
  – “Searchable subset” of the space of possible descriptors
  – m-of-n, conjunctive, disjunctive, clauses
  – Ability to represent a concept
Summary Points
• Introduction to Supervised Concept Learning
• Inductive Leaps Possible Only if the Learner Is Biased
  – Futility of learning without bias
  – Strength of inductive bias: proportional to restrictions on hypotheses
• Modeling Inductive Learners with Equivalent Deductive Systems
  – Representing inductive learning as theorem proving
  – Equivalent learning and inference problems
• Syntactic Restrictions
  – Example: m-of-n concept
  – Other examples?
• Views of Learning and Strategies
  – Removing uncertainty (“data compression”)
  – Role of knowledge
• Next Lecture: More on Knowledge Discovery in Databases (KDD)