
Lecture 10
KDD Presentation (3 of 3):
Rule Induction
Wednesday, February 7, 2001
William H. Hsu
Department of Computing and Information Sciences, KSU
http://www.cis.ksu.edu/~bhsu
Readings:
“Using Inductive Learning to Generate Rules for Semantic Query
Optimization”, Hsu and Knoblock
CIS 830: Advanced Topics in Artificial Intelligence
Kansas State University
Department of Computing and Information Sciences
Presentation Outline
• Paper
  – “Using Inductive Learning to Generate Rules for Semantic Query Optimization”
  – Authors: C.-N. Hsu and C. A. Knoblock
  – In Advances in Knowledge Discovery and Data Mining (Fayyad, Piatetsky-Shapiro, Smyth, Uthurusamy, eds.)
• Overview
  – Learning semantic knowledge
    • Rule induction
    • Purpose: semantic query optimization (SQO)
  – Analogue: inductive logic programming (ILP)
    • Knowledge representation: Horn clauses
    • Idea: use reformulation of queries to learn (induce) rules
• Application of Machine Learning to KDD: Issues
  – Rules: good hypothesis language for the performance element (SQO)?
  – How are the goals of database query speedup achieved?
  – Key strengths: straightforward induction method; can use a domain theory
Induction as Inverted Deduction:
Design Principles
[Figure: a deductive system for inductive learning — training examples and a new instance feed a theorem prover, which emits the assertion { c ∈ H } and a classification of the new instance (or “Don’t Know”); the inductive bias is made explicit.]
• Recall: Definition of Induction
  – Induction: finding h such that ∀⟨xi, f(xi)⟩ ∈ D . (B ∧ h ∧ xi) ⊢ f(xi)
    • A ⊢ B means A logically entails B
    • xi ≡ ith target instance
    • f(xi) is the target function value for example xi (data set D = {⟨xi, f(xi)⟩})
    • Background knowledge B (e.g., inductive bias in inductive learning)
• Idea
  – Design an inductive algorithm by inverting the operators of automated deduction
  – Same deductive operators as used in theorem proving
Induction as Inverted Deduction:
Example
• Deductive Query
  – “Pairs ⟨u, v⟩ of people such that u is a child of v”
  – Relations (predicates)
    • Child (target predicate)
    • Father, Mother, Parent, Male, Female
• Learning Problem
  – Formulation
    • Concept learning: target function f is Boolean-valued
    • i.e., target predicate
  – Components
    • Target function f(xi): Child(Bob, Sharon)
    • xi: Male(Bob), Female(Sharon), Father(Sharon, Bob)
    • B: {Parent(x, y) ← Father(x, y). Parent(x, y) ← Mother(x, y).}
  – What satisfies ∀⟨xi, f(xi)⟩ ∈ D . (B ∧ h ∧ xi) ⊢ f(xi)?
    • h1: Child(u, v) ← Father(v, u).  (doesn’t use B)
    • h2: Child(u, v) ← Parent(v, u).  (uses B)
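The entailment check above, B ∧ h2 ∧ xi ⊢ f(xi), can be sketched with a tiny forward-chaining loop over ground Horn rules. This is an illustrative sketch, not code from the paper; the tuple encoding of atoms and the ground instantiation are my own.

```python
def forward_chain(facts, rules):
    """Saturate a fact set under ground Horn rules (head, [body atoms])."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in facts and all(b in facts for b in body):
                facts.add(head)
                changed = True
    return facts

# Instance description x_i, ground over {Bob, Sharon}
x_i = {("Male", "Bob"), ("Female", "Sharon"), ("Father", "Sharon", "Bob")}
# Background theory B: Parent(x, y) <- Father(x, y); Parent(x, y) <- Mother(x, y)
B = [(("Parent", "Sharon", "Bob"), [("Father", "Sharon", "Bob")]),
     (("Parent", "Sharon", "Bob"), [("Mother", "Sharon", "Bob")])]
# h2: Child(u, v) <- Parent(v, u), instantiated at u = Bob, v = Sharon
h2 = [(("Child", "Bob", "Sharon"), [("Parent", "Sharon", "Bob")])]

closure = forward_chain(x_i, B + h2)
# Child(Bob, Sharon) = f(x_i) is derived, so B ∧ h2 ∧ x_i ⊢ f(x_i)
```

Note that h1 would succeed here without B at all, since Father(Sharon, Bob) is in the instance description itself.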
Induction as Inverted Deduction:
Advantages and Disadvantages
• Advantages (Pros)
  – Subsumes the earlier idea of finding h that “fits” the training data
  – Domain theory B helps define the meaning of “fitting” the data: B ∧ h ∧ xi ⊢ f(xi)
  – Suggests algorithms that search H guided by B
    • Theory-guided constructive induction [Donoho and Rendell, 1995]
    • aka knowledge-guided constructive induction [Donoho, 1996]
• Disadvantages (Cons)
  – Doesn’t allow for noisy data
    • Q: Why not?
    • A: Consider what ∀⟨xi, f(xi)⟩ ∈ D . (B ∧ h ∧ xi) ⊢ f(xi) stipulates: entailment must hold for every example, so a single mislabeled example rules out otherwise good hypotheses
  – First-order logic gives a huge hypothesis space H
    • Overfitting…
    • Intractability of computing all acceptable h’s
Inverting Resolution:
Example
Resolution (forward):
  C1: Pass-Exam ∨ ¬Know-Material
  C2: Know-Material ∨ ¬Study
  Resolvent C: Pass-Exam ∨ ¬Study

General schema: from C1 = P ∨ L and C2 = ¬L ∨ R, resolution infers C = P ∨ R.

Inverse resolution: given the resolvent C = Pass-Exam ∨ ¬Study and the parent clause C1 = Pass-Exam ∨ ¬Know-Material, recover the other parent C2 = Know-Material ∨ ¬Study.
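One resolution step and its inversion can be written directly over sets of literals. A minimal propositional sketch (the hyphen-free atom names and the "-" negation prefix are my encoding, not the slide’s; note that inverse resolution is not unique in general, since C2 may also carry extra literals copied from C1):

```python
def resolve(c1, c2, lit):
    """Resolve clause c1 (containing lit) with c2 (containing lit's negation)."""
    neg = lit[1:] if lit.startswith("-") else "-" + lit
    assert lit in c1 and neg in c2
    return (c1 - {lit}) | (c2 - {neg})

def inverse_resolve(c, c1, lit):
    """One inverse-resolution choice: a C2 such that resolve(c1, C2, lit) == c."""
    neg = lit[1:] if lit.startswith("-") else "-" + lit
    return (c - (c1 - {lit})) | {neg}

C1 = frozenset({"PassExam", "-KnowMaterial"})
C2 = frozenset({"KnowMaterial", "-Study"})
C = resolve(C1, C2, "-KnowMaterial")                  # PassExam v -Study
recovered = inverse_resolve(C, C1, "-KnowMaterial")   # recovers C2
```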
Semantic Query Optimization (SQO)
Methodology
• Goals (Section 17.1)
  – Use semantic rules to find “shortcuts” for queries
    • Example: all CIS 864 students have studied basic probability
    • Query: “Find all CIS 864 students who have had courses in probability and stochastic processes”
    • Can drop the probability condition
  – Learn rules from data
    • Observe when a query can be simplified
    • Generalize over these “training cases”
• Background (Section 17.2)
  – Queries: Datalog (cf. the select-from-where subset of Structured Query Language, SQL)
  – Semantic rules: Horn clauses (cf. Prolog)
• Learning Framework (Section 17.3)
  – Concept: SatisfyInputQuery (+ iff an instance, i.e., a tuple, satisfies the query)
  – Algorithm for dropping constraints (generalization): greedy min-set-cover
  – Heuristic (preference bias): gain/cost ratio
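The constraint-dropping shortcut above can be illustrated in a few lines. This is my simplified sketch, not the chapter’s optimizer; the predicate strings and the (body, head) rule encoding are invented for illustration.

```python
def drop_redundant(query, rules):
    """Drop any query constraint implied by the rest via a semantic rule
    (body, head): if body is a subset of query - {head}, head is redundant."""
    query = set(query)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head in query and set(body) <= (query - {head}):
                query.discard(head)
                changed = True
    return query

# Semantic rule: all CIS 864 students have studied basic probability
rules = [({"enrolled(CIS864)"}, "studied(probability)")]
query = {"enrolled(CIS864)", "studied(probability)",
         "studied(stochastic_processes)"}
simplified = drop_redundant(query, rules)
# the probability condition is dropped; the stochastic-processes one stays
```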
Learning Framework and Algorithm
• Given: Few Example Queries, Data Set D (Many Tuples)
• Methodology (Sections 17.3-4)
  – Step 1 (Optimizer): optimize queries by dropping constraints where possible
    • Use the Greedy-Min-Set-Cover algorithm
    • Call the learning module to add rules to the rule base
  – Step 2 (Find Alternative Queries):
    • 2a (Construct Candidate Constraints): use the gain/cost ratio (number of negative (–) cases excluded / syntactic length of constraint)
      • Rationale: Occam’s Razor bias, min-set-cover (ratio-bounded approximation)
    • 2b (Search for Constraints): build on newly introduced relations
  – Step 3 (Update Rule Bank): apply newly discovered rules
    • Put newly induced rules into the rule base
    • Use an inference engine (Prolog) to generate facts that will shorten the query search
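Step 2a’s greedy selection by gain/cost can be sketched as follows. The (name, excluded set, length) layout of candidates is my assumption, and this simplification omits the chapter’s candidate-construction details.

```python
def greedy_constraints(negatives, candidates):
    """Greedy min-set-cover: candidates are (name, excluded_negatives, length);
    repeatedly pick the best gain/cost = newly-excluded count / syntactic length."""
    uncovered, chosen = set(negatives), []
    while uncovered:
        best = max(candidates, key=lambda c: len(c[1] & uncovered) / c[2])
        if not (best[1] & uncovered):   # no candidate excludes more negatives
            break
        chosen.append(best[0])
        uncovered -= best[1]
    return chosen

negatives = {1, 2, 3, 4}
candidates = [("A", {1, 2}, 1), ("B", {3}, 1), ("C", {1, 2, 3, 4}, 5)]
order = greedy_constraints(negatives, candidates)   # ["A", "B", "C"]
```

The short constraints A and B win first on gain/cost despite C alone covering everything; the greedy choice is what gives the ratio-bounded approximation mentioned above.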
Design Rationale
• Problem (Sections 17.1-4)
  – How to generalize well over reformulable queries?
  – Want to ensure the inducer does not overfit the observed pattern of training examples
• Solution Approach (Sections 17.3-4)
  – Idea: Occam’s Razor bias
    • Prefer shorter hypotheses, all other things being equal
    • Why does this work?
• Types of Bias
  – Preference bias
    • Captured (“encoded”) in the learning algorithm
    • Compare: search heuristic
  – Language bias
    • Captured (“encoded”) in the knowledge (hypothesis) representation
    • Compare: restriction of the search space
    • aka restriction bias
Experimental Method
• Experimental Results (Section 17.5)
  – Improvement using SQO by rule induction (Table 17.4)
    • Reformulation using induced rules improves short and long queries (about uniformly)
    • Speedup
    • Breakdown of savings: NIL queries vs. overall
• Claims (Section 17.5)
  – SQO is scalable: rule induction can be used on large DBs
  – SQO is general: other search techniques and heuristics can be applied
Summary:
Content Critique
• Key Contribution
  – Simple, direct integration of inductive rule learning with SQO
  – Significance to KDD: a good way to apply ILP-like learning to DB optimization
  – Applications
    • Inference
    • Decision support systems (DSS)
• Strengths
  – Somewhat generalizable approach
    • Significant for KDD
    • Applies to other learning-for-optimization inducers
  – Formal analysis of SQO complexity
  – Experiments measure
    • Speedup learning: % time saved
    • How wasted time is saved (NIL queries, short vs. long queries); cf. performance profiling
• Weaknesses, Tradeoffs, and Questionable Issues
  – Insufficient comparison with alternative heuristics (MDL, etc.)
  – Empirical performance of exhaustive search?
Summary:
Presentation Critique
• Audience: Researchers and Practitioners of
  – AI (machine learning, intelligent database optimization)
  – Database management systems
  – Applied logic
• Positive and Exemplary Points
  – Good, abstract examples illustrating the roles of SQO and ILP
  – Real DB optimization example (3 Oracle DBs)
• Negative Points and Possible Improvements
  – Insufficient description of analytical hypothesis representations
  – Semantics: not clear how to apply other rule-induction algorithms
    • Decision tree
    • First-order ILP (e.g., FOIL)