Inductive Learning Machine


Learning From Data
Chichang Jou
Tamkang University
Chapter Objectives

• Analyze the general model of inductive learning
• Explain how to select an approximating function
• Introduce the risk functionals for regression and classification problems
• Identify concepts in statistical learning theory
• Discuss the differences between the inductive principles of empirical risk minimization and structural risk minimization
• Discuss practical aspects of the VC dimension
• Compare inductive-learning tasks using graphics
• Introduce validation methods for inductive-learning results

Background

• Biological systems learn to cope with an unknown, statistical environment in a data-driven fashion
• Two phases of the predictive-learning process:
  – Learning or estimating unknown dependencies (induction: progressing from particular cases to a model)
  – Using the estimated dependencies to predict (deduction: progressing from a model and given inputs to particular cases)

Induction, Deduction, Transduction

[Figure: relationships among induction, deduction, and transduction; transduction performs local estimation, as in association rules.]

4.1 Learning Machine

• Machine-learning algorithms vary in their goals, in the available training data sets, and in their learning strategies and representations of data
• Inductive machine learning
  – A generalized model is obtained from a set of samples

Observational Setting of a Learning Machine

[Figure: observational setting of a learning machine. Real-world systems often have unmeasured inputs; the system's output for an input X is characterized by the conditional probability p(Y|X).]

Inductive Learning Machine

• Tries to form generalizations from particular true facts (called the training data set)
  – Formalized as a set of functions that approximate a system's behavior
• Given an input X, the machine implements a set of functions f(X, w), where w is a parameter of the function
  – Its solution requires a priori knowledge

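A minimal sketch of this idea, assuming NumPy and using made-up training data: f(X, w) is a hypothetical parameterized family (here a line), and learning amounts to choosing w.

```python
import numpy as np

# A hypothetical parameterized family of functions f(X, w):
# here a line, f(X, w) = w[0] + w[1] * X. Learning selects w.
def f(X, w):
    return w[0] + w[1] * X

# Illustrative training samples from an unknown system.
X = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 1.9, 4.2, 5.8])

# One way to pick a member of the family: least squares.
w = np.polyfit(X, y, deg=1)[::-1]  # reversed to [intercept, slope]
print(f(X, w))
```
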
Inductive Learning Machine

• The task of inductive inference
  – Given samples (xi, f(xi)), return a function h(x), called a hypothesis, that approximates f(x)

[Figure: linear and non-linear hypotheses fitted to the same samples.]

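To make the linear vs. non-linear contrast concrete, a sketch that fits both kinds of hypothesis h(x) to the same noisy samples; the data and polynomial degrees are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 20)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)  # samples (x_i, f(x_i)) plus noise

# Two candidate hypotheses: a line and a cubic polynomial.
h_linear = np.poly1d(np.polyfit(x, y, deg=1))
h_cubic = np.poly1d(np.polyfit(x, y, deg=3))

# The cubic tracks the underlying non-linear dependency more closely.
for name, h in (("linear", h_linear), ("cubic", h_cubic)):
    print(name, np.mean((h(x) - y) ** 2))
```
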
Inductive Learning Machine

• Statistical dependency vs. causality
  – Inductive-learning processes build models of dependencies, but these should not automatically be interpreted as causal relations
  – Example: people in Florida are, on average, older than people in other states, and married men live longer than single men; these are statistical dependencies, not causal effects

Loss Function and Risk Function

• L(y, f(X, w))
  – Measures the difference between the true output y and the prediction f(X, w)
• The risk functional R(w) is the expected loss over the data distribution: R(w) = ∫ L(y, f(X, w)) p(X, y) dX dy
• Inductive learning is the process of estimating f(X, wopt), where wopt minimizes R(w)

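Since p(X, y) is unknown, the risk is in practice approximated by an average over a sample. A minimal sketch with squared-error loss, assuming NumPy and illustrative data:

```python
import numpy as np

# Squared-error loss L(y, f) and the empirical risk it induces.
def loss(y, y_pred):
    return (y - y_pred) ** 2

def empirical_risk(w, X, y):
    # The sample average of the loss approximates the true risk R(w).
    y_pred = w[0] + w[1] * X
    return np.mean(loss(y, y_pred))

X = np.array([0.0, 1.0, 2.0])
y = np.array([0.2, 0.9, 2.1])
print(empirical_risk([0.0, 1.0], X, y))
```
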
Common Loss Functions

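The slide's table is an image and is not reproduced in the transcript; as a reconstruction, the loss functions most commonly used in this setting are the 0/1 loss for classification and the squared error for regression:

```latex
% 0/1 loss for classification:
L(y, f(X, w)) =
  \begin{cases}
    0 & \text{if } y = f(X, w) \\
    1 & \text{if } y \neq f(X, w)
  \end{cases}
% Squared-error loss for regression:
L(y, f(X, w)) = (y - f(X, w))^2
```
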
Inductive Principle

• An inductive principle is a general prescription (what to do with the data) for obtaining an estimate f(X, wopt*)
• Human intervention in the learning algorithm:
  – Selection of input and output variables
  – Data encoding and representation
  – Incorporating a priori knowledge
  – Influence over the generator of the sampling rate or distribution

4.2 Statistical Learning Theory

• A formalized theory for finite-sample inductive learning, mainly for classification and pattern recognition
  – Provides a quantitative description of the trade-off between model complexity and the available information
  – Also called VC (Vapnik-Chervonenkis) theory
• Other approaches are more engineering-oriented, without formal proofs

Empirical Risk Minimization (ERM)

Typically used when the model is given or approximated first, and then its parameters are estimated from the data.

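ERM replaces the unknown true risk with the training-sample average, Remp(w) = (1/n) Σ L(yi, f(Xi, w)), and picks the w that minimizes it. A minimal sketch for a linear model, assuming NumPy and SciPy are available and using made-up data:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative training sample.
X = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([0.1, 1.2, 2.1, 2.9, 4.2])

def empirical_risk(w):
    # Mean squared-error loss over the training data.
    y_pred = w[0] + w[1] * X
    return np.mean((y - y_pred) ** 2)

# ERM: choose the parameters that minimize the empirical risk.
result = minimize(empirical_risk, x0=np.zeros(2))
print(result.x)  # estimated w_opt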
Empirical Risk Minimization (ERM)

• The consistency property
  – Minimizing the empirical risk on a given data set will, as the sample grows, also minimize the true risk
• Nontrivial consistency
  – The consistency requirement must hold for all approximating functions

Behavior of the Growth Function G(n)

A set of approximating functions whose growth function G(n) grows more slowly than linearly in n has the consistency property.

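For reference, the standard VC-theory statements that the slide's figure illustrates (reconstructed here, not shown in the transcript):

```latex
% Necessary and sufficient condition for consistency of ERM:
\lim_{n \to \infty} \frac{G(n)}{n} = 0
% Behavior of G(n) for a set of functions with finite VC dimension h:
G(n) = n \ln 2 \quad (n \le h),
\qquad
G(n) \le h \left( 1 + \ln \frac{n}{h} \right) \quad (n > h)
```
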
Structural Risk Minimization (SRM)

• ERM works well when n/h is large (n is the number of training samples, h the VC dimension)
• When n/h < 20, use SRM:
  1. Select an element of a structure having optimal complexity
  2. Estimate the model based on the set of approximating functions defined in the selected element of the structure

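A toy illustration of the two SRM steps, using nested sets of polynomials of increasing degree as the structure; a held-out split stands in here for the VC-bound-based complexity selection that SRM formally prescribes, and the data are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

# Structure: nested elements S_1 ⊂ S_2 ⊂ ... = polynomials of degree d.
x_train, y_train = x[::2], y[::2]
x_val, y_val = x[1::2], y[1::2]

def held_out_risk(d):
    # Step 2 inside each element: ERM fit of degree d on the training half.
    h = np.poly1d(np.polyfit(x_train, y_train, d))
    return np.mean((h(x_val) - y_val) ** 2)

# Step 1: select the element (degree) of optimal complexity.
best_degree = min(range(1, 8), key=held_out_risk)
print("selected degree:", best_degree)
```
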
SRM in Practice

SRM

• Applying SRM to non-linear approximations is difficult, and in many cases impossible
  – In practice, heuristics are used, such as early stopping rules and weight initialization
• Three optimization approaches (a sketch of the first follows below):
  – Stochastic approximation (gradient descent)
  – Iterative methods
  – Greedy optimization

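A minimal stochastic-approximation (gradient descent) sketch for minimizing the empirical risk of a linear model; the learning rate, iteration count, and data are illustrative assumptions.

```python
import numpy as np

# Illustrative data for a linear model y ≈ w[0] + w[1] * x.
X = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([0.1, 1.2, 2.1, 2.9, 4.2])

w = np.zeros(2)
lr = 0.05  # learning rate
for _ in range(500):
    y_pred = w[0] + w[1] * X
    grad = np.array([
        np.mean(2.0 * (y_pred - y)),      # d/dw0 of the mean squared error
        np.mean(2.0 * (y_pred - y) * X),  # d/dw1 of the mean squared error
    ])
    w -= lr * grad  # step against the gradient
print(w)
```
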
SRM

• Problems with these optimization approaches
  – Too sensitive to initial conditions
  – Too sensitive to stopping rules
  – Too sensitive to many local minima
• Two useful guidelines
  – Do not attempt to solve a problem by indirectly solving a harder, more general problem
  – Occam's razor: the best performance is provided by a model of optimal complexity

Requirement of Any Inductive-Learning Process

Types of Learning Methods

[Figure: the two common types of inductive-learning methods. Supervised learning (examples: logistic regression, multilayer perceptron, decision rules, decision trees, etc.) and unsupervised learning, with emphasis on a task-independent measure of the quality of representation (examples: cluster analysis, artificial neural networks, association rules).]

Common Learning Tasks

• Classification
• Regression
• Clustering
• Summarization (formalized description)
• Dependency modeling
• Deviation detection (outliers, changes in time)

Data-Mining and Knowledge-Discovery Techniques

• Statistical methods
• Cluster analysis
• Decision trees and decision rules
• Association rules
• Artificial neural networks
• Genetic algorithms
• Fuzzy inference systems
• N-dimensional visualization methods

4.5 Model Estimation

Testing

Objective of Testing

How to Split Samples

Common Resampling Methods

• Resubstitution method
• Holdout method (sketched below)
• Leave-one-out method (sketched below)
• Rotation method
• Bootstrap method

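A minimal sketch of two of these splitting schemes, assuming NumPy; the sample indices are illustrative.

```python
import numpy as np

data = np.arange(10)  # indices of the available samples

# Holdout: one random split into training and testing subsets.
rng = np.random.default_rng(0)
perm = rng.permutation(data)
train, test = perm[:7], perm[7:]

# Leave-one-out: n splits, each holding out exactly one sample.
loo_splits = [(np.delete(data, i), data[i:i + 1]) for i in range(data.size)]

print(train, test, len(loo_splits))
```
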
Error Rate, Accuracy

• R = E / S (error rate: E errors out of S testing samples)
• A = 1 - R = (S - E) / S (accuracy)
• Two classes
  – False negative: false reject rate (FRR)
  – False positive: false acceptance rate (FAR)
• More than two classes
  – Confusion matrix

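A small sketch of these quantities on hypothetical binary predictions (1 = accept/positive, 0 = reject/negative), assuming NumPy:

```python
import numpy as np

# Hypothetical testing-set labels and predictions.
y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])

E = np.sum(y_true != y_pred)  # number of errors
S = y_true.size               # number of testing samples
R = E / S                     # error rate
A = 1 - R                     # accuracy, equal to (S - E) / S

# FRR: positives wrongly rejected; FAR: negatives wrongly accepted.
FRR = np.sum((y_true == 1) & (y_pred == 0)) / np.sum(y_true == 1)
FAR = np.sum((y_true == 0) & (y_pred == 1)) / np.sum(y_true == 0)
print(R, A, FRR, FAR)
```
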
Confusion Matrix for Three Classes

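The slide's matrix is an image; a sketch of how such a matrix is tallied for three classes, with hypothetical labels:

```python
import numpy as np

# Hypothetical true and predicted labels for classes 0, 1, 2.
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 2])

n_classes = 3
cm = np.zeros((n_classes, n_classes), dtype=int)
for t, p in zip(y_true, y_pred):
    cm[t, p] += 1  # rows: true class, columns: predicted class
print(cm)
```
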
Receiver Operating Characteristic (ROC) Curve

• Evaluates FAR and FRR at the same time
• The ROC curve plots sensitivity (1 - FRR) against 1 - specificity (FAR)

[Figure: ROC curve, with FAR on the horizontal axis.]

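A sketch of how ROC points are obtained by sweeping the decision threshold of a scoring classifier; the scores and labels are made up.

```python
import numpy as np

# Hypothetical classifier scores and true binary labels.
scores = np.array([0.9, 0.8, 0.7, 0.55, 0.5, 0.4, 0.3, 0.2])
labels = np.array([1, 1, 0, 1, 0, 0, 1, 0])

# Each threshold yields one (FAR, 1 - FRR) point on the ROC curve.
for t in np.unique(scores):
    pred = (scores >= t).astype(int)
    FAR = np.sum((labels == 0) & (pred == 1)) / np.sum(labels == 0)
    FRR = np.sum((labels == 1) & (pred == 0)) / np.sum(labels == 1)
    print(f"threshold={t:.2f}  FAR={FAR:.2f}  1-FRR={1 - FRR:.2f}")
```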