Why Machine Learning? - Lehrstuhl für Informatik 2
Download
Report
Transcript Why Machine Learning? - Lehrstuhl für Informatik 2
Machine learning
Overview
PD. Dr. Gabriella Kókai
[email protected]
Friedrich-Alexander-Universität
Lehrstuhl für Informatik 2
Raum 04.131
Tel: 8528996
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
1
Machine Learning: Content
Why Machine Learning?
How can a learning problem be defined
Designing a learning system: learning to play checker
Perspectives and questions in ML
Summary
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
2
Why Machine Learning? (1/10)
Webster 's definition of 'learn'
'To gain knowledge, or understanding of, or skill in by study
instruction or experience‘
Simons' definition (Machine Learning I, 1993, Chapter 2.)
'Learning denotes changes in the system that are adaptive in the
sense that they enable the system to do the same task or tasks
drawn from the same population more effectively the next
time‘
Donald Michie's Definition (Computer Journal 1991)
'A learning system uses sample data to generate an update basis
for improved (performance) on subsequent data from the same
source and express the new basis in intelligible symbolic form'
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
3
Why Machine Learning? (2/10)
Machine learning is typically thought of as a sup-topic of artificial
intelligence.
It is inspired by several disciplines
Cognitive
Science
Pattern
Recognition
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
Machine
Learning
Computer
Science
Statistic
4
Why Machine Learning? (3/10)
Relevant topics:
Artificial Intelligence:Learning: Learning symbolic representation of concepts, ML as search
problem , Prior knowledge + training examples guide the learning-process
Bayesian Methods:Calculating probabilities of the hypotheses, Bayesian-classifier
Theory of the computational complexity: Theoretical bounds of the complexity for different
learning task measured in the terms of the computational effort, number of different training
examples, the number of mistakes required in order to learn
Information theory:Measurement of the entropy, minimal description length, optimal codes and
their relationship to optimal training sequences for encoding a hypothesis
Philosophy: Occam's razor suggesting the simpliest hypothesis is the best
Psychology and Neurobiology: Motivation of NN the power law of the practice
Statistics: Characterisation of the errors (e.g. bias,variance), that occur when estimating the
accuracy of hypothesis based, confidence interval, statistical tests
Goal: Description of the different learning paradigms, the algorithms, the
theoretical results and applications
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
5
Why Machine Learning?(4/10)
Dimension: Constraints
Task/objective
Availability of the background knowledge
Encoded
Interactive
Availability of data
Learning task
Performance task
Incremental vs. batch
Passive vs. active
Characteristics of the data
Static vs. drifting
Propositional or first-order
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
6
Why Machine Learning?(5/10)
Dimension: Approach
Search mechanism
Top-Down (model driven)
Bottom-up (data driven)
Many others
Reasoning methods
Induction, abduction, deduction
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
7
Why Machine Learning? (6/10)
Deductive Reasoning: T B |= E
Inductive Reasoning:E B |= T
Abductive Reasoning: E T |= B
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
8
Why Machine Learning? (7/10)
Evaluation Methodologies
Mathematical
Previously: Learning in the limit
Now: PAC (Probably Approximately Correct)
Recent:
Best cases analysis (Helpful Teacher Model)
Average case analysis (constraining assumption)
Empirical:When mathematical analysis isn't obvious
More tolerant
Addresses efficiency constraints
Popular
Data intensive
Psychological
Goal: Model human learning behaviour
Method: Comparison with subject data
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
9
Why Machine Learning? (8/10)
Knowledge-Poor Supervised Learning
Knowledge-Intensive Supervised Learning
Given: A training set of annotated instances
To Induce: A hypothesis (concept description)
Given : A set of training instances + a hypothesis of the target concept +
background knowledge
To Induce: A modified hypothesis (concept description)
that is consistent with the domain theory & the training instances
Unsupervised learning: clustering
Given: A set of unclassified instances I
Have not any special target attribute
To Do: Create a set of clusters for I according to their presumed classes
Clusters need not to be disjoint
Clusters can be hierarchically related
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
10
Why Machine Learning? (9/10)
Paradigms knowledge-poor supervised learning:
Paradigms knowledge-intensive supervised learning:
Concept learning
Decision tree (ID3, TIDT)
Rule based
Lazy learning
Genetic algorithms
Neural networks
Bayesian networks
Explanation based learning
Inductive Logic Programming
Unsupervised learning
Bayesian learning
Clustering
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
11
Why Machine Learning? (10/10)
Importance: How can computers be programmed that they 'learn'
Machine learning natural learning
Application areas
Data mining: automatic detection of regularity in big amounts of
data
Implementation of software, which cannot be easily programmed
by hand
Self adaptive programs: programs for playing
Theoretical results: Connection among the number of training
examples, the hypothesis and the expected error
Biological studies
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
12
How can the learning problem be defined
Definition: A computer program is said to learn from experience E
with respect to some class of tasks T and performance measure P, if its
performance at tasks in T, as measured by P improves with experience
E
Example: Learning to play checker
Task T: design a program to learn to play checker
Performance measure P: The percentage of the games won
Experience E: Playing against itself
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
13
Content
Why Machine Learning?
How can the learning problem be defined
✗
✗
✗
✗
Choosing the training experience
Choosing the target function
Choosing the representation of the target function
Choosing a function approximation algorithm
Designing a learning system: learning to play checker
Perspectives and questions in ML
Summary
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
14
Choosing the Training Experience (1/2)
What experience is provided
Direct or indirect feedback regarding the choices executed by the system
Direct: Individual checker board states and the correct move for each
Indirect: move sequences and final outcomes
Problem: determining the degree to which each move in the sequence deserves credit
or blame for the final outcome (credit assignment)
The rate of the controls of the sequence of the training examples by the learning
system
The teacher selects informative board states and provides the correct move for each
The learner might itself propose board states that it finds particularly confusing and
ask the teacher for the correct move
The learner may have complete control over both the board states and the (indirect)
training classification, as it does when it learns playing against itself with no teacher
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
15
Choosing the Training Experience (2/2)
How well does it represent the distribution of examples over
which the final system performance P must be measured
Problem: The distribution of the training examples is identical
to the distribution of the test examples
A checkers learning problem:
Task T: playing checker
Performance measure P: percentage of games won in the world
tournament
Training experience E: games played against itself
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
16
Choosing the Target Function (1/2)
What type of knowledge will be learned and how will this be used
by the performaning program
Example: The program needs to learn how to choose the best
move from any board state
ChooseMove: B M
B: the set of legal board state
M: the set of legal moves
Problem: difficult to learn if only the kind of indirect training
experience is available to our system => V : B
B: the set of legal board states
: some real value
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
17
Choosing the Target Function (2/2)
Question: Definition of the target function V:
If b is a final board state that is won, then V b = 100
If b is a final board state that is lost, then V b = 100
If b is a final board state that is drawn, then V b = 0
If b is not a final state in the game, then V b = V b'
where b' is the best final board state that can be achieved
starting from b and playing optimally until the end of the game
(assuming the opponent plays optimally as well).
Problem: While this definition specifies a value of V(b) for every
board state b recursively, this definition is not usable by our V̂
checker's player because it is not efficiently computable
Solution: Discovering an operational description of the ideal
target function V, Difficult => learning some approximation
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
18
Choosing a Function Approximation
Algorithm (1/2)
How can V̂be represented?
For any given board state, the function V̂
will be calculated as
a linear combination of weights w i , i = 0, , 6
w0 + w1 bp b + w 2 rp b + w3 bk b + w 4 rk b + w 5 bt b + w 6 rt b
bp(p): the number of black pieces on the board
rp(b): the number of red pieces on the board
bk(b): the number of black kings on the board
rk(b): the number of red kings on the board
bt(b): the number of black pieces threatened by red
(i.e., which can be captured on red's next turn)
rt(b): the number of red pieces threatened by black
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
19
Choosing a Function Approximation
Algorithm (2/2)
Partial design of a checker learning program:
Task T: playing checker
Performance measure P: percentage of games won in the world
tournament
Training experience E: games played against itself
Target function
V : Board
Target function representation
V̂ b :
V b
w 0 w1 bp b
w2 rp b
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
w3 bk b
w 4 rk b
w5 bt b
w6 rt b
20
Choosing a Function Approximation Algorithm:
Estimating Training Values
How to assign training values to the more numerous intermediate
board states?
Approach: assign the training value of Vtrain b for any
intermediate board state b to be V̂ Successor b ,
where V̂ is the learner's current approximation to V and
where Successor(b) denotes the next board state following b
for which it is again the program's turn to move.
Rule for estimating the training values:
ˆ Successor b
Vtrain b V
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
21
Choosing a Function Approximation Algorithm:
Adjusting the Weights
LMS Weight update rule
(choosing the weights w i to best fit the set of training examples)
Best fit:minimise the squared error E between the training values
and the values V̂ b predicted by the hypothesis:
E
(b,V
(b))t
train
V
train
ˆ b
b V
2
For each training example (b, Vtrain (b))
Use the current weights to calculate:
ˆ b
error b = Vtrain b V
For each fi bp, rp, bk, rk, bt, rt update w i
w i w i + c fi b error b
c is a small constant 0,1 that moderates the size weight update.
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
22
Some Issues in Machine Learning
What algorithms can approximate functions well (and when?)
How does the number of training examples influence the accuracy?
How does the complexity of the hypothesis representation impact it?
How does noisy data influence the accuracy?
What are the theoretical limits of learnability?
How can prior knowledge of the learner help?
What clues can we get from a biological learning system?
How can systems alter their own representation?
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
23
Summary
Goal: Building computer programs that improve their performance at some
task through experience
Application domain:
Data Mining: discover automatically implicit regularities in large data sets
Poorly understood domains where humans might not have the knowledge
needed to develop effective algorithms
Domains where the program must dynamically adapt to changing conditions
ML draws on ideas from several sets of disciplines, including artificial
intelligence, probability and statistics, computational complexity
information theory, psychology and neurobiology, control theory and
philosophy
Well defined learning problem =
well specified task + performance metric + source of training examples
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
24