Machine Learning
CPSC 315 – Programming Studio
Spring 2009
Project 2, Lecture 5
Forms of Learning
Supervised
Learns from examples that provide the desired outputs for given inputs
Unsupervised
Learns patterns in input data when no specific output values are given
Reinforcement
Learns from an indication of correctness at the end of some reasoning
Supervised Learning
Must have training data including
Inputs (features) to be considered in decision
Outputs (correct decisions) for those inputs
Inductive reasoning
Given a collection of examples of function f, return
a function h that approximates f
Difficulty: many functions h may be possible
Hope to pick function h that generalizes well
Tradeoff between the complexity of the
hypothesis and the degree of fit to the data
Consider data modeling: a more complex model can fit the given data more closely but may generalize worse
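A rough data-modeling sketch (not from the lecture; the data and polynomial degrees are made up) illustrates the tradeoff: a high-degree polynomial fits the sample points more closely than a line, yet usually generalizes worse.

import numpy as np

# Noisy samples of an underlying linear function f(x) = 2x + 1 (made-up data)
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = 2 * x + 1 + rng.normal(scale=0.1, size=x.size)

for degree in (1, 9):
    coeffs = np.polyfit(x, y, degree)              # hypothesis h of this complexity
    fit_error = np.abs(np.polyval(coeffs, x) - y).mean()
    print(f"degree {degree}: mean error on the given data = {fit_error:.4f}")

# The degree-9 hypothesis fits the given points almost exactly, but the
# degree-1 hypothesis will usually approximate f better on new inputs.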
Evaluating Supervised
Learning Algorithms
Collect a large set of examples (input/output pairs)
Divide into two disjoint sets
Training data
Testing data
Apply learning algorithm to training data, generating
a hypothesis h
Measure % of examples in the testing data that are
successfully classified by h (or amount of error for
continuously valued outputs)
Repeat above steps for different sizes of training
sets and different randomly selected training sets
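A minimal sketch of this evaluation procedure, written against hypothetical learn() and classify() functions standing in for whatever supervised learner is being tested:

import random

def evaluate(examples, train_fraction, trials, learn, classify):
    """examples is a list of (inputs, correct_output) pairs."""
    accuracies = []
    for _ in range(trials):
        shuffled = examples[:]
        random.shuffle(shuffled)                   # a different random training set each trial
        split = int(train_fraction * len(shuffled))
        training, testing = shuffled[:split], shuffled[split:]
        h = learn(training)                        # hypothesis generated from the training data
        correct = sum(1 for inputs, output in testing
                      if classify(h, inputs) == output)
        accuracies.append(correct / len(testing))  # % of testing examples classified correctly
    return sum(accuracies) / len(accuracies)

Calling evaluate() with several values of train_fraction gives the repeated measurements over different training-set sizes described above.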
Decision Trees
Map features of a situation to a decision
Example from a classification of unsafe acts
Decision Trees
Relation to rule-based reasoning
Features of element used to classify element
Features of situation used to select action
Used as the basis for many “how to” books
How to identify the type of a snake? Based on observable features of the snake
How to fix an automobile? Based on features related to the problem and the state of the automobile
If the features are understandable, the decision tree can be used to explain the decision
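Written as code, a decision tree is just nested rules. The automobile diagnosis below is hypothetical and only shows how observable features lead to a decision; the path of tests taken is itself the explanation.

def diagnose(engine_cranks, battery_voltage_ok, has_fuel):
    # Each test examines one observable feature of the automobile.
    if not engine_cranks:
        if not battery_voltage_ok:
            return "charge or replace the battery"
        return "inspect the starter"
    if not has_fuel:
        return "add fuel"
    return "check the ignition system"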
Learning Decision Trees
Types of Decision Trees
Learning a discrete-valued function is
classification learning
Learning a continuous-valued function is
regression
Assumption: applying Ockham’s razor will result in a more general function
Want the smallest decision tree, but finding it is not tractable
Will settle for a reasonably small tree
Algorithm for Decision Tree
Learning
Basic idea
Recursively select feature that splits data (most)
unevenly
No need to use all features
Heuristic approach
Compare features for their ability to meaningfully split the data
Feature score = (difference in average output value(s) between the two subsets) × (size of the smaller subset); select the feature with the greatest score
Weighting by the smaller subset avoids splitting out individuals too early
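A minimal sketch of this scoring heuristic, assuming boolean features and numeric outputs (all names here are illustrative, not from the lecture):

def feature_score(examples, feature):
    """examples is a list of (feature_dict, output) pairs."""
    yes = [out for feats, out in examples if feats[feature]]
    no = [out for feats, out in examples if not feats[feature]]
    if not yes or not no:
        return 0.0                                 # feature does not split the data at all
    avg_diff = abs(sum(yes) / len(yes) - sum(no) / len(no))
    return avg_diff * min(len(yes), len(no))       # small-subset weight avoids peeling off individuals

def best_feature(examples, features):
    # The recursion would split on this feature, then repeat on each subset.
    return max(features, key=lambda f: feature_score(examples, f))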
Unsupervised Learning
Used to characterize/explain the key features
of a set of data
No notion of desired output
Example: identifying fast-food vs. fine-dining restaurants when the classes are not known ahead of time
Techniques
Clustering (k-means, HAC)
Self-Organizing Maps
Gaussian Mixture Models
More on this topic in Project 3
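As one concrete example of these techniques, here is a minimal k-means sketch; the 2-D points and the choice of k would come from whatever data set is being characterized.

import random

def k_means(points, k, iterations=20):
    centers = random.sample(points, k)             # start from k of the data points
    for _ in range(iterations):
        # Assignment step: group each point with its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2 +
                                            (p[1] - centers[c][1]) ** 2)
            clusters[i].append(p)
        # Update step: move each center to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centers[i] = (sum(p[0] for p in cluster) / len(cluster),
                              sum(p[1] for p in cluster) / len(cluster))
    return centers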
Reinforcement Learning
Many large problems do not have desired
outputs that can be used as training data
Process
Agent (system) performs a set of actions
Agent occasionally receives a reward to indicate that something went right, or a penalty to indicate that something went wrong
Agent has to learn relationship between the
model of the situation, the chosen actions, and
the rewards/penalties
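A minimal sketch of this process; the environment interface (reset/step) and the action-selection function are hypothetical stand-ins, and the update simply nudges the estimated value of each (situation, action) pair toward the rewards and penalties that follow it.

def reinforcement_loop(environment, choose_action, episodes, learning_rate=0.1):
    value = {}                                     # learned value of (situation, action) pairs
    for _ in range(episodes):
        situation = environment.reset()
        history = []
        done = False
        while not done:
            action = choose_action(value, situation)
            history.append((situation, action))
            situation, reward, done = environment.step(action)
            if reward:                             # occasional reward (+) or penalty (-)
                for s, a in history:               # credit the actions taken so far
                    old = value.get((s, a), 0.0)
                    value[(s, a)] = old + learning_rate * (reward - old)
    return value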
Analogy to Animal Training
We cannot tell our pets what is right and wrong in
(preconditions, action) pairs
Instead we reward good behavior (giving treats) and
penalize bad behavior (spraying water or loud noise)
Pet has to learn when, where, and what behavior is appropriate
Can result in incorrect interpretations (go in the corner vs. go outside)
Difficulty: which of the prior/recent actions caused the positive/negative outcome?
Clicker training for animals is meant to help with this
Reinforcement Learning in
Games
Simplest reinforcements
Winning or losing
Requires lots of games/time to learn
Other potential reinforcements
Opponent’s action selection
Did they minimize your goodness value?
Modify the goodness function to better match their moves
Potential to learn an individual’s values/strategy
Predicted goodness value vs. observed goodness value
Can be used on small (a few moves) or large (a whole game) time scales
Similar to a person reflecting on when things went wrong
Need to be careful in the implementation, or the goodness function will degenerate to returning a constant (and thus be trivially consistent)
Modifying a Goodness
Function
Consider the game of chess
Presume goodness function has three linear
components
BoardControl
the difference between the number of board positions that
Player1 and Player2 can get a piece to in one move
Threatened
the difference between the number of opponent pieces threatened (i.e., can be taken in one move) by Player1 and by Player2
Pieces
the difference in the sum of the values of pieces left for
Player1 and Player2 where Queen = 10, Rook = 6, Bishop = 3,
Knight = 3, Pawn = 1
Modifying a Goodness
Function
G(s) = a*BoardControl + b*Pieces + c*Threatened
Modify the coefficients to learn an appropriate weighting of the terms
Quantity of the overall modification should relate to the difference between the predicted goodness and the observed goodness
Direction of the modification to each linear component should be related to whether that component is consistent with or disagrees with the outcome
Could modify coefficients by fixed values (e.g. +/- 0.1) or by values that are a function of their effect on the overall G for the state being considered
In theory, such a computer player could recognize that
BoardControl is more important early in a game, Pieces is more
important mid-game, and Threatened is more important for the
end game.
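A minimal sketch of such an update. The feature functions (board_control, pieces, threatened) are assumed to be defined elsewhere; only the weighted sum G and a fixed-step coefficient adjustment are shown.

def goodness(state, coeffs, features):
    # G(s) = a*BoardControl + b*Pieces + c*Threatened, with coeffs = {"a": ..., "b": ..., "c": ...}
    return sum(coeffs[name] * feature(state) for name, feature in features.items())

def update_coefficients(state, coeffs, features, observed, step=0.1):
    """Nudge each coefficient toward agreement with the observed goodness."""
    predicted = goodness(state, coeffs, features)
    error = observed - predicted                   # overall amount of disagreement
    for name, feature in features.items():
        component = feature(state)
        if component != 0:
            # Strengthen a component that pushes the prediction toward the
            # observed value, weaken one that disagrees (fixed +/- step).
            coeffs[name] += step if error * component > 0 else -step
    return coeffs

# Hypothetical usage for the chess example:
# features = {"a": board_control, "b": pieces, "c": threatened}
# coeffs = {"a": 1.0, "b": 1.0, "c": 1.0}
# coeffs = update_coefficients(state, coeffs, features, observed_goodness)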