
Introduction to Learning
ECE457 Applied Artificial Intelligence
Spring 2008
Lecture #11
Outline

- Overview of learning
- Supervised learning
  - Russell & Norvig, sections 18.1-18.3, 19.1
- Unsupervised learning
  - Russell & Norvig, section 20.3
- Reinforcement learning
  - Russell & Norvig, sections 21.1, 21.3, 21.5
- CS 498 & CS 698 (Prof. Ben-David)
Limit of Predefined Knowledge

- Many of the algorithms and techniques we saw relied on predefined information
  - Probability distributions
  - Heuristics
  - Utility functions
- This only works if the information is easily available
- For real-world applications, it is often preferable to have the agent learn the information automatically
Overview of Learning

- Learn what?
  - Facts about the world (KB)
  - Decision-making strategy
  - Probabilities, costs, functions, states, …
- Learn from what?
  - Training data
    - Often freely available for common problems
  - Real world
- Learn how?
  - Need some form of feedback to the agent
Learning and Feedback

- Supervised learning
  - Training data includes the correct output
  - Learn the relationship between data and output
  - Evaluate with statistics
- Unsupervised learning
  - Training data with no correct output
  - Learn patterns in the data
  - Evaluate with the fitness of the pattern
- Reinforcement learning
  - Set of actions & state rewards or punishments
  - Learn to maximize reward & minimize punishment
  - Evaluate with the value of the reward
Supervised Learning

- Given a training corpus of data-output pairs
  - x & y values
  - Email & spam/not spam
  - Variable values & decision
- Learn the relationship mapping the data to the output
  - f(x)
  - Spam features
  - Decision rules
Supervised Learning Example

[Figure: 2D scatter plot of + and – points separated by a learned curve]

- 2D state space with binary classification
- Learn a function to separate the two classes
Decision Tree

[Figure: the same scatter plot with decision boundaries at X=2, X=5, Y=4, Y=8, and the corresponding tree: X>5? yes → –; X<2? yes → –; Y>8? yes → –; Y<4? yes → –; otherwise +]
Decision Tree

[Figure: the same scatter plot with an equivalent two-level tree: X branches to – for X<2 or X>5, otherwise to Y; Y branches to – for Y<4 or Y>8, otherwise to +]
Decision Tree

- Multiple variables and their values decide whether an email is spam

Email  UW  Pics  Words  Attach  Multi  Spam?
E1     N   Y     58     Y       Y      Yes
E2     N   N     132    N       Y      No
E3     N   Y     1049   Y       Y      Yes
E4     N   N     18     Y       Y      Yes
E5     Y   N     26     Y       Y      No
E6     N   Y     32     N       Y      Yes
E7     Y   N     44     N       Y      No
E8     N   Y     256    N       Y      Yes
E9     Y   Y     2789   Y       N      No
E10    N   Y     857    Y       N      No
E11    N   N     732    N       N      No
E12    N   Y     541    N       Y      Yes
Decision Tree

- Build the decision tree

[Figure: tree with root UW (Yes → Not Spam; No → Multi), Multi (No → Not Spam; Yes → Attach), Attach (Yes → Spam; No → Pictures), Pictures (Yes → Spam; No → Not Spam)]
Supervised Learning Algorithm

[Figure: the same 2D scatter plot of + and – points]

- Many possible learning techniques, depending on the problem and the data
  - Start with an inaccurate initial hypothesis
  - Refine it to reduce error or increase accuracy
  - End with a trade-off between accuracy and simplicity
Trade-Off

[Figure: scatter plot with a few – points scattered inside the + region and vice versa]

- Why a trade-off between accuracy and simplicity?
  - Noisy data
  - Special cases
  - Measurement errors
Supervised Learning Algorithm

- Learning a decision tree follows the same general algorithm
  - Start with all emails at the root
  - Pick the attribute that will teach us the most, i.e. the one with the highest information gain (the difference in the probability of each class); a sketch of this step follows below
  - Branch using that attribute
  - Repeat until the trade-off between the accuracy of the leaves and the depth limit / relevance of the attributes is reached
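To make the attribute-selection step concrete, here is a minimal sketch (the dict-based email representation, the "Spam?" label name, and the function names are illustrative assumptions, not code from the lecture):

```python
# Minimal sketch of greedy decision-tree attribute selection.
import math
from collections import Counter

def entropy(examples, label="Spam?"):
    """Entropy of the class distribution among the examples."""
    counts = Counter(ex[label] for ex in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attribute, label="Spam?"):
    """Entropy reduction obtained by branching on the given attribute."""
    total = len(examples)
    remainder = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex for ex in examples if ex[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset, label)
    return entropy(examples, label) - remainder

# Pick the attribute that teaches us the most, branch on it, then repeat
# on each branch with the remaining attributes.
emails = [
    {"UW": "N", "Pics": "Y", "Attach": "Y", "Multi": "Y", "Spam?": "Yes"},
    {"UW": "N", "Pics": "N", "Attach": "N", "Multi": "Y", "Spam?": "No"},
    {"UW": "Y", "Pics": "N", "Attach": "Y", "Multi": "Y", "Spam?": "No"},
]
best = max(["UW", "Pics", "Attach", "Multi"],
           key=lambda a: information_gain(emails, a))
```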
Supervised Learning Evaluation

- Statistical measures of the agent's performance
  - RMS error between f(x) and y
  - Making the correct decision
    - With as few decision rules as possible
    - Shallowest tree possible
  - Accuracy of a classification
  - Precision and recall of a classification
Precision and Recall

- Binary classification: distinguish + (our target) from – (everything else)
- The classifier makes mistakes
  - It classifies some + as – and some – as +
- Define four categories:

                 Actual +          Actual –
Classified +     True Positives    False Positives
Classified –     False Negatives   True Negatives
Precision and Recall

- Precision
  - Proportion of selected items the classifier got right
  - TP / (TP + FP)
- Recall
  - Proportion of target items the classifier selected
  - TP / (TP + FN) (see the sketch below)
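As a quick illustration, a minimal sketch that counts TP, FP, and FN from predicted/actual label pairs (the pair-list representation is an assumption for illustration):

```python
# Minimal sketch: precision and recall from predicted/actual label pairs.
def precision_recall(predicted, actual, target="+"):
    """Compute precision = TP/(TP+FP) and recall = TP/(TP+FN)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == target and a == target)
    fp = sum(1 for p, a in zip(predicted, actual) if p == target and a != target)
    fn = sum(1 for p, a in zip(predicted, actual) if p != target and a == target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Example: 2 true positives, 1 false positive, 1 false negative.
print(precision_recall(["+", "+", "+", "-"], ["+", "+", "-", "+"]))
# -> (0.666..., 0.666...)
```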
Precision and Recall

- Why ignore True Negatives?
- Typically, there are a lot more negatives than positives
  - Internet searches: + are the target websites, – are all other websites
- Counting TN would skew the statistics and favour a system that classifies everything as negative
Overfitting

- A common problem with supervised learning is over-specializing the learned relation to the training data
- Learning from irrelevant features of the data
  - Email features such as: paragraph indentation, number of typos, letter "x" in sender address, …
- Works well on training data
  - Because of poor sampling or random chance
- Fails in real-world tests
Testing Data

- Evaluate the learned relation using unseen test data
  - i.e. data that was not used in training
  - Therefore the system is not overfitted to it
- Split the training data beforehand, and keep part of it away for testing
  - Only works once!
  - If you reuse testing data, you are overfitting your system to that test!!
  - Never do that!!!
Cross-Validation

- Shortcomings of holding out test data
  - The test only works once
  - Training on less data, therefore the result is less accurate
- n-fold cross-validation (sketched below)
  - Split the training corpus into n parts
  - Train with n−1 parts, test with the remaining 1
  - Run n tests, each time using a different test part
  - Do a final training run with all the data and the best features
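A minimal sketch of the n-fold procedure (the train_and_score callable, which trains on one set and returns a test score, is an assumed stand-in):

```python
# Minimal sketch of n-fold cross-validation.
def cross_validate(data, train_and_score, n=5):
    """Average test score over n folds; each fold is held out once."""
    folds = [data[i::n] for i in range(n)]  # n roughly equal parts
    scores = []
    for i in range(n):
        test = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        scores.append(train_and_score(train, test))
    return sum(scores) / n
```

Because every example is tested exactly once, the averaged score uses all the data without ever scoring an example the fold's model was trained on.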
Naïve Bayes Classifier

- P(Cx|F1,…,Fn) = P(Cx) · Πi P(Fi|Cx)
  - Classify an item in the class Cx with maximum probability
- Weighted Naïve Bayes Classifier
  - Paper on website
  - Give each feature Fi a weight wi
  - Learn the proper weight values
  - P(Cx|F1,…,Fn) = P(Cx) · Πi P(Fi|Cx)^wi (see the sketch below)
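A minimal sketch of weighted naïve Bayes scoring, under assumed input shapes (priors, per-feature conditional tables, and weights as plain dicts and lists; none of this is prescribed by the slides):

```python
# Minimal sketch of weighted naive Bayes scoring. Assumed inputs:
#   prior   = {class: P(C)}
#   cond    = {class: [ {feature value: P(Fi|C)} for each feature i ]}
#   weights = [wi for each feature i]
def score(c, features, prior, cond, weights):
    """Unnormalized P(Cx) * product over i of P(Fi|Cx)^wi."""
    s = prior[c]
    for i, f in enumerate(features):
        s *= cond[c][i][f] ** weights[i]
    return s

def classify(features, prior, cond, weights):
    """Pick the class Cx with the maximum weighted naive Bayes score."""
    return max(prior, key=lambda c: score(c, features, prior, cond, weights))
```

With all weights equal to 1 this reduces to the plain naïve Bayes rule above.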
Learning the Weights

- Start with initial weight values
- At each iteration, for each feature
  - Measure the impact of that feature on the accuracy of the classification
  - Modify the weight to increase the accuracy of the classification
- End if
  - The iteration limit is reached
  - The accuracy increase is less than a threshold
Learning the Weights

- Define the following
  - Initial weight values: wi(0) = 1
  - Learning rate: η
  - Measure of the accuracy of the classification using feature Fi at iteration n: Ain
  - Function to convert Ain into a weight variation: Δ(Ain)
    - Δ(Ain) = (1 + e^(−Ain))^(−1) · [1 − (1 + e^(−Ain))^(−1)]²
  - Threshold improvement in accuracy: є
  - Iteration limit: nmax
Learning the Weights

- Start with wi(0)
- At iteration n, for feature Fi (loop sketched below)
  - Measure Ain
  - Compute Δwi(n) = η · Δ(Ain)
  - wi(n) = wi(n−1) + Δwi(n)
- End if
  - n = nmax (entire algorithm)
  - ΔAin < є (feature Fi)
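A minimal sketch of this update loop (the measure_accuracy callable standing in for Ain, and the default η, є, and nmax values, are assumptions):

```python
# Minimal sketch of the weight-update loop for the weighted classifier.
import math

def delta(a):
    """Convert an accuracy measure Ain into a weight variation."""
    s = 1.0 / (1.0 + math.exp(-a))     # (1 + e^-Ain)^-1
    return s * (1.0 - s) ** 2

def learn_weights(num_features, measure_accuracy, eta=0.1,
                  eps=1e-4, n_max=100):
    w = [1.0] * num_features                    # wi(0) = 1
    prev = [float("-inf")] * num_features
    for n in range(1, n_max + 1):               # stop at the iteration limit
        for i in range(num_features):
            a = measure_accuracy(i, w)          # Ain
            if a - prev[i] < eps:               # improvement below threshold:
                continue                        # stop updating feature Fi
            w[i] += eta * delta(a)              # wi(n) = wi(n-1) + η·Δ(Ain)
            prev[i] = a
    return w
```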
Unsupervised Learning

- Given a training corpus of data points
  - Observed values of random variables in a Bayesian network
  - Series of data points
  - Orbits of planets
- Learn the underlying pattern in the data
  - Existence and conditional probability of hidden variables
  - Number of classes and classification rules
  - Kepler's laws of planetary motion
Unsupervised Learning Example

[Figure: 2D scatter plot of unlabeled * points forming a few groups]

- 2D state space with unclassified observations
- Learn the number and form of the clusters
- Problem of unsupervised clustering
  - Many algorithms have been proposed for it
  - More research is still being done on better algorithms, different kinds of data, …
Unsupervised Learning Algorithm

[Figure: the same scatter plot, with clusters grown around seed elements]

- Define a similarity measure to compare pairs of elements
- Starting with no clusters (sketched below)
  - Pick a seed element
  - Group similar elements with it until a threshold is reached
  - Pick a new seed from the free elements and start again
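A minimal sketch of this seed-based procedure on 2D points (the Euclidean similarity measure and the fixed distance threshold are illustrative assumptions):

```python
# Minimal sketch of seed-based clustering.
import math

def distance(p, q):
    return math.dist(p, q)  # Euclidean distance in the 2D state space

def seed_clustering(points, threshold):
    """Grow clusters around seeds until every point is assigned."""
    free = list(points)
    clusters = []
    while free:
        seed = free.pop(0)                      # pick a seed element
        cluster = [seed]
        # group all sufficiently similar free elements with the seed
        cluster += [p for p in free if distance(p, seed) <= threshold]
        free = [p for p in free if distance(p, seed) > threshold]
        clusters.append(cluster)
    return clusters

print(seed_clustering([(0, 0), (0, 1), (5, 5), (5, 6)], threshold=2.0))
# -> [[(0, 0), (0, 1)], [(5, 5), (5, 6)]]
```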
Unsupervised Learning Algorithm

[Figure: the same scatter plot, with one large cluster being split]

- Starting with one all-encompassing cluster
  - Find the cluster with the highest internal dissimilarity
  - Find the most dissimilar pair of elements inside that cluster
  - Split it into two clusters
  - Repeat until all clusters have internal homogeneity
  - Merge homogeneous clusters
Unsupervised Learning Evaluation

- Need to evaluate the fitness of the learned relationship
  - Number of clusters vs. their internal properties
  - Difference between clusters vs. internal homogeneity
  - Number of parameters vs. number of hidden variables in a Bayesian network
- No way of knowing what the optimal solution is
K-Means

- Popular unsupervised clustering algorithm
- Data represented as a cloud of points in state space
- Target
  - Group the points in k clusters
  - Minimize intra-cluster variance
K-Means

- Start with k random cluster centers
- For each iteration
  - For each data point
    - Associate the point with the nearest cluster center
    - Add its distance to the variance
  - Move each cluster center to the center of mass of its associated data point cloud
- End when
  - The variance is less than a threshold
  - The cluster centers stabilize
K-Means

- We have:
  - Data points: x1, …, xi, …, xn
  - Clusters: C1, …, Cj, …, Ck
  - Cluster centers: μ1, …, μj, …, μk
- Minimize the intra-cluster variance (see the sketch below)
  - V = Σj Σ(xi ∈ Cj) |xi − μj|²
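A minimal sketch of k-means on 2D points that follows the algorithm above (the fixed iteration cap and the empty-cluster handling are simplifications for illustration):

```python
# Minimal sketch of k-means clustering on 2D points.
import math
import random

def kmeans(points, k, iterations=100):
    centers = random.sample(points, k)          # k random cluster centers
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:                        # associate each point with
            j = min(range(k),                   # its nearest cluster center
                    key=lambda j: math.dist(p, centers[j]))
            clusters[j].append(p)
        new_centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centers[j]                # keep old center if empty
            for j, c in enumerate(clusters)
        ]
        if new_centers == centers:              # centers have stabilized
            break
        centers = new_centers
    variance = sum(math.dist(p, centers[j]) ** 2   # V = Σj Σ(xi∈Cj) |xi-μj|²
                   for j, c in enumerate(clusters) for p in c)
    return centers, clusters, variance

centers, clusters, v = kmeans([(0, 0), (0, 1), (5, 5), (5, 6)], k=2)
```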
K-Means Example

[Figure: scatter plot of * data points with o cluster centers moving toward the centers of mass of their clusters]
Reinforcement Learning

- Given a set of possible actions, the resulting state of the environment, and a reward or punishment for each state
  - Taxi driver: tips, car repair costs, tickets
  - Checkers: advantage in number of pieces
- Learn to maximize the rewards and/or minimize the punishments
  - Maximize tips, minimize damage to the car and police tickets: drive properly
  - Protect your own pieces, take enemy pieces: good play strategy
Reinforcement Learning

- Learning by trial and error
- Try something, see the result
  - Speeding results in tickets, going through a red light results in car damage, a quick and safe drive results in tips
  - Checkers pieces in the center of the board are soon lost, pieces on the side are kept longer; sacrifice some pieces to take a greater number of enemy pieces
- Sacrifice known rewarding actions to explore new, potentially more rewarding actions
- Develop strategies to maximize rewards while minimizing penalties over the long term
Q-Learning

- Each state has
  - A reward or punishment
  - A list of possible actions, which lead to other states
- Learn the value of state-action pairs
  - Q-value
Q-Learning

- Update the value of the previous (t−1) state-action pair based on the current (t) state-action value
  - ΔQ(st-1,at-1) = α · [Rt-1 + γ · maxa Q(st,a) − Q(st-1,at-1)]
    - Q(s,a): estimated value of the state-action pair (s,a)
    - Rt: reward of state st
    - α: learning rate
    - γ: discount factor of future rewards
      - γ = 0: future rewards are irrelevant; γ = 1: future rewards are worth the same as current rewards
  - Update: Q(st-1,at-1) = Q(st-1,at-1) + ΔQ(st-1,at-1) (one update step is sketched below)
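A single tabular Q-learning update, as a minimal sketch (the dict-of-dicts Q table and the example values are assumptions for illustration):

```python
# Minimal sketch of one tabular Q-learning update.
def q_update(Q, s_prev, a_prev, r_prev, s_now, actions, alpha=0.1, gamma=0.9):
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[s_now][a] for a in actions)
    Q[s_prev][a_prev] += alpha * (r_prev + gamma * best_next
                                  - Q[s_prev][a_prev])
    return Q

# Example with two states and two actions:
Q = {"s0": {"left": 0.0, "right": 0.0}, "s1": {"left": 1.0, "right": 0.5}}
q_update(Q, "s0", "right", r_prev=0.2, s_now="s1", actions=["left", "right"])
print(Q["s0"]["right"])  # 0.1 * (0.2 + 0.9*1.0 - 0.0) = 0.11
```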
Exploration Function

- If the agent always does the action with maximum Q(s,a), it always evaluates the same state-action pairs
- Need an exploration function
  - Trade-off between greed and curiosity
  - Try rarely-explored low-payoff actions instead of well-known high-payoff actions
- Many possible functions
Exploration Function

- Define:
  - Q(s,a): estimated value of (s,a)
  - N(s,a): number of times (s,a) has been tried
  - Rmax: maximum possible value of Q(s,a)
  - Nmin: minimum number of times we want the agent to try (s,a)
- f(Q(s,a), N(s,a)) = Rmax if N(s,a) < Nmin, Q(s,a) otherwise (sketched below)
- The agent picks the action with the maximum f(.) value
- This guarantees each (s,a) pair is explored at least Nmin times
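A minimal sketch of this optimistic exploration function (the Q and N tables are assumed to be dicts keyed by state and action):

```python
# Minimal sketch of the optimistic exploration function: pairs tried fewer
# than Nmin times look maximally rewarding, so the agent must explore them.
def f(q_value, n_tries, r_max, n_min):
    """Rmax while (s,a) has been tried fewer than Nmin times, else Q(s,a)."""
    return r_max if n_tries < n_min else q_value

def pick_action(state, actions, Q, N, r_max, n_min):
    """Pick the action with the maximum f(Q(s,a), N(s,a)) value."""
    return max(actions,
               key=lambda a: f(Q[state][a], N[state][a], r_max, n_min))
```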
Limits of RL

- Search
  - The number of state-action pairs can be very large
  - Intermediate rewards can be noisy
- Real-world search
  - The initial policy can have a very poor reward
  - The necessary exploration of suboptimal actions can be costly
  - Some states are hard to reach
Policy

- Learn the optimal policy in a decision network
  - π : S → A
  - EU(π) = Σ(t=0 to ∞) γ^t · Rt
- Greedy search
  - Modify the policy until EU stops increasing (a worked example of EU follows below)
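As a small worked example of the discounted-utility formula, a sketch with a finite horizon standing in for the infinite sum:

```python
# Minimal sketch: discounted utility EU = sum over t of gamma^t * Rt,
# truncated to a finite reward sequence.
def expected_utility(rewards, gamma=0.9):
    return sum(gamma ** t * r for t, r in enumerate(rewards))

print(expected_utility([0, 0, 10]))  # 0 + 0 + 0.9**2 * 10 = 8.1
```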
Helicopter Flight Control

- Sustained stable inverted flight
  - Very difficult for humans
  - First AI able to do it
Helicopter Flight Control

- Collect flight data with a human pilot
- Learn a model of the helicopter dynamics
  - Stochastic and nonlinear
  - Supervised learning
- Learn a policy for the helicopter controller
  - Reinforcement learning
Helicopter Dynamics

- States
  - Position, orientation, velocity, angular velocity
  - 12 variables
- 391 seconds of flight data
  - Time step of 0.1 s
  - 3910 triplets (st, at, st+1)
- Learn the probability distribution P(st+1|st,at)
- Implemented in a simulator and tested by the pilot
Helicopter Controller

- Problem definition
  - S: set of possible states
  - s0: initial state (s0 ∈ S)
  - A: set of possible actions
  - P(S|S,A): state transition probabilities
  - γ: discount factor
  - R: reward function mapping states to values
- At state st, the controller picks action at, and the system transitions to a random state st+1 with probability P(st+1|st,at)
Helicopter Controller

- Reward function
  - Punish deviation from the desired helicopter position and velocity
  - R ∈ [−∞, 0]
- Policy learning
  - Reinforcement learning
  - EU(π) = Σ(t=0 to ∞) γ^t · Rt
- Problem
  - Stochastic state transitions
  - Impossible to compare several policies!
PEGASUS algorithm

- Predefined series of random numbers
  - The length of the series is a function of the complexity of the policy
- Use the same series to test all policies
  - At time t, each policy encounters the same random event
- Simulate a stochastic environment
  - The environment is stochastic from the agent's point of view
  - The environment is deterministic from our point of view
- Makes comparison between policies possible (see the sketch below)
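A minimal sketch of the fixed-random-series idea (the simulate callable and the scenario shape are illustrative assumptions, not the paper's implementation):

```python
# Minimal sketch of the PEGASUS idea: evaluate every policy against the
# same predefined random series, so policy comparison is deterministic.
import random

def make_scenarios(num_runs, length, seed=0):
    rng = random.Random(seed)                   # predefined random series
    return [[rng.random() for _ in range(length)] for _ in range(num_runs)]

def evaluate(policy, simulate, scenarios):
    """Average return of a policy over the fixed random scenarios."""
    return sum(simulate(policy, s) for s in scenarios) / len(scenarios)

# Every policy sees the same random events, so score differences come
# from the policies themselves, not from sampling noise.
scenarios = make_scenarios(num_runs=100, length=50)
```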
Summary of Learning

                      Supervised                 Unsupervised       Reinforcement
Training data         Data and correct output    Data               States, actions, and rewards
Learning target       Data-output relationship   Patterns in data   Policy
Evaluation            Statistics                 Fitness            Reward value
Typical application   Classifiers                Clustering         Controllers