Neural networks

Machine learning basics
Types of learning
Principles of learning for game design
Learning distinctions
How the (natural) brain works
Neural networks
Perceptrons
Training a perceptron
Evaluating a perceptron
The problem with perceptrons



Nature shows that organisms do better if they can change their
behaviour in response to their environments, and thus profit from
experience
Learning is “any change in a system that allows it to perform better the
second time on repetition of the same task or on another task drawn from
the same population” (Simon, 1983)
Researchers distinguish many different types of machine learning:
- rote learning (memorisation) – have the machine store new
behaviour-controlling data
- taking advice (direct instruction) – communicate procedures to the
machine, and have it follow them
- learning for problem solving – generating new ways of solving
problems from one’s own experience, i.e. without an instructor/advisor
- learning from examples (induction) – observe examples of
solutions to problems, then adapt these to one’s own problems
- discovery (deduction) – like learning from problem solving, but
focused on logical inference from facts to answers
- explanation-based learning – extract the concept behind the explanation
for one example, then generalise it to other instances. Requires
domain-specific knowledge
- analogy – similar to induction, but uses analogical knowledge
structures, e.g. “infectious disease is like an enemy invasion”
- neural nets and genetic algorithms (more on these later)



Here we will take a fairly pragmatic, engineering approach,
distinguishing two goals of learning: adaptation vs optimisation
adaptation – the nature and goal of the problem change over time, and
the behaviour continues to change incrementally (dynamic)
optimisation – attempts to find a solution to a known problem, usually
by adjusting some parameters towards an ideal value; the nature and goals
of the problem do not change (static)

Learn what? What might an animat need to learn to play better?

Facts e.g. “behind the gold door there’s a dangerous monster”

Behaviours e.g. swimming when in water so as not to drown




Sometimes it’s not clear whether the knowledge we are interested
in is about facts (declarative), behaviours (procedures), or both
For example, shooting is partly about knowing where to aim, and
partly about the skill required to aim properly
From a technology point of view, it’s possible to use a good
representation system for both. For example, a frame can be used
both to store facts and to launch actors, which generate behaviour
To make sensible distinctions between these things, one has to get
specific about a particular representation technique





In practice game developers generally allow factual knowledge
acquired during the game to be queried by a process that is fixed at
design time, e.g. with a problem-solving rule-based system (RBS), the
rules are learned but the reasoning process is programmed in
Behaviours, on the other hand, are generally learned continuously
and dynamically as the game is played, possibly with a different
kind of AI technique, such as a neural network
The important distinction here is at what stage of game
development the learning is applied – offline, while the game is
being written, vs online, while the game is being played
Optimisation is most naturally associated with offline learning,
while adaptation is most naturally carried out online
Both should be considered for any animat, but Champandard argues
that offline learning is underused and should be considered first
for many behaviours






Another distinction is that of batch learning, in which facts or
learned representations are extracted all at once from a historical
record of many experiences...
...vs incremental learning, in which sample experiences are
processed one by one, with small changes to facts or learned
representations made at each step
Batch and incremental learning are independent of the online/offline
distinction (but batch learning is generally done offline, while
incremental learning is generally done online)
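As a minimal sketch of the contrast (using a running estimate of a mean as a stand-in for any learned representation; the function names and the learning rate eta are illustrative, not from the slides):

# batch: extract the estimate all at once from the whole historical record
def batch_mean(history):
    return sum(history) / len(history)

# incremental: process experiences one by one, making a small change each time
def incremental_mean(history, eta=0.1):
    estimate = 0.0
    for sample in history:
        estimate += eta * (sample - estimate)  # small step toward each new sample
    return estimate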
Yet another distinction is that of supervised vs unsupervised learning
In supervised learning, example data is provided in the form of
pairs mapping input to output, x -> y, where y is the desired output
In unsupervised learning, general mathematical principles are
applied to the input data so as to allow a representation to
self-organise. This representation may then be interpreted for
meaning by other AI methods







A neural network can be thought of as a model of the human
brain
The brain consists of a very densely interconnected set of nerve
cells, called neurons
The human brain incorporates nearly 10 billion neurons and 60
trillion connections, called synapses, between them
Our brain can be considered as a highly complex, non-linear and
massively parallel information processing system
Information is stored and processed simultaneously throughout the
whole network, rather than at specific locations
Learning is a fundamental and essential characteristic of biological
neural networks
The ease with which they can learn led to attempts to emulate a
biological neural network in a machine
The neuron
[Figure: diagram of a biological neuron]






A neural network consists of a number of very simple processing units
which are like the biological neurons in the brain. Today this is
almost always simulated in software, using a conventional computer
The ‘neuron’ units are connected by a network of weighted links passing
signals from one neuron to another
The outgoing connection splits into a number of branches that all
transmit the same output signal. The outgoing branches are the
incoming connections of other neurons in the network
Some units receive their inputs from sensory cells in the outside
world; others send their output to the outside world.
Processing units deep in the network that are connected to neither
inputs nor outputs are called hidden units
The weighted links can be made to change systematically, by means of
an algorithm, in response to patterns of input applied to the machine –
hence the machine can exhibit a kind of learning
One can experiment with different activation functions for the
neurons, with different network architectures, and with different
learning algorithms
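A minimal sketch of these components (in Python, with illustrative weights rather than anything from the slides): each unit sums its weighted incoming links and fires through an activation function, and the outputs of one layer form the incoming connections of the next:

def step(x):
    # threshold activation: the unit fires (1) when its weighted sum is positive
    return 1 if x > 0 else 0

def layer(inputs, weights):
    # one layer of units; 'weights' holds one weight vector per unit in the layer
    return [step(sum(w * x for w, x in zip(unit_w, inputs))) for unit_w in weights]

sensor_inputs = [1.0, 0.0]                                # from the outside world
hidden = layer(sensor_inputs, [[0.5, -0.3], [0.2, 0.8]])  # hidden units
outputs = layer(hidden, [[1.0, -1.0]])                    # sent to the outside world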

A typical architecture is shown here
[Figure: a three-layer network; input patterns from sensors feed the
first layer, and the final layer produces an output code identifying
class A, B, C or D]
• There are three (vertical) layers: input, output and one hidden layer.
Each neuron in a layer is connected to the layer before it
• In a typical experiment, we might want the network to learn to
classify patterns arriving at the input, i.e. it would output a code
representing the class it chose for its current input pattern
• Each processing unit would be set to compute a neuronal function
• First, the network would be put into training mode, meaning the
weights between neurons would be allowed to change
• Then a set of training patterns – pairs consisting of example
inputs and their correct outputs – would be applied, and the weights
incrementally changed by a learning algorithm
• After the network had been trained, the weights would be fixed, and
the learning algorithm turned off – this is recognition mode
• Old or new patterns arriving at the inputs would then be allowed to
generate a class code. The performance score of the network would
be the proportion of patterns correctly classified
• For example, we might feed an 8 × 10 black-and-white ‘retina’ in as
input (80 input units), and ask the network to distinguish numbers,
upper-case characters and lower-case characters
[Figure: the input ‘retina’ presented with characters such as ‘a’, ‘Q’
and ‘6’; the output code identifies lower case, upper case, numbers,
or no meaning]
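One plausible encoding for this experiment (an assumption for illustration; the slides do not specify it) flattens the retina row by row into the input vector and uses one output unit per class, with the most active unit taken as the network’s answer:

# an 8 x 10 black-and-white retina flattened into 80 inputs (0 = white, 1 = black)
retina = [[0] * 8 for _ in range(10)]
inputs = [float(pixel) for row in retina for pixel in row]  # 80 input values

# one output unit per class; the most active unit wins
classes = ["numbers", "upper case", "lower case", "no meaning"]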
• The earliest type of network to be studied was the perceptron, a
single layer of processing elements (Rosenblatt, 1958)
[Figure: a single-layer perceptron; inputs x1...x4 feed a single output
unit through weights w1...w4, with an additional bias weight w0,
producing output y]
Input vector x = [x1...xn]
Weight vector w = [w0...wn] (w0 is a bias)
Output vector y = [y1...ym]; a perceptron can have multiple outputs,
but here there is only one
• The data type for inputs and outputs may be binary (0, 1) or continuous
(real numbers). Weights are generally continuous (typically 0.0-1.0)
• Networks in FEAR use 32-bit floating point numbers for weights
• Now that we have a network architecture, we need to specify an
activation function for neurons, an algorithm for recognition mode, and
a weight change algorithm for training mode
• Operating the perceptron in recognition mode is done in 2 steps:
• 1) Combine the weighted inputs into the net sum ζ:
ζ = w0 + ∑ wixi, summing over i = 1...n
• 2) Apply the activation function σ to the net sum to produce the
output y:
y = σ(ζ)
The activation function here will be a step or threshold function (others
are possible):
σ(x) = 1 if x > 0, 0 otherwise
In practice, smoother, continuous functions are preferred because they
allow continuous outputs and make training easier
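A step activation of this kind is one line of Python (a sketch; the function name is ours, chosen to match the routine below):

def activation(x):
    # step/threshold function: fires (1) only when the net sum is positive
    return 1 if x > 0 else 0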
• So the following routine (here as runnable Python rather than
pseudocode) simulates the output of a single perceptron. Assume
initialised vectors input and weight; for convenience we treat w0 as a
weight on a fixed input at input[0]

def simulate(input, weight):
    # weighted sum of the inputs; input[0] is the fixed bias input (1.0)
    netsum = 0.0
    for i in range(len(weight)):
        netsum += input[i] * weight[i]
    return activation(netsum)
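For instance (a worked example, not from the original slides), with the step activation above, the weights below implement logical AND, because the net sum only rises above zero when both inputs are 1:

weight = [-1.5, 1.0, 1.0]  # weight[0] = w0 = -1.5 (bias)
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, simulate([1.0, x1, x2], weight))  # outputs 0, 0, 0, 1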
• With a random set of weights, the perceptron will do a poor job of
approximating the function we want. There will be an ideal set of weights
that makes it approximate the required function well
• We find those weights by repeatedly clamping the input and output of the
perceptron to a known good example pair, then somehow changing the
weights to get closer to that mapping. This is supervised learning
• One way of doing this is using the delta rule to optimise the weights
incrementally as good training pairs are applied
• The delta rule works on the principles that
- each weight contributes to the output by a variable amount
- the scale of this contribution depends on the input
- the error of the output can be assigned to (blamed on) the weights
• At each example pair, we compare the output of the perceptron, y,
with the ideal output, t, which gives an error E. This is computed by
an error function, commonly the least mean square (LMS) error:
E = ½(t - y)²
• Then adjust each weight so that the actual output will be slightly
closer to the ideal (lower LMS) next time
• If the error function is well-behaved (descends monotonically as we
get closer to the ideal perceptron function) we can use this to learn
• The delta rule:
∆wi = -η ∂E/∂wi = η xi (t - y), where η is a small learning rate
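As a sanity check (a derivation we have added, not on the original slide), treat the activation as the identity during the derivation, so that y = ∑ wi xi; the gradient of the LMS error then yields exactly the rule above. In LaTeX:

E = \tfrac{1}{2}(t - y)^2, \qquad y = \sum_i w_i x_i
\frac{\partial E}{\partial w_i} = (t - y)\,\frac{\partial (t - y)}{\partial w_i} = -(t - y)\, x_i
\Delta w_i = -\eta\, \frac{\partial E}{\partial w_i} = \eta\, x_i (t - y)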
• Algorithm to apply the delta rule (here as runnable Python). The
objective function is a test which determines the overall quality of
output (e.g. total weight adjustment above some threshold); in the
sketch below we simply stop once a whole pass over the sample pairs
produces no invalid results

import random

def train(samples, n_inputs, eta=0.1, max_epochs=1000):
    # initialise weights randomly; weight[0] is the bias w0
    weight = [random.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
    for _ in range(max_epochs):               # while objective function is unsatisfied
        errors = 0
        for input, desired in samples:        # each sample pair; input[0] is the fixed 1.0
            output = simulate(input, weight)  # simulate the perceptron (see above)
            if output != desired:             # result is invalid
                delta = desired - output
                for i in range(len(weight)):
                    weight[i] += eta * delta * input[i]
                errors += 1
        if errors == 0:                       # a clean pass: objective satisfied
            break
    return weight
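As a usage sketch (assumed training data, not from the slides), training on the AND truth table converges quickly, since AND is linearly separable:

samples = [([1.0, 0, 0], 0), ([1.0, 0, 1], 0),
           ([1.0, 1, 0], 0), ([1.0, 1, 1], 1)]  # input[0] is the fixed bias input
weight = train(samples, n_inputs=2)
print([simulate(x, weight) for x, t in samples])  # [0, 0, 0, 1] after convergence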
• But there is an entire family of patterns which perceptrons cannot
learn at all
• In 1969, Minsky & Papert published an influential book in which they
proved mathematically that a perceptron cannot learn functions
which involve patterns that are not linearly separable, e.g. XOR
• In essence, the argument from these classical AI supporters was
that if a function as simple as XOR could not be learned, neural
networks were not worth pursuing
• The book was very influential, and for a decade, funding for neural
net projects all but evaporated
• With an extra (hidden) layer, a neural network could handle such
functions – but at the time there was no known way of computing an
error term for the weights of the hidden layer, so the network could
not be made to learn
• That is, there was no solution to the credit assignment problem
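To see the failure concretely (a sketch reusing the train() and simulate() routines above; the data is ours): XOR is not linearly separable, so the weight updates never settle and training only stops at the epoch limit:

xor_samples = [([1.0, 0, 0], 0), ([1.0, 0, 1], 1),
               ([1.0, 1, 0], 1), ([1.0, 1, 1], 0)]
weight = train(xor_samples, n_inputs=2)  # runs to max_epochs without converging
print([simulate(x, weight) for x, t in xor_samples])  # at least one output is wrong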
• Minsky, M. & Papert, S. Perceptrons: An Introduction to
Computational Geometry. Cambridge, MA: MIT Press, 1969
• Rosenblatt, F. Principles of Neurodynamics. Washington, DC: Spartan
Books, 1962
• Simon, H. “Why should machines learn?”, in Machine Learning: An
Artificial Intelligence Approach, R. S. Michalski, J. G. Carbonell and
T. M. Mitchell (eds.). Berlin: Springer-Verlag, 1983, pp. 25-37