Modelling Language Evolution
Lecture 1: Introduction to Learning
Simon Kirby
University of Edinburgh
Language Evolution & Computation Research Unit
Course Overview
Learning
  Introduction to neural nets
  Learning syntax
Evolution
  Syntax
  Learning bias and structure
Culture
  Iterated learning
  The Talking Heads (practical)
Computers for modelling
Computers in linguistics
Engineering (speech and language technologies)
Research tools (waveform analysis, psycholinguistic stimuli etc.)
Recently: model building
Why build models?
Why use computers?
What is a model anyway?
What is a model?
One view:
[Diagram: THEORY → MODEL → PREDICTION → OBSERVATION]
We use models when we can’t be sure what our theories predict
Especially useful when dealing with complex
systems
A simple example
Vowels exist in a “space”
Only some patterns arise cross-linguistically
E.g. vowel space seems to be symmetrically filled
Why?
Theory to Model
We need a theory to explain the vowel-space universal
Possible theory:
Vowels tend to avoid being close to each other to maintain perceptual distinctiveness.
Use a model to test the theory (Liljencrants & Lindblom 1972)
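To make this concrete, here is a minimal sketch of a dispersion model in this spirit (not Liljencrants & Lindblom's actual formant-space implementation; the function name, parameter values and the 2D unit square standing in for the vowel space are all illustrative assumptions). Each vowel is repeatedly pushed away from the others, and the resulting pattern ends up spread roughly symmetrically through the space:

```python
import math
import random

def disperse_vowels(n_vowels=5, steps=2000, step_size=0.01, seed=1):
    """Toy dispersion model: vowels repel one another inside a unit
    square standing in for the vowel space (illustrative sketch only)."""
    rng = random.Random(seed)
    vowels = [[rng.random(), rng.random()] for _ in range(n_vowels)]
    for _ in range(steps):
        for i, (x, y) in enumerate(vowels):
            fx = fy = 0.0
            for j, (ox, oy) in enumerate(vowels):
                if i != j:
                    dx, dy = x - ox, y - oy
                    d = math.hypot(dx, dy) + 1e-6
                    fx += dx / d ** 3  # repulsion falls off with distance
                    fy += dy / d ** 3
            # Move away from the other vowels, staying inside the space.
            vowels[i][0] = min(1.0, max(0.0, x + step_size * fx))
            vowels[i][1] = min(1.0, max(0.0, y + step_size * fy))
    return vowels

print(disperse_vowels())  # points end up spread out towards the edges
```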
In general, computational models are useful when dealing with “complex systems”
Is language a complex system?
Yes – evolution on many different timescales:
Individual learning
Cultural evolution
Biological evolution
Computational models will help us understand these interactions…
Learning
Language learning is crucial to language evolution
What is learning?
Learning occurs when an organism changes its internal state on the basis of experience
What do we need to model learning?
1. A model of internal states
2. A model of experience
3. An algorithm that changes 1 on the basis of 2
One approach: Neural nets
An approach to internal states based on the brain
An artificial neuron is a computational unit that sums its inputs and uses them to decide whether to produce an output
Networks of neurons
Typically there will be many connected neurons
Information is stored in weights on the connections
Weights multiply signals sent between nodes
Signals into a node can be excitatory or inhibitory
An artificial neuron
$\text{net}_i = \sum_j w_{ij} a_j$
Add up all the inputs multiplied by their weights
f(net) is the “activation function” that scales the input
A useful activation function
$a_i = \frac{1}{1 + e^{-\text{net}_i}}$
All or nothing for big excitations or inhibitions…
… but more sensitive in between.
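A minimal Python sketch of such a unit (the function names are mine, not from the lecture):

```python
import math

def sigmoid(net):
    """Activation function: saturates near 0 or 1 for strong inhibition
    or excitation, but is most sensitive around net = 0."""
    return 1.0 / (1.0 + math.exp(-net))

def neuron(inputs, weights):
    """Sum each input multiplied by its weight, then squash the result."""
    net = sum(w * a for w, a in zip(weights, inputs))
    return sigmoid(net)
```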
AND: a very simple network
A network that works out if both inputs are activated:
[Diagram: INPUT 1 and INPUT 2 each connect to OUTPUT with weight 5; a BIAS NODE (always set to 1.0) connects to OUTPUT with weight -7.5]
Network gives an output over 0.5 only if both inputs are 1.
OR: another very simple network
A network that works out if either input is activated:
[Diagram: INPUT 1 and INPUT 2 each connect to OUTPUT with weight 10; the BIAS NODE (always set to 1.0) connects to OUTPUT with weight -7.5]
Network gives an output over 0.5 if either input is 1.
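Both networks can be checked with a short, self-contained sketch; the weights come straight from the diagrams above, and treating the bias node as an extra input fixed at 1.0 is a standard convention:

```python
import math

def neuron(inputs, weights):
    net = sum(w * a for w, a in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-net))  # sigmoid activation

for i1 in (0, 1):
    for i2 in (0, 1):
        # The bias node is an extra input that is always 1.0.
        out_and = neuron([1.0, i1, i2], [-7.5, 5, 5])
        out_or = neuron([1.0, i1, i2], [-7.5, 10, 10])
        print(i1, i2, "AND:", out_and > 0.5, "OR:", out_or > 0.5)
```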
XOR: a difficult challenge
A network that works out if only one input is activated:
[Diagram: the same two-input network with a BIAS NODE (always set to 1.0), but with the bias and input weights all marked “?”]
Solution needs more complex net with three layers. WHY?
XOR network - step 1
XOR is the same as OR but not AND
Calculate OR
Calculate NOT AND
AND the results
XOR network - step 2
[Diagram: INPUT 1 and INPUT 2 feed two hidden units. HIDDEN 1 computes NOT AND (bias weight 7.5, input weights -5 and -5); HIDDEN 2 computes OR (bias weight -7.5, input weights 10 and 10). The OUTPUT unit ANDs the hidden units together (bias weight -7.5, hidden weights 5 and 5). The BIAS NODE is always set to 1.0.]
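A self-contained check of this three-layer network, with the weight assignments as reconstructed from the diagram above:

```python
import math

def neuron(inputs, weights):
    net = sum(w * a for w, a in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-net))  # sigmoid activation

def xor_net(i1, i2):
    # Hidden layer (bias node supplied as an extra input fixed at 1.0).
    not_and = neuron([1.0, i1, i2], [7.5, -5, -5])
    either = neuron([1.0, i1, i2], [-7.5, 10, 10])
    # Output layer ANDs the two hidden activations together.
    return neuron([1.0, not_and, either], [-7.5, 5, 5])

for i1 in (0, 1):
    for i2 in (0, 1):
        print(i1, i2, xor_net(i1, i2) > 0.5)  # True only for 01 and 10
```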
But what about learning?
We now have:
a model of internal states (connection weights)
a model of experience (inputs and outputs)
Learning:
set the weights in response to experience
How?
Compare network behaviour with “correct” behaviour
Adjust the weights to reduce network error
Error-driven learning
1. Set weights to random values
2. Present an input pattern
3. Feed activation forward through the network to get an output
4. Calculate the difference between the output and the desired output (i.e. the error)
5. Adjust the weights so that the error is reduced
6. Repeat until the network is producing the desired results.
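One concrete way to implement step 5 is the delta rule; the minimal sketch below trains a single sigmoid unit on AND (the learning rate of 0.5 and the 5000 passes are arbitrary illustrative choices):

```python
import math
import random

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

# Training pairs for AND; the first input is the always-on bias node.
patterns = [([1.0, 0, 0], 0), ([1.0, 0, 1], 0),
            ([1.0, 1, 0], 0), ([1.0, 1, 1], 1)]

rng = random.Random(0)
weights = [rng.uniform(-1, 1) for _ in range(3)]    # 1. random weights

for _ in range(5000):                               # 6. repeat
    for inputs, target in patterns:                 # 2. present pattern
        out = sigmoid(sum(w * a for w, a in zip(weights, inputs)))  # 3.
        error = target - out                        # 4. compute error
        # 5. Delta rule: nudge each weight to reduce the error.
        for j, a in enumerate(inputs):
            weights[j] += 0.5 * error * out * (1 - out) * a

for inputs, target in patterns:
    out = sigmoid(sum(w * a for w, a in zip(weights, inputs)))
    print(inputs[1:], round(out, 2), "target:", target)
```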
Gradient descent
Gradient descent is a form of error-driven learning
Start at a random point on the “error surface”
Move downhill in the direction of steepest slope
Potential problems:
May overshoot the global minimum
Might get stuck in a local minimum
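A one-dimensional sketch of gradient descent, on a made-up error surface with two minima so the local-minimum problem is visible (the surface, step size and starting point are all illustrative assumptions):

```python
def error(w):
    # Made-up error surface: local minimum near w ≈ 0.96,
    # global minimum near w ≈ -1.03.
    return w ** 4 - 2 * w ** 2 + 0.3 * w

def slope(w, h=1e-5):
    # Numerical estimate of the gradient at w.
    return (error(w + h) - error(w - h)) / (2 * h)

w = 2.0                      # start at some point on the surface
for _ in range(200):
    w -= 0.05 * slope(w)     # move in the direction of steepest descent

# Descent from this start settles in the local minimum (~0.96) and
# never finds the global minimum on the other side of the surface.
print(w, error(w))
```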
Example: learning past tense of verbs
A network that takes the present tense form of a verb and produces the past tense
Uses examples to set the weights
Generalises to add /-ed/ to verbs it’s never seen before.
Has it learnt a linguistic rule?
Is this psychologically plausible?
We need an error signal
Where does this error signal come from?
Possibilities:
A teacher
Reinforcement
The outcome of some prediction:
e.g. what’s the next word?
what’s the past tense of this verb?
Summary
Modelling tests theories
Computer modelling appropriate for complex systems
Language evolution involves several complex systems
Neural nets are one approach to modelling learning
Networks can be made to adapt to data through error-driven learning
Next lecture: how to model acquisition of syntax