The Symbolic vs Subsymbolic Debate

Download Report

Transcript The Symbolic vs Subsymbolic Debate

Symbolic vs Subsymbolic,
Connectionism (an Introduction)
H. Bowman
(CCNCS, Kent)
Overview
• Follow up to first symbolic – subsymbolic
talk
• Motivation,
– clarify why (typically) connectionist networks
are not compositional
– introduce connectionism,
• link to biology
• activation dynamics
• learning algorithms
Recap
A (Rather Naïve) Reading Model
PHONOLOGY
/p/.1 /b/.1
/u/.1 /p/.2 /b/.2
/u/.2
/p/.3 /b/.3
/u/.3 /p/.4 /b/.4
/u/.4
A.1 B.1
Z.1 A.2 B.2
Z.2
A.3 B.3
Z.3 A.4 B.4
Z.4
SLOT 1
ORTHOGRAPHY
Compositionality
• Plug constituents in according to rules
• Structure of expressions indicates how they should
be interpreted
• Semantic Compositionality,
“the semantic content of a (molecular) representation is a
function of the semantic contents of its syntactic parts,
together with its constituent structure”
[Fodor & Pylyshyn,88]
• Symbolists argue compositionality is a defining
characteristic of cognition
Semantic Compositionality in
Symbol Systems
• Meanings of items
plugged in as defined
by syntax
M[ X ] denotes
meaning of X
M[ John loves Jane ]
=
M[
John
]
M[
loves
]
M[
Jane
]
………….
..………..
Semantic Compositionality
Continued
• Meanings of atoms constant across
different compositions
M[ Jane loves John ]
=
M[
Jane
]
M[
loves
]
M[
John
]
………….
..………..
The Sub-symbolic Tradition
Rate Coding Hypothesis
• Biological neurons fire spikes (pulses of
current)
• In artificial neural networks,
– nodes reflect populations of biological neurons
acting together, i.e. cell assemblies;
– activation reflects rate of spiking of underlying
biological neurons.
Activation in Classic Artificial
Neural Network Model
Positive weights: Excitation
Negative weights: Inhibition
output - yj
sigmoidal
activation
node j value - yj
integrate h   x w
(weighted sum)
net input - hj
w1j
x1
w2j
x2
inputs
y j  1h j
1 e
j
i
i
wnj
xn
ij
Sigmoidal Activation Function
Saturation:
unresponsive at
high net inputs
activation (y )
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
Threshold:
unresponsive at
low net inputs
y j  1h
1 e j
0.1
0
-4
-3
-2
-1
0
1
net input (h )
2
3
4
Responsive
around net input
of 0
Characteristics
• Nodes homogeneous and essentially dumb
• Input weights characterize what a node
represents / detects
• Sophisticated (intelligent?) behaviour
emerges from interaction amongst nodes
Learning
• directed weight adjustment
• two basic approaches,
– Hebbian learning,
• unsupervised
• extracting regularities from environment
– error-driven learning,
• supervised
• learn an input to output mapping
Example: Simple
Feedforward Network
Output
Hidden
Input
Use term PDP
(Parallel Distributed
Processing)
• weights initially set
randomly
• trained according to
set of input to output
patterns
• error-driven,
– for each input, adjust
weights according to
extent to which in
error
Error-driven Learning
• can learn any (computable) input-output
mapping (modulo local minima)
• delta rule and back-propagation
• network learning completely determined
by patterns presented to it
Example Connectionist Model
• “Jane Loves John” difficult to represent
in PDP models
• Word reading as an example
– orthography to phonology
• Words of four letters or less
• Need to represent order of letters,
otherwise, e.g. slot and lots the same
• Slot coding
A (Rather Naïve) Reading Model
PHONOLOGY
/p/.1 /b/.1
/u/.1 /p/.2 /b/.2
/u/.2
/p/.3 /b/.3
/u/.3 /p/.4 /b/.4
/u/.4
A.1 B.1
Z.1 A.2 B.2
Z.2
A.3 B.3
Z.3 A.4 B.4
Z.4
SLOT 1
ORTHOGRAPHY
pronunciation of a as an example
•
Illustration 1: assume a “realistic” pattern set,
–
a pronounced differently,
1. in different positions
2. with different surrounding letters (context), e.g. mint - pint
both built into patterns
–
frequency asymmetries,
•
•
–
how often a appears at different positions throughout language
reflects how effectively pronounced at different positions
strange prediction: if child only seen a in positions 1 to 3, reach state
in which (broadly) can pronounce a in positions 1 to 3, but not at all
in position 4; that is, cannot even guess at pronunciation, i.e. get
random garbage!
labelling externally imposed: no requirement that the label a
interpreted the same in different slots
•
in symbol systems, every occurrence of a interpreted identically
– contextual influences can be beneficial, for
example,
• reflecting irregularities, e.g. mint – pint
• pronouncing non-words, e.g. wug
– Nonetheless, highly non-compositional: no
sense to which plug in constituent
representations
– can only recognise (and pronounce) a in
specific contexts, but not at all in others.
– surely, sense to which, learn individual
(substitutable) grapheme – phoneme mappings
and then plug them in (modulo contextual
influences).
•
Illustration 2: assume artificial pattern set in
which a mapped in each position to same
representation.
– (assuming enough training) in sense, a in all positions
similarly represented
– but,
•
not actually identical,
1. random initial weight settings imply different (although similar)
hidden layer representations
2. perhaps glossed over by thresholding at output
•
•
•
still strange learning prediction: reach states in which can
recognise a in some positions, but not at all in others
also, amount of training needed in each position is exorbitant
fact that can pronounce a in position i does not help to learn a
in position j; start from scratch in each position, each of
which is different and separately learned
Connectionism & Compositionality
• Principle:
– with PDP nets, contextual influence inherent,
compositionality the exception
– with symbol systems, compositionality inherent,
contextual influence the exception
• in some respects neural nets generalise well, but in
other respects generalise badly.
– appropriate: global regularities across patterns extracted
(similar patterns treated similarly)
– inappropriate: with slot coding, component
representations not reused
Connectionism & Compositionality
• alternative connectionist models may do better,
but not clear that any is truly systematic in sense
of symbolic processing
• alternative approaches,
– localist models, e.g. Interactive Activation or Activation
Gradient models
– O’Reilly’s spatial invariance model of word reading?
– Elman nets – recurrence for learning sequences.
References
•
•
•
•
•
•
•
•
•
•
Anderson, J. R. (1993). Rules of the Mind. Hillsdale, NJ: Erlbaum.
Bowers, J. S. (2002). Challenging the widespread assumption that connectionism and distributed
representations go hand-in-hand. Cognitive Psychology., 45, 413-445.
Evans, J. S. B. T. (2003). In Two Minds: Dual Process Accounts of Reasoning. Trends in Cognitive
Sciences, 7(10), 454-459.
Fodor, J. A., & Pylyshyn, Z. W. (1988). Connectionism and Cognitive Architecture: A Critical Analysis.
Cognition, 28, 3-71.
Hinton, G. E. (1990). Special Issue of Journal Artificial Intelligence on Connectionist Symbol Processing
(edited by Hinton, G.E.). Artificial Intelligence, 46(1-4).
O'Reilly, R. C., & Munakata, Y. (2000). Computational Explorations in Cognitive Neuroscience:
Understanding the Mind by Simulating the Brain.: MIT Press.
McClelland, J. L. (1992). Can Connectionist Models Discover the Structure of Natural Language? In R.
Morelli, W. Miller Brown, D. Anselmi, K. Haberlandt & D. Lloyd (Eds.), Minds, Brains and Computers:
Perspectives in Cognitive Science and Artificial Intelligence (pp. 168-189). Norwood, NJ.: Ablex
Publishing Company.
McClelland, J. L. (1995). A Connectionist Perspective on Knowledge and Development. In J. J. Simon &
G. S. Halford (Eds.), Developing Cognitive Competence: New Approaches to Process Modelling (pp.
157-204). Mahwah, NJ: Lawrence Erlbaum.
Page, M. P. A. (2000). Connectionist Modelling in Psychology: A Localist Manifesto. Behavioral and
Brain Sciences, 23, 443-512.
Pinker, S., Ullman, M. T., McClelland, J. L., & Patterson, K. (2002). The Past-Tense Debate (Series of
Opinion Articles). Trends Cogn Sci, 6(11), 456-474.