Semantics Without Categorization


Does the Brain Use Symbols or
Distributed Representations?
James L. McClelland
Department of Psychology and
Center for Mind, Brain, and Computation
Stanford University
Parallel Distributed Processing
Approach to Semantic Cognition
• Representation is a pattern
of activation distributed
over neurons within and
across brain areas.
• Bidirectional propagation
of activation underlies the
ability to bring these
representations to mind
from given inputs.
• The knowledge underlying
propagation of activation is
in the connections.
Development and Degeneration
• Learned distributed representations in an
appropriately structured distributed
connectionist system underlie the
development of conceptual knowledge.
• Gradual degradation of the representations
constructed through this developmental
process underlies the pattern of semantic
disintegration seen in semantic dementia.
Differentiation, ‘Illusory Correlations’, and
Overextension of Frequent Names in
Development
The Rumelhart Model
The Training Data:
All propositions true of
items at the bottom level
of the tree, e.g.:
Robin can {grow, move, fly}
Target output for ‘robin can’ input
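Forward Propagation of Activation
• Unit i receives activations $a_j$ from sending units j through weights $w_{ij}$:
$net_i = \sum_j a_j w_{ij}$
• Its activation $a_i$ is passed on to units k through weights $w_{ki}$.
Back Propagation of Error ($\delta$)
• $\delta_i \sim \sum_k \delta_k w_{ki}$
Error-correcting learning:
• At the output layer: $\delta_k \sim (t_k - a_k)$, $\Delta w_{ki} = \epsilon \delta_k a_i$
• At the prior layer: $\Delta w_{ij} = \epsilon \delta_i a_j$
• …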
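The forward pass and the error-correcting updates can be sketched in a few lines of Python. This is a minimal illustration, not the Rumelhart model itself: the three-unit hidden layer, the logistic activation function, the learning rate, the input code for 'robin can', and the extra 'swim' output are all assumptions made for the example.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(a_j, w_ij, w_ki):
    """Propagate activation from units j to units i to output units k."""
    net_i = [sum(a * w for a, w in zip(a_j, row)) for row in w_ij]
    a_i = [sigmoid(n) for n in net_i]
    net_k = [sum(a * w for a, w in zip(a_i, row)) for row in w_ki]
    a_k = [sigmoid(n) for n in net_k]
    return a_i, a_k

def train_step(a_j, t_k, w_ij, w_ki, eps=0.5):
    """One error-correcting update; returns the squared error before the update."""
    a_i, a_k = forward(a_j, w_ij, w_ki)
    # Output layer: delta_k ~ (t_k - a_k), here scaled by the logistic slope.
    d_k = [(t - a) * a * (1 - a) for t, a in zip(t_k, a_k)]
    # Prior layer: delta_i ~ sum_k delta_k * w_ki, scaled by the logistic slope.
    d_i = [sum(d_k[k] * w_ki[k][i] for k in range(len(d_k))) * a_i[i] * (1 - a_i[i])
           for i in range(len(a_i))]
    # Weight changes: eps * delta of the receiving unit * activation of the sender.
    for k in range(len(w_ki)):
        for i in range(len(a_i)):
            w_ki[k][i] += eps * d_k[k] * a_i[i]
    for i in range(len(w_ij)):
        for j in range(len(a_j)):
            w_ij[i][j] += eps * d_i[i] * a_j[j]
    return sum((t - a) ** 2 for t, a in zip(t_k, a_k))

rng = random.Random(0)
a_j = [1.0, 0.0, 1.0, 0.0]   # hypothetical input code for 'robin' + 'can'
t_k = [1.0, 1.0, 1.0, 0.0]   # targets: grow, move, fly on; swim (assumed extra unit) off
w_ij = [[rng.uniform(-0.5, 0.5) for _ in a_j] for _ in range(3)]
w_ki = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in t_k]
errors = [train_step(a_j, t_k, w_ij, w_ki) for _ in range(500)]
print(f"squared error: {errors[0]:.3f} -> {errors[-1]:.3f}")
```

Training on a single proposition only shows that the updates reduce error; the developmental effects discussed in the talk depend on training over the whole set of propositions.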
[Figure: panels labelled Early, Later, Later Still; axis: Experience]
Why Does the Model Show
Progressive Differentiation?
• Learning in the model is sensitive to patterns
of coherent covariation of properties
• Coherent Covariation:
– The tendency for properties of objects to
co-vary in clusters
Patterns of Coherent
Covariation in the
Training Set
• Patterns of coherent covariation are
reflected in the principal
components of the property
covariance matrix of the training
patterns.
• Figure shows attribute loadings on
the first three principal components:
– 1. Plants vs. animals
– 2. Birds vs. fish
– 3. Trees vs. flowers
• Same color = features covary in
component
• Diff color = anti-covarying
features
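As a rough illustration of how a principal component of the property covariance matrix picks out a cluster of covarying properties, the sketch below computes the first component for a small toy item set. The eight items and six properties are hypothetical and much smaller than the actual training set, and power iteration is used only to keep the example dependency-free.

```python
# Hypothetical toy items and properties, loosely following the slide's domain.
properties = ["has roots", "has bark", "has petals", "has skin", "can fly", "can swim"]
items = {
    "pine":    [1, 1, 0, 0, 0, 0],
    "oak":     [1, 1, 0, 0, 0, 0],
    "rose":    [1, 0, 1, 0, 0, 0],
    "daisy":   [1, 0, 1, 0, 0, 0],
    "robin":   [0, 0, 0, 1, 1, 0],
    "canary":  [0, 0, 0, 1, 1, 0],
    "sunfish": [0, 0, 0, 1, 0, 1],
    "salmon":  [0, 0, 0, 1, 0, 1],
}

def covariance_matrix(rows):
    """Population covariance between every pair of property columns."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[c] for r in rows) / n for c in range(p)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / n
             for b in range(p)] for a in range(p)]

def first_principal_component(cov, iterations=100):
    """Power iteration: repeatedly multiply by cov and renormalize."""
    v = [1.0] + [0.0] * (len(cov) - 1)
    for _ in range(iterations):
        v = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    # Variance explained along v (the corresponding eigenvalue).
    variance = sum(v[a] * sum(cov[a][b] * v[b] for b in range(len(v)))
                   for a in range(len(v)))
    return v, variance

cov = covariance_matrix(list(items.values()))
pc1, variance = first_principal_component(cov)
for name, loading in zip(properties, pc1):
    print(f"{name:12s} {loading:+.2f}")
```

In this toy set the plant properties load with one sign and the animal properties with the other, which corresponds to the plants vs. animals component. The next two components are tied in this symmetric toy set, so they are not computed here.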
Illusory Correlations
• Rochel Gelman found that children think that
all animals have feet.
– Even animals that look like small furry balls
and don’t seem to have any feet at all.
• A tendency to over-generalize properties
typical of a superordinate category at an
intermediate point in development is
characteristic of the PDP network.
A typical property that
a particular object lacks
e.g., pine has leaves
An infrequent,
atypical property
A One-Class and a Two-Class Naïve Bayes Classifier Model

Property        One-Class Model   1st class in two-class model   2nd class in two-class model
Can Grow        1.0               1.0                            0
Is Living       1.0               1.0                            0
Has Roots       0.5               1.0                            0
Has Leaves      0.4375            0.875                          0
Has Branches    0.25              0.5                            0
Has Bark        0.25              0.5                            0
Has Petals      0.25              0.5                            0
Has Gills       0.25              0                              0.5
Has Scales      0.25              0                              0.5
Can Swim        0.25              0                              0.5
Can Fly         0.25              0                              0.5
Has Feathers    0.25              0                              0.5
Has Legs        0.25              0                              0.5
Has Skin        0.5               0                              1.0
Can See         0.5               0                              1.0
Accounting for the network’s
representations with classes at different
levels of granularity
[Figure: regression beta weights for the Living Thing, Plant, Tree, Pine, and Bias predictors, plotted against epochs of training]
Overgeneralization of Frequent
Names to Similar Objects
[Figure panels: “goat”, “tree”, “dog”]
Why Does Overgeneralization of Frequent
Names Increase and Then Decrease?
• In the simulation shown, dogs are experienced 10 times
as much as any other animal, and there are 4 other
mammals, 8 other animals, and 10 plants.
• In a one-class model, goat is a living thing:
– P(name is ‘Dog’|living thing) = 10/32 ≈ .31
• In a two-class model, goat is an animal:
– P(name is ‘Dog’|animal) = 10/22 ≈ .45
• In a five-class model, goat is a mammal:
– P(name is ‘Dog’|mammal) = 10/14 ≈ .71
• In a 23-class model, goat is in a category by itself:
– P(name is ‘Dog’|goat) = 0
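The arithmetic in the bullets above can be sketched as follows. The exposure counts come from the slide, reading the goat as one of the 4 other mammals and the 8 other animals as non-mammals, so the mammal class holds 10 + 4 = 14 exposures. The blending weights are purely hypothetical: they only illustrate how shifting weight from coarse to fine classes makes the probability of calling a goat 'Dog' rise and then fall.

```python
# Exposure counts per group (dog is seen 10 times as often as any other item).
exposure = {"dog": 10, "goat": 1, "other mammals": 3, "other animals": 8, "plants": 10}

# The class that contains the goat at each level of granularity.
goat_class = {
    "living thing (one-class)": ["dog", "goat", "other mammals", "other animals", "plants"],
    "animal (two-class)":       ["dog", "goat", "other mammals", "other animals"],
    "mammal (five-class)":      ["dog", "goat", "other mammals"],
    "goat (23-class)":          ["goat"],
}

def p_dog(members):
    """P(name is 'Dog' | class) = dog exposures / all exposures in the class."""
    total = sum(exposure[m] for m in members)
    return exposure["dog"] / total if "dog" in members else 0.0

probs = [p_dog(members) for members in goat_class.values()]
for level, p in zip(goat_class, probs):
    print(f"{level:26s} {p:.2f}")

# Hypothetical blending weights over the four levels at successive points in learning.
schedule = [
    [1.0, 0.0, 0.0, 0.0],
    [0.2, 0.6, 0.2, 0.0],
    [0.0, 0.2, 0.7, 0.1],
    [0.0, 0.0, 0.2, 0.8],
    [0.0, 0.0, 0.0, 1.0],
]
blended = [sum(w * p for w, p in zip(weights, probs)) for weights in schedule]
print([round(b, 2) for b in blended])
```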
Sensitivity to Coherence
Requires Convergence
Inference and Generalization
in the PDP Model
• A semantic representation for a new item can
be derived by error propagation from given
information, using knowledge already stored
in the weights.
• Crucially:
– The similarity structure, and hence the
pattern of generalization, depends on the
knowledge already stored in the weights.
Start with a neutral representation on the
representation units. Use backprop to
adjust the representation to minimize the
error.
The result is a representation similar to
that of the average bird…
Use the representation to
infer what this new thing can do.
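The procedure described above (neutral start, weights held fixed, error used to adjust only the representation) can be sketched as follows. The two representation units, the weight values, and the three output properties are hypothetical stand-ins for knowledge a trained network would already hold. They are not taken from the model.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical frozen weights from 2 representation units to 3 output properties.
properties = ["is a bird", "can fly", "can swim"]
W = [
    [3.0, -1.0],   # is a bird
    [2.5, -1.5],   # can fly
    [-2.0, 3.0],   # can swim
]

def predict(rep):
    """Output activations for a given pattern on the representation units."""
    return [sigmoid(sum(w * r for w, r in zip(row, rep))) for row in W]

def infer_representation(observed, steps=200, eps=0.5):
    """Adjust the representation, not the weights, to reduce error on observed outputs.

    observed maps an output index to its target activation.
    """
    rep = [0.0, 0.0]  # neutral starting representation
    for _ in range(steps):
        out = predict(rep)
        grad = [0.0, 0.0]
        for k, target in observed.items():
            d_k = (target - out[k]) * out[k] * (1 - out[k])
            for i in range(len(rep)):
                grad[i] += d_k * W[k][i]  # error propagated back through w_ki
        rep = [r + eps * g for r, g in zip(rep, grad)]
    return rep

# The only given information: the new thing is a bird.
rep = infer_representation({0: 1.0})
inferred = predict(rep)
for name, value in zip(properties, inferred):
    print(f"{name:10s} {value:.2f}")
```

Because the fixed weights tie 'is a bird' to 'can fly', the inferred representation also activates 'can fly' and suppresses 'can swim', even though neither was given.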
Differential Importance
(Macario, 1991)
• 3- to 4-year-old children see a puppet
and are told he likes to eat, or
play with, a certain object (e.g.,
top object at right)
– Children then must choose
another one that will “be the
same kind of thing to eat” or
that will be “the same kind of
thing to play with”.
– In the first case they tend to
choose the object with the
same color.
– In the second case they will
tend to choose the object
with the same shape.
Adjustments to
Training
Environment
• Among the plants:
– All trees are large
– All flowers are small
– Either can be bright or
dull
• Among the animals:
– All birds are bright
– All fish are dull
– Either can be small or
large
• In other words:
– Size covaries with
properties that
differentiate different
types of plants
– Brightness covaries
with properties that
differentiate different
types of animals
Similarities of Obtained
Representations
Size is relevant
for Plants
Brightness is relevant
for Animals
Development and Degeneration
• Sensitivity to coherent covariation in an
appropriately structured Parallel Distributed
Processing system underlies the development
of conceptual knowledge.
• Gradual degradation of the representations
constructed through this developmental
process underlies the pattern of semantic
disintegration seen in semantic dementia.
Disintegration of Conceptual
Knowledge in Semantic Dementia
• Progressive loss of specific knowledge of
concepts, including their names, with
preservation of general information
• Overgeneralization of frequent names
• Illusory correlations
Picture naming
and drawing in
Semantic Dementia
Proposed Architecture for the
Organization of Semantic Memory
[Figure labels: name, action, motion, color, valence, form; temporal pole; medial temporal lobe]
Rogers et al. (2005) model of semantic
dementia
[Figure labels: temporal pole, name, function, assoc, vision]
Errors in Naming as a Function of Severity
[Figure: two panels, Patient Data (x-axis: Severity of Dementia) and Simulation Results (x-axis: Fraction of Neurons Destroyed); error types: omissions, within categ., superord.]
Simulation of Delayed Copying
• Visual input is
presented, then
removed.
• After several time
steps, pattern is
compared to the
pattern that was
presented initially.
[Figure labels: temporal pole, name, function, assoc, vision]
[Figure panels: IF’s ‘camel’, DC’s ‘swan’]
Simulation results
Omissions by feature type
Intrusions by feature type
Conclusion
• Distributed representations gradually differentiate in
ways that allow them to capture many phenomena in
conceptual development.
• Their behavior is approximated by a blend of Naïve
Bayes classifiers across several levels of granularity,
with the blending weights shifting toward finer-grained
categories as learning progresses.
• Effects of damage are approximated by a reversal of
this tendency: degraded representations retain the
coarse-grained knowledge but lose the finer-grained
information.
• We are currently extending the models to address the
sharing of knowledge across structurally related
domains. I’ll be glad to discuss this idea in response to
questions.