Emergence as a Perspective on Cognition and Development


Bayesian and Connectionist
Approaches to Learning
Tom Griffiths, Jay McClelland
Alison Gopnik, Mark Seidenberg
Who Are We and What Do
We Study?

We are
Cognitive and developmental psychologists who use
mathematical and computational models together with
experimental studies of children and adults

We study
Human cognitive processes ranging from object
recognition, language processing, and reading to
semantic cognition, naïve physics and causal reasoning
Our Question
How do probabilistic/Bayesian and
connectionist/neural network
models relate?
Brains all round…
Schedule

- Tom Griffiths: Probabilistic/Bayesian Approaches
- Jay McClelland: Connectionist/Neural Network Approaches
- Alison Gopnik: Causal Reasoning
- Mark Seidenberg: Language Acquisition
- Open Discussion: Robotics, Machine Learning, Other Applications…
Emergent Functions of
Simple Systems
J. L. McClelland
Stanford University
Topics

- Emergent probabilistic optimization in neural networks
- Relationship between competence/rational approaches and mechanistic (including connectionist) approaches
- Some models that bring connectionist and probabilistic approaches into proximal contact
Connectionist Units Calculate Posteriors
based on Priors and Evidence

Given
- A unit representing hypothesis h_i, with binary inputs a_j representing the state of various elements of evidence e, where each e_j is assumed conditionally independent given h_i
- A bias on the unit equal to log(prior_i / (1 - prior_i))
- Weights to the unit from each input j equal to log(p(e_j|h_i) / p(e_j|not h_i))

If
- the output of the unit is computed by taking the logistic function of the net input:
  net_i = bias_i + Σ_j a_j w_ij
  a_i = 1 / [1 + exp(-net_i)]

Then
- a_i = p(h_i|e)

[Diagram: unit i receives input a_j from unit j via weight w_ij.]



A set of units representing mutually exclusive alternatives can assign the posterior probability to each alternative in a similar way, using the softmax activation function:
  a_i = exp(g net_i) / Σ_i' exp(g net_i')
If g = 1, this constitutes probability matching. As g increases, more and more of the activation goes to the most likely alternative(s).
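A minimal sketch of this correspondence in Python (the priors, likelihoods, and evidence values below are made up for illustration, not taken from the talk): a logistic unit whose bias is the log prior odds and whose incoming weights are log likelihood ratios outputs exactly the Bayesian posterior, and a softmax pool does the same for mutually exclusive alternatives.

    import numpy as np

    # Illustrative numbers (assumed for this sketch): prior on hypothesis h,
    # and likelihood of each evidence element under h and under not-h.
    prior = 0.3
    p_e_given_h    = np.array([0.8, 0.6, 0.9])
    p_e_given_noth = np.array([0.2, 0.5, 0.4])
    a_j = np.array([1, 0, 1])                  # binary inputs: which evidence elements are present

    # Connectionist unit: bias = log prior odds, weights = log likelihood ratios.
    # Absent elements (a_j = 0) contribute nothing to the net input.
    bias = np.log(prior / (1 - prior))
    w = np.log(p_e_given_h / p_e_given_noth)
    net = bias + a_j @ w
    a = 1.0 / (1.0 + np.exp(-net))             # logistic activation

    # Direct application of Bayes' rule over the present evidence, for comparison.
    like_h    = np.prod(p_e_given_h[a_j == 1])
    like_noth = np.prod(p_e_given_noth[a_j == 1])
    posterior = prior * like_h / (prior * like_h + (1 - prior) * like_noth)
    print(a, posterior)                        # identical: a_i = p(h_i | e)

    # Softmax pool over mutually exclusive alternatives: with gain g = 1 the
    # activations equal the posteriors (probability matching); larger g
    # concentrates activation on the most likely alternative(s).
    def softmax(nets, g=1.0):
        e = np.exp(g * (nets - nets.max()))
        return e / e.sum()

    nets = np.array([0.2, -1.0, 1.3])          # made-up net inputs for three alternatives
    print(softmax(nets, g=1.0))
    print(softmax(nets, g=5.0))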
Emergent Outcomes from Local Computations
(Hopfield, ’82, Hinton & Sejnowski, ’83)

If w_ij = w_ji and units are updated asynchronously, setting
  a_i = 1 if net_i > 0, a_i = 0 otherwise,
the network will settle to a state s that is a local maximum of a measure Rumelhart et al. (1986) called G:
  G(s) = Σ_i<j w_ij a_i a_j + Σ_i a_i (bias_i + ext_i)
If instead each unit sets its activation to 1 with probability logistic(g net_i), then at equilibrium
  p(s) = exp(g G(s)) / Σ_s' exp(g G(s'))
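A small Python sketch of these two regimes (the network size, weights, and biases are arbitrary illustrative choices): with symmetric weights, deterministic asynchronous updates never decrease G, so the network settles into a local maximum; replacing the hard threshold with logistic sampling turns the same network into a Boltzmann-style sampler whose stationary distribution is proportional to exp(g G(s)).

    import numpy as np

    rng = np.random.default_rng(0)

    # Arbitrary small network: symmetric weights (w_ij = w_ji), no self-connections.
    n = 6
    W = rng.normal(size=(n, n))
    W = (W + W.T) / 2.0
    np.fill_diagonal(W, 0.0)
    bias = rng.normal(size=n)
    ext = np.zeros(n)                          # no external input in this toy example

    def goodness(a):
        # G(s) = sum_{i<j} w_ij a_i a_j + sum_i a_i (bias_i + ext_i)
        return 0.5 * a @ W @ a + a @ (bias + ext)

    a = rng.integers(0, 2, size=n).astype(float)
    for sweep in range(20):
        g_before = goodness(a)
        for i in rng.permutation(n):           # asynchronous updates
            net_i = W[i] @ a + bias[i] + ext[i]
            a[i] = 1.0 if net_i > 0 else 0.0   # deterministic rule: G can only go up
        assert goodness(a) >= g_before - 1e-12
    print("settled state:", a, "G =", goodness(a))

    # Stochastic (Boltzmann) version: a_i = 1 with probability logistic(g * net_i);
    # run long enough, states are visited with probability proportional to exp(g * G(s)).
    def logistic(x):
        return 1.0 / (1.0 + np.exp(-x))

    g = 1.0
    for step in range(1000):
        i = rng.integers(n)
        net_i = W[i] @ a + bias[i] + ext[i]
        a[i] = 1.0 if rng.random() < logistic(g * net_i) else 0.0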
A Tweaked Connectionist Model (McClelland &
Rumelhart, 1981) that is Also a Graphical Model

Each pool of units in the IA model is equivalent to a Dirichlet variable (cf. Dean, 2005).

This is enforced by using softmax to set one of the a_i in each pool to 1, with probability:
  p_j = exp(g net_j) / Σ_j' exp(g net_j')

Weight arrays linking the variables are the equivalent of the 'edges' encoding conditional relationships between the states of these different variables.

Biases at the word level encode the prior p(w).

Weights are bi-directional, but encode generative constraints (p(l|w), p(f|l)).

At equilibrium with g = 1, the network's probability of being in state s equals p(s|I).
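A toy two-pool sketch of this sampling interpretation, in Python (the words, letters, and all probabilities below are made-up illustrative values, not the IA model's actual parameters): each pool behaves as a categorical variable with one unit active at a time, the pools are resampled alternately using the same bidirectional weights, and with g = 1 the long-run frequency of word states approximates p(w | input).

    import numpy as np

    rng = np.random.default_rng(1)

    # Toy "word" pool and "letter" pool (illustrative values only).
    words = ["cat", "cot"]
    word_bias = np.log(np.array([0.7, 0.3]))   # encodes the prior p(w)
    W_wl = np.log(np.array([[0.9, 0.1],        # log p(letter | "cat")
                            [0.1, 0.9]]))      # log p(letter | "cot")
    ext_letter = np.log(np.array([0.2, 0.8]))  # bottom-up feature evidence, log p(f | l)

    def softmax(x, g=1.0):
        e = np.exp(g * (x - x.max()))
        return e / e.sum()

    # Gibbs-style alternation: each pool is resampled given the current state of
    # the other, using the same bidirectional weights in both directions.
    word_state, letter_state = 0, 0
    counts = np.zeros(len(words))
    for t in range(20000):
        letter_state = rng.choice(2, p=softmax(W_wl[word_state] + ext_letter))
        word_state = rng.choice(2, p=softmax(word_bias + W_wl[:, letter_state]))
        counts[word_state] += 1

    print(dict(zip(words, counts / counts.sum())))  # approx. p(w | input) with g = 1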
But that’s not the true PDP approach
to Perception/Cognition/etc…

We want to learn how to represent the world
and constraints among its constituents from
experience, using (to the fullest extent
possible) a domain-general approach.

In this context, the prototypical
connectionist learning rules correspond to
probability maximization or matching

Back Propagation Algorithm:
  Δw_ij = ε δ_i a_j
  (ε is a learning rate; δ_i is the back-propagated error signal for unit i.)
  Maximizes p(o_i|I) for each output unit.

Boltzmann Machine Learning Algorithm:
  Δw_ij = ε (a_i+ a_j+ - a_i- a_j-)
  (+ denotes the clamped/positive phase, - the free-running/negative phase.)
  Learns to match the probabilities of entire output states o given the current input I. That is, it minimizes
  ∫ p(o|I) log[ p(o|I) / q(o|I) ] do
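As a concrete illustration of the first rule (a toy single-layer case with made-up data, not the talk's own simulations): for a logistic output unit trained to increase log p(o_i | I), the gradient is exactly Δw_ij = ε δ_i a_j with δ_i = t_i - a_i, and repeated updates drive the unit's predicted probability of the observed output toward its maximum.

    import numpy as np

    rng = np.random.default_rng(2)

    # Toy single-layer case: one logistic output unit, one input pattern (assumed data).
    a_in = rng.random(4)                    # input activations a_j for one pattern I
    w = np.zeros(4)
    bias = 0.0
    t = 1.0                                 # binary target o_i for this input
    eps = 0.5

    for step in range(200):
        a_out = 1.0 / (1.0 + np.exp(-(w @ a_in + bias)))
        delta = t - a_out                   # derivative of log p(o_i | I) w.r.t. net_i
        w += eps * delta * a_in             # delta rule: dw_ij = eps * delta_i * a_j
        bias += eps * delta

    print(a_out)                            # approaches 1: p(o_i | I) is being maximized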
Recent Developments
Hinton’s deep belief
networks are fully distributed
learned connectionist models
that use a restricted form of
the Boltzmann machine (no
intra-layer connections).
They are fast and beat other
machine learning methods.
Adding generic constraints
(sparsity, locality) allows such
networks to learn efficiently
and generalize very well in
demanding task contexts.
Hinton, Osindero, and Teh (2006). A fast
learning algorithm for deep belief nets.
Neural Computation, 18, 1527-1554.
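A minimal sketch of the restricted Boltzmann machine building block that such networks stack (the binary data and hyperparameters below are assumed for illustration; this shows one-step contrastive divergence for a single RBM, not Hinton et al.'s full greedy layer-wise procedure):

    import numpy as np

    rng = np.random.default_rng(3)

    def logistic(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Toy RBM: visible and hidden layers, no intra-layer connections,
    # trained with one step of contrastive divergence (CD-1).
    data = np.array([[1, 1, 0, 0],
                     [0, 0, 1, 1]], dtype=float)   # made-up binary patterns
    n_vis, n_hid = 4, 2
    W = 0.01 * rng.normal(size=(n_vis, n_hid))
    b_vis = np.zeros(n_vis)
    b_hid = np.zeros(n_hid)
    eps = 0.1

    for step in range(2000):
        v0 = data[rng.integers(len(data))]
        ph0 = logistic(v0 @ W + b_hid)                      # positive phase
        h0 = (rng.random(n_hid) < ph0).astype(float)
        pv1 = logistic(h0 @ W.T + b_vis)                    # one-step reconstruction
        v1 = (rng.random(n_vis) < pv1).astype(float)
        ph1 = logistic(v1 @ W + b_hid)                      # negative phase
        W += eps * (np.outer(v0, ph0) - np.outer(v1, ph1))  # CD-1 weight update
        b_vis += eps * (v0 - v1)
        b_hid += eps * (ph0 - ph1)

    print(logistic(data @ W + b_hid))   # hidden units come to distinguish the two patterns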
Topics

- Emergent probabilistic optimization in neural networks
- Relationship between competence/rational approaches and mechanistic (including connectionist) approaches
- Some models that bring connectionist and probabilistic approaches into proximal contact
Two perspectives

One perspective:
- People are rational; their behavior is optimal.
- They seek explicit internal models of the structure of the world, within which to reason:
  - Optimal structure type for each domain
  - Optimal structure instance within type

The other perspective:
- People evolved through an optimization process, and are likely to approximate optimality/rationality within limits.
- Fundamental aspects of natural/intuitive cognition may depend largely on implicit knowledge.
- Natural structure (e.g. language) does not exactly correspond to any specific structure type.
- Culture/school encourages us to think and reason explicitly, and gives us tools for this; we do so under some circumstances.
- Many connectionist models do not directly address this kind of thinking; eventually they should be elaborated to do so.
Two Perspectives, Cont’d

One perspective:
- Resource limits and implementation constraints are unknown, and should be ignored in determining what is rational/optimal.
- Inference is still hard, and prior domain-specific constraints are therefore essential.

The other perspective:
- Human behavior won't be understood without considering the constraints it operates under.
- Determining what is optimal sans constraints is always useful, even so; such an effort should not presuppose that individual humans intend to derive an explicit model.
- Inference is hard, and domain-specific priors can help, but domain-general mechanisms subject to generic constraints deserve full exploration.
- In some cases such models may closely approximate what might be the optimal explicit model; but that model might only be an approximation, and the domain-specific constraints might not be necessary.
Perspectives on Development


- A competence-level approach can ask: what is the best representation a child could have, given the data gathered to date? The entire data sample is retained, and the optimal model is re-estimated.

- The developing child is an on-line learning system; the parameters of the mind are adjusted as each new experience comes in, and the experiences themselves are rapidly lost.
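The contrast just described can be made concrete with a toy example (the Beta-Bernoulli setting, learning rate, and simulated data below are illustrative assumptions, not anything from the talk): a competence-level learner keeps every observation and re-estimates the optimal model from the full sample, while an on-line learner nudges a single parameter after each experience and then discards it.

    import numpy as np

    rng = np.random.default_rng(4)

    # Toy setting: estimating a single probability from a stream of binary
    # "experiences" with an assumed true rate of 0.7.
    data = rng.random(500) < 0.7

    # Competence-level view: every observation is retained, and the optimal model is
    # re-estimated from the full sample (mean of a Beta(1,1)-Bernoulli posterior).
    batch_estimate = (data.sum() + 1) / (len(data) + 2)

    # On-line view: a single parameter is nudged by each experience, which is then lost.
    theta, lr = 0.5, 0.05
    for x in data:
        theta += lr * (x - theta)

    print(batch_estimate, theta)        # both end up near 0.7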
Is a Convergence Possible?


Yes!
It is possible to ask what is optimal/rational within any set of constraints:
- Time
- Architecture
- Algorithm
- Reliability and dynamics of the hardware
It is then possible to ask how close some mechanism actually
comes to achieving optimality, within the specified constraints.
It is also possible to ask how close it comes to explaining actual
human performance, including performance in learning and
response to experience during development.
Topics

- Emergent probabilistic optimization in neural networks
- Relationship between competence/rational approaches and mechanistic (including connectionist) approaches
- Some models that bring connectionist and probabilistic approaches into proximal contact

Models that Bring Connectionist and
Probabilistic Approaches into Proximal
Contact

- Graphical IA model of Context Effects in Perception

- Leaky Competing Accumulator Model of Decision Dynamics
  Usher and McClelland, 2001, and the large family of related decision-making models.

- Models of Unsupervised Category Learning
  In progress; see Movellan & McClelland, 2001. Competitive Learning, OME, TOME (Lake et al., ICDL 2008).

- Subjective Likelihood Model of Recognition Memory
  McClelland and Chappell, 1998 (cf. REM, Steyvers and Shiffrin, 1997), and a forthcoming variant using distributed item representations.