The Principle of Presence: A Heuristic for Growing Knowledge Structured Neural Networks
Laurent Orseau, INSA/IRISA, Rennes, France
Neural Networks

Efficient at learning single problems
Fully connected
Convergence in W³
Lifelong learning:
Specific cases can be important
More knowledge, more weights
Catastrophic forgetting
-> Full connectivity not suitable
-> Need locality
How can people learn so fast?


Focus, attention
Raw table storing?
[Images: a frog, a car, a running woman; stored with generalization]
What do people memorize? (1)




One memory: a set of "things"
Things are made of other, simpler things
Thing = concept
Basic concept = perceptual event
What do people memorize? (2)

Remember only what is present in mind at the time of memorization:
What is seen
What is heard
What is thought
Etc.
What do people memorize? (3)

Not what is not in mind!
Too many concepts are known
What is present: few things, probably important
What is absent: many things, probably irrelevant
Good but not always true -> heuristic
Presence in everyday life




Easy to see what is present, harder to tell what is missing
Infants lose attention to balls that have just disappeared
The number zero was invented long after the other digits
Etc.
The principle of presence



Memorization = create a new concept from only the active concepts
Independent of the number of known concepts
Few active concepts -> few variables -> fast generalization
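
As a minimal sketch (illustrative Python, not the author's implementation), memorization under the principle of presence only looks at the concepts that are currently active, so its cost is independent of the total number of known concepts:

```python
# Minimal sketch (assumed names, not from the paper): a new concept is
# built from the currently active concepts only, so memorization cost
# depends on the few active concepts, never on the whole knowledge base.

def memorize(active_concepts):
    """Create a new concept as a conjunction of the active concepts."""
    return {"kind": "conjunction", "inputs": frozenset(active_concepts)}

# Whatever else the system already knows, memorizing this scene only
# involves the three concepts that are active right now.
print(memorize({"frog", "green", "jumping"}))
```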
Implications




A concept can be active or inactive
Activity must reflect importance and be rare (~ an event, in the programming sense)
New concept = conjunction of active ones
Concepts must be re-usable (lifelong):
Re-use = create a link from this concept
2 independent concepts = 2 units
-> More symbolic than an MLP, where a neuron can represent too many things
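
A hedged sketch of these implications, with illustrative names (Concept, link_to) that are not from the paper: each concept is its own unit with a rare, event-like activity flag, and re-use means creating a link from that unit to a new one:

```python
# Illustrative sketch: one unit per concept, activity is a rare event,
# and re-use = a link from an existing concept to a new, higher concept.

class Concept:
    def __init__(self, name):
        self.name = name
        self.active = False      # activity should be rare and meaningful
        self.out_links = []      # concepts built on top of this one

    def link_to(self, other):
        """Re-use this concept as an input of another concept."""
        self.out_links.append(other)

# Two independent concepts are two distinct units (more symbolic than an
# MLP, where one neuron may take part in representing many things).
frog, pond = Concept("frog"), Concept("pond")
frog_in_pond = Concept("frog in pond")
frog.link_to(frog_in_pond)
pond.link_to(frog_in_pond)
```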
Implementation: NN



Nonlinearity
Graph properties: local or global connectivity
Weights: smooth on-line generalization, resistant to noise
But more symbolic:
Inactivity: piecewise continuous activation function
Knowledge not too distributed
Concepts not too overlapping
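
One way to realize the inactivity requirement is a piecewise continuous activation that is exactly zero below a threshold; the threshold value and the ramp shape below are assumptions for illustration, not the paper's exact function:

```python
# Assumed activation shape: exact zero (true inactivity) below a
# threshold, then a continuous ramp up to 1. The threshold is a guess.

def presence_activation(x, threshold=0.5):
    """Piecewise continuous: 0 below the threshold, linear ramp above it."""
    if x < threshold:
        return 0.0               # hard inactivity region
    return min(1.0, (x - threshold) / (1.0 - threshold))

for x in (0.2, 0.5, 0.75, 1.0):
    print(x, presence_activation(x))
```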
First implementation




Inputs: basic events
Output: target concept
No macro-concepts -> 3-layer network
Neuron = conjunction, unless explicit (supervised learning) -> DNF
Output weights simulate priority
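
A rough sketch of this 3-layer structure, assuming the 1/3-style conjunction weights of the later example slides and a firing threshold of 0.99 (both assumptions): inputs are basic events, hidden neurons are conjunctions, and the target concept is a disjunction over them, i.e. a DNF:

```python
# Assumed encoding of the 3-layer DNF structure: each hidden neuron is a
# conjunction (weighted AND) of input events; the target concept is a
# disjunction over the hidden neurons.

def conjunction(weights, active):
    """Fully active only when all of its weighted inputs are active."""
    return sum(w for i, w in weights.items() if i in active)

def target_fires(hidden, active, threshold=0.99):
    """Disjunction over hidden conjunctions (output weights act as priorities)."""
    return max(conjunction(w, active) for w in hidden) >= threshold

# DNF: (A and B and C) or (A and B and D)
hidden = [{"A": 1/3, "B": 1/3, "C": 1/3},
          {"A": 1/3, "B": 1/3, "D": 1/3}]
print(target_fires(hidden, {"A", "B", "C"}))   # True
print(target_fires(hidden, {"A", "B", "E"}))   # False
```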
Locality in learning

Only one neuron modified at a time:
If target concept not activated when it should:
Nearest = most activated
Generalize the nearest connected neuron
Add a neuron for that specific case
If target active, but not enough or too much:
Generalize the most activating neuron
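
One possible reading of this rule as code (a sketch only: the generalization update, the learning rate, and the decision of when to add a neuron are assumptions, not the paper's exact criterion):

```python
# Illustrative locality rule: per example, at most one existing neuron is
# modified; a new neuron may be added for the specific case.

def score(neuron, active):
    """Activation of a conjunction neuron for the current active inputs."""
    return sum(w for i, w in neuron.items() if i in active)

def generalize(neuron, active, rate=0.2):
    """Shift weight toward the inputs that were active, away from the rest."""
    for i in list(neuron):
        neuron[i] *= (1 + rate) if i in active else (1 - rate)

def learn_step(neurons, active, should_fire, target_out):
    if should_fire and target_out == 0.0:
        # Target concept did not fire although it should have:
        if neurons:
            k = max(range(len(neurons)), key=lambda j: score(neurons[j], active))
            generalize(neurons[k], active)       # nearest = most activated
        # ...and memorize the specific case as a new conjunction.
        neurons.append({i: 1.0 / len(active) for i in active})
    elif target_out > 0.0:
        # Target active, but not enough (or too much): adjust only the
        # neuron that activates it the most.
        k = max(range(len(neurons)), key=lambda j: score(neurons[j], active))
        generalize(neurons[k], active)
    return neurons
```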
Learning: example (0)


Must learn AB.
Examples: ABC, ABD, ABE, but not AB.
[Diagram: inputs A, B, C, D, E; the target concept AB already exists as the output]
Learning: example (1)

ABC: N1 active when A, B and C all active
[Diagram: a conjunction neuron N1 is created with weights 1/3 from each of A, B and C; the target AB is a disjunction over such conjunction neurons, with output weight 1 (a 1-1/Ns label also appears in the figure)]
Learning: example (2)

ABD:
[Diagram: N1's weights from A and B increase (>1/3) while its weight from C decreases (<1/3); a new neuron N2 is added with weights 1/3 from A, B and D]
Learning: example (3)

ABE: N1 slightly active for AB
[Diagram: N1's weights from A and B increase further (>>1/3) and its weight from C shrinks (<<1/3); N2's weights shift similarly (>1/3 and <1/3)]
Learning: example (4)

Final: N1 has generalized, active for AB
[Diagram: N1 now has weights 1/2 from A and 1/2 from B (0 from C), so it fires for AB alone; N2 remains a specific-case neuron with 1/3 weights: a useless neuron, deleted by criterion]
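
The end state of this walk-through can be checked with a small self-contained script; the weights are read off the final figure, while the 0.99 firing threshold and N2's exact weights are assumptions:

```python
# Final state from example (4), as read from the figure: N1 has
# generalized to weights 1/2 from A and B; N2 (assumed 1/3 from A, B, D)
# is the useless specific-case neuron that a criterion later deletes.

N1 = {"A": 0.5, "B": 0.5}
N2 = {"A": 1/3, "B": 1/3, "D": 1/3}

def fires(neuron, active, threshold=0.99):
    return sum(w for i, w in neuron.items() if i in active) >= threshold

# The target AB is a disjunction over N1 and N2, so AB alone now fires it.
for active in ({"A", "B"}, {"A", "C"}, {"A", "B", "E"}):
    print(sorted(active), fires(N1, active) or fires(N2, active))
# ['A', 'B'] True; ['A', 'C'] False; ['A', 'B', 'E'] True
```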
NETtalk task

TDNN: 120 neurons, 25,200 connections, 90%
Presence: 753 neurons, 6,024 connections, 74%, then learns by heart
If input activity is reversed -> catastrophic!
Many cognitive tasks heavily biased toward the principle of presence?
Advantages w.r.t. NNs

As many inputs as wanted; only the active ones are used
Lifelong learning:
Large-scale networks
Learns specific cases and generalizes, both quickly
Can lower weights without wrong predictions -> imitation
But…



With few data and a limited number of neurons: not as good as backprop
Creates many neurons (but they can be deleted)
No negative weights
Work in progress

Negative case (must stay rare): inhibitory links
Re-use of concepts: macro-concepts, where each concept can become an input