Transcript Part 1

● September 2005
AIFB
Hammer & Hitzler ● KI2005 Tutorial ● Koblenz, Germany
Connectionist
Knowledge Representation and Reasoning
SCREECH
(Part I)
Neural Networks and Structured Knowledge
Fodor, Pylyshyn: What’s deeply wrong with Connectionist architecture is this:
Because it acknowledges neither syntactic nor semantic structure in mental
representations, it perforce treats them not as a generated set but as a list.
[Connectionism and Cognitive Architecture 88]
Our claim: state-of-the-art connectionist architectures do adequately deal
with structures!
Slide 1
Tutorial Outline (Part I)
Neural networks and structured knowledge
• Feedforward networks
– The good old days: KBANN and co.
– Useful: neurofuzzy systems, data mining pipeline
– State of the art: structure kernels
• Recurrent networks
– The basics: Partially recurrent networks
– Lots of theory: Principled capacity and limitations
– To do: Challenges
• Recursive data structures
– The general idea: recursive distributed representations
– One breakthrough: recursive networks
– Going on: towards more complex structures
Slide 2
The good old days – KBANN and co.
feedforward neural network: a neuron combines inputs x1, …, xn with weights
w1, …, wn and threshold θ into σ(w^T x − θ), where σ(t) = sgd(t) = (1 + e^(−t))^(−1);
the whole network computes a mapping f_w : ℝ^n → ℝ^o from input x to output y.
Issues for symbolic knowledge:
1. black box
2. distributed representation
3. connection to rules for symbolic I/O ?
Slide 4
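To make the notation above concrete, here is a minimal Python sketch (not from the tutorial; all dimensions and values are illustrative) of a single sigmoid neuron σ(w^T x − θ) and a small feedforward mapping f_w : ℝ^n → ℝ^o:

import numpy as np

def sgd(t):
    # logistic squashing function sgd(t) = (1 + e^(-t))^(-1)
    return 1.0 / (1.0 + np.exp(-t))

def neuron(x, w, theta):
    # a single unit: sigma(w^T x - theta)
    return sgd(np.dot(w, x) - theta)

def fnn(x, W1, theta1, W2, theta2):
    # one hidden layer followed by an output layer, all units squashing
    h = sgd(W1 @ x - theta1)
    return sgd(W2 @ h - theta2)

rng = np.random.default_rng(0)
x = rng.random(3)
print(neuron(x, rng.normal(size=3), theta=0.2))
print(fnn(x, rng.normal(size=(4, 3)), np.zeros(4), rng.normal(size=(2, 4)), np.zeros(2)))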
The good old days – KBANN and co.
Knowledge Based Artificial Neural Networks [Towell/Shavlik,
AI 94]:
• start with a network which represents known rules
• train using additional data
• extract a set of symbolic rules after training
Pipeline: (partial) rules + data → train → (complete) rules
Slide 5
The good old days – KBANN and co.
(partial) rules: propositional acyclic Horn clauses such as
A :- B,C,D
E :- A
are translated into an FNN with one neuron per boolean variable,
e.g. B, C, D feed A with weights 1 and threshold 2.5,
and A feeds E with weight 1 and threshold 0.5.
Slide 6
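A small illustrative sketch of this rule-to-network translation, idealizing the steep sigmoids as threshold units; the weights 1 and thresholds 2.5 and 0.5 are the ones on the slide, the rest is my own minimal example:

def step(t):
    # idealized steep sigmoid: the unit fires iff its net input exceeds the threshold
    return 1.0 if t > 0 else 0.0

def kbann_forward(b, c, d):
    a = step(1.0 * b + 1.0 * c + 1.0 * d - 2.5)   # A :- B, C, D   (threshold 2.5)
    e = step(1.0 * a - 0.5)                       # E :- A         (threshold 0.5)
    return a, e

print(kbann_forward(1, 1, 1))   # (1.0, 1.0): A and E are derived
print(kbann_forward(1, 1, 0))   # (0.0, 0.0): the body of A is not satisfied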
The good old days – KBANN and co.
train with data: use some form of backpropagation and add a penalty to the
error, e.g. for changing the weights (see the sketch below).
1. The initial network biases the training result, but
2. there is no guarantee that the initial rules are preserved
3. there is no guarantee that the hidden neurons maintain their
semantics
Slide 7
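A minimal sketch of this training idea (my own example, not the KBANN implementation): squared error plus a penalty λ·||W − W_init||² that discourages moving away from the rule-derived initial weights, minimized by gradient descent on a single sigmoid layer.

import numpy as np

def sgd(t):
    return 1.0 / (1.0 + np.exp(-t))

def loss_and_grad(W, W_init, X, Y, lam=0.1):
    P = sgd(X @ W)                                  # predictions of one output unit
    err = P - Y
    data_loss = 0.5 * np.sum(err ** 2)
    penalty = 0.5 * lam * np.sum((W - W_init) ** 2)   # penalty for changing the weights
    grad = X.T @ (err * P * (1 - P)) + lam * (W - W_init)
    return data_loss + penalty, grad

rng = np.random.default_rng(0)
X = rng.random((20, 3))
Y = (X[:, 0] > 0.5).astype(float)
W_init = np.array([2.0, 0.0, 0.0])                   # "rule": output depends on input 1
W = W_init.copy()
for _ in range(200):
    L, g = loss_and_grad(W, W_init, X, Y)
    W -= 0.1 * g
print(W)                                             # stays close to W_init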
The good old days – KBANN and co.
(complete)
rules
1. There is no exact direct correspondence between a neuron and a
single rule, although each neuron (and the overall mapping)
can be approximated arbitrarily well by a set of rules
2. It is NP-complete to find a minimum logical description for a
trained network [Golea, AISB'96]
3. Therefore, a couple of different rule extraction algorithms
have been proposed, and this is still a topic of ongoing
research
Slide 8
The good old days – KBANN and co.
(complete) rules – two extraction strategies:
• decompositional approach: extract rules from the individual
neurons/weights inside the network
• pedagogical approach: treat the network as a black box mapping
x to y and extract rules describing this input–output behavior
Slide 9
The good old days – KBANN and co.
Decompositional approaches:
• subset algorithm, MofN algorithm describe single neurons by
sets of active predecessors [Craven/Shavlik, 94]
• local activation functions (RBF-like) allow an approximate direct
description of single neurons [Andrews/Geva, 96]
• MLP2LN biases the weights towards 0/-1/1 during training and
can then extract exact rules [Duch et al., 01]
• prototype-based networks can be decomposed along relevant
input dimensions by decision tree nodes [Hammer et al., 02]
Observation:
• usually some variation of if-then rules is achieved
• small rule sets are only achieved if further constraints
guarantee that single weights/neurons have a meaning
• tradeoff between accuracy and size of the description
Slide 10
The good old days – KBANN and co.
Pedagogical approaches:
• extraction of conjunctive rules by extensive search [Saito/Nakano, 88]
• interval propagation [Gallant, 93; Thrun, 95]
• extraction by minimum separation [Tickle/Diederich, 94]
• extraction of decision trees [Craven/Shavlik, 94]
• evolutionary approaches [Markovska, 05]
Observation:
• usually some variation of if-then rules is achieved
• symbolic rule induction with a little (or a bit more) help from a
neural network (see the sketch below)
Slide 11
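A sketch of the pedagogical idea only, not of Craven/Shavlik's exact algorithm or any other specific published method: query the trained network as a black box on sampled inputs and fit a symbolic learner (here a decision tree from scikit-learn, which is my choice) to its answers. The stand-in "trained network" is also an assumption for illustration.

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

def trained_net(X):
    # stand-in for the black-box network f_w
    return (1.0 / (1.0 + np.exp(-(X[:, 0] + X[:, 1] - 1.0))) > 0.5).astype(int)

rng = np.random.default_rng(1)
X = rng.random((500, 3))                 # sample the input space
y_net = trained_net(X)                   # ask the network, not the original data
tree = DecisionTreeClassifier(max_depth=2).fit(X, y_net)
print(export_text(tree))                 # readable if-then description of the network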
The good old days – KBANN and co.
What is this good for?
Contra:
• Nobody uses FNNs these days
• Insertion of prior knowledge might be valuable, but efficient training
algorithms make it possible to substitute this with additional training
data (generated via rules)
• Validation of the network output might be valuable, but there exist
alternative (good) guarantees from statistical learning theory
• If-then rules are not very interesting since there exist good symbolic
learners for learning propositional rules for classification
Pro:
• Propositional rule insertion/extraction is often an essential part of more
complex rule insertion/extraction mechanisms
• It demonstrates a key problem, different modes of representation, in a
very nice way
• Some people, e.g. in the medical domain, also want an explanation for a
classification
• There are at least two application domains where if-then rules are very
interesting and not so easy to learn: fuzzy control and unsupervised data
mining
Slide 12
Useful: neurofuzzy systems
Fuzzy control loop: the controller observes the process output and feeds a
control signal back to the process input.
Fuzzy control: if (observation ∈ FM_I) then (control ∈ FM_O)
Slide 14
Useful: neurofuzzy systems
Fuzzy control: if (observation ∈ FM_I) then (control ∈ FM_O)
Neurofuzzy control: a neural network maps observations to controls,
i.e. it implements the fuzzy if-then rules.
Benefit: the form of the fuzzy rules (i.e. the neural architecture) and the
shape of the fuzzy sets (i.e. the neural weights) can be learned from data!
(a small sketch follows below)
Slide 15
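A minimal sketch of the neurofuzzy idea, assuming Gaussian fuzzy sets and a Takagi-Sugeno-style weighted average (the NEFCON/ANFIS systems mentioned in the tutorial are considerably more elaborate); the centres, widths and consequents play the role of the trainable neural weights.

import numpy as np

centres = np.array([-1.0, 0.0, 1.0])       # fuzzy sets "negative", "zero", "positive"
widths  = np.array([0.7, 0.7, 0.7])
consequents = np.array([1.0, 0.0, -1.0])   # control value suggested by each rule

def membership(obs):
    # Gaussian membership of the observation in each fuzzy set
    return np.exp(-((obs - centres) / widths) ** 2)

def control(obs):
    mu = membership(obs)                              # rule activations
    return np.dot(mu, consequents) / np.sum(mu)       # defuzzified control

print(control(0.4))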
Useful: neurofuzzy systems
• NEFCON implements Mamdani control
[Nauck/Klawonn/Kruse, 94]
• ANFIS implements Takagi-Sugeno control [Jang, 93]
• and many others …
• Learning
– of rules: evolutionary or clustering
– of fuzzy set parameters: reinforcement learning or
some form of Hebbian learning
Slide 16
Useful: data mining pipeline
Task: describe given inputs (no class information) by if-then rules
• Data mining with emergent SOM, clustering, and rule
extraction [Ultsch, 91]
Slide 17
State of the art: structure kernels
Pipeline: data (sets, sequences, tree structures, graph structures)
→ kernel k(x,x′) → SVM
Just compute pairwise distances for this complex data using
structure information.
Slide 19
State of the art: structure kernels
• Closure properties of kernels [Haussler, Watkins]
• Principled problems for complex structures:
computing informative graph kernels is at least as
hard as graph isomorphism [Gärtner]
• Several promising proposals - taxonomy [Gärtner]:
• Several promising proposals – taxonomy [Gärtner]:
– syntax-based: count common substructures
– semantics-based: derived from local transformations,
or derived from a probabilistic model
Slide 20
State of the art: structure kernels
Count common substructures, e.g. the 2-mers GA, AG, AT:

           GA  AG  AT
  GAGAGA    3   2   0
  GAT       1   0   1      k(GAGAGA, GAT) = 3·1 + 2·0 + 0·1 = 3

Efficient computation: dynamic programming, suffix trees
(a small sketch follows below)

locality improved kernel [Sonnenburg et al.], bag of words [Joachims]
string kernel [Lodhi et al.], spectrum kernel [Leslie et al.]
word-sequence kernel [Cancedda et al.]
convolution kernels for language [Collins/Duffy, Kashima/Koyanagi, Suzuki et al.]
kernels for relational learning [Zelenko et al., Cumby/Roth, Gärtner et al.]
graph kernels based on paths or subtrees [Gärtner et al., Kashima et al.]
kernels for prolog trees based on similar symbols [Passerini/Frasconi/deRaedt]
Slide 21
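A minimal sketch of a "count common substructures" kernel, here a 2-mer spectrum-style kernel that reproduces the GA/AG/AT example above; the naive counting is illustrative only and ignores the efficient suffix-tree/dynamic-programming computations mentioned on the slide.

from collections import Counter

def kmer_counts(s, k=2):
    # count all contiguous substrings of length k
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

def spectrum_kernel(s, t, k=2):
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    return sum(cs[m] * ct[m] for m in cs)   # inner product of the count vectors

print(kmer_counts("GAGAGA"))                 # GA: 3, AG: 2
print(spectrum_kernel("GAGAGA", "GAT"))      # 3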
State of the art: structure kernels
Derived from a probabilistic model: describe the data by a
probabilistic model P(x) and compare characteristics of P(x).
Fisher kernel [Jaakkola et al., Karchin et al., Pavlidis et al.,
Smith/Gales, Sonnenburg et al., Siolas et al.]
tangent vector of log odds [Tsuda et al.]
marginalized kernels [Tsuda et al., Kashima et al.]
kernel of Gaussian models [Moreno et al., Kondor/Jebara]
Slide 22
State of the art: structure kernels
Derived from local transformations: a local "is similar to"
neighborhood with generator H is expanded to a global kernel
(see the sketch below).
diffusion kernel [Kondor/Lafferty, Lafferty/Lebanon, Vert/Kanehisa]
Slide 23
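A minimal sketch of a diffusion kernel in the sense of Kondor/Lafferty, K = exp(βH) with H the generator of the local neighborhood structure; the small path graph and the bandwidth β = 0.5 are arbitrary choices of mine for illustration.

import numpy as np
from scipy.linalg import expm

A = np.array([[0, 1, 0, 0],                  # adjacency matrix of a 4-node path graph
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = A - np.diag(A.sum(axis=1))               # generator: negative graph Laplacian
K = expm(0.5 * H)                            # diffusion kernel with beta = 0.5
print(np.round(K, 3))                        # positive semi-definite similarity matrix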
State of the art: structure kernels
• Intelligent preprocessing (kernel extraction) allows an adequate
integration of semantic/syntactic structure information
• This can be combined with state of the art neural methods such
as SVM
• Very promising results for
– classification of documents and text [Duffy, Leslie, Lodhi, …]
– detecting remote homologies for genomic sequences and further
problems in genome analysis [Haussler, Sonnenburg, Vert, …]
– quantitative structure–activity relationships in chemistry [Baldi et al.]
Slide 24
Conclusions: feedforward networks
• propositional rule insertion and extraction are possible (to some extent)
• useful for neurofuzzy systems, data mining
• structure-based kernel extraction followed by learning
with SVM yields state-of-the-art results
but: sequential instead of a fully integrated neurosymbolic approach
• FNNs themselves are restricted to flat data which can be
processed in one shot; no recurrence
Slide 25
The basics: partially recurrent networks
[Elman, Finding structure in time, CogSci 90]
A very natural architecture for processing speech/temporal
signals/control/robotics: the context layer feeds the state back, so that
x_{t+1} = f(x_t, I_t).
1. can process time series of arbitrary length
2. interesting for speech processing [see e.g. Kremer, 02]
3. training uses a variation of backpropagation [see e.g. Pearlmutter, 95]
(a small sketch follows below)
Slide 27
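A minimal sketch of the Elman-style update x_{t+1} = f(x_t, I_t) with a linear readout; the weights are random placeholders rather than trained ones, and the dimensions are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
n_in, n_state, n_out = 3, 5, 2
W_in  = rng.normal(size=(n_state, n_in))
W_rec = rng.normal(size=(n_state, n_state))
W_out = rng.normal(size=(n_out, n_state))

def run(inputs):
    x = np.zeros(n_state)                        # initial context
    outputs = []
    for I in inputs:                             # time series of arbitrary length
        x = np.tanh(W_rec @ x + W_in @ I)        # x_{t+1} = f(x_t, I_t)
        outputs.append(W_out @ x)
    return outputs

print(run([rng.random(n_in) for _ in range(4)])[-1])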
Lots of theory: principled capacity and limitations
RNNs and finite automata [Omlin/Giles, 96]: the recurrent state layer
(input → state → output) mirrors the dynamics of the transition
function of a DFA.
Slide 29
Lots of theory: principled capacity and limitations
DFA → RNN: with unary input and a unary state representation,
implement (approximate) the boolean formula corresponding to the
state transition within a two-layer network.
⇒ RNNs can exactly simulate finite automata (see the sketch below)
Slide 30
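A small sketch of this DFA → RNN construction for a concrete automaton (parity of the letter 'a', my own example): states and inputs are coded in unary, and a two-layer threshold network realizes the transition formula exactly.

states, symbols = ["even", "odd"], ["a", "b"]
delta = {("even", "a"): "odd", ("odd", "a"): "even",
         ("even", "b"): "even", ("odd", "b"): "odd"}

def step(t):
    # threshold unit (limit of a steep sigmoid)
    return 1.0 if t > 0 else 0.0

def rnn_run(word):
    s = {q: 1.0 if q == "even" else 0.0 for q in states}        # unary state code
    for ch in word:
        x = {a: 1.0 if a == ch else 0.0 for a in symbols}       # unary input code
        # layer 1: unit h[(q, a)] fires iff state q is active AND symbol a is read
        h = {(q, a): step(s[q] + x[a] - 1.5) for q in states for a in symbols}
        # layer 2: state q' becomes active iff some firing h[(q, a)] has delta(q, a) = q'
        s = {q2: step(sum(h[(q, a)] for (q, a) in h if delta[(q, a)] == q2) - 0.5)
             for q2 in states}
    return max(states, key=lambda q: s[q])

print(rnn_run("aab"))   # two 'a' -> "even"
print(rnn_run("ab"))    # one 'a' -> "odd"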
Lots of theory: principled capacity and limitations
RNN → DFA: the input is unary, but in general the state representation
is distributed; cluster the states into disjoint subsets corresponding to
automaton states and observe their behavior.
⇒ approximate description
⇒ approximate extraction of automata rules is possible
(see the sketch below)
Slide 31
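A sketch of the extraction idea only, not of any specific published algorithm: run the network on sample strings, cluster the hidden states into a few discrete "automaton states" (here with k-means from scikit-learn, my choice), and tabulate which cluster follows which under each symbol. The untrained random RNN is a stand-in for a trained one.

import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
W_rec, W_in = rng.normal(size=(4, 4)), rng.normal(size=(4, 2))   # stand-in RNN

def states_of(word):
    x, seq = np.zeros(4), []
    for ch in word:
        inp = np.array([1.0, 0.0]) if ch == "a" else np.array([0.0, 1.0])
        x = np.tanh(W_rec @ x + W_in @ inp)
        seq.append(x.copy())
    return seq

words = ["".join(rng.choice(list("ab"), size=6)) for _ in range(50)]
runs = [states_of(w) for w in words]
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(
    np.vstack([s for r in runs for s in r]))

transitions = Counter()
for w, r in zip(words, runs):
    labels = km.predict(np.array(r))
    for t in range(1, len(w)):
        transitions[(labels[t - 1], w[t], labels[t])] += 1
print(transitions.most_common(5))      # approximate rules (state, symbol) -> state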
Lots of theory: principled capacity and limitations
The principled capacity of RNNs can be characterized exactly:
• RNNs with arbitrary weights = non-uniform Boolean circuits
(super-Turing capability) [Siegelmann/Sontag]
• RNNs with rational weights = Turing machines [Siegelmann/Sontag]
• RNNs with limited noise = finite state automata
[Omlin/Giles, Maass/Orponen]
• RNNs with small weights or Gaussian noise = finite memory models
[Hammer/Tino, Maass/Sontag]
Slide 32
Lots of theory: principled capacity and limitations
However, learning might be difficult:
• gradient-based learning schemes face the problem of long-term
dependencies [Bengio/Frasconi]
• RNNs are not PAC-learnable (infinite VC dimension); only
distribution-dependent bounds can be derived [Hammer]
(illustration on the slide: long 'tatata…' sequences of varying length)
• there exist only few general guarantees for the long-term behavior
of RNNs, e.g. stability [Suykens, Steil, …]
Slide 33
Lots of theory: principled capacity and limitations
RNNs
• naturally process time series
• incorporate plausible regularization such as a bias towards
finite memory models
• have sufficient power for interesting dynamics (context-free,
context-sensitive; arbitrary attractors and chaotic behavior)
but:
• training is difficult
• only limited guarantees for the long-term behavior and
generalization ability exist
⇒ symbolic description/knowledge can provide solutions
Slide 34
Lots of theory: principled capacity and limitations
Is there a correspondence between RNNs and recurrent symbolic systems?
• RNN: real numbers; iterated function systems give rise to
fractals/attractors/chaos; implicit memory
(e.g. an attractor/repellor pair for counting a^n b^n c^n)
• recurrent symbolic system: discrete states; a crisp boolean function
on the states + explicit memory
Slide 35
To do: challenges
Training RNNs:
• search for appropriate regularizations inspired by a focus on
specific functionalities – architecture (e.g. local), weights (e.g.
bounded), activation function (e.g. linear), cost term (e.g.
additional penalties) [Hochreiter,Boden,Steil,Kremer…]
• insertion of prior knowledge – finite automata and beyond (e.g.
context free/sensitive, specific dynamical patterns/attractors)
[Omlin,Croog,…]
Long term behavior:
• enforce appropriate constraints while training
• investigate the dynamics of RNNs – rule extraction, investigation
of attractors, relating dynamics and symbolic processing
[Omlin,Pasemann,Haschke,Rodriguez,Tino,…]
Slide 37
To do: challenges
Some further issues:
• processing spatial data x1 x2 … x10
[bicausal networks, Pollastri et al.; contextual RCC, Micheli et al.]
• unsupervised processing of sequences x1, x2, x3, x4, …
[TKM, Chappell/Taylor; RecSOM, Voegtlin; SOMSD, Sperduti et al.;
MSOM, Hammer et al.; general formulation, Hammer et al.]
Slide 38
Conclusions: recurrent networks
• the capacity of RNNs is well understood and promising, e.g. for
natural language processing, control, …
• recurrence of symbolic systems has a natural counterpart in the
recurrence of RNNs
• training and generalization face problems which could be solved
by hybrid systems
• discrete dynamics with explicit memory versus real-valued
iterated function systems
• sequences are nice, but not enough
Slide 39
The general idea: recursive distributed representations
How to turn tree structures/acyclic graphs into a
connectionist representation (a vector)?
Slide 41
The general idea: recursive distributed representations
Recursion! A network f : ℝ^i × ℝ^c × ℝ^c → ℝ^c (input label plus two
contexts) yields the encoding f_enc, where
f_enc(ξ) = 0
f_enc(a(l,r)) = f(a, f_enc(l), f_enc(r))
Slide 42
The general idea: recursive distributed representations

A network f : ℝ^(n+2c) → ℝ^c processes a label together with a left and a
right context; together with g : ℝ^c → ℝ^o and h : ℝ^o → ℝ^(n+2o) this gives

encoding:
f_enc : (ℝ^n)^(2*) → ℝ^c
f_enc(ξ) = 0
f_enc(a(l,r)) = f(a, f_enc(l), f_enc(r))

decoding:
h_dec : ℝ^o → (ℝ^n)^(2*)
h_dec(0) = ξ
h_dec(x) = h_0(x) ( h_dec(h_1(x)), h_dec(h_2(x)) )
Slide 43
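A minimal sketch of the encoding/decoding recursion above for binary trees with real-valued labels, collapsing o = c for brevity; f and h are untrained random maps here, so h_dec is not actually the inverse of f_enc — the sketch only shows the shape of the recursion.

import numpy as np

n, c = 2, 4                                   # label dimension, context (code) dimension
rng = np.random.default_rng(0)
W_f = rng.normal(size=(c, n + 2 * c))         # f : R^(n+2c) -> R^c
W_h = rng.normal(size=(n + 2 * c, c))         # h : R^c -> R^(n+2c)   (o collapsed to c)

def f_enc(tree):
    if tree is None:                          # empty tree xi -> 0
        return np.zeros(c)
    label, left, right = tree
    return np.tanh(W_f @ np.concatenate([label, f_enc(left), f_enc(right)]))

def h_dec(code, depth=3):
    # crude stand-in for recognizing "the code of xi": small norm or depth bound
    if depth == 0 or np.linalg.norm(code) < 1e-3:
        return None
    out = W_h @ code
    label, l_code, r_code = out[:n], out[n:n + c], out[n + c:]
    return (label, h_dec(l_code, depth - 1), h_dec(r_code, depth - 1))

t = (np.array([1.0, 0.0]), (np.array([0.0, 1.0]), None, None), None)
print(f_enc(t))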
The general idea: recursive distributed representations
• recursive distributed description [Hinton, 90]
– general idea without a concrete implementation
• tensor construction [Smolensky, 90]
– encoding/decoding given by (a,b,c) ↦ a⊗b⊗c (label, left context, right context)
– increasing dimensionality
• holographic reduced representations [Plate, 95]
– circular correlation/convolution
– fixed encoding/decoding with fixed dimensionality (but potential loss of
information)
– necessity of chunking or clean-up for decoding
• binary spatter codes [Kanerva, 96]
– binary operations, fixed dimensionality, potential loss
– necessity of chunking or clean-up for decoding
• RAAM [Pollack, 90], LRAAM [Sperduti, 94]
– trainable networks, trained for the identity, fixed dimensionality
– encoding optimized for the given training set
Slide 44
The general idea: recursive distributed representations
Nevertheless: the results are not promising.
Theorem [Hammer]:
• There exists a fixed-size neural network which can uniquely encode
tree structures of arbitrary depth with discrete labels.
• For every code, decoding of all trees up to height T requires Ω(2^T)
neurons for sigmoidal networks.
⇒ encoding seems possible, but no fixed-size architecture exists for
decoding
Slide 45
One breakthrough: recursive networks
Recursive networks [Goller/Küchler, 96]:
• do not use decoding
• combine encoding and mapping: the encoding part feeds a
transformation part that produces the output y
• train this combination directly for the given task with
backpropagation through structure
⇒ an efficient, data- and problem-adapted encoding is learned
(see the sketch below)
Slide 47
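A minimal sketch of a recursive network in the spirit of this slide: the recursive encoding f_enc and an output map g form one network y = g(f_enc(tree)). Only the forward pass is shown; in practice both weight matrices are trained jointly by backpropagation through structure, and the dimensions here are my own illustrative choices.

import numpy as np

n, c, o = 2, 4, 1
rng = np.random.default_rng(0)
W_f = rng.normal(size=(c, n + 2 * c))         # encoding part f
W_g = rng.normal(size=(o, c))                 # task-specific output part g

def f_enc(tree):
    if tree is None:
        return np.zeros(c)
    label, left, right = tree
    return np.tanh(W_f @ np.concatenate([label, f_enc(left), f_enc(right)]))

def rec_net(tree):
    # e.g. a classification output y = g(f_enc(tree))
    return 1.0 / (1.0 + np.exp(-(W_g @ f_enc(tree))))

t = (np.array([1.0, 0.0]), (np.array([0.0, 1.0]), None, None), None)
print(rec_net(t))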
One breakthrough: recursive networks
Applications:
• term classification [Goller, Küchler, 1996]
• automated theorem proving [Goller, 1997]
• learning tree automata [Küchler, 1998]
• QSAR/QSPR problems [Schmitt, Goller, 1998; Bianucci, Micheli,
Sperduti, Starita, 2000; Vullo, Frasconi, 2003]
• logo recognition, image processing [Costa, Frasconi, Soda, 1999;
Bianchini et al., 2005]
• natural language parsing [Costa, Frasconi, Sturt, Lombardo, Soda,
2000, 2005]
• document classification [Diligenti, Frasconi, Gori, 2001]
• fingerprint classification [Yao, Marcialis, Roli, Frasconi, Pontil, 2001]
• prediction of contact maps [Baldi, Frasconi, Pollastri, Vullo, 2002]
• protein secondary structure prediction [Frasconi et al., 2005]
• …
Slide 48
One breakthrough: recursive networks
Desired: approximation completeness – for every (reasonable) function f
and every ε > 0 there exists a RecNN which approximates f up to ε
(with an appropriate distance measure).
Approximation properties can be measured in several ways:
given f, ε, a probability P, and data points x_i, find f_w such that
• P(x : |f(x) − f_w(x)| > ε) is small (L1 norm), or
• |f(x) − f_w(x)| < ε for all x (max norm), or
• f(x_i) = f_w(x_i) for all x_i (interpolation of points)
Slide 49
One breakthrough: recursive networks
Approximation properties of RecNNs on tree-structured data:
(+) capable of approximating every continuous function in the max norm
for restricted height, and every measurable function in the L1 norm
(σ: squashing) [Hammer]
(+) capable of interpolating every set {f(x_1),…,f(x_m)} with O(m²)
neurons (σ: squashing, C² in an environment of some t with σ''(t) ≠ 0) [Hammer]
(+) can approximate every tree automaton for arbitrarily large
inputs [Küchler]
(−) cannot approximate every f : {1}^(2*) → {0,1} (for realistic σ) [Hammer]
… fairly good results – 3:1
Slide 50
Going on: towards more complex structures
More general trees: an arbitrary number of unordered (non-positional) children;
f_enc = f( 1/|ch| · Σ_ch w_label(edge) · f_enc(ch), label )
approximation complete for appropriate edge labels
[Bianchini et al., 2005] (see the sketch below)
Slide 52
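A minimal sketch of the encoding above for trees with an arbitrary number of unordered children, assuming one weight matrix per edge label and untrained random weights; the exact parameterization in [Bianchini et al., 2005] may differ.

import numpy as np

c, n = 4, 3
rng = np.random.default_rng(0)
W_edge = {"e1": rng.normal(size=(c, c)),      # one matrix per edge label (assumption)
          "e2": rng.normal(size=(c, c))}
W_lab = rng.normal(size=(c, n))

def f_enc(node):
    label, children = node                    # children: list of (edge_label, child_node)
    if not children:
        agg = np.zeros(c)
    else:
        # average the children's codes, weighted by the edge-label matrix
        agg = sum(W_edge[e] @ f_enc(ch) for e, ch in children) / len(children)
    return np.tanh(W_lab @ label + agg)

leaf = (np.array([1.0, 0.0, 0.0]), [])
root = (np.array([0.0, 1.0, 0.0]), [("e1", leaf), ("e2", leaf)])
print(f_enc(root))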
Going on: towards more complex structures
Planar graphs: … [Baldi, Frasconi, …, 2002]
Slide 53
Going on: towards more complex structures
Acyclic graphs: contextual cascade correlation [Micheli, Sperduti, 03]
(figure: units with q⁻¹ and q⁺¹ context connections)
Approximation complete (under a mild structural restriction),
even for structural transduction [Hammer, Micheli, Sperduti, 05]
Slide 54
Going on: towards more complex structures
Cyclic graphs: process a vertex based on its neighbors [Micheli, 05]
Slide 55
Conclusions: recursive networks
• Very promising neural architectures for the direct processing of
tree structures
• Successful applications and mathematical background
• Connections to symbolic mechanisms (tree automata)
• Extensions to more complex structures (graphs) are under
development
• Only a few approaches achieve structured outputs so far
Slide 56
Conclusions (Part I)
Overview literature:
• FNNs and rules: Duch, Setiono, Zurada, Computational intelligence methods
for understanding of data, Proc. of the IEEE 92(5):771-805, 2004
• Structure kernels: Gärtner, Lloyd, Flach, Kernels and distances for structured
data, Machine Learning, 57, 2004 (a new overview is forthcoming)
• RNNs: Hammer, Steil, Perspectives on learning with recurrent networks, in:
Verleysen, ESANN'2002, D-side publications, 357-368, 2002
• RNNs and rules: Jacobsson, Rule extraction from recurrent neural networks:
a taxonomy and review, Neural Computation, 17:1223-1263, 2005
• Recursive representations: Hammer, Perspectives on Learning Symbolic Data
with Connectionistic Systems, in: Kühn, Menzel, Menzel, Ratsch, Richter,
Stamatescu, Adaptivity and Learning, 141-160, Springer, 2003
• Recursive networks: Frasconi, Gori, Sperduti, A General Framework for
Adaptive Processing of Data Structures, IEEE Transactions on Neural Networks,
9(5):768-786, 1998
• Neural networks and structures: Hammer, Jain, Neural methods for
non-standard data, in: Verleysen, ESANN'2004, D-side publications, 281-292, 2004
Slide 58
Conclusions (Part I)
• There exist networks which can directly deal with
structures (sequences, trees, graphs) with good
success: kernel machines, recurrent and recursive
networks
• Efficient training algorithms and theoretical
foundations exist
• (Loose) connections to symbolic processing have
been established and indicate benefits
Now: towards strong connections
⇒ PART II: Logic and neural networks
Slide 59