Neural Networks 2 - Monash University

COT5230 Data Mining
Week 6
Neural Networks 2
Lecture Outline
Self-Organizing Maps (SOMs)
 Motivation
 Biological background
 Algorithm
 Data mining example
Motivation - 1
 The feed-forward back-propagation NNs
discussed last week are an example of a
supervised learning technique
 In supervised learning, the aim is to discover a
relationship between the inputs and outputs of a
system
 This relationship can be used for tasks such as
prediction, estimation or classification
 A known training set of input/output pairs is used
to train the network
Motivation - Unsupervised Learning
 Many data mining tasks are not suited to this
approach
 Often the data mining task is to discover
structure in the data set, without any prior
knowledge of what is there
 This is an example of unsupervised learning
(we have already seen the example of the
K-means clustering algorithm)
 A class of neural networks called
Self-Organizing Maps (SOMs) can be used for
this task
The Cortex - 1
 Research on SOMs was inspired by the observation
of topologically correct sensory maps in the
cortex (e.g. the retinotopic, somatotopic and
tonotopic maps)
 In humans, the cortex consists of a layer of nerve
tissue about 0.2 m² in area and 2-3 mm in
thickness
 It is highly convoluted to save space, and forms
the exterior of the brain - it’s the folded, wrinkled
stuff we see when we look at a brain
The Cortex - 2
Lateral (schematic) view of the human left-brain hemisphere.
Various cortical areas devoted to specialized tasks can be
distinguished
Sensory Surfaces
 Most signals that the brain receives from the
environment come from “sensory surfaces”
covered with receptors:
– skin (touch and temperature)
– retina (vision)
– cochlea [in the ear] (1-D sound sensor)
 It is usually found that the “wiring” of the nervous
system exhibits topographic ordering:
– signals from adjacent receptors tend to be conducted
to adjacent neurons in the cortex
Topographic Feature Maps - 1
 This neighbourhood-preserving organization of
the cortex is called a topographic feature map
– For touch, maps of the body are found in the
somatosensory cortex
– In the primary visual cortex, neighbouring neurons tend
to respond to stimulation of neighbouring regions of the
retina
 As well as these simple maps, the brain also
constructs topographic maps of abstract
features:
– In the auditory cortex of many higher brains, a
tonotopic map is found, where the pitch of received
sounds is mapped regularly
Topographic Feature Maps - 2
Map of part of the body surface in
the somatosensory cortex of a
monkey
Direction map for sound signals
in the so-called “optical tectum”
of an owl
Biological Self-Organizing Maps - 1
 The subject of SOMs arose from the question of
how such topology-preserving mappings might
arise in neural networks
 It is probable that in biological systems much
of the organization of such maps is genetically
determined, BUT:
 The brain is estimated to have ~10^13 synapses
(connections), so it would be impossible to
produce this organization by specifying each
connection in detail
Biological Self-Organizing Maps - 2
 A more likely scenario is that there are genetically
specified mechanisms of structure formation
that result in the creation of the desired
connectivity
 These could operate before birth, or as part of
later maturation, involving interaction with the
environment
 There is much evidence for such changes:
– the normal development of edge-detectors in the visual
cortex of newborn kittens is suppressed in the absence
of sufficient visual experience
– the somatosensory maps of adult monkeys have been
observed to adapt following the amputation of a finger
Biological Self-Organizing Maps - 3
Readaptation of the somatosensory map of the hand region of
an adult nocturnal ape due to the amputation of one finger.
Several weeks after the amputation of the middle finger (3), the
assigned region has disappeared and the adjacent regions
have spread out.
Artificial Self-Organizing Maps - 1
 In the NN models we have seen so far, every
neuron in a layer is connected to every neuron in
the next layer of the network
 The location of a neuron in a layer plays no role
in determining its connectivity or weights
 With SOMs, the ordering of neurons within a layer
plays an important role:
How should the neurons organize their connectivity to
optimize the spatial distribution of their responses
within the layer?
Artificial Self-Organizing Maps - 2
 The purpose of this optimization is to achieve the
mapping:
similarity of features → proximity of excited neurons
 Such a mapping allows neurons with similar tasks
to communicate over especially short connection
paths - important for a massively parallel system
 Moreover, it results in the formation of
topographic feature maps:
– the most important similarity relationships among the
input signals are converted into spatial relationships
between responding neurons
Kohonen’s Self-Organizing Network - 1
 (1982) Teuvo Kohonen, “Self-organized formation
of topologically correct feature maps”, Biological
Cybernetics, 43:59-69
 Kohonen studied a system consisting of a
two-dimensional layer of neurons, with the properties:
– each neuron identified by its position vector r (i.e. its
coordinates)
– input signals to the layer represented by a feature
vector x (usually normalized)
– output of each neuron is a sigmoidal function of its total
activation (as for MLPs last week):
  $y_r = f(\mathrm{net}_r) = \frac{1}{1 + e^{-\mathrm{net}_r}}$
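As a minimal illustration (a Python sketch, not from the original
slides), this output function is:

    import numpy as np

    def sigmoid(net):
        # y_r = f(net_r) = 1 / (1 + exp(-net_r))
        return 1.0 / (1.0 + np.exp(-net))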
Kohonen’s Self-Organizing Network - 2
– Each neuron r forms the weighted sum of the input
signals. The external activation is:
  $\mathrm{net}_r^{\mathrm{external}} = \sum_{j=1}^{n} w_{rj}\, x_j$
(the magnitudes of the weight vectors are usually
normalized)
– In addition to the input connections, the neurons are
connected to each other
» the layer has internal feedback
– The weight from neuron r' to neuron r is labelled $g_{rr'}$
– These lateral inputs are superimposed on the external
input signal:
  $\mathrm{net}_r = \sum_{j} w_{rj}\, x_j + \sum_{r'} g_{rr'}\, y_{r'}$
Kohonen’s Self-Organizing Network - 3
– The output of neuron r is thus given by:
  $y_r = f\left(\sum_{j} w_{rj}\, x_j + \sum_{r'} g_{rr'}\, y_{r'}\right)$
– The neuron activities are the solutions of this system of
non-linear equations
The feedback due to the lateral connections $g_{rr'}$ is usually
arranged so that it is excitatory at small distances and inhibitory
at large distances. This is often called a “Mexican Hat” response
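The following Python sketch (an illustration, not Kohonen's own code)
models the “Mexican Hat” as a difference of Gaussians and approximates
the non-linear system by fixed-point iteration; the amplitudes and
widths (a, b, s1, s2) are assumed, illustrative values:

    import numpy as np

    def mexican_hat(d, a=2.0, b=1.0, s1=1.0, s2=3.0):
        # Difference of Gaussians: excitatory (positive) at small
        # distances d, inhibitory (negative) at larger distances
        return (a * np.exp(-d**2 / (2 * s1**2))
                - b * np.exp(-d**2 / (2 * s2**2)))

    def activities(W, G, x, steps=50):
        # Approximate the solution of y = f(Wx + Gy) by fixed-point
        # iteration; this is the tedious computation that Kohonen's
        # simplification avoids. Convergence is not guaranteed for
        # an arbitrary lateral weight matrix G.
        y = np.zeros(G.shape[0])
        net_ext = W @ x  # external activation
        for _ in range(steps):
            y = 1.0 / (1.0 + np.exp(-(net_ext + G @ y)))
        return y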
Kohonen’s Self-Organizing Network - 4
Kohonen’s model showing excitation zone around “winning” neuron
 The solution of such systems of non-linear
equations is tedious and time-consuming.
Kohonen avoided this by introducing a
simplification
Kohonen’s Self-Organizing Network - 5
 The response of the network is assumed to
always be the same “shape”:
– the response is 1 at the location of the neuron r*
receiving maximal external excitation, and decreases
to 0 as one moves away from r*
 The excitation of neuron r is thus only a function
of its distance from r*:
  $y_r = h(r - r^*) \equiv h_{rr^*}$
 The model then proposes a rule for changing the
weights to each neuron so that a topologically
ordered map is formed. Weight change is:
  $\Delta w_{rj} = \epsilon\left(h_{rr^*}\, x_j - h_{rr^*}\, w_{rj}\right) = \epsilon\, h_{rr^*}\left(x_j - w_{rj}\right)$
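A sketch of this update rule in Python (hypothetical names: weights
holds one row per neuron, h the neighbourhood values h_rr* for all
neurons, eps the step size):

    import numpy as np

    def adapt(weights, x, h, eps):
        # Delta w_rj = eps * h_rr* * (x_j - w_rj) for every neuron r
        return weights + eps * h[:, None] * (x[None, :] - weights)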
Kohonen’s Self-Organizing Network - 6
 Experiments have shown that the precise shape
of the response is not critical
 A suitable function is thus simply chosen. A
Gaussian is a common choice:
  $h_{rr^*} = e^{-(r - r^*)^2 / (2\sigma^2)}$
 The parameter $\sigma$ determines the length scale on
which input stimuli cause changes in the map
– usually learn coarse structure first and then the fine
structure. This is done by letting $\sigma$ decrease over time
– $\epsilon$ on the previous slide, which specifies the size of
each change, usually also decreases over time
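In Python, a Gaussian neighbourhood with shrinking $\sigma$ and
$\epsilon$ might look like this (exponential decay is one common
schedule, assumed here rather than prescribed by the slides):

    import numpy as np

    def gaussian_h(positions, r_star, sigma):
        # h_rr* = exp(-|r - r*|^2 / (2 sigma^2)) over grid coordinates
        d2 = np.sum((positions - positions[r_star])**2, axis=1)
        return np.exp(-d2 / (2 * sigma**2))

    def decayed(v0, v_end, t, t_max):
        # Shrink a parameter (sigma or eps) from v0 down to v_end
        return v0 * (v_end / v0) ** (t / t_max)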
Algorithm
0. Initialization: start with appropriate initial values for the
weights $w_{rj}$ (usually just random values)
1. Choice of stimulus: Choose an input vector x at
random from the data set
2. Response: Determine the “winning” neuron r* most
strongly activated by x
3. Adaptation: Carry out a “learning step” by modifying
the weights:
  $w_r^{\mathrm{new}} = w_r^{\mathrm{old}} + \epsilon\, h_{rr^*}\left(x - w_r^{\mathrm{old}}\right)$
(Normalize weights if required)
4. Continue with step 1 until the specified number of
learning steps has been completed
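Putting the four steps together, a minimal Python implementation might
look as follows. It finds the winner r* by the smallest Euclidean
distance between x and the weight vectors (equivalent to maximal
activation when weights and inputs are normalized); the grid size,
step count and decay schedules are assumed, illustrative choices:

    import numpy as np

    def train_som(data, rows=10, cols=10, steps=5000, eps0=0.5,
                  eps_end=0.01, sigma0=5.0, sigma_end=0.5, seed=0):
        rng = np.random.default_rng(seed)
        # Grid coordinate r of each neuron
        positions = np.array([(i, j) for i in range(rows)
                              for j in range(cols)], dtype=float)
        # 0. Initialization: random weights
        w = rng.random((rows * cols, data.shape[1]))
        for t in range(steps):
            frac = t / steps
            eps = eps0 * (eps_end / eps0) ** frac          # step size
            sigma = sigma0 * (sigma_end / sigma0) ** frac  # neighbourhood
            # 1. Choice of stimulus: random input vector from the data
            x = data[rng.integers(len(data))]
            # 2. Response: winning neuron r*
            r_star = np.argmin(np.sum((w - x) ** 2, axis=1))
            # 3. Adaptation: w_new = w_old + eps * h_rr* (x - w_old)
            d2 = np.sum((positions - positions[r_star]) ** 2, axis=1)
            h = np.exp(-d2 / (2 * sigma ** 2))
            w += eps * h[:, None] * (x - w)
        # 4. Stop after the specified number of learning steps
        return w, positions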
Examples - 1
SOM that has learnt data
uniformly distributed on a square
SOM that has learnt data on a
rotated square, where points are
twice as likely to occur in a circle
at the centre of the square
(relationship to clustering)
Examples - 2
2-dimensional SOM that has learnt data uniformly distributed
in a 3-dimensional cube
Examples - 3
1-dimensional SOM that has learnt data uniformly distributed
in a 2-dimensional circle
Examples - 4
2-dimensional SOM that has learnt 2-dimensional data
containing 3 clusters
The SOM for Data Mining
 The SOM is a good method for obtaining an initial
understanding of a set of data about which the
analyst does not yet have any hypotheses (e.g. no
need to estimate the number of clusters)
 The map can be used as an initial unbiased
starting point for further analysis. Once the
clusters are selected from the map, they are
analyzed to find out the reasons for such clustering
– It may be possible to determine which attributes were
responsible for the clusters
– It may also be possible to identify some attributes
which do not contribute to the clustering
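As a hypothetical illustration of such follow-up analysis, the trained
weights can be examined one attribute at a time (so-called component
planes): an attribute whose values vary strongly across the map is
likely to contribute to the clustering. This sketch reuses the
train_som function given earlier and invents a small synthetic data set:

    import numpy as np

    rng = np.random.default_rng(1)
    # Synthetic 2-D data with 3 clusters (cf. Examples - 4)
    data = np.vstack([rng.normal(loc=c, scale=0.1, size=(100, 2))
                      for c in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]])
    w, positions = train_som(data)
    for j in range(w.shape[1]):
        plane = w[:, j].reshape(10, 10)  # one component plane per attribute
        print(f"attribute {j}: spread across map = "
              f"{plane.max() - plane.min():.2f}")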
Example - Text Mining with a SOM - 1
 This example comes from the WEBSOM project
in Finland: http://websom.hut.fi/websom/
 WEBSOM is a method for organizing
miscellaneous text documents onto meaningful
maps for exploration and search. WEBSOM
automatically organizes the documents onto a
two-dimensional grid so that related documents
appear close to each other
Example - Text Mining with a SOM - 2
 This map was constructed using more than one
million documents from 83 USENET newsgroups:
 Color denotes the density or the
clustering tendency of the
documents
 Light (yellow) areas are clusters
and dark (red) areas empty space
between the clusters
 This is a little difficult to read, but
WEBSOM allows one to zoom in
Example - Text Mining with a SOM - 3
 Zoomed view of the WEBSOM map:
blues - rec.music.bluenote
books - rec.arts.books
classical - rec.music.classical
humor - rec.humor
lang.dylan - comp.lang.dylan
music - music
shostakovich - alt.fan.shostakovich