How to Make a Computer Think for You
Jeff Knisley, The Institute for Quantitative Biology, East Tennessee State University
Alabama MAA State Dinner and Lecture, February 2006
This is a Neuron
Signals propagate from dendrites to soma.
[Figure: neuron with dendrites, soma, synapses, and axon labeled]
Signals decay at the soma if below a certain threshold.
Signals May Arrive Close Together
If the threshold is exceeded, then the neuron “fires,” sending a signal along its axon.
Neurons Form Networks
Artificial Neural Network (ANN)
 Made of artificial neurons, each of which
   Sums inputs from other neurons
   Compares the sum to a threshold
   Sends a signal to other neurons if above the threshold
 Synapses have weights
   Model relative ion concentrations
   Model the efficacy (strength) of the synapse
Artificial Neuron
$w_{ij}$ = synaptic weight between the $i$-th and $j$-th neurons
$\theta_j$ = threshold of the $j$-th neuron
$\sigma$ = "firing" function that maps state to output
[Figure: inputs $x_1, \dots, x_n$ with weights $w_{i1}, \dots, w_{in}$ feeding the $i$-th neuron and its nonlinear firing function]
$s_i = \sum_j w_{ij} x_j \qquad x_i = \sigma(s_i - \theta_i)$
Firing Functions are Sigmoidal
$\sigma_j(s_j) = \dfrac{1}{1 + e^{-\lambda_j (s_j - \theta_j)}}$
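A minimal sketch of such a neuron in Python; the gain λ = 1 and the example weights, inputs, and threshold below are illustrative assumptions, not values from the talk:

```python
import math

def sigmoid(s, lam=1.0):
    """Sigmoidal firing function: sigma(s) = 1 / (1 + exp(-lam * s))."""
    return 1.0 / (1.0 + math.exp(-lam * s))

def fire(weights, inputs, theta):
    """State s_i = sum_j w_ij x_j; output x_i = sigma(s_i - theta_i)."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(s - theta)

# Example: output near 1 means the neuron "fires".
print(fire(weights=[0.5, -0.2, 0.8], inputs=[1.0, 0.0, 1.0], theta=0.4))
```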
Hopfield Network
[Figure: grid of units; blue = 1, white = 0]
Imagine complete connectivity, with weight $w_{ij}$ between the $i$-th and $j$-th neurons.
Choose the $i$-th neuron at random and calculate its new state:
$x_i^{\mathrm{new}} = \sigma\!\left( \sum_{j \neq i} w_{ij} x_j^{\mathrm{old}} - \theta_i \right)$
Energy
Define the energy to be
$E = -\frac{1}{2} \sum_{j=1}^{n} \sum_{l=1}^{n} w_{jl} x_j x_l + \sum_{j=1}^{n} \theta_j x_j$
Theorem: If the weights are symmetric, then the Energy
decreases each time a neuron fires.
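A sketch of this dynamic in Python, using a hard 0/1 threshold as the limiting case of a steep sigmoid; the random symmetric weights and zero thresholds are illustrative assumptions:

```python
import random

def step(s):
    """Hard-threshold firing: 1 if s > 0, else 0 (steep-sigmoid limit)."""
    return 1 if s > 0 else 0

def update(x, w, theta, i):
    """x_i_new = step( sum_{j != i} w_ij * x_j - theta_i )."""
    return step(sum(w[i][j] * x[j] for j in range(len(x)) if j != i) - theta[i])

def energy(x, w, theta):
    """E = -1/2 sum_{j,l} w_jl x_j x_l + sum_j theta_j x_j."""
    n = len(x)
    quad = sum(w[j][l] * x[j] * x[l] for j in range(n) for l in range(n))
    return -0.5 * quad + sum(theta[j] * x[j] for j in range(n))

n = 6
a = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
w = [[0.0 if j == l else (a[j][l] + a[l][j]) / 2 for l in range(n)]
     for j in range(n)]                       # symmetric weights, zero diagonal
theta, x = [0.0] * n, [random.randint(0, 1) for _ in range(n)]
for _ in range(20):                           # pick neurons at random and update
    i = random.randrange(n)
    x[i] = update(x, w, theta, i)
    print(round(energy(x, w, theta), 3))      # never increases, per the theorem
```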
Applications:
 Handwriting Recognition
   http://faculty.etsu.edu/knisleyj/neural/neuralnet3.htm
 Universal Classifier
   http://faculty.etsu.edu/knisleyj/neural/neuralnet4.htm
 Expert Systems
   Rule-based rather than sequential programs
   Air traffic control, industrial controls
 Robotics
   Usually using a 3-layer network
3 Layer Neural Network
[Figure: three-layer network with input, hidden, and output layers]
The output layer may consist of a single neuron; the hidden layer is usually much larger.
Neural Nets can “Think”
 A Neural Network can “think” for itself
   Can be trained (programmed) to make decisions
   Can be trained to classify information
“This tiny 3-Dimensional Artificial Neural Network, modeled after neural networks in the human brain, is helping machines better visualize their surroundings.”
http://www.nasa.gov/vision/universe/roboticexplorers/robots_like_people.html
The Mars Rovers
 Must choose where to explore
   Programmed to avoid “rough” terrain
   Programmed to choose “smooth” terrain
 ANN decides between “rough” and “smooth”
   “rough” and “smooth” are ambiguous
   Programming is by means of many “examples” (lessons)
Illustration: Colors = Terrains
 As a robot moves, it defines 8 squares that define the directions it can move in
[Figure: robot surrounded by eight numbered terrain blocks]
   It should avoid red (rough) terrain
   It should prefer green (smooth) terrain
   It should be indifferent to blue (normal) terrain
 It is impossible to program every possible shade and variation
   Instead, a neural network is constructed
   Color-class input/output patterns are used to train the network
Train ANN to Classify Colors
“Houston, we have a problem!”
[Figure: three-layer network mapping color-class inputs to a terrain-class output]
Training Set: each input <R,G,B> (e.g. <1,0,0> red, <0,1,0> green, <0,0,1> blue, <0,0,0>) is paired with a target terrain output such as 0.9 or 0.1.
ANN’s Can Also Think for Us
 Mars Rovers do what Humans can do better
   They do not “learn” on their own
   They are taught by Humans who could make the same or better decisions in their place
   They can use their “learning” independently
 What about problems Humans can’t solve at all?
   Neural Networks can be used as savants dedicated to a single problem too complex for humans to decipher
 Examples
   Agent-Based Modeling
   RNA, Protein Structures, DNA analysis
   Data Mining
Division of Labor in Wasp Nests
 A wasp can alternate between laborer, water forager, and pulp forager
   Hypothesis: Individual wasps choose roles so that total water in the nest reaches equilibrium
   Hypothesis: Pulp production is maximized when total water in the nest is stable
 Limited confirmation of the hypotheses
   (Karsai and Wenzel, 2002): system of 3 ODEs
   (Karsai, *Phillips, and Knisley, 2005): computer simulation
ANN Wasp Societies
 Problems with current models
   Role assigned randomly w.r.t. a Weibull distribution
   Parameters selected so the model “works”
 ANN: Role is a decision of each wasp
   A wasp is a special type of artificial neuron
    Wasp decides its own role
    Wasp learns to make good choices
    System is deterministic yet unpredictable
[Figure: wasps deciding their roles: “I’m empty, I need water!” “No luck, so I’m a water forager.” “I’m so good at getting water, I think I should go forage for pulp.” “I have water!”]
Division of Labor by choice?
 The number of foragers goes up and down
 But the water level in the nest becomes stable
Data Mining
 We often consider very large data sets
   Microarrays contain about 20,000 data points
   Typical studies use 70-100 microarrays
   Most of the data is not relevant
 ANN’s can be trained to find hidden patterns
   Input layer = genes
   Network is trained repeatedly with microarrays collected in various physiological states
   ANN’s predict which genes are responsible for a given state
ANN’s in Data Mining
 Each neuron acts as a “linear classifier”
 Competition among neurons via the nonlinear firing function = “local linear classifying”
 Method for genes:
   Train the network until it can classify between control and experimental groups
   Eliminating weights sufficiently close to 0 does not change the local classification scheme
 First results obtained with a perceptron ANN
Simple Perceptron Model
[Figure: perceptron with inputs $x_1$ = Gene 1, ..., $x_n$ = Gene n, weights $w_1, \dots, w_n$, and output $y$]
The output is the “physiological state” due to the relative gene expression levels used as inputs.
Simple Perceptron Model
 Features
   The $w_i$ measure gene “significance”
   Detects genes across n samples and references
   Ref: Artificial Neural Networks for Reducing the Dimensionality of Gene Expression Data, A. Narayanan, et al., 2004
 Drawbacks
   The perceptron is a globally linear classifier (i.e., it only classifies linearly separable data)
   We are now using a more sophisticated model
Linearly Separable Data
[Figure: separation using hyperplanes, alongside data that cannot be separated linearly]
How do we select the w’s?
 Define an energy function:
$E(x_1, \dots, x_n) = \sum_{j=1}^{m} (y - t_j)^2$
 $x_1, \dots, x_n$ are the inputs and $y$ is the output
 $t_1, \dots, t_m$ are the outputs associated with the patterns to be “learned”
 $y = \sigma\!\left( \sum_i w_i x_i - \theta \right)$ for a perceptron
 Key: Neural networks minimize energy
Back Propagation
 Minimize the energy function
 Choose the $w_i$ so that $\dfrac{\partial E}{\partial w_k} = 0$
   In practice, this is hard
 Back Propagation (with $\lambda = 1$):
   For each pattern $t_i$:
    Feed forward and calculate E
    Increment the weights using a delta rule:
    $w_k^{\mathrm{new}} = w_k - (y - t_i)\, y (1 - y)\, x_k$
   Repeat until E is sufficiently close to 0
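A sketch of this loop in Python for a single sigmoidal perceptron; the explicit learning rate eta and the toy AND-style training patterns are illustrative assumptions:

```python
import math, random

def output(w, theta, x):
    """y = sigma( sum_i w_i x_i - theta ), with lambda = 1."""
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) - theta)))

# Toy patterns: each input vector paired with a target output t_i.
patterns = [([0, 0], 0.1), ([0, 1], 0.1), ([1, 0], 0.1), ([1, 1], 0.9)]
w = [random.uniform(-0.5, 0.5) for _ in range(2)]
theta, eta = 0.0, 0.5                     # eta is an assumed learning rate

for epoch in range(10000):
    E = 0.0
    for x, t in patterns:
        y = output(w, theta, x)           # feed forward
        E += (y - t) ** 2                 # calculate the energy
        delta = (y - t) * y * (1 - y)     # since dE/dw_k ~ (y - t) y (1 - y) x_k
        w = [wk - eta * delta * xk for wk, xk in zip(w, x)]   # the delta rule
        theta += eta * delta              # threshold gradient has opposite sign
    if E < 1e-3:                          # stop when E is close to 0
        break
print("weights:", w, "theta:", theta, "energy:", E)
```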
Perceptron for Microarray Data Mining
1. Remove a percentage of the genes whose synaptic weights are close to 0
2. Create an ANN classifier on the reduced arrays
3. Repeat 1 and 2 until only the genes that most influence the classifier problem remain
The remaining genes are the most important in classifying experimentals versus controls
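A sketch of this prune-and-retrain loop; the 20% cut per round, the stopping size, and the `train` callback (standing in for fitting the perceptron above) are illustrative assumptions:

```python
def prune_genes(train, samples, labels, genes, cut=0.2, min_genes=10):
    """train(samples, labels, genes) -> {gene: synaptic weight}.
    Repeatedly fit a classifier, then drop the fraction `cut` of genes
    whose weights are closest to 0, until few genes remain."""
    while len(genes) > min_genes:
        weights = train(samples, labels, genes)              # step 2
        ranked = sorted(genes, key=lambda g: abs(weights[g]))
        genes = ranked[int(cut * len(ranked)):]              # step 1: prune cut%
    return genes   # the genes that most influence the classification
```

Dropping only the near-zero weights at each round is what keeps the local classification scheme unchanged from one fit to the next.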
Functional Viewpoint
 ANN is a mapping $f: \mathbf{R}^n \to \mathbf{R}$
   Can we train a perceptron so that $f(x_1,\dots,x_n) = 1$ if the x vector is from a control and $f(x_1,\dots,x_n) = 0$ if x is from an experimental?
   Answer: Yes if the data can be linearly separated, but no otherwise unless we use better ANN’s!
 General ANN’s also have problems
   Spurious states (sometimes ANN’s get the wrong answer)
   Hard margins: the training set must be perfect
Multilayer Network
[Figure: inputs $x_1, \dots, x_n$ feeding $N$ hidden neurons $\sigma(w_1^t x - \theta_1), \dots, \sigma(w_N^t x - \theta_N)$]
$\mathrm{out} = \sum_{j=1}^{N} \alpha_j\, \sigma(w_j^t x - \theta_j)$
How do we select the w’s?
 Define an energy function:
$E = \frac{1}{2} \sum_{i=1}^{n} (\mathrm{out}_i - t_i)^2$
 The t vectors are the information to be “learned”
 Neural networks minimize energy
 The “information” in the network is equivalent to the minima of the total squared energy function
Back Propagation
 Minimize the energy function
 Choose the $w_j$ and $\theta_j$ so that $\dfrac{\partial E}{\partial w_{ij}} = 0$ and $\dfrac{\partial E}{\partial \theta_j} = 0$
   In practice, this is hard
 Back Propagation with a continuous sigmoidal:
   Feed forward and calculate E
   Modify the weights using a δ rule, with $d_j = \sigma_j (1 - \sigma_j)$:
$\alpha_j^{\mathrm{new}} = \alpha_j - (\mathrm{out} - t)\,\sigma_j, \quad \theta_j^{\mathrm{new}} = \theta_j + (\mathrm{out} - t)\,\alpha_j d_j, \quad w_j^{\mathrm{new}} = w_j - (\mathrm{out} - t)\,\alpha_j d_j\, x$
   Repeat until E is sufficiently close to 0
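A sketch of back propagation for the one-hidden-layer network above; the learning rate, the network size N = 4, and the XOR training set (chosen because it is not linearly separable) are illustrative assumptions:

```python
import math, random

def train_mlp(patterns, N=4, eta=0.5, epochs=20000, tol=1e-3):
    """Train out = sum_j alpha_j * sigma(w_j . x - theta_j) by gradient
    descent on E = 1/2 sum_i (out_i - t_i)^2."""
    n = len(patterns[0][0])
    sigma = lambda s: 1.0 / (1.0 + math.exp(-s))
    w = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(N)]
    theta = [random.uniform(-1, 1) for _ in range(N)]
    alpha = [random.uniform(-1, 1) for _ in range(N)]
    for _ in range(epochs):
        E = 0.0
        for x, t in patterns:
            sig = [sigma(sum(wk * xk for wk, xk in zip(w[j], x)) - theta[j])
                   for j in range(N)]                    # hidden outputs sigma_j
            out = sum(a * s for a, s in zip(alpha, sig)) # feed forward
            err = out - t
            E += 0.5 * err ** 2                          # calculate E
            for j in range(N):
                d = sig[j] * (1 - sig[j])                # d_j = sigma_j (1 - sigma_j)
                theta[j] += eta * err * alpha[j] * d     # delta rule for theta_j
                w[j] = [wk - eta * err * alpha[j] * d * xk
                        for wk, xk in zip(w[j], x)]      # ... for w_j
                alpha[j] -= eta * err * sig[j]           # ... for alpha_j
        if E < tol:                                      # E sufficiently close to 0
            break
    return w, theta, alpha, E

# XOR cannot be separated linearly: a perceptron fails, but this network can learn it.
xor = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
print("final energy:", train_mlp(xor)[3])
```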
ANN as Classifier
 (Cybenko) For any ε > 0, the function f(x_1,…,x_n) = 1 if the x vector is from a control and f(x_1,…,x_n) = 0 if x is from an experimental can be approximated to within ε by a multilayer neural network.
 The weights no longer have a one-to-one correspondence to genes, so we test significance using Monte Carlo techniques.
ANN and Monte Carlo Methods
 Monte Carlo methods have been a big success story with ANN’s
   Error estimates with network predictions
   ANN’s are very fast in the forward direction
 Example: ANN + MC implement and outperform Kalman filters (recursive linear filters used in navigation and elsewhere) (De Freitas, J. F. G., et al., 2000)
Recall: Multilayer Network
$\mathrm{out} = \sum_{j=1}^{N} \alpha_j\, \sigma(w_j^t x - \theta_j)$
[Figure: N gene inputs feeding an N-node hidden layer]
The $\alpha_j$ correspond to genes, but do not directly depend on a single gene.
Naïve Monte Carlo ANN Method
1. Randomly choose a subset S of genes
2. Train using Back Propagation
3. Prune based on the values of the $w_j$ (or the $\alpha_j$, or both)
4. Repeat 2-3 until a small subset of S remains
5. Increase the “count” of the genes in the small subset
6. Repeat 1-5 until each gene has a 95% probability of appearing at least some minimum number of times in a subset
7. The most frequent genes are the prediction
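A sketch of this naïve loop; the subset size, a fixed number of rounds in place of the 95%-coverage stopping rule, and the `shrink` callback (standing in for steps 2-4) are illustrative assumptions:

```python
import random
from collections import Counter

def monte_carlo_genes(genes, shrink, subset_size=50, rounds=2000):
    """shrink(subset) -> the small subset surviving train-and-prune.
    Counts how often each gene survives across random subsets."""
    counts = Counter()
    for _ in range(rounds):                    # steps 1-6, many times over
        S = random.sample(genes, subset_size)  # step 1: random subset
        counts.update(shrink(S))               # steps 2-5: train, prune, count
    return counts.most_common()                # step 7: most frequent first

# Illustrative stand-in for train-and-prune: genes 0-4 always "matter".
demo = lambda S: sorted(S, key=lambda g: (g > 4, random.random()))[:5]
print(monte_carlo_genes(list(range(1000)), demo, rounds=200)[:5])
```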
Additional Considerations
 If a gene is up-regulated or down-regulated for a certain condition, then put it into a subset in step 1 with probability 1.
 This is a simple-minded Bayesian method; Bayesian analysis can make it much better.
 The algorithm distributes naturally across a multiprocessor cluster or machine, as sketched below:
   Choose the subsets first
   Distribute the subsets to different machines
   Tabulate the results from all the machines
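A sketch of that distribution using Python’s multiprocessing module to stand in for a cluster; the worker is the same illustrative stand-in for train-and-prune as above:

```python
import random
from collections import Counter
from multiprocessing import Pool

def worker(subset):
    """Train-and-prune one subset on one core/machine (illustrative
    stand-in: keep five 'survivors' from the subset)."""
    return sorted(subset, key=lambda g: (g > 4, random.random()))[:5]

if __name__ == "__main__":
    genes = list(range(1000))
    subsets = [random.sample(genes, 50) for _ in range(200)]  # choose subsets first
    with Pool() as pool:
        results = pool.map(worker, subsets)                   # distribute the work
    counts = Counter(g for kept in results for g in kept)     # tabulate the results
    print(counts.most_common(5))
```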
Summary
 ANN’s are designed to make decisions in a similar fashion to how we make decisions
   In this way, they can “think” for themselves
   They can be considered supplements to existing hardware and software tools
 The ability of ANN’s to make decisions allows them to think for us as well!
   They can find patterns in large data sets that we humans would likely never uncover
   They can “think” for days or months on end
Any Questions?
References
Cybenko, G. Approximation by Superpositions of a Sigmoidal Function. Mathematics of Control, Signals, and Systems, 2(4), 1989, pp. 303-314.
De Freitas, J. F. G., et al. Sequential Monte Carlo Methods to Train Neural Network Models. Neural Computation, 12(4), April 2000, pp. 955-993.
Glenn, L. and J. Knisley. Solutions for Transients in Arbitrarily Branching and Tapering Cables. In Modeling in the Neurosciences: From Biological Systems to Neuromimetic Robotics, ed. R. Lindsay, R. Poznanski, G. N. Reeke, J. R. Rosenberg, and O. Sporns. CRC Press, London, 2004.
Narayanan, A., et al. Artificial Neural Networks for Reducing the Dimensionality of Gene Expression Data. Neurocomputing, 2004.