Balancing an Inverted Pendulum
with a Multi-Layer Perceptron
ECE 539 Final Project
Spring 2000
Chad Seys
Outline
• The Inverted Pendulum
• The Problem
• Approach
• Position Representation
• Output Force Representation
• Initialization
• Convergence & Reinitialization
• Results
• Discussion
The Inverted Pendulum:
• Abstraction is a rigid rod
attached at its lower end
to a pivot point.
• Like balancing a broom
on the palm of the hand.
• Useful in modeling:
– Launching a rocket into
space
– look up another
The Problem:
• Train a multi-layer perceptron to...
– keep an inverted pendulum in its upright
position
– move an inverted pendulum from any position
to the upright position and keep it balanced there.
Approach:
• Divide the 180 degrees into M arc segments
(where M is odd).
– M odd to provide a central region where no
force is applied.
– There will be M input neurons, one per
segment.
• There will be two output neurons whose
outputs will be interpreted as opposing
force vectors of fixed magnitude.
Inverted Pendulum Position
Representation
• A few of the possibilities to explore:
– (Chosen) A “1” in the input dimension
corresponding to the arc segment which the
inverted pendulum currently occupies, “0” in
other dimensions.
– As above, but have a gradual decline to “0” in
neighboring segments.
• Might help prevent overshoot at the top.
– Alternatively, put “0” to the left of the inverted
pendulum, “0.5” at the inverted pendulum, and “1”
to the right of the inverted pendulum.
• Might provide more directional information.
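The three candidate encodings can be sketched as follows. This is a minimal illustration, not the project's code: the angle convention (degrees from upright, -90 at the left horizontal, +90 at the right), the function names, and the `spread` parameter of the graded variant are all assumptions.

```python
import numpy as np

def segment_index(theta_deg, M):
    """Index (0..M-1) of the arc segment containing theta_deg.

    Assumed convention: 0 is upright, -90 is the left horizontal,
    +90 is the right horizontal, so the 180 degrees split into M
    equal segments and the middle one straddles the upright position.
    """
    if M % 2 == 0:
        raise ValueError("M must be odd so that a central segment exists")
    width = 180.0 / M
    idx = int((theta_deg + 90.0) // width)
    return min(max(idx, 0), M - 1)  # clamp the +90 edge into the last segment

def one_hot(theta_deg, M):
    """Chosen encoding: 1 in the occupied segment, 0 elsewhere."""
    x = np.zeros(M)
    x[segment_index(theta_deg, M)] = 1.0
    return x

def graded(theta_deg, M, spread=1):
    """Variant: 1 at the occupied segment, declining linearly to 0 nearby."""
    k = segment_index(theta_deg, M)
    distance = np.abs(np.arange(M) - k)
    return np.maximum(0.0, 1.0 - distance / (spread + 1))

def left_right(theta_deg, M):
    """Variant: 0 to the left of the pendulum, 0.5 at it, 1 to the right."""
    k = segment_index(theta_deg, M)
    x = np.zeros(M)
    x[k] = 0.5
    x[k + 1:] = 1.0
    return x
```

All three return a length-M vector, so any of them could feed the same M input neurons without changing the rest of the network.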
Output Force Representation
• The output neuron force vector will act
perpendicular to the inverted pendulum at its
center of mass.
• Will use a supervised learning paradigm.
– Training data will be a fixed correcting force to
return the inverted pendulum to the vertical.
• Ideally would use an unsupervised learning
paradigm allowing varying correcting force
magnitudes, but unsure how to implement.
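One way the teacher data and the two opposing outputs could fit together is sketched below. The output ordering (push right, push left), the 0.5 firing threshold, and the sign convention are assumptions made for illustration; the slides only state that the two outputs are opposing forces of fixed magnitude and that no force is applied in the central segment.

```python
import numpy as np

def teacher_target(segment, M):
    """Teacher output [push_right, push_left] for the occupied segment.

    Segments left of center get a push to the right, segments right of
    center get a push to the left, and the central segment gets no force.
    """
    center = M // 2
    if segment < center:
        return np.array([1.0, 0.0])
    if segment > center:
        return np.array([0.0, 1.0])
    return np.array([0.0, 0.0])

def net_force(outputs, magnitude=1.0, threshold=0.5):
    """Interpret two output activations as opposing fixed-magnitude forces.

    Positive result pushes right, negative pushes left. An output counts
    as "on" when it exceeds the (assumed) threshold.
    """
    right = magnitude if outputs[0] > threshold else 0.0
    left = magnitude if outputs[1] > threshold else 0.0
    return right - left
```

With this reading, both outputs firing at once cancel to zero net force, which is one reason the teacher never asks for that combination.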
Initialization
• At the top, with a small movement in one
direction or the other.
• At increasing angles from the top, with no
movement. (Not included in the final version of
the project.)
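The first initialization can be sketched as below, assuming the state is a pair (angle, angular velocity) and that the "small movement" is a small initial angular velocity; the `kick` size and the function name are hypothetical.

```python
import random

def initial_state(kick=0.01, rng=random):
    """Start upright with a small angular velocity in a random direction.

    Returns (theta, omega): theta is the angle from upright and omega is
    the angular velocity. `kick` is an assumed magnitude.
    """
    direction = rng.choice([-1.0, 1.0])
    return 0.0, direction * kick
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable.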
Convergence & Reinitialization
• The standard: amount of match between
output and the teacher’s data.
• Also, stability: over how many simulation steps
the inverted pendulum stays within a small
number of degrees of the top.
– This may be the criterion for reinitialization.
– May not reset the network weights, only the
inverted pendulum position.
– (Did not appear in the final version of the project.)
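Both measures can be written down in a few lines. This is a sketch under assumptions: outputs are compared to the teacher after thresholding at 0.5, and "a small number of degrees" is taken as a 5-degree tolerance; neither value comes from the slides.

```python
import numpy as np

def match_rate(outputs, targets, threshold=0.5):
    """Fraction of samples whose thresholded outputs all equal the teacher's."""
    predicted = np.asarray(outputs) > threshold
    desired = np.asarray(targets) > threshold
    return float(np.mean(np.all(predicted == desired, axis=1)))

def longest_upright_run(angles_deg, tolerance_deg=5.0):
    """Longest run of consecutive steps within tolerance_deg of upright."""
    best = run = 0
    for angle in angles_deg:
        run = run + 1 if abs(angle) <= tolerance_deg else 0
        best = max(best, run)
    return best
```

A reinitialization rule along the lines of the slide would reset only the pendulum state, not the weights, whenever the current run ends.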
[Figure: network diagram. M arc segments (pendulum angle θ) feed M input neurons, then H hidden neurons, then 2 output neurons, which produce the fixed output force.]
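A forward pass through an M-H-2 network of this shape might look like the sketch below. The sigmoid activation, the bias inputs, the weight scale, and the function names are assumptions; the slides give only the layer sizes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_weights(M, H, rng):
    """Small random weights; the extra column in each matrix is a bias (assumed)."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(H, M + 1)),  # input -> hidden
        "W2": rng.normal(0.0, 0.1, size=(2, H + 1)),  # hidden -> output
    }

def forward(weights, x):
    """Map an M-dimensional position vector to two output activations."""
    x_with_bias = np.append(x, 1.0)
    hidden = sigmoid(weights["W1"] @ x_with_bias)
    hidden_with_bias = np.append(hidden, 1.0)
    return sigmoid(weights["W2"] @ hidden_with_bias)
```

For example, `forward(init_weights(9, 4, np.random.default_rng(0)), np.eye(9)[4])` evaluates the untrained network on the central segment of a 9-segment encoding.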
Results (Force vs. Time Step):
• Difficult to find a balance of force and
sampling interval.
– Too large a force resulted in overcorrection.
– Too small a force resulted in
undercorrection.
– Smaller time steps solve this problem, but
increase memory usage and processing time.
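The trade-off between force magnitude and time step can be explored with a small simulation. The sketch below is not the project's simulator: it assumes a uniform rod pivoted at its lower end, a force applied at the center of mass perpendicular to the rod, semi-implicit Euler integration, and a simple fixed-force rule with a central dead band standing in for the network. All parameter values are placeholders.

```python
import math

def step(theta, omega, force, dt, length=1.0, mass=1.0, g=9.81):
    """Advance the pendulum by one time step of size dt.

    theta: angle from upright in radians (positive leans right).
    force: applied at the center of mass, perpendicular to the rod;
           positive pushes toward positive theta.
    """
    inertia = mass * length ** 2 / 3.0  # uniform rod about its end
    torque = (mass * g * math.sin(theta) + force) * length / 2.0
    omega = omega + (torque / inertia) * dt
    theta = theta + omega * dt
    return theta, omega

def simulate(theta, omega, force_mag, dt, steps, dead_band=0.02):
    """Apply a fixed-magnitude correcting force and record the angle history."""
    history = []
    for _ in range(steps):
        if theta > dead_band:
            force = -force_mag
        elif theta < -dead_band:
            force = force_mag
        else:
            force = 0.0  # central region: no force applied
        theta, omega = step(theta, omega, force, dt)
        history.append(theta)
    return history
```

Running `simulate` with the same starting state but different `force_mag` and `dt` values gives a quick way to compare how far the angle swings before settling on parameters for training.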
• Did not reach 100% convergence.
– Ran one promising simulation (one that appeared to be neither under- nor
overcorrected) for several days (>69000 iterations) and achieved a
convergence rate of only 61.3%.
– Judging by the way the pendulum falls during the testing section of the
simulation, the neural network does not yet appear to have “learned” to
balance the inverted pendulum.
Results
• Did not succeed in balancing an inverted
pendulum over the duration of the
simulation runs.