cogarch-laird - Computational Learning Laboratory


Beyond Chunking: Learning in Soar
March 22, 2003
John E. Laird
Shelley Nason, Andrew Nuxoll
and a cast of many others
University of Michigan
Research Methodology in Cognitive Architecture
1. Pick basic principles to guide development
2. Pick desired behavioral capabilities
3. Make design decisions consistent with the above
4. Build/modify architecture
5. Implement tasks
6. Evaluate performance
Soar Basic Principle: Knowledge vs. Problem Search
• Knowledge Search
  • Finds knowledge relevant to current situation
  • Architectural – not subject to change with new knowledge
  • Not combinatorial or generative
• Problem Search
  • Controlled by knowledge, arises from lack of knowledge
  • Subject to improvement with additional knowledge
  • Generative – combinatorial
Desired Behavioral Capabilities
• Interact with a complex world - limited uncertain sensing
• Respond quickly to changes in the world
• Use extensive knowledge
• Use methods appropriate for tasks
• Goal-driven
• Meta-level reasoning and planning
• Generate human-like behavior
• Coordinate behavior and communicate with others
• Learn from experience
• Integrate above capabilities across tasks
• Behavior generated with low computational expense
Example Tasks
• R1-Soar
• NL-Soar ("The horse raced past the barn fell")
• Amber EPIC-Soar
• TacAir-Soar & RWA-Soar
• Soar Hauntbot
• Soar Quakebot
• Soar MOUTbot
Soar 101
[Figure: Soar decision cycle: Input → Propose Operator → Compare Operators → Select Operator → Apply Operator → Output, supported by Working Memory and Production Memory]
• Propose Operator: If the cell in direction <d> is not a wall --> propose operator move <d>
• Compare Operators: If operator <o1> will move to an empty cell and operator <o2> will move to a normal food --> operator <o1> < <o2>; if operator <o1> will move to a bonus food cell and operator <o2> will move to a normal food --> operator <o1> > <o2>
• Select Operator: preferences among the proposed operators (North, South, East) select move-direction North
• Apply Operator: If an operator is selected to move <d> --> create output move-direction <d>
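As a concrete illustration of the cycle on this slide, here is a minimal Python sketch. It is not Soar and not from the talk; the Eaters-style grid, the cell values, and the function names are assumptions used only to show how propose, compare, select, and apply fit together.

```python
# A minimal sketch, not Soar itself: the propose / compare / select / apply
# cycle from the slide, on an Eaters-style grid. All names and values here
# are illustrative assumptions, not from the original talk.
import random

DIRECTIONS = {"north": (0, -1), "south": (0, 1), "east": (1, 0), "west": (-1, 0)}

def propose_operators(grid, x, y):
    """If the cell in direction <d> is not a wall, propose operator move <d>."""
    return [d for d, (dx, dy) in DIRECTIONS.items()
            if grid.get((x + dx, y + dy), "wall") != "wall"]

def compare_operators(grid, x, y, ops):
    """Prefer bonus food over normal food over empty cells."""
    value = {"bonus": 2, "food": 1, "empty": 0}
    return {d: value[grid[(x + DIRECTIONS[d][0], y + DIRECTIONS[d][1])]]
            for d in ops}

def select_operator(preferences):
    """Pick a best operator; indifferent (tied) operators are chosen randomly."""
    best = max(preferences.values())
    return random.choice([d for d, v in preferences.items() if v == best])

def apply_operator(x, y, d):
    """If an operator is selected to move <d>, create output move-direction <d>."""
    dx, dy = DIRECTIONS[d]
    return x + dx, y + dy

# One decision cycle on a toy grid with the eater at (1, 1).
grid = {(1, 0): "bonus", (1, 2): "empty", (2, 1): "food", (0, 1): "empty",
        (1, 1): "empty"}
ops = propose_operators(grid, 1, 1)              # Propose
prefs = compare_operators(grid, 1, 1, ops)       # Compare
chosen = select_operator(prefs)                  # Select
print(chosen, apply_operator(1, 1, chosen))      # Apply -> output move
```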
Soar 102: Subgoals
[Figure: the same decision cycle, but the proposed operators North, South, and East tie, creating a tie impasse and a subgoal]
• In the subgoal, evaluate-operator is applied to each candidate: evaluate-operator (North) = 10, evaluate-operator (South) = 10, evaluate-operator (East) = 5
• The resulting preferences are North > East, South > East, North = South
• Chunking creates rules that create preferences based on what was tested
• Chunking creates a rule that applies evaluate-operator
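To make the tie-impasse idea concrete, here is a small sketch (an assumption of mine, not the architecture itself) of how tied operators can be evaluated in a subgoal by one-step look-ahead, with the scores turned into the preferences shown above.

```python
# A hedged sketch of the tie-impasse idea above, not the actual architecture:
# when preferences cannot decide, each tied operator is evaluated in a subgoal
# by one-step look-ahead, and the scores become preferences. The grid layout,
# names, and reward values are illustrative assumptions.

GRID = {(1, 0): "bonus", (1, 2): "bonus", (2, 1): "food"}   # north, south, east cells
MOVES = {"north": (1, 0), "south": (1, 2), "east": (2, 1)}
REWARD = {"bonus": 10, "food": 5, "empty": 0}

def evaluate_operator(d):
    """Subgoal: simulate move <d> from the current cell and score the result."""
    return REWARD[GRID[MOVES[d]]]

def resolve_tie(tied_ops):
    """Evaluate each tied candidate; better scores turn into '>' preferences."""
    scores = {d: evaluate_operator(d) for d in tied_ops}
    prefs = [f"{a} > {b}" for a in scores for b in scores if scores[a] > scores[b]]
    return scores, prefs

scores, prefs = resolve_tie(["north", "south", "east"])
print(scores)   # {'north': 10, 'south': 10, 'east': 5}
print(prefs)    # ['north > east', 'south > east']  (north and south stay indifferent)
```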
Learning Results
[Chart: Score (0-1400) vs. Decisions (1-1001) for random, look-ahead no chunk, look-ahead during chunking, and look-ahead after chunking]
Soar 102: Dynamic Task Decomposition
[Figure: TacAir-Soar goal/operator hierarchy, including Execute Mission, Fly-Wing, Fly-route, Ground Attack, Intercept, Execute Tactic, Achieve Proximity, Employ Weapons, Search, Scram, Get Missile LAR, Select Missile, Get Steering Circle, Sort Group, Launch Missile, Lock Radar, Lock IR, Fire-Missile, and Wait-for Missile-Clear]
• If instructed to intercept an enemy, then propose intercept
• If intercepting an enemy and the enemy is within range and ROE are met, then propose employ-weapons
• If employing-weapons and a missile has been selected and the enemy is in the steering circle and LAR has been achieved, then propose launch-missile
• If launching a missile and it is an IR missile and there is currently no IR lock, then propose lock-IR
• >250 goals, >600 operators, >8000 rules
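A rough way to see the decomposition in code: each branch below paraphrases one of the four rules on this slide, and each selected abstract operator pushes a subgoal onto a stack until a primitive operator is reached. The flag names and dictionary structure are hypothetical, not TacAir-Soar's actual representation.

```python
# A rough, hypothetical rendering of dynamic task decomposition.

def propose(goal, situation):
    """Proposal rules for the currently active (deepest) goal."""
    if goal == "execute-mission" and situation["instructed-to-intercept"]:
        return "intercept"
    if goal == "intercept" and situation["enemy-in-range"] and situation["roe-met"]:
        return "employ-weapons"
    if goal == "employ-weapons" and situation["missile-selected"] \
            and situation["enemy-in-steering-circle"] and situation["lar-achieved"]:
        return "launch-missile"
    if goal == "launch-missile" and situation["ir-missile"] and not situation["ir-lock"]:
        return "lock-ir"
    return None   # no further decomposition at this point

situation = {"instructed-to-intercept": True, "enemy-in-range": True, "roe-met": True,
             "missile-selected": True, "enemy-in-steering-circle": True,
             "lar-achieved": True, "ir-missile": True, "ir-lock": False}

stack = ["execute-mission"]
while (op := propose(stack[-1], situation)) is not None:
    stack.append(op)           # selecting the operator creates a subgoal
print(stack)
# ['execute-mission', 'intercept', 'employ-weapons', 'launch-missile', 'lock-ir']
```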
Chunking
• Simple architectural learning mechanism
• Automatically build rules that summarize/cache processing
• Converts deliberate reasoning/planning to reaction
• Problem search => knowledge search
• Problem solving in subgoals determines what is learned
• Supports deliberate/reflective learning
• Leads to many different types of learning strategies
• If reasoning is inductive, so is learning
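One way to picture the "summarize/cache processing" point is as memoization keyed on exactly the features tested in the subgoal. The sketch below is an analogy under that assumption, not the real chunking mechanism.

```python
# A minimal sketch of the caching view of chunking described above: the
# result computed in a subgoal is stored as a new "rule" whose conditions are
# exactly the features that were tested, so the same question is later
# answered by knowledge search alone (problem search => knowledge search).

chunks = {}   # learned rules: (features tested in the subgoal) -> result

def evaluate_in_subgoal(operator, cell_contents):
    """Stand-in for deliberate look-ahead; only cell_contents is tested."""
    return {"bonus": 10, "food": 5, "empty": 0}[cell_contents]

def evaluate(operator, cell_contents):
    key = (operator, cell_contents)                  # what was tested
    if key in chunks:                                # knowledge search: chunk fires
        return chunks[key]
    result = evaluate_in_subgoal(operator, cell_contents)   # problem search
    chunks[key] = result                             # summarize/cache as a rule
    return result

evaluate("move-north", "bonus")   # first time: solved in a subgoal, chunk built
evaluate("move-north", "bonus")   # second time: answered directly by the chunk
print(chunks)                     # {('move-north', 'bonus'): 10}
```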
Why Beyond Chunking?
• Chunking requires deliberate processing (operators) to
  • record experiences
  • capture statistical regularities
  • learn new concepts (data chunking)
• Processing for these is done only because we want the learning, not because it helps perform the task
• Learning competes with the task at hand
• Hard to implement, hard to use
• Are there other architectural learning mechanisms?
Episodic Learning
[Andrew Nuxoll]
• What is it?
  • Not facts or procedures, but memories of specific events
  • Recording and recalling of experiences with the world
• Characteristics of Episodic Memory
  • Autobiographical
  • Not confused with original experience
  • Runs forward in time
  • Temporally annotated
• Why add to Soar architecture?
  • Not appropriate as reflective learning
  • Provides personal history and identity
  • Memories that can aid future decision making & learning
  • Can generalize and analyze when time and more knowledge are available
Episodic Learning
• When is a memory recorded?
  • Fixed period of time
  • “Significant” event
  • Significant change in highest activated working memory elements
• What are the cues for retrieval?
  • Everything
  • Only input
  • Most “activated” input / everything
  • Domain-specific features
• Is retrieval automatic or deliberate?
• What is retrieved?
  • Changes to input
  • Changes to working memory
  • Changes to activated working memory
• How is the memory stored?
  • As production rules
• What’s missing?
  • Sense of the time when the episode occurred
  • Current implementation is not task independent
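To ground the recording and retrieval questions above, here is a hedged sketch: episodes as snapshots of working memory, retrieval as a best match over cue features weighted by a stand-in activation score. The feature names and weighting scheme are illustrative assumptions, not Nuxoll's implementation.

```python
# A hedged sketch of episodic recording and cue-based retrieval.

episodes = []   # (time, snapshot) pairs, recorded in temporal order

def record(time, working_memory):
    """Record an episode, e.g. every fixed period or on a significant change."""
    episodes.append((time, dict(working_memory)))

def retrieve(cue, activation=None):
    """Return the stored episode that best matches the cue features."""
    activation = activation or {}
    def score(snapshot):
        return sum(activation.get(k, 1.0)
                   for k, v in cue.items() if snapshot.get(k) == v)
    return max(episodes, key=lambda e: score(e[1]), default=None)

record(1, {"cell-north": "food", "cell-east": "wall", "action": "move-north"})
record(2, {"cell-north": "empty", "cell-east": "bonus", "action": "move-east"})
print(retrieve({"cell-east": "bonus"}, activation={"cell-east": 2.0}))
# (2, {'cell-north': 'empty', 'cell-east': 'bonus', 'action': 'move-east'})
```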
Episodic Recall Implementation
[Figure: decision cycle with a tie impasse among East, North, and South; in the subgoal, evaluate-operator (North) uses episodic recall]
• If a memory matches, it computes the correct next state
• If no memory matches, it returns the default evaluation [3]
• Result: North = 10
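The recall-based evaluation above can be sketched as a lookup that falls back to the default value of 3 when nothing matches. The episode contents and helper names below are assumptions for illustration only.

```python
# An illustrative sketch (assumed, not the actual code) of the evaluation
# step above: during look-ahead, a candidate operator is scored by recalling
# an episode that matches the current situation; if nothing matches, the
# default evaluation of 3 from the slide is returned.

DEFAULT_EVALUATION = 3

episodes = [
    # (remembered situation, operator taken, value of the resulting state)
    ({"cell-north": "bonus"}, "move-north", 10),
    ({"cell-east": "food"},   "move-east",   5),
]

def evaluate_with_recall(situation, operator):
    """Use a matching memory to predict the next state's value, else default."""
    for remembered, op, value in episodes:
        if op == operator and all(situation.get(k) == v for k, v in remembered.items()):
            return value
    return DEFAULT_EVALUATION

print(evaluate_with_recall({"cell-north": "bonus"}, "move-north"))  # 10
print(evaluate_with_recall({"cell-south": "empty"}, "move-south"))  #  3
```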
Two Approaches
1. On-line
   • Build memories as actions are taken
   • Attempt to recall memories during look-ahead
   • Chunk use of memories during look-ahead
2. Off-line
   • Randomly explore while memories are recorded
   • Off-line, attempt to recall and learn from recorded memories
   • Chunk use of memories during look-ahead
On-line Episodic Learning
[Chart: score (0-1400) vs. Decisions (1-976) for greedy, random, epmem chunk 1-5, epmem 1-2, and leonard]
On-Line Episodic Learning
[Chart: Score (0-900) vs. Actions (1-161) for greedy, epmem chunked iter 4-5, epmem, and random]
Off-Line Episodic Learning
[Chart: Score (0-1400) vs. Decisions (1-971) for greedy, epmem reflect iter 1-5, epmem chunked iter 5, random, and leonard]
Reinforcement Learning
[Shelley Nason]
• Why add it to Soar?
  • Might capture statistical regularities automatically/architecturally
  • Chunking can do this only via deliberate learning
• Why Soar?
  • Potential to integrate RL with complex problem solver
  • Quantifiers, hierarchy, …
• How can RL fit into Soar?
  • Learn rules that create numeric probabilistic preferences for operators
  • Used only when symbolic preferences are inconclusive
  • Decision based on all preferences that are recalled for an operator
• Why is this going to be cool?
  • Dynamically compute Q-values based on all rules that match the state
  • Get transfer at different levels of generality
Example Numeric Preferences
[Figure: operators East, North, and South are proposed; the rules that match the current state create numeric preferences for North (= 8, = 12, = 15, = 1, = 2, = 10), which are combined into a single value: North = 48/6 = 8]
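Reading the figure as six matching rules whose numeric preferences sum to 48, the combination step can be shown in a few lines. This is my interpretation; the averaging rule and the particular six values are assumptions.

```python
# A small worked version of the figure above: each RL rule that matches the
# current state contributes one numeric preference for an operator, and the
# matched preferences are combined (here by averaging) into a single value.

matched_preferences = {"north": [8, 12, 15, 1, 2, 10]}   # six matching rules

def combined_value(operator):
    prefs = matched_preferences[operator]
    return sum(prefs) / len(prefs)

print(combined_value("north"))   # 48 / 6 = 8.0
```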
Reinforcement Learning
[Figure: the agent moves North from State A to State B (= 10); the operators proposed in State B have values East = 6, North = 11, South = 3]
• Create a rule that creates a numeric preference for North in State A, using the values in State B and max(proposed operators), according to standard RL
• Conditions of the rule?
  • Current: all of the state
  • Future: what was tested to produce the evaluation of State B but existed in State A
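Here is a hedged sketch of that backup, written as a standard Q-learning style update. Treating the "10" on State B as the reward, and the learning rate and discount values, are assumptions of mine, not details from the talk.

```python
# Sketch of the backup described above: the numeric preference for taking
# North in State A is adjusted toward the reward plus the best value among
# the operators proposed in State B. Constants are illustrative.

ALPHA, GAMMA = 0.3, 0.9

numeric_preference = {("A", "north"): 0.0}
state_b_operator_values = {"east": 6, "north": 11, "south": 3}   # from the slide
reward = 10                                                      # "State B = 10"

target = reward + GAMMA * max(state_b_operator_values.values())  # 10 + 0.9 * 11
delta = target - numeric_preference[("A", "north")]
numeric_preference[("A", "north")] += ALPHA * delta
print(numeric_preference[("A", "north")])   # moves toward 19.9
```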
Reinforcement Learning Results
[Chart: Score (up to 1200) vs. Actions (100-500) for Greedy, Learned, Learning 1-3, and Random]
Architectural Learning
• Automatic & ubiquitous
• Task independent & fixed
• Bounded processing
• Single experience-based
• Examples:
  • Chunking
  • Episodic learning
  • Reinforcement learning
  • Semantic/concept learning?
Deliberate/Reflective Learning
• Deliberately engaged
• “On top” of architecture
• Uses knowledge to control
• Uses architectural learning
• Can change with learning
• Unbounded processing
• Can generalize across multiple examples through recall
• Examples:
  • Task acquisition
  • Learning by instruction
  • Learning by analogy
  • Recovery from incorrect knowledge
  • …