
How a Modeler’s Conception of Rewards Influences a Model’s Behavior
Investigating ACT-R 6’s utility learning mechanism
› Christian P. Janssen
› Wayne D. Gray
› Michael J. Schoelles
Temporal difference learning & ACT-R
› Temporal difference learning has recently been
introduced as ACT-R’s new utility learning
mechanism
(e.g., Fu & Anderson, 2004; Anderson, 2006, 2007; Bothell, 2005)
› Utility learning optimizes behavior so as to
maximize the rewards that the model receives
› A model can:
• Receive rewards at different moments in time
• Receive rewards of different magnitudes
› There are no guidelines for choosing when a reward
should be given and what its magnitude should be
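A minimal Python sketch of this update rule (Anderson, 2007) may help make the mechanism concrete; the function name, learning rate, and numbers below are illustrative assumptions, not ACT-R’s actual code:

```python
# Minimal sketch of ACT-R 6's utility-learning update,
#   U_i(n) = U_i(n-1) + alpha * (R_i(n) - U_i(n-1)),
# where the effective reward R_i(n) is the external reward minus the
# time elapsed between production i's firing and the reward.
# Function name and learning rate are illustrative, not ACT-R's code.

def update_utility(utility: float, reward: float, delay: float,
                   alpha: float = 0.2) -> float:
    """One learning step for a production that fired `delay` seconds
    before receiving a reward of magnitude `reward`."""
    effective_reward = reward - delay  # rewards lose value with delay
    return utility + alpha * (effective_reward - utility)

# Repeated updates converge to reward - delay: here 10 - 2 = 8.
u = 0.0
for _ in range(100):
    u = update_utility(u, reward=10.0, delay=2.0)
print(round(u, 2))  # -> 8.0
```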
New issues for ACT-R
› We studied two aspects of TD learning:
• When is reward given
• Magnitude of the reward
› This is a new issue for ACT-R
• When is reward given: could be varied in ACT-R 5
• Magnitude of reward: could not be varied in ACT-R 5
› As we will show, the modeler’s conception of
rewards has a big influence on a model’s
behavior
› Case study: Blocks World task (Gray et al., 2006)
Why the Blocks World task?
› Previous work indicates that the utility learning
mechanism is crucial for this task
• ACT-R 5 models (Gray, Schoelles, & Sims, 2005)
• Standard ACT-R 5 cannot provide a good fit to the human data
• Because rewards in ACT-R 5 are binary (i.e., successes and
failures) rather than scalar
• Ideal Performer Model (Gray et al., 2006)
• A model outside of ACT-R that uses temporal difference
learning; it provided a very good fit to the human data
Blocks World task
› So what’s the task?
Blocks World task
Task: “Copy pattern in target window by moving blocks from
resource window to workspace window”
Blocks World task
Windows are covered with gray rectangles:
Accessing information requires interaction with the interface
Blocks World task
› Blocks World task:
• Information in Target Window is only available
after waiting for a lockout time
• 0, 400 or 3200 milliseconds (between subjects)
Blocks World task: human data (Gray et al., 2006)
› Size of lockout time influences human behavior:
[Figure: number of blocks placed after the 1st visit to the target window (0 to 5) as a function of lockout time in seconds (0.0 to 3.0)]
Blocks World task: Modeling Strategies
› Strategy: How many blocks do you plan to place
after a visit to the target window?
› 8 encode-x production rules
• “study x blocks”
• encode-1 through encode-8
› Model learns utility value of each production
rule using ACT-R’s temporal difference learning
algorithm
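For illustration, the sketch below mimics how ACT-R resolves the competition among such rules: the production whose utility plus logistic noise is highest fires, so higher-utility strategies win more often but not always. The function names and the noise scale s are assumptions for this example, not the model’s actual code:

```python
import math
import random

def logistic_noise(s: float) -> float:
    """Sample from a logistic distribution with scale s, the shape of
    the noise ACT-R adds to utilities during conflict resolution."""
    u = min(max(random.random(), 1e-12), 1.0 - 1e-12)
    return s * math.log(u / (1.0 - u))

def choose_strategy(utilities: dict, s: float = 0.5) -> str:
    """Fire the production whose utility plus noise is highest."""
    return max(utilities, key=lambda name: utilities[name] + logistic_noise(s))

# Eight competing encoding strategies, all starting at the same utility:
utilities = {f"encode-{x}": 0.0 for x in range(1, 9)}
print(choose_strategy(utilities))  # effectively a random pick at first
```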
Utility learning
› Utility learning requires the incorporation of
rewards
› Two choices are crucial:
• When is the reward given?
• What is the magnitude of the reward?
› After some experience, the utility of a
production rule approximates (Anderson, 2007):
$U_i \approx r(t_x) - (t_x - t_i)$
where $r(t_x)$ is the magnitude of the reward delivered at time $t_x$, and $(t_x - t_i)$ captures when the reward is given relative to the firing of production $i$ at time $t_i$.
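A quick numeric illustration of this relationship, with hypothetical times and reward values: two productions fire at different times before the same reward, and the earlier one ends up with a lower asymptotic utility:

```python
# Asymptotic utility under the rule above: U_i ~ r(t_x) - (t_x - t_i).
def asymptotic_utility(reward: float, t_reward: float, t_fire: float) -> float:
    return reward - (t_reward - t_fire)

# A reward of 10 delivered at t = 12 s (all values hypothetical):
print(asymptotic_utility(10.0, 12.0, t_fire=2.0))   # fired early -> 0.0
print(asymptotic_utility(10.0, 12.0, t_fire=10.0))  # fired late  -> 8.0
```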

Utility learning
› Choice 1: When is the reward given?
› Important because:
• The utility value has a linear relationship with the time at
which the reward is given:
$U_i \approx r(t_x) - (t_x - t_i)$
› Choice in Blocks World
• Once model: Update once, at the end of the trial
• Each model: Update each time that part of the task is
completed.
• A (set of) block(s) has been placed and the model either returns to
the target window to study more blocks, or finishes the trial
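A minimal sketch of the difference between the two schemes, assuming a toy trial with two firings of the same strategy rule and made-up times and reward magnitudes:

```python
def td_update(u: float, effective_reward: float, alpha: float = 0.2) -> float:
    """One ACT-R 6 style utility-learning step."""
    return u + alpha * (effective_reward - u)

# Two firings of the same strategy rule, at t = 1 s and t = 6 s:
firing_times = [1.0, 6.0]

# "Once": a single reward of 8 at the end of the trial (t = 10 s)
# credits both firings, discounted by their distance from the reward.
u_once = 0.0
for t in firing_times:
    u_once = td_update(u_once, 8.0 - (10.0 - t))

# "Each": a reward of 4 after each completed sub-task (at t = 5 s and
# t = 10 s) credits only the firing since the previous reward.
u_each = 0.0
for t, (t_r, r) in zip(firing_times, [(5.0, 4.0), (10.0, 4.0)]):
    u_each = td_update(u_each, r - (t_r - t))

print(round(u_once, 2), round(u_each, 2))  # the two schemes diverge
```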

Utility learning
› Choice 2: magnitude of the reward
› Important because:
• The utility value has a linear relationship with the magnitude of
the reward:
$U_i \approx r(t_x) - (t_x - t_i)$
› But how to set this value?
• Experimental tweaking? -> unfavorable
• Fixed range of values? (e.g., between 0 and 1) -> difficult
• Relate to neurological data? -> not available for most models
Utility learning
› Choice 2: magnitude of the reward
› Choice in Blocks World:
• Relate the reward to what might be important in the task
• Accuracy: the accuracy with which the task is performed. Options:
  • Success: # blocks placed (once)
  • Success: # blocks placed (each)
  • Success & failure: # blocks placed - # blocks forgotten (each)
• Time: how much time does (part of) the task take? Options:
  • Time spent on the task: -1 × time spent (once)
  • Time spent waiting for a specific aspect of the task:
  -1 × lockout size × number of visits to the target window (once)
  • Number of blocks placed per second (each)
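To see how differently these conceptions scale, the sketch below computes each candidate reward for one hypothetical trial; all values are invented for illustration:

```python
# One hypothetical trial: 8 blocks placed, 1 forgotten, 3 visits to
# the target window, 20 s total, 3.2 s lockout. All numbers made up.
blocks_placed, blocks_forgotten = 8, 1
visits, trial_time, lockout = 3, 20.0, 3.2

rewards = {
    "accuracy: blocks placed (once)":      blocks_placed,
    "accuracy: placed - forgotten (each)": blocks_placed - blocks_forgotten,
    "time: time spent on task (once)":     -1 * trial_time,
    "time: lockout x visits (once)":       -1 * lockout * visits,
    "time: blocks per second (each)":      blocks_placed / trial_time,
}
for concept, r in rewards.items():
    print(f"{concept:38s} -> {r:7.2f}")
```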
Blocks World task: Modeling Strategies
› 6 models were developed
› Each model is run 6 times for each of 3
experimental conditions:
• 0, 400 and 3200 milliseconds
› Models interact with the same interface as
human participants
Blocks World task: general results
› Each model has unique results
Blocks World task: general results
› What is the impact of:
• When the reward is given (once/each)
• The concept of the reward (related to
accuracy/time)
› Results averaged over 3 models
Utility learning: impact of when reward is given
Utility learning: impact of concept of reward
Comparison with ACT-R 5
(Gray, Schoelles, & Sims, 2005)
Conclusion
› Rewards can be given at different times during a
trial and according to different concepts
› There are no guidelines for what the best choices are
› Blocks World suggests that rewards should:
• Be given once: the model can then optimize behavior over
the entire task
• Relate to the concept of time: different strategy choices
have a big impact on reward size
› Models of other tasks should show whether this finding is
consistent
Conclusion
› This is not just a Blocks World issue
• General Computer Science / AI issue:
representing a task in the right way is crucial
(e.g., Russell & Norvig, 1995; Sutton & Barto, 1998)
• Many experiments involve manipulations and
measurements of accuracy and speed of
performance
› This is a new issue for ACT-R
• When is reward given: could be varied in ACT-R 5
• Magnitude of reward: could not be varied in ACT-R 5
Thank you for your attention
› Questions?
› More information:
• [email protected]
• www.ai.rug.nl/~cjanssen
• www.cogsci.rpi.edu/cogworks
• Poster session @ CogSci 2008, Thursday, July 24th:
“Cognitive Models of Strategy Shifts in Interactive Behavior”
(session: “Attention and Implicit Learning”)
References
› Anderson, J. R. (2006). A new utility learning mechanism. Paper presented at the 2006 ACT-R workshop.
› Anderson, J. R. (2007). How can the human mind occur in the physical universe? New York: Oxford University Press.
› Bothell, D. (2005). ACT-R 6 official release. Proceedings of the 12th ACT-R Workshop.
› Fu, W. T., & Anderson, J. R. (2004). Extending the computational abilities of the procedural learning mechanism in ACT-R. Proceedings of the 26th Annual Meeting of the Cognitive Science Society, 416-421.
› Gray, W. D., Schoelles, M. J., & Sims, C. R. (2005). Adapting to the task environment: Explorations in expected value. Cognitive Systems Research, 6(1), 27-40.
› Gray, W. D., Sims, C. R., Fu, W. T., & Schoelles, M. J. (2006). The soft constraints hypothesis: A rational analysis approach to resource allocation for interactive behavior. Psychological Review, 113(3), 461-482.
› Russell, S. J., & Norvig, P. (1995). Artificial intelligence: A modern approach. Upper Saddle River, NJ: Prentice-Hall.
› Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.