Transcript document

Learning Agents
Presented by:
Huayan Gao,
Thibaut Jahan,
David Keil,
Jian Lian
Students in CSE 333
Distributed Component Systems
Prof. Steven A. Demurjian, Sr.
Computer Science & Engineering Department
The University of Connecticut
Outline
- Agents
- Distributed computing agents
- The JADE platform
- Reinforcement learning
- UML design of agents
- The maze problem
- Conclusion and future work
Agents
Some general features characterizing agents:
- Autonomy
- Goal-orientedness
- Collaboration
- Flexibility
- Ability to be self-starting
- Temporal continuity
- Character
- Adaptiveness
- Mobility
- Capacity to learn
Classification of agents
- Interface agents
  use AI techniques to provide assistance to the user
- Mobile agents
  capable of moving around networks gathering information
- Co-operative agents
  communicate with, and react to, other agents in a multi-agent system within a common environment
- Reactive agents
  react to a stimulus or input governed by some state or event in the environment
Distributed Computing Agents
- Common learning goal (strong sense)
- Separate goals but information sharing (weak sense)
The JADE Platform
- Java Agent DEvelopment Framework
  - Java software framework
  - Middleware platform
  - Simplifies implementation and deployment of multi-agent systems (MAS)
- Services provided
  - AMS (Agent Management System): registration, directory, and management
  - DF (Directory Facilitator): yellow-pages service (see the registration sketch below)
  - ACC (Agent Communication Channel): message-passing service within the platform, including remote agents
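
As an illustration of how an agent registers with the DF, here is a minimal Java sketch, assuming the JADE library is on the classpath; the service type "maze-service" and the class name MazeAgent are illustrative assumptions, not the project's actual code.

    import jade.core.Agent;
    import jade.domain.DFService;
    import jade.domain.FIPAException;
    import jade.domain.FIPAAgentManagement.DFAgentDescription;
    import jade.domain.FIPAAgentManagement.ServiceDescription;

    // Minimal JADE agent that publishes a service in the DF (yellow pages).
    // The service type/name strings are illustrative placeholders.
    public class MazeAgent extends Agent {
        @Override
        protected void setup() {
            DFAgentDescription dfd = new DFAgentDescription();
            dfd.setName(getAID());                    // this agent's identifier (AID)

            ServiceDescription sd = new ServiceDescription();
            sd.setType("maze-service");               // assumed service type
            sd.setName(getLocalName() + "-maze");     // assumed service name
            dfd.addServices(sd);

            try {
                DFService.register(this, dfd);        // publish in the yellow pages
            } catch (FIPAException e) {
                e.printStackTrace();
            }
        }

        @Override
        protected void takeDown() {
            try {
                DFService.deregister(this);           // remove the DF entry on exit
            } catch (FIPAException e) {
                e.printStackTrace();
            }
        }
    }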
JADE Platforms for distributed agents
Agents and Markov processes
Agent type as a function of environment type:

    Environment      Deterministic     Stochastic
    Accessible       Reflex            Solves MDPs
    Inaccessible     Policy-based      Non-Markov; solves POMDPs*

*Partially observable Markov decision problems
Learning from the environment
- The environment, especially a distributed one, may be complex and may change
- Necessity to learn dynamically, without supervision
- Reinforcement learning
  - used in adaptive systems
  - involves finding a policy
- Q-learning, a special case of reinforcement learning
  - computes Q-values into a Q-table
  - finds an optimal policy
Policy search
- Policy: a mapping from states to actions (see the sketch below)
- A policy is distinct from a precomputed action sequence
- Agents that precompute action sequences cannot respond to new sensory information
- An agent that follows a policy incorporates sensory information about the current state into its action determination
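
A minimal Java sketch of this distinction; the State and Action types and the particular mappings are illustrative assumptions, not taken from the project.

    import java.util.HashMap;
    import java.util.Map;

    // Contrast between a fixed action sequence and a policy (state -> action map).
    // State, Action, and the sample entries are illustrative placeholders.
    public class PolicyVsSequence {
        enum Action { UP, DOWN, LEFT, RIGHT }
        record State(int row, int col) {}

        public static void main(String[] args) {
            // Precomputed action sequence: executed blindly, ignores new percepts.
            Action[] sequence = { Action.UP, Action.UP, Action.RIGHT };
            System.out.println("Sequence starts with: " + sequence[0]);

            // Policy: maps the currently observed state to an action, so new
            // sensory information changes what the agent does next.
            Map<State, Action> policy = new HashMap<>();
            policy.put(new State(0, 0), Action.RIGHT);
            policy.put(new State(0, 1), Action.DOWN);

            State observed = new State(0, 1);          // what the sensors report now
            System.out.println("Policy chooses: "
                    + policy.getOrDefault(observed, Action.UP));
        }
    }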
Components of a learner
- In learning, percepts may help improve the agent's future success in interaction
- Components:
  - Learning element: improves the policy
  - Performance element: executes the policy
  - Critic: applies a fixed performance measure
  - Problem generator: suggests experimental actions that will provide information to the learning element
A learning agent and its environment
Temporal difference learning
- Uses observed transitions and the differences between utilities of successive states to adjust utility estimates
- Update rule based on a transition from state i to state j (sketched in code below):

      U(i) ← U(i) + α(R(i) + U(j) − U(i))

  where
  - U is the estimated utility
  - R is the reward
  - α is the learning rate
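
A minimal Java sketch of this update; the integer state indexing and the way rewards are supplied are assumptions for illustration.

    // Sketch of the TD(0) utility update above:
    //   U(i) <- U(i) + alpha * (R(i) + U(j) - U(i))
    public class TemporalDifferenceLearner {
        private final double[] utility;   // U(i), one entry per state (assumed indexing)
        private final double alpha;       // learning rate

        public TemporalDifferenceLearner(int numStates, double alpha) {
            this.utility = new double[numStates];
            this.alpha = alpha;
        }

        // Apply the update after observing a transition i -> j with reward R(i).
        public void update(int i, int j, double reward) {
            utility[i] += alpha * (reward + utility[j] - utility[i]);
        }

        public double value(int state) {
            return utility[state];
        }
    }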
Q-learning
- Q-learning: a variant of reinforcement learning in which the agent incrementally computes a table of expected aggregate future rewards
- The agent modifies the values in the table to refine its estimates
- Using the temporal-difference learning approach, the update is applied after the learner goes from state i to state j (see the sketch below):

      Q(a, i) ← Q(a, i) + α(R(i) + max_a' Q(a', j) − Q(a, i))
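
A minimal Java sketch of a Q-table maintained with this update, together with epsilon-greedy action selection; the integer state/action encoding and the exploration scheme are assumptions, not the project's actual classes.

    import java.util.Random;

    // Sketch of a Q-table with the update above:
    //   Q(a,i) <- Q(a,i) + alpha * (R(i) + max_a' Q(a',j) - Q(a,i))
    public class QLearner {
        private final double[][] q;       // q[state][action]
        private final double alpha;       // learning rate
        private final double epsilon;     // exploration probability (assumed scheme)
        private final Random rng = new Random();

        public QLearner(int numStates, int numActions, double alpha, double epsilon) {
            this.q = new double[numStates][numActions];
            this.alpha = alpha;
            this.epsilon = epsilon;
        }

        // Q update after taking action a in state i, receiving reward R(i),
        // and landing in state j.
        public void update(int i, int a, int j, double reward) {
            q[i][a] += alpha * (reward + maxQ(j) - q[i][a]);
        }

        // Epsilon-greedy action selection from the current Q-values.
        public int chooseAction(int state) {
            if (rng.nextDouble() < epsilon) {
                return rng.nextInt(q[state].length);         // explore
            }
            int best = 0;
            for (int a = 1; a < q[state].length; a++) {
                if (q[state][a] > q[state][best]) best = a;  // exploit
            }
            return best;
        }

        // Utility of a state: U(i) = max_a Q(a, i).
        public double maxQ(int state) {
            double best = q[state][0];
            for (int a = 1; a < q[state].length; a++) {
                best = Math.max(best, q[state][a]);
            }
            return best;
        }
    }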
Q-values
Definition: Q-values are the values Q(a, i) of expected utility associated with a given action in a given state

- Utility of a state: U(i) = max_a Q(a, i)
- Q-values permit decision making without a transition model
- Q-values are directly learnable from reward percepts
UML design of agents
- Standard UML did not provide a complete solution for depicting the design of multi-agent systems
- Because multi-agent systems are both actors and software, their design does not follow typical UML design practice
- Goals, complex strategies, knowledge, etc. are often missed
Reactive use cases
A maze problem
- A simple example consisting of a maze for which the learner must find a policy; the reward is determined by eventually reaching, or not reaching, a goal location in the maze
- The original problem definition may be modified by permitting multiple distributed agents that communicate, either directly or via the environment
Cat and Mouse problem
- An example of reinforcement learning
- The rules of the Cat and Mouse game are:
  - the cat catches the mouse;
  - the mouse escapes the cat;
  - the mouse catches the cheese;
  - the game is over when the cat catches the mouse
- Source: T. Eden, A. Knittel, R. van Uffelen. Reinforcement learning. www.cse.unsw.edu.au/~aek/catmouse
- Our project included modifying existing Java code to enable remote deployment of learning agents and to begin exploring a multi-agent version
Cat-Mouse GUI
Use cases in the Cat-Mouse problem
Classes for the Cat-Mouse problem
Sequence diagram
Maze creation and registration
Cat creation and registration
JADE
The cat agent looks up the maze via the AMS and DF services, as sketched below.
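
A minimal Java sketch of that lookup through the DF; the service type "maze-service" matches the assumption in the registration sketch above and is not the project's actual identifier.

    import jade.core.AID;
    import jade.core.Agent;
    import jade.domain.DFService;
    import jade.domain.FIPAException;
    import jade.domain.FIPAAgentManagement.DFAgentDescription;
    import jade.domain.FIPAAgentManagement.ServiceDescription;

    // Sketch of a cat agent locating the maze service through the DF yellow pages.
    public class CatAgent extends Agent {
        @Override
        protected void setup() {
            DFAgentDescription template = new DFAgentDescription();
            ServiceDescription sd = new ServiceDescription();
            sd.setType("maze-service");               // assumed service type
            template.addServices(sd);

            try {
                DFAgentDescription[] results = DFService.search(this, template);
                if (results.length > 0) {
                    AID maze = results[0].getName();  // AID of the maze agent
                    System.out.println("Found maze: " + maze.getLocalName());
                }
            } catch (FIPAException e) {
                e.printStackTrace();
            }
        }
    }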
JADE
Mouse agent creation and registration
Mouse Agent joins game
Game begins
The game begins, and the Maze (master) and Mouse agents exchange information via ACL messages, as sketched below.
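
A minimal Java sketch of one such exchange using JADE's ACLMessage; the receiver name "maze1" and the message content are illustrative assumptions.

    import jade.core.AID;
    import jade.core.Agent;
    import jade.core.behaviours.OneShotBehaviour;
    import jade.lang.acl.ACLMessage;

    // Sketch of a mouse agent sending an INFORM message to the maze (master) agent
    // and waiting for the reply. Receiver name and content format are placeholders.
    public class MouseAgent extends Agent {
        @Override
        protected void setup() {
            addBehaviour(new OneShotBehaviour(this) {
                @Override
                public void action() {
                    ACLMessage msg = new ACLMessage(ACLMessage.INFORM);
                    msg.addReceiver(new AID("maze1", AID.ISLOCALNAME));
                    msg.setContent("move UP");             // assumed content format
                    myAgent.send(msg);                     // delivered via the ACC

                    ACLMessage reply = myAgent.blockingReceive();
                    System.out.println("Maze replied: " + reply.getContent());
                }
            });
        }
    }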
Remote deployment of learning agents
Using JADE, we can deploy maze, mouse, and cat agents:

    jademaze maze1
    jademouse mouse1
    jadecat cat1

jademaze, jademouse, and jadecat are batch files that deploy the maze, mouse, and cat agents. To create them from a remote PC, we use the following commands:

    jademaze -host hostname mazename
    jadecat -host hostname catname
    jademouse -host hostname mousename
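
For reference, a sketch of what such a batch file might contain, assuming the agents are started through JADE's jade.Boot launcher; the class name MazeAgent, the jar location, and the use of a separate container for remote PCs are assumptions, not the project's actual scripts.

    rem jademaze.bat -- assumed contents, local deployment:
    rem start a JADE main container and launch a maze agent named %1
    java -cp .;jade.jar jade.Boot -gui %1:MazeAgent

    rem remote deployment: join the main container running on another host (%2)
    java -cp .;jade.jar jade.Boot -container -host %2 %1:MazeAgent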
Cat-Mouse in JADE
- JADE allows services to be hosted and discovered in a distributed, dynamic environment.
- On top of those basic services, the mouse and cat agents can discover the maze/mouse/cat services provided, and can join or quit the maze server they discover through the DF service.
Innovation
- A backbone for a core platform encouraging other agents to connect and join
- Access to ontologies and service descriptions, to move towards interoperability at the service level
- A baseline set of deployed agent services that application developers can use as building blocks to create innovative value-added services
- A practical test of a learning agent system complying with FIPA standards
Deployment Scenario
- Infrastructure deployment
  - enables agents to interact with service agents developed by others
  - tests applications in a realistic, distributed, open environment
- Agent and service deployment
  - FIPA ACL messages to exchange information
  - standard FIPA-ACL-compatible content languages
  - FIPA-defined agent management services (directories, communication, and naming)
Conclusions
- Demonstration of a feasible research approach exploring the relationship between reinforcement learning and the deployment of component-based distributed agents
- Communication between agents
- Issues with the space complexity of Q-learning: with n = grid size, m = number of mice, and c = number of cats, the space complexity is 64·n^(2(m+c+1)) (see the check below)
  - 1 mouse + 1 cat ⇒ 481 MB of memory storage for the Q-table
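
As a rough consistency check (assuming the formula counts bytes and, purely for illustration, a 14-by-14 grid, which is not stated on the slide): with n = 14, m = 1, and c = 1, the table requires 64 · 14^6 = 481,890,304 bytes, or about 481 MB, matching the figure above.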
Future work
- Learning in environments that change in response to the learning agent
- Communication among learning agents; multi-agent learning
- Overcoming problems of table size under multi-agent conditions
- Security in message passing
Partial list of references
- S. Flake, C. Geiger, J. Kuster. Towards UML-based analysis and design of multi-agent systems. ENAIS'2001.
- T. Mitchell. Machine learning. McGraw-Hill, 1997.
- A. Printista, M. Errecalde, C. Montoya. A parallel implementation of Q-learning based on communication with cache. http://journal.info.unlp.edu.ar/journal6/papers/p4.pdf
- S. Russell, P. Norvig. Artificial intelligence: A modern approach. Prentice Hall, 1995.
- S. Sen, G. Weiss. Learning in multiagent systems. In G. Weiss, ed., Multiagent systems: A modern approach to distributed artificial intelligence. MIT Press, 1999.
- R. Sutton, A. Barto. Reinforcement learning: An introduction. MIT Press, 1998.
- K. Sycara, A. Pannu, M. Williamson, D. Zeng, K. Decker. Distributed intelligent agents. IEEE Expert, 12/96.