Markov Games

Download Report

Transcript Markov Games

Markov Games
TCM Conference 2016
Chris Gann
[email protected]
The Taxi Problem
A taxi company has divided the city into three regions – Northside,
Downtown, and Southside. By keeping track of pickups and
deliveries, the company has found that of the fares picked up in
Northside, 50% stay in that region, 20% are taken Downtown, and
30% go to Southside. Of the fares picked up Downtown, only 10%
go to Northside, 40% stay Downtown, and 50% go to Southside. Of
the fares picked up in Southside, 30% go to each of Northside and
Downtown, while 40% stay in Southside.
We would like to know what the distribution of taxis will be over
time as they pick up and drop off successive fares. Suppose we
want to know the probability that a taxi starting off Downtown, will
be Downtown after letting off its seventh fare?
State Diagram
This information can be
represented in a state diagram
which includes
• the three states D, N, and S
corresponding to the three
regions of the city
• the probabilities of a taxi
transitioning from one
region/state to another
Markov Chains
These probabilities are constant
and independent of previous
behavior – this memorylessness of
the system is called the Markov
property. We assume that a
transition – picking up and
dropping off a fare – occurs each
time the system is observed, and
that observations occur at regular
intervals. Systems with these
characteristics are called Markov
chains or Markov processes.
Computing the Probabilities
What is the probability that a taxi that starts off Downtown ends up
in Northside after two fares?
One possibility is that the taxi stays Downtown for the first fare, and
then transitions to Northside for the second. The probability of this
occurring is:
But we could also have the taxi going to either Northside of
Southside first, then transitioning to Northside:
Computing the Probabilities
Since the taxi could follow the first, second or third path, the
probability of starting and ending Downtown after two fares is
But we could also have the taxi going to either Northside of
Southside first, then transitioning to Northside:
Tree Diagram
If we multiply along all of the paths
and sum the results we find that this
probability is 0.309.
If we want to know the
probability of a taxi
transitioning from one
region to another after
just three fares, the
computation would have
more possible paths.
Suppose we were
interested in the
probability of a taxi both
starting and ending up
Downtown. We can use
a tree diagram to
represent this
calculation.
Matrix Multiplication
However, we originally asked what would be the probability that a
taxi starting Downtown is back Downtown after seven fares.
Considering the tree diagram for only three transitions, trying to
compute this probability using a tree diagram would be unwieldy.
Let’s look again at our first calculation – the probability of
transitioning from D to N in two transitions.
This has the structure of an inner product. That is it can be
represented as the dot product of two vectors, or equivalently, the
product of a row and a column of two matrices.
Developing the Transition Matrix
Labeling these with matrices with D, S, and N indicates which
probability is found in each entry. That is, each entry of these two
simple matrices is the probability of going from the row label to the
column label in one transition.
The Transition Matrix
We can create a square matrix, T, called the transition matrix, by
constructing rows for the probabilities going from Southside and
Northside as well.
An entry Tij of this matrix is the probability of a transition from
region i to region j. For example, T32, is the probability of a fare
that originates in Northside going to Southside. (Note the sum of
entries across rows of T.)
The Transition Matrix
What results when we multiply the transition matrix by itself?
The highlighted entry results from the same computation that we
already considered for of a taxi going from D to N in two fares.
What are the other entries of
? What are the entries of
?
?
Some general features of a transition matrix for a Markov chain can
be observed from this problem:
1. A transition matrix is square. Clearly the number of rows and
the number of columns are both the same as the number of states.
2. All entries are between 0 and 1. This follows from the fact that
the entries correspond to transition probabilities from one state to
another.
3. The sum of the entries in any row must be 1. The sum of the
entries in an entire row is the sum of the transition probabilities
from one state to all other states. Since a transition is sure to take
place, this sum must be 1.
4. The ij entry in the matrix
gives the probability of being in
state j after n transitions, with state i as the initial state.
5. The entries in the transition matrix are constant. A Markov
chain model depends upon the assumption that the transition
matrix does not change throughout the process.
Hi Ho Cherry O – A Markov Game
• A player starts with 0 cherries.
• A spinner determines how many
cherries you gain or lose each
turn.
• If the spinner lands on 1, 2, 3,
or 4 cherries then you gain that
many cherries.
• If the spinner lands on the bird
or dog then you lose 2 cherries
(or lose all your cherries if you
have 2 or less).
• If the spinner lands on the
spilled basket then you lose all
your cherries.
• You must collect 10 or more
cherries to win the game.