ECE-517: Reinforcement Learning in Artificial Intelligence
Lecture 4: Discrete-Time Markov Chains
September 1, 2010
Dr. Itamar Arel
College of Engineering
Department of Electrical Engineering & Computer Science
The University of Tennessee
Fall 2010
Simple DTMCs
[Figure: two example chains. A two-state chain (states 0, 1) with transition probabilities p and 1-p out of state 0 and q and 1-q out of state 1, and a three-state chain (states 0, 1, 2) with transition probabilities labeled a through f]
“States” can be labeled (0,)1,2,3,…
At every time slot a "jump" decision is made randomly based on the current state.
(Sometimes the arrow pointing back to the same state is omitted.)
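To make the jump rule concrete, here is a minimal sketch (not from the original slides; the values of p and q are arbitrary) of one time slot of the two-state chain in the figure:

```python
import numpy as np

rng = np.random.default_rng(0)

p, q = 0.3, 0.5                     # arbitrary example transition probabilities
P = np.array([[1 - p, p],           # from state 0: stay w.p. 1-p, jump w.p. p
              [q, 1 - q]])          # from state 1: jump w.p. q, stay w.p. 1-q

def jump(state, P, rng):
    """One time slot: draw the next state from row `state` of P."""
    return rng.choice(len(P), p=P[state])

state = 0
for t in range(10):
    state = jump(state, P, rng)     # the decision depends only on the current state
```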
1-D Random Walk
[Figure: walker at position X(t) on the integer line, stepping right w.p. p and left w.p. 1-p]
Time is slotted
The walker flips a coin every time slot to decide which way to go.
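A minimal simulation sketch of the walk (p = 0.5 and the horizon T are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
p, T = 0.5, 1000                    # step-right probability, number of time slots

# Each slot: +1 w.p. p, -1 w.p. 1-p (the "coin flip")
steps = np.where(rng.random(T) < p, 1, -1)
X = np.concatenate(([0], np.cumsum(steps)))   # X[t] = walker's position after t slots
```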
Single Server Queue
[Figure: single-server queue with Bernoulli(p) arrivals and Geom(q) service times]
Consider a queue at a supermarket
In every time slot:
  A customer arrives with probability p
  The HoL (head-of-line) customer leaves with probability q
We'd like to learn about the behavior of such a system.
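A sketch of the queue dynamics, with one Bernoulli(p) arrival attempt and one Geom(q) departure attempt per slot (parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, T = 0.3, 0.5, 100_000         # arrival prob, service prob, number of slots

n, sizes = 0, []
for _ in range(T):
    if rng.random() < p:            # a customer arrives w.p. p
        n += 1
    if n > 0 and rng.random() < q:  # the HoL customer leaves w.p. q
        n -= 1
    sizes.append(n)

print("average queue size:", np.mean(sizes))
```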
Birth-Death Chain
[Figure: birth-death chain over states 0, 1, 2, 3, …]
Our queue can be modeled by a Birth-Death Chain (a.k.a. Geom/Geom/1 queue)
Want to know:
  Queue size distribution
  Average waiting time, etc.

Markov Property
The "Future" is independent of the "Past" given the "Present"
In other words: Memoryless
We've mentioned memoryless distributions: Exponential and Geometric
Useful for modeling and analyzing real systems
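To recall what memorylessness means for the Geometric distribution (a standard identity, not from the slides): if T ~ Geom(q), so that P(T > n) = (1-q)^n, then

  P(T > m+n | T > m) = (1-q)^{m+n} / (1-q)^m = (1-q)^n = P(T > n)

i.e. the remaining time until an event does not depend on how long we have already waited.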
Discrete Time Random Process (DTRP)
Random Process: An indexed family of random variables.
Let X_n be a DTRP consisting of a sequence of independent, identically distributed (i.i.d.) random variables with common cdf F_X(x). This sequence is called the i.i.d. random process.
  Example: Sequence of Bernoulli trials (flips of a coin)
  In networking: traffic may obey a Bernoulli i.i.d. arrival pattern (see the sketch below)
In reality, some degree of dependency/correlation exists between consecutive elements in a DTRP
  Example: Correlated packet arrivals (video/audio stream)
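A minimal sketch of the Bernoulli i.i.d. arrival pattern mentioned above (p = 0.3 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
p, T = 0.3, 20

# X_n = 1 if a packet arrives in slot n (w.p. p), else 0; slots are independent
arrivals = (rng.random(T) < p).astype(int)
print(arrivals)
```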
Discrete Time Markov Chains
A sequence of random variables {X_n} is called a Markov Chain if it has the Markov property:

  P(X_{n+1} = j | X_n = i, X_{n-1} = i_{n-1}, …, X_0 = i_0) = P(X_{n+1} = j | X_n = i)

States are usually labeled {(0,)1,2,…}
State space can be finite or infinite
Transition Probability Matrix

  p_ij = P(X_{n+1} = j | X_n = i)

the probability of transitioning from state i to state j.
We will assume the MC is homogeneous/stationary: p_ij does not depend on n.
Transition probability matrix: P = {p_ij}
Two-state MC (the chain from the first slide):

  P = [ 1-p   p  ]
      [  q   1-q ]
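A quick numeric illustration (the values of p and q are arbitrary): the n-step transition probabilities are given by matrix powers, P(X_n = j | X_0 = i) = (P^n)_ij, and the rows of P^n settle down as n grows:

```python
import numpy as np

p, q = 0.3, 0.5
P = np.array([[1 - p, p],
              [q, 1 - q]])
assert np.allclose(P.sum(axis=1), 1.0)   # each row is a probability distribution

P10 = np.linalg.matrix_power(P, 10)      # 10-step transition probabilities
print(P10)                               # rows approach the stationary distribution
```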
Stationary Distribution
Define

  π_k = ( P(X_k = 0), P(X_k = 1), … )

then π_{k+1} = π_k P (π_k is a row vector).

Stationary distribution:

  π = lim_{k→∞} π_k,   if the limit exists

If π exists, we can solve for it by

  π = π P,   Σ_i π_i = 1

These are called balance equations:
  Transitions in and out of state i are balanced
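A sketch of computing π for the two-state chain by iterating π_{k+1} = π_k P (power iteration; one standard approach, with arbitrary p and q):

```python
import numpy as np

p, q = 0.3, 0.5
P = np.array([[1 - p, p],
              [q, 1 - q]])

pi = np.array([1.0, 0.0])     # any initial distribution works here
for _ in range(1000):
    pi = pi @ P               # pi_{k+1} = pi_k P

print(pi)                     # approx. [q/(p+q), p/(p+q)] = [0.625, 0.375]
```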
General Comment & Conditions for π to Exist (I)

If we partition all the states into two sets, then transitions between the two sets must be "balanced".
This can be easily derived from the balance equations.

Definitions:
  State j is reachable from state i if (P^n)_ij > 0 for some n ≥ 0
  States i and j commute if they are reachable from each other
  The Markov chain is irreducible if all states commute (a check is sketched below)
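A sketch of an irreducibility check for a finite chain, under the definitions above (reachability computed on the transition graph; `is_irreducible` is an illustrative helper, not a library function):

```python
import numpy as np

def is_irreducible(P):
    """True iff every pair of states of the finite chain P commutes."""
    n = len(P)
    R = ((P > 0) | np.eye(n, dtype=bool)).astype(int)  # 1-hop reachability + self
    for _ in range(n):                   # repeated squaring covers all path lengths
        R = ((R @ R) > 0).astype(int)    # clip to 0/1 so entries never overflow
    return bool(R.min() > 0)             # all pairs mutually reachable
```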
Conditions for π to Exist (I) (cont'd)

Condition: The Markov chain is irreducible
  Counter-examples:
  [Figure: two reducible chains (one over states 1-4, one over states 1-3 with a transition of probability p=1) in which not all states commute]

Condition: The Markov chain is aperiodic
  Counter-example (see the sketch below):
  [Figure: a two-state chain that alternates deterministically between states 0 and 1, so the state sequence 0, 1, 0, 1, … has period 2]
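A numeric sketch of why aperiodicity matters: for the deterministic two-state flip, π_k oscillates and the limit never exists, even though π = (1/2, 1/2) solves the balance equations:

```python
import numpy as np

P = np.array([[0.0, 1.0],      # period-2 chain: 0 -> 1 -> 0 -> ...
              [1.0, 0.0]])

pi = np.array([1.0, 0.0])
for k in range(5):
    print(k, pi)               # alternates between [1, 0] and [0, 1]; no limit
    pi = pi @ P
```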
Conditions for π to Exist (II)

For the Markov chain to be recurrent …
  All states i must be recurrent, i.e. starting from i the chain returns to i with probability 1
  Otherwise the state is transient
With regard to a recurrent MC …
  State i is positive recurrent if E(T_i) < ∞, where T_i is the time between visits to state i
  Otherwise the state is considered null-recurrent
Solving for π: Example for two-state Markov Chain

[Figure: two-state chain; state 0 with self-loop 1-p and transition p to state 1, state 1 with self-loop 1-q and transition q back to state 0]
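The worked solution did not survive extraction; a hedged reconstruction from the balance equations (balancing the flow across a cut between the two states):

  π_0 p = π_1 q,   π_0 + π_1 = 1
  ⇒ π_0 = q/(p+q),   π_1 = p/(p+q)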
Birth-Death Chain
[Figure: birth-death chain over states 0, 1, 2, 3, …; from each state i > 0 the chain moves up w.p. u, down w.p. d, and stays put w.p. 1-u-d; state 0 stays put w.p. 1-u]

Arrival w.p. p; departure w.p. q
Let u = p(1-q), d = q(1-p), r = u/d
Balance equations:

  π_{i-1} u = π_i d,   i = 1, 2, …
Birth-Death Chain (cont’d)
Continuing this pattern, we observe that:

  π_{i-1} u = π_i d

Equivalently, we can draw a bisection between state i-1 and state i: the probability flows across it must balance.
Therefore, we have

  π_i = r^i π_0

where r = u/d.
What we are interested in is the stationary distribution of the states, so …
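A numeric sanity check of π_i = r^i π_0 on a truncated version of the chain (the truncation level N and the values of p and q are arbitrary choices for illustration):

```python
import numpy as np

p, q, N = 0.3, 0.5, 50                  # arrival/departure probs, truncation level
u, d = p * (1 - q), q * (1 - p)
r = u / d

# Truncated birth-death transition matrix
P = np.zeros((N + 1, N + 1))
for i in range(N + 1):
    if i < N:
        P[i, i + 1] = u                 # "birth": queue grows
    if i > 0:
        P[i, i - 1] = d                 # "death": queue shrinks
    P[i, i] = 1.0 - P[i].sum()          # remaining probability: stay put

pi = np.ones(N + 1) / (N + 1)
for _ in range(20_000):
    pi = pi @ P                         # power iteration toward pi

print(pi[1] / pi[0], r)                 # consecutive ratios match r = u/d
```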
Birth-Death Chain (cont’d)
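The equations on this closing slide did not survive extraction; a hedged reconstruction of the natural final step, normalizing π_i = r^i π_0 (assuming r < 1, i.e. u < d, so the chain is positive recurrent):

  Σ_{i≥0} π_i = π_0 Σ_{i≥0} r^i = π_0 / (1-r) = 1   ⇒   π_0 = 1-r,   π_i = (1-r) r^i

The queue size is thus geometrically distributed, from which the queue-size distribution and average waiting time asked about earlier follow (e.g. the mean queue size is E[X] = r/(1-r)).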