Presentation Slide - Princeton University

Download Report

Transcript Presentation Slide - Princeton University

Jointly Optimal Transmission and
Probing Strategies for
Multichannel Systems
Saswati Sarkar
University of Pennsylvania
Joint work with Sudipto Guha (Upenn) and Kamesh
Munagala (Duke)
Multichannel Systems
• Current wireless networks have access to
multiple channels
– Frequencies in FDMA, codes in CDMA, antennas in
MIMO, polarization states of antennas, access points
in systems with multiple access points
– IEEE 802.11a, IEEE802.11b, IEEE802.11h
• Future wireless networks are likely to have
access to larger number of channels
– Potential deregulation of spectrum
– Advances in device technology
Multichannel systems
• Transmission quality in different channels
stochastically vary with time.
• A user is likely to find a channel with acceptable
transmission quality when the number of
channels is large,
– if he samples all the channels
• Sampling consumes energy and generates
interference for other users
• Amount of information a user acquires about its
channels becomes an important decision
variable.
Policy Decisions
• Probing policy
– How many channels to probe?
– Which channels to probe?
– Sequence of probing
• Selection policy
– Which channel to transmit (probed channel or
unprobed channel)?
System Goal
• Gain is the difference between the
expected rate of successful transmission
and the cost of probing
• Maximize the system gain
– Need to jointly optimize the probing and
selection policies
– Policies will be adaptive
– Policies may depend on higher order statistics
and cross statistics of channels.
Sense of déjà vu?
• Stopping Time problem (Bertsekas, Dynamic
Programming, Vol. 1)
– Maximize reward by optimally selecting
between two control actions at any given time:
(a) continue or (b) stop
– IID channels (Kanodia et al, 2004, Ji et al, 2004)
• Generalized stopping time problems
– Select between multiple control actions
– Only limited results are known
Sense of déjà vu?
• Stochastic multi-armed bandit problem
(Bertsekas, Dynamic Programming, Vol. 2)
– Observe the state of only one arm at each slot
– Select the arm to observe and get reward
from it
– State of an arm changes only after you
observe it
– State changes are stochastic
– Gittins index based policies maximize the
reward (Gittins-Jones, 1974)
Sense of déjà vu?
• Adversarial multi-armed bandit problems (Auer et
al, 2002)
– State of an arm may change even when you don’t
observe it
– State changes are adversarial
– Some versions allow you to get reward from only one
selected arm and subsequently observe the states of
all arms in the slot
– Given a class of N policies and a time horizon of T
slots, bound the performance loss as compared to the
best of the policies in terms of O(sqrt(T * log N)).
Our Contribution
• Polynomial complexity optimal solution when
channels are mutually and temporally
independent and each channels has two states.
– Sigmetrics, 2006, poster paper
• Polynomial complexity approximate solution
when channels have multiple states
– Approximation ratio ½ for arbitrary number of states
– Approximation ration 2/3 for 3 states
System Model
•
•
•
•
Single user with access to n channels
Each channel can be in one of K states, 0,…K-1.
Channel i is in state j with probability pji
Probability of successful transmission in state j is
rj
• User knows the state of channel i in a slot only if
it probes it.
• Cost of probing channel i is ci
• Channel states are temporally and mutually
important
Optimal policy for K = 2
(Sigmetrics, 2006)
• Exhaustive (S, i) policy:
– probes all channels in set S,
– if no probed channel is in state 1, transmit in channel i (backup).
• Optimal policy is exhaustive (Si, i) for some i
– Si = {j: p1j(1-p1i) > cj}
– Probes all channels with high cross-uncertainty with the backup
channel
– Probes channels in decreasing order of p1j/cj
• Search space for optimal policy restricted (n policies)
• Determine the optimal by explicitly evaluating the gain in
the above class.
Additional challenges for K > 2
• For K = 2, the optimal policy is static
– Actions can be determined before observation
• For K > 3, the optimal policy may be adaptive
– While probed channels are in state 0, focus on channels with
high expected reward
– After observing a channel in intermediate state, focus on
channels which have higher probabilities of being in the highest
state
– After observing a channel in intermediate state, may select either
the observed channel or the backup.
• Optimal policy is a decision tree
– Exponential number of decision trees
– Storage of decision tree consumes exponential complexity
Constant Factor Approximation
Algorithm for K > 2
• Let policy A be the best among those that always
transmits in a probed channel
– Gain GA
• Let policy B be the best among those that
always transmits in an unprobed channel
– Policy B is clearly the one that never probes any
channel and transmits in the channel with the highest
expected transmission rate.
– Gain GB
• Theorem 1: Max(GA , GB) >= ½ Maximum Gain
– The best among A and B attains half the maximum
gain
Proof of Theorem 1
• Q: Probability that the optimal
policy transmit in an unprobed
channel
• OPT: Expected gain of the
optimal policy
• G: Expected gain of the optimal policy
given that it transmits in an unprobed
channel
Proof of Theorem 1
• Modify the optimal so that it transmits in the best probed
channel whenever it was using the backup
– Expected gain OPT’ which is less than or equal to GA
– Expected Gain given that the optimal uses a backup x
– OPT – OPT’ <= Q (G – x)
<= QG
<= GB
OPT <= OPT’ + GB
<= GA + GB
<= 2 max(GA , GB)
Missing link
• Need to compute the policy that
maximizes gains among those that
transmit only in probed channels.
• Once a channel is observed to be in state
u, probe only those channels for which the
expected improvement above u exceeds
cost.
Special case: K = 3
• Theorem 2: There exists an optimum
policy which uses a unique backup
channel.
– There exists a channel l which satisfies the
following characteristic: if the optimum policy
uses a backup it uses l.
Approximation algorithm for
attaining 2/3 approximation ratio
• Let P(l) be the class of policies which
– never probe channel l and
– never uses any channel as a backup other than l.
• Let C be the best policy in P(l) for the best
choice of l.
• Theorem 3: Max(GA , GB , GC) >= 2/3 Maximum
Gain
– Best among A, B, C attains 2/3 of the optimum gain.
Open Problems
• Is the optimization problem NP-hard when K exceeds 2?
• Can we get better approximation ratios?
• What happens when channels are not mutually
independent?
• What happens when channels are not temporally
independent?
• What happens when you can simultaneously transmit in
multiple channels?
– Power control
• What happens when you need not transmit in every slot?
– Lazy scheduling