The Multiplicative Weights Update Method
Based on Arora, Hazan & Kale (2005)
Mashor Housh
Oded Cats
Advanced simulation methods
Prof. Rubinstein
Outline

Weighted Majority Algorithm
  Binary case
  Generalized
Applications
  Game Theory – Zero-Sum games
  Linear Programming – Fractional Packing problem
  NP-hard problems – Set Cover problem
  Artificial intelligence (Boosting)
WMA – Binary case




N experts give their predictions
Our decision rule is a weighted majority of the experts' predictions
Initially, all experts have the same weight in our decision rule
The update rule for an expert that predicts incorrectly is:

$w_i^{t+1} = (1-\epsilon)\, w_i^t$
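As an illustration (not part of the original slides), here is a minimal Python sketch of this binary weighted-majority procedure; the 0/1 encoding of the predictions and the function name are our own choices:

```python
import numpy as np

def weighted_majority(predictions, outcomes, eps=0.5):
    """Binary Weighted Majority Algorithm.

    predictions: (T, n) array of 0/1 expert predictions
    outcomes:    (T,)  array of 0/1 realized outcomes
    Returns the number of mistakes made by the weighted-majority decision rule.
    """
    T, n = predictions.shape
    w = np.ones(n)                              # initially all experts weigh the same
    mistakes = 0
    for t in range(T):
        # decide by weighted majority: predict 1 if the weight behind "1" is at least half
        decision = 1 if w[predictions[t] == 1].sum() >= w.sum() / 2 else 0
        mistakes += int(decision != outcomes[t])
        # update rule: every incorrect expert loses a (1 - eps) factor
        w[predictions[t] != outcomes[t]] *= (1 - eps)
    return mistakes
```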
WMA – Binary case

This procedure yields gains/losses that are roughly as good as those of the best of these experts

Theorem 1 – The algorithm results in the following bound:

$m^t \le \frac{2\ln n}{\epsilon} + 2(1+\epsilon)\, m_i^t$

Where:
$m_i^t$ – the number of mistakes made by expert $i$ after $t$ steps
$m^t$ – the number of mistakes made by our algorithm after $t$ steps
WMA Binary case – Proof of Theorem 1
I.   By induction: $w_i^t = (1-\epsilon)^{m_i^t}$
II.  Define the 'potential function': $\Phi^t = \sum_i w_i^t$, so that $\Phi^1 = n$
III. Each time we make a mistake, at least half of the total weight decreases by a factor of $1-\epsilon$, so:
     $\Phi^{t+1} \le \Phi^t\left(\tfrac{1}{2} + \tfrac{1}{2}(1-\epsilon)\right) = \Phi^t\left(1 - \tfrac{\epsilon}{2}\right)$
IV.  By induction: $\Phi^t \le n\,(1-\epsilon/2)^{m^t}$
V.   Since $(1-\epsilon)^{m_i^t} = w_i^t \le \Phi^t$, taking logarithms and using $-\ln(1-x) \le x + x^2$ for $x \le \tfrac{1}{2}$ gives:
     $m^t \le \frac{2\ln n}{\epsilon} + 2(1+\epsilon)\, m_i^t$
WMA – Binary case : Example1

4 analysts give their predictions to the stock exchange: 3 are always wrong and the 4th is always right.

[Table: the daily predictions of experts 1–4 versus the actual market movement over days 1–4]
WMA – Binary case : Example1
(Cont.)

With $\epsilon = 0.5$ the expert weights evolve as follows:

Day   w1      w2      w3      w4    Balance of powers
 1    1       1       1       1     3/4
 2    0.5     0.5     0.5     1     1.5/1
 3    0.25    0.25    0.25    1     0.75/1
 4    0.125   0.125   0.125   1     0.375/1
WMA – Binary case : Example1

(Cont.)
Since our fourth analyst is never wrong, $m_4^t = 0$ for all $t$, so Theorem 1 gives

$m^t \le \frac{2\ln n}{\epsilon} + 2(1+\epsilon)\, m_4^t = \frac{2\ln 4}{0.5} + 2(1+0.5)\cdot 0 \approx 5.545$

i.e. the weighted-majority rule makes at most 5 mistakes in total (a quick numeric check follows).
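A quick numeric check of this bound (a throwaway snippet, assuming n = 4, ε = 0.5 and a perfect fourth analyst as in the example):

```python
import math

n, eps = 4, 0.5
m_best = 0                                   # the fourth analyst never errs
bound = 2 * math.log(n) / eps + 2 * (1 + eps) * m_best
print(bound)                                 # ~5.545, i.e. at most 5 mistakes in total
```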
WMA – Binary case : Example2

100 analysts give their predictions to the stock exchange:
  99 predict "up" with probability 0.05
  the 100th analyst predicts "up" with probability 0.99
The market goes up 99% of the time.
WMA – Binary case : Example2
(Cont.)
Generalization of the WMA




Set of events/outcomes (P) is not bounded
M (i, j ) is the penalty that expert i pays when the
outcome is j  P
D t  p1t , p2t ,..., pnt  is the distribution associated with the
experts
t
w
t
i
p
The probability to choose an expert is: i 
t
w
k
k

At every round we choose an expert according to D
and follow his advice
Generalization of the WMA (Cont.)

The update rule is (a short Python sketch follows below):

$w_i^{t+1} = \begin{cases} w_i^t\,(1-\epsilon)^{M(i,j^t)} & M(i,j^t) \ge 0 \\ w_i^t\,(1+\epsilon)^{-M(i,j^t)} & M(i,j^t) < 0 \end{cases}$
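A minimal sketch of one round of this randomized update (our own illustration; the helper name and the assumption that the penalties lie in [-1, 1] are ours):

```python
import numpy as np

def mw_round(w, penalties, eps=0.1, rng=np.random.default_rng()):
    """One round of the generalized (randomized) multiplicative weights update.

    w:         current expert weights, shape (n,)
    penalties: M(i, j_t) for the observed outcome j_t, shape (n,), assumed in [-1, 1]
    Returns (index of the expert whose advice we follow, updated weights).
    """
    p = w / w.sum()                          # the distribution D^t over the experts
    i = int(rng.choice(len(w), p=p))         # follow a randomly chosen expert's advice
    w_new = w.copy()
    pos = penalties >= 0
    w_new[pos] *= (1 - eps) ** penalties[pos]        # shrink on penalty
    w_new[~pos] *= (1 + eps) ** (-penalties[~pos])   # grow on reward
    return i, w_new
```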
Generalization of the WMA

The expected penalty of the randomized algorithm
is not much worse than that of the best expert

Theorem 2 – For any expert $i$, the algorithm results in:

$\sum_t M(D^t, j^t) \;\le\; \frac{\ln n}{\epsilon} \;+\; (1+\epsilon)\sum_{t:\, M(i,j^t) \ge 0} M(i,j^t) \;-\; (1-\epsilon)\sum_{t:\, M(i,j^t) < 0} M(i,j^t)$
WMA – Comparison via example

The stock market example again, now following a randomly chosen expert instead of taking a majority vote.

The penalty matrix alternates with the market direction: $M(i, j^t) = (1, 1, 1, 0)^\top$ when $t$ is odd and $M(i, j^t) = (0, 0, 0, 1)^\top$ when $t$ is even (the market alternates between the two outcomes).
WMA – Comparison via example (Cont.)

With penalty only l    0 1
WMA – Comparison via example (Cont.)
WMA – Comparison via example (Cont.)

With penalty and reward l    1 1
Generalization of the WMA - Example



4 weathermen give their forecasts
There are four possible weather conditions
The payoff matrix is (columns: Sunny, Cloudy, Rainy, Snowy):

$M(i,j) = \begin{pmatrix} 100 & 33 & 33 & 33 \\ 50 & 50 & 50 & 50 \\ 33 & 33 & 33 & 100 \\ 25 & 25 & 25 & 25 \end{pmatrix}$
Generalization of the WMA –
Example (Cont.)

The actual weather alternates between sunny and cloudy
Generalization of the WMA –
Example (Cont.)

The actual weather cycles through the four possible weather conditions
Applications

Define the following components in order to draw the analogy:
  Experts
  Events
  Payoff matrix
  Weights
  Update rule
Applications
Game theory






Zero-Sum games
Experts – the row player's pure strategies
Events – the column player's pure strategies
Payoff matrix $M(i,j)$ – the payoff to the row player when the row player plays strategy $i$ and the column player plays strategy $j$
A distribution over the experts represents a mixed row strategy
The game value (von Neumann's minimax theorem / Nash equilibrium) is:

$\lambda^* = \min_D \max_j M(D, j) = \max_P \min_i M(i, P)$
Applications
Game theory
Algorithm for solving a Zero-Sum game (a Python sketch applied to Example 1 appears below):
1) Initialize $W = (1,\dots,1)$ and determine $\epsilon$.
2) Draw a row strategy $i$ at random according to $D^t$.
3) The column player chooses the strategy that maximizes his revenue.
4) Update $w_i^{t+1} = w_i^t (1-\epsilon)^{M(i,j)}$ and set $t \leftarrow t+1$.
5) If $t \ge 16\ln(n)/\delta^2$ – stop. Otherwise – return to step 2.
Applications
Game theory – Example1
             j = 1   j = 2   j = 3   max over j
i = 1         1/4     1/3     1/2       1/2
i = 2         1/4     1/3     1/2       1/2
i = 3         1/4     1/3     1/2       1/2
min over i    1/4     1/3     1/2

The row player chooses the minimum of the maximum penalties; the column player chooses the maximum of the minimum penalties. Both coincide at the game value

$\lambda^* = \frac{1}{2}, \qquad D^* = \left(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right)$
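A compact Python sketch of the zero-sum algorithm above, applied to this example (our own illustration; it tracks the distribution directly instead of sampling a row each round, and treats M as the row player's loss in [0, 1]):

```python
import numpy as np

def solve_zero_sum(M, delta=0.2):
    """Approximate MW solver for a zero-sum game; M[i, j] is the row player's loss in [0, 1]."""
    n = M.shape[0]
    eps = delta / 4
    T = int(np.ceil(16 * np.log(n) / delta ** 2))
    w = np.ones(n)
    D_avg = np.zeros(n)
    values = []
    for _ in range(T):
        D = w / w.sum()
        j = int(np.argmax(D @ M))            # column player's best response to D
        values.append((D @ M)[j])
        w *= (1 - eps) ** M[:, j]            # multiplicative update of every row weight
        D_avg += D
    return D_avg / T, float(np.mean(values))

# Example 1 from the slides: every row of the penalty matrix is (1/4, 1/3, 1/2)
M = np.array([[0.25, 1 / 3, 0.5]] * 3)
print(M.max(axis=1).min(), M.min(axis=0).max())   # min-max = max-min = 0.5, the game value
D, value = solve_zero_sum(M)
print(D, value)                                   # near-uniform row strategy, value close to 0.5
```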
Applications
Game theory – Example1 (Cont.)
Applications
Game theory – Example2
             j = 1   j = 2   j = 3   max over j
i = 1         1/4     1/3     1/3       1/3
i = 2          0       0      3/4       3/4
i = 3         2/3     1/3     2/3       2/3
min over i     0       0      1/3

(1) The row player chooses a strategy at random from the current distribution, initially $p = (1/3, 1/3, 1/3)$
(2) The column player chooses the strategy that yields the maximum benefit for him
(3) The weights over the row strategies are updated: $w_i^{t+1} = w_i^t (1-\epsilon)^{M(i,j)}$

The game value is $\lambda^* = \frac{1}{3}$.
Applications
Game theory – Example2 (Cont.)
Applications
Artificial Intelligence




The objective is to learn an unknown function $c : X \to \{0,1\}$
A sequence of training examples is given: $(x, c(x))$
$D$ is a fixed, unknown distribution over the domain $X$
The learning algorithm outputs a hypothesis $h : X \to \{0,1\}$
[Figure: an example target function c(x) over x ∈ [0, 1]]

The error is: $\mathrm{Err} = E_{x \sim D}\big[\,|h(x) - c(x)|\,\big]$
Applications
Artificial Intelligence (Cont.)



Strong learning algorithm – for every $D$, $\epsilon > 0$ and $\delta > 0$: $\mathrm{Err}_{x\sim D} \le \epsilon$ with probability $\ge 1-\delta$
$\gamma$-weak learning algorithm – for every $D$, $\delta > 0$ and $\gamma > 0$: $\mathrm{Err}_{x\sim D} \le 1/2 - \gamma$ with probability $\ge 1-\delta$
Boosting – combining several moderately accurate rules-of-thumb into a single highly accurate prediction rule (a minimal code sketch follows)
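A minimal sketch of boosting implemented with multiplicative weights over the training examples (our own illustration; `weak_learner` is a hypothetical callback with edge γ on the weighted sample, and using γ itself as the update step size is our choice):

```python
import numpy as np

def boost(X, y, weak_learner, gamma=0.1, eps=0.1):
    """Boosting via multiplicative weights over the training examples.

    X, y:         training examples and their 0/1 labels
    weak_learner: callback (X, y, distribution) -> hypothesis h, where h(X) is a 0/1 array
    Returns the list of weak hypotheses; predict with majority_vote below.
    """
    n = len(y)
    T = int(np.ceil(2 / gamma ** 2 * np.log(1 / eps)))
    w = np.ones(n)
    hypotheses = []
    for _ in range(T):
        p = w / w.sum()
        h = weak_learner(X, y, p)
        correct = h(X) == y                # M(i, h) = 1 where the hypothesis is right
        w[correct] *= (1 - gamma)          # down-weight examples we already classify well
        hypotheses.append(h)
    return hypotheses

def majority_vote(hypotheses, X):
    votes = np.mean([h(X) for h in hypotheses], axis=0)
    return (votes >= 0.5).astype(int)
```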
Applications
Artificial Intelligence (Cont.)




Experts – the examples in the training set
Events – the set of all hypotheses that can be generated by the weak learning algorithm
Payoff matrix:

$M(i,j) = \begin{cases} 1 & h_j(x_i) = c(x_i) \\ 0 & h_j(x_i) \ne c(x_i) \end{cases}$

The final hypothesis is obtained by a majority vote among $h^1(x), h^2(x), \dots, h^T(x)$, where $T = \frac{2}{\gamma^2}\ln\frac{1}{\epsilon}$
Applications
Linear Programming

Finding a feasible solution for a set of $m$ constraints $Ax \ge b$
Experts – the constraints
Events – candidate solution vectors $x$
Payoff matrix $M(i, x)$ – the distance from satisfying constraint $i$: $A_i x - b_i$
The final solution is $\bar{x} = \frac{1}{T}\sum_t x^t$
Keep track of the cases in which there is no feasible solution
Applications
Linear Programming (Cont.)
Algorithm for finding a feasible solution to an LP problem:
1) Initialize $\epsilon = \tfrac{1}{4}$, $W = (1,\dots,1)$ and the resulting $P$.
2) Call an oracle which solves the following feasibility problem with a single constraint plus a set of easy constraints (Plotkin, Shmoys and Tardos):
   $c^T x \ge d$, where $c = \sum_i p_i^t A_i$ and $d = \sum_i p_i^t b_i$.
   If there is no feasible solution – break (the original system is infeasible).
Applications
Linear Programming (Cont.)
Algorithm for finding a feasible solution to an LP problem (Cont.; a Python sketch follows below):
3) Update
   $w_i^{t+1} = \begin{cases} w_i^t\,(1-\epsilon)^{M(i,x^t)} & M(i,x^t) \ge 0 \\ w_i^t\,(1+\epsilon)^{-M(i,x^t)} & M(i,x^t) < 0 \end{cases}$
   where $M(i,x) = A_i x - b_i$.
4) Update
   $P^{t+1} = \frac{w^{t+1}}{\sum_i w_i^{t+1}}$
5) If the prescribed number of iterations $T$ has been reached – stop. Otherwise – return to step 2.
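A sketch of this loop in Python (ours, not from the slides); the Plotkin–Shmoys–Tardos oracle role is played here by a box oracle, i.e. we assume the "easy" constraints are simple bounds, so maximizing $c^T x$ over the box and comparing with $d$ is a valid oracle:

```python
import numpy as np

def mw_lp_feasibility(A, b, lower, upper, eps=0.25, T=500):
    """Search for x in the box [lower, upper] with A x >= b via multiplicative weights.

    Experts are the m constraints; each round an oracle solves the single
    aggregated constraint c^T x >= d over the box.
    Returns the averaged solution, or None if the aggregated problem is infeasible.
    """
    m, _ = A.shape
    w = np.ones(m)
    xs = []
    for _ in range(T):
        p = w / w.sum()
        c, d = p @ A, p @ b
        x = np.where(c >= 0, upper, lower)   # oracle: maximize c^T x over the box
        if c @ x < d:
            return None                      # even the aggregated constraint is infeasible
        xs.append(x)
        penalties = A @ x - b                # M(i, x) = A_i x - b_i
        scale = np.abs(penalties).max()
        if scale > 0:                        # rescale so the penalties lie in [-1, 1]
            penalties = penalties / scale
        pos = penalties >= 0
        w[pos] *= (1 - eps) ** penalties[pos]
        w[~pos] *= (1 + eps) ** (-penalties[~pos])
    return np.mean(xs, axis=0)

# the example from the next slide: A x >= b over the box [0, 30]^3
A = np.array([[1., 1., 1.], [3., 2., 4.], [3., 2., 0.]])
b = np.array([20., 42., 30.])
x_bar = mw_lp_feasibility(A, b, lower=np.zeros(3), upper=np.full(3, 30.))
print(x_bar, A @ x_bar >= b)
```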
Applications
Linear Programming - Example

Finding a feasible solution to the following problem:

$A = \begin{pmatrix} 1 & 1 & 1 \\ 3 & 2 & 4 \\ 3 & 2 & 0 \end{pmatrix}, \qquad b = \begin{pmatrix} 20 \\ 42 \\ 30 \end{pmatrix}$

Upper bound $= (30, 30, 30)$, lower bound $= (0, 0, 0)$

Solution: $X = (23.06,\ 20.9,\ 19.71)$
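A quick feasibility check of the reported solution (a throwaway NumPy snippet of ours):

```python
import numpy as np

A = np.array([[1, 1, 1], [3, 2, 4], [3, 2, 0]], dtype=float)
b = np.array([20, 42, 30], dtype=float)
x = np.array([23.06, 20.9, 19.71])
print(A @ x)          # [63.67, 189.82, 110.98]
print(A @ x >= b)     # all True, and x lies inside the box [0, 30]^3
```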
Applications
Fractional Vertex Covering problem

Finding a feasible solution for a set of $m$ constraints $Ax \ge b$, where $A, b \ge 0$ (so $Ax \ge 0$)
Experts – the constraints
Events – candidate solution vectors $x$
Payoff matrix $M(i, x)$ – $A_i x$
The final solution is $\bar{x} = \frac{1}{T}\sum_t x^t$
Keep track of the cases in which there is no feasible solution
Applications
Fractional Vertex Covering problem
(Cont.)
Algorithm for finding a feasible solution to a Fractional Covering problem:
1) Initialize $\epsilon = \tfrac{1}{4}$, $W = (1,\dots,1)$ and the resulting $P$.
2) Call an oracle which solves the following feasibility problem with a single constraint plus a set of easy constraints (Plotkin, Shmoys and Tardos):
   $c^T x \ge d$, where $c = \sum_i p_i^t A_i$ and $d = \sum_i p_i^t b_i$.
   If there is no feasible solution – break.
Applications
Fractional Vertex Covering problem (Cont.)
Algorithm for finding a feasible solution to a Fractional Covering problem (Cont.):
3) Update $w_i^{t+1} = w_i^t\,(1-\epsilon)^{M(i, x^t)}$, where $M(i, x) = A_i x$.
4) Update $P^{t+1} = \frac{w^{t+1}}{\sum_i w_i^{t+1}}$
5) If the prescribed number of iterations $T$ has been reached – stop. Otherwise – return to step 2.
Applications
Flow problems


Maximum multi-commodity flow problem
A set of source–sink pairs and capacity-constrained edges:

$\max \sum_p f_p \quad \text{s.t.} \quad \sum_{p:\, e \in p} f_p \le c_e \;\; \forall e$
Applications
Flow problems (Cont.)


Experts – the edges
Events – a flow of value $c_p^t$ on a path $p$, where $c_p^t$ is the minimum capacity of an edge on the path
Payoff matrix: $M(e, p^t) = \frac{c_p^t}{c_e}$ for $e \in p^t$
Update rule: $w_e^{t+1} = w_e^t\,(1+\epsilon)^{c_p^t / c_e}$ for $e \in p^t$
Termination rule: stop once some edge weight satisfies $w_e^T \ge 1$ (a toy sketch follows below)
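A toy Python sketch in the spirit of this update (a Garg–Könemann-style routine of our own; the explicit path list, the small initial weight δ and the final rescaling are our assumptions, not details given on the slide):

```python
import numpy as np

def max_flow_mw(paths, cap, eps=0.1):
    """Garg-Konemann-style flow via multiplicative weights on the edges.

    paths: candidate paths, each a list of edge indices
    cap:   array of edge capacities c_e
    Returns an (approximately feasible) flow value for each path.
    """
    m = len(cap)
    delta = (1 + eps) / ((1 + eps) * m) ** (1 / eps)   # small initial edge weight
    w = np.full(m, delta)
    flow = np.zeros(len(paths))
    while True:
        # route along the path that is currently cheapest under the edge weights
        k = int(np.argmin([sum(w[e] / cap[e] for e in p) for p in paths]))
        if any(w[e] >= 1 for e in paths[k]):           # termination: an edge weight reached 1
            break
        c_p = min(cap[e] for e in paths[k])            # bottleneck capacity of the chosen path
        flow[k] += c_p
        for e in paths[k]:                             # multiplicative increase on the used edges
            w[e] *= (1 + eps) ** (c_p / cap[e])
    # scaling the routed flow down by log_{1+eps}((1+eps)/delta) keeps it within the capacities
    return flow / (np.log((1 + eps) / delta) / np.log(1 + eps))

# two disjoint 2-edge paths with unit capacities
print(max_flow_mw(paths=[[0, 1], [2, 3]], cap=np.ones(4)))   # close to 1 unit on each path
```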
Applications
Set Cover problem

Find the minimal number of subsets in the collection $C$ whose union equals the universe $U$
Experts – the elements of the universe $U$
Events – the sets in the collection $C$
Payoff matrix:

$M(i, C_j) = \begin{cases} 1 & i \in C_j \\ 0 & i \notin C_j \end{cases}$
Applications
Set Cover problem (Cont.)

Update rule (with $\epsilon = 1$): $w_i^{t+1} = w_i^t\,(1 - \epsilon\, M(i, C_j))$, i.e.
  $i \in C_j \;\Rightarrow\; w_i^{t+1} = 0$
  $i \notin C_j \;\Rightarrow\; w_i^{t+1} = 1$

At each round we choose the set $C_j$ which maximizes
  $\sum_i p_i\, M(i, C_j) = \sum_{i \in C_j} \frac{w_i}{\sum_k w_k}$
i.e. the set covering the most still-uncovered elements (the Greedy Set Cover Algorithm; a small sketch follows below).
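A Python sketch of this greedy/MW view of set cover, referenced above (our illustration; the example instance is our reading of the incidence matrix on the later slides):

```python
def greedy_set_cover(universe, collection):
    """Greedy set cover, phrased via the MW element weights with eps = 1.

    universe:   set of elements (assumed to be coverable by the collection)
    collection: dict mapping a set's name to the elements it contains
    Returns the list of chosen set names.
    """
    w = {i: 1 for i in universe}          # one expert (weight) per element
    chosen = []
    while any(w.values()):                # while some element is still uncovered
        # pick the set with the largest total weight of still-uncovered elements
        best = max(collection, key=lambda s: sum(w[i] for i in collection[s]))
        chosen.append(best)
        for i in collection[best]:        # eps = 1: covered elements drop to weight 0
            w[i] = 0
    return chosen

# our reading of the vertex-cover instance from the next slides (node -> edges it covers)
collection = {1: {1, 2, 4}, 2: {2, 3, 5}, 3: {1, 3}, 4: {6}, 5: {4, 5, 6}}
print(greedy_set_cover({1, 2, 3, 4, 5, 6}, collection))      # e.g. [1, 2, 4]
```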
Applications
Vertex covering problem - Example
[Figure: an example graph with 5 nodes and 6 labeled edges]
$\min \sum_{n \in N} x_n$
s.t. $x_u + x_v \ge 1 \;\; \forall (u,v) \in E$
$x_n \in \{0,1\} \;\; \forall n \in N$

The edge–node incidence matrix (rows = edges 1–6, columns = nodes 1–5):

$M(i,j) = \begin{pmatrix} 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix}$
Applications
Vertex covering problem – Example (Cont.)

Find the minimum number of nodes (subsets) that cover all of the edges ($E$, the set of edges, is the universe).

The edge sets covered by the nodes are $\{2,3,5\}$, $\{1,2,4\}$, $\{6\}$, $\{1,3\}$ and $\{4,5,6\}$.

With $\epsilon = 1$, the number of selected sets satisfies $T \le \ln(n) \cdot X_{opt}$.
Applications
Vertex covering problem – Example (Cont.)
The maximum subset which includes edge 1.000000 is: node = 1
The selected nodes are:
c = 1

Iteration: 2.000000
The probability for edge i
p = 0  0  0.3333  0  0.3333  0.3333
Choose edge following the distribution
i = 3
The maximum subset which includes edge 3.000000 is: node = 2
The selected nodes are:
c = 1  2

Iteration: 3.000000
The probability for edge i
p = 0  0  0  0  0  1
Choose edge following the distribution
i = 6
The maximum subset which includes edge 6.000000 is: node = 4
The selected nodes are:
c = 1  2  4
Applications
Vertex covering problem – Example (Cont.)
Iteration: 1.000000
The probability for edge i
p = 0.1667  0.1667  0.1667  0.1667  0.1667  0.1667
Choose edge following the distribution i = 6
The maximum subset which includes edge 6.000000 is: node = 5
The selected nodes are:
c = 5

Iteration: 2.000000
The probability for edge i
p = 0.3333  0.3333  0.3333  0  0  0
Choose edge following the distribution i = 3
The maximum subset which includes edge 3.000000 is: node = 2
The selected nodes are:
c = 5  2

Iteration: 3.000000
The probability for edge i
p = 1  0  0  0  0  0
Choose edge following the distribution i = 1
The maximum subset which includes edge 1.000000 is: node = 1
The selected nodes are:
c = 5  2  1
Summary




This paper presents a comprehensive survey of the multiplicative weights update method
Various fields independently developed methods that share a common ground and can be generalized into one conceptual procedure
The procedure consists of determining the experts, the events, the penalty matrix, the weights and the update rule
Additional required inputs: the error tolerance, the size of $\epsilon$ and the number of iterations