The Multiplicative Weights Update Method
Based on Arora, Hazan & Kale (2005)
Mashor Housh
Oded Cats
Advanced simulation methods
Prof. Rubinstein
Outline
Weighted Majority Algorithm
Binary case
Generalized
Applications
Game Theory
Linear Programming
Fractional Packing problem
NP-hard problems
Zero-Sum game
Set Cover problem
Artificial intelligence (Boosting)
WMA – Binary case
N experts give their predictions
Our decision rule is a weighted majority of the expert
predictions
Initially, all experts have the same weight in our decision rule
The update rule for incorrect experts is: $w_i^{t+1} = (1-\epsilon)\, w_i^t$
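A minimal Python sketch of this binary weighted-majority procedure (the 0/1 encoding of predictions and the function name are illustrative, not taken from the slides):

```python
import numpy as np

def weighted_majority(predictions, outcomes, eps=0.5):
    """predictions[t][i] and outcomes[t] are 0/1; returns final weights and mistake count."""
    n = len(predictions[0])
    w = np.ones(n)                                # all experts start with equal weight
    mistakes = 0
    for preds, outcome in zip(predictions, outcomes):
        preds = np.asarray(preds)
        # weighted majority vote: predict 1 iff the weight behind "1" is at least half
        vote = 1 if w[preds == 1].sum() >= w.sum() / 2 else 0
        if vote != outcome:
            mistakes += 1
        # penalize every expert that was wrong: w_i <- (1 - eps) * w_i
        w[preds != outcome] *= (1 - eps)
    return w, mistakes
```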
WMA – Binary case
This procedure yields gains/losses that are roughly as good as those of the best of these experts
Theorem 1 – The algorithm results in the following bound:
$m^t \le 2(1+\epsilon)\, m_i^t + \dfrac{2\ln n}{\epsilon}$
Where:
$m_i^t$ – the number of mistakes made by expert $i$ after $t$ steps
$m^t$ – the number of mistakes made by our algorithm
WMA Binary case – Proof of Theorem 1
I. By induction: $w_i^t = (1-\epsilon)^{m_i^t}$
II. Define the 'potential function': $\Phi^t = \sum_i w_i^t$, so $\Phi^1 = n$
III. Each time we make a mistake, at least half of the total weight decreases by a factor of $(1-\epsilon)$, so:
$\Phi^{t+1} \le \Phi^t \left(\tfrac{1}{2} + \tfrac{1}{2}(1-\epsilon)\right) = \Phi^t \left(1 - \tfrac{\epsilon}{2}\right)$
IV. By induction: $\Phi^t \le n\,(1 - \epsilon/2)^{m^t}$
V. Combining I and IV and using $-\ln(1-x) \le x + x^2$ for $x < \tfrac{1}{2}$:
$m^t \le 2(1+\epsilon)\, m_i^t + \dfrac{2\ln n}{\epsilon}$
WMA – Binary case : Example1
4 analysts give their predictions about the stock exchange:
3 are always wrong and the 4th is always right
[Table: each expert's daily prediction vs. the actual market outcome for days 1–4; experts 1–3 are wrong every day, expert 4 is right every day.]
WMA – Binary case : Example1 (Cont.)
With $\epsilon = 0.5$, the weights evolve as follows:
Day | w1    | w2    | w3    | w4 | Balance of powers
1   | 1     | 1     | 1     | 1  | 3/4
2   | 0.5   | 0.5   | 0.5   | 1  | 1.5/1
3   | 0.25  | 0.25  | 0.25  | 1  | 0.75/1
4   | 0.125 | 0.125 | 0.125 | 1  | 0.375/1
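Using the weighted_majority sketch from the binary-case slide, this weight table can be reproduced; encoding "up" as 1 is an illustrative assumption:

```python
days = 3
outcomes = [1] * days                                # the market goes up every day (assumed encoding)
predictions = [[0, 0, 0, 1] for _ in range(days)]    # analysts 1-3 always wrong, analyst 4 always right
w, mistakes = weighted_majority(predictions, outcomes, eps=0.5)
print(w)   # [0.125 0.125 0.125 1.   ] -- the weights entering day 4 in the table above
```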
WMA – Binary case : Example1 (Cont.)
Since our fourth analyst is never wrong: $m_4^t = 0 \;\;\forall t$
$m^t \le 2(1+\epsilon)\, m_4^t + \dfrac{2\ln n}{\epsilon} = 2(1+0.5)\cdot 0 + \dfrac{2\ln 4}{0.5} \approx 5.545$
WMA – Binary case : Example2
100 analysts give their predictions about the stock exchange:
99 predict 'up' with probability 0.05
The 100th expert predicts 'up' with probability 0.99
The market goes up 99% of the time.
WMA – Binary case : Example2
(Cont.)
Generalization of the WMA
The set of events/outcomes $P$ is no longer restricted to the binary case
$M(i, j)$ is the penalty that expert $i$ pays when the outcome is $j \in P$
$D^t = \{p_1^t, p_2^t, \ldots, p_n^t\}$ is the distribution over the experts
The probability to choose an expert is: $p_i^t = \dfrac{w_i^t}{\sum_k w_k^t}$
At every round we choose an expert according to D
and follow his advice
Generalization of the WMA (Cont.)
The update rule is:
$w_i^{t+1} = \begin{cases} w_i^t\,(1-\epsilon)^{M(i, j^t)} & M(i, j^t) \ge 0 \\ w_i^t\,(1+\epsilon)^{-M(i, j^t)} & M(i, j^t) < 0 \end{cases}$
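A sketch of one round of the generalized method, combining the sampling step and the update rule above; it assumes penalties M(i, j) lie in [-1, 1], and the function name is illustrative:

```python
import numpy as np

def mw_round(w, penalties, eps, rng=None):
    """One round of the generalized multiplicative weights method.

    w         -- current weights, one per expert
    penalties -- M(i, j^t) for the observed outcome j^t, assumed to lie in [-1, 1]
    """
    rng = rng or np.random.default_rng()
    p = w / w.sum()                          # the distribution D^t over the experts
    chosen = rng.choice(len(w), p=p)         # follow the advice of a randomly chosen expert
    m = np.asarray(penalties, dtype=float)
    # update rule: (1-eps)^M for M >= 0 and (1+eps)^(-M) for M < 0
    w_new = np.where(m >= 0, w * (1 - eps) ** m, w * (1 + eps) ** (-m))
    return w_new, p, chosen
```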
Generalization of the WMA
The expected penalty of the randomized algorithm
is not much worse than that of the best expert
Theorem 2 – For every expert $i$, the algorithm results in:
$\sum_t M(D^t, j^t) \le \dfrac{\ln n}{\epsilon} + (1+\epsilon)\sum_{t:\, M(i, j^t) \ge 0} M(i, j^t) + (1-\epsilon)\sum_{t:\, M(i, j^t) < 0} M(i, j^t)$
WMA – Comparison via example
The stock market example, using randomized expert selection instead of a majority vote
[Penalty matrix $M(i, j^t)$ for the four analysts (0/1 entries), together with the market outcome for odd and even $t$.]
WMA – Comparison via example (Cont.)
With penalty only: $\ell \in [0, 1]$
WMA – Comparison via example (Cont.)
With penalty and reward: $\ell \in [-1, 1]$
Generalization of the WMA - Example
Four weathermen give their forecasts
There are four possible weather conditions
The payoff matrix is:
            Sunny  Cloudy  Rainy  Snowy
Expert 1:    100     33      33     33
Expert 2:     50     50      50     50
Expert 3:     33     33      33    100
Expert 4:     25     25      25     25
Generalization of the WMA – Example (Cont.)
The actual weather alternates between sunny and cloudy
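A short usage sketch for this example, feeding the payoff table into the mw_round sketch above; treating the payoffs as rewards (negative penalties) rescaled to [0, 1] is an assumption made here for illustration, not something stated on the slide:

```python
M = np.array([[100, 33, 33,  33],
              [ 50, 50, 50,  50],
              [ 33, 33, 33, 100],
              [ 25, 25, 25,  25]], dtype=float)

weather = [0, 1] * 5                 # sunny (column 0) and cloudy (column 1) alternately
w = np.ones(4)
for j in weather:
    w, p, _ = mw_round(w, -M[:, j] / 100.0, eps=0.1)   # rewards enter as negative penalties
print(w / w.sum())                   # most of the weight ends up on forecaster 1
```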
Generalization of the WMA –
Example (Cont.)
The actual weather cycles through the four possible weather conditions
Applications
Define the following components in order to draw the analogy:
Experts
Events
Payoff matrix
Weights
Update rule
Applications
Game theory
Zero-Sum games
Experts – pure strategies of the row player
Events – pure strategies of the column player
Payoff matrix M (i, j ) – the payoff to the row player,
when the row player plays strategy i and the column
player plays strategy j
A distribution on the experts represents a mixed row
strategy
The game value is (von Neumann's minimax theorem and Nash equilibrium):
$\lambda^* = \min_D \max_j M(D, j) = \max_P \min_i M(i, P)$
Applications
Game theory
Algorithm for solving a Zero-Sum game:
1) Initialize $W = (1, \ldots, 1)$. Determine $\epsilon$.
2) Draw a row strategy $i$ at random according to $D^t$.
3) The column player chooses the strategy $j$ that maximizes his revenue.
4) Update $w_i^{t+1} = w_i^t\,(1-\epsilon)^{M(i, j)}$.
5) $t \leftarrow t + 1$. If $t \ge 16\ln(n)/\epsilon^2$, stop. Otherwise – return to step 2.
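A compact Python sketch of this procedure, reading M(i, j) as the loss the row player pays. The slide samples a row strategy at random each round; the sketch below plays the full distribution D^t instead (an expected-loss variant) to keep the example deterministic, which is an assumption, as is averaging the per-round losses into a value estimate:

```python
import numpy as np

def solve_zero_sum(M, eps=0.1):
    """Approximate the optimal row strategy and the game value of the loss matrix M."""
    n = M.shape[0]
    T = int(np.ceil(16 * np.log(n) / eps ** 2))   # iteration bound from step 5
    w = np.ones(n)
    avg_D, losses = np.zeros(n), []
    for _ in range(T):
        D = w / w.sum()                    # current mixed row strategy D^t
        j = int(np.argmax(D @ M))          # column player maximizes his revenue against D^t
        losses.append(float(D @ M[:, j]))  # expected loss of the row player this round
        w *= (1 - eps) ** M[:, j]          # penalize every row strategy by its loss
        avg_D += D
    return avg_D / T, float(np.mean(losses))
```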
Applications
Game theory – Example1

i \ j   |  1    2    3   | max(i)
1       | 1/4  1/3  1/2  | 1/2
2       | 1/4  1/3  1/2  | 1/2
3       | 1/4  1/3  1/2  | 1/2
min(j)  | 1/4  1/3  1/2  |

The row player chooses the minimum of the maximum penalties; the column player chooses the maximum of the minimum penalties:
$D = \left(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right), \qquad \lambda^* = \tfrac{1}{2}$
Applications
Game theory – Example1 (Cont.)
Applications
Game theory – Example2

i \ j   |  1    2    3   | max(i)
1       | 1/4  1/3  1/3  | 1/3
2       |  0    0   3/4  | 3/4
3       | 2/3  1/3  2/3  | 2/3
min(j)  |  0    0   1/3  |

(1) The row player chooses a strategy at random from $p = (1/3,\, 1/3,\, 1/3)$
(2) The column player chooses the strategy that yields the maximum benefit for him
(3) The weights over the row strategies are updated: $w_i^{t+1} = w_i^t\,(1-\epsilon)^{M(i, j)}$

$\lambda^* = \tfrac{1}{3}$
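As a check of this example, the solve_zero_sum sketch from the algorithm slide can be run on the matrix above; the reported value should come out close to λ* = 1/3, up to an ε-dependent error:

```python
M = np.array([[1/4, 1/3, 1/3],
              [0.0, 0.0, 3/4],
              [2/3, 1/3, 2/3]])
D, value = solve_zero_sum(M, eps=0.05)
print(D, value)    # D concentrates on row 1 and value is close to 1/3
```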
Applications
Game theory – Example2 (Cont.)
Applications
Artificial Intelligence
The objective is to learn an unknown function $c : X \to \{0, 1\}$
A sequence of training examples is given: $(x, c(x))$
$D$ is the fixed unknown distribution on the domain $X$
The learning algorithm results in a hypothesis $h : X \to \{0, 1\}$
[Figure: an example target function $c(x)$ plotted over $x \in [0, 1]$.]
The error is: $\Pr_{x \sim D}\left[h(x) \ne c(x)\right]$
Applications
Artificial Intelligence (Cont.)
Strong learning algorithm – for every $D$, $\epsilon > 0$ and $\delta > 0$: $\Pr_{x \sim D}[h(x) \ne c(x)] \le \epsilon$ with probability $\ge 1 - \delta$
$\gamma$-weak learning algorithm – for every $D$, $\gamma > 0$ and $\delta > 0$: $\Pr_{x \sim D}[h(x) \ne c(x)] \le \tfrac{1}{2} - \gamma$ with probability $\ge 1 - \delta$
Boosting – combining several moderately accurate rules-of-thumb into a single highly accurate prediction rule.
Applications
Artificial Intelligence (Cont.)
Experts – samples in the training set
Events – set of all hypotheses that can be generated
by the weak learning algorithm
Payoff matrix –
$M(i, j) = \begin{cases} 1 & h(x) = c(x) \\ 0 & h(x) \ne c(x) \end{cases}$
The final hypothesis is obtained via a majority vote among $h^1(x), h^2(x), \ldots, h^T(x)$, where $T = \dfrac{2}{\gamma^2}\ln\dfrac{1}{\epsilon}$
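A minimal boosting sketch along the lines above: the "experts" are the training examples, each round a weak hypothesis is produced for the current distribution, correctly classified examples are down-weighted, and the final rule is a majority vote over the T weak hypotheses. The weak_learner interface is an assumption made for illustration:

```python
import numpy as np

def boost(X, y, weak_learner, T, eps=0.25):
    """y in {0,1}; weak_learner(X, y, dist) must return a function predict(X) -> {0,1}."""
    n = len(y)
    w = np.ones(n)                        # one weight per training example
    hypotheses = []
    for _ in range(T):
        dist = w / w.sum()
        h = weak_learner(X, y, dist)      # weak hypothesis for the current distribution
        hypotheses.append(h)
        correct = (h(X) == y)
        w[correct] *= (1 - eps)           # M(i, h) = 1 when example i is classified correctly
    def majority(Xq):
        votes = np.mean([h(Xq) for h in hypotheses], axis=0)
        return (votes >= 0.5).astype(int) # majority vote over h^1, ..., h^T
    return majority
```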
Applications
Linear Programming
Finding a feasible solution for a set of $m$ constraints: $Ax \ge b$
Experts – constraints
Events – solution vectors $x$
Payoff matrix $M(i, x)$ – the distance from satisfying the constraint: $A_i x - b_i$
The final solution is $\bar{x} = \dfrac{1}{T}\sum_t x^t$
Track the cases in which there is no feasible solution
Applications
Linear Programming (Cont.)
Algorithm for finding a feasible solution to an LP problem:
1) Initialize $\epsilon = \tfrac{1}{4}$, $W = (1, \ldots, 1)$ and the resulting distribution $P$.
2) Use an oracle which solves the following feasibility problem with a single constraint plus a set of easy constraints (Plotkin, Shmoys and Tardos):
$c^T x \ge d$, where $c = \sum_i p_i^t A_i$ and $d = \sum_i p_i^t b_i$
If there is no feasible solution – break.
Applications
Linear Programming (Cont.)
Algorithm for finding a feasible solution to an LP problem (Cont.):
3) Update
$w_i^{t+1} = \begin{cases} w_i^t\,(1-\epsilon)^{M(i, x)} & M(i, x) \ge 0 \\ w_i^t\,(1+\epsilon)^{-M(i, x)} & M(i, x) < 0 \end{cases}$
Where: $M(i, x) = A_i x - b_i$
4) Update $p_i^{t+1} = \dfrac{w_i^{t+1}}{\sum_k w_k^{t+1}}$
5) If the number of iterations reaches the bound $T$ (which grows like $\ln(m)/\epsilon^2$), stop. Otherwise – return to step 2.
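A sketch of this procedure for the case where the "easy" constraint set P is a box of lower/upper bounds on x, so the oracle simply maximizes the aggregated constraint c^T x over the box; rescaling the slacks into [-1, 1] before the update is an extra assumption made here to keep the exponents well-behaved:

```python
import numpy as np

def mw_lp_feasibility(A, b, lo, hi, eps=0.25, T=500):
    """Look for x in the box [lo, hi] that (approximately) satisfies A x >= b."""
    A, b = np.asarray(A, float), np.asarray(b, float)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    m = A.shape[0]
    w = np.ones(m)
    xs = []
    for _ in range(T):
        p = w / w.sum()
        c, d = p @ A, p @ b                       # aggregated single constraint c^T x >= d
        x = np.where(c >= 0, hi, lo)              # box oracle: maximize c^T x over [lo, hi]
        if c @ x < d:
            return None                           # oracle infeasible -> no point in the box works
        xs.append(x)
        slack = A @ x - b                         # M(i, x) = A_i x - b_i
        width = max(np.max(np.abs(slack)), 1e-12) # rescale penalties into [-1, 1]
        mvals = slack / width
        w = np.where(mvals >= 0, w * (1 - eps) ** mvals, w * (1 + eps) ** (-mvals))
    return np.mean(xs, axis=0)                    # final answer: average of the oracle outputs
```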
Applications
Linear Programming - Example
Finding a feasible solution to the following problem:
$A = \begin{pmatrix} 1 & 1 & 1 \\ 3 & 2 & 4 \\ 3 & 2 & 0 \end{pmatrix}, \qquad b = \begin{pmatrix} 20 \\ 42 \\ 30 \end{pmatrix}$
Upper bound: $(30, 30, 30)$
Lower bound: $(0, 0, 0)$
Solution: $X = (23.06,\, 20.9,\, 19.71)$
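A quick usage check of the sketch above on this instance (reading the constraints as A x >= b with the box bounds shown); since the simplified box oracle only returns corners of the box, the point it finds need not match the solution reported on the slide, but it should still be feasible:

```python
A = [[1, 1, 1],
     [3, 2, 4],
     [3, 2, 0]]
b = [20, 42, 30]
x = mw_lp_feasibility(A, b, lo=[0, 0, 0], hi=[30, 30, 30])
print(x, np.asarray(A) @ x >= np.asarray(b))   # a feasible point: every constraint holds
```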
Applications
Fractional Vertex Covering problem
Finding a feasible solution for a set of $m$ constraints: $Ax \ge b$, where $A, b \ge 0$ and $x \ge 0$
Experts – constraints
Events – solution vectors $x$
Payoff matrix $M(i, x)$ – $A_i x$
The final solution is $\bar{x} = \dfrac{1}{T}\sum_t x^t$
Track the cases in which there is no feasible solution
Applications
Fractional Vertex Covering problem
(Cont.)
Algorithm for finding a feasible solution to a Fractional Covering problem:
1) Initialize $\epsilon = \tfrac{1}{4}$, $W = (1, \ldots, 1)$ and the resulting distribution $P$.
2) Use an oracle which solves the following feasibility problem with a single constraint plus a set of easy constraints (Plotkin, Shmoys and Tardos):
$c^T x \ge d$, where $c = \sum_i p_i^t A_i$ and $d = \sum_i p_i^t b_i$
If there is no feasible solution – break.
Applications
Fractional Vertex Covering problem (Cont.)
Algorithm for finding a feasible solution to a Fractional Covering problem (Cont.):
3) Update $w_i^{t+1} = w_i^t\,(1-\epsilon)^{M(i, x)}$, where $M(i, x) = A_i x$
4) Update $p_i^{t+1} = \dfrac{w_i^{t+1}}{\sum_k w_k^{t+1}}$
5) If the number of iterations reaches the bound $T$ (which grows like $\ln(m)/\epsilon^2$), stop. Otherwise – return to step 2.
Applications
Flow problems
Maximum multi-commodity flow problem
A set of source-sink pairs and capacity-constrained edges
$\max \sum_p f_p \quad \text{s.t.} \quad \sum_{p:\, e \in p} f_p \le c_e \;\;\forall e$
Applications
Flow problems (Cont.)
Experts – edges
Events – a flow of value $c_p^t$ routed on a path $p$, where $c_p^t$ is the minimum capacity of an edge on the path
Payoff matrix: $M(e, p^t) = \dfrac{c_p^t}{c_e}$
Update rule: $w_e^{t+1} = w_e^t\,(1+\epsilon)^{c_p^t / c_e}$ for every edge $e \in p^t$
Terminate rule: stop once some edge weight satisfies $w_e^T \ge 1$
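A sketch of this scheme on a small instance where the candidate paths are listed explicitly. Several details are assumptions, since the slide leaves them open: paths are chosen by cheapest total weight (a Garg–Könemann-style oracle), the weights start at a small value delta rather than 1 so that the stopping rule w_e >= 1 is meaningful, and the accumulated flow is scaled down by its worst congestion at the end to make it feasible:

```python
def mw_multicommodity_flow(edges, capacities, paths, eps=0.1, delta=0.1):
    """edges: edge ids; capacities: dict edge -> c_e; paths: candidate paths as lists of edge ids."""
    w = {e: delta for e in edges}                 # small initial weights (assumption, see above)
    flow = {tuple(p): 0.0 for p in paths}
    while max(w.values()) < 1.0:                  # terminate once some edge weight reaches 1
        # assumed oracle: route along the path that is cheapest under the current weights
        p = min(paths, key=lambda q: sum(w[e] / capacities[e] for e in q))
        c_p = min(capacities[e] for e in p)       # bottleneck capacity of the chosen path
        flow[tuple(p)] += c_p
        for e in p:
            w[e] *= (1 + eps) ** (c_p / capacities[e])   # update rule from the slide
    # scale everything down by the worst congestion so no capacity is violated
    load = {e: sum(f for q, f in flow.items() if e in q) for e in edges}
    scale = max(load[e] / capacities[e] for e in edges)
    return {q: f / max(scale, 1.0) for q, f in flow.items()}
```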
Applications
Set Cover problem
Find the minimal number of subsets in the collection $C$ whose union equals the universe $U$
Experts – elements of the universe $U$
Events – sets in the collection $C$
Payoff matrix –
$M(i, C_j) = \begin{cases} 1 & i \in C_j \\ 0 & i \notin C_j \end{cases}$
Applications
Set Cover problem (Cont.)
Update rule: $w_i^{t+1} = w_i^t\,(1 - M(i, C_j))$, i.e.
$i \in C_j \;\Rightarrow\; w_i^{t+1} = 0$
$i \notin C_j \;\Rightarrow\; w_i^{t+1} = 1$
We search for the set $C_j$ which maximizes
$\sum_i p_i\, M(i, C_j) = \dfrac{\sum_{i \in C_j} w_i}{\sum_i w_i}$
(Greedy Set Cover Algorithm)
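A short sketch of the resulting greedy procedure: every uncovered element carries weight 1, covered elements drop to weight 0, and each step picks the set with the largest total weight of still-uncovered elements (the deterministic greedy choice; the example on the later slides instead samples an element first and then takes the largest set containing it):

```python
def greedy_set_cover(universe, collection):
    """collection: dict set-name -> set of elements; assumes the collection covers the universe."""
    w = {i: 1.0 for i in universe}              # weight 1 on every still-uncovered element
    chosen = []
    while any(w.values()):
        # pick the set C_j maximizing the total weight of the elements it covers
        j = max(collection, key=lambda s: sum(w.get(i, 0.0) for i in collection[s]))
        chosen.append(j)
        for i in collection[j]:
            if i in w:
                w[i] = 0.0                      # covered elements get weight 0
    return chosen
```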
Applications
Vertex covering problem - Example
[Figure: a graph with 5 nodes and 6 edges.]
$\min \sum_{n \in N} x_n$
s.t. $\sum_{n:\, n \in e} x_n \ge 1 \;\;\forall e \in E$
$x_n \in \{0, 1\} \;\;\forall n \in N$
$M(i, j)$ (rows = edges 1–6, columns = nodes 1–5):
1 0 1 0 0
1 1 0 0 0
0 1 1 0 0
1 0 0 0 1
0 1 0 0 1
0 0 0 1 1
Applications
Vertex covering problem – Example (Cont.)
Find the minimum number of nodes (subsets) that cover all of the edges (the universe: $E$ = set of edges)
The subset of edges covered by each node:
Node 1: {1, 2, 4}
Node 2: {2, 3, 5}
Node 3: {1, 3}
Node 4: {6}
Node 5: {4, 5, 6}
The procedure selects at most $T \le \ln(n)\cdot X_{opt}$ subsets
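Running the greedy_set_cover sketch on this instance (universe = the six edges, one subset per node) gives a cover of three nodes, the same size found in the runs traced on the following slides:

```python
universe = {1, 2, 3, 4, 5, 6}                                   # the edges
collection = {1: {1, 2, 4}, 2: {2, 3, 5}, 3: {1, 3}, 4: {6}, 5: {4, 5, 6}}
print(greedy_set_cover(universe, collection))                   # -> [1, 2, 4] with this tie-breaking
```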
Applications
Vertex covering problem – Example (Cont.)
Iteration: 1.000000
The probability for edge i: p = 0.1667 0.1667 0.1667 0.1667 0.1667 0.1667
Choose edge following the distribution: i = 1
The maximum subset which includes edge 1.000000 is: node = 1
The selected nodes are: c = 1
Iteration: 2.000000
The probability for edge i: p = 0 0 0.3333 0 0.3333 0.3333
Choose edge following the distribution: i = 3
The maximum subset which includes edge 3.000000 is: node = 2
The selected nodes are: c = 1 2
Iteration: 3.000000
The probability for edge i: p = 0 0 0 0 0 1
Choose edge following the distribution: i = 6
The maximum subset which includes edge 6.000000 is: node = 4
The selected nodes are: c = 1 2 4
Applications
Vertex covering problem – Example (Cont.)
Iteration: 1.000000
The probability for edge i: p = 0.1667 0.1667 0.1667 0.1667 0.1667 0.1667
Choose edge following the distribution: i = 6
The maximum subset which includes edge 6.000000 is: node = 5
The selected nodes are: c = 5
Iteration: 2.000000
The probability for edge i: p = 0.3333 0.3333 0.3333 0 0 0
Choose edge following the distribution: i = 3
The maximum subset which includes edge 3.000000 is: node = 2
The selected nodes are: c = 5 2
Iteration: 3.000000
The probability for edge i: p = 1 0 0 0 0 0
Choose edge following the distribution: i = 1
The maximum subset which includes edge 1.000000 is: node = 1
The selected nodes are: c = 5 2 1
Summary
The paper presents a comprehensive survey of the multiplicative weights update method
Various fields have independently developed methods that share a common ground and can be generalized into one conceptual procedure
The procedure includes the determination of experts,
events, penalty matrix, weights and an update rule
Additional relevant inputs: the error size, the value of $\epsilon$, and the number of iterations.