How Good Are Sparse Cutting-Planes?

Santanu Dey, Marco Molinaro, Qianyi Wang
Georgia Tech
Cutting planes
In theory
• Cuts are inequalities ax ≤ b valid for the integer solutions
• Together they give the convex hull of solutions
• Many families of cuts, large literature, since the 1960s
Cutting planes
In practice
• Only want to use sparse cutting planes: ax ≤ b with at most k non-zero entries
• Most commercial solvers use sparsity to filter cuts
• Very limited theoretical investigation [Andersen, Weismantel 10]
• Sparse cuts do not give the convex hull of solutions
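The sparsity filter mentioned above can be sketched in a few lines. This is my own minimal illustration (the helper names `support_size` and `filter_sparse_cuts` are mine, not any solver's API): keep only cuts ax ≤ b whose coefficient vector has at most k non-zero entries.

```python
# Minimal sketch of sparsity-based cut filtering (illustrative, not a real
# solver's code): a cut is a pair (a, b) representing a·x <= b; keep it only
# if a has at most k non-zero entries.
def support_size(a, tol=1e-12):
    """Number of non-zero coefficients of the cut normal a."""
    return sum(1 for ai in a if abs(ai) > tol)

def filter_sparse_cuts(cuts, k):
    """cuts: list of (a, b) pairs; return only the k-sparse ones."""
    return [(a, b) for (a, b) in cuts if support_size(a) <= k]

cuts = [([1.0, 0.0, 0.0, 2.0], 3.0),  # 2 non-zeros: kept when k = 2
        ([0.5, 1.0, 1.0, 1.0], 2.0)]  # 4 non-zeros: dropped when k = 2
print(len(filter_sparse_cuts(cuts, k=2)))  # 1
```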
How good are sparse cutting planes?
Geometric problem
• P – polytope in [0,1]^n
• P^k – intersection of all k-sparse cuts
• d(P, P^k) = max_{x ∈ P^k} d(x, P)
– Well defined for every polytope
– Upper bound on depth-of-cut
– At most √n

How does d(P, P^k) behave?
Ex 1: P = { Σ_i x_i ≤ 1, x ≥ 0 };  P^k = { Σ_{i∈I} x_i ≤ 1 for every k-subset I of [n], x ≥ 0 }
d(P, P^k) ≈ √n / k — decays quickly with the density k: good

Ex 2: P = { Σ_i x_i ≤ n/2, x ≥ 0 };  P^k = { Σ_{i∈I} x_i ≤ n/2 for every k-subset I of [n], x ≥ 0 }
d(P, P^k) ≈ (√n / 2)(n/k − 1) — stays large until k ≈ n: bad

Ex 3: P – convex hull of t random 0/1 points
d(P, P^k) ∝ 1/√k: medium
Our results
• Upper bound on d(P, P^k) for polytopes in [0,1]^n:
d(P, P^k) ≲ √( (n/k) · log(n · #vert(P)) )
• Matching lower bound: attained by a random 0/1 polytope, with prob 1/4
• Hard packing IPs: sparse cuts are bad

(Plot: d(P, P^k) against the density k, showing the curves √((n/k) · log(n · #vert(P))), 2√n, and √n (n/k − 1).)
Upper bound
• Show: d(P, P^k) < √( (n/k) · log(n · #vert(P)) ) = 2λ
• Picture: u – a point at distance 2λ from P; want a k-sparse cut separating u
• Start with d of unit norm such that dP ≤ b − λ and du ≥ b + λ
• Randomly "round" d to a vector D, with scaling α = k / (2√n):

D_i = sign(d_i)/α  with prob α|d_i|
D_i = 0            with prob 1 − α|d_i|        (well defined if α|d_i| ≤ 1)

• Show: with non-zero probability, D is k-sparse, DP ≤ b and Du > b
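The rounding step can be simulated directly. The code below is my own sketch (function name `sparsify` is mine): it rounds a unit-norm direction d to a random vector D with E[D_i] = d_i and at most k/2 non-zero entries in expectation, as in the scheme above.

```python
# Sketch of the randomized rounding step (illustrative code, not the authors'):
# round a unit-norm cut normal d to a random sparse vector D with E[D] = d and
# about k/2 non-zeros in expectation.
import math
import random

def sparsify(d, k, rng):
    n = len(d)
    alpha = k / (2 * math.sqrt(n))  # scaling from the talk; needs alpha*|d_i| <= 1
    return [math.copysign(1 / alpha, di) if rng.random() < alpha * abs(di) else 0.0
            for di in d]

n, k = 400, 40
d = [1 / math.sqrt(n)] * n  # unit-norm direction
rng = random.Random(1)
trials = 2000
mean0 = 0.0
nnz = 0
for _ in range(trials):
    D = sparsify(d, k, rng)
    nnz += sum(1 for x in D if x != 0)
    mean0 += D[0] / trials

# E[#non-zeros] = sum_i alpha*|d_i| <= k/2, and E[D_i] = d_i
print(nnz / trials)  # about k/2 = 20
print(mean0)         # about d[0] = 0.05
```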
Upper bound
Obs: For every v, E[Dv] = dv and Var(Dv) ≤ (1/α) Σ_i v_i² |d_i|

Claim 1: With high probability D is k-sparse
• E[#non-zeros in D] = Σ_i α|d_i| ≤ k/2
• Stddev(#non-zeros in D) ≤ √k
• Since the D_i's are independent, #non-zeros ≈ k/2 + √k ≤ k (Bernstein's inequality)

Claim 2: With high probability DP ≤ b
• max_{p ∈ P} Dp ≤ max_{p ∈ vert(P)} Dp
• For a fixed vertex p, E[Dp] = dp ≤ b − λ…
• …and Stddev(Dp) ≤ √(2n/k) ≤ λ / √(log(n · #vert(P)))
• Pr[Dp > b] ≤ Pr[Dp > E[Dp] + λ] ≤ 1/(n · #vert(P))
• By union bound, Pr[max_{p ∈ vert(P)} Dp > b] < 1/n

Claim 3: With probability 1/2n, Du > b
• Write u = v + 2λd with v on the hyperplane dx = b − λ; then D(u − v) = D(2λd), so Du = 2λDd + Dv
• E[Dv] = dv = b − λ, so Du ≈ 2λDd + b − λ
• Show: with prob 1/2n, 2λDd > λ, i.e. Dd > 1/2
• E[Dd] = dd = 1, and Dd ≤ (1/α) Σ_i |d_i| ≤ 2n/k ≤ n
• Dd > 1/2 with prob 1/2n (Markov's inequality applied to n − Dd)

Taking a union bound over the claims: with prob > 0, D is k-sparse, DP ≤ b and Du > b
Obs: The bottleneck of the analysis is controlling max_{p ∈ P} Dp in Claim 2. Can use different parameters of P (e.g. covering number, chaining)
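The moment observation driving the claims can also be checked empirically. The sketch below is mine (reusing an illustrative `sparsify` helper): for a 0/1 vertex v it estimates E[Dv] and Var(Dv) by Monte Carlo and compares them with dv and the bound (1/α) Σ_i v_i² |d_i|.

```python
# Empirical check (illustrative) of: E[Dv] = dv and
# Var(Dv) <= (1/alpha) * sum_i v_i^2 * |d_i|, for the sign-based rounding.
import math
import random

def sparsify(d, alpha, rng):
    return [math.copysign(1 / alpha, di) if rng.random() < alpha * abs(di) else 0.0
            for di in d]

rng = random.Random(0)
n, k = 64, 16
d = [1 / math.sqrt(n)] * n
alpha = k / (2 * math.sqrt(n))
v = [1.0] * n  # a 0/1 "vertex"

samples = [sum(Di * vi for Di, vi in zip(sparsify(d, alpha, rng), v))
           for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)

dv = sum(di * vi for di, vi in zip(d, v))                          # exact E[Dv] = 8
var_bound = sum(vi**2 * abs(di) for di, vi in zip(d, v)) / alpha   # = 2n/k = 8
print(mean, dv)        # empirical mean close to 8
print(var, var_bound)  # empirical variance below the bound 8
```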
Lower bound
Lemma: Let P be the convex hull of t random points from {0,1}^n. Then with prob 1/4,
d(P, P^k) ≥ √( (n/k) · log t ) · (1 − o(1))
(lower-order correction terms in k and log t omitted here)

Idea: Explicitly find a point in P^k that is far from P
Step 1: Show that the inequality Σ_i x_i ≤ n/2 + [err] is valid for P
Step 2: Show that if ax ≤ b is valid for P^k, then b ≥ (1/2 + √(log t / k)) Σ_i a_i
⇒ the point ≈ (1/2 + √(log t / k)) · e belongs to P^k, and by Step 1 its distance to P is ≈ n·√(log t / k) / √n = √((n/k) · log t)

Key tool for Step 2 – anticoncentration: for a uniform random 0/1 point X and a k-sparse a,
Pr[ aX ≥ E[aX] + √(log t / k) · Σ_i a_i ] ≳ 1/t,
so among t random points some point attains this, forcing b ≥ (1/2 + √(log t / k)) Σ_i a_i
Hard packing IPs
Instances Ax ≤ b, x ∈ {0,1}^n, where each entry of A is 0/1 with prob 1/2, and b_i = (#1's in row i) / 2
Used commonly as computational test-instances [Freville and Plateau 96, Chu and Beasley 98, Kaparis and Letchford 08 and 10, …]

Theorem: With probability at least 1/2, d(P, P^k) ≳ √n (n/k − 1) for k ≥ n/2.

Obs 1: Almost matches the upper bound: as bad as it gets
Obs 2: Still have distance of order √n even with sparsity Ω(n)
Conclusion
• Push for theoretical study of sparse cutting planes
Results
• Matching upper and lower bounds on the approximation by sparse cuts: 3 phases, dependence on the number of vertices
• Analysis of hard packing IPs: sparse cuts are bad
Questions
• Relationship with sparsity of the formulation?
• When should we use denser cuts?
• Can we sparsify cutting planes?
• Reformulations that allow good sparse cuts?
Thank you!