Transcript ppt - CS

A threshold of ln(n) for
approximating set
cover
By Uriel Feige
Lecturer: Ariel Procaccia
Set Cover
We are given an (unweighted!) collection F
of subsets of U={1,…,n}.
 We must find as few as possible subsets
from F such that their union covers U.
 Hypergraph vertex cover (lecture of 10/3)
is a special case of set cover.

Approximation algorithms

A greedy algorithm:




Recall algorithms 1…
At each stage, add to the cover the subset which
maximizes the number of new elements.
This algorithm gives an approximation of ln(n)lnln(n)+O(1).
An approximation of ln(n) can also be achieved
using a linear programming relaxation.
A bad example for the greedy
algorithm
U={(x,y): 1 ≤ x,y ≤ n}
32
S1 = {(x,y): x ≤ n/2}
S2 = {(x,y): x ≥ n/2}
16
T1= {(x,y): 1 ≤ y ≤ n/2}
8
T2= {(x,y): n/2 ≤ y ≤ 3n/4} 8
…
Ratio: logn/2
Overview (1)


We wish to prove Theorem 8: If a poly time
algorithm can approximate set cover within (1ε)ln(n), then NP is a subset of DTIME(nO(loglog(n))).
The strategy: a reduction from a k-prover proof
system for an NP-complete problem.
Familiar
with IP?
Overview (2)

Parts of the lecture:






MAX 3SAT-5.
The k-prover proof system.
Partition systems.
The reduction to set cover.
Thundering applause.
May the force be with us.
MAX 3SAT-5


Input: A CNF formula with n variables and 5n/3
clauses, in which every clause contains exactly
three literals, every variable appears in exactly 5
clauses, and a variable does not appear in a
clause more than once.
Theorem 1: For some >0, it is NP-hard to
distinguish between 3CNF-5 formulas where
OPT=1 or OPT≤(1-).
Reduction
from MAX3SAT-B
Two prover proof system for 3SAT5



Presented as to provide intuition and help in the
analysis of the k-prover system.
One round, two prover system for 3SAT-5.
Protocol:
V
selects an index of a clause, sends it to the first
prover, selects a random var in the clause, and sends
it to the second prover.
 First prover returns 3 bits, second returns 1 bit.
Assgn. To
 V accepts if the following conditions hold:
clause and


var
Clause check: the assignment sent by the first prover
satisfies the clause.
Consistency check: the assignment sent by the second
prover is identical to the assignment for the same variable
sent by the first prover.
Proposition 2: If OPT=1-ε, then under the
optimal strategy of the provers, V accepts
with probability (1-ε/3).
 Proof:

 The
strategy of the second prover defines an
assignment A to the variables.
 If V selects a clause that is not satisfied by A
(prob. ε), the first prover must set one of the
variables differently from A.
 The consistency check fails with prob. ≥ 1/3. ■
There is a strategy with
acceptance prob ε/3
Parallel repetition




We would like to lower the error.
Modified proof system: V sends to the first
prover l clauses, from each chooses a var, and
sends these l vars to the second prover.
Theorem 3 (Ran Raz): If a one round two
prover system is repeated l times independently
in parallel, then the error is 2-cl, where c>0 is a
constant that depends only on the original proof
system.
The error of the modified two prover system is ≤
2-cl, for some universal constant c.
C for subtle
The k-prover proof system



Binary
l and
P1: code
0011with k codewords,
P2: 0101each of length
P3: 1100
weight l/2, with Hamming distance at least l/3.
Each prover is associatedV
with a codeword.
c1 c2 v3 v4
100,010,1,0
The Protocol:




The verifier selects l clauses C1,…,Cl, then selects a var from
each clause: the distinguished variables x1,…,xl.
Prover Pi receives Cj for those coordinates in its codeword that
are 1, and xj for the coordinates that are 0, and replies with 2l
bits.
The answer of the prover induces an assignment to the
distinguished variables.
Acceptance predicate:


Weak: at least one pair of provers is consistent.
Strong: every pair of provers is consistent.


Lemma 4: If OPT=1 then the provers have a
strategy that causes V to always strongly accept.
If OPT≤(1-ε), then the verifier weakly accepts
with probability at most k22-cl, where c>0 is a
constant that depends only on ε.
Proof:
 If
OPT=1, the provers can base their answers on a
satisfying assignment.
 Assume OPT=(1- ε), and that V weakly accepts with
prob. ≥ δ. Then with respect to Pi and Pj, V accepts
with probability δ/k2.
 There are ≥ l/6 coordinates on which Pi receives a
clause, and Pj a var in this clause. Other coordinates:
for free.
  The provers have a strategy that succeeds with
prob. ≥ δ/k2 on l/6 parallel repetitions of the original
proof system.
  δ/k2<2-cl. ■
Partition systems

A partition system B(m,L,k,d) has the following
properties.
1.
2.
3.
4.

Lemma 6: For every c≥0k and
= 4 m sufficiently large, there
is a partition system B(m,L,k,d) whose parameters
satisfy the following inequalities:
1.
2.
3.

There exists a ground set B of m distinct points.
There is a collection of L distinct partitions p1,…,pL.
For 1≤i≤L, partition pi is a collection of k disjoint subsets of B
whose union is B.
Any cover of the m points by subsets that appear in pairwise
different partitions requires at least d subsets.
L ≈ (log(m))c.
K can be chosen arbitrarily as long as k < ln(m)/3ln(ln(m)).
d = (1-f(k))k∙ln(m), where f(k)→∞ as k→∞.
Proof: a randomized construction usually works.
Looks
L=4
familiar?
The reduction





|U|?
R=(5n)l : num of r.
r↔Br(m,L,k,d): m=nΘ(l),
L=2l, d=(1-f(k))k∙ln(m).
Partition↔dist. vars,
subset↔prover. B(r,j,i) =
i’th subset, partition j,
partition system r.
Subsets are S(q,a,i): all r
s.t. (q,i)@r, extract from a
an assignment ar to dist.
vars. S(q,a,i) is the union
of all subsets B(r,ar,i), for
all r s.t. (q,i)@r.
Q=nl/2(5n/3)l/2 (questions
to Pi).
m points, k subsets
r=1
L=2l
r=2
00
01
10
11
S(q,a,1)
B(2,4,3)
r=R

Lemma 5:
Completeness: OPT=1  there is a set
cover of size kQ.
ar=01
2. Soundness: OPT=(1-ε)  more than (12f(k))kQln(m) subsets are required.
1.

Proof of Completeness:

If OPT=1, the provers answer consistently
with the satisfying assignment.
 For any r, consider S(q1,a1,r),…,S(qk,ak,r) s.t.
(qi,i)@r, and ai is the appropriate answer.
 Br(m,L,k,d) is covered by these k sets.
 Similar for every r. The number of subsets is
kQ. ■
Proof of soundness: the saga begins
Assume OPT≤1-ε, and that there exists C
that covers U, such that |C|=(1-δ)kQln(m),
δ=2f(k).
 q to prover Pi↔ weight w(q,i)=number of
answers a s.t. S(q,a,i) is in C.
 Σq,iw(q,i) = |C|.
 r↔w(r)= Σ(q,i)@rw(q,i).
 This weight = number of subsets that
participate in covering Br(m,L,k,d).
 Call r good if w(r)<(1-δ/2)k∙ln(m).

The good, the bad, and the random
Proposition 6: The fraction of good r is at
r is good if:
least δ/2.
w(r)= Σ
w(q,i)<(1-δ/2)k∙ln(m)
 Proof:

(q,i)@r
 Assume
otherwise, then:
2
w

(1


/
2)
kR ln m  (1   )kR ln m
 r
r
 On
the other hand,
 Hence
R
R
w(q, i)   w(q, i)  | C |
r wr  r (q
Q
,i )@ r
i ,q Q
|C|>(1- δ)kQln(m) – contradiction. ■



Proposition 7: C covers U, |C|=(1-δ)kQln(m)  for
some strategy for the k provers, V weakly accepts with
prob. ≥ 2δ/(k∙ln(m))2.
This proves soundness, since 2δ/(k∙ln(m))2>k22-cl, for
Fix coin
l=Θ(loglogn).
tosses for
provers
Proof of proposition 7:








Randomized strategy: on q to Pi, select a from the set of a
s.t. S(q,a,i) is in C.
For a fixed r: sets B(r,p,i) in the cover of Br(m,L,k,d) ↔ sets
S(q,a,i) in C.
For a good r: C used two subsets from p in the cover of
Br(m,L,k,d).
Denote B(r,p,i) and B(r,p,j) ↔ S(qi,ai,i) and S(qj,aj,j).
Denote: a is in Ar,i iff S(qi,a,i) is in C.
r is good  |Ar,i|+|Ar,j|<k∙ln(m).
The prob. that the provers answer ai and aj ≥ 4/(k∙ln(m))2.
Answers are consistent with p  V weakly accepts.
The prob. that V chooses a good r ≥ δ/2. ■


Theorem 8: If there is some ε>0 such that a
polynomial time algorithm can approximate set
cover within (1-ε)ln(n), then NP is a subset of
DTIME(nO(loglog(n))).
Proof:
 Assume
there is a poly time algorithm A that
approximates set cover within (1-ε)ln(m).
 Reduce GAP-3SAT-5 to set cover as described, with k
s.t. f(k)<ε/4, and m=(5n)2l/ε.
 m,R,Q=nO(loglogn)  time to perform the reduction is
nO(loglogn).
 ln(m)>(1-ε/2)ln(N), where N=mR.
 By lemma 5, for a YES instance all points can be
covered by kQ subsets, and for a NO instance all
points cannot be covered by (1-2f(k))kQln(m).
 The ratio is (1-2f(k))ln(m)>(1-ε)lnN. ■
Closing remarks (1)

Max k-cover:
 Given
U and F as before, select k subsets that such
that their union has maximum cardinality.
 The obvious greedy algorithm approximates max kcover within a ratio of at least 1-1/e ≈ 0.632.
 Theorem 8 can be directly used to prove: if max kcover can be approximated in a polynomial time
within a ration of (1 - 1/e + ε) for some ε>0, then NP is
a subset of DTIME(nO(loglogn)).
Closing remarks (2)

Refinements:
 In
our hardness of approximation result for set cover,
ε is a constant. We may strengthen our assumption
so as to improve our result.
 ZTIME=class of languages that have a probabilistic
algorithm that runs in expected time t (with zero
error).
 If for some η>0, NP is not a subset of ZTIME(2n^η),
then for some constant c’>0 there is no polynomial
time algorithm that approximates set cover within
ln(n) –c’(lnln(n))2.
Questions and cool animations