Ming Li Talk about Bioinformatics

Transcript Ming Li Talk about Bioinformatics

Lecture 20. Computational Complexity
 So far in this course we have discussed two
sorts of problems: problems that are efficiently
solvable, like shortest paths or matrix
multiplication, and problems that are unsolvable.
In between the two, however, are problems that
are solvable, but for which no fast algorithm is
known to exist.
Juris Hartmanis
 First of all, it is known that for every
time complexity t(n), if t(n)/t’(n)  0,
then there is some problem computable
in time t’(n), not in t(n).
Class P
 What do we mean by "fast"? The running time should be
bounded by a polynomial in the input size. I.e. the worstcase running time should be O(nk) for some k. This is called
polynomial time, and the complexity class P is the class of all
problems solvable in polynomial time.
 P contains pretty much every problem we looked at in this
course: maximum subrange sum, greatest common divisor,
matrix multiplication, all-pairs shortest paths, etc. Of course, it
does not contain any unsolvable problems. About the only
solvable problem we considered that is not in P is the problem
of listing all the permutations. This can't be in P because we
need to output n! results on an input of size n. But this isn't so
interesting because of the large output size.
Why polynomial time:
 Why is polynomial-time a synonym for "fast"? Why
not some other definition, like problems that can be
solved in time O(n), or O(nlog n), or O(en)? There are
three answers:
 Nearly every algorithm that we actually consider to
be fast in practice runs in polynomial time.
 A program that runs in polynomial time in one
model (e.g., a one-tape Turing machine) runs in
polynomial time in most other models (e.g., logcost RAM, multi-tape Turing machine, etc.).
 The class P has nice closure properties. It is
closed under composition. P is also closed under
using a polynomial-time subroutine.
Why not P
 It doesn't consider the exponent of the polynomial. An
algorithm running in n10000 time is considered "fast"
and one running in 2n/10000 is not, although for
practical problem sizes the second will be more
useful than the first.
 It doesn't consider the size of the constant out in
front. An algorithm running in 1000000 n2 will be less
useful in practice than an algorithm running in 2n/10000.
 It considers only worst-case behavior. Some
problems may be efficiently solvable for most inputs,
and only require lots of time on a small set of inputs.
 It doesn't consider other models of computation, such
as quantum computing.
Problems not in P
 Are there any computable problems not in P?
 ERE problem: Given two extended regular expressions E and F




over an alphabet Σ, decide if E and F specify the same set.
An extended regular expression (ERE) is just like a regular
expression except that exponentiation is allowed. For example,
(a3 + b2)2 = (aaa + bb)2
= (aaa + bb)(aaa + bb)
= aaaaaa + aaabb + bbaaa + bbbb.
The ERE problem is solvable: to see this, we can take any ERE
and expand out all the exponents to get a regular RE. Do this for
E and F, getting RE's E' and F'. Now convert these to equivalent
deterministic finite automata, minimize the automata, and check
if they are the same. However,
Theorem. There exists a constant c such that every algorithm
that solves the ERE problem must use at least 2cn/log n time on
infinitely many inputs.
Class NP
 Informally speaking, NP is the class of decision
problems having the property that their "yes"
instances can be verified in polynomial time. Recall
that a "decision problem" is a problem with a yes/no
answer; a "yes" instance is an instance of the
problem with a "yes" answer.
 Example: King Arthur had 150 knights. Some of the
knights are not friendly to some other knights. King
Arthur wishes to arrange a round table meeting so
that no pair of knights sitting together hate each
other.


Enumerate all 150! permutations is exponential time.
Merlin’s oracle gives a permutation, verification is easy
Jack at Waterloo
Edmonds, Cook, Karp, Levin, etc
 Cobham and Edmonds pointed out that polynomial
time is a good definition for “good algorithm”.
 In early 1970’s two people in the US (Steve Cook,
UofT) and Russia (L. Levin, student of A.N.
Kolmogorov) proved independently that NP actually
has a large group of “hardest problems”
 Both Cook and Levin had interesting paths to their
discoveries. Cook only added the NP-hardness proof
after the paper is accepted. Karp immediately
realized the importance, and published a long list of
NP-hard problems. Levin did not even want to publish
his result --- Kolmogorov forced him to publish it.
Hamiltonian cycle
 An undirected graph has a Hamiltonian cycle if there is a simple
cycle visiting all the vertices in the graph exactly once. The decision
problem is:
 HAM-CYCLE: given a graph G, does it have a Hamiltonian cycle?
 The associated "yes" instances are those graphs with a Hamiltonian
cycle.
Hamiltonian cycle …
 It is not at all obvious how to solve HAM-CYCLE in
polynomial time, and in fact, no polynomial time
algorithm is currently known. However, if I claim that
a particular graph has a Hamiltonian cycle, and you
demand proof, then there is a certificate I can
produce that you can easily check: namely, the
vertices forming the cycle! Now you can check the
certificate quickly: just check that it really is a simple
cycle, and that it actually visits each vertex. You can
verify this quickly, if I provide you with the cycle. Of
course, finding the cycle appears to be hard --- but
verifying it is easy. That's the essence of NP.
Small divisors problem.
 Given a positive integer n, and a bound B, is
n divisible by an integer c with 1 < c < B.
 Nobody currently knows a fast algorithm for
integer factorization. But given a solution, you
can check its correctness easily.
 That's the crux of the difference between P
and NP. Problems in P can be quickly solved;
problems in NP are not necessarily solvable
quickly, but the answer can be verified quickly
provided you have the certificate.
A story
 Here's a story which may help clarify this distinction:
 In 1903, F. N. Cole factored (Mersenne number) 267-1 (in 1876,
it was proven that it is not a prime, but nobody knew how to
factor it). He presented this to American Mathematical Society.
When the chairman called on him for his paper, Cole--who was
always a man of very few words--walked to the board, and,
saying nothing, proceeded to chalk up the arithmetic for raising
2 to the sixty-seventh power. Then he carefully subtracted 1.
Without a word he moved over to a clear space on the board
and multiplied out, by longhand,
193,707,721 × 761,838,257,287.
The two calculations agreed... His audience greeted the
presentation with a standing ovation. Cole took his seat without
having uttered a word during the hour long presentation.
Nobody asked a question.
 When asked how long it took him to factor the number, Cole
replied: three years of Sundays.
 By the way: the largest (Mersenne) prime: 243,112,609 -1 as of now
Formal definition of NP
 A decision problem is a parameterized problem with
a yes/no answer.
 A verifier for a decision problem is an algorithm V that
takes two inputs: an instance I of the decision
problem and a certificate C. V returns "true" if C
verifies that the answer to I is "yes" and "false"
otherwise. Furthermore, for each "yes" instance I,
there must be at least one such certificate, and for
each "no" instance there must be no certificates at
all. If V runs in time bounded by a polynomial in the
size of I, then it is called a polynomial-time verifier.
 A decision problem is in NP if it has a polynomialtime verifier. Note that every problem in P is in NP in
a trivial sense.
Examples
 Given a graph G, does G have a Hamiltonian cycle?
A polynomial-time verifier takes I = G and C = a
claimed list of the vertices forming the cycle as input.
It then checks to ensure that all vertices are
represented and the edges actually exist in G.
 Given n, is n a composite number? A polynomial-time
verifier takes I = n and C = (a,b) as input. It then
verifies that both a and b are ≥ 2 and that n = ab. If
this is the case, it answer "yes"; otherwise it answers
"false".
 The traveling salesman problem. Given a directed
graph G with weights on the edges, and a number B,
find a directed cycle passing through all the vertices,
with total weight ≤ B. Certificate is the cycle.
Examples continue …
 We say a subset of the vertices of an undirected
graph is an "independent set" if no two vertices are
connected by an edge. The problem is, given a graph
G, and an integer b, is there an independent set of
size ≥ b? The certificate is the subset.
 King Arthur’s problem. It is actually the Hamiltonian
problem, right? (This is a reduction!) The certificate is
the sitting arrangement, and King Arthur can easily
check no neighboring knights fight. --- Merlin’s Oracle
actually provides nothing but the certificate!
Decision problem vs Computational Problem
 Although not every problem is a decision problem,
the following principle applies:
 Principle: Usually, a problem p has an associated
decision problem p' such that
 (a) Given an algorithm for p', we can solve p with a
small amount of additional work, and
 (b) Given an algorithm for p, we can solve p' with a
small amount of additional work. )
 For example, for integer factorization, we could ask,
is the i'th bit of the smallest prime divisor of n equal to
1? By asking a small number of such questions, we
can find the smallest prime divisor p and then
recursively produce the factorization of n/p.
Co-NP
 There is an asymmetry in our definition of NP. We
only need certificates for the instances with "yes"
answers. For "no" answers we need not provide any
certificate at all. For many problems in NP, it is not
obvious how we could produce a certificate for the
"no" answers. For example, what certificate would
show that no Hamiltonian cycle exists?
 Because of this asymmetry, it is possible to consider
those decision problems for which the "no" instances
have a polynomial-time verifier. This class is called
co-NP.
 We do not know the relationship between NP and coNP. It could be that NP = co-NP, and it could be that
P = NP ∩ co-NP.
Problems in co-NP
 Obviously, complement of any problem in NP is in co-
NP. For example: given a graph G, does it fail to
have a Hamiltonian cycle?
 Here's another: given a Boolean formula, is it a
tautology? (A tautology is a formula that is always
true, no matter what values are substituted for the
variables.) If the answer is "no", we can verify this
quickly, by simply guessing an assignment of values
to the variables, and verifying that the result is "false“
(I,e, its complement is in NP: Satisfiability problem)
Polynomial-time reductions
 We say that a decision problem A reduces to a
decision problem B in polynomial time, and write
A ≤P B
if there exists a polynomial-time-computable
function f that maps instances of A to instances of
B, such that the answers to the instances are
preserved.
Properties of polynomial reductions
 Theorem 1. If A and B are decision problems,
and B is in P, and A ≤P B, then A is in P.
Proof. Since A ≤P B, there is an answer-preserving
polynomial-time computable function f mapping A to
B. Let us say we are given an instance of A, call it x.
To decide x, we map x to f(x) and feed it into our
algorithm to solve B. Whatever B answers, we
answer for A.
The total running time is the cost of computing f(x)
and the cost of running B on f(x). Since both are
polynomial-time, the resulting algorithm is
polynomial-time.
QED
Polynomial reduction properties …
 Theorem 2. If A ≤P B and B ≤p C, then A ≤P C.
 Proof. We have a polynomial-time
computable function f mapping "yes"
instances of A to "yes" instances of B, and
similarly g mapping "yes" instances of B to
"yes" instances of C. So the composition x ->
g(f(x)) maps "yes" instances of A to "yes"
instances of C. If f and g are polynomial-time
computable, so is g(f(x)).
QED
Polynomial reduction properties …
 Theorem 3. If A ≤P B and B ∈ NP, then A ∈ NP.
 Proof. If B ∈ NP, then B has a polynomial-time verifier
V. For each "yes" instance of B, say y, there is a
certificate Cy and V(y,Cy) = true.
If A ≤P B, then there is a function f mapping "yes"
instances x of A to "yes" instances of B. Now we can
create a polynomial-time verifier V' for A: we let V'(x,
C) = V(f(x), C). Furthermore, a certificate C for f(x) is
a certificate for x. Since V' is a composition of two
polynomial-time computable functions, it is
computable in polynomial time.
QED
NP-hardness
 A decision problem A is NP-hard if B ≤P A for
every B ∈ NP.
 A decision problem A is NP-complete if A is
NP-hard and A ∈ NP.
 Thus the NP-complete problems are the
"hardest" problem in NP.
 Furthermore, from our Theorem 1, if a single
NP-complete problem is in P, then NP = P:
every problem in NP would be polynomialtime solvable.
The relationship
 Open Question: P ?= NP
NP-Complete
P
NP

Ming Li Talk about Bioinformatics

Transcript Ming Li Talk about Bioinformatics

Directory