
Introduction to information complexity
Mark Braverman
Princeton University
June 30, 2013
Part I: Information theory
• Information theory, in its modern form, was introduced in the 1940s to study the problem of transmitting data over physical channels.
[Diagram: Alice sends data to Bob over a communication channel.]
Quantifying “information”
• Information is measured in bits.
• The basic notion is Shannon’s entropy.
• The entropy of a random variable is the
(typical) number of bits needed to remove
the uncertainty of the variable.
• For a discrete variable:
  H(X) := ∑_x Pr[X = x] · log(1/Pr[X = x])
Shannon’s entropy
• Important examples and properties:
– If X = x is a constant, then H(X) = 0.
– If X is uniform on a finite set S of possible values, then H(X) = log |S|.
– If X is supported on at most n values, then H(X) ≤ log n.
– If Y is a random variable determined by X, then H(Y) ≤ H(X).
(A short numerical check of these properties appears below.)
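A minimal sketch, not from the slides: Shannon's entropy H(X) = ∑_x Pr[X = x] · log₂(1/Pr[X = x]) computed for a few small distributions, checking the properties listed above. The function name `entropy` and the example distributions are my own choices.

```python
# Sketch: Shannon's entropy of a discrete random variable, plus quick checks
# of the properties above (constant, uniform, bounded support, Y = f(X)).
import math

def entropy(pmf):
    """pmf: dict mapping each value x to Pr[X = x]."""
    return sum(p * math.log2(1.0 / p) for p in pmf.values() if p > 0)

print(entropy({"a": 1.0}))                    # constant variable: 0.0
print(entropy({1: 1/3, 2: 1/3, 3: 1/3}))      # uniform on |S| = 3 values: log2(3) ~ 1.585
print(entropy({0: 0.9, 1: 0.05, 2: 0.05}))    # supported on 3 values: <= log2(3)

# If Y is determined by X (here Y = X mod 2), then H(Y) <= H(X).
pmf_x = {1: 1/3, 2: 1/3, 3: 1/3}
pmf_y = {}
for x, p in pmf_x.items():
    pmf_y[x % 2] = pmf_y.get(x % 2, 0.0) + p
print(entropy(pmf_y), "<=", entropy(pmf_x))
```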
Conditional entropy
• For two (potentially correlated) variables
𝑋, 𝑌, the conditional entropy of 𝑋 given 𝑌 is
the amount of uncertainty left in 𝑋 given 𝑌:
  H(X|Y) := E_{y∼Y} [H(X | Y = y)].
• One can show H(XY) = H(Y) + H(X|Y).
• This important fact is known as the chain rule.
• If X ⊥ Y, then
  H(XY) = H(X) + H(Y|X) = H(X) + H(Y).
Example
• X = (B1, B2, B3)
• Y = (B1 ⊕ B2, B2 ⊕ B4, B3 ⊕ B4, B5)
• Where B1, B2, B3, B4, B5 ∈_U {0,1}.
• Then (verified numerically in the sketch below):
  – H(X) = 3; H(Y) = 4; H(XY) = 5;
  – H(X|Y) = 1 = H(XY) − H(Y);
  – H(Y|X) = 2 = H(XY) − H(X).
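A quick numerical check, not part of the slides: enumerate all 32 settings of the uniform bits B1..B5, build the joint distribution of X and Y, and recover the entropies and the chain rule identities quoted above. Helper names (`entropy`, `dist`) are mine.

```python
# Sketch: verify H(X) = 3, H(Y) = 4, H(XY) = 5 and the chain rule for the
# example X = (B1,B2,B3), Y = (B1^B2, B2^B4, B3^B4, B5) with uniform bits.
import itertools, math

def entropy(pmf):
    return sum(p * math.log2(1.0 / p) for p in pmf.values() if p > 0)

def dist(samples):
    """Empirical (here: exact) distribution over the observed tuples."""
    pmf = {}
    for s in samples:
        pmf[s] = pmf.get(s, 0.0) + 1.0 / len(samples)
    return pmf

xs, ys, xys = [], [], []
for b1, b2, b3, b4, b5 in itertools.product([0, 1], repeat=5):
    x = (b1, b2, b3)
    y = (b1 ^ b2, b2 ^ b4, b3 ^ b4, b5)
    xs.append(x); ys.append(y); xys.append((x, y))

HX, HY, HXY = entropy(dist(xs)), entropy(dist(ys)), entropy(dist(xys))
print(HX, HY, HXY)            # 3.0, 4.0, 5.0
print(HXY - HY, HXY - HX)     # H(X|Y) = 1.0 and H(Y|X) = 2.0, by the chain rule
```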
Mutual information
• X = (B1, B2, B3)
• Y = (B1 ⊕ B2, B2 ⊕ B4, B3 ⊕ B4, B5)
[Diagram: overlapping regions for H(X) and H(Y). The part of H(X) outside the overlap, H(X|Y), contains B1; the overlap I(X; Y) contains B1 ⊕ B2 and B2 ⊕ B3; the part of H(Y) outside the overlap, H(Y|X), contains B4 and B5.]
Mutual information
• The mutual information is defined as
  I(X; Y) = H(X) − H(X|Y) = H(Y) − H(Y|X).
• “By how much does knowing X reduce the entropy of Y?”
• It is always non-negative: I(X; Y) ≥ 0.
• Conditional mutual information:
  I(X; Y | Z) := H(X|Z) − H(X|YZ).
• Chain rule for mutual information:
  I(XY; Z) = I(X; Z) + I(Y; Z | X).
• Simple intuitive interpretation (a small numerical check follows below).
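A small numerical check, not from the slides, of the chain rule for mutual information I(XY; Z) = I(X; Z) + I(Y; Z | X). The toy joint distribution (two fair bits and Z equal to their AND) and the helper names are my own choices.

```python
# Sketch: compute mutual information from entropies of marginals and check
# I(XY; Z) = I(X; Z) + I(Y; Z | X) on a small joint distribution.
import itertools, math

def H(pmf):
    """Entropy of a distribution given as {outcome tuple: probability}."""
    return sum(p * math.log2(1 / p) for p in pmf.values() if p > 0)

def marginal(pmf, idxs):
    """Marginal distribution of the coordinates listed in idxs."""
    out = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in idxs)
        out[key] = out.get(key, 0.0) + p
    return out

def I(pmf, a, b, cond=()):
    """I(A; B | C) = H(AC) + H(BC) - H(ABC) - H(C) for index tuples a, b, cond."""
    return (H(marginal(pmf, a + cond)) + H(marginal(pmf, b + cond))
            - H(marginal(pmf, a + b + cond)) - H(marginal(pmf, cond)))

# Joint distribution of (X, Y, Z): X, Y uniform bits, Z = X AND Y.
joint = {(x, y, x & y): 0.25 for x, y in itertools.product([0, 1], repeat=2)}

print(I(joint, (0, 1), (2,)))                                   # I(XY; Z)
print(I(joint, (0,), (2,)) + I(joint, (1,), (2,), cond=(0,)))   # I(X; Z) + I(Y; Z | X)
```

Both prints give the same value (about 0.811 bits here), as the chain rule requires.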
Information Theory
• The reason Information Theory is so important for communication is that information-theoretic quantities readily operationalize.
• We can attach operational meaning to Shannon’s entropy: H(X) ≈ “the cost of transmitting X”.
• Let C(X) be the (expected) cost of transmitting a sample of X.
H(X) = C(X)?
• Not quite.
• Let trit T ∈_U {1, 2, 3}.
• C(T) = 5/3 ≈ 1.67 (the slide’s code tree assigns the prefix code 1 → 0, 2 → 10, 3 → 11; a small check appears below).
• H(T) = log 3 ≈ 1.58.
• It is always the case that C(X) ≥ H(X).
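A small check, not from the slides: the expected length of the prefix code {1 → "0", 2 → "10", 3 → "11"} for a uniform trit, compared with H(T) = log₂ 3.

```python
# Sketch: expected codeword length of the trit code versus its entropy.
import math

code = {1: "0", 2: "10", 3: "11"}                  # a prefix code for a uniform trit
expected_len = sum(len(w) for w in code.values()) / 3.0
print(expected_len)                                # 5/3 ~ 1.667 = C(T)
print(math.log2(3))                                # ~ 1.585 = H(T) < C(T)
```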
But H(X) and C(X) are close
• Huffman coding: C(X) ≤ H(X) + 1.
• This is a compression result: a message carrying little information is turned into a short one.
• Therefore: H(X) ≤ C(X) ≤ H(X) + 1.
Shannon’s noiseless coding
• The cost of communicating many copies of 𝑋
scales as 𝐻(𝑋).
• Shannon’s source coding theorem:
  – Let C(X^n) be the cost of transmitting n independent copies of X. Then the amortized transmission cost satisfies
    lim_{n→∞} C(X^n)/n = H(X).
• This equation gives H(X) operational meaning.
H(X) operationalized
[Diagram: a source X1, …, Xn, … is sent over a communication channel; transmitting the X’s costs H(X) per copy.]
𝐻(𝑋) is nicer than 𝐶(𝑋)
• H(X) is additive for independent variables.
• Let T1, T2 ∈_U {1, 2, 3} be independent trits.
• H(T1 T2) = log 9 = 2 log 3.
• C(T1 T2) = 29/9 < C(T1) + C(T2) = 2 × 5/3 = 30/9 (see the Huffman sketch below).
• Works well with concepts such as channel capacity.
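A sketch, not from the slides, of a standard heap-based Huffman construction, used to check the numbers above: C(T) = 5/3 for one uniform trit, C(T1 T2) = 29/9 for a pair of independent trits, and C(X) ≤ H(X) + 1 in both cases. The function name and implementation details are mine.

```python
# Sketch: expected codeword length of a Huffman code, applied to one uniform
# trit and to a pair of independent trits (9 equally likely outcomes).
import heapq, math

def huffman_cost(probs):
    """Expected codeword length of a Huffman code for the given probabilities."""
    # Heap entries: (subtree probability, tie-breaker, [(leaf prob, leaf depth), ...]).
    heap = [(p, i, [(p, 0)]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, leaves1 = heapq.heappop(heap)
        p2, _, leaves2 = heapq.heappop(heap)
        merged = [(p, d + 1) for (p, d) in leaves1 + leaves2]  # one more bit per leaf
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return sum(p * d for (p, d) in heap[0][2])

trit = [1/3] * 3
pair = [1/9] * 9
print(huffman_cost(trit), math.log2(3))   # 5/3 ~ 1.667  vs  H(T)    ~ 1.585
print(huffman_cost(pair), math.log2(9))   # 29/9 ~ 3.222 vs  H(T1T2) ~ 3.170
print(huffman_cost(pair) / 2)             # ~1.611 per trit: coding in blocks helps
```

The last line also illustrates the source coding theorem in miniature: coding copies together pushes the per-copy cost down toward H(X).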
Operationalizing other quantities
• Conditional entropy H(X|Y) (cf. the Slepian-Wolf theorem):
[Diagram: Alice transmits X1, …, Xn, … over the communication channel to Bob, who already holds Y1, …, Yn, …; this costs H(X|Y) per copy.]
Operationalizing other quantities
• Mutual information I(X; Y):
[Diagram: given the X1, …, Xn, …, the correlated Y1, …, Yn, … are sampled over the communication channel at a cost of I(X; Y) per copy.]
Information theory and entropy
• Allows us to formalize intuitive notions.
• Operationalized in the context of one-way
transmission and related problems.
• Has nice properties (additivity, chain rule…)
• Next, we discuss extensions to more
interesting communication scenarios.
Communication complexity
• Focus on the two-party randomized setting.
[Diagram: Alice holds X and Bob holds Y; using shared randomness R, A and B implement a functionality F(X, Y), e.g. F(X, Y) = “X = Y?”.]
Communication complexity
Goal: implement a functionality F(X, Y).
A protocol π(X, Y) computing F(X, Y):
[Diagram: with shared randomness R, Alice (holding X) and Bob (holding Y) alternate messages m1(X, R), m2(Y, m1, R), m3(X, m1, m2, R), …, after which the output F(X, Y) is produced.]
Communication cost = # of bits exchanged.
Communication complexity
• Numerous applications/potential applications.
• It is considerably more difficult to obtain lower bounds here than for transmission (but still much easier than for other models of computation!).
Communication complexity
• (Distributional) communication complexity with input distribution μ and error ε: CC(F, μ, ε). Error ≤ ε w.r.t. μ.
• (Randomized/worst-case) communication complexity: CC(F, ε). Error ≤ ε on all inputs.
• Yao’s minimax:
  CC(F, ε) = max_μ CC(F, μ, ε).
Examples
• X, Y ∈ {0,1}^n.
• Equality: EQ(X, Y) := 1_{X=Y}.
• CC(EQ, ε) ≈ log(1/ε).
• CC(EQ, 0) ≈ n.
Equality
• F is “X = Y?”.
• μ is a distribution where w.p. ½ X = Y, and w.p. ½ (X, Y) are random.
[Diagram: Alice sends MD5(X) (128 bits); Bob replies “X = Y?” (1 bit).]
• Error?
• Shows that CC(EQ, μ, 2^−129) ≤ 129. (A sketch of this protocol appears below.)
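A minimal sketch, not the slides’ actual code, of the one-round equality protocol above: Alice sends a 128-bit MD5 fingerprint of X, and Bob replies with one bit saying whether it matches his own fingerprint of Y. Function names are mine; the analysis comment models MD5 as an ideal 128-bit hash, as the slide implicitly does.

```python
# Sketch: hash-based equality protocol (Alice -> Bob: 128 bits, Bob -> Alice: 1 bit).
import hashlib

def alice_message(x: bytes) -> bytes:
    return hashlib.md5(x).digest()                   # 128-bit fingerprint of X

def bob_reply(y: bytes, fingerprint: bytes) -> bool:
    return hashlib.md5(y).digest() == fingerprint    # the 1-bit answer "X = Y?"

x, y = b"some input", b"some input"
print(bob_reply(y, alice_message(x)))                # True; 128 + 1 = 129 bits exchanged
# Bob errs only when X != Y yet the fingerprints collide; treating MD5 as an
# ideal 128-bit hash, this happens with probability about 2^-128, which under
# the prior mu above (X != Y with probability 1/2) gives error about 2^-129.
```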
Examples
• X, Y ∈ {0,1}^n.
• Inner product: IP(X, Y) := ∑_i X_i ⋅ Y_i (mod 2).
• CC(IP, 0) = n − o(n).
In fact, using information complexity:
• CC(IP, ε) = n − o_ε(n).
Information complexity
• Information complexity IC(F, ε) is to communication complexity CC(F, ε)
  as
  Shannon’s entropy H(X) is to transmission cost C(X).
Information complexity
• The smallest amount of information Alice
and Bob need to exchange to solve 𝐹.
• How is information measured?
• Communication cost of a protocol?
– Number of bits exchanged.
• Information cost of a protocol?
– Amount of information revealed.
Basic definition 1: The information cost of a protocol
• Prior distribution: X, Y ∼ μ.
[Diagram: Alice holds X and Bob holds Y; running the protocol π produces the transcript Π.]
IC(π, μ) = I(Π; Y | X) + I(Π; X | Y)
= what Alice learns about Y + what Bob learns about X.
(A small worked computation follows below.)
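A toy computation, not from the slides, of the information cost of the simplest possible protocol: X and Y are independent uniform bits and Alice just sends X, so the transcript is Π = X. Then IC(π, μ) = I(Π; Y | X) + I(Π; X | Y) = 0 + 1 = 1 bit: Alice learns nothing about Y, Bob learns X in full. Helper names are mine.

```python
# Sketch: IC(pi, mu) for the one-message protocol "Alice sends X", computed
# by exhaustively enumerating the joint distribution of (X, Y, transcript).
import itertools, math

def H(pmf):
    return sum(p * math.log2(1 / p) for p in pmf.values() if p > 0)

def marginal(pmf, idxs):
    out = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in idxs)
        out[key] = out.get(key, 0.0) + p
    return out

def cond_mi(pmf, a, b, c):
    """I(A; B | C) from entropies of marginals of the joint pmf."""
    return (H(marginal(pmf, a + c)) + H(marginal(pmf, b + c))
            - H(marginal(pmf, a + b + c)) - H(marginal(pmf, c)))

# Joint distribution of (X, Y, Pi) under mu = uniform and Pi = X.
joint = {(x, y, x): 0.25 for x, y in itertools.product([0, 1], repeat=2)}

ic = cond_mi(joint, (2,), (1,), (0,)) + cond_mi(joint, (2,), (0,), (1,))
print(ic)   # I(Pi; Y | X) + I(Pi; X | Y) = 0 + 1 = 1.0 bit
```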
Example
• F is “X = Y?”.
• μ is a distribution where w.p. ½ X = Y, and w.p. ½ (X, Y) are random.
[Diagram: Alice sends MD5(X) (128 bits); Bob replies “X = Y?” (1 bit).]
IC(π, μ) = I(Π; Y | X) + I(Π; X | Y) ≈ 1 + 65 = 66 bits
= what Alice learns about Y + what Bob learns about X.
(Roughly: Alice learns about one bit, namely whether X = Y; Bob learns whether X = Y and, when X ≠ Y, about 128 further bits about X, i.e. about 1 + ½ · 128 = 65 bits.)
Prior μ matters a lot for information cost!
• If μ = 1_{(x,y)} is a singleton, then
  IC(π, μ) = 0.
Example
• F is “X = Y?”.
• μ is a distribution where (X, Y) are just uniformly random.
[Diagram: Alice sends MD5(X) (128 bits); Bob replies “X = Y?” (1 bit).]
IC(π, μ) = I(Π; Y | X) + I(Π; X | Y) ≈ 0 + 128 = 128 bits
= what Alice learns about Y + what Bob learns about X.
Basic definition 2: Information complexity
• Communication complexity:
  CC(F, μ, ε) := min over protocols π that compute F with error ≤ ε of the communication cost of π.
• Analogously:
  IC(F, μ, ε) := inf over protocols π that compute F with error ≤ ε of IC(π, μ).
  (An infimum, rather than a minimum, is needed here.)
Prior-free information complexity
• Using a minimax, one can get rid of the prior.
• For communication, we had:
  CC(F, ε) = max_μ CC(F, μ, ε).
• For information:
  IC(F, ε) := inf over protocols π that compute F with error ≤ ε of max_μ IC(π, μ).
Operationalizing IC: Information
equals amortized communication
• Recall [Shannon]: lim_{n→∞} C(X^n)/n = H(X).
• Turns out [B.-Rao’11]:
  lim_{n→∞} CC(F^n, μ^n, ε)/n = IC(F, μ, ε), for ε > 0.
  [Error ε allowed on each copy.]
• For ε = 0: lim_{n→∞} CC(F^n, μ^n, 0+)/n = IC(F, μ, 0).
• [lim_{n→∞} CC(F^n, μ^n, 0)/n is an interesting open problem.]
Entropy vs. Information Complexity

                    Entropy                       IC
  Additive?         Yes                           Yes
  Operationalized   lim_{n→∞} C(X^n)/n            lim_{n→∞} CC(F^n, μ^n, ε)/n
  Compression?      Huffman: C(X) ≤ H(X) + 1      ???!
Can interactive communication
be compressed?
• Is it true that CC(F, μ, ε) ≤ IC(F, μ, ε) + O(1)?
• Less ambitiously:
  CC(F, μ, O(ε)) = O(IC(F, μ, ε))?
• (Almost) equivalently: given a protocol π with IC(π, μ) = I, can Alice and Bob simulate π using O(I) communication?
• Not known in general…
Applications
• Information = amortized communication means that to understand the amortized communication cost of a problem, it is enough to understand its information complexity.
Example: the disjointness function
• X, Y are subsets of {1, …, n}.
• Alice gets X, Bob gets Y.
• Need to determine whether X ∩ Y = ∅.
• In binary notation, need to compute
  ⋁_{i=1}^{n} (X_i ∧ Y_i).
• An operator on n copies of the 2-bit AND function. (A one-line illustration follows below.)
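A one-line illustration, not from the slides, of disjointness as n copies of the two-bit AND: the sets are disjoint exactly when every coordinate-wise AND is 0.

```python
# Sketch: disjointness of sets encoded as bit vectors.
def disj(x_bits, y_bits):
    """1 if the encoded sets are disjoint, else 0."""
    return 0 if any(a & b for a, b in zip(x_bits, y_bits)) else 1

print(disj([1, 0, 1, 0], [0, 1, 0, 1]))   # 1: disjoint
print(disj([1, 0, 1, 0], [0, 0, 1, 1]))   # 0: they share element 3
```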
Set intersection
• X, Y are subsets of {1, …, n}.
• Alice gets X, Bob gets Y.
• Want to compute X ∩ Y.
• This is just n copies of the 2-bit AND.
• Understanding the information complexity of AND gives tight bounds on both problems!
Exact communication bounds
[B.-Garg-Pankratov-Weinstein’13]
• CC(Int_n, 0+) ≥ n (trivial).
• CC(Disj_n, 0+) = Ω(n) [Kalyanasundaram-Schnitger’87, Razborov’92].
New:
• CC(Int_n, 0+) ≈ 1.4922 n ± o(n).
• CC(Disj_n, 0+) ≈ 0.4827 n ± o(n).
Small set disjointness
• X, Y are subsets of {1, …, n}, |X|, |Y| ≤ k.
• Alice gets X, Bob gets Y.
• Need to determine whether X ∩ Y = ∅.
• Trivial: CC(Disj_n^k, 0+) = O(k log n).
• [Hastad-Wigderson’07]: CC(Disj_n^k, 0+) = Θ(k).
• [BGPW’13]: CC(Disj_n^k, 0+) = (2 log₂ e) k ± o(k).
Open problem: Computability of IC
• Given the truth table of F(X, Y), μ, and ε, compute IC(F, μ, ε).
• Via IC(F, μ, ε) = lim_{n→∞} CC(F^n, μ^n, ε)/n one can compute a sequence of upper bounds.
• But the rate of convergence as a function of n is unknown.
Open problem: Computability of IC
• One can compute the r-round information complexity IC_r(F, μ, ε) of F.
• But the rate of convergence as a function of r is unknown.
• Conjecture:
  IC_r(F, μ, ε) − IC(F, μ, ε) = O_{F,μ,ε}(1/r²).
• This is the relationship for the two-bit AND.
Thank You!