Transcript PPT
Introduction to
Quantum Information Processing
CS 667 / PH 767 / CO 681 / AM 871
Lecture 16 (2009)
Richard Cleve
DC 2117
[email protected]
Distinguishing between two
arbitrary quantum states
Holevo-Helstrom Theorem (1)
Theorem: for any two quantum states ρ and σ, the optimal
measurement procedure for distinguishing between them
succeeds with probability ½ + ¼||ρ − σ||tr (equal prior probs.)
Proof* (the attainability part):
Since ρ − σ is Hermitian, its eigenvalues are real
Let Π+ be the projector onto the positive eigenspaces
Let Π− be the projector onto the non-positive eigenspaces
Take the POVM measurement specified by Π+ and Π−, with
the associations Π+ ↔ ρ and Π− ↔ σ (outcome Π+ means “guess ρ”; outcome Π− means “guess σ”)
* The other direction of the theorem (optimality) is omitted here
Holevo-Helstrom Theorem (2)
Claim: this succeeds with probability ½ + ¼||ρ − σ||tr
Proof of Claim:
A key observation is Tr((Π+ − Π−)(ρ − σ)) = ||ρ − σ||tr
The success probability is ps = ½Tr(Π+ρ) + ½Tr(Π−σ)
& the failure probability is pf = ½Tr(Π+σ) + ½Tr(Π−ρ)
Therefore, ps − pf = ½Tr((Π+ − Π−)(ρ − σ)) = ½||ρ − σ||tr
Since ps + pf = 1, it follows that ps = ½ + ¼||ρ − σ||tr
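As an illustration, a minimal numpy sketch of this attainability argument; the example states ρ = |0⟩⟨0| and σ = |+⟩⟨+| are an arbitrary illustrative choice:

```python
import numpy as np

# Example states (illustrative): rho = |0><0|, sigma = |+><+|
rho = np.array([[1.0, 0.0], [0.0, 0.0]])
plus = np.array([1.0, 1.0]) / np.sqrt(2)
sigma = np.outer(plus, plus)

# Diagonalize rho - sigma; Pi_plus / Pi_minus project onto the
# positive / non-positive eigenspaces
evals, evecs = np.linalg.eigh(rho - sigma)
Pi_plus = sum(np.outer(evecs[:, k], evecs[:, k])
              for k in range(len(evals)) if evals[k] > 0)
Pi_minus = np.eye(2) - Pi_plus

# Success probability of the POVM {Pi_plus -> guess rho, Pi_minus -> guess sigma}
# with equal priors, versus the bound 1/2 + 1/4 ||rho - sigma||_tr
p_success = 0.5 * np.trace(Pi_plus @ rho) + 0.5 * np.trace(Pi_minus @ sigma)
trace_norm = np.sum(np.abs(evals))
print(p_success, 0.5 + 0.25 * trace_norm)   # both ~0.854
```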
Purifications & Uhlmann’s Theorem
Any density matrix ρ can be obtained by tracing out part of
some larger pure state:
ρ = Σ_{j=1}^m p_j |φ_j⟩⟨φ_j| = Tr₂ |ψ⟩⟨ψ|,  where  |ψ⟩ = Σ_{j=1}^m √p_j |φ_j⟩|j⟩
(the state |ψ⟩ is a purification of ρ)
Uhlmann’s Theorem*: The fidelity between ρ and σ is the
maximum of |⟨φ|ψ⟩| taken over all purifications |ψ⟩ of ρ and |φ⟩ of σ
* See [Nielsen & Chuang, pp. 410-411] for a proof of this
Recall our previous definition of fidelity as
F(ρ, σ) = Tr √(ρ^{1/2} σ ρ^{1/2}) = ||ρ^{1/2} σ^{1/2}||tr
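A minimal numpy sketch evaluating both expressions, again for the illustrative pair ρ = |0⟩⟨0| and σ = |+⟩⟨+| (the matrix square roots are taken via eigendecomposition):

```python
import numpy as np

def psd_sqrt(m):
    """Square root of a Hermitian positive semidefinite matrix."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0, None))) @ v.conj().T

rho = np.array([[1.0, 0.0], [0.0, 0.0]])     # |0><0|
plus = np.array([1.0, 1.0]) / np.sqrt(2)
sigma = np.outer(plus, plus)                 # |+><+|

sr = psd_sqrt(rho)
F1 = np.trace(psd_sqrt(sr @ sigma @ sr)).real               # Tr sqrt(rho^1/2 sigma rho^1/2)
F2 = np.sum(np.linalg.svd(sr @ psd_sqrt(sigma),             # ||rho^1/2 sigma^1/2||_tr
                          compute_uv=False))                # (sum of singular values)

print(F1, F2)   # both ~0.707 = |<0|+>|
```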
Relationships between fidelity
and trace distance
1 − F(ρ,σ) ≤ ||ρ − σ||tr ≤ √(1 − F(ρ,σ)²)
See [Nielsen & Chuang, pp. 415-416] for more details
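A small randomized check of these bounds; here ||ρ − σ||tr is taken to mean the normalized trace distance ½Tr|ρ − σ| (an assumption about the notation; this is the normalization used in Nielsen & Chuang):

```python
import numpy as np

def psd_sqrt(m):
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0, None))) @ v.conj().T

def fidelity(rho, sigma):
    # F = Tr sqrt( rho^{1/2} sigma rho^{1/2} )
    sr = psd_sqrt(rho)
    return np.trace(psd_sqrt(sr @ sigma @ sr)).real

def trace_distance(rho, sigma):
    # (1/2) Tr |rho - sigma| = half the sum of |eigenvalues| of rho - sigma
    return 0.5 * np.sum(np.abs(np.linalg.eigvalsh(rho - sigma)))

def random_density_matrix(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    m = g @ g.conj().T
    return m / np.trace(m)

rng = np.random.default_rng(0)
for _ in range(1000):
    rho, sigma = random_density_matrix(2, rng), random_density_matrix(2, rng)
    F, D = fidelity(rho, sigma), trace_distance(rho, sigma)
    assert 1 - F <= D + 1e-9 and D <= np.sqrt(1 - F**2) + 1e-9
print("bounds hold on all sampled pairs")
```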
Entropy and
compression
Shannon Entropy
Let p = (p1,…, pd) be a probability distribution on a set {1,…,d}
Then the (Shannon) entropy of p is
H(p1,…, pd) = −Σ_{j=1}^d p_j log p_j
Intuitively, this turns out to be a good measure of “how random”
the distribution p is:
[figures omitted: histograms of example distributions, from the uniform distribution, where H(p) = log d, to a deterministic point distribution, where H(p) = 0]
Operationally, H(p) is the number of bits needed to store the
outcome (in a sense that will be made formal shortly)
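For concreteness, a short function computing H(p) (base-2 logarithm), checked on the two extreme cases:

```python
import numpy as np

def shannon_entropy(p):
    """H(p) = -sum_j p_j log2 p_j, with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -np.sum(nz * np.log2(nz))

d = 8
print(shannon_entropy(np.ones(d) / d))        # uniform distribution: log2(d) = 3.0
print(shannon_entropy([1] + [0] * (d - 1)))   # deterministic outcome: 0.0
```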
Von Neumann Entropy
For a density matrix ρ, it turns out that S(ρ) = −Tr(ρ log ρ) is a
good quantum analog of entropy
Note: S(ρ) = H(p1,…, pd), where p1,…, pd are the eigenvalues
of ρ (with multiplicity)
Operationally, S(ρ) is the number of qubits needed to store ρ
(in a sense that will be made formal later on)
Both the classical and quantum compression results pertain to
the case of large blocks of n independent instances of data:
• the probability distribution p^n in the classical case, and
• the quantum state ρ^{⊗n} in the quantum case
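Correspondingly, a minimal sketch computing S(ρ) from the eigenvalues of ρ:

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log2 rho) = Shannon entropy of rho's eigenvalues."""
    evals = np.linalg.eigvalsh(rho)
    nz = evals[evals > 1e-12]        # drop (numerically) zero eigenvalues: 0 log 0 = 0
    return -np.sum(nz * np.log2(nz))

print(von_neumann_entropy(np.eye(2) / 2))                       # maximally mixed qubit: 1.0
print(von_neumann_entropy(np.array([[1.0, 0.0], [0.0, 0.0]])))  # pure state: 0.0
```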
Classical compression (1)
Let p = (p1,…, pd) be a probability distribution on a set {1,…,d}
where n independent instances are sampled:
( j1,…, jn) ∈ {1,…,d}^n (d^n possibilities, n log d bits to specify one)
Theorem*: for all ε > 0, for sufficiently large n, there is a
scheme that compresses the specification to n(H(p) + ε) bits
while introducing an error with probability at most ε
Intuitively, there is a subset of {1,…,d}^n, called the “typical
sequences”, that has size 2^{n(H(p) + ε)} and probability at least 1 − ε
A nice way to prove the theorem is based on two cleverly
defined random variables …
* “Plain vanilla” version that ignores, for example, the tradeoffs between n and ε
Classical compression (2)
Define the random variable f : {1,…,d} → R as f( j ) = −log p_j
Note that E[ f ] = Σ_{j=1}^d p_j f( j ) = −Σ_{j=1}^d p_j log p_j = H(p1,…, pd)
Define g : {1,…,d}^n → R as g( j1,…, jn) = (1/n)( f( j1) + ⋯ + f( jn) )
Thus E[ g ] = H(p1,…, pd)
Also, g( j1,…, jn) = −(1/n) log( p_{j1} p_{j2} ⋯ p_{jn} )
Classical compression (3)
By standard results in statistics, as n → ∞, the observed
value of g( j1,…, jn) approaches its expected value, H(p)
More formally, call ( j1,…, jn) ∈ {1,…,d}^n ε-typical if
| g( j1,…, jn) − H(p) | ≤ ε
Then, the result is that, for all ε > 0, for sufficiently large n,
Pr[( j1,…, jn) is ε-typical] ≥ 1 − ε
We can also bound the number of these ε-typical sequences:
• By definition, each such sequence has probability ≥ 2^{−n(H(p) + ε)}
• Therefore, there can be at most 2^{n(H(p) + ε)} such sequences
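A small simulation makes the concentration concrete; the distribution p = (0.9, 0.1) and the parameters n, ε, and the number of trials are arbitrary illustrative choices:

```python
import numpy as np

p = np.array([0.9, 0.1])                 # d = 2
H = -np.sum(p * np.log2(p))              # H(p) ~ 0.469 bits
n, eps, trials = 1000, 0.1, 10000
rng = np.random.default_rng(0)

# Sample sequences and evaluate g(j1..jn) = -(1/n) log2(p_j1 ... p_jn)
samples = rng.choice(len(p), size=(trials, n), p=p)
g = -np.mean(np.log2(p[samples]), axis=1)

print("fraction of eps-typical samples:", np.mean(np.abs(g - H) <= eps))  # close to 1
print("n(H(p)+eps) =", n * (H + eps), "bits, versus n log d =", n, "bits")
```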
Classical compression (4)
In summary, the compression procedure is as follows:
The input data is ( j1,…, jn) ∈ {1,…,d}^n, each coordinate independently
sampled according to the probability distribution p = (p1,…, pd)
The compression procedure is to leave ( j1,…, jn) intact if it is
ε-typical and otherwise change it to some fixed ε-typical
sequence, say, ( j,…, j) (which will result in an error)
Since there are at most 2^{n(H(p) + ε)} ε-typical sequences, the data
can then be converted into n(H(p) + ε) bits
The error probability is at most ε, the probability of an atypical
input arising
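A toy version of this scheme (it enumerates all of {1,…,d}^n, so it is only feasible for small n; the parameters are illustrative):

```python
import itertools
import numpy as np

def typical_set(p, n, eps):
    """All eps-typical sequences in {0,...,d-1}^n, i.e. those with |g - H(p)| <= eps."""
    H = -sum(q * np.log2(q) for q in p if q > 0)
    typ = []
    for seq in itertools.product(range(len(p)), repeat=n):
        g = -np.mean([np.log2(p[j]) for j in seq])
        if abs(g - H) <= eps:
            typ.append(seq)
    return typ

p, n, eps = [0.75, 0.25], 16, 0.15
typ = typical_set(p, n, eps)
index = {seq: i for i, seq in enumerate(typ)}   # typical sequence -> codeword number

def compress(seq):
    # atypical inputs are mapped to a fixed typical sequence (codeword 0) -> an error
    return index.get(tuple(seq), 0)

def decompress(code):
    return typ[code]

print(len(typ), "typical sequences;",
      int(np.ceil(np.log2(len(typ)))), "bits per block instead of", n)
```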
Quantum compression (1)
The scenario: n independent instances of a d-dimensional
state are randomly generated according to some distribution:
|φ1⟩ prob. p1
…
|φr⟩ prob. pr
Example:
|0⟩ prob. ½
|+⟩ prob. ½
Goal: to “compress” this into as few qubits as possible so that
the original state can be reconstructed with small error in the
following sense …
The expected* trace distance between the reconstructed state
and the state that was actually generated should be small
* Defined as the expected value of the trace distance, taken with respect to the
randomness of the generation procedure
Quantum compression (2)
Define ρ = Σ_{i=1}^r p_i |φ_i⟩⟨φ_i|
Theorem: for all ε > 0, for sufficiently large n, there is a
scheme that compresses the data to n(S(ρ) + ε) qubits,
with expected trace distance at most √(2ε)
For the aforementioned example, 0.6n qubits suffice
The compression method:
Express ρ in its eigenbasis, as ρ = Σ_{j=1}^d q_j |ψ_j⟩⟨ψ_j|
With respect to this basis, we will define an ε-typical subspace
of dimension 2^{n(S(ρ) + ε)} = 2^{n(H(q) + ε)}
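A quick check of the ≈0.6 figure for the |0⟩ / |+⟩ example:

```python
import numpy as np

# rho = (1/2)|0><0| + (1/2)|+><+| for the example above
plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho = 0.5 * np.diag([1.0, 0.0]) + 0.5 * np.outer(plus, plus)

q = np.linalg.eigvalsh(rho)       # eigenvalues ~ (0.146, 0.854)
S = -np.sum(q * np.log2(q))       # von Neumann entropy
print(S)                          # ~0.601, hence ~0.6n qubits for n instances
```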
Quantum compression (3)
The ε-typical subspace is that spanned by the states |ψ_{j1}⟩⋯|ψ_{jn}⟩
where ( j1,…, jn) is ε-typical with respect to (q1,…, qd)
Define Πtyp as the projector onto the ε-typical subspace
By the same argument as in the classical case, the subspace
has dimension at most 2^{n(S(ρ) + ε)} and Tr(Πtyp ρ^{⊗n}) ≥ 1 − ε
This is because ρ is the density matrix of
|ψ1⟩ prob. q1
…
|ψd⟩ prob. qd
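A numerical sketch for the |0⟩ / |+⟩ example (n and ε are illustrative choices): since ρ^{⊗n} and Πtyp are diagonal in the same product eigenbasis, Tr(Πtyp ρ^{⊗n}) is just the total q-probability of the ε-typical sequences, grouped below by how many factors carry the smaller eigenvalue:

```python
import numpy as np
from math import comb

plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho = 0.5 * np.diag([1.0, 0.0]) + 0.5 * np.outer(plus, plus)
q = np.linalg.eigvalsh(rho)          # q[0] ~ 0.146, q[1] ~ 0.854
S = -np.sum(q * np.log2(q))          # ~0.601

n, eps = 200, 0.1
dim_typ, weight = 0, 0.0
for k in range(n + 1):               # k = number of factors with eigenvalue q[0]
    g = -(k * np.log2(q[0]) + (n - k) * np.log2(q[1])) / n
    if abs(g - S) <= eps:            # these sequences span the typical subspace
        dim_typ += comb(n, k)
        weight += comb(n, k) * q[0]**k * q[1]**(n - k)

print("dimension <= 2^{n(S+eps)}:", dim_typ <= 2 ** (n * (S + eps)))  # True
print("Tr(Pi_typ rho^n) =", weight)  # ~0.9 here; approaches 1 as n grows
```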
Quantum compression (4)
Calculation of the expected fidelity:
Σ_{i1…in} p_{i1…in} ⟨φ_{i1…in}| Πtyp |φ_{i1…in}⟩
= Σ_{i1…in} p_{i1…in} Tr( Πtyp |φ_{i1…in}⟩⟨φ_{i1…in}| )
= Tr( Πtyp Σ_{i1…in} p_{i1…in} |φ_{i1…in}⟩⟨φ_{i1…in}| )
= Tr( Πtyp ρ^{⊗n} )
≥ 1 − ε
Abbreviations: p_{i1…in} = p_{i1} ⋯ p_{in} and |φ_{i1…in}⟩ = |φ_{i1}⟩ ⋯ |φ_{in}⟩
Using ||ρ − σ||tr ≤ √(1 − F(ρ,σ)²), we can upper bound the
expected trace distance by √(2ε)
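The chain of equalities can be checked by brute force for small n (illustrative parameters, with the |0⟩ / |+⟩ generation distribution):

```python
import numpy as np
from itertools import product
from functools import reduce

kets = [np.array([1.0, 0.0]), np.array([1.0, 1.0]) / np.sqrt(2)]   # |0>, |+>
probs = [0.5, 0.5]
rho = sum(p * np.outer(k, k) for p, k in zip(probs, kets))

n, eps = 6, 0.3
q, V = np.linalg.eigh(rho)               # eigenbasis of rho
S = -np.sum(q * np.log2(q))

# Projector onto the eps-typical subspace
Pi = np.zeros((2**n, 2**n))
for js in product(range(2), repeat=n):
    g = -np.mean([np.log2(q[j]) for j in js])
    if abs(g - S) <= eps:
        v = reduce(np.kron, [V[:, j] for j in js])
        Pi += np.outer(v, v)

# Left-hand side: average of <phi|Pi_typ|phi> over the generated sequences
lhs = 0.0
for idx in product(range(2), repeat=n):
    p_seq = np.prod([probs[i] for i in idx])
    phi = reduce(np.kron, [kets[i] for i in idx])
    lhs += p_seq * (phi @ Pi @ phi)

# Right-hand side: Tr(Pi_typ rho^{tensor n})
rho_n = reduce(np.kron, [rho] * n)
print(lhs, np.trace(Pi @ rho_n))         # the two agree
```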