Information Theory / Cover T.M., Thomas J.A. Elements of Information Theory. 2006. 748 p.
336 RATE DISTORTION THEORY
Rate distortion theorem. If $R > R(D)$, there exists a sequence of codes $\hat{X}^n(X^n)$ with the number of codewords $|\hat{X}^n(\cdot)| \le 2^{nR}$ with $E\,d(X^n, \hat{X}^n(X^n)) \to D$. If $R < R(D)$, no such codes exist.
Bernoulli source. For a Bernoulli source with Hamming distortion,
$R(D) = H(p) - H(D)$. (10.149)
Gaussian source. For a Gaussian source with squared-error distortion,
$R(D) = \frac{1}{2} \log \frac{\sigma^2}{D}$. (10.150)
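The two closed forms (10.149) and (10.150) can be evaluated directly. The sketch below is illustrative only: the function names are hypothetical, rates are in bits (logarithms base 2), and it assumes the usual convention that $R(D) = 0$ once $D$ reaches $\min(p, 1-p)$ for the Bernoulli source or $\sigma^2$ for the Gaussian source.

```python
import math


def binary_entropy(q: float) -> float:
    """Binary entropy H(q) in bits, using the convention 0 log 0 = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)


def rate_distortion_bernoulli(p: float, D: float) -> float:
    """R(D) = H(p) - H(D) for Hamming distortion; zero for D >= min(p, 1-p)."""
    if D < 0.0:
        raise ValueError("distortion must be nonnegative")
    if D >= min(p, 1.0 - p):
        return 0.0
    return binary_entropy(p) - binary_entropy(D)


def rate_distortion_gaussian(variance: float, D: float) -> float:
    """R(D) = (1/2) log2(variance / D) for squared error; zero for D >= variance."""
    if D <= 0.0:
        raise ValueError("distortion must be positive")
    if D >= variance:
        return 0.0
    return 0.5 * math.log2(variance / D)


if __name__ == "__main__":
    print(rate_distortion_bernoulli(0.5, 0.1))
    print(rate_distortion_gaussian(1.0, 0.25))
```

For instance, a unit-variance Gaussian source described at distortion $D = 1/4$ needs one bit per sample under this formula.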
Source–channel separation. A source with rate distortion $R(D)$ can be sent over a channel of capacity $C$ and recovered with distortion $D$ if and only if $R(D) < C$.
Multivariate Gaussian source. The rate distortion function for a multivariate normal vector with Euclidean mean-squared-error distortion is given by reverse water-filling on the eigenvalues.
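Reverse water-filling can be sketched numerically. The code below is a minimal illustration, not a reference implementation: it assumes the eigenvalues of the covariance matrix are already available as a list `variances`, that `D` is the total squared-error distortion summed over components, and that rates are in bits. The function name and the bisection search for the water level are choices made here for illustration.

```python
import math


def reverse_water_filling(variances, D):
    """Return (rate_in_bits, per_component_distortions) for total distortion D.

    Each component receives distortion min(level, variance), where the water
    level is chosen so that the distortions sum to D.
    """
    if D <= 0.0:
        raise ValueError("total distortion must be positive")
    if D >= sum(variances):
        # Every component can be reproduced by its mean at zero rate.
        return 0.0, list(variances)

    lo, hi = 0.0, max(variances)
    for _ in range(200):  # bisection on the water level
        level = 0.5 * (lo + hi)
        if sum(min(level, v) for v in variances) < D:
            lo = level
        else:
            hi = level
    level = 0.5 * (lo + hi)

    distortions = [min(level, v) for v in variances]
    # Components with variance below the water level contribute zero rate.
    rate = sum(0.5 * math.log2(v / d)
               for v, d in zip(variances, distortions) if d < v)
    return rate, distortions


if __name__ == "__main__":
    print(reverse_water_filling([4.0, 1.0], 0.5))
    print(reverse_water_filling([4.0, 1.0], 2.0))
```

With eigenvalues 4 and 1 and total distortion 0.5, both components are described (water level 0.25); with total distortion 2, the water level reaches the smaller eigenvalue and only the first component receives any rate.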
PROBLEMS
10.1 One-bit quantization of a single Gaussian random variable. Let $X \sim \mathcal{N}(0, \sigma^2)$ and let the distortion measure be squared error. Here we do not allow block descriptions. Show that the optimum reproduction points for 1-bit quantization are $\pm\sqrt{2/\pi}\,\sigma$ and that the expected distortion for 1-bit quantization is $\frac{\pi - 2}{\pi}\sigma^2$. Compare this with the distortion rate bound $D = \sigma^2 2^{-2R}$ for $R = 1$.
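The values stated in Problem 10.1 can be checked numerically before attempting the proof. The sketch below is only a sanity check under stated assumptions: it takes the quantizer threshold at zero as given (by symmetry) and does not establish optimality, it uses a plain midpoint rule truncated at ten standard deviations, and the function names are hypothetical.

```python
import math


def gaussian_pdf(x: float, sigma: float) -> float:
    """Density of N(0, sigma^2) at x."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))


def one_bit_quantizer(sigma: float, steps: int = 100_000, span: float = 10.0):
    """Estimate the centroid of the positive half-line and the resulting MSE.

    Integrates over [0, span * sigma] with the midpoint rule. The negative
    half-line uses the mirrored reproduction point and contributes the same
    distortion.
    """
    h = span * sigma / steps
    mass = first = second = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        w = gaussian_pdf(x, sigma) * h
        mass += w
        first += x * w
        second += x * x * w
    point = first / mass  # E[X | X > 0]
    distortion = 2.0 * (second - 2.0 * point * first + point * point * mass)
    return point, distortion


if __name__ == "__main__":
    sigma = 1.0
    point, distortion = one_bit_quantizer(sigma)
    print(point, math.sqrt(2.0 / math.pi) * sigma)
    print(distortion, (math.pi - 2.0) / math.pi * sigma ** 2)
    print("distortion rate bound at R = 1:", sigma ** 2 * 2.0 ** (-2))
```

The estimated distortion stays above the bound $\sigma^2/4$, which is the comparison the problem asks for.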
10.2 Rate distortion function with infinite distortion. Find the rate distortion function $R(D) = \min I(X; \hat{X})$ for $X \sim \text{Bernoulli}(\frac{1}{2})$ and distortion
$d(x, \hat{x}) = \begin{cases} 0, & x = \hat{x} \\ 1, & x = 1,\ \hat{x} = 0 \\ \infty, & x = 0,\ \hat{x} = 1. \end{cases}$
10.3 Rate distortion for binary source with asymmetric distortion. Fix $p(\hat{x}|x)$ and evaluate $I(X; \hat{X})$ and $D$ for $X \sim \text{Bernoulli}(\frac{1}{2})$,
$d(x, \hat{x}) = \begin{bmatrix} 0 & a \\ b & 0 \end{bmatrix}.$
(The rate distortion function cannot be expressed in closed form.)
10.4 Properties of R(D). Consider a discrete source $X \in \mathcal{X} = \{1, 2, \ldots, m\}$ with distribution $p_1, p_2, \ldots, p_m$ and a distortion measure $d(i, j)$. Let $R(D)$ be the rate distortion function for this source and distortion measure. Let $d'(i, j) = d(i, j) - w_i$ be a new distortion measure, and let $R'(D)$ be the corresponding rate distortion function. Show that $R'(D) = R(D + \bar{w})$, where $\bar{w} = \sum_i p_i w_i$, and use this to show that there is no essential loss of generality in assuming that $\min_{\hat{x}} d(i, \hat{x}) = 0$ (i.e., for each $x \in \mathcal{X}$, there is one symbol $\hat{x}$ that reproduces the source with zero distortion). This result is due to Pinkston [420].
10.5 Rate distortion for uniform source with Hamming distortion. Consider a source $X$ uniformly distributed on the set $\{1, 2, \ldots, m\}$. Find the rate distortion function for this source with Hamming distortion; that is,
$d(x, \hat{x}) = \begin{cases} 0 & \text{if } x = \hat{x}, \\ 1 & \text{if } x \ne \hat{x}. \end{cases}$
10.6 Shannon lower bound for the rate distortion function. Consider a source $X$ with a distortion measure $d(x, \hat{x})$ that satisfies the following property: All columns of the distortion matrix are permutations of the set $\{d_1, d_2, \ldots, d_m\}$. Define the function
$\phi(D) = \max_{\mathbf{p}:\ \sum_{i=1}^{m} p_i d_i \le D} H(\mathbf{p}).$ (10.151)
The Shannon lower bound on the rate distortion function [485] is proved by the following steps:
(a) Show that $\phi(D)$ is a concave function of $D$.
(b) Justify the following series of inequalities for $I(X; \hat{X})$ if $E\,d(X, \hat{X}) \le D$,
$I(X; \hat{X}) = H(X) - H(X|\hat{X})$ (10.152)
$= H(X) - \sum_{\hat{x}} p(\hat{x}) H(X|\hat{X} = \hat{x})$ (10.153)
$\ge H(X) - \sum_{\hat{x}} p(\hat{x}) \phi(D_{\hat{x}})$ (10.154)
$\ge H(X) - \phi\left(\sum_{\hat{x}} p(\hat{x}) D_{\hat{x}}\right)$ (10.155)
$\ge H(X) - \phi(D),$ (10.156)
where $D_{\hat{x}} = \sum_x p(x|\hat{x}) d(x, \hat{x})$.
(c) Argue that
$R(D) \ge H(X) - \phi(D),$ (10.157)
which is the Shannon lower bound on the rate distortion function.
(d) If, in addition, we assume that the source has a uniform distribution and that the rows of the distortion matrix are permutations of each other, then $R(D) = H(X) - \phi(D)$ (i.e., the lower bound is tight).
10.7 Erasure distortion. Consider $X \sim \text{Bernoulli}(\frac{1}{2})$, and let the distortion measure be given by the matrix
$d(x, \hat{x}) = \begin{bmatrix} 0 & 1 & \infty \\ \infty & 1 & 0 \end{bmatrix}.$ (10.158)
Calculate the rate distortion function for this source. Can you suggest a simple scheme to achieve any value of the rate distortion function for this source?
10.8 Bounds on the rate distortion function for squared-error distortion. For the case of a continuous random variable $X$ with mean zero and variance $\sigma^2$ and squared-error distortion, show that
$h(X) - \frac{1}{2} \log(2\pi e D) \le R(D) \le \frac{1}{2} \log \frac{\sigma^2}{D}.$ (10.159)
For the upper bound, consider the following joint distribution:
[Figure: noise $Z \sim \mathcal{N}\left(0, \frac{D\sigma^2}{\sigma^2 - D}\right)$ is added to $X$, and the sum is scaled to give $\hat{X} = \frac{\sigma^2 - D}{\sigma^2}(X + Z)$.]
Are Gaussian random variables harder or easier to describe than other random variables with the same variance?
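To get a feel for the bounds in (10.159), one can evaluate both sides for a specific non-Gaussian source. The sketch below uses a Laplace density as an example, which is a choice made here and not part of the problem. It relies on the standard facts that a Laplace density with scale $b$ has variance $2b^2$ and differential entropy $\log(2be)$, and that a Gaussian has differential entropy $\frac{1}{2}\log(2\pi e \sigma^2)$. Function names are hypothetical and all quantities are in bits.

```python
import math


def shannon_lower_bound(h_bits: float, D: float) -> float:
    """Left side of (10.159): h(X) - (1/2) log2(2 pi e D)."""
    return h_bits - 0.5 * math.log2(2.0 * math.pi * math.e * D)


def gaussian_upper_bound(variance: float, D: float) -> float:
    """Right side of (10.159): (1/2) log2(variance / D)."""
    return 0.5 * math.log2(variance / D)


def laplace_entropy_bits(variance: float) -> float:
    """Differential entropy of a zero-mean Laplace density with the given variance."""
    b = math.sqrt(variance / 2.0)  # variance = 2 b^2
    return math.log2(2.0 * b * math.e)


def gaussian_entropy_bits(variance: float) -> float:
    """Differential entropy of N(0, variance)."""
    return 0.5 * math.log2(2.0 * math.pi * math.e * variance)


if __name__ == "__main__":
    variance = 1.0
    for D in (0.01, 0.1, 0.5):
        lower = shannon_lower_bound(laplace_entropy_bits(variance), D)
        upper = gaussian_upper_bound(variance, D)
        print(f"D={D}: {lower:.4f} <= R(D) <= {upper:.4f}")
```

The gap between the two bounds equals the difference between the Gaussian entropy and the source entropy at the same variance, so it does not depend on $D$ and vanishes when the source itself is Gaussian.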
10.9 Properties of optimal rate distortion code. A good $(R, D)$ rate distortion code with $R \approx R(D)$ puts severe constraints on the relationship of the source $X^n$ and the representations $\hat{X}^n$. Examine the chain of inequalities (10.58–10.71) considering the conditions for equality and interpret as properties of a good code. For example, equality in (10.59) implies that $\hat{X}^n$ is a deterministic function of $X^n$.
10.10 Rate distortion. Find and verify the rate distortion function $R(D)$ for $X$ uniform on $\mathcal{X} = \{1, 2, \ldots, 2m\}$ and
$d(x, \hat{x}) = \begin{cases} 1 & \text{for } x - \hat{x} \text{ odd}, \\ 0 & \text{for } x - \hat{x} \text{ even}, \end{cases}$
where $\hat{X}$ is defined on $\hat{\mathcal{X}} = \{1, 2, \ldots, 2m\}$. (You may wish to use the Shannon lower bound in your argument.)
10.11 Lower bound. Let
$X \sim \frac{e^{-x^4}}{\int_{-\infty}^{\infty} e^{-x^4}\,dx}$
and
$\frac{\int x^4 e^{-x^4}\,dx}{\int e^{-x^4}\,dx} = c.$
Define $g(a) = \max h(X)$ over all densities such that $EX^4 \le a$. Let $R(D)$ be the rate distortion function for $X$ with the density above and with distortion criterion $d(x, \hat{x}) = (x - \hat{x})^4$. Show that
$R(D) \ge g(c) - g(D).$
10.12 Adding a column to the distortion matrix. Let $R(D)$ be the rate distortion function for an i.i.d. process with probability mass function $p(x)$ and distortion function $d(x, \hat{x})$, $x \in \mathcal{X}$, $\hat{x} \in \hat{\mathcal{X}}$. Now suppose that we add a new reproduction symbol $\hat{x}_0$ to $\hat{\mathcal{X}}$ with associated distortion $d(x, \hat{x}_0)$, $x \in \mathcal{X}$. Does this increase or decrease $R(D)$, and why?
10.13 Simplification. Suppose that $\mathcal{X} = \{1, 2, 3, 4\}$, $\hat{\mathcal{X}} = \{1, 2, 3, 4\}$, $p(i) = \frac{1}{4}$, $i = 1, 2, 3, 4$, and $X_1, X_2, \ldots$ are i.i.d. $\sim p(x)$. The distortion matrix $d(x, \hat{x})$ is given by (rows indexed by $x$, columns by $\hat{x}$)

      1   2   3   4
  1   0   0   1   1
  2   0   0   1   1
  3   1   1   0   0
  4   1   1   0   0

(a) Find $R(0)$, the rate necessary to describe the process with zero distortion.
(b) Find the rate distortion function $R(D)$. There are some irrelevant distinctions in alphabets $\mathcal{X}$ and $\hat{\mathcal{X}}$, which allow the problem to be collapsed.
(c) Suppose that we have a nonuniform distribution $p(i) = p_i$, $i = 1, 2, 3, 4$. What is $R(D)$?
10.14 Rate distortion for two independent sources. Can one compress two independent sources simultaneously better than by compressing the sources individually? The following problem addresses this question. Let $\{X_i\}$ be i.i.d. $\sim p(x)$ with distortion $d(x, \hat{x})$ and rate distortion function $R_X(D)$. Similarly, let $\{Y_i\}$ be i.i.d. $\sim p(y)$ with distortion $d(y, \hat{y})$ and rate distortion function $R_Y(D)$. Suppose we now wish to describe the process $\{(X_i, Y_i)\}$ subject to distortions $E\,d(X, \hat{X}) \le D_1$ and $E\,d(Y, \hat{Y}) \le D_2$. Thus, a rate $R_{X,Y}(D_1, D_2)$ is sufficient, where
$R_{X,Y}(D_1, D_2) = \min_{p(\hat{x}, \hat{y}|x, y):\ E\,d(X, \hat{X}) \le D_1,\ E\,d(Y, \hat{Y}) \le D_2} I(X, Y; \hat{X}, \hat{Y}).$
Now suppose that the $\{X_i\}$ process and the $\{Y_i\}$ process are independent of each other.
(a) Show that
$R_{X,Y}(D_1, D_2) \ge R_X(D_1) + R_Y(D_2).$
one of the sequences is fixed and the other is random. The techniques of weak typicality allow us only to calculate the average set size of the conditionally typical set. Using the ideas of strong typicality, on the other hand, provides us with stronger bounds that work for all typical $x^n$ sequences. We outline the proof that $\Pr\{(x^n, Y^n) \in A_\epsilon^{*(n)}\} \approx 2^{-nI(X;Y)}$ for all typical $x^n$. This approach was introduced by Berger [53] and is fully developed in the book by Csiszár and Körner [149].
Let $(X_i, Y_i)$ be drawn i.i.d. $\sim p(x, y)$. Let the marginals of $X$ and $Y$ be $p(x)$ and $p(y)$, respectively.
(a) Let $A_\epsilon^{*(n)}$ be the strongly typical set for $X$. Show that
$|A_\epsilon^{*(n)}| \doteq 2^{nH(X)}.$ (10.168)
(Hint: Theorems 11.1.1 and 11.1.3.)
(b) The joint type of a pair of sequences $(x^n, y^n)$ is the proportion of times $(x_i, y_i) = (a, b)$ in the pair of sequences:
$p_{x^n, y^n}(a, b) = \frac{1}{n} N(a, b|x^n, y^n) = \frac{1}{n} \sum_{i=1}^{n} I(x_i = a, y_i = b).$ (10.169)
The conditional type of a sequence $y^n$ given $x^n$ is a stochastic matrix that gives the proportion of times a particular element of $\mathcal{Y}$ occurred with each element of $\mathcal{X}$ in the pair of sequences. Specifically, the conditional type $V_{y^n|x^n}(b|a)$ is defined as
$V_{y^n|x^n}(b|a) = \frac{N(a, b|x^n, y^n)}{N(a|x^n)}.$ (10.170)
Show that the number of conditional types is bounded by $(n + 1)^{|\mathcal{X}||\mathcal{Y}|}$.
(c) The set of sequences $y^n \in \mathcal{Y}^n$ with conditional type $V$ with respect to a sequence $x^n$ is called the conditional type class $T_V(x^n)$. Show that
$\frac{1}{(n + 1)^{|\mathcal{X}||\mathcal{Y}|}} 2^{nH(Y|X)} \le |T_V(x^n)| \le 2^{nH(Y|X)}.$ (10.171)
(d) The sequence $y^n \in \mathcal{Y}^n$ is said to be $\epsilon$-strongly conditionally typical with the sequence $x^n$ with respect to the conditional distribution $V(\cdot|\cdot)$ if the conditional type is close to $V$. The conditional type should satisfy the following two conditions:
(i) For all $(a, b) \in \mathcal{X} \times \mathcal{Y}$ with $V(b|a) > 0$,
$\frac{1}{n} \left| N(a, b|x^n, y^n) - V(b|a) N(a|x^n) \right| \le \frac{\epsilon}{|\mathcal{Y}| + 1}.$ (10.172)
(ii) $N(a, b|x^n, y^n) = 0$ for all $(a, b)$ such that $V(b|a) = 0$.
The set of such sequences is called the conditionally typical set and is denoted $A_\epsilon^{*(n)}(Y|x^n)$. Show that the number of sequences $y^n$ that are conditionally typical with a given $x^n \in \mathcal{X}^n$ is bounded by
$\frac{1}{(n + 1)^{|\mathcal{X}||\mathcal{Y}|}} 2^{n(H(Y|X) - \epsilon_1)} \le |A_\epsilon^{*(n)}(Y|x^n)| \le (n + 1)^{|\mathcal{X}||\mathcal{Y}|} 2^{n(H(Y|X) + \epsilon_1)},$ (10.173)
where $\epsilon_1 \to 0$ as $\epsilon \to 0$.
(e) For a pair of random variables $(X, Y)$ with joint distribution $p(x, y)$, the $\epsilon$-strongly typical set $A_\epsilon^{*(n)}$ is the set of sequences $(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n$ satisfying
(i) $\left| \frac{1}{n} N(a, b|x^n, y^n) - p(a, b) \right| < \frac{\epsilon}{|\mathcal{X}||\mathcal{Y}|}$ (10.174)
for every pair $(a, b) \in \mathcal{X} \times \mathcal{Y}$ with $p(a, b) > 0$.
(ii) $N(a, b|x^n, y^n) = 0$ for all $(a, b) \in \mathcal{X} \times \mathcal{Y}$ with $p(a, b) = 0$.
The set of $\epsilon$-strongly jointly typical sequences is called the $\epsilon$-strongly jointly typical set and is denoted $A_\epsilon^{*(n)}(X, Y)$. Let $(X, Y)$ be drawn i.i.d. $\sim p(x, y)$. For any $x^n$ such that there exists at least one pair $(x^n, y^n) \in A_\epsilon^{*(n)}(X, Y)$, the set of sequences $y^n$ such that $(x^n, y^n) \in A_\epsilon^{*(n)}$ satisfies
$\frac{1}{(n + 1)^{|\mathcal{X}||\mathcal{Y}|}} 2^{n(H(Y|X) - \delta(\epsilon))} \le |\{y^n : (x^n, y^n) \in A_\epsilon^{*(n)}\}| \le (n + 1)^{|\mathcal{X}||\mathcal{Y}|} 2^{n(H(Y|X) + \delta(\epsilon))},$ (10.175)
where $\delta(\epsilon) \to 0$ as $\epsilon \to 0$. In particular, we can write
$2^{n(H(Y|X) - \epsilon_2)} \le |\{y^n : (x^n, y^n) \in A_\epsilon^{*(n)}\}| \le 2^{n(H(Y|X) + \epsilon_2)},$ (10.176)
where we can make $\epsilon_2$ arbitrarily small with an appropriate choice of $\epsilon$ and $n$.
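(f) Let $Y_1, Y_2, \ldots, Y_n$ be drawn i.i.d. $\sim p(y_i)$. For $x^n \in A_\epsilon^{*(n)}$, the probability that $(x^n, Y^n) \in A_\epsilon^{*(n)}$ is bounded by
$2^{-n(I(X;Y) + \epsilon_3)} \le \Pr((x^n, Y^n) \in A_\epsilon^{*(n)}) \le 2^{-n(I(X;Y) - \epsilon_3)},$ (10.177)
where $\epsilon_3$ goes to 0 as $\epsilon \to 0$ and $n \to \infty$.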
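The definitions used in parts (b) and (e), namely the joint type (10.169), the conditional type (10.170), and the two conditions for $\epsilon$-strong joint typicality, translate directly into counting code. The sketch below is illustrative only: the function names are hypothetical, and the joint distribution is assumed to be given as a dictionary `p` mapping every pair `(a, b)` of the product alphabet to its probability, including pairs with probability zero.

```python
from collections import Counter


def joint_type(xs, ys):
    """Empirical joint distribution N(a, b | x^n, y^n) / n, as in (10.169)."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have equal length")
    n = len(xs)
    return {pair: count / n for pair, count in Counter(zip(xs, ys)).items()}


def conditional_type(xs, ys):
    """Map (a, b) to V(b|a) = N(a, b | x^n, y^n) / N(a | x^n), as in (10.170)."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have equal length")
    x_counts = Counter(xs)
    return {(a, b): count / x_counts[a]
            for (a, b), count in Counter(zip(xs, ys)).items()}


def is_strongly_jointly_typical(xs, ys, p, eps):
    """Check conditions (i) and (ii) of part (e) against the distribution p."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have equal length")
    n = len(xs)
    counts = Counter(zip(xs, ys))
    # Condition (ii): no pair with p(a, b) = 0 may occur.
    if any(p.get(pair, 0.0) == 0.0 for pair in counts):
        return False
    # Condition (i): each empirical frequency is within eps / (|X||Y|) of p(a, b).
    size_x = len({a for a, _ in p})
    size_y = len({b for _, b in p})
    bound = eps / (size_x * size_y)
    return all(abs(counts.get(pair, 0) / n - prob) < bound
               for pair, prob in p.items() if prob > 0.0)


if __name__ == "__main__":
    xs, ys = [0, 0, 1, 1], [0, 1, 1, 1]
    print(joint_type(xs, ys))
    print(conditional_type(xs, ys))
    p = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.0, (1, 1): 0.5}
    print(is_strongly_jointly_typical(xs, ys, p, eps=0.1))
```

Condition (ii) is what separates strong from weak typicality here: a single occurrence of a zero-probability pair rules the sequence out, regardless of $\epsilon$.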
10.17 Source–channel separation theorem with distortion. Let $V_1, V_2, \ldots, V_n$ be a finite alphabet i.i.d. source which is encoded as a sequence of $n$ input symbols $X^n$ of a discrete memoryless channel. The output of the channel $Y^n$ is mapped onto the reconstruction alphabet $\hat{V}^n = g(Y^n)$. Let $D = E\,d(V^n, \hat{V}^n) = \frac{1}{n} \sum_{i=1}^{n} E\,d(V_i, \hat{V}_i)$ be the average distortion achieved by this combined source and channel coding scheme.
[Figure: $V^n \to X^n(V^n) \to$ Channel (capacity $C$) $\to Y^n \to \hat{V}^n$]
(a) Show that if $C > R(D)$, where $R(D)$ is the rate distortion function for $V$, it is possible to find encoders and decoders that achieve an average distortion arbitrarily close to $D$.
(b) (Converse) Show that if the average distortion is equal to $D$, the capacity of the channel $C$ must be greater than $R(D)$.
10.18 Rate distortion. Let $d(x, \hat{x})$ be a distortion function. We have a source $X \sim p(x)$. Let $R(D)$ be the associated rate distortion function.
(a) Find $\tilde{R}(D)$ in terms of $R(D)$, where $\tilde{R}(D)$ is the rate distortion function associated with the distortion $\tilde{d}(x, \hat{x}) = d(x, \hat{x}) + a$ for some constant $a > 0$. (They are not equal.)
(b) Now suppose that $d(x, \hat{x}) \ge 0$ for all $x, \hat{x}$ and define a new distortion function $d^*(x, \hat{x}) = b\,d(x, \hat{x})$, where $b$ is some number $\ge 0$. Find the associated rate distortion function $R^*(D)$ in terms of $R(D)$.
(c) Let $X \sim \mathcal{N}(0, \sigma^2)$ and $d(x, \hat{x}) = 5(x - \hat{x})^2 + 3$. What is $R(D)$?
