
following recursion:

\begin{bmatrix} X_1(n) \\ X_2(n) \\ X_3(n) \end{bmatrix} =
\begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X_1(n-1) \\ X_2(n-1) \\ X_3(n-1) \end{bmatrix}                    (4.101)

with initial conditions X(1) = [1 1 0]^t.
(c) Let

A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}.                    (4.102)
Then we have by induction

X(n) = AX(n − 1) = A^2 X(n − 2) = · · · = A^{n−1} X(1).                    (4.103)
Using the eigenvalue decomposition of A for the case of distinct eigenvalues, we can write A = U^{−1}ΛU, where Λ is the diagonal matrix of eigenvalues. Then A^{n−1} = U^{−1}Λ^{n−1}U. Show that we can write

X(n) = λ1^{n−1} Y1 + λ2^{n−1} Y2 + λ3^{n−1} Y3,                    (4.104)
where Y1, Y2, Y3 do not depend on n. For large n, this sum is dominated by the largest term. Therefore, argue that for i = 1, 2, 3, we have
(1/n) log Xi (n) → log λ,                    (4.105)
where λ is the largest (positive) eigenvalue. Thus, the number of sequences of length n grows as λ^n for large n. Calculate λ for the matrix A above. (The case when the eigenvalues are not distinct can be handled in a similar manner.)
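As a numerical sanity check (my addition, not part of the original problem), one can compute the eigenvalues of the matrix A as reconstructed in (4.102) with NumPy and watch (1/n) log2 X1(n) approach log2 λ; the variable names below are illustrative only.

    import numpy as np

    # Recursion matrix A from (4.102), as reconstructed above.
    A = np.array([[0, 1, 1],
                  [1, 0, 0],
                  [0, 1, 0]], dtype=float)

    # Largest (positive) eigenvalue lambda of A.
    lam = np.abs(np.linalg.eigvals(A)).max()
    print("lambda      =", lam)            # ~1.3247, the real root of x^3 = x + 1
    print("log2 lambda =", np.log2(lam))   # ~0.4057 bits

    # Growth rate of the number of valid sequences: X(n) = A^{n-1} X(1), X(1) = [1 1 0]^t.
    x = np.array([1.0, 1.0, 0.0])
    n = 200
    for _ in range(n - 1):
        x = A @ x
    print("(1/n) log2 X_1(n) =", np.log2(x[0]) / n)   # close to log2(lambda)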
(d) We now take a different approach. Consider a Markov chain whose state diagram is the one given in part (a), but with arbitrary transition probabilities. Therefore, the probability transition matrix of this Markov chain is
|
α |
0 |
1 |
0 |
α |
. |
|
|
|
|
0 |
1 |
|
|
|
|
|
P |
= |
|
|
|
− |
|
|
(4.106) |
|
1 |
0 |
|
0 |
|
|
Show that the stationary distribution of this Markov chain is
µ = ( 1/(3 − α),  1/(3 − α),  (1 − α)/(3 − α) ).                    (4.107)

(e) Maximize the entropy rate of the Markov chain over choices of α. What is the maximum entropy rate of the chain?
(f) Compare the maximum entropy rate in part (e) with log λ in part (c). Why are the two answers the same?
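The sketch below is my addition (not part of the book's problem): it evaluates the entropy rate of the chain with transition matrix (4.106) on a grid of α values, using the standard formula H(X) = −Σ_i µ_i Σ_j P_ij log2 P_ij, and compares the maximum with log λ from part (c). The helper names and the grid resolution are arbitrary choices.

    import numpy as np

    def stationary(P):
        """Stationary distribution: left eigenvector of P for eigenvalue 1, normalized."""
        w, v = np.linalg.eig(P.T)
        mu = np.real(v[:, np.argmin(np.abs(w - 1.0))])
        return mu / mu.sum()

    def entropy_rate(P):
        """H(X) = -sum_i mu_i sum_j P_ij log2 P_ij (bits) for a stationary Markov chain."""
        mu = stationary(P)
        logs = np.log2(np.where(P > 0, P, 1.0))   # log2(1) = 0 where P_ij = 0
        return -float(mu @ (P * logs).sum(axis=1))

    def P_of_alpha(a):
        """Transition matrix (4.106) with parameter alpha."""
        return np.array([[0.0, 1.0, 0.0],
                         [a,   0.0, 1.0 - a],
                         [1.0, 0.0, 0.0]])

    A = np.array([[0, 1, 1], [1, 0, 0], [0, 1, 0]], dtype=float)
    log_lam = np.log2(np.abs(np.linalg.eigvals(A)).max())

    alphas = np.linspace(0.005, 0.995, 199)
    rates = [entropy_rate(P_of_alpha(a)) for a in alphas]
    i = int(np.argmax(rates))
    print("max entropy rate ~ %.4f bits at alpha ~ %.3f" % (rates[i], alphas[i]))
    print("log2(lambda) from part (c) = %.4f bits" % log_lam)   # the two agree, as part (f) asks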
4.17 Recurrence times are insensitive to distributions. Let X0, X1, X2, . . . be drawn i.i.d. ∼ p(x), x ∈ X = {1, 2, . . . , m}, and let N be the waiting time to the next occurrence of X0. Thus N = min_n {Xn = X0}.
(a) Show that EN = m.
(b) Show that E log N ≤ H (X).
(c) (Optional) Prove part (a) for {Xi } stationary and ergodic.
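A small optional simulation (my addition, using only the standard library; the pmf p and the trial count are arbitrary choices) can be used to sanity-check the claim of part (a) that EN = m regardless of the distribution.

    import random

    def recurrence_time(p, rng):
        """Draw X0 from pmf p, then wait for its next occurrence; return the waiting time N."""
        symbols = range(len(p))
        x0 = rng.choices(symbols, weights=p, k=1)[0]
        n = 1
        while rng.choices(symbols, weights=p, k=1)[0] != x0:
            n += 1
        return n

    rng = random.Random(0)
    p = [0.5, 0.25, 0.125, 0.125]      # an arbitrary pmf on m = 4 symbols
    trials = 20000
    avg = sum(recurrence_time(p, rng) for _ in range(trials)) / trials
    print("estimated EN =", avg, " (part (a) predicts m =", len(p), ")")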
4.18 Stationary but not ergodic process. A bin has two biased coins, one with probability of heads p and the other with probability of heads 1 − p. One of these coins is chosen at random (i.e., with probability 1/2) and is then tossed n times. Let X denote the identity of the coin that is picked, and let Y1 and Y2 denote the results of the first two tosses.
(a) Calculate I (Y1; Y2|X).
(b) Calculate I (X; Y1, Y2).
(c) Let H(Y) be the entropy rate of the Y process (the sequence of coin tosses). Calculate H(Y). [Hint: Relate this to lim (1/n) H (X, Y1, Y2, . . . , Yn).]
You can check the answer by considering the behavior as p → 1/2.
4.19 Random walk on graph. Consider a random walk on the following graph:

[Figure: a graph with vertices labeled 1, 2, 3, 4, 5.]
(a) Calculate the stationary distribution.
(b) What is the entropy rate?
(c) Find the mutual information I (Xn+1; Xn) assuming that the process is stationary.
4.20 Random walk on chessboard. Find the entropy rate of the Markov chain associated with a random walk of a king on the 3 × 3 chessboard

    1 2 3
    4 5 6
    7 8 9

What about the entropy rate of rooks, bishops, and queens? There are two types of bishops.
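Problems 4.19 through 4.22 all concern random walks on undirected graphs. The sketch below is my own addition (not from the book): it computes the stationary distribution and entropy rate of such a walk directly from an adjacency list, using the fact that for an unweighted walk the stationary probability of a vertex is proportional to its degree. The king-move graph for the 3 × 3 board above is one illustrative input; the helper names are assumptions of mine.

    import math
    from itertools import product

    def walk_entropy_rate(adj):
        """Return (stationary distribution, entropy rate in bits) of a random walk.

        adj maps each vertex to its list of neighbors; each step picks a neighbor
        uniformly.  Stationary distribution: mu_i = d_i / 2E.  Entropy rate:
        sum_i mu_i * log2(d_i).
        """
        degrees = {v: len(nbrs) for v, nbrs in adj.items()}
        two_e = sum(degrees.values())
        mu = {v: d / two_e for v, d in degrees.items()}
        h = sum(mu[v] * math.log2(degrees[v]) for v in adj)
        return mu, h

    # King moves on a 3 x 3 board, squares numbered 1..9 row by row (see the grid above).
    king = {}
    for r, c in product(range(3), range(3)):
        v = 3 * r + c + 1
        king[v] = [3 * rr + cc + 1
                   for rr, cc in product(range(3), range(3))
                   if (rr, cc) != (r, c) and abs(rr - r) <= 1 and abs(cc - c) <= 1]

    mu, h = walk_entropy_rate(king)
    print("stationary distribution:", mu)
    print("entropy rate of the king's walk: %.4f bits" % h)

The same function applies unchanged to the graph of Problem 4.19 (once its edges are known), to the four-edge graphs of Problem 4.21, and to the 3 × 3 × 3 room graph of Problem 4.22.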
4.21 Maximal entropy graphs. Consider a random walk on a connected graph with four edges.
(a) Which graph has the highest entropy rate?
(b) Which graph has the lowest?
4.22 Three-dimensional maze. A bird is lost in a 3 × 3 × 3 cubical maze. The bird flies from room to room going to adjoining rooms with equal probability through each of the walls. For example, the corner rooms have three exits.
(a) What is the stationary distribution?
(b) What is the entropy rate of this random walk?
4.23 Entropy rate. Let {Xi } be a stationary stochastic process with entropy rate H (X).
(a) Argue that H (X) ≤ H (X1).
(b) What are the conditions for equality?
4.24 Entropy rates. Let {Xi } be a stationary process. Let Yi = (Xi , Xi+1). Let Zi = (X2i , X2i+1). Let Vi = X2i . Consider the entropy rates H (X), H (Y), H (Z), and H (V) of the processes {Xi }, {Yi }, {Zi }, and {Vi }. What is the inequality relationship (≤, =, or ≥) between each of the pairs listed below?
(a) H (X) versus H (Y).
(b) H (X) versus H (Z).
(c) H (X) versus H (V).
(d) H (Z) versus H (X).
4.25 Monotonicity.
(a) Show that I (X; Y1, Y2, . . . , Yn) is nondecreasing in n.

(b) Under what conditions is the mutual information constant for all n?
4.26 Transitions in Markov chains. Suppose that {Xi } forms an irreducible Markov chain with transition matrix P and stationary distribution µ. Form the associated “edge process” {Yi } by keeping track only of the transitions. Thus, the new process {Yi } takes values in X × X, and Yi = (Xi−1, Xi ). For example,
Xn = 3, 2, 8, 5, 7, . . .
becomes
Y n = ( , 3), (3, 2), (2, 8), (8, 5), (5, 7), . . . .
Find the entropy rate of the edge process {Yi }.
4.27 Entropy rate. Let {Xi } be a stationary {0, 1}-valued stochastic process obeying

Xk+1 = Xk ⊕ Xk−1 ⊕ Zk+1,

where {Zi } is Bernoulli(p) and ⊕ denotes mod 2 addition. What is the entropy rate H (X)?
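As an optional empirical check (my addition, not part of the book's problem; p, the sequence length, and the trigram estimator are my own choices), one can simulate the recursion and estimate the conditional entropy H(Xk+1 | Xk, Xk−1) from empirical counts, with the binary entropy H(p) of the noise printed for reference.

    import math
    import random

    def simulate(p, n, rng):
        """Generate X_1..X_n from X_{k+1} = X_k XOR X_{k-1} XOR Z_{k+1}, Z ~ Bern(p)."""
        x = [rng.randint(0, 1), rng.randint(0, 1)]      # arbitrary starting pair
        while len(x) < n:
            z = 1 if rng.random() < p else 0
            x.append(x[-1] ^ x[-2] ^ z)
        return x

    def cond_entropy_estimate(x):
        """Estimate H(X_{k+1} | X_k, X_{k-1}) in bits from empirical trigram counts."""
        pair, trip = {}, {}
        for a, b, c in zip(x, x[1:], x[2:]):
            pair[(a, b)] = pair.get((a, b), 0) + 1
            trip[(a, b, c)] = trip.get((a, b, c), 0) + 1
        n = len(x) - 2
        return -sum((k / n) * math.log2(k / pair[(a, b)])
                    for (a, b, c), k in trip.items())

    rng = random.Random(1)
    p = 0.3
    x = simulate(p, 200000, rng)
    hp = -p * math.log2(p) - (1 - p) * math.log2(1 - p)     # binary entropy H(p), for reference
    print("empirical H(X_{k+1} | X_k, X_{k-1}) =", cond_entropy_estimate(x))
    print("H(p) =", hp)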
4.28 Mixture of processes. Suppose that we observe one of two stochastic processes but don't know which. What is the entropy rate? Specifically, let X11, X12, X13, . . . be a Bernoulli process with parameter p1, and let X21, X22, X23, . . . be Bernoulli(p2). Let
θ = 1 with probability 1/2
    2 with probability 1/2
and let Yi = Xθ i , i = 1, 2, . . . , be the stochastic process observed. Thus, Y observes the process {X1i } or {X2i }. Eventually, Y will know which.
(a) Is {Yi } stationary?
(b) Is {Yi } an i.i.d. process?
(c) What is the entropy rate H of {Yi }?
(d) Does −(1/n) log p(Y1, Y2, . . . , Yn) −→ H?
(e) Is there a code that achieves an expected per-symbol description length (1/n) ELn −→ H?
Now let θi be Bern(1/2). Observe that

Zi = Xθi i , i = 1, 2, . . . .

Thus, θ is not fixed for all time, as it was in the first part, but is chosen i.i.d. each time. Answer parts (a), (b), (c), (d), (e) for the process {Zi }, labeling the answers (a′), (b′), (c′), (d′), (e′).
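A brief simulation (my own illustration, not from the text; p1, p2, and the run lengths are arbitrary) highlights the contrast between the two parts: for {Yi} the long-run fraction of 1's depends on which process was selected once at the start, whereas for {Zi} every run converges to the same average (p1 + p2)/2.

    import random

    def run_Y(p1, p2, n, rng):
        """Mixture process {Y_i}: pick theta once, then i.i.d. Bernoulli(p_theta)."""
        p = p1 if rng.random() < 0.5 else p2
        return sum(rng.random() < p for _ in range(n)) / n

    def run_Z(p1, p2, n, rng):
        """Process {Z_i}: theta_i re-drawn i.i.d. Bern(1/2) at every step."""
        return sum(rng.random() < (p1 if rng.random() < 0.5 else p2)
                   for _ in range(n)) / n

    rng = random.Random(2)
    p1, p2, n = 0.9, 0.1, 100000
    print("Y sample averages:", [round(run_Y(p1, p2, n, rng), 3) for _ in range(5)])
    print("Z sample averages:", [round(run_Z(p1, p2, n, rng), 3) for _ in range(5)])
    # Y's averages cluster near p1 or p2; Z's averages all approach (p1 + p2) / 2.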
4.29 Waiting times. Let X be the waiting time for the first heads to appear in successive flips of a fair coin. For example, Pr{X = 3} = (1/2)^3. Let Sn be the waiting time for the nth head to appear. Thus,

S0 = 0
Sn+1 = Sn + Xn+1,

where X1, X2, X3, . . . are i.i.d. according to the distribution above.
(a) Is the process {Sn} stationary?
(b) Calculate H (S1, S2, . . . , Sn).
(c) Does the process {Sn} have an entropy rate? If so, what is it? If not, why not?
(d) What is the expected number of fair coin flips required to generate a random variable having the same distribution as Sn?
4.30 Markov chain transitions.

P = [P_{ij}] = \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{4} & \tfrac{1}{4} \\ \tfrac{1}{4} & \tfrac{1}{2} & \tfrac{1}{4} \\ \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{2} \end{bmatrix}.

Let X1 be distributed uniformly over the states {0, 1, 2}. Let {Xi }_1^∞ be a Markov chain with transition matrix P ; thus, P (Xn+1 = j |Xn = i) = Pij , i, j ∈ {0, 1, 2}.
(a) Is {Xn} stationary?
(b) Find lim_{n→∞} (1/n) H (X1, . . . , Xn).
Now consider the derived process Z1, Z2, . . . , Zn, where
Z1 = X1
Zi = Xi − Xi−1 (mod 3), i = 2, . . . , n.
Thus, Zn encodes the transitions, not the states.
(c) Find H (Z1, Z2, . . . , Zn).
(d) Find H (Zn) and H (Xn) for n ≥ 2.
(e) Find H (Zn|Zn−1) for n ≥ 2.
(f) Are Zn−1 and Zn independent for n ≥ 2?
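For a concrete feel for parts (b)-(d), the sketch below (my addition, not the book's) computes the relevant quantities directly from the doubly stochastic matrix P above; the helper name H and the layout are assumptions of mine.

    import numpy as np

    def H(p):
        """Entropy in bits of a probability vector."""
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return float(-(p * np.log2(p)).sum())

    P = np.array([[1/2, 1/4, 1/4],
                  [1/4, 1/2, 1/4],
                  [1/4, 1/4, 1/2]])
    mu = np.full(3, 1/3)          # uniform stationary distribution; P is doubly stochastic

    # Entropy rate = sum_i mu_i H(P_i.), relevant to part (b).
    rate = sum(mu[i] * H(P[i]) for i in range(3))
    print("entropy rate:", rate, "bits")

    # Marginal of Z_n = X_n - X_{n-1} (mod 3) for n >= 2, relevant to parts (c)-(e).
    pz = [sum(mu[i] * P[i, (i + d) % 3] for i in range(3)) for d in range(3)]
    print("P(Z_n = d):", pz, " H(Z_n):", H(pz), "bits")
    print("H(X_n):", H(mu), "bits")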
4.31 Markov. Let {Xi } ∼ Bernoulli(p). Consider the associated Markov chain {Yi }_{i=1}^{n}, where
Yi = (the number of 1's in the current run of 1's). For example, if Xn = 101110 . . . , we have Y n = 101230 . . . .
(a) Find the entropy rate of Xn.
(b) Find the entropy rate of Y n.
4.32 Time symmetry. Let {Xn} be a stationary Markov process. We condition on (X0, X1) and look into the past and future. For what index k is
H (X−n|X0, X1) = H (Xk |X0, X1)?
Give the argument.
4.33 Chain inequality. Let X1 → X2 → X3 → X4 form a Markov chain. Show that
I (X1; X3) + I (X2; X4) ≤ I (X1; X4) + I (X2; X3). (4.108)
4.34 Broadcast channel. Let X → Y → (Z, W ) form a Markov chain [i.e., p(x, y, z, w) = p(x)p(y|x)p(z, w|y) for all x, y, z, w]. Show that

I (X; Z) + I (X; W ) ≤ I (X; Y ) + I (Z; W ).                    (4.109)
4.35 Concavity of second law. Let {Xn}_{−∞}^{∞} be a stationary Markov process. Show that H (Xn|X0) is concave in n. Specifically, show that

H (Xn|X0) − H (Xn−1|X0) − (H (Xn−1|X0) − H (Xn−2|X0))
    = −I (X1; Xn−1|X0, Xn) ≤ 0.                    (4.110)
Thus, the second difference is negative, establishing that H (Xn|X0) is a concave function of n.
HISTORICAL NOTES
The entropy rate of a stochastic process was introduced by Shannon [472], who also explored some of the connections between the entropy rate of the process and the number of possible sequences generated by the process. Since Shannon, there have been a number of results extending the basic
theorems of information theory to general stochastic processes. The AEP for a general stationary stochastic process is proved in Chapter 16.
Hidden Markov models are used for a number of applications, such as speech recognition [432]. The calculation of the entropy rate for constrained sequences was introduced by Shannon [472]. These sequences are used for coding for magnetic and optical channels [288].

CHAPTER 5
DATA COMPRESSION
We now put content in the definition of entropy by establishing the fundamental limit for the compression of information. Data compression can be achieved by assigning short descriptions to the most frequent outcomes of the data source, and necessarily longer descriptions to the less frequent outcomes. For example, in Morse code, the most frequent symbol is represented by a single dot. In this chapter we find the shortest average description length of a random variable.
We first define the notion of an instantaneous code and then prove the important Kraft inequality, which asserts that the exponentiated codeword length assignments must look like a probability mass function. Elementary calculus then shows that the expected description length must be greater than or equal to the entropy, the first main result. Then Shannon’s simple construction shows that the expected description length can achieve this bound asymptotically for repeated descriptions. This establishes the entropy as a natural measure of efficient description length. The famous Huffman coding procedure for finding minimum expected description length assignments is provided. Finally, we show that Huffman codes are competitively optimal and that it requires roughly H fair coin flips to generate a sample of a random variable having entropy H . Thus, the entropy is the data compression limit as well as the number of bits needed in random number generation, and codes achieving H turn out to be optimal from many points of view.
5.1 EXAMPLES OF CODES
Definition A source code C for a random variable X is a mapping from X, the range of X, to D∗, the set of finite-length strings of symbols from a D-ary alphabet. Let C(x) denote the codeword corresponding to x and let l(x) denote the length of C(x).
For example, C(red) = 00, C(blue) = 11 is a source code for X = {red, blue} with alphabet D = {0, 1}.
Definition The expected length L(C) of a source code C(x) for a random variable X with probability mass function p(x) is given by
L(C) = Σ_{x∈X} p(x)l(x),                    (5.1)
where l(x) is the length of the codeword associated with x.
Without loss of generality, we can assume that the D-ary alphabet is
D = {0, 1, . . . , D − 1}.
Some examples of codes follow.
Example 5.1.1 Let X be a random variable with the following distribution and codeword assignment:
Pr(X = 1) = 1/2,   codeword C(1) = 0
Pr(X = 2) = 1/4,   codeword C(2) = 10                    (5.2)
Pr(X = 3) = 1/8,   codeword C(3) = 110
Pr(X = 4) = 1/8,   codeword C(4) = 111.
The entropy H (X) of X is 1.75 bits, and the expected length L(C) = El(X) of this code is also 1.75 bits. Here we have a code that has the same average length as the entropy. We note that any sequence of bits can be uniquely decoded into a sequence of symbols of X. For example, the bit string 0110111100110 is decoded as 134213.
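To make the unique-decodability claim concrete, here is a small sketch of my own (not from the text) that decodes a bit string with the prefix code of (5.2) and compares the expected codeword length with the entropy; the helper names are illustrative only.

    import math

    code = {1: "0", 2: "10", 3: "110", 4: "111"}    # the code of Example 5.1.1
    pmf = {1: 1/2, 2: 1/4, 3: 1/8, 4: 1/8}

    def decode(bits, code):
        """Decode a bit string with a prefix code by matching one codeword at a time."""
        rev = {w: s for s, w in code.items()}
        out, buf = [], ""
        for b in bits:
            buf += b
            if buf in rev:              # a prefix code: no codeword extends another
                out.append(rev[buf])
                buf = ""
        assert buf == "", "bit string ended in the middle of a codeword"
        return out

    print(decode("0110111100110", code))            # -> [1, 3, 4, 2, 1, 3], i.e. 134213

    L = sum(pmf[s] * len(code[s]) for s in pmf)             # expected length L(C)
    Hx = -sum(p * math.log2(p) for p in pmf.values())       # entropy H(X)
    print("L(C) =", L, "bits, H(X) =", Hx, "bits")          # both equal 1.75 here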
Example 5.1.2 Consider another simple example of a code for a random variable:
Pr(X = 1) = 1/3,   codeword C(1) = 0
Pr(X = 2) = 1/3,   codeword C(2) = 10                    (5.3)
Pr(X = 3) = 1/3,   codeword C(3) = 11.
Just as in Example 5.1.1, the code is uniquely decodable. However, in this case the entropy is log 3 ≈ 1.58 bits and the average length of the encoding is 5/3 ≈ 1.67 bits. Here El(X) > H (X).
Example 5.1.3 (Morse code) The Morse code is a reasonably efficient code for the English alphabet using an alphabet of four symbols: a dot,