
following recursion:

\begin{bmatrix} X_1(n) \\ X_2(n) \\ X_3(n) \end{bmatrix} =
\begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X_1(n-1) \\ X_2(n-1) \\ X_3(n-1) \end{bmatrix}                    (4.101)

with initial conditions X(1) = [1 1 0]^t.
(c) Let

A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}.                    (4.102)
Then we have by induction

X(n) = AX(n − 1) = A^2 X(n − 2) = · · · = A^{n−1} X(1).                    (4.103)
Using the eigenvalue decomposition of A for the case of distinct eigenvalues, we can write A = U^{−1}ΛU, where Λ is the diagonal matrix of eigenvalues. Then A^{n−1} = U^{−1}Λ^{n−1}U. Show that we can write

X(n) = λ1^{n−1} Y1 + λ2^{n−1} Y2 + λ3^{n−1} Y3,                    (4.104)
where Y1, Y2, Y3 do not depend on n. For large n, this sum is dominated by the largest term. Therefore, argue that for i = 1, 2, 3, we have
(1/n) log Xi (n) → log λ,                    (4.105)
where λ is the largest (positive) eigenvalue. Thus, the number of sequences of length n grows as λ^n for large n. Calculate λ for the matrix A above. (The case when the eigenvalues are not distinct can be handled in a similar manner.)
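As a numerical sanity check (my addition, not part of the original problem), one can compute the eigenvalues of the matrix A as reconstructed in (4.102) with NumPy and watch (1/n) log2 X1(n) approach log2 λ; the variable names below are illustrative only.

    import numpy as np

    # Recursion matrix A from (4.102), as reconstructed above.
    A = np.array([[0, 1, 1],
                  [1, 0, 0],
                  [0, 1, 0]], dtype=float)

    # Largest (positive) eigenvalue lambda of A.
    lam = np.abs(np.linalg.eigvals(A)).max()
    print("lambda      =", lam)            # ~1.3247, the real root of x^3 = x + 1
    print("log2 lambda =", np.log2(lam))   # ~0.4057 bits

    # Growth rate of the number of valid sequences: X(n) = A^{n-1} X(1), X(1) = [1 1 0]^t.
    x = np.array([1.0, 1.0, 0.0])
    n = 200
    for _ in range(n - 1):
        x = A @ x
    print("(1/n) log2 X_1(n) =", np.log2(x[0]) / n)   # close to log2(lambda)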
(d) We now take a different approach. Consider a Markov chain whose state diagram is the one given in part (a), but with arbitrary transition probabilities. Therefore, the probability transition matrix of this Markov chain is
|
α |
0 |
1 |
0 |
α |
. |
|
|
|
|
0 |
1 |
|
|
|
|
|
P |
= |
|
|
|
− |
|
|
(4.106) |
|
1 |
0 |
|
0 |
|
|
Show that the stationary distribution of this Markov chain is
µ = ( 1/(3 − α),  1/(3 − α),  (1 − α)/(3 − α) ).                    (4.107)

(e) Maximize the entropy rate of the Markov chain over choices of α. What is the maximum entropy rate of the chain?
(f) Compare the maximum entropy rate in part (e) with log λ in part (c). Why are the two answers the same?
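The sketch below is my addition (not part of the book's problem): it evaluates the entropy rate of the chain with transition matrix (4.106) on a grid of α values, using the standard formula H(X) = −Σ_i µ_i Σ_j P_ij log2 P_ij, and compares the maximum with log λ from part (c). The helper names and the grid resolution are arbitrary choices.

    import numpy as np

    def stationary(P):
        """Stationary distribution: left eigenvector of P for eigenvalue 1, normalized."""
        w, v = np.linalg.eig(P.T)
        mu = np.real(v[:, np.argmin(np.abs(w - 1.0))])
        return mu / mu.sum()

    def entropy_rate(P):
        """H(X) = -sum_i mu_i sum_j P_ij log2 P_ij (bits) for a stationary Markov chain."""
        mu = stationary(P)
        logs = np.log2(np.where(P > 0, P, 1.0))   # log2(1) = 0 where P_ij = 0
        return -float(mu @ (P * logs).sum(axis=1))

    def P_of_alpha(a):
        """Transition matrix (4.106) with parameter alpha."""
        return np.array([[0.0, 1.0, 0.0],
                         [a,   0.0, 1.0 - a],
                         [1.0, 0.0, 0.0]])

    A = np.array([[0, 1, 1], [1, 0, 0], [0, 1, 0]], dtype=float)
    log_lam = np.log2(np.abs(np.linalg.eigvals(A)).max())

    alphas = np.linspace(0.005, 0.995, 199)
    rates = [entropy_rate(P_of_alpha(a)) for a in alphas]
    i = int(np.argmax(rates))
    print("max entropy rate ~ %.4f bits at alpha ~ %.3f" % (rates[i], alphas[i]))
    print("log2(lambda) from part (c) = %.4f bits" % log_lam)   # the two agree, as part (f) asks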
4.17 Recurrence times are insensitive to distributions. Let X0, X1, X2, . . . be drawn i.i.d. ∼ p(x), x ∈ X = {1, 2, . . . , m}, and let N be the waiting time to the next occurrence of X0. Thus N = min_n {Xn = X0}.
(a) Show that EN = m.
(b) Show that E log N ≤ H (X).
(c) (Optional) Prove part (a) for {Xi } stationary and ergodic.
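A small optional simulation (my addition, using only the standard library; the pmf p and the trial count are arbitrary choices) can be used to sanity-check the claim of part (a) that EN = m regardless of the distribution.

    import random

    def recurrence_time(p, rng):
        """Draw X0 from pmf p, then wait for its next occurrence; return the waiting time N."""
        symbols = range(len(p))
        x0 = rng.choices(symbols, weights=p, k=1)[0]
        n = 1
        while rng.choices(symbols, weights=p, k=1)[0] != x0:
            n += 1
        return n

    rng = random.Random(0)
    p = [0.5, 0.25, 0.125, 0.125]      # an arbitrary pmf on m = 4 symbols
    trials = 20000
    avg = sum(recurrence_time(p, rng) for _ in range(trials)) / trials
    print("estimated EN =", avg, " (part (a) predicts m =", len(p), ")")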
4.18 Stationary but not ergodic process. A bin has two biased coins, one with probability of heads p and the other with probability of heads 1 − p. One of these coins is chosen at random (i.e., with probability 1/2) and is then tossed n times. Let X denote the identity of the coin that is picked, and let Y1 and Y2 denote the results of the first two tosses.
(a) Calculate I (Y1; Y2|X).
(b) Calculate I (X; Y1, Y2).
(c) Let H(Y) be the entropy rate of the Y process (the sequence of coin tosses). Calculate H(Y). [Hint: Relate this to lim (1/n) H (X, Y1, Y2, . . . , Yn).]
You can check the answer by considering the behavior as p → 1/2.
4.19 Random walk on graph. Consider a random walk on the following graph:

[Figure: a graph with vertices labeled 1, 2, 3, 4, 5.]
(a) Calculate the stationary distribution.
(b) What is the entropy rate?
(c) Find the mutual information I (Xn+1; Xn) assuming that the process is stationary.
4.20 Random walk on chessboard. Find the entropy rate of the Markov chain associated with a random walk of a king on the 3 × 3 chessboard

    1 2 3
    4 5 6
    7 8 9

What about the entropy rate of rooks, bishops, and queens? There are two types of bishops.
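Problems 4.19 through 4.22 all concern random walks on undirected graphs. The sketch below is my own addition (not from the book): it computes the stationary distribution and entropy rate of such a walk directly from an adjacency list, using the fact that for an unweighted walk the stationary probability of a vertex is proportional to its degree. The king-move graph for the 3 × 3 board above is one illustrative input; the helper names are assumptions of mine.

    import math
    from itertools import product

    def walk_entropy_rate(adj):
        """Return (stationary distribution, entropy rate in bits) of a random walk.

        adj maps each vertex to its list of neighbors; each step picks a neighbor
        uniformly.  Stationary distribution: mu_i = d_i / 2E.  Entropy rate:
        sum_i mu_i * log2(d_i).
        """
        degrees = {v: len(nbrs) for v, nbrs in adj.items()}
        two_e = sum(degrees.values())
        mu = {v: d / two_e for v, d in degrees.items()}
        h = sum(mu[v] * math.log2(degrees[v]) for v in adj)
        return mu, h

    # King moves on a 3 x 3 board, squares numbered 1..9 row by row (see the grid above).
    king = {}
    for r, c in product(range(3), range(3)):
        v = 3 * r + c + 1
        king[v] = [3 * rr + cc + 1
                   for rr, cc in product(range(3), range(3))
                   if (rr, cc) != (r, c) and abs(rr - r) <= 1 and abs(cc - c) <= 1]

    mu, h = walk_entropy_rate(king)
    print("stationary distribution:", mu)
    print("entropy rate of the king's walk: %.4f bits" % h)

The same function applies unchanged to the graph of Problem 4.19 (once its edges are known), to the four-edge graphs of Problem 4.21, and to the 3 × 3 × 3 room graph of Problem 4.22.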
4.21 Maximal entropy graphs. Consider a random walk on a connected graph with four edges.
(a) Which graph has the highest entropy rate?
(b) Which graph has the lowest?
4.22 Three-dimensional maze. A bird is lost in a 3 × 3 × 3 cubical maze. The bird flies from room to room going to adjoining rooms with equal probability through each of the walls. For example, the corner rooms have three exits.
(a) What is the stationary distribution?
(b) What is the entropy rate of this random walk?
4.23 Entropy rate. Let {Xi } be a stationary stochastic process with entropy rate H (X).
(a) Argue that H (X) ≤ H (X1).
(b) What are the conditions for equality?
4.24 Entropy rates. Let {Xi } be a stationary process. Let Yi = (Xi , Xi+1). Let Zi = (X2i , X2i+1). Let Vi = X2i . Consider the entropy rates H (X), H (Y), H (Z), and H (V) of the processes {Xi }, {Yi }, {Zi }, and {Vi }. What is the inequality relationship (≤, =, or ≥) between each of the pairs listed below?
(a) H (X) versus H (Y).
(b) H (X) versus H (Z).
(c) H (X) versus H (V).
(d) H (Z) versus H (X).
4.25 Monotonicity.
(a) Show that I (X; Y1, Y2, . . . , Yn) is nondecreasing in n.

(b) Under what conditions is the mutual information constant for all n?
4.26 Transitions in Markov chains. Suppose that {Xi } forms an irreducible Markov chain with transition matrix P and stationary distribution µ. Form the associated “edge process” {Yi } by keeping track only of the transitions. Thus, the new process {Yi } takes values in X × X, and Yi = (Xi−1, Xi ). For example,
Xn = 3, 2, 8, 5, 7, . . .
becomes
Y n = ( , 3), (3, 2), (2, 8), (8, 5), (5, 7), . . . .
Find the entropy rate of the edge process {Yi }.
4.27 Entropy rate. Let {Xi } be a stationary {0, 1}-valued stochastic process obeying

Xk+1 = Xk ⊕ Xk−1 ⊕ Zk+1,

where {Zi } is Bernoulli(p) and ⊕ denotes mod 2 addition. What is the entropy rate H (X)?
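As an optional empirical check (my addition, not part of the book's problem; p, the sequence length, and the trigram estimator are my own choices), one can simulate the recursion and estimate the conditional entropy H(Xk+1 | Xk, Xk−1) from empirical counts, with the binary entropy H(p) of the noise printed for reference.

    import math
    import random

    def simulate(p, n, rng):
        """Generate X_1..X_n from X_{k+1} = X_k XOR X_{k-1} XOR Z_{k+1}, Z ~ Bern(p)."""
        x = [rng.randint(0, 1), rng.randint(0, 1)]      # arbitrary starting pair
        while len(x) < n:
            z = 1 if rng.random() < p else 0
            x.append(x[-1] ^ x[-2] ^ z)
        return x

    def cond_entropy_estimate(x):
        """Estimate H(X_{k+1} | X_k, X_{k-1}) in bits from empirical trigram counts."""
        pair, trip = {}, {}
        for a, b, c in zip(x, x[1:], x[2:]):
            pair[(a, b)] = pair.get((a, b), 0) + 1
            trip[(a, b, c)] = trip.get((a, b, c), 0) + 1
        n = len(x) - 2
        return -sum((k / n) * math.log2(k / pair[(a, b)])
                    for (a, b, c), k in trip.items())

    rng = random.Random(1)
    p = 0.3
    x = simulate(p, 200000, rng)
    hp = -p * math.log2(p) - (1 - p) * math.log2(1 - p)     # binary entropy H(p), for reference
    print("empirical H(X_{k+1} | X_k, X_{k-1}) =", cond_entropy_estimate(x))
    print("H(p) =", hp)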
4.28 Mixture of processes. Suppose that we observe one of two stochastic processes but don't know which. What is the entropy rate? Specifically, let X11, X12, X13, . . . be a Bernoulli process with parameter p1, and let X21, X22, X23, . . . be Bernoulli(p2). Let
θ = 1 with probability 1/2
    2 with probability 1/2
and let Yi = Xθ i , i = 1, 2, . . . , be the stochastic process observed. Thus, Y observes the process {X1i } or {X2i }. Eventually, Y will know which.
(a) Is {Yi } stationary?
(b) Is {Yi } an i.i.d. process?
(c) What is the entropy rate H of {Yi }?
(d) Does −(1/n) log p(Y1, Y2, . . . , Yn) −→ H?
(e) Is there a code that achieves an expected per-symbol description length (1/n) ELn −→ H?
Now let θi be Bern(1/2). Observe that

Zi = Xθi i , i = 1, 2, . . . .

Thus, θ is not fixed for all time, as it was in the first part, but is chosen i.i.d. each time. Answer parts (a), (b), (c), (d), (e) for the process {Zi }, labeling the answers (a′), (b′), (c′), (d′), (e′).
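A brief simulation (my own illustration, not from the text; p1, p2, and the run lengths are arbitrary) highlights the contrast between the two parts: for {Yi} the long-run fraction of 1's depends on which process was selected once at the start, whereas for {Zi} every run converges to the same average (p1 + p2)/2.

    import random

    def run_Y(p1, p2, n, rng):
        """Mixture process {Y_i}: pick theta once, then i.i.d. Bernoulli(p_theta)."""
        p = p1 if rng.random() < 0.5 else p2
        return sum(rng.random() < p for _ in range(n)) / n

    def run_Z(p1, p2, n, rng):
        """Process {Z_i}: theta_i re-drawn i.i.d. Bern(1/2) at every step."""
        return sum(rng.random() < (p1 if rng.random() < 0.5 else p2)
                   for _ in range(n)) / n

    rng = random.Random(2)
    p1, p2, n = 0.9, 0.1, 100000
    print("Y sample averages:", [round(run_Y(p1, p2, n, rng), 3) for _ in range(5)])
    print("Z sample averages:", [round(run_Z(p1, p2, n, rng), 3) for _ in range(5)])
    # Y's averages cluster near p1 or p2; Z's averages all approach (p1 + p2) / 2.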
4.29 Waiting times. Let X be the waiting time for the first heads to appear in successive flips of a fair coin. For example, Pr{X = 3} = (1/2)^3. Let Sn be the waiting time for the nth head to appear. Thus,

S0 = 0
Sn+1 = Sn + Xn+1,

where X1, X2, X3, . . . are i.i.d. according to the distribution above.
(a) Is the process {Sn} stationary?
(b) Calculate H (S1, S2, . . . , Sn).
(c) Does the process {Sn} have an entropy rate? If so, what is it? If not, why not?
(d) What is the expected number of fair coin flips required to generate a random variable having the same distribution as Sn?
4.30 Markov chain transitions.

P = [P_{ij}] = \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{4} & \tfrac{1}{4} \\ \tfrac{1}{4} & \tfrac{1}{2} & \tfrac{1}{4} \\ \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{2} \end{bmatrix}.

Let X1 be distributed uniformly over the states {0, 1, 2}. Let {Xi }_1^∞ be a Markov chain with transition matrix P ; thus, P (Xn+1 = j |Xn = i) = Pij , i, j ∈ {0, 1, 2}.
(a) Is {Xn} stationary?
(b) Find lim_{n→∞} (1/n) H (X1, . . . , Xn).
Now consider the derived process Z1, Z2, . . . , Zn, where
Z1 = X1
Zi = Xi − Xi−1 (mod 3), i = 2, . . . , n.
Thus, Zn encodes the transitions, not the states.
(c) Find H (Z1, Z2, . . . , Zn).
(d) Find H (Zn) and H (Xn) for n ≥ 2.
(e) Find H (Zn|Zn−1) for n ≥ 2.
(f) Are Zn−1 and Zn independent for n ≥ 2?
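For a concrete feel for parts (b)-(d), the sketch below (my addition, not the book's) computes the relevant quantities directly from the doubly stochastic matrix P above; the helper name H and the layout are assumptions of mine.

    import numpy as np

    def H(p):
        """Entropy in bits of a probability vector."""
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return float(-(p * np.log2(p)).sum())

    P = np.array([[1/2, 1/4, 1/4],
                  [1/4, 1/2, 1/4],
                  [1/4, 1/4, 1/2]])
    mu = np.full(3, 1/3)          # uniform stationary distribution; P is doubly stochastic

    # Entropy rate = sum_i mu_i H(P_i.), relevant to part (b).
    rate = sum(mu[i] * H(P[i]) for i in range(3))
    print("entropy rate:", rate, "bits")

    # Marginal of Z_n = X_n - X_{n-1} (mod 3) for n >= 2, relevant to parts (c)-(e).
    pz = [sum(mu[i] * P[i, (i + d) % 3] for i in range(3)) for d in range(3)]
    print("P(Z_n = d):", pz, " H(Z_n):", H(pz), "bits")
    print("H(X_n):", H(mu), "bits")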
4.31 Markov. Let {Xi } ∼ Bernoulli(p). Consider the associated Markov chain {Yi }_{i=1}^{n}, where
Yi = (the number of 1's in the current run of 1's). For example, if Xn = 101110 . . . , we have Y n = 101230 . . . .
(a) Find the entropy rate of Xn.
(b) Find the entropy rate of Y n.
4.32 Time symmetry. Let {Xn} be a stationary Markov process. We condition on (X0, X1) and look into the past and future. For what index k is
H (X−n|X0, X1) = H (Xk |X0, X1)?
Give the argument.
4.33 Chain inequality. Let X1 → X2 → X3 → X4 form a Markov chain. Show that
I (X1; X3) + I (X2; X4) ≤ I (X1; X4) + I (X2; X3). (4.108)
4.34 Broadcast channel. Let X → Y → (Z, W ) form a Markov chain [i.e., p(x, y, z, w) = p(x)p(y|x)p(z, w|y) for all x, y, z, w]. Show that

I (X; Z) + I (X; W ) ≤ I (X; Y ) + I (Z; W ).                    (4.109)
4.35 Concavity of second law. Let {Xn}_{−∞}^{∞} be a stationary Markov process. Show that H (Xn|X0) is concave in n. Specifically, show that

H (Xn|X0) − H (Xn−1|X0) − (H (Xn−1|X0) − H (Xn−2|X0))
    = −I (X1; Xn−1|X0, Xn) ≤ 0.                    (4.110)
Thus, the second difference is negative, establishing that H (Xn|X0) is a concave function of n.
HISTORICAL NOTES
The entropy rate of a stochastic process was introduced by Shannon [472], who also explored some of the connections between the entropy rate of the process and the number of possible sequences generated by the process. Since Shannon, there have been a number of results extending the basic
theorems of information theory to general stochastic processes. The AEP for a general stationary stochastic process is proved in Chapter 16.
Hidden Markov models are used for a number of applications, such as speech recognition [432]. The calculation of the entropy rate for constrained sequences was introduced by Shannon [472]. These sequences are used for coding for magnetic and optical channels [288].

CHAPTER 5
DATA COMPRESSION
We now put content in the definition of entropy by establishing the fundamental limit for the compression of information. Data compression can be achieved by assigning short descriptions to the most frequent outcomes of the data source, and necessarily longer descriptions to the less frequent outcomes. For example, in Morse code, the most frequent symbol is represented by a single dot. In this chapter we find the shortest average description length of a random variable.
We first define the notion of an instantaneous code and then prove the important Kraft inequality, which asserts that the exponentiated codeword length assignments must look like a probability mass function. Elementary calculus then shows that the expected description length must be greater than or equal to the entropy, the first main result. Then Shannon’s simple construction shows that the expected description length can achieve this bound asymptotically for repeated descriptions. This establishes the entropy as a natural measure of efficient description length. The famous Huffman coding procedure for finding minimum expected description length assignments is provided. Finally, we show that Huffman codes are competitively optimal and that it requires roughly H fair coin flips to generate a sample of a random variable having entropy H . Thus, the entropy is the data compression limit as well as the number of bits needed in random number generation, and codes achieving H turn out to be optimal from many points of view.
5.1 EXAMPLES OF CODES
Definition A source code C for a random variable X is a mapping from X, the range of X, to D∗, the set of finite-length strings of symbols from a D-ary alphabet. Let C(x) denote the codeword corresponding to x and let l(x) denote the length of C(x).
For example, C(red) = 00, C(blue) = 11 is a source code for X = {red, blue} with alphabet D = {0, 1}.
Definition The expected length L(C) of a source code C(x) for a random variable X with probability mass function p(x) is given by
L(C) = Σ_{x∈X} p(x)l(x),                    (5.1)
where l(x) is the length of the codeword associated with x.
Without loss of generality, we can assume that the D-ary alphabet is
D = {0, 1, . . . , D − 1}.
Some examples of codes follow.
Example 5.1.1 Let X be a random variable with the following distribution and codeword assignment:
Pr(X = 1) = 1/2,   codeword C(1) = 0
Pr(X = 2) = 1/4,   codeword C(2) = 10                    (5.2)
Pr(X = 3) = 1/8,   codeword C(3) = 110
Pr(X = 4) = 1/8,   codeword C(4) = 111.
The entropy H (X) of X is 1.75 bits, and the expected length L(C) = El(X) of this code is also 1.75 bits. Here we have a code that has the same average length as the entropy. We note that any sequence of bits can be uniquely decoded into a sequence of symbols of X. For example, the bit string 0110111100110 is decoded as 134213.
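To make the unique-decodability claim concrete, here is a small sketch of my own (not from the text) that decodes a bit string with the prefix code of (5.2) and compares the expected codeword length with the entropy; the helper names are illustrative only.

    import math

    code = {1: "0", 2: "10", 3: "110", 4: "111"}    # the code of Example 5.1.1
    pmf = {1: 1/2, 2: 1/4, 3: 1/8, 4: 1/8}

    def decode(bits, code):
        """Decode a bit string with a prefix code by matching one codeword at a time."""
        rev = {w: s for s, w in code.items()}
        out, buf = [], ""
        for b in bits:
            buf += b
            if buf in rev:              # a prefix code: no codeword extends another
                out.append(rev[buf])
                buf = ""
        assert buf == "", "bit string ended in the middle of a codeword"
        return out

    print(decode("0110111100110", code))            # -> [1, 3, 4, 2, 1, 3], i.e. 134213

    L = sum(pmf[s] * len(code[s]) for s in pmf)             # expected length L(C)
    Hx = -sum(p * math.log2(p) for p in pmf.values())       # entropy H(X)
    print("L(C) =", L, "bits, H(X) =", Hx, "bits")          # both equal 1.75 here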
Example 5.1.2 Consider another simple example of a code for a random variable:
Pr(X = 1) = 1/3,   codeword C(1) = 0
Pr(X = 2) = 1/3,   codeword C(2) = 10                    (5.3)
Pr(X = 3) = 1/3,   codeword C(3) = 11.
Just as in Example 5.1.1, the code is uniquely decodable. However, in this case the entropy is log 3 ≈ 1.58 bits and the average length of the encoding is 5/3 ≈ 1.67 bits. Here El(X) > H (X).
Example 5.1.3 (Morse code) The Morse code is a reasonably efficient code for the English alphabet using an alphabet of four symbols: a dot,