Gray R. M., Entropy and Information Theory, 1990.

7.4 Stationary Processes

Proof: From the chain rule for conditional relative entropy (equation (7.7)),
$$H_{p\|m}(X^N \mid X^-) = \sum_{l=0}^{N-1} H_{p\|m}(X_l \mid X^l, X^-).$$
Stationarity implies that each term in the sum equals $H_{p\|m}(X_0 \mid X^-)$, proving the corollary. $\Box$

The next corollary extends Corollary 7.3.1 to processes.

Corollary 7.4.2: Given k and $n \ge k$, let $\mathcal{M}_k$ denote the class of all k-step stationary Markov process distributions. Then
$$\inf_{m\in\mathcal{M}_k} \bar{H}_{p\|m}(X) = \bar{H}_{p\|p^{(k)}}(X) = I_p(X_k; X^- \mid X^k).$$

Proof: Follows from (7.23) and Theorem 7.3.1. $\Box$

This result gives an interpretation of the finite-gap information property (6.13): If a process has this property, then there exists a k-step Markov process which is only a finite "distance" from the given process in terms of limiting per-symbol divergence. If any such process has a finite distance, then the k-step Markov approximation also has a finite distance. Furthermore, we can apply Corollary 6.4.1 to obtain the generalization of the finite alphabet result of Theorem 2.6.2.
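In the simplest concrete setting the rate $H_{p\|m}(X_0\mid X^-)$ appearing in Corollary 7.4.1, and over which the infimum of Corollary 7.4.2 ranges, is a finite sum. The sketch below is not part of the original text: it is an illustrative Python/NumPy computation with made-up binary transition matrices, for the case where p itself is a stationary first-order Markov chain and m is first-order Markov, so that the conditional relative entropy reduces to an average of single-step divergences over the stationary distribution of p.

```python
import numpy as np

def stationary_dist(P):
    """Stationary distribution of an irreducible row-stochastic matrix P."""
    evals, evecs = np.linalg.eig(P.T)
    v = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    return v / v.sum()

def relative_entropy_rate(P, M):
    """Relative entropy rate (nats/symbol) of a stationary first-order Markov p
    (transition matrix P) with respect to a first-order Markov m (matrix M),
    computed as the conditional relative entropy H_{p||m}(X_0 | X^-):
        sum_a pi(a) sum_b P(a,b) ln( P(a,b) / M(a,b) ),
    assuming M(a,b) > 0 wherever P(a,b) > 0, i.e. the required domination holds."""
    pi = stationary_dist(P)
    rate = 0.0
    for a in range(P.shape[0]):
        for b in range(P.shape[1]):
            if P[a, b] > 0:
                rate += pi[a] * P[a, b] * np.log(P[a, b] / M[a, b])
    return rate

# made-up binary example: p is a sticky chain, m is the i.i.d. fair-coin measure
P = np.array([[0.9, 0.1], [0.2, 0.8]])
M = np.array([[0.5, 0.5], [0.5, 0.5]])
print(relative_entropy_rate(P, M))   # roughly 0.31 nats per symbol
```

Since this m is a (degenerate) first-order Markov measure with stationary transitions, rates of this kind are exactly what the infimum in Corollary 7.4.2 is taken over when k = 1.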

Corollary 7.4.3: Given a stationary process distribution p which satisfies the finite-gap information property,
$$\inf_k \inf_{m\in\mathcal{M}_k} \bar{H}_{p\|m}(X) = \inf_k \bar{H}_{p\|p^{(k)}}(X) = \lim_{k\to\infty} \bar{H}_{p\|p^{(k)}}(X) = 0.$$

Lemma 7.4.1 also yields the following approximation lemma.

Corollary 7.4.4: Given a process $\{X_n\}$ with standard alphabet A, let p and m be stationary measures such that $P_{X^n} \ll M_{X^n}$ for all n and m is kth order Markov. Let $q_k$ be an asymptotically accurate sequence of quantizers for A. Then
$$\bar{H}_{p\|m}(X) = \lim_{k\to\infty} \bar{H}_{p\|m}(q_k(X)),$$
that is, the divergence rate can be approximated arbitrarily closely by that of a quantized version of the process. Thus, in particular,
$$\bar{H}_{p\|m}(X) = H_{p\|m}(X).$$


Proof: This follows from Corollary 5.2.3 by letting the generating $\sigma$-fields be $\mathcal{F}_n = \sigma(q_n(X_i);\ i = 0, -1, \cdots)$ and the representation of conditional relative entropy as an ordinary divergence. $\Box$
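The following sketch is an illustration, not part of the original text, of the quantizer approximation of Corollary 7.4.4 in the memoryless special case, where the divergence rate reduces to the single-letter divergence $D(P_{X_0}\|M_{X_0})$: for two hypothetical i.i.d. Gaussian processes (p with marginal N(0,1), m with marginal N(1,4)) the divergence of uniformly quantized marginals, computed with SciPy, approaches the closed-form Gaussian relative entropy as the quantizer is refined. The bin range [-10, 10] and the specific distributions are arbitrary choices for the example.

```python
import numpy as np
from scipy.stats import norm

def quantized_divergence(p, m, edges):
    """Divergence of the quantized marginals: sum_i P(bin_i) ln( P(bin_i) / M(bin_i) )."""
    P = np.diff(p.cdf(edges))
    M = np.diff(m.cdf(edges))
    mask = P > 0
    return np.sum(P[mask] * np.log(P[mask] / M[mask]))

p, m = norm(0.0, 1.0), norm(1.0, 2.0)                       # marginals of the i.i.d. processes
closed_form = np.log(2.0) + (1.0 + 1.0) / (2 * 4.0) - 0.5   # D(N(0,1) || N(1,4)) in nats

for bins in (4, 16, 64, 256, 1024):
    edges = np.concatenate(([-np.inf], np.linspace(-10.0, 10.0, bins - 1), [np.inf]))
    print(bins, quantized_divergence(p, m, edges))
print("closed form:", closed_form)
```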

Another interesting property of relative entropy rates for stationary processes is that we can "reverse time" when computing the rate in the sense of the following lemma.

Lemma 7.4.2: Let $\{X_n\}$, p, and m be as in Lemma 7.4.1. If either $\bar{H}_{p\|m}(X) < \infty$ or $H_{p\|m}(X_0\mid X^-) < \infty$, then
$$H_{p\|m}(X_0 \mid X_{-1}, \cdots, X_{-n}) = H_{p\|m}(X_0 \mid X_1, \cdots, X_n)$$
and hence
$$H_{p\|m}(X_0 \mid X_1, X_2, \cdots) = H_{p\|m}(X_0 \mid X_{-1}, X_{-2}, \cdots) = \bar{H}_{p\|m}(X) < \infty.$$

Proof: If $\bar{H}_{p\|m}(X)$ is finite, then so must be the terms $H_{p\|m}(X^n) = D(P_{X^n}\|M_{X^n})$ (since otherwise all such terms with larger n would also be infinite and hence $\bar{H}$ could not be finite). Thus from stationarity
$$H_{p\|m}(X_0 \mid X_{-1}, \cdots, X_{-n}) = H_{p\|m}(X_n \mid X^n) = D(P_{X^{n+1}}\|M_{X^{n+1}}) - D(P_{X^n}\|M_{X^n})$$
$$= D(P_{X^{n+1}}\|M_{X^{n+1}}) - D(P_{X_1^n}\|M_{X_1^n}) = H_{p\|m}(X_0 \mid X_1, \cdots, X_n),$$
from which the results follow. If on the other hand the conditional relative entropy is finite, the results then follow as in the proof of Lemma 7.4.1 using the fact that the joint relative entropies are arithmetic averages of the conditional relative entropies and that the conditional relative entropy is defined as the divergence between the P and S measures (Theorem 5.3.2). $\Box$
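For a finite alphabet the time-reversal identity of Lemma 7.4.2 can be checked by brute force. The sketch below is an illustration that is not part of the original text: with two made-up stationary binary Markov chains playing the roles of p and m, it computes $H_{p\|m}(X_0\mid X_{-1},\cdots,X_{-n})$ and $H_{p\|m}(X_0\mid X_1,\cdots,X_n)$ directly from the $(n{+}1)$-block pmf's by marginalization and confirms that they agree for several n.

```python
import itertools
import numpy as np

# made-up stationary binary Markov chains p and m
# (transition matrices and their stationary distributions)
Pp = np.array([[0.9, 0.1], [0.2, 0.8]]); pip = np.array([2.0, 1.0]) / 3.0
Pm = np.array([[0.6, 0.4], [0.3, 0.7]]); pim = np.array([3.0, 4.0]) / 7.0

def joint(P, pi, x):
    """Probability of the block x under the stationary chain (pi, P)."""
    pr = pi[x[0]]
    for a, b in zip(x, x[1:]):
        pr *= P[a, b]
    return pr

def cond_div(n, pos):
    """Conditional relative entropy (p vs. m) of the symbol at index `pos` of an
    (n+1)-block given the remaining n symbols, by brute-force marginalization."""
    total = 0.0
    for x in itertools.product((0, 1), repeat=n + 1):
        px, mx = joint(Pp, pip, x), joint(Pm, pim, x)
        p_rest = sum(joint(Pp, pip, x[:pos] + (a,) + x[pos + 1:]) for a in (0, 1))
        m_rest = sum(joint(Pm, pim, x[:pos] + (a,) + x[pos + 1:]) for a in (0, 1))
        total += px * np.log((px / p_rest) / (mx / m_rest))
    return total

# By stationarity, H_{p||m}(X_0 | X_{-1},...,X_{-n}) is the pos = n case and
# H_{p||m}(X_0 | X_1,...,X_n) is the pos = 0 case; Lemma 7.4.2 says they coincide.
for n in (1, 2, 3, 6):
    print(n, cond_div(n, pos=n), cond_div(n, pos=0))
```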

7.5 Mean Ergodic Theorems

In this section we state and prove some preliminary ergodic theorems for relative entropy densities analogous to those first developed for entropy densities in Chapter 3 and for information densities in Section 6.3. In particular, we show that an almost everywhere ergodic theorem for finite alphabet processes follows easily from the sample entropy ergodic theorem and that an approximation argument then yields an $L^1$ ergodic theorem for stationary sources. The results involve little new and closely parallel those for mutual information densities, and therefore the details are kept brief. The results are given for completeness and because the $L^1$ results yield the byproduct that relative entropy densities are uniformly integrable, a fact which does not follow as easily for relative entropy densities as it did for entropy densities.


Finite Alphabets

Suppose that we now have two process distributions p and m for a random process $\{X_n\}$ with finite alphabet. Let $P_{X^n}$ and $M_{X^n}$ denote the induced nth order distributions and $p_{X^n}$ and $m_{X^n}$ the corresponding probability mass functions (pmf's). For example, $p_{X^n}(a^n) = P_{X^n}(\{x^n : x^n = a^n\}) = p(\{x : X^n(x) = a^n\})$. We assume that $P_{X^n} \ll M_{X^n}$. In this case the relative entropy density is given simply by
$$h_n(x) = h_{X^n}(X^n)(x) = \ln \frac{p_{X^n}(x^n)}{m_{X^n}(x^n)},$$
where $x^n = X^n(x)$.

The following lemma generalizes Theorem 3.1.1 from entropy densities to relative entropy densities for finite alphabet processes. Relative entropies are of more general interest than ordinary entropies because they generalize to continuous alphabets in a useful way while ordinary entropies do not.
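Before stating the lemma, the following simulation sketch (again an illustration that is not part of the original text, with an arbitrarily chosen binary Markov chain for p and the i.i.d. fair-coin measure for m) shows the kind of behaviour it asserts: along a sample path the normalized relative entropy density $n^{-1}h_n$ settles near the rate $H_{p\|m}(X_0\mid X^-)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# made-up example: p = stationary binary Markov chain, m = i.i.d. fair-coin measure
P = np.array([[0.9, 0.1], [0.2, 0.8]])   # p(x_i = b | x_{i-1} = a) = P[a, b]
pi = np.array([2.0, 1.0]) / 3.0          # stationary distribution of P
m_pmf = np.array([0.5, 0.5])

# simulate a path of length n under p, started in the stationary distribution
n = 200_000
x = np.empty(n, dtype=int)
x[0] = rng.choice(2, p=pi)
for i in range(1, n):
    x[i] = rng.choice(2, p=P[x[i - 1]])

# relative entropy density h_n = ln [ p_{X^n}(x^n) / m_{X^n}(x^n) ]
h_n = np.log(pi[x[0]] / m_pmf[x[0]]) + np.sum(np.log(P[x[:-1], x[1:]] / m_pmf[x[1:]]))

# rate H_{p||m}(X_0 | X^-) = sum_a pi(a) sum_b P(a,b) ln( P(a,b) / m(b) )
rate = sum(pi[a] * P[a, b] * np.log(P[a, b] / m_pmf[b])
           for a in range(2) for b in range(2))
print(h_n / n, rate)   # the two numbers should be close
```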

Lemma 7.5.1: Suppose that $\{X_n\}$ is a finite alphabet process and that p and m are two process distributions with $M_{X^n} \gg P_{X^n}$ for all n, where p is AMS with stationary mean $\bar{p}$, m is a kth order Markov source with stationary transitions, and $\{p_x\}$ is the ergodic decomposition of the stationary mean of p. Assume also that $M_{X^n} \gg \bar{P}_{X^n}$ for all n. Then
$$\lim_{n\to\infty} \frac{1}{n} h_n = h, \quad p\text{-a.e. and in } L^1(p),$$
where $h(x)$ is the invariant function defined by
$$h(x) = -\bar{H}_{p_x}(X) - E_{p_x} \ln m(X_k \mid X^k) = \lim_{n\to\infty} \frac{1}{n} H_{p_x\|m}(X^n) = \bar{H}_{p_x\|m}(X), \tag{7.27}$$
where
$$m(X_k \mid X^k)(x) = \frac{m_{X^{k+1}}(x^{k+1})}{m_{X^k}(x^k)} = M_{X_k\mid X^k}(x_k \mid x^k).$$
Furthermore,
$$E_p h = \bar{H}_{p\|m}(X) = \lim_{n\to\infty} \frac{1}{n} H_{p\|m}(X^n), \tag{7.28}$$
that is, the relative entropy rate of an AMS process with respect to a Markov process with stationary transitions is given by the limit. Lastly,
$$\bar{H}_{p\|m}(X) = \bar{H}_{\bar{p}\|m}(X), \tag{7.29}$$
that is, the relative entropy rate of the AMS process with respect to m is the same as that of its stationary mean with respect to m.

Proof: We have that
$$\frac{1}{n} h_n(X^n) = \frac{1}{n}\ln p(X^n) - \frac{1}{n}\ln m(X^k) - \frac{1}{n}\sum_{i=k}^{n-1}\ln m(X_i \mid X_{i-k}^k)$$
$$= \frac{1}{n}\ln p(X^n) - \frac{1}{n}\ln m(X^k) - \frac{1}{n}\sum_{i=k}^{n-1}\ln m(X_k \mid X^k)\,T^{i-k}, \tag{7.30}$$
where T is the shift transformation, $p(X^n)$ is an abbreviation for $P_{X^n}(X^n)$, and $m(X_k\mid X^k) = M_{X_k\mid X^k}(X_k\mid X^k)$. From Theorem 3.1.1 the first term converges to $-\bar{H}_{p_x}(X)$ p-a.e. and in $L^1(p)$.

Since $M_{X^k} \gg P_{X^k}$, if $M_{X^k}(F) = 0$, then also $P_{X^k}(F) = 0$. Thus $P_{X^k}$ and hence also p assign zero probability to the event that $M_{X^k}(X^k) = 0$. Thus with probability one under p, $\ln m(X^k)$ is finite and hence the second term in (7.30) converges to 0 p-a.e. as $n \to \infty$.

Define $\gamma$ as the minimum nonzero value of the conditional probability $m(x_k\mid x^k)$. Then with probability 1 under $M_{X^n}$, and hence also under $P_{X^n}$, we have that
$$\left|\frac{1}{n}\sum_{i=k}^{n-1}\ln m(X_i\mid X_{i-k}^k)\right| \le \ln\frac{1}{\gamma},$$
since otherwise the sequence $X^n$ would have 0 probability under $M_{X^n}$ and hence also under $P_{X^n}$, and $0 \ln 0$ is considered to be 0. Thus the rightmost term of (7.30) is uniformly integrable with respect to p and hence from Theorem 1.8.3 this term converges to $-E_{p_x}(\ln m(X_k\mid X^k))$. This proves the leftmost equality of (7.27).

 

 

denote the distribution of Xn under the ergodic component px.

 

Let p

n

jx

 

 

X

 

 

n

 

 

=

 

dp„(x)„pXn x, if MX

 

 

(F ) = 0, then pXn x(F ) =

 

Since MX

n

 

 

 

n

 

n

 

 

>> PX

and PX

 

 

 

 

 

 

0 p-a.e. Since the alphabet of XnRif flnite, wejtherefore also have with probabilityj

 

one under p„ that MXn >> p

 

 

n

jx

and hence

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

X

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

H

 

(Xn) =

 

 

pn

 

 

 

 

 

p

 

n

jx

(an)

 

 

 

 

 

 

 

 

 

 

X

 

(an) ln

X

 

 

 

 

 

 

 

 

 

 

 

 

 

 

pxjjm

 

 

 

 

 

 

 

 

 

n

X jx

 

 

 

 

 

MXn (an)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

a

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

is well deflned for p„-almost all x. This expectation can also be written as

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

1

 

 

 

 

 

 

Hpxjjm(Xn) = ¡Hpx (Xn) ¡ Epx [ln m(Xk) +

X

 

 

 

 

 

 

 

 

 

 

ln m(XkjXk)T i¡k]

 

 

 

i=k

= ¡Hpx (Xn) ¡ Epx [ln m(Xk)] ¡ (n ¡ k)Epx [ln m(XkjXk)];

where we have used the stationarity of the ergodic components. Dividing by n and taking the limit as n ! 1, the middle term goes to zero as previously and the remaining limits prove the middle equality and hence the rightmost inequality in (7.27).

Equation (7.28) follows from (7.27) and L1(p) convergence, that is, since n¡1hn ! h, we must also have that Ep(n¡1hn(Xn)) = n¡1Hpjjm(Xn) converges

to Eph. Since the former limit is Hpjjm(X), (7.28) follows. Since px is invariant (Theorem 1.8.2) and since expectations of invariant functions are the same under an AMS measure and its stationary mean (Lemma 6.3.1 of [50]), application of the previous results of the lemma to both p and p„ proves that

Z Z

„ „ „

Hpjjm(X) = dp(x)Hpx jjm(X) = dp„(x)Hpx jjm(X) = Hpjjm(X);

7.5. MEAN ERGODIC THEOREMS

143

which proves (7.30) and completes the proof of the lemma. 2

Corollary 7.5.1: Given p and m as in the Lemma, the relative entropy rate of p with respect to m has an ergodic decomposition, that is,
$$\bar{H}_{p\|m}(X) = \int d\bar{p}(x)\,\bar{H}_{p_x\|m}(X).$$
Proof: This follows immediately from (7.27) and (7.28). $\Box$
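As a hypothetical finite-alphabet illustration of this decomposition (not part of the original text), the sketch below mixes two ergodic binary Markov components into a stationary but nonergodic p, takes m to be the i.i.d. fair-coin measure, and checks that the weight-averaged component rates, which is what the corollary says $\bar{H}_{p\|m}(X)$ must equal, agree with a Monte Carlo average of per-component sample density rates; all matrices, weights, and sample sizes are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# invented ergodic components: two stationary binary Markov chains mixed with weights w;
# the reference measure m is the i.i.d. fair coin, so ln(q/m) = ln(2 q) term by term
P1 = np.array([[0.9, 0.1], [0.2, 0.8]]); pi1 = np.array([2.0, 1.0]) / 3.0
P2 = np.array([[0.3, 0.7], [0.6, 0.4]]); pi2 = np.array([6.0, 7.0]) / 13.0
w = (0.25, 0.75)

def rate(P, pi):
    """Component rate H_{p_x||m}(X_0 | X^-) against the fair-coin measure."""
    return sum(pi[a] * P[a, b] * np.log(2.0 * P[a, b]) for a in range(2) for b in range(2))

decomposition_value = w[0] * rate(P1, pi1) + w[1] * rate(P2, pi2)

def sample_density_rate(P, pi, n):
    """n^{-1} ln [ p_component(x^n) / m(x^n) ] along one simulated path of a component."""
    x = np.empty(n, dtype=int)
    x[0] = rng.choice(2, p=pi)
    for i in range(1, n):
        x[i] = rng.choice(2, p=P[x[i - 1]])
    return (np.log(2.0 * pi[x[0]]) + np.sum(np.log(2.0 * P[x[:-1], x[1:]]))) / n

# draw a component with probability w and record its sample density rate; by (7.27)
# each draw is close to that component's rate, so the average recovers the mixture value
n, trials = 5_000, 200
est = np.mean([sample_density_rate(*((P1, pi1) if rng.random() < w[0] else (P2, pi2)), n)
               for _ in range(trials)])
print(decomposition_value, est)
```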

Standard Alphabets

We now drop the finite alphabet assumption and suppose that $\{X_n\}$ is a standard alphabet process with process distributions p and m, where p is stationary, m is kth order Markov with stationary transitions, and $M_{X^n} \gg P_{X^n}$ are the induced vector distributions for $n = 1, 2, \cdots$. Define the densities $f_n$ and entropy densities $h_n$ as previously.

As an easy consequence of the development to this point, the ergodic decomposition for divergence rate of finite alphabet processes combined with the definition of H as a supremum over rates of quantized processes yields an extension of Corollary 6.2.1 to divergences. This yields other useful properties as summarized in the following corollary.

Corollary 7.5.1: Given a standard alphabet process $\{X_n\}$, suppose that p and m are two process distributions such that p is AMS and m is kth order Markov with stationary transitions and $M_{X^n} \gg P_{X^n}$ are the induced vector distributions. Let $\bar{p}$ denote the stationary mean of p and let $\{p_x\}$ denote the ergodic decomposition of the stationary mean $\bar{p}$. Then
$$H_{p\|m}(X) = \int d\bar{p}(x)\,H_{p_x\|m}(X). \tag{7.31}$$

In addition,
$$\bar{H}_{p\|m}(X) = \bar{H}_{\bar{p}\|m}(X) = H_{\bar{p}\|m}(X) = H_{p\|m}(X), \tag{7.32}$$

that is, the two definitions of relative entropy rate yield the same values for AMS p and stationary transition Markov m and both rates are the same as the corresponding rates for the stationary mean. Thus relative entropy rate has an ergodic decomposition in the sense that
$$\bar{H}_{p\|m}(X) = \int d\bar{p}(x)\,\bar{H}_{p_x\|m}(X). \tag{7.33}$$

Comment: Note that the extra technical conditions of Theorem 6.4.2 for equality of the analogous mutual information rates $\bar{I}$ and $I$ are not needed here. Note also that only the ergodic decomposition of the stationary mean $\bar{p}$ of the AMS measure p is considered and not that of the Markov source m.


Proof: The first statement follows as previously described from the finite alphabet result and the definition of H. The left-most and right-most equalities of (7.32) both follow from the previous lemma. The middle equality of (7.32) follows from Corollary 7.4.2. Eq. (7.33) then follows from (7.31) and (7.32). $\Box$

Theorem 7.5.1: Given a standard alphabet process $\{X_n\}$, suppose that p and m are two process distributions such that p is AMS and m is kth order Markov with stationary transitions and $M_{X^n} \gg P_{X^n}$ are the induced vector distributions. Let $\{p_x\}$ denote the ergodic decomposition of the stationary mean $\bar{p}$. If
$$\lim_{n\to\infty}\frac{1}{n}H_{p\|m}(X^n) = \bar{H}_{p\|m}(X) < \infty,$$
then there is an invariant function h such that $n^{-1}h_n \to h$ in $L^1(p)$ as $n \to \infty$. In fact,

$$h(x) = \bar{H}_{p_x\|m}(X),$$
the relative entropy rate of the ergodic component $p_x$ with respect to m. Thus, in particular, under the stated conditions the relative entropy densities $n^{-1}h_n$ are uniformly integrable with respect to p.

Proof: The proof exactly parallels that of Theorem 6.3.1, the mean ergodic theorem for information densities, with the relative entropy densities replacing the mutual information densities. The density is approximated by that of a quantized version and the integral bounded above using the triangle inequality. One term goes to zero from the finite alphabet case. Since $\bar{H} = H$ (Corollary 7.5.1) the remaining terms go to zero because the relative entropy rate can be approximated arbitrarily closely by that of a quantized process. $\Box$

It should be emphasized that although Theorem 7.5.1 and Theorem 6.3.1 are similar in appearance, neither result directly implies the other. It is true that mutual information can be considered as a special case of relative entropy, but given a pair process $\{X_n, Y_n\}$ we cannot in general find a kth order Markov distribution m for which the mutual information rate $\bar{I}(X; Y)$ equals a relative entropy rate $\bar{H}_{p\|m}$. We will later consider conditions under which convergence of relative entropy densities does imply convergence of information densities.

Chapter 8

Ergodic Theorems for Densities

8.1 Introduction

This chapter is devoted to developing ergodic theorems first for relative entropy densities and then information densities for the general case of AMS processes with standard alphabets. The general results were first developed by Barron [9] using the martingale convergence theorem and a new martingale inequality. The similar results of Algoet and Cover [7] can be proved without direct recourse to martingale theory. They infer the result for the stationary Markov approximation and for the infinite order approximation from the ordinary ergodic theorem. They then demonstrate that the growth rate of the true density is asymptotically sandwiched between that for the kth order Markov approximation and the infinite order approximation and that no gap is left between these asymptotic upper and lower bounds in the limit as $k \to \infty$. They use martingale theory to show that the values between which the limiting density is sandwiched are arbitrarily close to each other, but we shall see that this is not necessary and this property follows from the results of Chapter 6.

8.2 Stationary Ergodic Sources

Theorem 8.2.1: Given a standard alphabet process $\{X_n\}$, suppose that p and m are two process distributions such that p is stationary ergodic and m is a K-step Markov source with stationary transition probabilities. Let $M_{X^n} \gg P_{X^n}$ be the vector distributions induced by p and m. As before let
$$h_n = \ln f_{X^n}(X^n) = \ln \frac{dP_{X^n}}{dM_{X^n}}(X^n).$$


Then with probability one under p
$$\lim_{n\to\infty}\frac{1}{n}h_n = \bar{H}_{p\|m}(X).$$

Proof: Let $p^{(k)}$ denote the k-step Markov approximation of p as defined in Theorem 7.3.1, that is, $p^{(k)}$ has the same kth order conditional probabilities and k-dimensional initial distribution. From Corollary 7.3.1, if $k \ge K$, then (7.8)–(7.10) hold. Consider the expectation
$$E_p\left(\frac{f^{(k)}_{X^n}(X^n)}{f_{X^n}(X^n)}\right) = E_{P_{X^n}}\left(\frac{f^{(k)}_{X^n}}{f_{X^n}}\right) = \int \frac{f^{(k)}_{X^n}}{f_{X^n}}\, dP_{X^n}.$$
Define the set $A_n = \{x^n : f_{X^n} > 0\}$; then $P_{X^n}(A_n) = 1$. Use the fact that $f_{X^n} = dP_{X^n}/dM_{X^n}$ to write
$$E_P\left(\frac{f^{(k)}_{X^n}(X^n)}{f_{X^n}(X^n)}\right) = \int_{A_n} \frac{f^{(k)}_{X^n}}{f_{X^n}}\, f_{X^n}\, dM_{X^n} = \int_{A_n} f^{(k)}_{X^n}\, dM_{X^n}.$$
From Corollary 7.3.1,
$$f^{(k)}_{X^n} = \frac{dP^{(k)}_{X^n}}{dM_{X^n}}$$
and therefore
$$E_p\left(\frac{f^{(k)}_{X^n}(X^n)}{f_{X^n}(X^n)}\right) = \int_{A_n} \frac{dP^{(k)}_{X^n}}{dM_{X^n}}\, dM_{X^n} = P^{(k)}_{X^n}(A_n) \le 1.$$

Thus we can apply Lemma 5.4.2 to the sequence $f^{(k)}_{X^n}(X^n)/f_{X^n}(X^n)$ to conclude that with p-probability 1
$$\limsup_{n\to\infty}\frac{1}{n}\ln\frac{f^{(k)}_{X^n}(X^n)}{f_{X^n}(X^n)} \le 0$$
and hence
$$\lim_{n\to\infty}\frac{1}{n}\ln f^{(k)}_{X^n}(X^n) \le \liminf_{n\to\infty}\frac{1}{n}\ln f_{X^n}(X^n). \tag{8.1}$$

The left-hand limit is well defined by the usual ergodic theorem:
$$\lim_{n\to\infty}\frac{1}{n}\ln f^{(k)}_{X^n}(X^n) = \lim_{n\to\infty}\frac{1}{n}\sum_{l=k}^{n-1}\ln f_{X_l\mid X_{l-k}^k}(X_l\mid X_{l-k}^k) + \lim_{n\to\infty}\frac{1}{n}\ln f_{X^k}(X^k).$$

Since $0 < f_{X^k} < \infty$ with probability 1 under $M_{X^k}$ and hence also under $P_{X^k}$, then $0 < f_{X^k}(X^k) < \infty$ under p and therefore $n^{-1}\ln f_{X^k}(X^k) \to 0$ as $n \to \infty$ with probability one. Furthermore, from the ergodic theorem for stationary and ergodic processes (e.g., Theorem 7.2.1 of [50]), since p is stationary ergodic we have with probability one under p using (7.20) and Corollary 7.4.1 that
$$\lim_{n\to\infty}\frac{1}{n}\sum_{l=k}^{n-1}\ln f_{X_l\mid X_{l-k}^k}(X_l\mid X_{l-k}^k) = \lim_{n\to\infty}\frac{1}{n}\sum_{l=k}^{n-1}\ln f_{X_0\mid X_{-1},\cdots,X_{-k}}(X_0\mid X_{-1},\cdots,X_{-k})\,T^l$$
$$= E_p \ln f_{X_0\mid X_{-1},\cdots,X_{-k}}(X_0\mid X_{-1},\cdots,X_{-k}) = H_{p\|m}(X_0\mid X_{-1},\cdots,X_{-k}) = \bar{H}_{p^{(k)}\|m}(X).$$

Thus with (8.1) we now have that
$$\liminf_{n\to\infty}\frac{1}{n}\ln f_{X^n}(X^n) \ge H_{p\|m}(X_0\mid X_{-1},\cdots,X_{-k}) \tag{8.2}$$

for any positive integer k. Since m is Kth order Markov, Lemma 7.4.1 and the above imply that
$$\liminf_{n\to\infty}\frac{1}{n}\ln f_{X^n}(X^n) \ge H_{p\|m}(X_0\mid X^-) = \bar{H}_{p\|m}(X), \tag{8.3}$$

which completes half of the sandwich proof of the theorem.
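The mechanism of this half of the sandwich can be watched numerically. The sketch below is illustrative only (a made-up binary second-order chain, Python, and the choice k = 1 are all assumptions for the example): it builds the first-order Markov approximation $p^{(1)}$ of p in the sense used above (same one-step conditional probabilities and initial marginal), simulates a p-typical path, and evaluates $n^{-1}\ln\big[f^{(1)}_{X^n}(X^n)/f_{X^n}(X^n)\big]$, which for a finite alphabet equals $n^{-1}\ln\big[p^{(1)}(X^n)/p(X^n)\big]$ for any dominating Markov m. In line with the bound $E_p\big[f^{(1)}_{X^n}(X^n)/f_{X^n}(X^n)\big]\le 1$ and (8.1), the quantity stays at or below zero for large n.

```python
import numpy as np

rng = np.random.default_rng(2)

# made-up second-order binary chain p: P2[(a, b)][c] = p(X_i = c | X_{i-2} = a, X_{i-1} = b)
P2 = {(0, 0): [0.9, 0.1], (0, 1): [0.4, 0.6], (1, 0): [0.7, 0.3], (1, 1): [0.2, 0.8]}

# stationary distribution of the pair process (X_{i-1}, X_i)
pairs = [(a, b) for a in (0, 1) for b in (0, 1)]
Q = np.zeros((4, 4))
for i, (a, b) in enumerate(pairs):
    for c in (0, 1):
        Q[i, pairs.index((b, c))] = P2[(a, b)][c]
evals, evecs = np.linalg.eig(Q.T)
pi_pair = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
pi_pair /= pi_pair.sum()
pi1 = {b: sum(pi_pair[pairs.index((a, b))] for a in (0, 1)) for b in (0, 1)}

# first-order Markov approximation p^(1): same one-step conditionals and initial marginal
P1 = {b: [sum(pi_pair[pairs.index((a, b))] * P2[(a, b)][c] for a in (0, 1)) / pi1[b]
          for c in (0, 1)] for b in (0, 1)}

# simulate a p-typical path and compute n^{-1} ln [ p^(1)(x^n) / p(x^n) ],
# which equals n^{-1} ln [ f^(1)_{X^n}(X^n) / f_{X^n}(X^n) ] for any dominating Markov m
n = 100_000
x = list(pairs[rng.choice(4, p=pi_pair)])
for _ in range(2, n):
    x.append(int(rng.choice(2, p=P2[(x[-2], x[-1])])))

log_p = np.log(pi_pair[pairs.index((x[0], x[1]))]) \
        + sum(np.log(P2[(x[i - 2], x[i - 1])][x[i]]) for i in range(2, n))
log_p1 = np.log(pi1[x[0]]) + sum(np.log(P1[x[i - 1]][x[i]]) for i in range(1, n))

print((log_p1 - log_p) / n)   # nonpositive for large n, as (8.1) requires
```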

If $\bar{H}_{p\|m}(X) = \infty$, the proof is completed with (8.3). Hence we can suppose that $\bar{H}_{p\|m}(X) < \infty$. From Lemma 7.4.1, using the distribution $S_{X_0,X_{-1},X_{-2},\cdots}$ constructed there, we have that

$$D(P_{X_0,X_{-1},\cdots}\|S_{X_0,X_{-1},\cdots}) = H_{p\|m}(X_0\mid X^-) = \int dP_{X_0,X^-}\,\ln f_{X_0\mid X^-},$$
where
$$f_{X_0\mid X^-} = \frac{dP_{X_0,X_{-1},\cdots}}{dS_{X_0,X_{-1},\cdots}}.$$

It should be pointed out that we have not (and will not) prove that $f_{X_0\mid X_{-1},\cdots,X_{-n}} \to f_{X_0\mid X^-}$, the convergence of conditional probability densities which follows from the martingale convergence theorem and the result about which most generalized Shannon-McMillan-Breiman theorems are built (see, e.g., Barron [9]). We have proved, however, that the expectations converge (Lemma 7.4.1), which is what is needed to make the sandwich argument work.

For the second half of the sandwich proof we construct a measure Q which will be dominated by p on semi-infinite sequences using the above conditional densities given the infinite past. Define the semi-infinite sequence $X_n^- = (\cdots, X_{n-2}, X_{n-1})$ for all nonnegative integers n. Let $\mathcal{B}_k^n = \sigma(X_k^n)$ and $\mathcal{B}_k^- = \sigma(X_k^-) = \sigma(\cdots, X_{k-1})$ be the $\sigma$-fields generated by the finite dimensional random vector $X_k^n$ and the semi-infinite sequence $X_k^-$, respectively. Let Q be the process distribution having the same restriction to $\sigma(X_k^-)$ as does p and the same restriction to $\sigma(X_0, X_1, \cdots)$ as does p, but which makes $X^-$ and $X_k^n$ conditionally independent given $X^k$ for any n; that is,
$$Q_{X_k^-} = P_{X_k^-},$$
$$Q_{X_k,X_{k+1},\cdots} = P_{X_k,X_{k+1},\cdots},$$
and $X^- \to X^k \to X_k^n$ is a Markov chain for all positive integers n so that
$$Q(X_k^n \in F \mid X_k^-) = Q(X_k^n \in F \mid X^k).$$
The measure Q is a (nonstationary) k-step Markov approximation to P in the sense of Section 5.3 and
$$Q = P_{X^-\times(X_k,X_{k+1},\cdots)\mid X^k}$$
(in contrast to $P = P_{X^- X^k X_k^\infty}$). Observe that $X^- \to X^k \to X_k^n$ is a Markov chain under both Q and m.

By assumption, $H_{p\|m}(X_0\mid X^-) < \infty$ and hence from Corollary 7.4.1
$$H_{p\|m}(X_k^n \mid X_k^-) = n\,H_{p\|m}(X_0\mid X^-) < \infty$$

 

and hence from Theorem 5.3.2 the density $f_{X_k^n\mid X_k^-}$ is well-defined as
$$f_{X_k^n\mid X_k^-} = \frac{dP_{X_{n+k}^-}}{dS_{X_{n+k}^-}}, \tag{8.4}$$
where
$$S_{X_{n+k}^-} = M_{X_k^n\mid X^k}\,P_{X_k^-}, \tag{8.5}$$
and
$$\int dP_{X_{n+k}^-}\,\ln f_{X_k^n\mid X_k^-} = D(P_{X_{n+k}^-}\|S_{X_{n+k}^-}) = H_{p\|m}(X_k^n\mid X_k^-) < \infty. \tag{8.6}$$

Thus, in particular,
$$S_{X_{n+k}^-} \gg P_{X_{n+k}^-}.$$

Consider now the sequence of ratios of conditional densities
$$\zeta_n = \frac{f_{X_k^n\mid X^k}(X^{n+k})}{f_{X_k^n\mid X_k^-}(X_{n+k}^-)}.$$
We have that
$$\int dp\,\zeta_n = \int_{G_n} \cdots$$