
where
$$G_n = \{\, x : f_{X^n|X^-}(x^n|x^-) > 0 \,\},$$
since $G_n$ has probability 1 under $p$ (or else (8.6) would be violated). Thus
$$\int \zeta_n\,dp = \int dP_{X^{n+k}} \left( \frac{f_{X_k^n|X^k}}{f_{X_k^n|X^{k-}}}(X^{n+k})\, 1_{\{f_{X^n|X^-}>0\}} \right) = \int dS_{X^{n+k}}\, f_{X_k^n|X^k}(X^{n+k})\, 1_{\{f_{X^n|X^-}>0\}} \le \int dS_{X^{n+k}}\, f_{X_k^n|X^k}(X^{n+k}).$$
Using the definition of the measure $S$ and iterated expectation we have that
$$\int \zeta_n\,dp \le \iint dM_{X_k^n|X^{k-}}\, dP_{X^{k-}}\, f_{X_k^n|X^k}(X^{n+k}) = \iint dM_{X_k^n|X^k}\, dP_{X^{k-}}\, f_{X_k^n|X^k}(X^{n+k}),$$
where the equality uses the fact that $m$ is $k$-step Markov. Since the integrand is now measurable with respect to $\sigma(X^{n+k})$, this reduces to
$$\int \zeta_n\,dp \le \iint dM_{X_k^n|X^k}\, dP_{X^k}\, f_{X_k^n|X^k}.$$
Applying Lemma 5.3.2 we have
$$\int \zeta_n\,dp \le \iint dM_{X_k^n|X^k}\, dP_{X^k}\, \frac{dP_{X_k^n|X^k}}{dM_{X_k^n|X^k}} = \iint dP_{X^k}\, dP_{X_k^n|X^k} = 1.$$
Thus
$$\int \zeta_n\,dp \le 1$$
and we can apply Lemma 5.4.1 to conclude that $p$-a.e.
$$\limsup_{n\to\infty} \frac{1}{n}\ln \zeta_n = \limsup_{n\to\infty} \frac{1}{n}\ln \frac{f_{X_k^n|X^k}}{f_{X_k^n|X^{k-}}} \le 0. \tag{8.7}$$
Using the chain rule for densities,
$$\frac{f_{X_k^n|X^k}}{f_{X_k^n|X^{k-}}} = \frac{f_{X^n}}{f_{X^k}} \times \frac{1}{\prod_{l=k}^{n-1} f_{X_l|X^{l-}}}.$$
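This factorization can be checked directly from the definitions; a minimal verification, assuming the conditional densities are given by the usual ratios, is
$$f_{X_k^n|X^k} = \frac{f_{X^n}}{f_{X^k}}, \qquad f_{X_k^n|X^{k-}} = \frac{f_{X^n|X^-}}{f_{X^k|X^-}} = \prod_{l=k}^{n-1} f_{X_l|X^{l-}},$$
and dividing the first relation by the second yields the display above.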
Thus from (8.7)
$$\limsup_{n\to\infty} \frac{1}{n}\left(\ln f_{X^n} - \ln f_{X^k} - \sum_{l=k}^{n-1} \ln f_{X_l|X^{l-}}\right) \le 0.$$
Invoking the ergodic theorem for the rightmost term and the fact that the middle term converges to 0 almost everywhere since $\ln f_{X^k}$ is finite almost everywhere implies that
$$\limsup_{n\to\infty}\frac{1}{n}\ln f_{X^n} \le E_p(\ln f_{X_k|X^{k-}}) = E_p(\ln f_{X_0|X^-}) = \bar H_{p\|m}(X). \tag{8.8}$$
Combining this with (8.3) completes the sandwich and proves the theorem. $\Box$
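As a quick sanity check, consider the simplest special case, not treated separately here: $p$ i.i.d. and $m$ memoryless with $M_{X^n} \gg P_{X^n}$. Then the density factors as $f_{X^n}(X^n) = \prod_{l=0}^{n-1} f_{X_0}(X_l)$ and the theorem reduces to the ordinary strong law of large numbers:
$$\frac{1}{n}\ln f_{X^n}(X^n) = \frac{1}{n}\sum_{l=0}^{n-1} \ln f_{X_0}(X_l) \mathop{\longrightarrow}_{n\to\infty} E_p \ln f_{X_0}(X_0) = \bar H_{p\|m}(X), \quad p\text{-a.e.}$$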
8.3 Stationary Nonergodic Sources
Next suppose that the source $p$ is stationary with ergodic decomposition $\{p_\lambda;\ \lambda\in\Lambda\}$ and ergodic component function $\psi$ as in Theorem 1.8.3. We first require some technical details to ensure that the various Radon-Nikodym derivatives are well defined and that the needed chain rules for densities hold.
Lemma 8.3.1: Given a stationary source $\{X_n\}$, let $\{p_\lambda;\ \lambda\in\Lambda\}$ denote the ergodic decomposition and $\psi$ the ergodic component function of Theorem 1.8.3. Let $P_\psi$ denote the induced distribution of $\psi$. Let $P_{X^n}$ and $P^\lambda_{X^n}$ denote the induced marginal distributions of $p$ and $p_\lambda$. Assume that $\{X_n\}$ has the finite-gap information property of (6.13); that is, there exists a $K$ such that
$$I_p(X_K; X^-|X^K) < \infty, \tag{8.9}$$
where $X^- = (X_{-1}, X_{-2}, \cdots)$. We also assume that for some $n$
$$I(X^n; \psi) < \infty. \tag{8.10}$$
This will be the case, for example, if (8.9) holds for $K = 0$. Let $m$ be a $K$-step Markov process such that $M_{X^n} \gg P_{X^n}$ for all $n$. (Observe that such a process exists since from (8.9) the $K$th order Markov approximation $p^{(K)}$ suffices.) Define $M_{X^n,\psi} = M_{X^n} \times P_\psi$. Then
$$M_{X^n,\psi} \gg P_{X^n}\times P_\psi \gg P_{X^n,\psi}, \tag{8.11}$$
and with probability 1 under $p$
$$M_{X^n} \gg P_{X^n} \gg P^\psi_{X^n}.$$
Lastly,
$$\frac{dP^\psi_{X^n}}{dM_{X^n}} = f_{X^n|\psi} = \frac{dP_{X^n,\psi}}{d(M_{X^n}\times P_\psi)} \tag{8.12}$$
and therefore
$$\frac{dP^\psi_{X^n}}{dP_{X^n}} = \frac{dP^\psi_{X^n}/dM_{X^n}}{dP_{X^n}/dM_{X^n}} = \frac{f_{X^n|\psi}}{f_{X^n}}. \tag{8.13}$$
Proof: From Theorem 6.4.4 the given assumptions ensure that
$$\lim_{n\to\infty}\frac{1}{n} E_p\, i(X^n;\psi) = \lim_{n\to\infty}\frac{1}{n} I(X^n;\psi) = 0 \tag{8.14}$$
and hence $P_{X^n}\times P_\psi \gg P_{X^n,\psi}$ (since otherwise $I(X^n;\psi)$ would be infinite for some $n$ and hence infinite for all larger $n$ since it is increasing with $n$). This proves the right-most absolute continuity relation of (8.11). This in turn implies that $M_{X^n}\times P_\psi \gg P_{X^n,\psi}$. The lemma then follows from Theorem 5.3.1 with $X = X^n$, $Y = \psi$ and the chain rule for Radon-Nikodym derivatives. $\Box$
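The first implication rests on the standard fact that mutual information is infinite when absolute continuity fails; a one-line check of the contrapositive, using the partition definition of divergence, is that if $P_{X^n,\psi}(F) > 0$ while $(P_{X^n}\times P_\psi)(F) = 0$ for some event $F$, then
$$I(X^n;\psi) \ge P_{X^n,\psi}(F)\ln\frac{P_{X^n,\psi}(F)}{(P_{X^n}\times P_\psi)(F)} + P_{X^n,\psi}(F^c)\ln\frac{P_{X^n,\psi}(F^c)}{(P_{X^n}\times P_\psi)(F^c)} = \infty.$$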
We know that the source will produce with probability one an ergodic component $p_\lambda$ and hence Theorem 8.2.1 will hold for this ergodic component. In other words, we have for all $\lambda$ that
$$\lim_{n\to\infty}\frac{1}{n}\ln f_{X^n|\psi}(X^n|\lambda) = \bar H_{p_\lambda\|m}(X), \quad p_\lambda\text{-a.e.}$$
This implies that
$$\lim_{n\to\infty}\frac{1}{n}\ln f_{X^n|\psi}(X^n|\psi) = \bar H_{p_\psi\|m}(X), \quad p\text{-a.e.} \tag{8.15}$$
Making this step precise generalizes Lemma 3.3.1.
Lemma 8.3.2: Suppose that $\{X_n\}$ is a stationary, not necessarily ergodic, source with ergodic component function $\psi$. Then (8.15) holds.
Proof: The proof parallels that for Lemma 3.3.1. Observe that if we have two random variables $U, V$ ($U = X_0, X_1, \cdots$ and $V = \psi$ above) and a sequence of functions $g_n(U,V)$ ($n^{-1}\ln f_{X^n|\psi}(X^n|\psi)$) and a function $g(V)$ ($\bar H_{p_\psi\|m}(X)$) with the property
$$\lim_{n\to\infty} g_n(U, v) = g(v), \quad P_{U|V=v}\text{-a.e.},$$
then also
$$\lim_{n\to\infty} g_n(U, V) = g(V), \quad P_{UV}\text{-a.e.}$$
since defining the (measurable) set $G = \{u, v : \lim_{n\to\infty} g_n(u,v) = g(v)\}$ and its section $G_v = \{u : (u,v)\in G\}$, then from (1.26)
$$P_{UV}(G) = \int P_{U|V}(G_v|v)\,dP_V(v) = 1$$
if $P_{U|V}(G_v|v) = 1$ with probability 1. $\Box$
It is not, however, the relative entropy density using the distribution of the ergodic component that we wish to show converges. It is the original sample density $f_{X^n}$. The following theorem shows that the two sample entropies converge to the same thing. The theorem generalizes Lemma 3.3.1 and is proved by a sandwich argument analogous to Theorem 8.2.1. The result can be viewed as an almost everywhere version of (8.14).
Theorem 8.3.1: Given a stationary source $\{X_n\}$, let $\{p_\lambda;\ \lambda\in\Lambda\}$ denote the ergodic decomposition and $\psi$ the ergodic component function of Theorem 1.8.3. Assume that the finite-gap information property (8.9) is satisfied and that (8.10) holds for some $n$. Then
$$\lim_{n\to\infty}\frac{1}{n}\, i(X^n;\psi) = \lim_{n\to\infty}\frac{1}{n}\ln\frac{f_{X^n|\psi}}{f_{X^n}} = 0, \quad p\text{-a.e.}$$
Proof: From Theorem 5.4.1 we have immediately that
$$\liminf_{n\to\infty}\frac{1}{n}\, i(X^n;\psi) \ge 0, \tag{8.16}$$
which provides half of the sandwich proof.
To develop the other half of the sandwich, for each $k \ge K$ let $p^{(k)}$ denote the $k$-step Markov approximation of $p$. Exactly as in the proof of Theorem 8.2.1, it follows that (8.1) holds. Now, however, the Markov approximation relative entropy density converges instead as
$$\lim_{n\to\infty}\frac{1}{n}\ln f^{(k)}_{X^n}(X^n) = \lim_{n\to\infty}\frac{1}{n}\sum_{l=k}^{n-1}\ln f_{X_k|X^k}(X_k|X^k)\,T^{l-k} = E_{p_\psi}\ln f_{X_k|X^k}(X_k|X^k).$$
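The middle expression can be seen from the product form of the $k$-step Markov approximation density; a brief sketch, assuming the stationarity of $p$, is
$$f^{(k)}_{X^n}(X^n) = f_{X^k}(X^k)\prod_{l=k}^{n-1} f_{X_k|X^k}(X_k|X^k)\,T^{l-k},$$
where $T$ is the shift; the term $n^{-1}\ln f_{X^k}(X^k)$ vanishes in the limit since the density is finite almost everywhere, and the ergodic theorem for the stationary (not necessarily ergodic) source sends the sample average of the remaining terms to the conditional expectation under the ergodic component $p_\psi$.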
Combining this with (8.15) we have that
$$\limsup_{n\to\infty}\frac{1}{n}\ln\frac{f_{X^n|\psi}(X^n|\psi)}{f_{X^n}(X^n)} \le \bar H_{p_\psi\|m}(X) - E_{p_\psi}\ln f_{X_k|X^k}(X_k|X^k).$$
From Lemma 7.4.1, the right-hand side is just $I_{p_\psi}(X_k; X^-|X^k)$, which from Corollary 7.4.2 is just $\bar H_{p_\psi\|p^{(k)}}(X)$. Since the bound holds for all $k$, we have that
$$\limsup_{n\to\infty}\frac{1}{n}\ln\frac{f_{X^n|\psi}(X^n|\psi)}{f_{X^n}(X^n)} \le \inf_k \bar H_{p_\psi\|p^{(k)}}(X) \equiv \zeta.$$
Using the ergodic decomposition of relative entropy rate (Corollary 7.5.1) and the fact that Markov approximations are asymptotically accurate (Corollary 7.4.3), we have further that
$$\int dP_\psi\,\zeta = \int dP_\psi\,\inf_k \bar H_{p_\psi\|p^{(k)}}(X) \le \inf_k \int dP_\psi\, \bar H_{p_\psi\|p^{(k)}}(X) = \inf_k \bar H_{p\|p^{(k)}}(X) = 0$$
and hence $\zeta = 0$ with $P_\psi$ probability 1.
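The last step uses nonnegativity: each $\bar H_{p_\psi\|p^{(k)}}(X)$ is a relative entropy rate and hence nonnegative, so $\zeta \ge 0$, and a nonnegative function with zero integral must vanish with probability 1:
$$\zeta \ge 0 \ \text{ and } \ \int \zeta\, dP_\psi = 0 \ \Longrightarrow\ P_\psi(\zeta = 0) = 1.$$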
Thus
$$\limsup_{n\to\infty}\frac{1}{n}\ln\frac{f_{X^n|\psi}(X^n|\psi)}{f_{X^n}(X^n)} \le 0, \tag{8.17}$$
which with (8.16) completes the sandwich proof. $\Box$
Simply restating the theorem using (8.15) yields the ergodic theorem for relative entropy densities in the general stationary case.
Corollary 8.3.1: Given the assumptions of Theorem 8.3.1,
$$\lim_{n\to\infty}\frac{1}{n}\ln f_{X^n}(X^n) = \bar H_{p_\psi\|m}(X), \quad p\text{-a.e.}$$
The corollary states that the sample relative entropy density of a process satisfying (8.9) converges to the conditional relative entropy rate with respect to the underlying ergodic component. This is a slight extension and elaboration
of Barron's result [9], which made the stronger assumption that $\bar H_{p\|m}(X_0|X^-) = \bar H_{p\|m}(X) < \infty$. From Corollary 7.4.3 this condition is sufficient but not necessary for the finite-gap information property of (8.9). In particular, the finite-gap information property implies that
$$\bar H_{p\|p^{(k)}}(X) = I_p(X_k; X^-|X^k) < \infty,$$
but it need not be true that $\bar H_{p\|m}(X) < \infty$. In addition, Barron [9] and Algoet and Cover [7] do not characterize the limiting density as the entropy rate of the ergodic component; instead they effectively show that the limit is $E_{p_\psi}(\ln f_{X_0|X^-}(X_0|X^-))$. This, however, is equivalent since it follows from the ergodic decomposition (see specifically Lemma 8.6.2 of [50]) that $f_{X_0|X^-} = f_{X_0|X^-,\psi}$ with probability one, since the ergodic component $\psi$ can be determined from the infinite past $X^-$.
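In equation form, the claimed equivalence is the observation (a sketch, relying on the cited lemma) that
$$E_{p_\psi}\big(\ln f_{X_0|X^-}(X_0|X^-)\big) = E_{p_\psi}\big(\ln f_{X_0|X^-,\psi}(X_0|X^-,\psi)\big) = \bar H_{p_\psi\|m}(X),$$
where the first equality is the almost sure identity of the two conditional densities and the second is the characterization of the relative entropy rate used in (8.8).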
8.4 AMS Sources
The following lemma is a generalization of Lemma 3.4.1. The result is due to Barron [9], who proved it using martingale inequalities and convergence results.
Lemma 8.4.1: Let $\{X_n\}$ be an AMS source with the property that for every integer $k$ there exists an integer $l = l(k)$ such that
$$I_p(X^k; (X_{k+l}, X_{k+l+1}, \cdots)|X_k^l) < \infty. \tag{8.18}$$
Then
$$\lim_{n\to\infty}\frac{1}{n}\, i(X^k; (X_{k+l},\cdots,X_{n-1})|X_k^l) = 0, \quad p\text{-a.e.}$$
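For instance, if $p$ happens to be itself a $K$-step Markov source, then for $l \ge K$ the block $X^k$ and the future $(X_{k+l}, X_{k+l+1}, \cdots)$ are conditionally independent given $X_k^l$, so (8.18) holds trivially with
$$I_p(X^k; (X_{k+l}, X_{k+l+1}, \cdots)|X_k^l) = 0.$$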
Proof: By assumption
$$I_p(X^k; (X_{k+l}, X_{k+l+1},\cdots)|X_k^l) = E_p \ln \frac{f_{X^k|X_k^l, X_{k+l}, X_{k+l+1},\cdots}(X^k|X_k^l, X_{k+l}, X_{k+l+1},\cdots)}{f_{X^k|X_k^l}(X^k|X_k^l)} < \infty.$$
This implies that
$$P_{X^k\times(X_{k+l},\cdots)|X_k^l} \gg P_{X_0,X_1,\ldots}$$
with
$$\frac{dP_{X_0,X_1,\ldots}}{dP_{X^k\times(X_{k+l},\cdots)|X_k^l}} = \frac{f_{X^k|X_k^l, X_{k+l}, X_{k+l+1},\cdots}(X^k|X_k^l, X_{k+l},\cdots)}{f_{X^k|X_k^l}(X^k|X_k^l)}.$$
Restricting the measures to $X^n$ for $n > k + l$ yields
$$\ln\frac{dP_{X^n}}{dP_{X^k\times(X_{k+l},\cdots,X_{n-1})|X_k^l}} = \ln\frac{f_{X^k|X_k^l, X_{k+l},\cdots,X_{n-1}}(X^k|X_k^l, X_{k+l},\cdots,X_{n-1})}{f_{X^k|X_k^l}(X^k|X_k^l)} = i(X^k; (X_{k+l},\cdots,X_{n-1})|X_k^l).$$
With this setup the lemma follows immediately from Theorem 5.4.1. $\Box$
The following theorem generalizes Lemma 3.4.2 and will yield the general theorem. It was first proved by Barron [9] using martingale inequalities.
Theorem 8.4.1: Suppose that $p$ and $m$ are distributions of a standard alphabet process $\{X_n\}$ such that $p$ is AMS and $m$ is $k$-step Markov. Let $\bar p$ be a stationary measure that asymptotically dominates $p$ (e.g., the stationary mean). Suppose that $P_{X^n}$, $\bar P_{X^n}$, and $M_{X^n}$ are the distributions induced by $p$, $\bar p$, and $m$, that $M_{X^n}$ dominates both $P_{X^n}$ and $\bar P_{X^n}$ for all $n$, and that $f_{X^n}$ and $\bar f_{X^n}$ are the corresponding densities. If there is an invariant function $h$ such that
$$\lim_{n\to\infty}\frac{1}{n}\ln \bar f_{X^n}(X^n) = h, \quad \bar p\text{-a.e.},$$
then also
$$\lim_{n\to\infty}\frac{1}{n}\ln f_{X^n}(X^n) = h, \quad p\text{-a.e.}$$
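The proof compares the two sample densities block by block. The elementary identity behind the comparison, valid whenever the indicated densities exist, is the chain rule
$$f_{X^n} = f_{X^k|X_k^{n-k}}\, f_{X_k^{n-k}},$$
so the two normalized log densities can differ only through a conditional term, and Lemma 8.4.1 is what controls that term.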
Proof: For any $k$ and $n \ge k$ we can write, using the chain rule for densities,
$$\frac{1}{n}\ln f_{X^n} - \frac{1}{n}\ln f_{X_k^{n-k}} = \frac{1}{n}\ln f_{X^k|X_k^{n-k}}.$$
Since for $k \le l < n$
$$\frac{1}{n}\ln f_{X^k|X_k^{n-k}} = \frac{1}{n}\ln f_{X^k|X_k^l} + \frac{1}{n}\, i(X^k; (X_{k+l},\cdots,X_{n-1})|X_k^l),$$
Lemma 8.4.1 and the fact that densities are finite with probability one implies that
$$\lim_{n\to\infty}\frac{1}{n}\ln f_{X^k|X_k^{n-k}} = 0, \quad p\text{-a.e.}$$
This implies that there is a subsequence $k(n) \to \infty$ such that
$$\frac{1}{n}\ln f_{X^n}(X^n) - \frac{1}{n}\ln f_{X_{k(n)}^{n-k(n)}}(X_{k(n)}^{n-k(n)}) \to 0, \quad p\text{-a.e.}$$
To prove this, for each $k$ choose $N(k)$ large enough so that
$$p\left(\left|\frac{1}{N(k)}\ln f_{X^k|X_k^{N(k)-k}}(X^k|X_k^{N(k)-k})\right| > 2^{-k}\right) \le 2^{-k}$$
and then let $k(n) = k$ for $N(k) \le n < N(k+1)$. Then from the Borel-Cantelli lemma we have for any $\epsilon$ that
$$p\left(\left|\frac{1}{N(k)}\ln f_{X^k|X_k^{N(k)-k}}(X^k|X_k^{N(k)-k})\right| > \epsilon \ \text{ i.o.}\right) = 0$$
and hence
$$\lim_{n\to\infty}\frac{1}{n}\ln f_{X^n}(X^n) = \lim_{n\to\infty}\frac{1}{n}\ln f_{X_{k(n)}^{n-k(n)}}(X_{k(n)}^{n-k(n)}), \quad p\text{-a.e.}$$
In a similar manner we can also choose the sequence so that
$$\lim_{n\to\infty}\frac{1}{n}\ln \bar f_{X^n}(X^n) = \lim_{n\to\infty}\frac{1}{n}\ln \bar f_{X_{k(n)}^{n-k(n)}}(X_{k(n)}^{n-k(n)}), \quad \bar p\text{-a.e.}$$
From Markov's inequality
$$\bar p\left(\frac{1}{n}\ln f_{X_k^{n-k}}(X_k^{n-k}) \ge \frac{1}{n}\ln \bar f_{X_k^{n-k}}(X_k^{n-k}) + \epsilon\right) = \bar p\left(f_{X_k^{n-k}}(X_k^{n-k}) \ge e^{n\epsilon}\,\bar f_{X_k^{n-k}}(X_k^{n-k})\right)$$
$$\le e^{-n\epsilon}\int d\bar p\,\frac{f_{X_k^{n-k}}(X_k^{n-k})}{\bar f_{X_k^{n-k}}(X_k^{n-k})} = e^{-n\epsilon}\int dm\, f_{X_k^{n-k}}(X_k^{n-k}) = e^{-n\epsilon}.$$
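The last two equalities deserve a word: on the $\sigma$-field generated by $X_k^{n-k}$ we have $d\bar p = \bar f_{X_k^{n-k}}\,dm$, so, assuming $\bar f_{X_k^{n-k}} > 0$ a.e. on the relevant event,
$$\int d\bar p\,\frac{f_{X_k^{n-k}}}{\bar f_{X_k^{n-k}}} = \int dm\,\bar f_{X_k^{n-k}}\,\frac{f_{X_k^{n-k}}}{\bar f_{X_k^{n-k}}} = \int dm\, f_{X_k^{n-k}} = 1.$$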
Hence again invoking the Borel-Cantelli lemma we have that
$$\bar p\left(\frac{1}{n}\ln f_{X_k^{n-k}}(X_k^{n-k}) \ge \frac{1}{n}\ln \bar f_{X_k^{n-k}}(X_k^{n-k}) + \epsilon \ \text{ i.o.}\right) = 0$$
and therefore, applying the same bound along the subsequence $k(n)$ (the bound $e^{-n\epsilon}$ does not depend on $k$),
$$\limsup_{n\to\infty}\frac{1}{n}\ln f_{X_{k(n)}^{n-k(n)}}(X_{k(n)}^{n-k(n)}) \le h, \quad \bar p\text{-a.e.} \tag{8.19}$$
The above event is in the tail $\sigma$-field $\bigcap_n \sigma(X_n, X_{n+1}, \cdots)$ since $h$ is invariant, and $\bar p$ dominates $p$ on the tail $\sigma$-field. Thus
$$\limsup_{n\to\infty}\frac{1}{n}\ln f_{X_{k(n)}^{n-k(n)}}(X_{k(n)}^{n-k(n)}) \le h, \quad p\text{-a.e.}$$
and hence
$$\limsup_{n\to\infty}\frac{1}{n}\ln f_{X^n}(X^n) \le h, \quad p\text{-a.e.},$$
which proves half of the theorem.
Since $\bar p$ asymptotically dominates $p$, given $\epsilon > 0$ there is a $k$ such that
$$p\left(\lim_{n\to\infty} \frac{1}{n}\ln \bar f_{X_k^{n-k}}(X_k^{n-k}) = h\right) \ge 1 - \epsilon.$$
Again applying Markov’s inequality and the Borel-Cantelli lemma as previously we have that
$$\liminf_{n\to\infty}\frac{1}{n}\ln\frac{f_{X_{k(n)}^{n-k(n)}}(X_{k(n)}^{n-k(n)})}{\bar f_{X_{k(n)}^{n-k(n)}}(X_{k(n)}^{n-k(n)})} \ge 0, \quad p\text{-a.e.}$$
which implies that
$$p\left(\liminf_{n\to\infty}\frac{1}{n}\ln f_{X_{k(n)}^{n-k(n)}}(X_{k(n)}^{n-k(n)}) \ge h\right) \ge 1 - \epsilon$$
and hence also that
$$p\left(\liminf_{n\to\infty}\frac{1}{n}\ln f_{X^n}(X^n) \ge h\right) \ge 1 - \epsilon.$$
Since $\epsilon$ can be made arbitrarily small, this proves that $p$-a.e. $\liminf_{n\to\infty} n^{-1}\ln f_{X^n}(X^n) \ge h$, which completes the proof of the theorem. $\Box$
We can now extend the ergodic theorem for relative entropy densities to the general AMS case.
Corollary 8.4.1: Given the assumptions of Theorem 8.4.1,
$$\lim_{n\to\infty}\frac{1}{n}\ln f_{X^n}(X^n) = \bar H_{\bar p_\psi\|m}(X), \quad p\text{-a.e.},$$
where $\bar p_\psi$ is the ergodic component of the stationary mean $\bar p$ of $p$.
Proof: The proof follows immediately from Theorem 8.4.1 and Corollary 8.3.1, the ergodic theorem for the relative entropy density for the stationary mean. $\Box$
8.5 Ergodic Theorems for Information Densities
As an application of the general theorem we prove an ergodic theorem for mutual information densities for stationary and ergodic sources. The result can be extended to AMS sources in the same manner that the results of Section 8.3 were extended to those of Section 8.4. As the stationary and ergodic result suffices for the coding theorems and the AMS conditions are messy, only the stationary case is considered here. The result is due to Barron [9].
Theorem 8.5.1: Let $\{X_n, Y_n\}$ be a stationary ergodic pair random process with standard alphabet. Let $P_{X^nY^n}$, $P_{X^n}$, and $P_{Y^n}$ denote the induced distributions and assume that for all $n$, $P_{X^n}\times P_{Y^n} \gg P_{X^nY^n}$ and hence the information densities
$$i_n(X^n; Y^n) = \ln\frac{dP_{X^nY^n}}{d(P_{X^n}\times P_{Y^n})}$$
are well defined. Assume in addition that both the $\{X_n\}$ and $\{Y_n\}$ processes have the finite-gap information property of (8.9) and hence by the comment following Corollary 7.3.1 there is a $K$ such that both processes satisfy the $K$-gap property
$$I(X_K; X^-|X^K) < \infty, \qquad I(Y_K; Y^-|Y^K) < \infty.$$
Then
$$\lim_{n\to\infty}\frac{1}{n}\, i_n(X^n; Y^n) = \bar I(X; Y), \quad p\text{-a.e.}$$
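For orientation, consider the simplest special case of an i.i.d. pair process: the information density is then a sum of i.i.d. terms and the theorem reduces to the strong law of large numbers,
$$\frac{1}{n}\, i_n(X^n;Y^n) = \frac{1}{n}\sum_{l=0}^{n-1}\ln\frac{dP_{X_0Y_0}}{d(P_{X_0}\times P_{Y_0})}(X_l, Y_l) \mathop{\longrightarrow}_{n\to\infty} I(X_0; Y_0) = \bar I(X;Y), \quad p\text{-a.e.}$$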
Proof: Let $Z_n = (X_n, Y_n)$. Let $M_{X^n} = P^{(K)}_{X^n}$ and $M_{Y^n} = P^{(K)}_{Y^n}$ denote the $K$th order Markov approximations of $\{X_n\}$ and $\{Y_n\}$, respectively. The finite-gap property implies as in Section 8.3 that the densities
$$f_{X^n} = \frac{dP_{X^n}}{dM_{X^n}} \qquad\text{and}\qquad f_{Y^n} = \frac{dP_{Y^n}}{dM_{Y^n}}$$
are well defined. From Theorem 8.2.1
$$\lim_{n\to\infty}\frac{1}{n}\ln f_{X^n}(X^n) = \bar H_{p_X\|p_X^{(K)}}(X_0|X^-) = I(X_K; X^-|X^K) < \infty,$$
$$\lim_{n\to\infty}\frac{1}{n}\ln f_{Y^n}(Y^n) = I(Y_K; Y^-|Y^K) < \infty.$$
Define the measures $M_{Z^n} = M_{X^n}\times M_{Y^n}$. Then this is a $K$-step Markov source and since
$$M_{X^n}\times M_{Y^n} \gg P_{X^n}\times P_{Y^n} \gg P_{X^n,Y^n} = P_{Z^n},$$
the density
$$f_{Z^n} = \frac{dP_{Z^n}}{dM_{Z^n}}$$
is well defined and from Theorem 8.2.1 has a limit
$$\lim_{n\to\infty}\frac{1}{n}\ln f_{Z^n}(Z^n) = \bar H_{p\|m}(Z_0|Z^-).$$
If the information density $i_n(X^n; Y^n)$ is infinite for any $n$, then it is infinite for all larger $n$ and convergence is trivially to the infinite information rate. If it is finite, the chain rule for densities yields
$$\frac{1}{n}\, i_n(X^n; Y^n) = \frac{1}{n}\ln f_{Z^n}(Z^n) - \frac{1}{n}\ln f_{X^n}(X^n) - \frac{1}{n}\ln f_{Y^n}(Y^n)$$
$$\mathop{\longrightarrow}_{n\to\infty}\ \bar H_{p\|p^{(K)}}(Z_0|Z^-) - \bar H_{p\|p^{(K)}}(X_0|X^-) - \bar H_{p\|p^{(K)}}(Y_0|Y^-)$$
$$= \bar H_{p\|p^{(K)}}(X,Y) - \bar H_{p\|p^{(K)}}(X) - \bar H_{p\|p^{(K)}}(Y).$$
The limit is not indeterminate (of the form $\infty - \infty$) because the two subtracted terms are finite. Since convergence is to a constant, the constant must also be the limit of the expected values of $n^{-1} i_n(X^n; Y^n)$, that is, $\bar I(X; Y)$. $\Box$
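In equation form, the identification of the limiting constant uses only the definition of the mutual information rate:
$$E\left[\frac{1}{n}\, i_n(X^n; Y^n)\right] = \frac{1}{n}\, I(X^n; Y^n) \mathop{\longrightarrow}_{n\to\infty} \bar I(X; Y).$$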