
8.2 Stationary Ergodic Sources

 

 

 

 

 

 

 

 

 

 


where
$$G_n = \bigl\{\,x : f^n_{X_k|X^{k-}}(x) > 0\,\bigr\},$$

since G_n has probability 1 under p (or else (8.6) would be violated). Thus

 

$$\int dp\,\zeta_n \;=\; \int dP\, \frac{f_{X^n_k|X^k}(X^{n+k})}{f^n_{X_k|X^{k-}}}\, 1_{\{f^n_{X_k|X^{k-}}>0\}}$$
$$\;=\; \int dS\, f^n_{X_k|X^{k-}}\, 1_{\{f^n_{X_k|X^{k-}}>0\}}\, \frac{f_{X^n_k|X^k}(X^{n+k})}{f^n_{X_k|X^{k-}}}$$
$$\;\le\; \int dS\, f_{X^n_k|X^k}(X^{n+k}).$$

Using the definition of the measure S and iterated expectation we have that
$$\int dp\,\zeta_n \;\le\; \int\!\!\int dP_{X^{k-}}\, dM_{X^n_k|X^{k-}}\, f_{X^n_k|X^k}(X^{n+k}).$$
Since the integrand is now measurable with respect to σ(X^{n+k}), this reduces to

 

 

 

$$\int dp\,\zeta_n \;\le\; \int\!\!\int dP_{X^k}\, dM_{X^n_k|X^k}\, f_{X^n_k|X^k}(X^{n+k}).$$

Applying Lemma 5.3.2 we have

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

$$\int dp\,\zeta_n \;\le\; \int dP_{X^k} \int dM_{X^n_k|X^k}\, \frac{dP_{X^n_k|X^k}}{dM_{X^n_k|X^k}}.$$
Thus
$$\int dp\,\zeta_n \;\le\; \int dP_{X^k} \int dP_{X^n_k|X^k} \;=\; 1,$$

and we can apply Lemma 5.4.1 to conclude that p-a.e.

 

$$\limsup_{n\to\infty} \frac{1}{n}\ln \zeta_n \;=\; \limsup_{n\to\infty} \frac{1}{n}\ln \frac{f_{X^n_k|X^k}}{f^n_{X_k|X^{k-}}} \;\le\; 0. \tag{8.7}$$

Using the chain rule for densities,

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

$$\frac{f_{X^n_k|X^k}}{f^n_{X_k|X^{k-}}} \;=\; \frac{f_{X^{n+k}}}{f_{X^k}} \times \frac{1}{\prod_{l=k}^{n+k-1} f_{X_l|X^{l-}}}.$$

 


Thus from (8.7)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

$$\limsup_{n\to\infty}\Bigl(\frac{1}{n}\ln f_{X^{n+k}}(X^{n+k}) \;-\; \frac{1}{n}\ln f_{X^k}(X^k) \;-\; \frac{1}{n}\sum_{l=k}^{n+k-1}\ln f_{X_l|X^{l-}}(X_l|X^{l-})\Bigr) \;\le\; 0.$$

Invoking the ergodic theorem for the rightmost terms and the fact that the middle term converges to 0 almost everywhere since ln f_{X^k} is finite almost everywhere implies that

$$\limsup_{n\to\infty} \frac{1}{n}\ln f_{X^n}(X^n) \;\le\; E_p\bigl(\ln f_{X_k|X^{k-}}(X_k|X^{k-})\bigr) \;=\; E_p\bigl(\ln f_{X_0|X^-}(X_0|X^-)\bigr) \;=\; H_{p\|m}(X). \tag{8.8}$$

 

Combining this with (8.3) completes the sandwich and proves the theorem.

□
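Theorem 8.2.1 can be illustrated numerically in the simplest possible setting. The sketch below is not from the text; the distributions and sample size are illustrative assumptions. It takes p and m to be i.i.d. binary measures, so that m is a 0-step Markov reference, f_{X^n}(x^n) = ∏_i p(x_i)/m(x_i), and H_{p||m}(X) reduces to the single-letter relative entropy Σ_a p(a) ln(p(a)/m(a)); the running sample relative entropy density (1/n) ln f_{X^n}(X^n) is then compared with this limit.

```python
import numpy as np

# Illustrative i.i.d. (0-step Markov) binary measures; all values are assumptions.
p = np.array([0.8, 0.2])   # source distribution p
m = np.array([0.5, 0.5])   # dominating reference measure m (M_{X^n} >> P_{X^n})

# Relative entropy rate H_{p||m}(X) = sum_a p(a) ln(p(a)/m(a)) for i.i.d. p and m.
H_rate = float(np.sum(p * np.log(p / m)))

rng = np.random.default_rng(0)
n = 200_000
x = rng.choice(2, size=n, p=p)          # one sample path X^n drawn from p

# Sample relative entropy density (1/n) ln f_{X^n}(X^n) = (1/n) sum_i ln p(X_i)/m(X_i).
log_ratio = np.log(p[x]) - np.log(m[x])
running = np.cumsum(log_ratio) / np.arange(1, n + 1)

for k in [100, 1_000, 10_000, n]:
    print(f"n={k:>7}: (1/n) ln f = {running[k-1]: .5f}")
print(f"H_p||m(X)      = {H_rate: .5f}")
```

The same experiment works for an ergodic Markov p and a Markov m by summing log ratios of transition probabilities; the almost-everywhere limit is again the relative entropy rate H_{p||m}(X).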

8.3 Stationary Nonergodic Sources

Next suppose that the source p is stationary with ergodic decomposition {p_λ; λ ∈ Λ} and ergodic component function ψ as in Theorem 1.8.3. We first require some technical details to ensure that the various Radon-Nikodym derivatives are well defined and that the needed chain rules for densities hold.

Lemma 8.3.1: Given a stationary source {X_n}, let {p_λ; λ ∈ Λ} denote the ergodic decomposition and ψ the ergodic component function of Theorem 1.8.3. Let P_ψ denote the induced distribution of ψ. Let P_{X^n} and P^λ_{X^n} denote the induced marginal distributions of p and p_λ. Assume that {X_n} has the finite-gap information property of (6.13); that is, there exists a K such that

$$I_p(X_K; X^- \mid X^K) < \infty, \tag{8.9}$$
where X^- = (X_{-1}, X_{-2}, ...). We also assume that for some n

 

$$I(X^n; \psi) < \infty. \tag{8.10}$$

This will be the case, for example, if (8.9) holds for K = 0. Let m be a K-step Markov process such that M_{X^n} >> P_{X^n} for all n. (Observe that such a process exists since from (8.9) the Kth order Markov approximation p^{(K)} suffices.) Define M_{X^nψ} = M_{X^n} × P_ψ. Then
$$M_{X^n\psi} \gg P_{X^n} \times P_\psi \gg P_{X^n\psi}, \tag{8.11}$$
and with probability 1 under p
$$M_{X^n} \gg P_{X^n} \gg P^\psi_{X^n}.$$
Lastly,
$$\frac{dP^\psi_{X^n}}{dM_{X^n}} \;=\; f^\psi_{X^n} \;=\; \frac{dP_{X^n\psi}}{d(M_{X^n} \times P_\psi)}, \tag{8.12}$$


and therefore

 

 

 

 

 

 

 

 

$$\frac{dP^\psi_{X^n}}{dP_{X^n}} \;=\; \frac{dP^\psi_{X^n}/dM_{X^n}}{dP_{X^n}/dM_{X^n}} \;=\; \frac{f^\psi_{X^n}}{f_{X^n}}. \tag{8.13}$$

Proof: From Theorem 6.4.4 the given assumptions ensure that

 

$$\lim_{n\to\infty} \frac{1}{n} E_p\, i(X^n; \psi) \;=\; \lim_{n\to\infty} \frac{1}{n} I(X^n; \psi) \;=\; 0 \tag{8.14}$$

and hence P_{X^n} × P_ψ >> P_{X^nψ} (since otherwise I(X^n; ψ) would be infinite for some n and hence infinite for all larger n since it is increasing with n). This proves the right-most absolute continuity relation of (8.11). This in turn implies that M_{X^n} × P_ψ >> P_{X^nψ}. The lemma then follows from Theorem 5.3.1 with X = X^n, Y = ψ and the chain rule for Radon-Nikodym derivatives. □

We know that the source will produce with probability one an ergodic component p_λ and hence Theorem 8.2.1 will hold for this ergodic component. In other words, we have for all λ that

 

$$\lim_{n\to\infty} \frac{1}{n}\ln f_{X^n}(X^n \mid \lambda) \;=\; H_{p_\lambda\|m}(X), \quad p_\lambda\text{-a.e.}$$

This implies that

 

 

 

 

 

 

 

 

 

 

 

$$\lim_{n\to\infty} \frac{1}{n}\ln f_{X^n}(X^n \mid \psi) \;=\; H_{p_\psi\|m}(X), \quad p\text{-a.e.} \tag{8.15}$$

Making this step precise generalizes Lemma 3.3.1.

Lemma 8.3.2: Suppose that {X_n} is a stationary not necessarily ergodic source with ergodic component function ψ. Then (8.15) holds.

Proof: The proof parallels that for Lemma 3.3.1. Observe that if we have two random variables U, V (U = X_0, X_1, ... and V = ψ above) and a sequence of functions g_n(U, V) (here n^{-1} ln f_{X^n}(X^n|ψ)) and a function g(V) (here H_{p_ψ||m}(X)) with the property

$$\lim_{n\to\infty} g_n(U, v) = g(v), \quad P_{U|V=v}\text{-a.e.},$$

then also

$$\lim_{n\to\infty} g_n(U, V) = g(V), \quad P_{UV}\text{-a.e.}$$

since defining the (measurable) set G = {u, v : lim_{n→∞} g_n(u, v) = g(v)} and its section G_v = {u : (u, v) ∈ G}, then from (1.26)

$$P_{UV}(G) \;=\; \int P_{U|V}(G_v \mid v)\, dP_V(v) \;=\; 1$$

if P_{U|V}(G_v|v) = 1 with probability 1. □

It is not, however, the relative entropy density using the distribution of the ergodic component that we wish to show converges. It is the original sample density f_{X^n}. The following lemma shows that the two sample entropies converge to the same thing. The lemma generalizes Lemma 3.3.1 and is proved by a


sandwich argument analogous to Theorem 8.2.1. The result can be viewed as an almost everywhere version of (8.14).

Theorem 8.3.1: Given a stationary source {X_n}, let {p_λ; λ ∈ Λ} denote the ergodic decomposition and ψ the ergodic component function of Theorem 1.8.3. Assume that the finite-gap information property (8.9) is satisfied and that (8.10) holds for some n. Then

$$\lim_{n\to\infty} \frac{1}{n}\, i(X^n; \psi) \;=\; \lim_{n\to\infty} \frac{1}{n} \ln \frac{f^\psi_{X^n}(X^n)}{f_{X^n}(X^n)} \;=\; 0, \quad p\text{-a.e.}$$

Proof: From Theorem 5.4.1 we have immediately that

$$\liminf_{n\to\infty} \frac{1}{n}\, i_n(X^n; \psi) \;\ge\; 0, \tag{8.16}$$

which provides half of the sandwich proof.

To develop the other half of the sandwich, for each k ≥ K let p^{(k)} denote the k-step Markov approximation of p. Exactly as in the proof of Theorem 8.2.1, it follows that (8.1) holds. Now, however, the Markov approximation relative entropy density converges instead as

 

$$\lim_{n\to\infty} \frac{1}{n} \ln f^{(k)}_{X^n}(X^n) \;=\; \lim_{n\to\infty} \frac{1}{n} \sum_{l=k}^{n-1} \ln f_{X_k|X^k}(X_k|X^k)\,T^{l-k} \;=\; E_{p_\psi}\bigl[\ln f_{X_k|X^k}(X_k|X^k)\bigr].$$

Combining this with (8.15) we have that

 

 

 

 

 

$$\limsup_{n\to\infty} \frac{1}{n} \ln \frac{f^\psi_{X^n}(X^n)}{f_{X^n}(X^n)} \;\le\; H_{p_\psi\|m}(X) \;-\; E_{p_\psi}\bigl[\ln f_{X_k|X^k}(X_k|X^k)\bigr].$$

From Lemma 7.4.1, the right-hand side is just I_{p_ψ}(X_k; X^-|X^k), which from Corollary 7.4.2 is just H_{p_ψ||p^{(k)}}(X). Since the bound holds for all k, we have that

 

$$\limsup_{n\to\infty} \frac{1}{n} \ln \frac{f^\psi_{X^n}(X^n)}{f_{X^n}(X^n)} \;\le\; \inf_k H_{p_\psi\|p^{(k)}}(X) \;\equiv\; \zeta.$$

Using the ergodic decomposition of relative entropy rate (Corollary 7.5.1) and the fact that Markov approximations are asymptotically accurate (Corollary 7.4.3), we have further that

$$\int dP_\psi\, \zeta \;=\; \int dP_\psi\, \inf_k H_{p_\psi\|p^{(k)}}(X) \;\le\; \inf_k \int dP_\psi\, H_{p_\psi\|p^{(k)}}(X) \;=\; \inf_k H_{p\|p^{(k)}}(X) \;=\; 0,$$

and hence ζ = 0 with P_ψ probability 1. Thus

 

 

 

 

 

 

 

$$\limsup_{n\to\infty} \frac{1}{n} \ln \frac{f^\psi_{X^n}(X^n)}{f_{X^n}(X^n)} \;\le\; 0, \tag{8.17}$$


which with (8.16) completes the sandwich proof. □

Simply restating the theorem and using (8.15) yields the ergodic theorem for relative entropy densities in the general stationary case.

Corollary 8.3.1: Given the assumptions of Theorem 8.3.1,

$$\lim_{n\to\infty} \frac{1}{n} \ln f_{X^n}(X^n) \;=\; H_{p_\psi\|m}(X), \quad p\text{-a.e.}$$

The corollary states that the sample relative entropy density of a process satisfying (8.9) converges to the conditional relative entropy rate with respect to the underlying ergodic component. This is a slight extension and elaboration

of Barron’s result [9], which made the stronger assumption that H_{p||m}(X_0|X^-) = H_{p||m}(X) < ∞. From Corollary 7.4.3 this condition is sufficient but not necessary for the finite-gap information property of (8.9). In particular, the finite-gap information property implies that
$$H_{p\|p^{(k)}}(X) \;=\; I_p(X_k; X^- \mid X^k) \;<\; \infty,$$
but it need not be true that H_{p||m}(X) < ∞. In addition, Barron [9] and Algoet and Cover [7] do not characterize the limiting density as the entropy rate of the ergodic component; instead they effectively show that the limit is E_{p_ψ}(ln f_{X_0|X^-}(X_0|X^-)). This, however, is equivalent since it follows from the ergodic decomposition (see specifically Lemma 8.6.2 [50]) that f_{X_0|X^-} = f^ψ_{X_0|X^-} with probability one since the ergodic component ψ can be determined from the infinite past X^-.
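A small simulation can make the nonergodic statement of Corollary 8.3.1 concrete. In the sketch below (an illustration, not part of the text; the two components, the mixing weights, and m are assumptions) the stationary source p is an equal-weight mixture of two i.i.d. binary components p_0 and p_1, m is the i.i.d. fair-coin measure, and ψ selects which component generates the observed path. The sample relative entropy density (1/n) ln f_{X^n}(X^n), computed from the mixture density f_{X^n} = dP_{X^n}/dM_{X^n}, converges to H_{p_ψ||m}(X) for the component actually in effect rather than to the average of the two component rates.

```python
import numpy as np

# Two illustrative i.i.d. binary components p_0, p_1 and their equal-weight
# mixture p (an assumed toy ergodic decomposition); m is the fair-coin measure.
comps = [np.array([0.9, 0.1]), np.array([0.3, 0.7])]
m = np.array([0.5, 0.5])

rng = np.random.default_rng(1)
lam = rng.integers(2)                    # ergodic component psi, chosen once per path
n = 200_000
x = rng.choice(2, size=n, p=comps[lam])  # the whole path then comes from p_lambda

# log f_{X^n} = log( (1/2) prod p_0(x_i) + (1/2) prod p_1(x_i) ) - log prod m(x_i)
logp = [np.cumsum(np.log(c[x])) for c in comps]          # per-component log-likelihoods
logmix = np.logaddexp(logp[0], logp[1]) + np.log(0.5)    # mixture log-likelihood
logm = np.cumsum(np.log(m[x]))
density = (logmix - logm) / np.arange(1, n + 1)          # (1/n) ln f_{X^n}(X^n)

rates = [float(np.sum(c * np.log(c / m))) for c in comps]
print("component in effect:", int(lam))
print("(1/n) ln f at n    :", round(float(density[-1]), 5))
print("H_{p_psi||m}(X)    :", round(rates[lam], 5))
print("mixture average    :", round(0.5 * (rates[0] + rates[1]), 5))
```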

8.4 AMS Sources

The following lemma is a generalization of Lemma 3.4.1. The result is due to Barron [9], who proved it using martingale inequalities and convergence results.

Lemma 8.4.1: Let {X_n} be an AMS source with the property that for every integer k there exists an integer l = l(k) such that

 

 

 

$$I_p\bigl(X^k; (X_{k+l}, X_{k+l+1}, \ldots) \mid X_k^l\bigr) \;<\; \infty. \tag{8.18}$$

Then

$$\lim_{n\to\infty} \frac{1}{n}\, i\bigl(X^k; (X_{k+l}, \ldots, X_{n-1}) \mid X_k^l\bigr) \;=\; 0, \quad p\text{-a.e.}$$

Proof: By assumption

 

 

 

 

 

$$I_p\bigl(X^k; (X_{k+l}, X_{k+l+1}, \ldots) \mid X_k^l\bigr) \;=\; E_p \ln \frac{f_{X^k|X_k^l, X_{k+l}, X_{k+l+1}, \ldots}(X^k \mid X_k^l, X_{k+l}, X_{k+l+1}, \ldots)}{f_{X^k|X_k^l}(X^k \mid X_k^l)} \;<\; \infty.$$

This implies that

$$P_{X^k \times (X_{k+l}, \ldots)|X_k^l} \;\gg\; P_{X_0, X_1, \ldots}$$


with

 

 

 

 

$$\frac{dP_{X_0, X_1, \ldots}}{dP_{X^k \times (X_{k+l}, \ldots)|X_k^l}} \;=\; \frac{f_{X^k|X_k^l, X_{k+l}, X_{k+l+1}, \ldots}(X^k \mid X_k^l, X_{k+l}, X_{k+l+1}, \ldots)}{f_{X^k|X_k^l}(X^k \mid X_k^l)}.$$

Restricting the measures to X^n for n > k + l yields

 

 

$$\frac{dP_{X^n}}{dP_{X^k \times (X_{k+l}, \ldots, X_{n-1})|X_k^l}} \;=\; \frac{f_{X^k|X_k^l, X_{k+l}, \ldots, X_{n-1}}(X^k \mid X_k^l, X_{k+l}, \ldots, X_{n-1})}{f_{X^k|X_k^l}(X^k \mid X_k^l)},$$
the logarithm of which is i(X^k; (X_{k+l}, ..., X_{n-1}) | X_k^l).

With this setup the lemma follows immediately from Theorem 5.4.1. □
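The gap condition (8.18) is easy to check for concrete sources. The following exact-enumeration sketch (an assumed toy example, not from the text) takes a first-order binary Markov source and verifies that, with k = 1 and a gap of l = 1, the conditional mutual information I(X_0; (X_2, ..., X_{m-1}) | X_1) is zero: given the intervening sample, the initial sample and the far future are conditionally independent, so a Markov source satisfies (8.18) trivially.

```python
import itertools
import numpy as np

# Assumed toy first-order Markov source; transition matrix and word length are illustrative.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
pi = np.array([0.8, 0.2])            # stationary distribution of P
assert np.allclose(pi @ P, pi)

m = 6                                # total word length X_0, ..., X_{m-1}
joint = {}
for x in itertools.product(range(2), repeat=m):
    prob = pi[x[0]]
    for a, b in zip(x, x[1:]):
        prob *= P[a, b]
    joint[x] = prob

def marg(keep):
    """Marginal pmf over the coordinates listed in `keep`."""
    out = {}
    for x, pr in joint.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, 0.0) + pr
    return out

# I(X_0; X_2..X_{m-1} | X_1) = sum p(x) ln[ p(x) p(x_1) / (p(x_0, x_1) p(x_1, future)) ]
p1 = marg([1]); p01 = marg([0, 1]); p1f = marg(range(1, m))
info = 0.0
for x, pr in joint.items():
    num = pr * p1[(x[1],)]
    den = p01[(x[0], x[1])] * p1f[tuple(x[1:])]
    info += pr * np.log(num / den)
print("gap information:", info)      # ~0 up to floating-point error
```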

The following lemma generalizes Lemma 3.4.2 and will yield the general theorem. The lemma was first proved by Barron [9] using martingale inequalities.

Theorem 8.4.1: Suppose that p and m are distributions of a standard alphabet process {X_n} such that p is AMS and m is k-step Markov. Let p̄ be a stationary measure that asymptotically dominates p (e.g., the stationary mean).

Suppose that P_{X^n}, P̄_{X^n}, and M_{X^n} are the distributions induced by p, p̄, and m, that M_{X^n} dominates both P_{X^n} and P̄_{X^n} for all n, and that f_{X^n} and f̄_{X^n} are the corresponding densities. If there is an invariant function h such that
$$\lim_{n\to\infty} \frac{1}{n} \ln \bar f_{X^n}(X^n) \;=\; h, \quad \bar p\text{-a.e.},$$
then also
$$\lim_{n\to\infty} \frac{1}{n} \ln f_{X^n}(X^n) \;=\; h, \quad p\text{-a.e.}$$

Proof: For any k and n ≥ k we can write, using the chain rule for densities,

$$\frac{1}{n} \ln f_{X^n} \;-\; \frac{1}{n} \ln f_{X_k^{n-k}} \;=\; \frac{1}{n} \ln f_{X^k|X_k^{n-k}}.$$

Since for k ≤ l < n

$$\frac{1}{n} \ln f_{X^k|X_k^{n-k}} \;=\; \frac{1}{n} \ln f_{X^k|X_k^l} \;+\; \frac{1}{n}\, i\bigl(X^k; (X_{k+l}, \ldots, X_{n-1}) \mid X_k^l\bigr),$$

Lemma 8.4.1 and the fact that densities are finite with probability one imply that

 

 

 

 

1

 

 

 

 

 

 

 

 

 

 

 

 

 

$$\lim_{n\to\infty} \frac{1}{n} \ln f_{X^k|X_k^{n-k}}\bigl(X^k \mid X_k^{n-k}\bigr) \;=\; 0, \quad p\text{-a.e.}$$

This implies that there is a subsequence k(n) → ∞ such that

 

$$\frac{1}{n} \ln f_{X^n}(X^n) \;-\; \frac{1}{n} \ln f_{X_{k(n)}^{n-k(n)}}\bigl(X_{k(n)}^{n-k(n)}\bigr) \;\to\; 0, \quad p\text{-a.e.}$$

To prove this, for each k choose N(k) large enough so that

$$p\Bigl(\bigl|\tfrac{1}{N(k)} \ln f_{X^k|X_k^{N(k)-k}}\bigl(X^k \mid X_k^{N(k)-k}\bigr)\bigr| > 2^{-k}\Bigr) \;\le\; 2^{-k}$$


and then let k(n) = k for N(k) ≤ n < N(k+1). Then from the Borel-Cantelli lemma we have for any ε > 0 that

$$p\Bigl(\bigl|\tfrac{1}{N(k)} \ln f_{X^k|X_k^{N(k)-k}}\bigl(X^k \mid X_k^{N(k)-k}\bigr)\bigr| > \epsilon \ \text{ i.o.}\Bigr) \;=\; 0$$

and hence

 

 

 

 

 

 

 

 

 

 

 

 

$$\lim_{n\to\infty} \frac{1}{n} \ln f_{X^n}(X^n) \;=\; \lim_{n\to\infty} \frac{1}{n} \ln f_{X_{k(n)}^{n-k(n)}}\bigl(X_{k(n)}^{n-k(n)}\bigr), \quad p\text{-a.e.}$$

In a similar manner we can also choose the sequence so that

$$\lim_{n\to\infty} \frac{1}{n} \ln \bar f_{X^n}(X^n) \;=\; \lim_{n\to\infty} \frac{1}{n} \ln \bar f_{X_{k(n)}^{n-k(n)}}\bigl(X_{k(n)}^{n-k(n)}\bigr), \quad \bar p\text{-a.e.}$$
From Markov's inequality
$$\bar p\Bigl(\frac{1}{n} \ln f_{X_k^{n-k}}\bigl(X_k^{n-k}\bigr) \;\ge\; \frac{1}{n} \ln \bar f_{X_k^{n-k}}\bigl(X_k^{n-k}\bigr) + \epsilon\Bigr) \;=\; \bar p\Bigl(f_{X_k^{n-k}}\bigl(X_k^{n-k}\bigr) \;\ge\; e^{n\epsilon}\, \bar f_{X_k^{n-k}}\bigl(X_k^{n-k}\bigr)\Bigr)$$
$$\le\; e^{-n\epsilon} \int d\bar p\; \frac{f_{X_k^{n-k}}\bigl(X_k^{n-k}\bigr)}{\bar f_{X_k^{n-k}}\bigl(X_k^{n-k}\bigr)} \;=\; e^{-n\epsilon} \int dm\; f_{X_k^{n-k}}\bigl(X_k^{n-k}\bigr) \;=\; e^{-n\epsilon}.$$

Hence again invoking the Borel-Cantelli lemma we have that

 

$$\bar p\Bigl(\frac{1}{n} \ln f_{X_k^{n-k}}\bigl(X_k^{n-k}\bigr) \;\ge\; \frac{1}{n} \ln \bar f_{X_k^{n-k}}\bigl(X_k^{n-k}\bigr) + \epsilon \ \text{ i.o.}\Bigr) \;=\; 0$$

and therefore

 

 

 

$$\limsup_{n\to\infty} \frac{1}{n} \ln f_{X_k^{n-k}}\bigl(X_k^{n-k}\bigr) \;\le\; h, \quad \bar p\text{-a.e.} \tag{8.19}$$

The above event is in the tail σ-field ⋂_n σ(X_n, X_{n+1}, ...) since h is invariant, and p̄ dominates p on the tail σ-field. Thus

 

 

 

 

 

 

 

 

 

 

$$\limsup_{n\to\infty} \frac{1}{n} \ln f_{X_{k(n)}^{n-k(n)}}\bigl(X_{k(n)}^{n-k(n)}\bigr) \;\le\; h, \quad p\text{-a.e.}$$

and hence

$$\limsup_{n\to\infty} \frac{1}{n} \ln f_{X^n}(X^n) \;\le\; h, \quad p\text{-a.e.},$$

which proves half of the lemma.

Since p̄ asymptotically dominates p, given ε > 0 there is a k such that

$$p\Bigl(\lim_{n\to\infty} \frac{1}{n} \ln \bar f_{X_k^{n-k}}\bigl(X_k^{n-k}\bigr) = h\Bigr) \;\ge\; 1 - \epsilon.$$


Again applying Markov’s inequality and the Borel-Cantelli lemma as previously we have that

 

 

 

$$\liminf_{n\to\infty} \frac{1}{n} \ln \frac{f_{X_{k(n)}^{n-k(n)}}\bigl(X_{k(n)}^{n-k(n)}\bigr)}{\bar f_{X_{k(n)}^{n-k(n)}}\bigl(X_{k(n)}^{n-k(n)}\bigr)} \;\ge\; 0, \quad p\text{-a.e.},$$

which implies that

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

$$p\Bigl(\liminf_{n\to\infty} \frac{1}{n} \ln f_{X_{k(n)}^{n-k(n)}}\bigl(X_{k(n)}^{n-k(n)}\bigr) \;\ge\; h\Bigr) \;\ge\; 1 - \epsilon$$

and hence also that

$$p\Bigl(\liminf_{n\to\infty} \frac{1}{n} \ln f_{X^n}(X^n) \;\ge\; h\Bigr) \;\ge\; 1 - \epsilon.$$

Since ε can be made arbitrarily small, this proves that p-a.e. liminf_{n→∞} n^{-1} ln f_{X^n}(X^n) ≥ h, which completes the proof of the lemma. □

We can now extend the ergodic theorem for relative entropy densities to the general AMS case.

Corollary 8.4.1: Given the assumptions of Theorem 8.4.1,

 

$$\lim_{n\to\infty} \frac{1}{n} \ln f_{X^n}(X^n) \;=\; H_{p_\psi\|m}(X),$$

where p_ψ is the ergodic component of the stationary mean p̄ of p.

Proof: The proof follows immediately from Theorem 8.4.1 and Corollary 8.3.1, the ergodic theorem for the relative entropy density for the stationary mean. □
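As an illustration of Corollary 8.4.1 (a hedged sketch with assumed parameters, not from the text), consider an ergodic two-state Markov chain forced to start in state 0. The resulting source is AMS but not stationary; its stationary mean is the same chain started from the stationary distribution π. With the i.i.d. fair-coin measure as the dominating reference m, the sample relative entropy density still converges to the relative entropy rate determined by the stationary mean, independent of the non-stationary start.

```python
import numpy as np

# Ergodic two-state Markov chain with a deterministic (non-stationary) start;
# transition matrix, reference measure, and run length are assumptions.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
pi = np.array([0.8, 0.2])                     # stationary distribution of P
assert np.allclose(pi @ P, pi)

# Relative entropy rate of the stationary mean w.r.t. the i.i.d. fair-coin m.
H_rate = float(np.sum(pi[:, None] * P * np.log(P / 0.5)))

rng = np.random.default_rng(2)
n = 100_000
u = rng.random(n)
x = np.empty(n, dtype=int)
x[0] = 0                                      # AMS source: forced start in state 0
for t in range(1, n):
    x[t] = int(u[t] < P[x[t - 1], 1])         # next state is 1 with prob P[x_{t-1}, 1]

# ln f_{X^n}(X^n) = ln p(X^n) - ln m(X^n); here p(X_0) = 1 and m(x^n) = 2^{-n}.
log_f = float(np.log(P[x[:-1], x[1:]]).sum()) + n * np.log(2.0)
print("(1/n) ln f at n          :", round(log_f / n, 5))
print("rate of stationary mean  :", round(H_rate, 5))
```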

8.5 Ergodic Theorems for Information Densities

As an application of the general theorem we prove an ergodic theorem for mutual information densities for stationary and ergodic sources. The result can be extended to AMS sources in the same manner that the results of Section 8.3 were extended to those of Section 8.4. As the stationary and ergodic result suffices for the coding theorems and the AMS conditions are messy, only the stationary case is considered here. The result is due to Barron [9].

Theorem 8.5.1: Let {X_n, Y_n} be a stationary ergodic pair random process with standard alphabet. Let P_{X^nY^n}, P_{X^n}, and P_{Y^n} denote the induced distributions and assume that for all n P_{X^n} × P_{Y^n} >> P_{X^nY^n} and hence the information densities

$$i_n(X^n; Y^n) \;=\; \ln \frac{dP_{X^nY^n}}{d(P_{X^n} \times P_{Y^n})}$$

are well defined. Assume in addition that both the {X_n} and {Y_n} processes have the finite-gap information property of (8.9) and hence by the comment


following Corollary 7.3.1 there is a K such that both processes satisfy the K-gap property

$$I(X_K; X^- \mid X^K) < \infty, \qquad I(Y_K; Y^- \mid Y^K) < \infty.$$

Then

$$\lim_{n\to\infty} \frac{1}{n}\, i_n(X^n; Y^n) \;=\; I(X; Y), \quad p\text{-a.e.}$$

Proof: Let Z_n = (X_n, Y_n). Let M_{X^n} = P^{(K)}_{X^n} and M_{Y^n} = P^{(K)}_{Y^n} denote the Kth order Markov approximations of {X_n} and {Y_n}, respectively. The finite-gap approximation implies as in Section 8.3 that the densities

 

 

 

 

 

 

 

 

$$f_{X^n} = \frac{dP_{X^n}}{dM_{X^n}} \qquad \text{and} \qquad f_{Y^n} = \frac{dP_{Y^n}}{dM_{Y^n}}$$

are well defined. From Theorem 8.2.1

 

 

 

 

 

 

 

 

 

 

 

$$\lim_{n\to\infty} \frac{1}{n} \ln f_{X^n}(X^n) \;=\; H_{p_X\|p_X^{(K)}}(X_0|X^-) \;=\; I(X_K; X^- \mid X^K) \;<\; \infty,$$
$$\lim_{n\to\infty} \frac{1}{n} \ln f_{Y^n}(Y^n) \;=\; I(Y_K; Y^- \mid Y^K) \;<\; \infty.$$

Define the measures M_{Z^n} by M_{X^n} × M_{Y^n}. Then this is a K-step Markov source and since
$$M_{X^n} \times M_{Y^n} \;\gg\; P_{X^n} \times P_{Y^n} \;\gg\; P_{X^n,Y^n} = P_{Z^n},$$

the density

$$f_{Z^n} = \frac{dP_{Z^n}}{dM_{Z^n}}$$

is well defined and from Theorem 8.2.1 has a limit

$$\lim_{n\to\infty} \frac{1}{n} \ln f_{Z^n}(Z^n) \;=\; H_{p\|m}(Z_0|Z^-).$$

If the density i_n(X^n; Y^n) is infinite for any n, then it is infinite for all larger n and convergence is trivially to the infinite information rate. If it is finite, the chain rule for densities yields

 

 

 

 

 

$$\frac{1}{n}\, i_n(X^n; Y^n) \;=\; \frac{1}{n} \ln f_{Z^n}(Z^n) \;-\; \frac{1}{n} \ln f_{X^n}(X^n) \;-\; \frac{1}{n} \ln f_{Y^n}(Y^n)$$
$$\mathop{\longrightarrow}_{n\to\infty}\; H_{p\|p^{(K)}}(Z_0|Z^-) \;-\; H_{p\|p^{(K)}}(X_0|X^-) \;-\; H_{p\|p^{(K)}}(Y_0|Y^-)$$

$$=\; H_{p\|p^{(K)}}(X, Y) \;-\; H_{p\|p^{(K)}}(X) \;-\; H_{p\|p^{(K)}}(Y).$$

The limit is not indeterminate (of the form ∞ − ∞) because the two subtracted terms are finite. Since convergence is to a constant, the constant must also be

the limit of the expected values of n^{-1} i_n(X^n; Y^n), that is, I(X; Y). □
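Theorem 8.5.1 can be checked numerically in the memoryless special case. In the sketch below (illustrative assumptions throughout: the input distribution, the crossover probability, and the run length) {X_n} is i.i.d. equiprobable binary and Y_n is X_n passed through a binary symmetric channel, so the information density is a sum of i.i.d. terms ln[p(X_i, Y_i)/(p(X_i)p(Y_i))] and (1/n) i_n(X^n; Y^n) should converge almost everywhere to I(X; Y) (in nats).

```python
import numpy as np

# Assumed memoryless pair process: X_i i.i.d. fair bits, Y_i = X_i through a BSC(eps).
eps = 0.1                                        # assumed crossover probability
pj = np.array([[0.5 * (1 - eps), 0.5 * eps],     # joint pmf p(x, y)
               [0.5 * eps, 0.5 * (1 - eps)]])
px, py = pj.sum(1), pj.sum(0)

I_xy = float(np.sum(pj * np.log(pj / np.outer(px, py))))   # I(X;Y) in nats

rng = np.random.default_rng(3)
n = 200_000
x = rng.integers(2, size=n)
y = x ^ (rng.random(n) < eps)                    # flip each bit with probability eps

# (1/n) i_n = (1/n) sum_i ln[ p(x_i, y_i) / (p(x_i) p(y_i)) ]
dens = np.log(pj[x, y] / (px[x] * py[y]))
running = np.cumsum(dens) / np.arange(1, n + 1)
for k in [100, 10_000, n]:
    print(f"n={k:>7}: (1/n) i_n = {running[k-1]: .5f}")
print(f"I(X;Y)        = {I_xy: .5f}")
```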
