
where
$$G_n = \{\, x : f_{X^n|X^-}(x^n|x^-) > 0 \,\},$$
since $G_n$ has probability 1 under $p$ (or else (8.6) would be violated). Thus
$$\int \zeta_n\,dp = \int dP_{X^{n+k}} \left( \frac{f_{X_k^n|X^k}}{f_{X_k^n|X^{k-}}}(X^{n+k})\, 1_{\{f_{X^n|X^-}>0\}} \right) = \int dS_{X^{n+k}}\, f_{X_k^n|X^k}(X^{n+k})\, 1_{\{f_{X^n|X^-}>0\}} \le \int dS_{X^{n+k}}\, f_{X_k^n|X^k}(X^{n+k}).$$
Using the definition of the measure $S$ and iterated expectation we have that
$$\int \zeta_n\,dp \le \iint dM_{X_k^n|X^{k-}}\, dP_{X^{k-}}\, f_{X_k^n|X^k}(X^{n+k}) = \iint dM_{X_k^n|X^k}\, dP_{X^{k-}}\, f_{X_k^n|X^k}(X^{n+k}),$$
where the equality uses the fact that $m$ is $k$-step Markov. Since the integrand is now measurable with respect to $\sigma(X^{n+k})$, this reduces to
$$\int \zeta_n\,dp \le \iint dM_{X_k^n|X^k}\, dP_{X^k}\, f_{X_k^n|X^k}.$$
Applying Lemma 5.3.2 we have
$$\int \zeta_n\,dp \le \iint dM_{X_k^n|X^k}\, dP_{X^k}\, \frac{dP_{X_k^n|X^k}}{dM_{X_k^n|X^k}} = \iint dP_{X^k}\, dP_{X_k^n|X^k} = 1.$$
Thus
$$\int \zeta_n\,dp \le 1$$
and we can apply Lemma 5.4.1 to conclude that $p$-a.e.
$$\limsup_{n\to\infty} \frac{1}{n}\ln \zeta_n = \limsup_{n\to\infty} \frac{1}{n}\ln \frac{f_{X_k^n|X^k}}{f_{X_k^n|X^{k-}}} \le 0. \tag{8.7}$$
Using the chain rule for densities,
$$\frac{f_{X_k^n|X^k}}{f_{X_k^n|X^{k-}}} = \frac{f_{X^n}}{f_{X^k}} \times \frac{1}{\prod_{l=k}^{n-1} f_{X_l|X^{l-}}}.$$
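This factorization can be checked directly from the definitions; a minimal verification, assuming the conditional densities are given by the usual ratios, is
$$f_{X_k^n|X^k} = \frac{f_{X^n}}{f_{X^k}}, \qquad f_{X_k^n|X^{k-}} = \frac{f_{X^n|X^-}}{f_{X^k|X^-}} = \prod_{l=k}^{n-1} f_{X_l|X^{l-}},$$
and dividing the first relation by the second yields the display above.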
Thus from (8.7)
$$\limsup_{n\to\infty} \frac{1}{n}\left(\ln f_{X^n} - \ln f_{X^k} - \sum_{l=k}^{n-1} \ln f_{X_l|X^{l-}}\right) \le 0.$$
Invoking the ergodic theorem for the rightmost term and the fact that the middle term converges to 0 almost everywhere since $\ln f_{X^k}$ is finite almost everywhere implies that
$$\limsup_{n\to\infty}\frac{1}{n}\ln f_{X^n} \le E_p(\ln f_{X_k|X^{k-}}) = E_p(\ln f_{X_0|X^-}) = \bar H_{p\|m}(X). \tag{8.8}$$
Combining this with (8.3) completes the sandwich and proves the theorem. $\Box$
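As a quick sanity check, consider the simplest special case, not treated separately here: $p$ i.i.d. and $m$ memoryless with $M_{X^n} \gg P_{X^n}$. Then the density factors as $f_{X^n}(X^n) = \prod_{l=0}^{n-1} f_{X_0}(X_l)$ and the theorem reduces to the ordinary strong law of large numbers:
$$\frac{1}{n}\ln f_{X^n}(X^n) = \frac{1}{n}\sum_{l=0}^{n-1} \ln f_{X_0}(X_l) \mathop{\longrightarrow}_{n\to\infty} E_p \ln f_{X_0}(X_0) = \bar H_{p\|m}(X), \quad p\text{-a.e.}$$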
8.3 Stationary Nonergodic Sources
Next suppose that the source $p$ is stationary with ergodic decomposition $\{p_\lambda;\ \lambda\in\Lambda\}$ and ergodic component function $\psi$ as in Theorem 1.8.3. We first require some technical details to ensure that the various Radon-Nikodym derivatives are well defined and that the needed chain rules for densities hold.
Lemma 8.3.1: Given a stationary source $\{X_n\}$, let $\{p_\lambda;\ \lambda\in\Lambda\}$ denote the ergodic decomposition and $\psi$ the ergodic component function of Theorem 1.8.3. Let $P_\psi$ denote the induced distribution of $\psi$. Let $P_{X^n}$ and $P^\lambda_{X^n}$ denote the induced marginal distributions of $p$ and $p_\lambda$. Assume that $\{X_n\}$ has the finite-gap information property of (6.13); that is, there exists a $K$ such that
$$I_p(X_K; X^-|X^K) < \infty, \tag{8.9}$$
where $X^- = (X_{-1}, X_{-2}, \cdots)$. We also assume that for some $n$
$$I(X^n; \psi) < \infty. \tag{8.10}$$
This will be the case, for example, if (8.9) holds for $K = 0$. Let $m$ be a $K$-step Markov process such that $M_{X^n} \gg P_{X^n}$ for all $n$. (Observe that such a process exists since from (8.9) the $K$th order Markov approximation $p^{(K)}$ suffices.) Define $M_{X^n,\psi} = M_{X^n} \times P_\psi$. Then
$$M_{X^n,\psi} \gg P_{X^n}\times P_\psi \gg P_{X^n,\psi}, \tag{8.11}$$
and with probability 1 under $p$
$$M_{X^n} \gg P_{X^n} \gg P^\psi_{X^n}.$$
Lastly,
$$\frac{dP^\psi_{X^n}}{dM_{X^n}} = f_{X^n|\psi} = \frac{dP_{X^n,\psi}}{d(M_{X^n}\times P_\psi)} \tag{8.12}$$
and therefore
$$\frac{dP^\psi_{X^n}}{dP_{X^n}} = \frac{dP^\psi_{X^n}/dM_{X^n}}{dP_{X^n}/dM_{X^n}} = \frac{f_{X^n|\psi}}{f_{X^n}}. \tag{8.13}$$
Proof: From Theorem 6.4.4 the given assumptions ensure that
$$\lim_{n\to\infty}\frac{1}{n} E_p\, i(X^n;\psi) = \lim_{n\to\infty}\frac{1}{n} I(X^n;\psi) = 0 \tag{8.14}$$
and hence $P_{X^n}\times P_\psi \gg P_{X^n,\psi}$ (since otherwise $I(X^n;\psi)$ would be infinite for some $n$ and hence infinite for all larger $n$ since it is increasing with $n$). This proves the right-most absolute continuity relation of (8.11). This in turn implies that $M_{X^n}\times P_\psi \gg P_{X^n,\psi}$. The lemma then follows from Theorem 5.3.1 with $X = X^n$, $Y = \psi$ and the chain rule for Radon-Nikodym derivatives. $\Box$
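The first implication rests on the standard fact that mutual information is infinite when absolute continuity fails; a one-line check of the contrapositive, using the partition definition of divergence, is that if $P_{X^n,\psi}(F) > 0$ while $(P_{X^n}\times P_\psi)(F) = 0$ for some event $F$, then
$$I(X^n;\psi) \ge P_{X^n,\psi}(F)\ln\frac{P_{X^n,\psi}(F)}{(P_{X^n}\times P_\psi)(F)} + P_{X^n,\psi}(F^c)\ln\frac{P_{X^n,\psi}(F^c)}{(P_{X^n}\times P_\psi)(F^c)} = \infty.$$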
We know that the source will produce with probability one an ergodic component $p_\lambda$ and hence Theorem 8.2.1 will hold for this ergodic component. In other words, we have for all $\lambda$ that
$$\lim_{n\to\infty}\frac{1}{n}\ln f_{X^n|\psi}(X^n|\lambda) = \bar H_{p_\lambda\|m}(X), \quad p_\lambda\text{-a.e.}$$
This implies that
$$\lim_{n\to\infty}\frac{1}{n}\ln f_{X^n|\psi}(X^n|\psi) = \bar H_{p_\psi\|m}(X), \quad p\text{-a.e.} \tag{8.15}$$
Making this step precise generalizes Lemma 3.3.1.
Lemma 8.3.2: Suppose that $\{X_n\}$ is a stationary, not necessarily ergodic, source with ergodic component function $\psi$. Then (8.15) holds.
Proof: The proof parallels that for Lemma 3.3.1. Observe that if we have two random variables $U, V$ ($U = X_0, X_1, \cdots$ and $V = \psi$ above) and a sequence of functions $g_n(U,V)$ ($n^{-1}\ln f_{X^n|\psi}(X^n|\psi)$) and a function $g(V)$ ($\bar H_{p_\psi\|m}(X)$) with the property
$$\lim_{n\to\infty} g_n(U, v) = g(v), \quad P_{U|V=v}\text{-a.e.},$$
then also
$$\lim_{n\to\infty} g_n(U, V) = g(V), \quad P_{UV}\text{-a.e.}$$
since defining the (measurable) set $G = \{u, v : \lim_{n\to\infty} g_n(u,v) = g(v)\}$ and its section $G_v = \{u : (u,v)\in G\}$, then from (1.26)
$$P_{UV}(G) = \int P_{U|V}(G_v|v)\,dP_V(v) = 1$$
if $P_{U|V}(G_v|v) = 1$ with probability 1. $\Box$
It is not, however, the relative entropy density using the distribution of the ergodic component that we wish to show converges. It is the original sample density $f_{X^n}$. The following theorem shows that the two sample entropies converge to the same thing. The theorem generalizes Lemma 3.3.1 and is proved by a sandwich argument analogous to Theorem 8.2.1. The result can be viewed as an almost everywhere version of (8.14).
Theorem 8.3.1: Given a stationary source $\{X_n\}$, let $\{p_\lambda;\ \lambda\in\Lambda\}$ denote the ergodic decomposition and $\psi$ the ergodic component function of Theorem 1.8.3. Assume that the finite-gap information property (8.9) is satisfied and that (8.10) holds for some $n$. Then
$$\lim_{n\to\infty}\frac{1}{n}\, i(X^n;\psi) = \lim_{n\to\infty}\frac{1}{n}\ln\frac{f_{X^n|\psi}}{f_{X^n}} = 0, \quad p\text{-a.e.}$$
Proof: From Theorem 5.4.1 we have immediately that
$$\liminf_{n\to\infty}\frac{1}{n}\, i(X^n;\psi) \ge 0, \tag{8.16}$$
which provides half of the sandwich proof.
To develop the other half of the sandwich, for each $k \ge K$ let $p^{(k)}$ denote the $k$-step Markov approximation of $p$. Exactly as in the proof of Theorem 8.2.1, it follows that (8.1) holds. Now, however, the Markov approximation relative entropy density converges instead as
$$\lim_{n\to\infty}\frac{1}{n}\ln f^{(k)}_{X^n}(X^n) = \lim_{n\to\infty}\frac{1}{n}\sum_{l=k}^{n-1}\ln f_{X_k|X^k}(X_k|X^k)\,T^{l-k} = E_{p_\psi}\ln f_{X_k|X^k}(X_k|X^k).$$
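The middle expression can be seen from the product form of the $k$-step Markov approximation density; a brief sketch, assuming the stationarity of $p$, is
$$f^{(k)}_{X^n}(X^n) = f_{X^k}(X^k)\prod_{l=k}^{n-1} f_{X_k|X^k}(X_k|X^k)\,T^{l-k},$$
where $T$ is the shift; the term $n^{-1}\ln f_{X^k}(X^k)$ vanishes in the limit since the density is finite almost everywhere, and the ergodic theorem for the stationary (not necessarily ergodic) source sends the sample average of the remaining terms to the conditional expectation under the ergodic component $p_\psi$.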
Combining this with (8.15) we have that
$$\limsup_{n\to\infty}\frac{1}{n}\ln\frac{f_{X^n|\psi}(X^n|\psi)}{f_{X^n}(X^n)} \le \bar H_{p_\psi\|m}(X) - E_{p_\psi}\ln f_{X_k|X^k}(X_k|X^k).$$
From Lemma 7.4.1, the right-hand side is just $I_{p_\psi}(X_k; X^-|X^k)$, which from Corollary 7.4.2 is just $\bar H_{p_\psi\|p^{(k)}}(X)$. Since the bound holds for all $k$, we have that
$$\limsup_{n\to\infty}\frac{1}{n}\ln\frac{f_{X^n|\psi}(X^n|\psi)}{f_{X^n}(X^n)} \le \inf_k \bar H_{p_\psi\|p^{(k)}}(X) \equiv \zeta.$$
Using the ergodic decomposition of relative entropy rate (Corollary 7.5.1) and the fact that Markov approximations are asymptotically accurate (Corollary 7.4.3), we have further that
$$\int dP_\psi\,\zeta = \int dP_\psi\,\inf_k \bar H_{p_\psi\|p^{(k)}}(X) \le \inf_k \int dP_\psi\, \bar H_{p_\psi\|p^{(k)}}(X) = \inf_k \bar H_{p\|p^{(k)}}(X) = 0$$
and hence $\zeta = 0$ with $P_\psi$ probability 1.
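The last step uses nonnegativity: each $\bar H_{p_\psi\|p^{(k)}}(X)$ is a relative entropy rate and hence nonnegative, so $\zeta \ge 0$, and a nonnegative function with zero integral must vanish with probability 1:
$$\zeta \ge 0 \ \text{ and } \ \int \zeta\, dP_\psi = 0 \ \Longrightarrow\ P_\psi(\zeta = 0) = 1.$$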
Thus
$$\limsup_{n\to\infty}\frac{1}{n}\ln\frac{f_{X^n|\psi}(X^n|\psi)}{f_{X^n}(X^n)} \le 0, \tag{8.17}$$
which with (8.16) completes the sandwich proof. $\Box$
Simply restating the theorem using (8.15) yields the ergodic theorem for relative entropy densities in the general stationary case.
Corollary 8.3.1: Given the assumptions of Theorem 8.3.1,
$$\lim_{n\to\infty}\frac{1}{n}\ln f_{X^n}(X^n) = \bar H_{p_\psi\|m}(X), \quad p\text{-a.e.}$$
The corollary states that the sample relative entropy density of a process satisfying (8.9) converges to the conditional relative entropy rate with respect to the underlying ergodic component. This is a slight extension and elaboration
of Barron's result [9], which made the stronger assumption that $\bar H_{p\|m}(X_0|X^-) = \bar H_{p\|m}(X) < \infty$. From Corollary 7.4.3 this condition is sufficient but not necessary for the finite-gap information property of (8.9). In particular, the finite-gap information property implies that
$$\bar H_{p\|p^{(k)}}(X) = I_p(X_k; X^-|X^k) < \infty,$$
but it need not be true that $\bar H_{p\|m}(X) < \infty$. In addition, Barron [9] and Algoet and Cover [7] do not characterize the limiting density as the entropy rate of the ergodic component; instead they effectively show that the limit is $E_{p_\psi}(\ln f_{X_0|X^-}(X_0|X^-))$. This, however, is equivalent since it follows from the ergodic decomposition (see specifically Lemma 8.6.2 of [50]) that $f_{X_0|X^-} = f_{X_0|X^-,\psi}$ with probability one, since the ergodic component $\psi$ can be determined from the infinite past $X^-$.
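In equation form, the claimed equivalence is the observation (a sketch, relying on the cited lemma) that
$$E_{p_\psi}\big(\ln f_{X_0|X^-}(X_0|X^-)\big) = E_{p_\psi}\big(\ln f_{X_0|X^-,\psi}(X_0|X^-,\psi)\big) = \bar H_{p_\psi\|m}(X),$$
where the first equality is the almost sure identity of the two conditional densities and the second is the characterization of the relative entropy rate used in (8.8).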
8.4 AMS Sources
The following lemma is a generalization of Lemma 3.4.1. The result is due to Barron [9], who proved it using martingale inequalities and convergence results.
Lemma 8.4.1: Let $\{X_n\}$ be an AMS source with the property that for every integer $k$ there exists an integer $l = l(k)$ such that
$$I_p(X^k; (X_{k+l}, X_{k+l+1}, \cdots)|X_k^l) < \infty. \tag{8.18}$$
Then
$$\lim_{n\to\infty}\frac{1}{n}\, i(X^k; (X_{k+l},\cdots,X_{n-1})|X_k^l) = 0, \quad p\text{-a.e.}$$
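For instance, if $p$ happens to be itself a $K$-step Markov source, then for $l \ge K$ the block $X^k$ and the future $(X_{k+l}, X_{k+l+1}, \cdots)$ are conditionally independent given $X_k^l$, so (8.18) holds trivially with
$$I_p(X^k; (X_{k+l}, X_{k+l+1}, \cdots)|X_k^l) = 0.$$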
Proof: By assumption
$$I_p(X^k; (X_{k+l}, X_{k+l+1},\cdots)|X_k^l) = E_p \ln \frac{f_{X^k|X_k^l, X_{k+l}, X_{k+l+1},\cdots}(X^k|X_k^l, X_{k+l}, X_{k+l+1},\cdots)}{f_{X^k|X_k^l}(X^k|X_k^l)} < \infty.$$
This implies that
$$P_{X^k\times(X_{k+l},\cdots)|X_k^l} \gg P_{X_0,X_1,\ldots}$$
with
$$\frac{dP_{X_0,X_1,\ldots}}{dP_{X^k\times(X_{k+l},\cdots)|X_k^l}} = \frac{f_{X^k|X_k^l, X_{k+l}, X_{k+l+1},\cdots}(X^k|X_k^l, X_{k+l},\cdots)}{f_{X^k|X_k^l}(X^k|X_k^l)}.$$
Restricting the measures to $X^n$ for $n > k + l$ yields
$$\ln\frac{dP_{X^n}}{dP_{X^k\times(X_{k+l},\cdots,X_{n-1})|X_k^l}} = \ln\frac{f_{X^k|X_k^l, X_{k+l},\cdots,X_{n-1}}(X^k|X_k^l, X_{k+l},\cdots,X_{n-1})}{f_{X^k|X_k^l}(X^k|X_k^l)} = i(X^k; (X_{k+l},\cdots,X_{n-1})|X_k^l).$$
With this setup the lemma follows immediately from Theorem 5.4.1. $\Box$
The following theorem generalizes Lemma 3.4.2 and will yield the general theorem. It was first proved by Barron [9] using martingale inequalities.
Theorem 8.4.1: Suppose that $p$ and $m$ are distributions of a standard alphabet process $\{X_n\}$ such that $p$ is AMS and $m$ is $k$-step Markov. Let $\bar p$ be a stationary measure that asymptotically dominates $p$ (e.g., the stationary mean). Suppose that $P_{X^n}$, $\bar P_{X^n}$, and $M_{X^n}$ are the distributions induced by $p$, $\bar p$, and $m$, that $M_{X^n}$ dominates both $P_{X^n}$ and $\bar P_{X^n}$ for all $n$, and that $f_{X^n}$ and $\bar f_{X^n}$ are the corresponding densities. If there is an invariant function $h$ such that
$$\lim_{n\to\infty}\frac{1}{n}\ln \bar f_{X^n}(X^n) = h, \quad \bar p\text{-a.e.},$$
then also
$$\lim_{n\to\infty}\frac{1}{n}\ln f_{X^n}(X^n) = h, \quad p\text{-a.e.}$$
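The proof compares the two sample densities block by block. The elementary identity behind the comparison, valid whenever the indicated densities exist, is the chain rule
$$f_{X^n} = f_{X^k|X_k^{n-k}}\, f_{X_k^{n-k}},$$
so the two normalized log densities can differ only through a conditional term, and Lemma 8.4.1 is what controls that term.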
Proof: For any $k$ and $n \ge k$ we can write, using the chain rule for densities,
$$\frac{1}{n}\ln f_{X^n} - \frac{1}{n}\ln f_{X_k^{n-k}} = \frac{1}{n}\ln f_{X^k|X_k^{n-k}}.$$
Since for $k \le l < n$
$$\frac{1}{n}\ln f_{X^k|X_k^{n-k}} = \frac{1}{n}\ln f_{X^k|X_k^l} + \frac{1}{n}\, i(X^k; (X_{k+l},\cdots,X_{n-1})|X_k^l),$$
Lemma 8.4.1 and the fact that densities are finite with probability one implies that
$$\lim_{n\to\infty}\frac{1}{n}\ln f_{X^k|X_k^{n-k}} = 0, \quad p\text{-a.e.}$$
This implies that there is a subsequence $k(n) \to \infty$ such that
$$\frac{1}{n}\ln f_{X^n}(X^n) - \frac{1}{n}\ln f_{X_{k(n)}^{n-k(n)}}(X_{k(n)}^{n-k(n)}) \to 0, \quad p\text{-a.e.}$$
To prove this, for each $k$ choose $N(k)$ large enough so that
$$p\left(\left|\frac{1}{N(k)}\ln f_{X^k|X_k^{N(k)-k}}(X^k|X_k^{N(k)-k})\right| > 2^{-k}\right) \le 2^{-k}$$
and then let $k(n) = k$ for $N(k) \le n < N(k+1)$. Then from the Borel-Cantelli lemma we have for any $\epsilon$ that
$$p\left(\left|\frac{1}{N(k)}\ln f_{X^k|X_k^{N(k)-k}}(X^k|X_k^{N(k)-k})\right| > \epsilon \ \text{ i.o.}\right) = 0$$
and hence
$$\lim_{n\to\infty}\frac{1}{n}\ln f_{X^n}(X^n) = \lim_{n\to\infty}\frac{1}{n}\ln f_{X_{k(n)}^{n-k(n)}}(X_{k(n)}^{n-k(n)}), \quad p\text{-a.e.}$$
In a similar manner we can also choose the sequence so that
$$\lim_{n\to\infty}\frac{1}{n}\ln \bar f_{X^n}(X^n) = \lim_{n\to\infty}\frac{1}{n}\ln \bar f_{X_{k(n)}^{n-k(n)}}(X_{k(n)}^{n-k(n)}), \quad \bar p\text{-a.e.}$$
From Markov's inequality
$$\bar p\left(\frac{1}{n}\ln f_{X_k^{n-k}}(X_k^{n-k}) \ge \frac{1}{n}\ln \bar f_{X_k^{n-k}}(X_k^{n-k}) + \epsilon\right) = \bar p\left(f_{X_k^{n-k}}(X_k^{n-k}) \ge e^{n\epsilon}\,\bar f_{X_k^{n-k}}(X_k^{n-k})\right)$$
$$\le e^{-n\epsilon}\int d\bar p\,\frac{f_{X_k^{n-k}}(X_k^{n-k})}{\bar f_{X_k^{n-k}}(X_k^{n-k})} = e^{-n\epsilon}\int dm\, f_{X_k^{n-k}}(X_k^{n-k}) = e^{-n\epsilon}.$$
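The last two equalities deserve a word: on the $\sigma$-field generated by $X_k^{n-k}$ we have $d\bar p = \bar f_{X_k^{n-k}}\,dm$, so, assuming $\bar f_{X_k^{n-k}} > 0$ a.e. on the relevant event,
$$\int d\bar p\,\frac{f_{X_k^{n-k}}}{\bar f_{X_k^{n-k}}} = \int dm\,\bar f_{X_k^{n-k}}\,\frac{f_{X_k^{n-k}}}{\bar f_{X_k^{n-k}}} = \int dm\, f_{X_k^{n-k}} = 1.$$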
Hence again invoking the Borel-Cantelli lemma we have that
$$\bar p\left(\frac{1}{n}\ln f_{X_k^{n-k}}(X_k^{n-k}) \ge \frac{1}{n}\ln \bar f_{X_k^{n-k}}(X_k^{n-k}) + \epsilon \ \text{ i.o.}\right) = 0$$
and therefore, applying the same bound along the subsequence $k(n)$ (the bound $e^{-n\epsilon}$ does not depend on $k$),
$$\limsup_{n\to\infty}\frac{1}{n}\ln f_{X_{k(n)}^{n-k(n)}}(X_{k(n)}^{n-k(n)}) \le h, \quad \bar p\text{-a.e.} \tag{8.19}$$
The above event is in the tail $\sigma$-field $\bigcap_n \sigma(X_n, X_{n+1}, \cdots)$ since $h$ is invariant, and $\bar p$ dominates $p$ on the tail $\sigma$-field. Thus
$$\limsup_{n\to\infty}\frac{1}{n}\ln f_{X_{k(n)}^{n-k(n)}}(X_{k(n)}^{n-k(n)}) \le h, \quad p\text{-a.e.}$$
and hence
$$\limsup_{n\to\infty}\frac{1}{n}\ln f_{X^n}(X^n) \le h, \quad p\text{-a.e.},$$
which proves half of the theorem.
Since $\bar p$ asymptotically dominates $p$, given $\epsilon > 0$ there is a $k$ such that
$$p\left(\lim_{n\to\infty} \frac{1}{n}\ln \bar f_{X_k^{n-k}}(X_k^{n-k}) = h\right) \ge 1 - \epsilon.$$
Again applying Markov’s inequality and the Borel-Cantelli lemma as previously we have that
$$\liminf_{n\to\infty}\frac{1}{n}\ln\frac{f_{X_{k(n)}^{n-k(n)}}(X_{k(n)}^{n-k(n)})}{\bar f_{X_{k(n)}^{n-k(n)}}(X_{k(n)}^{n-k(n)})} \ge 0, \quad p\text{-a.e.}$$
which implies that
$$p\left(\liminf_{n\to\infty}\frac{1}{n}\ln f_{X_{k(n)}^{n-k(n)}}(X_{k(n)}^{n-k(n)}) \ge h\right) \ge 1 - \epsilon$$
and hence also that
$$p\left(\liminf_{n\to\infty}\frac{1}{n}\ln f_{X^n}(X^n) \ge h\right) \ge 1 - \epsilon.$$
Since $\epsilon$ can be made arbitrarily small, this proves that $p$-a.e. $\liminf_{n\to\infty} n^{-1}\ln f_{X^n}(X^n) \ge h$, which completes the proof of the theorem. $\Box$
We can now extend the ergodic theorem for relative entropy densities to the general AMS case.
Corollary 8.4.1: Given the assumptions of Theorem 8.4.1,
$$\lim_{n\to\infty}\frac{1}{n}\ln f_{X^n}(X^n) = \bar H_{\bar p_\psi\|m}(X), \quad p\text{-a.e.},$$
where $\bar p_\psi$ is the ergodic component of the stationary mean $\bar p$ of $p$.
Proof: The proof follows immediately from Theorem 8.4.1 and Corollary 8.3.1, the ergodic theorem for the relative entropy density for the stationary mean. $\Box$
8.5 Ergodic Theorems for Information Densities
As an application of the general theorem we prove an ergodic theorem for mutual information densities for stationary and ergodic sources. The result can be extended to AMS sources in the same manner that the results of Section 8.3 were extended to those of Section 8.4. As the stationary and ergodic result suffices for the coding theorems and the AMS conditions are messy, only the stationary case is considered here. The result is due to Barron [9].
Theorem 8.5.1: Let $\{X_n, Y_n\}$ be a stationary ergodic pair random process with standard alphabet. Let $P_{X^nY^n}$, $P_{X^n}$, and $P_{Y^n}$ denote the induced distributions and assume that for all $n$, $P_{X^n}\times P_{Y^n} \gg P_{X^nY^n}$ and hence the information densities
$$i_n(X^n; Y^n) = \ln\frac{dP_{X^nY^n}}{d(P_{X^n}\times P_{Y^n})}$$
are well defined. Assume in addition that both the $\{X_n\}$ and $\{Y_n\}$ processes have the finite-gap information property of (8.9) and hence by the comment following Corollary 7.3.1 there is a $K$ such that both processes satisfy the $K$-gap property
$$I(X_K; X^-|X^K) < \infty, \qquad I(Y_K; Y^-|Y^K) < \infty.$$
Then
$$\lim_{n\to\infty}\frac{1}{n}\, i_n(X^n; Y^n) = \bar I(X; Y), \quad p\text{-a.e.}$$
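For orientation, consider the simplest special case of an i.i.d. pair process: the information density is then a sum of i.i.d. terms and the theorem reduces to the strong law of large numbers,
$$\frac{1}{n}\, i_n(X^n;Y^n) = \frac{1}{n}\sum_{l=0}^{n-1}\ln\frac{dP_{X_0Y_0}}{d(P_{X_0}\times P_{Y_0})}(X_l, Y_l) \mathop{\longrightarrow}_{n\to\infty} I(X_0; Y_0) = \bar I(X;Y), \quad p\text{-a.e.}$$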
Proof: Let $Z_n = (X_n, Y_n)$. Let $M_{X^n} = P^{(K)}_{X^n}$ and $M_{Y^n} = P^{(K)}_{Y^n}$ denote the $K$th order Markov approximations of $\{X_n\}$ and $\{Y_n\}$, respectively. The finite-gap property implies as in Section 8.3 that the densities
$$f_{X^n} = \frac{dP_{X^n}}{dM_{X^n}} \qquad\text{and}\qquad f_{Y^n} = \frac{dP_{Y^n}}{dM_{Y^n}}$$
are well defined. From Theorem 8.2.1
$$\lim_{n\to\infty}\frac{1}{n}\ln f_{X^n}(X^n) = \bar H_{p_X\|p_X^{(K)}}(X_0|X^-) = I(X_K; X^-|X^K) < \infty,$$
$$\lim_{n\to\infty}\frac{1}{n}\ln f_{Y^n}(Y^n) = I(Y_K; Y^-|Y^K) < \infty.$$
Define the measures $M_{Z^n} = M_{X^n}\times M_{Y^n}$. Then this is a $K$-step Markov source and since
$$M_{X^n}\times M_{Y^n} \gg P_{X^n}\times P_{Y^n} \gg P_{X^n,Y^n} = P_{Z^n},$$
the density
$$f_{Z^n} = \frac{dP_{Z^n}}{dM_{Z^n}}$$
is well defined and from Theorem 8.2.1 has a limit
$$\lim_{n\to\infty}\frac{1}{n}\ln f_{Z^n}(Z^n) = \bar H_{p\|m}(Z_0|Z^-).$$
If the information density $i_n(X^n; Y^n)$ is infinite for any $n$, then it is infinite for all larger $n$ and convergence is trivially to the infinite information rate. If it is finite, the chain rule for densities yields
$$\frac{1}{n}\, i_n(X^n; Y^n) = \frac{1}{n}\ln f_{Z^n}(Z^n) - \frac{1}{n}\ln f_{X^n}(X^n) - \frac{1}{n}\ln f_{Y^n}(Y^n)$$
$$\mathop{\longrightarrow}_{n\to\infty}\ \bar H_{p\|p^{(K)}}(Z_0|Z^-) - \bar H_{p\|p^{(K)}}(X_0|X^-) - \bar H_{p\|p^{(K)}}(Y_0|Y^-)$$
$$= \bar H_{p\|p^{(K)}}(X,Y) - \bar H_{p\|p^{(K)}}(X) - \bar H_{p\|p^{(K)}}(Y).$$
The limit is not indeterminate (of the form $\infty - \infty$) because the two subtracted terms are finite. Since convergence is to a constant, the constant must also be the limit of the expected values of $n^{-1} i_n(X^n; Y^n)$, that is, $\bar I(X; Y)$. $\Box$
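In equation form, the identification of the limiting constant uses only the definition of the mutual information rate:
$$E\left[\frac{1}{n}\, i_n(X^n; Y^n)\right] = \frac{1}{n}\, I(X^n; Y^n) \mathop{\longrightarrow}_{n\to\infty} \bar I(X; Y).$$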