Proof: From the chain rule for conditional relative entropy (equation (7.7)),
$$ H_{p\|m}(X^n|X^-) = \sum_{l=0}^{n-1} H_{p\|m}(X_l|X^l, X^-). $$
Stationarity implies that each term in the sum equals $H_{p\|m}(X_0|X^-)$, proving the corollary. $\Box$
The next corollary extends Corollary 7.3.1 to processes.
Corollary 7.4.2: Given $k$ and $n \ge k$, let $\mathcal{M}_k$ denote the class of all $k$-step stationary Markov process distributions. Then
$$ \inf_{m \in \mathcal{M}_k} \bar H_{p\|m}(X) = \bar H_{p\|p^{(k)}}(X) = I_p(X_k; X^- \mid X^k). $$
Proof: Follows from (7.23) and Theorem 7.3.1. $\Box$
This result gives an interpretation of the finite-gap information property (6.13): If a process has this property, then there exists a $k$-step Markov process which is only a finite "distance" from the given process in terms of limiting per-symbol divergence. If any such process has a finite distance, then the $k$-step Markov approximation also has a finite distance. Furthermore, we can apply Corollary 6.4.1 to obtain the generalization of the finite alphabet result of Theorem 2.6.2.
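To make these rates concrete, here is a small numerical sketch for a finite-alphabet source; the transition law $q$ below is an assumed example, not taken from the text. The source $p$ is a binary second-order Markov chain, and since its $k$-step approximations $p^{(k)}$ are Markov, Lemma 7.4.1 reduces each divergence rate $\bar H_{p\|p^{(k)}}(X)$ to a single conditional relative entropy computable from the stationary distribution of the pair chain. The rate against $p^{(1)}$ is the conditional mutual information of Corollary 7.4.2, while the rate against $p^{(2)}$ is zero, consistent with Corollary 7.4.3.

```python
# Sketch (assumed example): divergence rates of a binary second-order Markov
# source against its 1- and 2-step Markov approximations.
import numpy as np

# q[a, b, c] = Pr(X_n = c | X_{n-2} = a, X_{n-1} = b)  (assumed values)
q = np.zeros((2, 2, 2))
q[0, 0] = [0.9, 0.1]
q[0, 1] = [0.4, 0.6]
q[1, 0] = [0.3, 0.7]
q[1, 1] = [0.2, 0.8]

# Stationary distribution of the pair process (X_{n-1}, X_n).
T = np.zeros((4, 4))                      # pair state (a, b) is coded as 2*a + b
for a in range(2):
    for b in range(2):
        for c in range(2):
            T[2 * a + b, 2 * b + c] = q[a, b, c]
pi = np.ones(4) / 4
for _ in range(2000):                     # power iteration
    pi = pi @ T
pi /= pi.sum()

# Joint law of (X_{-2}, X_{-1}, X_0) and the first-order conditional p^(1).
mu = np.array([[[pi[2 * a + b] * q[a, b, c] for c in range(2)]
                for b in range(2)] for a in range(2)])
p_bc = mu.sum(axis=0)                     # law of (X_{-1}, X_0)
p1 = p_bc / p_bc.sum(axis=1, keepdims=True)

# H-bar_{p||p^(1)} = E_p ln[ p(X_0|X_{-1},X_{-2}) / p^(1)(X_0|X_{-1}) ].
rate_k1 = sum(mu[a, b, c] * np.log(q[a, b, c] / p1[b, c])
              for a in range(2) for b in range(2) for c in range(2))
print("divergence rate vs 1-step approximation:", rate_k1)   # > 0 in general
print("divergence rate vs 2-step approximation:", 0.0)        # p is 2nd order
```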
Corollary 7.4.3: Given a stationary process distribution $p$ which satisfies the finite-gap information property,
$$ \inf_k \inf_{m \in \mathcal{M}_k} \bar H_{p\|m}(X) = \inf_k \bar H_{p\|p^{(k)}}(X) = \lim_{k\to\infty} \bar H_{p\|p^{(k)}}(X) = 0. $$
Lemma 7.4.1 also yields the following approximation lemma.
Corollary 7.4.4: Given a process $\{X_n\}$ with standard alphabet $A$, let $p$ and $m$ be stationary measures such that $P_{X^n} \ll M_{X^n}$ for all $n$ and $m$ is $k$th order Markov. Let $q_k$ be an asymptotically accurate sequence of quantizers for $A$. Then
$$ \bar H_{p\|m}(X) = \lim_{k\to\infty} \bar H_{p\|m}(q_k(X)); $$
that is, the divergence rate can be approximated arbitrarily closely by that of a quantized version of the process. Thus, in particular,
$$ \bar H_{p\|m}(X) = H^*_{p\|m}(X). $$
Proof: This follows from Corollary 5.2.3 by letting the generating $\sigma$-fields be $\mathcal{F}_n = \sigma(q_n(X_i);\ i = 0, -1, \cdots)$ and the representation of conditional relative entropy as an ordinary divergence. $\Box$
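As a quick sanity check of the corollary in the simplest possible case, the sketch below takes $p$ and $m$ to be hypothetical i.i.d. Gaussian process laws $N(0,1)$ and $N(1/2,1)$ (so $m$ is trivially Markov) and uses a sequence of ever finer uniform scalar quantizers. The divergence rate is then the single-letter divergence $\frac{1}{2}(1/2)^2 = 1/8$ nat per symbol, and the quantized rates increase toward it as the partition is refined. None of the numbers are from the book; this is only an illustration.

```python
# Sketch: quantized divergence rates climbing toward the true rate 1/8 nat
# for assumed i.i.d. Gaussian laws N(0,1) (p) and N(1/2,1) (m).
import math

def ncdf(x, mean=0.0):
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0)))

def quantized_divergence(cells, lo=-6.0, hi=6.0):
    edges = [lo + (hi - lo) * i / cells for i in range(cells + 1)]
    edges = [-math.inf] + edges + [math.inf]       # add the two tail cells
    d = 0.0
    for a, b in zip(edges[:-1], edges[1:]):
        pi = ncdf(b) - ncdf(a)                     # P-probability of the cell
        mi = ncdf(b, 0.5) - ncdf(a, 0.5)           # M-probability of the cell
        if pi > 0:
            d += pi * math.log(pi / mi)
    return d

for cells in (2, 8, 32, 128):
    print(cells, quantized_divergence(cells))      # increases toward 0.125
print("true divergence rate:", 0.5 * 0.5 ** 2)     # (mu1 - mu2)^2 / 2 = 0.125
```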
Another interesting property of relative entropy rates for stationary processes is that we can "reverse time" when computing the rate in the sense of the following lemma.
Lemma 7.4.2: Let $\{X_n\}$, $p$, and $m$ be as in Lemma 7.4.1. If either $\bar H_{p\|m}(X) < \infty$ or $H_{p\|m}(X_0|X^-) < \infty$, then
$$ H_{p\|m}(X_0|X_{-1},\cdots,X_{-n}) = H_{p\|m}(X_0|X_1,\cdots,X_n) $$
and hence
$$ H_{p\|m}(X_0|X_1,X_2,\cdots) = H_{p\|m}(X_0|X_{-1},X_{-2},\cdots) = \bar H_{p\|m}(X) < \infty. $$
Proof: If $\bar H_{p\|m}(X)$ is finite, then so must be the terms $H_{p\|m}(X^n) = D(P_{X^n}\|M_{X^n})$ (since otherwise all such terms with larger $n$ would also be infinite and hence $\bar H$ could not be finite). Thus from stationarity
$$\begin{aligned} H_{p\|m}(X_0|X_{-1},\cdots,X_{-n}) &= H_{p\|m}(X_n|X^n) \\ &= D(P_{X^{n+1}}\|M_{X^{n+1}}) - D(P_{X^n}\|M_{X^n}) \\ &= D(P_{X^{n+1}}\|M_{X^{n+1}}) - D(P_{X_1^n}\|M_{X_1^n}) = H_{p\|m}(X_0|X_1,\cdots,X_n), \end{aligned}$$
from which the results follow. If on the other hand the conditional relative entropy is finite, the results then follow as in the proof of Lemma 7.4.1 using the fact that the joint relative entropies are arithmetic averages of the conditional relative entropies and that the conditional relative entropy is defined as the divergence between the $P$ and $S$ measures (Theorem 5.3.2). $\Box$
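The time-reversal property is easy to check numerically in the simplest case. In the sketch below, $p$ and $m$ are two hypothetical stationary first-order Markov chains on a three-letter alphabet (transition matrices chosen arbitrarily); the conditional relative entropy computed from the forward transition probabilities is compared with the one computed from the time-reversed conditionals, obtained by Bayes' rule from each chain's own stationary joint law.

```python
# Sketch (assumed chains): forward and backward conditional relative entropies
# of two stationary Markov chains agree, as Lemma 7.4.2 asserts.
import numpy as np

def stationary(P):
    """Stationary row vector of a transition matrix P (power iteration)."""
    pi = np.ones(P.shape[0]) / P.shape[0]
    for _ in range(5000):
        pi = pi @ P
    return pi / pi.sum()

Pp = np.array([[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.3, 0.3, 0.4]])    # chain p
Pm = np.array([[0.4, 0.4, 0.2], [0.2, 0.5, 0.3], [0.25, 0.25, 0.5]])  # chain m

pip, pim = stationary(Pp), stationary(Pm)
Jp = pip[:, None] * Pp            # P(x0, x1) under p
Jm = pim[:, None] * Pm            # M(x0, x1) under m

# Backward conditionals P(x0 | x1) and M(x0 | x1) via Bayes' rule.
Bp = Jp / Jp.sum(axis=0, keepdims=True)
Bm = Jm / Jm.sum(axis=0, keepdims=True)

forward  = np.sum(Jp * np.log(Pp / Pm))   # H_{p||m}(X_0 | X_{-1})
backward = np.sum(Jp * np.log(Bp / Bm))   # H_{p||m}(X_0 | X_1)
print(forward, backward)                  # the two values agree
```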
7.5 Mean Ergodic Theorems
In this section we state and prove some preliminary ergodic theorems for relative entropy densities analogous to those first developed for entropy densities in Chapter 3 and for information densities in Section 6.3. In particular, we show that an almost everywhere ergodic theorem for finite alphabet processes follows easily from the sample entropy ergodic theorem and that an approximation argument then yields an $L^1$ ergodic theorem for stationary sources. The results involve little new and closely parallel those for mutual information densities and therefore the details are skimpy. The results are given for completeness and because the $L^1$ results yield the byproduct that relative entropies are uniformly integrable, a fact which does not follow as easily for relative entropies as it did for entropies.
Finite Alphabets
Suppose that we now have two process distributions $p$ and $m$ for a random process $\{X_n\}$ with finite alphabet. Let $P_{X^n}$ and $M_{X^n}$ denote the induced $n$th order distributions and $p_{X^n}$ and $m_{X^n}$ the corresponding probability mass functions (pmf's). For example, $p_{X^n}(a^n) = P_{X^n}(\{x^n : x^n = a^n\}) = p(\{x : X^n(x) = a^n\})$. We assume that $P_{X^n} \ll M_{X^n}$. In this case the relative entropy density is given simply by
$$ h_n(x) = h_{X^n}(X^n)(x) = \ln \frac{p_{X^n}(x^n)}{m_{X^n}(x^n)}, $$
where $x^n = X^n(x)$.
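In code the relative entropy density is simply a log-likelihood ratio of block probabilities. The toy sketch below assumes, purely for illustration, that both $p$ and $m$ are i.i.d., so the $n$th order pmf's factor into products of the (assumed) single-letter marginals.

```python
# Sketch: per-symbol relative entropy density for assumed i.i.d. p and m.
import math

def h_n(block, p_marginal, m_marginal):
    """ln p_{X^n}(x^n) - ln m_{X^n}(x^n) for i.i.d. p and m."""
    return sum(math.log(p_marginal[x]) - math.log(m_marginal[x]) for x in block)

p_marginal = {0: 0.7, 1: 0.3}       # assumed marginals, not from the text
m_marginal = {0: 0.5, 1: 0.5}
block = [0, 0, 1, 0, 1, 1, 0, 0]
print(h_n(block, p_marginal, m_marginal) / len(block))   # per-symbol density
```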
The following lemma generalizes Theorem 3.1.1 from entropy densities to relative entropy densities for finite alphabet processes. Relative entropies are of more general interest than ordinary entropies because they generalize to continuous alphabets in a useful way while ordinary entropies do not.
Lemma 7.5.1: Suppose that $\{X_n\}$ is a finite alphabet process and that $p$ and $m$ are two process distributions with $M_{X^n} \gg P_{X^n}$ for all $n$, where $p$ is AMS with stationary mean $\bar p$, $m$ is a $k$th order Markov source with stationary transitions, and $\{\bar p_x\}$ is the ergodic decomposition of the stationary mean of $p$.
Assume also that $M_{X^n} \gg \bar P_{X^n}$ for all $n$. Then
$$ \lim_{n\to\infty} \frac{1}{n} h_n = h, \quad p\text{-a.e. and in } L^1(p), $$
where $h(x)$ is the invariant function defined by
$$ h(x) = -\bar H_{\bar p_x}(X) - E_{\bar p_x} \ln m(X_k|X^k) = \lim_{n\to\infty} \frac{1}{n} H_{\bar p_x\|m}(X^n) = \bar H_{\bar p_x\|m}(X), \tag{7.27} $$
where
$$ m(X_k|X^k)(x) \equiv \frac{m_{X^{k+1}}(x^{k+1})}{m_{X^k}(x^k)} = M_{X_k|X^k}(x_k|x^k). $$
Furthermore,
$$ E_p h = \bar H_{p\|m}(X) = \lim_{n\to\infty} \frac{1}{n} H_{p\|m}(X^n), \tag{7.28} $$
that is, the relative entropy rate of an AMS process with respect to a Markov process with stationary transitions is given by the limit. Lastly,
$$ \bar H_{p\|m}(X) = \bar H_{\bar p\|m}(X), \tag{7.29} $$
that is, the relative entropy rate of the AMS process with respect to $m$ is the same as that of its stationary mean with respect to $m$.
Proof: We have that
$$\begin{aligned} \frac{1}{n} h_n(X^n) &= \frac{1}{n}\ln p(X^n) - \frac{1}{n}\Bigl[\ln m(X^k) + \sum_{i=k}^{n-1} \ln m(X_k|X^k)T^{i-k}\Bigr] \\ &= \frac{1}{n}\ln p(X^n) - \frac{1}{n}\ln m(X^k) - \frac{1}{n}\sum_{i=k}^{n-1} \ln m(X_k|X^k)T^{i-k}, \end{aligned} \tag{7.30}$$
where $T$ is the shift transformation, $p(X^n)$ is an abbreviation for $P_{X^n}(X^n)$, and $m(X_k|X^k) = M_{X_k|X^k}(X_k|X^k)$. From Theorem 3.1.1 the first term converges to $-\bar H_{\bar p_x}(X)$ $p$-a.e. and in $L^1(p)$.
Since $M_{X^k} \gg P_{X^k}$, if $M_{X^k}(F) = 0$, then also $P_{X^k}(F) = 0$. Thus $P_{X^k}$ and hence also $p$ assign zero probability to the event that $M_{X^k}(X^k) = 0$. Thus with probability one under $p$, $\ln m(X^k)$ is finite and hence the second term in (7.30) converges to 0 $p$-a.e. as $n \to \infty$.
Define $\alpha$ as the minimum nonzero value of the conditional probability $m(x_k|x^k)$. Then with probability 1 under $M_{X^n}$ and hence also under $P_{X^n}$ we have that
$$ \left| \frac{1}{n} \sum_{i=k}^{n-1} \ln m(X_i|X_{i-k}^k) \right| \le \ln\frac{1}{\alpha} $$
since otherwise the sequence $X^n$ would have 0 probability under $M_{X^n}$ and hence also under $P_{X^n}$, and $0\ln 0$ is considered to be 0. Thus the rightmost term of (7.30) is uniformly integrable with respect to $p$ and hence from Theorem 1.8.3 this term converges to $E_{\bar p_x}(\ln m(X_k|X^k))$. This proves the leftmost equality of (7.27).
Let $\bar p_{X^n|x}$ denote the distribution of $X^n$ under the ergodic component $\bar p_x$. Since $M_{X^n} \gg \bar P_{X^n}$ and $\bar P_{X^n} = \int d\bar p(x)\, \bar p_{X^n|x}$, if $M_{X^n}(F) = 0$, then $\bar p_{X^n|x}(F) = 0$ $\bar p$-a.e. Since the alphabet of $X_n$ is finite, we therefore also have with probability one under $\bar p$ that $M_{X^n} \gg \bar p_{X^n|x}$ and hence
$$ H_{\bar p_x\|m}(X^n) = \sum_{a^n} \bar p_{X^n|x}(a^n) \ln \frac{\bar p_{X^n|x}(a^n)}{M_{X^n}(a^n)} $$
is well defined for $\bar p$-almost all $x$. This expectation can also be written as
$$\begin{aligned} H_{\bar p_x\|m}(X^n) &= -H_{\bar p_x}(X^n) - E_{\bar p_x}\Bigl[\ln m(X^k) + \sum_{i=k}^{n-1} \ln m(X_k|X^k)T^{i-k}\Bigr] \\ &= -H_{\bar p_x}(X^n) - E_{\bar p_x}[\ln m(X^k)] - (n-k)E_{\bar p_x}[\ln m(X_k|X^k)], \end{aligned}$$
where we have used the stationarity of the ergodic components. Dividing by $n$ and taking the limit as $n \to \infty$, the middle term goes to zero as previously and the remaining limits prove the middle equality and hence the rightmost equality in (7.27).
Equation (7.28) follows from (7.27) and $L^1(p)$ convergence, that is, since $n^{-1}h_n \to h$, we must also have that $E_p(n^{-1}h_n(X^n)) = n^{-1}H_{p\|m}(X^n)$ converges to $E_p h$. Since the former limit is $\bar H_{p\|m}(X)$, (7.28) follows. Since $\bar p_x$ is invariant (Theorem 1.8.2) and since expectations of invariant functions are the same under an AMS measure and its stationary mean (Lemma 6.3.1 of [50]), application of the previous results of the lemma to both $p$ and $\bar p$ proves that
$$ \bar H_{p\|m}(X) = \int dp(x)\, \bar H_{\bar p_x\|m}(X) = \int d\bar p(x)\, \bar H_{\bar p_x\|m}(X) = \bar H_{\bar p\|m}(X), $$
which proves (7.29) and completes the proof of the lemma. $\Box$
Corollary 7.5.1: Given p and m as in the Lemma, then the relative entropy rate of p with respect to m has an ergodic decomposition, that is,
$$ \bar H_{p\|m}(X) = \int dp(x)\, \bar H_{\bar p_x\|m}(X). $$
Proof: This follows immediately from (7.27) and (7.28). $\Box$
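A small numerical sketch of this decomposition, with assumed parameters: let $p$ be the stationary but non-ergodic equal mixture of two i.i.d. binary components with biases 0.2 and 0.8, and let $m$ be the i.i.d. fair-coin measure. Each ergodic component contributes its own i.i.d. divergence rate, and by the corollary (together with (7.28)) the per-symbol divergence of the mixture approaches the $p$-average of the component rates.

```python
# Sketch (assumed mixture source): the per-symbol divergence of a mixture of
# two i.i.d. Bernoulli components approaches the average of the component rates.
import math

def bern_divergence(theta):
    """D(Bernoulli(theta) || Bernoulli(1/2)) in nats."""
    return theta * math.log(2 * theta) + (1 - theta) * math.log(2 * (1 - theta))

def mixture_rate(n, a=0.2, b=0.8):
    """(1/n) D(P_{X^n} || M_{X^n}) for the equal mixture of iid(a) and iid(b)."""
    total = 0.0
    for k in range(n + 1):                       # k = number of ones in the block
        pk = 0.5 * a**k * (1 - a)**(n - k) + 0.5 * b**k * (1 - b)**(n - k)
        total += math.comb(n, k) * pk * (math.log(pk) + n * math.log(2))
    return total / n

component_avg = 0.5 * bern_divergence(0.2) + 0.5 * bern_divergence(0.8)
print("average of component rates:", component_avg)
for n in (1, 4, 16, 64):
    print(n, mixture_rate(n))                    # approaches the average
```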
Standard Alphabets
We now drop the finite alphabet assumption and suppose that $\{X_n\}$ is a standard alphabet process with process distributions $p$ and $m$, where $p$ is stationary, $m$ is $k$th order Markov with stationary transitions, and $M_{X^n} \gg P_{X^n}$ are the induced vector distributions for $n = 1, 2, \cdots$. Define the densities $f_n$ and entropy densities $h_n$ as previously.
As an easy consequence of the development to this point, the ergodic decomposition for divergence rate of finite alphabet processes combined with the definition of $H^*$ as a supremum over rates of quantized processes yields an extension of Corollary 6.2.1 to divergences. This yields other useful properties as summarized in the following corollary.
Corollary 7.5.1: Given a standard alphabet process $\{X_n\}$ suppose that $p$ and $m$ are two process distributions such that $p$ is AMS and $m$ is $k$th order Markov with stationary transitions and $M_{X^n} \gg P_{X^n}$ are the induced vector distributions. Let $\bar p$ denote the stationary mean of $p$ and let $\{\bar p_x\}$ denote the ergodic decomposition of the stationary mean $\bar p$. Then
$$ H^*_{p\|m}(X) = \int dp(x)\, H^*_{\bar p_x\|m}(X). \tag{7.31} $$
In addition,
$$ H^*_{p\|m}(X) = H^*_{\bar p\|m}(X) = \bar H_{\bar p\|m}(X) = \bar H_{p\|m}(X), \tag{7.32} $$
that is, the two definitions of relative entropy rate yield the same values for AMS $p$ and stationary transition Markov $m$, and both rates are the same as the corresponding rates for the stationary mean. Thus relative entropy rate has an ergodic decomposition in the sense that
$$ \bar H_{p\|m}(X) = \int dp(x)\, \bar H_{\bar p_x\|m}(X). \tag{7.33} $$
Comment: Note that the extra technical conditions of Theorem 6.4.2 for equality of the analogous mutual information rates $\bar I$ and $I^*$ are not needed here. Note also that only the ergodic decomposition of the stationary mean $\bar p$ of the AMS measure $p$ is considered and not that of the Markov source $m$.
Proof: The first statement follows as previously described from the finite alphabet result and the definition of $H^*$. The left-most and right-most equalities of (7.32) both follow from the previous lemma. The middle equality of (7.32) follows from Corollary 7.4.2. Eq. (7.33) then follows from (7.31) and (7.32). $\Box$
Theorem 7.5.1: Given a standard alphabet process $\{X_n\}$ suppose that $p$ and $m$ are two process distributions such that $p$ is AMS and $m$ is $k$th order Markov with stationary transitions and $M_{X^n} \gg P_{X^n}$ are the induced vector distributions. Let $\{\bar p_x\}$ denote the ergodic decomposition of the stationary mean $\bar p$. If
$$ \lim_{n\to\infty} \frac{1}{n} H_{p\|m}(X^n) = \bar H_{p\|m}(X) < \infty, $$
then there is an invariant function $h$ such that $n^{-1}h_n \to h$ in $L^1(p)$ as $n \to \infty$. In fact,
$$ h(x) = \bar H_{\bar p_x\|m}(X), $$
the relative entropy rate of the ergodic component $\bar p_x$ with respect to $m$. Thus, in particular, under the stated conditions the relative entropy densities $h_n$ are uniformly integrable with respect to $p$.
Proof: The proof exactly parallels that of Theorem 6.3.1, the mean ergodic theorem for information densities, with the relative entropy densities replacing the mutual information densities. The density is approximated by that of a quantized version and the integral bounded above using the triangle inequality. One term goes to zero from the finite alphabet case. Since $\bar H = H^*$ (Corollary 7.5.1) the remaining terms go to zero because the relative entropy rate can be approximated arbitrarily closely by that of a quantized process. $\Box$
It should be emphasized that although Theorem 7.5.1 and Theorem 6.3.1 are similar in appearance, neither result directly implies the other. It is true that mutual information can be considered as a special case of relative entropy, but given a pair process $\{X_n, Y_n\}$ we cannot in general find a $k$th order Markov distribution $m$ for which the mutual information rate $\bar I(X;Y)$ equals a relative entropy rate $\bar H_{p\|m}$. We will later consider conditions under which convergence of relative entropy densities does imply convergence of information densities.
Chapter 8
Ergodic Theorems for Densities
8.1 Introduction
This chapter is devoted to developing ergodic theorems first for relative entropy densities and then information densities for the general case of AMS processes with standard alphabets. The general results were first developed by Barron [9] using the martingale convergence theorem and a new martingale inequality. The similar results of Algoet and Cover [7] can be proved without direct recourse to martingale theory. They infer the result for the stationary Markov approximation and for the infinite order approximation from the ordinary ergodic theorem. They then demonstrate that the growth rate of the true density is asymptotically sandwiched between that for the $k$th order Markov approximation and the infinite order approximation and that no gap is left between these asymptotic upper and lower bounds in the limit as $k \to \infty$. They use martingale theory to show that the values between which the limiting density is sandwiched are arbitrarily close to each other, but we shall see that this is not necessary and this property follows from the results of Chapter 6.
8.2 Stationary Ergodic Sources
Theorem 8.2.1: Given a standard alphabet process $\{X_n\}$, suppose that $p$ and $m$ are two process distributions such that $p$ is stationary ergodic and $m$ is a $K$-step Markov source with stationary transition probabilities. Let $M_{X^n} \gg P_{X^n}$ be the vector distributions induced by $p$ and $m$. As before let
$$ h_n = \ln f_{X^n}(X^n) = \ln \frac{dP_{X^n}}{dM_{X^n}}(X^n). $$
Then with probability one under $p$
$$ \lim_{n\to\infty} \frac{1}{n} h_n = \bar H_{p\|m}(X). $$
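Before turning to the proof, here is a simulation sketch of what the theorem asserts, with assumed (illustrative) transition matrices: for a stationary ergodic Markov source $p$ and a Markov reference measure $m$ the relative entropy density reduces to a sum of log transition-probability ratios, and its per-symbol value along a single sample path settles down to the divergence rate. The initial distribution assigned to $m$ below (uniform) is also an assumption of the sketch; it only affects an $O(1/n)$ term.

```python
# Sketch (assumed chains): (1/n) h_n along a simulated path vs. the closed-form
# divergence rate, illustrating Theorem 8.2.1.
import numpy as np

rng = np.random.default_rng(0)
Pp = np.array([[0.8, 0.2], [0.3, 0.7]])    # source chain p
Pm = np.array([[0.5, 0.5], [0.6, 0.4]])    # reference chain m

# Stationary distribution of p and the closed-form rate
# H-bar_{p||m}(X) = sum_a pi(a) sum_b Pp[a,b] ln(Pp[a,b]/Pm[a,b]).
pi = np.ones(2) / 2
for _ in range(1000):
    pi = pi @ Pp
rate = float(np.sum(pi[:, None] * Pp * np.log(Pp / Pm)))

# Sample a long path from p (started in its stationary law) and accumulate
# the relative entropy density h_n = ln p(X^n) - ln m(X^n).
n = 100_000
x = rng.choice(2, p=pi)
h = np.log(pi[x] / 0.5)                    # m's initial symbol taken uniform (assumption)
for _ in range(1, n):
    y = rng.choice(2, p=Pp[x])
    h += np.log(Pp[x, y] / Pm[x, y])
    x = y
print("empirical (1/n) h_n:", h / n)
print("divergence rate    :", rate)         # the two should be close
```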
Proof: Let $p^{(k)}$ denote the $k$-step Markov approximation of $p$ as defined in Theorem 7.3.1, that is, $p^{(k)}$ has the same $k$th order conditional probabilities and $k$-dimensional initial distribution. From Corollary 7.3.1, if $k \ge K$, then (7.8)–(7.10) hold. Consider the expectation
$$ E_p\!\left( \frac{f^{(k)}_{X^n}(X^n)}{f_{X^n}(X^n)} \right) = E_{P_{X^n}}\!\left( \frac{f^{(k)}_{X^n}}{f_{X^n}} \right) = \int \left( \frac{f^{(k)}_{X^n}}{f_{X^n}} \right) dP_{X^n}. $$
Define the set $A_n = \{x^n : f_{X^n} > 0\}$; then $P_{X^n}(A_n) = 1$. Use the fact that $f_{X^n} = dP_{X^n}/dM_{X^n}$ to write
$$ E_P\!\left( \frac{f^{(k)}_{X^n}(X^n)}{f_{X^n}(X^n)} \right) = \int_{A_n} \left( \frac{f^{(k)}_{X^n}}{f_{X^n}} \right) f_{X^n}\, dM_{X^n} = \int_{A_n} f^{(k)}_{X^n}\, dM_{X^n}. $$
From Corollary 7.3.1,
$$ f^{(k)}_{X^n} = \frac{dP^{(k)}_{X^n}}{dM_{X^n}} $$
and therefore
$$ E_p\!\left( \frac{f^{(k)}_{X^n}(X^n)}{f_{X^n}(X^n)} \right) = \int_{A_n} \frac{dP^{(k)}_{X^n}}{dM_{X^n}}\, dM_{X^n} = P^{(k)}_{X^n}(A_n) \le 1. $$
Thus we can apply Lemma 5.4.2 to the sequence $f^{(k)}_{X^n}(X^n)/f_{X^n}(X^n)$ to conclude that with $p$-probability 1
$$ \limsup_{n\to\infty} \frac{1}{n} \ln \frac{f^{(k)}_{X^n}(X^n)}{f_{X^n}(X^n)} \le 0 $$
and hence
$$ \lim_{n\to\infty} \frac{1}{n} \ln f^{(k)}_{X^n}(X^n) \le \liminf_{n\to\infty} \frac{1}{n} \ln f_{X^n}(X^n). \tag{8.1} $$
The left-hand limit is well defined by the usual ergodic theorem:
$$ \lim_{n\to\infty} \frac{1}{n} \ln f^{(k)}_{X^n}(X^n) = \lim_{n\to\infty} \frac{1}{n} \sum_{l=k}^{n-1} \ln f_{X_l|X_{l-k}^k}(X_l|X_{l-k}^k) + \lim_{n\to\infty} \frac{1}{n} \ln f_{X^k}(X^k). $$
Since $0 < f_{X^k} < \infty$ with probability 1 under $M_{X^k}$ and hence also under $P_{X^k}$, then $0 < f_{X^k}(X^k) < \infty$ under $p$ and therefore $n^{-1} \ln f_{X^k}(X^k) \to 0$ as $n \to \infty$ with probability one. Furthermore, from the ergodic theorem for stationary and
ergodic processes (e.g., Theorem 7.2.1 of [50]), since p is stationary ergodic we have with probability one under p using (7.20) and Corollary 7.4.1 that
$$\begin{aligned} \lim_{n\to\infty} \frac{1}{n} \sum_{l=k}^{n-1} \ln f_{X_l|X_{l-k}^k}(X_l|X_{l-k}^k) &= \lim_{n\to\infty} \frac{1}{n} \sum_{l=k}^{n-1} \ln f_{X_0|X_{-1},\cdots,X_{-k}}(X_0|X_{-1},\cdots,X_{-k})T^l \\ &= E_p \ln f_{X_0|X_{-1},\cdots,X_{-k}}(X_0|X_{-1},\cdots,X_{-k}) \\ &= H_{p\|m}(X_0|X_{-1},\cdots,X_{-k}) = \bar H_{p^{(k)}\|m}(X). \end{aligned}$$
Thus with (8.1) we now have that
$$ \liminf_{n\to\infty} \frac{1}{n} \ln f_{X^n}(X^n) \ge H_{p\|m}(X_0|X_{-1},\cdots,X_{-k}) \tag{8.2} $$
for any positive integer k. Since m is Kth order Markov, Lemma 7.4.1 and the above imply that
$$ \liminf_{n\to\infty} \frac{1}{n} \ln f_{X^n}(X^n) \ge H_{p\|m}(X_0|X^-) = \bar H_{p\|m}(X), \tag{8.3} $$
which completes half of the sandwich proof of the theorem.
If $\bar H_{p\|m}(X) = \infty$, the proof is completed with (8.3). Hence we can suppose that $\bar H_{p\|m}(X) < \infty$. From Lemma 7.4.1 using the distribution $S_{X_0,X_{-1},X_{-2},\cdots}$ constructed there, we have that
$$ D(P_{X_0,X_{-1},\cdots}\,\|\,S_{X_0,X_{-1},\cdots}) = H_{p\|m}(X_0|X^-) = \int dP_{X_0,X^-} \ln f_{X_0|X^-}, $$
where
$$ f_{X_0|X^-} = \frac{dP_{X_0,X_{-1},\cdots}}{dS_{X_0,X_{-1},\cdots}}. $$
It should be pointed out that we have not (and will not) prove that $f_{X_0|X_{-1},\cdots,X_{-n}} \to f_{X_0|X^-}$; this convergence of conditional probability densities follows from the martingale convergence theorem and is the result about which most generalized Shannon-McMillan-Breiman theorems are built. (See, e.g., Barron [9].) We have proved, however, that the expectations converge (Lemma 7.4.1), which is what is needed to make the sandwich argument work.
For the second half of the sandwich proof we construct a measure $Q$ which will be dominated by $p$ on semi-infinite sequences using the above conditional densities given the infinite past. Define the semi-infinite sequence $X_n^- = \{\cdots, X_{n-1}\}$ for all nonnegative integers $n$. Let $\mathcal{B}_k^n = \sigma(X_k^n)$ and $\mathcal{B}_k^- = \sigma(X_k^-) = \sigma(\cdots, X_{k-1})$ be the $\sigma$-fields generated by the finite dimensional random vector $X_k^n$ and the semi-infinite sequence $X_k^-$, respectively. Let $Q$ be the process distribution having the same restriction to $\sigma(X_k^-)$ as does $p$ and the same restriction to
$\sigma(X_0, X_1, \cdots)$ as does $p$, but which makes $X^-$ and $X_k^n$ conditionally independent given $X^k$ for any $n$; that is,
$$ Q_{X_k^-} = P_{X_k^-}, $$
$$ Q_{X_k,X_{k+1},\cdots} = P_{X_k,X_{k+1},\cdots}, $$
and $X^- \to X^k \to X_k^n$ is a Markov chain for all positive integers $n$ so that
$$ Q(X_k^n \in F \mid X_k^-) = Q(X_k^n \in F \mid X^k). $$
The measure $Q$ is a (nonstationary) $k$-step Markov approximation to $P$ in the sense of Section 5.3 and
$$ Q = P_{X^- \times (X_k,X_{k+1},\cdots)\mid X^k} $$
(in contrast to $P = P_{X^- X^k X_k^\infty}$). Observe that $X^- \to X^k \to X_k^n$ is a Markov chain under both $Q$ and $m$.
By assumption, $H_{p\|m}(X_0|X^-) < \infty$ and hence from Corollary 7.4.1
$$ H_{p\|m}(X_k^n \mid X_k^-) = n H_{p\|m}(X_0|X^-) < \infty $$
and hence from Theorem 5.3.2 the density $f_{X_k^n|X_k^-}$ is well-defined as
$$ f_{X_k^n|X_k^-} = \frac{dP_{X_{n+k}^-}}{dS_{X_{n+k}^-}}, \tag{8.4} $$
where
$$ S_{X_{n+k}^-} = M_{X_k^n|X^k}\, P_{X_k^-}, \tag{8.5} $$
and
$$ \int dP_{X_{n+k}^-} \ln f_{X_k^n|X_k^-} = D(P_{X_{n+k}^-}\,\|\,S_{X_{n+k}^-}) = n H_{p\|m}(X_0|X^-) < \infty. \tag{8.6} $$
Thus, in particular,
$$ S_{X_{n+k}^-} \gg P_{X_{n+k}^-}. $$
Consider now the sequence of ratios of conditional densities
$$ \zeta_n = \frac{f_{X_k^n|X^k}(X^{n+k})}{f_{X_k^n|X_k^-}(X_{n+k}^-)}. $$
We have that
$$ \int \zeta_n\, dp = \int_{G_n} \zeta_n \cdots $$