
Information Theory / Cover T.M., Thomas J.A. Elements of Information Theory. 2006., 748p


 

 

 

 

 

 

 

 

 

 

 

16.7 UNIVERSAL PORTFOLIOS

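find by differentiating with respect to $b$ that the maximum value

\begin{align}
w^*(j^n) &= \max_{0 \le b \le 1} b^k (1-b)^{n-k} \tag{16.111} \\
&= \left(\frac{k}{n}\right)^{k} \left(\frac{n-k}{n}\right)^{n-k}, \tag{16.112}
\end{align}

which is achieved by

$$b^* = \left(\frac{k}{n}, \frac{n-k}{n}\right). \tag{16.113}$$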
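The maximization in (16.111) and (16.112) can be spot-checked numerically. The following is a minimal sketch, not part of the original text: the function names and the grid resolution are illustrative assumptions, and the grid search is only a brute-force stand-in for the differentiation argument.

```python
def hindsight_weight(b: float, k: int, n: int) -> float:
    """Weight b^k (1-b)^(n-k) that the portfolio (b, 1-b) puts on a sequence with k ones."""
    return b ** k * (1.0 - b) ** (n - k)


def max_weight_closed_form(k: int, n: int) -> float:
    """Closed form (16.112), evaluated at b = k/n (Python treats 0.0 ** 0 as 1.0)."""
    return hindsight_weight(k / n, k, n)


def max_weight_grid(k: int, n: int, steps: int = 10_000) -> float:
    """Brute-force maximum over a uniform grid on [0, 1] (illustrative resolution)."""
    return max(hindsight_weight(i / steps, k, n) for i in range(steps + 1))


if __name__ == "__main__":
    n = 10
    for k in range(n + 1):
        print(k, max_weight_closed_form(k, n), max_weight_grid(k, n))
```

For each $k$ the two columns should agree to within the grid resolution, which is what (16.113) predicts.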

 

 

 

 

 

 

 

 

 

 

n

 

n

 

 

 

 

 

 

Note that $\sum_{j^n} w^*(j^n) > 1$, reflecting the fact that the amount “bet” on $j^n$ is chosen in hindsight, thus relieving the hindsight investor of the responsibility of allocating his investments $w^*(j^n)$ to sum to 1. The causal investor has no such luxury. How can the causal investor choose initial investments $\hat{w}(j^n)$, $\sum_{j^n} \hat{w}(j^n) = 1$, to protect himself from all possible $j^n$ and hindsight-determined $w^*(j^n)$? The answer will be to choose $\hat{w}(j^n)$ proportional to $w^*(j^n)$. Then the worst-case ratio of $\hat{w}(j^n)/w^*(j^n)$ will be maximized. To proceed, we define $V_n$ by

 

 

 

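\begin{align}
\frac{1}{V_n} &= \sum_{j^n} \left(\frac{k(j^n)}{n}\right)^{k(j^n)} \left(\frac{n-k(j^n)}{n}\right)^{n-k(j^n)} \tag{16.114} \\
&= \sum_{k=0}^{n} \binom{n}{k} \left(\frac{k}{n}\right)^{k} \left(\frac{n-k}{n}\right)^{n-k} \tag{16.115}
\end{align}

and let

$$\hat{w}(j^n) = V_n \left(\frac{k(j^n)}{n}\right)^{k(j^n)} \left(\frac{n-k(j^n)}{n}\right)^{n-k(j^n)}. \tag{16.116}$$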
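Definitions (16.114) to (16.116) are easy to evaluate for small $n$. The sketch below is an illustration under assumptions: the function names are hypothetical, stocks are labeled 1 and 2, and the full enumeration of all $2^n$ sequences is only practical for small $n$.

```python
from itertools import product
from math import comb


def w_star(k: int, n: int) -> float:
    """Hindsight weight (16.112) for a sequence containing k ones."""
    return (k / n) ** k * ((n - k) / n) ** (n - k)


def v_n(n: int) -> float:
    """V_n from (16.115): reciprocal of sum_k C(n, k) (k/n)^k ((n-k)/n)^(n-k)."""
    return 1.0 / sum(comb(n, k) * w_star(k, n) for k in range(n + 1))


def w_hat(jn: tuple) -> float:
    """Causal weight (16.116) on the sequence jn, a tuple with entries in {1, 2}."""
    n = len(jn)
    return v_n(n) * w_star(jn.count(1), n)


if __name__ == "__main__":
    n = 6
    sequences = list(product((1, 2), repeat=n))
    print("sum of hindsight weights:", sum(w_star(j.count(1), n) for j in sequences))
    print("sum of causal weights:   ", sum(w_hat(j) for j in sequences))
```

The first sum exceeds 1, while the second is 1 up to rounding, since $V_n$ is exactly the normalizing constant.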

 

 

 

 

 

 

 

n

 

 

 

n

 

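It is clear that $\hat{w}(j^n)$ is a legitimate distribution of wealth over the $2^n$ stock sequences (i.e., $\hat{w}(j^n) \ge 0$ and $\sum_{j^n} \hat{w}(j^n) = 1$). Here $V_n$ is the normalization factor that makes $\hat{w}(j^n)$ a probability mass function. Also, from (16.109) and (16.113), for all sequences $x^n$,

\begin{align}
\frac{\hat{S}_n(x^n)}{S_n^*(x^n)} &\ge \min_{j^n} \frac{\hat{w}(j^n)}{w^*(j^n)} \tag{16.117} \\
&= \min_{k} \frac{V_n \left(\frac{k}{n}\right)^{k} \left(\frac{n-k}{n}\right)^{n-k}}{b^{*k} (1-b^*)^{n-k}} \tag{16.118} \\
&\ge V_n, \tag{16.119}
\end{align}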

636 INFORMATION THEORY AND PORTFOLIO THEORY

where (16.117) follows from (16.109) and (16.119) follows from (16.112). Consequently, we have

 

 

$$\max_{\hat{b}} \min_{x^n} \frac{\hat{S}_n(x^n)}{S_n^*(x^n)} \ge V_n. \tag{16.120}$$

 

 

 

 

 

We have thus demonstrated a portfolio on the $2^n$ possible sequences of length $n$ that achieves wealth $\hat{S}_n(x^n)$ within a factor $V_n$ of the wealth $S_n^*(x^n)$ achieved by the best constant rebalanced portfolio in hindsight. To complete the proof of the theorem, we show that this is the best possible, that is, that any nonanticipating portfolio $b_i(x^{i-1})$ cannot do better than a factor $V_n$ in the worst case (i.e., for the worst choice of $x^n$). To prove this, we construct a set of extremal stock market sequences and show that the performance of any nonanticipating portfolio strategy is bounded by $V_n$ for at least one of these sequences, proving the worst-case bound.

For each $j^n \in \{1, 2\}^n$, we define the corresponding extremal stock market vector $x^n(j^n)$ as

$$x_i(j_i) = \begin{cases} (1, 0)^t & \text{if } j_i = 1, \\ (0, 1)^t & \text{if } j_i = 2. \end{cases} \tag{16.121}$$

 

 

Let $e_1 = (1, 0)^t$, $e_2 = (0, 1)^t$ be standard basis vectors. Let

$$\mathcal{K} = \{x(j^n) : j^n \in \{1, 2\}^n,\ x_i(j_i) = e_{j_i}\} \tag{16.122}$$

be the set of extremal sequences. There are $2^n$ such extremal sequences, and for each sequence at each time, there is only one stock that yields a nonzero return. The wealth invested in the other stock is lost. Therefore, the wealth at the end of $n$ periods for extremal sequence $x^n(j^n)$ is the product of the amounts invested in the stocks $j_1, j_2, \ldots, j_n$ [i.e., $S_n(x^n(j^n)) = \prod_i b_{j_i} = w(j^n)$]. Again, we can view this as an investment on sequences of length $n$, and given the 0–1 nature of the return, it is easy to see for $x^n \in \mathcal{K}$ that

$$\sum_{j^n} S_n(x^n(j^n)) = 1. \tag{16.123}$$

For any extremal sequence $x^n(j^n) \in \mathcal{K}$, the best constant rebalanced portfolio is

$$b^*(x^n(j^n)) = \left(\frac{n_1(j^n)}{n}, \frac{n_2(j^n)}{n}\right)^t, \tag{16.124}$$

where $n_1(j^n)$ is the number of occurrences of 1 in the sequence $j^n$ and $n_2(j^n) = n - n_1(j^n)$ is the number of occurrences of 2. The corresponding wealth at the end of $n$ periods is

$$S_n^*(x^n(j^n)) = \left(\frac{n_1(j^n)}{n}\right)^{n_1(j^n)} \left(\frac{n_2(j^n)}{n}\right)^{n_2(j^n)} = \frac{\hat{w}(j^n)}{V_n}, \tag{16.125}$$

from (16.116) and it therefore follows that

$$\sum_{x^n \in \mathcal{K}} S_n^*(x^n) = \frac{1}{V_n} \sum_{j^n} \hat{w}(j^n) = \frac{1}{V_n}. \tag{16.126}$$

 

 

 

 

 

 

 

 

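We then have the following inequality for any portfolio sequence $\{b_i\}_{i=1}^n$, with $S_n(x^n)$ defined as in (16.104):

\begin{align}
\min_{x^n \in \mathcal{K}} \frac{S_n(x^n)}{S_n^*(x^n)} &\le \sum_{\tilde{x}^n \in \mathcal{K}} \frac{S_n^*(\tilde{x}^n)}{\sum_{x^n \in \mathcal{K}} S_n^*(x^n)} \cdot \frac{S_n(\tilde{x}^n)}{S_n^*(\tilde{x}^n)} \tag{16.127} \\
&= \sum_{\tilde{x}^n \in \mathcal{K}} \frac{S_n(\tilde{x}^n)}{\sum_{x^n \in \mathcal{K}} S_n^*(x^n)} \tag{16.128} \\
&= \frac{1}{\sum_{x^n \in \mathcal{K}} S_n^*(x^n)} \tag{16.129} \\
&= V_n, \tag{16.130}
\end{align}

where the inequality follows from the fact that the minimum is less than the average. Thus,

$$\max_{b} \min_{x^n \in \mathcal{K}} \frac{S_n(x^n)}{S_n^*(x^n)} \le V_n. \tag{16.131}$$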
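The converse argument can be illustrated on the extremal sequences by brute force. In this sketch (an illustration with hypothetical function names, assuming two stocks labeled 1 and 2), a strategy is represented only by the wealth it ends with on each extremal sequence $x^n(j^n)$; by (16.123) these wealths must sum to 1 for a legitimate strategy.

```python
from itertools import product
from math import comb


def best_crp_wealth(jn: tuple) -> float:
    """S_n^* on the extremal sequence x^n(jn), following (16.125)."""
    n, n1 = len(jn), jn.count(1)
    return (n1 / n) ** n1 * ((n - n1) / n) ** (n - n1)


def v_n(n: int) -> float:
    """V_n from (16.115)."""
    return 1.0 / sum(
        comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k) for k in range(n + 1)
    )


def worst_case_ratio(wealth_on, n: int) -> float:
    """Minimum over the extremal set of S_n / S_n^*, as on the left of (16.127)."""
    return min(wealth_on(jn) / best_crp_wealth(jn) for jn in product((1, 2), repeat=n))


def uniform_wealth(jn: tuple) -> float:
    """Wealth of the constant (1/2, 1/2) portfolio on x^n(jn): 2^(-n) on every sequence."""
    return 0.5 ** len(jn)


def minimax_wealth(jn: tuple) -> float:
    """Wealth of the strategy that bets w_hat(jn) from (16.116) on x^n(jn)."""
    return v_n(len(jn)) * best_crp_wealth(jn)


if __name__ == "__main__":
    n = 5
    print("V_n              :", v_n(n))
    print("uniform strategy :", worst_case_ratio(uniform_wealth, n))
    print("w_hat strategy   :", worst_case_ratio(minimax_wealth, n))
```

The uniform strategy falls below $V_n$ in the worst case, while the $\hat{w}$ strategy attains $V_n$ on every extremal sequence, consistent with (16.131).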

 

 

 

The strategy described in the theorem puts mass on all sequences of length $n$ and is clearly dependent on $n$. We can recast the strategy in incremental terms: the amount bet on stock 1 and stock 2 at time 1, then, conditional on the outcome at time 1, the amount bet on each of the two stocks at time 2, and so on. Consider the weight $\hat{b}_{i,1}$ assigned by the algorithm to stock 1 at time $i$ given the previous sequence of stock vectors $x^{i-1}$. We can calculate this by summing over all sequences $j^n$ that have a 1 in position $i$, giving

$$\hat{b}_{i,1}(x^{i-1}) = \frac{\sum_{j^{i-1} \in M^{i-1}} \hat{w}(j^{i-1} 1)\, x(j^{i-1})}{\sum_{j^{i} \in M^{i}} \hat{w}(j^{i})\, x(j^{i-1})}, \tag{16.132}$$

where

$$\hat{w}(j^i) = \sum_{j^n : j^i \subseteq j^n} w(j^n) \tag{16.133}$$

is the weight put on all sequences $j^n$ that start with $j^i$, and

$$x(j^{i-1}) = \prod_{k=1}^{i-1} x_{k j_k} \tag{16.134}$$

is the return on those sequences as defined in (16.106).
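The incremental form (16.132) to (16.134) can be checked against the sequence-level description for a short horizon. The sketch below is illustrative only: the function names are hypothetical, the index set is taken to be $\{1, 2\}$, the weights summed in (16.133) are taken to be the $\hat{w}(j^n)$ of (16.116), and all returns are assumed strictly positive so that the denominator is nonzero. The enumeration is exponential in $n$.

```python
from itertools import product
from math import comb, prod


def v_n(n: int) -> float:
    """V_n from (16.115)."""
    return 1.0 / sum(
        comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k) for k in range(n + 1)
    )


def w_hat_full(jn: tuple) -> float:
    """Weight (16.116) on a full-length sequence jn with entries in {1, 2}."""
    n, k = len(jn), jn.count(1)
    return v_n(n) * (k / n) ** k * ((n - k) / n) ** (n - k)


def w_hat_prefix(prefix: tuple, n: int) -> float:
    """(16.133): total weight of all length-n sequences that start with prefix."""
    rest = n - len(prefix)
    return sum(w_hat_full(prefix + tail) for tail in product((1, 2), repeat=rest))


def seq_return(prefix: tuple, xs) -> float:
    """(16.134): product of x_{k, j_k} along the prefix (stock j is column j - 1)."""
    return prod(xs[k][j - 1] for k, j in enumerate(prefix))


def b_hat_1(xs_past, n: int) -> float:
    """(16.132): fraction placed in stock 1 at time i = len(xs_past) + 1."""
    num = den = 0.0
    for prefix in product((1, 2), repeat=len(xs_past)):
        r = seq_return(prefix, xs_past)
        w1 = w_hat_prefix(prefix + (1,), n)
        w2 = w_hat_prefix(prefix + (2,), n)
        num += w1 * r
        den += (w1 + w2) * r
    return num / den


def causal_wealth(xs) -> float:
    """Wealth from investing period by period with the incremental weights."""
    n, wealth = len(xs), 1.0
    for i in range(n):
        b1 = b_hat_1(xs[:i], n)
        wealth *= b1 * xs[i][0] + (1.0 - b1) * xs[i][1]
    return wealth


def mixture_wealth(xs) -> float:
    """Sequence-level wealth: sum over j^n of w_hat(j^n) x(j^n)."""
    return sum(w_hat_full(jn) * seq_return(jn, xs) for jn in product((1, 2), repeat=len(xs)))
```

For any positive return sequence `xs` (a list of pairs), `causal_wealth(xs)` and `mixture_wealth(xs)` should agree up to rounding, because the period-by-period factors telescope.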

 

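Investigation of the asymptotics of $V_n$ reveals [401, 496] that

$$V_n \sim \left(\sqrt{\frac{2}{n}}\right)^{m-1} \frac{\Gamma(m/2)}{\sqrt{\pi}} \tag{16.135}$$

for $m$ assets. In particular, for $m = 2$ assets,

$$V_n \sim \sqrt{\frac{2}{\pi n}} \tag{16.136}$$

and

$$\frac{1}{2\sqrt{n+1}} \le V_n \le \frac{2}{\sqrt{n+1}} \tag{16.137}$$

for all $n$ [400]. Consequently, for $m = 2$ stocks, the causal portfolio strategy $\hat{b}_i(x^{i-1})$ given in (16.132) achieves wealth $\hat{S}_n(x^n)$ such that

$$\frac{\hat{S}_n(x^n)}{S_n^*(x^n)} \ge V_n \ge \frac{1}{2\sqrt{n+1}} \tag{16.138}$$

for all market sequences $x^n$.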
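The bounds (16.137) and the asymptotic form (16.136) for $m = 2$ can be tabulated directly from (16.115). This is a small sketch with hypothetical function names; floating-point evaluation as written is assumed adequate only for moderate $n$ (up to several hundred), since the binomial coefficients are converted to floats.

```python
from math import comb, pi, sqrt


def v_n(n: int) -> float:
    """V_n for m = 2 from (16.115)."""
    return 1.0 / sum(
        comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k) for k in range(n + 1)
    )


def within_bounds(n: int) -> bool:
    """Check 1 / (2 sqrt(n+1)) <= V_n <= 2 / sqrt(n+1), i.e. (16.137)."""
    return 1.0 / (2.0 * sqrt(n + 1)) <= v_n(n) <= 2.0 / sqrt(n + 1)


def asymptotic_v_n(n: int) -> float:
    """The approximation sqrt(2 / (pi n)) from (16.136)."""
    return sqrt(2.0 / (pi * n))


if __name__ == "__main__":
    for n in (1, 10, 100, 400):
        print(n, v_n(n), asymptotic_v_n(n), within_bounds(n))
```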

16.7.2 Horizon-Free Universal Portfolios

We describe the horizon-free strategy in terms of a weighting of different portfolio strategies. As described earlier, each constantly rebalanced portfolio $b$ can be viewed as corresponding to a mutual fund that rebalances the $m$ assets according to $b$. Initially, we distribute the wealth among these funds according to a distribution $\mu(b)$, where $d\mu(b)$ is the amount of wealth invested in portfolios in the neighborhood $db$ of the constantly rebalanced portfolio $b$.

Let

$$S_n(b, x^n) = \prod_{i=1}^{n} b^t x_i \tag{16.139}$$

be the wealth generated by a constant rebalanced portfolio $b$ on the stock sequence $x^n$. Recall that

$$S_n^*(x^n) = \max_{b \in \mathcal{B}} S_n(b, x^n) \tag{16.140}$$

is the wealth of the best constant rebalanced portfolio in hindsight. We investigate the causal portfolio defined by

$$\hat{b}_{i+1}(x^i) = \frac{\int_{\mathcal{B}} b\, S_i(b, x^i)\, d\mu(b)}{\int_{\mathcal{B}} S_i(b, x^i)\, d\mu(b)}. \tag{16.141}$$

We note that

\begin{align}
\hat{b}_{i+1}^t(x^i)\, x_{i+1} &= \frac{\int_{\mathcal{B}} b^t x_{i+1}\, S_i(b, x^i)\, d\mu(b)}{\int_{\mathcal{B}} S_i(b, x^i)\, d\mu(b)} \tag{16.142} \\
&= \frac{\int_{\mathcal{B}} S_{i+1}(b, x^{i+1})\, d\mu(b)}{\int_{\mathcal{B}} S_i(b, x^i)\, d\mu(b)}. \tag{16.143}
\end{align}

Thus, the product $\prod_i \hat{b}_i^t x_i$ telescopes and we see that the wealth $\hat{S}_n(x^n)$ resulting from this portfolio is given by

\begin{align}
\hat{S}_n(x^n) &= \prod_{i=1}^{n} \hat{b}_i^t(x^{i-1})\, x_i \tag{16.144} \\
&= \int_{b \in \mathcal{B}} S_n(b, x^n)\, d\mu(b). \tag{16.145}
\end{align}

There is another way to interpret (16.145). The amount given to portfolio manager $b$ is $d\mu(b)$, the resulting growth factor for the manager rebalancing to $b$ is $S_n(b, x^n)$, and the total wealth of this batch of investments is

$$\hat{S}_n(x^n) = \int_{\mathcal{B}} S_n(b, x^n)\, d\mu(b). \tag{16.146}$$

Then $\hat{b}_{i+1}$, defined in (16.141), is the performance-weighted total “buy order” of the individual portfolio managers $b$.
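The telescoping in (16.142) to (16.145) holds for any prior $\mu$, so it can be illustrated with a discrete one. The sketch below is an assumption-laden illustration, not the construction used in the text: $\mu$ is replaced by point masses `mu` on a finite grid `grid` of values of $b$ (the fraction in stock 1, with $m = 2$), and all names are hypothetical.

```python
from math import prod


def crp_wealth(b: float, xs) -> float:
    """S_n(b, x^n) of (16.139) for two stocks, with b the fraction held in stock 1."""
    return prod(b * x1 + (1.0 - b) * x2 for x1, x2 in xs)


def universal_b(xs_past, grid, mu) -> float:
    """Discrete analogue of (16.141): performance-weighted average of b."""
    weights = [m * crp_wealth(b, xs_past) for b, m in zip(grid, mu)]
    return sum(b * w for b, w in zip(grid, weights)) / sum(weights)


def universal_wealth(xs, grid, mu) -> float:
    """Left side of (16.144): invest period by period using universal_b."""
    wealth = 1.0
    for i, (x1, x2) in enumerate(xs):
        b = universal_b(xs[:i], grid, mu)
        wealth *= b * x1 + (1.0 - b) * x2
    return wealth


def mixture_wealth(xs, grid, mu) -> float:
    """Discrete analogue of (16.145): mu-weighted average of the fund wealths."""
    return sum(m * crp_wealth(b, xs) for b, m in zip(grid, mu))


if __name__ == "__main__":
    grid = [i / 20 for i in range(21)]
    mu = [1.0 / len(grid)] * len(grid)  # uniform prior, an illustrative choice
    xs = [(1.2, 0.9), (0.7, 1.1), (1.05, 1.0), (0.9, 1.3), (1.1, 0.95)]
    print(universal_wealth(xs, grid, mu), mixture_wealth(xs, grid, mu))
```

The two printed numbers should coincide up to rounding, which is the "mutual funds" interpretation of (16.146): the causal portfolio simply tracks the pooled holdings of all the fund managers.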


So far, we have not specified what distribution µ(b) we use to apportion the initial wealth. We now use a distribution µ that puts mass on all possible portfolios, so that we approximate the performance of the best portfolio for the actual distribution of stock price vectors.

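In the next lemma, we bound $\hat{S}_n/S_n^*$ as a function of the initial wealth distribution $\mu(b)$.

Lemma 16.7.2 Let $S_n^*(x^n)$ in (16.140) be the wealth achieved by the best constant rebalanced portfolio and let $\hat{S}_n(x^n)$ in (16.144) be the wealth achieved by the universal mixed portfolio $\hat{b}(\cdot)$, given by

$$\hat{b}_{i+1}(x^i) = \frac{\int b\, S_i(b, x^i)\, d\mu(b)}{\int S_i(b, x^i)\, d\mu(b)}. \tag{16.147}$$

Then

$$\frac{\hat{S}_n(x^n)}{S_n^*(x^n)} \ge \min_{j^n} \frac{\int_{\mathcal{B}} \prod_{i=1}^{n} b_{j_i}\, d\mu(b)}{\prod_{i=1}^{n} b^*_{j_i}}. \tag{16.148}$$

Proof: As before, we can write

$$S_n^*(x^n) = \sum_{j^n} w^*(j^n)\, x(j^n), \tag{16.149}$$

where $w^*(j^n) = \prod_{i=1}^{n} b^*_{j_i}$ is the amount invested on the sequence $j^n$ and $x(j^n) = \prod_{i=1}^{n} x_{i j_i}$ is the corresponding return. Similarly, we can write

\begin{align}
\hat{S}_n(x^n) &= \int \prod_{i=1}^{n} b^t x_i\, d\mu(b) \tag{16.150} \\
&= \sum_{j^n} \int \prod_{i=1}^{n} b_{j_i} x_{i j_i}\, d\mu(b) \tag{16.151} \\
&= \sum_{j^n} \hat{w}(j^n)\, x(j^n), \tag{16.152}
\end{align}

where $\hat{w}(j^n) = \int \prod_{i=1}^{n} b_{j_i}\, d\mu(b)$. Now applying Lemma 16.7.1, we have

\begin{align}
\frac{\hat{S}_n(x^n)}{S_n^*(x^n)} &= \frac{\sum_{j^n} \hat{w}(j^n)\, x(j^n)}{\sum_{j^n} w^*(j^n)\, x(j^n)} \tag{16.153} \\
&\ge \min_{j^n} \frac{\hat{w}(j^n)\, x(j^n)}{w^*(j^n)\, x(j^n)} \tag{16.154} \\
&= \min_{j^n} \frac{\int_{\mathcal{B}} \prod_{i=1}^{n} b_{j_i}\, d\mu(b)}{\prod_{i=1}^{n} b^*_{j_i}}. \tag{16.155}
\end{align}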
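The chain (16.153) to (16.155) uses only the ratio-of-sums bound of Lemma 16.7.1, so the same inequality holds with any fixed comparator portfolio in place of $b^*$. The sketch below checks it under assumptions that are not in the text: two stocks, a discrete prior (`grid`, `mu`) standing in for $\mu$, strictly positive returns, and a comparator `b_comp` strictly between 0 and 1. All names are hypothetical.

```python
from math import prod


def crp_wealth(b: float, xs) -> float:
    """S_n(b, x^n) for two stocks, with b the fraction held in stock 1."""
    return prod(b * x1 + (1.0 - b) * x2 for x1, x2 in xs)


def mixture_wealth(xs, grid, mu) -> float:
    """Discrete analogue of (16.150): mu-weighted average of the fund wealths."""
    return sum(m * crp_wealth(b, xs) for b, m in zip(grid, mu))


def mixture_sequence_weight(l: int, n: int, grid, mu) -> float:
    """w_hat(j^n) = sum over the grid of mu * b^l (1-b)^(n-l), for j^n with l ones."""
    return sum(m * b ** l * (1.0 - b) ** (n - l) for b, m in zip(grid, mu))


def lemma_lower_bound(b_comp: float, n: int, grid, mu) -> float:
    """Right side of (16.155) with b_comp as comparator; depends on j^n only via l."""
    return min(
        mixture_sequence_weight(l, n, grid, mu) / (b_comp ** l * (1.0 - b_comp) ** (n - l))
        for l in range(n + 1)
    )


if __name__ == "__main__":
    grid = [i / 20 for i in range(21)]
    mu = [1.0 / len(grid)] * len(grid)
    xs = [(1.2, 0.9), (0.7, 1.1), (1.05, 1.0), (0.9, 1.3), (1.1, 0.95)]
    for b_comp in (0.2, 0.5, 0.7):
        ratio = mixture_wealth(xs, grid, mu) / crp_wealth(b_comp, xs)
        print(b_comp, ratio, lemma_lower_bound(b_comp, len(xs), grid, mu))
```

In each row the realized ratio should be at least the bound in the last column.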

 

 


We now apply this lemma when $\mu(b)$ is the Dirichlet$(\frac{1}{2})$ distribution.

Theorem 16.7.2 For the causal universal portfolio $\hat{b}_i(\cdot)$, $i = 1, 2, \ldots$, given in (16.141), with $m = 2$ stocks and $d\mu(b)$ the Dirichlet$(\frac{1}{2}, \frac{1}{2})$ distribution, we have

$$\frac{\hat{S}_n(x^n)}{S_n^*(x^n)} \ge \frac{1}{2\sqrt{n+1}},$$

for all $n$ and all stock sequences $x^n$.

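Proof: As in the discussion preceding (16.112), we can show that the weight put by the best constant portfolio $b^*$ on the sequence $j^n$ is

$$\prod_{i=1}^{n} b^*_{j_i} = \left(\frac{k}{n}\right)^{k} \left(\frac{n-k}{n}\right)^{n-k} = 2^{-nH(k/n)}, \tag{16.156}$$

where $k$ is the number of indices where $j_i = 1$. We can also explicitly calculate the integral in the numerator of (16.148) in Lemma 16.7.2 for the Dirichlet$(\frac{1}{2})$ density, defined for $m$ variables as

$$d\mu(b) = \frac{\Gamma\left(\frac{m}{2}\right)}{\left[\Gamma\left(\frac{1}{2}\right)\right]^{m}} \prod_{j=1}^{m} b_j^{-\frac{1}{2}}\, db, \tag{16.157}$$

where $\Gamma(x) = \int_0^\infty e^{-t} t^{x-1}\, dt$ denotes the gamma function. For simplicity, we consider the case of two stocks, in which case

$$d\mu(b) = \frac{1}{\pi} \frac{1}{\sqrt{b(1-b)}}\, db, \qquad 0 \le b \le 1, \tag{16.158}$$

where $b$ is the fraction of wealth invested in stock 1. Now consider any sequence $j^n \in \{1, 2\}^n$, and consider the amount invested in that sequence,

$$b(j^n) = \prod_{i=1}^{n} b_{j_i} = b^{l} (1-b)^{n-l}, \tag{16.159}$$

where $l$ is the number of indices where $j_i = 1$. Then

\begin{align}
\int b(j^n)\, d\mu(b) &= \int b^{l} (1-b)^{n-l}\, \frac{1}{\pi} \frac{1}{\sqrt{b(1-b)}}\, db \tag{16.160} \\
&= \frac{1}{\pi} \int b^{l-\frac{1}{2}} (1-b)^{n-l-\frac{1}{2}}\, db \tag{16.161} \\
&= \frac{1}{\pi} B\left(l + \frac{1}{2},\, n - l + \frac{1}{2}\right), \tag{16.162}
\end{align}

where $B(\lambda_1, \lambda_2)$ is the beta function, defined as

\begin{align}
B(\lambda_1, \lambda_2) &= \int_0^1 x^{\lambda_1 - 1} (1-x)^{\lambda_2 - 1}\, dx \tag{16.163} \\
&= \frac{\Gamma(\lambda_1)\, \Gamma(\lambda_2)}{\Gamma(\lambda_1 + \lambda_2)} \tag{16.164}
\end{align}

and

$$\Gamma(\lambda) = \int_0^\infty x^{\lambda - 1} e^{-x}\, dx. \tag{16.165}$$

Note that for any integer $n \ge 0$, $\Gamma(n+1) = n!$ and $\Gamma\left(n + \frac{1}{2}\right) = \frac{1 \cdot 3 \cdot 5 \cdots (2n-1)}{2^n} \sqrt{\pi}$.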

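We can calculate $B\left(l + \frac{1}{2},\, n - l + \frac{1}{2}\right)$ by means of simple recursion using integration by parts. Alternatively, using (16.164), we obtain

$$B\left(l + \frac{1}{2},\, n - l + \frac{1}{2}\right) = \frac{\pi}{2^{2n}} \frac{\binom{2n}{n} \binom{n}{l}}{\binom{2n}{2l}}. \tag{16.166}$$

Combining all the results with Lemma 16.7.2, we have

\begin{align}
\frac{\hat{S}_n(x^n)}{S_n^*(x^n)} &\ge \min_{j^n} \frac{\int_{\mathcal{B}} \prod_{i=1}^{n} b_{j_i}\, d\mu(b)}{\prod_{i=1}^{n} b^*_{j_i}} \tag{16.167} \\
&\ge \min_{l} \frac{\frac{1}{\pi} B\left(l + \frac{1}{2},\, n - l + \frac{1}{2}\right)}{2^{-nH(l/n)}} \tag{16.168} \\
&\ge \frac{1}{2\sqrt{n+1}}, \tag{16.169}
\end{align}

using the results in [135, Theorem 2].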
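The beta-function identity and the final bound in the chain above can be spot-checked numerically. The sketch below is illustrative: the function names are hypothetical, the range of $n$ is arbitrary, and the check is a sanity test rather than a substitute for the cited result.

```python
from math import comb, gamma, log2, pi, sqrt


def beta_half(l: int, n: int) -> float:
    """B(l + 1/2, n - l + 1/2) via the gamma-function form (16.164)."""
    return gamma(l + 0.5) * gamma(n - l + 0.5) / gamma(n + 1)


def beta_half_binomial(l: int, n: int) -> float:
    """The same quantity via the binomial-coefficient expression."""
    return pi / 2 ** (2 * n) * comb(2 * n, n) * comb(n, l) / comb(2 * n, 2 * l)


def binary_entropy(p: float) -> float:
    """H(p) in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def dirichlet_ratio(l: int, n: int) -> float:
    """(1/pi) B(l + 1/2, n - l + 1/2) / 2^(-n H(l/n)), the quantity minimized over l."""
    return beta_half(l, n) / pi * 2.0 ** (n * binary_entropy(l / n))


if __name__ == "__main__":
    n = 20
    worst = min(dirichlet_ratio(l, n) for l in range(n + 1))
    print("min over l:", worst, " bound:", 1.0 / (2.0 * sqrt(n + 1)))
```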

It follows for $m = 2$ stocks that

$$\frac{\hat{S}_n}{S_n^*} \ge \frac{1}{\sqrt{2\pi}} V_n \tag{16.170}$$

for all $n$ and all market sequences $x_1, x_2, \ldots, x_n$. Thus, good minimax performance for all $n$ costs at most an extra factor $\sqrt{2\pi}$ over the fixed horizon minimax portfolio. The cost of universality is $V_n$, which is asymptotically negligible in the growth rate in the sense that

$$\frac{1}{n} \ln \hat{S}_n(x^n) - \frac{1}{n} \ln S_n^*(x^n) \ge \frac{1}{n} \ln \frac{V_n}{\sqrt{2\pi}} \to 0. \tag{16.171}$$

Thus, the universal causal portfolio achieves the same asymptotic growth rate of wealth as the best hindsight portfolio.
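To get a feel for how quickly the per-period penalty vanishes, the lower bound $\frac{1}{n} \ln \frac{V_n}{\sqrt{2\pi}}$ on the growth-rate gap can be tabulated for $m = 2$. This is a rough sketch with hypothetical function names; it evaluates $V_n$ from (16.115) in floating point, which is assumed adequate only for $n$ up to several hundred.

```python
from math import comb, log, pi, sqrt


def v_n(n: int) -> float:
    """V_n for m = 2 from (16.115)."""
    return 1.0 / sum(
        comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k) for k in range(n + 1)
    )


def growth_rate_penalty(n: int) -> float:
    """(1/n) ln(V_n / sqrt(2 pi)), the lower bound on the per-period log-wealth gap."""
    return log(v_n(n) / sqrt(2.0 * pi)) / n


if __name__ == "__main__":
    for n in (10, 100, 800):
        print(n, growth_rate_penalty(n))
```

The penalty is negative for every $n$ but shrinks toward 0 on the order of $(\ln n)/n$, since $V_n$ decays only like $1/\sqrt{n}$.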

Let’s now consider how this portfolio algorithm performs on two real stocks. We consider a 14-year period (ending in 2004) and two stocks, Hewlett-Packard and Altria (formerly Philip Morris), which are both components of the Dow Jones Index. Over these 14 years, HP went up by a factor of 11.8, while Altria went up by a factor of 11.5. The performance of the different constantly rebalanced portfolios that contain HP and Altria is shown in Figure 16.2. The best constantly rebalanced portfolio (which can be computed only in hindsight) achieves a growth of a factor of 18.7 using a mixture of about 51% HP and 49% Altria. The universal portfolio strategy described in this section achieves a growth factor of 15.7 without foreknowledge.

[Figure 16.2: value $S_n(b)$ of initial investment (vertical axis, 0 to 20, with the maximum marked $S_n^*$) plotted against the proportion $b$ of wealth in HPQ (horizontal axis, 0 to 1).]

FIGURE 16.2. Performance of different constant rebalanced portfolios b for HP and Altria.


16.8 SHANNON–MCMILLAN–BREIMAN THEOREM (GENERAL AEP)

The AEP for ergodic processes has come to be known as the Shannon–McMillan–Breiman theorem. In Chapter 3 we proved the AEP for i.i.d. processes. In this section we offer a proof of the theorem for a general ergodic process. We prove the convergence of $\frac{1}{n} \log p(X^n)$ by sandwiching it between two ergodic sequences.

In a sense, an ergodic process is the most general dependent process for which the strong law of large numbers holds. For finite alphabet processes, ergodicity is equivalent to the convergence of the kth-order empirical distributions to their marginals for all k.

The technical definition requires some ideas from probability theory. To be precise, an ergodic source is defined on a probability space $(\Omega, \mathcal{B}, P)$, where $\mathcal{B}$ is a $\sigma$-algebra of subsets of $\Omega$ and $P$ is a probability measure. A random variable $X$ is defined as a function $X(\omega)$, $\omega \in \Omega$, on the probability space. We also have a transformation $T : \Omega \to \Omega$, which plays the role of a time shift. We will say that the transformation is stationary if $P(TA) = P(A)$ for all $A \in \mathcal{B}$. The transformation is called ergodic if every set $A$ such that $TA = A$, a.e., satisfies $P(A) = 0$ or 1. If $T$ is stationary and ergodic, we say that the process defined by $X_n(\omega) = X(T^n \omega)$ is stationary and ergodic. For a stationary ergodic source, Birkhoff’s ergodic theorem states that

$$\frac{1}{n} \sum_{i=1}^{n} X_i(\omega) \to EX = \int X\, dP \quad \text{with probability 1.} \tag{16.172}$$

Thus, the law of large numbers holds for ergodic processes. We wish to use the ergodic theorem to conclude that

\begin{align}
-\frac{1}{n} \log p(X_0, X_1, \ldots, X_{n-1}) &= -\frac{1}{n} \sum_{i=0}^{n-1} \log p(X_i \mid X_0^{i-1}) \\
&\to \lim_{n \to \infty} E[-\log p(X_n \mid X_0^{n-1})]. \tag{16.173}
\end{align}

But the stochastic sequence $p(X_i \mid X_0^{i-1})$ is not ergodic. However, the closely related quantities $p(X_i \mid X_{i-k}^{i-1})$ and $p(X_i \mid X_{-\infty}^{i-1})$ are ergodic and have expectations easily identified as entropy rates. We plan to sandwich $p(X_i \mid X_0^{i-1})$ between these two more tractable processes.
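The convergence that this section sets out to prove can be observed in simulation for a simple ergodic source. The sketch below is an illustration only: the two-state Markov chain, its transition matrix `P`, and the function names are assumptions made for the example, not objects from the text. For a stationary Markov chain, $-\frac{1}{n} \log p(X_0, \ldots, X_{n-1})$ is computed with the chain rule and compared against the entropy rate $-\sum_{a,b} \pi_a P_{ab} \log P_{ab}$.

```python
import random
from math import log2

# Hypothetical two-state chain; PI is its stationary distribution (PI P = PI).
P = ((0.9, 0.1), (0.4, 0.6))
PI = (0.8, 0.2)


def entropy_rate() -> float:
    """Entropy rate in bits: -sum_a PI[a] sum_b P[a][b] log2 P[a][b]."""
    return -sum(PI[a] * P[a][b] * log2(P[a][b]) for a in (0, 1) for b in (0, 1))


def sample_log_likelihood_rate(n: int, seed: int = 0) -> float:
    """Draw X_0..X_{n-1} from the stationary chain and return -(1/n) log2 p(X_0..X_{n-1})."""
    rng = random.Random(seed)
    state = 0 if rng.random() < PI[0] else 1
    total = log2(PI[state])  # log-probability of the initial state
    for _ in range(n - 1):
        nxt = 0 if rng.random() < P[state][0] else 1
        total += log2(P[state][nxt])  # chain rule: add log p(X_i | X_{i-1})
        state = nxt
    return -total / n


if __name__ == "__main__":
    print("entropy rate:", entropy_rate())
    for n in (100, 10_000, 200_000):
        print(n, sample_log_likelihood_rate(n, seed=1))
```

As $n$ grows, the sampled rate should settle near the entropy rate, which is the statement of the general AEP specialized to this chain.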