Добавил:

fench Опубликованный материал нарушает ваши авторские права? Сообщите нам.

Вуз:

Сумский государственный педагогический университет им. Макаренко

Предмет:

Теория вероятностей и математическая статистика

Файл:

Теория информации / Cover T.M., Thomas J.A. Elements of Information Theory. 2006., 748p

.pdf

Скачиваний:

214

Добавлен:

09.08.2013

Размер:

10.58 Mб

Скачать

☆

<<< < Предыдущая 54 55 56 57 58 59 60 61 62 63 64 65 66 6768 / 7868 69 70 71 72 73 74 75 76 77 78 > Следующая >>>

16.8 SHANNON–MCMILLAN–BREIMAN THEOREM (GENERAL AEP)

645

We deﬁne the kth-order entropy H k as
H k = E {− log p(Xk \|Xk−1, Xk−2, . . . , X0)}	(16.174)
= E {− log p(X0\|X−1, X−2, . . . , X−k )} ,	(16.175)

where the last equation follows from stationarity. Recall that the entropy

rate is given by				H = klim H k
				H = klim H k					(16.176)
				→∞
					1 n−1
				= n→∞ n
				lim		H k .			(16.177)
						k=0
Of course,	H k		H	by stationarity and the			fact that conditioning does
Of course,				by stationarity and the			k	H = H ∞, where
not increase entropy. It will be crucial that H								H = H ∞, where
		H ∞ = E {− log p(X0\|X−1, X−2, . . .)} .							(16.178)

The proof that H ∞ = H involves exchanging expectation and limit. The main idea in the proof goes back to the idea of (conditional) propor-

tional gambling. A gambler receiving uniform odds with the knowledge of the k past will have a growth rate of wealth log |X| − H k , while a gambler with a knowledge of the inﬁnite past will have a growth rate of wealth

of log	\|X\| −	n
of log		H ∞. We don’t know the wealth growth rate of a gambler

with growing knowledge of the past X0 , but it is certainly sandwiched between log |X| − H k and log |X| − H ∞. But H k H = H ∞. Thus, the sandwich closes and the growth rate must be log |X| − H .

We will prove the theorem based on lemmas that will follow the proof.

Theorem 16.8.1 (AEP: Shannon–McMillan–Breiman Theorem) If H is the entropy rate of a ﬁnite-valued stationary ergodic process {Xn},

then
1	, . . . , Xn−1) → H with probability 1.
− n log p(X0	, . . . , Xn−1) → H with probability 1.	(16.179)

Proof: We prove this for ﬁnite alphabet X; this proof and the proof for countable alphabets and densities is given in Algoet and Cover [20]. We argue that the sequence of random variables − n1 log p(X0n−1) is asymptotically sandwiched between the upper bound H k and the lower bound H ∞ for all k ≥ 0. The AEP will follow since H k → H ∞ and H ∞ = H . The kth-order Markov approximation to the probability is deﬁned for n ≥ k as

n−1
pk (X0n−1) = p(X0k−1) p(Xi \|Xii−−k1).	(16.180)
i=k

646 INFORMATION THEORY AND PORTFOLIO THEORY

From Lemma 16.8.3 we have

log

pk (X0n−1)

≤ 0,

(16.181)

lim sup

n→∞

p(X0 −

)

which we

rewrite,

taking

the

existence

the

limit

1 log pk (Xn) into

account (Lemma 16.8.1), as

log

= H k

(16.182)

lim sup

≤ nlim

n→∞

p(X0

−

)

→∞

(X0

−

)

for k = 1, 2, . . . . Also, from Lemma 16.8.3, we have

p(X0n−1)

≤ 0,

(16.183)

lim sup

log

−

X−

→∞

p(X

)

−∞

which we rewrite as

log

= H ∞

(16.184)

lim inf

≥ lim

−

X−

p(X

−

)

p(X

)

| −∞

from the deﬁnition of H ∞ in Lemma 16.8.1.

Putting together (16.182) and (16.184), we have

≤

− n

H ∞

lim inf

log p(Xn−1)

lim sup

log p(Xn−1)

≤ H k for all k.

(16.185)

But by Lemma 16.8.2, H k → H ∞ = H . Consequently,

log p(X0n) = H.

(16.186)

lim −

We now prove the lemmas that were used in the main proof. The ﬁrst lemma uses the ergodic theorem.

Lemma 16.8.1 (Markov approximations ) For a stationary ergodic stochastic process {Xn},

	1		0	→
	− n		0	→
1		log pk (Xn−1)			H k	with probability 1,	(16.187)
1	0		\| −∞	→
− n	0		\| −∞	→
	log p(Xn−1		X−1 )		H ∞	with probability 1.	(16.188)

16.8 SHANNON–MCMILLAN–BREIMAN THEOREM (GENERAL AEP)

647

Proof:

Functions Yn = f (X−n ∞) of ergodic processes {Xi } are ergodic

processes. Thus, log p(X

Xn−1) and log p(X

n−1

, X

n−2

, . . . , ) are also

ergodic processes, and

n−k

1 n−1

−

log pk (X0n−1)

= −

log p(X0k−1) −

log p(Xi |Xii−−k1) (16.189)

i=k

→ 0 + H k

with probability 1,

(16.190)

by the ergodic theorem. Similarly, by the ergodic theorem,

log p(X0n−1|X−1, X−2, . . .) = −

1 n−1

−

log p(Xi |Xi−1, Xi−2, . . .)

i=0

(16.191)

→

∞

with probability 1.

(16.192)

Lemma 16.8.2

(No gap)

H k H ∞ and H = H ∞.

Proof:

We know that for stationary processes, H k H , so it remains

to show that H k H ∞,

thus yielding H = H ∞. Levy’s

martingale

convergence theorem for conditional probabilities asserts that

−k

→

p(x

−∞

with probability 1

(16.193)

p(x

X−1)

X−1 )

for all x0 X. Since X is ﬁnite and p log p is bounded and continuous in p for all 0 ≤ p ≤ 1, the bounded convergence theorem allows interchange of expectation and limit, yielding

k→∞

= k→∞

−

lim

H k

lim

x0 X

p(x0

X−k1) log p(x0

X−k1)

(16.194)

−

−∞

) log p(x0

−∞

)

(16.195)

x0 X

p(x0

X−1

H ∞.

(16.196)

Thus, H k H = H ∞.

648 INFORMATION THEORY AND PORTFOLIO THEORY

Lemma 16.8.3

(Sandwich)

pk (X0n−1)

lim sup

log

≤ 0,

(16.197)

n→∞

p(X0 −

)

p(Xn−1)

)

≤ 0.

(16.198)

p(Xn−1

X−1

lim sup

log

| −∞

Proof: Let A be the support set of p(X0n−1). Then

pk (Xn−1)

pk (xn−1)

p(Xn−1)

p(xn−1)

p(x0n−1)

(16.199)

xn−1

n 1

pk (x0n−1)

(16.200)

x0 − A

= pk (A)

(16.201)

≤ 1.

(16.202)

−∞

·|

−∞

Similarly, let B(X−1 ) denote the support set of p( X−1

). Then we have

p(X0n−1)

E E

p(X0n−1)

X−1

(16.203)

p(Xn−1

X−1

p(Xn−1

X−1

)

−∞

p(xn)

= E

p(x

|X−∞−

)

(16.204)

p(xn X−1 )

B(X

−−∞

)

−∞

= E

p(x

(16.205)

)

B(X

−−∞

)

≤ 1.

(16.206)

By Markov’s inequality and (16.202), we have

pk (X0n−1)

≥

(16.207)

p(X0n−1)

≤ tn

SUMMARY

649

log

pk (X0n−1)

log tn

(16.208)

p(Xn−1)

≥ n

≤ tn

Letting t

n =

and

noting that

∞

, we

see by the

Borel – Cantelli lemma that the event

log

pk (X0n−1)

log tn

(16.209)

p(Xn−1)

≥ n

occurs only ﬁnitely often with probability 1. Thus,

lim sup

log

pk (X0n−1)

≤ 0

with probability 1.

(16.210)

p(Xn−1)

Applying the same arguments using Markov’s inequality to (16.206), we obtain

1				p(X0n−1)				≤ 0	with probability 1, (16.211)
lim sup		log
lim sup	n	log	p(Xn−1		\|	X−1	)
				0	\|	−∞
proving the lemma.

The arguments used in the proof can be extended to prove the AEP for the stock market (Theorem 16.5.3).

SUMMARY

Growth rate. The growth rate of a stock market portfolio b with respect to a distribution F (x) is deﬁned as

W (b, F )	=		log bt x dF (x)	=	E	log bt x .	(16.212)
	=			=

Log-optimal portfolio. The optimal growth rate with respect to a distribution F (x) is

W (F )	=	b	(16.213)
		max W (b, F ).

650 INFORMATION THEORY AND PORTFOLIO THEORY

The portfolio b that achieves the maximum of W (b, F ) is called the log-optimal portfolio.

Concavity. W (b, F ) is concave in b and linear in F . W (F ) is convex in F .

Optimality conditions. The portfolio b is log-optimal if and only if

1 if bi > 0,

b X

≤ 1 if bi = 0.

(16.214)

Expected ratio optimality. If Sn = i=1 b t Xi , Sn =

i=1 bit Xi , then

≤ 1

if and only if

E ln

≤ 0.

(16.215)

Growth rate (AEP)

log Sn → W (F )

with probability 1.

(16.216)

Asymptotic optimality

lim sup

log

≤ 0

with probability 1.

(16.217)

→∞

Wrong information. Believing g when f is true loses

W (b , F )

−

W (b

, F )

≤

D(f g).

(16.218)

Side information Y

W ≤ I (X; Y ).

(16.219)

Chain rule

W (X

, X

, . . . , X

i−1

)

max

E log bt

(16.220)

i |

= bi (x1,x2,...,xi−1)

W (X1, X2, . . . , Xn) = W (Xi |X1, X2, . . . , Xi−1).

(16.221)

i=1

SUMMARY 651

Growth rate for a stationary market.


W∞	= lim			W (X1, X2, . . . , Xn)
W∞	= lim			n
	1		log Sn → W∞.

		n

Competitive optimality of log-optimal portfolios.

Pr(V S ≥ U S ) ≤ 1 .

Universal portfolio.

where

Vn =

n1+···+nm

For m = 2,

		n		bˆ t (xi−1)xi
n		i	=				n
				n	t
max min			i=1 b xi			=	V ,
bˆ i (·) x ,b
	n				2−nH (n1/n,...,nm/n) −1.

=n n1, n2, . . . , nm

Vn 2/π n

(16.222)

(16.223)

(16.224)

(16.225)

(16.226)

(16.227)

The causal universal portfolio

bS (

, xi ) dµ(b)

bˆ i

1(xi )

achieves

= Si (b, x ) dµ(b)

)

(xn) ≥

√

for all n and all xn.

(16.228)

(16.229)

AEP. If {Xi } is stationary ergodic, then
1
− n log p(X1	, X2, . . . , Xn) → H (X)	with probability 1. (16.230)

652 INFORMATION THEORY AND PORTFOLIO THEORY

PROBLEMS

16.1Growth rate. Let

		(1, a)	with probability	1
X		(1, a)	with probability	2
X	=			1
	=	(1, 1/a)	with probability
		(1, 1/a)	with probability	2

where a > 1. This vector X represents a stock market vector of cash vs. a hot stock. Let

W (b, F ) = E log bt X

and

W = max W (b, F )

be the growth rate.

(a)Find the log optimal portfolio b .

(b)Find the growth rate W .

(c)Find the asymptotic behavior of

Sn = bt Xi i=1

for all b.

16.2Side information. Suppose, in Problem 16.1, that

Y = 1 if (X1, X2) ≥ (1, 1), 0 if (X1, X2) ≤ (1, 1).

Let the portfolio b depend on Y. Find the new growth rate W and verify that W = W − W satisﬁes

W ≤ I (X; Y ).

16.3Stock dominance. Consider a stock market vector

X = (X1, X2).

Suppose that X1 = 2 with probability 1. Thus an investment in the ﬁrst stock is doubled at the end of the day.

PROBLEMS 653

(a)Find necessary and sufﬁcient conditions on the distribution of

stock X2 such that the log-optimal portfolio b invests all the wealth in stock X2 [i.e., b = (0, 1)].

(b)Argue for any distribution on X2 that the growth rate satisﬁes W ≥ 1.

16.4Including experts and mutual funds . Let X F (x), x Rm+, be

the vector of price relatives for a stock market. Suppose that an “expert” suggests a portfolio b. This would result in a wealth

factor bt X. We

add this to the stock alternatives to form X˜

(X1, X2, . . . , Xm, bt X). Show that the new growth rate,

max

m 1

ln(bt

) dF (

,...,b

(16.231)

is equal to the old growth rate,

= b1,...,bm

ln b

t x) dF (x).

(16.232)

max

(

16.5Growth rate for symmetric distribution. Consider a stock vec-

tor X F (x), X Rm, X ≥ 0, where the component stocks are exchangeable. Thus, F (x1, x2, . . . , xm) = F (xσ (1), xσ (2), . . . , xσ (m)) for all permutations σ .

(a)Find the portfolio b optimizing the growth rate and establish its optimality. Now assume that X has been normalized so

that	1	m	Xi = 1, and F is symmetric as before.
		i=1
	m

(b) Again assuming X to be normalized, show that all symmetric distributions F have the same growth rate against b .

(c)Find this growth rate.

16.6Convexity . We are interested in the set of stock market densities

that yield the same optimal porfolio. Let Pb0 be the set of all probability densities on Rm+ for which b0 is optimal. Thus, Pb0 = {p(x) : ln(bt x)p(x) dx is maximized by b = b0}. Show that Pb0 is a convex set. It may be helpful to use Theorem 16.2.2.

16.7Short selling . Let

X		(1, 1 ),			p.
	=	(	, 2),	p,
	=	1	2	1 −

Let B = {(b1, b2) : b1 + b2 = 1}. Thus, this set of portfolios B does not include the constraint bi ≥ 0. (This allows short selling.)

654INFORMATION THEORY AND PORTFOLIO THEORY

(a)Find the log optimal portfolio b (p).

(b)Relate the growth rate W (p) to the entropy rate H (p).

16.8Normalizing x. Suppose that we deﬁne the log-optimal portfolio b to be the portfolio maximizing the relative growth rate

ln			bt x		dF (x1, . . . , xm).
ln	1		m		dF (x1, . . . , xm).
			i=1 xi
		m	i=1 xi
The virtue of the normalization					1	Xi , which can be viewed as
The virtue of the normalization					m	Xi , which can be viewed as
the wealth associated with a				uniform portfolio, is that the relative
the wealth associated with a
growth rate is ﬁnite even when the growth rate								ln bt xdF (x)
is not. This matters, for example, if						X	has a St.	Petersburg-like
is not. This matters, for example, if							has a St.

distribution. Thus, the log-optimal portfolio b is deﬁned for all distributions F , even those with inﬁnite growth rates W (F ).

(a) Show that if b maximizes ln(bt x) dF (x), it also maximizes

ln ubtt xx dF (x), where u = ( m1 , m1 , . . . , m1 ).

(b)Find the log optimal portfolio b for

X =	2k	+		2k		1		2−(k+1)
	2k		1	,	2	2k	),	(k	1)	,
	(2			,	2		),			,
	(2	, 2			+ ),			2− +		,

where k = 1, 2, . . . .

(d)Argue that b is competitively better than any portfolio b in the sense that Pr{bt X > cb t X} ≤ 1c .

16.9Universal portfolio. We examine the ﬁrst n = 2 steps of the implementation of the universal portfolio in (16.7.2) for µ(b) uniform for m = 2 stocks. Let the stock vectors for days 1 and 2 be

x1 = (1, 12 ), and x2 = (1, 2). Let b = (b, 1 − b) denote a portfolio.

	2
(a) Graph S2(b) = i=1 bt xi , 0 ≤ b ≤ 1.
(b)	Calculate S2 = maxb S2(b).
(c)	Argue that log S2(b) is concave in b.
(d) Calculate the (universal) wealth Sˆ2 = 01 S2(b)db.
(e)	Calculate the universal portfolio at times n = 1 and n = 2:

ˆ = d b1 b b

<<< < Предыдущая 54 55 56 57 58 59 60 61 62 63 64 65 66 6768 / 7868 69 70 71 72 73 74 75 76 77 78 > Следующая >>>

Соседние файлы в папке Теория информации

#
09.08.201310.58 Mб214Cover T.M., Thomas J.A. Elements of Information Theory. 2006., 748p.pdf
#
09.08.20131.32 Mб28Gray R.M. Entropy and information theory. 1990., 284p.pdf