Добавил:

fench Опубликованный материал нарушает ваши авторские права? Сообщите нам.

Вуз:

Сумский государственный педагогический университет им. Макаренко

Предмет:

Теория вероятностей и математическая статистика

Файл:

Теория информации / Cover T.M., Thomas J.A. Elements of Information Theory. 2006., 748p

.pdf

Скачиваний:

214

Добавлен:

09.08.2013

Размер:

10.58 Mб

Скачать

☆

<<< < Предыдущая 54 55 56 57 58 59 60 61 62 63 64 65 6667 / 7867 68 69 70 71 72 73 74 75 76 77 78 > Следующая >>>

16.7

UNIVERSAL PORTFOLIOS

635

ﬁnd by differentiating with respect to b that the maximum value

w (j n)

b 1

bk (1

−

b)n−k

(16.111)

max

≤ ≤

n − k

n−k ,

(16.112)

which is achieved by

n − k

(16.113)

w (j n) > 1, reﬂecting

the fact that the amount “bet” on

Note

that

j n

chosen

in hindsight,

thus

relieving the hindsight

investor

of the

responsibility of allocating his investments w (j n) to sum to 1. The causal investor has no such luxury. How can the causal investor choose initial investments w(jˆ n), w(jˆ n) = 1, to protect himself from all possible j n and hindsight-determined w (j n)? The answer will be to choose w(jˆ n) proportional to w (j n). Then the worst-case ratio of w(jˆ n)/w (j n) will be maximized. To proceed, we deﬁne Vn by

= n

k(j n)

n − k(j n)

n−k(j n)

(16.114)

n − k

n−k

(16.115)

= k n

k=0

and let

w(jˆ

k(j n)

n − k(j n)

n−k(j n) .

(16.116)

It is clear that w(jˆ n) is a legitimate distribution of wealth over the 2n stock sequences (i.e., w(jˆ n) ≥ 0 and j n w(jˆ n) = 1). Here Vn is the normalization factor that makes w(jˆ n) a probability mass function. Also, from (16.109) and (16.113), for all sequences xn,

)

w(jˆ

)

Sn(x

min

Sn (xn) ≥

w (j n)

j n

min

Vn( nk )k ( n−n k )n−k

k (1

−

≥ Vn,

(16.117)

(16.118)

(16.119)

636 INFORMATION THEORY AND PORTFOLIO THEORY

where (16.117) follows from (16.109) and (16.119) follows from (16.112). Consequently, we have

		ˆ	n	)
max min		Sn	(x	)	V .	(16.120)
max min		S	(xn) ≥		V .	(16.120)
ˆ	xn	S	(xn) ≥		n
b		n

We have thus demonstrated a portfolio on the 2n possible sequences of

length that achieves wealth ˆn xn within a factor n of the wealth n S ( ) V

Sn (xn) achieved by the best constant rebalanced portfolio in hindsight. To complete the proof of the theorem, we show that this is the best possible, that is, that any nonanticipating portfolio bi (xi−1) cannot do better than a factor Vn in the worst case (i.e., for the worst choice of xn). To prove this, we construct a set of extremal stock market sequences and show that the performance of any nonanticipating portfolio strategy is bounded by Vn for at least one of these sequences, proving the worst-case bound.

n	{			}	n, we deﬁne the corresponding extremal stock mar-
For each j n			1,	2	n, we deﬁne the corresponding extremal stock mar-
ket vector x (j		n	) as
ket vector x (j			) as
							(1, 0)t	if j	1,
				xi	(ji )	=	(0, 1)t	if ji =	2,	(16.121)
						=		i
								=
Let e1 = (1, 0)t , e2 = (0, 1)t							be standard basis vectors. Let
			K = {x(j n) : j n {1, 2}n, xiji						= eji }	(16.122)

be the set of extremal sequences. There are 2n such extremal sequences, and for each sequence at each time, there is only one stock that yields a nonzero return. The wealth invested in the other stock is lost. Therefore, the wealth at the end of n periods for extremal sequence xn(j n) is the product of the amounts invested in the stocks j1, j2, . . . , jn, [i.e., Sn(xn(j n)) = i bji = w(j n)]. Again, we can view this as an investment on sequences of length n, and given the 0 – 1 nature of the return, it is easy to see for xn K that

Sn(xn(j n)) = 1.

(16.123)

j n

For any extremal sequence xn(j n) K, the best constant rebalanced portfolio is

n	n	)) =		n (j n)		n (j n)		t	(16.124)
b (x (j		)) =		1 n	,	2 n		,	(16.124)

16.7 UNIVERSAL PORTFOLIOS

637

where n1(j n) is the number of occurrences of 1 in the sequence j n. The corresponding wealth at the end of n periods is

Sn (xn(j n))

n1(j n)

n2(j n)

w(jˆ n)

(16.125)

from (16.116) and it therefore follows that

Sn (xn) =

w(jˆ

n) =

(16.126)

x K

We then have the following inequality for any portfolio sequence {bi }ni=1, with Sn(xn) deﬁned as in (16.104):

min

Sn(xn)

xn K Sn (xn)

	S (xn)			Sn(xn)
≤ n		n	˜	˜
≤ n	xn	K	Sn (xn) Sn (xn)
x˜ K		K		˜
= n	Sn(x˜n)
= n	xn	K	Sn (xn)
x˜ K		K

(16.127)

(16.128)

1		(16.129)
=
	xn K Sn (xn)
= Vn,		(16.130)

where the inequality follows from the fact that the minimum is less than the average. Thus,

max min			Sn(xn)	V .	(16.131)
			S (xn) ≤
b	xn	K		n
			n

The strategy described in the theorem puts mass on all sequences of length n and is clearly dependent on n. We can recast the strategy in incremental terms (i.e., in terms of the amount bet on stock 1 and stock 2 at time 1), then, conditional on the outcome at time 1, the amount bet on each of the two stocks at time 2, and so on. Consider the weight

ˆi,1 assigned by the algorithm to stock 1 at time given the previous b i

sequence of stock vectors xi−1. We can calculate this by summing over all sequences j n that have a 1 in position i, giving

bˆ i,1(xi−1)

−

w(jˆ

−

i 1

−

) ,

(16.132)

)x(j i

j i Mi w(jˆ

)x(j − )

638 INFORMATION THEORY AND PORTFOLIO THEORY

where

w(jˆ

i ) =

n i

n w(j n)

(16.133)

:j j

is the weight put on all sequences j n that start with j i , and

i−1

x(j i−1) = xkjk

(16.134)

k=1

is the return on those sequences as deﬁned in (16.106).

Investigation of the asymptotics of Vn reveals [401, 496] that

m−1

(m/2)/√

(16.135)

for m assets. In particular, for m = 2 assets,

(16.136)

π n

and

≤ Vn

≤

(16.137)

√

n + 1

for all n [400]. Consequently, for m = 2 stocks, the causal portfolio strat-

ˆ i−1 ˆ n

egy bi (x ) given in (16.132) achieves wealth Sn(x ) such that

ˆ		n	)			1
Sn		(x		≥ Vn ≥	√				(16.138)
S		(xn)
						n		1
				2			+
	n

for all market sequences xn.

16.7.2Horizon-Free Universal Portfolios

We describe the horizon-free strategy in terms of a weighting of different portfolio strategies. As described earlier, each constantly rebalanced portfolio b can be viewed as corresponding to a mutual fund that rebalances the m assets according to b. Initially, we distribute the wealth among these funds according to a distribution µ(b), where dµ(b) is the amount of wealth invested in portfolios in the neighborhood db of the constantly rebalanced portfolio b.

16.7 UNIVERSAL PORTFOLIOS

639

Let

Sn(b, xn) = bt xi	(16.139)
i=1

be the wealth generated by a constant rebalanced portfolio b on the stock sequence xn. Recall that

S (xn)	max S (b, xn)		(16.140)
n	= b B	n

is the wealth of the best constant rebalanced portfolio in hindsight. We investigate the causal portfolio deﬁned by

bˆ i				B b	S (	b	, xi ) dµ(b)
		1(xi )			i	b		.
	+	1(xi )					i	.
	+		= B Si (b, x ) dµ(b)

We note that

ˆ t	i
bi+1	(x )xi+1

= B bt xi+1Si (b, xi ) dµ(b)

B Si (b, xi ) dµ(b) = B Si+1(b, xi+1) dµ(b) .

B Si (b, xi ) dµ(b)

(16.141)

(16.142)

(16.143)

Thus, the product	ˆ t	ˆ	n	)
Thus, the product	bi xi	telescopes and we see that the wealth Sn(x		)

resulting from this portfolio is given by

ˆ	n	) =
Sn(x

n
bˆ it (xi−1)xi	(16.144)
i=1

= Sn(b, xn) dµ(b). (16.145)

b B

There is another way to interpret (16.145). The amount given to portfolio manager b is dµ(b), the resulting growth factor for the manager rebalancing to b is S(b, xn), and the total wealth of this batch of investments is

	Sˆn(xn) =	Sn(b, xn) dµ(b).	(16.146)
		B
ˆ	, deﬁned in (16.141), is the performance-weighted total “buy
Then bi+1

order” of the individual portfolio manager b.

640 INFORMATION THEORY AND PORTFOLIO THEORY

So far, we have not speciﬁed what distribution µ(b) we use to apportion the initial wealth. We now use a distribution µ that puts mass on all possible portfolios, so that we approximate the performance of the best portfolio for the actual distribution of stock price vectors.

In the next lemma, we bound ˆn as a function of the initial wealth

S /Sn

distribution µ(b).

Lemma 16.7.2 Let Sn (xn) in 16.140 be the wealth achieved by the best

constant rebalanced portfolio and let ˆn xn in (16.144) be the wealth

S ( )

ˆ (·) achieved by the universal mixed portfolio b , given by

bˆ i

(

, xi ) dµ(b)

1(xi )

Then

= Si (b, x ) dµ(b)

)

dµ(b)

min

1 bji

) ≥

B =n

i=1 bji

Proof: As before, we can write

Sn (xn) = n

w (j n)x(j n),

(16.147)

(16.148)

(16.149)

where w (j n)

x(j n) = n=

i 1

n
= i=1 bji	is the amount invested on the sequence j n and

xiji is the corresponding return. Similarly, we can write

ˆn(xn) = bt xi dµ(b) (16.150)

i=1

bji xiji dµ(b)

(16.151)

j n

i=1

= n

w(jˆ

n)x(j n),

(16.152)

where w(jˆ n) =

i=1 bji dµ(b). Now applying Lemma 16.7.1, we have

w(jˆ

Sˆn(x )

j n

n)x(j n)

(16.153)

Sn (x ) =

)x(j

)

w n

)x(j

)

≥

min

w(jˆ

(16.154)

w (j n)x(j n)

j n

1 bji

dµ(b)

min

(16.155)

i=1 bji

16.7 UNIVERSAL PORTFOLIOS

641

We now apply this lemma when µ(b) is the Dirichlet( 12 ) distribution.

Theorem 16.7.2 For the causal universal portfolio bi ( ), i

1, 2, . . .,

given in (16.141), with m = 2 stocks and dµ(b) the Dirichlet(

2 ) distri-

bution, we have

)

≥

√

(x n)

for all n and all stock sequences x n.

Proof: As in the discussion preceding (16.112), we can show that the weight put by the best constant portfolio b on the sequence j n is

n	bji		k	k	n − k	n−k		2−nH (k/n),	(16.156)
		=	n				=
					n
i=1

where k is the number of indices where ji = 1. We can also explicitly calculate the integral in the numerator of (16.148) in Lemma 16.7.2 for the Dirichlet( 12 ) density, deﬁned for m variables as

	( m2 )			m
	( m2 )			b− 21 db,
dµ(b) =				j	(16.157)
dµ(b) =		1	m	j	(16.157)
		2		j =1

where (x) = 0∞ e−t t x−1 dt denotes the gamma function. For simplicity,

we consider	the case of two stocks, in which case
we consider
dµ(b) =		1	√	1	db,	0 ≤ b ≤ 1,	(16.158)

		π
				b(1 − b)

where b is the fraction of wealth invested in stock 1. Now consider any sequence j n {1, 2}n, and consider the amount invested in that sequence,

= bl (1 − b)n−l ,

b(j n) = bji

(16.159)

i=1

where l is the number of indices where ji = 1. Then

−

√

b(j n) dµ(b)

bl (1

b)n−l

(16.160)

b(1 − b)

ˆ n

Sn(x )

642 INFORMATION THEORY AND PORTFOLIO THEORY

bl− 21 (1

−

b)n−l− 21 db

(16.161)

= π

B l

, n

−

(16.162)

= π

where B(λ1, λ2) is the beta function, deﬁned as

B(λ1, λ2) = 0

1 xλ1−1(1 − x)λ2−1 dx

(16.163)

(λ1) (λ2)

(16.164)

(λ1 + λ2)

and

∞ xλ−1e−x dx.

(λ) = 0

(16.165)

Note that for any integer n, (n

n! and (n

1 )

1·3·5···(2n−1) √

We can calculate B(l + 21 , n − l + 21 )

by means of simple recursion

using integration by parts. Alternatively, using (16.164), we obtain

2n n

, n

−

n l

Combining all the results with Lemma 16.7.2, we have

≥ min

Sn (xn) j n

≥ min

B	n	dµ(b)
	i=n
	1 bji

i=1 bji

π1 B(l + 12 , n − l + 12 )

2−nH (l/n)

(16.166)

(16.167)

(16.168)

≥	1				,			(16.169)
		√
2			n + 1
using the results in [135, Theorem 2].
It follows for m = 2 stocks that
		ˆ			1
		Sn	≥					(16.170)
				√			Vn
		Sn
						2π

16.7 UNIVERSAL PORTFOLIOS

643

for all n and all market sequences x1, x2, . . . , xn. Thus, good minimax per-

√

formance for all n costs at most an extra factor 2π over the ﬁxed horizon minimax portfolio. The cost of universality is Vn, which is asymptotically negligible in the growth rate in the sense that

→

− n

≥ n

√2π

ln Sˆ

(xn)

ln S (xn)

(16.171)

Thus, the universal causal portfolio achieves the same asymptotic growth rate of wealth as the best hindsight portfolio.

Let’s now consider how this portfolio algorithm performs on two real stocks. We consider a 14-year period (ending in 2004) and two stocks, Hewlett-Packard and Altria (formerly, Phillip Morris), which are both components of the Dow Jones Index. Over these 14 years, HP went up by a factor of 11.8, while Altria went up by a factor of 11.5. The performance of the different constantly rebalanced portfolios that contain HP and Altria are shown in Figure 16.2. The best constantly rebalanced portfolio (which can be computed only in hindsight) achieves a growth of a factor of 18.7 using a mixture of about 51% HP and 49% Altria. The universal portfolio strategy described in this section achieves a growth factor of 15.7 without foreknowledge.

Value Sn (b) of initial investment

20
20
										S *
										S *
18										n
18
16
16
14
14
12
12
10
10
8
8
6
6
4
4
2
2
0

0	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9	1
			Proportion b of wealth in HPQ

FIGURE 16.2. Performance of different constant rebalanced portfolios b for HP and Altria.

644 INFORMATION THEORY AND PORTFOLIO THEORY

16.8 SHANNON–MCMILLAN–BREIMAN THEOREM (GENERAL AEP)

The AEP for ergodic processes has come to be known as the Shannon – McMillan – Breiman theorem. In Chapter 3 we proved the AEP for i.i.d. processes. In this section we offer a proof of the theorem for a general ergodic process. We prove the convergence of n1 log p(Xn) by sandwiching it between two ergodic sequences.

In a sense, an ergodic process is the most general dependent process for which the strong law of large numbers holds. For ﬁnite alphabet processes, ergodicity is equivalent to the convergence of the kth-order empirical distributions to their marginals for all k.

The technical deﬁnition requires some ideas from probability theory. To be precise, an ergodic source is deﬁned on a probability space ( , B, P ), where B is a σ -algebra of subsets of and P is a probability measure. A random variable X is deﬁned as a function X(ω), ω , on the probability space. We also have a transformation T : → , which plays the role of a time shift. We will say that the transformation is stationary if P (T A) = P (A) for all A B. The transformation is called ergodic if every set A such that T A = A, a.e., satisﬁes P (A) = 0 or 1. If T is stationary and ergodic, we say that the process deﬁned by Xn(ω) = X(T nω) is stationary and ergodic. For a stationary ergodic source, Birkhoff’s ergodic theorem states that

n			→		=
1	n	Xi (ω)		EX		X dP with probability 1.	(16.172)
		Xi (ω)		EX		X dP with probability 1.	(16.172)

i=1

Thus, the law of large numbers holds for ergodic processes. We wish to use the ergodic theorem to conclude that

n−1

− n1 log p(X0, X1, . . . , Xn−1) = − n1 log p(Xi |X0i−1)

i=0

→ n

−

log p(X

lim E[

Xn−1)]. (16.173)

→∞

But the stochastic sequence p(X

i |

Xi−1) is

not

ergodic. However,

the

Xi−1 ) are ergodic

closely related quantities p(X

i |

Xi−1) and

p(X

and

i−k

i |

−∞

have expectations easily identiﬁed as entropy rates. We plan to sandwich p(Xi |X0i−1) between these two more tractable processes.

<<< < Предыдущая 54 55 56 57 58 59 60 61 62 63 64 65 6667 / 7867 68 69 70 71 72 73 74 75 76 77 78 > Следующая >>>

Соседние файлы в папке Теория информации

#
09.08.201310.58 Mб214Cover T.M., Thomas J.A. Elements of Information Theory. 2006., 748p.pdf
#
09.08.20131.32 Mб28Gray R.M. Entropy and information theory. 1990., 284p.pdf