# Information Theory / Gray R.M., Entropy and Information Theory, 1990, 284 pp.
10.5 $\bar d$-Continuous Channels

length one, is a stationary channel, and $g$ is a length $m$ sliding-block decoder. The probability of error for the resulting hookup is defined by

$$P_e(\mu,\nu,f,g) = \Pr(\hat U_0 \neq U_0) = \mu\nu(E) = \int d\mu(u)\,\nu_{f(u)}(E_u),$$

where $E$ is the error event $\{(u,y) : u_0 \neq g_m(y_{-q}^m)\}$ and $E_u = \{y : (u,y)\in E\}$ is the section of $E$ at $u$.
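To make the definition concrete, consider a toy hookup (my own example, not one from the text): an i.i.d. equiprobable binary source, the identity encoder, a memoryless binary symmetric channel, and the one-symbol decoder $g_1(y_0) = y_0$. Then $P_e$ is just the channel crossover probability, which a Monte Carlo estimate of $\Pr(\hat U_0 \neq U_0)$ recovers:

```python
import random

def simulate_pe(crossover, n_samples, seed=0):
    """Monte Carlo estimate of Pe(mu, nu, f, g) for a toy hookup:
    source mu: i.i.d. equiprobable bits; encoder f: identity;
    channel nu: memoryless binary symmetric channel; decoder g_1(y_0) = y_0.
    For this hookup Pe = Pr(Uhat_0 != U_0) = crossover."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_samples):
        u = rng.randint(0, 1)               # source symbol U_0
        flip = rng.random() < crossover     # channel error event
        y = u ^ flip                        # channel output Y_0
        errors += (y != u)                  # decoder outputs y; error iff flipped
    return errors / n_samples

estimate = simulate_pe(0.1, 200_000)
```

With 200,000 samples the estimate lands within a few thousandths of the true value 0.1.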

Lemma 10.5.2: Given a stationary channel $\nu$, a stationary source $[G,\mu,U]$, a length $m$ sliding-block decoder $g$, and two encoders $f$ and $\psi$, then for any positive integer $r$

$$|P_e(\mu,\nu,f,g) - P_e(\mu,\nu,\psi,g)| \le \frac{m}{r} + r\Pr(f\neq\psi) + m\,\max_{a^r\in A^r}\ \sup_{x,x'\in c(a^r)} \bar d_r(\nu_x^r,\nu_{x'}^r).$$

Proof: Define $\Lambda = \{u : f(u) = \psi(u)\}$ and

$$\Lambda_r = \{u : f(T^iu) = \psi(T^iu),\ i = 0,1,\cdots,r-1\} = \bigcap_{i=0}^{r-1} T^{-i}\Lambda.$$

From the union bound

$$\mu(\Lambda_r^c) \le r\mu(\Lambda^c) = r\Pr(f\neq\psi). \tag{10.12}$$

From stationarity, if $g = g_m(Y_{-q}^m)$ then

$$P_e(\mu,\nu,f,g) = \int d\mu(u)\,\nu_{f(u)}(y : g_m(y_{-q}^m)\neq u_0) = \frac1r\sum_{i=0}^{r-1}\int d\mu(u)\,\nu_{f(u)}(y : g_m(y_{i-q}^m)\neq u_i)$$
$$\le \frac{m}{r} + \frac1r\sum_{i=q}^{r-q}\int_{\Lambda_r} d\mu(u)\,\nu_{f(u)}^r(y^r : g_m(y_{i-q}^m)\neq u_i) + \mu(\Lambda_r^c). \tag{10.13}$$

Fix $u\in\Lambda_r$ and let $p_u$ yield $\bar d_r(\nu_{f(u)}^r,\nu_{\psi(u)}^r)$; that is, $\sum_{w^r} p_u(y^r,w^r) = \nu_{f(u)}^r(y^r)$, $\sum_{y^r} p_u(y^r,w^r) = \nu_{\psi(u)}^r(w^r)$, and

$$\frac1r\sum_{i=0}^{r-1} p_u(y^r,w^r : y_i\neq w_i) = \bar d_r(\nu_{f(u)}^r,\nu_{\psi(u)}^r). \tag{10.14}$$

We have that

$$\frac1r\sum_{i=q}^{r-q}\nu_{f(u)}^r(y^r : g_m(y_{i-q}^m)\neq u_i) = \frac1r\sum_{i=q}^{r-q} p_u(y^r,w^r : g_m(y_{i-q}^m)\neq u_i)$$
$$\le \frac1r\sum_{i=q}^{r-q} p_u(y^r,w^r : g_m(y_{i-q}^m)\neq g_m(w_{i-q}^m)) + \frac1r\sum_{i=q}^{r-q} p_u(y^r,w^r : g_m(w_{i-q}^m)\neq u_i)$$
$$\le \frac1r\sum_{i=q}^{r-q} p_u(y^r,w^r : y_{i-q}^m\neq w_{i-q}^m) + P_e(\mu,\nu,\psi,g)$$
$$\le \frac1r\sum_{i=q}^{r-q}\ \sum_{j=i-q}^{i-q+m-1} p_u(y^r,w^r : y_j\neq w_j) + P_e(\mu,\nu,\psi,g)$$
$$\le m\,\bar d_r(\nu_{f(u)}^r,\nu_{\psi(u)}^r) + P_e(\mu,\nu,\psi,g),$$

which with (10.12)-(10.14) proves the lemma. $\Box$

The following corollary states that the probability of error using sliding block

codes over a $\bar d$-continuous channel is a continuous function of the encoder, as measured by the metric on encoders given by the probability of disagreement of the outputs of the two encoders.

Corollary 10.5.1: Given a stationary $\bar d$-continuous channel $\nu$ and a finite-length decoder $g_m : B^m \to A$, then given $\epsilon > 0$ there is a $\delta > 0$ so that if $f$ and $\psi$ are two stationary encoders such that $\Pr(f\neq\psi)\le\delta$, then

$$|P_e(\mu,\nu,f,g) - P_e(\mu,\nu,\psi,g)| \le \epsilon.$$

Proof: Fix $\epsilon > 0$ and choose $r$ so large that

$$\max_{a^r\in A^r}\ \sup_{x,x'\in c(a^r)} \bar d_r(\nu_x^r,\nu_{x'}^r) \le \frac{\epsilon}{3m}$$

and $m/r \le \epsilon/3$, and choose $\delta = \epsilon/(3r)$. Then Lemma 10.5.2 implies that

$$|P_e(\mu,\nu,f,g) - P_e(\mu,\nu,\psi,g)| \le \epsilon. \qquad\Box$$
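Substituting the proof's choices ($m/r \le \epsilon/3$, $\delta = \epsilon/(3r)$, and the $\bar d_r$ term at most $\epsilon/(3m)$) into the bound of Lemma 10.5.2 makes the three-way split explicit:

```latex
|P_e(\mu,\nu,f,g) - P_e(\mu,\nu,\psi,g)|
  \le \underbrace{\frac{m}{r}}_{\le\,\epsilon/3}
    + \underbrace{r\Pr(f\neq\psi)}_{\le\, r\cdot\epsilon/(3r)\,=\,\epsilon/3}
    + \underbrace{m\max_{a^r}\sup_{x,x'\in c(a^r)}\bar d_r(\nu^r_x,\nu^r_{x'})}_{\le\, m\cdot\epsilon/(3m)\,=\,\epsilon/3}
  = \epsilon.
```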

Given an arbitrary channel $[A,\nu,B]$, we can define for any block length $N$ a closely related CBI channel $[A,\tilde\nu,B]$ as the CBI channel with the same probabilities on output $N$-blocks, that is, the same conditional probabilities for $Y_{kN}^N$ given $x$, but having conditionally independent blocks. We shall call $\tilde\nu$ the $N$-CBI approximation to $\nu$. A channel $\nu$ is said to be conditionally almost block independent or CABI if given $\epsilon > 0$ there is an $N_0$ such that for any $N \ge N_0$ there is an $M_0$ such that for any $x$ and any $N$-CBI approximation $\tilde\nu$ to $\nu$

$$\bar d(\tilde\nu_x^M,\nu_x^M) \le \epsilon,\quad \text{all } M \ge M_0,$$

where $\nu_x^M$ denotes the restriction of $\nu_x$ to $\mathcal B_{B^M}$, that is, the output distribution on $Y^M$ given $x$. A CABI channel is one whose output distribution is close (in a $\bar d$ sense) to that of the $N$-CBI approximation provided that $N$ is big enough. CABI channels were introduced by Neuhoff and Shields, who provided


several examples and alternative characterizations of the class. In particular they showed that finite memory channels are both $\bar d$-continuous and CABI. Their principal result, however, requires the notion of the $\bar d$ distance between channels.

Given two channels $[A,\nu,B]$ and $[A,\nu',B]$, define the $\bar d$ distance between the channels to be

$$\bar d(\nu,\nu') = \limsup_{n\to\infty}\ \sup_x\ \bar d_n(\nu_x^n,\nu_x'^n).$$

Neuhoff and Shields showed that the class of CABI channels is exactly the class of primitive channels together with the $\bar d$ limits of such channels.
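As a concrete check on this definition, for memoryless channels the $\bar d$ distance is easy to evaluate: couplings can be built letter by letter, so the optimal per-letter coupling error is the total variation distance between the corresponding output distributions, and a constant input at the worst letter achieves the supremum over $x$. The sketch below is my own illustration of this memoryless special case, not a construction from the text; the function names are invented.

```python
def total_variation(p, q):
    """Minimum of Pr(Y != W) over couplings of distributions p and q,
    which equals the total variation distance."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def dbar_memoryless(nu, nu_prime):
    """dbar distance between two memoryless channels on the same input
    alphabet, each given as a dict: input letter -> output distribution.
    Per-letter coupling makes the worst single input letter dominate."""
    return max(total_variation(nu[a], nu_prime[a]) for a in nu)

# two binary symmetric channels with crossover 0.1 and 0.25
bsc = lambda eps: {0: [1 - eps, eps], 1: [eps, 1 - eps]}
d = dbar_memoryless(bsc(0.1), bsc(0.25))   # = 0.15
```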

10.6 The Distortion-Rate Function

We close this chapter on distortion, approximation, and performance with the introduction and discussion of Shannon's distortion-rate function. This function (or functional) of the source and distortion measure will play a fundamental role in evaluating the OPTA functions. In fact, it can be considered as a form of information theoretic OPTA. Suppose now that we are given a source $[A,\mu]$

and a fidelity criterion $\rho_n,\ n = 1,2,\cdots$, defined on $A^n\times\hat A^n$, where $\hat A$ is called the reproduction alphabet. Then the Shannon distortion-rate function (DRF) is defined in terms of a nonnegative parameter $R$ called rate by

$$D(R,\mu) = \limsup_{N\to\infty}\frac1N D_N(R,\mu^N),$$

where

$$D_N(R,\mu^N) = \inf_{p^N\in\mathcal R_N(R,\mu^N)} E_{p^N}\rho_N(X^N,Y^N),$$

where $\mathcal R_N(R,\mu^N)$ is the collection of all distributions $p^N$ for the coordinate random vectors $X^N$ and $Y^N$ on the space $(A^N\times\hat A^N,\mathcal B_{A^N}\times\mathcal B_{\hat A^N})$ with the properties that

(1) $p^N$ induces the given marginal $\mu^N$; that is, $p^N(F\times\hat A^N) = \mu^N(F)$ for all $F\in\mathcal B_{A^N}$, and

(2) the mutual information satisfies

$$\frac1N I_{p^N}(X^N;Y^N) \le R.$$

If $\mathcal R_N(R,\mu^N)$ is empty, then $D_N(R,\mu^N)$ is $\infty$. $D_N$ is called the $N$th order distortion-rate function.

Lemma 10.6.1: $D_N(R,\mu)$ and $D(R,\mu)$ are nonnegative convex functions of $R$ and hence are continuous in $R$ for $R > 0$.

Proof: Nonnegativity is obvious from the nonnegativity of the distortion. Suppose that $p_i\in\mathcal R_N(R_i,\mu^N),\ i = 1,2$, yields

$$E_{p_i}\rho_N(X^N,Y^N) \le D_N(R_i,\mu) + \epsilon.$$

From Corollary 5.5.5 mutual information is a convex function of the conditional distribution and hence if $\bar p = \lambda p_1 + (1-\lambda)p_2$, then

$$I_{\bar p} \le \lambda I_{p_1} + (1-\lambda)I_{p_2} \le \lambda R_1 + (1-\lambda)R_2$$

and hence $\bar p\in\mathcal R_N(\lambda R_1 + (1-\lambda)R_2,\mu^N)$ and therefore

$$D_N(\lambda R_1 + (1-\lambda)R_2,\mu) \le E_{\bar p}\rho_N(X^N,Y^N)$$
$$= \lambda E_{p_1}\rho_N(X^N,Y^N) + (1-\lambda)E_{p_2}\rho_N(X^N,Y^N)$$
$$\le \lambda D_N(R_1,\mu) + (1-\lambda)D_N(R_2,\mu) + \epsilon.$$

Since $\epsilon$ is arbitrary, $D_N(R,\mu)$ is convex in $R$. Since $D(R,\mu)$ is the limit of $D_N(R,\mu)/N$, it too is convex. It is well known from real analysis that convex functions are continuous except possibly at their end points. $\Box$
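For a memoryless source with a single-letter distortion, the first order function $D_1(R,\mu)$ can be computed numerically by Blahut's classical alternating minimization, parameterized by a slope $s \le 0$ that selects a point on the curve. The routine below is an illustrative sketch (the function and variable names are mine, not from the text).

```python
import math

def blahut_point(mu, rho, s, iters=3000):
    """One point on the distortion-rate curve of a memoryless source.
    mu: source distribution (list); rho[x][y]: per-letter distortion;
    s <= 0: slope parameter. Returns (R, D) with R in nats."""
    ny = len(rho[0])
    q = [1.0 / ny] * ny   # output marginal, iterated toward a fixed point
    for _ in range(iters):
        c = [sum(q[y] * math.exp(s * rho[x][y]) for y in range(ny))
             for x in range(len(mu))]
        # update q(y) = sum_x mu(x) p(y|x) with p(y|x) = q(y)e^{s rho}/c_x
        q = [sum(mu[x] * q[y] * math.exp(s * rho[x][y]) / c[x]
                 for x in range(len(mu))) for y in range(ny)]
    R = D = 0.0
    for x in range(len(mu)):
        c = sum(q[y] * math.exp(s * rho[x][y]) for y in range(ny))
        for y in range(ny):
            p = q[y] * math.exp(s * rho[x][y]) / c   # optimal p(y|x)
            if p > 0:
                R += mu[x] * p * math.log(p / q[y])
                D += mu[x] * p * rho[x][y]
    return R, D

# uniform binary source, Hamming distortion
R, D = blahut_point([0.5, 0.5], [[0, 1], [1, 0]], s=-2.0)
```

For this source the computed point should satisfy the known curve $R = \ln 2 - h_2(D)$ (natural-log binary entropy); sweeping $s$ traces out $D_1(R,\mu)$.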

The following lemma shows that when the underlying source is stationary and the fidelity criterion is subadditive (e.g., additive), then the limit defining $D(R,\mu)$ is an infimum.

Lemma 10.6.2: If the source is stationary and the fidelity criterion is subadditive, then

$$D(R,\mu) = \lim_{N\to\infty}\frac1N D_N(R,\mu) = \inf_N\frac1N D_N(R,\mu).$$

Proof: Fix $N$ and $n < N$ and let $p^n\in\mathcal R_n(R,\mu^n)$ yield

$$E_{p^n}\rho_n(X^n,Y^n) \le D_n(R,\mu^n) + \frac\epsilon2$$

and let $p^{N-n}\in\mathcal R_{N-n}(R,\mu^{N-n})$ yield

$$E_{p^{N-n}}\rho_{N-n}(X^{N-n},Y^{N-n}) \le D_{N-n}(R,\mu^{N-n}) + \frac\epsilon2.$$

$p^n$ together with $\mu^n$ implies a regular conditional probability $q(F|x^n)$, $F\in\mathcal B_{\hat A^n}$. Similarly $p^{N-n}$ and $\mu^{N-n}$ imply a regular conditional probability $r(G|x^{N-n})$. Define now a regular conditional probability $t(\cdot|x^N)$ by its values on rectangles as

$$t(F\times G|x^N) = q(F|x^n)\,r(G|x_n^{N-n}),\quad F\in\mathcal B_{\hat A^n},\ G\in\mathcal B_{\hat A^{N-n}}.$$

Note that this is the finite dimensional analog of a block memoryless channel with two blocks. Let $p^N = \mu^N t$ be the distribution induced by $\mu^N$ and $t$. Then exactly as in Lemma 9.4.2 we have, because of the conditional independence, that

$$I_{p^N}(X^N;Y^N) \le I_{p^N}(X^n;Y^n) + I_{p^N}(X_n^{N-n};Y_n^{N-n})$$

and hence from stationarity

$$I_{p^N}(X^N;Y^N) \le I_{p^n}(X^n;Y^n) + I_{p^{N-n}}(X^{N-n};Y^{N-n})$$

$$\le nR + (N-n)R = NR,$$

so that $p^N\in\mathcal R_N(R,\mu^N)$. Thus

$$D_N(R,\mu^N) \le E_{p^N}\rho_N(X^N,Y^N) \le E_{p^N}\left[\rho_n(X^n,Y^n) + \rho_{N-n}(X_n^{N-n},Y_n^{N-n})\right]$$
$$= E_{p^n}\rho_n(X^n,Y^n) + E_{p^{N-n}}\rho_{N-n}(X^{N-n},Y^{N-n})$$
$$\le D_n(R,\mu^n) + D_{N-n}(R,\mu^{N-n}) + \epsilon.$$

Thus since $\epsilon$ is arbitrary we have shown that if $d_n = D_n(R,\mu^n)$, then

$$d_N \le d_n + d_{N-n},\quad n \le N;$$

that is, the sequence $d_n$ is subadditive. The lemma then follows immediately from Lemma 7.5.1. $\Box$
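The final step invokes the standard subadditivity (Fekete) lemma: if $d_N \le d_n + d_{N-n}$ for all $n \le N$, then $\lim_N d_N/N = \inf_N d_N/N$. A quick numerical sanity check on an arbitrary subadditive sequence of my own choosing (not one from the text):

```python
import math

# a subadditive sequence: d_n = c*n + sqrt(n)
# (the linear part is additive; sqrt(a+b) <= sqrt(a) + sqrt(b))
c = 0.25
d = lambda n: c * n + math.sqrt(n)

# verify subadditivity d_{m+n} <= d_m + d_n on a grid of pairs
ok = all(d(m + n) <= d(m) + d(n) + 1e-12
         for m in range(1, 200) for n in range(1, 200))

# d_n / n decreases toward its infimum c, as Fekete's lemma promises
ratios = [d(n) / n for n in (10, 100, 10_000, 1_000_000)]
```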

As with the $\bar d$ distance, there are alternative characterizations of the distortion-rate function when the process is stationary. The remainder of this section is devoted to developing these results. The idea of an SBM channel will play an important role in relating $n$th order distortion-rate functions to the process definitions. We henceforth assume that the input source is stationary and we confine interest to additive fidelity criteria based on a per-letter distortion $\rho = \rho_1$.

The basic process DRF is defined by

$$\bar D_s(R,\mu) = \inf_{p\in\bar{\mathcal R}_s(R,\mu)} E_p\rho(X_0,Y_0),$$

where $\bar{\mathcal R}_s(R,\mu)$ is the collection of all stationary processes $p$ having $\mu$ as an input distribution and having mutual information rate $\bar I_p = \bar I_p(X;Y) \le R$. The original idea of a process rate-distortion function was due to Kolmogorov and his colleagues. The idea was later elaborated by Marton and by Gray, Neuhoff, and Omura.

Recalling that the $L^1$ ergodic theorem for information density holds when $\bar I_p = I_p^*$, that is, when the two principal definitions of mutual information rate yield the same value, we also define the process DRF

$$D_s^*(R,\mu) = \inf_{p\in\mathcal R_s^*(R,\mu)} E_p\rho(X_0,Y_0),$$

where $\mathcal R_s^*(R,\mu)$ is the collection of all stationary processes $p$ having $\mu$ as an input distribution, having mutual information rate $\bar I_p \le R$, and having $\bar I_p = I_p^*$.

If $\mu$ is both stationary and ergodic, define the corresponding ergodic process DRFs by

$$\bar D_e(R,\mu) = \inf_{p\in\bar{\mathcal R}_e(R,\mu)} E_p\rho(X_0,Y_0),$$
$$D_e^*(R,\mu) = \inf_{p\in\mathcal R_e^*(R,\mu)} E_p\rho(X_0,Y_0),$$

where $\bar{\mathcal R}_e(R,\mu)$ is the subset of $\bar{\mathcal R}_s(R,\mu)$ containing only ergodic measures and $\mathcal R_e^*(R,\mu)$ is the subset of $\mathcal R_s^*(R,\mu)$ containing only ergodic measures.

Theorem 10.6.1: Given a stationary source $\mu$ which possesses a reference letter in the sense that there exists a letter $a^*\in\hat A$ such that

$$E_\mu\rho(X_0,a^*) \le \rho^* < \infty. \tag{10.15}$$

Fix $R > 0$. If $D(R,\mu) < \infty$, then

$$D(R,\mu) = \bar D_s(R,\mu) = D_s^*(R,\mu).$$

If in addition $\mu$ is ergodic, then also

$$D(R,\mu) = \bar D_e(R,\mu) = D_e^*(R,\mu).$$

The proof of the theorem depends strongly on the relations among distortion and mutual information for vectors and for SBM channels. These are stated and proved in the following lemma, the proof of which is straightforward but somewhat tedious. The theorem is proved after the lemma.

Lemma 10.6.3: Let $\mu$ be the process distribution of a stationary source $\{X_n\}$. Let $\rho_n,\ n = 1,2,\cdots$, be a subadditive (e.g., additive) fidelity criterion. Suppose that there is a reference letter $a^*\in\hat A$ for which (10.15) holds. Let $p^N$ be a measure on $(A^N\times\hat A^N,\mathcal B_{A^N}\times\mathcal B_{\hat A^N})$ having $\mu^N$ as input marginal; that is, $p^N(F\times\hat A^N) = \mu^N(F)$ for $F\in\mathcal B_{A^N}$. Let $q$ denote the induced conditional probability; that is, $q_{x^N}(F)$, $x^N\in A^N$, $F\in\mathcal B_{\hat A^N}$, is a regular conditional probability measure. (This exists because the spaces are standard.) We abbreviate this relationship as $p^N = \mu^N q$. Let $X^N,Y^N$ denote the coordinate functions on $A^N\times\hat A^N$ and suppose that

$$\frac1N E_{p^N}\rho_N(X^N,Y^N) \le D \tag{10.16}$$

and

$$\frac1N I_{p^N}(X^N;Y^N) \le R. \tag{10.17}$$

If $\nu$ is an $(N,\delta)$ SBM channel induced by $q$ as in Example 9.4.11 and if $p = \mu\nu$ is the resulting hookup and $\{X_n,Y_n\}$ the input/output pair process, then

$$\frac1N E_p\rho_N(X^N,Y^N) \le D + \rho^*\delta \tag{10.18}$$

and

$$\bar I_p(X;Y) = I_p^*(X;Y) \le R; \tag{10.19}$$

that is, the resulting mutual information rate of the induced stationary process satisfies the same inequality as the vector mutual information, and the resulting distortion approximately satisfies the vector inequality provided $\delta$ is sufficiently small. Observe that if the fidelity criterion is additive, then (10.18) becomes

$$E_p\rho_1(X_0,Y_0) \le D + \rho^*\delta.$$


Proof: We first consider the distortion as it is easier to handle. Since the SBM channel is stationary and the source is stationary, the hookup $p$ is stationary and

$$\frac1n E_p\rho_n(X^n,Y^n) = \frac1n\int dm_Z(z)\,E_{p_z}\rho_n(X^n,Y^n),$$

where $p_z$ is the conditional distribution of $\{X_n,Y_n\}$ given $\{Z_n\}$. Note that the above formula reduces to $E_p\rho_1(X_0,Y_0)$ if the fidelity criterion is additive because of the stationarity. Given $z$, define $J_0^n(z)$ to be the collection of indices $i$ of $z^n$ for which $z_i$ is not in an $N$-cell. (See the discussion in Example 9.4.11.) Let $J_1^n(z)$ be the collection of indices for which $z_i$ begins an $N$-cell. If we define the event $G = \{z : z_0\ \text{begins an}\ N\text{-cell}\}$, then $i\in J_1^n(z)$ if $T^iz\in G$. From Corollary 9.4.3, $m_Z(G) \le N^{-1}$. Since $\mu$ is stationary and $\{X_n\}$ and $\{Z_n\}$ are mutually independent,

$$E_{p_z}\rho_n(X^n,Y^n) \le \sum_{i\in J_0^n(z)} E_{p_z}\rho(X_i,a^*) + \sum_{i\in J_1^n(z)} E_{p_z}\rho_N(X_i^N,Y_i^N)$$
$$\le \rho^*\sum_{i=0}^{n-1}1_{G^c}(T^iz) + E_{p^N}\rho_N\sum_{i=0}^{n-1}1_G(T^iz).$$

Since $m_Z$ is stationary, integrating the above and dividing by $n$ we have that

$$\frac1n E_p\rho_n(X^n,Y^n) \le \rho^* m_Z(G^c) + N m_Z(G)\,\frac1N E_{p^N}\rho_N \le \rho^*\delta + \frac1N E_{p^N}\rho_N,$$

since $m_Z(G^c) \le \delta$ and $N m_Z(G) \le 1$, proving (10.18).

Let $r_m$ and $t_m$ denote asymptotically accurate quantizers on $A$ and $\hat A$; that is, as in Corollary 6.2.1 define

$$\hat X^n = r_m(X)^n = (r_m(X_0),\cdots,r_m(X_{n-1}))$$

and similarly define $\hat Y^n = t_m(Y)^n$. Then

$$I(r_m(X)^n;t_m(Y)^n) \mathop{\longrightarrow}_{m\to\infty} I(X^n;Y^n)$$

and

$$\bar I(r_m(X);t_m(Y)) \mathop{\longrightarrow}_{m\to\infty} I^*(X;Y).$$

We wish to prove that

$$\bar I(X;Y) = \lim_{n\to\infty}\lim_{m\to\infty}\frac1n I(r_m(X)^n;t_m(Y)^n) = \lim_{m\to\infty}\lim_{n\to\infty}\frac1n I(r_m(X)^n;t_m(Y)^n) = I^*(X;Y).$$

Since $\bar I \ge I^*$, we must show that

$$\lim_{n\to\infty}\lim_{m\to\infty}\frac1n I(r_m(X)^n;t_m(Y)^n) \le \lim_{m\to\infty}\lim_{n\to\infty}\frac1n I(r_m(X)^n;t_m(Y)^n).$$

We have that

$$I(\hat X^n;\hat Y^n) = I((\hat X^n,Z^n);\hat Y^n) - I(Z^n;\hat Y^n|\hat X^n)$$

and

$$I((\hat X^n,Z^n);\hat Y^n) = I(Z^n;\hat Y^n) + I(\hat X^n;\hat Y^n|Z^n).$$

Similarly, since $\hat X^n$ and $Z^n$ are independent,

$$I(Z^n;\hat Y^n|\hat X^n) = H(Z^n|\hat X^n) - H(Z^n|\hat X^n,\hat Y^n) = H(Z^n) - H(Z^n|\hat X^n,\hat Y^n) = I(Z^n;(\hat X^n,\hat Y^n)).$$

Thus we need to show that

$$\lim_{n\to\infty}\lim_{m\to\infty}\left(\frac1n I(r_m(X)^n;t_m(Y)^n|Z^n) - \frac1n I(Z^n;(r_m(X)^n,t_m(Y)^n))\right)$$
$$\le \lim_{m\to\infty}\lim_{n\to\infty}\left(\frac1n I(r_m(X)^n;t_m(Y)^n|Z^n) - \frac1n I(Z^n;(r_m(X)^n,t_m(Y)^n))\right).$$

Since $Z^n$ has a finite alphabet, the limits of $n^{-1}I(Z^n;(r_m(X)^n,t_m(Y)^n))$ are the same regardless of the order, from Theorem 6.4.1. Thus $\bar I$ will equal $I^*$ if we can show that

$$\bar I(X;Y|Z) = \lim_{n\to\infty}\lim_{m\to\infty}\frac1n I(r_m(X)^n;t_m(Y)^n|Z^n)$$
$$\le \lim_{m\to\infty}\lim_{n\to\infty}\frac1n I(r_m(X)^n;t_m(Y)^n|Z^n) = I^*(X;Y|Z). \tag{10.20}$$

This we now proceed to do. From Lemma 5.5.7 we can write

$$I(r_m(X)^n;t_m(Y)^n|Z^n) = \int I(r_m(X)^n;t_m(Y)^n|Z^n = z^n)\,dP_{Z^n}(z^n).$$

Abbreviate $I(r_m(X)^n;t_m(Y)^n|Z^n = z^n)$ to $I_z(\hat X^n;\hat Y^n)$. This is simply the mutual information between $\hat X^n$ and $\hat Y^n$ under the distribution for $(\hat X^n,\hat Y^n)$ given a particular random blocking sequence $z$. We have that

$$I_z(\hat X^n;\hat Y^n) = H_z(\hat Y^n) - H_z(\hat Y^n|\hat X^n).$$

Given $z$, let $J_0^n(z)$ be as before. Let $J_2^n(z)$ denote the collection of all indices $i$ for which $z_i$ begins an $N$-cell, except for the final such index (which may begin an $N$-cell not completed within $z^n$). Thus $J_2^n(z)$ is the same as $J_1^n(z)$ except that the largest index in the latter collection may have been removed if the resulting $N$-cell was not completed within the $n$-tuple. We have, using standard entropy relations, that

$$I_z(\hat X^n;\hat Y^n) \ge \sum_{i\in J_0^n(z)}\left(H_z(\hat Y_i|\hat Y^i) - H_z(\hat Y_i|\hat Y^i,\hat X^{i+1})\right) + \sum_{i\in J_2^n(z)}\left(H_z(\hat Y_i^N|\hat Y^i) - H_z(\hat Y_i^N|\hat Y^i,\hat X^{i+N})\right). \tag{10.21}$$

For $i\in J_0^n(z)$, however, $Y_i$ is $a^*$ with probability one and hence

$$H_z(\hat Y_i|\hat Y^i) \le H_z(\hat Y_i) \le H_z(Y_i) = 0$$

and

$$H_z(\hat Y_i|\hat Y^i,\hat X^{i+1}) \le H_z(\hat Y_i) \le H_z(Y_i) = 0.$$

Thus we have the bound

$$I_z(\hat X^n;\hat Y^n) \ge \sum_{i\in J_2^n(z)}\left(H_z(\hat Y_i^N|\hat Y^i) - H_z(\hat Y_i^N|\hat Y^i,\hat X^{i+N})\right)$$
$$= \sum_{i\in J_2^n(z)}\left(I_z(\hat Y_i^N;(\hat Y^i,\hat X^{i+N})) - I_z(\hat Y_i^N;\hat Y^i)\right)$$
$$\ge \sum_{i\in J_2^n(z)}\left(I_z(\hat Y_i^N;\hat X_i^N) - I_z(\hat Y_i^N;\hat Y^i)\right), \tag{10.22}$$

where the last inequality follows from the fact that $I(U;(V,W)) \ge I(U;V)$. For $i\in J_2^n(z)$ we have, by construction and the stationarity of $\mu$, that

$$I_z(\hat X_i^N;\hat Y_i^N) = I_{p^N}(\hat X^N;\hat Y^N). \tag{10.23}$$

As before let $G = \{z : z_0\ \text{begins an}\ N\text{-cell}\}$. Then $i\in J_2^n(z)$ if $T^iz\in G$ and $i < n-N$, and we can write

$$\frac1n I_z(\hat X^n;\hat Y^n) \ge \frac1n I_{p^N}(\hat X^N;\hat Y^N)\sum_{i=0}^{n-N-1}1_G(T^iz) - \frac1n\sum_{i=0}^{n-N-1}I_z(\hat Y_i^N;\hat Y^i)1_G(T^iz).$$

All of the above terms are measurable functions of $z$ and are nonnegative. Hence they are integrable (although we do not yet know if the integral is finite) and we have that

$$\frac1n I(\hat X^n;\hat Y^n|Z^n) \ge \frac{n-N}{n}\,I_{p^N}(\hat X^N;\hat Y^N)\,m_Z(G) - \frac1n\sum_{i=0}^{n-N-1}\int dm_Z(z)\,I_z(\hat Y_i^N;\hat Y^i)1_G(T^iz).$$

To continue we use the fact that since the processes are stationary, we can consider them to be two sided (if they are one sided, we can imbed them in a two sided process with the same probabilities on rectangles). By construction

$$I_z(\hat Y_i^N;\hat Y^i) = I_{T^iz}(\hat Y_0^N;(\hat Y_{-i},\cdots,\hat Y_{-1}))$$

and hence since $m_Z$ is stationary we can change variables to obtain

$$\frac1n I(\hat X^n;\hat Y^n|Z^n) \ge \frac{n-N}{n}\,I_{p^N}(\hat X^N;\hat Y^N)\,m_Z(G) - \frac1n\sum_{i=0}^{n-N-1}\int_G dm_Z(z)\,I_z(\hat Y_0^N;(\hat Y_{-i},\cdots,\hat Y_{-1})).$$

We obtain a further bound from the inequalities

$$I_z(\hat Y_0^N;(\hat Y_{-i},\cdots,\hat Y_{-1})) \le I_z(Y_0^N;(Y_{-i},\cdots,Y_{-1})) \le I_z(Y_0^N;Y^-),$$

where $Y^- = (\cdots,Y_{-2},Y_{-1})$. Since $I_z(Y_0^N;Y^-)$ is measurable and nonnegative, its integral is defined and hence

$$\lim_{n\to\infty}\frac1n I(\hat X^n;\hat Y^n|Z^n) \ge I_{p^N}(\hat X^N;\hat Y^N)\,m_Z(G) - \int_G dm_Z(z)\,I_z(Y_0^N;Y^-).$$

We can now take the limit as $m\to\infty$ to obtain

$$I^*(X;Y|Z) \ge I_{p^N}(X^N;Y^N)\,m_Z(G) - \int_G dm_Z(z)\,I_z(Y_0^N;Y^-). \tag{10.24}$$

This provides half of what we need.

Analogous to (10.21) we have the upper bound

$$I_z(\hat X^n;\hat Y^n) \le \sum_{i\in J_1^n(z)}\left(I_z(\hat Y_i^N;(\hat Y^i,\hat X^{i+N})) - I_z(\hat Y_i^N;\hat Y^i)\right). \tag{10.25}$$

We note in passing that the use of $J_1$ here assumes that we are dealing with a one sided channel and hence there is no contribution to the information from any initial symbols not contained in the first $N$-cell. In the two sided case time 0 could occur in the middle of an $N$-cell, and one could fix the upper bound by adding to the above sum the first index less than 0 for which $z_i$ begins an $N$-cell. This term has no effect on the limits. Taking the limits as $m\to\infty$ using Lemma 5.5.1 we have that

$$I_z(X^n;Y^n) \le \sum_{i\in J_1^n(z)}\left(I_z(Y_i^N;(Y^i,X^{i+N})) - I_z(Y_i^N;Y^i)\right).$$

Given $Z^n = z^n$ and $i\in J_1^n(z)$, $(X^i,Y^i) \to X_i^N \to Y_i^N$ forms a Markov chain because of the conditional independence, and hence from Lemma 5.5.2 and Corollary 5.5.3

$$I_z(Y_i^N;(Y^i,X^{i+N})) = I_z(X_i^N;Y_i^N) = I_{p^N}(X^N;Y^N).$$