
11.7 Sliding Block Source Codes

components. Since the components are ergodic, however, an infinite length sliding block encoder could select such a source sequence in a simple (if impractical) way: Proceed as in the proof of the theorem up to the use of Corollary 9.4.2. Instead of using this result, we construct by brute force a punctuation sequence for the ergodic component in effect. Suppose that $\mathcal{G} = \{G_i;\ i = 1, 2, \ldots\}$ is a countable generating field for the input sequence space. Given $\delta$, the infinite length sliding block encoder first finds the smallest value of $i$ for which

$$0 < \lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} 1_{G_i}(T^k x),$$

and

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} 1_{G_i}(T^k x)\, \rho(x_k, a^*) \le \delta,$$

that is, we find a set with strictly positive relative frequency (and hence strictly positive probability with respect to the ergodic component in effect) which occurs rarely enough to ensure that the sample average distortion between the symbols produced when $G_i$ occurs and the reference letter is smaller than $\delta$. Given $N$ and $\delta$ there must exist an $i$ for which these relations hold (apply the proof of Lemma 9.4.4 to the ergodic component in effect with $\gamma$ chosen to satisfy (11.23) for that component and then replace the arbitrary set $G$ by a set in the generating field having very close probability). Analogous to the proof of Lemma 9.4.4 we construct a punctuation sequence $\{Z_n\}$ using the event $G_i$ in place of $G$. The proof then follows in a like manner except that now from the dominated convergence theorem we have that

$$E\big(1_2(Z_0)\,\rho(X_0, a^*)\big) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} E\big(1_2(Z_i)\,\rho(X_i, a^*)\big) = E\Big(\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} 1_2(Z_i)\,\rho(X_i, a^*)\Big) \le \delta$$

by construction.
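To make the selection rule concrete, the following sketch (hypothetical code, not from the text) mimics the search using empirical relative frequencies computed from one long sample path; in the proof the two conditions are stated for limiting relative frequencies along the ergodic component rather than finite-sample estimates.

```python
# Hypothetical illustration of the brute-force search described above: scan the
# candidate generating-field events G_1, G_2, ... and return the first index whose
# empirical relative frequency along the sample path is positive while the
# accumulated per-occurrence distortion to the reference letter stays within delta.

def find_punctuation_event(x, candidate_sets, rho, a_star, delta):
    """x: a finite sample path; candidate_sets: predicates G_i(x, k) deciding
    whether the shifted sequence T^k x lies in G_i; rho: single-letter distortion;
    a_star: the reference letter; delta: the distortion budget."""
    n = len(x)
    for i, G in enumerate(candidate_sets):
        hits = [k for k in range(n) if G(x, k)]            # occurrences of G_i along the orbit
        freq = len(hits) / n                                # empirical relative frequency
        cost = sum(rho(x[k], a_star) for k in hits) / n     # empirical distortion contribution
        if freq > 0 and cost <= delta:
            return i                                        # smallest admissible index
    return None                                             # none found at this sample size
```

The returned event plays the role of $G_i$ in building the punctuation sequence $\{Z_n\}$.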

The above argument is patterned after that of Davisson and Gray [27] and extends the theorem to stationary nonergodic sources if infinite window sliding block encoders are allowed. We can then approximate this encoder by a finite-window encoder, but we must make additional assumptions to ensure that the resulting encoder yields a good approximation in the sense of overall distortion. Suppose that $f$ is the infinite window length encoder and $g$ is the finite window-length (say $2L+1$) decoder. Let $\mathcal{G}$ denote a countable generating field of rectangles for the input sequence space. Then from Corollary 4.2.2 applied to $\mathcal{G}$, given $\epsilon > 0$ we can find for sufficiently large $N$ a finite window sliding block code $r : A^{2N+1} \to B$ such that $\Pr(\bar{r} \ne f) \le \epsilon/(2L+1)$, that is, the two encoders produce the same channel symbol with high probability. The issue is when does this imply that $\rho(fg, \mu)$ and $\rho(rg, \mu)$ are therefore also close, which


would complete the proof. Let $\bar{r} : A^T \to B$ denote the infinite-window sliding block encoder induced by $r$, i.e., $\bar{r}(x) = r(x_{-N}^{2N+1})$. Then

 

$$\rho(fg, \mu) = E\rho(X_0, \hat{X}_0) = \sum_{b \in B^{2L+1}} \int_{x \in V_f(b)} d\mu(x)\, \rho(x_0, g(b)),$$

where

$$V_f(b) = \{x : f(x)^{2L+1} = b\},$$

where $f(x)^{2L+1}$ is shorthand for $f(T^i x)$, $i = -L, \ldots, L$, that is, the channel $(2L+1)$-tuple produced by the source using encoder $f$. We therefore have that

$$\rho(\bar{r}g, \mu) \le \sum_{b \in B^{2L+1}} \int_{x \in V_f(b)} d\mu(x)\, \rho(x_0, g(b)) + \sum_{b \in B^{2L+1}} \int_{x \in V_{\bar{r}}(b) - V_f(b)} d\mu(x)\, \rho(x_0, g(b))$$

$$= \rho(fg, \mu) + \sum_{b \in B^{2L+1}} \int_{x \in V_{\bar{r}}(b) - V_f(b)} d\mu(x)\, \rho(x_0, g(b)).$$

By making $N$ large enough, however, we can make $\mu(V_{\bar{r}}(b) - V_f(b))$ arbitrarily small simultaneously for all $b \in B^{2L+1}$ and hence force all of the integrals above to be arbitrarily small by the continuity of integration. With Lemma 11.7.1 and Theorem 11.7.1 this completes the proof of the following theorem.

Theorem 11.7.2: Given an AMS source $\mu$ and an additive fidelity criterion with a reference letter,

$$\delta_{SBC}(R, \mu) = \delta(R, \mu),$$

that is, the class of sliding block codes is capable of exactly the same performance as the class of block codes.

The sliding block source coding theorem immediately yields an alternative coding theorem for a code structure known as trellis encoding source codes wherein the sliding block decoder is kept but the encoder is replaced by a tree or trellis search algorithm such as the Viterbi algorithm [41]. The details of inferring the trellis encoding source coding theorem from the sliding-block source coding theorem can be found in [52].


11.8 A Geometric Interpretation of OPTA's

We close this chapter on source coding theorems with a geometric interpretation of the OPTA functions in terms of the $\bar{\rho}$ distortion between sources. Suppose that $\mu$ is a stationary and ergodic source and that $\{\rho_n\}$ is an additive fidelity criterion with a reference letter. Suppose that we have a nearly optimal sliding block encoder and decoder for $\mu$ and a channel with rate $R$, that is, the overall process $\{X_n, \hat{X}_n\}$ satisfies

$$E\rho(X_0, \hat{X}_0) \le \delta(R, \mu) + \epsilon.$$

If the overall hookup (source/encoder/channel/decoder) yields a distribution $p$ on $\{X_n, \hat{X}_n\}$ and a distribution $\eta$ on the reproduction process $\{\hat{X}_n\}$, then clearly

$$\bar{\rho}(\mu, \eta) \le \delta(R, \mu) + \epsilon,$$

since $p$ is a stationary joining of $\mu$ and $\eta$ and $\bar{\rho}(\mu, \eta)$ is the infimum of $E\rho(X_0, \hat{X}_0)$ over such joinings.

Furthermore, since the channel alphabet is $B$, the channel process must have entropy rate less than $R = \log \|B\|$ and hence the reproduction process must also have entropy rate less than $R$ from Corollary 4.2.5. Since $\epsilon$ is arbitrary,

$$\delta(R, \mu) \ge \inf_{\eta : \bar{H}(\eta) \le R} \bar{\rho}(\mu, \eta).$$

Suppose next that $\eta$ is stationary and ergodic and that $\bar{H}(\eta) \le R$. Choose a stationary $p$ having $\mu$ and $\eta$ as coordinate processes such that

$$E_p \rho(X_0, Y_0) \le \bar{\rho}(\mu, \eta) + \epsilon.$$

We have easily that $\bar{I}(X; Y) \le \bar{H}(\eta) \le R$ and hence the left hand side is bounded below by the process distortion-rate function $D_s(R, \mu)$. From Theorem 10.6.1 and the block source coding theorem, however, this is just the OPTA function. We have therefore proved the following:

Theorem 11.8.1: Let $\mu$ be a stationary and ergodic source and let $\{\rho_n\}$ be an additive fidelity criterion with a reference letter. Then

$$\delta(R, \mu) = \inf_{\eta : \bar{H}(\eta) \le R} \bar{\rho}(\mu, \eta),$$

that is, the OPTA function (and hence the distortion-rate function) of a stationary ergodic source is just the "distance" in the $\bar{\rho}$ sense to the nearest stationary and ergodic process with the specified reproduction alphabet and with entropy rate less than $R$.

This result originated in [55].
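As a standard illustration of how the theorem can be read (a textbook special case, not an example worked here), consider a binary equiprobable i.i.d. source $\mu$ with the Hamming distortion $\rho(x, \hat{x}) = 1_{x \ne \hat{x}}$, for which $\bar{\rho}$ reduces to Ornstein's $\bar{d}$ distance. With logarithms to base 2 the distortion-rate function of this source is $h_2^{-1}(1-R)$ for $0 \le R \le 1$, where $h_2$ is the binary entropy function and $h_2^{-1}$ its inverse on $[0, \tfrac{1}{2}]$, so the theorem states that

$$\inf_{\eta :\, \bar{H}(\eta) \le R} \bar{d}(\mu, \eta) = h_2^{-1}(1 - R),$$

that is, no stationary and ergodic binary process with entropy rate at most $R$ bits is closer to the fair coin process in $\bar{d}$ than the distortion-rate bound allows.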


Chapter 12

Coding for noisy channels

12.1 Noisy Channels

In the treatment of source coding the communication channel was assumed to be noiseless. If the channel is noisy, then the coding strategy must be different. Now some form of error control is required to undo the damage caused by the channel. The overall communication problem is usually broken into two pieces: A source coder is designed for a noiseless channel with a given resolution or rate and an error correction code is designed for the actual noisy channel in order to make it appear almost noiseless. The combination of the two codes then provides the desired overall code or joint source and channel code. This division is natural in the sense that optimizing a code for a particular source may suggest quite different structure than optimizing it for a channel. The structures must be compatible at some point, however, so that they can be used together.

This division of source and channel coding is apparent in the subdivision of this chapter. We shall begin with a basic lemma due to Feinstein [38] which is at the basis of traditional proofs of coding theorems for channels. It does not consider a source at all, but finds for a given conditional distribution the maximum number of inputs which lead to outputs which can be distinguished with high probability. Feinstein's lemma can be thought of as a channel coding theorem for a channel which is used only once and which has no past or future. The lemma immediately provides a coding theorem for the special case of a channel which has no input memory or anticipation. The difficulties enter when the conditional distributions of output blocks given input blocks depend on previous or future inputs. This difficulty is handled by imposing some form of continuity on the channel with respect to its input, that is, by assuming that if the channel input is known for a big enough block, then the conditional probability of outputs during the same block is known nearly exactly regardless of previous or future inputs. The continuity condition which we shall consider is that of $\bar{d}$-continuous channels. Joint source and channel codes have been obtained for more general channels called weakly continuous channels (see, e.g., Kieffer [80] [81]), but these results require a variety of techniques not yet considered here and do not follow as a direct descendant of Feinstein's lemma.

Block codes are extended to sliding-block codes in a manner similar to that for source codes: First it is shown that asynchronous block codes can be synchronized and then that the block codes can be "stationarized" by the insertion of random punctuation. The approach to synchronizing channel codes is based on a technique of Dobrushin [33].

We consider stationary channels almost exclusively, thereby not including interesting nonstationary channels such as finite state channels with an arbitrary starting state. We will discuss such generalizations and we point out that they are straightforward for two-sided processes, but the general theory of AMS channels for one-sided processes is not in a satisfactory state. Lastly, we emphasize ergodic channels. In fact, for the sliding block codes the channels are also required to be totally ergodic, that is, ergodic with respect to all block shifts.

As previously discussed, we emphasize digital, i.e., discrete, channels. A few of the results, however, are as easily proved under somewhat more general conditions and hence we shall do so. For example, given the background of this book it is actually easier to write things in terms of measures and integrals than in terms of sums over probability mass functions. This additional generality will also permit at least a description of how the results extend to continuous alphabet channels.

12.2 Feinstein's Lemma

Let $(A, \mathcal{B}_A)$ and $(B, \mathcal{B}_B)$ be measurable spaces called the input space and the output space, respectively. Let $P_X$ denote a probability distribution on $(A, \mathcal{B}_A)$ and let $\nu(F \mid x)$, $F \in \mathcal{B}_B$, $x \in A$, denote a regular conditional probability distribution on the output space. $\nu$ can be thought of as a "channel" with random variables as input and output instead of sequences. Define the hookup $P_X\nu = P_{XY}$ by

$$P_{XY}(F) = \int dP_X(x)\, \nu(F_x \mid x).$$

Let $P_Y$ denote the induced output distribution and let $P_X \times P_Y$ denote the resulting product distribution. Assume that $P_{XY} \ll (P_X \times P_Y)$ and define the Radon-Nikodym derivative

$$f = \frac{dP_{XY}}{d(P_X \times P_Y)} \tag{12.1}$$

and the information density

$$i(x, y) = \ln f(x, y).$$

We use abbreviated notation for densities when the meanings should be clear from context, e.g., $f$ instead of $f_{XY}$. Observe that for any set $F$

$$\int_F dP_X(x) \left( \int dP_Y(y)\, f(x, y) \right) = \int_{F \times B} d(P_X \times P_Y)(x, y)\, f(x, y) = \int_{F \times B} dP_{XY}(x, y) = P_{XY}(F \times B) \le 1$$

and hence

$$\int dP_Y(y)\, f(x, y) \le 1, \quad P_X\text{-a.e.} \tag{12.2}$$
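For finite alphabets with $P_X(x) > 0$ and $P_Y(y) > 0$ these quantities take the familiar elementary form (a restatement of the definitions above, not an addition to them):

$$f(x, y) = \frac{P_{XY}(x, y)}{P_X(x) P_Y(y)} = \frac{\nu(y \mid x)}{P_Y(y)}, \qquad i(x, y) = \ln \frac{\nu(y \mid x)}{P_Y(y)},$$

and

$$\int dP_Y(y)\, f(x, y) = \sum_{y \in B} \nu(y \mid x) = 1,$$

so that in the discrete case (12.2) holds with equality for every $x$ with $P_X(x) > 0$.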

Feinstein's lemma shows that we can pick $M$ inputs $\{x_i \in A;\ i = 1, 2, \ldots, M\}$ and a corresponding collection of $M$ disjoint output events $\{\Gamma_i \in \mathcal{B}_B;\ i = 1, 2, \ldots, M\}$ with the property that given an input $x_i$, with high probability the output will be in $\Gamma_i$. We call the collection $\mathcal{C} = \{x_i, \Gamma_i;\ i = 1, 2, \ldots, M\}$ a code with codewords $x_i$ and decoding regions $\Gamma_i$. We do not require that the $\Gamma_i$ exhaust $B$.

The generalization of Feinstein's original proof for finite alphabets to general measurable spaces is due to Kadota [70] and the following proof is based on his.

Lemma 12.2.1 (Feinstein's Lemma): Given an integer $M$ and $a > 0$ there exist $x_i \in A$, $i = 1, \ldots, M$, and a measurable partition $\{\Gamma_i;\ i = 1, \ldots, M\}$ of $B$ such that

$$\nu(\Gamma_i^c \mid x_i) \le M e^{-a} + P_{XY}(i \le a), \quad i = 1, \ldots, M.$$

Proof: Define $G = \{x, y : i(x, y) > a\}$ and set $\epsilon = M e^{-a} + P_{XY}(i \le a) = M e^{-a} + P_{XY}(G^c)$. The result is obvious if $\epsilon \ge 1$ and hence we assume that $\epsilon < 1$, and hence also that

$$P_{XY}(G^c) \le \epsilon < 1$$

and therefore that

$$P_{XY}(i > a) = P_{XY}(G) = \int dP_X(x)\, \nu(G_x \mid x) > 1 - \epsilon > 0.$$

This implies that the set $\tilde{A} = \{x : \nu(G_x \mid x) > 1 - \epsilon\}$ must have positive measure under $P_X$. We now construct a code consisting of input points $x_i$ and output sets $\Gamma_{x_i}$. Choose an $x_1 \in \tilde{A}$ and define $\Gamma_{x_1} = G_{x_1}$. Next choose if possible a point $x_2 \in \tilde{A}$ for which $\nu(G_{x_2} - \Gamma_{x_1} \mid x_2) > 1 - \epsilon$. Continue in this way until either $M$ points have been selected or all the points in $\tilde{A}$ have been exhausted. In particular, given the pairs $\{x_j, \Gamma_{x_j}\}$, $j = 1, 2, \ldots, i-1$, satisfying the condition, find an $x_i$ for which

$$\nu\Big(G_{x_i} - \bigcup_{j < i} \Gamma_{x_j} \,\Big|\, x_i\Big) > 1 - \epsilon \tag{12.3}$$

and set $\Gamma_{x_i} = G_{x_i} - \bigcup_{j < i} \Gamma_{x_j}$. If the procedure terminates before $M$ points have been collected, denote the final point's index by $n$. Observe that

$$\nu(\Gamma_{x_i}^c \mid x_i) \le \nu\Big(\big(G_{x_i} - \bigcup_{j < i} \Gamma_{x_j}\big)^c \,\Big|\, x_i\Big) \le \epsilon, \quad i = 1, 2, \ldots, n,$$

and hence the lemma will be proved if we can show that necessarily $n$ cannot be strictly less than $M$. We do this by assuming the contrary and finding a contradiction.


Suppose that the selection has terminated at $n < M$ and define the set $F = \bigcup_{i=1}^{n} \Gamma_{x_i} \in \mathcal{B}_B$. Consider the probability

$$P_{XY}(G) = P_{XY}\big(G \cap (A \times F)\big) + P_{XY}\big(G \cap (A \times F^c)\big). \tag{12.4}$$

The first term can be bounded above as

$$P_{XY}\big(G \cap (A \times F)\big) \le P_{XY}(A \times F) = P_Y(F) = \sum_{i=1}^{n} P_Y(\Gamma_{x_i}).$$

We also have from the definitions and from (12.2) that

$$P_Y(\Gamma_{x_i}) = \int_{\Gamma_{x_i}} dP_Y(y) \le \int_{G_{x_i}} dP_Y(y) \le e^{-a} \int_{G_{x_i}} dP_Y(y)\, f(x_i, y) \le e^{-a} \int dP_Y(y)\, f(x_i, y) \le e^{-a}$$

and hence

$$P_{XY}\big(G \cap (A \times F)\big) \le n e^{-a}. \tag{12.5}$$

 

 

 

Consider the second term of (12.4):

$$P_{XY}\big(G \cap (A \times F^c)\big) = \int dP_X(x)\, \nu\big((G \cap (A \times F^c))_x \,\big|\, x\big) = \int dP_X(x)\, \nu(G_x \cap F^c \mid x) = \int dP_X(x)\, \nu\Big(G_x - \bigcup_{i=1}^{n} \Gamma_{x_i} \,\Big|\, x\Big). \tag{12.6}$$

 

 

 

\

 

 

 

[

 

We must have, however, that

n

[

(Gx ¡ ¡ijx) 1 ¡ †

i=1

with PX probability 1 or there would be a point xn+1 for which

n+1

[

(Gxn+1 ¡ ¡ijxn+1) > 1 ¡ †;

i=1

that is, (12.3) would hold for i = n + 1, contradicting the deflnition of n as the largest integer for which (12.3) holds. Applying this observation to (12.6) yields

\

PXY (G (A £ F c)) 1 ¡ † which with (12.4) and (12.5) implies that

PXY (G) • ne¡a + 1 ¡ †: (12.7) From the deflnition of , however, we have also that

PXY (G) = 1 ¡ PXY (Gc) = 1 ¡ † + M e¡a which with (12.7) implies that M • n, completing the proof. 2
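The selection in the proof is easy to mimic numerically. The following finite-alphabet sketch (hypothetical code, not part of the text) builds codewords and disjoint decoding sets by the greedy rule (12.3), computing the information density directly from an input distribution and a one-shot channel matrix:

```python
import numpy as np

# Hypothetical finite-alphabet sketch of the greedy construction in the proof of
# Feinstein's lemma.  Inputs: p_x[x] an input distribution, W[x, y] a one-shot
# channel, M the desired number of codewords, a the density threshold.

def feinstein_code(p_x, W, M, a):
    p_xy = p_x[:, None] * W                      # joint distribution P_XY
    p_y = p_xy.sum(axis=0)                       # output marginal P_Y
    with np.errstate(divide="ignore"):
        dens = np.log(W) - np.log(p_y)[None, :]  # information density i(x, y) = ln(W/P_Y)
    G = dens > a                                 # G_x = {y : i(x, y) > a}
    eps = M * np.exp(-a) + p_xy[~G].sum()        # eps = M e^{-a} + P_XY(i <= a)

    codewords, decoding_sets = [], []
    used = np.zeros(W.shape[1], dtype=bool)      # outputs already claimed by some Gamma_j
    for x in range(len(p_x)):                    # any enumeration of candidate inputs
        cell = G[x] & ~used                      # G_x minus the previously chosen sets
        if W[x, cell].sum() > 1 - eps:           # the selection condition (12.3)
            codewords.append(x)
            decoding_sets.append(cell.copy())
            used |= cell
            if len(codewords) == M:
                break
    return codewords, decoding_sets, eps
```

Because the sketch scans each input once in a fixed order, it can return fewer than $M$ codewords; it is the counting argument just given, not the scan, that guarantees $M$ codewords with error at most $\epsilon = Me^{-a} + P_{XY}(i \le a)$ exist.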


12.3 Feinstein's Theorem

Given a channel $[A, \nu, B]$, an $(M, n, \epsilon)$ block channel code for $\nu$ is a collection $\{w_i, \Gamma_i\}$, $i = 1, 2, \ldots, M$, where $w_i \in A^n$, $\Gamma_i \in \mathcal{B}_{B^n}$, all $i$, with the property that

$$\max_{i=1,\ldots,M}\ \sup_{x \in c(w_i)} \nu_x^n(\Gamma_i^c) \le \epsilon, \tag{12.8}$$

where $c(a^n) = \{x : x^n = a^n\}$ and where $\nu_x^n$ is the restriction of $\nu_x$ to $\mathcal{B}_{B^n}$. The rate of the code is defined as $n^{-1} \log M$. Thus an $(M, n, \epsilon)$ channel code is a collection of $M$ input $n$-tuples and corresponding output cells such that regardless of the past or future inputs, if the input during time 1 to $n$ is a channel codeword, then the output during time 1 to $n$ is very likely to lie in the corresponding output cell. Channel codes will be useful in a communication system because they permit nearly error free communication of a select group of messages or codewords. A communication system can then be constructed for communicating a source over the channel reliably by mapping source blocks into channel codewords. If there are enough channel codewords to assign to all of the source blocks (at least the most probable ones), then that source can be reliably reproduced by the receiver. Hence a fundamental issue for such an application will be the number of messages $M$ or, equivalently, the rate $R$ of a channel code.
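As a concrete and entirely standard instance of this definition (not an example from the text), over a memoryless binary symmetric channel the $n$-fold repetition code with majority-vote decoding regions is an $(M, n, \epsilon)$ block channel code with $M = 2$ and rate $n^{-1}\log 2$; because the channel has no input memory or anticipation, the supremum over past and future inputs in (12.8) is vacuous. A sketch of the $\epsilon$ computation:

```python
from math import comb

# Hypothetical illustration: error probability of majority-vote decoding for the
# n-fold repetition code over a memoryless BSC with crossover probability q (n odd).

def repetition_code_eps(n, q):
    """Probability that a majority of the n independent channel uses flip."""
    assert n % 2 == 1
    return sum(comb(n, k) * q**k * (1 - q)**(n - k) for k in range((n + 1) // 2, n + 1))

# Example: repetition_code_eps(5, 0.1) is about 0.0086, so the codewords 00000 and
# 11111 with their majority-decision regions form a (2, 5, 0.0086) channel code.
```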

Feinstein's lemma can be applied fairly easily to obtain something that resembles a coding theorem for a noisy channel. Suppose that $[A, \nu, B]$ is a channel and $[A, \mu]$ is a source and that $[A \times B, p = \mu\nu]$ is the resulting hookup. Denote the resulting pair process by $\{X_n, Y_n\}$. For any integer $K$ let $p^K$ denote the restriction of $p$ to $(A^K \times B^K, \mathcal{B}_A^K \times \mathcal{B}_B^K)$, that is, the distribution on input/output $K$-tuples $(X^K, Y^K)$. The joint distribution $p^K$ together with the input distribution $\mu^K$ induce a regular conditional probability $\hat{\nu}^K$ defined by $\hat{\nu}^K(F \mid x^K) = \Pr(Y^K \in F \mid X^K = x^K)$. In particular,

$$\hat{\nu}^K(G \mid a^K) = \Pr(Y^K \in G \mid X^K = a^K) = \frac{1}{\mu^K(a^K)} \int_{c(a^K)} \nu_x^K(G)\, d\mu(x), \tag{12.9}$$

where $c(a^K) = \{x : x^K = a^K\}$ is the rectangle of all input sequences with the common $K$-dimensional block $a^K$. We call $\hat{\nu}^K$ the induced $K$-dimensional channel of the channel $\nu$ and the source $\mu$. It is important to note that the induced channel depends on the source as well as on the channel, a fact that will cause some difficulty in applying Feinstein's lemma. An exception to this case which proves to be an easy application is that of a channel without input memory and anticipation, in which case we have from the definitions that

$$\hat{\nu}^K(F \mid a^K) = \nu_x(Y^K \in F), \quad x \in c(a^K).$$
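To see concretely why the induced channel depends on the source, consider (a hypothetical illustration, not an example from the text) a channel whose output at each time depends on the current and the previous input symbol. The average over the unseen previous input in (12.9) then makes the induced one-dimensional channel change with the source:

```python
import numpy as np

# Hypothetical illustration of (12.9) for a channel with one symbol of input memory:
# W[a_prev, a, y] = Pr(Y_0 = y | X_{-1} = a_prev, X_0 = a).  The induced channel
# averages over the previous input according to the source distribution mu.

def induced_channel(mu_pair, W):
    """mu_pair[a_prev, a]: source probability of two consecutive inputs.
    Returns nu_hat[a, y] = Pr(Y_0 = y | X_0 = a)."""
    mu_a = mu_pair.sum(axis=0)                    # marginal of the current input
    joint = np.einsum("pa,pay->ay", mu_pair, W)   # Pr(X_0 = a, Y_0 = y)
    return joint / mu_a[:, None]

# If W does not actually depend on a_prev (no input memory or anticipation), the
# average collapses and nu_hat is independent of mu, as in the display above.
```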

Application of Feinstein's lemma to the induced channel yields the following result, which was proved by Feinstein for stationary finite alphabet channels and is known as Feinstein's theorem:


Lemma 12.3.1: Suppose that $[A \times B, \mu\nu]$ is an AMS and ergodic hookup of a source $\mu$ and channel $\nu$. Let $\bar{I}_{\mu\nu} = \bar{I}_{\mu\nu}(X; Y)$ denote the average mutual information rate and assume that $I^*_{\mu\nu} = \bar{I}_{\mu\nu}$ is finite (as is the case if the alphabets are finite (Theorem 6.4.1) or have the finite-gap information property (Theorem 6.4.3)). Then for any $R < \bar{I}_{\mu\nu}$ and any $\epsilon > 0$ there exists for sufficiently large $n$ a code $\{w_i^n, \Gamma_i;\ i = 1, 2, \ldots, M\}$, where $M = \lfloor e^{nR} \rfloor$, $w_i^n \in A^n$, and $\Gamma_i \in \mathcal{B}_{B^n}$, with the property that

$$\hat{\nu}^n(\Gamma_i^c \mid w_i^n) \le \epsilon, \quad i = 1, 2, \ldots, M. \tag{12.10}$$

Comment: We shall call a code $\{w_i, \Gamma_i;\ i = 1, 2, \ldots, M\}$ which satisfies (12.10) for a channel input process $\mu$ a $(\mu, M, n, \epsilon)$-Feinstein code. The quantity $n^{-1} \log M$ is called the rate of the Feinstein code.

Proof: Let $\eta$ denote the output distribution induced by $\mu$ and $\nu$. Define the information density

$$i_n = \ln \frac{dp^n}{d(\mu^n \times \eta^n)}$$

and define

$$\delta = \frac{\bar{I}_{\mu\nu} - R}{2} > 0.$$

Apply Feinstein's lemma to the $n$-dimensional hookup $(\mu\nu)^n$ with $M = \lfloor e^{nR} \rfloor$ and $a = n(R + \delta)$ to obtain a code $\{w_i, \Gamma_i\}$, $i = 1, 2, \ldots, M$, with

$$\max_i \hat{\nu}^n(\Gamma_i^c \mid w_i^n) \le M e^{-n(R+\delta)} + p^n\big(i_n \le n(R + \delta)\big) = \lfloor e^{nR} \rfloor e^{-n(R+\delta)} + p\Big(\frac{1}{n} i_n(X^n, Y^n) \le R + \delta\Big) \tag{12.11}$$

and hence

$$\max_i \hat{\nu}^n(\Gamma_i^c \mid w_i^n) \le e^{-n\delta} + p\Big(\frac{1}{n} i_n(X^n, Y^n) \le \bar{I}_{\mu\nu} - \delta\Big). \tag{12.12}$$

From Theorem 6.3.1 $n^{-1} i_n$ converges in $L^1$ to $\bar{I}_{\mu\nu}$ and hence it also converges in probability. Thus given $\epsilon$ we can choose an $n$ large enough to ensure that the right hand side of (12.11) is smaller than $\epsilon$, which completes the proof of the theorem. $\Box$

We said that the lemma "resembled" a coding theorem because a real coding theorem would prove the existence of an $(M, n, \epsilon)$ channel code, that is, it would concern the channel $\nu$ itself and not the induced channel $\hat{\nu}$, which depends on a channel input process distribution $\mu$. The difference between a Feinstein code and a channel code is that the Feinstein code has a similar property for an induced channel which in general depends on a source distribution, while the channel code has this property independent of any source distribution and for any past or future inputs.

Feinstein codes will be used to construct block codes for noisy channels. The simplest such construction is presented next.