
10.6. THE DISTORTION-RATE FUNCTION

Thus we have the upper bound
$$\frac{1}{n} I_z(X^n;Y^n) \le \frac{1}{n} I_{p^N}(X^N;Y^N)\sum_{i=0}^{n-1} 1_G(T^iz) \;-\; \frac{1}{n}\sum_{i=0}^{n-1} I_z(Y_i^N;Y^i)\,1_G(T^iz).$$

Taking expectations and using stationarity as before we find that
$$\frac{1}{n} I(X^n;Y^n|Z^n) \le I_{p^N}(X^N;Y^N)\,m_Z(G) \;-\; \frac{1}{n}\sum_{i=0}^{n-1}\int_G dm_Z(z)\, I_z(Y_0^N;(Y_{-i},\cdots,Y_{-1})).$$

Taking the limit as $n \to \infty$ using Lemma 5.6.1 yields
$$\bar{I}(X;Y|Z) \le I_{p^N}(X^N;Y^N)\,m_Z(G) \;-\; \int_G dm_Z(z)\, I_z(Y_0^N;Y^-). \qquad (10.26)$$

Combining this with (10.24) proves that
$$\bar{I}(X;Y|Z) \le I(X;Y|Z)$$
and hence that
$$\bar{I}(X;Y) = I(X;Y).$$

It also proves that
$$\bar{I}(X;Y) = \bar{I}(X;Y|Z) - \bar{I}(Z;(X,Y)) \le \bar{I}(X;Y|Z) \le I_{p^N}(X^N;Y^N)\,m_Z(G) \le \frac{1}{N} I_{p^N}(X^N;Y^N),$$
using Corollary 9.4.3 to bound $m_Z(G)$. This proves (10.19). □

Proof of the theorem: We have immediately that
$$\bar{\mathcal{R}}_e(R,\mu) \subset \bar{\mathcal{R}}_s(R,\mu) \subset \mathcal{R}_s(R,\mu)$$
and
$$\bar{\mathcal{R}}_e(R,\mu) \subset \mathcal{R}_e(R,\mu) \subset \mathcal{R}_s(R,\mu),$$
and hence we have for stationary sources that
$$D_s(R,\mu) \le \bar{D}_s(R,\mu) \qquad (10.27)$$
and for ergodic sources that
$$D_s(R,\mu) \le \bar{D}_s(R,\mu) \le \bar{D}_e(R,\mu) \qquad (10.28)$$
and
$$D_s(R,\mu) \le D_e(R,\mu) \le \bar{D}_e(R,\mu). \qquad (10.29)$$


We next prove that
$$D_s(R,\mu) \ge D(R,\mu). \qquad (10.30)$$

If $D_s(R,\mu)$ is infinite, the inequality is obvious. Otherwise fix $\epsilon > 0$ and choose a $p \in \mathcal{R}_s(R,\mu)$ for which $E_p\rho_1(X_0,Y_0) \le D_s(R,\mu) + \epsilon$, and fix $\delta > 0$ and choose $m$ so large that for $n \ge m$ we have that
$$n^{-1}I_p(X^n;Y^n) \le I_p(X;Y) + \delta \le R + \delta.$$
For $n \ge m$ we therefore have that $p^n \in \mathcal{R}_n(R+\delta,\mu^n)$ and hence
$$D_s(R,\mu) + \epsilon \ge E_p\rho_1(X_0,Y_0) = n^{-1}E_p\rho_n(X^n,Y^n) \ge D_n(R+\delta,\mu^n) \ge D(R+\delta,\mu).$$
From Lemma 10.6.1 $D(R,\mu)$ is continuous in $R$ and hence (10.30) is proved.

Lastly, fix $\epsilon > 0$ and choose $N$ so large and $p^N \in \mathcal{R}_N(R,\mu^N)$ so that
$$N^{-1}E_{p^N}\rho_N \le D_N(R,\mu^N) + \frac{\epsilon}{3} \le D(R,\mu) + \frac{2\epsilon}{3}.$$
Construct the corresponding $(N,\delta)$-SBM channel as in Example 9.4.11 with $\delta$ small enough to ensure that $\delta\rho^* \le \epsilon/3$. Then from Lemma 10.6.2 we have that the resulting hookup $p$ is stationary and that $\bar{I}_p = I_p \le R$, and hence $p \in \bar{\mathcal{R}}_s(R,\mu) \subset \mathcal{R}_s(R,\mu)$. Furthermore, if $\mu$ is ergodic then so is $p$ and hence $p \in \bar{\mathcal{R}}_e(R,\mu) \subset \mathcal{R}_e(R,\mu)$. From Lemma 10.6.2 the resulting distortion is
$$E_p\rho_1(X_0,Y_0) \le N^{-1}E_{p^N}\rho_N + \delta\rho^* \le D(R,\mu) + \epsilon.$$
Since $\epsilon > 0$ is arbitrary, this implies the existence of a $p \in \bar{\mathcal{R}}_s(R,\mu)$ (a $p \in \bar{\mathcal{R}}_e(R,\mu)$ if $\mu$ is ergodic) yielding $E_p\rho_1(X_0,Y_0)$ arbitrarily close to $D(R,\mu)$. Thus for any stationary source
$$\bar{D}_s(R,\mu) \le D(R,\mu)$$
and for any ergodic source
$$\bar{D}_e(R,\mu) \le D(R,\mu).$$
With (10.27)-(10.30) this completes the proof. □
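In summary (our restatement of the bounds just proved, using the bar convention above, not a display from the original), the inequalities sandwich the stationary process definition between two copies of the Shannon distortion-rate function:
$$D(R,\mu) \stackrel{(10.30)}{\le} D_s(R,\mu) \stackrel{(10.27)}{\le} \bar{D}_s(R,\mu) \le D(R,\mu),$$
so all three quantities agree for stationary sources; for ergodic sources, (10.28) and (10.29) extend the same sandwich through $D_e(R,\mu)$ and $\bar{D}_e(R,\mu)$ as well.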

The previous lemma is technical but important. It permits the construction of a stationary and ergodic pair process having rate and distortion near those of a finite-dimensional vector described by the original source and a finite-dimensional conditional probability.


Chapter 11

Source Coding Theorems

11.1 Source Coding and Channel Coding

In this chapter and the next we develop the basic coding theorems of information theory. As is traditional, we consider two important special cases first and then later form the overall result by combining these special cases. In the first case we assume that the channel is noiseless, but it is constrained in the sense that it can only pass R bits per input symbol to the receiver. Since this is usually insufficient for the receiver to perfectly recover the source sequence, we attempt to code the source so that the receiver can recover it with as little distortion as possible. This leads to the theory of source coding, also called source coding subject to a fidelity criterion or data compression, where the latter name reflects the fact that sources with infinite or very large entropy are "compressed" to fit across the given communication link. In the next chapter we ignore the source and focus on a discrete alphabet channel: we construct codes that can communicate any of a finite number of messages with small probability of error, and we quantify how large the message set can be. This operation is called channel coding or error control coding. We then develop joint source and channel codes, which combine source coding and channel coding so as to code a given source for communication over a given channel with minimum average distortion. The ad hoc division into two forms of coding is convenient and will permit performance near that of the OPTA function for the codes considered.

11.2 Block Source Codes for AMS Sources

We first consider a particular class of codes: block codes. For the time being we also concentrate on additive distortion measures. Extensions to subadditive distortion measures will be considered later. Let $\{X_n\}$ be a source with a standard alphabet $A$. Recall that an $(N,K)$ block code of a source $\{X_n\}$ maps successive nonoverlapping input vectors $X_{nN}^N$ into successive channel vectors $U_{nK}^K = \alpha(X_{nN}^N)$, where $\alpha: A^N \to B^K$ is called the source encoder. We assume that the channel is noiseless, but that it is constrained in the sense that $N$ source time units corresponds to the same amount of physical time as $K$ channel time units and that
$$\frac{K}{N}\log\|B\| \le R,$$
where the inequality can be made arbitrarily close to equality by taking $N$ and $K$ large enough subject to the physical stationarity constraint. $R$ is called the source coding rate or resolution in bits or nats per input symbol. We may wish to change the values of $N$ and $K$, but the rate $R$ is fixed.

A reproduction or approximation of the original source is obtained by a source decoder, which we also assume to be a block code. The decoder is a mapping $\beta: B^K \to \hat{A}^N$ which forms the reproduction process $\{\hat{X}_n\}$ via
$$\hat{X}_{nN}^N = \beta(U_{nK}^K),\quad n = 0, 1, 2, \cdots.$$
In general we could have a reproduction dimension different from that of the input vectors provided they corresponded to the same amount of physical time and a suitable distortion measure was defined. We will make the simplifying assumption that they are the same, however.
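To make the bookkeeping concrete, here is a minimal Python sketch (our illustration, not from the text); the maps alpha and beta are hypothetical stand-ins for the encoder $\alpha: A^N \to B^K$ and decoder $\beta: B^K \to \hat{A}^N$, and the assert checks the rate constraint $(K/N)\log\|B\| \le R$.

```python
import math

# Hypothetical (N, K) block code: N source symbols -> K channel symbols.
N, K = 4, 8          # block lengths (source / channel)
B = (0, 1)           # channel alphabet; ||B|| = 2
R = 2.0              # target rate in bits per source symbol

# Rate constraint: (K/N) * log2 ||B|| <= R.
assert (K / N) * math.log2(len(B)) <= R

def alpha(block):
    """Toy source encoder alpha: A^N -> B^K (threshold, then repeat to K symbols)."""
    bits = [1 if x >= 0.5 else 0 for x in block]   # N binary channel symbols
    return tuple(bits + bits)                      # padded out to K symbols

def beta(channel_block):
    """Toy source decoder beta: B^K -> Ahat^N."""
    return tuple(float(b) for b in channel_block[:N])

def code_sequence(source):
    """Apply the block code to successive nonoverlapping N-blocks, producing
    the reproduction process {Xhat_n}; the resulting sequence coder is N-stationary."""
    xhat = []
    for n in range(0, len(source) - N + 1, N):
        xhat.extend(beta(alpha(source[n:n + N])))
    return xhat

print(code_sequence([0.1, 0.9, 0.4, 0.7, 0.2, 0.8, 0.6, 0.3]))
```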

Because $N$ source symbols are mapped into $N$ reproduction symbols, we will often refer to $N$ alone as the block length of the source code. Observe that the resulting sequence coder is $N$-stationary. Our immediate goal is now the following: Let $\mathcal{E}$ and $\mathcal{D}$ denote the collection of all block codes with rate no greater than $R$ and let $\nu$ be the given channel. What is the OPTA function $\Delta(\mu,\mathcal{E},\nu,\mathcal{D})$ for this system? Our first step toward evaluating the OPTA is to find a simpler and equivalent expression for the current special case.

Given a source code consisting of encoder $\alpha$ and decoder $\beta$, define the codebook to be
$$\mathcal{C} = \{\text{all } \beta(u^K);\ u^K \in B^K\},$$
that is, the collection of all possible reproduction vectors available to the receiver. For convenience we can index these words as
$$\mathcal{C} = \{y_i;\ i = 1, 2, \cdots, M\},$$
where $N^{-1}\log M \le R$ by construction. Observe that if we are given only a decoder or, equivalently, a codebook, and if our goal is to minimize the average distortion for the current block, then no encoder can do better than the encoder which maps an input word $x^N$ into the minimum distortion available reproduction word, that is, define $\alpha(x^N)$ to be the $u^K$ minimizing $\rho_N(x^N,\beta(u^K))$, an assignment we denote by
$$\alpha(x^N) = \min_{u^K}{}^{-1}\,\rho_N(x^N,\beta(u^K)).$$
Observe that by construction we therefore have that
$$\rho_N(x^N,\beta(\alpha(x^N))) = \min_{y\in\mathcal{C}}\rho_N(x^N,y)$$
and the overall mapping of $x^N$ into a reproduction is a minimum distortion or nearest neighbor mapping. Define
$$\rho_N(x^N,\mathcal{C}) = \min_{y\in\mathcal{C}}\rho_N(x^N,y).$$
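A minimal sketch of the nearest neighbor encoder and of $\rho_N(x^N,\mathcal{C})$, assuming squared-error per-letter distortion and a small explicit codebook (both our choices, not the text's):

```python
def rho_N(x, y):
    """Additive distortion rho_N(x^N, y^N): sum of per-letter squared errors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def nearest_neighbor_encode(x, codebook):
    """The min^{-1} assignment: index of the codeword achieving
    rho_N(x^N, C) = min over y in C of rho_N(x^N, y)."""
    return min(range(len(codebook)), key=lambda i: rho_N(x, codebook[i]))

codebook = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]  # N = 2, M = 4
x = (0.2, 0.9)
i = nearest_neighbor_encode(x, codebook)
print(i, codebook[i], rho_N(x, codebook[i]))  # index, reproduction, rho_N(x, C)
```

Here $N = 2$ and $M = 4$, so the rate of this toy codebook is $N^{-1}\log M = 1$ bit per symbol.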


To formally prove that this is the best encoder, observe that if the source is AMS and $p$ is the joint distribution of the source and reproduction, then $p$ is also AMS. This follows since the channel induced by the block code is $N$-stationary and hence also AMS with respect to $T^N$. This means that $p$ is AMS with respect to $T^N$, which in turn implies that it is AMS with respect to $T$ (Theorem 7.3.1 of [50]). Letting $\bar{p}$ denote the stationary mean of $p$ and $p_N$ denote the $N$-stationary mean, we then have from (10.10) that for any block code with codebook $\mathcal{C}$
$$\Delta = \frac{1}{N}E_{p_N}\rho_N(X^N,Y^N) \ge \frac{1}{N}E_{p_N}\rho_N(X^N,\mathcal{C}),$$
with equality if the minimum distortion encoder is used. For this reason we can confine interest to block codes specified by a codebook: the encoder produces the index of the minimum distortion codeword for the observed vector and the decoder is a table lookup producing the codeword being indexed. A code of this type is also called a vector quantizer or block quantizer. Denote the performance of the block code with codebook $\mathcal{C}$ on the source $\mu$ by
$$\rho(\mathcal{C},\mu) = \Delta = E_{\bar{p}}\,\rho_1(X_0,Y_0).$$

Lemma 11.2.1: Given an AMS source $\mu$ and a block length $N$ codebook $\mathcal{C}$, let $\mu_N$ denote the $N$-stationary mean of $\mu$ (which exists from Corollary 7.3.1 of [50]), let $p$ denote the induced input/output distribution, and let $\bar{p}$ and $p_N$ denote its stationary mean and $N$-stationary mean, respectively. Then
$$\rho(\mathcal{C},\mu) = E_{\bar{p}}\,\rho_1(X_0,Y_0) = \frac{1}{N}E_{p_N}\rho_N(X^N,Y^N) = \frac{1}{N}E_{\mu_N}\rho_N(X^N,\mathcal{C}) = \rho(\mathcal{C},\mu_N).$$
Proof: The first two equalities follow from (10.10), the next from the use of the minimum distortion encoder, the last from the definition of the performance of a block code. □

It need not be true in general that $\rho(\mathcal{C},\mu)$ equals $\rho(\mathcal{C},\bar{\mu})$. For example, if $\mu$ produces a single periodic waveform with period $N$ and $\mathcal{C}$ consists of a single period, then $\rho(\mathcal{C},\mu) = 0$ and $\rho(\mathcal{C},\bar{\mu}) > 0$. It is the $N$-stationary mean and not the stationary mean that is most useful for studying an $N$-stationary code.

We now define the OPTA for block codes to be
$$\delta(R,\mu) = \Delta(\mu,\nu,\mathcal{E},\mathcal{D}) = \inf_N \delta_N(R,\mu),$$
$$\delta_N(R,\mu) = \inf_{\mathcal{C}\in\mathcal{K}(N,R)}\rho(\mathcal{C},\mu),$$
where $\nu$ is the noiseless channel as described previously, $\mathcal{E}$ and $\mathcal{D}$ are classes of block codes for the channel, and $\mathcal{K}(N,R)$ is the class of all block length $N$ codebooks $\mathcal{C}$ with
$$\frac{1}{N}\log\|\mathcal{C}\| \le R.$$
$\delta(R,\mu)$ is called the block source coding OPTA or the operational block coding distortion-rate function.
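The infimum defining $\delta_N(R,\mu)$ is rarely available in closed form; the following sketch estimates it by exhaustive search in a toy case (our assumptions, not the text's: i.i.d. Bernoulli source, Hamming distortion, $N = 2$, $R = 1/2$, so $\|\mathcal{C}\| \le 2^{NR} = 2$).

```python
from itertools import product, combinations

N, R = 2, 0.5
M_max = int(2 ** (N * R))        # codebook size constraint ||C|| <= 2^{NR}
p = 0.3                          # iid Bernoulli(p) source, a toy stand-in for mu

def prob(block):                 # mu^N for the iid source
    out = 1.0
    for x in block:
        out *= p if x == 1 else 1 - p
    return out

def rho_N(x, y):                 # Hamming distortion
    return sum(a != b for a, b in zip(x, y))

blocks = list(product([0, 1], repeat=N))
# Search every codebook in K(N, R) and keep the per-letter performance rho(C, mu).
best = min(
    (sum(prob(x) * min(rho_N(x, y) for y in C) for x in blocks) / N, C)
    for m in range(1, M_max + 1)
    for C in combinations(blocks, m)
)
print("delta_N estimate:", best[0], "achieved by codebook", best[1])
```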

Corollary 11.2.1: Given an AMS source $\mu$, then for any $N$ and $i = 0, 1, \cdots, N-1$
$$\delta_N(R,\mu T^{-i}) = \delta_N(R,\mu_N T^{-i}).$$
Proof: For $i = 0$ the result is immediate from the lemma. For $i \ne 0$ it follows from the lemma and the fact that the $N$-stationary mean of $\mu T^{-i}$ is $\mu_N T^{-i}$ (as is easily verified from the definitions). □

Reference Letters

Many of the source coding results will require a technical condition that is a generalization of the reference letter condition of Theorem 10.6.1 for stationary sources. An AMS source is said to have a reference letter $a^* \in \hat{A}$ with respect to a distortion measure $\rho = \rho_1$ on $A \times \hat{A}$ if
$$\sup_n E_{\mu T^{-n}}\rho(X_0,a^*) = \sup_n E_\mu\rho(X_n,a^*) = \rho^* < \infty, \qquad (11.1)$$
that is, there exists a letter for which $E\rho(X_n,a^*)$ is uniformly bounded above. If we define for any $k$ the vector $a^{*k} = (a^*,a^*,\cdots,a^*)$ consisting of $k$ $a^*$'s, then (11.1) implies that
$$\sup_n E_{\mu T^{-n}}\frac{1}{k}\rho_k(X^k,a^{*k}) \le \rho^* < \infty. \qquad (11.2)$$

We assume for convenience that any block code of length $N$ contains the reference vector $a^{*N}$. This ensures that $\rho_N(x^N,\mathcal{C}) \le \rho_N(x^N,a^{*N})$ and hence that $\rho_N(x^N,\mathcal{C})$ is bounded above by a $\mu$-integrable function and hence is itself $\mu$-integrable. This implies that
$$\delta(R,\mu) \le \delta_N(R,\mu) \le \rho^*. \qquad (11.3)$$
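As a concrete instance (our example, not from the text), squared-error distortion admits the reference letter $a^* = 0$ whenever the source has uniformly bounded second moments:
$$\rho(x,y) = (x-y)^2,\qquad \rho^* = \sup_n E_\mu\rho(X_n,0) = \sup_n E_\mu X_n^2 < \infty,$$
and (11.3) then bounds the OPTA by $\delta(R,\mu) \le \sup_n E_\mu X_n^2$ at every rate $R$.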

The reference letter also works for the stationary mean source $\bar{\mu}$ since
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}\rho(x_i,a^*) = \rho_\infty(x,a^{*\infty}),$$
$\mu$-a.e. and $\bar{\mu}$-a.e., where $a^{*\infty}$ denotes an infinite sequence of $a^*$'s. Since $\rho_\infty$ is invariant we have from Lemma 6.3.1 of [50] and Fatou's lemma that
$$E_{\bar{\mu}}\rho(X_0,a^*) = E_{\bar{\mu}}\left(\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}\rho(X_i,a^*)\right) \le \liminf_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}E_\mu\rho(X_i,a^*) \le \rho^*.$$


Performance and OPTA

We next develop several basic properties of the performance and OPTA functions for block coding AMS sources with additive fidelity criteria.

Lemma 11.2.2: Given two sources $\mu_1$ and $\mu_2$ and $\lambda \in (0,1)$, then for any block code $\mathcal{C}$
$$\rho(\mathcal{C},\lambda\mu_1+(1-\lambda)\mu_2) = \lambda\rho(\mathcal{C},\mu_1) + (1-\lambda)\rho(\mathcal{C},\mu_2)$$
and for any $N$
$$\delta_N(R,\lambda\mu_1+(1-\lambda)\mu_2) \ge \lambda\delta_N(R,\mu_1) + (1-\lambda)\delta_N(R,\mu_2)$$
and
$$\delta(R,\lambda\mu_1+(1-\lambda)\mu_2) \ge \lambda\delta(R,\mu_1) + (1-\lambda)\delta(R,\mu_2).$$
Thus performance is linear in the source and the OPTA functions are convex $\cap$ in the source. Lastly,
$$\delta_N(R+\frac{1}{N},\lambda\mu_1+(1-\lambda)\mu_2) \le \lambda\delta_N(R,\mu_1) + (1-\lambda)\delta_N(R,\mu_2).$$

Proof: The equality follows from the linearity of expectation since $\rho(\mathcal{C},\mu) = N^{-1}E_\mu\rho_N(X^N,\mathcal{C})$. The first inequality follows from the equality and the fact that the infimum of a sum is bounded below by the sum of the infima. The next inequality follows similarly. To get the final inequality, let $\mathcal{C}_i$ approximately yield $\delta_N(R,\mu_i)$; that is,
$$\rho(\mathcal{C}_i,\mu_i) \le \delta_N(R,\mu_i) + \epsilon.$$
Form the union code $\mathcal{C} = \mathcal{C}_1 \cup \mathcal{C}_2$ containing all of the words in both of the codes. Then the rate of the union code is
$$\frac{1}{N}\log\|\mathcal{C}\| = \frac{1}{N}\log(\|\mathcal{C}_1\| + \|\mathcal{C}_2\|) \le \frac{1}{N}\log(2^{NR} + 2^{NR}) = R + \frac{1}{N}.$$
This code yields performance
$$\rho(\mathcal{C},\lambda\mu_1+(1-\lambda)\mu_2) = \lambda\rho(\mathcal{C},\mu_1) + (1-\lambda)\rho(\mathcal{C},\mu_2) \le \lambda\rho(\mathcal{C}_1,\mu_1) + (1-\lambda)\rho(\mathcal{C}_2,\mu_2) \le \lambda\delta_N(R,\mu_1) + \lambda\epsilon + (1-\lambda)\delta_N(R,\mu_2) + (1-\lambda)\epsilon.$$
Since the leftmost term in the above equation can be no smaller than $\delta_N(R+1/N,\lambda\mu_1+(1-\lambda)\mu_2)$ and $\epsilon$ is arbitrary, the lemma is proved. □
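A small numerical check of the union code argument (hypothetical codebooks and sources of our choosing, Hamming distortion): merging two rate-$R$ codebooks costs at most $1/N$ in rate, while the union code on the mixture does at least as well as using each $\mathcal{C}_i$ on its own $\mu_i$.

```python
import math

def rho_N(x, y):
    return sum(a != b for a, b in zip(x, y))

def perf(C, blocks, prob):  # rho(C, mu) = E min_{y in C} rho_N(X^N, y) / N
    return sum(prob[x] * min(rho_N(x, y) for y in C) for x in blocks) / len(blocks[0])

N, R = 2, 0.5
C1, C2 = [(0, 0)], [(1, 1)]              # two rate-R codebooks, ||Ci|| <= 2^{NR} = 2
C = C1 + C2                              # union code
assert math.log2(len(C)) / N <= R + 1 / N   # (1/N) log(||C1|| + ||C2||) <= R + 1/N

blocks = [(0, 0), (0, 1), (1, 0), (1, 1)]
mu1 = {b: w for b, w in zip(blocks, [0.7, 0.1, 0.1, 0.1])}   # toy block distributions
mu2 = {b: w for b, w in zip(blocks, [0.1, 0.1, 0.1, 0.7])}
lam = 0.5
mix = {b: lam * mu1[b] + (1 - lam) * mu2[b] for b in blocks}

# The union code on the mixture performs at least as well as Ci on mu_i:
assert perf(C, blocks, mix) <= lam * perf(C1, blocks, mu1) + (1 - lam) * perf(C2, blocks, mu2)
print(perf(C, blocks, mix))
```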

The first and last inequalities in the lemma suggest that $\delta_N$ is very nearly an affine function of the source and hence perhaps $\delta$ is as well. We will later pursue this possibility, but we are not yet equipped to do so.


Before developing the connection between the OPTA functions of AMS sources and those of their stationary mean, we pause to develop some additional properties for OPTA in the special case of stationary sources. These results follow Kieffer [76].

Lemma 11.2.3: Suppose that $\mu$ is a stationary source. Then
$$\delta(R,\mu) = \lim_{N\to\infty}\delta_N(R,\mu).$$
Thus the infimum over block lengths is given by the limit, so that longer codes can do better.

Proof: Fix an $N$ and an $n < N$ and choose codes $\mathcal{C}_n \subset \hat{A}^n$ and $\mathcal{C}_{N-n} \subset \hat{A}^{N-n}$ for which
$$\rho(\mathcal{C}_n,\mu) \le \delta_n(R,\mu) + \frac{\epsilon}{2},$$
$$\rho(\mathcal{C}_{N-n},\mu) \le \delta_{N-n}(R,\mu) + \frac{\epsilon}{2}.$$
Form the block length $N$ code $\mathcal{C} = \mathcal{C}_n \times \mathcal{C}_{N-n}$. This code has rate no greater than $R$ and has distortion
$$N\rho(\mathcal{C},\mu) = E\min_{y\in\mathcal{C}}\rho_N(X^N,y) = E\min_{y^n\in\mathcal{C}_n}\rho_n(X^n,y^n) + E\min_{v^{N-n}\in\mathcal{C}_{N-n}}\rho_{N-n}(X_n^{N-n},v^{N-n})$$
$$= E\min_{y^n\in\mathcal{C}_n}\rho_n(X^n,y^n) + E\min_{v^{N-n}\in\mathcal{C}_{N-n}}\rho_{N-n}(X^{N-n},v^{N-n}) = n\rho(\mathcal{C}_n,\mu) + (N-n)\rho(\mathcal{C}_{N-n},\mu)$$
$$\le n\delta_n(R,\mu) + (N-n)\delta_{N-n}(R,\mu) + \frac{N\epsilon}{2}, \qquad (11.4)$$
where we have made essential use of the stationarity of the source. Since $\epsilon$ is arbitrary and since the leftmost term in the above equation can be no smaller than $N\delta_N(R,\mu)$, we have shown that
$$N\delta_N(R,\mu) \le n\delta_n(R,\mu) + (N-n)\delta_{N-n}(R,\mu)$$
and hence that the sequence $N\delta_N$ is subadditive. The result then follows immediately from Lemma 7.5.1 of [50]. □
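The product code step can be checked the same way (again with toy codebooks of our choosing and Hamming distortion): because the distortion is additive and $\mathcal{C} = \mathcal{C}_n \times \mathcal{C}_{N-n}$ is a full Cartesian product, the minimum over $\mathcal{C}$ splits into the sum of the sub-block minima.

```python
from itertools import product

def rho(x, y):
    return sum(a != b for a, b in zip(x, y))

n, N = 1, 3
Cn = [(0,), (1,)]                          # block length n codebook
CNn = [(0, 0), (1, 1)]                     # block length N - n codebook
C = [a + b for a, b in product(Cn, CNn)]   # product code, block length N

for x in product([0, 1], repeat=N):
    split = (min(rho(x[:n], y) for y in Cn)
             + min(rho(x[n:], v) for v in CNn))
    joint = min(rho(x, y) for y in C)
    assert joint == split                  # min over C = sum of sub-block minima
print("product code:", C)
```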

Corollary 11.2.2: If $\mu$ is a stationary source, then $\delta(R,\mu)$ is a convex function of $R$ and hence is continuous for $R > 0$.

Proof: Pick $R_1 > R_2$ and $\lambda \in (0,1)$. Define $R = \lambda R_1 + (1-\lambda)R_2$. For large $n$ define $n_1 = \lfloor\lambda n\rfloor$, the largest integer no greater than $\lambda n$, and let $n_2 = n - n_1$. Pick codebooks $\mathcal{C}_i \subset \hat{A}^{n_i}$ with rate $R_i$ and distortion
$$\rho(\mathcal{C}_i,\mu) \le \delta_{n_i}(R_i,\mu) + \epsilon.$$
Analogous to (11.4), for the product code $\mathcal{C} = \mathcal{C}_1 \times \mathcal{C}_2$ we have
$$n\rho(\mathcal{C},\mu) = n_1\rho(\mathcal{C}_1,\mu) + n_2\rho(\mathcal{C}_2,\mu) \le n_1\delta_{n_1}(R_1,\mu) + n_2\delta_{n_2}(R_2,\mu) + n\epsilon.$$
The rate of the product code is no greater than $R$ and hence the leftmost term above is bounded below by $n\delta_n(R,\mu)$. Dividing by $n$ we have, since $\epsilon$ is arbitrary, that
$$\delta_n(R,\mu) \le \frac{n_1}{n}\delta_{n_1}(R_1,\mu) + \frac{n_2}{n}\delta_{n_2}(R_2,\mu).$$
Taking $n \to \infty$ we have using the lemma and the choice of $n_i$ that
$$\delta(R,\mu) \le \lambda\delta(R_1,\mu) + (1-\lambda)\delta(R_2,\mu),$$
proving the claimed convexity. □

Corollary 11.2.3: If $\mu$ is stationary, then $\delta(R,\mu)$ is an affine function of $\mu$.
Proof: From Lemma 11.2.2 we need only prove that
$$\delta(R,\lambda\mu_1+(1-\lambda)\mu_2) \le \lambda\delta(R,\mu_1) + (1-\lambda)\delta(R,\mu_2).$$
From the same lemma we have that for any $N$
$$\delta_N(R+\frac{1}{N},\lambda\mu_1+(1-\lambda)\mu_2) \le \lambda\delta_N(R,\mu_1) + (1-\lambda)\delta_N(R,\mu_2).$$
For any $K \le N$ we have, since $\delta_N(R,\mu)$ is nonincreasing in $R$, that
$$\delta_N(R+\frac{1}{K},\lambda\mu_1+(1-\lambda)\mu_2) \le \lambda\delta_N(R,\mu_1) + (1-\lambda)\delta_N(R,\mu_2).$$
Taking the limit as $N \to \infty$ yields from Lemma 11.2.3 that
$$\delta(R+\frac{1}{K},\lambda\mu_1+(1-\lambda)\mu_2) \le \lambda\delta(R,\mu_1) + (1-\lambda)\delta(R,\mu_2).$$
From Corollary 11.2.2, however, $\delta$ is continuous in $R$ and the result follows by letting $K \to \infty$. □

The following lemma provides the principal tool necessary for relating the OPTA of an AMS source with that of its stationary mean. It shows that the OPTA of an AMS source is not changed by shifting or, equivalently, by redefining the time origin.

Lemma 11.2.4: Let $\mu$ be an AMS source with a reference letter. Then for any integer $i$, $\delta(R,\mu) = \delta(R,\mu T^{-i})$.
Proof: Fix $\epsilon > 0$ and let $\mathcal{C}_N$ be a rate $R$ block length $N$ codebook for which $\rho(\mathcal{C}_N,\mu) \le \delta(R,\mu) + \epsilon/2$. For $1 \le i \le N-1$ choose $J$ large and define the block length $K = JN$ code $\mathcal{C}_K(i)$ by
$$\mathcal{C}_K(i) = a^{*(N-i)} \times \left(\times_{j=0}^{J-2}\,\mathcal{C}_N\right) \times a^{*i},$$
where $a^{*l}$ is an $l$-tuple containing all $a^*$'s. $\mathcal{C}_K(i)$ can be considered to be a code consisting of the original code shifted by $i$ time units and repeated many times, with some filler at the beginning and end. Except for the edges of the long product code, the effect on the source is to use the original code with a delay. The code has at most $(2^{NR})^{J-1} = 2^{KR}2^{-NR}$ words; the rate is no greater than $R$.
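A sketch of the construction of $\mathcal{C}_K(i)$ (with a hypothetical reference letter $a^* = 0$ and a toy base codebook of our choosing): every word consists of $N-i$ filler letters, then $J-1$ words from $\mathcal{C}_N$, then $i$ closing filler letters, for $K = JN$ letters in all.

```python
from itertools import product

a_star = 0                              # reference letter
N, J, i = 2, 3, 1                       # base block length, repetitions, shift
K = J * N

CN = [(0, 0), (1, 1)]                   # base codebook C_N

def shifted_code(CN, N, J, i):
    """C_K(i) = a*(N-i) x (C_N x ... x C_N, J-1 times) x a*i."""
    head, tail = (a_star,) * (N - i), (a_star,) * i
    return [head + sum(ws, ()) + tail
            for ws in product(CN, repeat=J - 1)]

CK = shifted_code(CN, N, J, i)
assert all(len(w) == K for w in CK)     # every word has K = JN letters
assert len(CK) == len(CN) ** (J - 1)    # at most (2^{NR})^{J-1} words
print(CK)
```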

For any $K$-block $x^K$ the distortion resulting from using $\mathcal{C}_K(i)$ is given by
$$K\rho_K(x^K,\mathcal{C}_K(i)) \le (N-i)\rho_{N-i}(x^{N-i},a^{*(N-i)}) + \sum_{k=0}^{J-2}N\rho_N(x_{N-i+kN}^N,\mathcal{C}_N) + i\rho_i(x_{K-i}^i,a^{*i}). \qquad (11.5)$$

Let $\{\hat{x}_n\}$ denote the encoded process using the block code $\mathcal{C}_K(i)$. If $n$ is a multiple of $K$, then
$$n\rho_n(x^n,\hat{x}^n) \le \sum_{k=0}^{\lfloor n/K\rfloor-1}\left((N-i)\rho_{N-i}(x_{kK}^{N-i},a^{*(N-i)}) + i\rho_i(x_{(k+1)K-i}^i,a^{*i})\right) + \sum_{k=0}^{\lfloor n/K\rfloor J-1}N\rho_N(x_{N-i+kN}^N,\mathcal{C}_N).$$

If $n$ is not a multiple of $K$ we can further overbound the distortion by including the distortion contributed by enough future symbols to complete a $K$-block, that is,
$$n\rho_n(x^n,\hat{x}^n) \le n\gamma_n(x,\hat{x}) = \sum_{k=0}^{\lfloor n/K\rfloor}\left((N-i)\rho_{N-i}(x_{kK}^{N-i},a^{*(N-i)}) + i\rho_i(x_{(k+1)K-i}^i,a^{*i})\right) + \sum_{k=0}^{(\lfloor n/K\rfloor+1)J-1}N\rho_N(x_{N-i+kN}^N,\mathcal{C}_N).$$
Thus
$$\rho_n(x^n,\hat{x}^n) \le \frac{N-i}{K}\,\frac{1}{n/K}\sum_{k=0}^{\lfloor n/K\rfloor}\rho_{N-i}(X^{N-i}(T^{kK}x),a^{*(N-i)}) + \frac{i}{K}\,\frac{1}{n/K}\sum_{k=0}^{\lfloor n/K\rfloor}\rho_i(X^i(T^{(k+1)K-i}x),a^{*i}) + \frac{1}{n/N}\sum_{k=0}^{(\lfloor n/K\rfloor+1)J-1}\rho_N(X^N(T^{(N-i)+kN}x),\mathcal{C}_N).$$

Since $\mu$ is AMS these quantities all converge to invariant functions:
$$\lim_{n\to\infty}\rho_n(x^n,\hat{x}^n) \le \frac{N-i}{K}\lim_{m\to\infty}\frac{1}{m}\sum_{k=0}^{m-1}\rho_{N-i}(X^{N-i}(T^{kK}x),a^{*(N-i)}) + \frac{i}{K}\lim_{m\to\infty}\frac{1}{m}\sum_{k=0}^{m-1}\rho_i(X^i(T^{(k+1)K-i}x),a^{*i})$$