
10.2 DEFINITIONS

Squared-error distortion. The squared-error distortion,

    d(x, x̂) = (x − x̂)²,    (10.5)

is the most popular distortion measure used for continuous alphabets. Its advantages are its simplicity and its relationship to least-squares prediction. But in applications such as image and speech coding, various authors have pointed out that the mean-squared error is not an appropriate measure of distortion for human observers. For example, there is a large squared-error distortion between a speech waveform and another version of the same waveform slightly shifted in time, even though both would sound the same to a human observer.
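As a quick numerical illustration of this point (a sketch added here, not taken from the text), the Python snippet below computes the average squared-error distortion between a 440 Hz tone and the same tone delayed by 1 ms; the sampling rate, frequency, and delay are arbitrary illustrative choices. The distortion is large relative to the signal power even though the two waveforms sound identical.

    import numpy as np

    fs = 8000                                        # sampling rate in Hz (illustrative)
    t = np.arange(0, 0.1, 1 / fs)                    # 100 ms of signal
    x = np.sin(2 * np.pi * 440 * t)                  # original waveform
    x_hat = np.sin(2 * np.pi * 440 * (t - 0.001))    # same waveform delayed by 1 ms

    # Average squared-error distortion (x - x_hat)^2 per sample
    mse = np.mean((x - x_hat) ** 2)
    print(mse)    # large compared with the signal power of 0.5, despite identical sound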

Many alternatives have been proposed; a popular measure of distortion in speech coding is the Itakura–Saito distance, which is the relative entropy between multivariate normal processes. In image coding, however, there is at present no real alternative to using the mean-squared error as the distortion measure.

The distortion measure is defined on a symbol-by-symbol basis. We extend the definition to sequences by using the following definition:

Definition The distortion between sequences x^n and x̂^n is defined by

    d(x^n, x̂^n) = (1/n) Σ_{i=1}^{n} d(x_i, x̂_i).    (10.6)

So the distortion for a sequence is the average of the per symbol distortion of the elements of the sequence. This is not the only reasonable definition. For example, one may want to measure the distortion between two sequences by the maximum of the per symbol distortions. The theory derived below does not apply directly to this more general distortion measure.
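For example (a minimal sketch, not from the text), with Hamming distortion the sequence distortion is simply the fraction of positions in which the two sequences differ:

    import numpy as np

    def hamming_distortion(x, x_hat):
        """Average per-symbol Hamming distortion (1/n) sum of d(x_i, x_hat_i)."""
        x, x_hat = np.asarray(x), np.asarray(x_hat)
        return np.mean(x != x_hat)

    print(hamming_distortion([0, 1, 1, 0, 1], [0, 1, 0, 0, 0]))    # 2 mismatches / 5 symbols = 0.4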

Definition A (2^{nR}, n)-rate distortion code consists of an encoding function,

    f_n : X^n → {1, 2, . . . , 2^{nR}},    (10.7)

and a decoding (reproduction) function,

    g_n : {1, 2, . . . , 2^{nR}} → X̂^n.    (10.8)


The distortion associated with the (2^{nR}, n) code is defined as

    D = E d(X^n, g_n(f_n(X^n))),    (10.9)

where the expectation is with respect to the probability distribution on X:

    D = Σ_{x^n} p(x^n) d(x^n, g_n(f_n(x^n))).    (10.10)

The set of n-tuples g_n(1), g_n(2), . . . , g_n(2^{nR}), denoted by X̂^n(1), . . . , X̂^n(2^{nR}), constitutes the codebook, and f_n^{−1}(1), . . . , f_n^{−1}(2^{nR}) are the associated assignment regions.

Many terms are used to describe the replacement of X^n by its quantized version X̂^n(w). It is common to refer to X̂^n as the vector quantization, reproduction, reconstruction, representation, source code, or estimate of X^n.
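To make these definitions concrete, here is a hedged toy sketch (not a construction from the text): a randomly generated codebook of 2^{nR} binary reproduction words, an encoder f_n that maps each source block to the index of the closest codeword under Hamming distortion, a decoder g_n that looks the codeword up, and a Monte Carlo estimate of the resulting distortion D. The block length, rate, and Bernoulli(1/2) source are arbitrary illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)
    n, R = 10, 0.5                               # block length and rate (bits per symbol)
    M = 2 ** int(n * R)                          # 2^{nR} codewords

    codebook = rng.integers(0, 2, size=(M, n))   # g_n(1), ..., g_n(2^{nR})

    def f_n(x_block):
        """Encoder: index of the codeword with smallest Hamming distortion."""
        return np.argmin(np.mean(codebook != x_block, axis=1))

    def g_n(index):
        """Decoder: reproduction word for the given index."""
        return codebook[index]

    # Estimate D = E d(X^n, g_n(f_n(X^n))) by Monte Carlo over source blocks
    blocks = rng.integers(0, 2, size=(2000, n))
    D = np.mean([np.mean(g_n(f_n(b)) != b) for b in blocks])
    print(D)    # empirical distortion of this particular random code

The distortion of this naive random code sits well above the rate distortion limit for a Bernoulli(1/2) source (roughly 0.11 at R = 0.5), which is approached only with carefully chosen codebooks and large block lengths.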

Definition A rate distortion pair (R, D) is said to be achievable if there exists a sequence of (2^{nR}, n)-rate distortion codes (f_n, g_n) with lim_{n→∞} E d(X^n, g_n(f_n(X^n))) ≤ D.

Definition The rate distortion region for a source is the closure of the set of achievable rate distortion pairs (R, D).

Definition The rate distortion function R(D) is the infimum of rates R such that (R, D) is in the rate distortion region of the source for a given distortion D.

Definition The distortion rate function D(R) is the infimum of all distortions D such that (R, D) is in the rate distortion region of the source for a given rate R.

The distortion rate function defines another way of looking at the boundary of the rate distortion region. We will in general use the rate distortion function rather than the distortion rate function to describe this boundary, although the two approaches are equivalent.

We now define a mathematical function of the source, which we call the information rate distortion function. The main result of this chapter is the proof that the information rate distortion function is equal to the rate distortion function defined above (i.e., it is the infimum of rates that achieve a particular distortion).


Definition The information rate distortion function R^(I)(D) for a source X with distortion measure d(x, x̂) is defined as

    R^(I)(D) = min_{p(x̂|x): Σ_{(x,x̂)} p(x)p(x̂|x) d(x,x̂) ≤ D} I(X; X̂),    (10.11)

where the minimization is over all conditional distributions p(x̂|x) for which the joint distribution p(x, x̂) = p(x)p(x̂|x) satisfies the expected distortion constraint.

Paralleling the discussion of channel capacity in Chapter 7, we initially consider the properties of the information rate distortion function and calculate it for some simple sources and distortion measures. Later we prove that we can actually achieve this function (i.e., there exist codes with rate R^(I)(D) with distortion D). We also prove a converse establishing that R ≥ R^(I)(D) for any code that achieves distortion D.
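As an illustration of the minimization in (10.11) (a brute-force sketch, not a method from the text), the following code grids over 2x2 test channels p(x̂|x) for a Bernoulli(p) source with Hamming distortion, keeps those meeting the distortion constraint, and records the smallest mutual information found; the values p = 0.3 and D = 0.1 are arbitrary. The answer can be checked against the closed form H(p) − H(D) derived in Section 10.3.1.

    import numpy as np

    def mutual_information(px, p_cond):
        """I(X; X_hat) in bits for p(x) and the 2x2 test channel p(x_hat|x)."""
        joint = px[:, None] * p_cond                 # p(x, x_hat)
        px_hat = joint.sum(axis=0)
        with np.errstate(divide="ignore", invalid="ignore"):
            terms = joint * np.log2(joint / (px[:, None] * px_hat[None, :]))
        return np.nansum(terms)                      # 0 * log 0 terms drop out

    def R_I_numeric(p=0.3, D=0.1, grid=200):
        """Brute-force estimate of R^(I)(D) over a grid of test channels."""
        px = np.array([1 - p, p])
        best = np.inf
        for a in np.linspace(0, 1, grid):            # p(x_hat = 1 | x = 0)
            for b in np.linspace(0, 1, grid):        # p(x_hat = 0 | x = 1)
                p_cond = np.array([[1 - a, a], [b, 1 - b]])
                distortion = px[0] * a + px[1] * b   # E d(X, X_hat) under Hamming distortion
                if distortion <= D:
                    best = min(best, mutual_information(px, p_cond))
        return best

    print(R_I_numeric())    # close to H(0.3) - H(0.1), about 0.412 bits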

The main theorem of rate distortion theory can now be stated as follows:

Theorem 10.2.1 The rate distortion function for an i.i.d. source X with distribution p(x) and bounded distortion function d(x, x̂) is equal to the associated information rate distortion function. Thus,

    R(D) = R^(I)(D) = min_{p(x̂|x): Σ_{(x,x̂)} p(x)p(x̂|x) d(x,x̂) ≤ D} I(X; X̂)    (10.12)

is the minimum achievable rate at distortion D.

This theorem shows that the operational definition of the rate distortion function is equal to the information definition. Hence we will use R(D) from now on to denote both definitions of the rate distortion function. Before coming to the proof of the theorem, we calculate the information rate distortion function for some simple sources and distortions.

10.3 CALCULATION OF THE RATE DISTORTION FUNCTION

10.3.1 Binary Source

We now find the description rate R(D) required to describe a Bernoulli(p) source with an expected proportion of errors less than or equal to D.

Theorem 10.3.1 The rate distortion function for a Bernoulli(p) source with Hamming distortion is given by

 

    R(D) =  H(p) − H(D),   0 ≤ D ≤ min{p, 1 − p},
            0,             D > min{p, 1 − p}.    (10.13)


Proof: Consider a binary source X ~ Bernoulli(p) with a Hamming distortion measure. Without loss of generality, we may assume that p < 1/2. We wish to calculate the rate distortion function,

    R(D) = min_{p(x̂|x): Σ_{(x,x̂)} p(x)p(x̂|x) d(x,x̂) ≤ D} I(X; X̂).    (10.14)

Let ⊕ denote modulo 2 addition. Thus, X ⊕ X̂ = 1 is equivalent to X ≠ X̂. We do not minimize I(X; X̂) directly; instead, we find a lower bound

and then show that this lower bound is achievable. For any joint distribution satisfying the distortion constraint, we have

    I(X; X̂) = H(X) − H(X | X̂)             (10.15)
             = H(p) − H(X ⊕ X̂ | X̂)        (10.16)
             ≥ H(p) − H(X ⊕ X̂)             (10.17)
             ≥ H(p) − H(D),                (10.18)

since Pr(X ≠ X̂) ≤ D and H(D) increases with D for D ≤ 1/2. Thus,

    R(D) ≥ H(p) − H(D).    (10.19)

We now show that the lower bound is actually the rate distortion function by finding a joint distribution that meets the distortion constraint and has I(X; X̂) = R(D). For 0 ≤ D ≤ p, we can achieve the value of the rate distortion function in (10.19) by choosing (X, X̂) to have the joint distribution given by the binary symmetric channel shown in Figure 10.3.

[Figure: test channel from X̂ to X. The input X̂ takes the value 0 with probability (1 − p − D)/(1 − 2D) and 1 with probability (p − D)/(1 − 2D); the channel flips its input with probability D, so that the output X has marginal probabilities 1 − p and p.]

FIGURE 10.3. Joint distribution for binary source.


 

 

We choose the distribution of X̂ at the input of the channel so that the output distribution of X is the specified distribution. Let r = Pr(X̂ = 1). Then choose r so that

    r(1 − D) + (1 − r)D = p,    (10.20)

or

    r = (p − D)/(1 − 2D).    (10.21)

If D ≤ p ≤ 1/2, then Pr(X̂ = 1) ≥ 0 and Pr(X̂ = 0) ≥ 0. We then have

    I(X; X̂) = H(X) − H(X | X̂) = H(p) − H(D),    (10.22)

and the expected distortion is Pr(X ≠ X̂) = D.

 

If D ≥ p, we can achieve R(D) = 0 by letting X̂ = 0 with probability 1. In this case, I(X; X̂) = 0 and D = p. Similarly, if D ≥ 1 − p, we can achieve R(D) = 0 by setting X̂ = 1 with probability 1. Hence, the rate distortion function for a binary source is

 

    R(D) =  H(p) − H(D),   0 ≤ D ≤ min{p, 1 − p},
            0,             D > min{p, 1 − p}.    (10.23)

This function is illustrated in Figure 10.4.

 


FIGURE 10.4. Rate distortion function for a Bernoulli(1/2) source.
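The test-channel construction in the proof is easy to check numerically. The sketch below (an illustrative check, not code from the text) uses the arbitrary values p = 0.3 and D = 0.1, sets r = (p − D)/(1 − 2D) as in (10.21), and verifies that the output marginal is Bernoulli(p), that the expected Hamming distortion equals D, and that I(X; X̂) = H(p) − H(D):

    import numpy as np

    def H(q):
        """Binary entropy in bits."""
        return 0.0 if q in (0.0, 1.0) else -q * np.log2(q) - (1 - q) * np.log2(1 - q)

    p, D = 0.3, 0.1
    r = (p - D) / (1 - 2 * D)                        # Pr(X_hat = 1), eq. (10.21)

    # Joint distribution: X_hat ~ Bernoulli(r), X = X_hat flipped with probability D
    p_xhat = np.array([1 - r, r])
    channel = np.array([[1 - D, D], [D, 1 - D]])     # p(x | x_hat)
    joint = p_xhat[:, None] * channel                # p(x_hat, x)
    p_x = joint.sum(axis=0)

    print(p_x[1])                                    # equals p, so X is Bernoulli(p)
    print(joint[0, 1] + joint[1, 0])                 # expected Hamming distortion equals D
    I = sum(joint[i, j] * np.log2(joint[i, j] / (p_xhat[i] * p_x[j]))
            for i in range(2) for j in range(2))
    print(I, H(p) - H(D))                            # the two values agree (about 0.412 bits)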


The above calculations may seem entirely unmotivated. Why should minimizing mutual information have anything to do with quantization? The answer to this question must wait until we prove Theorem 10.2.1.

10.3.2 Gaussian Source

Although Theorem 10.2.1 is proved only for discrete sources with a bounded distortion measure, it can also be proved for well-behaved continuous sources and unbounded distortion measures. Assuming this general theorem, we calculate the rate distortion function for a Gaussian source with squared-error distortion.

Theorem 10.3.2 The rate distortion function for a N(0, σ²) source with squared-error distortion is

    R(D) =  (1/2) log(σ²/D),   0 ≤ D ≤ σ²,
            0,                 D > σ².    (10.24)

Proof: Let X be N(0, σ²). By the rate distortion theorem extended to continuous alphabets, we have

    R(D) = min_{f(x̂|x): E(X − X̂)² ≤ D} I(X; X̂).    (10.25)

As in the preceding example, we first find a lower bound for the rate distortion function and then prove that this is achievable. Since E(X − X̂)² ≤ D, we observe that

 

 

 

 

 

 

 

 

 

 

 

 

    I(X; X̂) = h(X) − h(X | X̂)                                   (10.26)
             = (1/2) log(2πe)σ² − h(X − X̂ | X̂)                  (10.27)
             ≥ (1/2) log(2πe)σ² − h(X − X̂)                       (10.28)
             ≥ (1/2) log(2πe)σ² − h(N(0, E(X − X̂)²))             (10.29)
             = (1/2) log(2πe)σ² − (1/2) log(2πe)E(X − X̂)²        (10.30)
             ≥ (1/2) log(2πe)σ² − (1/2) log(2πe)D                 (10.31)
             = (1/2) log(σ²/D),                                   (10.32)


where (10.28) follows from the fact that conditioning reduces entropy and (10.29) follows from the fact that the normal distribution maximizes the entropy for a given second moment (Theorem 8.6.5). Hence,

    R(D) ≥ (1/2) log(σ²/D).    (10.33)

To find the conditional density f(x̂|x) that achieves this lower bound, it is usually more convenient to look at the conditional density f(x|x̂), which is sometimes called the test channel (thus emphasizing the duality of rate distortion with channel capacity). As in the binary case, we construct f(x|x̂) to achieve equality in the bound. We choose the joint distribution as shown in Figure 10.5. If D ≤ σ², we choose

    X = X̂ + Z,   X̂ ~ N(0, σ² − D),   Z ~ N(0, D),    (10.34)

where X̂ and Z are independent. For this joint distribution, we calculate

 

 

 

    I(X; X̂) = (1/2) log(σ²/D),    (10.35)

and E(X − X̂)² = D, thus achieving the bound in (10.33). If D > σ², we choose X̂ = 0 with probability 1, achieving R(D) = 0. Hence, the rate distortion function for the Gaussian source with squared-error distortion is

 

 

 

    R(D) =  (1/2) log(σ²/D),   0 ≤ D ≤ σ²,
            0,                 D > σ²,    (10.36)

 

 

as illustrated in Figure 10.6.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

We can rewrite (10.36) to express the distortion in terms of the rate,

 

 

    D(R) = σ² 2^{−2R}.    (10.37)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

[Figure: X̂ ~ N(0, σ² − D) is passed through an additive noise channel with independent Z ~ N(0, D), producing X = X̂ + Z ~ N(0, σ²).]

FIGURE 10.5. Joint distribution for Gaussian source.
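The joint distribution (10.34) can be spot-checked by simulation. In the hedged sketch below (illustrative values σ² = 1 and D = 0.25), X̂ and Z are drawn independently, X = X̂ + Z is formed, and the sample variance of X and the mean-squared reproduction error are compared with σ² and D:

    import numpy as np

    rng = np.random.default_rng(1)
    sigma2, D = 1.0, 0.25                             # source variance and target distortion, D <= sigma^2
    n = 200_000

    x_hat = rng.normal(0.0, np.sqrt(sigma2 - D), n)   # X_hat ~ N(0, sigma^2 - D)
    z = rng.normal(0.0, np.sqrt(D), n)                # Z ~ N(0, D), independent of X_hat
    x = x_hat + z                                     # X ~ N(0, sigma^2)

    print(np.var(x))                                  # approximately sigma^2 = 1.0
    print(np.mean((x - x_hat) ** 2))                  # approximately D = 0.25
    print(0.5 * np.log2(sigma2 / D))                  # R(D) = (1/2) log(sigma^2/D) = 1 bit at this distortion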



FIGURE 10.6. Rate distortion function for a Gaussian source.

Each bit of description reduces the expected distortion by a factor of 4. With a 1-bit description, the best expected squared error is σ²/4. We can compare this with the result of simple 1-bit quantization of a N(0, σ²) random variable as described in Section 10.1. In this case, using the two regions corresponding to the positive and negative real lines and reproduction points as the centroids of the respective regions, the expected distortion is ((π − 2)/π)σ² = 0.3633σ² (see Problem 10.1). As we prove later, the rate distortion limit R(D) is achieved by considering long block lengths. This example shows that we can achieve a lower distortion by considering several distortion problems in succession (long block lengths) than can be achieved by considering each problem separately. This is somewhat surprising because we are quantizing independent random variables.
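A short simulation makes this comparison concrete (a sketch, not code from the text; σ = 1 and the sample size are arbitrary): the 1-bit scalar quantizer reproduces X by the centroid ±σ√(2/π) of the half-line containing it and attains roughly 0.363σ², while the rate distortion limit at R = 1 is σ²/4.

    import numpy as np

    rng = np.random.default_rng(2)
    sigma = 1.0
    x = rng.normal(0.0, sigma, 500_000)

    # 1-bit scalar quantizer: sign of x, reproduced by the centroid of each half-line
    centroid = sigma * np.sqrt(2 / np.pi)             # E[X | X > 0] for N(0, sigma^2)
    x_hat = np.where(x >= 0, centroid, -centroid)
    print(np.mean((x - x_hat) ** 2))                  # about (pi - 2)/pi * sigma^2 = 0.3634 sigma^2

    # Rate distortion limit at R = 1 bit per symbol (achieved only with long blocks)
    print(sigma**2 * 2.0 ** (-2 * 1))                 # D(1) = sigma^2 / 4 = 0.25 sigma^2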

10.3.3 Simultaneous Description of Independent Gaussian Random Variables

Consider the case of representing m independent (but not identically distributed) normal random sources X_1, . . . , X_m, where X_i are N(0, σ_i²), with squared-error distortion. Assume that we are given R bits with which to represent this random vector. The question naturally arises as to how we should allot these bits to the various components to minimize the total distortion. Extending the definition of the information rate distortion


function to the vector case, we have

    R(D) = min_{f(x̂^m | x^m): E d(X^m, X̂^m) ≤ D} I(X^m; X̂^m),    (10.38)

where d(x^m, x̂^m) = Σ_{i=1}^m (x_i − x̂_i)². Now using the arguments in the preceding example, we have

    I(X^m; X̂^m) = h(X^m) − h(X^m | X̂^m)                                 (10.39)
                 = Σ_{i=1}^m h(X_i) − Σ_{i=1}^m h(X_i | X^{i−1}, X̂^m)    (10.40)
                 ≥ Σ_{i=1}^m h(X_i) − Σ_{i=1}^m h(X_i | X̂_i)             (10.41)
                 = Σ_{i=1}^m I(X_i; X̂_i)                                 (10.42)
                 ≥ Σ_{i=1}^m R(D_i)                                      (10.43)
                 = Σ_{i=1}^m max{ (1/2) log(σ_i²/D_i), 0 },              (10.44)

where D_i = E(X_i − X̂_i)² and (10.41) follows from the fact that conditioning reduces entropy. We can achieve equality in (10.41) by choosing f(x^m | x̂^m) = Π_{i=1}^m f(x_i | x̂_i) and in (10.43) by choosing the distribution of each X̂_i ~ N(0, σ_i² − D_i). Hence, the problem of finding the rate distortion function can be reduced to the following optimization (using nats for convenience):

 

    R(D) = min_{Σ D_i = D} Σ_{i=1}^m max{ (1/2) ln(σ_i²/D_i), 0 }.    (10.45)

Using Lagrange multipliers, we construct the functional

 

 

 

 

    J(D) = Σ_{i=1}^m (1/2) ln(σ_i²/D_i) + λ Σ_{i=1}^m D_i,    (10.46)


and differentiating with respect to D_i and setting equal to 0, we have

    ∂J/∂D_i = −(1/2)(1/D_i) + λ = 0    (10.47)

or

    D_i = λ′.    (10.48)

 

 

Hence, the optimum allotment of the bits to the various descriptions results in an equal distortion for each random variable. This is possible if the constant λ′ in (10.48) is less than σ_i² for all i. As the total allowable distortion D is increased, the constant λ′ increases until it exceeds σ_i² for some i. At this point the solution (10.48) is on the boundary of the allowable region of distortions. If we increase the total distortion, we must use the Kuhn-Tucker conditions to find the minimum in (10.46). In this case the Kuhn-Tucker conditions yield

 

 

 

    ∂J/∂D_i = −(1/2)(1/D_i) + λ,    (10.49)

where λ is chosen so that

    ∂J/∂D_i  = 0   if D_i < σ_i²,
             ≤ 0   if D_i ≥ σ_i².    (10.50)

 

It is easy to check that the solution to the Kuhn-Tucker equations is given by the following theorem:

Theorem 10.3.3 (Rate distortion for a parallel Gaussian source) Let X_i ~ N(0, σ_i²), i = 1, 2, . . . , m, be independent Gaussian random variables, and let the distortion measure be d(x^m, x̂^m) = Σ_{i=1}^m (x_i − x̂_i)². Then the rate distortion function is given by

 

 

    R(D) = Σ_{i=1}^m (1/2) log(σ_i²/D_i),    (10.51)

 

 

 

 

 

 

where

 

 

 

 

 

 

 

 

    D_i =  λ      if λ < σ_i²,
           σ_i²   if λ ≥ σ_i²,    (10.52)

where λ is chosen so that Σ_{i=1}^m D_i = D.
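Theorem 10.3.3 translates into a small reverse water-filling computation. The sketch below (an illustrative implementation under the theorem's assumptions, with made-up variances and distortion budget) bisects on the water level λ so that Σ_i min(λ, σ_i²) = D and then sums the per-component rates (1/2) log(σ_i²/D_i):

    import numpy as np

    def reverse_waterfill(variances, D, iters=100):
        """Distortion allocation D_i = min(lambda, sigma_i^2) with sum D_i = D, and R(D) in bits."""
        variances = np.asarray(variances, dtype=float)
        lo, hi = 0.0, variances.max()
        for _ in range(iters):                        # bisection on the water level lambda
            lam = 0.5 * (lo + hi)
            if np.minimum(lam, variances).sum() > D:
                hi = lam
            else:
                lo = lam
        Di = np.minimum(lam, variances)
        rate = 0.5 * np.log2(variances / Di).sum()
        return Di, rate

    Di, R = reverse_waterfill([1.0, 0.5, 0.1], D=0.6)
    print(Di)    # components with sigma_i^2 below the water level get D_i = sigma_i^2
    print(R)     # total rate R(D) in bits (here 1.5 bits, with D_i = [0.25, 0.25, 0.1])

The components with small variance are described with zero rate (they are reproduced as 0), while the remaining components all incur the same distortion λ, exactly as the theorem prescribes.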