11.7 Sliding Block Source Codes
components. Since the components are ergodic, however, an infinite length sliding block encoder could select such a source sequence in a simple (if impractical) way: Proceed as in the proof of the theorem up to the use of Corollary 9.4.2. Instead of using this result, we construct by brute force a punctuation sequence for the ergodic component in effect. Suppose that $\mathcal{G} = \{G_i;\ i = 1, 2, \ldots\}$ is a countable generating field for the input sequence space. Given $\delta$, the infinite length sliding block encoder first finds the smallest value of $i$ for which
$$0 < \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} 1_{G_i}(T^k x),$$
and
$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} 1_{G_i}(T^k x)\,\rho(x_k, a^*) \le \delta,$$
that is, we find a set with strictly positive relative frequency (and hence strictly positive probability with respect to the ergodic component in effect) which occurs rarely enough to ensure that the sample average distortion between the symbols produced when $G_i$ occurs and the reference letter is smaller than $\delta$. Given $N$ and $\delta$ there must exist an $i$ for which these relations hold (apply the proof of Lemma 9.4.4 to the ergodic component in effect with $\gamma$ chosen to satisfy (11.23) for that component and then replace the arbitrary set $G$ by a set in the generating field having very close probability). Analogous to the proof of Lemma 9.4.4 we construct a punctuation sequence $\{Z_n\}$ using the event $G_i$ in place of $G$. The proof then follows in a like manner except that now from the dominated convergence theorem we have that
$$E(1_2(Z_0)\rho(X_0, a^*)) = \lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} E(1_2(Z_i)\rho(X_i, a^*)) = E\Big(\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} 1_2(Z_i)\rho(X_i, a^*)\Big) \le \delta$$
by construction.
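Comment: the selection rule above can be sketched computationally. The following Python fragment is a minimal added illustration, not part of the text; it assumes a finite sample path stands in for the orbit $\{T^k x\}$, and the names `G`, `rho`, and `a_star` are hypothetical stand-ins for the generating sets, the per-letter distortion, and the reference letter.

```python
def select_index(x, G, rho, a_star, delta):
    """Empirical version of the selection rule: return the smallest i such
    that G[i] occurs along the sample path with strictly positive relative
    frequency while the time average of 1_{G[i]}(T^k x) * rho(x_k, a_star)
    does not exceed delta.

    x      : a long sample path (list of source symbols), standing in for
             the orbit {T^k x}; membership in G[i] is tested on suffixes.
    G      : list of predicates G[i](suffix) -> bool, a stand-in for the
             countable generating field {G_i}.
    rho    : per-letter distortion rho(symbol, a_star).
    a_star : the reference letter.
    """
    n = len(x)
    for i, G_i in enumerate(G):
        hits = [k for k in range(n) if G_i(x[k:])]
        freq = len(hits) / n
        avg_dist = sum(rho(x[k], a_star) for k in hits) / n
        if freq > 0 and avg_dist <= delta:
            return i
    return None  # the text guarantees some i works for small enough delta
```

In the proof itself the limits are taken as $n \to \infty$, and the existence of a suitable $i$ is guaranteed by the argument of Lemma 9.4.4 applied to the ergodic component in effect.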
The above argument is patterned after that of Davisson and Gray [27] and extends the theorem to stationary nonergodic sources if infinite window sliding block encoders are allowed. We can then approximate this encoder by a finite-window encoder, but we must make additional assumptions to ensure that the resulting encoder yields a good approximation in the sense of overall distortion. Suppose that $f$ is the infinite window length encoder and $g$ is the finite window-length (say $2L + 1$) encoder. Let $\mathcal{G}$ denote a countable generating field of rectangles for the input sequence space. Then from Corollary 4.2.2 applied to $\mathcal{G}$, given $\epsilon > 0$ we can find for sufficiently large $N$ a finite window sliding block code $r : A^{2N+1} \to B$ such that $\Pr(r \ne f') \le \epsilon/(2L + 1)$, that is, the two encoders produce the same channel symbol with high probability. The issue is when does this imply that $\rho(fg, \mu)$ and $\rho(rg, \mu)$ are therefore also close, which
11.8 A Geometric Interpretation of OPTA's
We close this chapter on source coding theorems with a geometric interpretation of the OPTA functions in terms of the $\bar\rho$ distortion between sources. Suppose that $\mu$ is a stationary and ergodic source and that $\{\rho_n\}$ is an additive fidelity criterion with a reference letter. Suppose that we have a nearly optimal sliding block encoder and decoder for $\mu$ and a channel with rate $R$, that is, the overall process is $\{X_n, \hat X_n\}$ and
$$E\rho(X_0, \hat X_0) \le \delta(R,\mu) + \delta.$$
If the overall hookup (source/encoder/channel/decoder) yields a distribution $p$ on $\{X_n, \hat X_n\}$ and distribution $\eta$ on the reproduction process $\{\hat X_n\}$, then clearly
$$\bar\rho(\mu,\eta) \le \delta(R,\mu) + \delta.$$
Furthermore, since the channel alphabet is $B$ the channel process must have entropy rate less than $R = \log\|B\|$ and hence the reproduction process must also have entropy rate less than $R$ from Corollary 4.2.5. Since $\delta$ is arbitrary,
$$\delta(R,\mu) \ge \inf_{\eta:\,\bar H(\eta)\le R}\bar\rho(\mu,\eta).$$
Suppose next that $p$, $\mu$ and $\eta$ are stationary and ergodic and that $\bar H(\eta) \le R$. Choose a stationary $p$ having $\mu$ and $\eta$ as coordinate processes such that
$$E_p\rho(X_0, Y_0) \le \bar\rho(\mu,\eta) + \delta.$$
We have easily that $\bar I(X;Y) \le \bar H(\eta) \le R$ and hence the left-hand side is bounded below by the process distortion-rate function $\bar D_s(R,\mu)$. From Theorem 10.6.1 and the block source coding theorem, however, this is just the OPTA function. We have therefore proved the following:
Theorem 11.8.1: Let $\mu$ be a stationary and ergodic source and let $\{\rho_n\}$ be an additive fidelity criterion with a reference letter. Then
$$\delta(R,\mu) = \inf_{\eta:\,\bar H(\eta)\le R}\bar\rho(\mu,\eta),$$
that is, the OPTA function (and hence the distortion-rate function) of a stationary ergodic source is just the "distance" in the $\bar\rho$ sense to the nearest stationary and ergodic process with the specified reproduction alphabet and with entropy rate less than $R$.
This result originated in [55].
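For a concrete instance of Theorem 11.8.1 (an added illustration using the standard binary rate-distortion formula, not taken from the text): let $\mu$ be the Bernoulli(1/2) process and let $\rho_n$ be the Hamming distortion, so that $\bar\rho$ is the $\bar d$-distance. With logarithms in base 2 the distortion-rate function is $D(R) = h_b^{-1}(1-R)$ for $0\le R\le 1$, where $h_b$ is the binary entropy function, and the theorem identifies this value as the smallest $\bar d$-distance from the fair coin process to a stationary and ergodic binary process with entropy rate at most $R$:
$$\inf_{\eta:\,\bar H(\eta)\le R}\bar d(\mu,\eta) = h_b^{-1}(1-R).$$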
12.2 Feinstein's Lemma
$$= \int_{F\times B} dP_{XY}(x,y) = P_X(F) \le 1$$
and hence
$$\int dP_Y(y)\,f(x,y) \le 1, \qquad P_X\text{-a.e.} \qquad (12.2)$$
Feinstein's lemma shows that we can pick $M$ inputs $\{x_i \in A;\ i = 1, 2, \ldots, M\}$, and a corresponding collection of $M$ disjoint output events $\{\Gamma_i \in \mathcal{B}_B;\ i = 1, 2, \ldots, M\}$, with the property that given an input $x_i$, with high probability the output will be in $\Gamma_i$. We call the collection $\mathcal{C} = \{x_i, \Gamma_i;\ i = 1, 2, \ldots, M\}$ a code with codewords $x_i$ and decoding regions $\Gamma_i$. We do not require that the $\Gamma_i$ exhaust $B$.
The generalization of Feinstein's original proof for finite alphabets to general measurable spaces is due to Kadota [70] and the following proof is based on his.
Lemma 12.2 (Feinstein's Lemma): Given an integer $M$ and $a > 0$ there exist $x_i \in A$, $i = 1, \ldots, M$, and a measurable partition $\mathcal{F} = \{\Gamma_i;\ i = 1, \ldots, M\}$ of $B$ such that
$$\nu(\Gamma_i^c|x_i) \le M e^{-a} + P_{XY}(i \le a).$$
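Comment: to see how the bound will be used (a reformulation added for orientation, not in the original text), write $a = \log M + r$ with $r > 0$; the lemma then reads
$$\nu(\Gamma_i^c|x_i) \le e^{-r} + P_{XY}(i \le \log M + r),$$
so the maximal error probability of the code is small whenever the information density exceeds $\log M$ by a comfortable margin with high probability. Feinstein's theorem in the next section uses the lemma in exactly this form, with $M = \lfloor e^{nR}\rfloor$ and $a = n(R+\delta)$.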
Proof: Define $G = \{x, y : i(x,y) > a\}$. Set $\epsilon = M e^{-a} + P_{XY}(i \le a) = M e^{-a} + P_{XY}(G^c)$. The result is obvious if $\epsilon \ge 1$ and hence we assume that $\epsilon < 1$ and hence also that
$$P_{XY}(G^c) \le \epsilon < 1$$
and therefore that
$$P_{XY}(i > a) = P_{XY}(G) = \int dP_X(x)\,\nu(G_x|x) > 1 - \epsilon > 0.$$
This implies that the set $\tilde A = \{x : \nu(G_x|x) > 1 - \epsilon$ and (12.2) holds$\}$ must have positive measure under $P_X$. We now construct a code consisting of input points
$x_i$ and output sets $\Gamma_{x_i}$. Choose an $x_1 \in \tilde A$ and define $\Gamma_{x_1} = G_{x_1}$. Next choose if possible a point $x_2 \in \tilde A$ for which $\nu(G_{x_2} - \Gamma_{x_1}|x_2) > 1 - \epsilon$. Continue in this way until either $M$ points have been selected or all the points in $\tilde A$ have been exhausted. In particular, given the pairs $\{x_j, \Gamma_j\}$, $j = 1, 2, \ldots, i-1$, satisfying the condition, find an $x_i$ for which
$$\nu\Big(G_{x_i} - \bigcup_{j<i}\Gamma_{x_j}\,\Big|\,x_i\Big) > 1 - \epsilon. \qquad (12.3)$$
If the procedure terminates before $M$ points have been collected, denote the final point's index by $n$. Observe that
$$\nu(\Gamma_{x_i}^c|x_i) \le \nu(G_{x_i}^c|x_i) \le \epsilon, \qquad i = 1, 2, \ldots, n$$
and hence the lemma will be proved if we can show that necessarily $n$ cannot be strictly less than $M$. We do this by assuming the contrary and finding a contradiction.
Suppose that the selection has terminated at $n < M$ and define the set $F = \bigcup_{i=1}^{n}\Gamma_{x_i} \in \mathcal{B}_B$. Consider the probability
$$P_{XY}(G) = P_{XY}(G \cap (A\times F)) + P_{XY}(G \cap (A\times F^c)). \qquad (12.4)$$
The first term can be bounded above as
$$P_{XY}(G\cap(A\times F)) \le P_{XY}(A\times F) = P_Y(F) = \sum_{i=1}^{n} P_Y(\Gamma_{x_i}).$$
We also have from the definitions and from (12.2) that
$$P_Y(\Gamma_{x_i}) = \int_{\Gamma_{x_i}} dP_Y(y) \le \int_{G_{x_i}} dP_Y(y) \le \int_{G_{x_i}} dP_Y(y)\,\frac{f(x_i,y)}{e^a} \le e^{-a}\int dP_Y(y)\,f(x_i,y) \le e^{-a}$$
and hence
$$P_{XY}(G\cap(A\times F)) \le n e^{-a}. \qquad (12.5)$$
Consider the second term of (12.4):
$$P_{XY}(G\cap(A\times F^c)) = \int dP_X(x)\,\nu\big((G\cap(A\times F^c))_x\,\big|\,x\big) = \int dP_X(x)\,\nu(G_x\cap F^c\,|\,x) = \int dP_X(x)\,\nu\Big(G_x - \bigcup_{i=1}^{n}\Gamma_i\,\Big|\,x\Big). \qquad (12.6)$$
We must have, however, that
$$\nu\Big(G_x - \bigcup_{i=1}^{n}\Gamma_i\,\Big|\,x\Big) \le 1 - \epsilon$$
with $P_X$ probability 1, or there would be a point $x_{n+1}$ for which
$$\nu\Big(G_{x_{n+1}} - \bigcup_{i=1}^{n}\Gamma_i\,\Big|\,x_{n+1}\Big) > 1 - \epsilon,$$
that is, (12.3) would hold for $i = n + 1$, contradicting the definition of $n$ as the largest integer for which (12.3) holds. Applying this observation to (12.6) yields
$$P_{XY}(G\cap(A\times F^c)) \le 1 - \epsilon,$$
which with (12.4) and (12.5) implies that
$$P_{XY}(G) \le n e^{-a} + 1 - \epsilon. \qquad (12.7)$$
From the definition of $\epsilon$, however, we have also that
$$P_{XY}(G) = 1 - P_{XY}(G^c) = 1 - \epsilon + M e^{-a},$$
which with (12.7) implies that $M \le n$, completing the proof. $\Box$
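Comment: for finite alphabets the proof is an algorithm that can be run directly. The Python sketch below is an added illustration, not part of the text; the channel matrix `W`, the input distribution `PX`, and the parameters are placeholders, and the single-letter alphabet can equally be read as an alphabet of $n$-tuples.

```python
import math

def feinstein_code(PX, W, M, a):
    """Greedy construction from the proof of Feinstein's lemma for a finite
    channel W[x][y] = nu(y|x) and input distribution PX[x].

    Returns up to M pairs (x_i, Gamma_i) of codewords and disjoint decoding
    sets, where G_x = {y : i(x, y) > a} with i(x, y) = log(W[x][y] / PY[y])
    and Gamma_i is G_{x_i} minus the regions already claimed.
    """
    X = [x for x in PX if PX[x] > 0]                 # candidate codewords
    Y = list(next(iter(W.values())).keys())
    # Output distribution PY(y) = sum_x PX(x) W(y|x).
    PY = {y: sum(PX[x] * W[x][y] for x in X) for y in Y}
    # epsilon = M e^{-a} + P_XY(i <= a), as in the proof.
    eps = M * math.exp(-a) + sum(
        PX[x] * W[x][y] for x in X for y in Y
        if W[x][y] == 0 or math.log(W[x][y] / PY[y]) <= a)

    code, claimed = [], set()
    for x in X:
        G_x = {y for y in Y
               if W[x][y] > 0 and math.log(W[x][y] / PY[y]) > a}
        Gamma = G_x - claimed                        # selection criterion (12.3)
        if sum(W[x][y] for y in Gamma) > 1 - eps:
            code.append((x, Gamma))
            claimed |= Gamma
            if len(code) == M:
                break
    return code  # when eps < 1 the lemma guarantees M pairs are found
```

For instance, taking `W` to be the $n$-fold product of a memoryless channel's transition matrix with inputs drawn from $A^n$, $M = \lfloor e^{nR}\rfloor$, and $a = n(R+\delta)$ recovers the setting of Feinstein's theorem in the next section.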
12.3 Feinstein's Theorem
Given a channel $[A, \nu, B]$, an $(M, n, \epsilon)$ block channel code for $\nu$ is a collection $\{w_i, \Gamma_i\}$, $i = 1, 2, \ldots, M$, where $w_i \in A^n$, $\Gamma_i \in \mathcal{B}_B^n$, all $i$, with the property that
$$\max_{i=1,\ldots,M}\ \sup_{x\in c(w_i)}\ \nu_x^n(\Gamma_i^c) \le \epsilon, \qquad (12.8)$$
where $c(a^n) = \{x : x^n = a^n\}$ and where $\nu_x^n$ is the restriction of $\nu_x$ to $\mathcal{B}_B^n$. The rate of the code is defined as $n^{-1}\log M$. Thus an $(M, n, \epsilon)$ channel code is a collection of $M$ input $n$-tuples and corresponding output cells such that regardless of the past or future inputs, if the input during time 1 to $n$ is a channel codeword, then the output during time 1 to $n$ is very likely to lie in the corresponding output cell. Channel codes will be useful in a communication system because they permit nearly error free communication of a select group of messages or codewords. A communication system can then be constructed for communicating a source over the channel reliably by mapping source blocks into channel codewords. If there are enough channel codewords to assign to all of the source blocks (at least the most probable ones), then that source can be reliably reproduced by the receiver. Hence a fundamental issue for such an application will be the number of messages $M$ or, equivalently, the rate $R$ of a channel code.
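As a numerical illustration (added, not in the original text): a code with block length $n = 100$ and $M = 2^{50}$ codewords has rate $n^{-1}\log M = 50\ln 2/100 \approx 0.347$ nats (0.5 bits) per channel symbol, so the number of available messages grows exponentially in the block length at the rate $R$.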
Feinstein's lemma can be applied fairly easily to obtain something that resembles a coding theorem for a noisy channel. Suppose that $[A, \nu, B]$ is a channel and $[A, \mu]$ is a source and that $[A\times B, p = \mu\nu]$ is the resulting hookup. Denote the resulting pair process by $\{X_n, Y_n\}$. For any integer $K$ let $p^K$ denote the restriction of $p$ to $(A^K\times B^K, \mathcal{B}_A^K\times\mathcal{B}_B^K)$, that is, the distribution on input/output $K$-tuples $(X^K, Y^K)$. The joint distribution $p^K$ together with the input distribution $\mu^K$ induce a regular conditional probability $\hat\nu^K$ defined by $\hat\nu^K(F|x^K) = \Pr(Y^K\in F\,|\,X^K = x^K)$. In particular,
$$\hat\nu^K(G|a^K) = \Pr(Y^K\in G\,|\,X^K = a^K) = \frac{1}{\mu^K(a^K)}\int_{c(a^K)} \nu_x^K(G)\,d\mu(x), \qquad (12.9)$$
where $c(a^K) = \{x : x^K = a^K\}$ is the rectangle of all input sequences with a common $K$-dimensional prefix $a^K$. We call $\hat\nu^K$ the induced $K$-dimensional channel of the channel $\nu$ and the source $\mu$. It is important to note that the induced channel depends on the source as well as on the channel, a fact that will cause some difficulty in applying Feinstein's lemma. An exception to this case which proves to be an easy application is that of a channel without input memory and anticipation, in which case we have from the definitions that
$$\hat\nu^K(F|a^K) = \nu_x(Y^K\in F), \qquad x\in c(a^K).$$
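For finite alphabets the induced channel of (12.9) is a weighted average of the channel distributions over the input rectangle. The following Python sketch is an added illustration with hypothetical inputs: the rectangle $c(a^K)$ is represented by a finite list of input sequences, each with its source probability and its restricted channel distribution $\nu_x^K$.

```python
def induced_channel(cell, G):
    """Induced K-dimensional channel of (12.9),

        nu_hat^K(G | a^K) = (1 / mu^K(a^K)) * sum_{x in c(a^K)} mu(x) nu_x^K(G),

    written for a finite list of representative input sequences.

    cell : list of (mu_x, nu_x) pairs for the sequences x with x^K = a^K,
           where mu_x is the source probability of x and nu_x maps output
           K-tuples to probabilities (the restriction nu_x^K).
    G    : set of output K-tuples.
    """
    mu_aK = sum(mu_x for mu_x, _ in cell)                       # mu^K(a^K)
    total = sum(mu_x * sum(p for yK, p in nu_x.items() if yK in G)
                for mu_x, nu_x in cell)
    return total / mu_aK

# Two input sequences share the prefix a^K but see different channel
# behaviour; the induced channel mixes them with weights mu(x) / mu^K(a^K).
cell = [(0.3, {"00": 0.9, "01": 0.1}),
        (0.1, {"00": 0.5, "01": 0.5})]
print(induced_channel(cell, {"00"}))  # (0.3*0.9 + 0.1*0.5) / 0.4 = 0.8
```

The dependence on the source enters only through the weights $\mu(x)/\mu^K(a^K)$; for a channel without input memory and anticipation every $\nu_x^K$ in the cell coincides and the average collapses, which is the special case noted above.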
Application of Feinstein's lemma to the induced channel yields the following result, which was proved by Feinstein for stationary finite alphabet channels and is known as Feinstein's theorem:
Lemma 12.3.1: Suppose that $[A\times B, \mu\nu]$ is an AMS and ergodic hookup of a source $\mu$ and channel $\nu$. Let $\bar I_{\mu\nu} = \bar I_{\mu\nu}(X;Y)$ denote the average mutual information rate and assume that $\bar I_{\mu\nu} = I^*_{\mu\nu}$ is finite (as is the case if the alphabets are finite (Theorem 6.4.1) or have the finite-gap information property (Theorem 6.4.3)). Then for any $R < \bar I_{\mu\nu}$ and any $\epsilon > 0$ there exists for sufficiently large $n$ a code $\{w_i^n, \Gamma_i;\ i = 1, 2, \ldots, M\}$, where $M = \lfloor e^{nR}\rfloor$, $w_i^n\in A^n$, and $\Gamma_i\in\mathcal{B}_B^n$, with the property that
$$\hat\nu^n(\Gamma_i^c|w_i^n) \le \epsilon, \qquad i = 1, 2, \ldots, M. \qquad (12.10)$$
Comment: We shall call a code $\{w_i, \Gamma_i;\ i = 1, 2, \ldots, M\}$ which satisfies (12.10) for a channel input process $\mu$ a $(\mu, M, n, \epsilon)$-Feinstein code. The quantity $n^{-1}\log M$ is called the rate of the Feinstein code.
Proof: Let $\eta$ denote the output distribution induced by $\mu$ and $\nu$. Define the information density
$$i_n = \log\frac{dp^n}{d(\mu^n\times\eta^n)}$$
and define
$$\delta = \frac{\bar I_{\mu\nu} - R}{2} > 0.$$
Apply Feinstein's lemma to the $n$-dimensional hookup $(\mu\nu)^n$ with $M = \lfloor e^{nR}\rfloor$ and $a = n(R + \delta)$ to obtain a code $\{w_i, \Gamma_i\}$, $i = 1, 2, \ldots, M$, with
$$\max_i \hat\nu^n(\Gamma_i^c|w_i^n) \le M e^{-n(R+\delta)} + p^n(i_n \le n(R+\delta)) = \lfloor e^{nR}\rfloor e^{-n(R+\delta)} + p\Big(\frac{1}{n}\,i_n(X^n, Y^n) \le R+\delta\Big) \qquad (12.11)$$
and hence
$$\max_i \hat\nu^n(\Gamma_i^c|w_i^n) \le e^{-n\delta} + p\Big(\frac{1}{n}\,i_n(X^n, Y^n) \le \bar I_{\mu\nu} - \delta\Big). \qquad (12.12)$$
From Theorem 6.3.1, $n^{-1} i_n$ converges in $L^1$ to $\bar I_{\mu\nu}$ and hence it also converges in probability. Thus given $\epsilon$ we can choose an $n$ large enough to ensure that the right-hand side of (12.11) is smaller than $\epsilon$, which completes the proof of the theorem. $\Box$
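The quantity that drives (12.11) and (12.12) is the probability that the per-letter information density falls below its limit. For a memoryless channel with i.i.d. input this density is a sum of i.i.d. terms and the probability can be estimated directly; the following Python sketch is an added illustration for a binary symmetric channel with uniform input (a hypothetical example, not part of the proof).

```python
import math
import random

def density_shortfall_prob(n, p_flip, threshold, trials=2000):
    """Estimate p( (1/n) i_n(X^n, Y^n) <= threshold ) for an i.i.d. uniform
    binary input over a binary symmetric channel with crossover p_flip.

    Per letter, i = log(W(y|x) / PY(y)) with PY uniform, so i equals
    log(2(1 - p_flip)) when y == x and log(2 p_flip) otherwise; i_n is the
    sum of these i.i.d. terms (natural logarithms, i.e. nats).
    """
    i_same = math.log(2 * (1 - p_flip))
    i_diff = math.log(2 * p_flip)
    count = 0
    for _ in range(trials):
        flips = sum(random.random() < p_flip for _ in range(n))
        i_n = flips * i_diff + (n - flips) * i_same
        if i_n / n <= threshold:
            count += 1
    return count / trials

# The mean of i per letter is log 2 - h(p_flip), about 0.368 nats for
# p_flip = 0.1, so the shortfall probability below 0.3 nats shrinks with n.
print(density_shortfall_prob(n=200, p_flip=0.1, threshold=0.3))
```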
We said that the lemma "resembled" a coding theorem because a real coding theorem would prove the existence of an $(M, n, \epsilon)$ channel code, that is, it would concern the channel $\nu$ itself and not the induced channel $\hat\nu$, which depends on a channel input process distribution $\mu$. The difference between a Feinstein code and a channel code is that the Feinstein code has a similar property for an induced channel which in general depends on a source distribution, while the channel code has this property independent of any source distribution and for any past or future inputs.
Feinstein codes will be used to construct block codes for noisy channels. The simplest such construction is presented next.
