
Figure 45.4 An 8×8 array of luma samples from an image is shown. This 8×8 array is known in JPEG terminology as a block.

optimizations comparable to those of the fast Fourier transform (FFT), greatly reduces computational complexity: A fully optimized 8×8 DCT requires as few as 11 multiplies for each 8 samples (or in an IDCT, transform coefficients).

JPEG encoding example

I will illustrate JPEG encoding by walking through a numerical example. Figure 45.4 represents an 8×8 array of luma samples from an image, prior to level shifting:

$$
f = \begin{bmatrix}
139 & 144 & 149 & 153 & 155 & 155 & 155 & 155 \\
144 & 151 & 153 & 156 & 159 & 156 & 156 & 156 \\
150 & 155 & 160 & 163 & 158 & 156 & 156 & 156 \\
159 & 161 & 162 & 160 & 160 & 159 & 159 & 159 \\
159 & 160 & 161 & 162 & 162 & 155 & 155 & 155 \\
161 & 161 & 161 & 161 & 160 & 157 & 157 & 157 \\
162 & 162 & 161 & 163 & 162 & 157 & 157 & 157 \\
162 & 162 & 161 & 161 & 163 & 158 & 158 & 158
\end{bmatrix}
$$

Figure 45.5 The DCT tends to concentrate the power of the image block into low-frequency DCT coefficients (those in the upper left-hand corner of the matrix). No information is lost at this stage. The DCT is its own inverse, within a scale factor, so performing the DCT on these transform coefficients would reconstruct the original samples (subject only to roundoff error).

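The result of computing the DCT, rounded to integers, is shown in Figure 45.5:

$$
F = \begin{bmatrix}
1260 & -1 & -12 & -5 & 2 & -2 & -3 & 1 \\
-23 & -17 & -6 & -3 & -3 & 0 & 0 & 1 \\
-11 & -9 & -2 & 2 & 0 & -1 & -1 & 0 \\
-7 & -2 & 0 & 1 & 1 & 0 & 0 & 0 \\
-1 & -1 & 1 & 2 & 0 & -1 & 1 & 1 \\
2 & 0 & 2 & 0 & -1 & 1 & 1 & -1 \\
-1 & 0 & 0 & -1 & 0 & 2 & 1 & -1 \\
-3 & 2 & -4 & -2 & 2 & 1 & -1 & 0
\end{bmatrix}
$$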

In MPEG-2, DC terms can be coded with 8, 9, or 10 bits – or, in 4:2:2 profile, 11 bits – of precision.

This example shows that image power is concentrated into low-frequency transform coefficients – that is, those coefficients in the upper left-hand corner of the DCT matrix. No information is lost at this stage. The DCT is its own inverse, so performing the DCT a second time would perfectly reconstruct the original samples, subject only to the roundoff error in the DCT and IDCT.
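The concentration of power into the upper left-hand corner can be checked numerically. The following is a minimal sketch, not an optimized DCT: it assumes the normalization commonly used for the JPEG 8×8 DCT (a factor of 1/4 and C(0) = 1/√2), which is taken here to correspond to Equation 45.1. The names `dct_8x8`, `f`, and `F` are illustrative only.

```python
import math

# Luma samples from Figure 45.4 (no level shifting applied).
f = [
    [139, 144, 149, 153, 155, 155, 155, 155],
    [144, 151, 153, 156, 159, 156, 156, 156],
    [150, 155, 160, 163, 158, 156, 156, 156],
    [159, 161, 162, 160, 160, 159, 159, 159],
    [159, 160, 161, 162, 162, 155, 155, 155],
    [161, 161, 161, 161, 160, 157, 157, 157],
    [162, 162, 161, 163, 162, 157, 157, 157],
    [162, 162, 161, 161, 163, 158, 158, 158],
]


def dct_8x8(block):
    """Direct (slow) 2-D DCT-II of an 8x8 block, JPEG-style scaling assumed."""
    n = 8

    def c(k):
        # Normalization constant: 1/sqrt(2) for the zero-frequency index.
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency index (row of F)
        for v in range(n):      # horizontal frequency index (column of F)
            acc = 0.0
            for x in range(n):
                for y in range(n):
                    acc += (block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / 16)
                            * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * acc
    return out


F = dct_8x8(f)
total = sum(sum(row) for row in f)

# The DC term should equal one-eighth of the sum of the samples.
print(round(F[0][0]), total / 8)
# The first few coefficients of the top row and left column, rounded.
print([round(F[0][v]) for v in range(4)], [round(F[u][0]) for u in range(4)])
```

Under this normalization the DC term evaluates to about 1259.6, which rounds to the 1260 shown in Figure 45.5; the coefficients away from the upper left-hand corner are small by comparison.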

As expressed in Equation 45.1, the arithmetic of an 8×8 DCT effectively causes the coefficient values to be multiplied by a factor of 8 relative to the original sample values. The value 1260 in the [0, 0] entry – the DC coefficient, or term – is 1/8 of the sum of the original sample values. (All of the other coefficients are referred to as AC.)

DIGITAL VIDEO AND HD ALGORITHMS AND INTERFACES

In MPEG, default quantizer matrices are standardized, but they can be overridden by matrices conveyed in the bitstream.

$$
Q = \begin{bmatrix}
16 & 11 & 10 & 16 & 24 & 40 & 51 & 61 \\
12 & 12 & 14 & 19 & 26 & 58 & 60 & 55 \\
14 & 13 & 16 & 24 & 40 & 57 & 69 & 56 \\
14 & 17 & 22 & 29 & 51 & 87 & 80 & 62 \\
18 & 22 & 37 & 56 & 68 & 109 & 103 & 77 \\
24 & 35 & 55 & 64 & 81 & 104 & 113 & 92 \\
49 & 64 & 78 & 87 & 103 & 121 & 120 & 101 \\
72 & 92 & 95 & 98 & 112 & 100 & 103 & 99
\end{bmatrix}
$$

Figure 45.6 A typical JPEG quantizer matrix reflects the visual system’s poor sensitivity to high spatial frequencies. Transform coefficients can be approximated, to some degree, without introducing noticeable impairments. The quantizer matrix codes a step size for each spatial frequency. Each transform coefficient is divided by the corresponding quantizer value; the remainder (or fraction) is discarded. Discarding the fraction is what makes JPEG lossy.

The human visual system is not very sensitive to information at high spatial frequencies. Information at high spatial frequencies can be discarded, to some degree, without introducing noticeable impairments. JPEG uses a quantizer matrix (Q), which codes a step size for each of the 64 spatial frequencies. In the quantization step of compression, each transform coefficient is divided by the corresponding quantizer value (step size) entry in the Q matrix. The remainder (fraction) after division is discarded.

It is not the DCT itself, but the discarding of the fraction after quantization of the transform coefficients, that makes JPEG lossy!

JPEG has no standard or default quantizer matrix; however, sample matrices given in a nonnormative appendix are often used. Typically, there are two matrices, one for luma and one for colour differences.

An example Q matrix is shown in Figure 45.6 above. Its entries form a radially symmetric version of Figure 23.5, on page 252. The [0, 0] entry in the quantizer matrix is relatively small (here, 16), so the DC term is finely quantized. Further from [0, 0], the entries get larger, and the quantization becomes coarser. Owing to the large step sizes associated with the high-order coefficients, they can be represented by fewer bits.

CHAPTER 45

JPEG AND MOTION-JPEG (M-JPEG) COMPRESSION

Figure 45.7 DCT coefficients after quantization are shown. Most of the high-frequency information in this block – DCT entries at the right and the bottom of the matrix – is quantized to zero. The nonzero coefficients have small magnitudes.

In the JPEG and MPEG standards, and in most JPEG-like schemes, each entry in the quantizer matrix takes a value between 1 and 255.

At first glance, the large step size associated with the DC coefficient (here, $Q_{0,0} = 16$) looks worrisome: With 8-bit data ranging from -127 to +128, owing to the divisor of 16, you might expect this quantized coefficient to be represented with just 4 bits. However, as mentioned earlier, the arithmetic of Equation 45.1 scales the coefficients by 8 with respect to the sample values, so a quantizer value of 16 corresponds to 7 bits of precision when referenced to the sample values.
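The bit counts in this argument can be checked directly, assuming 256 levels for 8-bit samples and the factor of 8 from Equation 45.1:

$$
\log_2 \frac{256}{16} = 4 \text{ bits}, \qquad \log_2 \frac{8 \times 256}{16} = \log_2 128 = 7 \text{ bits}.
$$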

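DCT coefficients after quantization, and after discarding the quotient fractions, are shown in Figure 45.7:

$$
F^{*} = \begin{bmatrix}
79 & 0 & -1 & 0 & 0 & 0 & 0 & 0 \\
-2 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
-1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$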

Most of the high-frequency information in this block – the DCT entries at the right and the bottom of the matrix – is quantized to zero. Apart from the DC term, the nonzero coefficients have small magnitudes.
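The quantization step can be sketched as follows, using the integer coefficients of Figure 45.5 and the quantizer matrix of Figure 45.6. One assumption is worth flagging: this sketch rounds each quotient to the nearest integer (Python's built-in `round`), because that choice reproduces Figure 45.7; truncating the DC quotient 1260/16 = 78.75 toward zero would give 78 rather than 79. The variable names are illustrative.

```python
# DCT coefficients from Figure 45.5.
F = [
    [1260,  -1, -12,  -5,   2,  -2,  -3,   1],
    [ -23, -17,  -6,  -3,  -3,   0,   0,   1],
    [ -11,  -9,  -2,   2,   0,  -1,  -1,   0],
    [  -7,  -2,   0,   1,   1,   0,   0,   0],
    [  -1,  -1,   1,   2,   0,  -1,   1,   1],
    [   2,   0,   2,   0,  -1,   1,   1,  -1],
    [  -1,   0,   0,  -1,   0,   2,   1,  -1],
    [  -3,   2,  -4,  -2,   2,   1,  -1,   0],
]

# Quantizer matrix (step sizes) from Figure 45.6.
Q = [
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
]

# Divide each coefficient by its step size and keep only an integer.
# Rounding to nearest is an assumption made to match Figure 45.7.
F_star = [[round(F[u][v] / Q[u][v]) for v in range(8)] for u in range(8)]

# Count how many of the 64 quantized coefficients remain nonzero.
nonzero = sum(1 for row in F_star for value in row if value != 0)

for row in F_star:
    print(row)
print("nonzero coefficients:", nonzero)
```

Only the first three rows come out with nonzero entries; everything else in the block quantizes to zero, which is what makes the subsequent zigzag and run-length steps effective.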

Following quantization, the quantized coefficients are rearranged according to the likely distribution of image power in the block. This is accomplished by zigzag scanning, sketched in Figure 45.8 at the top of the facing page.

Figure 45.8 Zigzag scanning is used to rearrange quantized coefficients according to the likely distribution of image power in the block.

$$
F^{*} = \begin{bmatrix}
79 & 0 & -1 & 0 & 0 & 0 & 0 & 0 \\
-2 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
-1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$

In JPEG and MPEG terminology, the magnitude (absolute value) of a coefficient is called its level.

MPEG’s VLE tables are standardized; they do not need to be transmitted with each sequence or each picture.

Once rearranged, the quantized coefficients are represented in a one-dimensional string; an end of block (EOB) code marks the location in the string where all succeeding coefficients are zero, as sketched in Figure 45.9:

79  0  -2  -1  -1  -1  0  0  -1  EOB

Figure 45.9 Zigzag-scanned coefficient string
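The zigzag reordering can be sketched in a few lines. This is an illustrative implementation, assuming the conventional JPEG zigzag path that starts at [0, 0], steps right to [0, 1], and then alternates direction along successive anti-diagonals; the function names and the string `"EOB"` used as a marker are hypothetical stand-ins.

```python
def zigzag_order(n=8):
    """Return (row, column) index pairs of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):                 # s = row + column of a diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)              # even diagonals run upward
        order.extend((i, s - i) for i in rows)
    return order


def zigzag_string(block):
    """Scan a block in zigzag order and replace the trailing zeros with EOB."""
    values = [block[i][j] for i, j in zigzag_order(len(block))]
    while values and values[-1] == 0:
        values.pop()
    return values + ["EOB"]


# Quantized coefficients from Figure 45.7: zero everywhere except six entries.
F_star = [[0] * 8 for _ in range(8)]
F_star[0][0], F_star[0][2] = 79, -1
F_star[1][0], F_star[1][1] = -2, -1
F_star[2][0], F_star[2][1] = -1, -1

string = zigzag_string(F_star)
print(string)
```

For the block of Figure 45.7 this yields the nine coefficients of Figure 45.9 followed by the end-of-block marker; the remaining 55 zeros are never written out.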

In the usual case that just a few high-order quantized coefficients are nonzero, zigzag reordering tends to produce strings of repeating zeros. Additional compression can be accomplished by using variable-length encoding (VLE, also known as Huffman coding). Variable-length encoding is a lossless process that takes advantage of the statistics of the “run length” (the count of zero codes) and the “level” (absolute value, or magnitude) of the following transform coefficient.

The DC term is treated specially: It is differentially coded. The first DC term is coded directly (using a DC VLE table), but each successive DC term is coded as its difference from the preceding DC term. In essence, the previous DC term is used as a predictor for the current term. Separate predictors are maintained for Y′, C_B, and C_R.
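Differential coding of the DC term amounts to a one-tap predictor. The sketch below assumes the predictor starts at zero, so that the first DC term is effectively coded directly. Apart from the 79 taken from this example, the DC values are invented for illustration, and the function names are hypothetical.

```python
def dc_encode(dc_terms, predictor=0):
    """Replace each DC term by its difference from the previous DC term."""
    diffs = []
    for dc in dc_terms:
        diffs.append(dc - predictor)
        predictor = dc          # the current term predicts the next one
    return diffs


def dc_decode(diffs, predictor=0):
    """Invert dc_encode by accumulating the differences."""
    dc_terms = []
    for diff in diffs:
        predictor += diff
        dc_terms.append(predictor)
    return dc_terms


# Quantized DC terms of successive blocks of one component (illustrative).
dc_terms = [79, 81, 78, 78]
diffs = dc_encode(dc_terms)
print(diffs)
print(dc_decode(diffs) == dc_terms)
```

Because neighbouring blocks tend to have similar average levels, the differences are usually much smaller than the DC terms themselves. A real codec would keep one such predictor per component, as the text notes.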

Zero AC coefficients are collapsed, and the string is represented in {run length, level} pairs, as shown in Figure 45.10:

{1: -2}, {0: -1}, {0: -1}, {0: -1}, {2: -1}, EOB

Figure 45.10 VLE {run length, level} pairs
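Collapsing the zero AC coefficients into {run length, level} pairs can be sketched as below. The input is the AC portion of the string in Figure 45.9 (the DC term, 79, is handled separately). Following Figure 45.10, the sketch keeps the sign with each value; the function name is illustrative.

```python
def run_level_pairs(ac_coefficients):
    """Collapse zeros into (run length, value) pairs for nonzero coefficients.

    Trailing zeros produce no pair; the EOB code stands in for them.
    """
    pairs = []
    run = 0
    for coefficient in ac_coefficients:
        if coefficient == 0:
            run += 1                    # extend the current run of zeros
        else:
            pairs.append((run, coefficient))
            run = 0
    return pairs


# AC coefficients from Figure 45.9, in zigzag order, up to the EOB code.
ac = [0, -2, -1, -1, -1, 0, 0, -1]
pairs = run_level_pairs(ac)
print(pairs, "EOB")
```

Each pair is what a VLE table would then map to a variable-length bitstring.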

A JPEG encoder has one or more VLE tables that map the set of {run length, level} pairs to variable-length bitstrings; pairs with high probability are assigned short bitstrings. JPEG has no standard VLE tables; however, sample tables given in a nonnormative appendix are often used. Typically, there are two tables, one for luma and one for colour differences. The tables used for an
