
Figure 45.1 A JPEG 4:2:0 minimum coded unit (MCU) comprises six 8×8 blocks: four luma blocks, a block of CB, and a block of CR. The six constituent blocks result from nonlinear R’G’B’ data being matrixed to Y’CBCR, then subsampled according to the 4:2:0 scheme; chroma subsampling is effectively the first stage of compression. The blocks are processed independently. [Figure: four 8×8 luma (Y’) blocks, one 8×8 CB block, and one 8×8 CR block.]

In MPEG, a macroblock is the area covered by a 16×16 array of luma samples. In DV, a macroblock comprises the Y’, CB, and CR blocks covered by an 8×8 array (block) of chroma samples. In JPEG, an MCU comprises those blocks covered by the minimum-sized tiling of Y’, CB, and CR blocks. For 4:2:0 subsampling, all of these definitions are equivalent; they differ for 4:1:1 and 4:2:2 (or for JPEG’s other rarely used patterns).
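The block counts implied by these definitions can be sketched as follows. This is an illustrative sketch, not from the book; the table and function names are mine. One 8×8 chroma block covers a luma area of (8·h)×(8·v) samples, so an MCU tiles h·v luma blocks alongside one CB block and one CR block:

```python
SUBSAMPLING = {          # scheme: (horizontal, vertical) chroma subsampling
    "4:2:0": (2, 2),
    "4:2:2": (2, 1),
    "4:1:1": (4, 1),
}

def blocks_per_mcu(scheme):
    """Return (luma, CB, CR) block counts for one MCU."""
    h, v = SUBSAMPLING[scheme]
    return (h * v, 1, 1)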

In desktop graphics, saving JPEG at high quality may cause individual R’G’B’ channels (components) to be compressed without subsampling.

Quantizer matrices and VLE tables will be described in the example starting on page 496.

I use zero-origin array indexing.

JPEG blocks and MCUs

An 8×8 array of sample data is known in JPEG terminology as a block. Prior to JPEG compression of a colour image, normally the nonlinear R’G’B’ data is matrixed to Y’CBCR, then subsampled 4:2:0. According to the JPEG standard (and the JFIF standard, to be described), other colour subsampling schemes are possible; strangely, different subsampling ratios are permitted for CB and CR. However, only 4:2:0 is widely deployed, and the remainder of this discussion assumes 4:2:0. Four 8×8 luma blocks, an 8×8 block of CB, and an 8×8 block of CR are known in JPEG terminology as a minimum coded unit (MCU); this corresponds to a macroblock in DV or MPEG terminology. The 4:2:0 macroblock arrangement is shown in Figure 45.1 above. The luma and colour difference blocks are processed independently by JPEG, using virtually identical algorithms. The only significant difference is that the quantizer matrix and the VLE tables used for chroma blocks are usually different from those used for luma blocks.
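Forming one 4:2:0 MCU from a 16×16 tile of Y’CBCR samples can be sketched as follows. This is a minimal illustration: the function names are mine, and simple 2×2 averaging stands in for whatever subsampling filter a particular encoder actually applies.

```python
def split_luma(y16):
    """Split a 16×16 Y' tile into four 8×8 blocks (raster order)."""
    return [[row[c:c + 8] for row in y16[r:r + 8]]
            for r in (0, 8) for c in (0, 8)]

def subsample_420(chroma16):
    """Average each 2×2 neighbourhood of a 16×16 plane down to 8×8."""
    return [[(chroma16[2*r][2*c] + chroma16[2*r][2*c + 1] +
              chroma16[2*r + 1][2*c] + chroma16[2*r + 1][2*c + 1]) / 4
             for c in range(8)] for r in range(8)]

def make_mcu(y16, cb16, cr16):
    """Return the six 8×8 blocks of a 4:2:0 MCU: four Y', one CB, one CR."""
    return split_luma(y16) + [subsample_420(cb16), subsample_420(cr16)]
```

The six returned blocks correspond to the arrangement of Figure 45.1, and each is subsequently compressed independently.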

As explained in Spatial frequency domain on page 238, typical images are dominated by power at low spatial frequencies. In Figure 45.4, on page 496, I present an example 8×8 array of luma samples from an image. In Figure 45.2 at the top of the facing page, I show an 8×8 array of the spatial frequencies computed from this luma array through the DCT. The [0, 0] entry (the DC term), at the upper left-hand corner of that array, represents power at zero frequency. That entry typically contains quite a large value; it is not plotted here. Coefficients near that one tend to have fairly high values; coefficients tend to decrease in value further away from [0, 0]. Depending upon the image data, a few isolated high-frequency coefficients may have high values.

DIGITAL VIDEO AND HD ALGORITHMS AND INTERFACES

Figure 45.2 The DCT concentrates image power at low spatial frequencies. In Figure 45.4, on page 496, I give an example 8×8 array of luma samples from an image. The magnitudes of the spatial frequency coefficients after the DCT transform are shown in this plot. Most of the image power is collected in the [0, 0] (DC) coefficient, whose value is so large that it is omitted from this plot. Only a handful of other (AC) coefficients are much greater than zero. [Plot: coefficient magnitude, 0 to 25, against u (horizontal) and v (vertical), each running 0 to 7; the DC coefficient is not plotted.]

This typical distribution of image power, in the spatial frequency domain, represents the redundancy present in the image. The redundancy is reduced by coding the image in that domain, instead of coding the sample values of the image directly.

In addition to its benefit of removing redundancy from typical image data, representation in spatial frequency has another advantage. The lightness sensitivity of the visual system depends upon spatial frequency: We are more sensitive to low spatial frequencies than high, as can be seen from the graph in Figure 23.5, on page 252. Information at high spatial frequencies can be degraded to a large degree, without having any objectionable (or perhaps even perceptible) effect on image quality. Once image data is transformed by the DCT, high-order coefficients can be approximated – that is, coarsely quantized – to discard data corresponding to spatial frequency components that have little contribution to the perceived quality of the image.
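The coarse quantization just described can be sketched as follows. The quantizer matrix here is hypothetical, with step size growing with spatial frequency; it is not one of JPEG’s example tables (those are covered in the example starting on page 496), and the function names are mine.

```python
# A hypothetical quantizer matrix whose step size grows with u + v:
QUANT = [[1 + 2 * (u + v) for u in range(8)] for v in range(8)]

def quantize(coeffs):
    """Divide each DCT coefficient by its step size and round."""
    return [[round(coeffs[v][u] / QUANT[v][u]) for u in range(8)]
            for v in range(8)]

def dequantize(levels):
    """Reverse the division; the rounding error is irrecoverable."""
    return [[levels[v][u] * QUANT[v][u] for u in range(8)]
            for v in range(8)]
```

Small high-frequency coefficients quantize to zero and are discarded outright, while the large DC term survives with fine precision; this is where most of JPEG’s lossy compression occurs.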

In principle, the DCT algorithm could be applied to any block size, from 2×2 up to the size of the whole image, perhaps 512×512. (The DCT is most efficient when applied to a matrix whose dimensions are powers of two.) The choice of 8×8 blocks of luma for the application of the DCT in video represents a compromise between a block size small enough to minimize storage and processing overheads, and one large enough to exploit image redundancy effectively.

CHAPTER 45   JPEG AND MOTION-JPEG (M-JPEG) COMPRESSION

Figure 45.3 The JPEG block diagram shows the encoder (at the top), which performs the discrete cosine transform (DCT). The DCT is followed by a quantizer (Q), then a variable-length encoder (VLE). The decoder (at the bottom) performs the inverse of each of these operations, in reverse order. [Diagram: encoder chain DCT → Q → VLE; decoder chain VLE⁻¹ → Q⁻¹ → DCT⁻¹.]

Inverse quantization (IQ) has no relation to the historical NTSC IQ colour difference components.

The DCT operation discards picture information to which vision is insensitive. Surprisingly, though, the JPEG standard itself makes no reference to perceptual uniformity. Because JPEG’s goal is to represent visually important information, it is important that so-called RGB values presented to the JPEG algorithm are first subject to a nonlinear transform such as that outlined in Perceptual uniformity, on page 8, that mimics vision.

JPEG block diagram

The JPEG block diagram in Figure 45.3 shows, at the top, the three main blocks of a JPEG encoder: the discrete cosine transform (DCT) computation (sometimes called forward DCT, FDCT), quantization (Q), and variable-length encoding (VLE). The decoder (at the bottom of Figure 45.3) performs the inverse of each of these operations, in reverse order. The inverse DCT is sometimes denoted IDCT; inverse quantization is sometimes called dequantization, and sometimes denoted IQ.
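The stage ordering can be expressed as a trivial composition, with the three stages as pluggable functions. This is a structural sketch only; the helper names are mine.

```python
def jpeg_encode(block, fdct, quantize, vle):
    """Encoder of Figure 45.3: DCT, then quantizer, then VLE."""
    return vle(quantize(fdct(block)))

def jpeg_decode(bits, vld, dequantize, idct):
    """Decoder: the inverse of each stage, applied in reverse order."""
    return idct(dequantize(vld(bits)))
```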

Owing to the eight-line-high vertical transform, eight lines of image memory are required in the DCT subsystem of the encoder, and in the IDCT (DCT⁻¹) subsystem of the decoder. When the DCT is implemented in separable form, as is almost always the case, this is called transpose memory.


Level shifting

The DCT formulation in JPEG is intended for signed sample values. In ordinary hardware or firmware, the DCT is implemented in fixed-point, two’s complement arithmetic. Standard video interfaces use offset binary representation, so each luma or colour difference sample is level shifted prior to the DCT by subtracting 2^(k−1), where k is the number of bits in use.
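The level shift can be sketched in a few lines; the function names are mine. For k-bit samples in offset binary form, subtract 2^(k−1) before the FDCT, and add it back after the IDCT:

```python
def level_shift(samples, k=8):
    """Offset-binary to signed: subtract 2^(k-1) from each sample."""
    return [s - 2 ** (k - 1) for s in samples]

def unshift(samples, k=8):
    """Signed back to offset binary: add 2^(k-1) to each sample."""
    return [s + 2 ** (k - 1) for s in samples]
```

For 8-bit samples this maps the interval 0 to 255 onto −128 to +127, centring the data about zero as the DCT expects.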

Discrete cosine transform (DCT)

The 8×8 forward DCT (FDCT) takes an 8×8 array of 64 sample values (denoted f, whose elements are f_{i,j}), and produces an 8×8 array of 64 transform coefficients (denoted F, whose elements are F_{u,v}). The FDCT is expressed by this equation:

 

Eq 45.1:

$$F_{u,v} = \frac{1}{4}\,C(u)\,C(v)\sum_{i=0}^{7}\sum_{j=0}^{7} f_{i,j}\,\cos\frac{(2i+1)u\pi}{16}\,\cos\frac{(2j+1)v\pi}{16}$$

$$C(w) = \begin{cases}\dfrac{1}{\sqrt{2}}, & w = 0 \\[4pt] 1, & w = 1, 2, \ldots, 7\end{cases}$$

The cosine terms need not be computed on-the-fly; they can be precomputed and stored in tables.
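Eq 45.1 can be implemented directly, with the cosine terms precomputed in a table as just described. This is a sketch (names are mine); a real codec would use a fast factored DCT rather than this direct quadruple-loop form.

```python
import math

COS = [[math.cos((2 * x + 1) * w * math.pi / 16) for x in range(8)]
       for w in range(8)]          # COS[w][x] = cos((2x+1)·w·π/16)

def C(w):
    return 1 / math.sqrt(2) if w == 0 else 1.0

def fdct(f):
    """Forward 8×8 DCT of sample array f, giving F[u][v] per Eq 45.1."""
    return [[0.25 * C(u) * C(v) *
             sum(f[i][j] * COS[u][i] * COS[v][j]
                 for i in range(8) for j in range(8))
             for v in range(8)] for u in range(8)]
```

For a constant block, all of the power lands in the [0, 0] (DC) coefficient and every AC coefficient is zero, consistent with the distribution shown in Figure 45.2.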

The inverse transform – the IDCT, or DCT⁻¹ – is this (Eq 45.2):

$$f_{i,j} = \frac{1}{4}\sum_{u=0}^{7}\sum_{v=0}^{7} C(u)\,C(v)\,F_{u,v}\,\cos\frac{(2i+1)u\pi}{16}\,\cos\frac{(2j+1)v\pi}{16}$$

The forward and inverse transforms involve nearly identical arithmetic: The complexity of encoding and decoding is very similar. The DCT is its own inverse (within a scale factor), so performing the DCT on the transform coefficients would perfectly reconstruct the original samples, subject only to the roundoff error in the DCT and IDCT.
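That round-trip property can be checked numerically. This sketch implements Eq 45.1 and Eq 45.2 directly (variable names are mine) and confirms that the IDCT of the FDCT reconstructs the samples to within roundoff:

```python
import math

COS = [[math.cos((2 * x + 1) * w * math.pi / 16) for x in range(8)]
       for w in range(8)]          # COS[w][x] = cos((2x+1)·w·π/16)

def C(w):
    return 1 / math.sqrt(2) if w == 0 else 1.0

def fdct(f):                       # Eq 45.1
    return [[0.25 * C(u) * C(v) *
             sum(f[i][j] * COS[u][i] * COS[v][j]
                 for i in range(8) for j in range(8))
             for v in range(8)] for u in range(8)]

def idct(F):                       # Eq 45.2
    return [[0.25 * sum(C(u) * C(v) * F[u][v] * COS[u][i] * COS[v][j]
                        for u in range(8) for v in range(8))
             for j in range(8)] for i in range(8)]
```

In a complete codec the reconstruction is not exact, of course: the quantizer between the two transforms discards information deliberately.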

If implemented directly according to these equations, an 8×8 DCT requires 64 multiply operations (and 49 additions) for each of the 64 result coefficients, for a total of 4096 multiplies, an average of 64 multiplication operations per pixel. However, the DCT is separable: an 8×8 DCT can be computed as eight 8×1 horizontal transforms followed by eight 1×8 vertical transforms. This optimization, combined with other
