
Figure 45.1 A JPEG 4:2:0 minimum coded unit (MCU) comprises six 8×8 blocks: four luma blocks, a block of CB, and a block of CR. The six constituent blocks result from nonlinear R’G’B’ data being matrixed to Y’CBCR, then subsampled according to the 4:2:0 scheme; chroma subsampling is effectively the first stage of compression. The blocks are processed independently. [Figure: four 8×8 luma (Y’) blocks, one 8×8 CB block, and one 8×8 CR block.]

In MPEG, a macroblock is the area covered by a 16×16 array of luma samples. In DV, a macroblock comprises the Y’, CB, and CR blocks covered by an 8×8 array (block) of chroma samples. In JPEG, an MCU comprises those blocks covered by the minimum-sized tiling of Y’, CB, and CR blocks. For 4:2:0 subsampling, all of these definitions are equivalent; they differ for 4:1:1 and 4:2:2 (or for JPEG’s other rarely used patterns).
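The block counts implied by these definitions can be sketched as follows. This is an illustrative sketch, not from the book; the table and function names are mine. One 8×8 chroma block covers a luma area of (8·h)×(8·v) samples, so an MCU tiles h·v luma blocks alongside one CB block and one CR block:

```python
SUBSAMPLING = {          # scheme: (horizontal, vertical) chroma subsampling
    "4:2:0": (2, 2),
    "4:2:2": (2, 1),
    "4:1:1": (4, 1),
}

def blocks_per_mcu(scheme):
    """Return (luma, CB, CR) block counts for one MCU."""
    h, v = SUBSAMPLING[scheme]
    return (h * v, 1, 1)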

In desktop graphics, saving JPEG at high quality may cause individual R’G’B’ channels (components) to be compressed without subsampling.

Quantizer matrices and VLE tables will be described in the example starting on page 496.

I use zero-origin array indexing.

JPEG blocks and MCUs

An 8×8 array of sample data is known in JPEG terminology as a block. Prior to JPEG compression of a colour image, normally the nonlinear R’G’B’ data is matrixed to Y’CBCR, then subsampled 4:2:0. According to the JPEG standard (and the JFIF standard, to be described), other colour subsampling schemes are possible; strangely, different subsampling ratios are permitted for CB and CR. However, only 4:2:0 is widely deployed, and the remainder of this discussion assumes 4:2:0. Four 8×8 luma blocks, an 8×8 block of CB, and an 8×8 block of CR are known in JPEG terminology as a minimum coded unit (MCU); this corresponds to a macroblock in DV or MPEG terminology. The 4:2:0 macroblock arrangement is shown in Figure 45.1 above. The luma and colour difference blocks are processed independently by JPEG, using virtually identical algorithms. The only significant difference is that the quantizer matrix and the VLE tables used for chroma blocks are usually different from those used for luma blocks.
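Forming one 4:2:0 MCU from a 16×16 tile of Y’CBCR samples can be sketched as follows. This is a minimal illustration: the function names are mine, and simple 2×2 averaging stands in for whatever subsampling filter a particular encoder actually applies.

```python
def split_luma(y16):
    """Split a 16×16 Y' tile into four 8×8 blocks (raster order)."""
    return [[row[c:c + 8] for row in y16[r:r + 8]]
            for r in (0, 8) for c in (0, 8)]

def subsample_420(chroma16):
    """Average each 2×2 neighbourhood of a 16×16 plane down to 8×8."""
    return [[(chroma16[2*r][2*c] + chroma16[2*r][2*c + 1] +
              chroma16[2*r + 1][2*c] + chroma16[2*r + 1][2*c + 1]) / 4
             for c in range(8)] for r in range(8)]

def make_mcu(y16, cb16, cr16):
    """Return the six 8×8 blocks of a 4:2:0 MCU: four Y', one CB, one CR."""
    return split_luma(y16) + [subsample_420(cb16), subsample_420(cr16)]
```

The six returned blocks correspond to the arrangement of Figure 45.1, and each is subsequently compressed independently.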

As explained in Spatial frequency domain on page 238, typical images are dominated by power at low spatial frequencies. In Figure 45.4, on page 496, I present an example 8×8 array of luma samples from an image. In Figure 45.2 at the top of the facing page, I show an 8×8 array of the spatial frequencies computed from this luma array through the DCT. The [0, 0] entry (the DC term), at the upper left-hand corner of that array, represents power at zero frequency. That entry typically contains quite a large value; it is not plotted here. Coefficients near that one tend to have fairly high values; coefficients tend to decrease in value further away from [0, 0]. Depending upon the image data, a few isolated high-frequency coefficients may have high values.

DIGITAL VIDEO AND HD ALGORITHMS AND INTERFACES

Figure 45.2 The DCT concentrates image power at low spatial frequencies. In Figure 45.4, on page 496, I give an example 8×8 array of luma samples from an image. The magnitudes of the spatial frequency coefficients after the DCT transform are shown in this plot. Most of the image power is collected in the [0, 0] (DC) coefficient, whose value is so large that it is omitted from this plot. Only a handful of other (AC) coefficients are much greater than zero. [Plot: coefficient magnitude, 0 to 25, against u (horizontal) and v (vertical), each running 0 to 7; the DC coefficient is not plotted.]

This typical distribution of image power, in the spatial frequency domain, represents the redundancy present in the image. The redundancy is reduced by coding the image in that domain, instead of coding the sample values of the image directly.

In addition to its benefit of removing redundancy from typical image data, representation in spatial frequency has another advantage. The lightness sensitivity of the visual system depends upon spatial frequency: We are more sensitive to low spatial frequencies than high, as can be seen from the graph in Figure 23.5, on page 252. Information at high spatial frequencies can be degraded to a large degree, without having any objectionable (or perhaps even perceptible) effect on image quality. Once image data is transformed by the DCT, high-order coefficients can be approximated – that is, coarsely quantized – to discard data corresponding to spatial frequency components that have little contribution to the perceived quality of the image.
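The coarse quantization just described can be sketched as follows. The quantizer matrix here is hypothetical, with step size growing with spatial frequency; it is not one of JPEG’s example tables (those are covered in the example starting on page 496), and the function names are mine.

```python
# A hypothetical quantizer matrix whose step size grows with u + v:
QUANT = [[1 + 2 * (u + v) for u in range(8)] for v in range(8)]

def quantize(coeffs):
    """Divide each DCT coefficient by its step size and round."""
    return [[round(coeffs[v][u] / QUANT[v][u]) for u in range(8)]
            for v in range(8)]

def dequantize(levels):
    """Reverse the division; the rounding error is irrecoverable."""
    return [[levels[v][u] * QUANT[v][u] for u in range(8)]
            for v in range(8)]
```

Small high-frequency coefficients quantize to zero and are discarded outright, while the large DC term survives with fine precision; this is where most of JPEG’s lossy compression occurs.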

In principle, the DCT algorithm could be applied to any block size, from 2×2 up to the size of the whole image, perhaps 512×512. (The DCT is most efficient when applied to a matrix whose dimensions are powers of two.) The choice of 8×8 blocks of luma for the application of the DCT in video represents a compromise between a block size small enough to minimize storage and processing overheads, and one large enough to exploit image redundancy effectively.

CHAPTER 45   JPEG AND MOTION-JPEG (M-JPEG) COMPRESSION

Figure 45.3 The JPEG block diagram shows the encoder (at the top), which performs the discrete cosine transform (DCT). The DCT is followed by a quantizer (Q), then a variable-length encoder (VLE). The decoder (at the bottom) performs the inverse of each of these operations, in reverse order. [Diagram: encoder chain DCT → Q → VLE; decoder chain VLE⁻¹ → Q⁻¹ → DCT⁻¹.]

Inverse quantization (IQ) has no relation to the historical NTSC IQ colour difference components.

The DCT operation discards picture information to which vision is insensitive. Surprisingly, though, the JPEG standard itself makes no reference to perceptual uniformity. Because JPEG’s goal is to represent visually important information, it is important that so-called RGB values presented to the JPEG algorithm are first subject to a nonlinear transform such as that outlined in Perceptual uniformity, on page 8, that mimics vision.

JPEG block diagram

The JPEG block diagram in Figure 45.3 shows, at the top, the three main blocks of a JPEG encoder: the discrete cosine transform (DCT) computation (sometimes called forward DCT, FDCT), quantization (Q), and variable-length encoding (VLE). The decoder (at the bottom of Figure 45.3) performs the inverse of each of these operations, in reverse order. The inverse DCT is sometimes denoted IDCT; inverse quantization is sometimes called dequantization, and sometimes denoted IQ.
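The stage ordering can be expressed as a trivial composition, with the three stages as pluggable functions. This is a structural sketch only; the helper names are mine.

```python
def jpeg_encode(block, fdct, quantize, vle):
    """Encoder of Figure 45.3: DCT, then quantizer, then VLE."""
    return vle(quantize(fdct(block)))

def jpeg_decode(bits, vld, dequantize, idct):
    """Decoder: the inverse of each stage, applied in reverse order."""
    return idct(dequantize(vld(bits)))
```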

Owing to the eight-line-high vertical transform, eight lines of image memory are required in the DCT subsystem of the encoder, and in the IDCT (DCT⁻¹) subsystem of the decoder. When the DCT is implemented in separable form, as is almost always the case, this is called transpose memory.


Level shifting

The DCT formulation in JPEG is intended for signed sample values. In ordinary hardware or firmware, the DCT is implemented in fixed-point, two’s complement arithmetic. Standard video interfaces use offset binary representation, so each luma or colour difference sample is level shifted prior to the DCT by subtracting 2^(k−1), where k is the number of bits in use.
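The level shift can be sketched in a few lines; the function names are mine. For k-bit samples in offset binary form, subtract 2^(k−1) before the FDCT, and add it back after the IDCT:

```python
def level_shift(samples, k=8):
    """Offset-binary to signed: subtract 2^(k-1) from each sample."""
    return [s - 2 ** (k - 1) for s in samples]

def unshift(samples, k=8):
    """Signed back to offset binary: add 2^(k-1) to each sample."""
    return [s + 2 ** (k - 1) for s in samples]
```

For 8-bit samples this maps the interval 0 to 255 onto −128 to +127, centring the data about zero as the DCT expects.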

Discrete cosine transform (DCT)

The 8×8 forward DCT (FDCT) takes an 8×8 array of 64 sample values (denoted f, whose elements are f_{i,j}), and produces an 8×8 array of 64 transform coefficients (denoted F, whose elements are F_{u,v}). The FDCT is expressed by this equation:

 

Eq 45.1:

$$F_{u,v} = \frac{1}{4}\,C(u)\,C(v)\sum_{i=0}^{7}\sum_{j=0}^{7} f_{i,j}\,\cos\frac{(2i+1)u\pi}{16}\,\cos\frac{(2j+1)v\pi}{16}$$

$$C(w) = \begin{cases}\dfrac{1}{\sqrt{2}}, & w = 0 \\[4pt] 1, & w = 1, 2, \ldots, 7\end{cases}$$

The cosine terms need not be computed on-the-fly; they can be precomputed and stored in tables.
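Eq 45.1 can be implemented directly, with the cosine terms precomputed in a table as just described. This is a sketch (names are mine); a real codec would use a fast factored DCT rather than this direct quadruple-loop form.

```python
import math

COS = [[math.cos((2 * x + 1) * w * math.pi / 16) for x in range(8)]
       for w in range(8)]          # COS[w][x] = cos((2x+1)·w·π/16)

def C(w):
    return 1 / math.sqrt(2) if w == 0 else 1.0

def fdct(f):
    """Forward 8×8 DCT of sample array f, giving F[u][v] per Eq 45.1."""
    return [[0.25 * C(u) * C(v) *
             sum(f[i][j] * COS[u][i] * COS[v][j]
                 for i in range(8) for j in range(8))
             for v in range(8)] for u in range(8)]
```

For a constant block, all of the power lands in the [0, 0] (DC) coefficient and every AC coefficient is zero, consistent with the distribution shown in Figure 45.2.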

The inverse transform – the IDCT, or DCT⁻¹ – is this (Eq 45.2):

$$f_{i,j} = \frac{1}{4}\sum_{u=0}^{7}\sum_{v=0}^{7} C(u)\,C(v)\,F_{u,v}\,\cos\frac{(2i+1)u\pi}{16}\,\cos\frac{(2j+1)v\pi}{16}$$

The forward and inverse transforms involve nearly identical arithmetic: The complexity of encoding and decoding is very similar. The DCT is its own inverse (within a scale factor), so performing the DCT on the transform coefficients would perfectly reconstruct the original samples, subject only to the roundoff error in the DCT and IDCT.
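That round-trip property can be checked numerically. This sketch implements Eq 45.1 and Eq 45.2 directly (variable names are mine) and confirms that the IDCT of the FDCT reconstructs the samples to within roundoff:

```python
import math

COS = [[math.cos((2 * x + 1) * w * math.pi / 16) for x in range(8)]
       for w in range(8)]          # COS[w][x] = cos((2x+1)·w·π/16)

def C(w):
    return 1 / math.sqrt(2) if w == 0 else 1.0

def fdct(f):                       # Eq 45.1
    return [[0.25 * C(u) * C(v) *
             sum(f[i][j] * COS[u][i] * COS[v][j]
                 for i in range(8) for j in range(8))
             for v in range(8)] for u in range(8)]

def idct(F):                       # Eq 45.2
    return [[0.25 * sum(C(u) * C(v) * F[u][v] * COS[u][i] * COS[v][j]
                        for u in range(8) for v in range(8))
             for j in range(8)] for i in range(8)]
```

In a complete codec the reconstruction is not exact, of course: the quantizer between the two transforms discards information deliberately.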

If implemented directly according to these equations, an 8×8 DCT requires 64 multiply operations (and 49 additions) for each of the 64 result coefficients, for a total of 4096 multiplies, an average of 64 multiplication operations per pixel. However, the DCT is separable: an 8×8 DCT can be computed as eight 8×1 horizontal transforms followed by eight 1×8 vertical transforms. This optimization, combined with other
