
Inverse quantization is sometimes denoted IQ, not to be confused with IQ colour difference components.

When a decoder reconstructs a P-picture, it is displayed; additionally, the picture is written into a reference frame so as to be available for subsequent predictions.

Each B-picture contains elements that are bipredicted from one or both reference frames. The encoder computes, compresses, and transmits residuals. The decoder reconstructs a B-picture, displays it, then discards it: No B-picture is used for prediction.
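The bipredictive average can be sketched in a few lines of Python. This is an illustrative sketch, not the normative arithmetic of the standard; the function and array names are mine, and the rounding convention shown (round half up) is an assumption.

```python
# Sketch of bipredictive macroblock formation: the decoder averages
# the forward (past) and backward (future) motion-compensated
# predictions sample by sample, with upward rounding.
# Illustrative only; names and rounding convention are assumptions.

def bipredict(forward_pred, backward_pred):
    """Average two same-sized blocks of 8-bit samples with rounding."""
    assert len(forward_pred) == len(backward_pred)
    return [(f + b + 1) >> 1 for f, b in zip(forward_pred, backward_pred)]

fwd = [100, 101, 102, 103]
bwd = [104, 105, 106, 108]
print(bipredict(fwd, bwd))  # [102, 103, 104, 106]
```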

Each reference picture is associated with a full frame of storage. When a decoder reconstructs a reference field (an I-field or a P-field), half the lines of the reference framestore are written; the other half retains the contents of the previous reference field. After the first field of a field pair has been reconstructed, it is available as a predictor for the second field. (The first field of the previous reference frame is no longer available.)
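The half-frame update of the reference framestore can be modelled as interleaved line writes. A minimal sketch, with a hypothetical data model (one string per picture line):

```python
# Sketch of how a reconstructed reference field overwrites half the
# lines of the reference framestore, leaving the other field's lines
# (from the previous reference field) untouched. Illustrative model.

def write_field(framestore, field_lines, top):
    """Write a decoded field into the framestore in place.
    top=True writes lines 0, 2, 4, ...; top=False writes 1, 3, 5, ..."""
    start = 0 if top else 1
    for i, line in enumerate(field_lines):
        framestore[start + 2 * i] = line

framestore = ["old"] * 6                       # 6-line frame, stale contents
write_field(framestore, ["t0", "t1", "t2"], top=True)
print(framestore)  # ['t0', 'old', 't1', 'old', 't2', 'old']
write_field(framestore, ["b0", "b1", "b2"], top=False)
print(framestore)  # ['t0', 'b0', 't1', 'b1', 't2', 'b2']
```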

Prediction

In Figure 16.1, on page 152, I sketched a naïve interpicture coding scheme. For any scene element that moves more than a few pixels from one video frame to the next, the naïve scheme is liable to produce large interpicture difference values. Motion can be more effectively coded by having the encoder form motion-compensated predictions. The encoder also produces motion vectors; these are used to displace a region of a reference picture to improve the prediction of the current picture relative to an undisplaced prediction. The residuals are then compressed using the DCT, quantized, and variable-length encoded.

At a decoder, predictions are formed from the reference picture(s), based upon the transmitted motion vectors and prediction modes. Residuals are recovered from the bitstream by VLE decoding, inverse quantization, and inverse DCT. Finally, the decoded residual is added to the prediction to form the reconstructed picture. If the decoder is reconstructing an I-picture or a P-picture, the reconstructed picture is written to the appropriate portion (or the entirety) of a reference frame.
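The final step of that decoding path can be sketched as follows; this is an illustrative simplification (one flat list of samples rather than an 8×8 block), and the function names are mine:

```python
# Sketch of the last step of decoding a nonintra macroblock: the
# residual recovered via VLE decoding, inverse quantization, and
# inverse DCT is added to the motion-compensated prediction, then
# clipped to the 8-bit sample range. Names are illustrative.

def reconstruct(prediction, residual):
    """Add a decoded residual to a prediction, clipping to 0..255."""
    return [min(255, max(0, p + r)) for p, r in zip(prediction, residual)]

pred = [120, 130, 250, 4]
resid = [5, -12, 20, -9]
print(reconstruct(pred, resid))  # [125, 118, 255, 0]
```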

CHAPTER 47   MPEG-2 VIDEO COMPRESSION

A prediction region in a reference frame is rarely aligned to a 16-luma-sample macroblock grid; it is not properly referred to as a macroblock. Some authors fail to make the distinction between macroblocks and prediction regions; other authors use the term prediction macroblocks for prediction regions.

In a closed GoP, no B-picture is permitted to use forward prediction to the I-picture that starts the next GoP. See the caption to Figure 16.5, on page 155.

The obvious way for an encoder to form forward interpicture differences is to subtract the current source picture from the reference picture. (The reference picture would have been subject to motion-compensated interpolation, according to the encoder’s motion estimate.) Starting from an intracoded picture, the decoder would then accumulate interpicture differences. However, MPEG involves lossy compression: Both the I-picture starting point of a GoP and each set of decoded interpicture differences are subject to reconstruction errors. With the naïve scheme of computing interpicture differences, reconstruction errors would accumulate at the decoder. To alleviate this potential source of decoder error, the encoder incorporates a decoder. The interpicture difference is formed by subtracting the current source picture from the previous reference picture as a decoder will reconstruct it. Reconstruction errors are thereby brought “inside the loop,” and are prevented from accumulating.
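The closed-loop principle can be demonstrated with a toy one-sample-per-frame model. This is a sketch under stated assumptions: `quantize()` stands in for the entire lossy transform/quantize/reconstruct path, and the names are mine.

```python
# Sketch of why the encoder incorporates a decoder: each difference is
# formed against the *reconstructed* reference (which carries
# quantization error), so the error stays "inside the loop" and is
# bounded by one quantizer step instead of accumulating over the GoP.

def quantize(x, step=4):
    """Crude stand-in for DCT + quantization + reconstruction loss."""
    return (x // step) * step

def encode_sequence(frames):
    """Closed-loop differential coding of single-sample 'frames'."""
    recon_ref = quantize(frames[0])            # intra-coded starting point
    recons = [recon_ref]
    for src in frames[1:]:
        residual = quantize(src - recon_ref)   # the coded difference
        recon_ref = recon_ref + residual       # what a decoder reconstructs
        recons.append(recon_ref)
    return recons

print(encode_sequence([100, 107, 113, 118]))  # [100, 104, 112, 116]
```

Note that every reconstructed value stays within one quantizer step of its source; with open-loop differencing the errors would instead sum over the GoP.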

The prediction model used by MPEG-2 is blockwise translation of 16×16 blocks of luma samples (along with the associated chroma samples): A macroblock of the current picture is predicted from a like-sized region of a reconstructed reference picture. The choice of 16×16 region size was a compromise between the desire for a large region (to effectively exploit spatial coherence, and to amortize motion vector overhead across a fairly large number of samples), and a small region (to efficiently code small scene elements in motion).
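The amortization argument is easy to make numerically. The 12-bit vector cost below is an assumed figure chosen purely for illustration, not a number from the standard:

```python
# Back-of-envelope for the 16x16 compromise: one motion vector's bit
# cost, amortized over the luma samples of blocks of various sizes.
# MV_BITS is a hypothetical cost, for illustration only.

MV_BITS = 12

for size in (8, 16, 32):
    samples = size * size
    print(f"{size}x{size}: {samples} luma samples, "
          f"{MV_BITS / samples:.3f} MV bits per sample")
```

Halving the block dimension quadruples the per-sample vector overhead, while enlarging the block sacrifices the ability to track small moving elements.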

Macroblocks in a P-picture are typically forward-predicted. However, an encoder can decide that a particular macroblock is best intracoded (that is, not predicted at all). Macroblocks in a B-picture are typically predicted as averages of motion-compensated past and future reference pictures – that is, they are ordinarily bidirectionally predicted. However, an encoder can decide that a particular macroblock in a B-picture is best intracoded, or unidirectionally predicted using either forward or backward prediction. Table 47.7 indicates the four macroblock types. The macroblock types allowed in any picture are restricted by the declared picture type, as indicated in Table 47.8.
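One plausible form of that per-macroblock decision can be sketched as a cost comparison. Real encoders use rate-distortion criteria; the sum-of-absolute-differences (SAD) metric and all names below are illustrative simplifications, not the standard's (the standard does not prescribe the encoder's decision method at all).

```python
# Sketch of an encoder's per-macroblock mode decision: compare the
# cost of intra coding against forward, backward, and bipredictive
# prediction, and pick the cheapest. SAD is a stand-in for a real
# rate-distortion cost; intra_cost is supplied by the caller.

def sad(a, b):
    """Sum of absolute differences between two sample lists."""
    return sum(abs(x - y) for x, y in zip(a, b))

def choose_mode(src, fwd_pred, bwd_pred, intra_cost):
    bi_pred = [(f + b + 1) >> 1 for f, b in zip(fwd_pred, bwd_pred)]
    costs = {
        "intra": intra_cost,
        "forward": sad(src, fwd_pred),
        "backward": sad(src, bwd_pred),
        "bipredictive": sad(src, bi_pred),
    }
    return min(costs, key=costs.get)

src = [100, 102, 104, 106]
print(choose_mode(src, [99, 101, 103, 105], [90, 90, 90, 90], 500))
# "forward": the past-reference prediction matches the source best
```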

Each nonintra macroblock in an interlaced sequence can be predicted either by frame prediction (typically chosen by the encoder when there is little motion between the fields), or by field prediction (typically chosen by the encoder when there is significant interfield motion). This is comparable to field/frame coding in DV, which I described on page 507. Predictors for a field picture must be field predictors. However, predictors for a frame picture may be chosen on a macroblock-by-macroblock basis to be either field predictors or frame predictors. MPEG-2 defines several additional prediction modes, which can be selected on a macroblock-by-macroblock basis. MPEG-2’s prediction modes are summarized in Table 47.9.

DIGITAL VIDEO AND HD   ALGORITHMS AND INTERFACES

Table 47.7  MPEG macroblock types

  Type                          Prediction                                   Typ. quantizer matrix
  Intra                         None – the macroblock is self-contained      Perceptual
  Inter (Nonintra):
    Forward predictive-coded    Predicts from the past reference picture     Flat
    Backward predictive-coded   Predicts from the future reference picture   Flat
    Bipredictive-coded          Averages predictions from past and future    Flat
                                reference pictures

Table 47.8  MPEG picture coding types

  Picture type   Binary code   Reference picture?   Permitted macroblock types
  I-picture      001           Yes                  Intra
  P-picture      010           Yes                  Intra;
                                                    Forward predictive-coded
  B-picture      011           No                   Intra;
                                                    Forward predictive-coded;
                                                    Backward predictive-coded;
                                                    Bipredictive-coded

Table 47.9  MPEG-2 prediction modes

  Mode               For                 Description                                     Max. MVs (fwd., back.)
  Frame prediction   (P, B)-pictures     Predictions are made for the frame, using       1, 1
                                         data from one or two previously
                                         reconstructed frames.
  Field prediction   (P, B)-pictures,    Predictions are made independently for          1, 1
                     (P, B)-fields       each field, using data from one or two
                                         previously reconstructed fields.
  16×8 motion        (P, B)-fields       The upper 16×8 and lower 16×8 regions           2, 2
  compensation                           of the macroblock are predicted separately.
  (16×8 MC)                              (This is completely unrelated to top and
                                         bottom fields.)
  Dual prime         P-fields with       Two motion vectors are derived from the         1, 1
                     no intervening      transmitted vector and a small differential
                     B-pictures          motion vector (DMV, -1, 0, or +1); these
                                         are used to form predictions from two
                                         reference fields (one top, one bottom),
                                         which are averaged to form the predictor.
  Dual prime         P-pictures with     As in dual prime for P-fields (above), but      1, 1
                     no intervening      repeated for 2 fields; 4 predictions are
                     B-pictures          made and averaged.

Motion vectors (MVs)

A motion vector identifies a region of 16×16 luma samples in a reference picture that is to be used for prediction. A motion vector refers to a prediction region that is potentially quite distant (spatially) from the region being coded – that is, the motion vector range can be quite large. Even in field pictures, motion vectors are specified in units of frame luma samples. A motion vector can specify integer pixel coordinates, in which case forming the 16×16 prediction is accomplished by merely copying pixels. However, in MPEG, a motion vector can be specified to half-sample precision: If the fractional bit of a motion vector is set, then the prediction is formed by averaging sample values at the neighboring integer coordinates – that is, by linear interpolation. Transmitted motion vector values are halved for use with subsampled chroma. All defined profiles require that no motion vector refer to any sample outside the bounds of the reference frame.
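Half-sample prediction can be sketched in one dimension as follows. MPEG-2 applies this separably in both dimensions over the whole 16×16 region; the 1-D, single-sample form and the exact rounding convention here are illustrative assumptions.

```python
# Sketch of half-sample motion compensation in one dimension: if the
# motion vector's fractional bit is set, the prediction is the rounded
# average of the two neighbouring integer-position samples; otherwise
# it is a plain copy. mv is in half-sample units, as transmitted.

def predict_sample(line, mv):
    """Fetch one predicted sample from a reference scan line."""
    i, frac = mv >> 1, mv & 1
    if frac:
        return (line[i] + line[i + 1] + 1) >> 1  # linear interpolation
    return line[i]                               # integer-pel: copy

ref = [10, 20, 40, 80]
print(predict_sample(ref, 2))  # mv = +1 (integer)      -> 20
print(predict_sample(ref, 3))  # mv = +1.5 (half-pel)   -> 30
```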

Each macroblock’s header contains a count of motion vectors. Motion vectors are themselves predicted! An initial MV is established at the start of a slice (see page 534); the motion vector for each successive nonintra macroblock is differentially coded with respect to the previous macroblock in raster-scan order.
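The differential scheme is ordinary predictive coding applied to the vectors themselves. A minimal sketch (one vector component per macroblock; resetting the slice-start predictor to zero is an illustrative simplification):

```python
# Sketch of differential motion vector coding along a slice: each
# nonintra macroblock transmits only the difference from the previous
# macroblock's vector, which is small wherever motion is smooth.

def encode_mvs(mvs):
    pred, diffs = 0, []
    for mv in mvs:
        diffs.append(mv - pred)   # transmit the difference only
        pred = mv
    return diffs

def decode_mvs(diffs):
    pred, mvs = 0, []
    for d in diffs:
        pred += d                 # accumulate to recover the vector
        mvs.append(pred)
    return mvs

mvs = [5, 6, 6, 7, 7, 7]
diffs = encode_mvs(mvs)
print(diffs)                      # [5, 1, 0, 1, 0, 0] - mostly small
assert decode_mvs(diffs) == mvs
```

The mostly-zero differences are exactly what the variable-length codes described next exploit.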

Motion vectors are variable-length encoded, so that short vectors – the most likely ones in large areas of translational motion or no motion – are coded compactly. Zero-valued motion vectors are quite likely, so provision is made for compact coding of them.

Intra macroblocks are not predicted, so motion vectors are not necessary for them. However, in certain
