Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
литература / Digital_Video_and_HD_Second_Edition_Algorithms_and_Interfaces.pdf
Скачиваний:
0
Добавлен:
13.05.2026
Размер:
38.02 Mб
Скачать

Concerning the distinction between luminance and luma, see Appendix A, on page 567.

Code

Aspectratio

0000

Forbidden

0001

Square

 

sampling

0010

4:3

0011

16:9

0100

2.21:1

0101 …

Reserved

1111

 

 

 

Table 47.4 MPEG-2 aspect ratio information. The 2.21:1 aspect ratio is not permitted in any defined profile.

GoP structures permitted at high data rates, as shown in Table 47.3:

Bit rate

 

Interlace?

GoP structure allowed

0 to 175

 

any

any

175 to 230

 

any

I-only, IP, or IB

230 to 300

{

interlaced

I-only

progressive

I-only, IP, or IB

Table 47.3 GoP restrictions in SMPTE 308M

I have presented the profile and level constraints of MPEG-2 itself. Certain applications of MPEG-2 – such as ATSC broadcasting, and DVD – impose restrictions beyond those of MPEG’s profiles and levels. For example, MPEG-2 allows a frame rate of 24 Hz, but that rate is disallowed for DVD. (Movies originating at 24 frames per second are ordinarily coded onto 480i DVD at 29.97 Hz, but signalling 2-3 pulldown.)

Picture structure

Each frame in MPEG-2 is coded with a fixed number of image columns (SAL, called horizontal size in MPEG) and image rows (LA, called vertical size) of luma samples.

I use the term luma; MPEG documents use luminance, but that term is technically incorrect in MPEG’s context.

A frame in MPEG-2 has either square sampling, or 4:3, 16:9, or 2.21:1 picture aspect ratio – that is, nonsquare sampling is permitted only at 4:3, 16:9, or 2.21:1. (MPEG writes aspect ratio unconventionally, as as height:width.) Table 47.4 presents MPEG-2’s aspect ratio information field. The 2.21:1 value is not permitted in any defined profile.

MPEG-2 accommodates both progressive and interlaced material. An image having NC columns and NR rows of luma samples can be coded directly as a framestructured picture, as depicted at the left of Figure 47.1 overleaf. In a frame-structured picture, all of the luma and chroma samples of a frame are taken to originate at the same time, and are intended for display at the same time. A flag progressive sequence asserts that

a sequence contains only frame-structured pictures. Alternatively, a video frame (typically originated from

an interlaced source) may be coded as a pair of fieldstructured pictures – a top-field picture and a bottom-

CHAPTER 47

MPEG-2 VIDEO COMPRESSION

517

Figure 47.1 An MPEG-2 frame picture contains an array of luma samples NC columns by NR rows. It is implicit that an MPEG-2 frame picture occupies the entire frame time.

Figure 47.2 An MPEG-2 field picture pair contains a top-field picture and a bottom-field picture, each NC columns by NR/2 rows. The samples are vertically offset. Here I show a pair ordered {top, bottom}; alternatively, it could be {bottom, top}. Concerning the relation between top and bottom fields and video standards, see Interlacing in MPEG-2, on page 132.

I, P, and B picture coding types were introduced on page 153.

Code

Framerate

 

 

0000 Forbidden

0001 241.001

0010 24

0011 25

0100 301.001

0101 30

0110 50

0111 601.001

1000 60

1001 … Reserved

1111

Table 47.5 MPEG-2 frame rate code

A frame-coded picture can code a top/bottom pair or a bottom/top pair – that is, a frame picture may correspond to a video frame, or may straddle two video frames. The latter case accommodates M-frames in 2-3 pulldown.

field picture – each having NC columns and NR/2 rows as depicted by Figure 47.2. The two fields are timeoffset by half the frame time, and are intended for interlaced display. Field pictures always come in pairs having opposite parity (top/bottom). Both pictures in a field pair must have the same picture coding type (I, P, or B), except that an I-field may be followed by

a P-field (in which case the pair functions as an I-frame, and may be termed an IP-frame).

Frame rate and 2-3 pulldown in MPEG

The defined profiles of MPEG-2 provide for the display frame rates shown in Table 47.5 in the margin. Frame rate is constant within a video sequence (to be defined on page 533). Unfortunately, it is unspecified how long a decoder may take to adapt to a change in frame rate.

In a sequence of frame-structured pictures, provisions are made to include, in the MPEG-2 bitstream, information to enable the display process to impose 2-3 pulldown upon display. Frames in such a sequence are coded as frame-structured pictures; in each frame, both fields are associated with the same instant in time. The flag repeat first field may accompany a picture; if that flag is set, then an interlaced display is expected to display the first field, the second field, and then the first field again – that is, to impose 2-3 pulldown. The frame rate in the bitstream specifies the display rate after 2-3 processing. I sketched a 2-3 sequence of four film frames in Figure 34.1, on page 405; on DVD, that sequence would be coded as the set of four progressive MPEG-2 frames flagged as indicated in Table 47.6 at the top of the facing page.

518

DIGITAL VIDEO AND HD ALGORITHMS AND INTERFACES

4:2:0 4:2:0

(top field) (bottom field)

Y’0 Y’1

Y’4 Y’5

Y’2 Y’3

Y’6 Y’7

CB0–3 CB4–7

CR0–3 CR4–7

Figure 47.3 Chroma subsampling in fieldstructured pictures

Film frame

Top first field

Repeat first field

(TFF)

(RFF)

A

0

0

B

0

1

C

1

0

D

1

1

 

 

 

Table 47.6 2-3 pulldown sequence in MPEG-2. For the definitions of film frames A through D, see 2-3 pulldown, on page 405.

Luma and chroma sampling structures

MPEG-2 accommodates 4:2:2 chroma subsampling, suitable for studio applications, and 4:2:0 chroma subsampling, suitable for video distribution. Unlike DV, CB and CR sample pairs are spatially coincident. The MPEG-2 standard includes 4:4:4 chroma format, but it isn’t permitted in any defined profile, so is highly unlikely to be commercialized.

There is no vertical subsampling in 4:2:2 – in this case, subsampling and interlace do not interact. 4:2:2 chroma for both frame-structured and field-structured pictures is depicted in the third (BT.601) column of Figure 12.1, on page 124.

4:2:0 chroma subsampling in a frame-structured picture is depicted in the rightmost column of Figure 12.1; CB and CR samples are centered vertically midway between luma samples in the frame, and are cosited horizontally.

4:2:0 chroma subsampling in a field-structured picture is depicted in Figure 47.3 in the margin. In the top field, chroma samples are centered 14 of the way vertically between a luma sample in the field and the luma sample immediately below in the same field. (In this example, CB0-3 is centered 14 of the way down from Y’0 to Y’2.) In the bottom field, chroma samples are centered 34 of the way vertically between a luma sample in the field and the luma sample immediately below in the same field. (In this example, CB4-7 is centered 34 of the way down from Y’4 to Y’6.) This scheme centers chroma samples at the same locations that they would take in a frame-structured picture; however, alternate rows of chroma samples in the frame are time-offset by half the frame time.

CHAPTER 47

MPEG-2 VIDEO COMPRESSION

519

If horizontal size or vertical size is not divisible by 16, then the encoder pads the image with a suitable number of black “overhang” samples at the right edge or bottom edge. For example, when coding HD at 1920× 1080, an encoder appends 8 rows of black pixels to the image array, to make the row count 1088. Upon decoding, these samples are cropped prior to display. The overhang regions are retained in the reference framestores.

MPEG-2 and H.264 use the term reference picture. Some people say anchor picture.

Macroblocks

At the core of MPEG compression is the DCT coding of 8× 8 blocks of sample values (in I-pictures, as in JPEG), or 8× 8 blocks of prediction errors – residuals, or residues (in P- and B-pictures). To simplify the implementation of subsampled chroma, the same DCT and block coding scheme is used for both luma and chroma. When combined with 4:2:0 chroma subsampling, two 8× 8 block of chroma samples are associated with

a 16× 16 block of luma. This leads to the tiling of a field or frame into units of 16× 16 luma samples. Each such unit is a macroblock (MB). Macroblocks lie a on 16× 16 grid aligned with the upper-left luma sample of the image.

Each macroblock comprises four 8× 8 luma blocks, accompanied by the requisite number and arrangement of 8× 8 CB blocks and 8× 8 CR blocks, depending upon chroma format. In the usual 4:2:0 chroma format,

a macroblock comprises six blocks: four luma blocks,

a CB block, and a CR block. In the 4:2:2 chroma format, a macroblock comprises eight blocks: four luma blocks, two blocks of CB, and two blocks of CR.

Picture coding types – I, P, B

I-pictures, P-pictures, and B-pictures were introduced on page 152. Coded I-picture and P-picture data are used to reconstruct reference pictures – fields or frames that are available for constructing predictions. An MPEG decoder maintains two reference framestores, one past and one future. An encoder also maintains two reference framestores, reconstructed as if by

a decoder; these track the contents of the decoder’s reference framestores. The simple profile has no B-pictures; a single reference framestore suffices.

Each I-picture is coded independently of any other picture. When an I-picture is reconstructed by

a decoder, it is displayed. Additionally, it is stored as a reference frame so as to be available as a predictor. I-pictures are compressed using a JPEG-like algorithm, using perceptually based quantization matrices.

Each P-picture is coded using the past reference picture as a predictor. Residuals are compressed using the same JPEG-like algorithm that is used for I-pictures, but typically with quite different quantization matrices.

520

DIGITAL VIDEO AND HD ALGORITHMS AND INTERFACES

Соседние файлы в папке литература