This method is sometimes called “logarithmic,” which I consider to be a very poor term in this context.

The fractions 0.6, 0.3, and 0.1 are comparable to the SD luma coefficients of green, red, and blue; so, I use green, red, and blue to designate I-picture, P-picture, and B-picture data respectively in Figures 16.2, 16.3, and 16.4 (page 153).

the reference frame. For the large ranges of MPEG-2, full block matching is impractical.

Pixel-recursive (or pel-recursive) methods start with a small number of initial guesses at motion, based upon motion estimates from previous frames. The corresponding coordinates in the reference frame are searched, and each guess is refined. The best guess is taken as the final motion vector.

Pyramidal methods form spatial lowpass-filtered versions of the target macroblock, and of the reference frames; block matches are performed at low resolution. Surrounding the coordinates of the most promising candidates at one resolution level, less severely filtered versions of the reference picture regions are formed, and block matches are performed on those. Successive refinement produces the final motion vector. This technique tends to produce smooth motion-vector fields.
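The pyramidal idea can be sketched in a few lines. This is a minimal two-level illustration, not a production motion estimator: the 2×2 averaging filter, the sum-of-absolute-differences cost, the function names, and the synthetic test frames are all assumptions made for the example. For brevity it refines only the single best coarse candidate rather than several promising ones.

```python
import random

def downsample(frame):
    """Crude lowpass filter plus 2:1 decimation: average each 2x2 neighbourhood."""
    return [[(frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1]) / 4
             for x in range(0, len(frame[0]) - 1, 2)]
            for y in range(0, len(frame) - 1, 2)]

def sad(target, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a target block and a displaced reference block."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(target[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total

def search(target, ref, bx, by, size, centre, radius):
    """Exhaustive block match within +/-radius of a candidate vector; returns the best (dx, dy)."""
    height, width = len(ref), len(ref[0])
    best, best_cost = centre, None
    for dy in range(centre[1] - radius, centre[1] + radius + 1):
        for dx in range(centre[0] - radius, centre[0] + radius + 1):
            inside = (0 <= bx + dx and bx + dx + size <= width and
                      0 <= by + dy and by + dy + size <= height)
            if not inside:
                continue  # skip candidates that fall outside the reference frame
            cost = sad(target, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best

def pyramid_search(target, ref, bx, by, size=16, radius=8):
    """Coarse search at half resolution, then a +/-1 refinement at full resolution."""
    coarse = search(downsample(target), downsample(ref),
                    bx // 2, by // 2, size // 2, (0, 0), radius // 2)
    return search(target, ref, bx, by, size, (2 * coarse[0], 2 * coarse[1]), 1)

# Synthetic frames (an assumption for the demo): random texture, with the target
# built so that the true displacement of every block is (4, -2).
rng = random.Random(1)
ref = [[rng.randrange(256) for _ in range(64)] for _ in range(64)]
target = [[ref[(y - 2) % 64][(x + 4) % 64] for x in range(64)] for y in range(64)]
vector = pyramid_search(target, ref, bx=24, by=24)
print(vector)
```

The coarse level examines 81 candidates on an 8×8 block and the refinement examines 9 on the 16×16 block, far less arithmetic than an exhaustive full-resolution search over the same range.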

Rate control and buffer management

A typical video sequence, encoded by a typical MPEG-2 encoder, produces I-, P-, and B-pictures that consume bits roughly in the ratio 60:30:10. An I-picture requires perhaps six times as many bits as a B-picture.
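As a rough illustration of what that ratio implies, the sketch below divides the bits available to one GoP among its pictures, reading 60:30:10 as per-picture weights of 6:3:1. The GoP pattern, bit rate, and picture rate are hypothetical values chosen for the example, and real encoders adapt the allocation to picture content.

```python
def allocate_gop_bits(gop, gop_bits, weights=None):
    """Split gop_bits among the pictures of a GoP in proportion to per-type weights."""
    weights = weights or {"I": 6.0, "P": 3.0, "B": 1.0}  # assumed reading of 60:30:10
    total_weight = sum(weights[picture_type] for picture_type in gop)
    return [gop_bits * weights[picture_type] / total_weight for picture_type in gop]

gop = "IBBPBBPBBPBB"        # hypothetical 12-picture GoP
bit_rate = 6_000_000        # bits per second (assumed)
picture_rate = 30           # pictures per second (assumed)

gop_bits = bit_rate * len(gop) / picture_rate   # bits delivered during one GoP
budget = allocate_gop_bits(gop, gop_bits)

for picture_type, bits in zip(gop, budget):
    print(picture_type, round(bits))
```

With these numbers the single I-picture receives 6/23 of the GoP's bits, even though it is only one picture in twelve.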

Many applications of MPEG-2, such as DTV, involve a transmission channel with a fixed data rate. This calls for constant bit rate (CBR) operation. Other applications of MPEG-2, such as DVD, involve a channel having variable (but limited) data rate. Such applications call for variable bit rate (VBR) operation, where the instantaneous bit rate is varied to achieve the desired picture quality for each frame, maximizing storage utilization.

The larger the decoder’s buffer size, the more flexibility is available to the encoder to allocate bits among pictures. However, a large buffer is expensive. Each profile/level combination dictates the minimum buffer size that a decoder must implement.

CHAPTER 47

MPEG-2 VIDEO COMPRESSION

Figure 47.9 Buffer occupancy in MPEG-2 is managed through an idealized video buffering verifier (VBV) that analyzes the output bitstream produced by any encoder. This graph shows buffer occupancy for a typical GoP.

[Graph: buffer occupancy [percent], 0 to 100, plotted against time [picture durations], −1 to 9; segments of the curve are labeled by picture type I, P, or B.]

An encoder effects rate control by altering the quantization matrices – the perceptually weighted matrix used for intra macroblocks, and the flat matrix used for nonintra macroblocks. MPEG-2 allows quantizer matrices to be included in the bitstream. Additionally, and more importantly, a quantizer scale code is transmitted at the slice level, and may be updated at the macroblock level. This code determines an overall scale factor that is applied to the quantizer matrices. The encoder quantizes more or less severely to achieve the required bit rate; the quantizer scale code is conveyed to the decoder so that it dequantizes accordingly.
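The effect of the quantizer scale can be sketched as follows. This is a simplified illustration, not the normative MPEG-2 arithmetic: it assumes a step size of weight × scale / 16, ignores special handling of the DC term and mismatch control, and uses the count of nonzero levels as a crude stand-in for bit cost. The function names and coefficient values are hypothetical.

```python
def quantize(coeffs, matrix, scale):
    """Quantize coefficients with step = weight * scale / 16, rounding half away from zero."""
    levels = []
    for coeff, weight in zip(coeffs, matrix):
        step = weight * scale / 16
        magnitude = int(abs(coeff) / step + 0.5)
        levels.append(magnitude if coeff >= 0 else -magnitude)
    return levels

def dequantize(levels, matrix, scale):
    """Decoder side: rebuild approximate coefficients from levels and the same scale."""
    return [level * weight * scale / 16 for level, weight in zip(levels, matrix)]

def choose_scale(coeffs, matrix, max_nonzero, scales=range(1, 32)):
    """Pick the smallest scale that leaves at most max_nonzero nonzero levels."""
    for scale in scales:
        levels = quantize(coeffs, matrix, scale)
        if sum(1 for level in levels if level) <= max_nonzero:
            return scale
    return scales[-1]

coeffs = [96, -46, 14, 2]      # hypothetical transform coefficients
flat = [16, 16, 16, 16]        # flat matrix, as used for nonintra macroblocks

fine = quantize(coeffs, flat, 2)     # mild quantization
coarse = quantize(coeffs, flat, 8)   # severe quantization: small terms vanish
scale = choose_scale(coeffs, flat, max_nonzero=3)
print(fine, coarse, scale, dequantize(coarse, flat, 8))
```

A larger scale produces smaller levels and more zeros, which cost fewer bits to code, at the price of larger reconstruction error after dequantization.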

Video display requires a constant number of frames per second. Because an I-picture has a relatively large number of bits, the decoder’s net buffer occupancy decreases during decoding and display of an I-picture in all but degenerate cases. During decoding and display of a B-picture, net buffer occupancy increases. Figure 47.9 shows typical buffer occupancy at the start of a sequence, for a duration of about one GoP.

An MPEG bitstream must be constructed such that the decoder’s buffer doesn’t overflow: If it did, bits would be lost. The bitstream must also be constructed so that the buffer doesn’t underflow: If it did, a picture to be displayed would not be available at the required time.

Buffer management in MPEG-2 is based upon an idealized model of the decoder’s buffer: All of the bits associated with each picture are deemed to be extracted from the decoder’s buffer at a certain precisely defined instant in time with respect to the bitstream. Every encoder implements a video buffering verifier (VBV) that tracks the state of this idealized buffer. Each picture header contains a VBV delay field that declares the fullness of the buffer at the start of that picture. After channel acquisition, a decoder waits a corresponding amount of time before starting decoding. (If the decoder did not wait, buffer underflow could result.)
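The idealized buffer model lends itself to a short simulation. The sketch below assumes constant bit rate delivery and instantaneous removal of each picture's bits at one-picture intervals. The picture sizes, bit rate, buffer size, and initial delay are hypothetical, and the function name is an invention for this example rather than anything defined by the standard.

```python
def simulate_vbv(picture_bits, bit_rate, picture_rate, buffer_size, initial_delay):
    """Track idealized decoder buffer occupancy; raise ValueError on overflow or underflow."""
    occupancy = bit_rate * initial_delay      # bits that arrive before decoding starts
    bits_per_period = bit_rate / picture_rate
    trace = []
    for index, bits in enumerate(picture_bits):
        if occupancy > buffer_size:
            raise ValueError(f"overflow before picture {index}")
        if bits > occupancy:
            raise ValueError(f"underflow at picture {index}")
        occupancy -= bits                     # the whole picture is removed at one instant
        trace.append(occupancy)
        occupancy += bits_per_period          # channel refills during one picture period
    return trace

size_by_type = {"I": 600_000, "P": 300_000, "B": 100_000}   # hypothetical picture sizes
picture_bits = [size_by_type[t] for t in "IBBPBBPBBPBB"]

trace = simulate_vbv(picture_bits,
                     bit_rate=6_000_000,      # assumed channel rate, bits per second
                     picture_rate=30,
                     buffer_size=1_835_008,   # assumed decoder buffer size, bits
                     initial_delay=0.125)     # seconds the decoder waits before decoding
print(trace)
```

Shortening `initial_delay` to 0.05 s leaves only 300 000 bits in the buffer when the 600 000-bit I-picture is due, so the simulation reports underflow at the first picture, which is the situation the decoder's wait is meant to prevent.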

 

Bitstream syntax

The end product of MPEG-2 video compression is a bitstream partitioned into what MPEG calls a syntactic hierarchy having six layers: sequence, GoP, picture, slice, macroblock, and block. Except for the video sequence layer, which has a sequence end element, each syntactic element has a header and no trailer. The sequence, GoP, picture, and slice elements each begin with a 24-bit start code prefix comprising 23 zero bits followed by a one bit. A start code establishes byte alignment, and may be preceded by an arbitrary number of zero-stuffing bits. All other datastream elements are constructed so as to avoid the possibility of 23 or more consecutive zero bits.
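Because start codes are byte aligned, a scanner can locate the upper layers of the hierarchy by searching for the byte pattern 00 00 01. The sketch below does so and classifies each hit by the byte that follows the prefix. The start code values in the table are not given in the text above; they are quoted from the MPEG-2 video specification as best recalled, so treat them as assumptions to check against ISO/IEC 13818-2. The sample byte string is synthetic.

```python
START_CODE_NAMES = {
    0x00: "picture",
    0xB2: "user data",
    0xB3: "sequence header",
    0xB5: "extension",
    0xB7: "sequence end",
    0xB8: "group of pictures",
}

def classify(code):
    """Map the byte following the 00 00 01 prefix to a syntactic element name."""
    if 0x01 <= code <= 0xAF:
        return "slice"
    return START_CODE_NAMES.get(code, "other")

def find_start_codes(data):
    """Return (byte offset of prefix, element name) for each start code in data."""
    found = []
    position = 0
    while True:
        position = data.find(b"\x00\x00\x01", position)
        if position < 0 or position + 3 >= len(data):
            break
        found.append((position, classify(data[position + 3])))
        position += 3
    return found

# Synthetic stream: headers separated by filler bytes; the GoP start code is
# preceded by one extra zero byte of stuffing.
stream = (b"\x00\x00\x01\xb3" + b"\x12\x34" +
          b"\x00" + b"\x00\x00\x01\xb8" + b"\xff" +
          b"\x00\x00\x01\x00" + b"\xaa" +
          b"\x00\x00\x01\x01" + b"\x55" +
          b"\x00\x00\x01\xb7")

for offset, name in find_start_codes(stream):
    print(offset, name)
```

The stuffing byte before the GoP start code is absorbed naturally: the search simply finds the last three bytes of the zero run plus the one bit.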

Video sequence layer

The top layer of the MPEG syntax is the video sequence. The sequence header specifies high-level parameters such as bit rate, picture rate, picture size, picture aspect ratio, profile, level, progressive/interlace, and chroma format. The VBV buffer size parameter declares the maximum buffer size required within the sequence. The sequence header may specify quantizer matrices. At the encoder’s discretion, the sequence header may be retransmitted intermittently or periodically throughout the sequence, to enable rapid channel acquisition by decoders.

The start of each interlaced video sequence establishes an immutable sequence of field pairs, ordered either {top, bottom, …}, typical of 480i, or {bottom, top, …}, typical of 576i and 1080i. Within a sequence, any individual field may be field-coded, and any two adjacent fields may be frame-coded; however, field parity must alternate in strict sequence.

Group of pictures (GoP header)

The GoP is MPEG’s unit of random access. The GoP layer is optional in MPEG-2; however, it is a practical necessity for most applications. A GoP starts with an I-picture. (Additional I-pictures are allowed.) The GoP header contains SMPTE timecode, and closed GoP and broken link flags.


 

A GoP header contains 23 bits of coded SMPTE timecode. If present, this applies to the first frame of the GoP (in display order). It is unused within MPEG.

If a GoP is closed, no coded B-picture in the GoP may reference the first I-picture of the following GoP. This is inefficient, because the following I-picture ordinarily contains useful prediction information. If a GoP is open, or the GoP header is absent, then B-pictures in the GoP may reference the first I-picture of the following GoP. To allow editing of an MPEG bitstream, GoPs must be closed.

A device that splices bitstreams at GoP boundaries can set broken link; this signals a decoder to invalidate B-pictures immediately following the GoP’s first I-picture.
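The GoP header fields can be unpacked with a few shifts and masks. The field layout below is an assumption recalled from the MPEG-2 video specification rather than something stated in the text: a drop-frame flag, then hours (5 bits), minutes (6), a marker bit, seconds (6), and pictures (6), followed by the closed GoP and broken link flags. The hours, minutes, seconds, and pictures fields together account for the 23 timecode bits. The function name and sample bytes are hypothetical.

```python
def parse_gop_header(payload):
    """Unpack the 27 bits that follow a group start code (assumed layout, see lead-in)."""
    bits = int.from_bytes(payload[:4], "big")   # 32 bits; only the top 27 are used
    return {
        "drop_frame": bool((bits >> 31) & 0x1),
        "hours": (bits >> 26) & 0x1F,
        "minutes": (bits >> 20) & 0x3F,
        # bit 19 is a marker bit and carries no information
        "seconds": (bits >> 13) & 0x3F,
        "pictures": (bits >> 7) & 0x3F,
        "closed_gop": bool((bits >> 6) & 0x1),
        "broken_link": bool((bits >> 5) & 0x1),
    }

# Hypothetical header bytes encoding timecode 01:02:03:04 in a closed GoP.
header = parse_gop_header(bytes.fromhex("04286240"))
print(header)

def can_decode_leading_b_pictures(gop):
    """B-pictures right after the first I-picture are unusable if the link is broken."""
    return not gop["broken_link"]
```

A splicer that joins two streams at a GoP boundary would set the broken link bit in the header of the GoP after the splice, and a decoder would consult it as in the last function.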

Picture layer

The picture header specifies picture structure (frame, top field, or bottom field), and picture coding type (I, P, or B). The picture header can specify quantizer matrices and quantizer scale type. The VBV delay parameter is used for buffer management.

Slice layer

A slice aggregates macroblocks in raster order, left to right and top to bottom. No slice crosses the edge of the picture. All defined profiles have “restricted slice structure,” where slices cover the picture with no gaps or overlaps. The slice header contains the quantizer scale code. The slice serves several purposes. First, the slice is the smallest unit of resynchronization in case of uncorrected data transmission error. Second, the slice is the unit of differential coding of intra-macroblock DC terms. Third, the slice is the unit for differential coding of nonintra motion vectors: The first macroblock of a slice has motion vectors coded absolutely, and motion vectors for subsequent macroblocks are coded in terms of successive differences from that.
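Differential coding of motion vectors within a slice can be sketched as below. This is a simplification under stated assumptions: one vector per macroblock, a predictor reset to zero at the start of the slice (so the first vector is in effect coded absolutely), and none of the other predictor resets or the variable-length coding that a real bitstream uses. The function names and vector values are hypothetical.

```python
def encode_slice_vectors(vectors):
    """Code each vector as its difference from the previous vector in the slice."""
    coded = []
    predictor = (0, 0)                 # reset at the start of every slice
    for vx, vy in vectors:
        coded.append((vx - predictor[0], vy - predictor[1]))
        predictor = (vx, vy)
    return coded

def decode_slice_vectors(coded):
    """Invert encode_slice_vectors by accumulating the differences."""
    vectors = []
    predictor = (0, 0)
    for dx, dy in coded:
        predictor = (predictor[0] + dx, predictor[1] + dy)
        vectors.append(predictor)
    return vectors

slice_vectors = [(5, -2), (6, -2), (6, -1), (4, 0)]   # hypothetical motion vectors
coded = encode_slice_vectors(slice_vectors)
print(coded)
assert decode_slice_vectors(coded) == slice_vectors
```

Neighbouring macroblocks usually move similarly, so the differences cluster near zero and code compactly. Because the predictor restarts with each slice, an error corrupts vectors only until the next slice begins, which ties in with the slice's role as the unit of resynchronization.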

Macroblock layer

The macroblock is MPEG’s unit of motion prediction. Coded macroblock data contains an indication of the macroblock type (intra, forward predicted, backward predicted, or bipredicted); a quantizer scale code; 0, 1, or 2 forward motion vectors; and 0, 1, or 2 backward motion vectors. The coded block pattern flags provide a compact way to represent blocks that are not coded (owing to being adequately predicted without the need for residuals).
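As an illustration of the coded block pattern, the sketch below expands a 6-bit pattern for a 4:2:0 macroblock into one flag per block. The bit order (most significant bit for the first luma block, then the remaining luma blocks, then CB and CR) is an assumption recalled from the MPEG-2 video specification, and the block labels are names chosen for this example.

```python
BLOCK_NAMES_420 = ["Y0", "Y1", "Y2", "Y3", "Cb", "Cr"]   # assumed block order

def coded_blocks_420(pattern):
    """Return the names of blocks flagged as coded in a 6-bit pattern, MSB first."""
    return [name for index, name in enumerate(BLOCK_NAMES_420)
            if pattern & (1 << (5 - index))]

# Hypothetical pattern: the two upper luma blocks and the Cr block carry residuals;
# the other three blocks are predicted well enough to need none.
print(coded_blocks_420(0b110001))
```

Six flag bits replace what would otherwise be three explicitly coded empty blocks, which is where the compactness comes from.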


DIGITAL VIDEO AND HD ALGORITHMS AND INTERFACES
