Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
Richardson I.E.H.264 and MPEG-4 video compression.2003.pdf
Скачиваний:
34
Добавлен:
23.08.2013
Размер:
4.27 Mб
Скачать

DESIGN AND PERFORMANCE

246

7.4 PERFORMANCE

In this section we compare the performance of selected profiles of MPEG-4 Visual and H.264. It should be emphasised that what is considered to be ‘acceptable’ performance depends very much on the target application and on the type of video material that is encoded. Further, coding performance is strongly influenced by encoder decisions that are left to the discretion of the designer (e.g. motion estimation algorithm, rate control method, etc.) and so the performance achieved by a commercial CODEC may vary considerably from the examples reported here.

7.4.1 Criteria

Video CODEC performance can be considered as a tradeoff between three variables, quality, compressed bit rate and computational cost. ‘Quality’ can mean either subjective or objective measured video quality (see Chapter 2). Compressed bit rate is the rate (in bits per second) required to transmit a coded video sequence and computational cost refers to the processing ‘power’ required to code the video sequence. If video is encoded in real time, then the computational cost must be low enough to ensure encoding of at least n frames per second (where n is the target number of frames per second); if video is encoded ‘offline’, i.e. not in real time, then the computational cost per frame determines the total coding time of a video sequence.

The rate–distortion performance of a video CODEC describes the tradeoff between two of these variables, quality and bit rate. Plotting mean PSNR against coded bit rate produces a characteristic rate–distortion curve (Figure 7.23). As the bit rate is reduced, quality (as measured by PSNR) drops at an increasing rate. Plotting rate–distortion curves for identical source material (i.e. the same resolution, frame rate and content) is a widely accepted method of comparing the performance of two video CODECs. As Figure 7.23 indicates, ‘better’ rate–distortion performance is demonstrated by moving the graph up and to the left.

Comparing and evaluating competing video CODECs is a difficult problem. Desirable properties of a video CODEC include good rate–distortion performance and low (or acceptable) computational complexity. When comparing CODECs, it is important to use common

better performance

worse performance

coded bitrate

Figure 7.23 Example of a rate–distortion curve

PERFORMANCE

247

 

test conditions where possible. For example, different video sequences can lead to dramatic differences in rate–distortion performance (i.e. some video sequences are ‘easier’ to code than others) and computational performance (especially if video processing is carried out in software). Certain coding artefacts (e.g. blocking, ringing) may be more visible in some decoded sequences than others. For example, blocking distortion is particularly visible in larger areas of continuously-varying tone in an image and blurring of features (for example due to a crude deblocking filter) is especially obvious in detailed areas of an image.

7.4.2 Subjective Performance

In this section we examine the subjective quality of video sequences after encoding and decoding. The ‘Office’ sequence (Figure 7.24) contains 200 frames, each captured in 4:2:0 CIF format (see Chapter 2 for details of this format). The ‘Office’ sequence was shot from a fixed camera position and the only movement is due to the two women. In contrast, the ‘Grasses’ sequence (Figure 7.25), also consisting of 200 CIF frames, was shot with a handheld camera and contains rapid, complex movement of grass stems. This type of sequence is particularly difficult to encode due to the high detail and complex movement, since it is difficult to find accurate matches during motion estimation.

Each sequence was encoded using three CODECs, an MPEG-2 Video CODEC, an MPEG-4 Simple Profile CODEC and the H.264 Reference Model CODEC (operating in Baseline Profile mode, using only one reference picture for motion compensation). In each case, the first frame was encoded as an I-picture. The remaining frames were encoded as

Figure 7.24 Office: original frame

DESIGN AND PERFORMANCE

248

Figure 7.25 Grasses: original frame

P-pictures using the MPEG-4 and H.264 CODECs and as a mixture of B- and P-pictures with the MPEG-2 CODEC (with the sequence BBPBBP. . . ). The ‘Office’ sequence was encoded at a mean bitrate of 150 kbps with all three CODECs and the ‘Grasses’ sequence at a mean bitrate of 900 kbps.

The decoded quality varies significantly between the three CODECs. A close-up of a frame from the ‘Office’ sequence after encoding and decoding with MPEG-2 (Figure 7.26) shows considerable blocking distortion and loss of detail. The MPEG-4 Simple Profile frame (Figure 7.27) is noticeably better but there is still evidence of blocking and ringing distortion. The H.264 frame (Figure 7.28) is the best of the three and at first sight there is little difference between this and the original frame (Figure 7.24). Visually important features such as the woman’s face and smooth areas of continuous tone variation have been preserved but fine texture (such as the wood grain on the table and the texture of the wall) has been lost.

The results for the ‘Grasses’ sequence are less clear-cut. At 900 kbps, all three decoded sequences are clearly distorted. The MPEG-2 sequence (a close-up of one frame is shown in Figure 7.29) has the most obvious blocking distortion but blocking distortion is also clearly visible in the MPEG-4 Simple Profile sequence (Figure 7.30). The H.264 sequence (Figure 7.31) does not show obvious block boundaries but the image is rather blurred due to the deblocking filter. Played back at the full 25 fps frame rate, the H.264 sequence looks better than the other two but the performance improvement is not as clear as for the ‘Office’ sequence.

These examples highlight the way CODEC performance can change depending on the video sequence content. H.264 and MPEG-4 SP perform well at a relatively low bitrate (150 kbps) when encoding the ‘Office’ sequence; both perform significantly worse at a higher bitrate (900 kbps) when encoding the more complex ‘Grasses’ sequence.

Figure 7.26 Office: encoded and decoded, MPEG-2 Video (close-up)

Figure 7.27 Office: encoded and decoded, MPEG-4 Simple Profile (close-up)

Figure 7.28 Office: encoded and decoded, H.264 Baseline Profile (close-up)

Figure 7.29 Grasses: encoded and decoded, MPEG-2 Video (close-up)

Figure 7.30 Grasses: encoded and decoded, MPEG-4 Simple Profile (close-up)

Figure 7.31 Grasses: encoded and decoded, H.264 Baseline Profile (close-up)

PERFORMANCE

251

 

Office (CIF, 50 frames)

 

44

 

 

 

 

 

 

 

 

 

 

 

42

 

 

 

 

 

 

 

 

 

 

 

40

 

 

 

 

 

 

 

 

 

 

(dB)

38

 

 

 

 

 

Office, MP4 SP

 

 

 

YPSNR

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Office, H.264 Baseline

 

 

36

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

34

 

 

 

 

 

 

 

 

 

 

 

32

 

 

 

 

 

 

 

 

 

 

 

30

 

 

 

 

 

 

 

 

 

 

 

0

100000

200000

300000

400000

500000

600000

700000

800000

900000

1000000

 

 

 

 

 

 

Bitrate (bps)

 

 

 

 

 

Figure 7.32 Rate–distortion comparison: ‘Office’, CIF

7.4.3 Rate–distortion Performance

Measuring bitrate and PSNR at a range of quantiser settings provides an estimate of compression performance that is numerically more accurate (but less closely related to visual perception) than subjective comparisons. Some comparisons of MPEG4 and H.264 performance are presented here.

Figure 7.32 and Figure 7.33 compare the performance of MPEG4 (Simple Profile) and H.264 (Baseline Profile, 1 reference frame) for the ‘Office’ and ‘Grasses’ sequences. Note that ‘Office’ is easier to compress than ‘Grasses’ (see above) and at a given bitrate, the PSNR of ‘Office’ is significantly higher than that of ‘Grasses’. H.264 out-performs MPEG4 compression at all of the tested bit rates, but the rate-distortion gain is more noticeable for the ‘Office’ sequence.

The rate–distortion performance of the popular ‘Carphone’ test sequence is plotted in Figure 7.34. This sequence contains moderate motion and in this case the source is QCIF resolution at 30 frames per second. Four sets of results are compared, two from MPEG-4 and two from H.264. The first two are MPEG-4 Simple Profile (first frame coded as an I-picture, subsequent frames coded as P-pictures) and MPEG-4 Advanced Simple Profile (using two B- pictures between successive P-pictures, no other ASP tools used). The second pair are H.264 Baseline (first frame coded as an I-slice, subsequent frames coded as P-slices, one reference frame used for inter prediction, UVLC/CAVLC entropy coding) and H.264 Main Profile (first frame coded as an I-slice, subsequent frames coded as P-slices, five reference frames used for inter prediction, CABAC entropy coding).

DESIGN AND PERFORMANCE

252

Grasses (CIF, 50 frames)

 

40

 

 

 

 

 

 

 

 

 

 

 

38

 

 

 

 

 

 

 

 

 

 

 

36

 

 

Grasses, MP4 SP

 

 

 

 

 

 

 

 

Grasses, H.264 Baseline

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

34

 

 

 

 

 

 

 

 

 

 

(dB)

32

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

YPSNR

30

 

 

 

 

 

 

 

 

 

 

28

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

26

 

 

 

 

 

 

 

 

 

 

 

24

 

 

 

 

 

 

 

 

 

 

 

22

 

 

 

 

 

 

 

 

 

 

 

20

 

 

 

 

 

 

 

 

 

 

 

0

500000

1000000

1500000

2000000

2500000

3000000

3500000

4000000

4500000

5000000

 

 

 

 

 

 

Bitrate (bps)

 

 

 

 

 

Figure 7.33 Rate–distortion comparison: ‘Grasses’, CIF

MPEG-4 ASP performs slightly better than MPEG-4 SP at higher bitrates but performs worse at lower bitrates (in this case). It is interesting to note that other ASP tools (quarterpel MC and alternate quantiser) produced poorer performance in this test. H.264 Baseline outperforms both MPEG-4 profiles at all bitrates and CABAC and multiple reference frames provide a further performance gain. For example, at a PSNR of 35 dB, MPEG-4 SP produces a coded bitrate of around 125 kbps, ASP reduces the bitrate to around 120 kbps, H.264 Baseline (with one reference frame) produces a bitrate of around 80 kbps and H.264 with CABAC and five reference frames gives a bitrate of less than 70 kbps.

The results for the ‘Carphone’ sequence show a more convincing performance gain from H.264 than the results for ‘Grasses’ and ‘Office’. This is perhaps because ‘Carphone’ is a professionally-captured sequence with high image fidelity whereas the other two sequences were captured from a high-end consumer video camera and have more noise in the original images.

Other Performance Studies

The compression performance of MPEG-4 Simple and Advanced Simple Profiles are compared in [35]. In this paper the Advanced Simple tools are found to improve the compression performance significantly, producing a coded bitrate around 30–40% smaller for the same video quality, at the expense of increased encoder complexity. Most of the performance gain

PERFORMANCE

253

 

QCIF sequence ("Carphone", 200 frames)

 

45

 

 

 

 

 

 

 

43

 

 

 

 

 

 

 

41

 

 

 

 

 

 

 

39

 

 

 

 

 

 

(dB)

37

 

 

 

 

 

 

 

 

 

 

 

MPEG4 SP

 

PSNR

 

 

 

 

 

 

35

 

 

 

 

MPEG4 ASP

 

 

 

 

 

 

 

 

 

 

 

 

H264 UVLC, 1 reference

 

Y

 

 

 

 

 

 

33

 

 

 

 

H264 CABAC, 5 reference

 

 

 

 

 

 

 

 

31

 

 

 

 

 

 

 

29

 

 

 

 

 

 

 

27

 

 

 

 

 

 

 

25

 

 

 

 

 

 

 

0

100000

200000

300000

400000

500000

600000

 

 

 

 

Bitrate (bps)

 

 

 

Figure 7.34 Rate–distortion comparison: ‘Carphone’, QCIF

is due to the use of B-pictures (which require extra storage and encoding delay), quarterpixel motion compensation and rate–distortion optimised mode selection (i.e. choosing the encoding mode of each macroblock to maximise compression performance).

Reference [36] evaluates the performance of H.264 (an older version of the test model) and compares it with H.263++ (H.263 with optional modes to improve coding performance). According to the results presented, H.264 (H.26L) consistently out-performs H.263++. The authors study the contribution of some of the optional features of H.264 and conclude that CABAC provides a consistent coding gain compared with VLCs, reducing the coded bit rate by an average of 7.7% for the same objective quality (although the version of the test model did not include the more efficient context-adaptive VLCs for coefficient encoding). Using multiple reference frames for motion-compensated prediction provides better performance than a single reference frame, although the gains are slightly less obvious (a mean gain of 5.7% over the single reference frame case). Small motion compensation block sizes (down to 4 × 4) provided a clear performance gain over a single vector per macroblock (a mean bit rate saving of 16.4%), although most of the gain (13.7%) was achieved for a minimum block size of 8 × 8.

H.264 is compared with both H.263++ and MPEG-4 Visual Advanced Simple Profile in [37]. The authors compared the luminance PSNR (see Chapter 2) of each CODEC under the same test conditions. The three CODECs were chosen to represent the most highly-optimised versions of each standard available at the time. Test Model 8 of H.26L (an earlier version