DESIGN AND PERFORMANCE 254
of H.264) out-performed H.263++ and MPEG-4 ASP by an average of 3.0 dB and 2.0 dB respectively. The H.26L CODEC achieved roughly the same performance at a coded bitrate of 32 kbit/s as the other two CODECs at a bitrate of 64 kbit/s (QCIF video, 10 frames per second). In other words, in this test H.26L produced the same decoded quality at around half the bitrate of MPEG-4 ASP and H.263++. At higher bitrates (512 kbit/s and above) the gain was still significant, though smaller. An overview of rate-constrained encoder control and a comparison of H.264 performance with H.263, MPEG-2 Video and MPEG-4 Visual is given in [38].
7.4.4 Computational Performance
MPEG-4 Visual and (to a lesser extent) H.264 provide a range of optional coding modes that have the potential to improve compression performance. For example, MPEG-4’s Advanced Simple Profile is designed to offer greater compression efficiency than the popular Simple Profile (see Chapter 5), and the Main Profile of H.264 is capable of providing better compression efficiency than the Baseline Profile (see Chapter 6). Within a specific Profile, a designer or user of a CODEC can choose whether or not to enable certain coding features. For example, a Main Profile H.264 decoder must support both Context Adaptive VLCs (CAVLC) and Context Adaptive Binary Arithmetic Coding (CABAC), but the encoder may choose which mode to use in a particular application.
Improved coding efficiency often comes at the cost of higher computational complexity. The situation is complicated by the fact that the computational cost and coding benefit of a particular mode or feature can depend strongly on the type of source material. In a practical application, the available coding modes may be limited by the processing platform, and it may be necessary to choose encoding parameters to suit both the source material and the available processing resources.
Example
The first 25 frames of the ‘Violin’ sequence (QCIF, 25 frames per second, see Figure 7.18) were encoded using the H.264 test model software (version JM4.0) with a fixed quantiser parameter of 36. The sequence was encoded with a range of coding parameters to investigate the effect of each on compression performance and coding time. Two reference configurations were used as follows.
Basic configuration: CAVLC entropy coding, no B-pictures, loop filter enabled, rate–distortion optimisation disabled, one reference frame for motion compensation, all block sizes (down to 4 × 4) available.
Advanced configuration: CABAC entropy coding, every 2nd picture coded as a B-picture, loop filter enabled, rate–distortion optimisation enabled, five reference frames, all block sizes available.
The ‘basic’ configuration represents a suitable set-up for a low-complexity, real-time CODEC, whilst the ‘advanced’ configuration might suit a high-complexity, high-efficiency CODEC. Table 7.3 summarises the results. The average luminance PSNR (objective quality) is similar for every configuration (all values lie within about 0.6 dB of each other), so the differences in performance show up mainly in the coded bitrate and encoding time.
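The quality figures in Table 7.3 are average luminance PSNR values. As a reminder of what that figure measures, here is a minimal sketch in Python. It assumes 8-bit luma samples (peak value 255) held as plain lists of rows, and it assumes the sequence figure is the mean of per-frame PSNR values; that is a common convention, but the text does not say exactly how the JM4.0 software averages. The function names are illustrative only.

```python
import math


def luma_psnr(original, decoded, max_value=255):
    """PSNR in dB between two equal-sized luminance frames (lists of rows)."""
    if len(original) != len(decoded):
        raise ValueError("frames must have the same dimensions")
    total_sq_error = 0
    count = 0
    for row_o, row_d in zip(original, decoded):
        if len(row_o) != len(row_d):
            raise ValueError("frames must have the same dimensions")
        for a, b in zip(row_o, row_d):
            total_sq_error += (a - b) ** 2
            count += 1
    mse = total_sq_error / count
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10 * math.log10(max_value ** 2 / mse)


def average_luma_psnr(original_frames, decoded_frames):
    """Assumed convention: mean of the per-frame luma PSNR values."""
    scores = [luma_psnr(o, d) for o, d in zip(original_frames, decoded_frames)]
    return sum(scores) / len(scores)


# Two 2x2 frames differing by +1 and -1 in two samples: MSE = 0.5
print(round(luma_psnr([[100, 100], [100, 100]], [[101, 99], [100, 100]]), 2))
```

Because PSNR is logarithmic in the mean squared error, the spread of about 0.5 dB across Table 7.3 corresponds to only a modest change in MSE.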

Table 7.3 Computational performance of H.264 optional modes: Violin, QCIF, 25 frames

| Configuration | Average luminance PSNR (dB) | Coded bitrate, P/B slices (kbps) | Encoding time (seconds) |
|---|---|---|---|
| Basic | 29.06 | 45.9 | 40.4 |
| Basic + min. block size of 8 × 8 | 29.0 | 46.6 | 33.9 |
| Basic + 5 reference frames | 29.12 | 46.2 | 157.2 |
| Basic + rate-distortion optimisation | 29.18 | 44.6 | 60.5 |
| Basic + every 2nd picture coded as a B-picture | 29.19 | 42.2 | 55.7 |
| Basic + CABAC | 29.06 | 44.0 | 40.5 |
| Advanced | 29.57 | 38.2 | 180 |
| Advanced (only one reference frame) | 29.42 | 38.8 | 77 |

The ‘basic’ configuration takes 40 seconds to encode the sequence and produces a bitrate of 46 kbps (excluding the bits produced by the first I-slice). Using only 8 × 8 or larger motion compensation block sizes reduces coding time (by c. 6.5 seconds) but increases the coded bitrate, as might be expected. Using multiple reference frames (five in this case) almost quadruples the coding time and, in this configuration, slightly increases the coded bitrate. Adding rate–distortion optimisation (in which the encoder repeatedly codes each macroblock in different ways in order to find the best coding parameters) reduces the bitrate at the expense of a 50% increase in coding time. B-pictures provide a compression gain at the expense of increased coding time (nearly 40%). CABAC gives a compression improvement and does not noticeably increase coding time.
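Rate–distortion optimisation, as described above, trial-codes each macroblock in several ways and keeps the best result. The Python sketch below shows that selection step for one choice, the motion vector, under a deliberately simplified criterion: it keeps the candidate with the fewest total bits (MV plus residual), as discussed in Section 7.4.5. Several things are assumptions. Each MV component is taken to be coded as a signed Exp-Golomb difference from a predicted MV (as for motion vector differences in H.264 with CAVLC), and the SAD and residual bit counts are invented numbers standing in for real trial encoding. Practical RDO encoders, the JM reference software included, usually minimise a Lagrangian cost J = D + λR rather than bits alone.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    mv: tuple            # (x, y) motion vector in the encoder's MV units
    sad: int             # residual energy (sum of absolute differences)
    residual_bits: int   # header + coefficient bits from trial-coding the MB


def ue_bits(code_num):
    """Length of an unsigned Exp-Golomb codeword: 2*floor(log2(k+1)) + 1."""
    return 2 * (code_num + 1).bit_length() - 1


def se_bits(value):
    """Length of a signed Exp-Golomb codeword (v > 0 -> 2v-1, v <= 0 -> -2v)."""
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return ue_bits(code_num)


def mv_bits(mv, predicted_mv):
    """Bits for the MV difference, coding each component as a signed Exp-Golomb code."""
    return se_bits(mv[0] - predicted_mv[0]) + se_bits(mv[1] - predicted_mv[1])


def choose_mv(candidates, predicted_mv):
    """Pick the candidate with the fewest total bits (MV + residual)."""
    return min(candidates,
               key=lambda c: c.residual_bits + mv_bits(c.mv, predicted_mv))


# Hypothetical trial-coding results for one macroblock.
candidates = [
    Candidate(mv=(0, 0), sad=1200, residual_bits=140),
    Candidate(mv=(12, -7), sad=1100, residual_bits=132),
    Candidate(mv=(1, 0), sad=1150, residual_bits=136),
]
predicted = (0, 0)

lowest_energy = min(candidates, key=lambda c: c.sad)
fewest_bits = choose_mv(candidates, predicted)
print("lowest residual energy:", lowest_energy.mv)  # (12, -7): 132 + 16 = 148 bits
print("fewest total bits:     ", fewest_bits.mv)    # (1, 0):  136 + 4  = 140 bits
```

In this made-up case, the MV with the lowest residual energy is not the cheapest to code, because its large difference from the predicted MV costs 16 bits against 4 bits for the nearby vector.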
The ‘advanced’ configuration takes over four times longer than the ‘basic’ configuration to encode but produces a bitrate 17% lower. Using only one reference frame reduces the coding time by more than half (from 180 to 77 seconds) at the expense of a slight drop in compression efficiency.
These results show that, for this sequence and this encoder at least, the most useful performance optimisations (in terms of coding efficiency improvement and computational complexity) are CABAC and B-pictures. These give a respectable improvement in compression without a high computational penalty. Conversely, multiple reference frames make only a slight improvement (and then only in conjunction with certain other modes, notably rate-distortion optimised encoding) and are computationally expensive. It is worth noting, however, (i) that different outcomes would be expected with other types of source material (for example, see [36]) and (ii) that the reference model encoder is not optimised for computational efficiency.
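The relative figures quoted for Table 7.3 can be checked directly against the table. The Python sketch below computes each configuration's bitrate change and encoding-time ratio relative to ‘basic’. It then picks the lowest-bitrate configuration that fits a given encoding-time budget, which is one simple way to act on the platform limits discussed earlier. The row labels are abbreviated, and `relative_to` and `best_under_budget` are hypothetical helpers, not part of the JM software. The selection ignores PSNR because it varies little between these rows. It is only meaningful for these particular measurements: one sequence and the unoptimised reference encoder.

```python
# Results from Table 7.3: (configuration, average luma PSNR in dB,
# coded bitrate of P/B slices in kbps, encoding time in seconds).
TABLE_7_3 = [
    ("Basic", 29.06, 45.9, 40.4),
    ("Basic + min. block size 8x8", 29.0, 46.6, 33.9),
    ("Basic + 5 reference frames", 29.12, 46.2, 157.2),
    ("Basic + rate-distortion optimisation", 29.18, 44.6, 60.5),
    ("Basic + B-pictures", 29.19, 42.2, 55.7),
    ("Basic + CABAC", 29.06, 44.0, 40.5),
    ("Advanced", 29.57, 38.2, 180.0),
    ("Advanced (one reference frame)", 29.42, 38.8, 77.0),
]


def relative_to(baseline, rows):
    """Map each configuration to (bitrate change in %, encoding-time ratio)."""
    _, _, base_rate, base_time = next(r for r in rows if r[0] == baseline)
    return {
        name: (round(100.0 * (rate - base_rate) / base_rate, 1),
               round(time_s / base_time, 2))
        for name, _psnr, rate, time_s in rows
    }


def best_under_budget(rows, max_time_s):
    """Lowest-bitrate configuration whose encoding time fits the budget."""
    feasible = [r for r in rows if r[3] <= max_time_s]
    return min(feasible, key=lambda r: r[2]) if feasible else None


changes = relative_to("Basic", TABLE_7_3)
for name, (rate_pct, time_ratio) in changes.items():
    print(f"{name:38s} bitrate {rate_pct:+6.1f}%   time x{time_ratio:.2f}")

for budget in (45, 60, 80, 200):
    choice = best_under_budget(TABLE_7_3, budget)
    print(budget, "s ->", choice[0] if choice else "none")
```

For ‘advanced’, this gives a bitrate change of -16.8% and a time ratio of 4.46. With budgets of 45, 60, 80 and 200 seconds it selects CABAC, B-pictures, ‘advanced’ with one reference frame, and full ‘advanced’ respectively. This matches the observation that CABAC and B-pictures offer the cheapest gains.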
7.4.5 Performance Optimisation
Achieving the optimum balance between compression and decoded quality is a difficult and complex challenge. Setting encoding parameters at the start of a video sequence and leaving them unchanged throughout is unlikely to produce optimum rate–distortion performance, since the encoder faces a number of inter-related choices when coding each macroblock. For example, the encoder may select a motion vector for an inter-coded MB that minimises the energy in the motion-compensated residual. However, this is not necessarily the best choice: larger MVs (more precisely, MVs further from their predicted value) generally require more bits to encode, and the optimum choice of MV is the one that minimises the total number of bits in the coded MB (including header, MV and coefficients). Thus finding the optimal choice of parameters (such as MV, quantisation parameter, etc.) may require the encoder to code the MB repeatedly before selecting the combination of parameters that minimises the coded size of the MB. Further, the choice of
