• |
DESIGN AND PERFORMANCE |
256 |
parameters for MB1 affects the coding performance of MB2 since, for example, the coding modes of MB2 (e.g. MV, intra prediction mode, etc.) may be differentially encoded from the coding modes of MB1.
Achieving near-optimum rate–distortion performance can be a very complex problem indeed, many times more complex than the video coding process itself. In a practical CODEC, the choice of optimisation strategy depends on the available processing power and acceptable coding latency. So-called ‘two-pass’ encoding is widely used in offline encoding, in which each frame is processed once to generate sequence statistics which then influence the coding strategy in the second coding pass (often together with a rate control algorithm to achieve a target bit rate or file size).
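As a rough illustration of the two-pass idea, the sketch below assumes that the first pass records the number of bits each frame produced at a fixed quantiser, and that the second pass shares a target bit budget between the frames in proportion to those counts. The proportional rule and the name `allocate_bits` are assumptions made for illustration only; practical two-pass encoders use more elaborate models.

```python
def allocate_bits(first_pass_bits, target_total_bits):
    """Share a bit budget between frames in proportion to first-pass statistics.

    first_pass_bits: bits produced by each frame in the first pass
                     (used here as a simple measure of frame complexity).
    target_total_bits: budget for the whole sequence in the second pass.
    """
    total = sum(first_pass_bits)
    if total <= 0:
        raise ValueError("first-pass statistics must contain some coded bits")
    return [target_total_bits * bits / total for bits in first_pass_bits]


# Hypothetical first-pass statistics: one large I-VOP and two P-VOPs.
targets = allocate_bits([50000, 2000, 8000], target_total_bits=30000)
print(targets)
```

Complex frames receive a proportionally larger share of the budget, which a rate control algorithm can then try to meet during the second pass.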
Many alternative rate–distortion optimisation strategies have been proposed (such as those based on Lagrangian optimisation) and a useful review can be found in [6]. Rate–distortion optimisation should not be considered in isolation from computational performance. In fact, video CODEC optimisation is (at least) a three-variable problem since rate, distortion and computational complexity are all inter-related. For example, rate–distortion optimised mode decisions are achieved at the expense of increased complexity, ‘fast’ motion estimation algorithms often achieve low complexity at the expense of motion estimation (and hence coding) performance, and so on. Coding performance and computational performance can be traded against each other. For example, a real-time coding application for a hand-held device may be designed with minimal processing load at the expense of poor rate–distortion performance, whilst an application for offline encoding of broadcast video data may be designed to give good rate–distortion performance, since processing time is not an important issue but encoded quality is critical.
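A Lagrangian mode decision can be sketched as choosing, for each macroblock, the candidate mode that minimises the cost J = D + λR, where D is the distortion, R the number of coded bits and λ a multiplier that sets the trade-off. The candidate names and the distortion and rate figures below are hypothetical. In a real encoder each candidate has to be coded (or estimated) to obtain D and R, which is where the extra computational complexity arises.

```python
from dataclasses import dataclass


@dataclass
class ModeCandidate:
    name: str
    distortion: float  # e.g. sum of squared differences after reconstruction
    rate_bits: float   # bits needed to code the macroblock in this mode


def choose_mode(candidates, lam):
    """Return the candidate with the smallest Lagrangian cost J = D + lam * R."""
    return min(candidates, key=lambda c: c.distortion + lam * c.rate_bits)


# Hypothetical measurements for one macroblock.
candidates = [
    ModeCandidate("intra", distortion=100.0, rate_bits=300.0),
    ModeCandidate("inter_16x16", distortion=150.0, rate_bits=60.0),
    ModeCandidate("inter_8x8", distortion=110.0, rate_bits=140.0),
]
for lam in (0.0, 0.1, 1.0):
    print(lam, choose_mode(candidates, lam).name)
```

A small λ favours low distortion regardless of rate; a larger λ pushes the decision towards modes that cost fewer bits.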
7.5 RATE CONTROL
The MPEG-4 Visual and H.264 standards require each video frame or object to be processed in units of a macroblock. If the control parameters of a video encoder are kept constant (e.g. motion estimation search area, quantisation step size, etc.), then the number of coded bits produced for each macroblock will change depending on the content of the video frame, causing the bit rate of the encoder output (measured in bits per coded frame or bits per second of video) to vary. Typically, an encoder with constant parameters will produce more bits when there is high motion and/or detail in the input sequence and fewer bits when there is low motion and/or detail. Figure 7.35 shows an example of the variation in output bitrate produced by coding the Office sequence (25 frames per second) using an MPEG-4 Simple Profile encoder, with a fixed quantiser step size of 12. The first frame is coded as an I-VOP (and produces a large number of bits because there is no temporal prediction) and successive frames are coded as P-VOPs. The number of bits per coded P-VOP varies between 1300 and 9000 (equivalent to a bitrate of 32–225 kbits per second).
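The conversion between bits per coded frame and bits per second quoted above is a simple multiplication by the frame rate. A minimal check, assuming 1 kbit = 1000 bits (the function name is illustrative):

```python
def bitrate_kbps(bits_per_frame, frames_per_second):
    """Convert a per-frame bit count to a bitrate in kbit/s (1 kbit = 1000 bits)."""
    return bits_per_frame * frames_per_second / 1000.0


# Smallest and largest P-VOPs quoted in the text, at 25 frames per second.
print(bitrate_kbps(1300, 25))  # 32.5
print(bitrate_kbps(9000, 25))  # 225.0
```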
This variation in bitrate can be a problem for many practical delivery and storage mechanisms. For example, a constant bitrate channel (such as a circuit-switched channel) cannot transport a variable-bitrate data stream. A packet-switched network can support varying throughput rates but the mean throughput at any point in time is limited by factors such as link rates and congestion. In these cases it is necessary to adapt or control the bitrate produced by a video encoder to match the available bitrate of the transmission mechanism. CD-ROM and DVD media have a fixed storage capacity and it is necessary to control the rate of an encoded video sequence (for example, a movie stored on DVD-Video) to fit the capacity of the medium.
Figure 7.35 Bit rate variation (MPEG-4 Simple Profile) [Plot: Office sequence, 25 fps, MP4 Simple Profile, QP = 12; bits per frame (0 to 10000) against frame number (0 to 200)]
Figure 7.36 Encoder output and decoder input buffers [Block diagram: video frames (constant frame rate) → encoder → variable bitrate → buffer → constant bitrate → channel → constant bitrate → buffer → variable bitrate → decoder → video frames (constant frame rate)]
The variable data rate produced by an encoder can be ‘smoothed’ by buffering the encoded data prior to transmission. Figure 7.36 shows a typical arrangement, in which the variable bitrate output of the encoder is passed to a ‘First In/First Out’ (FIFO) buffer. This buffer is emptied at a constant bitrate that is matched to the channel capacity. Another FIFO is placed at the input to the decoder and is filled at the channel bitrate and emptied by the decoder at a variable bitrate (since the decoder extracts P bits to decode each frame and P varies).
Example
The ‘Office’ sequence is coded using MPEG-4 Simple Profile with a fixed QP = 12 to produce the variable bitrate plotted in Figure 7.35. The encoder output is buffered prior to transmission over a 100 kbps constant bitrate channel. The video frame rate is 25 fps and so the channel transmits 4 kbits (and hence removes 4 kbits from the buffer) in every frame period. Figure 7.37 plots the contents of the encoder buffer (y-axis) against elapsed time (x-axis). The first I-VOP generates over 50 kbits and subsequent P-VOPs in the early part of the sequence produce relatively few bits and so the buffer contents drop for the first 2 seconds as the channel bitrate exceeds the encoded bitrate. At around 3 seconds the encoded bitrate starts to exceed the channel bitrate and the buffer fills up.
Figure 7.37 Buffer example (encoder; channel bitrate 100 kbps) [Plot: encoder buffer contents in bits (0 to 10 × 10^4) against time in seconds (0 to 8)]
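The behaviour of the encoder buffer can be reproduced with a few lines of code. The sketch below adds each coded frame to a FIFO level and removes one frame period's worth of channel bits. The frame sizes are invented to resemble the shape of the example (a large I-VOP, a quiet passage, then a busy passage); they are not the Office sequence data, and the function name is illustrative.

```python
def encoder_buffer_trace(frame_bits, channel_bps, fps):
    """Return the encoder buffer level (in bits) after each frame period.

    Each period the encoder writes one coded frame into the buffer and the
    channel removes channel_bps / fps bits. The level is clamped at zero
    because the channel cannot remove bits that are not there.
    """
    drain = channel_bps / fps
    level = 0.0
    trace = []
    for bits in frame_bits:
        level = max(0.0, level + bits - drain)
        trace.append(level)
    return trace


# Hypothetical sequence: one I-VOP, 50 small P-VOPs, then 50 larger P-VOPs.
frames = [54000] + [2000] * 50 + [6000] * 50
trace = encoder_buffer_trace(frames, channel_bps=100_000, fps=25)
print(trace[0], min(trace), trace[-1])
```

With these figures the buffer drains while the frames are smaller than 4 kbits and fills again once they exceed 4 kbits, which is the pattern described for Figure 7.37.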
Figure 7.38 shows the state of the decoder buffer, filled at a rate of 100 kbps (4 kbits per frame) and emptied as the decoder extracts each frame. It takes half a second before the first complete coded frame (54 kbits) is received. From this point onwards, the decoder is able to extract and decode frames at the correct rate (25 frames per second) until around 4 seconds have elapsed. At this point, the decoder buffer is emptied and the decoder ‘stalls’ (i.e. it has to slow down or pause decoding until enough data are available in the buffer). Decoding picks up again after around 5.5 seconds.
If the decoder stalls in this way it is a problem for video playback because the video clip ‘freezes’ until enough data are available to continue. The problem can be partially solved by adding a deliberate delay at the decoder. For example, Figure 7.39 shows the results if the decoder waits for 1 second before it starts decoding. Delaying decoding of the first frame allows the buffer contents to reach a higher level before decoding starts and in this case the contents never drop to zero and so playback can proceed smoothly.²
² Varying throughput rates from the channel can also be handled using a decoder buffer. For example, a widely-used technique for video streaming over IP networks is for the decoder to buffer a few seconds of coded data before commencing decoding. If data throughput drops temporarily (for example due to network congestion) then decoding can continue as long as data remain in the buffer.
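The effect of a start-up delay at the decoder can be explored with a similar simulation. The sketch below assumes that bits arrive at the constant channel rate from time zero, that the decoder removes one complete frame per frame period once playback has started, and that a ‘stall’ is any period in which the next frame is not yet completely in the buffer. The frame sizes are hypothetical (not the Office sequence) and the function name is illustrative.

```python
def decoder_stalls(frame_bits, channel_bps, fps, startup_delay_s=0.0):
    """Count frame periods in which playback has started but the next
    coded frame is not yet completely in the decoder buffer."""
    fill = channel_bps / fps          # bits arriving per frame period
    total = sum(frame_bits)           # nothing more arrives after this
    start_period = round(startup_delay_s * fps)
    received = consumed = 0.0
    next_frame = stalls = period = 0
    while next_frame < len(frame_bits):
        period += 1
        received = min(total, received + fill)
        if period < start_period:
            continue                  # deliberate start-up delay
        needed = frame_bits[next_frame]
        if received - consumed >= needed:
            consumed += needed        # decoder extracts one whole frame
            next_frame += 1
        elif next_frame > 0:
            stalls += 1               # buffer ran dry during playback
    return stalls


# Hypothetical sequence: one I-VOP, 50 small P-VOPs, then 30 large P-VOPs.
frames = [54000] + [2000] * 50 + [8000] * 30
print(decoder_stalls(frames, 100_000, 25, startup_delay_s=0.0))
print(decoder_stalls(frames, 100_000, 25, startup_delay_s=1.0))
```

For this made-up sequence the decoder stalls during the busy passage when it starts as soon as the first frame has arrived, and does not stall when it waits for 1 second first. A longer or busier passage would need a longer delay (and a larger buffer).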
Figure 7.38 Buffer example (decoder; channel bitrate 100 kbps) [Plot: decoder buffer contents in bits (0 to 7 × 10^4) against time in seconds (0 to 9); annotations: ‘1st frame decoded’, ‘Decoder stalls’]
Figure 7.39 Buffer example (decoder; channel bitrate 100 kbps) [Plot: decoder buffer contents in bits (0 to 12 × 10^4) against time in seconds (0 to 9); annotation: ‘1st frame decoded’]
These examples show that a variable coded bitrate can be adapted to a constant bitrate delivery medium using encoder and decoder buffers. However, this adaptation comes at a cost of buffer storage space and delay and (as the examples demonstrate) the wider the bitrate variation, the larger the buffer size and decoding delay. Furthermore, it is not possible to cope with an arbitrary variation in bitrate using this method, unless the buffer sizes and decoding delay are set at impractically high levels. It is usually necessary to implement a feedback mechanism to control the encoder output bitrate in order to prevent the buffers from overflowing or underflowing.
Rate control involves modifying the encoding parameters in order to maintain a target output bitrate. The most obvious parameter to vary is the quantiser parameter or step size (QP) since increasing QP reduces coded bitrate (at the expense of lower decoded quality) and vice versa. A common approach to rate control is to modify QP during encoding in order to (a) maintain a target bitrate (or mean bitrate) and (b) minimise distortion in the decoded sequence. Optimising the tradeoff between bitrate and quality is a challenging task and many different approaches and algorithms have been proposed and implemented. The choice of rate control algorithm depends on the nature of the video application, for example:
(a) Offline encoding of stored video for storage on a DVD. Processing time is not a particular constraint and so a complex algorithm can be employed. The goal is to ‘fit’ a compressed video sequence into the available storage capacity whilst maximising image quality and ensuring that the decoder buffer of a DVD player does not overflow or underflow during decoding. Two-pass encoding (in which the encoder collects statistics about the video sequence in a first pass and then carries out encoding in a second pass) is a good option in this case.
(b) Encoding of live video for broadcast. A broadcast programme has one encoder and multiple decoders; decoder processing and buffering are limited whereas encoding may be carried out in expensive, fast hardware. A delay of a few seconds is usually acceptable and so there is scope for a medium-complexity rate control algorithm, perhaps incorporating two-pass encoding of each frame.
(c) Encoding for two-way videoconferencing. Each terminal has to carry out both encoding and decoding and processing power may be limited. Delay must be kept to a minimum (ideally less than around 0.5 seconds from frame capture at the encoder to display at the decoder). In this scenario a low-complexity rate control algorithm is appropriate. Encoder and decoder buffering should be minimised (in order to keep the delay small) and so the encoder must tightly control output rate. This in turn may cause decoded video quality to vary significantly; for example, it may drop sharply when there is an increase in movement or detail in the video scene.
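The feedback mechanism mentioned above can be illustrated with a very simple rule: raise QP when the encoder buffer is filling and lower it when the buffer is emptying. The sketch below is a deliberately naive illustration and not an algorithm taken from either standard. The thresholds, the step of one QP unit per frame and the function name are assumptions; the default limits of 1 and 31 correspond to the MPEG-4 Visual quantiser range.

```python
def next_qp(qp, buffer_bits, buffer_size_bits,
            low=0.3, high=0.7, qp_min=1, qp_max=31):
    """Choose the QP for the next frame from the current buffer fullness.

    Above the `high` fraction the quantiser is made coarser (fewer bits,
    lower quality); below the `low` fraction it is made finer.
    """
    fullness = buffer_bits / buffer_size_bits
    if fullness > high:
        qp += 1
    elif fullness < low:
        qp -= 1
    return max(qp_min, min(qp_max, qp))


print(next_qp(12, buffer_bits=80_000, buffer_size_bits=100_000))  # 13
print(next_qp(12, buffer_bits=10_000, buffer_size_bits=100_000))  # 11
print(next_qp(12, buffer_bits=50_000, buffer_size_bits=100_000))  # 12
```

A rule of this kind keeps the buffer within bounds but takes no account of distortion; the algorithms used in practice try to balance both, as the scenarios above indicate.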
Recommendation H.264 does not (at present) specify or suggest a rate control algorithm (however, a proposal for H.264 rate control is described in [39]). MPEG-4 Visual describes a possible rate control algorithm in an Informative Annex [40] (i.e. use of the algorithm is not mandatory). This algorithm, known as the Scalable Rate Control (SRC) scheme, is appropriate for a single video object (a rectangular V.O. that covers the entire frame) and a range of bit rates and spatial/temporal resolutions. The SRC attempts to achieve a target bit rate over a certain number of frames (a ‘segment’ of frames, usually starting with an I-VOP) and assumes the following model for the encoder rate R:
$$R = \frac{X_1 S}{Q} + \frac{X_2 S}{Q^2} \qquad (7.10)$$
where Q is the quantiser step size, S is the mean absolute difference of the residual frame after motion compensation (a measure of frame complexity) and X1, X2 are model parameters. Rate control consists of the following steps which are carried out after motion compensation and before encoding of each frame i:
1. Calculate a target bit rate Ri, based on the number of frames in the segment, the number of bits that are available for the remainder of the segment, the maximum acceptable buffer contents and the estimated complexity of frame i. (The maximum buffer size affects the latency from encoder input to decoder output. If the previous frame was complex, it is assumed that the next frame will be complex and should therefore be allocated a suitable number of bits: the algorithm attempts to balance this requirement against the limit on the total number of bits for the segment.)
2. Compute the quantiser step size Qi (to be applied to the whole frame). Calculate S for the complete residual frame and solve equation (7.10) to find Q.
3. Encode the frame.
4. Update the model parameters X1, X2 based on the actual number of bits generated for frame i.
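Steps 2 and 4 can be sketched in code. Equation (7.10) is a quadratic in Q, so for a target R and measured S the positive root gives the step size. For step 4, the sketch below re-fits X1 and X2 by least squares to past (Q, S, bits) observations. This fitting procedure, the clamping range and the function names are assumptions for illustration; the text does not give the exact update procedure of the Informative Annex and it is not reproduced here.

```python
import math


def solve_q(target_bits, s, x1, x2, q_min=1.0, q_max=31.0):
    """Solve R = X1*S/Q + X2*S/Q**2 for Q (positive root), then clamp."""
    a = x1 * s
    disc = a * a + 4.0 * target_bits * x2 * s
    if x2 == 0 or disc < 0:
        q = a / target_bits                    # fall back to first-order model
    else:
        q = (a + math.sqrt(disc)) / (2.0 * target_bits)
    return max(q_min, min(q_max, q))


def fit_model(history):
    """Least-squares fit of X1, X2 to past (Q, S, bits) triples.

    Uses bits/S = X1*(1/Q) + X2*(1/Q**2), which is linear in X1 and X2.
    """
    suu = suv = svv = suy = svy = 0.0
    for q, s, bits in history:
        u, v, y = 1.0 / q, 1.0 / (q * q), bits / s
        suu += u * u
        suv += u * v
        svv += v * v
        suy += u * y
        svy += v * y
    det = suu * svv - suv * suv
    if abs(det) < 1e-12:                       # too few distinct Q values
        return suy / suu, 0.0
    return (suy * svv - svy * suv) / det, (svy * suu - suy * suv) / det


# Hypothetical observations generated from X1 = 100, X2 = 200 with S = 10.
history = [(10.0, 10.0, 120.0), (5.0, 10.0, 280.0), (20.0, 10.0, 55.0)]
x1, x2 = fit_model(history)
print(solve_q(target_bits=120.0, s=10.0, x1=x1, x2=x2))
```

Because the example observations were generated from the model itself, the fit recovers the parameters and a target of 120 bits returns a step size of about 10.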
The SRC algorithm aims to achieve a target bit rate across a segment of frames (rather than a sequence of arbitrary length) and does not modulate the quantiser step size within a coded frame, giving a uniform visual appearance within each frame but making it difficult to maintain a small buffer size and hence a low delay. An extension to the SRC supports modulation of the quantiser step size at the macroblock level and is suitable for low-delay applications that require ‘tight’ rate control. The macroblock-level algorithm is based on a model for the number of bits Bi required to encode macroblock i, equation (7.11):
$$B_i = A\left(K\,\frac{\sigma_i^2}{Q_i^2} + C\right) \qquad (7.11)$$
where A is the number of pixels in a macroblock, σi is the standard deviation of luminance and chrominance in the residual macroblock (i.e. a measure of variation within the macroblock), Qi is the quantisation step size and K, C are constant model parameters. The following steps are carried out for each macroblock i:
1. Measure σi.
2. Calculate Qi based on B, K, C, σi and a macroblock weight αi.
3. Encode the macroblock.
4. Update the model parameters K and C based on the actual number of coded bits produced for the macroblock.
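Equation (7.11) can be inverted to give a step size for a macroblock with a given bit budget: Qi = σi √(AK / (Bi − AC)). The sketch below implements only this inversion. The weight αi is omitted because the text does not give the formula in which it is used, and the values chosen for A, K and C are hypothetical (A = 256 counts only the 16 × 16 luma samples, which is an assumption).

```python
import math


def mb_bits(q, sigma, a, k, c):
    """Model of equation (7.11): bits needed to code one macroblock."""
    return a * (k * sigma * sigma / (q * q) + c)


def mb_quantiser(target_bits, sigma, a, k, c, q_min=1.0, q_max=31.0):
    """Invert equation (7.11) for Q given a macroblock bit budget, then clamp."""
    texture_bits = target_bits - a * c         # bits left after the overhead term A*C
    if texture_bits <= 0:
        return q_max                           # budget too small: coarsest quantiser
    q = sigma * math.sqrt(a * k / texture_bits)
    return max(q_min, min(q_max, q))


# Hypothetical parameters: A = 256 samples, K = 0.5, C = 0.
q = mb_quantiser(target_bits=128.0, sigma=8.0, a=256, k=0.5, c=0.0)
print(q, mb_bits(q, sigma=8.0, a=256, k=0.5, c=0.0))
```

A macroblock with a larger σi (more residual detail) is given a larger step size for the same bit budget, which is the behaviour the model is intended to capture.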
The weight αi controls the ‘importance’ of macroblock i to the subjective appearance of the image and a low value of αi means that the current macroblock is likely to be highly quantised. These weights may be selected to minimise changes in Qi at lower bit rates since each change involves sending a modified quantisation parameter DQUANT which means encoding an extra five bits per macroblock. It is important to minimise the number of changes to Qi during encoding of a frame at low bit rates because the extra five bits in a macroblock may become significant; at higher bit rates, this DQUANT overhead is less important and Q may change more frequently without significant penalty. This rate control method is effective
