- •Contents
- •Figures
- •Tables
- •Preface
- •Acknowledgments
- •1. Raster images
- •Aspect ratio
- •Geometry
- •Image capture
- •Digitization
- •Perceptual uniformity
- •Colour
- •Luma and colour difference components
- •Digital image representation
- •Square sampling
- •Comparison of aspect ratios
- •Aspect ratio
- •Frame rates
- •Image state
- •EOCF standards
- •Entertainment programming
- •Acquisition
- •Consumer origination
- •Consumer electronics (CE) display
- •Contrast
- •Contrast ratio
- •Perceptual uniformity
- •The “code 100” problem and nonlinear image coding
- •Linear and nonlinear
- •4. Quantization
- •Linearity
- •Decibels
- •Noise, signal, sensitivity
- •Quantization error
- •Full-swing
- •Studio-swing (footroom and headroom)
- •Interface offset
- •Processing coding
- •Two’s complement wrap-around
- •Perceptual attributes
- •History of display signal processing
- •Digital driving levels
- •Relationship between signal and lightness
- •Algorithm
- •Black level setting
- •Effect of contrast and brightness on contrast and brightness
- •An alternate interpretation
- •Brightness and contrast controls in LCDs
- •Brightness and contrast controls in PDPs
- •Brightness and contrast controls in desktop graphics
- •Symbolic image description
- •Raster images
- •Conversion among types
- •Image files
- •“Resolution” in computer graphics
- •7. Image structure
- •Image reconstruction
- •Sampling aperture
- •Spot profile
- •Box distribution
- •Gaussian distribution
- •8. Raster scanning
- •Flicker, refresh rate, and frame rate
- •Introduction to scanning
- •Scanning parameters
- •Interlaced format
- •Interlace and progressive
- •Scanning notation
- •Motion portrayal
- •Segmented-frame (24PsF)
- •Video system taxonomy
- •Conversion among systems
- •9. Resolution
- •Magnitude frequency response and bandwidth
- •Visual acuity
- •Viewing distance and angle
- •Kell effect
- •Resolution
- •Resolution in video
- •Viewing distance
- •Interlace revisited
- •10. Constant luminance
- •The principle of constant luminance
- •Compensating for the CRT
- •Departure from constant luminance
- •Luma
- •“Leakage” of luminance into chroma
- •11. Picture rendering
- •Surround effect
- •Tone scale alteration
- •Incorporation of rendering
- •Rendering in desktop computing
- •Luma
- •Sloppy use of the term luminance
- •Colour difference coding (chroma)
- •Chroma subsampling
- •Chroma subsampling notation
- •Chroma subsampling filters
- •Chroma in composite NTSC and PAL
- •Scanning standards
- •Widescreen (16:9) SD
- •Square and nonsquare sampling
- •Resampling
- •NTSC and PAL encoding
- •NTSC and PAL decoding
- •S-video interface
- •Frequency interleaving
- •Composite analog SD
- •15. Introduction to HD
- •HD scanning
- •Colour coding for BT.709 HD
- •Data compression
- •Image compression
- •Lossy compression
- •JPEG
- •Motion-JPEG
- •JPEG 2000
- •Mezzanine compression
- •MPEG
- •Picture coding types (I, P, B)
- •Reordering
- •MPEG-1
- •MPEG-2
- •Other MPEGs
- •MPEG IMX
- •MPEG-4
- •AVC-Intra
- •WM9, WM10, VC-1 codecs
- •Compression for CE acquisition
- •AVCHD
- •Compression for IP transport to consumers
- •VP8 (“WebM”) codec
- •Dirac (basic)
- •17. Streams and files
- •Historical overview
- •Physical layer
- •Stream interfaces
- •IEEE 1394 (FireWire, i.LINK)
- •HTTP live streaming (HLS)
- •18. Metadata
- •Metadata Example 1: CD-DA
- •Metadata Example 2: .yuv files
- •Metadata Example 3: RFF
- •Metadata Example 4: JPEG/JFIF
- •Metadata Example 5: Sequence display extension
- •Conclusions
- •19. Stereoscopic (“3-D”) video
- •Acquisition
- •S3D display
- •Anaglyph
- •Temporal multiplexing
- •Polarization
- •Wavelength multiplexing (Infitec/Dolby)
- •Autostereoscopic displays
- •Parallax barrier display
- •Lenticular display
- •Recording and compression
- •Consumer interface and display
- •Ghosting
- •Vergence and accommodation
- •20. Filtering and sampling
- •Sampling theorem
- •Sampling at exactly 0.5fS
- •Magnitude frequency response
- •Magnitude frequency response of a boxcar
- •The sinc weighting function
- •Frequency response of point sampling
- •Fourier transform pairs
- •Analog filters
- •Digital filters
- •Impulse response
- •Finite impulse response (FIR) filters
- •Physical realizability of a filter
- •Phase response (group delay)
- •Infinite impulse response (IIR) filters
- •Lowpass filter
- •Digital filter design
- •Reconstruction
- •Reconstruction close to 0.5fS
- •“(sin x)/x” correction
- •Further reading
- •2:1 downsampling
- •Oversampling
- •Interpolation
- •Lagrange interpolation
- •Lagrange interpolation as filtering
- •Polyphase interpolators
- •Polyphase taps and phases
- •Implementing polyphase interpolators
- •Decimation
- •Lowpass filtering in decimation
- •Spatial frequency domain
- •Comb filtering
- •Spatial filtering
- •Image presampling filters
- •Image reconstruction filters
- •Spatial (2-D) oversampling
- •Retina
- •Adaptation
- •Contrast sensitivity
- •Contrast sensitivity function (CSF)
- •24. Luminance and lightness
- •Radiance, intensity
- •Luminance
- •Relative luminance
- •Luminance from red, green, and blue
- •Lightness (CIE L*)
- •Fundamentals of vision
- •Definitions
- •Spectral power distribution (SPD) and tristimulus
- •Spectral constraints
- •CIE XYZ tristimulus
- •CIE [x, y] chromaticity
- •Blackbody radiation
- •Colour temperature
- •White
- •Chromatic adaptation
- •Perceptually uniform colour spaces
- •CIE L*a*b* (CIELAB)
- •CIE L*u*v* and CIE L*a*b* summary
- •Colour specification and colour image coding
- •Further reading
- •Additive reproduction (RGB)
- •Characterization of RGB primaries
- •BT.709 primaries
- •Legacy SD primaries
- •sRGB system
- •SMPTE Free Scale (FS) primaries
- •AMPAS ACES primaries
- •SMPTE/DCI P3 primaries
- •CMFs and SPDs
- •Normalization and scaling
- •Luminance coefficients
- •Transformations between RGB and CIE XYZ
- •Noise due to matrixing
- •Transforms among RGB systems
- •Camera white reference
- •Display white reference
- •Gamut
- •Wide-gamut reproduction
- •Free Scale Gamut, Free Scale Log (FS-Gamut, FS-Log)
- •Further reading
- •27. Gamma
- •Gamma in CRT physics
- •The amazing coincidence!
- •Gamma in video
- •Opto-electronic conversion functions (OECFs)
- •BT.709 OECF
- •SMPTE 240M OECF
- •sRGB transfer function
- •Transfer functions in SD
- •Bit depth requirements
- •Gamma in modern display devices
- •Estimating gamma
- •Gamma in video, CGI, and Macintosh
- •Gamma in computer graphics
- •Gamma in pseudocolour
- •Limitations of 8-bit linear coding
- •Linear and nonlinear coding in CGI
- •Colour acuity
- •RGB and R’G’B’ colour cubes
- •Conventional luma/colour difference coding
- •Luminance and luma notation
- •Nonlinear red, green, blue (R’G’B’)
- •BT.601 luma
- •BT.709 luma
- •Chroma subsampling, revisited
- •Luma/colour difference summary
- •SD and HD luma chaos
- •Luma/colour difference component sets
- •B’-Y’, R’-Y’ components for SD
- •PBPR components for SD
- •CBCR components for SD
- •Y’CBCR from studio RGB
- •Y’CBCR from computer RGB
- •“Full-swing” Y’CBCR
- •Y’UV, Y’IQ confusion
- •B’-Y’, R’-Y’ components for BT.709 HD
- •PBPR components for BT.709 HD
- •CBCR components for BT.709 HD
- •CBCR components for xvYCC
- •Y’CBCR from studio RGB
- •Y’CBCR from computer RGB
- •Conversions between HD and SD
- •Colour coding standards
- •31. Video signal processing
- •Edge treatment
- •Transition samples
- •Picture lines
- •Choice of SAL and SPW parameters
- •Video levels
- •Setup (pedestal)
- •BT.601 to computing
- •Enhancement
- •Median filtering
- •Coring
- •Chroma transition improvement (CTI)
- •Mixing and keying
- •Field rate
- •Line rate
- •Sound subcarrier
- •Addition of composite colour
- •NTSC colour subcarrier
- •576i PAL colour subcarrier
- •4fSC sampling
- •Common sampling rate
- •Numerology of HD scanning
- •Audio rates
- •33. Timecode
- •Introduction
- •Dropframe timecode
- •Editing
- •Linear timecode (LTC)
- •Vertical interval timecode (VITC)
- •Timecode structure
- •Further reading
- •34. 2-3 pulldown
- •2-3-3-2 pulldown
- •Conversion of film to different frame rates
- •Native 24 Hz coding
- •Conversion to other rates
- •Spatial domain
- •Vertical-temporal domain
- •Motion adaptivity
- •Further reading
- •36. Colourbars
- •SD colourbars
- •SD colourbar notation
- •Pluge element
- •Composite decoder adjustment using colourbars
- •-I, +Q, and Pluge elements in SD colourbars
- •HD colourbars
- •References
- •38. SDI and HD-SDI interfaces
- •Component digital SD interface (BT.601)
- •Serial digital interface (SDI)
- •Component digital HD-SDI
- •SDI and HD-SDI sync, TRS, and ancillary data
- •Analog sync and digital/analog timing relationships
- •Ancillary data
- •SDI coding
- •HD-SDI coding
- •Interfaces for compressed video
- •SDTI
- •Switching and mixing
- •Timing in digital facilities
- •Summary of digital interfaces
- •39. 480i component video
- •Frame rate
- •Interlace
- •Line sync
- •Field/frame sync
- •R’G’B’ EOCF and primaries
- •Luma (Y’)
- •Picture center, aspect ratio, and blanking
- •Halfline blanking
- •Component digital 4:2:2 interface
- •Component analog R’G’B’ interface
- •Component analog Y’PBPR interface, EBU N10
- •Component analog Y’PBPR interface, industry standard
- •40. 576i component video
- •Frame rate
- •Interlace
- •Line sync
- •Analog field/frame sync
- •R’G’B’ EOCF and primaries
- •Luma (Y’)
- •Picture center, aspect ratio, and blanking
- •Component digital 4:2:2 interface
- •Component analog 576i interface
- •Scanning
- •Analog sync
- •Picture center, aspect ratio, and blanking
- •R’G’B’ EOCF and primaries
- •Luma (Y’)
- •Component digital 4:2:2 interface
- •Scanning
- •Analog sync
- •Picture center, aspect ratio, and blanking
- •R’G’B’ EOCF and primaries
- •Luma (Y’)
- •Component digital 4:2:2 interface
- •43. HD videotape
- •HDCAM (D-11)
- •DVCPRO HD (D-12)
- •HDCAM SR (D-16)
- •JPEG blocks and MCUs
- •JPEG block diagram
- •Level shifting
- •Discrete cosine transform (DCT)
- •JPEG encoding example
- •JPEG decoding
- •Compression ratio control
- •JPEG/JFIF
- •Motion-JPEG (M-JPEG)
- •Further reading
- •46. DV compression
- •DV chroma subsampling
- •DV frame/field modes
- •Picture-in-shuttle in DV
- •DV overflow scheme
- •DV quantization
- •DV digital interface (DIF)
- •Consumer DV recording
- •Professional DV variants
- •47. MPEG-2 video compression
- •MPEG-2 profiles and levels
- •Picture structure
- •Frame rate and 2-3 pulldown in MPEG
- •Luma and chroma sampling structures
- •Macroblocks
- •Picture coding types – I, P, B
- •Prediction
- •Motion vectors (MVs)
- •Coding of a block
- •Frame and field DCT types
- •Zigzag and VLE
- •Refresh
- •Motion estimation
- •Rate control and buffer management
- •Bitstream syntax
- •Transport
- •Further reading
- •48. H.264 video compression
- •Algorithmic features, profiles, and levels
- •Baseline and extended profiles
- •High profiles
- •Hierarchy
- •Multiple reference pictures
- •Slices
- •Spatial intra prediction
- •Flexible motion compensation
- •Quarter-pel motion-compensated interpolation
- •Weighting and offsetting of MC prediction
- •16-bit integer transform
- •Quantizer
- •Variable-length coding
- •Context adaptivity
- •CABAC
- •Deblocking filter
- •Buffer control
- •Scalable video coding (SVC)
- •Multiview video coding (MVC)
- •AVC-Intra
- •Further reading
- •49. VP8 compression
- •Algorithmic features
- •Further reading
- •Elementary stream (ES)
- •Packetized elementary stream (PES)
- •MPEG-2 program stream
- •MPEG-2 transport stream
- •System clock
- •Further reading
- •Japan
- •United States
- •ATSC modulation
- •Europe
- •Further reading
- •Appendices
- •Cement vs. concrete
- •True CIE luminance
- •The misinterpretation of luminance
- •The enshrining of luma
- •Colour difference scale factors
- •Conclusion: A plea
- •Radiometry
- •Photometry
- •Light level examples
- •Image science
- •Units
- •Further reading
- •Glossary
- •Index
- •About the author
Uncompressed recording is taken to be the upper floor; formats for distribution are at ground level. The mezzanine lies between.
Dirac PRO (VC-2)
DNxHD (VC-3)
Apple ProRes
RedCode; CineForm
Mezzanine compression
DV and its derivatives and relatives are common in the studio. In recent years, several software-based codecs suitable for acquisition and postproduction have emerged: Dirac/VC-2, DNxHD/VC-3, and Apple ProRes. The term mezzanine is used for such codecs, signifying compression rates intermediate between uncompressed recording and consumer distribution.
An open-source development project led by the BBC developed video compression technology collectively named Dirac (in honour of the Nobel prize-winning physicist) for lossy mezzanine-level intra-frame wavelet compression. Dirac PRO handles a wide range of video formats, but is optimized to compress 10-bit, 4:2:2 1080p video to bit rates between about 50 Mb/s and 165 Mb/s. SMPTE has standardized Dirac PRO in the ST 2042 series of standards (also known as VC-2).
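To put these rates in perspective, the compression ratio can be estimated against the uncompressed rate of the source. A rough sketch follows; it counts active-picture samples only and assumes a 30 Hz frame rate (1080p comes in several rates, and interface rates are higher owing to blanking):

```python
# Uncompressed data rate of 10-bit 4:2:2 1080p video (active picture only).
width, height = 1920, 1080
bits_per_sample = 10
samples_per_pixel = 2          # 4:2:2 averages one luma + one chroma sample per pixel
frames_per_second = 30         # assumption; 1080p is also used at 24, 25, 50, 60 Hz

uncompressed_bps = (width * height * samples_per_pixel
                    * bits_per_sample * frames_per_second)
print(uncompressed_bps / 1e6)  # about 1244 Mb/s

# Approximate compression ratios at the Dirac PRO rate extremes:
for rate_mbps in (50, 165):
    print(round(uncompressed_bps / (rate_mbps * 1e6), 1))
```

At 165 Mb/s the compression is mild (roughly 8:1); at 50 Mb/s it approaches 25:1.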
The Dirac PRO bitstream includes parameter values that identify the complexity of the coded bitstream through profiles and levels. These values enable a decoder to establish easily whether it has the capability to decode the bitstream.
Avid implemented a family of compression systems that are now widely deployed in postproduction. One of these, denoted DNxHD by Avid, has been standardized by SMPTE as ST 2019-1, colloquially called VC-3. Bit rate for HD ranges from about 60 Mb/s to 220 Mb/s. Compressed data is typically carried in MXF files.
Apple implemented a set of proprietary variable bit rate, intra-frame DCT-based codecs called ProRes, now used fairly widely in acquisition and in postproduction. ProRes resembles motion-JPEG; however, the scheme is proprietary and details aren’t published. The scheme was designed for implementation in software; however, hardware-based codecs are commercially available.
Several intraframe wavelet-based codecs have been developed and deployed for digital cinema acquisition, and can be used for HD. Red’s RedCode is a proprietary form of JPEG 2000; GoPro’s CineForm also uses wavelet compression. Both of these schemes individually compress Bayer mosaic data prior to demosaicking.
CHAPTER 16 | INTRODUCTION TO VIDEO COMPRESSION | 151
[Figure 16.1 artwork: a reference picture followed by difference pictures ∆1 through ∆8]
Figure 16.1 Interpicture coding exploits the similarity between successive pictures in video. First, a reference picture is transmitted (typically using intrapicture compression). Then, pixel differences to successive pictures are computed by the encoder and transmitted. The decoder reconstructs successive pictures by accumulating the differences. The scheme is effective provided that the difference information can be coded more compactly than the raw picture information.
The M in MPEG stands for moving, not motion! If you read a document that errs in this detail, the accuracy of the document is suspect!
MPEG
Apart from scene changes, there is a statistical likelihood that successive pictures in a video sequence are very similar. In fact, it is necessary that successive pictures are mostly similar: If this were not the case, human vision could make no sense of the sequence!
The efficiency of transform coding can be increased by a factor of 10 or more by exploiting the inherent temporal redundancy of video. The MPEG standard was developed by the Moving Picture Experts Group within ISO and IEC. In MPEG, intra-frame compression is used to provide an initial, self-contained picture – a reference picture – upon which predictions of succeeding pictures can be based. The encoder then transmits pixel differences – that is, prediction errors, or residuals – from the reference, as sketched in Figure 16.1. The method is termed interframe coding. (In interlaced video, differences between fields may be used, so the method is more accurately described as interpicture coding.)
Once a reference picture has been received by the decoder, it provides a basis for predicting succeeding pictures. This estimate is improved when the decoder receives the residuals. The scheme is effective provided that the residuals can be coded more compactly than the raw picture information.
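The scheme of Figure 16.1 can be sketched in a few lines of code. This is an idealized, lossless sketch – a real encoder quantizes the residuals – and the tiny four-pixel "pictures" are hypothetical:

```python
# Sketch of interpicture difference coding (no motion compensation):
# the encoder sends a reference picture, then per-pixel residuals;
# the decoder reconstructs by accumulating the residuals.

def encode(pictures):
    reference = pictures[0]
    residuals = []
    for pic in pictures[1:]:
        residuals.append([p - r for p, r in zip(pic, reference)])
        reference = pic            # predict each picture from its predecessor
    return pictures[0], residuals

def decode(reference, residuals):
    out = [reference]
    for res in residuals:
        reference = [r + d for r, d in zip(reference, res)]
        out.append(reference)
    return out

# Successive pictures are mostly similar, so the residuals are small
# and therefore cheap to code.
seq = [[10, 20, 30, 40], [10, 21, 30, 40], [11, 21, 30, 41]]
ref, res = encode(seq)
assert decode(ref, res) == seq     # lossless in this idealized sketch
```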
Motion in a video sequence causes displacement of scene elements with respect to the image array. A fast-moving image element may easily move 20 pixels in one frame time. In the presence of motion, a pixel at a certain location may take quite different values in successive pictures. Motion is liable to cause prediction-error information to grow in size to the point where the advantage of interframe coding would be negated.
152 | DIGITAL VIDEO AND HD ALGORITHMS AND INTERFACES
Figure 16.2 An MPEG group of pictures (GoP). The GoP depicted here has nine pictures, numbered 0 through 8. Picture 9 is the first picture of the next GoP. I-picture 0 is decoded from the coded data depicted as a green block. Here, the intra-count (n) is 9.
One form of motion-compensated prediction forms an integer-valued MV, and copies a block of pixels from a reference picture for use as a prediction block. A more sophisticated form computes motion vectors to half-pixel precision, and interpolates between reference pixels (for example, by averaging). The latter is motion-compensated interpolation. Although the simple technique involves just copying, it is also called interpolation.
When encoding interlaced source material, an MPEG-2 encoder can choose to code each field as a picture or each frame as a picture, as I will describe on page 518. In this chapter, and in Chapter 47, the term picture can refer to either a field or a frame.
However, image elements tend to retain their spatial structure even when moving. MPEG overcomes the problem of motion between pictures by using motion-compensated prediction (MCP). The encoder is equipped with motion estimation (ME) circuitry that computes motion vectors. The encoder then displaces the pixel values of the reference picture by the estimated motion – a process called motion-compensated interpolation – then computes residuals from the motion-compensated reference. The encoder compresses the residuals using a JPEG-like technique, then transmits the motion vectors and the compressed residuals.
Based upon the received motion vectors, the decoder mimics the motion-compensated interpolation of the encoder to obtain a predictor much more effective than the undisplaced reference picture. The received residuals are then applied (by simple addition) to reconstruct an approximation of the encoder’s picture.
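The core of motion-compensated prediction – copying a displaced block from the reference, then computing residuals – can be sketched as follows. This toy example uses an integer-valued motion vector; half-pixel prediction would additionally average neighbouring reference pixels. The picture values are hypothetical:

```python
# Sketch of motion-compensated prediction for one block (integer MV).
# The prediction block is copied from the reference picture, displaced
# by the motion vector; the residual is what the encoder transmits.

def predict_block(reference, x, y, mv_x, mv_y, size):
    """Copy a size-by-size block at (x, y), displaced by (mv_x, mv_y)."""
    return [[reference[y + mv_y + j][x + mv_x + i] for i in range(size)]
            for j in range(size)]

def residual_block(current, predicted):
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, predicted)]

# Reference picture: an 8x8 ramp. The current block equals the reference
# block displaced by (1, 0) -- the scene element moved one pixel.
ref = [[10 * r + c for c in range(8)] for r in range(8)]
current = [[ref[j][i + 1] for i in range(4)] for j in range(4)]
pred = predict_block(ref, 0, 0, 1, 0, 4)
res = residual_block(current, pred)
assert all(v == 0 for row in res for v in row)  # perfect prediction: zero residual
```

When the motion estimate is exact, the residual vanishes; in practice the residual is small but nonzero, and is compressed with a JPEG-like technique.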
The MPEG suite of standards specifies the properties of a compliant bitstream and the algorithms that are implemented by a decoder. Any encoder that produces compliant (“legal”) bitstreams is considered MPEG-compliant, even if it produces poor-quality images.
Picture coding types (I, P, B)
In MPEG, a video sequence is partitioned into successive groups of pictures (GoPs). The first picture in each GoP is coded using a JPEG-like algorithm, independently of other pictures. This is an intra or I-picture. Once reconstructed, an I-picture becomes a reference picture available for use in predicting neighboring (nonintra) pictures. The example GoP sketched in Figure 16.2 above comprises nine pictures.
Figure 16.3 An MPEG P-picture contains elements forward-predicted from a preceding reference picture, which may be an I-picture or a P-picture. Here, a P-picture (3) is predicted from an I-picture (0). Once decoded, that P-picture becomes the predictor for a following P-picture (6).
The term bidirectional prediction was historically used, suggesting reverse and forward directions in time. It is often the case that pairs of motion vectors associated with B-pictures point in opposite directions in the image plane. In H.264, the term bipredictive is used, and is more accurate in H.264’s context.
A P-picture contains elements that are predicted from the most recent anchor frame. Once a P-picture is reconstructed, it is displayed; in addition, it becomes a new anchor. I-pictures and P-pictures form a two-layer hierarchy. An I-picture and two dependent P-pictures are depicted in Figure 16.3 above.
MPEG provides an optional third hierarchical level whereby B-pictures may be interposed between anchor pictures. Elements of a B-picture are typically bipredicted by averaging motion-compensated elements from the past reference picture and motion-compensated elements from the future reference picture. (At the encoder’s discretion, elements of a B-picture may be unidirectionally forward-predicted from the preceding reference, or unidirectionally backward-predicted from the following reference.) Each B-picture is reconstructed, displayed, then discarded: No decoded B-picture forms the basis for any prediction. Using B-pictures delivers a substantial gain in compression efficiency compared to encoding with just I- and P-pictures, but incurs a coding latency.
Figure 16.4 below depicts two B-pictures.
Figure 16.4 An MPEG B-picture is generally predicted from the average of the preceding reference picture and the following (“future”) reference picture. (At the encoder’s option, a B-picture may be unidirectionally forward-predicted from the preceding reference, or unidirectionally backward-predicted from the following reference.)
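Formation of a bipredicted element can be sketched as the rounded integer average of the two motion-compensated predictions, plus the decoded residual. The sample values here are hypothetical, and clipping to the legal sample range is omitted:

```python
# Sketch of bipredictive (B-picture) element formation: average the
# motion-compensated prediction from the past reference with that from
# the future reference, then add the decoded residual.

def bipredict(past_pred, future_pred, residual):
    # (p + f + 1) // 2 is an integer average with rounding; real MPEG
    # arithmetic additionally clips the result to the valid sample range.
    return [(p + f + 1) // 2 + r
            for p, f, r in zip(past_pred, future_pred, residual)]

past = [100, 102, 104, 106]     # MC prediction from the preceding reference
future = [104, 106, 108, 110]   # MC prediction from the following reference
residual = [0, -1, 0, 1]        # decoded prediction errors
print(bipredict(past, future, residual))   # [102, 103, 106, 109]
```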
Figure 16.5 The three-level MPEG picture hierarchy. This sketch shows a regular GoP structure with an I-picture interval of n=9, and a reference picture interval of m=3. This example represents a simple encoder that emits a fixed schedule of I-, B-, and P-pictures; this structure can be described as IBBPBBPBB. The example shows an open GoP: B-pictures following the GoP’s last P-picture are permitted to use backward prediction from the I-picture of the following GoP. Such prediction precludes editing of the bitstream between GoPs. A closed GoP permits no such prediction, so the bitstream can be edited between GoPs. Closed GoPs lose some efficiency.
A 15-frame “long” GoP structured IBBPBBPBBPBBPBB is common for broadcasting. A sophisticated encoder may produce irregular GoP patterns.
The three-level MPEG picture hierarchy is summarized in Figure 16.5 above; this example has the structure IBBPBBPBB.
A simple encoder typically produces a bitstream having a fixed schedule of I-, P-, and B-pictures. A typical GoP structure is denoted IBBPBBPBBPBBPBB. At 30 pictures per second, there are two such GoPs per second. Periodic GoP structure can be described by a pair of integers n and m: n is the number of pictures from one I-picture (inclusive) to the next (exclusive), and m is the number of pictures from one anchor picture (inclusive) to the next (exclusive). If m=1, there are no B-pictures. Figure 16.5 shows a regular GoP structure with an I-picture interval of n=9 and an anchor-picture interval of m=3. The m=3 component indicates two B-pictures between anchor pictures. Rarely do more than two B-pictures intervene between reference pictures.
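A regular GoP schedule follows directly from n and m. A sketch:

```python
# Derive the picture-type schedule of a regular GoP from the parameters
# n (I-picture interval) and m (anchor-picture interval).

def gop_structure(n, m):
    types = []
    for k in range(n):
        if k == 0:
            types.append('I')      # the GoP opens with an I-picture
        elif k % m == 0:
            types.append('P')      # every m-th picture is an anchor
        else:
            types.append('B')      # the rest are B-pictures
    return ''.join(types)

print(gop_structure(9, 3))    # IBBPBBPBB, as in Figure 16.5
print(gop_structure(15, 3))   # IBBPBBPBBPBBPBB, the common broadcast GoP
```

With m=1 the schedule degenerates to I- and P-pictures only, confirming that m=1 means no B-pictures.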
Coded B-pictures in a GoP depend upon P- and I-pictures; coded P-pictures depend upon earlier P-pictures and I-pictures. Owing to these interdependencies, an MPEG sequence cannot be edited, except at GoP boundaries, unless the sequence is decoded, edited, and subsequently reencoded. MPEG is very suit-
