- •Contents
- •Figures
- •Tables
- •Preface
- •Acknowledgments
- •1. Raster images
- •Aspect ratio
- •Geometry
- •Image capture
- •Digitization
- •Perceptual uniformity
- •Colour
- •Luma and colour difference components
- •Digital image representation
- •Square sampling
- •Comparison of aspect ratios
- •Aspect ratio
- •Frame rates
- •Image state
- •EOCF standards
- •Entertainment programming
- •Acquisition
- •Consumer origination
- •Consumer electronics (CE) display
- •Contrast
- •Contrast ratio
- •Perceptual uniformity
- •The “code 100” problem and nonlinear image coding
- •Linear and nonlinear
- •4. Quantization
- •Linearity
- •Decibels
- •Noise, signal, sensitivity
- •Quantization error
- •Full-swing
- •Studio-swing (footroom and headroom)
- •Interface offset
- •Processing coding
- •Two’s complement wrap-around
- •Perceptual attributes
- •History of display signal processing
- •Digital driving levels
- •Relationship between signal and lightness
- •Algorithm
- •Black level setting
- •Effect of contrast and brightness on contrast and brightness
- •An alternate interpretation
- •Brightness and contrast controls in LCDs
- •Brightness and contrast controls in PDPs
- •Brightness and contrast controls in desktop graphics
- •Symbolic image description
- •Raster images
- •Conversion among types
- •Image files
- •“Resolution” in computer graphics
- •7. Image structure
- •Image reconstruction
- •Sampling aperture
- •Spot profile
- •Box distribution
- •Gaussian distribution
- •8. Raster scanning
- •Flicker, refresh rate, and frame rate
- •Introduction to scanning
- •Scanning parameters
- •Interlaced format
- •Interlace and progressive
- •Scanning notation
- •Motion portrayal
- •Segmented-frame (24PsF)
- •Video system taxonomy
- •Conversion among systems
- •9. Resolution
- •Magnitude frequency response and bandwidth
- •Visual acuity
- •Viewing distance and angle
- •Kell effect
- •Resolution
- •Resolution in video
- •Viewing distance
- •Interlace revisited
- •10. Constant luminance
- •The principle of constant luminance
- •Compensating for the CRT
- •Departure from constant luminance
- •Luma
- •“Leakage” of luminance into chroma
- •11. Picture rendering
- •Surround effect
- •Tone scale alteration
- •Incorporation of rendering
- •Rendering in desktop computing
- •Luma
- •Sloppy use of the term luminance
- •Colour difference coding (chroma)
- •Chroma subsampling
- •Chroma subsampling notation
- •Chroma subsampling filters
- •Chroma in composite NTSC and PAL
- •Scanning standards
- •Widescreen (16:9) SD
- •Square and nonsquare sampling
- •Resampling
- •NTSC and PAL encoding
- •NTSC and PAL decoding
- •S-video interface
- •Frequency interleaving
- •Composite analog SD
- •15. Introduction to HD
- •HD scanning
- •Colour coding for BT.709 HD
- •Data compression
- •Image compression
- •Lossy compression
- •JPEG
- •Motion-JPEG
- •JPEG 2000
- •Mezzanine compression
- •MPEG
- •Picture coding types (I, P, B)
- •Reordering
- •MPEG-1
- •MPEG-2
- •Other MPEGs
- •MPEG IMX
- •MPEG-4
- •AVC-Intra
- •WM9, WM10, VC-1 codecs
- •Compression for CE acquisition
- •AVCHD
- •Compression for IP transport to consumers
- •VP8 (“WebM”) codec
- •Dirac (basic)
- •17. Streams and files
- •Historical overview
- •Physical layer
- •Stream interfaces
- •IEEE 1394 (FireWire, i.LINK)
- •HTTP live streaming (HLS)
- •18. Metadata
- •Metadata Example 1: CD-DA
- •Metadata Example 2: .yuv files
- •Metadata Example 3: RFF
- •Metadata Example 4: JPEG/JFIF
- •Metadata Example 5: Sequence display extension
- •Conclusions
- •19. Stereoscopic (“3-D”) video
- •Acquisition
- •S3D display
- •Anaglyph
- •Temporal multiplexing
- •Polarization
- •Wavelength multiplexing (Infitec/Dolby)
- •Autostereoscopic displays
- •Parallax barrier display
- •Lenticular display
- •Recording and compression
- •Consumer interface and display
- •Ghosting
- •Vergence and accommodation
- •20. Filtering and sampling
- •Sampling theorem
- •Sampling at exactly 0.5fS
- •Magnitude frequency response
- •Magnitude frequency response of a boxcar
- •The sinc weighting function
- •Frequency response of point sampling
- •Fourier transform pairs
- •Analog filters
- •Digital filters
- •Impulse response
- •Finite impulse response (FIR) filters
- •Physical realizability of a filter
- •Phase response (group delay)
- •Infinite impulse response (IIR) filters
- •Lowpass filter
- •Digital filter design
- •Reconstruction
- •Reconstruction close to 0.5fS
- •“(sin x)/x” correction
- •Further reading
- •2:1 downsampling
- •Oversampling
- •Interpolation
- •Lagrange interpolation
- •Lagrange interpolation as filtering
- •Polyphase interpolators
- •Polyphase taps and phases
- •Implementing polyphase interpolators
- •Decimation
- •Lowpass filtering in decimation
- •Spatial frequency domain
- •Comb filtering
- •Spatial filtering
- •Image presampling filters
- •Image reconstruction filters
- •Spatial (2-D) oversampling
- •Retina
- •Adaptation
- •Contrast sensitivity
- •Contrast sensitivity function (CSF)
- •24. Luminance and lightness
- •Radiance, intensity
- •Luminance
- •Relative luminance
- •Luminance from red, green, and blue
- •Lightness (CIE L*)
- •Fundamentals of vision
- •Definitions
- •Spectral power distribution (SPD) and tristimulus
- •Spectral constraints
- •CIE XYZ tristimulus
- •CIE [x, y] chromaticity
- •Blackbody radiation
- •Colour temperature
- •White
- •Chromatic adaptation
- •Perceptually uniform colour spaces
- •CIE L*a*b* (CIELAB)
- •CIE L*u*v* and CIE L*a*b* summary
- •Colour specification and colour image coding
- •Further reading
- •Additive reproduction (RGB)
- •Characterization of RGB primaries
- •BT.709 primaries
- •Legacy SD primaries
- •sRGB system
- •SMPTE Free Scale (FS) primaries
- •AMPAS ACES primaries
- •SMPTE/DCI P3 primaries
- •CMFs and SPDs
- •Normalization and scaling
- •Luminance coefficients
- •Transformations between RGB and CIE XYZ
- •Noise due to matrixing
- •Transforms among RGB systems
- •Camera white reference
- •Display white reference
- •Gamut
- •Wide-gamut reproduction
- •Free Scale Gamut, Free Scale Log (FS-Gamut, FS-Log)
- •Further reading
- •27. Gamma
- •Gamma in CRT physics
- •The amazing coincidence!
- •Gamma in video
- •Opto-electronic conversion functions (OECFs)
- •BT.709 OECF
- •SMPTE 240M OECF
- •sRGB transfer function
- •Transfer functions in SD
- •Bit depth requirements
- •Gamma in modern display devices
- •Estimating gamma
- •Gamma in video, CGI, and Macintosh
- •Gamma in computer graphics
- •Gamma in pseudocolour
- •Limitations of 8-bit linear coding
- •Linear and nonlinear coding in CGI
- •Colour acuity
- •RGB and R’G’B’ colour cubes
- •Conventional luma/colour difference coding
- •Luminance and luma notation
- •Nonlinear red, green, blue (R’G’B’)
- •BT.601 luma
- •BT.709 luma
- •Chroma subsampling, revisited
- •Luma/colour difference summary
- •SD and HD luma chaos
- •Luma/colour difference component sets
- •B’-Y’, R’-Y’ components for SD
- •PBPR components for SD
- •CBCR components for SD
- •Y’CBCR from studio RGB
- •Y’CBCR from computer RGB
- •“Full-swing” Y’CBCR
- •Y’UV, Y’IQ confusion
- •B’-Y’, R’-Y’ components for BT.709 HD
- •PBPR components for BT.709 HD
- •CBCR components for BT.709 HD
- •CBCR components for xvYCC
- •Y’CBCR from studio RGB
- •Y’CBCR from computer RGB
- •Conversions between HD and SD
- •Colour coding standards
- •31. Video signal processing
- •Edge treatment
- •Transition samples
- •Picture lines
- •Choice of SAL and SPW parameters
- •Video levels
- •Setup (pedestal)
- •BT.601 to computing
- •Enhancement
- •Median filtering
- •Coring
- •Chroma transition improvement (CTI)
- •Mixing and keying
- •Field rate
- •Line rate
- •Sound subcarrier
- •Addition of composite colour
- •NTSC colour subcarrier
- •576i PAL colour subcarrier
- •4fSC sampling
- •Common sampling rate
- •Numerology of HD scanning
- •Audio rates
- •33. Timecode
- •Introduction
- •Dropframe timecode
- •Editing
- •Linear timecode (LTC)
- •Vertical interval timecode (VITC)
- •Timecode structure
- •Further reading
- •34. 2-3 pulldown
- •2-3-3-2 pulldown
- •Conversion of film to different frame rates
- •Native 24 Hz coding
- •Conversion to other rates
- •Spatial domain
- •Vertical-temporal domain
- •Motion adaptivity
- •Further reading
- •36. Colourbars
- •SD colourbars
- •SD colourbar notation
- •Pluge element
- •Composite decoder adjustment using colourbars
- •-I, +Q, and Pluge elements in SD colourbars
- •HD colourbars
- •References
- •38. SDI and HD-SDI interfaces
- •Component digital SD interface (BT.601)
- •Serial digital interface (SDI)
- •Component digital HD-SDI
- •SDI and HD-SDI sync, TRS, and ancillary data
- •Analog sync and digital/analog timing relationships
- •Ancillary data
- •SDI coding
- •HD-SDI coding
- •Interfaces for compressed video
- •SDTI
- •Switching and mixing
- •Timing in digital facilities
- •Summary of digital interfaces
- •39. 480i component video
- •Frame rate
- •Interlace
- •Line sync
- •Field/frame sync
- •R’G’B’ EOCF and primaries
- •Luma (Y’)
- •Picture center, aspect ratio, and blanking
- •Halfline blanking
- •Component digital 4:2:2 interface
- •Component analog R’G’B’ interface
- •Component analog Y’PBPR interface, EBU N10
- •Component analog Y’PBPR interface, industry standard
- •40. 576i component video
- •Frame rate
- •Interlace
- •Line sync
- •Analog field/frame sync
- •R’G’B’ EOCF and primaries
- •Luma (Y’)
- •Picture center, aspect ratio, and blanking
- •Component digital 4:2:2 interface
- •Component analog 576i interface
- •Scanning
- •Analog sync
- •Picture center, aspect ratio, and blanking
- •R’G’B’ EOCF and primaries
- •Luma (Y’)
- •Component digital 4:2:2 interface
- •Scanning
- •Analog sync
- •Picture center, aspect ratio, and blanking
- •R’G’B’ EOCF and primaries
- •Luma (Y’)
- •Component digital 4:2:2 interface
- •43. HD videotape
- •HDCAM (D-11)
- •DVCPRO HD (D-12)
- •HDCAM SR (D-16)
- •JPEG blocks and MCUs
- •JPEG block diagram
- •Level shifting
- •Discrete cosine transform (DCT)
- •JPEG encoding example
- •JPEG decoding
- •Compression ratio control
- •JPEG/JFIF
- •Motion-JPEG (M-JPEG)
- •Further reading
- •46. DV compression
- •DV chroma subsampling
- •DV frame/field modes
- •Picture-in-shuttle in DV
- •DV overflow scheme
- •DV quantization
- •DV digital interface (DIF)
- •Consumer DV recording
- •Professional DV variants
- •47. MPEG-2 video compression
- •MPEG-2 profiles and levels
- •Picture structure
- •Frame rate and 2-3 pulldown in MPEG
- •Luma and chroma sampling structures
- •Macroblocks
- •Picture coding types – I, P, B
- •Prediction
- •Motion vectors (MVs)
- •Coding of a block
- •Frame and field DCT types
- •Zigzag and VLE
- •Refresh
- •Motion estimation
- •Rate control and buffer management
- •Bitstream syntax
- •Transport
- •Further reading
- •48. H.264 video compression
- •Algorithmic features, profiles, and levels
- •Baseline and extended profiles
- •High profiles
- •Hierarchy
- •Multiple reference pictures
- •Slices
- •Spatial intra prediction
- •Flexible motion compensation
- •Quarter-pel motion-compensated interpolation
- •Weighting and offsetting of MC prediction
- •16-bit integer transform
- •Quantizer
- •Variable-length coding
- •Context adaptivity
- •CABAC
- •Deblocking filter
- •Buffer control
- •Scalable video coding (SVC)
- •Multiview video coding (MVC)
- •AVC-Intra
- •Further reading
- •49. VP8 compression
- •Algorithmic features
- •Further reading
- •Elementary stream (ES)
- •Packetized elementary stream (PES)
- •MPEG-2 program stream
- •MPEG-2 transport stream
- •System clock
- •Further reading
- •Japan
- •United States
- •ATSC modulation
- •Europe
- •Further reading
- •Appendices
- •Cement vs. concrete
- •True CIE luminance
- •The misinterpretation of luminance
- •The enshrining of luma
- •Colour difference scale factors
- •Conclusion: A plea
- •Radiometry
- •Photometry
- •Light level examples
- •Image science
- •Units
- •Further reading
- •Glossary
- •Index
- •About the author
Figure 16.6 Example GoP
I0 B1 B2 P3 B4 B5 P6 B7 B8
Figure 16.7 Example 9-picture GoP without B-pictures
I0 P1 P2 P3 P4 P5 P6 P7 P8
able for distribution, but owing to its inability to be edited without impairment at arbitrary points, MPEG is generally unsuitable for production. In the specialization of MPEG-2 called I-frame-only MPEG-2, every GoP is a single I-frame. This is conceptually equivalent to Motion-JPEG, but has the great benefit of being an international standard. (Another variant of MPEG-2, the simple profile, has no B-pictures.)
I have introduced MPEG as if all elements of every P-picture and all elements of every B-picture are coded similarly. But even a picture that is generally well predicted by the past reference picture may have a few regions that cannot effectively be predicted. In MPEG, the image is tiled into macroblocks of 16×16 luma samples, and the encoder is given the option to code any particular macroblock in intra mode – that is, independently of any prediction. A compact code signals that a macroblock should be skipped, in which case the motion-compensated prediction is used without modification. In a B-picture, the encoder can decide on a macroblock-by-macroblock basis to code using forward prediction, backward prediction, or biprediction. Formally, an I-picture contains only I-macroblocks; a P-picture has at least one P-macroblock, and a B-picture has at least one B-macroblock.
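The 16×16 tiling can be sketched numerically. The function below is illustrative only (the name `macroblock_grid` and the edge-padding assumption are mine, not from any MPEG specification): it counts the macroblocks covering a frame, rounding partial blocks at the right and bottom edges up to full macroblocks.

```python
import math

def macroblock_grid(width, height, mb_size=16):
    """Count the 16x16 luma macroblocks tiling a frame.

    Partial macroblocks at the right/bottom edges are assumed
    to be padded to full size, so each dimension rounds up.
    """
    cols = math.ceil(width / mb_size)
    rows = math.ceil(height / mb_size)
    return cols, rows

# A 720x480 SD frame tiles into a 45x30 grid: 1350 macroblocks.
cols, rows = macroblock_grid(720, 480)
```

Note that 1080-line HD is not a multiple of 16 vertically (1080/16 = 67.5), which is why coded HD bitstreams commonly carry 1088 lines and crop on display.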
Reordering
In a sequence without B-pictures, I- and P-pictures are encoded then stored or transmitted in the obvious order. However, when B-pictures are used, the decoder typically needs to access the past anchor picture and the future anchor picture to reconstruct a B-picture.
Consider an encoder about to compress the sequence in Figure 16.6 (where anchor pictures I0, P3, and P6 are written in boldface). The coded B1 and B2 pictures may be backward predicted from P3, so the encoder must buffer the uncompressed B1 and B2 pictures until P3 is coded: only when coding of P3 is complete can coding of B1 start. Using B-pictures incurs a penalty in encoding delay. (If the sequence were coded without B-pictures, as depicted in Figure 16.7, transmission of the coded information for P1 would not be subject to this two-picture delay.) Coding delay (latency) can make MPEG with B-pictures unsuitable for realtime two-way applications such as teleconferencing.

156 |
DIGITAL VIDEO AND HD ALGORITHMS AND INTERFACES |

Figure 16.8 GoP reordered for transmission
I0 P3 B1 B2 P6 B4 B5 (I9) B7 B8

ISO/IEC 11172-1, Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s – Part 1: Systems [MPEG-1].
If the coded 9-picture GoP of Figure 16.6 were transmitted in the order shown, then the decoder would have to hold the coded B1 and B2 data in a buffer while receiving and decoding P3; only when decoding of P3 was complete could decoding of B1 start. The encoder must buffer the B1 and B2 pictures no matter what; however, to minimize buffer memory at the decoder, MPEG-2 specifies that coded B-picture information is transmitted after the coded reference picture.
Figure 16.8 indicates picture data as reordered for transmission. I have placed I9 in parentheses because it belongs to the next GoP (the GoP header precedes it). Here, B7 and B8 follow the GoP header.
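The reordering rule of Figure 16.8 – transmit each coded anchor picture before the B-pictures that precede it in display order – can be sketched as follows. This is a simplified model using text labels; `transmission_order` is a hypothetical helper of my own, not an MPEG API, and it ignores GoP headers and timestamps.

```python
def transmission_order(display_order):
    """Reorder a display-order GoP for transmission.

    Each anchor (I- or P-) picture is emitted first, followed by
    the B-pictures that were waiting for it as a future reference.
    Labels are strings such as 'I0', 'P3', 'B1'.
    """
    out, pending_b = [], []
    for pic in display_order:
        if pic[0] in "IP":           # anchor picture: emit it,
            out.append(pic)          # then flush the B-pictures
            out.extend(pending_b)    # that depended on it
            pending_b = []
        else:                        # B-picture: hold until its
            pending_b.append(pic)    # future anchor is coded
    out.extend(pending_b)            # open-ended GoP: flush rest
    return out

gop = ["I0", "B1", "B2", "P3", "B4", "B5", "P6", "B7", "B8", "I9"]
# -> ["I0", "P3", "B1", "B2", "P6", "B4", "B5", "I9", "B7", "B8"]
```

Running this on the Figure 16.6 sequence (with I9 of the next GoP appended) reproduces the transmission order of Figure 16.8.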
MPEG-1
The original MPEG effort resulted in a standard now called MPEG-1, which was deployed in multimedia applications. MPEG-1 was optimized for the coding of progressive 352×240 images at 30 frames per second (240p30). MPEG-1 has no provision for interlace. When 480i29.97 or 576i25 video is coded with MPEG-1 at typical data rates, the first field of each frame is coded as if it were progressive, and the second field is dropped. At its intended data rate of about 1.5 Mb/s, MPEG-1 delivers VHS-quality images.
For broadcast, MPEG-1 has been superseded by MPEG-2. An MPEG-2 decoder must decode MPEG-1 constrained-parameter bitstream (CPB) sequences – to be discussed in the caption to Table 47.1, on page 515 – so I will not discuss MPEG-1 further.
MPEG-2
The MPEG-2 effort was initiated to extend MPEG-1 to interlaced scanning, to larger pictures, and to data rates much higher than 1.5 Mb/s. MPEG-2 is standardized in a series of documents from ISO/IEC; MPEG-2 is widely deployed for the distribution of digital television (DTV) including SD and HD (for example, ATSC), and is the (only) video compression scheme for DVD.
CHAPTER 16 |
INTRODUCTION TO VIDEO COMPRESSION |
157 |
Many MPEG terms – such as frame, picture, and macroblock – can refer to elements of the source video, to the corresponding elements in the coded bitstream, or to the corresponding elements in the reconstructed video. It is generally clear from context which is meant.
ISO/IEC 15938-1, Information technology – Multimedia content description interface – Part 1: Systems.
ISO/IEC 21000, Multimedia framework (MPEG-21).
MPEG IMX is a Sony trademark; IMX is not an MPEG designation.
MPEG-2 accommodates both progressive and interlaced material. A video frame can be coded directly as a frame-structured picture. Alternatively, a video frame (typically originated from an interlaced source) may be coded as a pair of field-structured pictures – a top-field picture and a bottom-field picture. The two fields are time-offset by half the frame time, and are intended for interlaced display. Field pictures always come in pairs having opposite parity (top/bottom or bottom/top). Both pictures in a field pair have the same picture coding type (I, P, or B), except that an I-field may be followed by a P-field (in which case the pair effectively serves as an I-frame).
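These pairing rules can be condensed into a small validity check. The sketch below is illustrative; the predicate name and the argument encoding (parity strings, single-letter coding types) are assumptions of mine, not MPEG-2 syntax.

```python
def valid_field_pair(parities, types):
    """Check the MPEG-2 field-picture pairing rules (sketch).

    The two field pictures of a frame must have opposite parity
    (top/bottom or bottom/top) and the same coding type, except
    that an I-field may be followed by a P-field.
    """
    opposite = parities in (("top", "bottom"), ("bottom", "top"))
    same_type = types[0] == types[1]
    i_then_p = types == ("I", "P")   # the one permitted mixed pair
    return opposite and (same_type or i_then_p)
```

For example, a top I-field followed by a bottom P-field is valid (the pair effectively serves as an I-frame), but a P-field followed by an I-field is not.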
The MPEG IMX variant of MPEG-2, for studio use, is described below. The HDV variant of MPEG-2, for consumer use, is described on page 161. MPEG-2 video compression is detailed starting on page 513.
Other MPEGs
While the MPEG-2 work was underway, an MPEG-3 effort was launched to address HD. The MPEG-3 committee concluded early on that MPEG-2, at high data rate, would accommodate HD; consequently, the MPEG-3 effort was abandoned. I’ll discuss MPEG-4 below. MPEG numbers above 4 are capricious.
MPEG-7, titled Multimedia Content Description Interface, standardizes a description of various types of multimedia information (metadata). In my view, MPEG-7 is not relevant to handling studio- or distribution-quality video signals.
According to ISO, MPEG-21 “defines an open framework for multimedia delivery and consumption, with both the content creator and content consumer as focal points. The vision for MPEG-21 is to define a multimedia framework to enable transparent and augmented use of multimedia resources across a wide range of networks and devices used by different communities.” In my view, MPEG-21 is not relevant to handling studio- or distribution-quality video.
MPEG IMX
Sony’s original Digital Betacam for 480i and 576i SD used proprietary motion-JPEG-like compression. The first products were videotape recorders denoted
ISO/IEC 14496, Information technology – Coding of audiovisual objects.
ISO/IEC 14496-12:2004, Information technology – Coding of audio-visual objects – Part 12: ISO base media file format.
Betacam SX, having a data rate of about 18 Mb/s. Follow-on products adopted I-frame-only MPEG-2 422P@ML, denoted MPEG IMX, having data rates of 30 Mb/s, 40 Mb/s, or 50 Mb/s (IMX30, IMX40, IMX50). MPEG IMX videotape recorders were commercialized. Today it is common to place or wrap IMX compressed video in an MXF file.
XDCAM is Sony’s designation for a line of products using a variety of compression systems and a variety of physical media. MPEG IMX compression is one of the compression systems available in the XDCAM line.
Recording on optical disc media is possible.
MPEG-4
The original goal of the MPEG-4 effort was video coding at very low bit rates. The video compression system that resulted is standardized as MPEG-4 Part 2, Visual; it differs from MPEG-2 and from H.264. ISO/IEC co-published the ITU-T H.264 standard as MPEG-4 Part 10, so the term MPEG-4 alone is ambiguous.
MPEG-4 Part 2 defines the advanced simple profile (ASP), implemented by DivX and Xvid. ASP is not useful for professional quality video. Even in ASP’s intended application domain, low bit-rate video, H.264 – to be described in a moment – has proven to have better performance. Consequently, ASP has fallen out of favour.
MPEG-4 Part 2 also defines a profile called Simple Studio Profile (SStP). This profile is used in Sony’s HDCAM SR, at very high bit rates (the other end of the bit rate spectrum for which MPEG-4 was conceived). HDCAM SR is widely used in HD, both on tape and in files. Apart from HDCAM SR, the simple studio profile of MPEG-4 sees very limited use.
Part 12 of the MPEG-4 suite of standards defines the ISO Base Media File Format, a general container structure for time-based media files. The format is used in desktop video (most commonly with MPEG-4 Part 2/ASP video, in mp4 files), but is rarely if ever used in professional video distribution.