

…at maintaining good visual quality with a small encoder output buffer, keeping coding delay to a minimum (important for low-delay applications such as scenario (c) described above).

Further information on some of the many alternative strategies for rate control can be found in [41].

7.6 TRANSPORT AND STORAGE

A video CODEC is rarely used in isolation; instead, it is part of a communication system that involves coding video, audio and related information, combining the coded data and storing and/or transmitting the combined stream. There are many different options for combining (multiplexing), transporting and storing coded multimedia data and it has become clear in recent years that no single transport solution fits every application scenario.

7.6.1 Transport Mechanisms

Neither MPEG-4 nor H.264 defines a mandatory transport mechanism for coded visual data. However, there are a number of possible transport solutions depending on the method of transmission, including the following.

MPEG-2 Systems: Part 1 of the MPEG-2 standard [42] defines two methods of multiplexing audio, video and associated information into streams suitable for transmission (Program Streams or Transport Streams). Each data source or elementary stream (e.g. a coded video or audio sequence) is packetised into Packetised Elementary Stream (PES) packets, and PES packets from the different elementary streams are multiplexed together to form a Program Stream (typically carrying a single set of audio/visual data, such as a single TV channel) or a Transport Stream (which may contain multiple channels) (Figure 7.40). The Transport Stream is designed for error-prone environments; in broadcast systems such as DVB it is typically protected by Reed–Solomon and convolutional error control coding, providing protection from transmission errors. Timing and synchronisation are supported by a system of clock references and time stamps in the sequence of packets. An MPEG-4 Visual stream may be carried as an elementary stream within an MPEG-2 Program or Transport Stream. Carriage of an MPEG-4 Part 10/H.264 stream over MPEG-2 Systems is covered by Amendment 3 to MPEG-2 Systems, currently undergoing standardisation.
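As a rough illustration of this packetisation process, the following Python sketch builds drastically simplified PES packets and splits them across fixed-size 188-byte Transport Stream packets. Only the start code prefix, sync byte and packet size follow the standard; real PES headers carry further flags and PTS/DTS time stamps, and real TS packets carry adaptation fields and continuity counters, all of which are omitted here.

```python
import struct

TS_PACKET_SIZE = 188  # MPEG-2 TS packets are fixed at 188 bytes
TS_SYNC_BYTE = 0x47   # every TS packet begins with this sync byte

def make_pes_packet(stream_id: int, payload: bytes) -> bytes:
    """Highly simplified PES packet: 0x000001 start code prefix, a
    stream_id byte and a 16-bit length, then the payload. Real PES
    headers also carry flags and PTS/DTS time stamps."""
    return (b"\x00\x00\x01" + bytes([stream_id])
            + struct.pack(">H", len(payload)) + payload)

def packetise_ts(pes: bytes, pid: int):
    """Split a PES packet across fixed-size TS packets (no adaptation
    field or continuity counter; illustration only)."""
    chunk_size = TS_PACKET_SIZE - 4  # 4-byte TS header per packet
    for offset in range(0, len(pes), chunk_size):
        chunk = pes[offset:offset + chunk_size]
        pusi = 0x40 if offset == 0 else 0  # payload_unit_start_indicator bit
        header = struct.pack(">BHB", TS_SYNC_BYTE, (pusi << 8) | pid, 0x10)
        yield header + chunk.ljust(chunk_size, b"\xff")  # stuff final packet

# Example: carry one (dummy) coded video access unit; 0xE0 is a video stream_id
ts_packets = list(packetise_ts(make_pes_packet(0xE0, b"\x00" * 500), pid=0x100))
print(len(ts_packets), "TS packets of", len(ts_packets[0]), "bytes")
```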

Figure 7.40 MPEG-2 Transport Stream: each elementary stream (e.g. video, audio) is packetised into PES packets, and packets from multiple streams are multiplexed into a Transport Stream


Figure 7.41 RTP packet structure (simplified): Payload Type, Sequence Number, Timestamp, Unique Identifier, Payload (e.g. Video Packet)

Real-time Transport Protocol: RTP [43] is a packetisation protocol that may be used in conjunction with the User Datagram Protocol (UDP) to transport real-time multimedia data across networks that use the Internet Protocol (IP). UDP is preferable to the Transmission Control Protocol (TCP) for real-time applications because it offers low-latency transport across IP networks; however, it has no mechanisms for packet loss recovery or synchronisation. RTP defines a packet structure for real-time data (Figure 7.41) that includes a payload type identifier (signalling the type of CODEC used to generate the data), a sequence number (essential for reordering packets that are received out of order) and a time stamp (necessary to determine the correct presentation time for the decoded data). Transporting a coded audio-visual stream via RTP involves packetising each elementary stream into a series of RTP packets, interleaving these and transmitting them across an IP network (with UDP as the underlying transport protocol). RTP payload formats are defined for various standard video and audio CODECs, including MPEG-4 Visual and H.264. The NAL unit structure of H.264 (see Chapter 6) was designed with efficient packetisation in mind, since each NAL unit can be placed in its own RTP packet.
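The fixed part of the RTP packet is compact enough to show directly. The following Python sketch packs the 12-byte fixed header defined in RFC 3550 and places a single NAL unit in the payload, the simplest H.264 payload mode; the payload type value 96 and the dummy NAL unit are illustrative assumptions (dynamic payload types are negotiated per session).

```python
import struct

def make_rtp_packet(payload_type: int, seq: int, timestamp: int,
                    ssrc: int, payload: bytes, marker: bool = False) -> bytes:
    """Pack a minimal RTP header (RFC 3550): version 2, no padding,
    no header extension, no CSRC list, then the fixed 12-byte fields."""
    byte0 = 2 << 6                                   # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack(">BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload

# One NAL unit per packet; 0x65 is the NAL header byte of an IDR slice
nal_unit = b"\x65" + b"\x00" * 100  # dummy coded slice data
packet = make_rtp_packet(payload_type=96, seq=1, timestamp=90000,
                         ssrc=0x1234ABCD, payload=nal_unit)
print(len(packet), "bytes")  # 12-byte header + 101-byte payload
```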

MPEG-4 Part 6 defines an optional session protocol, the Delivery Multimedia Integration Framework (DMIF), that supports session management of MPEG-4 data streams (e.g. visual and audio) across a variety of network transport protocols. The FlexMux tool (part of MPEG-4 Systems) provides a flexible, low-overhead mechanism for multiplexing separate Elementary Streams into a single, interleaved stream. This may be useful, for example, for multiplexing separate audio-visual objects prior to packetising into MPEG-2 PES packets.
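The general idea of low-overhead tagging and interleaving can be sketched as follows. This Python fragment is loosely modelled on the concept behind FlexMux 'simple mode' (a per-packet channel index and length) and does not reproduce the actual FlexMux bitstream syntax.

```python
def interleave_streams(streams):
    """Interleave packets from several elementary streams into one byte
    stream, prefixing each packet with a 1-byte channel index and a
    1-byte length (so packets here must be under 256 bytes)."""
    queues = {index: list(packets) for index, packets in streams.items()}
    out = bytearray()
    while any(queues.values()):          # round-robin until all queues empty
        for index, packets in queues.items():
            if packets:
                payload = packets.pop(0)
                out += bytes([index, len(payload)]) + payload
    return bytes(out)

muxed = interleave_streams({0: [b"video-AU-1", b"video-AU-2"],
                            1: [b"audio-frame-1"]})
print(len(muxed), "bytes multiplexed")
```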

7.6.2 File Formats

Earlier video coding standards such as MPEG-1, MPEG-2 and H.263 did not explicitly define a format for storing compressed audio-visual data in a file. A single compressed video sequence is often stored simply by mapping the encoded stream to a sequence of bytes in a file, and this is in fact a common mechanism for exchanging test bitstreams. However, storing and playing back combined audio-visual data requires a more sophisticated file structure, especially when, for example, the stored data is to be streamed across a network or when the file is required to store multiple audio-visual objects. The MPEG-4 File Format and AVC File Format (which will both be standardised as Parts of MPEG-4) are designed to store MPEG-4 Audio-Visual and H.264 Video data respectively. Both formats are derived from the ISO Base Media File Format, which in turn is based on Apple Computer's QuickTime format.

In the ISO Media File Format, a coded stream (for example an H.264 video sequence, an MPEG-4 Visual video object or an audio stream) is stored as a track, representing a sequence of coded data items (samples, e.g. a coded VOP or coded slice) with time stamps (Figure 7.42). The file formats deal with issues such as synchronisation between tracks, random access indices and carriage of the file on a network transport mechanism.
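The box (or 'atom') structure inherited from QuickTime is straightforward to traverse: every box begins with a 32-bit size and a four-character type. The Python sketch below walks the top-level boxes of a file; the two synthetic boxes used to exercise it are illustrative, and real files contain many nested boxes (moov, trak, stbl and so on) that this fragment does not descend into.

```python
import struct

def iter_boxes(data: bytes, offset: int = 0):
    """Walk top-level ISO Base Media File Format boxes. Each box starts
    with a 32-bit size and a 4-character type (e.g. 'ftyp', 'moov',
    'mdat'); size 1 means a 64-bit size follows, size 0 means the box
    runs to the end of the file."""
    while offset + 8 <= len(data):
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        if size == 1:
            size = struct.unpack(">Q", data[offset + 8:offset + 16])[0]
        yield box_type.decode("ascii", "replace"), offset, size
        if size == 0:
            break
        offset += size

# Two synthetic boxes, enough to exercise the walker
ftyp = struct.pack(">I4s", 16, b"ftyp") + b"isom" + b"\x00\x00\x02\x00"
mdat = struct.pack(">I4s", 12, b"mdat") + b"\x00" * 4
for box_type, offset, size in iter_boxes(ftyp + mdat):
    print(f"{box_type}: {size} bytes at offset {offset}")
```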


Figure 7.42 ISO Media File: audio track samples and video track samples stored as interleaved media data within the mdat box

7.6.3 Coding and Transport Issues

Many of the features and tools of the MPEG-4 Visual and H.264 standards are primarily aimed at improving compression efficiency. However, it has long been recognised that practical transport issues must be taken into account in a video communication system, and a number of tools in each standard are specifically designed to address these issues.

Scaling a delivered video stream to support decoders with different capabilities and/or delivery bitrates is addressed by both standards in different ways. MPEG-4 Visual includes a number of tools for scalable coding (see Chapter 5), in which a sequence or object is coded to produce a number of layers. Typically, these include a base layer (which may be decoded to obtain a ‘basic’ quality version of the sequence) and enhancement layer(s), each of which requires an increased transmission bitrate but which adds quality (e.g. image quality, spatial or temporal resolution) to the decoded sequence. H.264 takes a somewhat different approach. It does not support scalable coding but provides SI and SP slices (see Chapter 6) that enable a decoder to switch efficiently between multiple coded versions of a stream. This can be particularly useful when decoding video streamed across a variable-throughput network such as the Internet, since a decoder can dynamically select the highest-rate stream that can be delivered at a particular time.
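The selection logic itself lies outside both standards. The sketch below shows one plausible policy for a streaming client (the function name and safety margin are illustrative assumptions); the actual switch between coded versions would be realised with SP/SI slices in H.264, or by adding and dropping enhancement layers in scalable MPEG-4 Visual.

```python
def select_stream(available_bitrates_kbps, measured_throughput_kbps,
                  safety_margin=0.8):
    """Pick the highest-rate coded version of the sequence that fits
    within a fraction of the measured network throughput; fall back to
    the lowest-rate version if nothing fits."""
    budget = measured_throughput_kbps * safety_margin
    candidates = [r for r in sorted(available_bitrates_kbps) if r <= budget]
    return candidates[-1] if candidates else min(available_bitrates_kbps)

# Example: three pre-coded versions of the same sequence
print(select_stream([128, 384, 768], measured_throughput_kbps=500))  # -> 384
```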

Latency is a particular issue for two-way real-time applications such as videoconferencing. Tools such as B-pictures (coded frames that use motion-compensated prediction from earlier and later frames in temporal order) can improve compression efficiency but introduce a delay of several frame periods into the coding and decoding 'chain', which may be unacceptable for low-latency two-way applications. Latency requirements also influence rate control algorithms (see Section 7.5), since post-encoder and pre-decoder buffers (useful for smoothing out rate variations) increase latency.
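The structural delay contributed by B-pictures is easy to quantify: each run of consecutive B-frames forces the encoder to wait for the following reference frame before the B-frames can be coded. The sketch below computes this reordering delay alone, ignoring the (often larger) buffering and processing delays.

```python
def reordering_delay_ms(consecutive_b_frames: int, frame_rate_hz: float) -> float:
    """Delay added by B-pictures: a run of N consecutive B-frames adds
    roughly N frame periods, since the next reference frame must arrive
    before the intervening B-frames can be coded."""
    return consecutive_b_frames * 1000.0 / frame_rate_hz

# Example: the common IBBP... pattern (runs of 2 B-frames) at 25 frames/s
print(reordering_delay_ms(2, 25.0), "ms")  # -> 80.0 ms
```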

Each standard includes a number of features to aid the handling of transmission errors. Bit errors are a characteristic of circuit-switched channels; packet-switched networks tend to suffer from packet losses (since a bit error in a packet typically results in the packet being dropped during transit). Errors can have a serious impact on decoded quality [44] because the effect of an error may propagate spatially (distorting an area within the current decoded frame) and temporally (propagating to successive decoded frames that are temporally predicted from the errored frame). Chapters 5 and 6 describe tools that are specifically intended to reduce the damage caused by errors, including data partitioning and independent slice decoding (designed to limit error propagation by localising the effect of an error), redundant slices (sending extra copies of coded data), variable-length codes that can be decoded in either direction (reducing the likelihood of a bit error 'knocking out' the remainder of a coded unit) and flexible ordering…