
Chapter 18: Metadata

You know you’re in trouble when the Wikipedia page for Metadata starts “The term metadata is an ambiguous term …” [accessed 2011-10-18].

This chapter differs in tone from other chapters in this book. I’m a skeptic concerning metadata.

Metadata presents problems – therefore opportunities, therefore commercial activities, therefore products. However, in my view the video industry hasn’t achieved a sufficiently broad understanding of the deep principles of metadata that any general approach can be set out.

Consider an audio file storing 200 million audio sample pairs at 44.1 kHz, representing a performance of Beethoven’s Symphony No. 9, Choral. To recreate that sound approximating the way it was experienced by the original audience, you’ll need to know the sample rate. The sample rate could be provided in a paper document, perhaps a standard. To enable general-purpose decoders and players, it makes sense to encode the sample rate in the file, perhaps in the file header.

Is such an encoded sample rate data or metadata? I argue that it’s data, because the intended auditory experience cannot be attained without knowing it. You may feel that this example – call it Example 0 – is contrived and irrelevant. Let me present five further examples. Example 1 is conceptually a small step from Example 0; we proceed (with increasing complexity and increasing relevance to professional video) to Example 5, which concerns a highly topical issue in video engineering. I claim that Example 5 exhibits the same philosophical dilemma as Example 0:

What’s data, and what’s metadata?

While this dilemma persists, a chapter entitled Metadata must ask questions instead of providing answers.
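The stakes in Example 0 are easy to quantify: misread the sample rate and the same samples play for a different duration (and at the wrong pitch). A quick sketch of the arithmetic:

```python
# If the sample rate is misread, the same recording plays for a
# different length of time (and at the wrong pitch): the rate is
# essential data, not optional annotation.
samples = 200_000_000  # sample pairs, as in Example 0
for rate_hz in (44_100, 48_000):
    minutes = samples / rate_hz / 60
    print(f"{rate_hz} Hz -> {minutes:.1f} minutes")
```

Played at 48 kHz instead of 44.1 kHz, the performance loses about six minutes and rises in pitch by about 8.8%.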


Metadata Example 1: CD-DA

CD-DA abbreviates compact disc–digital audio. CD-DA was conceived by Philips and Sony to store hi-fi digital stereo audio at 16 bits per sample and a 44.1 kHz sample rate (that is, a data rate of about 1.5 Mb/s) on optical media having a capacity of about 660 MB. CD-DA was defined by the Sony and Philips “Red Book,” which IEC subsequently standardized as IEC 60908.

The original “Red Book” specification for CD-DA did not include any provision for album title, artist name, song titles, liner notes, or any other text information. This information was printed on the CD jacket; apparently Sony and Philips thought that providing such information in digital form would be redundant! The CD format not only lacked the metadata but also lacked any provision for a unique ID. After a few years, the CD proponents adopted the CD Text standard, augmenting the Red Book to allow recording of text-based metadata. But by then it was too late.

The recorded CD-DA media did – of necessity – include a table of contents (ToC) giving track count, track start times, and track durations (to 1/75 s accuracy). Today some people would call the ToC technical metadata. I consider it to be data: without the ToC, the user cannot put the system to its intended use – playing songs. The audiophile and software engineer Ti Kan realized that the ToC information could be “hashed” into a 32-bit number and treated as an ersatz unique ID. As CDs became popular, Kan (assisted by Steve Scherf) created the CDDB service, a database to store community-contributed metadata associated with those codes. CDDB was originally a community-driven service, but it became a commercial entity – first CDDB, Inc. (in 1995), then Gracenote (in 2000; acquired by Sony in 2008).

So, CD albums have metadata – but not reliably sourced by, or under direct control of, the content creators. The lesson for the system designer is this: What constitutes “data” and what constitutes “metadata” is coloured by your view of the boundaries of your system. Sony and Philips apparently thought of the CD system as distributing prerecorded digital audio. Today, we think of the CD system as distributing music to consumers. There’s a subtle difference that changes the notion of what’s data and what’s metadata.

When the MP3 audio compression system was created, the developers made provision for ID3 tags to convey metadata sourced by the content creators. Similarly, the BWF file format commonly used for broadcast audio (ITU-R BR.1352-2, Broadcast Wave Format) includes a “parameter” called nSamplesPerSec giving the sample rate. The parameter is carried in a “BWF Metadata Chunk.” Is the sample rate metadata?
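Ti Kan’s ToC-hashing idea can be sketched as follows. This follows the commonly described formulation of the classic CDDB disc-ID computation, simplified for illustration; real-world details vary, and hash collisions are precisely why the result is only an “ersatz” unique ID.

```python
def cddb_disc_id(track_offsets, leadout):
    """Sketch of the classic CDDB disc-ID hash, built entirely from
    ToC values: track start offsets and the lead-out position, all in
    1/75 s frames. Simplified; actual implementations differ in detail.
    """
    def digit_sum(n):
        # Sum of the decimal digits of n.
        return sum(int(d) for d in str(n))

    # Checksum over the digit-sums of each track's start time in seconds.
    checksum = sum(digit_sum(off // 75) for off in track_offsets)
    # Total playing time in seconds, first track to lead-out.
    total_seconds = leadout // 75 - track_offsets[0] // 75
    # Pack checksum, total time, and track count into 32 bits.
    return (checksum % 0xFF) << 24 | total_seconds << 8 | len(track_offsets)

# Hypothetical three-track disc: starts at 2 s, 200 s, 400 s; lead-out at 600 s.
disc_id = cddb_disc_id([150, 15_000, 30_000], 45_000)
print(f"{disc_id:08x}")
```

Because the hash depends only on track count and timings, two different albums with sufficiently similar ToCs collide – a weakness a true content-creator-assigned ID would not have.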


DIGITAL VIDEO AND HD ALGORITHMS AND INTERFACES

Metadata Example 2: .yuv files

The “.yuv” file format was introduced by Abekas in the late 1980s to store uncompressed video. Given samples of 8-bit Y’CBCR, 4:2:2 interlaced video in raster order, the file format definition is essentially as follows: Store successive image rows, where each row is a sequence of 4-byte elements [CB0, Y0, CR0, Y1], where subscript 0 signifies an even-numbered luma sample location and subscript 1 signifies odd.

There is no header in a .yuv file – in particular, there is no provision for storing the count of frames, image rows, or image columns. The format was introduced to store 720×480 video. Later, it was applied to 720×576. It could potentially be applied to 720×481, 720×483, 720×486, or 704×480. It has been used in the codec research community for 1280×720p and 1920×1080i. Consider the reading of .yuv files constrained to be 720×480 or 720×576. Most of the time the format can be determined by dividing the file’s byte count by 1440, then dividing by 480 and 576 in turn to see which quotient is an integer. But that approach doesn’t always work. For example, a 4,147,200-byte file could be six frames of 480i or five frames of 576i.
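The divisibility test, and its failure mode, can be sketched directly. This is a minimal illustration, not part of any .yuv specification; it assumes 4:2:2 at 8 bits, i.e., 2 bytes per pixel:

```python
def guess_yuv_format(byte_count, candidates=((720, 480), (720, 576))):
    """Guess (width, height, frame count) for a headerless .yuv file.

    Assumes 8-bit Y'CbCr 4:2:2, so one frame occupies
    width * height * 2 bytes. Returns every candidate whose frame size
    divides the file size evenly; more than one match means the file
    is ambiguous.
    """
    matches = []
    for width, height in candidates:
        frame_bytes = width * height * 2
        if byte_count % frame_bytes == 0:
            matches.append((width, height, byte_count // frame_bytes))
    return matches

# The ambiguous case from the text: both candidates divide evenly.
print(guess_yuv_format(4_147_200))
```

The 4,147,200-byte file matches both candidates – six frames of 720×480 and five frames of 720×576 – so the heuristic cannot decide.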

Reliable file interpretation is attained only by agreement between sender and receiver – or expressed more properly in terms of files, between writer and reader – that is, outside the scope of transfer of the file itself.

Imagine extending the .yuv file format by prepending a file header comprising three 32-bit words: a count of the number of frames, a count of the number of image rows, and a count of the number of image columns. Is the header data or metadata? If your “system” is defined in advance as being 480i, then the counts in the header are inessential, auxiliary information – call it metadata. But if your “system” is multiformat, then the counts are most certainly data, because reliable interpretation of the image portion of the file is impossible without the numbers in the header.
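Such a hypothetical header could be parsed as below. The field order and little-endian byte order are my assumptions, not part of any standard; the fact that writer and reader would have to agree on them out of band is itself an instance of the chapter’s dilemma.

```python
import io
import struct

def read_extended_yuv_header(f):
    """Parse the hypothetical extended-.yuv header described in the
    text: three 32-bit words giving the counts of frames, image rows,
    and image columns. Word order and little-endian byte order are
    assumptions for this sketch.
    """
    frames, rows, cols = struct.unpack("<III", f.read(12))
    return frames, rows, cols

# Round trip: write a header for 6 frames of 720x480, then read it back.
header = struct.pack("<III", 6, 480, 720)
print(read_extended_yuv_header(io.BytesIO(header)))  # (6, 480, 720)
```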

The conclusion is this: What comprises “metadata” depends upon what you consider to be your “system.” The larger, more inclusive, and more general your system – the less you depend upon context – the more your metadata turns into data.


See 2-3 pulldown, on page 405.

Metadata Example 3: RFF

Since about 1953, a dominant source of television content has been movies – first on photochemical film, then in digital form. For more than half a century, movies have been intended for display at a frame rate of 24 Hz. The expedient solution to match movie frame rate to the historical 59.94 Hz field rate of North American television is to slow the movie to 23.976 Hz, then impose 2-3 pulldown, whereby successive movie frames are displayed twice, then three times, twice, then three times, and so on. A certain degree of motion stutter results, but it is not objectionable to consumers. Certain video frames (M-frames; see Figure 34.1 on page 405) comprise fields from two different movie frames.
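The 2-3 cadence is simple to express in code. A minimal sketch, with letters standing in for movie frames:

```python
def pulldown_2_3(movie_frames):
    """Expand movie frames by 2-3 pulldown: successive frames are
    repeated for 2, 3, 2, 3, ... field periods. Four movie frames thus
    yield ten fields, i.e., five 59.94 Hz interlaced video frames.
    """
    fields = []
    for i, frame in enumerate(movie_frames):
        fields.extend([frame] * (2 if i % 2 == 0 else 3))
    return fields

print(pulldown_2_3(["A", "B", "C", "D"]))
```

The result pairs fields A·A, B·B, B·C, C·D, D·D: the third and fourth video frames are M-frames, mixing fields from two different movie frames.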

In about 1990 it became feasible for consumer television receivers to eliminate the display twitter artifact of interlaced display by deinterlacing (by digital means) and displaying frames at 59.94 Hz. Owing to the prevalence of “film” material, deinterlacing required detection and treatment of the M-frames.

The technique adopted compares elements of the image data of successive video fields to see if a 2-3 pattern can be discerned. If a sustained 2-3 sequence is detected, then the source is presumed to be 24 Hz; frames are assembled accordingly. As CE technology progressed, receivers became more and more dependent upon such algorithms, to the point that today a high-quality digital television processor chip may dedicate a hundred thousand gates to the task. The problem is that implementations aren’t necessarily reliable, and different implementations aren’t consistent.
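The core of such a detector can be sketched in a few lines. This is a toy formulation, not any chip’s actual algorithm: real detectors compare pixel data with noise tolerance and must reject still scenes, which is exactly where implementations diverge.

```python
def detect_2_3_cadence(fields):
    """Heuristic cadence detector: in a 2-3 field sequence, a field
    that repeats the field two positions earlier occurs at every 5th
    position. This sketch assumes exact field equality on moving
    content; real implementations use thresholded image comparison.
    """
    repeats = [i for i in range(2, len(fields)) if fields[i] == fields[i - 2]]
    # Sustained cadence: at least two repeats, all at the same phase mod 5.
    return len(repeats) >= 2 and all(i % 5 == repeats[0] % 5 for i in repeats)

print(detect_2_3_cadence(list("AABBBCCDDD")))  # pulldown material: True
print(detect_2_3_cadence(list("ABCDEFGHIJ")))  # native video: False
```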

The problem arose at a time when broadcasting of “line 21” closed caption data was becoming commonplace, transmitting roughly 16 bits per field. The 2-3 problem could have been nipped in the bud by including one bit per field signalling the film pulldown.

The MPEG-2 system accommodates 24 Hz material through the repeat first field (RFF) flag conveyed in the Picture Coding Extension. The flag causes the first decoded field of a field pair to be repeated. MPEG-2’s RFF can be considered a metadata “hint”: Satisfactory performance is obtained ignoring it, but improved performance is obtained by using it.
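Under MPEG-2 semantics for interlaced sequences, a coded frame with repeat_first_field set occupies three field periods instead of two, so flagging alternate film-rate frames reproduces the 2-3 cadence explicitly rather than forcing the receiver to infer it. The arithmetic, as a sketch:

```python
# 23.976 Hz film coded at film rate in MPEG-2: setting repeat_first_field
# on alternate coded frames yields the 2-3 field cadence at 59.94 Hz.
rff_flags = [False, True, False, True]  # one flag per coded movie frame
field_count = sum(3 if rff else 2 for rff in rff_flags)
print(field_count)  # 10 fields: 5 interlaced video frames per 4 movie frames
```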
