
Figure 1.5 Scene, lens, image plane

Figure 1.6 Digitization comprises sampling and quantization, in either order. Sampling density, expressed in units such as pixels per inch (ppi), relates to resolution. Quantization relates to the number of bits per pixel (bpp) or bits per component/channel (bpc). Total data rate or data capacity depends upon the product of these two factors.


Image capture

In human vision, the three-dimensional world is imaged by the lens of the eye onto the retina, which is populated with photoreceptor cells that respond to light having wavelengths ranging from about 400 nm to 700 nm. In video and in film, we build a camera having a lens and a photosensitive device, to mimic how the world is perceived by vision. Although the shape of the retina is roughly a section of a sphere, it is topologically two dimensional. In a camera, for practical reasons, we employ a flat image plane, sketched in Figure 1.5 above, instead of a section of a sphere. Image science involves analyzing the continuous distribution of optical power that is incident on the image plane.

Digitization

Signals captured from the physical world are translated into digital form by digitization, which involves two processes: sampling (in time or space) and quantization (in amplitude), sketched in Figure 1.6 below. The operations may take place in either order, though sampling usually precedes quantization.


Figure 1.7 Audio taper imposes perceptual uniformity on the adjustment of volume. I use the term perceptual uniformity instead of perceptual linearity: Because we can’t attach an oscilloscope probe to the brain, we can’t ascribe to perception a mathematical property as strong as linearity. This graph is redrawn from Bourns, Inc. (2005), General Application Note – Panel Controls – Taper.

Quantization assigns an integer to signal amplitude at an instant of time or a point in space, as I will explain in Quantization, on page 37. Virtually all image exchange standards – TIFF, JPEG, SD, HD, MPEG, H.264 – involve pixel values that are not proportional to light power in the scene or at the display: With respect to light power, pixel values in these systems are nonlinearly quantized.
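As an illustration of the idea, here is a minimal sketch assuming an idealized pure power-function transfer (the transfer functions actually standardized for video are detailed later in the book):

def quantize_nonlinear(light, exponent=1/2.4, bits=8):
    """Map relative linear-light power (0..1) to an integer code.

    The power-function transfer is applied *before* quantization, so
    code values are not proportional to light power.
    """
    levels = 2 ** bits - 1                  # 255 for 8 bits
    signal = light ** exponent              # nonlinear transfer
    return round(signal * levels)           # uniform quantization of the signal

print(quantize_nonlinear(0.18))   # 18% grey -> code 125, near mid-scale
print(quantize_nonlinear(0.5))    # half the light power -> code 191, well above mid-scale

Note how half of full light power lands far above the middle of the code range: equal code steps correspond to roughly equal perceptual steps, not equal steps of light power.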

A continuous one-dimensional function of time, such as audio sound pressure level, is sampled through forming a series of discrete values, each of which is a function of the distribution of a physical quantity (such as intensity) across a small interval of time. Uniform sampling, where the time intervals are of equal duration, is nearly always used. (Details will be presented in Filtering and sampling, on page 191.)
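An idealized sketch of uniform sampling (point sampling of a test tone, neglecting the averaging over a small interval and the prefiltering discussed on page 191):

import math

fs = 48_000                      # sampling frequency, samples per second
f = 440.0                        # a 440 Hz test tone

# Uniform sampling: the continuous signal is evaluated at equally
# spaced instants n/fs.
samples = [math.sin(2 * math.pi * f * n / fs) for n in range(48)]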

A continuous two-dimensional function of space is sampled by assigning, to each element of the image matrix, a value that is a function of the distribution of intensity over a small region of space. In digital video and in conventional image processing, the samples lie on a regular, rectangular grid.
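A corresponding 2-D sketch, assuming the “small region of space” is a simple box over which intensity is averaged (a crude aperture; practical systems use better ones):

import numpy as np

def sample_2d(scene, grid_h, grid_w):
    """Form a grid_h x grid_w sample matrix from a high-resolution
    array standing in for the continuous optical image, assigning each
    element the mean intensity over a small box-shaped region."""
    h, w = scene.shape
    bh, bw = h // grid_h, w // grid_w
    trimmed = scene[:grid_h * bh, :grid_w * bw]
    return trimmed.reshape(grid_h, bh, grid_w, bw).mean(axis=(1, 3))

scene = np.random.rand(480, 640)       # stand-in for incident optical power
image = sample_2d(scene, 120, 160)     # samples on a regular, rectangular grid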

Analog video was not sampled horizontally; however, it was sampled vertically by scanning and sampled temporally at the frame rate. Historically, samples were not necessarily digital: CCD and CMOS image sensors are inherently sampled, but they are not inherently quantized. (On-chip analog-to-digital conversion is now common in CMOS sensors.) In practice, though, sampling and quantization generally go together.

Perceptual uniformity

A perceptual quantity is encoded in a perceptually uniform manner if a small perturbation to the coded value is approximately equally perceptible across the range of that value. Consider the volume control on your radio. If it were physically linear, the roughly logarithmic nature of loudness perception would place most of the perceptual “action” of the control at the bottom of its range. Instead, the control is designed to be perceptually uniform. Figure 1.7 shows the transfer function of a potentiometer with standard audio taper: Angle of rotation is mapped to sound pressure level such that rotating the knob 10 degrees produces a similar perceptual increment in volume across the range of the control. This is one of many examples of perceptual considerations built into the engineering of electronic systems. (For another example, see Figure 1.8.)
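To make the taper concrete, here is a minimal sketch assuming the 300-degree rotation range of Figure 1.7 and an illustrative 60 dB control range (the 60 dB figure is my assumption, not taken from the Bourns note). Equal increments of rotation give equal increments in decibels:

def taper(angle_degrees, full_rotation=300.0, range_db=60.0):
    """Map knob rotation to relative sound pressure level (0..1),
    perceptually uniformly: equal rotation, equal decibel steps."""
    db_below_max = (1.0 - angle_degrees / full_rotation) * range_db
    return 10.0 ** (-db_below_max / 20.0)

for angle in (75, 150, 225, 300):
    print(angle, round(taper(angle), 4))   # each 75-degree step multiplies SPL by about 5.6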

Figure 1.8 Grey paint samples exhibit perceptual uniformity: The goal of the manufacturer is to cover a reasonably wide range of reflectance values such that the samples are uniformly spaced as judged by human vision. The manufacturer’s code for each chip typically includes an approximate L* value (here, 72, 53, 37, 25, 16, and 7). In image coding, we use a similar scheme, but with code (pixel) value V instead of L*, and a hundred or a thousand codes instead of six.

CIE: Commission Internationale de L’Éclairage. See Chapter 25, on page 265.

EOCF: Electro-optical conversion function. See Chapter 27, Gamma, on page 315.

L*(0.18)/100 ≈ 0.495; 0.18^0.42 ≈ 0.487.

Compared to linear-light encoding, a dramatic improvement in signal-to-noise performance can be obtained by using nonlinear image coding that mimics human lightness perception. Ideally, coding for distribution should be arranged such that the step between pixel component values is proportional to a just noticeable difference (JND) in physical light power. The CIE standardized the L* function in 1976 as its best estimate of the lightness sensitivity of human vision. Although the L* equation incorporates a cube root, L* is effectively a power function having an exponent of about 0.42; 18% “mid grey” in relative luminance corresponds to about 50 on the L* scale from 0 to 100. The inverse of the L* function is approximately a 2.4-power function. Most commercial imaging systems incorporate a mapping from digital code value to linear-light luminance that approximates the inverse of L*.
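These numbers are easy to check; a minimal sketch using the CIE 1976 L* equation (the cube-root law plus its linear segment near black):

def cie_lstar(y):
    """CIE 1976 lightness L* (0..100) from relative luminance y (0..1)."""
    if y > (6 / 29) ** 3:                 # threshold, about 0.008856
        return 116.0 * y ** (1 / 3) - 16.0
    return (29 / 3) ** 3 * y              # linear segment near black, about 903.3*y

print(cie_lstar(0.18))        # about 49.5: 18% mid grey lands near 50
print(100 * 0.18 ** 0.42)     # about 48.7: the 0.42-power approximation agrees closely
print(0.495 ** 2.4)           # about 0.185: the 2.4-power inverse maps L* of 49.5 back near 18%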

Different EOCFs have been standardized in different industries, as itemized below (and compared numerically in the sketch following the list):

• In digital cinema, DCI/SMPTE standardizes the reference (approval) projector; that standard is closely approximated in commercial cinemas. The standard digital cinema reference projector has an EOCF that is a pure 2.6-power function.



• In SD and HD, EOCF was historically poorly standardized or not standardized at all. Consistency has been achieved only through use of de facto industry-standard CRT studio reference displays having EOCFs well approximated by a 2.4-power function. In 2011, BT.1886 was adopted, formalizing the 2.4-power function; however, reference white luminance and viewing conditions are not [yet] standardized.

• In high-end graphics arts, the Adobe RGB 1998 industry standard is used. That standard establishes a reference display and its viewing conditions. Its EOCF is a pure 2.2-power function.

• In commodity desktop computing and low-end graphics arts, the sRGB standard is used. The sRGB standard establishes a reference display and its viewing conditions. Its EOCF is a pure 2.2-power function.
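To see how much the exponents differ in effect, here is a minimal sketch comparing the pure power-function EOCFs just listed, evaluated at a mid-scale code value (BT.1886’s handling of black level is ignored; only the plain power functions are shown):

# Relative luminance produced by the same normalized code value V (0..1)
# under each pure power-function EOCF.
eocf_exponents = {
    "digital cinema (DCI/SMPTE)": 2.6,
    "HD studio (BT.1886)":        2.4,
    "Adobe RGB 1998":             2.2,
    "sRGB":                       2.2,
}

V = 0.5
for name, gamma in eocf_exponents.items():
    print(f"{name}: {V ** gamma:.3f}")   # 0.165, 0.189, 0.218, 0.218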

Colour

Vision when only the rod cells are active is termed scotopic. When light levels are sufficiently high that the rod cells are inactive, vision is photopic. In the mesopic realm, both rods and cones are active.

To be useful for colour imaging, pixel components must represent quantities closely related to human colour vision. There are three types of photoreceptor cone cells in the retina, so human vision is trichromatic: Three components are necessary and sufficient to represent colour for a normal human observer. Rod cells constitute a fourth photoreceptor type, responsible for what can loosely be called night vision. When you see colour, cone cells are responding. Rod (scotopic) vision is disregarded in the design of virtually all colour imaging systems.

Colour images are generally best captured with sensors having spectral responsivities that peak at about 630, 540, and 450 nm – loosely, red, green, and blue – and having spectral bandwidths of about 50, 40, and 30 nm respectively. Details will be presented in Chapters 25 and 26.
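As a rough numerical illustration only (these are not the actual camera analysis functions, which are developed in Chapters 25 and 26), one might model such responsivities as Gaussians whose full width at half maximum equals the quoted bandwidths:

import numpy as np

wavelengths = np.arange(400, 701)         # visible range, nm

def gaussian_responsivity(peak_nm, bandwidth_nm):
    """Toy spectral responsivity: a Gaussian whose full width at half
    maximum (FWHM) equals the quoted bandwidth."""
    sigma = bandwidth_nm / 2.355          # FWHM = 2.355 * sigma
    return np.exp(-0.5 * ((wavelengths - peak_nm) / sigma) ** 2)

r = gaussian_responsivity(630, 50)
g = gaussian_responsivity(540, 40)
b = gaussian_responsivity(450, 30)

# A sensor's R component weights the incident spectral power by the
# red responsivity and sums across wavelength:
flat_spectrum = np.ones_like(wavelengths, dtype=float)
R = np.sum(flat_spectrum * r)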

The term multispectral refers to cameras and scanners, or to their data representations. Display systems using more than three primaries are called multiprimary.

In multispectral and hyperspectral imaging, each pixel has 4 or more components, each representing power from a different wavelength band. Hyperspectral refers to a device having more than a handful of spectral components. There is currently no widely accepted definition of how many components constitute multispectral or hyperspectral. I define a multispectral system as having between 4 and 10 spectral components, and a hyperspectral system as having 11 or more. Hyper-

