1 Retinal Image Formation and Sampling
34 Larry N. Thibos
FIGURE 18 Schematic model of neural sampling of the retinal image. (a) Photoreceptor mosaic showing the relationship between cones (open circles) and rods (filled circles) across the visual field. (b) Neural architecture of the P-system of retinal cells, which carry the neural image to the brain via the optic nerve. At the second and third stages, open circles represent on-neurons; hatched circles represent off-neurons.
the retinal surface, it is often useful to think of the cone aperture as being projected back into object space, where it can be compared with the dimensions of visual targets as illustrated in Figure 19a. This back-projection can be accomplished mathematically by convolving the optical point-spread function of the eye with the uniformly weighted aperture function of the cone, as illustrated in Figure 19b. For this illustration the optical system of the eye was assumed to be diffraction-limited (2.5-mm pupil, 550-nm light), and the aperture function of a foveal cone was assumed to be a uniformly weighted circular disk 0.5 arcmin in diameter. (This latter assumption ignores the effects of diffraction at the cone aperture that would increase the cone aperture still further.) The result is a spatial weighting function called the receptive field of the cone. Since foveal cones are tightly packed on the retinal surface, this illustration shows that the receptive fields of foveal cones must overlap when plotted in object space. Furthermore, these receptive fields will be optically dominated because even under optimum viewing conditions the width of the optical PSF of the normal eye is greater than the aperture of foveal cones (Williams et al., 1994). Just the opposite is true in the periphery, where cones are widely spaced and larger in diameter than the optical PSF, provided off-axis astigmatism and focusing errors are corrected with spectacle lenses (D. Williams et al., 1996).
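The back-projection described above can be sketched numerically. The fragment below is an illustration, not the author's computation: it convolves a 0.5-arcmin cone aperture with a stand-in optical PSF in one dimension, where the Gaussian PSF and its roughly 1-arcmin width are assumptions chosen to be comparable to the diffraction-limited PSF (the true PSF for a 2.5-mm pupil at 550 nm is an Airy pattern).

```python
import numpy as np

# Back-project a foveal cone aperture into object space by convolving it
# with the eye's optical PSF (1-D sketch; Gaussian PSF is an assumption).
theta = np.linspace(-5, 5, 2001)               # visual angle (arcmin)

cone = (np.abs(theta) <= 0.25).astype(float)   # 0.5-arcmin uniform aperture
sigma = 1.0 / 2.355                            # sigma from assumed ~1-arcmin FWHM
psf = np.exp(-theta**2 / (2 * sigma**2))
psf /= psf.sum()                               # unit-volume PSF

rf = np.convolve(cone, psf, mode="same")       # receptive field in object space

def fwhm(grid, profile):
    """Full width at half maximum of a unimodal profile."""
    above = grid[profile >= profile.max() / 2]
    return above[-1] - above[0]

print(f"cone aperture width:  {fwhm(theta, cone):.2f} arcmin")
print(f"receptive field FWHM: {fwhm(theta, rf):.2f} arcmin")
```

With these assumed widths, the computed receptive field comes out substantially broader than the 0.5-arcmin aperture itself, which is why the receptive fields of tightly packed foveal cones must overlap when plotted in object space.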
The neural images encoded by the rod-and-cone mosaics are transmitted from eye to brain over an optic nerve which, in humans, contains roughly one million individual fibers per eye. Each fiber is an outgrowth of a third-order retinal neuron called a ganglion cell. It is a general feature of the vertebrate retina that ganglion cells are functionally connected to many rods and cones by means of intermediate, second-order neurons called bipolar cells. As a result, a given ganglion cell typically responds to light falling over a relatively large receptive field covering numerous rods and cones. Neighboring ganglion cells may receive input from the same receptor, which implies that ganglion cell receptive fields may physically overlap. Thus, in general, the mapping from photoreceptors to optic nerve fibers is both many-to-one and one-to-many. The net result, however, is a significant degree of image compression, since the human eye contains about five times more cones, and about 100 times more rods, than optic nerve fibers (Curcio & Allen, 1990; Curcio et al., 1990). For this reason the optic nerve is often described as an information bottleneck through which the neural image must pass before arriving at visual centers of the brain, where vast numbers of neurons are available for extensive visual processing. To cope with this bottleneck, retinal neurons have evolved sophisticated image compression algorithms, similar to those used in computer graphics programs, which eliminate spatial and temporal redundancy, passing signals only when the scene changes between neighboring points in space or time (Werblin, 1991).

FIGURE 19 Receptive fields of cone photoreceptors in the fovea. (a) Cone apertures on the retina are blurred by the eye's optical system when projected into object space. (b) Spatial sensitivity profile of foveal cones in object space (solid curve) is broader than in image space (broken curve).
It would be a gross oversimplification to suppose that the array of retinal ganglion cells forms a homogeneous population of neurons. In fact, ganglion cells fall into a dozen or more physiological and anatomical classes, each of which looks at the retinal image through a unique combination of spatial, temporal, and chromatic filters. Each class of ganglion cell then delivers that filtered neural image via the optic nerve to a unique nucleus of cells within the brain specialized to perform some aspect of either visually controlled motor behavior (e.g., accommodation, pupil constriction, eye movements, body posture) or visual perception (e.g., motion, color, form). Different functional classes thus represent distinct subpopulations of ganglion cells that exist in parallel to extract different kinds of biologically useful information from the retinal image.
In humans and other primates, one particular class of retinal ganglion cell, called P-cells (by physiologists) or midget cells (by anatomists), is by far the most numerous everywhere across the retina (Rodieck, 1988). This P-system has evolved to meet the perceptual requirements for high spatial resolution by minimizing the degree of convergence from cones onto ganglion cells (Wässle & Boycott, 1991). The ultimate limit to this evolutionary strategy is achieved in the fovea, where individual cones exclusively drive not just one, but two ganglion cells via separate interneurons (bipolar cells). These bipolar cells carry complementary neural images, analogous to positive and negative photographic images. One type of bipolar cell ("On" cells) signals those regions of the retinal image that are brighter than nearby regions, and the opposite type ("Off" cells) signals those regions that are darker. Farther into the periphery, beyond about 10–15° field angle, there are more cones than ganglion cells, so some convergence is necessary. Nevertheless, even in the midperiphery the retina preserves high spatial resolution through the bipolar stage, thereby delaying convergence until the output stage of ganglion cells (Wässle, Grünert, Martin, & Boycott, 1994).
A schematic diagram of the neural organization of the P-system is shown in Figure 18b. Individual cone photoreceptors make exclusive contact with two interneurons, one ON-bipolar and one OFF-bipolar, through synaptic connections of opposite sign. In general, signals from several ON-bipolars are pooled by a given ON-ganglion cell, and similarly for OFF-cells, thus preserving the complementary neural images for transmission up the optic nerve to the brain (Kolb, 1970; Polyak, 1941). In the fovea, each ganglion cell connects to a single bipolar cell (Kolb & Dekorver, 1991; Kolb, Linberg, & Fisher, 1992), which connects in turn to a single cone, thus producing a ganglion cell with a receptive field the size of an individual cone. In an eye of an individual with normal color vision, the cone population that
drives the P-system consists of two subtypes (L- and M-type) with slightly different spectral sensitivities. Since a foveal ganglion cell is functionally connected to a single cone, the ganglion cell will inherit the cone's spectral selectivity, thereby preserving chromatic signals necessary for color vision. In peripheral retina, P-ganglion cells may pool signals indiscriminately from different cone types, thus diminishing our ability to distinguish colors.
B. Functional Implications of Neural Sampling
1. Contrast Detection and the Size of Sampling Elements
The neural architecture of the retina outlined above has important functional implications for the basic visual tasks of contrast detection and spatial resolution. The finest spatial pattern for which contrast is detectable depends ultimately upon the size of the largest receptive fields in the chain of neurons that supports contrast perception. Since ganglion cell receptive fields can be no smaller than those of individual cones, and are generally expected to be larger, the retinal limit imposed on contrast detection will be set by the spatial filtering characteristics of ganglion cell receptive fields. To a first approximation, the cutoff spatial frequency for an individual cell is given by the inverse of its receptive field diameter. For example, a ganglion cell connected to a foveal cone of diameter 2.5 μm would have a cutoff frequency of about 120 cyc/deg, which is about twice the optical bandwidth of the human eye under optimum conditions and about three times the visual acuity of the average person. Although this is an extremely high spatial frequency by visual standards, this prediction has been verified experimentally by using interference fringes to avoid the eye's optical limitations (Williams, 1985). However, under natural viewing conditions the optical bandwidth of the retinal image is typically about 60 cyc/deg (Williams et al., 1994), which is approximately half the bandwidth of individual foveal cones. This implies that the cutoff spatial frequency for signaling contrast by ganglion cells in foveal vision is determined more by optical attenuation than by cone diameter.
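The arithmetic behind these figures can be checked directly. In the sketch below, the conversion from retinal distance to visual angle uses an assumed model-eye magnification of roughly 290 μm per degree, a value not stated in the text.

```python
# Cutoff frequency as the inverse of receptive-field diameter (sketch).
# Assumption: ~290 um of retina per degree of visual angle (model eye).
UM_PER_DEG = 290.0

def cutoff_cpd(rf_diameter_um: float) -> float:
    """Cutoff spatial frequency (cyc/deg) for a given receptive-field diameter."""
    return UM_PER_DEG / rf_diameter_um

foveal = cutoff_cpd(2.5)                              # 2.5-um foveal cone
print(f"foveal cutoff: {foveal:.0f} cyc/deg")         # ~116, i.e. about 120
print(f"ratio to 60 cyc/deg optical bandwidth: {foveal / 60:.1f}")
```

The same function reproduces the peripheral estimate given below: a cone three times larger yields a cutoff three times smaller.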
The situation is a little more complicated in peripheral retina, where a ganglion cell's receptive field may be the union of several disjoint, widely spaced cone receptive fields. It can be shown that ganglion cells of this kind will have secondary lobes in their frequency response characteristics which extend the cell's cutoff spatial frequency up to that of individual input cones (Thibos & Bradley, 1995). Thus at 30° field angle, where cones are three times larger than in the fovea, cutoff frequency would be expected to be three times smaller, or 40 cyc/deg. This is also a very high spatial frequency, which approaches the detection cutoff for normal foveal vision and is an order of magnitude beyond the resolution limit in the midperiphery. Nevertheless, the prediction has been verified using interference fringes as a visual stimulus (Thibos, Walsh, & Cheney, 1987). Under natural viewing conditions with refractive errors corrected, the cutoff frequency for contrast detection is slightly lower (Thibos, Still, & Bradley, 1996; Wang, Thibos, & Bradley, 1997c), indicating that optical attenuation of the eye sets a lower limit to contrast detection than does neural filtering in peripheral vision, just as in central vision.
2. Spatial Resolution and the Spacing of Sampling Elements
More than a century ago, Bergmann (1857) and Helmholtz (1867) laid the foundation for a sampling theory of visual resolution when they argued that for two points to be discriminated, at least one relatively unstimulated photoreceptor must lie between two relatively stimulated photoreceptors. Although this rule was formulated in the context of resolving two points of light, it applies equally well to the case of resolving the individual bars of a sinusoidal grating, as illustrated in Figure 20. For a grating stimulus, the sampling rule states that there must be at least two sample points per cycle of the grating so that individual dark and light bars can be separately registered. It is of some historical interest to note that the Bergmann-Helmholtz rule for adequate spacing of neural sampling elements on the retina predates by more than half a century the celebrated sampling theorem of communication theory formulated in the 20th century by Whittaker and Shannon (D'Zmura, 1996; Zayed, 1993).
Figure 20 illustrates the penalty for disobeying the sampling theorem because of insufficient sampling density. The top half of the figure depicts a sinusoidal grating being sampled by three rows of visual neurons, each with a different spacing between their receptive fields. The corresponding spatial pattern of neural responses shown in the bottom half of the figure constitutes a neural image that represents the stimulus within the visual system. The neurons in the top row are packed so tightly that the requirements of the sampling theorem are exceeded, a condition called oversampling. The middle row of neurons illustrates the critically sampled condition, in which the neurons are as widely spaced as possible while still satisfying the requirement for at least two samples per cycle of the stimulus. The spatial frequency of the grating in this critical case is called the Nyquist frequency. The neurons in the bottom row are too widely spaced to satisfy the sampling theorem, a condition called undersampling. As described below, undersampling causes the neural image to misrepresent the retinal image because there aren't enough sample points to register each and every bar in the pattern.

FIGURE 20 Three schemes for one-dimensional neural sampling of the retinal image. Upper diagram shows the location of three rows of neural receptive fields relative to the image of a sinusoidal grating. Top row, oversampled; middle row, critically sampled; bottom row, undersampled. Lower diagram shows the strength of response of each neuron (bar graph). Interpolation of neural responses (broken curves) faithfully reconstructs the original image in the top two rows, but misrepresents the grating's frequency in the bottom row.
Although the neural image is meant to be a faithful representation of the retinal image, these two kinds of images are fundamentally different: the optical image formed on the retina is spatially continuous, whereas the neural image is discrete. This difference between the stimulus and its neural representation is central to the sampling theory of visual resolution, and it raises an important theoretical question. Is information lost by converting a continuous light image into a discrete neural image? To answer this question, it is helpful for the reader to mentally interpolate between sample points in Figure 20 to form an envelope of modulation, as illustrated by the dotted curve. Clearly this envelope accurately represents the spatial frequency of the visual stimulus for the upper and middle neural images illustrated. In fact, given the proper method of interpolation, Shannon's rigorous sampling theorem states that the envelope will exactly reconstruct the retinal image, provided that the sampling process is error-free and that the sampling density is sufficient to provide at least two samples per cycle of the highest spatial frequency component in the image. However, if these preconditions are not satisfied, as indicated in the bottom row of neurons in Figure 20, then information will be irretrievably lost and the stimulus will be misrepresented by the neural image as a pattern of lower spatial frequency. This false representation of the stimulus due to undersampling is called aliasing.
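The aliasing argument can be verified in a few lines of code. The sketch below uses illustrative frequencies, not data from the chapter: a grating sampled at fewer than two samples per cycle produces exactly the same sample values as a lower-frequency grating, so the two stimuli are indistinguishable in the discrete neural image.

```python
import numpy as np

# A grating above the Nyquist limit is indistinguishable, at the sample
# points, from its low-frequency alias (frequencies here are assumptions).
fs = 10.0                          # samples per degree -> Nyquist = 5 cyc/deg
x = np.arange(40) / fs             # sample positions (deg)

f_stim = 8.0                       # stimulus frequency, above the Nyquist limit
f_alias = fs - f_stim              # 2 cyc/deg alias predicted by sampling theory

stim = np.cos(2 * np.pi * f_stim * x)
alias = np.cos(2 * np.pi * f_alias * x)

print(np.allclose(stim, alias))    # True: identical neural "samples"
```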
The one-dimensional analysis presented in Figure 20 oversimplifies the problem of neural undersampling of a two-dimensional signal such as the retinal image. The penalty of undersampling a two-dimensional image is that the neural image may misrepresent the orientation of the stimulus pattern as well as its spatial frequency. Furthermore, if the pattern is moving, then the direction of motion may also be misrepresented. The static features of two-dimensional visual aliasing are illustrated in Figure 21 for a more naturalistic visual scene. In this computer simulation of neural undersampling, the original image (Figure 21a) was first undersampled, and then the empty spaces between samples were filled in by interpolation (Figure 21b). Notice how the zebra's fine stripes are transformed by undersampling into the coarse, irregular, splotchy pattern of the leopard. Although the finer stripes are distorted and nonveridical, they remain visible because of the relatively high contrast that persists in the neural image. This illustrates the important point made earlier that spatial patterns can remain visible even though they are misrepresented in the neural image.

FIGURE 21 Comparison of two methods for limiting resolution of a natural object: (a) was undersampled to yield (b); (c) was created by blurring the original image in (a). The degree of undersampling and blurring was chosen so that the finer stripes in the animal's coat would be unresolved.
In summary, according to the sampling theory of visual resolution, the spectrum of visible spatial frequencies is partitioned into two regions by the Nyquist frequency of the neural array. Frequencies below the Nyquist limit are perceived veridically, whereas frequencies above the Nyquist limit are misperceived as aliases of the stimulus. Thus aliasing is the proof that neural undersampling is the limiting mechanism for spatial resolution.
Surprisingly, although the Bergmann-Helmholtz rule has been a popular fixture of visual science for more than a century (Helmholtz, 1911; Ten Doesschate, 1946; Weymouth, 1958), until relatively recently there was little evidence that the rule actually applies to human vision. Despite thousands of scientific experiments since the mid-19th century, and countless clinical measurements of visual acuity by optometrists and ophthalmologists, only a few scattered publications prior to 1983 mentioned the telltale signs of aliasing appearing when the visual resolution limit is exceeded (Bergmann, 1857; Byram, 1944). In the absence of compelling evidence of neural undersampling, a competing theory rose to prominence, which suggested that the theoretical sampling limit is never attained in real life because a lower limit is imposed by spatial filtering mechanisms in the eye. According to this filtering theory, spatial resolution fails not because of undersampling, but because of contrast insufficiency. In other words, contrast sensitivity falls below the absolute threshold of unity for high spatial frequencies beyond the neural Nyquist limit, thus preventing aliasing. This is the scenario depicted in the simulated neural image in Figure 21c, which was prepared by blurring the original image with a low-pass spatial filter. Notice how the finer stripes of the zebra's coat have vanished altogether as filtering reduces their contrast to below our visual threshold. This is the familiar experience we all share in central vision: fine patterns disappear rather than mutate into coarse, aliased patterns that remain visible. In central vision it is contrast insufficiency, not the ambiguity of aliasing, which limits resolution and justifies the common practice of taking the endpoint of the contrast-sensitivity function as a definition of the resolution limit (De Valois & De Valois, 1988). As will be described next, the situation is just the reverse in peripheral vision.
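The contrast between the two theories can be made concrete in a small one-dimensional simulation (a sketch with illustrative parameters, not the simulation used for Figure 21): undersampling a fine grating leaves a high-contrast pattern at an alias frequency, whereas low-pass filtering reduces the pattern's contrast essentially to zero.

```python
import numpy as np

# Two ways of limiting resolution: undersampling preserves contrast (as a
# false alias); blurring removes it. All parameters are illustrative.
x = np.arange(4000) / 100.0                 # finely sampled retinal image (deg)
f = 8.0                                     # fine grating, cyc/deg
img = np.cos(2 * np.pi * f * x)

# Undersampling at 10 samples/deg (Nyquist 5 cyc/deg): contrast survives
samples = img[::10]
c_alias = (samples.max() - samples.min()) / 2

# Low-pass filtering: Gaussian blur strong enough to attenuate 8 cyc/deg
sigma = 0.08                                # blur width in deg (assumed)
k = np.arange(-200, 201) / 100.0
g = np.exp(-k**2 / (2 * sigma**2))
g /= g.sum()
blurred = np.convolve(img, g, mode="valid")  # 'valid' avoids edge artifacts
c_blur = (blurred.max() - blurred.min()) / 2

print(f"contrast after undersampling: {c_alias:.2f}")   # high: alias visible
print(f"contrast after blurring:      {c_blur:.4f}")    # near zero: invisible
```

This mirrors the perceptual distinction in the text: the undersampled pattern remains visible (though misrepresented), while the filtered pattern falls below threshold and disappears.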
C. Evidence of Neural Sampling in Perception
The widespread acceptance of filtering theories of visual resolution reflects the dominance of our foveal visual experience in shaping our thinking about the visual system (Hughes, 1996). However, over the past decade of research into parafoveal and peripheral vision we have come to realize that the fovea is the only part of the optically well-corrected eye which is ordinarily not sampling-limited. The reason
sampling-limited performance is not normally achieved in foveal vision is that the extremely high packing density of adult cone photoreceptors and ganglion cells causes the Nyquist frequency to be higher than the optical cutoff of the eye. Thus, central vision is protected from aliasing by the low-pass spatial-filtering action of the eye's optical system. We know that the limiting filter is optical in nature rather than neural because aliasing will occur in central vision if a special interferometric visual stimulator is used to bypass the optical system of the eye (D. Williams, 1985). In fact, even foveal vision is sampling-limited under optimal conditions in some individuals with eyes of exceptionally high optical quality (Bergmann, 1857; Miller, Williams, Morris, & Liang, 1996).
Outside the central fovea the sampling density of retinal cones and ganglion cells declines rapidly (Curcio & Allen, 1990; Curcio et al., 1990), whereas the optical quality of the eye remains excellent, provided that off-axis refractive errors are corrected (D. Williams et al., 1996). Despite these favorable conditions for undersampling, perceptual aliasing in the periphery was reported for the first time only relatively recently (Thibos & Walsh, 1985). Subsequent experiments have shown that visual resolution in the parafovea is well predicted by photoreceptor density (D. Williams & Coletta, 1987). Beyond about 10–15° of eccentricity, however, human resolution acuity is much lower than can be accounted for by the density of cones, but closely matches anatomical predictions based on the density of P-type retinal ganglion cells (Thibos & Bradley, 1995; Thibos, Cheney, & Walsh, 1987).
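The anatomical prediction mentioned here amounts to a density-to-Nyquist conversion. The sketch below assumes a regular triangular (hexagonal) sampling mosaic, for which the Nyquist limit is 1/(√3·s) at center-to-center spacing s; the example density is back-calculated for illustration rather than taken from anatomical data.

```python
import math

# Predicted Nyquist frequency of an assumed hexagonal sampling mosaic.
# Note: only one subpopulation (e.g., ON-center P-cells) forms a complete
# sampling array, so "density" here means that subpopulation's density.

def nyquist_cpd(density_per_deg2: float) -> float:
    """Nyquist limit (cyc/deg) of a triangular lattice of given areal density."""
    spacing = math.sqrt(2.0 / (math.sqrt(3.0) * density_per_deg2))  # deg
    return 1.0 / (math.sqrt(3.0) * spacing)

# Illustrative density chosen to give ~5.5 cyc/deg, the resolution limit
# reported at 20 deg eccentricity (the density value itself is hypothetical).
d = 2 * math.sqrt(3.0) * 5.5**2
print(f"predicted Nyquist: {nyquist_cpd(d):.1f} cyc/deg")
```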
Perhaps the most compelling evidence of neural undersampling in human vision comes from drawings of what gratings look like when carefully scrutinized by trained observers. Figure 22a illustrates a series of such drawings obtained in the course of experiments reported by Thibos et al. (1996). The stimulus was a patch of vertically oriented grating displaced from the fixation point by 20° along the horizontal meridian of the nasal visual field. When the spatial frequency of the grating was below the resolution cutoff, the stimulus appeared veridical. That is, the subject reported seeing a patch of vertical grating containing a few cycles of the pattern. However, when the spatial frequency exceeded the resolution limit, which in this case was about 5.5 cyc/deg, the perceived stimulus looked quite different from the actual stimulus. The pattern was distorted, the orientation was frequently wrong, and the spatial scale of the visible elements was much coarser than the actual stimulus. These features of aliased perception were remarkably similar to the simulated neural image of Figure 21b. Another characteristic of visual aliasing is the unstable nature of the percept. The two rows of drawings in Figure 22a illustrate the changing appearance from moment to moment of a fixed stimulus. This characteristic of aliasing is probably due to small fixational eye movements which continually alter the position of the retinal image relative to the neural sampling array, thus introducing instability into the alias and ambiguity into the perceived pattern. Although cortical mechanisms normally compensate for eye movements to produce a stabilized perception of the external world, these mechanisms would be defeated by the lack of correlation between eye movements and the misrepresented spatial position of an undersampled neural image.

FIGURE 22 Aliasing in human peripheral vision. (a) Drawings of the subjective appearance of gratings in peripheral vision. (b) Comparison of visual performance for resolution and detection of spatial contrast. Target eccentricity 20° in the horizontal nasal field.
Quantitative measurements of visual performance in the periphery are shown in Figure 22b for the contrast-detection task of discriminating a grating from a uniform field and the resolution task of discriminating a horizontal from a vertical grating (Thibos et al., 1996). Whereas acuity for the resolution task is about 5.5 cyc/deg at this location in the visual field, acuity for the detection task is nearer 20 cyc/deg. These results lend strong support to the hypothesis that resolution in the peripheral visual field is limited by undersampling. At the same time, the data are inconsistent with an alternative hypothesis based on neural or optical filtering, because the filtering model predicts that resolution acuity and detection acuity will be equal and that perceptual aliasing will not occur.
Today a large body of psychophysical evidence supports the sampling theory of spatial resolution everywhere in the visual field except the central few degrees, where optical filtering dominates (S. Anderson, Drasdo, & Thompson, 1995; R. Anderson, Evans, & Thibos, 1996; S. Anderson & Hess, 1990; S. Anderson et al., 1991; Artal, Derrington, & Colombo, 1995; Coletta, Segu, & Tiana, 1993; Coletta & Williams, 1987; Coletta, Williams, & Tiana, 1990; Galvin & Williams, 1992; He & MacLeod, 1996; Thibos & Bradley, 1993; Thibos, Cheney, & Walsh, 1987; Thibos et al., 1996; Thibos, Walsh, & Cheney, 1987; Wang, Bradley, & Thibos, 1997a, 1997b; Wang, Thibos, & Bradley, 1996; Wilkinson, 1994; Williams, 1985; Williams & Coletta, 1987). Qualitative evidence includes subjective reports by several different research groups of spatial and motion aliasing under a variety of test stimuli and viewing conditions. Quantitative support for the sampling hypothesis includes evidence that detection acuity can exceed resolution acuity by up to an order of magnitude in peripheral vision, and that contrast sensitivity is much greater than unity at the resolution limit (Thibos et al., 1996). Saturation of resolution acuity as contrast increases is definitive evidence that peripheral resolution of high-contrast retinal images is not limited by the contrast insufficiency predicted by filtering models (Thibos et al., 1996).
