194 Clifton Schor
FIGURE 10 Fusion limits and spatial frequency. The diplopia threshold (radius of the fusional area) as a function of the peak spatial frequency of two Gaussian patches (spatial bandwidth 1.75 octaves) and of the width of two bright bars. For patches with a peak spatial frequency below about 1.5 c/deg, the diplopia threshold corresponds to a 90° phase shift of the stimulus, indicated by the dotted line. The fusion limit for the bars remains the same as that of the high spatial frequency patch. (Reprinted from Vision Research, 24, Schor, C. M., Wood, I. C., & Ogawa, J., Binocular sensory fusion is limited by spatial resolution, pp. 661–665, Copyright © 1984, with kind permission from Elsevier Science Ltd., The Boulevard, Langford Lane, Kidlington, OX5 1GB, United Kingdom.)
limited by the larger 10-arc min positional disparity. At low spatial frequencies, the 10-arc min positional disparity is smaller than the 90° phase disparity, and consequently the fusion range rises at a rate fixed by the constant 90° phase limit (Schor et al., 1984a). DeAngelis, Ohzawa, and Freeman (1995) have proposed a physiological analogue of this model that is supported by their observations of phase- and position-encoding disparity-processing units in cat striate cortex.
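The two-limit model described above can be captured in a few lines: the fusion range is whichever is larger, the constant positional limit (about 10 arc min for these stimuli) or a 90° phase shift, that is, a quarter of the stimulus period. This is only a sketch of the model described in the text, with function and parameter names of my own choosing:

```python
def fusion_limit_arcmin(peak_sf_cpd, position_limit_arcmin=10.0):
    """Diplopia threshold as the larger of a fixed positional limit and a
    90-degree (quarter-period) phase limit of the stimulus.
    One period of a grating at f c/deg spans 60/f arc min."""
    phase_limit_arcmin = 15.0 / peak_sf_cpd  # quarter period: (60 / f) / 4
    return max(position_limit_arcmin, phase_limit_arcmin)
```

At 1.5 c/deg the quarter-period phase limit equals the 10-arc min positional limit, which is the crossover seen in the figure; below that frequency the phase limit dominates and the fusion range rises.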
The shape of Panum’s area also changes as spatial frequency is decreased below 2.5 cpd: it changes from an elliptical shape at high spatial frequencies to a circular shape at frequencies lower than 2.5 cpd. This is because the vertical dimension continues to decrease as spatial frequency is increased above 2.5 cpd, whereas the horizontal dimension remains constant at higher spatial frequencies. Interestingly, vertical disparity limits for fusion are constrained only by the phase limit; their dimension is not limited at high spatial frequencies by a constant positional disparity.
5 Binocular Vision 195
E. Retinal Eccentricity
Panum’s area also increases with the retinal eccentricity of fusion stimuli (Crone & Leuridan, 1973; Hampton & Kertesz, 1983; Mitchell, 1966; Ogle, 1952). The increase in the fusion range is approximately 7% of the retinal eccentricity. These measures of the fusion range have been made with broad-band spatial frequency stimuli. When fusion ranges are measured with narrow-band spatial frequency stimuli, such as the difference of Gaussians (Schor, Wesson, & Robertson, 1986; Wilson, Blake, & Pokorny, 1988), the range of fusion does not change with retinal eccentricity. Fusion ranges remain small with spatial frequencies above 2.5 cpd as retinal eccentricity increases, as long as the fusion stimulus can be resolved. Eventually, when visual resolution decreases below 2.5 cpd (at 10° retinal eccentricity), spatial frequencies lower than 2.5 cpd must be used to stimulate fusion, and Panum’s area increases. However, the fusion range is the same as it would be at the fovea when measured with the same low spatial frequency. When measured with a broad-band stimulus, the highest resolvable spatial frequency will limit the fusion range (Schor, Heckman, & Tyler, 1989). Higher spatial frequency components are processed from the broad-band stimulus when imaged in the central retina, and the sensory fusion range will begin to increase when the peripheral retina is not able to resolve frequencies above 2.5 cpd and fusion is limited by the remaining lower spatial frequencies. These results, and other studies showing the independence of fusion limits from image contrast and luminance (Mitchell, 1966; Schor et al., 1989; Siegel & Duncan, 1960), suggest that binocular fusion is based on information in independent spatial-frequency channels rather than on the overall luminance distribution of a broad-band stimulus.
F. Disparity Gradient Limits
Fusion ranges are also reduced by the presence of other nearby stimuli that subtend different disparities (Braddick, 1979; Helmholtz, 1909; Schor & Tyler, 1981). As a rule of thumb, two adjacent targets of unequal disparity cannot be fused simultaneously when their disparity difference is greater than their separation (Burt & Julesz, 1980). This disparity gradient limit, defined as the ratio of disparity difference over separation, is 1.0. For example, the two dot pairs of unequal disparity shown in Figure 11 can be fused as long as their vertical separation is greater than their disparity difference.
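The rule of thumb can be stated numerically. A minimal sketch (the function names are mine, not from the source):

```python
def disparity_gradient(disparity_a, disparity_b, separation):
    """Burt & Julesz (1980): disparity difference divided by separation,
    both expressed in the same angular units (e.g., arc min)."""
    return abs(disparity_a - disparity_b) / separation

def simultaneously_fusible(disparity_a, disparity_b, separation):
    # fusion of both targets fails above a gradient of about 1.0
    return disparity_gradient(disparity_a, disparity_b, separation) < 1.0
```

For instance, two targets whose disparities differ by 6 arc min can both be fused when they are 12 arc min apart (gradient 0.5) but not when the same disparity difference is packed into a smaller separation.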
The interaction between nearby targets is also influenced by their spatial frequency content. The disparity gradient limit is very strong when a small, high spatial frequency, crossed-disparity stimulus is presented adjacent to a slightly lower spatial frequency background (2 octaves lower) subtending zero disparity. However, a much lower spatial frequency background (4 octaves lower) or a higher spatial frequency background has less or no influence on the fusion range with the same foreground stimulus (Scheidt & Kertesz, 1993; Wilson, Blake, & Halpern, 1991). This coarse-to-fine limit demonstrates how low-frequency stimuli can constrain the
matches of higher spatial frequency stimuli. This can be beneficial in large textured surfaces that contain coarse and fine spatial frequency information. The coarse features have fewer ambiguous matches than the high-frequency features, and the disparity gradient limit helps to bias matches in ambiguous stimuli, such as tree foliage, to solve for smooth surfaces rather than irregular depth planes.
G. Temporal Constraints
The size and dimensions of Panum’s area also depend upon the exposure duration and the velocity at which disparity increases. The horizontal radius of Panum’s area increases from 2 to 4 arc min as the exposure of pulsed disparities increases from 5 to 100 ms, and it remains constant for longer durations (Woo, 1974). The horizontal dimension of Panum’s area also increases as the velocity of slow continuous variations of disparity decreases, while the vertical dimension is unaffected by disparity velocity (Schor & Tyler, 1981). Thus at low velocities (2 arc min/sec), Panum’s area extends beyond the static disparity limit to 20 arc min horizontally and has an elliptical shape. At higher velocities (above 10 arc min/sec) the horizontal dimension shrinks to equal the size of the vertical dimension (8 arc min), and the area has a circular shape. This velocity dependence of the fusion range may contribute to the small hysteresis of fusion, in which the amplitude of Panum’s area is larger when measured with slowly increasing disparities than with decreasing large disparities (Erkelens, 1988; Fender & Julesz, 1967; Piantanida, 1986).
H. Color Fusion
When narrow-band green (530 nm) and red (680 nm) stimuli are viewed dichoptically in a small field, a binocular yellow percept occurs (Prentice, 1948). Color fusion is facilitated by small fields that have textured patches, low luminance, desaturated colors, and flicker. Dichoptic color fusion or mixture suggests that a cortical process is involved (Hovis, 1989).
IV. ENCODING DISPARITY: THE MATCHING PROBLEM
The lateral separation of our eyes, which gives us two vantage points to view scenes, produces small differences, or disparities, in the two retinal images from which we
FIGURE 11 The disparity-gradient limit for binocular fusion. (a) Diverge or converge to fuse neighboring columns of dots, as shown in the inset on the left. If the lower pair of dots is fused in each set of four dots, the upper pair is fused only if the disparity gradient is not higher than about 1. The disparity gradient increases down the rows and may be calibrated for a given viewing distance. The disparity-gradient limit of fusion can then be determined by reading off the row number at which fusion of the upper pair of dots fails. (From Burt, P., & Julesz, B. (1980). Modifications of the classical notion of Panum’s fusional area. Perception, 9, 671–682. Pion, London, reprinted with permission.)
derive stereoscopic depth. Disparities of the two retinal images could be analyzed in various ways. In a local analysis, individual perceived forms or images could be compared to derive their disparity and sense their depth. In this local analysis, form perception would precede depth perception (Helmholtz, 1909). In a global analysis, luminance properties of the scene, such as texture and other token elements, could be analyzed to code a disparity map from which depth was perceived. The resulting depth map would yield perceptions of form. In this analysis, depth perception would precede form perception. In the local analysis it is clear which monocular components of the perceived binocular images are to be compared, because of their uniqueness. The global analysis is much more difficult, as many similar texture elements, such as those in tree foliage, must be matched, and the correct pairs of images to match are not obvious. An example of our prowess at accomplishing this feat is the perception of depth in the autostereogram shown in Figure 12. Free fusion of the two random-dot patterns yields the percept of a checkerboard. There are thousands of similar texture elements (dots), yet we can correctly match them
FIGURE 12 Random-dot stereogram of a checkerboard generated by the autostereogram technique. At a 40-cm viewing distance, hold a finger about 10 cm above the page and fixate the finger continuously. The stereoscopic percept of a checkerboard will gradually emerge in the plane of the finger, which can then be removed for free viewing within the stereo space. (From Tyler, C. W. (1983). Sensory processing of binocular disparity. In C. M. Schor & K. Ciuffreda (Eds.), Vergence eye movements: Basic and clinical aspects (p. 241). Woburn, MA: Butterworth. Reproduced with permission of the author.)
FIGURE 13 Each of the four points in one eye’s view could match any of the four projections in the other eye’s view. Of the 16 (N²) possible matches, only 4 are correct (filled circles); the remaining 12 are false targets (open circles). Without further constraints based on global considerations, such ambiguities cannot be resolved. (Reprinted with permission from Marr, D., & Poggio, T. (1976). Cooperative computation of stereo disparity. Science, 194, 283–287. Copyright © 1976, American Association for the Advancement of Science.)
to derive the disparity map necessary to see a unique form in depth. An important question is how the visual system identifies corresponding monocular features. This problem is illustrated in Figure 13, which shows a schematic of four dots imaged on the two retinae. This pair of retinal images could arise from many different dot patterns in depth, depending on which dots were paired in the disparity analysis. The number of possible depth patterns that could be analyzed is N!, where N is the number of dots. Thus for 10 dots there are 3,628,800 possible depth patterns that could arise from the same pair of retinal images. How does the visual system go about selecting one of these many possible solutions? Clearly the problem must be constrained or simplified by limiting the possible number of matches. This is done by restricting matches to certain features of texture elements (types of primitives or tokens) and by prioritizing certain matching solutions over others. In addition, there is certainly an interaction between the local and global analyses, which simplifies the process. Local features are often visible within textured fields. For example, there are clumps of leaves in foliage patterns that are clearly visible prior to sensing depth. Vergence eye movements may be cued to align these unique monocular patterns on or near corresponding retinal points and thereby reduce the overall range of disparity subtended by the stimulus (Marr & Poggio, 1976). Once this has been done, the global system can begin to match tokens based upon certain attributes and priority solutions.
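The N! count quoted above is easy to verify: each of the N left-eye dots must pair with exactly one right-eye dot, giving N choices for the first, N − 1 for the second, and so on.

```python
from math import factorial

def candidate_depth_patterns(n_dots):
    """Number of complete one-to-one matchings between n left-eye dots
    and n right-eye dots (each dot used exactly once)."""
    return factorial(n_dots)

print(candidate_depth_patterns(10))  # 3628800, the figure given in the text
```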
A. Classes of Matchable Tokens
There are many image qualities that are easily seen under monocular conditions, such as size, color, orientation, brightness, and contrast. These attributes are processed early in the visual system by low-level filters, and they could be used to pair binocular images. Marr and Poggio (1979) proposed that the most useful tokens would be invariant (i.e., reliable) under variable lighting conditions, and that perhaps the visual system has prioritized these invariant features. For example, contrast is a more reliable feature than brightness in the presence of variations in lighting conditions caused by shadows, the station point of each eye, and the variable intensity of the light source. Zero-crossings, the loci of maximum rate of change in the luminance distribution, would be positions in the retinal image that do not change with light level. Similarly, color and the locations of uniquely oriented line segments or contours would be other stable features that vary only slightly with station point. Points of peak contrast and patterns of contrast variation are other local cues that could also be used to match binocular images (Frisby & Mayhew, 1978; Hess & Wilcox, 1994).
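To illustrate why zero-crossings are invariant with light level, consider a 1-D luminance profile: the steepest change in luminance is a sign change (zero-crossing) of the discrete second derivative, and multiplying the whole profile by a light-level factor moves none of those sign changes. A toy sketch, not from the source:

```python
def second_derivative(profile):
    # discrete second derivative (1-D Laplacian) of a luminance profile
    return [profile[i - 1] - 2 * profile[i] + profile[i + 1]
            for i in range(1, len(profile) - 1)]

def zero_crossings(signal):
    # indices where adjacent samples change sign
    return [i for i in range(len(signal) - 1) if signal[i] * signal[i + 1] < 0]

edge = [0, 1, 4, 8, 9, 10]            # a blurred luminance step
dim = [x * 0.1 for x in edge]         # the same edge at one-tenth the light level
assert zero_crossings(second_derivative(edge)) == zero_crossings(second_derivative(dim))
```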
When tested individually, none of these local tokens has been found to provide as much information as is theoretically possible. Zero-crossings predict better performance on stereo tasks at high spatial frequencies (>2 cpd) than is observed empirically (Schor, Wood, & Ogawa, 1984b). Contour orientation contributes to matching when the line segments are longer than 3 arc min (Mitchell & O’Hagan, 1972). Contrast polarity is a very important token for the matching process. As shown in Figure 14, sustained stereoscopic depth is impossible in patterns containing coarse detail of opposite polarity (Krol & van de Grind, 1983). However, stereoscopic depth can be seen in line drawings of opposite contrast, principally as a result of misalignment of convergence that brings like-contrast edges (Mach bands) into alignment. Similarly, contrast variation within a patch can be used in matching to perform stereo tasks near the upper disparity limits. Using Gabor patches, these studies show that the upper disparity limit for stereopsis is not limited by carrier spatial frequency but rather increases with the size of the envelope or Gabor patch (Hess & Wilcox, 1994). These results could be attributed to contrast coding of disparity resulting from a nonlinear extraction of the stimulus contrast envelope (Wilcox & Hess, 1995; Schor, Edwards, & Pope, 1998; Pope, Edwards, & Schor, 1999a).
FIGURE 14 Reversed luminance polarity and line width. (left) Fusion of the narrow luminance-reversed fine lines in the upper stereogram produces depth, like that produced by the same-polarity images in the lower stereogram. (right) Luminance-reversed broad lines in the upper stereogram do not produce depth; the same-polarity images in the lower stereogram do produce depth. (From Krol, J. D., & van de Grind, W. A., 1983. Depth from dichoptic edges depends on vergence tuning. Perception, 12, 425–438. Pion, London, reprinted with permission.)

The upper disparity limit could increase with the size of first-order binocular receptive fields that encode luminance, or it could increase with the size of second-order binocular receptive fields that encode contrast. In the former case, stereopsis would require that similar spatial frequencies be presented to the two eyes, whereas in the latter case stereopsis would occur with very different spatial frequencies presented to the two eyes as long as the envelope in which they were presented was similar. Second-order, or contrast, coding requires that information be rectified so that contrast variations can be represented by changes in neural activity, and this information could be used for binocular matching. Color has been investigated in stereo performance using isoluminance patterns and found to support stereopsis only with coarse detail (de Weert & Sadza, 1983). However, color can disambiguate binocular matches in conditions such as depth transparency when combined with other cues (Jordan, Geisler, & Bovik, 1990). Clearly, the visual system does not simply match one class of tokens. Redundant information present in several classes of tokens improves binocular matching performance.
B. Matching Constraints
Because we are able to perceive form in textured scenes such as tree foliage by virtue of stereo depth, the binocular matching process must be constrained to simplify the task of choosing among the enormous number of possible solutions to the matching problem. When matching targets in a fixation plane that is parallel to the face, two rules can completely specify correct binocular matches. The nearest-neighbor rule specifies that matches are made between tokens that subtend the smallest disparity (a bias for percepts along the horopter), and the unique-match rule specifies that each individual token may be used for only a single match. Once a feature is matched, it cannot do double duty and be matched to other features as well. Without this restriction, the number of possible matches in a random-dot field would greatly exceed N!. The nearest-neighbor rule is demonstrated in the double-nail illusion (Figure 15). Two nails are placed in the midsagittal plane, and convergence is adjusted to an intermediate distance between them. There are two possible matches: one corresponding to the true midsagittal depth, and one corresponding to two nails located at the same depth in the fixation plane to the left and right of the point of convergence. The former solution is one of two different disparities, and the latter is one of two very small equal or zero disparities. The visual system chooses the latter solution, and the two nails appear side by side even though they are really at different depths in the midsagittal plane. These two matching rules can be applied over a wide range of viewing distances when accompanied by convergence eye movements that can bring depth planes at any viewing distance into close proximity with the horopter.
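The two rules can be combined into a toy matcher: try every one-to-one pairing (unique match) and keep the one with the smallest total disparity (nearest neighbor). Applied to the double-nail configuration, it picks the side-by-side, near-zero-disparity solution, just as the visual system does. The brute-force search and the names are mine; this is an illustration of the rules, not a model from the source.

```python
from itertools import permutations

def match_tokens(left, right):
    """Pair each left-eye position with a unique right-eye position,
    minimizing total absolute disparity (positions in arc min)."""
    n = len(left)
    best = min(permutations(range(n)),
               key=lambda p: sum(abs(left[i] - right[p[i]]) for i in range(n)))
    return [(left[i], right[best[i]]) for i in range(n)]

# Double-nail illusion at the intermediate convergence point: each eye
# sees two images at roughly +/-1 arc min from its fovea.
print(match_tokens([-1.0, 1.0], [-1.0, 1.0]))
# -> [(-1.0, -1.0), (1.0, 1.0)]: the zero-disparity, side-by-side match
```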
The unique-match rule appears to be violated by the phenomenon known as Panum’s limiting case. It can be demonstrated by aligning two nails along one eye’s line of sight (Figure 16) such that a single image is formed on one retina while two images, corresponding to the two nails, are formed on the other retina. Hering believed that the vivid appearance of depth of the two nails resulted from multiple matches of the single image in the aligned eye with the two images in the unaligned eye. However, Nakayama and Shimojo (1992) have accounted for the illusion as the result of a monocular or partial occlusion cue (da Vinci stereopsis) rather than a violation of the unique-match rule.
FIGURE 15 The double-nail illusion. Two pins are held in the median plane about 30 cm from the eyes, with one pin about 2 cm farther away than the other. Four images are seen when the eyes are converged at distance A or C. When the far pin is fixated, its images are fused and the near pin is seen with a crossed disparity. When the near pin is fixated, the far pin is seen with an uncrossed disparity. When convergence is at B, about halfway between the pins, the two pairs of images are matched inappropriately to form an impression of two pins side by side. A third pin between the other two helps bring the point of fixation to the intermediate position. (From BINOCULAR VISION AND STEREOPSIS by Ian P. Howard & Brian J. Rogers, Copyright © 1995 by Oxford University Press, Inc. Used by permission of Oxford University Press, Inc.)

Other rules account for our ability to match the images of targets that subtend nonzero disparities because they lie at different depths from the horopter and fixation plane. Usually these matches result in a single percept of the target. However, when disparities become too large they exceed a diplopia threshold and are seen as double. Diplopic targets, especially those subtending small amounts of disparity, can still be seen in stereoscopic depth, which suggests that some form of binocular matching is occurring for nonfused percepts. As noted by Hering, however, the depth of diplopic and even monocular images can also be judged on the basis of monocular cues such as hemi-retinal locus (Harris & McKee, 1996; Kaye, 1978), in which case binocular matching would not be required.
Irrespective of whether the percept is diplopic or haplopic, several rules are needed to simplify the matching of images subtending retinal disparities. One of these is the smoothness or similarity constraint. Clearly, it is much easier to fuse and
