274 Andrew Derrington
together. When the differentiation that calculates the spatial and temporal gradients is combined with filtering, as it generally is in any biologically plausible motion-detecting model, it is possible to express the two approaches in very similar ways.
There are two reasons for this. First, differentiating an image and then filtering it with a linear filter gives exactly the same result as differentiating the filter and then using it to filter the original image. Second, differentiating a filter converts an even filter into an odd filter, introducing the same phase shifts as exist between the quadrature pairs of filters in the linear motion sensor. Differentiation also changes the amplitude spectrum of the filter, making it more high pass. One consequence of this is that combining a low-pass or mildly band-pass blurring function, such as might be produced by retinal processing, with differentiation produces a band-pass filter that becomes narrower with subsequent differentiation operations.
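Both properties can be checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes a one-dimensional image, a Gaussian blurring function, and a central-difference approximation to the derivative, all chosen purely for illustration.

```python
import numpy as np

def gaussian(x, sigma):
    """Even-symmetric blurring kernel, normalized to unit sum."""
    g = np.exp(-x**2 / (2.0 * sigma**2))
    return g / g.sum()

def central_difference(signal):
    """Discrete derivative implemented as convolution with [0.5, 0, -0.5]."""
    return np.convolve(signal, np.array([0.5, 0.0, -0.5]), mode="full")

rng = np.random.default_rng(0)
image = rng.standard_normal(256)      # arbitrary 1-D luminance profile (assumed)
x = np.arange(-20, 21)
blur = gaussian(x, sigma=4.0)         # illustrative blurring function

# Route 1: differentiate the image, then blur it.
route1 = np.convolve(central_difference(image), blur, mode="full")
# Route 2: differentiate the filter, then apply it to the raw image.
d_blur = central_difference(blur)
route2 = np.convolve(image, d_blur, mode="full")

same_result = np.allclose(route1, route2)           # the two routes agree
blur_is_even = np.allclose(blur, blur[::-1])        # original filter is even
d_blur_is_odd = np.allclose(d_blur, -d_blur[::-1])  # its derivative is odd
```

The agreement of the two routes follows from the associativity of convolution, so it does not depend on the particular kernel used here.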
Mark Georgeson has shown that the motion energy computation can be done using input filters that have these spatial and temporal profiles and that the motion energy velocity computation, which is done by dividing the output of the motion energy stage by the output of a spatially matched static contrast energy stage, is identical to the velocity computation based directly on spatial and temporal derivatives (Bruce, Green, & Georgeson, 1996). This blurs the distinction between the multichannel gradient model and energy models of motion analyzers.
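The velocity computation based directly on derivatives can be illustrated with a short sketch. It is not Georgeson's formulation and it omits the filtering stages; it simply assumes a rigidly translating pattern I(x - vt), for which the temporal derivative equals -v times the spatial derivative, and it pools over the whole stimulus by least squares. The grating parameters are illustrative.

```python
import numpy as np

def gradient_velocity(frames, dx, dt):
    """Least-squares velocity estimate from a space-time luminance array.

    frames has shape (n_times, n_positions). For a rigidly translating
    pattern I(x - v t), dI/dt = -v * dI/dx, so v = -sum(It*Ix) / sum(Ix**2).
    """
    It, Ix = np.gradient(frames, dt, dx)   # derivatives along time, then space
    return -np.sum(It * Ix) / np.sum(Ix * Ix)

dx, dt = 0.01, 0.001                       # sample spacing in deg and s (assumed)
x = np.arange(0.0, 2.0, dx)
t = np.arange(0.0, 0.1, dt)
true_velocity = 3.0                        # deg/s, illustrative value
X, T = np.meshgrid(x, t)                   # both of shape (n_times, n_positions)
frames = np.cos(2 * np.pi * 1.0 * (X - true_velocity * T))
estimate = gradient_velocity(frames, dx, dt)
```

The estimate differs slightly from the true velocity because the derivatives are approximated by finite differences.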
B. Experimental Data
The preceding section shows that, although the various approaches to analyzing the direction of motion have different starting points, most of them can be brought to a common end point by appropriate choice of filter characteristics and of subsequent processing. This overlap between the different models has the consequence that psychophysical experiments may not reveal which of the approaches described in the previous section provides the most appropriate description of the mechanisms that enable us to see motion.
In this section we shall see that psychophysical experiments allow us not only to demonstrate that some motion percepts cannot be derived from analysis of the correspondence between image features over time and so must result from some sort of filtering operation applied to the raw spatiotemporal luminance profile but also to infer the spatial and temporal characteristics of the filters involved. They may yet allow us to distinguish between the different nonlinear operations that follow the filtering stages in the models outlined in the previous section. We shall also see that physiological experiments on single neurones can determine which of the different classes of model best applies to the neurone in question.
1. Psychophysics
a. Filtering versus Correspondence
In order to demonstrate that a motion percept does not depend on an analysis of how features or objects in the image change their position with time, it is sufficient
6 Seeing Motion |
275 |
to show either that motion is perceived when features are not displaced or do not appear to be displaced, or that the perceived motion and the perceived feature displacement may have different speeds or go in different directions. I shall discuss three clear demonstrations of this kind of dissociation between perceived motion and the perceived displacement of features, indicating that some motion percepts must be extracted directly from the image. In most images, however, motion analysis by tracking of features and by filtering gives the same result, and special techniques are necessary to distinguish motion percepts based on filtering from those based on correspondence.
Perhaps the most straightforward demonstration of a dissociation between perceived motion and the perceived positions of objects arises in the motion aftereffect, or waterfall illusion, first described in the modern literature in the nineteenth century (Addams, 1834). If, after looking at an object or pattern in continuous motion for some time, the motion is stopped or the gaze is shifted, motion in the opposite direction to that which had actually been occurring is seen in the same part of the visual field. However, static objects in that part of the visual field do not appear to change their positions with time.
The fact that the motion aftereffect does not involve any changes in perceived position is clear evidence that the perception of motion can be dissociated from any changes in the position (or the perceived position) of image features, and thus must result from activity in special-purpose mechanisms for the perception of motion. Barlow and Hill suggested that the motion aftereffect arises because the perception of motion in a given direction arises when there is an excess of activity in neurones selective for motion in that direction, relative to those selective for motion in the opposite direction. They illustrated the point by showing recordings from a direction-selective neurone recorded in the retina of the rabbit. Prolonged stimulation with motion in the preferred direction was followed by a depression of its firing rate below the normal resting level (Barlow & Hill, 1963). Although it is unlikely that direction-selective mechanisms in the human visual system are exactly the same as those in the rabbit retina, the motion aftereffect does point to the existence of motion sensors in the human visual system that do not depend on changes in position.
A second situation in which motion and position changes are dissociated occurs when subjects attempt to detect oscillating relative motion in random-dot fields (Nakayama & Tyler, 1981). The stimulus consists of a rectangular field of random dots in which each row of dots oscillates to and fro along a horizontal path. The horizontal velocity of each dot is given by the product of a sinusoidal function of its vertical position and a sinusoidal function of time. If the motion of such a pattern were sensed by a mechanism that sensed changes in the horizontal positions of dots, one would expect that the motion would be detectable whenever the amplitude of the oscillating displacement exceeded the smallest detectable displacement.
FIGURE 9 Displacement thresholds for detecting oscillatory shearing motion in a pattern of random dots. From approximately 0.1 Hz to 2 Hz the threshold declines in inverse proportion to the temporal frequency of oscillation, which indicates that, expressed as a velocity, it is unchanging over this range. (Reprinted from Vision Research, 21, Nakayama, K., & Tyler, C. W. Psychophysical isolation of movement sensitivity by removal of familiar position cues, pp. 427–433. Copyright 1981, with permission of Elsevier Science and the author.)

In fact, as Figure 9 shows, for temporal frequencies up to about 2 Hz, the threshold displacement in an oscillating random-dot display declines almost exactly in inverse proportion to the temporal frequency, indicating that threshold is reached at a constant velocity rather than at a constant displacement. Although it is not possible a priori to define how the sensitivity of motion detectors of different types should vary with temporal frequency, it seems likely that in this case the limit is not set by a mechanism that tracks displacements. One would expect the performance of such a mechanism to be limited by displacement, and to decline rapidly at high temporal frequency, since the encoding of each position displacement would presumably take a more or less fixed time.
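The link between a constant velocity threshold and a falling displacement threshold is a matter of simple arithmetic. In the sketch below the velocity threshold is a hypothetical number chosen for illustration, not a value taken from Nakayama and Tyler (1981); only the relationship between the two quantities matters.

```python
import math

def displacement_amplitude(peak_velocity, temporal_frequency_hz):
    """Peak displacement of a dot whose velocity is v0 * sin(2*pi*f*t).

    Integrating the velocity over time gives a displacement of
    amplitude v0 / (2*pi*f).
    """
    return peak_velocity / (2.0 * math.pi * temporal_frequency_hz)

# Hypothetical velocity threshold, held constant across temporal frequency.
velocity_threshold = 0.05                    # deg/s, illustrative only
frequencies = [0.1, 0.2, 0.5, 1.0, 2.0]      # Hz
displacement_thresholds = [
    displacement_amplitude(velocity_threshold, f) for f in frequencies
]
```

A tenfold increase in temporal frequency therefore produces a tenfold fall in the displacement at threshold, which is the pattern a velocity-limited mechanism predicts.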
A third situation in which motion judgments and displacement judgments are dissociated arises when human subjects are asked to discriminate the direction of motion of complex grating patterns that contain a high spatial-frequency (about 3 cycles/deg) moving sinusoid added to a low spatial-frequency (1 cycle/deg) sinusoid that does not move. At long durations the motion of such a pattern is seen correctly, but when it is presented for less than about 100 ms the pattern appears to move in the opposite direction to the actual motion of the 3 cycle per degree component, both when the motion is continuous and when it is part of a two-frame apparent motion sequence (Derrington & Henning, 1987b; Henning & Derrington, 1988). However, if subjects are asked to discriminate the direction of vernier offset between the two frames of the apparent motion sequence presented one above the other, they perform correctly both at short durations and at long durations, as shown in Figure 10.
It is not clear what makes this pattern appear to move in the wrong direction when it is presented for a short duration; however, the fact that it reverses its direction of motion without reversing the offset in perceived position indicates that the motion signal is derived independently of any sense of spatial position. It does not depend on a correspondence-based mechanism.
If we make the assumption that the reversal of the motion percept at short durations is some intrinsic property of the motion filters, such as an interaction between filters tuned to different spatial frequencies, it follows that the most likely explanation for the recovery in performance at long durations is that the correspondence-based mechanism for sensing motion is able to provide a veridical signal at long durations that overcomes the erroneous signal derived from motion filters. From this it is tempting to infer that the stimulus duration at which veridical motion is first seen, about 200 ms, represents a lower limit on the operation of the correspondence-based motion-sensing mechanism. It suggests that one way to isolate motion mechanisms based on spatiotemporal filters is to use stimuli shorter in duration than this.
FIGURE 10 Performance in judging direction of motion and direction of vernier offset in a pattern that consisted of the sum of a 3 cycle/degree grating that was displaced either between frames (in the motion task) or between the top of the frame and the bottom of the frame (in the vernier task) and a 1 cycle/degree grating that was not displaced. The motion discrimination is reliable but incorrect (i.e., observers see reversed motion) at short durations and correct at long durations. The vernier discrimination is correct at long and short durations. (Reprinted from Vision Research, 27, Derrington, A. M., & Henning, G. B. Errors in direction-of-motion discrimination with complex stimuli, 61–75. Copyright 1987, with permission of Elsevier Science.)
Motion discriminations that depend on correspondence-based mechanisms can be identified by adding to the stimulus a mask that prevents a correspondence-based mechanism from extracting a motion signal but does not affect the motion filter. Lu and Sperling (1995) have shown that adding a pedestal, a high-contrast static replica of itself, to a moving sinusoidal grating should have no effect on an elaborated Reichardt detector, while making it impossible for a correspondence-based analysis to extract a motion signal.
The logic of the pedestal test is straightforward. First, the elaborated Reichardt detector is immune to the pedestal because its response to the sum of several different temporal frequencies is the sum of the responses to the individual temporal frequencies (Lu & Sperling, 1995). The pedestal simply adds an extra temporal frequency component to the moving stimulus: 0 Hz, which generates no output from the Reichardt detector. Accordingly, the pedestal should not affect the response of the Reichardt detector. In fact, we can expect that when the contrast of the pedestal is high it will reduce sensitivity by activating gain-control mechanisms, but this should not happen until it is several times threshold contrast.
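This immunity can be illustrated with a deliberately simplified detector. The sketch below is not the elaborated Reichardt detector of the literature: it assumes two point receptors, a pure delay in place of the temporal filters, and an output averaged over a whole number of temporal cycles. All parameter values are illustrative.

```python
import numpy as np

def reichardt_output(a, b, delay_samples):
    """Opponent correlator: delayed(a)*b - delayed(b)*a at each time sample.

    The delay is a circular shift, which is exact here because the inputs
    are sampled over a whole number of temporal cycles.
    """
    return np.roll(a, delay_samples) * b - np.roll(b, delay_samples) * a

def grating_at(x, t, contrast, sf, tf):
    """Luminance modulation of a drifting grating at position x over time t."""
    return contrast * np.cos(2 * np.pi * (sf * x - tf * t))

n = 1024
t = np.arange(n) / n                    # one second, sampled periodically
sf, tf = 1.0, 4.0                       # cycles/deg and Hz (assumed)
x1, x2 = 0.0, 0.25                      # receptors a quarter cycle apart
delay = n // 16                         # a quarter of the 4 Hz period
m, p = 0.1, 0.2                         # moving contrast, pedestal contrast

a_moving = grating_at(x1, t, m, sf, tf)
b_moving = grating_at(x2, t, m, sf, tf)
# The pedestal is a static (0 Hz) replica of the grating at each receptor.
a_pedestal = a_moving + grating_at(x1, t, p, sf, 0.0)
b_pedestal = b_moving + grating_at(x2, t, p, sf, 0.0)

mean_without = reichardt_output(a_moving, b_moving, delay).mean()
mean_with = reichardt_output(a_pedestal, b_pedestal, delay).mean()
```

In this sketch the time-averaged output is the same with and without the pedestal, because the products of the static and moving components average to zero over a cycle and the two static products cancel in the opponent subtraction.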
On the other hand, even if the pedestal is only slightly higher in contrast than the moving pattern, it will prevent features from moving consistently in any one direction. Instead, as Figure 11 illustrates, the features oscillate backwards and forwards over a range that depends on the relative contrasts of the moving pattern and the pedestal. In the presence of the pedestal any mechanism that depends solely on changes in the positions of features in the image to compute a motion signal will be prevented from extracting a consistent motion signal.
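The oscillation of the features follows from adding two sinusoids of the same spatial frequency. The sketch below assumes, as in Figure 11, a pedestal of twice the contrast of the moving grating, and treats the position of a feature as the spatial phase of the summed grating; the function name is introduced here for illustration.

```python
import numpy as np

def feature_phase_and_contrast(m, p, phase_of_mover):
    """Phase and amplitude of m*cos(kx - phi) + p*cos(kx), as a single cosine.

    The sum of two gratings of the same spatial frequency is a grating whose
    complex amplitude is the phasor sum p + m*exp(i*phi).
    """
    phasor = p + m * np.exp(1j * phase_of_mover)
    return np.angle(phasor), np.abs(phasor)

phi = np.linspace(0.0, 2 * np.pi, 721)    # one temporal cycle of the mover
m = 1.0
position_alone, _ = feature_phase_and_contrast(m, 0.0, phi)
position_sum, contrast_sum = feature_phase_and_contrast(m, 2.0 * m, phi)

# Largest excursion of the features, in radians of spatial phase.
max_excursion = np.max(np.abs(position_sum))
# Net distance travelled by a feature over one full temporal cycle.
net_travel_alone = np.unwrap(position_alone)[-1] - position_alone[0]
net_travel_sum = np.unwrap(position_sum)[-1] - position_sum[0]
```

Without the pedestal the features advance by a full spatial period in each temporal cycle. With it they never stray more than arcsin(1/2), or 30 degrees of spatial phase, from their mean position, and their net travel is zero, while their contrast varies over the cycle.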
Psychophysical measurements of contrast thresholds show that pedestals of moderate contrast do not affect thresholds for judgments of direction of motion of simple luminance patterns, but several more complex motion stimuli are affected. Adding a pedestal to the moving stimulus raises the contrast required to discriminate direction of motion of patterns defined by variations in binocular disparity or direction of motion (Lu & Sperling, 1995).
FIGURE 11 Space–time plots of a moving sinusoidal grating, a static pedestal of twice its contrast, and their sum. Adding the pedestal to the grating prevents the continuous displacement of features that occurs during movement. Instead the features oscillate and change their contrast over time.

These results raise the possibility that we may be able to divide motion stimuli into two classes according to whether or not they are susceptible to pedestals. However, although such a classification is attractive, it is not necessarily as straightforward to interpret as Lu and Sperling (1995) suggest. In fact, there are three specific reasons that we should not leap to the conclusion that all motion stimuli that are vulnerable to pedestals are analyzed by correspondence-based mechanisms (feature trackers) and those that are not vulnerable are analyzed by motion filters.
First, the Reichardt detector’s immunity to pedestals depends on an assumption that the detector’s response is integrated over time. The space–time plot in Figure 11 shows that the addition of a pedestal to a moving sinusoidal grating gives rise to a stimulus that moves backwards and forwards over time. When the motion is forwards (i.e., in the same direction as when there is no pedestal) the contrast is higher; however, the grating spends almost half its time moving in the reverse direction. Brief stimuli, or stimuli that are not integrated over time, could well give rise to a motion signal in the opposite direction, resulting in a deterioration in performance in the presence of the pedestal. Thus relatively minor variations in the detailed architecture of the Reichardt motion detector might make it vulnerable to pedestals. In addition, Lu and Sperling (1995) suggest that high-contrast pedestals are likely to impair the performance of a Reichardt motion analyzer simply by activating a contrast gain-control mechanism.
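The dependence on temporal integration can be seen by looking at the output of a simple opponent correlator moment by moment rather than averaged over time. As before, this is only a sketch under stated assumptions: point receptors, a pure delay instead of temporal filters, and illustrative parameter values. It is not a model of any particular published detector.

```python
import numpy as np

n = 1024
t = np.arange(n) / n                      # one second of periodic samples
tf, sf = 4.0, 1.0                         # Hz and cycles/deg (assumed)
x1, x2 = 0.0, 0.25                        # receptor positions, quarter cycle apart
delay = n // 16                           # a quarter of the temporal period
m, p = 0.1, 0.2                           # moving contrast, pedestal contrast

def receptor(x, pedestal):
    """Input at position x: drifting grating plus an optional static pedestal."""
    moving = m * np.cos(2 * np.pi * (sf * x - tf * t))
    return moving + pedestal * np.cos(2 * np.pi * sf * x)

def opponent_output(pedestal):
    """Instantaneous output of the delay-and-multiply opponent correlator."""
    a, b = receptor(x1, pedestal), receptor(x2, pedestal)
    return np.roll(a, delay) * b - np.roll(b, delay) * a

plain = opponent_output(0.0)              # no pedestal
masked = opponent_output(p)               # pedestal of twice the contrast
fraction_reversed = np.mean(masked < 0.0) # share of time signalling reverse
```

Here the unmasked output is positive at every instant, whereas the masked output has the same mean but dips below zero for part of each cycle. A detector that sampled this signal briefly, instead of integrating it, could therefore signal the wrong direction.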
Second, the assertion that correspondence-based or feature-tracking motion analyzers are vulnerable to pedestals depends on an assumption that the contrast of a feature has no effect on the ability to analyze its location or to track it. It might well be that when different features signal opposite directions of motion, or when the same feature signals opposite directions of motion at different times, the features with higher contrast are more likely to determine the perceived direction of motion. If this were to happen we would expect that feature-tracking motion analyzing mechanisms would be resistant to pedestals.
Third, even if all feature-tracking motion mechanisms are vulnerable to pedestals and all motion filters are resistant to them, we should acknowledge that in principle any feature can be tracked, whether or not its motion is normally analyzed by a motion filter or Reichardt detector. Thus many moving stimuli will be analyzed by both types of mechanisms and the effect of interfering with one or other mechanism will depend on which is the more sensitive.
It follows that the discovery that under a particular set of circumstances our ability to analyze the motion of a particular stimulus is resistant to a pedestal does not mean that under normal circumstances feature tracking may not make an important contribution to the analysis of the motion of that particular stimulus. In my own lab we have found that the same sinusoidal grating moving at the same speed can become vulnerable to a pedestal simply by making its motion less smooth by causing it to move in jumps of period (Derrington & Ukkonen, unpublished observations).
This change can easily be explained by assuming that when the grating moves in smaller steps, spatiotemporal filters are more sensitive than the feature-tracking mechanism, and when it moves in large steps, the reverse is true. This kind of change in relative sensitivities seems quite reasonable in two respects. First, as the jump size increases and the average speed remains constant, tracking features should become easier because the features spend more time stationary in between jumps. Second, changing the jump size while keeping the speed constant affects the responsiveness of motion detectors based on quadrature pairs of spatial filters because the temporal frequency spectrum of the stimulus becomes contaminated by sampling artifacts (Watson, 1990).
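The sampling artifacts can be made concrete by comparing the temporal spectrum of a smoothly drifting grating with that of a grating that covers the same distance in discrete jumps. In the sketch below the jump size of a quarter of a period is a hypothetical value chosen for illustration, as are the drift rate and sampling; the grating is represented by the complex phasor that carries its spatial phase.

```python
import numpy as np

n = 512
t = np.arange(n) / n                  # one second of samples
tf = 4.0                              # drift rate in Hz (illustrative)
jumps_per_cycle = 4                   # hypothetical jump size: 1/4 period

smooth_phase = 2 * np.pi * tf * t
# Staircase phase: held between jumps, same average speed as smooth_phase.
stepped_phase = (2 * np.pi / jumps_per_cycle) * np.floor(jumps_per_cycle * tf * t)

def power_fraction_at(phase, freq_bin):
    """Share of temporal power at one frequency for the phasor exp(i*phase)."""
    spectrum = np.abs(np.fft.fft(np.exp(1j * phase))) ** 2
    return spectrum[freq_bin] / spectrum.sum()

smooth_share = power_fraction_at(smooth_phase, int(tf))
stepped_share = power_fraction_at(stepped_phase, int(tf))
```

With smooth motion all of the power lies at the drift frequency. With quarter-period jumps roughly a fifth of it is displaced to other temporal frequencies, some of which correspond to motion in the opposite direction.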
In sum, although the pedestal test represents a promising potential technique for distinguishing between different types of motion mechanisms, it is appropriate to be cautious both in interpreting the results and in extrapolating from them.
b. Characteristics of Motion Filters
The spatial and temporal frequency selectivity of the mechanisms subserving motion perception can be analyzed by the same techniques as have been used to study the mechanisms of spatial vision (Braddick, Campbell, & Atkinson, 1978). The most widely used techniques are adaptation (also known as habituation), in which the aftereffect of viewing a moving stimulus is a selective elevation of threshold for subsequently presented stimuli that are moving in the same direction, and masking, in which a high-contrast moving “mask” selectively elevates the threshold of concurrently presented stimuli that are similar to the mask. One of the most complete descriptions of the spatial and temporal frequency selectivity of mechanisms responsible for the detection of moving stimuli comes from a study in which observers adjusted the contrast of a moving grating until it was just visible (Burr, Ross, & Morrone, 1986). High-contrast masking gratings that flickered in temporal counterphase but did not move were added to the test and elevated its threshold.³ When plotted as a function of spatial frequency, the threshold elevation curves always peaked at the spatial frequency of the test grating. When plotted as functions of temporal frequency, however, the threshold elevation curves peaked at a frequency close to that of the test when the test had a high temporal frequency (8 Hz), regardless of the spatial frequency, and were low-pass, with constant height from 10 Hz down to 0.3 Hz, when the test grating had a low temporal frequency (0.3 Hz).
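A counterphase flickering mask contains equal amounts of motion in the two opposite directions, which is why it does not move. The identity can be verified numerically, as in the sketch below; the spatial frequency, temporal frequency, and contrast are illustrative and are not the values used by Burr et al. (1986).

```python
import numpy as np

sf, tf, contrast = 1.0, 8.0, 0.5            # cycles/deg, Hz, contrast (assumed)
x = np.linspace(0.0, 2.0, 101)[None, :]     # space along columns
t = np.linspace(0.0, 0.25, 51)[:, None]     # time along rows

# Two gratings of equal contrast drifting in opposite directions.
rightward = contrast * np.cos(2 * np.pi * (sf * x - tf * t))
leftward = contrast * np.cos(2 * np.pi * (sf * x + tf * t))
# Standing wave: fixed spatial pattern whose contrast reverses sinusoidally.
counterphase = 2 * contrast * np.cos(2 * np.pi * sf * x) * np.cos(2 * np.pi * tf * t)

identity_holds = np.allclose(rightward + leftward, counterphase)
```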
By measuring how threshold elevation varied with the contrast of the mask, Burr et al. (1986) were able to infer the threshold sensitivity of the mechanisms responsible for detecting the test. They made the assumption that detection was determined by a linear filter, which was followed by a compressive nonlinearity. Consequently, by using the way threshold elevation changes with mask contrast to factor out the nonlinearity, they were able to calculate the spatiotemporal frequency sensitivity of the filter; they were also able to calculate its profile in space–time. Figure 12 shows space–time profiles of the filters responsible for detecting test stimuli of 0.1, 1, and 5 cycles/deg moving at 8 Hz, and 5 cycles/deg moving at 0.3 Hz. In the first three cases the characteristics of the filter are well matched both to the spatial structure of the stimulus and to the changes over time as it moves. In the last case, where the stimulus moves at 0.3 Hz, the spatial structure of the filter matches that of the grating, but its temporal structure shows no selectivity either for moving stimuli against static stimuli or for one direction of motion against the other.

FIGURE 12 Space–time contour plots of the spatiotemporal impulse responses of motion detectors in the human visual system inferred from masking experiments. Areas of opposite sign are indicated by dashed and continuous lines respectively. The straight line through the center of each plot represents the speed of the test pattern. Note that the horizontal scale changes by factors of 4 from b to c and again from c to d. (Data from Burr et al., 1986, Figure 6 with permission of the Royal Society of London and authors.)

³ A counterphase flickering grating is the sum of two sinusoidal gratings of the same spatial frequency and contrast moving in opposite directions.
Burr and his colleagues suggested that it is the combined spatial and temporal selectivity of the motion-selective filters that allows us to analyze the spatial structure of moving objects and to integrate the energy from a moving spot without smearing it; however, the relationship between the outputs of such filters and our perception of the motion of complex patterns is not straightforward. When a simple, briefly presented, moving grating is added to a static grating of lower spatial frequency, the resulting stimulus appears to move in the opposite direction from that in which it actually moves. Subjects are perfectly reliable in discriminating between opposite directions of motion, but consistently wrong in their decision about which direction is which. This illusory reversed motion is at its strongest when the contrast of the moving pattern and that of the static pattern are roughly equal, and well above threshold (Derrington & Henning, 1987a). This suggests that there is some kind of nonlinear interaction between the filters tuned to different spatial frequencies that creates a motion metamer, that is, a stimulus that is moving in one direction but appears as if it is moving in the opposite direction.
2. Physiology
The fact, discussed in section III.A, that different schemes for generating direction selectivity can be rendered exactly equivalent to one another makes it difficult to conceive of psychophysical experiments that would reveal the principles of operation of direction-selective mechanisms. Part of the difficulty here is that in a typical psychophysical experiment, the observer makes a single, usually binary, decision based on a large number of mechanisms. There is no access to the outputs of individual mechanisms. However, in physiological experiments on the mammalian visual cortex, it is possible to record the outputs of single cells that can be represented as direction-selective spatiotemporal frequency filters (Cooper & Robson, 1968; Movshon, Thompson, & Tolhurst, 1978a, 1978b).
Emerson et al. (1987) have shown that the responses of a complex cell to spatiotemporal sequences of bars flashed in different parts of the receptive field can be used to distinguish between different nonlinear filtering operations that might give rise to direction selectivity. Figure 13 shows an example of the responses of a complex cell in cat striate cortex. The pattern of these responses is consistent with what would be produced by the nonopponent level of the motion energy filter, but not the Reichardt detector.
A plausible physiological implementation of the motion energy filter in the complex cell receptive field uses two direction-selective subunits with receptive fields of opposite sign to represent each direction-selective filter in the quadrature pair (Emerson, 1997). Each subunit has a nonlinear output stage that half-wave rectifies and then squares the signal (Heeger, 1991), so that adding together the two complementary subunits gives the effect of a linear filter followed by a squarer. A second pair of filters in quadrature spatial and temporal phase relationship to the first pair complete the model and render the receptive field model formally identical to the motion-energy filter (Adelson & Bergen, 1985; Emerson, 1997).
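The arithmetic of this arrangement is easy to check. The sketch below assumes that the linear stages of a quadrature pair respond to a drifting grating with a cosine and a sine of equal amplitude; the gain and temporal frequency are illustrative, and the function name is introduced here rather than taken from the cited models.

```python
import numpy as np

def half_square(signal):
    """Half-wave rectify, then square (the subunit output nonlinearity)."""
    return np.maximum(signal, 0.0) ** 2

t = np.linspace(0.0, 1.0, 1000, endpoint=False)
tf, gain = 4.0, 0.7                         # illustrative values
# Linear responses of a quadrature pair of filters to a drifting grating.
even = gain * np.cos(2 * np.pi * tf * t)
odd = gain * np.sin(2 * np.pi * tf * t)

# A subunit and its sign-inverted partner together act as a full squarer.
squared_even = half_square(even) + half_square(-even)
squared_odd = half_square(odd) + half_square(-odd)
# Summing the squared quadrature outputs gives the motion energy.
energy = squared_even + squared_odd
```

Each complementary pair of subunits reproduces the square of its linear response, which is still modulated in time, whereas the sum across the quadrature pair is constant. This is the unmodulated response to a moving grating that characterizes the motion energy filter.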
Simple cells respond to a moving grating with a modulated response whose temporal frequency matches the temporal frequency of the moving grating (Movshon et al., 1978b), so direction-selective simple cells could not possibly be based on the motion energy filter, which gives an unmodulated response to a moving grating (Adelson & Bergen, 1985). However, a number of features of simple-cell responses suggest that the selectivity of the simple-cell receptive field for direction of motion could be based, at least in part, on linear filtering like that underlying the motion energy filter.

FIGURE 13 Space–time contour plot of nonlinear motion-selective spatiotemporal interactions in the receptive field of a cortical cell. The plot is produced by measuring the increase (continuous line) or decrease (dashed line) in the response to a conditioning flash produced by a test flash that is presented at a different time and position. The interaction is plotted as a function of the spatial and temporal separation between the test flash and the conditioning flash. The spatiotemporal pattern of facilitation and inhibition represents a nonlinear contribution to direction selectivity. (Replotted from Emerson, Citron, Vaughn, & Klein, 1987, with permission from the American Physiological Society.)

The temporal phase of a simple cell’s response to a flickering sinusoidal grating varies with the spatial phase of the grating, and the temporal impulse response varies with the location of the stimulus in the receptive field in a way that suggests that the linear receptive field is oriented in space–time like the linear motion sensor (Albrecht & Geisler, 1991). Detailed nonlinear analyses of simple cell responses to flickering gratings and to flashing bars in different parts of the receptive field suggest that in fact the simple-cell receptive field may contain a single pair of linear motion filters that have a quadrature spatial and temporal phase relationship with one another (Emerson, 1997; Emerson & Huang, 1997). The resulting receptive fields are very similar to those of the single subunits in linear receptive-field models, except that they give a more sustained response to a stimulus moving in the preferred direction.
The receptive field model of the simple cell contains exactly half the components from the model of the complex receptive field. Missing from the simple cell model are the complementary negative replicas of each motion-filtering subunit
