264 Andrew Derrington
Ahumada, 1983, 1985). The axes of Figure 3 are spatial frequency and temporal frequency. Moving components of the image have a temporal frequency that is given by their velocity multiplied by their spatial frequency. Thus the components of an extended moving image will fall on a straight line whose gradient is the image velocity. This means that the velocity in one spatial dimension of an image can be estimated by a combination of spatial frequency filtering and temporal frequency estimation (Watson & Ahumada, 1985).
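The relation can be checked numerically: a drifting sinusoid of spatial frequency fs and velocity v produces, at any fixed point, a temporal oscillation at fs × v. A minimal sketch (all numbers are illustrative, not from the chapter):

```python
import numpy as np

# A 1-D sinusoidal grating of spatial frequency fs (cycles/pixel)
# translating at v pixels/frame: L(x, t) = sin(2*pi*fs*(x - v*t)).
# Its temporal frequency at any fixed point is ft = fs * v
# (cycles/frame), i.e. spatial frequency times velocity.
fs, v = 0.125, 2.0          # cycles/pixel, pixels/frame (illustrative)
x = np.arange(64)
t = np.arange(64)
X, T = np.meshgrid(x, t)    # rows index frames, columns index pixels
movie = np.sin(2 * np.pi * fs * (X - v * T))

# Temporal spectrum at one pixel: the peak sits at fs*v = 0.25 c/frame.
spectrum = np.abs(np.fft.rfft(movie[:, 0]))
freqs = np.fft.rfftfreq(len(t))
print(freqs[np.argmax(spectrum)])   # -> 0.25
```

The peak of the temporal spectrum lands exactly at fs·v, as the frequency-domain picture predicts.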
In a more general sense, the direction (sign) of the motion determines the sign of the temporal frequencies generated by different spatial frequencies. Motion to the left (negative velocities) generates positive temporal frequencies from positive spatial frequencies and negative temporal frequencies from negative spatial frequencies, whereas motion to the right generates the opposite pairs of sign combinations. Accordingly, the direction of motion is signaled by whether the spatiotemporal frequency components it generates lie in the first and third or second and fourth quadrants of the spatiotemporal frequency diagram.
Because it is truncated in space and time, our moving grating is spread out in spatial and temporal frequency. Its spatiotemporal frequency components form a 2-D Gaussian pulse centered on the line that corresponds to its notional velocity. In a 3-D plot they would also be spread slightly in orientation about the nominal orientation of the original grating (vertical). The components of an extended two-dimensional object occupy a plane tilted in temporal frequency. Again, image motion can be estimated by using the outputs of filters selective for spatial frequency, orientation, and temporal frequency to estimate the azimuth and elevation of this plane (Heeger, 1987).
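The plane constraint can be stated compactly: a component with spatial frequency (fx, fy) in a pattern translating at velocity (vx, vy) drifts at temporal frequency ft = −(vx·fx + vy·fy), so all components lie on a plane through the origin of (fx, fy, ft) space. A sketch with illustrative values, recovering the velocity by fitting this plane in the spirit of Heeger (1987):

```python
import numpy as np

# Each spatial-frequency component (fx, fy) of a pattern moving at
# (vx, vy) has temporal frequency ft = -(vx*fx + vy*fy): a plane in
# (fx, fy, ft) space whose tilt encodes the 2-D velocity.
vx, vy = 2.0, -1.0                           # illustrative velocity
rng = np.random.default_rng(0)
fx, fy = rng.uniform(-0.5, 0.5, (2, 20))     # arbitrary component frequencies
ft = -(vx * fx + vy * fy)                    # their temporal frequencies

# Recover the velocity by fitting the plane (least squares):
A = np.column_stack([fx, fy])
v_hat = -np.linalg.lstsq(A, ft, rcond=None)[0]
print(v_hat)                                 # -> [ 2. -1.]
```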
D. Second-Order Motion
In the moving image we have considered so far, a moving grating is seen through a stationary window, and the motion is signaled directly by the spatiotemporal variations in the luminance profile of the stimulus. However, it is also important to consider other kinds of moving stimuli in which luminance patterns remain stationary, but higher order attributes of the patterns move. One example of this is the contrast-modulated grating, shown in Figures 4a and b, which consists of a carrier pattern, usually a high spatial frequency sinusoid or a random 1-D or 2-D noise pattern, whose contrast is modulated by a spatial envelope of lower spatial frequency. As Figure 4c and d show, the envelope can be made to move while leaving the carrier stationary. This is an example of second-order, or non-Fourier, motion.
The human visual system can discriminate the motion of a contrast envelope even when the motion of the luminance signals that make it up is undetectable (Badcock & Derrington, 1985). This makes it important to consider how such motion may be represented and how it may be processed. When these second-order patterns are made to move, the luminance pattern itself remains static and the contrast modulation moves. This is made clear in Figure 4c and d, which shows a space–
FIGURE 4 Second-order patterns and representations of their motion. (a) A contrast-modulated sinusoidal grating and (b) a contrast-modulated 1-D random-noise pattern. The modulation frequency (expressed in cycles/picture) is the same as in Figures 1 and 2. Space–time plots of the same two patterns moving at 0.5 image frames/sec are shown in (c) and (d). Note that the carriers are stationary: their bars are vertical, only the contrast-envelope is tilted. (e), (f) The spatiotemporal frequency representations of (c) and (d).
time plot of a slice through the middle of each of the second-order patterns from Figure 4, moving at a speed of 0.1 image frames per second.
In each case the luminance variation is marked by dark and light lines that run vertically through the figure, giving it a local horizontal profile that is either sinusoidal (4c) or random (4d). The fact that the lines representing the luminance profiles run vertically through the plot indicates that the carriers themselves are stationary. But the horizontal sinusoidal contrast envelope that modulates each carrier pattern is clearly moving: the contrast envelope shifts gradually towards the left as one moves up the time axis.
The motion of a contrast-modulated sine wave also has a signature in the frequency domain. This comes from the fact that modulating a sinusoidal carrier of frequency fc at some modulation frequency fm is the same as adding sidebands of frequencies fc + fm and fc − fm to it:

{1 + m cos 2π(fm x + ωt)} sin(2π fc x) = sin(2π fc x)
    + (m/2){sin 2π(fc x + fm x + ωt) + sin 2π(fc x − fm x − ωt)},    (1)

where m is the modulation depth and ω is the temporal frequency of the moving envelope.
The higher frequency sideband moves in the same direction as the contrast modulation, and the lower frequency sideband moves in the opposite direction. This means that, as with moving luminance patterns, the motion is represented by a group of spatiotemporal frequency components that lie along a line whose slope gives the velocity of the motion. However, unlike the situation with a moving luminance pattern, the line does not pass through the origin. Instead it crosses the x axis at a location corresponding to the spatial frequency of the carrier. If the carrier is itself a complex pattern containing many spatial frequency components, then each component has oppositely moving sidebands of higher and lower spatial frequencies added to it.
Figure 4e and f show spatiotemporal frequency representations of the patterns whose motion is represented as space–time plots in Figure 4c and d. In Figure 4e, in which the carrier is a simple sinusoid, the two extra components corresponding to the sidebands are easily seen. When the carrier contains a large number of static spatial frequency components, as is the case in Figure 4f, the signature is less obvious.
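The sideband structure is easy to verify numerically. In this sketch (illustrative frequencies, chosen to fall on exact FFT bins) a carrier at bin 32 modulated at bin 4 acquires sideband components at bins 28 and 36:

```python
import numpy as np

# Spatial spectrum of a contrast-modulated grating: modulating a carrier
# of frequency fc by (1 + m*sin(2*pi*fm*x)) adds sidebands at fc - fm
# and fc + fm. Frequencies here are integer FFT bins (illustrative).
N = 256
x = np.arange(N)
fc, fm, m = 32 / N, 4 / N, 0.8
pattern = (1 + m * np.sin(2 * np.pi * fm * x)) * np.sin(2 * np.pi * fc * x)

spectrum = np.abs(np.fft.rfft(pattern))
peaks = np.flatnonzero(spectrum > 1)        # bins with substantial energy
print(peaks)                                # -> [28 32 36], i.e. fc-fm, fc, fc+fm
```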
E. Representing Motion in 2-D Velocity Space
As we shall see, the visual system’s early analysis of motion appears to rely on orientation-selective analyzers, each performing an analysis in one spatial dimension of the motion along a particular axis in the 2-D image. Such analyses can conveniently be represented as 2-D space–time plots or spatiotemporal frequency plots; however, to show how analyses from two differently oriented spatial axes can be represented and combined to calculate an unambiguous 2-D motion, we need to represent motion in two spatial dimensions.
6 Seeing Motion 267
This can be done by representing motions as vectors in a space that has horizontal and vertical velocity as its cardinal axes. Some convention is needed to distinguish between vectors that represent 1-D motion along some predetermined axis, and those that represent truly 2-D motions.
In this chapter I shall use the term 1-D vector to refer to a component of a 2-D motion along some predetermined axis, such as the motion of an oriented feature of a moving object or pattern, which must necessarily appear to move along an axis perpendicular to its orientation. The motion is 1-D in that each oriented feature is constrained to move along the axis perpendicular to its orientation, although the axes of the 1-D motions of differently oriented features have different directions in 2-D velocity space. I shall use the term 2-D vector to refer to the motion of a 2-D pattern, which is necessarily 2-D.
Figure 5 shows an example of the reconstruction of 2-D vectors from several 1-D vectors. Each 1-D vector constrains the 2-D vector of the object containing the feature that gave rise to it to lie along a line orthogonal to the 1-D vector and passing through its end point. Where features of several different orientations are present, the constraint lines that they specify will intersect at the same point unless they belong to more than one object moving with more than one velocity. If there are multiple velocities in a pattern they will give rise to multiple intersection points.
FIGURE 5 Velocity space representation of the intersection of constraints construction for resolving the 2-D motion (solid-headed arrow) consistent with several 1-D motion vectors (empty-headed arrows). The dashed line passing through the tip of each 1-D vector shows the constraint line, the range of 2-D velocities consistent with that 1-D velocity. The three constraint lines intersect, indicating that they are all consistent with a single 2-D velocity.
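The intersection-of-constraints construction amounts to solving a small linear system: each 1-D measurement supplies the component speed s_i of the pattern along a unit direction n_i, and the 2-D velocity v must satisfy n_i · v = s_i for every i. A sketch with hypothetical numbers:

```python
import numpy as np

# Intersection-of-constraints sketch (hypothetical values): each 1-D
# measurement gives the speed s_i along a unit normal n_i perpendicular
# to an oriented feature. The 2-D velocity v satisfies n_i . v = s_i
# for every i; with differently oriented features the system pins v
# down (solved here by least squares).
true_v = np.array([3.0, -1.0])
angles = np.deg2rad([0.0, 60.0, 135.0])              # feature-normal directions
normals = np.stack([np.cos(angles), np.sin(angles)], axis=1)
speeds = normals @ true_v                            # the 1-D component speeds

v_hat, *_ = np.linalg.lstsq(normals, speeds, rcond=None)
print(v_hat)                                         # -> [ 3. -1.]
```

With noiseless, consistent constraints the three lines meet at a point; with noise the least-squares solution gives the velocity closest to all constraint lines.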
III. ANALYZING DIRECTION OF MOTION ALONG A GIVEN AXIS
A. Principles and Approaches
Theoretical models of detectors that would be selectively sensitive to one direction of motion along a predetermined axis have been inspired by a broad range of motion-analyzing problems from physiology, psychophysics, and machine vision. Perhaps the earliest motion detector that is still current was inspired by analysis of the optomotor response in the beetle (Reichardt, 1961). Reichardt’s motion detector derives a motion signal by spatiotemporal correlation of luminance signals from neighboring points in the image. The same principle has been applied both to luminance signals and to second-order signals to account for aspects of human motion perception, and to explain the physiological properties of direction-selective neurones.
Other models of human motion detection have been derived from analysis of the signature that motion in a particular direction leaves in the spatiotemporal frequency domain (Watson & Ahumada, 1985), from analyses of contrast energy (Adelson & Bergen, 1985), and from comparison of the spatial and temporal variations in luminance in moving images (Fennema & Thompson, 1979; Johnston, McOwan, & Buxton, 1992). In the following sections I shall introduce examples of the different detectors that have been derived from these approaches and discuss their similarities. In each case I start by considering the motion detector as a device designed to respond selectively to one of the representations of motion shown in Figures 2–4.
1. Correlation
Reichardt’s model of the motion detectors that process the signals from the insect’s compound eye exploits the fact that, when an image moves along a line between two receptors, the two receptors receive time-shifted versions of the same temporal signal. This is illustrated in Figure 6a, which shows vertical slices through the space–time image of Figure 2a. These show the temporal profile of the moving image measured at horizontal positions 0.25 grating periods apart. The two profiles show similar temporal waveforms shifted slightly in time.1
The time shift between the two profiles is simply the time it takes for any part of the image to move through the distance between the two points at which the profile is measured. It can be calculated as the spatial separation between the two points divided by the velocity of the image motion.
Reichardt’s model of the insect motion-detector consists of two stages. The first stage correlates signals from two neighboring receptors by multiplying the signal from one of them by a temporally filtered version of the signal from its neighbor. The temporal filter introduces a time shift, so the multiplier’s output for any given
1There is a slight difference in the profile caused by the fact that the moving grating does not have uniform contrast throughout the whole image but is spatially shaped by a stationary Gaussian window.
FIGURE 6 (a) Temporal profiles of the stimulus in Figure 2 at two points 0.25 spatial periods apart.
(b) Elaborated Reichardt motion detector. Each side multiplies the output of one spatial filter by the delayed output from the other, thereby selecting stimuli moving in the direction from the delayed to the nondelayed filter. Consequently the two multipliers select opposite directions of motion. By taking the difference between the time-averaged outputs of the two multipliers the detector produces an output that indicates the direction of motion by its sign.
moving image is greatest when the velocity of the motion is such that the time shift within the temporal filter matches the time taken for the image to move the distance between the two receptors. Each first stage has a symmetrical partner that differs from it only in that it time-shifts the signal from the second receptor instead of the first, in order to sense motion in the opposite direction. The second stage of the detector simply subtracts the signals of the two first stages.
Reichardt’s motion detector has been developed in order to apply it to human psychophysics (van Santen & Sperling, 1984, 1985). Figure 6b shows an elaborated version that differs from the original in that it has additional filtering stages. Spatial filters at the input restrict the range of spatial frequencies passed by the detector in order to prevent it from responding to high spatial frequencies moving in the opposite direction. Temporal filters at the output integrate the response over time. The spatial filters shown in Figure 6b are identical to each other and spatially offset. To prevent a response to high spatial frequency patterns moving in the opposite direction (spatial aliasing), the filters should remove spatial frequencies above 1/(2d) cycles per degree, where d is the separation between the centers of the input receptive fields (van Santen & Sperling, 1984). An alternative way of arranging the filters is for the two input spatial filters to be superimposed on one another, but to give them spatial profiles that differ in a way that causes a phase shift between them of π/2 for all spatial frequencies in their passband (van Santen & Sperling, 1984).
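The delay-multiply-and-oppose logic can be sketched in a few lines. This toy version (hypothetical parameters; a one-frame shift stands in for the temporal filter, and the input spatial filtering is omitted) shows only the essential property, that the sign of the opponent output follows the direction of motion:

```python
import numpy as np

# Minimal Reichardt correlator sketch: sample a moving sinusoid at two
# points a distance d apart, delay each signal by one frame (a crude
# stand-in for the temporal filter), cross-multiply, and subtract the
# two mirror-image products. The sign of the time-averaged opponent
# output signals the direction of motion.
def reichardt(velocity, fs=0.05, d=4, frames=400):
    t = np.arange(frames)
    a = np.sin(2 * np.pi * fs * (0 - velocity * t))      # receptor at x = 0
    b = np.sin(2 * np.pi * fs * (d - velocity * t))      # receptor at x = d
    a_del, b_del = np.roll(a, 1), np.roll(b, 1)          # one-frame delay
    # Opponent stage: (delayed a)*b minus a*(delayed b), time-averaged;
    # the first sample is dropped because np.roll wraps around.
    return np.mean(a_del[1:] * b[1:] - a[1:] * b_del[1:])

print(reichardt(+2.0) > 0, reichardt(-2.0) < 0)          # -> True True
```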
2. Linear Spatiotemporal Frequency Filters
Watson and Ahumada (1985) took a different approach to designing a motion sensor. They sought to design a spatiotemporal frequency filter that was selective for the characteristic distribution of spatiotemporal frequency components into opposite quadrants, which is the signature of motion in a particular direction.
Their approach was very straightforward. The sensor achieved its selectivity for direction of motion by linearly combining the outputs of component filters that were nondirectional but that had di erent spatial and temporal phase characteristics. The component filters’ phase characteristics were designed so that signals representing the preferred direction of motion would give rise to outputs that reinforced each other, whereas those representing the opposite direction of motion would give rise to outputs that would cancel each other.
The main components of the sensor were a quadrature pair of spatial filters and a quadrature pair of temporal filters that were connected together so that signals took two parallel paths through the sensor. Each path contained one of the quadrature pair of spatial filters and one of the quadrature pair of temporal filters in series. The effect was that spatiotemporal frequencies from adjacent quadrants, which represent opposite directions of motion, generate outputs from the two pathways through the sensor that are opposite in sign. Consequently, adding or subtracting the signals in the two pathways generates variants of the sensor that are identical except that they respond selectively to opposite directions of motion. All the components representing the nonpreferred direction of motion within each sensor are canceled by the addition or subtraction of the first-stage filters.
Figure 7a and b shows plots of the sensitivity profiles of spatial and temporal filters that can be combined to produce a motion sensor of the type devised by Watson and Ahumada (1985).2 The products of these spatial and temporal filters produce four different filters with spatiotemporal profiles that are not oriented in space–time (Figure 7c–f). The lack of orientation in space–time means that the filters are not selective for either direction of movement, although they may be selective for high temporal frequency and thus respond better to moving images than to static images. However, the sums and differences of the separable filters, shown in Figure 7g–k, are oriented to the left or to the right in space–time and consequently are selective for that direction of image motion.
The motion sensors shown in Figure 7g–k are linear in spatial and temporal summation. An image sequence that represents motion in its preferred direction gives rise to the same total output as it would if it were played in reverse to give motion in the opposite direction. The difference is that although the mean level is the same for both directions, motion in the preferred direction modulates the detector output more strongly. For example, sinusoidal gratings moving in different directions all give rise to sinusoidal outputs that have a mean level of zero, but the
2In fact the temporal filters used here are not a quadrature pair; they have the form proposed by Adelson and Bergen (1985).
FIGURE 7 Spatiotemporal filtering operations leading to linear motion sensors and motion-energy filters.
(a) Spatial filters for the front end of a linear motion sensor. The two functions are an even-symmetric and an odd-symmetric Gabor function. (b) Two physiologically plausible temporal filters with different time courses (Adelson & Bergen, 1985). (c, d, e, f) Space–time plots of the spatiotemporal impulse responses of four spatiotemporal filters formed from the products of the spatial and temporal filters in a and b. None of these filters is oriented in space–time. (g, h, j, k) Space–time plots of the spatiotemporal impulse responses of linear motion sensors produced from the sums and differences of the filters c–f as indicated. These filters are oriented in space–time; g and j are selective for leftward motion, and h and k are selective for rightward motion. The pairs of filters selective for each direction of motion form spatial quadrature pairs. Consequently the summed squares of the outputs of the filters tuned to each direction of motion constitute the motion energy signal. The difference between the two motion energy signals (not indicated) would be an opponent motion-energy signal.
amplitude of the sinusoid varies with direction of motion, being greatest for motion in the sensor’s preferred direction and zero for motion in the opposite direction.
Watson and Ahumada (1985) exploit the oscillations in the output of their motion sensor to derive an estimate of image velocity. Because the sensor is selective for spatial frequency, the temporal frequency of the oscillations in its output in response to a spatially broad-band input depends on the speed of the input. Image velocity is estimated by metering the temporal frequency of the outputs of direction-detectors tuned to different axes of motion and fitting a sinusoid.
3. Motion Energy Filtering
The outputs of linear motion sensors can be combined to make a filter selective for oriented contrast energy in the s-t image (Adelson & Bergen, 1985) (shown in Figure 7). The motion energy is constructed by taking the outputs of quadrature pairs of motion sensors tuned to the same direction and speed of motion, squaring them, and summing them. The resulting direction-selective detector is nonlinear: it gives a nonoscillating response to a moving sinusoid. An optional final stage of the motion energy filter (not shown in Figure 7) subtracts the responses of the symmetrical stages tuned to opposite directions of motion and gives an output whose sign signals the direction of motion.
The motion energy filter and the linear motion sensor contain very similar linear spatiotemporal filters that extract a direction-selective signal. The essential difference between the detectors is in the nonlinear processing that follows the direction-selective filtering. Whereas the linear motion sensor uses the temporal frequency of variations in the output from the filter to extract a velocity signal, the energy detector squares and sums the outputs from different filters to produce a smoothly changing output. Van Santen and Sperling (1985) have shown that relatively minor modifications make the Reichardt motion detector exactly equivalent to the opponent motion energy detector and have argued that the three motion-detecting schemes are equivalent; however, this argument ignores the stages of Watson and Ahumada’s model of motion sensing that measure velocity.
The motion energy filter gives no direct information about velocity. The filter’s output depends on contrast, on spatial frequency, and on temporal frequency; however, the effects of contrast and spatial frequency can be discounted by comparing the output of two energy filters tuned to the same spatial frequency but to different temporal frequencies. Adelson and Bergen suggest that in order to estimate velocity the output of the motion energy filter could be divided by the output of a contrast energy filter tuned to zero temporal frequency.
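The energy computation can be sketched directly: quadrature pairs of oriented space–time Gabor filters, one pair per direction, are applied to a movie; each pair's outputs are squared and summed, and the difference of the two energies signals direction by its sign. This is a toy version with illustrative parameters, not the Adelson–Bergen filters themselves:

```python
import numpy as np

# Opponent motion-energy sketch: space-time Gabor quadrature pairs, one
# tuned to rightward and one to leftward motion, are applied to a small
# movie. Squaring and summing each pair gives a phase-invariant energy;
# their difference signals direction of motion by its sign.
def gabor_st(X, T, fx, ft, phase):
    env = np.exp(-(X**2 + T**2) / 50.0)                  # space-time envelope
    return env * np.cos(2 * np.pi * (fx * X + ft * T) + phase)

x = np.arange(-16, 17)
t = np.arange(-16, 17)
X, T = np.meshgrid(x, t)

def opponent_energy(movie, fx=0.1, ft=0.1):
    def energy(ftt):                                     # one quadrature pair
        even = np.sum(movie * gabor_st(X, T, fx, ftt, 0.0))
        odd = np.sum(movie * gabor_st(X, T, fx, ftt, np.pi / 2))
        return even**2 + odd**2
    return energy(-ft) - energy(+ft)                     # rightward - leftward

v = 1.0                                                  # pixels/frame, rightward
movie = np.sin(2 * np.pi * 0.1 * (X - v * T))
print(opponent_energy(movie) > 0)                        # -> True
```

Reversing the stimulus (v = −1) flips the sign of the opponent output; its magnitude, as the text notes, still confounds contrast with speed.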
4. Spatiotemporal Gradients
The spatial variations in luminance in a moving image give rise to temporal variations at di erent points in space when the image moves. If we make the assumption that all the temporal variation in a space–time image is caused by movement, it becomes possible to use the relation between spatial and temporal variations at
any position to infer the nature of the motion rather exactly. Figure 8 illustrates this for a simple luminance gradient in an image. As the gradient moves past a point in space, the luminance at that point rises at a rate that is proportional to the product of the steepness of the spatial gradient and the velocity of translation. This can be expressed exactly in the form
Vx ∂L/∂x = −∂L/∂t,    (2)
where Vx is the velocity along an axis x in the image, L is luminance at any point in the image, x is distance along the axis x in the image, and t is time. The velocity is given by dividing the temporal derivative by the spatial derivative and changing the sign:
Vx = −(∂L/∂t)/(∂L/∂x).    (3)
This approach has been used successfully to derive local velocity signals from a sequence of television images and to segment the image on the basis of the different velocities present (Fennema & Thompson, 1979). This approach, however, does suffer from the problem that the expression for the velocity is a fraction with the spatial luminance gradient as its denominator. When the luminance gradient is small or zero, that is, in parts of the image where the luminance is spatially uniform, the velocity computation will be very noisy.
There are several possible solutions to this problem. The simplest, which was adopted by Fennema and Thompson and has also been incorporated into a specific biological model of motion detection (Marr & Ullman, 1981), is only to make the motion calculation at locations where the spatial luminance gradient has already been identified as being sufficiently steep to produce a reliable result. An alternative is to include higher spatial derivatives in the velocity calculation, which has a stabilizing effect because it is rare for all the derivatives to be zero simultaneously (Johnston et al., 1992).
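A gradient-based estimate with a steep-gradient restriction of the kind Fennema and Thompson used can be sketched as follows (illustrative stimulus and threshold; finite differences stand in for the derivatives):

```python
import numpy as np

# Gradient-based velocity sketch: Vx = -(dL/dt)/(dL/dx), computed only
# where the spatial gradient is steep enough for the ratio to be
# reliable, as in the threshold strategy described in the text.
v_true = 1.5                                  # pixels/frame (illustrative)
x = np.arange(128, dtype=float)

def frame(t):
    return np.sin(2 * np.pi * 0.05 * (x - v_true * t))

L0, L1 = frame(0.0), frame(1.0)
dL_dx = np.gradient(L0)                       # central-difference spatial gradient
dL_dt = L1 - L0                               # forward-difference temporal gradient

mask = np.abs(dL_dx) > 0.5 * np.abs(dL_dx).max()   # steep-gradient locations only
v_est = np.median(-dL_dt[mask] / dL_dx[mask])
print(round(v_est, 2))
```

Because of the finite-difference approximations the estimate comes out slightly below the true 1.5 pixels/frame, but the steep-gradient mask keeps the ratio well behaved; without the mask, points where dL/dx ≈ 0 would dominate the error.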
5. Similarity between the Spatiotemporal Gradient Approach and Motion Energy Filtering
Although on the face of it the computation of velocity as the ratio of the local temporal and spatial luminance derivatives in an image seems very different from the filtering approach, in fact there are ways of expressing the two approaches that bring them
FIGURE 8 Spatial luminance gradient. When it moves to the left the luminance at any point on the gradient will decrease at a rate proportional to the velocity and vice versa.
