
264 Andrew Derrington

Ahumada, 1983, 1985). The axes of Figure 3 are spatial frequency and temporal frequency. Moving components of the image have a temporal frequency that is given by the product of their velocity and their spatial frequency. Thus the components of an extended moving image will fall on a straight line whose gradient is the image velocity. This means that the velocity in one spatial dimension of an image can be estimated by a combination of spatial frequency filtering and temporal frequency estimation (Watson & Ahumada, 1985).
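The relation between velocity, spatial frequency, and temporal frequency can be sketched in a few lines of code. This is a minimal illustration; the function name, units, and numbers are my own, not the chapter's:

```python
# Temporal frequency of a drifting component: f_t = v * f_s.
def temporal_frequency(velocity_deg_per_s, spatial_freq_cpd):
    """Temporal frequency (Hz) produced by a component of the given spatial
    frequency (cycles/deg) drifting at the given velocity (deg/s)."""
    return velocity_deg_per_s * spatial_freq_cpd

# All components of a rigidly moving 1-D image lie on the line f_t = v * f_s,
# whose slope is the velocity:
v = 4.0  # deg/s (illustrative value)
for f_s in (0.5, 1.0, 2.0):
    print(f_s, temporal_frequency(v, f_s))  # 2.0, 4.0, 8.0 Hz
```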

In a more general sense, the direction (sign) of the motion determines the sign of the temporal frequencies generated by different spatial frequencies. Motion to the left (negative velocities) generates positive temporal frequencies from positive spatial frequencies and negative temporal frequencies from negative spatial frequencies, whereas motion to the right generates the opposite pairs of sign combinations. Accordingly, the direction of motion is signaled by whether the spatiotemporal frequency components it generates lie in the first and third or second and fourth quadrants of the spatiotemporal frequency diagram.

Because it is truncated in space and time, our moving grating is spread out in spatial and temporal frequency. Its spatiotemporal frequency components form a 2-D Gaussian pulse centered on the line that corresponds to its notional velocity. In a 3-D plot they would also be spread slightly in orientation about the nominal orientation of the original grating (vertical). The components of an extended two-dimensional object occupy a plane tilted in temporal frequency. Again, image motion can be estimated by using the outputs of filters selective for spatial frequency, orientation, and temporal frequency to estimate the azimuth and elevation of this plane (Heeger, 1987).

D. Second-Order Motion

In the moving image we have considered so far, a moving grating is seen through a stationary window, and the motion is signaled directly by the spatiotemporal variations in the luminance profile of the stimulus. However, it is also important to consider other kinds of moving stimuli in which luminance patterns remain stationary, but higher order attributes of the patterns move. One example of this is the contrast-modulated grating, shown in Figures 4a and b, which consists of a carrier pattern, usually a high spatial frequency sinusoid or a random 1-D or 2-D noise pattern, whose contrast is modulated by a spatial envelope of lower spatial frequency. As Figure 4c and d show, the envelope can be made to move while leaving the carrier stationary. This is an example of second-order, or non-Fourier, motion.

The human visual system can discriminate the motion of a contrast envelope even when the motion of the luminance signals that make it up is undetectable (Badcock & Derrington, 1985). This makes it important to consider how such motion may be represented and how it may be processed. When these second-order patterns are made to move, the luminance pattern itself remains static and the contrast modulation moves. This is made clear in Figure 4c and d, which show a space–

FIGURE 4 Second-order patterns and representations of their motion. (a) A contrast-modulated sinusoidal grating and (b) a contrast-modulated 1-D random-noise pattern. The modulation frequency (expressed in cycles/picture) is the same as in Figures 1 and 2. Space–time plots of the same two patterns moving at 0.5 image frames/sec are shown in (c) and (d). Note that the carriers are stationary: their bars are vertical; only the contrast envelope is tilted. (e), (f) The spatiotemporal frequency representations of (c) and (d).


time plot of a slice through the middle of each of the second-order patterns from Figure 4, moving at a speed of 0.1 image frames per second.

In each case the luminance variation is marked by dark and light lines that run vertically through the figure, giving it a local horizontal profile that is either sinusoidal (4c) or random (4d). The fact that the lines representing the luminance profiles run vertically through the plot indicates that the carriers themselves are stationary. But the horizontal sinusoidal contrast envelope that modulates each carrier pattern is clearly moving: the contrast envelope shifts gradually towards the left as one moves up the time axis.

The motion of a contrast-modulated sine wave also has a signature in the frequency domain. This comes from the fact that modulating a sinusoidal carrier of spatial frequency f_c at some modulation frequency f_m is the same as adding sidebands of frequencies f_c + f_m and f_c − f_m to it:

{1 + m cos 2π( f_m x + f_t t )} sin(2π f_c x) = sin(2π f_c x) + (m/2){ sin 2π( f_c x + f_m x + f_t t ) + sin 2π( f_c x − f_m x − f_t t ) },   (1)

where m is the modulation depth and f_t is the temporal frequency of the drifting envelope.

The higher frequency sideband moves in the same direction as the contrast modulation, and the lower frequency sideband moves in the opposite direction. This means that, as with moving luminance patterns, the motion is represented by a group of spatiotemporal frequency components that lie along a line whose slope gives the velocity of the motion. However, unlike the situation with a moving luminance pattern, the line does not pass through the origin. Instead it crosses the spatial-frequency axis at a location corresponding to the spatial frequency of the carrier. If the carrier is itself a complex pattern containing many spatial frequency components, then each component has oppositely moving sidebands of higher and lower spatial frequencies added to it.
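The sideband identity of Equation (1) can be checked numerically. The sketch below uses arbitrary illustrative frequencies and a cosine-phase envelope, for which the identity holds exactly:

```python
import numpy as np

# Check: modulating a carrier of spatial frequency f_c by a drifting envelope
# (spatial frequency f_m, temporal frequency f_t) equals the carrier plus two
# drifting sidebands at spatial frequencies f_c + f_m and f_c - f_m.
f_c, f_m, f_t, m = 8.0, 1.0, 0.5, 0.8   # illustrative values, not the chapter's
x = np.linspace(0.0, 1.0, 257)          # one picture width of space
t = 0.3                                 # a single instant in time

lhs = (1.0 + m * np.cos(2*np.pi*(f_m*x + f_t*t))) * np.sin(2*np.pi*f_c*x)
rhs = (np.sin(2*np.pi*f_c*x)
       + (m/2) * (np.sin(2*np.pi*(f_c*x + f_m*x + f_t*t))
                  + np.sin(2*np.pi*(f_c*x - f_m*x - f_t*t))))

print(np.allclose(lhs, rhs))  # True
```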

Figure 4e and f show spatiotemporal frequency representations of the patterns whose motion is represented as space–time plots in Figure 4c and d. In Figure 4e, in which the carrier is a simple sinusoid, the two extra components corresponding to the sidebands are easily seen. When the carrier contains a large number of static spatial frequency components, as is the case in Figure 4f, the signature is less obvious.

E. Representing Motion in 2-D Velocity Space

As we shall see, the visual system’s early analysis of motion appears to rely on orientation-selective analyzers, each performing an analysis in one spatial dimension of the motion along a particular axis in the 2-D image. Such analyses can conveniently be represented as 2-D space–time plots or spatiotemporal frequency plots; however, to show how analyses from two differently oriented spatial axes can be represented and combined to calculate an unambiguous 2-D motion, we need to represent motion in two spatial dimensions.

6 Seeing Motion

267

This can be done by representing motions as vectors in a space that has horizontal and vertical velocity as its cardinal axes. Some convention is needed to distinguish between vectors that represent 1-D motion along some predetermined axis, and those that represent truly 2-D motions.

In this chapter I shall use the term 1-D vector to refer to a component of a 2-D motion along some predetermined axis, such as the motion of an oriented feature of a moving object or pattern, which must necessarily appear to move along an axis perpendicular to its orientation. The motion is 1-D in that each oriented feature is constrained to move along the axis perpendicular to its orientation, although the axes of the 1-D motions of differently oriented features have different directions in 2-D velocity space. I shall use the term 2-D vector to refer to the motion of a 2-D pattern, which is necessarily 2-D.

Figure 5 shows an example of the reconstruction of 2-D vectors from several 1-D vectors. Each 1-D vector constrains the 2-D vector of the object containing the feature that gave rise to it to lie along a line orthogonal to the 1-D vector and passing through its end point. Where features of several different orientations are present, the constraint lines that they specify will intersect at the same point unless they belong to more than one object moving with more than one velocity. If there are multiple velocities in a pattern they will give rise to multiple intersection points.
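The intersection-of-constraints construction amounts to solving a small linear system: each 1-D vector contributes one equation, and the common intersection point is the (least-squares) solution. A sketch, with function name and numbers invented for illustration:

```python
import numpy as np

# Each 1-D (component) vector gives one constraint n . V = s, where n is the
# unit vector along the component's axis of motion and s its speed along that
# axis. Two or more differently oriented components pin down the 2-D velocity.
def intersect_constraints(normals, speeds):
    N = np.asarray(normals, dtype=float)       # one unit vector per row
    s = np.asarray(speeds, dtype=float)
    V, *_ = np.linalg.lstsq(N, s, rcond=None)  # least-squares solution of N V = s
    return V

# A pattern translating at V = (3, 2), with components oriented so their
# 1-D motion axes lie at 0, 60, and 120 degrees.
true_V = np.array([3.0, 2.0])
angles = np.deg2rad([0.0, 60.0, 120.0])
normals = np.stack([np.cos(angles), np.sin(angles)], axis=1)
speeds = normals @ true_V                      # speed seen along each axis
print(intersect_constraints(normals, speeds))  # approximately [3. 2.]
```

If the pattern contained two objects with different velocities, the system would be inconsistent and the constraint lines would no longer share a single intersection point.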

FIGURE 5 Velocity space representation of the intersection of constraints construction for resolving the 2-D motion (solid-headed arrow) consistent with several 1-D motion vectors (empty-headed arrows). The dashed line passing through the tip of each 1-D vector shows the constraint line, the range of 2-D velocities consistent with that 1-D velocity. The three constraint lines intersect, indicating that they are all consistent with a single 2-D velocity.


III. ANALYZING DIRECTION OF MOTION ALONG A GIVEN AXIS

A. Principles and Approaches

Theoretical models of detectors that would be selectively sensitive to one direction of motion along a predetermined axis have been inspired by a broad range of motion-analyzing problems from physiology, psychophysics, and machine vision. Perhaps the earliest motion detector that is still current was inspired by analysis of the optomotor response in the beetle (Reichardt, 1961). Reichardt’s motion detector derives a motion signal by spatiotemporal correlation of luminance signals from neighboring points in the image. The same principle has been applied both to luminance signals and to second-order signals to account for aspects of human motion perception, and to explain the physiological properties of direction-selective neurones.

Other models of human motion detection have been derived from analysis of the signature that motion in a particular direction leaves in the spatiotemporal frequency domain (Watson & Ahumada, 1985), from analyses of contrast energy (Adelson & Bergen, 1985), and from comparison of the spatial and temporal variations in luminance in moving images (Fennema & Thompson, 1979; Johnston, McOwan, & Buxton, 1992). In the following sections I shall introduce examples of the different detectors that have been derived from these approaches and discuss their similarities. In each case I start by considering the motion-detector as a device designed to respond selectively to one of the representations of motion shown in Figures 2–4.

1. Correlation

Reichardt’s model of the motion detectors that process the signals from the insect’s compound eye exploits the fact that, when an image moves along a line between two receptors, the two receptors receive time-shifted versions of the same temporal signal. This is illustrated in Figure 6a, which shows vertical slices through the space–time image of Figure 2a. These show the temporal profile of the moving image measured at horizontal positions 0.25 grating periods apart. The two profiles show similar temporal waveforms shifted slightly in time.1

The time shift between the two profiles is simply the time it takes for any part of the image to move through the distance between the two points at which the profile is measured. It can be calculated as the spatial separation between the two points divided by the velocity of the image motion.

Reichardt’s model of the insect motion-detector consists of two stages. The first stage correlates signals from two neighboring receptors by multiplying the signal from one of them by a temporally filtered version of the signal from its neighbor. The temporal filter introduces a time shift, so the multiplier’s output for any given

1 There is a slight difference in the profile caused by the fact that the moving grating does not have uniform contrast throughout the whole image but is spatially shaped by a stationary Gaussian window.


FIGURE 6 (a) Temporal profiles of the stimulus in Figure 2 at two points 0.25 spatial periods apart.

(b) Elaborated Reichardt motion detector. Each side multiplies the output of one spatial filter by the delayed output from the other, thereby selecting stimuli moving in the direction from the delayed to the nondelayed filter. Consequently the two multipliers select opposite directions of motion. By taking the difference between the time-averaged outputs of the two multipliers the detector produces an output that indicates the direction of motion by its sign.

moving image is greatest when the velocity of the motion is such that the time shift within the temporal filter matches the time taken for the image to move the distance between the two receptors. Each first stage has a symmetrical partner that differs from it only in that it time-shifts the signal from the second receptor instead of the first, in order to sense motion in the opposite direction. The second stage of the detector simply subtracts the signals of the two first stages.

Reichardt’s motion detector has been developed in order to apply it to human psychophysics (van Santen & Sperling, 1984, 1985). Figure 6b shows an elaborated version that differs from the original in that it has additional filtering stages. Spatial filters at the input restrict the range of spatial frequencies passed by the detector in order to prevent it from responding to high spatial frequencies moving in the opposite direction. Temporal filters at the output integrate the response over time. The spatial filters shown in Figure 6b are identical to each other and spatially offset. To prevent a response to high spatial frequency patterns moving in the opposite direction (spatial aliasing), the filters should remove spatial frequencies above 1/(2d) cycles per degree, where d is the separation between the centers of the input receptive fields (van Santen & Sperling, 1984). An alternative way of arranging the filters is for the two input spatial filters to be superimposed on one another, but to give them spatial profiles that differ in a way that causes a phase shift between them of π/2 for all spatial frequencies in their passband (van Santen & Sperling, 1984).
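The core correlation scheme can be sketched in a few lines. This toy version replaces the temporal filter with a one-sample delay and omits the input spatial filters and output temporal integration, so it illustrates the principle rather than the elaborated model:

```python
import numpy as np

# Minimal Reichardt correlator: correlate each receptor's signal with a
# delayed copy of its neighbor's, then subtract the mirror-image subunit.
def reichardt(left, right):
    """left, right: luminance time series from two neighboring points.
    Time-averaged opponent output: positive for left-to-right motion."""
    a = right[1:] * left[:-1]   # right signal times delayed left signal
    b = left[1:] * right[:-1]   # mirror subunit, prefers the other direction
    return np.mean(a - b)

# Drifting sinusoid sampled at two points 0.25 spatial periods apart;
# for rightward motion the right-hand point sees a delayed copy of the left.
t = np.arange(200)
drift = 0.05 * 2 * np.pi                   # temporal phase step per sample
left = np.sin(drift * t)
right = np.sin(drift * t - 0.5 * np.pi)    # right lags left by 0.25 period

print(reichardt(left, right) > 0)   # True: rightward
print(reichardt(right, left) < 0)   # True: reversed motion flips the sign
```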


2. Linear Spatiotemporal Frequency Filters

Watson and Ahumada (1985) took a different approach to designing a motion sensor. They sought to design a spatiotemporal frequency filter that was selective for the characteristic distribution of spatiotemporal frequency components into opposite quadrants, which is the signature of motion in a particular direction.

Their approach was very straightforward. The sensor achieved its selectivity for direction of motion by linearly combining the outputs of component filters that were nondirectional but that had different spatial and temporal phase characteristics. The component filters’ phase characteristics were designed so that signals representing the preferred direction of motion would give rise to outputs that reinforced each other, whereas those representing the opposite direction of motion would give rise to outputs that would cancel each other.

The main components of the sensor were a quadrature pair of spatial filters and a quadrature pair of temporal filters that were connected together so that signals took two parallel paths through the sensor. Each path contained one of the quadrature pair of spatial filters and one of the quadrature pair of temporal filters in series. The effect was that spatiotemporal frequencies from adjacent quadrants, which represent opposite directions of motion, generate outputs from the two pathways through the sensor that are opposite in sign. Consequently, adding or subtracting the signals in the two pathways generates variants of the sensor that are identical except that they respond selectively to opposite directions of motion. All the components representing the nonpreferred direction of motion within each sensor are canceled by the addition or subtraction of the first-stage filters.

Figure 7a and b show plots of the sensitivity profiles of spatial and temporal filters that can be combined to produce a motion sensor of the type devised by Watson and Ahumada (1985).2 The products of these spatial and temporal filters produce four different filters with spatiotemporal profiles that are not oriented in space–time (Figure 7c–f). The lack of orientation in space–time means that the filters are not selective for either direction of movement, although they may be selective for high temporal frequency and thus respond better to moving images than to static images. However, the sums and differences of the separable filters, shown in Figure 7g–k, are oriented to the left or to the right in space–time and consequently are selective for that direction of image motion.
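The construction can be sketched numerically: separable products of quadrature spatial and temporal kernels are summed and differenced to give filters oriented in space–time, and hence direction selective. The kernels and parameters below (a Gabor spatial pair and a toy, approximately quadrature temporal pair) are my own illustrative choices, not the chapter's exact filters:

```python
import numpy as np

x = np.linspace(-3, 3, 61)      # space, arbitrary units
t = np.linspace(0, 3, 61)       # time
env = np.exp(-x**2)
s_even, s_odd = env * np.cos(2*np.pi*x), env * np.sin(2*np.pi*x)  # spatial pair
tenv = t * np.exp(-t)
t_a, t_b = tenv * np.cos(2*np.pi*t), tenv * np.sin(2*np.pi*t)     # temporal pair

# Separable products are not direction selective; their sums and differences
# approximate cos 2pi(x - t) and cos 2pi(x + t) envelopes: oriented filters.
f_right = np.outer(s_even, t_a) + np.outer(s_odd, t_b)   # prefers rightward
f_left  = np.outer(s_even, t_a) - np.outer(s_odd, t_b)   # prefers leftward

T, X = np.meshgrid(t, x)

def squared_amplitude(filt, v):
    """Squared response amplitude to a drifting grating of velocity v,
    combining cos- and sin-phase stimuli to remove phase dependence."""
    rc = np.sum(filt * np.cos(2*np.pi*(X - v*T)))
    rs = np.sum(filt * np.sin(2*np.pi*(X - v*T)))
    return rc**2 + rs**2

print(squared_amplitude(f_right, +1.0) > squared_amplitude(f_right, -1.0))  # True
print(squared_amplitude(f_left, -1.0) > squared_amplitude(f_left, +1.0))    # True
```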

The motion sensors shown in Figure 7g–k are linear in spatial and temporal summation. An image sequence that represents motion in its preferred direction gives rise to the same total output as it would if it were played in reverse to give motion in the opposite direction. The difference is that although the mean level is the same for both directions, motion in the preferred direction modulates the detector output more strongly. For example, sinusoidal gratings moving in different directions all give rise to sinusoidal outputs that have a mean level of zero, but the

2 In fact the temporal filters used here are not a quadrature pair; they have the form proposed by Adelson and Bergen (1985).

FIGURE 7 Spatiotemporal filtering operations leading to linear motion sensors and motion-energy filters.

(a) Spatial filters for the front end of a linear motion sensor. The two functions are an even-symmetric and an odd-symmetric Gabor function. (b) Two physiologically plausible temporal filters with different time-courses (Adelson & Bergen, 1985). (c, d, e, f) Space–time plots of the spatiotemporal impulse responses of four spatiotemporal filters formed from the products of the spatial and temporal filters in a and b. None of these filters is oriented in space–time. (g, h, j, k) Space–time plots of the spatiotemporal impulse responses of linear motion sensors produced from the sums and differences of the filters c–f as indicated. These filters are oriented in space–time; g and j are selective for leftward motion, and h and k are selective for rightward motion. The pairs of filters selective for each direction of motion form spatial quadrature pairs. Consequently the summed squares of the outputs of the filters tuned to each direction of motion constitute the motion energy signal. The difference between the two motion energy signals (not indicated) would be an opponent motion-energy signal.


amplitude of the sinusoid varies with direction of motion, being greatest for motion in the sensor’s preferred direction and zero for motion in the opposite direction.

Watson and Ahumada (1985) exploit the oscillations in the output of their motion sensor to derive an estimate of image velocity. Because the sensor is selective for spatial frequency, the temporal frequency of the oscillations in its output in response to a spatially broad-band input depends on the speed of the input. Image velocity is estimated by metering the temporal frequency of the outputs of direction-detectors tuned to different axes of motion and fitting a sinusoid.

3. Motion Energy Filtering

The outputs of linear motion sensors can be combined to make a filter selective for oriented contrast energy in the s–t image (Adelson & Bergen, 1985) (shown in Figure 7). The motion energy signal is constructed by taking the outputs of quadrature pairs of motion sensors tuned to the same direction and speed of motion, squaring them, and summing them. The resulting direction-selective detector is nonlinear: it gives a nonoscillating response to a moving sinusoid. An optional final stage of the motion energy filter (not shown in Figure 7) subtracts the responses of the symmetrical stages tuned to opposite directions of motion and gives an output whose sign signals the direction of motion.
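Why squaring and summing a quadrature pair removes the oscillation can be seen with idealized sensor outputs. In this toy sketch the two sinusoids stand in for the oscillating responses of the paired linear sensors, not for any specific filter in Figure 7:

```python
import numpy as np

# For a drifting grating, the two oriented filters of a quadrature pair
# respond with sinusoids 90 degrees apart; the summed squares are therefore
# constant over time (cos^2 + sin^2 = 1): a smooth, nonoscillating energy.
t = np.linspace(0, 2, 201)
omega = 2 * np.pi * 3.0          # output temporal frequency (arbitrary)
r_even = np.cos(omega * t)       # idealized output of the even-phase sensor
r_odd = np.sin(omega * t)        # idealized output of its quadrature partner

energy = r_even**2 + r_odd**2
print(np.allclose(energy, 1.0))  # True: each output oscillates, energy does not
```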

The motion energy filter and the linear motion sensor contain very similar linear spatiotemporal filters that extract a direction-selective signal. The essential difference between the detectors is in the nonlinear processing that follows the direction-selective filtering. Whereas the linear motion sensor uses the temporal frequency of variations in the output from the filter to extract a velocity signal, the energy detector squares and sums the outputs from different filters to produce a smoothly changing output. Van Santen and Sperling (1985) have shown that relatively minor modifications make the Reichardt motion detector exactly equivalent to the opponent motion energy detector and have argued that the three motion-detecting schemes are equivalent; however, this argument ignores the stages of Watson and Ahumada’s model of motion sensing that measure velocity.

The motion energy filter gives no direct information about velocity. The filter’s output depends on contrast, on spatial frequency, and on temporal frequency; however, the effects of contrast and spatial frequency can be discounted by comparing the outputs of two energy filters tuned to the same spatial frequency but to different temporal frequencies. Adelson and Bergen suggest that in order to estimate velocity the output of the motion energy filter could be divided by the output of a contrast energy filter tuned to zero temporal frequency.
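The point of the suggested division can be illustrated with a toy computation. The quadratic "tuning" below is invented purely for illustration; what matters is only that both energies scale with contrast squared, so contrast cancels in the ratio:

```python
# Toy energies: both are quadratic in contrast, so their ratio depends on
# temporal frequency (hence speed) but not on contrast.
def energies(contrast, temporal_freq):
    motion_energy = (contrast * temporal_freq)**2  # invented speed-dependent tuning
    static_energy = contrast**2                    # zero-temporal-frequency channel
    return motion_energy, static_energy

for c in (0.1, 0.2, 0.4):
    me, se = energies(c, temporal_freq=3.0)
    print(me / se)   # same value every time: contrast cancels, speed survives
```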

4. Spatiotemporal Gradients

The spatial variations in luminance in a moving image give rise to temporal variations at different points in space when the image moves. If we make the assumption that all the temporal variation in a space–time image is caused by movement, it becomes possible to use the relation between spatial and temporal variations at


any position to infer the nature of the motion rather exactly. Figure 8 illustrates this for a simple luminance gradient in an image. As the gradient moves past a point in space, the luminance at that point rises at a rate that is proportional to the product of the steepness of the spatial gradient and the velocity of translation. This can be expressed exactly in the form

V_x ∂L/∂x = −∂L/∂t,   (2)

where V_x is the velocity along an axis x in the image, L is luminance at any point in the image, x is distance along the axis x in the image, and t is time. The velocity is given by dividing the temporal derivative by the spatial derivative:

V_x = −(∂L/∂t)/(∂L/∂x).   (3)

This approach has been used successfully to derive local velocity signals from a sequence of television images and to segment the image on the basis of the different velocities present (Fennema & Thompson, 1979). This approach, however, does suffer from the problem that the expression for the velocity is a fraction with the spatial luminance gradient as its denominator. When the luminance gradient is small or zero, that is, in parts of the image where the luminance is spatially uniform, the velocity computation will be very noisy.
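Both Equation (3) and the small-denominator problem can be seen in a short simulation. The synthetic image, finite-difference derivatives, gradient threshold, and sizes below are all arbitrary illustrative choices:

```python
import numpy as np

# Gradient-based velocity estimate, Vx = -(dL/dt)/(dL/dx), on a synthetic
# sinusoidal pattern translating at a known speed.
nx, nt, v_true = 200, 50, 2.0   # pixels, frames, pixels per frame
x = np.arange(nx)
frames = np.stack([np.sin(2*np.pi*(x - v_true*t) / 40.0) for t in range(nt)])

dL_dt = frames[1:, :] - frames[:-1, :]         # temporal finite difference
mid = 0.5 * (frames[1:, :] + frames[:-1, :])   # align spatial derivative in time
dL_dx = np.gradient(mid, axis=1)               # spatial finite difference

# Where dL/dx is near zero the ratio blows up, so estimate only at points
# where the spatial gradient is steep enough to be reliable.
mask = np.abs(dL_dx) > 0.05
v_est = np.median(-dL_dt[mask] / dL_dx[mask])
print(v_est)   # close to v_true = 2.0 (small bias from finite differences)
```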

There are several possible solutions to this problem. The simplest, which was adopted by Fennema and Thompson and has also been incorporated into a specific biological model of motion detection (Marr & Ullman, 1981), is only to make the motion calculation at locations where the spatial luminance gradient has already been identified as being sufficiently steep to produce a reliable result. An alternative is to include higher spatial derivatives in the velocity calculation, which has a stabilizing effect because it is rare for all the derivatives to be zero simultaneously (Johnston et al., 1992).

5. Similarity between the Spatiotemporal Gradient Approach and Motion Energy Filtering

Although on the face of it the computation of velocity as the ratio of the local temporal and spatial luminance derivatives in an image seems very different from the filtering approach, in fact there are ways of expressing the two approaches that bring them

FIGURE 8 Spatial luminance gradient. When it moves to the left the luminance at any point on the gradient will decrease at a rate proportional to the velocity and vice versa.
