274 13 Understanding Perceptual Grouping: Contour Integration
Fig. 13.1. Demonstration of contour integration. Look briefly at the circular area filled with line segments above; a continuous contour from top left to the right and slightly down should immediately emerge. This process is called contour integration: A good continuation of contour elements leads to a vivid perception of a single object. This function is believed to arise automatically in the orientation map of V1, mediated by lateral connections (Section 13.1.1).
Psychophysical experiments (Elder and Goldberg 2002; Field et al. 1993; Geisler et al. 2001; McIlhagga and Mullen 1996; Pettet et al. 1998; see Hess, Hayes, and Field 2004 for a review) suggest that there exists a highly specific pattern of interactions among the contour elements. Such interactions allow contour elements at certain positions and orientations to be more visible than others. For example, Field et al. (1993) conducted a series of experiments where each subject was told to find a contour of similarly oriented Gabor patterns embedded among randomly oriented Gabor patterns. Several factors affected the performance. The relative orientation of successive contour elements (i.e. orientation jitter) along the longest contour (the path) was the most important factor. When the orientation of successive contour elements differed more, the performance degraded. Other factors, such as distance between elements and phase difference between successive Gabor patterns, also affected performance but to a lesser extent.
Based upon these results, Field et al. suggested that local interactions between contour elements follow specific rules and form the basis for contour integration in humans. In other words, these constraints form a local association field that governs how differently oriented contour elements should interact to form a coherent group (Figure 13.2). An association field can be described with two rules: (1) contour elements positioned on a smooth path (Figure 13.2a) and (2) contour elements aligned collinearly along the path (Figure 13.2b) are more likely to be perceived as belonging to the same contour. This idea can be formalized as a mathematical theory, and extended to account for non-uniform variance in the association profile as well (Ben-Shahar and Zucker 2004).
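As an illustration, the two association-field rules can be sketched as a pairwise affinity function between contour elements. The Gaussian falloffs, their widths, and the cocircularity approximation below are assumptions of this sketch, not parameters fitted by Field et al.:

```python
import math

def association_strength(dx, dy, theta_b,
                         sigma_d=4.0, sigma_o=math.radians(20)):
    """Pairwise affinity between a reference element at the origin
    (horizontal, orientation 0) and a second element at (dx, dy) with
    orientation theta_b. Implements the two association-field rules:
    (a) position on a smooth (cocircular) path, and (b) orientation
    tangent to that path. All parameters are illustrative."""
    d = math.hypot(dx, dy)
    if d == 0:
        return 0.0
    phi = math.atan2(dy, dx)          # direction of the connecting path
    # For a cocircular continuation of a horizontal element, the ideal
    # orientation of the second element is 2*phi (chord-tangent relation).
    ideal = 2 * phi
    # Orientation error, wrapped to [-pi/2, pi/2] since bars are axial.
    err = (theta_b - ideal + math.pi / 2) % math.pi - math.pi / 2
    # Penalize sharply bending paths (large phi) as well.
    bend = (phi + math.pi / 2) % math.pi - math.pi / 2
    return (math.exp(-d**2 / (2 * sigma_d**2))
            * math.exp(-err**2 / (2 * sigma_o**2))
            * math.exp(-bend**2 / (2 * sigma_o**2)))
```

A collinear neighbor along the element's axis yields a strong affinity, while a flanking element or a perpendicular orientation yields a near-zero one, matching the solid vs. dashed cases in Figure 13.2.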
Pettet et al. (1998) further confirmed that lateral interactions between neighboring contour elements follow well-defined constraints. They compared the performance of human subjects with a model consisting of fixed lateral interaction constraints similar to the association field, and found that the model explained psychophysical data very well. In particular, it was consistent with the earlier result of Kovacs and Julesz (1993) showing that closed contours were easier to detect than
13.1 Psychophysical and Computational Background 275
(a) Position (b) Orientation
Fig. 13.2. Association fields for contour integration. The circular disks with black and white oriented bars represent Gabor wavelets, the typical input contour elements used in the study by Field et al. (1993). Whether two elements interact depends on two factors: (a) the elements’ positions have to lie on a smooth (collinear or cocircular) contour, and (b) their orientations have to be parallel to that contour. The solid lines indicate cases where integration occurs, and the dashed lines those where it does not. Adapted from Field et al. (1993).
open-ended contours. Pettet et al. reasoned that the missing lateral interaction between the two ends of the open-ended contour made it harder to perceive the contour, whereas reverberating lateral interaction along the closed loop made it easier.
Geisler et al. (2000, 2001) took a different approach to identifying the conditions that govern contour integration. Instead of proposing rules based on human performance, they extracted the rules from edge-cooccurrence statistics measured in natural images (Section 13.2.3). Edge-detected natural images were decomposed into outline figures, consisting of short oriented edges. The cooccurrence probability of each pair of edges belonging to the same physical contour in natural images was then calculated. Such edge-cooccurrence statistics (as also reported by Elder and Goldberg 2002; Krüger and Wörgötter 2002; Sigman, Cecchi, Gilbert, and Magnasco 2001) turned out to be very similar to the lateral interaction rules proposed by Field et al. and Pettet et al. Furthermore, Geisler et al. devised a method of extracting contours using these cooccurrence statistics: Two edges are grouped together if the probability of the edges occurring in the given configuration exceeds a threshold. Larger groups are then formed transitively: A and C are grouped together if A and B can be grouped together and B and C can be grouped together. Geisler et al. showed that this method of grouping accurately predicts human performance. Thus, they showed that the statistical structure in the environment closely corresponds to human perceptual grouping.
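The thresholding-plus-transitivity step can be sketched with a union-find structure. The pairwise probability function and the threshold below are placeholders standing in for Geisler et al.'s measured co-occurrence statistics:

```python
def group_edges(n_edges, pair_prob, threshold=0.5):
    """Transitive grouping: edges i and j join the same group whenever
    pair_prob(i, j) exceeds the threshold; groups then merge
    transitively (A~B and B~C implies A~C). Union-find sketch;
    pair_prob stands in for the measured co-occurrence statistic."""
    parent = list(range(n_edges))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    for i in range(n_edges):
        for j in range(i + 1, n_edges):
            if pair_prob(i, j) > threshold:
                parent[find(i)] = find(j)   # merge the two groups

    groups = {}
    for i in range(n_edges):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

For example, if edges 0–1 and 1–2 each exceed the threshold but edge 3 pairs with nothing, the result is one three-edge contour plus an isolated background element.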
Moreover, Geisler et al. found that the statistical model explains more of the perceptual performance than previously believed, including the difference between open and closed contours. First, when the complexity of the contours and the eccentricity of the edge elements was carefully controlled, the advantage of closed contours disappeared in most cases. Where it still existed, it was perfectly predicted by the statistical model (Tversky, Geisler, and Perry 2004). Therefore, closed contours may
(a) Kanizsa triangle (b) Ehrenstein figure
Fig. 13.3. Edge-induced vs. line-end-induced illusory contours. The figures illustrate the two mechanisms believed to underlie the process of forming illusory contours in the visual system: (a) The hovering white triangle is formed by edge inducers, i.e. by continuing the separate, collinearly aligned edges, and (b) the bright circle in the middle is formed by line-end inducers, i.e. by connecting the ends of the lines into a continuous circle.
be easier to perceive simply because each segment is a good continuation of a nearby segment in each direction, without any special reverberating mechanisms along the contour.
Contour integration takes place not only with disjoint line segments, but also with illusory contours, i.e. perceived boundaries where no luminance contrast actually exists. Illusory contours can be triggered by two types of stimulus configuration: edge inducers and line-end inducers. The Kanizsa triangle in Figure 13.3a is a representative example of edge-induced illusory contours, where the contour forms collinearly with the inducing edges of the pacmen. The Ehrenstein pattern in Figure 13.3b is an example of line-end-induced illusory contours, where the boundary of the circle is orthogonal to the line ends near the center. Following the initial discovery by Schumann (1900), illusory contours were studied in depth by Ehrenstein (1941) and Kanizsa (1955). They have become an important subject in visual perception research as a test case in figure-ground separation, object recognition, and perceptual grouping in general (Lesher and Mingolla 1995; Petry and Meyer 1987).
Early on, there were two main theories of illusory contour perception: the bottom-up brightness theory and the top-down cognitive factor theory. Brightness theory maintained that illusory contours arise from a low-level mechanism that gives brightness to areas enclosed by illusory boundaries. In contrast, cognitive theorists argued that illusory contours are purely a high-level cognitive phenomenon. However, more recent evidence suggests that neither of these theories can account for the full range of illusory contour phenomena. Illusory contours were discovered that arise from image configurations without subjective brightness, providing a counterexample to the brightness theory (Hoffman 1998; Kanizsa 1976; Parks 1980; Prazdny 1983). On the other hand, cells in V1 and V2 were found to respond to illusory contours, demonstrating that such contours can be processed early on, unlike what the cognitive theories suggested (Lee and Nguyen 2001; Peterhans, von der
Heydt, and Baumgartner 1986; Redies, Crook, and Creutzfeldt 1986; Sheth, Sharma, Rao, and Sur 1996; von der Heydt and Peterhans 1989). They arise at least partly based on the same mechanisms as ordinary contours, including lateral interactions in V1 and V2. Association fields and statistical models can potentially be used to explain much of illusory contour processing as well.
Another intriguing characteristic of contour integration is that it is stronger in certain parts of the visual field than in others. The two main divisions are: (1) The lower visual hemifield is better in illusory contour discrimination tasks than the upper visual hemifield (Rubin, Nakayama, and Shapley 1996), and (2) contour integration is stronger in the fovea than in the periphery (Hess and Dakin 1997; Nugent, Keswani, Woods, and Peli 2003).
Rubin et al. (1996) compared the performance of humans in discriminating the angle made by illusory contours in the lower vs. upper hemifield (see Figure 13.3a for an example stimulus). The pacman-like disks were rotated by small amounts so that the perceived square in the middle would look either thick (like a barrel) or thin (like an hourglass). Rubin et al. presented the inputs in either the lower or upper visual hemifield and measured the minimum amount of rotation (i.e. threshold) needed for the subject to reliably tell whether the input was thick or thin. The results showed that the threshold is much higher in the upper hemifield than in the lower, i.e. the lower hemifield is more accurate in this task than the upper hemifield. Similar results are expected for contour integration tasks, although such experiments have not been carried out so far.
Along the same lines, Hess and Dakin (1997) and Nugent et al. (2003) investigated differences in contour integration performance in the fovea vs. the periphery. When line segments had a consistent phase (as in Figures 13.1 and 13.2), they found that, in the fovea, contour integration is accurate even for a relatively large orientation jitter, but quickly fails after a critical point. However, in the periphery, the accuracy decreases linearly as the orientation jitter increases. Hess and Dakin (1997) observed that such a linear decrease is predicted by a linear filter model (e.g. convolution with oriented Gabor filters), which does not require sophisticated lateral interactions. These results suggest that contour integration is a stronger process in the fovea than in the periphery.
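The linear filter account can be sketched as correlating the image with an oriented Gabor filter. The kernel size, bandwidth, and wavelength below are illustrative assumptions, not values from Hess and Dakin:

```python
import numpy as np

def gabor_kernel(theta, size=15, sigma=3.0, wavelength=6.0):
    """Oriented Gabor filter: a sinusoid windowed by a Gaussian.
    theta is the preferred bar orientation (0 = horizontal); the
    carrier oscillates perpendicular to the bar. Parameters are
    illustrative assumptions."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # along the bar
    yr = -x * np.sin(theta) + y * np.cos(theta)   # across the bar
    envelope = np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * yr / wavelength)

def peak_response(image, theta):
    """Maximum valid-mode correlation of the image with the filter
    for the given preferred orientation (no padding)."""
    k = gabor_kernel(theta)
    kh, kw = k.shape
    ih, iw = image.shape
    return max(float(np.sum(image[r:r + kh, c:c + kw] * k))
               for r in range(ih - kh + 1)
               for c in range(iw - kw + 1))
```

A bank of such filters at several orientations responds most strongly to a matching bar; this purely feedforward operation degrades gracefully with orientation jitter, without any lateral interactions.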
How and why do such differences in performance occur? A likely explanation is that the cortical structures in these areas are different. Such differences could arise if the structures develop based on input-driven self-organization, and the areas receive different kinds of inputs during development.
The inputs can differ in two ways: due to passive environmental biases and due to active attentional biases, and there is evidence for both causes. Environmental biases seem to drive the upper vs. lower hemifield distinction. Because of gravity, objects tend to end up near the ground plane, making the input in the lower hemifield more frequent and complex. In animals with high dexterity such as monkeys, reaching for objects and manipulating them takes place mostly in the lower hemifield as well (Gibson 1950; Nakayama and Shimojo 1992; Previc 1990). Attentional biases, on the other hand, may be responsible for the differences between fovea vs. periphery. For example, Reinagel and Zador (1999) presented natural images to humans and
gathered statistics about the locations in the image to which the humans attended, by tracking eye movements. They found that human gaze most often falls on areas with high contrast and low pixel correlation. Since the attended areas mostly project to the fovea, the statistical properties will differ in the fovea vs. the periphery. Such evidence suggests that the input statistics can differ in different areas of the visual cortex, which may in turn lead to different development and different performance in these areas.
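The two fixation statistics can be approximated for an image patch as follows. RMS contrast and nearest-neighbor pixel correlation are simple proxies of my own choosing for the measures Reinagel and Zador computed, not their exact definitions:

```python
import numpy as np

def patch_stats(patch):
    """RMS contrast and a simple spatial-correlation index for an
    image patch: contrast is the standard deviation of intensity
    relative to the mean; correlation is Pearson's r between each
    pixel and its right-hand neighbor. Illustrative proxies for the
    statistics measured at fixated vs. random locations."""
    patch = np.asarray(patch, dtype=float)
    mean = patch.mean()
    contrast = patch.std() / mean if mean > 0 else 0.0
    left = patch[:, :-1].ravel()    # each pixel...
    right = patch[:, 1:].ravel()    # ...and its right neighbor
    corr = float(np.corrcoef(left, right)[0, 1])
    return contrast, corr
```

A smooth luminance gradient yields a correlation near 1, while a high-frequency texture such as a checkerboard yields a strongly negative one; fixated regions would score high on contrast and low on correlation.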
In summary, psychophysical results suggest that specific lateral interactions are crucial for contour integration. As we saw in the previous chapter, this is exactly what input-driven self-organization captures in the PGLISSOM model. Thus, computational models like PGLISSOM can show how these three observations are related: Human contour integration performance arises from specific neurobiological structures, derived from input statistics through self-organization. So far, PGLISSOM is the only model of contour integration that includes the self-organization component; the other models are reviewed next.
13.1.2 Computational Models
Several neural network models of contour integration have been developed, showing that specific lateral interactions are sufficient to account for the phenomena outlined above (Finkel and Edelman 1989; Grossberg and Mingolla 1985; Hugues, Guilleux, and Rochel 2002; Li 1998, 1999; Peterhans et al. 1986; Ullman 1976; VanRullen, Delorme, and Thorpe 2001; Wersing, Steil, and Ritter 2001; Yen and Finkel 1997, 1998). The models are able to detect and enhance smooth contours of oriented Gabor patterns embedded in a background of randomly oriented Gabor patterns (as in Figure 13.1), contours in natural images, and various illusory contours.
All these models use fixed formulas in determining the lateral interactions, and these interactions are similar to the association fields. For example, in Yen and Finkel’s (1997, 1998) coupled oscillator model, the units were connected with long-range lateral excitatory connections. The magnitude and time course of the synaptic interactions depended upon the position and orientation of the connected units. Excitatory connections were confined within two regions. One fanned out cocircularly around the preferred orientation axis of the central unit. The other extended out transaxially to a smaller area flanking the preferred orientation. Inhibitory connections linked to the rest of the surrounding neurons that did not receive excitatory connections. The connection strengths had a Gaussian profile, with the peak at the central unit. Through synchronization of the coupled oscillators, the model was able to predict human contour integration performance, showing that specific lateral interactions can be responsible for contour integration.
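The geometry of these two excitatory regions can be sketched by classifying each lateral connection by its angular deviation from the central unit's preferred orientation axis. The flank angle below is an assumed parameter for the sketch, not a value from Yen and Finkel:

```python
import math

def connection_type(dx, dy, pref_theta=0.0, flank_angle=math.radians(25)):
    """Classify the lateral connection from a central unit (preferred
    orientation pref_theta) to a unit at offset (dx, dy), following the
    two excitatory regions of Yen and Finkel's model: 'coaxial' along
    the orientation axis, 'transaxial' in the perpendicular flanks,
    'inhibitory' elsewhere. The flank angle is an assumption."""
    angle = math.atan2(dy, dx) - pref_theta
    # Deviation from the orientation axis, wrapped to [0, pi/2]
    # (connections are symmetric about the axis).
    dev = abs((angle + math.pi / 2) % math.pi - math.pi / 2)
    if dev < flank_angle:
        return "coaxial"        # fans out around the orientation axis
    if dev > math.pi / 2 - flank_angle:
        return "transaxial"     # flanking region perpendicular to axis
    return "inhibitory"
```

In the full model each excitatory weight would additionally carry a Gaussian falloff with distance, peaking at the central unit.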
In Li’s (1998; 1999) approach, the excitatory and inhibitory interactions were defined by fixed rules derived from specific constraints, as follows: (1) The system should not activate spontaneously, (2) neurons at a region border should respond strongly, and (3) the same neural circuit should enhance contours. Coupled oscillators were used to describe the dynamics of the orientation-selective cells, and
Fig. 13.4. Contour completion across edge inducers. After the retinal and thalamic preprocessing, the inputs received by the primary visual cortex are essentially edge-detected versions of the original images, as shown here for the Kanizsa triangle of Figure 13.3a. Each of the three sides (e.g. the one outlined with a dashed oval) has a gap (indicated by an arrow). The triangle boundary can be generated by filling in the gap through contour completion. Contour completion is therefore a possible mechanism underlying edge-induced illusory contours.
mean-field techniques and dynamic stability analysis were used to calculate the lateral connection strengths and the connectivity pattern according to these three constraints. The resulting lateral connection strengths were very similar to those of Yen and Finkel’s, except there was no transaxial excitation; instead, areas flanking the center received specific inhibitory connections. The model also predicted contour integration performance well, and again showed that specific lateral connections can accurately predict human contour integration performance.
The first neural network model of illusory contours was based on edge inducers, and illusory contours were processed through contour completion (Ullman 1976). Figure 13.4 gives an example of this process. However, subsequent models were based on line-end inducers, and included edge inducers only as a special case. In these models, the corners where the edges meet (e.g. the throats of the pacmen in Figure 13.3a) and the tips of convex angles (e.g. the lips of the pacmen) constituted line ends, connected to obtain the illusory contour (Finkel and Edelman 1989; Peterhans et al. 1986). However, models strictly based on line-end inducers cannot account for psychophysical results where increasing the length of the inducing edge causes the illusory contour to become clearer (Shipley and Kellman 1992). Thus, neither type of model can account for both types of illusory contours.
For this reason, the model developed by Grossberg and Mingolla (1985) included both types of inducers. In the first stage, boundaries were formed orthogonally to the line-end inducers, and in the second stage collinearly to both line-end and edge inducers. In the second stage, the neurons had bipole (i.e. bowtie shaped) receptive fields similar to those found in V2 (von der Heydt and Peterhans 1989), combining the responses of two first-stage neurons. These neurons activated only when stimuli were present on both lobes of the receptive field, thereby binding the elements of the contour together. The neurons were connected into a systematic pattern by hand,
and through equilibrium values the resulting network successfully accounted for illusory contour perception where contour completion had an important role (Gove, Grossberg, and Mingolla 1993; Grossberg 1999; Grossberg et al. 1997; Grossberg and Williamson 2001; Ross, Grossberg, and Mingolla 2000).
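The bipole gating principle can be sketched in a few lines; the threshold and the additive combination rule are assumptions of the sketch, not Grossberg and Mingolla's equations:

```python
def bipole_response(left_lobe, right_lobe, threshold=0.1):
    """Bipole (bowtie-shaped) receptive field as in the second stage
    of Grossberg and Mingolla's model: the unit fires only when BOTH
    lobes receive suprathreshold input, binding the two contour
    fragments together. With one lobe silent there is no response,
    so no contour leaks past an isolated line end. The threshold and
    additive combination are illustrative."""
    if left_lobe > threshold and right_lobe > threshold:
        return left_lobe + right_lobe   # both lobes active: complete
    return 0.0
```

This AND-like gating is what lets bipole cells complete a contour across a gap flanked by inducers on both sides while leaving an isolated edge termination alone.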
Although the models described above have been successfully applied to explain experimental data, several important questions remain. Most importantly, how does the brain construct the circuitry required for contour integration and segmentation? As mentioned in Section 13.1.1, it is possible that statistical regularities in the visual environment drive a self-organizing process that results in such circuitry. Demonstrating this process computationally is the main goal of this chapter.
Second, can the connections that implement contour integration produce illusory contours as a side effect? They are involved in binding line segments into a coherent representation across gaps, which could result in illusory contours in the extreme. This hypothesis is especially plausible for edge-induced contours, which appear to arise early in the visual hierarchy. The hypothesis will be tested in this chapter using PGLISSOM as the computational platform; an extension to line-end-induced contours will be outlined in Section 17.2.12.
Third, can the adapting lateral interactions also account for the differences in human contour integration performance across the visual field? If the input distribution varies in the different parts of the visual field, the corresponding lateral connection patterns will be different, leading to different perceptual performance. This process can be modeled by the adaptive lateral connections in PGLISSOM, as will be done in the third main subsection of this chapter.
13.2 Contour Integration and Segmentation
This section will show how specific, self-organized lateral connections combined with synchronized activities can account for human contour integration performance. In addition, equally salient contours can be segmented via desynchronized activity, making PGLISSOM a unified model of binding and segmentation in the visual cortex.
13.2.1 Method
The contour integration experiments were run on the PGLISSOM network described in Section 11.5. The long-range lateral excitatory connections in the GMAP of this network are patchy, forming a substrate for binding. The inhibitory lateral connections are broad and have a long range, providing a baseline similar to global inhibition in other cortical models (Eckhorn et al. 1988; Terman and Wang 1995; von der Malsburg and Buhmann 1992; Wang 1995, 1996, 2000). Such inhibition allows input elements to be segmented by default, unless lateral excitation binds them together. To establish robust grouping in the self-organized network, the GMAP neurons were made more sensitive to excitation and inhibition, noise was added to their synapses,
(a) Retinal activation (b) GMAP response areas
Fig. 13.5. Measuring local response as multi-unit activity. (a) An example contour integration input, consisting of nine contour elements with different positions and orientations on the retina. Each element is an oriented Gaussian of length σa = 1.9 and width σb = 1.2. The firing rates of the retinal receptors are set according to these Gaussian values, plotted in gray scale from white to black (low to high). (b) The resulting activations of the GMAP neurons, measured as a leaky average firing rate with a 0.92 decay rate. The circles indicate areas where separate MUA values are measured. Each area is centered on a neuron whose receptive field is centered on one of the contour elements and whose orientation preference is the same as the element’s orientation; due to local distortions in the retinotopic mapping, the circle’s center in V1 is sometimes slightly displaced from the element’s center on the retina. A few neurons outside the circles are also activated, driven by simultaneous input from two different contour elements. The circle radius is chosen such that this spurious activation is not included in the MUA measurements.
and the absolute refractory period was increased, as discussed in Chapter 12 (and described in detail in Appendix D.2).
In each experiment, two input examples with different arrangements of contour elements were tested in separate trials. Each contour element was an oriented Gaussian of length σa = 1.9 and width σb = 1.2, i.e. short enough to fit into a single receptive field (of radius 6). The patterns were generated to approximate contour integration inputs in human experiments as well as possible within the small model retina and V1 (Appendix D.2). The inputs were presented to the trained network for 500 iterations each, allowing the neurons to spike 50 times on average. Assuming that the firing rate is about 40 Hz (the γ-frequency band; Section 2.4.2), the simulations correspond approximately to 1.25 seconds in real time. The performance of the network was measured in GMAP, since it is the component that drives the grouping behavior in the model and therefore establishes its final output.
For each contour element, a distinct area in GMAP becomes activated (Figure 13.5). To measure the performance of the model, the degree of synchrony between these areas was calculated. The number of spikes in each area was counted at each time step. As briefly mentioned in Section 2.4.3, this quantity is called the multi-unit activity of the response, or MUA, and it measures the collective activity of a population of neurons (Eckhorn et al. 1988, 1990). MUAs over the 500 iterations, called MUA sequences, can be used to demonstrate qualitatively the process of contour integration for a given set of inputs.
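Computing MUA is straightforward to sketch: given binary spike trains and the neuron indices inside each measurement circle, sum the spikes per area at each time step. This is a generic sketch of the measurement, not the actual PGLISSOM code:

```python
import numpy as np

def mua_sequences(spikes, areas):
    """Multi-unit activity: for each measurement area (a named list of
    neuron indices), count how many of its neurons spiked at each time
    step. `spikes` is a binary array of shape (n_neurons, n_timesteps);
    the result maps each area name to its MUA sequence over time."""
    spikes = np.asarray(spikes)
    return {name: spikes[list(idx), :].sum(axis=0)
            for name, idx in areas.items()}
```

Each resulting sequence is one row of the MUA plots in Figure 13.6; synchrony between areas can then be quantified by correlating pairs of sequences.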
In order to measure integration and segmentation performance quantitatively, the degree of synchronization between two areas can be measured using the linear correlation coefficient, or Pearson’s r. That is, the product of the deviations from the mean of each MUA sequence is accumulated over time and normalized by the product of their standard deviations:

r = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_i (X_i - \bar{X})^2} \sqrt{\sum_i (Y_i - \bar{Y})^2}} ,    (13.1)

where X_i and Y_i are the MUA values at time i for the two areas representing the two different objects in the scene, and \bar{X} and \bar{Y} are the mean MUA values of each sequence. The correlation coefficients between all possible pairs of MUAs are calculated: The higher the correlations, the more synchronized the areas are, thus representing a strong percept of a contour. The average correlation within the contour was compared with the correlation between elements across contours and with background elements to establish how well the network integrates and segments the input elements.
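For illustration, the correlation of Equation (13.1) can be computed directly from two MUA sequences; a minimal sketch, independent of the PGLISSOM implementation:

```python
import numpy as np

def mua_correlation(x, y):
    """Linear correlation coefficient (Pearson's r) between two MUA
    sequences, as in Equation (13.1): the summed product of the
    deviations from each mean, normalized by the product of the
    root-sum-squared deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    denom = np.sqrt((dx**2).sum()) * np.sqrt((dy**2).sum())
    return float((dx * dy).sum() / denom) if denom > 0 else 0.0
```

Two areas firing in lockstep yield r near 1, while alternating (desynchronized) firing yields r near -1; uncorrelated background elements fall near 0.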
The same method was used in all experiments in this chapter. Contour integration performance will be described first, followed by contour segmentation and contour completion. The effect of input distributions will be analyzed in the last section.
13.2.2 Contour Integration
As was discussed in Section 13.1.1, contour integration accuracy in humans is maximal when orientation jitter is 0◦, and the accuracy decreases as a function of increasing orientation jitter. To test whether PGLISSOM exhibits similar behavior, four contour integration experiments were carried out, with 0◦, 30◦, 50◦, and 70◦ of orientation jitter.
Figure 13.6 shows the MUA sequences for these four simulations. There are nine rows in each plot, corresponding to the nine contour elements in the input. The bottom three rows (rows 1 to 3) represent the three contour elements constituting the salient contour. How well these rows are synchronized compared with the rows representing the background elements determines how strongly the network perceives the contour. As expected, for 0◦ orientation jitter the three bottom MUA sequences are synchronized (Figure 13.6a), but as the orientation jitter increases (Figure 13.6b–d), the synchronized state is more difficult to maintain and the phases tend to shift back and forth.
This process can be quantified using the linear correlation coefficient r. The results are summarized in Figure 13.7, together with the human performance data from Geisler et al. (2001). The plot shows that at low orientation jitter, the model and humans both recognize the contours reliably, but as orientation jitter becomes larger, they both become less accurate in a similar manner. Correlation coefficients between MUA pairs corresponding to two background contour elements, or pairs between a background and a contour element in the salient contour, were usually less than 0.1 (i.e. not perceptually salient), except in rare cases where the jitter caused them to line up accidentally.
(a) 0◦ of orientation jitter (b) 30◦ of orientation jitter (c) 50◦ of orientation jitter (d) 70◦ of orientation jitter
Fig. 13.6. Contour integration process with varying degrees of orientation jitter. In each subfigure, the input presented to the network is shown at left, the areas in GMAP where MUA was measured in the middle, and the resulting MUA plot at right. Each contour was composed of three contour elements, and embedded in a background of six randomly oriented elements. Each contour runs diagonally from lower left to top right with varying degrees of orientation jitter. The MUA of each area is plotted in gray scale from white (no neurons firing in the area at this time step) to black (all neurons firing). Time (i.e. simulation iteration) is on the x-axis and the y-axis consists of nine rows, each plotting the MUA of the area labeled with the row number. The three bottom rows (1 to 3) represent the MUAs of the salient contour, and the six top rows (4 to 9) the MUAs of the background elements. The contour is very strongly synchronized for 0◦ and 30◦ but relatively weakly synchronized for 50◦ and 70◦ of orientation jitter: The contours get harder to detect as the jitter increases. In all cases (a to d), the background MUAs are unsynchronized. A quantitative summary of these results is shown in Figure 13.7, and an animated demo can be seen at http://computationalmaps.org.