284 13 Understanding Perceptual Grouping: Contour Integration
[Figure 13.7: plot of model correlation (r; left y-axis, 0.1 to 0.8) and human accuracy for subject WSG (portion correct; right y-axis, 60% to 100%) against orientation jitter (x-axis, 0° to 70°).]
Fig. 13.7. Contour integration performance in humans and in PGLISSOM. The model’s performance was measured as the average correlation coefficient between the MUA sequences in the salient contour, calculated over two trials, each with a different input example (left y-axis). Human performance was measured as the percentage of correctly identified contours (right y-axis; data by Geisler et al. 2001, root-mean-square (RMS) amplitude 12.5, fractal exponent 1.5, which is the closest match with the PGLISSOM input configuration). The x-axis is the orientation jitter in degrees, and the error bars indicate ±1 SEM in the model (no error measures were published for the human data). In both humans and the model, contour integration is robust up to 30°, but quickly breaks down as the orientation jitter increases (the difference between 30° and 50° is significant with p < 10⁻⁴; the other differences are not significant with p > 0.1).
As described in Section 13.1.1, such contour integration performance is believed to depend on specific lateral connection patterns in the primary visual cortex. Next, the distributions of lateral connections in the model will be analyzed in order to demonstrate how they influence perceptual performance.
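The performance measure used for the model in Figure 13.7 can be sketched in a few lines. The following is a minimal illustration, not the authors' code: the function names and the synthetic MUA data are hypothetical, and the SEM is taken here as the sample standard deviation across trials divided by the square root of the number of trials.

```python
import itertools
import numpy as np

def mean_pairwise_correlation(mua):
    """Average Pearson r over all pairs of rows in `mua` (areas x time steps)."""
    mua = np.asarray(mua, dtype=float)
    pairs = itertools.combinations(range(mua.shape[0]), 2)
    return float(np.mean([np.corrcoef(mua[i], mua[j])[0, 1] for i, j in pairs]))

def mean_and_sem(values):
    """Mean and standard error of the mean across trials."""
    values = np.asarray(values, dtype=float)
    return float(values.mean()), float(values.std(ddof=1) / np.sqrt(values.size))

# Hypothetical data: two trials, three contour areas, 500 time steps each.
rng = np.random.default_rng(0)
t = np.arange(500)
trials = []
for phase in (0.0, 1.0):
    common = np.sin(2 * np.pi * t / 25 + phase)  # rhythm shared by the contour
    trials.append(common + 0.3 * rng.standard_normal((3, 500)))

per_trial_r = [mean_pairwise_correlation(trial) for trial in trials]
mean_r, sem_r = mean_and_sem(per_trial_r)
```

Sequences that share a common rhythm yield a high average correlation; independent sequences would yield a value near zero.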
13.2.3 The Role of Lateral Connections
As was discussed in Section 11.5.3, lateral connections in PGLISSOM (as well as in the LISSOM orientation model) have two specific anatomical properties: (1) Strong connections exist between neurons with similar orientation preferences, and (2) the connections extend along the direction matching the source neuron’s orientation preference. These properties allow the connections to encode specific local grouping functions, or association fields.
However, to understand the functional role of these connections in visual space (instead of cortical space), the relationships between the receptive fields of the connected neurons need to be examined. Which input features in a scene activate neurons that have strong lateral connections between them, and how strongly is a pair of input features bound together in the cortex through lateral connections? By comparing such connection statistics with human perceptual performance and natural scene statistics, it is possible to determine precisely what functional role the patchy lateral connections play in contour integration.
13.2 Contour Integration and Segmentation |
285 |
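[Figure 13.8: diagram of a reference receptive field and a target receptive field, with the direction φ and the radial distance δ marked.]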
Fig. 13.8. Quantifying the spatial relationship between two receptive fields. For each pair of neurons connected with excitatory lateral connections, the afferent connection weights were examined to determine (1) the orientation preference of the neuron (shown as oriented bars), and (2) the location of the receptive field in retinal space (as the center of gravity of the afferent weight matrix). From these values, the direction φ, radial distance δ, and difference between orientation preferences θ between all pairs of neurons were calculated. Notice that these values define the spatial relationship between the two neurons in the retinal (or visual) space, not in the cortical space, and therefore allow comparing connectivity with edge-cooccurrence data. Such a comparison is presented in Figure 13.9. Adapted from Geisler et al. (2001).
Figure 13.8 illustrates the quantities that define the spatial relationship between a pair of receptive fields. These quantities were measured from all lateral excitatory connections in GMAP that remained after connection death. The results are summarized in Figure 13.9b. Two properties are evident in the plot: (1) The target receptive fields are most likely oriented along cocircular paths emanating from the center, and (2) the most likely target locations form a bowtie-shaped flank along the horizontal axis. These results show that neurons with receptive fields falling upon a common smooth contour are most likely to be connected with lateral excitatory connections. Such a pattern closely matches the association field proposed by Field et al. (1993; Figure 13.2), thus suggesting that perceptual grouping rules can be implemented as actual patterns of lateral connections in the brain.
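The quantities of Figure 13.8 can be computed along the following lines. This is a sketch under stated assumptions rather than the original analysis code: the function names are hypothetical, the direction φ is assumed to be measured relative to the reference neuron's preferred orientation (so that the reference is effectively rotated to horizontal), and the y coordinate is simply the row index of the weight matrix.

```python
import numpy as np

def center_of_gravity(weights):
    """Weighted mean (x, y) position of a non-negative afferent weight matrix."""
    w = np.asarray(weights, dtype=float)
    ys, xs = np.indices(w.shape)
    total = w.sum()
    return float((xs * w).sum() / total), float((ys * w).sum() / total)

def pair_geometry(ref_center, ref_orientation, tgt_center, tgt_orientation):
    """Return (phi, delta, theta) for a reference/target receptive-field pair.

    phi:   direction of the target from the reference, in degrees, relative
           to the reference orientation (assumption), in [0, 360).
    delta: distance between the two centers, in retinal units.
    theta: orientation difference in degrees, folded into [-90, 90).
    """
    dx = tgt_center[0] - ref_center[0]
    dy = tgt_center[1] - ref_center[1]
    delta = float(np.hypot(dx, dy))
    phi = float((np.degrees(np.arctan2(dy, dx)) - ref_orientation) % 360.0)
    theta = float((tgt_orientation - ref_orientation + 90.0) % 180.0 - 90.0)
    return phi, delta, theta

# Hypothetical receptive field: a single nonzero weight at row 2, column 3.
rf = np.zeros((5, 5))
rf[2, 3] = 1.0
center = center_of_gravity(rf)  # (x, y) = (3.0, 2.0)
phi, delta, theta = pair_geometry((0.0, 0.0), 10.0, (1.0, 0.0), 170.0)
```

Folding θ into a 180° range reflects the fact that an oriented bar at 170° differs from one at 10° by only 20°.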
In fact, such connection patterns predict the contour integration performance of the previous section very well. Since receptive fields aligned on an arc with smaller curvatures are more likely to be connected, inputs with smaller orientation jitter would be more strongly bound together than those with large orientation jitter. The model therefore offers an explanation for the observed performance in terms of specific neural structures.
Furthermore, these functional statistics in the model are similar to the local Bayesian edge-cooccurrence statistics in natural images (Geisler et al. 2001). Figure 13.9a summarizes the likelihood that a pair of edges under configuration (φ, θ, δ) fall upon a common physical contour, such as a tree trunk or a boulder boundary. Such natural contours are also found likely to follow cocircular paths. As demonstrated by Geisler et al. (2001), the edge-cooccurrence patterns accurately predict human contour integration performance. This result also indirectly explains why PGLISSOM matches human contour integration performance so well: Both humans and PGLISSOM are biased toward integration of natural contours.
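As an illustration of the kind of statistic summarized in Figure 13.9a, a likelihood ratio can be estimated from labeled edge pairs by comparing how often each configuration occurs among same-contour pairs and among different-contour pairs. The sketch below assumes the ratio is formed from the two normalized histograms; the function name, the binning, and the counts are hypothetical.

```python
import numpy as np

def likelihood_ratio(same_counts, diff_counts, eps=1e-12):
    """Ratio p(config | same contour) / p(config | different contours).

    Each input holds counts of labeled edge pairs per (phi, theta, delta) bin,
    flattened to one dimension; `eps` only guards against division by zero.
    """
    same = np.asarray(same_counts, dtype=float)
    diff = np.asarray(diff_counts, dtype=float)
    p_same = same / same.sum()
    p_diff = diff / diff.sum()
    return p_same / np.maximum(p_diff, eps)

# Hypothetical counts for two configurations:
# bin 0 = collinear neighbors, bin 1 = orthogonal neighbors.
ratios = likelihood_ratio([8, 2], [2, 8])
```

A ratio above 1 marks a configuration that is more typical of edges on a common contour, and a ratio below 1 marks one that is more typical of unrelated edges.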
[Figure 13.9: two polar plots with the directions φ = 0° and φ = 90° marked. (a) Edge cooccurrence in nature: likelihood ratio on a logarithmic scale from 0.01 to 100; δ = 1.23°. (b) Lateral excitation in GMAP: relative probability on a logarithmic scale from 0.0001 to 1; δ = 27.]
Fig. 13.9. Edge cooccurrence in nature and long-range lateral connections in PGLISSOM. The distributions of excitatory lateral connections in the model are compared with the edge-cooccurrence statistics in nature to see how well they match perceptual requirements. (a) The Bayesian edge-cooccurrence statistics in natural images (Geisler et al. 2001; reprinted with permission, copyright 2001 by Elsevier). Each location in polar coordinates (φ, δ) contains a small round disk, representing the likelihood ratios of all possible orientations θ at direction φ and distance δ by color coding; the θ with the highest ratio is shown in the foreground (θ, φ, and δ are defined as in Figure 13.8). Each likelihood ratio represents the conditional probability that a pair of edge elements in configuration (θ, φ, δ) belongs to the same physical contour vs. different physical contours in natural images. The conditional probabilities were determined through manual labeling of contours in real world images. The most likely elements are aligned along cocircular paths emanating from the center. (b) The distributions of θ, φ, and δ for the lateral excitatory connections in GMAP (Choe and Miikkulainen 2004; reprinted with permission, copyright 2004 by Springer). Each location (φ, δ) displays two values: (1) The color scale in the background shows the relative log-probability of finding a target receptive field at that location, and (2) the black oriented bars represent the most probable orientation θ of the target receptive field at that location (not plotted for the weakest connections). The figure shows that neurons with receptive fields aligned on a common smooth contour are most likely to be connected with lateral excitatory connections. This distribution corresponds closely to the edge cooccurrence patterns in nature, suggesting that the model is well suited for encoding grouping relations in natural images.
13.2.4 Contour Segmentation
Importantly, the synchronization process that establishes the contour percept can also separate different contours into different percepts. The same self-organized network with the same simulation parameters as in Section 13.2.1 was used for the contour segmentation experiment. Two contours and three background elements were presented as input, and the correlations between elements within and across the contours, between the contour and the background, and within the background were calculated. The MUA sequences of the nine areas are shown in gray-scale coding in Figure 13.10. The bottom three rows (1 to 3) correspond to the diagonal contour, the middle three rows (4 to 6) to the vertical contour, and the top three rows (7 to 9) to the background elements.
[Figure 13.10: input elements numbered 1 to 9, and gray-scale MUA sequences for areas 1 to 9 (rows) over an x-axis from 0 to 500.]
Fig. 13.10. Contour segmentation process. Input for the contour segmentation experiment consisted of two contours, diagonal and vertical, and three background elements. The same plotting conventions as in Figure 13.6 were used to illustrate the MUAs of the areas that responded to these inputs. The three bottom rows (1 to 3) correspond to the diagonal contour, the three middle rows (4 to 6) to the vertical contour, and the top three rows (7 to 9) to the background elements. The MUA sequences within each contour are synchronized. On the other hand, the MUA sequences of elements in different contours, of elements in the background, and of contour and background elements are desynchronized. In other words, the three areas representing the same contour fire together while the areas responding to the other contour and to the background are silent. Such an alternating activation of neuronal groups ensures that each coherent object is represented distinctly and not mixed with representations of other objects. An animated demo of this process can be seen at http://computationalmaps.org.
In the beginning, all areas are mostly synchronized, but as lateral interactions begin to take effect, the MUAs form two major groups firing in two alternating phases. The correlation coefficients of areas in the same contour are consistently high while those in different contours and in the background are low (Figure 13.11), signifying integration within each contour and segmentation across the contours. This result suggests that the same circuitry responsible for contour integration can also be responsible for segmentation between multiple contours.
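The comparison of correlations within and across contours can be sketched by sorting every pair of areas into one of four categories and averaging the correlation coefficients in each. This is an illustrative sketch only: the function name, the labels, and the synthetic data are hypothetical, with the two contours modeled as rectified rhythms in opposite phases and the background as independent noise.

```python
import numpy as np

def grouped_correlations(mua, labels, background="bg"):
    """Mean Pearson r per pair category for MUA sequences (areas x time steps)."""
    r = np.corrcoef(np.asarray(mua, dtype=float))
    groups = {"within": [], "across": [], "contour_vs_bg": [], "bg_vs_bg": []}
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            a, b = labels[i], labels[j]
            if a == background and b == background:
                key = "bg_vs_bg"
            elif a == background or b == background:
                key = "contour_vs_bg"
            elif a == b:
                key = "within"
            else:
                key = "across"
            groups[key].append(r[i, j])
    return {key: float(np.mean(vals)) for key, vals in groups.items() if vals}

# Hypothetical data laid out like the nine rows described above.
rng = np.random.default_rng(1)
t = np.arange(500)
rhythm = np.sin(2 * np.pi * t / 50)
diagonal = [np.maximum(rhythm, 0) + 0.1 * rng.standard_normal(500) for _ in range(3)]
vertical = [np.maximum(-rhythm, 0) + 0.1 * rng.standard_normal(500) for _ in range(3)]
backgrd = [rng.standard_normal(500) for _ in range(3)]
mua = np.array(diagonal + vertical + backgrd)
labels = ["diag"] * 3 + ["vert"] * 3 + ["bg"] * 3

result = grouped_correlations(mua, labels)
```

In this toy data the within-contour average is high, the across-contour average is negative because the two groups alternate, and both background categories stay near zero.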
PGLISSOM can segment up to about six contours this way. With more than six, representations for some objects will be synchronized instead of being desynchronized (a similar limitation was reported by Horn and Opher 1998 and Horn and Usher 1992). Over a longer period of observation, it may be possible to separate even more objects. Even if disjoint representations occasionally become synchronized, they do not stay in this state permanently. Synchrony is eventually broken, and another pair of representations that was previously desynchronized becomes synchronized. Therefore, even with a limited capacity for segmentation, a large number of objects can be segmented if the degree of synchrony is measured over a long period of time.
There is an interesting balance between segmentation and integration in the model. Segmentation cannot be made too strong; otherwise contour integration suffers. It turns out that with integration performance roughly comparable to that of humans, the system sometimes integrates when there is no contour. This behavior can explain how an interesting class of visual illusions, those based on edge-induced contour completion, may arise, as will be described next.
[Figure 13.11: bar chart of correlation (r; y-axis, 0.0 to 1.0) for four conditions: within contours, across contours, contour vs. BG, and BG vs. BG.]
Fig. 13.11. Contour segmentation performance. The average correlation coefficients between two MUA sequences within the same contour, across different contours, between contour and background, and within the background are plotted, calculated over two trials. The error bars indicate ±1 SEM. The MUA sequences within the same contour are highly correlated, while those belonging to different contours or the background are not (the difference is significant with p < 10⁻¹⁰). This result demonstrates quantitatively that neurons within each contour form a synchronized group, whereas neurons responding to different contours are desynchronized.
13.3 Contour Completion and Illusory Contours
As was discussed in Section 13.1.1, the same lateral interactions that implement contour integration could also underlie contour completion, i.e. filling in missing elements in a contour. Experiments with PGLISSOM strongly support this hypothesis. The model demonstrates that contour completion and the resulting illusory contours are a necessary side effect of the contour integration circuitry. In this section, the contour completion performance of PGLISSOM will be analyzed in detail, focusing on conditions under which completion occurs. This process gives rise to edge-induced illusory contours, and to a difference in detecting closed vs. open contours. These results suggest a possible mechanism for edge-induced illusory contours in V1. Similar mechanisms in V2 could be responsible for line-end-induced contours, as will be discussed in Section 17.2.12.
13.3.1 Method
The PGLISSOM network that was used to demonstrate contour integration and segmentation in Section 13.2 was tested in contour completion as well. The inputs included long contours with one element missing, and contours representing the edge-detected Kanizsa triangle (Figure 13.4). Because these inputs have more elements than those in Section 13.2, the elements tend to be closer together than before. As a result, the radius of the MUA areas was reduced to five to avoid overlap.
As before, the network was activated for 500 iterations, and the MUA sequences for areas of GMAP representing the input contour elements and the gaps were measured. Each experiment consisted of two trials with the input positioned in a different location and orientation but with a similar structure.
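To make the measurement concrete, the sketch below extracts an MUA sequence from a circular area of radius five. It assumes that the MUA of an area is the number of spikes emitted per time step by the neurons inside the area, which is not spelled out in this section; the function names and the spike array are hypothetical.

```python
import numpy as np

def disk_mask(shape, center, radius):
    """Boolean mask of the map units within `radius` of `center` (row, col)."""
    rows, cols = np.indices(shape)
    return (rows - center[0]) ** 2 + (cols - center[1]) ** 2 <= radius ** 2

def mua_sequence(spikes, center, radius=5):
    """Spike count per time step inside a circular area.

    `spikes` has shape (time steps, rows, cols), with 1 where a neuron fired.
    """
    spikes = np.asarray(spikes)
    mask = disk_mask(spikes.shape[1:], center, radius)
    return spikes[:, mask].sum(axis=1)

# Hypothetical spike raster: four time steps on a 20 x 20 map.
spikes = np.zeros((4, 20, 20), dtype=int)
spikes[1, 10, 10] = 1  # at the center of the area
spikes[2, 10, 14] = 1  # distance 4 from the center: inside
spikes[2, 10, 16] = 1  # distance 6 from the center: outside
mua = mua_sequence(spikes, center=(10, 10), radius=5)
```

A smaller radius keeps the areas for neighboring contour elements from sharing neurons, which is the overlap the text refers to.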
[Figure 13.12: MUA sequences for areas 1 to 5 (rows) over an x-axis from 0 to 500, in two panels: (a) Contour completion, (b) Single edge.]
Fig. 13.12. Contour completion process. (a) The four contour elements in the input with a gap in the middle correspond to one side in the edge-detected Kanizsa triangle (the dashed oval in Figure 13.4). In the MUA plot, the four contour elements are shown in the bottom and the top (rows 1–2 and 4–5) and the gap in the middle (row 3). Even though there were no inputs in the middle, the cortical area representing the gap is activated, and the activations are synchronized with the other four MUA sequences. This behavior indicates that contour completion occurred and the gap is perceived as an illusory edge. (b) In the second experiment, the input consisted of two contour elements from only one side of the gap. The MUA sequence for the gap is silent (row 3), indicating that contour completion did not occur. Thus, both sides of the gap need to be stimulated for the gap to be perceived as an edge.
13.3.2 Contour Completion
To test basic contour completion, PGLISSOM was presented with a straight contour with a gap in the middle as shown in Figure 13.12a. Such a contour represents one side of the edge-detected Kanizsa triangle in Figure 13.4. To make sure the contour elements on one side of the gap do not alone activate the gap, an input consisting of only half the contour was also presented to the network (Figure 13.12b). The prediction was that the network would fill in the gap in the first stimulus, but not in the second.
For the contour completion input (Figure 13.12a), there indeed is a significant MUA sequence for the gap (row 3), and it is synchronized with the rest of the sequences (rows 1–2 and 4–5): The gap is perceived as part of the contour. In contrast, with the single-edge input (Figure 13.12b), the MUA sequence representing the gap (row 3) is silent, while the rest of the MUA sequences (rows 1 and 2) are active and synchronized. Thus, both sides of the gap need to be stimulated for the gap to be perceived as an edge. The same self-organized circuitry in PGLISSOM that is responsible for contour integration can therefore account for contour completion as well. The contributions of the different kinds of connections to this process are analyzed next.
[Figure 13.13: two panels: (a) Retinal activation, (b) Afferent GMAP input.]
Fig. 13.13. Afferent contribution in contour completion. The afferent contribution of the input in (a) to the GMAP activation is plotted in gray scale from white to black (low to high) in (b); the circles delineate the MUA areas as shown in Figure 13.12. The four areas corresponding to the four contour elements all receive strong afferent input. The center area, corresponding to the gap, receives weak afferent input, due to slight overlap with neighboring regions in the retina. However, as seen in Figures 13.14 and 13.15, it is not enough to activate its representation without a contribution from the lateral connections.
13.3.3 Afferent and Lateral Contributions
The filling in of gaps in the PGLISSOM model is to be expected, given that specific excitatory lateral connections project from the neighboring areas into the gap. However, it is also possible that afferent input is causing the completion. In animals and in the PGLISSOM model, receptive fields of neighboring areas in the cortex overlap. If the cortical area representing the gap receives enough afferent input from both sides around the gap, it can be activated the same way as the rest of the contour representations.
To check the amount of afferent input received by the gap, the net afferent contribution in GMAP was measured in the contour completion experiment (Figure 13.12a). A two-dimensional intensity plot (Figure 13.13) shows that the central area indeed receives some afferent input. Could such spurious afferent input be enough to activate the area representing the gap?
More generally, the question is whether the afferent contribution alone, or the lateral excitatory contribution alone, can cause the filling-in effect, or whether the phenomenon requires both kinds of contributions. To answer this question, two experiments were performed using the same method as in Section 13.3.2, with the single-contour input of Figure 13.12a. In the first experiment, the gap area received no afferent connections, and in the second there were no excitatory lateral connections.
The MUA sequences for the two experiments are shown in Figure 13.14. In both cases, the sequence representing the gap in the contour shows no activity at all, suggesting that contour completion did not occur in either case. For comparison, the average correlation coefficients in all three connectivity conditions are shown in Figure 13.15. The correlation is high only when both afferent and lateral excitatory connections are included.
[Figure 13.14: MUA sequences for areas 1 to 5 (rows) over an x-axis from 0 to 500, in two panels: (a) No afferent connections to the gap area, (b) No lateral excitatory connections to the gap area.]
Fig. 13.14. Contour completion process with different kinds of connections. Networks without afferent connections to the gap area (a) and without lateral excitatory connections to this area (b) were tested in the contour completion task. In both cases, the MUA sequences for the four input contour elements (rows 1–2 and 4–5) are synchronized, whereas the sequences for the gap (row 3) are silent, suggesting that filling in did not occur. Contour completion therefore requires both kinds of connections.
[Figure 13.15: bar chart of correlation (r; y-axis, 0.0 to 0.9) for three conditions: Both, Lateral, and Afferent.]
Fig. 13.15. Contour completion performance with different kinds of connections. The average correlation coefficients for the four MUA sequences representing the four input contour elements vs. the MUA sequence representing the gap are shown, calculated over two trials. Both afferent and excitatory connections are included in “Both”. In “Lateral”, the afferent connections are removed from the center, i.e. binding is due to excitatory lateral connections only. In “Afferent”, the excitatory lateral connections are removed from the center, and binding is based on afferent connections only. The plot shows that both afferent and excitatory contributions are necessary for contour completion (p < 10⁻⁷).
These results demonstrate that contour completion in PGLISSOM requires a contribution from both afferent and lateral excitatory connections. Such a condition can only occur when the input contour elements are aligned along a smooth path. The central receptive field is then partially activated by the input in the neighboring areas, and the cocircular projection of lateral connections amplifies this activation above threshold. The next question is: Can this mechanism of contour completion be responsible for illusory contours as well?
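The logic of this result can be illustrated with a toy calculation in which the gap area becomes active only when its summed input exceeds a threshold. This is a deliberately simplified sketch, not the PGLISSOM spiking dynamics: the additive input, the function name, and all numeric values are hypothetical, chosen so that each contribution alone stays below threshold.

```python
def gap_input(afferent, lateral, use_afferent=True, use_lateral=True):
    """Toy net input to the gap area: sum of the enabled contributions."""
    total = 0.0
    if use_afferent:
        total += afferent
    if use_lateral:
        total += lateral
    return total

# Hypothetical values: weak afferent overlap, moderate lateral excitation.
AFFERENT, LATERAL, THRESHOLD = 0.3, 0.4, 0.5

conditions = {
    "Both": gap_input(AFFERENT, LATERAL),
    "Lateral": gap_input(AFFERENT, LATERAL, use_afferent=False),
    "Afferent": gap_input(AFFERENT, LATERAL, use_lateral=False),
}
fires = {name: value > THRESHOLD for name, value in conditions.items()}
```

With these values only the "Both" condition crosses the threshold, mirroring the pattern of the three conditions in Figure 13.15.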
[Figure 13.16: input elements numbered 1 to 15, and MUA sequences for areas 1 to 15 (rows) over an x-axis from 0 to 500, in two panels: (a) Complete triangle, (b) Incomplete triangle.]
Fig. 13.16. Contour completion process in the illusory triangle. Each element in the triangle is identified by a number from 1 through 9, counterclockwise from the top left vertex, with 2, 5, and 8 denoting the gaps. (a) A complete triangle with gaps in the middle of each side approximates the central triangular part of the edge-detected Kanizsa triangle (Figure 13.4). The MUAs corresponding to gaps are all active and synchronized with the other inputs. Overall, the synchronization of all nine inputs means that the system is perceiving a single coherent object (as also demonstrated quantitatively in Figure 13.18). (b) When one vertex (elements 6 and 7) is removed, areas representing gaps 5 and 8 become almost silent: The perception of a triangle disappears, as it does in the incomplete Kanizsa triangle (Figure 13.17). An animated demo of these examples can be seen at http://computationalmaps.org.
13.3.4 Completion of Illusory Contours
To test the model in perceiving illusory contours, a simplified illusory triangle, embedded in a background of six randomly oriented edges, was presented to the network (Figure 13.16a). This triangle has gaps in each of the three sides, approximating the edge-detected Kanizsa triangle (Figure 13.4) as well as possible with the small model retina and V1. The network was also tested with one vertex of the triangle removed (Figure 13.16b) to see whether both sides of the gaps are necessary for the illusion to appear. Figure 13.17 shows the actual images corresponding to these inputs. Otherwise the same simulation method was used as in the single-gap experiment.
As expected, all gaps of the complete triangle are activated and synchronized with the neighboring contour elements (Figure 13.16a). In contrast, in the incomplete triangle (Figure 13.16b) only gap 2 (in the left edge) is filled; gaps 5 and 8 (on the sides) are not. These results are consistent with those of the single contour (Figure 13.12a). However, what makes this experiment particularly interesting is that the three sides of the triangle are also synchronized. The sides constitute three independent contours with sharp angles between them, and based on the analysis in Section 13.2.4 would be expected to be desynchronized. However, as shown in Figure 13.18, all contour elements (within the same side, across different sides, and among the whole triangle) are highly correlated, suggesting that the network perceives only one object. How is such cross-contour synchronization possible?
[Figure 13.17: two panels: (a) Kanizsa triangle, (b) One corner removed.]
Fig. 13.17. Salience of complete vs. incomplete illusory triangles. The illusory object is vividly perceived in the complete Kanizsa triangle (a). However, when one corner is removed, this perception disappears (b).
[Figure 13.18: bar chart of correlation (r; y-axis, 0.0 to 0.6) for five conditions: Within, Across, Whole, Contour vs. BG, and BG vs. BG.]
Fig. 13.18. Contour completion performance in the illusory triangle. Each side of the triangle is represented by a group of three MUA sequences, and constitutes a separate contour. The average correlation coefficients were calculated over two trials for MUA sequences representing two elements within the same side, across different sides, anywhere in the whole triangle, one in a contour and the other in the background, and within the background. The elements in each side are strongly synchronized, but so are elements across different sides and in the whole triangle (the differences between “Within”, “Across”, and “Whole” are not significant with p > 0.1). Furthermore, the elements in the triangle are significantly more synchronized than contour and background elements, and elements in the background (p < 10⁻⁶). This result shows quantitatively that the three sides are perceived together as a single object.
At the vertices of the triangle, two contour elements with different orientation preference overlap. Since the afferent receptive fields in PGLISSOM are topologically organized, the two cortical areas responding to the two edges at the vertex are
