Source: Computational Maps in the Visual Cortex, Miikkulainen, 2005

294 13 Understanding Perceptual Grouping: Contour Integration

close by on the map. As shown in Figure 11.5, the excitatory connections not only connect to neurons with similar orientation preferences, but at a close range also to those with somewhat different orientation preferences. Thus, proximity of the inputs, as well as the good continuation of contours, determines the degree of synchronization. At the vertices, the two abutting inputs cause the corresponding cortical areas to synchronize, which in turn causes the three sides of the triangle to synchronize. As a result, the network represents the whole triangle as a coherent object.

These results show that PGLISSOM indeed performs contour completion, and also forms representations for whole objects. Inputs can be grouped through proximity as well as through good continuation. Such mechanisms arise automatically from the properties of afferent and lateral connections in the model, and may form a general principle for grouping (ordinary and illusory) in the visual system.

13.3.5 Salience of Closed Versus Open Contours

With a thorough understanding of how contour completion occurs in the model, let us now return to the psychophysical observation that closed contours are easier to detect than open contours (Kovacs and Julesz 1993; Pettet et al. 1998; Tversky et al. 2004). As discussed in Section 13.1.1, the most recent evidence suggests that there is no special reverberatory mechanism around the closed contour: The advantage is due to proximity and good continuation between elements (Tversky et al. 2004).

While it is difficult to replicate the control conditions in a small retina of PGLISSOM, the illusory triangle experiment in Section 13.3.2 can be used to test the fundamental principle of this theory computationally. The complete and incomplete triangles in Figure 13.16 form closed and open contours. Indeed, the perception of an illusory triangle breaks when one component is removed from the Kanizsa triangle (Figure 13.17).

To measure how salient the two objects are, the average correlation coefficients were calculated between the nine elements of the complete illusory triangle (elements 1 through 9) and between the seven elements of the incomplete triangle (elements 1–5 and 8–9). The results are shown in Figure 13.19. The activities in the network for the closed contour are significantly more synchronized than those for the open contour, indicating that the closed contour is more salient.
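The salience measure used here, an average pairwise correlation over MUA sequences, can be sketched in plain Python. This is an illustrative reconstruction, not the model's actual code; the function names and toy sequences are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mean_pairwise_correlation(mua_sequences):
    """Average correlation over all pairs of MUA sequences; higher values
    indicate stronger synchronization, i.e. a more salient group."""
    pairs = list(combinations(mua_sequences, 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)
```

For the closed contour this average would be taken over all nine element sequences, and for the open contour over the remaining seven.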

The explanation for this effect in PGLISSOM is straightforward. Every part of the closed contour receives excitatory lateral contribution from both neighboring areas, and strong synchronization results along the contour. In contrast, at the two ends of an open contour the neurons only receive lateral excitation from one neighboring area, and the synchrony does not reach the same level of salience.
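The asymmetry can be made concrete with a toy count of along-contour excitatory neighbors per element; this is a simplified illustration of the argument above, not part of the model.

```python
def contour_neighbor_counts(n_elements, closed):
    """Number of along-contour neighbors exciting each element: every
    element of a closed contour has two, but the endpoints of an open
    contour have only one, so synchrony is weaker at the ends."""
    return [
        2 if closed or 0 < i < n_elements - 1 else 1
        for i in range(n_elements)
    ]
```

With two neighbors everywhere, the closed contour synchronizes uniformly; the open contour's endpoints receive only half the lateral excitation.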

In this simple experiment, PGLISSOM has provided an independent computational confirmation of the current psychophysical theory on closed vs. open contours. The difference arises from local interactions based on proximity and good continuation, without a separate reverberatory mechanism. In the future, larger PGLISSOM models can be used to replicate the actual conditions in human experiments, leading to more detailed predictions and insights into this phenomenon.


 

[Figure: bar chart of average correlation (r), 0.0–0.6, for six conditions: within closed, within open, across closed, across open, whole closed, whole open.]

Fig. 13.19. Contour completion performance in closed vs. open contours. The average correlation coefficients between the MUA sequences in the complete and incomplete triangles (labeled "closed" and "open") of Figure 13.16 are shown, calculated over two trials and different groups of contour elements. Although the differences in correlation within each side are not significant (p > 0.2), the elements in the closed contour are significantly more correlated across the sides and within the whole object than in the open contour (p < 0.004), consistent with psychophysical results (Section 13.1.1). The correlations between contour and background elements and within the background were significantly weaker in both cases (p < 0.03), indicating that both contours were perceived as single objects. The PGLISSOM model therefore provides independent computational support to the theory that closed contours are salient because of proximity and good continuation rather than a special reverberatory mechanism.

13.4 Influence of Input Distribution on Anatomy and Performance

In Section 13.2 we saw how lateral connectivity plays a central role in contour integration in PGLISSOM. Because these connections are learned through input-driven self-organization, different input distributions during development result in different anatomy and performance. The self-organizing process can therefore potentially account for the observed differences in human visual performance across different parts of the visual field, such as upper vs. lower hemifield and fovea vs. periphery (Section 13.1.1). This hypothesis can be tested computationally by training PGLISSOM with inputs of varying frequency and complexity.

13.4.1 Method

The visual inputs that the cortex receives during training may vary in several ways. For example, inputs in the fovea and lower hemifield may be more frequent, shorter, sharper, more curved, more textured, or higher in contrast. As was discussed in Section 13.1.1, the distributions of these features across the visual field have not yet been fully characterized, and it would be somewhat premature to test the model with a selection of such variations. However, at a more abstract level, two distinctly different dimensions of variation can be identified: (1) the amount of training each area receives,


and (2) the complexity of the training inputs it sees. In this section, these dimensions, represented by input frequency and curvature, are varied systematically, leading to verifiable predictions about the resulting anatomy and performance.

In the frequency experiment, two PGLISSOM networks trained with different input presentation frequencies were compared. The first one was similar to the contour integration network described in Section 13.2, i.e. trained with single randomly located and oriented elongated Gaussians. The second one was otherwise the same, except every other input presentation was skipped during training. The simulation parameters (excitatory radius, learning rate, thresholds, and connection death) were adapted according to the same schedule as before, modeling maturation based on time and trophic factors (Sections 4.4.3 and 16.1.6). In other words, the second network received input half as frequently as the first network during its maturation.
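The skipping scheme can be sketched as a training loop in which the maturation schedule advances with simulated time while input presentations are optionally skipped. The function names and callback structure below are hypothetical, for illustration only.

```python
def run_training(iterations, present, adapt, skip_every_other=False):
    """Drive self-organization: adapt() models maturation based on time and
    trophic factors and runs every step, while present() delivers an input
    pattern. Skipping every other presentation halves the input frequency
    without altering the maturation schedule."""
    presented = 0
    for t in range(iterations):
        adapt(t)                       # parameter schedule advances regardless
        if skip_every_other and t % 2 == 1:
            continue                   # low-frequency network skips this input
        present(t)
        presented += 1
    return presented
```

The key design point is that the two networks differ only in how often `present` fires; the schedule of excitatory radius, learning rate, thresholds, and connection death is identical.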

In the curvature experiment, each training input consisted of three short elongated Gaussian bars that together formed a smooth contour, located and oriented randomly on the retina. The angles between the bars were changed to adjust the curvature of the input. For the first network, the training inputs had uniformly randomly distributed curvature in the range [0°..25°], and for the second network in the range [0°..10°].
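Such a three-bar training input could be generated as follows. The coordinate conventions and parameter names are assumptions for illustration, and one plausible reading of the description is adopted here: a single bend angle is drawn per contour and applied at both joints.

```python
import math
import random

def three_bar_contour(max_curvature_deg, bar_length=1.0, rng=None):
    """Three bar positions and orientations forming a smooth contour:
    each successive bar is rotated by a randomly drawn curvature angle,
    with the whole contour placed and oriented randomly on the retina."""
    rng = rng or random.Random()
    x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)  # random location
    theta = rng.uniform(0.0, 360.0)                        # random orientation
    bend = rng.uniform(0.0, max_curvature_deg)             # contour curvature
    bars = []
    for _ in range(3):
        bars.append((x, y, theta % 360.0))
        x += bar_length * math.cos(math.radians(theta))    # step along the bar
        y += bar_length * math.sin(math.radians(theta))
        theta += bend                                      # bend the contour
    return bars
```

With `max_curvature_deg = 0` the three bars are collinear; larger ranges expose the network to progressively more curved contours.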

The input Gaussians were slightly broader than those used in Section 11.5, and they were elongated slightly more slowly over the course of self-organization. As a result, learning was slower, but the resulting network performed more robustly on the wider variety of inputs (Appendix D.2). Other than the difference in input distributions, all PGLISSOM networks were trained as in Section 11.5. After training, the afferent and lateral connectivity patterns and the contour integration performance of each network were measured, and the differences were analyzed as in Sections 11.5 and 13.2, as will be described next.

13.4.2 Differences in Connection Patterns

All four simulations resulted in a similar map of orientation preferences, matching the results of previous self-organization experiments (Section 11.5). Two interesting observations can be made based on orientation selectivity, as shown in the distributions in Figure 13.20. First, neurons in the more frequently stimulated network were more selective than those in the less frequently stimulated one (Figure 13.20a), suggesting that its initial responses are sparser but stronger for specific inputs. Second, the networks trained with different curvature are equally selective (Figure 13.20b), suggesting that any performance differences are likely to be due to lateral connections.

To uncover any differences between the resulting lateral connection patterns, the (φ, θ, δ) statistics were calculated on the four networks as in Section 13.2.3. In the frequency experiment, two major differences emerged: (1) The high probability areas extend out longer in the high-frequency network (Figure 13.21a) than in the low-frequency network (Figure 13.21b), i.e. the network with more frequent exposure to oriented edges can group together more distant inputs. (2) The most probable θ for a given (φ, δ) location tends to be cocircular in the high-frequency network, whereas


 

 

 

[Figure: histograms of portion of neurons vs. orientation selectivity (0.0–0.3); (a) 100% vs. 50% presentation frequency, (b) [0°..25°] vs. [0°..10°] curvature.]

Fig. 13.20. Orientation selectivity in SMAP with different input distributions. For each of the four networks, the selectivity of neurons in the SMAP was measured (as described in Appendix G.1.3) and plotted as a histogram; GMAP selectivities were similar and are not shown. (a) The histogram for the 100% presentation frequency peaks at around 0.32, and that of the 50% frequency around 0.12, suggesting that the responses of the high-frequency network are sparser but stronger for specific inputs. (b) The histograms for the high curvature range [0°..25°] and the low curvature range [0°..10°] are almost identical. Given that the orientation preferences were also almost identical, any differences in their performance are likely to be due to the lateral connections.

in the low-frequency network it is more collinear (i.e. the black edges in the high probability areas are more parallel).

As we saw in Section 11.5.3, collinearity is the most prominent feature in the input, and is therefore learned more reliably. With extensive training, it is extended to large distances, as happened with the high-frequency network. Cocircularity develops more slowly than collinearity because the network responds less strongly in the cocircular arrangement. The high-frequency map had enough input presentations and was able to learn the secondary (cocircularity) property of the input as well.
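The cocircularity constraint referred to here has a simple closed form: for an element at position angle φ relative to a horizontal reference edge, the orientation placing both edges on a common circle is θ = 2φ (mod 180°). The function below states this standard geometric rule; it is an illustration, not code from the model.

```python
def cocircular_orientation(phi_deg):
    """Preferred edge orientation (degrees, mod 180) at position angle phi
    for cocircular grouping with a horizontal reference edge at the origin.
    phi = 0 gives 0 (collinear continuation); phi = 90 gives 0 again
    (a parallel flanker directly above); phi = 45 gives 90."""
    return (2.0 * phi_deg) % 180.0
```

Collinearity is the special case φ = 0 along the reference axis, which is why it dominates the statistics and is learned first; the full 2φ relationship emerges only with more extensive training.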

In the curvature experiment, high probability areas (red and orange) along the horizontal axis are broader in the map trained with a broader range of curvatures (Figure 13.22a) compared with the one trained with a narrower range (Figure 13.22b). As expected, the input-driven self-organizing process has encoded the input distribution differences into the lateral connections. As a result, the map with exposure to higher curvature should be better at integrating cocircular contours.

In summary, differences in the input distribution, whether presentation frequency or complexity of inputs, result in specific, predictable differences in the afferent and lateral connection patterns. Such a difference in structure predicts that contour integration performance will also differ in these networks, as will be tested in the next section.


[Figure: polar plots of relative connection probability (log scale, 1 to 0.0001) over positions up to δ = 27, from φ = 0° to φ = 90°; (a) 100% presentation frequency, (b) 50% presentation frequency.]

Fig. 13.21. Lateral excitatory connections in GMAP with different input frequencies. The connection probability distributions are displayed the same way as in Figure 13.9. As before, only GMAP is shown because it is responsible for contour integration in the model. The lateral connection profiles differ in two subtle ways: (1) The high probability areas (red and yellow) extend longer in the high-frequency map (a) than in the low-frequency map (b) (three vs. two rings of high probability). (2) The most probable θ (black oriented bars) are cocircular in (a), but mostly collinear in (b) (as seen e.g. in the second ring from the outside). These results predict that contours should be easier to detect in the high-frequency network.

[Figure: polar plots of relative connection probability (log scale, 1 to 0.001) over positions up to δ = 27, from φ = 0° to φ = 90°; (a) [0°..25°] curvature range, (b) [0°..10°] curvature range.]

Fig. 13.22. Lateral excitatory connections in GMAP with different curvature ranges. The network trained with a broader range of curvatures (a) has broader areas of high probability connections (red and yellow) than the network trained with a narrower range (b). As a result, contours with more curvature and higher orientation jitter should be easier to detect in network (a) than in (b).


13.4.3 Differences in Contour Integration

For each of the four networks trained in Section 13.4.1, two contour integration experiments were performed, with orientation jitters of 0° and 40°. The 40° test case was chosen because contour integration performance in both humans and the model degrades most rapidly at around 40° (Figure 13.7), making the differences due to the input distributions most clearly visible. The method described in Section 13.2.1 was used for all experiments. Figures 13.23 and 13.24 display the MUA sequences in each case. The MUAs are significantly more synchronized for the high-frequency network than for the low-frequency one when the orientation jitter is the same (compare Figure 13.23a vs. b and c vs. d). For the networks trained with different curvature ranges, the degree of synchrony was similar for 0° orientation jitter (Figure 13.24a vs. b), but the network trained with a broader range was significantly more synchronized in the 40° case (Figure 13.24c vs. d). The correlation coefficients between the MUA sequences confirm these observations (Figure 13.25). Frequency makes a difference in perceiving both collinear and cocircular contours, whereas curvature matters only with the cocircular ones.

Such performance differences are predicted by the afferent and lateral connection patterns described in the previous section. Each of the four networks has lateral connections that can group collinear contours, so any difference in performance with 0° orientation jitter must be due to the afferent connections. The neurons in the high-frequency network have more selective afferent connections, and therefore activate and synchronize more strongly for inputs that match their preferences. On the other hand, the afferent weights do not differ significantly in the curvature experiment, and neither does the performance of the two networks in the 0° case. In contrast, with 40° of jitter, the shape of the lateral connections makes a big difference. Each neighboring pair of contour elements is aligned on a cocircular path, and integration requires cocircular connections. Because the lateral connections in the high-frequency network and the high-curvature network are more cocircular, they can detect such contours with high orientation jitter much better than the low-frequency and low-curvature networks.

In summary, differences in the input distribution, even as simple as presentation frequency or curvature, can change how the maps are organized, which in turn can affect performance in contour integration. Such differences in structure and function are due to the input-driven nature of self-organization. This principle provides a possible developmental explanation for the differences in contour integration performance across different areas of the visual field found in psychophysical experiments.

13.5 Discussion

The results in this chapter suggest that contour integration, segmentation, and completion can be due to synchronization mediated by self-organized afferent and lateral connections, and may form a general principle for grouping (ordinary and illusory) in the visual system.


 

 

 

 

 

[Figure: four MUA raster plots, elements 1–9 vs. time 0–500; (a) 100% frequency, 0° jitter; (b) 50% frequency, 0° jitter; (c) 100% frequency, 40° jitter; (d) 50% frequency, 40° jitter.]

Fig. 13.23. Contour integration process with different input frequencies. In each MUA plot, the three bottom rows correspond to the MUA sequences for the three contour elements in the input and the rest correspond to background elements. For the same degree of orientation jitter (0° or 40°), the more frequently trained network is more strongly synchronized (a vs. b; c vs. d).

It may be possible to verify the synchronization hypothesis experimentally in the near future (Section 16.3.1). Meanwhile, the hypothesis is consistent with existing data on how temporal coding affects performance. Lee and Blake (2001) augmented the usual contour integration input with temporal cues such as periodic flashing


[Figure: four MUA raster plots, elements 1–9 vs. time 0–500; (a) [0°..25°] curvature, 0° jitter; (b) [0°..10°] curvature, 0° jitter; (c) [0°..25°] curvature, 40° jitter; (d) [0°..10°] curvature, 40° jitter.]

Fig. 13.24. Contour integration process with different curvature ranges. Both curvature networks show the same degree of synchrony for 0° orientation jitter (a vs. b), but in the 40° case, the network trained with a broad range of curvatures becomes significantly more synchronized than the one trained with a narrow range (c vs. d). These observations and those from Figure 13.23 are confirmed quantitatively in Figure 13.25.

of contour elements. Strong spatial and temporal cues (such as smooth contours and synchronized flashing) resulted in accurate contour integration, as expected. However, when a weak spatial cue was combined with a weak temporal cue, the subjects performed better than expected. The two cues were not simply added together, but


 

[Figure: average correlation (r) vs. orientation jitter (0°–40°); (a) 100% vs. 50% frequency, (b) [0°..25°] vs. [0°..10°] curvature.]

Fig. 13.25. Contour integration performance with different input distributions. The average correlation coefficients between the MUA sequences in each experiment are shown, calculated over two trials. (a) For both 0° and 40° orientation jitter, the high-frequency network was significantly more synchronized than the low-frequency network (p < 0.003). The difference is more pronounced in the 40° case, as predicted by the lateral connection distributions in Figure 13.21. (b) At 0° orientation jitter, the performance of the broad and narrow curvature range networks is comparable (p > 0.7), but with 40° of jitter the broad-curvature network performs significantly better (p < 0.0009), as predicted by the connection distributions in Figure 13.22.

interacted nonlinearly. The temporal cues may not even have to be continuously synchronized to obtain this effect: Beaudot (2002) showed that it is enough to have the contour become visible slightly before the background. A possible interpretation of these results is that temporal cues in the stimulus, such as synchrony and initial activation advantage, enhance synchrony in neural activity, which then allows the perceptual system to bind the individual elements of the contour more strongly.

There are well-defined limits to the grouping process as well. The specific excitatory lateral connections allow only those contours to be completed that fall on a cocircular path. On the other hand, the model requires at least some afferent input to fill in a gap, so arbitrarily large gaps will not be filled in. High orientation jitter makes the contour difficult to perceive as a whole, as it does in humans. The raw segmentation ability of the model is also limited, and curiously similar to the limited number of short-term memory slots, usually quoted as 7 ± 2 (Miller 1956). It is difficult to say whether the memory and segmentation limits are related; however, similar limits on simultaneous representation have been observed in other temporal coding models (Horn and Opher 1998), and they seem to be a robust property of such systems.

The behavior of the model is primarily driven by the self-organized lateral connections. Their pattern matches edge distributions in natural images well, which results in good performance in contour integration. However, it is interesting to note that these connections were trained not with natural images, but with elongated Gaussian inputs. The gradual tapering of such patterns on both sides trains the connections to become cocircular. This result suggests that very simple visual inputs, such as


those generated internally before birth (Section 2.3) could already prepare the animal for essential tasks in the actual visual environment. As demonstrated in Part III, training after birth with natural images will then further refine the circuitry for more accurate performance.

The PGLISSOM model can be extended in several ways to model a wider range of phenomena and to make it biologically more accurate. For example, the process of forming edge-induced illusory contours demonstrated in Section 13.3 could be extended to line-end-induced contours as well. As will be described in more detail in Section 17.2.12, the model could be extended with a V2 network, containing neurons with end-stopped receptive fields, and connected in a manner similar to GMAP in the current model. Synchronized activation in a group of such neurons in V2 would then be interpreted as a line-end-induced contour. In this manner, the same binding mechanism based on lateral interactions would account for both types of contours, providing an alternative to the bipole model discussed in Section 13.1.2.

The range of illusory contours could be expanded further by including feedback from higher levels of visual processing. For example, contours that establish an illusory object (such as the Kanizsa triangle of Figure 13.3a) cannot be explained entirely by low-level mechanisms (Hoffman 1998); they appear to be partly driven by object representations and cognitive factors as well. In fact, connections between lower and higher visual areas are reciprocal (Felleman and Van Essen 1991; Nelson 1995) and well suited for carrying out such computations. Extending PGLISSOM to include such high-level feedback is an interesting future research direction, as will be discussed in Sections 17.2.13 and 17.2.14.

The PGLISSOM model can also be extended in size. The retina and cortical maps are currently limited by the available computational resources, which makes it difficult to replicate the exact contour integration experiments done with humans, and especially those involving illusory contours and the perception of closed vs. open contours. The retina and the cortex would need to be an order of magnitude larger to approximate typical inputs consisting of about 200 line segments. Such simulations are currently not feasible, but the scaling techniques described in Chapter 15 and Section 17.2.9, coupled with the expected growth of computing power, should make them possible in a few years. While the current PGLISSOM model with small-scale inputs is a valid demonstration of the underlying processing principles, such larger-scale models would allow making detailed predictions that match actual psychophysical measurements. Such a large-scale model could also be trained with natural images, or with a combination of prenatal and postnatal inputs, further enhancing the realism of the model. It would then be possible to study new phenomena, such as interaction of multiple stimulus dimensions, as described in Section 17.2.8.

The model can also be extended with a more accurate representation of the input to the different visual areas. As was discussed in Section 13.1.1, contour integration is stronger in the fovea than in the periphery, and in the lower vs. upper hemifield. Section 13.4 demonstrated how such functional differences can result from different distributions of training inputs in these areas. To verify that such differences indeed exist, a method similar to that of Reinagel and Zador (1999) could be used: Input statistics from different parts of the visual field could be collected using eye-tracking