
Computational Maps in the Visual Cortex (Miikkulainen 2005)


10.3 Postnatal Development

[Figure 10.15 panels: (a) Prenatally trained network (ONOFF); (b) Naïve network (ONOFF); (c) Snapshots of two RFs (RF 1, RF 2) in the prenatally trained network; (d) Snapshots of two RFs (RF 1, RF 2) in the naïve network. The snapshots in (c) and (d) are labeled 0, 6000, 10000, 12000, 16000, 20000, and 30000.]

Fig. 10.15. Prenatally established bias for learning faces. Plots (a) and (b) show the RFs for every third neuron from the FSA array, visualized as in Figure 10.12a. As the prenatally trained network learns from real images, the RFs morph smoothly into face prototypes, i.e. representations of average facial features and hair outlines (c). By postnatal iteration 30,000, nearly all neurons have learned facelike RFs, with very little effect from the background patterns or non-face objects (a). Postnatal learning is less uniform for the naïve network, as can be seen in the RF snapshots in (d). In the end, many of the naïve neurons do learn facelike RFs, but others become selective for general texture patterns, and some become selective for objects like the clock (b). Overall, the prenatally trained network is biased toward learning faces, while the initially uniform network more faithfully represents the environment. Thus, prenatal learning can allow the genome to guide development in a biologically relevant direction.

10.3.4 Decline in Response to Schematics

Like human infants, the HLISSOM model gradually becomes less responsive to schematic patterns during early postnatal learning (Figure 10.16). This decline results from the normalization of the afferent weights (Equation 4.8). As the FSA neurons learn the hair and face outlines typically associated with real faces, the connections from the internal features become weaker. Unlike real faces, the facelike schematic patterns match only these internal features, not the outlines. As a result, the network responds progressively less strongly to schematic patterns as real faces are learned. Eventually the response drops below the fixed activation threshold (θ_l in Equation 4.4), and at that point the model no longer prefers facelike to non-facelike schematics (because there is no FSA response, and V1 responses are similar). In a sense, the FSA has learned that real faces typically have both inner and outer features, and does not respond when either type of feature is absent or there is a poor match to real faces.

[Figure 10.16 panels: columns (a) to (f); rows labeled Retina, LGN, FSA-0, and FSA-1000.]

Fig. 10.16. Postnatal decline in response to schematic images. Before postnatal training, the prenatally trained FSA (row “FSA-0”) responds significantly more to the facelike stimulus (a) than to the three-dot stimulus (b; p = 0.05) or to the scrambled faces (c,d; p < 10^-8; Appendix C.2). These responses are similar to those found by Johnson and Morton (1991) in infants up to 1 month of age. In some of their experiments, no significant difference was found between (a) and (b), which is unsurprising given that they are only barely significantly different here. As the FSA neurons learn from real faces postnatally, they respond less and less to schematic faces. The bottom row shows the FSA response after 1000 postnatal iterations. The FSA now rarely responds to (a) and (b), and the average difference between them is no longer significant (p = 0.25). Thus, no preference would be expected for the facelike schematic after postnatal learning, which is exactly what Johnson and Morton (1991) found for older infants, i.e. 6 weeks to 5 months old. The response to real faces also decreases slightly through learning (e,f), because the newly learned average face and hair outline RFs are a weaker match to any particular face than were the original three-dot RFs. However, this decline is much smaller, because real faces are still more similar to each other than to the schematic faces. Thus, HLISSOM predicts that older infants will still show a face preference if tested with more-realistic stimuli, such as photographs.

Yet, the FSA neurons continue to respond to real faces (as opposed to schematics) throughout postnatal learning (Figure 10.16e,f). Thus, the model provides a clear prediction that the decline in face preferences is limited to schematics, and that no decline will be found if infants are tested with sufficiently realistic face stimuli.
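The normalization argument can be made concrete with a toy calculation. The following is a minimal sketch, not the HLISSOM implementation: it assumes a single neuron with six afferent inputs (three for internal features, three for the outline), a Hebbian update followed by divisive normalization of the weight sum, and a piecewise-linear activation with lower and upper thresholds. Equations 4.4 and 4.8 are not reproduced in this section, so the functional forms, the parameter values, and all names below are illustrative assumptions.

```python
# Toy illustration only: six afferent inputs, three for internal facial
# features and three for the face/hair outline. All values are assumed.

THETA_L = 0.6   # lower activation threshold (assumed value)
THETA_U = 1.2   # upper saturation threshold (assumed value)
RATE = 0.05     # afferent learning rate (assumed value)

SCHEMATIC = [1, 1, 1, 0, 0, 0]   # internal features only
REAL_FACE = [1, 1, 1, 1, 1, 1]   # internal features plus outline


def normalize(weights):
    """Divisively normalize so that the weights sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def response(weights, pattern):
    """Piecewise-linear activation of the weighted input sum."""
    s = sum(w * x for w, x in zip(weights, pattern))
    if s <= THETA_L:
        return 0.0
    if s >= THETA_U:
        return 1.0
    return (s - THETA_L) / (THETA_U - THETA_L)


def learn(weights, pattern):
    """One Hebbian step followed by divisive normalization."""
    eta = response(weights, pattern)
    grown = [w + RATE * eta * x for w, x in zip(weights, pattern)]
    return normalize(grown)


def simulate(steps):
    """Train on the real face; record (schematic, real-face) responses."""
    weights = normalize([1, 1, 1, 0, 0, 0])  # prenatal: internal features only
    history = [(response(weights, SCHEMATIC), response(weights, REAL_FACE))]
    for _ in range(steps):
        weights = learn(weights, REAL_FACE)
        history.append(
            (response(weights, SCHEMATIC), response(weights, REAL_FACE))
        )
    return history


history = simulate(50)
for step in (0, 5, 10, 50):
    schematic, real = history[step]
    print(f"step {step:2d}: schematic={schematic:.3f}  real face={real:.3f}")
```

Because the total weight is fixed, every increase on the outline inputs is paid for by the internal-feature inputs, so the schematic response falls below the lower threshold while the real-face response persists. This toy neuron does not capture the slight decline in the real-face response reported in Figure 10.16e,f, which depends on variation among real faces.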


This prediction is an important departure from the CONSPEC/CONLERN model. In HLISSOM, an initially CONSPEC-like system is also like CONLERN, in that it will gradually learn from real faces. In contrast, in CONSPEC/CONLERN, CONLERN gradually matures and begins to inhibit CONSPEC, predicting that the decline also applies to real faces. These divergent predictions can be tested by presenting real faces to infants older than 1 month.

As was discussed in Section 10.1.1, the decline takes place at different times in central vision and in the periphery. Such a difference can be due to gradual maturation of the fovea, as will be outlined in Section 10.4.

10.3.5 Mother Preferences

As was discussed in Section 10.1.1, infants a few days old prefer their mother’s face over other similar faces (Bushnell 2001; Bushnell et al. 1989; Pascalis et al. 1995; Walton, Armstrong, and Bower 1997; Walton and Bower 1993). Similar behavior can be observed in the HLISSOM model. One of the female images was designated as the mother and presented in 25% of the postnatal learning iterations, corresponding to the estimated proportion of time the infant spends viewing the mother’s face (Bushnell 2001). One of the other females with a similar face, designated as the stranger, was not presented at all during training. Over 500 training iterations, the FSA learned to respond to the mother significantly more strongly than to the stranger (Figure 10.17a,b).

Interestingly, the mother preference disappears when the hair outline is masked (Figure 10.17c,d), which is consistent with Pascalis et al.’s claim that newborns learn outlines only. However, Pascalis et al. (1995) did not test the crucial converse condition, i.e. whether newborns respond when the facial features are masked, leaving only the outlines. It turns out that HLISSOM does not respond to the head and hair outline alone either (Figure 10.17e,f). Thus, contra Pascalis et al. (1995), we cannot conclude that what has been learned “has to do with the outer rather than the inner features of the face.”

In the model, the response declines with either type of masking because the model learns faces holistically, based on all facial features. As real faces are learned, the afferent weight normalization ensures that neurons respond only to patterns that are a good overall match to all of the weights, instead of matching only on a few features. Many authors have argued that adults also learn faces holistically (e.g. Farah et al. 1998). These results suggest that newborns may learn faces in the same way, and predict that newborns will not prefer their mother when her hair outline is visible but her facial features are masked. The time course of this behavior may to some extent depend on foveal maturation, as will be discussed in the next section.
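The holistic account of the mother preference can be sketched with a single toy neuron. This is an illustration under stated assumptions rather than the simulation behind Figure 10.17: each face is a hypothetical 10-element vector (three internal-feature units, two head-contour units shared by all faces, and one hair-outline unit per individual), the mother is shown on every fourth iteration to stand in for the 25% schedule, the remaining iterations cycle through three other faces, and the neuron is assumed to be fully active for every training face. The thresholds and learning rate are arbitrary.

```python
THETA_L, THETA_U = 0.6, 1.0   # activation thresholds (assumed values)
RATE = 0.01                   # afferent learning rate (assumed value)


def face(hair_index):
    """Hypothetical face: 3 feature units, 2 shared contour units, 5 hair units."""
    hair = [1 if i == hair_index else 0 for i in range(5)]
    return [1, 1, 1] + [1, 1] + hair


MOTHER = face(0)
OTHERS = [face(1), face(2), face(3)]
STRANGER = face(4)   # never shown during training


def mask_outline(pattern):
    """Keep the internal features, cover contour and hair."""
    return pattern[:3] + [0] * 7


def mask_features(pattern):
    """Keep contour and hair, cover the internal features."""
    return [0] * 3 + pattern[3:]


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def response(weights, pattern):
    """Piecewise-linear activation of the weighted input sum."""
    s = sum(w * x for w, x in zip(weights, pattern))
    if s <= THETA_L:
        return 0.0
    if s >= THETA_U:
        return 1.0
    return (s - THETA_L) / (THETA_U - THETA_L)


def train(weights, iterations):
    for i in range(iterations):
        # Mother on every fourth iteration (25%); other faces otherwise.
        pattern = MOTHER if i % 4 == 0 else OTHERS[i % 4 - 1]
        weights = normalize([w + RATE * x for w, x in zip(weights, pattern)])
    return weights


def probe(weights):
    return {
        "mother": response(weights, MOTHER),
        "stranger": response(weights, STRANGER),
        "mother, outline masked": response(weights, mask_outline(MOTHER)),
        "mother, features masked": response(weights, mask_features(MOTHER)),
    }


prenatal = normalize([1, 1, 1] + [0] * 7)   # three-dot-like initial RF
before = probe(prenatal)
after = probe(train(prenatal, 500))
for label in before:
    print(f"{label:24s} before={before[label]:.3f} after={after[label]:.3f}")
```

In this sketch the neuron initially responds equally to mother and stranger, driven by the internal features alone. After training, the mother response exceeds the stranger response, and neither masked version of the mother reaches threshold, because the normalized weights are spread over features and outline together.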

10.4 Discussion

The HLISSOM simulations show that self-organization based on internally generated patterns and environmental inputs can together account for face detection in newborns and infants. This perspective leads to testable predictions and suggests future experiments. Importantly, several of HLISSOM’s predictions differ from other models and theories, making it possible to distinguish between them in the future.

[Figure 10.17 panels: columns (a) to (f); rows labeled Retina, LGN, FSA-0, and FSA-500.]

Fig. 10.17. Mother preferences based on both internal and external features. Initially, the prenatally trained FSA responds to both women well, with no significant difference (p = 0.28; plots a,b in the row labeled “FSA-0”). The response is primarily due to the internal facial features (c,d), although the hair and one of the eyes also align into a three-dot pattern in both figures, causing weak spurious activation (a,b). Subsequently, image (a), designated as the mother, was presented in 25% of the postnatal learning iterations, while image (b), the stranger, was not presented at all. After 500 iterations (bottom row), the response to the mother is significantly greater than to the stranger (p = 0.001). This result replicates the mother preference found by Pascalis et al. (1995) in infants 3–9 days old. The same results are found in the counterbalancing condition: when trained on face (b) as the mother, (b) becomes preferred (p = 0.002; not shown). After training with real faces, there is no longer any FSA response to the facial features alone (c,d), which replicates Pascalis et al.’s (1995) finding that newborns no longer preferred their mother when her face outline was covered. Importantly, no preference is found for the face outline alone either (e,f), suggesting that face learning in HLISSOM is holistic. This conclusion is contrary to Pascalis et al.’s (1995) conclusion but consistent with face learning in adults (Farah et al. 1998).

One easily tested prediction is that newborn face preferences should not depend on the precise shape of the face outline. The Acerra et al. (2002) model (Section 10.1.3) makes the opposite prediction, because in that model the preferences arise from precise spacing differences between the external border and the internal facial features. Results from the HLISSOM simulations also suggest that newborns will have a strong preference for real faces (e.g. in photographs), whereas the Acerra et al. model predicts only a weak preference for real faces, if any (Section 10.1.3).

Experimental evidence to date cannot yet decide between the predictions of these two models. For instance, Simion et al. (1998a) did not find a significant schematic face preference in newborns 1–6 days old without a contour surrounding the internal features, which is consistent with the Acerra et al. model. However, the same study concluded that the shape of the contour “did not seem to affect the preference” for the patterns, which would not be consistent with the Acerra et al. model. As discussed earlier, newborns may not require any external contour, as the pattern-generation model predicts, until they have had postnatal experience with faces. Future experiments with younger newborns should compare model and newborn preferences between schematic patterns with a variety of border shapes and spacings. These experiments will either show that the border shape is crucial, as predicted by Acerra et al., or that it is unimportant, as predicted by the pattern-generation model.

The predictions of the pattern-generation model also differ from those of the Simion et al. (2001) top-heavy model (Section 10.1.3). The top-heavy model predicts that any face-sized border that encloses objects denser at the top than the bottom will be preferred over similar schematic patterns. The pattern-generation model predicts instead that a pattern with three dots in the typical symmetrical arrangement is preferred over the same pattern with both eye dots pushed to one side, despite both patterns being equally top heavy. These two models represent very different explanations of the existing data, and thus testing such patterns should offer clear support for one model over the other.
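The contrast between the two predictions can be illustrated with two toy patterns. The sketch below is only a schematic stand-in for both models: the grid size, the blob positions, the top-heaviness score (blobs above the midline minus blobs below it), and the overlap with an assumed three-dot template are all hypothetical simplifications chosen for illustration, not measures defined by either model.

```python
GRID_HEIGHT = 5  # rows 0 (top) to 4 (bottom); an assumed toy resolution

# Patterns as sets of (row, column) positions of dark blobs.
SYMMETRIC = {(1, 1), (1, 3), (3, 2)}   # two eyes above a centered mouth
SHIFTED = {(1, 3), (1, 4), (3, 2)}     # both eye dots pushed to one side
TEMPLATE = {(1, 1), (1, 3), (3, 2)}    # assumed three-dot receptive field


def top_heaviness(pattern):
    """Blobs above the horizontal midline minus blobs below it."""
    middle = (GRID_HEIGHT - 1) / 2
    upper = sum(1 for row, _ in pattern if row < middle)
    lower = sum(1 for row, _ in pattern if row > middle)
    return upper - lower


def template_match(pattern):
    """Number of blobs that coincide with the three-dot template."""
    return len(pattern & TEMPLATE)


for name, pattern in (("symmetric", SYMMETRIC), ("shifted", SHIFTED)):
    print(name, top_heaviness(pattern), template_match(pattern))
```

Both patterns receive the same top-heaviness score, so a purely top-heavy account has no basis for preferring one, whereas the template overlap is higher for the symmetric arrangement, which is the direction of preference the pattern-generation model predicts.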

On the other hand, many of the predictions of the fully trained pattern-generation model implemented in HLISSOM are similar to those of the CONSPEC model proposed by Johnson and Morton (1991). In fact, the reduced HLISSOM face preference network in Section 10.2.6 (which does not include V1) can be seen as the first CONSPEC system to be implemented computationally, along with a concrete proposal for how such a system could be constructed during prenatal development. The primary architectural difference between the trained HLISSOM network and CONSPEC/CONLERN is that in HLISSOM only neurons located in cortical visual areas respond selectively to faces in the visual field, whereas both the subcortical CONSPEC and the cortical CONLERN systems are face selective.

Whether newborn face detection is mediated cortically or subcortically has been debated extensively, yet no clear consensus has emerged from behavioral studies (Simion et al. 1998a). If future brain imaging studies do discover face-selective visual neurons in subcortical areas of newborns, HLISSOM will need to be modified to include such areas. Yet, the key principles would remain the same, because internally generated activity also shapes subcortical regions (Wong 1999). Thus, experimental tests of the pattern-generation model vs. CONSPEC should focus on how the initial system is constructed, and not where it is located.

Although the HLISSOM model is a good match with current experimental data, it could be extended to account for more variation in the input. For example, the current model is only able to detect facelike patterns at one particular spatial scale. Because all experimental data on face preferences in newborns come from experiments with life-sized input at a distance of around 20 cm, multiple sizes were not necessary to account for human data. It would be easy to extend the model to multiple face sizes (i.e. distances) by varying the spatial scale of the training patterns during self-organization (Sirosh and Miikkulainen 1996b). The FSA in such a simulation would need to be much larger to represent the different sizes, and the resulting patchy FSA responses would require more complex methods of analysis, but the resulting model should perform like the current model at the same scale. With the multi-scale model, it would be possible to make specific predictions about how human face detection varies over distance.

All HLISSOM experiments were based on upright training patterns, because the Simion et al. (1998a,b) studies suggest that the orientation of face patterns is important even in the first few days of life. In the areas that generate training patterns, such a bias might be due to anisotropic lateral connectivity, which would cause spontaneous patterns in one part of the visual field to suppress those below them (discussed further in Section 16.2.2). On the other hand, tests with the youngest infants (less than 1 day) have not yet found orientation biases (Johnson et al. 1991). Thus, the experimental data are also consistent with a model that assumes unoriented patterns prenatally, followed by rapid learning from upright faces. Such a model would be more complicated to simulate and describe than the one presented in this chapter, but could use a similar architecture and learning rules.

Another important aspect of postnatal learning that is currently not explicitly included in the HLISSOM model is that of fovea vs. periphery (this extension will be discussed in Section 17.2.10). Preference for schematic faces is not measurable in central vision until 2 months of age (Maurer and Barrera 1981), and is gone by 5 months (Johnson et al. 1991). This time course is delayed relative to peripheral vision, where preferences exist at birth but disappear by 2 months. As was reviewed in Section 10.1.3, Johnson and Morton (1991) propose two separate explanations for these phenomena. In the periphery, the preferences disappear because CONLERN matures and inhibits CONSPEC, whereas in central vision they disappear because CONLERN learns properties of real faces and no longer responds to static schematic patterns. HLISSOM instead suggests a unified explanation for both phenomena: A single learning system stops responding to schematic faces because it has learned from real faces. Why, then, would the time course differ between peripheral and central vision? As Johnson and Morton acknowledged, the retina changes significantly over the first few months. In particular, at birth the fovea is much less mature than the periphery, and may not even be functional yet (Abramov, Gordon, Hendrickson, Hainline, Dobson, and LaBossiere 1982; Kiorpes and Kiper 1996). As a result, schematic face preferences in central vision may be delayed. A single cortical learning system like HLISSOM is thus sufficient to account for the time course of both central and peripheral schematic face preferences.

Central and peripheral differences may also have a role in how mother preferences develop postnatally. In a recent study, Bartrip, Morton, and de Schonen (2001) found that infants 19–25 days old do not prefer their mothers significantly when either the internal features or the external features are covered. This result partially confirms the predictions of Section 10.3.5, although tests still need to be run with newborns only a few days old, as Pascalis et al. (1995) did. Interestingly, Bartrip et al. also found that infants 35–40 days old do prefer their mothers even when the external outline is covered. The gradual maturation of the fovea may again explain these later-developing capabilities. Unlike the periphery, the fovea contains many ganglion cells with small RFs, which connect to cortical cells with small RFs (Merigan and Maunsell 1993). These neurons can learn smaller regions of the mother’s face, and their responses will allow the infant to recognize the mother even when other regions of the face are covered. In this way, simple documented changes in the retina can explain why mother preferences would differ over time in different parts of the visual field.

While the current HLISSOM simulations focus on how faces are detected, in the future the model could be used to study face recognition as well. These two tasks have very different requirements. In face detection, the system has to respond similarly to a wide range of different faces. This behavior is achieved in HLISSOM with low afferent learning rates: Each neuron develops preferences that match the long-term averages of faces. In contrast, in face recognition the responses to different individual faces have to differ significantly. Such behavior could be modeled in HLISSOM by including additional FSA-like regions with a higher learning rate: Different neurons would learn to prefer different faces. The final organization of such regions would not depend strongly on the prenatal training patterns, because the initial preferences would soon be overwritten with postnatal experiences. However, even for face recognition regions, prenatal training could speed up postnatal learning by ensuring that their initial state is close to patterns that will be experienced postnatally.
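The effect of the learning rate can be seen in a small numerical sketch. It substitutes a plain exponential moving average for HLISSOM's normalized Hebbian rule, and the three "faces" are hypothetical one-hot vectors, so it illustrates only the general trade-off: a low rate settles near the long-term average of the inputs, while a high rate is dominated by the most recent input.

```python
import math

# Three hypothetical "faces", presented in a repeating cycle.
FACES = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
STEPS = 600


def train(rate, steps):
    """Exponential moving average of the presented faces."""
    weights = [0.0, 0.0, 0.0]
    for i in range(steps):
        face = FACES[i % len(FACES)]
        weights = [w + rate * (x - w) for w, x in zip(weights, face)]
    return weights


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


mean_face = [sum(column) / len(FACES) for column in zip(*FACES)]
last_face = FACES[(STEPS - 1) % len(FACES)]

slow = train(0.01, STEPS)   # low rate, as assumed for detection
fast = train(0.9, STEPS)    # high rate, as assumed for recognition

print("low rate : to mean", round(distance(slow, mean_face), 3),
      "to last face", round(distance(slow, last_face), 3))
print("high rate: to mean", round(distance(fast, mean_face), 3),
      "to last face", round(distance(fast, last_face), 3))
```

With the low rate the weights end up close to the average face and respond similarly to every face in the set. With the high rate they stay close to whichever face was seen last, which is the kind of individual selectivity a recognition region would need.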

Because tests with human newborns are technically difficult and expensive to perform, understanding how face preferences develop can benefit from the study of model systems in other species. In particular, the phenomenon of chick imprinting has much in common with newborn and young infant face recognition (Bolhuis and Honey 1998; Horn 1985; Johnson and Morton 1991). Birds also exhibit REM sleep (Siegel 1999), and chicks have an “innate” preference for visual stimuli shaped like a head with a neck, on the day after hatching (Horn 1985). Interestingly, chicks do not significantly prefer such stimuli on the first day after hatching, and the preference does not depend on patterns experienced the first day. If such preferences arise from pattern generation, as in the HLISSOM model of face preferences, they may be due to patterns presented in REM sleep the first night. Such patterns may be triggered by the stress hormones that are released after hatching (as suggested by M. H. Johnson, personal communication, January 24th, 2002), but they would not take effect until the next REM sleep episode. Subsequent experimental studies of disrupting REM sleep in chicks can be conducted in conjunction with an HLISSOM-based model of chick imprinting. Such experiments would provide a concrete, practical test for high-level pattern generation as a general principle of development across species.

238 10 Understanding High-Level Development: Face Detection

10.5 Conclusion

The HLISSOM face detection simulations show that internally generated patterns and a self-organizing system can together explain why newborns prefer facelike visual input, how neonates learn faces, and how face detection develops in the longer term. The model also suggests why newborns prefer specific patterns, why the response to schematic faces decreases over time, and how mother preferences develop. Unlike in other models, the same principles apply to both central and peripheral vision in HLISSOM, and the results differ only because the fovea matures more slowly.

These explanations and simulation results lead to several concrete predictions for future infant experiments. Such experiments may eventually verify the underlying hypothesis that the genome steers development through internally generated patterns, allowing sophisticated abilities to be learned faster and more robustly.

Part IV

PERCEPTUAL GROUPING

11 PGLISSOM: A Perceptual Grouping Model

Grouping of image elements into coherent objects is an intriguing, fundamental function of the visual cortex. Part IV of the book focuses on understanding this process, suggesting that self-organization of lateral connections plays a central role in it. In order to perform grouping, the LISSOM model is expanded in two ways. First, the flat two-dimensional map is extended into a two-layer structure, where long-range excitatory connections in the second layer perform binding and segmentation. Second, firing-rate units are extended into spiking units so that the system can represent temporal coding. The resulting model, PGLISSOM (perceptual grouping LISSOM; Choe 2001; Choe and Miikkulainen 2000, 2004), is used to demonstrate how V1 can perform grouping in schematic images. Simulating spiking and long-range excitation is computationally expensive, and the model can therefore include only the essential components. In particular, the high-level areas and subcortical areas (including the ON/OFF channels) of the HLISSOM experiments in Part III are not included in the simulations in Part IV.

In this chapter, the architecture and components of PGLISSOM are described in detail, showing how the network is initialized, activated, and trained. As a validation experiment, an orientation map is shown to self-organize as in firing-rate LISSOM models. In subsequent chapters in Part IV, PGLISSOM’s temporal coding and self-organization processes are demonstrated and analyzed, and the model is shown to account for low-level perceptual grouping phenomena such as contour integration and certain illusory contours.

11.1 Motivation: Temporal Coding

Recently, considerable evidence has emerged suggesting that low-level perceptual grouping, such as integrating a sequence of line segments into a coherent contour, takes place early in the visual system, most likely in V1 (Kapadia, Ito, Gilbert, and Westheimer 1995; Polat, Mizobe, Pettet, Kasamatsu, and Norcia 1998; Stettler et al. 2002). A map-level model such as LISSOM is ideal for testing this hypothesis computationally. However, the LISSOM models introduced in Parts II and III consist of