Source: Computational Maps in the Visual Cortex (Miikkulainen, 2005)


for several weeks whether their eyes are open (i.e. when self-organization is driven by visual inputs) or whether they are sutured shut (i.e. when it is presumably still driven by internal patterns). At least for a while, therefore, postnatal self-organization may be influenced by internal inputs. How long this process continues and what its long-term effects are is not known; however, eventually visual inputs will have a significant effect, resulting in representations that match the input statistics, as was discussed in the previous section. It is possible that the role of internally generated inputs gradually changes from self-organization to maintenance, i.e. counteracting the excessive adaptation to noisy environmental inputs that might otherwise take place. This possibility is discussed in more detail in Section 17.2.4.

When trained with natural images only, the map develops a final organization very similar to that of the prenatally trained HLISSOM maps (Figure 9.6). This result is interesting because it suggests that prenatal training is not necessary to obtain functional adult maps. However, such training can still be very useful for the animal. The animal will have a functioning orientation detection system immediately at birth, giving it a survival advantage. Prenatal learning may also make postnatal learning more robust against variations in parameter values and random fluctuations in the inputs. The prenatal training patterns are simpler and well separated from each other (Feller et al. 1996), suggesting that well-organized maps will develop under a greater range of conditions than they could for natural images. Prenatal training may also be important for the development of higher areas connected to V1 (e.g. V2 and V4), because it ensures that the map organization in V1 is approximately constant after birth (Figure 9.6). As a result, the higher areas can begin learning appropriate connection patterns with V1 even before eye opening.

Still, it is interesting that orientation maps can develop without any initial order under such widely varying input conditions: oriented patterns, large unoriented patterns, a wide variety of natural images, and to some extent, even just noise. It does not seem likely that internally generated patterns would exist only in order to organize low-level maps, if they can be obtained so robustly. Instead, it is possible that the primary effect of prenatal training is to bias the system so that high-level functions are easier to develop. This is the hypothesis studied in detail in the next chapter.

9.5 Discussion

The results in this chapter show that prenatal training on internally generated activity followed by postnatal training on natural images can account for how orientation maps, orientation selectivity, receptive fields, and lateral connections in V1 develop. The same activity-dependent learning rules can explain development based on both internally and externally generated activity. The two types of activity serve important but different roles in this developmental process, and both are crucial for replicating the experimental data.

Comparing orientation maps and RFs trained on random noise vs. those trained on images, disks, or Gaussians suggests that oriented features are needed for realistic receptive fields. Even though rough maps develop without such features, the receptive fields do not match those typically measured in animals. A similar result was recently found independently by Mayer et al. (2001) using single-RF simulations. However, they conclude that natural images are required for realistic RFs, because they did not consider patterns like noisy disks. The results in this chapter suggest that any pattern with large, coherent spots of activity will suffice, and thus that natural images are not strictly required for RF development.

In animals, the map that exists at eye opening has more noise and fewer selective neurons than the prenatally trained maps in Figures 9.1 and 9.2 (Chapman et al. 1996; Crair et al. 1998). As a result, in animals the postnatal improvement in selectivity is larger than that shown here for HLISSOM. The difference may result partly from measurement noise, but also partly from the immature receptive fields in the developing LGN (Tavazoie and Reid 2000). Using a more realistic model of the LGN would allow the map to improve more postnatally, but it would make the model significantly more complex to analyze. Neurons may also appear less selective at birth because the cortical responses vary more in infants. Such behavior could be modeled by adding internal noise to the prenatal neurons, which again would make the model more complex to analyze but would not fundamentally change the self-organizing process.

A recent study has also reported that the distribution of orientation-selective cells matches the environment even in very young ferrets, i.e. that horizontal and vertical orientations are over-represented in orientation maps at eye opening (Chapman and Bonhoeffer 1998). One possible explanation for this result is that the retinal ganglion cells along the horizontal and vertical meridians are distributed nonuniformly (Coppola et al. 1998), which could bias the statistics of internally generated patterns. Even if such a prenatal bias exists, HLISSOM shows how biased visual experience is sufficient for the map to develop preferences that match the visual environment.

9.6 Conclusion

The HLISSOM results show that internally generated activity and postnatal learning can together explain much of the development of orientation preferences. Either type of activity alone can lead to orientation maps, but only with realistic prenatal activity and postnatal learning with real images can the model account for the full range of experimental results. The model also suggests a distinct role for both kinds of inputs: Prenatal learning allows the animal to have a functional visual system at birth, forming a robust starting point for further development, and postnatal learning allows refining it to represent the environment better.

In this chapter, the HLISSOM model was tested in a domain that has abundant experimental data for validation. The next chapter will utilize this map as the first cortical processing stage, and will use the prenatal and postnatal simulation techniques to model how the cortical circuitry develops in the much less well-studied domain of face processing.

10 Understanding High-Level Development: Face Detection

The previous chapter showed that internally generated patterns and visual experience can explain how orientation preferences develop prenatally and postnatally in V1, a process that is well documented and allows validating the model with neurobiological data. In this chapter, the same ideas will be applied to face detection, which has been extensively studied psychophysically, but where little neurobiological data exist. The simulations will demonstrate that internally generated patterns result in face preferences similar to those observed in newborns. When the system is trained further with real images, it learns faster and more robustly. The time course of learning matches that of human infants, showing a weaker response to schematic patterns and a stronger response to familiar faces. These results complement those for orientation processing, showing how prenatal and postnatal learning could also combine genetic and environmental influences in constructing higher visual function. The psychophysical data and existing theories on infant face detection are first reviewed below, followed by the description of HLISSOM prenatal and postnatal learning experiments.

10.1 Psychophysical and Computational Background

Although the neurobiological foundations of infant face detection are still unknown, it has been studied extensively using psychophysical methods. The experiments have inspired several computational models and theories, which will be reviewed and evaluated in this section.

10.1.1 Psychophysical Data

Although much of the biological data on the visual system comes from cats and ferrets, face-selective neurons or regions have not yet been documented in these animals, either adult or newborn. Even in primates, the data are sparse: The youngest primates that have been tested and found to have face-selective neurons are 6-week-old monkeys (Rodman 1994; Rodman, Skelly, and Gross 1991). Six weeks is a significant amount of visual experience, and it has not yet been possible to measure neurons or regions in younger monkeys. Thus, it is unknown whether the cortical regions that are face selective in adult primates are also face selective in newborns, or whether they are even fully functional at birth (Bronson 1974; Rodman 1994). As a result, how these regions develop remains highly controversial (see de Haan 2001; Gauthier and Logothetis 2000; Gauthier and Nelson 2001; Nachson 1995; Slater and Kirby 1998; Tovée 1998 for reviews).

While measurements at the neuron and region levels are not available, behavioral tests with human infants suggest that face detection develops like orientation maps. In particular, internal, genetically determined factors are also important for face detection. The main evidence comes from a series of studies showing that human newborns turn their eyes or head toward facelike stimuli in the visual periphery longer or more often than they do so for other stimuli (Goren et al. 1975; Johnson, Dziurawiec, Ellis, and Morton 1991; Johnson and Mareschal 2001; Johnson and Morton 1991; Mondloch, Lewis, Budreau, Maurer, Dannemiller, Stephens, and Kleiner-Gathercoal 1999; Simion, Valenza, Umiltà, and Dalla Barba 1998b; Valenza, Simion, Cassia, and Umiltà 1996). These effects have been found within minutes or hours after birth. Figure 10.1 shows how several of these studies have measured the face preferences, and Figure 10.2 shows a typical set of results. Whether these preferences represent genuine preference for faces is controversial, in part because measuring pattern preferences in newborns is difficult (Cohen and Cashon 2003; Easterbrook, Kisilevsky, Hains, and Muir 1999; Hershenson, Kessen, and Munsinger 1967; Kleiner 1987, 1993; Maurer and Barrera 1981; Simion, Cassia, Turati, and Valenza 2001; Slater 1993; Thomas 1965). Newborn preferences for additional patterns will be reviewed in Section 10.2, which also shows that HLISSOM exhibits similar face preferences when trained on internally generated patterns.

Early postnatal visual experience also affects face preferences, as it does how orientation maps develop. For instance, an infant only a few days old will prefer to look at its mother’s face, relative to the face of a female stranger with “similar hair coloring and length” (Bushnell 2001) or “broadly similar in terms of complexion, hair color, and general hair style” (Pascalis, de Schonen, Morton, Deruelle, and Fabre-Grenet 1995). A significant mother preference is found even when non-visual cues such as smell and touch are controlled (Bushnell 2001; Bushnell, Sai, and Mullin 1989; Field, Cohen, Garcia, and Greenberg 1984; Pascalis et al. 1995). The infant presumably prefers the mother because he or she has learned the mother’s appearance. Indeed, Bushnell (2001) found that newborns look at their mother’s face about 1/4 of their time awake over the first few days, which provides ample time for learning.

Fig. 10.1. Measuring newborn face preferences. A few minutes or hours after birth, human infants are presented schematic stimuli, measuring how far to the side their eyes or head track each stimulus. The experimenter does not see the specific pattern shown, and neither does the observer who measures the baby’s responses. Face preferences have been found even when the experimenter’s face and all other faces seen by the baby were covered by surgical masks. Reprinted with permission from Johnson and Morton (1991), copyright 1991 by Blackwell.

Pascalis et al. (1995) found that the mother preference disappears when the external outline of the face is masked, and argued that newborns are learning only face outlines, not faces. They concluded that newborn mother learning might differ qualitatively from adult face learning. However, HLISSOM simulation results in Section 10.3 will show that learning of the whole face (internal features and outlines) can also result in mother preferences. Importantly, masking the outline in HLISSOM also erases these preferences, even though outlines were not the only parts of the face that were learned. Thus, HLISSOM predicts that newborns instead learn faces holistically, as has been suggested for adults (Farah, Wilson, Drain, and Tanaka 1998).

Experiments with infants over the first few months reveal a surprisingly complex pattern of face preferences. Newborns up to 1 month of age continue to track facelike schematic patterns in the periphery, but older infants do not (Figure 10.3; Johnson et al. 1991). Curiously, in central vision, schematic face preferences are not measurable until about 2 months of age (Maurer and Barrera 1981), and they decline by 5 months of age (Johnson and Morton 1991). Section 10.3 will show that in each case such a decline can result from learning real faces, coupled with the different rate of maturation of fovea and periphery in the retina.

In summary, much of the neural basis of face processing is still unclear, in the adult and especially in newborns. However, behavioral experiments suggest that human newborns can detect faces already at birth, and their performance develops postnatally as they experience real faces. These experiments suggest that face-selective neurons develop based on both prenatal and postnatal factors, like neurons in the orientation map. Both cases can be explained based on internally generated neural activity: The system develops through input-driven self-organization both before and after birth.

[Figure 10.2 bar graphs: final tracking orientation (0°–60°) for stimuli (a)–(g); left panel measured within 1 hour of birth, right panel at 21 hours.]
Fig. 10.2. Face preferences in newborns. Using the procedure from Figure 10.1, Johnson et al. (1991) measured responses of human newborns to a set of head-sized schematic patterns. The graph at left gives the result of a study conducted within 1 hour after birth; the one at right gives results from a separate study with newborns an average of 21 hours old. Each bar indicates how far the newborns tracked the image pictured below with their eyes on average. Because the procedures and conditions differed between the two studies, only the relative magnitudes should be compared. Overall, the study at left shows that newborns respond to facelike stimuli (a,b) more strongly than to simple control conditions (c); all comparisons were statistically significant. This result suggests that face processing is in some way genetically coded. In the study at right, the checkerboard pattern (d) was tracked significantly farther than the other stimuli, and pattern (g) was tracked significantly less far; no significant difference was found between the responses to (e) and (f). The ovals are not as visible to the newborn as the square dots, and the checkerboard stimulates the newborn’s low-level visual system extremely well. These results suggest that simple three-dot patterns can invoke face preferences much like facelike patterns do, but low-level visual stimulation can also have a significant effect. Replotted from Johnson et al. (1991).

10.1.2 Computational Models of Face Processing

The models discussed in this book so far have simulated visual processing only up to V1 and did not include any of the higher cortical areas that are thought to underlie face-processing abilities. Most computational systems that include face processing were not intended as biological models, but instead focus on specific engineering applications such as face detection or face recognition (e.g. Bartlett, Movellan, and Sejnowski 2002; Burton, Bruce, and Hancock 1999; Graham and Allinson 1998; Ko and Byun 2003; Lawrence, Giles, Tsoi, and Back 1997; O'Toole, Millward, and Anderson 1988; Rao and Ballard 1995; Rowley, Baluja, and Kanade 1998; Viola and Jones 2004; Wiskott and von der Malsburg 1996; Yilmaz and Shah 2002; see Phillips, Wechsler, Huang, and Rauss 1998; Yang, Kriegman, and Ahuja 2002 for reviews). A few biologically motivated face processing models exist, but like the engineering systems they either bypass the circuitry in V1 and below, or treat it as a

[Figure 10.3 bar graphs: first attention orientation (0°–70°) for stimuli (a)–(h); left panel at 3 months, right panel at 5 months.]
Fig. 10.3. Face preferences in young infants. In addition to newborns (Figure 10.2), Johnson and Morton (1991) also tested how infants at various postnatal ages up to 5 months respond to schematic patterns. They rotated the infant’s chair toward the stimulus and measured the angle at which he or she first attended (or oriented) to it. Neither 3-month-old nor 5-month-old infants significantly preferred facelike schematic patterns (b,c,d and f,g,h) over the controls (a and e). Results at earlier ages were variable, depending on the testing method (e.g. whether the stimuli were presented in central or peripheral vision). These results suggest that early postnatal visual experience significantly shapes the infant face preferences. Replotted from Johnson and Morton (1991).

fixed set of predefined filters (Acerra, Burnod, and de Schonen 2002; Bartlett and Sejnowski 1997, 1998; Dailey and Cottrell 1999; Gray, Lawrence, Golomb, and Sejnowski 1995; Wallis 1994; Wallis and Rolls 1997; see Valentin, Abdi, O’Toole, and Cottrell 1994 for a review). Given the output of the filtering stage, these models show how face-selective neurons and responses can develop from training with real images. The HLISSOM model of face processing develops such neurons and responses as well. However, it is the first to use a self-organized V1 as the input stage, and the first to demonstrate that the same computational mechanism could be responsible for processing in both V1 and the higher face-processing area. HLISSOM thus unifies these high-level models with the V1 models discussed earlier.

Of the biological models, the Dailey and Cottrell (1999) and Acerra et al. (2002) models have goals most similar to those of this chapter. Acerra et al. (2002) simulated newborn face preferences, and their work will be reviewed in the next subsection. Dailey and Cottrell (1999) instead had a more general focus on whether face detection needs to be genetically encoded. They showed in an abstract model how specific face-selective regions can arise without genetically specifying the weights of each neuron. As was discussed in Section 2.1.3, some of the higher visual areas of the adult human visual system respond more strongly to faces than objects; others have the opposite preferences. Moreover, some of the face-selective areas have been shown to occupy the same region of the brain in different individuals (Kanwisher et al. 1997). This consistency suggests that those areas might be genetically specified for face processing.

To show that such explicit prespecification is not necessary, Dailey and Cottrell (1999) set up a pair of supervised networks that compete with each other to identify faces and to classify objects into categories. They provided one of the networks with real images filtered to preserve low-spatial-frequency information (i.e. slow changes in brightness across a scene), and another with the images filtered to preserve high-spatial-frequency information. These differences correspond to connecting each network to a subset of the neurons in V1, each with different preferred spatial frequencies. They found that the low-frequency network consistently developed face-selective responses, while the high-frequency network developed object-selective responses. Thus, they concluded that different areas may specialize for different tasks based on very simple, general differences in their connectivity, and that the specific configuration of individual neurons need not be genetically specified to respond to faces.
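The spatial-frequency split that Dailey and Cottrell used to prepare the inputs for the two networks can be sketched with a simple frequency-domain filter. The Gaussian transfer function and its width below are illustrative assumptions, not their exact preprocessing:

```python
import numpy as np

def gaussian_lowpass(img, sigma=4.0):
    """Keep slow changes in brightness by attenuating high spatial
    frequencies with a Gaussian transfer function."""
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    transfer = np.exp(-2 * (np.pi * sigma) ** 2 * (fy ** 2 + fx ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * transfer))

rng = np.random.default_rng(0)
img = rng.random((64, 64))
low = gaussian_lowpass(img)   # low-frequency stream ("face" network input)
high = img - low              # high-frequency residual ("object" network input)
# The low stream preserves mean luminance (DC gain is 1) but has reduced
# detail; the residual stream carries the fine structure with zero mean.
assert abs(high.mean()) < 1e-8 and low.std() < img.std()
```

Feeding each stream to a separate classifier reproduces the structure of their experiment: the only difference between the two networks is which band of the image spectrum they see.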

Like the Dailey and Cottrell (1999) model, HLISSOM is based on the assumption that cortical neurons are not specifically prewired for face perception. To make detailed simulations practical, HLISSOM will model only a single high-level region, one that has sufficient spatial frequency information available to develop face-selective responses. Other regions presumably develop similarly, but become selective for objects or other image features instead.

10.1.3 Theoretical Models of Newborn Face Preferences

The computational models discussed in the previous section do not specifically explain why newborns should respond strongly to faces. Such explanations have all been conceptual, not computational (with the exception of Acerra et al. 2002). There are four main theories of this phenomenon: (1) the linear systems model, (2) sensory models (including the Acerra et al. (2002) computational model and the top-heavy conceptual model), (3) haptic models, and (4) multiple systems models. These theories will be reviewed below, showing how they compare to the pattern generation model, and arguing that it provides a simpler, more effective explanation.

Linear Systems Model

The linear systems model (LSM; Banks and Salapatek 1981; Kleiner 1993) is a straightforward and effective way of explaining a wide variety of newborn pattern preferences, and could easily be implemented as a computational model. Because it is general and simple, it constitutes a baseline model against which others can be compared. The LSM is based solely on the newborn’s measured contrast sensitivity function (CSF). For a given spatial frequency, the value of the CSF will be high if the early visual pathways respond strongly to that size of pattern, and low otherwise. The newborn CSF is limited by the immature state of the eye and the early visual pathways, which makes low frequencies more visible than fine detail.


The LSM assumes that newborns pay attention to those patterns that give the largest response when convolved with the CSF. Low-contrast patterns and patterns with only very fine detail are only faintly visible, if at all, to newborns (Banks and Salapatek 1981). Conversely, faces might be preferred because they have strong spatial-frequency components in the ranges that are most visible to newborns.
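As a rough computational rendering of this assumption, the predicted LSM response is the pattern's amplitude spectrum weighted by the CSF and summed. The log-Gaussian band-pass below is an assumed stand-in for the measured newborn CSF; its peak and width are illustrative, not fitted values:

```python
import numpy as np

def csf(freq, peak=0.05, width=1.0):
    """Stand-in newborn CSF: a log-Gaussian band-pass peaking at a low
    spatial frequency (in cycles/sample); shape and peak are assumptions."""
    freq = np.maximum(freq, 1e-9)
    return np.exp(-(np.log(freq / peak) ** 2) / (2 * width ** 2))

def lsm_response(pattern):
    """LSM prediction: amplitude spectrum weighted by the CSF, then summed."""
    amp = np.abs(np.fft.fft2(pattern))
    fy = np.fft.fftfreq(pattern.shape[0])[:, None]
    fx = np.fft.fftfreq(pattern.shape[1])[None, :]
    return float(np.sum(amp * csf(np.hypot(fy, fx))))

# A coarse grating (2 cycles/image) vs. fine detail (24 cycles/image):
x = np.linspace(0, 2 * np.pi, 64, endpoint=False)
coarse = np.outer(np.sin(2 * x), np.ones(64))
fine = np.outer(np.sin(24 * x), np.ones(64))
assert lsm_response(coarse) > lsm_response(fine)  # coarse is more "visible"
```

With any band-pass CSF of this kind, the model favors whatever pattern concentrates its contrast energy in the visible band, which is exactly the property the LSM uses to rank stimuli.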

However, studies have found that the LSM fails to account for the responses to facelike stimuli. For instance, some of the facelike patterns preferred by newborns have a lower amplitude spectrum in the visible range (and thus lower expected LSM response) than patterns that are less preferred (Johnson and Morton 1991). The LSM also predicts that the newborn will respond equally well to a schematic face regardless of its orientation, because the orientation does not affect the spatial frequency or the contrast. Instead, newborns prefer schematic facelike stimuli oriented right-side up. Such a preference is found even when the inverted stimulus is a better match to the CSF (Valenza et al. 1996). Thus, the CSF alone does not explain face preferences, and a more complex model is required.

Acerra et al. Sensory Model

The LSM is a high-level abstraction of the properties of the early visual system. Sensory models extend the LSM to include additional constraints and circuitry, but without adding face-selective visual regions or systems. Acerra et al. (2002) recently developed such a computational model that can account for some of the face preferences found in the Valenza et al. (1996) study. Their model consists of a fixed Gabor-filter-based model of V1, plus a high-level sheet of neurons with modifiable connections. They model two conditions separately: newborn face preferences, and postnatal development by 4 months. The newborn model includes only V1, because they assume that the high-level sheet is not yet functional at birth.

Acerra et al. showed that the newborn model responds slightly more strongly to the upright schematic face pattern used by Valenza et al. (1996) than to the inverted one. This surprising result replicates the newborn face preferences found by Valenza et al. In the stimuli, only the internal facial features were inverted, not the entire pattern. In the upright case, the spacing is more regular between the internal features and the face outline (compare Figure 10.7d with 10.7g, top row). As a result, neurons whose RF lobes match the spacing respond more strongly, and the total response of all filters will be slightly higher for the facelike (upright) pattern than to the nonfacelike (inverted) pattern.
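The spacing effect can be illustrated in one dimension: an even-symmetric Gabor filter responds more strongly when two features are separated by its lobe spacing (one wavelength) than when the spacing is irregular. The filter parameters and stimuli below are arbitrary illustrations, not those of the Acerra et al. model:

```python
import numpy as np

x = np.arange(-16.0, 17.0)   # 1D "retina" coordinates

def gabor(wavelength=8.0, sigma=6.0):
    """Even-symmetric Gabor: excitatory/inhibitory lobes repeat at the wavelength."""
    return np.exp(-x ** 2 / (2 * sigma ** 2)) * np.cos(2 * np.pi * x / wavelength)

def response(stimulus):
    """Magnitude of the filter's linear response to the stimulus."""
    return abs(float(np.sum(stimulus * gabor())))

def spot(center):
    return (np.abs(x - center) < 1).astype(float)

regular = spot(-4) + spot(4)     # features one wavelength apart: both fall on lobes
irregular = spot(-9) + spot(4)   # irregular spacing: contributions partly cancel
assert response(regular) > response(irregular)
```

Summed over a bank of such filters, stimuli whose feature spacing matches the dominant RF lobe spacing produce a slightly larger total response, which is the effect Acerra et al. report for the upright pattern.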

However, the Acerra et al. model was not tested with patterns from other studies of newborn face preferences, such as Johnson et al. (1991). The facelike stimuli published by Johnson et al. (1991) do not have a regular spacing between the internal features and the outline, and it is unlikely that the model will replicate preferences for these patterns. Moreover, Johnson et al. used a white paddle against a light-colored ceiling, and so their face outlines would have a much lower contrast than the black-background patterns used by Valenza et al. (1996). Thus, although border effects may have contributed to the face preferences found by Valenza et al., they are unlikely to explain those measured by Johnson et al.


The Acerra et al. newborn model also does not explain newborn learning of faces, because their V1 model is fixed and the high-level area is assumed not to be functional at birth. Also importantly, the model was not tested with real images of faces, where the spacing of the internal features from the face outline varies widely depending on the way the hair falls. Because of these differences, we do not expect the Acerra et al. model to show a significantly higher response overall to photographs of real faces than to other similar images. The pattern-generation model will make the opposite prediction, and will explain how newborns can learn faces.

To explain learning of real faces in older infants, the Acerra et al. model relies on having face images strictly aligned in the input, having nothing but faces presented to the model (no objects, bodies, or backgrounds), and having the eyes in each face artificially boosted by a factor of 10 or 100 relative to the rest of the image. Because of these assumptions, it is difficult to evaluate how well their postnatal learning model corresponds to experimental data. In contrast, the HLISSOM model learns from faces presented at random locations on the retina, against natural image backgrounds, intermixed with images of other objects, and without special emphasis for faces relative to the other objects.

Top-Heavy Sensory Model

Simion et al. (2001) also presented a sensory model of newborn preferences, although their model is conceptual only. They observed that nearly all of the facelike schematic patterns that have been tested with newborns are top-heavy, i.e. they have a boundary with denser patterns in the upper than the lower half. They also ran behavioral experiments showing that newborns prefer several top-heavy (but not facelike) schematic patterns to similar but inverted patterns. Based on these results, they proposed that newborns prefer top-heavy patterns in general, and thus prefer facelike schematic patterns as a special case.
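One crude way to operationalize this hypothesis is to compare feature density in the upper and lower halves of a pattern. The index below is our own illustrative measure, introduced here only for concreteness, since the conceptual model does not specify one:

```python
import numpy as np

def top_heaviness(pattern):
    """Ratio of feature density in the upper vs. lower half of the pattern;
    values above 1 indicate a top-heavy pattern."""
    h = pattern.shape[0] // 2
    upper, lower = pattern[:h].sum(), pattern[h:].sum()
    return upper / max(lower, 1e-9)

# Schematic three-dot "face": two dots in the upper half, one below.
face = np.zeros((10, 10))
face[2, 3] = face[2, 7] = 1.0   # "eyes"
face[7, 5] = 1.0                # "mouth"
assert top_heaviness(face) > 1.0          # upright pattern is top-heavy
assert top_heaviness(face[::-1]) < 1.0    # inverted pattern is bottom-heavy
```

Under this measure the upright three-dot configuration is preferred over its inversion, matching the qualitative behavior the top-heavy hypothesis predicts, while saying nothing about faces per se.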

This hypothesis is compatible with most of the experimental data so far collected in newborns. However, facelike patterns have not yet been compared directly with other top-heavy patterns in newborn studies. Thus, it is not yet known whether newborns would prefer a facelike pattern to a similarly top-heavy but not facelike pattern. Future experimental tests with newborns can resolve this issue.

To be tested computationally, the top-heavy hypothesis would need to be made more explicit, with a specific mechanism for locating object boundaries and the relative locations of patterns within them. It would then be possible to test it with a variety of inputs, including photographs of real faces. Whereas the bulk of the current evidence suggests that newborns prefer face patterns in general, we expect that a computational test of the top-heavy model would find only a small preference (if any) for real faces, compared with many other common stimuli. Many real faces, such as those with beards, wide smiles, or wide-open mouths, are not necessarily top-heavy, and would result in little or no response from the model.

This prediction is also supported by a systematic test of training pattern shapes with HLISSOM, presented in Section 10.2.6. Although many simple shapes including general top-heavy patterns result in weak face preferences, more facelike patterns