Добавил:
kiopkiopkiop18@yandex.ru t.me/Prokururor I Вовсе не секретарь, но почту проверяю Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:

Ординатура / Офтальмология / Английские материалы / Computational Maps in the Visual Cortex_Miikkulainen_2005

.pdf
Скачиваний:
0
Добавлен:
28.03.2026
Размер:
16.12 Mб
Скачать

10.1 Psychophysical and Computational Background

211

are necessary to obtain selective responses that allow matching the behavior of the model with newborn data. Such patterns will therefore be used in training the HLISSOM model of newborn face preferences.

Haptic Hypothesis

Bushnell (1998) proposed an explanation very different from that of the sensory models: A newborn may recognize facelike stimuli as a result of prenatal experience with its own face, via manual exploration of its facial features. Some support for this position comes from findings that newborn and infant monkeys respond equally or more strongly to pictures of infant monkeys than to adults (Rodman 1994; Sackett 1966).

However, the process by which a newborn could make such specific connections between somatosensory and visual stimulation, prior to visual experience, is not clear. Moreover, the haptic explanation does not account for several aspects of newborn face preferences. For instance, premature babies develop face preferences at the same post-conception age regardless of the age at which they were born (Ferrari, Manzotti, Nalin, Benatti, Cavallo, Torricelli, and Cavazzutti 1986). Presumably, the patterns of hand and arm movements would differ between the intrauterine and external environments, and thus the haptic hypothesis would predict that gestation time should have been an important factor. Several authors have also pointed out strong similarities between newborn face preferences and imprinting in newly hatched chicks; chicks, of course, do not have hands with which to explore, yet develop a specific preference for stimuli that resemble a (chicken’s) head and neck (Bolhuis 1999; Horn 1985). Some of these objections are overcome in a variant of the haptic hypothesis by Meltzoff and Moore (1993), who propose that direct proprioception of the infant’s own facial muscles is responsible.

However, neither variant can account for evidence suggesting that newborns’ preferences are specifically visual. For instance, newborns respond as well to patterns with a single dot in the nose and mouth area as to separate patterns for the nose and mouth (Johnson et al. 1991). This finding is easy to explain for visual images: In a blurred top-lit visual image, shadows under the nose blend together with the mouth into a single region. But the nose and mouth have opposite convexity, so it is difficult to see how they could be considered a single region for touch stimulation or proprioception. Similarly, newborns have so far only been found to prefer faces viewed from the front, and it is not clear why manual exploration or proprioception would favor that view in particular. Thus, in this chapter the newborn face preferences are assumed to be essentially visual.

Multiple-Systems Models

The most widely known conceptual model for newborn face preferences and later learning was proposed by Johnson and Morton (1991). Apart from HLISSOM simulations in this chapter that test some of its foundations, it has not yet been evaluated computationally.

212 10 Understanding High-Level Development: Face Detection

Johnson and Morton proposed that infant face preferences are mediated by two hypothetical visual processing systems that they dubbed CONSPEC and CONLERN. CONSPEC is a fixed system controlling orienting to facelike patterns, assumed to be located in the subcortical superior colliculus–pulvinar pathway. Johnson and Morton proposed that a CONSPEC responding to three dark dots in a triangular configuration, one each for the eyes and one for the nose/mouth region, would account for the newborn face preferences (see Figures 10.5a and 10.6c for examples).

CONLERN is a separate plastic cortical system, presumably corresponding to the face-processing areas that have been found in adults and in infant monkeys (Kanwisher et al. 1997; Rodman 1994). The CONLERN system would assume control only after about 6 weeks of age, and would account for the face preferences seen in older infants. Eventually, as it learns from real faces, CONLERN would gradually stop responding to schematic faces, which would explain why face preferences can no longer be measured with static schematic patterns by 5 months (Johnson and Morton 1991).

The CONSPEC/CONLERN model is plausible, given that the superior colliculus is relatively mature in newborn monkeys and is involved in controlling attention and other functions (Wallace et al. 1997). Moreover, some neurons in the adult superior colliculus/pulvinar pathway are selective for faces (Morris, Ohman, and Dolan 1999), although such neurons have not yet been found in young animals. The model also helps explain why infants are less interested in faces in the periphery after 1 month: The preferences may change as the attentional control shifts to the not-quite- mature cortical system (Johnson et al. 1991; Johnson and Morton 1991).

However, subsequent studies showed that even newborns are capable of learning individual faces (Slater 1993; Slater and Kirby 1998). Thus, if there are two visual processing systems, either both are plastic or both are functioning at birth, and thus there is no a priori reason why a single face-selective visual system would be insufficient. On the other hand, de Schonen, Mancini, and Liegeois (1998) argue for three visual processing systems: a subcortical one responsible for facial feature preferences at birth, another one responsible for newborn learning (of objects and head/hair outlines; Slater 1993), and a cortical system responsible for older infant and adult learning of facial features. And Simion, Valenza, and Umilta` (1998a) proposed that face selectivity instead relies on multiple visual processing systems within the cortex, maturing first in the dorsal stream but later supplanted by the ventral stream (which is where most of the face-selective visual regions have been found in adult humans).

In contrast to the increasing complexity of these explanations, the HLISSOM model shows that a single general-purpose, plastic visual processing system is sufficient, if that system is first exposed to internally generated facelike patterns of neural activity. As reviewed in Section 2.3.4, PGO activity waves during REM sleep represent a likely candidate for such activity. If the PGO waves have the simple three-dot configuration illustrated in Figures 8.4c and 10.5a, they can explain the measured face-detection performance of human newborns.

The three-dot training patterns are similar to the three-dot preferences proposed by Johnson and Morton (1991). However, in their model the patterns were imple-

10.2 Prenatal Development

213

mented as a hard-wired subcortical visual area responsible for orienting to faces. In HLISSOM, the areas receiving visual input learn from their input patterns, and only the pattern generator is assumed to be hard-wired. Both of these possible mechanisms require about the same amount of genetic specification. The crucial difference is that in the pattern generation approach the visual processing system can be arbitrarily complex because it learns the complexity from the input. In contrast, a hard-coded visual processing system like CONSPEC is limited by what can specifically be encoded in the genome. Such a system is a plausible model for subcortically mediated orienting at birth, but less plausible as a model of a cortical areas. In the future it may be possible to use imaging techniques to determine whether newborn face preferences are based on subcortical processes, or mediated by a hierarchy of cortical areas as they are in the adult.

Specifying only the training patterns also makes sense from an evolutionary perspective, because that way the patterns and the visual processing hardware can evolve independently. Separating these capabilities thus allows the visual system to become arbitrarily complex, while maintaining the same genetically specified function. Finally, assuming that only the pattern generator is hardwired explains how infants of all ages can learn faces.

In conclusion, previous face-processing models have not yet shown how newborns can have a significant preference for real faces at birth, or how newborns could learn from real faces. They have also not accounted for the full range of patterns with which newborns have been tested. Using the self-organized orientation map, the HLISSOM model of face detection will show how face-selective neurons can develop through internally generated activity, explaining face preferences at birth and later face learning. In the next two sections, the prenatal and postnatal phases of this process are each discussed in turn.

10.2 Prenatal Development

In this section, internally generated patterns are used to self-organize the full HLISSOM model prenatally, including both V1 and the FSA. Its performance on schematic facelike patterns is then shown to be remarkably similar to that of human infants. Importantly, the trained model responds strongly to real faces as well, but not to other naturally occurring objects. This behavior can be obtained with a variety of internally generated patterns that match the general outline of the human face.

10.2.1 Training Method

Previous developmental models have been tested only for small input areas such as those used in Chapters 4–9. In contrast, face detection requires processing headsized stimuli at a distance of about 20 cm from the baby’s eyes, filling a substantial portion of the visual field (about 45). To model behavior at this scale, the orientation map simulation from Section 9.2 was expanded to a very large V1 area

214 10 Understanding High-Level Development: Face Detection

(approximately 1600 mm2 in total) at a relatively low sampling density (approximately 50 neurons/mm2). The expansion was done using the scaling equations in Appendix A.2, which allow adjusting the simulation parameters algorithmically to obtain a simulation of a different size (the scaling equations will be described in detail in Chapter 15).

The cortical density was first reduced to the minimum value that would show an orientation map that matches animal maps (36 × 36), and the visual area was scaled to be just large enough to cover the visual images to be tested (288 × 288, i.e. width ×8 and height ×8). The FSA size was less crucial, and was set arbitrarily at 36 ×36. The FSA RF size was scaled to be large enough to span the central portion of a face input. The resulting network consisted of 438 × 438 retinal units, 220 × 220 PGO generator units, 204 ×204 ON-center LGN units, 204 ×204 OFF-center LGN units, 288 × 288 V1 units, and 36 × 36 FSA units, for a total of 408,000 distinct units. There were 80 million connections in total in the two cortical sheets, which required 300 MB of physical memory.

The RF centers of neurons in the FSA were mapped to the central 160 × 160 region of V1 such that even the units near the edge of the FSA had a complete set of afferent connections on V1, with no FSA RF extending over its edge. V1 was similarly mapped to the central 192 ×192 region of the LGN channels, and the LGN channels to the central 384×384 region of the retina and the central 192×192 region of the PGO generators. In the figures in this chapter, only the area mapped directly to V1 will be shown, to ensure that all plots have the same scale. The size of the input patterns on the retina was chosen to match the preferred spatial frequency of newborns, as cited by Valenza et al. (1996). Inputs were presented at the spatial scale where the frequency most visible to newborns produced the largest V1 response in the model. The rest of the simulation parameters are listed in Appendix C.3.

V1 was self-organized for 10,000 iterations with inputs consisting of 11 randomly located circular disks per iteration, each 50 units wide (Figure 10.4a). These simple patterns were used for clarity and simplicity, since the focus of these simulations is on the development of the high-level FSA region. If the V1 size were increased to provide more units for each location in the retina, the noisy disks patterns from the previous chapter could also have been used. The background activity level was 0.5, and the brightness of each disk relative to this surround (either +0.3 or 0.3) was chosen randomly. The borders of each disk were smoothed into the background level following a Gaussian width σd = 1.5.

The FSA was trained for 10,000 iterations based on the responses of the V1 network. Two triples of dark circular dots were used as input to V1, each arranged in a triangular facelike configuration (Figure 10.5a). As was discussed in Section 10.1.3, such patterns roughly correspond to the eye and nose/mouth areas of the face, as proposed by Johnson and Morton (1991). Each dot had a radius of 10 PGO units; the centers of the two top dots were separated by 50 units and they were located 54 units from the bottom dot. The dot was 0.3 units darker than the surround, which itself was 0.5 on a scale of 0 to 1. Each triple was placed at a random location each iteration, at least 118 PGO units away from the center of the other one to avoid overlap. The

10.2 Prenatal Development

215

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

(a) Retinal activation

(b) LGN response

(c) V1 response

(d)

 

 

 

 

 

 

 

 

 

 

V1 H

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

(e) (f )

(g) V1 OR preference &

(h)

(i) OR FFT

(j) Detail of V1 OR

RFs LIs

selectivity

OR H

 

preference & selectivity

Fig. 10.4. Self-organization of the scaled-up orientation map. These figures show a scaledup version of the “Disks” simulation from Figure 9.1, eight times wider and eight times taller. At this scale, each input includes multiple disks (a), the afferent weights of each neuron span only a small portion of the retina (drawn to scale in e) and the lateral weights only a small part of V1 (drawn to scale in f ), and the orientation map has many more orientation patches (g). Its Fourier transform (i) is still ring-shaped and its OR histogram (h) flat. Zooming in on the central 36 × 36 portion of this 288 × 288 map, plot (j) also shows that the local structure and selectivity of the map are similar to those in Section 9.2. The map appears blockier because the neuron density was reduced to the smallest acceptable value so that the network would be practical to simulate. Plot (c) shows that the orientation preference of each neuron that responds to the input (plotted as in Figure 6.5) is still a good match to the orientation of the input at that retinal location, and the histogram of the responses is unbiased, although noisy (d). Thus, this network is a reasonable approximation to a large area of V1 and the retina.

angle of each triple was drawn randomly from a narrow (σ = π/36 radians) normal distribution around vertical.

Because the model is very large and expensive to simulate, V1 was trained first, followed by the FSA. This arrangement reduces computational cost without significantly affecting how face preferences develop, which is the focus of the next subsection.

216 10 Understanding High-Level Development: Face Detection

(a) PGO activation

(b) V1 response

(c) FSA response

 

(d) FSA response

 

 

before training

 

after training

 

 

 

 

 

 

 

 

 

 

(e) Initial

(f ) Final

(g) Initial

(h) Final (i) Initial FSA map (j) Final FSA map

FSA RFs

FSA RFs

FSA LIs

FSA LIs

Fig. 10.5. Self-organization of the FSA map. The PGO activation is shown in gray scale from black to white (low to high), and the V1 and FSA activities and the afferent and lateral weights in gray scale from white to black (low to high). (a) Each input pattern consisted of two dark three-dot configurations with random nearly vertical orientations presented at random locations on the PGO sheet. (b) The V1 neurons compute their responses based on this input, relayed through the LGN. (c) FSA neurons initially respond to any activity in their receptive fields, but after training (d), only neurons with closely matching RFs respond. In the FSA plots, the inner square represents the FSA and is drawn to scale with the retina. The outer square is provided to help locate the FSA responses on the retina, as was done in Figures 4.4 and A.1(a). Through self-organization, the FSA neurons develop RFs selective for a range of V1 activity patterns like those resulting from the three-dot stimuli (e and f , drawn in the same scale as b for two sample neurons). The RFs are patchy because the weights target specific orientation patches in V1. This match between the FSA and the local self-organized pattern in V1 would be difficult to ensure without training on internally generated patterns. The FSA neurons also develop lateral inhibitory connections with a smooth Gaussian profile (g and h, drawn in the same scale as c and d for the two neurons in e and f ). Plots (i) and (j) show the afferent weights for every third neuron in the FSA. All neurons develop roughly similar weight profiles, differing primarily by the position of their preferred stimuli on the retina and by the specific orientation patches targeted in V1. The largest differences between RFs are along the outside border, where the neurons are less selective for three-dot patterns. Overall, the FSA develops into a face detection map, signaling the location of facelike stimuli.

10.2.2 V1 and Face-Selective Area Organization

Through self-organization, the scaled-up map shown in Figure 10.4g emerged. The map has the same structural features and similar quantitative measures as the previous LISSOM OR maps (Chapters 5 and 9). However, as was seen in Figure 8.2,

10.2 Prenatal Development

217

this scaled-up model can extract the salient local orientations even in large images, which has not yet been demonstrated for other models of V1 development.

After V1 had been trained, its weights were fixed and the FSA was allowed to activate and learn from the V1 responses to the three-dot patterns. The resulting face-selective map consists of an array of neurons that respond most strongly to patterns similar to the training patterns (Figure 10.5j). Despite the overall similarities between neurons, the individual weight patterns are unique because each neuron targets specific orientation patches in V1. Such complicated patterns would be difficult to specify and hardwire genetically, but they arise naturally from internally generated activity.

10.2.3 Testing Method

As the FSA training completed, the lower threshold of the sigmoid (i.e. θl in Figure 4.5) was increased to ensure that only patterns that are a strong match to an FSA neuron’s weights will activate it. An active neuron in the FSA thus indicates that there is a face at the corresponding location in the retina, which will be important for measuring the face-detection ability of the network.

The cortical maps were then tested on natural images and with the same schematic stimuli on which human newborns have been tested (Goren et al. 1975; Johnson and Morton 1991; Simion et al. 1998b; Valenza et al. 1996). For all such tests, the same parameter settings described in Appendix C.3 were used. Schematic images were scaled to a brightness range of 1.0 (i.e. the difference between the darkest and lightest pixels in the image). Natural images were scaled to a brightness range of 2.5, so that facial features in images with faces would have a contrast comparable to that of the schematic images. If both types of images were instead scaled to the same brightness range, the model would prefer the schematic faces over real ones. Using different scales is appropriate for these experiments because responses are compared only within groups of similar images, and the infant is assumed to adapt to each group. For simplicity, the model does not specifically include the mechanisms in the eye, LGN, and V1 that are responsible for such contrast adaptation.

The HLISSOM model provides detailed neural responses for each neural region. These responses constitute predictions for future electrophysiological and imaging measurements in animals and humans. However, the data currently available for infant face perception are behavioral. It consists of newborn attention preferences measured from visual tracking distance and looking time. Thus, validating the model on these data will require predicting a behavioral response based on the simulated neural responses.

As a general principle, newborns are assumed to pay attention to the stimulus whose overall neural response most clearly differs from those of typical stimuli. This idea can be quantified as

a(t) =

F (t)

+

V (t)

+

L(t)

,

(10.1)

 

 

 

 

 

 

 

 

 

F V L

218 10 Understanding High-Level Development: Face Detection

where a(t) is the attention level at time t, and F , V , and L represent the FSA, V1, and LGN regions: X(t) (either F , V , or L) is the total activity in region X at that time, while X is the average (or median) activity over the recent history. Because most stimuli activate the LGN and V1 but not the FSA, when a pattern evokes activity in the FSA the newborns would attend to it more strongly. Yet, stimuli evoking only V1 activity could still be preferred over facelike patterns if their V1 activity is much higher than typical.

Unfortunately, it is difficult to use such a formula to compare to newborn experiments, because the presentation order in those experiments is usually not known. As a result, the average or median value of patterns in recent history is not available. Furthermore, the numerical preference values computed in this way will differ depending on the specific set of patterns chosen, and thus will be different for each study.

Instead, a categorical approach inspired by Cohen (1998) will be used in the simulations below; it avoids these problems and leads to similar results. Specifically, when two stimuli both activate the model FSA, the one with the higher total FSA activation will be preferred. Similarly, with two stimuli activating only V1, the higher total V1 activation will be preferred. When one stimulus activates only V1 and another activates both V1 and the FSA, the pattern that produces FSA activity will be preferred, unless the V1 activity is much larger than for typical patterns. Using these guidelines, the computed model preferences can be validated against the newborn’s looking preferences, to determine if the model shows the same behavior as the newborn.

10.2.4 Responses to Schematic Patterns

This section and the following one present the model’s response to schematic patterns and real images after it has completed training on the internally generated patterns. The response to each schematic pattern is compared with behavioral results from infants, and the responses to real images constitute predictions for future experiments.

Figures 10.6 and 10.7 show that the model responses match the measured stimulus preferences of newborns remarkably well, with the same relative ranking in each case where infants have shown a significant preference between schematic patterns. These rankings are the main result from this simulation. Each category of schematic patterns is next analyzed in more detail, to understand what aspects of the model are responsible for the result.

Most non-facelike patterns activate only V1, and thus the preferences between those patterns are based only on the V1 activity values (Figures 10.6a,ei and 10.7f i). Patterns with numerous high-contrast edges have greater V1 response, which explains why newborns would prefer them. These preferences are in accord with the simple linear systems model (Section 10.1.3), because they are based only on the early visual processing.

Facelike schematic patterns activate the FSA, whether they are realistic or simply patterns of three dots (Figures 10.6bdand 10.7ad). The different FSA activation levels reflect the level of V1 response, not the precise shape of the pattern. Again

10.2 Prenatal Development

219

the responses explain why newborns would prefer patterns like the three-square face over the three-oval face, which has shorter edges. The preferences between these patterns are also compatible with the LSM, because in each case the strength of the V1 response in the model matches newborn preferences.

The comparisons between facelike and non-facelike patterns show how HLISSOM predictions differ from the LSM. HLISSOM predicts that the patterns that activate the FSA would be preferred over those activating only V1, except when the V1 response is highly anomalous. Most V1 responses are very similar, and so the patterns with FSA activity (Figures 10.6bd and 10.7ad) should be preferred over most of the other patterns, as found in infants. However, the checkerboard pattern (Figure 10.6a) has nearly three times as much V1 activity as any other pattern with a similar background. Thus, the HLISSOM results explain why the checkerboard would be preferred over other patterns, even ones that activate the FSA.

Because the RFs in the model are only a rough match to the schematic patterns, the FSA can have spurious responses to patterns that to adults do not look like faces. For instance, the inverted three-dot pattern in Figure 10.7e activated the FSA. In this case, part of the square outline filled in for the missing third dot of an upright pattern. Figure 10.8 shows that such spurious responses should be expected with inverted patterns, even if neurons prefer upright patterns. This result may explain why some studies have not found a significant difference between upright and inverted three-dot patterns (e.g. Johnson and Morton 1991). Because of such effects, future experimental studies should include additional controls besides inverted three-dot patterns.

Interestingly, the model also showed a clear preference in one case where no significant preference was found in newborns (Simion et al. 1998a): for the upright three-dot pattern with no face outline (Figure 10.7a), over the similar but inverted pattern (Figure 10.7i). The V1 responses to both patterns are similar, but only the upright pattern has an FSA response, and thus the model predicts that the upright pattern would be preferred.

This potentially conflicting result may be due to postnatal learning rather than capabilities at birth. As will be shown in the next section, postnatal learning of face outlines can have a strong effect on FSA responses. The newborns in the Simion et al. study were already 1–6 days old, and Pascalis et al. (1995) showed that newborns within this age range have already learned some of the features of their mother’s face outline. Thus, the pattern generation model predicts that if younger newborns are tested, they will prefer upright patterns even without a border. Alternatively, the border may satisfy some minimum requirement on size or complexity for patterns to attract a newborn’s interest. For instance, the total V1 activity may need to be above a certain threshold for the newborn to pay any attention to a pattern. If so, the HLISSOM procedure for deriving a behavioral response from the model response (Section 10.2.3) would need to be modified to include this constraint.

Overall, these results provide strong computational support for the speculation of Johnson and Morton (1991) that the newborn could simply be responding to a three-dot facelike configuration, rather than performing sophisticated face detection. Internally generated patterns provide an account for how such “innate” machinery

220 10 Understanding High-Level Development: Face Detection

Retina

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

LGN

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

V1

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

2728

 

 

 

927

 

 

836

 

 

664

 

 

1094

 

 

994

 

 

931

 

 

807

 

 

 

275

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

FSA

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

0.0

 

 

 

5.8

 

 

5.5

 

 

4.0

 

 

0.0

 

 

0.0

 

 

0.0

 

 

0.0

 

 

0.0

 

 

 

(a)

 

 

 

(b)

 

 

(c)

 

 

(d)

 

 

(e)

 

 

(f )

 

 

(g)

 

 

(h)

 

 

 

(i)

Fig. 10.6. Response to schematic images by Goren et al. (1975) and Johnson et al. (1991).

The activations of the retina, LGN, V1, and FSA levels are shown using the plotting conventions from Figures 10.4 and 10.5. The top row shows a set of input images as they are drawn on the retina. These patterns were presented to newborn human infants on head-shaped paddles moving at a short distance (about 20 cm) from the eyes, against a light-colored ceiling. The newborn’s preference was determined by measuring the average distance his or her eyes or head tracked each pattern, compared with other patterns. Below, x>y indicates that image x was preferred over image y under those conditions. Goren et al. (1975) measured infants between 3 and 27 minutes after birth. They found that b>f >i and b>e>i. Similarly, Johnson et al. (1991), in one experiment measuring within 1 hour after birth, found b>e>i. In another, measuring at an average of 43 minutes, they found b>e, and b>h. Finally, Johnson and Morton (1991), measuring newborns an average of 21 hours old, found that a>(b,c,d), c>d, and b>d. The HLISSOM model has the same preference for each of these patterns, as shown in the images above. The second row shows the model LGN activations resulting from the patterns in the top row. The third row shows the V1 activations, with the numerical sum of the activities shown underneath. If only one unit were active at half strength, the sum would be 0.5; higher values indicate more activation. The bottom row displays the settled responses of the FSA, again with the numerical sum underneath. This sum represents the strength of the response of the model. The images are sorted left to right according to the preferences of the model. The strongest V1 response by nearly a factor of three is to the checkerboard pattern (a), which explains why the newborn would prefer that pattern over the others. The facelike patterns (bd) are preferred over patterns (ei) because of activation in the FSA. The details of the facelike patterns do not significantly affect the results — all of the facelike patterns (bd) lead to FSA activation, generally in proportion to their V1 activation levels. The remaining patterns are ranked by their V1 activity alone, because they do not activate the FSA. In all conditions tested, the HLISSOM model shows behavior remarkably similar to that of the newborns, and provides a detailed computational explanation for why these behaviors occur. Reprinted from Bednar and Miikkulainen (2003a).