Computational Maps in the Visual Cortex (Miikkulainen et al., 2005)

180 8 HLISSOM: A Hierarchical Model

to span a human face at close range. Although the infant connectivity patterns are not known, areas V4v (ventral V4) and LO (lateral occipital area) match this description based on adult patterns of connectivity (Haxby et al. 1994; Kanwisher et al. 1997; Rodman 1994; Rolls 1990). The generic term “face-selective area” is used rather than V4v or LO to emphasize that the results do not depend on the region’s precise location or architecture, only on the fact that the region has receptive fields large enough to allow face-selective responses. Through self-organization, neurons in the FSA become selective for patterns similar to faces, and do not respond to most other objects and scenes.

8.2.3 Afferent Normalization

Compared with LISSOM’s afferent stimulation function (Equation 4.5, Appendix A), HLISSOM adds an additional parameter γn to allow divisive (shunting) normalization:

$$
s_{ij} = \frac{\gamma_A \left( \sum_{ab \in \mathrm{ON}} \xi_{ab} A_{ab,ij} + \sum_{ab \in \mathrm{OFF}} \xi_{ab} A_{ab,ij} \right)}
{1 + \gamma_n \left( \sum_{ab \in \mathrm{ON}} \xi_{ab} + \sum_{ab \in \mathrm{OFF}} \xi_{ab} \right)} , \qquad (8.1)
$$

where ξab is the activation of neuron (a, b) in the receptive field of neuron (i, j) in the ON or OFF channels, Aab,ij is the corresponding afferent weight, γA is a constant scaling factor, and γn controls the strength of the normalization. An analogous normalization is applied to the inputs from V1 to the FSA.

Equation 8.1 divides the afferent stimulation of the neuron by the total activation in its receptive fields, i.e. it normalizes the response according to total input. If the unit has a strong afferent connection to an input location and that location is active, the normalization increases the neuron’s overall activation; if it has a weak connection to that location, it decreases the activation. This push–pull effect is an abstraction of contrast invariant responses in biology (Sections 16.1.4 and 17.1.2). As seen in Figures 8.2 and 8.3, afferent normalization helps ensure that the cortex responds uniformly even to large natural images, which have a wide variety of contrasts at different locations. As a result, all afferent weights can be excitatory, and adapt based on Hebbian learning as in LISSOM.
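As a concrete illustration, the divisive normalization of Equation 8.1 can be sketched for a single neuron in a few lines of NumPy. The function and variable names below are illustrative, not taken from the HLISSOM implementation; the example only demonstrates how the denominator compresses contrast differences.

```python
import numpy as np

def afferent_response(xi_on, xi_off, A_on, A_off, gamma_A, gamma_n):
    """Divisively normalized afferent stimulation of one neuron (Equation 8.1).

    xi_on, xi_off: activations in the neuron's ON/OFF receptive fields
    A_on, A_off:   the corresponding afferent weights
    """
    drive = np.sum(xi_on * A_on) + np.sum(xi_off * A_off)
    total = np.sum(xi_on) + np.sum(xi_off)
    return gamma_A * drive / (1.0 + gamma_n * total)

rng = np.random.default_rng(0)
xi_on, xi_off, A_on, A_off = rng.random((4, 25))

# The same input pattern at 20% and at 100% contrast:
weak = afferent_response(0.2 * xi_on, 0.2 * xi_off, A_on, A_off, 30.0, 80.0)
strong = afferent_response(xi_on, xi_off, A_on, A_off, 30.0, 80.0)
# With gamma_n > 0, the fivefold contrast increase is almost entirely
# divided away: the two responses come out nearly equal.
```

With γn = 0 the same code reduces to LISSOM’s original afferent stimulation, and the response would instead scale linearly with contrast.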

Artificial input patterns and inputs that cover only a small area of the visual field have relatively uniform contrasts. In simulations that use such input, afferent normalization can be omitted and γn left at zero. This was the case for the LISSOM simulations in Part II, and also for most other self-organizing models of the visual cortex. In the face perception simulations, however, large natural images are used, and afferent normalization is necessary.


[Figure 8.2 panels: (a) Retinal activation; (b) LGN response; (c) V1 response: γn = 0, γA = 3.25; (d) V1 response: γn = 0, γA = 7.5; (e) V1 response: γn = 80, γA = 30.]

Fig. 8.2. Effect of afferent normalization on V1 responses. The LGN response (b) to the activation in (a) is visualized by subtracting the OFF channel activation from the ON, and the V1 responses (c–e) by color coding each neuron according to how active it is and what orientation it prefers (as in Figure 6.5, except this network, from Section 10.2, is much larger). (c) Without afferent normalization (γn = 0), the network can respond only to the strongest contrasts in the image (as in Figure 6.5): The low-contrast oriented lines, such as those along the bottom of the chin, are lost. (d) When the afferent scale (γA) is increased, the network begins to respond to these lines as well, but its activation resulting from the high-contrast contours becomes widespread and unselective. (e) With normalization (γn = 80, γA = 30), the responses are largely invariant to input contrast, and instead are determined by how closely the input pattern matches the receptive field pattern of each neuron. The activations preserve the important features of the input, and the V1 activation pattern can be used as input to a higher level map for tasks such as face processing. Afferent normalization is therefore crucial for producing meaningful responses to natural inputs, which vary widely in contrast. Figure 8.3 shows how afferent normalization affects the responses of single neurons, which underlie these differences in the V1 response.

8.3 Inputs, Activation and Learning

Let us review the inputs and the activation and learning processes in HLISSOM, focusing on how they differ from LISSOM. As in LISSOM, learning is driven by input patterns drawn on the input sheets, which in HLISSOM consist of either the retina or the PGO generator, but not both at once (Figure 8.1). Since in Part II the goal was


[Figure 8.3 panels: (a) γn = 0, γA = 3.25; (b) γn = 80, γA = 30. Each panel plots peak settled response (0.0–1.0) against orientation (0°–150°), with one curve per contrast level (10%–100%, legend at right).]

Fig. 8.3. Effect of afferent normalization on V1 neuron tuning. The differences in V1 population activities shown in Figure 8.2 are due to changes in how individual neurons respond at different contrasts. These plots show orientation tuning curves of the neuron at the center of the cortex, which prefers stimuli oriented at 60°. Each curve shows the peak settled responses of this neuron to sine gratings whose orientations are indicated on the x-axis and whose contrasts are specified in the legend at right. In each case, the sine grating phase that resulted in the largest response was used. (a) Without afferent normalization, the neuron becomes less selective for orientation as contrast increases. Given enough contrast (above 50%), the neuron responds at full strength to inputs of all orientations, and thus no longer provides information about the input orientation. (b) With normalization, the tuning curve is the same over a wide range of contrasts, allowing the neuron to respond only to inputs that match its orientation preference. The curves are similar at 20% contrast (solid line), but the neuron now responds selectively to other contrasts as well. Afferent normalization is therefore crucial for preserving orientation selectivity over a wide range of contrasts.

to demonstrate which essential features of the input are responsible for orientation, ocular dominance, and direction selectivity, each simulation used only a single type of input. In contrast, with HLISSOM the goal is to understand how internally generated inputs and environmental inputs together influence self-organization, and thus HLISSOM simulations include the input activity patterns thought to occur at each developmental age.

Internally generated activity has been observed in several locations in the developing visual system (Section 2.3). Given the current evidence, retinal waves are the most likely source for prenatal self-organization of V1 orientation maps (as will be discussed in Section 16.2.1). One such pattern is reproduced in Figure 8.4a, showing activity in retinal cells of the ferret before photoreceptors have developed. Although their precise origin still needs to be determined, retinal waves activate the ON and OFF neurons differently (Section 16.2.1; Myhr, Lukasiewicz, and Wong 2001). Such patterns can be modeled as “noisy disks”, i.e. large active areas (modeling ON channel activation) and large inactive areas (modeling OFF channel activation) with oriented edges in a noisy background (Figure 8.4b). These patterns will be used to


[Figure 8.4 panels: (a) Retinal waves; (b) Noisy disks; (c) Three-dot patterns; (d) Nature; (e) Landscapes; (f) Faces.]

Fig. 8.4. Internally generated and environmental input patterns. The three top images depict prenatal input patterns on the retina and the PGO pathway in gray scale from black to white (low to high). (a) A sample retinal wave pattern from the ferret (see also Figure 1.2) is used to motivate the actual patterns in HLISSOM experiments. (b) The “noisy disk” representation of retinal waves is used to organize the orientation map prenatally. A light disk models activity in the ON channel and a dark disk that in the OFF channel. (c) A PGO activity configuration of three dark noisy disks, corresponding to the two eyes and the nose/mouth area, is proposed to underlie prenatal development of face preferences. The three bottom images are samples of visual inputs, including those of (d) nature, (e) landscapes, and (f) faces. Randomly located retina-size segments (such as those shown by white squares) are used to train and test V1, and full face images to train and test the FSA, measuring how the variation in postnatal training affects the orientation map and how the face preferences develop postnatally. Sources: (a) Feller et al. (1996), (d) Shouval et al. (1996, 1997), (e) National Park Service (1995), (f) Achermann (1995), copyright 1995 by University of Bern.

organize the HLISSOM orientation map prenatally, i.e. before the onset of visual experience.¹
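Such a “noisy disk” training pattern is straightforward to generate. The sketch below is a hypothetical reconstruction: the size, radius, and noise amplitude are made-up illustrative parameters, not the values used in the HLISSOM experiments.

```python
import numpy as np

def noisy_disk(size=64, radius=12, disk_level=1.0, noise_amp=0.1, seed=None):
    """One 'noisy disk': a large uniform disk at a random position on a
    low-amplitude noise background. After DoG filtering in the LGN, only
    the disk's oriented edge segments survive, which is what drives
    orientation learning in V1."""
    rng = np.random.default_rng(seed)
    cy, cx = rng.uniform(radius, size - radius, 2)
    y, x = np.mgrid[0:size, 0:size]
    disk = (np.hypot(x - cx, y - cy) <= radius).astype(float)
    return disk_level * disk + noise_amp * rng.random((size, size))

pattern = noisy_disk(seed=1)
```

A dark disk (modeling OFF-channel activation) could be drawn analogously with a negative disk_level on a brighter background.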

Other sources of internally generated activity share features with retinal waves; however, their structure has not been characterized in enough detail to date to model directly. The hypothesis tested in the face preference experiments is that PGO patterns consisting of triplets of such waves (Figure 8.4c), corresponding roughly to the dark outlines of the two eyes and the nose and mouth area, could explain why human newborns are drawn to facelike visual inputs. Like retinal waves, PGO waves are not the only possible source for such patterns, but they are the most likely cause for prenatal self-organization of higher levels (Section 16.2.2).

¹ For convenience, the terms “prenatal” and “postnatal” are used to refer to the phases before and after the onset of visual experience. In humans this onset indeed coincides with birth, but in animals such as ferrets and cats it roughly matches eye opening, which happens several days or weeks after birth.

Postnatal training, on the other hand, is based on natural visual inputs in the retina, including photographic images of natural objects, landscapes, and faces (Figure 8.4d–f). Each of these datasets has a slightly different distribution of orientations: Whereas objects have more horizontal and vertical edges than other orientations, landscapes are predominantly horizontal and faces mostly vertical. Compared with the prenatal PGO patterns, the face images include strong outlines and more detailed internal features. While it is difficult to obtain a realistic set of such patterns that would match the experience of an infant (as will be discussed in Section 17.3.3), it is possible to demonstrate what effects the variations of such patterns might have. In several experiments in Chapters 9 and 10, variations of these patterns as well as the prenatal ones will be tested to evaluate how strongly the developmental process depends on specific input features.

The ON and OFF sheets of the LGN in HLISSOM are identical to those in LISSOM, with DoG receptive fields (Equation 4.1) that filter out large, uniformly bright or dark areas, leaving only edges and lines (through Equation 4.3). The cortical sheets, i.e. V1 and the FSA, are similar to V1 in LISSOM. Each consists of initially unselective, laterally connected units that become selective through learning. V1 receives input from the ON and OFF cells of the LGN, while the FSA receives input from the laterally settled response of V1. The mapping from V1 to the FSA is constructed just like the mapping from the LGN to V1 in Figure A.1(a), so that no FSA neuron has a receptive field that is cropped by the border of V1.
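The DoG filtering in the LGN can be illustrated with a short sketch. The kernel size and Gaussian widths below are arbitrary choices for illustration, not the actual parameters of Equation 4.1:

```python
import numpy as np

def dog_kernel(size=9, sigma_c=1.0, sigma_s=3.0):
    """Difference-of-Gaussians receptive field: an excitatory center
    minus an inhibitory surround, each normalized to unit sum so that
    a uniform input produces exactly zero net response."""
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    center = np.exp(-(x**2 + y**2) / (2 * sigma_c**2))
    surround = np.exp(-(x**2 + y**2) / (2 * sigma_s**2))
    return center / center.sum() - surround / surround.sum()

k = dog_kernel()
# Uniformly bright or dark areas are filtered out (zero net drive);
# only edges and lines produce a response, as described in the text.
```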

HLISSOM simulations with large natural images generally start with an initial normalization strength γn of zero, because neurons are initially unselective. As neurons become selective over the course of training, γn is gradually increased. To prevent the net responses from decreasing, the scaling factor γA is set manually to compensate for each change to γn. The goal is to ensure that the afferent response ζ will continue to have values in the full range [0..1] for typical input patterns, regardless of the γn value. At the same time, γn ensures that the cortex responds to all areas of the input, not just the areas with the highest contrast.
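The compensation between γn and γA can be made concrete. The sketch below is illustrative only: the book states that γA was set manually, not computed by a formula. Here γA is chosen so that a typical reference pattern continues to elicit a response of 1.0 as γn is raised:

```python
import numpy as np

def afferent(xi_on, xi_off, A_on, A_off, gamma_A, gamma_n):
    """Equation 8.1 for a single neuron."""
    drive = np.sum(xi_on * A_on) + np.sum(xi_off * A_off)
    total = np.sum(xi_on) + np.sum(xi_off)
    return gamma_A * drive / (1.0 + gamma_n * total)

def compensating_gamma_A(gamma_n, xi_on, xi_off, A_on, A_off):
    """gamma_A that keeps the response to the reference pattern at 1.0."""
    drive = np.sum(xi_on * A_on) + np.sum(xi_off * A_off)
    total = np.sum(xi_on) + np.sum(xi_off)
    return (1.0 + gamma_n * total) / drive

rng = np.random.default_rng(0)
xi_on, xi_off, A_on, A_off = rng.random((4, 25))

responses = []
for gamma_n in (0.0, 20.0, 80.0):   # gradually raised during training
    gamma_A = compensating_gamma_A(gamma_n, xi_on, xi_off, A_on, A_off)
    responses.append(afferent(xi_on, xi_off, A_on, A_off, gamma_A, gamma_n))
# All three responses equal 1.0: raising gamma_n does not shrink the
# net response when gamma_A is rescaled to match.
```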

After the afferent normalization, the initial cortical response is calculated from the afferent response using a sigmoid activation function (Equations 4.4–4.6). Activation then settles due to the lateral connections (Equation 4.7), and each weight is updated as in LISSOM. Through this process, HLISSOM develops realistic ordered maps, receptive fields, and lateral connections.
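The overall activation and learning loop can be summarized in a toy sketch. The piecewise-linear sigmoid and the single merged settling step below are simplifications of Equations 4.4–4.8, and all sizes, thresholds, and rates are made-up values:

```python
import numpy as np

def sigmoid(s, theta_l=0.1, theta_u=0.9):
    """Piecewise-linear sigmoid: 0 below theta_l, 1 above theta_u,
    linear in between."""
    return np.clip((s - theta_l) / (theta_u - theta_l), 0.0, 1.0)

rng = np.random.default_rng(0)
n = 50
s = 0.8 * rng.random(n)          # normalized afferent responses (toy)
E = rng.random((n, n)) / n       # excitatory lateral weights (toy)
I = rng.random((n, n)) / n       # inhibitory lateral weights (toy)

eta = sigmoid(s)                 # initial cortical response
for _ in range(9):               # settle through lateral interactions
    eta = sigmoid(s + 0.9 * (E @ eta) - 0.9 * (I @ eta))

# Hebbian update on the settled response, followed by divisive weight
# normalization so each neuron's total connection strength stays constant:
E = E + 0.01 * np.outer(eta, eta)
E = E / E.sum(axis=1, keepdims=True)
```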

8.4 Effect of Input Sequence and Initial Organization

Before analyzing how the different training sequences affect self-organization in HLISSOM, it is necessary to verify that the resulting organization is indeed primarily determined by the inputs, and not by the initial random state of the network. This


hypothesis is experimentally verified in this section using HLISSOM, but it applies to all LISSOM models and also to other similar self-organizing models. There are two types of variability between runs with different random numbers: the random order and position of the individual input patterns at each iteration, and the random initial values of the connection weights. Each one can be independently varied while the other is kept constant, and the resulting differences can be observed.

A series of orientation map simulations similar to those in Section 5.3 were run in this way (Figure 8.5; Appendix C.1). The results demonstrate that the map shape does not depend on the random initial values of the weights, as long as the initial weights are drawn from the same random distribution. This observation is consistent with those on the SOM model (Cottrell, de Bodt, and Verleysen 2001), confirming that self-organization is less sensitive to initial conditions than e.g. backpropagation learning (Kolen and Pollack 1990). Instead, the self-organized orientation map pattern in HLISSOM depends crucially on the stream of inputs. Two different streams lead to different orientation maps, even if the streams are drawn from the same distribution. The overall properties of the maps (such as the distance between orientation patches, the number of pinwheels, etc.) are very similar, but different input streams lead to different arrangements of patches and pinwheels.

HLISSOM is insensitive to initial weights for three reasons, all of which are common properties of most incremental Hebbian models. First, because input patterns vary smoothly, the receptive fields are relatively large, and early in self-organization the afferent weights are uniformly random, the initial scalar product responses between the input and weight vectors (Equation 4.5) are similar regardless of the specific weight values. Second, because these responses settle through lateral excitation (Equation 4.7), the final activity levels are even more similar. Third, with a high enough learning rate, the initial weight values are soon overwritten by the Hebbian learning based on the final responses (Equation 4.8). The net result is that as long as the initial weights are generated from the same distribution, their precise values do not significantly affect map organization. Similar invariance to the initial weights should be found in other Hebbian models that compute the scalar product of the input and a weight vector, particularly if they include lateral excitation and use a high learning rate in the beginning of self-organization.
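The first of these reasons is easy to verify numerically: for a smooth input and a large receptive field, the scalar product with a uniformly random weight vector is dominated by the weights' mean, so different random draws give nearly the same response. A toy check (all values made-up):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400                                  # units in one large receptive field
x = np.exp(-np.linspace(-2, 2, n)**2)    # a smooth input pattern

# Responses of 100 neurons with independently drawn uniform random weights:
responses = np.array([x @ rng.random(n) for _ in range(100)])
cv = responses.std() / responses.mean()
# cv is only a few percent: the specific random weight values barely
# affect the initial scalar-product response, as argued above.
```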

In animals, maps are also similar between members of the same species, but they differ in the specific arrangements of orientation patches and pinwheels (Blasdel 1992b). The HLISSOM model predicts that the specific orientation map pattern in the adult animal depends primarily on the order and type of activity seen by the cortex in early development, and not on the details of the initial connectivity. This result also means that it is very important to study how different input streams affect the self-organization process, as will be done in the next two chapters.

8.5 Conclusion

The HLISSOM model includes the retina and a brainstem pattern generator, the LGN (both ON and OFF channels), V1, and a higher level face-selective region. It can be


[Figure 8.5 layout: rows show (weights 1, inputs 1), (weights 2, inputs 1), and (weights 2, inputs 2); columns show (a) Iteration 0, (b) Iteration 50, (c) Iteration 10,000.]

Fig. 8.5. Effect of different input streams and initial organizations on the self-organizing process. Using a different stream of random numbers for the weights (top two rows) results in different initial maps of orientation preference (a), but has almost no effect on the final selforganized maps (c), nor the lateral connections in them. (The lateral connections are shown in white outline for one sample neuron, marked with a small white square; orientation selectivity is not plotted in this Figure to make the preferences visible in the initial map.) The final result is the same because lateral excitation smooths out differences in the initial weight values, and leads to similar large-scale patterns of activation at each iteration. This process can be seen in the early map (b): The same large-scale features are emerging in both maps despite locally different patterns of noise caused by the different initial weights. In contrast, changing the input stream (bottom two rows) produces very different early and final map patterns and lateral connections, even when the initial weights are identical. Thus, the input patterns are the crucial source of variation, not the initial weights. An animated demo of these examples can be seen at http://computationalmaps.org.


trained with both internally generated patterns and natural images, and the resulting organization depends on the input sequence, not the initial unordered state. These components and properties allow HLISSOM to simulate developmental processes crucial for orientation and face processing in young animals and infants, as will be shown in Chapters 9 and 10. The results can be compared with experimental data and often lead to specific predictions for future experiments.

9 Understanding Low-Level Development: Orientation Maps

Using the HLISSOM model introduced in the previous chapter, this chapter will demonstrate how genetic and environmental influences can interact in developing biologically realistic orientation maps, V1 receptive fields, and lateral connections. The focus will be specifically on orientation maps because of the wealth of experimental data now available about their development. Patterns resembling retinal waves are first shown to have the right properties for developing rudimentary maps like those seen in newborns. These maps are then refined in postnatal learning with natural images, allowing the map to adapt to the statistical properties of the environment. The simulations show how HLISSOM can account for much of the complex process of orientation map development in V1, and also serve as a well-grounded test case for the methods used in the face perception experiments in the next chapter.

9.1 Biological Motivation

The LISSOM simulations in Section 5.3.5 showed that orientation maps can form based on a variety of input patterns, as long as the patterns have sufficient spatial structure. The properties of the map are slightly different in each case, reflecting the features of the input. The first goal of this chapter is to analyze what kind of features internally generated input should have to explain the rudimentary orientation map structure seen in newborns.

In particular, Miller (1994) suggested that retinal wave patterns (discussed in Section 2.3.3) might be too large and too weakly oriented to drive the development of orientation preferences in V1. However, these patterns contain spots of activity that have oriented edges. As long as these spots are large relative to V1 receptive fields, their shape and size should not matter, nor should the background noise; the oriented edges should be enough for V1 neurons to learn to represent orientation. This hypothesis is indeed verified in Section 9.2 in computational experiments with HLISSOM: Retinal waves do have sufficient structure to allow orientation maps and selectivity to develop; further, training on such patterns results in maps that match newborn maps better than those trained with idealized inputs or with random noise.


This result allows asking the next question: How are such internal inputs combined with external ones during development? None of the map models discussed in Section 5.2 has yet demonstrated how the orientation map can smoothly integrate information from these two sources. A number of the models have simulated spontaneously generated activity (e.g. Burger and Lang 1999; Linsker 1986a,b,c; Mayer, Herrmann, and Geisel 2001; Miller 1994; Piepenbrock, Ritter, and Obermayer 1996), and a few models have shown self-organization based on natural images (as reviewed in Section 5.2). Yet, to our knowledge, the only orientation map model to be tested on a prenatal phase with spontaneous patterns followed by a postnatal phase is the Burger and Lang (1999) model. They found that if a map organized based on uniformly random noise was subsequently trained on natural images (actually, patches from a single natural image), the initial structure was soon overwritten. As was discussed in Section 8.1, this is a curious result because animal maps instead maintain the same overall structure during postnatal development.

Section 9.3 will demonstrate how the HLISSOM prenatal map is smoothly refined in postnatal training with natural images. The prenatal map organization is not very different from that of a naïve network, i.e. an initially random map trained only with natural images. Therefore, postnatal training with natural images will only locally adjust the map, not replace it with something else. In this way HLISSOM will show how internal inputs and natural images can interact to construct realistic orientation maps, an important finding that has not been explained by previous models.

After demonstrating that prenatal and postnatal training together can account for the experimental data, the question is: Why are there two phases? There is indirect evidence that the visual cortex keeps getting trained with internally generated inputs at least for several weeks after birth (Crair et al. 1998). Would it be possible to obtain the refined adult orientation map in such continued training with internal inputs? Conversely, is the prenatal phase necessary, or would an adult map form just as well through postnatal training with natural images only?

Further simulations in Section 9.4 demonstrate that accurate adult maps can be obtained with internally generated patterns alone, and with natural images alone. However, there are good reasons why both phases exist: Prenatal training is an advantage because it gives the animal a functional visual system already at birth and makes its further development more robust. Postnatal adaptation, on the other hand, allows it to form an accurate representation of the environment that it actually encounters during its life. Therefore, both phases serve a distinctly different role in constructing the visual system. How these two processes could continue interacting throughout the animal's life, balancing the need to adapt to the environment and the need to maintain stable visual abilities, is an important further research question, discussed in Section 17.2.4.

9.2 Prenatal Development

In this section, HLISSOM is trained with internally generated patterns to develop a rudimentary orientation map similar to e.g. those of newborn kittens. Maps trained