
E SOM Simulation Specifications

Parameter  Value                            Used in        Description
N          40
L          24
σu         0.1
tf         40000
αS         0.42 exp(−6.0 t/tf)              Section 3.4.2  SOM learning rate
σh         max[13.3 exp(−5.0 t/tf), 0.5]    Section 3.4.2  Neighborhood width

Table E.1. Defaults for SOM simulations. The αS (i.e. α in Chapter 3) and σh parameters were reduced exponentially at every iteration t until they reached their minimal values as indicated. Some of these defaults are overridden in the individual SOM simulations, as described in the text.
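The decay schedules in Table E.1 can be written directly as code. A minimal sketch in Python (the function and argument names are ours, not from the original implementation):

```python
import math

def som_schedule(t, t_f=40000):
    """Exponential decay of the SOM learning rate and neighborhood width.

    Follows the defaults in Table E.1: alpha_S decays exponentially over
    training, and sigma_h decays but is clamped at its minimum of 0.5.
    """
    alpha_s = 0.42 * math.exp(-6.0 * t / t_f)
    sigma_h = max(13.3 * math.exp(-5.0 * t / t_f), 0.5)
    return alpha_s, sigma_h
```

At t = 0 this gives the initial values (0.42, 13.3); by t = tf the learning rate has decayed to about 0.001 and the neighborhood width sits at its floor of 0.5.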

F Visual Coding Simulation Specifications

The simulations in Chapter 14 focused on demonstrating how LISSOM representations form a sparse, redundancy-reduced code that retains the salient information in the input and serves as an efficient foundation for later stages of the visual system. Except as indicated below, the simulations were based on the reduced LISSOM specifications of Appendix B and SOM specifications of Appendix E. In addition, backpropagation and perceptron networks were used for reconstruction and recognition, as described in the following sections.

F.1 Sparse Coding and Reconstruction

The sparse coding and input reconstruction experiments in Section 14.2 were based on the default reduced LISSOM parameters, except the training Gaussians were very long (σa = 30), only a single pattern per iteration was used (sd = 1) to avoid overlapping such long patterns, the afferent strength was decreased (γA = 0.7) to compensate for the larger inputs, the lateral inhibitory radius was increased (rI = Nd/1.2) to allow long-range interactions, the lateral inhibitory strength was increased (γI = 4.0) to compensate for the fixed weight of 1.0 being spread over more inhibitory connections, and the cortical density was reduced (Nd = 48) to make it practical to simulate these long connections.

The SoG network was based on the same parameters as the network with self-organized lateral connections, except the lateral excitatory and inhibitory strengths were adjusted in order to perform sparse coding. In the first experiment in Section 14.2.3, γI was increased to 40 while γE remained at 0.9. In the second experiment, γI was reduced to 11.4 without changing γE. In the third experiment, γI = 14 and γE = 0.46.

The reconstruction input consisted of pairs of contours, each with three Gaussian segments with axis lengths σa = 1.9 and σb = 1.2, and with centers of each segment separated by 9.5 units along the contour. The center of each contour was chosen randomly from a uniform distribution in the central two-thirds of the retina, with a minimum separation of 14.3 units between the centers. The orientation of each contour was determined uniformly randomly as well. These patterns were chosen to show how the representation of visual input in the isotropic networks is degraded, i.e. how the interactions between unrelated contour elements cause the response to one or more of the elements to disappear.

The sparseness of the response was measured with population kurtosis K (Field 1994; Willmore and Tolhurst 2001):

 

K = (1/N²) Σij [(ηij − η̄)/ση]⁴ − 3,                    (F.1)

where ηij is the response of neuron (i, j) in the N × N network and η¯ and ση are the mean and the standard deviation of the responses.
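Equation F.1 is straightforward to compute over a response array; a sketch in Python (the function name is ours):

```python
import numpy as np

def population_kurtosis(eta):
    """Population kurtosis K of an N x N response array (Equation F.1).

    Standardizes the responses and averages the fourth power over all N^2
    units, subtracting 3 so that Gaussian-distributed responses give K = 0;
    sparse responses (a few large values, many near zero) give large K.
    """
    eta = np.asarray(eta, dtype=float)
    z = (eta - eta.mean()) / eta.std()
    return np.mean(z ** 4) - 3.0
```

Note that the 1/N² prefactor in Equation F.1 is simply the mean over the N² units, which is why `np.mean` suffices.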

For input reconstruction, a fully connected feedforward backpropagation network (Chauvin and Rumelhart 1995; Hecht-Nielsen 1989; Parker 1982; Rumelhart et al. 1986; Werbos 1974) with one hidden layer and sigmoidal units was trained to map the V1 activity patterns to the corresponding input activity patterns in the retina. Because only a few cortical neurons receive input from the retinal receptors near the edges (Figure A.1; Section 6.2.1), only the central 24 × 24 area of the retina was included in the target pattern. Three different networks were trained, one for reconstruction from the initial V1 response, another from the settled LISSOM response, and a third from the settled response of the sum-of-Gaussians network.

An extensive search for appropriate backpropagation parameters was first done for each network with a set of 10,000 randomly generated input patterns and the corresponding V1 responses. A feedforward network with 500 hidden units and a learning rate of 0.25 for on-line backpropagation (where weights are changed after each input presentation, as opposed to after each pass through the training set) was found to perform consistently the best. The results are robust to relatively wide variations of these parameters: Doubling or halving these values led to only slightly weaker results.

A different dataset of 10,000 randomly generated patterns was then used to compare how well the input could be reconstructed from each of the three types of activity patterns. A 10-fold cross-validation experiment was run where 9000 patterns from this dataset were used for training, 500 for deciding when to stop training (based on the RMS error), and 500 for testing. The validation and testing sets had no overlap between the 10 runs; the training time varied between 37 and 40 epochs. The final performance was measured by counting how many of the reconstructed patterns were closest, in Euclidean distance, to the actual input pattern in the test set.
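The nearest-pattern performance measure can be sketched as follows (a minimal illustration; the function name and array layout are assumptions, not the original code):

```python
import numpy as np

def reconstruction_accuracy(reconstructed, targets):
    """Fraction of reconstructions whose nearest test-set pattern (in
    Euclidean distance) is the actual input that produced them.

    reconstructed, targets: arrays of shape (n_patterns, n_pixels), where
    row k of `reconstructed` was decoded from the response to row k of
    `targets`.
    """
    correct = 0
    for k, r in enumerate(reconstructed):
        dists = np.linalg.norm(targets - r, axis=1)
        if np.argmin(dists) == k:
            correct += 1
    return correct / len(reconstructed)
```

A reconstruction counts as correct only if it is closer to its own input than to every other pattern in the test set, making the measure stricter as the test set grows.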

F.2 Handwritten Digit Recognition

In the handwritten digit recognition simulations in Section 14.3, the recognition performance based on the internal representations of SOM and LISSOM maps was compared. These representations were formed with the default parameters except as indicated below.


The initial SOM map was formed in eight epochs over the training set, with a learning rate of 0.01, linearly reducing the neighborhood width from 20 to 8. The LISSOM training then continued for another 30 epochs, with the inhibition radius of 20 and the excitation radius linearly decreasing from eight to one. The initial SOM was also trained for another 30 epochs, linearly decreasing the neighborhood width from eight to one.

In the LISSOM simulations, an adaptive version of the sigmoid activation function was used. As the map learns and changes its responses, a histogram of the unit’s recent activity values is maintained and used to construct an activation function that approximates the cumulative probability of activation at a given level (Choe 1995). As a result, all units respond at different levels equally often, allowing competition and self-organization to occur robustly. Such adaptation was not necessary with the input distributions and network structures used in this book, which result in even distributions of activity already. However, adaptive sigmoids are useful in general in making the self-organizing process robust under various input and network conditions.
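The histogram-based adaptive activation function can be sketched as follows. This is a minimal illustration of the idea described above (Choe 1995), not the original implementation; the class name, bin layout, and smoothing scheme are our assumptions:

```python
import numpy as np

class AdaptiveSigmoid:
    """Activation function approximating the cumulative probability of a
    unit's recent activity (a sketch of the idea in Choe 1995).

    A running histogram of recent activity is smoothed toward each new
    sample at rate alpha; evaluating the function returns the approximate
    probability that activity falls at or below the given level, so over
    time all units respond at each output level equally often.
    """
    def __init__(self, n_bins=4, alpha=0.1, lo=0.0, hi=1.0):
        self.alpha = alpha
        self.edges = np.linspace(lo, hi, n_bins + 1)
        self.hist = np.full(n_bins, 1.0 / n_bins)  # start uniform

    def update(self, activity):
        """Fold one activity sample into the running histogram."""
        b = np.clip(np.digitize(activity, self.edges) - 1,
                    0, len(self.hist) - 1)
        sample = np.zeros_like(self.hist)
        sample[b] = 1.0
        self.hist = (1 - self.alpha) * self.hist + self.alpha * sample

    def __call__(self, activity):
        """Approximate cumulative probability of activity <= given level."""
        cdf = np.concatenate([[0.0], np.cumsum(self.hist)])
        return float(np.interp(activity, self.edges, cdf))
```

With the histogram initially uniform the function is an ordinary ramp; as low activity values accumulate, the curve steepens near them, spreading the unit's responses evenly across output levels.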

Appropriate learning parameters were found after some experimentation to be αA = 0.005, αE = 0.003, αI = 0.003, αS = 0.006, γE = 1.05, γI = 1.35, θl = 1.0, and θu = 3.5. In addition, the adaptation rate of the sigmoid was ασ = 0.1, and four histogram bins were used (Choe 1995).

Twelve different splits into training and testing data were generated by randomly ordering the dataset and taking the first 2000 inputs for training and the last 992 for testing. On average, each split had 600 inputs in the training set that did not appear in other training sets.

The perceptrons were trained with the on-line version of the delta rule (i.e. adapting weights after each input presentation), with a learning rate αP = 0.2, for up to 500 epochs. These settings were found to be appropriate experimentally. Among the 2000 inputs used for LISSOM training, 1700 were used to train the perceptrons, and the remaining 300 were used as the validation set to determine when to stop training. After a good learning schedule and parameters were found in this way, the whole 2000 patterns were used to train the perceptron again, in order to utilize the small training set as well as possible. The final recognition performance of the total system was measured on the remaining 992 patterns, which neither the maps (LISSOM and SOM) nor the perceptrons had seen during training.

G Calculating Feature Maps

Feature maps, such as orientation, ocular dominance, and direction maps, summarize the preferences of a large set of neurons at once. Each pixel in a feature map plot represents the preferred stimulus of that unit. The feature preferences can be measured using a number of algorithms, but the results from each algorithm are similar, as long as the neurons are strongly selective for that feature. For instance, most map measurement methods result in the same preferred orientation for units that are highly selective for orientation. They often differ slightly for unselective units, where the preference is not as clearly defined. The typical techniques used for measuring maps are first surveyed in this appendix. The details of the weighted average method are then presented, including how it was used to compute each type of feature map plot in this book.

G.1 Preference Map Algorithms

Preference maps can be calculated directly from the weight values of each neuron, or indirectly by presenting a set of input patterns and analyzing the responses of each neuron. Direct methods are more efficient and indirect methods more accurate, as will be described in the following subsections.

G.1.1 Estimating Maps from Weights

Some feature maps can be calculated directly from weight values. For instance, a map of preferred position can be estimated by computing the center of gravity of each neuron’s afferent weights. Due to Hebbian learning, the afferent weights tend to reflect the response properties of the neuron, so the center of gravity is a good measure of what position in the input sheet the neuron prefers.
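The center-of-gravity estimate takes only a few lines; a minimal sketch (the function name and coordinate convention are ours):

```python
import numpy as np

def center_of_gravity(weights):
    """Preferred retinal position estimated as the center of gravity of a
    neuron's afferent weights (Section G.1.1).

    weights: 2-D array of afferent weights over the input sheet.
    Returns (x, y) in array coordinates, i.e. the weight-weighted mean
    position of the receptive field.
    """
    w = np.asarray(weights, dtype=float)
    ys, xs = np.mgrid[0:w.shape[0], 0:w.shape[1]]
    total = w.sum()
    return (xs * w).sum() / total, (ys * w).sum() / total
```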

More generally, preference maps can be computed from a neuron's weights by fitting a parametric function to the afferent weights of each neuron; the parameters of the best fit constitute an estimate of the preferences of that neuron. For instance, an ellipsoidal Gaussian can be numerically optimized to fit the afferent weights of a neuron in a reduced LISSOM network, and the orientation of the resulting Gaussian provides an estimate of the neuron's orientation preference (Sirosh 1995). Unfortunately, it is difficult to ensure that any single parametric function will be a good match to all of the RF types that may be found in a network, particularly with natural images or random noise inputs. Thus, the parametric fitting method is difficult to use with the range of LISSOM simulations presented in this book.

Lateral interactions within the network can affect the feature preferences significantly under certain circumstances, such as during cortical reorganization (Chapter 6). In general, it is not feasible to extend methods based on direct weight analysis to include such interactions. In addition, these methods rely on internal information in the model that would not be available in animal experiments, and thus the results are not directly comparable to animal data. In such cases, a method based on neuronal responses must be used instead.

G.1.2 Discrete Pattern Method

Most map measurement methods, both for animals and for models, involve presenting a series of input patterns with varying parameter values and keeping track of the responses of each neuron. For example, in the discrete pattern method, an orientation map can be computed by presenting stimuli with several different orientations. The estimated orientation preference of the neuron is the orientation of the stimulus that led to the greatest response. Lateral interactions can be taken into account by measuring the responses after the network has settled.

In practice, more than one pattern is needed for each orientation, e.g. at different retinal positions and spatial frequencies, because a neuron will only respond to its preferred orientation if it is at the correct position. That is, even though responses will be collected only for the different values of the map parameter (such as orientation), the other parameters (such as location, spatial frequency, and eye of origin) must be varied to ensure that at least one appropriate pattern has been presented for each map parameter value. For each value of the map parameter, the peak response obtained using any combination of the other parameter values is stored. The map parameter value producing the peak response for any pattern tested is then taken as the preference of this neuron.

Any input pattern capable of eliciting a neural response can be used in this procedure, including oriented Gaussians and sine gratings. Sine gratings are more practical because fewer input patterns are needed to cover the space: they vary in only one spatial dimension (i.e. phase), whereas Gaussians vary over both x and y positions. With sine gratings, the map measurement procedure can also be seen as an approximation of discrete Fourier analysis.

Although the discrete pattern method is effective and allows taking lateral interactions into account, a large number of test patterns is necessary to achieve good resolution. For instance, obtaining an orientation resolution of 1° would require sine gratings with a complete set of phases, typically 24, to be presented at each of 180 orientations. Entire self-organization simulations only require 10,000 input presentations, so calculating just two orientation maps takes nearly as many presentations (180 × 24 × 2 = 8640) as the entire simulation. Thus, in practice the discrete pattern method is either prohibitively expensive or can provide only low-resolution maps.

G.1.3 Weighted Average Method

The feature maps in this book are based on the weighted average (also known as the vector sum¹) method, introduced by Blasdel and Salama (1986). This technique generalizes the discrete pattern method by providing a continuous estimate of preferences between the discrete patterns.

As in the discrete pattern method, inputs that cover the whole range of parameter values (e.g. combinations of orientations, frequencies, and phases) are presented, and for each value of the map parameter, the peak response of the neuron is recorded. The crucial difference is that the preference is not just the map parameter value that led to the peak response, but the weighted average of the peak responses to all map parameter values. For a periodic parameter like orientation, the averaging must be done in the vector domain, so that orientations just above and below zero (e.g. 10° and 170°) average to 0° (instead of the arithmetic mean of 90°). For non-periodic parameters such as ocular dominance, retinotopy, or spatial frequency, the arithmetic weighted average is used instead.

In computing the preferred orientation, for each test orientation φ, other pattern parameters such as spatial frequency and phase are varied systematically, and the peak response η̂φ is recorded. A vector is then formed for each orientation φ with η̂φ as its length and 2φ as its orientation (because orientation is π-periodic, not 2π-periodic), and these vectors are summed together to form vector V = (Vx, Vy):

Vx = Σφ η̂φ cos 2φ,
                                                        (G.1)
Vy = Σφ η̂φ sin 2φ.

The preferred orientation of the neuron, θ, is estimated as half the orientation of V:

θ = ½ atan2(Vy, Vx),                                    (G.2)

where atan2(x, y) is a function that returns tan⁻¹(x/y) with the quadrant of the result chosen based on the signs of both arguments. The magnitude of V can be taken as an estimate of orientation selectivity; its variance can be reduced by dividing by the sum of the component vector magnitudes, resulting in normalized selectivity S:

 

S = √(Vx² + Vy²) / Σφ η̂φ,                              (G.3)

¹ This method should not be confused with the vector sum method for measuring the perceived orientation of cortical responses (Section 7.2.1).


The neuron is highly selective if much of the response is in the direction of the preferred orientation, and unselective if the response is distributed widely across all orientations.

For example, assume that patterns were presented at orientations 0°, 60°, and 120°, and phases 0, π/8, …, 7π/8, for a total of 24 patterns. For a given neuron, assume that the peak responses across all eight phases were 0.1 for 0°, 0.4 for 60°, and 0.8 for 120°. The preferred orientation and selectivity of this neuron are

Vx = 0.1 cos 0 + 0.4 cos(2π/3) + 0.8 cos(4π/3) = −0.50,
                                                        (G.4)
Vy = 0.1 sin 0 + 0.4 sin(2π/3) + 0.8 sin(4π/3) = −0.35,

θ = ½ atan2(Vy, Vx) = 107°,                             (G.5)

S = √(Vx² + Vy²) / (0.1 + 0.4 + 0.8) = 0.47.            (G.6)

 

Thus, this neuron is estimated to prefer an orientation that is intermediate between two test patterns, with a relatively low selectivity because it had a significant response to two of the patterns.
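The worked example in Equations G.4–G.6 can be checked numerically; a minimal sketch (variable names are ours):

```python
import math

# Peak responses at the three test orientations (in degrees).
responses = {0.0: 0.1, 60.0: 0.4, 120.0: 0.8}

# Sum vectors at twice each orientation, since orientation is pi-periodic
# (Equation G.1).
Vx = sum(r * math.cos(2 * math.radians(phi)) for phi, r in responses.items())
Vy = sum(r * math.sin(2 * math.radians(phi)) for phi, r in responses.items())

# Preferred orientation: half the vector angle, mapped into [0, 180)
# (Equation G.2).
theta = math.degrees(math.atan2(Vy, Vx)) / 2 % 180

# Normalized selectivity: vector length over the summed peak responses
# (Equation G.3).
S = math.hypot(Vx, Vy) / sum(responses.values())
```

Running this reproduces the values in the text: Vx ≈ −0.50, Vy ≈ −0.35, θ ≈ 107°, and S ≈ 0.47.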

The weighted average method results in highly accurate, continuously valued estimates while requiring fewer input presentations than the discrete pattern method. For instance, in informal tests using the reduced LISSOM model, maps computed with as few as 24 input presentations (three orientations, each with eight phases) using the weighted average method had higher orientation resolution than those computed with 864 input presentations (36 orientations, each with 24 phases) using the discrete pattern method. The resulting maps were similar, but the weighted average map more accurately represented fine differences in the preferences of nearby units (verified by comparing the afferent weights). The minimum number of patterns needed depends on how broadly the neurons are tuned; in general, the method is effective as long as neurons have a significant response to at least two of the patterns.

Lateral interactions can be included in the weighted average method the same way as in the discrete pattern method, by recording the peak responses after settling. In many cases, however, responses to afferent stimulation alone (Equation 4.5) provide a sufficient approximation. The maps computed in this manner are consistent with those computed from the settled responses; typically, differences are seen only in unselective neurons, for which the preferences are less clearly defined. To save computation time, nearly all maps in this book were computed based on the afferent responses. The plasticity experiments in Chapter 6 are an important exception: With lesions, the dynamic equilibrium between afferent and lateral inputs is disturbed, and it is necessary to observe the actual settled responses.

The same algorithm can be applied to any input feature that can be varied systematically, such as ocularity or direction. In each case, peak responses are collected for each value of the map parameter, and a weighted average is computed to estimate the preferred value. If the parameter is periodic, the average is a vector sum, and the selectivity is based on its magnitude. Otherwise, an arithmetic mean is used for the average, and the selectivity is based on the highest response. Sections G.2–G.5 provide details for how these techniques were applied to measure each of the different types of feature maps.

G.2 Retinotopic Maps

The retinotopic maps in Chapter 6 were computed from the settled response, and therefore included the effect of lateral interactions. The input patterns were single Gaussians that varied in x and y position, each taking the values 0, 6, 12, 18, and 24 for a retina of size R = 36. The preference in each dimension was computed as the arithmetic mean of the positions tested, weighted by the peak response to each position.

Elsewhere in the book the retinotopic maps were computed by finding the center of gravity of the afferent weights. For the self-organized maps in those chapters, this method is roughly equivalent to computing the position preference based on afferent stimulation, but more efficient.

G.3 Orientation Maps

The orientation maps were computed from the settled response in Chapter 6, and from afferent stimulation elsewhere. For most networks, they were measured using the weighted average method based on four orientations, i.e. 0°, 45°, 90°, and 135°, and 18 phases. Similar maps are obtained as long as at least three orientations and at least eight phases are included. For historical reasons, maps in Chapter 10 were based on 36 orientations and those in Chapter 11 on 18 orientations; the maps in Chapter 15 were calculated with the discrete pattern method with 36 orientations.

Because each simulation focused on a single-size LGN RF, the same spatial frequency was used for all test patterns (1.0 units on the 42 × 42 retina in Chapter 11 and 0.76 units on the 36 × 36 retina elsewhere). Orientation selectivity was calculated as in Equation G.3, multiplied by 16 to highlight areas of low selectivity such as fractures and pinwheels.

G.4 Ocular Dominance Maps

For networks that included two eyes, ocular dominance and orientation preference were computed at the same time, both using the weighted average method. The various sine gratings were presented in only one eye at a time, with twice as many test patterns in total. The ocular dominance value was obtained as the weighted average of the peak response to any pattern in the left eye and the peak response to any pattern in the right eye, divided by their sum. Selectivity was computed by dividing the peak response to the dominant eye by the sum of the peak responses for the two eyes.
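The ocular dominance computation described above can be sketched as follows. Assigning 0 to the left eye and 1 to the right (an assumption about the coding; the text does not fix which eye is which), the weighted average reduces to the right-eye share of the total response:

```python
def ocular_dominance(peak_left, peak_right):
    """Ocular dominance and selectivity from the peak responses to sine
    gratings presented in each eye alone (Section G.4, a sketch).

    Returns (od, selectivity): od is the weighted average of eye labels
    0 (left) and 1 (right), i.e. peak_right / (peak_left + peak_right);
    selectivity is the dominant eye's peak over the summed peaks.
    """
    total = peak_left + peak_right
    od = peak_right / total
    selectivity = max(peak_left, peak_right) / total
    return od, selectivity
```

A binocular neuron with equal peaks gives od = 0.5 and the minimum selectivity of 0.5; a strictly monocular neuron gives od of 0 or 1 and selectivity 1.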


G.5 Direction Maps

Direction maps were computed like orientation maps, but using six different directions, 12 phases, and four speeds (ranging from 0.0 to 1.0 retinal pixels per step). For each direction, the sine grating orientation was chosen to be perpendicular to the direction of motion. Because direction is 2π-periodic (unlike orientation), the vectors in the sum represented the actual direction, rather than twice the orientation. The vector sum was computed just as for orientation, but without dividing the result by two. The direction selectivity was calculated as in Equation G.3 and multiplied by 96 to highlight areas of low selectivity such as fractures and pinwheels.

G.6 Orientation Gradients

For any of the feature maps described above, a gradient plot can be calculated. For example, orientation gradient plots (such as those in Figures 5.1b and 5.10b) represent how abruptly the orientation preferences change across a given point in the map. The gradient is high at fractures, and low (and nearly constant) across linear zones.

To measure and visualize the orientation gradient, first the differences Dx,ij and Dy,ij in orientation preference of each unit (i, j) in the map and its preceding neighbor in the x and the y directions were calculated:

Dx,ij = Ωij − Ω(i−1)j,
                                                        (G.7)
Dy,ij = Ωij − Ωi(j−1),

where Ωij is the orientation that unit (i, j) prefers. Negative differences and differences larger than 90° were converted to the equivalent angles within [0°, 90°] (for example, 110° and −70° are both equivalent to a 70° difference). The gradient magnitude Dij is then given by

Dij = √(Dx,ij² + Dy,ij²).                               (G.8)

These values were computed for each unit in the array (except those at the top and the left edge), and together they represent the gradient over the orientation map.
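The gradient computation in Equations G.7 and G.8 can be sketched over a whole map at once. A minimal illustration (the function name is ours; units on the top row and left column, for which no preceding neighbor exists, are set to zero here):

```python
import numpy as np

def orientation_gradient(omega):
    """Orientation gradient magnitude (Equations G.7-G.8).

    omega: 2-D array of preferred orientations in degrees, in [0, 180).
    Returns an array of gradient magnitudes, zero on the top row and
    left column where the backward differences are undefined.
    """
    omega = np.asarray(omega, dtype=float)

    def wrap(d):
        # Map each difference to the equivalent angle in [0, 90].
        m = np.abs(d) % 180.0
        return np.minimum(m, 180.0 - m)

    dx = omega[1:, :] - omega[:-1, :]   # difference with neighbor in x
    dy = omega[:, 1:] - omega[:, :-1]   # difference with neighbor in y
    D = np.zeros_like(omega)
    D[1:, 1:] = np.hypot(wrap(dx)[:, 1:], wrap(dy)[1:, :])
    return D
```

The wrapping step is what keeps a 10°-to-170° transition from registering as a 160° jump: both sides of the orientation circle are treated as 20° apart.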