15.4 GLISSOM Scaling
…corresponding weights of B's original neighbors, i.e. the neurons $B_i$ of the original network that surround B in its two-dimensional neighborhood $h_B$. These neurons are called B's ancestors. In the middle, each neuron has four ancestors; at the corners, each has only one, and along the edges, each has two. Each ancestor has an influence $f$ ranging from 0 to 1 on the computed weights of B, determined by its proximity to B:

$$ f_{BB_i} = 1.0 - \frac{d(B, B_i)}{d_{\max}}, \qquad (15.4) $$
where $d(B, B_i)$ represents the Euclidean distance between B and its ancestor $B_i$ in the two-dimensional area, and $d_{\max}$ is the maximum possible distance between B and any of its ancestors, i.e. the diagonal spacing between the ancestors. The afferent connection strength $w_{AB}$ is then a normalized influence-weighted linear combination of the weights from A to B's ancestors:
$$ w_{AB} = \frac{\sum_{i \in h_B} w_{AB_i}\, f_{BB_i}}{\sum_{i \in h_B} f_{BB_i}}, \qquad (15.5) $$
where $w_{AB_i}$ is the afferent connection weight from receptor A to the $i$th ancestor of B. Because receptive fields are limited in size, not all ancestors receive connections from that receptor; only those that do contribute to the sum.
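As a concrete illustration, here is a minimal NumPy sketch of the interpolation in Equations 15.4 and 15.5. The function name, the dense weight layout `w_old[y, x, a]`, and the proportional placement of the new neurons over the old grid are illustrative assumptions, not the book's implementation; with limited receptive fields, only the ancestors actually connected to receptor A would enter the sums.

```python
import numpy as np

def interpolate_afferent(w_old, scale=2):
    # Sketch of the afferent scaling step (Eqs. 15.4-15.5). Hypothetical
    # dense layout: w_old[y, x, a] is the weight from receptor a to the
    # old-map neuron at (y, x); a real implementation with limited
    # receptive fields would sum only over ancestors connected to a.
    n_old, _, n_aff = w_old.shape
    n_new = n_old * scale
    w_new = np.zeros((n_new, n_new, n_aff))
    step = (n_old - 1) / (n_new - 1)     # new neurons span the old grid
    d_max = np.sqrt(2.0)                 # diagonal spacing between ancestors
    for y in range(n_new):
        for x in range(n_new):
            oy, ox = y * step, x * step  # position of B in old-grid coordinates
            # Ancestors: the old-grid neighbors surrounding B (1, 2, or 4).
            ys = {int(np.floor(oy)), min(int(np.ceil(oy)), n_old - 1)}
            xs = {int(np.floor(ox)), min(int(np.ceil(ox)), n_old - 1)}
            num, den = np.zeros(n_aff), 0.0
            for ay in ys:
                for ax in xs:
                    f = 1.0 - np.hypot(oy - ay, ox - ax) / d_max  # Eq. 15.4
                    num += f * w_old[ay, ax]
                    den += f
            w_new[y, x] = num / den      # Eq. 15.5
    return w_new
```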
The lateral connection strengths from neuron C to neuron B in the scaled map are computed analogously, based on the connection strengths between the ancestors of C and the ancestors of B. The two kinds of lateral connections, excitatory and inhibitory, are computed separately through the same procedure. First, the contribution of C's ancestors to each $B_i$ is calculated as
$$ g_{CB_i} = \frac{\sum_{j \in h_C} w_{C_j B_i}\, f_{CC_j}}{\sum_{j \in h_C} f_{CC_j}}, \qquad (15.6) $$
where $w_{C_j B_i}$ (either excitatory or inhibitory) is the connection weight from the $j$th ancestor of C to the $i$th ancestor of B (if such a connection exists). The new lateral connection strength $w_{CB}$ is then the influence-weighted sum of the contributions from all ancestors of B:
$$ w_{CB} = \frac{\sum_{i \in h_B} g_{CB_i}\, f_{BB_i}}{\sum_{i \in h_B} f_{BB_i}}. \qquad (15.7) $$
Because the neurons in the scaled network have more lateral connections than those in the original map, the new connections are usually pruned immediately during the scaling process: Each new connection is included in the scaled network only if it is larger than the pruning threshold. This procedure ensures that the scaled networks require as little memory as possible.
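A corresponding sketch for the lateral weights (Equations 15.6 and 15.7), including the immediate pruning step just described. Again the names, the dense input matrix, and the sparse dictionary output are illustrative assumptions:

```python
def interpolate_lateral(w_old_lat, ancestors, prune_threshold=1e-4):
    # Sketch of the lateral scaling step (Eqs. 15.6-15.7) with immediate
    # pruning. Hypothetical inputs: w_old_lat[j, i] is the old-map lateral
    # weight from neuron j to neuron i (zero if no connection; excitatory
    # and inhibitory weights handled in separate calls); ancestors[b] lists
    # (old-map index, influence) pairs per Eq. 15.4 for each new neuron b.
    w_new = {}
    for b, anc_b in enumerate(ancestors):        # target neuron B
        for c, anc_c in enumerate(ancestors):    # source neuron C
            total, f_sum = 0.0, 0.0
            for i, f_bi in anc_b:
                num, den = 0.0, 0.0
                for j, f_cj in anc_c:
                    if w_old_lat[j, i] != 0.0:   # connection exists
                        num += w_old_lat[j, i] * f_cj
                        den += f_cj
                g_cbi = num / den if den > 0.0 else 0.0   # Eq. 15.6
                total += g_cbi * f_bi
                f_sum += f_bi
            w_cb = total / f_sum                 # Eq. 15.7
            if w_cb > prune_threshold:           # prune during scaling
                w_new[(c, b)] = w_cb             # sparse storage keeps memory low
    return w_new
```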
[Image: weight plots for a 48 × 48 V1 (top row) and the 96 × 96 V1 scaled from it (bottom row); panels: (a) afferent weights, (b) lateral excitatory weights, (c) lateral inhibitory weights, (d) V1 orientation map.]
Fig. 15.6. Scaling cortical density in GLISSOM. In a single GLISSOM cortical density scaling step, a 48 × 48 V1 (top row) is expanded into a 96 × 96 V1 (bottom row) at iteration 10,000 out of a total of 20,000. Smaller scaling steps are usually more effective, but the large step makes the changes more obvious. A set of weights for one neuron in each network is shown in (a–c). At this point in training, the afferent and lateral connection profiles are still only weakly oriented, and lateral connections have not been pruned extensively (the jagged black outline in (c) shows the current connectivity). The orientation map for each network is shown in (d), with the inhibitory weights of the sample neuron overlaid in white outline. The orientation map measured from the scaled map is identical to that of the 48 × 48 network, except that it has twice the resolution. This network can then self-organize at the new density to represent finer detail.
Figure 15.6 shows an example scaling step for a partially organized orientation map and the weights of one neuron in it (to see the correspondence more clearly, the lateral connections were not pruned in this example). The larger map replicates the structures of the smaller one, and can be self-organized to represent further detail.
15.4.2 Method
The GLISSOM simulations were based on the same reduced LISSOM orientation model as the simulations in Section 15.2, with a 24 × 24 retina. The parameters of this model were adjusted with the density and area scaling equations to get the specific model for each of the comparisons.
Each GLISSOM simulation started with low cortical density. The scaling method was then used to increase the density gradually as the network self-organized. At the same time, the other parameters were adjusted according to the scaling equations to
make sure the map stayed functionally the same. Similar scaling could be used to increase the retinal density during self-organization, but because retinal processing does not affect the computation and memory usage much, retinal density was not adapted in the simulations.
The precise scaling schedule is not crucial as long as the initial map is large enough that the largest features of the final map can be represented approximately in the initial map. A linear increase from the initial size $N_o$ to the final size $N_f$ usually works well. If scaling is faster than linear, i.e. the simulation scales up quickly to large maps that are then self-organized for a long time, the final maps will be more refined but the simulation takes more time; conversely, slower-than-linear scaling results in a faster simulation but less accurate maps. If the scaling steps are very large, the final map may have more distortions. On the other hand, many small steps incur the significant overhead of having to organize the initial weights many times. Therefore, the most effective schedule usually consists of a few medium-size steps.
The network size N at each scaling step was computed as

$$ N = N_o + m\,(N_f - N_o), \qquad (15.8) $$

where m is a factor that increases approximately linearly over the simulation. This equation allows specifying the scaling steps uniformly across experiments even with different network sizes. Unless stated otherwise, the simulations consisted of four steps, with m = 0.20 at iteration 4000, 0.47 at 6500, 0.67 at 12,000, and 1.0 at 16,000. The rest of the simulation details are described in Appendix B.3.
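For concreteness, a short sketch of this schedule with $N_o$ = 36 and $N_f$ = 144, the values used in the comparisons below (rounding to the nearest integer is an assumption):

```python
# Eq. 15.8 with the step values used in these simulations.
N_o, N_f = 36, 144
schedule = [(4000, 0.20), (6500, 0.47), (12000, 0.67), (16000, 1.00)]
for iteration, m in schedule:
    N = round(N_o + m * (N_f - N_o))
    print(f"iteration {iteration:>5}: scale map to {N} x {N}")
# iteration  4000: scale map to 58 x 58
# iteration  6500: scale map to 87 x 87
# iteration 12000: scale map to 108 x 108
# iteration 16000: scale map to 144 x 144
```

The resulting intermediate sizes, 58 × 58 and 87 × 87, match the GLISSOM snapshots in Figure 15.7.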
The GLISSOM maps formed in this manner were compared with LISSOM maps that were organized directly at the final size. The self-organization processes and the final maps are compared next.
15.4.3 Comparing LISSOM and GLISSOM Maps
The first result is that GLISSOM develops an orientation map through a process similar to that of full-size LISSOM (Figure 15.7). Both networks pass through similar stages of intermediate order, while the GLISSOM map size gradually approaches that of the LISSOM map.
Second, as long as the initial GLISSOM map is sufficiently large to represent the global organization, GLISSOM results in an orientation preference map and weight patterns that are qualitatively and quantitatively equivalent to those of LISSOM (Figures 15.8 and 15.9).
Third, GLISSOM significantly reduces the overall computation time and memory usage (Figure 15.10). For example, for a final map with N = 144, LISSOM takes 5.1 hours for 20,000 training iterations, whereas GLISSOM finishes in 1.6 hours, yielding a speed-up ratio of 3.1. For the same simulation, LISSOM requires 317 MB of memory to store its connections, while GLISSOM requires only 60 MB, resulting in a memory savings ratio of 5.2. Importantly, the speed-up and memory savings increase with larger networks, which means that GLISSOM can make simulation of very large networks practical.
[Image: orientation map snapshots. Top row, LISSOM: 144 × 144 at every stage. Bottom row, GLISSOM: 36 × 36, 36 × 36, 58 × 58, 87 × 87, and 144 × 144, at (a) iteration 0, (b) iteration 1000, (c) iteration 5000, (d) iteration 10,000, and (e) iteration 20,000.]
Fig. 15.7. Self-organization of LISSOM and GLISSOM orientation maps. The GLISSOM map is gradually scaled so that by the final iteration it has the same size as LISSOM. To make the scaling steps more obvious, this example is based on the smallest acceptable initial network; Figure 15.9 shows that results match even more closely for larger initial networks. At each iteration, the features that emerge in the GLISSOM map are similar to those of LISSOM except for discretization differences. An animated demo of these self-organization examples can be seen at http://computationalmaps.org.
[Plot: RMS weight differences (0–0.05) between the final maps, as a function of the initial GLISSOM network size $N_o$ (0–144).]
Fig. 15.8. Accuracy of the final GLISSOM map as a function of the initial network size. Each point shows the RMS difference between the final values of the corresponding weights of each neuron in two networks: a 144 × 144 LISSOM map, and a GLISSOM network with an initial size shown on the x-axis and a final size of 144 × 144. Both maps were trained on the same stream of oriented inputs. The GLISSOM maps with initial sizes up to N = 96 were based on four scaling steps, whereas the three larger starting points included fewer steps: N = 114 had one step at iteration 6500, N = 132 had one step at iteration 1000, and there were no scaling steps for N = 144. Low values of RMS difference indicate that the corresponding neurons in each map developed very similar weight patterns. The RMS difference drops quickly as larger initial networks are employed, becoming negligible above 36 × 36. As was described in Section 15.2.3, this lower bound is determined by $r_{Ef}$, the minimum size of the excitatory radius.
[Image: final orientation maps. (a) GLISSOM $N_o$ = 36: 1.63 hours, 61 MB; (b) GLISSOM $N_o$ = 54: 1.94 hours, 60 MB; (c) GLISSOM $N_o$ = 72: 2.29 hours, 69 MB; (d) LISSOM, fixed N = 144: 5.13 hours, 317 MB.]
Fig. 15.9. Orientation maps in LISSOM and GLISSOM. Above the minimum 36 × 36 initial network size, the final GLISSOM maps closely match those of LISSOM, yet take much less time and memory to simulate. Computation time increases smoothly as larger initial networks are used, allowing a tradeoff between accuracy and time. However, accurate maps are obtained substantially faster than with LISSOM. As long as the initial networks are small compared with the final maps, memory usage is bounded by the size of the final maps.
These results validate the hypothesis that a coarse approximation suffices for the early iterations in LISSOM. Early in training, only the large-scale organization of the map is important; using a smaller map for this stage does not significantly affect the final results. Once the large-scale structure settles, individual neurons become more selective and differentiate from their local neighbors; a denser map is required so that this detailed structure can develop. Thus, GLISSOM uses an appropriate map size for each stage in self-organization, in order to model development faithfully while saving simulation time and memory.
15.5 Scaling to Cortical Dimensions
The maps studied so far in this book represent only a small region of V1 and have a limited connectivity range. The results presented in this chapter make it possible to obtain a rough estimate of the resource requirements needed to approximate the full density, area, and connectivity of the visual cortex of a particular species. As discussed in the next section, with such a simulation it will be possible to study phenomena that require the entire visual field or the full cortical column density and connectivity. Calculating the full-scale parameter values is also useful because it can help tie the parameters of a small model to physical measurements. For instance, once the relevant scaling factors are calculated, the connection lengths, receptive field sizes, retinal area, and cortical area to be used in a model can all be derived from measurements in a biological preparation. Conversely, where such measurements are not available, GLISSOM parameter values that result in realistic behavior constitute predictions for future experiments.
In this section, the resource requirements and key LISSOM parameters are computed that make it possible to simulate the full human primary visual cortex at the column level.
[Plots: (a) simulation time in hours and (b) peak memory usage in millions of connections, for LISSOM and GLISSOM as a function of $N_f$ (36–144); (c) speed-up and (d) memory savings, as LISSOM/GLISSOM ratios over the same range.]
Fig. 15.10. Simulation time and memory usage in LISSOM vs. GLISSOM. Computational requirements of the two methods are shown as a function of the network size. In the LISSOM simulations the network had a fixed size $N_f$, as indicated on the x-axis; in GLISSOM the initial size was $N_o$ = 36 and the final size $N_f$ as indicated on the x-axis. Each point represents one simulation; the variance between multiple runs was negligible (less than 1% even with different input sequences and initial weights). (a) Simulation time includes training and all other computations such as plotting, orientation map measurement, and GLISSOM's scaling steps. The simulation times for LISSOM increase dramatically with larger networks, because larger networks have many more connections to process. In contrast, because GLISSOM includes fewer connections during most of the self-organizing process, its computation time increases only modestly for the same range of $N_f$. (b) Memory usage consists of the peak number of network connections required for the simulation; this peak determines the minimum physical memory needed when using an efficient sparse format for storing weights. LISSOM's memory usage increases very quickly as $N_f$ is increased, whereas GLISSOM is able to keep the peak number low; much larger networks can be simulated on a given machine with GLISSOM than with LISSOM. (c,d) With larger final networks, GLISSOM results in greater speed-up and memory savings, measured as the ratio between LISSOM and GLISSOM simulation time and memory usage.
These calculations apply to models with one computational unit per cortical column, and at most one long-range connection between units.
Most of the simulations in the preceding sections include a map whose global features match approximately a 5 mm × 5 mm = 25 mm² patch of macaque V1 (e.g. compare Figure 15.9 with Figure 9.4a). The full area of human V1 has been estimated at 2400 mm² (Wandell 1995), and so a full-size simulation would need to have an area about 100 times as large as the current simulations.
The full density of a column-level model of V1 can also be calculated. The total number of neurons in human V1 has been estimated at 1.5 × 10^8 (Wandell 1995). Each cortical unit in LISSOM represents one vertical column, and the number of neurons per vertical column in primate V1 has been estimated at 259 (Rockel, Hiorns, and Powell 1980). Thus, a full-density, full-area, column-level simulation of V1 would require about 1.5 × 10^8/259 ≈ 580,000 column units in total, which corresponds to LISSOM parameter N = √580,000 ≈ 761.
More important than the number of units is the number of long-range lateral connections, because they determine the simulation time and memory requirements. Lateral connections in V1 can be as long as 8 mm (Gilbert et al. 1990), but the connections in the LISSOM models so far have been shorter in order to make them practical to simulate. For instance, scaling the parameters used in the previous sections to the full density would result in an inhibitory radius $r_I$ = 18, but matching the full 8 mm connection length at full density would require $r_I$ = 8 × 761/√2400 ≈ 124. This larger radius requires about 45 times as much memory as $r_I$ = 18, because memory usage increases with the area enclosed by $r_I$. In the current LISSOM implementation, all of these possible connections must be stored in memory, so supporting such long connections would need enough memory for 761² × (2 × 124 + 1)² ≈ 4 × 10^10 connections in V1. Thus, simulating the entire V1 at full density would require about 4 × 4 × 10^10/2^30 ≈ 150 gigabytes of RAM (assuming 4 bytes per connection). Such simulations are currently possible only on large supercomputers.
In contrast, because all possible final connections do not need to be included in the initial network, GLISSOM can make use of a sparse lateral connection storage format that takes much less memory, and correspondingly less computation time. The memory required depends on the number of connections that remain active after self-organization, which in current GLISSOM simulations is about 15%. As the radius $r_I$ increases, this percentage decreases quadratically, because long-range connections extend only along the preferred orientation of the neuron and not in all directions (Bosking et al. 1997; Sincich and Blasdel 2001). Thus, for the full-scale simulation, about 15% × 18²/124² ≈ 0.3% of the connections would have to be included. Under these assumptions, the memory requirement reduces to approximately 0.003 × 150 × 1024 ≈ 460 MB. Thus, with GLISSOM it is possible to simulate the entire V1 at the
level of laterally connected cortical columns on existing desktop workstations.
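These estimates are straightforward to reproduce; the sketch below recomputes them (the exact printed values differ slightly from the rounded figures quoted above):

```python
# Back-of-the-envelope V1 resource estimates (4 bytes per connection).
neurons_v1  = 1.5e8                           # human V1 neurons (Wandell 1995)
per_column  = 259                             # neurons per column (Rockel et al. 1980)
N           = round((neurons_v1 / per_column) ** 0.5)  # ~761 units per side
r_I         = round(8 * N / 2400 ** 0.5)      # ~124: radius for 8 mm connections
dense       = N**2 * (2 * r_I + 1) ** 2       # ~3.6e10 lateral connections
dense_gb    = dense * 4 / 2**30               # ~134 GB (~150 GB with rounding)
sparse_frac = 0.15 * 18**2 / r_I**2           # ~0.3% survive pruning
sparse_mb   = sparse_frac * dense_gb * 1024   # ~430 MB (~460 MB with rounding)
print(f"N={N}, r_I={r_I}, dense={dense_gb:.0f} GB, sparse={sparse_mb:.0f} MB")
```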
15.6 Discussion
The parameter scaling experiments in Section 15.2 showed that the LISSOM scaling approach is valid over a wide range of spatial scales. The GLISSOM experiments in Section 15.4 in turn showed that the equations can be used to reduce simulation time and memory requirements significantly and thereby make the study of large-scale phenomena tractable. The method is not specific to LISSOM; it should apply to most other models with specific intracortical connectivity, and it can be adapted to those with more abstract connectivity, such as a DoG interaction function. The growth process of GLISSOM should provide similar performance and memory benefits to most other densely connected models whose peak number of connections occurs early in training. Essentially, the GLISSOM method allows a fixed model to be turned into one that grows in place, by using scaling equations and an interpolation algorithm.
On the other hand, models that do not shrink an excitatory radius during self-organization, and therefore do not have a temporary period with widespread activation, benefit less from GLISSOM. For such models, it may be worthwhile to consider a related approach, whereby only the lateral connection density is gradually increased, instead of increasing the total number of neurons in the cortex. Such an approach would still keep the number of connections (and therefore the computational and memory requirements) low, while keeping the large-scale map features (such as the distance between orientation patches) constant over the course of self-organization.
The GLISSOM method is most effective when it can be initiated with very small maps. However, as was discussed in Section 15.2.3, the self-organizing process requires that the neighborhood radii are at least 1.0, even though the sampling limits imposed by the Nyquist theorem would allow smaller maps. One way to get around this limitation would be to approximate smaller radii with a technique similar to antialiasing in computer graphics. Before a weight value is used in Equation 4.7 at each iteration, it would be scaled by the proportion of its corresponding pixel’s area that is included in the radius. Because the mask would only apply to small radii, the added computational overhead would not be large. This technique should permit smaller networks to be simulated faithfully even with a discrete grid.
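A minimal sketch of such a coverage mask, assuming a supersampling estimator (the estimator, names, and parameters here are illustrative assumptions, not the proposed implementation itself):

```python
import numpy as np

def coverage_mask(radius, size, subsamples=8):
    # For each grid cell, estimate the fraction of its area that falls inside
    # a circle of the given radius centered on the middle cell, by averaging
    # over subsamples x subsamples points per cell.  Weights would be scaled
    # by this fraction before entering the activation sum, approximating
    # radii smaller than the grid spacing (antialiasing).
    center = size // 2
    mask = np.zeros((size, size))
    offsets = (np.arange(subsamples) + 0.5) / subsamples - 0.5
    for y in range(size):
        for x in range(size):
            yy = y + offsets[:, None] - center
            xx = x + offsets[None, :] - center
            mask[y, x] = np.mean(yy**2 + xx**2 <= radius**2)
    return mask

# Example: an effective radius of 0.75 on a 3x3 neighborhood; the center cell
# gets full weight, its neighbors fractional weight, the corners almost none.
print(coverage_mask(0.75, 3))
```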
Apart from their application to simulations, the parameter scaling equations provide insight into how structures in the visual cortex differ between individuals, between species, and during development. In essence, the equations predict how the biophysical correlates of the parameters differ between any two similar cortical regions that differ in size. The discrepancy between the actual parameter values and those predicted by the scaling equations can help explain why different brain regions, individuals, and species have different functions and performance levels.
For instance, Equation 15.3 and the simulation results suggest that learning rates per connection should scale with the total number of connections per neuron. Otherwise, neurons in a more densely connected brain area would have significantly more plasticity, which (to our knowledge) has not been demonstrated. Consequently, unless the number of synapses per neuron is constant, the learning rate must be regulated at the level of the whole neuron rather than being a property of individual synapses. This principle conflicts with assumptions implicit in most incremental Hebbian models, which specify learning rates for individual connections directly. Future experimental work will be needed to determine whether such whole-neuron regulation of plasticity does occur, and if not, whether more densely connected regions are also more plastic.
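One concrete reading of this principle, as an illustrative sketch (names and values are assumptions): the per-connection rate is derived from a whole-neuron plasticity budget, so each synapse of a more densely connected neuron adapts proportionally more slowly.

```python
def per_connection_rate(alpha_neuron, n_connections):
    # Whole-neuron regulation: a fixed plasticity budget per neuron is
    # divided among its connections, so the total weight change per
    # iteration is independent of connection count.
    return alpha_neuron / n_connections

print(per_connection_rate(0.1, 100))  # 0.001
print(per_connection_rate(0.1, 400))  # 0.00025: 4x denser, 4x slower per synapse
```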
Similarly, Equation 15.3 suggests that pruning is not based on an arbitrary fixed threshold, but depends on the total number of connections to a neuron. In the model, this behavior results from the divisive weight normalization, which ensures that increasing the number of connections makes each one weaker (as was discussed in Section 3.3, such normalization is consistent with recent biological results on neuronal regulation). If the pruning threshold were not normalized by the number of inputs, a fixed value that prunes e.g. 1% of the connections for a small cortex would prune all of the connections for a larger cortex. These findings provide independent computational and theoretical support for earlier experimental evidence that pruning is a competitive process, and not one based on a fixed threshold (Purves 1988).
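A sketch of such a normalized threshold (illustrative; with divisive normalization the weights of a neuron sum to a constant, so the mean weight scales as 1/n):

```python
def prune_normalized(weights, relative_threshold=0.1):
    # Scale the cutoff by 1/n: a fixed absolute cutoff that prunes a few
    # percent of a small neuron's connections would prune all connections
    # of a densely connected neuron, whose normalized weights are weaker.
    cutoff = relative_threshold / len(weights)
    return [w for w in weights if w >= cutoff]
```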
The scaling equations are also an effective tool for making cross-species comparisons, particularly between species with different brain sizes. In effect, the equations specify the parameter values that a network should implement if it is to have similar behavior to a network of a different size. However, as pointed out by Kaas (2000), different species do not usually scale faithfully, probably due to geometrical, metabolic, and other restrictions. As a result, as V1 size increases, the lateral connection radii do not increase as specified in the cortical density scaling equations, and processing becomes more and more local. Kaas (2000) proposed that such limitations on connection length may explain why larger brains, such as human and macaque, are composed of so many visual areas, instead of just expanding the area of V1 to achieve the same functionality (see also Catania et al. 1999). The scaling equations in LISSOM provide a concrete platform on which to measure the tradeoffs between a small number of large visual areas and a large number of small, hierarchically connected visual areas.
15.7 Conclusion
The scaling equations and the GLISSOM method allow detailed laterally connected cortical models like LISSOM to be applied to much more complex, large-scale phenomena. Using GLISSOM, it should be possible to model all of V1 at the column level with desktop workstations. These methods also provide insight into how the cortical structures compare in brains that differ widely in size. Thus, the scaling equations and GLISSOM can help explain brain scaling in nature as well as provide a method for scaling up computational simulations of the brain.
16 Discussion: Biological Assumptions and Predictions
The experiments presented in this book provide computational support for the hypotheses presented in Chapter 1 about cortical structure, development, and function. To be well founded, a computational model should, first, make only the assumptions that are necessary, and those assumptions should be compatible with biological evidence. Second, the model should suggest a realistic set of biological and psychological experiments that can verify or refute it. In this chapter, the assumptions underlying the self-organization, genetically driven development, and temporal coding in the LISSOM model are evaluated, and predictions are made based on the simulations. The next chapter focuses on computation, reviewing important new directions for future work.
16.1 Self-Organization
Many of the fundamental assumptions of the LISSOM model, such as the computation of the input activity as a weighted sum, the sigmoidal activation function, and Hebbian weight adaptation with normalization, are common to most neural network models. As was discussed in Chapter 3, their computational and biological validity has been examined in detail by other researchers. However, there are steps in the LISSOM self-organizing process that make it more complex than the usual abstract model of self-organizing maps. These are: (1) recurrent lateral interactions,
(2) adapting lateral connections, and (3) independent multiplicative normalization for each connection type. The self-organizing process in LISSOM is also based on a number of assumptions that were made out of computational necessity and have not yet been fully characterized experimentally. Those are: (4) short-range excitation and long-range inhibition, (5) connection death, and (6) parameter adaptation.
In this section, these assumptions will be evaluated based on how biologically valid and crucial they are for the self-organization phenomena discussed in this book. Assumptions necessary for genetically driven development and for functional effects such as grouping will be discussed in later sections.
