11 PGLISSOM: A Perceptual Grouping Model
firing-rate units representing cortical columns. As we saw in Section 2.4.1, superposition catastrophe will occur when activities of firing-rate units representing separate objects are combined. In order to study perceptual grouping in V1, firing-rate units must be replaced with spiking units.
A self-organizing map with spiking units can be interpreted to model cortex in two ways: (1) The spiking units can stand for individual, representative neurons in cortical columns, or (2) they can be interpreted as subgroups of neurons with oscillatory total activity. Current neuroscience data are not specific enough to favor either interpretation; however, it is easier to describe the model and its behavior in terms of neurons, so the first alternative is adopted in Part IV.
A map of spiking neurons can self-organize like a firing-rate neuron map, and it can segment simple objects (such as relatively small squares) through synchronization and desynchronization of spiking events (Choe and Miikkulainen 1997, 1998; Ruf and Schmitt 1998). The long-range inhibitory lateral interactions play a crucial role in both behaviors: They establish competition that drives self-organization, and they establish desynchronization that drives segmentation. It is not necessary to include long-range excitatory connections to achieve these behaviors.
However, to achieve simultaneous binding and segmentation of more complex objects such as long contours, long-range excitatory lateral connections need to be included in the model. Such specific excitation is necessary to overcome the segmentation that would otherwise result from the lateral inhibition, and the excitatory connections need to be long enough to bind together several contour elements. Simply making the short-range excitatory lateral connections longer in a spiking LISSOM map does not work. The neurons then respond to almost any input, and a map where the afferent receptive fields of all neurons look almost the same results. The two functions of perceptual grouping and self-organization therefore have conflicting requirements: One depends on having long and the other short excitatory lateral connections.
In order to account for perceptual grouping in a self-organizing visual cortex, the PGLISSOM model was developed. PGLISSOM includes both short-range and long-range excitatory lateral connections between spiking neurons. To prevent the long-range excitatory connections from interfering with the self-organizing process, PGLISSOM includes two layers (or maps). In the first map (SMAP), excitatory lateral connections are short range to allow self-organization to take place; in the second map (GMAP), they have a long range and implement perceptual grouping (Figure 11.1). The neurons in the corresponding locations in the two maps are linked with excitatory connections in both directions, allowing GMAP to self-organize properly, driven by the self-organization of SMAP.
In summary, PGLISSOM extends LISSOM in two important ways: Spiking neurons are included to represent grouping through temporal coding, and long-range excitatory lateral connections are added to coordinate synchronization for temporal coding. The model is motivated computationally, but the design is also a good match with the layered architecture found in the visual cortex, as will be discussed in Section 16.3.3. In the following sections, the components of the PGLISSOM model are described in detail.
[Figure 11.1 diagram: V1, consisting of GMAP above SMAP, receiving input from the Retina through the ON channel]
Fig. 11.1. Architecture of the PGLISSOM model. The cortical network consists of two layers (or maps). The lower map (SMAP) has short-range lateral excitation (dotted square) and long-range lateral inhibition (dashed square), and drives the self-organization of the model. In the upper map (GMAP), both excitation and inhibition have very long range, establishing perceptual grouping and segmentation. The two maps both receive afferent input directly from a model retina, representing the ON channel like the reduced LISSOM model (Figure 6.3). The neurons in the vertically corresponding locations on the two maps are connected via excitatory intracolumnar connections in both directions, tying such neurons together into a functional unit (i.e. a cortical column). All neurons are spiking neurons (Figure 11.2); their firing rate is visualized in gray-scale coding from white to black (low to high).
11.2 The Self-Organization and Grouping Architecture
The overall organization of the PGLISSOM model is shown in Figure 11.1. The model consists of two layers (or maps), one overlaid (or stacked) on top of the other. Both maps are based on the LISSOM model, but the extent of lateral connections in the two maps differs. To control input conditions better and to reduce computational cost (Section 11.4), the model will be trained with schematic images instead of natural images. The ON/OFF channels of the LGN can therefore be bypassed (as was discussed in Section 6.2): Both maps receive afferent input only through the ON channel, reduced into direct connections from the retinal receptors. The high-level areas and the subcortical areas of HLISSOM are not relevant for the grouping study either, and were omitted.
The lower map, SMAP, is similar to the LISSOM and HLISSOM cortical networks. Short-range excitatory connections establish a local neighborhood that enforces local correlation, and longer inhibitory connections establish competition that
decorrelates more distant activities. Through these connections, SMAP drives self-organization in the model. In the upper map, GMAP, both the excitatory and inhibitory connections have long range. The excitatory connections form the basis for perceptual grouping: Through the self-organizing process, they learn to encode correlations in the input distribution, and the strength of the connections controls the degree of synchronization across the neurons. The inhibitory connections are broad and have a long range, causing two or more synchronized populations of neurons to desynchronize, thus establishing background inhibition for segmentation.
Neurons in the vertically corresponding locations in the two maps form a functional unit (cortical column), and they are connected to each other in both directions through excitatory intracolumnar connections. These connections influence the activity on the opposite map so that both self-organization and grouping behaviors are shared by both maps. This two-component model of the cortical column is called the SG model, corresponding to the SMAP and GMAP the columns form. The neurons are all spiking neurons, as will be described next.
11.3 Spiking Unit Model
A schematic diagram of the spiking neuron is shown in Figure 11.2. The model is based on the integrate-and-fire model discussed in Section 3.1.3 (Eckhorn et al. 1990; Gerstner 1998b; Reitboeck et al. 1993). Each neuron has three components: leaky synapses, weighted summation, and a spike generator. The synapses continuously calculate the decayed sum of incoming spikes over time. Four different kinds of input connections contribute to the weighted sum: afferent, excitatory lateral, inhibitory lateral, and intracolumnar connections (Figure 11.1). The activations of the different kinds of inputs are summed and compared with the dynamic threshold in the spike generator. Details of each component will be discussed below.
11.3.1 Leaky Synapse
Each connection in the model is a leaky integrator (Section 3.1.3), modeling the exponential decay of postsynaptic potential (PSP) in biological neurons. At each connection, an exponentially decayed sum of incoming spikes is maintained:
˜ι(t) = Σ_{k=0}^{t} R(t − k) e^{−λk},   (11.1)
where ˜ι(t) is the current decayed sum at time step t, R(t − k) is the spike (either 0 or 1) received k time steps in the past, and λ is the decay rate. Different types of connections have separate decay rates: afferent connections (λA), excitatory lateral connections (λE), inhibitory lateral connections (λI), and intracolumnar connections (λC). The most recent input has the most influence on the activity, but past inputs also have some effect. As was discussed in Section 3.1.3, this sum can be defined recursively as
Fig. 11.2. The leaky integrator neuron model. Leaky integrators at each synapse perform decayed summation of incoming spikes, and the outgoing spikes are generated by comparing the weighted sum with the dynamic spiking threshold. Four types of inputs contribute to the activity: afferent, excitatory lateral, inhibitory lateral, and intracolumnar connections (Equation 11.3). The dynamic threshold consists of the base threshold θb, the absolute refractory contribution θa, and the relative refractory contribution θr (Equation 11.5). The base threshold is a fixed baseline value, and the absolute refractory term has a value of ∞ for a short time period immediately following an output spike. The relative refractory contribution is increased as output spikes are generated, and it decays to zero if the neuron stays silent.
˜ι(t) = R(t) + ˜ι(t − 1) e^{−λ}.   (11.2)
Such a formulation allows implementing the model efficiently, since the past spike values R(t − k), k > 0, do not need to be stored, nor do they have to be decayed repeatedly.
By adjusting the decay rate λ, the synapse can function as either a coincidence detector or as a temporal integrator. When the synaptic decay rate is high, the neuron can only activate when there is a sufficient number of inputs coming in from many synapses simultaneously. On the other hand, when the decay rate is low, the neuron accumulates the input. Thus, presynaptic neurons can have a lingering influence on the postsynaptic neuron. As will be shown in Section 12.2.1, by varying the decay rates for different types of connections, the relative time scales of the different connection types can be controlled, and desired synchronization behavior obtained.
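As an illustration, the recursive leaky integrator of Equation 11.2 and the effect of the decay rate can be sketched in Python. The class name, parameter values, and spike train below are illustrative choices, not from the original implementation:

```python
import math

class LeakySynapse:
    """Leaky integrator (Equation 11.2): trace(t) = R(t) + trace(t-1) * e^(-lambda).

    A sketch of the synapse model; names and values are illustrative.
    """

    def __init__(self, decay_rate):
        self.decay = math.exp(-decay_rate)  # per-step decay factor e^(-lambda)
        self.trace = 0.0                    # current decayed sum

    def step(self, spike):
        """Update the trace with the incoming spike R(t), either 0 or 1."""
        self.trace = spike + self.trace * self.decay
        return self.trace

# High decay rate -> coincidence detector: the trace dies out between spikes.
fast = LeakySynapse(decay_rate=5.0)
# Low decay rate -> temporal integrator: past spikes keep contributing.
slow = LeakySynapse(decay_rate=0.1)

for t in range(10):
    spike = 1 if t % 3 == 0 else 0   # a sparse spike train
    fast.step(spike)
    slow.step(spike)

print(fast.trace < slow.trace)  # → True: the integrator accumulates more
```

With the same sparse input, the fast-decaying synapse ends near the value of a single spike, while the slow one accumulates the history, which is why a high decay rate makes the neuron sensitive only to coincident input.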
11.3.2 Activation
The neuron receives incoming spikes through the afferent, lateral excitatory and inhibitory, and intracolumnar input connections, and the decayed sums of the synapses are calculated according to Equation 11.2. These sums are then multiplied by the connection weights and summed to obtain the input activity (Figure 11.2). More specifically, the input activation vij (t) to the spike generator of the cortical neuron at location (i, j) at time t consists of (1) the input from a fixed-size receptive field in the retina, centered at the location corresponding to the neuron’s location in the cortical network (i.e. afferent input), (2) excitation and (3) inhibition from neurons around it in the same map, and (4) input from neurons in the same column in the other map (i.e. intracolumnar input):
vij(t) = σ( γA Σ_{xy} χ̃xy(t − 1) Axy,ij + γE Σ_{kl} η̃kl(t − 1) Ekl,ij − γI Σ_{kl} η̃kl(t − 1) Ikl,ij + γC Σ_{cd} ζ̃cd(t − 1) Ccd,ij ),   (11.3)
where χ̃xy(t − 1) is the decayed sum of spikes from retinal receptor (x, y) during the previous time step, Axy,ij is the afferent connection weight from that receptor to cortical neuron (i, j), η̃kl(t − 1) is the decayed sum of spikes from the map neuron (k, l), Ekl,ij is the excitatory and Ikl,ij the inhibitory lateral connection weight from that neuron, ζ̃cd(t − 1) is the decayed sum of spikes from neuron (c, d) in the other map in the same vertical column as (i, j), Ccd,ij is the intracolumnar connection weight from that neuron, and γA, γE, γI, and γC are constant scaling factors. The function σ(·) is the piecewise linear approximation of the sigmoid (Equation 4.4), used to keep the input activity to the spike generator between 0.0 and 1.0.
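A minimal sketch of this weighted summation for a single neuron follows. The function name and the sigmoid thresholds (δ, β) are assumptions for illustration; the book specifies the actual parameter values elsewhere:

```python
import numpy as np

def input_activation(chi, A, eta, E, I, zeta, C, gA, gE, gI, gC):
    """Input activity v_ij(t) of one neuron (Equation 11.3, sketch).

    chi, eta, zeta: decayed spike sums of afferent, lateral, and
    intracolumnar presynaptic neurons from the previous time step;
    A, E, I, C: the corresponding weight vectors.
    """
    v = (gA * (chi @ A)      # afferent drive
         + gE * (eta @ E)    # lateral excitation
         - gI * (eta @ I)    # lateral inhibition
         + gC * (zeta @ C))  # intracolumnar input from the other map
    # Piecewise linear sigmoid keeping v in [0, 1] (Equation 4.4),
    # with assumed lower/upper thresholds delta and beta:
    delta, beta = 0.1, 0.65
    return float(np.clip((v - delta) / (beta - delta), 0.0, 1.0))

# Afferent drive only: 4 receptors at decayed sum 0.5, weights 0.2.
v = input_activation(
    chi=np.full(4, 0.5), A=np.full(4, 0.2),
    eta=np.zeros(2), E=np.zeros(2), I=np.zeros(2),
    zeta=np.zeros(2), C=np.zeros(2),
    gA=1.0, gE=0.9, gI=0.9, gC=0.5)
print(round(v, 3))  # → 0.545
```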
The activation vij (t) is then passed on to the spike generator, where comparison with the dynamic threshold is made. A spike is fired if the input activity exceeds the
threshold:

S(t) = 1 if vij(t) > θ(t), and 0 otherwise,   (11.4)
where S(t) represents the spiking output of the neuron over time. The dynamic threshold θ(t) determines how much activation is necessary to generate a spike. It depends on how much time has passed since the last firing event, as will be described next.
11.3.3 Threshold Mechanism
For a short period of time immediately after they have spiked, biological neurons cannot generate another spike. This short interval is called the refractory period and consists of two parts: (1) During the absolute refractory period, the neurons cannot fire no matter how large the input is, and (2) during the relative refractory period, neurons can only spike if they receive very strong excitatory input.
The dynamic threshold in the PGLISSOM spike generator implements a refractory period by providing a base threshold and raising the threshold dynamically, depending on the neuron’s spike activity. The spike generator compares the input activity to the dynamic threshold and decides whether to fire (Figure 11.2). The threshold θ(t) is a sum of three terms:
θ(t) = θb + θa(t) + γθ θr(t),   (11.5)
where θb is the base threshold, θa(t) implements the absolute refractory period during which the neuron cannot fire, θr(t) implements the relative refractory period during which firing is possible but requires extensive input, and γθ is a scaling constant. More specifically, θa(t) = ∞ if the last spike was generated less than tr time steps ago, otherwise θa(t) = 0. The relative refractory component θr(t) is implemented as an exponentially decayed sum of the output spikes (Figure 11.2), i.e. a leaky integrator similar to the leaky synapses (Equation 11.2):
θr(t) = S(t) + θr(t − 1) e^{−λθ},   (11.6)
where λθ is the decay rate.
The usual integrate-and-fire model includes a similar dynamic threshold mechanism, but consists of θb and θr only (Eckhorn et al. 1990; Reitboeck et al. 1993). The absolute refractory period makes the model more realistic, but it also serves a computational purpose: It ensures that the neurons do not fire too rapidly. This property makes synchronization more robust against noise, as will be described in the next chapter.
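The spike generator of Equations 11.4–11.6 can be sketched as follows; the class name and parameter values are illustrative placeholders, not the published settings:

```python
import math

class SpikeGenerator:
    """Dynamic-threshold spike generator (Equations 11.4-11.6, sketch).

    theta(t) = theta_b + theta_a(t) + gamma_theta * theta_r(t).
    """

    def __init__(self, theta_b=0.1, gamma_theta=0.5, t_refractory=2,
                 lambda_theta=0.5):
        self.theta_b = theta_b            # base threshold
        self.gamma_theta = gamma_theta    # scaling of the relative term
        self.t_r = t_refractory           # absolute refractory period (steps)
        self.decay = math.exp(-lambda_theta)
        self.theta_r = 0.0                # relative refractory trace
        self.last_spike = -10**9          # time of the last output spike

    def step(self, t, v):
        # Absolute refractory period: theta_a = inf for t_r steps after firing.
        theta_a = math.inf if t - self.last_spike < self.t_r else 0.0
        theta = self.theta_b + theta_a + self.gamma_theta * self.theta_r
        spike = 1 if v > theta else 0
        if spike:
            self.last_spike = t
        # Leaky-integrator trace of output spikes (Equation 11.6).
        self.theta_r = spike + self.theta_r * self.decay
        return spike

# Constant strong input: firing is limited by the refractory mechanism.
gen = SpikeGenerator()
spikes = [gen.step(t, v=0.8) for t in range(10)]
print(spikes)  # → [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
```

Even with constant input, the absolute refractory term caps the firing rate at one spike per t_r steps, which is the noise-robustness property described above.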
The spikes generated this way are propagated through the connections and the average firing rates of the neurons in a small time window are gathered. Based on these firing rates, the connection weights are adapted. The details of the learning mechanism are explained next.
11.4 Learning
The PGLISSOM simulations begin with a network in an unorganized state: All connection weights in the network are initialized to random values, e.g. drawn uniformly between 0 and 1. The network is trained by presenting visual input and adapting the connection weights according to the Hebbian learning rule. Instead of natural images as in Part III, schematic inputs are used for the perceptual grouping experiments for two reasons: (1) Contour integration is a well-defined task with schematic inputs; it is possible to have tight control over stimulus configurations and therefore to test the behavior of the model systematically, as is done in psychological experiments on human subjects. (2) With schematic inputs, smaller networks can be used, making self-organization with spiking neurons computationally feasible.
The input to the network consists of oriented Gaussian bars similar to those in Section 5.3. To generate such an input pattern in a spiking model, the input neurons
fire with different frequencies: At the center of the Gaussian, the spike rate is maximal, and it decreases gradually for neurons farther from the center. At each input presentation, an input with a random orientation is placed at a random location in the retina, and the resulting retinal receptor activities are propagated through the afferent connections of the network. As in firing-rate LISSOM models, the input is kept constant while the cortical response settles through the lateral connections. The retinal receptors generate spikes at a constant rate, and the cortical neurons generate and propagate spikes and adjust their dynamic thresholds according to Equations 11.1 through 11.6. After a while, the neurons reach a stable rate of firing, and this rate is used to modify the weights.
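A sketch of such a stimulus follows: an oriented elongated Gaussian of firing rates, converted to spikes by a simple deterministic rate-to-spike scheme. The exact stimulus parameterization and spike-generation mechanism here are assumptions for illustration; the book does not specify them at this point:

```python
import numpy as np

def gaussian_input(size, cx, cy, theta, sigma_a, sigma_b):
    """Oriented elongated Gaussian of firing rates on a size x size retina."""
    y, x = np.mgrid[0:size, 0:size]
    # Rotate coordinates into the Gaussian's principal axes.
    xr = (x - cx) * np.cos(theta) + (y - cy) * np.sin(theta)
    yr = -(x - cx) * np.sin(theta) + (y - cy) * np.cos(theta)
    return np.exp(-(xr**2 / sigma_a**2 + yr**2 / sigma_b**2))

def retinal_spikes(rates, t):
    """One simple scheme: a receptor with rate r in (0, 1] fires every
    round(1/r) steps; receptors with near-zero rate stay silent."""
    active = rates > 0.01
    periods = np.round(1.0 / np.clip(rates, 0.01, None))
    return (active & (t % periods == 0)).astype(int)

# A 46 x 46 retina with a diagonal bar, as in the simulations of Section 11.5.
rates = gaussian_input(size=46, cx=23, cy=23, theta=np.pi / 4,
                       sigma_a=3.9, sigma_b=0.8)
spikes = retinal_spikes(rates, t=0)
print(rates.max(), spikes[23, 23], spikes[0, 0])
```

The center receptor fires at the maximal rate, while receptors far from the bar remain silent, giving the graded spiking input described above.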
The spiking rate η(t) is calculated as a running average:
η(t) = λr η(t − 1) + (1 − λr) S(t),   (11.7)

where λr is the retention rate, η(t − 1) is the previous spiking rate, and S(t) is the output spike at time t (either 0 or 1). With this method, a short-term firing rate in a limited time window is calculated. For each input presentation, the average spiking rate of each neuron is calculated over several iterations. The weight modification then occurs as in firing-rate LISSOM models (Section 4.4.1). The afferent, lateral, and intracolumnar weights are modified according to the normalized Hebbian learning rule:
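Equation 11.7 is an exponential moving average; a one-line sketch (the retention rate 0.95 is an illustrative value):

```python
def update_rate(rate_prev, spike, lambda_r=0.95):
    """Running-average spiking rate (Equation 11.7)."""
    return lambda_r * rate_prev + (1.0 - lambda_r) * spike

# A neuron firing every other step converges toward a rate near 0.5.
rate = 0.0
for t in range(400):
    rate = update_rate(rate, spike=t % 2)
print(round(rate, 2))  # → 0.51 (oscillates around 0.5)
```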
w′pq,ij = (wpq,ij + α Xpq ηij) / Σ_{uv} (wpq,uv + α Xpq ηuv),   (11.8)

where wpq,ij is the current connection weight (either A, E, I, or C) from neuron (p, q) to (i, j), w′pq,ij is the new weight to be used until the end of the next settling process, α is the learning rate (αA for afferent, αE for excitatory lateral, αI for inhibitory lateral, and αC for intracolumnar connections), and Xpq and ηij are the spiking rates of the presynaptic and postsynaptic neurons after settling. Note that in PGLISSOM normalization is done presynaptically instead of postsynaptically; this change does not affect self-organization but results in stronger grouping (Section 16.1.3; Choe, Miikkulainen, and Cormack 2000; Sakamoto 2004). As before, connections that approach zero during learning are deleted, and the radius of the lateral excitation in SMAP is gradually reduced following a preset schedule.
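The presynaptically normalized Hebbian rule of Equation 11.8 can be sketched in vectorized form; function and variable names are illustrative:

```python
import numpy as np

def hebbian_update(W, pre_rates, post_rates, alpha):
    """Normalized Hebbian rule with presynaptic normalization (Equation 11.8).

    W[p, i]: weight from presynaptic neuron p to postsynaptic neuron i.
    Each presynaptic neuron's outgoing weights are renormalized to sum to 1.
    """
    # Unnormalized Hebbian term: w + alpha * X_pre * eta_post.
    raw = W + alpha * np.outer(pre_rates, post_rates)
    # Presynaptic normalization: divide each row (the outgoing weights of
    # one presynaptic neuron) by its sum.
    return raw / raw.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
W = rng.uniform(0, 1, size=(5, 3))
W /= W.sum(axis=1, keepdims=True)
W2 = hebbian_update(W, pre_rates=rng.uniform(0, 1, 5),
                    post_rates=rng.uniform(0, 1, 3), alpha=0.1)
print(np.allclose(W2.sum(axis=1), 1.0))  # → True
```

Normalizing over the row (the presynaptic neuron's outgoing weights) rather than the column is exactly the presynaptic-versus-postsynaptic distinction noted above; a postsynaptic version would divide by column sums instead.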
This process of input presentation, activation, and weight adaptation is repeated for a large number of input patterns, and the neurons gradually become sensitive to particular orientations at particular locations. In this way, the network forms a global retinotopic orientation map with patchy lateral connections similar to that of the earlier LISSOM OR model. In PGLISSOM, this map will then synchronize and desynchronize the firing of neurons to indicate binding and segmentation of visual input into different objects.
11.5 Self-Organizing Process
After extending the LISSOM model with spiking neurons and excitatory lateral connections, a crucial question is whether the model is still capable of self-organization in the same way as the original model. Most importantly, do the lateral connections develop a characteristic patchy structure that could form the basis for perceptual grouping? This section will demonstrate that this is indeed the case: PGLISSOM self-organizes to form orientation maps and functionally specific lateral connection patterns.
11.5.1 Method
The SMAP consisted of 136 × 136 neurons, and GMAP of 54 × 54 neurons. GMAP was smaller to make simulations faster and to fit the model within the physical memory limit. The intracolumnar connections between SMAP and GMAP were proportional to scale, so that the relative locations of connected neurons in the two maps were the same. This connectivity scheme ensures that the global order of the two maps matches as they develop. The excitatory lateral connection radius in SMAP was initially seven and was gradually reduced to three. The lateral inhibitory radius was fixed at 10, i.e. smaller than in the LISSOM simulations, in order to reduce the computational requirements; reducing the radius further would not allow the maps to self-organize properly. In GMAP, excitatory lateral connections had a radius of 40 and the inhibitory connections covered the whole map; such very long-range connections were necessary to achieve proper grouping behavior. Afferent connections from the retina had a radius of six in both maps (mapped according to Figure A.1), and intracolumnar connections a radius of two. The retina consisted of 46 × 46 receptors. As long as the relative sizes of the map, the retina, and the lateral connection radii are similar to these values, the maps will self-organize well. The rest of the parameter settings are specified in Appendix D.1.
The input consisted of single randomly located and oriented elongated Gaussians with major and minor axis lengths initially σa = 3.9 and σb = 0.8, and elongated to 6.7 and 0.7 after 1000 input presentations (of a total of 40,000 presentations). Such inputs result in sharp OR tuning and long lateral connections necessary for contour integration, as will be described in the next two subsections.
11.5.2 Receptive Fields and Orientation Maps
The self-organization process was very similar to that of the firing-rate LISSOM, and also matched experimental data well (as described in Sections 2.1, 5.1 and 5.3). As in earlier LISSOM simulations, the receptive fields are initially circular and have a random profile. They gradually become elongated and smoother as the training proceeds, and the neurons gradually develop orientation preferences.
The resulting receptive fields of SMAP and GMAP are shown in Figure 11.3a. The neurons belonging to the same column in SMAP and GMAP have highly similar orientation preferences. Most neurons are highly selective for orientation, and others
[Figure 11.3 panels, GMAP (top) and SMAP (bottom): (a) Afferent weights, (b) Retinotopy]
Fig. 11.3. Self-organized afferent weights and retinotopic organization. In (a), the afferent weight matrices of corresponding sample neurons in SMAP and GMAP are plotted (in gray scale as in Figure 6.4, organized as in Figure 5.8a): In SMAP, every fourth neuron horizontally and vertically is shown, and in GMAP, every tenth neuron. Both maps saw the same inputs during training, and due to the intracolumnar connections they developed matching orientation preferences. Since the ON/OFF channels in the LGN were bypassed in this simulation, the receptive fields are all unimodal. However, they display the same properties as the orientation model in Section 5.3: Most neurons are highly selective for orientation, and neurons near discontinuities are unselective. In (b), the center of gravity of the afferent weights of each neuron in the network are plotted as a grid in retinal space (as in Figure 5.11). Although the two maps differ in size, the overall organization closely matches: Neurons in the same cortical column receive input from the same locations in the retinal space. The overall organization of the map is an evenly spaced grid with local distortions, as observed in biology and in the LISSOM orientation map (Figure 5.11; Das and Gilbert 1997). The preferences are sharper and the distortions wider than in the LISSOM simulations because more elongated input patterns were used during training.
[Figure 11.4 panels, GMAP (top) and SMAP (bottom): (a) OR preference, (b) OR selectivity, (c) OR preference & selectivity, (d) OR histogram]
Fig. 11.4. Self-organized orientation map. The orientation preference and selectivity of each neuron in SMAP and GMAP are plotted using the same conventions as in Figure 5.9. Because of the intracolumnar connections, the two maps develop similar organizations. As in LISSOM and in biological maps (Figures 2.5 and 5.9), the preferences change smoothly across the cortex, and exhibit features such as linear zones, pairs of pinwheels, saddle points, and fractures (outlined as in Figure 2.4). As in animal maps, the neurons at the pinwheel centers and fractures are unselective for orientation (these features are more prominent in the SMAP, which drives the self-organization). The orientation histograms are essentially flat and therefore free of artifacts. These plots show that realistic orientation maps can be formed with spiking neurons and with the SG model of cortical columns.
respond to a variety of orientations. The retinotopic organization in the two maps closely matches even though they had different numbers of neurons (Figure 11.3b). The receptive field centers are laid out in an evenly spaced grid that covers the whole retinal space, and the local distortions of this grid reflect the underlying orientation map.
To visualize the global order of the orientation maps, the orientation preferences and orientation selectivity of each neuron were measured (Figure 11.4). Despite the density difference, both maps have highly similar global order. The preferences change smoothly across the map, and contain features such as linear zones, matched pairs of pinwheels, saddle points, and fractures. Neurons at pinwheel centers and fractures are not very selective for orientation, similar to biological maps and the LISSOM orientation map (Sections 2.1.2 and 5.3.2).
In addition to the qualitative descriptions, quantitative measures introduced in Section 5.3.3 can be used to characterize the global properties of the maps. Orienta-
