
A Physiologically-Based Population Rate Code for ITDs |
Fig. 4 Motor control suggests a possible strategy underlying the population rate code. Broken lines indicate that the sensory-motor connections are conceptual; no physical connections are implied
the direction of motion) with assistance from the splenius muscles on the right (ipsilateral to the motion).
The outputs of the population rate model may be suitable for generating motor commands that orient the head toward a sound source (Fig. 4). Specifically, because the MSO is broadly responsive to sounds originating in the contralateral hemisphere, its output is appropriate for driving the muscles which turn the head toward the contralateral hemisphere.
The LSO channels in Fig. 4 inhibit motion toward the ipsilateral side, consistent with the function illustrated by Eq. (2) and Fig. 3. Low-frequency LSO neurons, which presumably comprise the ITD-sensitive component of the LSO, make inhibitory projections to ipsilateral IC (Glendenning and Masterton 1983).
The broken lines in Fig. 4 emphasize that the auditory brainstem does not have direct control over motor action. Rather, the intention is only to suggest a functional strategy underlying the coding of ITD. It is conceivable, however, that the population rate code described here is an evolutionary remnant of a primitive, more direct coupling between sensory stimulation and motor response. In that context, it is interesting to note that the saccule responds to moderately intense acoustic stimulation and projects to SCM motoneurons by way of the vestibular nucleus (McCue and Guinan 1997).
4.3 Conclusion
The population rate code and the motor interpretation represent a simple, alternative framework to the conventional labeled-line view of ITD coding. It may prove useful in considering such issues as the development of ITD processing, comparison of ITD processing across species, and binaural hearing in reverberant environments (Devore et al. 2006).
Acknowledgements. This work was supported by NIH grants DC07353 and DC002258.
K.E. Hancock |
References
Batra R, Kuwada S, Fitzpatrick DC (1997) Sensitivity to interaural temporal disparities of low- and high-frequency neurons in the superior olivary complex. I. Heterogeneity of responses. J Neurophysiol 78:1222–1236
Beckius GE, Batra R, Oliver DL (1999) Axons from anteroventral cochlear nucleus that terminate in medial superior olive of cat: observations related to delay lines. J Neurosci 19:3146–3161
Brand A, Behrend O, Marquardt T, McAlpine D, Grothe B (2002) Precise inhibition is essential for microsecond interaural time difference coding. Nature 417:543–547
Devore S, Ihlefeld A, Shinn-Cunningham BG, Delgutte B (2006) Neural and behavioral sensitivities to azimuth degrade similarly in reverberant environments. International Symposium on Hearing, Chap 24
Glendenning KK, Masterton RB (1983) Acoustic chiasm: efferent projections of the lateral superior olive. J Neurosci 3:1521–1537
Goldberg JM, Brown PB (1969) Response of binaural neurons of dog superior olivary complex to dichotic tonal stimuli: some physiological mechanisms of sound localization. J Neurophysiol 32:613–636
Hancock KE, Delgutte B (2004) A physiologically based model of interaural time difference discrimination. J Neurosci 24:7110–7117
Jeffress LA (1948) A place theory of sound localization. J Comp Physiol Psychol 41:35–39
Joris PX, van der Heijden M, Louage D, Van de Sande B, Van Kerckhoven C (2004) Dependence of binaural and cochlear “best delays” on characteristic frequency. In: Pressnitzer D, de Cheveigne A, McAdams S, Collet L (eds) Auditory signal processing: physiology, psychoacoustics, and models. Springer, Berlin Heidelberg New York
Marquardt T, McAlpine D (2001) Simulation of binaural unmasking using just four binaural channels. Assoc Res Otolaryngol Abs, p 87
McAlpine D, Jiang D, Palmer AR (2001) A neural code for low-frequency sound localization in mammals. Nat Neurosci 4:396–401
McCue MP, Guinan JJ Jr (1997) Sound-evoked activity in primary afferent neurons of a mammalian vestibular system. Am J Otol 18:355–360
Shackleton TM, Meddis R, Hewitt MJ (1992) Across frequency integration in a model of lateralization. J Acoust Soc Am 91:2276–2279
Stern RM, Zeiberg AS, Trahiotis C (1988) Lateralization of complex binaural stimuli: a weighted-image model. J Acoust Soc Am 84:156–165
Trahiotis C, Stern RM (1994) Across-frequency interaction in lateralization of complex binaural stimuli. J Acoust Soc Am 96:3804–3806
van Bergeijk W (1962) Variation on a theme of von Békésy: a model of binaural interaction. J Acoust Soc Am 34:1431–1437
von Békésy G (1960) Experiments in hearing. McGraw-Hill, New York, pp 272–301
Wightman FL, Kistler DJ (1992) The dominant role of low-frequency interaural time differences in sound localization. J Acoust Soc Am 91:1648–1661
Comment by Carr
Your paper discusses the advantages of a code without labeled lines for ITD. It is not clear to me why there need be a dichotomy between the labeled-line code strategy of an ITD map and a rate code. Many sensory variables are encoded by activity in populations of neurons with bell-shaped tuning curves. Models of information coding from these populations appear consistent with
the behavioral resolution of ITD (Skottun et al. 2001). Takahashi et al. (2003), however, have argued that rate coding and place coding are not mutually exclusive. In the barn owl, they have shown that changes in the firing rate of space-specific neurons can serve as the basis of a spatial discrimination task. The place code, which is based on the position of active neurons in the space map of the barn owl IC, appears to be used to direct orientation toward a sound source, such as the motor task you use in your example.
References
Skottun BC, Shackleton TM, Arnott RH, Palmer AR (2001) The ability of inferior colliculus neurons to signal differences in interaural delay. Proc Natl Acad Sci 98:14050–14054
Takahashi TT, Bala AD, Spitzer MW, Euston DR, Spezio ML, Keller CH (2003) The synthesis and use of the owl’s auditory space map. Biol Cybern 89:378–387
Reply
As you describe, Takahashi et al. argue that the two codes might coexist, but are applicable to different tasks. They suggest a population rate code for the general task of discriminating two stimuli, and a place code for the more specific task of sound localization. What I tried to show is that the population rate code (with four channels) can also account for sound localization, that is, it can apply to both kinds of tasks, at least under the stimulus conditions considered. The larger issue, of course, is whether or not mammals use the same strategy as barn owls (and chickens?) to localize sound based on ITD. That, I believe, is still an open and interesting question.

43 A π-Limit for Coding ITDs: Neural Responses and the Binaural Display
DAVID MCALPINE1, SARAH THOMPSON1, KATHARINA VON KRIEGSTEIN2,
TORSTEN MARQUARDT1, TIMOTHY GRIFFITHS2, AND ADENIKE DEANE-PRATT1
1 Introduction
Interaural time differences (ITDs) are the main cues used by humans to determine the horizontal position of low-frequency (<1500 Hz) sound sources. The neural representation of ITDs is presumed to be one in which brain centres in each hemisphere encode the opposite (contralateral) side of space (Jenkins and Merzenich 1984). Most human psychophysical studies assume that the range of ITDs encoded is constant across the range of sound frequencies at which sensitivity to ITDs in the fine-structure of sounds is observed (<1500 Hz) (Trahiotis and Stern 1989), and that this range is determined largely by the physiological range, but with greater ITDs, probably up to at least 3000 µs, explicitly encoded in the 500-Hz frequency band in order to account for human psychophysical performance (van der Heijden and Trahiotis 1999).
The brain’s response to ITDs is commonly represented in the form of a cross-correlogram, which plots the cross-correlation function of the sound at each ear for each frequency channel following cochlear filtering. For the example in Fig. 1, a 500-Hz tone presented over stereo headphones, and containing an ITD of −1500 µs (i.e. leading at the left ear), activates the cross-correlogram at multiple periods of the stimulus waveform, giving peaks in activity every 2000 µs within the 500-Hz channel (black sinusoid in 500-Hz channel). Human listeners report such a sound to have an intracranial image on the contralateral side to the leading ear (Trahiotis and Stern 1989), in this case the right side, consistent with an ITD of +500 µs. It appears that for tones and narrow bands of noise, the auditory system resolves the ambiguity in the internal representation by selecting the shortest of the possible ITDs – a weighting for centrality that has been explained by the existence of more coincidence-counting units encoding ITDs that lie within the physiological range (Stern et al. 1988). This central weighting is shown in Fig. 1 as an increased gray-scale density for shorter ITDs. As signal bandwidth is
1Ear Institute, University College London, London, UK, d.mcalpine@ucl.ac.uk, t.marquardt@ucl.ac.uk, adenike.turner@ucl.ac.uk, sarah.thompson@mrc-cbu.cam.ac.uk
2Wellcome Department of Imaging Neuroscience, Institute of Neurology, University College London, London, UK, kkriegs@fil.ion.ucl.ac.uk, t.d.griffiths@newcastle.ac.uk
Hearing – From Sensory Processing to Perception
B. Kollmeier, G. Klump, V. Hohmann, U. Langemann, M. Mauermann, S. Uppenkamp, and J. Verhey (Eds.) © Springer-Verlag Berlin Heidelberg 2007

D. McAlpine et al. |
Fig. 1 Representation of ITD information in the form of a cross-correlogram. Grey-scale density illustrates central weighting function of ITD detectors used in central weighting models. Curved dashed lines indicate the “π-limit”. See text for full description
increased, however, the intracranial auditory image shifts from the right to the left side until, for a bandwidth of 400 Hz (grey vertical bar to left of correlogram in Fig. 1), the image is lateralised fully to the left side corresponding to that of the “true” ITD. As for a tone, multiple peaks of activation appear in the plot, but the peaks at the true ITD (in this case, −1500 µs) are aligned across frequency channels, a pattern referred to as “straightness” (Trahiotis and Stern 1989). As such, it has been proposed that a second layer of coincidence detectors exists, potentially in the midbrain, multiplying across-frequency straightness (Stern and Trahiotis 1997) and thereby giving weight to the true ITD which, under real-world listening conditions, would be consistent across frequency for a single sound emanating from a single source. This straightness-weighting therefore accounts for the ‘true’ lateralized image of broadband sounds with high ITD on the basis of midbrain activity on the opposite side to the lateralised sound image (grey curve at top of Fig. 1).
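The phase ambiguity and the "straightness" cue described above can be illustrated with a few lines of arithmetic. The following is a minimal sketch, not the weighted-image model itself: it assumes idealised, infinitely narrow channels whose cross-correlation peaks recur at exactly one period of the channel centre frequency, and the function name, the ±4000 µs display range and the set of channel frequencies are illustrative choices.

```python
import numpy as np

def correlogram_peaks_us(true_itd_us, freq_hz, max_lag_us=4000.0):
    """ITDs (in µs) at which an idealised narrow channel at freq_hz peaks.

    Peaks recur every period, so they lie at true_itd + k * period.
    """
    period_us = 1e6 / freq_hz
    k_min = int(np.ceil((-max_lag_us - true_itd_us) / period_us))
    k_max = int(np.floor((max_lag_us - true_itd_us) / period_us))
    return [true_itd_us + k * period_us for k in range(k_min, k_max + 1)]

# 500-Hz tone with an ITD of -1500 µs: peaks every 2000 µs.
peaks_500 = correlogram_peaks_us(-1500.0, 500.0)

# Centrality: the peak closest to zero delay is the one listeners report.
central_peak = min(peaks_500, key=abs)

# Straightness: only the true ITD is a peak in every channel of a
# 400-Hz-wide band centred on 500 Hz (channel spacing is an assumption).
channels_hz = [300.0, 400.0, 500.0, 600.0, 700.0]
aligned = set.intersection(
    *[{round(p) for p in correlogram_peaks_us(-1500.0, f)} for f in channels_hz]
)
```

Under these assumptions `peaks_500` is `[-3500.0, -1500.0, 500.0, 2500.0]`, `central_peak` is `500.0` (the right-sided image reported for the tone), and `aligned` contains only `-1500`, the delay that lines up across channels once the bandwidth is increased.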
However, straightness-weighting is unsatisfactory because not only would a full representation of ITDs have to exist in each frequency band, including ITDs that can never be experienced under natural listening conditions, but also a second level of coincidence detectors specifically looking for across-frequency straightness at these ITDs. The existence or otherwise of neural mechanisms specialized to detect such long ITDs is disputed.
Recent investigations indicate only a restricted range of ITD detectors in the mammalian brain (McAlpine et al. 2001; Hancock and Delgutte 2004), with no neurons showing tuning for ITDs beyond approximately 1/2 a cycle of the centre frequency of each auditory filter. We refer to this as the π-limit (denoted by the curved dashed lines running down Fig. 1). Note that with the π-limit the 500-Hz frequency band possesses ITD detectors up to a maximum of 1000 µs, being 1/2 the period of a 500-Hz centre frequency. The π-limit must therefore account for the ‘true’ lateralized image of broadband sounds with high ITD on the basis of midbrain activity on the same side as the lateralised sound image. Note that the models make entirely opposite predictions to each other for ITDs of 1500 µs: the straightness model predicts greater activity in the brain hemisphere contralateral to the lateralised sound image, while the π-limit predicts greater activity ipsilateral. This is in contrast to ITDs of ±500 µs, for which both models predict greater activation in the brain hemisphere contralateral to the lateralized sound image.
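The arithmetic behind the π-limit can be written out directly. This is a minimal sketch under the assumption that an ITD outside the limit is represented by the equivalent interaural phase at the channel centre frequency; the function names are illustrative and are not taken from the chapter.

```python
def pi_limit_us(centre_freq_hz):
    """Largest ITD (µs) with a detector under the π-limit: half a period."""
    return 0.5 * 1e6 / centre_freq_hz

def wrap_itd_us(itd_us, centre_freq_hz):
    """Map an ITD onto the equivalent delay within +/- half a period."""
    period_us = 1e6 / centre_freq_hz
    half_us = period_us / 2
    return (itd_us + half_us) % period_us - half_us

limit_500 = pi_limit_us(500.0)              # 1000.0 µs
inside = wrap_itd_us(500.0, 500.0)          # 500.0: already within the limit
outside_pos = wrap_itd_us(1500.0, 500.0)    # -500.0: opposite sign
outside_neg = wrap_itd_us(-1500.0, 500.0)   # 500.0: opposite sign
```

On this reading, an ITD of ±1500 µs in the 500-Hz band drives the detectors that respond best to ∓500 µs, which is why the π-limit account predicts greater activity in the hemisphere ipsilateral to the lateralised image for the long delays.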
Here, we examine the representation of ITD in the human midbrain and cortex using functional magnetic resonance imaging (fMRI), the mismatch negativity potential (MMN), and headphone-presented sounds with variable ITD.
2 Methods
For the functional magnetic resonance imaging (fMRI) experiment, stimuli were 400-Hz band-pass noise exemplars (fixed-amplitude, random-phase) centred at 500 Hz. Stimuli consisted of eight consecutive noise bursts of 1 s each from the same condition (8 s), or 8 s of silence, presented via stereo headphones. ITDs were ±1500 µs, ±500 µs and 0 µs plus a silent condition. Sounds with negative ITDs were always perceived on the left, and those with positive ITDs on the right. BOLD contrast images were acquired using T2* weighted EPI and sparse imaging. A total of 48 slices in ascending axial order were obtained with cardiac gating for 14 subjects, all of whom were right-handed, with no hearing impairment or history of neurological disorder. Subjects were asked to pay attention to the noises, and to press a button on their keypad at the end of each trial to maintain alertness. They were also asked to keep their eyes open and fixate on a central cross, in order to counteract any confound caused by correlated eye movements. Imaging data were analysed using the statistical parametric mapping algorithm implemented in SPM2. Structural and EPI images were co-registered and normalised to a standard grey-matter template (Montreal Neurological Institute). Data were thereby transformed into a standard stereotaxic space and subsampled with a voxel resolution of 2 × 2 × 2 mm (original voxel resolution was approximately 3 × 3 × 3 mm). Data were spatially smoothed with a Gaussian smoothing kernel of 5 mm. SPM2 was used to compute individual subject analyses according to the General Linear Model by
fitting the data time-series with the canonical Haemodynamic Response Function (cHRF) at the onset of each trial. Each condition was modeled as an individual regressor in the design matrix, and statistical parameter estimates were computed individually for each brain voxel.
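For readers unfamiliar with this kind of analysis, the per-voxel fit can be sketched as follows. This is a simplified illustration and not SPM2's implementation: the double-gamma response shape and its parameters, the 1-s sampling grid, the onsets and the function names are all assumptions, and the sparse, cardiac-gated acquisition is not modelled.

```python
import math
import numpy as np

def canonical_hrf(t):
    """Double-gamma haemodynamic response at times t (s); parameters assumed."""
    t = np.asarray(t, dtype=float)
    peak = t**5 * np.exp(-t) / math.factorial(5)           # early positive lobe
    undershoot = t**15 * np.exp(-t) / math.factorial(15)   # late undershoot
    return peak - undershoot / 6.0

def design_matrix(onsets_by_condition, n_samples, hrf_len=32):
    """One HRF-convolved regressor per condition, plus a constant column."""
    hrf = canonical_hrf(np.arange(hrf_len))
    columns = []
    for onsets in onsets_by_condition:
        sticks = np.zeros(n_samples)
        sticks[np.asarray(onsets, dtype=int)] = 1.0  # impulse at each trial onset
        columns.append(np.convolve(sticks, hrf)[:n_samples])
    columns.append(np.ones(n_samples))
    return np.column_stack(columns)

def fit_glm(y, X):
    """Ordinary least-squares parameter estimates for one voxel time series."""
    betas, *_ = np.linalg.lstsq(X, y, rcond=None)
    return betas

# Hypothetical onsets (in samples) for two conditions, 120 samples in total.
X = design_matrix([[0, 40, 80], [20, 60, 100]], n_samples=120)
true_betas = np.array([2.0, -1.0, 5.0])
estimated = fit_glm(X @ true_betas, X)  # noiseless synthetic voxel
```

With noiseless synthetic data the estimates reproduce `true_betas`; contrasts between conditions are then differences between the corresponding parameter estimates.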
The mismatch negativity (MMN), a component of the auditory evoked potential produced by context-dependent changes in the environment, is reported as being greater at electrodes contralateral to the perceived location of a sound. In each of four blocks, participants read self-selected material while seated in a soundproof booth, and were instructed to ignore the sounds. Stimuli were presented via headphones at 65 dB SPL in pseudo-random order with a stimulus onset asynchrony (SOA) of 1 s. Each block was ten minutes long and consisted of 480 standard stimuli and 30 × 4 deviant stimuli (0.8 and 0.05 probability of occurrence, respectively). A total of 2400 (1920 standard and 480 deviant) 400-Hz-wide, bandpass-filtered white noise bursts, centred on 500 Hz, were generated in MATLAB using the Binaural Toolbox. Noises of 500 ms duration, including a 50-ms rise/fall time, were digitized at 20,000 Hz. There was no difference in onset time, and delays were applied to the right channel only. Standard stimuli had an ITD of 0 µs and a perceived location in the middle of the head. The electroencephalogram (EEG) was recorded using NUAMPS with Ag-AgCl electrodes from 32 channels positioned according to the extended 10–20 system and re-referenced offline to the average from the mastoids. Horizontal and vertical electrooculograms were recorded with electrodes placed above and below the left eye, and on the outer canthi. Blink removal, artefact rejection (threshold: ±100 µV) and lowpass-filtering below 20 Hz were performed offline. Data for each condition were averaged from 600-ms epochs (beginning 100 ms pre-stimulus). MMN was defined as the mean amplitude of the deviant-standard difference waveforms in a 40-ms window centred on the grand average negative peak at each electrode. Statistical analysis was performed at three left-right electrode pairs: F3-F4, FC3-FC4, C3-C4.
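The stimulus description above can be made concrete with a short synthesis sketch. It is not the Binaural Toolbox code used in the study: the fixed-amplitude, random-phase synthesis, the raised-sine ramp shape, the treatment of a positive ITD as an advance of the right channel, and all function names are assumptions made for illustration. At the stated 20,000-Hz sampling rate, 500 µs and 1500 µs correspond to 10 and 30 samples.

```python
import numpy as np

FS = 20_000  # sampling rate in Hz, as stated in the text

def itd_noise(itd_us, dur_s=0.5, centre_hz=500.0, bw_hz=400.0, ramp_s=0.05, seed=0):
    """Band-pass noise pair (left, right); negative ITD means the left ear leads."""
    rng = np.random.default_rng(seed)
    n = int(round(dur_s * FS))
    freqs = np.fft.rfftfreq(n, d=1.0 / FS)
    band = np.abs(freqs - centre_hz) <= bw_hz / 2
    spec = np.zeros(freqs.size, dtype=complex)
    spec[band] = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, band.sum()))
    # The delay is applied to the right channel only, as a phase shift.
    right_delay_s = -itd_us * 1e-6
    left = np.fft.irfft(spec, n)
    right = np.fft.irfft(spec * np.exp(-2j * np.pi * freqs * right_delay_s), n)
    # A common envelope keeps the onsets of the two channels identical.
    envelope = np.ones(n)
    n_ramp = int(round(ramp_s * FS))
    if n_ramp > 0:
        edge = np.sin(0.5 * np.pi * np.arange(n_ramp) / n_ramp) ** 2
        envelope[:n_ramp] = edge
        envelope[-n_ramp:] = edge[::-1]
    return left * envelope, right * envelope

def right_lag_samples(left, right):
    """Circular cross-correlation peak: samples by which the right channel lags."""
    xc = np.fft.irfft(np.fft.rfft(right) * np.conj(np.fft.rfft(left)), left.size)
    lag = int(np.argmax(xc))
    return lag if lag <= left.size // 2 else lag - left.size

left, right = itd_noise(-1500.0)
lag_long = right_lag_samples(left, right)                 # 30 samples = 1500 µs
lag_short = right_lag_samples(*itd_noise(500.0))          # -10 samples = right leads
```

Because the envelope is shared, only the fine structure carries the delay, which matches the statement that there was no difference in onset time.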
One-sample t-tests were used to determine that MMN amplitudes differed from zero, and a three-way ANOVA, delay (−1500, −500, +500, +1500 µs) × hemisphere (left, right) × row (F, FC, C), confirmed no main effects and no interactions for peak MMN latencies.
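The MMN measure defined above (mean amplitude of the deviant-minus-standard difference waveform in a 40-ms window centred on the grand-average negative peak) can be sketched as follows. The 1-ms time grid, the 100–250 ms search range for the peak, the function names and the synthetic waveform are assumptions for illustration; they are not taken from the study.

```python
import numpy as np

def grand_peak_latency_ms(difference_waves, times_ms, search_ms=(100.0, 250.0)):
    """Latency of the most negative point of the grand-average difference wave."""
    grand = np.mean(np.asarray(difference_waves, dtype=float), axis=0)
    in_range = (times_ms >= search_ms[0]) & (times_ms <= search_ms[1])
    candidates = np.where(in_range)[0]
    return float(times_ms[candidates[np.argmin(grand[candidates])]])

def mmn_amplitude(standard_erp, deviant_erp, times_ms, peak_ms, window_ms=40.0):
    """Mean deviant-minus-standard amplitude in a window centred on peak_ms."""
    diff = np.asarray(deviant_erp, dtype=float) - np.asarray(standard_erp, dtype=float)
    half = window_ms / 2.0
    in_window = (times_ms >= peak_ms - half) & (times_ms <= peak_ms + half)
    return float(diff[in_window].mean())

# Synthetic example: 600-ms epoch starting 100 ms before stimulus onset.
times_ms = np.arange(-100.0, 500.0)
standard = np.zeros_like(times_ms)
deviant = -2.0 * np.exp(-((times_ms - 150.0) / 30.0) ** 2)  # made-up negativity

peak_ms = grand_peak_latency_ms([deviant - standard], times_ms)
amplitude = mmn_amplitude(standard, deviant, times_ms, peak_ms)
```

Taking the window position from the grand average and then measuring each participant's mean amplitude inside it is one common way to read the definition; the text does not say how the search range for the peak was chosen.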
3 Results
Initial analysis served to map functionally the IC for both hemispheres. To that end individual statistical maps were generated for the contrast All Sounds>Silence. Data from the resulting contrast maps for 14 subjects were entered into a second level analysis to allow population-level inferences to be drawn (random effects). The group analysis was thresholded at 0.05 FWE corrected to map the IC in both hemispheres as our a priori ROI. The one resulting ROI was divided in the midline (x = 0) in two ROIs representing

right and left ICs respectively. Contrasts of interest were between [(−500 µs normal orientation) + (500 µs at right-left flipped orientation)]>[(1500 µs normal orientation) + (−1500 µs at right-left flipped orientation)]. These data are shown in Fig. 2, and demonstrate clearly that the representation of ITDs in the human midbrain switches from contralateral to ipsilateral when the ITD of bandpass noise is increased beyond the π-limit. In this, our result accords with a growing body of physiological data from the auditory midbrain of other mammals (McAlpine et al. 2001; Hancock and Delgutte 2004), suggesting that only a limited representation of ITDs, corresponding to an upper limit of approximately 1/2 the period of the stimulus centre frequency, exists within each sound frequency channel in the auditory brain. The data are also consistent with an absence of any preference for straightness weighting when the ITD is ±1500 µs.
Fig. 2 a Coronal slice, mapping the IC by contrasting all noise conditions with the silent baseline for all subjects. Activation is superimposed on a standard structural brain template. The area marked with a square is shown enlarged in b which shows group statistical parametric maps for the contrasts between ipsi- and contralateral delays at different ITD (500 µs/1500 µs) within the functionally mapped IC. c Percentage change in BOLD response

Fig. 3 a Grand mean difference waveforms (left panels) for each condition at the electrode pairs FC3-FC4. Note that negative amplitudes are plotted upwards. b Mean MMN amplitudes from electrode pairs in a. The ITDs of ±1500 µs correspond to 3/4 of a cycle of the stimulus centre frequency and ±500 µs to 1/4 of the cycle. Ipsilateral and contralateral refer to brain hemisphere referenced to the ear at which the sound leads in time
A similar pattern to the IC responses was also observed at the cortical level in the analysis of MMN at each electrode pair. Figure 3a shows the difference waveforms at FC3 (left hemisphere) and FC4 (right hemisphere). Peak activation occurs between 100 and 200 ms post-stimulus onset and is greater at the electrode contralateral to the perceived location for the −500-µs condition (p<0.05). This conforms to a bias toward greater right hemisphere activation previously seen in a number of studies and presumably due to right hemispheric specialisation for spatial stimuli. Sounds on the left that are within the π-limit elicit stronger contralateral activity, whereas right-sided stimuli are represented more bilaterally; outside the π-limit there is no difference between left and right electrodes.
In order to examine the data without this bias, mean contralateral (e.g. [+500 µs at FC3] + [−500 µs at FC4]; [+1500 µs at FC3] + [−1500 µs at FC4]) and ipsilateral (e.g. [+500 µs at FC4] + [−500 µs at FC3]; [+1500 µs at FC4] + [−1500 µs at FC3]) MMN amplitudes were compared. Here, contralateral activation was greater (p<0.05) within, but not beyond the π-limit, as can be seen in Fig. 3b. These data are consistent with our fMRI results and with animal data, but not with the straightness model or a full-range of ITD representation.
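The pooling described above can be sketched in a few lines. The function name and data layout are illustrative, and the amplitudes in the usage example are placeholder numbers chosen only to exercise the code; they are not data from the study.

```python
def pool_by_laterality(amps, left_electrode="FC3", right_electrode="FC4"):
    """Average MMN amplitudes into contralateral and ipsilateral means.

    amps maps (itd_us, electrode) to an amplitude. A positive ITD leads at the
    right ear, so its contralateral electrode is the left one, and vice versa.
    """
    pooled = {}
    for magnitude in sorted({abs(itd) for itd, _ in amps}):
        contra = (amps[(+magnitude, left_electrode)]
                  + amps[(-magnitude, right_electrode)]) / 2
        ipsi = (amps[(+magnitude, right_electrode)]
                + amps[(-magnitude, left_electrode)]) / 2
        pooled[magnitude] = {"contra": contra, "ipsi": ipsi}
    return pooled

# Placeholder amplitudes in µV (not measured values).
example = {
    (500, "FC3"): -2.0, (-500, "FC4"): -3.0,
    (500, "FC4"): -1.0, (-500, "FC3"): -1.5,
    (1500, "FC3"): -1.0, (-1500, "FC4"): -1.0,
    (1500, "FC4"): -1.0, (-1500, "FC3"): -1.0,
}
pooled = pool_by_laterality(example)
```

Pooling a left-leading and a right-leading condition in each mean is what removes the overall right-hemisphere bias noted earlier, since each hemisphere contributes once to the contralateral mean and once to the ipsilateral mean.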
4 Discussion
Imaging data from the human brain are consistent with the notion that ITD is represented by a restricted range of detectors, such that ITDs beyond 1/2 a cycle of the stimulus centre frequency are not explicitly coded. For the stimuli used in the current study, ITDs equivalent to ±3/4 of the period of the stimulus centre frequency are therefore likely encoded by neurons whose response
maxima are evoked by ITDs equivalent to 1/4 of the period of the stimulus centre frequency, and of opposite sign (and opposite lateralised location). Thus, the notion that activation of auditory spatial detectors is determined by the side from which a sound source is heard to originate does not hold. Clearly, however, both the absolute brain activation as well as the relative activation across the two brain hemispheres, differs between the ITDs of ±500 and ±1500 µs. These differences likely stem from differences in interaural correlation between the two stimuli. How such differences might be interpreted as differences in lateral position and extent of an intracranial sound image remains to be determined, but the existence should be noted of successful models of binaural hearing, designed to account for psychophysical data, in which the notion of internal delays is completely absent.
The weighted image model of ITD processing (Stern et al. 1988), designed to account for human psychophysical performance, posits the existence of a second level of coincidence detection explicitly to account for the switch in lateralised percept of noise with long (±1500 µs) delays as the bandwidth is increased. However, the current data call into question the existence of the required straightness detectors, at least up to the level of primary auditory cortex. Explicit anatomical and physiological manifestations such as have been suggested for the barn-owl brain (Wagner et al. 1987) therefore might not be relevant to binaural hearing in mammals. Future use of binaural displays such as the cross-correlogram should take account of physiological data when interpreting psychophysical findings. Although the straightness-weighting model is a simple and attractive notion, our data demonstrate that it does not hold for all ITDs, e.g. it cannot account for sensitivity to ITDs greater than 1/2 the stimulus period. Our data are entirely consistent with the π-limit model.
Changes in the perceived location of a sound have been reported to generate a MMN, which is reported to be maximal in the brain hemisphere contralateral to the perceived location. Ergo, there is presumed to be a switch in the laterality of brain activation to accompany the switch in lateral image for broadband stimuli with ITDs of opposite sign. The current study indicates this not to be the case. The MMN does not appear to accompany the perceptual shift, but rather accompanies the neural population most active, suggesting that its generation is related to detection of activity in a neural population, rather than the perceived location of a sound source. Therefore, the popularly held notion that the MMN is a marker for the perceptual processing of auditory cues, including cues for spatial hearing, may have to be revised.
References
Hancock KE, Delgutte B (2004) A physiologically based model of interaural time difference discrimination. J Neurosci 24:7110–7117
Jenkins WM, Merzenich MM (1984) Role of cat primary auditory cortex for sound-localization behavior. J Neurophysiol 52:819–847