Logarithmic Scaling of Interaural Cross Correlation


predicted r-JND increases with the width of the histograms of the model’s output, i.e., the variance of rw. Because the variance of rw is inversely proportional to the duration of the temporal window and inversely proportional to the bandwidth of peripheral filter outputs, we conclude that the r-JND at rref = 0 is a consequence of binaural sluggishness and (for stimulus bandwidths 1 ERB) of monaural peripheral filter bandwidth. An unrealistic improvement of the r-JND with increasing peripheral filter bandwidths at higher frequencies is prevented by the hair cell transformation.
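The link between the spread of rw and the window duration and bandwidth can be illustrated with a toy Monte Carlo sketch. It is not the model described here: it assumes that the number of statistically independent samples inside the temporal window grows roughly in proportion to window duration times filter bandwidth, and it replaces the peripheral stages with uncorrelated Gaussian samples. All function names are hypothetical.

```python
import math
import random
import statistics

def sample_correlation(n, rng):
    """Pearson correlation of n independent pairs of uncorrelated Gaussian samples."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def variance_of_r(n, trials=2000, seed=1):
    """Variance of the correlation estimate over many simulated windows."""
    rng = random.Random(seed)
    return statistics.pvariance([sample_correlation(n, rng) for _ in range(trials)])

# n stands in for (window duration x bandwidth); doubling it roughly halves the variance.
for n in (20, 40, 80):
    print(n, round(variance_of_r(n), 4), round(1.0 / (n - 1), 4))
```

For a true correlation of zero the variance of the estimate is close to 1/(n − 1), so under this assumption a longer window or a wider filter narrows the histogram in the same way.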

Accordingly, it is very likely that the model will provide (at least qualitatively) correct predictions for the effects of stimulus duration and bandwidth on IAC discrimination. Furthermore, since the model does not need any normalization of levels, its r-JND will increase if an additional interaural level difference is applied to the stimulus. In contrast, any normalization-based model would be insensitive to ILDs. However, the model has not been tested quantitatively for such stimuli yet.

The window duration of the model’s feature extraction mechanism (T90 = 25 ms) is much shorter than the direct T90 estimate of 120 ms from our own psychoacoustical data (analytically derived from Eq. 4). This apparent contradiction can be resolved by the hypothesis that two time windows contribute to the overall sluggishness of the whole perceptual process. The first window (25 ms) refers to the mechanism of feature extraction only. The hypothetical second time window characterizes the sluggishness of a presumably cortical mechanism which sorts the values of rw(t) into the bins of an ‘internal histogram’. Note that, for stimuli with static IAC, the second time window would have no effect at all on the shape (or width) of internal histograms and the corresponding psychometric functions for IAC discrimination.

The hypothesis that two time constants are required to account for the sluggishness of binaural perception has also been proposed by, e.g., Bernstein et al. (2001) for psychoacoustical data and by Dajani and Picton (2006) for electrophysiological data.

In summary, the proposed model quantitatively explains the dependency of the r-JND on rref as a consequence of binaural sluggishness, monaural filter bandwidth and hair cell transformation. The thresholds predicted by the model are compatible with the literature and are roughly constant on the dB(N0/Nπ) scale, where they amount to about 4 dB. The model avoids normalization, and its components are computationally simple and neurophysiologically plausible (e.g., EE and IE neurons for the additive and subtractive mechanism, respectively). Changes of the model’s output due to IAC transitions in the stimulus are linearly related to the amplitudes of the LAEP to these stimuli. For these reasons we believe that the model structure provides an adequate description of IAC processing by perceptual and neurophysiological means.

Acknowledgements. This research has been supported by the Deutsche Forschungsgemeinschaft.


H. Lüddemann et al.

References

Akeroyd MA, Summerfield AQ (1999) A binaural analog of gap detection. J Acoust Soc Am 105(5):2807–2820

Bernstein LR, Trahiotis C, Akeroyd MA, Hartung K (2001) Sensitivity to brief changes of interaural time and interaural intensity. J Acoust Soc Am 109(4):1604–1615

Boehnke SE, Hall SE, Marquardt T (2002) Detection of static and dynamic changes in interaural correlation. J Acoust Soc Am 112(4):1617–1626

Breebaart J, Kohlrausch A (2001) The influence of interaural stimulus uncertainty on binaural signal detection. J Acoust Soc Am 109(1):331–345

Culling JF, Summerfield Q (1998) Measurements of the binaural temporal window using a detection task. J Acoust Soc Am 103(6):3540–3553

Culling JF, Colburn HS, Spurchise M (2001) Interaural correlation sensitivity. J Acoust Soc Am 110(2):1020–1029

Dajani H, Picton T (2006) Human auditory steady-state responses to changes in interaural correlation. Hear Res 219(1/2):85–100

Gabriel KJ, Colburn HS (1981) Interaural correlation discrimination: I. Bandwidth and level dependence. J Acoust Soc Am 69(5):1394–1401

Kollmeier B, Gilkey RH (1990) Binaural forward and backward masking: evidence for sluggishness in binaural detection. J Acoust Soc Am 87(4):1709–1719

Kollmeier B, Holube I (1992) Auditory filter bandwidths in binaural and monaural listening conditions. J Acoust Soc Am 92(4.1):1889–1901

Pollack I, Trittipoe W (1959) Interaural noise correlation: examination of variables. J Acoust Soc Am 31(12):1616–1618

van der Heijden M, Trahiotis C (1997) A new way to account for binaural detection as a function of interaural noise correlation. J Acoust Soc Am 101(2):1019–1022

van der Heijden M, Trahiotis C (1998) Binaural detection as a function of interaural correlation and bandwidth of masking noise: implications for estimates of spectral resolution. J Acoust Soc Am 103(3):1609–1614

van de Par S, Trahiotis C, Bernstein LR (2001) A consideration of the normalization that is typically included in correlation-based models of binaural detection. J Acoust Soc Am 109(2):830–833

Comment by Carlyon

I don’t think you can conclude that the linear relationship between LAEP amplitudes and the ‘dB(C/A) scaled IAC’ is evidence that sensitivity to changes in IAC is based on that log-ratio measure. This is because you do not have evidence that the linear LAEP amplitude is the appropriate decision statistic. For example, you could have calculated the log (or some other transform) of the LAEP amplitude, and we don’t know whether one transform is better or worse than any other (or none). Depending on the transform used, you would need to change your log-scaled ratio to some other measure in order to maintain a linear relationship. To determine what scale the LAEP should be expressed in, one would have to have measures not only of the LAEP mean amplitude, but also of its standard deviation. One could then use a scale where the standard deviation is constant across the range of LAEP amplitudes measured. This might be a linear scale, but it might not.


Reply

Indeed the variance over epochs in our EEG data does not depend on the IAC or the IAC transition and is roughly the same at all latencies, in particular, it does not increase at latencies corresponding to the N1-/P2-response. Since the dB(N0/Nπ) transform is the Fisher-Z-transform of the linear normalized IAC, also the variance of instantaneous dB(N0/Nπ)−IAC over time, as computed at the output of a moving temporal window, is the same for all our stimuli. Following your argument, one could easily interpret the constancy of variance of both, the EEG and the stimulus properties, in the sense that equal differences in LAEP amplitude correspond to equal discriminability by means of d-prime. However, such an interpretation would be invalid for the following reason.

In EEG recordings it is impossible to observe the activity of only those parts of the brain which are involved in binaural processing separate from other parts of the brain. Instead one always measures a far-field superposition of a comparatively tiny stimulus related brain response S (we found maximum LAEP peak amplitudes of 3–4 µV in the average over epochs) and spontaneous brain activity N which is much higher in amplitude than S and is considered as noise (with an RMS-value of about 10 µV in the filtered data before averaging).

Because the stimulus related binaural response (and in particular its possible variance) is much weaker than the spontaneous brain activity, it is not possible to separate the contributions of S and N to the EEG or its overall variance. Therefore the statistical approach described above is unfortunately not practicable for the analysis of EEG data from the auditory cortex.

Additionally, using methods similar to Shackleton et al. (2003, 2005), we computed the area below the ROC-curve corresponding to the amplitude statistics over 1000 epochs of EEG data elicited by stimuli with either the reference IAC or the deviant IAC. Even for the largest differences in IAC the ROC area (probability for correct discrimination) did not exceed 0.6, although the corresponding IAC transitions were clearly detectable in psychoacoustics (i.e., psychoacoustical ROC area close to 1).
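A rough sketch of such an ROC-area computation is given below. It is not the analysis code used for the data: it assumes that the single-epoch amplitude at one latency is the decision variable, that the spontaneous activity is Gaussian, and that the stimulus-related response only shifts the mean. The amplitudes (3.5 µV response, 10 µV RMS noise) are hypothetical values in the range quoted above, and the function names are illustrative.

```python
import random
from statistics import NormalDist

def roc_area(reference, deviant):
    """Empirical ROC area: probability that a deviant sample exceeds a reference
    sample, with ties counted as one half."""
    wins = 0.0
    for d in deviant:
        for r in reference:
            if d > r:
                wins += 1.0
            elif d == r:
                wins += 0.5
    return wins / (len(reference) * len(deviant))

def gaussian_roc_area(shift, sigma):
    """ROC area for two equal-variance Gaussians whose means differ by `shift`."""
    d_prime = shift / sigma
    return NormalDist().cdf(d_prime / 2 ** 0.5)

# Hypothetical single-epoch amplitudes in microvolts (assumed values).
rng = random.Random(0)
noise_rms_uv = 10.0
response_uv = 3.5
reference = [rng.gauss(0.0, noise_rms_uv) for _ in range(1000)]
deviant = [rng.gauss(response_uv, noise_rms_uv) for _ in range(1000)]

print(round(roc_area(reference, deviant), 3))
print(round(gaussian_roc_area(response_uv, noise_rms_uv), 3))
```

Under these assumptions a response that is small relative to the background noise yields an ROC area only slightly above chance, even when the averaged response is clearly visible.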

References

Shackleton TM, Skottun BC, Arnott RH, Palmer AR (2003) Interaural time difference discrimination thresholds for single neurons in the inferior colliculus of guinea pigs. J Neurosci 23(2):716–724

Shackleton TM, Arnott RH, Palmer AR (2005) Sensitivity to interaural correlation of single neurons in the inferior colliculus of guinea pigs. J Assoc Res Otolaryngol 6(3):244–259

Comment by Lütkenhöner

The proposed transformation from the interaural correlation ρ to the dB-scaled ratio of diotic and antiphasic noise components, ρlog, is appealing. However, the question arises how to interpret ρlog in physiological terms. While the quantities ρ and ρlog are more or less proportional for ρ < 0.5 (roughly corresponding to ρlog < 5 dB), ρlog becomes infinite for ρ → 1. The following consideration might help to understand this kind of singularity. The case ρ → −1 is related to the detection of a faint diotic noise, N0, against an antiphasic noise background, Nπ. If N0 is formally considered as the signal and Nπ as the noise, ρlog may be interpreted as the signal-to-noise ratio in dB. Correspondingly, −ρlog may be interpreted as the signal-to-noise ratio related to the detection of a faint antiphasic noise against a diotic noise background. This consideration suggests that the JNDs for |ρ| ≈ 1 might be of a quite different nature than the JNDs for |ρ| ≈ 0 (detection threshold versus discrimination threshold).
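The behaviour described in this comment can be checked numerically with a small sketch. It assumes that ρlog is defined as 10·log10((1 + ρ)/(1 − ρ)), i.e., the power ratio of the diotic to the antiphasic component of a noise with correlation ρ; this form is inferred from the discussion rather than quoted, and the function names are hypothetical.

```python
import math

def rho_log_db(rho):
    """Assumed dB(N0/Npi) transform of a correlation rho with |rho| < 1."""
    return 10.0 * math.log10((1.0 + rho) / (1.0 - rho))

def fisher_z(rho):
    """Fisher z-transform of a correlation coefficient."""
    return math.atanh(rho)

# Under the assumed definition, rho_log_db(rho) equals scale * fisher_z(rho).
scale = 20.0 / math.log(10.0)

for rho in (0.1, 0.3, 0.5, 0.9, 0.99):
    # Columns: rho, transformed value in dB, small-rho linear approximation in dB.
    print(rho, round(rho_log_db(rho), 2), round(scale * rho, 2))
```

For small ρ the transform stays close to the straight line scale·ρ, which is the near-proportionality mentioned above, and it grows without bound as ρ approaches ±1.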

Reply

It is correct that the basic dB(N0/Nπ)-transform as given in our article becomes infinite for values of ρ = +1 or −1. On the physiological level, however, such infinite values will never occur due to irregularities of neural processing. In our model this is simulated by adding uncorrelated noise Nu to the dichotic input signal S at a signal-to-noise ratio (SNR) of 15 dB, which is assumed to be independent of the input signal’s IAC. In a model without hair cell transform, the effective “internal IAC” of the mixture (S + Nu/SNR) is then $\rho\,\mathrm{SNR}^2/(1+\mathrm{SNR}^2)$. The corresponding transformed value is $10\log_{10}\left[(1+\rho+1/\mathrm{SNR}^2)/(1-\rho+1/\mathrm{SNR}^2)\right]$. Thus, for an SNR of 5.6 (= 15 dB) the internal IAC ranges between −18 and +18 dB(N0/Nπ).
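The two expressions in this reply can be evaluated directly. The sketch below treats the SNR as an amplitude ratio (so that 15 dB corresponds to about 5.6, as stated) and uses hypothetical function names; it is a numerical check of the quoted formulas, not part of the model code.

```python
import math

def internal_iac(rho, snr):
    """Effective correlation after adding uncorrelated noise at amplitude ratio snr."""
    return rho * snr ** 2 / (1.0 + snr ** 2)

def internal_iac_db(rho, snr):
    """Transformed internal IAC in dB(N0/Npi), following the expression in the reply."""
    inv = 1.0 / snr ** 2
    return 10.0 * math.log10((1.0 + rho + inv) / (1.0 - rho + inv))

snr = 10.0 ** (15.0 / 20.0)  # 15 dB expressed as an amplitude ratio, about 5.6

for rho in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(rho, round(internal_iac(rho, snr), 3), round(internal_iac_db(rho, snr), 2))
```

At ρ = ±1 the transformed value saturates near ±18 dB instead of diverging, in line with the range given above.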

Durlach et al. (1986), Koehnke et al. (1986), Culling et al. (2001) and Boehnke et al. (2002) suggested that the BMLD and IAC-JNDs can be explained by similar mechanisms. Our model can explain both kinds of experiments, e.g., BMLD data by van der Heijden and Trahiotis (1998) and by Breebaart and Kohlrausch (2001). However, Culling et al. (2001) described how different cues are used depending on masker bandwidth: In broadband conditions the signal is detected as a tone in the background noise. In contrast, a signal in narrowband noise causes a percept of spatial movement of the whole stimulus. Accordingly, whether listeners use a discrimination or a detection strategy could depend on the stimulus bandwidth rather than on the masker’s IAC.

References

Durlach NI, Gabriel KJ, Colburn HS, Trahiotis C (1986) Interaural correlation discrimination: II. Relation to binaural unmasking. J Acoust Soc Am 79(5):1548–1557

Koehnke J, Colburn HS, Durlach NI (1986) Performance in several binaural-interaction experiments. J Acoust Soc Am 79(5):1558–1562

42 A Physiologically-Based Population Rate Code for Interaural Time Differences (ITDs) Predicts Bandwidth-Dependent Lateralization

KENNETH E. HANCOCK1,2

1 Introduction

Interaural time differences (ITDs) are the most important cue to the location of sounds containing low-frequency energy (Wightman and Kistler 1992). ITDs are encoded centrally in the medial (MSO) and lateral (LSO) superior olives, which transmit the code to the inferior colliculus (IC) (Batra et al. 1997; Goldberg and Brown 1969). Each ITD-sensitive neuron is characterized by its best ITD (BD), the one producing maximal discharge rate. It is a longstanding view that these neurons are conceptually arranged in an array with best frequency (BF) on one axis and BD on the other to form a labeled-line code. According to this model, the stimulus ITD corresponds to the BD (i.e. the label) of the most active neuron in the array (Jeffress 1948).

The labeled-line model is challenged by physiological data from guinea pig (confirmed in cat and gerbil) showing that the distribution of BD is highly dependent on BF, and in general does not correspond to the range of naturally-occurring ITDs (Brand et al. 2002; Hancock and Delgutte 2004; McAlpine et al. 2001). Instead, best interaural phase (BP = BD × BF) is more nearly independent of BF, such that the steepest slopes of neural rate-ITD curves tend to occur near the midline. Because the slopes, not the peaks, align near the midline (where perceptual ITD acuity is finest), it has been suggested that ITD is encoded by the discharge rate itself rather than by the locus of maximal activity (McAlpine et al. 2001). Thus, ITD may be represented by a population rate code, in which the activity of many neurons pools to form monolithic ITD channels on each side of the brain, and the stimulus ITD may be inferred by comparing the relative activity of the two channels (van Bergeijk 1962; von Békésy 1960).

Though the physiological data suggest the existence of a rate code, analysis of its viability has barely begun (Marquardt and McAlpine 2001). Here, we demonstrate that a population rate code model can account for the dependence of perceived laterality on stimulus bandwidth.

1 Eaton-Peabody Laboratory, Massachusetts Eye & Ear Infirmary, Boston MA USA, Ken_Hancock@meei.harvard.edu

2 Department of Otology and Laryngology, Harvard Medical School, Boston MA USA

Hearing – From Sensory Processing to Perception

B. Kollmeier, G. Klump, V. Hohmann, U. Langemann, M. Mauermann, S. Uppenkamp, and J. Verhey (Eds.) © Springer-Verlag Berlin Heidelberg 2007


K.E. Hancock

2 Methods

Model neurons are arranged into four arrays, one representing each MSO and LSO (Fig. 1C). Individual neurons are modeled by the cross-correlation operation depicted in Fig. 1A. The sounds at each ear are filtered using identical gammatone filters (center frequency CF and time constant t). The contralateral filter output is both delayed and phase-shifted (CD and CP, respectively), then multiplied by the ipsilateral filter output. The cross-correlator output is converted to a firing rate by a quadratic function with coefficients A and B.

The single neuron model is thus specified by the six parameters {CF, t, CD, CP, A, B}, whose values were previously constrained by fitting the model to cat IC data (Hancock and Delgutte 2004). For all model neurons, the coefficients A and B were assigned the mean physiological values. The filter time constant varies inversely with CF according to t = Q/CF, where Q = 0.3. The remaining parameters were assigned as described below.

Fig. 1 Population rate model of ITD coding: A model for ITD-sensitive neurons. Acoustic inputs are bandpass-filtered, then cross-correlated after applying delay (CD) and phase shift (CP) to one side. Quadratic function converts cross-correlation to neural firing rate; B broadband noise rate-ITD curves for two model neurons. Solid line, MSO neuron (CP = 0). Dashed line, LSO neuron (CP = 0.5 cycles); C model neurons are grouped into four arrays, representing each MSO and LSO. CF is distributed along one dimension of array, best phase (BP) along the other. Output of each array is the sum of its neural rates; D array outputs as a function of ITD in response to broadband noise

We have made the simplest possible assumption that MSO and LSO have similar CF and BP distributions, and differ primarily in characteristic phase. The CP was set to zero for all neurons in the MSO arrays, and set to 0.5 cycles in the LSO arrays. The CF parameter was represented along one dimension of each array according to a log-normal distribution with a mean of about 600 Hz, and ranging from 50 Hz to 1500 Hz. The best phase BP was represented along the other dimension following a Gaussian distribution with mean and standard deviation each equal to 0.3 cycles. Each value of BP was used to assign the characteristic delay: CD = BP/CF.
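The parameter assignment just described might be sketched as follows. This is only an illustration: it draws CF and BP at random rather than laying them out along the two array dimensions, it interprets the 600 Hz value as the centre of the log-normal distribution, and the log-scale spread `sigma_log`, the array size and all names are hypothetical values not given in the text.

```python
import math
import random

def draw_cf(rng, center_hz=600.0, sigma_log=0.5, lo_hz=50.0, hi_hz=1500.0):
    """Draw a CF from a log-normal distribution restricted to [lo_hz, hi_hz].
    sigma_log is an assumed spread; the text does not specify it."""
    while True:
        cf = rng.lognormvariate(math.log(center_hz), sigma_log)
        if lo_hz <= cf <= hi_hz:
            return cf

def make_array(n, cp_cycles, seed):
    """Build n model neurons sharing one characteristic phase CP."""
    rng = random.Random(seed)
    neurons = []
    for _ in range(n):
        cf = draw_cf(rng)
        bp = rng.gauss(0.3, 0.3)  # best phase in cycles (mean and SD of 0.3)
        neurons.append({"CF": cf, "BP": bp, "CD": bp / cf, "CP": cp_cycles})
    return neurons

# The same seed gives both nuclei identical CF and BP values; only CP differs.
mso = make_array(200, 0.0, seed=1)
lso = make_array(200, 0.5, seed=1)
print(mso[0])
print(lso[0])
```

Reusing the seed is one simple way to express the assumption that MSO and LSO share their CF and BP distributions.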

Responses of two model neurons to broadband noise varied in ITD are shown in Fig. 1B. For both neurons, CF = 650 Hz and BP = 0.2 cycles (CD = 308 µs). The solid line represents the response of a model MSO neuron (CP = 0), and shows a peak firing rate at CD. The dashed line represents a model LSO neuron (CP = 0.5 cycles), and exhibits a null at CD.
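The peak and the null at CD follow from the phase shift alone, which a minimal sketch can show. It approximates the cross-correlator output for broadband noise as a damped cosine centred on CD; the exponential envelope and its decay constant `decay_s` are placeholders chosen for illustration, the quadratic rate conversion is omitted, and the function name is hypothetical.

```python
import math

def correlator_output(itd_s, cf_hz, cd_s, cp_cycles, decay_s=2e-3):
    """Damped-cosine stand-in for the normalized cross-correlator output.
    The envelope shape and decay_s are assumptions, not model parameters."""
    lag = itd_s - cd_s
    envelope = math.exp(-abs(lag) / decay_s)
    return envelope * math.cos(2.0 * math.pi * (cf_hz * lag + cp_cycles))

cf_hz = 650.0
bp_cycles = 0.2
cd_s = bp_cycles / cf_hz  # characteristic delay CD = BP/CF
print(round(cd_s * 1e6, 1), "microseconds")

# Evaluate both neuron types on an ITD grid of +/- 2 ms around CD in 10 us steps.
grid = [cd_s + k * 1e-5 for k in range(-200, 201)]
mso_curve = [correlator_output(t, cf_hz, cd_s, 0.0) for t in grid]
lso_curve = [correlator_output(t, cf_hz, cd_s, 0.5) for t in grid]
print(grid[mso_curve.index(max(mso_curve))] == cd_s)  # MSO-type peak at CD
print(grid[lso_curve.index(min(lso_curve))] == cd_s)  # LSO-type null at CD
```

With CP = 0 the cosine is at its maximum when the ITD equals CD, while a shift of 0.5 cycles inverts it, so the same ITD produces the minimum.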

The output of each array is the sum of its individual firing rates. The outputs are illustrated in Fig. 1D as functions of ITD for broadband noise stimulation. Each MSO is maximally active when the stimulus is in the contralateral hemisphere, while each LSO is most strongly activated by ipsilateral stimulation.

3 Results

3.1 Summary of Psychophysical Results to be Modeled

This section summarizes psychophysical data which illustrate a dependence of laterality on stimulus bandwidth (Trahiotis and Stern 1994), and which represent a nontrivial test of lateralization in the model. The stimulus is bandpass noise centered at 500 Hz, with ITD=1.5 ms. Figure 2A shows the pattern of activity produced by this stimulus in the BF-BD plane. The full display consists of a series of peaks and valleys, but for clarity we show only the two peaks closest to the midline (solid black lines). The straight contour is at 1.5 ms, corresponding to the ITD of the stimulus. The secondary contour is separated from the main contour by the CF period, and hence is curved.

When the stimulus is narrowband (dark gray shading), it is perceived on the left. Trahiotis and Stern (1994) explained this percept as “centrality” dominated, because it favors the secondary peak in the BF-BD plane which occurs nearer the midline. In contrast, broadband stimuli (light gray shading) are perceived on the right side. This was described as “straightness” dominated, because it favors the peak for which the ITD is consistent across CF.
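The geometry of the two contours for IPD = 0° can be reproduced with a few lines. The sketch assumes that positive ITDs denote the right side and that the k-th peak in a channel tuned to CF lies k periods of CF away from the stimulus ITD; the function names are hypothetical.

```python
def peak_itd_ms(cf_hz, k, stimulus_itd_ms=1.5):
    """ITD (ms) of the k-th activity peak in the channel tuned to cf_hz, for IPD = 0.
    k = 0 is the main peak, k = 1 the secondary peak one CF period closer to midline."""
    return stimulus_itd_ms - k * 1000.0 / cf_hz

def contour_spread_ms(k, low_hz, high_hz, stimulus_itd_ms=1.5):
    """ITD range covered by the k-th contour across a band (0 means straight)."""
    return abs(peak_itd_ms(high_hz, k, stimulus_itd_ms)
               - peak_itd_ms(low_hz, k, stimulus_itd_ms))

for cf in (400.0, 500.0, 600.0):
    print(cf, peak_itd_ms(cf, 0), round(peak_itd_ms(cf, 1), 3))

# Narrow versus wide band around 500 Hz (band edges are illustrative values).
print(contour_spread_ms(1, 450.0, 550.0), contour_spread_ms(1, 300.0, 700.0))
```

At 500 Hz the secondary peak sits at −0.5 ms, closer to the midline than the main peak at 1.5 ms, which is the centrality advantage; the main contour has zero spread for any bandwidth, whereas the spread of the secondary contour grows as the band widens, which is the straightness advantage.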

Laterality also depends on the stimulus interaural phase difference (IPD). The dashed lines in Fig. 2A show the contours corresponding to a 270° phase shift. The ITD was adjusted so that the contours always passed through the center frequency at constant ITD values. Shifting the phase straightens the left contour and curves the right one. In the broadband condition, this stimulus is perceived on the left because that contour is favored by both straightness and centrality.


Fig. 2 A Contours of peak activity in CF-BD plane produced by noise with ITD=1.5 ms. Solid lines, IPD=0°. Dashed lines, IPD=270°. B Image heard to the left for narrow bandwidths and/or large IPDs. Image heard to the right for wide bandwidths

3.2 Model

The dependence of lateralization on bandwidth and IPD can be explained using models that incorporate both straightness- and centrality-weighting (Stern et al. 1988). Weighting that reflects the overall tendency for physiological BD values to occur within the naturally-occurring ITD range is the most straightforward realization of centrality (Shackleton et al. 1992; Stern et al. 1988), and is an explicit property of the model described in this chapter. One method of implementing straightness-weighting is to integrate across BF along constant values of BD (Shackleton et al. 1992). This fundamentally requires a labeled-line representation because the inputs to the integration stage must be segregated according to BD. We show here that the bandwidth-dependent lateralization data can also be simulated using a simple population rate code model, without explicit straightness-weighting and the resulting need for labeled lines.

Fig. 3 A Individual channel responses to noise (ITD = 1.5 ms, IPD = 0°) vs bandwidth. B Left and right components of model position estimates. C Model fit to psychophysical data

Figure 3A shows the output of each of the four channels as a function of bandwidth (for IPD = 0°). We consider first a two-channel model comprising only MSO activity, and argue that it is insufficient to predict the psychophysical data. We assume that the position estimate is simply a vector sum of the MSO rates, and note that activity in one MSO corresponds to an image position in the contralateral hemifield. As bandwidth gets larger, the activity in the right MSO (RMSO) decreases (solid gray line) while the activity in the left MSO (LMSO) increases (solid black line). This correctly predicts that the lateral position moves rightward with increasing bandwidth. But the image can never actually cross to the right of the midline because RMSO is always more active than LMSO. This reflects the fact that the secondary peak produced by the stimulus is always more central than the main peak.

A four-channel model incorporating both MSO and LSO, however, can account for both the trends and magnitudes of the psychophysical data. Lateral position estimates were generated by linear combination of the channel outputs:

$$P = a\,(\mathrm{LMSO} - \mathrm{RMSO}) + b\,(\mathrm{LLSO} - \mathrm{RLSO}) \tag{1}$$

The parameters $a = 1.55 \times 10^{-3}$ and $b = 1.45 \times 10^{-3}$ were chosen to minimize the sum of squared error between the model position estimates and the psychophysical data. The resulting model fit (Fig. 3C) agrees well with the data (Fig. 2B), for the reasons discussed below.
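A least-squares fit of the two weights can be sketched by solving the 2 × 2 normal equations. The sketch assumes that the position estimate is a weighted sum of the left-minus-right MSO and LSO rate differences with positive values toward the right; the channel outputs below are made-up numbers used only to check that the fit recovers known weights, and the function names are hypothetical.

```python
def position(lmso, rmso, llso, rlso, a, b):
    """Lateral position estimate as a weighted sum of channel differences
    (assumed sign convention: positive toward the right)."""
    return a * (lmso - rmso) + b * (llso - rlso)

def fit_weights(channels, targets):
    """Least-squares estimate of (a, b) from channel outputs and target positions.
    channels: list of (LMSO, RMSO, LLSO, RLSO) tuples; targets: list of positions."""
    x = [lm - rm for lm, rm, _, _ in channels]  # MSO difference
    y = [ll - rl for _, _, ll, rl in channels]  # LSO difference
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(u * v for u, v in zip(x, y))
    sxp = sum(u * p for u, p in zip(x, targets))
    syp = sum(v * p for v, p in zip(y, targets))
    det = sxx * syy - sxy * sxy
    return (sxp * syy - sxy * syp) / det, (sxx * syp - sxy * sxp) / det

# Made-up channel outputs (spikes/s), one tuple per stimulus condition.
channels = [(100.0, 300.0, 400.0, 100.0), (150.0, 280.0, 420.0, 120.0),
            (220.0, 260.0, 450.0, 150.0), (250.0, 250.0, 300.0, 300.0)]
targets = [position(*c, a=1.55e-3, b=1.45e-3) for c in channels]
print(fit_weights(channels, targets))
```

With real data the targets would be the measured lateral positions, and the residual error of the fit would not be zero.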


The effect of incorporating LSO channels into the model is illustrated in Fig. 3B. The position estimates derived from the channel outputs of Fig. 3A are shown decomposed into left and right components:

$$P_R = a\,\mathrm{LMSO} - b\,\mathrm{RLSO}, \qquad P_L = a\,\mathrm{RMSO} - b\,\mathrm{LLSO} \tag{2}$$

Including LSO channels preserves the essential trend exhibited by the MSO outputs (position increasingly favors the right as bandwidth increases), but shifts the balance such that the position estimate crosses the midline and attains realistic magnitudes. Because the main peak occurs outside the natural range of ITD, it evokes unnaturally large activity in LLSO. In this model, it is heightened LLSO activity, rather than straightness, that trades with the centrality manifested in the RMSO activity to shift the image across the midline.
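The relation between the two components and the overall estimate can be written out explicitly. The short derivation below assumes that each component subtracts the weighted LSO rate from the weighted MSO rate of the opposite side, and that the position estimate is the right component minus the left component; both are readings of the decomposition rather than statements quoted from it.

```latex
\begin{align}
P_R - P_L &= \left(a\,\mathrm{LMSO} - b\,\mathrm{RLSO}\right)
           - \left(a\,\mathrm{RMSO} - b\,\mathrm{LLSO}\right) \\
          &= a\,(\mathrm{LMSO} - \mathrm{RMSO}) + b\,(\mathrm{LLSO} - \mathrm{RLSO}) = P
\end{align}
```

In this form a large LLSO rate lowers the left component, which is how heightened LLSO activity can offset the RMSO activity and move the estimate to the right of the midline.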

As IPD increases from 0° to 270°, the main stimulus peak curves away from the array of BDs comprising the LMSO channel. At the same time, the secondary peak straightens, becoming more closely aligned with the RMSO channel. Consequently, RMSO activity increases with respect to LMSO activity, and the image position shifts to the left. As discussed in Sect. 3.1, this is purely a reflection of centrality.

4 Discussion

4.1 Advantages to a Code Without Labeled Lines

Neural best ITDs are determined by several factors, including axonal propagation delays, inhibition, and perhaps peripheral mechanical delays due to interaural CF mismatches (Beckius et al. 1999; Brand et al. 2002; Joris et al. 2004). A labeled-line code demands a stable and relatively precise distribution of best ITDs from these combined factors. In contrast, the population rate model requires only a hemifield bias in the best ITD distribution of each channel. The details of the distribution are not necessarily important, especially if the ITD processor is part of a larger feedback control system that guides orienting movements by restoring balance among the outputs of the four sensory channels.

4.2 A Motor Interpretation of the Population Rate Code

Rotation of the head about the vertical axis involves four muscles, two on each side of the head. Rotation to the right is accomplished primarily by the sternocleidomastoid (SCM) muscle on the left side (i.e. contralateral to