
J.F. Culling and B.A. Edmonds

From the standpoint of loudness research, the method employed represents an improvement over those used by Culling et al. (2001, 2003), because those studies focussed listeners’ attention on the loudness dimension only through instruction. Those designs were therefore open to the criticism that listeners might have responded to some other stimulus dimension, particularly the image width. In the present study, the listener was required to match stimuli in loudness using an intensity offset. Sound intensity has negligible influence on image width, and so would not provide the listeners with an alternative means of performing the task. The design also conforms to the methods used in the literature to measure binaural summation. There, the method is known as “binaural-monaural loudness matching” (Reynolds and Stevens 1960). The binaural summation effect reported by Reynolds and Stevens using this and other methods was also replicated.

Given that the results genuinely reflect an influence of interaural correlation on loudness, one may ask why Dubrovskii and Chernyak did not observe it in their experiments. The answer to this question probably lies in the effect of stimulus bandwidth. Dubrovskii and Chernyak used full spectrum white noise, only limited by the frequency response of their headphones (Russian TD-6 audiological headphones), which they stated had a high frequency cut-off at around 5 kHz. According to measurements by Robinson (1971), this is a rather conservative limit for these headphones. In the experiments reported here, the effect of interaural correlation is about 2 dB at bandwidths of 460–540 Hz and 100–900 Hz, but reduces to 0.8 dB (non-significant) at a bandwidth of 100–5000 Hz. Since the first two cases are both limited to frequencies at which binaural unmasking is effective, while the third extends to much higher frequencies, it is tempting to suppose that the substantial elevation of loudness at low frequencies is diluted by the presence of higher frequencies at which there is little effect. Further experiments will be needed to establish that the effect of interaural correlation is absent at higher frequencies using this paradigm. Certainly, the figure of 0.8 dB is close to the theoretical prediction of about 0.65 dB that one would expect from a linear dilution effect. Since Dubrovskii and Chernyak’s stimuli probably had an even broader spectrum, it is not surprising that they did not observe an effect.

The results support an interpretation of Zwicker and Zwicker (1991) in terms of interaural correlation. There are two aspects that seem to correspond well. First, they found that the overall increase in loudness produced by alternating a noise between the ears was insufficient to make the stimuli louder than continuous diotic noise at the same spectrum level. Thus, it was equivalent to an increase in intensity of something a little less than 3 dB. This corresponds well with the 2 dB observed here. Although Zwicker and Zwicker used a broad band of noise, which might have been susceptible to the dilution effect discussed above, it contained equal energy in each critical band, and so was spectrally tilted towards the low-frequency region where binaural effects are strongest. Second, as noted by Zwicker and Zwicker, the lowest of the alternation rates at which they observed increased loudness, 7 cycles/s, creates a situation in which at least one interaural transition occurs within any 100-ms window, corresponding with estimates of binaural temporal resolution (Grantham and Wightman 1978; Culling and Summerfield 1998).

Culling et al. (2001, 2003) provide a theoretical explanation for the effect of interaural correlation on loudness. They drew upon the work of Osman (1971) and Durlach et al. (1986), which suggested that the mechanism of binaural unmasking is sensitive to deviations in interaural correlation from unity. According to these ideas, during binaural unmasking, the presence of a signal with different interaural phase from the noise reduces the interaural correlation of the stimulus at the signal frequency. If, instead, the correlation is directly manipulated at a given frequency, an illusory experience of the signal is created. Extending this idea, Culling et al. suggested that beyond the point of detection, further decreases in correlation should be interpreted by the binaural system as increases in the relative intensity of the signal and hence result in progressive increases in the perceived loudness of the illusory signal. The present study shows that these increases in loudness also occur when the correlation of the whole stimulus is altered.

Loudness was found to be lower for the anticorrelated stimuli compared to the uncorrelated stimuli. This result is consistent with previous observations (Culling et al. 2003). The phenomenon may be explained with recourse to the idea that the mechanism of binaural unmasking operates independently in each frequency channel (Culling and Summerfield 1995). In order to recover signals from noise in any direction, the binaural system is thought to apply a compensating internal delay to the stimuli at each ear before assaying the correlation (e.g. Durlach 1972). The overall process is thus similar to measurement of the coherence. If the process operates independently in each frequency channel, anticorrelated noise will be interpreted as having a high coherence in all channels, because within each frequency channel the π phase difference can be approximately compensated by a single internal delay; within an individual channel a π phase shift is equivalent to much the same delay at each of the narrow range of frequencies that the frequency channel contains. Across different frequency channels the measurement mechanism will thus apply a delay equivalent to half the period of the centre-frequency of each channel. After these delays, the stimuli from each ear will always be highly correlated in all channels and therefore will not have the same degree of enhanced loudness as an uncorrelated stimulus.
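The within-channel argument can be checked numerically. The following is a minimal sketch, not the authors' model: it assumes equal-amplitude, random-phase noise synthesized in the frequency domain, a rectangular 460–540 Hz band standing in for one frequency channel, and a circular delay of half the period of a 500-Hz centre frequency. The function names and the sampling rate are illustrative choices.

```python
import numpy as np

FS = 16000  # sampling rate in Hz; with a 1-s signal, FFT bin k is exactly k Hz


def band_noise_spectrum(f_lo, f_hi, rng):
    """Half-spectrum of 1 s of equal-amplitude, random-phase noise in [f_lo, f_hi] Hz."""
    spectrum = np.zeros(FS // 2 + 1, dtype=complex)
    bins = np.arange(f_lo, f_hi + 1)
    spectrum[bins] = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, bins.size))
    return spectrum


def correlation_after_delay(spectrum, delay_s):
    """Correlate the left signal with the inverted right signal after an internal delay."""
    freqs = np.fft.rfftfreq(FS, d=1.0 / FS)
    left = np.fft.irfft(spectrum, n=FS)
    # Right ear: polarity-inverted copy (pi phase shift at every frequency),
    # then delayed by delay_s (a circular delay applied in the frequency domain).
    right = np.fft.irfft(-spectrum * np.exp(-2j * np.pi * freqs * delay_s), n=FS)
    return np.corrcoef(left, right)[0, 1]


rng = np.random.default_rng(0)
half_period = 1.0 / (2.0 * 500.0)  # half the period of a 500-Hz centre frequency

narrow = band_noise_spectrum(460, 540, rng)   # stands in for a single channel
broad = band_noise_spectrum(100, 5000, rng)   # whole stimulus, no channel analysis

rho_narrow_raw = correlation_after_delay(narrow, 0.0)
rho_narrow_comp = correlation_after_delay(narrow, half_period)
rho_broad_comp = correlation_after_delay(broad, half_period)
print(rho_narrow_raw, rho_narrow_comp, rho_broad_comp)
```

Under these assumptions the narrow band is anticorrelated before the delay and highly correlated after it, whereas a single delay applied to the whole broadband stimulus leaves the correlation near zero. That contrast is why the compensation is argued to require independent processing in each frequency channel.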

Models of loudness are essentially monaural (Zwicker and Scharf 1965; Moore et al. 1999), including binaural processing only through the simple binaural summation process (Scharf and Fishken 1970). The evidence presented here suggests that some modification may be necessary in order to include the effect of interaural correlation at low frequencies.


Acknowledgement. Pilot data for these experiments, using the method of adjustment, were collected by Sonya Ginty in her final-year project. Work supported by UK EPSRC.

References

Culling JF, Summerfield Q (1995) Perceptual segregation of concurrent speech sounds: absence of across-frequency grouping by common interaural delay. J Acoust Soc Am 98:785–797

Culling JF, Summerfield Q (1998) Measurements of the binaural temporal window using a detection task. J Acoust Soc Am 103:3540–3553

Culling JF, Colburn HS, Spurchise M (2001) Interaural correlation sensitivity. J Acoust Soc Am 110:1020–1029

Culling JF, Hodder KI, Colburn HS (2003) Interaural correlation discrimination with spectrally remote flanking bands: constraints for models of binaural unmasking. Acta Acust united with Acustica 89:1049–1058

Dubrovskii NA, Chernyak RI (1969) Binaural summation under varying degrees of noise correlation. Sov Phys Acoust 14:326–332

Dubrovskii NA, Chernyak RI, Shapiro VM (1972) Binaural summation of differently correlated noises. Sov Phys Acoust 17:468–473

Durlach NI (1972) Binaural signal detection: equalization and cancellation theory. In: Tobias JV (ed) Foundations of modern auditory theory, vol 2. Academic Press, New York, pp 369–462

Durlach NI, Gabriel KJ, Colburn HS, Trahiotis C (1986) Interaural correlation discrimination: II. Relation to binaural unmasking. J Acoust Soc Am 78:1458–1557

Grantham DW, Wightman FL (1978) Detectability of varying interaural temporal differences. J Acoust Soc Am 63:511–523

Levitt H (1971) Transformed up-down methods in psychoacoustics. J Acoust Soc Am 49:467–477

Moore BCJ, Glasberg BR, Vickers DA (1999) Further evaluation of a model of loudness perception applied to cochlear hearing loss. J Acoust Soc Am 106:898–907

Osman E (1971) A correlation model of binaural masking level differences. J Acoust Soc Am 50:1494–1495

Reynolds GS, Stevens SS (1960) Binaural summation of loudness. J Acoust Soc Am 32:192–205

Robinson DW (1971) A review of audiometry. Phys Med Biol 16:1–24

Scharf B, Fishken D (1970) Binaural summation of loudness reconsidered. J Exp Psych 86:374–379

Zwicker E, Scharf B (1965) A model of loudness summation. Psych Rev 72:3–26

Zwicker E, Zwicker UT (1991) Dependence of binaural loudness summation on interaural level differences, spectral distribution, and temporal resolution. J Acoust Soc Am 89:758–764

Comment by Weber

When listening to your presentation I got the following idea:

What do we basically need if we want to perceive a complex tone from a certain place in space? We would need a set of resonators for the complex tone and a delay line for performing the cross-correlation to position the complex sound in space.

We assume that the basilar membrane gives us the basis to calculate the complex tone. But what about the delay line? We would perhaps have the tendency to construct a delay line out of RC-elements.

Interaural Correlation and Loudness


But nature might be more efficient. It might use the basilar membrane itself also as a delay line. The RC-elements might be a bit different to what we expect at first glance. But the basilar membrane might be at the same time the basis for frequency and correlation analysis.

So the basilar membrane would be regarded not only as an important part of a frequency analyser but also as a delay line for the calculation of the auditory space. When, e.g., using the HRTFs for special calculations, both functions – the frequency analysis and the correlation analysis – need to be processed in combination. This might be performed by the basilar membrane being the basic part for a frequency analyser and delay line at the same time.

And if the basilar membrane has to perform a twofold role as frequency analyser and delay line, the requirements of both functions then might determine its special construction and functioning.

Are there consequences for the understanding of the auditory signal processing?

As one example, one may speculate that the differences in loudness summation of correlated and uncorrelated noise might rely on the different spatial extents of the two noises. The uncorrelated noise is more spacious than the correlated one and it is perceived as being louder.

It may be that the summation of loudness of objects in space shows similar behaviour as the spectral summation in the frequency domain. But this can be checked.

Reply

The idea that the internal delays needed for binaural processing might have a cochlear origin was proposed by Shamma et al. (1989). However, evidence in favour of the idea has since proved elusive. The most recent evidence points towards a role for timed inhibition rather than a Jeffress-style delay network (Brand et al. 2002).

It is tempting to think that the effect of interaural correlation on loudness might be mediated by the stimulation of a larger number of “spatial channels.” As you point out, there is an analogous effect in the frequency domain, in which equal-energy bands of noise are perceived as louder if they extend beyond one critical band (Feldtkeller and Zwicker 1956, as cited in Scharf 1970). Given our modern understanding of cochlear non-linearity, it seems likely that the bandwidth effect found by Feldtkeller and Zwicker is mediated by the cochlea’s compressive input-output function; when all the acoustic energy is concentrated on a single frequency channel, the compression in that channel reduces the cochlea’s response relative to the situation where the energy is spread over many channels. I am not aware of any analogous mechanism that might be called upon to explain the effect of correlation on loudness, but I don’t think the idea can be ruled out.


References

Brand A, Behrend O, Marquardt T, McAlpine D, Grothe B (2002) Precise inhibition is essential for microsecond interaural time difference coding. Nature 417:543–547

Scharf B (1970) Critical bands. In: Tobias JV (ed) Foundations of modern auditory theory, vol 1. Academic Press, pp 159–202

Shamma SA, Shen N, Gopalaswamy P (1989) Stereausis: binaural processing without neural delays. J Acoust Soc Am 86:989–1006

40 Interaural Phase and Level Fluctuations as the Basis of Interaural Incoherence Detection

MATTHEW J. GOUPELL AND WILLIAM M. HARTMANN

1 Introduction

Interaural coherence is a measure of the similarity of signals in a listener’s two ears. It is derived from the interaural cross-correlation function, which is a function of the interaural lag. The peak of the cross-correlation function is of particular interest. The value of the lag for which the peak occurs is regarded as the relevant interaural time difference (ITD) cue for the location of the sound image. This value of lag was given a place representation in the famous binaural model by Jeffress (1948). The height of the peak is thought to determine the compactness of the image. If the sounds in the two ears are identical except for an interaural delay, then the peak height has its maximum value of 1, and the image is expected to be maximally compact. If the height of the peak is less than 1, the image is broader or more diffuse (Barron 1983; Blauert and Lindemann 1986).
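As an illustration of this definition, the sketch below computes a normalized interaural cross-correlation over a range of lags and returns the height and position of its peak. It is a minimal sketch under stated assumptions: the ±1-ms lag range, the whole-signal energy normalization, and the function name `interaural_coherence` are choices made here, not details given in the text.

```python
import numpy as np


def interaural_coherence(left, right, fs, max_lag_s=1e-3):
    """Return (peak height, lag in seconds) of the normalized cross-correlation.

    A positive lag means the right-ear signal is delayed relative to the left.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    n = left.size
    max_lag = int(round(max_lag_s * fs))
    norm = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
    best_value, best_lag = -np.inf, 0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            value = np.dot(left[: n - lag], right[lag:]) / norm
        else:
            value = np.dot(left[-lag:], right[: n + lag]) / norm
        if value > best_value:
            best_value, best_lag = value, lag
    return best_value, best_lag / fs


fs = 16000
rng = np.random.default_rng(0)
noise = rng.standard_normal(fs // 2)

# Identical signals apart from an interaural delay of 8 samples (0.5 ms).
delayed = np.roll(noise, 8)
coherence_delayed, itd = interaural_coherence(noise, delayed, fs)

# Statistically independent signals at the two ears.
independent = rng.standard_normal(noise.size)
coherence_indep, _ = interaural_coherence(noise, independent, fs)
print(coherence_delayed, itd, coherence_indep)
```

Because the overlapping segment shrinks slightly at non-zero lags while the normalization uses the whole-signal energy, the peak for the delayed copy comes out marginally below 1 in this sketch. For independent noises the peak stays close to 0.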

Listeners are particularly sensitive to deviations from a reference coherence of 1.0. Using narrowband noise, Gabriel and Colburn (1981) found that listeners could easily distinguish between noise with a coherence of 1.0 and noise with a coherence of 0.99. A reference coherence of 1.0 is also of interest in connection with the masking level difference (MLD). Wilbanks and Whitmore (1967) and Koehnke et al. (1986) concluded that the threshold signal-to-noise ratio for a heterophasic signal in a homophasic noise is essentially determined by the ability to detect the incoherence introduced by the out-of-phase signal.

The present work is also concerned with incoherence detection starting with perfectly coherent noise as a reference. Its working hypothesis is that the detection of a small amount of interaural incoherence is not based on the cross-correlation or the coherence per se. Instead, detection is hypothesized to be based on fluctuations in interaural phase difference (IPD) and/or interaural level difference (ILD) leading to salient fluctuations in the output of brainstem nuclei.

Michigan State University, USA, matt.goupell@gmail.com, hartmann@pa.msu.edu

Hearing – From Sensory Processing to Perception

B. Kollmeier, G. Klump, V. Hohmann, U. Langemann, M. Mauermann, S. Uppenkamp, and J. Verhey (Eds.) © Springer-Verlag Berlin Heidelberg 2007


2 Experiment

The experiment sought to test the adequacy of coherence per se by presenting listeners with reproducible noise samples, all of which had exactly the same value of interaural coherence. However, different noises had different amounts of IPD and ILD fluctuations. If it could be shown that the incoherence was significantly more detectable in some noise samples than in others, and if that difference correlated with IPD and ILD fluctuations, then the hypothesis would be supported.

Implementing the experimental plan required a choice of noise bandwidth and a choice of stimulus duration. A narrow bandwidth was chosen because fluctuations in different noise samples differ more widely when the bandwidth is narrow. A range of durations was chosen. Although many MLD experiments have been done with 500-ms stimuli, recent binaural models have stressed the potential importance of short-term correlations (Bernstein et al. 1999). In order to make a convincing case for or against coherence per se, it was necessary to perform the experiments over all relevant time scales.

2.1 Stimuli

The stimuli were noise bands with 14-Hz nominal bandwidth and durations of 500, 100, 50, and 25 ms. The bands were centered on 500 Hz and experienced some inevitable splatter when the duration was brief. Splatter was reduced by using 30-ms or 10-ms raised-cosine edges.

One hundred different noises were created for each duration. The computation of noise stimuli targeted a fixed coherence, and stimulus selection after enveloping ensured that every noise had an interaural coherence of exactly 0.992. After the noises were computed, the interaural phase and level differences were computed as functions of time from the Hilbert transforms of the noises. Fluctuations were then characterized by the standard deviation of the IPD and the standard deviation of the ILD as computed over the duration of the noise. Those standard deviations form the vertical and horizontal axes, respectively, of Fig. 1, which shows the 100 noises for 500-ms duration. It is evident that some noises, e.g. noise number 79, have large IPD fluctuations whereas other noises, like noise number 2, have large ILD fluctuations. For some noises, e.g. number 90, both the IPD and the ILD fluctuations are large. Other noises, e.g. number 96, have very small fluctuations.
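The fluctuation measures described above can be sketched as follows. This is an illustration rather than the authors' procedure: the analytic signal is built with the standard FFT construction of the discrete Hilbert transform, the IPD is left unwrapped, and the dichotic pair is made by mixing two independent narrowband noises, which only approximates a coherence of 0.992 instead of enforcing it exactly. The helper names, the sampling rate, and the omission of raised-cosine edges are all assumptions.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via the FFT (standard discrete Hilbert-transform construction)."""
    n = x.size
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1 : n // 2] = 2.0
    else:
        weights[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)


def fluctuation_stats(left, right):
    """Standard deviations over time of the IPD (degrees) and the ILD (dB)."""
    a_left, a_right = analytic_signal(left), analytic_signal(right)
    ipd_deg = np.degrees(np.angle(a_left * np.conj(a_right)))
    ild_db = 20.0 * np.log10(np.abs(a_left) / np.abs(a_right))
    return ipd_deg.std(), ild_db.std()


def narrowband_noise(rng, fs=8000, duration_s=0.5, f_lo=493.0, f_hi=507.0):
    """Random noise confined to [f_lo, f_hi] Hz (no onset/offset ramps)."""
    n = int(fs * duration_s)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum = np.zeros(freqs.size, dtype=complex)
    in_band = (freqs >= f_lo) & (freqs <= f_hi)
    count = int(in_band.sum())
    spectrum[in_band] = rng.standard_normal(count) + 1j * rng.standard_normal(count)
    return np.fft.irfft(spectrum, n=n)


rng = np.random.default_rng(1)
rho = 0.992  # target coherence from the text; the mixing rule below is an assumption
common, independent = narrowband_noise(rng), narrowband_noise(rng)
independent *= np.sqrt(np.sum(common ** 2) / np.sum(independent ** 2))  # equalize energy
left = common
right = rho * common + np.sqrt(1.0 - rho ** 2) * independent

ipd_sd, ild_sd = fluctuation_stats(left, right)
diotic_ipd_sd, diotic_ild_sd = fluctuation_stats(left, left)
print(ipd_sd, ild_sd, diotic_ipd_sd, diotic_ild_sd)
```

Repeating the last step with different random seeds yields one point per noise in the plane of IPD and ILD standard deviations, analogous to the scatter described for Fig. 1. A diotic pair gives zero fluctuation on both axes.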

2.2 Fluctuation Statistics

Average fluctuation statistics are shown in Table 1 as a function of duration. Two columns on the left show the mean of the standard deviation for IPD and ILD fluctuations. This mean is computed over the 100 stimuli in the entire ensemble of noises. These means correspond to the centroids of the mass of points, shown for example in Fig. 1. The next two columns show the standard deviations (over 100 noises) of the standard deviations (over time). The IPD and ILD standard deviations correspond respectively to vertical and horizontal widths of the mass of points, shown for example in Fig. 1.

Fig. 1 IPD and ILD fluctuations for 100 noises, given as standard deviations averaged over time, for 500-ms noises

Table 1 Values of the mean and standard deviation of st[∆Φ] and st[∆L] for “14-Hz” noise-pairs with four durations: 25, 50, 100, and 500 ms. Correlations between st[∆Φ] and st[∆L] are also given

Duration (ms)   m(st[∆Φ]) (degrees)   m(st[∆L]) (dB)   s(st[∆Φ]) (degrees)   s(st[∆L]) (dB)   corr
25              3.62                  0.41             3.56                  0.36             0.53
50              5.85                  0.73             6.83                  0.65             0.74
100             7.81                  1.06             6.83                  0.68             0.75
500             12.20                 1.60             5.14                  0.50             0.68

2.3 Procedure

To create the stimuli for the experiment, the five noises with the largest IPD fluctuations and the five noises with the smallest IPD fluctuations were chosen to make the ten stimuli of a collection to be called the “phase set.” Similarly, ten stimuli were chosen on the basis of largest and smallest ILD fluctuations to make the “level set.”

The experiment was three-interval two-alternative forced choice. The first interval was diotic, the second or third interval, selected randomly, was dichotic with the interaural fluctuation. The remaining interval was again diotic. The two diotic intervals were created by presenting just the left channel of one of the ten noises, different from the target noise in the dichotic interval and different from each other. The listener’s task was to say which interval, the second or the third, contained the dichotic noise.

Beyond the simple forced choice, the listener was given the opportunity to declare confidence in his judgment. That procedural element led to a Confidence Adjusted Score (CAS) equal to the number of correct responses plus the number of correct confidence ratings. Listeners were discouraged from using the confidence rating casually. If an experimental run included more than one incorrect judgment, which the listener rated as confident, the run immediately terminated and the listener was obliged to begin again. The relative weighting given to confidence, whereby a confidence rating was given the same weight as a correct response, was determined by statistical tests which showed that the overall CAS was a relatively flat function of the weighting parameter when the relative weight was 1.0 (Goupell 2005).

An experimental run consisted of six trials for each of the ten dichotic noises presented in random order. A total of 6 runs led to a total of 36 trials for each noise. Thus, the maximum possible CAS was 72.
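The scoring rule can be made concrete with a short sketch. It assumes that a "correct confidence rating" means a trial that was both correct and rated confident, which is an interpretation of the description above. The function names and the trial representation are hypothetical.

```python
def confidence_adjusted_score(trials):
    """CAS = number of correct responses + number of correct responses rated confident.

    `trials` is a sequence of (correct, confident) boolean pairs for one noise.
    """
    n_correct = sum(1 for correct, _ in trials if correct)
    n_correct_confident = sum(
        1 for correct, confident in trials if correct and confident
    )
    return n_correct + n_correct_confident


def run_is_terminated(trials):
    """A run is stopped after more than one incorrect judgment rated as confident."""
    confident_errors = sum(
        1 for correct, confident in trials if confident and not correct
    )
    return confident_errors > 1


# 36 trials per noise: always correct and always confident.
perfect = [(True, True)] * 36
# 36 trials per noise: half correct by guessing, never confident.
guessing = [(True, False)] * 18 + [(False, False)] * 18

max_cas = confidence_adjusted_score(perfect)
chance_cas = confidence_adjusted_score(guessing)
print(max_cas, chance_cas)
```

With 36 trials per noise, this reproduces the maximum score of 72 and a score of 18 for a listener who guesses and never declares confidence.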

There were three listeners in the experiment. All were male and all had normal thresholds near 500 Hz.

2.4 Results

The resulting CAS values for the phase sets and level sets are shown in Tables 2 and 3, respectively. The entries in the tables can be compared with the maximum possible value of 72. The random guessing limit would correspond to a CAS of 18, where half the decisions are correct and the listener is never confident about anything. Columns labeled “Min” and “Max” correspond to the five stimuli with minimum fluctuations and the five stimuli with maximum fluctuations. Therefore, each table entry is the average of the responses to five noises.

Tables 2 and 3 indicate that CAS values are considerably higher for the maximum fluctuations than for the minimum fluctuations for both the phase sets and the level sets, so long as the duration is 50 ms or greater. For a duration of 25 ms, there is little difference between Max and Min columns.

A two-sample one-tailed t-test tested the hypothesis that CAS values for maximum fluctuations were larger than CAS values for minimum fluctuations. The results of the test are given in Table 4. It is evident that the hypothesis is supported, normally at the 0.01 level or better, except for a duration of 25 ms.
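For readers who want to see the form of such a test, the sketch below computes a pooled-variance two-sample t statistic and compares it with the tabulated one-tailed critical value for p = 0.01 and 8 degrees of freedom. Several points are assumptions: the text does not say whether the variances were pooled, the samples are taken to be the five per-noise CAS values in each group, and the numbers used are hypothetical and not data from the experiment.

```python
import math


def pooled_t_statistic(sample_a, sample_b):
    """Two-sample t statistic (pooled variance) for the hypothesis mean(a) > mean(b)."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = sum(sample_a) / n_a
    mean_b = sum(sample_b) / n_b
    ss_a = sum((x - mean_a) ** 2 for x in sample_a)
    ss_b = sum((x - mean_b) ** 2 for x in sample_b)
    dof = n_a + n_b - 2
    pooled_var = (ss_a + ss_b) / dof
    if pooled_var == 0.0:
        raise ValueError("both samples have zero variance")
    t = (mean_a - mean_b) / math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return t, dof


# Hypothetical per-noise CAS values for one listener and one duration
# (illustrative only, not taken from the chapter).
cas_max = [44, 50, 41, 47, 43]
cas_min = [10, 12, 8, 14, 11]

# Tabulated one-tailed critical value of Student's t for p = 0.01, 8 degrees of freedom.
T_CRIT_ONE_TAILED_01_DOF8 = 2.896

t_value, dof = pooled_t_statistic(cas_max, cas_min)
significant = dof == 8 and t_value > T_CRIT_ONE_TAILED_01_DOF8
print(t_value, dof, significant)
```

With five noises per group the test has 8 degrees of freedom, so a t value above the critical value supports the one-tailed hypothesis at the 0.01 level.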


Table 2 CAS values (maximum 72) for three listeners – phase set

            25 ms        50 ms        100 ms       500 ms
Listener    Min   Max    Min   Max    Min   Max    Min   Max
D           16    25     11    45     8     57     40    62
M           23    24     12    50     13    60     37    61
W           25    25     14    53     17    67     28    60

Table 3 CAS values (maximum 72) for three listeners – level set

            25 ms        50 ms        100 ms       500 ms
Listener    Min   Max    Min   Max    Min   Max    Min   Max
D           15    21     12    46     13    58     40    64
M           20    27     15    47     15    50     35    70
W           17    29     19    44     19    54     31    66

Table 4 The p-values for the phase and level sets with a nominal bandwidth of 14 Hz and four durations: 25, 50, 100, and 500 ms

            25 ms            50 ms            100 ms           500 ms
Listener    Phase   Level    Phase   Level    Phase   Level    Phase   Level
D           0.027   0.039    <0.001  0.010    <0.001  <0.001   <0.001  <0.001
M           0.306   0.023    0.002   0.002    <0.001  0.004    0.002   0.002
W           0.457   0.065    0.002   0.015    <0.001  0.002    0.001   <0.001

3 Discussion and Further Experiments

The experiment made it clear that coherence itself, or cross-correlation of the stimulus waveform, is an inadequate predictor of incoherence detection. Instead, detection performance correlates well with the size of the fluctuations in interaural phase and level. This conclusion applies to a narrow band and to stimulus durations from 500 ms down to 50 ms.

3.1 25-ms Duration

The above conclusion does not apply for a stimulus duration of 25 ms, but for a 14-Hz bandwidth and 25-ms duration the stimulus fluctuations themselves are quite infrequent. As shown by the standard deviation columns in Table 1,