Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:

Учебники / Hearing - From Sensory Processing to Perception Kollmeier 2007

.pdf
Скачиваний:
150
Добавлен:
07.06.2016
Размер:
6.36 Mб
Скачать

144

S. van de Par et al.

2Experiment I

This experiment was carried out to investigate the ability of listeners to lateralize correctly two simultaneously presented, spectrally fully overlapping signals with opposing binaural cues. In order to perform this lateralization, listeners must be able to segregate and lateralize the signals based on their opposing binaural (ITD or ILD) cues and their different temporal structure. This task will be termed “discrimination task”, because it required listeners to discriminate between lateralizations of the signals.

Besides measuring lateralization thresholds for the two signals presented simultaneously, as a reference, also lateralization thresholds were measured for each of the signals separately. These tasks will be termed “detection task”, because they deal with basic lateralization detection thresholds.

2.1Method and Stimuli

For the discrimination task, stimuli consisted of two signals: a band-pass noise (BPN) with a flat spectral envelope, and a harmonic tone complex (HTC) with 20-Hz component spacings and a sinusoidal phase spectrum. Both signals had the same spectral range but differed in their temporal envelope structures. The level for each of the two signals was 65 dB SPL. The two reference intervals had binaural cues such that the BPN was lateralized to the right and HTC to the left using identical but opposing binaural cues. In the target interval the lateralizations of both signals were reversed.

In the narrowband conditions, bandwidths were one critical band wide (1 ERB) and centered at 280, 550 or 800 Hz. In the wideband condition, the bandwidth was 600-Hz wide (7 ERB) and centered at 550 Hz. The intervals had a duration of 400 ms, including 30-ms raised-cosine onset and offset ramps, and were separated by 300 ms of silence. For the ILD conditions, level changes were such that the total added energy of left and right target signal was always constant.

For the detection task, threshold ITDs and ILDs were measured for the BPN and the HTC in isolation. The two reference intervals had binaural cues such that the signal was lateralized to the left. The target interval had identical binaural cues such that the signal was lateralized to the right. Thus, total interaural differences between reference and test intervals were twice the values as reported hereafter.

The method used for measuring lateralization thresholds was a three-interval, three-alternative forced-choice adaptive-tracking procedure. The adaptive variable (ITD or ILD) was adjusted according to a two-down one-up rule, to track the 70.7%-correct response level. The initial adaptive variable was adjusted by multiplying or dividing the adaptive variable with a certain factor. Initially this factor was 2.51 (=108/20). After each second reversal the factor was reduced by taking its square root until the value of 1.12 (=101/20) was reached. Another eight reversals were measured at this step size and the median of these eight levels was used as an estimate of threshold. Feedback was provided after each trial.

Source Segregation Based on Temporal Envelope Structure and Binaural Cues

145

For each condition, at least four attempts were made by each subject to measure a threshold. However, when the adaptive variable exceeded a certain maximum value, the tracking procedure was terminated and no threshold was measured. For these conditions, measurements were either repeated to obtain a total of at least four proper threshold values, or, for the most difficult conditions, measurements were stopped and thresholds (if any) discarded. The thresholds for each condition were pooled and checked for severe outliers, thresholds at more than three times the interquartile range were removed from the data. Five normal hearing male subjects, including the four authors, participated in the experiments.

2.2Results and Discussion

Figure 1 shows the average detection and discrimination threshold and standard error of the mean for the various conditions.

For the ITD conditions (left panel), detection thresholds for lateralization changes of the narrowband BPN and HTC are about 10–20 s, while for these narrowband conditions a discrimination threshold could not be obtained. Detection thresholds for wideband BPN and HTC are similar to the lowest of the narrowband detection thresholds, about 12 s, while the wideband discrimination threshold is about 29 s. These data indicate that segregation of the BPN and the HTC by interaural time delays is not possible when their spectral energy is limited to one auditory filter.

For the ILD conditions (right panel), all detection thresholds for BPN and HTC are about 1 dB independent of center frequency and bandwidth, while narrowband discrimination thresholds are about 5–32 dB. Note that this 32-dB threshold at the 280-Hz center frequency is beyond the plausible range

ITD ( s)

50

 

 

 

 

Subject average

 

 

 

102

 

Subject average

 

280− Hz CF

Narrowband

800−Hz CF

Wideband

280−Hz CF

Narrowband

800−Hz CF

Wideband

45

 

550− Hz CF

550−Hz CF

 

550−Hz CF

550−Hz CF

40

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

35

 

 

 

 

 

 

 

 

 

 

 

101

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

30

 

 

 

 

 

 

 

 

 

 

 

(dB)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

25

 

 

 

 

 

 

 

 

 

 

 

ILD

 

 

 

 

20

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

100

 

 

 

 

15

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

10

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

5

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

0

 

 

 

 

 

 

 

 

 

 

 

10−1

 

 

 

 

 

Discr

 

 

Discr

 

 

Discr

 

 

Discr

Discr

Discr

Discr

Discr

 

 

 

 

 

 

 

 

 

 

BPN HTC BPN HTC BPN HTC BPN HTC

 

BPN HTC

BPN HTC

BPN HTC

BPN HTC

Fig. 1 Average detection thresholds for BPN (circles) and HTC (squares), and discrimination thresholds (diamonds) are shown for ITD (left panel) and ILD (right panel) conditions for narrowband (open markers) conditions and wideband (closed markers) conditions. Vertical lines indicate the standard error of the mean

146

S. van de Par et al.

for localization. For these narrowband conditions, discrimination thresholds decrease with increasing center frequency. The wideband discrimination threshold is nearly equal to its corresponding detection thresholds. These data indicate that, in contrast to ITD narrowband conditions, segregation by interaural level differences of the narrowband noise and the harmonic tone complex is possible within one auditory filter, although thresholds are very high. Analogous to the ITD conditions, segregation is best for conditions where the stimulus spectrum covers multiple auditory filters. From this finding we conclude that segregation by interaural level differences is not only depending on the bandwidth of the signals, but also on the center frequency of the auditory filter.

Overall we can conclude that spectral energy in multiple auditory filters facilitates segregation by binaural listening of spectrally fully overlapping concurrent sound sources.

3Experiment II

This experiment was defined to investigate to what extent the ILD thresholds for the discrimination task obtained in Experiment I can be understood in terms of listening with one ear only.

3.1Method and Stimuli

The method was the same as in Experiment I, and measurements were limited to discrimination thresholds.

The stimuli were the same as the ILD conditions from Experiment I, only now the right ear signal was presented diotically. In this way, only monaural level cues resulting from the different temporal envelope structures of the HTC and the BPN were available for discriminating between the target and the reference intervals.

3.2Results and Discussion

Table 1 shows the subject average of monaural and previously established binaural thresholds and standard error of the mean for the various bandwidths and center frequencies.

As for binaural thresholds from Experiment I, monaural thresholds in the narrowband conditions decreased with an increase in center frequency. However, the monaural thresholds were much lower than binaural thresholds. Thresholds for the wideband condition were similar, indicating that monaural and binaural discrimination were equivalent.

Source Segregation Based on Temporal Envelope Structure and Binaural Cues

147

 

Table 1 Average discrimination thresholds and standard errors of the

 

 

mean (in brackets) for various bandwidths and center frequencies

 

 

 

 

 

 

 

 

 

 

 

Narrowband

 

Wideband

 

 

 

 

 

 

 

 

 

280-Hz

550-Hz

800-Hz

550-Hz

 

 

 

 

 

 

 

 

 

Monaural

4.7

3.2

2.5

1.2

 

 

[in dB]

(0.3)

(0.2)

(0.2)

(0.1)

 

 

Binaural

34.2

7.4

5.1

1.0

 

 

[in dB]

(6.9)

(0.8)

(0.8)

(0.1)

 

 

 

 

 

 

 

 

From these findings, we conclude that for narrowband signals binaural stimulus presentation can actually reduce discrimination performance in the case of ILD cues. In other words, in the binaural ILD discrimination condition, much better performance could be obtained if listeners were able to listen with only one ear.

4Experiment III

This experiment was defined to explore the effect of temporal envelope structure on the ability to segregate two spectrally overlapping signals by binaural cues.

4.1Method and Stimuli

The temporal envelopes of the BPN and HTC were manipulated in two ways. First, a random phase was applied to the HTC components, to yield a temporal envelope more similar to that of the BPN. Second, a 20-Hz amplitude modulation (AM) was applied to the BPN, to yield a temporal envelope more similar to that of the sine-phase HTC. The AM BPN and the HTC were presented temporally in-phase (φ = 0), such that the envelope maxima of both signals coincided, or out-of-phase (φ = π).

The method was the same as in Experiment I, except that measurements were limited to discrimination thresholds for the wideband condition, because these measurements led to the lowest thresholds.

4.2Results and Discussion

Table 2 shows the subject average and standard error of the mean for the different temporal envelope structure conditions (including the sine-phase HTC of Experiment I).

148 S. van de Par et al.

Table 2 Average discrimination thresholds and standard errors of the mean (in brackets) for various temporal envelope structures of HTC (sine/random phase) and AM BPN (φ = 0/φ = π)

 

Sine

Random

φ = 0

φ = π

 

 

 

 

 

ITD

29.2

-

-

30.0

[in µs]

(2)

 

 

(3)

ILD

1.0

29.0

1.5

1.0

[in dB]

(0.1)

(8.2)

(0.1)

(0.1)

 

 

 

 

 

For random phase conditions, segregation by ITDs was not possible, and segregation by ILDs was seriously degraded. For the conditions with the AM BPN and HTC presented in-phase (φ = 0), segregation by ITDs was also not possible, and for ILDs segregation was slightly reduced compared to the sinephase condition. For conditions with the AM BPN and HTC presented out-of- phase (φ = π), segregation was normal for both binaural cues.

From these findings, we conclude that in order to be able to segregate and lateralize the two signals, it is important that the temporal envelope structure of both signals is different, either due to different degrees of modulation or due to a difference in the relative timing of envelope maxima.

5Discussion and Conclusions

This study investigated to what extent two signals with a very similar spectral envelope but different temporal structures can be segregated based on binaural cues. From Experiment I, it appears that listeners can indeed segregate such signals based on a difference in binaural cues of the two signals if the bandwidth of the signals exceeds one critical band.

When we consider the ILD conditions, as was shown in Experiment II, the left or right ear signals by themselves provide monaural cues that are sufficient to distinguish reference and target intervals. Therefore, it is not certain that in Experiment I, the lateralization thresholds for the ILD conditions are necessarily based on binaural processing. Interestingly, for narrowband conditions, the performance based on listening only with the right ear signal was better than based on binaural listening, while in the wideband condition, performance was very similar. This suggests that for narrowband conditions, binaural fusion is mandatory and listening with one of the two ears is not possible.

When we consider the ITD conditions, segregation or lateralization of either one of the signals can only be explained based on binaural processing. However, the binaural cues are mixed equally within each critical band and therefore we have to assume that, somehow, the auditory system is able to separate the cues corresponding to both signals and couple them to the two signals.

Source Segregation Based on Temporal Envelope Structure and Binaural Cues

149

It is not clear, however, what would be the nature of binaural processing that allows separation of the ITD cues of both signals in Experiments I and III. It appears that a prerequisite for being able to segregate two binaural signals is that both signals have a distinctly different temporal envelope extending across several auditory filters. For the random-phase HTC and for the in-phase AM BPN, temporal envelopes of both signals were correlated and segregation was impossible. For the sine-phase HTC and the out-of-phase AM BPN, temporal envelopes were much less correlated and lateralization performance was not too dissimilar from that of the signals in isolation.

In addition, it appears that temporal envelopes resulting from narrowband signals by themselves do not provide sufficient differences in temporal envelopes across the various signals to facilitate segregation (cf. Experiment I). Possibly, across-frequency integration based on temporal envelope coherence helps to facilitate segregation (cf. Trahiotis and Stern 1994), but within-channel cues may also improve even when the signal exceeds one critical band.

One speculation would be that the monaural temporal envelope cues somehow help to select temporal intervals that belong to one of the two signals and that in this way temporally varying binaural cues are organized to facilitate lateralization.

Alternatively, the auditory system may adopt several hypotheses about how the monaural input signals can be segregated based on monaural temporal envelope features. The binaural cues corresponding to each of the envelope features may help to tip the balance towards one of the hypotheses based on the assumption that within one auditory stream, binaural cues have to be coherent across time.

In a previous contribution (van de Par et al. 2005) we reported a study where listeners had to discriminate between BPN and HTC signals both presented interaurally out-of-phase within an in-phase noise masker. Listeners were able to discriminate between these signals at signal-to-masker levels for which no monaural cues were available for performing this task. This suggests that within the binaural display, information is available about the temporal structure of the out-of-phase signal. An equalization-cancellation (EC) stage (e.g. Durlach, 1963; Breebaart et al. 2001) could in principle provide such information, because it would cancel the noise masker component of the stimulus but not the out-of-phase signal component. It would require, however, that within the binaural display, the capacity to process temporal information is sufficiently good to distinguish between HTC and BPN signals.

In the current study, in a similar way, an EC stage could remove one of the two signals, allowing for the temporal processing of the other signal. It is not evident, however, how this could explain the finding of Experiment III, that the in-phase AM BPN could not be segregated from the HTC. Possibly it is related to the observation that for any internal time delay that is applied in the equalization stage, the output of the cancellation stage has one dominant modulation rate of 20 Hz. For the out-of-phase condition, however, modulation rates of 40 and 20 Hz can be observed depending on the internal delay. This difference in modulation rate across the binaural display may facilitate segregation.

150

S. van de Par et al.

Finally, we want to draw attention to the rather low lateralization threshold of 29 s that was found in the ITD discrimination conditions with wideband stimuli. This indicates that a difference in azimuth between the simultaneously presented HTC and the BPN of about 6° in a free-field condition would already be sufficient for listeners to notice the difference in azimuth.

In conclusion, this study presents an interesting stimulus paradigm that reveals binaural segregation based on monaural temporal envelope cues. Results are not easily understood in terms of existing binaural models.

References

Breebaart J, van de Par S, Kohlrausch A (2001) Binaural processing model based on contralateral inhibition. I. Model setup. J Acoust Soc Am 110:1074–1088

Culling JF, Summerfield Q (1995) Perceptual separation of concurrent speech sounds: absence of across-frequency grouping by common interaural delay. J Acoust Soc Am 98:785–797 Durlach NI (1963) Equalization and cancellation theory of binaural masking-level differences.

J Acoust Soc Am 35:1206–1218

Trahiotis C, Stern RM (1994) Across-frequency interaction in lateralization of complex binaural stimuli. J Acoust Soc Am 96:3804–3806

van de Par S, Kohlrausch A, Breebaart J, McKinney M (2005) Discrimination of different temporal structures of diotic and dichotic target signals within diotic wide-band noise. In: Pressnitzer D, de Cheveigné A, McAdams S, Collet L (eds) Auditory signal processing: physiology, psychoacoustics, and models. Springer, Berlin Heidelberg New York, pp 398–404

Comment by Kollmeier

Your statement that monaural detection precedes binaural processing is in conflict with current models of binaural noise suppression for wideband experiments such as speech discrimination in noise with different spatial location of noise masker and speech signal. Beutelmann and Brand (2006, JASA) from our group, for example, were able to predict speech intelligibility in noise with different reverberant environments using EC-models in each critical band followed by a (monaural) speech intelligibility prediction stage employing the SII. How did you make sure that the discrimination task performed by your listeners was not corrupted by a confusion between signal and interferer?

References

Beutelmann R, Brand T (2006) Prediction of speech intelligibility in spatial noise and reverberation for normal-hearing and hearing-impaired listeners. J Acoust Soc Am 120:331–342

Source Segregation Based on Temporal Envelope Structure and Binaural Cues

151

Reply

We made sure that the two target signals were not confused by presenting the harmonic tone complex target in isolation before each single adaptive threshold measurement. Therefore the listeners knew what the target was that they needed to lateralize. The data that we obtained for the wideband ITD conditions are difficult to reconcile with the idea that after binaural interaction (e.g. Equalization and Cancellation) the binaural display provides a representation that is rich enough to process one of the stimulus components as if it were presented in isolation monaurally. For example, assuming the Equalization and Cancellation processing is able to cancel out the bandpass noise component, the remaining output of the EC stage would reflect the presence of the harmonic tone complex. If this information could be processed with the same temporal processing capacity as the monaural system could provide, listeners should have no difficulties in determining the lateralization of the tone complex in any of the ITD conditions of Experiment III. In contrast we found that listeners were not able to perform their task for conditions where monaural segregation is more difficult; i.e., for a random-phase tone complex, for the sine-phase tone complex temporally aligned with an AM noise, or for any of the narrowband ITD conditions. Therefore, we think that these results are related to the difficulties that listeners have to monaurally segregate the tone complex and the bandpass noise, which apparently is a prerequisite for being able to use the binaural ITD information.

Comment by Verhey

One of your conclusion is that monaural (better-ear) cues can not be used in the binaural case. This assumption is based on a diotic experiment which produced considerably lower thresholds than the binaural condition. This raises the question about the order of the experiment and the instructions for the subjects. If the diotic data has been obtained after the binaural data lower thresholds may have resulted from training effects. Did the author check for possible training effects? In addition, can the authors rule out the possibility that listeners are able to use the monaural cue, if they are explicitly told to do so in the binaural experiment?

Reply

After returning from the conference, we ran an additional series of measurements where the diotic and dichotic narrowband ILD condition was remeasured for a center frequency of 550 Hz. The two conditions were presented in an interleaved manner. Listeners were instructed to listen only to the right ear for the dichotic condition. We found essentially the same results as reported in our

152

S. van de Par et al.

paper, suggesting that differences in training across the different conditions did not play a role. In addition, the specific instruction did not lead to an improved performance for the dichotic condition. We note, however, that the increase in thresholds for the dichotic condition as compared to the diotic condition was more pronounced in some subjects than in others, both in the initial and the new measurements. Nevertheless, we consistently found an increase in thresholds in all listeners for the dichotic condition.

Comment by Ihlefeld and Shinn-Cunningham

We find your results very interesting, but think that there may be alternative interpretations of the results. Did you ask your listeners how many objects they perceived and where they heard the stimuli? You discuss the results as if the subjects must have both segregated and lateralized the sources properly in order to perform the task. However, because you used a 3AFC task, subjects could have simply listened for any difference across intervals, whether they heard the stimuli as one object whose spatial properties changed or two segregated objects in different locations. Moreover, as was brought up by the other commentary, we wondered whether the similarity of the two signals was a factor. We believe that it is necessary, but not sufficient, to hear sound sources as separate objects in order to perceive them as coming from different locations – that spatial cues do not drive segregation. If the listeners could not segregate the stimuli, they might be able to do the task by listening for changes in the spatial quality of the single source.

To satisfy our curiosity about what was actually happening, we generated and listened to stimuli constructed as described in your paper. In the narrowband stimulus conditions of Experiment I, we could hear two objects, but they appeared to come from one diffuse spatial location, preventing us from performing the task. In the broadband conditions of Experiment III that were difficult (in phase AM and random phase HTC), we only heard one object with a diffuse location. Only when the stimuli were broadband with very different envelopes (the sin-phase HTC in Experiment 1 and the pi-phase AM condition of Experiment III) did we hear two objects at different locations.

In short, we think that there are important interactions between the way listeners group the stimuli and how they perform the task.

Reply

We agree that any change in spatial quality of the composite stimulus across the test and the reference interval could in principle be sufficient to allow listeners to perform the task as long as this spatial quality has a left-right orientation. We performed a control experiment (which we didn’t mention in our presentation or paper) where the lateralization cues (ITDs) of both tone complex and the

Source Segregation Based on Temporal Envelope Structure and Binaural Cues

153

bandpass noise were roved by a certain random offset in the same direction. The offset was never larger than the lateralization cues before roving such that both the tone complex and the bandpass noise would still have ITDs that pointed to different sides of the head. In this way the spatial quality of the composite stimulus varied strongly across the three intervals, which should make it rather difficult to detect changes in the spatial quality of the composite stimulus. Nevertheless, ITD thresholds for wideband stimuli with a sine-phase tone complex hardly increased as a result of adding the roving. This result seems to be more in line with the assumption that listeners are able to lateralize the separate stimulus components.

Furthermore, your comments seem to follow our line of thinking expressed in the presentation and the paper, that the monaural dissimilarity between stimuli is a prerequisite to be able to assign the binaural cues to at least one of the objects (see also our reply to the comment of Kollmeier).