272 |
C. Micheyl et al. |
[Figure: d′ plotted against the number of bursts between consecutive targets]
Fig. 4 Performance in the discrimination task as a function of the number of bursts between consecutive targets, for 20-burst-long sequences with roved target frequency. The different symbols indicate different protected-region widths, using the same format as in Fig. 2.
trends, although in that task, reducing the target rate from one every two to one every three bursts did produce some decrement in performance.
One last result was obtained in a condition in which the target tones were distributed randomly across the bursts, forming an irregular temporal pattern. Testing this condition in the detection task proved impractical, because listeners could no longer simply be asked to indicate whether they heard regularly repeating targets, and it was not obvious how else to instruct them. However, the condition could be tested in the frequency discrimination task, since listeners found it no more difficult to perceive the frequency shift applied to the target tones when these were temporally irregular than when they occurred at regular intervals. The results (not shown here) obtained in this condition with 10- and 20-burst-long sequences confirmed this impression: performance with the temporally irregular targets was statistically indistinguishable from that obtained when the targets repeated at regular intervals with the same density.
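The two target arrangements being compared can be illustrated with a minimal Python sketch. It assumes that “density” means the number of target-bearing bursts per sequence and that irregular targets are drawn without replacement from all burst positions, so they may fall in adjacent bursts; the chapter does not say how the random positions were constrained, and the function names are hypothetical.

```python
import random


def regular_target_bursts(n_bursts, spacing):
    """Indices of bursts that carry a target when targets recur every `spacing` bursts."""
    return list(range(0, n_bursts, spacing))


def irregular_target_bursts(n_bursts, n_targets, rng):
    """Same number of targets, placed in randomly chosen, distinct bursts.

    Assumption: any burst may be chosen, so targets can occur in adjacent bursts.
    """
    return sorted(rng.sample(range(n_bursts), n_targets))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
regular = regular_target_bursts(20, 2)  # one target every two bursts
irregular = irregular_target_bursts(20, len(regular), rng)  # same density
print("regular:  ", regular)
print("irregular:", irregular)
```

Matching the number of targets, rather than their spacing, keeps the overall target density constant while removing the temporal regularity, which is the contrast this condition was designed to test.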
4 Discussion
The results allow us to rule out certain potential explanations for how a tone sequence is detected in a random-masker background. First, detection does not seem to depend on differences in average rate or energy between the target and masker frequencies: most of the conditions presented here had
Hearing Out Repeating Elements in Randomly Varying Multitone Sequences |
273 |
average target rates and energy that were identical to those of the masker components, and yet detection was possible, even when the target frequency was roved. Thus, the results argue strongly against temporal energy integration and event-counting mechanisms. Second, in the case of the roved target frequency, detection did not seem to be based on a serial search mechanism, which monitors each frequency band sequentially for the presence of the target. Third, the results argue against detection based on the temporal regularity of the target, since making the targets randomly irregular in time did not adversely affect discrimination performance.
The results of this study are generally consistent with the hypothesis that the detection of repeating target tones depends on the same mechanisms that have been proposed for the formation of auditory streams of pure tones. In particular, the results show a strong and systematic dependence of performance on the size of the protected region around the target tones. This parameter can be thought of as an analogue of the frequency-separation parameter in streaming experiments involving repeating tone sequences (Bregman 1990), and its influence is likely to depend on the frequency selectivity of neurons in the central auditory system (Fishman et al. 2001; Micheyl et al. 2005b). The improvement in performance found when the targets were presented in every burst is also consistent with the fact that the degree of streaming is related to the gaps between successive tones in a stream (Bregman et al. 2000). Given the results reported above and those of Kidd et al. (2003), the improvement is unlikely to be due simply to increased signal energy or multiple looks. Alternative explanations based on physiological findings, such as response enhancement effects (Brosch and Schreiner 2000), remain viable.
A final similarity between our results and the known properties of auditory streaming relates to how performance improved over time, that is, as the number of bursts increased. Although we ruled out temporal energy-integration and serial-search mechanisms, this effect could still be explained by sensory-evidence-accumulation mechanisms; however, it is unclear exactly what evidence is accumulated, since the temporal regularity of the target does not seem to be key. One possibility is that the increasing salience of the targets is mediated not by the accumulation of sensory information but by adaptation. Micheyl et al. (2005b) have recently shown how multi-second neural adaptation in the auditory cortex may explain the build-up of auditory stream segregation (Bregman 1978). Unfortunately, the characteristics of cortical adaptation to randomly varying tones like those used here are not known, so we can only speculate as to whether this type of phenomenon also explains the increasing salience of repeating target tones in random multitone backgrounds. Neurophysiological and modeling studies currently under way (see Elhilali and Shamma, this volume) will hopefully answer this and other important questions.
Acknowledgments. Work supported by NIDCD grant R01 DC 07657. The authors would like to thank Gerald Kidd for helpful suggestions.
References
Bregman AS (1978) Auditory streaming is cumulative. J Exp Psychol Hum Percept Perform 4:380–387
Bregman AS (1990) Auditory scene analysis. MIT Press, Cambridge
Bregman AS, Ahad PA, Crum PA, O’Reilly J (2000) Effects of time intervals and tone durations on auditory stream segregation. Percept Psychophys 62:626–636
Brosch M, Schreiner CE (2000) Sequence sensitivity of neurons in cat primary auditory cortex. Cereb Cortex 10:1155–1167
Demany L, Ramos C (2005) On the binding of successive sounds: perceiving shifts in nonperceived pitches. J Acoust Soc Am 117:833–841
Fishman YI, Reser DH, Arezzo JC, Steinschneider M (2001) Neural correlates of auditory stream segregation in primary auditory cortex of the awake monkey. Hear Res 151:167–187
Kidd G Jr, Mason CR, Deliwala PS, Woods WS, Colburn HS (1994) Reducing informational masking by sound segregation. J Acoust Soc Am 95:3475–3480
Kidd G Jr, Mason CR, Richards VM (2003) Multiple bursts, multiple looks, and stream coherence in the release from informational masking. J Acoust Soc Am 114:2835–2845
Micheyl C, Carlyon RP, Cusack R, Moore BCJ (2005a) Performance measures of auditory organization. In: Pressnitzer D, de Cheveigné A, McAdams S, Collet L (eds) Auditory signal processing: physiology, psychoacoustics, and models. Springer, Berlin Heidelberg New York, pp 203–209
Micheyl C, Tian B, Carlyon RP, Rauschecker JP (2005b) Perceptual organization of tone sequences in the auditory cortex of awake macaques. Neuron 48:139–148
Neff DL, Green DM (1987) Masking produced by spectral uncertainty with multicomponent maskers. Percept Psychophys 41:409–415
Watson CS, Wroton HW, Kelly WJ, Benbassat CA (1975) Factors in the discrimination of tonal patterns. I. Component frequency, temporal position, and silent intervals. J Acoust Soc Am 57:1175–1185
30 The Dynamics of Auditory Streaming:
Psychophysics, Neuroimaging, and Modeling
MAKIO KASHINO¹,²,³, MINAE OKADA², SHIN MIZUTANI¹,
PETER DAVIS¹, AND HIROHITO M. KONDO¹
1 Introduction
Listening to a speaker or a melody in the presence of competing sounds crucially depends on the brain’s sophisticated ability to organize a complex, time-varying sound mixture into coherent perceptual objects or “streams”, which generally correspond to sound sources in the environment (Bregman 1990). The acoustic factors governing this “auditory streaming” are well established (Carlyon 2004), and various theories have been proposed to explain auditory stream formation (Anstis and Saida 1985; Beauvois and Meddis 1991; Bregman 1990; Hartmann and Johnson 1991; McCabe and Denham 1997; van Noorden 1975). However, it remains unclear how and where auditory streaming is achieved in the brain.
A common limitation of early studies that tried to find the neural correlates of auditory streaming is that the neural response patterns corresponding to different states of perceptual streaming were evoked by physically different stimuli (Alain et al. 1998; Fishman et al. 2001, 2004; Näätänen et al. 2001; Sussman et al. 1999). This makes it difficult to determine whether the neural response patterns reflect perception per se or simply reflect the physical properties of the stimuli.
To overcome this difficulty, recent studies have taken advantage of the fact that the segregation of sounds into streams typically takes several seconds to build up (Carlyon et al. 2001; Cusack 2005; Gutschalk et al. 2005; Micheyl et al. 2005). Under appropriate conditions, a physically unchanging sequence of alternating tones tends to be heard initially as a single coherent stream, and after several seconds it appears to split into two distinct streams (Anstis and Saida 1985). This makes it possible to compare neural responses corresponding to different percepts without introducing any physical change in the stimulus. Using this approach, neural correlates of auditory streaming have been found in the primary auditory cortex (Micheyl et al. 2005), the non-primary auditory cortex (Gutschalk et al. 2005), and the intraparietal
¹ NTT Communication Science Laboratories, NTT Corporation, Japan, kashino@avg.brl.ntt.co.jp, shin@cslab.kecl.ntt.co.jp, davis@cslab.kecl.ntt.co.jp, hkondo@brl.ntt.co.jp
² ERATO Shimojo Implicit Brain Function Project, JST, Japan, mokada@shimojo.jst.go.jp
³ Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, Japan
Hearing – From Sensory Processing to Perception
B. Kollmeier, G. Klump, V. Hohmann, U. Langemann, M. Mauermann, S. Uppenkamp, and J. Verhey (Eds.) © Springer-Verlag Berlin Heidelberg 2007
276 |
M. Kashino et al. |
sulcus (Cusack 2005). These findings appear to diverge, and further lines of research are needed.
Here, we focus on an aspect of auditory streaming that has received little attention, namely, the spontaneous transitions between percepts for a physically unchanging sequence of alternating tones, not only during but also after the initial buildup of stream segregation. We examine the nature of these perceptual transitions in a psychophysical experiment and analyze the data from the viewpoint of stochastic point processes. Moreover, we explore brain activity correlated with the perceptual transitions using functional magnetic resonance imaging (fMRI).
2 Psychophysical Experiment and Stochastic Process Analysis
2.1 Methods
The test sequences were 900 repetitions (6 min) of a triplet pattern composed of L and H tones (Fig. 1). The frequency difference (∆ƒ) between L and H was varied from approximately 1/12 to 1 octave, while the center frequency was fixed at 1 kHz. To avoid harmonic consonance between L and H, the tone frequencies were set as follows: L = 967 Hz and H = 1039 Hz in the ∆ƒ≈1/12 octave condition, L = 937 Hz and H = 1069 Hz in the ∆ƒ≈1/6 octave condition, L = 883 Hz and H = 1129 Hz in the ∆ƒ≈1/3 octave condition, L = 823 Hz and H = 1213 Hz in the ∆ƒ≈1/2 octave condition, and L = 691 Hz and H = 1447 Hz in the ∆ƒ≈1 octave condition. Each tone lasted 40 ms, including 10-ms rising and falling cosine ramps. The onset-to-onset interval between adjacent L and H tones within a triplet was 100 ms, and that between the last tone of one triplet and the first tone of the next was 200 ms. Each triplet cycle thus lasted 400 ms, including 160 ms of silence after the offset of the third tone.
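The stimulus parameters above can be cross-checked with a short Python sketch. It computes the actual L-H separation, log2(H/L), and the geometric center frequency of each listed pair, and derives the silent gap and the total sequence length from the stated timing; the constant and function names are illustrative, and no waveform is synthesized.

```python
import math

# L and H frequencies (Hz) listed in Sect. 2.1, keyed by the nominal ∆f (octaves).
PAIRS = {
    "1/12": (967, 1039),
    "1/6": (937, 1069),
    "1/3": (883, 1129),
    "1/2": (823, 1213),
    "1": (691, 1447),
}

# Timing parameters from Sect. 2.1 (ms).
TONE_MS = 40         # tone duration, including the 10-ms cosine ramps
ONSET_STEP_MS = 100  # onset-to-onset interval between adjacent tones in a triplet
TRIPLET_MS = 400     # duration of one L-H-L-(silence) cycle
N_TRIPLETS = 900     # repetitions per test sequence


def octave_separation(low_hz, high_hz):
    """Actual L-H separation in octaves, log2(H / L)."""
    return math.log2(high_hz / low_hz)


def geometric_center(low_hz, high_hz):
    """Geometric mean of L and H, i.e. the midpoint on a log-frequency axis."""
    return math.sqrt(low_hz * high_hz)


def triplet_onsets_ms(k):
    """Onset times (ms) of the L, H and L tones of the k-th triplet (k = 0, 1, ...)."""
    start = k * TRIPLET_MS
    return [start, start + ONSET_STEP_MS, start + 2 * ONSET_STEP_MS]


# Silence after the third tone: cycle length minus (last onset + tone duration).
SILENCE_MS = TRIPLET_MS - (2 * ONSET_STEP_MS + TONE_MS)
SEQUENCE_S = N_TRIPLETS * TRIPLET_MS / 1000

for label, (lo, hi) in PAIRS.items():
    print(f"nominal {label:>4} oct: actual {octave_separation(lo, hi):.3f} oct, "
          f"center {geometric_center(lo, hi):.1f} Hz")
print("silence after third tone:", SILENCE_MS, "ms; sequence length:", SEQUENCE_S, "s")
```

Running it shows that the nominal ∆ƒ labels are approximate (the smallest pair is about 0.10 octave rather than 1/12 octave), that every pair is centered within a few hertz of 1 kHz, and that 900 cycles of 400 ms give the stated 6-min sequence with 160 ms of silence after each third tone.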
Fig. 1 Schematic representation of test sounds used in the experiment
Ten adults aged 20 to 33 years with normal hearing participated. Participants were instructed to listen to the test sequence passively, without any particular focus or attitude, and to judge whether they perceived “one stream” (LHL-LHL-. . .) with a galloping rhythm or “two streams” (L-L-. . . and H---H---. . .) with an isochronous rhythm in each stream. In each session, participants pressed the corresponding key on a response box whenever their percept changed. The stimuli were presented to the left ear through headphones at 60 dB SPL.
Each participant ran ten sessions in each of the five ∆ƒ conditions.
2.2 Results and Discussion
In each session, we obtained a series of response times at which the percept changed from “one stream” to “two streams” or in the opposite direction. In all ∆ƒ conditions and for all participants, the initial response was “one stream”, which then changed to “two streams” within several seconds. This confirms the cumulative effect of stream segregation reported in previous studies (Anstis and Saida 1985). The time required for the buildup of streaming decreased as ∆ƒ increased. This tendency is evident in the time course of the mean number of perceived streams, computed at every millisecond and averaged across sessions and participants, for each ∆ƒ condition (Fig. 2).
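The time course in Fig. 2 can be sketched as follows. The code assumes that each session is recorded as a sorted list of transition times in seconds, that every session starts in the “one stream” percept, and that each response toggles between the two percepts; the function names and the example sessions are hypothetical. The sampling step is a parameter: the chapter sampled every millisecond, and a 1-s step is used in the example only to keep it short.

```python
from bisect import bisect_right


def n_streams_at(t, transition_times):
    """Perceived number of streams (1 or 2) at time t (s) in one session.

    Assumes the session starts in "one stream" and that each recorded
    transition toggles the percept, so an even number of transitions up to
    time t means "one stream".
    """
    n_switches = bisect_right(transition_times, t)
    return 1 if n_switches % 2 == 0 else 2


def mean_streams_time_course(sessions, duration_s, step_s=0.001):
    """Mean number of perceived streams across sessions, sampled every step_s seconds."""
    n_steps = int(round(duration_s / step_s))
    times = [i * step_s for i in range(n_steps)]
    course = [sum(n_streams_at(t, s) for s in sessions) / len(sessions) for t in times]
    return times, course


# Two hypothetical 6-s sessions, sampled once per second.
times, course = mean_streams_time_course([[2.0, 5.0], [3.0]], duration_s=6.0, step_s=1.0)
print(course)  # [1.0, 1.0, 1.5, 2.0, 2.0, 1.5]
```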
As the stimulus presentation progressed further, participants’ responses changed frequently in all ∆ƒ conditions. The line plot in Fig. 3 shows the mean number of perceptual transitions per session (Nt), averaged across sessions and participants, for each ∆ƒ condition. More than 30 perceptual transitions occurred in
Fig. 2 Time course of the mean number of perceived streams. Each line shows how the mean number of perceived streams, sampled every millisecond, changes over time
Fig. 3 The mean number of perceptual transitions in a single session (Nt) in a line plot, and the mean total duration of “one stream” (T1) and “two streams” (T2) in bar plots
a session regardless of the size of ∆ƒ. This is surprising, because it has been well established that such perceptual transitions can occur only within a limited range of ∆ƒ (1/3 ≤ ∆ƒ ≤ 1/2 octave at a tone repetition time of 100 ms) (van Noorden 1975). The bar plots in Fig. 3 show the mean total duration of “one stream” responses (T1) and of “two streams” responses (T2) per session (360 s), both averaged across sessions and participants in each ∆ƒ condition. As ∆ƒ increased, T1 decreased and T2 increased, and their proportions in a session reversed at ∆ƒ≈1/3 octave.
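The per-session quantities plotted in Fig. 3 can be computed from a session’s transition times. This sketch assumes that the session starts in “one stream” at time zero and that each transition toggles the percept, so the time before the first transition counts toward T1; the function name and the example session are hypothetical.

```python
def session_summary(transition_times, session_s=360.0):
    """Return (Nt, T1, T2) for one session.

    Nt is the number of perceptual transitions; T1 and T2 are the total times (s)
    spent in "one stream" and "two streams". Assumes the session starts in
    "one stream" at t = 0 and that each transition toggles the percept.
    """
    boundaries = [0.0] + sorted(transition_times) + [session_s]
    t1 = t2 = 0.0
    for k in range(len(boundaries) - 1):
        duration = boundaries[k + 1] - boundaries[k]
        if k % 2 == 0:  # even-numbered segments are "one stream"
            t1 += duration
        else:           # odd-numbered segments are "two streams"
            t2 += duration
    return len(transition_times), t1, t2


# Hypothetical session: splits at 10 s, fuses at 40 s, splits again at 100 s.
print(session_summary([10.0, 40.0, 100.0]))  # (3, 70.0, 290.0)
```

Averaging these triples across sessions and participants gives one Nt value and one T1/T2 pair per ∆ƒ condition; by construction, T1 + T2 equals the session length.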
The autocorrelation functions of the response time series revealed no periodicity in the perceptual changes in any ∆ƒ condition. Moreover, no correlation was found between successive intervals of the same percept or between successive intervals of different percepts.
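A minimal version of the interval analysis is sketched below. Because the percepts alternate, lag-1 pairs of successive durations belong to different percepts and lag-2 pairs to the same percept. The sketch assumes that the truncated intervals at the start and end of a session are discarded, which the chapter does not state; the function names and the example durations are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def interval_durations(transition_times):
    """Durations between consecutive transitions (truncated first/last intervals dropped)."""
    t = sorted(transition_times)
    return [b - a for a, b in zip(t, t[1:])]


def successive_interval_correlations(durations):
    """Correlations between successive intervals.

    Because the percepts alternate, lag-1 pairs are intervals of different
    percepts and lag-2 pairs are successive intervals of the same percept.
    """
    different = pearson(durations[:-1], durations[1:])
    same = pearson(durations[:-2], durations[2:])
    return different, same


# Hypothetical, perfectly alternating durations: short/long/short/long ...
diff_r, same_r = successive_interval_correlations([1.0, 3.0, 1.0, 3.0, 1.0, 3.0])
print(round(diff_r, 6), round(same_r, 6))  # -1.0 1.0
```

The artificial example is maximally structured; for data like those reported here, both correlations would be near zero, meaning that the length of one interval carries no information about the length of the next.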
These findings suggest that auditory streaming should be thought of as a stochastic process: we can specify the likelihood of a particular percept but cannot predict which percept will occur at a particular time. In the present case the likelihood of a percept depends on ∆ƒ, but there is no fixed perceptual boundary along the ∆ƒ axis.
Figure 4 shows the histograms of the duration of each perceptual state for each ∆ƒ condition, pooled over the ten sessions and ten participants. The bin width is 500 ms. All of the distributions rise steeply and have a long tail. We fitted lognormal, gamma, and Weibull distributions to the histograms by the maximum likelihood method and found that the lognormal distribution gave the best fit for 92% of the histograms.
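For the lognormal distribution, the maximum-likelihood fit has a closed form, sketched below in Python with hypothetical durations. The gamma and Weibull fits have no closed form and would need numerical optimization (for example, SciPy’s scipy.stats.gamma.fit and scipy.stats.weibull_min.fit with floc=0); comparing the maximized log-likelihoods of the three models is one natural way to choose the best fit, although the chapter does not state its exact criterion.

```python
import math


def fit_lognormal_ml(durations):
    """Closed-form maximum-likelihood lognormal fit.

    mu is the mean of ln t and sigma the divide-by-n standard deviation of ln t.
    """
    logs = [math.log(t) for t in durations]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / n)
    return mu, sigma


def lognormal_log_likelihood(durations, mu, sigma):
    """Sum of ln p(t) for the lognormal density with parameters mu and sigma."""
    return sum(
        -math.log(math.sqrt(2 * math.pi) * sigma * t)
        - (math.log(t) - mu) ** 2 / (2 * sigma ** 2)
        for t in durations
    )


# Hypothetical percept durations (s).
durations = [0.8, 1.5, 2.1, 2.9, 4.4, 6.0, 9.5]
mu, sigma = fit_lognormal_ml(durations)
print(f"mu = {mu:.3f}, sigma = {sigma:.3f}, "
      f"log-likelihood = {lognormal_log_likelihood(durations, mu, sigma):.3f}")
```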
The probability density function of the lognormal distribution is given by
$$p(t) = \frac{1}{\sqrt{2\pi}\,\sigma t}\,\exp\!\left[-\frac{(\ln t-\mu)^{2}}{2\sigma^{2}}\right] \qquad (1)$$
Fig. 4 Histograms of the durations of each of the two percepts and the estimated probability distribution of durations for each ∆ƒ condition
where µ and σ are the mean and standard deviation of ln t (t > 0), respectively. Each panel of Fig. 4 gives the fitted values of µ and σ and shows the estimated lognormal distribution as a continuous line for each ∆ƒ condition. The Kolmogorov-Smirnov goodness-of-fit test rejected the lognormal distribution hypothesis for 32% of the histograms (p<0.05). This finding is comparable to that of a recent study on perceptual transitions for ambiguous visual figures, which also found that the lognormal distribution gave the best fit (Zhou et al. 2004).
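The goodness-of-fit test can be sketched with the closed-form lognormal cumulative distribution function. The 1.358/√n cutoff is the large-sample 5% critical value of the one-sample Kolmogorov-Smirnov test; when µ and σ are estimated from the same data, as here, that standard cutoff is known to be conservative, and the chapter does not say whether a correction was applied. The example data and parameters are hypothetical.

```python
import math


def lognormal_cdf(t, mu, sigma):
    """Integral of the lognormal density from 0 to t."""
    return 0.5 * (1.0 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(durations, mu, sigma):
    """One-sample Kolmogorov-Smirnov statistic D against a lognormal CDF."""
    xs = sorted(durations)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = lognormal_cdf(x, mu, sigma)
        # Largest gap between the empirical step function and the model CDF.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_rejects_at_5_percent(d, n):
    """Large-sample 5% criterion: reject if D > 1.358 / sqrt(n)."""
    return d > 1.358 / math.sqrt(n)


# Hypothetical durations (s) tested against a lognormal with mu = 1, sigma = 0.7.
sample = [0.9, 1.6, 2.2, 2.7, 3.1, 4.0, 5.2, 7.5]
d = ks_statistic(sample, 1.0, 0.7)
print(f"D = {d:.3f}, reject at 5%: {ks_rejects_at_5_percent(d, len(sample))}")
```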
Next, we modeled the perceptual transitions as two independent stochastic point processes that alternate with each other, one for transitions from “one stream” to “two streams” and one for transitions in the opposite direction. Since no correlation was found between successive intervals, the transition rate depends only on the time elapsed since the previous transition. The transition rate λ(t) can be calculated from the distribution p(t) by Eq. (2):
$$\lambda(t) = \frac{p(t)}{1-\int_{0}^{t} p(x)\,dx} \qquad (2)$$
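Because the integral in Eq. (2) is the lognormal cumulative distribution function, the transition rate can be evaluated in closed form. The sketch below writes the denominator with math.erfc to avoid numerical cancellation at long durations; the parameter values are hypothetical and chosen only to show the shape of the curve.

```python
import math


def lognormal_pdf(t, mu, sigma):
    """Lognormal probability density p(t) of Eq. (1), for t > 0."""
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma * t)


def lognormal_survival(t, mu, sigma):
    """1 - integral_0^t p(x) dx, written with erfc to avoid cancellation for large t."""
    return 0.5 * math.erfc((math.log(t) - mu) / (sigma * math.sqrt(2.0)))


def transition_rate(t, mu, sigma):
    """Eq. (2): rate of the next transition at time t after the previous one."""
    return lognormal_pdf(t, mu, sigma) / lognormal_survival(t, mu, sigma)


# Hypothetical parameters: median duration e^1 (about 2.7 s), sigma = 0.6.
for t in (0.5, 1.0, 2.0, 3.0, 5.0, 10.0, 30.0):
    print(f"t = {t:5.1f} s   rate = {transition_rate(t, 1.0, 0.6):.3f} per s")
```

With these parameters the rate is small just after a transition, rises steeply, and then declines slowly over tens of seconds.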
Fig. 5 Transition rates for a typical participant calculated from experimental histograms (cross points) and from the best fitting lognormal distribution (lines)
Figure 5 shows examples of transition rates for a typical participant, calculated from the experimental histograms (cross points) and from the best-fitting lognormal distribution (lines). The transition rates rise quickly after the previous transition and then decay slowly toward nonzero values. This non-monotonic shape cannot be produced by the gamma or Weibull distributions, whose transition rates change monotonically with time. Neural mechanisms with such transition rates may underlie the spontaneous transitions in auditory streaming.
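The experimental rates (cross points in Fig. 5) require an estimate from binned durations. The chapter does not give its estimator; the sketch below assumes the usual life-table form, dividing the number of durations ending in each bin by the bin width times the number of durations still in progress at the start of that bin. The 500-ms bin width matches the histograms of Fig. 4, while the function name and the example durations are hypothetical.

```python
def empirical_transition_rate(durations, bin_s=0.5, n_bins=None):
    """Histogram estimate of the transition rate in Eq. (2).

    For bin k, covering [k * bin_s, (k + 1) * bin_s), the rate is
    n_k / (bin_s * N_k): n_k durations end in the bin, and N_k durations are
    at least k * bin_s long (still "at risk" of a transition at the bin start).
    """
    if n_bins is None:
        n_bins = int(max(durations) // bin_s) + 1
    counts = [0] * n_bins
    for d in durations:
        k = int(d // bin_s)
        if k < n_bins:
            counts[k] += 1
    rates = []
    at_risk = len(durations)
    for k in range(n_bins):
        rates.append(counts[k] / (bin_s * at_risk) if at_risk else float("nan"))
        at_risk -= counts[k]
    return rates


# Hypothetical durations (s), binned at 500 ms as in the histograms of Fig. 4.
print([round(r, 3) for r in empirical_transition_rate([0.2, 0.7, 0.8, 1.3])])  # [0.5, 1.333, 2.0]
```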
3 Neuroimaging
3.1 Methods
We performed a pretest to select participants for the fMRI experiment whose intervals between perceptual transitions were long, given the slow time course of the blood oxygen level dependent (BOLD) response. The 24 participants selected by the pretest (12 males, 12 females; 19–30 years of age) were right-handed adults with no history of neurological or psychiatric illness. The stimuli and procedure of the pretest and the fMRI experiment were essentially the same as those of the psychophysical experiment described in Sect. 2, except for the following points. The pretest and the fMRI experiment each consisted of five 90-s sessions. Only two ∆ƒ conditions (∆ƒ≈1/6 and 1/2 octave) were tested, each with a separate group of 12 participants.
Images were obtained with a 1.5-T MRI scanner. Functional images sensitive to the BOLD signal were acquired with a single-shot echo-planar imaging sequence (TR 2 s, TE 48 ms, flip angle 80°, voxel size 3 × 3 × 7 mm, 20 contiguous slices). Data were analyzed with SPM2. Each perceptual transition was modeled as an event convolved with a canonical hemodynamic response function. A fixed-effects model was used to obtain activation maps of subject-specific linear contrasts, which were then entered into a random-effects model to estimate group-averaged activations.
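The event-related model can be illustrated by building a single transition regressor. This is not SPM2 code: it assumes a double-gamma function in the style of SPM’s canonical HRF (shape parameters 6 and 16, undershoot scaled by 1/6, 1-s time scale), samples it once per TR of 2 s without SPM’s finer time grid or normalization, and uses hypothetical transition times; the function names are illustrative.

```python
import math


def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density; zero for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t / scale) / (math.gamma(shape) * scale ** shape)


def canonical_hrf(t):
    """Double-gamma HRF in the style of SPM's canonical HRF.

    Assumed parameters: response gamma with shape 6, undershoot gamma with
    shape 16, undershoot scaled by 1/6, both with a 1-s time scale.
    """
    return gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0


def transition_regressor(onsets_s, n_scans, tr_s=2.0):
    """Predicted BOLD response at each scan: a sum of HRFs, one per transition onset."""
    return [
        sum(canonical_hrf(k * tr_s - onset) for onset in onsets_s)
        for k in range(n_scans)
    ]


# A 90-s session at TR = 2 s gives 45 scans; the transition times are hypothetical.
n_scans = int(90 / 2.0)
regressor = transition_regressor([20.0, 55.0], n_scans)
print(n_scans, [round(v, 3) for v in regressor[9:20]])
```

The predicted response peaks about 5 to 6 s after each transition and dips below zero roughly 15 s after it, which is why transitions spaced closely in time would be hard to separate, and why the pretest favored listeners with long intervals between transitions.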
3.2 Results and Discussion
The behavioral data from the pretest and the fMRI experiment closely replicated the essential features of the psychophysical data described in Sect. 2.
In the ∆ƒ≈1/2 octave condition, significant activation synchronized with the perceptual transitions was observed in the auditory areas (BA42), the superior temporal sulcus (BA21/22), and the posterior insular cortex. In the ∆ƒ≈1/6 octave condition, the supramarginal gyrus (BA40), the left intraparietal sulcus (BA7), and the thalamus were activated in addition to the regions activated in the ∆ƒ≈1/2 octave condition. Figure 6 shows regions activated
Fig. 6 Regions activated when the percept changed from “one stream” to “two streams” (bottom) and in the opposite direction (top) in the ∆ƒ≈1/6 octave condition (left) and in the ∆ƒ≈1/2 octave condition (right) (z=0, N=12)
