Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:

Учебники / Hearing - From Sensory Processing to Perception Kollmeier 2007

.pdf
Скачиваний:
160
Добавлен:
07.06.2016
Размер:
6.36 Mб
Скачать

Human Auditory Cortical Processing of Transitions Between ‘Order’ and ‘Disorder’

325

followed by a 600-ms post-transition segment (either random or constant). Controls were 1440 ms random or constant throughout. Random sequences contained successive tone pips of a random frequency. We used three pip durations (15, 30 and 60 ms) which were presented by blocks. Frequencies were drawn from 20 frequency values equally spaced on a log scale between 222 and 2000 Hz. Tone and pip onsets and offsets were ramped with 3-ms cosine-squared ramps. Presentation order was randomized with an interstimulus interval between 600 and 1400 ms.

The stimulus set also included a proportion (25–33%) of target stimuli (not analyzed), which subjects were required to rapidly respond to. These stimuli served to keep the listeners alert and attentive but did not involve IAC or tone-change processing.

Neuromagnetic recording and analysis - the magnetic signals were recorded using a 160-channel, whole-head axial gradiometer system (KIT, Kanazawa, Japan). The 20 strongest channels over the temporal regions of the head of each subject (5 sinks and 5 sources in each hemisphere) were determined in a pre-experiment (see Chait et al. 2005).

Two measures of dynamics of cortical processing are reported: the amplitude time course (increases and decreases in activation) as reflected in the root mean square (RMS) of the selected channels, and the accompanying spatial distributions of the magnetic field (contour plots) at certain times post onset. For illustration purposes, we plot the RMS derived from the grand-average (average over all subjects for each of the channels) but the statistical analysis is always performed on a subject-by-subject, hemisphere by hemisphere, basis, using the RMS values of the 10 channels chosen for each subject in each hemisphere.

For each of the stimulus conditions 1500-ms epochs (including 200-ms pre onset) were created. Approximately 100 epochs of each condition were averaged, low-pass filtered at 30 Hz, and base-line corrected to the pre-onset interval. The consistency, across subjects, of peaks and magnetic field distributions was assessed automatically as described in Chait et al. (2005).

3IAC Experiment Results

Clear transition responses were recorded from auditory cortex even though listeners performed a task that is irrelevant to IAC change. In Chait et al. (2005) we show that this pre-attentive cortical sensitivity mirrors psychophysicallymeasured sensitivity. Results for all conditions (including intermediate values of IAC) are reported in Chait et al. (2005). Here we concentrate on transitions between extreme values of IAC (1 and 0), denoted as 1→0 and 0→1. The auditory evoked responses to 1→0 and 0→1 transitions are shown in Fig. 2A,B. The origin of the time scale coincides with the onset of the signals and the change is introduced at 800 ms post onset. While the onset responses for

326

M. Chait et al.

Fig. 2 Measured data. RMS magnetic field measured in the IAC experiment (left) and tone experiment (right). Plotted in black is the root mean square (RMS) magnetic field over all 157 channels derived from the grand-average (average over subjects and stimulus repetitions) of the evoked auditory cortical field. The upper plot in each column corresponds to the order-to-disorder transition (correlated to uncorrelated, or constant to random). The bottom plots correspond to the opposite transition. The appropriate control conditions (no change) are plotted in grey. The onset of the stimulus and onset of change are marked with dashed lines. Contour plots indicate the topography of the magnetic field at critical time periods. Source = white, Sink = black. Onset response patterns differed between experiments. In the IAC experiment (left), stimulus onset responses to correlated (A) and uncorrelated (B) noise stimuli were comparable, and characterized by two peaks at approximately 70 ms and 170 ms post onset with similar magnetic field distributions (analogous to the classic M50 and M150 onset responses). In the tone experiment (right), onset responses were characterized by a single peak at approximately 110 ms post onset, with a topography characteristic of the classic M100 response. In contrast to the onset responses, transition responses were remarkably similar between experiments (compare A and C, B and D). Both experiments revealed the same asymmetry. Transitions from order to disorder (A and C) were characterized by peaks at ~70 ms and ~150 ms post change onset whereas transitions from disorder to order (B,D) were dominated by a single peak at around 150 ms. Furthermore, whereas both peaks in A and C exhibited an M50-like topography, the single peak in B and D was characterized by an opposite (M100 like) dipolar distribution

Human Auditory Cortical Processing of Transitions Between ‘Order’ and ‘Disorder’

327

correlated and uncorrelated noise are similar in temporal dynamics (two peaks at about 50 and 150 ms post onset) and spatial distribution, the responses to the IAC transitions (shaded) are markedly different. The change in the 1→0 condition (Fig 2A) is characterized by two peaks at ~70 ms and ~150 ms post change. In contrast, the opposite, 0→1 transition, evoked only one peak at about 130 ms. Remarkably, the dipolar distribution of the peak in the 0→1 stimulus is of opposite polarity to the two peaks in the 1→0 condition, indicating that this response cannot be a delayed activation of the neural substrates responsible for the first peak in the 1→0 condition, and is most probably generated by a different neural mechanism.

4Tone Experiment Results

The auditory evoked responses to constant-to-random and random-to- constant 300-ms pip-size stimuli are shown in Fig. 2C,D (results for other conditions and regarding cortical adjustment to pip-size are reported elsewhere; Chait et al. 2007). The change is introduced at 840 ms post onset. The MEG activity evoked by the stimuli exhibits an onset response, about 100 ms after the onset of the stimuli and a later response related to processing the change, which begins at about 900 ms post onset (60 ms post change). Unlike the onset responses which are very similar across conditions, transition responses are distinctly disparate in their temporal dynamics and field distributions. The change from a constant tone to a random sequence of tone pips evokes two consecutive deflections, at about 70 and 150 ms post change onset. The opposite transition evokes only one peak, occurring about 150 ms post change onset. As in the IAC data, the dipolar distribution of the transition response peak in Fig. 2D is of opposite polarity from the transition response peaks in Fig. 2C indicating that activity results from a different neural substrate. Thus the response to the ‘random-to-constant’ transition does not merely reflect delayed activation compared to the ‘constant-to-random’ transition. Rather, the data suggest that the sequence of cortical activation is distinct in each case: processing of transitions in each direction involves different neural computations.

5Discussion

‘Binaural sluggishness’ refers to the apparent insensitivity of the binaural system to rapidly varying interaural configuration. For example, it has been demonstrated that listeners become less sensitive to time-varying changes in interaural correlation as the change rate is increased (Grantham 1982). This ‘sluggishness’ is assumed to reflect the existence of a ‘binaural integration

328

M. Chait et al.

window’ that operates subsequent to the site of binaural interaction, i.e. at or above the IC. However, Joris et al. (2006) found no evidence of sluggishness in responses of IC cells to stimuli with dynamic interaural correlation. It is thus possible that the mechanisms we observe reflect the operation of this integration process. However, the similarity between the tone and IAC data presented here point to the fact that this mechanism is not special to binaural processing. The resemblance suggests that the responses we observe reflect the operation of a general auditory cortical change-detection mechanism, which handles in a common fashion changes along very different stimulus dimensions and specifically transitions between irregularity and regularity. Therefore ‘Binaural sluggishness’ might not be a result of special binaural integration windows but may in fact result from the same integration mechanisms that process monaural data. The apparently larger integration for binaural than monaural stimuli (e.g. Kollmeier and Gilkey 1990) may stem from differences in the statistics of the two kinds of signals. By searching for a monaural stimulus (for example by adjusting the pip size/distribution properties of our tone stimuli) that results in the same response properties (peak latencies) as the binaural stimuli in the IAC experiment, we may be able to learn more about how binaural information is centrally represented.

Change detection in humans is usually investigated with the MMN paradigm (Polich 2003), which is based on comparing brain responses to deviant (low probability) signals presented among standard (high probability) signals. The MMN response (derived by subtraction between the responses to standard and deviants) peaks between 100 and 200 ms and is interpreted as reflecting a process that registers a change in a sound feature and updates the ongoing representation (Winkler et al. 1996). MMN experiments are typically conducted with silent intervals between relevant stimuli, so that the time at which a stimulus is compared to the preceding one is defined by the experimenter. The experiments here target a stage that directly precedes the one probed with standard MMN techniques: In natural environments changes are superimposed on the continuous waveform that enters the ear and a listener thus needs a mechanism to decide at which point in a continuous sound the change is introduced.

In transitions such as correlated-to-uncorrelated or constant-to-random (Fig. 1A) the change can be detected immediately by the system – the first waveform sample that violates the regularity rule suffices to signal the transition. However in the opposite transition - from uncorrelated to correlated or from random-to-constant (Fig. 1B) the system must wait long enough to distinguish the transition from a momentary ‘lull’ in the fluctuation, such as might occur by chance. The amount of time an optimal listener would have to wait in order to detect the change depends on the statistical properties of the stimulus. This may explain the finding that the two kinds of transitions are handled by different cortical systems: The first deflection at ~70 ms post change, with M50 like dipolar distribution that occurs in situation when the change is immediately detectable (Fig. 2A,C) may be reflecting the operation of an ‘obligatory cortical integration window’. The response at around 100–150 ms with the M100 like

Human Auditory Cortical Processing of Transitions Between ‘Order’ and ‘Disorder’

329

dipolar distribution may reflect the operation of another, ‘adjustable window’, provided by a separate neural substrate, which integrates incoming information to reach a sufficient level of certainty that a change has occurred.

Interestingly, these transition responses are similar to the properties of the ‘pitch onset response’ (POR; Krumbholz et al. 2003; Gutschalk et al. 2004; Ritter et al. 2005; Chait et al. 2006). The POR, hypothesized to reflect cortical pitch processing mechanisms, is evoked by transitions between irregular click trains, which do not have a pitch and regular click trains which are perceived to have a sustained temporal pitch (Gutschalk et al. 2004) or by transitions between white noise and iterated rippled noise (Krumbholz et al. 2003). However, it is also possible to describe the shift as a transition between states that differ along a more abstract dimension, such as degree of ‘regularity’ or ‘order’. Our stimuli, whether constant/random tones or correlated /uncorrelated noise, are very simple examples of ‘order’ and ‘disorder’. We have recently replicated this pattern of results with regularly alternating pip sequences (rather than a constant tone) and speculate that more complex stimuli involving transitions between regularity and irregularity should evoked similar response patterns to the ones observed here.

Future experiments are needed to test the generality of the present findings, to clarify the relation between behavioral and brain responses to change and to understand better the rules that determine factors such as integration time. In addition to studying the dynamics of change detection, the paradigm introduced here may serve as a methodological tool to explore the perceptual relevance of various sound dimensions: are changes in different features, which are relevant to auditory objects (e.g. loudness, ITD, ILD, pitch) processed in the same way and equally quickly? Such an investigation may shed light on their relative importance in determining the emergence of auditory objects.

Acknowledgments. This work was supported by R01 DC05660 to DP. We thank Alain de Cheveigne’ for comments and discussion and Jeff Walker for excellent MEG technical support.

References

Chait M, Poeppel D, de Cheveigne A, Simon JZ (2005) Human auditory cortical processing of changes in interaural correlation. J Neurosci 25:8518–8527

Chait M, Poeppel D, Simon JZ (2006) Neural response correlates of detection of monaurally and binaurally created pitches in humans. Cereb Cortex 16:835–848

Chait M, Poeppel D, de Cheveigne’ A, Simon J (2007) Processing asymmetry of transitions between order and disorder in human auditory cortex. J Neurosci 27:5207–5214

Durlach NI (1963) Equalization and cancellation theory of binaural masking-level differences J Acoust Soc Am 35:1206–1218

Grantham DW (1982) Detectability of time-varying interaural correlation in narrow-band noise stimuli. J Acoust Soc Am 72:1178–1184

Gutschalk A, Patterson RD, Scherg M, Uppenkamp S, Rupp A (2004) Temporal dynamics of pitch in human auditory cortex. Neuroimage 22:755–766

Joris PX, van de Sande B, Recio-Spinoso A, van der Heijden M (2006) Auditory midbrain and nerve responses to sinusoidal variations in interaural correlation. J Neurosci 26:279–289

330

M. Chait et al.

Kollmeier B, Gilkey RH (1990) Binaural forward and backward masking: evidence for sluggishness in binaural detection. J Acoust Soc Am 87:1709–1719

Krumbholz K, Patterson RD, Seither-Preisler A, Lammertmann C, Lütkenhöner B (2003) Neuromagnetic evidence for a pitch processing center in Heschl’s gyrus. Cereb Cortex 13:765–772

Polich J (2003) Detection of change: event related potential and fMRI findings. Kluwer Academic Press, Boston, MA

Ritter S, Gunter Dosch H, Specht HJ, Rupp A (2005) Neuromagnetic responses reflect the temporal pitch change of regular interval sounds. Neuroimage 27:533–543

Winkler I, Karmos G, Naatanen R (1996) Adaptive modeling of the unattended acoustic environment reflected in the mismatch negativity event-related potential. Brain Res 742:239–252

Comment by Duifhuis

Inspecting your results as presented in Fig. 2 (and supported by the additional data that you presented, it seems to me that the response to the 1→0 condition (A) is very similar to the negative response to the 0→1 transition (B), and vice versa. In addition, the MEG profile indicates opposite directions. Thus the question arises if you have to consider the differences as results from different neural mechanisms, or just as the ‘opposite’ response of the same mechanism to the ‘opposite’ stimulus. The resulting positive peak then automatically comes later in B than in A, but could be generated by the same neural mechanism.

Reply

Figure 2 shows the root mean square of the magnetic field (computed over channels) so it is not meaningful to refer to the ‘negative’ of the response. The troughs of the RMS correspond to a lack of response (not a negative response) and therefore it cannot be flipped.

Comment by Riedel

EEG studies by Dajani and Picton (2006) and Lüddemann et al. (current volume) did not reveal strong differences in the evoked potential to changes from correlated to uncorrelated broad band noise and vice versa. Both conditions showed components P1, N1 and P2 with comparable amplitudes at similar latencies. Could you comment on possible reasons for the difference to your data?

References

Dajani HR, Picton TW (2006) Human auditory steady-state responses to changes in Interaural correlation. Hear Res 219:85–100

Human Auditory Cortical Processing of Transitions Between ‘Order’ and ‘Disorder’

331

Reply

Our data for detection of change in interaural correlation (IAC) are in line with several subsequent experiments investigating transitions between ‘order’ and ‘disorder’ in the frequency domain. The response asymmetry we observe in all of these experiments is consistent with MEG data for transitions between irregular and regular click trains (Gutschalk et al. 2002, 2004) and between noise and IRN (Lutkenhoner et al., in preparation; Rupp et al. 2005; Krumbholz et al. 2003), as well as EEG data for transitions between randomly alternating and constant tones (Jones 2002). This consistency supports our interpretation of the responses, as reflecting different demands on temporal integration in the course of change-detection, and specifically for transitions between regularity or ‘order’ and irregularity or ‘disorder’.

The discrepancy between your data and ours probably arises from the different procedures of stimulus presentation you (as well as Dajani and Picton 2006) employed. Specifically, in your experiments segments with different IAC are regularly alternating (change occurs exactly every 800 ms) such that the task faced by the auditory cortex is not change detection per se. There is much evidence in the MEG/EEG literature indicating that in cases where change is expected at a particular point in time (e.g. Lange et al. 2003) auditory cortex reacts faster and differently comparing to situations where change is unexpected (e.g. our studies, or the MMN literature).

It is clear however that the only way to resolve this discrepancy is investigating it experimentally via replication. We intend to do that at the earliest opportunity. Late update (June, 2007): see Chait et al (in press) for such a replication.

References

Chait M, Poeppel D, Simon JZ (in press) Stimulus contexts affects auditory cortical responses to changes in interaural correlation. J Neurophysiol

Dajani HR, Picton TW (2006) Human auditory steady-state responses to changes in Interaural correlation. Hear Res 219:85–100

Gutschalk A, Patterson RD, Rupp A, Uppenkamp S, Scherg M (2002) Sustained magnetic fields reveal separate sites for sound level and temporal regularity in human auditory cortex. Neuroimage 15:207–216

Gutschalk A, Patterson RD, Scherg M, Uppenkamp S, Rupp A (2004) Temporal dynamics of pitch in human auditory cortex. Neuroimage 22:755–766

Jones SJ (2002)The internal auditory clock: what can evoked potentials reveal about the analysis of temporal sound patterns, and abnormal states of consciousness? Neurophysiol Clin 32:241-253

Lange K, Rosler F, Roder B (2003) Early processing stages are modulated when auditory stimuli are presented at an attended moment in time: an event-related potential study. Psychophysiology 40:806-817

Rupp A, Uppenkamp S, Bailes J, Gutschalk A, Patterson RD (2005) Time constants in temporal pitch extraction: a comparison of psychophysical and magnetic data. In: Pressnitzer D, de Cheveigné A, McAdams S, Collet L (eds) Auditory signal processing: physiology, psychoacoustics, and models. Springer, Berlin Heidelberg New York, pp 119–125

36 Wideband Inhibition Modulates the Effect of Onset Asynchrony as a Grouping Cue

BRIAN ROBERTS1, STEPHEN D. HOLMES1, STEFAN BLEECK2, AND IAN M. WINTER2

1Introduction

Onset asynchrony is arguably the most powerful grouping cue for the separation of temporally overlapping sounds (see Bregman 1990). A component that begins only 30–50 ms before the others makes a greatly reduced contribution to the timbre of a complex tone, or to the phonetic quality of a vowel (e.g. Darwin 1984). This effect of onset asynchrony does not necessarily imply a cognitive grouping process; instead it may result from peripheral adaptation in the response to the leading component in the few tens of milliseconds before the other components begin (e.g., Westerman and Smith 1984). However, two findings suggest that the effect of onset asynchrony cannot be explained entirely by peripheral adaptation. First, though the effect is smaller, the contribution of a component to the phonetic quality of a short-duration vowel is reduced when it ends after the other components (Darwin and Sutherland 1984; Roberts and Moore 1991). Second, the reduced contribution of a leading component to vowel quality can be partly restored by accompanying its leading portion with an additional pure tone (Darwin and Sutherland 1984). This extra sound, called the captor tone, was set to begin at the same time as the leading component, to end at vowel onset, and to be an octave higher than the leading component. The partial restoration, of about one third, cannot be attributed either to peripheral adaptation or to two-tone suppression, because the captor tone was too remote in frequency from that of the leading component. Instead, Darwin and Sutherland (1984) attributed this restoration to the formation of a perceptual group between the leading portion and the captor, based on their shared onset time and harmonic relations, leaving the remainder of the component to integrate into the vowel percept. This interpretation has been highly influential in the grouping literature, but to date has not been tested. The experiments reported here have used both psychophysical and physiological approaches to evaluate Darwin and Sutherland’s interpretation.

1Psychology, School of Life and Health Sciences, Aston University, Birmingham, UK, b.roberts @aston.ac.uk, s.d.holmes@aston.ac.uk

2Centre for the Neural Basis of Hearing, Physiological Laboratory, University of Cambridge, Cambridge, UK, bleeck@gmail.com; imw1001@cam.ac.uk

Hearing – From Sensory Processing to Perception

B. Kollmeier, G. Klump, V. Hohmann, U. Langemann, M. Mauermann, S. Uppenkamp, and J. Verhey (Eds.) © Springer-Verlag Berlin Heidelberg 2007

334

B. Roberts et al.

2Psychophysical Methods

The extent to which extra energy added to the first-formant (F1) region integrates into the percept of a vowel was investigated, under various conditions, using a paradigm developed by Darwin (1984). It is possible to construct a continuum of vowels, differing only in their F1 frequencies, that spans the boundary between the phonemes /Ι/ and /ε/. In terms of the F1 values used to synthesize the vowels, the position of the phoneme boundary along this continuum will shift downwards if the effect of adding energy to a harmonic raises the perceived F1 frequency, and vice versa. For a fixed increase in the level of a harmonic near F1, the effect on perceptual integration of introducing an asynchrony or adding a captor can be assessed by comparing the size of the boundary shifts across conditions.

All vowels were synthesized on a fundamental (F0) frequency of 125 Hz and were 56 ms long (including rise/fall times of 16 ms). The F1 frequency ranged from 350 to 550 Hz (in 10-Hz steps) and the bandwidth was 90 Hz. The frequencies of formants 2–5 were set to 2300, 2900, 3800, and 4600 Hz, with bandwidths of 120, 170, 1000, and 1000 Hz. The standard stimulus arrangements used are illustrated in Fig. 1. The basic continuum, normalized so that each token was

Fig. 1 Schematic spectrograms showing stimuli from the four standard conditions

Wideband Inhibition Modulates the Effect of Onset Asynchrony as a Grouping Cue

335

heard diotically at 69 dB SPL, comprised the vowel-alone condition. The level of the 4th harmonic could be boosted by adding in-phase a 500-Hz tone set to 6 dB above the original level of that harmonic. The extra tone could be added in synchrony with the vowel (incremented-4th condition) or extended to begin before the vowel (leading-4th condition). Each captor condition was created by accompanying the leading portion of the added 500-Hz tone with a captor, whose onset time and harmonic status could be varied relative to the leading portion. The captor was set to the same level as that of the added tone, except in one experiment in which the relative level of the captor was varied. Each captor condition had a corresponding captor-control condition that differed from its counterpart only in that the leading portion of the added 500-Hz tone was removed. These conditions were included to control for any effect of the captor other than through its interaction with the leading portion of the added tone.

After initial training, listeners in each experiment (N = 12) heard ten randomized blocks of tokens from each condition and indicated on each trial whether the vowel token was perceived as /Ι/ or /ε/. The position of the phoneme boundary in each condition for each listener was estimated using probit analysis, and these boundary estimates were compared. The main purpose of the experiments was to measure the extent to which the asynchronyinduced reduction in the effect of the added 500-Hz tone is reversed by different kinds of captor. To this end, a measure called the restoration effect was computed. It was defined as the difference in Hz between the boundaries for the leading-4th and incremented-4th conditions minus the difference between the boundaries for the captor condition and its control. The efficacy of the different captors was compared using these restoration values. Each restoration effect can also be expressed as a percentage of the boundary difference between the leading-4th and incremented-4th conditions.

3Psychophysical Findings: Part 1

Darwin and Sutherland’s (1984) key finding that accompanying the leading portion of the added 500-Hz tone with a synchronous 1-kHz captor partly reverses the effect of the asynchrony was replicated in a series of experiments. However, the effects of varying the properties of the captor indicate that the restoration effect does not depend on either a shared onset time or a harmonic relationship between the captor and the leading portion of the added tone (Roberts and Holmes 2006).

Figure 2 shows the boundary positions (means and inter-subject standard errors) and restoration values for an experiment in which the leadtime on the 4th harmonic was long (240 ms), and where the captor tone (1 kHz) either began at the same time as the leading component, or 160 ms before or after it. The overall mean restoration effect was about 35%, and the three types of captor were similarly effective in reducing the asyn- chrony-induced exclusion of the added 500-Hz tone. As a precaution