
D. Hall and C. Plack

Comment by Yost

I liked your use of Huggins pitch to probe for pitch processing centres for many of the same reasons you stated in your paper. I have two questions.

1. Did you attempt to control for the fact that the Huggins stimulus condition produces both a pitch at the region of the interaural phase shift and a shift in the lateralized position of the pitch image? That is, an fMRI response might be due to the pitch and/or the laterality associated with the Huggins pitch stimulus.

2. Given the very weak pitch strength of Huggins pitch, I was surprised to see that the fMRI response to this stimulus was one of the strongest. Do you have an explanation for why the fMRI response was so strong for the Huggins pitch stimulus?

Reply

We did not make any attempt to control for the lateralisation cues in our Huggins stimulus, and we think that this would be hard to do, partly because the percept is not consistent across listeners. All our stimuli were compromised to some extent, since it is impossible to produce a pitch stimulus without introducing spectral, temporal, or spatial features that might be identified by a cortical mechanism not specific to pitch. Our approach was to try to find a single cortical locus that responded to all our stimuli, and hence might be a candidate locus for the common feature of pitch.

The response to Huggins was distributed in a similar manner to our other pitch stimuli. We did not find any relationship between response size and salience and we have no explanation at present for why this is the case.

Comment by de Cheveigné

I really like this study. The level of rigor and care is refreshing, and the sobering results are possibly more exciting than those of less controlled studies. In your talk you mention that your pitch-producing stimuli failed to activate the areas that have been identified as a ‘pitch centre’ in studies that used ‘iterated rippled noise’ (IRN) stimuli. IRN is physically similar to a random-phase harmonic complex (for a large number of iterations), and it evokes a clear pitch, so the discrepancy is puzzling. This comment points to some properties of IRN that might explain the paradox.

Searching for a Pitch Centre in Human Auditory Cortex

IRN is obtained by delaying noise by multiples of a time interval T, and adding up the delayed signals. An IRN of order N is the sum of N copies of the same noise with delays of 0 to N−1 times the interval (‘period’). IRN is quasi-periodic: the difference from one period to the next amounts to only about 1/N times the power (the rest being identical between periods). Over several periods the difference gets larger, and after N periods the waveform has been completely ‘renewed’. For large N, the period-to-period difference is small and the renewal slow, and so the stimulus is very much like a periodic tone, with a period derived from a ‘chunk’ of noise.
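The delay-and-add construction described above can be sketched in a few lines. This is only an illustration under stated assumptions: equal-weight copies as defined in this comment, delays expressed in samples, and Gaussian noise from the standard library. The function names `make_irn` and `normalized_autocorr` are hypothetical and do not come from any of the studies discussed.

```python
import random


def make_irn(num_samples, delay_samples, order, seed=0):
    """Sum `order` copies of one Gaussian noise, delayed by 0..order-1 periods.

    Assumption: equal weights for all copies, as in the definition above.
    """
    rng = random.Random(seed)
    offset = (order - 1) * delay_samples          # room for the longest delay
    noise = [rng.gauss(0.0, 1.0) for _ in range(num_samples + offset)]
    return [
        sum(noise[offset + n - k * delay_samples] for k in range(order))
        for n in range(num_samples)
    ]


def normalized_autocorr(x, lag):
    """Correlation between the signal and itself shifted by `lag` samples (lag > 0)."""
    num = sum(a * b for a, b in zip(x[:-lag], x[lag:]))
    den = sum(a * a for a in x)
    return num / den


if __name__ == "__main__":
    delay = 100
    for order in (1, 2, 16):
        irn = make_irn(20000, delay, order)
        r = normalized_autocorr(irn, delay)
        print(f"N={order:2d}  correlation one period apart = {r:.3f}")
```

Under these assumptions the correlation between successive periods is expected to approach (N−1)/N, so it grows towards 1 as N increases, which is the sense in which the stimulus becomes ‘very much like a periodic tone’.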

The delay-and-add process also affects the spectrum. For N=2 (‘repetition noise’) the long-term power spectrum is shaped as a raised cosine with peaks at multiples of 1/T. For large N the peaks become narrower and in the limit of large N the spectrum (calculated over any finite window) becomes similar to the line-spectrum of a harmonic complex tone, but with two qualifications: the phase of each harmonic is drawn from a uniform distribution over [0, 2π], and its amplitude is drawn from a Rayleigh distribution. In other words, IRN has an irregular spectral envelope, somewhat akin to that of a ‘vowel’. This envelope fluctuates slowly over time (the larger N is, the more slowly) and this may induce perceptible fluctuations in the timbre of the stimulus over the duration of a stimulus, or from repetition to repetition. The non-flat spectral envelope of IRN, or its evolution over time, could set it apart from other pitch-producing stimuli, and sensitivity to these aspects could explain why a ‘pitch centre’ would respond only to IRN.
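The shape of the long-term spectrum described above follows from treating delay-and-add as a linear filter applied to the noise. As a sketch, assuming equal-weight copies as defined in this comment, and writing H_N(f) for the filter's transfer function (a symbol introduced here for illustration only):

```latex
H_N(f) = \sum_{k=0}^{N-1} e^{-j 2\pi f k T},
\qquad
|H_N(f)|^2 = \frac{\sin^2(\pi f N T)}{\sin^2(\pi f T)}

% Special case N = 2: a raised cosine with peaks at multiples of 1/T
|H_2(f)|^2 = 2 + 2\cos(2\pi f T)
```

The peaks sit at f = m/T with height N², and their width shrinks roughly as 1/(NT), which is why the spectrum approaches a line spectrum as N grows.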

IRN is sometimes presented as an ideal stimulus for pitch studies, combining the virtues of white noise (lack of spectral structure) with a temporal structure sufficient to evoke pitch. This is incorrect: IRN offers spectral cues that are as clear as its temporal cues are clear (and its pitch salient). They may be made unusable by high-pass filtering and masking of combination tones, but IRN does not differ significantly in this respect from a complex tone. The number of iterations is convenient to manipulate pitch salience, but a similar effect may be obtained by adding a controlled amount of noise to a complex tone. IRN offers a non-flat spectral envelope, akin to that derived by sampling a period-length chunk of ongoing noise, but this does not make its timbre ‘noiselike’: for real noise this spectral shape would fade instantly, whereas for IRN it persists over relatively long periods of time. In this respect, IRN is akin to the result of exciting with a harmonic source a resonator with a random, slowly fluctuating transfer function. These properties covary with those that determine pitch and pitch strength, and with the random choice of noise, and thus the IRN stimulus is hard to control parametrically. It is not clear that the peculiar properties of IRN are an advantage, and thus one can question its systematic use in studies of pitch.

Reply

We were concerned that our pitch-evoking stimuli produced a different response pattern to that seen in previous fMRI studies using IRN. We followed the experiment described in the paper with a second experiment in which we presented IRN (10-ms delay, 16 iterations) to a subset of the original listeners.


Our results were consistent with the earlier studies, showing activation in anterolateral HG, and also planum temporale. The IRN effect in planum temporale was broadly consistent with that produced by our other pitch stimuli. Hence, it is possible that there is some feature of IRN not found in our other pitch stimuli that produces activation in anterolateral HG.

Thank you for your excellent summary of the spectro-temporal features of IRN that may underlie this effect. After reading your comment, we passed our stimuli through a cochlear model. Although the peaks spaced at frequency intervals of 1/delay were not resolved (since they correspond to harmonic numbers of 10–20), broader slowly-varying spectral features were clearly present in the model output produced by the IRN stimulus compared to that produced by the noise control.

Comment on de Cheveigné’s Comment to Hall and Plack by Yost and Patterson

It would probably be useful to separate imaging studies from psychoacoustical studies when discussing the utility of using IRN to study pitch processing. In fMRI, it is necessary to use a subtractive method to help ensure that the stimulus feature of interest is the one that leads to an increased BOLD signal. As a result, it is important to control for all relevant stimulus features if many exist. In this regard Alain’s concern about the multiple stimulus features of IRN is relevant. However, the major IRN feature is the temporal regularity in the temporal fine structure, which is highly correlated with psychophysically measured pitch and pitch strength, and there is good evidence that this temporal regularity is processed by the auditory system (e.g., Yost et al. 1998; Patterson et al. 2000; Krumbholz et al. 2003). In some psychophysical studies, such as pitch matching, it seems unlikely that the variables mentioned in Alain’s comment would play a role, given the robust pitch of IRN. His points may pertain to discrimination experiments, such as those often used in pitch-strength measurements. That is, IRN features other than those thought to control pitch strength may have an influence on estimates of pitch strength. However, here it is useful to consider both the perceptual salience of different IRN stimulus features and what we currently know about auditory processing. It appears that Alain is concerned that changes in ‘timbre’ associated with the delay-and-add process may provide discrimination cues that might confound the results, especially when this process is iterated. Alain uses the very large N (number of iterations) case to illustrate his points. The effects he describes decrease as N decreases, and in many IRN studies N is relatively small at 8 or fewer. IRN pitch does not change with N, but pitch strength does. However, even for N = 1, the pitch strength of IRN is substantial and any timbre differences are subtle.

Alain does not mention the role that the temporal envelope may play in pitch processing. IRN for N of 8 or fewer has a flatter envelope than all of the other stimuli used to study pitch. So, the use of IRN suggests that envelope cues may not be sufficient, and may not be necessary, for complex pitch processing.

We certainly encourage the generation of other stimuli to probe pitch processing, as suggested in Alain’s comment. However, until these other stimuli are specified and are shown to be better in some way than IRN, we do not believe that the use of IRN should be discontinued as one of the stimuli used to study pitch processing. We agree that more needs to be done to determine which neural centres are involved in complex pitch processing.

References

Krumbholz K, Patterson RD, Nobbe A, Fastl H (2003) Microsecond temporal resolution in monaural hearing without spectral cues? J Acoust Soc Am 113(5):2790–2800

Patterson RD, Yost WA, Handel S, Datta JA (2000) The perceptual tone/noise ratio of merged iterated rippled noises. J Acoust Soc Am 107:1578–1588

Yost WA, Patterson RD, Sheft S (1998) The role of the envelope in processing iterated rippled noise. J Acoust Soc Am 104:2349–2361

11 Imaging Temporal Pitch Processing in the Auditory Pathway

ROY D. PATTERSON1, ALEXANDER GUTSCHALK2, ANNEMARIE SEITHER-PREISLER3, AND KATRIN KRUMBHOLZ4

1 Introduction

Physiological studies suggest that the processing of temporal regularity begins in the brainstem (e.g., Palmer and Winter 1992). This implies a hierarchy of temporal pitch processing in the auditory pathway, as would be expected from computational models of auditory perception (e.g., Patterson et al. 1995; Pressnitzer et al. 2001). This chapter reports a series of brain imaging studies designed to search for evidence of the hierarchy.

2 Imaging Temporal Pitch Processing with PET

There is an early positron emission tomography (PET) study of temporal pitch processing by Griffiths et al. (1998), who used Regular Interval Sounds (RIS) (Yost et al. 1996) to produce a spectrally balanced set of stimuli with varying pitch strength. A delay-and-add technique is used to produce a concentration of one time interval in what is otherwise a broadband noise. As the degree of regularity increases, the hiss of the noise dies away and a pitch corresponding to the reciprocal of the delay increases in strength to the point where it dominates the perception. With appropriate high-pass filtering, these RIS produce essentially uniform excitation across the tonotopic dimension of neural activity in the auditory pathway (see Fig. 1 of Patterson et al. 2002). RIS are useful in imaging because they enable one to generate sets of spectrally matched stimuli that enhance the sensitivity of perceptual contrasts in functional imaging. A brief comparison of spectral and temporal models of pitch for brain imaging is presented in Griffiths et al. (1998).

In separate conditions, subjects were presented with sequences of stimuli having different, but fixed, values of pitch strength, and brain activation was observed to increase with pitch strength in the anterior region of auditory cortex referred to as Heschl’s gyrus. When conditions with varying pitch value were contrasted with fixed-pitch conditions, differential activation was observed in regions of the temporal lobe clearly anterior and posterior to auditory cortex. The results were interpreted as evidence of a hierarchy of pitch processing in the auditory pathway. The power of PET experiments is severely limited, however, by the amount of radiation that a subject can be exposed to in a year, and so PET has largely been replaced by fMRI and MEG for brain imaging in the auditory system.

1 Centre for the Neural Basis of Hearing, Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK, rdp1@cam.ac.uk

2 Department of Neurology, University of Heidelberg, Heidelberg, Germany, Alexander.Gutschalk@med.uni-heidelberg.de

3 Experimental Audiology, Münster University Hospital, Münster and CSS Institut für Psychologie, Karl-Franzens-Universität, Graz, Germany, Annemarie.Seither-Preisler@uni-graz.at

4 MRC Institute of Hearing Research, University Park, Nottingham, UK, katrin@ihr.mrc.ac.uk

Hearing – From Sensory Processing to Perception

B. Kollmeier, G. Klump, V. Hohmann, U. Langemann, M. Mauermann, S. Uppenkamp, and J. Verhey (Eds.) © Springer-Verlag Berlin Heidelberg 2007

3 Imaging Temporal Pitch Processing with fMRI

Neural tissue draws oxygen from the blood when it is active, and functional Magnetic Resonance Imaging (fMRI) can be used to measure neural activation through the blood-oxygen-level-dependent (BOLD) response. It is a noninvasive technique; however, it has three important limitations: the scanner is very noisy, the subject has to stay very still, and it is difficult to control stimulus fidelity. We begin by describing how these problems are managed.

3.1 Managing fMRI Constraints

The most obvious limitation for auditory studies is the very loud noise that MR scanners make during structural and fMRI image acquisition (such noise can exceed 130 dB; Palmer et al. 1998). One widely used set of techniques overcomes the influence of scanner noise on stimulus presentation during fMRI by temporally separating EPI scanner noise from the experimental sounds, taking advantage of the fact that the peak of the hemodynamic response lags the stimulus by several seconds (e.g., Hall et al. 1999). In such ‘sparse-imaging’ designs, image acquisition follows a silent interval during which the experimental sounds are presented, and it should be as rapid as possible (under 3 s) so that activation within a scan is not contaminated by the noise of that scan. The silent period is often quite long (7–15 s), which ensures that activation data are uncontaminated by scanner noise; however, such procedures are time consuming. Accordingly, efforts are now being made to minimize scanner noise by redesigning the pulse sequence to modify gradient motion and, thus, gradient noise.
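The timing of a sparse-imaging run can be illustrated with a short sketch. It assumes a fixed silent interval followed by a fixed acquisition on every trial; the function name `sparse_schedule` and the example values (9 s of silence, 2.5 s of acquisition) are hypothetical choices within the ranges quoted above, not parameters taken from any particular study.

```python
def sparse_schedule(num_trials, silent_s, acquisition_s):
    """Return one (stimulus_window, acquisition_window) pair per trial, in seconds.

    Assumption: sounds are presented only in the silent interval, and each
    acquisition starts as soon as that interval ends.
    """
    repetition_s = silent_s + acquisition_s
    schedule = []
    for trial in range(num_trials):
        start = trial * repetition_s
        stimulus_window = (start, start + silent_s)
        acquisition_window = (start + silent_s, start + repetition_s)
        schedule.append((stimulus_window, acquisition_window))
    return schedule


if __name__ == "__main__":
    for stim, acq in sparse_schedule(num_trials=3, silent_s=9.0, acquisition_s=2.5):
        print(f"sound {stim[0]:5.1f}-{stim[1]:5.1f} s   scan {acq[0]:5.1f}-{acq[1]:5.1f} s")
```

With these example values each trial takes 11.5 s but yields only 2.5 s of acquisition, which makes concrete why such procedures are time consuming.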

Any movement of the structures imaged, either within a scan or between scans, will blur the image and reduce sensitivity by averaging voxels that have stimulus-related signal with those that do not. The subcortical nuclei of the auditory system, the CN, SOC and IC, move vertically with the pulsing of the cardiac cycle across distances of up to half their diameter. So, if images are taken at random points of the cardiac cycle, the same point in the imaged volume would actually contain data from different brain structures on different scans, and it might include regions containing ventricle fluid. Fortunately, the temporal resolution of data capture in fMRI is relatively good (as opposed to the hemodynamic response); the data obtained in a single scan comes from a duration of about 20 ms, and so image acquisitions can be synchronized to a particular point in the cardiac cycle, a procedure referred to as ‘cardiac gating’ (Guimaraes et al. 1998). If the scan orientation is axial and the scan proceeds from bottom to top up the auditory pathway, then the scans of the brainstem structures occur shortly after the cardiac trigger in a part of the cycle where the position is predictable and the rate of motion is minimal. Guimaraes et al. (1998) described how this technique was used to image the response to a sinusoid in the CN and IC.
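The idea of synchronizing acquisitions to the cardiac cycle can be sketched as follows. This is a simplified illustration, assuming that each scan is simply postponed to the first cardiac trigger at or after its nominal start time; the function name `gate_to_cardiac` and the trigger times are hypothetical, and real scanner gating involves details not described in the text.

```python
from bisect import bisect_left


def gate_to_cardiac(nominal_times_s, trigger_times_s):
    """Delay each nominal scan start to the first cardiac trigger at or after it.

    `trigger_times_s` must be sorted in ascending order. Scans with no later
    trigger recorded are dropped.
    """
    gated = []
    for nominal in nominal_times_s:
        index = bisect_left(trigger_times_s, nominal)
        if index == len(trigger_times_s):
            break  # no further trigger available
        gated.append(trigger_times_s[index])
    return gated


if __name__ == "__main__":
    triggers = [0.0, 0.8, 1.7, 2.5, 3.4, 4.2]   # hypothetical heartbeat times (s)
    nominal = [0.5, 2.0, 4.0]                   # hypothetical planned scan starts (s)
    print(gate_to_cardiac(nominal, triggers))
```

In this sketch every scan begins at the same phase of the cardiac cycle, at the cost of a scan-to-scan interval that varies with heart rate.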

Finally, there is the problem of stimulus fidelity; the large magnetic field of the scanner precludes the use of headphones with metallic coils, which would disrupt image quality, especially since auditory cortex is close to the ear canal. One solution is to conduct the sound to the subject via plastic tubes from distant speakers, but this restricts the frequency response. These problems prompted the development of magnet-friendly headsets with carbon-fibre leads and either piezo-electric transducers or electrostatic transducers. Both transducers can present stimuli with reasonable bandwidth and fidelity close to the entrance to the ear canal, and they can be mounted in relatively flat, circum-aural ear cups that provide substantial attenuation of the scanner noise.

3.2 Searching for Temporal Pitch Activation

With scanner noise, motion artifact and stimulus fidelity under control, it became possible to image the auditory pathway with fMRI. Griffiths et al. (2001) conducted a study with RIS, and showed that with cardiac gating, sparse imaging and 48 repetitions of each condition, fMRI was sufficiently sensitive to image all of the monaural, subcortical nuclei of the auditory pathway simultaneously. Contrasts between the activation produced by RIS and spectrally matched noise confirmed that temporal pitch processing begins in subcortical structures (CN and IC). At the same time, a contrast between sounds with varying pitch and fixed pitch revealed that changing pitch does not produce more activation than fixed pitch in these regions. The results were interpreted as confirming that pitch processing begins in the brainstem but is not completed there, as suggested in Griffiths et al. (1998).

The processing of pitch and melody by cortical regions in this study was reported in Patterson et al. (2002). The anterior-most transverse temporal gyrus of Heschl (HG) is the landmark for primary auditory cortex (PAC) (Morosan et al. 2001). Both the RIS and the noise produced more activation than silence, bilaterally, in two large clusters of voxels centred in the region on and behind HG, and the individual sound conditions all produced very similar patterns of activation. In PAC, when any sound condition was contrasted with any other, there was no residual activity. The obvious interpretation is that PAC is fully engaged by the processing of any complex sound. When the fixed-pitch condition was contrasted with noise there was differential activation in antero-lateral HG (al-HG) bilaterally, just outside PAC, which was interpreted as a sign of temporal pitch processing. Finally, when activity in the melody conditions was contrasted with that in the fixed-pitch condition, it revealed differential activity in the superior temporal gyrus (STG) below HG, and in planum polare (PP) anterior to HG. Moreover, activity was more pronounced in the right hemisphere. Melody produced about the same level of activity as fixed pitch in HG itself, suggesting that the al-HG region is involved in determining the pitch value and pitch strength, rather than the contour of pitch change across a sequence of notes.

Penagos et al. (2004) extended these results using harmonic complex tones with and without resolved harmonics. With a 3-Tesla scanner, they were able to measure activation in the CN and IC, as well as PAC. There was a correlate of pitch salience in al-HG, which was interpreted as evidence of a pitch hierarchy with pitch-specific processing in al-HG. Warren et al. (2003) contrasted the chroma and height dimensions of pitch and found that, whereas chroma changes produced more activation in al-HG, pitch height changes produced more activation in planum temporale (PT) directly behind al-HG. Recently, Bendor and Wang (2005) demonstrated the presence of cells in marmoset cortex that were sensitive to the low pitch of complex tones. The cells were in an area adjacent to PAC that Bendor and Wang argue is homologous to al-HG in humans.

4 Imaging Temporal Pitch Processing with MEG

Magnetoencephalography (MEG) measures the strength and direction of postsynaptic activity in pyramidal cells of cortex running parallel to the scalp. The main advantage of MEG for the investigation of auditory function is that it has millisecond temporal resolution, so it can follow the temporal dynamics of auditory processing. There is a small, positive deflection of the auditory evoked field (AEF) with a latency in the range 50–90 ms referred to as the P1m, or P50m. However, it does not change amplitude or latency when the pitch strength of a tone is varied in isolation (Gutschalk et al. 2004a). The subsequent negative deflection associated with stimulus change is the most prominent part of the AEF; it appears in the interval between 80 and 150 ms post stimulus onset, and it is referred to as the N1m or N100m. It is a complex response that is generally assumed to represent the aggregate activity of multiple sources in auditory cortex, involved in processing different properties of sound onset.


Forss et al. (1993) showed that the latency of the N1m elicited by a regular click train is inversely related to the pitch of the sound, which suggests that the generators of the N1m are involved in pitch processing. However, as Näätänen and Picton (1987) pointed out, the N1m can be elicited by the onset of almost any kind of sound. So, while it is the case that the latency of the N1m varies with pitch, the response is fundamentally confounded with a large onset response to the energy of the sound. To isolate the pitch component of the N1m, Krumbholz et al. (2003) and Rupp et al. (2005) developed continuous stimulation techniques in which the sound begins with a noise and then, after the initial N1m has passed and the AEF has settled into a sustained response, the fine structure of the noise is regularized without changing the energy or the longer term spectral distribution of the energy. There is a marked perceptual change at the transition from noise to RIS, and it is accompanied by a prominent negative deflection in the magnetic field, referred to as the pitch onset response (POR). The inverse transition, from a RIS to noise, is much more difficult to detect (Uppenkamp et al. 2004; Rupp et al. 2005), and produces virtually no deflection of the AEF. Krumbholz et al. (2003) showed that the latency of the POR varies inversely with the pitch of the RIS, and the magnitude of the response increases with pitch strength. The source of the POR was located in the antero-lateral portion of HG close to the pitch region identified with fMRI by Patterson et al. (2002).

The notes of music and the vowels of speech produce sustained pitch perceptions, and when the duration is 100 ms, or more, they elicit a surface negative sustained field (SF) that rises after the N1m and continues to the end of the sound. Gutschalk et al. (2002) recorded the SF evoked by regular and irregular click trains. By contrasting the activity produced by regular and irregular conditions, they were able to dissociate activity associated with temporal regularity from activity associated with stimulus intensity. Two sources just lateral to PAC were isolated in each hemisphere. The more anterior, located in al-HG, was particularly sensitive to temporal pitch and largely insensitive to stimulus intensity. The more posterior, in PT just behind al-HG, was sensitive to intensity and largely insensitive to pitch. This double dissociation shows that al-HG also produces a sustained pitch response (SPR). The generators of the POR and SPR, on the one hand, and the components of the N1m and the sustained field that are indifferent to regularity, on the other hand, appear to arise from differentiable, but overlapping sites (Gutschalk et al. 2004a). The existence of a SPR as well as a POR in al-HG has now been confirmed in a succession of MEG studies (Gutschalk et al. 2004a, b, 2006; Seither-Preisler et al. 2004, 2006a, b).

The latencies of the POR and the SPR are both surprisingly long. The peak latency of the POR is about 120 ms plus four times the ‘period’ of the RI sound (Krumbholz et al. 2003); the SPR appears in the source wave between 200 and 300 ms post regularity onset, and rises to its sustained level over 100–200 ms. Several of the groups have modeled the temporal dynamics of the POR and SPR (Gutschalk et al. 2004a, b; Krumbholz et al. 2003; Rupp et al. 2005; Seither-Preisler et al. 2006b), either qualitatively or quantitatively, using the auditory image model (AIM) (Patterson et al. 1995). The results consistently show that the latencies of the POR and SPR are substantially longer than the latencies that would be predicted with AIM for the build up of the pitch ridge in the auditory image.
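The approximate latency rule quoted above (about 120 ms plus four times the period) is easy to evaluate for a given pitch. The sketch below is only a convenience for that arithmetic; the function names are hypothetical, and the rule itself is the rough description given in the text rather than a fitted model.

```python
def period_ms_from_pitch(pitch_hz):
    """Period of the regular-interval sound in milliseconds for a pitch in Hz."""
    return 1000.0 / pitch_hz


def por_peak_latency_ms(period_ms):
    """Approximate POR peak latency: about 120 ms plus four periods (rule from the text)."""
    return 120.0 + 4.0 * period_ms


if __name__ == "__main__":
    for pitch_hz in (62.5, 100.0, 250.0):
        period = period_ms_from_pitch(pitch_hz)
        print(f"{pitch_hz:6.1f} Hz  period {period:5.1f} ms  "
              f"POR peak near {por_peak_latency_ms(period):5.1f} ms")
```

For a 10-ms period (a 100-Hz pitch) the rule gives a peak near 160 ms, and the latency shortens as the pitch rises, consistent with the inverse relation between latency and pitch described earlier.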

5 Conclusions

Brain imaging with PET and fMRI has been used to locate activity associated with temporal pitch processing on Heschl’s gyrus just antero-lateral to PAC. MEG has been used to reveal the temporal dynamics of the processing. The results suggest that both the POR and SPR reflect the measurement of pitch value and strength, which, according to theory, occurs relatively late in the pitch hierarchy.

References

Bendor D, Wang X (2005) The neuronal representation of pitch in primate auditory cortex. Nature 436:1161–1165

Forss N, Mäkelä JP, McEvoy L, Hari R (1993) Temporal integration and oscillatory response of the human auditory cortex revealed by evoked magnetic fields to click trains. Hear Res 68:89–96

Griffiths TD, Büchel C, Frackowiak RSJ, Patterson RD (1998) Analysis of temporal structure in sound by the brain. Nature Neurosci 1:422–427

Griffiths TD, Uppenkamp S, Johnsrude I, Josephs O, Patterson RD (2001) Encoding of the temporal regularity of sound in the human brainstem. Nature Neurosci 4:633–637

Guimaraes A, Melcher J, Talavage T, Baker J, Ledden P, Rosen B, Kiang N, Fullerton B, Weisskoff R (1998) Imaging subcortical activity in humans. Hum Brain Map 6:33–41

Gutschalk A, Patterson RD, Rupp A, Uppenkamp S, Scherg M (2002) Sustained magnetic fields reveal separate sites for sound level and temporal regularity in human auditory cortex. NeuroImage 15:207–216

Gutschalk A, Patterson RD, Scherg M, Uppenkamp S, Rupp A (2004a) Temporal dynamics of pitch in human auditory cortex. NeuroImage 22:755–766

Gutschalk A, Patterson RD, Uppenkamp S, Scherg M, Rupp A (2004b) Recovery and refractoriness of auditory evoked fields after gaps in click trains. Eur J Neurosci 20:3141–3147

Gutschalk A, Patterson RD, Scherg M, Uppenkamp S, Rupp A (2006) The effect of context on the sustained pitch response in human auditory cortex. Cerebral Cortex (in press)

Hall D, Haggard M, Akeroyd M, Palmer A, Summerfield A, Elliott M, Gurney E, Bowtell R (1999) “Sparse” temporal sampling in auditory fMRI. Hum Brain Map 7:213–223

Krumbholz K, Patterson RD, Seither-Preisler A, Lammertmann C, Lütkenhöner B (2003) Neuromagnetic evidence for a pitch processing centre in Heschl’s gyrus. Cerebral Cortex 13:765–772

Morosan P, Rademacher J, Schleicher A, Amunts K, Schormann T, Zilles K (2001) Human primary auditory cortex: subdivisions and mapping into a spatial reference system. NeuroImage 13:684–701