
Auditory Perception: An Analysis and Synthesis (Warren, 2008)


Temporal induction 155

sensation level (dB above a subject’s threshold). It can be seen that virtually no continuity occurred until the frequency of the fainter tone was above 700 Hz, so that for these lower frequencies, as long as the tone was loud enough to be heard clearly, it was perceived to be pulsing.

When the weaker tone was 1,000 Hz (the same frequency as the stronger tone), homophonic continuity took place. Most of the subjects chose a 3 dB difference in level between the 1,000 Hz tones as the upper limit of homophonic continuity (rather interesting unexpected phenomena occurring at and near this upper limit will be discussed subsequently). Although the extent of temporal induction increased sharply from 700 through 1,000 Hz, there was a relatively gradual decrease as the frequency of the fainter tone was increased above 1,000 Hz, an asymmetry characteristic of masking functions (see Chapter 3).

The second part of this study measured simultaneous masking using the same subjects and the same frequencies used for determining continuity limits. But rather than alternating the louder and fainter tones, the louder 1,000 Hz tone was kept on continuously at 80 dB SPL, and the threshold was measured for detection of pulsed superimposed tones that were on for 300 ms and off for 300 ms. The data points shown by circles in Figure 6.1 give the simultaneous masking thresholds measured as decibels above the detection threshold for fainter tones having various frequencies when alternated with the louder fixed frequency tone. The figure shows that the simultaneous masking and temporal induction functions were quite similar except for frequencies at or near that of the 1,000 Hz tone and its octave. At identical inducer and inducee frequencies, as indicated earlier, temporal induction became homophonic, and the threshold for masking was equivalent to a jnd in intensity of the 1,000 Hz tone. The masking function is also influenced by the fluctuation in loudness produced by beats when the concurrent louder and fainter tones differ but lie within the same critical band. The functions for apparent continuity and masking were equivalent at 1,500 Hz and above, except for one frequency: there was a separation in the two functions at 2,000 Hz due to an increase in the masked threshold. The increase in masking one and two octaves above the frequency of a tonal masker has been known for some time (Wegel and Lane, 1924), as was described and discussed in Chapter 3. It can be seen in the figure that a change in slope also occurs in the temporal induction curve at 2,000 Hz; the reason for this slight change, while interesting, seems obscure at this time.
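The stimulus arrangement for the simultaneous masking measurements can be sketched in code. The following is only an illustrative sketch, not the procedure used in the study: the sample rate, the abrupt (unramped) gating, the function names, and the expression of the probe level in decibels relative to the masker are all assumptions introduced here.

```python
import math


def amplitude_ratio(level_difference_db: float) -> float:
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10.0 ** (level_difference_db / 20.0)


def masking_stimulus(probe_hz, probe_db_re_masker, duration_ms,
                     sample_rate_hz=16000, masker_hz=1000.0,
                     on_ms=300.0, off_ms=300.0):
    """Continuous masker tone plus a probe tone pulsed on and off.

    The masker has unit amplitude; the probe amplitude is set relative to it.
    Gating is abrupt here (no rise/fall ramps), which is an assumption.
    """
    probe_amp = amplitude_ratio(probe_db_re_masker)
    n_samples = int(duration_ms * sample_rate_hz / 1000)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate_hz
        masker = math.sin(2 * math.pi * masker_hz * t)
        # The probe is on for the first on_ms of every (on_ms + off_ms) cycle.
        probe_on = (1000.0 * t) % (on_ms + off_ms) < on_ms
        probe = probe_amp * math.sin(2 * math.pi * probe_hz * t) if probe_on else 0.0
        samples.append(masker + probe)
    return samples


# One full on/off cycle of a 1,500 Hz probe set 20 dB below the masker.
stimulus = masking_stimulus(1500.0, -20.0, 600)
```

In an actual measurement the probe level would be varied until the pulsed tone was just detectable; the sketch only shows the timing of the two components.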

The third and fourth experiments provided additional evidence for the close relation between temporal induction and masking. The third experiment used a 1/3-octave band of noise centered on 1,000 Hz as the inducing sound, and found that the upper intensity limit for illusory continuity of tones was highest

156 Perceptual restoration of missing sounds

for the tonal frequency corresponding to the center frequency of the noise band. As with the tonal inducer shown in Figure 6.1, the curve describing the intensity limits for temporal induction of the tone by the noise band was asymmetrical, with steeper slopes at the low frequency end. In the fourth and final experiment, a broadband noise with a frequency notch (that is, a rejected frequency band) centered on 1,000 Hz served as the inducer. In keeping with the hypothesized relation between masking and temporal induction, the upper intensity limit for apparent continuity was lowest for tones at the center frequency of this rejected noise band.

The roll effect as tonal restoration

Van Noorden (1975, 1977) discovered that, when faint 40 ms tone bursts were alternated with louder 40 ms tone bursts with short silent gaps separating the successive bursts, it was possible for listeners to hear the fainter tone burst not only when it actually was present, but also when the louder burst occurred. The gap was also heard, so that there was a discontinuity, and an apparent doubling of the actual rate of the fainter bursts. This “roll effect” required intensity and spectral relations between fainter and louder tones resembling those leading to illusory continuity of the fainter of two alternating temporally continuous tones. As van Noorden pointed out, it was as if the discrete restorations of the fainter tonal bursts leading to the roll effect required that the louder bursts could function as potential maskers. The duration of the silent gap played a critical role in this illusion: if shorter, the gap would not be detected and a continuous homophonic induction would take place, and if longer the percept of the fainter tone could not jump over the silent gap and reappear along with the higher amplitude tone.

Durational limits for illusory continuity

Studies of the illusory continuity of tones have not used interruption times greater than 300 ms, since tonal continuity cannot be maintained for longer periods. (See Verschuure, 1978, for a discussion of temporal limits for the continuity of tones.) Much longer continuity was reported by Warren, Obusek, and Ackroff (1972) for a 1/3-octave band of noise centered on 1,000 Hz when alternated with a louder 500 to 2,000 Hz band of noise of equal duration. All of our 15 subjects heard the fainter noise band continue for several seconds, eight heard illusory continuity for at least 20 s, and six still were hearing the absent noise band continue 50 s after it was replaced by the broadband noise.

Reciprocal changes in inducer and inducee

The upper amplitude limit for illusory continuity, or pulsation threshold, appears as a fairly sharp break, enabling reproducible measures to be made of


this transition (Houtgast, 1972; Warren, 1972; Verschuure, Rodenburg, and Maas, 1974; Fastl, 1975; Schreiner, Gottlob, and Mellert, 1977). Laboratory studies dealing with this topic have concentrated on the conditions needed to produce apparent continuity of a fainter inducee, and have generally ignored possible concurrent changes occurring in the louder inducer. However, Warren (1984) described informal observations suggesting that a portion of the inducer’s neural representation was subtracted and used for the perceptual synthesis of the inducee. Subsequent formal experiments have provided quantitative evidence that this reallocation of the neural response from inducer to inducee does indeed occur (Warren, Bashford, and Healy, 1992; Warren, Bashford, Healy, and Brubaker, 1994). In addition, these studies described a number of previously unreported phenomena.

Figure 6.2 shows the reduction in apparent level of a 1,000 Hz sinusoidal inducer when alternated every 200 ms with inducees that were sinusoidal tones of the same or slightly different frequencies. The level of the inducer was fixed at 70 dB, and the inducees were presented at one of three different levels. In keeping with the reallocation hypothesis, the apparent amplitude (loudness) of the inducer decreased with increasing inducee amplitude under eight of the nine conditions, reflecting a greater reallocation of the inducer’s auditory

Figure 6.2 Temporal induction: loudness reduction of inducer produced by inducees. Means and standard errors are shown for changes produced in apparent level (loudness) of a 200 ms, 1,000 Hz inducer at 70 dB alternating with 200 ms inducees having three different frequencies, each presented at three different amplitudes. Inducee frequencies are given as the difference in semitones from the inducer frequency (each semitone is 1/12-octave). For further details, see the text. (From Warren, Bashford, Healy, and Brubaker, 1994.)



Figure 6.3 Temporal induction: loudness reduction of inducer produced by inducees. Means and standard errors are shown for changes in the apparent level (loudness) of a 200 ms, 1,000 Hz inducer at 70 dB alternating with 200 ms 66 dB inducees having nine different frequencies. Inducee frequencies are given as the difference in semitones from the inducer frequency (each semitone is 1/12-octave). (From Warren, Bashford, Healy, and Brubaker, 1994.)

representation for perceptual synthesis of the inducee. It can be seen that for a given inducee amplitude, the reduction was greatest for homophonic conditions.

Figure 6.3 shows the results of an experiment examining the effect of inducee frequencies covering a range of 22 semitones upon the loudness of a fixed-frequency inducer. As can be seen, a drop in loudness occurred for inducee frequencies within the range extending from 2 semitones below (891 Hz) to 10 semitones above (1,782 Hz) the frequency of the 1,000 Hz inducer. The decrease in the inducer’s apparent amplitude occurred even though the inducee appeared to be discontinuous over most of this range. This drop in loudness in the absence of inducee continuity suggested that if reallocation theory was valid, then incomplete induction took place. One possibility is that a portion of the inducer’s neural representation was used to increase the apparent duration of the inducee to some extent, but not enough to completely close the gaps. There had been an earlier report by Wrightson and Warren (1981) that a measurable late offset and early onset of a tone alternated with noise can occur when the tone does not appear to be continuous.
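The frequencies quoted in parentheses follow from the definition of a semitone as 1/12-octave. As a brief illustration (the function name is hypothetical and chosen only for this sketch), the conversion from a semitone separation to a frequency can be written as:

```python
def semitones_to_hz(reference_hz: float, semitones: float) -> float:
    """Frequency lying `semitones` equal-tempered semitones from reference_hz.

    Each semitone is 1/12-octave, so the frequency ratio is 2 ** (semitones / 12).
    """
    return reference_hz * 2.0 ** (semitones / 12.0)


inducer_hz = 1000.0
lower_edge_hz = semitones_to_hz(inducer_hz, -2)   # 2 semitones below the inducer
upper_edge_hz = semitones_to_hz(inducer_hz, 10)   # 10 semitones above the inducer
print(round(lower_edge_hz), round(upper_edge_hz))  # prints: 891 1782
```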

An additional experiment by Warren, Bashford, Healy, and Brubaker (1994) provided confirmation that the restoration of obliterated segments is not an all-or-none phenomenon, but rather the end point of a continuum of illusory lengthening. The same stimuli were employed as in the experiment


Figure 6.4 Temporal induction: relation of changes in apparent inducee duration to changes in apparent inducer amplitude. The inducer was a 200 ms 1 kHz tone at 70 dB, and the inducees were 200 ms 66 dB tones having various frequencies expressed as semitone separations from the inducer frequency (each semitone is 1/12-octave). The horizontal dotted line represents the actual value of inducee duration (left ordinate) as well as the actual value of inducer amplitude (right ordinate). Data are standardized so that the maximum reduction in apparent inducer amplitude (5.5 dB) matches the ceiling value for inducee apparent duration (full continuity). See the text for further information. (From Warren, Bashford, Healy, and Brubaker, 1994.)

represented in Figure 6.3, but rather than matching the apparent amplitude of the inducer, listeners matched the apparent duration of the inducee. The changes in apparent duration of inducees are shown in Figure 6.4, along with the corresponding changes in the apparent amplitude of the inducer derived from Figure 6.3. The close relation of the two functions is apparent.

Alternating levels of the same sound: some anomalous effects observed for the higher level sound in the homophonic induction of tones

Figure 6.3 shows that under homophonic conditions (0 semitone separation between the 70 dB 1,000 Hz tonal inducer and the 66 dB tonal inducee), the apparent level of the inducer dropped by 5.5 dB, a decrease in intensity considerably greater than the 2.2 dB change corresponding to a physical subtraction of the inducee level. When the inducee level is raised to about 67 or 68 dB (depending on the listener) continuity of the fainter sound ceases, and the two levels of the pure tones are heard to alternate. When the inducee level was within 1 or 2 dB below the alternation threshold, the inducer


no longer seemed tonal, but was heard as an intermittent harsh and discordant sound that was superimposed upon the apparently continuous level of the 66 dB pure tone. It appears that the residue remaining after reallocation does not correspond to the entire neural representation of a tone, but rather it consists of that portion of the inducer’s neural representation that normally signals an increase in level above that of the inducee.

There are several changes in the representation of a tone at the level of the auditory nerve that accompany an increase in stimulus amplitude, and any or all of these could serve as a basis for signaling an increase in loudness. These changes include the crossing of the response threshold of less sensitive fibers having the same characteristic frequency as the tone, an increase in the firing rate of fibers responding to the lower level, the asymmetrical further spread of excitation to fibers differing in characteristic frequency from that of the stimulus, and also complex changes associated with phase locking to the stimulus waveform (for a discussion of the possible neural cues employed for the coding of stimulus amplitude, see Javel, 1986; Relkin and Doucet, 1997; Smith, 1988). Although the relative importance of each of these potential cues signaling a loudness increase is not known, homophonic induction appears to shear off those components of the inducer’s neural representation that correspond to the lower-level inducee, allowing listeners to hear only those atonal-sounding components that signal the increment in level above that corresponding to the inducee. When the inducer and inducee levels differed by more than 5 or 6 dB the inducer seemed completely tonal despite reallocation. Figures 6.2, 6.3, and 6.4 make it clear that the loudness reduction produced by reallocation cannot be derived from the principles governing the addition and subtraction of sound in physical acoustics.
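For comparison, the purely physical prediction can be worked through numerically. The sketch below assumes that "physical subtraction" means subtracting the power of the 66 dB inducee from the power of the 70 dB inducer; the function names are hypothetical. Under that assumption the expected drop is about 2.2 dB, well short of the 5.5 dB reduction reported for the homophonic condition.

```python
import math


def db_to_power(level_db: float) -> float:
    """Relative power corresponding to a level in dB."""
    return 10.0 ** (level_db / 10.0)


def level_after_power_subtraction(total_db: float, removed_db: float) -> float:
    """Level (dB) that remains after subtracting one power from another."""
    remaining_power = db_to_power(total_db) - db_to_power(removed_db)
    if remaining_power <= 0.0:
        raise ValueError("removed level must be below the total level")
    return 10.0 * math.log10(remaining_power)


inducer_db = 70.0
inducee_db = 66.0
physical_drop_db = inducer_db - level_after_power_subtraction(inducer_db, inducee_db)
print(f"{physical_drop_db:.1f} dB")  # prints: 2.2 dB
```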

Differences in the homophonic induction of tone and noise

Warren, Bashford, Healy, and Brubaker (1994) found that the homophonic induction of a broadband noise differed from that of a tone in several respects, indicating that there are basic differences in the neural encoding of amplitude changes for these two types of sounds. It was found that: (1) the reduction in loudness of the inducing noise when the alternating levels are close in amplitude is not accompanied by appreciable change in timbre or quality of the inducer, as is the case with tones; (2) the decrease in inducer loudness is less than that observed for tonal induction involving equivalent amplitudes of inducer and inducee; and (3) there is no transition from homophonic continuity to the perception of alternating amplitudes as heard with tones when the two levels of broadband noise are brought close together. Even when the difference in levels is as little as 0.5 dB (the just


noticeable difference in intensity of broadband noise reported by G. A. Miller, 1947), the acoustically lower noise level appears to be continuous, and is heard along with a very faint pulsed additional noise corresponding to the diminished inducer.

These differences in homophonic induction of sinusoidal tones and broadband noise appear to be attributable to the nature of their neural representation: tones are delivered at a fixed amplitude to narrow cochlear regions; noises stimulate broad spectral regions, with individual critical bands receiving different fluctuating amplitude patterns. In order to evaluate the level of a broadband noise with a precision of 0.5 dB, it would seem necessary to combine the input from many loci over some appropriate integration time.

Binaural release from temporal induction

Although temporal induction can occur when the signal (inducee) and the interrupting sound (inducer) are presented monaurally or diotically, and hence appear to originate at the same location, what would happen if the inducer and the inducee appear to originate at different locations? As discussed in Chapter 2, when two sounds occur simultaneously, the ability of one to mask the other is decreased appreciably when interaural phase differences associated with each sound cause them to appear to be located at different azimuths. Kashino and Warren (1996) reported that when interaural phase relations differed for the inducer and inducee, temporal induction was inhibited, as measured both by the upper amplitude limit for inducee continuity, and by loudness reduction of the inducer. This “binaural release from temporal induction” is consistent with the hypothesis that the perceptual synthesis of a missing fragment depends upon the masking potential of an interpolated sound.

Temporal induction of dynamic signals

Temporal induction is not limited to the restoration of continuity of sounds such as tones and noises for which the restored segment resembles the preceding and following segments. Temporal induction can also restore obliterated portions of time-varying signals, such as segments of tonal frequency glides, phonemes of speech, and notes of familiar melodies.

Temporal induction of tonal frequency glides

Dannenbring (1976) studied the illusory continuity of tonal glides interrupted by noise. An example of the stimuli used is shown in Figure 6.5. The difference in frequency between the upper and lower limits of the frequency range traversed by the glides was varied from 100 through 1,000 Hz


[Figure 6.5 graphic: frequency (Hz, 500 to 1,500) plotted against time (ms, 500 to 2,500), with white noise bursts interrupting the tonal glide.]

Figure 6.5 Example of conditions used to study illusory continuity (temporal induction) of tonal glides interrupted by louder broadband noise. (Adapted from Dannenbring, 1976.)

(the center frequency of the glide range was always 1,000 Hz), and the duration of the glide from frequency peak to trough was varied from 250 through 2,000 ms. The duration of white noise bursts centered at the middle of each glide was adjusted by the subjects to the maximum duration that permitted the glide to appear continuous. The stimuli were presented diotically, with the noise burst at 90 dB and the tonal glide at 75 dB. Illusory continuity of somewhat longer duration was found as ΔF (the range of the frequency glide) increased. For the longest duration glide (2 s from peak to trough), the tonal glide was heard to continue smoothly through noise durations between 400 and 500 ms (mean of adjustments for the 20 subjects) for each value of ΔF.
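The glide parameters described here (a center frequency of 1,000 Hz, a variable frequency range, and a variable peak-to-trough duration) can be illustrated with a short sketch. It assumes a triangular glide that is linear in hertz and starts at its frequency peak; neither the glide shape nor the function name comes from the study, and both are used here only for illustration.

```python
def glide_frequency_hz(t_ms: float, range_hz: float, peak_to_trough_ms: float,
                       center_hz: float = 1000.0) -> float:
    """Instantaneous frequency of a repeating triangular glide.

    Assumes a glide linear in Hz that starts at the peak, falls to the trough
    over peak_to_trough_ms, and then rises back to the peak.
    """
    # Position within one full down-and-up cycle, scaled to the interval [0, 2).
    phase = (t_ms % (2.0 * peak_to_trough_ms)) / peak_to_trough_ms
    # Fraction of the range above the trough: 1 -> 0 while falling, 0 -> 1 while rising.
    fraction = 1.0 - phase if phase <= 1.0 else phase - 1.0
    return center_hz - range_hz / 2.0 + range_hz * fraction


# Widest range (1,000 Hz) and longest glide (2,000 ms peak to trough).
peak_hz = glide_frequency_hz(0.0, 1000.0, 2000.0)       # 1500.0
middle_hz = glide_frequency_hz(1000.0, 1000.0, 2000.0)  # 1000.0
trough_hz = glide_frequency_hz(2000.0, 1000.0, 2000.0)  # 500.0
```

With these values the glide spans 500 to 1,500 Hz, which matches the frequency axis of Figure 6.5.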

Temporal induction of speech: phonemic restoration

It has been reported that listeners cannot tell that a phoneme is missing after it has been excised from a recorded sentence and replaced by an extraneous sound such as a cough or a noise burst (Warren, 1970b; Warren and Warren, 1970). Even when the listeners were told in advance that a particular speech sound (that was not identified) had been removed completely and


replaced by another sound and the recording was replayed several times, the sentence still appeared intact, and it was not possible for listeners to distinguish between the perceptually synthesized sound and those physically present (Warren and Obusek, 1971). It might be thought that listeners could identify the position of the extraneous sound in the sentence, and thus locate the missing segment. However, the extraneous sound could not be localized accurately: when subjects attempted to report its position, errors corresponding to a few hundred milliseconds were made. But, when a silent gap was present rather than an extraneous sound, restoration did not occur, and the location of the missing speech sound could be identified with accuracy.

Context provided by both prior and subsequent words can be used to identify an obliterated phoneme in a sentence (Warren and Warren, 1970; Sherman, 1971). However, information provided by the immediately preceding and following speech sounds can be surprisingly ineffective. Thus, when a phoneme in a sentence was deliberately mispronounced before deletion, for example the substitution of /t/ for /n/ in “commu/t/ication,” phonemic restoration of the contextually appropriate /n/ occurred despite the inappropriate coarticulation cues present in the intact neighboring phonemes (Warren and Sherman, 1974). Even when listeners were told that a speech sound had been replaced by a noise, the presence of lexically inconsistent coarticulation cues still did not permit them to identify the missing phoneme.

Without sentential context, phonemic restoration does not normally occur. However, when a gap in an isolated word is replaced by speech-modulated noise (as employed by Samuel, 1987, and others) rather than by stochastic noise, then the additional bottom-up cue provided by the amplitude contour of the missing phoneme can permit restoration of the appropriate phoneme (see Bashford, Warren, and Brown, 1996).

Phonemic restorations in Japanese sentences were reported by Sasaki (1980), who reasoned that there might be a melodic restoration effect similar to phonemic restoration. He did indeed find such an effect: when one or two notes were replaced by noise in familiar melodies played on a piano, listeners heard the missing note(s), and mislocalized the noise burst when required to report its location.

The inability to locate extraneous sounds in sentences and in melodies is consistent with reports of a general inability to detect order directly in sequences of nonverbal sounds with equivalent item durations (see Chapter 5). Although the phonemes forming sentences and the notes in musical passages may occur too rapidly to permit direct naming of order, verbal and tonal groupings can be recognized globally and distinguished from groupings of the same brief items in different orders. Following recognition of the overall


pattern, the identity and order of components forming these sequences can be inferred if they had been learned previously. This indirect mechanism for naming orders at rates too rapid for direct identification of sounds and their orders is not available for locating the position of extraneous sounds such as a cough replacing or masking a phoneme in a sentence, or a brief click occurring within an otherwise intact sentence.

Both phonemic and melodic restorations can be considered as specialized forms of temporal induction, with the identity of the induced sound determined by the special rules governing these familiar sequences. In keeping with the general principle found to govern temporal induction, there is evidence that restoration of a phoneme is enhanced when the extraneous sound in a sentence is capable of masking the restored speech sound (Layton, 1975; Bashford and Warren, 1987b).

Apparent continuity of speech produced by insertion of noise into multiple gaps

Miller and Licklider (1950) reported that, when recordings of phonetically balanced (PB) lists of monosyllabic words were interrupted regularly by silent gaps at rates from 10 to 15 times a second (50 percent duty cycle, so that on-time and off-time were equal), the silent intervals caused the voice to sound rough and harsh, and the intelligibility dropped. When the silent gaps were filled with a broadband noise that was louder than the speech, Miller and Licklider found that the word lists sounded more “natural” (their “picket fence effect”), but intelligibility was no better than it was with silence. Bashford and Warren (1987b) extended this study of the effects of filling silent gaps with noise, using three types of recorded verbal stimuli: (1) PB word lists of the type employed by Miller and Licklider; (2) an article from a popular news magazine read backwards (the individual words were pronounced normally, but they were read in reverse order with an attempt to preserve normal phrasing contours); (3) the same magazine article read in a normal fashion. The rate at which syllables occurred was matched for all three stimuli, which were presented at peak intensity levels of 70 dB. Twenty listeners adjusted the interruption rate of the verbal stimuli (50 percent duty cycle, rise/fall time of 10 ms) to their threshold for detecting deletions in the speech. Under one condition the gaps were unfilled, and under the other condition the gaps were filled with broadband noise at 80 dB. Table 6.1 lists the deletion detection thresholds obtained in this study.
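The interruption scheme itself is simple to express in code. The sketch below is an illustration only: it uses an abrupt on/off gate and omits the 10 ms rise/fall time mentioned in the text, and the function names and the choice of sample rate are assumptions. At 10 interruptions per second with a 50 percent duty cycle, each speech segment and each gap lasts 50 ms.

```python
def gate_is_open(t_ms: float, interruptions_per_s: float,
                 duty_cycle: float = 0.5) -> bool:
    """True while the speech is on; False during the gap."""
    period_ms = 1000.0 / interruptions_per_s
    return (t_ms % period_ms) < duty_cycle * period_ms


def interrupt(speech, filler, sample_rate_hz, interruptions_per_s):
    """Replace the gated-off portions of `speech` with samples from `filler`.

    Pass a list of zeros as `filler` for silent gaps, or noise samples for
    noise-filled gaps. Both sequences must have the same length.
    """
    output = []
    for i, sample in enumerate(speech):
        t_ms = 1000.0 * i / sample_rate_hz
        output.append(sample if gate_is_open(t_ms, interruptions_per_s) else filler[i])
    return output


# Toy example: four samples spaced 25 ms apart, interrupted 10 times per second.
gated = interrupt([1, 1, 1, 1], [0, 0, 0, 0], sample_rate_hz=40, interruptions_per_s=10)
print(gated)  # prints: [1, 1, 0, 0]
```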

When each of the types of speech shown in Table 6.1 was interrupted by silent gaps having durations below the deletion detection threshold, they sounded rough or “bubbly,” but perceptually discrete gaps were not heard. When the silent gaps were filled with noise, for durations below the deletion