
Auditory Perception - An Analysis and Synthesis, Warren 2008


Pitch and infrapitch iterance 85

1/3-octave bands with center frequencies below the fifth harmonic (1,000 Hz), spectral resolution of individual harmonics occurs for both the pulse train and the RFN, and their tracings appear quite similar. At center frequencies corresponding to the fifth harmonic and above, evidence of the interaction of neighboring harmonics within single 1/3-octave bands can be seen, and the response curves derived from pulse trains and RFNs differ. The waveform envelopes derived from pulse trains take on a characteristic pulsate structure, with synchronous bursts of activity corresponding to each pulse of the broadband stimulus at each of the higher filter settings. There is no such resemblance of the 1/3-octave bands of the RFN either to each other or to the broadband parent waveform. The waveforms of successive bands for the RFN above the fourth harmonic can be seen to have different randomly determined envelopes that trace the irregular peaks and troughs of the waveform. However, each of the 1/3-octave bands has the same period of 5 ms. Despite the lack of any interband correlation of waveform or fine structure, and by virtue of their common period, these RFN bands fuse seamlessly into a single pitch matching that of a 200 Hz sinusoidal tone.

The spectrally resolved lower harmonics are not required for hearing pitch. When the first ten harmonics of a 200 Hz RFN are removed by highpass filtering, a residue pitch can be heard that matches the pitch of the lower ten harmonics, but with a quite different timbre.
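The harmonic structure assumed in this passage (an RFN has energy only at integer multiples of its repetition frequency) can be checked numerically. The following is a minimal sketch, not the procedure used in the studies cited: the function names, the 16-sample segment, and the four repetitions are illustrative assumptions, and a direct DFT is used only to keep the example self-contained.

```python
import cmath
import random

def dft_magnitudes(x):
    """Return |X[k]| for k = 0..N-1 using a direct O(N^2) DFT."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n)]

def repeated_frozen_noise(period_samples, repeats, seed=0):
    """Build an RFN: one Gaussian noise segment repeated without change."""
    rng = random.Random(seed)
    segment = [rng.gauss(0.0, 1.0) for _ in range(period_samples)]
    return segment * repeats

REPEATS = 4  # hypothetical number of restatements of the segment
rfn = repeated_frozen_noise(period_samples=16, repeats=REPEATS)
mags = dft_magnitudes(rfn)

# With REPEATS restatements, harmonic h of the repetition frequency falls in bin h * REPEATS.
off_harmonic = max(m for k, m in enumerate(mags) if k % REPEATS != 0)
on_harmonic = max(m for k, m in enumerate(mags) if k % REPEATS == 0)
```

Under these assumptions `off_harmonic` is zero apart from rounding error, while the harmonic bins carry the randomly determined amplitudes of the frozen segment.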

Pitch and infrapitch iterance

Frequency-dependent characteristics of repeating frozen noise segments can be heard with ease as a global percept or iterance encompassing the entire waveform over a range of approximately 15 octaves extending from about 0.5 Hz through 16,000 Hz. Five octaves of iterance (0.5 Hz through 20 Hz) lie below the pitch range. Figure 3.12 shows the perceptual characteristics of RFNs having different fundamental frequencies along this continuum, as well as their possible neural bases. Although the boundaries for perceptual qualities and neural mechanisms for iterance are shown at discrete frequencies, it should be kept in mind that these boundaries actually represent gradual transitions occurring in the vicinity of the repetition frequencies indicated.
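The octave counts quoted above follow from the base-2 logarithm of the frequency ratio. A short sketch of the arithmetic, with a hypothetical helper name:

```python
import math

def octaves_between(f_low_hz, f_high_hz):
    """Number of octaves separating two frequencies (one octave = a factor of 2)."""
    return math.log2(f_high_hz / f_low_hz)

iterance_span = octaves_between(0.5, 16000.0)   # whole iterance continuum
infrapitch_span = octaves_between(0.5, 20.0)    # range below pitch
pitch_span = octaves_between(20.0, 16000.0)     # pitch range
```

The values come out near 15, 5.3, and 9.6 octaves respectively, in line with the rounded figures given in the text.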

At fundamental frequencies above approximately 8,000 Hz, RFNs are equivalent to sinusoidal tones, since the harmonics of the fundamental lie beyond the nominal 16,000 Hz limit of audibility. At frequencies below 8,000 Hz, audible harmonics produce timbres that are different from that of a sinusoidal tone having the same frequency and pitch. Below about 1,000 Hz, these timbres become especially rich and distinctive (Warren and Bashford,

86 Perception of acoustic repetition: pitch and infrapitch

Figure 3.12 The iterance continuum. The figure represents the approximately 15 octaves over which global percepts can be heard for model periodic stimuli consisting of repeated segments derived from Gaussian (white) noise. Waveform repetition frequency is designated in hertz and also as the number of octaves above the lowest frequency for the perception of iterance encompassing the entire waveform. The upper portion of the figure describes the perceptual characteristics; transitions between the qualities given are gradual, with category boundaries at the approximate positions shown. Possible neurological mechanisms available for the detection of iterance at particular waveform repetition frequencies are also shown. For further description, see the text.

1981). Warren and Bashford also reported that from about 100 down to 20 Hz, RFNs have a hiss-like component that accompanies the pitch (the hiss can be eliminated by removing harmonics lying above 4,000 Hz). From about 100 through 70 Hz, the pitch seems continuous or smooth, and from about 70 through 20 Hz the pitch is heard as accented or pulsed. Pitch is absent for repetition frequencies below about 20 Hz. For the roughly five octaves of the infrapitch range extending from 20 Hz to 0.5 Hz (repetition periods from about 50 ms to 2 s), repeated noise segments are heard as global percepts that can be divided into two ranges (Guttman and Julesz, 1963). For the high infrapitch (HiIP) region from roughly 20 through 4 Hz, iterated noise segments have a clear staccato sound which they described as “motorboating.” For the low infrapitch (LoIP) region from roughly 4 through 1 or 0.5 Hz, they described the perceptual quality as “whooshing.” In keeping with the suggestion of Warren and Bashford (1981), RFNs in the HiIP and LoIP ranges are considered as “infratones” producing a detectable periodicity called “infrapitch.” As with tonal RFNs having harmonics in the pitch range, different infratonal waveforms having the same repetition frequency can be readily distinguished by their distinctive timbres. In the HiIP range it is not possible to detect constituent components or events, but in the LoIP range, after the initial global recognition of iterance, an assortment of components such as thumps, clanks, and rattles characteristic of individual RFNs can be heard to emerge (Warren and Bashford, 1981). (It may be of some interest to note that the durations of phonemes correspond to HiIP periods, and durations of syllables and words to LoIP periods.) It has been reported by Kaernbach (1992, 1993) that when listeners were instructed to tap to features heard in the whooshing range, there was considerable agreement in tap points by different listeners presented with the same RFNs. He also reported that the distinctive patterns within these RFNs typically had durations less than 100 ms, and that the individual patterns consisted of spectral components that could extend over a range from one to several critical bands.

When RFNs have repetition periods longer than 2 s (corresponding to repetition frequencies below the 0.5 Hz limit for the global percept of infrapitch), limited portions of the waveform can be heard to repeat with continued listening, and listeners can tap regularly in synchrony with these recurrent features. These repetitive events resemble the components that can be heard in the whooshing range, and are separated by featureless-seeming noise. Repetition of RFNs having periods of at least 20 s can be detected in this manner not only by laboratory personnel, but also by college students during a half-hour session in which they started with RFN in the HiIP range and then heard a series of successively increasing RFN periods (Warren, Bashford, Cooley, and Brubaker, 2001; see also Kaernbach, 2004). The ability to retain detailed auditory memory traces for long durations can make it possible to employ top-down syntactic and semantic context in speech, and also permits the recognition of repetition and variations from strict repetition of melodic themes in music. Cowan, Lichty, and Grove (1990) reported that the memory for unattended spoken syllables is at least 10 s, matching the neuromagnetic evidence described by Sams, Hari, Rif, and Knuutila (1993) that there is an auditory memory trace for a particular tone that persists for about 10 s. Using event-related brain potential measurements, Winkler, Korzyukov, Gumenyuk, et al. (2002) found that a memory trace of a particular tone could be detected for at least 30 s.

It is not necessary that the features of a long-period RFN that are heard to repeat represent some particularly unusual portion of the waveform – a crossmodal cue (a light flash) repeating synchronously at a randomly selected point along 10 s RFNs allowed listeners to tap to the same region that had been highlighted previously in the same experimental session, and also to tap to that region again after delays of 24 hours or more (Bashford, Brubaker, and Warren, 1993).

Let us examine the neural mechanisms shown in Figure 3.12 as subserving the perception of infrapitch. An RFN in the LoIP whooshing range having a repetition frequency of 2 Hz can be considered as a sequence of harmonic sinusoidal components with a fundamental frequency of 2 Hz and a 2 Hz separation between neighboring spectral components. The lowest harmonics cannot be heard, and the thousands of higher harmonics in the audible range are too closely spaced to be resolved along the basilar membrane (see Plomp, 1964, for limits of the ear’s analyzing power). Many unresolved harmonics fall within a single critical band, and since these interacting harmonics have randomly determined relative phases and amplitudes, highly complex local patterns of stimulation are produced at every stimulated locus on the basilar membrane. Although the temporal patterns differ at separate loci, each of the complex patterns is repeated at the RFN repetition frequency of 2 Hz, as illustrated in Figure 3.13, and it is this cross-frequency equivalence that determines the ensemble of 2 Hz iterance that listeners hear.
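To give a sense of how many harmonics interact within one band, the count can be estimated with simple arithmetic. This sketch assumes an idealized brick-wall 1/3-octave band (edges at the center frequency times 2^(-1/6) and 2^(1/6)) as a stand-in for a critical band; real auditory filters have sloping skirts, so the numbers are only indicative, and the function name is hypothetical.

```python
import math

def harmonics_in_third_octave(f0_hz, center_hz):
    """Count harmonics of f0_hz inside an idealized 1/3-octave band at center_hz."""
    lo = center_hz * 2 ** (-1 / 6)   # lower band edge
    hi = center_hz * 2 ** (1 / 6)    # upper band edge
    first = math.ceil(lo / f0_hz)    # lowest harmonic number inside the band
    last = math.floor(hi / f0_hz)    # highest harmonic number inside the band
    return max(0, last - first + 1)

infratone_count = harmonics_in_third_octave(2.0, 1000.0)    # 2 Hz RFN
tonal_count = harmonics_in_third_octave(200.0, 1000.0)      # 200 Hz RFN
```

With these assumptions a 2 Hz RFN places over a hundred harmonics in the band centered at 1,000 Hz, whereas a 200 Hz RFN places only its fifth harmonic there.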

Figure 3.13 shows the patterns corresponding to a 10 ms portion of the 500 ms period of a 2 Hz iterated pink noise segment. The noise segment was excised from pink noise bandpassed from 100 through 10,000 Hz. Since the long-term spectrum of the parent pink noise had equal power for each of the 1/3-octave bands (which approximate the critical bandwidths), the patterns at the lower frequencies shown in Figure 3.13 can be seen more clearly than they could be for white noise for which power is proportional to the center frequency of 1/3-octave bands. It should be kept in mind that the temporal patterns for the 2 Hz signal are quite long and complex, each being fifty times the 10 ms portion shown in the figure. Also, it should be noted that while not all audible 1/3-octave bands are depicted, independent patterns repeated each 500 ms would be created for each such band. It has been demonstrated that information concerning infrapitch periodicity is available along the entire length of the basilar membrane, for when a 1/3-octave filter was swept through a broadband infratonal RFN, the repetition frequency was heard at all center frequencies of the filter (Warren and Bashford, 1981). Identification of the 2 Hz repetition could be accomplished through the type of neural periodicity detection designated as “complex pattern” in Figure 3.12.
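The point that every band carries its own pattern yet shares the period of the parent waveform can be illustrated with a small simulation. This is a sketch under stated assumptions, not the filtering used for Figure 3.13: bands are selected by zeroing DFT bins (an idealized filter), the period is 32 samples rather than 500 ms, and all names are hypothetical.

```python
import cmath
import random

def dft(x, inverse=False):
    """Direct O(N^2) DFT; the inverse includes the 1/N scaling."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def band_limit(x, lo_bin, hi_bin):
    """Keep DFT bins lo_bin..hi_bin and their mirror images; zero all others."""
    n = len(x)
    kept = [v if (lo_bin <= k <= hi_bin or lo_bin <= n - k <= hi_bin) else 0
            for k, v in enumerate(dft(x))]
    return [v.real for v in dft(kept, inverse=True)]

def max_period_error(y, period):
    """Largest difference between the signal and itself one period later."""
    return max(abs(y[t] - y[t + period]) for t in range(len(y) - period))

PERIOD, REPEATS = 32, 4
rng = random.Random(1)
segment = [rng.gauss(0.0, 1.0) for _ in range(PERIOD)]
rfn = segment * REPEATS

# Harmonic h of the repetition frequency sits in DFT bin h * REPEATS.
low_band = band_limit(rfn, 2 * REPEATS, 4 * REPEATS)     # harmonics 2 to 4
high_band = band_limit(rfn, 8 * REPEATS, 12 * REPEATS)   # harmonics 8 to 12
band_difference = max(abs(a - b) for a, b in zip(low_band, high_band))
```

Both band outputs repeat every `PERIOD` samples to within rounding error, while `band_difference` shows that the two patterns themselves are not alike.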

Figure 3.13 Simulation of the nature of the basilar membrane’s response to an iterated 500 ms segment of pink noise (repetition frequency 2 Hz). The bottom of the figure shows 10 ms tracings of the unfiltered or broadband waveform (BB), with the tracings shown above obtained from the same set of 1/3-octave filters used with a pulse train in Figure 3.9. The remaining 490 ms of the period are not shown: the tracings in the figure recommence with the beginning of the next restatement of the iterated patterns. Each of the 1/3-octave filters generated a different complex 500 ms pattern. [Figure not reproduced. Panel title: portion of repeated 500 ms noise segment. Tracings: broadband waveform (BB) and 1/3-octave bands with center frequencies from 160 Hz through 6,300 Hz. Horizontal axes: time (ms), 0 to 10 and 500 to 510.]

When a pitch-range RFN having a repetition frequency of about 20 Hz is heard at a moderate sound pressure level of 70 dB, the amplitudes of the fundamental and the first few harmonics are below the high thresholds for these low frequency sinusoidal components. However, the fourth to sixth harmonics would be at or above threshold and could be resolved along the basilar membrane. These spatially resolved sinusoidal components produce the local excitation maxima that provide the basis for the “neural place” cues to repetition frequency shown in Figure 3.12, and can provide information concerning frequencies of RFNs from the lower limit of pitch to the frequency limit for audibility at about 16,000 Hz. Nerve fibers originating at the places of excitation maxima may also provide neural periodicity information based on phase locking to resolved harmonics. This temporal cue, designated as “simple pattern” in Figure 3.12, can furnish information concerning RFN repetition from about 20 Hz to the roughly 4,000 or 4,500 Hz limit of phase locking (which is designated as the “melodic” pitch range in Figure 3.12 since it corresponds to the range of fundamental frequencies employed by orchestral instruments).

In addition to the phase-locked simple pattern, there is another class of temporal cues available for determining the iterance of both infrapitch and some pitch range RFNs. As discussed previously, when a nerve fiber’s characteristic frequency is several times greater than the repetition frequency of an RFN, interaction of harmonics within the same critical band produces distinctive amplitude patterns that can modulate the responses of a nerve fiber. These local envelope patterns are designated as “complex pattern” in Figure 3.12, and are illustrated for a tonal RFN in Figure 3.11 and for an infratonal RFN in Figure 3.13. The complex patterns differ for individual critical bands, but they all have the same repetition period as the RFN. However, RFNs with a fundamental frequency above a few kilohertz cannot produce complex pattern cues to repetition frequency, since their few harmonics lying within the audible frequency range can be resolved and do not interact within the ear to produce these patterns.

Let us recapitulate the types of neural information available for the perception of iterance with our model of generic periodic stimuli consisting of RFNs that have unrestricted randomly determined relative amplitudes and phases for each of their harmonic components. There are three main types of neurophysiological cues shown in Figure 3.12 and listed below which, taken together, cover the approximately 15 octaves of iterance, with waveform repetition frequencies from about 0.5 Hz to 16,000 Hz.

1. Neural place information, requiring spectral resolution of the fundamental and/or lower harmonics and operating with RFN repetition frequencies from the lower limit of pitch at about 20 Hz up to the frequency limit of hearing at about 16,000 Hz. It is absent for infrapitch.

2. Local complex pattern information produced by the interaction of unresolved harmonics. These patterns are carried by auditory nerve fibers having characteristic frequencies several times greater than the RFN repetition frequency. This mechanism operates over the entire infrapitch range, as well as the first seven octaves of RFN pitch from about 20 Hz to about 2,500 Hz (above this limit, harmonics are resolved).

3. Phase-locked simple pattern information based upon the resolved spectral components of RFNs lying within the range from about 20 Hz to the upper limit of phase locking at about 4,000 or 4,500 Hz.
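The three ranges just listed can be summarized as a simple lookup. The sketch below treats the approximate limits given in the list as hard cutoffs, which is an assumption made for illustration only, since the text stresses that the boundaries are gradual; the 4,500 Hz figure is the upper of the two phase-locking limits mentioned, and the function name is hypothetical.

```python
def available_cues(repetition_hz):
    """List the cue types nominally available at a given RFN repetition frequency."""
    cues = []
    if 20.0 <= repetition_hz <= 16000.0:
        cues.append("neural place")       # resolved fundamental and/or lower harmonics
    if 0.5 <= repetition_hz <= 2500.0:
        cues.append("complex pattern")    # interaction of unresolved harmonics
    if 20.0 <= repetition_hz <= 4500.0:
        cues.append("simple pattern")     # phase locking to resolved components
    return cues

infratone_cues = available_cues(2.0)
tonal_cues = available_cues(200.0)
high_tone_cues = available_cues(8000.0)
```

On this reading a 2 Hz infratone relies on complex pattern information alone, a 200 Hz RFN has all three cues available, and an 8,000 Hz RFN is left with neural place information only.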

There is also a possibility (not depicted in Figure 3.12) that phase locking to local waveform envelope features rather than the fine structure components within an envelope may provide information concerning the RFN period up to the limit of audibility (see Burns and Viemeister, 1976).

If pitch and infrapitch represent regions of the same continuum, there may be infrapitch analogs of pitch-range phenomena. We have found that some phenomena that have been studied in the pitch range also occur with infrapitch. Echo pitch is one such phenomenon.

Echo pitch and infrapitch echo

When a noise is added to itself after a delay of s seconds, this repetition is heard as a pitch equal to 1/s Hz for values of s from 2 × 10⁻² through 5 × 10⁻⁴ seconds (pitches from 50 through 2,000 Hz; see Bilsen and Ritsma, 1969–1970). Echo pitch is known by several other names including time difference tone, reflection tone, time-separation pitch, repetition pitch, and rippled-noise pitch. This last name refers to its rippled power spectrum with peaks occurring at integral multiples of 1/s Hz (see Figure 3.14A). The pitch heard with this rippled power spectrum is the same as that of a complex tone having harmonics positioned at the peaks of the rippling; however, echo pitch is considerably weaker than the pitch of a complex tone. While the rippled power spectrum shown in the figure corresponds to a mixture of undelayed and delayed noise having equal amplitudes, surprisingly a pitch can still be heard for differences in intensity up to about 20 dB between the delayed and undelayed sounds (Yost and Hill, 1978).
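The rippling of the power spectrum follows from treating the mixture as an idealized delay-and-add filter: adding a copy with relative amplitude g and delay s multiplies the power at frequency f by 1 + g² + 2g cos(2πfs). The sketch below uses that standard relation; the function names and the 4 ms example delay are illustrative assumptions, not values taken from the studies cited.

```python
import math

def ripple_power_gain(freq_hz, delay_s, echo_gain=1.0):
    """Power gain |1 + g*exp(-j*2*pi*f*s)|**2 of noise mixed with its echo."""
    return (1.0 + echo_gain ** 2
            + 2.0 * echo_gain * math.cos(2.0 * math.pi * freq_hz * delay_s))

def ripple_depth_db(echo_level_db):
    """Peak-to-trough ripple depth in dB for an echo level (in dB, below 0) re the direct sound."""
    g = 10.0 ** (echo_level_db / 20.0)
    return 20.0 * math.log10((1.0 + g) / (1.0 - g))

peak = ripple_power_gain(250.0, 0.004)     # at 1/s for a 4 ms delay
trough = ripple_power_gain(125.0, 0.004)   # midway between peaks
depth_20 = ripple_depth_db(-20.0)          # echo 20 dB below the direct sound
```

For equal amplitudes the gain swings between 4 at the peaks and 0 at the troughs, whereas with the echo 20 dB down the ripple depth is only about 1.7 dB, which gives some idea of why audibility of a pitch under that condition is described as surprising.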

Figure 3.14 Rippled power spectra produced by adding a delayed restatement of broadband noise to itself. For further details, see the text. [Figure not reproduced. Panel A: cophasic (normal) echo. Panel B: antiphasic (180° phase shifted) echo. Vertical axis: power per unit bandwidth. Horizontal axis: frequency (Hz), marked at multiples of 1/2t from 1/2t through 3/t.]

The detection of repetition for recycling frozen noise at frequencies below the pitch limit, as shown in Figure 3.12, led to the expectation that it might be possible to detect infrapitch echo at long repetition delays. The anticipated infrapitch echo was reported for delays as long as 0.5 s (Warren, Bashford, and Wrightson, 1979, 1980) and confirmed by Bilsen and Wieman (1980). At delays of 0.5 s, the spectral peaks of the rippled power spectrum are separated by 2 Hz, a spacing too close to permit resolution along the basilar membrane, so that infrapitch echo could only be detected by information provided through temporal analysis. Infrapitch echo resembled infrapitch RFNs as described by Guttman and Julesz (1963), but was considerably fainter (but, based on a suggestion by Bill Hartmann, detection of infrapitch echo was made easier by 1/3-octave filtering). Although infrapitch repetition could be heard over a wide range of center frequencies, it seemed clearest at center frequencies of about 1,000 to 2,000 Hz. As shown in Figure 3.15, subjects were able to match the periods of echo delays for broadband noise quite accurately to the repetition rates of periodic sounds for values of s from 0.5 through 0.01 s, corresponding to repetition frequencies from 2 Hz through 100 Hz respectively. (Matches were made on the basis of pitch iterance at 50 Hz and above, and on the basis of infrapitch iterance from 2 Hz through 20 Hz.) At frequencies of 5 Hz and below, it was possible for subjects to dispense with a matching stimulus and indicate the repetition period of infrapitch echo by rhythmical tapping of a response key.

Figure 3.15 Apparent repetition rate of broadband noise mixed with its echo. Judgments of 50 and 100 Hz were made on the basis of pitch, and judgments of 2, 5, 10, and 20 Hz on the basis of infrapitch. For further description, see the text. (From Warren, Bashford, and Wrightson, 1980.)

The nature of the infrapitch temporal analysis used for noise mixed with its echo may be clarified if we compare it with the infrapitch iterance of RFNs. We have seen that continued repetition of a segment of Gaussian noise A (that is, AAA . . . ) can be detected readily as a global “whooshing” pattern for periods of A as long as 1 or 2 s (Guttman and Julesz, 1963). In preliminary experiments, Warren, Bashford, and Wrightson (1980) found that multiple repetitions of the same waveform were not required. If a 500 ms segment of noise (A) was repeated a single time, followed by a second 500 ms segment of noise (B) repeated once, then a third 500 ms segment of noise (C) repeated once, etc., to produce the stimulus AABBCC . . . , listeners could detect such a “pure” infrapitch echo based on these single repetitions with somewhat greater ease than they could detect mixed infrapitch echo. The classical mixed infrapitch echo can be considered to have the composition A + B, B + C, C + D, D + E, . . . , where each letter corresponds to an independent noise segment. It can be seen that on the first appearance of C it is mixed with noise segment B, and on its single reappearance it is mixed with noise segment D. A mechanism is required for the extraction of temporal information corresponding to C from two uncorrelated maskers. A relatively simple mechanism involving recognition of the restatement of exceptional features associated with C might work for pure infrapitch echo, but would not work for mixed echo. Different exceptional features would be created by the mixture of C with B, and of C with D, and any unique characteristics of C by itself would be changed, and changed differently by its mixture with B and its mixture with D. A mechanism used for detection of infrapitch mixed echo needs to be capable of recognizing masked single repetitions of long-period random patterns. Mathematical models based upon autocorrelational analysis proposed by Licklider (1951) and Yost and Hill (1979) for pitch range stimuli could, in principle, work in the infrapitch range as well.
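How an autocorrelational analysis could recover a single masked repetition can be illustrated with a toy simulation. This is a plain broadband autocorrelation of the waveform, offered only as a sketch; it is not the model of Licklider (1951) or of Yost and Hill (1979), and the sample counts, delay, and names are hypothetical.

```python
import random

def mixed_echo(n_samples, delay, seed=0):
    """Return Gaussian noise added to a copy of itself delayed by `delay` samples."""
    rng = random.Random(seed)
    source = [rng.gauss(0.0, 1.0) for _ in range(n_samples + delay)]
    # Output sample t is source[t + delay] (undelayed) plus source[t] (delayed copy),
    # so every stretch of noise occurs exactly twice, each time with a different masker.
    return [source[t + delay] + source[t] for t in range(n_samples)]

def normalized_autocorrelation(x, lag):
    """Autocorrelation at `lag`, divided by the zero-lag energy."""
    energy = sum(v * v for v in x)
    return sum(x[t] * x[t + lag] for t in range(len(x) - lag)) / energy

DELAY = 50  # hypothetical echo delay in samples
signal = mixed_echo(n_samples=4000, delay=DELAY)
scores = {lag: normalized_autocorrelation(signal, lag) for lag in range(1, 101)}
best_lag = max(scores, key=scores.get)
```

In this sketch the autocorrelation at the echo delay is close to 0.5, while at other lags it stays near zero, so `best_lag` identifies the delay even though each stretch of noise is heard only twice and is mixed with uncorrelated noise both times.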


Reversing the polarity of the delayed noise (phase shifting by 180° to produce “antiphasic” echo) produced the rippled power spectrum shown in Figure 3.14B. Infrapitch repetition could be heard with equal ease for antiphasic and cophasic addition, and the apparent repetition frequency was 1/s Hz in both cases (Warren, Bashford, and Wrightson, 1980), so that the complex temporal pattern analysis was insensitive to polarity inversion in the infrapitch range. However, inverting the echo polarity in the pitch range has long been known to produce a marked effect: antiphasic echo pitch is weaker and has two simultaneous pitches, one about 10 percent higher and one about 10 percent lower than cophasic echo when the echo delay is approximately 4 ms (Fourcin, 1965; Bilsen, 1970; Yost, Hill, and Perez-Falcon, 1978). Warren, Bashford, and Wrightson (1980) found that the change from insensitivity to the phase of the echo in the infrapitch range to the phase sensitivity characteristic of the pitch range occurred at echo delays of about 30 ms, corresponding to roughly 35 Hz.
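The difference between the two spectra of Figure 3.14 can be sketched from idealized delay-and-add arithmetic with equal amplitudes: inverting the echo changes the power gain from 2 + 2 cos(2πfs) to 2 - 2 cos(2πfs), which moves the peaks from integer multiples of 1/s to odd multiples of 1/(2s). The function names and the 4 ms delay are assumptions for illustration.

```python
import math

def echo_power_gain(freq_hz, delay_s, antiphasic=False):
    """Power gain of noise plus an equal-amplitude echo, optionally polarity-inverted."""
    sign = -1.0 if antiphasic else 1.0
    return 2.0 + 2.0 * sign * math.cos(2.0 * math.pi * freq_hz * delay_s)

def ripple_peaks_hz(delay_s, count, antiphasic=False):
    """Frequencies of the first `count` spectral peaks for a given echo delay."""
    if antiphasic:
        return [(2 * k + 1) / (2.0 * delay_s) for k in range(count)]
    return [(k + 1) / delay_s for k in range(count)]

cophasic_peaks = ripple_peaks_hz(0.004, 3)
antiphasic_peaks = ripple_peaks_hz(0.004, 3, antiphasic=True)
```

For a 4 ms delay the cophasic peaks fall at about 250, 500, and 750 Hz, and the antiphasic peaks at about 125, 375, and 625 Hz, so no antiphasic peak coincides with 1/s or any of its multiples.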

The spectral peaks of antiphasic echo pitch form an odd-harmonic sequence (i.e., frequencies 1, 3, 5, . . . times the frequency of the lowest peak). Since the pitches that had been reported were, at least for a 4 ms delay, equal to the cophasic echo pitch of 250 Hz plus or minus roughly 10 percent, this presented a troublesome problem for classical pitch theories. Bilsen and Goldstein (1974) and Bilsen (1977) attempted to account for this puzzle by considering that a central pitch extractor uses the spectrally dominant region (the fourth and fifth peaks) of the antiphasic echo spectrum to calculate the fundamental of an all-harmonic sequence having successive harmonics closest to these two frequencies. This calculation results in two closest matches or “pseudofundamentals” corresponding to the dual antiphasic echo pitches perceived by listeners. However, there are serious problems with this spectral model. The peaks above the fifth lie above the dominant region used to calculate the pseudofundamentals and should have no influence upon pitch if this model is correct, yet when peaks above the seventh were removed by filtering, the deviation of the pitches heard from 1/s Hz decreased by about 35 percent (Warren and Bashford, 1988). This observation, together with the results obtained with other filtering conditions, led Warren and Bashford to conclude that broadband antiphasic echo pitch represented a weighted average of different local pitches occurring at different spectral regions. The deviations of these local pitches from the cophasic (normal) echo pitch of 1/s Hz were attributed to the changes in local delay times produced by the 180° phase change (which is equivalent to plus or minus half the period of the local frequency). Thus, the composite or total repetition delays occurring at different local positions along the basilar membrane are equal to the normal or