6.2 Pitch of Complex Tones and Multiband Noise |
93 |
is represented as a linear combination of hemisphere-specific factors, which can explain these differences. Temporal sensations SL and spatial sensations SR can thus be modeled in terms of the contributions of the factors that dominate in each hemisphere's neural response:
SL = fL(x1l) + fL(x2l) + ... + fL(xIl), l = 1, 2, ..., L
SR = fR(x1r) + fR(x2r) + ... + fR(xIr), r = 1, 2, ..., R    (6.5)

where L + R = J.
Individual differences in the weighting of these factors can also produce differences in sensation and preference. Even for such temporal and spatial sensations, there are substantial individual differences arising from the multiple physical factors expressed by Equations (6.1) and (6.2). Individual differences can be caused by differing sensitivities to the various factors, by unique responses to them, or by both. These differences of sensation and preference can be seen as characteristics of individual listeners who have distinct auditory and visual “personalities.”
Subjective responses that are related to the overall intensity of the evoked perceptual experience (e.g., preference or annoyance) can be expressed by both temporal and spatial factors, SL and SR, so that
S = SL + SR    (6.6)
Returning to the theory of subjective preference for sound fields described in Section 3.3, each of the scale values in Equation (6.6) may be given by SL = S2 + S3 and SR = S1 + S4. It is worth noting that in subjective preference judgments for sound fields, factors such as τ1 and φ1 extracted from the ACF, and WIACC extracted from the IACF, exert only a minor influence. In the following sections, we discuss temporal sensations according to the guideline given by Equation (6.6), restricted to the significant temporal factors.
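To make Equation (6.6) concrete, the following sketch combines hemispheric scale values in Python. It assumes the 3/2-power form of the scale values from the preference theory of Section 3.3; the weighting coefficients and factor deviations used here are illustrative placeholders rather than measured values.

```python
# Sketch of Eq. (6.6): overall subjective response S as the sum of
# left-hemisphere (temporal) and right-hemisphere (spatial) scale values.
# The 3/2-power form follows the preference theory of Section 3.3; the
# alpha weights and factor deviations below are illustrative placeholders.

def scale_value(alpha, x):
    """Scale value of one orthogonal factor; negative values mean a
    loss of preference relative to the optimal condition."""
    return -alpha * abs(x) ** 1.5

# hypothetical normalized deviations of the four factors from their
# preferred values (not measured data)
S1 = scale_value(0.07, 0.2)   # listening level     -> spatial (right)
S2 = scale_value(1.42, 0.5)   # initial delay Dt1   -> temporal (left)
S3 = scale_value(0.45, 0.3)   # reverberation Tsub  -> temporal (left)
S4 = scale_value(1.45, 0.4)   # IACC                -> spatial (right)

SL = S2 + S3   # temporal contribution
SR = S1 + S4   # spatial contribution
S = SL + SR    # Eq. (6.6)
print(S)
```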
6.2 Pitch of Complex Tones and Multiband Noise
Pitches heard at the fundamental frequencies of harmonic complex tones have relatively straightforward correlates in the patterns of major peaks in their autocorrelation functions (ACFs). The pitch period corresponds to the time delay (τ1) of the first major peak. The pitch of multiband “complex noise” is likewise described by the value of τ1, and its strength is related to the value of φ1. The autocorrelation model for pitch sensation holds for fundamental frequencies below 4-5 kHz and for missing fundamentals below 1200 Hz.
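The peak-picking rule above is simple to state computationally. The following sketch, assuming an idealized stimulus (equal-amplitude harmonics 3-7 of a 200-Hz fundamental sampled at 48 kHz), computes the normalized ACF and reads off τ1 and φ1; the first major peak falls at τ1 = 5 ms, predicting a 200-Hz pitch even though no 200-Hz component is present.

```python
import numpy as np

# NACF of a missing-fundamental complex: harmonics 3-7 of 200 Hz.
fs = 48000                        # 5 ms is exactly 240 samples at 48 kHz
t = np.arange(fs // 2) / fs       # 0.5 s of signal
x = sum(np.cos(2 * np.pi * 200 * n * t) for n in range(3, 8))

# one-sided autocorrelation via FFT, zero-padded to avoid circular wrap
X = np.fft.rfft(x, n=2 * len(x))
acf = np.fft.irfft(np.abs(X) ** 2)[: len(x)]
nacf = acf / acf[0]               # normalize so that phi(0) = 1

# first major peak: maximum of the NACF in the 0.5-20 ms delay range
lo, hi = int(0.0005 * fs), int(0.020 * fs)
k = lo + int(np.argmax(nacf[lo:hi]))
tau1 = k / fs                     # pitch period (expected: 5 ms)
phi1 = float(nacf[k])             # pitch strength
print(tau1, 1.0 / tau1, phi1)     # pitch predicted as 1/tau1, ~200 Hz
```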
6.2.1 Perception of the Low Pitch of Complex Tones
Most of the sounds in tonal music that constitute the notes of melodies and harmonies are harmonic complex tones rather than pure tones. A harmonic complex tone consists of a series of partials whose frequencies (f1, f2, f3, ..., fm) are integer multiples (n = 1, 2, 3, ..., m) of its fundamental frequency (F0). Such harmonic complexes produce the strongest pitches at their fundamentals, so long as these periodicities lie in the existence region of musical tonality (roughly 30–5000 Hz). Other, weaker pitches can also be heard that correspond to individual partials, especially the first five (harmonic number n < 6). What is interesting is that harmonic complexes having no energy at the fundamental frequency in their power spectra (i.e., complexes with only “upper” partials) can still produce a strong “low” pitch at the fundamental itself. Thus, for complex tones with a “missing fundamental,” strong pitches are heard that correspond to no individual frequency component, and this raises deep questions about whether patterns of pitch perception are consistent with frequency-domain representations. In order to save the notion of the auditory system as a general Fourier processor, it becomes necessary to postulate a complicated central harmonic analyzer.
As a result of these difficulties, some auditory theorists (Seebeck, 1844; Wever, 1949; Licklider, 1951; Rose, 1980) have instead sought temporal explanations for pitch, pointing to the elegance with which time-domain representations cope with the phenomenon of the missing fundamental. In the ACF, the positions of the major peaks, which reflect the fundamental, are unchanged when the fundamental component is removed. Temporal theories have the advantage of explaining the pitch perception of both low-frequency pure tones and complex tones in terms of the same central representations and mechanisms. They account for the pitch phenomena most important for music and speech (i.e., for periodicities between 30 and 4000 Hz). These explanations notwithstanding, it is also clear that temporal representations cannot account for high-frequency hearing and the (atonal) pitches evoked by pure tones with frequencies above 5000 Hz. Moreover, most auditory centers throughout the pathway have spatially ordered frequency maps that mimic the rough tonotopic organization of the cochlea.
For these reasons, many auditory theorists have postulated that hearing is based on dual frequency- and time-domain auditory representations. Maps based on cochlear “place” have been thought to cover the frequency range of pure-tone hearing and cochlear resonances, whereas the temporal representation has been thought to cover the range of periodicities available in neuronal firing patterns (roughly up to 4–5 kHz).
The first autocorrelation model developed to account for the pitch of the missing fundamental phenomenon was therefore originally formulated as a “duplex” model (Licklider, 1951). Licklider’s time-delay neural network architecture was similar in many respects to the Jeffress (1948) model of binaural crosscorrelation that had been proposed 3 years earlier. Licklider used a network of delay lines and coincidence counters arranged along the axes of frequency and delay to compute both a central spectrum and a central global temporal autocorrelation representation. Licklider’s later “triplex” model (1959) added a binaural crosscorrelation stage to the duplex model. In a similar vein, Cherry and Sayers (1956) combined autocorrelation and crosscorrelation operations to deal with issues related to aural fusion, sound separation, and directional hearing.
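Licklider's scheme, frequency analysis followed by per-channel temporal analysis, can be caricatured numerically. The toy model below is an illustration, not Licklider's neural implementation: it splits a missing-fundamental complex into brick-wall frequency channels, half-wave rectifies each channel as a crude stand-in for hair-cell transduction, autocorrelates each channel, and sums across channels. The summary function peaks at the 5-ms period of the absent 200-Hz fundamental.

```python
import numpy as np

# Toy duplex sketch: frequency channels, half-wave rectification, and
# per-channel autocorrelation summed into one "summary" function.
fs = 48000
t = np.arange(3 * fs // 10) / fs   # 0.3 s of signal
x = sum(np.cos(2 * np.pi * 200 * n * t) for n in range(3, 8))  # no 200 Hz

def bandpass(sig, f_lo, f_hi):
    """Brick-wall band-pass via FFT, a crude stand-in for cochlear filtering."""
    spec = np.fft.rfft(sig)
    freqs = np.fft.rfftfreq(len(sig), 1.0 / fs)
    spec[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(spec, n=len(sig))

def acf(sig):
    spec = np.fft.rfft(sig, n=2 * len(sig))
    return np.fft.irfft(np.abs(spec) ** 2)[: len(sig)]

summary = np.zeros(len(x))
for fc in (600, 800, 1000, 1200, 1400):
    chan = np.maximum(bandpass(x, fc - 100, fc + 100), 0.0)  # rectify
    summary += acf(chan)
summary /= summary[0]

lo, hi = int(0.0005 * fs), int(0.020 * fs)
k = lo + int(np.argmax(summary[lo:hi]))
print(k / fs)   # summary peak near the 5 ms period of the absent fundamental
```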
After a series of turns in the evolution of pitch theory (for a historical review, see de Boer, 1976), temporal models were neglected in favor of spectral pattern approaches. In the wake of the difficulties with Schouten’s temporal theory, spectral pattern recognition models were proposed to explain the strong low pitches produced by low, perceptually resolved harmonics (Goldstein, 1973; Wightman, 1973a,b; Terhardt, 1974). Two mechanisms were assumed: a spectral pattern mechanism for the strong pitches of perceptually resolved low harmonics, and a temporal mechanism for the weak pitches of perceptually unresolved high harmonics. Because the best models for low-frequency pure-tone pitch discrimination use interspike interval information, some theorists (Goldstein, 1973) left open the possibility that central representations of frequency might be based on interspike interval information in early auditory stations. Explicit temporal representations were thus marginalized to pitches produced by unresolved harmonics, phenomena that are largely irrelevant for pitch in music and speech.
Beginning in the 1980s, temporal models for pitch that were based on first-order interspike intervals (times between successive spikes produced by a given neuron) in the auditory nerve were proposed (Moore, 2003; van Noorden, 1982). In these models, interspike interval information was pooled together from all regions of the auditory nerve to form a temporal population code for frequency and periodicity. By the end of the decade, temporal autocorrelation models for pitch were revived and tested using computer simulations of the cochlea and auditory nerve (Meddis and Hewitt, 1991a,b). These autocorrelation models are based on all-order interspike intervals (times between all spikes produced by a neuron, consecutive and nonconsecutive) rather than first-order intervals. Soon after, neurophysiological studies of temporal discharge patterns in the cat auditory nerve (Cariani and Delgutte, 1996a,b; Cariani, 1999, 2001) were conducted to test the temporal models. Taken together, the computer simulations and neurophysiological studies showed that the temporal autocorrelation models based on interspike interval distributions could predict a very wide range of pitch phenomena: pitch of the missing fundamental, pitch equivalence between pure and complex tones, level and phase invariance, pitch shift of inharmonic complex tones, pitch dominance, octave similarity, and the nonspectral pitch of amplitude-modulated noise.
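The distinction between first-order and all-order intervals is easy to state in code: all-order intervals are the time differences between every pair of spikes, so their histogram is in effect an autocorrelation of the spike train. The sketch below uses a synthetic spike train, phase-locked to a 5-ms period with jitter and random deletions, purely as an illustration, not physiological data.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy spike train phase-locked to a 5 ms pitch period: spikes near integer
# multiples of the period, with 0.2 ms jitter and random spike deletions
period = 0.005
times = period * np.arange(200) + rng.normal(0.0, 0.0002, 200)
spikes = np.sort(times[rng.random(200) < 0.6])  # keep each spike with p = 0.6

# all-order intervals: positive differences between every pair of spikes
diffs = spikes[None, :] - spikes[:, None]
all_order = diffs[diffs > 0]

# interval histogram (0.1 ms bins, 0-8 ms): the mode sits at the pitch period
hist, edges = np.histogram(all_order, bins=80, range=(0.0, 0.008))
peak_interval = edges[np.argmax(hist)] + 0.00005  # bin center
print(peak_interval)   # close to 0.005 s
```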
Analogous phenomena have also been observed for nonperiodic, inharmonic complex tones as well as for nonstationary sounds (noises). It is important to note that more advanced temporal models go well beyond autocorrelation operations on the stimulus itself to include cochlear filtering and neuronal dynamics. Another line of research on temporal models for pitch has focused on the role of cochlear filtering in shaping the temporal structure of the resulting signals. These studies (Yost et al., 1978; Yost, 1996a,b) used rippled noise stimuli to probe pitch strength, peripheral weighting, and the effects of the dominance region for pitch (Ritsma, 1967).
Time-domain cancellation models involving an array of delay lines and inhibitory gating neurons have also been proposed; these generally behave in a manner similar to models based on autocorrelation (de Cheveigné, 1998, 2004).
Here we propose a model for pitch that is based on a central autocorrelation representation (the ACF). The ACF model predicts the pitches not only of complex tones and rippled noise, but also of multiband complex noise with missing fundamentals. Pitch can be calculated from the delay τ1 associated with the first major ACF peak, and pitch strength corresponds to the amplitude φ1 of this peak. The main purpose of the experiments described below is to apply the ACF model to predict the pitch of a harmonic complex with a missing fundamental.
First, a pitch-matching test comparing the pitches of pure and complex tones was performed to reconfirm previous results (Sumioka and Ando, 1996). The test signals were complex tones consisting of harmonics 3–7 of a 200-Hz fundamental. All tone components had the same amplitude, as shown in Fig. 6.1. Two waveform conditions, (a) in-phase and (b) random-phase, were applied as shown in Fig. 6.2. The starting phases of all components of the in-phase stimuli were set at zero. The phases of the components of the random-phase stimuli were set randomly to avoid any periodic peaks in the real waveforms. As shown in Fig. 6.3, the normalized ACFs (NACFs) of these stimuli were calculated over the integration interval 2T = 0.8 s. Though the waveforms differ greatly from each other, as shown in Fig. 6.3, their NACFs are identical. The time delay at the first maximum peak of the NACF, τ1, equals 5 ms (200 Hz), corresponding to the fundamental frequency. Five 20- to 26-year-old musicians participated as subjects in the experiment. Test signals were produced from a loudspeaker in front of each subject in a semi-anechoic chamber. The SPL of each complex tone at the center position of the listener’s head was fixed at 74 dB, as determined from the ACF at zero delay, Φ(0). The distance between a subject and the loudspeaker was 0.8 m ± 1 cm.
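The phase manipulation used in this experiment can be reproduced numerically. The sketch below, with an arbitrary random seed standing in for the experiment's random phases, builds in-phase and random-phase versions of harmonics 3-7 of 200 Hz and confirms that their NACFs nearly coincide, both peaking at τ1 = 5 ms.

```python
import numpy as np

fs = 48000
t = np.arange(4 * fs // 5) / fs   # 2T = 0.8 s, as in the experiment
rng = np.random.default_rng(1)

def complex_tone(phases):
    """Harmonics 3-7 of 200 Hz with the given starting phases."""
    return sum(np.cos(2 * np.pi * 200 * n * t + p)
               for n, p in zip(range(3, 8), phases))

def nacf(sig):
    spec = np.fft.rfft(sig, n=2 * len(sig))
    r = np.fft.irfft(np.abs(spec) ** 2)[: len(sig)]
    return r / r[0]

in_phase = nacf(complex_tone([0.0] * 5))
random_phase = nacf(complex_tone(rng.uniform(0, 2 * np.pi, 5)))

# both NACFs peak at the same delay, despite very different waveforms
lo, hi = int(0.0005 * fs), int(0.020 * fs)
tau1_in = (lo + int(np.argmax(in_phase[lo:hi]))) / fs
tau1_rand = (lo + int(np.argmax(random_phase[lo:hi]))) / fs
print(tau1_in, tau1_rand)                        # both near 5 ms
print(np.max(np.abs(in_phase - random_phase)))   # small residual difference
```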
Pitch-matching results for the five subjects are shown in Fig. 6.4. The histograms show matching frequencies within each semitone (1/12-octave) band for in-phase and random-phase stimuli. The dominant pitch match was 200 Hz, a frequency absent from the spectra of both stimuli; this periodicity is not at all apparent in the waveform of the random-phase signal. However, it is readily apparent in the autocorrelation functions of the two stimuli, which are identical to each other (Fig. 6.3). For both in-phase and random-phase conditions, about 60% of the responses clustered within a semitone of the fundamental. There are no major differences in the distributions of pitch-matching data between the two conditions.
In more detail, the averages and standard deviations (SD) of the data obtained from each subject at frequencies near 200 Hz are listed in Table 6.1. Results obtained under the two conditions are very similar; in fact, the pitch strength remains invariant across both conditions. Thus, the pitch of complex tones can be predicted from the time delay τ1 at the first major peak of the NACF. This conclusion is in agreement with the findings of Yost (1996a), who demonstrated that the pitch of iterated rippled noise is determined by the first major ACF peak of the stimulus signal.
From Equation (6.6), pitch, as one of the temporal sensations, may be expressed by

S = SL = fL(τ1) ≈ 1/τ1 (Hz),    (6.7)

when φ1 = 1.
Fig. 6.1 Complex tone presented with pure-tone components of 600, 800, 1000, 1200, and 1400 Hz without the fundamental frequency of 200 Hz
Fig. 6.2 Waveforms of 200 Hz missing-fundamental complex tones consisting of in-phase components (top) and random-phase components (bottom)
Fig. 6.3 Normalized autocorrelation function (NACF) of the two complex tones with different phase components, τ1 = 5 ms (200 Hz)
Individual differences in pitch perception were also found. The results for each subject are shown in Fig. 6.5. Subjects B and D matched only around the fundamental frequency (200 Hz). About 20% of all responses clustered around 400 Hz, even though the NACF has a distinct dip at τ = 2.5 ms (Fig. 6.3). However, an octave shift with phase change (Lundeen and Small, 1984) was not observed in the results obtained from these subjects. Subjects A and E matched both at the fundamental frequency and at the frequency an octave higher. This octave confusion might be caused by the perceptual similarity of octave-related pitches. The time delay of the ACF for this pitch is
Fig. 6.4 Results of pitch-matching tests for the two complex tones, τ1 = 5 ms (five subjects)
Table 6.1 Mean and standard deviation (SD) of the pitch-matching test for each subject

| Subject | In-phase mean (Hz) | Random-phase mean (Hz) | In-phase SD (Hz) | Random-phase SD (Hz) |
|---------|--------------------|------------------------|------------------|----------------------|
| A       | 202.6              | 201.0                  | 1.89             | 2.44                 |
| B       | 199.1              | 198.3                  | 1.70             | 1.42                 |
| C       | 202.5              | 202.1                  | 1.18             | 1.76                 |
| D       | 203.7              | 201.7                  | 2.29             | 1.65                 |
| E       | 202.2              | 202.2                  | 1.87             | 2.07                 |
| Total   | 201.9              | 201.0                  | 2.43             | 2.38                 |
Fig. 6.5 Results of the pitch-matching tests for each of five subjects. (a–e)
2.5 ms, so this pitch cannot be predicted from the ACF because it falls at a dip in the ACF structure. None of the subjects matched at τ1 = 10 ms (100 Hz), an octave below the fundamental frequency, even though there is a peak at τ1 = 10 ms (Fig. 6.3). Subject C matched in three categories of center frequencies (200.0, 224.5, and 317.5 Hz). Subject C may have sought a harmonic relation because he is a musician who plays in the key of E-flat: two notes of the E-flat major triad, E-flat and G, correspond to the semitone bins with center frequencies of 317.5 and 200 Hz, respectively. Despite these categorical errors, Subject C’s pitch matches in the vicinity of 200 Hz (Table 6.1) were comparable in accuracy to those of the other subjects.
6.2.2 Pitch of Multiband “Complex Noise”
The purpose of this experiment using complex noise was to determine whether the amplitude φ1 of the first major autocorrelation peak governs the perceived strength of the pitch.
The experimental method was the same as in the experiment described in the previous section. Each component was a band-pass white noise with a cutoff slope of 1080 dB/octave, and the bandwidth of the bands was varied across conditions. The center frequencies of the band-noise components were 600, 800, 1000, 1200, and 1400 Hz. The complex signal consisting of band-pass noises with different center frequencies is called here “complex noise.” The bandwidths (Δf) of the four conditions were 40, 80, 120, and 160 Hz (Fig. 6.6). The waveforms (Fig. 6.7, left plots) have no obvious envelope periodicities. Measured results of the NACF for the four conditions are shown on the right of Fig. 6.7. The amplitude of the maximum peak (indicated by arrows in the figures) in the NACF increases with decreasing Δf. Four musicians from the first test and one new musician, 20 to 25 years old, participated as subjects in this experiment.
Fig. 6.6 Multiband complex noise containing five passbands with center frequencies: 600, 800, 1,000, 1,200, and 1,400 Hz. The fundamental frequency is centered on 200 Hz
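The bandwidth dependence of pitch strength can be previewed from the signal NACF. In the sketch below, idealized brick-wall bands stand in for the steep 1080 dB/octave filters, and the NACF amplitude at the 5-ms delay serves as the estimate of φ1; the estimate grows as Δf shrinks, in line with the measured NACFs of Fig. 6.7.

```python
import numpy as np

fs = 48000
n = fs // 2                       # 0.5 s of signal
freqs = np.fft.rfftfreq(n, 1.0 / fs)
rng = np.random.default_rng(2)

def complex_noise(bw):
    """Five random-phase noise bands (width bw, in Hz) centered on harmonics
    3-7 of 200 Hz; brick-wall bands stand in for the steep analog filters."""
    spec = np.zeros(len(freqs), dtype=complex)
    for fc in (600, 800, 1000, 1200, 1400):
        band = (freqs >= fc - bw / 2) & (freqs <= fc + bw / 2)
        spec[band] = np.exp(1j * rng.uniform(0, 2 * np.pi, band.sum()))
    return np.fft.irfft(spec, n=n)

def phi_at_5ms(sig):
    """NACF amplitude at the 5 ms delay, an estimate of pitch strength phi1."""
    spec = np.fft.rfft(sig, n=2 * len(sig))
    r = np.fft.irfft(np.abs(spec) ** 2)[: len(sig)]
    return r[int(0.005 * fs)] / r[0]

phi1 = [phi_at_5ms(complex_noise(bw)) for bw in (40, 80, 120, 160)]
print(phi1)   # decreases as bandwidth increases
```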
The probabilities of matching responses, counted in each 1/12-octave band, are shown in Fig. 6.8. All histograms show a strong tendency to perceive a pitch of 200 Hz for each stimulus. This agrees with the prediction based on the value of τ1. These results indicate that a stimulus with a narrow bandwidth gives a stronger pitch corresponding to 200 Hz than does a stimulus with a wide bandwidth. The standard deviation (SD) for the perceived pitches increased because the value