- Preface
- Acknowledgments
- Contents
- 1 Introduction
- 1.1 Auditory Temporal and Spatial Factors
- 1.2 Auditory System Model for Temporal and Spatial Information Processing
- 2.1 Analysis of Source Signals
- 2.1.1 Power Spectrum
- 2.1.2 Autocorrelation Function (ACF)
- 2.1.3 Running Autocorrelation
- 2.2 Physical Factors of Sound Fields
- 2.2.1 Sound Transmission from a Point Source through a Room to the Listener
- 2.2.2 Temporal-Monaural Factors
- 2.2.3 Spatial-Binaural Factors
- 2.3 Simulation of a Sound Field in an Anechoic Enclosure
- 3 Subjective Preferences for Sound Fields
- 3.2.1 Optimal Listening Level (LL)
- 3.2.4 Optimal Magnitude of Interaural Crosscorrelation (IACC)
- 3.3 Theory of Subjective Preferences for Sound Fields
- 3.4 Evaluation of Boston Symphony Hall Based on Temporal and Spatial Factors
- 4.1.1 Brainstem Response Correlates of Sound Direction in the Horizontal Plane
- 4.1.2 Brainstem Response Correlates of Listening Level (LL) and Interaural Crosscorrelation Magnitude (IACC)
- 4.1.3 Remarks
- 4.2.2 Hemispheric Lateralization Related to Spatial Aspects of Sound
- 4.2.3 Response Latency Correlates of Subjective Preference
- 4.3 Electroencephalographic (EEG) Correlates of Subjective Preference
- 4.3.3 EEG Correlates of Interaural Correlation Magnitude (IACC) Changes
- 4.4.1 Preferences and the Persistence of Alpha Rhythms
- 4.4.2 Preferences and the Spatial Extent of Alpha Rhythms
- 4.4.3 Alpha Rhythm Correlates of Annoyance
- 5.1 Signal Processing Model of the Human Auditory System
- 5.1.1 Summary of Neural Evidence
- 5.1.1.1 Physical Characteristics of the Ear
- 5.1.1.2 Left and Right Auditory Brainstem Responses (ABRs)
- 5.1.1.3 Left and Right Hemisphere Slow Vertex Responses (SVRs)
- 5.1.1.4 Left and Right Hemisphere EEG Responses
- 5.1.1.5 Left and Right Hemisphere MEG Responses
- 5.1.2 Auditory Signal Processing Model
- 5.2 Temporal Factors Extracted from Autocorrelations of Sound Signals
- 5.3 Auditory Temporal Window for Autocorrelation Processing
- 5.5 Auditory Temporal Window for Binaural Processing
- 5.6 Hemispheric Specialization for Spatial Attributes of Sound Fields
- 6 Temporal Sensations of the Sound Signal
- 6.1 Combinations of Temporal and Spatial Sensations
- 6.2 Pitch of Complex Tones and Multiband Noise
- 6.2.1 Perception of the Low Pitch of Complex Tones
- 6.2.3 Frequency Limits of Missing Fundamentals
- 6.3 Beats Induced by Dual Missing Fundamentals
- 6.4 Loudness
- 6.4.1 Loudness of Sharply Filtered Noise
- 6.4.2 Loudness of Complex Noise
- 6.6 Timbre of an Electric Guitar Sound with Distortion
- 6.6.3 Concluding Remarks
- 7 Spatial Sensations of Binaural Signals
- 7.1 Sound Localization
- 7.1.1 Cues of Localization in the Horizontal Plane
- 7.1.2 Cues of Localization in the Median Plane
- 7.2 Apparent Source Width (ASW)
- 7.2.1 Apparent Width of Bandpass Noise
- 7.2.2 Apparent Width of Multiband Noise
- 7.3 Subjective Diffuseness
- 8.1 Pitches of Piano Notes
- 8.2 Design Studies of Concert Halls as Public Spaces
- 8.2.1 Genetic Algorithms (GAs) for Shape Optimization
- 8.2.2 Two Actual Designs: Kirishima and Tsuyama
- 8.3 Individualized Seat Selection Systems for Enhancing Aural Experience
- 8.3.1 A Seat Selection System
- 8.3.2 Individual Subjective Preference
- 8.3.3 Distributions of Listener Preferences
- 8.5 Concert Hall as Musical Instrument
- 8.5.1 Composing with the Hall in Mind: Matching Music and Reverberation
- 8.5.2 Expanding the Musical Image: Spatial Expression and Apparent Source Width
- 8.5.3 Enveloping Music: Spatial Expression and Musical Dynamics
- 8.6 Performing in a Hall: Blending Musical Performances with Sound Fields
- 8.6.1 Choosing a Performing Position on the Stage
- 8.6.2 Performance Adjustments that Optimize Temporal Factors
- 8.6.3 Towards Future Integration of Composition, Performance and Hall Acoustics
- 9.1 Effects of Temporal Factors on Speech Reception
- 9.2 Effects of Spatial Factors on Speech Reception
- 9.3 Effects of Sound Fields on Perceptual Dissimilarity
- 9.3.1 Perceptual Distance due to Temporal Factors
- 9.3.2 Perceptual Distance due to Spatial Factors
- 10.1 Method of Noise Measurement
- 10.2 Aircraft Noise
- 10.3 Flushing Toilet Noise
- 11.1 Noise Annoyance in Relation to Temporal Factors
- 11.1.1 Annoyance of Band-Pass Noise
- 11.2.1 Experiment 1: Effects of SPL and IACC Fluctuations
- 11.2.2 Experiment 2: Effects of Sound Movement
- 11.3 Effects of Noise and Music on Children
- 12 Introduction to Visual Sensations
- 13 Temporal and Spatial Sensations in Vision
- 13.1 Temporal Sensations of Flickering Light
- 13.1.1 Conclusions
- 13.2 Spatial Sensations
- 14 Subjective Preferences in Vision
- 14.1 Subjective Preferences for Flickering Lights
- 14.2 Subjective Preferences for Oscillatory Movements
- 14.3 Subjective Preferences for Texture
- 14.3.1 Preferred Regularity of Texture
- 15.1 EEG Correlates of Preferences for Flickering Lights
- 15.1.1 Persistence of Alpha Rhythms
- 15.1.2 Spatial Extent of Alpha Rhythms
- 15.2 MEG Correlates of Preferences for Flickering Lights
- 15.2.1 MEG Correlates of Sinusoidal Flicker
- 15.2.2 MEG Correlates of Fluctuating Flicker Rates
- 15.3 EEG Correlates of Preferences for Oscillatory Movements
- 15.4 Hemispheric Specializations in Vision
- 16 Summary of Auditory and Visual Sensations
- 16.1 Auditory Sensations
- 16.1.1 Auditory Temporal Sensations
- 16.1.2 Auditory Spatial Sensations
- 16.1.3 Auditory Subjective Preferences
- 16.1.4 Effects of Noise on Tasks and Annoyance
- 16.2.1 Temporal and Spatial Sensations in Vision
- 16.2.2 Visual Subjective Preferences
- References
- Glossary of Symbols
- Abbreviations
- Author Index
- Subject Index
6.2 Pitch of Complex Tones and Multiband Noise
Fig. 6.7 Waveforms and the NACF of the four complex noises applied with f = (a) 40 Hz, (b) 80 Hz, (c) 120 Hz, and (d) 160 Hz
of φ1 decreased as f increased. The probability of matches to 400 Hz (one octave above 200 Hz) kept increasing as the bandwidth became narrower; this reflects the octave similarity in pitch perception, which also appeared in the first experiment. The probability of the pitch being identified around 200 Hz is plotted in Fig. 6.9 as a function of φ1. For narrower-band noise, the probability of a pitch at the fundamental frequency increases as the magnitude of the 5-ms peak in the NACF increases; thus, as φ1 increases, pitch strength also increases (r = 0.98). In this figure, the pitch-matching result from the previous section using complex tones is also plotted at φ1 = 1.0.
Individual differences were also observed in the results obtained in tests with complex noises (Sakai et al., unpublished data). To summarize the results of this experiment, we found that the ACF model also successfully predicts the pitch of multiband complex noise stimuli with missing fundamentals.
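The quantities τ1 and φ1 used throughout this discussion can be illustrated computationally. The sketch below is a minimal illustration only, not the analysis code used in the experiments; the sampling rate, signal duration, and function names are our own assumptions. It computes the normalized ACF of a discretized signal and extracts the delay τ1 and magnitude φ1 of its first major peak, here for a complex tone with a missing 200-Hz fundamental:

```python
# Minimal sketch of NACF-based pitch extraction (illustrative assumptions:
# fs = 8000 Hz, 1-s signal, harmonics 2-4 of a missing 200-Hz fundamental).
import numpy as np

def nacf(x, max_lag):
    """Normalized autocorrelation phi(tau) = Phi(tau) / Phi(0)."""
    x = x - x.mean()
    phi0 = np.dot(x, x)
    return np.array([np.dot(x[:x.size - k], x[k:]) / phi0
                     for k in range(max_lag + 1)])

def first_major_peak(phi):
    """Lag tau_1 and height phi_1 of the first major maximum
    after the zero-lag main lobe (i.e., past the first zero crossing)."""
    k = 1
    while k < phi.size and phi[k] > 0:
        k += 1                          # skip the main lobe
    tau1 = k + int(np.argmax(phi[k:]))
    return tau1, phi[tau1]

fs = 8000
t = np.arange(fs) / fs                  # 1 s of signal
# Complex tone with harmonics 2-4 of 200 Hz, all starting in phase
x = sum(np.cos(2 * np.pi * h * 200 * t) for h in (2, 3, 4))
tau1, phi1 = first_major_peak(nacf(x, max_lag=100))
# tau1 / fs is 5 ms, the period of the missing 200-Hz fundamental,
# and phi1 is close to 1 for this fully periodic stimulus
```

For band-limited noise carrying the same harmonics, φ1 at the 5-ms lag would fall below 1, tracking the weaker pitch strength described above.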
Fig. 6.8 Results of pitch-matching tests, with all five subjects. f: (a) 40 Hz, (b) 80 Hz, (c) 120 Hz, and (d) 160 Hz

6.2.3 Frequency Limits of Missing Fundamentals

We conducted a pitch-matching experiment to determine the upper frequency limit of pitches evoked by harmonic complex tones with missing fundamentals. Pitch-matching tests were conducted for two conditions: (1) complex tones consisting of harmonics 2–4 (i.e., 2F0, 3F0, 4F0) of fundamental frequencies F0 of 500, 1000, 1200, 1600, 2000, or 3000 Hz; and (2) complex tones consisting only of harmonics 2 and 3 (i.e., 2F0, 3F0). It was found that (1) the ACF model holds for missing fundamental frequencies up to roughly 1200 Hz, and (2) within this frequency range, the pitch can be reliably matched to the missing fundamental frequency even if the harmonic complex consists of only two tones.
For fundamental frequencies of 500, 1000, 1200, 1600, 2000, and 3000 Hz, stimuli consisting of two or three pure-tone components were produced in a computer (Inoue et al., 2001). The two-component stimuli consisted of the second and third harmonics of the fundamental frequency, and the three-component stimuli consisted of the second, third, and fourth harmonics. The starting phase of all components was
adjusted to zero (in phase). The total SPL at the center of the listener’s head was fixed at 74 dB. The NACF of each stimulus was calculated to obtain the peak τ1 related to the fundamental frequency. The loudspeaker was placed in front of the subject in an anechoic chamber; the distance between the center of the subject’s head and the loudspeaker was 0.8 m. Three musicians, 21 to 27 years old, participated as subjects in the experiment. Pitch-matching tests were conducted using the complex tones as test stimuli and a pure tone generated by a sinusoidal generator as the reference.

Fig. 6.9 Relationship between φ1 and probability of the pitch being within 200 ± 16 Hz (r = 0.98, p < 0.01). For reference, the plot ( ) at φ1 = 1 is the result with the pure tone
Pitch matches for all subjects are shown in Fig. 6.10. Whenever the missing fundamental frequency of the stimulus was 500, 1000, or 1200 Hz, more than 90% of the responses obtained from all subjects under both conditions clustered around the fundamental frequency. When the missing fundamental was 1600, 2000, or 3000 Hz, however, the probability that the subjects matched the frequency of the pure tone to the fundamental frequency was much lower. These results imply that the ACF model is applicable when stimuli have missing fundamentals of 1200 Hz or less.

Fig. 6.10 Probability that three subjects adjusted a pure tone near the fundamental frequency of complex tones. Empty circles are results for two harmonics, and filled squares for three harmonics
The reasons for this upper limit are fairly straightforward. According to neuronal autocorrelation models, evoking a “missing fundamental” requires satisfying at least one of two conditions, involving respectively individual, cochlear-resolved harmonics or the envelopes of unresolved, interacting harmonics (Cariani and Delgutte, 1996a, b). In the first mode, interspike intervals associated with individual harmonics are produced and summed together across the auditory nerve. Here one needs at least two resolved harmonics below the limit of significant phase-locked temporal information (~4000 Hz), such that interspike intervals associated with their common subharmonic, the fundamental, predominate in the pooled ACF representation. In the second mode, pairs of unresolved adjacent harmonics beat together to produce interspike intervals associated with their beat period, which is the fundamental period. For several reasons, this mechanism, based on an interval representation of the stimulus envelope, is less effective at producing intervals close to the fundamental period, and as a consequence the pitches it evokes are weaker than those associated with the first mechanism.

In the current context, representing a 1500-Hz missing fundamental via the envelope-based mechanism would require several pairs of unresolved harmonics, all at 9000 Hz or above (n > 5; for F0 = 1500 Hz, fn ≥ 9000 Hz). Because relatively few auditory nerve fibers in humans respond to such high frequencies, and intervals from all regions are pooled together, the intervals associated with envelopes in these frequency regions are dwarfed by the spontaneous activity in the rest of the auditory nerve.
The result is that the interval peaks associated with the F0 envelope period are very shallow and do not rise above the signal-to-background threshold required for an audible low pitch [in Cariani and Delgutte (1996a), AM tones with 6400-Hz carriers and F0s from 80 to 320 Hz, presented at 60 dB SPL, did not generate enough intervals to exceed this threshold].
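The envelope-based mode can be illustrated with a simple simulation. The sketch below is our own illustration, not code from the cited studies; the sampling rate, component frequencies, and smoothing window are assumptions. Two unresolved adjacent harmonics of a 200-Hz fundamental are half-wave rectified and smoothed, a crude stand-in for the loss of carrier phase locking at high frequencies, and the autocorrelation of the resulting envelope peaks at the fundamental period:

```python
# Two adjacent high harmonics (6400 and 6600 Hz, i.e., harmonics 32 and 33
# of F0 = 200 Hz) beat at 200 Hz. Rectifying and smoothing away the carrier
# leaves an envelope whose autocorrelation peaks at the 5-ms fundamental
# period. (Parameter values are illustrative assumptions.)
import numpy as np

fs = 48000
t = np.arange(fs) / fs                        # 1 s of signal
x = np.cos(2 * np.pi * 6400 * t) + np.cos(2 * np.pi * 6600 * t)

rect = np.maximum(x, 0.0)                     # half-wave rectification
# Moving average ~2 carrier periods long removes the carrier ripple
env = np.convolve(rect, np.ones(15) / 15, mode="same")
env = env - env.mean()

max_lag = 400
phi0 = np.dot(env, env)
phi = np.array([np.dot(env[:env.size - k], env[k:]) / phi0
                for k in range(max_lag + 1)])

k = 1
while phi[k] > 0:                             # skip the zero-lag main lobe
    k += 1
tau1 = k + int(np.argmax(phi[k:]))            # ~240 samples = 5 ms = 1/F0
```

The peak at 5 ms is present but depends on how cleanly the envelope is extracted, which is consistent with the weaker pitches this mechanism produces.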
On the low-frequency side of fundamental pitch perception, in psychophysical experiments, the lowest periodicities that produce clear pitches capable of supporting melodic recognition are approximately 30 Hz (Pressnitzer et al., 2001). This may be a consequence of a limitation in the longest interspike intervals that central auditory pitch processors analyze. Many current ACF models of pitch and consonance (e.g., Cariani, 2001, 2002, 2004) therefore use a tapering interval weighting system that eliminates from consideration intervals longer than 33 ms.
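Such a tapering weighting can be sketched as follows. The linear taper used here is a hypothetical choice for illustration; published models differ in the exact shape of the window:

```python
# Hypothetical linear interval-weighting window: weight 1 at zero lag,
# tapering to 0 at 33 ms, so that periodicities slower than ~30 Hz
# contribute nothing to the pooled ACF representation.
import numpy as np

def interval_weight(lags_ms, cutoff_ms=33.0):
    """Linear taper from 1 at lag 0 to 0 at cutoff_ms; zero beyond."""
    w = 1.0 - np.asarray(lags_ms, dtype=float) / cutoff_ms
    return np.clip(w, 0.0, None)

# Applying the window to an ACF sampled at 1-ms lags:
lags = np.arange(0, 50)                    # lags in ms
weights = interval_weight(lags)            # weights[33:] are all zero
```

Multiplying a pooled ACF by such a window removes peaks at lags beyond 33 ms, reproducing the ~30-Hz lower limit of melodic pitch noted above.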
It is worth noting that evoked magnetic responses (N1m latency) correspond to the fundamental frequency down to 19 Hz (Yrttiaho et al., 2008). Thus, Equation (6.7) could hold for fundamental frequencies in the range 19 Hz < fL(τ1) ≤ 1200 Hz.
So far, we have come to the following two conclusions:
1. For low pitches of complex tones, the ACF model is applicable when the missing fundamental frequency is below 1200 Hz, and probably above 19 Hz.
