- •Preface
- •Acknowledgments
- •Contents
- •1 Introduction
- •1.1 Auditory Temporal and Spatial Factors
- •1.2 Auditory System Model for Temporal and Spatial Information Processing
- •2.1 Analysis of Source Signals
- •2.1.1 Power Spectrum
- •2.1.2 Autocorrelation Function (ACF)
- •2.1.3 Running Autocorrelation
- •2.2 Physical Factors of Sound Fields
- •2.2.1 Sound Transmission from a Point Source through a Room to the Listener
- •2.2.2 Temporal-Monaural Factors
- •2.2.3 Spatial-Binaural Factors
- •2.3 Simulation of a Sound Field in an Anechoic Enclosure
- •3 Subjective Preferences for Sound Fields
- •3.2.1 Optimal Listening Level (LL)
- •3.2.4 Optimal Magnitude of Interaural Crosscorrelation (IACC)
- •3.3 Theory of Subjective Preferences for Sound Fields
- •3.4 Evaluation of Boston Symphony Hall Based on Temporal and Spatial Factors
- •4.1.1 Brainstem Response Correlates of Sound Direction in the Horizontal Plane
- •4.1.2 Brainstem Response Correlates of Listening Level (LL) and Interaural Crosscorrelation Magnitude (IACC)
- •4.1.3 Remarks
- •4.2.2 Hemispheric Lateralization Related to Spatial Aspects of Sound
- •4.2.3 Response Latency Correlates of Subjective Preference
- •4.3 Electroencephalographic (EEG) Correlates of Subjective Preference
- •4.3.3 EEG Correlates of Interaural Correlation Magnitude (IACC) Changes
- •4.4.1 Preferences and the Persistence of Alpha Rhythms
- •4.4.2 Preferences and the Spatial Extent of Alpha Rhythms
- •4.4.3 Alpha Rhythm Correlates of Annoyance
- •5.1 Signal Processing Model of the Human Auditory System
- •5.1.1 Summary of Neural Evidence
- •5.1.1.1 Physical Characteristics of the Ear
- •5.1.1.2 Left and Right Auditory Brainstem Responses (ABRs)
- •5.1.1.3 Left and Right Hemisphere Slow Vertex Responses (SVRs)
- •5.1.1.4 Left and Right Hemisphere EEG Responses
- •5.1.1.5 Left and Right Hemisphere MEG Responses
- •5.1.2 Auditory Signal Processing Model
- •5.2 Temporal Factors Extracted from Autocorrelations of Sound Signals
- •5.3 Auditory Temporal Window for Autocorrelation Processing
- •5.5 Auditory Temporal Window for Binaural Processing
- •5.6 Hemispheric Specialization for Spatial Attributes of Sound Fields
- •6 Temporal Sensations of the Sound Signal
- •6.1 Combinations of Temporal and Spatial Sensations
- •6.2 Pitch of Complex Tones and Multiband Noise
- •6.2.1 Perception of the Low Pitch of Complex Tones
- •6.2.3 Frequency Limits of Missing Fundamentals
- •6.3 Beats Induced by Dual Missing Fundamentals
- •6.4 Loudness
- •6.4.1 Loudness of Sharply Filtered Noise
- •6.4.2 Loudness of Complex Noise
- •6.6 Timbre of an Electric Guitar Sound with Distortion
- •6.6.3 Concluding Remarks
- •7 Spatial Sensations of Binaural Signals
- •7.1 Sound Localization
- •7.1.1 Cues of Localization in the Horizontal Plane
- •7.1.2 Cues of Localization in the Median Plane
- •7.2 Apparent Source Width (ASW)
- •7.2.1 Apparent Width of Bandpass Noise
- •7.2.2 Apparent Width of Multiband Noise
- •7.3 Subjective Diffuseness
- •8.1 Pitches of Piano Notes
- •8.2 Design Studies of Concert Halls as Public Spaces
- •8.2.1 Genetic Algorithms (GAs) for Shape Optimization
- •8.2.2 Two Actual Designs: Kirishima and Tsuyama
- •8.3 Individualized Seat Selection Systems for Enhancing Aural Experience
- •8.3.1 A Seat Selection System
- •8.3.2 Individual Subjective Preference
- •8.3.3 Distributions of Listener Preferences
- •8.5 Concert Hall as Musical Instrument
- •8.5.1 Composing with the Hall in Mind: Matching Music and Reverberation
- •8.5.2 Expanding the Musical Image: Spatial Expression and Apparent Source Width
- •8.5.3 Enveloping Music: Spatial Expression and Musical Dynamics
- •8.6 Performing in a Hall: Blending Musical Performances with Sound Fields
- •8.6.1 Choosing a Performing Position on the Stage
- •8.6.2 Performance Adjustments that Optimize Temporal Factors
- •8.6.3 Towards Future Integration of Composition, Performance and Hall Acoustics
- •9.1 Effects of Temporal Factors on Speech Reception
- •9.2 Effects of Spatial Factors on Speech Reception
- •9.3 Effects of Sound Fields on Perceptual Dissimilarity
- •9.3.1 Perceptual Distance due to Temporal Factors
- •9.3.2 Perceptual Distance due to Spatial Factors
- •10.1 Method of Noise Measurement
- •10.2 Aircraft Noise
- •10.3 Flushing Toilet Noise
- •11.1 Noise Annoyance in Relation to Temporal Factors
- •11.1.1 Annoyance of Band-Pass Noise
- •11.2.1 Experiment 1: Effects of SPL and IACC Fluctuations
- •11.2.2 Experiment 2: Effects of Sound Movement
- •11.3 Effects of Noise and Music on Children
- •12 Introduction to Visual Sensations
- •13 Temporal and Spatial Sensations in Vision
- •13.1 Temporal Sensations of Flickering Light
- •13.1.1 Conclusions
- •13.2 Spatial Sensations
- •14 Subjective Preferences in Vision
- •14.1 Subjective Preferences for Flickering Lights
- •14.2 Subjective Preferences for Oscillatory Movements
- •14.3 Subjective Preferences for Texture
- •14.3.1 Preferred Regularity of Texture
- •15.1 EEG Correlates of Preferences for Flickering Lights
- •15.1.1 Persistence of Alpha Rhythms
- •15.1.2 Spatial Extent of Alpha Rhythms
- •15.2 MEG Correlates of Preferences for Flickering Lights
- •15.2.1 MEG Correlates of Sinusoidal Flicker
- •15.2.2 MEG Correlates of Fluctuating Flicker Rates
- •15.3 EEG Correlates of Preferences for Oscillatory Movements
- •15.4 Hemispheric Specializations in Vision
- •16 Summary of Auditory and Visual Sensations
- •16.1 Auditory Sensations
- •16.1.1 Auditory Temporal Sensations
- •16.1.2 Auditory Spatial Sensations
- •16.1.3 Auditory Subjective Preferences
- •16.1.4 Effects of Noise on Tasks and Annoyance
- •16.2.1 Temporal and Spatial Sensations in Vision
- •16.2.2 Visual Subjective Preferences
- •References
- •Glossary of Symbols
- •Abbreviations
- •Author Index
- •Subject Index
Chapter 2
Temporal and Spatial Aspects of Sounds
and Sound Fields
Temporal sequences of sounds are transformed into neural activity patterns that propagate through auditory pathways where more centrally-located processors concurrently analyze their form and interpret their meaning and relevance. Thus, a great deal of attention is paid here to analyzing the signal in the time domain. This chapter mainly treats technical aspects of the running autocorrelation function (ACF) of the signal, which contains the envelope and its finer structures as well as the power at the origin of time of the ACF. The ACF has the same information as the power density spectrum of the signal under analysis. From the ACF, however, significant factors may be extracted that are directly related to temporal sensations. The ACF signal representation exists in the auditory pathway, as is discussed in Chapters 4 and 5.
2.1 Analysis of Source Signals
2.1.1 Power Spectrum
As usual, we first discuss signal analysis in the frequency domain, in terms of the power density spectrum of a signal of time domain p(t), which is defined by
Pd(ω) = P(ω)P (ω), |
(2.1) |
where ω = 2πf, f is the frequency (Hz) and P(ω) is the Fourier transform of p(t), given by
|
1 |
+∞ |
|
P(ω) = |
|
p(t)e−jωtdt |
(2.2) |
2π |
|||
|
|
−∞ |
|
and the asterisk denotes the conjugate.
Y. Ando, P. Cariani (Guest ed.), Auditory and Visual Sensations, |
9 |
DOI 10.1007/b13253_2, C Springer Science+Business Media, LLC 2009 |
|
10 |
2 Temporal and Spatial Aspects of Sounds and Sound Fields |
The inverse Fourier transform is the original signal p(t):
p(t) = |
+∞ |
|
P(ω)ejωtdω |
(2.3) |
−∞
2.1.2 Autocorrelation Function (ACF)
For our purposes, the most useful signal representation, after the peripheral power spectrum process, is the Autocorrelation Function (ACF) of a source signal, which is defined by
lim 1 |
+T |
p (t)p (t + τ)dt |
|
|
p(τ) = T → ∞ |
|
|
(2.4) |
|
2T |
−T |
|||
where p’(t) = p(t) s(t), s(t) is the sensitivity of the ear. For practical convenience, s(t) can be chosen as the impulse response of an A-weighted network. It is worth noting that the physical system between the ear entrance and the oval window forms almost the same characteristics as the ear’s sensitivity (Ando, 1985, 1998).
The ACF can also be obtained from the power density spectrum, which is defined by Equation (2.4), so that
p(τ) = |
+∞ Pd(ω)ejωtdt |
(2.5) |
|
−∞ |
|
Pd(ω) = |
+∞ |
|
p(τ)e−jωτdτ |
(2.6) |
−∞
Thus, the ACF and the power density spectrum mathematically contain the same information. The normalized ACF is defined by
φp(τ) = p(τ)/p(0) |
(2.7) |
There are four significant items that can be extracted from the ACF:
(1)Energy represented at the origin of the delay, p(0).
(2)As shown in Fig. 2.1, the width of amplitude φ(τ), around the origin of the delay time defined at a value of 0.5, is Wφ(0), according to the fact that φ(τ) is en even function.
(3)Fine structure, including peaks and delays: For instance, τ1 and φ1 are the delay time and the amplitude of the first peak of the ACF, τn and φn being the delay time and the amplitude of the n-th peak. Usually, there are certain correlations between τ1 and τn +1 and between φ1 and φn+1, so that significant factors are τ1 and φ1.
(4)The effective duration of the envelope of the normalized ACF, τe, which is defined by the tenth-percentile delay and which represents a repetitive feature or reverberation containing the sound source itself (Fig. 2.2).
2.1Analysis of Source Signals
Fig. 2.1 Definition of temporal factors, τ 1 and φ1, as features of the normalized autocorrelation function (NACF). τ 1 is the time delay associated with the first ACF peak. φ1 is the relative amplitude of the first peak
φ (τ) p
11
Wφ(0)/2
1
φ 1
0
τ1
–1
0
Delay time τ
Fig. 2.2 Determination of the effective duration of running ACF, τe. Effective duration of the normalized ACF is defined by the delay τe at which the envelope of the normalized ACF becomes 0.1
10 log φp(τ) [dB]
0 τe
–5
–10
–15
0 |
100 |
200 |
|
Delay time τ |
[ms] |
The autocorrelation function (ACF) of any sinusoid (pure tone) having any phase is a zero-phase cosine of the same frequency. Since its waveform and ACF envelope is flat, with a slope of zero, its effective duration τe is infinite. The ACF of white noise with infinite bandwidth is the Dirac delta function δ(τ) which has an infinite slope. This means that the signal has an effective duration that approaches zero (no temporal coherence). As the bandwidth of the noise decreases, the effective duration (signal coherence) increases.
Table 2.1 lists three music and speech sources that were used extensively in many of our experiments. Motif A is the slow and sombre Royal Pavane by Orlando Gibbons (1583–1625). Motif B is the fast and playful final movement of Sinfonietta by Malcolm Arnold (1921–2006). Speech S is a poem by Japanese novelist and poet Doppo Kunikida (1871–1908). Examples of normalized ACFs (2T = 35 s) for the two extremes of slow and fast music are shown in Fig. 2.3.
12 |
2 Temporal and Spatial Aspects of Sounds and Sound Fields |
Table 2.1 Music and speech source signals used and their effective duration of the long-term ACF, τe measured in the early investigations (Ando, 1977; Ando and Kageyama, 1977), and the minimum value of running ACF, (τe)min (Ando, 1998)
Sound source1 |
Title |
Composer or writer |
τe2 (ms) |
(τe)min3 (ms) |
Music motif A |
Royal Pavane |
Orlando Gibbons |
127 (127) |
125 |
Music motif B |
Sinfonietta, Opus 48; |
Malcolm Arnold |
43 (35) |
40 |
|
Movement IV |
|
|
|
Speech S |
Poem read by a female |
Doppo Kunikida |
10 (12) |
|
|
|
|
|
|
1The left channel signals of original recorded signals (Burd, 1969) were used.
2Values of τe differ slightly with different radiation characteristics of loudspeakers used; thus all physical factors must be measured at the same condition of the hearing tests, 2T = 35 s.
3The value of (τe)min is defined by the minimum value in the running or short-moving ACF, for this analysis 2T = 2 s, with a running interval of 100 ms (see Section 5.3 for a recommended
2T). Subjective judgments may be made at the most active piece of sound signals.
(a)
(b)
Fig. 2.3 Examples of the long-time ACF (2T = 35 s) analyzed in the early stage of systematical investigations (Ando, 1977). (a) Music motif A: Royal Pavane, τe = 127 ms. (b) Music motif B: Sinfonietta, Opus 48, Movement III, allegro con brio, τe = 43 ms. Note that, according to the different characteristics of the loudspeakers used in the subjective judgments, the effective duration of ACF may differ slightly, for example, τe = 35 ms (music motif B)
