In this investigation, another factor, Wφ(0), was not considered, because only more recently, in 2007, was it identified as a factor related to timbre (Section 6.6). The factor Wφ(0) is related to WIACC because both are determined by a signal's frequency composition. Once Wφ(0) is also taken into consideration, the present results may be explained more precisely.
9.2 Effects of Spatial Factors on Speech Reception
We are interested in the effect of sound fields on the interactions of sounds, and in particular in how reverberant environments degrade speech sounds. In these experiments, a loudspeaker located in front of the listener presented single syllables, while continuous white noise, acting as a disturbance, was produced from another loudspeaker located at different horizontal angles. Three temporal factors and the sound energy were extracted from the ACF of the speech signal, and three spatial factors were extracted from the IACF. Results show that two factors had significant effects on syllable identification: the effective duration (τe)min among the temporal factors extracted from the running ACF, and WIACC among the spatial factors extracted from the IACF.
In the previous section, we discussed how temporal factors extracted from the running ACF are related to speech intelligibility in sound fields with single echoes. Here, the auditory model was used in an attempt to account for the identification of single syllables under noise disturbances from different directions (Ando and Yamasaki, unpublished). It is assumed that the specialization of the human cerebral hemispheres may underlie the highly independent contributions of spatial and temporal factors to speech identification. "Cocktail party effects" might well be explained by such specialization of the human brain, because speech is mainly processed in the left hemisphere, while spatial information is independently processed in the right hemisphere at the same time. Based on such a model, we described temporal and spatial sensations in Chapters 6 and 7, respectively. According to the model shown in Fig. 5.1, three temporal factors associated with the left hemisphere, together with the sound energy, were extracted from the ACF of the sound signal arriving at one of the ear entrances. In addition, three spatial factors associated with the right hemisphere were extracted from the IACF of the sound signals arriving at the two ear entrances. The running ACF and the running IACF were analyzed with an integration interval of 2T = 30 ms and running steps of 10 ms.
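The running-ACF analysis described above can be sketched in a few lines. The sketch below is illustrative only: the function names are mine, the test signal is a toy decaying tone, and τe is estimated here simply as the first delay at which the normalized ACF falls below 0.1, rather than by the envelope-regression definition used in the book.

```python
import numpy as np

def running_acf(signal, fs, t_start, window_ms=30.0):
    """Normalized ACF of one 2T = 30 ms frame starting at t_start (seconds)."""
    n = int(fs * window_ms / 1000.0)
    i0 = int(fs * t_start)
    frame = signal[i0:i0 + n]
    acf = np.correlate(frame, frame, mode="full")[n - 1:]  # non-negative lags
    return acf / acf[0]  # normalize so that phi(0) = 1

def effective_duration(acf, fs):
    """Simplified tau_e: first delay at which |phi(tau)| decays below 0.1."""
    below = np.where(np.abs(acf) < 0.1)[0]
    return below[0] / fs if below.size else len(acf) / fs

# toy signal: a decaying 440 Hz tone, analyzed with running steps of 10 ms
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t) * np.exp(-3 * t)
taus = [effective_duration(running_acf(x, fs, 0.010 * k), fs) for k in range(50)]
```

The same frame/step scheme (2T = 30 ms, 10 ms steps) would be applied to the IACF of the two-ear signals to obtain the spatial factors.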
For identification of the speech signals, psychological distances between characteristics of single syllables are calculated by Equation (9.2). The distance is a function of four factors extracted from the ACF, which are mainly associated with neuronal responses of the left cerebral hemisphere. In addition, to capture the effects of off-direction noise, three spatial factors are extracted from the IACF; these are associated with the right cerebral hemisphere (Fig. 5.1). The distances due to the spatial factors, DIACC, DτIACC, and DWIACC, are given by
186 | 9 Applications (II) – Speech Reception in Sound Fields
$$D_{\mathrm{IACC}}(X,K) = \frac{1}{I}\sum_{i=1}^{I}\left|\mathrm{IACC}_{i,X}^{\mathrm{SF}} - \mathrm{IACC}_{i,K}^{\mathrm{T}}\right|$$

$$D_{\tau_{\mathrm{IACC}}}(X,K) = \frac{1}{I}\sum_{i=1}^{I}\left|\tau_{\mathrm{IACC}\,i,X}^{\mathrm{SF}} - \tau_{\mathrm{IACC}\,i,K}^{\mathrm{T}}\right| \tag{9.6}$$

$$D_{W_{\mathrm{IACC}}}(X,K) = \frac{1}{I}\sum_{i=1}^{I}\left|W_{\mathrm{IACC}\,i,X}^{\mathrm{SF}} - W_{\mathrm{IACC}\,i,K}^{\mathrm{T}}\right|$$

where i runs over the I running frames, and the superscripts SF and T denote the sound field under test (X) and the template (K), respectively.
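Each distance in Equation (9.6) is a mean absolute difference of one factor between the test sound field and the template, averaged over the running frames. A minimal sketch (the frame values below are hypothetical, chosen only to illustrate the computation):

```python
import numpy as np

def factor_distance(test_frames, template_frames):
    """D(X, K): mean absolute difference of one factor (e.g. IACC) over I frames."""
    test = np.asarray(test_frames, dtype=float)
    template = np.asarray(template_frames, dtype=float)
    assert test.shape == template.shape  # both must cover the same I frames
    return float(np.mean(np.abs(test - template)))

# hypothetical IACC values over I = 5 running frames
iacc_test = [0.42, 0.55, 0.61, 0.50, 0.47]       # sound field with noise
iacc_template = [0.90, 0.88, 0.93, 0.91, 0.89]   # template syllable
d_iacc = factor_distance(iacc_test, iacc_template)
```

The same function applies unchanged to the τIACC and WIACC frame sequences.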
In general, shorter distances between the template syllable and a syllable accompanied by noise signify higher intelligibility. By multiple regression analysis, the non-identification (NI) rate — the rate of syllables that were not matched with the template — was calculated directly, so that
$$\mathrm{NI}(S_0,S_X) = S_L + S_R = \left[aD_{\tau_e} + bD_{\tau_1} + cD_{\phi_1}\right]_L + \left[dD_{P(0)} + eD_{\mathrm{IACC}} + fD_{\tau_{\mathrm{IACC}}} + gD_{W_{\mathrm{IACC}}}\right]_R \tag{9.7}$$

where $S_L = [aD_{\tau_e} + bD_{\tau_1} + cD_{\phi_1}]_L$, $S_R = [dD_{P(0)} + eD_{\mathrm{IACC}} + fD_{\tau_{\mathrm{IACC}}} + gD_{W_{\mathrm{IACC}}}]_R$, and P(0) is measured in dBA. The seven factors are classified into the left and right hemispheres by the model (Fig. 5.1). Note that the listening level, P(0), is associated with the right hemisphere (Table 5.1). The weighting coefficients a through g in Equation (9.7) were determined by fitting the calculated NI to the experimental data.
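Equation (9.7) is a plain weighted sum, so it is straightforward to evaluate once the distances and coefficients are in hand. In the sketch below the dictionary keys are my own naming; the coefficient values are those later reported in Table 9.4, the distances are those of Table 9.3 for noise at 30◦, and f is set to zero because the τIACC factor was eliminated from the analysis.

```python
# weighting coefficients (Table 9.4); f for tau_IACC set to 0 (factor eliminated)
weights = {"tau_e": 0.335, "tau_1": 0.028, "phi_1": 0.136,
           "P0": 0.053, "IACC": 0.086, "tau_IACC": 0.0, "W_IACC": 0.384}

# psychological distances for the noise disturbance at 30 degrees (Table 9.3)
distances = {"tau_e": 0.420, "tau_1": 0.164, "phi_1": 0.442,
             "P0": 0.064, "IACC": 0.248, "tau_IACC": 0.0, "W_IACC": 0.052}

def non_identification(dist, w):
    """NI = S_L + S_R from Equation (9.7): weighted sum of factor distances."""
    s_left = sum(w[k] * dist[k] for k in ("tau_e", "tau_1", "phi_1"))
    s_right = sum(w[k] * dist[k] for k in ("P0", "IACC", "tau_IACC", "W_IACC"))
    return s_left + s_right

ni_30 = non_identification(distances, weights)
```

The left- and right-hemisphere terms are kept separate, mirroring the S_L and S_R grouping of the model.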
Fourteen single syllables, /pa/ /pu/ /te/ /zo/ /bo/ /yo/ /mi/ /ne/ /kya/ /kyo/ /pya/ /gya/ /nya/ /zya/, with 4-s intervals between syllables, were presented to each subject by the frontal loudspeaker (ξ = 0◦; distance to the center of the subject's head, d = 70 ± 1 cm) in an anechoic chamber. The white noise used as a disturbance was produced continuously by one of the loudspeakers located at different horizontal angles: ξ = 30◦, 60◦, 90◦, 120◦, or 180◦ (d = 70 cm). The sound pressure levels of both the speech signals and the continuous white noise, measured in terms of P(0), were fixed at a peak level of 65.0 dBA. Ten subjects participated in the experiment; they were asked to identify which syllable was heard.
For example, values of τe extracted from the running ACF for the signal /mi/, with and without the noise (ξ = 90◦), are shown as a function of time in Fig. 9.6. For both template and test syllables with the noise, the important initial parts of the speech signal — those indicating (0) < 0.5, as shown in Fig. 9.7 — were applied in the computations by Equations (9.6) and (9.7).
Results for the non-identification (NI) rate of some single syllables as a function of the horizontal angle ξ of the noise disturbance are shown in Fig. 9.8. Similar tendencies in NI were found across these syllables. When the noise arrived from 30◦, NI reached its maximum within the range of horizontal angles tested, and when the noise was presented from 120◦, it reached its minimum. The same was true for the averaged NI rate, as shown in Fig. 9.9.
Fig. 9.6 Values of the effective duration τe extracted from the running ACF for the frontal signal /mi/ alone and for /mi/ with the white noise from ξ = 90◦
Fig. 9.7 For comparison, the initial segments analyzed of a frontal single syllable with and without the white noise from ξ = 90◦
Fig. 9.8 Examples of the percentage of non-identification (NI) for single syllables as a function of the horizontal angle ξ of the white noise. At ξ = 120◦, the percentage of NI was minimal for the single syllables
Fig. 9.9 Averaged percentage of non-identified syllables over all single syllables tested, obtained from the listening test for different angles ξ of white-noise incidence as a disturbance
Because the direct speech sound arrived from the frontal direction, the value of τIACC was always close to zero and essentially invariant; this factor was therefore eliminated from the analysis by Equation (9.7) (Table 9.3). The minima of the psychological distances were always found for the noise disturbance from 120◦, so the NI values were minimal there. On the other hand, when the noise disturbance arrived from 30◦, the distance due to τe for all of the syllables commonly indicated the maxima among the six factors.
Table 9.3 Psychological distance calculated due to each of six factors

| Horizontal angle of noise | DP(0) | Dτe | Dτ1 | Dφ1 | DIACC | DWIACC |
|---|---|---|---|---|---|---|
| 30◦ | 0.064 | 0.420 | 0.164 | 0.442 | 0.248 | 0.052 |
| 60◦ | 0.056 | 0.351 | 0.247 | 0.355 | 0.266 | 0.049 |
| 90◦ | 0.063 | 0.348 | 0.162 | 0.401 | 0.292 | 0.049 |
| 120◦ | 0.058 | 0.279 | 0.157 | 0.376 | 0.270 | 0.043 |
| 180◦ | 0.074 | 0.383 | 0.171 | 0.494 | 0.247 | 0.071 |
The weighting coefficients in Equation (9.7) for the six factors are listed in Table 9.4. According to the weighting coefficients obtained here, the factors τe and WIACC contributed most significantly to the NI. For each single syllable, the relationship between the values calculated by Equation (9.7) and the measured values is shown in Fig. 9.10. A clear linear relationship was obtained (r = 0.86, p < 0.01).
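The book does not spell out the fitting procedure for the coefficients, but one plausible reading is an ordinary least-squares fit of the weighted-distance model to the measured NI rates, with the correlation coefficient r as the goodness-of-fit check. The sketch below works on synthetic data (the 70 conditions, the uniform distance values, and the noise level are all my assumptions; only the six coefficient values come from Table 9.4):

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical dataset: six factor distances for 70 syllable-by-angle conditions
# (14 syllables x 5 noise angles)
X = rng.uniform(0.0, 0.5, size=(70, 6))
w_true = np.array([0.053, 0.335, 0.028, 0.136, 0.086, 0.384])  # Table 9.4
ni_measured = X @ w_true + rng.normal(0.0, 0.01, size=70)  # noisy "measurements"

# least-squares estimate of the weighting coefficients
w_fit, *_ = np.linalg.lstsq(X, ni_measured, rcond=None)

# correlation between calculated and measured NI, as in Fig. 9.10
ni_calc = X @ w_fit
r = np.corrcoef(ni_calc, ni_measured)[0, 1]
```

With real data, r plays the role of the reported r = 0.86 between calculated and measured NI.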
Table 9.4 Weighting coefficients determined

| Factor | P(0) | τe | τ1 | φ1 | IACC | WIACC |
|---|---|---|---|---|---|---|
| Coefficient | 0.053 | 0.335 | 0.028 | 0.136 | 0.086 | 0.384 |
