- •Preface
- •Introduction
- •1.1 Spatial coordinate systems
- •1.2 Sound fields and their physical characteristics
- •1.2.1 Free-field and sound waves generated by simple sound sources
- •1.2.2 Reflections from boundaries
- •1.2.3 Directivity of sound source radiation
- •1.2.4 Statistical analysis of acoustics in an enclosed space
- •1.2.5 Principle of sound receivers
- •1.3 Auditory system and perception
- •1.3.1 Auditory system and its functions
- •1.3.2 Hearing threshold and loudness
- •1.3.3 Masking
- •1.3.4 Critical band and auditory filter
- •1.4 Artificial head models and binaural signals
- •1.4.1 Artificial head models
- •1.4.2 Binaural signals and head-related transfer functions
- •1.5 Outline of spatial hearing
- •1.6 Localization cues for a single sound source
- •1.6.1 Interaural time difference
- •1.6.2 Interaural level difference
- •1.6.3 Cone of confusion and head movement
- •1.6.4 Spectral cues
- •1.6.5 Discussion on directional localization cues
- •1.6.6 Auditory distance perception
- •1.7 Summing localization and spatial hearing with multiple sources
- •1.7.1 Summing localization with two sound sources
- •1.7.2 The precedence effect
- •1.7.3 Spatial auditory perceptions with partially correlated and uncorrelated source signals
- •1.7.4 Auditory scene analysis and spatial hearing
- •1.7.5 Cocktail party effect
- •1.8 Room reflections and auditory spatial impression
- •1.8.1 Auditory spatial impression
- •1.8.2 Sound field-related measures and auditory spatial impression
- •1.8.3 Binaural-related measures and auditory spatial impression
- •1.9.1 Basic principle of spatial sound
- •1.9.2 Classification of spatial sound
- •1.9.3 Developments and applications of spatial sound
- •1.10 Summary
- •2.1 Basic principle of a two-channel stereophonic sound
- •2.1.1 Interchannel level difference and summing localization equation
- •2.1.2 Effect of frequency
- •2.1.3 Effect of interchannel phase difference
- •2.1.4 Virtual source created by interchannel time difference
- •2.1.5 Limitation of two-channel stereophonic sound
- •2.2.1 XY microphone pair
- •2.2.2 MS transformation and the MS microphone pair
- •2.2.3 Spaced microphone technique
- •2.2.4 Near-coincident microphone technique
- •2.2.5 Spot microphone and pan-pot technique
- •2.2.6 Discussion on microphone and signal simulation techniques for two-channel stereophonic sound
- •2.3 Upmixing and downmixing between two-channel stereophonic and mono signals
- •2.4 Two-channel stereophonic reproduction
- •2.4.1 Standard loudspeaker configuration of two-channel stereophonic sound
- •2.4.2 Influence of front-back deviation of the head
- •2.5 Summary
- •3.1 Physical and psychoacoustic principles of multichannel surround sound
- •3.2 Summing localization in multichannel horizontal surround sound
- •3.2.1 Summing localization equations for multiple horizontal loudspeakers
- •3.2.2 Analysis of the velocity and energy localization vectors of the superposed sound field
- •3.2.3 Discussion on horizontal summing localization equations
- •3.3 Multiple loudspeakers with partly correlated and low-correlated signals
- •3.4 Summary
- •4.1 Discrete quadraphone
- •4.1.1 Outline of the quadraphone
- •4.1.2 Discrete quadraphone with pair-wise amplitude panning
- •4.1.3 Discrete quadraphone with the first-order sound field signal mixing
- •4.1.4 Some discussions on discrete quadraphones
- •4.2 Other horizontal surround sounds with regular loudspeaker configurations
- •4.2.1 Six-channel reproduction with pair-wise amplitude panning
- •4.2.2 The first-order sound field signal mixing and reproduction with M ≥ 3 loudspeakers
- •4.3 Transformation of horizontal sound field signals and Ambisonics
- •4.3.1 Transformation of the first-order horizontal sound field signals
- •4.3.2 The first-order horizontal Ambisonics
- •4.3.3 The higher-order horizontal Ambisonics
- •4.3.4 Discussion and implementation of the horizontal Ambisonics
- •4.4 Summary
- •5.1 Outline of surround sounds with accompanying picture and general uses
- •5.2 5.1-Channel surround sound and its signal mixing analysis
- •5.2.1 Outline of 5.1-channel surround sound
- •5.2.2 Pair-wise amplitude panning for 5.1-channel surround sound
- •5.2.3 Global Ambisonic-like signal mixing for 5.1-channel sound
- •5.2.4 Optimization of three frontal loudspeaker signals and local Ambisonic-like signal mixing
- •5.2.5 Time panning for 5.1-channel surround sound
- •5.3 Other multichannel horizontal surround sounds
- •5.4 Low-frequency effect channel
- •5.5 Summary
- •6.1 Summing localization in multichannel spatial surround sound
- •6.1.1 Summing localization equations for spatial multiple loudspeaker configurations
- •6.1.2 Velocity and energy localization vector analysis for multichannel spatial surround sound
- •6.1.3 Discussion on spatial summing localization equations
- •6.1.4 Relationship with the horizontal summing localization equations
- •6.2 Signal mixing methods for a pair of vertical loudspeakers in the median and sagittal plane
- •6.3 Vector base amplitude panning
- •6.4 Spatial Ambisonic signal mixing and reproduction
- •6.4.1 Principle of spatial Ambisonics
- •6.4.2 Some examples of the first-order spatial Ambisonics
- •6.4.4 Recreating a top virtual source with a horizontal loudspeaker arrangement and Ambisonic signal mixing
- •6.5 Advanced multichannel spatial surround sounds and problems
- •6.5.1 Some advanced multichannel spatial surround sound techniques and systems
- •6.5.2 Object-based spatial sound
- •6.5.3 Some problems related to multichannel spatial surround sound
- •6.6 Summary
- •7.1 Basic considerations on the microphone and signal simulation techniques for multichannel sounds
- •7.2 Microphone techniques for 5.1-channel sound recording
- •7.2.1 Outline of microphone techniques for 5.1-channel sound recording
- •7.2.2 Main microphone techniques for 5.1-channel sound recording
- •7.2.3 Microphone techniques for the recording of three frontal channels
- •7.2.4 Microphone techniques for ambience recording and combination with frontal localization information recording
- •7.2.5 Stereophonic plus center channel recording
- •7.3 Microphone techniques for other multichannel sounds
- •7.3.1 Microphone techniques for other discrete multichannel sounds
- •7.3.2 Microphone techniques for Ambisonic recording
- •7.4 Simulation of localization signals for multichannel sounds
- •7.4.1 Methods of the simulation of directional localization signals
- •7.4.2 Simulation of virtual source distance and extension
- •7.4.3 Simulation of a moving virtual source
- •7.5 Simulation of reflections for stereophonic and multichannel sounds
- •7.5.1 Delay algorithms and discrete reflection simulation
- •7.5.2 IIR filter algorithm of late reverberation
- •7.5.3 FIR, hybrid FIR, and recursive filter algorithms of late reverberation
- •7.5.4 Algorithms of audio signal decorrelation
- •7.5.5 Simulation of room reflections based on physical measurement and calculation
- •7.6 Directional audio coding and multichannel sound signal synthesis
- •7.7 Summary
- •8.1 Matrix surround sound
- •8.1.1 Matrix quadraphone
- •8.1.2 Dolby Surround system
- •8.1.3 Dolby Pro-Logic decoding technique
- •8.1.4 Some developments on matrix surround sound and logic decoding techniques
- •8.2 Downmixing of multichannel sound signals
- •8.3 Upmixing of multichannel sound signals
- •8.3.1 Some considerations in upmixing
- •8.3.2 Simple upmixing methods for front-channel signals
- •8.3.3 Simple methods for ambient component separation
- •8.3.4 Model and statistical characteristics of two-channel stereophonic signals
- •8.3.5 A scale-signal-based algorithm for upmixing
- •8.3.6 Upmixing algorithm based on principal component analysis
- •8.3.7 Algorithm based on the least mean square error for upmixing
- •8.3.8 Adaptive normalized algorithm based on the least mean square for upmixing
- •8.3.9 Some advanced upmixing algorithms
- •8.4 Summary
- •9.1 Each order approximation of ideal reproduction and Ambisonics
- •9.1.1 Each order approximation of ideal horizontal reproduction
- •9.1.2 Each order approximation of ideal three-dimensional reproduction
- •9.2 General formulation of multichannel sound field reconstruction
- •9.2.1 General formulation of multichannel sound field reconstruction in the spatial domain
- •9.2.2 Formulation of spatial-spectral domain analysis of circular secondary source array
- •9.2.3 Formulation of spatial-spectral domain analysis for a secondary source array on spherical surface
- •9.3 Spatial-spectral domain analysis and driving signals of Ambisonics
- •9.3.1 Reconstructed sound field of horizontal Ambisonics
- •9.3.2 Reconstructed sound field of spatial Ambisonics
- •9.3.3 Mixed-order Ambisonics
- •9.3.4 Near-field compensated higher-order Ambisonics
- •9.3.5 Ambisonic encoding of complex source information
- •9.3.6 Some special applications of spatial-spectral domain analysis of Ambisonics
- •9.4 Some problems related to Ambisonics
- •9.4.1 Secondary source array and stability of Ambisonics
- •9.4.2 Spatial transformation of Ambisonic sound field
- •9.5 Error analysis of Ambisonic-reconstructed sound field
- •9.5.1 Integral error of Ambisonic-reconstructed wavefront
- •9.5.2 Discrete secondary source array and spatial-spectral aliasing error in Ambisonics
- •9.6 Multichannel reconstructed sound field analysis in the spatial domain
- •9.6.1 Basic method for analysis in the spatial domain
- •9.6.2 Minimizing error in reconstructed sound field and summing localization equation
- •9.6.3 Multiple receiver position matching method and its relation to the mode-matching method
- •9.7 Listening room reflection compensation in multichannel sound reproduction
- •9.8 Microphone array for multichannel sound field signal recording
- •9.8.1 Circular microphone array for horizontal Ambisonic recording
- •9.8.2 Spherical microphone array for spatial Ambisonic recording
- •9.8.3 Discussion on microphone array recording
- •9.9 Summary
- •10.1 Basic principle and implementation of wave field synthesis
- •10.1.1 Kirchhoff–Helmholtz boundary integral and WFS
- •10.1.2 Simplification of the types of secondary sources
- •10.1.3 WFS in a horizontal plane with a linear array of secondary sources
- •10.1.4 Finite secondary source array and effect of spatial truncation
- •10.1.5 Discrete secondary source array and spatial aliasing
- •10.1.6 Some issues and related problems on WFS implementation
- •10.2 General theory of WFS
- •10.2.1 Green’s function of Helmholtz equation
- •10.2.2 General theory of three-dimensional WFS
- •10.2.3 General theory of two-dimensional WFS
- •10.2.4 Focused source in WFS
- •10.3 Analysis of WFS in the spatial-spectral domain
- •10.3.1 General formulation and analysis of WFS in the spatial-spectral domain
- •10.3.2 Analysis of the spatial aliasing in WFS
- •10.3.3 Spatial-spectral division method of WFS
- •10.4 Further discussion on sound field reconstruction
- •10.4.1 Comparison among various methods of sound field reconstruction
- •10.4.2 Further analysis of the relationship between acoustical holography and sound field reconstruction
- •10.4.3 Further analysis of the relationship between acoustical holography and Ambisonics
- •10.4.4 Comparison between WFS and Ambisonics
- •10.5 Equalization of WFS under nonideal conditions
- •10.6 Summary
- •11.1 Basic principles of binaural reproduction and virtual auditory display
- •11.1.1 Binaural recording and reproduction
- •11.1.2 Virtual auditory display
- •11.2 Acquisition of HRTFs
- •11.2.1 HRTF measurement
- •11.2.2 HRTF calculation
- •11.2.3 HRTF customization
- •11.3 Basic physical features of HRTFs
- •11.3.1 Time-domain features of far-field HRIRs
- •11.3.2 Frequency domain features of far-field HRTFs
- •11.3.3 Features of near-field HRTFs
- •11.4 HRTF-based filters for binaural synthesis
- •11.5 Spatial interpolation and decomposition of HRTFs
- •11.5.1 Directional interpolation of HRTFs
- •11.5.2 Spatial basis function decomposition and spatial sampling theorem of HRTFs
- •11.5.3 HRTF spatial interpolation and signal mixing for multichannel sound
- •11.5.4 Spectral shape basis function decomposition of HRTFs
- •11.6 Simplification of signal processing for binaural synthesis
- •11.6.1 Virtual loudspeaker-based algorithms
- •11.6.2 Basis function decomposition-based algorithms
- •11.7.1 Principle of headphone equalization
- •11.7.2 Some problems with binaural reproduction and VAD
- •11.8 Binaural reproduction through loudspeakers
- •11.8.1 Basic principle of binaural reproduction through loudspeakers
- •11.8.2 Virtual source distribution in two-front loudspeaker reproduction
- •11.8.3 Head movement and stability of virtual sources in transaural reproduction
- •11.8.4 Timbre coloration and equalization in transaural reproduction
- •11.9 Virtual reproduction of stereophonic and multichannel surround sound
- •11.9.1 Binaural reproduction of stereophonic and multichannel sound through headphones
- •11.9.2 Stereophonic expansion and enhancement
- •11.9.3 Virtual reproduction of multichannel sound through loudspeakers
- •11.10.1 Binaural room modeling
- •11.10.2 Dynamic virtual auditory environments system
- •11.11 Summary
- •12.1 Physical analysis of binaural pressures in summing virtual source and auditory events
- •12.1.1 Evaluation of binaural pressures and localization cues
- •12.1.2 Method for summing localization analysis
- •12.1.3 Binaural pressure analysis of stereophonic and multichannel sound with amplitude panning
- •12.1.4 Analysis of summing localization with interchannel time difference
- •12.1.5 Analysis of summing localization at the off-central listening position
- •12.1.6 Analysis of interchannel correlation and spatial auditory sensations
- •12.2 Binaural auditory models and analysis of spatial sound reproduction
- •12.2.1 Analysis of lateral localization by using auditory models
- •12.2.2 Analysis of front-back and vertical localization by using a binaural auditory model
- •12.2.3 Binaural loudness models and analysis of the timbre of spatial sound reproduction
- •12.3 Binaural measurement system for assessing spatial sound reproduction
- •12.4 Summary
- •13.1 Analog audio storage and transmission
- •13.1.1 45°/45° Disk recording system
- •13.1.2 Analog magnetic tape audio recorder
- •13.1.3 Analog stereo broadcasting
- •13.2 Basic concepts of digital audio storage and transmission
- •13.3 Quantization noise and shaping
- •13.3.1 Signal-to-quantization noise ratio
- •13.3.2 Quantization noise shaping and 1-Bit DSD coding
- •13.4 Basic principle of digital audio compression and coding
- •13.4.1 Outline of digital audio compression and coding
- •13.4.2 Adaptive differential pulse-code modulation
- •13.4.3 Perceptual audio coding in the time-frequency domain
- •13.4.4 Vector quantization
- •13.4.5 Spatial audio coding
- •13.4.6 Spectral band replication
- •13.4.7 Entropy coding
- •13.4.8 Object-based audio coding
- •13.5 MPEG series of audio coding techniques and standards
- •13.5.1 MPEG-1 audio coding technique
- •13.5.2 MPEG-2 BC audio coding
- •13.5.3 MPEG-2 advanced audio coding
- •13.5.4 MPEG-4 audio coding
- •13.5.5 MPEG parametric coding of multichannel sound and unified speech and audio coding
- •13.5.6 MPEG-H 3D audio
- •13.6 Dolby series of coding techniques
- •13.6.1 Dolby digital coding technique
- •13.6.2 Some advanced Dolby coding techniques
- •13.7 DTS series of coding technique
- •13.8 MLP lossless coding technique
- •13.9 ATRAC technique
- •13.10 Audio video coding standard
- •13.11 Optical disks for audio storage
- •13.11.1 Structure, principle, and classification of optical disks
- •13.11.2 CD family and its audio formats
- •13.11.3 DVD family and its audio formats
- •13.11.4 SACD and its audio formats
- •13.11.5 BD and its audio formats
- •13.12 Digital radio and television broadcasting
- •13.12.1 Outline of digital radio and television broadcasting
- •13.12.2 Eureka-147 digital audio broadcasting
- •13.12.3 Digital radio mondiale
- •13.12.4 In-band on-channel digital audio broadcasting
- •13.12.5 Audio for digital television
- •13.13 Audio storage and transmission by personal computer
- •13.14 Summary
- •14.1 Outline of acoustic conditions and requirements for spatial sound intended for domestic reproduction
- •14.2 Acoustic consideration and design of listening rooms
- •14.3 Arrangement and characteristics of loudspeakers
- •14.3.1 Arrangement of the main loudspeakers in listening rooms
- •14.3.2 Characteristics of the main loudspeakers
- •14.3.3 Bass management and arrangement of subwoofers
- •14.4 Signal and listening level alignment
- •14.5 Standards and guidance for conditions of spatial sound reproduction
- •14.6 Headphones and binaural monitors of spatial sound reproduction
- •14.7 Acoustic conditions for cinema sound reproduction and monitoring
- •14.8 Summary
- •15.1 Outline of psychoacoustic and subjective assessment experiments
- •15.2 Contents and attributes for spatial sound assessment
- •15.3 Auditory comparison and discrimination experiment
- •15.3.1 Paradigms of auditory comparison and discrimination experiment
- •15.3.2 Examples of auditory comparison and discrimination experiment
- •15.4 Subjective assessment of small impairments in spatial sound systems
- •15.5 Subjective assessment of a spatial sound system with intermediate quality
- •15.6 Virtual source localization experiment
- •15.6.1 Basic methods for virtual source localization experiments
- •15.6.2 Preliminary analysis of the results of virtual source localization experiments
- •15.6.3 Some results of virtual source localization experiments
- •15.7 Summary
- •16.1.1 Application to commercial cinema and related problems
- •16.1.2 Applications to domestic reproduction and related problems
- •16.1.3 Applications to automobile audio
- •16.2.1 Applications to virtual reality
- •16.2.2 Applications to communication and information systems
- •16.2.3 Applications to multimedia
- •16.2.4 Applications to mobile and handheld devices
- •16.3 Applications to the scientific experiments of spatial hearing and psychoacoustics
- •16.4 Applications to sound field auralization
- •16.4.1 Auralization in room acoustics
- •16.4.2 Other applications of auralization technique
- •16.5 Applications to clinical medicine
- •16.6 Summary
- •References
- •Index
Two-channel stereophonic sound 83
Figure 2.5 Trade-off curves of the combined ICLD and ICTD by Williams (adapted from Williams, 2013).
2.1.4 Virtual source created by interchannel time difference
As stated in Section 1.7.1, for some signals with transient characteristics, the method of interchannel time difference (ICTD), or a combination of ICTD and ICLD, can be used to recreate a virtual source. The trade-off curves of the combined ICLD and ICTD can be used to analyze the direction of the summing virtual source. Figure 2.5 illustrates an example of these curves, which were derived by Williams (1987) through interpolation of the original data obtained by Simonsen (1984) and are called the Williams curves. The original data were measured in a localization experiment using speech and maracas stimuli. The trade-off curves for the standard loudspeaker configuration with 2θ0 = 60° and three target azimuths of θI = 10°, 20°, and 30° are shown in the figure.
The localization results of the combined ICLD and ICTD vary among studies. For example, Figure 1.26 indicates that the mean perceived virtual source azimuth θI measured by Simonsen for an ICLD alone is larger than those in the other two curves. As stated in Section 1.7.1, the perceived virtual source azimuth depends on the stimuli and other experimental conditions. The curves in Figure 2.5 are often used in practice because Simonsen's data were measured with natural rather than artificial stimuli. Because of these observed differences, the trade-off curves of the combined ICLD and ICTD have been remeasured with musical stimuli in some studies (Lee, 2010).
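For a purely illustrative calculation, a linear time-intensity trading model is sometimes assumed, in which the ICLD and ICTD contributions to the virtual source shift simply add. Both the linear form and the two sensitivity parameters below are hypothetical placeholders for this sketch, not values taken from the Williams curves; real trade-off behavior is nonlinear and stimulus dependent and must be read from measured data such as Figure 2.5.

```python
def combined_shift_deg(icld_db, ictd_ms, shift_per_db=2.0, shift_per_ms=13.0):
    """Illustrative linear time-intensity trading model: approximate the
    virtual source shift as the sum of an ICLD contribution and an ICTD
    contribution. The two sensitivity parameters (degrees per dB and
    degrees per ms) are hypothetical placeholders, not measured values."""
    return shift_per_db * icld_db + shift_per_ms * ictd_ms

# Under this toy model, an ICLD of 5 dB and an opposing ICTD of -0.4 ms
# partially cancel each other:
print(round(combined_shift_deg(5.0, -0.4), 3))  # 4.8
```

The point of the sketch is only the additive trade-off structure; any quantitative use requires replacing the placeholder slopes with interpolated experimental data.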
ICTD-based summing localization with two or more loudspeakers is a psychoacoustic phenomenon, but physical interpretations or models for this phenomenon have not yet been developed. This situation differs from that of ICLD-based summing localization. Similar to the case of the precedence effect, a neurophysiological experiment on cats demonstrated that the responses of inferior colliculus neurons caused by ICTD signals match those caused by a target source at the summing localization direction (Yin, 1994). Therefore, ICTD-based summing localization may be interpreted at the level of the neurophysiology of hearing.
2.1.5 Limitation of two-channel stereophonic sound
The spatial information of sound includes the localization information of sound sources and the comprehensive spatial information of environmental reflections. In two-channel stereophonic sound, this spatial information is represented by the relative relationship between the two channel or loudspeaker signals in various manners, resulting in various subjective perceptions or sensations in reproduction.
The directional localization information of the target source can be represented by the ICLD between the two channel signals. This representation is termed amplitude stereophonic sound, level-difference stereophonic sound, or intensity stereophonic sound. The theory of amplitude stereophonic sound is relatively mature. As stated in the previous sections, at low frequencies with f ≤ 0.7 kHz, the interaural phase delay difference ITDp created by in-phase loudspeaker signals with ICLD only matches that of the target source. Within the frequency range of 0.7–1.5 kHz, the ITDp created by ICLD is qualitatively consistent with, but quantitatively deviates from, that of the target source, resulting in a frequency-dependent perceived source direction. At high frequencies (above 1.5 kHz), ICLD may result in interaural localization cues (such as ILD) that are quantitatively inconsistent with those of the target source; however, it does not lead to conflicting interaural localization cues. For wideband stimuli with low-frequency components below 1.5 kHz, ITDp dominates azimuthal localization. Therefore, ICLD yields an appropriate localization perception of a virtual source.
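The low-frequency behavior described above is commonly summarized by the stereophonic summing localization (sine) law of Section 2.1.1. A minimal numerical sketch, assuming the standard span angle 2θ0 = 60° and a sign convention (positive azimuth toward the left loudspeaker) chosen for this example:

```python
import math

def virtual_source_azimuth(A_L, A_R, theta_0_deg=30.0):
    """Estimate the low-frequency virtual source azimuth from the
    stereophonic sine law:
        sin(theta_I) = (A_L - A_R) / (A_L + A_R) * sin(theta_0),
    valid for in-phase loudspeaker signals at roughly f <= 0.7 kHz.
    Positive azimuth is taken toward the left loudspeaker (an assumed
    convention for this sketch)."""
    s = (A_L - A_R) / (A_L + A_R) * math.sin(math.radians(theta_0_deg))
    return math.degrees(math.asin(s))

# Equal amplitudes place the virtual source at the center:
print(round(virtual_source_azimuth(1.0, 1.0), 3))  # 0.0
# Only the left loudspeaker active places it at the loudspeaker itself:
print(round(virtual_source_azimuth(1.0, 0.0), 3))  # 30.0
```

Intermediate amplitude ratios move the virtual source continuously between these two extremes, which is the basis of amplitude panning.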
Summing localization with two loudspeakers can be further analyzed in terms of the reproduced sound field. Figure 2.6 illustrates the wavefront amplitude of the superposed sound pressures created by two stereophonic loudspeakers (approximated as point sources) with identical signal amplitudes AL = AR. The distance between each loudspeaker and the origin of the coordinates is r0 = 2.5 m, and the span angle between the two loudspeakers is 2θ0 = 60°. Figures 2.6(a) and 2.6(b) present the results for harmonic waves at f = 0.5 kHz and 1.5 kHz, respectively. In the regions adjacent to either loudspeaker, the wavefront is dominated by the spherical wave generated by that loudspeaker. Within a small region adjacent to the central line (bounded by the two dashed lines in the figures), the superposed wavefront approximates that of a spherical wave incident from the frontal direction θI = 0°. At far-field distances, the superposed wavefront further approximates that of a plane wave incident from the frontal direction. However, outside the region adjacent to the central line, the superposed wavefront is no longer a plane or spherical wave. As frequency increases, the width of the region within which the plane or spherical wavefront is reconstructed narrows. As the receiver position moves toward the back, the span angle between the two loudspeakers with respect to the receiver
Figure 2.6 Wavefront amplitude of the superposed sound pressures in stereophonic reproduction with identical signal amplitudes: (a) f = 0.5 kHz; (b) f = 1.5 kHz.
position decreases, and the width of the region within which the plane or spherical wavefront is reconstructed broadens. With an appropriate span angle between the two loudspeakers and within the low-frequency range of f ≤ 0.7 kHz, amplitude stereophonic sound can reconstruct the target plane or spherical wavefront in a region whose width matches the size of the head (Makita, 1962; Bennett et al., 1985). Therefore, two-channel amplitude stereophonic sound is a typical example of spatial sound based on sound field approximation and psychoacoustics. Overall, two-channel amplitude stereophonic sound can recreate a relatively authentic and natural virtual source between two loudspeakers.
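The superposed field illustrated in Figure 2.6 can be recomputed from first principles by summing two spherical-wave terms. A sketch under the geometry stated above (r0 = 2.5 m, 2θ0 = 60°); the coordinate convention (y axis toward the front, x toward the left) and the time factor exp(jωt) are assumptions of this example:

```python
import cmath
import math

def superposed_pressure(x, y, f, A_L=1.0, A_R=1.0,
                        r0=2.5, theta0_deg=30.0, c=343.0):
    """Complex pressure at (x, y) from two in-phase point sources placed
    symmetrically at azimuths +/- theta0 and distance r0 from the origin.
    Each source contributes A * exp(-j k r) / r, i.e., a spherical wave
    (time factor exp(j w t) assumed)."""
    k = 2.0 * math.pi * f / c
    th = math.radians(theta0_deg)
    xl, yl = r0 * math.sin(th), r0 * math.cos(th)    # left loudspeaker
    xr, yr = -r0 * math.sin(th), r0 * math.cos(th)   # right loudspeaker
    rl = math.hypot(x - xl, y - yl)
    rr = math.hypot(x - xr, y - yr)
    return (A_L * cmath.exp(-1j * k * rl) / rl
            + A_R * cmath.exp(-1j * k * rr) / rr)

# On the central line (x = 0) the two path lengths are equal, so the two
# spherical waves add in phase; at the origin each path is r0 = 2.5 m:
print(round(abs(superposed_pressure(0.0, 0.0, 500.0)), 6))  # 0.8
```

Evaluating this function over a grid of (x, y) points and plotting the magnitude reproduces the kind of wavefront maps shown in Figure 2.6.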
A frequency-independent interchannel phase difference gives rise to conflicting interaural localization cues and consequently degrades the perceived quality of a virtual source or even prevents localization. Two-channel out-of-phase signals may be used to recreate an outside-boundary virtual source and thus broaden the frontal stereophonic stage. However, the resultant virtual source position is unstable as frequency varies, and it is unstable even when the listening position changes slightly (Section 2.4.2). Moreover, in some cases, two-channel out-of-phase signals may create an unnatural auditory event with an uncertain position.
For some signals with transient characteristics, the method of ICTD, or a combination of ICLD and ICTD, can be used to recreate a virtual source. This method has been applied in the design of some microphone techniques for two-channel stereophonic recording. The ICTD-based method is termed time (difference) stereophonic sound. The combined method of ICLD and ICTD is termed combined amplitude and time stereophonic sound, or intensity and time difference stereophonic sound. As stated in Section 2.1.4, physical models for ICTD-based or combined ICLD–ICTD-based summing localization are unavailable; this situation differs from that of ICLD-based summing localization. Therefore, time stereophonic sound and combined amplitude and time stereophonic sound are usually designed on the basis of psychoacoustic experimental results, such as the Williams curves shown in Figure 2.5 (Williams, 1987; Wittek and Theile, 2002). Generally, the perceived quality of a virtual source created via the ICTD-based method is inferior to that created via the ICLD-based method. For practical (wideband) stimuli, the ICTD-based virtual source is blurry, with less naturalness and authenticity. The perceived direction of the virtual source also depends on the spectra and transient characteristics of the stimuli.
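In signal simulation, ICTD panning amounts to feeding the same source signal to both channels with a small interchannel delay. The sketch below applies the delay at the sample level; the 48 kHz sample rate, the rounding to whole samples, and the zero-padding scheme are simplifying assumptions of this example:

```python
def ictd_pan(signal, ictd_seconds, fs=48000):
    """Create a stereo pair from a mono signal by delaying one channel.
    A positive ICTD delays the right channel, so the left channel leads
    and summing localization pulls the virtual source toward the left
    loudspeaker. Delays are rounded to whole samples (an assumed
    simplification; fractional-delay filters are needed for fine control)."""
    d = round(abs(ictd_seconds) * fs)      # delay in whole samples
    delayed = [0.0] * d + list(signal)     # delayed channel, zeros prepended
    padded = list(signal) + [0.0] * d      # leading channel, padded to match
    if ictd_seconds >= 0:
        return padded, delayed             # (left, right)
    return delayed, padded

# A 0.5 ms ICTD corresponds to 24 samples at 48 kHz:
left, right = ictd_pan([1.0, 0.5, 0.25], 0.0005)
print(len(left), len(right))  # 27 27
print(right[:2])              # [0.0, 0.0]
```

The perceptually useful ICTD range for summing localization is small (well below the echo threshold); large delays instead trigger the precedence effect discussed in Section 1.7.2.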
Overall, for any loudspeaker signal method, two-channel stereophonic sound is unable to recreate the spatial information over the full horizontal plane, to say nothing of a fully three-dimensional space. Generally, two-channel stereophonic sound can recreate spatial information within the frontal-horizontal sector bounded by the two loudspeakers. Even when the case of an outside-boundary virtual source is considered, two-channel stereophonic sound is theoretically able to recreate spatial information extending at most to the frontal-horizontal quadrants, and the outside-boundary virtual source is usually unstable. These factors are the limitations of two-channel stereophonic sound. For many practical applications, such as music reproduction or television sound, in which a listener's attention is focused on the frontal direction, two-channel stereophonic sound may meet the requirements to some extent.
The analysis in this section focuses on the methods for representing or encoding the directional information of target virtual sources in two-channel stereophonic signals. The spatial position of an actual sound source or virtual source is specified by its direction and distance. Although auditory distance perception is biased, the relative perceived distance of auditory events in spatial sound reproduction may be controlled by using appropriate signal simulation and microphone techniques. The various possible cues for auditory distance perception discussed in Section 1.6.6 may be used to control auditory distance perception in reproduction. However, altering the ratio of direct to reflected sound energy in the signals is the major means of controlling the perceived distance in two-channel stereophonic and multichannel sound reproduction. Increasing the relative proportion of reflected sound creates a more distant auditory event or perception.
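The direct-to-reverberant energy ratio control described above can be sketched as a simple mixing operation. This is a minimal sketch; the function name is hypothetical, and in practice the reverberant component would come from a reverberation algorithm such as those of Section 7.5 rather than being supplied directly:

```python
import math

def mix_direct_reverb(direct, reverb, drr_db):
    """Mix a direct-sound signal with a reverberant signal so that their
    energy ratio equals the given direct-to-reverberant ratio (DRR) in dB.
    Lowering drr_db increases the reverberant proportion, which tends to
    create a more distant auditory event."""
    e_d = sum(x * x for x in direct)   # direct-sound energy
    e_r = sum(x * x for x in reverb)   # reverberant energy before scaling
    # Choose the reverb gain g so that e_d / (g^2 * e_r) = 10^(drr_db / 10):
    g = math.sqrt(e_d / (e_r * 10.0 ** (drr_db / 10.0)))
    return [d + g * r for d, r in zip(direct, reverb)]

# Toy signals with unit energy each; a DRR of 0 dB mixes them at
# equal energy (reverb gain 1):
direct = [1.0, 0.0, 0.0, 0.0]
reverb = [0.5, 0.5, 0.5, 0.5]
print(mix_direct_reverb(direct, reverb, 0.0))  # [1.5, 0.5, 0.5, 0.5]
```

Repeating the mix with, say, drr_db = -6 rather than 0 raises the reverb gain and, per the discussion above, should push the auditory event farther away.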
As stated in Section 1.8, early lateral reflections and late diffuse reverberation are important for the sensations of auditory source width and listener envelopment in a concert hall. Owing to its limited ability, two-channel stereophonic sound cannot recreate the spatial information of these reflections exactly. Appropriate microphone and signal simulation techniques improve the perceived performance of reflected sound in stereophonic reproduction to some extent, and some psychoacoustic methods are available for recreating sensations similar to those caused by the reflections in a hall. For example, the perceived virtual source width can be controlled by introducing a small interchannel phase difference between the two channel signals. As stated in Section 1.7.3, the auditory event broadens and becomes blurred as the positive interchannel correlation is reduced.
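The degree of interchannel correlation underlying these broadening effects is usually quantified by a normalized correlation coefficient. The zero-lag form below is one common choice, adopted here as a simplifying assumption (some definitions instead take the maximum over a range of interchannel lags):

```python
import math

def interchannel_correlation(left, right):
    """Zero-lag normalized correlation coefficient between two channel
    signals: +1 for identical signals, -1 for out-of-phase signals, and
    near 0 for decorrelated signals, which broaden and blur the
    auditory event."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den

x = [1.0, -0.5, 0.25, -0.125]
print(interchannel_correlation(x, x))                # 1.0
print(interchannel_correlation(x, [-v for v in x]))  # -1.0
```

Decorrelation algorithms for reproduction (Section 7.5.4) aim to push this coefficient toward zero while leaving the spectra of the two channels nearly unchanged.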
The aforementioned methods can be used to represent the spatial information in two-channel stereophonic signals and thus recreate various target auditory perceptions or sensations in reproduction. Some methods for two-channel stereophonic sound are not based on strict acoustic theory; instead, they are based on psychoacoustic experimental results, relevant experience, and the requirements of practical uses. This is also the case for the two-channel stereophonic recording techniques discussed in the next section and for the multichannel surround sound discussed in the succeeding chapters, and it is a feature of various spatial sound techniques based on sound field approximation and psychoacoustic principles.
2.2 MICROPHONE AND SIGNAL SIMULATION TECHNIQUES FOR TWO-CHANNEL STEREOPHONIC SOUND
Two-channel stereophonic sound is popular in consumer use. It is usually applied to reproduce music (including classical and pop music), speech, and other program materials. However, the ability of two-channel stereophonic sound to transmit and reproduce the spatial information of a sound field is limited. The key is how this limited ability can be used properly to transmit and reproduce, as far as possible, the desired information essential for auditory perception, including the localization information of direct sound and the comprehensive information of reflections.
As the first stage in the chain of a two-channel stereophonic system, signal recording (pickup) involves using appropriate microphone techniques to capture the spatial information of an on-site sound field. Signal simulation or synthesis is a process in which appropriate signal processing techniques are used to create the desired spatial information of sound artificially. In accordance with the basic principles of stereophonic sound discussed in Section 2.1, the spatial information of sound is encoded into the two-channel stereophonic signals. Various techniques for stereophonic signal recording and simulation have been developed; they can be roughly classified into four categories.
The first category is the coincident microphone technique, which was developed on the basis of Blumlein's patent in the 1930s (Blumlein, 1931). A pair of spatially coincident microphones with appropriate directivities is used to capture the stereophonic signals. The directivities of the microphone pair encode the directional information of a source into the two channel signals with a direction-dependent ICLD only. The coincident microphone technique can be further divided into two subcategories, i.e., the XY and mid-side (MS) microphone pair techniques.
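The direction-dependent ICLD produced by a coincident pair can be sketched directly from the microphone directivities. The example below assumes a pair of figure-of-eight microphones with main axes at ±45° (the Blumlein configuration); the sign convention (positive azimuth toward the left microphone axis) is an arbitrary choice for this sketch:

```python
import math

def coincident_pair_icld(source_az_deg, mic_axis_deg=45.0):
    """ICLD (in dB) produced by a coincident pair of figure-of-eight
    microphones whose main axes point at +/- mic_axis_deg (a Blumlein
    pair when mic_axis_deg = 45). The figure-of-eight pickup gain is
    cos(angle between the source direction and the microphone axis)."""
    th = math.radians(source_az_deg)
    ax = math.radians(mic_axis_deg)
    g_left = math.cos(th - ax)    # gain of the left-pointing microphone
    g_right = math.cos(th + ax)   # gain of the right-pointing microphone
    return 20.0 * math.log10(abs(g_left) / abs(g_right))

# A frontal source (0 deg) yields no ICLD; a source toward the left
# yields a positive ICLD under this sign convention:
print(coincident_pair_icld(0.0), coincident_pair_icld(10.0) > 0.0)
```

Feeding the resulting ICLD into the summing localization relations of Section 2.1 then predicts the reproduced virtual source direction, which is how coincident techniques are analyzed.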
