
- Preface
- Introduction
- 1.1 Spatial coordinate systems
- 1.2 Sound fields and their physical characteristics
- 1.2.1 Free-field and sound waves generated by simple sound sources
- 1.2.2 Reflections from boundaries
- 1.2.3 Directivity of sound source radiation
- 1.2.4 Statistical analysis of acoustics in an enclosed space
- 1.2.5 Principle of sound receivers
- 1.3 Auditory system and perception
- 1.3.1 Auditory system and its functions
- 1.3.2 Hearing threshold and loudness
- 1.3.3 Masking
- 1.3.4 Critical band and auditory filter
- 1.4 Artificial head models and binaural signals
- 1.4.1 Artificial head models
- 1.4.2 Binaural signals and head-related transfer functions
- 1.5 Outline of spatial hearing
- 1.6 Localization cues for a single sound source
- 1.6.1 Interaural time difference
- 1.6.2 Interaural level difference
- 1.6.3 Cone of confusion and head movement
- 1.6.4 Spectral cues
- 1.6.5 Discussion on directional localization cues
- 1.6.6 Auditory distance perception
- 1.7 Summing localization and spatial hearing with multiple sources
- 1.7.1 Summing localization with two sound sources
- 1.7.2 The precedence effect
- 1.7.3 Spatial auditory perceptions with partially correlated and uncorrelated source signals
- 1.7.4 Auditory scene analysis and spatial hearing
- 1.7.5 Cocktail party effect
- 1.8 Room reflections and auditory spatial impression
- 1.8.1 Auditory spatial impression
- 1.8.2 Sound field-related measures and auditory spatial impression
- 1.8.3 Binaural-related measures and auditory spatial impression
- 1.9.1 Basic principle of spatial sound
- 1.9.2 Classification of spatial sound
- 1.9.3 Developments and applications of spatial sound
- 1.10 Summary
- 2.1 Basic principle of a two-channel stereophonic sound
- 2.1.1 Interchannel level difference and summing localization equation
- 2.1.2 Effect of frequency
- 2.1.3 Effect of interchannel phase difference
- 2.1.4 Virtual source created by interchannel time difference
- 2.1.5 Limitation of two-channel stereophonic sound
- 2.2.1 XY microphone pair
- 2.2.2 MS transformation and the MS microphone pair
- 2.2.3 Spaced microphone technique
- 2.2.4 Near-coincident microphone technique
- 2.2.5 Spot microphone and pan-pot technique
- 2.2.6 Discussion on microphone and signal simulation techniques for two-channel stereophonic sound
- 2.3 Upmixing and downmixing between two-channel stereophonic and mono signals
- 2.4 Two-channel stereophonic reproduction
- 2.4.1 Standard loudspeaker configuration of two-channel stereophonic sound
- 2.4.2 Influence of front-back deviation of the head
- 2.5 Summary
- 3.1 Physical and psychoacoustic principles of multichannel surround sound
- 3.2 Summing localization in multichannel horizontal surround sound
- 3.2.1 Summing localization equations for multiple horizontal loudspeakers
- 3.2.2 Analysis of the velocity and energy localization vectors of the superposed sound field
- 3.2.3 Discussion on horizontal summing localization equations
- 3.3 Multiple loudspeakers with partly correlated and low-correlated signals
- 3.4 Summary
- 4.1 Discrete quadraphone
- 4.1.1 Outline of the quadraphone
- 4.1.2 Discrete quadraphone with pair-wise amplitude panning
- 4.1.3 Discrete quadraphone with the first-order sound field signal mixing
- 4.1.4 Some discussions on discrete quadraphones
- 4.2 Other horizontal surround sounds with regular loudspeaker configurations
- 4.2.1 Six-channel reproduction with pair-wise amplitude panning
- 4.2.2 The first-order sound field signal mixing and reproduction with M ≥ 3 loudspeakers
- 4.3 Transformation of horizontal sound field signals and Ambisonics
- 4.3.1 Transformation of the first-order horizontal sound field signals
- 4.3.2 The first-order horizontal Ambisonics
- 4.3.3 The higher-order horizontal Ambisonics
- 4.3.4 Discussion and implementation of the horizontal Ambisonics
- 4.4 Summary
- 5.1 Outline of surround sounds with accompanying picture and general uses
- 5.2 5.1-Channel surround sound and its signal mixing analysis
- 5.2.1 Outline of 5.1-channel surround sound
- 5.2.2 Pair-wise amplitude panning for 5.1-channel surround sound
- 5.2.3 Global Ambisonic-like signal mixing for 5.1-channel sound
- 5.2.4 Optimization of three frontal loudspeaker signals and local Ambisonic-like signal mixing
- 5.2.5 Time panning for 5.1-channel surround sound
- 5.3 Other multichannel horizontal surround sounds
- 5.4 Low-frequency effect channel
- 5.5 Summary
- 6.1 Summing localization in multichannel spatial surround sound
- 6.1.1 Summing localization equations for spatial multiple loudspeaker configurations
- 6.1.2 Velocity and energy localization vector analysis for multichannel spatial surround sound
- 6.1.3 Discussion on spatial summing localization equations
- 6.1.4 Relationship with the horizontal summing localization equations
- 6.2 Signal mixing methods for a pair of vertical loudspeakers in the median and sagittal plane
- 6.3 Vector base amplitude panning
- 6.4 Spatial Ambisonic signal mixing and reproduction
- 6.4.1 Principle of spatial Ambisonics
- 6.4.2 Some examples of the first-order spatial Ambisonics
- 6.4.4 Recreating a top virtual source with a horizontal loudspeaker arrangement and Ambisonic signal mixing
- 6.5 Advanced multichannel spatial surround sounds and problems
- 6.5.1 Some advanced multichannel spatial surround sound techniques and systems
- 6.5.2 Object-based spatial sound
- 6.5.3 Some problems related to multichannel spatial surround sound
- 6.6 Summary
- 7.1 Basic considerations on the microphone and signal simulation techniques for multichannel sounds
- 7.2 Microphone techniques for 5.1-channel sound recording
- 7.2.1 Outline of microphone techniques for 5.1-channel sound recording
- 7.2.2 Main microphone techniques for 5.1-channel sound recording
- 7.2.3 Microphone techniques for the recording of three frontal channels
- 7.2.4 Microphone techniques for ambience recording and combination with frontal localization information recording
- 7.2.5 Stereophonic plus center channel recording
- 7.3 Microphone techniques for other multichannel sounds
- 7.3.1 Microphone techniques for other discrete multichannel sounds
- 7.3.2 Microphone techniques for Ambisonic recording
- 7.4 Simulation of localization signals for multichannel sounds
- 7.4.1 Methods of the simulation of directional localization signals
- 7.4.2 Simulation of virtual source distance and extension
- 7.4.3 Simulation of a moving virtual source
- 7.5 Simulation of reflections for stereophonic and multichannel sounds
- 7.5.1 Delay algorithms and discrete reflection simulation
- 7.5.2 IIR filter algorithm of late reverberation
- 7.5.3 FIR, hybrid FIR, and recursive filter algorithms of late reverberation
- 7.5.4 Algorithms of audio signal decorrelation
- 7.5.5 Simulation of room reflections based on physical measurement and calculation
- 7.6 Directional audio coding and multichannel sound signal synthesis
- 7.7 Summary
- 8.1 Matrix surround sound
- 8.1.1 Matrix quadraphone
- 8.1.2 Dolby Surround system
- 8.1.3 Dolby Pro-Logic decoding technique
- 8.1.4 Some developments on matrix surround sound and logic decoding techniques
- 8.2 Downmixing of multichannel sound signals
- 8.3 Upmixing of multichannel sound signals
- 8.3.1 Some considerations in upmixing
- 8.3.2 Simple upmixing methods for front-channel signals
- 8.3.3 Simple methods for ambient component separation
- 8.3.4 Model and statistical characteristics of two-channel stereophonic signals
- 8.3.5 A scale-signal-based algorithm for upmixing
- 8.3.6 Upmixing algorithm based on principal component analysis
- 8.3.7 Algorithm based on the least mean square error for upmixing
- 8.3.8 Adaptive normalized algorithm based on the least mean square for upmixing
- 8.3.9 Some advanced upmixing algorithms
- 8.4 Summary
- 9.1 Each order approximation of ideal reproduction and Ambisonics
- 9.1.1 Each order approximation of ideal horizontal reproduction
- 9.1.2 Each order approximation of ideal three-dimensional reproduction
- 9.2 General formulation of multichannel sound field reconstruction
- 9.2.1 General formulation of multichannel sound field reconstruction in the spatial domain
- 9.2.2 Formulation of spatial-spectral domain analysis of circular secondary source array
- 9.2.3 Formulation of spatial-spectral domain analysis for a secondary source array on spherical surface
- 9.3 Spatial-spectral domain analysis and driving signals of Ambisonics
- 9.3.1 Reconstructed sound field of horizontal Ambisonics
- 9.3.2 Reconstructed sound field of spatial Ambisonics
- 9.3.3 Mixed-order Ambisonics
- 9.3.4 Near-field compensated higher-order Ambisonics
- 9.3.5 Ambisonic encoding of complex source information
- 9.3.6 Some special applications of spatial-spectral domain analysis of Ambisonics
- 9.4 Some problems related to Ambisonics
- 9.4.1 Secondary source array and stability of Ambisonics
- 9.4.2 Spatial transformation of Ambisonic sound field
- 9.5 Error analysis of Ambisonic-reconstructed sound field
- 9.5.1 Integral error of Ambisonic-reconstructed wavefront
- 9.5.2 Discrete secondary source array and spatial-spectral aliasing error in Ambisonics
- 9.6 Multichannel reconstructed sound field analysis in the spatial domain
- 9.6.1 Basic method for analysis in the spatial domain
- 9.6.2 Minimizing error in reconstructed sound field and summing localization equation
- 9.6.3 Multiple receiver position matching method and its relation to the mode-matching method
- 9.7 Listening room reflection compensation in multichannel sound reproduction
- 9.8 Microphone array for multichannel sound field signal recording
- 9.8.1 Circular microphone array for horizontal Ambisonic recording
- 9.8.2 Spherical microphone array for spatial Ambisonic recording
- 9.8.3 Discussion on microphone array recording
- 9.9 Summary
- 10.1 Basic principle and implementation of wave field synthesis
- 10.1.1 Kirchhoff–Helmholtz boundary integral and WFS
- 10.1.2 Simplification of the types of secondary sources
- 10.1.3 WFS in a horizontal plane with a linear array of secondary sources
- 10.1.4 Finite secondary source array and effect of spatial truncation
- 10.1.5 Discrete secondary source array and spatial aliasing
- 10.1.6 Some issues and related problems on WFS implementation
- 10.2 General theory of WFS
- 10.2.1 Green's function of Helmholtz equation
- 10.2.2 General theory of three-dimensional WFS
- 10.2.3 General theory of two-dimensional WFS
- 10.2.4 Focused source in WFS
- 10.3 Analysis of WFS in the spatial-spectral domain
- 10.3.1 General formulation and analysis of WFS in the spatial-spectral domain
- 10.3.2 Analysis of the spatial aliasing in WFS
- 10.3.3 Spatial-spectral division method of WFS
- 10.4 Further discussion on sound field reconstruction
- 10.4.1 Comparison among various methods of sound field reconstruction
- 10.4.2 Further analysis of the relationship between acoustical holography and sound field reconstruction
- 10.4.3 Further analysis of the relationship between acoustical holography and Ambisonics
- 10.4.4 Comparison between WFS and Ambisonics
- 10.5 Equalization of WFS under nonideal conditions
- 10.6 Summary
- 11.1 Basic principles of binaural reproduction and virtual auditory display
- 11.1.1 Binaural recording and reproduction
- 11.1.2 Virtual auditory display
- 11.2 Acquisition of HRTFs
- 11.2.1 HRTF measurement
- 11.2.2 HRTF calculation
- 11.2.3 HRTF customization
- 11.3 Basic physical features of HRTFs
- 11.3.1 Time-domain features of far-field HRIRs
- 11.3.2 Frequency domain features of far-field HRTFs
- 11.3.3 Features of near-field HRTFs
- 11.4 HRTF-based filters for binaural synthesis
- 11.5 Spatial interpolation and decomposition of HRTFs
- 11.5.1 Directional interpolation of HRTFs
- 11.5.2 Spatial basis function decomposition and spatial sampling theorem of HRTFs
- 11.5.3 HRTF spatial interpolation and signal mixing for multichannel sound
- 11.5.4 Spectral shape basis function decomposition of HRTFs
- 11.6 Simplification of signal processing for binaural synthesis
- 11.6.1 Virtual loudspeaker-based algorithms
- 11.6.2 Basis function decomposition-based algorithms
- 11.7.1 Principle of headphone equalization
- 11.7.2 Some problems with binaural reproduction and VAD
- 11.8 Binaural reproduction through loudspeakers
- 11.8.1 Basic principle of binaural reproduction through loudspeakers
- 11.8.2 Virtual source distribution in two-front loudspeaker reproduction
- 11.8.3 Head movement and stability of virtual sources in transaural reproduction
- 11.8.4 Timbre coloration and equalization in transaural reproduction
- 11.9 Virtual reproduction of stereophonic and multichannel surround sound
- 11.9.1 Binaural reproduction of stereophonic and multichannel sound through headphones
- 11.9.2 Stereophonic expansion and enhancement
- 11.9.3 Virtual reproduction of multichannel sound through loudspeakers
- 11.10.1 Binaural room modeling
- 11.10.2 Dynamic virtual auditory environments system
- 11.11 Summary
- 12.1 Physical analysis of binaural pressures in summing virtual source and auditory events
- 12.1.1 Evaluation of binaural pressures and localization cues
- 12.1.2 Method for summing localization analysis
- 12.1.3 Binaural pressure analysis of stereophonic and multichannel sound with amplitude panning
- 12.1.4 Analysis of summing localization with interchannel time difference
- 12.1.5 Analysis of summing localization at the off-central listening position
- 12.1.6 Analysis of interchannel correlation and spatial auditory sensations
- 12.2 Binaural auditory models and analysis of spatial sound reproduction
- 12.2.1 Analysis of lateral localization by using auditory models
- 12.2.2 Analysis of front-back and vertical localization by using a binaural auditory model
- 12.2.3 Binaural loudness models and analysis of the timbre of spatial sound reproduction
- 12.3 Binaural measurement system for assessing spatial sound reproduction
- 12.4 Summary
- 13.1 Analog audio storage and transmission
- 13.1.1 45°/45° disk recording system
- 13.1.2 Analog magnetic tape audio recorder
- 13.1.3 Analog stereo broadcasting
- 13.2 Basic concepts of digital audio storage and transmission
- 13.3 Quantization noise and shaping
- 13.3.1 Signal-to-quantization noise ratio
- 13.3.2 Quantization noise shaping and 1-bit DSD coding
- 13.4 Basic principle of digital audio compression and coding
- 13.4.1 Outline of digital audio compression and coding
- 13.4.2 Adaptive differential pulse-code modulation
- 13.4.3 Perceptual audio coding in the time-frequency domain
- 13.4.4 Vector quantization
- 13.4.5 Spatial audio coding
- 13.4.6 Spectral band replication
- 13.4.7 Entropy coding
- 13.4.8 Object-based audio coding
- 13.5 MPEG series of audio coding techniques and standards
- 13.5.1 MPEG-1 audio coding technique
- 13.5.2 MPEG-2 BC audio coding
- 13.5.3 MPEG-2 advanced audio coding
- 13.5.4 MPEG-4 audio coding
- 13.5.5 MPEG parametric coding of multichannel sound and unified speech and audio coding
- 13.5.6 MPEG-H 3D audio
- 13.6 Dolby series of coding techniques
- 13.6.1 Dolby digital coding technique
- 13.6.2 Some advanced Dolby coding techniques
- 13.7 DTS series of coding technique
- 13.8 MLP lossless coding technique
- 13.9 ATRAC technique
- 13.10 Audio video coding standard
- 13.11 Optical disks for audio storage
- 13.11.1 Structure, principle, and classification of optical disks
- 13.11.2 CD family and its audio formats
- 13.11.3 DVD family and its audio formats
- 13.11.4 SACD and its audio formats
- 13.11.5 BD and its audio formats
- 13.12 Digital radio and television broadcasting
- 13.12.1 Outline of digital radio and television broadcasting
- 13.12.2 Eureka-147 digital audio broadcasting
- 13.12.3 Digital radio mondiale
- 13.12.4 In-band on-channel digital audio broadcasting
- 13.12.5 Audio for digital television
- 13.13 Audio storage and transmission by personal computer
- 13.14 Summary
- 14.1 Outline of acoustic conditions and requirements for spatial sound intended for domestic reproduction
- 14.2 Acoustic consideration and design of listening rooms
- 14.3 Arrangement and characteristics of loudspeakers
- 14.3.1 Arrangement of the main loudspeakers in listening rooms
- 14.3.2 Characteristics of the main loudspeakers
- 14.3.3 Bass management and arrangement of subwoofers
- 14.4 Signal and listening level alignment
- 14.5 Standards and guidance for conditions of spatial sound reproduction
- 14.6 Headphones and binaural monitors of spatial sound reproduction
- 14.7 Acoustic conditions for cinema sound reproduction and monitoring
- 14.8 Summary
- 15.1 Outline of psychoacoustic and subjective assessment experiments
- 15.2 Contents and attributes for spatial sound assessment
- 15.3 Auditory comparison and discrimination experiment
- 15.3.1 Paradigms of auditory comparison and discrimination experiment
- 15.3.2 Examples of auditory comparison and discrimination experiment
- 15.4 Subjective assessment of small impairments in spatial sound systems
- 15.5 Subjective assessment of a spatial sound system with intermediate quality
- 15.6 Virtual source localization experiment
- 15.6.1 Basic methods for virtual source localization experiments
- 15.6.2 Preliminary analysis of the results of virtual source localization experiments
- 15.6.3 Some results of virtual source localization experiments
- 15.7 Summary
- 16.1.1 Application to commercial cinema and related problems
- 16.1.2 Applications to domestic reproduction and related problems
- 16.1.3 Applications to automobile audio
- 16.2.1 Applications to virtual reality
- 16.2.2 Applications to communication and information systems
- 16.2.3 Applications to multimedia
- 16.2.4 Applications to mobile and handheld devices
- 16.3 Applications to the scientific experiments of spatial hearing and psychoacoustics
- 16.4 Applications to sound field auralization
- 16.4.1 Auralization in room acoustics
- 16.4.2 Other applications of auralization technique
- 16.5 Applications to clinical medicine
- 16.6 Summary
- References
- Index

Chapter 2
Two-channel stereophonic sound
Two-channel stereophonic sound is the simplest and most common spatial sound technique and system. With a pair of frontal loudspeakers, the spatial information within a certain frontal-horizontal sector (one-dimensional space) can be recreated on the basis of the principles of sound field approximation and psychoacoustics. Two-channel stereophonic sound is considered a milestone in the development and application of spatial sound techniques and is still the most popular technique in use. This chapter is not intended to review the detailed history and development of two-channel stereophonic sound; for details, readers can refer to a previous study (Xie X.F., 1981). Instead, the basic principles and some issues related to the applications of two-channel stereophonic sound are presented to provide readers with sufficient background for the discussion of multichannel surround sound in succeeding chapters. In Section 2.1, the basic principle of recreating spatial information with two-channel stereophonic sound is addressed; the corresponding summing localization equations are derived, and some rules of the summing localization of a virtual source are discussed. In Section 2.2, methods for generating two-channel stereophonic signals are introduced, including various microphone recording and signal simulation techniques. In Section 2.3, the compatibility between stereophonic and mono reproduction and the problems of up/downmixing between mono and stereophonic signals are briefly discussed. In Section 2.4, some issues related to practical two-channel stereophonic reproduction, such as loudspeaker arrangement and the compensation for an off-central listening position, are addressed.
2.1 BASIC PRINCIPLE OF A TWO-CHANNEL STEREOPHONIC SOUND
2.1.1 Interchannel level difference and summing localization equation
The two-channel stereophonic sound is designed on the basis of the results of summing localization with two sound sources (loudspeakers) described in Section 1.7.1. In the late 1950s and early 1960s, the principle of two-channel stereophonic sound was analyzed by several researchers (Clark et al., 1957; Leakey, 1959; Bauer, 1961a; Makita, 1962; Mertens, 1965).
In Figure 2.1, a pair of loudspeakers is arranged symmetrically in front of the listener at azimuths θL = θ0 and θR = −θ0. The two loudspeaker signals in the frequency domain are EL and ER (in this book, stereophonic signals are denoted by L and R, and frequency-domain signals by E). Two identical loudspeaker signals with different amplitudes are written as
$$E_L = E_L(f) = A_L E_A(f), \qquad E_R = E_R(f) = A_R E_A(f), \tag{2.1.1}$$
Figure 2.1 Configuration of two-channel stereophonic loudspeakers.
where AL and AR are the normalized amplitudes, relative gains, or panning coefficients of the left and right loudspeaker signals, respectively. For in-phase loudspeaker signals with a level difference only, AL and AR are real and non-negative numbers. EA(f) represents the signal waveform in the frequency domain and determines the overall complex-valued pressure (including magnitude and phase) in reproduction. For harmonic or narrow-band signals, the perceived virtual source direction is independent of EA(f). Therefore, a unit EA(f) can be assumed in the analysis. In this case, AL and AR can also be regarded as normalized loudspeaker signals in the frequency domain. If necessary, the results derived from the normalized loudspeaker signals should be multiplied by EA(f) when the absolute amplitude of the reproduced sound pressures is considered. When AL and AR are frequency independent, the two loudspeaker signals in the time domain can be expressed by replacing EA(f), EL(f), and ER(f) in Equation (2.1.1) with their time-domain forms eA(t), eL(t), and eR(t), respectively.
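As a concrete illustration of Equation (2.1.1) in the time domain, the following minimal Python sketch (not from the original text; the sample rate, tone frequency, and gain values are arbitrary choices for illustration) generates a pair of loudspeaker signals that share one waveform and differ only in amplitude:

```python
import numpy as np

# A minimal time-domain sketch of Eq. (2.1.1): both loudspeaker signals
# share a single waveform e_A(t) and differ only in amplitude.
fs = 48000                          # sample rate in Hz (assumed)
t = np.arange(fs) / fs              # one second of samples
e_A = np.sin(2 * np.pi * 440 * t)   # example waveform: a 440 Hz tone

A_L, A_R = 0.8, 0.4                 # frequency-independent panning gains
e_L = A_L * e_A                     # left loudspeaker signal,  Eq. (2.1.1)
e_R = A_R * e_A                     # right loudspeaker signal, Eq. (2.1.1)
```

Because the gains are frequency independent, the same scaling applies equally in the frequency domain.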
At low frequencies, the head shadow is negligible, and the two ears can be approximated as two points in free space separated by 2a, where a is the head radius. For simplicity, the loudspeakers are approximated as point sources. When the source distance from the head center is much larger than the head radius, i.e., r0 >> a, the incident waves generated by the loudspeakers can be further approximated as plane waves. For convenience in analysis, the overall gain of the electroacoustic reproduction system is calibrated so that the loudspeakers are equivalent to point sources with strength Qp = 4πr0 for unit input signals. In this case, according to Equations (1.2.4) and (1.2.6), the transfer coefficient from the loudspeaker signal to the pressure amplitude of a free-field plane wave at the origin of the coordinate system (the position of the head center in the absence of the head) is equal to unity. This assumption holds for the discussions in the succeeding chapters whenever plane waves generated by loudspeakers at a far-field distance are considered (here, "loudspeaker signals" refers to the input signals of the electroacoustic reproduction system). Under the above assumption, and setting EA(f) to unity, the binaural sound pressures in the frequency domain are the superposition of those caused by the incident plane waves from the two loudspeakers and can be written as
$$P_L = A_L\exp(-jkr_{LL}) + A_R\exp(-jkr_{LR}),$$
$$P_R = A_L\exp(-jkr_{RL}) + A_R\exp(-jkr_{RR}), \tag{2.1.2}$$
where k = 2πf/c is the wave number, c = 343 m/s is the speed of sound, and
$$r_{LL} = r_{RR} = r_0 - a\sin\theta_0, \qquad r_{LR} = r_{RL} = r_0 + a\sin\theta_0, \tag{2.1.3}$$
denote the distances from a loudspeaker to the ipsilateral (near) and contralateral (far) ears, respectively. At a distance r0 >> a, if the incident waves from the loudspeakers are represented as spherical waves rather than approximated as plane waves, AL and AR in Equation (2.1.2) are replaced by AL/4πr0 and AR/4πr0, respectively. Even in this case, the resultant localization equation is identical to that derived with the plane-wave approximation. In Equations (2.1.2) and (2.1.3), the common phase factor exp(−jkr0) represents the linear delay caused by sound propagation from each loudspeaker to the origin and can be omitted. Omitting this phase factor is equivalent to supplementing an initial linear phase exp(jkr0) to the complex-valued strength Qp of the point sources or loudspeakers mentioned above. This manipulation is also equivalent to a normalization that makes the transfer coefficient from the loudspeaker signal to the pressure amplitude of the free-field plane wave at the origin equal to unity. Then, the interaural phase difference is calculated as
$$\Delta\phi_{\mathrm{SUM}} = \phi_L - \phi_R = 2\arctan\left[\frac{A_L - A_R}{A_L + A_R}\tan\left(ka\sin\theta_0\right)\right], \tag{2.1.4a}$$

or an interaural phase delay difference

$$\mathrm{ITD}_{p,\mathrm{SUM}} = \frac{\Delta\phi_{\mathrm{SUM}}}{2\pi f} = \frac{1}{\pi f}\arctan\left[\frac{A_L - A_R}{A_L + A_R}\tan\left(ka\sin\theta_0\right)\right]. \tag{2.1.4b}$$
The subscript “SUM” in the two equations represents the case of summing localization with two loudspeakers. As stated in Section 1.6.5, the interaural phase delay difference is considered a dominant cue for azimuthal localization at low frequencies. A comparison between the combined ITDp,SUM in Equation (2.1.4b) and the single-source ITDp derived from prior auditory experiences [Equation (1.6.1)] enables the determination of the azimuthal position θI of the summing virtual source as
$$\sin\theta_I = \frac{1}{ka}\arctan\left[\frac{A_L - A_R}{A_L + A_R}\tan\left(ka\sin\theta_0\right)\right]. \tag{2.1.5}$$
At low frequencies with ka << 1, Equation (2.1.5) can be expanded as a Taylor series of ka (or ka sinθ0). If only the first expansion term is retained, the equation can be simplified as
$$\sin\theta_I = \frac{A_L - A_R}{A_L + A_R}\sin\theta_0 = \frac{A_L/A_R - 1}{A_L/A_R + 1}\sin\theta_0. \tag{2.1.6}$$
This expression is the virtual source localization equation for two-channel stereophonic sound, i.e., the famous stereophonic law of sine. This law demonstrates that the spatial position θI of the summing virtual source is completely determined by the amplitude ratio AL/AR between the two loudspeaker signals and the half-span angle θ0 of the two loudspeakers with respect to the listener, and is independent of frequency and head radius. For an average head radius of a = 0.0875 m, Equation (2.1.6) is quite accurate below 0.7 kHz.
Thus, Equation (2.1.6) suggests the following:
1. When AL and AR are identical, sinθI is zero, indicating that the summing virtual source is positioned at the midpoint between the two loudspeakers.
2. When AL is larger than AR, sinθI is positive, meaning that the summing virtual source is shifted toward the left loudspeaker.
3. When AL is far larger than AR, sinθI is approximately equal to sinθ0, indicating that the summing virtual source is positioned at the left loudspeaker.
4. Similar results are obtained when AR is larger than AL because of the left-right symmetry of the configuration.
Substituting θ0 = 30°, or 2θ0 = 60° (the standard stereophonic loudspeaker configuration), into Equation (2.1.6) yields the relationship between the position of the virtual source and the interchannel level difference (ICLD) between the loudspeaker signals, denoted by d = 20 log10(AL/AR) dB, illustrated in Figure 2.2. In Figure 2.2, θI varies continuously from 0° to approximately 30° as d increases from 0 dB to +30 dB. This finding is consistent with the results of virtual source localization experiments with two stereophonic loudspeakers.
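The mapping from ICLD to virtual source azimuth underlying Figure 2.2 can be reproduced with a short sketch of Equation (2.1.6). The following Python snippet (illustrative, not from the original text; the sampled ICLD values are arbitrary) assumes the standard configuration with θ0 = 30°:

```python
import numpy as np

def sine_law_azimuth(icld_db, theta0_deg=30.0):
    """Virtual source azimuth theta_I from the stereophonic law of sine,
    Eq. (2.1.6), given the ICLD d = 20*log10(A_L/A_R) in dB."""
    ratio = 10.0 ** (icld_db / 20.0)       # A_L / A_R
    g = (ratio - 1.0) / (ratio + 1.0)      # (A_L - A_R) / (A_L + A_R)
    return np.degrees(np.arcsin(g * np.sin(np.radians(theta0_deg))))

for d in [0.0, 6.0, 15.0, 30.0]:           # illustrative ICLD values
    print(f"ICLD = {d:5.1f} dB -> theta_I = {sine_law_azimuth(d):5.1f} deg")
```

For instance, an ICLD of 6 dB yields a virtual source near 10° to the left, and at 30 dB the virtual source approaches the left loudspeaker at 30°, consistent with Figure 2.2.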
Some remarks on summing localization with two stereophonic loudspeakers and the stereophonic law of sine are as follows:
1. The stereophonic law of sine is based on the principle of the summing localization of two sound sources. In stereophonic localization, the ITDp,SUM encoded in the superposed binaural pressures is used by the auditory system to identify the position of the virtual source at low frequencies. ITDp,SUM is controlled by the ICLD; in other words, the ICLD between the two loudspeaker signals is transformed into an ITDp,SUM at the binaural pressures. The ICLD of the loudspeaker signals should not be confused with the interaural level difference (ILD) at the two ears (introduced in Section 1.6.2).
2. The approach of creating localization cues by adjusting the ICLD is invalid above 1.5 kHz for two-channel stereophonic reproduction because the superposed binaural pressures contain only the localization cue of ITDp, which is an effective cue below 1.5 kHz. For wideband stimuli containing low-frequency components below 1.5 kHz, creating a virtual source by using the ICLD is still valid because of the dominant role of ITDp in azimuthal localization at low frequencies.
3. An anticlockwise spherical coordinate system with respect to the head center is employed in this book. If a clockwise spherical coordinate system is used, a negative sign should be supplemented to the law of sine in Equation (2.1.6).

Figure 2.2 Relationship between the position of the virtual source and the interchannel level difference between the loudspeaker signals, calculated from the stereophonic laws of sine and tangent, respectively.
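The frequency ranges quoted above (roughly 0.7 kHz for the first-order approximation, 1.5 kHz for the ITDp cue itself) can be checked by comparing the exact expression of Equation (2.1.5) with the law of sine in Equation (2.1.6). The sketch below assumes a = 0.0875 m and c = 343 m/s as in the text; the 6 dB ICLD and the test frequencies are arbitrary illustrative choices:

```python
import numpy as np

a, c = 0.0875, 343.0                  # head radius (m) and speed of sound (m/s)
theta0 = np.radians(30.0)             # half-span angle of the standard setup
A_L, A_R = 2.0, 1.0                   # about a 6 dB ICLD, illustrative
g = (A_L - A_R) / (A_L + A_R)

for f in [200.0, 700.0, 1200.0]:      # illustrative test frequencies
    ka = 2 * np.pi * f / c * a
    exact = np.degrees(np.arcsin(     # Eq. (2.1.5), frequency dependent
        np.arctan(g * np.tan(ka * np.sin(theta0))) / ka))
    approx = np.degrees(np.arcsin(g * np.sin(theta0)))  # Eq. (2.1.6)
    print(f"{f:6.0f} Hz: exact {exact:5.2f} deg, law of sine {approx:5.2f} deg")
```

At 200 Hz the two expressions agree to within about 0.1°, whereas above roughly 0.7 kHz the exact azimuth becomes noticeably frequency dependent, in line with the stated range of validity of Equation (2.1.6).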
The law of sine is derived under the assumption that the listener's head is fixed in the frontal orientation. When the listener's head rotates around the vertical axis by an azimuth δθ (δθ > 0 represents an anticlockwise rotation to the left, and δθ < 0 a clockwise rotation to the right), the distances from the two loudspeakers to the two ears in Figure 2.1 become
$$r_{LL} = r_0 - a\sin(\theta_0 - \delta\theta), \qquad r_{RL} = r_0 + a\sin(\theta_0 - \delta\theta),$$
$$r_{RR} = r_0 - a\sin(\theta_0 + \delta\theta), \qquad r_{LR} = r_0 + a\sin(\theta_0 + \delta\theta). \tag{2.1.7}$$
Following a derivation similar to that from Equation (2.1.1) to Equation (2.1.4), the interaural phase delay difference becomes
$$\mathrm{ITD}_{p,\mathrm{SUM}} = \frac{1}{\pi f}\arctan\left[\frac{A_L\sin\left(ka\sin(\theta_0-\delta\theta)\right) - A_R\sin\left(ka\sin(\theta_0+\delta\theta)\right)}{A_L\cos\left(ka\sin(\theta_0-\delta\theta)\right) + A_R\cos\left(ka\sin(\theta_0+\delta\theta)\right)}\right]. \tag{2.1.8}$$
If the azimuth δθ of rotation is chosen so that the listener faces the virtual source, the interaural phase delay difference ITDp,SUM given by Equation (2.1.8) vanishes; this azimuth then represents the virtual source direction θ̂I with respect to the fixed coordinate system. Here, the notation θ̂I is used to denote the azimuth of the virtual source because the result obtained with head rotation may differ from that obtained with a fixed head orientation. Substituting ITDp,SUM = 0 into Equation (2.1.8) yields the following equation:
$$A_L\sin\left(ka\sin(\theta_0-\delta\theta)\right) - A_R\sin\left(ka\sin(\theta_0+\delta\theta)\right) = 0. \tag{2.1.9}$$
At low frequencies with ka << 1, Equation (2.1.9) can be expanded as a Taylor series of ka. If only the first expansion term is retained, the virtual source azimuth is determined in accordance with the law of tangent
$$\tan\hat{\theta}_I = \frac{A_L - A_R}{A_L + A_R}\tan\theta_0 = \frac{A_L/A_R - 1}{A_L/A_R + 1}\tan\theta_0. \tag{2.1.10}$$
For an average head radius, Equation (2.1.10) is quite accurate below 0.7 kHz. Makita (1962) hypothesized that the perceived virtual source direction in the superposed sound field is consistent with the inner normal direction (opposite to the direction of the medium velocity) of the superposed wavefront at the receiver position. Equation (2.1.10) can also be derived from Makita's hypothesis (Section 3.2.2). In fact, Makita's hypothesis is equivalent to assuming that the listener's head rotates to face the virtual source.
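This equivalence can also be checked numerically by solving Equation (2.1.9) directly and comparing the root with the law of tangent in Equation (2.1.10). The sketch below uses the same assumptions as before (gains and frequencies are illustrative choices, and SciPy's standard root finder is used for convenience):

```python
import numpy as np
from scipy.optimize import brentq     # assumes SciPy is available

a, c = 0.0875, 343.0                  # head radius (m), speed of sound (m/s)
theta0 = np.radians(30.0)             # half-span angle
A_L, A_R = 2.0, 1.0                   # about a 6 dB ICLD, illustrative

def tangent_law_deg():
    """Low-frequency head-rotation azimuth from Eq. (2.1.10)."""
    g = (A_L - A_R) / (A_L + A_R)
    return np.degrees(np.arctan(g * np.tan(theta0)))

def exact_rotation_azimuth_deg(f):
    """Numerical root of Eq. (2.1.9) at frequency f (ITDp,SUM = 0)."""
    k = 2.0 * np.pi * f / c
    func = lambda d: (A_L * np.sin(k * a * np.sin(theta0 - d))
                      - A_R * np.sin(k * a * np.sin(theta0 + d)))
    return np.degrees(brentq(func, 0.0, theta0))  # sign change lies in [0, theta0]

print(f"law of tangent: {tangent_law_deg():5.2f} deg")
for f in [200.0, 700.0]:
    print(f"{f:5.0f} Hz, exact root: {exact_rotation_azimuth_deg(f):5.2f} deg")
```

At low frequencies the numerical root essentially coincides with the law-of-tangent value of about 10.9° for this ICLD, while it drifts upward as frequency increases, mirroring the frequency dependence seen with Equation (2.1.5).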
The results calculated from Equation (2.1.10) are presented in Figure 2.2, where the span angle between the two loudspeakers is again 2θ0 = 60°. The results of Equation (2.1.10) are similar to those of Equation (2.1.6) because tanθ ≈ sinθ for θ ≤ 30°. Therefore, for a loudspeaker configuration with a span angle of 2θ0 ≤ 60°, the perceived virtual source direction is relatively stable during head rotation. In practice, Equations (2.1.6) and (2.1.10) are