
- •Preface
- •Introduction
- •1.1 Spatial coordinate systems
- •1.2 Sound fields and their physical characteristics
- •1.2.1 Free-field and sound waves generated by simple sound sources
- •1.2.2 Reflections from boundaries
- •1.2.3 Directivity of sound source radiation
- •1.2.4 Statistical analysis of acoustics in an enclosed space
- •1.2.5 Principle of sound receivers
- •1.3 Auditory system and perception
- •1.3.1 Auditory system and its functions
- •1.3.2 Hearing threshold and loudness
- •1.3.3 Masking
- •1.3.4 Critical band and auditory filter
- •1.4 Artificial head models and binaural signals
- •1.4.1 Artificial head models
- •1.4.2 Binaural signals and head-related transfer functions
- •1.5 Outline of spatial hearing
- •1.6 Localization cues for a single sound source
- •1.6.1 Interaural time difference
- •1.6.2 Interaural level difference
- •1.6.3 Cone of confusion and head movement
- •1.6.4 Spectral cues
- •1.6.5 Discussion on directional localization cues
- •1.6.6 Auditory distance perception
- •1.7 Summing localization and spatial hearing with multiple sources
- •1.7.1 Summing localization with two sound sources
- •1.7.2 The precedence effect
- •1.7.3 Spatial auditory perceptions with partially correlated and uncorrelated source signals
- •1.7.4 Auditory scene analysis and spatial hearing
- •1.7.5 Cocktail party effect
- •1.8 Room reflections and auditory spatial impression
- •1.8.1 Auditory spatial impression
- •1.8.2 Sound field-related measures and auditory spatial impression
- •1.8.3 Binaural-related measures and auditory spatial impression
- •1.9.1 Basic principle of spatial sound
- •1.9.2 Classification of spatial sound
- •1.9.3 Developments and applications of spatial sound
- •1.10 Summary
- •2.1 Basic principle of a two-channel stereophonic sound
- •2.1.1 Interchannel level difference and summing localization equation
- •2.1.2 Effect of frequency
- •2.1.3 Effect of interchannel phase difference
- •2.1.4 Virtual source created by interchannel time difference
- •2.1.5 Limitation of two-channel stereophonic sound
- •2.2.1 XY microphone pair
- •2.2.2 MS transformation and the MS microphone pair
- •2.2.3 Spaced microphone technique
- •2.2.4 Near-coincident microphone technique
- •2.2.5 Spot microphone and pan-pot technique
- •2.2.6 Discussion on microphone and signal simulation techniques for two-channel stereophonic sound
- •2.3 Upmixing and downmixing between two-channel stereophonic and mono signals
- •2.4 Two-channel stereophonic reproduction
- •2.4.1 Standard loudspeaker configuration of two-channel stereophonic sound
- •2.4.2 Influence of front-back deviation of the head
- •2.5 Summary
- •3.1 Physical and psychoacoustic principles of multichannel surround sound
- •3.2 Summing localization in multichannel horizontal surround sound
- •3.2.1 Summing localization equations for multiple horizontal loudspeakers
- •3.2.2 Analysis of the velocity and energy localization vectors of the superposed sound field
- •3.2.3 Discussion on horizontal summing localization equations
- •3.3 Multiple loudspeakers with partly correlated and low-correlated signals
- •3.4 Summary
- •4.1 Discrete quadraphone
- •4.1.1 Outline of the quadraphone
- •4.1.2 Discrete quadraphone with pair-wise amplitude panning
- •4.1.3 Discrete quadraphone with the first-order sound field signal mixing
- •4.1.4 Some discussions on discrete quadraphones
- •4.2 Other horizontal surround sounds with regular loudspeaker configurations
- •4.2.1 Six-channel reproduction with pair-wise amplitude panning
- •4.2.2 The first-order sound field signal mixing and reproduction with M ≥ 3 loudspeakers
- •4.3 Transformation of horizontal sound field signals and Ambisonics
- •4.3.1 Transformation of the first-order horizontal sound field signals
- •4.3.2 The first-order horizontal Ambisonics
- •4.3.3 The higher-order horizontal Ambisonics
- •4.3.4 Discussion and implementation of the horizontal Ambisonics
- •4.4 Summary
- •5.1 Outline of surround sounds with accompanying picture and general uses
- •5.2 5.1-Channel surround sound and its signal mixing analysis
- •5.2.1 Outline of 5.1-channel surround sound
- •5.2.2 Pair-wise amplitude panning for 5.1-channel surround sound
- •5.2.3 Global Ambisonic-like signal mixing for 5.1-channel sound
- •5.2.4 Optimization of three frontal loudspeaker signals and local Ambisonic-like signal mixing
- •5.2.5 Time panning for 5.1-channel surround sound
- •5.3 Other multichannel horizontal surround sounds
- •5.4 Low-frequency effect channel
- •5.5 Summary
- •6.1 Summing localization in multichannel spatial surround sound
- •6.1.1 Summing localization equations for spatial multiple loudspeaker configurations
- •6.1.2 Velocity and energy localization vector analysis for multichannel spatial surround sound
- •6.1.3 Discussion on spatial summing localization equations
- •6.1.4 Relationship with the horizontal summing localization equations
- •6.2 Signal mixing methods for a pair of vertical loudspeakers in the median and sagittal plane
- •6.3 Vector base amplitude panning
- •6.4 Spatial Ambisonic signal mixing and reproduction
- •6.4.1 Principle of spatial Ambisonics
- •6.4.2 Some examples of the first-order spatial Ambisonics
- •6.4.4 Recreating a top virtual source with a horizontal loudspeaker arrangement and Ambisonic signal mixing
- •6.5 Advanced multichannel spatial surround sounds and problems
- •6.5.1 Some advanced multichannel spatial surround sound techniques and systems
- •6.5.2 Object-based spatial sound
- •6.5.3 Some problems related to multichannel spatial surround sound
- •6.6 Summary
- •7.1 Basic considerations on the microphone and signal simulation techniques for multichannel sounds
- •7.2 Microphone techniques for 5.1-channel sound recording
- •7.2.1 Outline of microphone techniques for 5.1-channel sound recording
- •7.2.2 Main microphone techniques for 5.1-channel sound recording
- •7.2.3 Microphone techniques for the recording of three frontal channels
- •7.2.4 Microphone techniques for ambience recording and combination with frontal localization information recording
- •7.2.5 Stereophonic plus center channel recording
- •7.3 Microphone techniques for other multichannel sounds
- •7.3.1 Microphone techniques for other discrete multichannel sounds
- •7.3.2 Microphone techniques for Ambisonic recording
- •7.4 Simulation of localization signals for multichannel sounds
- •7.4.1 Methods of the simulation of directional localization signals
- •7.4.2 Simulation of virtual source distance and extension
- •7.4.3 Simulation of a moving virtual source
- •7.5 Simulation of reflections for stereophonic and multichannel sounds
- •7.5.1 Delay algorithms and discrete reflection simulation
- •7.5.2 IIR filter algorithm of late reverberation
- •7.5.3 FIR, hybrid FIR, and recursive filter algorithms of late reverberation
- •7.5.4 Algorithms of audio signal decorrelation
- •7.5.5 Simulation of room reflections based on physical measurement and calculation
- •7.6 Directional audio coding and multichannel sound signal synthesis
- •7.7 Summary
- •8.1 Matrix surround sound
- •8.1.1 Matrix quadraphone
- •8.1.2 Dolby Surround system
- •8.1.3 Dolby Pro-Logic decoding technique
- •8.1.4 Some developments on matrix surround sound and logic decoding techniques
- •8.2 Downmixing of multichannel sound signals
- •8.3 Upmixing of multichannel sound signals
- •8.3.1 Some considerations in upmixing
- •8.3.2 Simple upmixing methods for front-channel signals
- •8.3.3 Simple methods for Ambient component separation
- •8.3.4 Model and statistical characteristics of two-channel stereophonic signals
- •8.3.5 A scale-signal-based algorithm for upmixing
- •8.3.6 Upmixing algorithm based on principal component analysis
- •8.3.7 Algorithm based on the least mean square error for upmixing
- •8.3.8 Adaptive normalized algorithm based on the least mean square for upmixing
- •8.3.9 Some advanced upmixing algorithms
- •8.4 Summary
- •9.1 Each order approximation of ideal reproduction and Ambisonics
- •9.1.1 Each order approximation of ideal horizontal reproduction
- •9.1.2 Each order approximation of ideal three-dimensional reproduction
- •9.2 General formulation of multichannel sound field reconstruction
- •9.2.1 General formulation of multichannel sound field reconstruction in the spatial domain
- •9.2.2 Formulation of spatial-spectral domain analysis of circular secondary source array
- •9.2.3 Formulation of spatial-spectral domain analysis for a secondary source array on spherical surface
- •9.3 Spatial-spectral domain analysis and driving signals of Ambisonics
- •9.3.1 Reconstructed sound field of horizontal Ambisonics
- •9.3.2 Reconstructed sound field of spatial Ambisonics
- •9.3.3 Mixed-order Ambisonics
- •9.3.4 Near-field compensated higher-order Ambisonics
- •9.3.5 Ambisonic encoding of complex source information
- •9.3.6 Some special applications of spatial-spectral domain analysis of Ambisonics
- •9.4 Some problems related to Ambisonics
- •9.4.1 Secondary source array and stability of Ambisonics
- •9.4.2 Spatial transformation of Ambisonic sound field
- •9.5 Error analysis of Ambisonic-reconstructed sound field
- •9.5.1 Integral error of Ambisonic-reconstructed wavefront
- •9.5.2 Discrete secondary source array and spatial-spectral aliasing error in Ambisonics
- •9.6 Multichannel reconstructed sound field analysis in the spatial domain
- •9.6.1 Basic method for analysis in the spatial domain
- •9.6.2 Minimizing error in reconstructed sound field and summing localization equation
- •9.6.3 Multiple receiver position matching method and its relation to the mode-matching method
- •9.7 Listening room reflection compensation in multichannel sound reproduction
- •9.8 Microphone array for multichannel sound field signal recording
- •9.8.1 Circular microphone array for horizontal Ambisonic recording
- •9.8.2 Spherical microphone array for spatial Ambisonic recording
- •9.8.3 Discussion on microphone array recording
- •9.9 Summary
- •10.1 Basic principle and implementation of wave field synthesis
- •10.1.1 Kirchhoff–Helmholtz boundary integral and WFS
- •10.1.2 Simplification of the types of secondary sources
- •10.1.3 WFS in a horizontal plane with a linear array of secondary sources
- •10.1.4 Finite secondary source array and effect of spatial truncation
- •10.1.5 Discrete secondary source array and spatial aliasing
- •10.1.6 Some issues and related problems on WFS implementation
- •10.2 General theory of WFS
- •10.2.1 Green’s function of Helmholtz equation
- •10.2.2 General theory of three-dimensional WFS
- •10.2.3 General theory of two-dimensional WFS
- •10.2.4 Focused source in WFS
- •10.3 Analysis of WFS in the spatial-spectral domain
- •10.3.1 General formulation and analysis of WFS in the spatial-spectral domain
- •10.3.2 Analysis of the spatial aliasing in WFS
- •10.3.3 Spatial-spectral division method of WFS
- •10.4 Further discussion on sound field reconstruction
- •10.4.1 Comparison among various methods of sound field reconstruction
- •10.4.2 Further analysis of the relationship between acoustical holography and sound field reconstruction
- •10.4.3 Further analysis of the relationship between acoustical holography and Ambisonics
- •10.4.4 Comparison between WFS and Ambisonics
- •10.5 Equalization of WFS under nonideal conditions
- •10.6 Summary
- •11.1 Basic principles of binaural reproduction and virtual auditory display
- •11.1.1 Binaural recording and reproduction
- •11.1.2 Virtual auditory display
- •11.2 Acquisition of HRTFs
- •11.2.1 HRTF measurement
- •11.2.2 HRTF calculation
- •11.2.3 HRTF customization
- •11.3 Basic physical features of HRTFs
- •11.3.1 Time-domain features of far-field HRIRs
- •11.3.2 Frequency domain features of far-field HRTFs
- •11.3.3 Features of near-field HRTFs
- •11.4 HRTF-based filters for binaural synthesis
- •11.5 Spatial interpolation and decomposition of HRTFs
- •11.5.1 Directional interpolation of HRTFs
- •11.5.2 Spatial basis function decomposition and spatial sampling theorem of HRTFs
- •11.5.3 HRTF spatial interpolation and signal mixing for multichannel sound
- •11.5.4 Spectral shape basis function decomposition of HRTFs
- •11.6 Simplification of signal processing for binaural synthesis
- •11.6.1 Virtual loudspeaker-based algorithms
- •11.6.2 Basis function decomposition-based algorithms
- •11.7.1 Principle of headphone equalization
- •11.7.2 Some problems with binaural reproduction and VAD
- •11.8 Binaural reproduction through loudspeakers
- •11.8.1 Basic principle of binaural reproduction through loudspeakers
- •11.8.2 Virtual source distribution in two-front loudspeaker reproduction
- •11.8.3 Head movement and stability of virtual sources in Transaural reproduction
- •11.8.4 Timbre coloration and equalization in transaural reproduction
- •11.9 Virtual reproduction of stereophonic and multichannel surround sound
- •11.9.1 Binaural reproduction of stereophonic and multichannel sound through headphones
- •11.9.2 Stereophonic expansion and enhancement
- •11.9.3 Virtual reproduction of multichannel sound through loudspeakers
- •11.10.1 Binaural room modeling
- •11.10.2 Dynamic virtual auditory environments system
- •11.11 Summary
- •12.1 Physical analysis of binaural pressures in summing virtual source and auditory events
- •12.1.1 Evaluation of binaural pressures and localization cues
- •12.1.2 Method for summing localization analysis
- •12.1.3 Binaural pressure analysis of stereophonic and multichannel sound with amplitude panning
- •12.1.4 Analysis of summing localization with interchannel time difference
- •12.1.5 Analysis of summing localization at the off-central listening position
- •12.1.6 Analysis of interchannel correlation and spatial auditory sensations
- •12.2 Binaural auditory models and analysis of spatial sound reproduction
- •12.2.1 Analysis of lateral localization by using auditory models
- •12.2.2 Analysis of front-back and vertical localization by using a binaural auditory model
- •12.2.3 Binaural loudness models and analysis of the timbre of spatial sound reproduction
- •12.3 Binaural measurement system for assessing spatial sound reproduction
- •12.4 Summary
- •13.1 Analog audio storage and transmission
- •13.1.1 45°/45° Disk recording system
- •13.1.2 Analog magnetic tape audio recorder
- •13.1.3 Analog stereo broadcasting
- •13.2 Basic concepts of digital audio storage and transmission
- •13.3 Quantization noise and shaping
- •13.3.1 Signal-to-quantization noise ratio
- •13.3.2 Quantization noise shaping and 1-Bit DSD coding
- •13.4 Basic principle of digital audio compression and coding
- •13.4.1 Outline of digital audio compression and coding
- •13.4.2 Adaptive differential pulse-code modulation
- •13.4.3 Perceptual audio coding in the time-frequency domain
- •13.4.4 Vector quantization
- •13.4.5 Spatial audio coding
- •13.4.6 Spectral band replication
- •13.4.7 Entropy coding
- •13.4.8 Object-based audio coding
- •13.5 MPEG series of audio coding techniques and standards
- •13.5.1 MPEG-1 audio coding technique
- •13.5.2 MPEG-2 BC audio coding
- •13.5.3 MPEG-2 advanced audio coding
- •13.5.4 MPEG-4 audio coding
- •13.5.5 MPEG parametric coding of multichannel sound and unified speech and audio coding
- •13.5.6 MPEG-H 3D audio
- •13.6 Dolby series of coding techniques
- •13.6.1 Dolby digital coding technique
- •13.6.2 Some advanced Dolby coding techniques
- •13.7 DTS series of coding technique
- •13.8 MLP lossless coding technique
- •13.9 ATRAC technique
- •13.10 Audio video coding standard
- •13.11 Optical disks for audio storage
- •13.11.1 Structure, principle, and classification of optical disks
- •13.11.2 CD family and its audio formats
- •13.11.3 DVD family and its audio formats
- •13.11.4 SACD and its audio formats
- •13.11.5 BD and its audio formats
- •13.12 Digital radio and television broadcasting
- •13.12.1 Outline of digital radio and television broadcasting
- •13.12.2 Eureka-147 digital audio broadcasting
- •13.12.3 Digital radio mondiale
- •13.12.4 In-band on-channel digital audio broadcasting
- •13.12.5 Audio for digital television
- •13.13 Audio storage and transmission by personal computer
- •13.14 Summary
- •14.1 Outline of acoustic conditions and requirements for spatial sound intended for domestic reproduction
- •14.2 Acoustic consideration and design of listening rooms
- •14.3 Arrangement and characteristics of loudspeakers
- •14.3.1 Arrangement of the main loudspeakers in listening rooms
- •14.3.2 Characteristics of the main loudspeakers
- •14.3.3 Bass management and arrangement of subwoofers
- •14.4 Signal and listening level alignment
- •14.5 Standards and guidance for conditions of spatial sound reproduction
- •14.6 Headphones and binaural monitors of spatial sound reproduction
- •14.7 Acoustic conditions for cinema sound reproduction and monitoring
- •14.8 Summary
- •15.1 Outline of psychoacoustic and subjective assessment experiments
- •15.2 Contents and attributes for spatial sound assessment
- •15.3 Auditory comparison and discrimination experiment
- •15.3.1 Paradigms of auditory comparison and discrimination experiment
- •15.3.2 Examples of auditory comparison and discrimination experiment
- •15.4 Subjective assessment of small impairments in spatial sound systems
- •15.5 Subjective assessment of a spatial sound system with intermediate quality
- •15.6 Virtual source localization experiment
- •15.6.1 Basic methods for virtual source localization experiments
- •15.6.2 Preliminary analysis of the results of virtual source localization experiments
- •15.6.3 Some results of virtual source localization experiments
- •15.7 Summary
- •16.1.1 Application to commercial cinema and related problems
- •16.1.2 Applications to domestic reproduction and related problems
- •16.1.3 Applications to automobile audio
- •16.2.1 Applications to virtual reality
- •16.2.2 Applications to communication and information systems
- •16.2.3 Applications to multimedia
- •16.2.4 Applications to mobile and handheld devices
- •16.3 Applications to the scientific experiments of spatial hearing and psychoacoustics
- •16.4 Applications to sound field auralization
- •16.4.1 Auralization in room acoustics
- •16.4.2 Other applications of auralization technique
- •16.5 Applications to clinical medicine
- •16.6 Summary
- •References
- •Index

Multichannel horizontal surround sound 199
lateral region of 54.6° < θ < 110°. This defect has only a slight influence and remains acceptable in sound reproduction with accompanying pictures. However, as stated in Section 1.8.3, the lateral early reflections from θ = 55° ± 20° are vital to the auditory spatial impression (especially the ASW) in a concert hall. A 5.1-channel sound with pair-wise amplitude panning cannot accurately simulate and reproduce the spatial information of lateral reflections and therefore exhibits a defect in music-only reproduction. This defect also reveals the limitations of using the indirect method discussed in Section 3.1 to simulate the auditory sensation of a concert hall in 5.1-channel reproduction. Further analysis of the 5.1-channel loudspeaker configuration with LS and RS loudspeakers at ±120° yields similar results (Xie, 1997). Virtual source localization experiments also validate the aforementioned analysis (Xie, 1997; Xie, 2001a; Martin et al., 1999).
The aforementioned characteristics and defects are due to the 5.1-channel loudspeaker configuration. The 5.1-channel sound was originally designed for sound reproduction with accompanying pictures; therefore, an irregular and front-biased loudspeaker configuration is used. It was not originally intended to recreate virtual sources over the full 360° of the horizontal plane. Using 5.1-channel sound for general reproduction (including music-only reproduction) is a compromise. The analysis in this section reveals the limitation of 5.1-channel sound with pair-wise amplitude panning. Other signal panning and mixing methods are also available to recreate a virtual source in the 5.1-channel sound, and the decorrelated signal method is applicable to simulate the subjective sensation in reflective environments.
5.2.3 Global Ambisonic-like signal mixing for 5.1-channel sound
In addition to pair-wise amplitude panning, other signal panning and mixing methods have been explored for the 5.1-channel sound. Sound field or Ambisonic-like signal mixing methods similar to those discussed in Section 4.3 are suggested for recreating the virtual source in various horizontal directions, although the 5.1-channel loudspeaker configuration is inherently inappropriate for recreating a full 360° horizontal virtual source. Deriving Ambisonic-like decoding matrices or signals for irregular loudspeaker configurations based on physical and psychoacoustic criteria is complicated and sometimes difficult. Despite this complexity, some authors have derived Ambisonic-like signals and mixing methods for 5.1-channel loudspeaker configurations. These methods are theoretically attractive but rarely used in practical program production.
As a general case, if M loudspeakers are arranged in a horizontal circle with regular or irregular azimuthal intervals and the azimuth of the ith loudspeaker is θi, the normalized signal amplitude of the ith loudspeaker is a linear combination of azimuthal harmonics up to the order Q,
$$
A_i(\theta_S) = A_{\mathrm{total}} \left[ D_0^{(1)}(\theta_i) + \sum_{q=1}^{Q} \left( D_q^{(1)}(\theta_i) \cos q\theta_S + D_q^{(2)}(\theta_i) \sin q\theta_S \right) \right], \quad i = 0, 1, \ldots, (M-1), \tag{5.2.15}
$$
where θS is the azimuth of the target source. In contrast to the case of a regular loudspeaker configuration given by Equation (4.3.50), the coefficients $D_0^{(1)}(\theta_i)$ differ among loudspeakers in the case of irregular loudspeaker configurations; as such, they cannot all be normalized to unity. Given the loudspeaker configuration and the order Q, the decoding matrix or loudspeaker signals are derived by searching for a set of decoding coefficients $D_0^{(1)}(\theta_i)$, $D_q^{(1)}(\theta_i)$, $D_q^{(2)}(\theta_i)$, with q = 1, 2, …, Q and i = 0, 1, …, (M − 1), so that the reproduced

200 Spatial Sound
sound field satisfies certain optimized criteria. Equation (5.2.15) involves M(2Q + 1) decoding coefficients to be determined. Before the optimized criteria are applied, the number of decoding coefficients is reduced by considering symmetry. Even for irregular loudspeaker configurations, loudspeaker arrangements are usually left-right symmetric. For an arbitrary pair of left-right symmetric loudspeakers i and i′, their azimuths satisfy θi = −θi′ with θi ≠ 0° or 180°. Because cos qθS is an even function with cos(−qθS) = cos qθS and sin qθS is an odd function with sin(−qθS) = −sin qθS, the coefficients in Equation (5.2.15) for a pair of left-right symmetric loudspeakers satisfy the following equation:
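$$
D_0^{(1)}(\theta_{i'}) = D_0^{(1)}(\theta_i), \quad D_q^{(1)}(\theta_{i'}) = D_q^{(1)}(\theta_i), \quad D_q^{(2)}(\theta_{i'}) = -D_q^{(2)}(\theta_i), \quad q = 1, 2, \ldots, Q. \tag{5.2.16}
$$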
For loudspeakers at θi = 0° and 180°, the coefficients satisfy
$$
D_q^{(2)}(\theta_i) = 0, \quad q = 1, 2, \ldots, Q. \tag{5.2.17}
$$
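The left-right symmetry relations can be checked numerically. The following is a minimal sketch, assuming hypothetical second-order coefficient values chosen only for illustration; the function names `loudspeaker_gain` and `mirror_left_right` are not from the text. It evaluates Equation (5.2.15) for one loudspeaker and confirms that flipping the sign of the sine coefficients, as in Equation (5.2.16), gives the mirrored loudspeaker the gain that the original loudspeaker has for the mirrored source azimuth.

```python
import math

def loudspeaker_gain(d0, d_cos, d_sin, theta_s, a_total=1.0):
    """Evaluate Equation (5.2.15) for one loudspeaker.

    d0      -- zeroth-order coefficient D_0^(1)
    d_cos   -- cosine coefficients D_q^(1), q = 1..Q
    d_sin   -- sine coefficients D_q^(2), q = 1..Q
    theta_s -- target source azimuth in radians
    """
    total = d0
    for q, (dc, ds) in enumerate(zip(d_cos, d_sin), start=1):
        total += dc * math.cos(q * theta_s) + ds * math.sin(q * theta_s)
    return a_total * total

def mirror_left_right(d0, d_cos, d_sin):
    """Coefficients of the left-right mirrored loudspeaker, Equation (5.2.16)."""
    return d0, list(d_cos), [-ds for ds in d_sin]

# Hypothetical coefficients for a left loudspeaker (illustrative values only).
left = (0.2, [0.3, -0.05], [0.25, 0.1])
right = mirror_left_right(*left)

theta = math.radians(40.0)
gain_right = loudspeaker_gain(*right, theta)
gain_left_mirrored = loudspeaker_gain(*left, -theta)
print(gain_right, gain_left_mirrored)  # the two values agree
```

A loudspeaker on the symmetry axis is its own mirror image, which is why its sine coefficients must vanish as in Equation (5.2.17).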
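In addition, if a pair of loudspeakers i and i″ is arranged front-back symmetrically, their azimuths satisfy θi″ = (180° − θi) in the left-half horizontal plane or θi″ = (−180° − θi) in the right-half horizontal plane. Since $\cos[q(\pm 180^\circ - \theta_S)] = (-1)^q \cos q\theta_S$ and $\sin[q(\pm 180^\circ - \theta_S)] = (-1)^{q+1} \sin q\theta_S$, then
$$
D_0^{(1)}(\theta_{i''}) = D_0^{(1)}(\theta_i), \quad D_q^{(1)}(\theta_{i''}) = (-1)^q D_q^{(1)}(\theta_i), \quad D_q^{(2)}(\theta_{i''}) = (-1)^{q+1} D_q^{(2)}(\theta_i), \quad q = 1, 2, \ldots, Q. \tag{5.2.18}
$$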
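The two trigonometric identities used for the front-back case follow from the angle-subtraction formulas, because cos(q·180°) = (−1)^q and sin(q·180°) = 0 for integer q. As a brief worked step:

```latex
\begin{aligned}
\cos[q(\pm 180^\circ - \theta_S)]
  &= \cos(q\,180^\circ)\cos q\theta_S \pm \sin(q\,180^\circ)\sin q\theta_S
   = (-1)^q \cos q\theta_S, \\
\sin[q(\pm 180^\circ - \theta_S)]
  &= \pm\sin(q\,180^\circ)\cos q\theta_S - \cos(q\,180^\circ)\sin q\theta_S
   = (-1)^{q+1} \sin q\theta_S .
\end{aligned}
```

Assuming that the front-back mirrored loudspeaker is required to carry the gain $A_{i''}(\theta_S) = A_i(\pm 180^\circ - \theta_S)$, substituting these identities into Equation (5.2.15) and comparing the coefficients of each harmonic gives the relations in Equation (5.2.18).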
Symmetry reduces the number of decoding coefficients to be determined and simplifies the optimization procedure. For the ITU 5.1-channel loudspeaker configuration, the number of loudspeakers is M = 5. For signals of order Q = 1 to 4, Equation (5.2.15) involves M(2Q + 1) = 15, 25, 35, and 45 unknown coefficients, respectively. By considering the left-right symmetry, the number of coefficients is reduced to (5Q + 3) = 8, 13, 18, and 23, respectively. Exploiting symmetry is therefore a useful mathematical technique for deriving Ambisonic-decoding equations and signals.
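The coefficient counts quoted above can be reproduced with a few lines of code. This is a small sketch under the assumption that the five main loudspeakers sit at 0°, ±30°, and ±110°; the function name `count_coefficients` is hypothetical. A loudspeaker on the symmetry axis keeps Q + 1 coefficients because its sine terms vanish, and each left-right pair shares one set of 2Q + 1 coefficients.

```python
def count_coefficients(azimuths_deg, order):
    """Return (full, reduced) numbers of decoding coefficients.

    full    -- M(2Q + 1), every coefficient appearing in Equation (5.2.15)
    reduced -- number remaining after left-right symmetry is applied
    """
    q = order
    full = len(azimuths_deg) * (2 * q + 1)
    reduced = 0
    counted = set()
    for az in azimuths_deg:
        az = ((az + 180.0) % 360.0) - 180.0   # wrap into [-180, 180)
        if az in (0.0, -180.0):
            reduced += q + 1                  # sine terms vanish, Eq. (5.2.17)
        elif abs(az) not in counted:
            counted.add(abs(az))
            reduced += 2 * q + 1              # one set per mirrored pair
    return full, reduced

# Assumed azimuths of the five main loudspeakers (C, L, R, LS, RS).
itu = [0, 30, -30, 110, -110]
for q in range(1, 5):
    print(q, count_coefficients(itu, q))
```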
Similar to cases of regular loudspeaker configurations, the criteria of the optimized velocity localization vector (equivalent to optimized interaural phase delay difference and its dynamic variation with head rotation) and optimized energy localization vector are used to derive the decoding coefficients. From Equation (5.2.15), the following quantities are evaluated as the functions of the coefficients to be determined:
$$
\begin{aligned}
P'_A &= \sum_{i=0}^{M-1} A_i, & V'_x &= \sum_{i=0}^{M-1} A_i \cos\theta_i, & V'_y &= \sum_{i=0}^{M-1} A_i \sin\theta_i, \\
Pow' &= \sum_{i=0}^{M-1} A_i^2, & I'_x &= \sum_{i=0}^{M-1} A_i^2 \cos\theta_i, & I'_y &= \sum_{i=0}^{M-1} A_i^2 \sin\theta_i.
\end{aligned} \tag{5.2.19}
$$
Ideally, the decoding coefficients are derived with the following optimized criteria (Gerzon and Barton, 1992):

1. Criterion 1. The virtual source direction θv evaluated from the velocity localization vector given by Equation (3.2.22) should be equal to θE evaluated from the energy localization vector given by Equation (3.2.34), and θv and θE should be as close to the target source direction as possible, i.e.,
$$
\theta_v = \theta_E = \theta_S. \tag{5.2.20}
$$
According to Equation (5.2.19), the first equality in Equation (5.2.20) yields
$$
V'_y I'_x = V'_x I'_y. \tag{5.2.21}
$$
2. Criterion 2. For all target source azimuths θS, or for the part of the target source azimuths in which the accuracy of localization is important, the velocity vector magnitude rv given by Equation (3.2.29) is optimized to be as close to unity as possible at low frequencies below 0.4–0.7 kHz. The energy vector magnitude rE expressed in Equation (3.2.36) is optimized to be as close to unity as possible within the mid- and high-frequency range of 0.7–4.0 kHz.
3. Criterion 3. The overall sound pressure P′A and the overall power Pow′ at the origin should be constant and independent of the target source azimuth θS.
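The quantities entering Criteria 1 to 3 can be evaluated directly from a set of loudspeaker gains. The sketch below computes the sums of Equation (5.2.19) and, as an assumption, uses the common definitions in which the vector direction is the arctangent of the y over the x component and the vector magnitude is the vector length divided by the overall pressure (velocity vector) or overall power (energy vector). Equations (3.2.22) to (3.2.36) are not reproduced in this section, so these forms and the function name `localization_vectors` should be read as illustrative. The sketch also assumes a positive overall pressure.

```python
import math

def localization_vectors(gains, azimuths_deg):
    """Evaluate the sums of Equation (5.2.19) and the derived angles and magnitudes."""
    az = [math.radians(a) for a in azimuths_deg]
    p = sum(gains)                                          # overall pressure
    vx = sum(g * math.cos(t) for g, t in zip(gains, az))
    vy = sum(g * math.sin(t) for g, t in zip(gains, az))
    power = sum(g * g for g in gains)                       # overall power
    ix = sum(g * g * math.cos(t) for g, t in zip(gains, az))
    iy = sum(g * g * math.sin(t) for g, t in zip(gains, az))
    return {
        "P": p,
        "Pow": power,
        "theta_v": math.degrees(math.atan2(vy, vx)),   # assumes p > 0
        "r_v": math.hypot(vx, vy) / p,
        "theta_E": math.degrees(math.atan2(iy, ix)),
        "r_E": math.hypot(ix, iy) / power,
    }

# Equal gains on a pair of loudspeakers at +30 and -30 degrees.
result = localization_vectors([1.0, 1.0], [30.0, -30.0])
print(result)
```

For this symmetric pair both vectors point to the front, and both magnitudes equal cos 30°, which is below unity as expected for a virtual source created by two loudspeakers.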
For the first-order signals and if the velocity localization vector is considered only, the optimized criterion yields a set of linear equations of the decoding coefficients, which can be easily solved. In Equation (5.2.15), the amplitudes of the first-order loudspeaker signals are written as
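$$
A_i(\theta_S) = A_{\mathrm{total}} \left[ D_0^{(1)}(\theta_i) + D_1^{(1)}(\theta_i) \cos\theta_S + D_1^{(2)}(\theta_i) \sin\theta_S \right], \quad i = 0, 1, \ldots, (M-1). \tag{5.2.22}
$$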
Letting θv=θS, rv= 1, and applying the criterion of P′A = PA = 1 (i.e., the overall reproduced sound pressure in the origin is equal to the unit target sound pressure), Equations (3.2.20) and (3.2.22) yield
$$
\sum_{i=0}^{M-1} A_i(\theta_S) = 1, \quad \sum_{i=0}^{M-1} A_i(\theta_S) \cos\theta_i = \cos\theta_S, \quad \sum_{i=0}^{M-1} A_i(\theta_S) \sin\theta_i = \sin\theta_S. \tag{5.2.23}
$$
The left sides of the three equations above represent the normalized pressure, x, and y components of the velocity localization vector of the reproduced sound field in the origin, respectively. The right sides of the above equations represent the three corresponding physical quantities in the target sound field, respectively. Therefore, Equation (5.2.23) shows that the three physical quantities in reproduction match with those in the target sound field. Substituting Equation (5.2.22) into Equation (5.2.23) yields a linear matrix equation for the unknown coefficients:
$$
\mathbf{S}_{2D} = A_{\mathrm{total}} [Y_{2D}] [D_{2D}] \mathbf{S}_{2D}, \tag{5.2.24}
$$

where $\mathbf{S}_{2D} = [W, X, Y]^T$ is a 3 × 1 column matrix or vector composed of the first-order independent signals given by Equation (4.3.3), and the subscript “2D” denotes the case of two-dimensional (horizontal) reproduction. $[D_{2D}]$ is an M × 3 decoding matrix to be determined:
$$
[D_{2D}] = \begin{bmatrix}
D_0^{(1)}(\theta_0) & D_1^{(1)}(\theta_0) & D_1^{(2)}(\theta_0) \\
D_0^{(1)}(\theta_1) & D_1^{(1)}(\theta_1) & D_1^{(2)}(\theta_1) \\
\vdots & \vdots & \vdots \\
D_0^{(1)}(\theta_{M-1}) & D_1^{(1)}(\theta_{M-1}) & D_1^{(2)}(\theta_{M-1})
\end{bmatrix}. \tag{5.2.25}
$$
[Y2D] is a 3 × M matrix with its entries being the cosine and sine of each loudspeaker azimuth:
$$
[Y_{2D}] = \begin{bmatrix}
1 & 1 & \cdots & 1 \\
\cos\theta_0 & \cos\theta_1 & \cdots & \cos\theta_{M-1} \\
\sin\theta_0 & \sin\theta_1 & \cdots & \sin\theta_{M-1}
\end{bmatrix}. \tag{5.2.26}
$$
For M > 3, the unknown coefficients in Equation (5.2.24) can be solved by the following pseudo-inverse method:
$$
A_{\mathrm{total}} [D_{2D}] = \mathrm{pinv}[Y_{2D}] = [Y_{2D}]^T \left\{ [Y_{2D}] [Y_{2D}]^T \right\}^{-1}. \tag{5.2.27}
$$
The solution given by Equation (5.2.27) is valid for regular and irregular loudspeaker configurations and satisfies the condition of constant-amplitude normalization given by Equation (4.3.80a). When M = 3, if the matrix [Y2D] is invertible, the solution of Equation (5.2.24) is directly obtained from the inverse matrix of [Y2D]. For a regular loudspeaker configuration, the solution or loudspeaker signals expressed in Equation (5.2.27) are equivalent to Equation (4.3.15) with b = 2 and constant-amplitude normalization. For irregular loudspeaker configurations such as the 5.1-channel configuration, the solution given by Equation (5.2.27) is often close to singular and thus unstable (Neukom, 2006). In this case, a slight change in loudspeaker positions or responses may noticeably influence the reproduction performance. In addition, the following undesirable situation may occur: the signal amplitude for a loudspeaker near the target source direction is large, and the signal amplitudes for some other loudspeakers are also large but out of phase. The destructive interference among the sound waves from all loudspeakers makes the superposed sound pressure at the origin match that of the target source, but the overall power of all loudspeaker signals increases dramatically. Moreover, the superposed sound pressure at off-central (off-origin) positions deviates markedly from that of the target plane wave.
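The pseudo-inverse solution of Equation (5.2.27) can be sketched with the standard library alone. The example below assumes loudspeaker azimuths of 0°, ±30°, and ±110° and a unit-amplitude source vector [1, cos θS, sin θS]; the helper names (`inverse_3x3`, `first_order_decoder`, `gains`) are hypothetical. It builds [Y2D], forms the product of its transpose with the inverse of [Y2D][Y2D]ᵀ, and then evaluates the loudspeaker gains for a source directly behind the listener.

```python
import math

def inverse_3x3(m):
    """Invert a 3x3 matrix (list of rows) with the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("Y Y^T is singular for this loudspeaker layout")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]

def first_order_decoder(azimuths_deg):
    """Return the M x 3 matrix A_total [D_2D] = Y^T (Y Y^T)^-1, Eq. (5.2.27)."""
    az = [math.radians(a) for a in azimuths_deg]
    y = [[1.0] * len(az), [math.cos(t) for t in az], [math.sin(t) for t in az]]
    yyt = [[sum(u * v for u, v in zip(r1, r2)) for r2 in y] for r1 in y]
    inv = inverse_3x3(yyt)
    # Row i of the result is column i of Y multiplied by (Y Y^T)^-1.
    return [[sum(y[k][i] * inv[k][j] for k in range(3)) for j in range(3)]
            for i in range(len(az))]

def gains(decoder, theta_s_deg):
    """Loudspeaker amplitudes for the source vector [1, cos, sin]^T."""
    t = math.radians(theta_s_deg)
    s = [1.0, math.cos(t), math.sin(t)]
    return [sum(d * x for d, x in zip(row, s)) for row in decoder]

# Assumed azimuths in the order C, L, R, LS, RS.
itu = [0.0, 30.0, -30.0, 110.0, -110.0]
decoder = first_order_decoder(itu)
a = gains(decoder, 180.0)            # target source directly behind
power = sum(g * g for g in a)
print([round(g, 3) for g in a], round(power, 3))
```

In this sketch the gains satisfy the three matching conditions of Equation (5.2.23), while the three front loudspeakers receive negative (out-of-phase) gains for the rear source and the summed power of the loudspeaker signals exceeds unity, which illustrates the kind of behavior described above.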
If the energy localization vector is considered, the optimized criteria yield a set of nonlinear equations of the unknown coefficients. Solving these equations is a nonlinear optimization problem and is thus complicated. The equations are usually solved by numerical methods. The following problems may occur in solving the equations:
1. For some practical irregular loudspeaker configurations, the aforementioned optimized criteria can be satisfied only partly or approximately rather than completely and exactly. In addition, the optimized criteria may be satisfied only in some target source directions rather than in all target source directions. In this case, the error in target source directions that are important for auditory perception (such as frontal directions) is considered preferentially in the optimization procedure.
2. The optimized solution depends on the optimized criteria and the measures of errors. A comprehensive measure of the overall error should be a weighted combination of the errors evaluated from the aforementioned Criteria 1 to 3. The weights are chosen in accordance with the relative importance of different errors. Different error weights lead to different results.
3. The final results of some nonlinear optimization procedures depend on the initial parameters. Inappropriate initial parameters may lead to local rather than global optimized solutions.
4. Under certain optimized criteria, two or more sets of solutions for the decoding coefficients may be available. An appropriate set of solutions should be finally identified from physical and auditory analysis or even from experience.
5. Some solutions may be nearly singular and unstable so that slight variations in parameters (such as loudspeaker azimuths) remarkably change the resultant coefficients. Such a solution is obviously inappropriate.
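The weighted overall error mentioned in item 2 can be sketched as a single cost function. The form below is only one possible choice and is an assumption throughout: squared angular errors stand for Criterion 1, squared deviations of the vector magnitudes from unity stand for Criterion 2, and the variance of the overall power across target directions stands for Criterion 3. The names `overall_error`, `w_angle`, `w_mag`, and `w_power` are hypothetical, and the sketch assumes that the summed gains are nonzero for every target direction.

```python
import math

def _vector(gains, az_rad, exponent):
    """Direction and magnitude of the velocity (exponent 1) or energy (exponent 2) vector."""
    w = [g ** exponent for g in gains]
    total = sum(w)
    x = sum(wi * math.cos(t) for wi, t in zip(w, az_rad)) / total
    y = sum(wi * math.sin(t) for wi, t in zip(w, az_rad)) / total
    return math.atan2(y, x), math.hypot(x, y)

def _angle_diff(a, b):
    """Smallest absolute difference between two angles in radians."""
    return abs(math.atan2(math.sin(a - b), math.cos(a - b)))

def overall_error(gain_fn, azimuths_deg, targets_deg,
                  w_angle=1.0, w_mag=1.0, w_power=1.0):
    """Weighted combination of the errors of Criteria 1 to 3 (illustrative form)."""
    az = [math.radians(a) for a in azimuths_deg]
    angle_err = mag_err = 0.0
    powers = []
    for t_deg in targets_deg:
        g = gain_fn(t_deg)
        t = math.radians(t_deg)
        th_v, r_v = _vector(g, az, 1)
        th_e, r_e = _vector(g, az, 2)
        angle_err += _angle_diff(th_v, t) ** 2 + _angle_diff(th_e, t) ** 2
        mag_err += (1.0 - r_v) ** 2 + (1.0 - r_e) ** 2
        powers.append(sum(x * x for x in g))
    mean_power = sum(powers) / len(powers)
    power_err = sum((p - mean_power) ** 2 for p in powers)
    return (w_angle * angle_err + w_mag * mag_err + w_power * power_err) / len(targets_deg)

# Toy check: a source at a loudspeaker position feeds that loudspeaker only.
square = [0.0, 90.0, 180.0, -90.0]

def one_hot(theta_s_deg):
    return [1.0 if a == theta_s_deg else 0.0 for a in square]

def equal_gains(theta_s_deg):
    return [0.5, 0.5, 0.5, 0.5]

print(overall_error(one_hot, square, square))
print(overall_error(equal_gains, square, square))
```

A numerical optimizer would minimize such a function over the decoding coefficients. Because the result depends on the weights and on the starting point, the issues listed in items 2 and 3 apply directly to any cost of this kind.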
With these problems, deriving the decoding coefficients for irregular loudspeaker configurations becomes mathematically difficult. Despite these difficulties, some studies have derived Ambisonic-decoding coefficients and signals for irregular loudspeaker configurations, especially for the 5.1-channel configuration by using various nonlinear optimization methods.
Gerzon and Barton (1992) first investigated the solution of the decoding coefficients of second-order Ambisonic-like signals for irregular configurations with five loudspeakers. They presented their results at the 92nd Convention of the Audio Engineering Society held in Vienna, and the resulting decoders were termed Vienna decoders. The azimuths of loudspeakers in Gerzon’s configuration slightly differ from those recommended by the ITU. In Gerzon’s configuration, the span angle between the left and right loudspeakers in the front was wider than that recommended by the ITU, and the span angle between a pair of surround loudspeakers was narrower than that recommended by the ITU. The ITU had not yet published the standard of 5.1-channel sound when Gerzon presented his work in 1992. Gerzon outlined a method for solving decoding coefficients but did not give a detailed mathematical derivation. Many sets of solutions are available for the decoding coefficients. These solutions depend on some predetermined parameters that should be chosen on the basis of certain auditory localization criteria. The optimization of a velocity localization vector below 0.4 kHz and an energy localization vector above 0.7 kHz yields two sets of decoding coefficients at low and mid-high frequencies, respectively. Accordingly, the low- and mid-high-frequency components of independent signals are individually decoded by two decoding matrices rather than a common decoding matrix and shelf filters similar to those in Figure 4.19. The overall gains of the two decoding matrices are chosen to balance the timbre between low and mid-high frequencies so that the root-mean-square value of decoding coefficients at mid-high frequencies matches that at low frequencies. This signal mixing method improves the horizontal localization performance and the stability of the front virtual source.
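The gain balancing between the two decoding matrices can be illustrated with a short sketch. It assumes that the balance is achieved by a single scalar applied to the mid-high-frequency matrix; the matrix values and the names `rms` and `match_rms` are hypothetical and are not taken from Gerzon and Barton's results.

```python
import math

def rms(matrix):
    """Root-mean-square value of all coefficients in a decoding matrix."""
    values = [x for row in matrix for x in row]
    return math.sqrt(sum(x * x for x in values) / len(values))

def match_rms(low_matrix, high_matrix):
    """Scale high_matrix so that its RMS coefficient value equals that of low_matrix."""
    scale = rms(low_matrix) / rms(high_matrix)
    return [[scale * x for x in row] for row in high_matrix]

# Hypothetical 5 x 3 decoding matrices (illustrative values only).
low = [[0.20, 0.35, 0.00],
       [0.18, 0.30, 0.25],
       [0.18, 0.30, -0.25],
       [0.40, -0.35, 0.40],
       [0.40, -0.35, -0.40]]
high = [[0.30, 0.25, 0.00],
        [0.25, 0.20, 0.15],
        [0.25, 0.20, -0.15],
        [0.45, -0.20, 0.30],
        [0.45, -0.20, -0.30]]

high_balanced = match_rms(low, high)
print(rms(low), rms(high_balanced))  # the two values agree
```

Scaling by one common factor leaves the relative gains among loudspeakers unchanged, so the directional behavior of the mid-high-frequency decoder is preserved while its overall level is matched to the low-frequency decoder.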
Craven (2003) gave the frequency-independent decoding equation for an ITU 5.1-channel configuration with the 4th-order Ambisonic-like signal mixing. The decoding coefficients were derived from a cost function based on the aforementioned optimization criteria. The conjugate-gradient method was used to solve the nonlinear optimization problem, and the final convergence was accelerated using the second derivatives in a Newton iteration. Craven presented the final results of the (un-normalized) loudspeaker signal amplitudes or gains as shown in the following equations but did not provide the detailed mathematical derivation:

Figure 5.7 The 4th-order Ambisonic-like signal panning curves for an ITU 5.1-channel loudspeaker configuration.
AL = 0.167 0.242 cos θS 0.272 sin θS 0.053 cos 2θS 0.222 sin 2θS 0.084 cos 3θS 0.059 sin 3θS 0.070 cos 4θS 0.084 sin 4θS
AC = 0.105 0.332 cos θS 0.265 cos 2θS 0.169 cos 3θS 0.060 cos 4θS
AR = 0.167 0.242 cos θS 0.272 sin θS 0.053 cos 2θS 0.222 sin 2θS 0.084 cos 3θS 0.059 sin 3θS 0.070 cos 4θS 0.084 sin 4θS
ALS = 0.356 0.360 cos θS 0.425 sin θS 0.064 cos 2θS 0.118 sin 2θS 0.047 sin 3θS 0.027 cos 4θS 0.061 sin 4θS
ARS = 0.356 0.360 cos θS 0.425 sin θS 0.064 cos 2θS 0.118 sin 2θS 0.047 sin 3θS 0.027 cos 4θS 0.061 sin 4θS
(5.2.28)
Figure 5.7 illustrates the signal panning curves of Equation (5.2.28). The amplitudes of the AL, AR, ALS, and ARS signals, especially the two surround signals, are asymmetric about the axis direction of the corresponding loudspeakers. This asymmetry is adapted to an irregular loudspeaker configuration. By contrast, for regular loudspeaker configurations, the amplitudes of loudspeaker signals given by Equation (4.3.53) are even functions of the variable (θS – θi) and therefore symmetric about the axis directions of the corresponding loudspeakers. The symmetry can also be observed in the example illustrated in Figure 4.17. In addition, for a regular loudspeaker configuration, the Q-order Ambisonic reproduction requires at least (2Q + 1) or (2Q + 2) loudspeakers (Section 4.3.3). Therefore, a regular five-loudspeaker configuration is appropriate for the first- and second-order Ambisonic reproduction. Conversely, for an irregular loudspeaker configuration, higher-order azimuthal harmonic components are necessary to fit the required asymmetric polar pattern of loudspeaker signals.
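The symmetry argument can be checked numerically with a short sketch. It assumes a commonly used basic form of the regular-configuration gain, a constant plus cosine harmonics of (θS – θi); the exact normalization of Equation (4.3.53) is not reproduced here. The second function uses purely hypothetical coefficients and only illustrates that a sine harmonic breaks the even symmetry.

```python
import numpy as np

def regular_gain(theta_s_deg, theta_i_deg, order, n_speakers):
    """Assumed basic Q-order gain for a regular layout: cosine harmonics only."""
    delta = np.radians(theta_s_deg - theta_i_deg)
    q = np.arange(1, order + 1)
    return (1.0 + 2.0 * np.sum(np.cos(q * delta))) / n_speakers

def gain_with_sine_term(theta_s_deg, theta_i_deg):
    """Hypothetical gain with a sine harmonic (illustrative coefficients only)."""
    delta = np.radians(theta_s_deg - theta_i_deg)
    return 0.3 + 0.4 * np.cos(delta) + 0.2 * np.sin(2.0 * delta)

theta_i = 72.0   # one loudspeaker of a regular five-loudspeaker layout
offset = 20.0    # evaluate the gain at equal offsets on either side of the axis

regular_is_symmetric = np.isclose(
    regular_gain(theta_i + offset, theta_i, 2, 5),
    regular_gain(theta_i - offset, theta_i, 2, 5),
)
sine_term_is_symmetric = np.isclose(
    gain_with_sine_term(theta_i + offset, theta_i),
    gain_with_sine_term(theta_i - offset, theta_i),
)
```

The cosine-only gain takes the same value at equal offsets on both sides of the loudspeaker axis, whereas the gain containing a sine harmonic does not.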
Craven (2003) presented some evaluation results for the loudspeaker signals given in Equation (5.2.28). Overall, the velocity vector magnitude rv is improved at the lateral direction θS = 90° and the rear direction θS = 180° in comparison with that of pair-wise amplitude panning. For example, at the rear direction θS = 180°, rv is 0.693 for loudspeaker signals expressed in Equations (5.2.13) and 0.342 for pair-wise amplitude panning. This improvement is due to the out-of-phase and small-amplitude AL and AR signals in Equation (5.2.28). However, the accuracy of the virtual source near θS = 50° should be further improved.
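The pair-wise panning value quoted above can be reproduced with a few lines. The sketch assumes that the surround loudspeakers sit at ±110° and that a rear source is rendered with equal gains on the two surround loudspeakers; the function name is illustrative.

```python
import math

def pairwise_rv(az_a_deg, az_b_deg, gain_a, gain_b):
    """Velocity vector magnitude for two active loudspeakers (gain-weighted unit vectors)."""
    a, b = math.radians(az_a_deg), math.radians(az_b_deg)
    total = gain_a + gain_b
    vx = (gain_a * math.cos(a) + gain_b * math.cos(b)) / total
    vy = (gain_a * math.sin(a) + gain_b * math.sin(b)) / total
    return math.hypot(vx, vy)

# Rear source at 180 degrees: equal gains on LS (+110) and RS (-110), assumed positions.
rv_rear = pairwise_rv(110.0, -110.0, 1.0, 1.0)
print(round(rv_rear, 3))  # 0.342
```

Under these assumptions rv equals cos 70°, that is, the cosine of the angle between each surround loudspeaker and the rear direction, which agrees with the figure of 0.342.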
Poletti (2007) used the criterion of minimizing the overall square error between the reproduced sound pressures and target plane wave pressures to design Ambisonic signals. A weighted combination of the power of each loudspeaker signal is used as a penalty function and added to the cost function of the square error to enhance the stability and avoid the excessive overall power of loudspeaker signals caused by an irregular loudspeaker configuration. The weights in the penalty function depend on the difference between the direction of each loudspeaker and the direction of the target source. On the basis of this method, Poletti designed a decoding equation for an ITU 5.1-channel loudspeaker configuration with the 4th-order Ambisonic-like signal mixing.
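The idea of a square-error cost with a direction-dependent power penalty can be sketched as a weighted Tikhonov-regularized least-squares problem. This is an illustration under stated assumptions and not Poletti's actual design: loudspeakers are modelled as plane-wave sources, the error is evaluated at control points on a small circle, and the penalty weight 1 − cos(θi − θS), the frequency, the radius, and the parameter `lam` are all hypothetical choices.

```python
import numpy as np

def penalized_ls_gains(theta_s_deg, spk_az_deg, lam=0.1,
                       freq_hz=500.0, radius_m=0.1, n_points=32, c=343.0):
    """Return (penalized gains, plain least-squares gains, penalty weights)."""
    k = 2.0 * np.pi * freq_hz / c                      # wavenumber
    spk = np.radians(np.asarray(spk_az_deg, dtype=float))
    ts = np.radians(float(theta_s_deg))
    phi = 2.0 * np.pi * np.arange(n_points) / n_points
    pts = radius_m * np.stack([np.cos(phi), np.sin(phi)], axis=1)  # control points

    def plane_waves(az):
        # Pressure at each control point for unit plane waves from directions az.
        u = np.stack([np.cos(az), np.sin(az)], axis=1)
        return np.exp(1j * k * (pts @ u.T))

    H = plane_waves(spk)                               # (n_points, n_speakers)
    p = plane_waves(np.array([ts]))[:, 0]              # target plane wave
    # Hypothetical weights: loudspeakers far from the target direction cost more.
    w = 1.0 - np.cos(spk - ts)
    A = H.conj().T @ H + lam * np.diag(w)
    g_pen = np.linalg.solve(A, H.conj().T @ p)         # minimizes error + lam * penalty
    g_ls = np.linalg.lstsq(H, p, rcond=None)[0]        # minimizes error only
    return g_pen, g_ls, w

# Assumed ITU-style azimuths (degrees) for L, C, R, LS, RS.
g_pen, g_ls, w = penalized_ls_gains(60.0, [30.0, 0.0, -30.0, 110.0, -110.0])
weighted_power = lambda g: float(np.sum(w * np.abs(g) ** 2))
```

Comparing `weighted_power(g_pen)` with `weighted_power(g_ls)` shows the trade-off: the penalized solution never has a larger weighted power than the unpenalized one, at the price of a somewhat larger square error.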
Wiggins (2007) used a Tabu search method to derive the decoding coefficients for an ITU 5.1-channel loudspeaker configuration with the 4th-order Ambisonic-like signal mixing. L (uniform) target source directions θS,l with l = 0, 1, 2, …, (L – 1) are chosen within the azimuthal region of −180° < θS ≤ 180° (or considering the left–right symmetry within the azimuthal region of −180° < θS ≤ 180°). Six root-mean-square (RMS) errors are evaluated:
$$
\begin{aligned}
\mathrm{Err}_1 &= \sqrt{\frac{1}{L}\sum_{l=0}^{L-1}\left[1 - P'_A(\theta_{S,l})\right]^2}, &
\mathrm{Err}_2 &= \sqrt{\frac{1}{L}\sum_{l=0}^{L-1}\left[1 - \mathrm{Pow}'(\theta_{S,l})\right]^2}, \\
\mathrm{Err}_3 &= \sqrt{\frac{1}{L}\sum_{l=0}^{L-1}\left[1 - r_v(\theta_{S,l})\right]^2}, &
\mathrm{Err}_4 &= \sqrt{\frac{1}{L}\sum_{l=0}^{L-1}\left[1 - r_E(\theta_{S,l})\right]^2}, \\
\mathrm{Err}_5 &= \sqrt{\frac{1}{L}\sum_{l=0}^{L-1}\left(\theta_{v,l} - \theta_{S,l}\right)^2}, &
\mathrm{Err}_6 &= \sqrt{\frac{1}{L}\sum_{l=0}^{L-1}\left(\theta_{E,l} - \theta_{S,l}\right)^2}.
\end{aligned}
\tag{5.2.29}
$$
1. Err1 is the RMS error of the reproduced sound pressure P′A(θS,l) at the origin over the L target directions. P′A(θS,l) is calculated from Equation (5.2.19), and the target sound pressure at the origin is normalized to unity.
2. Err2 is the RMS error of the reproduced power Pow′(θS,l) at the origin over the L target directions. Pow′(θS,l) is calculated from Equation (5.2.19), and the target power is normalized to unity.
3. Err3 is the RMS error of the velocity vector magnitude rv(θS,l) over the L target directions. rv(θS,l) is calculated from Equation (3.2.29), and the ideal value of rv is unity.
4. Err4 is the RMS error of the energy vector magnitude rE(θS,l) over the L target directions. rE(θS,l) is calculated from Equation (3.2.36), and the ideal value of rE is unity.
5. Err5 is the RMS error of the velocity localization vector-based azimuth θv,l over the L target directions. θv,l is calculated from Equation (3.2.27), and the ideal value is θv,l = θS,l.
6. Err6 is the RMS error of the energy localization vector-based azimuth θE,l over the L target directions. θE,l is calculated from Equation (3.2.34), and the ideal value is θE,l = θS,l.
The cost function for solving the decoding coefficients is a weighted combination of the six RMS errors:
$$
\mathrm{Err} = w_1\mathrm{Err}_1 + w_2\mathrm{Err}_2 + w_3\mathrm{Err}_3 + w_4\mathrm{Err}_4 + w_5\mathrm{Err}_5 + w_6\mathrm{Err}_6.
\tag{5.2.30}
$$
The weights w1–w6 determine the relative contributions of each term in Equation (5.2.30). Increasing the weight of a certain term reduces the error caused by this term but at the cost of increasing the error caused by the other terms. Therefore, the weights should be chosen in accordance with certain psychoacoustic rules and the desired performance in reproduction.
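A minimal sketch of how the six RMS errors and their weighted sum might be evaluated for a candidate set of loudspeaker gains is given below. Several points are assumptions rather than Wiggins's formulation: the pressure at the origin is taken as the sum of the gains, the power as the sum of squared gains, the localization vectors follow the standard Gerzon definitions, and the angular errors are expressed in radians. The function name and the example values are hypothetical.

```python
import numpy as np

def weighted_rms_cost(gains, spk_az_deg, target_az_deg, weights):
    """gains[l, i]: gain of loudspeaker i for target direction l. Returns (cost, six errors).

    The gain sum of every row must be nonzero.
    """
    g = np.asarray(gains, dtype=float)
    az = np.radians(spk_az_deg)
    ts = np.radians(target_az_deg)
    ux, uy = np.cos(az), np.sin(az)

    pressure = g.sum(axis=1)                 # assumed pressure at the origin
    power = (g ** 2).sum(axis=1)             # assumed reproduced power
    vx, vy = (g @ ux) / pressure, (g @ uy) / pressure
    ex, ey = (g ** 2 @ ux) / power, (g ** 2 @ uy) / power

    wrap = lambda a: np.angle(np.exp(1j * a))          # wrap differences to (-pi, pi]
    rms = lambda x: float(np.sqrt(np.mean(np.square(x))))
    errs = np.array([
        rms(1.0 - pressure),                           # Err1
        rms(1.0 - power),                              # Err2
        rms(1.0 - np.hypot(vx, vy)),                   # Err3
        rms(1.0 - np.hypot(ex, ey)),                   # Err4
        rms(wrap(np.arctan2(vy, vx) - ts)),            # Err5 (radians)
        rms(wrap(np.arctan2(ey, ex) - ts)),            # Err6 (radians)
    ])
    return float(np.dot(weights, errs)), errs

# Hypothetical check: each target coincides with one loudspeaker driven alone.
spk = [30.0, 0.0, -30.0, 110.0, -110.0]
cost_ideal, errs_ideal = weighted_rms_cost(np.eye(5), spk, spk, np.ones(6))
cost_half, errs_half = weighted_rms_cost(0.5 * np.eye(5), spk, spk, np.ones(6))
```

When each target is reproduced by a single unit-gain loudspeaker at the same azimuth, all six errors vanish. Halving the gains leaves the vector magnitudes and directions unchanged but produces nonzero pressure and power errors, which illustrates how the weights w1–w6 decide which of these deviations dominates the cost.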