
Table 4.1 Parameters and characteristics for the first-order horizontal Ambisonics with regular loudspeaker configurations and different decoding methods.

| Criteria for optimization | Frequency range | Listening region | Normalization | Atotal | b |
|---|---|---|---|---|---|
| rv = 1 | Low | Small | Amplitude | 1/M | 2 |
| rv = 1 | Low | Small | Power | 1/√(3M) | 2 |
| Maximize rE | Mid and high | Small | Power | 1/√(2M) | √2 |
| In-phase | Full | Large | Power | √(2/(3M)) | 1 |
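As a quick reference, the sketch below encodes the reconstructed entries of Table 4.1. It is illustrative only: the function name and interface are this example's own, and the returned values simply restate the table under the stated normalization (Python with NumPy assumed).

```python
import numpy as np

def first_order_decoding_parameters(M, criterion="rv", normalization="amplitude"):
    """Return (Atotal, b) for first-order horizontal Ambisonic decoding with M
    regular loudspeakers, following the reconstructed entries of Table 4.1."""
    if criterion == "rv":            # rv = 1, low-frequency optimization
        b = 2.0
        Atotal = 1.0 / M if normalization == "amplitude" else 1.0 / np.sqrt(3.0 * M)
    elif criterion == "max_rE":      # maximize rE, mid and high frequencies (power normalization)
        b = np.sqrt(2.0)
        Atotal = 1.0 / np.sqrt(2.0 * M)
    elif criterion == "in_phase":    # in-phase solution, full band, large region (power normalization)
        b = 1.0
        Atotal = np.sqrt(2.0 / (3.0 * M))
    else:
        raise ValueError("unknown criterion")
    return Atotal, b

print(first_order_decoding_parameters(8, "max_rE"))   # e.g. (0.25, 1.414...)
```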
4.3.3 The higher-order horizontal Ambisonics
Sound field signals in Ambisonics can be extended from the first order to higher orders to improve the accuracy of spatial information reproduction; the resulting systems are termed higher-order Ambisonics (HOA; Xie X.F., 1978b; Xie and Xie, 1996; Bamford and Vanderkooy, 1995; Daniel et al., 1998, 2003). For the first-order horizontal Ambisonics, the signal of the loudspeaker at azimuth θi is given in Equation (4.3.15) and can be written as a linear combination of a target-azimuth-independent component W = 1 and a pair of first-order target-azimuthal harmonics X = cosθS and Y = sinθS:
$$A_i(\theta_S) = A_{\mathrm{total}}\bigl[W + D_{11}(\theta_i)X + D_{12}(\theta_i)Y\bigr]
= A_{\mathrm{total}}\bigl[1 + D_{11}(\theta_i)\cos\theta_S + D_{12}(\theta_i)\sin\theta_S\bigr],
\tag{4.3.45}$$

where

$$D_{11}(\theta_i) = b\cos\theta_i, \qquad D_{12}(\theta_i) = b\sin\theta_i.
\tag{4.3.46}$$
For the second-order horizontal Ambisonics, two additional second-order target-azimuthal harmonic components are added to the signals expressed in Equation (4.3.45):
$$U = \cos 2\theta_S, \qquad V = \sin 2\theta_S.
\tag{4.3.47}$$
Then, the normalized signal amplitude of the loudspeaker at θi becomes a linear combination of five independent components or signals W, X, Y, U, and V:
$$\begin{aligned}
A_i(\theta_S) &= A_{\mathrm{total}}\bigl[W + D_{11}(\theta_i)X + D_{12}(\theta_i)Y + D_{21}(\theta_i)U + D_{22}(\theta_i)V\bigr] \\
&= A_{\mathrm{total}}\bigl[1 + D_{11}(\theta_i)\cos\theta_S + D_{12}(\theta_i)\sin\theta_S + D_{21}(\theta_i)\cos 2\theta_S + D_{22}(\theta_i)\sin 2\theta_S\bigr].
\end{aligned}
\tag{4.3.48}$$
Generally, the independent signals of Q-order horizontal Ambisonics with Q ≥ 1 consist of the (2Q + 1) azimuthal harmonics up to order Q:
$$1,\quad \cos q\theta_S,\quad \sin q\theta_S \qquad q = 1, 2, \ldots, Q.
\tag{4.3.49}$$
The normalized signal amplitude of the loudspeaker at θi is a linear combination of (2Q + 1) independent components:
$$A_i(\theta_S) = A_{\mathrm{total}}\left[1 + \sum_{q=1}^{Q}\bigl(D_{q1}(\theta_i)\cos q\theta_S + D_{q2}(\theta_i)\sin q\theta_S\bigr)\right].
\tag{4.3.50}$$
Equation (4.3.50) can also be written in the matrix form as
$$\mathbf{A} = A_{\mathrm{total}}\,[\mathbf{D}_{2D}]\,\mathbf{S},
\tag{4.3.51}$$
where A = [A0(θS), A1(θS), …, AM−1(θS)]T is an M × 1 column matrix or vector composed of the M normalized loudspeaker signals; the superscript "T" denotes the matrix transpose; S = [1, cosθS, sinθS, cos2θS, sin2θS, …, cosQθS, sinQθS]T is a (2Q + 1) × 1 column matrix composed of the (2Q + 1) normalized independent signals; and [D2D] is an M × (2Q + 1) decoding matrix with entries 1, Dq1(θi), and Dq2(θi), q = 1, 2, …, Q.
Loudspeaker signals depend on the entries Dq1(θi) and Dq2(θi). When the loudspeaker configuration is regular and θi of each loudspeaker is given in Equation (4.3.14a), similar to Equation (4.3.46), Dq1(θi) and Dq2(θi) take the forms
$$D_{q1}(\theta_i) = 2\kappa_q\cos q\theta_i, \qquad D_{q2}(\theta_i) = 2\kappa_q\sin q\theta_i \qquad (q \neq 0,\ b = 2\kappa_1).
\tag{4.3.52}$$
Then,
$$\begin{aligned}
A_i(\theta_S) &= A_{\mathrm{total}}\left[1 + \sum_{q=1}^{Q} 2\kappa_q\bigl(\cos q\theta_i\cos q\theta_S + \sin q\theta_i\sin q\theta_S\bigr)\right] \\
&= A_{\mathrm{total}}\left[1 + \sum_{q=1}^{Q} 2\kappa_q\cos q(\theta_S - \theta_i)\right].
\end{aligned}
\tag{4.3.53}$$
The decoding parameter κq specifies the relative proportion of each azimuthal harmonic in the loudspeaker signals. A set of κq can be regarded as an azimuthal harmonic window applied to truncate the azimuthal harmonics up to order Q. The loudspeaker signal magnitude is maximal when the target source direction coincides exactly with the loudspeaker direction, i.e., when θS = θi:
$$\bigl[A_i(\theta_S)\bigr]_{\max} = A_{\mathrm{total}}\left[1 + \sum_{q=1}^{Q} 2\kappa_q\right].
\tag{4.3.54}$$
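To make Equations (4.3.53) and (4.3.54) concrete, the following sketch computes the normalized loudspeaker amplitudes for a regular array and arbitrary decoding parameters κq, and checks that each gain peaks when the target source lies in that loudspeaker's direction. It is a minimal illustration with its own variable names, not code from the book.

```python
import numpy as np

def hoa_panning_gains(theta_S, theta_ls, kappa, A_total=1.0):
    """Normalized loudspeaker amplitudes A_i(theta_S) of Equation (4.3.53).

    theta_S  : target source azimuth in radians
    theta_ls : array of M loudspeaker azimuths in radians
    kappa    : decoding parameters kappa_q for q = 1..Q
    """
    q = np.arange(1, len(kappa) + 1)               # harmonic orders 1..Q
    phi = theta_S - np.asarray(theta_ls)[:, None]  # (M, 1) angle differences
    return A_total * (1.0 + 2.0 * np.sum(np.asarray(kappa) * np.cos(q * phi), axis=1))

# Regular eight-loudspeaker array, Q = 2, arbitrary kappa_q
theta_ls = 2 * np.pi * np.arange(8) / 8
kappa = [1.0, 0.5]
gains = hoa_panning_gains(theta_ls[2], theta_ls, kappa)
print(np.argmax(gains) == 2)                       # maximum when theta_S = theta_i
print(np.isclose(gains[2], 1.0 + 2 * sum(kappa)))  # [A_i]_max of Eq. (4.3.54)
```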
Similar to the case of the first-order Ambisonics, for a regular loudspeaker configuration with θi given in Equation (4.3.14a), Equations (4.3.53) and (4.3.16) to (4.3.18) verify that when
$$M \geq Q + 2,
\tag{4.3.55}$$
the following equation is obtained:
$$\begin{aligned}
\sum_{i=0}^{M-1} A_i(\theta_S) &= M A_{\mathrm{total}}, \\
\sum_{i=0}^{M-1} A_i(\theta_S)\cos\theta_i &= \kappa_1 M A_{\mathrm{total}}\cos\theta_S, \qquad
\sum_{i=0}^{M-1} A_i(\theta_S)\sin\theta_i = \kappa_1 M A_{\mathrm{total}}\sin\theta_S.
\end{aligned}
\tag{4.3.56}$$
Equation (4.3.56) is related only to the parameter or proportion κ1 of the first-order azimuthal harmonic component and is independent of the second- and higher-order azimuthal harmonic components. The perceived virtual source direction is evaluated from Equations (3.2.7) and (3.2.9). For a fixed head,
$$\sin\theta_I = \frac{\displaystyle\sum_{i=0}^{M-1} A_i(\theta_S)\sin\theta_i}{\displaystyle\sum_{i=0}^{M-1} A_i(\theta_S)} = \kappa_1\sin\theta_S,
\tag{4.3.57}$$
For the head oriented to the virtual source,
$$\tan\theta_I' = \frac{\displaystyle\sum_{i=0}^{M-1} A_i(\theta_S)\sin\theta_i}{\displaystyle\sum_{i=0}^{M-1} A_i(\theta_S)\cos\theta_i} = \tan\theta_S.
\tag{4.3.58}$$
The condition of the head oriented to the virtual source in Equation (4.3.58), or equivalently the direction of the velocity localization vector, places no restriction on κq, and the condition for a fixed head in Equation (4.3.57) only constrains κ1. When
$$\kappa_1 = 1 \quad\text{or}\quad b = 2\kappa_1 = 2,
\tag{4.3.59}$$
the following equation is obtained:
$$\sin\theta_I = \sin\theta_S.
\tag{4.3.60}$$
Equations (4.3.58) and (4.3.60) yield
$$\theta_I = \theta_I' = \theta_S.
\tag{4.3.61}$$
In this case, the perceived virtual source azimuth matches that of the target source over the full horizontal range of −180° ≤ θS ≤ 180°, and the results for the head oriented to the virtual source are consistent with those for a fixed head. Equation (3.2.29) proves that the velocity vector magnitude is rv = 1 when κ1 = b/2 = 1. In other words, for Q > 1 order horizontal Ambisonics, optimizing the velocity localization vector only requires the decoding parameter of the first-order azimuthal harmonics to be κ1 = b/2 = 1, without restriction on the κq of the second- and higher-order azimuthal harmonics.
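A small numerical check of Equations (4.3.56) to (4.3.58) is sketched below for a regular eight-loudspeaker array and Q = 3; the variable names are this example's own. With κ1 = 1, the fixed-head equation returns sin θI = sin θS, and the turned-head (velocity-vector) direction equals θS for any choice of the higher-order κq, as claimed above.

```python
import numpy as np

M, A_total = 8, 1.0
theta_ls = 2 * np.pi * np.arange(M) / M            # regular configuration
kappa = np.array([1.0, 0.7, 0.4])                  # kappa_1 = 1; higher orders arbitrary
q = np.arange(1, len(kappa) + 1)
theta_S = np.deg2rad(25.0)

# Loudspeaker amplitudes of Eq. (4.3.53)
A = A_total * (1.0 + 2.0 * np.sum(kappa * np.cos(q * (theta_S - theta_ls[:, None])), axis=1))

sin_theta_I = np.sum(A * np.sin(theta_ls)) / np.sum(A)                    # Eq. (4.3.57)
theta_I_turned = np.arctan2(np.sum(A * np.sin(theta_ls)),
                            np.sum(A * np.cos(theta_ls)))                 # Eq. (4.3.58)
print(np.isclose(sin_theta_I, np.sin(theta_S)))    # True because kappa_1 = 1
print(np.isclose(theta_I_turned, theta_S))         # True for any higher-order kappa_q
```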
κq can also be derived from other physical criteria. In Section 9.1, under the criterion of spatial harmonic decomposition and each order approximation of the target sound field, the parameters should be
$$\kappa_q = b/2 = 1 \qquad q = 1, 2, \ldots, Q.
\tag{4.3.62}$$
Loudspeaker signals with the parameters given in Equation (4.3.62) are the conventional or fundamental solution of Ambisonic signals, to which a rectangular azimuthal harmonic window is applied to truncate the azimuthal harmonic components up to order Q.
The two second-order azimuthal harmonic components of the second-order Ambisonics are given in Equation (4.3.47). These two components are equivalent to the signals captured by a pair of second-order directional microphones, whose polar patterns are illustrated in Figure 4.16. The responses of the second-order azimuthal harmonics vary faster with azimuth than those of the first-order azimuthal harmonics. Similarly, in comparison with the (Q − 1)-order reproduction signals, two additional Q-order target-azimuthal harmonic components are added to the Q-order reproduction signals. These additional components are equivalent to the signals captured by a pair of Q-order directional microphones. In practice, realizing higher-order directional microphones is difficult, but the higher-order azimuthal harmonic components can be derived from the outputs of a spherical microphone array (Section 9.8). In addition, higher-order azimuthal harmonic components can easily be simulated by signal processing.
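As an illustration of simulating the azimuthal harmonic components by signal processing, the sketch below forms the (2Q + 1) independent signals of Equation (4.3.49) for a single source by simple gain weighting. It is a minimal, assumption-laden example (far-field source, gains only, no delay or distance modelling), with names of its own.

```python
import numpy as np

def encode_horizontal_ambisonics(source_signal, theta_S, Q):
    """Simulate the (2Q + 1) independent horizontal Ambisonic signals of
    Equation (4.3.49) for a source at azimuth theta_S (radians)."""
    gains = [1.0]                                    # W = 1, zeroth order
    for q in range(1, Q + 1):
        gains.extend([np.cos(q * theta_S), np.sin(q * theta_S)])  # cos q*theta_S, sin q*theta_S
    return np.outer(gains, source_signal)            # shape (2Q + 1, len(source_signal))

# Example: second-order (Q = 2) signals W, X, Y, U, V for a source at 30 degrees
signals = encode_horizontal_ambisonics(np.ones(4), np.deg2rad(30.0), Q=2)
print(signals[:, 0].round(3))                        # [1.0, 0.866, 0.5, 0.5, 0.866]
```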
The normalized signals of Q-order horizontal Ambisonics can be obtained by substituting Equation (4.3.62) into Equation (4.3.53):
$$\begin{aligned}
A_i(\theta_S) &= A_{\mathrm{total}}\left[1 + \sum_{q=1}^{Q} 2\bigl(\cos q\theta_i\cos q\theta_S + \sin q\theta_i\sin q\theta_S\bigr)\right] \\
&= A_{\mathrm{total}}\left[1 + \sum_{q=1}^{Q} 2\cos q(\theta_S - \theta_i)\right] \\
&= A_{\mathrm{total}}\,\frac{\sin\bigl[(2Q+1)(\theta_S - \theta_i)/2\bigr]}{\sin\bigl[(\theta_S - \theta_i)/2\bigr]}.
\end{aligned}
\tag{4.3.63}$$
Figure 4.16 Polar patterns of a pair of the second-order directional microphones (a) U = cos2θS; (b) V = sin2θS.
Figure 4.17 Polar patterns of the first three orders of Ambisonic signals Ai(θS) = Ai(θS − θi): (a) Q = 1; (b) Q = 2; (c) Q = 3.
The recording and reproduction of Ambisonic signals are analyzed to obtain insights into the physical nature of Equation (4.3.63). Figure 4.17 illustrates the polar patterns of the first three orders of Ambisonic signals Ai(θS) = Ai(θS − θi), with the maximal magnitude of each signal normalized to unity. From the viewpoint of signal recording, Equation (4.3.63) is equivalent to a signal captured by a microphone with an appropriate directivity and main-axis direction. The main lobe of the signal is centered at θ′i = (θS − θi) = 0°. The response |Ai(θS − θi)| is maximal in the on-axis direction θ′i = 0° and decreases as |θ′i| increases. As |θ′i| increases further, the responses exhibit side and rear lobes (in-phase or out-of-phase) and null points of the polar patterns at some azimuths. As the order Q increases, the width of the main lobe and the responses of the side and rear lobes decrease, sharpening the directivity of the resultant signal.
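The sharpening of directivity with order can be checked numerically from Equation (4.3.63). The sketch below evaluates the conventional-solution pattern as a function of θ′ = θS − θi and locates the first null of the main lobe, which lies at 360°/(2Q + 1); it is an illustrative sketch with its own variable names.

```python
import numpy as np

theta = np.radians(np.arange(-180.0, 180.1, 0.1))   # theta' = theta_S - theta_i
for Q in (1, 2, 3):
    q = np.arange(1, Q + 1)
    # Conventional Q-order pattern of Eq. (4.3.63), normalized to unit maximum
    pattern = (1.0 + 2.0 * np.sum(np.cos(np.outer(q, theta)), axis=0)) / (2 * Q + 1)
    pos = theta > 0
    first_null = np.degrees(theta[pos][np.flatnonzero(np.diff(np.sign(pattern[pos])))[0]])
    print(Q, round(first_null, 1))                   # roughly 120, 72, and 51 degrees
```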
From the viewpoint of reproduction, the reproduced sound field in Ambisonics is a superposition of the pressures caused by the M loudspeakers. Equation (4.3.63) represents the signal of the ith loudspeaker, with θi being the azimuth of the loudspeaker and θS being the azimuth of the target source. As the order Q of the reproduction signals increases, the relative signal magnitude of the loudspeaker nearest the target source direction increases, and the relative signal magnitudes (crosstalk) of the other loudspeakers decrease. Consequently, the perceived performance of the virtual source improves, and the listening region widens. However, these improvements come at the cost of increased system complexity. For example, Figure 4.18 illustrates the virtual source position for Q = 1, 2, and 3 order horizontal Ambisonic reproduction. The results are evaluated from Equation (3.2.6) for an eight-loudspeaker configuration (Figure 4.12) and a fixed head. The frequency is f = 0.7 kHz, and the precorrected head radius is a′ = 1.25 × 0.0875 m. Figure 4.18 illustrates the results within 0° ≤ θS ≤ 90° only, because of symmetry. Ideally, the perceived virtual source direction should be consistent with the target source direction. The figure also indicates that the movement of the virtual source with frequency decreases as the order increases, and the upper frequency limit for the accurate reproduction of spatial information increases. This problem is addressed in Section 9.3.1.
Figure 4.18 Virtual source position for Q = 1, 2, and 3 order horizontal Ambisonic reproduction with an eight-loudspeaker configuration and a fixed head. The frequency is f = 0.7 kHz.
In the case of the Q-order horizontal Ambisonic signals for a regular loudspeaker configuration with θi given in Equation (4.3.14a) and with the number of loudspeakers
$$M = 2Q + 1,
\tag{4.3.64}$$
Equation (4.3.63) verifies that, when the target source is located in the direction of one loudspeaker, the signals of the other 2Q loudspeakers vanish. In other words, a virtual source in a loudspeaker direction is recreated by that single loudspeaker, the crosstalk among loudspeakers vanishes, and localization performance improves. The first-order reproduction with three loudspeakers discussed in Section 4.2.2 is a special example of this case; here, the analysis is extended to arbitrary Q-order reproduction. If M > (2Q + 1) loudspeakers are used in Q-order reproduction, crosstalk among the loudspeaker signals exists even if the target source is located in the direction of one loudspeaker. For horizontal Ambisonics with a regular loudspeaker configuration, the reproduced sound field exhibits symmetry under rotation around the vertical axis.
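The statement above can be verified numerically: with M = 2Q + 1 regular loudspeakers and the conventional solution κq = 1, a target source in the direction of loudspeaker j drives only that loudspeaker. A minimal check (with its own variable names) follows.

```python
import numpy as np

Q = 2
M = 2 * Q + 1                                   # Eq. (4.3.64)
theta_ls = 2 * np.pi * np.arange(M) / M         # regular configuration
q = np.arange(1, Q + 1)
j = 3                                           # target source in loudspeaker j's direction
theta_S = theta_ls[j]

A = 1.0 + 2.0 * np.sum(np.cos(q * (theta_S - theta_ls[:, None])), axis=1)
print(np.round(A, 10))     # only entry j is nonzero, and it equals M = 2Q + 1
```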
Similar to the case of the first-order Ambisonics, the energy localization vector discussed in Section 3.2.2 is applicable to the analysis of the second- and higher-order Ambisonics. The mid- and high-frequency criteria for optimizing the energy localization vector can be used to choose κq in Equation (4.3.53) (Daniel et al., 1998). From Equation (4.3.53) and by using Equations (4.3.16) to (4.3.18), when
$$M \geq 2Q + 1,
\tag{4.3.65}$$

the following equation is obtained:

$$\mathrm{Pow} = \sum_{i=0}^{M-1} A_i^2(\theta_S) = M A_{\mathrm{total}}^2\left[1 + \sum_{q=1}^{Q} 2\kappa_q^2\right].
\tag{4.3.66}$$
Therefore, for regular loudspeaker configurations, the overall free-field sound power at the origin in reproduction is independent of the target source direction θS.
Similar to the case of the first-order Ambisonics, by considering Equation (4.3.53), using Equations (4.3.16) to (4.3.18), and applying some simple trigonometric identities, when
$$M \geq 2Q + 2,
\tag{4.3.67}$$

the following equations are obtained:

$$\sum_{i=0}^{M-1} A_i^2(\theta_S)\cos\theta_i = 2M A_{\mathrm{total}}^2\sum_{q=1}^{Q}\kappa_q\kappa_{q-1}\cos\theta_S,
\tag{4.3.68}$$

$$\sum_{i=0}^{M-1} A_i^2(\theta_S)\sin\theta_i = 2M A_{\mathrm{total}}^2\sum_{q=1}^{Q}\kappa_q\kappa_{q-1}\sin\theta_S,
\tag{4.3.69}$$
where κ0 = 1.
According to Equation (3.2.34), the direction θE of the energy localization vector satisfies the following equations:
$$\begin{aligned}
r_E\cos\theta_E &= \frac{\displaystyle\sum_{i=0}^{M-1} A_i^2(\theta_S)\cos\theta_i}{\displaystyle\sum_{i=0}^{M-1} A_i^2(\theta_S)}
= \frac{2\displaystyle\sum_{q=1}^{Q}\kappa_q\kappa_{q-1}\cos\theta_S}{1 + 2\displaystyle\sum_{q=1}^{Q}\kappa_q^2}, \\
r_E\sin\theta_E &= \frac{\displaystyle\sum_{i=0}^{M-1} A_i^2(\theta_S)\sin\theta_i}{\displaystyle\sum_{i=0}^{M-1} A_i^2(\theta_S)}
= \frac{2\displaystyle\sum_{q=1}^{Q}\kappa_q\kappa_{q-1}\sin\theta_S}{1 + 2\displaystyle\sum_{q=1}^{Q}\kappa_q^2}.
\end{aligned}
\tag{4.3.70}$$
Then,
$$\tan\theta_E = \tan\theta_S.
\tag{4.3.71}$$
For a regular loudspeaker configuration in the horizontal plane, the direction of the energy localization vector for the Q-order Ambisonic signals given in Equation (4.3.53) matches the target source direction and is independent of κq (q = 1, 2, …, Q). This feature is desirable if the hypothesis of the energy localization vector theorem is valid above 0.7 kHz.
For the Q-order horizontal Ambisonics with a regular loudspeaker configuration, the condition in Equation (4.3.65), under which the overall power is independent of the target direction, requires M ≥ (2Q + 1) reproduction channels and loudspeakers. The result of the energy localization vector theorem, given by Equation (4.3.67), requires one more channel and loudspeaker than Equation (4.3.65), i.e., at least (2Q + 2) channels and loudspeakers. Therefore, using at least (2Q + 2) loudspeakers is appropriate for Q-order horizontal Ambisonics when the requirement of the energy localization vector is considered, and this number of channels and loudspeakers is the minimum for Q-order horizontal Ambisonic reproduction. The same conclusion was reached in Section 4.3.2 for the first-order reproduction and is extended here to arbitrary Q-order reproduction. Therefore, Q = 1, 2, 3, and 4 order horizontal Ambisonics require 4, 6, 8, and 10 loudspeakers, respectively. This conclusion is consistent with the results derived from an evaluation of the width of the directivity of the signals for arbitrary Q-order reproduction (Xie and Xie, 1996). As stated in Section 4.2.2, a further increase in the number of loudspeakers may decrease the perceived difference between a virtual source in and off loudspeaker directions, but it may also cause some other problems.
The energy vector magnitude of the Q-order reproduction is evaluated from Equation (4.3.70) or more directly from Equation (3.2.36):
$$r_E = \frac{2\displaystyle\sum_{q=1}^{Q}\kappa_q\kappa_{q-1}}{1 + 2\displaystyle\sum_{q=1}^{Q}\kappa_q^2}.
\tag{4.3.72}$$
For the conventional solution of the Ambisonic signals with κq = κ0 = 1 given in Equation (4.3.62), Equation (4.3.72) yields
$$r_E = \frac{2Q}{2Q + 1}.
\tag{4.3.73}$$
Q = 1, 2, and 3-order Ambisonics have the resultant rE of 0.667, 0.800, and 0.857, respectively. As Q increases, rE gradually approaches the unit value, i.e., it approaches the case of ideal reproduction.
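The following sketch evaluates rE from Equation (4.3.72) for a given set of decoding parameters and reproduces the values quoted above for the conventional solution; the helper name is this example's own.

```python
import numpy as np

def energy_vector_magnitude(kappa):
    """r_E of Equation (4.3.72) for decoding parameters kappa_q, q = 1..Q (with kappa_0 = 1)."""
    kappa = np.concatenate(([1.0], np.asarray(kappa, dtype=float)))  # prepend kappa_0 = 1
    return 2.0 * np.sum(kappa[1:] * kappa[:-1]) / (1.0 + 2.0 * np.sum(kappa[1:] ** 2))

# Conventional solution kappa_q = 1 gives r_E = 2Q / (2Q + 1), cf. Equation (4.3.73)
for Q in (1, 2, 3):
    print(Q, round(energy_vector_magnitude(np.ones(Q)), 3))   # 0.667, 0.8, 0.857
```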
If the hypothesis of the energy localization vector theorem is valid above 0.7 kHz, the criterion of maximizing the energy vector magnitude can be applied to choose κq in Equation (4.3.53). Applying to Equation (4.3.72) the condition
$$\frac{\partial r_E}{\partial\kappa_q} = 0 \qquad q = 1, 2, \ldots, Q,
\tag{4.3.74}$$
a set of equations for κq is obtained:

$$\begin{aligned}
\kappa_{q-1} - 2 r_E\,\kappa_q + \kappa_{q+1} &= 0 \qquad q = 1, 2, \ldots, Q - 1, \\
\kappa_{Q-1} - 2 r_E\,\kappa_Q &= 0.
\end{aligned}
\tag{4.3.75}$$
The solution for these equations is expressed as
$$\kappa_q = \cos\left(\frac{q\pi}{2Q+2}\right) \qquad q = 1, 2, \ldots, Q.
\tag{4.3.76}$$

The maximum energy vector magnitude is

$$(r_E)_{\max} = \cos\left(\frac{\pi}{2Q+2}\right).
\tag{4.3.77}$$
Q = 1, 2, and 3-order horizontal Ambisonics have (rE)max of 0.707, 0.866, and 0.924, respectively. As Q increases, (rE)max approaches the unit value. However, Equation (4.3.73) indicates that rE approaches the unit value as Q increases even for the conventional solution of Ambisonic signals. Therefore, the optimization of energy vector magnitude at mid and high frequencies may be unnecessary for choosing κq in the HOA.
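A short check of the max-rE solution: substituting the κq of Equation (4.3.76) into Equation (4.3.72) indeed returns the (rE)max of Equation (4.3.77). The sketch below is self-contained and uses its own variable names.

```python
import numpy as np

for Q in (1, 2, 3):
    q = np.arange(1, Q + 1)
    kappa = np.cos(q * np.pi / (2 * Q + 2))                       # Eq. (4.3.76)
    kappa_full = np.concatenate(([1.0], kappa))                   # kappa_0 = 1
    rE = 2 * np.sum(kappa_full[1:] * kappa_full[:-1]) / (1 + 2 * np.sum(kappa ** 2))
    print(Q, round(rE, 3), round(np.cos(np.pi / (2 * Q + 2)), 3)) # 0.707, 0.866, 0.924
```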
Similar to the case of the first-order Ambisonics, in-phase solutions for the second- and higher-order horizontal Ambisonic signals can be derived to suppress the out-of-phase crosstalk from the opposite channels. Various in-phase solutions have been provided for the second- and higher-order horizontal Ambisonics; additional criteria are applied to derive them, resulting in maximum-rE, maximum front–back ratio, maximum integrated front–back ratio, smooth, and first-order extended solutions (Monro, 2000). The following κq is chosen for the in-phase solution (Daniel, 2000; Neukom, 2007):
$$\kappa_q = \frac{(Q!)^2}{(Q+q)!\,(Q-q)!}.
\tag{4.3.78}$$
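For completeness, the sketch below evaluates the in-phase parameters of Equation (4.3.78) for the first few orders and the corresponding rE from Equation (4.3.72); as before, it is an illustrative example with its own variable names, not code from the book.

```python
import numpy as np
from math import factorial

for Q in (1, 2, 3):
    kappa = np.array([factorial(Q) ** 2 / (factorial(Q + q) * factorial(Q - q))
                      for q in range(1, Q + 1)])                  # Eq. (4.3.78)
    kappa_full = np.concatenate(([1.0], kappa))                   # kappa_0 = 1
    rE = 2 * np.sum(kappa_full[1:] * kappa_full[:-1]) / (1 + 2 * np.sum(kappa ** 2))
    print(Q, np.round(kappa, 3), round(rE, 3))
```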