
- •Preface
- •Introduction
- •1.1 Spatial coordinate systems
- •1.2 Sound fields and their physical characteristics
- •1.2.1 Free-field and sound waves generated by simple sound sources
- •1.2.2 Reflections from boundaries
- •1.2.3 Directivity of sound source radiation
- •1.2.4 Statistical analysis of acoustics in an enclosed space
- •1.2.5 Principle of sound receivers
- •1.3 Auditory system and perception
- •1.3.1 Auditory system and its functions
- •1.3.2 Hearing threshold and loudness
- •1.3.3 Masking
- •1.3.4 Critical band and auditory filter
- •1.4 Artificial head models and binaural signals
- •1.4.1 Artificial head models
- •1.4.2 Binaural signals and head-related transfer functions
- •1.5 Outline of spatial hearing
- •1.6 Localization cues for a single sound source
- •1.6.1 Interaural time difference
- •1.6.2 Interaural level difference
- •1.6.3 Cone of confusion and head movement
- •1.6.4 Spectral cues
- •1.6.5 Discussion on directional localization cues
- •1.6.6 Auditory distance perception
- •1.7 Summing localization and spatial hearing with multiple sources
- •1.7.1 Summing localization with two sound sources
- •1.7.2 The precedence effect
- •1.7.3 Spatial auditory perceptions with partially correlated and uncorrelated source signals
- •1.7.4 Auditory scene analysis and spatial hearing
- •1.7.5 Cocktail party effect
- •1.8 Room reflections and auditory spatial impression
- •1.8.1 Auditory spatial impression
- •1.8.2 Sound field-related measures and auditory spatial impression
- •1.8.3 Binaural-related measures and auditory spatial impression
- •1.9.1 Basic principle of spatial sound
- •1.9.2 Classification of spatial sound
- •1.9.3 Developments and applications of spatial sound
- •1.10 Summary
- •2.1 Basic principle of a two-channel stereophonic sound
- •2.1.1 Interchannel level difference and summing localization equation
- •2.1.2 Effect of frequency
- •2.1.3 Effect of interchannel phase difference
- •2.1.4 Virtual source created by interchannel time difference
- •2.1.5 Limitation of two-channel stereophonic sound
- •2.2.1 XY microphone pair
- •2.2.2 MS transformation and the MS microphone pair
- •2.2.3 Spaced microphone technique
- •2.2.4 Near-coincident microphone technique
- •2.2.5 Spot microphone and pan-pot technique
- •2.2.6 Discussion on microphone and signal simulation techniques for two-channel stereophonic sound
- •2.3 Upmixing and downmixing between two-channel stereophonic and mono signals
- •2.4 Two-channel stereophonic reproduction
- •2.4.1 Standard loudspeaker configuration of two-channel stereophonic sound
- •2.4.2 Influence of front-back deviation of the head
- •2.5 Summary
- •3.1 Physical and psychoacoustic principles of multichannel surround sound
- •3.2 Summing localization in multichannel horizontal surround sound
- •3.2.1 Summing localization equations for multiple horizontal loudspeakers
- •3.2.2 Analysis of the velocity and energy localization vectors of the superposed sound field
- •3.2.3 Discussion on horizontal summing localization equations
- •3.3 Multiple loudspeakers with partly correlated and low-correlated signals
- •3.4 Summary
- •4.1 Discrete quadraphone
- •4.1.1 Outline of the quadraphone
- •4.1.2 Discrete quadraphone with pair-wise amplitude panning
- •4.1.3 Discrete quadraphone with the first-order sound field signal mixing
- •4.1.4 Some discussions on discrete quadraphones
- •4.2 Other horizontal surround sounds with regular loudspeaker configurations
- •4.2.1 Six-channel reproduction with pair-wise amplitude panning
- •4.2.2 The first-order sound field signal mixing and reproduction with M ≥ 3 loudspeakers
- •4.3 Transformation of horizontal sound field signals and Ambisonics
- •4.3.1 Transformation of the first-order horizontal sound field signals
- •4.3.2 The first-order horizontal Ambisonics
- •4.3.3 The higher-order horizontal Ambisonics
- •4.3.4 Discussion and implementation of the horizontal Ambisonics
- •4.4 Summary
- •5.1 Outline of surround sounds with accompanying picture and general uses
- •5.2 5.1-Channel surround sound and its signal mixing analysis
- •5.2.1 Outline of 5.1-channel surround sound
- •5.2.2 Pair-wise amplitude panning for 5.1-channel surround sound
- •5.2.3 Global Ambisonic-like signal mixing for 5.1-channel sound
- •5.2.4 Optimization of three frontal loudspeaker signals and local Ambisonic-like signal mixing
- •5.2.5 Time panning for 5.1-channel surround sound
- •5.3 Other multichannel horizontal surround sounds
- •5.4 Low-frequency effect channel
- •5.5 Summary
- •6.1 Summing localization in multichannel spatial surround sound
- •6.1.1 Summing localization equations for spatial multiple loudspeaker configurations
- •6.1.2 Velocity and energy localization vector analysis for multichannel spatial surround sound
- •6.1.3 Discussion on spatial summing localization equations
- •6.1.4 Relationship with the horizontal summing localization equations
- •6.2 Signal mixing methods for a pair of vertical loudspeakers in the median and sagittal plane
- •6.3 Vector base amplitude panning
- •6.4 Spatial Ambisonic signal mixing and reproduction
- •6.4.1 Principle of spatial Ambisonics
- •6.4.2 Some examples of the first-order spatial Ambisonics
- •6.4.4 Recreating a top virtual source with a horizontal loudspeaker arrangement and Ambisonic signal mixing
- •6.5 Advanced multichannel spatial surround sounds and problems
- •6.5.1 Some advanced multichannel spatial surround sound techniques and systems
- •6.5.2 Object-based spatial sound
- •6.5.3 Some problems related to multichannel spatial surround sound
- •6.6 Summary
- •7.1 Basic considerations on the microphone and signal simulation techniques for multichannel sounds
- •7.2 Microphone techniques for 5.1-channel sound recording
- •7.2.1 Outline of microphone techniques for 5.1-channel sound recording
- •7.2.2 Main microphone techniques for 5.1-channel sound recording
- •7.2.3 Microphone techniques for the recording of three frontal channels
- •7.2.4 Microphone techniques for ambience recording and combination with frontal localization information recording
- •7.2.5 Stereophonic plus center channel recording
- •7.3 Microphone techniques for other multichannel sounds
- •7.3.1 Microphone techniques for other discrete multichannel sounds
- •7.3.2 Microphone techniques for Ambisonic recording
- •7.4 Simulation of localization signals for multichannel sounds
- •7.4.1 Methods of the simulation of directional localization signals
- •7.4.2 Simulation of virtual source distance and extension
- •7.4.3 Simulation of a moving virtual source
- •7.5 Simulation of reflections for stereophonic and multichannel sounds
- •7.5.1 Delay algorithms and discrete reflection simulation
- •7.5.2 IIR filter algorithm of late reverberation
- •7.5.3 FIR, hybrid FIR, and recursive filter algorithms of late reverberation
- •7.5.4 Algorithms of audio signal decorrelation
- •7.5.5 Simulation of room reflections based on physical measurement and calculation
- •7.6 Directional audio coding and multichannel sound signal synthesis
- •7.7 Summary
- •8.1 Matrix surround sound
- •8.1.1 Matrix quadraphone
- •8.1.2 Dolby Surround system
- •8.1.3 Dolby Pro-Logic decoding technique
- •8.1.4 Some developments on matrix surround sound and logic decoding techniques
- •8.2 Downmixing of multichannel sound signals
- •8.3 Upmixing of multichannel sound signals
- •8.3.1 Some considerations in upmixing
- •8.3.2 Simple upmixing methods for front-channel signals
- •8.3.3 Simple methods for Ambient component separation
- •8.3.4 Model and statistical characteristics of two-channel stereophonic signals
- •8.3.5 A scale-signal-based algorithm for upmixing
- •8.3.6 Upmixing algorithm based on principal component analysis
- •8.3.7 Algorithm based on the least mean square error for upmixing
- •8.3.8 Adaptive normalized algorithm based on the least mean square for upmixing
- •8.3.9 Some advanced upmixing algorithms
- •8.4 Summary
- •9.1 Each order approximation of ideal reproduction and Ambisonics
- •9.1.1 Each order approximation of ideal horizontal reproduction
- •9.1.2 Each order approximation of ideal three-dimensional reproduction
- •9.2 General formulation of multichannel sound field reconstruction
- •9.2.1 General formulation of multichannel sound field reconstruction in the spatial domain
- •9.2.2 Formulation of spatial-spectral domain analysis of circular secondary source array
- •9.2.3 Formulation of spatial-spectral domain analysis for a secondary source array on spherical surface
- •9.3 Spatial-spectral domain analysis and driving signals of Ambisonics
- •9.3.1 Reconstructed sound field of horizontal Ambisonics
- •9.3.2 Reconstructed sound field of spatial Ambisonics
- •9.3.3 Mixed-order Ambisonics
- •9.3.4 Near-field compensated higher-order Ambisonics
- •9.3.5 Ambisonic encoding of complex source information
- •9.3.6 Some special applications of spatial-spectral domain analysis of Ambisonics
- •9.4 Some problems related to Ambisonics
- •9.4.1 Secondary source array and stability of Ambisonics
- •9.4.2 Spatial transformation of Ambisonic sound field
- •9.5 Error analysis of Ambisonic-reconstructed sound field
- •9.5.1 Integral error of Ambisonic-reconstructed wavefront
- •9.5.2 Discrete secondary source array and spatial-spectral aliasing error in Ambisonics
- •9.6 Multichannel reconstructed sound field analysis in the spatial domain
- •9.6.1 Basic method for analysis in the spatial domain
- •9.6.2 Minimizing error in reconstructed sound field and summing localization equation
- •9.6.3 Multiple receiver position matching method and its relation to the mode-matching method
- •9.7 Listening room reflection compensation in multichannel sound reproduction
- •9.8 Microphone array for multichannel sound field signal recording
- •9.8.1 Circular microphone array for horizontal Ambisonic recording
- •9.8.2 Spherical microphone array for spatial Ambisonic recording
- •9.8.3 Discussion on microphone array recording
- •9.9 Summary
- •10.1 Basic principle and implementation of wave field synthesis
- •10.1.1 Kirchhoff–Helmholtz boundary integral and WFS
- •10.1.2 Simplification of the types of secondary sources
- •10.1.3 WFS in a horizontal plane with a linear array of secondary sources
- •10.1.4 Finite secondary source array and effect of spatial truncation
- •10.1.5 Discrete secondary source array and spatial aliasing
- •10.1.6 Some issues and related problems on WFS implementation
- •10.2 General theory of WFS
- •10.2.1 Green’s function of Helmholtz equation
- •10.2.2 General theory of three-dimensional WFS
- •10.2.3 General theory of two-dimensional WFS
- •10.2.4 Focused source in WFS
- •10.3 Analysis of WFS in the spatial-spectral domain
- •10.3.1 General formulation and analysis of WFS in the spatial-spectral domain
- •10.3.2 Analysis of the spatial aliasing in WFS
- •10.3.3 Spatial-spectral division method of WFS
- •10.4 Further discussion on sound field reconstruction
- •10.4.1 Comparison among various methods of sound field reconstruction
- •10.4.2 Further analysis of the relationship between acoustical holography and sound field reconstruction
- •10.4.3 Further analysis of the relationship between acoustical holography and Ambisonics
- •10.4.4 Comparison between WFS and Ambisonics
- •10.5 Equalization of WFS under nonideal conditions
- •10.6 Summary
- •11.1 Basic principles of binaural reproduction and virtual auditory display
- •11.1.1 Binaural recording and reproduction
- •11.1.2 Virtual auditory display
- •11.2 Acquisition of HRTFs
- •11.2.1 HRTF measurement
- •11.2.2 HRTF calculation
- •11.2.3 HRTF customization
- •11.3 Basic physical features of HRTFs
- •11.3.1 Time-domain features of far-field HRIRs
- •11.3.2 Frequency domain features of far-field HRTFs
- •11.3.3 Features of near-field HRTFs
- •11.4 HRTF-based filters for binaural synthesis
- •11.5 Spatial interpolation and decomposition of HRTFs
- •11.5.1 Directional interpolation of HRTFs
- •11.5.2 Spatial basis function decomposition and spatial sampling theorem of HRTFs
- •11.5.3 HRTF spatial interpolation and signal mixing for multichannel sound
- •11.5.4 Spectral shape basis function decomposition of HRTFs
- •11.6 Simplification of signal processing for binaural synthesis
- •11.6.1 Virtual loudspeaker-based algorithms
- •11.6.2 Basis function decomposition-based algorithms
- •11.7.1 Principle of headphone equalization
- •11.7.2 Some problems with binaural reproduction and VAD
- •11.8 Binaural reproduction through loudspeakers
- •11.8.1 Basic principle of binaural reproduction through loudspeakers
- •11.8.2 Virtual source distribution in two-front loudspeaker reproduction
- •11.8.3 Head movement and stability of virtual sources in Transaural reproduction
- •11.8.4 Timbre coloration and equalization in transaural reproduction
- •11.9 Virtual reproduction of stereophonic and multichannel surround sound
- •11.9.1 Binaural reproduction of stereophonic and multichannel sound through headphones
- •11.9.2 Stereophonic expansion and enhancement
- •11.9.3 Virtual reproduction of multichannel sound through loudspeakers
- •11.10.1 Binaural room modeling
- •11.10.2 Dynamic virtual auditory environments system
- •11.11 Summary
- •12.1 Physical analysis of binaural pressures in summing virtual source and auditory events
- •12.1.1 Evaluation of binaural pressures and localization cues
- •12.1.2 Method for summing localization analysis
- •12.1.3 Binaural pressure analysis of stereophonic and multichannel sound with amplitude panning
- •12.1.4 Analysis of summing localization with interchannel time difference
- •12.1.5 Analysis of summing localization at the off-central listening position
- •12.1.6 Analysis of interchannel correlation and spatial auditory sensations
- •12.2 Binaural auditory models and analysis of spatial sound reproduction
- •12.2.1 Analysis of lateral localization by using auditory models
- •12.2.2 Analysis of front-back and vertical localization by using a binaural auditory model
- •12.2.3 Binaural loudness models and analysis of the timbre of spatial sound reproduction
- •12.3 Binaural measurement system for assessing spatial sound reproduction
- •12.4 Summary
- •13.1 Analog audio storage and transmission
- •13.1.1 45°/45° Disk recording system
- •13.1.2 Analog magnetic tape audio recorder
- •13.1.3 Analog stereo broadcasting
- •13.2 Basic concepts of digital audio storage and transmission
- •13.3 Quantization noise and shaping
- •13.3.1 Signal-to-quantization noise ratio
- •13.3.2 Quantization noise shaping and 1-Bit DSD coding
- •13.4 Basic principle of digital audio compression and coding
- •13.4.1 Outline of digital audio compression and coding
- •13.4.2 Adaptive differential pulse-code modulation
- •13.4.3 Perceptual audio coding in the time-frequency domain
- •13.4.4 Vector quantization
- •13.4.5 Spatial audio coding
- •13.4.6 Spectral band replication
- •13.4.7 Entropy coding
- •13.4.8 Object-based audio coding
- •13.5 MPEG series of audio coding techniques and standards
- •13.5.1 MPEG-1 audio coding technique
- •13.5.2 MPEG-2 BC audio coding
- •13.5.3 MPEG-2 advanced audio coding
- •13.5.4 MPEG-4 audio coding
- •13.5.5 MPEG parametric coding of multichannel sound and unified speech and audio coding
- •13.5.6 MPEG-H 3D audio
- •13.6 Dolby series of coding techniques
- •13.6.1 Dolby digital coding technique
- •13.6.2 Some advanced Dolby coding techniques
- •13.7 DTS series of coding technique
- •13.8 MLP lossless coding technique
- •13.9 ATRAC technique
- •13.10 Audio video coding standard
- •13.11 Optical disks for audio storage
- •13.11.1 Structure, principle, and classification of optical disks
- •13.11.2 CD family and its audio formats
- •13.11.3 DVD family and its audio formats
- •13.11.4 SACD and its audio formats
- •13.11.5 BD and its audio formats
- •13.12 Digital radio and television broadcasting
- •13.12.1 Outline of digital radio and television broadcasting
- •13.12.2 Eureka-147 digital audio broadcasting
- •13.12.3 Digital radio mondiale
- •13.12.4 In-band on-channel digital audio broadcasting
- •13.12.5 Audio for digital television
- •13.13 Audio storage and transmission by personal computer
- •13.14 Summary
- •14.1 Outline of acoustic conditions and requirements for spatial sound intended for domestic reproduction
- •14.2 Acoustic consideration and design of listening rooms
- •14.3 Arrangement and characteristics of loudspeakers
- •14.3.1 Arrangement of the main loudspeakers in listening rooms
- •14.3.2 Characteristics of the main loudspeakers
- •14.3.3 Bass management and arrangement of subwoofers
- •14.4 Signal and listening level alignment
- •14.5 Standards and guidance for conditions of spatial sound reproduction
- •14.6 Headphones and binaural monitors of spatial sound reproduction
- •14.7 Acoustic conditions for cinema sound reproduction and monitoring
- •14.8 Summary
- •15.1 Outline of psychoacoustic and subjective assessment experiments
- •15.2 Contents and attributes for spatial sound assessment
- •15.3 Auditory comparison and discrimination experiment
- •15.3.1 Paradigms of auditory comparison and discrimination experiment
- •15.3.2 Examples of auditory comparison and discrimination experiment
- •15.4 Subjective assessment of small impairments in spatial sound systems
- •15.5 Subjective assessment of a spatial sound system with intermediate quality
- •15.6 Virtual source localization experiment
- •15.6.1 Basic methods for virtual source localization experiments
- •15.6.2 Preliminary analysis of the results of virtual source localization experiments
- •15.6.3 Some results of virtual source localization experiments
- •15.7 Summary
- •16.1.1 Application to commercial cinema and related problems
- •16.1.2 Applications to domestic reproduction and related problems
- •16.1.3 Applications to automobile audio
- •16.2.1 Applications to virtual reality
- •16.2.2 Applications to communication and information systems
- •16.2.3 Applications to multimedia
- •16.2.4 Applications to mobile and handheld devices
- •16.3 Applications to the scientific experiments of spatial hearing and psychoacoustics
- •16.4 Applications to sound field auralization
- •16.4.1 Auralization in room acoustics
- •16.4.2 Other applications of auralization technique
- •16.5 Applications to clinical medicine
- •16.6 Summary
- •References
- •Index

Multichannel spatial surround sound 233
recreated by the two loudspeakers, and the signal for the other loudspeaker vanishes. If the target source is located between two adjacent loudspeakers in a horizontal plane, VBAP is simplified into horizontal pair-wise amplitude panning.
The resultant signal amplitudes A1, A2, and A3 should be normalized further. The normalized amplitudes of the actual loudspeaker signals are A1, A2, and A3 multiplied by Atotal. For constant-power normalization, the following equation is obtained:

$$A_{total} = \frac{1}{\sqrt{A_1^2 + A_2^2 + A_3^2}}. \tag{6.3.5}$$
The preceding discussion outlines the basic principle of VBAP.
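As a numerical illustration of this principle and of the constant-power normalization in Equation (6.3.5), the following sketch (the function names and the loudspeaker triplet are illustrative choices, not from the text) solves the gains of a three-loudspeaker base for a given target direction:

```python
import numpy as np

def unit_vector(azimuth_deg, elevation_deg):
    """Unit direction vector: x points front, y left, z up."""
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    return np.array([np.cos(az) * np.cos(el),
                     np.sin(az) * np.cos(el),
                     np.sin(el)])

def vbap_gains(target_dir, triplet_dirs):
    """Gains A1, A2, A3 for three loudspeakers forming a spherical triangle."""
    L = np.vstack([unit_vector(*d) for d in triplet_dirs])  # rows: loudspeaker unit vectors
    p = unit_vector(*target_dir)
    g = np.linalg.solve(L.T, p)  # solve p = g1*l1 + g2*l2 + g3*l3
    if np.any(g < -1e-9):
        raise ValueError("target direction lies outside the loudspeaker triangle")
    return g / np.linalg.norm(g)  # constant-power normalization, Eq. (6.3.5)

# a target at one loudspeaker direction drives only that loudspeaker
print(vbap_gains((30, 0), [(30, 0), (-30, 0), (0, 45)]))  # ~[1, 0, 0]
```

When the target coincides with a loudspeaker direction, the other two gains vanish, matching the behavior described above; a target between two loudspeakers of the triplet reduces to pair-wise panning.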
VBAP satisfies the low-frequency optimized criterion for the direction of the velocity localization vector. However, it does not satisfy the optimized criterion of unit velocity vector magnitude rv = 1 except when the target virtual source is located in a loudspeaker direction. If the head rotates to an orientation at which ITDp vanishes, the lateral displacement of the perceived virtual source with respect to the fixed coordinate system is identical to that of the target source. However, for a fixed head with a frontal orientation, the lateral displacement of the perceived virtual source may differ from that of the target source because of the mismatched ITDp in reproduction. Pulkki (2001a) also validated this result by using a binaural auditory model. In addition, the vertical displacement of the perceived virtual source may differ from that of the target source because of the mismatched variation in ITDp caused by head rotation. The accuracy of the perceived virtual source direction in reproduction depends on the positions of the three active loudspeakers on the spherical surface and can be analyzed using the localization equations in Section 6.1.1. The situation here is similar to the cases of pair-wise amplitude panning in a horizontal plane (Section 4.1.2) or in the median plane (Section 6.2).
Four active loudspeakers with at least one out-of-phase loudspeaker signal are needed to further satisfy the optimized criterion of rv = 1. The function of an out-of-phase loudspeaker signal is similar to that in the cases of horizontal reproduction described in Sections 3.2.2 and 4.1.3. Out-of-phase loudspeaker signals are used in global or local Ambisonic signal mixing for horizontal and spatial reproduction, as discussed in Sections 4.3, 5.2.3, 5.2.4, and 6.4. The analysis here also reveals the difference between VBAP and Ambisonic signal mixing.
VBAP employs three active loudspeakers to recreate a virtual source within the spherical triangle formed by the loudspeakers. Therefore, the error in the perceived virtual source direction is confined to the region bounded by the active loudspeakers regardless of the listener's position. Crosstalk from loudspeakers in opposite directions and serious localization errors at off-central listening positions are averted. As the total number of loudspeakers increases, the grid of active loudspeaker triangles becomes denser, and the localization error is further reduced. VBAP is simple and easily implemented in program production and therefore possesses some advantages in practical use. Experiments on VBAP with various loudspeaker configurations have validated the aforementioned analysis (Pulkki, 2001a; Wendt et al., 2014).
6.4 SPATIAL AMBISONIC SIGNAL MIXING AND REPRODUCTION
6.4.1 Principle of spatial Ambisonics
Spatial Ambisonics is another typical multichannel spatial sound and signal mixing method (Gerzon, 1973). It was developed in the 1970s and originally termed Periphony. Currently, it is a hot topic in spatial sound. As an extension of horizontal Ambisonics, spatial Ambisonics involves the decomposition and reconstruction of a sound field by a series
of directional harmonics (spherical harmonic functions). Similar to horizontal Ambisonics, spatial Ambisonics can be analyzed with various mathematical and physical methods. A traditional analysis based on a virtual source localization theorem is addressed in this section, and a stricter analysis based on sound field reconstruction is presented in Chapter 9.
The first-order spatial Ambisonics involves four independent signals. If the maximal amplitude of the independent signals is normalized to unity, a set of normalized independent signals (amplitudes) can be chosen as

$$W = 1, \qquad X = \cos\theta_S\cos\phi_S, \qquad Y = \sin\theta_S\cos\phi_S, \qquad Z = \sin\phi_S, \tag{6.4.1}$$
where (θS, ϕS) represent the azimuth and elevation of the target source in the original sound field. Equation (6.4.1) represents the signals captured by four coincident microphones in the original sound field. W is the signal from an omnidirectional microphone (spherical symmetrical directivity); X, Y, and Z are signals from three bidirectional microphones (symmetrical directivity around the polar axis) with their main axes pointing to the front, left, and top directions, respectively. The directional patterns of these four microphones are illustrated in Figure A.1 in Appendix A. W, X, Y, and Z also represent the pressure and the x (front-back), y (left-right), and z (up-down) components of the velocity of the medium in the original sound field, respectively.
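In code, the encoding of Equation (6.4.1) is a direct transcription; the sketch below (the function name is an illustrative choice) returns the four normalized signals for a given source direction:

```python
import numpy as np

def encode_first_order(theta_s, phi_s):
    """Normalized first-order signals of Eq. (6.4.1) for a source at
    azimuth theta_s and elevation phi_s (both in radians)."""
    W = 1.0                                  # omnidirectional (pressure) component
    X = np.cos(theta_s) * np.cos(phi_s)      # front-back velocity component
    Y = np.sin(theta_s) * np.cos(phi_s)      # left-right velocity component
    Z = np.sin(phi_s)                        # up-down velocity component
    return np.array([W, X, Y, Z])

print(encode_first_order(0.0, 0.0))  # frontal source: [1. 1. 0. 0.]
```

A frontal source excites only W and X; a source at the zenith excites only W and Z, mirroring the microphone orientations described above.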
In reproduction, M loudspeakers are arranged on a spherical surface around a listener. The distance from each loudspeaker to the origin (center of the sphere) satisfies the condition of a far-field distance. The direction of the ith loudspeaker is (θi,ϕi) with i = 0,1,2…M – 1, and loudspeaker signals can be written as a form of sound field signal mixing, i.e., as a linear combination of the independent signals W,X,Y, and Z given in Equation (6.4.1):
$$\begin{aligned}
A_i(\theta_S,\phi_S) &= A_{total}\left[D^{1}_{00}(\theta_i,\phi_i)\,W + D^{1}_{11}(\theta_i,\phi_i)\,X + D^{2}_{11}(\theta_i,\phi_i)\,Y + D^{1}_{10}(\theta_i,\phi_i)\,Z\right]\\
&= A_{total}\left[D^{1}_{00}(\theta_i,\phi_i) + D^{1}_{11}(\theta_i,\phi_i)\cos\theta_S\cos\phi_S + D^{2}_{11}(\theta_i,\phi_i)\sin\theta_S\cos\phi_S + D^{1}_{10}(\theta_i,\phi_i)\sin\phi_S\right],\\
&\qquad i = 0, 1, \ldots, M-1,
\end{aligned} \tag{6.4.2}$$

where the decoding coefficients $D^{1}_{00}(\theta_i,\phi_i)$, $D^{1}_{11}(\theta_i,\phi_i)$, $D^{2}_{11}(\theta_i,\phi_i)$, and $D^{1}_{10}(\theta_i,\phi_i)$ can be chosen by using various methods depending on the loudspeaker configuration and the optimized criteria for the reproduced sound field.
Generally, for left-right symmetrical loudspeaker configurations, the decoding coefficients satisfy symmetrical relationships similar to those in Equations (5.2.16)–(5.2.18). If the loudspeaker configuration is also up-down symmetrical, then for any pair of up-down symmetrical loudspeakers i and i′, the azimuths and elevations satisfy θi = θi′ and ϕi = −ϕi′, and

$$\begin{gathered}
D^{1}_{00}(\theta_i,\phi_i) = D^{1}_{00}(\theta_{i'},\phi_{i'}), \qquad D^{1}_{10}(\theta_i,\phi_i) = -D^{1}_{10}(\theta_{i'},\phi_{i'}),\\
D^{1}_{11}(\theta_i,\phi_i) = D^{1}_{11}(\theta_{i'},\phi_{i'}), \qquad D^{2}_{11}(\theta_i,\phi_i) = D^{2}_{11}(\theta_{i'},\phi_{i'}).
\end{gathered} \tag{6.4.3}$$
Symmetry greatly simplifies the decoding coefficients.
The decoding coefficients in Equation (6.4.2) are solved in accordance with various optimized criteria. Similar to the case of horizontal Ambisonics with an irregular loudspeaker configuration in Section 5.2.3, the optimized criterion of the perceived virtual source direction
matching that of a target source can be used when the head is fixed and when the head turns at low frequencies; as such, θI = θ′I = θS and ϕI = ϕ′I = ϕ″I = ϕS in Equations (6.1.11), (6.1.13), and (6.1.14). With these parameters, a set of linear equations for the decoding coefficients can be obtained. Furthermore, θv = θS, ϕv = ϕS, and rv = 1 can be set in Equation (6.1.18) by using the optimized criterion of the velocity localization vector; the normalized pressure amplitude at the origin is set to unity, so the following equations are obtained:
$$\begin{aligned}
\sum_{i=0}^{M-1} A_i(\theta_S,\phi_S) &= 1, & \sum_{i=0}^{M-1} A_i(\theta_S,\phi_S)\cos\theta_i\cos\phi_i &= \cos\theta_S\cos\phi_S,\\
\sum_{i=0}^{M-1} A_i(\theta_S,\phi_S)\sin\theta_i\cos\phi_i &= \sin\theta_S\cos\phi_S, & \sum_{i=0}^{M-1} A_i(\theta_S,\phi_S)\sin\phi_i &= \sin\phi_S.
\end{aligned} \tag{6.4.4}$$
Substituting Equation (6.4.2) into Equation (6.4.4) leads to a set of linear equations or a matrix equation for decoding coefficients:
$$[S'_{3D}] = A_{total}\,[Y'_{3D}]\,[D'_{3D}]\,[S'_{3D}], \tag{6.4.5}$$

where [S′3D] = [W, X, Y, Z]T is a 4 × 1 column matrix or vector composed of the independent signals in Equation (6.4.1); the subscript "3D" denotes the case of three-dimensional spatial Ambisonics; and [D′3D] is an M × 4 decoding matrix to be solved:
$$[D'_{3D}] = \begin{bmatrix}
D^{1}_{00}(\theta_0,\phi_0) & D^{1}_{11}(\theta_0,\phi_0) & D^{2}_{11}(\theta_0,\phi_0) & D^{1}_{10}(\theta_0,\phi_0)\\
D^{1}_{00}(\theta_1,\phi_1) & D^{1}_{11}(\theta_1,\phi_1) & D^{2}_{11}(\theta_1,\phi_1) & D^{1}_{10}(\theta_1,\phi_1)\\
\vdots & \vdots & \vdots & \vdots\\
D^{1}_{00}(\theta_{M-1},\phi_{M-1}) & D^{1}_{11}(\theta_{M-1},\phi_{M-1}) & D^{2}_{11}(\theta_{M-1},\phi_{M-1}) & D^{1}_{10}(\theta_{M-1},\phi_{M-1})
\end{bmatrix}. \tag{6.4.6}$$
[Y′3D] is a 4 × M matrix with its entries related to the directions of loudspeakers:
$$[Y'_{3D}] = \begin{bmatrix}
1 & 1 & \cdots & 1\\
\cos\theta_0\cos\phi_0 & \cos\theta_1\cos\phi_1 & \cdots & \cos\theta_{M-1}\cos\phi_{M-1}\\
\sin\theta_0\cos\phi_0 & \sin\theta_1\cos\phi_1 & \cdots & \sin\theta_{M-1}\cos\phi_{M-1}\\
\sin\phi_0 & \sin\phi_1 & \cdots & \sin\phi_{M-1}
\end{bmatrix}. \tag{6.4.7}$$
Similar to the case of Equation (5.2.24), for M > 4, the decoding coefficients in Equation (6.4.5) can be solved with the following pseudo-inverse method if the loudspeaker configuration is chosen appropriately so that the matrix [Y′3D] is well conditioned (Section 9.4.1):
$$A_{total}\,[D'_{3D}] = \mathrm{pinv}\big([Y'_{3D}]\big) = [Y'_{3D}]^{T}\big([Y'_{3D}]\,[Y'_{3D}]^{T}\big)^{-1}. \tag{6.4.8}$$
The solution given in Equation (6.4.8) satisfies the condition of constant-amplitude normalization. For M = 4, the decoding coefficients can be directly solved from the inverse of
the matrix [Y′3D]. Except for some regular loudspeaker configurations, the solution shown in Equation (6.4.8) usually does not satisfy the criterion of constant power in reproduction:
$$\mathrm{Pow} = \sum_{i=0}^{M-1} A_i^2 = \text{const}. \tag{6.4.9}$$
For the solution expressed in Equation (6.4.8), the direction of the energy localization vector evaluated using Equation (6.1.20) generally does not coincide with that of the velocity localization vector, i.e., (θE, ϕE) ≠ (θv, ϕv). However, as indicated by Equation (6.4.4), the first-order spatial Ambisonics with the decoding matrix given by Equation (6.4.8) is an example in which Equations (6.1.11), (6.1.13), and (6.1.14) are consistent. In other words, at very low frequencies, the first-order spatial Ambisonics creates an ITDp, and dynamic variations of ITDp with head rotation and tilting, that match those of the target source.
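The decoding procedure of Equations (6.4.5)–(6.4.8) can be checked numerically. The sketch below assumes a hypothetical cube layout of M = 8 loudspeakers (an illustrative configuration, not one prescribed here), builds [Y′3D] per Equation (6.4.7), computes the pseudo-inverse of Equation (6.4.8), and verifies that a decoded source satisfies the conditions of Equation (6.4.4):

```python
import numpy as np

def y_matrix(az, el):
    """4 x M matrix [Y'_3D] of Eq. (6.4.7) from loudspeaker azimuths/elevations."""
    return np.vstack([np.ones_like(az),
                      np.cos(az) * np.cos(el),
                      np.sin(az) * np.cos(el),
                      np.sin(el)])

# cube-vertex layout: azimuths +-45 and +-135 deg at elevations +-35.264 deg
az = np.radians([45, 135, -135, -45] * 2)
el = np.radians([35.264] * 4 + [-35.264] * 4)

Y3D = y_matrix(az, el)        # [Y'_3D], Eq. (6.4.7)
D3D = np.linalg.pinv(Y3D)     # M x 4 matrix A_total [D'_3D], Eq. (6.4.8)

# encode a target source per Eq. (6.4.1), then decode per Eq. (6.4.2)
th, ph = np.radians(70.0), np.radians(20.0)
S = np.array([1.0, np.cos(th) * np.cos(ph), np.sin(th) * np.cos(ph), np.sin(ph)])
A = D3D @ S                   # loudspeaker amplitudes

# reproduced pressure and velocity components match the target, Eq. (6.4.4)
print(np.allclose(Y3D @ A, S))  # True
```

Because the cube layout makes [Y′3D] well conditioned, the pseudo-inverse is an exact right inverse and all four conditions of Equation (6.4.4) hold simultaneously.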
Similar to the case of horizontal Ambisonics (Section 4.3.1), spatial Ambisonics has various forms of independent signals. In principle, any four independent linear combinations of W, X, Y, and Z can serve as a set of independent signals of spatial Ambisonics. Different sets of independent signals can be converted into one another, and the decoding equations for different sets of independent signals are related via a linear matrix transformation. In practice, four independent signals can be captured by four cardioid or subcardioid microphones with symmetrical directivity around their polar axes. The main axes of the four microphones point in four non-parallel directions: left-front-up (LFU), left-back-down (LBD), right-front-down (RFD), and right-back-up (RBU) (Figure 6.7). The normalized amplitudes of the microphone signals are
$$A_i(\Omega'_i) = A_{mic}\left(1 + b\cos\Omega'_i\right), \qquad i = 0, 1, 2, 3, \tag{6.4.10}$$

where 0.5 ≤ b ≤ 1, Ω′i is the angle between the target source direction and the main axis of the ith microphone, and Amic is a coefficient associated with the acoustic-electric efficiency or gain of the microphones. The independent signals expressed in Equation (6.4.10) are termed A-format spatial Ambisonic signals.
Figure 6.7 Main axis directions of the four microphones for recording the A-format first-order spatial Ambisonic signals.
W, X, Y, and Z given in Equation (6.4.1) can be conveniently chosen as the independent signals of the first-order spatial Ambisonics. W, X, Y, and Z can be derived from the signals presented in Equation (6.4.10) through linear transformation. Let
$$A_{LFU} = A_0(\Omega'_0), \qquad A_{LBD} = A_1(\Omega'_1), \qquad A_{RFD} = A_2(\Omega'_2), \qquad A_{RBU} = A_3(\Omega'_3). \tag{6.4.11}$$
Linear combinations of the four signals in Equation (6.4.11) yield

$$\begin{aligned}
W' &= A_{LFU} + A_{LBD} + A_{RFD} + A_{RBU} = 4A_{mic},\\
X' &= A_{LFU} - A_{LBD} + A_{RFD} - A_{RBU} = \frac{4b}{\sqrt{3}}A_{mic}\cos\theta_S\cos\phi_S,\\
Y' &= A_{LFU} + A_{LBD} - A_{RFD} - A_{RBU} = \frac{4b}{\sqrt{3}}A_{mic}\sin\theta_S\cos\phi_S,\\
Z' &= A_{LFU} - A_{LBD} - A_{RFD} + A_{RBU} = \frac{4b}{\sqrt{3}}A_{mic}\sin\phi_S.
\end{aligned} \tag{6.4.12}$$
The normalized independent signals W, X, Y, and Z are obtained by equalizing the signals in Equation (6.4.12) with an appropriate gain. In practice, the levels of X, Y, and Z are increased by 3 dB to ensure a diffuse-field power response identical to that of the W signal. The independent signals are written as
$$W = 1, \qquad X = \sqrt{2}\cos\theta_S\cos\phi_S, \qquad Y = \sqrt{2}\sin\theta_S\cos\phi_S, \qquad Z = \sqrt{2}\sin\phi_S. \tag{6.4.13}$$
The decoding equation should be appropriately changed to accommodate the independent signals in Equation (6.4.13). The spatial Ambisonic system with independent signals given by Equation (6.4.13) is termed the first-order B-format spatial Ambisonics (Gerzon, 1985).
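The A-format to B-format conversion of Equations (6.4.10)–(6.4.13) can be sketched numerically. In the sketch below, the axis orientations follow Figure 6.7 (x front, y left, z up), the sign pattern of the combinations follows from that geometry, and the values of b and Amic are illustrative assumptions:

```python
import numpy as np

b, A_mic = 1.0, 1.0   # illustrative directivity parameter and microphone gain

# main-axis unit vectors of the LFU, LBD, RFD, RBU microphones (Figure 6.7)
axes = np.array([[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]) / np.sqrt(3)

th, ph = np.radians(30.0), np.radians(10.0)   # target source direction
src = np.array([np.cos(th) * np.cos(ph), np.sin(th) * np.cos(ph), np.sin(ph)])

# A-format signals, Eq. (6.4.10)
a_lfu, a_lbd, a_rfd, a_rbu = A_mic * (1 + b * axes @ src)

# linear combinations of Eq. (6.4.12)
Wp = a_lfu + a_lbd + a_rfd + a_rbu   # = 4 A_mic (constant terms add, cosines cancel)
Xp = a_lfu - a_lbd + a_rfd - a_rbu   # front-back component
Yp = a_lfu + a_lbd - a_rfd - a_rbu   # left-right component
Zp = a_lfu - a_lbd - a_rfd + a_rbu   # up-down component

# equalize the common gains and boost X, Y, Z by 3 dB: B-format, Eq. (6.4.13)
k = np.sqrt(3) / (4 * b * A_mic)
W = Wp / (4 * A_mic)
X, Y, Z = np.sqrt(2) * k * Xp, np.sqrt(2) * k * Yp, np.sqrt(2) * k * Zp
print(np.allclose([W, X, Y, Z], np.concatenate([[1], np.sqrt(2) * src])))  # True
```

The recovered B-format signals equal the directly encoded signals of Equation (6.4.13), illustrating the equivalence between A-format and B-format representations exploited below.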
The second- and higher-order directional harmonics (Appendix A) are supplemented as new independent signals to improve the performance in reproduction, and the first-order spatial Ambisonics can be extended to higher-order spatial Ambisonics. Under the optimized criterion of matching each order approximation of the reproduced sound pressures with those of the target source within a given listening region and below a certain frequency limit, the decoding coefficients or matrix and the loudspeaker signals are solved via a pseudo-inverse method similar to Equation (6.4.8). This problem is addressed in Section 9.3.2. For the second- and higher-order spatial Ambisonics with decoding coefficients satisfying Equation (6.4.4), the localization Equations (6.1.11), (6.1.13), and (6.1.14) yield consistent results. Equation (6.4.4) specifies only the zero- and first-order decoding coefficients, leaving room for flexibly choosing the second- and higher-order decoding coefficients.
Moreover, the equivalence between A-format and B-format spatial Ambisonics allows post-equalization of a recorded source signal at a given direction (Favrot and Faller, 2020). By using an appropriate matrix, the recorded B-format signals can be converted into a set of A-format signals, with the main axis of the polar pattern of one A-format signal A′0(Ω′0) pointing in the source direction of interest. The signal A′0(Ω′0) is equalized, and the equalized signal, along with the other unequalized A-format signals, is converted back into B-format signals by an inverse matrix.