
For a (Q, P)-order ARMA or IIR filter model, the system function is
$$H(z) = \frac{b_0 + b_1 z^{-1} + \dots + b_Q z^{-Q}}{1 + a_1 z^{-1} + \dots + a_P z^{-P}} = \frac{\displaystyle\sum_{q=0}^{Q} b_q z^{-q}}{1 + \displaystyle\sum_{p=1}^{P} a_p z^{-p}}, \qquad (11.4.6)$$
where (ap, bq) are a set of (Q+P+1) filter coefficients.
With HRTF-based filter design, the coefficients in Equation (11.4.5) or Equation (11.4.6) are selected so that the filter response exactly or approximately matches the target HRTF or HRIR under some mathematical or perceptual criterion. HRTF-based filters have been designed with various methods, including conventional time-windowing or frequency-sampling methods for FIR filters and the Prony or Yule–Walker method for IIR filters. A balanced model truncation (BMT) method has been used to design IIR filters from original HRIRs of finite length (Mackenzie et al., 1997). To improve the computational efficiency of binaural synthesis for various target virtual source directions, Haneda et al. (1999) proposed a common acoustical pole and zero (CAPZ) model for HRTF-based filters. In this model, HRTFs in M directions are represented by an ARMA model with direction-independent poles and direction-dependent zeros. The poles represent the direction-independent peaks in HRTF magnitudes caused by ear canal resonance, and the zeros capture the directional variation of HRTFs. For a group of HRTFs in M directions, the CAPZ model involves fewer parameters and is simpler than conventional ARMA models.
The performance of HRTF-based filters can be evaluated with various error criteria. HRTF-based filters are usually designed to minimize certain physical errors, such as the square error between the filter and target responses. Methods based on auditory error criteria, such as the logarithmic error criterion (Blommer and Wakefield, 1997) and frequency warping (Härmä et al., 2000), have also been suggested for HRTF-based filter design. HRTF-based filter design is thus an important topic in VAD (Xie, 2008a, 2013; Huopaniemi et al., 1999).
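As a concrete illustration of the simplest of these methods, the following Python sketch approximates a measured HRIR with an FIR filter by time windowing and evaluates the worst-case magnitude deviation from the target response. It is a minimal sketch under assumed conventions (a 1-D HRIR array, a half-Hanning taper), not a reproduction of any of the published designs cited above.

```python
import numpy as np

def fir_from_hrir(hrir, n_taps):
    """Time-windowing FIR design: truncate the measured HRIR and taper
    its tail with the descending half of a Hanning window."""
    n_taps = min(n_taps, len(hrir))
    taper = np.hanning(2 * n_taps)[n_taps:]
    return hrir[:n_taps] * taper

def max_magnitude_error_db(hrir, fir, n_fft=1024):
    """Worst-case magnitude deviation (dB) between target and FIR responses."""
    target = np.abs(np.fft.rfft(hrir, n_fft)) + 1e-12
    approx = np.abs(np.fft.rfft(fir, n_fft)) + 1e-12
    return np.max(np.abs(20.0 * np.log10(approx / target)))
```

For example, `fir_from_hrir(hrir, 128)` reduces a 512-point HRIR to a 128-tap filter whose magnitude error can then be checked against a perceptually motivated tolerance.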
11.5 SPATIAL INTERPOLATION AND DECOMPOSITION OF HRTFS
11.5.1 Directional interpolation of HRTFs
Far-field HRTFs are continuous functions of the source direction (θS, ϕS). As stated in Section 11.2.1, HRTFs are usually measured in discrete and finite directions, that is, sampled in directions around a spatial spherical surface or a horizontal circle. Under certain conditions, HRTFs in unmeasured directions can be reconstructed or estimated from the measured data by using various interpolation schemes.
For example, at a constant source distance r = rS, HRTFs at the arbitrary unmeasured azimuth θS can be estimated from the HRTFs measured at M horizontal azimuths [that is, H(θi, f) with i = 0, 1, …, M − 1] by using the linear interpolation scheme:
$$\hat{H}(\theta_S, f) \approx \sum_{i=0}^{M-1} A_i H(\theta_i, f), \qquad (11.5.1)$$

where the subscript for the left or right ear is omitted, Ĥ(θS, f) is the interpolated HRTF, and Ai = Ai(θS) is a set of weights related to the target azimuth. Various interpolation schemes can be developed using different methods for selecting the measured azimuths and weights. In each direction, digital measurement yields the HRTF at N discrete frequencies; therefore, Equation (11.5.1) is the directional interpolation equation at the N discrete frequencies f = fk (k = 0, 1 … N − 1).
Equation (11.5.1) can be extended to three-dimensional spatial directions as
$$\hat{H}(\theta_S, \phi_S, f) \approx \sum_{i=0}^{M-1} A_i H(\theta_i, \phi_i, f), \qquad (11.5.2)$$
where Ĥ(θS, ϕS, f) is the interpolated HRTF at the arbitrary unmeasured direction (θS, ϕS), and H(θi, ϕi, f) with i = 0, 1, 2 … M − 1 are the HRTFs at M measured directions. Equations (11.5.1) and (11.5.2) are HRTF linear interpolation equations in the frequency domain and are applicable to both complex-valued HRTFs and HRTF magnitudes. Interpolating HRTF magnitudes alone improves performance.
Because the temporal-frequency Fourier transform is linear, Equations (11.5.1) and (11.5.2) are also applicable to HRIRs in the time domain. For example, the time-domain version of Equation (11.5.2) is given by
$$\hat{h}(\theta_S, \phi_S, t) \approx \sum_{i=0}^{M-1} A_i h(\theta_i, \phi_i, t). \qquad (11.5.3)$$
If HRTFs satisfy the minimum-phase approximation given by Equation (11.4.1), Equations (11.5.1) to (11.5.3) are also applicable to minimum-phase HRTFs or HRIRs. Interpolating minimum-phase HRTFs improves performance. However, the resultant HRTF is not always a minimum-phase function because a weighted sum of minimum-phase functions does not always yield a minimum-phase function. In addition, some work has suggested imposing arrival time correction on HRIR interpolation (Matsumoto et al., 2004). That is, prior to interpolation, the arrival times of the HRIRs for the different source directions are synchronized by shifting the onset time of each HRIR. Time correction also improves the performance of interpolation.
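A minimal sketch of such onset alignment is given below; the threshold-based onset detector and the 10% level are illustrative assumptions rather than the specific procedure of Matsumoto et al. (2004).

```python
import numpy as np

def align_onsets(hrirs, threshold=0.1):
    """Shift each HRIR so that its onset (the first sample whose magnitude
    exceeds `threshold` times the peak magnitude) moves to time zero.
    Returns the aligned HRIRs and the removed delays (in samples) so that
    an interpolated delay can be restored after HRIR interpolation."""
    hrirs = np.asarray(hrirs)
    aligned = np.zeros_like(hrirs)
    delays = np.zeros(hrirs.shape[0], dtype=int)
    for i, h in enumerate(hrirs):
        onset = int(np.argmax(np.abs(h) > threshold * np.max(np.abs(h))))
        delays[i] = onset
        aligned[i, : h.size - onset] = h[onset:]
    return aligned, delays
```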
A simple example of directional interpolation is adjacent linear interpolation. Within the azimuthal region θi < θS < θi+1 in the horizontal plane, the HRTF at azimuth θS is approximated by the first-order Taylor expansion about θi:
$$H(\theta_S, f) \approx H(\theta_i, f) + \left.\frac{\partial H(\theta_S, f)}{\partial \theta_S}\right|_{\theta_S = \theta_i} (\theta_S - \theta_i) \approx H(\theta_i, f) + \frac{H(\theta_{i+1}, f) - H(\theta_i, f)}{\theta_{i+1} - \theta_i}\,(\theta_S - \theta_i), \qquad (11.5.4)$$
or
$$\hat{H}(\theta_S, f) \approx A_{i+1} H(\theta_{i+1}, f) + A_i H(\theta_i, f). \qquad (11.5.5)$$

The weights are given by
$$A_{i+1} = \frac{\theta_S - \theta_i}{\theta_{i+1} - \theta_i}, \qquad A_i = 1 - \frac{\theta_S - \theta_i}{\theta_{i+1} - \theta_i}. \qquad (11.5.6)$$
Therefore, the unmeasured HRTF at θS is approximated as the weighted sum of a pair of adjacent HRTFs, and the weights Ai and Ai+1 are independent of frequency. Equation (11.5.5) is the equation of conventional adjacent linear interpolation, a special case of Equation (11.5.1).
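The sketch below implements Equations (11.5.5) and (11.5.6) directly; the array layout (complex HRTFs stored row-wise per measured azimuth, azimuths sorted in radians) is an assumption made for the example.

```python
import numpy as np

def interpolate_adjacent(hrtfs, azimuths, theta_s):
    """Adjacent linear interpolation of Equations (11.5.5)-(11.5.6).

    hrtfs:    (M, N) complex HRTFs at M measured azimuths
    azimuths: sorted 1-D array of the M measured azimuths (radians)
    theta_s:  target azimuth, assumed to lie within the measured range
    """
    i = int(np.clip(np.searchsorted(azimuths, theta_s) - 1,
                    0, len(azimuths) - 2))
    a_next = (theta_s - azimuths[i]) / (azimuths[i + 1] - azimuths[i])
    # A_i = 1 - A_{i+1}; both weights are independent of frequency
    return (1.0 - a_next) * hrtfs[i] + a_next * hrtfs[i + 1]
```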
Bilinear interpolation is a three-dimensional extension of adjacent linear interpolation (Wightman et al., 1992). Given that HRTFs are measured at a constant source distance r = rS, the spherical surface (upon which the source is located) is sampled along both the azimuthal and elevation directions, resulting in a measurement grid whose vertices represent the measured source directions. The HRTF at an unmeasured direction within a grid cell is approximated as a weighted sum or average of the HRTFs associated with the four nearest measured directions.
A spherical triangular interpolation scheme has also been established (Freeland et al., 2004). The measured positions form a triangular grid on a spherical surface, and the HRTF at an unmeasured direction within a grid cell is approximated as a weighted sum of the measured HRTFs at the three vertices of the enclosing triangle.
Similar to the case of the HRTF-based filter in Section 11.4, the performance of interpolation can be evaluated using various error criteria. For example, the relative energy error is defined as
$$\mathrm{Err}_R(\theta_S, \phi_S, f) = \frac{\left| H(\theta_S, \phi_S, f) - \hat{H}(\theta_S, \phi_S, f) \right|^2}{\left| H(\theta_S, \phi_S, f) \right|^2} \;(\times 100\%), \qquad (11.5.7)$$
where H(θS, ϕS, f) and Ĥ(θS, ϕS, f) are the target and interpolated HRTFs, respectively.
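As a small worked example, Equation (11.5.7) translates directly into a few lines of Python; the bin-wise evaluation over a frequency axis is an assumption of the sketch.

```python
import numpy as np

def relative_energy_error(h_target, h_interp):
    """Relative energy error of Equation (11.5.7) in percent, evaluated
    independently at each frequency bin of the target and interpolated
    HRTFs (complex arrays of equal shape)."""
    return 100.0 * np.abs(h_target - h_interp) ** 2 / np.abs(h_target) ** 2
```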
11.5.2 Spatial basis function decomposition and spatial sampling theorem of HRTFs
HRTFs are multivariable functions. Even for a specified individual and in the far field, HRTFs are complex-valued functions of the source direction (θS, ϕS) and frequency f. This multivariable dependence gives the entire HRTF dataset substantial dimensionality, so the analysis and representation of HRTFs are complicated. Alternatively, an efficient or low-dimensional representation of HRTFs can be achieved by decomposing HRTFs into a weighted sum of appropriate basis functions, where the dependencies of HRTFs on different variables are represented separately by variations in basis functions and weights. The basis function decompositions of HRTFs are applicable to the binaural synthesis of multiple virtual sources in VAD.
HRTF linear decomposition is categorized into two basic types: spatial basis function decomposition and spectral shape basis function decomposition. The former is addressed here, and the latter is discussed in Section 11.5.4. Several spatial basis function decomposition schemes exist. Among them, the spatial harmonic decomposition scheme, which is closely related to Ambisonics, leads to the spatial sampling theorem of HRTFs and is applicable to simplifying HRTF representation and binaural synthesis in VAD.

The azimuthal harmonic decomposition of horizontal HRTFs is discussed first (Zhong and Xie, 2005, 2009). At each given elevation ϕS = ϕ0, such as the horizontal plane ϕS = 0°, a far-field HRTF for a specified individual and ear is a continuous function of azimuth with a period of 2π. Therefore, it can be expanded as a real- or complex-valued azimuthal Fourier series:
$$H(\theta_S, f) = H_0^{(1)}(f) + \sum_{q=1}^{+\infty} \left[ H_q^{(1)}(f) \cos q\theta_S + H_q^{(2)}(f) \sin q\theta_S \right] = \sum_{q=-\infty}^{+\infty} H_q(f) \exp(jq\theta_S). \qquad (11.5.8)$$
Therefore, H(θS, f) is decomposed into a weighted sum of infinite orders of azimuthal harmonics. The azimuthal harmonics {cos qθS, sin qθS} or {exp(jqθS)} are an infinite set of orthogonal basis functions that depend only on θS. The frequency-dependent coefficients or weights {Hq^(1)(f), Hq^(2)(f)} or {Hq(f)} represent the azimuthal spectrum of HRTFs. They can be evaluated from the continuous H(θS, f) as follows:
$$H_0^{(1)}(f) = \frac{1}{2\pi} \int_{-\pi}^{\pi} H(\theta_S, f)\, d\theta_S$$
$$H_q^{(1)}(f) = \frac{1}{\pi} \int_{-\pi}^{\pi} H(\theta_S, f) \cos q\theta_S \, d\theta_S \qquad H_q^{(2)}(f) = \frac{1}{\pi} \int_{-\pi}^{\pi} H(\theta_S, f) \sin q\theta_S \, d\theta_S \qquad (11.5.9)$$
$$H_0(f) = H_0^{(1)}(f) \qquad H_q(f) = \frac{1}{2}\left[ H_q^{(1)}(f) - jH_q^{(2)}(f) \right] \qquad H_{-q}(f) = \frac{1}{2}\left[ H_q^{(1)}(f) + jH_q^{(2)}(f) \right] \qquad q = 1, 2, 3 \dots$$
In addition to frequency, {Hq^(1)(f), Hq^(2)(f)} or {Hq(f)} depend on elevation, the individual, and the ear. For simplicity, however, these variables are excluded from the following discussion. Moreover, the weights here are written as functions of frequency; they should not be confused with the notation for Hankel functions in Chapters 9 and 10.
If the HRTF is azimuthally bandlimited such that all azimuthal harmonics with order |q| > Q vanish [that is, Hq^(1)(f) = Hq^(2)(f) = Hq(f) = 0 for |q| > Q], Equation (11.5.8) becomes

$$H(\theta_S, f) = H_0^{(1)}(f) + \sum_{q=1}^{Q} \left[ H_q^{(1)}(f) \cos q\theta_S + H_q^{(2)}(f) \sin q\theta_S \right] = \sum_{q=-Q}^{+Q} H_q(f) \exp(jq\theta_S). \qquad (11.5.10)$$
|
In this case, H(θS, f) is composed of (2Q + 1) azimuthal harmonics and determined by the (2Q + 1) azimuthal Fourier coefficients {Hq(1) (f ), Hq(2) (f )} or {Hq(f)}. These (2Q + 1)
azimuthal Fourier coefficients can be evaluated from the HRTFs measured or sampled at

Binaural reproduction and virtual auditory display 517
M uniform azimuths within −π < θS ≤ π (−180° < θS ≤ 180°). Let H(θi, f) be the HRTFs measured at M uniform azimuths, so Equation (11.5.10) yields
$$H(\theta_i, f) = H_0^{(1)}(f) + \sum_{q=1}^{Q} \left[ H_q^{(1)}(f) \cos q\theta_i + H_q^{(2)}(f) \sin q\theta_i \right] = \sum_{q=-Q}^{+Q} H_q(f) \exp(jq\theta_i) \qquad (11.5.11)$$
$$\theta_i = \frac{2\pi}{M}\, i, \qquad i = 0, 1 \dots (M - 1).$$
If θi in Equation (11.5.11) exceeds π, it should be replaced with θi − 2π to keep the azimuthal variable within the range −π < θ ≤ π because of the periodic variation in azimuth. For

$$M \geq 2Q + 1, \qquad (11.5.12)$$
the (2Q + 1) azimuthal Fourier coefficients can be solved from the M linear equations expressed in Equation (11.5.11). The Fourier coefficients are calculated using the discrete orthogonality of trigonometric functions from Equations (4.3.16) to (4.3.18):
$$H_0^{(1)}(f) = \frac{1}{M} \sum_{i=0}^{M-1} H(\theta_i, f)$$
$$H_q^{(1)}(f) = \frac{2}{M} \sum_{i=0}^{M-1} H(\theta_i, f) \cos q\theta_i \qquad H_q^{(2)}(f) = \frac{2}{M} \sum_{i=0}^{M-1} H(\theta_i, f) \sin q\theta_i \qquad 1 \leq q \leq Q \qquad (11.5.13)$$
$$H_q^{(1)}(f) = H_q^{(2)}(f) = 0 \qquad Q < q \leq (M - 1)/2.$$
Substituting Equation (11.5.13) into Equation (11.5.11) leads to an interpolation equation for the azimuthally continuous HRTFs:
$$\hat{H}(\theta_S, f) = \frac{1}{M} \sum_{i=0}^{M-1} H(\theta_i, f)\, \frac{\sin\left[ \left( Q + \frac{1}{2} \right)(\theta_S - \theta_i) \right]}{\sin\left[ \dfrac{\theta_S - \theta_i}{2} \right]}. \qquad (11.5.14)$$
The comparison of Equation (11.5.14) with Equation (11.5.1) yields the following weights for azimuthal interpolation:
$$A_i = \frac{1}{M}\, \frac{\sin\left[ \left( Q + \frac{1}{2} \right)(\theta_S - \theta_i) \right]}{\sin\left[ \dfrac{\theta_S - \theta_i}{2} \right]} = \frac{1}{M} \left[ 1 + 2 \sum_{q=1}^{Q} \left( \cos q\theta_i \cos q\theta_S + \sin q\theta_i \sin q\theta_S \right) \right]. \qquad (11.5.15)$$

Overall, at each given elevation ϕ0, azimuthal HRTFs can be decomposed into a weighted sum of azimuthal harmonics. If the azimuthal HRTF can be represented by azimuthal harmonics up to order Q, azimuthally continuous HRTFs can be reconstructed from M ≥ (2Q + 1) azimuthal measurements uniformly distributed in the region −π < θ ≤ π. In other words, the azimuthal sampling rate should be at least twice the azimuthal Fourier harmonic bandwidth of the HRTFs; otherwise, spatial aliasing occurs in the interpolated HRTFs. This statement is the azimuthal sampling theorem of HRTFs, analogous to the Shannon–Nyquist theorem for time sampling.
The minimal number of azimuthal measurements for the recovery of azimuthally continuous HRTFs is expressed in Equation (11.5.12). An analysis of the KEMAR HRTFs in the horizontal plane indicates that the highest order Q of the azimuthal harmonics increases with frequency, with Q = 32 within the frequency range f ≤ 20 kHz. In this case, the azimuthal harmonics up to order 32 contribute more than 0.99 of the mean relative energy of the HRTFs (a relative energy error below 1%). The Shannon–Nyquist azimuthal sampling theorem then requires a minimum of Mmin = (2Q + 1) = 65 azimuthal measurements to recover the azimuthally continuous HRTFs in the horizontal plane. As the source elevation deviates from the horizontal plane, the minimal number of azimuthal measurements required decreases. Analyses of human HRTFs yield similar results. Similar to the case in Section 11.5.1, imposing arrival time correction on HRIRs or considering HRTF magnitudes alone markedly reduces the minimal number of azimuthal measurements needed to recover azimuthally continuous HRTFs.
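The discrete analysis and reconstruction described above can be sketched as follows, assuming HRTFs measured at M uniform azimuths θi = 2πi/M and stored row-wise; the code evaluates the complex coefficients Hq(f) of Equation (11.5.13) and resynthesizes the bandlimited expansion of Equation (11.5.10) at an arbitrary azimuth.

```python
import numpy as np

def azimuthal_coefficients(hrtfs, order_q):
    """Complex azimuthal Fourier coefficients Hq(f), q = -Q ... +Q, from
    HRTFs at M uniform azimuths [Equation (11.5.13) combined with
    Hq = (Hq1 - j*Hq2)/2]; requires M >= 2Q + 1, Equation (11.5.12)."""
    m = hrtfs.shape[0]
    assert m >= 2 * order_q + 1, "azimuthal sampling theorem violated"
    theta = 2.0 * np.pi * np.arange(m) / m
    q = np.arange(-order_q, order_q + 1)
    # Hq(f) = (1/M) * sum_i H(theta_i, f) * exp(-j q theta_i)
    return np.exp(-1j * np.outer(q, theta)) @ hrtfs / m

def reconstruct_azimuth(coeffs, theta_s):
    """Evaluate the bandlimited series of Equation (11.5.10) at theta_s."""
    order_q = (coeffs.shape[0] - 1) // 2
    q = np.arange(-order_q, order_q + 1)
    return np.exp(1j * q * theta_s) @ coeffs
```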
The preceding discussion can be extended to the three-dimensional case. Far-field HRTFs can be decomposed into real- or complex-valued spherical harmonic functions (Evans et al., 1998). Similar to the case of spatial Ambisonics in Section 9.1.2, the source direction is denoted by ΩS; then
$$H(\Omega_S, f) = \sum_{l=0}^{\infty} \sum_{m=0}^{l} \sum_{\sigma=1}^{2} H_{lm}^{(\sigma)}(f)\, Y_{lm}^{(\sigma)}(\Omega_S) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} H_{lm}(f)\, Y_{lm}(\Omega_S). \qquad (11.5.16)$$
Therefore, H(ΩS, f) is decomposed into a weighted sum of infinite orders of spherical harmonic functions. Spherical harmonic functions are the orthogonal basis functions of ΩS. The frequency-dependent weights [spherical harmonic coefficients Hlm^(σ)(f) or Hlm(f)], which represent the spherical harmonic spectrum of HRTFs, can be evaluated from the directionally continuous H(ΩS, f) by using the orthogonality of spherical harmonic functions, as given by Equation (A.12) in Appendix A.
If the HRTF is spatially bandlimited so that all spherical harmonic components with the order l ≥ L vanish, Equation (11.5.16) becomes
$$H(\Omega_S, f) = \sum_{l=0}^{L-1} \sum_{m=0}^{l} \sum_{\sigma=1}^{2} H_{lm}^{(\sigma)}(f)\, Y_{lm}^{(\sigma)}(\Omega_S) = \sum_{l=0}^{L-1} \sum_{m=-l}^{l} H_{lm}(f)\, Y_{lm}(\Omega_S). \qquad (11.5.17)$$
In this case, the HRTF is determined by L² spherical harmonic coefficients. Given the measured HRTFs at M discrete sampled directions, Equation (11.5.17) leads to M linear equations in the L² spherical harmonic coefficients:
$$H(\Omega_i, f) = \sum_{l=0}^{L-1} \sum_{m=0}^{l} \sum_{\sigma=1}^{2} H_{lm}^{(\sigma)}(f)\, Y_{lm}^{(\sigma)}(\Omega_i) = \sum_{l=0}^{L-1} \sum_{m=-l}^{l} H_{lm}(f)\, Y_{lm}(\Omega_i), \qquad (11.5.18)$$

where H(Ωi, f), i = 0, 1 … (M − 1) are the M measured HRTFs. Similar to the case of spatial Ambisonics in Section 9.3.2, for M ≥ L², Equation (11.5.18) can be solved by the pseudoinverse method. In particular, if the measured directions (directional sampling on a spherical surface) are chosen such that the spherical harmonic functions satisfy the discrete orthogonality up to order (L − 1) expressed in Equation (A.20) in Appendix A, an exact solution for the spherical harmonic coefficients can be found. Directionally continuous HRTFs are obtained by substituting the resultant spherical harmonic coefficients into Equation (11.5.17). M ≥ L² gives the lower limit on the number of directional measurements required to recover the directionally continuous HRTF, a consequence of the Shannon–Nyquist spatial sampling theorem. The number of directional measurements required in practice is usually larger than this lower limit and depends on the directional sampling scheme.
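A hedged sketch of the pseudoinverse (least-squares) solution of Equation (11.5.18) is shown below, using SciPy's complex-valued spherical harmonics. The angle conventions (azimuth and elevation in radians, elevation converted to colatitude for SciPy) and the coefficient ordering are assumptions of the example.

```python
import numpy as np
from scipy.special import sph_harm

def sh_coefficients(hrtfs, azimuths, elevations, order_l):
    """Least-squares solution of Equation (11.5.18) for the L**2 complex
    spherical harmonic coefficients; requires M >= L**2 measurements.

    hrtfs: (M, N) complex HRTFs at M measured directions
    """
    colat = np.pi / 2.0 - np.asarray(elevations)  # elevation -> colatitude
    az = np.asarray(azimuths)
    # Build the (M, L**2) matrix of Y_lm(Omega_i), ordered by (l, m)
    y = np.stack([sph_harm(m, l, az, colat)
                  for l in range(order_l) for m in range(-l, l + 1)], axis=1)
    coeffs, *_ = np.linalg.lstsq(y, hrtfs, rcond=None)
    return coeffs                                 # shape (L**2, N)
```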
As stated in Section 11.2.1, near-field HRTF measurements are relatively difficult because of their dependence on source distance. The spherical harmonic decomposition algorithm can be extended to the distance extrapolation of HRTFs, that is, to estimate near-field HRTFs on the basis of far-field measurements (Duraiswami et al., 2004; Pollow et al., 2012; Zhang et al., 2010). According to the acoustic principle of reciprocity, the pressure is invariant when the positions of the sound source and the receiver are exchanged. Therefore, for a source located at the position of the ear, the pressure at the receiver position (rS, ΩS) and hence the HRTF can be decomposed into spherical harmonic functions as
$$H(r_S, \Omega_S, f) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} H_{lm}(f)\, \Xi_l(kr_S)\, Y_{lm}(\Omega_S)$$
$$= H_{00}^{(1)}(f)\, \Xi_0(kr_S)\, Y_{00}^{(1)}(\Omega_S) + \sum_{l=1}^{\infty} \sum_{m=0}^{l} \left[ H_{lm}^{(1)}(f)\, \Xi_l(kr_S)\, Y_{lm}^{(1)}(\Omega_S) + H_{lm}^{(2)}(f)\, \Xi_l(kr_S)\, Y_{lm}^{(2)}(\Omega_S) \right], \qquad (11.5.19)$$
and
$$\Xi_l(kr_S) = (-j)^{l+1}\, kr_S\, \exp(jkr_S)\, h_l(kr_S), \qquad (11.5.20)$$
where hl(krS) is the l-th-order spherical Hankel function of the second kind. Similar to the case of far-field HRTFs, if all spherical harmonic components with order l ≥ L vanish in Equation (11.5.19), the spherical harmonic coefficients of the decomposition can be evaluated from M directional measurements H(r0, Ωi, f), i = 0, 1 … (M − 1) at a constant source distance rS = r0 (for example, at a far-field distance), provided that M satisfies the condition of the Shannon–Nyquist spatial sampling theorem. Distance- and directionally continuous HRTFs can then be obtained by substituting the resultant spherical harmonic coefficients into Equation (11.5.19).
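Under these assumptions, distance extrapolation reduces to rescaling each order-l coefficient by the ratio of the radial factors Ξl(kr) at the two distances. The sketch below, written for a single frequency (scalar wavenumber k) and the (l, m) coefficient ordering of the previous example, is one possible implementation, not the specific algorithm of the cited works.

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

def xi(l, kr):
    """Radial factor of Equation (11.5.20); h_l is the spherical Hankel
    function of the second kind, h_l(x) = j_l(x) - j*y_l(x)."""
    h2 = spherical_jn(l, kr) - 1j * spherical_yn(l, kr)
    return (-1j) ** (l + 1) * kr * np.exp(1j * kr) * h2

def extrapolate_distance(coeffs_r0, k, r0, r_target, order_l):
    """Rescale coefficients estimated at distance r0 to distance r_target,
    following Equation (11.5.19): each order-l term carries Xi_l(k*r)."""
    scaled = np.array(coeffs_r0, dtype=complex)
    idx = 0
    for l in range(order_l):
        ratio = xi(l, k * r_target) / xi(l, k * r0)
        scaled[idx : idx + 2 * l + 1] *= ratio
        idx += 2 * l + 1
    return scaled
```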
The basis functions in spatial harmonic decomposition are continuous and predetermined, which is what allows spatially continuous HRTFs to be recovered. However, the efficiency of spatial harmonic decomposition is not optimal from the viewpoint of data reduction because relatively many basis functions are usually required. HRTFs can also be decomposed into other spatial basis functions, although these functions may be neither continuous nor predetermined. If HRTFs can be decomposed into a small set of spatial basis functions, the dimensionality of the data is greatly reduced, and HRTFs in more directions can be recovered from a small set of directional measurements (fewer than the requirement of the Shannon–Nyquist spatial sampling