акустика / xie_bosun_spatial_sound_principles_and_applications.pdf

Binaural reproduction and virtual auditory display  513

For a (Q, P)-order ARMA or IIR filter model, the system function is

$$H(z) = \frac{b_0 + b_1 z^{-1} + \dots + b_Q z^{-Q}}{1 + a_1 z^{-1} + \dots + a_P z^{-P}} = \frac{\sum_{q=0}^{Q} b_q z^{-q}}{1 + \sum_{p=1}^{P} a_p z^{-p}}, \tag{11.4.6}$$

where $(a_p, b_q)$ are a set of $(Q + P + 1)$ filter coefficients.
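As an illustration of Equation (11.4.6), the following minimal sketch evaluates the system function on the unit circle, z = exp(jω). The function name `arma_frequency_response` and the frequency grid are assumptions made for this example only; they do not come from the text.

```python
import numpy as np

def arma_frequency_response(b, a, n_freqs=256):
    """Evaluate H(z) of Eq. (11.4.6) at z = exp(j*omega), omega in [0, pi].

    b: numerator coefficients [b0, b1, ..., bQ]
    a: denominator coefficients [a1, ..., aP] (the leading 1 is implicit)
    """
    b = np.asarray(b, dtype=complex)
    a = np.asarray(a, dtype=complex)
    omega = np.linspace(0.0, np.pi, n_freqs)
    z_inv = np.exp(-1j * omega)            # z^{-1} on the unit circle
    q = np.arange(len(b))                  # q = 0..Q
    p = np.arange(1, len(a) + 1)           # p = 1..P
    numerator = (b[None, :] * z_inv[:, None] ** q[None, :]).sum(axis=1)
    denominator = 1.0 + (a[None, :] * z_inv[:, None] ** p[None, :]).sum(axis=1)
    return omega, numerator / denominator

# Hypothetical one-pole example: H(z) = 0.5 / (1 - 0.5 z^{-1})
omega, response = arma_frequency_response([0.5], [-0.5], n_freqs=3)
```

For this one-pole example the response equals 1 at ω = 0 and 1/3 at ω = π, which is the expected low-pass behaviour.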

With HRTF-based filter designs, the coefficients in Equation (11.4.5) or Equation (11.4.6) are selected so that the filter response exactly or approximately matches the target HRTF or HRIR under some mathematical or perceptual criterion. HRTF-based filters have been designed with various methods, including conventional time windowing or frequency sampling methods for FIR filters and the Prony or Yule–Walker method for IIR filters. A balanced model truncation (BMT) method has been used to design an IIR filter from the original finite-length HRIR (Mackenzie et al., 1997). To improve the computational efficiency of binaural synthesis for various target virtual source directions, Haneda et al. (1999) proposed a common acoustical pole and zero (CAPZ) model for HRTF-based filters. In this model, HRTFs in M directions are represented by an ARMA model with direction-independent poles and direction-dependent zeros. The poles represent the direction-independent peaks in HRTF magnitudes caused by ear canal resonance, and the zeros represent the directional variation of HRTFs. For a group of HRTFs in M directions, the CAPZ model involves fewer parameters and is simpler than conventional ARMA models.

The performance of HRTF-based filters can be evaluated using various error criteria. HRTF-based filters are usually designed to minimize certain physical errors, such as the square error between filter and target responses. Some methods based on auditory error criteria, such as the logarithmic error criterion method (Blommer and Wakefield, 1997) and the frequency-warped method (Härmä et al., 2000), have also been suggested for HRTF-based filter designs. HRTF-based filter design is thus an important topic in VAD (Xie, 2008a, 2013; Huopaniemi et al., 1999).

11.5  SPATIAL INTERPOLATION AND DECOMPOSITION OF HRTFS

11.5.1  Directional interpolation of HRTFs

Far-field HRTFs are continuous functions of the source direction (θS, ϕS). As stated in Section 11.2.1, HRTFs are usually measured in discrete and finite directions, that is, sampled in directions around a spatial spherical surface or a horizontal circle. Under certain conditions, HRTFs in unmeasured directions can be reconstructed or estimated from the measured data by using various interpolation schemes.

For example, at a constant source distance r = rS, the HRTF at an arbitrary unmeasured azimuth θS can be estimated from the HRTFs measured at M horizontal azimuths [that is, H(θi, f) with i = 0, 1, …, M − 1] by using the linear interpolation scheme:

$$\hat{H}(\theta_S, f) \approx \sum_{i=0}^{M-1} A_i H(\theta_i, f), \tag{11.5.1}$$

514  Spatial Sound

where the subscript for the left or right ear is omitted, $\hat{H}(\theta_S, f)$ is the interpolated HRTF, and $A_i = A_i(\theta_S)$ is a set of weights related to the target azimuth. Various interpolation schemes can be developed using different methods for selecting the measured azimuths and weights. In each direction, digital measurement leads to the HRTF at N discrete frequencies; therefore, Equation (11.5.1) is the directional interpolation equation for the N discrete frequencies $f = f_k$ ($k = 0, 1, \dots, N-1$).

Equation (11.5.1) can be extended to three-dimensional spatial directions as

$$\hat{H}(\theta_S, \phi_S, f) \approx \sum_{i=0}^{M-1} A_i H(\theta_i, \phi_i, f), \tag{11.5.2}$$

 

where $\hat{H}(\theta_S, \phi_S, f)$ is the interpolated HRTF at the arbitrary unmeasured direction $(\theta_S, \phi_S)$, and $H(\theta_i, \phi_i, f)$ with $i = 0, 1, 2, \dots, M-1$ are the HRTFs at M measured directions. Equations (11.5.1) and (11.5.2) are HRTF linear interpolation equations in the frequency domain and are applicable to both complex-valued HRTFs and HRTF magnitudes. Interpolating HRTF magnitudes alone improves interpolation performance.

Because of the linear characteristic of temporal-frequency Fourier transformation, Equations (11.5.1) and (11.5.2) are also applicable to the HRIRs in the time domain. For example, the time domain version of Equation (11.5.2) is given by

$$\hat{h}(\theta_S, \phi_S, t) \approx \sum_{i=0}^{M-1} A_i h(\theta_i, \phi_i, t). \tag{11.5.3}$$

If HRTFs satisfy the minimum-phase approximation given by Equation (11.4.1), Equations (11.5.1) to (11.5.3) are also applicable to the minimum-phase HRTFs or HRIRs. Interpolating minimum-phase HRTFs improves interpolation performance. However, the resultant HRTF is not always a minimum-phase function because a weighted sum of minimum-phase functions is not necessarily minimum phase. In addition, some work has suggested imposing arrival time correction on HRIR interpolation (Matsumoto et al., 2004). That is, prior to interpolation, the HRIRs for all source directions are time-aligned by shifting the onset of each HRIR. Time correction also improves the performance of interpolation.
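The effect of arrival time correction can be sketched with a toy example. The code below is only an illustration under stated assumptions: the onset rule (first sample reaching 10% of the peak), the integer-sample circular shift, and the step that re-inserts a weighted-average delay after interpolation are choices made for this sketch, not details taken from Matsumoto et al. (2004).

```python
import numpy as np

def onset_index(hrir, threshold=0.1):
    """First sample whose magnitude reaches threshold * peak (assumed onset rule)."""
    magnitude = np.abs(hrir)
    return int(np.argmax(magnitude >= threshold * magnitude.max()))

def interpolate_with_time_correction(hrirs, weights):
    """Align HRIR onsets, take the weighted sum, then restore an interpolated delay."""
    hrirs = np.asarray(hrirs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    onsets = np.array([onset_index(h) for h in hrirs])
    # Shift every HRIR so that its onset sits at sample 0 (circular shift;
    # assumes the samples wrapped to the end are negligible).
    aligned = np.array([np.roll(h, -n) for h, n in zip(hrirs, onsets)])
    blended = weights @ aligned
    # Assumed delay of the interpolated response: weighted sum of the
    # measured onsets, rounded to the nearest sample.
    delay = int(round(float(weights @ onsets)))
    return np.roll(blended, delay)

# Two idealized HRIRs: unit impulses arriving at samples 3 and 7.
h_a = np.zeros(16); h_a[3] = 1.0
h_b = np.zeros(16); h_b[7] = 1.0
direct = 0.5 * h_a + 0.5 * h_b                       # Eq. (11.5.3) without correction
corrected = interpolate_with_time_correction([h_a, h_b], [0.5, 0.5])
```

Without correction, the weighted sum contains two half-height impulses at samples 3 and 7; with correction, it is a single unit impulse at sample 5, which is closer to what an HRIR at the intermediate direction would look like.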

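A simple example of directional interpolation is adjacent linear interpolation. Within the azimuthal region θi < θS < θi+1 in the horizontal plane, an HRTF at azimuth θS is approximated by the first-order term of its Taylor expansion in θS:

$$\begin{aligned}
H(\theta_S, f) &\approx H(\theta_i, f) + \left.\frac{\partial H(\theta_S, f)}{\partial \theta_S}\right|_{\theta_S = \theta_i} (\theta_S - \theta_i) \\
&\approx H(\theta_i, f) + \frac{H(\theta_{i+1}, f) - H(\theta_i, f)}{\theta_{i+1} - \theta_i}\,(\theta_S - \theta_i),
\end{aligned} \tag{11.5.4}$$

or

$$\hat{H}(\theta_S, f) \approx A_{i+1} H(\theta_{i+1}, f) + A_i H(\theta_i, f). \tag{11.5.5}$$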


The weights are given by

$$A_{i+1} = \frac{\theta_S - \theta_i}{\theta_{i+1} - \theta_i}, \qquad A_i = 1 - \frac{\theta_S - \theta_i}{\theta_{i+1} - \theta_i}. \tag{11.5.6}$$

 

Therefore, the unmeasured HRTF at θS is approximated as the weighted sum of a pair of adjacent HRTFs, and the weights Ai and Ai+1 are independent of frequency. Equation (11.5.5) is the equation of conventional adjacent linear interpolation, a special case of Equation (11.5.1).
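Adjacent linear interpolation is short enough to write out directly. The sketch below assumes that each HRTF is stored as an array over the N discrete frequencies; the function names are hypothetical.

```python
import numpy as np

def adjacent_linear_weights(theta_s, theta_i, theta_next):
    """Weights (A_i, A_{i+1}) of Eq. (11.5.6) for theta_i <= theta_s <= theta_next."""
    a_next = (theta_s - theta_i) / (theta_next - theta_i)
    return 1.0 - a_next, a_next

def interpolate_adjacent(theta_s, theta_i, theta_next, hrtf_i, hrtf_next):
    """Eq. (11.5.5): weighted sum of the two neighbouring HRTFs (arrays over frequency)."""
    a_i, a_next = adjacent_linear_weights(theta_s, theta_i, theta_next)
    return a_i * np.asarray(hrtf_i) + a_next * np.asarray(hrtf_next)

# Hypothetical HRTFs at 30 and 35 degrees, two frequency bins each.
estimate = interpolate_adjacent(32.5, 30.0, 35.0, [1 + 0j, 2 + 0j], [3 + 0j, 4j])
```

Because the weights do not depend on frequency, they are computed once per target azimuth and reused for every frequency bin.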

Bilinear interpolation is a three-dimensional extension of adjacent linear interpolation (Wightman et al., 1992). Given that HRTFs are measured at a constant source distance r = rS, the spherical surface (upon which the source is located) is sampled along both azimuthal and elevation directions, resulting in a measurement grid, with its vertices representing the source directions for measurement. The HRTF at an unmeasured direction within the grid is approximated as a weighted sum or average of the HRTFs associated with the four nearest directions.

A spherical triangular interpolation scheme has also been proposed (Freeland et al., 2004). The measured positions form a triangular grid on a spherical surface. The HRTF in an unmeasured direction within a grid cell is approximated as a weighted sum of the measured HRTFs at the three adjacent vertices of the grid.

Similar to the case of the HRTF-based filter in Section 11.4, the performance of interpolation can be evaluated using various error criteria. For example, the relative energy error is defined as

$$\mathrm{Err}_R(\theta_S, \phi_S, f) = \frac{\left| H(\theta_S, \phi_S, f) - \hat{H}(\theta_S, \phi_S, f) \right|^2}{\left| H(\theta_S, \phi_S, f) \right|^2} \ (\times 100\%), \tag{11.5.7}$$

where $H(\theta_S, \phi_S, f)$ and $\hat{H}(\theta_S, \phi_S, f)$ are the target and interpolated HRTFs, respectively.
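Equation (11.5.7) translates directly into a per-frequency computation. A minimal sketch, assuming the target and interpolated HRTFs are given as complex arrays over frequency (the function name is hypothetical):

```python
import numpy as np

def relative_energy_error(target, interpolated):
    """Eq. (11.5.7) per frequency bin, in percent."""
    target = np.asarray(target, dtype=complex)
    interpolated = np.asarray(interpolated, dtype=complex)
    return 100.0 * np.abs(target - interpolated) ** 2 / np.abs(target) ** 2

# A 10% amplitude deviation in the first bin gives a 1% relative energy error.
errors = relative_energy_error([1.0, 2j], [0.9, 2j])
```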

11.5.2  Spatial basis function decomposition and spatial sampling theorem of HRTFs

HRTFs are multivariable functions. Even for a specified individual and in the far field, HRTFs are complex-valued functions of the source direction (θS, ϕS) and frequency f. This multivariable dependence gives the entire HRTF data set a high dimensionality, so the analysis and representation of HRTFs are complicated. Alternatively, an efficient or low-dimensional representation of HRTFs can be achieved by decomposing HRTFs into a weighted sum of appropriate basis functions, where the dependencies of HRTFs on different variables are separately represented by variations in basis functions and weights. The basis function decompositions of HRTFs are applicable to the binaural synthesis of multiple virtual sources in VAD.

HRTF linear decomposition is categorized into two basic types: spatial basis function decomposition and spectral shape basis function decomposition. The former is addressed here, and the latter is discussed in Section 11.5.4. Several spatial basis function decomposition schemes exist. Among them, the spatial harmonic decomposition scheme, which is closely related to Ambisonics, leads to the spatial sampling theorem of HRTFs and is applicable to simplifying HRTF representation and binaural synthesis in VAD.


The azimuthal harmonic decomposition of horizontal HRTFs is discussed first (Zhong and Xie, 2005, 2009). At each given elevation ϕS = ϕ0, such as the horizontal plane ϕS = 0°, a far-field HRTF for a specified individual and ear is a continuous function of the azimuth with a period of 2π. Therefore, it can be expanded as a real- or complex-valued azimuthal Fourier series as

$$\begin{aligned}
H(\theta_S, f) &= H_0^{(1)}(f) + \sum_{q=1}^{+\infty} \left[ H_q^{(1)}(f) \cos q\theta_S + H_q^{(2)}(f) \sin q\theta_S \right] \\
&= \sum_{q=-\infty}^{+\infty} H_q(f) \exp(jq\theta_S).
\end{aligned} \tag{11.5.8}$$

 

 

 

 

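Therefore, $H(\theta_S, f)$ is decomposed into a weighted sum of infinite orders of azimuthal harmonics. The azimuthal harmonics $\{\cos q\theta_S, \sin q\theta_S\}$ or $\{\exp(jq\theta_S)\}$ are an infinite set of orthogonal basis functions. They depend only on $\theta_S$. The frequency-dependent coefficients or weights $\{H_q^{(1)}(f), H_q^{(2)}(f)\}$ or $\{H_q(f)\}$ represent the azimuthal spectrum of HRTFs. They can be evaluated from the continuous $H(\theta_S, f)$ as follows:

$$\begin{aligned}
H_0^{(1)}(f) &= \frac{1}{2\pi} \int_{-\pi}^{\pi} H(\theta_S, f)\, d\theta_S, \\
H_q^{(1)}(f) &= \frac{1}{\pi} \int_{-\pi}^{\pi} H(\theta_S, f) \cos q\theta_S\, d\theta_S, \qquad
H_q^{(2)}(f) = \frac{1}{\pi} \int_{-\pi}^{\pi} H(\theta_S, f) \sin q\theta_S\, d\theta_S, \\
H_0(f) &= H_0^{(1)}(f), \qquad
H_q(f) = \frac{1}{2} \left[ H_q^{(1)}(f) - jH_q^{(2)}(f) \right], \qquad
H_{-q}(f) = \frac{1}{2} \left[ H_q^{(1)}(f) + jH_q^{(2)}(f) \right], \\
q &= 1, 2, 3, \dots
\end{aligned} \tag{11.5.9}$$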

 

 

 

 

 

 

 

In addition to frequency, $\{H_q^{(1)}(f), H_q^{(2)}(f)\}$ or $\{H_q(f)\}$ depend on elevation, individual, and ear. For simplicity, however, these variables are omitted from the following discussion. Moreover, the weights here are written as functions of frequency; they should not be confused with the notation of Hankel functions in Chapters 9 and 10.

If the HRTF is azimuthally bandlimited such that all azimuthal harmonics with the order $|q| > Q$ vanish [that is, $H_q^{(1)}(f) = H_q^{(2)}(f) = H_q(f) = 0$ for $|q| > Q$], Equation (11.5.8) becomes

$$\begin{aligned}
H(\theta_S, f) &= H_0^{(1)}(f) + \sum_{q=1}^{Q} \left[ H_q^{(1)}(f) \cos q\theta_S + H_q^{(2)}(f) \sin q\theta_S \right] \\
&= \sum_{q=-Q}^{+Q} H_q(f) \exp(jq\theta_S).
\end{aligned} \tag{11.5.10}$$

 

 

 

 

 

In this case, $H(\theta_S, f)$ is composed of (2Q + 1) azimuthal harmonics and determined by the (2Q + 1) azimuthal Fourier coefficients $\{H_q^{(1)}(f), H_q^{(2)}(f)\}$ or $\{H_q(f)\}$. These (2Q + 1) azimuthal Fourier coefficients can be evaluated from the HRTFs measured or sampled at M uniform azimuths within −π < θS ≤ π (−180° < θS ≤ 180°). Let $H(\theta_i, f)$ be the HRTFs measured at M uniform azimuths, so Equation (11.5.10) yields

$$\begin{aligned}
H(\theta_i, f) &= H_0^{(1)}(f) + \sum_{q=1}^{Q} \left[ H_q^{(1)}(f) \cos q\theta_i + H_q^{(2)}(f) \sin q\theta_i \right] \\
&= \sum_{q=-Q}^{+Q} H_q(f) \exp(jq\theta_i), \\
\theta_i &= \frac{2\pi i}{M}, \qquad i = 0, 1, \dots, (M - 1).
\end{aligned} \tag{11.5.11}$$

If θi in Equation (11.5.11) exceeds π, it should be replaced with θi − 2π to keep the azimuthal variable within the range of −π < θ ≤ π because of the periodic variation in azimuth θi. For

$$M \geq (2Q + 1), \tag{11.5.12}$$

the (2Q + 1) azimuthal Fourier coefficients can be solved using the M linear equations expressed in Equation (11.5.11). The Fourier coefficients are calculated using the discrete orthogonality of the trigonometric functions from Equations (4.3.16) to (4.3.18):

 

 

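$$\begin{aligned}
H_0^{(1)}(f) &= \frac{1}{M} \sum_{i=0}^{M-1} H(\theta_i, f), \\
H_q^{(1)}(f) &= \frac{2}{M} \sum_{i=0}^{M-1} H(\theta_i, f) \cos q\theta_i, \qquad
H_q^{(2)}(f) = \frac{2}{M} \sum_{i=0}^{M-1} H(\theta_i, f) \sin q\theta_i, \qquad 1 \leq q \leq Q, \\
H_q^{(1)}(f) &= H_q^{(2)}(f) = 0, \qquad Q < q \leq (M - 1)/2.
\end{aligned} \tag{11.5.13}$$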
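The discrete sums of Equation (11.5.13) can be checked numerically. The sketch below handles one frequency at a time and assumes the samples are ordered as θi = 2πi/M; the function name `azimuthal_coefficients` is hypothetical.

```python
import numpy as np

def azimuthal_coefficients(samples, order):
    """Eq. (11.5.13): samples[i] = H(theta_i, f) at theta_i = 2*pi*i/M, one frequency.

    Returns (H0, Hc, Hs), where Hc[q-1] and Hs[q-1] are the cosine and sine
    coefficients for q = 1..order. Requires M >= 2*order + 1.
    """
    samples = np.asarray(samples, dtype=complex)
    m = len(samples)
    if m < 2 * order + 1:
        raise ValueError("need M >= 2Q + 1 azimuthal samples")
    theta = 2.0 * np.pi * np.arange(m) / m
    q = np.arange(1, order + 1)[:, None]          # shape (Q, 1)
    h0 = samples.mean()
    hc = (2.0 / m) * (np.cos(q * theta) @ samples)
    hs = (2.0 / m) * (np.sin(q * theta) @ samples)
    return h0, hc, hs

# Synthetic bandlimited test function with Q = 3, sampled at M = 7 azimuths.
theta_i = 2.0 * np.pi * np.arange(7) / 7
samples = 1.0 + 2.0 * np.cos(theta_i) + 0.5 * np.sin(3 * theta_i)
h0, hc, hs = azimuthal_coefficients(samples, order=3)
```

For this synthetic input the routine recovers H0 = 1, a first-order cosine coefficient of 2, and a third-order sine coefficient of 0.5, with all other coefficients equal to zero up to rounding.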

 

 

Substituting Equation (11.5.13) into Equation (11.5.11) leads to an interpolation equation for the azimuthal continuous HRTFs:

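$$\hat{H}(\theta_S, f) = \frac{1}{M} \sum_{i=0}^{M-1} H(\theta_i, f)\, \frac{\sin\left[ \left( Q + \frac{1}{2} \right) (\theta_S - \theta_i) \right]}{\sin\left( \frac{\theta_S - \theta_i}{2} \right)}. \tag{11.5.14}$$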

 

 

 

 

 

 

 

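The comparison of Equation (11.5.14) with Equation (11.5.1) yields the following weights for azimuthal interpolation:

$$\begin{aligned}
A_i &= \frac{1}{M}\, \frac{\sin\left[ \left( Q + \frac{1}{2} \right) (\theta_S - \theta_i) \right]}{\sin\left( \frac{\theta_S - \theta_i}{2} \right)} \\
&= \frac{1}{M} \left[ 1 + 2 \sum_{q=1}^{Q} \left( \cos q\theta_i \cos q\theta_S + \sin q\theta_i \sin q\theta_S \right) \right].
\end{aligned} \tag{11.5.15}$$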


Overall, at each given elevation of ϕ0, azimuthal HRTFs can be decomposed as a weighted sum of the azimuthal harmonics. If the azimuthal HRTF can be represented by the azimuthal harmonics up to order Q, azimuthal continuous HRTFs can be reconstructed from M ≥ (2Q + 1) azimuthal measurements uniformly distributed in the −π < θ ≤ π region. In other words, the azimuthal sampling rate should be at least twice the azimuthal Fourier harmonic bandwidth of HRTFs; otherwise, spatial aliasing occurs in interpolated HRTFs. This statement is the azimuthal sampling theorem of HRTFs, which is similar to the Shannon–Nyquist theorem for time sampling.

The minimal number of azimuthal measurements for the recovery of azimuthal continuous HRTFs is expressed in Equation (11.5.12). An analysis of KEMAR HRTFs in the horizontal plane indicates that the highest order Q of the azimuthal harmonics increases with frequency and reaches Q = 32 within the frequency range of f ≤ 20 kHz. In this case, the azimuthal harmonics up to order 32 account for more than 0.99 of the mean relative energy of the HRTFs (relative energy error of less than 1%). The azimuthal sampling theorem then requires at least Mmin = (2Q + 1) = 65 azimuthal measurements to recover the azimuthal continuous HRTFs in the horizontal plane. As the source elevation deviates from the horizontal plane, the minimal number of azimuthal measurements required to recover the azimuthal continuous HRTFs decreases. The analyses of human HRTFs yield similar results. Similar to the case in Section 11.5.1, imposing arrival time correction on HRIRs or considering the HRTF magnitudes alone clearly reduces the minimal number of azimuthal measurements for recovering azimuthal continuous HRTFs.

The preceding discussion can be extended to the three-dimensional case. Far-field HRTFs can be decomposed by real- or complex-valued spherical harmonic functions (Evans et al., 1998). Similar to the case of spatial Ambisonics in Section 9.1.2, the source direction is denoted by ΩS; then

$$H(\Omega_S, f) = \sum_{l=0}^{\infty} \sum_{m=0}^{l} \sum_{\sigma=1}^{2} H_{lm}^{(\sigma)}(f)\, Y_{lm}^{(\sigma)}(\Omega_S) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} H_{lm}(f)\, Y_{lm}(\Omega_S). \tag{11.5.16}$$

 

Therefore, $H(\Omega_S, f)$ is decomposed into a weighted sum of infinite orders of spherical harmonic functions. Spherical harmonic functions are the orthogonal basis functions of $\Omega_S$. The frequency-dependent weights [spherical harmonic coefficients $H_{lm}^{(\sigma)}(f)$ or $H_{lm}(f)$], which represent the spherical harmonic spectrum of HRTFs, can be evaluated from a directional continuous $H(\Omega_S, f)$ by using the orthogonality of spherical harmonic functions, as given by Equation (A.12) in Appendix A.

If the HRTF is spatially bandlimited so that all spherical harmonic components with the order $l \geq L$ vanish, Equation (11.5.16) becomes

$$H(\Omega_S, f) = \sum_{l=0}^{L-1} \sum_{m=0}^{l} \sum_{\sigma=1}^{2} H_{lm}^{(\sigma)}(f)\, Y_{lm}^{(\sigma)}(\Omega_S) = \sum_{l=0}^{L-1} \sum_{m=-l}^{l} H_{lm}(f)\, Y_{lm}(\Omega_S). \tag{11.5.17}$$

 

In this case, the HRTF is determined by $L^2$ spherical harmonic coefficients. Given the measured HRTFs at M discrete-sampled directions, Equation (11.5.17) leads to M linear equations with regard to the $L^2$ spherical harmonic coefficients:

$$H(\Omega_i, f) = \sum_{l=0}^{L-1} \sum_{m=0}^{l} \sum_{\sigma=1}^{2} H_{lm}^{(\sigma)}(f)\, Y_{lm}^{(\sigma)}(\Omega_i) = \sum_{l=0}^{L-1} \sum_{m=-l}^{l} H_{lm}(f)\, Y_{lm}(\Omega_i), \tag{11.5.18}$$

 


where $H(\Omega_i, f)$, $i = 0, 1, \dots, (M - 1)$ are the M measured HRTFs. Similar to the case of spatial Ambisonics in Section 9.3.2, for $M \geq L^2$, Equation (11.5.18) can be solved by the pseudoinverse method. In particular, if the measured directions (directional sampling on a spherical surface) are chosen such that the spherical harmonic functions satisfy the discrete orthogonality up to the (L − 1) order expressed in Equation (A.20) in Appendix A, an accurate solution of the spherical harmonic coefficients can be found. Directional continuous HRTFs are obtained by substituting the resultant spherical harmonic coefficients into Equation (11.5.17). The condition $M \geq L^2$ gives the lower limit on the number of directional measurements required to recover the directional continuous HRTF, which is the consequence of the Shannon–Nyquist spatial sampling theorem. The number of directional measurements required in practice is usually larger than this lower limit, depending on the directional sampling scheme.
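The pseudoinverse solution can be sketched for the lowest nontrivial case, L = 2 (L² = 4 coefficients). Everything specific in the code is an assumption made for illustration: the orthonormal real spherical harmonics written in Cartesian form, the ordering of the four basis functions, and the six sampling directions along the coordinate axes. The book's own normalization and indexing of the spherical harmonics may differ.

```python
import numpy as np

def real_sh_order1(azimuth, elevation):
    """Real spherical harmonics up to order 1 (L = 2, so L**2 = 4 functions).

    Orthonormal convention assumed here; the book's normalization may differ.
    Returns an array of shape (M, 4) for M directions.
    """
    azimuth = np.atleast_1d(np.asarray(azimuth, dtype=float))
    elevation = np.atleast_1d(np.asarray(elevation, dtype=float))
    x = np.cos(elevation) * np.cos(azimuth)
    y = np.cos(elevation) * np.sin(azimuth)
    z = np.sin(elevation)
    c0 = np.sqrt(1.0 / (4.0 * np.pi))
    c1 = np.sqrt(3.0 / (4.0 * np.pi))
    return np.stack([c0 * np.ones_like(x), c1 * x, c1 * y, c1 * z], axis=1)

def fit_sh_coefficients(hrtfs, azimuth, elevation):
    """Solve the M equations of Eq. (11.5.18) for one frequency via the pseudoinverse."""
    basis = real_sh_order1(azimuth, elevation)            # shape (M, 4)
    return np.linalg.pinv(basis) @ np.asarray(hrtfs, dtype=complex)

def evaluate_sh(coefficients, azimuth, elevation):
    """Eq. (11.5.17): HRTF at arbitrary directions from the fitted coefficients."""
    return real_sh_order1(azimuth, elevation) @ coefficients

# Six directions along +/-x, +/-y, +/-z (M = 6 >= L**2 = 4).
az = np.array([0.0, np.pi / 2, np.pi, 3 * np.pi / 2, 0.0, 0.0])
el = np.array([0.0, 0.0, 0.0, 0.0, np.pi / 2, -np.pi / 2])
true_coefficients = np.array([1.0 + 1.0j, 0.5, -0.2j, 0.3])   # synthetic values
measured = evaluate_sh(true_coefficients, az, el)
fitted = fit_sh_coefficients(measured, az, el)
```

Because the synthetic data are exactly bandlimited to order 1 and M = 6 exceeds L² = 4, the fitted coefficients match the synthetic ones, and `evaluate_sh` then gives the value at any unmeasured direction.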

As stated in Section 11.2.1, near-field HRTF measurements are relatively difficult because of their source distance-dependent features. The spherical harmonic decomposition algorithm can be extended to the distance extrapolation of HRTFs, that is, to estimate the near-field HRTFs on the basis of far-field measurements (Duraiswami et al., 2004; Pollow et al., 2012; Zhang et al., 2010). According to the acoustic principle of reciprocity, the pressure is invariant after the positions of the sound source and the receiver are exchanged. Therefore, for a source located at the position of the ear, the pressure at the receiver position (rS, ΩS) and the HRTF can be decomposed by spherical harmonic functions as

$$\begin{aligned}
H(r_S, \Omega_S, f) &= \sum_{l=0}^{\infty} \sum_{m=-l}^{l} H_{lm}(f)\, \Xi_l(kr_S)\, Y_{lm}(\Omega_S) \\
&= H_{00}^{(1)}(f)\, \Xi_0(kr_S)\, Y_{00}^{(1)}(\Omega_S) \\
&\quad + \sum_{l=1}^{\infty} \sum_{m=0}^{l} \left[ H_{lm}^{(1)}(f)\, \Xi_l(kr_S)\, Y_{lm}^{(1)}(\Omega_S) + H_{lm}^{(2)}(f)\, \Xi_l(kr_S)\, Y_{lm}^{(2)}(\Omega_S) \right],
\end{aligned} \tag{11.5.19}$$

and

$$\Xi_l(kr_S) = (-j)^{l+1}\, kr_S \exp(jkr_S)\, h_l(kr_S), \tag{11.5.20}$$

where $h_l(kr_S)$ is the $l$-order spherical Hankel function of the second kind. Similar to the case of far-field HRTFs, if all spherical harmonic components with the order $l \geq L$ vanish in Equation (11.5.19), then the spherical harmonic coefficients of the decomposition can be evaluated from M directional measurements $H(r_0, \Omega_i, f)$, $i = 0, 1, \dots, (M - 1)$ at a constant source distance $r_S = r_0$ (for example, at a far-field distance), provided that M satisfies the condition of the Shannon–Nyquist spatial sampling theorem. The distance- and direction-continuous HRTFs can be obtained by substituting the resultant spherical harmonic coefficients into Equation (11.5.19).

The basis functions in spatial harmonic decomposition are continuous and predetermined, which allows spatially continuous HRTFs to be recovered. However, the efficiency of spatial harmonic decomposition is not optimal from the viewpoint of data reduction because more basis functions are usually required. HRTFs can also be decomposed by other spatial basis functions, although these spatial functions may not be continuous and predetermined. If HRTFs can be decomposed by a small set of spatial basis functions, the dimensionality of data is greatly reduced, and HRTFs in more directions can be recovered from a small set of directional measurements (fewer than the requirement of the Shannon–Nyquist spatial sampling