Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
акустика / xie_bosun_spatial_sound_principles_and_applications.pdf
Скачиваний:
174
Добавлен:
04.05.2023
Размер:
28.62 Mб
Скачать

206  Spatial Sound

Various weights and combinations of errors lead to different optimized decoding coefficients. For example, at low frequencies, the reproduced sound pressure, the velocity localization vector-based azimuth, the velocity vector magnitude, and the energy localization vectorbased azimuth should be optimized preferentially. Accordingly, the combination of Err1, Err3, Err5, and Err6 is preferentially chosen as the cost function. At mid-high frequencies, the reproduced power, energy vector magnitude, velocity localization vector-based azimuth, and energy localization vector-based azimuth should be optimized preferentially. Accordingly, the combination of Err2, Err4, Err5, and Err6 are preferentially chosen as the cost function. This choice of a frequency-dependent cost function leads to two sets of the decoding equations for lowand mid-high frequencies.

In the Tabu search, given the form of the overall cost function, decoding coefficients are initialized at some random values (or some predetermined values that have been derived from other methods). Each coefficient is increased or decreased at a predetermined step size in a given order but is restricted within a predetermined bound. Then, the variation in the overall cost (error) function is evaluated. Decoding coefficients change toward the direction of a decreasing overall cost function, and the best results are kept. A convergent result is obtained by a recursive search. On the basis of the initial coefficients expressed in Equation (5.2.28), Wiggins derived a set of decoding coefficients by using a Tabu search. In comparison with the decoding coefficients given in Equation (5.2.28), decoding with Tabu-search-based coefficients improves the accuracies in a virtual source direction (especially for the front source) and power level, although the maximum rv and rE decrease slightly.

New optimized criteria, such as uniformity and standard deviation in localization, have also been supplemented to the cost function for solving Ambisonic-decoding coefficients (Moore and Wakefield, 2008). Accordingly, the cost function is constructed by supplementing the weighted combination of the standard deviations of Err3, Err4, Err5, and Err6 over the L target directions into Equation (5.2.30). The optimized procedure for the second-order Ambisoniclike signal mixing has been written as software (Heller et al., 2010). Some other mathematical algorithms, such as artificial neuron networks and genetic algorithms, have also been used for nonlinear optimization to derive the Ambisonic-like decoding equations and coefficients of irregular loudspeaker configurations (Tsang and Cheung, 2009; Tsang et al., 2009).

Some studies have indicated that all aforementioned optimized criteria are difficult to satisfy for the second-order Ambisonic-like signal mixing via an ITU5.1-channel loudspeaker configuration. Four loudspeaker configurations are relatively appropriate for the secondorder Ambisonic-like signal mixing. Adding a front-center loudspeaker is not always beneficial to the lateral virtual source for the second-order Ambisonic-like signal mixing. Therefore, region-dependent signal mixing methods can be used. Three front loudspeakers with pairwise amplitude panning (or local Ambisonic-like mixing discussed in Section 5.2.4) can be used to recreate the virtual source within the front region of −30° ≤ θS ≤30°. The four loudspeakers of L, R, LS, and LS with Ambisonic-like signal mixing are used to recreate lateral and rear virtual sources. The design of this signal mixing method is relatively simple. For example, a frequency-dependent gain is designed to reduce the variation in the virtual source position with frequency (Xie, 2001a). Heller et al. (2010) illustrated more examples for four horizontal loudspeakers with irregular configurations.

5.2.4  Optimization of three frontal loudspeaker signals and local Ambisonic-like signal mixing

In multichannel sound reproduction, a horizontal plane can be divided into subregions, and the virtual source in each subregion is recreated by loudspeakers in the same subregion. The loudspeaker signals in each subregion can also be optimized. For local sound field signal

Multichannel horizontal surround sound  207

mixing or local Ambisonic-like signal mixing, the normalized loudspeaker signals in each subregion are a linear combination of azimuthal harmonics. This mixing differs from global Ambisonic (-like) signal mixing. In the latter, the normalized signals of all loudspeakers are a linear combination of azimuthal harmonics.

For 5.1-channel sound, the virtual source in the frontal region is recreated by the three frontal loudspeakers. On the basis of the virtual sound localization theorem, Gerzon (1990) first derived the local Ambisonic-like signal mixing for three frontal loudspeakers. Generally, similar to Equation (5.2.22), Equation (5.2.31) expresses the normalized signals for three frontal loudspeakers with the first-order local Ambisonic-like signal mixing:

 

 

 

 

AL S Atotal D0,1L D1,1L

cos S D1,2L sin S

 

 

 

 

 

 

 

 

 

1

1

2

 

 

 

 

 

 

 

 

 

AR S Atotal D0,R D1,R cos S D1,R sin S

,

 

 

(5.2.31)

 

 

 

 

 

1

1

2

 

 

 

 

 

 

 

 

 

AC S Atotal D0,C D1,C cos S D1,C sin S

 

 

 

 

Form

the

left–right symmetry, decoding

coefficients satisfy

D 1

D 1

, D 1

D 1 ,

D 2

D 2

, D 2

 

 

 

 

 

0,L

0,R

1,L

1,R

0 . If the optimization at low frequencies is considered only, the criteria

1,L

 

1,R

1,C

 

 

 

 

 

 

 

 

 

of the optimized interaural phase delay difference and its variation with the head rotation or the equivalent criteria of the optimized velocity localization vector in Equation (5.2.31) are used to derive the decoding coefficients. Gerzon (1990) presented the results for three frontal loudspeakers arranged at 0° and ±45°. Here, the results for the frontal loudspeakers with an ITU configuration are given. The decoding coefficients are derived by substituting Equation (5.2.31) into Equation (3.2.22) and letting θv = θS and rv = 1. Then, the normalized loudspeaker signals in Equation (5.2.31) become

 

cos S 2

 

 

AL Atotal 1

3 sin S

 

 

cos S 2

 

(5.2.32)

AR Atotal 1

3 sin S

AC Atotal

3 2cos S

 

 

Similar to Equation (4.3.41), Equation (5.2.33) shows Atotal(overall gain) for constantamplitude normalization:

Atotal

1

 

2 3 .

(5.2.33)

Atotal for constant-power normalization is slightly complicated. Equation (5.2.32) proves

that a target-azimuth-independent Atotal results in an overall power of Pow′ = AL2 + AR2 + AC2 that varies with the target azimuth θS. The virtual source direction depends on the relative

magnitude (and phase or time) relationships among the loudspeaker signals. As such, a tar-

get-azimuth-dependent Atotal can be chosen so that the overall power of loudspeaker signals is normalized to a unit, then

Atotal

 

 

 

 

 

1

 

 

 

 

.

(5.2.34)

 

 

 

 

 

 

 

 

 

 

 

8 1

 

3 sin

2

S

4 1

 

 

1/2

 

 

 

 

 

11

 

3 cos S

 

 

 

208 Spatial Sound

Figure 5.8 The first-order local Ambisonic-like panning curves for three frontal loudspeakers.

This set of signals is difficult to be created with a practical microphone recording technique, but it can be easily simulated via signal processing.

Figure 5.8 illustrates the first-order local Ambisonic-like panning curves for the three frontal loudspeakers. The signals are normalized according to Equation (5.2.34). For a target source at each loudspeaker direction (θS = 0° or ±30°), the signals for two other loudspeakers vanish. In this case, the virtual source is recreated by a single loudspeaker without a crosstalk (with infinite interchannel separation). This ideal feature is similar to that in pair-wise amplitude panning. However, for the first-order local Ambisonic-like panning, the virtual source between two loudspeakers is recreated by three loudspeakers, and the signal for the third loudspeaker is out of phase. This feature is found in Ambisonic-like panning. As stated in Sections 3.2.2 and 4.1.3, an out-of-phase signal is essential to ensure that the velocity vector magnitude is rv = 1 and to stabilize the virtual source. Equations (3.2.7) and (3.2.9) prove that the perceived virtual source direction matches with that of the target source at low frequencies and within the region of −30° ≤ θ ≤ 30°.

Equation (3.2.34) verifies that the energy localization vector-based azimuth θE is inconsistent with the velocity localization vector-based azimuth θv for signal mixing given by Equation (5.2.32). Gerzon (1990) also pointed out that the mismatch of θE and θv may lead to the directional distortion of a virtual source at high frequencies, such as the movement of a high-frequency virtual source near the front toward the direction of central–frontal loudspeakers. Gerzon (1992c) further proposed an optimized method through which the velocity localization vector at low frequencies and the energy localization vector at mid-high frequencies are considered. That is, the decoding coefficients are chosen so that θE matches with θv.

The aforementioned optimized signal mixing for three frontal loudspeakers is theoretically perfect, but it is rarely used in practical program production.

5.2.5 Time panning for 5.1-channel surround sound

Time panning for 5.1-channel surround sound is similar to the case of a two-channel stereophonic sound. Some studies have investigated the possibility of using pair-wise time panning to recreate a virtual source in 5.1-channel reproduction with transient stimuli. Using female speech as a stimulus, Martin et al. (1999) explored the summing localization with ICTD only in the 5.1-channel loudspeaker configuration. They arranged the three frontal loudspeakers identical to those in Figure 5.2, but they arranged two surround loudspeakers in azimuths of±120°.