Добавил:
kiopkiopkiop18@yandex.ru t.me/Prokururor I Вовсе не секретарь, но почту проверяю Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
Ординатура / Офтальмология / Английские материалы / Auditory and Visual Sensations_Ando, Cariani_2009.pdf
Скачиваний:
0
Добавлен:
28.03.2026
Размер:
12.86 Mб
Скачать

5.2 Temporal Factors Extracted from Autocorrelations of Sound Signals

83

We conclude that the listening level (LL) and the IACC are associated with the right cerebral hemisphere, and the temporal factors, t1 and Tsub, and the sound field in a room are associated with the left (Table 5.1). Hemispheric specializations for temporal and spatial sensations may relate to the highly independent contribution their factors play in the perception of subjective attributes. For example, the “cocktail party effect,” the ability to “hear out” individual speakers amidst many background voices, may be explained at least partially on terms of such specializations, because speech is mainly processed in the left hemisphere, in a manner somewhat independent of the spatial information that is represented in the right hemisphere (Section 9.2).

Based on the model, we will describe the perceptual, subjective attributes of sounds and sound fields in terms of signal representations and processing operations, relating these where possible to neuronal signal processing in the ascending auditory pathways and auditory regions of the two cerebral hemispheres.

5.2Temporal Factors Extracted from Autocorrelations of Sound Signals

The autocorrelation function ACF of the sound signals at the left and the right ear entrances, respectively, is defined by,

ll(τ )

=

 

1

 

+T

f1

 

(t)f1

(t

+

τ )dt

 

 

 

 

 

2T

T

 

 

 

 

(5.1)

rr(τ )

=

 

1

 

+T

fr

 

(t)fr

(t

+

τ )dt

 

 

2T

 

 

 

 

T

 

 

 

 

 

where p’(t) = p(t) s(t), s(t) is the ear sensitivity, which is essentially the transfer function of the physical system from the free field to the oval window of the cochlea. The auditory temporal integration window 2T is described in the following section. Rigorously, although the sensitivities of individual listeners vary, for practical purposes, the impulse response of an A-weighted network is commonly used as a generic substitute.

The ACF and the power density spectrum mathematically contain the same information, and both kinds of representations are present at the level of the auditory nerve. For reasons discussed in the preceding sections, we believe that the fine spectral and temporal information needed for the analysis of music and speech is mainly conveyed through distributions of interspike intervals that constitute autocorrelation-like representations. This means that features in the neural ACF have extensive correspondences with those of the ACFs of acoustic sound sources, both in isolation and in filtered through sound fields. We believe that the correlation mechanisms in the neural system may work in a manner analogous to the features and operations on the ACF in the model.

As is mentioned in Section 2.1, five temporal factors are extracted from the ACF analysis, many of which are associated with corresponding auditory percepts:

84

5 Model of Temporal and Spatial Factors in the Central Auditory System

1.ACF amplitude at zero lag (loudness). Energies at two ear entrances represented are given by Φll(0) and Φrr(0). The geometrical mean of the sound energies arriving at the two ears yields the binaural listening level (LL) that is given by

LL = 10log[ ll(0) rr(0)]1/2/ Ref(0)

where ΦRef(0) is the reference level corresponding to the pressure 20 μPa that is the reference value of the sound pressure level (SPL).

2.Width of the ACF peak about zero lag. As shown in Fig. 2.1, the width of amplitude φ(τ ) around the origin of the delay time defined at a value of 0.5, is Wφ(0), given that φ(τ ) is an even function. In the fine structure of the autocorrelation function, which consists of peaks, troughs, and delays, for instance, τ 1 and φ1, respectively, are the delay time and the amplitude of the first ACF peak. After the first peak there are subsequent peaks and delays, τ n and φn being the delay time and the amplitude of the nth local peak (n > 1). Usually there are certain correlations between τ n and τ n +1 and between φn and φn+1.

Two of the most significant factors that are related to the perceived timbre of signals lie at the first ACF maximum (aside from the central peak at zero-lag origin). These are:

3.Factor τ 1, the delay of the first ACF peak, which reflects the frequency of the highest partial present in the signal, i.e., signal bandwidth.

4.Factor φ1, amplitude of the first ACF peak, which reflects the magnitude of this partial.

5.Effective duration. The effective duration of the envelope of the normalized ACF, τ e, is defined by the tenth-percentile delay and represents a repetitive feature

containing the sound source itself (Ando, 1985, 1998). Intuitively, the effective duration is a measure of the duration of periodic structure in the signal.

The normalized ACF is defined by

φp(τ ) = p(τ )/ p(0)

(5.2)

As in Fig. 2.2, effective duration τ e is obtained by fitting a straight line to extrapolate the delay time when the ACF amplitude would be 10% of its zero–lag value (i.e., at –10 dB). The linear extrapolation in effect assumes that the initial envelope of the ACF decays exponentially. It is remarkable that these temporal factors or features of the ACF can effectively describe four temporal sensations, i.e., loudness, pitch, timbre, and duration. These are discussed further in Chapter 6 and in Ando (2002).

5.3 Auditory Temporal Window for Autocorrelation Processing

In analyzing the running autocorrelation function ACF, we need to know the temporal integration time of auditory signals, the so-called auditory-temporal window 2T in Equation (5.1). The initial part of the ACF within the effective duration τ e of the

5.3 Auditory Temporal Window for Autocorrelation Processing

85

Fig. 5.4 Relationship between the temporal integration window (2T)r and the minimum value of the effective duration (τ e)min of the running ACF. Different symbols signify experimental results using different sound sources

ACF contains the information about the signal that is most relevant to its perception. In order to determine the auditory-temporal window, we can examine the time duration over which the perceptual effects of sounds are integrated. Successive loudness judgments in pursuit of the running listening level LL have been conducted (Mouri, Akiyama and Ando, 2001). Results are shown in Fig. 5.4, so that the recommended signal duration (2T)r to be analyzed is approximately given by

(2T)r 30(τe)min

(5.3)

where (τ e)min is the minimum value of τ e obtained by analyzing the running ACF. The auditory-temporal integration window should thus be about 30 times the minimum effective duration of the signal. This signifies an adaptive temporal window that depends on the temporal characteristics of the sound signal presented. Therefore, the range of the optimal temporal window duration differs according to whether the source material consists of musical pieces, with long effective durations, or continuous speech, with short ones. For music, the optimal auditory-temporal window range (2T)r is 0.5–5 s, whereas for continuous speech, it is 50–100 ms for vowels and 5–10 ms for consonants. The auditory-temporal window roughly corresponds to “fast” or “slow” time constant settings in contemporary sound meters. Ideally, for a given type of sound source, such parameters would be replaced by an optimal temporal window computed from the effective durations of the signals themselves. The running time stepsize (Rs), which signifies the degree of overlap of successive segments of the signal that are analyzed, is not so critical. The analysis stepsize K2(2T)r is typically a fraction K2 of the auditory-temporal window, with K2 being in the range of 1/4–1/2.

86

5 Model of Temporal and Spatial Factors in the Central Auditory System

5.4 Spatial Factors and Interaural Crosscorrelation

In the auditory signal processing model, features in the interaural crosscorrelation function IACF underlie most perceptual attributes associated with spatial hearing, i.e., the spatial sensations. The interaural crosscorrelation function IACF for the sound pressures, p(t)l,r, arriving at the two ear entrances is given by

lr(τ ) =

1

+T

(t)fr

(t + τ )dt

(5.4)

 

fl

2T

 

 

T

 

 

 

where fl,r(t) = p(t)l,r s(t), s(t) being the ear sensitivity, which is essentially the transfer function of the outer and middle ears. The normalized IACF in the possible range of maximum interaural delay times is given by,

lr(τ ) = lr(τ )/[ ll(0) rr(0)]1/2, 1 ms <τ < +1 ms

(5.5)

where Φll(0) and Φrr(0) are the ACFs at τ = 0, or the sound energies arriving at the leftand right-ear entrances, respectively.

From the IACF analysis, four spatial factors are computed (Fig. 2.8):

1.The magnitude IACC of the interaural crosscorrelation function (IACF, denoted φlr) is defined by the maximum value of the crosscorrelation function

IACC = φlr(τ )|max

(5.6)

for possible interaural time delays in the physiological range: –1 ms < τ < +1 ms.

2.The interaural delay time at which the IACF has its maximal value, i.e., the delay at which IACC is defined is τ IACC.

3.The effective bandwidth of the IACC, WIACC is the width of the IACF peak whose maximum defines the IACC. The bandwith is measured by finding the interaural delay interval around the IACF peak for which the value is at least δ below its maximal value, the IACC. This interaural correlation bandwidth WIACC may correspond to the just-noticeable-differences (JND) of the interaural correlation magnitude IACC and its associated percepts: apparent source width, subjective diffuseness, and envelopment. This value mainly depends on the frequency composition of the source signal. One perceives sounds that are spatially well defined and localized in the horizontal plane at a lateral position associated

with the interaural time delay τ IACC when there is a sharp peak in the IACF and a correspondingly small value of WIACC. On the other hand, when listening to a sound field with a low value of IACC that is less than 0.15, then a subjectively diffuse sound is perceived (Damaske and Ando, 1972).

4.The denominator of Equation (5.5), defined by the binaural listening level LL, is the geometrical mean of the sound energies arriving at the two ears.