
Chapter 9

Applications (II) – Speech Reception in Sound Fields

Our correlation-based auditory signal processing model can be applied to the perception of a single syllable in a sound field. Reception of speech signals reproduced in the frontal direction for a direct sound with a single echo is well described in terms of temporal factors that are extracted from the running ACF of the sound signal. Effects of noise disturbances from different horizontal angles on error rates of single syllable identification were investigated. Results show that syllable nonidentification (NI) rates can be predicted from temporal factors extracted from the ACF and from spatial factors extracted from the IACF. Perceptual dissimilarity of sounds due to different source locations was assessed in psychophysical experiments and accurately estimated using temporal and spatial factors.

9.1 Effects of Temporal Factors on Speech Reception

This section describes a method of calculating the speech intelligibility (SI) of each syllable in terms of the distance between the direct sound, taken as a template, and the sound field with a single echo. In the calculation of this distance, three temporal factors as well as the sound energy Φ(0) extracted from the ACF are taken into account. The speech transmission index (STI) for the global speech intelligibility of sound fields is well known (Houtgast et al., 1980). We have also proposed a method for calculating the speech intelligibility of single syllables (Ando et al., 1999, 2000).

A perceptual distance between a template source signal and a sound-field signal is introduced. This distance quantifies the extent to which room reverberation alters the perceptual representation of the signal and thereby degrades identification: the greater the perceptual distance between the signal at its source and the signal received by the listener, the more different the two are perceived to be, and the lower the likelihood that the listener will recognize and identify the signal correctly.

Y. Ando, P. Cariani (Guest ed.), Auditory and Visual Sensations, DOI 10.1007/b13253_9, © Springer Science+Business Media, LLC 2009

Let S_K^T denote the characteristics of an isolated template syllable K, and let S_X^SF denote those of another syllable X in the sound field; superscript T refers to the template, and SF to the sound field. Let C_K^T and C_X^SF be the corresponding characteristics of the isolated template syllable K and of syllable X in the sound field, as processed in the auditory-brain system. The distance between S_K^T and S_X^SF is then given by

\[
d_K = D\!\left(S_X^{SF}, S_K^{T}\right) = D\!\left(C_X^{SF}, C_K^{T}\right) \qquad (9.1)
\]

where

\[
\begin{aligned}
D_{\tau_e}(X,K) &= \frac{1}{I}\sum_{i=1}^{I}\left|\log\!\left(\tau_{e,i}\right)_X^{SF} - \log\!\left(\tau_{e,i}\right)_K^{T}\right| \\
D_{\phi_1}(X,K) &= \frac{1}{I}\sum_{i=1}^{I}\left|\log\!\left(\phi_{1,i}\right)_X^{SF} - \log\!\left(\phi_{1,i}\right)_K^{T}\right| \\
D_{\tau_1}(X,K) &= \frac{1}{I}\sum_{i=1}^{I}\left|\log\!\left(\tau_{1,i}\right)_X^{SF} - \log\!\left(\tau_{1,i}\right)_K^{T}\right| \\
D_{\Phi(0)}(X,K) &= \frac{1}{I}\sum_{i=1}^{I}\left|\log\!\left(\frac{\Phi(0)_i}{\Phi(0)_{\max}}\right)_{\!X}^{SF} - \log\!\left(\frac{\Phi(0)_i}{\Phi(0)_{\max}}\right)_{\!K}^{T}\right|
\end{aligned}
\qquad (9.2)
\]

where K and X represent the syllable number of the template (T) and of the syllable in the sound field (SF), respectively; i is the frame number of the running ACF, and I is the total number of frames.
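The per-factor distances of Eq. (9.2) are mean absolute differences of log-transformed factor values across the frames of the running ACF. A minimal sketch in Python follows; the function name and the array-based interface are our own, not the authors':

```python
import numpy as np

def factor_distance(template_frames, field_frames):
    """Mean absolute log-difference between the frame-wise tracks of one
    ACF factor (e.g. tau_e per frame) for a template syllable and for a
    syllable in the sound field, following the form of Eq. (9.2).

    Both inputs are 1-D arrays of equal length I holding strictly
    positive factor values, one per running-ACF frame."""
    t = np.asarray(template_frames, dtype=float)
    f = np.asarray(field_frames, dtype=float)
    if t.shape != f.shape:
        raise ValueError("template and field tracks must have equal length")
    if np.any(t <= 0) or np.any(f <= 0):
        raise ValueError("ACF factors must be positive for the log distance")
    # (1/I) * sum_i | log(f_i) - log(t_i) |
    return float(np.mean(np.abs(np.log(f) - np.log(t))))
```

The same helper applies to each of the four factors; for Φ(0), the tracks would first be normalized by their respective maxima, as in the last line of Eq. (9.2).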

Let N be the total number of single syllables; then the percentile SI of syllable K for one of the four factors may be obtained as

\[
SI_K(\psi) = \frac{100}{N}\,\exp\!\left[-\left(\frac{d_K}{d_1}+\cdots+\frac{d_K}{d_{K-1}}+\frac{d_K}{d_{K+1}}+\cdots+\frac{d_K}{d_N}\right)\right] \qquad (9.3)
\]

where ψ = τe, τ1, φ1, or Φ(0).
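Equation (9.3) can be sketched in code. Note that the printed formula is damaged in this copy, so the exact normalization (100/N) and the ratio structure d_K/d_j are our reading of the fragments; treat this as illustrative only:

```python
import numpy as np

def syllable_si(distances, K):
    """Percentile intelligibility of syllable K for one factor, from the
    distances d_1..d_N between the sound-field syllable and every
    template syllable (our reading of Eq. 9.3).

    distances: 1-D array of N positive distances; K: 0-based index of
    the target syllable."""
    d = np.asarray(distances, dtype=float)
    N = len(d)
    others = np.delete(d, K)        # all d_j with j != K
    ratios = d[K] / others          # d_K / d_j: small when K is the best match
    return 100.0 / N * np.exp(-np.sum(ratios))
```

Intuitively, when d_K is much smaller than every other distance, the exponent is near zero and SI approaches its maximum; when d_K is comparable to or larger than the other distances, SI falls off rapidly.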

Let us now demonstrate an example of estimating the intelligibility of single syllables in a sound field consisting of a direct sound and a single echo. The amplitude of the echo was the same as that of the direct sound, and the delay time of the single echo, Δt1, was varied between 0 and 480 ms. The test signal consisted of Japanese single syllables with maskers (Fig. 9.1). The masker was used to control the percentage of correctly identified signals over a wide range, thereby avoiding saturation effects when identification would otherwise approach 100%. The direct sound without maskers was used as the template. As shown in Fig. 9.2, the important initial part of the normalized running ACF, within the range given by Equation (9.4), was analyzed.


Fig. 9.1 An example of a single syllable with artificial nonmeaningful forward and backward maskers. The direct sound without maskers was used as a template. CV is a single syllable consisting of a consonant and a vowel

Fig. 9.2 Relative sound energy of a single syllable as a function of time. The important former half of the running ACF, for which Φ(0) > 0.5, of both a template and a test syllable was analyzed. C and V are the consonant and vowel parts, respectively

\[
\Phi(0) > 0.5 \qquad (9.4)
\]

In the analyses, two distances were calculated with the four factors: one between the template and the direct sound with maskers, the other between the template and the echo with maskers. The shorter of the two distances was selected in the calculation of syllable intelligibility. Using the distances given by Equation (9.2), we calculated the intelligibility of all syllables due to each factor by means of Equation (9.3). The total SI due to the four factors (see Section 5.2) is obtained by combining them linearly, so that

SI = a SI(τe) + b SI(τ1) + c SI(φ1) + d SI(Φ(0))

(9.5)

where a, b, c, and d are weighting coefficients signifying the contributions of the four factors, determined so as to maximize the agreement between calculated and tested SI. If the sound-pressure level is fixed and the spatial factors are invariant, then the fourth term is eliminated, and SI is described by the temporal factors associated with left-hemisphere specialization.
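The weighting coefficients a–d are obtained by multiple regression of the tested SI onto the four calculated per-factor SI values. A hedged sketch using ordinary least squares (function name and data layout are assumptions, not the authors' procedure):

```python
import numpy as np

def fit_weights(si_factors, si_tested):
    """Least-squares estimate of the coefficients a, b, c, d of Eq. (9.5).

    si_factors: (n_conditions, 4) array whose columns are the calculated
        SI(tau_e), SI(tau_1), SI(phi_1), SI(Phi(0)) for each test condition.
    si_tested: (n_conditions,) array of measured intelligibility.
    Returns the 4-vector of weights that best reproduces si_tested."""
    A = np.asarray(si_factors, dtype=float)
    y = np.asarray(si_tested, dtype=float)
    coeffs, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs
```

Normalizing the fitted weights by their maximum, as done for Table 9.2, then shows the relative contribution of each factor.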

Japanese syllables classified by the consonant and vowel categories of Table 9.1 were used because listeners recognize these phonetic elements in syllables with minimal confusion (Korenaga, 1997). The frontal loudspeaker in an anechoic chamber


Table 9.1 Categorization of Japanese single syllables

(a) Unvoiced consonants

                             Consonant
  Vowel                  K     S     T     H     P
  Not contracted    A    KA    SA    TA    HA    PA
  (Category A)      I    KI    SI    TI    HI    PI
                    U    KU    SU    TU    HU    PU
                    E    KE    SE    TE    HE    PE
                    O    KO    SO    TO    HO    PO
  Contracted        YA   KYA   SYA   TYA   HYA   PYA
  (Category B)      YU   KYU   SHU   TYU   HYU   PYU
                    YO   KYO   SHO   TYO   HYO   PYO

(b) Voiced consonants

                             Consonant
  Vowel                  N     M     Y     R     W     G     Z     D     B
  Not contracted    A    NA    MA    YA    RA    WA    GA    ZA    DA    BA
  (Category C)      I    NI    MI    -     RI    -     GI    ZI    -     BI
                    U    NU    MU    YU    RU    -     GU    ZU    -     BU
                    E    NE    ME    -     RE    -     GE    ZE    DE    BE
                    O    NO    MO    YO    RO    -     GO    ZO    DO    BO
  Contracted        YA   NYA   MYA   -     RYA   -     GYA   ZYA   -     BYA
  (Category D)      YU   NYU   MYU   -     RYU   -     GYU   ZYU   -     BYU
                    YO   NYO   MYO   -     RYO   -     GYO   ZYO   -     BYO

reproduced both the direct sound and the echo, as well as the noise masker. The running ACF was analyzed with an integration interval of 2T = 30 ms and a running step of 10 ms. Twenty-one subjects participated in the experiments, so that a single wrong judgment corresponds to an error of about 5% (= 1/21).
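The running-ACF analysis just described (2T = 30 ms integration interval, 10 ms step) can be sketched as follows; the framing and normalization details here are assumptions for illustration, not the authors' exact implementation:

```python
import numpy as np

def running_acf(x, fs, win_ms=30.0, step_ms=10.0):
    """Normalized running autocorrelation of a signal: one ACF per
    30-ms frame, hopped every 10 ms (the settings used in the text).

    x: 1-D signal array; fs: sampling rate in Hz.
    Returns an array of shape (n_frames, win_samples); each row is a
    one-sided ACF normalized so that phi(0) = 1."""
    win = int(fs * win_ms / 1000.0)
    step = int(fs * step_ms / 1000.0)
    frames = []
    for start in range(0, len(x) - win + 1, step):
        seg = x[start:start + win]
        # one-sided autocorrelation of the segment (lags 0..win-1)
        acf = np.correlate(seg, seg, mode="full")[win - 1:]
        frames.append(acf / acf[0])  # normalize by the zero-lag energy
    return np.array(frames)
```

From each row, the factors τ1 and φ1 (delay and amplitude of the first ACF peak) and τe (effective duration, from the decay of the ACF envelope) would then be extracted; the unnormalized zero-lag value acf[0] of each frame gives Φ(0).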

Results of both calculated and tested intelligibility for a single syllable from each category, as a function of the delay time of the echo, are shown in Fig. 9.3a–d. Averaged results for each category are shown in Fig. 9.4a–d, and averaged values for all syllables are shown in Fig. 9.5. Because the single syllable is heard twice when Δt1 > 100 ms, local maxima appear around 300 ms. For reflection delays longer than 300 ms, intelligibility decreases because of the masker. Even so, calculated values agree well with tested values, which implies that the four factors extracted from the ACF of the source signals and the sound-field signals are effective in identifying speech. Contributions of the four factors to SI, obtained by multiple regression, are listed in Table 9.2. The most significant of the four factors is the effective duration of the ACF, τe.

It is concluded, therefore, that:



Fig. 9.3 Examples of calculated and tested intelligibilities for the single syllable belonging to each category. (a) /ha/. (b) /be/. (c) /hya/. (d) /nyu/. Twenty-one subjects participated in the experiments


Fig. 9.4 Averaged results of calculated and tested intelligibilities for each category. (a) Category A. (b) Category B. (c) Category C. (d) Category D


Fig. 9.5 Mean values of calculated and tested intelligibilities for all syllables used

Table 9.2 Contribution of each temporal factor extracted from the running ACF to speech intelligibility. Values are normalized by the maximum of the four coefficients obtained from the multiple regression analyses

  Consonant   Vowel   τe     τ1     φ1     Φ(0)
  Unvoiced    A       0.50   0.60   0.36   1.00
              U       0.07   0.08   0.25   1.00
              E       1.00   0.66   0.23   0.46
              O       0.62   1.00   0.13   0.25
              YA      1.00   0.17   0.55   0.56
              YU      1.00   0.19   0.30   0.47
              YO      1.00   0.30   0.41   0.88
  Voiced      A       0.74   0.92   0.89   1.00
              I       0.88   0.38   0.01   1.00
              U       1.00   0.80   0.30   0.29
              E       0.19   0.42   1.00   0.01
              O       1.00   0.25   0.09   0.80
              YA      1.00   0.17   0.55   0.56
              YU      1.00   0.19   0.30   0.47
              YO      0.25   0.21   1.00   0.51

1. Speech identification in a sound field may be described by the four factors extracted from the running ACF of the target signal and of the sound field containing the signal.

2. The most significant factor is the effective duration of the ACF, τe, which is intimately related to [Δt1]p and whose cortical response correlates are mainly associated with the left hemisphere (Table 5.1).