[Figure 5.2 appears here: a block diagram in which the speech signal and the ECG pass through noise reduction and removal of recording errors, baseline subtraction, and feature selection, yielding the features HR, SD HR, MAD HR (variance of HR), F0, SD F0, energy, and intensity. These are combined with the denoted emotions (valence and arousal; six emotion classes), the personality traits extroversion and neuroticism, gender, and the environment [ home / office ] in one ANOVA.]
Figure 5.2: The processing scheme of Unobtrusive Sensing of Emotions (USE). It shows how the physiological signals (i.e., speech and the ECG), the emotions as denoted by people, personality traits, people’s gender, and the environment are all combined in one ANOVA. Age was determined but not processed. Note that the ANOVA can also be replaced by a classifier or an agent, as a module of an AmI [694].

Explanation of the abbreviations: ECG: electrocardiogram; HR: heart rate; F0: fundamental frequency of pitch; SD: standard deviation; MAD: mean absolute deviation; and ANOVA: ANalysis Of VAriance.

5.5 Signal processing

This section describes how all of the data was recorded and, subsequently, processed (see also Figure 5.2). Speech utterances were recorded continuously by means of a standard Trust multi-function headset with microphone. The recording was performed in SoundForge 4.5.278 (sample rate: 44,100 Hz; sample size: 16 bit). In parallel with the speech recording, the ECG was recorded continuously through a modified Polar ECG measurement belt. The Polar ECG belt was connected to a data acquisition tool (NI USB-6008). Its output was recorded in a LabVIEW 7.1 program, with a sample rate of 200 Hz.

5.5.1 Signal selection

The speech signal of three participants was not recorded due to technical problems. For one other participant, the speech signal was too noisy. These four participants were excluded from further analysis. For four other participants, the ECG was either heavily contaminated with noise or completely absent. These participants were also omitted from further processing.


Since one of the main aims was to unveil the possible added value of the speech and ECG features to each other, all data of the eight participants whose ECG or speech signals were not recorded appropriately was omitted from the analysis. This resulted in a total of 32 participants (i.e., 16 men and 16 women) whose signals were processed. Regrettably and surprisingly, the eight participants whose data was not processed all participated in the office-like environment. Consequently, 20 participants took part in this research in a home-like environment and 12 in an office-like environment. Conveniently, of these 32 participants, men and women were equally present in both environments.

5.5.2 Speech signal

For each participant, approximately 25 minutes of sound was recorded during the study. However, since only the parts in which the participants spoke are of interest, the parts in which they did not speak were omitted from further processing. Some preprocessing of the speech signal was required before the features could be extracted. We started with the segmentation of the recorded speech signal, such that a separate speech signal was obtained for each picture. Next, abnormalities in the speech signals were removed. This resolved technical inconveniences such as recorded breathing, tapping on the table, coughing, throat clearing, and yawning. This resulted in a ‘clean’ signal, as is also illustrated in Figures 5.3a and 5.3b.
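The thesis does not specify the exact tooling used to discard the silent parts; a minimal sketch of one common approach, a short-time energy threshold, is given below. The frame length and the relative threshold are assumptions, not values taken from the study.

```python
import numpy as np

def remove_silence(x, fs, frame_ms=25, rel_threshold=0.05):
    """Keep only the frames whose short-time energy exceeds a fraction
    of the maximum frame energy; return the concatenated active parts.
    frame_ms and rel_threshold are illustrative, not from the thesis."""
    frame_len = int(fs * frame_ms / 1000)
    n_frames = len(x) // frame_len
    frames = np.asarray(x[:n_frames * frame_len]).reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)            # short-time energy per frame
    keep = energy > rel_threshold * energy.max()   # active (speech) frames
    return frames[keep].ravel()
```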

After the selection of the appropriate speech signal segments and their normalization, the feature extraction was conducted. Several parameters derived from speech have been investigated in a variety of settings with respect to their use in the determination of people’s emotional state. Although no general consensus exists concerning the parameters to be used, much evidence exists for the standard deviation (SD) of the fundamental frequency (F0) (SD F0), the intensity of air pressure (I), and the energy of speech (E) [182, 590, 644, 725, 739]. We will limit the set of features to these, as an extensive comparison of speech features falls beyond the scope of this study.

For a domain [0, T], the energy (E) is defined as:

\[
E = \frac{1}{T} \int_0^T x^2(t) \, \mathrm{d}t, \tag{5.1}
\]

where x(t) is the amplitude or sound pressure of the signal in Pa (Pascal) [54]. Its discrete equivalent is:

\[
E = \frac{1}{N} \sum_{i=0}^{N-1} x^2(t_i), \tag{5.2}
\]

where N is the number of samples.
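As a concrete illustration, Eq. 5.2 amounts to the mean of the squared samples; a minimal sketch in Python (not code from the thesis):

```python
import numpy as np

def speech_energy(x):
    """Energy of a speech segment x (sound pressure samples in Pa),
    following Eq. 5.2: E = (1/N) * sum_i x(t_i)^2."""
    x = np.asarray(x, dtype=float)
    return np.mean(x ** 2)
```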


[Figure 5.3 appears here: two sets of four panels, each plotting 3–7 seconds of a recording: the speech signal's amplitude (mW), its pitch frequency F0 (Hz), its sound-pressure energy (Pa), and its air-pressure intensity (dB). (a) A speech signal and its features of a person in a relaxed state. (b) A speech signal and its features of a person in a sad, tense state.]

Figure 5.3: Two samples of speech signals from the same person (an adult man) and their accompanying extracted fundamental frequencies of pitch (F0) (Hz), energy of speech (Pa), and intensity of air pressure (dB). In both cases, energy and intensity of speech show a similar behavior. The difference in variability of F0 between (a) and (b) indicates the difference in experienced emotions.


For a domain [0, T ], intensity (I) is defined as:

\[
I = 10 \log_{10} \left( \frac{1}{T P_0^2} \int_0^T x^2(t) \, \mathrm{d}t \right), \tag{5.3}
\]

where P0 = 2 · 10^−5 Pa is the auditory threshold [54]. I is computed over the discrete signal in the following manner:

\[
I = 10 \log_{10} \left( \frac{1}{N P_0^2} \sum_{i=0}^{N-1} x^2(t_i) \right). \tag{5.4}
\]

It is expressed in dB (decibels) relative to P0.
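Since Eq. 5.4 is again a function of the mean squared sample value, it can be computed in a few lines; a minimal sketch (not the thesis' implementation):

```python
import numpy as np

P0 = 2e-5  # auditory threshold in Pa (see Eq. 5.3)

def speech_intensity(x):
    """Intensity in dB relative to P0, following Eq. 5.4:
    I = 10 * log10( (1/(N * P0^2)) * sum_i x(t_i)^2 )."""
    x = np.asarray(x, dtype=float)
    return 10.0 * np.log10(np.mean(x ** 2) / P0 ** 2)
```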

Both I and E are calculated directly over the clean speech signal. To determine F0 from the clean speech signal, a fast Fourier transform has to be applied to the signal. Subsequently, its SD is calculated; see also Eq. 5.5. For a more detailed description of the processing scheme, we refer to [53].
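The thesis defers the details of the F0 extraction to [53]; purely as an illustration, the sketch below estimates a frame-wise F0 track as the dominant FFT peak within a plausible pitch band and then takes its standard deviation. The frame length and the band limits are assumptions, and real pitch trackers are considerably more robust.

```python
import numpy as np

def f0_track(x, fs, frame_ms=40, fmin=75.0, fmax=500.0):
    """Rough frame-wise F0 estimate: the dominant FFT peak within an
    assumed pitch band [fmin, fmax]. Illustrative only; it omits the
    voicing decisions a real pitch tracker would make."""
    frame_len = int(fs * frame_ms / 1000)
    f0 = []
    for start in range(0, len(x) - frame_len, frame_len):
        frame = x[start:start + frame_len] * np.hanning(frame_len)
        spectrum = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
        band = (freqs >= fmin) & (freqs <= fmax)
        f0.append(freqs[band][np.argmax(spectrum[band])])
    return np.array(f0)

# The SD F0 feature is then simply: np.std(f0_track(x, fs))
```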

5.5.3 Heart rate variability (HRV) extraction

From the ECG signal, a large number of features can be derived that are said to relate to the emotional state of people [14, 304, 327, 349, 672, 676]. This research did not, however, aim to provide an extensive comparison of ECG features. Instead, the combination of the ECG signal with the speech signal was explored. Therefore, one well-known distinctive feature of the ECG was chosen: the variability of the heart rate (HRV).

The output of the ECG measurement belt has a constant (baseline) value during the pause between two heart beats. Each new heart beat is characterized by a typical slope consisting of four elements, called P, Q, R, and S (see Figure 5.4). A heart beat is said to be detected when its R-wave is identified.

Figure 5.4: A schematic representation of an electrocardiogram (ECG) denoting four R- waves, from which three R-R intervals can be determined. Subsequently, the heart rate and its variance (denoted as standard deviation (SD), variability, or mean absolute deviation (MAD)) can be determined.
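The thesis does not detail its R-peak detector here; the sketch below illustrates the general idea under simple assumptions (a fixed amplitude threshold and a 250 ms refractory period): detect R-peaks, convert the R-R intervals to an instantaneous heart rate, and summarize its variability as the SD and MAD named in Figure 5.4.

```python
import numpy as np

def heart_rate_features(ecg, fs=200):
    """Derive HR, SD HR, and MAD HR from an ECG trace sampled at fs Hz.
    The threshold and refractory period are illustrative assumptions."""
    threshold = 0.6 * np.max(ecg)    # crude amplitude threshold for R-waves
    refractory = int(0.25 * fs)      # at least 250 ms between two beats
    peaks, last = [], -refractory
    for i in range(1, len(ecg) - 1):
        if (ecg[i] > threshold and ecg[i] >= ecg[i - 1]
                and ecg[i] >= ecg[i + 1] and i - last >= refractory):
            peaks.append(i)          # local maximum above threshold
            last = i
    rr = np.diff(peaks) / fs         # R-R intervals in seconds
    hr = 60.0 / rr                   # instantaneous heart rate (bpm)
    return {"HR": hr.mean(),
            "SD HR": hr.std(),
            "MAD HR": np.mean(np.abs(hr - hr.mean()))}
```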
