- Stellingen
- Propositions
- List of Figures
- List of Tables
- 1 Introduction
- Introduction
- Affect, emotion, and related constructs
- Affective Computing: A concise overview
- The closed loop model
- Three disciplines
- Human-Computer Interaction (HCI)
- Health Informatics
- Three disciplines, one family
- Outline
- 2 A review of Affective Computing
- Introduction
- Vision
- Speech
- Biosignals
- A review
- Time for a change
- 3 Statistical moments as signal features
- Introduction
- Emotion
- Measures of affect
- Affective wearables
- Experiment
- Participants
- Equipment and materials
- Procedure
- Data reduction
- Results
- Discussion
- Comparison with the literature
- Use in products
- 4 Time windows and event-related responses
- Introduction
- Data reduction
- Results
- Mapping events on signals
- Discussion and conclusion
- Interpreting the signals measured
- Looking back and forth
- 5 Emotion models, environment, personality, and demographics
- Introduction
- Emotions
- Modeling emotion
- Ubiquitous signals of emotion
- Method
- Participants
- International Affective Picture System (IAPS)
- Digital Rating System (DRS)
- Signal processing
- Signal selection
- Speech signal
- Heart rate variability (HRV) extraction
- Normalization
- Results
- Considerations with the analysis
- The (dimensional) valence-arousal (VA) model
- The six basic emotions
- The valence-arousal (VA) model versus basic emotions
- Discussion
- Conclusion
- 6 Static versus dynamic stimuli
- Introduction
- Emotion
- Method
- Preparation for analysis
- Results
- Considerations with the analysis
- The (dimensional) valence-arousal (VA) model
- The six basic emotions
- The valence-arousal (VA) model versus basic emotions
- Static versus dynamic stimuli
- Conclusion
- IV. Towards affective computing
- Introduction
- Data set
- Procedure
- Preprocessing
- Normalization
- Baseline matrix
- Feature selection
- k-Nearest Neighbors (k-NN)
- Support vector machines (SVM)
- Multi-Layer Perceptron (MLP) neural network
- Discussion
- Conclusions
- 8 Two clinical case studies on bimodal health-related stress assessment
- Introduction
- Post-Traumatic Stress Disorder (PTSD)
- Storytelling and reliving the past
- Emotion detection by means of speech signal analysis
- The Subjective Unit of Distress (SUD)
- Design and procedure
- Features extracted from the speech signal
- Results
- Results of the Stress-Provoking Story (SPS) sessions
- Results of the Re-Living (RL) sessions
- Overview of the features
- Discussion
- Stress-Provoking Stories (SPS) study
- Re-Living (RL) study
- Stress-Provoking Stories (SPS) versus Re-Living (RL)
- Conclusions
- 9 Cross-validation of bimodal health-related stress assessment
- Introduction
- Speech signal processing
- Outlier removal
- Parameter selection
- Dimensionality Reduction
- k-Nearest Neighbors (k-NN)
- Support vector machines (SVM)
- Multi-Layer Perceptron (MLP) neural network
- Results
- Cross-validation
- Assessment of the experimental design
- Discussion
- Conclusion
- 10 Guidelines for ASP
- Introduction
- Signal processing guidelines
- Physical sensing characteristics
- Temporal construction
- Normalization
- Context
- Pattern recognition guidelines
- Validation
- Triangulation
- Conclusion
- 11 Discussion
- Introduction
- Hot topics: On the value of this monograph
- Applications: Here and now!
- TV experience
- Knowledge representations
- Computer-Aided Diagnosis (CAD)
- Visions of the future
- Robot nannies
- Digital Human Model
- Conclusion
- Bibliography
- Summary
- Samenvatting
- Dankwoord
- Curriculum Vitae
- Publications and Patents: A selection
- Publications
- Patents
- SIKS Dissertation Series
List of Tables
1.1 An overview of common physiological signals and features used in ASP. The reported response times are the absolute minimum; in practice longer time windows are applied to increase the recording’s reliability.
1.2 Design feature delimitation of psychological constructs related to affective phenomena, including their brief definitions, and some examples. This table is adopted from [58, Chapter 6] and [219, Chapter 2].
1.3 An overview of 24 handbooks on affective computing. Selection criteria: i) on emotion and/or affect, ii) either a significant computing or engineering element or an application-oriented approach, and iii) proceedings, M.Sc.-theses, Ph.D.-theses, books on text-analyses, and books on solely theoretical logic-based approaches were excluded.
2.1 Review of 12 representative machine learning studies employing computer vision to recognize emotions.
2.2 Speech signal analysis: A sample from history.
2.3 Review of 12 representative machine learning studies employing speech to recognize emotions.
2.4 An overview of 61 studies on automatic classification of emotions, using biosignals / physiological signals.
3.1 The eight film scenes with the average ratings, with the accompanying standard deviations (between brackets), given by subjects (n = 24) on both experienced negative and positive feelings. Four emotion classes are distinguished: neutral, mixed, positive, and negative, based on the latter two dimensions. The top eight film scenes were selected for further analysis.
3.2 The discriminating statistical parameters for the EDA, EMG corrugator supercilii, and EMG zygomaticus signals. For each parameter, the average value for all four emotion classes (i.e., neutral: 0; positive: +; mixed: +/-; negative: -) is provided as well as the strength and significance of its discriminating ability. Additionally, as a measure of effect size, partial eta squared (η²) is reported, which indicates the proportion of variance accounted for [211, 737].
5.1 The 30 IAPS pictures [374] with the average ratings given by the participants on the positive valence, negative valence, and arousal Likert scales. From the positive and negative valence ratings, three valence categories were derived: neutral, positive, and negative. Using the scores on arousal, two arousal categories were determined: low and high. Consequently, we were able to assess a discrete representation of the valence-arousal (VA) model that distinguished six compounds.
5.2 Legend of the factors included in the analyses presented in Section 5.6, particularly in Tables 5.3-5.6.
5.3 Results of the repeated measures Multivariate Analysis of Variance (MANOVA) on the valence-arousal (VA) model and its distinct dimensions. The threshold for significance was set to p ≤ .010.
5.4 Results of the repeated measures Analyses of Variance (ANOVAs) on the valence-arousal (VA) model and its distinct dimensions. The threshold for significance was set to p ≤ .010.
5.5 Results of the repeated measures MANOVA on the six basic emotions. The threshold for significance was set to p ≤ .010.
5.6 Results of the repeated measures ANOVAs on the six basic emotions. The threshold for significance was set to p ≤ .010. For the Intensity (I) of speech no results are reported as none of them exceeded the threshold.
6.1 The six film scenes with the average ratings given by the participants on the positive valence, negative valence, and arousal Likert scales. From the positive and negative valence ratings, three valence categories can be derived: neutral, positive, and negative. Using the scores on arousal, two arousal categories can be determined: low and high.
6.2 Legend of the factors included in the analyses presented in Section 6.5, particularly in Tables 6.3-6.6.
6.3 Results of the repeated measures MANOVA on the valence-arousal (VA) model and its distinct dimensions. The threshold for significance was set to p ≤ .010.
6.4 Results of the repeated measures ANOVAs on the valence-arousal (VA) model and its distinct dimensions. The threshold for significance was set to p ≤ .010. For the Intensity (I) and Energy (E) of speech no results are reported as none of them exceeded the threshold.
6.5 Results of the repeated measures MANOVA on the six basic emotions. The threshold for significance was set to p ≤ .010.
6.6 Results of the repeated measures ANOVAs on the six basic emotions. The threshold for significance was set to p ≤ .010. For the Intensity (I) and Energy (E) of speech no results are reported as none of them exceeded the threshold.
7.1 The best feature subsets from the time domain, for the k-nearest neighbors (k-NN) classifier with the Euclidean metric. They were determined by analysis of variance (ANOVA), using normalization per signal per participant. EDA denotes the electrodermal activity or skin conductance level.
7.2 The recognition precision of the k-nearest neighbors (k-NN) classifier, with k = 8 and the Euclidean metric. The influence of three factors is shown: 1) normalization, 2) analysis of variance (ANOVA) feature selection (FS), and 3) Principal Component Analysis (PCA) transform.
7.3 Confusion matrix of the k-NN classifier of EDA and EMG signals for the best reported input preprocessing, with a city block metric and k = 8.
8.1 Introduction to (the DSM-IV TR [9] criteria for) Post-Traumatic Stress Disorder (PTSD).
8.2 Correlations between Subjective Unit of Distress (SUD) and the parameters of the five features derived from the speech signal, both for the Re-Living (RL) and the Stress-Provoking Stories (SPS) study.
9.1 Standardized regression coefficients β of a Linear Regression Model (LRM) predicting the Subjective Unit of Distress (SUD) using speech parameters. HF denotes High-Frequency.
9.2 The classification results (in %) of k-nearest neighbors (k-NN), support vector machine (SVM) (see also Figure 9.4.1), and artificial neural network (ANN). Correct classification (C_N), baseline (or chance) level for classification (µ_N), and relative classification rate (C_N; see also Eq. 9.3) are reported. The Subjective Unit of Distress (SUD) was taken as ground truth, with several quantization schemes. N indicates the number of SUD levels.
9.3 The classification results (in %) of k-nearest neighbors (k-NN) and support vector machine (SVM). Baseline (or chance) level for classification (µ_N), correct classification (C_N), and relative classification rate (C_N; see also Eq. 9.3) are reported. N takes either the value 2 or 3. Both the storytelling (ST) and reliving (RL) studies were analyzed, with + and − denoting the happiness and stress triggering conditions, respectively.
10.1 Distribution of eccrine (sweat) glands in man, adopted from [561, Chapter 6].
10.2 Results of a representative study on the influence of climate on the number of sweat glands, adopted from [561, Chapter 6].
10.3 Results of a representative study on skin temperature (in °C) and thermal circulation index (CI) (i.e., CI = Δ(skin, air) / Δ(interior, skin)) in relation to several body regions, adopted from [561, Chapter 10]. Room temperature was 22.8 °C and rectal temperature (as reference temperature) was 37.25 °C.
10.4 Eight methods to normalize affective signals. x denotes the (original) signal and min and max are its (estimated) minimum and maximum. µ_B, min_B, max_B, and σ_B are respectively the mean, minimum, maximum, and standard deviation of the baseline.
10.5 Standard statistics on three time windows of an EDA signal, as presented in Figure 10.3. These three time windows are close-ups of the signal presented in Figure 10.2, which in turn is a fragment of the signal presented in Figure 10.1. Note. SD denotes Standard Deviation.
11.1 A description of the four categories of affective computing in terms of computer science’s input/output (I/O) operations. In terms of affective computing, I/O denotes the expression (O) and the perception, impression, or recognition (I) of affect. This division is adapted from the four cases identified by Rosalind W. Picard [520].
I. PROLOGUE
