
10 Guidelines for ASP

[Figure 10.4 plot: panel title "Lost Data"; horizontal axis "Hours" (0 to 20); vertical axis ticks from 500 to 4500, unlabeled.]

Figure 10.4: A typical sample of lost data with an EDA signal, as frequently occurs in real-world recordings.

of factors besides affect [78], as has been illustrated throughout this monograph.

To bring the theory just presented into practice, we present an example on the level of activity as a factor of context. Figure 2.1 illustrates how pervasive motion artifacts can be for ASP in real-world settings. Both heart rate and electrodermal activity are elevated during the period of high activity (i.e., from 27 to 30 minutes), as automatically determined through accelerometers. However, as the signal graphs show, changes in heart rate follow changes in activity much more rapidly than changes in electrodermal activity do, both in terms of onset and, especially, in terms of recovery. For level 4 (walking) in Figure 2.1, the physical effects even seem so dominant that ASP should not be attempted. In contrast, with levels 1 (lying down), 2 (sitting), and 3 (standing/strolling), ASP is possible. So, physical activity can easily cast a shadow over affective (bio)signals.
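The practical consequence of this example can be sketched in a few lines of Python. The sketch assumes that every biosignal sample has already been labeled with an activity level from 1 (lying down) to 4 (walking), as in Figure 2.1. The function name `gate_by_activity`, the data layout, and the default threshold are illustrative assumptions and are not taken from the study itself.

```python
from typing import List, Optional, Sequence


def gate_by_activity(
    samples: Sequence[float],
    activity_levels: Sequence[int],
    max_level: int = 3,
) -> List[Optional[float]]:
    """Keep samples recorded at or below max_level; mask the rest with None.

    Levels follow the text: 1 = lying down, 2 = sitting,
    3 = standing/strolling, 4 = walking. The default threshold of 3 is an
    assumption based on the remark that ASP should not be attempted
    during walking.
    """
    if len(samples) != len(activity_levels):
        raise ValueError("samples and activity_levels must have equal length")
    return [
        value if level <= max_level else None
        for value, level in zip(samples, activity_levels)
    ]


# Hypothetical heart-rate samples (beats per minute) with activity labels.
heart_rate = [62.0, 64.0, 71.0, 98.0, 101.0, 70.0]
levels = [1, 2, 3, 4, 4, 2]
gated = gate_by_activity(heart_rate, levels)
print(gated)  # [62.0, 64.0, 71.0, None, None, 70.0]
```

Because electrodermal activity recovers more slowly than heart rate, a realistic pipeline would probably also mask a recovery window after each high-activity period; the text does not specify how long such a window should be, so it is left out here.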

10.3 Pattern recognition guidelines

As is illustrated in Table 2.4, a plethora of feature selection algorithms and classifiers has been applied in affective computing studies. Much has been said on the pros and cons of the methods applied; each article either provides its own reasoning or simply ignores possible alternatives. For example, Picard and colleagues [268, 271, 524, 703] posed that sequential floating forward search (SFFS) was superior to ‘standard’ stepwise feature selection (SFS). This claim was questioned in follow-up research by others and recently rejected by Way, Sahiner, Hadjiiski, and Chan [708], who concluded: “PCA was comparable to or better than SFS and SFFS for LDA at small samples sizes” (p. 907) and “In general, the SFFS method was comparable to the SFS method . . . ” (p. 907). However, it should be noted that Way et al. [708] conducted a(n excellent) simulation study outside the domain of affective computing. Nevertheless, it illustrates the still ongoing debate on pattern recognition and machine learning methods. In the choice of these methods, pragmatic considerations, personal preferences, and science’s current fashion often seem to be the dominant factors. Moreover, several excellent handbooks are available on pattern recognition and machine learning [48, 170, 457, 648, 689], as well as a range of excellent tutorial and survey articles. Therefore, we will refrain from providing an overview of these techniques and instead provide general but crucial guidelines for the classification of affective signals, which are often violated. We pose that this triplet of guidelines can significantly improve ASP.

10.3.1 Validation

In the pursuit of a method to trigger emotions in a more or less controlled manner, a range of methods has been applied: actors, images (IAPS) (see Chapter 5), sounds (e.g., music) [316, 681], (fragments of) movies (see Chapters 3, 4, 6, and 7), speech [677], commercials [529], games (including serious gaming), agents, virtual reality [86, 474, 488, 616], reliving of emotions (see Chapters 8 and 9), and real-world experiences [269, 270, 272, 316]; see also Table 2.4. However, how can we know which of these methods actually triggered participants’ true emotions? This is a typical concern of validity, which is a crucial issue for ASP. For ASP purposes, validity can best be obtained through four approaches: content, criteria-related, construct, and ecological validation, which I will discuss next.

Content validity refers to a) the agreement of experts on the domain of interest (e.g., limited to a specific application or group of people, such as twins [427–429]); b) the degree to which a feature (or its parameters) of a given signal represents a construct; and c) the degree to which a set of features (or their parameters) of a given set of signals adequately represents all facets of the domain. For instance, employing only skin conductance level (SCL) for ASP will lead to weak content validity when trying to measure emotion, as SCL is known to relate to the arousal component of an emotion, but not to the valence component. However, when trying to measure only emotional arousal, measuring only SCL may provide strong content validity.

Criteria-related validity handles the quality of the translation from the preferred measurement (e.g., ECG) to an alternative (e.g., BVP), rather than the extent to which the measurement represents a construct (e.g., a dimension of emotion space). Emotions are preferably measured at the moment they occur; however, measurements before (predictive) or after (postdictive) the particular event are sometimes more feasible (e.g., through subjective questionnaires). The quality of these translations is referred to as predictive or postdictive validity, respectively. A third form of criteria-related validity is concurrent validity: a metric for the reliability of measurements (e.g., EDA recording on the foot sole) applied in relation to the preferred standard (e.g., EDA recording on the hand palm). For instance, the more affective states are discriminated, the higher the concurrent validity.
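One simple way to put a number on the relation between an alternative measurement and the preferred standard is a correlation coefficient over simultaneously recorded samples. The sketch below uses Pearson's r for this purpose; that choice, the function name `pearson_r`, and the EDA values are assumptions made for illustration only, and the text does not prescribe a specific metric.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long, simultaneous recordings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant signal")
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical EDA samples (microsiemens) recorded at the same moments:
# the preferred standard (hand palm) and the alternative site (foot sole).
palm = [2.1, 2.4, 3.0, 3.6, 3.1, 2.5]
foot = [1.0, 1.2, 1.5, 1.9, 1.6, 1.3]
r = pearson_r(palm, foot)
print(f"correlation between palm and foot recordings: {r:.3f}")
```

A high correlation only shows that the two sites covary in this sample; whether the alternative site discriminates affective states equally well would still have to be tested separately.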

A construct validation process aims to develop a nomological network (i.e., a ground truth) or an ontology or semantic network, built around the construct of interest. Such a network requires theoretically grounded, observable, operational definitions of all constructs and the relations between them. Such a network aims to provide a verifiable theoretical framework. The lack of such a network is one of the most pressing problems ASP is coping with. This problem has been assessed in Chapters 5 and 6, which applied both the valence-arousal model and basic emotion categories as representations for affective states. A frequently occurring mistake is that emotions are denoted where moods (i.e., longer, object-unrelated affective states with very different physiology) are meant. This is very relevant for ASP, as it is known that moods are accompanied by very different physiological patterns than emotions are [223].

Ecological validity refers to the influence of the context on measurements. We identify two issues: 1) Natural affective events are sparse, which makes it hard to let participants cycle through a range of affective states in a limited time frame; and 2) The affective signals that occur are easily contaminated by contextual factors; so, using a context similar to that of the intended ASP application for initial learning is of vital importance. Although understandable from a measurement-feasibility perspective, emotion measurements are often taken in controlled laboratory settings. This makes results poorly generalizable to real-world applications.

The concern of validity touches upon the essence of research. However, it is still frequently ignored in branches of computer science. With this guideline, I hope to have provided some workable definitions of four types of validity that are crucial for ASP. These four types of validity should be respected, both when conducting research and when developing applications.

10.3.2 Triangulation

Triangulation is the strategy of combining multiple data sources, investigators, methodological approaches, theoretical perspectives, or analytical methods within the same study [344, 651]. This provides the methodological instruments to “separate the construct under consideration from other irrelevancies in the operationalization” [273, p. 15901]. We propose to adopt this principle of triangulation, as applied in the social sciences and human-computer interaction, for ASP. Within the domain of affective computing, the constructs under investigation are emotions, and the irrelevancies can be the various sources of noise mentioned in Chapter 2 and in Section 10.2.1.

Generally, five types of triangulation are distinguished [344, 651], each having its own advantages, namely:

1. Data triangulation: Three dimensions can be distinguished in data sources: time, space (or setting), and person (i.e., the one who obtained the recordings) [154]. Time triangulation can be applied when data is collected at different times [344]; for example, as is done by Picard et al. [524], Healey and Picard [272], and Janssen, Van den Broek, and Westerink [316]. In general, variance in events, situations, times, places, and persons is considered a source of noise; however, it can also add to the study. Extrapolations on multiple data sets can provide more certainty in such cases. In turn, corrections can also be made to atypical data in a result set that clearly deviates from other results [651].

2. Investigator triangulation: Multiple observers, interviewers, coders, or data analysts can participate in the study. Agreement between these researchers, without prior discussion or collaboration with one another, increases the credibility of the observations [154]. This type of triangulation is particularly suited to including context and unveiling events, as these often involve subjective interpretations of events; see also Section 10.2.4.

3. Methodological triangulation: This can refer to either data collection methods or research designs [404]. The major advantage is that deficiencies and biases that stem from a single method can be countered [651]. Multiple data sets (e.g., both qualitative and quantitative) and signal processing techniques (e.g., in the time and spectral domain) can be employed (see Table 2.4). Moreover, multiple feature extraction paradigms, feature reduction algorithms, and classification schemes can be employed (again, see Table 2.4). Further, note that methodological triangulation is also called multi-method, mixed-method, and methods triangulation [233].

4. Theoretical triangulation: Employing multiple theoretical frameworks when examining a phenomenon [154, 301, 396]; for example, using both a categorical (or discrete) and a continuous (e.g., valence-arousal) model of emotion [673, 676]. See Chapters 5 and 6 for a discussion on this topic.

5. Analytical triangulation: The combination of multiple methods or classification methods to analyze data [344, 682]. As is shown in Table 2.4, this approach has already often been employed. For example, Picard et al. [524] and Healey and Picard [272] combined different signals from the same modality, and Kapoor, Burleson, and Picard [328] and Bailenson et al. [25] combined biosignals with a vision-based approach. Paulmann, Titone, and Pell [512] combined speech processing and eye tracking, which revealed that emotional prosody has a rapid impact on gaze behavior during social information processing. This facilitates (cross) validation of data sources, as is also described in Section 10.3.1.
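For investigator triangulation, agreement between independent coders can be quantified. A minimal sketch follows, assuming two coders who each assigned one label to the same sequence of events. Cohen's kappa is used here as a common chance-corrected agreement measure; that choice, the function name `cohens_kappa`, and the labels are illustrative assumptions and are not prescribed by the text.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two coders labeling the same events."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("both coders must label the same non-empty event list")
    n = len(coder_a)
    # Proportion of events on which the two coders chose the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's label frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used one identical label throughout; kappa is formally
        # undefined, and returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical independent annotations of eight events.
coder_a = ["stress", "calm", "calm", "stress", "calm", "calm", "stress", "calm"]
coder_b = ["stress", "calm", "stress", "stress", "calm", "calm", "calm", "calm"]
kappa = cohens_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.3f}")  # kappa = 0.467
```

In this example the coders agree on 6 of 8 events (0.75), while chance agreement is 34/64, which yields a kappa of 7/15.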

In general, in well-controlled research, we advise the recording of at least 3 affective biosignals and the derivation of at least 3 features from them, for each construct under investigation. In ambulatory, real-world research much more noise will be recorded, as also described in Chapter 2 and Section 10.2.1. To ensure that this noise can be canceled out, we advise the recording of many more affective biosignals and also the extraction of more features from them. As a rule of thumb for ambulatory research, we advise researchers to record as many signals as possible, while avoiding interference with participants’ natural behavior. However, a disadvantage accompanies this advice, as “a ‘more is better’ mentality may result in diluting the possible effectiveness of triangulation” [651, p. 256]. Moreover, where possible, qualitative and subjective measures should always accompany the signals (e.g., questionnaires, video recordings, interviews, and Likert scales); for example, see [272, 616, 677, 716].

10.3.3 User identification

Throughout the field of affective computing, there is considerable ongoing debate on generic versus personal approaches to emotion recognition. Some research groups specialized in affective computing have moved from general affective computing to affective computing for specialized groups or individuals. In general, the identification of users has major implications for ASP. We propose three distinct categories, from which research in affective science could choose:

1. all: generic ASP; see also Table 2.4 and [676, 679, 681]

2. group: tailored ASP; for example, see [104, 188, 274, 354, 592, 627, 633]

3. individual: personalized ASP; for example, see [40, 272, 316, 427–430, 464, 524, 624]
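To make the difference between these three categories concrete, the sketch below standardizes the same feature values against all users, against a group, or against each individual. Z-scoring within a partition is only one simple possibility chosen here for illustration; it is not a method taken from the cited studies, and the function name `zscore_within`, the user and group labels, and the values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Hashable, List, Sequence


def zscore_within(values: Sequence[float], keys: Sequence[Hashable]) -> List[float]:
    """Standardize each value against the samples that share its key."""
    if len(values) != len(keys):
        raise ValueError("values and keys must have equal length")
    buckets = defaultdict(list)
    for value, key in zip(values, keys):
        buckets[key].append(value)
    stats = {key: (mean(vs), pstdev(vs)) for key, vs in buckets.items()}
    result = []
    for value, key in zip(values, keys):
        mu, sigma = stats[key]
        # A partition without any spread carries no information: map it to 0.
        result.append(0.0 if sigma == 0 else (value - mu) / sigma)
    return result


# Hypothetical skin conductance levels of two users with different baselines.
scl = [2.0, 3.0, 4.0, 10.0, 12.0, 14.0]
users = ["A", "A", "A", "B", "B", "B"]
groups = ["students", "students", "students", "students", "students", "students"]

generic = zscore_within(scl, ["all"] * len(scl))  # category: all
tailored = zscore_within(scl, groups)             # category: group
personalized = zscore_within(scl, users)          # category: individual
print(generic)
print(personalized)
```

With the generic reference, every sample of user A falls below the mean simply because of that user's lower baseline; with the individual reference, both users show the same relative pattern. Since both hypothetical users belong to one group, the tailored result coincides with the generic one here.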

Although attractive from a practical point of view, the category all will probably not solve the mysteries concerning affect. It has long been known in physiology, neurology, and psychology that special cases can help in improving ASP [633]. For the categories group and individual, the following subdivision can thus be made:

1. Specific characteristics; for example, autism [119], depression [592], and criminals versus students [274], but also baseline blood pressure, hypertensive medication, body mass, smoking [625], and alcohol consumption [227]; see also [633] and [85, Chapter 31; p. 732].

2. Psychological traits; for example, personality [57, 188, 354, 362, 441, 676] or empathy [80, 152, 251, 612].

3. Demographics; for example, age [314, 435, 553], sex/gender [361, 718], nationality [458], ethnicity/race [585, Chapter 28], [56, 314, 401, 603, 627], culture [56, 239, 450, 470], socioeconomic status [239, 470], and level of education [676].

4. Activities; for example, office work [316, 681], driving a car [272, 329, 330, 474] or flying a plane, and running [270].

This subdivision is based on current practice with ASP; however, possibly it should be altered.

So far, comparisons between research results on ASP are mostly made between results of either individuals or groups selected to resemble the general population (cf. Table 2.4). However, user-tailored approaches should be explored as well. In particular, experiences with specific groups can substantially contribute to the further development of ASP, as has been seen in other sciences (e.g., biology, psychology, and medicine).

Having said that, the question remains how to handle this striking variety between people. We propose three approaches that can possibly tackle this problem:

1. Hybrid classification systems [45]. Most often, such architectures incorporate both a (logic-based) reasoning system and a pattern recognition component. To the author’s knowledge, this approach has so far not been applied for ASP. It has, however, been applied successfully for speech-based emotion recognition [591].

2. Multi-agent systems and multi-classifier systems [724]. Two approaches within this field could be of interest: 1) multi-layered architectures, where each layer determines the possible classes to be processed or the classifiers to be chosen for the next layer; and 2) an ensemble of classifiers, trained on the same or distinct biosignals and their features, whose outputs are collected into one compound classification, often determined through a voting scheme. For example, Atassi and Esposito (2008) [20] applied a two-layer classification system for speaker-independent classification of six emotions. For more information on this topic, we refer to Lam and Suen [370] and Kuncheva [364].

3. Biosignal signatures. Related to schemes that are used in forensics [559] and with functional neuroimaging (e.g., EEG, fMRI, MEG, NIRSI, SPECT, and PET) [85, Chapter 2; p. 34], ASP could benefit from personalized profiles or schemes that tailor a generic profile to people’s unique biosignal signatures [172, 464, 726]. Moreover, this approach could be extended to incorporate context information, as is already done in forensics [559]. Biosignal signatures require advanced multi-modal data mining and knowledge discovery strategies, and are related to Picard et al.’s baseline ma-
