From: Eye Movements: A Window on Mind and Brain (Van Gompel, 2007)

G. Underwood et al.

Abstract

Early studies of the inspection of scenes suggested that eye fixations are attracted to objects that are incongruent with the gist of a picture, whereas more recent studies have questioned this conclusion. The two experiments presented here continue to investigate the potency of incongruent objects in attracting eye fixations during the inspection of pictures of real-world scenes. Pictures sometimes contained an object that violated the gist, such as a cow grazing on a ski slope, and the question asked was whether fixations were attracted to these objects earlier than when they appeared in congruent contexts. In both experiments earlier fixation of incongruent objects was recorded, suggesting a role for peripheral vision in the early comprehension of the gist of a scene and in the detection of anomalies.

Ch. 26: Congruency, Saliency and Gist in the Inspection of Objects in Natural Scenes

Are fixations attracted to objects that violate the gist of a scene? Are viewers more likely to look at a wrench on an office desk than at a stapler? Studies of the inspection of scenes have suggested that eye fixations are attracted to objects that are incongruent with the gist of a picture, but more recently results have suggested that incongruous objects have no special status. The present experiments attempt to resolve this debate with new data.

Mackworth and Morandi (1967) recorded eye movements as their participants viewed colour photographs, and found that more eye fixations were made to regions of the scene that were rated as being more informative by a group of independent judges. This effect was present during the first 2 s of inspection as well as later in the sequence long after the gist of the scene would have been acquired. This suggests that objects are analysed in terms of their meaning very early in the inspection of scenes. Loftus and Mackworth (1978) confirmed the effects of top-down processes on the early inspection of scenes. Their viewers were presented with line drawings of recognisable scenes (e.g. a farmyard) in which one particular object was anomalous (an octopus in the farmyard, for example) or congruent (a tractor). Viewers were more likely to look earlier and for longer durations at anomalous objects than at congruent objects placed in the same location, suggesting that the semantic content of pictures influences early inspection and is used to direct attention to these objects. Subsequent research using line drawings of scenes has questioned whether covert attention and overt gaze shifts preferentially seek out objects that have a low semantic plausibility in the scene (De Graef, Christiaens, & d’Ydewalle, 1990; Friedman, 1979; Henderson, Weeks, & Hollingworth, 1999). In a more recent study of the attention given to incongruent objects in photographs of real-world scenes, however, we have reported their effectiveness in attracting early eye fixations (Underwood & Foulsham, 2006). The congruency effect therefore appears inconsistently across studies.

De Graef et al. (1990) proposed that effects of contextual violations are only apparent during later stages of scene inspection, contrary to the notion of schema-driven object perception effective within a single fixation. Henderson et al. (1999) also recorded eye movements during the inspection of line drawings of familiar scenes and similarly found no early effects of incongruous objects. In their first experiment viewers were free to inspect the scene in preparation for a memory test, and in the second they searched for a target object in order to make a present/absent decision. Scenes sometimes contained incongruous objects (for example, a microscope in a bar room), and sometimes the object appeared in a congruous scene (the microscope was in a laboratory setting). There was no evidence of the early fixation of incongruent objects, suggesting that the early inspection of scenes is determined by visual factors as opposed to semantic factors.

This pattern of results prompted Henderson et al. (1999) to suggest a “saliency map” model of eye guidance in scene perception, in which the initial analysis of the image identifies the low-level visual features such as colour and brightness variations. Eye movements are initially guided to the areas of greatest conspicuity in this account, rather than to anomalies of the gist because the meaning of the picture is not identified at this stage. The gist can only be identified after the initial analysis of the visual characteristics of the image. Henderson, Brockmole, Castelhano, and Mack (this volume) have recently revised this account, placing less emphasis on visual saliency. The present experiments exclude the possibility that attention is attracted to incongruous objects because they have high visual conspicuity, by controlling for the saliency of the objects edited into the pictures.

An issue of concern here is the lack of uniformity in the visual complexity of the stimuli that past researchers have used to investigate the effect of congruency on object perception. The line drawings used by De Graef et al. (1990) and Henderson et al. (1999) were created by tracing and somewhat simplifying the contours of photographs of real-world settings to produce what were described as “reasonable approximations to the level of complexity found in natural scenes” (Henderson et al., 1999, p. 212). Due to the stylisation and abstraction of these drawings, however, it was often extremely difficult to recognise the target object. This problem is acknowledged by De Graef et al. (1990), who reported that many objects were excluded from analysis due to difficulties with identification. To what extent does a microscope in a bar room, or indeed a cocktail glass in a laboratory (as used by Henderson et al., 1999), contribute to our overall understanding of the gist of the scene? It seems intuitively reasonable to predict that strange and out-of-place objects, such as an octopus in a farmyard or a tractor on the sea bed, should be fixated relatively early during scene inspection, once the gist has been recognised and when a gist-violating object starts to present difficulties in the resolution of the meaning of the picture.

Our first experiment addressed the question of whether the semantic content of a scene attracts attention, by recording any differences in the initial eye fixations on objects that are placed in expected or unexpected contexts. Whereas Loftus and Mackworth (1978) found that incongruous objects attracted early fixations – an indication of attentional capture following the detection of a scene inconsistency – other studies have found no effect of incongruency (De Graef et al., 1990; Henderson et al., 1999). All three studies used line drawings, but in the Loftus and Mackworth experiment the object of interest may have been drawn with no features overlapping with the background. The published example suggests this, but in contrast De Graef et al. (1990) and Henderson et al. (1999) drew their objects against cluttered backgrounds. The possibility here is that the effect reported by Loftus and Mackworth depends upon visual conspicuity, and that incongruity will be effective only if a conspicuous object stands out from its background. We have previously confirmed the potency of visual conspicuity in picture inspection (Underwood & Foulsham, 2006; Underwood, Foulsham, van Loon, Humphreys, & Bloyce, 2006), and so it is necessary to eliminate low-level visual saliency as a potential confound. When viewers inspect pictures in preparation for a memory test, their early fixations are drawn to regions that are visually distinctive, although in our search tasks and in the search task reported by Henderson et al. (this volume) the effects of saliency are minimal. They are not minimal in free-inspection tasks, however. To control for effects of visual saliency we screened all stimuli for conspicuity differences between objects in the pictures because the task set for the viewers was to inspect in preparation for a memory test. 
Itti and Koch (2000) have developed software for the determination of the saliency of regions on pictures according to variations in colour, brightness, and orientation, and this program was used to match the saliency values of objects as they appeared in their different contexts.

1. Experiment 1: Incongruous objects in pictures

This experiment investigated whether the semantic content of a scene determines inspection. It investigated whether attention is drawn to objects that are semantically congruent or incongruent to the rest of the scene. The objects from the first experiment were placed in congruent or incongruent backgrounds. Eye movements were recorded during the inspection of these scenes. By comparing the congruent and incongruent conditions, it could be established whether attention is drawn to objects that are appropriate to the scene or inappropriate to the scene. A neutral condition was also included to obtain baseline values for inspection of the same objects in the absence of contextual cues.

1.1. Method

Participants. The volunteers in this experiment were 18 undergraduates from the University of Nottingham and all had normal or corrected-to-normal vision.

Stimuli and apparatus. Three different sets of 60 pictures were created and each picture contained one object of interest. Each set consisted of an equal number of congruent, incongruent, and neutral pictures, defined according to whether the object would normally be found in that location. The 20 objects in each of the three conditions were equally often from indoor and outdoor locations. Examples are shown in Figures 1a (congruent), 1b (incongruent), and 1c (neutral).

The pictures were created by editing the objects into photographs of background scenes using Adobe Photoshop software. The objects were matched so that each indoor object was paired with an outdoor object with similar physical features (for example, a vacuum cleaner and a lawn mower, as in Figure 1a). The congruent pictures were created by placing each object onto its appropriate background (for example, a vacuum cleaner was placed in a hall, as in Figure 1a); the incongruent pictures were created by replacing the object in each congruent picture with its matched object (in this example, the vacuum cleaner would be replaced by the lawn mower, as in Figure 1b). The neutral pictures were created by editing each object into one of the neutral backgrounds, such as pictures of brick walls that could be indoors or outdoors (Figure 1c). Objects of interest were always edited into pictures away from the centre, where the initial fixation was to be made. An additional four pictures were created for practice trials. The three sets of pictures contained each background scene and each object of interest, in three different combinations. Objects were permuted across conditions to create the three sets of pictures for use with three groups of participants. This ensured that each object would be seen by each participant, and that over the course of the experiment each object was seen equally often against each of the three backgrounds.
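The permutation of objects across the three picture sets can be illustrated with a simple rotation. This is only a sketch: the rotation rule, the object ids (0 to 29 taken to be indoor objects, 30 to 59 outdoor objects) and the function name are assumptions made for illustration, not the procedure actually used, and the pairing of objects with specific background scenes is not modelled.

```python
from collections import Counter

CONTEXTS = ("congruent", "incongruent", "neutral")
N_OBJECTS = 60  # assumed ids: 0-29 indoor objects, 30-59 outdoor objects


def build_picture_sets(n_objects=N_OBJECTS):
    """Return one {object_id: context} mapping per picture set.

    Hypothetical rotation rule: object i appears in context (i + s) mod 3
    in set s, so every object meets every context exactly once across sets.
    """
    return [
        {obj: CONTEXTS[(obj + s) % len(CONTEXTS)] for obj in range(n_objects)}
        for s in range(len(CONTEXTS))
    ]


picture_sets = build_picture_sets()
for s, assignment in enumerate(picture_sets):
    # Each set should contain 20 objects per context.
    print(s, Counter(assignment.values()))
```

Under this rule each set holds 20 pictures per context, with 10 indoor and 10 outdoor objects in each context, and no participant group sees the same object twice.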

Figure 1a. An example of a congruent indoor picture (top panel) and a congruent outdoor picture (bottom panel). (See Color Plate 4.)

Each photograph was processed using the Itti and Koch (2000) software for the determination of the peaks of low-level visual saliency. This program identifies changes in brightness, colour, and orientation and creates a saliency map that is a representation of locations that are visually conspicuous. The rank order of saliency peaks on each picture was used to compare each object as it appeared in the congruous, incongruous, and neutral contexts. A rank of 1 was allocated to the most salient object in the scene. The mean rankings for objects were: congruous context 3.65 (SD = 2.08), incongruous context 3.55 (SD = 1.92), and neutral context 1.05 (SD = 0.22). The congruous and incongruous rankings did not differ (t < 1). All but one of the objects in the neutral context were identified as the visually most salient object. As a consequence of this inhomogeneity of variance, no statistical comparison was possible between the neutral condition and the other two conditions. It was concluded that the objects in congruous and incongruous contexts had similar saliency rankings, and that they were less salient in these contexts than the same objects placed in neutral scenes.

Figure 1b. An example of an incongruent indoor picture (top panel) and an incongruent outdoor picture (bottom panel). (See Color Plate 5.)
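The ranking step described above can be sketched as follows. The sketch does not call the Itti and Koch (2000) software; it assumes that the saliency peaks have already been extracted as (x, y, strength) tuples and that the object of interest is described by a rectangular bounding box. Both representations, the function name and the example values are hypothetical.

```python
def object_saliency_rank(peaks, box):
    """Return the 1-based rank of the strongest saliency peak inside box.

    peaks: list of (x, y, strength) tuples (assumed pre-extracted).
    box:   (left, top, right, bottom) bounding box of the object (assumed).
    Returns None if no peak falls on the object.
    """
    left, top, right, bottom = box
    ordered = sorted(peaks, key=lambda p: p[2], reverse=True)
    for rank, (x, y, _strength) in enumerate(ordered, start=1):
        if left <= x <= right and top <= y <= bottom:
            return rank
    return None


# Illustrative values only, not data from the experiment.
example_peaks = [(100, 100, 0.9), (400, 300, 0.7), (250, 220, 0.5)]
example_box = (380, 280, 450, 340)
example_rank = object_saliency_rank(example_peaks, example_box)
print(example_rank)  # 2: the object holds the second-strongest peak
```

A rank of 1 then means that the object coincides with the most conspicuous location in the picture, as was the case for nearly all objects in the neutral scenes.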

The pictures were presented on a personal computer with a standard colour monitor, which was approximately 60 cm from the seated participant. Stimuli were presented using the E-prime experimental control software. Eye movements were recorded using an SMI iView X RED system with a remote camera for recording eye position every 20 ms.

Figure 1c. An example of a neutral picture containing an indoor object (left panel) and a neutral picture containing an outdoor object (right panel). (See Color Plate 6.)

Head movements were restricted by requiring participants to rest their head on a chin rest. The spatial accuracy of the system was 0.5 degrees.

Design and procedure. A two-factor within-groups design was used; the factors were object type (indoor and outdoor) and background (congruent, incongruent and neutral). Each participant saw one of the sets of 60 pictures (thus there were three different groups of participants). This design ensured that participants did not see the same object twice during the course of testing, but ensured that each object appeared in each of the three contexts over the course of the whole experiment. Each participant saw each object and each background scene just once during the experiment, but different participants saw the objects and scenes in different combinations.

Participants were first calibrated with the eye-tracker. Calibration involved fixating a central marker, which then appeared at 8 points within the sides of the display area once eye movements had stabilised. Before each picture was presented a fixation cross appeared in the centre of the screen for 1000 ms and each picture was then presented for 5000 ms. After the practice trials were presented, a text display was shown giving instructions for the memory test. In the memory test another picture was presented that had or had not been seen previously. Participants had to decide whether this picture had been previously presented. The picture was displayed until the participant made a response that also terminated the display. At the end of each block of 12 pictures, the same memory test instructions as those given in the practice trial were presented, and a test picture was then presented. The test picture either had been presented in that block or was a new picture. Eye movements were recorded during the presentation of all of the pictures, but not during the memory test.
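The timing figures given in the method imply a few quantities that are easy to check. The arithmetic below uses only the values stated in the text (20 ms sampling interval, 1000 ms fixation cross, 5000 ms picture, 60 pictures per set, blocks of 12); the variable names are illustrative, and the total excludes practice trials, calibration and the self-paced memory tests.

```python
SAMPLE_INTERVAL_MS = 20    # eye position recorded every 20 ms
FIXATION_CROSS_MS = 1000   # central cross before each picture
PICTURE_MS = 5000          # presentation time of each picture
PICTURES_PER_SET = 60
BLOCK_SIZE = 12            # a memory test follows each block

sampling_rate_hz = 1000 // SAMPLE_INTERVAL_MS
samples_per_picture = PICTURE_MS // SAMPLE_INTERVAL_MS
n_blocks = PICTURES_PER_SET // BLOCK_SIZE
paced_time_s = PICTURES_PER_SET * (FIXATION_CROSS_MS + PICTURE_MS) / 1000

print(sampling_rate_hz, samples_per_picture, n_blocks, paced_time_s)
```

This corresponds to a 50 Hz recording with 250 gaze samples per picture, five blocks (and so five memory tests) per participant, and six minutes of experimenter-paced presentation.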

1.2. Results

Measures of the fixation of the object of interest in each picture are the focus here, and this object will be referred to as the target object. The mean number of fixations before target fixation, and the mean duration of the first gaze on the target were calculated according to the type of background and the type of object (see Table 1). These measures give an indication of the delay in first fixating the object of interest and the amount of attention given to it during the first inspection.

Mean number of fixations before the first fixation of the target. The fixation count prior to target fixation was defined as the number of fixations from the onset of the display (including the initial central fixation) up to the point of first fixation upon the target object, providing a measure of the degree to which the semantic anomaly in a peripheral region of the scene could guide early foveal fixation. The two factors entered into a 2 × 3 ANOVA were background, which had three levels (congruent, incongruent, and neutral), and object, which had two levels (indoor and outdoor). There was a main effect of background on the number of fixations before fixating on the target object, F(2, 34) = 130.91, MSe = 0.012, p < 0.001. Pairwise comparisons showed that there was a reliable difference between the congruent and incongruent contexts (p < 0.05), with target fixation occurring after fewer fixations when the background was incongruent.
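The fixation-count measure defined above can be sketched in a few lines. The sketch assumes that each trial is available as an ordered list of fixation coordinates beginning with the initial central fixation, and that the target occupies a rectangular region; the data layout, function name and example coordinates are hypothetical. The count is taken to exclude the first fixation on the target itself.

```python
def fixations_before_target(fixations, box):
    """Count fixations preceding the first fixation inside the target box.

    fixations: ordered list of (x, y) positions, starting with the
               initial central fixation (assumed layout).
    box:       (left, top, right, bottom) target region (assumed).
    Returns None if the target is never fixated.
    """
    left, top, right, bottom = box
    for index, (x, y) in enumerate(fixations):
        if left <= x <= right and top <= y <= bottom:
            return index  # number of fixations made before this one
    return None


# Illustrative trial: centre, one intermediate fixation, then the target.
example_trial = [(512, 384), (300, 200), (620, 450)]
example_target = (600, 400, 700, 500)
example_count = fixations_before_target(example_trial, example_target)
print(example_count)  # 2
```

On this definition a target that is fixated immediately after the central fixation scores 1, which is close to the values observed for objects in neutral scenes.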

Table 1

Means (and standard deviations) of the fixation measures taken in Experiment 1

Context       Object    Mean number of fixations     Mean first gaze duration
                        prior to target fixation     on target (ms)

Congruent     Indoor    3.83 (1.44)                  963 (472)
              Outdoor   2.91 (0.9)                   1156 (451)
Incongruent   Indoor    2.48 (0.63)                  1425 (617)
              Outdoor   3.07 (0.70)                  1346 (577)
Neutral       Indoor    1.39 (0.37)                  2190 (736)
              Outdoor   1.31 (0.47)                  2420 (779)

There was earlier fixation of a target object in a neutral context relative to the target in an incongruent and a congruent context (both contrasts at p < 0.001). There was no main effect of indoor/outdoor object type, F(1, 17) = 1.321, but this factor did interact with background congruency, F(2, 34) = 5.951, MSe = 0.016, p < 0.01. Pairwise comparisons showed that for pictures containing indoor objects, there were reliable differences between those with congruent and incongruent backgrounds (p < 0.001), with fixation occurring earliest when the background was incongruent. There was earlier fixation of objects in neutral contexts in comparison with both the incongruent and congruent backgrounds (both contrasts at p < 0.001). For pictures containing outdoor objects, there were significant differences between pictures with incongruent and congruent backgrounds in comparison with objects in neutral contexts (both contrasts at p < 0.001). However, there was no significant difference between pictures with congruent and incongruent backgrounds, unlike the comparisons between pictures containing indoor objects.
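The degrees of freedom reported for these F ratios follow from the design: 18 participants, with background (three levels) and object type (two levels) both varied within participants. A minimal check, assuming a standard repeated-measures ANOVA with no sphericity correction (the helper name is illustrative):

```python
def within_subject_df(levels, n_participants):
    """Return (effect df, error df) for a fully within-subject effect.

    levels: number of levels of each factor involved in the effect,
            e.g. [3] for a main effect or [3, 2] for an interaction.
    Assumes an uncorrected repeated-measures ANOVA.
    """
    effect_df = 1
    for k in levels:
        effect_df *= k - 1
    return effect_df, effect_df * (n_participants - 1)


print(within_subject_df([3], 18))     # background main effect
print(within_subject_df([2], 18))     # object type main effect
print(within_subject_df([3, 2], 18))  # background x object interaction
```

These give (2, 34), (1, 17) and (2, 34) respectively, matching the values reported in the text.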

Duration of the first gaze on the target object. The duration of initial gaze on a target was taken as the sum of the time spent fixating the target object from when the eyes initially landed on the object until the eyes first left that object. A 2 × 3 ANOVA indicated that there was a main effect of background congruency, F(2, 34) = 89.02, MSe = 166575, p < 0.001. Pairwise comparisons showed that there were longer first gazes on incongruent than on congruent objects (p < 0.01), and that they were longer on objects in neutral contexts in comparison with incongruent and congruent contexts (both contrasts at p < 0.001). There was a main effect of object type, with longer gazes upon outdoor objects, F(1, 17) = 7.70, MSe = 51400, p < 0.05, and there was an interaction between background and object, F(2, 34) = 4.50, MSe = 60116, p < 0.05. Pairwise comparisons indicated that for indoor objects all three context conditions differed from each other, with longer gazes on incongruent than on congruent objects (p < 0.01), and on neutral objects than on both congruent and incongruent objects (both at p < 0.001). For outdoor objects, congruent and incongruent object gaze durations did not differ, but first gazes on objects in neutral scenes were again longer than gazes on objects in congruent or incongruent scenes (both at p < 0.001).
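The first-gaze measure defined above sums consecutive fixations on the target during its first visit only. A minimal sketch, assuming each trial is an ordered list of (x, y, duration in ms) fixations and the target is a rectangular region; the data layout, function name and example values are hypothetical:

```python
def first_gaze_duration(fixations, box):
    """Sum fixation durations from first entry into box until gaze leaves.

    fixations: ordered list of (x, y, duration_ms) tuples (assumed layout).
    box:       (left, top, right, bottom) target region (assumed).
    Returns None if the target is never fixated.
    """
    left, top, right, bottom = box
    total_ms = 0
    entered = False
    for x, y, duration_ms in fixations:
        inside = left <= x <= right and top <= y <= bottom
        if inside:
            entered = True
            total_ms += duration_ms
        elif entered:
            break  # the eyes have left the target: first gaze is over
    return total_ms if entered else None


# Illustrative trial: two fixations on the target, an exit, then a return.
example_trial = [
    (512, 384, 250),
    (620, 450, 300),
    (640, 460, 200),
    (100, 100, 180),
    (630, 455, 400),
]
example_target = (600, 400, 700, 500)
example_gaze = first_gaze_duration(example_trial, example_target)
print(example_gaze)  # 500
```

The later return to the target (400 ms in this example) is deliberately excluded, which is what distinguishes first gaze duration from total gaze duration.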

1.3. Discussion

When the background was incongruent with the target object, target fixation occurred after fewer fixations than when it was congruent, and the gaze duration on the object was longer. When the background was neutral (as opposed to congruent or incongruent) the target object was fixated earlier and attracted longer initial gazes. When comparing the congruent and incongruent conditions the largest differences occurred with objects taken from an indoor scene. When a scene contained an indoor object such as a vacuum cleaner, fixation of the target object occurred sooner and the first inspection was longer when the background was incongruent rather than when it was congruent. With an outdoor object such as a lawn mower, the effect of congruency was mainly upon gaze duration. When an object was placed in a neutral scene the target object was looked at for longer, and was looked at sooner than objects placed in either a congruous or an incongruous background. However, this result is not surprising, since the target was the only object to inspect in the neutral condition.

The results established that fixations were more attracted to an incongruent object within a scene than to a congruent object when placed against a similar background, confirming Loftus and Mackworth’s (1978) findings with line drawings and Underwood and Foulsham’s (2006) findings with photographs. The congruency effect with rich real-world scenes suggests that cognitive factors are important in governing the early inspection of scenes. It implies a role of scene-specific schemas in governing attention. These schemas generate expectations about what the scene will contain. When we view a scene we are influenced by these expectations and are drawn to objects that are unexpected.

This experiment established that indoor objects were fixated sooner when the background was semantically incongruent with the object than when it was congruent. However, this effect did not occur for outdoor objects placed in indoor scenes. This is curious and we have no explanation of this inconsistency other than to suggest that plausibility may be responsible. It could be argued that it is more surprising to see an indoor object in an outdoor scene than it is to see an outdoor object in an indoor scene. Although an outdoor object placed in an indoor scene may be semantically incongruent with that scene, its presence may not be necessarily unexpected because outdoor objects are sometimes stored indoors. Therefore, it may be more appropriate to describe this as an effect of expectancy or plausibility rather than congruency. The expectancy effect would imply that we are attracted to objects that are not expected to be in a scene. If the outdoor objects could be equally expected in both the congruent and incongruent conditions then we might expect no congruency effect. The second experiment investigates this possible variation in the plausibility of incongruent objects. In addition to fixations on congruent and incongruent objects we introduced bizarre combinations of objects and scenes, to give further emphasis to the implausibility of an object appearing in that specific context.

2. Experiment 2: Bizarre objects in pictures

Natural photographs of scenes were employed in Experiment 1 to investigate the influence of object-scene congruency upon eye guidance. These pictures were paired so that in one background an object appeared either as congruent or as incongruent in relation to the gist of the scene. In that experiment a third condition was introduced in order to create a neutral baseline as an indication of the amount and duration of fixation received by each object independently of distracters and the acquisition of scene gist. In agreement with the findings of Loftus and Mackworth (1978), it was found that viewers located incongruent objects with fewer prior fixations than their congruent counterparts. Also, once fixated, incongruous objects received longer total gaze duration than congruous objects. It was concluded that viewers are attracted to objects that violate the gist of a scene, and that such eye movements are programmed within the first fixation or two, after the retrieval of the appropriate scene schema. Experiment 1 found inconsistencies in the effects of congruency on the early fixation of indoor and outdoor scenes that may be attributed to