
G. Underwood et al.

differences in the plausibility of an object appearing in the scene. In Experiment 2 we take the plausibility of objects to an extreme, with the introduction of an experimental condition in which the object of interest in a particular context is anomalous and can be best described as bizarre.

2.1. Method

Participants. Twenty-one University of Nottingham undergraduates took part in this experiment. None had taken part in the first experiment, and all had normal or corrected-to-normal vision.

Stimuli and apparatus. Thirty digital images of recognisable natural scenes were used as stimuli. These were displayed on a standard colour monitor at a distance of 60 cm from the seated participant. These images were digitally edited using Adobe Photoshop software so that a target object appeared in three different scenes. The size and spatial location of the objects were kept constant. For each scene there were three corresponding pictures, providing the basis for the three conditions: plausible, implausible, and bizarre. Each picture contained a sufficient number of distractor items to ensure that the target object had no significance for the participants as a target object in this experiment. Scenes were then grouped so that the contextually consistent object from a given scene, when placed in the same spatial location in its grouped scene, became either implausible or bizarre. For example, a racing car appeared as the consistent target object in a picture of a racetrack, and subsequently appeared as the inconsistent target object in a picture of a farmyard, and as the bizarre target object in a picture of a harbour. Figure 2 shows a different example of objects superimposed on a scene to create plausible, implausible, and bizarre combinations. The rotation of target objects across conditions was again intended to control for the visual saliency and attractiveness of the individual objects themselves. Each target object was pasted in such a way that size and support relations were not violated.

The mean visual saliency values for each picture were again computed using software provided by Itti and Koch (2000). This was performed to avoid any possible correlation between semantic saliency and visual saliency of target objects within different settings. The mean ranks for visual saliency for plausible, implausible, and bizarre pictures were 3.07 (SD = 2.26), 2.80 (SD = 2.12), and 2.77 (SD = 1.96) respectively. A one-factor ANOVA revealed no significant difference in visual saliency across the levels of plausibility (F < 1). This analysis shows that target objects placed in a bizarre setting were no more visually salient within that background (in terms of variations in orientation, intensity and colour) than when placed in a plausible or implausible scene.

Participants’ eye movements and keyboard responses to each display were recorded using an SMI EyeLink tracker which has a spatial accuracy of 0.5 degrees. Eye position was recorded every 4 ms. The eye-tracker was head-mounted, and a chin-rest was also used to minimise head movement and to ensure a constant-viewing distance of stimuli.
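As a rough aid to interpreting these figures, the sketch below converts the stated 0.5 degree accuracy into an on-screen distance at the 60 cm viewing distance, and the 4 ms sampling interval into a sampling rate. The function names are illustrative and are not taken from the original study; the conversion assumes a flat screen viewed perpendicularly.

```python
import math


def visual_angle_to_cm(angle_deg: float, distance_cm: float) -> float:
    """On-screen extent (cm) subtended by a visual angle at a viewing distance."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def sampling_rate_hz(sample_interval_ms: float) -> float:
    """Samples per second for a given interval between samples."""
    return 1000.0 / sample_interval_ms


# Values taken from the apparatus description: 0.5 degrees, 60 cm, 4 ms.
accuracy_cm = visual_angle_to_cm(0.5, 60.0)
rate_hz = sampling_rate_hz(4.0)
print(f"accuracy on screen: {accuracy_cm:.2f} cm, sampling rate: {rate_hz:.0f} Hz")
```

Under these assumptions, 0.5 degrees corresponds to roughly half a centimetre on the screen, and a 4 ms interval to 250 samples per second.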

Ch. 26: Congruency, Saliency and Gist in the Inspection of Objects in Natural Scenes

Figure 2a. A sample picture as used in the plausible condition with the skier wearing the pink jacket as the target object. The fixations of one participant are superimposed on the stimulus in this example, with lines indicating the movements between fixations that were themselves shown as circles. Duration of fixation is indicated by the size of the circles with larger circles indicating longer fixation durations than smaller ones. (See Color Plate 7.)

Figure 2b. A sample picture with fixation patterns of one viewer as seen in the implausible condition with a snowman appearing as the target object. (See Color Plate 8.)

Figure 2c. A sample picture with fixation patterns of one viewer as seen in the bizarre condition with the cow as the target object. (See Color Plate 9.)

Design and procedure. Each participant viewed 30 pictures, with 10 pictures from each scene group (plausible, implausible, and bizarre). As with Experiment 1, the three pictures within a scene group were rotated across participants so that over sets of three participants all were viewed exactly once.
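The rotation of the three pictures within a scene group across participants can be illustrated with a simple cyclic assignment. This is a minimal sketch, not the authors' actual procedure: the cyclic rule, the zero-based indices, and the name `assign_conditions` are assumptions made for illustration.

```python
CONDITIONS = ("plausible", "implausible", "bizarre")


def assign_conditions(participant_index, n_scene_groups=30):
    """Return the condition shown to one participant for each scene group.

    Hypothetical rule: shift the condition by one step for each successive
    participant, so that any three consecutive participants jointly see all
    three versions of every scene group.
    """
    return [
        CONDITIONS[(group + participant_index) % len(CONDITIONS)]
        for group in range(n_scene_groups)
    ]


# Each participant sees 30 pictures, 10 per condition.
first_participant = assign_conditions(0)
counts = {c: first_participant.count(c) for c in CONDITIONS}
print(counts)
```

With 21 participants, a scheme of this kind yields seven complete sets of three, so each of the 90 pictures would be viewed by seven participants.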

The instructions to participants explicitly stated that the experiment involved a recognition memory test and that their eye movements would be monitored whilst they viewed pictures that they would later be asked to discriminate against comparable new ones. The recognition memory test was administered only during practice. Although we are interested only in the first few fixations, we found that in Experiment 1, with a fixed display period, participants sometimes did not complete their scanning of the picture, and so this was changed in Experiment 2, to give full opportunity to inspect every object shown. Each test picture remained on display until the participant pressed a key on the computer keyboard. Each participant saw ten pictures from each condition, and did not see the same scene more than once. Pictures were presented in a randomised order.

2.2. Results

The means of the measures of interest (number of fixations prior to inspection of the target object and duration of the first gaze on the target) are shown in Table 2.
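To make the two measures concrete, the following sketch derives them from a single trial's fixation sequence. It rests on assumptions that the text does not spell out: a gaze is taken to be the run of consecutive fixations on the target from first entry until the eye leaves it, every fixation before that entry counts as a prior fixation, and the data layout and function name are hypothetical.

```python
def target_measures(fixations):
    """Compute target measures for one trial.

    fixations: list of (on_target, duration_ms) tuples in temporal order.
    Returns (n_prior_fixations, first_gaze_ms), or None if the target
    was never fixated.
    """
    for index, (on_target, _) in enumerate(fixations):
        if on_target:
            first_gaze_ms = 0
            # Sum consecutive target fixations until the eye leaves the target.
            for still_on_target, duration_ms in fixations[index:]:
                if not still_on_target:
                    break
                first_gaze_ms += duration_ms
            return index, first_gaze_ms
    return None


# Invented example trial: two fixations elsewhere, then two on the target.
example_trial = [(False, 210), (False, 180), (True, 300), (True, 250),
                 (False, 190), (True, 400)]
n_prior, first_gaze = target_measures(example_trial)
print(n_prior, first_gaze)
```

In this invented trial the target is reached after two prior fixations, and the later return to the target does not contribute to the first gaze.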

Mean number of fixations before the first fixation of the target. A one-factor ANOVA revealed a reliable main effect of semantic consistency upon the number of prior fixations, F(2, 40) = 29.23, MSe = 0.24, p < 0.001. Pairwise comparisons showed that each condition was significantly different to each other (all contrasts at p < 0.01). Objects in the bizarre condition were inspected earlier than those in the implausible condition, which in turn were inspected earlier than objects in the plausible contexts.
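The degrees of freedom of this test (2 and 40) are what a within-participants ANOVA with three conditions and 21 participants would give. The sketch below makes that check explicit. That the analysis was a by-participants repeated-measures ANOVA is an inference from the reported values rather than something stated in the text, and the function name is illustrative.

```python
def repeated_measures_df(n_participants, n_conditions):
    """Degrees of freedom for a one-factor within-participants ANOVA."""
    df_effect = n_conditions - 1
    df_error = (n_conditions - 1) * (n_participants - 1)
    return df_effect, df_error


# 21 participants, 3 plausibility conditions (values from the Method section).
df_effect, df_error = repeated_measures_df(21, 3)
print(df_effect, df_error)
```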


Table 2

Means (and standard deviations) of the fixation measures taken in Experiment 2

Context        Mean number of fixations prior to target fixation    Mean first gaze duration on target (ms)
Plausible      2.68 (0.75)                                          759 (219)
Implausible    2.02 (0.51)                                          841 (274)
Bizarre        1.53 (0.31)                                          913 (290)

Duration of the first gaze on the target object. A one-factor ANOVA indicated a main effect of context on the duration of initial gaze on the target, F(2, 40) = 3.84, MSe = 32577, p < 0.05, and pairwise comparisons found that the only reliable difference was that there were longer gazes on objects in bizarre contexts than upon those that were in plausible settings (p < 0.05).

2.3. Discussion

The results show that when inspecting a picture in preparation for a short memory test, fewer saccades were made prior to inspection of a target object that is semantically inconsistent with the gist of the scene, and that this effect was enhanced with extremely implausible objects. This result confirms and extends the effect reported in our first experiment here. Target objects in the bizarre condition were preceded by fewer prior fixations than targets in the implausible or plausible condition, and similarly, target objects in the implausible condition were preceded by fewer prior fixations than targets in the plausible condition. The data also show that once fixated, the duration of the first gaze on the target object is significantly longer in the bizarre condition than in the plausible condition. If the duration of the first gaze at an object is a reflection of comprehension, then these results indicate that objects that are of great semantic inconsistency are more difficult to process than their plausible counterparts. This is consistent with studies showing that objects are more difficult to detect or name if they are incongruous or violate the logic of a scene (e.g., Biederman, Mezzanotte, & Rabinowitz, 1982; Davenport & Potter, 2004).

An important finding from these experiments is that fixation patterns in scenes were sensitive to the extent of the implausibility of the object. There were differences between bizarre and implausibly placed objects and this difference may help explain an inconsistency in the literature. Incongruous objects have not always attracted early fixations (e.g., De Graef et al., 1990; Henderson et al., 1999), in contrast with the results of the two experiments here, and in contrast with Loftus and Mackworth’s (1978) study as well as other recent experiments that have used photographs of natural scenes (Underwood & Foulsham, 2006). The discrepancy in results found in earlier studies may be explained on the basis of the varying levels of the semantic plausibility of the target stimuli employed. It may be that the greater the semantic implausibility of the target object, the more likely it will be to find an effect of semantic inconsistency on object perception. Perhaps the octopus in Loftus and Mackworth’s (1978) farmyard was more bizarre than the cocktail glass in Henderson, Weeks, and Hollingworth’s (1999) laboratory scene, although we cannot be sure that these examples were typical of the stimuli used in their experiments. Furthermore, this pattern of results cannot be explained by the premise that objects in the bizarre condition were simply more visually salient against a particular background than object–scene pairings in the plausible condition, because there was no difference in visual saliency ranks across the three levels of plausibility in our second experiment.

The finding that incongruous objects (and especially those in the bizarre condition) are fixated sooner than non-informative objects can be argued to demonstrate fast analysis of scene semantics, and that the initial fixations on the scene can be determined in part by the detection of a violation of the gist. These findings support the data presented by Loftus and Mackworth (1978), who argued for at least three stages of picture viewing. First, the gist of the scene must be determined, which is now known to occur very rapidly within a single fixation (Biederman et al., 1982; Davenport & Potter, 2004; De Graef, 2005; Underwood, 2005). Secondly, objects in peripheral vision must be at least partially identified on the basis of their physical characteristics. Lastly, the viewer must determine the probability of encountering the object within a given scene once the gist has been identified. In this model, fixations will be directed to objects with low a priori expectations of being in the scene. Longer durations of the initial gaze upon an inconsistent target object are taken to be due to the process of linking the object to the existing schema for any given scene. Loftus and Mackworth suggest that gist acquisition is analogous to the activation of a schema, and that subsequent fixations are drawn to objects as a means of verifying their association with that schema. The additional time spent gazing at an informative object may reflect the amount of time it takes to add the new object to the many instances of the scene represented within the viewer’s personal schema. Alternatively, the longer duration of initial fixation may be due to the viewer actively searching for the appropriate schema to which the inconsistent object belongs as a means of attempting to understand why that object might have been placed within the incongruent scene. Once an incongruent object has been detected, it may take longer to resolve the conflict between the schemas activated by the object and the scene. Hollingworth and Henderson’s (2000) ‘Attentional Attraction’ hypothesis is a potential explanation of the earlier fixation of an inconsistent object as well as longer gaze durations. Here, covert attention is drawn to an object when there is difficulty reconciling the identity of the object with the scene schema. The role of attention may be to make sure that a perceptual mistake has not been made (“is that really a cow on a ski slope?”) or to check for additional information that could help to reconcile the conceptual discrepancy.

On the basis of the present data we can conclude that the top-down cognitive mechanisms involved in recognising the gist of a picture and identifying regions of potential semantic inconsistency are used to guide fixations. In order to initiate an eye movement that results in the early fixation of the inconsistent object, the recognition of the scene schema must be completed very early during picture inspection.

References

Biederman, I., Mezzanotte, R. J., & Rabinowitz, J. C. (1982). Scene perception: Detecting and judging objects undergoing relational violations. Cognitive Psychology, 14, 143–177.

Davenport, J. L., & Potter, M. C. (2004). Scene consistency in object and background perception. Psychological Science, 15, 559–564.

De Graef, P. (2005). Semantic effects on object selection in real-world scene perception. In G. Underwood (Ed.), Cognitive processes in eye guidance (pp. 189–211). Oxford: Oxford University Press.

De Graef, P., Christiaens, D., & d’Ydewalle, G. (1990). Perceptual effects of scene context on object identification. Psychological Research, 52, 317–329.

Friedman, A. (1979). Framing pictures: The role of knowledge in automatized encoding and memory for gist. Journal of Experimental Psychology: General, 108, 316–355.

Henderson, J. M., Brockmole, J. R., Castelhano, M. S., & Mack, M. (2006). Visual saliency does not account for eye movements during visual search in real-world scenes. (This volume.)

Henderson, J. M., Weeks, P. A., & Hollingworth, A. (1999). The effects of semantic consistency on eye movements during scene viewing. Journal of Experimental Psychology: Human Perception and Performance, 25, 210–228.

Hollingworth, A., & Henderson, J. M. (2000). Semantic informativeness mediates the detection of changes in natural scenes. Visual Cognition, 7, 213–235.

Itti, L., & Koch, C. (2000). A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research, 40, 1489–1506.

Loftus, G. R. (1972). Eye fixations and recognition memory for pictures. Cognitive Psychology, 3, 525–551.

Loftus, G. R., & Mackworth, N. H. (1978). Cognitive determinants of fixation location during picture viewing. Journal of Experimental Psychology: Human Perception and Performance, 4, 565–572.

Mackworth, N. H., & Morandi, A. J. (1967). The gaze selects informative details within pictures. Perception and Psychophysics, 2, 547–552.

Underwood, G. (2005). Eye fixations on pictures of natural scenes: Getting the gist and identifying the components. In G. Underwood (Ed.), Cognitive processes in eye guidance (pp. 163–187). Oxford: Oxford University Press.

Underwood, G., & Foulsham, T. (2006). Visual saliency and semantic incongruency influence eye movements when inspecting pictures. Quarterly Journal of Experimental Psychology, 59, 1931–1949.

Underwood, G., Foulsham, T., van Loon, E., Humphreys, L., & Bloyce, J. (2006). Eye movements during scene inspection: A test of the saliency map hypothesis. European Journal of Cognitive Psychology, 18, 321–342.

Chapter 27

SACCADIC SEARCH: ON THE DURATION OF A FIXATION

IGNACE TH. C. HOOGE, BJÖRN N. S. VLASKAMP and EELCO A. B. OVER

Utrecht University, The Netherlands

Eye Movements: A Window on Mind and Brain

Edited by R. P. G. van Gompel, M. H. Fischer, W. S. Murray and R. L. Hill Copyright © 2007 by Elsevier Ltd. All rights reserved.

I. Th. C. Hooge et al.

Abstract

Is it the fixated stimulus element or the fixation history that determines fixation duration? We measured 93 922 fixations to investigate how fixation times are adjusted to the demands of a visual search task.

Subjects had to search for an O among C’s. The C’s could have a large gap (0.220°) or a small gap (0.044°). We varied the proportions of both types of C’s in the displays. The main results are that: (1) fixation time depended on the element fixated (small gap C’s were fixated 40 ms longer than large gap C’s), (2) fixation time on large gap C’s decreased with increasing proportion of large gap C’s in the display and (3) fixation time on large gap C’s depended on the gap size of the previously fixated element.

We conclude that fixation time depends both on the fixation history (expected analysis time) and on the current fixation element (actual analysis time). The contribution of both components to fixation time may depend on the task and the amount of useful information in the displays. Pre-programming of fixation times appears to be conservative such that extension of fixation time occurs in the next fixation whereas shortening of fixation time appears to be delayed.

Ch. 27: Saccadic Search: On the Duration of a Fixation

To inspect a visual scene, observers usually make saccades to direct the fovea to interesting parts of the scene. Between saccades the eyes fixate and visual analysis of the scene may take place. The visual system needs time to analyse the retinal image. This time may vary from several tens of milliseconds to hundreds of milliseconds. It is obvious that fixation times have to be adjusted to these analysis times to make search efficient. If fixation is too short, the retinal image will be overwritten by the next retinal image before it is processed. If fixation is too long, observers waste time doing nothing.

Fixation times have been a research topic for years (for a good review, see Rayner, 1998) and many authors have shown that longer mean fixation times are found in more difficult tasks (Cornelissen, Bruin, & Kooijman, 2005; Hooge & Erkelens, 1996, 1998; Jacobs, 1986; Näsänen, Ojanpää, & Kojo, 2001; Rayner & Fisher, 1987; Vlaskamp, Over, & Hooge, 2005), suggesting that fixations are set to longer durations when the search task requires it. However, this is not a strict one-to-one relation; one may find long and short individual fixation times in a search task with homogeneous stimulus material (e.g., one type of non-target). In addition, it is found in a variety of tasks that the distribution of fixation times is broad and that its width scales with the mean (Harris, Hainline, Abramov, Lemerise, & Camenzuli, 1988).

Usually, individual fixations last from 50 to 700 ms. According to Viviani (1990) at least three processes may take place during fixation: (1) saccade programming, (2) analysis of the foveal image and (3) selection of a new saccade target.

It takes about 150 ms to program a saccade to a target appearing randomly in time and space (Becker & Jürgens, 1979). As stated above, individual saccades may have preceding fixations that are shorter than 150 ms, because multiple saccades may be programmed in parallel (McPeek, Skavenski, & Nakayama, 2000). If two saccades are programmed shortly after each other, the fixation time between these saccades becomes short.

The analysis of the foveal image and the selection of the saccade target also consume time and these processes may occur in parallel with saccade programming (Viviani, 1990; Hooge & Erkelens, 1996). Search is effective if fixations are long enough to allow foveal analysis and saccade target selection to take place. However, it is not clear how saccade programming and these two visual processes are synchronized. Also questionable is how precisely they are synchronized. Hooge and Erkelens (1996) report the occurrence of many return saccades. From this it is hypothesized that return saccades bring the eyes back to a location that was fixated too briefly for visual analysis to be completed. The majority of the fixations, however, are long enough. How does the brain determine the duration of a fixation? At the least, the brain should have knowledge (probably in advance) about the time required for visual analysis to program saccades in such a way that search is effective.

An interesting discussion concerns whether fixation times are pre-programmed by the use of the expected analysis time determined during previous fixations or whether the foveal analysis is monitored (process-monitoring) and fixation times are adjusted after visual analysis is complete (Greene & Rayner, 2001; Hooge & Erkelens, 1996, 1998; Rayner, 1978; Rayner & Pollatsek, 1981; Vaughan, 1982; Vaughan & Graefe, 1977).
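The contrast between the two accounts can be illustrated with a toy simulation. This is only a sketch under strong assumptions and not a model proposed in the chapter: the running-average update rule, the parameter values, and the function names are all hypothetical, and saccade programming time is ignored.

```python
def preprogrammed_durations(analysis_times_ms, initial_estimate_ms=200.0, weight=0.5):
    """Each fixation lasts as long as the analysis time expected from history.

    Assumption: the expectation is a running average that is updated only
    after the current element has been analysed.
    """
    estimate = initial_estimate_ms
    durations = []
    for actual_ms in analysis_times_ms:
        durations.append(estimate)  # duration set before analysing this element
        estimate = (1.0 - weight) * estimate + weight * actual_ms
    return durations


def process_monitoring_durations(analysis_times_ms):
    """Each fixation ends when analysis of the current element is complete."""
    return [float(actual_ms) for actual_ms in analysis_times_ms]


# Invented analysis times: the third element is harder than the others.
analysis_times = [200, 200, 240, 200]
print(preprogrammed_durations(analysis_times))
print(process_monitoring_durations(analysis_times))
```

In this sketch a difficult element lengthens the fixation on that element under process-monitoring, whereas under pre-programming it lengthens only the following fixation, which is the kind of signature that distinguishes the two accounts.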

The main evidence for process-monitoring comes from onset-delay experiments in reading research. In an onset-delay paradigm, certain words disappear at fixation onset and