Eye Movements: A Window on Mind and Brain (Van Gompel, 2007)
L. Gareze and J. M. Findlay
location of the target and decreased as distance from the target increased, indicating that participants were fixating the cross prior to each scene display as instructed. The data were analysed with a logistic regression procedure appropriate for the binary outcome variable. There was no overall main effect of consistency (χ²(1) < 1, p = 0.59) but there was a significant interaction between fixation position and consistency (χ²(1) = 4.21, p = 0.040). This occurred because of significantly higher accuracy for consistent targets presented at fixation (χ²(1) = 5.48, p = 0.019), but also significantly higher accuracy for inconsistent targets presented 3° from fixation (χ²(1) = 4.40, p = 0.036) and slightly higher accuracy for inconsistent targets at all other fixation positions, although none of these comparisons was significant when tested individually. This finding suggests that the extrafoveal identification of inconsistent targets was facilitated above performance for consistent targets during the brief presentation of a line drawing scene, even when the target was presented approximately 12° from fixation.
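As an illustration of the kind of test reported above, the following is a minimal sketch of a likelihood-ratio chi-square with one degree of freedom for a single accuracy comparison (for example, consistent versus inconsistent targets at one fixation position). For one binary predictor this is equivalent to comparing a logistic regression model against an intercept-only model. The function name and the counts in the usage note are hypothetical; the chapter does not report raw counts or state which test statistic the regression software produced.

```python
import math


def lr_chi_square_2x2(correct_a, total_a, correct_b, total_b):
    """Likelihood-ratio chi-square (1 df) for a difference in accuracy
    between two conditions, with its p value.

    All arguments are trial counts (hypothetical inputs, not chapter data).
    """
    observed = [
        correct_a, total_a - correct_a,
        correct_b, total_b - correct_b,
    ]
    # Expected counts under the null hypothesis of equal accuracy.
    pooled = (correct_a + correct_b) / (total_a + total_b)
    expected = [
        total_a * pooled, total_a * (1 - pooled),
        total_b * pooled, total_b * (1 - pooled),
    ]
    g2 = 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )
    g2 = max(g2, 0.0)  # guard against tiny negative rounding error
    # For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(g2 / 2.0))
    return g2, p_value
```

With made-up counts of 70/100 correct in one condition and 50/100 in the other, the statistic exceeds the 3.84 criterion for p < 0.05. The interaction tests reported in the chapter involve more than one predictor and would need a full regression fit.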
As individual objects in this experiment were not presented in both a consistent and an inconsistent background, the possibility that an advantage might have been caused by an inadvertent failure to match the inconsistent target’s features to those of the matched consistent target was investigated. A comparison of object sizes across consistent and inconsistent targets indicated that there was no significant difference between the two groups (t(42) < 1, p = 0.42). In fact, consistent targets were on average slightly larger than inconsistent targets, so a size difference was unlikely to result in the advantage for inconsistent targets presented in extrafoveal vision.
It was noted that some participants were unable to identify certain targets even with extended viewing. It was confirmed by a further informal investigation using naïve observers that the target objects were not always identified correctly when presented within a scene, as they were in the experiment. Thus, it is difficult to be certain that the apparent advantage for inconsistent targets at extrafoveal locations can really be attributed to the processing of semantic information.
The presence of an inconsistent object advantage when using images which naïve observers struggled to identify raised the issue of whether the effect produced was generated by the processing of semantic information at all. Visual differences between consistent and inconsistent targets in scenes could have been introduced in the process of creating the images, as inconsistent targets required some manipulation before they could appear compatible with the scene background. To investigate this possibility, Experiment 2 replicated Experiment 1, but displayed inverted images. In every other way, the experiments were identical.
3. Experiment 2
Each image presentation was inverted, to interfere with semantic processing, altering the task to one of matching visual features between the brief inverted scene display and the inverted response display. We hypothesised that if a consistency effect persisted, with improved accuracy for inconsistent targets at extrafoveal locations, this effect could be attributed to visual differences rather than semantic ones. If the inconsistent object advantage were extinguished, this result would suggest that the inversion of the images successfully interfered with semantic processing and abolished the effect.
Ch. 29: Absence of Scene Context Effects in Object Detection and Eye Gaze Capture
3.1. Results
Accuracy in this experiment was slightly lower than that in Experiment 1 (Figure 3b), with performance at the furthest fixation position not significantly better than chance for either consistent or inconsistent targets. The broadly similar identification probabilities in the two experiments suggest that orientation invariant features, perhaps of a visual rather than a semantic nature, were mainly used for the task. The absence of a consistency effect at any eccentricity in Experiment 2 indicates that there was no effect of semantic relationship between the target and the scene background when the images were inverted. Thus the presence of an inconsistent object advantage in the peripheral positions in Experiment 1 supports the suggestion of a contribution from semantic processing.
The data from the two experiments were directly compared using a binary logistic regression analysis in which image orientation was added as a variable. For extrafoveal locations, there was a significant main effect of orientation (χ²(1) = 12.4, p < 0.001), with higher accuracy for upright images (Experiment 1) than for inverted images (Experiment 2). A significant interaction between orientation and consistency (χ²(1) = 5.97, p = 0.015) further suggested that orientation had a greater effect on performance for inconsistent targets than for consistent targets. An additional analysis confirmed this relationship by investigating whether orientation had a significant main effect on consistent and inconsistent trials separately. Image orientation did not significantly affect performance for consistent trials (χ²(1) < 1, p = 0.84), but a significant effect was found for inconsistent trials (χ²(1) = 12.0, p = 0.001), indicating that performance on inconsistent targets at extrafoveal locations was significantly reduced by inverting the experimental images. These results support the hypothesis that some processing of semantic information might have occurred during the experimental process, as inverting the images reduced accuracy.
As noted previously, a cause for concern with the materials was that it was not possible to display the same object in both a consistent and an inconsistent setting and that participants sometimes struggled to identify the targets and the background scenes. Therefore, the extent to which we can attribute a significant consistency effect to semantic processing is compromised. To overcome this, a set of photographic images was produced in which familiar target objects were manipulated so that the same object could be displayed in both a consistent and an inconsistent background.
4. Experiment 3
In Experiment 3, we investigated whether a consistency effect could be produced using more naturalistic (and identifiable) photographic stimuli. The stimuli were created by identifying familiar backgrounds (household scenes) and objects (household items) to serve as targets. Consistent and inconsistent targets were placed in the same location in a scene and each consistent target also served as an inconsistent target in a different scene background. Therefore, consistent and inconsistent targets at the same location were quite closely matched for size and shape and each object was used as a target twice in different scene backgrounds (Figure 4). Only one instance of each scene was presented to each participant.
4.1. Results
The use of photographic images resulted in improved performance (Figure 5a), with accuracy above 90% for both consistent and inconsistent targets presented at fixation, and performance appeared to plateau above chance level between positions 3 and 4 (9° and 12°), at approximately 66%. However, the change in visual stimuli also completely eradicated any evidence of a consistency effect. Even when the target was presented at fixation, consistent and inconsistent targets were identified with the same accuracy.
Figure 4. Experiment 3 – Examples of scenes used as experimental images. (a) Kitchen (consistent target toaster), (b) Kitchen (inconsistent target teddy bear), (c) Playroom (consistent target teddy bear), (d) Playroom (inconsistent target toaster).
Figure 5. Results of experiments using naturalistic scene stimuli. (a) Experiment 3 – Grey-scale photographs. (b) Experiment 4 – Line drawings derived from grey-scale photographs. Graphs show the change in accuracy (%) by approximate eccentricity (0°, 3°, 6°, 9°, 12°, at 60 cm) and target object consistency (+ consistent, × inconsistent). Chance level of 50% is indicated.
As each object served as a consistent and an inconsistent target, we analysed the data with an items analysis using a matched pairs t-test where accuracy for each object was compared across conditions (although the object size was not necessarily constant). There was no difference in performance at all between an object located in a consistent and an inconsistent background (t(31) < 1, p = 0.94). We also compared performance for the size-matched consistent and inconsistent targets, which appeared at the same location. A matched pairs t-test again found no significant difference in accuracy (t(31) < 1, p = 0.95), indicating that there was no effect of consistency between consistent and inconsistent targets in a scene.
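The matched pairs items analysis can be sketched as follows. This is an illustrative implementation of the standard paired t statistic, not the authors' analysis script; the function name and the input format (one accuracy value per object in each condition) are assumptions. With 32 objects the test has 31 degrees of freedom, as in the values reported above.

```python
import math
from statistics import mean, stdev


def paired_t(acc_consistent, acc_inconsistent):
    """Paired t statistic and degrees of freedom for per-object accuracy.

    Both lists hold one accuracy value per object, in the same object order
    (hypothetical input format).
    """
    if len(acc_consistent) != len(acc_inconsistent):
        raise ValueError("both conditions need one value per object")
    diffs = [c - i for c, i in zip(acc_consistent, acc_inconsistent)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two objects are required")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("t is undefined when all differences are identical")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1
```

A p value would additionally require the t distribution, which the Python standard library does not provide, so it is left out of this sketch.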
5. Experiment 4
In order to explain the failure to elicit a consistency effect with the photographic stimuli, we investigated whether the nature of the stimuli or their composition could be responsible. In Experiment 4, the photographic scenes were converted to line drawings which maintained a reasonable level of detail, to enable the scenes and the targets to be readily identified. The line drawings were closely matched to the photographs but they were not identical. Again, participants were required to identify which of two line drawings of household objects had appeared in the brief scene presentation.
5.1. Results
In Experiments 1–3, participants with accuracy below 60% were replaced. However, the lack of available participants made this impossible in Experiment 4, and so accuracy was significantly lower in this experiment than in the previous ones. It should be noted that the decrease in accuracy was not solely caused by the inability to replace poorly performing participants, as only 5 participants were replaced in Experiment 3, compared to 31 participants eligible for replacement in Experiment 4.
Even when the target was presented at fixation, accuracy was only 81%, falling below 52% at 9° from fixation. This decrease in accuracy indicated that the increased visual information in the photographic images facilitated performance on this task, rather than hindered it (Figure 5b). There was no evidence of a consistency effect at any position. As in Experiment 3, we compared accuracy for a target according to whether it appeared in a consistent or inconsistent scene and found no evidence of a consistency effect (t(31) < 1, p = 0.38). Similarly, comparing performance on a consistent and an inconsistent target located in the same scene also failed to reveal any consistency effects (t(31) = 1.05, p = 0.30).
We compared performance between photographs and line drawings of photographs (Experiments 3 and 4) and found that accuracy decreased equally for consistent and inconsistent targets at all positions when line drawings were displayed, with differences up to 20% between the two conditions. Although a significant main effect of image type was found as expected (χ²(1) = 90.2, p < 0.001), confirming that accuracy was significantly poorer in Experiment 4 than Experiment 3, there was no evidence of a main effect or interaction involving consistency. These data support the conclusion that the consistency manipulation in these experiments did not produce any evidence of differential processing of consistent and inconsistent targets.
6. Discussion of Experiments 1–4
This series of experiments has enabled us to examine the effect of consistency in a psychophysical identification task. In three of the four experiments, a small advantage for consistent objects when viewed foveally was found, although this effect was absent in Experiment 3 (photographic stimuli). Experiment 3 resulted in the highest overall identification performance (over 90% accuracy with direct fixation) and it is possible that global context has less of an effect when the targets are easily identifiable. In all the experiments, accuracy declined systematically with eccentricity as expected and in no case did the results suggest that the detection superiority for consistent targets was maintained in extrafoveal vision.
However, at extrafoveal locations, the findings were less clear. The only significant difference between performance for consistent and inconsistent targets occurred when presenting upright line drawings obtained from the Leuven library. In Experiment 1, we found a significant advantage for inconsistent targets presented extrafoveally. This advantage was extinguished when the images were inverted. This manipulation would have certainly interfered with the identification of the individual target objects as well as the global scene and was hypothesised to inhibit the processing of semantic information.
Two considerations might cast doubt on this conclusion of an identification advantage for inconsistent stimuli. First, separate testing by questionnaire (Gareze, 2003) showed that identification of the targets used was not always possible even with unlimited viewing. Second, each object was not used both in a consistent and in an inconsistent background and although the object characteristics were balanced as closely as possible, it is possible that subtle visual differences occurred between the two sets. These considerations led to the design of Experiment 3, in which photographic scenes were used with each target object appearing both in a consistent and in an inconsistent background.
In Experiment 3, no differences at all were found between the identification percentages for consistent and inconsistent targets. When the same material was presented in line drawing form (Experiment 4), identification accuracy was reduced. However, this did not result in any significant effects of consistency, although a small non-significant consistency advantage occurred at the foveal position.
To determine whether physical properties of the targets may have influenced consistency effects, we analysed the data considering target object size. Approximate area of each target was calculated by noting the size of the smallest box which could contain it and, within each image set, they were classified as being small, medium or large. In Experiments 1 and 2, small objects were contained within a pixel area of 7000 square pixels or less (<7° square approx.), medium-sized targets within 17 000 square pixels (7°–16° square approx.) and large targets were greater than 17 000 square pixels (>16° square approx.). In both experiments, the same 18 targets were classified as ‘small’, 16 as ‘medium’ and 10 were ‘large’.
Targets within the photographic stimuli used in Experiment 3 were smaller overall and 24 targets were classified as ‘small’, less than 4000 square pixels (<4° square), 23 were medium sized, between 4000 and 8000 square pixels (4°–8° square), and 17 were considered ‘large’, greater than 8000 square pixels (>8° square). As converting the photographs to line drawings for Experiment 4 altered the displays slightly, the targets were categorised independently according to the same criteria, with 20 small targets, 23 medium-sized targets and 21 large targets. Each target was presented at each possible eccentricity.
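The size classification described above can be expressed as a short sketch. The thresholds are the ones given in the text; the class and function names are hypothetical, and treating each upper bound as inclusive is an assumption, since the text does not say how a target falling exactly on a boundary was handled in the photographic set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SizeCriteria:
    small_max: int   # bounding-box areas up to this value count as 'small'
    medium_max: int  # areas above small_max and up to this count as 'medium'


# Thresholds in square pixels, taken from the text.
LINE_DRAWING_CRITERIA = SizeCriteria(small_max=7000, medium_max=17000)  # Experiments 1 and 2
PHOTOGRAPH_CRITERIA = SizeCriteria(small_max=4000, medium_max=8000)     # Experiments 3 and 4


def bounding_box_area(width_px, height_px):
    """Area of the smallest upright box that contains the target."""
    return width_px * height_px


def classify_target(area_px, criteria):
    """Return 'small', 'medium' or 'large' for a bounding-box area."""
    if area_px <= criteria.small_max:  # inclusive bound is an assumption
        return "small"
    if area_px <= criteria.medium_max:
        return "medium"
    return "large"
```

For example, a target whose bounding box measures 100 by 60 pixels has an area of 6000 square pixels, which is 'small' under the line drawing criteria but 'medium' under the photograph criteria.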
In Experiment 1, large objects produced a significant consistency effect (χ²(1) = 8.85, p = 0.003), with higher accuracy for inconsistent targets than consistent targets at extrafoveal locations. Medium-sized targets only indicated a trend in this direction and small targets showed no effect of consistency, possibly because of identification difficulties at extrafoveal locations from a brief presentation. However, for all target object sizes, accuracy for consistent objects presented at fixation was slightly, but not significantly, higher than for inconsistent objects. Experiment 2 produced no evidence of a consistency effect mediated by object size, supporting the assumption that semantic information was not obtained from these inverted images.
In Experiment 3, objects of medium size produced an advantage for consistent targets at all fixation positions (χ²(1) = 7.24, p = 0.007) and conversely, large targets indicated an advantage for inconsistent targets across all fixation positions (χ²(1) = 14.6, p < 0.001). In Experiment 4, a significant advantage for consistent targets was found for medium-sized objects (χ²(1) = 4.76, p = 0.029). Accuracy fell to chance level at position 3 (9°) and no difference was found between consistent and inconsistent targets at this position or beyond. This finding suggests that the advantage for medium-sized consistent objects was mediated by their detectability, as the consistency effect was only evident when accuracy was above chance. Unlike Experiment 3, no corresponding effect was found for large objects, with no difference between consistent and inconsistent targets at any fixation position. This result suggests that the conversion from photographs to line drawings may have interfered with the identification of large targets, particularly inconsistent targets, even though accuracy remained high at all fixation positions.
These findings suggest that the exhibition of consistency effects was closely linked to the detectability of the target objects, as consistency effects were limited to conditions in which the target would be more easily identified. Significant consistency effects were not found for small target objects in any experiment, or for medium-sized targets presented beyond 6° from fixation in Experiment 4. The most reliable difference was an advantage for consistent targets over inconsistent targets presented directly at fixation, which was present in three of the four experiments but only significant in Experiment 1. This pattern was also found within the target size analysis. Where visible differences occurred at fixation, the advantage was for consistent targets, with the single exception of large targets in Experiment 3 where no similar effect was found within the whole dataset. This advantage for consistent targets at fixation suggests that a compatible scene context can facilitate accurate responses in this detection task when the target is directly foveated.
Unlike the more universal consistent object advantage at fixation, other significant consistency effects, such as the inconsistent object advantage in extrafoveal vision found in Experiment 1, were apparent only under certain conditions, suggesting that they may have been influenced by other factors unrelated to semantic consistency. The existence of both a consistent object advantage and an inconsistent object advantage for different target sizes within the same data set in Experiment 3 may shed light on why reliable consistency effects have proved difficult to elicit in previous work. Although the number of objects included in each size category may be too small to provide reliable evidence of consistency effects, these data suggest that object size may influence the exhibition of such effects and is worthy of further investigation when considering the often conflicting data in this field.
7. Experiment 5
The results of Experiment 1 suggested that semantically inconsistent objects might be identified more readily than consistent objects from a brief scene presentation, although this result was not replicated in Experiment 3. Experiment 5 investigates whether the effects of semantic inconsistency appear in free viewing. As discussed in the Introduction, following the pioneering work of Loftus and Mackworth (1978), a number of studies have analysed eye scan records during naturalistic scene viewing tasks to investigate whether semantic information can be extracted extrafoveally. Many studies have failed to replicate the original finding but more recent careful work (e.g. Hollingworth & Henderson, 2000) has reopened the question. Hence, we carried out a study in which participants were shown extended (7 s) presentations of the simple line drawing scenes and the grey-scale photographs (used in Experiments 1 and 3 respectively) and their eye movements were recorded for the duration. These data were investigated for evidence that the semantic inconsistency between inconsistent targets and their scene background could be detected prior to direct fixation, compared to scenes containing only consistent objects.
If the inconsistent targets in the Leuven image set were more salient, visually or semantically, than the consistent targets and this difference could be detected extrafoveally, then we would expect the eye movement data to reflect this in measures of saccade behaviour prior to target fixation. The effect of the (more naturalistic) consistency manipulation in photographs on eye movement behaviour was also investigated in this way.
7.1. Method
Twenty-four participants were recruited from Durham University and all had normal, uncorrected vision. A subset of the experimental images used in Experiments 1 and 3 was displayed to each participant in separate blocks, which were counterbalanced across participants. All images were used but the number of times a participant viewed a given scene background was controlled as in the previous experiments and each background was displayed once only. The scenes were presented centrally and subtended approximately 16° × 12° at a viewing distance of 85 cm. Participants were instructed to view the scenes naturally and were told that no memory test would follow.
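The relationship between visual angle, viewing distance and physical extent on the screen follows from simple trigonometry and can be sketched as below. The function names are hypothetical, and the physical display size is derived here for illustration only; it is not stated in the chapter.

```python
import math


def size_from_angle(angle_deg, distance_cm):
    """Physical extent (cm) that subtends angle_deg at distance_cm,
    for a stimulus centred on the line of sight."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def angle_from_size(size_cm, distance_cm):
    """Visual angle (degrees) subtended by size_cm at distance_cm."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


# A scene of about 16 x 12 degrees viewed from 85 cm.
width_cm = size_from_angle(16.0, 85.0)
height_cm = size_from_angle(12.0, 85.0)
```

Under these assumptions a 16° × 12° scene at 85 cm corresponds to an image of roughly 24 cm by 18 cm.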
Their eye movements were recorded for the 7 s display duration using a Fourward Technologies Dual Purkinje Generation 5.5 eye tracker. The resolution of the eye tracker was 10 min of arc and eye position was sampled every millisecond (1000 Hz). The movements of the right eye were monitored but viewing was binocular. Head movements were restrained with a chin rest and two forehead rests. The accuracy of the record was checked every four trials and recalibration occurred when necessary. The eye movement data were analysed offline by a semi-automated procedure. A computer algorithm detected the saccades using a velocity criterion and each record was inspected individually.
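A velocity criterion of the kind mentioned above can be sketched as follows. This is a simplified illustration and not the algorithm used in the study: the chapter does not report the threshold, so the 30 deg/s value, the 10 ms minimum duration and the function name are all assumptions, and a real record would normally be smoothed before differentiation.

```python
import math


def detect_saccades(x_deg, y_deg, sample_ms=1.0,
                    velocity_threshold=30.0, min_duration_ms=10.0):
    """Return (first_sample, last_sample) index pairs for each run of samples
    whose point-to-point speed exceeds velocity_threshold (deg/s).

    The threshold and minimum duration are illustrative assumptions.
    """
    if len(x_deg) != len(y_deg):
        raise ValueError("x and y traces must have the same length")
    dt_s = sample_ms / 1000.0
    # speeds[i] is the speed between sample i and sample i + 1.
    speeds = [
        math.hypot(x_deg[i + 1] - x_deg[i], y_deg[i + 1] - y_deg[i]) / dt_s
        for i in range(len(x_deg) - 1)
    ]
    saccades = []
    start = None
    for i, speed in enumerate(speeds):
        if speed > velocity_threshold:
            if start is None:
                start = i
        elif start is not None:
            if (i - start) * sample_ms >= min_duration_ms:
                saccades.append((start, i))
            start = None
    # Close a movement that is still in progress when the record ends.
    if start is not None and (len(speeds) - start) * sample_ms >= min_duration_ms:
        saccades.append((start, len(speeds)))
    return saccades
```

Each detected interval would then be checked by eye, in line with the individual inspection of records described in the text.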
7.2. Results
Table 1
Experiment 5 – Free scene viewing of simple line drawings and grey-scale photographs. Summary of results for consistent and inconsistent line drawings and photographs. Measures show the mean value across the 24 subjects.

| Measure | Line drawings, consistent | Line drawings, inconsistent | Photographs, consistent | Photographs, inconsistent |
|---|---|---|---|---|
| Probability of target fixation (%) | 91.7 | 93.2 | 89.6 | 85.4 |
| Number of saccades prior to fixation | 4.5 | 5.2 | 5.2 | 5.2 |
| Arrival time (ms) | 1309 | 1613 | 1856 | 1803 |
| Saccade amplitude to the target (°) | 3.7 | 3.8 | 3.3 | 3.7 |
| First fixation duration (ms) | 383 | 550 | 380 | 433 |
| First pass fixation duration (ms) | 573 | 718 | 431 | 549 |
| Total fixation duration (ms) | 1020 | 1244 | 775 | 1010 |

p < .05
p < .01

Table 1 summarises the effects of consistency on eye movement behaviour in simple line drawings and photographs. For both image types, a significant effect of consistency was found on measures following direct fixation of the object, such as the first fixation duration (the duration of the first fixation on the target), the first pass fixation duration (the sum of the first and any consecutive fixations on the target before moving the eyes away) and the total fixation duration (the sum of all fixations on the target). As hypothesised, in line drawings, inconsistent targets were fixated for significantly longer than consistent targets (t(34.1) = −2.76, p = 0.008), although whether this was due to difficulty in reconciling the semantic inconsistency between the object and the scene or difficulty in identifying the line drawing object is still undetermined. In photographs, a similar effect was found. Although the difference in first fixation time did not reach statistical significance (t(46) = −1.56, p = 0.13), there was a significant effect of consistency on the first pass (t(46) = −2.58, p = 0.013) and on the total fixation durations (t(46) = −3.40, p = 0.001), with longer fixations on inconsistent targets. Upon first fixation, the photographs of inconsistent household objects did not elicit significantly longer fixations than those of consistent household objects, perhaps because they were not sufficiently inconsistent with the scene context or unrecognisable within the background.
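The three fixation measures defined above can be computed from an ordered fixation record as in the following sketch. The input format, a list of (on_target, duration_ms) pairs for one trial, and the function name are assumptions made for illustration.

```python
def target_fixation_measures(fixations):
    """Compute first fixation, first pass and total fixation duration (ms).

    fixations: (on_target, duration_ms) pairs in temporal order for one trial
    (hypothetical format). Returns None if the target was never fixated.
    """
    first_fixation = None
    first_pass = 0
    total = 0
    in_first_pass = False
    for on_target, duration in fixations:
        if on_target:
            total += duration
            if first_fixation is None:
                # The first fixation on the target opens the first pass.
                first_fixation = duration
                in_first_pass = True
            if in_first_pass:
                first_pass += duration
        elif in_first_pass:
            # The eyes moved away, so the first pass is over.
            in_first_pass = False
    if first_fixation is None:
        return None
    return {
        "first_fixation_ms": first_fixation,
        "first_pass_ms": first_pass,
        "total_ms": total,
    }
```

For a trial in which the target receives two consecutive fixations of 300 ms and 150 ms and a later refixation of 400 ms, the measures are 300 ms, 450 ms and 850 ms respectively.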
To investigate the possibility of a consistency effect prior to target fixation, we considered the evidence that inconsistent targets were fixated earlier than consistent targets. However, there was no evidence that the eyes were directed towards inconsistent targets any sooner than consistent targets in either image type. Although there was a slight difference in the time taken to fixate the target and the number of saccades executed prior to target fixation in line drawings, this difference was in the opposite direction to the hypothesis that inconsistent targets would be fixated sooner.
Saccade amplitudes towards the targets were comparable between consistent and inconsistent targets, in both line drawings and photos, indicating that the objects were selected as saccade targets from approximately the same level of extrafoveal processing. The mean saccade size was approximately 3.75° for saccades to both consistent and inconsistent line drawing targets. This value is considerably less than the eccentricity at which many of the targets were presented in Experiment 1 and which produced evidence of facilitated performance for inconsistent targets compared to consistent targets. As the number of fixations executed prior to target fixation was also comparable across consistency conditions and across image types, we can conclude that targets were selected for fixation from equivalent levels of extrafoveal processing regardless of whether the scenes were simple line drawings or complex grey-scale photographs.
For the photographs, we also compared the behaviour for each target in two different scene contexts and two matched targets in the same scene background. This counterbalancing of targets allowed us to investigate the possibility of consistency effects more closely. However, again, there was no evidence of a consistency effect prior to target fixation, with the only significant differences being in fixation measures. This lack of evidence towards increased visual salience of inconsistent targets indicates that the inconsistent advantage in brief presentations for line drawings (Experiment 1) was not manifest in Experiment 5. Whatever process facilitated object detection in the scene regions containing inconsistent targets in brief presentations failed to influence the eye movement pattern during natural scene viewing.
Targets in the photographs were fixated later (by an average of 369 ms) than those in line drawings, although the number of saccades executed prior to target fixation was approximately the same. This discrepancy suggests that fixations on distractors (which would all be consistent) in photographs were longer than in line drawings (although this is not true of fixations on targets). Total fixation durations on targets in line drawings were considerably longer than in photos (over 200 ms) but this finding can be explained by the relative differences in scene composition between line drawings and photographs. Fewer items to explore in line drawings would result in more refixations on targets than in photographic images, producing longer total fixation times for targets in line drawing images than for those in complex photos containing many distractors to explore.
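The two differences quoted above can be checked against the condition means in Table 1. The sketch below takes a simple unweighted average of the consistent and inconsistent means for each image type, which is an assumption: the authors may have computed the differences from trial-level data. The dictionary and function names are hypothetical.

```python
from statistics import mean

# Condition means from Table 1 as (consistent, inconsistent), in ms.
TABLE_1 = {
    "arrival_time_ms": {"line": (1309, 1613), "photo": (1856, 1803)},
    "total_fixation_ms": {"line": (1020, 1244), "photo": (775, 1010)},
}


def photo_minus_line(measure):
    """Difference between image types, averaging over consistency."""
    row = TABLE_1[measure]
    return mean(row["photo"]) - mean(row["line"])


arrival_difference = photo_minus_line("arrival_time_ms")      # 368.5
total_difference = photo_minus_line("total_fixation_ms")      # -239.5
```

The arrival time difference of 368.5 ms agrees with the reported average of 369 ms, and the total fixation duration is about 240 ms longer for line drawings, consistent with the "over 200 ms" stated in the text.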
To investigate whether fixation durations on distractors were longer in photographs than in line drawings, we divided the 7000 ms presentation time into 1000 ms time bins and allocated individual fixation durations to the time bin in which the fixation started. We excluded fixations directed at targets (as consistency effects were seen) and also the final fixation in each trial, as this fixation was terminated artificially by the disappearance of the image. Therefore, considerably fewer fixations were allocated to the final bin than the preceding ones.
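The binning procedure described above can be sketched as follows. The record format, one list per trial of (onset_ms, duration_ms, on_target) tuples in temporal order, and the function name are assumptions for illustration; the exclusions follow the text (fixations on targets and the final fixation of each trial are dropped).

```python
from collections import defaultdict
from statistics import mean


def mean_duration_by_bin(trials, bin_ms=1000, trial_ms=7000):
    """Mean fixation duration per time bin, keyed by bin index.

    trials: list of trials, each a list of (onset_ms, duration_ms, on_target)
    tuples in temporal order (hypothetical format).
    """
    bins = defaultdict(list)
    for fixations in trials:
        # The final fixation is cut short by image offset, so it is excluded.
        for onset, duration, on_target in fixations[:-1]:
            if on_target:
                continue  # target fixations show consistency effects
            if not 0 <= onset < trial_ms:
                continue
            # A fixation belongs to the bin in which it started.
            bins[int(onset // bin_ms)].append(duration)
    return {index: mean(durations) for index, durations in sorted(bins.items())}
```

Bin 0 then holds fixations that began in the first 1000 ms, bin 1 those that began between 1000 and 2000 ms, and so on up to bin 6.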
Figure 6 displays the mean fixation duration (including 95% confidence limits) across the time course of the trial, for line drawings and photographs. In both cases, the mean fixation duration increases from the shortest fixations, commencing within the first 1000 ms, to a maximum value for fixations beginning 3000–5000 ms into the trial. As the trial elapses beyond this point, fixation durations appear to shorten. This trend for shorter fixations at the start of the trial has been noted previously (as reviewed in Findlay and Gilchrist, 2003). Using computer-rendered images of room interiors, Unema, Pannasch, Joos, & Velichkovsky (2005) found that an asymptotic fixation duration was reached after approximately 3.4 s during a 20 s trial, after which fixation duration levelled off, while our data indicate fixation duration decreasing towards the end of the trial, perhaps motivated by the imminent disappearance of the display.
Within each time bin, the mean fixation duration was longer for photographs than for line drawings, by an average difference of 33 ms. This stable effect accounts in part
