Visual Search Latency Functions
Visual search tasks are frequently used to study visual cognition. The primary data obtained in such tasks are search latency functions, which represent the time required to detect the presence or absence of a target as a function of the total number of display elements. Pioneering work on visual search discovered the so-called “pop out” effect: the time to find targets characterized by a unique feature is typically independent of the number of distractor elements in a display, producing a flat search latency function. In contrast, the time to detect a target defined by a unique combination of features generally increases with the number of distractor items, producing search latency functions with positive slopes. These results formed the basis for Treisman’s feature integration theory of attention, which proposed that the visual features that produced pop out were detected and represented preattentively, and comprised a primitive vocabulary for visual perception.
More recent results have shown that, in some situations, targets defined by feature combinations can pop out. As this is inconsistent with feature integration theory, new theories of visual search have emerged in which the primary predictors of search latency functions are the similarities between target and nontarget elements. On the basis of these new theories, visual cognition researchers are now interested in altering search latency functions by manipulating display element similarity. This requires experimenters to calibrate stimulus similarity. Unfortunately, the techniques used to perform this calibration are themselves controversial.
In contrast to traditional approaches, the experiments reported below were not motivated by existing theories of visual search. Instead, these studies were designed to test a prediction based upon several different computational models of how visual attention can be shifted. As a result, we succeeded in manipulating search latency in experiments which held stimulus properties constant. The results from this novel paradigm support a major assumption common to these computational models of attention. Furthermore, they provide additional insights into the mechanisms that underlie visual search and visual cognition.
Several researchers have proposed computational models of the attentional shifts that are required to detect targets in a visual search task. While the specific details of these models differ, their general structure is quite similar.
First, these models represent the display being searched as an array of processing units that can adopt different levels of activity. As the search task begins, the activity of each processing unit is a function of the visual distinctiveness of the location that the processor represents (i.e., how different it is in appearance relative to its neighbors). Typically, this measure of distinctiveness is global, in that it is computed across all of the features that characterize the display elements (see, for example, the saliency map described by Koch and Ullman, 1985). Distinctiveness is not defined with respect to individual features (e.g., distinctiveness is not computed for an element’s color and orientation separately).
Second, these models describe the attentional “spotlight” used to search for targets as a network that can “examine” the activity of these processors. The network identifies the most active processor (representing the most distinctive location) within the examined region by implementing a winner-take-all (WTA) competition. Such a competition is defined by lateral inhibition: each processing unit has an excitatory recurrent connection to itself and inhibitory connections to neighboring processing units. The experiments described below used adapting luminance to manipulate these inhibitory mechanisms.
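A WTA competition of this kind can be sketched in a few lines. The sketch below is only an illustration, not the implementation used by any of the models discussed here. It assumes a MAXNET-style update in which each unit keeps its own activity (a self-connection of weight 1) and is inhibited by every other unit, rather than only by its spatial neighbors. The function name `winner_take_all`, the `inhibition` parameter, and the example activities are all hypothetical.

```python
def winner_take_all(activity, inhibition=None, max_steps=10_000):
    """Iterate lateral inhibition until at most one unit remains active.

    Assumption: every unit inhibits every other unit (MAXNET-style),
    which simplifies the neighbor-only inhibition described in the text.
    Returns (index of the winning unit, final activity levels).
    """
    a = [max(0.0, float(x)) for x in activity]
    n = len(a)
    if n == 0:
        raise ValueError("need at least one processing unit")
    if inhibition is None:
        # Inhibitory weight below 1 / (n - 1), so that a unique most
        # active unit is never driven to zero by its competitors.
        inhibition = 1.0 / n
    for _ in range(max_steps):
        if sum(1 for x in a if x > 0.0) <= 1:
            break  # competition resolved
        total = sum(a)
        # Self-excitation keeps x; all other units subtract from it.
        a = [max(0.0, x - inhibition * (total - x)) for x in a]
    # Exact ties are not resolved by the dynamics; the lowest index wins.
    winner = max(range(n), key=a.__getitem__)
    return winner, a


if __name__ == "__main__":
    # Hypothetical distinctiveness values for four display locations.
    winner, final = winner_take_all([0.2, 0.9, 0.5, 0.4])
    print(winner, final)
```

Under these assumptions, the weaker units are driven to zero over successive iterations, and the unit that started with the highest activity is the only one left active.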
Third, in the context of visual search, the models propose that once the WTA competition has been completed, and the most distinctive location has been identified, the display element at this location is examined to see whether or not it is the target. If it is, search stops. If it is not, activity at this location either decays or is inhibited, and the search for the next most distinctive location in the display occurs via another WTA competition.
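The serial loop described above can be sketched as follows. This is a minimal illustration under stated assumptions: a plain maximum stands in for the WTA competition, and inhibiting a rejected location is modeled by removing it from further competition. The name `serial_search` and the example activities are hypothetical.

```python
def serial_search(activity, target_index=None):
    """Examine locations in decreasing order of activity.

    activity: distinctiveness-driven activity of each display location.
    target_index: location of the target, or None on target-absent trials.
    Returns (target_found, number_of_competitions_run).
    """
    remaining = dict(enumerate(activity))
    competitions = 0
    while remaining:
        # Stand-in for a WTA competition: pick the most active location.
        winner = max(remaining, key=remaining.get)
        competitions += 1
        if winner == target_index:
            return True, competitions  # target found, search stops
        # Not the target: inhibit this location so it cannot win again.
        del remaining[winner]
    return False, competitions  # every location examined, no target


if __name__ == "__main__":
    display = [0.1, 0.9, 0.3]  # hypothetical activity levels
    print(serial_search(display, target_index=1))
    print(serial_search(display, target_index=None))
```

In this sketch, a target at the most active location is found after a single competition, whereas a target-absent display requires one competition per display element.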
This type of model provides a straightforward account of pop out. Targets defined by a unique feature will be identified immediately, independently of the total number of display elements. Such targets will produce high global measures of distinctiveness, because the global measure is sensitive to unique features. As a result, a pop out target will win the first WTA competition, which processes all display elements in parallel.
This type of model also provides a straightforward account of search latency functions obtained for targets defined by unique conjunctions of features. When the global measure of distinctiveness is computed, these targets will not be coded as being more distinctive than their neighbors, because the global measure is insensitive to the unique feature combinations. If one assumes slight random variations in the connections that define the WTA competition, then the winner of the first WTA competition in these displays will essentially be selected at random. Repeated WTA competitions will be required until the target is actually detected. Thus, for conjunction targets, the WTA models essentially describe search as random selection without replacement. As a result, search latencies will increase with the number of display elements. In addition, the search process should be self-terminating: when the target is present, on average only about half of the display elements will be examined before it is found, whereas when the target is absent every element must be examined. Accordingly, the slope of the search latency function when the target is present should be half the slope of the search latency function when the target is absent.
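The 2:1 slope prediction can be checked with a short calculation. The sketch below assumes that each WTA competition takes a constant amount of time, so that search latency is proportional to the number of competitions, and that the target is equally likely to be selected on any of the n competitions. The function names are hypothetical.

```python
from fractions import Fraction


def expected_competitions(n_items, target_present):
    """Expected number of WTA competitions for a display of n_items.

    Assumes random selection without replacement and self-terminating
    search, as described in the text.
    """
    if n_items < 1:
        raise ValueError("display must contain at least one element")
    if not target_present:
        # Every element must be examined and rejected.
        return Fraction(n_items)
    # The target is found on competition k with probability 1 / n.
    return sum(Fraction(k, n_items) for k in range(1, n_items + 1))


def slope(n_small, n_large, target_present):
    """Change in expected competitions per added display element."""
    rise = (expected_competitions(n_large, target_present)
            - expected_competitions(n_small, target_present))
    return rise / (n_large - n_small)


if __name__ == "__main__":
    print(slope(4, 16, target_present=True))   # target-present slope
    print(slope(4, 16, target_present=False))  # target-absent slope
```

Under these assumptions, the expected number of competitions is (n + 1)/2 on target-present trials and n on target-absent trials, so the target-present slope is exactly half the target-absent slope.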