Frontal Lobe and Process Dissociation

This is an image taken from a typical PET acquisition. It is a tomographic view of a brain examination in transaxial view. Red areas show more accumulated radioactivity, and blue areas are portions where little or no activity accumulated. It illustrates what a typical PET image looks like. It was taken with an ECAT Exact HR+ PET scanner.

Animal evidence on the organization of frontal lobe and limbic structures and pathways, together with evidence from frontal lobe patients, has led to cognitive models that attempt to explain motivated abstract reasoning abilities in humans. These models relate motivated information processing to the activation of specific frontal brain areas and circuits that are further related to ongoing autonomic changes.

The Wisconsin Card Sorting Test (WCST) has been generally considered a prototype examination of abstract reasoning and frontal lobe function. In the WCST, the subject has to match geometrical shapes on a target card to four standard cards with different geometrical shapes. The subject has to find a rule on the basis of positive or negative feedback upon the response. When the rule has been found, application of this rule provides positive feedback until (after 10 trials) the rule is suddenly changed. From this point on, the sequence of rule search followed by rule application is repeated. Perseverative responses occur when a subject is unable to switch to a new rule, but continues to apply the old rule.
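The trial structure described above can be sketched in a few lines of Python. This is a minimal illustration and not the clinical scoring procedure. It assumes the three sorting dimensions commonly used in the WCST (color, shape, number), assumes the rule the subject applies on each trial can be observed directly, and counts an error as perseverative only when it repeats the immediately preceding rule. The names `score_wcst` and `WcstResult` are hypothetical.

```python
from dataclasses import dataclass

# Assumed sorting dimensions; the text above only speaks of "geometrical shapes".
DIMENSIONS = ("color", "shape", "number")


@dataclass
class WcstResult:
    categories_completed: int
    errors: int
    perseverative_errors: int


def score_wcst(applied_rules, run_length=10):
    """Score a sequence of rules applied by a (hypothetical) subject.

    The active rule changes after `run_length` consecutive correct trials.
    An error is counted as perseverative when the subject keeps applying
    the rule that was active before the most recent change.
    """
    current = 0        # index of the active rule in DIMENSIONS
    previous = None    # rule that was active before the last change
    streak = 0         # consecutive correct trials under the active rule
    categories = errors = perseverative = 0

    for applied in applied_rules:
        active = DIMENSIONS[current]
        if applied == active:
            streak += 1
            if streak == run_length:
                # Rule found and applied long enough: switch without warning.
                categories += 1
                previous = active
                current = (current + 1) % len(DIMENSIONS)
                streak = 0
        else:
            errors += 1
            streak = 0
            if applied == previous:
                perseverative += 1

    return WcstResult(categories, errors, perseverative)


# A flexible subject makes one error after the switch, then finds the new rule.
flexible = score_wcst(["color"] * 11 + ["shape"] * 10)

# A perseverating subject never abandons the first rule.
stuck = score_wcst(["color"] * 30)
```

In this sketch the flexible subject completes two categories with a single perseverative error, whereas the perseverating subject completes one category and then errs on every remaining trial.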

Evidence has established that subjects with frontal lobe damage, elderly adults, and children with attention deficit hyperactivity disorder perform relatively poorly on the WCST. These subject groups detect fewer rules and exhibit more perseveration than control subjects. From experimental studies in patients with prefrontal lesions, Dubois et al. (1995) concluded that two dysfunctions emerge in these patients: (1) a disruption of rule finding, because of the emergence of non-cognitive, rigid patterns of behavior, and (2) inadequate feedback integration in the selection process of the appropriate response.

Support, in particular, for the first dysfunction has come from principal component analysis (PCA) studies of WCST scores in frontal patients, schizophrenic patients and control subjects. These studies have consistently extracted a main factor that explained most of the variance. Scores such as conceptual level response and number of correct categories had negative loadings, while perseverative error scores had positive loadings on this factor. Moreover, this factor clearly discriminated control subjects from frontal lobe patients and from schizophrenic patients. Interpretations such as ‘perseveration’, ‘abstract thinking ability’, ‘concept formation’, ‘undifferentiated executive function’ and ‘flexibility’ have been proposed for this PCA component.

Other studies have also indicated that a deficit in frontal executive functions may impair WCST performance. Neurophysiological studies using PET scan techniques indicated that WCST problem solving engages, in particular, the dorsolateral prefrontal cortex. In addition, Ragland et al. (1997) reported more selective dorsolateral prefrontal and inferior frontal regional cerebral blood flow activation in good WCST performers than in middle and bottom WCST performers. This evidence suggests that WCST problem solving is related to activation in left frontal, left prefrontal, and left temporal cortical areas.

In the literature, the WCST has often been conceptualized as a working memory task that invokes specific intermediary operations, including switching cognitive sets according to changing contingencies, rule learning, formation of conceptual sets, application of detected concepts, maintaining sets, and use of error information. Close inspection of the test trials reveals that the WCST requires two distinct types of problem-solving strategies: rule search and (after the rule has been found) rule application for 10 consecutive trials. In rule search, the subject has to deduce the relationship between the standard and target cards. This demands a complex, ‘open ended’ set of processes, such as the inhibition of an earlier rule, testing of alternative rules, and updating after feedback. In contrast, after the rule has been found, rule application requires only a sequence of relatively simple, pre-planned operations: the subject matches one dimension of the geometrical shapes of the target and standard cards and ignores the other dimensions.

Theoretical models assume that in novel or complex tasks, such as the WCST, a supervisory controller co-ordinates the selection of the intermediary operations. According to Norman and Shallice (1986), a supervisory attentional system allows for conscious attentional control to modulate performance. It operates by the allocation of attentional resources, which add to the activation values of selected operations and decrease the activation values of earlier operations. Norman and Shallice suggested that WCST perseverative responses in frontal lobe patients result from a deficit in the allocation and supervision of attention. This model is consistent with results from frontal patients indicating that their attention to, and integration of, feedback information is inadequate. Dunbar and Sussman (1995) likewise concluded from experiments with normal subjects that WCST perseveration may result from an inability to exert attentional control, resulting in a failure to update feedback information.

The theory of Norman and Shallice attributes a main regulatory frontal lobe function to attentional processes during WCST rule search. Moreover, it suggests a complementary process that is sufficient for relatively simple or well-learned acts: automatic contention scheduling. This mechanism involves the automatic execution of an action sequence (schema) that does not require attention. As argued above, WCST rule application demands a similar process. It requires the pre-planned execution of a series of processing steps which together result in the application of a rule. Following Norman and Shallice’s theory, this rule application process may be automatically executed and may not require attentional resources.
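The interplay between automatic schema selection and attentional control can be illustrated with a toy sketch. It is not Norman and Shallice's model. It assumes that each candidate operation carries a single activation value, that the most active operation is executed, and that attentional control adds a fixed amount to one operation while subtracting it from another. The function name, schema names and activation values are all hypothetical.

```python
def select_schema(activations, attend_to=None, suppress=None, boost=0.0):
    """Return the schema with the highest activation.

    With no attentional input, this is a stand-in for automatic selection.
    `attend_to` and `suppress` mimic supervisory control: the attended
    schema gains `boost`, the suppressed one loses it.
    """
    biased = dict(activations)
    if attend_to is not None:
        biased[attend_to] += boost
    if suppress is not None:
        biased[suppress] -= boost
    return max(biased, key=biased.get)


# Hypothetical activation values just after the sorting rule has changed:
# the old rule is still the most active because it was applied ten times.
activations = {"sort_by_color": 0.9, "sort_by_shape": 0.4, "sort_by_number": 0.3}

# Without attentional control the old rule wins (a perseverative response).
without_control = select_schema(activations)

# With attentional control the earlier rule is suppressed and a new one selected.
with_control = select_schema(
    activations, attend_to="sort_by_shape", suppress="sort_by_color", boost=0.5
)
```

Under these assumed values, removing the attentional bias is enough to reproduce a perseverative choice, which is the pattern the theory attributes to frontal lobe patients.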

Based on more recent animal brain research, Gray (1995a, b) has proposed a system of frontal/septo-hippocampal pathways that controls the automatic, pre-planned execution of step-by-step motor programs: the behavioral approach system. When a comparator process triggers a mismatch, this ongoing behavior is inhibited by the activation of a behavioral inhibition system which also increments arousal and attention. From a related animal research perspective, Pribram (1991) has argued that attentional processes in the frontal lobes and in the limbic system are closely connected to habituation, familiarization and orientation. A novel stimulus or a reinforcing stimulus such as positive feedback may disrupt habituation and elicit an orienting response. Moreover, it is well known that the frontal lobes have output pathways that project to the brain-stem, to control autonomic physiological support for anticipated behaviors and orienting responses. Hence, Pribram (1991) argued that an important part of the cortically and sub-cortically elicited orienting response is a set of viscero-autonomic responses, such as brief heart rate change, skin conductance change and change in respiratory rate. His research suggested to us that WCST rule application and rule search might also elicit viscero-autonomic responses. Below we will argue on the basis of the literature how such viscero-autonomic responses, in particular cardiac responses, may be associated with WCST performance.

The Visual Perception of Human

Motion involves change in position, such as in this perspective of rapidly leaving Yongsan Station

Human observers are particularly sensitive to human movement. For example, adults can rapidly perceive a human form in a display of discrete elements (commonly referred to as “point-lights” and illustrated in Fig. 1a) moving as if attached to the major joints of an otherwise invisible person. Though no explicit contours, textures, or colors indicate the presence of a human form, the visual system is able to extract the structure of a human body from the motion of these elements within a fraction of a second.

Fig. 1. Static illustrations of point-light displays of a walking human figure. (A) An upright figure in which all of the major joints, or points of articulation, of the human form are demarcated. In the actual displays, the gray outline of a person does not appear. It is included here to illustrate the structure of the figure that quickly becomes apparent when the elements move. (B) The same form rotated to an inverted orientation. Though structurally identical to the upright figure, inverted displays are rarely identified as depicting human form.

Since Johansson’s introduction of point-light displays into contemporary perceptual psychology over two decades ago, numerous researchers have attempted to extend general models of the visual perception of structure from motion to account for the perception of human movement from motion-carried information. Under such general accounts, the same visual processes are used to extract the structure of any object. All objects, and all object parts, are thus perceptually equivalent. Indeed, models based on hierarchical vector analysis or assumptions of pairwise rigidity among elements have had some success in accounting for the perception of the complex, jointed structure of a human body.

Nonetheless, these general structural accounts have not been able to mimic either the robustness or the limitations of human performance in the perception of point-light walkers. Adult observers fail to accurately detect, organize, or identify these displays when the figure is presented upside-down. Both upright and inverted displays of human locomotion possess the same hierarchical structure, the same rigid relationships, and the same oscillatory motion trajectories. Under accounts in which perception relies entirely on such properties, the visual analysis of upright and inverted displays should be identical. Human performance, however, is not. Models that posit only general analyses and organizing heuristics are insufficient to account for the orientation-specificity of the perception.
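The claim that upright and inverted displays share the same rigid relationships can be checked with a short sketch. It assumes a point-light frame is a list of 2D joint coordinates and that inversion is a reflection of the vertical axis. The coordinates are hypothetical and are not taken from any actual stimulus.

```python
import math
from itertools import combinations


def invert(frame):
    """Turn a point-light frame upside-down by reflecting the vertical axis."""
    return [(x, -y) for x, y in frame]


def pairwise_distances(frame):
    """Distances between every pair of points, in a fixed order."""
    return [math.dist(a, b) for a, b in combinations(frame, 2)]


# Hypothetical joint positions (head, shoulder, two hips, two ankles).
upright = [(0.0, 1.7), (0.0, 1.4), (-0.2, 1.0), (0.2, 1.0), (-0.1, 0.0), (0.1, 0.0)]
inverted = invert(upright)

same_structure = all(
    math.isclose(d_up, d_inv)
    for d_up, d_inv in zip(pairwise_distances(upright), pairwise_distances(inverted))
)
```

Every inter-joint distance is unchanged by inversion, so a model that relies only on such distances has no basis for treating the two displays differently.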

Increasing evidence suggests that the perceptual analyses underlying the perception of human movement may differ from other motion analyses. The visual perception of human movement may benefit from a convergence of motion and form processes that does not occur during other perceptual analyses. Other studies suggest that the visual analysis of human movement differs from other analyses because it depends upon activity of the motor system. To the extent that either or both of these approaches are correct, one can conclude that the recognition of a human actor executing some movement may very well involve a special process that taps domain-specific information about human motor activity.

If such a mechanism exists, can it be best described as a global or local process? Local processes are thought to occur in lower levels of the visual system and to be restricted to brief temporal intervals and small spatial neighborhoods. The results of these “local” analyses are then passed on to and processed by higher level or more “global” mechanisms that process information across larger spatio-temporal extents. While local and global are difficult to define as absolute terms, most studies of the visual perception of human movement have defined local analyses as the computations conducted on individual points (joints) or point pairs (limbs). Global analyses are conducted over larger areas and generally involve an entire point-light walker. For a discussion of local and global factors in visual completion, see Tse (1999) and Van Lier (1999).

Evidence supporting the global analysis of human movement comes from studies demonstrating that observers are able to extract human structure from displays in which visual noise renders local motion information organizationally ambiguous. This suggests that the visual system can exploit configural or global information in the absence of unambiguous local motion cues. Evidence supporting the hypothesis that the visual analysis of human movement relies on local processes comes from a series of apparent motion experiments in which subjects viewed an animated walker within a mask. Since subjects could only discriminate leftward- from rightward-facing walkers under short range apparent motion conditions, their perception of the walkers’ movements appears to have depended upon local motion analyses operating within small temporal windows. More recent research has suggested that neither local nor global processes alone can account for the visual perception of human locomotion. Instead, visual processes at both high and low levels of the visual system appear to make important contributions to the visual perception of human movement. This conclusion suggests that a new approach to the study of the visual perception of human movement may be warranted.

Instead of focusing on the local/global debate, the goal of the current series of experiments was to determine the minimal stimulus information necessary for the visual perception of human movement. In this way we hoped to identify the information that normally triggers processing by the mechanisms thought to underlie the visual perception of human locomotion.

Our approach is based on the assumption that the perception of human movement involves the integration of form and motion information. Neither form nor motion information alone defines the category of human movement. As Johansson’s earliest demonstrations show, observers do not perceive human form in any single static frame of the animated figure even though the configuration of the elements is consistent with a human figure. Conversely, motion alone does not convey human form and movement. When the elements comprising a point-light display are spatially scrambled so that the configuration of the elements is no longer consistent with a human figure, the impression of human or animal form is greatly diminished. Thus, the visual system’s capacity to extract the human form from motion relies on the integration of both form and motion cues.
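Spatial scrambling of a point-light display can be sketched as follows. The sketch assumes that scrambling means giving each element a fixed random position offset for the whole animation, so that each element keeps its own motion while the overall configuration is destroyed. Actual experiments may scramble displays differently, and the frames below are hypothetical.

```python
import random


def scramble(frames, spread=1.0, seed=0):
    """Shift each point by its own constant random offset across all frames."""
    rng = random.Random(seed)
    n_points = len(frames[0])
    offsets = [
        (rng.uniform(-spread, spread), rng.uniform(-spread, spread))
        for _ in range(n_points)
    ]
    return [
        [(x + dx, y + dy) for (x, y), (dx, dy) in zip(frame, offsets)]
        for frame in frames
    ]


def displacements(frames):
    """Frame-to-frame motion vector of every point."""
    return [
        [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(f0, f1)]
        for f0, f1 in zip(frames, frames[1:])
    ]


# Three hypothetical frames of two points moving towards each other.
frames = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(0.1, 0.0), (0.9, 0.0)],
    [(0.2, 0.0), (0.8, 0.0)],
]
scrambled = scramble(frames)
```

The scrambled display contains the same motion vectors as the original, which is why the loss of perceived human form under scrambling points to a contribution of configuration over and above motion.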

In order to create varying exemplars of human locomotion, we manipulated two perceptual components that play a fundamental role in the production of human walking: dynamic symmetry among the limbs and the principal axis of organization. Dynamic symmetry refers to the equal and opposite motions of adjacent limbs (either contralaterally or ipsilaterally). During human locomotion, when one limb moves forward, its neighboring limbs move backward, anti-phase to the first limb. This anti-phase patterning of limb movements is an invariant of human gait. In the human body, the principal axis of organization refers to the primary structure about which the limbs are organized, namely the torso. Principal axes play an important role in the recognition of objects generally. The structure of an object’s principal axis also appears to distinguish between classes of animals. Thus, the principal axis of organization and the dynamic symmetry of the limb movements are features likely a priori to contribute to the perception of human form and movement.
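Dynamic symmetry can be made concrete with a small sketch. It assumes, purely for illustration, that a limb's swing angle follows a sinusoid and that a neighboring limb follows the same sinusoid shifted by half a cycle. The amplitude and period are hypothetical and do not describe the stimuli used in the experiments.

```python
import math


def limb_angle(t, amplitude=0.5, period=1.0, phase=0.0):
    """Swing angle (radians) of a limb at time t under an assumed sinusoidal gait."""
    return amplitude * math.sin(2 * math.pi * t / period + phase)


def antiphase_pair(t):
    """Angles of a limb and of a neighboring limb half a cycle out of phase."""
    lead = limb_angle(t)
    neighbor = limb_angle(t, phase=math.pi)
    return lead, neighbor


# Sample one gait cycle at ten evenly spaced time points.
samples = [antiphase_pair(i / 10) for i in range(10)]
```

At every sampled moment the two angles are equal in size and opposite in sign, which is the "equal and opposite" relation that dynamic symmetry describes.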

In Depth Object Views

A 3D model of a Mangalore from the film The Fifth Element in the 3D modeler LightWave, shown in various manners and from different perspectives

As we move through the physical world, we are constantly presented with different 3D views of the objects within our environment. Indeed, both observer movement and object movement can cause the visible parts of an object to become occluded and previously hidden parts to become revealed. As a result, we frequently observe different object views with different part configurations. How does the visual system integrate different object views so that we can recognize the same object across changes in viewpoint?

Traditional models of object recognition addressing this question seem to cluster into two general categories: structural-description and image-based approaches. Structural-description models suggest that objects are represented based on the arrangement of their parts independent of the observer’s viewpoint. Image-based models propose that objects are represented as sets of multiple 2D views.

More specifically, structural-description and image-based models suggest different approaches to the recognition of unfamiliar views of an object rotating in depth. Structural-description models suggest that we can recognize unfamiliar views if they have the same part configuration as familiar views. For example, the “geon structural descriptions” (GSDs) model suggests that objects are represented by viewpoint-invariant volumetric primitives, known as geons. Viewpoint-invariance can be achieved when objects can be decomposed into a distinct configuration of 3D parts that does not change with any object transformation. When these conditions are satisfied, object naming and matching can be performed in a viewpoint-invariant manner.

Image-based models suggest that unfamiliar views can be recognized by extrapolating from “known” object views or by interpolating between “known” object views. Psychophysical studies suggest that both humans and monkeys can interpolate between familiar object views and recognize novel views falling within limited generalization fields that span up to approximately 45° from “known” orientations. Consistent with these behavioural generalization fields, many inferotemporal (IT) neurons seem to respond selectively to the training orientation of an object and more broadly to neighboring orientations.
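The idea of a limited generalization field can be sketched as a simple rule. The sketch treats the approximate 45° limit as a hard cutoff, which is a simplifying assumption: performance in the studies cited above falls off gradually. The function names are hypothetical, and views are described by a single depth-rotation angle in degrees.

```python
def angular_distance(a, b):
    """Smallest rotation (degrees) between two orientations on a circle."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def within_generalization_field(novel_view, known_views, field_width=45.0):
    """True if the novel view lies within `field_width` of any known view."""
    return any(angular_distance(novel_view, k) <= field_width for k in known_views)


known_views = [0.0, 90.0]  # hypothetical trained orientations

near_view = within_generalization_field(30.0, known_views)  # between the trained views
far_view = within_generalization_field(180.0, known_views)  # far from both
```

With trained views at 0° and 90°, a view at 30° falls inside both fields and can be reached by interpolation, whereas a view at 180° lies outside every field.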

There is evidence that this restricted generalization can be expanded with frequent exposure to multiple object views especially when these views are qualitatively similar. Moreover, the interpolation between “known” views can be facilitated when multiple object views are linked by temporal sequence. As objects rotate in depth, we perceive multiple views in a temporal sequence.

Several studies suggest that temporal contiguity is important for linking multiple object views. For example, viewing structured sequences of multiple briefly presented views of a 3D object facilitates object naming over viewing random view sequences. When presented with three different sequences of faces, each one consisting of five different faces in a different pose, subjects have greater difficulty discriminating between different faces from the same sequence than different faces from different sequences. Also, viewpoint-invariant performance has been shown for rotating familiar and novel objects in a short-term recognition test when the study and the test views share the same parts. However, viewpoint-dependent performance has been observed for the same objects in a long-term recognition test. Finally, orientation priming is observed across blocks when the prime and the target objects are from the same visually homogeneous class, but not when they are from different categories. These results suggest that viewpoint-invariance can be achieved in memory for multiple temporally contiguous views of an object.

Furthermore, neurophysiological studies provide evidence for associative mechanisms based on image similarity or temporal contiguity. Cells in the inferotemporal cortex of monkeys become tuned to a small number of dissimilar visual patterns (colored fractal patterns) when these patterns are sequentially paired over many trials in a pair-associate task or when monkeys perform a standard or a delayed matching-to-sample task. These studies suggest that viewpoint-invariant performance can be observed when viewpoint-dependent object representations are temporally associated.

All of the above studies investigated object recognition across changes in the orientation of an otherwise static object. However, outside of the laboratory, changes in object orientation occur when objects and observers move. As a result, we are presented with continuous sequences of multiple object views that are highly similar to each other and close in space and time. Can the visual system take advantage of motion and integrate different views of 3D objects rather than associating static snapshots based solely on their similarity and temporal contiguity?

Previous research suggests that motion can link 2D object views and thereby facilitate viewpoint-invariance within the object’s path of motion. The following experiments investigated whether motion could similarly enhance the integration of 3D object views more readily than temporal sequence and lead to the immediate construction of viewpoint-invariant representations of 3D objects rotating in depth. Can such a motion-based linkage occur even when depth rotations produce different visible part configurations? We were especially interested in how novel object views might be represented. Priming of novel views falling within the object’s motion path would suggest the existence of viewpoint-invariant object representation across depth rotation. In Experiment 1, we asked whether priming would occur for novel views of 3D objects when the prime views were linked by motion. In Experiment 2, we asked whether priming would occur for novel views of 3D rotating objects when the prime views had different part configurations. Experiment 3 examined whether priming would occur for novel object views when the prime views were separated spatiotemporally but perceived in motion when an occluder was placed between them. Finally, in Experiment 4, priming for familiar objects across depth rotations was investigated.

Numerous psychophysical studies have suggested that objects rotating in depth are represented in a viewpoint-dependent manner. As a result, observers show higher error rates and longer reaction times when identifying objects from novel views than from “known” views. For example, depth rotation of familiar objects has been shown to result in slower naming performance as the rotation causes an object to diverge from its canonical view. Performance in naming line drawings of familiar objects rotated in depth is best for canonical views and for views sharing similar image structures. Also, accidental foreshortened views of line drawings are more difficult to identify than non-foreshortened views of the same objects. Consistent with these object identification studies, naming photographs of familiar objects in a priming paradigm shows less priming when the same objects are presented a second time but at a different view. This priming decrease is larger when objects are studied in familiar views but tested in unfamiliar ones.

Similar performance decrements with changes in viewpoint have been reported for unfamiliar objects rotated in depth, such as wire-form objects and photographs of objects made of clay. Also, 3D objects rotated in depth have been shown to require more recognition time as the rotation angle increases.

With Respect to Reference Frames

Description of relations between Axial tilt (or Obliquity), rotation axis, plane of orbit, celestial equator and ecliptic.

There are multiple ways in which the human visual system can encode objects. An object can be specified relative to the observer, to the environment, to its own intrinsic structure or to other objects in the environment. Each instance requires the adoption of specific spatial frames of reference. In general, reference frames provide a structure for specifying an object’s spatial composition and position.

Spatial reference frames can also be utilized in multiple ways to transform objects in the imagination. For example, if an observer wanted to construe an object at a different orientation without actually performing any actions, she could try at least two mental operations, each of which requires the rotation of a different reference frame with respect to a given stationary frame. She could either picture the object turning to its new orientation (object-relative or intrinsic reference frame) or she could imagine moving herself to the viewpoint corresponding to the new orientation (egocentric or relative reference frame). Both of these processes have been implicated in human beings’ ability to update objects across different viewpoints.

Despite the seeming importance of both of these processes, the majority of research has focused primarily on the first type: imagined object rotations. For example, the classic studies of Shepard and colleagues established that observers mentally rotate the axes of one object into congruence with those of another object in deciding whether their shapes are similar. Other studies have examined observers’ ability to predict the orientational outcome of single objects rotated about multiple axes. However, until recently, imagined rotations of the self have received less empirical consideration.

The goal of this paper is to provide a more comprehensive account of the role of spatial reference frames in mental rotation. We review studies from several related research domains – mental rotation, object recognition, perspective-taking, and motor imagery – to examine effects of multiple reference frames on imagined transformations of the self and of objects. This approach is specifically intended to shed light on the recent finding of inferior object (versus viewer) rotation performance, as evidenced by longer reaction times and higher error rates. We find that this discrepancy may be attributable to differences in the way the reference frames corresponding to each imagined rotation are transformed by the human cognitive system.

After a review of reference frames, the first main section of the paper focuses on factors affecting imagined object rotations. A recurring finding of the studies we reviewed is that imagining an object’s rotation is problematic when no information other than the object’s initial orientation is provided. This suggests a general deficit with imagining a cohesive rotation of the object’s intrinsic frame. For such tasks, observers are likely to depend on supplementary information from other frames, such as the environmental frame. As evidenced in the object recognition literature, imagined object rotations are further facilitated by view-specific encoding with respect to the relative frame.

The second main section of the paper focuses on factors affecting imagined viewer rotations. A review of several motor imagery studies indicates that imagined viewer rotations are less susceptible to misalignment with respect to the environmental frame, perhaps due to the inherently cohesive structure of the relative frame itself. We end with a review of research directly comparing imagined object- and viewer rotations, which provides further evidence of differences in the ways the respective reference frames of each type of rotation are transformed.

We begin our review with a brief discussion of spatial reference frames. As mentioned above, imagining an object rotating to your current viewpoint or imagining yourself rotating around an object to a new viewpoint requires the adoption of different reference frames. In the first section, we describe the principal frames involved in such movements. In the second section, we examine how the object and viewer frames move with respect to the environmental frame.

Rotation of an object predominantly utilizes an object-relative or intrinsic reference frame, which is defined with respect to the object’s intrinsic top/bottom, front/back, and right/left axes. Rotation of the viewer around the object predominantly utilizes an egocentric or relative reference frame, which specifies the location of external objects with respect to the major up/down, front/back, and right/left axes of the observer’s body. The egocentric frame is often further broken down to relate objects to specific parts of the body. For example, retinocentric encoding specifies an object with respect to the nodal point of the eye. Headcentric encoding specifies an object with respect to the center of the head. Bodycentric encoding specifies an object with respect to axes of individual body parts, such as the hand.

The environmental reference frame specifies the cardinal directions of north, south, east, and west. However, it can also be thought of more specifically as pertaining to structures and planar surfaces that are usually fixed with respect to the environment, such as the walls, floor, and ceiling of a room. Thus, the turning of a room about an observer or object would constitute an environment rotation.

As pointed out by Hinton and Parsons (1988), a complete description of reference systems requires consideration of their relationship to other frames. Besides the different rotation frames involved, one major difference between imagined object and viewer rotations is the way their effective reference frames change with respect to the environmental frame. In imagined object rotations, the intrinsic frame moves with respect to the environment, whereas the observer’s relative frame remains fixed. In imagined viewer rotations, the intrinsic frame remains fixed, and the relative frame moves with respect to the environment. The latter situation has been referred to as the “more radical change in our total experience” because it interferes with our natural inclination to be oriented with respect to the environmental frame. The ensuing conflict between physical and imagined viewpoints has led some researchers to assert that imagined object rotations must necessarily be easier and more natural to perform than imagined rotations of the self.
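The difference between the two imagined rotations can be made concrete with a geometric sketch. It assumes a 2D environment with the object centred at the origin and a viewer who always faces the object. All coordinates are hypothetical. The sketch shows that rotating the object by some angle, or moving the viewer around the object by the opposite angle, yields the same egocentric view, although a different frame has moved with respect to the environment in each case.

```python
import math


def rotate(point, theta):
    """Rotate a 2D point counterclockwise by theta radians about the origin."""
    x, y = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)


def egocentric(point, viewer_pos, viewer_heading):
    """Express an environment-frame point in the viewer's egocentric frame."""
    dx, dy = point[0] - viewer_pos[0], point[1] - viewer_pos[1]
    return rotate((dx, dy), -viewer_heading)


theta = math.radians(40)
feature = (1.0, 0.5)          # hypothetical feature of an object centred at the origin
viewer_pos = (0.0, -3.0)      # hypothetical viewer position in the environment
viewer_heading = math.pi / 2  # the viewer faces the object

# Imagined object rotation: the intrinsic frame turns, the viewer stays put.
object_rotated_view = egocentric(rotate(feature, theta), viewer_pos, viewer_heading)

# Imagined viewer rotation: the viewer moves around the fixed object by -theta.
moved_pos = rotate(viewer_pos, -theta)
viewer_rotated_view = egocentric(feature, moved_pos, viewer_heading - theta)
```

Because the resulting views are geometrically identical, any performance difference between imagined object and viewer rotations has to come from how each frame is transformed relative to the environment, not from the view that results.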

Object Constancy and Visual System

Visual object constancy is the ability to determine the identity of an object from its variable image. The image varies considerably following plane and depth rotations of the object, due to viewing the object from different positions. Our visual system is said to achieve object constancy because it can usually cope with such variation, by accurately recognising a range of retinal images as depicting the same object. For example, most people would recognise the left four pictures in Fig. 1 as depicting an iron, yet the pictures differ in size, outline shape and in the presence of features and parts.

Fig. 1. Eight different views of an iron. On the left are line drawings, on the right are matched silhouettes. The iron rotates in depth through 0° (top), 30°, 60° and 90° (bottom) views. The 0° views fully reveal the main axis of elongation of the object. The 90° views are so foreshortened that the main axis of elongation of the image no longer coincides with the main axis of elongation of the object, but instead is the vertical axis. The 60° view was rated as the most canonical, typical view of the iron. On the left are the RTs to verify the identity of the line drawings in a speeded word-picture verification task; on the right are the analogous RTs for silhouettes, from Experiment 2 of Lawson & Humphreys (1999).

Although we can achieve object constancy, we are sensitive to image variation and we can accurately report the size and orientation of an object. In addition, speed of object recognition is influenced by the familiarity of a given view, its similarity to views of other objects from which it must be discriminated, and its “goodness” – how well it depicts the object. An aerial view of a house may be recognised accurately, but recognition would usually be slow compared to a street-level view. This is because aerial views of houses are uncommon, similar to aerial views of churches and barns, and “poor” (the view hides the 3D structure of the house and many of its distinctive features). Finally, as it is ecologically important to achieve object constancy efficiently, the visual system has presumably been driven to optimise it, and we may be unaware of the true processing costs involved.

In addition to achieving object constancy, the visual system must discriminate between stimuli which differ in semantically important ways. Improving the achievement of object constancy will often impair discrimination, so the achievement of these two functions will be in conflict, and the visual system must reach an appropriate compromise between them. If the visual system ignores much image variation, it will be easy to achieve object constancy (because the difference in size and shape between dachshunds and Alsatians can be ignored, to recognise them both as dogs), but it will then be harder to discriminate between different objects (wolves and Alsatians, which are visually similar but which have different semantic properties), and vice versa.

The achievement of object constancy can thus only be examined in relation to the difficulty of the discrimination task required of subjects. If we only have to distinguish a red cube from a metal cheese-grater and a yellow pool of paint, then simple surface or texture or shape information will accurately identify all three objects, irrespective of the view presented. The achievement of object constancy will be trivial. In contrast, under everyday viewing conditions, there are usually many objects which could be present in any situation, yet we rarely misidentify objects, even if the object is unlikely in that situation (a frog in our bedroom). Although difficult, discrimination is both rapid and accurate, so achieving object constancy is also hard. The difficulty of discrimination depends not just on the number of objects to be distinguished, but also on their similarity. Animals may be harder to recognise than manmade artefacts because, as a category, animals are more visually similar to each other than are artefacts.

Current theories of human visual object recognition acknowledge the importance of accounting for our ability to achieve object constancy and our ability to discriminate between different classes of objects. However, there is little consensus between these accounts about the representations and processes which are involved.

There are three classes of account of the achievement of object constancy. Invariant features accounts suggest that invariant features can be used to distinguish most views of one object from most views of all other objects. A feature is only unique to an object in the context of the set of objects from which that object is to be distinguished, i.e. given a particular discrimination task. Multiple view accounts suggest that the visual system stores several representations of each object, and that a given view is matched to the nearest view-specific, stored representation. Transformation accounts propose that the retinal image can be transformed to reduce differences between the image and a view-specific, stored representation.
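To make the multiple view idea concrete, the sketch below treats each stored representation as a single depth-rotation angle and matches an incoming view to the nearest stored one. This is a toy illustration only: the function names, the use of one angle to stand for a whole view, and the choice of stored views are all assumptions made here, not part of any of the accounts described above.

```python
def angular_distance(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees."""
    diff = abs(a_deg - b_deg) % 360
    return min(diff, 360 - diff)


def nearest_stored_view(observed_deg, stored_views_deg):
    """Pick the stored view closest in angle to the observed view.

    Hypothetical helper: a real multiple view account would compare
    image-based representations, not bare angles.
    """
    return min(stored_views_deg, key=lambda v: angular_distance(observed_deg, v))


# Assume (for illustration) that only the 0 and 60 degree views are stored.
stored = [0, 60]
for observed in (10, 45, 90):
    match = nearest_stored_view(observed, stored)
    print(observed, "->", match, "residual:", angular_distance(observed, match))
```

The residual angle printed for each view is the part a transformation account would have to remove, which is one way of seeing why the two kinds of account are hard to tell apart from performance alone.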

Although representation (multiple view) and process (transformation) accounts differ conceptually, they are hard to distinguish empirically. Representations and processes cannot be examined independently of each other. A pattern of performance resulting from a certain representation being stored can be exactly replicated by specifying a particular process. Any well-specified theory of object recognition must describe both the representations stored by the visual system and the processes employed to access those representations. Furthermore, as more interactive models of human information processing become popular (e.g., neural network models), the distinction between representation and process is likely to break down. I have, however, discussed representation and process accounts separately since most theoreticians distinguish between the two. It is worth noting here that there are clear, object-specific effects in recognition. View-specific effects in priming studies are tied to subjects’ experience with particular objects. Object-specificity is typically associated with access to different representations rather than the use of different processes.

Invariant features, multiple view and transformation accounts are not incompatible, and it is likely that the visual system employs them all to some degree to achieve object constancy. For example, Jolicoeur (1990) proposed an account of the effects of plane rotation on picture recognition which included all three classes of account. Jolicoeur suggested that plane rotated views of objects are often transformed to match a stored, upright view (transformation account), although plane-disoriented views of objects may also be stored, allowing direct matching of plane rotated views (multiple-views account). In addition, Jolicoeur (1990) suggested that a functionally distinct, feature-based route was also available which enabled pictures to be recognised by matching simple attributes such as colour, texture or shape, many of which are orientation-invariant (invariant features account). This latter route is most useful if distinctive objects are presented, or if a small set of stimuli are presented repeatedly, enabling subjects to learn which features are invariant. Note that subjects do not always use invariant information when it is available, and they may need explicit encouragement or training to take advantage of it.


Amodal Completion and Process Dissociation

Green square, tilted, with colored cross and a diagonal.

In our visual world, objects usually occlude parts of themselves and parts of other objects. Yet we do not have the idea of being surrounded with just object fragments. Somehow, the visual system is able to give us the impression of complete objects. In such cases, perceptual interpretations clearly exceed the information that is present in the retinal image.

In Fig. 1A, a typical textbook example of visual occlusion is shown. In principle, this pattern could be interpreted in many different ways. Usually, observers prefer the interpretation of a partly occluded square to other possible interpretations such as the ‘mosaic’ or a rather arbitrary completion. This phenomenon of occlusion was first studied by Chapanis and McCleary (1953), Dinnerstein and Wertheimer (1957), and Michotte, Thinès, and Crabbé (1964). The latter scientists specifically studied the apparent ability of the visual system to fill in the missing or occluded parts and referred to it with the – still frequently used – term ‘amodal completion’. Nowadays, the study of amodal completion receives considerable attention and is recognized more and more as a fundamental issue in visual perception.

One way to investigate preferred completions is to just ask observers to draw their interpretation of occlusion patterns. This procedure has the advantage of obtaining many different spontaneous interpretations but has the disadvantage of potential sensitivity to irrelevant task influences (e.g., drawing capacities). In other studies the perceptual relevance of amodal completion has been tested by more refined experimental paradigms. For example, Gerbino and Salmaso (1987) showed the functional equivalence of partly occluded shapes and complete shapes by means of a simultaneous matching task. On the basis of their experimental results, Gerbino and Salmaso (1987) argued that partly occluded shapes were perceived as complete by means of a possible ‘automatic transformation of the visual code’. Additionally, Sekuler and Palmer (1992) studied the time course of completion by means of the primed matching paradigm. Their main result was that within a short period of time (as short as 200 ms) a partly occluded figure is perceptually represented as complete.

We take the above as support for the perceptual relevance of amodal completion (further on in this paper we will briefly return to this issue) and focus on the figural properties that determine completion. Our starting point here is the observation that when looking at a pattern such as in Fig. 1, most observers have a clear preference for one specific occlusion interpretation. Generally, there is broad consensus and consistency with respect to the preferred completions. To explain and predict perceived completions, a distinction between ‘local’ and ‘global’ accounts has often been made in the literature. In the following, I will first briefly discuss the impact of local and global figural properties and after that examine their impact on two potential extensions of the (hitherto) rather limited stimulus domain in amodal completion research.

Local accounts strongly rely on the impact of local cues such as specific discontinuities of contours at points of occlusion. As a proponent of a local account, Kanizsa (1975, 1979, 1985) emphasized the role of Good Continuation in amodal completion, generally taking place at so-called T-junctions. Rock (1983) implicitly accounted for the effect of T-junctions by stating that the visual system avoids a coincidental meshing of borders: generally, a continuation of the occluding contour at a T-junction avoids such a coincidental meshing. In recent years, Kellman and Shipley (1991) greatly influenced completion research by means of the so-called relatability criterion. This criterion predicts completion by a smooth curve whenever virtual linear extensions of contours meet behind the occluding surface at angles equal to or larger than 90°.

In a critical review, however, Boselie and Wouterlood (1992) pointed out several drawbacks of the relatability criterion, and additionally demonstrated that the relatability criterion does not always predict the perceived shape correctly. Their counterexamples fall into classes such as ‘relatable edges, but no completion perceived’ and ‘non-relatable edges, but completion perceived’. Moreover, the status of linear extensions, coterminating in one common angle, as in the ‘square’ interpretation of Fig. 1, is not clear in Kellman and Shipley’s model. In addition, Wouterlood and Boselie (1992) developed their own local completion model, exclusively based on the principle of Good Continuation. Depending on the type of junctions between two surfaces, their model predicts either a mosaic of surfaces or a completion of one of the surfaces. In the latter case the completion always proceeds by means of a linear extension of the contours. In spite of their local account, Wouterlood and Boselie (1992) acknowledged the influence of pattern regularities on the perceptual outcome. To avoid regularity-based influences they therefore restricted the predictive domain of their model to completely irregular patterns. So, in fact, the example of Fig. 1 lies outside the stimulus domain for which Wouterlood and Boselie’s (1992) model has been designed.

In contrast to local approaches, global approaches take into account shape regularities such as symmetries. Within Structural Information Theory (SIT), initiated by Leeuwenberg (1969, 1971) and further developed since then, this tendency has been operationalized by means of a regularity-based coding system. Within SIT, visual patterns are encoded by means of three different coding rules: Iteration, Symmetry and Alternation. The encoding proceeds as follows. First, the encoding of a visual shape requires a mapping of the shape to a symbol series. Heuristically, this can be done by tracing the contour of the shape after having labelled all edges and angles such that equal edges and equal angles obtain equal symbols. Second, the symbol series is reduced by extracting the maximum amount of regularity from such a series (e.g., the symbol series aaaa can be encoded into 4*(a) by means of the Iteration rule; the series abba into S[(a)(b)] by the Symmetry rule; and the series akal into ⟨(a)⟩/⟨(k)(l)⟩ by the Alternation rule). Third, the interpretation with the simplest code, or the smallest number of descriptive parameters, is predicted to be preferred. It should be noted here that the SIT account is not to be taken as an actual process account, but rather as an operational tool to determine regularity-based rankings of alternative pattern interpretations.
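The three coding rules can be illustrated with a deliberately small sketch that reproduces only the three examples given above. It is not an implementation of SIT: it checks whether a whole symbol series fits one rule, it does not search for the simplest code among nested or partial regularities, and the function names and the ASCII `<...>` notation for the Alternation rule are assumptions made for this illustration.

```python
def iteration_code(series):
    """Return 'n*(x)' if the series is n repeats of one symbol, else None."""
    if len(series) > 1 and len(set(series)) == 1:
        return f"{len(series)}*({series[0]})"
    return None


def symmetry_code(series):
    """Return 'S[(a)(b)...]' if the series is an even-length mirror, else None."""
    n = len(series)
    if n >= 2 and n % 2 == 0 and series == series[::-1]:
        half = series[: n // 2]
        return "S[" + "".join(f"({s})" for s in half) + "]"
    return None


def alternation_code(series):
    """Return '<(x)>/<(y1)(y2)...>' if a fixed symbol alternates with varying ones."""
    n = len(series)
    if n >= 4 and n % 2 == 0:
        fixed, varying = series[0::2], series[1::2]
        if len(set(fixed)) == 1 and len(set(varying)) > 1:
            return f"<({fixed[0]})>/<" + "".join(f"({s})" for s in varying) + ">"
    return None


def encode(series):
    """Try each rule on the whole series; fall back to the raw series."""
    for rule in (iteration_code, symmetry_code, alternation_code):
        code = rule(series)
        if code is not None:
            return code
    return series


for example in ("aaaa", "abba", "akal"):
    print(example, "->", encode(example))
```

A fuller treatment would also count the descriptive parameters of each candidate code and prefer the interpretation with the fewest, which this sketch leaves out.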


Image Amodal Completion Process Dissociation

This is a contour map labeled according to accepted cartographic conventions. Labels are placed in a slightly curved line stepping up to the summit from several directions.

Several authors have offered demonstrations that the visual system links image fragments into larger regions in a bottom-up or stimulus-driven manner. When those fragments are taken to be the visible portions of an occluded object, the object is said to ‘amodally complete’ behind the occluder. There has been a great deal of debate regarding the cues that the visual system uses to determine whether two partially occluded objects amodally complete into a single object behind an occluder. Broadly speaking, there have been three types of theories to account for amodal completion: (1) those that argue that completion occurs when certain conditions are satisfied among local image cues, such as contour orientations and junctions, (2) those that argue that completion occurs when certain conditions are satisfied among global image cues, such as symmetry, regularity, or simplicity of form or pattern and (3) those that argue that completion occurs when certain conditions are satisfied among internal representations, such as surfaces and volumes, which must be inferred from image cues.

More specifically, theories of type (1) have generally argued that completion occurs because of good continuation among image contours that terminate at an image tangent discontinuity such as a T-junction. Type (1) theories are essentially local in nature because the initiating conditions for contour interpolation are local tangent discontinuities. Initiated contour interpolations can link up at a distance with other interpolated contours if they meet the condition of good contour continuation. Theories of type (2) have challenged the notion that completion takes place because of local cues, such as contour orientations and discontinuities, and have emphasized instead the importance of global regularities such as symmetry in the patterns of completing image regions. Most type (2) theories emphasize global regularities in the inputs to completion processes, whereas others emphasize the importance of regularity and simplicity in the outputs of completion processes. Traditional theories of type (3) have emphasized that amodal completion occurs because of surface completion on a common depth plane (Nakayama and Shimojo, 1992). A more recent type (3) approach has emphasized the importance of volume completion.

These three positions are not mutually exclusive. For example, according to the account of completion developed here, the visual system uses both (1) local and (2) global image cues to generate representations of (3) volumes that may occlude one another under constraint of (2) learned or innate priors that minimize output complexity. This paper describes empirical evidence for a new type (3) account of completion that subsumes the flat surface approach as a degenerate case and emphasizes the importance of surface edges, curved surfaces, surface self-occlusions, and volumes in completion processes. According to this view, while the contour relationships emphasized by type (1) theories and global pattern regularities emphasized by type (2) theories play an important role in amodal completion, they are not direct cues to amodal completion. Rather, they and other image cues are used to infer edge, surface, and volume relationships in the world, and completion occurs at a level where these relationships are analyzed.

The proposal that completion is based upon contour relatability held broad appeal because one could determine whether contours were relatable just by comparing contour orientations in the image. A comparison of image contour orientations is the sort of algorithm that could be implemented in a reasonably straightforward manner using local operators. This suggested that one of the fundamental sources of image ambiguity, occlusion, might be overcome in a local and stimulus-driven manner, without recourse to complex global representations or higher-level knowledge. If image ambiguity due to occlusion could be overcome in a local and stimulus-driven manner, perhaps other types of image ambiguity could also be overcome in this way. It might then be possible to maintain the reductionistic program of modeling vision as an entirely stimulus-driven succession of information processing stages over local cues.

It would indeed be computationally simplest if there were some locally measurable invariant cue in the image that was always present when the image depicted an occlusion scene and was never present otherwise. The visual system could then search for this occlusion cue and know with certainty that an object was occluded. Since Helmholtz (1962/1867) many researchers have maintained that image tangent discontinuities such as T-junctions approximate an invariant cue to occlusion, because these are always present in images depicting occlusion scenes. While it is true that T-junctions are generically present in the image when one surface occludes another surface separated in depth, T-junctions are not necessarily present in cases of perceived occlusion where one surface conforms to another. Similarly, it was suggested that contour relatability might also be an image cue generically present in all images projected from occlusion scenes. But this is also not true. Although contour relatability may be a useful cue to completion, it is neither necessary nor sufficient for a percept of amodal completion to take place.

Perhaps completion could be based on constructing and relating surfaces without having to go to a volume level of representation. However, even ‘surface relatability’ (among surface extensions into 3D occluded space) is neither necessary nor sufficient for amodal completion to be perceived. That is, there are no partially occluded contours or regions (in the image) or edges and surfaces (in the world) that can be extended along their respective visible trajectories (in the image or into occluded 3D space, respectively) such that they will link up with other partially occluded contours, regions, edges or surfaces.

These counterexamples to ‘traditional’ contour- and surface-based theories of completion raise the fundamental question motivating the present investigation. If contour and region relatability in the image, as well as edge and surface relatability in the world, are not necessary and not sufficient for completion to take place, might there be some other more general principle that will allow us to predict when the visual system will amodally complete partially occluded volumes into whole volumes and when it will not? The proposed answer to this question for amodally completing surfaces and volumes is ‘complete mergeability’.


Difficulty in Grouping Process Dissociation

Relation between computer vision and various other fields

Perceptual grouping is the process by which raw image elements are aggregated into larger and more meaningful collections. Grouping is widely assumed to be early, automatic, and preattentive, though the extent to which grouping can proceed without attention is controversial. Grouping and scene organization can impose decisive influences on other low-level processes. Grouping is a necessary precursor to object recognition, because for complexity reasons only well-organized groups, rather than arbitrary subsets of the image, can be compared against stored object models. Nevertheless, grouping is certainly one of the least understood problems in vision. This state of affairs reflects the difficulty of precisely formalizing subtle human intuitions about the relative “reasonableness” of candidate groups.

Indeed, notwithstanding the rapidity and effortlessness with which human perceivers perform it, grouping is an extremely difficult problem from a computational point of view. The number of candidate groups in a configuration of n items is equal to the number of subsets and hence is exponential (2^n); the number of partitions (divisions of the n items into disjoint subsets) grows even faster with n. Many early grouping phenomena, such as the detection of collinearity, are often treated by researchers as local problems in a restricted neighborhood, thus reducing the amount of computation required. However, the more general problem of grouping is well known to involve global effects. Long-distance influences over large areas of the image are common, meaning the fundamental complexity remains extremely high (a fact reflected in the very term “Gestalt”, connoting the primacy of the whole). Perhaps the best illustration of the difficulty is the fact that in computational vision, it has become commonplace to require a human user to outline target shapes in images before recognition or motion tracking can commence, because existing grouping algorithms do not provide sufficiently robust or accurate results. The lack of good algorithms in turn reflects the failure of psychologists to propose a theory rigorous and concrete enough to be implemented computationally.
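The size of the search space can be made tangible with a short calculation. The sketch below counts subsets directly and counts partitions with the Bell numbers, computed here with the Bell triangle; the function names are chosen for this illustration, and reading "number of partitions" as the Bell number is the standard combinatorial interpretation rather than something spelled out in the text.

```python
def count_subsets(n):
    """Number of subsets (candidate groups) of n items."""
    return 2 ** n


def count_partitions(n):
    """Number of ways to split n items into disjoint non-empty subsets.

    This is the n-th Bell number, built row by row with the Bell triangle.
    """
    row = [1]
    for _ in range(n):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


for n in (5, 10, 15, 20):
    print(n, count_subsets(n), count_partitions(n))
```

For very small n the partitions are actually fewer than the subsets, but they overtake from n = 5 onward and then pull away rapidly, which is the regime that matters for images with many elements.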

Yet the real theoretical difficulty in grouping stems from the difficulty in clearly defining the computational goal: a rigorous definition of what makes a “good group”. Unlike such physically grounded variables as depth, color, and motion, goodness of grouping candidates does not have an objective physical definition. Some ways of combining image elements simply seem more intuitively reasonable than others. The Gestaltists called this elusive quality of perceptual goodness Prägnanz, usually translated as “good form”.

Two general strategies for attacking this problem in the literature can be distinguished. Loosely, some authors seek to explain the procedure by which the visual system arrives at its preferred percept – i.e., find a process model – while others attempt to characterize the nature of the preferred percept itself (cf. the distinction between dynamic and static approaches noted by Van der Helm & Leeuwenberg (1996)). The distinction is related to Marr’s well-known division between an algorithmic theory and a theory of the computation, the latter sometimes referred to as a competence theory following Chomsky’s terminology. As such the two approaches operate at distinct but mutually compatible levels of analysis. The research described in the current paper places the emphasis on the competence theory, on the belief that trying to discover how the visual system computes something – without first defining that thing – amounts to letting the tail wag the dog.

Hence, this paper focuses on an attempt to define in formal terms exactly which interpretation for a given scene is most preferred by human observers, and why. Mathematical details and computational issues in the theory, called minimal model theory, are explained in more detail elsewhere. The emphasis here will be on one particular issue: the role of grouping units. What kind of groups – contours, surfaces, objects etc. – are image items aggregated into, and why? In particular I will attempt to shed light on the somewhat amorphous concept of “object”, the grouping unit most difficult to define and hence, perhaps, most in need of a rigorous theory.

In the common wisdom, perceptual grouping is the process whereby the visual image is decomposed into objects. However, this definition is somewhat at odds with the way perceptual grouping is studied in practice by researchers in the field. More commonly, research has centered around how visual items are organized into striated patterns, contours, and Moiré patterns. Researchers studying perceptual completion behind a subjective occluder or a visible occluder have usually conceptualized the completed thing as a simple surface. Such an object though is at most a very simple one, consisting of only a single closed region, and almost invariably 2D. The computational literature has also focused primarily on contours and surfaces. In the human vision literature in general, there is a widespread view promulgated by Gibson (1979) that surfaces rather than objects are the primary unit of visual representation.

Objects per se have been little studied in the context of grouping. For the most part this probably stems from the difficulty in precisely defining them. Contours always have a certain well-defined geometrical form: they are 1D space curves, i.e., smooth deformations of the unit line. Similarly, surfaces are always smooth deformations of a neighborhood of the plane. Many objects are simply 3D analogs of contours and surfaces: smoothly bounded regions of 3D space (i.e., “blobs”). In general though objects can be more complex than this, having parts and articulated substructures, and potentially complex spatial relations within them. Given the difficulty in completely characterizing human grouping preferences even for these geometrically simpler units, grouping researchers have not often approached the more abstract problem of objects directly.

Our overall goal is to develop a theory describing the rules human observers use to choose the best interpretation of a given scene. Hence, we begin with the problem of defining an “interpretation”, a term often used in vision in a loose way but rarely defined carefully.

Reiter and Mackworth (1989) (see also Clowes, 1971) have proposed a definition of an interpretation using ideas from mathematical logic, a field in which the idea of enumerating the alternative interpretations of a fixed set of facts is a central concept. Their definition is quite technical. The discussion here follows their definition only loosely, and is oriented specifically around the idea of choosing grouping units.



Object Individuation and Process Dissociation

A fetus in its mother's womb, viewed in a sonogram (brightness scan)

The last 20 years have witnessed important progress in the study of infant object perception and cognition. Fantz (1964) pioneered the methodology of preferential looking with infants and showed that we can study perceptual discrimination in very young infants using this method. Subsequently, Spelke (1985) and others extended the use of this method to ask questions beyond simple perceptual discriminations. For example, do infants perceive objects as three-dimensional? Do infants understand that objects are cohesive and do not leave parts of themselves behind while moving through space? Do infants comprehend that two solid objects cannot occupy the same space at the same time? Under what conditions do infants arrive at representations of two as opposed to one object in an event?

The general method of these studies exploits the fact that infants (as well as adults) tend to look longer at new and unexpected events. Infants are shown the same event or objects repeatedly, and their looking times are recorded. With each repetition, the infant’s looking time declines; that is, infants “habituate.” When the infant’s looking time has reached some pre-set criterion (usually 50% of the initial looking times summed over three trials), test trials begin. Infants are alternately shown an expected outcome (an outcome that is consistent with adults’ understanding of the physical or social world) and an unexpected outcome (an outcome that is inconsistent with adults’ understanding of the physical or social world). If infants have the same understanding of the events shown during habituation as adults, they should look longer at the unexpected outcome relative to the expected outcome. This particular version of the methodology is often called the “visual preference for violation of expectancy” paradigm. It is in one sense akin to the measure of reaction time: It takes the infant longer to process an anomalous event/outcome than one that is consistent with their general model of the world.
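The habituation criterion can be written down as a short check. The sketch below assumes one common reading of the rule: the summed looking time over the three most recent trials must fall to 50% or less of the summed looking time over the first three trials, with the two windows not overlapping. The function name, the non-overlap requirement, and the example looking times are all assumptions for illustration; individual laboratories differ in the exact rule.

```python
def trials_to_habituation(looking_times, window=3, criterion=0.5):
    """Return the 1-based trial on which the criterion is first met, or None.

    looking_times: looking time per habituation trial, in seconds.
    window: number of trials summed for the baseline and for the recent block.
    criterion: proportion of the baseline sum that the recent sum must not exceed.
    """
    if len(looking_times) < 2 * window:
        return None
    baseline = sum(looking_times[:window])
    # Start at trial 2 * window so the recent block never overlaps the baseline.
    for end in range(2 * window, len(looking_times) + 1):
        recent = sum(looking_times[end - window:end])
        if recent <= criterion * baseline:
            return end
    return None


# Made-up looking times (seconds) for one hypothetical infant.
times = [30, 28, 26, 20, 18, 15, 12, 10]
print(trials_to_habituation(times))
```

With these invented numbers the baseline sum is 84 s, so test trials would begin once three consecutive trials sum to 42 s or less.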

Many researchers have contributed extensively to the literature of how infants understand the physical world around them. Spelke and her colleagues have discovered several principles that guide young infants’ perception and reasoning of physical objects. First, physical objects are cohesive; they move as wholes, and they do not leave parts of themselves behind. Second, objects obey principles of continuity and solidity; they move on spatiotemporally continuous paths, and two objects cannot occupy the same space at the same time. Third, objects act on each other upon contact; that is, there is no action at a distance. These principles stay at the core of our mature understanding of physical entities as adults. Against the backdrop of a strong Piagetian tradition in developmental psychology, which claims that young infants experience a “blooming, buzzing confusion” as opposed to coherent three-dimensional objects in their surroundings, Spelke and her collaborators have shown that some of our deepest beliefs about how physical objects should behave may have their roots in early infancy, perhaps given innately. Other researchers have focused on infants’ reasoning about specific types of physical events such as occlusion and support, and how infants perceive the causal relations among objects.

These research enterprises have been followed up by many laboratories. Much controversy has been generated over (a) the nature of the infants’ representations when they show such early competence; (b) how these new findings should be reconciled with the highly robust and replicable findings by Piaget and his associates; and (c) how these early representations are related to the mature cognitive system in adults.

My research focuses on a particular aspect of object representations, namely the issue of object individuation and object identity. Specifically, how do infants arrive at representations of multiple objects and trace their identity through time and space? What sources of information do infants employ in this process? We know that adults use at least three sources of information in object individuation: Spatiotemporal information, object property information, and object kind information. Spatiotemporal information refers to generalizations such as one object cannot be at two places at the same time and objects travel on spatiotemporally continuous paths. Object property information refers to how we can use general Gestalt principles of good form, good continuation, and relevant featural differences (e.g., color, shape, size, or texture) in object individuation. Lastly, object kind information refers to our knowledge about specific categories of objects. For example, size change may or may not indicate a change of identity depending on whether the entity under consideration is biological or not. In other words, our criteria for object individuation and object identity are kind-relative.

Bower (1974) was the first to suggest that young infants may use spatiotemporal information to individuate objects before they use object property information to do so. In a series of experiments, Bower found that before five months of age, infants’ tracking behavior was interrupted if a moving object stopped abruptly, but not if an object (e.g., a toy bunny) had apparently turned into a different object (e.g., a toy truck). He concluded that infants at five months represent moving and stationary objects as distinct objects: When an object stops moving, it is no longer the same object. However, when a toy bunny apparently has turned into a toy truck, infants up to five months are unable to use the property differences to arrive at a representation of two distinct objects. As we will see below, although Bower’s particular spatiotemporal rule may be incorrect and his results have been difficult to replicate, his insight about the relative importance of spatiotemporal and object property information in early object individuation may well be correct.

In recent years, much more research has been conducted on how infants resolve the ambiguity concerning the number of objects in an event or a scene. Ambiguity in object individuation can arise in at least three ways. First, two fully visible adjacent segments may or may not belong to the same object. Second, two segments may be occluded at their boundary such that it is ambiguous whether they belong to a single object or not. Third, infants are constantly in situations where they have to determine whether two encounters with an object are one object seen on two different occasions or are two numerically distinct objects.

Part of Ndm Decision Making

Situation awareness

In June 1994 a conference was held on Naturalistic Decision Making (NDM) in Dayton, OH. The book represents the content of this conference and is organised into five parts: 1) About naturalistic decision making, 4 chapters; 2) Applications of NDM research: perspectives from panels of applied researchers, 11 chapters; 3) Research reports, 9 chapters; 4) Methodological and theoretical considerations, 10 chapters and finally 5) one chapter in which Gary Klein tries to answer the question of where NDM is going.

The first part of the book gives an overview of the state of affairs of NDM and how it has progressed since the previous conference in 1989. Since its start, NDM has generally presented itself in relation to traditional decision making research. There appeared to be a general dissatisfaction with the application of formal models to real life settings, which could not be used to explain decision making processes of experienced people in their natural environment. NDM mainly distinguishes itself from traditional decision making by studying experienced people, using realistic task settings and focusing on situation assessment rather than the comparison of options. However, numerous critical notes are presented in this first part of the book as well. Both Beach et al. and Howell point out that much research done under the heading of NDM is far from new. Studies on expertise, for example, have been conducted within various domains, and NDM would only profit from taking these research findings into account. Howell indicates as a main weakness that little effort has been put into strengthening the thin theoretical basis of NDM since the 1989 conference.

The second part of the book focusses on applications of NDM research and reflects panel discussions on a wide range of applied areas, including health care (Bogner), command and control (Drillings & Serfaty), aviation (Kaempf & Orasanu), business and industry (Schmitt) and process control (Roth). The chapters all indicate the value of NDM for applied settings and provide a research agenda for the specific domains. As noted by several authors there is a clear need, however, for research providing empirical support for the effectiveness of the NDM approach and the usefulness of its implications for training and interface design.

In the third part of the book nine studies are reported in order to expand the theoretical basis of NDM. Six of them used simulation tasks that fall in the military domain and the other three were conducted in a nuclear power station, a manufacturing company and an operating room of a hospital. Data were acquired by using typical process-tracing techniques like thinking-aloud protocols, video-recordings and observations by domain-experts. As noted by Serfaty et al. there are still a lot of theoretical issues to be worked out. One such issue, according to them, is how knowledge in experts is organized and how experts retrieve that knowledge. A beginning was made by Schraagen who reported that experienced and less experienced DC officers differed in the way they recalled information: experienced DC officers retrieved information in a causal form whereas less experienced officers recalled the information in a temporal form.

The fourth part of the book addresses methodological and theoretical aspects of NDM. Cohen, Freeman and Thompson, for example, mention a technique, “the crystal ball technique”, that is used for training experts’ metarecognitional skills. The crystal ball is an imaginary perfect intelligence source that, for instance, forces officers to generate alternative explanations for judgements they are fairly certain of. The method exposes the assumptions underlying these judgements and, by evaluating these assumptions, determines their reliability. Mitchel, Morris and Ockerman describe how Recognition-Primed Decision (RPD) Making is used as a technique to support reuse in software design. Although Klein sums up empirical support for a number of important assertions of the RPD model (people use their experience to generate plausible courses of action to be considered first; experts’ performance does not decline under time pressure; and experts adopt a course of action without comparing alternative courses of action), Lipshitz and Shaul indicate that the RPD model misses an explication of the constructs schemata and mental models. These constructs are essential components of other NDM theories, as in, for example, Endsley’s model of Situation Awareness. Smith and Marshall also stress the importance of interactive knowledge organisation in human memory. Similar to Endsley’s conception of mental models, these schemata support recognition of important elements in the environment, the integration of these elements into a coherent pattern and the planning of actions to achieve one’s goals. However, one of the shortcomings of the RPD model is in explaining how a mental model is used to match an actual situation to knowledge representations in memory.
