Image Amodal Completion Process Dissociation
Several authors have offered demonstrations that the visual system links image fragments into larger regions in a bottom-up or stimulus-driven manner. When those fragments are taken to be the visible portions of an occluded object, the object is said to ‘amodally complete’ behind the occluder. There has been a great deal of debate regarding the cues that the visual system uses to determine whether two partially occluded objects amodally complete into a single object behind an occluder. Broadly speaking, there have been three types of theories to account for amodal completion: (1) those that argue that completion occurs when certain conditions are satisfied among local image cues, such as contour orientations and junctions; (2) those that argue that completion occurs when certain conditions are satisfied among global image cues, such as symmetry, regularity, or simplicity of form or pattern; and (3) those that argue that completion occurs when certain conditions are satisfied among internal representations, such as surfaces and volumes, which must be inferred from image cues.
More specifically, theories of type (1) have generally argued that completion occurs because of good continuation among image contours that terminate at an image tangent discontinuity such as a T-junction. Type (1) theories are essentially local in nature because the initiating conditions for contour interpolation are local tangent discontinuities. Initiated contour interpolations can link up at a distance with other interpolated contours if they meet the condition of good contour continuation. Theories of type (2) have challenged the notion that completion takes place because of local cues, such as contour orientations and discontinuities, and have emphasized instead the importance of global regularities such as symmetry in the patterns of completing image regions. Most type (2) theories emphasize global regularities in the inputs to completion processes, whereas others emphasize the importance of regularity and simplicity in the outputs of completion processes. Traditional theories of type (3) have emphasized that amodal completion occurs because of surface completion on a common depth plane (Nakayama, K. and Shimojo, S., 1992. Experiencing and perceiving visual surfaces. Science 257, pp. 1357–1363). A more recent type (3) approach has emphasized the importance of volume completion.
These three positions are not mutually exclusive. For example, according to the account of completion developed here, the visual system uses both (1) local and (2) global image cues to generate representations of (3) volumes that may occlude one another under constraint of (2) learned or innate priors that minimize output complexity. This paper describes empirical evidence for a new type (3) account of completion that subsumes the flat surface approach as a degenerate case and emphasizes the importance of surface edges, curved surfaces, surface self-occlusions, and volumes in completion processes. According to this view, while the contour relationships emphasized by type (1) theories and global pattern regularities emphasized by type (2) theories play an important role in amodal completion, they are not direct cues to amodal completion. Rather, they and other image cues are used to infer edge, surface, and volume relationships in the world, and completion occurs at a level where these relationships are analyzed.
The proposal that completion is based upon contour relatability held broad appeal because one could determine whether contours were relatable just by comparing contour orientations in the image. A comparison of image contour orientations is the sort of algorithm that could be implemented in a reasonably straightforward manner using local operators. This suggested that one of the fundamental sources of image ambiguity, occlusion, might be overcome in a local and stimulus-driven manner, without recourse to complex global representations or higher-level knowledge. If image ambiguity due to occlusion could be overcome in a local and stimulus-driven manner, perhaps other types of image ambiguity could also be overcome in this way. It might then be possible to maintain the reductionistic program of modeling vision as an entirely stimulus-driven succession of information processing stages over local cues.
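To make concrete how a relatability test could reduce to a comparison of contour orientations, here is a minimal Python sketch. It is not taken from the text, which does not define relatability formally. It assumes one commonly cited geometric formulation (associated with Kellman and Shipley): two contour terminations are relatable if their straight-line extensions into the occluded region meet, and the contour would bend by no more than 90 degrees at the meeting point. The function name `relatable`, the representation of each termination as a point plus a direction, and the tolerance `eps` are all hypothetical choices made for illustration.

```python
import math


def _cross(a, b):
    """2D cross product (z-component) of vectors a and b."""
    return a[0] * b[1] - a[1] * b[0]


def _dot(a, b):
    """2D dot product of vectors a and b."""
    return a[0] * b[0] + a[1] * b[1]


def relatable(p1, d1, p2, d2, eps=1e-9):
    """Illustrative relatability test for two contour terminations.

    p1, p2: (x, y) points where each contour disappears behind the occluder.
    d1, d2: directions in which each contour would extend into the
            occluded region (need not be unit length).
    Assumed criterion (not from the source text): the two extension rays
    intersect, and the bend between them is at most 90 degrees.
    """
    n1, n2 = math.hypot(*d1), math.hypot(*d2)
    d1 = (d1[0] / n1, d1[1] / n1)
    d2 = (d2[0] / n2, d2[1] / n2)

    # Travelling along d1 and then continuing onto contour 2 means moving
    # along -d2, so the bend is the angle between d1 and -d2.
    if _dot(d1, (-d2[0], -d2[1])) < -eps:
        return False  # bend sharper than 90 degrees

    r = (p2[0] - p1[0], p2[1] - p1[1])
    denom = _cross(d1, d2)
    if abs(denom) < eps:
        # Parallel extensions: accept only collinear contours that
        # point toward each other (a straight continuation).
        return abs(_cross(r, d1)) < eps and _dot(r, d1) > 0 and _dot(d1, d2) < 0

    # Solve p1 + t1*d1 == p2 + t2*d2 for the ray parameters.
    t1 = _cross(r, d2) / denom
    t2 = _cross(r, d1) / denom
    return t1 >= -eps and t2 >= -eps


if __name__ == "__main__":
    # Two collinear edge fragments facing each other across an occluder.
    print(relatable((0, 0), (1, 0), (2, 0), (-1, 0)))
    # Extensions that meet, but only with a bend sharper than 90 degrees.
    print(relatable((0, 0), (1, 0), (0, 1), (1, -1)))
```

Note that the test uses only the positions and orientations of the two terminations, which is what makes a criterion of this kind attractive as a purely local operation. It says nothing about surfaces or volumes, which is the limitation the following paragraphs take up.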
It would indeed be computationally simplest if there were some locally measurable invariant cue in the image that was always present when the image depicted an occlusion scene and was never present otherwise. The visual system could then search for this occlusion cue and know with certainty that an object was occluded. Since Helmholtz (1962/1867), many researchers have maintained that image tangent discontinuities such as T-junctions approximate an invariant cue to occlusion, because these are always present in images depicting occlusion scenes. While it is true that T-junctions are generically present in the image when one surface occludes another surface separated in depth, T-junctions are not necessarily present in cases of perceived occlusion where one surface conforms to another. Similarly, it was suggested that contour relatability might be an image cue generically present in all images projected from occlusion scenes, but this is not true either. Although contour relatability may be a useful cue to completion, it is neither necessary nor sufficient for a percept of amodal completion to take place.
Perhaps completion could be based on constructing and relating surfaces without having to go to a volume level of representation. However, even ‘surface relatability’ (among surface extensions into 3D occluded space) is neither necessary nor sufficient for amodal completion to be perceived. That is, completion can be perceived even when no partially occluded contours or regions (in the image), or edges and surfaces (in the world), can be extended along their respective visible trajectories (in the image or into occluded 3D space, respectively) such that they link up with other partially occluded contours, regions, edges, or surfaces.
These counterexamples to ‘traditional’ contour- and surface-based theories of completion raise the fundamental question motivating the present investigation. If contour and region relatability in the image, as well as edge and surface relatability in the world, are neither necessary nor sufficient for completion to take place, might there be some other, more general principle that will allow us to predict when the visual system will amodally complete partially occluded volumes into whole volumes and when it will not? The proposed answer to this question for amodally completing surfaces and volumes is ‘complete mergeability’.
- June 29th