In Depth Object Views
As we move through the physical world, we are constantly presented with different 3D views of the objects within our environment. Indeed, both observer movement and object movement can cause the visible parts of an object to become occluded and previously hidden parts to come into view. As a result, we frequently observe different object views with different part configurations. How does the visual system integrate different object views so that we can recognize the same object across changes in viewpoint?
Traditional models of object recognition addressing this question seem to cluster into two general categories: structural-description and image-based approaches. Structural-description models suggest that objects are represented based on the arrangement of their parts independent of the observer’s viewpoint. Image-based models propose that objects are represented as sets of multiple 2D views.
More specifically, structural-description and image-based models offer different accounts of how unfamiliar views of an object rotating in depth are recognized. Structural-description models suggest that we can recognize unfamiliar views if they have the same part configuration as familiar views. For example, the “geon structural descriptions” (GSDs) model suggests that objects are represented by viewpoint-invariant volumetric primitives, known as geons. Viewpoint invariance can be achieved when objects can be decomposed into a distinct configuration of 3D parts that does not change across the rotations in question. When these conditions are satisfied, object naming and matching can be performed in a viewpoint-invariant manner.
Image-based models suggest that unfamiliar views can be recognized by extrapolating from “known” object views or by interpolating between “known” object views. Psychophysical studies suggest that both humans and monkeys can interpolate between familiar object views and recognize novel views falling within limited generalization fields that span up to approximately 45° from “known” orientations. Consistent with these behavioral generalization fields, many inferotemporal (IT) neurons seem to respond selectively to the training orientation of an object and more broadly to neighboring orientations.
There is evidence that this restricted generalization can be expanded with frequent exposure to multiple object views, especially when these views are qualitatively similar. Moreover, interpolation between “known” views can be facilitated when multiple object views are linked by temporal sequence. As objects rotate in depth, we perceive multiple views in a temporal sequence.
Several studies suggest that temporal contiguity is important for linking multiple object views. For example, viewing structured sequences of briefly presented views of a 3D object facilitates object naming more than viewing random sequences of views. When subjects are presented with three different sequences of faces, each consisting of five different faces shown in different poses, they have greater difficulty discriminating between faces from the same sequence than between faces from different sequences. Also, viewpoint-invariant performance has been shown for rotating familiar and novel objects in a short-term recognition test when the study and test views share the same parts; however, viewpoint-dependent performance has been observed for the same objects in a long-term recognition test. Finally, orientation priming is observed across blocks when the prime and target objects are from the same visually homogeneous class, but not when they are from different categories. These results suggest that viewpoint invariance can be achieved in memory for multiple temporally contiguous views of an object.
Furthermore, neurophysiological studies provide evidence for associative mechanisms based on image similarity or temporal contiguity. Cells in the inferotemporal cortex of monkeys become tuned to a small number of dissimilar visual patterns (colored fractal patterns) when these patterns are sequentially paired over many trials in a paired-associate task or when monkeys perform a standard or a delayed matching-to-sample task. These studies suggest that viewpoint-invariant performance can be observed when viewpoint-dependent object representations are temporally associated.
All of the above studies investigated object recognition across changes in the orientation of an otherwise static object. However, outside of the laboratory, changes in object orientation occur when objects and observers move. As a result, we are presented with continuous sequences of multiple object views that are highly similar to each other and close in space and time. Can the visual system take advantage of motion and integrate different views of 3D objects rather than associating static snapshots based solely on their similarity and temporal contiguity?
Previous research suggests that motion can link 2D object views and thereby facilitate viewpoint invariance within the object’s path of motion. The following experiments investigated whether motion could similarly link 3D object views, integrating them more readily than temporal sequence alone, and lead to the immediate construction of viewpoint-invariant representations of 3D objects rotating in depth. Can such a motion-based linkage occur even when depth rotations produce different visible part configurations? We were especially interested in how novel object views might be represented. Priming of novel views falling within the object’s motion path would suggest the existence of a viewpoint-invariant object representation across depth rotation. In Experiment 1, we asked whether priming would occur for novel views of 3D objects when the prime views were linked by motion. In Experiment 2, we asked whether priming would occur for novel views of rotating 3D objects when the prime views had different part configurations. Experiment 3 examined whether priming would occur for novel object views when the prime views were spatiotemporally separated but were perceived as moving when an occluder was placed between them. Finally, in Experiment 4, we investigated priming for familiar objects across depth rotations.
Numerous psychophysical studies have suggested that objects rotating in depth are represented in a viewpoint-dependent manner: observers show higher error rates and longer reaction times when identifying objects from novel views than from “known” views. For example, depth rotation of familiar objects has been shown to slow naming as the rotation moves an object away from its canonical view. Performance in naming line drawings of familiar objects rotated in depth is best for canonical views and for views sharing similar image structures. Also, accidental foreshortened views of line drawings are more difficult to identify than non-foreshortened views of the same objects. Consistent with these object identification studies, naming photographs of familiar objects in a priming paradigm yields less priming when the same objects are presented a second time in a different view. This decrease in priming is larger when objects are studied in familiar views but tested in unfamiliar ones.
Similar performance decrements with changes in viewpoint have been reported for unfamiliar objects rotated in depth, such as wire-form objects and photographs of clay objects. Also, recognition time for 3D objects rotated in depth has been shown to increase with the angle of rotation.
- July 3rd