Object Constancy and the Visual System


Visual object constancy is the ability to determine the identity of an object from its variable image. The image varies considerably following plane and depth rotations of the object, due to viewing the object from different positions. Our visual system is said to achieve object constancy because it can usually cope with such variation, by accurately recognising a range of retinal images as depicting the same object. For example, most people would recognise the left four pictures in Fig. 1 as depicting an iron, yet the pictures differ in size, outline shape and in the presence of features and parts.

Fig. 1. Eight different views of an iron. On the left are line drawings, on the right are matched silhouettes. The iron rotates in depth through 0° (top), 30°, 60° and 90° (bottom) views. The 0° views fully reveal the main axis of elongation of the object. The 90° views are so foreshortened that the main axis of elongation of the image no longer coincides with the main axis of elongation of the object, but instead is the vertical axis. The 60° view was rated as the most canonical, typical view of the iron. On the left are the RTs to verify the identity of the line drawings in a speeded word-picture verification task; on the right are the analogous RTs for silhouettes, from Experiment 2 of Lawson & Humphreys (1999).

Although we can achieve object constancy, we are sensitive to image variation and we can accurately report the size and orientation of an object. In addition, speed of object recognition is influenced by the familiarity of a given view, its similarity to views of other objects from which it must be discriminated, and its “goodness” – how well it depicts the object. An aerial view of a house may be recognised accurately, but recognition would usually be slow compared to a street-level view. This is because aerial views of houses are uncommon, similar to aerial views of churches and barns, and “poor” (the view hides the 3D structure of the house and many of its distinctive features). Finally, as it is ecologically important to achieve object constancy efficiently, the visual system has presumably been driven to optimise it, and we may be unaware of the true processing costs involved.

In addition to achieving object constancy, the visual system must discriminate between stimuli which differ in semantically important ways. Improving the achievement of object constancy will often impair discrimination, so these two functions conflict and the visual system must reach an appropriate compromise between them. If the visual system ignores much image variation, it will be easy to achieve object constancy (the difference in size and shape between dachshunds and Alsatians can be ignored, so that both are recognised as dogs), but it will then be harder to discriminate between different objects (wolves and Alsatians, which are visually similar but have different semantic properties). Conversely, if the visual system is sensitive to most image variation, discrimination will be easy but object constancy will be harder to achieve.
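
The compromise described above can be illustrated with a toy sketch. It is not a model from the literature: the two-dimensional feature vectors, the "dog" prototype and the tolerance values are all made-up assumptions, chosen only to show how a single tolerance setting trades object constancy against discrimination.

```python
import math

# Hypothetical feature vectors (e.g. relative size, snout length).
# All values are invented for illustration only.
DOG_PROTOTYPE = (0.5, 0.5)
EXEMPLARS = {
    "dachshund": (0.2, 0.5),  # a dog, visually unlike an Alsatian
    "alsatian": (0.8, 0.6),   # a dog, visually like a wolf
    "wolf": (0.9, 0.7),       # not a dog
}


def accepted_as_dog(features, tolerance):
    """Accept an exemplar as a dog if it lies within `tolerance` of the prototype."""
    return math.dist(features, DOG_PROTOTYPE) <= tolerance


def classify_all(tolerance):
    """Report, for each exemplar, whether it is accepted as a dog."""
    return {name: accepted_as_dog(f, tolerance) for name, f in EXEMPLARS.items()}


if __name__ == "__main__":
    for tolerance in (0.1, 0.35, 0.5):
        print(tolerance, classify_all(tolerance))
```

With a small tolerance neither dog is accepted (constancy fails); with a large tolerance the wolf is accepted too (discrimination fails). In this toy set an intermediate tolerance separates the cases, but only because the invented values happen to allow it.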

The achievement of object constancy can thus only be examined in relation to the difficulty of the discrimination task required of subjects. If we only have to distinguish a red cube from a metal cheese-grater and a yellow pool of paint, then simple surface, texture or shape information will accurately identify all three objects, irrespective of the view presented. The achievement of object constancy will be trivial. In contrast, under everyday viewing conditions, there are usually many objects which could be present in any situation, yet we rarely misidentify objects, even if the object is unlikely in that situation (a frog in our bedroom). Discrimination under these conditions is difficult, yet it is both rapid and accurate, so achieving object constancy must also be hard. The difficulty of discrimination depends not just on the number of objects to be distinguished, but also on their similarity. Animals may be harder to recognise than manmade artefacts because, as a category, animals are more visually similar to each other than are artefacts.

Current theories of human visual object recognition acknowledge the importance of accounting for our ability to achieve object constancy and our ability to discriminate between different classes of objects. However, there is little consensus between these accounts about the representations and processes which are involved.

There are three classes of account of the achievement of object constancy. Invariant features accounts suggest that invariant features can be used to distinguish most views of one object from most views of all other objects. A feature is only unique to an object in the context of the set of objects from which that object is to be distinguished, i.e. given a particular discrimination task. Multiple view accounts suggest that the visual system stores several representations of each object, and that a given view is matched to the nearest view-specific, stored representation. Transformation accounts propose that the retinal image can be transformed to reduce differences between the image and a view-specific, stored representation.
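
The three classes of account can be contrasted in a deliberately simplified sketch. Everything here is an illustrative assumption rather than a published model: objects are reduced to sets of named features, views are reduced to a single rotation angle in degrees, and recognition "cost" is taken to be the angular distance that must be bridged. The function names are hypothetical.

```python
def angular_distance(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def invariant_feature_match(image_features, candidates):
    """Invariant features account (toy version).

    A feature is diagnostic for an object only relative to the other
    candidates in the current discrimination task. Return the objects
    that have a diagnostic feature visible in the image.
    """
    matches = []
    for name, feats in candidates.items():
        others = set().union(*(f for n, f in candidates.items() if n != name))
        diagnostic = feats - others
        if diagnostic & image_features:
            matches.append(name)
    return matches


def multiple_view_cost(image_angle, stored_angles):
    """Multiple view account: match to the nearest stored view."""
    return min(angular_distance(image_angle, s) for s in stored_angles)


def transformation_cost(image_angle, stored_angle):
    """Transformation account: rotate the image onto one stored view."""
    return angular_distance(image_angle, stored_angle)


if __name__ == "__main__":
    candidates = {
        "iron": {"handle", "flat base"},
        "kettle": {"handle", "spout"},
    }
    print(invariant_feature_match({"handle", "spout"}, candidates))
    print(multiple_view_cost(80, [0, 60]))
    print(transformation_cost(80, 60))
```

In this sketch a handle alone identifies neither object, because both candidates have one; a feature becomes useful only when it is unique within the set being discriminated. The two view-based cost functions also return the same value whenever the nearest stored view is the one the image is transformed to.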

Although representation (multiple view) and process (transformation) accounts differ conceptually, they are hard to distinguish empirically. Representations and processes cannot be examined independently of each other. A pattern of performance resulting from a certain representation being stored can be exactly replicated by specifying a particular process. Any well-specified theory of object recognition must describe both the representations stored by the visual system and the processes employed to access those representations. Furthermore, as more interactive models of human information processing become popular (e.g., neural network models), the distinction between representation and process is likely to break down. I have, however, discussed representation and process accounts separately since most theoreticians distinguish between the two. It is worth noting here that there are clear, object-specific effects in recognition. View-specific effects in priming studies are tied to subjects' experience with particular objects. Object-specificity is typically associated with access to different representations rather than the use of different processes.

Invariant features, multiple view and transformation accounts are not incompatible, and it is likely that the visual system employs them all to some degree to achieve object constancy. For example, Jolicoeur (1990) proposed an account of the effects of plane rotation on picture recognition which included all three classes of account. Jolicoeur suggested that plane-rotated views of objects are often transformed to match a stored, upright view (transformation account), although plane-disoriented views of objects may also be stored, allowing direct matching of plane-rotated views (multiple view account). In addition, Jolicoeur (1990) suggested that a functionally distinct, feature-based route was also available which enabled pictures to be recognised by matching simple attributes such as colour, texture or shape, many of which are orientation-invariant (invariant features account). This latter route is most useful if distinctive objects are presented, or if a small set of stimuli is presented repeatedly, enabling subjects to learn which features are invariant. Note that subjects do not always use invariant information when it is available, and they may need explicit encouragement or training to take advantage of it.