The Visual Perception of Human

Human observers are particularly sensitive to human movement. For example, adults can rapidly perceive a human form in a display of discrete elements (commonly referred to as “point-lights” and illustrated in Fig. 1a) moving as if attached to the major joints of an otherwise invisible person. Though no explicit contours, textures, or colors indicate the presence of a human form, the visual system is able to extract the structure of a human body from the motion of these elements within a fraction of a second.

Fig. 1. Static illustrations of point-light displays of a walking human figure. (A) An upright figure in which all of the major joints, or points of articulation, of the human form are demarcated. In the actual displays, the gray outline of a person does not appear. It is included here to illustrate the structure of the figure that quickly becomes apparent when the elements move. (B) The same form rotated to an inverted orientation. Though structurally identical to the upright figure, inverted displays are rarely identified as depicting a human form.

Since Johansson’s introduction of point-light displays into contemporary perceptual psychology over two decades ago, numerous researchers have attempted to extend general models of the visual perception of structure from motion to account for the perception of human movement from motion-carried information. Under such general accounts, the same visual processes are used to extract the structure of any object. All objects, and all object parts, are thus perceptually equivalent. Indeed, models based on hierarchical vector analysis or assumptions of pairwise rigidity among elements have had some success in accounting for the perception of the complex, jointed structure of a human body.

Nonetheless, these general structural accounts have not been able to mimic either the robustness or the limitations of human performance in the perception of point-light walkers. Adult observers fail to accurately detect, organize, or identify these displays when the figure is presented upside-down. Both upright and inverted displays of human locomotion possess the same hierarchical structure, the same rigid relationships, and the same oscillatory motion trajectories. Under accounts in which perception relies entirely on such properties, the visual analysis of upright and inverted displays should be identical. Human performance with the two orientations, however, is not. Models that posit only general analyses and organizing heuristics are therefore insufficient to account for the orientation specificity of this perception.
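The claim that inversion preserves the rigid relationships among elements can be checked numerically. The following is a minimal sketch, not the stimulus code used in the experiments: the joint names and coordinates in `frame` are hypothetical, and inversion is assumed to be a 180-degree rotation in the image plane.

```python
import math
from itertools import combinations

# Hypothetical 2-D joint positions (x, y) for a single frame.
# The values are illustrative only and are not taken from the study.
frame = {
    "head": (0.0, 1.7),
    "shoulder": (0.0, 1.4),
    "hip": (0.0, 0.9),
    "knee": (0.1, 0.5),
    "ankle": (0.0, 0.1),
}

def invert(points):
    """Rotate the figure 180 degrees in the image plane about the origin."""
    return {name: (-x, -y) for name, (x, y) in points.items()}

def pairwise_distances(points):
    """Distance between every pair of joints, keyed by the pair of names."""
    return {
        (a, b): math.dist(points[a], points[b])
        for a, b in combinations(sorted(points), 2)
    }

upright = pairwise_distances(frame)
inverted = pairwise_distances(invert(frame))

# True when every inter-joint distance is unchanged by inversion.
same_structure = all(math.isclose(upright[k], inverted[k]) for k in upright)
```

Because every inter-joint distance is unchanged, a model that relies only on pairwise rigidity receives the same input from both orientations and cannot, by itself, predict the inversion effect.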

Increasing evidence suggests that the perceptual analyses underlying the perception of human movement may differ from other motion analyses. The visual perception of human movement may benefit from a convergence of motion and form processes that does not occur during other perceptual analyses. Other studies suggest that the visual analysis of human movement differs from other analyses because it depends upon activity of the motor system. To the extent that either or both of these approaches are correct, one can conclude that the recognition of a human actor executing some movement may very well involve a special process that taps domain-specific information about human motor activity.

If such a mechanism exists, can it be best described as a global or local process? Local processes are thought to occur in lower levels of the visual system and to be restricted to brief temporal intervals and small spatial neighborhoods. The results of these “local” analyses are then passed on to and processed by higher-level or more “global” mechanisms that process information across larger spatio-temporal extents. While local and global are difficult to define as absolute terms, most studies of the visual perception of human movement have defined local analyses as the computations conducted on individual points (joints) or point pairs (limbs). Global analyses are conducted over larger areas and generally involve an entire point-light walker. For a discussion of local and global factors in visual completion, see Tse (1999) and Van Lier (1999).

Evidence supporting the global analysis of human movement comes from studies demonstrating that observers are able to extract human structure from displays in which visual noise renders local motion information organizationally ambiguous. This suggests that the visual system can exploit configural or global information in the absence of unambiguous local motion cues. Evidence supporting the hypothesis that the visual analysis of human movement relies on local processes comes from a series of apparent motion experiments in which subjects viewed an animated walker within a mask. Because subjects could discriminate leftward- from rightward-facing walkers only under short-range apparent motion conditions, their perception of the walkers’ movements appears to have depended upon local motion analyses operating within small temporal windows. More recent research has suggested that neither local nor global processes alone can account for the visual perception of human locomotion. Instead, visual processes at both high and low levels of the visual system appear to make important contributions to the visual perception of human movement. This conclusion suggests that a new approach to the study of the visual perception of human movement may be warranted.

Instead of focusing on the local/global debate, the goal of the current series of experiments was to determine the minimal stimulus information necessary for the visual perception of human movement. In this way we hoped to identify the information that normally triggers processing by the mechanisms thought to underlie the visual perception of human locomotion.

Our approach is based on the assumption that the perception of human movement involves the integration of form and motion information. Neither form nor motion information alone defines the category of human movement. As Johansson’s earliest demonstrations show, observers do not perceive human form in any single static frame of the animated figure even though the configuration of the elements is consistent with a human figure. Conversely, motion alone does not convey human form and movement. When the elements comprising a point-light display are spatially scrambled so that the configuration of the elements is no longer consistent with a human figure, the impression of human or animal form is greatly diminished. Thus, the visual system’s capacity to extract the human form from motion relies on the integration of both form and motion cues.
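The spatial scrambling manipulation described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names, the `spread` parameter, and the sample trajectories are hypothetical, and the offsets are drawn uniformly only for simplicity. Each element keeps its own motion while the configuration among elements is broken.

```python
import random

def scramble(trajectories, spread=1.0, seed=0):
    """Shift each element's whole trajectory by its own random offset.

    Every point keeps its frame-to-frame motion, but the spatial
    configuration among points no longer matches a body.
    """
    rng = random.Random(seed)
    scrambled = {}
    for name, path in trajectories.items():
        dx = rng.uniform(-spread, spread)
        dy = rng.uniform(-spread, spread)
        scrambled[name] = [(x + dx, y + dy) for x, y in path]
    return scrambled

def displacements(path):
    """Frame-to-frame motion vectors of a single element."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(path, path[1:])]

# Hypothetical three-frame trajectories for two elements (illustrative values).
walker = {
    "knee": [(0.10, 0.50), (0.15, 0.52), (0.20, 0.50)],
    "ankle": [(0.00, 0.10), (0.10, 0.12), (0.20, 0.10)],
}
scrambled = scramble(walker)
```

Comparing `displacements(walker[name])` with `displacements(scrambled[name])` shows that the motion of each element is unchanged up to floating-point error. Only the relative positions differ, which is why such displays isolate motion information from configural form.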

In order to create varying exemplars of human locomotion, we manipulated two perceptual components that play a fundamental role in the production of human walking: dynamic symmetry among the limbs and the principal axis of organization. Dynamic symmetry refers to the equal and opposite motions of adjacent limbs (either contralaterally or ipsilaterally). During human locomotion, when one limb moves forward, its neighboring limbs move backward, anti-phase to the first limb. This anti-phase patterning of limb movements is an invariant of human gait. In the human body, the principal axis of organization refers to the primary structure about which the limbs are organized, namely the torso. Principal axes play an important role in the recognition of objects generally. The structure of an object’s principal axis also appears to distinguish between classes of animals. Thus, the principal axis of organization and the dynamic symmetry of the limb movements are features that, a priori, are likely to contribute to the perception of human form and movement.
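Dynamic symmetry can be illustrated with a deliberately simplified model. The sketch below assumes that each limb's swing angle follows a sinusoid, which is an idealization introduced here for illustration and not the authors' stimulus model. The amplitude, period, and function names are hypothetical.

```python
import math

def limb_angle(t, amplitude=30.0, period=1.0, phase=0.0):
    """Swing angle (degrees) of a limb modeled as a simple sinusoid."""
    return amplitude * math.sin(2 * math.pi * t / period + phase)

def antiphase_pair(t):
    """Angles of two neighboring limbs that are half a cycle apart."""
    return limb_angle(t), limb_angle(t, phase=math.pi)

# Sample one full gait cycle at 20 evenly spaced times.
times = [i / 20 for i in range(20)]

# Equal and opposite: the two angles cancel at every sampled time.
is_symmetric = all(abs(a + b) < 1e-9 for a, b in map(antiphase_pair, times))
```

In this idealization, a phase offset of half a cycle makes the two limbs' angles sum to zero throughout the gait cycle, which is one way to express "equal and opposite" motion. Changing the `phase` value away from `math.pi` gives a simple way to break the symmetry.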