Views of Shape Space Order Isomorphism
The process of recognition is usually conceptualized as finding a match between a perceptual representation of a given stimulus and the representations (or traces) in memory of previously encountered stimuli. Thus, this notion of recognition is intimately connected to the notion of representation, “the most important concept ever evoked in explaining the mind” (p. 1).
The book starts off rather philosophically, from Edelman’s unshakeable belief in the veridicality of shape perception and his optimism regarding the plausibility of a particular formal theory of it. He quotes classic philosophers like Berkeley, Hume, Locke, Wittgenstein, … and situates his approach in the present-day cluster of philosophy of mind, philosophy of language, and theory of knowledge (e.g., Clark, Cummins, Dretske, Millikan, Putnam). Starting from the age-old problem of understanding what it means to see a cat on a mat, the book is explicitly written by a cognitive scientist, a non-philosopher, “to meet the philosophers (at least the more empirically minded of them) halfway down the road” (p. 2).
Chapter 1 (pp. 1–10) reviews the theoretical and practical difficulties with the attempt to base recognition on geometrically reconstructed representations of distal objects (like Marr’s 2.5-D sketch). Representation by reconstruction leads, first, to the homunculus problem and, second, has been shown to be practically impossible, even using the most sophisticated computer vision techniques. Any alternative approach must then state what representation is, if not reconstruction, and in what sense, if not geometric, it can be veridical. Edelman’s core principle, introduced in Chapter 2, is that representation is representation of similarities.
Chapter 2 (pp. 11–42) contains three rather different sections. Section 2.1 distinguishes four recognition tasks in which representations are needed: (1) identification (i.e., recognition of a previously seen view of an object), (2) generalization (i.e., recognition of an object despite a change in its appearance due to some transformation), (3) categorization (i.e., attribution of an object to a class of similar objects), and (4) analogy (i.e., drawing a parallel between transformations of distinct objects). Section 2.2 attempts to define and formalize the notion of representation. Borrowing from Shepard, Edelman distinguishes between two types of mappings: first-order isomorphism, between objects in the world and their corresponding internal representations, and second-order isomorphism, between the relations among several external objects and the relations among their corresponding internal representations. Because first-order isomorphism, or “representation by similarity”, leads to the problematic reconstructionist approach, Edelman proposes instead to pursue second-order isomorphism, which is formally just as adequate and which he calls “representation of similarity”. Section 2.3 reviews several problems with current computational theories of recognition, such as Biederman’s “Recognition by Components” (RBC) theory, Ullman’s alignment theory, and multidimensional feature spaces.
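The idea of second-order isomorphism can be made concrete with a small sketch. The code below is only an illustration under stated assumptions, not Edelman’s formalization: the distal “shapes” are made-up parameter vectors, the internal codes are invented, and agreement between the two sets of relations is scored with a plain Pearson correlation over pairwise distances (the function name `second_order_agreement` is hypothetical).

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every unordered pair, in a fixed pair order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def second_order_agreement(distal, internal):
    """Score how well relations among internal codes mirror relations
    among distal objects (1.0 means the distances are proportional)."""
    return pearson(pairwise_distances(distal), pairwise_distances(internal))


# Hypothetical distal shapes, each described by three made-up parameters.
distal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (3.0, 1.0, 0.0)]

# Invented internal codes: lower-dimensional, rotated, and rescaled, so no
# code "resembles" its object, yet the distances between codes are preserved
# up to a constant factor.
internal = [(2.0 * y, -2.0 * x) for x, y, _ in distal]

# The same codes assigned to the wrong objects: relations are no longer kept.
scrambled = [internal[2], internal[0], internal[3], internal[1]]

faithful_score = second_order_agreement(distal, internal)
scrambled_score = second_order_agreement(distal, scrambled)
```

In this toy case the first assignment preserves the similarity structure although the codes themselves look nothing like the objects, whereas the scrambled assignment does not.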
Chapter 3 (pp. 43–74) introduces the notion of a shape space, a formalism borrowed from mathematical statistics (e.g., Kendall). Shape space is a metric space in which each point corresponds to a particular shape and in which geometric similarity between shapes can be defined rigorously as proximity, a quantity inversely related to distance. Edelman argues that shape similarity can be made not to suffer from the objections commonly raised against a metric-space approach to similarity (e.g., arbitrary choice of features, context dependence, and asymmetries of comparisons). Although a large number of parameters is needed to fully represent distal shape space, fewer may be sufficient if, as in second-order isomorphism, only relations between objects must be represented (e.g., a blending coefficient when morphing between the shape of a cow and that of a pig). To ensure a proper mapping from distal to proximal shape space, a perceptual system must carry out many measurements (M) and then reduce the dimensionality of the resulting space (R), while getting rid of extrinsic variables such as pose and illumination. The perceptual mapping must satisfy the following requirements: (1) the mapping should not collapse any behaviorally relevant dimension of shape variation (i.e., regularity), (2) small changes in object geometry should be mapped onto small changes in R (i.e., smoothness), (3) R should be locally decomposable into view space and shape space components, and (4) R should be locally low-dimensional. By “locally”, Edelman means a small part of R that contains a class of intrinsically similar shapes.
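A minimal sketch may help to fix the ideas of distance and proximity in a shape space. It assumes planar shapes given as corresponding landmark points and uses the full Procrustes distance familiar from Kendall-style shape analysis, which ignores translation, scale, and rotation; whether this is the measure used in the book is not claimed here. The proximity formula 1/(1 + d), the function names, and the example shapes are illustrative choices only.

```python
import cmath
import math


def preshape(points):
    """Center a planar landmark configuration and scale it to unit size,
    representing each landmark as a complex number."""
    zs = [complex(x, y) for x, y in points]
    centroid = sum(zs) / len(zs)
    centered = [z - centroid for z in zs]
    size = math.sqrt(sum(abs(z) ** 2 for z in centered))
    return [z / size for z in centered]


def procrustes_distance(a, b):
    """Full Procrustes distance between two planar shapes with the same
    number of corresponding landmarks (0 means identical shape)."""
    za, zb = preshape(a), preshape(b)
    inner = sum(x.conjugate() * y for x, y in zip(za, zb))
    return math.sqrt(max(0.0, 1.0 - abs(inner) ** 2))


def proximity(a, b):
    """One arbitrary way to turn distance into proximity: 1 / (1 + d)."""
    return 1.0 / (1.0 + procrustes_distance(a, b))


def blend(a, b, t):
    """Morph landmark-wise from shape a (t = 0) to shape b (t = 1)."""
    return [((1 - t) * xa + t * xb, (1 - t) * ya + t * yb)
            for (xa, ya), (xb, yb) in zip(a, b)]


def transform(points, angle, scale, dx, dy):
    """Rotate, scale, and translate a configuration (a change of pose)."""
    rotation = cmath.rect(scale, angle)
    moved = [complex(x, y) * rotation + complex(dx, dy) for x, y in points]
    return [(z.real, z.imag) for z in moved]


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
rectangle = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]

# Same shape in a different pose: the distance should be (numerically) zero.
moved_square = transform(square, angle=0.7, scale=3.0, dx=5.0, dy=-2.0)
pose_distance = procrustes_distance(square, moved_square)

# A single blending coefficient moves the shape along a path between the two.
halfway = blend(square, rectangle, 0.5)
```

Here the single coefficient `t` plays the role of the blending parameter mentioned above: one number suffices to locate a shape on the path between two reference shapes, however many landmarks describe each of them.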
To realize this theoretically possible veridicality of representation in practice, one must find a way to identify the relevant low-dimensional structure within the high-dimensional measurement space. This is done in Chapter 4 (pp. 75–110) for the various recognition tasks distinguished earlier (identification, generalization, categorization, and analogy). The mechanisms that are needed to solve these tasks are based on “navigation” in a shape-space “landscape” with “landmarks”. A novel shape (or a novel view of a previously seen shape) can be localized in shape space based on its similarity to known objects (or stored views of known objects). A tuned unit (or module of units) that responds optimally to some shape (i.e., the landmark) and progressively less to progressively less similar shapes is enough to implement this mechanism. Edelman uses a radial basis function (RBF) approximation network that can be trained to generalize its response from a series of given views of an object to other views of that object. As a by-product of this learning, the network will also respond progressively less to progressively less similar objects, which is precisely what is needed for it to work as an active landmark in internal shape space. This scheme works not only for known objects (as in the case of identification and generalization) but also for novel objects (as in the case of categorization and analogy). The trick is that a new object must be represented by a vector of proximities to several reference shapes or landmarks. For example, even if one has never seen a giraffe before, it is possible to compute the proximities to known animals such as a camel, a goat, a pig, or even a leopard.
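The landmark mechanism can be sketched in a few lines. This is a simplified illustration, not the trained RBF network described in the book: each module merely averages Gaussian responses centered on its stored views (no weights are learned), the two-dimensional measurement vectors are made up, and the names `proximity_vector` and `single_winner` are hypothetical.

```python
import math


def tuned_unit(x, center, width=1.0):
    """Gaussian unit: responds maximally (1.0) at its center and
    progressively less to progressively more distant inputs."""
    return math.exp(-math.dist(x, center) ** 2 / (2.0 * width ** 2))


def module_response(x, stored_views, width=1.0):
    """Response of one landmark module: the mean activity of the units
    tuned to the stored views of its reference object."""
    return sum(tuned_unit(x, v, width) for v in stored_views) / len(stored_views)


def proximity_vector(x, modules, width=1.0):
    """Represent an input by its proximities to all reference objects."""
    return {name: module_response(x, views, width)
            for name, views in modules.items()}


def single_winner(x, modules, width=1.0):
    """For contrast: keep only the name of the most active module."""
    activities = proximity_vector(x, modules, width)
    return max(activities, key=activities.get)


# Made-up measurement vectors: two stored "views" per reference animal.
modules = {
    "camel": [(4.0, 3.0), (4.2, 3.1)],
    "goat": [(2.0, 1.0), (2.1, 1.2)],
    "pig": [(1.0, 0.5), (1.1, 0.4)],
    "leopard": [(3.0, 0.5), (3.2, 0.6)],
}

# A never-before-seen animal is still assigned a graded code.
giraffe = (4.5, 2.5)
giraffe_code = proximity_vector(giraffe, modules, width=1.5)
nearest_landmark = single_winner(giraffe, modules, width=1.5)
```

The single-winner readout reports only the nearest landmark, whereas the full proximity vector retains graded information about all reference shapes, which is what allows a novel object to be located relative to several known ones.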
Edelman calls this system a “Chorus of Prototypes” to stress that the prototypes act together in representing shape, in contrast to a Winner-Take-All scheme such as Selfridge’s Pandemonium. In Chapter 5 (pp. 111–143) Edelman provides simulation results of his implemented scheme (called “SiC”, for the combination of second-order isomorphism with a Chorus of Prototypes). He uses a simple system composed of ten prototype modules, each trained on a different reference object (e.g., cow, cat, robot, fly, tuna, Land Rover, F-16) and then tested on different recognition tasks for a database of 43 additional objects (e.g., another cow, an ox, a calf, a buffalo, and other quadrupeds, along with other fishes, other cars, other fighter aircraft, etc.). The simulation results with SiC are encouraging, especially in light of the wide variety of tasks: identification of novel views of familiar objects, categorization of novel object views, discrimination among views of novel objects, local viewpoint invariance for novel objects, recovery of a standard view and of pose for novel objects, prediction of a novel view for a novel object, etc. None of its theoretical competitors (pictorial alignment, structural matching, or feature spaces) can deal with this variety of tasks.
- August 6th