The Threshold of Representation in Classical and Connectionist Models

Ronald de Sousa

Department of Philosophy
University of Toronto


From International Studies in Philosophy of Science, Vol. 5, pp. 1-15 (1991)
Reproduced here by kind permission of the Editor

The word representation, as it has come to be used in cognitive science, allows for a puzzling range of opinions about when it is applicable. Some AI workers are content to assert blandly that thermostats have beliefs, albeit very simple and primitive ones, and so are capable of embodying representations. At the opposite extreme, John Searle has claimed that no computer, however sophisticated, will ever really be capable of genuine intentional representation, even if it can pass the Turing test. But the rejection of the Turing test suggests the further question: if representation can't ever get into a computer, how does it get into the brain?

In this paper, my central aim is to find a principled criterion, along lines that make biological sense, for deciding just when it becomes theoretically plausible to ascribe to some process or state a representational role. Some representations intuitively involve a "mind to world" direction of fit: they aim at knowledge or true belief. Other representations involve "world to mind" direction of fit.1 These are typically rules or instructions that the system is supposed to follow in order to reach a usefully different state. Although the problem of what counts as genuine representation arises for both types, my focus in this paper will be on representations in the context of a mind to world direction of fit (though in that phrase 'world' must be taken to include mind). I shall be particularly concerned with how representation is to be understood in relation to a certain conception of connectionist architecture, namely that recently defended by Paul Smolensky (1989). At the end of the paper, I shall offer some speculations about the consequences that accepting such a connectionist architecture might have for the level of understanding we might legitimately hope for in cognitive science.

Seeing, Cooking, Riding

A first approach to the problem I have in mind can begin with David Marr's beautiful work on vision. This work is based on the assumption that vision is a complex computational task, in which information at each of several levels is processed in such a way as to yield a differently organized level of information. At the first level, for example, the array of light intensities on adjoining pixels is first blurred, to a larger or smaller extent, and the blurred image is then massaged to detect edges. The process of edge detection essentially involves taking the second derivative of the gradient of brightness along the two dimensions of the projected image. The first derivative measures the slope of the gradient of intensity, while the second identifies zero crossings, that is, the loci of points at which the gradient of intensity turns from positive to negative or vice versa (Marr, 1982).
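The blur-then-differentiate procedure just described can be illustrated with a minimal sketch. It rests on simplifying assumptions that are not Marr's: a single one-dimensional row of grey levels stands in for the two-dimensional image, and the function names, the kernel width, and the sample row are all hypothetical choices made for illustration.

```python
import math

def gaussian_kernel(sigma, radius):
    """Discrete, normalized Gaussian weights used for blurring."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(signal, kernel):
    """Convolve with the kernel, repeating the end samples at the borders."""
    radius = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)
            acc += w * signal[j]
        out.append(acc)
    return out

def second_difference(signal):
    """Discrete second derivative: s[i-1] - 2*s[i] + s[i+1]."""
    return [signal[i - 1] - 2.0 * signal[i] + signal[i + 1]
            for i in range(1, len(signal) - 1)]

def zero_crossings(values, eps=1e-9):
    """Indices i where values[i] and values[i+1] have opposite signs."""
    return [i for i in range(len(values) - 1)
            if (values[i] > eps and values[i + 1] < -eps)
            or (values[i] < -eps and values[i + 1] > eps)]

# Hypothetical grey-level row: a dark region followed by a bright one.
row = [0.0] * 10 + [1.0] * 10
smoothed = blur(row, gaussian_kernel(sigma=1.5, radius=4))
curvature = second_difference(smoothed)
# curvature[i] belongs to row position i + 1, hence the shift.
edges = [i + 1 for i in zero_crossings(curvature)]
# edges == [9]: the sign change falls between samples 9 and 10,
# which is where the brightness step was placed.
```

Nothing in the sketch contains a formula for "edge"; the edge location falls out of arithmetic on neighbouring samples.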

All this is easy to program, providing we know how to feed into the machine both the relevant formulae and the data relating to the grey level image. But what is the relation between what is being thus modeled on the computer and the phenomenon it models? The question arises, in particular, in what sense we must suppose that the eye -- and the computer in which it is modeled -- is equipped with the formula that we have programmed into the computer. Must we suppose that the eye knows calculus?2

There are two paradigms at opposite poles. The interesting cases lie somewhere in between. At one end, think of a falling stone; at the other, think of a person cooking, who is explicitly appealing to a recipe that has been memorized. The first illustrates that merely conforming to a physical law is not knowing it: if it were, then stones would have to know calculus too, in order to be so flawlessly computing their velocity, then executing it at every instant of free fall. On the other hand, the ability to formulate a rule, together with the ability to give a justification for it and to apply it at will, is uncontroversially sufficient for an attribution of knowledge, intentionality, and representation.

Here are some problematic intermediate cases:

Now take the case of walking: When we walk, we may assume, whatever module controls our behavior has been refined by evolution to do at least the following: (i) transform the raw sensory input into relevant information about the environment; (ii) process that information along with proprioceptive information about, for example, the postures of the body, in such a way as to (iii) produce output consistent with the relevant kinematic laws. Must we then infer that when we walk, we exhibit knowledge of the laws of kinematics? Similarly, to ride a bicycle properly apparently involves conforming to the following rule: Turn the handle bars so that the curvature of your trajectory is proportional to the angle of your unbalance divided by the square of your speed (Johnson-Laird, 1988, p. 195). Are we then, when we walk or ride a bicycle, more like a falling stone, or are we more like a soap bubble computer, or are we really just like someone who cooks by following a recipe, except that we are not conscious of the recipe? Must we assume that in order for the brain to perform such tasks, some sort of representation is involved? If so, what counts as a representation, and in what sense is it present in the brain?

We obviously don't want to say, simply, yes: that we have knowledge of the laws of kinematics, or that the soap bubble knows calculus, or the eye has knowledge of calculus. To say that would seem almost a kind of joke, in the same way as we might remark on someone's impeccable mastery of the laws of gravity when he (involuntarily) falls over.

But why not? Involuntarily falling over is not behavior, and it seems intuitively a requirement on the attribution of knowledge on the basis of behavior that it really be an instance of behavior which the knowledge is supposed to explain. That common-sense observation, as we shall see, lies close to the right answer to my central question.

When we compare the computer's "knowledge" of explicit rules with that of the cook, the suspicion may arise that the two are no more than homonymously related. I shall shortly ask what exactly is involved in saying that a machine is applying an explicit rule. But first, a preliminary question must be addressed about the relation between computation and representation.

Computation and Representation

The theory of mind inspired by the classical model of AI is frequently referred to as computationalist. The force of this term is this: mental activity is conceived on the model of a sequence of rule-governed manipulations of certain units of fixed types, according to certain syntactic rules. The power of a universal Turing machine to compute anything computable and to imitate any other machine capable of carrying out an effective procedure lends promise to this thesis, serving as a guarantor of the power of computationalist models. But such power would be empty unless the possibility existed of interpreting the syntactic counters in question -- unless, that is, they could be viewed not merely as syntactic counters but also as representations. Should we then say that representation and computation are equivalent conditions?

An objection to this might be that while syntactic manipulation -- pure computation -- would indeed be pointless without representation, the latter might exist without syntax. Is not a photograph, for example, a kind of representation without syntax? Perhaps analog representations in general don't really have a syntax. For that matter, Douglas Hofstadter has suggested that thinking itself (which presumably involves representation) occurs at a level completely different from computation:

Of course, in any computer-based realization of genuine cognition, there will have to be, at some level of description, programs that shove formal tokens around, but it's only agglomerations of such tokens en masse that, above some unclear threshold of collectivity and cooperativity, achieve the status of genuine representation. (Hofstadter, 1985, p. 649)

This seems to me a non-sequitur. There is no particular reason why there should not be computation at different levels, including the level of explicit and conscious thought. It is clearly true, however, that thought is not necessary for computation.

What Hofstadter means, I surmise, is that real thought actually doesn't consist in the strictly formal manipulation of symbols. This is an issue I shall come to presently. In any case, I shall take for granted in the following pages that while all computation would have to involve representation, the converse may not hold.

Let us look, then, at some putative requirements for the existence of representation.

Explicitness: A manual in the head?

First, does every rule that is represented have to be explicit? The distinction drawn a couple of pages ago between those systems that involve explicit rule following and those that don't can apply within a system as well as between systems. In fact, as Lewis Carroll showed in his fable of Achilles and the Tortoise (Carroll, n.d.), it is impossible for a system to convert every rule to which it conforms into an explicit premise.

Achilles, you will recall, presents the Tortoise with an instance of Modus Ponens. The Tortoise professes to accept the premises,

(1) p, and
(2) if p then q
but thinks it insufficient to draw the conclusion q. Achilles protests that if the Tortoise accepts (1) and (2), he must accept q. But if that is indeed the reason that the Tortoise should infer q, it seems reasonable to demand that this reason in turn be made explicit as a premise: (3) if p and (if p then q) then q. But, of course, once that first demand is granted, there is nothing to stop a similar demand being reiterated at every step, and the prospects of persuading the Tortoise grow more and more remote, as the inference becomes ever less perspicuous.

In every computational engine, that fact must of necessity be taken into account. The actual execution of any program rests on what Pylyshyn terms a functional architecture, that is, a repertoire of "basic computational resources" (Pylyshyn, 1985, p. 259), components of the semantic engine that just work in a certain way not because they have been programmed to do so but merely because of the laws of nature to which they are subject.

Robert Cummins has clarified this distinction in the following terms. We need to distinguish, he suggests, between an analysis of a process and a cause of the process's actually occurring. The analysis of the process will describe what occurs; but the description won't necessarily have any causal power on its own. By contrast, the instantiation of a process (such as the soap bubble computer instantiating a solution to a certain least path problem) needn't contain any analytical description of that process. Now call the mere instantiation by a system S of a process P, "E-representation."

When S executes P, events take the course they do because S is structured in a certain way, but S is not structured in that way because P is E-represented in S, for to say that P is E-represented in S is just another way of saying that S has the structure in question. (Cummins, 1983, p. 46)

In that situation, an E-represented program cannot be thought of as an internal manual that guides S through the steps in the causal sequence leading to output. It is simply an indirect (i.e. an interpretive) specification of that causal sequence.... (A) program may be executed by S even though S does not represent it: execution of P by S guarantees representation of P in S iff P is physically instantiated in S.... A cook may therefore execute a recipe that is not represented in the cook or anywhere else, and this need not undermine the explanatory point of appealing to such a recipe. (Cummins, 1983, pp. 46-7)

E-representation is a necessary condition of all other sorts of rule representation, in the sense that no system capable of rule following of any kind could exist without it. But that leaves it open whether it is to count as genuine representation. On one side, E-representation may seem to be "representation" only in the most Pickwickian sense: if a rule is really represented, we might want to say, it must be represented as a recipe is represented when it is explicitly referred to and used. A true representation must be not merely an analysis, but a causal factor in the production of the process which it governs.

On the other side, that intuition doesn't really take us very far towards a criterion of demarcation. For some E-representations are worse candidates than others. Consider this example from Hofstadter: "When a computer's operating system begins thrashing... at around 35 users, do you go find the systems programmer and say, 'Hey, go raise the thrashing-number in memory from 35 to 60 okay?' No, you don't." (Hofstadter, 1985, p. 642) Now the "thrashing number" is represented in the machine in some sense: it is "embodied," we might say, in the hardware, and could be changed by modifying the configuration, adding memory, or whatnot. That is enough for E-representation; but surely any claim to genuine representation in that case is worse than that of the soap bubble computer. Some other cases of E-representation, by contrast, might be genuine representation after all. We need to look for further relevant conditions.

Besides explicitness, which I have just rejected, three other conditions might plausibly be imposed on what is to count as genuine representation: consciousness, syntactic and semantic complexity, and digitality.


Consciousness

Few people now take consciousness seriously as a condition of genuine intentionality. But many have -- such as Descartes -- and some -- such as John Searle (1980) -- still do. Much of what is most characteristically described as thinking, however, goes on without consciousness. Some sort of problem solving is involved in many responses to environmental stimuli, ranging from relatively complex tasks such as maneuvering a car through traffic to reflex adaptations to particular circumstances, such as the eye's capacity to orient its focus in response to detection of motion in the peripheral visual field. Sometimes the sorts of rules that govern our behavior, even when in some sense they seem to be the sort of thing that might be made conscious -- such as the bicycle riding rule -- are clearly unusable in that form. And even many of our more usable rules -- the rules of probability, for example -- are in fact much easier to learn to apply mechanically than they are to assimilate to the point of being used "naturally." Perhaps the rules of probability are in some sense "represented," or rather implemented, in us, but not in a way that we can, as it were, make contact with by learning them or bringing them to consciousness (see Kahneman and Tversky, 1982). Much the same is true of the "rules" of grammar and phonology that underlie our ability to produce and understand speech. Most conclusive in this regard are the experiments of Lackner and Garrett (1973) on dichotic listening, in which the input to the unattended channel, which is quite inaccessible to consciousness, nevertheless contributes to the disambiguation of the input to the "conscious" channel. In short, we have come to accept as a commonplace Lashley's dictum that "no mental activity is ever conscious." Certainly, then, some mental representation need not be.

Syntactic/Semantic Complexity

In the classical computationalist paradigm, the role of physical symbol systems (see Newell, 1980) offers a neat way of explaining the origin of representation in causal systems, without running afoul of Cummins's distinction between analysis and implementation. The central doctrine on which this rests is that complex mental representations are built up of simple parts in accordance with syntactic rules, and that the semantic content of a molecular representation is a function of the semantic contents of its parts and its syntactic structure. Mental operations are governed by the structure of the representations to which they apply (Fodor and Pylyshyn, 1988, pp. 11-12). In this scheme, "the physical counterparts of the symbols, and their structural properties, cause the system's behavior." (14) They do so, ultimately, because of their functional architecture, that is, in virtue of their causal properties.

Fodor and Pylyshyn emphasize that this way of conceiving of thought assimilates it, in effect, to the process of formal inference:

It would not be unreasonable to describe classical cognitive science as an extended attempt to apply the methods of proof theory to the modeling of thought (and similarly of whatever other mental processes are plausibly viewed as involving inferences.) (Fodor and Pylyshyn, 1988, p. 30).

This view has great advantages. Notably, it accounts for the productive or generative character of representations: it explains our capacity to understand an indefinitely large number of representations -- typically sentences -- on the basis of a finite stock of elements and structural features, as well as our capacity to cut distinctions finer de dicto than de re. (ibid.) But there are two large problems. The first is the one so constantly harped on by Searle: the representational element in a physical symbol system is merely canned: it comes entirely from the interpretation that the user places upon the symbols, and not from anything that can be attributed to the machine "intrinsically." Since the model fails to incorporate any intrinsic representation, it cannot therefore be of any use in our attempt to understand how genuine representation gets into the brain.

The other problem is that the production system model is an implausible one for much of what we humans -- as animals -- can do. (It is a pregnant irony that computers are now relatively good at some of the reasoning tasks that Descartes thought the secure privilege of humans, while they are especially inept at the "merely animal" functions that he thought could be accounted for mechanically.) Bicycle riding is only one of a myriad tasks that can't readily be learned by applying any explicit rule even when we know what the rule should be. Moreover, there are also plenty of tasks, particularly those that can be grouped under the general heading of evidential inference, for which we have not even been able to devise appropriate rules, let alone apply them. Fodor and Pylyshyn admit they don't take care of "evidential" inferences. But they seem to think this is no special difficulty for the classical paradigm, because "the problem about evidential logic isn't that we've got one that we don't know how to implement; it's that we haven't got one" (Fodor & Pylyshyn, 1988, p. 30). But that's just it: if we have failed in so many efforts to discover the formal principles of evidential logic, that's possibly because we don't use formal principles in drawing such inferences.

To lay too much stress on such facts, one might object, is to fall prey to the delusion of phenomenology: just because we aren't aware of applying such rules doesn't mean we are not doing so. I have already argued, after all, that representation needs neither consciousness nor explicitness. Perhaps we haven't found the right inferential rules because we haven't looked enough. That seems to be Fodor and Pylyshyn's view: "That infraverbal cognition is pretty generally systematic seems... to be about as secure as any empirical premise in this area can be." (41). Their reason for that act of faith is that even "animal thought is largely systematic: the organism that can perceive (hence learn) that aRb can generally perceive (/learn) that bRa." (44).

To be sure, the appearance of systematicity must be explained. In itself, however, systematicity requires something less than the strict compositionality to which Fodor and Pylyshyn assimilate it, and which derives from the Language of Thought hypothesis. What Fodor and Pylyshyn are actually demanding here, I suspect, is something other than mere systematicity. It is digitality.


Digitality

Digitality is actually one of Plato's discoveries. In essence it is the substitution of a three-term relation for a two-term relation in the real definition of resemblance. Under what circumstances would that be particularly useful? The answer is: when you want a taxonomy to remain stable through multiple replication. Any public information needing to be reproduced over and over again will degrade hopelessly fast unless it is digitalized. If you want to reproduce some complex thing, copying exactly is an unattainable ideal. However careful you may be, errors will creep in. And those errors will be additive, taking you on a random walk which will inexorably lead arbitrarily far from its origin. On the contrary, if we compare each thing to a paradigm and not to the latest in a series of copies (providing the paradigms have been suitably chosen to avoid ambiguity), reproductive errors will not accumulate.
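The contrast between copying from the latest copy and comparing each copy to a paradigm can be made concrete in a small simulation. This is only a sketch under stated assumptions: the paradigm values, the bound on copying noise, the number of generations, and the function names are all hypothetical, and the noise is assumed to stay below half the distance between neighbouring paradigms (the "suitably chosen" condition).

```python
import random

# Hypothetical paradigms, spaced widely enough to avoid ambiguity.
PARADIGMS = [0.0, 1.0, 2.0, 3.0]

def noisy_copy(values, rng, noise=0.2):
    """Copy each value, adding a small bounded random error."""
    return [v + rng.uniform(-noise, noise) for v in values]

def snap(values):
    """Replace each value by the paradigm it most resembles."""
    return [min(PARADIGMS, key=lambda p: abs(p - v)) for v in values]

def replicate(original, generations, digital, seed=0):
    """Copy the latest copy repeatedly, optionally snapping to paradigms."""
    rng = random.Random(seed)
    current = list(original)
    for _ in range(generations):
        current = noisy_copy(current, rng)
        if digital:
            current = snap(current)
    return current

original = [0.0, 3.0, 1.0, 2.0]
analog_result = replicate(original, generations=1000, digital=False)
digital_result = replicate(original, generations=1000, digital=True)

# Largest deviation from the original after 1000 generations.
analog_drift = max(abs(a - b) for a, b in zip(analog_result, original))
digital_drift = max(abs(a - b) for a, b in zip(digital_result, original))
```

In the analog chain the errors add up from one generation to the next. In the digital chain each error is cancelled before the next copy is made, so `digital_drift` is zero for as long as the noise stays within the assumed bound.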

That, I suggest, is what lies behind the insistence of Fodor and Pylyshyn and other defenders of the classical paradigm that there must be a language of thought that is systematic. But if this is right, then their argument is just slightly beside the point. For then what you really need the digital processes of language for is the cultural transmission of information, but not necessarily for that information that is processed internally.3 (Smolensky, 1988, p. 4a). Because language is, at different levels, (relatively) digital, it can embody the "cultural program" running on the individual "virtual machines" constituted by individual members of a social group (ibid.). But that leaves open the possibility, advocated by the Connectionists, that there exists a different mode of information processing, either in addition to or underlying the performance of that part of the brain that functions like a production system. That "intuitive processor" must also, of course, be systematic, but it doesn't have to be digital. Causal laws of nature, after all, are not digital, but they are highly systematic.

Smolensky's challenge to the role of language

Smolensky (1988) is meant to address both of the problems faced by classical architecture. He suggests that the "intuitive processor" which is "presumably responsible for all of animal behavior and a huge portion of human behavior" (5a) has a connectionist architecture, and claims to explain the genesis of genuine representation.

For our purposes, the crucial features of connectionist architecture are these:

A connectionist network is a network of nodes, typically arranged in several layers one of which is labelled the Input layer and another of which is the Output layer. All the nodes in any one layer are connected to all the nodes in the adjoining layer or layers, but the connection strength can be varied, so that the level of transmitted activation (or, in stochastic versions, the probability of transmission) can vary between 0 and 1. In the case of intermediate layers, there may be no direct access to their pattern of activation at any particular time. Activation of any particular node is determined by the activation of the nodes to which it is connected, and typically depends on its own particular threshold. Given that threshold, whether or not a node is active is determined by the sum of the active nodes to which it is connected, weighted by their connection strengths. The interest of such networks is that the initial connection strengths can be assigned at random, and that their input-output functions can be modified by certain systematic tricks, such as "backward error propagation," (Rumelhart et al., 1985) which result in modification of those weights or connection strengths. In that way, the behavior of the connectionist network as a whole can be modified by learning, and the network as a whole can be said to store information in its connection strengths.
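The activation rule just described can be sketched in a few lines. The sketch makes several assumptions that go beyond the text: units are strictly on or off, the connection strengths and thresholds are hand-picked hypothetical values rather than random initial weights, and no learning procedure such as backward error propagation is included.

```python
def layer_output(inputs, weights, thresholds):
    """Activate each unit whose weighted input sum reaches its threshold.

    inputs: 0/1 activations of the previous layer.
    weights[j][i]: connection strength (between 0 and 1) from input i to unit j.
    thresholds[j]: the threshold of unit j.
    """
    outputs = []
    for unit_weights, threshold in zip(weights, thresholds):
        total = sum(w * a for w, a in zip(unit_weights, inputs))
        outputs.append(1 if total >= threshold else 0)
    return outputs

def run_network(inputs, layers):
    """Propagate activation from the input layer through each later layer."""
    activation = inputs
    for weights, thresholds in layers:
        activation = layer_output(activation, weights, thresholds)
    return activation

# Hypothetical three-input network with one intermediate layer.
hidden = ([[1.0, 1.0, 0.0],   # unit fires if input 0 or input 1 is active
           [0.0, 0.0, 1.0]],  # unit fires if input 2 is active
          [0.5, 0.5])
output = ([[0.6, 0.6]],       # fires only if both intermediate units fire
          [1.0])
network = [hidden, output]

result_a = run_network([1, 0, 1], network)  # [1]
result_b = run_network([1, 1, 0], network)  # [0]
```

What the network "knows" about which input patterns to accept is stored nowhere except in the numbers in `hidden` and `output`; changing those numbers changes the input-output function.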

the numerical activity values of all the processors in the network form a large state vector. The interactions of the processors (are given by) an activation evolution equation (involving the connection strengths.) In learning systems, the connection weights change during training according to the learning rule, ... the connection evolution equation. Knowledge in a connectionist system lies in its connection strengths. (Smolensky, 1988, pp. 6a-b).

Now many expositions of the ideas of connectionism use illustrations that are in some essential respects misleading. Such is, for example, the "Jets and the Sharks" example in McClelland, Rumelhart and Hinton's introductory paper in Rumelhart & McClelland (1986) (p. 27), or the examples of word recognition by means of features illustrated in fig. 1.

What is misleading about these examples is that the nodes represent identifiable features of the objects to be recognized, and that these features have, in effect, been just as securely "canned" as any category embodied in a classical von Neumann computer program. But the prospect that Smolensky has in mind is actually a good deal more radical. He rejects both the classical or "implementationist" position, according to which connectionist networks may simply be ways of executing a classical von Neumann type program, and also any "eliminativist" position which would aim to do away with the level of conceptual computation altogether. Instead, he claims that

There will generally be no precisely valid, complete, computable formal principles at the conceptual level; such principles exist only at the level of individual units -- the subconceptual level.... (This is the) Subconceptual level hypothesis: Complete, formal, and precise descriptions of the intuitive processor are generally tractable not at the conceptual level, but only at the subconceptual level.... The intuitive processor is a subconceptual connectionist dynamical system that does not admit a complete, formal, and precise conceptual-level description. (Smolensky, 1988, pp. 6d, 7a, and cf. 64a)

Now Smolensky actually claims that the subsymbolic paradigm can also explain the knowledge used by the "conscious rule interpreter" (14c); in other words, even the linguistic and logical activity that is the special target of the classical model may actually be only an emergent outcome of subconceptual activity most accurately described in terms of the subsymbolic paradigm. That issue is particularly controversial, and I shall ignore it here. I shall ask instead why it might be that there is such a yawning gap between the "conscious rule processor," for which classical computational architecture seems to be so successful, and the "intuitive processor" for which that architecture has proved, thus far, so remarkably recalcitrant.

The LIFO-FILOcal Principle

Actually, both the glories and the miseries of classical architecture models are striking. They can be explained, I suggest, in terms of the LIFO-FILOcal principle: Last In, First Out/First In, Last Out. What we understand best about our minds is what our brains invented last; while we are likely to pierce last the secret of those skills that our brains evolved first.

The glories of the symbolic paradigm derive from the power of logic and sequential reasoning: a piece of intellectual technology more fabulous than the wheel, and more recent. It is an elaboration of what was, as far as I know, an invention of Aristotle's: namely the invention of formal linguistic representations of abstract patterns of inference. That idea, which lies at the core of classical Artificial Intelligence, is in turn based on Plato's discovery of a world of abstract objects that can be modeled in the real world, but never identified with anything in the real world. The modern concept of functionalism was anticipated by Plato's notion that certain realities can indifferently be modeled in many different material supports, because they are not ultimately identifiable with any of them. We now know, thanks to Church and Turing, that the elaboration of Aristotle's discovery is so powerful that one type of device -- the universal Turing machine -- can be made to do anything, given world enough and time, that is effectively computable.4

Now it's that last, incredibly recent achievement (which it is impossible to imagine without language) that computer science naturally has tackled first, not only because it is so powerful but also because it is (relatively) so easy. And it is easy, of course, because we have invented it (or discovered the basic principles that govern it). By the same token, however, these are not procedures that we are naturally very good at: it takes a lot of practice to do a little strictly deductive logical inference, and even then we make mistakes.

On the other hand, such skills as seeing and moving are things we're very good at: we've been practicing those for millions of years. We find it very easy to see, but enormously more difficult to discover the procedures that we use to do it. For several reasons we shouldn't wonder at this: they are more deeply buried, and we didn't consciously make them up. Because they evolved, moreover, we cannot expect them to be especially simple. For natural selection is the ultimate anarchy of hackers, and every programmer knows how a program that a few dozen hackers have tinkered with, let alone a few million, can become hopelessly opaque. The devices hacked together by evolution will sometimes be baroque in the extreme.

Such, then, we should expect Smolensky's "intuitive processor" to be. And among the reasons for thinking that this intuitive processor works along the lines of a connectionist machine is the fact that such networks have, in some simple cases, apparently been able to perform some categorization, at various levels of supervision. And such categorization presumably involves representation. Whether such representation can be labeled "intrinsic" or genuine will presumably depend on the exact nature of the grounds for its ascription.5 But remember that on Smolensky's view there are not just two levels -- the conceptual level, and the level of hardware -- but three: the conceptual, the neural, and in between the subconceptual level, which according to his hypothesis is best described in terms of the subsymbolic paradigm. And that paradigm differs from the symbolic paradigm in the relation of the conceptual level to lower levels. In the symbolic paradigm, there is no switch to a lower level: lower levels are simply either subroutines at the same (conceptual) level, or else they are merely implementational, and of no more conceptual relevance than the weight of the computer is to the program it is running. By contrast, "subsymbolic explanations rely crucially on a semantic ('dimensional') shift that accompanies the shift from the conceptual to the subconceptual levels":

The relationship between subsymbolic and symbolic models is more like that between quantum and classical mechanics. Subsymbolic models accurately describe the microstructure of cognition, whereas symbolic models provide an approximate description of the macrostructure. (Smolensky, 1988, p. 11b)

Specifically, as implied above, "the inputs and output of the system are not quasi-linguistic representations but good old-fashioned numerical vectors" (11d).

But this difference returns us to our central problem: if the intermediate level involves numerical vectors, why is it -- as both Smolensky and his opponents, Fodor and Pylyshyn, agree (see Smolensky, 1988, p. 14d; Fodor and Pylyshyn, 1988, p. 11) -- a genuinely representational level, and not merely an underlying physical level? Why is the "representation" involved at that level not merely a form of E-representation analogous to the stone's "representation" of gravity?

Why is the subconceptual level representational?

Smolensky's answer appeals to two factors, teleology and complexity:

A necessary condition for a dynamical system to be cognitive is that, under a wide variety of environmental conditions, it maintains a large number of goal conditions. The greater the repertoire of goals and variety of tolerable environmental conditions, the greater the cognitive capacity of the system.... Complexity is crucial here. A river (or a thermostat) only fails to be a cognitive dynamical system because it cannot satisfy a large range of goals under a wide range of conditions. (Smolensky, 1988, p. 15a)

But the point about complexity is a misdiagnosis. Complexity has nothing to do with the fact that a thermostat or a river fails to be a cognitive system. The reason for that is Searle's point: the thermostat has no intrinsic intentionality, since any purpose it might have is strictly that of its designers or its users. Whatever the ultimate force of his Chinese Room argument, Searle (1980) is right to see no reason to think its force diminished if all that's added to the thermostat's intelligence is complexity, even to the point of passing the Turing test.6

As for the criterion of teleology, it is not immediately obvious how it can escape the force of Searle's problem either. For what, in this context, is a "goal condition"? In relation to what or whom is it a goal? If the goal is defined by reference to some external factor, then it's not intrinsic. But in virtue of what can it be claimed to be intrinsic? And besides, what scientific legitimacy can we allow to the notion of a goal, where this is explicitly not assimilable to anyone's conscious or explicit intention?7

The suggestion needs supplementation with an analysis of teleology, in a more explicitly evolutionary perspective.

An Evolutionary Criterion

The kind of account we need is one popularized in the recent literature on teleology,8 where the notion of a goal-directed process is analyzed in terms of a certain etiology of the process in question. Very roughly,

a process P has goal or function G, iff P is in existence (whether originally brought into existence or now maintained in existence) because processes of kind P have generally had as a consequence events of type G.

This definition is neutral as between evolutionary, intentional, and artifactual goals or functions (see Wright, 1973). For a device or organism to have intrinsic representation, however, we must require that the etiology of teleology be specifically evolutionary.

Just such a criterion has recently been suggested by Mohan Matthen (1989) to illuminate the notion of semantic content. He introduces it to address what he calls the "Parallelism problem": why is it that we are caused to go from one belief to another in a way that matches the logical relations between the contents of the respective beliefs? Matthen points out that the Fodor and Pylyshyn answer to this problem -- the requirement of compositionality discussed above -- only yields a partial answer. It doesn't actually explain why syntax follows logic: why "the syntactic rules we follow happen to be rules that have a measure of semantic validity" (Matthen, 1989, p. 563). Matthen's answer is that we must understand the very notion of content in terms of evolutionary selection. Specifically, in typical cases, selection will favour a correlation between the content of a detector state and the content of a corresponding effector state (Matthen, 1989, p. 567). Crudely put, if you do what's relevant to your situation, you'll survive; if your inference patterns lead to irrelevancies, you will not.

Now this explanation is designed to solve a problem about Smolensky's "conscious rule interpreter". But clearly the same evolutionary explanation can be extended to the "intuitive processor." And this idea can at last allow us to differentiate between the eye's "knowledge" of calculus and Smolensky's intermediate subsymbolic level, on the one hand, and the mere conformity to physical law exemplified by a falling stone or a soap bubble computer on the other. It gives a rationale for our initial common-sense intuition that just falling over couldn't require knowledge or representation of any laws of kinematics, because it wasn't behavior. The first two sorts of device, but not the last two, are there because they serve certain functions. (One example given by Smolensky is the "prediction goal: Given some partial information about the environmental state, correctly infer missing information" (Smolensky, 1988, p. 15b).) Similarly, Marr's zero-crossing detectors are, of course, at some level of analysis just the working out of certain physical laws. But the presence of cells that behave just so must be accounted for in evolutionary terms. Or at least that is the hypothesis on which the claim that they are truly representational must rest.
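The "prediction goal" mentioned above can be made concrete with a toy case. What follows is only a minimal sketch, and it is not Smolensky's own model: it uses a small Hopfield-style network with Hebbian weights, chosen here purely for brevity. The two stored patterns and the names `train` and `complete` are assumptions introduced for illustration. Units take the values +1 or -1, and a 0 in the cue marks a unit whose state is unknown.

```python
# Illustrative sketch only: a tiny Hopfield-style network (an assumption of
# this example, not Smolensky's model) that fills in missing information.

def train(patterns):
    """Hebbian weights: units that agree across a pattern get a positive link."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    w[i][j] += p[i] * p[j]
    return w

def complete(w, cue, max_sweeps=10):
    """Update units one at a time until no unit changes (or sweeps run out)."""
    s = list(cue)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            # Net input to unit i from all the other units.
            h = sum(w[i][j] * s[j] for j in range(len(s)))
            new = 1 if h > 0 else -1 if h < 0 else s[i]
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s

# Two hypothetical "environmental states", stored as numerical vectors.
PATTERN_A = [1, 1, 1, 1, -1, -1, -1, -1]
PATTERN_B = [1, -1, 1, -1, 1, -1, 1, -1]
WEIGHTS = train([PATTERN_A, PATTERN_B])

# Partial information: the last three units are unknown (0).
partial = [1, 1, 1, 1, -1, 0, 0, 0]
print(complete(WEIGHTS, partial))
```

Nothing in the network's weights or states looks like a sentence or a rule; there are only numerical vectors. And on the criterion defended here, the fact that such a device completes patterns does not by itself settle whether its states are genuinely representational: that depends on why a device of that kind is there at all.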

Understanding understanding

I have tried to contribute to an elucidation of the general conditions under which it is reasonable to ascribe genuine representation. The resulting criterion has, in particular, supported Smolensky's claim that we may need, to account for our infra-linguistic capacities, a level of analysis which is above the physical, because it instantiates genuine semantic or representational characteristics, but which cannot be identified with the usual "conceptual" level more directly explained by the symbolic paradigm.

This perspective may, however, entail an interestingly high price to pay in terms of the level of understanding to which we may aspire with respect to our own mental representations.

Lord Kelvin once wrote:

I never satisfy myself until I can make a mechanical model of a thing. If I can make a mechanical model I can understand it. As long as I cannot make a mechanical model all the way through I cannot understand. (Quoted in Johnson-Laird, 1988, p. 24)

But now perhaps the tables are turned: in relation to cognitive science, it may be that even when we can build a brain we will not be able to understand it. For consider the following:

For a representation to qualify as being understood by an epistemic agent, the agent must be able to perceive an adequate proportion (of course, not necessarily all) of the interrelations among elements of the set. But as the "mind's dictionary/encyclopedia" grows, it becomes much more difficult just to search, even with cataloguing and cross-referencing of its propositions or theories.... The universe may be not merely inhumanly complex, but "transcendentally" unmanageable for any physically realizable entity, for example, an ideal computer occupying the twenty billion light-year radius and twenty billion year age of the universe. (Cherniak, 1986, p. 128)

Perhaps our understanding of the brain and its subsymbolic activity is in just that position.

Lord Kelvin's remarks conjure up an image of a do-it-yourself world, where you could build anything you could design in your home lab. Maybe, in the relevant sense, such building just isn't possible any more. But the kind of building that was envisaged by the classical AI program was certainly of that kind: you could understand everything, precisely because you could build it out of perfectly transparent, Platonic, logical devices. Not so on the connectionist program, where the builders are sometimes so proud of not having programmed the machines that they in effect boast of not understanding exactly how they work -- even though they built them. So building something is perhaps no longer a sufficient condition of understanding.9,10 The best we can do is to understand it "in principle," rather as we understand certain physical events in principle in terms of gravity, wind resistance, friction, etc., without actually being able to read all the relevant parameters precisely enough to get any prediction.

Connectionism may yet solve the two hardest general problems of cognitive science -- the technical problem of modeling intuitive knowledge, and the philosophical problem of how the meaning gets into the brain. But it may have bought those solutions at a high price: the price of a radical lowering of the standard of understanding we can expect in cognitive science. Part of this lowering of expectations can be attributed to the stochastic nature of the neural activity which underpins cognition. In that sense, the situation is much like that of classical thermodynamics in relation to statistical thermodynamics: we know about pressure well enough, and we know what sorts of statistical events underlie the phenomena of gas volume and pressure. But we don't know in detail what happens in any particular case, and couldn't possibly ever do so. On the other hand, we don't much care, either, and that's because the reduction in that case is complete. If we are interested in pressure and volume, accurate statistical information will be enough to tell us all we want. (We won't be able to say even this much if we are dealing with processes that are sensitive to quantum effects.) If Smolensky is right, however, and the level of cognitive discourse is only imprecisely predicted by the hard formalism that governs the subsymbolic level, then it seems we must adopt Steve Stich's verdict that "in those domains where connectionist models prove to be empirically superior to symbolic alternatives, the inference to be drawn is [that] mental symbols do not exist" (Stich, 54). But if that's so, and if, given the requirements of our linguistic code, we will forever continue to need to talk as if they did, then we will have one more principled reason for thinking that in some strong sense we can never understand ourselves.


1.  For this notion of "direction of fit," see Searle [1983]. But the idea and the terminology go back, I believe, to Anscombe [1976].
2.  For this way of encapsulating my question, I am indebted to Rebecca Kukla.

3.  Peter Rosen-Runge has suggested to me that we probably do need ways of storing information digitally even for internal purposes. Giving up this thesis would in effect amount to giving up the conceptual apparatus of information processing altogether, and replacing functionalism with physicalism.  He may well be right; but no matter what account we adopt, there will be some level of analysis where we shall have to "descend" from the information processing to the physical level: the problem raised in this paper is precisely that of identifying and understanding that point of transition.  Smolensky [1988], whom I discuss below, proposes that there is a level between the physical and the mental or "conceptual"; but at this point it should remain an open question whether such a level must be digital or not.

4.  And that, of course, is an impressive amount: so much, indeed, that it's easy to forget that what's effectively computable can't be shown to be, or not to be, everything that can be done. It is only, by definition, what can be done by those sorts of methods. That fact, however, is not one from which connectionists can derive any feeling of superiority, since anything they can do must be something that a computer can simulate. But it still remains logically possible that some crucial operations of the mind may turn out not to be identifiable with any effective procedures, i.e. not to be computable; though to try to imagine clearly what such operations might be like necessarily has the self-defeating quality of trying to imagine the unimaginable. For to imagine something clearly is to bring it within the aegis of the "conscious rule interpreter," within the very reach, in other words, of that mode of information processing which we are trying to imagine it transcends. (On this sort of epistemological-metaphysical bind, see the last chapter of Cherniak [1986].)

5.  In any machine learning task, one can distinguish various levels of "supervision":  At the lowest level of performance (the most stringent supervision) a "correct" output exists for every input, and the device is told just what that output should have been. (An impressive demonstration of a network learning to read by Sejnowski and others at Johns Hopkins operates at this level. The network is presented with a written text, and it learns to turn it into a sequence of phonemes. At each learning pass, however, it is told, in effect, just where it went wrong.)  A higher level (less supervision) is represented by the case where the network is told merely whether it succeeded or failed, but is not told in what respects it failed.  At both the previous levels of performance, it is assumed that there is a pre-existing list of categories which one wants the machine to "discover". At the highest level, however, the machine would be left to discover its own categories, guided only by certain very general "values." This sort of learning has been claimed for the "Darwin III" automaton devised by Gerald Edelman. (Reeke and Edelman [1989], 167)   The task facing a brain new to the world, and whose cognitive categories are not pre-ordained, would seem to belong to the third and highest level. At that level, moreover, Searle's objection would no longer have any force, for the categorization would quite literally be the device's own, and not be imposed from outside.
6.  But see footnote 9 below.

7.  "I intend the teleological, rather than the intentional, sense of `goal'." (Smolensky [1988] 14d).
8.  By such writers as Charles Taylor [1964],  Larry Wright [1973], Jonathan Bennett [1976] and others.
9.  In any case, that sort of understanding certainly admits of degrees: cf. being able to assemble from a kit, being able to fix, both contrasted with having designed, invented, AND built some machine.
10.  For this, complexity is indeed to blame. Even in logic and mathematics, however, complexity may be of the essence of understanding and its limits. Do we really understand, in ways Lord Kelvin would have recognized, the computer generated proof for the 4-colour theorem?


Anscombe, G.E.M (1976) Intention (2nd ed.) (Ithaca, NY, Cornell University Press).

Bennett, Jonathan (1976) Linguistic Behavior (Cambridge, Cambridge University Press).
Carroll, Lewis (nd) "What the Tortoise said to Achilles," in The Complete Works of Lewis Carroll (New York, the Modern Library), pp. 1225-1230.

Cherniak, Christopher (1986) Minimal Rationality. (Cambridge, MA, MIT Press: A Bradford Book)

Cummins, Robert (1983) Psychological Explanation. (Cambridge, MA, MIT Press: A Bradford Book) (see esp. II.3: "Representation and internal manuals".)

Dewdney, A.K. (1985) "Analog Gadgets That Solve a Diversity of Problems and Raise an Array of Questions," Scientific American 252, 5 18-24.

Fodor, Jerry A. and Pylyshyn, Zenon (1988) "Connectionism and Cognitive Architecture," in Connections and Symbols, ed. Steven Pinker and Jacques Mehler. (Cambridge, MA, MIT Press) (=Cognition Special Issue, 28.)

Hofstadter, Douglas (1985) "Waking Up from the Boolean Dream," in Metamagical Themas (New York, Basic Books).

Johnson-Laird, Philip (1988) The Computer and the Mind: an Introduction to Cognitive Science (Cambridge, MA, Harvard University Press).

Kahneman, Daniel & Amos Tversky (1982) "On the study of statistical intuitions," in Kahneman, D., Slovic, P., Tversky, A., eds. Judgment Under Uncertainty: Heuristics and Biases (Cambridge and New York, Cambridge University Press).

Kosslyn, Stephen M. and Gary Hatfield (1984) "Representation without Symbol Systems," Social Research (1984) 51, pp. 1019-1045.

Lackner, J.R. and M. Garrett (1973) "Resolving ambiguity: effects of biasing context in the unattended ear." Cognition 359-372.

Marr, David (1982) Vision (San Francisco, Freeman).

Matthen, Mohan (1989) "Intentional Parallelism and the Two-Level Structure of Evolutionary Theory," in Issues in Evolutionary Epistemology, ed. C. Hooker (Albany, SUNY Press).

Newell, Allen (1980) "Physical Symbol Systems," Cognitive Science 4, pp.135-183.

Pylyshyn, Zenon (1985) Computation and Cognition 2nd ed. (Cambridge, MA, MIT Press: A Bradford Book).

Reeke, George and Edelman, Gerald (1988) "Real Brains and Artificial Intelligence". In Artificial Intelligence, special issue of Daedalus, Winter 1988. Republished by MIT Press 1989.

Rumelhart, D.E., G.E Hinton, and R.J. Williams (1985) "Learning Internal Representations by Error Propagation", in Rumelhart and McClelland (1986)

Rumelhart, David E., James McClelland, and the PDP Research Group (1986) Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol 1: Foundations (Cambridge, MA, MIT Press: A Bradford Book).

Searle, John R. (1983) Intentionality: An Essay in the Philosophy of Mind (Cambridge, Cambridge University Press).

Searle, John (1980) "Minds, Brains and Programs," Behavioral and Brain Sciences, 3, pp.417-24.

Smolensky, Paul (1988) "On the proper treatment of connectionism," Behavioral and Brain Sciences, 11. (References to this article include "quadrants": a and b represent top and bottom half of first column, c and d of the second.)

Taylor, Charles (1964) The explanation of behaviour, International Library of Philosophical and Scientific Method, (London, Routledge and Kegan Paul).

Wright, Larry (1973) "Functions," Philosophical Review 82, pp.139-168.
