In a paper written in 1945, polymath John von Neumann outlined the architecture of modern digital computers. The only citation in the 49-page report was to a foundational paper in the field of computational neuroscience: “A logical calculus of the ideas immanent in nervous activity.” Von Neumann was well aware of the differences between the brain and the computers he helped develop, but the brain was also a source of inspiration. Indeed, he considered the functioning of the nervous system to be “prima facie digital.” But despite some early parallels, the fields of computer science and neuroscience rapidly diverged—and so, too, will the fields of artificial intelligence and neuroscience.
From the outset, AI and neuroscience have been sister fields, with natural intelligence serving as a template for artificial intelligence, and neuroscientific principles serving as inspiration for AI approaches. First and foremost, many approaches in AI rest on a foundational tenet of neuroscience: that information is stored in the weights of the connections between neurons. Additional neuroscience-inspired principles at work in the artificial neural networks (ANNs) used in AI include convolutional neural networks (visual cortex), regularization (homeostatic plasticity), max pooling (lateral inhibition), dropout (synaptic failure) and reinforcement learning, as illustrated in the sketch below.
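To give a flavor of one of these parallels, here is a minimal NumPy sketch of standard (inverted) dropout, in which a unit’s output is randomly silenced during training, loosely mirroring stochastic synaptic failure. The function name and the drop rate are illustrative choices, not taken from any particular model.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p_drop=0.5, training=True):
    """Inverted dropout: randomly silence units during training.

    Loosely analogous to stochastic synaptic failure: each output is
    zeroed with probability p_drop, and the survivors are rescaled so
    the expected drive to the next layer is unchanged.
    """
    if not training:
        return activations
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

h = rng.standard_normal(8)        # one layer's activations
print(dropout(h))                 # about half the units silenced
```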
But many of the recent developments that have driven the explosive success of AI no longer draw on neuroscience as a source of computational principles. A decade ago, inspired by the brain, recurrent neural networks (RNNs) seemed to be the way forward as AI tackled time-dependent problems such as speech recognition and natural language processing. This direction changed quickly, however, with the landmark transformer paper in 2017: “Attention is all you need.”
The introduction of the transformer architecture marked an important inflection point in the history of AI. Transformers are notable both for their surprising power and for how un-brain-like they are. They lack the recurrent connections of RNNs and operate in discontinuous time—that is, through discrete time steps without any “memory” of the states from the previous time step. They are also devoid of any form of working memory; they cleverly externalize it by appending each generated token to the input, lengthening it at every step. Perhaps most notably, transformers lack any internal dynamics or ability to tell time. ChatGPT, for example, cannot respond appropriately to the prompt “Wait 10 seconds before telling me the capital of Canada” (at least, not without invoking a Python interpreter).
The brain encodes time and recent sensory information in the internal dynamics of its recurrent circuits, along with other mechanisms, such as short-term synaptic plasticity. By contrast, transformers encode time (more accurately, ordinality) by tagging the vector representing each word, or token, with positional information (first, second and so forth)—an approach referred to as positional encoding. This difference allowed transformers to sidestep the challenge of exploding or vanishing gradients, in which the error signal generated at the end of a sequence degrades as it is backpropagated to earlier time steps in that sequence.
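To make positional encoding concrete, here is a minimal NumPy sketch of the sinusoidal scheme from the original transformer paper (the function name and toy dimensions are illustrative): each position receives a unique pattern of sines and cosines across frequencies, and that pattern is added to the corresponding token’s embedding.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
       PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))"""
    positions = np.arange(seq_len)[:, None]        # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]       # even dimensions
    angles = positions / 10000 ** (dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # even dims get sines
    pe[:, 1::2] = np.cos(angles)                   # odd dims get cosines
    return pe

# Each row is the positional "tag" added to the embedding of the token
# at that position: row 0 means "first," row 1 means "second," and so on.
pe = sinusoidal_positional_encoding(seq_len=10, d_model=16)
print(pe.shape)  # (10, 16)
```

Because position is stamped directly onto each token’s representation, the network never has to carry information forward through a long chain of recurrent steps, so there is no long temporal path across which gradients can vanish or explode.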
By design, transformers are, in a sense, timeless. To use an analogy with terms from the philosophy of time, transformers operate in a block universe where the past, present and future (in the case of bidirectional transformers) are all simultaneously available. By contrast, RNNs operate in a presentist universe in which only the current input is available, and computations unfold in continuous time.
The so-called attention mechanism of transformers sounds biological, but it does not really refer to what most cognitive neuroscientists would consider attention. It essentially assigns a value to the strength of the relationship between every pair of words in a sentence, rather than selectively modulating information processing based on expectations or volitional control. Furthermore, the implementation of the attention mechanism lacks biological plausibility. Most operations in neural networks correspond to the multiplication of an activity vector by a weight matrix, but the attention mechanism relies on the multiplication of what would generally be considered two activity vectors. That is, at best, an awkward mathematical operation to implement with neurons.
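A minimal NumPy sketch of scaled dot-product attention makes the point concrete (variable names are illustrative). The conventional, neuron-friendly operations are the projections of activity through weight matrices; the biologically awkward step is the score matrix, in which two activity-derived quantities are multiplied together.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, Wq, Wk, Wv):
    # Conventional, neuron-like steps: activity vector times weight matrix.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # The awkward step: activities multiplied by activities, scoring the
    # relationship between every pair of tokens in the sequence.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V

d = 16
X = rng.standard_normal((10, d))                      # 10 tokens, one row each
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
print(attention(X, Wq, Wk, Wv).shape)                 # (10, 16)
```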
Despite their success, transformers have their own limitations—including their insatiable energy consumption. For this and other reasons, the AI field is reexamining RNN-like approaches. But new and old RNN-like architectures in machine learning, which go by names such as long short-term memory networks (LSTMs), gated recurrent units (GRUs) and Mamba, don’t necessarily have parallels in neuroscience either. They often lack the biological realism of neural circuits—in part because AI programs are generally implemented on conventional digital computers, which allow for a far richer range of mathematical operations, such as the gating operations of LSTMs, than do biological neural networks. Indeed, as long as AIs continue to be implemented on digital computers, the AI field will be paced by Moore’s Law, whereas neuroscience will continue to slowly drift forward.
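As an example of the sort of operation that is trivial on a digital computer but hard to map onto neural circuits, here is a minimal sketch of a single LSTM time step (the weights and dimensions are illustrative): the memory cell is rewritten and read out via elementwise multiplications with learned “gates.”

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    """One LSTM time step; W maps [x; h] to four gate pre-activations."""
    z = W @ np.concatenate([x, h]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input/forget/output gates
    g = np.tanh(g)                                 # candidate memory
    c = f * c + i * g    # gating: elementwise products rewrite the cell state
    h = o * np.tanh(c)   # gated read-out of the cell state
    return h, c

rng = np.random.default_rng(0)
nx, nh = 8, 4
W = 0.1 * rng.standard_normal((4 * nh, nx + nh))
b = np.zeros(4 * nh)
h = c = np.zeros(nh)
h, c = lstm_step(rng.standard_normal(nx), h, c, W, b)
print(h)
```

The elementwise products in the cell update are exactly the gating operations referred to above: simple to compute in silicon, but with no obvious one-to-one analog in synaptic physiology.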
The importance of hardware is also fundamental to the deeper philosophical question of whether AIs implemented on conventional computers can, in theory, be capable of sentience. Digital computers operate in discrete time (paced by the computer’s clock speed) and, unlike the brain, can easily be paused or have their clock speed changed. Now, assume that we are running a novel ANN simulation that someone claims is conscious. What would happen if we slowed the clock speed to one cycle per year? Would the AI be frozen in a subjective state for a year?
Most theories of consciousness, such as global workspace and higher-order theories, seem to implicitly assume that consciousness is associated with continuous-time brain dynamics. In these theories, consciousness is like music: It exists only as it flows through time. And depending on whether an ANN is running on a CPU, GPU or TPU (and on the number of cores), the states within a single time step of the ANN will not all be updated simultaneously in real time—meaning that any conscious states would depend on the details of the hardware, even though the input-output relations are identical.
An exception to the view that consciousness relies on brain dynamics is the controversial and panpsychist integrated-information theory (IIT). IIT is not a neuroscience theory but a theory of fundamental physics—albeit one that is unmoored from the other laws of physics. Roughly speaking, IIT quantifies how much the current state of a system constrains past and future states compared with random configurations. IIT goes on to claim this quantity is directly equivalent to consciousness.
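As a toy illustration of that idea (emphatically not the full Φ computation, and using a made-up transition matrix), consider a small system with three discrete states and known dynamics: one can ask, in bits, how much observing the current state constrains the previous state relative to a uniform distribution over “random configurations.”

```python
import numpy as np

# Made-up transition matrix: T[i, j] = P(next state = j | current state = i)
T = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.5, 0.0, 0.5]])

def cause_information(T, current_state):
    """How much the current state constrains the previous state, relative
    to a uniform prior: a simplified, KL-based flavor of IIT's "cause
    repertoire," not the full Phi calculation."""
    n = T.shape[0]
    prior = np.full(n, 1.0 / n)
    likelihood = T[:, current_state]       # P(current | each previous state)
    posterior = likelihood * prior         # Bayes with a uniform prior
    posterior /= posterior.sum()
    nz = posterior > 0
    return np.sum(posterior[nz] * np.log2(posterior[nz] / prior[nz]))

for s in range(3):
    print(f"state {s}: {cause_information(T, s):.3f} bits")
```

Note that even this toy version presupposes a discrete set of states and a discrete update rule, which is the crux of the objection that follows.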
As has been pointed out, IIT is defined only for discrete systems, which “unfortunately means that IIT isn’t defined for most traditional physical systems, which can change continuously”—a particularly acute problem when attempting to apply IIT to the only thing we know for sure is conscious. Therefore, the current theories of consciousness that are compatible with the fact that the brain is a dynamical system would seem to exclude the possibility of sentience in AIs running on discrete von Neumann architectures.
AI and neuroscience will, no doubt, continue to have synergistic interactions. AI will continue to borrow insights from neuroscience as they emerge. But, going forward, AI may have more to offer neuroscience than the other way around. To date, neuroscientists have been slow to fully digest some of the early lessons from AI. One such lesson is the limited value of the full connectome of an artificial or biological neural network. Every connection, weight and bias of ChatGPT is known, yet access to this knowledge has not translated into any immediate or deep understanding of how it works—which is not to say it is not useful. The lesson may be that neuroscientists need to reexamine what it means to understand the emergent properties of something as complex and highly distributed as the brain.
Computer science progressed independently of neuroscience because the brain holds no exclusive rights on how to process information. AI and neuroscience will continue to diverge because the brain holds no exclusive rights on how to create intelligence.