Are brains and AI converging?—an excerpt from ‘ChatGPT and the Future of AI: The Deep Language Revolution’

In his new book, to be published next week, computational neuroscience pioneer Terrence Sejnowski tackles debates about AI’s capacity to mirror cognitive processes.

Complexity challenge: Artificial neural networks aim to solve problems by learning from data, just as we learn from experience. This deep learning has made enormous contributions to science and was recognized this month with the Nobel Prizes in Physics and Chemistry.
Illustration by Adrià Voltà

In the latter half of the 20th century, physics coasted along on discoveries made in the first half of the century. The theory of quantum mechanics gave us insight into secrets of the universe, which led to a cornucopia of practical applications.

Then physics took on another grand challenge—complexity. This included complex natural systems, such as ecosystems and climate, as well as human-made systems, such as economic markets and transportation systems. The human brain and the social systems that humans inhabit are the ultimate complex systems.

Indeed, the brain’s complexity inspired the development of artificial neural networks, which aimed to solve problems by learning from data, just as we learn from experience. This deep learning has since made enormous contributions to science, and was recognized this month with the Nobel Prizes in Physics and Chemistry. We are at the beginning of a new era in science fueled by big data and exascale computing. What influence will deep learning have on science in the decades ahead?

My new book, “ChatGPT and the Future of AI: The Deep Language Revolution,” takes a look at the origins of large language models and the research that will shape the next generation of AI. (I continue to cover this topic in my Substack, Brains and AI.) This excerpt describes how the evolution of language influenced large language models and explores how concepts from neuroscience and AI are converging to push both fields forward.

How Language Evolved
(from chapter 13)

I once attended a symposium at Rockefeller University that featured a panel discussion on language and its origins. Two of the discussants, titans in their fields, held polar opposite views: Noam Chomsky argued that since language was innate, there must be a “language organ” that uniquely evolved in humans. Sydney Brenner took a more biological perspective and argued that evolution finds solutions to problems that are not intuitive. Famous for his wit, Brenner gave an example: instead of looking for a language gene, there might be a language suppressor gene that evolution decided to keep in chimpanzees but blocked in humans.

There are parallels between song learning in songbirds and how humans acquire language. Erich Jarvis at the Rockefeller University wanted to understand the differences between the brains of bird species that can learn complex songs, like canaries and starlings, and those that cannot. He sequenced the genomes of many bird species and found differences between the two groups. In particular, he found a gene controlling the development of projections from the high vocal center (HVc) to lower motor areas that control the muscles driving the syrinx. During development, this gene suppresses the direct projections needed to produce songs. It is not expressed in the HVc of songbirds, which permits those projections to form for rapid control of birdsong. Remarkably, he found that the same gene was silenced in the human laryngeal motor cortex, which projects to the motor areas that control the vocal cords, but not in chimpanzees. Sydney Brenner was not only clever, he was also correct!

Equally important were modifications to the vocal tract that allow rapid modulation over a broad frequency spectrum. The rapid articulatory sequences in the mouth and larynx are the fastest motor programs brains can generate. These structures are ancient parts of the vertebrate body that evolution refined and elaborated to make speech possible. The metaphorical “language organ,” postulated to explain the mystery of language, is distributed throughout preexisting sensorimotor systems.

The brain mechanisms underlying language and thought evolved together. The loops between the cortex and the basal ganglia for generating sequences of actions were repurposed to learn and generate sequences of words. The great expansion of the prefrontal cortex in humans allowed sequences of thoughts to be generated by similar loops through the basal ganglia. As an actor in reinforcement learning, the basal ganglia learn the value of taking the next action, biasing actions and speech toward achieving future rewards and goals.

The outer loop of the transformer is reminiscent of the loop between the cortex and the basal ganglia in brains, which is known to be important for learning and generating sequences of motor actions in conjunction with the motor cortex and for spinning out sequences of thoughts in its loop with the prefrontal cortex. The basal ganglia also automate frequently practiced sequences, freeing up neurons for other tasks in cortical areas involved in conscious control. The cortex can intervene when the automatic system fails upon encountering an unusual or rare circumstance. Another advantage of having the basal ganglia in the loop is that the convergence of inputs from multiple cortical areas provides a broader context for deciding the next action or thought. The basal ganglia could be acting like the powerful multi-head attention mechanism in transformers: in the loop between the cortex and basal ganglia, any region can contribute to making a decision.
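
For readers who want to see the mechanism being compared, here is a minimal sketch of multi-head self-attention in plain NumPy. The parameter names, shapes, and the omission of masking and other refinements are simplifications for illustration, not the implementation of any particular model.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads):
    """x: (seq_len, d_model); each w_*: a (d_model, d_model) projection matrix."""
    seq_len, d_model = x.shape
    d_head = d_model // n_heads
    # Project the inputs to queries, keys, and values, then split into heads.
    q = (x @ w_q).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    k = (x @ w_k).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    v = (x @ w_v).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    # Each head weights every position's value by query-key similarity,
    # so any element of the sequence can influence any decision.
    scores = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))
    heads = scores @ v                                  # (n_heads, seq_len, d_head)
    merged = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ w_o                                 # recombine the heads
```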

LLMs are trained to predict the next word in a sentence. Why is this such an effective strategy? To make better predictions, the transformer learns internal models for how sentences are constructed and even more sophisticated semantic models for the underlying meanings of words and their relationships to other words in the sentence. The models must also learn the underlying causal structure of the sentence. What is surprising is how much can be learned just by predicting one step at a time. It would be surprising if brains did not take advantage of this “one step at a time” method for creating internal models of the world.
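
As a rough illustration of the next-word objective itself, here is a minimal training loss written with PyTorch primitives; the model argument is a placeholder for any autoregressive network, not a specific architecture.

```python
import torch.nn.functional as F

def next_token_loss(model, tokens):
    """tokens: (batch, seq_len) tensor of integer token ids."""
    inputs, targets = tokens[:, :-1], tokens[:, 1:]   # shift the sequence by one position
    logits = model(inputs)                            # (batch, seq_len - 1, vocab_size)
    # Penalize the model for assigning low probability to the word that actually comes next.
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
```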

The temporal difference learning algorithm in reinforcement learning is also based on making predictions, in this case predicting future rewards. Using temporal difference learning, AlphaGo learned how to make long sequences of moves to win a Go game. How can such a simple algorithm that predicts one step ahead achieve such a high level of play? The basal ganglia similarly learn sequences of actions to reach goals through practice using the same algorithm. For example, a tennis serve involves complex sequences of rapid muscle contractions that must be practiced repeatedly before it becomes automatic.
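
The rule at the heart of this is compact enough to write out. Below is a minimal sketch of the one-step TD(0) value update; the step size and discount factor are illustrative, and this is not the exact scheme used by AlphaGo or implemented by the basal ganglia.

```python
import numpy as np

def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.99):
    """Nudge the value estimate V[state] toward the one-step prediction target."""
    td_error = reward + gamma * V[next_state] - V[state]   # reward-prediction error
    V[state] += alpha * td_error                            # small correction
    return td_error

# Usage: after each observed transition (s, r, s'), call td0_update(V, s, r, s').
# Repeated one-step corrections gradually build an accurate estimate of long-term reward.
V = np.zeros(10)          # value estimates for a toy 10-state world
td0_update(V, state=3, reward=1.0, next_state=4)
```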

The cerebellum, a prominent brain structure that interacts with the cerebral cortex, predicts the expected sensory and cognitive consequences of motor commands. In control theory this is called a forward model, because it predicts the consequences of motor commands before the actions are taken. Once again, learning what will happen next and learning from the prediction error can build a sophisticated predictive model of the body and the properties of the muscles.
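
As a toy version of such a forward model, the sketch below maps the current state and a motor command to a predicted next sensory state and corrects itself from each prediction error with a simple delta rule. The linear form and the learning rate are assumptions for illustration only; the cerebellum's circuitry is far richer.

```python
import numpy as np

class ForwardModel:
    """Toy linear forward model: predict the sensory consequence of a motor command."""

    def __init__(self, state_dim, command_dim, lr=0.01):
        self.W = np.zeros((state_dim, state_dim + command_dim))
        self.lr = lr

    def predict(self, state, command):
        return self.W @ np.concatenate([state, command])

    def update(self, state, command, observed_next_state):
        x = np.concatenate([state, command])
        error = observed_next_state - self.W @ x   # sensory prediction error
        self.W += self.lr * np.outer(error, x)     # delta-rule correction
        return error
```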

What is common in these three examples is that there are abundant data for self-supervised learning on a range of time scales. Could intelligence emerge from using self-supervised learning to bootstrap increasingly sophisticated internal models by continually learning how to make many small predictions? This may be how a baby’s brain rapidly learns the world’s causal structure by making predictions and observing outcomes while actively interacting with the world. Progress in this direction has been made in learning intuitive physics from videos using deep learning.

Are Brains and AI Converging?

Research on brains and AI is based on the same basic principles: massively parallel architectures with a high degree of connectivity, trained by learning from data and experience. Brain discoveries made in the twentieth century inspired new machine learning algorithms: the hierarchy of areas in the visual cortex inspired convolutional neural networks, and operant conditioning inspired the temporal difference learning algorithm for reinforcement learning. In parallel with the advances in artificial neural networks, the BRAIN Initiative has accelerated discoveries in neuroscience in the twenty-first century by supporting the development of innovative neurotechnologies. Neuroscientists are using machine learning to analyze simultaneous recordings from hundreds of thousands of neurons in dozens of brain areas and to automate the reconstruction of neural circuits from serial-section electron microscopy. These advances have changed how we think about processing distributed across the cortex and have produced a new conceptual framework for brain function, which in turn is driving even more advanced and larger-scale neural network models.

The new conceptual frameworks in AI and neuroscience are converging, accelerating progress in both. The dialog between AI and neuroscience is a virtuous circle that enriches both fields. AI theory is emerging from analyzing the activity patterns of hidden units in ultra-high-dimensional spaces, much as we study brain activity. Analyzing the dynamics of activity patterns in large language models (LLMs) may lead us to a deeper understanding of intelligence by uncovering a common underlying mathematical structure. For example, an LLM trained on board positions from the game Othello was probed to reveal an internal model of the game's rules.
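
The Othello result rests on “probing”: fitting a simple classifier to the network's hidden activations to test whether they encode the board. The sketch below shows the basic recipe; hidden_states and board_labels are hypothetical arrays standing in for the activations of a trained transformer and the contents of each board square.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_hidden_states(hidden_states, board_labels):
    """hidden_states: (n_examples, d_model) activations; board_labels: (n_examples,) labels."""
    X_train, X_test, y_train, y_test = train_test_split(
        hidden_states, board_labels, test_size=0.2, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    # High held-out accuracy suggests the board state is represented internally,
    # even though the network was trained only to predict moves.
    return probe.score(X_test, y_test)
```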

How to Download a Brain

Now that we can interrogate neurons throughout the brain, we may solve one of its greatest mysteries: how information globally distributed over so many neurons is integrated into unified percepts and brought together to make decisions. The architectures of brains are layered, with each layer responsible for making decisions on different time scales in both sensory and motor systems. We can build deep multimodal models with many component networks and integrate them into a unified system, giving insights into the mechanisms responsible for subconscious decision making and conscious control.

Neurons are traditionally interrogated in the context of discrete tasks, such as responses to visual stimuli, in which the choices and stimuli are limited in number. This tight control of stimulus and response allows the neural recordings to be interpreted in the context of the task. But neurons can participate in many tasks in many different ways, so interpretations derived from a single task can be misleading. We can now record from hundreds of thousands of neurons brain-wide, and it is also possible to analyze the recordings and dissect behavior with machine learning. However, neuroscientists are still using the same old single-task paradigms. One solution is to train on many different tasks, but training a monkey, for example, takes weeks to months for each task. Another solution is to expand the complexity of the task over longer time intervals, bringing it closer to natural behaviors.

There is an even more fundamental problem with approaching behavior by studying discrete tasks. Natural behaviors of animals in the real world are primarily self-generated and interactive. This is especially the case with social behaviors. Studying such self-generated continuous behaviors is much more difficult than studying tightly constrained, reflexive ones.

What if an LLM were trained on massive brain recordings made during natural behaviors, together with the accompanying behavioral data, including body and eye tracking, video, sound, and other modalities? LLMs are self-supervised and can be trained by predicting missing data segments across data streams. This would not be scientifically useful from the traditional experimental perspective, but it does make sense from the new computational perspective afforded by LLMs.
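
One way such training could work, sketched here under broad assumptions: concatenate the data streams into a single multichannel time series, hide random segments, and train a network (a placeholder model below) to reconstruct what was hidden.

```python
import torch
import torch.nn.functional as F

def masked_segment_loss(model, streams, mask_prob=0.15, segment_len=10):
    """streams: (batch, time, channels) concatenated neural and behavioral recordings."""
    batch, time, channels = streams.shape
    n_segments = time // segment_len
    # Choose whole segments to hide, then expand the choice into per-timestep masks.
    seg_mask = torch.rand(batch, n_segments) < mask_prob
    mask = torch.zeros(batch, time, dtype=torch.bool)
    mask[:, :n_segments * segment_len] = seg_mask.repeat_interleave(segment_len, dim=1)
    masked = streams.clone()
    masked[mask] = 0.0                        # hide the chosen segments
    reconstruction = model(masked)            # (batch, time, channels)
    # Only the hidden segments contribute to the training loss.
    return F.mse_loss(reconstruction[mask], streams[mask])
```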

A large neurofoundation model (LNM) can be trained on brain activity and behavior under natural conditions in the same way we now train LLMs. The resulting LNM could then be interrogated with new tasks, just as pretrained LLMs respond to novel queries and can be put to many new uses. These pretrained LNMs would be as costly to train as LLMs, but once an LNM is pretrained, it could serve as a common resource for the scientific community to probe and analyze. This would revolutionize how brains are studied, with the bonus of reducing the number of animals needed for research. Human brain activity from an individual could similarly be used to train a suitably advanced LNM, creating an immortal generative version of that individual.

It may sound like science fiction, but Gerald Pao at the Okinawa Institute for Science and Technology has already achieved this in flies and in zebrafish larvae, which have around 100,000 neurons. Almost all the neurons were recorded optically, as flashes of light from fluorescent dyes sensitive to neural signals, while behavior was monitored. The spontaneous behaviors Pao studied were the escape response to anoxia (reduced oxygenation) in zebrafish larvae and walking in flies. He used a method from dynamical systems theory called convergent cross mapping (CCM), introduced by George Sugihara at the Scripps Institution of Oceanography, University of California, San Diego, to extract causal relationships between the recorded neurons and behavior. The method extracts a reduced graphical model that captures the low-dimensional brain subspaces controlling the behaviors. Recordings from around 100,000 neurons were analyzed with a supercomputer at the AI Bridging Cloud Infrastructure (ABCI) in Japan. When the model was turned on, the spontaneous behaviors it generated were indistinguishable from those observed in vivo. The key was to analyze the neural recordings and the behaviors simultaneously; analyzing either alone was insufficient to reproduce the behavior.
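
The core of convergent cross mapping is short enough to sketch. Under illustrative choices of embedding dimension, lag, and neighbor count (the actual analysis involved far more data and careful parameter selection), the idea is to delay-embed one recorded time series and ask how well local neighborhoods on that manifold reconstruct the other series.

```python
import numpy as np

def delay_embed(x, dim=3, lag=1):
    """Shadow manifold of x: each row is (x_t, x_{t-lag}, ..., x_{t-(dim-1)*lag})."""
    n = len(x) - (dim - 1) * lag
    return np.column_stack([x[(dim - 1 - i) * lag:(dim - 1 - i) * lag + n]
                            for i in range(dim)])

def cross_map_skill(y, x, dim=3, lag=1):
    """How well the manifold built from y reconstructs x (evidence that x drives y)."""
    My = delay_embed(y, dim, lag)
    x_aligned = x[(dim - 1) * lag:]
    k = dim + 1                                   # simplex projection uses dim + 1 neighbors
    preds = np.empty(len(My))
    for t, point in enumerate(My):
        dists = np.linalg.norm(My - point, axis=1)
        dists[t] = np.inf                         # exclude the point itself
        nbrs = np.argsort(dists)[:k]
        w = np.exp(-dists[nbrs] / (dists[nbrs][0] + 1e-12))
        preds[t] = np.sum(w * x_aligned[nbrs]) / np.sum(w)
    return np.corrcoef(preds, x_aligned)[0, 1]    # cross-map correlation
```

In CCM, cross-map skill that increases and converges as more data are included is taken as evidence of a causal link between the two variables.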

This is proof of principle that brain activity and behavior can be downloaded into a model when sufficient simultaneously recorded data from both brain and behavior are available.

Excerpted from “ChatGPT and the Future of AI: The Deep Language Revolution” by Terrence J. Sejnowski. Reprinted with permission from The MIT Press. Copyright 2024.
