
The Social Intelligence Bottleneck

A new model drops every couple of months with only marginally better performance than the last one. People call these models "benchmaxed": they have been optimized against the same benchmarks as everyone else's. The benchmarks that were supposed to measure general capability have, in effect, become a second validation set, no longer representative of general use.

As expected, scaling eventually hits diminishing returns. Epoch AI's analysis suggests that gains scale roughly logarithmically with compute [1], so it is reasonable that companies are now innovating architecturally instead of just throwing more money at the problem [2][3].
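To make "diminishing returns" concrete, here is a minimal sketch assuming a power-law relationship between loss and training compute; the constants are invented for illustration and are not Epoch AI's fitted values.

```python
import numpy as np

# Illustrative power-law scaling: loss(C) = a * C**(-alpha) + irreducible.
# The constants below are made up for illustration; they are not fitted values.
a, alpha, irreducible = 10.0, 0.05, 1.7

compute = np.logspace(20, 26, 7)          # training FLOPs, 1e20 .. 1e26
loss = a * compute**(-alpha) + irreducible

for c, l in zip(compute, loss):
    print(f"{c:.0e} FLOPs -> loss {l:.3f}")
# Each 10x increase in compute buys a progressively smaller absolute
# improvement in loss: near-linear gains on a log axis, diminishing
# returns on a linear one.
```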

The Language Model Problem

Think of the words "the ball falls". Those are words that represent an intuitive feeling or concept of gravity. We see things fall and experience the feeling of dropping something before we can say what happened. The word representation of the idea is a lossy compression of the understanding.

Frontier models learn the probability distribution of tokens. They are very good at learning sequences of words and producing coherent stories; however, their methodology is fundamentally broken.
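For concreteness, here is a toy version of that objective: the model is scored only on the probability it assigns to the next token given the preceding ones. The tiny vocabulary and probabilities below are invented for illustration, not taken from any real system.

```python
import numpy as np

# Toy next-token objective: cross-entropy of the model's predicted
# distribution against the token that actually appears in the corpus.
vocab = ["the", "ball", "falls", "flies"]

def cross_entropy_next_token(predicted_probs: np.ndarray, target_idx: int) -> float:
    # Loss is low when the model puts mass on the observed next token,
    # regardless of whether it "understands" why that token follows.
    return float(-np.log(predicted_probs[target_idx]))

# Hypothetical model output for the context "the ball ..."
probs = np.array([0.05, 0.05, 0.80, 0.10])   # model favors "falls"
print(cross_entropy_next_token(probs, vocab.index("falls")))  # ~0.22
```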

Text is downstream of understanding; text is simply a means of communicating information, not the information itself. You could memorize every physics textbook and still not understand why a ball falls. Models learn from the artifacts the world produces rather than from the actions that produced them, which may explain why they struggle to generalize and why deep learning for robotics lags behind other areas.

Adding modalities such as vision helps improve grounding for vision-language models, but ultimately they still learn from static datasets that provide only snapshots of the world, not actual interactions with it. Joseph Suarez's work on Neural MMO [4] illustrates this point in detail: he shows that current training environments do not reflect the setting in which true intelligence arises, where each agent acts independently of every other agent.

World Models

LeCun has pushed for Joint Embedding Predictive Architectures (JEPA) [5] as an alternative to next-token prediction: instead of predicting the next pixel or token, the model builds abstract representations that capture structure.
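A rough sketch of the JEPA idea under heavy simplification: encode the visible context and the masked target, and train a predictor to match the target's embedding rather than its raw pixels or tokens. The random linear maps below are placeholders for learned networks, not the actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder "encoders" and "predictor": random linear maps standing in
# for learned networks, just to show where the loss is computed.
D_in, D_emb = 32, 8
W_ctx = rng.normal(size=(D_emb, D_in))    # context encoder
W_tgt = rng.normal(size=(D_emb, D_in))    # target encoder
W_pred = rng.normal(size=(D_emb, D_emb))  # predictor in embedding space

def jepa_loss(context: np.ndarray, target: np.ndarray) -> float:
    s_ctx = W_ctx @ context                # embed the visible context
    s_tgt = W_tgt @ target                 # embed the masked/future target
    s_hat = W_pred @ s_ctx                 # predict the target *embedding*
    # The loss lives in representation space, not pixel/token space.
    return float(np.mean((s_hat - s_tgt) ** 2))

x_context = rng.normal(size=D_in)
x_target = rng.normal(size=D_in)
print(jepa_loss(x_context, x_target))
```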

V-JEPA [6], from Meta, demonstrated that this approach enables planning in robotics: the model builds an internal representation of how the environment will evolve, then simulates that future to reason about possible actions. This is closer to what we would call "understanding", since you now have a model you can run forward.
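Once you have a forward model, planning can be as simple as rolling candidate actions through it and choosing the one whose imagined future looks best. This is a toy sketch with a hand-written dynamics function standing in for a learned world model, not V-JEPA's actual planner.

```python
import numpy as np

def world_model(state: np.ndarray, action: np.ndarray) -> np.ndarray:
    # Stand-in for a learned forward model: predicts the next state.
    # Here, actions simply nudge the state.
    return state + 0.1 * action

def plan(state, candidate_actions, goal, horizon=5):
    # Simulate each candidate action (repeated over the horizon),
    # score the imagined end state, pick the best.
    best_action, best_dist = None, np.inf
    for action in candidate_actions:
        s = state.copy()
        for _ in range(horizon):
            s = world_model(s, action)
        dist = np.linalg.norm(s - goal)
        if dist < best_dist:
            best_action, best_dist = action, dist
    return best_action

state = np.zeros(2)
goal = np.array([1.0, 0.0])
actions = [np.array([1.0, 0.0]), np.array([-1.0, 0.0]), np.array([0.0, 1.0])]
print(plan(state, actions, goal))   # -> [1. 0.]
```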

It is interesting that video generation models like Sora may be developing world models implicitly. For the generated world to remain temporally consistent (objects persist, physics behaves appropriately), the model needs some internal representation of how things work. This is fundamentally different from simply performing frame-by-frame pattern matching.

Social Learning

Humans have always learned within a social context. Human intelligence developed in social environments, under pressure to cooperate, compete, deceive, empathize, and coordinate. The selective pressure was not "be intelligent", but "function within a group of other agents attempting to function".

What does social interaction require?

Theory of Mind: Representations of what others believe and want. Not just predicting their behavior, but actual representations of their internal states.

Value Inference: Figuring out what others value from limited evidence. After only a couple of interactions you are already forming representations of their preferences (a toy sketch of this follows these three items).

Dynamic Updating: Your representations of others update as others change in response to you. Everyone is representing everyone else representing them.
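To make value inference concrete, here is a toy Bayesian sketch: observe which options another agent picks and update a belief over candidate preference profiles using a softmax choice model. The profiles, options, and temperature are invented for illustration.

```python
import numpy as np

# Candidate preference profiles the other agent might have
# (utilities over three options: coffee, tea, water). Purely illustrative.
profiles = {
    "coffee_lover": np.array([2.0, 0.5, 0.0]),
    "tea_lover":    np.array([0.5, 2.0, 0.0]),
    "indifferent":  np.array([0.0, 0.0, 0.0]),
}
belief = {name: 1.0 / len(profiles) for name in profiles}   # uniform prior

def choice_likelihood(utilities: np.ndarray, chosen: int, temp: float = 1.0) -> float:
    # Softmax (Boltzmann) choice model: higher utility -> more likely choice.
    p = np.exp(utilities / temp)
    return float(p[chosen] / p.sum())

def update(belief: dict, chosen: int) -> dict:
    # Bayes rule: posterior is proportional to prior * likelihood of the choice.
    posterior = {n: b * choice_likelihood(profiles[n], chosen) for n, b in belief.items()}
    z = sum(posterior.values())
    return {n: v / z for n, v in posterior.items()}

for observed_choice in [0, 0, 1]:     # they picked coffee, coffee, then tea
    belief = update(belief, observed_choice)
print(belief)   # belief concentrates on "coffee_lover" after a few observations
```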

There is a deeper aspect of how social learning functions. Reward signals are internally generated based upon your representation of how others perceive you.

Consider embarrassment. You can be embarrassed by an interaction even if the other person did not consider the interaction embarrassing, because your representation of their reaction was incorrect and your reward signal was decoupled from the actual outcome. This is fundamentally different from game-based reinforcement learning where the score reflects reality.

How do you correct a misaligned social representation? Prediction error. Your world model predicts they will respond negatively; when you engage and observe something different, the discrepancy is learnable. Over time you adjust your representation to match the observed data. But you can only obtain that signal by engaging and observing: reading about social norms tells you what the norms are, not whether your representation of them is wrong. You need a closed loop.
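A toy sketch of that closed loop: the internal reward is generated from a predicted reaction, and the prediction is then nudged toward what was actually observed. The scalar encoding of "reaction" and the learning rate are arbitrary choices for illustration.

```python
# Toy closed-loop correction of a social representation.
# "Reaction" is a scalar in [-1, 1]: -1 = very negative, +1 = very positive.

predicted_reaction = -0.8     # my model of them says they'll react badly
learning_rate = 0.3

def internal_reward(predicted: float) -> float:
    # Reward is generated from my *representation* of their reaction,
    # not from their actual reaction -- this is why it can be miscalibrated.
    return predicted

for actual_reaction in [0.2, 0.4, 0.3]:           # what actually happens
    reward = internal_reward(predicted_reaction)   # embarrassment despite no offence
    prediction_error = actual_reaction - predicted_reaction
    predicted_reaction += learning_rate * prediction_error  # only fixable by engaging
    print(f"reward={reward:+.2f}  error={prediction_error:+.2f}  "
          f"updated prediction={predicted_reaction:+.2f}")
```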

None of these requirements exist in today's training methods. We train on static text, then perform Reinforcement Learning from Human Feedback (RLHF), independent of real-world social dynamics.
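For contrast, here is roughly what the reward-modeling step in RLHF optimizes: a pairwise preference loss over static completions. The reward values below are made up; the point is that the signal is a ranking of frozen text, not the outcome of an interaction.

```python
import numpy as np

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + np.exp(-x))

def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    # Standard pairwise (Bradley-Terry style) preference loss:
    # the model only ever learns that one static completion was ranked
    # above another -- no interaction, no consequences, no closed loop.
    return float(-np.log(sigmoid(r_chosen - r_rejected)))

print(reward_model_loss(r_chosen=1.2, r_rejected=-0.3))   # ~0.20
```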

How deep LLM "values" actually go is clear in multilingual models: prompting in English produces one set of values and prompting in Mandarin produces another. The model learns whatever structure best compresses the training distribution, so the values expressed in text are merely patterns; there is no pressure to develop a single coherent system. Humans have values because they provide utility for navigating social environments through consistency, reputation, and coalition formation. LLMs face no similar pressure.

There is some evidence that multi-agent training is important. Research shows that cooperative strategies emerge from simple learning rules [7]; AgentVerse showed that agent groups can outperform individual agents through emergent collaboration [8]; and we have also seen this in systems such as AlphaStar, OpenAI Five, and Cicero.
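As a flavor of how cooperation can fall out of simple rules, here is a toy iterated prisoner's dilemma with two "win-stay, lose-shift" players. This is a classic textbook rule used for illustration, not a reproduction of the cited experiments.

```python
# Iterated prisoner's dilemma with two "win-stay, lose-shift" agents.
# Payoffs (row player): T=5 > R=3 > P=1 > S=0.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
ASPIRATION = 2   # payoffs >= this count as a "win"

def win_stay_lose_shift(last_move: str, last_payoff: int) -> str:
    # Keep the previous move after a good outcome, switch after a bad one.
    if last_payoff >= ASPIRATION:
        return last_move
    return "D" if last_move == "C" else "C"

a_move, b_move = "C", "D"          # start from a defection to show recovery
history = []
for _ in range(6):
    pa, pb = PAYOFF[(a_move, b_move)]
    history.append((a_move, b_move))
    a_move = win_stay_lose_shift(a_move, pa)
    b_move = win_stay_lose_shift(b_move, pb)
print(history)   # after one round of mutual defection, (C, C) locks in
```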

Why isn't multi-agent training the standard paradigm? Most likely computational cost. There is a trade-off: many simple agents versus fewer complex agents. Neural MMO used thousands of agents, but each agent was simple, because environment cost scales with both the number of agents and their complexity. AlphaStar and OpenAI Five worked well because all the agents were identical copies of one agent in constrained environments. Scaling to diverse, complex agents in unconstrained environments remains computationally expensive.
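A back-of-the-envelope version of the trade-off, with entirely made-up numbers: simulation cost grows roughly with the number of agents times per-agent compute times environment steps, so under a fixed budget you trade population size against per-agent complexity.

```python
# Rough cost model (illustrative numbers only):
#   total_flops ~= n_agents * flops_per_agent_step * env_steps
env_steps = 1_000_000

configs = {
    "many simple agents": {"n_agents": 1000, "flops_per_agent_step": 1e6},
    "few complex agents": {"n_agents": 10,   "flops_per_agent_step": 1e8},
}
for name, c in configs.items():
    total = c["n_agents"] * c["flops_per_agent_step"] * env_steps
    print(f"{name}: ~{total:.1e} FLOPs")
# Both configurations land around 1e15 FLOPs: under a fixed budget you can
# scale the population or the per-agent model, but not both.
```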

That is why we focus on social learning as a research area. We believe that multi-agent training that results in true social cognition represents one of the most promising avenues to general intelligence.

Implications for AI Risk

If what I'm suggesting about how large language models work is right, and they don't fit our existing models of "x-risk", then those models of "x-risk" will also have to change.

We've imagined risk coming in many forms (superintelligent machines, autonomous robots, and so on), but none of those forms looks like a large language model. We've assumed that a hypothetical "paperclip maximizer" would pursue its goal because it has an internal structure that gives it a reason to do so. That isn't how LLMs work: they match patterns that correlate with value in their training data, but they lack the kind of architecture that motivates other beings to want or avoid particular outcomes.

I think the more likely sources of LLM-related risk are far more prosaic: humans using them in undesirable ways; LLMs amplifying the biases in our data (and making those biases visible at scales we cannot ignore); economic changes arriving faster than economies can adapt; and LLM-generated media becoming people's primary source of information, a form of "epistemic pollution". The latter two risks may be just as serious as the risk of instrumental convergence, but the former two call for a general governance framework rather than the wholesale abandonment of AI research and development.

However, the potential upside is that if we develop AI with true value systems, via some form of social learning, then achieving alignment could look very different from what we anticipate today. Value systems that emerge from social processes may be more robust than those that are externally imposed. Or they may be worse. The social learning process in humans has produced both cooperation and cruelty.