In the rapidly evolving landscape of Artificial Intelligence, "post-training" (Supervised Fine-Tuning, or SFT, and Reinforcement Learning, or RL) is frequently viewed as the finishing school where a model acquires its persona and specialized skills. However, a recent paper on arXiv (2605.08368) proposes a radical theoretical shift. The researchers argue that the common distinction between SFT as "imitation" and RL as "discovery" is far too coarse. Instead, they introduce a framework rooted in the Free Energy Principle to address a core ontological question: are we creating new capabilities, or merely eliciting those that already lie dormant within the model's billions of parameters?

The Illusion of Learning and the Energy Barrier

The traditional narrative suggests that Large Language Models (LLMs) learn about the world during pre-training and learn how to interact during post-training. This new research suggests that what we often perceive as the "learning of new skills" is, in reality, a reduction in the "energetic cost" of accessing information the model already possesses. Drawing on tools from statistical physics and information theory, the authors argue that post-training functions less like a teacher and more like a sculptor, removing the debris to reveal the statue within.

When a model undergoes RL, it isn't necessarily discovering novel logical structures. Rather, the process reshapes the model's output probability distribution, making certain "latent" capabilities more accessible. The study terms this process "Capability Elicitation." True "Capability Creation," conversely, requires a much larger shift in parameter space, one that rarely occurs during the fine-tuning stages without catastrophic forgetting of prior knowledge.
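
One way to make the "reshaping" intuition concrete is the well-known closed form for KL-regularized reward maximization, in which the optimal policy is the reference distribution reweighted by exponentiated reward. The toy sketch below is not the paper's method. The four-outcome distribution, the rewards, and the value of `beta` are hypothetical numbers chosen purely for illustration.

```python
import math

def kl_regularized_policy(ref_probs, rewards, beta):
    """Closed-form maximizer of E[r] - beta * KL(pi || ref).

    The optimum is pi(y) proportional to ref(y) * exp(r(y) / beta).
    """
    weights = [p * math.exp(r / beta) for p, r in zip(ref_probs, rewards)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical base model over four candidate behaviors.
# Index 2 is a "latent" skill: rare, but present in the base distribution.
# Index 3 is absent from the base distribution (probability exactly 0).
ref_probs = [0.70, 0.299, 0.001, 0.0]
rewards = [0.0, 0.0, 1.0, 1.0]  # both the latent and the absent skill are rewarded

tuned = kl_regularized_policy(ref_probs, rewards, beta=0.1)

for i, (before, after) in enumerate(zip(ref_probs, tuned)):
    print(f"behavior {i}: base={before:.3f} tuned={after:.3f}")
```

In this toy setting the rare behavior goes from 0.1% of the probability mass to the dominant mode, while the behavior with zero base probability stays at zero no matter how strongly it is rewarded. That is the elicitation side of the distinction in miniature: reweighting amplifies what is already there but cannot place mass where the base model has none.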

The Free Energy Principle as a Diagnostic Tool

Utilizing the Free Energy Principle (FEP), a concept popularized by neuroscientist Karl Friston, the paper provides a rigorous mathematical framework for understanding model behavior. According to the framework, an LLM seeks to minimize "variational free energy" relative to its training data. Post-training is essentially an exercise in aligning the model's internal "energy landscape" with human expectations and task-specific requirements.

  • Elicitation: The model leverages existing patterns from pre-training to satisfy the new objective function with minimal structural change.
  • Creation: The model is forced to develop entirely new neural circuits to process information or logic that was absent from its initial training corpus.
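
For reference, the variational free energy in its standard textbook form (not a formula quoted from the paper) is given below for observations x, latent variables z, a generative model p(x, z), and an approximating distribution q(z). How the paper maps these symbols onto an LLM and its training data is not specified here, so the notation should be read as an assumption.

```latex
\begin{aligned}
F[q] &= \mathbb{E}_{q(z)}\big[\log q(z) - \log p(x, z)\big] \\
     &= D_{\mathrm{KL}}\big(q(z)\,\|\,p(z \mid x)\big) - \log p(x)
\end{aligned}
```

Because the KL term is non-negative, F is an upper bound on the negative log evidence, -log p(x). Minimizing F therefore pushes q toward the true posterior while tightening the bound on how well the model explains the data.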

This distinction carries profound implications for AI safety. If a hazardous capability can be "elicited" with minimal effort, it implies the threat was already present, lurking in the shadows of the neural network, rather than being a byproduct of malicious fine-tuning.

Implications for Alignment and Evaluation

One of the most provocative conclusions of the research is that current benchmarks fail to distinguish between these two phenomena. We often celebrate the "intelligence" of a model that has simply learned to better retrieve its knowledge, while ignoring a stagnation in the actual creation of new cognitive pathways. The researchers' analysis shows that RL is exceptionally effective at elicitation but surprisingly inefficient at creation. This explains why models often "collapse" or hallucinate when pushed beyond the boundaries of their pre-trained data distribution.
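
One common way to probe this distinction empirically is to compare a base model's pass@k at large k with the post-trained model's pass@1. This is offered as an assumption for illustration, not as a procedure attributed to the paper. If the base model already solves a task somewhere among many samples, a post-training gain on that task looks more like elicitation than creation. The sketch uses the standard unbiased pass@k estimator, and the sample counts are hypothetical.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples, c of which are correct.

    Computes 1 - C(n - c, k) / C(n, k): one minus the probability that
    k samples drawn without replacement are all incorrect.
    """
    if k > n:
        raise ValueError("k must not exceed n")
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical base model: 2 correct answers out of 200 samples on one task.
base_pass_1 = pass_at_k(n=200, c=2, k=1)
base_pass_100 = pass_at_k(n=200, c=2, k=100)

print(f"base pass@1   = {base_pass_1:.3f}")
print(f"base pass@100 = {base_pass_100:.3f}")
```

Here a task that the hypothetical base model solves only 1% of the time per sample is still solved in roughly three out of four batches of 100 samples, which suggests the capability is present but hard to reach. A task where even large-k sampling yields zero would be the more plausible candidate for genuine creation.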

"Post-training is not the birth of intelligence, but the domesticating of a pre-existing informational chaos," the authors note.

For the global research community and enterprises focusing on model customization, the message is clear: pre-training quality remains the ultimate bottleneck. No amount of RLHF (Reinforcement Learning from Human Feedback) can generate capabilities that were not planted as seeds during the initial processing of trillions of tokens. We are essentially optimizing the path to existing answers rather than teaching the model how to think from scratch.

The Road Ahead: 2026 and Beyond

As we move through 2026, understanding the internal dynamics of LLMs through the lens of physics and thermodynamics will become the new standard. This study paves the way for more efficient training methodologies, where we might predict whether a model *can* learn a task before spending millions on compute. The distinction between elicitation and creation is not merely academic; it is the roadmap for the next generation of Artificial General Intelligence (AGI). If we want models that truly create, we must rethink the very architecture of how they transition from pre-training to the real world.