In the world of artificial intelligence, the prevailing belief has long been simple: the more a model "thinks," the more accurate and objective it becomes. The advent of "System 2" models, such as DeepSeek-R1 and OpenAI’s o1 series, promised a new era in which Chain-of-Thought (CoT) reasoning would act as a filter against shallow heuristics and embedded biases. However, a disruptive new study posted on arXiv (arXiv:2605.06672, cs.AI) is shaking these foundations. It shows that extensive reasoning can, paradoxically, act as a magnifying glass for specific types of cognitive errors.
The Position Bias: An Invisible Anchor
The research focuses on "position bias," a phenomenon in which a model tends to select an answer based not on its content but on its position in a list of options (e.g., systematically preferring option A or C). In traditional models this was considered a "shallow" error, one expected to disappear once models performed deeper logical processing. The findings show the opposite: within any given difficulty level, longer chains of thought often correlate with *stronger* position bias.
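To make "position bias" concrete, one simple way to quantify it is the total variation distance between two distributions: the slots a model actually picks, and the slots where the correct answers actually sit. This is an illustrative choice and not necessarily the metric used in the study. The function name `position_bias` and the 0-indexed slot convention (0 = option A) are assumptions for this sketch.

```python
from collections import Counter
from typing import Sequence


def position_bias(chosen: Sequence[int], gold: Sequence[int], n_options: int) -> float:
    """Total variation distance between the distribution of chosen positions
    and the distribution of correct-answer positions (0-indexed: 0 = "A").

    0.0 means the model's picks are spread across slots exactly like the gold
    answers; values near 1.0 mean picks pile up in slots the gold answers avoid.
    """
    if len(chosen) != len(gold) or not chosen:
        raise ValueError("chosen and gold must be non-empty and equal in length")
    n = len(chosen)
    picked = Counter(chosen)    # how often each slot was selected
    correct = Counter(gold)     # how often each slot held the right answer
    return 0.5 * sum(abs(picked[i] - correct[i]) / n for i in range(n_options))


# Gold answers balanced across A-D; the model leans heavily on option A.
print(position_bias(chosen=[0, 0, 2, 0], gold=[0, 1, 2, 3], n_options=4))  # 0.5
```

This metric works at the distribution level. A model that answers every question correctly scores 0.0, and so does one whose errors happen to be spread evenly across slots. For that reason it is usually paired with a per-item check.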
This finding is particularly concerning for the scientific community. It suggests that the "thinking" process in large language models is not a purely logical path but one that can be led astray by its own structure. The more a model writes before reaching a decision, the more it seems to "lock in" to patterns reinforced during its reinforcement learning (RL) training phase.
Why Does "Thinking" Fail?
The researchers propose several interpretations of this paradox. One of the most prominent is that reward-based training (RLHF/RL) teaches models that long answers are "good" or "smart." Over thousands of reasoning steps, however, the model can lose touch with the original problem data. It then slides into an internal consistency that satisfies its statistical patterns but ignores objective truth. Related explanations include:
- The complexity of the reasoning chain creates "noise" that drowns out the relevant logical criteria.
- Models tend to justify a biased initial choice post hoc through a long but flawed reasoning process.
- Position bias is not just an input error but a structural feature of how models navigate large probability spaces.
In the case of DeepSeek-R1, which uses an extremely long reasoning process, the study found that in certain multiple-choice tests the probability of selecting the first option increased in proportion to the number of tokens the model produced in its chain of thought. In other words, "deep thinking" is not always "correct thinking."
Implications for the Future of AI
The study's conclusion raises serious questions about the reliability of systems intended for critical decisions, such as medical diagnosis or legal analysis. If giving an AI more processing time leads to more biased results, then the current strategy of relying on "scaling laws," i.e., simply adding more computational power, may need revision.
"It is not enough to make models think more; we must make them think better. The quantity of reasoning does not guarantee the quality of logic," the researchers note.
The solution may not lie in adding more parameters or more thinking time, but in a radical change in how we evaluate "correctness." If training continues to reward only the final correct answer without checking the impartiality of the path taken, we risk creating digital "sophists": systems that can justify any wrong or biased decision with a seemingly flawless logical analysis.
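One way to make training sensitive to impartiality is to pose the same question under every cyclic rotation of its options. The reward then credits only answers whose *content* stays correct regardless of slot. This is sketched here as a hypothetical reward and is not a method from the study. The `ask` callable stands in for a real model call, and it and the function names are assumptions. The check also has a limit: it tests the invariance of the final answer, not the reasoning text itself.

```python
from typing import Callable, List, Sequence

# Hypothetical model interface: ask(question, options) -> index of the chosen option.
AskFn = Callable[[str, List[str]], int]


def cyclic_rotations(options: Sequence[str]) -> List[List[str]]:
    """All cyclic shifts of the option list, so every option visits every slot once."""
    opts = list(options)
    return [opts[k:] + opts[:k] for k in range(len(opts))]


def rotation_consistency_reward(ask: AskFn, question: str,
                                options: Sequence[str], correct: str) -> float:
    """Fraction of rotations in which the model picks the correct option *content*.
    With distinct options, a content-driven model scores 1.0 and a model that
    always picks slot 0 scores 1/n."""
    rotations = cyclic_rotations(options)
    hits = sum(1 for opts in rotations if opts[ask(question, opts)] == correct)
    return hits / len(rotations)


# Toy stand-ins for a real model call.
def always_first(question: str, options: List[str]) -> int:
    return 0                          # pure position bias


def reads_content(question: str, options: List[str]) -> int:
    return options.index("Paris")     # ignores where the answer is listed


choices = ["Paris", "Rome", "Berlin", "Madrid"]
print(rotation_consistency_reward(always_first, "Capital of France?", choices, "Paris"))   # 0.25
print(rotation_consistency_reward(reads_content, "Capital of France?", choices, "Paris"))  # 1.0
```

Under accuracy-only scoring on the original ordering, `always_first` would earn full credit here, because the correct answer happens to be listed first. The rotation-based reward exposes the bias at the cost of n model calls per question.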
Concluding Thoughts
The arXiv:2605.06672 study serves as a warning. As the industry moves toward models that "think" for minutes before responding, we must be careful not to confuse logical-sounding verbosity with objective judgment. Position bias is just the tip of the iceberg. The real challenge for the next generation of AI will be decoupling reasoning ability from the statistical traps of training data.