The era of innocence for generative artificial intelligence appears to be drawing to a close. While the past two years were defined by a barrage of spectacular announcements and rapid improvements, recent reports on the performance of OpenAI’s ChatGPT and Anthropic’s Claude suggest a troubling stagnation. According to coverage that includes the Maeil Business Newspaper, users and developers are noticing growing instability: from "lazy" responses to the persistent return of hallucinations, Large Language Models (LLMs) seem to be struggling with their very foundations.
The "Laziness" Phenomenon and Quality Degradation
One of the most discussed issues in Silicon Valley circles is so-called "model laziness." Users of GPT-4 and Claude 3.5 frequently report that the AI refuses to fully execute complex coding tasks or gives brief, unsatisfying answers to queries it previously handled with ease. Analysts argue that this is more than a subjective impression: the drive by companies to make models "safer" and less computationally expensive has led to over-alignment. The result is a product that, in its attempt to avoid errors and excessive energy consumption, ends up being less useful to the end user.
Furthermore, training data quality has become a massive thorn in the side of AI developers. As the internet becomes saturated with content generated by AI itself, new models risk being trained on synthetic data. This creates a vicious cycle known as "model collapse," where the flaws and biases of previous AI generations are amplified, leading to a gradual erosion of the systems' logic and creativity.
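The feedback loop behind "model collapse" can be illustrated with a toy simulation. The sketch below is not how any production model is trained: it assumes a "model" that simply memorizes the token frequencies of its training corpus and then emits a synthetic corpus of the same size, which becomes the next generation's training data. All names and sizes (`simulate_collapse`, `vocab_size`, `corpus_size`) are hypothetical and chosen only for illustration.

```python
import random


def next_generation(corpus, rng):
    """'Train' on a corpus by sampling from its empirical token frequencies.

    A token that is absent from the corpus can never be produced again,
    so rare tokens that drop out are lost for all later generations.
    """
    return rng.choices(corpus, k=len(corpus))


def simulate_collapse(vocab_size=50, corpus_size=200, generations=30, seed=0):
    """Return the number of distinct tokens present in each generation."""
    rng = random.Random(seed)
    vocab = list(range(vocab_size))
    # Generation 0 stands in for human-written data with a long tail:
    # token i is drawn with weight proportional to 1 / (i + 1).
    weights = [1.0 / (i + 1) for i in vocab]
    corpus = rng.choices(vocab, weights=weights, k=corpus_size)

    diversity = [len(set(corpus))]
    for _ in range(generations):
        corpus = next_generation(corpus, rng)  # train only on synthetic data
        diversity.append(len(set(corpus)))
    return diversity


diversity = simulate_collapse()
print("distinct tokens per generation:", diversity)
```

In this toy setting the count of distinct tokens can only stay flat or fall from one generation to the next, which mirrors the loss of rare, "long tail" knowledge described above. Real systems are far more complex, so this should be read as an intuition aid rather than a measurement.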
The Scaling Challenge and Economic Hurdles
For years, the dominant theory was that of "scaling laws": the more data and compute power you add, the smarter the model becomes. However, OpenAI and Anthropic seem to be hitting a wall of diminishing returns. Finding new, high-quality data that hasn't already been scraped is now extremely difficult. Legal battles with publishers and content creators further restrict the sources of "clean" knowledge.
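Why a scaling law implies diminishing returns can be seen with a little arithmetic. Scaling laws are usually written as a power law in which loss falls as compute grows, down toward a floor that more compute cannot remove. The sketch below assumes that general form; the constants (`irreducible`, `scale`, `exponent`) are made-up illustrative values, not figures published by OpenAI, Anthropic, or anyone else.

```python
def loss(compute, irreducible=1.5, scale=10.0, exponent=0.3):
    """Hypothetical power-law loss curve: floor + scale * compute^(-exponent)."""
    return irreducible + scale * compute ** (-exponent)


# Compute budgets growing tenfold at each step (arbitrary units).
budgets = [10 ** k for k in range(0, 7)]
losses = [loss(c) for c in budgets]

# Improvement bought by each tenfold increase in compute.
gains = [before - after for before, after in zip(losses, losses[1:])]

for budget, gain in zip(budgets, gains):
    print(f"{budget:>9,} -> {budget * 10:>10,} units: loss improves by {gain:.3f}")
```

Under these assumed constants, every tenfold increase in compute buys a smaller improvement than the one before it, while the cost of each step is ten times higher. That asymmetry is the "wall" the paragraph above refers to.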
At the same time, costs remain astronomical. Training and operating models like Claude 3 Opus or GPT-4o requires billions of dollars in Nvidia infrastructure and energy consumption comparable to that of entire cities. Investors are beginning to demand tangible results and profitability, forcing companies to optimize for efficiency at the potential expense of raw intelligence. The shift toward Small Language Models (SLMs) is a direct response to this pressure, but it raises a question: can we achieve true Artificial General Intelligence (AGI) with limited resources?
Enterprise Trust at Risk
For the business world, reliability is everything. A bank or a hospital cannot rely on a system that "hallucinates" facts or refuses to complete an analysis due to internal safety filters. The current performance crisis of ChatGPT and Claude is creating a trust gap. While companies are experimenting with techniques like RAG (Retrieval-Augmented Generation) to ground AI in reality, the core engine remains a "black box" with unpredictable behavior.
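The idea behind RAG can be sketched in a few lines: look up relevant passages first, then hand them to the model together with the question. The example below is a minimal illustration under loud assumptions. Retrieval is done by naive word overlap rather than the vector search a real deployment would typically use, no actual model is called, and the documents and function names (`retrieve`, `build_grounded_prompt`) are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, k=2):
    """Return the k documents sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_grounded_prompt(query, documents, k=2):
    """Assemble a prompt that restricts the model to the retrieved passages."""
    passages = retrieve(query, documents, k)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the passages below. "
        "If they do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {query}"
    )


# Hypothetical internal documents of a bank.
DOCS = [
    "The wire transfer limit for retail accounts is 10,000 EUR per day.",
    "Branch opening hours are 9:00 to 17:00 on weekdays.",
    "Mortgage rates are reviewed every quarter.",
]

print(build_grounded_prompt("What is the daily wire transfer limit?", DOCS, k=1))
```

The grounding happens entirely in the prompt. The model that eventually reads it is unchanged, which is why the "black box" concern remains even when retrieval works well.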
The solution may not lie in adding more parameters, but in a radical architectural shift. The transition from static models to "AI agents" that can reason in multiple steps and self-correct is the next big promise. However, until that is achieved, users will have to settle for a technology that, despite its brilliance, remains fragile and often unreliable.
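The self-correcting loop that "AI agents" promise can be sketched as propose, check, retry. The example below is only an outline of that control flow, not any vendor's agent framework: `propose` stands in for a call to a language model and is replaced here by a hard-coded stub, `check` stands in for an external verifier, and every name is hypothetical.

```python
def solve_with_self_correction(task, propose, check, max_steps=3):
    """Ask for an answer, verify it, and feed any error back into the next try.

    Returns (answer, steps_used), or (None, max_steps) if no attempt passed.
    """
    feedback = None
    for step in range(1, max_steps + 1):
        answer = propose(task, feedback)
        error = check(answer)
        if error is None:
            return answer, step
        feedback = error  # the next attempt sees what went wrong
    return None, max_steps


def stub_propose(task, feedback):
    """Stand-in for a model: wrong on the first try, right once given feedback."""
    return "381" if feedback is None else "391"


def check_product(answer):
    """Independent verifier: return None if correct, else an error message."""
    expected = 17 * 23
    if int(answer) == expected:
        return None
    return f"{answer} is not the product of 17 and 23"


result = solve_with_self_correction("What is 17 * 23?", stub_propose, check_product)
print(result)  # ('391', 2)
```

The loop is only as reliable as its verifier. Here the check is exact arithmetic; for open-ended tasks, building a trustworthy `check` is a large part of the unsolved problem.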
Conclusion: The Pivot Toward Quality
The year 2026 finds AI at a crossroads. The era of "bigger is better" is ending. OpenAI, Anthropic, and Google are now called upon to prove they can deliver stability and depth, not just impressive demos. The crisis described by the Maeil Business Newspaper is not the end of AI, but its coming of age. It is the moment when the technology must stop being an impressive toy and become a dependable tool capable of supporting the weight of the global economy.