The promise of Artificial Intelligence in the data sector has always been simplicity: the ability for a business executive to ask a question in natural language and receive an accurate answer, without needing to know SQL or the labyrinthine structure of a data warehouse. However, reality is proving to be far more complex. A recent analysis of the experience at Miro, the well-known collaboration platform, sheds light on a critical obstacle: AI agents, no matter how sophisticated, often "hallucinate" when trying to join data tables in large-scale environments.
The Gap Between Model and Reality
In Miro's case, the data team attempted to connect AI agents directly to the company's Snowflake environment. The result was disappointing: the agents provided incorrect answers more than 65% of the time. The problem wasn't rooted in the language processing capabilities of the model (such as GPT-4 or Claude 3), but in the complete absence of context. With more than 10,000 tables and no clear "semantic layer" to guide query routing, the agents had no way of knowing which data asset was the correct one for each specific case.
Imagine a librarian who is fluent in every language in the world but works in a library of millions of books whose covers and labels have been removed. They can read any text, but they have no way to locate the specific information requested. This is what happens to LLMs (Large Language Models) when they face corporate data accumulated over years, often with ambiguous table names and overlapping fields.
The Join Hallucination and the Importance of Logs
The most common error for AI agents is the "join hallucination." When an agent is asked to answer a question requiring data from different tables, it often invents relationships that don't exist or uses the wrong keys for the connection. For example, it might try to join a sales table with a customer table using a field that seems logical but is deprecated or contains incomplete data.
The solution, as highlighted by Miro's experience, lies not in training larger models, but in leveraging SQL query logs. These logs represent the "footprint" of human intelligence within the organization. They contain thousands of queries written by experienced data analysts and show exactly how tables are joined in practice. The logs act as a roadmap revealing the actual structure and usage of data, beyond formal (and often incomplete) metadata.
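To make the idea of logs as a roadmap concrete, here is a minimal sketch of how join patterns might be counted in a query log. It is not Miro's pipeline. The table names, column names, and sample queries are invented for illustration, and the regular expression assumes unaliased `table.column = table.column` conditions. A real system would need a proper SQL parser to handle aliases, subqueries, and multi-condition joins.

```python
import re
from collections import Counter

# Matches "ON table_a.col = table_b.col".
# Assumption: joins are written without aliases and with a single equality.
JOIN_ON = re.compile(r"\bON\s+(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)", re.IGNORECASE)


def count_join_keys(queries):
    """Count how often each pair of (table, column) keys is joined in the log."""
    counts = Counter()
    for sql in queries:
        for t1, c1, t2, c2 in JOIN_ON.findall(sql):
            # Sort the two sides so that "a = b" and "b = a" count as one pattern.
            left, right = sorted([(t1.lower(), c1.lower()), (t2.lower(), c2.lower())])
            counts[(left, right)] += 1
    return counts


# Hypothetical log entries, invented for this example.
query_log = [
    "SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id",
    "SELECT orders.total FROM orders JOIN customers ON customers.id = orders.customer_id",
    "SELECT * FROM orders JOIN customers ON orders.legacy_cust_ref = customers.legacy_ref",
]

join_counts = count_join_keys(query_log)
for (left, right), n in join_counts.most_common():
    print(f"{left[0]}.{left[1]} = {right[0]}.{right[1]}: used {n} time(s)")
```

In this toy log, the `customer_id`/`id` pair appears twice and the legacy pair once. Frequency is only a rough signal, but it is the kind of evidence that formal metadata usually lacks.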
From Text-to-SQL to Context-Aware SQL
The next major step is the move from simple "Text-to-SQL" (translating a question directly into a query) to "Context-Aware SQL" (generating the query with knowledge of how the organization's data is actually used). Using techniques like RAG (Retrieval-Augmented Generation) over historical query logs, AI agents can now "look" at how their human counterparts solved similar problems in the past. If analysts have successfully joined Table A with Table B a thousand times using the 'user_id' key, the AI agent can retrieve this pattern and replicate it.
- Error Reduction: Using logs can reduce the failure rate from more than 65% to single digits.
- Automated Documentation: Logs can help automatically generate a semantic layer, saving data engineers months of manual work.
- Data Democratization: When AI understands the context, non-technical users can finally trust the answers they receive.
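The retrieval step behind Context-Aware SQL can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a production design. It assumes each logged query is stored with a short description, which is a hypothetical log format. It also uses plain token overlap (Jaccard similarity) where a real RAG system would typically use embeddings. All names and queries are invented.

```python
import re


def tokens(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_examples(question, log_entries, k=2):
    """Return the k logged queries whose descriptions best match the question."""
    q = tokens(question)
    ranked = sorted(
        log_entries,
        key=lambda entry: jaccard(q, tokens(entry["description"])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, examples):
    """Assemble a prompt that shows the model how analysts solved similar tasks."""
    parts = ["Past queries written by analysts:"]
    for entry in examples:
        parts.append(f"-- {entry['description']}\n{entry['sql']}")
    parts.append(f"Write SQL for: {question}")
    return "\n\n".join(parts)


# Hypothetical log entries; the description field is an assumption.
log_entries = [
    {"description": "monthly revenue per customer",
     "sql": "SELECT customers.id, SUM(orders.total) FROM orders "
            "JOIN customers ON orders.customer_id = customers.id GROUP BY customers.id"},
    {"description": "open tickets per team",
     "sql": "SELECT teams.id, COUNT(*) FROM tickets "
            "JOIN teams ON tickets.team_id = teams.id GROUP BY teams.id"},
    {"description": "revenue by country",
     "sql": "SELECT customers.country, SUM(orders.total) FROM orders "
            "JOIN customers ON orders.customer_id = customers.id GROUP BY customers.country"},
]

question = "total revenue per customer"
prompt = build_prompt(question, retrieve_examples(question, log_entries))
print(prompt)
```

The prompt sent to the model now carries the join on `orders.customer_id = customers.id` as analysts actually wrote it, so the agent does not have to guess the key.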
Strategic Significance for Businesses
For businesses investing in infrastructure like Snowflake or Databricks, this finding changes where the value lies. Value is no longer found just in the data itself, but in the knowledge of how that data is used. Organizations that curate their SQL logs and feed them into their AI systems will gain a significant competitive advantage. It is no longer an arms race for the best AI model, but a race to organize corporate knowledge.
"Context is king. Without it, AI is just a very fast way to get the wrong answers," industry analysts note.
In conclusion, the Miro case teaches us that artificial intelligence needs human experience, as recorded in the traces of code we leave behind, to function correctly. SQL, a language many thought would be replaced by AI, ultimately proves to be the essential "fuel" for the accuracy of AI in the business world.