How your LLM is silently hallucinating company revenue
Summary
LLMs generate SQL queries that often run successfully but are semantically wrong, leading to dangerous data errors. Providing context via tools like MCP, AGENTS.md, or Agent Skills helps reduce these silent failures.
Large language models are generating dangerously wrong database queries
LLMs are accelerating engineering work, but they are creating a specific and dangerous failure mode when used with databases. The problem is that a syntactically correct SQL query can execute successfully while being semantically wrong, returning bad data that looks legitimate.
When an LLM generates a faulty React component, the error is usually visible—a broken layout or a misplaced button. A faulty database query, however, can run without error and return thousands of rows of incorrect data. This makes the mistakes opaque and difficult for a human to quickly validate.
Most broken queries still return data
Analysis of over 50,000 production queries revealed that most "broken" queries execute successfully. A user might ask for "revenue by product category," and the LLM might pull from a "revenue" column in the wrong table.
The query runs, numbers appear, and business decisions are made on silently incorrect metrics. LLMs often commit to the first seemingly correct path and fail to explore alternatives, a form of tunnel vision.
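To make the failure concrete, here is a minimal sketch in SQL. The schema is hypothetical: assume a stale `revenue` snapshot table sits alongside the `order_items` table that actually holds recognized revenue.

```sql
-- Hypothetical schema for illustration: a legacy "revenue" snapshot table
-- coexists with the "order_items" table that holds the real figures.

-- What the LLM plausibly generates: executes cleanly, sums the wrong table.
SELECT p.category,
       SUM(r.revenue) AS total_revenue
FROM revenue r
JOIN products p ON p.id = r.product_id
GROUP BY p.category;

-- What the analyst actually meant: recognized revenue from order items.
SELECT p.category,
       SUM(oi.unit_price * oi.quantity) AS total_revenue
FROM order_items oi
JOIN products p ON p.id = oi.product_id
GROUP BY p.category;
```

Both queries execute and return a plausible-looking number per category; only the second answers the question that was asked.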
Why databases are uniquely vulnerable
Three key characteristics make database work especially prone to these silent LLM failures.
First, SQL dialects diverge in ways LLMs don't anticipate. While the basics are standard, most databases adapt SQL into their own dialect with subtle or dramatic syntax differences, rendering much of the model's general SQL training data useless.
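As a rough illustration, the same "monthly revenue" aggregation is written differently in three common dialects (the table and column names here are hypothetical):

```sql
-- PostgreSQL
SELECT date_trunc('month', created_at) AS month, SUM(amount) AS revenue
FROM orders GROUP BY 1;

-- MySQL: no date_trunc; a common workaround formats the date instead
SELECT DATE_FORMAT(created_at, '%Y-%m-01') AS month, SUM(amount) AS revenue
FROM orders GROUP BY 1;

-- ClickHouse
SELECT toStartOfMonth(created_at) AS month, SUM(amount) AS revenue
FROM orders GROUP BY month;
```

A model trained mostly on PostgreSQL-flavored examples will happily emit `date_trunc` against an engine that has never heard of it.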
Second, real-world schemas are messy. Column names like "amount" can have ambiguous meanings, and legacy tables clutter the environment. LLMs frequently make confident, wrong guesses, inventing columns that don't exist or joining on incorrect keys.
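A hypothetical sketch of the kind of schema that trips models up; the names and inline notes are invented for illustration:

```sql
-- Three columns named "amount", none of which is interchangeable.
CREATE TABLE orders (
    id     BIGINT PRIMARY KEY,
    amount BIGINT          -- gross order total, in cents, before refunds
);

CREATE TABLE payments (
    id       BIGINT PRIMARY KEY,
    order_id BIGINT,
    amount   DECIMAL(12,2) -- net amount actually captured, in dollars
);

CREATE TABLE orders_backup_2019 (
    id     BIGINT,
    amount BIGINT          -- stale snapshot; should never be queried
);
```

An LLM that sums the wrong `amount`, or reaches for the backup table, produces a result that looks every bit as valid as the right one.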
Finally, human communication is ambiguous. Requests like "show me Q1 FY2026" or metrics involving "a billion" have different meanings depending on regional or business context. LLMs lack the specific knowledge to know which definition applies.
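For instance, "Q1 FY2026" resolves to two different date ranges depending on the fiscal calendar. The sketch below assumes a hypothetical orders table and, for the second reading, a fiscal year starting July 1:

```sql
-- Reading 1: calendar quarter, January through March 2026.
SELECT SUM(amount) AS q1_revenue
FROM orders
WHERE created_at >= '2026-01-01' AND created_at < '2026-04-01';

-- Reading 2: fiscal year starting 2025-07-01, so Q1 FY2026 is July-September 2025.
SELECT SUM(amount) AS q1_revenue
FROM orders
WHERE created_at >= '2025-07-01' AND created_at < '2025-10-01';
```

Nothing in the request itself tells the model which range the business means.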
New tools are providing crucial context
The core failure stems from using general LLMs with no context about a specific environment. The model knows abstract SQL but nothing about your schema, dialect, or business logic. A wave of new tools aims to solve this.
- Model Context Protocol (MCP): Anthropic's protocol standardizes connections between LLMs and external tools like databases, allowing for direct query execution and schema introspection.
- AGENTS.md: OpenAI's convention uses a single Markdown file to provide domain-specific context that travels with a codebase, though it can lead to context bloat (a hypothetical fragment is sketched after this list).
- Agent Skills: Anthropic's modular system breaks knowledge into separate files an agent can load on-demand, preventing unused info from clogging the context window.
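To give a sense of the file-based approach, here is a hypothetical AGENTS.md fragment for a database-heavy codebase; every convention in it is invented for illustration:

```markdown
## Database conventions (hypothetical example)
- We run ClickHouse; use its dialect (e.g. toStartOfMonth), not PostgreSQL syntax.
- Revenue questions must use order_items; the legacy revenue table is a stale snapshot.
- orders.amount is stored in cents; divide by 100 before reporting dollars.
- The fiscal year starts July 1, so "Q1 FY2026" means 2025-07-01 through 2025-09-30.
```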
Databases themselves also offer a native solution. Systems like ClickHouse, PostgreSQL, and MySQL have long supported the COMMENT syntax to store metadata on tables and columns, providing built-in semantic context.
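Reusing the hypothetical schema sketched earlier, the example below shows how comments attach that semantic context to the objects themselves; the comment text is invented, but the syntax is each engine's standard form:

```sql
-- PostgreSQL: comments are separate statements on existing objects.
COMMENT ON COLUMN orders.amount IS 'Gross order total in cents, before refunds';
COMMENT ON COLUMN payments.amount IS 'Net captured amount in dollars';
COMMENT ON TABLE orders_backup_2019 IS 'Stale 2019 snapshot; do not use for reporting';

-- MySQL: the comment is part of the column definition.
ALTER TABLE payments MODIFY amount DECIMAL(12,2) COMMENT 'Net captured amount in dollars';

-- ClickHouse: columns can be commented after creation.
ALTER TABLE orders COMMENT COLUMN amount 'Gross order total in cents, before refunds';
```

Because the comments live in the catalog, anything that introspects the schema can read them back alongside column names and types.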
The battle of the context methods
It's not yet clear which context-providing method will dominate. A recent evaluation by Vercel compared AGENTS.md to Agent Skills.
AGENTS.md achieved a 100% pass rate in their tests, while Agent Skills maxed out at 79%. Crucially, in 56% of test cases, agents with access to skills never invoked them, even when the documentation was available and relevant.
This suggests that while Agent Skills are more efficient with context, their reliance on the LLM correctly choosing to use a skill is a major weakness. As LLMs improve at tool-calling and context windows grow larger, the balance may shift. For now, the safest bet may be to embed context directly within the database objects themselves.
