AI chatbots waffle on GOV.UK queries, then get facts wrong when told to zip it
Summary
AI chatbots for UK government info are often too wordy, burying facts. When told to be concise, accuracy drops. They also make unpredictable errors and rarely refuse questions, risking misinformation. Smaller models can match larger ones.
AI chatbots are too chatty for government advice
Artificial intelligence chatbots give verbose and sometimes dangerously inaccurate answers when responding to questions about government services. This is according to new research from the Open Data Institute (ODI), which tested 11 large language models (LLMs) on over 22,000 queries.
The study found that models often buried facts in "word salad" responses. Telling them to be more concise, however, reduced their accuracy.
Models make unpredictable and serious errors
The ODI's tests revealed that while models often answered correctly, their mistakes were inconsistent and severe. One model, OpenAI's GPT-OSS-20B, incorrectly stated a person was only eligible for Guardian's Allowance if the child in their care had died.
Other models gave wrong legal and procedural advice. Llama 3.1 8B wrongly claimed a court order was needed to add an ex-partner's name to a birth certificate. Qwen3-32B incorrectly said the £500 Sure Start Maternity Grant was available in Scotland.
"If language models are to be used safely in citizen-facing services, we need to understand where the technology can be trusted and where it cannot," said ODI director of research Professor Elena Simperl.
A dangerous failure to refuse questions
The researchers identified a critical flaw: models attempted to answer almost every question, even when they lacked the accurate information. They described this as "a dangerous trait" that could lead people to act on misinformation.
The ODI's key findings on model behavior include:
- High verbosity, with models like Anthropic's Claude Haiku 4.5 being particularly prone to "waffling."
- A tendency to combine multiple sources, which backfires when strict accuracy is required.
- Inconsistent performance, making errors hard to predict and manage.
The researchers recommend that any public-facing AI service must clearly inform users of the risks and direct them to authoritative sources like GOV.UK.
Smaller models can compete with giants
The research also showed that smaller, open-source LLMs can deliver results comparable to large, closed models like OpenAI's GPT-4.1. This finding challenges the assumption that only the biggest models are viable.
It suggests public bodies should avoid long-term contracts that lock them into a single supplier. Flexibility in adopting different, potentially cheaper AI systems is crucial.
UK government pushes ahead with AI plans
This research arrives as the UK government is actively integrating AI into its services. The Government Digital Service plans to add a chatbot to the GOV.UK app and website starting in early 2026.
Specific initiatives are already underway:
- The government is working with Anthropic to build an AI service for job seekers.
- The Department for Work and Pensions is experimenting with a chatbot for Universal Credit claimants.
The ODI conducted its tests using a new dataset called CitizenQuery-UK, which contains 22,066 synthetic questions and answers based on GOV.UK. The institute has released this dataset publicly to aid further research and development.