AI chatbots waffle on GOV.UK queries, then get facts wrong when told to zip it
Summary
AI chatbots for UK government info are often too wordy, burying facts. When told to be concise, accuracy drops. They also make unpredictable errors and rarely refuse questions, risking misinformation. Smaller models can match larger ones.
AI chatbots are too chatty for government advice
Artificial intelligence chatbots give verbose and sometimes dangerously inaccurate answers when responding to questions about government services. This is according to new research from the Open Data Institute (ODI), which tested 11 large language models (LLMs) on over 22,000 queries.
The study found that models often buried facts in "word salad" responses. Telling them to be more concise, however, reduced their accuracy.
Models make unpredictable and serious errors
The ODI's tests revealed that while models often answered correctly, their mistakes were inconsistent and severe. One model, OpenAI's GPT-OSS-20B, incorrectly stated a person was only eligible for Guardian's Allowance if the child in their care had died.
Other models gave wrong legal and procedural advice. Llama 3.1 8B wrongly claimed a court order was needed to add an ex-partner's name to a birth certificate. Qwen3-32B incorrectly said the £500 Sure Start Maternity Grant was available in Scotland.
"If language models are to be used safely in citizen-facing services, we need to understand where the technology can be trusted and where it cannot," said ODI director of research Professor Elena Simperl.
A dangerous failure to refuse questions
The researchers identified a critical flaw: models attempted to answer almost every question, even when they lacked the accurate information. They described this as "a dangerous trait" that could lead people to act on misinformation.
The ODI's key findings on model behavior include:
- High verbosity, with models like Anthropic's Claude Haiku 4.5 being particularly prone to "waffling."
- A tendency to combine multiple sources, which backfires when strict accuracy is required.
- Inconsistent performance, making errors hard to predict and manage.
The researchers recommend that any public-facing AI service must clearly inform users of the risks and direct them to authoritative sources like GOV.UK.
Smaller models can compete with giants
The research also showed that smaller, open-source LLMs can deliver results comparable to large, closed models like OpenAI's GPT-4.1. This finding challenges the assumption that only the biggest models are viable.
It suggests public bodies should avoid long-term contracts that lock them into a single supplier. Flexibility in adopting different, potentially cheaper AI systems is crucial.
UK government pushes ahead with AI plans
This research arrives as the UK government is actively integrating AI into its services. The Government Digital Service plans to add a chatbot to the GOV.UK app and website starting in early 2026.
Specific initiatives are already underway:
- The government is working with Anthropic to build an AI service for job seekers.
- The Department for Work and Pensions is experimenting with a chatbot for Universal Credit claimants.
The ODI conducted its tests using a new dataset called CitizenQuery-UK, which contains 22,066 synthetic questions and answers based on GOV.UK. The institute has released this dataset publicly to aid further research and development.