Study: AI Chatbots Make Users 76% More Likely to Misdiagnose
Summary
A study shows LLM chatbots give unreliable medical advice. In tests, people using AI were less likely to correctly diagnose conditions than those not using AI. Issues include users omitting key details and chatbots giving inconsistent or incorrect information.

People using AI chatbots for medical advice get worse diagnoses
A new study shows that people who use large language model chatbots for medical advice are significantly more likely to end up with a wrong diagnosis. The research, published in Nature Medicine, found that users were 1.76 times less likely to identify a medical condition correctly than people who did not use AI.
This failure occurred despite the chatbots themselves performing well on objective medical tests. In controlled benchmarks, models such as GPT-4o, Llama 3, and Command R+ correctly diagnosed the test scenarios 94% of the time.
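To reconcile the headline with the body figure: if the 1.76 is read as an odds ratio, which is an assumption here since the study's exact statistic is not quoted, then 76% higher odds of misdiagnosing is the same statement as 1.76 times lower odds of diagnosing correctly, because the odds of the two outcomes are reciprocals:

$$
\frac{\text{odds(misdiagnosis}\mid\text{AI})}{\text{odds(misdiagnosis}\mid\text{no AI})} = 1.76
\quad\Longrightarrow\quad
\frac{\text{odds(correct}\mid\text{AI})}{\text{odds(correct}\mid\text{no AI})} = \frac{1}{1.76} \approx 0.57
$$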
The gap between lab tests and real-world use
The study involved 1,298 participants recruited through an online platform that verifies human users. They were given medical scenarios and asked to use an LLM to determine the condition and the correct action, such as calling an ambulance or seeing a doctor.
A control group was told to research the scenarios without using AI. The no-AI group performed "significantly better" at identifying conditions, including serious "red flag" scenarios. Both groups were similarly poor at determining the right course of action, succeeding only about 43% of the time.
The researchers concluded that "strong performance from the LLMs operating alone is not sufficient for strong performance with users."
Why chatbots fail as medical advisors
By analyzing chat logs, the researchers identified several critical failure points in the interactions between users and AI.
- Users omitted critical information. As non-experts, they didn't know which symptoms or details were most important to share, and the chatbots often failed to ask clarifying questions.
- Chatbots generated misleading or incorrect information. They sometimes ignored key details, recommended wrong emergency numbers, or gave dangerously conflicting advice for nearly identical prompts.
- Responses were inconsistent. In one case, two users described a subarachnoid hemorrhage in similar terms; one chatbot said to seek emergency care, while another advised lying down in a dark room.
- Correct answers were buried. On average, each LLM presented 2.21 possible answers. Users frequently chose the wrong one from the list.
The study suggests this may be a best-case scenario, as the tests used clear examples of common conditions. Performance would likely worsen with rare or complex cases.
Doctors are also at risk from AI errors
The danger isn't limited to patients. Medical professionals using chatbots for clinical support also create risks. The nonprofit safety organization ECRI has placed the misuse of AI chatbots at the top of its list of health technology hazards for 2026.
ECRI warns that LLMs produce humanlike responses by predicting the next word, not through genuine comprehension. This can lead to a false sense of authority. Research shows using LLMs does not improve doctors' clinical reasoning and that models will confidently elaborate on incorrect details provided in a prompt.
In one notorious example, Google's medical model, Med-Gemini, invented a nonexistent body part. Google later called the error a "typo."
Even non-emergency advice can be harmful
ECRI's testing revealed that chatbots often give unsafe advice in seemingly routine situations. In one test, researchers asked four LLMs to recommend a gel for an ultrasound scan near a patient's indwelling catheter, a scenario requiring sterile gel to prevent infection.
Only one of the four chatbots identified the sterility requirement. The others recommended standard, non-sterile ultrasound gels. Other tests resulted in faulty advice on electrode placement and isolation gowns.
Despite these documented risks, AI chatbots are being widely promoted and used in healthcare, with their makers even advertising during events like the Super Bowl. For now, experts advise extreme caution and recommend disabling AI-powered features in tools such as search engines when seeking medical information.