A.I. Chatbots Defeated Doctors at Diagnosing Illness

The promise of artificial intelligence (AI) in healthcare has long been tantalizing: imagine a world where AI helps doctors diagnose diseases faster, more accurately, and with fewer errors. Dr. Adam Rodman, an internal medicine expert at Beth Israel Deaconess Medical Center in Boston, was among those who anticipated that AI-powered tools like ChatGPT would assist doctors in diagnosing illnesses. But a recent study, in which Dr. Rodman played a pivotal role, produced surprising results that challenge our assumptions about AI’s role in healthcare.

The Study: A Look at ChatGPT’s Performance

The study, published last month in JAMA Network Open, involved 50 doctors from several large U.S. hospital systems. These doctors were tested on six complex medical case histories based on real patients. The goal? To see whether doctors who used ChatGPT-4, an advanced AI language model from OpenAI, would diagnose conditions more accurately than those who relied on conventional medical resources without the chatbot.

In this test, doctors were asked to analyze detailed case histories, suggest diagnoses, and explain their reasoning. Interestingly, ChatGPT-4, when used alone, outperformed the doctors. On average, the chatbot scored a remarkable 90% when diagnosing medical conditions from the case reports and providing reasoning for its diagnoses.

Doctors vs. AI: The Surprising Results

Doctors who had access to ChatGPT scored an average of 76%, while those who did not use the chatbot scored just 74%. These results surprised Dr. Rodman, who had expected doctors with access to ChatGPT to perform significantly better. Instead, the AI on its own proved more accurate at diagnosing the cases.

So, what went wrong?

Doctors’ Overconfidence and Lack of AI Utilization

The study didn’t just reveal ChatGPT’s superior performance—it also uncovered some fascinating insights into how doctors think and how they interacted with the AI tool. Despite having access to ChatGPT’s detailed diagnostic suggestions, many doctors were reluctant to change their own diagnoses when the AI pointed out potential alternatives. This is a reflection of overconfidence—a common cognitive bias where individuals trust their own judgment, even in the face of contradictory evidence.

Dr. Rodman was particularly shocked by how many doctors simply didn’t listen to the AI when it suggested something different from their initial impressions. Many, he noted, were anchored to their own diagnoses, failing to take advantage of the AI’s potential to offer a valuable second opinion.

Misuse of ChatGPT: A Missed Opportunity

It wasn’t just the doctors’ overconfidence that hindered the effectiveness of AI. The study also revealed that many doctors didn’t understand how to use ChatGPT to its full potential. Instead of inputting the entire case history and asking for a comprehensive diagnosis, many treated the AI like a search engine, asking it isolated questions such as “What are the possible diagnoses for eye pain?” or “Is cirrhosis a risk factor for cancer?”

This limited the chatbot’s ability to provide a thorough and well-rounded diagnosis. Only a few doctors realized that they could input the full medical history and ask for a more comprehensive diagnostic answer. As Dr. Jonathan Chen, a co-author of the study, pointed out, “Only a fraction of doctors actually saw the surprisingly smart and comprehensive answers the chatbot was capable of producing.”
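To make the contrast concrete, here is a minimal sketch in Python of the two prompting styles the study describes, written against OpenAI’s chat completions API. The model name, prompt wording, and abbreviated case text are illustrative assumptions, not the study’s actual protocol.

```python
# A minimal sketch (not the study's protocol) contrasting the two
# prompting styles the researchers observed. Model name, prompts, and
# the case summary below are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Abbreviated, hypothetical case summary standing in for a full history.
case_history = (
    "76-year-old patient with severe pain in the lower back, buttocks, "
    "and calves after balloon angioplasty; treated with heparin; now "
    "showing new anemia and worsening kidney function."
)

# Style 1: the "search engine" pattern many doctors fell into -- a
# narrow, isolated question stripped of the clinical context.
narrow = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": "What are the possible causes of lower back pain?",
    }],
)

# Style 2: the pattern only a few doctors tried -- paste the entire
# case history and ask for a ranked differential with reasoning.
comprehensive = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": (
            "Here is a complete case history:\n" + case_history +
            "\n\nList the most likely diagnoses in order of probability "
            "and explain the reasoning behind each."
        ),
    }],
)

print(narrow.choices[0].message.content)
print(comprehensive.choices[0].message.content)
```

The second call gives the model the same context a clinician would have, which is what allowed it to produce the comprehensive answers the study describes.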

Trusting AI: An Essential Step

Despite the promising performance of ChatGPT, the study illustrates a broader issue—many doctors are not yet comfortable using AI as a reliable partner in diagnosis. There’s a trust issue at play: Doctors often feel more confident in their own experience and intuition than in a machine’s output. As Dr. Rodman pointed out, AI should act as a “doctor extender”, offering a second opinion and additional insights, but doctors must be willing to trust it to realize its full potential.

Moreover, doctors need better training and a deeper understanding of AI’s capabilities to truly benefit from these tools. Without such training, AI’s power remains untapped, and its integration into clinical practice remains a challenge.

The History of AI in Medicine

This isn’t the first time researchers have tried to integrate AI into medical diagnostics. AI systems have been in development since the 1970s, with one of the earliest attempts being INTERNIST-1, a program developed at the University of Pittsburgh. It used an extensive database of diseases and symptoms to diagnose conditions, but it was cumbersome to use and failed to gain widespread adoption. Despite this early setback, research into AI for medical diagnostics continued, culminating in more advanced systems like ChatGPT.

Interestingly, ChatGPT doesn’t attempt to replicate the way doctors think; instead, it predicts language patterns and provides responses based on large datasets. This gives it a different advantage compared to earlier AI systems that tried to mimic human reasoning.

Here’s a detailed breakdown of the study and its findings:

Expert Involved: Dr. Adam Rodman, an internal medicine specialist at Beth Israel Deaconess Medical Center in Boston. He confidently expected that AI-powered chatbots like ChatGPT would help doctors diagnose illnesses, and was surprised by the study’s results.

Study Purpose: To assess whether ChatGPT-4, an AI-powered chatbot, could help doctors diagnose illnesses more accurately when combined with conventional diagnostic resources, compared with doctors who had no access to the chatbot.

Study Design: 50 doctors, both residents and attending physicians, were recruited from several large American hospital systems and given six complex medical case histories. They were graded on their ability to suggest diagnoses and explain the reasoning behind their choices, including whether they identified the final correct diagnosis. The study was published in JAMA Network Open.

Case Histories and Test: The six case histories were based on real patients and have been used in research for decades. They were chosen specifically to test diagnostic skill without prior exposure, meaning ChatGPT could not have been trained on them, and they involved complex conditions requiring careful diagnostic reasoning. One case involved a 76-year-old patient experiencing severe pain after a coronary artery procedure.

ChatGPT Performance: Used alone, ChatGPT-4 scored an average of 90% at diagnosing the conditions from the case reports and explaining its reasoning, outperforming the doctors who had access to it. The results surprised the researchers.

Doctors’ Performance with ChatGPT: Doctors given ChatGPT-4 along with conventional resources scored an average of 76%, only slightly higher than the 74% average of those without access to the chatbot, and well below the chatbot’s own score.

Case Example: One test case involved a 76-year-old patient with severe pain in the lower back, buttocks, and calves after a balloon angioplasty. The patient had been treated with the blood thinner heparin and was showing new anemia and kidney problems. The correct diagnosis was cholesterol embolism, which occurs when cholesterol breaks off from arterial plaque and blocks blood vessels; doctors had to rule out other potential diagnoses.

Study Outcome: The AI outperformed the human doctors at diagnosing a medical case from its report and explaining the reasoning behind the diagnosis. The study also highlighted a gap in how doctors used the AI: many did not trust the chatbot’s suggestions when they conflicted with their own diagnoses.

Doctors’ Cognitive Bias: Doctors were often overconfident in their diagnoses, even when ChatGPT pointed out potentially better explanations, which may explain why some were not swayed by the AI’s reasoning. Some doctors also failed to fully leverage the chatbot’s capabilities.

Doctors’ Misuse of AI: Many doctors treated ChatGPT like a search engine, asking very specific questions (e.g., “What are the possible diagnoses for eye pain?”) rather than entering the entire case history and asking for a comprehensive diagnosis. This misuse limited the tool’s potential in the experiment; only a fraction of doctors used the chatbot in a more effective, comprehensive way.

Trust Issues with AI: Doctors were often reluctant to trust AI-based diagnostics, relying on their own judgment even when the AI suggested something different. This reluctance may stem from a lack of understanding of how AI tools work or a lack of experience with them.

Dr. Rodman’s Surprise: Dr. Rodman expected doctors to perform markedly better with ChatGPT, but their results were only slightly better than with conventional methods alone. He was also struck that ChatGPT, without any human input, performed at such a high level of accuracy.

Historical Context in AI and Medicine: The pursuit of AI tools for medical diagnosis dates back to the 1970s. One notable attempt, INTERNIST-1, developed by computer scientists at the University of Pittsburgh, aimed to replicate the reasoning of expert doctors, but it was too cumbersome to use and failed to gain traction. Despite these early failures, research in AI-driven medical diagnostics has continued to grow.

AI’s Strengths in Medicine: Dr. Jonathan H. Chen, a physician and computer scientist at Stanford, explained that ChatGPT’s strength lies in its ability to process language rather than in mimicking human reasoning, unlike earlier AI systems that tried to replicate how doctors think. Its ability to generate language-based insights makes it a powerful tool when used correctly.

Challenges with Doctor-AI Collaboration: Many doctors lack training in how to use AI tools effectively, and there is a gap between what the AI can do and doctors’ willingness to engage with it. AI can be a valuable “doctor extender” offering second opinions, but doctors need to learn how best to use it for it to truly enhance the diagnostic process.

Study Results and Implications: The study shows that AI has the potential to significantly improve diagnostics, but it also highlights the challenges of integrating AI tools into medical practice. Doctors’ trust in their own intuition and unfamiliarity with the tools’ full capabilities limited the improvement that effective use of the chatbot could have delivered. More training and a better understanding of AI tools are needed to fully realize their potential.

Moving Forward: The Path to AI-Enhanced Healthcare

While the study’s results are surprising, they also show the immense potential that AI holds for improving healthcare. AI tools like ChatGPT can process vast amounts of medical data quickly, offering insights that may be difficult for a single doctor to discern. But for this potential to be realized, doctors need to embrace AI and learn how to use it effectively.

Training healthcare professionals to interact with AI in a meaningful way is key to unlocking its value. As AI tools become more sophisticated, they can become powerful allies for doctors—helping them make more accurate diagnoses, offer better patient care, and ultimately improve health outcomes.

Conclusion: The Future of AI in Medicine

The study led by Dr. Rodman has shown us that while AI can outperform doctors in certain diagnostic scenarios, the real challenge lies in how these tools are used. As AI continues to evolve, doctors will need to shift their mindset from viewing these tools as threats to seeing them as partners in care. Once that happens, AI’s full potential as a diagnostic “doctor extender” can be realized, benefiting both healthcare providers and patients.

The future of AI in healthcare is bright, but it requires trust, understanding, and collaboration. By overcoming these barriers, we can harness the full power of AI to create a smarter, more effective healthcare system.
