A research paper about the accuracy of artificial intelligence models in complex diagnostic challenges was published in the Journal of the American Medical Association (JAMA). It tested GPT-4’s diagnostic capabilities on 70 medical cases.
These cases, extracted from the New England Journal of Medicine’s clinicopathologic conferences, represent complex and challenging clinical scenarios that often require the consensus of several experts.
Medical cases in these conferences follow certain unwritten rules. For instance, they typically have a single definitive diagnosis in a real patient, usually confirmed by clinical or anatomical (pathological) testing.
The study found that GPT-4 included the final diagnosis in its differential in 64% of cases (45/70), and its top diagnosis was correct in 39% of cases (27/70).
This is impressive because these are rare, complex cases; most physicians probably wouldn’t include the correct diagnosis in their own differential, especially when the presentation spans several medical specialties.
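For a rough sense of the uncertainty behind those headline proportions, here is a minimal Python sketch (not from the paper) that recomputes the 45/70 and 27/70 figures and attaches approximate 95% Wilson score intervals; the choice of interval method is mine, purely for illustration.

```python
# Illustrative check of the reported accuracy figures (45/70 and 27/70),
# with a simple 95% Wilson score interval. Not taken from the JAMA paper.
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score confidence interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - margin, center + margin

for label, hits in [("diagnosis in differential", 45), ("top diagnosis correct", 27)]:
    lo, hi = wilson_interval(hits, 70)
    print(f"{label}: {hits}/70 = {hits/70:.0%} (95% CI ~ {lo:.0%}-{hi:.0%})")
```

Even on only 70 cases, the intervals make clear that the 64% figure sits well above chance-level guessing for cases this rare.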
The benchmark of passing the USMLEs with outstanding scores is more of a marketing figure than a real-life use case. However, this paper reveals the potential of LLMs in healthcare.
If today’s GPT-4 can include the correct diagnosis in 64% of these rare cases without the aid of medical images, how would a future model like GPT-6 perform, one that will likely have internet connectivity and access to a patient’s complete medical history, including medical images, wearable health data, and even the medical history of the patient’s family?
https://jamanetwork.com/journals/jama/article-abstract/2806457