September 19, 2024
Researchers from Lawson Health Research Institute, London Health Sciences Centre (LHSC) and Western University’s Schulich School of Medicine & Dentistry have explored the use of ChatGPT as a diagnostic tool for medical learners and clinicians in a new study published in PLOS ONE.
Dr. Amrit Kirpalani, a Paediatric Nephrologist at Children’s Hospital at LHSC, and his team of Schulich Medicine students, Ali Hadi, Edward Tran and Branavan Nagarajan, evaluated the potential of ChatGPT 3.5 as a supplemental educational and diagnostic tool in medicine. The team wanted to explore how ChatGPT could help learners and professionals engage further with the complex cases used for training during medical school. They input 150 medical case challenges into ChatGPT and analyzed the answers for accuracy, relevance and cognitive load.
“When inputting the complex cases into ChatGPT, we chose cases that were more nuanced, ones with a grey area, so to speak,” said Dr. Kirpalani, who is also an Associate Scientist at Lawson and Assistant Professor, Department of Paediatrics at the Schulich School of Medicine & Dentistry. “We wanted to see how the tool answered each scenario, not only by the diagnosis given but by how relevant and understandable the answer was.”
The results showed that ChatGPT 3.5 accurately diagnosed 49 per cent of the 150 case challenges, but that its overall reliability was limited. The team found that ChatGPT struggled to interpret numerical values, neglected data points key to diagnoses and often overlooked important details, which led to inaccurate assumptions.
However, the team was interested in ChatGPT’s ability to take complex cases and explain its answers in a way that is understandable to those who may not possess advanced medical knowledge.
“The cognitive load was relatively low, meaning that it was easy to understand,” said Branavan Nagarajan, a third-year medical student at Schulich Medicine involved in the project. “This means ChatGPT can be a powerful tool for the learning process, helping both pre- and early-year medical students understand complex concepts because they are presented in a more manageable format. But when you combine this with the issues of accuracy, it highlights the importance of having someone with a higher level of medical knowledge to validate the outputs.”
This is one of the study’s biggest takeaways, according to Dr. Kirpalani and his team: because ChatGPT can confidently produce such simplified outputs, its answers can lead those lacking medical knowledge astray. “The accuracy of the outputs is about 50/50,” said Dr. Kirpalani. “This suggests that we need a lot of literacy around AI. Though there is great potential, we need to know we are using it correctly, which is as an educational tool, not when we are diagnosing individuals and not in the case of self-diagnosis.”
The research team sees exciting opportunities for next steps. “We are curious to analyze the performance of future iterations of ChatGPT, to see how much better it does in terms of relevance, accuracy and cognitive load,” said Nagarajan. “There could be the opportunity to expand the parameters the program receives, allowing us to incorporate more numerical data, images and even clinical content such as patient history. We would be curious to see how this could impact the program’s diagnostic accuracy.”
Dr. Kirpalani would like to explore the potential of incorporating ChatGPT and AI literacy into formal medical training. “Our results illustrate the importance of understanding how to use these AI tools properly to avoid receiving or spreading misinformation. A tool is only powerful if we know how to use it accurately.”
He also hopes that these findings can go on to help inform social accountability. “There is a lot of potential to promote accessibility to the field of medicine. Tools like ChatGPT can help in preparing individuals before medical school and can also allow people to have access to a level of expertise that they may not have had before.”