Rare disease diagnosis using knowledge guided retrieval augmentation for ChatGPT.

Journal: Journal of biomedical informatics
Published Date:

Abstract

Although rare diseases individually have a low prevalence, they collectively affect nearly 400 million individuals around the world. On average, an accurate rare disease diagnosis takes five years, and many patients remain undiagnosed or misdiagnosed. Because machine learning technologies have previously been used to aid diagnostics, this study tests ChatGPT's suitability for rare disease diagnostic support when enhanced with Retrieval Augmented Generation (RAG). RareDxGPT, our enhanced ChatGPT model, supplies ChatGPT with information about 717 rare diseases from an external knowledge resource, the RareDis Corpus, through RAG. In RareDxGPT, when a query is entered, the three documents in the RareDis Corpus most relevant to the query are retrieved and passed to ChatGPT, along with the query, to produce a diagnosis. For evaluation, phenotypes for thirty different diseases were extracted from free text in PubMed's Case Reports. Each case was entered with three different prompt types: "Prompt", "Prompt + Explanation", and "Prompt + Role Play". The accuracy of ChatGPT and RareDxGPT with each prompt type was then measured. With "Prompt", RareDxGPT had 40 % accuracy, while ChatGPT 3.5 got 37 % of the cases correct. With "Prompt + Explanation", RareDxGPT had 43 % accuracy, while ChatGPT 3.5 got 23 % of the cases correct. With "Prompt + Role Play", RareDxGPT had 40 % accuracy, while ChatGPT 3.5 got 23 % of the cases correct. In conclusion, ChatGPT, especially when supplied with extra domain-specific knowledge, demonstrates early potential for rare disease diagnosis with adjustments.
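
The retrieval step described above (select the three documents most relevant to the query, then pass them to the model together with the query) can be illustrated with a minimal sketch. The abstract does not state which retriever, embedding model, or prompt wording RareDxGPT uses, so everything here is an assumption made for illustration: the bag-of-words cosine similarity, the function names `retrieve_top_k` and `build_prompt`, the prompt template, and the four-entry `TOY_CORPUS`, whose placeholder entries are not taken from the RareDis Corpus.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query, corpus, k=3):
    """Return the names of the k documents most similar to the query."""
    q = Counter(tokenize(query))
    scored = [
        (cosine(q, Counter(tokenize(text))), name)
        for name, text in corpus.items()
    ]
    # Sort by similarity only; ties keep their original corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


def build_prompt(query, corpus, k=3):
    """Combine the retrieved documents and the query into one prompt string."""
    names = retrieve_top_k(query, corpus, k)
    context = "\n\n".join(f"[{name}]\n{corpus[name]}" for name in names)
    # Hypothetical template; the wording used in the study is not given.
    return (
        f"Reference documents:\n{context}\n\n"
        f"Patient phenotypes: {query}\n"
        "What is the most likely diagnosis?"
    )


# Placeholder documents for illustration only (not RareDis Corpus content).
TOY_CORPUS = {
    "doc_a": "tall stature long fingers lens dislocation aortic dilation",
    "doc_b": "muscle weakness fatigue drooping eyelids",
    "doc_c": "tall stature joint hypermobility",
    "doc_d": "skin rash fever",
}

query = "tall stature with lens dislocation"
top_docs = retrieve_top_k(query, TOY_CORPUS)
prompt = build_prompt(query, TOY_CORPUS)
```

In this sketch the resulting `prompt` string would be sent to the language model in place of the bare query. A production system would more likely use dense embeddings and a vector index than raw term counts, but the shape of the pipeline (score, keep the top three, prepend to the query) is the same.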

Authors

  • Charlotte Zelin
    Blind Brook High School, Rye Brook, NY, USA.
  • Wendy K Chung
    Departments of Pediatrics and Medicine, Columbia University Medical Center, New York, NY, USA.
  • Mederic Jeanne
    Department of Pediatrics, Boston Children's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA.
  • Gongbo Zhang
    Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States.
  • Chunhua Weng
    Department of Biomedical Informatics, Columbia University.