Explainable Artificial Intelligence (XAI) in the Era of Large Language Models: Applying an XAI Framework in Pediatric Ophthalmology Diagnosis using the Gemini Model.
Journal:
AMIA Joint Summits on Translational Science Proceedings
Published Date:
Jun 10, 2025
Abstract
Amblyopia is a neurodevelopmental disorder affecting children's visual acuity that requires early diagnosis for effective treatment. Traditional diagnostic methods rely on subjective evaluation of recordings from high-fidelity eye-tracking instruments by specialized pediatric ophthalmologists, who are often unavailable in rural, low-resource clinics. As such, there is an urgent need for a scalable, low-cost, high-accuracy approach to automatically analyze eye-tracking recordings. Large Language Models (LLMs) show promise in the accurate detection of amblyopia; our prior work has shown that the Google Gemini model, guided by expert ophthalmologists, can distinguish control and amblyopic subjects from eye-tracking recordings. However, there is a clear need to address the issues of transparency and trust in medical applications of LLMs. To bolster the reliability and interpretability of LLM analysis of eye-tracking records, we developed a Feature Guided Interpretive Prompting (FGIP) framework focused on critical clinical features. Using the Google Gemini model, we classify high-fidelity eye-tracking data to detect amblyopia in children and apply the Quantus framework to evaluate the classification results across key metrics (faithfulness, robustness, localization, and complexity). These metrics provide a quantitative basis for understanding the model's decision-making process. This work presents the first implementation of an Explainable Artificial Intelligence (XAI) framework to systematically characterize the results generated by the Gemini model using high-fidelity eye-tracking data to detect amblyopia in children. Results demonstrated that the model accurately classified control and amblyopic subjects, including those with nystagmus, while maintaining transparency and clinical alignment.
The results of this study support the development of a scalable and interpretable clinical decision support (CDS) tool using LLMs, with the potential to enhance the trustworthiness of AI applications.