Human expertise or artificial intelligence? A prospective study on nail disorder diagnosis.
Journal:
NPJ digital medicine
Published Date:
Jun 2, 2026
Abstract
Artificial intelligence (AI) shows promise in analyzing patterns of nail disease. This prospective study compared the diagnostic accuracy of dermatologists with that of four freely available multimodal large language models (LLMs): GPT-4o, Grok 3, Claude Sonnet 4, and Gemini 2.5 Flash. All participants were evaluated on clinical images of nail diseases. Seventeen dermatologists gave the correct diagnosis as their primary suspected diagnosis (SD) in 70.6% (95% CI: 65.5-75.2) of cases, and in 80.3% (95% CI: 75.7-84.2) of cases when both the SD and the differential diagnosis (SD + DD) were considered. Accuracy rose across dermatologist groups, from 68.3% (SD + DD) among residents to 96.0% among nail disease experts. In comparison, the AI models were correct in 25.0% (95% CI: 16.8-35.5) of cases for SD and in 35.0% (95% CI: 25.5-45.9) of cases for SD + DD (p < 0.001). The AI models correctly classified 13.9% of tumors and 52.3% of non-tumors (SD + DD, p < 0.001). Current freely available general-purpose AI models showed limited reliability for standalone nail disease diagnosis in this exploratory setting and should not be used without clinical supervision. Although these systems may help suggest differential diagnoses, their performance remains variable and requires further validation in larger, clinically representative datasets.
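The accuracies above are reported as proportions with 95% confidence intervals. As a minimal sketch of how such an interval can be computed, the Python snippet below implements the Wilson score interval for a binomial proportion. The abstract states neither the interval method nor the underlying counts, so both the choice of the Wilson method and the counts in the example (20 correct out of 80 assessments, picked only because they give 25.0%) are assumptions made for illustration. The function name `wilson_interval` is likewise hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, total: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Assumption: the source does not say which interval method was used;
    Wilson is shown here only as one common choice.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successes <= total:
        raise ValueError("successes must lie between 0 and total")

    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / total

    denom = 1 + z**2 / total
    center = (p_hat + z**2 / (2 * total)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


# Hypothetical counts: 20 correct out of 80 assessments (25.0%).
low, high = wilson_interval(20, 80)
print(f"25.0% (95% CI: {low * 100:.1f}-{high * 100:.1f})")
```

The Wilson interval is often preferred to the simple normal approximation for proportions far from 50% or for modest sample sizes, because its bounds stay within 0 and 1.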