Assessing multimodal large language models for localizing dental implant fixtures on panoramic radiographs.

Journal: Journal of Dentistry

Abstract

OBJECTIVES: To assess whether general-purpose multimodal large language models (LLMs) can localize dental implant fixtures on panoramic radiographs and to quantify false positives on implant fixture-absent images.

METHODS: Using an open-source dataset, we evaluated 82 implant fixture-present panoramic radiographs (297 fixtures) and 82 implant fixture-absent images balanced by the presence or absence of radiopaque restorations (41 each). We tested three multimodal LLMs (GPT-4o, OpenAI o3, and GPT-5T) with a fixed visual-grounding prompt across five independent runs per image. We scored the outputs using an any-overlap rule within a free-response localization framework. The outcomes on the implant fixture-present images were fixture-level micro sensitivity, image-level complete detection rate (CDR), and false positives per image (FPPI+). The outcomes on the implant fixture-absent images were image-level specificity (no-box rate) and FPPI-.

RESULTS: On the implant fixture-present images, micro sensitivity was 16.97 % for GPT-4o, 68.82 % for OpenAI o3, and 65.66 % for GPT-5T; CDRs were 2.20 %, 59.02 %, and 56.59 %; and FPPI+ values were 3.83, 1.48, and 1.52, respectively. On the implant fixture-absent images, specificity values were 32.68 %, 65.85 %, and 68.54 %, and FPPI- values were 1.95, 1.04, and 0.92, respectively. Radiopaque restorations markedly reduced specificity. The proportions of fixtures detected in all five runs were 1.01 % (GPT-4o), 22.22 % (OpenAI o3), and 25.93 % (GPT-5T).

CONCLUSIONS: Reasoning-focused multimodal LLMs outperformed GPT-4o in zero-shot implant fixture localization and reduced false positives, but moderate sensitivity, restoration-driven errors, and run-to-run variability limit autonomous clinical use.

CLINICAL SIGNIFICANCE: This benchmark clarifies the current capabilities and limitations of multimodal LLMs for implant-related radiographic workflows.
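The outcome measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' scoring code: the box format `(x_min, y_min, x_max, y_max)`, the function names, the toy data, and the choice to pool every image-run into one list are all assumptions made for illustration. A fixture counts as detected if any predicted box shares positive area with it, and a predicted box counts as a false positive if it overlaps no fixture.

```python
from typing import Dict, List, Tuple

# Assumed box format (hypothetical): (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def overlaps(a: Box, b: Box) -> bool:
    """Any-overlap rule: True if the boxes share a region of positive area."""
    return min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])


def score_present(runs: List[Tuple[List[Box], List[Box]]]) -> Dict[str, float]:
    """Score fixture-present images.

    `runs` holds one (ground_truth_boxes, predicted_boxes) pair per image-run.
    """
    detected = total = complete = false_pos = 0
    for gt, pred in runs:
        # Fixtures hit by at least one predicted box.
        hits = sum(any(overlaps(g, p) for p in pred) for g in gt)
        detected += hits
        total += len(gt)
        complete += int(hits == len(gt))
        # Predicted boxes that touch no fixture at all.
        false_pos += sum(not any(overlaps(p, g) for g in gt) for p in pred)
    n = len(runs)
    return {
        "micro_sensitivity": detected / total,  # fixture level
        "cdr": complete / n,                    # image level
        "fppi_pos": false_pos / n,
    }


def score_absent(preds: List[List[Box]]) -> Dict[str, float]:
    """Score fixture-absent images: every predicted box is a false positive."""
    n = len(preds)
    return {
        "specificity": sum(len(p) == 0 for p in preds) / n,  # no-box rate
        "fppi_neg": sum(len(p) for p in preds) / n,
    }


# Toy data (invented, not from the study).
present = [
    ([(0, 0, 10, 10), (20, 0, 30, 10)], [(5, 5, 12, 12), (50, 50, 60, 60)]),
    ([(0, 0, 10, 10)], [(1, 1, 2, 2)]),
]
absent = [[], [(0, 0, 5, 5), (10, 10, 15, 15)], [], [(1, 1, 2, 2)]]

print(score_present(present))
print(score_absent(absent))
```

In the toy data, the first image-run detects one of two fixtures and produces one false positive, and the second detects its single fixture, so micro sensitivity is 2/3 while CDR is 1/2. This shows why the fixture-level and image-level measures can diverge, as they do in the reported results.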
