Facial Analysis for Plastic Surgery in the Era of Artificial Intelligence: A Comparative Evaluation of Multimodal Large Language Models.

Journal: Journal of clinical medicine
Published Date:

Abstract

Facial analysis is critical for preoperative planning in facial plastic surgery, but traditional methods can be time consuming and subjective. This study investigated the potential of Artificial Intelligence (AI) for objective and efficient facial analysis in plastic surgery, with a specific focus on Multimodal Large Language Models (MLLMs). We evaluated their ability to analyze facial skin quality, volume, symmetry, and adherence to aesthetic standards such as neoclassical facial canons and the golden ratio. We evaluated four MLLMs-ChatGPT-4o, ChatGPT-4, Gemini 1.5 Pro, and Claude 3.5 Sonnet-using two evaluation forms and 15 diverse facial images generated by a Generative Adversarial Network (GAN). The general analysis form evaluated qualitative skin features (texture, type, thickness, wrinkling, photoaging, and overall symmetry). The facial ratios form assessed quantitative structural proportions, including division into equal fifths, adherence to the rule of thirds, and compatibility with the golden ratio. MLLM assessments were compared with evaluations from a plastic surgeon and manual measurements of facial ratios. The MLLMs showed promise in analyzing qualitative features, but they struggled with precise quantitative measurements of facial ratios. Mean accuracy for general analysis were ChatGPT-4o (0.61 ± 0.49), Gemini 1.5 Pro (0.60 ± 0.49), ChatGPT-4 (0.57 ± 0.50), and Claude 3.5 Sonnet (0.52 ± 0.50). In facial ratio assessments, scores were lower, with Gemini 1.5 Pro achieving the highest mean accuracy (0.39 ± 0.49). Inter-rater reliability, based on Cohen's Kappa values, ranged from poor to high for qualitative assessments (κ > 0.7 for some questions) but was generally poor (near or below zero) for quantitative assessments. Current general purpose MLLMs are not yet ready to replace manual clinical assessments but may assist in general facial feature analysis. These findings are based on testing models not specifically trained for facial analysis and serve to raise awareness among clinicians regarding the current capabilities and inherent limitations of readily available MLLMs in this specialized domain. This limitation may stem from challenges with spatial reasoning and fine-grained detail extraction, which are inherent limitations of current MLLMs. Future research should focus on enhancing the numerical accuracy and reliability of MLLMs for broader application in plastic surgery, potentially through improved training methods and integration with other AI technologies such as specialized computer vision algorithms for precise landmark detection and measurement.

Authors

  • Syed Ali Haider
    Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL, USA.
  • Srinivasagam Prabha
  • Cesar A Gomez-Cabello
    Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL, USA.
  • Sahar Borna
    Division of Plastic Surgery, Mayo Clinic, 4500 San Pablo Rd, Jacksonville, FL, 32224, USA.
  • Ariana Genovese
  • Maissa Trabilsy
  • Adekunle Elegbede
    Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA.
  • Jenny Fei Yang
    Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA.
  • Andrea Galvao
    School of Dentistry, University Center UNICHRISTUS, Fortaleza 60190-180, Brazil.
  • Cui Tao
    The University of Texas Health Science Center at Houston, USA.
  • Antonio Jorge Forte
    Division of Plastic Surgery, Mayo Clinic, 4500 San Pablo Rd, Jacksonville, FL, 32224, USA. ajvforte@yahoo.com.br.

Keywords

No keywords available for this article.